## Tuesday, January 22, 2008

### Compressed Sensing: A Lower Bound for the Number of Measurements in Sparse Measurement Matrices ?

[Update: see the comment section of this post, Radu Berinde mentions that the bound I am mentioning is satisfied with high probability. Thank you Radu]

While fascinating, the Sparse Measurement Matrices proposed by Radu Berinde and Piotr Indyk has a small drawback, I think. Irrespective to the dimension of the underlying manifold under study, the number of measurements must meet a higher bound in order for all columns of the measurement matrix to be linearly independent. Namely, if k is the number of ones per column, n the number of measurements and m the dimension of the space where the signal lives, the maximum size of the signal (or dimension of the ambient space) cannot be higher than the binomial coefficient:

$C_n^k = {n \choose k} = \frac{n!}{k!(n-k)!}.$
Conversely, the number of measurements (n) must be enough for the inequality to stand.

m less than C(n,k)

If one considers using k = 2, and the number of measurements is 2 (n=2) then the absolute maximum size of the signal is 1. When the number of measurements is 10, then the absolute maximum size of the signal is 45. When looking at a 5 MB image, one needs to play with n (number of measurements) and k (number of ones per column) to find the closest m to 5 x 10^6.

Wolfram's site has a collection of references on binomial coefficients and the latest sharpest bound is:

Sondow (2005) [1] and Sondow and Zudilin (2006) [2] noted the inequality

for a positive integer and a real number.

If I did the variable change correctly, the lower bound of this inequality translates into:

m less than 1/4 n (n/k)^k 1/(1-k/n)^(n-k-1)

or roughly when k is small compared to n:

m less than 1/(4 k^k) n^(k+1)

For k=4 (4 zeros per column)

m less than 1/1024 n^ 5
in other words
the signal size must be less than the fifth power of the number of measurements up to a constant

So that for a 5 MB image of dimension, the number of measurements must roughly be superior to 88 (in fact the number must be superior to 107). For some cases, this number is clearly above the intrinsic dimension of the dataset and one wonders whether it is wise to oversample compared to a Gaussian Random Matrix approach (which would require 2K+1 measurements where K is the intrinsic dimension of the signal). On the other hand, these 88 sparse samples do not require the huge memory overhead needed to store the Gaussian Matrix. Clearly one needs to investigate a sweet spot for small sized signals. I have included this condition in the function Sparse Measurement Matrix implemented here.

Another item of caution is that the measurement matrix needs to be computed beforehand. The structure of the matrix requires that the matrix be built in full before using it. The other random measurement matrices do not have this feature.

Also, and it is relevant to people thinking about Hardware implementations, the measurements itself can be extremely sparse (all zeros) or extremely full (all ones). The sparsity of these matrices is driven through a number of non-zeros elements by columns not rows.

Finally, in regards to a previous question, the sparsity structure of this measurement matrices can used by most current solvers through the use of a function handle. A minor point remains as to whether the actual solvers could not be further designed with the sparsity structure in mind.

Reference:
[1] Sondow, J. "Problem 11132." Amer. Math. Monthly 112, 180, 2005.

[2] Sondow, J. and Zudilin, W. "Euler's Constant, -Logarithms, and Formulas of Ramanujan and Gosper." Ramanujan J. 12, 225-244, 2006.

Photo Credit: NASA/JPL/Space Science Institute, The Cassini spacecraft's close flyby of Epimetheus (116 kilometers, or 72 miles across) on December 3, 2007 returned detailed images of the moon's south polar region.

The number of ones per column is not a constant; it grows proportionally with log(n/m), where n = size of signal, m = number of measurements. Still, the matrix is very sparse, n*log non-zero values vs n*m values.

"The structure of the matrix requires that the matrix be built in full before using it."

Why is that? Each column is created independently. You do not need to check if any lines are empty - the probability of this happening is very low with correct parameters - each line has on average n*d/m ones, where d is the number of ones per column.

Also, Gaussian random matrices need K*log(N/K) measurements, where K is the sparsity (intrinsic dimension), and N is the dimension of the signal.
You cannot help the logarithmic factor; intuitively (and ignoring bits vs reals), if K = 1, you need at least log(N) bits to be able to store the position of the non-zero value of the N dimensional vector (otherwise you can't recover it); in general you need K*log(N) bits just to store these positions.

This number of measurements for sparse matrices is exactly the same (provided that the number of 1s on every column is O(log(n/m))). This is also true in practice as per the experiments in the paper (same number of measurements yield similar results).

Igor said...

Thank you very much for your comment. Let me answer one by one each of your remarks.

".. The number of ones per column is not a constant; it grows proportionally with log(n/m).."

"...A binary sparse matrix of m rows and n columns is generated in the following way: for each
column, d random values between 1 and m are generated, and 1′s are placed in that column, in rows
corresponding to the d numbers. If the d numbers are not distinct, the generation for the column
is repeated until they are (this is not really an issue when d much less than m)..."

If I understand this correctly, the number of ones is constant per column and it is equal to d. Please let me know if I misunderstood the paper.

2) you say

"...."The structure of the matrix requires that the matrix be built in full before using it."

Why is that? Each column is created independently...."

I realize that this is indeed the case but if you are designing some hardware that implements this approach like a camera, you need to have the full row starting with the very first measurement. I agree with you that the matrix can be expanded, but if you are doing a measurement you'd better have the full set of columns first.

You say:
"... each line has on average n*d/m ones, where d is the number of ones per column..."

so it looks like there is indeed a constant number of ones per column and it is d.

3) I agree with you on your second comment. But do you also agree with me that there is an obvious limit that can be quantified which puts a direct requirement on the number of samples. While O(Klog(N/K) measurements is a "sharp" bound, do you agree there may be situation may where the more obvious requirement (mentioned in the post) needs to be fulfilled when constructing the matrix ?

Again thank you for helping me better understand this.

Igor.

1) d is a parameter; you can think of it as a constant once you know your n and m. The proofs in the paper only go through for d (somewhat) proportional to the logarithm of n (see Claim 1 in Section 2).

2) I understand your point. I am not sure about this, but it might be that in practice you would get similar results by simply generating a random binary sparse matrix with probability of each element being 1 equal to n*d/m. We make no claims about such a matrix, it's just an idea.

3) I might not be understanding your question exactly; as long as m is large enough as per the O(Klog(N/K)) requirement, and you set d to the proper value, the requirement you mention is fulfilled with high probability. For very small values of m, the matrix might not be so sparse anymore though. Of course there is a limit to how small m can be (before the whole matrix contains ones), but that is not interesting in practice (you can just use Gaussians for such small m).

Another way to see this is thinking that the total number of ones in the matrix is not very dependent on m; the matrix will always have about n*log non-zero elements. Thus, for biger m the matrix is sparser than for smaller m.

Igor said...

Ok, we are converging. At least I am.

On your point 1). Yes, d is a parameter and therefore is a constant and Yes it must scale as K*log(N/K) in order to allow for the reconstruction of the whole signal.

point 2). Ok. Interesting new construction. It looks like the Bernouilli Matrices.

Point 3) Ok.

Thanks for pointing out my omission about the scaling of d. I need to correct that.

Igor.

pandarbear said...

I'm being curious to the structure of Berinde's Sparse Measurement Matrix(MM).
Infact, I try to using sprand function in MATLAB to test l1-magic for 1D sparse signal and the result is done well although the performance is weaker with the same of number sensing.

I would like to know this generative process is similar to construction of your MM ?

Igor said...

Pandarbear,

you ought to look at the matlab code for Berinde's matrix and see how it differs from the matlab matrix. The main difference is the specific number of non zeros element per columns (whereas there is no such constraint in the matlab function). Please recall that it may work for this matlab function but:
- it may not work mos tof the time (only 90% of the time instead of 99.999999%)
- there is no theory behind it for the moment.

Cheers,

Igor.