## Tuesday, December 04, 2012

### A Q&A with Ben Adcock and Anders Hansen: Infinite Dimensional Compressive Sensing, Generalized Sampling, Wavelet Crimes, Safe Zones and the Incoherence Barrier.

Part of the answer to "what is missing" pointed to the work [1,2,3] by Ben Adcock, Anders Hansen and others. They deal with the Gibbs phenomenon, an issue that could be at the root of some of the problems we face in different areas touched by compressive sensing, especially hardware development. Here is what I initially sent to both Ben and Anders:

Dear Ben and Anders,
I write a small blog on compressive sensing (http://nuit-blanche.blogspot.com/) and I was recently re-reading some of your recent papers and presentations about generalized sampling and compressive sensing. I am not sure I understand what the uneven section is and how it is chosen, and I was wondering if you had a toy example I could run in order to get a sense of how your discretization is different from the traditional one used in compressive sensing?
Igor.

Ben kindly responded with:

Dear Igor,
Thanks for getting in touch.

> I write a small blog on compressive sensing (http://nuit-blanche.blogspot.com/) and I was recently re-reading some of your recent papers and presentations about generalized sampling and compressive sensing. I am not sure I understand what the uneven section is and how it is chosen
I'm an active reader of your blog (which is by no means small in my opinion): it's a great resource for the CS community. I was pleased to see you mention our work over the weekend.

> and was wondering if you had a toy example I could run in order to get a sense of how your discretization is different from the traditional one used in compressive sensing?
Anders has most of the code, so I will leave that up to him.
The key idea is that the discretization comes, as you say, from an uneven, as opposed to a finite (i.e. square), section. This ensures that the isometric structure is preserved.
The height of the uneven section (N in the slides) is chosen so as to satisfy the so-called balancing property (slide 45). How large N needs to be (for a given M) to satisfy this property depends completely on the sampling and sparsity bases. However, we have some results which say that for Fourier sampling with (most types of) wavelets as the sparsity basis, one can take N = O(M). As you can see in slide 43, though, N = M doesn't work: one needs N to be roughly twice M in this case.
The balancing property is described in more detail in our paper "Generalized sampling and infinite-dimensional compressed sensing", and the analysis for wavelets can be found in:
"On optimal wavelet reconstructions from Fourier samples: linearity and universality of the stable sampling rate"
There we consider only the nonsparse case with l^2 minimization, and analyze a simpler quantity known as the stable sampling rate. But this is the first step towards the analysis of the full balancing property in this case.
I hope that helps. Please let me know if you have questions or comments.
Best wishes,
Ben
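Ben's "N roughly twice M" rule of thumb can be probed numerically. Below is a rough NumPy sketch of my own (not Ben and Anders's code, and only my reading of their setup): it builds the Fourier-Haar cross-Gramian truncated to an N x M "uneven" section via quadrature on a fine grid, and compares the smallest singular value of the square section (N = M) against the uneven one (N = 2M). The closer that value is to 1, the better the section preserves the isometric structure of the underlying infinite matrix.

```python
import numpy as np

def haar_basis(M, grid):
    """First M Haar functions on [0,1): constant, then wavelets scale by scale.
    M must be a power of two."""
    funcs = [np.ones_like(grid)]
    j = 0
    while len(funcs) < M:
        for k in range(2 ** j):
            psi = np.zeros_like(grid)
            left = (grid >= k / 2 ** j) & (grid < (k + 0.5) / 2 ** j)
            right = (grid >= (k + 0.5) / 2 ** j) & (grid < (k + 1) / 2 ** j)
            psi[left], psi[right] = 2 ** (j / 2), -2 ** (j / 2)
            funcs.append(psi)
        j += 1
    return np.array(funcs[:M])                     # M x G

def uneven_section(N, M, G=4096):
    """N x M section of U_{kj} = <h_j, e_k> with Fourier rows
    k = -N/2 .. N/2 - 1, computed by quadrature on a fine grid."""
    grid = np.arange(G) / G
    H = haar_basis(M, grid)
    ks = np.arange(-N // 2, N // 2)
    E = np.exp(-2j * np.pi * np.outer(ks, grid))   # N x G Fourier rows
    return (E * (1.0 / G)) @ H.T                   # N x M

M = 8
smin_square = np.linalg.svd(uneven_section(M, M), compute_uv=False)[-1]
smin_uneven = np.linalg.svd(uneven_section(2 * M, M), compute_uv=False)[-1]
print(smin_square, smin_uneven)  # the uneven (N = 2M) section is closer to an isometry
```

The point of the comparison is only qualitative: adding rows (taking N larger than M) pushes the smallest singular value of the section up toward 1, which is what the balancing property asks for.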

I was intrigued so I responded with:

Hello Ben,

Thanks for being a reader of the blog. ......

I am looking at slide 43, and the rule of thumb I am getting out of it is that you need to sample in a wider set of "frequencies" than the largest frequency of the actual sparse signal in order to get a good reconstruction. This is a very rough idea, but is that the gist of it?

If so (and these are not mathematically well-defined terms), when you are faced with a compressible signal (as opposed to a sparse one), the idea is that when you sample up to a certain high "frequency", you can only reconstruct well a truncated series expansion with a much lower top "frequency"?

Cheers,

Igor.

To what Ben, once again, kindly replied:

Hi Igor,

....
> I am looking at slide 43 and the rule of thumb I am getting out of it is that you need to sample in a wider set of "frequencies" than the largest frequency of the actual sparse signal in order to get a good reconstruction. This is a very rough idea but is that the gist of it?
Yes. The range of frequencies N needs to be larger than the bandwidth M of the sparse signal.

> In the affirmative, and those are not mathematically well defined terms but when you are faced with a compressible signal (as opposed to sparse), the idea is that when you are sampling to a certain high "frequency", you can reconstruct well only the truncated series expansion with a much lower top "frequency" number?
That's correct. Although in many cases, 'much lower' can often be, say, about half of the sampling frequency.

By the way, this is not an issue that arises due to sparsity or subsampling, or our method. We have results that say that the dependence of N on M in our method is in some sense a universal barrier that one will encounter regardless of the method used. In other words, there is a fundamental limit on how 'far out' one can go in the reconstruction basis given a finite frequency range N in the sampling. This is explained in the wavelet paper I sent you, and also in the following paper:

"Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling, and application to the uniform resampling problem"

Best wishes,

Ben
In the meantime, Anders Hansen also sent a response:

Dear Igor
I am a great fan of your blog, and I think you are doing an excellent job for the community. Apologies for the late reply, but last week was the final week of term. Anyway, attached is a demo showing how the standard discretization using the DFT and DWT fails dramatically in the simplest case of recovering the Haar wavelet. The reason is that any discretization based on the DFT will yield a rasterized version of the truncated Fourier series. Thus, applying the DWT to that will, at best, give you the wavelet coefficients of the truncated Fourier series. The truncated Fourier series of the Haar wavelet is infinitely smooth and hence must have infinitely many Haar coefficients. This is why this approach breaks down.
Note also that if one uses a DWT based on wavelets other than Haar, then one is committing the wavelet crime, and the coefficients produced by the DWT are not even the wavelet coefficients of the truncated Fourier series.
The correct way of doing the discretization is to create a matrix whose elements are inner products of the Fourier basis and the Haar basis. This yields an infinite matrix and then one must use uneven sections.
I'll send you some codes that can produce these matrices if you would like to try this "infinite dimensional compressed sensing", just let me know if you want some more stuff. Let me know how the demo worked.
Best wishes,
Anders
P.S. You need to download spgl1 from the spgl1 webpage to run the demo.
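Anders's demo is in Matlab and needs spgl1. For readers who just want to see the failure mode he describes, here is a minimal NumPy-only sketch of my own (no l1 solver involved): the discrete Haar transform of the Haar wavelet itself is 1-sparse, but the Haar transform of its truncated Fourier series, the rasterized function a DFT-based discretization actually hands you, is anything but sparse.

```python
import numpy as np

G = 1024                           # fine grid on [0, 1)
x = np.arange(G) / G
f = np.where(x < 0.5, 1.0, -1.0)   # the Haar (mother) wavelet itself

# Truncated Fourier series of f, |k| <= 16, rasterized on the grid
ks = np.arange(-16, 17)
coeffs = np.array([np.mean(f * np.exp(-2j * np.pi * k * x)) for k in ks])
f_trunc = np.real(sum(c * np.exp(2j * np.pi * k * x) for c, k in zip(coeffs, ks)))

def haar_dwt(v):
    """Full orthonormal discrete Haar transform (length a power of two)."""
    out, a = [], v.astype(float)
    while len(a) > 1:
        even, odd = a[0::2], a[1::2]
        out.append((even - odd) / np.sqrt(2))  # detail coefficients
        a = (even + odd) / np.sqrt(2)          # running averages
    out.append(a)
    return np.concatenate(out)

def nnz(v, rel_tol=1e-3):
    """Count coefficients above a relative threshold."""
    return int(np.sum(np.abs(v) > rel_tol * np.abs(v).max()))

print(nnz(haar_dwt(f)))        # 1: the Haar wavelet is 1-sparse in its own basis
print(nnz(haar_dwt(f_trunc)))  # many more: Gibbs ringing spreads across scales
```

The smooth, truncated Fourier series has significant Haar coefficients spread over many scales, exactly as Anders says, which is why l1 recovery on the DFT/DWT discretization breaks down.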
After witnessing the Gibbs phenomenon first hand, I asked Anders for some further explanation:
.....
> Can you clarify this wavelet crime statement ?
Indeed, the wavelet crime term was introduced by Gil Strang (I believe) and refers to the following phenomenon. The discrete wavelet transform is an infinite-dimensional operation whose input is the sequence of coefficients of the expansion of the function in the scaling-function basis. Its output is the wavelet coefficients as well as the scaling coefficients corresponding to the next level. When working in a discrete framework, the input of the discrete wavelet transform can only be finite, so what does it mean to take the discrete wavelet transform of, say, an image? Let's start with the Haar wavelet for simplicity. The Haar wavelet has the characteristic function as its scaling function; that means we may view an image as a sum of characteristic functions, in fact as an infinite sum of characteristic functions with only finitely many non-zero coefficients. Plugging this into the discrete wavelet transform now gives us the correct output; in particular, in the Haar case there is no wavelet crime.
So if y is a vector of Fourier coefficients of a function f and we let

x = DWT × DFT y (where DWT is the discrete wavelet transform corresponding to the Haar wavelet),

then, since DFT y is a rasterized version of the truncated Fourier series of f, x is essentially the Haar coefficients of the truncated Fourier series. (This is why the example I sent you fails so dramatically.)
However, it is in the non-Haar case that the wavelet crime occurs. The problem is that for, say, DB4, the scaling function is not the characteristic function. So the discrete wavelet transform of an image should really be the output of the transform when it is given the coefficients corresponding to the scaling function as input. The wavelet crime is when the image itself is plugged into the discrete wavelet transform (as in the Haar case). Thus, when the wavelet crime is committed, the output has very little to do with the actual wavelet coefficients.
In our framework there is no wavelet crime as we go directly for the wavelet coefficients (there is no use of the discrete wavelet transform).
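To make Anders's point about the Haar case concrete, here is a small self-contained NumPy check of my own (with an arbitrary illustrative piecewise-constant signal): for a function that is constant on fine dyadic intervals, its normalized local averages are exactly its Haar scaling coefficients, so feeding them into one level of the Haar analysis filter bank reproduces the true inner products <f, psi_{J-1,k}> obtained by direct integration. No crime is committed.

```python
import numpy as np

rng = np.random.default_rng(0)
J = 6                                 # finest dyadic level, 2^J intervals
n = 2 ** J
vals = rng.standard_normal(n)         # f = vals[k] on [k/n, (k+1)/n)

# Haar scaling coefficients at level J: c_{J,k} = <f, phi_{J,k}> = 2^(-J/2) * vals[k]
c = vals * 2.0 ** (-J / 2)

# One level of the Haar analysis filter bank (this is what the DWT computes)
d_dwt = (c[0::2] - c[1::2]) / np.sqrt(2)   # wavelet coefficients at level J-1

# The same coefficients by direct integration: psi_{J-1,k} equals
# +2^((J-1)/2) on the first half of its support and -2^((J-1)/2) on the second
scale = 2.0 ** ((J - 1) / 2)
d_direct = scale * (vals[0::2] / n - vals[1::2] / n)

print(np.allclose(d_dwt, d_direct))   # True: no wavelet crime in the Haar case
```

For a non-Haar wavelet such as DB4, the first step above is exactly what breaks: point samples or local averages of the image are no longer the scaling coefficients, so pushing them through the filter bank (the crime) does not return the true wavelet coefficients.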

While some of us were bothered by the results of the Matlab Programming Contest, the real genesis of this conversation stemmed from Leslie Smith's recent quest to compare bicubic interpolation with compressive sensing reconstruction on the Compressive Sensing LinkedIn group. During the thread, he wrote the following:
....I recently read a paper by Romberg called "Imaging via CS" in which he simulates CS, and he made his code available. I downloaded his code and started running it yesterday. It gets better PSNR results for CS than anything else I've tried. I don't understand the camera architecture he simulated, but this is an encouraging line of inquiry.

In that paper, Justin wrote about an intriguing scheme:

The second scheme, which we call “compressive imaging,” again sketches the image by measuring the first 1,000 DCT coefficients but then switches to pseudorandom φk to acquire the details

This is an intriguing approach for many reasons, not least because it is hardly ever listed as a generic scheme in compressive sensing. Of specific interest, this scheme, which seems to do extra well, parallels the approach of Ben and Anders, except of course for the application of the TV norm in the final reconstruction. How is it similar? Take a look at the six slides below, extracted from reference [3]. The first three slides talk about the safe zone that Ben and Anders discussed at the top of this entry; then in slides 4, 5 and 6, we see the beginning of an explanation as to why Justin's second scheme may be right on the money. Except, of course, that in Ben and Anders's approach we don't need the heuristics of the TV norm.
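Justin's "compressive imaging" scheme is easy to prototype. Here is a minimal 1-D sketch of such a hybrid sensing matrix (my own construction, not Romberg's code, and with arbitrary illustrative sizes): the first rows sketch the coarse content with the lowest-frequency rows of an orthonormal DCT-II, and the remaining rows acquire the details with unit-norm pseudorandom ±1 rows.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II matrix (rows indexed by frequency)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2)
    return C

def hybrid_sensing_matrix(n, m_dct, m_rand, seed=0):
    """Sketch coarse content with the first m_dct DCT rows, then
    switch to m_rand pseudorandom +-1 rows (normalized to unit norm)."""
    rng = np.random.default_rng(seed)
    low = dct2_matrix(n)[:m_dct]
    rand = rng.choice([-1.0, 1.0], size=(m_rand, n)) / np.sqrt(n)
    return np.vstack([low, rand])

Phi = hybrid_sensing_matrix(n=256, m_dct=64, m_rand=64)
print(Phi.shape)   # (128, 256): a 2x undersampled hybrid measurement operator
```

In Romberg's paper the deterministic part is the first 1,000 DCT coefficients of an image and the reconstruction uses the TV norm; the sketch above only shows the measurement-side structure of the "sketch first, then go pseudorandom" idea.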

And then, the incoherence property is only a sufficient condition.....

Thank you to both Ben and Anders for being good sports and answering my dumb questions.

References:
[1] Ben Adcock, Anders Hansen, E. Herrholz, G. Teschke, "Generalized sampling, infinite-dimensional compressed sensing, and semi-random sampling for asymptotically incoherent dictionaries".
[2] Ben Adcock, Anders Hansen, "Generalized Sampling and Infinite Dimensional Compressed Sensing".