Tuesday, June 22, 2010

Compressed Sensing or Inpainting, part II

From Clients from Hell:

Client: “How come all the photos I took have the heads cut off?”

Me: “Hmm, did you look through the view finder when you took them?”

Client: “I don’t know what that is. Can’t you just move the picture up so I can see their heads? I mean they’re digital pictures?”



Ever since the Wired article came out, it has been bothering me to the point where I had to write the entry on Why Compressed Sensing is NOT a CSI "Enhance" technology ... yet !. I am not alone: take for example Bob Sturm's recent entry entitled "Compressed Sensing" and Undersampled Audio Reconstruction, which talks about the Compressive Sensing Audio example featured in the recent NPR piece. In both cases, there is a sense that what is being mentioned sits at the limit of compressed sensing. Why are these examples at the limit of what Compressive Sensing can do? In Around the blogs in 80 hours I mentioned:
....This is why I wrote this example in Compressed Sensing: How to wow your friends back in 2007, which features an example with deltas and sines. Let us note that in Compressive Sensing Audio, for the delta/sines assumption to hold, you really have to sample enough within time-localized phenomena. In other words, Compressed Sensing is not a license to make appear something you did not catch with the sampling process. This needs to be said often: in MRI or steady-state audio, the signal is sampled with diracs in the appropriate phase space (Fourier for MRI, time for audio), which yields measurements directly usable by a compressive sensing approach. In other fields like imagery, however, you do not sample directly in a good phase space and you need new types of exotic hardware to perform these new types of incoherent measurements...
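To make the delta/sines point concrete, here is a minimal numerical sketch, assuming a 1-D signal that is sparse in a concatenated [DCT atoms | unit spikes] dictionary and observed only at randomly chosen time instants; all names and parameters are my own illustrative choices, not those of the original "wow your friends" post.

```python
# Minimal sketch of the delta + sines idea (my own toy construction).
import numpy as np
from scipy.fft import idct

n, m = 256, 64
rng = np.random.default_rng(0)

C = idct(np.eye(n), norm='ortho', axis=0)       # columns: orthonormal DCT atoms
Psi = np.hstack([C, np.eye(n)])                 # "sines" + "spikes" dictionary

coeffs = np.zeros(2 * n)
coeffs[[14, 46]] = [2.0, 1.0]                   # two smooth (sine-like) atoms
coeffs[n + 100] = 3.0                           # one time-localized spike
signal = Psi @ coeffs

idx = np.sort(rng.choice(n, m, replace=False))  # sample m random time instants
A, b = Psi[idx, :], signal[idx]

# Iterative soft thresholding (ISTA) for  min 0.5||Ax - b||^2 + lam ||x||_1
def soft(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

lam, L, x = 0.02, np.linalg.norm(A, 2) ** 2, np.zeros(2 * n)
for _ in range(3000):
    x = soft(x + A.T @ (b - A @ x) / L, lam / L)

# If index 100 is not among the samples, the spike's column of A is identically
# zero: no decoder can recover an event that the sampling never caught.
print("spike sampled:", 100 in idx,
      "  reconstruction error:", np.linalg.norm(Psi @ x - signal))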

But now with the Wired piece, and as featured in Compressed Sensing or Inpainting? Part I, Jarvis Haupt showed us an example of compressive sensing that looked like an inpainting example. Within the framework of Compressive Sensing, we really need two elements (a toy sketch of both pieces follows the list):
  • an encoding mechanism whose purpose is to multiplex/mix different signals (sometimes in a random fashion)
  • a decoding mechanism (most probably a nonlinear one)
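Here is the toy sketch promised above: the encoder is a random Gaussian mixing matrix and the decoder is basis pursuit (l_1 minimization) written as a linear program. Dimensions, seeds, and variable names are arbitrary choices of mine, not anything prescribed by the framework.

```python
# Encoder: random mixing. Decoder: nonlinear (l_1) recovery via basis pursuit.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 128, 40, 5
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

# Encoding mechanism: every measurement is a random mix of all signal entries.
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = A @ x_true

# Decoding mechanism: basis pursuit  min ||x||_1  s.t.  Ax = b,
# posed as a linear program with x = u - v, u >= 0, v >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print("max recovery error:", np.abs(x_hat - x_true).max())
```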
In Part I, an illustration of Compressive Sensing seemed to suggest that Compressive Sensing could omit the multiplexing/encoding part and rely exclusively on the second part. But instead of arguing unproductively and taking sides between inpainting and compressive sensing, I am asking the following question, and I am looking forward to folks in the community helping me out:

Is your average point-and-click camera with missing pixels a compressive sensing system?


To give some perspective, here is a presentation by Mark Neifeld given at the Duke-AFRL meeting a year ago, entitled Adaptation for Task-Specific Compressive Imaging, and in particular this slide:



So some people seem to think that way as well, but let us come back to a more substantive explanation of how a point-and-click camera could be a compressive sensing system. Let us first figure out whether some mixing is occurring in this type of off-the-shelf camera by using a simple model of a "normal" camera and of a CS one.

Randomly sampling on the CCD is really equivalent to having a camera with missing pixels. So what does a model for a missing-pixel camera look like? To a first approximation, one can think of it as a thick coded aperture camera. Talking about the thickness of a mask for a coded aperture is new, as the thickness of the mask is not an issue in most current computational photography undertakings (it is an issue in the coded aperture cameras used in X-ray astronomy, however). In the following figures, I drew the thick coded aperture mask and the CCD/CMOS planes sideways. In the thick configuration below, rays of light do not mix with each other, i.e., each CCD/CMOS pixel receives an unmixed portion of the scene of interest through the mask. Images taken with this camera model translate into the missing-pixel Obama image of the Wired piece:

However, if you recall how coded apertures work when used in the framework of compressive sensing, you will realize that the relevant setup is equivalent to a thin coded aperture mask, as shown in the following figure.




The red square on the CCD/CMOS receives light rays from many different holes in the mask: i.e., some mixing is occurring, and therefore, by having a random combination of holes on the mask, one can apply the Compressive Sensing framework.
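Here is a small numerical contrast between the two mask models, built from my own toy assumptions: the thick mask acts like a row-subsampled identity (each sensor pixel sees one scene pixel, no mixing), while the thin mask gives each sensor pixel a random combination of open holes (dense random-binary rows, i.e., mixing).

```python
# Toy comparison of "thick" vs "thin" coded aperture measurement matrices.
import numpy as np

rng = np.random.default_rng(3)
n, m = 64, 24                                   # scene pixels, sensor pixels

# Thick mask: each sensor pixel sees exactly one scene pixel (missing pixels).
Phi_thick = np.eye(n)[np.sort(rng.choice(n, m, replace=False))]

# Thin mask: each sensor pixel sums light from a random set of open holes.
Phi_thin = (rng.random((m, n)) < 0.5).astype(float)

for name, Phi in [("thick", Phi_thick), ("thin", Phi_thin)]:
    nonzeros_per_row = np.count_nonzero(Phi, axis=1).mean()
    print(f"{name} mask: each measurement mixes ~{nonzeros_per_row:.0f} scene pixels")
```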


The question now becomes: how does this apply to the Obama picture in the Wired piece, knowing it was taken with the equivalent of a thick coded aperture, not a thin one where compressive sensing would obviously work? And why does the l_1 minimization seem to be working in getting back a useful image?

Some would say that there is enough redundant information in the image with missing pixels so that, with the right dictionary, the image can be almost perfectly reconstructed. However, I think (and this is the reason we shouldn't rush to judgment when saying that some systems are CS or not) that some mixing is occurring that we are somehow overlooking. So how is a point-and-click camera an actual "weak" compressive sensing system?
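To illustrate the "right dictionary" argument, here is a sketch that treats the missing-pixel picture purely as an inpainting problem: a synthetic image assumed to be compressible in a 2-D DCT, half of whose pixels are dropped and then filled back in by iterative soft thresholding. Everything here (image, mask, parameters) is an assumption of mine for illustration.

```python
# Inpainting from missing pixels using sparsity in a 2-D DCT dictionary.
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(4)
n = 32
# A smooth synthetic "image" (low-frequency, hence DCT-compressible).
u, v = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
img = np.cos(3 * np.pi * u) * np.cos(2 * np.pi * v)

mask = rng.random((n, n)) < 0.5                 # keep roughly half the pixels
observed = img * mask

# ISTA on DCT coefficients c:  min 0.5||mask*idctn(c) - observed||^2 + lam||c||_1
lam, c = 0.01, np.zeros((n, n))
for _ in range(500):
    resid = mask * (idctn(c, norm='ortho') - observed)
    c = c - dctn(resid, norm='ortho')           # gradient step (step size 1)
    c = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

recon = idctn(c, norm='ortho')
print("relative error on the missing pixels:",
      np.linalg.norm((recon - img)[~mask]) / np.linalg.norm(img[~mask]))
```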

My take is that a point-and-click camera does not fit a simple coded aperture model. Optical engineering is in the business of designing systems so that a dirac in the world plane gets translated into as close to a dirac as possible on the CCD/CMOS plane, for all wavelengths and for all depths. The problem is that we can't do it, and this is why the optics folks are having a ball selling us lenses of different kinds in a clearly segmented market (different types of depth of field, focus, aberration, field of view, blah blah blah).



In a typical optical system, a dirac in the object plane gets translated into an Airy function like this one on the CCD/CMOS plane.



As one can see, the ripples near the peak allow some of the light to spill over onto nearby pixels, i.e., even in a simple system, some mixing is occurring with neighboring pixels, if not at one wavelength then definitely at others (see the sketch after this list). Other types of mixing in your average point-and-click system include:
  • Transport medium (air, ...)
  • JPEG transform
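Here is the sketch referred to above: the Airy pattern I(r) ~ (2 J1(r)/r)^2 produced by a dirac in the object plane, with a rough estimate of how much of its energy lands outside a central pixel. The pixel geometry (central pixel out to the first dark ring) is an assumption chosen purely for illustration.

```python
# How much of a point source's energy leaks onto neighboring pixels?
import numpy as np
from scipy.special import j1

# Radial coordinate in units where the first dark ring sits at r ~ 3.83.
r = np.linspace(1e-6, 12.0, 4001)
airy = (2.0 * j1(r) / r) ** 2

# Pretend the central pixel spans r < 3.83 and compare the energy inside it
# to the energy captured out to r = 12 (2-D radial integration, dA ~ r dr).
dr = r[1] - r[0]
inside = np.sum((airy * r)[r < 3.83]) * dr
total = np.sum(airy * r) * dr
print(f"fraction of captured energy beyond the first dark ring: {1 - inside / total:.2%}")
```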
The next issue then becomes: how come the l_1 minimization can work, since the measurement matrix does not take this mixing into account? I am not sure, but maybe the results on multiplicative noise or Random Matrix Theory could help (arXiv: Mixed Operators in Compressed Sensing by Matthew Herman and Deanna Needell).
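As a toy check of that question, here is a sketch where the data are generated by a slightly perturbed operator A + E (my stand-in for the unmodeled mixing) while the l_1 decoder only knows the nominal A. It merely illustrates the mismatched-operator setting studied by Herman and Needell, not their results.

```python
# Reconstructing with a nominal matrix A while the data came from A + E.
import numpy as np

rng = np.random.default_rng(5)
n, m, k = 128, 48, 5
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)                    # assumed model
A_true = A + 0.05 * rng.standard_normal((m, n)) / np.sqrt(m)    # actual physics
b = A_true @ x_true

# ISTA on the *nominal* A:  min 0.5||Ax - b||^2 + lam||x||_1
def soft(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

lam, L, x = 0.02, np.linalg.norm(A, 2) ** 2, np.zeros(n)
for _ in range(3000):
    x = soft(x + A.T @ (b - A @ x) / L, lam / L)

print("relative error despite the operator mismatch:",
      np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```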

In the end, the question is: are lower quality cameras or microphones "weak" compressive sensing systems? More to come later...

