Thursday, September 30, 2010

CS: A Q&A with Felix Herrmann, SMALLbox

I took the opportunity of Felix Herrmann sending me the job opportunities for graduate studentship and postdocs to ask a question destined for people on the applied side of things:.

The question applies specifically to the applied folks. In most systems that are being solved, we somehow hope that the RIP or some such property holds for our system so we can get on the solving of actual problems. It's all great and fun to pretend that but in the end nobody is really convince of the matter.

So here are my questions: In order to be more convincing, have you tried to show that your systems followed the Donoho-Tanner phase transition ? in the negative, is it because of time constraints or something else ?

Felix kindly responded with:
Hi Igor,
I have always been a fan of the phase diagram by Donoho/Tanner because they capture the recovery conditions for typical sparse vectors and acquisition matrices. This means they are less pessimistic and they reflect practical situations better.
However, in my field we are confronted with problem sizes that make it impossible to compute complete phase diagrams. In addition, we are typically relying on redundant sparsifying transformations for which the theory is relatively under developed.
Having said this, I looked in a recent paper at the "oversampling ratio" between the percentage of coefficients required to get a particular nonlinear approximation error and the degree subsampling required to get the same error. I borrowed this idea from Mallat's book and adapted it to the seismic problem. In principle, this sort of thing is somewhat similar to phase diagrams and it gives some assurances but I am afraid maybe not the type of assurances acquisition practitioners are looking for.

While useful, the method I presented in the paper is expensive to compute and does not really lead to a workflow that would provide practical principles to design acquisition scenarios based on compressive sampling. For instance, I am not aware of practical (read computationally feasible and doable in the field) ways to do QC etc. So in summary, while CS provides beautiful insights I think we still have a long way to go to bring this technology into the main stream.
Many seismic exploration techniques rely on the collection of massive data volumes that are subsequently mined for information during processing. While this approach has been extremely successful in the past, current efforts toward higher resolution images in increasingly complicated regions of the Earth continue to reveal fundamental shortcomings in our workflows. Chiefly amongst these is the so-called “curse of dimensionality” exemplified by Nyquist’s sampling criterion, which disproportionately strains current acquisition and processing systems as the size and desired resolution of our survey areas continues to increase. In this paper, we offer an alternative sampling method leveraging recent insights from compressive sensing towards seismic acquisition and processing for data that are traditionally considered to be undersampled. The main outcome of this approach is a new technology where acquisition and processing related costs are no longer determined by overly stringent sampling criteria, such as Nyquist. At the heart of our approach lies randomized incoherent sampling that breaks subsampling related interferences by turning them into harmless noise, which we subsequently remove by promoting transform-domain sparsity. Now, costs no longer grow significantly with resolution and dimensionality of the survey area, but instead depend on transform-domain sparsity only. Our contribution is twofold. First, we demonstrate by means of carefully designed numerical experiments that compressive sensing can successfully be adapted to seismic exploration. Second, we show that accurate recovery can be accomplished for compressively sampled data volumes sizes that exceed the size of conventional transform-domain data volumes by only a small factor. Because compressive sensing combines transformation and encoding by a single linear encoding step, this technology is directly applicable to acquisition and to dimensionality reduction during processing. In either case, sampling, storage, and processing costs scale with transform-domain sparsity. We illustrate this principle by means of number of case studies.

On a different note, the SMALLbox toolbox was announced at the LVA conference. From the twitter feed:

SMALLbox provides a common API for various sparsity toolboxes like sparco, sparselab, etc.

From the website:

Sparse Models, Algorithms, and Learning for
Large-scale data (SMALL)

SMALL will develop a new foundational framework for processing signals, using adaptive sparse structured representations.
A key discriminating feature of sparse representations, which opened up the horizons to new ways of thinking in signal processing including compressed sensing, has been the focus on developing reliable algorithms with provable performance and bounded complexity.
Yet, such approaches are simply inapplicable in many scenarios for which no suitable sparse model is known. Moreover, the success of sparse models heavily depends on the choice of a "dictionary" to reflect the natural structures of a class of data, but choosing a dictionary is currently something of an "art", using expert knowledge rather than automatically applicable principles. Inferring a dictionary from training data is key to the extension of sparse models for new exotic types of data.
SMALL will explore new generations of provably good methods to obtain inherently data-driven sparse models, able to cope with large-scale and complicated data much beyond state-of-the-art sparse signal modelling. The project will develop a radically new foundational theoretical framework for the dictionary-learning problem, and scalable algorithms for the training of structured dictionaries. SMALL algorithms will be evaluated against state-of-the art alternatives and we will demonstrate our approach on a range of showcase applications. We will organise two open workshops to disseminate our results and get feedback from the research community.
The framework will deeply impact the research landscape since the new models, approaches and algorithms will be generically applicable to a wide variety of signal processing problems, including acquisition, enhancement, manipulation, interpretation and coding. This new line of attack will lead to many new theoretical and practical challenges, with a potential to reshape both the signal processing research community and the burgeoning compressed sensing industry.

Credit: SpaceX, Test firing of a Merlin 1C regeneratively cooled rocket engine on a single engine vertical test stand at the SpaceX test facility in McGregor, Texas. The next Falcon flight will occur no earlier than November 8th.

No comments: