In a previous entry, I mentioned the connection between the L1 norm and maximum entropy considerations. As we now know, in Compressed Sensing the reconstruction of the signal from incoherent measurements involves a Linear Programming (L1) step featured in the Basis Pursuit approach. We also know that there is a Bayesian approach to the reconstruction which naturally involves Laplace priors (a maxent argument involving the L1 norm points to the Laplace distribution as the ideal prior).
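To spell the connection out (a textbook derivation in my own notation, not taken from any of the papers below): with Gaussian measurement noise and an i.i.d. Laplace prior on the coefficients, the MAP estimate is exactly the L1-penalized problem:

```latex
\hat{x} = \arg\max_x \; p(y \mid x)\, p(x), \qquad
p(x_i) = \tfrac{\lambda}{2}\, e^{-\lambda |x_i|}
\;\Longrightarrow\;
\hat{x} = \arg\min_x \; \tfrac{1}{2\sigma^2} \| y - A x \|_2^2 \;+\; \lambda \| x \|_1 .
```

So the Laplace prior is not an afterthought: it is the probabilistic face of the L1 term.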
In The Price of Privacy and the Limits of LP Decoding, Cynthia Dwork, Frank McSherry and Kunal Talwar show the following:
Our principal result is the discovery of a sharp threshold ρ∗ ≈ 0.239, so that if ρ < ρ∗ and A is a random m × n encoding matrix of independently chosen standard Gaussians, where m = O(n), then with overwhelming probability over choice of A, for all x ∈ Rn, LP decoding corrects ρm arbitrary errors in the encoding Ax, while decoding can be made to fail if the error rate exceeds ρ∗.
and then in the conclusion
Comparing the results of section 7 with those of section 4, we note that while near-perfect decoding is information theoretically possible for error rates up to a half, LP decoding fails at much lower error rates. It is thus natural to look for other efficient decoding procedures for the case of adversarial error.
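To make the LP decoding step concrete, here is a minimal sketch (my own illustration, not code from the paper): recover x from a corrupted encoding y = Ax + e, with e sparse but arbitrary, by minimizing ||Ax̂ − y||_1 cast as a linear program. The problem sizes and the 10 percent error rate (safely below the ρ∗ ≈ 0.239 threshold) are arbitrary choices for the demo.

```python
import numpy as np
from scipy.optimize import linprog

def lp_decode(A, y):
    """Recover x from y = A @ x + e (e sparse, arbitrary magnitude)
    by minimizing ||A x - y||_1, written as a linear program."""
    m, n = A.shape
    # Variables z = [x (n entries), t (m entries)];
    # minimize sum(t) subject to |A x - y| <= t componentwise.
    c = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    A_ub = np.block([[A, -I], [-A, -I]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

rng = np.random.default_rng(0)
n, m = 5, 60
A = rng.standard_normal((m, n))            # Gaussian encoding matrix
x = rng.standard_normal(n)                 # message
e = np.zeros(m)
bad = rng.choice(m, size=6, replace=False) # 10% of entries corrupted
e[bad] = 10.0 * rng.standard_normal(6)
x_hat = lp_decode(A, A @ x + e)
```

Push the corruption rate past the threshold and the same program will typically return a wrong vertex, which is exactly the breakdown the paper quantifies.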
We already somehow knew this from previous investigations (David Donoho and Victoria Stodden in Breakdown Point of Model Selection When the Number of Variables Exceeds the Number of Observations), but now we have a number: 24 percent. Interestingly, Rick Chartrand in Exact reconstruction of sparse signals via nonconvex minimization makes the case that since L1 might not be optimal, looking at Lp with p less than 1 might lead to better results. The idea is that asymptotically we want p to go to zero in order to fulfill the real L0 requirement. From the abstract:
We show that by replacing the L1 norm with the Lp norm with p < 1, exact reconstruction is possible with substantially fewer measurements. We give a theorem in this direction, and many numerical examples, both in one complex dimension, and larger scale examples in two real dimensions.
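One standard way to attack the nonconvex Lp problem is iteratively reweighted least squares; here is a sketch in that spirit (my own toy implementation with made-up problem sizes, not Chartrand's code), where each iterate solves a weighted least-norm problem and a smoothing parameter eps is slowly annealed toward zero:

```python
import numpy as np

def irls_lp(A, b, p=0.5, iters=300):
    """Approximate  min ||x||_p^p  subject to  A x = b  (0 < p < 1)
    by iteratively reweighted least squares with annealed smoothing."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # min-norm initialization
    eps = 1.0
    for _ in range(iters):
        w = (x**2 + eps) ** (1.0 - p / 2.0)    # W = diag(w)
        # Weighted least-norm update: x = W A^T (A W A^T)^{-1} b
        x = w * (A.T @ np.linalg.solve((A * w) @ A.T, b))
        eps *= 0.95                            # anneal the smoothing
    return x

rng = np.random.default_rng(1)
n, m, k = 100, 30, 5                           # ambient dim, measurements, sparsity
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_rec = irls_lp(A, A @ x0)
```

Each iteration only solves an m-by-m linear system, which is consistent with the observation that these nonconvex schemes run fast compared to a full LP.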
Another paper by Chartrand, Nonconvex compressed sensing and error correction, shows similar results. His computations seem to run very fast compared to BP/LP, which is noteworthy (we could always go for L0 directly, but since it is combinatorial, the computation time is prohibitive). In light of the reconstruction error shown in that article, one cannot but recall the statement made by Aleks Jakulin and Andrew Gelman in this comments section on using the log(1+d^2) norm.
Since a log(1+d^2) ball does not have (thank you Yaroslav) the same shape as the Lp norm ball for p less than 1, how can we reconcile this with Aleks's finding that Cauchy/t-distributions do a good job in sparse decomposition (better than L1)?
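One way to frame the question (standard MAP reasoning, not from the comments thread): a Cauchy prior with scale γ has negative log-density

```latex
p(x_i) \propto \left(1 + \frac{x_i^2}{\gamma^2}\right)^{-1}
\quad\Longrightarrow\quad
-\log p(x_i) = \log\!\left(1 + \frac{x_i^2}{\gamma^2}\right) + \text{const},
```

so the log(1+d^2) penalty is exactly Cauchy-MAP. Like Lp with p < 1 it is nonconvex and grows sub-linearly in the tails, which is presumably what drives the sparsity, even though its ball is rounded near the axes rather than spiky.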
For other reasons, I'd like to think that Cauchy distributions are indeed grounded in serious theoretical work (besides the observation that they are not sensitive to outliers). We already know that random projections show up when modeling the primary visual cortex. We may eventually find that some type of autism is related to an equivalent phase change between L0 and L1 or L_log(1+d^2). There is a nice parallel between the metabolic constraints on the cortex when doing sparse coding and the CPU requirements to do L0, L1 or Lp.
Loss function semantics.
Rice Compressed Sensing page.
Object recognition in the cortex.