....There's another trait on the side which I want to talk about; that trait is ambiguity. It took me a while to discover its importance. Most people like to believe something is or is not true. Great scientists tolerate ambiguity very well. They believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory....
Sparsity has recently been introduced in cosmology for weak-lensing and CMB data analysis, for applications such as denoising, component separation, and inpainting (i.e., filling in missing or masked data). Although it gives very good numerical results, CMB sparse inpainting has been severely criticized by leading researchers in cosmology, based on arguments derived from a Bayesian perspective. Trying to understand their point of view, we realized that interpreting a regularization penalty term as a prior in a Bayesian framework can lead to erroneous conclusions. This paper is by no means against the Bayesian approach, which has proven very useful for many applications, but it warns against a Bayesian-only interpretation in data analysis, which can be misleading in some cases.
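
As a rough illustration of what sparse inpainting means in practice, here is a minimal sketch, not the paper's pipeline and unrelated to actual CMB maps: an l1-penalized reconstruction of a masked 1-D signal that is sparse in the DCT domain, solved with plain ISTA. The signal length, sparsity level, sampling rate, penalty weight and step size below are all illustrative assumptions.

# Minimal sketch (illustrative only): l1-regularized inpainting of a 1-D signal
# that is sparse in the DCT domain. Missing samples are "filled in" by the fit.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
n, k = 256, 8                        # signal length, number of active DCT coefficients (assumed)
alpha = np.zeros(n)
alpha[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_true = idct(alpha, norm='ortho')   # signal exactly sparse in the DCT basis

mask = rng.random(n) < 0.6           # keep ~60% of the samples, inpaint the rest
y = mask * x_true                    # observed (masked) data

# ISTA on  min_alpha  0.5*||mask * idct(alpha) - y||^2 + lam*||alpha||_1
lam, step = 0.01, 1.0                # masking composed with an orthonormal DCT has norm <= 1
a = np.zeros(n)
for _ in range(500):
    resid = mask * idct(a, norm='ortho') - y
    a = a - step * dct(mask * resid, norm='ortho')            # gradient step
    a = np.sign(a) * np.maximum(np.abs(a) - lam * step, 0.0)  # soft threshold

x_hat = idct(a, norm='ortho')
print("relative inpainting error:",
      np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))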
Hi,
This is an interesting paper. However, I think that there is something important missing in this discussion: the loss function. Any (Bayesian) point estimate is the minimizer of the posterior expectation of some loss; in particular, the MAP is the minimizer of the posterior expectation of the 0/1 loss (in fact, it is the limit of a family of estimates, but that can be ignored in this discussion). Accordingly, there is no reason whatsoever why the distribution of MAP estimates has to be similar to the prior; why would it? Furthermore, for a MAP estimate to yield "correct results" (whatever "correct" means), there is no reason why typical samples from the prior should look like those "correct results".

In fact, the compressed sensing (CS) example in Section 3.3 of the paper illustrates this quite clearly: (a) CS theory guarantees that solving (4) or (5) yields the "correct" solution; (b) as explained in Section 3.1, (5) is the MAP estimate of x under a Laplacian prior (and the linear-Gaussian likelihood explained therein); (c) thus, the solution of (5) is the minimizer of the posterior expectation of the 0/1 loss under the likelihood and prior just mentioned; (d) in conclusion, if the underlying vectors are sufficiently and exactly sparse (obviously not typical samples of a Laplacian), they can be recovered by computing the MAP estimate under a Laplacian prior, that is, by computing the minimizer of the posterior expectation of the 0/1 loss. This is simply a fact.

There is nothing surprising here: the message is that the prior is only half of the story, and it doesn't make sense to look at a prior without also looking at the loss function. In (Bayesian) point estimation, a prior is "good" not if it describes the underlying objects to be estimated well, but if, used in combination with the likelihood function and the observations to obtain a minimizer of the posterior expectation of some loss, it leads to "good" estimates.
Regards,
Mario Figueiredo.
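
The point in the comment above is easy to see numerically. Below is a minimal sketch, my own and not taken from the paper or the comment: ISTA applied to the l1-penalized least-squares problem that the comment identifies with (5), i.e., the MAP estimate under a Laplacian prior and a linear-Gaussian likelihood, recovers an exactly sparse vector from random Gaussian measurements, even though such a vector, with its many exact zeros, is not a typical draw from that prior. The dimensions and the penalty weight are illustrative assumptions.

# Sketch (illustrative only): the Laplacian-prior MAP, i.e. l1-penalized least squares,
# recovers an exactly sparse vector that is nothing like a typical Laplacian sample.
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 80, 200, 5                            # measurements, dimension, sparsity (assumed)
A = rng.standard_normal((m, n)) / np.sqrt(m)    # Gaussian sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                                  # noiseless measurements

# ISTA on  min_x  0.5*||A x - y||^2 + lam*||x||_1
lam = 1e-3
step = 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(5000):
    x = x - step * (A.T @ (A @ x - y))                         # gradient step
    x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)   # soft threshold

print("relative recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
print("exact zeros in the l1 (MAP) estimate:", np.sum(x == 0.0))

# For contrast, a typical draw from the Laplacian prior has no exact zeros at all:
print("exact zeros in a Laplacian draw:", np.sum(rng.laplace(scale=1.0, size=n) == 0.0))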