Wednesday, January 01, 2014

Dynamics and termination cost of spatially coupled mean-field models

Happy New Year 

Today we'll start the year with a thought provoking work that we have already seen at play a while back with the appearance of seeded measurement matrices. The connection between the speed an inference algorithm and the nucleation in a physical system. Think "number of iterations in traditional algorithms" mapped to the "number of layers in a neural network". From the paper:

Inference problems can be formulated as problems of statistical mechanics at finite temperature, see [3, 4]. The variables play the role of spins, the constraints resulting from measurements correspond to interactions between spins that are summarized in the Hamiltonian of the corresponding statistical mechanical model. The loglikelihood in the inference problem is therefore the negative free energy in statistical mechanics. A lot of measurements, or a high signal-to-noise ratio, lead to interactions favoring one specifi c spin con figuration, which corresponds to the original and correct signal. A decrease of the signal-to-noise ratio or of the number of measurements modify the free-energy landscape by giving more weight to spurious con gurations corresponding to metastable states that start to compete with the low free-energy favored state and, hence, make the inference of the original signal di fficult. An optimal algorithmic way to reconstruct (infer) the signal requires the computation of marginal probabilities of the corresponding posterior probability, in the statistical physics language this means computing local magnetizations of the corresponding Boltzmann distribution. This is computationally very di fficult task and both in inference and in physics the most basic and popular approximations involve the mean- field approach (known as variational Bayes method in inference) or a Monte Carlo simulation (known as Gibbs sampling in inference)....
In compressed sensing, this corresponds to using only as many measurements as the number of elements in the compressed signal. It turns out that the encoding/measurement protocols able to achieve this optimality often correspond to random Hamiltonians, for which mean eld solutions are exact [3, 4]. This is to be contrasted with statistical physics, where mean-fi eld theory is typically a first tool used in order to eventually understand the behavior of more complicated systems. In inference, the models of primary interest often are the mean eld ones| with Hamiltonian being de ned on a random graph (corresponding to the Bethe approximation) or on a fully connected lattice (corresponding to the Curie-Weiss approximation)....
In the limit of large system sizes, and for a given design of the Hamiltonian, the best achievable performance is characterized by a phase transition that separates a region where inference is possible from a region where it is not. Just as in physics, depending on the Hamiltonian, this phase transition can be of first (discontinuous) or of second (continuous) order. Problems where the transition is of first order (e.g., the two problems above) are much more algorithmically challenging for the following reason: in mean field systems, a first order phase transition is associated with a well de ned spinodal region of exponentially-long living metastability. This metastability is due to existence of a local optimum in the posterior likelihood that is iteratively extremized by the inference algorithm, whereas optimal inference corresponds to nding the global optimum. In physics the global optimum corresponds to the equilibrium state and the local optimum to a metastable state. In the context of inference problems the presence of a rst order phase transition implies that the region where inference is possible divides into two parts| the spinodal part and the rest. In the spinodal region there exists a metastable phase corresponding to unsuccessful inference from which it would take an exponentially long (in the size of the system) time to reach the equilibrium (i.e., successful inference). Hence in inference problems where a first order phase transition appears, the corresponding spinodal line poses a barrier for algorithmic solution in polynomial time. Such an algorithmic barrier has now been identifi ed in many di erent inference problems including the compressed sensing and error correcting codes. It is well known in statistical physics that the exponentially-long-living metastable states only exist in mean- field systems. In any finite dimensional system where the surface of a compact droplet is always smaller than its volume if the droplet is su fficiently large, the system escapes from the metastable state in a time that is polynomially proportional (often linearly) to the system size. This is the concept of nucleation that is well studied in physical sciences. A natural question is therefore: is there a way to induce nucleation in the above inference problems? Recall that the corresponding Hamiltonian needs to be locally mean fi eld-like in order to achieve the best possible performance. Hence the question is how to combine the required mean- eld nature (achieving optimal performance) and finite dimensional nature in order to induce nucleation and escape from the undesirable metastable state. This can be achieved by a concept called `spatial coupling' in which one designs the Hamiltonian by taking independent copies of the mean field system and coupling them to close-neighbors along a one-dimensional chain. The idea then is that for a small part of the system called the seed (placed usually at the beginning of the chain) the parameters are such that successful inference is achieved in that part. When the seed is su ciently large and strong, the interactions along the chain ensure that successful inference is also achieved for its neighbors, and so on. The physical principle is the same as for a supercritical nucleus to start to grow during nucleation. Typically, successful inference is characterized by a travelling wavelike phenomenon, as the accurate reconstruction starts in the seed and travels `along' the chain.  The idea of using spatial coupling in order to improve performance in inference problems goes back to the so called `convolutional low density parity check codes' [6, 7].
`convolutional low density parity check codes' , uh! One cannot escape the notion that it is likely to be very relevant to a set of issues ranging from compressive sensing solvers all the way to the structure and design of future Neural Networks (Sunday Morning Insight: Sharp Phase Transitions in Machine Learning ?). Enjoy!

This work is motivated by recent progress in information theory and signal processing where the so-called `spatially coupled' design of systems leads to considerably better performance. We address relevant open questions about spatially coupled systems through the study of a simple Ising model. In particular, we consider a chain of Curie-Weiss models that are coupled by interactions up to a certain range. Indeed, it is well known that the pure (uncoupled) Curie-Weiss model undergoes a first order phase transition driven by the magnetic field, and furthermore, in the spinodal region such systems are unable to reach equilibrium in sub-exponential time if initialized in the metastable state. By contrast, the spatially coupled system is, instead, able to reach the equilibrium even when initialized to the metastable state. The equilibrium phase propagates along the chain in the form of a travelling wave. Here we study the speed of the wave-front and the so-called `termination cost'--- \textit{i.e.}, the conditions necessary for the propagation to occur. We reach several interesting conclusions about optimization of the speed and the cost.

Potentially related:

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

No comments: