Tuesday, May 08, 2012

Learning with Submodular Functions: A Convex Optimization Perspective (demo ). - implementation-

With some of the discussions on the l_infinity and the l_1 norms on Masaaki Nagahara's new blog, I was reminded of the work of others who are trying to change the norms so as to get similar results as the ones obtained in the more "traditional" setting of compressive sensing and the l_0/l_1 norm. Let us say outright that there doesn't seem too many results on the type of corresponding ensembles that provide nice recovery results for these "new" norms because the Machine Learning folks who are active on this front, are simply given these ensembles: They simply have no choice. However these innovations on the norm settings are exciting in that any good results in Machine Learning send the signals to the signal acquisition folks and their attendant hardware/sensors that they need to eventually change. At least we can hope they will, right ?  . 
With all this said, some of the recent papers/presentations/overviews on the subject include: 

Among them, there is the use of submodular functions:

Submodular functions are relevant to machine learning for mainly two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lov´asz extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In this paper, we present the theory of submodular functions from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems. In particular, we show how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate submodular function minimization with theoretical guarantees and good practical performance. By listing many examples of submodular functions, we review various applications to machine learning, such as clustering or subset selection, as well as a family of structured sparsity-inducing norms that can be derived and used from submodular functions.

The matlab code featuring the examples of this paper is here.

There is also a tutorial by Francis at the IMA:

"...Tutorial - Structured sparsity-inducing norms through submodular functions
September 29, 2011 9:00 am - 9:45 am

Keywords of the presentation: Convex optimization, submodular functions

Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the L1-norm. In this work, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovasz extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as non-factorial priors for supervised learning...."

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.


Masaaki Nagahara said...


Thank you for referring my blog!
It is very nice to know there can be many kinds of norms other than l_0/l_1 norm for CS. I remember p-"norm" (0<p<1) is also proposed by some researchers (but I think this is not so "new").

Best regards,


Igor said...

Yes, those norms between 0 and 1 seem to be used in image processing with p around 0.7 for better image reconstruction.