From Stephen Becker's webpage here is;
Zero SR1 quasi-Newton method
The zeroSR1 package is based on a proximal quasi-Newton algorithm to solve
where is a smooth convex function and is a (possibly non-smooth, and possibly infinite) convex function such that the
- the proximity operator is easy to compute,
- the proximity operator is separable in the components of the variable, and
- the proximity operator is piecewise linear.
Exploiting the nature of , we show in 'A quasi-Newton proximal splitting method’ (Becker, Fadili; NIPS 2012) that one can also compute the proximity operator of in a scaled norm:
where is a diagonal matrix, is a vector so that is a rank-1 matrix, and is .
Because we can efficiently solve for the scaled prox, it opens up the possibility of a quasi-Newton method. The SR1 update is a rank-1 update, and by using a 0-memory version, the updates to the inverse Hessian are in exactly the form of .
This means that for the same cost as a proximal gradient method (or an accelerated one, like FISTA), we can incorporate second order information, and the method converges very quickly.
Types of non-smooth terms we can handle
The non-smooth term can be infinite valued; for example, it may be an indicator function of a set. The indicator function of a set is denoted
Equivalently, we are enforcing the constraint in the optimization problem.
We can solve the scaled prox of the following in time (compared to time for the regular prox) for inputs of dimension :
|l1 norm and non-negativity|
We have put a Matlab/Octave implementation on github, under the BSD 3-clause license. If you are interested in contributing a version in python or R, we will be glad to assist.
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.