From Stephen Becker's webpage here is;
Zero SR1 quasi-Newton method
Background
The zeroSR1 package is based on a proximal quasi-Newton algorithm to solve
where
is a smooth convex function and
is a (possibly non-smooth, and possibly infinite) convex function
such that the
- the proximity operator
is easy to compute,
- the proximity operator is separable in the components of the variable, and
- the proximity operator is piecewise linear.
Exploiting the nature of
, we show in 'A quasi-Newton proximal splitting method’ (Becker, Fadili; NIPS 2012) that one can also compute the proximity operator of
in a scaled norm:
where
is a diagonal matrix,
is a vector so that
is a rank-1 matrix, and
is
.
Because we can efficiently solve for the scaled prox, it opens up the
possibility of a quasi-Newton method. The SR1 update is a rank-1
update, and by using a 0-memory version, the updates to the inverse
Hessian are in exactly the form of
.
This means that for the same cost as a proximal gradient method (or
an accelerated one, like FISTA), we can incorporate second order
information, and the method converges very quickly.
Types of non-smooth terms we can handle
The non-smooth term
can be infinite valued; for example, it may be an indicator function of a set.
The indicator function of a set
is denoted
Equivalently, we are enforcing the constraint
in the optimization problem.
We can solve the scaled prox of the following
in
time (compared to
time for the regular prox) for inputs of dimension
:
Function | mathematical representation |
l1 norm | |
non-negativity constraints | |
l1 norm and non-negativity | |
box constraints | |
hinge loss |
Code
We have put a Matlab/Octave implementation on github, under the BSD 3-clause license. If you are interested in contributing a version in python or R, we will be glad to assist.
No comments:
Post a Comment