Thursday, November 06, 2014

Zero SR1 quasi-Newton method - implementation -

From Stephen Becker's webpage here is;

Zero SR1 quasi-Newton method

Background

The zeroSR1 package is based on a proximal quasi-Newton algorithm to solve

$min_{x}; f(x) + h(x)$

where

is a smooth convex function and

is a (possibly non-smooth, and possibly infinite) convex function such that the

the proximity operator $text{prox}_{h}(y) = text{argmin}_x; h(x) + frac{1}{2}|x-y|_2^2$ is easy to compute,
the proximity operator is separable in the components of the variable, and
the proximity operator is piecewise linear.

Exploiting the nature of

, we show in 'A quasi-Newton proximal splitting method’ (Becker, Fadili; NIPS 2012) that one can also compute the proximity operator of

in a scaled norm:

$text{prox}_h^V(y) = text{argmin}_x ; h(x) + frac{1}{2}|x-y|_V^2,; V = D + sigma uu^T$

where

is a diagonal matrix,

is a vector so that uu^T

is a rank-1 matrix, and sigma

Because we can efficiently solve for the scaled prox, it opens up the possibility of a quasi-Newton method. The SR1 update is a rank-1 update, and by using a 0-memory version, the updates to the inverse Hessian are in exactly the form of D + sigma uu^T

This means that for the same cost as a proximal gradient method (or an accelerated one, like FISTA), we can incorporate second order information, and the method converges very quickly.

Types of non-smooth terms we can handle

The non-smooth term

can be infinite valued; for example, it may be an indicator function of a set. The indicator function of a set

is denoted

$iota_C(x) = begin{cases} 0 & x in C +infty & x notin C end{cases}$

Equivalently, we are enforcing the constraint x in C

in the optimization problem.

We can solve the scaled prox of the following

time (compared to mathcal{O}(n)

time for the regular prox) for inputs of dimension

Function	mathematical representation
l1 norm	$h(x) = lVert xrVert_1 = sum_{i=1}^n lvert x_i rvert$
non-negativity constraints	$h(x) = iota_{C_+}$ , $C_+ = { x mid x ge 0 }$
l1 norm and non-negativity	$h(x) = lVert xrVert_1 + iota_{C_+}$
box constraints	$h(x) = iota_{C_{text{box}}}$ , $C_text{box} = { x mid ell le x le u }$
norm ball	$h(x) = iota_{C_infty}$ , $C_infty = { x mid lVert x rVert_infty le 1 }$
hinge loss	$h(x) = sum_{i=1}^n max( 0, 1 - x_i )$

Code

We have put a Matlab/Octave implementation on github, under the BSD 3-clause license. If you are interested in contributing a version in python or R, we will be glad to assist.

Join the CompressiveSensing subreddit or the Google+ Community and post there !