Random projections are used as dimensional reduction techniques in many situations. They project a set of points in a high dimensional space to a lower dimensional one while approximately preserving all pairwise Euclidean distances. Usually, random projections are applied to numerical data. In this paper, however, we present a successful application of random projections to quadratic programming problems subject to polyhedral and a Euclidean ball constraint. We derive approximate feasibility and optimality results for the lower dimensional problem. We then show the practical usefulness of this idea on many random instances, as well as on two portfolio optimization instances with over 25M nonzeros in the (quadratic) risk term.

We discuss the application of Gaussian random projections to the fundamental
problem of deciding whether a given point in a Euclidean space belongs to a
given set. In particular, we consider the two cases, when the target set is either
at most countable or of low doubling dimension. We show that, under a number of different assumptions, the feasibility (or infeasibility) of this problem is
preserved almost surely when the problem data is projected to a lower dimensional space. We also consider the threshold version of this problem, in which
we require that the projected point and the projected set are separated by a
certain distance error. As a consequence of these results, we are able to improve
the bound of Indyk-Naor on the Nearest Neigbour preserving embeddings. Our
results are applicable to any algorithmic setting which needs to solve Euclidean
membership problems in a high-dimensional space.

A celebrated result of Johnson and Lindenstrauss asserts that, in high enough dimensional spaces, Euclidean distances defined by a finite set of points are approximately preserved when these points are projected to a certain lower dimensional
space. We show that the distance from a point to a convex set is another approximate invariant, and leverage this result to approximately solve linear programs with
a logarithmic number of rows.

Random projections are random linear maps, sampled from appropriate distributions, that approximately
preserve certain geometrical invariants so that the approximation improves as the dimension of the space
grows. The well-known Johnson-Lindenstrauss lemma states that there are random matrices with surprisingly
few rows that approximately preserve pairwise Euclidean distances among a set of points. This is commonly
used to speed up algorithms based on Euclidean distances. We prove that these matrices also preserve other
quantities, such as the distance to a cone. We exploit this result to devise a probabilistic algorithm to solve
linear programs approximately. We show that this algorithm can approximately solve very large randomly
generated LP instances. We also showcase its application to an error correction coding problem.

Abstract
In this thesis, we will use random projection to reduce either the number of variables or the
number of constraints (or both in some cases) in some well-known optimization problems. By
projecting data into lower dimensional spaces, we obtain new problems with similar structures,
but much easier to solve. Moreover, we try to establish conditions such that the two problems
(original and projected) are strongly related (in probability sense). If it is the case, then
by solving the projected problem, we can either find approximate solutions or approximate
objective value for the original one.
We will apply random projection to study a number of important optimization problems,
including linear and integer programming (Chapter 2), convex optimization with linear constraints (Chapter 3), membership and approximate nearest neighbor (Chapter 4) and trustregion subproblems (Chapter 5). All these results are taken from the papers that I am
co-authored with [26, 25, 24, 27].
This thesis will be constructed as follows. In the first chapter, we will present some basic
concepts and results in probability theory. Since this thesis extensively uses elementary
probability, this informal introduction will make it easier for readers with little background
on this field to follow our works.
In Chapter 2, we will briefly introduce to random projection and the Johnson-Lindenstrauss
lemma. We will present several constructions of random projectors and explain the reason why
they work. In particular, sub-gaussian random matrices will be treated in details, together
with some discussion on fast and sparse random projections.
In Chapter 3, we study optimization problems in their feasibility forms. In particular, we
study the so-called restricted linear membership problem, which asks for the feasibility of the
system {Ax = b, x ∈ C} where C is some set that restricts the choice of parameters x. This
class contains many important problems such as linear and integer feasibility. We propose to
apply a random projection T to the linear constraints and obtain the corresponding projected
problem: {T Ax = T b, x ∈ C}. We want to find conditions on T, so that the two feasibility
4
problems are equivalent with high probability. The answer is simple when C is finite and
bounded by a polynomial (in n). In that case, any random projection T with O(log n)
rows is sufficient. When C = R
n
+, we use the idea of separating hyperplane to separate b
from the cone {Ax | x ≥ 0} and show that T b is still separated from the projected cone
{T Ax | x ≥ 0} under certain conditions. If these conditions do not hold, for example when
the cone {Ax | x ≥ 0} is non-pointed, we employ the idea in the Johnson-Lindenstrauss
lemma to prove that, if b /∈ {Ax | x ≥ 0}, then the distance between b and that cone is
slightly distorted under T, thus still remains positive. However, the number of rows of T
depends on unknown parameters that are hard to estimate.
In Chapter 4, we continue to study the above problem in the case when C is a convex set.
Under that assumption, we can define a tangent cone K of C at x
∗ ∈ arg minx∈C kAx − bk.
We establish the relations between the original and projected problems based on the concept
of Gaussian width, which is popular in compressed sensing. In particular, we prove that the
two problems are equivalent with high probability as long as the random projection T is
sampled from sub-gaussian distributions and has at least O(W2
(AK)) rows, where W(AK) is
the Gaussian-width of AK. We also extend this result to the case when T is sampled from
randomized orthonormal systems in order to exploit its fast matrix-vector multiplication.
Our results are similar to those in [21], however they are more useful in privacy-preservation
applications when the access to the original data A, b is limited or unavailable.
In Chapter 5, we study the Euclidean membership problem: “Given a vector b and a
closed set X in R
n
, decide whether b ∈ X or not”. This is a generalization of the restricted
linear membership problem considered previously. We employ a Gaussian random projection
T to embed both b and X into a lower dimension space and study the corresponding projected version: “Decide whether T b ∈ T(X) or not”. When X is finite or countable, using
a straightforward argument, we prove that the two problems are equivalent almost surely
regardless the projected dimension. However, this result is only of theoretical interest, possibly due to round-off errors in floating point operations which make its practical application
difficult. We address this issue by introducing a threshold τ > 0 and study the corresponding
“thresholded” problem: “Decide whether dist (T b, T(X)) ≥ τ”. In the case when X may
be uncountable, we prove that the original and projected problems are also equivalent if the
projected dimension d is proportional to some intrinsic dimension of the set X. In particular,
we employ the definition of doubling dimension to prove that, if b /∈ X, then Sb /∈ S(X)
almost surely as long as d = Ω(ddim(X)). Here, ddim(X) is the doubling dimension of X,
which is defined as the smallest number such that each ball in X can be covered by at most
2
dd(X) balls of half the radius. We extend this result to the thresholded case, and obtain a
more useful bound for d. It turns out that, as a consequence of that result, we are able to
5
improve a bound of Indyk-Naor on the Nearest Neigbour Preserving embeddings by a factor
of log(1/δ)
ε
.
In Chapter 6, we propose to apply random projections for the trust-region subproblem,
which is stated as min{c
>x + x
>Qx | Ax ≤ b, kxk ≤ 1}. These problems arise in trust-region
methods for dealing with derivative-free optimization. Let P ∈ R
d×n be a random matrix
sampled from Gaussian distribution, we then consider the following “projected” problem:
min{c
>P
>P x + x
>P
>P QP >P x | AP >P x ≤ b, kP xk ≤ 1},
which can be reduced to min{(P c)
>u + u
>(P QP >)u | AP >u ≤ b, kuk ≤ 1} by setting
u := P x. The latter problem is of low dimension and can be solved much faster than the
original. However, we prove that, if u
∗
is its optimal solution, then with high probability,
x
∗
:= P
>u
∗
is a (1 + O(ε))-approximation for the original problem. This is done by using
recent results about the “concentration of eigenvalues” of Gaussian matrices.

Follow

@NuitBlog or join the

CompressiveSensing Reddit, the

Facebook page, the Compressive Sensing group on

LinkedIn or the Advanced Matrix Factorization group on

LinkedIn
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also

subscribe to Nuit Blanche by Email.

Other links:

*Paris Machine Learning*:

Meetup.com||

@Archives||

LinkedIn||

Facebook||

@ParisMLGroup< br/>

*About LightOn*:

Newsletter ||

@LightOnIO|| on

LinkedIn || on

CrunchBase || our

Blog
**About myself**:

LightOn ||

Google Scholar ||

LinkedIn ||

@IgorCarron ||

Homepage||

ArXiv