Nuit Blanche: CS: Foundations of CS, Data Stream Algorithms, Isometry-enforcing Data Transformations,The Best Bits, Homomorphic Encryption

Thursday, June 18, 2009

CS: Foundations of CS, Data Stream Algorithms, Isometry-enforcing Data Transformations,The Best Bits, Homomorphic Encryption

Albert Cohen just released Ron Devore's lecture notes he is presenting in Paris entitled: Foundations of Compressed Sensing.

The lecture notes on Data Stream Algorithms by Muthu Muthukrishnan in Barabados are here. Here is an excerpt:

Motivation
The algorithms we are going to describe act on massive data that arrive rapidly and cannot be stored. These algorithms work in few passes over the data and use limited space (less than linear in the input size). We start with three real life scenarios motivating the use of such algorithms.

Example 1: Telephone call. Every time a cell-phone makes a call to another phone, several calls between switches are being made until the connection can be established. Every switch writes a record for the call over approx. 1000 Bytes. Since a switch can receive up to 500 million calls a day, this adds up to something like 1 Terabyte per month information. This is a massive amount of information but has to be analyzed for different purposes. An example is searching for drop calls trying to find out under what circumstances such drop calls happen. It is clear that for dealing with this problem we do not want to work with all the data, but just want to filter with a few passes the useful information.

Example 2: The Internet. The Internet is made of a network of routers connected to each other, receiving and sending IP packets. Each IP packet contains a packet log including its source and destination addresses as well as other information that is used by the router to decide which link to take for sending it. The packet headers have to be processed at the rate at which they flow through the router. Each package takes about 8 nanoseconds to go through a router and modern routers can handle a few million packets per second. Keeping the whole information would need more than
one Terabyte information per day and router. Statistical analysis of the traffic through the router can be done, but this has to be performed on line at nearly real time.

Example 3: Web Search. Consider a company for placing publicity in the Web. Such a
company has to analyze different possibilities trying to maximize for example the number of clicks they would get by placing an add for a certain price. For this they would have to analyze large amounts of data including information on web pages, numbers of page visitors, add prices and so on. Even if the company keeps a copy of the whole net, the analysis has to be done very rapidly since this information is continuously changing.

Also found on the interwebs: Isometry-enforcing Data Transformations for Improving Sparse Model Learning by Avishy Carmi, Irina Rish, Guillermo Cecchi, Dimitri Kanevsky, Bhuvana Ramabhadran. The abstract reads:

Imposing sparsity constraints (such as l1-regularization) on the model parameters is a practical and efficient way of handling very high-dimensional data, which also yields interpretable models due to embedded feature-selection. Compressed sensing (CS) theory provides guarantees on the quality of sparse signal (in our case, model) reconstruction that relies on the so-called restricted isometry property (RIP) of the sensing (design) matrices. This, however, cannot be guaranteed as these matrices form a subset of the underlying data set. Nevertheless, as we show, one can find a distance-preserving linear transformation of the data such that any transformed subspace of the data satisfied the RIP at some level. We demonstrate the effects of such RIP-enforcing data transformation on sparse learning methods such as sparse and compressed Random Fields, as well as sparse regression (LASSO), in the context of classifying mental states based on fMRI data.

On Thursday 25 June 2009 at 15:15, there will be a presentation entitled Sparse recovery using sparse matrices by Prof. Piotr Indyk at EPFL/Summer Research Institute in room BC 01. The abstract reads:

Over the recent years, a new *linear* method for compressing high-dimensional data has been discovered. For any high-dimensional vector x, its *sketch* is equal to Ax, where A is a carefully designed m x n matrix (possibly chosen at random). Although typically the sketch length m is much smaller than the number of dimensions n, the sketch often contains enough information to recover a *sparse approximation* to x. At the same time, the linearity of the sketching method is very convenient for many applications, such as data stream computing and compressed sensing. The major sketching approaches can be classified as either combinatorial (using sparse sketching matrices) or geometric (using dense sketching matrices). Traditionally they have achieved different trade-offs, notably between the compression rate and the running time. In this talk we show that, in a sense, the combinatorial and geometric approaches are based on different manifestations of the same phenomenon. This connection will enable us to obtain several novel algorithms and constructions, including the first known algorithm that provably achieves the optimal measurement bound and near-linear recovery time.

There is also an article in the American Scientist on Compressive sensing entitled: The Best Bits by Brian Hayes.

I don't read Polish but Radek Adamczak’s weblog points to a lecture series by Alice Guionnet entitled Large Random Matrices: Lectures on Macroscopic Asymptotics.

Finally, I was reading Michael Nielsen blog and noticed a framework that looks similar to the one being used in manifold signal processing. Michael mentions the following:

Fully homomorphic encryption
Very interesting paper about a cryptographic scheme that allows computation directly on the encoded data, without requiring decryption.

The paper is : Fully homomorphic encryption using ideal lattices by Graig Gentry. The abstract reads:

We propose a fully homomorphic encryption scheme -- i.e., a scheme that allows one to evaluate circuits over encrypted data without being able to decrypt. Our solution comes in three steps. First, we provide a general result -- that, to construct an encryption scheme that permits evaluation of arbitrary circuits, it suffices to construct an encryption scheme that can evaluate (slightly augmented versions of) its own decryption circuit; we call a scheme that can evaluate its (augmented) decryption circuit bootstrappable.

Next, we describe a public key encryption scheme using ideal lattices that is almost bootstrappable.

Lattice-based cryptosystems typically have decryption algorithms with low circuit complexity, often dominated by an inner product computation that is in NC1. Also, ideal lattices provide both additive and multiplicative homomorphisms (modulo a public-key ideal in a polynomial ring that is represented as a lattice), as needed to evaluate general circuits.

Unfortunately, our initial scheme is not quite bootstrappable -- i.e., the depth that the scheme can correctly evaluate can be logarithmic in the lattice dimension, just like the depth of the decryption circuit, but the latter is greater than the former. In the final step, we show how to modify the scheme to reduce the depth of the decryption circuit, and thereby obtain a bootstrappable encryption scheme, without reducing the depth that the scheme can evaluate. Abstractly, we accomplish this by enabling the encrypter to start the decryption process, leaving less work for the decrypter, much like the server leaves less work for the decrypter in a server-aided cryptosystem.

Graig Gentry made a presentation on this at the intractability Center at Princeton. The video is here. Other presentations made at the
Complexity and Cryptography: Status of Impagliazzo’s Worlds workshop on June 3-5 are here.

Credit: NASA, The LCROSS mission will launch today and will crash on the moon in 4 months.