Nuit Blanche

Saturday, August 04, 2012

The MMDS 2012 Slides are out! Workshop on Algorithms for Modern Massive Data Sets

In case you have to take your mind off tomorrow's suspense-filled and technologically challenging landing of Curiosity on Mars (see 7 minutes of Terror, a blockbuster taking place on Mars this Summer ) Michael Mahoney, Alex Shkolnik, Gunnar Carlsson, Petros Drineas, the organizers of Workshop on Algorithms for Modern Massive Data Sets (MMDS 2012), just made available the slides of the meeting. Other relevant meeting slides include that of the Coding, Complexity, and Sparsity Workshop and that of the GraphLab workshop. Don't grind your teeth too much tomorrow and Go Curiosity !

MMDS 2012
Tuesday, July 10, 2012. Theme: Data Analysis and Statistical Data Analysis

10:00 - 11:00 Tutorial: Jiawei Han A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks
11:00 - 11:30 Alexander Gray Faster Learning for Massive Datasets
11:30 - 12:00 Christopher Re Hazy: Making Data-driven Statistical Applications Easier to Build and Maintain
2:00 - 3:00 Tutorial: Peter Bartlett Model Selection and Recent Results for Large Scale Problems
3:00 - 3:30 Noureddine El Karoui On Robust Regression Estimators in High-dimension
3:30 - 4:00 Jure Leskovec Affiliation Network Models for Densely Overlapping Communities in Networks
4:30 - 5:00 Haesun Park Nonnegative Matrix Factorizations for Clustering
5:00 - 5:30 Fan Chung Graham Vectorized Laplacians for Dealing with High-dimensional Data Sets
5:30 - 6:00 Joydeep Ghosh Actionable Mining of Large, Multi-relational Data using Localized Predictive Models

Wednesday, July 11, 2012. Theme: Industrial and Scientific Applications
TimeTalk

9:00 - 10:00 Tutorial: DJ Patil When Algorithms Go Wrong: How Product Design Can Save Algorithmic Limitations
10:00 - 10:30 Sean Fahey Big Data and Analytics for National Security
11:00 - 11:30 Petros Drineas Leverage Scores, the Column Subset Selection Problem, and Least-squares Problems
11:30 - 12:00 David Woodruff Low Rank Approximation and Regression in Input Sparsity Time
12:00 - 12:30 Michael W. Mahoney Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
2:30 - 3:30 Tutorial: Rick Stevens The Biological, Algorithmic and Computational Challenges of Systems Biology
3:30 - 4:00 Tiankai Tu Fault-Tolerant Parallel Analysis of Millisecond-Scale Molecular Dynamics Trajectories
4:30 - 5:00 Alexander Szalay Current Statistical Challenges in Large Astronomical Surveys
5:00 - 5:30 Joseph Richards Astronomical Time Series Analysis for the Synoptic Survey Era
5:30 - 6:00 Tony Cass Data Handling for LHC: Plans and Reality

Thursday, July 12, 2012. Theme: Novel Algorithmic Approaches
TimeTalk
9:00 - 10:00 Tutorial: Michael Mitzenmacher Peeling Arguments: Invertible Bloom Lookup Tables and Biff Codes

10:00 - 10:30 Frederic Chazal Detection and Approximation of Linear Structures in Metric Spaces
11:00 - 11:30 Ping Li Probabilistic Hashing for Efficient Search and Learning on Massive Data
11:30 - 12:00 Ashish Goel Real Time Social Search and Related Problems
12:00 - 12:30 Andrew Goldberg Hub Labels in Databases: Shortest Paths for the Masses
2:30 - 3:00 Theodore Johnson Data Stream Warehousing
3:00 - 3:30 Josh Wills Experimenting at Scale
3:30 - 4:00 Hang Li Large Scale Machine Learning for Query Document Matching in Web Search
4:30 - 4:50 Blair Sullivan Branching Out: Quantifying Tree-like Structure in Complex Networks
4:50 - 5:10 Mahdi Soltanolkotabi A Geometric Analysis of Subspace Clustering with Outliers
5:10 - 5:30 Bahman Bahmani Scalable K-Means++
5:30 - 6:00 Steve Bartel Analytics at Dropbox

Friday, July 13, 2012. Theme: Novel Matrix and Graph Methods
TimeTalk

9:00 - 10:00 Tutorial: Yi Ma The Pursuit of Low-dimensional Structures in High-dimensional Data
10:00 - 10:30 Edoardo Airoldi Graphlets Decomposition of a Weighted Network
11:00 - 11:30 Yiannis Koutis SDD Solvers: Bridging the Gap Between Theory and Practice
11:30 - 12:00 Art Owen Bootstrapping r-fold Tensor Data
12:00 - 12:30 Kamesh Madduri Algorithms and Tools for Scalable Graph Analytics
2:30 - 3:00 Shaowei Lin Studying Model Asymptotics with Singular Learning Theory
3:00 - 3:30 David Bindel Communities, Spectral Clustering, and Random Walks
3:30 - 4:00 Ali Pinar The Block Two-Level Erdos-Renyi (BTER) Graph Model
4:30 - 5:00 Xiao-Li Meng Preprocessing, Multiphase Inference, and Massive Data in Theory and Practice
5:00 - 5:30 Alfred Hero Hub Discovery in Large Correlation Networks
5:30 - 6:00 Dan Feldman Google Your Life: Learning Sensors Data

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Friday, August 03, 2012

Imagine: A Faster Nanopore DNA Sequencing Technique

Via this paper, I was reminded of a now famous quote by David Brady widely seen as a reasoning for compressive sensing “if it is possible to compress measured data, one might argue that too many measurements were taken”.[1].

In DNA sequencing, some folks are considering compressive BLAST but this is really an after the fact technique, What about compressing the data during acquisition ? Let's say that since a lot many parts of DNA is non coding, or that DNA has some prior known sequence, can we argue that current DNA sequencing hardware and techniques are taking too many measurements ? In that case, how can compressive sensing help in that regard ? Can it improve the speed for reading DNA or can it improve the accuracy at which sequencing takes place.

Take for instance the latest and most promising technique, the Nanopore DNA sequencing (figure from Oxford Nanopores Technologies) technology which basically acts as a mesh screen and recognize a specific nucleotide (G, A, T,C) by how different the voltage across the pore varies. .

If the pore were to be bigger, the voltage measurement would be a combination of several nucleotides which could easily be put as a compressive sensing problematic.

[1] David Brady, Optical Imaging and Spectroscopy

Coding, Complexity, and Sparsity Workshop Slides

The Coding, Complexity, and Sparsity Workshop st UMich ended yesterday. Thanks to Anna Gilbert and the other organizers (S. Muthukrishnan, Hung Ngo, Ely Porat, Atri Rudra, Martin Strauss) for putting up some of the slides of the meetingm here us the abstract and the slides (when they exist)

Eric Allender, The strange link between incompressibility and computational complexity

This talk centers around some audacious conjectures that attempt to forge firm links between computational complexity classes and the study of Kolmogorov complexity.

More specifically, let R denote the set of Kolmogorov-random strings. Two familiar complexity classes are:

* BPP: the class of problems that can be solved with negligible error in polynomial-time, and

* NEXP: the class of problems solvable in nondeterministic exponential time.

Conjecture 1: NEXP = NP^R.

Conjecture 2: BPP is the class of problems non-adaptively polynomial-time reducible to R.

The first thing to note is that these conjectures are not only audacious; they are obviously false! R is not a decidable set, and thus it is absurd to suggest that the class of problems reducible to it constitutes a complexity class.

The absurdity fades if, for example, we interpret ``NP^R'' to be ``the class of problems that are NP-Turing reducible to R, no matter which universal machine we use in defining Kolmogorov complexity''. The lecture will survey the body of work (some of it quite recent) that suggests that, when interpreted properly, the conjectures may actually be true.

Eric Allender received his Ph.D. from the Georgia Institute of Technology in

1985. He has been at Rutgers University since then, serving as department

chair from 2006 to 2009. He is a Fellow of the ACM, is Editor-in-Chief of

ACM Transactions on Computation Theory, and serves on the editorial

boards of Computational Complexity, Computability, and The Chicago Journal of

Theoretical Computer Science.

Mark Braverman, An introduction to information complexity

The (two party) information complexity of a function F(X,Y) measures the least amount of

information that Alice who holds X and Bob who holds Y need to reveal to each other

about their inputs in order to compute F. While this complexity measure is interesting

in its own right -- for example in the context of information theoretic security, it is also

very useful in studying the communication complexity of problems. We will introduce

information complexity, and discuss some of its properties and connections to communication

complexity. No prior background will be assumed.

Mark Braverman is an assistant professor of computer science at Princeton University.

His interests lie in complexity theory, the theory of real computation, machine learning, algorithms, game theory, and

applications of computer science in healthcare and medicine.

For more material upon which Mark's tutorial is based, please see these references

1) http://www.cs.princeton.edu/~mbraverm/pmwiki/uploads/directsum.pdf

2) http://www.eccc.uni-trier.de/report/2010/083/

3) http://eccc.hpi-web.de/report/2011/123/

Amit Chakrabarti, The Gap-Hamming-Distance Story
Over the last ten years, plenty of ink has been spilled on a single communication problem called Gap-Hamming-Distance (GHD).  In this problem, Alice and Bob receive n-bit strings and must decide whether they are "close" (Hamming distance <= n/2 - sqrt(n)) or "far" (distance \le n/2 + sqrt(n)). How many bits must they communicate to solve this problem, with at most a 1% error?

In this talk, we will see how this problem arose and gained importance for its applications to space lower bounds for data stream algorithms. We shall then follow the journey from the early results on GHD, to stronger intermediate results obtained through a deeper understanding of the problem, and finally to three recent proofs of an optimal Omega(n) lower bound. As we proceed along this journey, we shall encounter successively deeper geometric ideas, many being used for the first time
in the context of communication complexity.

We will end with an intellectual puzzle on GHD that remains unsolved.

Amit Chakrabarti is an Associate Professor of Computer Science at Dartmouth. He received a Ph.D. in Computer Science from Princeton University in 2002, and a B.Tech. in Computer Science, along with the President of India Gold Medal, from the Indian Institute of Technology, Bombay, in 1997.

Prof. Chakrabarti has broad interests in theoretical computer science, and specializes in communication complexity and its applications, the theory of data stream algorithms, and approximation algorithms for combinatorial optimization. He is the recipient of an NSF CAREER Award and a Karen Wetterhahn Fellowship. For more information, please visit http://www.cs.dartmouth.edu/~ac
Madhi Cheraghchi, Restricted Isometry of Fourier Matrices and List Decodability of Random Linear Codes

We prove that a random linear code over $\mathbb{F}_q$, with probability arbitrarily close to $1$, is list decodable at radius $1-1/q-\varepsilon$ with list size $L=O(1/\varepsilon^2)$ and rate R=\Omega_q(\varepsilon^2/(\log^3(1/\varepsilon)))$. Up to the polylogarithmic factor in $1/\varepsilon$ and constant factors depending on $q$, this matches the lower bound $L=\Omega_q(1/\varepsilon^2)$ for the list size and upper bound $R=O_q(\varepsilon^2)$ for the rate. Previously only existence (and not abundance) of such codes was known for the special case $q=2$ (Guruswami, H{\aa}stad, Sudan and Zuckerman, 2002).

In order to obtain our result, we employ a relaxed version of the well known Johnson bound on list decoding that translates the \emph{average} Hamming distance between codewords to list decoding guarantees. We furthermore prove that the desired average-distance guarantees hold for a code provided that a natural complex matrix encoding the codewords satisfies the Restricted Isometry Property with respect to the Euclidean norm (RIP-2). For the case of random binary linear codes, this matrix coincides with a random submatrix of the Hadamard-Walsh transform matrix that is well studied in the compressed sensing literature.

Finally, we improve the analysis of Rudelson and Vershynin (2008) on the number of random frequency samples required for exact reconstruction of $k$-sparse signals of length $N$. Specifically, we improve the number of samples from $O(k \log(N) \log^2(k) (\log k + \log\log N))$ to $O(k \log(N) \log^3(k))$. The proof involves bounding the expected supremum of a related Gaussian process by using an improved analysis of the metric defined by the process. This improvement is crucial for our application in list decoding.

Joint work with Venkatesan Guruswami and Ameya Velingker.

Mahdi Cheraghchi is a post-doctoral researcher at the Computer Science Department of Carnegie Mellon University, Pittsburgh, PA, since 2011. Prior to that, he spent a year at the Department of Computer Science, University of Texas at Austin, TX, working with Prof. David Zuckerman. He received M.Sc. and Ph.D. degrees in computer science from École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, in 2005 and 2010, respectively, working under supervision of Prof. Amin Shokrollahi. In 2004, he received his B.Sc. degree in computer engineering from Sharif University of Technology, Tehran, Iran. He has broad interests in theoretical computer science, which in particular include the interconnections between coding theory and theoretical computer science, derandomization theory and explicit constructions, probabilistically checkable proofs and inapproximability.

Anna Gal, Error-correcting codes vs. superconcentrator graphs
We prove tight bounds on the number of wires in constant depth
(unbounded fan-in) circuits computing asymptotically good 
error-correcting codes.
We show that quasilinear number of wires is sufficient for
computing good codes already in depth 2 circuits with parity gates.
This implies that a (necessarily dense) generator matrix for the code
can be written as the product of two sparse matrices.
For depth 2, our Omega(n (log n/log log n)2) lower bound gives the largest 
known lower bound for computing any linear map.

Furthermore, we identify a new class of superconcentrator-like graphs with 
connectivity properties distinct from previously studied ones.
We show that any circuit computing good codes must satisfy such 
superconcentrator-like properties.

Joint work with Kristoffer Hansen, Michal Koucky, Pavel Pudlak and Emanuele Viola.

Anna Gal received her Ph.D. in Computer Science at the University of Chicago
in 1995. She was a postdoc at the Institute for Advanced Study in Princeton,
and at Princeton University. She is a faculty member of the 
Computer Science Department at the University of Texas at Austin 
since 1997. She is a recipient of an NSF CAREER Award and an Alfred P. Sloan 
Research Fellowship. She received the Machtey Award for Best Student Paper
at FOCS 1991, and the EATCS Best Paper Award at Track A of ICALP 2003.
She is an Associate Editor of the  ACM Transactions on Computation Theory.

Piotr Indyk, Tutorial on compressive sensing/streaming algorithms

See above.

Piotr Indyk is a Professor of Electrical Engineering and Computer Science at MIT. He joined MIT in 2000, after earning PhD from Stanford University. Earlier, he received Magister degree from Uniwersytet Warszawski in 1995.

His research interests include high-dimensional computational geometry,
sketching and streaming algorithms, sparse recovery and compressive sensing. He is a recipient of the NSF CAREER Award, Sloan Fellowship, Packard Fellowship and Technology Review TR10. He is an Associate Editor for the IEEE Transactions on Signal Processing and SIAM Journal on Computing.
Swastik Kopparty, Multiplicity Codes

Multiplicity Codes are certain natural error-correcting codes based on evaluations of polynomials and their derivatives. These codes have rate approaching 1, yet they allow for sublinear time decoding from errors. Furthermore, they admit some powerful list-decoding algorithms, achieving the so-called list-decoding capacity (mimicking the behavior of the Folded Reed-Solomon Codes of Guruswami and Rudra). I will talk about these codes and their decoding algorithms.

Part of this talk is based on joint work with Shubhangi Saraf and Sergey Yekhanin.

Richard Lipton, The Hartmanis Stearns Conjecture: Complexity of numbers---two views

Raghu Meka, Better Pseudorandom Generators from Milder Pseudorandom Restrictions

We present an iterative approach to constructing pseudorandom generators, based on the repeated application of mild pseudorandom restrictions. We use this template to construct pseudorandom generators for combinatorial rectangles and read-once CNFs and a hitting set generator for width-3 branching programs that achieve near-optimal seed-length even in the low error regime. The (pseudo)random restrictions we use are milder than those typically used for proving circuit lower bounds, in that we only set a constant fraction of the bits at a time. While such restrictions do not simplify the functions drastically, we show that they can be derandomized using small-bias spaces.

Based on joint work with Parikshit Gopalan, Omer Reingold, Luca Trevisan and Salil Vadhan.

Raghu Meka is currently a postdoctoral member at the Institute for Advanced Study at Princeton. He received his PhD from the department of computer science at the University of Texas at Austin in 2011. He is a recipient of the Bert Kay best dissertation award at UT Austin, the Dean's Excellence award and an MCD fellowship at UT Austin. Prior to joining UT Austin, he completed his B.Tech in computer science from Indian Institute of Technology, Chennai, India. His main interests are in complexity theory, pseudorandomness, learning theory and data mining.

Andrew McGregor, Graph Sketching

We present a sequence of algorithmic results for analyzing graphs via random linear projections or ``sketches". We start with results for evaluating basic connectivity and k-connectivity and then use these primitives to construct combinatorial sparsifiers that allow every cut to be approximated up to a factor(1+\epsilon). Our results have numerous applications including single-pass, semi-streaming algorithms for constructing sparsifiers in fully-dynamic graph streams where edges can be added and deleted in the underlying graph.

This is joint work work with Kook Jin Ahn and Sudipto Guha.

Andrew McGregor is an Assistant Professor at the University of Massachusetts, Amherst. He received a B.A. degree and the Certificate of Advance Study in Mathematics from the University of Cambridge and a Ph.D. from the University of Pennsylvania. He did post-doctorial research at UC San Diego and Microsoft Research SVC. He is interested in many areas of theoretical computer science and specializes in data stream algorithms, linear sketching, and communication complexity. He received the NSF Career Award in 2010 and Lilly Teaching Fellowship in 2012.

Eric Price, Improved Concentration Bounds for Count-Sketch

We present a refined analysis of the classic Count-Sketch streaming heavy hitters algorithm [CCF02]. Count-Sketch uses O(k log n) linear measurements of a vector x in R^n to give an estimate x' of x. The standard analysis shows that this estimate x' satisfies ||x'-x||_infty^2 < ||x_tail||_2^2 / k, where x_tail is the vector containing all but the largest k coordinates of x. Our main result is that most of the coordinates of x' have substantially less error than this upper bound; namely, for any c < O(log n), we show that each coordinate i satisfies

(x'_i - x_i)^2 < (c/log n) ||x_tail||_2^2/k

with probability 1-2^{-c}, as long as the hash functions are fully independent. This subsumes the previous bound.

Our proof shows that any random variable with positive real Fourier transform and finite variance concentrates around 0 at least as well as a Gaussian. This result, which may be of independent interest, gives good concentration even when the noise does not converge to a Gaussian.

Eric Price just finished his third year as a Ph.D. student with Piotr Indyk at MIT, working on sparse recovery problems.

Ronitt Rubinfeld,  Local Computation Algorithms

Consider a setting in which inputs to and outputs from a computational

problem are so large, that there is not time to read them in their

entirety.   However, if one is only interested in small parts of the

output at any given time, is it really necessary to solve the entire

computational problem? Is it even necessary to view the whole input?

We propose a model of {\em local computation algorithms} which for a

given input, supports queries by a user to values of specified bits

of a legal output. The goal is to design local computation algorithms

in such a way that very little of the input needs to be seen in order

to determine the value of any single bit of the output. In particular,

the runtime and space requirements of the local computation algorithms

should be sub-linear in the size of the input. Surprisingly, there are

nontrivial computational problems where such efficiency is achievable.

Local computation algorithms are intended to distill the common features

of several concepts that have previously appeared in various algorithmic

subfields, including local distributed computation, local algorithms,

locally decodable codes, and local reconstruction. In this talk, we

survey some of the recent problems, including maximal independent set

and hypergraph coloring, for which polylogarthmic time and space local

computation algorithms have been given.

Joint work with Noga Alon, Gil Tamir, Shai Vardi and Ning Xie.

Ronitt Rubinfeld received her undergraduate degree at the University of

Michigan and PhD at the University of California, Berkeley.    She held postdoctoral positions

at DIMACS and the Hebrew University at Jerusalem.   After several years

as a faculty member at Cornell University and a senior research scientist at NEC Research Institute, she is currently on the faculties

of MIT and Tel Aviv University.   She has been a recipient of the ONR Young Investigator award, NSF Career

Award, Sloan Fellowship, Cornell Association for Computer Science Undergraduates Faculty of the Year

award and a Cornell College of Engineering Teaching award.   She was an invited speaker at the International Congress of Mathematicians in 2006.

Her research focuses on sub-linear time algorithms for big datasets.

Shubhangi Saraf, Error correcting codes and applications to theoretical computer science

In today's world there are huge amounts of data that need to get reliably stored or transmitted. However as hard as we try, some amount of noise or corruption is inevitable. Bits do get flipped and the data gets corrupted. However, we would still like to recover the original uncorrupted data. Is this even possible? This is precisely the task that error correcting codes accomplish. Error correcting codes have also found a host of striking applications in complexity theory and cryptography. In this talk I will discuss several examples of error correcting codes and we will see some of the applications to theoretical computer science.

Shubhangi is a faculty member in the Mathematics and Computer Science departments at Rutgers University. She is broadly interested in all areas of theoretical computer science and discrete mathematics. Much of her research has focused on complexity theory and property testing. More specifically, she is interested in the complexity of arithmetic computation, the role of randomness in computing, and in sublinear-time algorithms for codes and other combinatorial structures. Prior to joining Rutgers, she completed her PhD in computer science at MIT in 2011 and a postdoc at the Institute for Advanced Study in 2012.

Chris Umans, Tutorial on pseudorandomness

In a variety of settings, it is useful to view randomness as a resource to be conserved, and this perspective leads to the study of pseudorandomness, in which one tries to efficiently construct objects that "appear" random in specific ways, using little or no randomness.

In this talk, I'll cover a variety of topics in this area, such as epsilon-bias spaces, k-wise independence, expander graphs, randomness extractors and condensers, and pseudorandom generators. I'll describe how these objects are constructed and give examples of how they might be used.

No specialized background will be required.

Chris Umans received his Ph.D. in Computer Science from Berkeley in 2000. After spending two years as a postdoc in the Theory Group at Microsoft Research, he joined the Computer Science faculty at Caltech in 2002. His research interests encompass computational complexity, randomness in computation, and algebraic complexity and algorithms. He is the recipient of an NSF CAREER award, an Alfred P. Sloan Research Fellowship, and several best-paper and teaching awards. He is a managing editor of Theory of Computing.

David Woodruff, Low Rank Approximation and Regression in Input Sparsity Time

We improve the running times of algorithms for least squares regression and low-rank approximation to account for the sparsity of the input matrix. Namely, if nnz(A) denotes the number of non-zero entries of an input matrix A:

- we show how to solve approximate least squares regression given an n x d matrix A in nnz(A) + poly(d log n) time

- we show how to find an approximate best rank-k approximation of an n x n matrix in nnz(A) + n*poly(k log n) time

All approximations are relative error. Previous algorithms based on fast Johnson-Lindenstrauss transforms took at least ndlog d or nnz(A)*k time. We have implemented our algorithms, and preliminary results suggest the algorithms are competitive in practice.

Joint work with Ken Clarkson.

David Woodruff joined the theory group at IBM Almaden as a research scientist after completing his Ph.D. in 2007 at MIT in theoretical computer science.

Thursday, August 02, 2012

Post doc position in Compressive sensing and medical ultrasound imaging (France)

Post doc position in Compressive sensing and medical ultrasound imaging

IMT (Institute of mathematics of Toulouse) and IRIT (Institute of Computer Science Research of Toulouse).

Catégorie : Post-doctorant

This topic is at the interface of Mathematics, and health.

Among the existing medical imaging modalities, ultrasound imaging is one of the most widely used in various applications, such as pregnancy monitoring, cardiac imaging or blood flow estimation. The main advantages of ultrasound imaging are, in addition to its safety, its low cost and real-time nature which make it a standard modality in medical imaging. Currently, the real-time behaviour is not fully guaranteed in many applications such as 3-D ultrasound. Also the amount of ultrasound data acquired (1-D, 2-D or 3-D) becomes more and more important and makes extraction of relevant parameters difficult and costly (via signal and image processing techniques).

The primary objective of this work is to propose techniques to perform high frame imaging by reducing the amount of ultrasonic data during signal acquisitions, while maintaining the image reconstruction quality in terms of quantitative but also diagnostic power terms. The second objective of the work is to accelerate the real-time nature that characterizes this imaging modality. To do so, the investigations will focus both on the inverse problem including compressed sampling, and specific beamforming.

In practice, we will deal with the high resolution reconstruction of the projected image in a space of sparse representation, from a reduced number of samples

Applicants are expected to have at least one of the following:

Knowledge in numerical optimization
Inverse problem
Sparse representation
Compressive sensing

Knowledge in medical ultrasound imaging is very appreciated.

This post-doc will be performed at IMT (Institute of mathematics of Toulouse) and IRIT (Institute of Computer Science Research of Toulouse) and involve different groups of IRIT. It will yield collaboration with manufacturers of imaging systems and medical teams.

Job Location

IRIT, University of Toulouse, Toulouse France

Contact

Denis Kouamé, IRIT, University of Toulouse III, kouame@irit.fr

Adrian Basarab, basarab@irit.fr

Re-Weighted l_1 Dynamic Filtering for Time-Varying Sparse Signal Estimation

Following up on time varying sparse signals, Adam Charles just sent me the following:

Hello Igor,

I've been a long time visitor to Nuit Blanche, and have enjoyed your updates on the advances in the compressive sensing community. Given your discussion the last few days, we have recently submitted a paper which we believe may be of interest to yourself and the CS community: "Re-weighted l1 dynamic filtering for time-varying sparse signal estimation" (the pre-print is available here: http://arxiv.org/pdf/1208.0325v1.pdf). In this work we approach the problem of causally estimating dynamically changing sparse vectors by taking inspiration from both Kalman filtering and hierarchical sparsity models (used to induce correlations between coefficients). Our approach is meant to be robust to non-Gaussian statistics even in the model errors (e.g., shot noise), and is based on re-weighted L1 optimization (which can leverage efficient L1 solvers).
Regards,

-Adam Charles

Thanks Adam !

Adam tells me that they should release an implementation of this algorithm soon. stay tuned. If memory serves (my memory has been known to fail!) this is the first I see a mention of UKF / Particle Filter in a Kalman-CS paper. In the meantime, here is the paper:

Re-Weighted l_1 Dynamic Filtering for Time-Varying Sparse Signal Estimation by Adam Charles , and Christopher J. Rozell. The abstract reads:

Signal estimation from incomplete observations improves as more signal structure can be exploited in the inference process. Classic algorithms (e.g., Kalman ﬁltering) have exploited strong dynamic structure for time-varying signals while modern work has often focused on exploiting low-dimensional signal structure (e.g., sparsity in a basis) for static signals. Few algorithms attempt to merge both static and dynamic structure to improve estimation for time-varying sparse signals (e.g., video). In this work we present a re-weighted `1 dynamic ﬁltering scheme for causal signal estimation that utilizes both sparsity assumptions and dynamic structure. Our algorithm leverages work on hierarchical Laplacian scale mixture models to create a dynamic probabilistic model. The resulting algorithm incorporates both dynamic and sparsity priors in the estimation procedure in a robust and efﬁcient algorithm. We demonstrate the results in simulation using both synthetic and natural data.

- Update: The attendant code is located here -

More intra-block correlation and sparse sensing matrices

Phil Schniter sent me the following as a follow-up to yesterday's entry on intra-block correlation and sparse sensing matrices:

Hi Igor,

I wanted to let you know that we too have several papers that cover the topic of compressive sensing under "intra-block correlation and sparse sensing matrices". Although we pose our work in the context of "multiple measurement vector" compressive sensing, where the goal is to recover the signal matrix X from the noisy linear observations Y=A*X+N with known A matrix, the equivalent single-measurement-vector problem is y=blkdiag(A,...,A)*x+n with vectors y=vec(Y), x=vec(X), n=vec(N), where x is block-sparse when the columns of X share the same support, and where x exhibits intra-block-correlation when X exhibits correlation across rows. Moreover, the equivalent sensing matrix blkdiag(A,...,A) is "sparse" even when A is dense due to its block-diagonal nature.

One of our papers http://arxiv.org/pdf/1111.5272v3.pdf treats the above "MMV" problem where the columns of X share the same support (i.e., x is block-sparse), while another http://arxiv.org/pdf/1205.4080v1.pdf treats the more difficult problem where the columns of X exhibit a slowly changing support, sometimes referred to as the "dynamic CS" problem. Next week, at the Statistical Signal Processing Workshop, we will present a general framework to learn and exploit many forms of structure encountered in compressive sensing that has been implemented as an extension to the sourceforge GAMPmatlab package :http://www2.ece.ohio-state.edu/~schniter/pdf/ssp12_mmv.pdf. The general theme of our work is that, using approximate message passing techniques, we get performance that meets or exceeds that of Bayesian or variational techniques, but with orders-of-magnitude reduction in complexity.

Cheers,

Phil

Thanks Phil !. The paper Phil mentions are:

Efﬁcient High-Dimensional Inference in the Multiple Measurement Vector Problem by Justin Ziniel and Philip Schniter. The abstract reads:

In this work, a Bayesian approximate message passing algorithm is proposed for solving the multiple measurement vector (MMV) problem in compressive sensing, in which a collection of sparse signal vectors that share a common support are recovered from undersampled noisy measurements. The algorithm, AMP-MMV, is capable of exploiting temporal correlations in the amplitudes of non-zero coefﬁcients, and provides soft estimates of the signal vectors as well as the underlying support. Central to the proposed approach is an extension of recently developed approximate message passing techniques to the amplitudecorrelated MMV setting. Aided by these techniques, AMP-MMV offers a computational complexity that is linear in all problem dimensions. In order to allow for automatic parameter tuning, an expectation maximization algorithm that complements AMP-MMV is described. Finally, a detailed numerical study demonstrates the power of the proposed approach and its particular suitability for application to highdimensional problems.

Dynamic Compressive Sensing of Time-Varying Signals via Approximate Message Passing by Justin Ziniel and Philip Schniter. The abstract reads:

In this work the dynamic compressive sensing (CS) problem of recovering sparse, correlated, time varying signals from sub-Nyquist, non-adaptive, linear measurements is explored from a Bayesian perspective. While there has been a handful of previously proposed Bayesian dynamic CS algorithms in the literature, the ability to perform inference on high-dimensional problems in a computationally efﬁcient manner remains elusive. In response, we propose a probabilistic dynamic CS signal model that captures both amplitude and support correlation structure, and describe an approximate message passing algorithm that performs soft signal estimation and support detection with a computational complexity that is linear in all problem dimensions. The algorithm, DCS-AMP, can perform either causal ﬁltering or non-causal smoothing, and is capable of learning model parameters adaptively from the data through an expectation maximization learning procedure. We provide numerical evidence that DCS-AMP performs within 3 dB of oracle bounds on synthetic data under a variety of operating conditions. We further describe the result of applying DCS-AMP to two real dynamic CS datasets, as well as a frequency estimation task, to bolster our claim that DCS-AMP is capable of offering state-of-the-art performance and speed on real-world high-dimensional problems.

A GENERALIZED FRAMEWORK FOR LEARNING AND RECOVERY OF STRUCTURED SPARSE SIGNALS by Justin Ziniel, Sundeep Rangan, and Philip Schniter. The abstract reads:

We report on a framework for recovering single- or multi-timestep sparse signals that can learn and exploit a variety of probabilistic forms of structure. Message passing-based inference and empirical Bayesian parameter learning form the backbone of the recovery procedure. We further describe an object-oriented software paradigm for implementing our framework, which consists of assembling modular software components that collectively deﬁne a desired statistical signal model. Lastly, numerical results for synthetic and real-world structured sparse signal recovery are provided.

Wednesday, August 01, 2012

Intra-Block Correlation and Sparse Sensing Matrices

In The Cost of Not Knowing, I stated at the very end that

When solving for x in dimension N, you have to sample x a few times if you have some knowledge of the deeper inter-dependencies between the elements of x. It's been shown to work recently.

With this knowledge, your cost for discovering x can be pretty low.

How low ? we're not sure yet

Zhilin sent me an email this morning alluding to this issue of "we're not sure yet". His empirical answer for x with a certain sparsity and some knowledge of the intra-correlation between the non zero elements seems to be close to that of last year's result (more recently here) from Florent Krzakala, Marc Mézard, François Sausset,Yifan Sun, Lenka Zdeborová.

This is fascinating but in return it begs other questions that seem to be furthered by the second half of his e-mail. I shall shall return to those issues later. In the meantime, here is Zhilin's e-mail

Hi.... Igor,

....

We recently updated our two papers, and some new features/results were added, which you may be interested in.

One is the paper on block sparse Bayesian learning: "Extension of SBL Algorithms for the Recovery of Block Sparse Signals with Intra-Block Correlation". The arXiv link is here: http://arxiv.org/pdf/1201.0862v2.pdf. You may be interested in Fig.1 and Fig.2 in the revised version, which show the phase transition curves of BSBL algorithms and other block sparse algorithms. You can see that the BSBL algorithms' phase transition curves show some interesting phenomenon: in many situations BSBL algorithms can recover a block sparse signals with K non-zero elements from only K measurements.

The second paper is, "Low Energy Wireless Body-Area Networks for Fetal ECG Telemonitoring via the Framework of Block Sparse Bayesian Learning". The arXiv link is here: http://arxiv.org/pdf/1205.1287v3.

In the revised version, we used another dataset (the OSET dataset), which contains much strong noise (e.g. baseline wanders), and the fetal hearbeat signals are completely buried in the noise (in our previous version the baseline wanders were very small). This dataset is a better representative to practical scenarios, since strong baseline wanders are often seen in raw fetal ECG recordings (and other physiological signal recordings), especially in wireless telemonitoring scenarios. Another interesting result is Fig.15, which shows the used BSBL algorithm's performance is not affected by the number of non-zero elements in each column of the sensing matrix. In other words, when each column of the sensing matrix contains only two non-zero elements, the BSBL algorithm still has the same performance.

This is probably different from some existing results using other algorithms, whose performance is sensitive to the number of non-zero elements in each column of the sensing matrix. And this result indicates that the BSBL algorithm can further reduce energy consumption in the signal compression stage.

If you have any comments, please feel free to let me know.

Best regards,

Zhilin

Thanks Zhilin !

Le Minotaure (Tu dois finir ta thèse), par Simon Berjeaut

When I first put this video on the blog (thanks to Laurent), it was with the belief that it would be viral in the french speaking PhD student population. The video has reached about 23,000 views so far and the singer, Simon Berjeaut, kindly contacted me to ask if I could direct the stream of viewers to his Youtube channel and his recording of the song. I am very happy to do that. Here it is:

I am sure the song would be a big hit if it were to be translated in English. The play on words would make that difficult for a word to word translation but the theme is pretty universal.