Tuesday, July 17, 2012

And so it begins ... Compressive Genomics

You probably recall last month's "What is Faster than Moore's Law and Why You Should Care", where we noted two facts: one, the rapid rise of computing power or imaging capabilities is making it difficult to keep up with understanding the data; two, a new technology (sequencing) promises to deliver ever larger datasets at an even faster pace. As stated then, our only recourse is developing better algorithms ... fast. Here is an instance of that need being addressed in a new Nature paper entitled Compressive genomics by Po-Ru Loh, Michael Baym and Bonnie Berger. The introduction starts with:
In the past two decades, genomic sequencing capabilities have increased exponentially [1, 2, 3], outstripping advances in computing power [4, 5, 6, 7, 8]. Extracting new insights from the data sets currently being generated will require not only faster computers, but also smarter algorithms. However, most genomes currently sequenced are highly similar to ones already collected [9]; thus, the amount of new sequence information is growing much more slowly.
Here we show that this redundancy can be exploited by compressing sequence data in such a way as to allow direct computation on the compressed data using methods we term 'compressive' algorithms. This approach reduces the task of computing on many similar genomes to only slightly more than that of operating on just one. Moreover, its relative advantage over existing algorithms will grow with the accumulation of genomic data. We demonstrate this approach by implementing compressive versions of both the Basic Local Alignment Search Tool (BLAST) [10] and the BLAST-Like Alignment Tool (BLAT) [11], and we emphasize how compressive genomics will enable biologists to keep pace with current data.
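
To make the idea concrete, here is a minimal toy sketch in Python of the kind of compress-then-search scheme described in the abstract. It is not the authors' algorithm (which handles near-duplicate sequence by storing edit scripts against previously seen data); it only illustrates how redundant entries can be stored as links to a reference copy so that a query needs to scan the unique data only once:

```python
# Toy illustration of the compressive genomics idea: exact duplicates are
# stored as links to a reference copy, and a query only scans the unique data.
# (The real compressive BLAST handles near-duplicates via edit scripts.)

def compress(sequences):
    """Store each distinct sequence once; later copies become links to it."""
    unique, entries, seen = [], [], {}
    for seq in sequences:
        if seq in seen:
            entries.append(seen[seq])        # redundant sequence: keep a pointer only
        else:
            seen[seq] = len(unique)
            entries.append(len(unique))
            unique.append(seq)               # novel sequence: store it verbatim
    return unique, entries

def search(query, unique, entries):
    """Coarse pass over the unique data, then expand hits through the links."""
    hits = {i for i, seq in enumerate(unique) if query in seq}
    return [j for j, ref in enumerate(entries) if ref in hits]

genomes = ["ACGTACGTGG", "TTGACCATGA", "ACGTACGTGG"]  # third entry duplicates the first
unique, entries = compress(genomes)
print(search("TACGT", unique, entries))               # -> [0, 2]; only two sequences scanned
```

The more similar the genomes in the database, the smaller the unique data that actually gets scanned, which is exactly why the advantage grows as more (highly redundant) genomes accumulate.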




This compressive BLAST approach is not unlike what we call Compressive Signal Processing (see The Fundamentals of Compressive Sensing by Mark Davenport), where signal recovery is not a central issue anymore: the inference is performed directly on the compressed measurements.
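
To see what "computing without recovering the signal" can look like, here is a minimal numpy sketch, with dimensions, sensing matrix and templates chosen purely for illustration, of a detection task carried out entirely in the compressed domain:

```python
# Minimal sketch of compressive inference: decide which of two known templates
# generated a signal using only the compressed measurements y = Phi @ x,
# without ever reconstructing x. (Toy setup; all sizes are arbitrary choices.)
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 50                                   # ambient dimension, number of measurements
Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # random sensing matrix

s0, s1 = rng.standard_normal(n), rng.standard_normal(n)  # two known templates
x = s1 + 0.1 * rng.standard_normal(n)                    # true signal: noisy copy of s1
y = Phi @ x                                              # compressed measurements

# Matched filter in the compressed domain: correlate y with Phi @ s_i.
scores = [y @ (Phi @ s) for s in (s0, s1)]
print("detected template:", int(np.argmax(scores)))      # -> 1
```

Because random projections approximately preserve inner products, the correlations computed from the 50 measurements rank the templates the same way the full 1000-sample correlations would.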





Within the context of genomics, this approach is also just the beginning, as BLAST is not seen by experts as the end-all for genomic data comparison. Parallel to that argument, vanilla Compressive Sensing is not seen by most advanced users as the end-all either, as we are now navigating toward removing some complexity through the use of additional information: structured sparsity, highly correlated non-sparse signals, analysis operators, streaming algorithms, structured norms, hardware solver implementations, sparse measurement implementations and their recent extensions, and so on...



Closer to the intersection of genomics and compressive sensing, we have the different attempts at reverse engineering biochemical networks, some hardware that can inspect the interior of a cell, such as quadriwave lateral shearing interferometry (which might provide some data on gene expression through some proxy), or the de novo and inventive Machine Learning approach featured in Catching an aha moment with compressive sensing. Better yet, we have new ways to perform Bacterial Community Reconstruction Using A Single Sequencing Reaction or Simulating and analyzing compressed-sensing pooling design experiments for next-generation sequencing projects, and the list goes on.
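
For a flavor of what such pooling designs buy you, here is a deliberately simplified single-carrier toy (a plain group-testing code, far simpler than the compressed-sensing designs of the papers linked above): each sample joins the pools given by its binary code, so one rare-variant carrier among n samples is identified from only about log2(n) pooled reactions:

```python
# Toy pooling design (single-carrier case only; the linked papers handle many
# carriers with compressed-sensing decoders): sample j joins the pools given by
# the binary code of j+1, so 3 pooled reactions suffice for 7 samples.
import numpy as np

n_pools, n_samples = 3, 7
A = np.array([[(j + 1) >> b & 1 for j in range(n_samples)]
              for b in range(n_pools)])            # pooling matrix, one column per sample

x = np.zeros(n_samples)
x[4] = 1                                           # sample 4 carries the rare variant
y = A @ x                                          # which pools come back positive

carrier = int(np.flatnonzero((A == y[:, None]).all(axis=0))[0])
print("carrier:", carrier)                         # -> 4
```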


It might even be a good idea to have a session on the subject at the next BASP meeting or even sooner. We need to get that conversation going on a large scale as we don't have much time before the field begins to crumble into many little subfields.


For videos on issues related to biology, compressive sensing and streaming algorithms, you may want to watch:

Other Synthetic Biology and Compressive Sensing related posts are here.

Hat tip to @JohnDCook, Gerd Moe-Behrens and Integrated DNA Technologies

