Sunday, December 14, 2014

Sunday Morning Insight: The Stuff of Discovery

In machine learning, they call the conundrum "exploitation versus exploration"; in other circles, we talk about "improving stuff versus discovery". There are many ways discovery can be defined. For instance, in Crossing into P territory, we noted that a new kind of sensor (a genome sequencer) could enable experimentation in polynomial time, thereby clearly expanding the possibilities for real discovery. While both the PacBio and Oxford Nanopore technologies only became available just before the summer, they are already changing the nature of discovery in that field [1]. Before, people would wonder how one could put pieces of DNA together. Now that this complexity is mostly gone, the new norm is becoming: since genomes can be assembled easily, what sort of discovery can be done with a collection of genomes?
As I've said before, we saw this exact same explosion unfold in compressive sensing ten years ago. What has happened since? Many polynomial-time algorithms were developed with an emphasis on being faster than the previous ones, and soon enough more complex data structures began to be exploited by the algorithms. There is really no reason to believe this will not happen in genome sequencing: we are going to have many algorithms that do alignment using either PacBio or Oxford Nanopore technologies.

But in compressive sensing, something else happened: we began to discover a few things. This past Friday (Hamming's time: Scientific Discovery Enabled by Compressive Sensing and related fields), I provided two candidates. One candidate used the structure of the problem and its limitation to make predictions, while the other used the new paradigm to show how to nix a theory. Here are two others: an inference based on sparsity priors [2] (as featured in Catching an aha moment with compressive sensing) and another one [3] about finding a needle in a haystack in an exponential family of solutions (featured in Cluster expansion made easy with Bayesian compressive sensing).

In actuality, those four examples fall into two categories: one uses the new prior as a way to find that needle in an exponential haystack, while the other uses empirical complexity bounds to reduce the phase space of what is feasible.

Either exploit the newfound capability or reduce the exploration horizon: different sides of the same discovery coin. Both are important.

[1] Genomic sequencing: Recent tweets, papers, blog posts and attendant comment on that blog post:
Widespread polycistronic transcripts in mushroom-forming fungi revealed by single-molecule long-read mRNA sequencing by Sean Gordon, Elizabeth Tseng, Asaf Salamov, Jiwei Zhang, Xiandong Meng, Zhiying Zhao, Dongwan Don Kang, Jason Underwood, Igor V Grigoriev, Melania Figueroa, Jonathan S Schilling, Feng Chen, Zhong Wang

MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island by Philip M Ashton, Satheesh Nair, Tim Dallman, Salvatore Rubino, Wolfgang Rabsch, Solomon Mwaigwisya, John Wain & Justin O'Grady
And a comment following this article: USB-sized DNA sequencer is error-prone, but still useful

Hi! Thanks for the write up! Getting spoken about on Ars Technica is definitely crossed something off my bucket list (I'm first author on the paper discussed).

I would just like to say a few things about the MinION/Oxford Nanopore:

1) While the error rate we observed is high compared to e.g. Illumina, it is comparable to PacBio (the main high throughput, long read tech).

2) You say 'the great promise of nanopore sequencing has been very difficult to match in practice'. However, I don't really think that is true. What ONT have done is amazing!

2a) First of all, the form factor is revolutionary. I'm not sure what your definition of a USB product is, but mine would be 'something where the only connection is a USB connection'. The MinION meets this.

2b) Rather than limiting the MinION device to a small number of elite institutes, they sent it to hundreds of 'normal' people. This is a brave move that speaks to the confidence they have in their technology. We had a positive experience with it, some people probably less so, others more so. This approach to letting everyone have a crack is surely one to applaud?

2c) The technology is just fantastic - single molecule sequencing using a biological pore! Think about how hard that must be to engineer! In a way that can be shipped to and used by hundreds of non-specialists! I should say that Illumina and PacBio also have awesome devices/technologies, but this one is newer ;-)

3) A slight technical issue, but the short reads weren't used to correct the long reads. The long reads were used to join contigs made using the short reads.

4) Another slight technical issue, the Illumina technology with bias is specifically the Nextera protocol. This has been known since this technology was developed by Jay Shendure's lab.

Thanks again for writing us up! 

[2] Direct inference of protein–DNA interactions using compressed sensing methods by Mohammed AlQuraishi and Harley H. McAdams (featured in Catching an aha moment with compressive sensing)

[3] Lance J. Nelson, Vidvuds Ozolins, C. Shane Reese, Fei Zhou, Gus L. W. Hart, "Cluster expansion made easy with Bayesian compressive sensing," Phys. Rev. B 88, 155105 (Oct. 2013), and Lance J. Nelson, Gus L. W. Hart, Fei Zhou, and Vidvuds Ozolins, "Compressive sensing as a paradigm for building physics models," Phys. Rev. B 87, 035125 (2013) (featured in Cluster expansion made easy with Bayesian compressive sensing)