Friday, July 04, 2014

A Second Inflection Point in Genome Sequencing? And then ...

It's Friday afternoon, it's Hamming's time.

If you read Nuit Blanche (Predicting the Future: The Steamrollers), you are probably more aware than most of how the future unfolds. But probably not enough to figure out this upcoming game's score.

The good folks at Google predict France will win based on touch-by-touch data. Anyway, we'll know in a few hours. What about looking farther ahead into the future?

In genome sequencing, current techniques usually require chemically cutting the DNA into small pieces that are eventually reassembled algorithmically into a sequence (see the Partial Digest Problem in Reconstruction of Integers from Pairwise Distances). A first inflection point in the democratization of DNA sequencing could be noticed back in 2007-2008, thanks to process parallelization [2]. Another is taking place right before our eyes and will probably become more prominent as massive data from Nanopore technology [1] unfolds. From Nanopores are here!:
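To make the Partial Digest Problem concrete: given the multiset of all pairwise distances between cut sites, a classic backtracking algorithm (Skiena's) recovers a consistent placement of the sites on a line. This is a toy sketch, not production sequencing code; note that the answer is only unique up to reflection (homometric sets).

```python
from itertools import combinations

def partial_digest(distances):
    """Skiena's backtracking solver for the Partial Digest Problem:
    recover point positions on a line from the multiset of all
    pairwise distances between them."""
    dists = sorted(distances)
    width = dists.pop()          # the largest distance fixes the two endpoints
    points = {0, width}
    solution = []

    def remove_all(multiset, items):
        # Remove each item once from a copy of the list; None if impossible.
        ms = list(multiset)
        for it in items:
            if it in ms:
                ms.remove(it)
            else:
                return None
        return ms

    def place(remaining):
        if not remaining:
            solution.append(sorted(points))
            return True
        y = max(remaining)
        for cand in (y, width - y):  # the farthest unexplained distance ends at y or width - y
            needed = [abs(cand - p) for p in points]
            rest = remove_all(remaining, needed)
            if rest is not None:
                points.add(cand)
                if place(rest):
                    return True
                points.remove(cand)
        return False

    return solution[0] if place(dists) else None

# Distances generated from the point set {0, 2, 4, 7, 10}:
pts = [0, 2, 4, 7, 10]
d = sorted(abs(a - b) for a, b in combinations(pts, 2))
print(partial_digest(d))  # → [0, 3, 6, 8, 10] (the mirror image of pts; also valid)
```

The solver returns the mirror image here simply because of its search order; both placements produce the same distance multiset.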
In what appears to be the first example of publicly available, user-generated Oxford Nanopore MinION data, Nick Loman (aka @pathogenomenick) has given us a glimpse into the future.

Let us remember that previously on Nuit Blanche, we could already get our hands on raw data from a similar technology (Quantum Biosystems Provides Raw Data Access to New Sequencing Technology).

But soon enough, getting all this information for every person will mean that we will be able to search for outliers (rare diseases) first [7] and then search for actual drugs that can target molecular networks [6], i.e., have more Stephanie Events with the algorithms mentioned here on Nuit Blanche.

From [4]

For instance, the recent The SwAMP Thing! entry pointed to the possibility of using AMP algorithms to perform group testing, while in another entry (... There will be a "before" and "after" this paper ...), other authors evaluated the population sampling requirements for performing GWAS given our current solvers' capabilities. Again, these are all techniques and algorithms often featured here...

[5] Applying compressed sensing to genome-wide association studies by Shashaank Vattikuti, James J Lee, Christopher C Chang, Stephen D H Hsu and Carson C Chow

The study of molecular networks has recently moved into the limelight of biomedical research. While it has certainly provided us with plenty of new insights into cellular mechanisms, the challenge now is how to modify or even restructure these networks. This is especially true for human diseases, which can be regarded as manifestations of distorted states of molecular networks. Of the possible interventions for altering networks, the use of drugs is presently the most feasible. In this mini-review, we present and discuss some exemplary approaches of how analysis of molecular interaction networks can contribute to pharmacology (e.g., by identifying new drug targets or prediction of drug side effects), as well as list pointers to relevant resources and software to guide future research. We also outline recent progress in the use of drugs for in vitro reprogramming of cells, which constitutes an example par excellence for altering molecular interaction networks with drugs.

Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This highlights the increasing importance of detecting and screening for rare variants. Although the massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, the identification of rare variant carriers by large-scale re-sequencing remains prohibitively expensive because of the huge challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large-scale samples with few pooled sequencing experiments and a dramatically reduced cost.
Based on quantitative group testing, we propose an efficient overlapping pool sequencing strategy that allows the efficient recovery of variant carriers in numerous individuals with much lower costs than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5-97.9% variants with the variant frequency ranging from 0.5 to 1.5%.
Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.
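The paper's heuristic Bayesian decoder is more elaborate, but the core of overlapping pool designs can be sketched with the simplest combinatorial decoder (COMP): clear every sample that appears in any negative pool, and declare the rest candidate carriers. All sizes below are made-up, and pool calls are idealized as noise-free rather than thresholded from sequencing depth.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: N samples, T pools, each pool mixing k random samples.
N, T, k = 200, 40, 20
carriers = set(int(i) for i in rng.choice(N, 3, replace=False))  # rare variant carriers

# Random k-set pooling design: row t flags which samples enter pool t.
A = np.zeros((T, N), dtype=bool)
for t in range(T):
    A[t, rng.choice(N, k, replace=False)] = True

# A pool reads positive iff it contains at least one carrier
# (an idealized stand-in for calling a pool from its sequencing depth).
positive = A[:, list(carriers)].any(axis=1)

# COMP decoding: any sample appearing in a negative pool cannot be a carrier.
cleared = A[~positive].any(axis=0)
candidates = set(int(i) for i in np.flatnonzero(~cleared))

print(carriers <= candidates)  # → True: COMP never misses a true carrier
print(len(candidates), "candidates for", len(carriers), "true carriers")
```

COMP has no false negatives by construction; its false positives are what the quantitative, read-count-based decoding in the paper is designed to eliminate at a lower pool count.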

Join the CompressiveSensing subreddit or the Google+ Community and post there!
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.
