Nuit Blanche: Sunday Morning Insight: Thinking about a Compressive Genome Sequencer

Sunday, August 18, 2013

If you have read a few entries on the subject here on Nuit Blanche, you know that genomic sequencing is a revolutionary technology capable of drastically changing how medicine works. In particular, one technology has looked very promising for the past fifteen years and yet has still not delivered faster genome decoding: nanopore sequencing.

We mentioned a few ideas on nanopore sequencing before (see Of Well Logging and Nanopore Sequencing and Imagine: A Faster Nanopore DNA Sequencing Technique), but here is another one.

If you read [1,2,3], you'll note that one of the ideas in nanopore sequencing is that one needs to use biological processes to slow down the translocation (movement) of the DNA through the nanopore. The required slowdown (or "rate control"), about three orders of magnitude according to [1], allows the sampling to be performed "accurately" and thereby provides a way to decide distinctly which of the bases (G, T, A, C) is going through the nanopore from its attendant voltage readings.

As mentioned in the Wikipedia entry on nanopores:

Coupling an exonuclease to the biological pore would slow the translocation of the DNA through the pore, and increase the accuracy of data acquisition.

Or from [1]:

For bandwidth and noise levels common to nanopore experiments, the specification for rate reduction is that the DNA should be slowed at least three orders of magnitude, from the un-impeded 1–3 μs/nt [12] to 1 ms/nt or slower [4].
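To make the "three orders of magnitude" concrete, here is a back-of-the-envelope sketch. The translocation times are the ones quoted from [1] (1 to 3 microseconds per nucleotide un-impeded, 1 millisecond per nucleotide as the target). The 100 kHz sampling rate is purely an assumption picked for illustration, not a figure from the papers.

```python
# Translocation times quoted from [1]: 1-3 microseconds per nucleotide when
# un-impeded, versus a target of 1 millisecond per nucleotide or slower.
UNIMPEDED_S_PER_NT = (1e-6, 3e-6)
TARGET_S_PER_NT = 1e-3

# ASSUMPTION: a hypothetical 100 kHz sampling rate, for illustration only.
SAMPLING_RATE_HZ = 100e3


def slowdown_factor(unimpeded_s_per_nt, target_s_per_nt=TARGET_S_PER_NT):
    """How many times slower the DNA must move to reach the target rate."""
    return target_s_per_nt / unimpeded_s_per_nt


def samples_per_nucleotide(seconds_per_nt, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Number of samples acquired while a single nucleotide is in the pore."""
    return seconds_per_nt * sampling_rate_hz


if __name__ == "__main__":
    for t in UNIMPEDED_S_PER_NT:
        print(f"un-impeded at {t * 1e6:.0f} us/nt: "
              f"slowdown needed ~{slowdown_factor(t):.0f}x, "
              f"{samples_per_nucleotide(t):.2f} samples per nucleotide")
    print(f"rate-controlled at 1 ms/nt: "
          f"{samples_per_nucleotide(TARGET_S_PER_NT):.0f} samples per nucleotide")
```

Under that assumed sampling rate, the un-impeded strand gives well under one sample per nucleotide, so several bases slip by between two consecutive readings, while the rate-controlled strand gives on the order of a hundred samples per base.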

In other words, over the past ten years, much of the technology improvement has focused on slowing down the DNA movement through the pores so that the voltage recording can be sampled cleanly and mapped to a particular base. Let us also note that, even then, researchers are considering running the same DNA strand through several pores in parallel [2] in order to provide redundancy and eventually reduce the overall voltage reading errors. Current results show that the accuracy of the technology is still too low.
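To get a feel for what redundancy buys, here is a minimal sketch under a deliberately simple model that is mine, not the one analyzed in [2]: every read of a base is wrong independently with the same probability, and the consensus is declared wrong whenever the correct base fails to win a strict majority. The per-read error rate of 15% is a hypothetical value. The model ignores both the systematic errors and the homopolymer tracking errors that [2] discusses, so the numbers are only indicative.

```python
import math


def consensus_error_bound(per_read_error, n_reads):
    """Upper bound on the majority-vote error for one base.

    ASSUMPTION: reads err independently with probability per_read_error.
    The consensus is counted as wrong whenever at least half of the reads
    are wrong, i.e. the correct base does not hold a strict majority.
    """
    first_bad = (n_reads + 1) // 2  # smallest error count that breaks the majority
    return sum(
        math.comb(n_reads, j)
        * per_read_error ** j
        * (1.0 - per_read_error) ** (n_reads - j)
        for j in range(first_bad, n_reads + 1)
    )


def phred(error_probability):
    """Phred quality score, Q = -10 log10(p); Q40 means 99.99% accuracy."""
    return -10.0 * math.log10(error_probability)


if __name__ == "__main__":
    p = 0.15  # hypothetical per-read error rate, for illustration only
    for n in (1, 5, 15, 25):
        bound = consensus_error_bound(p, n)
        print(f"{n:3d} reads: error <= {bound:.2e} (Q{phred(bound):.0f})")
```

In this idealized setting the error falls quickly with coverage. Systematic errors, by contrast, would survive any amount of rereading, which is the point made in [2].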

It turns out that in compressive sensing several folks have taken a stab at this exact problem: if a very rapid phenomenon cannot be sampled with current technology, one can still find a solution provided one has a modulating technology that goes as fast as the phenomenon at play. If you have this modulating capability, then there is probably a way to use these new randomized Analog-to-Information samplers. Some of these efforts are summarized in the A2I webpage set up by Emmanuel Candes at Stanford. In the case of nanopore technology, if one runs several batteries of DNA through several pores [2], with a switching technology based on, say, a different voltage across the different pores at different times, then one might be able to forget about slowing down the DNA translocation through the pores and directly use the randomized readings of several pores to get data that can then be deconvolved.

What about sparsity? Well, for one, there is already a generic known map of the human genome. Any particular human genome should not differ from that reference by more than 2%, so the difference between the two is sparse. Easier said than done, I know, but it's important.
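Here is a toy sketch of the sparsity argument, with heavy assumptions: bases are encoded as numeric levels 0 to 3, the random voltage switching is modeled as a random ±1 matrix, and each reading is a noiseless linear combination of the whole sequence. None of this is a model of a real nanopore signal, and the recovery routine (orthogonal matching pursuit) is simply one standard sparse solver I picked, not something taken from [1,2,3]. The function names and sizes are hypothetical.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse solution of y ~= A @ d."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit restricted to the columns selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    d = np.zeros(A.shape[1])
    d[support] = coef
    return d


def simulate_and_recover(n=400, m=240, k=5, seed=0):
    """Recover a k-sparse difference from a known reference using m < n readings."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: toy encoding, each base (A, C, G, T) is a level in 0..3.
    reference = rng.integers(0, 4, size=n)
    sample = reference.copy()
    positions = rng.choice(n, size=k, replace=False)
    # Shifting by 1..3 modulo 4 guarantees that the base really changes.
    sample[positions] = (reference[positions] + rng.integers(1, 4, size=k)) % 4
    d_true = (sample - reference).astype(float)

    # ASSUMPTION: random +/-1 modulation, one row per aggregate reading.
    A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    y = A @ sample
    # Subtract what the known reference alone would have produced,
    # so that only the sparse difference remains to be found.
    d_hat = omp(A, y - A @ reference, k)
    return d_true, d_hat


if __name__ == "__main__":
    d_true, d_hat = simulate_and_recover()
    print("variant positions:  ", np.flatnonzero(d_true))
    print("recovered positions:", np.flatnonzero(np.abs(d_hat) > 0.5))
```

The design choice worth noting is the subtraction of the reference contribution before solving: the unknown is then the difference from the reference, which is the sparse object, rather than the full sequence, which is not.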

[1] Recent advances in nanopore sequencing by Raj D. Maitra, Jungsuk Kim, William B. Dunbar

The prospect of nanopores as a next-generation sequencing platform has been a topic of growing interest and considerable government-sponsored research for more than a decade. Oxford Nanopore Technologies recently announced the first commercial nanopore sequencing devices, to be made available by the end of 2012, while other companies (Life, Roche, and IBM) are also pursuing nanopore sequencing approaches. In this paper, the state of the art in nanopore sequencing is reviewed, focusing on the most recent contributions that have or promise to have next-generation sequencing commercial potential. We consider also the scalability of the circuitry to support multichannel arrays of nanopores in future sequencing devices, which is critical to commercial viability.

[2] Error analysis of idealized nanopore sequencing by Christopher R. O'Donnell, Hongyun Wang, William B. Dunbar

This numerical study provides an error analysis of an idealized nanopore sequencing method in which ionic current measurements are used to sequence intact single-stranded DNA in the pore, while an enzyme controls DNA motion. Examples of systematic channel errors when more than one nucleotide affects the current amplitude are detailed, which if present will persist regardless of coverage. Absent such errors, random errors associated with tracking through homopolymer regions are shown to necessitate reading known sequences (Escherichia coli K-12) at least 140 times to achieve 99.99% accuracy (Q40). By exploiting the ability to reread each strand at each pore in an array, arbitrary positioning on an error rate versus throughput tradeoff curve is possible if systematic errors are absent, with throughput governed by the number of pores in the array and the enzyme turnover rate.

[3] Dynamics of the Translocation Step Measured in Individual DNA Polymerase Complexes by Kate R. Lieberman, Joseph M. Dahl, Ai H. Mai, Mark Akeson, and Hongyun Wang

ABSTRACT: Complexes formed between the bacteriophage phi29 DNA polymerase (DNAP) and DNA fluctuate between the pre-translocation and post-translocation states on the millisecond time scale. These fluctuations can be directly observed with single-nucleotide precision in real-time ionic current traces when individual complexes are captured atop the α-hemolysin nanopore in an applied electric field. We recently quantified the equilibrium across the translocation step as a function of applied force (voltage), active-site proximal DNA sequences, and the binding of complementary dNTP. To gain insight into the mechanism of this step in the DNAP catalytic cycle, in this study, we have examined the stochastic dynamics of the translocation step. The survival probability of complexes in each of the two states decayed at a single exponential rate, indicating that the observed fluctuations are between two discrete states. We used a robust mathematical formulation based on the autocorrelation function to extract the forward and reverse rates of the transitions between the pre-translocation state and the post-translocation state from ionic current traces of captured phi29 DNAP−DNA binary complexes. We evaluated each transition rate as a function of applied voltage to examine the energy landscape of the phi29 DNAP translocation step. The analysis reveals that active-site proximal DNA sequences influence the depth of the pre-translocation and post-translocation state energy wells and affect the location of the transition state along the direction of the translocation.

H/t to Jerry Zon's Three Takeaways from the 3rd Next-Generation Sequencing Conference blog entry.