Nuit Blanche: Sunday Morning Insight: A conversation on Nanopore Sequencing and Signal Processing

Sunday, September 22, 2013


From [8]

As some of you have noticed, nanopore sequencing is a subject that comes back often here. One of the latest instances is another Sunday Morning Insight entry, Thinking about a Compressive Genome Sequencer. All the entries on the subject can be found under the nanopore tag. Because I wanted to be better informed about the technology and see how compressive sensing could be inserted into it, I reached out to a few people to get a conversation going.

The following is the result of a long email exchange with someone also interested in nanopore sequencing, in which we tried to clarify some of the potential issues for nanopore signal analysis, based on published results. The letter "I" is for my remarks and questions, while "A" is for the person with whom I had this conversation and who wishes to remain anonymous. A big thank you to this person for the great conversation!

I: Here are two or three things that bug me and I was wondering if you could provide some enlightenment.

From an outsider's point of view, the various reports I see about nanopore engineering seem contradictory. On the one hand, the voltage curves I see seem to have pretty low noise; yet I have also read that these techniques are only about 96% accurate (where one would expect 99.99% or better). These two facts do not fit. The only way I can reconcile them, so far, is as follows:

We only see the good, low-noise traces. That's OK, it's PR. In reality, noise is generally much worse.
Voltage drift is far from negligible, especially when reading long strands.
A false twin issue? As I previously mentioned on the blog (see http://nuit-blanche.blogspot.fr/2013/04/structural-information-in-nanopore.html), a potential issue involving knots might be at play. If one assumes that the intervals between voltage step changes are not uniform, then it becomes difficult to tell the difference between, say, a G nucleotide that has a knot behind it (and which takes a little while to get through the nanopore) and two perfectly fine Gs following each other in the DNA strand. This makes for a pretty difficult classification problem.
Finally, for reasons that are still unknown, we see voltage step readings that cannot be classified as any of the A, G, T or C nucleotides, a situation that might be related to some of the issues mentioned above, a combination thereof, or others (such as voltage changes across the nanopore).
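To make the false-twin ambiguity concrete, here is a minimal Python sketch under a loudly stated assumption: each base's dwell time is exponential with the same mean tau (the exponential-dwell picture comes up later in this conversation). A single base then produces a dwell density of exp(-t/tau)/tau, while two identical bases in a row produce a Gamma(2, tau) density, so the likelihood ratio between the two hypotheses is simply t/tau. A dwell close to the mean is equally well explained by one base or by two; a knot that lengthens a single base's dwell only blurs the picture further. The function names and the 1 ms mean are illustrative choices, not values from the literature.

```python
import math

# Hypothetical model: every base dwells in the pore for an exponential time
# with the same mean tau (seconds).
# One base  -> Exp(tau); two identical bases in a row -> Gamma(shape=2, scale=tau).

def pdf_one_base(t, tau):
    """Density of a dwell of length t produced by a single base."""
    return math.exp(-t / tau) / tau

def pdf_two_bases(t, tau):
    """Density of a dwell of length t produced by two identical bases."""
    return t * math.exp(-t / tau) / tau ** 2

def twin_likelihood_ratio(t, tau):
    """Likelihood ratio 'two bases' vs 'one base'; it simplifies to t / tau."""
    return pdf_two_bases(t, tau) / pdf_one_base(t, tau)

tau = 1e-3  # hypothetical 1 ms mean dwell per base
for t in (0.5e-3, 1e-3, 2e-3):
    print(f"dwell {t * 1e3:.1f} ms -> LR(two/one) = {twin_likelihood_ratio(t, tau):.2f}")
```

Only dwells much longer or much shorter than tau tip the balance clearly toward one hypothesis, which is why duration alone is a weak cue for homopolymer runs in this model.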

What is your feeling about this outsider's analysis? Or am I way off base? Is there a simpler explanation for the low 96% accuracy?

By the way, the problems mentioned above are not insurmountable; we just need to tackle them more seriously. One could, for instance, remove the slowly varying drift using an "analysis-based dictionary learning" approach. There are other methods as well.
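Analysis-based dictionary learning is beyond a short snippet, so the sketch below swaps in a much simpler technique for the special case of a linear drift: since step changes are sparse, most first differences of the trace contain only the drift slope, and their median is a robust estimate of it. The trace, current levels and drift rate are hypothetical; real drift is rarely exactly linear and real traces are noisy.

```python
import statistics

def remove_linear_drift(trace):
    """Remove a linear drift from a piecewise-constant (step) trace.

    Assumption: steps are sparse, so most first differences contain only the
    drift slope; their median is then a robust estimate of that slope.
    Returns the detrended trace and the estimated slope (per sample).
    """
    diffs = [b - a for a, b in zip(trace, trace[1:])]
    slope = statistics.median(diffs)
    return [x - slope * i for i, x in enumerate(trace)], slope

# Synthetic step trace (hypothetical current levels, pA) plus a slow drift.
levels = [30.0] * 20 + [45.0] * 20 + [38.0] * 20
drift = 0.05  # pA per sample, hypothetical
trace = [lvl + drift * i for i, lvl in enumerate(levels)]

clean, est_slope = remove_linear_drift(trace)
print(round(est_slope, 4))  # 0.05
```

The median ignores the two large differences at the step edges, which is what makes this work; with many steps per window, or with nonlinear drift, a more elaborate model (such as the dictionary-learning approach mentioned above) would be needed.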

A: The first thing to note is that the paper you mention at:

http://nuit-blanche.blogspot.com/2013/04/structural-information-in-nanopore.html [2]

is an analysis of single bases free in solution, not of strands of DNA. I don't believe any work showing a protein nanopore producing clear, single-base-resolution reads has been published (as a peer-reviewed paper) to date.

The current measurements shown there do look quite clean. So yes, I can see why you'd expect a low error rate. There are probably a few sources of error you might want to consider:

All reports I've seen show that the dwell time of a molecule in a protein nanopore is exponentially distributed. This is why people in the ion channel literature are mostly happy using HMMs: it does appear to be a Markov process [1]. Given that the time is exponentially distributed, there's a high probability that you might not see a base at all, or see it only for a short time. This is one possible source of error, in this case deletions.
Most research talks about the need to control the motion of the DNA through the nanopore. This could be a significant source of error [3] [4] [5] [6].
The diagram on your blog shows 4 bases in a current range of ~15 pA, but, as far as I'm aware, no one in the literature has presented single-base resolution on strands from a protein nanopore. The work has shown “that several nucleotides contribute to the recorded signal” [6] [7].
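The deletion mechanism described here can be quantified with a short Python sketch, assuming dwell times are exponential with mean tau and that the detector misses any dwell shorter than some hypothetical resolution t_min. The probability of a miss is then 1 - exp(-t_min/tau), and a Monte Carlo draw with random.expovariate should agree with it. Both parameter values are made up for illustration.

```python
import math
import random

def miss_probability(tau, t_min):
    """P(dwell < t_min) when dwell ~ Exp(mean=tau): 1 - exp(-t_min / tau)."""
    return 1.0 - math.exp(-t_min / tau)

def simulate_miss_rate(tau, t_min, n=100_000, seed=0):
    """Monte Carlo check: fraction of exponential dwells shorter than t_min."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), not the mean.
    short = sum(1 for _ in range(n) if rng.expovariate(1.0 / tau) < t_min)
    return short / n

tau = 1.0     # mean dwell time, ms (hypothetical)
t_min = 0.1   # shortest dwell the detector can resolve, ms (hypothetical)
print(f"analytic : {miss_probability(tau, t_min):.4f}")  # 0.0952
print(f"simulated: {simulate_miss_rate(tau, t_min):.4f}")
```

Under these assumptions roughly 9.5% of bases would be missed; in this model the only ways to lower that rate are a larger tau (slowing the strand) or a faster detector.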

I: I think this sentence from [6] makes it plainly obvious why a low-pass sensor might be interesting:

“thus far the reading of the bases from a DNA molecule in a nanopore has been hampered by the fast translocation speed of DNA together with the fact that several nucleotides contribute to the recorded signal”.

I: What is the actual purpose of these "motors" that control the motion of the strand? Is it that:

1. with them, the process is slowed down so that we have enough electrons per base (as you mentioned earlier, if it goes faster we might be electron-starved for the signal) [we talked before about 1 pA corresponding to about 6 electrons per sampling interval at a 1 MHz sample rate]?
2. without them, there would be no strand going through the pore?
3. without them, the nominal dwell time would not be well defined?
4. they make sure that the knotty situation mentioned in the blog entry does not unduly influence the dwell time in the pore?
5. any or all of these explanations?
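The 1 pA figure can be checked with a few lines of arithmetic: dividing the current by the elementary charge gives the number of charges per second, and dividing again by the sample rate gives the count per sample. Treating that count as Poisson (shot noise only, ignoring amplifier and other noise, which is an assumption) also shows why a handful of electrons per sample is "electron-starved": the relative fluctuation is 1/sqrt(N).

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)

def electrons_per_sample(current_amps, sample_rate_hz):
    """Mean number of elementary charges crossing the pore per sampling interval."""
    return current_amps / ELEMENTARY_CHARGE / sample_rate_hz

def shot_noise_relative(n_electrons):
    """Poisson (shot) noise: std / mean = 1 / sqrt(N) for a mean count of N."""
    return 1.0 / math.sqrt(n_electrons)

n = electrons_per_sample(1e-12, 1e6)                          # 1 pA sampled at 1 MHz
print(f"{n:.2f} electrons per sample")                        # 6.24 electrons per sample
print(f"relative shot noise ~ {shot_noise_relative(n):.2f}")  # relative shot noise ~ 0.40
```

A 40% relative fluctuation per sample is why slowing the strand, and thus averaging more samples per base, matters so much.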

A: Possibly all of the above, depending on the system. Slowing down the strand (1) is probably the most significant contribution. In the default case, where there are no forces at play, the DNA would simply move around under Brownian motion. There might be other local forces at play that make it move faster or slower. You might be able to control that motion with a "motor", but that might not always work very well.

In addition to this, some work shows that several nucleotides might contribute to the signal [7].

I: Please explain this last sentence. If it is what I think it is, it is very interesting.

A: So, to quote the Wikipedia page on nanopore sequencing:

"In the early papers methods, a nucleotide needed to be repeated in a sequence about 100 times successively in order to produce a measurable characteristic change"

If you slow the strand down, or make other changes, you might still be faced with the problem that “several nucleotides contribute to the recorded signal” [6] [7].

I: Going back to the motor: with no motor, the strand can go up or down, and since you only have access to the current, you really have no idea which direction the strand is going.

A: This is correct.

I: Which brings me to a different type of question: is there any other information gathered during those experiments that could be used to detect which direction the strand is taking, and at what speed? Are any additional measurements made during those experiments?

A: No, I don't know of any additional measurements that could be made. I think some people have talked about using fluorescence to detect the motion of the strand through a pore, but I don't know how far that work has gone.

If you slow the strand down, or make other changes, you might still be faced with the problem that “several nucleotides contribute to the recorded signal” [6] [7], i.e. the signal does not come from a single position. Cherf et al. [7] is probably a good reference for example traces and more information on this.

I: A-ah! So the measurements seem to fall into the category of group measurements/group testing. This is exactly the kind of problem compressive sensing maps well onto. I guess the main issues are:

the passage of the strand through the pore is a stochastic process with a Poisson distribution (the motor makes that distribution more peaked);
we do not seem to have other measurements that could directly or indirectly provide side information about the actual speed of the strand going through. In another blog entry (http://nuit-blanche.blogspot.com/2013/03/of-well-logging-and-nanopores.html) I drew a parallel between nanopores and well logging/drilling. In the drilling case, though, the probes carry accelerometers, so a relatively simple Kalman filter on top of the other information (akin to the current sensing in the nanopore) allows a much cleaner picture to emerge.

Is there anything else I am missing from that picture?

A: I think that's pretty accurate!
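To illustrate what the accelerometer buys in the well-logging analogy, here is a minimal scalar Kalman filter in Python. It is a sketch under hypothetical assumptions: a motion sensor (which, as noted in this exchange, nanopore setups do not currently have) reports each step's displacement with a small error, and a separate noisy reading gives the strand position in nucleotides. All noise levels are invented for illustration.

```python
import math
import random

def kalman_track(measurements, displacements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for strand position (in nucleotides).

    measurements  : noisy position readings z_k (measurement variance r)
    displacements : per-step displacement u_k from a hypothetical motion
                    sensor (the 'accelerometer'), process-noise variance q
    """
    x, p = x0, p0
    estimates = []
    for z, u in zip(measurements, displacements):
        # Predict: move by the sensed displacement; uncertainty grows by q.
        x, p = x + u, p + q
        # Update: blend in the position reading using the Kalman gain.
        k = p / (p + r)
        x, p = x + k * (z - x), (1.0 - k) * p
        estimates.append(x)
    return estimates

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

rng = random.Random(1)
steps = 200
true_pos = [float(i + 1) for i in range(steps)]           # one base per step
u = [1.0 + rng.gauss(0.0, 0.1) for _ in range(steps)]     # noisy motion sensor
z = [x + rng.gauss(0.0, 2.0) for x in true_pos]           # noisy position reading

est = kalman_track(z, u, q=0.1 ** 2, r=2.0 ** 2)
print(f"raw RMSE    : {rmse(z, true_pos):.2f}")
print(f"Kalman RMSE : {rmse(est, true_pos):.2f}")
```

With these made-up numbers the filtered estimate tracks the true position far more closely than the raw readings, because the motion prior does most of the work between noisy position updates.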

I: A final question on motors: do all nanopore systems need a motor?

A: Having a way of controlling the motion is desirable.

I: Are you telling me there are other ways of controlling the motion that do not require motors?

A: Speed can be controlled by various factors including:

Viscosity of the buffer
Applied voltage
Salt concentration
Temperature

(from Nanopores - Sensing and Fundamental Biological Interactions - page 272)

They also suggest that optical and magnetic tweezers and "DNA transistors" could be used to control the actual motion, so I think there are a number of options.

I: Ah! This is interesting; maybe I should get my hands on this book (Nanopores - Sensing and Fundamental Biological Interactions). The fact that the voltage across the pore also changes the dynamics is worth investigating too.

I: Thank you very much

Following my correspondent's feedback, I went ahead and read this very well-written 2011 review of the technology [8] (Nanopore sensors for nucleic acid analysis by Bala Murali Venkatesan and Rashid Bashir), with the following abstract:

Abstract: Nanopore analysis is an emerging technique that involves using a voltage to drive molecules through a nanoscale pore in a membrane between two electrolytes, and monitoring how the ionic current through the nanopore changes as single molecules pass through it. This approach allows charged polymers (including single-stranded DNA, double-stranded DNA and RNA) to be analysed with subnanometre resolution and without the need for labels or amplification. Recent advances suggest that nanopore-based sensors could be competitive with other third-generation DNA sequencing technologies, and may be able to rapidly and reliably sequence the human genome for under $1,000. In this article we review the use of nanopore technology in DNA sequencing, genetics and medical diagnostics.

In the context of the discussion above, here are some excerpts of interest from the review:

"... A structural drawback with α-haemolysin is that the cylindrical β-barrel can accommodate up to ~10 nucleotides at a time, all of which significantly modulate the pore current [25]: this dilutes the ionic signature of the single nucleotide in the 1.4 nm constriction, thus reducing the overall signal-to-noise ratio in sequencing applications…”

I: So in this instance, we have a group measurement and the signal-to-noise ratio definition is really about sensing a single nucleotide within a larger group.
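The group-measurement idea can be written as a sliding window: each sample is a function of the k nucleotides sitting in the pore. The Python sketch below assumes, as a simplification, that this function is a plain sum of per-base contributions; the LEVELS values are hypothetical. Even in this idealized model, reordering the bases inside one window leaves the reading unchanged, which is one way to see why a single nucleotide's signature is diluted.

```python
# Hypothetical per-base current contributions (arbitrary units). Real pores are
# not this simple: the linear, additive model is an assumption for illustration.
LEVELS = {"A": 1, "C": 2, "G": 3, "T": 4}

def window_readout(seq, k, levels=LEVELS):
    """Each reading is the sum of the contributions of the k bases in the pore."""
    vals = [levels[b] for b in seq]
    return [sum(vals[i:i + k]) for i in range(len(vals) - k + 1)]

print(window_readout("GATTACA", 3))                          # [8, 9, 9, 7, 4]
print(window_readout("AGT", 3) == window_readout("TGA", 3))  # True: order is lost
```

In matrix form this is y = A x with a banded A of ones, i.e. exactly the kind of linear group measurement compressive sensing and group testing deal with.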

“... Moreover, in experiments involving immobilized ssDNA, as few as three nucleotides within or near the constriction contributed to the pore current [27] compared with the ten or so nucleotides that modulate the current in native α-haemolysin [25]....”

I: Again the concept of group measurements.

“...Unidirectional transport of dsDNA through this channel (from amino-terminal entrance to carboxyl-terminal exit) was also observed [29], suggesting a natural valve mechanism in the channel that assists dsDNA packaging during bacteriophage phi29 virus maturation. The capabilities of this protein nanopore will become more apparent in years to come....”

I: The review highlights a possible mechanism to constrain the strand in only one direction.

“...The first reports of DNA sensing using solid-state nanopores emerged in early 2001 when Golovchenko and co-workers used a custom-built ion-beam sculpting tool with feedback control to make nanopores with well-defined sizes in thin SiN membranes [42]...”

I: This is one element I had not really understood: the possibility of having solid-state nanopores (and potentially of benefiting from Moore's law).

“....Indeed, we observed that DNA translocation was slower in Al2O3 nanopores than in SiN nanopores with similar diameters, which was attributed to the strong electrostatic interactions between the positively charged Al2O3 surface and the negatively charged dsDNA [45]. Enhancing these interactions, either electrostatically or chemically, could reduce DNA velocities even more....”

I: or even control it **during** the analysis!

“...Translocation velocities were between about 10 and 100 nucleotides per microsecond, which is too fast for the electronic measurement of individual nucleotides…”

I: And this is where the idea of A2I (analog-to-information conversion) comes in (see Sunday Morning Insight: Thinking about a Compressive Genome Sequencer at http://nuit-blanche.blogspot.com/2013/08/sunday-morning-insight-thinking-about.html): use the architecture developed for these low-pass sensors to get an idea of what passes through the solid-state nanopore.
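One well-known A2I architecture is the random demodulator: the input is multiplied by a pseudo-random ±1 chipping sequence at the Nyquist rate, then integrated and dumped at a much lower rate. The Python sketch below shows only that acquisition stage, applied to a hypothetical two-level current trace; the sparse recovery step (for example l1 minimization) is not shown, and nothing here reflects an existing nanopore device.

```python
import random

def random_demodulate(signal, chips, decimation):
    """Acquisition stage of a random-demodulator A2I converter (sketch).

    The input is multiplied by a +/-1 chipping sequence at the Nyquist rate,
    then integrated and dumped every `decimation` samples, so the ADC only
    needs to run at rate / decimation. Signal recovery is not shown.
    """
    if len(signal) != len(chips) or len(signal) % decimation:
        raise ValueError("signal and chips must match in length and be a multiple of decimation")
    mixed = [s * c for s, c in zip(signal, chips)]
    return [sum(mixed[i:i + decimation]) for i in range(0, len(mixed), decimation)]

rng = random.Random(0)
n, r = 64, 8
chips = [rng.choice((-1, 1)) for _ in range(n)]
trace = [1.0] * 30 + [2.5] * 34          # hypothetical two-level current trace
y = random_demodulate(trace, chips, r)
print(len(y))  # 8 low-rate measurements instead of 64 samples
```

The low-rate measurements are linear combinations of the fast samples, so the same y = A x framing (with A built from the chips and the integrator) applies to recovery.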

“...This result suggests that if the translocation speed could be reduced to roughly one nucleotide per millisecond, single-nucleotide detection should be possible, which could potentially lead to DNA sequencing with electronic readout…”
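The speeds quoted in the review translate directly into samples per nucleotide. A quick sketch in Python, assuming the 1 MHz sample rate mentioned earlier in the conversation:

```python
def samples_per_nucleotide(speed_nt_per_s, sample_rate_hz):
    """How many ADC samples are taken while one nucleotide passes the sensor."""
    return sample_rate_hz / speed_nt_per_s

fs = 1e6  # 1 MHz sampling, the figure used earlier in the conversation
for label, speed in [("10 nt/us", 10e6), ("100 nt/us", 100e6), ("1 nt/ms", 1e3)]:
    print(f"{label:>9}: {samples_per_nucleotide(speed, fs):g} samples per nucleotide")
```

At 10 to 100 nucleotides per microsecond, a 1 MHz sampler sees 0.1 to 0.01 samples per nucleotide, i.e. each sample mixes 10 to 100 nucleotides; at one nucleotide per millisecond it gets 1,000 samples per nucleotide.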

So this is, in my mind, a signal processing issue. Much effort goes into developing hardware and materials to slow the process down, when one could probably look at it at current speeds with a different signal processing approach.

“....For example, is single-nucleotide resolution possible in the presence of thermodynamic fluctuations and electrical noise? And will the chemical and structural similarity of the purines (A and G) and the pyrimidines (C and T) inherently limit the identification of individual nucleotides using ionic current?...”

Looks like even the specialists are asking themselves good questions!
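The review's question about noise and the similarity of purines and pyrimidines can be framed as a two-level detection problem. Assuming, purely for illustration, two equiprobable current levels separated by delta, i.i.d. Gaussian noise of standard deviation sigma, and a threshold halfway between them, the error rate after averaging n samples is Q(delta*sqrt(n)/(2*sigma)). This is where slowing the strand helps: more samples per nucleotide means a larger n. Real nanopore noise is not white Gaussian, so treat the numbers as indicative only.

```python
import math

def two_level_error(delta, sigma, n_samples=1):
    """Error rate of a midpoint threshold between two equiprobable levels
    separated by delta, with i.i.d. Gaussian noise of std sigma, after
    averaging n_samples readings: Q(delta * sqrt(n) / (2 * sigma))."""
    x = delta * math.sqrt(n_samples) / (2.0 * sigma)
    return 0.5 * math.erfc(x / math.sqrt(2.0))  # Gaussian tail probability Q(x)

# Hypothetical: two similar bases whose mean levels differ by 2 sigma.
for n in (1, 4, 16):
    print(n, f"{two_level_error(2.0, 1.0, n):.4f}")
```

Going from one sample to four per base drops the error from about 16% to about 2.3% in this toy model.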

“.....SNPs and point mutations have been linked to a variety of mendelian diseases as well as more complex disease phenotypes [67]. In proof-of-principle experiments, SNPs have been detected using ~2-nm-diameter SiN nanopores [68]. Using the nanopore as a local force actuator, the binding energies of a DNA binding protein and its cognate sequence relative to a SNP sequence could be discriminated (Fig. 4b). This approach could be extended to screen mutations in the cognate sequences of various other DNA binding proteins, including transcription factors, nucleases and histones.....”

I: This is an interesting use of side information.

“.....Similarly, given the progress with solid-state nanopores, if the translocation velocity could be reduced to a single nucleotide (which is ~3Å long) per millisecond, and if nucleotides could be identified uniquely with an electronic signature (an area of intense research), it would be possible to sequence a molecule containing one million bases in less than 20 minutes....”

I: Again the reduction of speed to get “pure” signals
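The 20-minute figure in the review follows from simple arithmetic, sketched here in Python: a million bases at one nucleotide per millisecond is 1,000 seconds, and at ~3 Å per nucleotide that rate corresponds to a translocation velocity of about 0.3 µm/s.

```python
def sequencing_time_minutes(n_bases, seconds_per_base):
    """Single-pore read time for one molecule at a constant translocation rate."""
    return n_bases * seconds_per_base / 60.0

t = sequencing_time_minutes(1_000_000, 1e-3)     # one nucleotide per millisecond
print(f"{t:.1f} minutes")                         # 16.7 minutes, i.e. under 20

velocity_um_per_s = 0.3e-9 / 1e-3 * 1e6          # ~3 Angstrom per nucleotide per ms
print(f"{velocity_um_per_s:.1f} um/s")            # 0.3 um/s
```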

“....There have been preliminary reports on the use of embedded planar gate electrodes in nanopores [40] and nano-channels [81,82] to electrically modulate the ionic pore current, and the integration of single-walled carbon nanotubes for the translocation of ssDNA [83]. …”

I: It looks to me like one of the principal elements of the low-pass sensors described above in the A2I philosophy. Other mechanical changes to the current device, or sources of side information for it, include:

“.....Recent experiments with scanning tunnelling microscopes suggest that it might be possible to identify nucleotides with electron tunnelling [89] (because the energy gaps between the highest occupied and lowest unoccupied molecular orbitals of A, C, G and T are unique [90]), and partially sequence DNA oligomers [91]....”

“.....Efforts to fabricate nanopore sensors that contain nanogap-based tunnelling detectors are currently underway [93,94], but thermal fluctuations and electrical noise present major challenges.....”

“.....Another challenge is the fact that tunnelling currents vary exponentially with both the width and the height of the barriers that electrons have to tunnel through, which in turn depends on the effective tunnel distance and on molecule orientation.....”

“....A four-point-probe measurement could therefore reveal significantly more information than the two-probe measurements attempted so far, but reliably fabricating such a four-probe structure with subnanometre precision will be a formidable challenge. It should also be noted that it is not necessary to uniquely identify all four bases for certain applications. Some researchers have used a binary conversion of nucleotide sequences (A or T = 0, and G or C = 1), to discover biomarkers and identify genomic alterations in short fragments of DNA and RNA [95,96]..."
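The binary conversion mentioned in the review combines naturally with the group-measurement picture discussed above. In the hypothetical Python sketch below, each measurement counts the G/C bases in a window of k nucleotides; with noiseless counts and the first k-1 bits known (an assumed anchor), the full bit sequence can be recovered by peeling off one bit per window. Noise, unknown anchors and non-additive pore responses are all ignored here.

```python
def to_binary(seq):
    """Binary conversion described in [8]: A or T -> 0, G or C -> 1."""
    return [0 if b in "AT" else 1 for b in seq]

def window_counts(bits, k):
    """Hypothetical group measurement: number of G/C bases in each k-window."""
    return [sum(bits[i:i + k]) for i in range(len(bits) - k + 1)]

def decode(counts, k, prefix):
    """Recover the bit sequence from noiseless window counts, given the
    first k-1 bits (an assumed side-information anchor)."""
    bits = list(prefix)
    for i, c in enumerate(counts):
        # counts[i] = bits[i] + ... + bits[i+k-1]; solve for the newest bit.
        bits.append(c - sum(bits[i:i + k - 1]))
    return bits

seq = "GATTACAGGC"
bits = to_binary(seq)                         # [1, 0, 0, 0, 0, 1, 0, 1, 1, 1]
counts = window_counts(bits, 3)
print(decode(counts, 3, bits[:2]) == bits)    # True
```

Because the unknowns are discrete, this is closer to the "simple" (bounded, discrete-valued) signal models than to classical sparse recovery.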

From [8]

[1] MSP Sansom et al., Markov, fractal diffusion and related models for ion channel gating (1989).

[2] James Clarke, Hai-Chen Wu, Lakmal Jayasinghe, Alpesh Patel, Stuart Reid, Hagan Bayley, Continuous base identification for single-molecule nanopore DNA sequencing, Nature Nanotechnology (2009).

[3] Marcela Rincon-Restrepo, Ellina Mikhailova, Hagan Bayley, Giovanni Maglia, Controlled translocation of individual DNA molecules through protein nanopores with engineered molecular brakes.

[4] David Deamer, Nanopore Analysis of Nucleic Acids Bound to Exonucleases and Polymerases.

[5] David Wendell, Peng Jing, [...], Peixuan Guo, Translocation of double stranded DNA through membrane adapted phi29 motor protein nanopore.

[6] Grégory F. Schneider, Cees Dekker, DNA sequencing with nanopores, Nature Biotechnology. http://ceesdekkerlab.tudelft.nl/wp-content/uploads/Nature.pdf

[7] Cherf et al., Automated forward and reverse ratcheting of DNA in a nanopore at 5-Å precision, Nat. Biotechnol. 30, 344–348 (2012).

[8] Bala Murali Venkatesan, Rashid Bashir, Nanopore sensors for nucleic acid analysis, Nature Nanotechnology 6, 615–624 (2011). Published online 18 September 2011; also at: http://libna.mntl.illinois.edu/pdf/publications/127_venkatesan.pdf


2 comments:

Thomas Arildsen said...: Thank you for a very interesting post that opened up a whole box of things to think about.
In particular, I find this excerpt promising for the possibilities of an (undersampling) A2I approach:

“....There have been preliminary reports on the use of embedded planar gate electrodes in nanopores [40] and nano-channels [81,82] to electrically modulate the ionic pore current, and the integration of single-walled carbon nanotubes for the translocation of ssDNA [83]. …”

I: It looks to me like one of the principal elements of the low-pass sensors described above in the A2I philosophy.

One detail that strikes me about DNA sequencing and compressed sensing is that "traditional" compressed sensing deals with vectors of continuous variables, while the data representing a DNA strand can be seen as a non-sparse but discrete-valued vector. In this light, compressed sensing can be applied in a different way, where signals are considered "simple" instead of sparse. This is described by Donoho & Tanner in Precise Undersampling Theorems, third problem example "(Feas)" (http://people.maths.ox.ac.uk/tanner/papers/DoTa_PUT.pdf).
We have tried to apply this type of model to communication signals in http://vbn.aau.dk/en/publications/downsampling-of-dft-precoded-signals-for-the-awgn-channel%2818feb133-b4d2-4e62-867b-1714dae6c931%29.html and http://kom.aau.dk/~tlj/sprfs2012.pdf
This recent paper on ArXiv seems to deal with this type of model as well: http://arxiv.org/pdf/1303.3943v1.pdf. Unfortunately, I have not had time to read it.

Monday, September 23, 2013 at 4:36:00 AM CDT
Igor said...: Thomas,

Good point. It also brings to the table the possibility of looking at it from other standpoints:
- 2-bit sensing
- using a different norm for the regularization, like the ones envisioned by Bach et al., or even the universal solver of Duarte and Baron.

At that point, one wonders if, in order to test these hypotheses, we ought to:
- find a dataset or
- make one up for the purpose of testing the idea out.

Igor.

Monday, September 23, 2013 at 7:52:00 AM CDT