Friday, March 08, 2013

Of Well Logging and Nanopore Sequencing

Back in the day, I had a friend who worked for one of the large companies that provide services for locating oil. The process is called well logging. There, one inserts a sensor probe in the ground and, by either pushing or pulling it, one gets different readings (called logs) from the sensors (radioactive and more). It seems simple enough, but when you push or pull something in the ground, especially at large depths, you have two problems:
  • no GPS
  • slippage
In other words, the probe can never be located exactly, in part because its motion is not directly traceable and is not smooth by any stretch of the imagination. The way to go about this was to apply a Kalman filter to estimate the location state of the probe given the different inputs (push or pull) and the readings of the accelerometers inside the probe.
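To make the idea concrete, here is a minimal sketch of such a filter in Python. It is not the algorithm used by any logging company: the state (depth and velocity), the use of a noisy cable-length reading as the depth measurement, the noise levels and the names kalman_depth and simulate_log are all assumptions made for illustration.

```python
import numpy as np

def kalman_depth(accel, cable_depth, dt=0.1, accel_var=0.04, cable_var=4.0):
    """Estimate probe depth from accelerometer readings (used as the control
    input) and noisy cable-length depth readings (used as the measurement)."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    B = np.array([0.5 * dt**2, dt])         # how an acceleration enters the state
    H = np.array([[1.0, 0.0]])              # only depth is observed
    Q = accel_var * np.outer(B, B)          # process noise from accelerometer error
    R = np.array([[cable_var]])             # measurement noise (slippage, stretch)
    x = np.array([cable_depth[0], 0.0])     # initial guess: first reading, at rest
    P = np.diag([cable_var, 1.0])
    estimates = []
    for a, z in zip(accel, cable_depth):
        # Predict: push the state forward with the measured acceleration.
        x = F @ x + B * a
        P = F @ P @ F.T + Q
        # Update: correct the prediction with the cable-length reading.
        y = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)

def simulate_log(n=500, dt=0.1, accel_std=0.2, cable_std=2.0, seed=0):
    """Hypothetical jerky descent: the acceleration flips sign every 50 samples."""
    rng = np.random.default_rng(seed)
    true_acc = np.where((np.arange(n) // 50) % 2 == 0, 0.2, -0.2)
    depth, d, v = [], 0.0, 0.0
    for a in true_acc:
        d += v * dt + 0.5 * a * dt**2
        v += a * dt
        depth.append(d)
    depth = np.array(depth)
    accel = true_acc + rng.normal(0.0, accel_std, n)   # noisy accelerometer
    cable = depth + rng.normal(0.0, cable_std, n)      # noisy depth reading
    return depth, accel, cable

depth, accel, cable = simulate_log()
estimate = kalman_depth(accel, cable)
print("raw RMS error:     ", np.sqrt(np.mean((cable - depth) ** 2)))
print("filtered RMS error:", np.sqrt(np.mean((estimate - depth) ** 2)))
```

The two printed numbers play the role of the before and after logs: the filtered depth should track the simulated true depth more closely than the raw reading does.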

I recall specifically seeing the before and after logs and immediately got interested in this Kalman filter algorithm. The technique has since improved with the appearance of the Unscented Kalman Filter and other variations. Some of these variations have been covered on Nuit Blanche, with the introduction of L_1 decoding instead of least-squares state estimation.

One can see that the logging situation has similarities with the situation where some DNA is being pushed through a pore, as in nanopore sequencing technology. The nanopore technology is bound to dramatically increase the speed at which one can perform genome analyses (at a faster pace than Moore's law). If it took 15 years to go from one device to a billion per year for CMOS, one can barely estimate the reach of genomics in a few years (we are not prepared to live in exponential times). This is all conditional on techniques like nanopores providing better results. Where are we on that front?

If you recall, in Nuit Blanche's Month in Review of September 2012, I mentioned:

"....Still no word on Oxford Nanopore but if you recall their announcement back in February as Nick Loman reported (their new technology goes through DNA one piece at a time i.e. no need to cut it off in several smaller parts like other technology)
....For actual base-reads, ONT still has not achieved single-base sensitivity (though Brown did mention they are working on it). Instead they are reading three bases at a time, leading to 64 different current levels. They then apply the Viterbi algorithm – a probabilistic tool that can determine hidden states – to these levels to make base calls at each position.

In the same post, one can read that they have a 96% reading accuracy, which is very low in that business and probably the reason they haven't gotten a machine out yet, but they are hiring people who can make sense of their data...."
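The three-bases-at-a-time decoding mentioned in the excerpt can be sketched with a toy hidden Markov model. This is only an illustration and not ONT's base caller: the 64 current levels are made up (a random permutation of 0..63), the noise is assumed Gaussian, and each reading is assumed to correspond to the strand advancing by exactly one base, so each 3-mer can only be followed by one of four 3-mers.

```python
import itertools
import numpy as np

BASES = "ACGT"
KMERS = ["".join(p) for p in itertools.product(BASES, repeat=3)]  # 64 hidden states
INDEX = {k: i for i, k in enumerate(KMERS)}

def make_levels(seed=0):
    """Hypothetical current level for each 3-mer (not real pore data)."""
    rng = np.random.default_rng(seed)
    return rng.permutation(64).astype(float)

def log_transitions():
    """Moving by one base turns XYZ into YZ?, so each state has 4 successors."""
    logT = np.full((64, 64), -np.inf)
    for k in KMERS:
        for b in BASES:
            logT[INDEX[k], INDEX[k[1:] + b]] = np.log(0.25)
    return logT

def simulate_currents(seq, levels, noise_std=0.05, seed=1):
    """One noisy current reading per 3-mer window of the sequence."""
    rng = np.random.default_rng(seed)
    means = np.array([levels[INDEX[seq[i:i + 3]]] for i in range(len(seq) - 2)])
    return means + rng.normal(0.0, noise_std, len(means))

def viterbi(currents, levels, logT, sigma=0.3):
    """Most likely sequence of 3-mer states given the current readings."""
    currents = np.asarray(currents, dtype=float)
    n = len(currents)
    # Gaussian log-likelihood of each reading under each state (up to a constant).
    logE = -0.5 * ((currents[:, None] - levels[None, :]) / sigma) ** 2
    score = logE[0].copy()                      # uniform prior over first state
    back = np.zeros((n, 64), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + logT            # cand[i, j]: arrive in j from i
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(64)] + logE[t]
    path = [int(np.argmax(score))]
    for t in range(n - 1, 0, -1):               # follow the back-pointers
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path

def call_bases(path):
    """First 3-mer in full, then the newly exposed base of each later state."""
    return KMERS[path[0]] + "".join(KMERS[s][-1] for s in path[1:])

levels = make_levels()
currents = simulate_currents("GATTACAGGCTTAAAC", levels)
print(call_bases(viterbi(currents, levels, log_transitions())))
```

Slippage would show up here as extra transitions (a state repeating or skipping ahead), which is exactly where the toy model above is too optimistic.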

Let us simply note that a generalization of the Viterbi algorithm [is] substantially similar to the belief propagation algorithm, which happens to be in use in sparse recovery algorithms covered here on Nuit Blanche.

We have had some more detailed news recently on the level of accuracy they are working on. Nick Loman got to talk to Clive Brown (the CTO of Oxford Nanopore Technologies) and here is the interesting part:
So why didn’t the MinIon come out in 2012? Technically, he lists several setbacks. The custom sensor microchip (ASIC) wasn’t performing as they wanted, necessitating a redesign from scratch. “That put us back about 5 months, but it was the right thing to do”. There have also been problems stabilising the lipid bilayer, and so over days and weeks it degrades. He set his team a new accuracy target of 1%, a major improvement from the 4% error rate announced at AGBT......

Technical breakthroughs. They’ve found that error rates can be improved by having multiple nanopores on the chip with different properties, and then merging the data. Some nanopores are better at recognising certain nucleotide signatures than others, and so they can be complementary. This is a hint that consensus accuracy might ultimately be important, a la Pacific Biosciences. ** see footnote

I note from this exchange that the initial algorithm was not good enough but that they are now looking at a better one (Viterbi?) that would decrease the error rate. Let us note that, while 1% is a good measure, it is still not a fabulous one in that field. I can definitely imagine that a similar slippage problem, due to the process itself or to the deterioration of the lipid bilayer, would parallel the slipping issues found in the well logging situation, and that it is having an impact on the performance of the deconvolution algorithm. Increasing the number of pores might be a way to go, improving the algorithm is another, sensing something else (besides voltage drop) near the pore is another, and demultiplexing some of the voltages might be yet another. Eventually, I would not be overly surprised to see some of the techniques of compressive sensing / sparse recovery and advanced matrix factorization used to get better accuracy.
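As a back-of-the-envelope check on why merging several reads helps, here is a small sketch. It rests on strong assumptions that are mine and not Oxford Nanopore's: the reads are independent, they all have the same per-base error rate p, and the consensus is wrong whenever more than half of the reads are wrong (pessimistic, since wrong reads need not agree on the same wrong base). The name majority_error is made up for the example.

```python
from math import comb

def majority_error(p, n):
    """Probability that more than half of n independent reads are wrong
    at a given position, each read being wrong with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd number of reads to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# 4% is the per-read error rate quoted in the post.
for n in (1, 3, 5, 7):
    print(n, majority_error(0.04, n))
```

Pores with different properties are unlikely to have independent or equal error rates, so this should only be read as the order of magnitude one might hope for.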

Update: the 2009 paper that started it all [1] shows a figure in which one can see the DNA strand traveling through the nanopore at a non-constant rate, with G, T, C and A being separated by uneven distances.

[1] James Clarke, Hai-Chen Wu, Lakmal Jayasinghe, Alpesh Patel, Stuart Reid, Hagan Bayley (2009). Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnology. DOI: 10.1038/nnano.2009.12
