In the Sunday Morning Insight "What Happens When You Cross into P Territory?", I mentioned this article on using LSH for genome alignment from long-read technology (PacBio RS II or Oxford Nanopore MinION):
Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing by Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James Drake, Jane M Landolin, Adam M Phillippy
While assembling the genome is important, with cheap and fast long reads the goalpost is now slowly moving to the unsupervised learning of groups of genomes. That type of unsupervised learning can only be enabled with the right dimensionality reduction technique; today, that technique is MinHash.
"Many exciting applications for Mash (ANI, metagenome clustering, nanopore triage, ...) Preprint coming soon https://t.co/xXYtsjEuJb" (Adam Phillippy, @aphillippy, August 25, 2015)
Here is Mash: Fast genome distance estimation using the MinHash algorithm from Adam Phillippy's group.
Wondering how MinHash works? Check this write-up by Matthew Casperson, MinHash for dummies.
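To make the idea concrete, here is a minimal sketch of a bottom-k MinHash estimate of the Jaccard similarity between the k-mer sets of two sequences. It is an illustration only, not Mash's actual implementation: the function names (`kmers`, `hash64`, `sketch`, `jaccard_estimate`), the use of BLAKE2b as a stand-in hash, and the default `k` and `size` values are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is a stand-in hash here)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, size=64):
    """Bottom-k sketch: the `size` smallest distinct k-mer hashes of seq."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `size` smallest hashes of the merged sketches and count
    how many of them appear in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    hits = sum(1 for h in merged if h in shared)
    return hits / len(merged)


if __name__ == "__main__":
    a = "ACGTACGTGA"
    b = "ACGTACGTCC"
    print(jaccard_estimate(sketch(a, k=4), sketch(b, k=4)))
```

The point of the reduction is that each genome, whatever its length, is summarized by at most `size` integers, so comparing two genomes costs a small, fixed amount of work. When the sketch size exceeds the number of distinct k-mers, as in the toy sequences above, the estimate coincides with the exact Jaccard similarity; for real genomes it is an approximation whose error shrinks as `size` grows.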
- Videos: Summer School on Hashing: Theory and Applications, 2014
- Slides: Summer School on Hashing: Theory and Applications
- See also all the blog entries with the hash or sketching tag.
- And so it begins ... Compressive Genomics