Friday, August 28, 2015

Mash: Fast genome distance estimation using the MinHash algorithm - implementation -




In the Sunday Morning Insight "What Happens When You Cross into P Territory?", I mentioned this article on using LSH for genome alignment from long-read technology (PacBio RS II or Oxford Nanopore MinION):

Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing by Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James Drake, Jane M Landolin, Adam M Phillippy

While assembling the genome is important, with cheap and fast long reads the goalpost is now slowly moving to the unsupervised learning of groups of genomes. That type of unsupervised learning can only be enabled with the right dimensionality reduction technique; today, that technique is MinHash.

Here is Mash: Fast genome distance estimation using the MinHash algorithm from Adam Phillippy's group.



Wondering how MinHash works? Check this write-up by Matthew Casperson: MinHash for dummies.
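
For readers who prefer code, here is a minimal sketch of the idea in Python. It is not Mash's implementation: the function names, the toy k-mer length and sketch size, and the use of BLAKE2b as the hash are all assumptions made for illustration, and reverse complements are ignored. Each sequence is reduced to the few smallest hash values of its k-mers (a "bottom-s" sketch), and the Jaccard index of the two k-mer sets is estimated from the sketches alone. The last function converts that estimate into a distance using the formula given in the Mash paper, D = -(1/k) ln(2j / (1 + j)); check the paper before relying on it.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is a stand-in hash here)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, size=64):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate the Jaccard index of two k-mer sets from their sketches."""
    # The `size` smallest hashes of the union act as a random sample of it.
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    # Fraction of the sample that is present in both sketches.
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Distance from a Jaccard estimate j and k-mer length k (formula assumed from the Mash paper)."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


if __name__ == "__main__":
    a = "ACGTACGTGACCTTGACGATCGATCGGCTA"
    b = "ACGTACGTGACCTTGACGATCGATCGGCTT"  # differs from a in the last base
    j = jaccard_estimate(sketch(a), sketch(b))
    print("Jaccard estimate:", j)
    print("Distance:", mash_distance(j, k=5))
```

The point of the dimensionality reduction is that two genomes of any length are compared through sketches of a fixed size, so the cost of a comparison depends on the sketch size rather than on the genome length.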
