Thursday, October 09, 2014

RQS (Read Quality-score Sparsifier): Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification - implementation -

When entire fields have been dedicated to compressing a certain type of data (images, videos), you generally rely on that accumulated knowledge to make your compression algorithms as good as possible. But what happens when you have a Big Data problem and do not have the years of experience that have built up in, say, image processing? Compressive Sensing suddenly becomes an attractive solution.

Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification by Y. William Yu, Deniz Yorukoglu, Bonnie Berger
It is becoming increasingly impractical to indefinitely store raw sequencing data for later processing in an uncompressed state. In this paper, we describe a scalable compressive framework, Read-Quality-Sparsifier (RQS), which substantially outperforms the compression ratio and speed of other de novo quality score compression methods while maintaining SNP-calling accuracy. Surprisingly, RQS also improves the SNP-calling accuracy on a gold-standard, real-life sequencing dataset (NA12878) using a k-mer density profile constructed from 77 other individuals from the 1000 Genomes Project. This improvement in downstream accuracy emerges from the observation that quality score values within NGS datasets are inherently encoded in the k-mer landscape of the genomic sequences. To our knowledge, RQS is the first scalable sequence-based quality compression method that can efficiently compress quality scores of terabyte-sized and larger sequencing datasets.
An implementation of RQS is here.
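
To make the abstract's central idea a little more concrete, here is a minimal Python sketch of how a k-mer profile built from a corpus of reads could be used to sparsify quality scores. It is not the authors' implementation. The function names (`kmer_counts`, `build_dictionary`, `sparsify_quality`), the `min_count` threshold, the exact-match lookup, and the `default_q` placeholder character are all assumptions made for illustration. Bases covered by a frequently seen k-mer have their quality score replaced by one constant value, so long runs of identical symbols compress well, while the remaining scores are kept as they are.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_dictionary(reads, k, min_count):
    """Keep the k-mers seen at least `min_count` times in the corpus.

    Assumption: a plain frequency threshold stands in for the
    k-mer density profile mentioned in the abstract.
    """
    return {kmer for kmer, n in kmer_counts(reads, k).items() if n >= min_count}


def sparsify_quality(seq, qual, dictionary, k, default_q="I"):
    """Replace quality scores at positions covered by a dictionary k-mer.

    Assumption: exact k-mer matching and a single hypothetical
    placeholder symbol `default_q` for the discarded scores.
    """
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in dictionary:
            for j in range(i, i + k):
                covered[j] = True
    # Covered positions get the constant score; the rest are untouched.
    return "".join(default_q if c else q for c, q in zip(covered, qual))


if __name__ == "__main__":
    corpus = ["ACGTACGT", "ACGTACGA", "TTGTACGT"]
    dictionary = build_dictionary(corpus, k=4, min_count=2)
    print(sparsify_quality("TTGTACGT", "!#%&()+-", dictionary, k=4))
```

In this toy run the first two bases of the read are not covered by any frequent 4-mer, so their original scores survive, and the other six collapse to the constant. A real dataset would call for a much larger k and a corpus drawn from many individuals, as the abstract describes.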

Since then, two papers have mentioned this paper:

