Friday, September 27, 2013

Accurate Profiling of Microbial Communities from Massively Parallel Sequencing using Convex Optimization - implementation -

Or Zuk just sent me the following:

Dear Igor,

We've just uploaded to the arXiv a manuscript which might be of interest to Nuit Blanche's readers.
Thanks Or! We've heard of the microbiome before; see the recent Saturday Morning Videos on Human Microbiome Science.

We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper-bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with $\sim10^6$ species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution.
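To make the convex formulation in the abstract a little more concrete, here is a minimal toy sketch in Python. It is not the COMPASS code and not the authors' divide-and-conquer algorithm. It assumes that each species induces a known distribution over distinct read types (the columns of a made-up matrix `read_distributions`), that the observed read frequencies are a mixture of those columns, and that the species frequencies are recovered by least squares constrained to the probability simplex. The solver is plain projected gradient descent, swapped in for illustration; all names and numbers are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold of the last index that passes
    return [max(v_i - theta, 0.0) for v_i in v]


def reconstruct_frequencies(A, y, iterations=2000):
    """Minimize 0.5 * ||A x - y||^2 over the simplex by projected gradient.

    A[i][j]: probability that species j produces read type i (assumed known).
    y[i]:    observed frequency of read type i.
    """
    n_reads, n_species = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n_species] * n_species  # start from the uniform community
    for _ in range(iterations):
        residual = [
            sum(A[i][j] * x[j] for j in range(n_species)) - y[i]
            for i in range(n_reads)
        ]
        gradient = [
            sum(A[i][j] * residual[i] for i in range(n_reads))
            for j in range(n_species)
        ]
        x = project_to_simplex(
            [x[j] - step * gradient[j] for j in range(n_species)]
        )
    return x


# Hypothetical toy data: 4 read types (rows), 3 candidate species (columns).
read_distributions = [
    [0.5, 0.0, 0.25],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.25],
    [0.0, 0.0, 0.5],
]
true_frequencies = [0.7, 0.0, 0.3]  # sparse community: species 1 is absent
observed = [
    sum(row[j] * true_frequencies[j] for j in range(3))
    for row in read_distributions
]

estimate = reconstruct_frequencies(read_distributions, observed)
print([round(value, 4) for value in estimate])
```

In this noise-free toy case the columns of the matrix are linearly independent, which plays the role of the identifiability condition mentioned in the abstract: the mixture has a unique explanation, so the estimate should match the true frequencies up to numerical tolerance. With a finite number of reads, `observed` would be a noisy version of the mixture and the estimate would only be approximate.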
I note:

In the spirit of reproducible research, we have implemented all of our algorithms in the Matlab package COMPASS (Convex Optimization for Microbial Profiling by Aggregating Short Sequence reads), which is freely available at github:
