A while back, we saw that each individual's microbiome does not seem to change much over time. But at any given time t, how does the microbiome differ among the seven billion individuals currently on Earth? Thanks to Twitter and Jason Moore, I came across a paper (and attendant code) that sets out to answer that question with a dictionary learning approach.
The code can be found here.
Metagenomics, the study of the total genetic material isolated from a biological host, promises to reveal host-microbe or microbe-microbe interactions that may help to personalize medicine or improve agronomic practice. We introduce a method that discovers metagenomic units (MGUs) relevant for phenotype prediction through sequence-based dictionary learning. The method aggregates patient-specific dictionaries and estimates MGU abundances in order to summarize a whole population and yield universally predictive biomarkers. We analyze the impact of Gaussian, Poisson, and Negative Binomial read count models in guiding dictionary construction by examining classification efficiency on a number of synthetic datasets and a real dataset from Ref. 1. Each outperforms standard methods of dictionary composition, such as random projection and orthogonal matching pursuit. Additionally, the predictive MGUs they recover are biologically relevant.
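To make the pipeline in the abstract concrete, here is a minimal sketch of sequence-free dictionary learning on a toy count matrix using scikit-learn. This is not the paper's method: it uses a Gaussian model with OMP sparse coding rather than the Poisson or Negative Binomial variants, and the synthetic matrix is a stand-in for real read counts. Learned dictionary atoms play the role of MGUs, and the sparse codes act as per-sample MGU abundances fed to a classifier.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy "read count" matrix: 40 samples x 60 features, generated from a
# hidden dictionary of 8 atoms with sparse nonnegative codes.
n_samples, n_features, n_atoms = 40, 60, 8
true_dict = rng.poisson(3.0, size=(n_atoms, n_features)).astype(float)
codes = rng.random((n_samples, n_atoms)) * (rng.random((n_samples, n_atoms)) < 0.3)
X = codes @ true_dict + rng.normal(scale=0.1, size=(n_samples, n_features))

# Learn a dictionary; rows of dl.components_ play the role of MGUs,
# and the sparse codes act as per-sample MGU "abundances".
dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="omp",
                        transform_n_nonzero_coefs=5, random_state=0)
abundances = dl.fit_transform(X)

# Use the abundances as features for phenotype prediction.
y = (codes[:, 0] > 0).astype(int)  # toy phenotype tied to one latent atom
clf = LogisticRegression(max_iter=1000).fit(abundances, y)
print(abundances.shape, clf.score(abundances, y))
```

Swapping the Gaussian reconstruction objective for a Poisson or Negative Binomial likelihood, as the paper does, changes how atoms are fit to the counts but leaves this overall dictionary-then-classify structure intact.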
I wonder how these greedy algorithms scale to very large databases, and how different the output would be if one used other dictionary learning techniques (especially those tending toward structured sparsity). The synthetic data were derived from this Human Microbiome dataset.
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.