From [1]
In Sunday Morning Insight: Structured Sparsity and Structural DNA Folding Information, I mentioned the following:
At the very least, there ought to be a trace of the cyclicality induced by the 147 base pairs wrapped into the nucleosomes.
So I went ahead and asked whether the specialists had noticed any cyclicality in genomic studies, given that the 147 base pairs wrapped around the histones are in fact physical neighbors:
I was wondering if the wrapping of the DNA around the nucleosome had been picked up in GWAS studies or otherwise.
Here was the answer:
While I believe that this can be an important signal, I am not aware of existing GWASs linking this.
Wow! Maybe with the right transform [2] or the right regularization....
[1] Epigenetic Control: Regulating Access to Genes within the Chromosome, Connexions course
[2] ScatNet: An implementation of Scattering Networks transforms and classification algorithms
3 comments:
What kind of periodicity would you expect? We know that nucleosome positioning is mainly defined by a few anchors like insulators and promoters and then the surrounding nucleosomes arrange themselves around those.
Aaron,
Here is the thing: I don't want to know in advance what that cyclicality is. However, given the physical positioning, it would seem that if we have a large genomic database, we ought to pick up on it, unless I am missing something, which I could be since I am not a specialist. Finding this cyclicality is probably a combinatorial problem, but what compressive sensing shows us is that some of these problems can be attacked with the right regularization, hence the note at the end of the blog entry.
Igor.
Note that, as far as I know, we don't expect a periodicity of 147bp throughout the genome: indeed, while each nucleosome corresponds to a stretch of 147 bp on the DNA, the distance between two successive nucleosomes (called linker DNA) is variable.
The structure of the 147bp-long DNA sequence that forms a nucleosome has been studied a lot. For example it has been observed that you tend to have a 10bp periodicity within the 147bp sequence; based on these properties, some methods have been developed to predict where the nucleosomes are on the genome, see e.g.:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310902/
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2522279/?tool=pubmed
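To make the 10bp periodicity idea concrete, here is a minimal sketch of how one could look for it in a sequence. Everything in it is an illustrative assumption rather than the method of the papers above: the choice of AA/TT/TA dinucleotides as the marker, the function names, and the toy sequence, which is synthetic and has the motif planted by hand. It simply measures the spectral power of a dinucleotide indicator signal at a range of candidate periods.

```python
import cmath
import random


def dinucleotide_indicator(seq, dinucs=("AA", "TT", "TA")):
    """1.0 where a marker dinucleotide starts, else 0.0 (marker set is an assumption)."""
    return [1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)]


def power_at_period(signal, period):
    """Power of the mean-removed signal at one period (a single DFT-like coefficient)."""
    mean = sum(signal) / len(signal)
    acc = sum((x - mean) * cmath.exp(-2j * cmath.pi * n / period)
              for n, x in enumerate(signal))
    return abs(acc) ** 2 / len(signal)


def make_toy_sequence(length=147, period=10, seed=0):
    """Hypothetical test data: random bases with 'AA' planted every `period` bp."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGT") for _ in range(length)]
    for start in range(0, length - 1, period):
        seq[start:start + 2] = "AA"
    return "".join(seq)


def strongest_period(seq, periods=range(6, 16)):
    """Candidate period with the most power.

    The scan is limited to 6-15 bp on purpose: a regularly spaced motif also
    has power at harmonics such as 5 bp, which would otherwise compete.
    """
    signal = dinucleotide_indicator(seq)
    return max(periods, key=lambda p: power_at_period(signal, p))


if __name__ == "__main__":
    toy = make_toy_sequence(length=1470)
    print("strongest period (bp):", strongest_period(toy))
```

On real data the signal is a statistical tendency rather than a planted motif, so one would average the power over many aligned nucleosomal sequences instead of reading it off a single one.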
As for the link with GWAS studies, where we collect many genomes to find correlations between DNA variations between people and diseases, I am not sure how useful the nucleosome positioning information could be. Importantly, note that the frequency of letters that vary between individuals (called SNPs) is of order 1/1000, quite low compared to the 147bp length of a nucleosome.
This being said, analysis of nucleosome positioning and how it relates to other epigenetic signals (such as DNA methylation) is a hot topic. For example, in this recent paper (and references therein), the authors analyze how DNA methylation patterns are correlated with nucleosome conformation:
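As a back-of-the-envelope check of that comparison, the short sketch below computes how many variable positions one would expect inside a single 147bp window at a rate of about 1/1000. The function names are hypothetical, and the second function assumes SNP sites are independent and uniformly spread along the genome, which is a simplification made only for this illustration.

```python
def expected_snps(window_bp=147, snp_rate=1 / 1000):
    """Expected number of variable sites in a window of `window_bp` bases."""
    return window_bp * snp_rate


def prob_at_least_one_snp(window_bp=147, snp_rate=1 / 1000):
    """Chance that the window holds at least one variable site.

    Assumes independent, uniformly distributed SNP sites (a simplification).
    """
    return 1.0 - (1.0 - snp_rate) ** window_bp


if __name__ == "__main__":
    print("expected SNPs per nucleosome:", expected_snps())
    print("P(at least one SNP):", round(prob_at_least_one_snp(), 3))
```

Under these assumptions the expectation is 0.147 variable sites per nucleosome, so most nucleosome-length windows would carry no SNP at all.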
http://www.nature.com/srep/2013/130702/srep02121/full/srep02121.html
Overall, how to represent or encode the multi-scale structure of DNA is a wide open problem. It would be nice if scattering networks or other "modern" ideas in signal processing and machine learning could bring new tools!
jp