Tuesday, December 10, 2013

How come we don't see this? Part Deux

From [1]

At the very least, there ought to be a trace of the cyclicality induced by the 147 base pairs of DNA wrapped around each nucleosome.
The idea is that in GWAS studies we probably ought to see some signature of the fact that several base pairs are physically close to each other even when they are "far" from each other if one reads the genome sequence linearly.

Jean-Philippe Vert commented:

Note that, as far as I know, we don't expect a periodicity of 147bp throughout the genome: indeed, while each nucleosome corresponds to a stretch of 147 bp on the DNA, the distance between two successive nucleosomes (called linker DNA) is variable.

The structure of the 147bp-long DNA sequence that forms a nucleosome has been studied extensively. For example, it has been observed that certain dinucleotides tend to show a ~10bp periodicity within the 147bp sequence; based on these properties, some methods have been developed to predict where the nucleosomes are located on the genome, see e.g.:
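To make the idea concrete, here is a toy sketch in Python. It is our own illustration, not one of the published predictors: it assumes, as a simplification, that only AA/TT dinucleotides carry the ~10 bp signal, and it scores each 147 bp window by the largest number of AA/TT steps that share a single phase of a 10 bp grid. The function names (`aa_tt_positions`, `periodic_score`, `best_window`) and the synthetic demo sequence are hypothetical.

```python
NUC_LEN = 147  # bp of DNA wrapped around one nucleosome
PERIOD = 10    # approximate DNA helical repeat, in bp

def aa_tt_positions(seq):
    """Start indices of AA or TT dinucleotides in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in ("AA", "TT")]

def periodic_score(window):
    """Largest number of AA/TT steps that share one phase of a 10 bp grid."""
    counts = [0] * PERIOD
    for i in aa_tt_positions(window):
        counts[i % PERIOD] += 1
    return max(counts)

def best_window(seq):
    """(start, score) of the highest-scoring 147 bp window; earliest start wins ties."""
    best = (0, -1)
    for start in range(len(seq) - NUC_LEN + 1):
        score = periodic_score(seq[start:start + NUC_LEN])
        if score > best[1]:
            best = (start, score)
    return best

# Synthetic sequence: GC background with an AA step every 10 bp from position 100 to 240.
demo = "GC" * 50 + "AAGCGCGCGC" * 15 + "GC" * 50
print(best_window(demo))  # (95, 15)
```

Real predictors use many more dinucleotides, learned position-specific weights and a proper probabilistic model; this sketch only shows why a phased ~10 bp signal can pick out a 147 bp window.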

As for the link with GWAS studies, where we collect many genomes to find correlations between DNA variations among people and diseases, I am not sure how useful the nucleosome positioning information could be. Importantly, note that the frequency of letters that vary between individuals (called SNPs) is of order 1/1000, quite low compared to the 147bp length of a nucleosome: on average, a nucleosome carries fewer than one SNP.
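A back-of-the-envelope check of that last point, sketched in Python. It assumes (our simplification, since real SNP density varies along the genome) that SNPs fall independently and uniformly at a rate of 1/1000 per base, so the number of SNPs inside one 147 bp nucleosome is approximately Poisson distributed.

```python
import math

SNP_RATE = 1 / 1000  # assumed uniform SNP rate per base (order of magnitude from the comment)
NUC_LEN = 147        # bp per nucleosome

# Expected number of SNPs in one nucleosome under the uniform-rate assumption.
lam = SNP_RATE * NUC_LEN

# Poisson probabilities of seeing 0 and exactly 1 SNP in the same nucleosome.
p0 = math.exp(-lam)
p1 = lam * math.exp(-lam)
p_two_or_more = 1 - p0 - p1

print(f"expected SNPs per nucleosome: {lam:.3f}")                  # 0.147
print(f"P(at least 2 SNPs in one nucleosome): {p_two_or_more:.4f}")  # 0.0098
```

Under this crude model only about 1% of nucleosomes would carry two or more SNPs, which is one way to read the caution about using nucleosome positioning in GWAS.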

This being said, the analysis of nucleosome positioning and how it relates to other epigenetic signals (such as DNA methylation) is a hot topic; for example, in this recent paper (and references therein) the authors analyze how DNA methylation patterns are correlated with nucleosome conformation:

Overall, how to represent or encode the multi-scale structure of DNA is a wide-open problem; it would be nice if scattering networks or other "modern" ideas in signal processing and machine learning could bring new tools!

Thanks Jean-Philippe.

In an e-mail, Mohammed AlQuraishi also mentioned:

Hi Igor,
...The point you make is interesting and I do wonder if it can be exploited somehow. Btw, you may be interested in reading this: 
It's clear that the genome exhibits a great deal of spatial organization, although I imagine that finding the right "prior" for something like this is non-trivial.
Thanks Mohammed.

Here are all the papers mentioned by both commenters:

DNA sequences that are present in nucleosomes have a preferential ∼10 bp periodicity of certain dinucleotide signals (1,2), but the overall sequence similarity of the nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. We assume that a periodic dinucleotide signal of any type emits according to a probability distribution around a series of ‘hot spots’ that are equally spaced along nucleosomal DNA with 10 bp period, but with a 1 bp phase shift across the middle of the nucleosome. We model the three statistically most significant dinucleotide signals, AA/TT, GC and TA, simultaneously, while allowing phase shifts between the signals. The alignment is obtained by maximizing the likelihood of both Watson and Crick strands simultaneously. The resulting alignment of 177 chicken nucleosomal DNA sequences revealed that all 10 distinct dinucleotides are periodic, however, with only two distinct phases and varying intensity. By Fourier analysis, we show that our new alignment has enhanced periodicity and sequence identity compared with center alignment. The significance of the nucleosomal DNA sequence alignment is evaluated by comparing it with that obtained using the same model on non-nucleosomal sequences.
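A rough sketch of the Fourier side of this, in Python: build a per-position AA/TT frequency profile from sequences that are assumed to be already aligned, then ask which integer period carries the most spectral power. This is not the paper's mixture model (no alignment is performed here), and the helper names and synthetic sequences are hypothetical.

```python
import cmath

def dinucleotide_profile(seqs, dinucs=("AA", "TT")):
    """Fraction of aligned sequences with one of `dinucs` starting at each position."""
    length = min(len(s) for s in seqs) - 1
    return [sum(1 for s in seqs if s[i:i + 2].upper() in dinucs) / len(seqs)
            for i in range(length)]

def power_at_period(profile, period):
    """Squared magnitude of the mean-removed profile's Fourier component at `period` bp."""
    mean = sum(profile) / len(profile)
    coeff = sum((x - mean) * cmath.exp(-2j * cmath.pi * k / period)
                for k, x in enumerate(profile))
    return abs(coeff) ** 2

def dominant_period(profile, periods=range(2, 21)):
    """Integer period (bp) in `periods` with the largest Fourier power."""
    return max(periods, key=lambda p: power_at_period(profile, p))

# Hypothetical "aligned" sequences with an A-tract every 10 bp.
aligned = ["AAAAAGCGCG" * 15, "AAAAAGCGCG" * 14 + "GCGCGCGCGC"]
profile = dinucleotide_profile(aligned)
print(dominant_period(profile))  # 10
```

Removing the mean before the transform keeps the constant (zero-frequency) part of the profile from leaking into the periodic components being compared.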

The exact lengths of linker DNAs connecting adjacent nucleosomes specify the intrinsic three-dimensional structures of eukaryotic chromatin fibers. Some studies suggest that linker DNA lengths preferentially occur at certain quantized values, differing one from another by integral multiples of the DNA helical repeat, ∼10 bp; however, studies in the literature are inconsistent. Here, we investigate linker DNA length distributions in the yeast Saccharomyces cerevisiae genome, using two novel methods: a Fourier analysis of genomic dinucleotide periodicities adjacent to experimentally mapped nucleosomes and a duration hidden Markov model applied to experimentally defined dinucleosomes. Both methods reveal that linker DNA lengths in yeast are preferentially periodic at the DNA helical repeat (∼10 bp), obeying the forms 10n+5 bp (integer n). This 10 bp periodicity implies an ordered superhelical intrinsic structure for the average chromatin fiber in yeast.
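A far cruder check than the paper's Fourier analysis or duration hidden Markov model, sketched in Python: given sorted nucleosome start coordinates (hypothetical values below, assuming every nucleosome covers exactly 147 bp), compute the linker lengths and tally them by residue modulo 10. A preference for the 10n+5 form would show up as a peak at residue 5.

```python
NUC_LEN = 147  # assumed footprint of each nucleosome, in bp

def linker_lengths(starts):
    """Linker DNA length between consecutive nucleosomes, given sorted start coordinates."""
    return [b - (a + NUC_LEN) for a, b in zip(starts, starts[1:])]

def phase_histogram(lengths, period=10):
    """Count linker lengths by residue modulo the ~10 bp helical repeat.

    Python's % returns a non-negative residue, so overlapping calls
    (negative "linkers") still land in a valid bin.
    """
    hist = [0] * period
    for length in lengths:
        hist[length % period] += 1
    return hist

# Hypothetical nucleosome start positions along one stretch of chromosome.
starts = [0, 162, 334, 496, 658, 825]
linkers = linker_lengths(starts)
print(linkers)                   # [15, 25, 15, 15, 20]
print(phase_histogram(linkers))  # [1, 0, 0, 0, 0, 4, 0, 0, 0, 0]
```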

DNA Methylation Regulated Nucleosome Dynamics by Isabel Jimenez-Useche, Jiaying Ke, Yuqing Tian, Daphne Shim, Steven C. Howell, Xiangyun Qiu & Chongli Yuan
A strong correlation between nucleosome positioning and DNA methylation patterns has been reported in the literature. However, the mechanistic model accounting for the correlation remains elusive. In this study, we evaluated the effects of specific DNA methylation patterns on modulating nucleosome conformation and stability using FRET and SAXS. CpG dinucleotide repeats at 10 bp intervals were found to play different roles in nucleosome stability dependent on their methylation states and their relative nucleosomal locations. An additional (CpG)5 stretch located in the nucleosomal central dyad does not alter the nucleosome conformation, but significant conformational differences were observed between the unmethylated and methylated nucleosomes. These findings suggest that the correlation between nucleosome positioning and DNA methylation patterns can arise from the variations in nucleosome stability dependent on their sequence and epigenetic content. This knowledge will help to reveal the detailed role of DNA methylation in regulating chromatin packaging and gene transcription.

We describe Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. We constructed spatial proximity maps of the human genome with Hi-C at a resolution of 1 megabase. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free, polymer conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.
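The fractal-globule claim is usually checked through the contact probability P(s) as a function of genomic separation s: the fractal globule predicts P(s) ∝ s^-1, whereas the equilibrium globule predicts s^-3/2. Below is a minimal Python sketch of that log-log fit on a binned contact matrix. The matrix is a toy built to follow 1/s exactly, not real Hi-C data, and the function names are hypothetical; real analyses also restrict the fit to a range of distances.

```python
import math

def contact_vs_distance(matrix):
    """Mean contact count at each separation s = |i - j| (in bins), for s >= 1."""
    n = len(matrix)
    return {s: sum(matrix[i][i + s] for i in range(n - s)) / (n - s)
            for s in range(1, n)}

def scaling_exponent(p_of_s):
    """Least-squares slope of log P(s) versus log s, skipping empty distances."""
    points = [(math.log(s), math.log(p)) for s, p in p_of_s.items() if p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

# Toy symmetric matrix whose contacts fall off exactly as 1/s.
N_BINS = 50
toy = [[0.0 if i == j else 1.0 / abs(i - j) for j in range(N_BINS)]
       for i in range(N_BINS)]
slope = scaling_exponent(contact_vs_distance(toy))
print(f"fitted exponent: {slope:.3f}")  # fitted exponent: -1.000
```

A fitted exponent near -1 on real data would be read as consistent with the fractal globule, and one near -1.5 with the equilibrium globule.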
