Nuit Blanche: Videos: Group Testing Designs, Algorithms, and Applications to Biology

Monday, February 20, 2012

Here are four videos from the IMA meeting on Group Testing Designs, Algorithms, and Applications to Biology. Enjoy!

Anna Gilbert - University of Michigan http://www.math.lsa.umich.edu/~annacg/

Tutorial: Sparse signal recovery
February 13, 2012 11:15 am - 12:15 pm

Keywords of the presentation: streaming algorithms, group testing, sparse signal recovery, introduction.

My talk will be a tutorial about sparse signal recovery but, more importantly, I will provide an overview of what the research problems are at the intersection of biological applications of group testing, streaming algorithms, sparse signal recovery, and coding theory. The talk should help set the stage for the rest of the workshop.

Yaniv Erlich - Whitehead Institute for Biomedical Research http://jura.wi.mit.edu/erlich/main.html

Tutorial: Cost effective sequencing of rare genetic variations
February 13, 2012 10:00 am - 11:00 am

Lecture Video (flv)

In the past few years, we have experienced a paradigm shift in human genetics. Accumulating lines of evidence have highlighted the pivotal role of rare genetic variations in a wide variety of traits and diseases. Studying rare variations is a needle-in-a-haystack problem, as large cohorts have to be assayed in order to trap the variations and gain statistical power. The performance of DNA sequencing is growing exponentially, providing sufficient capacity to profile an extensive number of specimens. However, sample preparation schemes do not scale with sequencing capacity. A brute-force approach of preparing hundreds to thousands of specimens for sequencing is cumbersome and cost-prohibitive. The next challenge, therefore, is to develop a scalable technique that circumvents the bottleneck in sample preparation.

My tutorial will provide background on rare genetic variations and DNA sequencing. I will present our sample prep strategy, called DNA Sudoku, which uses a combinatorial pooling/compressed sensing approach to find rare genetic variations. More importantly, I will discuss several major distinctions from the classical combinatorial setting that arise from sequencing-specific constraints.
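To make the idea of combinatorial pooling concrete, here is a minimal sketch in Python. It is not taken from the talk: the abstract does not describe the actual pooling rule, so the modular (Chinese-remainder-style) assignment below, the function names, and the toy numbers are all assumptions chosen for illustration. Each specimen is placed in one pool per modulus, every pool is assayed once, and a specimen is called a candidate carrier only if all of its pools test positive.

```python
def build_pools(n_specimens, moduli):
    """Hypothetical design: specimen i joins pool (w, i % w) for every modulus w."""
    pools = {}
    for w in moduli:
        for r in range(w):
            pools[(w, r)] = [i for i in range(n_specimens) if i % w == r]
    return pools


def run_assay(pools, carriers):
    """Idealized, error-free assay: a pool is positive if it holds any carrier."""
    return {key: any(i in carriers for i in members)
            for key, members in pools.items()}


def decode(n_specimens, moduli, results):
    """Keep only the specimens whose pools are all positive."""
    return [i for i in range(n_specimens)
            if all(results[(w, i % w)] for w in moduli)]


moduli = (3, 4, 5)              # pairwise coprime, so 3 * 4 * 5 = 60 >= 20 specimens
pools = build_pools(20, moduli)
results = run_assay(pools, carriers={7})
candidates = decode(20, moduli, results)
print(len(pools), candidates)   # 12 pools instead of 20 individual preps
```

With a single carrier the decoding is unambiguous because the three residues identify the specimen uniquely. With several carriers, false candidates can appear, and real sequencing data add noise and pool-size limits, which is the kind of sequencing-specific constraint the abstract alludes to.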

Noam Shental - Open University of Israel http://www.openu.ac.il/home/shental/

Identification of rare alleles and their carriers using compressed se(que)nsing
February 13, 2012 2:00 pm - 3:00 pm

Lecture Video (flv)

Keywords of the presentation: compressed sensing, group testing, genetics, rare alleles

Identification of rare variants by resequencing is important both for detecting novel variations and for screening individuals for known disease alleles. New technologies enable low-cost resequencing of target regions, although it is still prohibitively expensive to test more than a few individuals. We propose a novel pooling design that enables the recovery of novel or known rare alleles and their carriers in groups of individuals. The method is based on combining next-generation sequencing technology with a Compressed Sensing (CS) approach. The approach is general, simple and efficient, allowing for simultaneous identification of multiple variants and their carriers. It reduces experimental costs, i.e., both sample preparation costs and direct sequencing costs, by up to 70-fold, thus allowing much larger cohorts to be scanned. We demonstrate the performance of our approach on several publicly available data sets, including the 1000 Genomes Pilot 3 study. We believe our approach may significantly improve the cost effectiveness of future association studies and of screening large DNA cohorts for specific risk alleles.
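The compressed sensing view of pooling can be sketched as follows. This is an illustrative toy, not the authors' method: the binary pooling matrix, the noise-free readout, and the brute-force search for the sparsest consistent genotype vector are all assumptions made here, and the function names are hypothetical. Practical CS solvers replace the exhaustive search with convex relaxation or similar techniques and must cope with read noise. Genotypes are coded as 0, 1 or 2 copies of the rare allele, and each pool reports the total allele count of its members.

```python
from itertools import combinations, product


def pooling_matrix(n_individuals, n_pools):
    """Assumed design: individual i joins pool j if bit j of (i + 1) is set."""
    return [[((i + 1) >> j) & 1 for i in range(n_individuals)]
            for j in range(n_pools)]


def measure(matrix, genotypes):
    """Idealized, noise-free readout: rare-allele count in each pool."""
    return [sum(m * g for m, g in zip(row, genotypes)) for row in matrix]


def sparsest_solution(matrix, readout, max_carriers=2):
    """Brute-force search for the genotype vector with the fewest carriers
    that reproduces the readout; returns {individual: allele_count} or None."""
    n = len(matrix[0])
    for k in range(max_carriers + 1):
        for support in combinations(range(n), k):
            for values in product((1, 2), repeat=k):
                guess = [0] * n
                for i, v in zip(support, values):
                    guess[i] = v
                if measure(matrix, guess) == readout:
                    return dict(zip(support, values))
    return None


M = pooling_matrix(15, 4)       # 4 pools cover 15 individuals
truth = [0] * 15
truth[10] = 2                   # one homozygous carrier
y = measure(M, truth)
recovered = sparsest_solution(M, y)
print(y, recovered)
```

Because the readout is quantitative rather than a yes/no result, the same four pools reveal both who the carrier is and how many copies of the allele they hold, which is what distinguishes this setting from classical group testing.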

We will present initial results of two projects that were initiated following publication. The first project concerns identification of de novo SNPs in genetic disorders common among Ashkenazi Jews, based on sequencing 3000 DNA samples. The second project in plant genetics involves identifying SNPs related to water and silica homeostasis in Sorghum bicolor, based on sequencing 3000 DNA samples using 1-2 Illumina lanes.

Joint work with Amnon Amir from the Weizmann Institute of Science, and Or Zuk from the Broad Institute of MIT and Harvard.

Sharon Aviran - University of California, Berkeley http://bio.math.berkeley.edu/aviran/

RNA Structure Characterization from High-Throughput Chemical Mapping Experiments
February 13, 2012 3:45 pm - 4:45 pm

Lecture Video (flv)

Keywords of the presentation: RNA structure characterization, high-throughput sequencing, maximum likelihood estimation

New regulatory roles continue to emerge for both natural and engineered noncoding RNAs, many of which have specific secondary and tertiary structures essential to their function. This highlights a growing need to develop technologies that enable rapid and accurate characterization of structural features within complex RNA populations. Yet, available structure characterization techniques that are reliable are also vastly limited by technological constraints, while the accuracy of popular computational methods is generally poor. These limitations thus pose a major barrier to the comprehensive determination of structure from sequence and thereby to the development of mechanistic understanding of transcriptome dynamics. To address this need, we have recently developed a high-throughput structure characterization technique, called SHAPE-Seq, which simultaneously measures quantitative, single nucleotide-resolution, secondary and tertiary structural information for hundreds of RNA molecules of arbitrary sequence. SHAPE-Seq combines selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE) chemical mapping with multiplexed paired-end deep sequencing of primer extension products. This generates millions of sequencing reads, which are then analyzed using a fully automated data analysis pipeline. Previous bioinformatics methods, in contrast, are laborious, heuristic, and expert-based, and thus prohibit high-throughput chemical mapping.

In this talk, I will review recent developments in experimental RNA structure characterization as well as advances in sequencing technologies. I will then describe the SHAPE-Seq technique, focusing on its automated data analysis method, which relies on a novel probabilistic model of a SHAPE-Seq experiment, adjoined by a rigorous maximum likelihood estimation framework. I will demonstrate the accuracy and simplicity of our approach as well as its applicability to a general class of chemical mapping techniques and to more traditional SHAPE experiments that use capillary electrophoresis to identify and quantify primer extension products.

This is joint work with Lior Pachter, Julius Lucks, Stefanie Mortimer, Shujun Luo, Cole Trapnell, Gary Schroth, Jennifer Doudna and Adam Arkin.