Back in 2012, in predicting the future: The Steamrollers, we mentioned how sensors, especially in genomics would change our fields. On Friday, based on one of this earlier observation and continuous monitoring of the technologies involved, we mentioned that we are potentially witnessing a Second Inflection Point in Genome Sequencing ? and then wondered what it would entail briefly. In the second part of that prediction, we envisionned some of the algorithmic tools that might be used to make sense of the data coming out of the steamrollers (Predicting the Future: Randomness and Parsimony). Here is one piece of that puzzle: Charalampos (Babis) Tsourakakis's thesis entitled Mathematical and Algorithmic Analysis of Network and Biological Data that provides a year old view on the subject. For more, his arxiv preprints can be found here. You might also be interested in reading his tutorial on Algorithmic techniques for modeling and mining large graphs (AMAzING). From the thesis abstract:
"....Our contributions to computational cancer biology include new models, theoretical insights into existing models, novel algorithmic techniques and detailed experimental analysis of various datasets. Speciﬁcally, we contribute the following.• Novel algorithmic techniques for denoising array comparative genomic hybridization data. Our algorithmic results are of independent interest and provide approximation techniques for speeding up dynamic programming.• Based on empirical ﬁndings which strongly indicate an inherent geometric structure in cancer genomic data, we introduce a geometric model for ﬁnding subtypes of cancer which overcomes diﬃculties of existing methods such as principal/independent component analysis and separation methods for Gaussians. We provide a computational method which solves the optimization problem eﬃciently and is robust to outliers.• We ﬁnd the necessary and suﬃcient conditions to reconstruct uniquely an oncogenetic tree, a popular tumor phylogenetic method."
Some of the interesting bits out of the numerous interesting information in this thesis, can be found in how graphs evolve over time.
Page 244, Chapter 13 also features open problems.
Also not directly related, well kinda, the List of Open Problems in Sublinear Algorithms has been updated since the recent 2014 Bertinoro Workshop. Here they are:
- Problem 61: RNA Folding
- Problem 62: Principal Component Analysis with Nonnegativity Constraints
- Problem 63: Submodular Matching Maximization
- Problem 64: Matchings in the Turnstile Model
- Problem 65: Communication Complexity of Connectivity
- Problem 66: Distinguishing Distributions with Conditional Samples
- Problem 67: Difficult Instance for Max-Cut in the Streaming Model
- Problem 68: Approximating Rank in the Bounded-Degree Model
The slides of the Bertinoro workshop are here.
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.