Monday, March 03, 2014

Machine Learning Methods in the Computational Biology of Cancer

I just saw this very interesting and well written paper recently: Machine Learning Methods in the Computational Biology of Cancer by Mathukumalli Vidyasagar
The objectives of this "perspective" paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools to advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented, and is applied to predict the time to tumor recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented.

 In the conclusion one can read the following:
....In order to apply techniques from compressed sensing theory to cancer biology, it would be necessary to modify the theory to the case where the measurement matrix is given, and not chosen by the user. The RIP corresponds to the assumption that in an m x n matrix A, every choice of k columns results in a nearly orthogonal set. In actual biological data, such an assumption has no hope of being true, because the expression levels of some genes would be highly correlated with those of other genes. In Cand es and Plan (2009), the authors suggest that it is possible to handle this situation by first clustering the column vectors and then choosing just one exemplar from each cluster before applying the theory. Our preliminary attempts to apply such an approach to ovarian cancer data (The Cancer Genome Atlas Network (2011)) are not very promising, leading to RIP orders of 5 or 10 { far too small to be of practical use. Thus there is a need for the development of other heuristics besides clustering to extract nearly orthogonal sets of columns for actual measurement matrices. In this connection it is worth pointing out Huang and Zhang (2010) that group RIP is easier to achieve using random projections, as compared to RIP. However, it is not clear whether a \given" A matrix is likely to satisfy a group RIP with a sufficiently large order.....

 [emphasis mine] We have known for quite some time now that an RIP condition is:

  • just a sufficient condition
  • too pessimistic in many cases 
For instance, back in 2008 in his presentation at Texas A&M [6],  Jared Tanner showed us this slide

see the red line, this is the RIP transition below it, you can safely reconstruct your signal, above you can't. Oh you don't see a "below", that's right this is how bad RIP for actual computations is (see [7] for a good overview of how the Donoho-Tanner phase transition changes this)

Quite simply they looked at their undersampled linear system of equations and figured out if an algorithm could provide then with a clue on the size of the sample population based on the fact that the complexity of this algorithm allowed it to solve cases only in a specific region of the phase space. In the paper, while they don't have clear cut reconstructions, they find out that some of the parameters actually go through some sharp phase transitions as found in the noiseless case. There seems to be a lot of low hanging fruits here.

In a recent paper by Cand\'{e}s and Plan, it is shown that the LASSO algorithm exhibits near-ideal behavior, in the following sense: Suppose y=Az+η where A satisfies the restricted isometry property (RIP), and ∥η∥2≤ϵ. Then minimizing ∥z∥1 subject to ∥y−Az∥2≤ϵ leads to an estimate x^ whose error ∥x^−x∥2 is bounded by a universal constant times the error achieved by an "oracle" that knows the location of the nonzero components of x. In the world of optimization, the LASSO algorithm has been generalized in several directions such as the group LASSO, the sparse group LASSO, either without or with tree-structured overlapping groups, and most recently, the sorted LASSO. In this paper, it is shown that {\it any algorithm\/} exhibits near-ideal behavior in the above sense, provided only that (i) the norm used to define the sparsity index is "decomposable," (ii) the penalty norm that is minimized in an effort to enforce sparsity is "γ-decomposable," and (iii) a "compressibility condition" in terms of a group restricted isometry property is satisfied. Specifically, the group LASSO, and the sparse group LASSO (with some permissible overlap in the groups), as well as the sorted ℓ1-norm minimization all exhibit near-ideal behavior. Explicit bounds on the residual error are derived that contain previously known results as special cases.
from the paper:
We partition the set of 20, 000 measurements into g = 6, 000 sets representing the number of pathways that we wish to study, and we take lmin = 4, meaning that the shortest pathway of interest has four genes. Therefore we can take smax = ⌊k/lmin⌋ = 5. Finally, let us take ζ = 10−6. With these numbers, it
is readily verified that

mS = 71, 286, mGS = 47, 960.
In other words, both values of m are larger than n! Therefore one can only conclude that the known bounds for m are too coarse to be of practical use at least in computational biology, though perhaps they might be of use in other applications where n is a few orders of magnitude larger.
Again, an RIP analysis is fine because it provides you with some sorts of comfort that mathematically things could turn out well. However, as show previously, an RIP bound is known to be too restrictive because quite simply it is a sufficient condition. And then there is this:
As to whether group sparsity offers any advantage over pure sparsity, the evidence is not conclusive
Again, that statement can only be taken in the context of this RIP analysis. If you remove that approach it's been known that a group sparsity approach has yielded fewer measurements as in the case of the block sparsity and the use of the sharp phase transition see [1,2,3,4,5].

We mentioned version 1 of the preprint of Vattikuti et al when it first came out in Application of compressed sensing to genome wide association studies and genomic selection
[7] Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing by Jared Tanner and David L. Donoho

No comments: