Tuesday, June 17, 2014

... There will be a "before" and "after" this paper ...

A while back, I was asked to peer review a paper. I usually shy away from this task, as I believe the blog is enough to show my goodwill to the community at large. That said, this peer review had a little twist. First, it was on a paper I deemed important. Second, while it was a pre-publication review, it offered the ability to make my views and comments public if and when the paper was published. Here are the final words of my review:

...This paper makes a connection between a central problem in large datasets found in GWAS with deep high dimensional geometry combinatorics and establishes how some of these very high dimensional problems ought to be considered in the future. There will be a "before" and "after" this paper.

Here is the paper: Applying compressed sensing to genome-wide association studies by Shashaank Vattikuti, James J Lee, Christopher C Chang, Stephen D H Hsu and Carson C Chow

The aim of a genome-wide association study (GWAS) is to isolate DNA markers for variants affecting phenotypes of interest. This is constrained by the fact that the number of markers often far exceeds the number of samples. Compressed sensing (CS) is a body of theory regarding signal recovery when the number of predictor variables (i.e., genotyped markers) exceeds the sample size. Its applicability to GWAS has not been investigated.


Using CS theory, we show that all markers with nonzero coefficients can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability equal to one (h^2 = 1), there is a sharp phase transition from poor performance to complete selection as the sample size is increased. For heritability below one, complete selection still occurs, but the transition is smoothed. We find for h^2 ~ 0.5 that a sample size of approximately thirty times the number of markers with nonzero coefficients is sufficient for full selection. This boundary is only weakly dependent on the number of genotyped markers.


Practical measures of signal recovery are robust to linkage disequilibrium between a true causal variant and markers residing in the same genomic region. Given a limited sample size, it is possible to discover a phase transition by increasing the penalization; in this case a subset of the support may be recovered. Applying this approach to the GWAS analysis of height, we show that 70-100% of the selected markers are strongly correlated with height-associated markers identified by the GIANT Consortium.
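For readers who want a feel for the kind of selection the abstract describes, here is a toy sketch of L1-penalized regression (the lasso) solved by cyclic coordinate descent. It is not the authors' code or pipeline. Everything in it is an assumption made for illustration: the Gaussian design matrix (real genotypes are not Gaussian), the sizes `n` and `p`, the three indices in `true_support`, the penalty `lam`, the 0.1 cutoff, and the helper names `soft_threshold` and `lasso_cd`. The phenotype is generated without noise, which loosely mirrors the heritability-equal-to-one case.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the proximal map of the L1 penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date after each update
    for _ in range(sweeps):
        for j in range(p):
            c = cols[j]
            # Correlation of column j with the residual that ignores beta[j]
            rho = sum(cv * rv for cv, rv in zip(c, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [rv - delta * cv for rv, cv in zip(resid, c)]
                beta[j] = new
    return beta


# Toy problem (all values are illustrative assumptions)
rng = random.Random(0)
n, p = 60, 100                 # fewer samples than markers
true_support = [7, 42, 91]     # hypothetical markers with nonzero effect
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] for j in true_support) for i in range(n)]  # effect size 1, no noise

beta_hat = lasso_cd(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta_hat) if abs(b) > 0.1]
print("true support:", true_support)
print("selected    :", selected)
```

Shrinking `n` toward the number of nonzero coefficients, or adding noise to `y`, is a simple way to watch selection degrade in this toy setting. The sample sizes at which that happens here should not be read as the paper's results.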

You can find my review and those of two other researchers in the Pre-publication history section of the journal (GigaScience). The comments on my review start on page 4 of this author's comment.
Reviewer's Report Can YANG 24 Jan 2014
Reviewer's Report Igor Carron 21 Feb 2014
Reviewer's Report Or Zuk 12 Mar 2014
Resubmission - Version 2 Author's comment 18 Apr 2014
Reviewer's Report Can YANG 23 Apr 2014
Resubmission - Version 3 Author's comment 29 Apr 2014
Resubmission - Version 4 19 May 2014
Editorial acceptance 23 May 2014
Published 16 Jun 2014
Then again, this can become a post-publication peer review if you feel like adding to this comment section.


scott@giga said...

Hi Igor,

Really glad you enjoyed the open review process. I don't know if you've seen it yet, but Publons has just uploaded all of our reviews, providing further discoverability, transparency and credit for all the hard work you guys have done. This is your review here: https://publons.com/review/3878/. Thanks so much again for your help with this!

Andrew said...

Hi Igor (and Scott),

Great review, and very cool to see you've written it up here too.

Just dropping in to say that we've fixed a typo in your name on Publons (it was caused during the initial data transfer).

You can see the full review timeline for the paper here: