Friday, April 06, 2012

Wars will be waged over PCA

While Cable and I are having fun decomposing videos and images with the latest Matrix Factorization techniques, let us remember how some of these tools will eventually be used. In The Montford Delusion, the good folks at the RealClimate blog deconstruct some of the issues related to the hockey stick controversy, which has been a centerpiece of the policy debate on Global Warming:

....The chief focus is the original hockey stick, a reconstruction of past temperature for the northern hemisphere covering the last 600 years by Mike Mann, Ray Bradley, and Malcolm Hughes (1998, Nature, 392, 779, doi:10.1038/33859, available here), hereafter called “MBH98” (the reconstruction was later extended back to a thousand years by Mann et al, 1999, or “MBH99”). The reconstruction was based on proxy data, most of which are not direct temperature measurements but may be indicative of temperature. To piece together past temperature, MBH98 estimated the relationships between the proxies and observed temperatures in the 20th century, checked the validity of the relationships using observed temperatures in the latter half of the 19th century, then used the relationships to estimate temperatures as far back as 1400. The reconstruction all the way back to the year 1400 used 22 proxy data series, although some of the 22 were combinations of larger numbers of proxy series by a method known as “principal components analysis” (hereafter called “PCA”–see here). For later centuries, even more proxy series were used. The result was that temperatures had risen rapidly in the 20th century compared to the preceding 5 centuries. The sharp “blade” of 20th-century rise compared to the flat “handle” of the 15-19th centuries was reminiscent of a “hockey stick” — giving rise to the name describing temperature history.

More can be found in Dummies guide to the latest “Hockey Stick” controversy by Gavin Schmidt and Caspar Ammann. To me, comment 19 is by far the most interesting:

Excellent, excellent piece.
My only complaint is the single phrase minimizing the unusual PCA centering technique Mann used. While I understand what was done, and it actually makes sense to me as a non-statistician, the comments by Professor Ian Jolliffe on your own blog suggest to me that it is simply wrong to do so, although maybe not entirely that black and white. The fact that it makes minimal difference in the end result should be highlighted, but I think you go overboard in minimizing the basic MM argument by using the following phrasing:
…claimed that the PCA used by MBH98 wasn’t valid because they had used a different “centering” convention than is customary.
I don’t want to wordsmith it on you, but use of words like “claimed,” “convention,” and “customary” seem to make the MM complaint seem arbitrary and misguided when to me it was not. It wasn’t merely claimed, they were right. It wasn’t a convention, it was a technique. It wasn’t simply not customary, it was wrong to do so. Use of double quotes around the word “centering” also hints at something intangibly nefarious, probably because the deniers so often put quotes around terms they don’t like, simply to silently imply that everything behind the term is arbitrary, or lacking truthiness, or not even relevant (like “global temperature record” or “computer models” or so called “climate scientists”).
The centering method Mann used (based on what I’ve read) was, to statisticians, both irregular and incorrect. Come out and say it, or your post starts off sounding a bit like the denier’s posts. Call a spade a spade, and let the truth stand on it’s own, without crutches.
[Or, if I'm wrong on this, please correct me.]
[Response: I don't agree. One can perform an SVD on any matrix you like, and that will provide a set of basis functions that have certain properties. The interpretation of the vectors will certainly depend on this, but it is much less important if you are simply doing a data reductions step. In this case, this was being used for data reduction, and so the particular SVD decomposition needs to be combined with a selection rule to see what is retained. I'm sure that given the subsequent furore, Mann et al would have been happier if they had used full centering, but it would have made very little difference either to the results or the reaction. - gavin]
But again, this is an excellent, concise deconstruction of something that MM have spent years and years trying to compile. It’s a wonder that their position gets any air time at all, except that there are obviously so many people out there who only ever see what they want to see, and the bigger the danger is, the more tightly they shut their eyes and try to imagine their happy place.
A wedge was certainly opened when the original authors went from a plain SVD to an unconventionally centered PCA.
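To make the centering point concrete, here is a minimal sketch on synthetic data (not the MBH98 proxies, and not their code). It compares an SVD of a fully centered matrix with an SVD of a matrix centered on a sub-period only. The function name centered_svd, the Gaussian toy data, and the choice of the last 100 rows as a stand-in for a calibration window are all assumptions made for illustration. Both decompositions are legitimate SVDs, which is Gavin's point; only the fully centered one reproduces the eigenvalues of the covariance matrix, which is what is usually meant by PCA.

```python
import numpy as np

def centered_svd(X, rows=None):
    """SVD of X after subtracting column means computed over `rows`.

    rows=None uses every row (conventional PCA centering); a slice uses
    only that sub-period. Hypothetical helper, for illustration only.
    """
    ref = X if rows is None else X[rows]
    Xc = X - ref.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc, U, s, Vt

rng = np.random.default_rng(0)
n, p = 600, 20                        # 600 "years", 20 series (toy sizes)
X = rng.standard_normal((n, p))       # synthetic noise, NOT proxy data

# Conventional centering: column means over the full record.
Xf, Uf, sf, Vtf = centered_svd(X)
# Sub-period centering: column means over the last 100 rows only
# (an assumed stand-in for a calibration window).
Xs, Us, ss, Vts = centered_svd(X, slice(-100, None))

# Each SVD reconstructs its own input matrix.
full_ok = np.allclose((Uf * sf) @ Vtf, Xf)
short_ok = np.allclose((Us * ss) @ Vts, Xs)

# Only full centering ties the singular values to the covariance spectrum:
# s_k^2 / (n - 1) equals the k-th eigenvalue of the sample covariance.
cov_eigs = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
is_pca = np.allclose(sf**2 / (n - 1), cov_eigs)

# Sub-period centering adds a rank-one term n * d d^T to the scatter matrix,
# where d is the difference between full-period and sub-period means.
d = X.mean(axis=0) - X[-100:].mean(axis=0)
rank_one = np.allclose(Xs.T @ Xs, Xf.T @ Xf + n * np.outer(d, d))

print(full_ok, short_ok, is_pca, rank_one)
```

The rank-one term is where the interpretation changes: series whose sub-period mean differs most from their long-term mean get extra weight in the leading vectors. How much that matters for any real data set depends on the data and on the selection rule for retained components, and this toy says nothing about that either way.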


3 comments:

Joshua Stults said...

Technocracies will wage wars over PCA.

FTFY.

Democracies will continue to ignore the twitterings of wannabe philosopher kings and their yellow journalists.

Igor said...

Your point being?

Joshua Stults said...

Well, unless some technocracies start popping up it's unlikely any wars will be fought over [pick your favorite statistical ad hockery].

Do you think there are enough regimes around that are effectively technocratic that the qualifier is unneeded?
