If you look back, you realize that the most surprising result of the Netflix contest is not the fact that thousands of researchers spent a year improving an algorithm by 10% (some of them using matrix completion techniques devised by some of you) but rather that what we thought we had (privacy) was pretty much an illusion. Talk about a surprise. As I was searching for a specific blog entry on the Netflix prize in my google reader feeds. I found this entry written a month ago about government 2.0. by Tom Slee. I think he aptly describes privacy as an arms race.
....Data anonymization and aggregation are often cited as ways of handling these issues, but recent developments in “re-identification” have shown that such efforts may be doomed. The identification of individuals within the Netflix Prize data set (paper) led to the cancellation of the second Netflix contest, and has cast a chill on crowdsourcing contests. Legal scholar Paul Ohm’s recent long but highly readable paper on Broken Promises of Privacy suggests that we need to re-assess the basis for many of our data privacy laws because information previously thought to be anonymous has now got to be treated as potentially privacy-damaging.
This arm's race has taken a new turn this week using some tools such as non-adaptive request to duplicate entire databases ( Nonadaptive Mastermind Algorithms for String and Vector Databases, with Case Studies by Arthur Asuncion, Michael Goodrich.). I note that the paper does not seem to be aware of recent work in group testing and compressed sensing (see references below) and one really wonders how some of the bounds and techniques devised in CS could improve the cloning or the revealing of information that we think is anonymized. I also wonder to what extent the bounds given in the Donoho-Tanner phase transition could provide some sorts of relief in that respect.. And while it may just be a simple issue of better governing through government 2.0, in the last few days, we've just had three examples of potential large privacy breach outside the whole Gov 2.0 and Google-Facebook fight for the social graph:
- Arthur's tweet pointed out to me a recent example where the french government passed a law allowing the sale of "anonymized" DMV listings.
- Maybe the HIPAA laws actually need some rethinking in light of the contest like the one proposed by HPN.
- The Gawker password debacle that shows most people use the same password for different sites.
- Boolean Compressed Sensing and Noisy Group Testing by George Atia, and Venkatesh Saligrama
- Note on Noisy Group Testing: Asymptotic Bounds and Belief Propagation Reconstruction by Dino Sejdinovic, Oliver Johnson.
- Superselectors: Efficient Constructions and Applications by Ferdinando Cicalese, Ugo Vaccaro.
- Derandomization and Group Testing by Mahdi Cheraghchi.
- Improved Constructions for Non-adaptive Threshold Group Testing by Mahdi Cheraghchi.
- Venkatesh Saligrama - Noisy Group Testing and Boolean Compressed Sensing
- Group Testing and Sparse Recovery by Mark Iwen
- Compressed Sensing with Probabilistic Measurements: A Group Testing Solution by Mahdi Cheraghchi, Ali Hormati, Amin Karbasi, Martin Vetterli