Nuit Blanche: Privacy Tradeoffs in Predictive Analytics

Friday, April 18, 2014

Privacy Tradeoffs in Predictive Analytics

Here is something new in the privacy game that relies on the fact that most matrix factorizations in the recommender system business are low rank. From the paper:

To the best of our knowledge, we are the first to take into account the data disclosed by an analyst in the above privacy-accuracy tradeoff, and to establish the optimality of a combined disclosure, obfuscation, and prediction scheme. Our proofs rely on the modeling assumption that is the cornerstone of matrix factorization techniques and hence validated by vast empirical evidence (namely, that the user-item ratings matrix is approximately low-rank). Moreover, the fact that our algorithms successfully block inference against a barrage of different classifiers, some non-linear, further establishes our assumption's validity over real-world data.
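To make the "approximately low-rank" assumption concrete, here is a minimal sketch on synthetic data (not the paper's datasets). The matrix sizes, the rank of 5, and the noise level are all made-up values chosen for illustration; the point is only that when ratings are a low-rank signal plus small noise, a handful of singular values carry almost all of the energy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and rank, chosen only for illustration.
n_users, n_items, rank = 200, 100, 5

# Synthetic ratings matrix: a rank-5 signal plus small Gaussian noise.
U = rng.normal(size=(n_users, rank))   # user profiles
V = rng.normal(size=(n_items, rank))   # item profiles
R = U @ V.T + 0.1 * rng.normal(size=(n_users, n_items))

# Singular values of R, sorted in decreasing order.
s = np.linalg.svd(R, compute_uv=False)

# Fraction of squared Frobenius norm captured by the top-k singular values.
energy = np.cumsum(s**2) / np.sum(s**2)
top_k_energy = energy[rank - 1]

print(f"energy in top {rank} singular values: {top_k_energy:.4f}")
```

On real rating data the matrix is sparse and the spectrum decays less sharply, so this only shows what the assumption means, not that it holds for any given dataset.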

Here is the paper: Privacy Tradeoffs in Predictive Analytics by Stratis Ioannidis, Andrea Montanari, Udi Weinsberg, Smriti Bhagat, Nadia Fawaz, Nina Taft

Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.
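The abstract does not spell out the protocol, so the following is only a toy sketch of the general idea under assumptions of my own, not the authors' scheme. It assumes a linear rating model in which a binary private attribute `x0` contributes `v0 * x0` to each rating, that the service discloses the item column `v0` to the user, and that the rest of the user profile and the noise are independent of `x0`. All names (`ratings`, `obfuscate`, `v0`, `V_rest`) are hypothetical. Under those assumptions, subtracting the private attribute's contribution makes what the user sends identical whichever value `x0` takes, while the service can still estimate the non-private part of the profile.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 50 items, 4 non-private profile dimensions.
n_items, d = 50, 4

# Item profiles; column 0 is assumed to pair with the private attribute.
V = rng.normal(size=(n_items, d + 1))
v0, V_rest = V[:, 0], V[:, 1:]

def ratings(x0, x_rest, noise):
    """Assumed linear model: rating_j = v0_j * x0 + <V_rest_j, x_rest> + noise_j."""
    return v0 * x0 + V_rest @ x_rest + noise

def obfuscate(r, x0):
    """User-side step: remove the private attribute's contribution."""
    return r - v0 * x0

# Same non-private profile and noise, two possible private attribute values.
x_rest = rng.normal(size=d)
noise = 0.1 * rng.normal(size=n_items)

sent_plus = obfuscate(ratings(+1, x_rest, noise), +1)
sent_minus = obfuscate(ratings(-1, x_rest, noise), -1)

# What the service receives does not depend on x0.
same = np.allclose(sent_plus, sent_minus)

# The service can still estimate the non-private profile by least squares.
x_hat, *_ = np.linalg.lstsq(V_rest, sent_plus, rcond=None)

print("obfuscated ratings identical for both attribute values:", same)
print("max error on non-private profile:", np.max(np.abs(x_hat - x_rest)))
```

The accuracy cost in this toy comes from the service no longer knowing `x0` when it predicts ratings for unseen items; how the paper bounds and minimizes that cost is in the paper itself.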

Of related interest: The Simons Institute's recent Big Data and Differential Privacy workshop.
