You probably recall three weeks ago this Sunday Morning Insight entry on Why You Should Care About Phase Transitions in Clustering ? Well it looks like, at least when one considers the adjacency matrix, clustering can be put in terms of a low rank problem which yield sharp phase transitions.  Obviously that opens the door to partially observed graphs (second paper is submitted to ICML). The graph below will be added to the Advanced Matrix Factorization Jungle page.
Sharp Performance Bounds for Graph Clustering via Convex Optimization  by Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi
The problem of finding clusters in a graph arises in several applications such as social networks, data mining and computernetworks. A typical, convex optimization approach, that isoften adopted is to identify a sparse plus low-rank decomposition of the adjacency matrix of the graph, with the (dense)low-rank component representing the clusters. In this paper,we sharply characterize the conditions for successfully identifying clusters using this approach. In particular, we introducethe “effective density” of a cluster that measures its significance and we find explicit upper and lower bounds on theminimum effective density that demarcates regions of successor failure of this technique. Our conditions are in terms of (a)the size of the clusters, (b) the denseness of the graph, and(c) regularization parameter of the convex program. We also present extensive simulations that corroborate our theoretical findings.
A Convex Approach to Graph Clustering with Missing Data by Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi
We consider the problem of finding clusters in graphs which are partially observed. We analyze a popular program based on regularized nuclear norm minimization, with a goal to recover the low-rank part of the adjacency matrix. For the commonly used Stochastic Block Model, we give explicit bounds on the parameters of the problem - size and sparsity of clusters, amount of observed data, and the regularization parameter of the convex program that characterize the success or failure of this approach. We also corroborate our theoretical findings with extensive simulations, and also run our algorithm on a real data set obtained from crowdsourcing a image classification task on Amazon Mechanical Turk.

 
No comments:
Post a Comment