7 Algorithms for Massive Data Problems 238
7.1 Frequency Moments of Data Streams . . . . . . . . . . . . . . . . . . . . . 238
7.1.1 Number of Distinct Elements in a Data Stream . . . . . . . . . . . 239
7.1.2 Counting the Number of Occurrences of a Given Element. . . . . . 243
7.1.3 Counting Frequent Elements . . . . . . . . . . . . . . . . . . . . . . 243
7.1.4 The Second Moment . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.2 Matrix Algorithms Using Sampling . . . . . . . . . . . . . . . . . . . . . . 248
7.2.1 Matrix Multiplication Using Sampling . . . . . . . . . . . . . . . . 248
7.2.2 Sketch of a Large Matrix . . . . . . . . . . . . . . . . . . . . . . . . 250
7.3 Sketches of Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8 Clustering 260
8.1 Some Clustering Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
8.2 A k-means Clustering Algorithm . . . . . . . . . . . . . . . . . . . . . . . 263
8.3 A Greedy Algorithm for k-Center Criterion Clustering . . . . . . . . . . . 265
8.4 Spectral Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
8.5 Recursive Clustering Based on Sparse Cuts . . . . . . . . . . . . . . . . . . 273
8.6 Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
8.7 Agglomerative Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
8.8 Dense Submatrices and Communities . . . . . . . . . . . . . . . . . . . . . 278
8.9 Flow Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
8.10 Finding a Local Cluster Without Examining the Whole Graph . . . . . . . 284
8.11 Axioms for Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
8.11.1 An Impossibility Result . . . . . . . . . . . . . . . . . . . . . . . . 289
8.11.2 A Satis able Set of Axioms . . . . . . . . . . . . . . . . . . . . . . 295
8.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9 Topic Models, Hidden Markov Process, Graphical Models, and Belief Propagation 301
9.1 Topic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
9.2 Hidden Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
9.3 Graphical Models, and Belief Propagation . . . . . . . . . . . . . . . . . . 310
9.4 Bayesian or Belief Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 311
9.5 Markov Random Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
9.6 Factor Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
9.7 Tree Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
9.8 Message Passing in general Graphs . . . . . . . . . . . . . . . . . . . . . . 315
9.9 Graphs with a Single Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . 317
9.10 Belief Update in Networks with a Single Loop . . . . . . . . . . . . . . . . 319
9.11 Maximum Weight Matching . . . . . . . . . . . . . . . . . . . . . . . . . . 320
9.12 Warning Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
9.13 Correlation Between Variables . . . . . . . . . . . . . . . . . . . . . . . . . 325
9.14 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
10 Other Topics 332
10.1 Rankings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .332
10.2 Hare System for Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . .334
10.3 Compressed Sensing and Sparse Vectors . . . . . . . . . . . . . . . . . . .335
10.3.1 Unique Reconstruction of a Sparse Vector . . . . . . . . . . . . . .336
10.3.2 The Exact Reconstruction Property . . . . . . . . . . . . . . . . . .339
10.3.3 Restricted Isometry Property . . . . . . . . . . . . . . . . . . . . .340
10.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .342
10.4.1 Sparse Vector in Some Coordinate Basis . . . . . . . . . . . . . . .342
10.4.2 A Representation Cannot be Sparse in Both Time and Frequency Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .342
10.4.3 Biological . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .345
10.4.4 Finding Overlapping Cliques or Communities . . . . . . . . . . . .345
10.4.5 Low Rank Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . .346
10.5 Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .347
10.6 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .348
10.6.1 The Ellipsoid Algorithm . . . . . . . . . . . . . . . . . . . . . . . .350
10.7 Integer Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .351
10.8 Semi-De nite Programming . . . . . . . . . . . . . . . . . . . . . . . . . .352
10.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.