...Graham Cormode had a nice survey of data sketching algorithms, which he represented as a haiku that I unfortunately failed to copy down. But basically he's interested in methods that have some properties among: sublinear storage, randomized, linear, can be added together; they include methods such as Bloom filters, minhashing (for which he described a nice variation that estimates the number of distinct items in a set with good multiplicative error), and the count-min sketch (which gives an additive approximation to the frequency of each item in a data stream). Towards the end he listed situations in which these methods are actually used "in the field": he included web log analysis and compressed sensing, and explicitly excluded sensor networks because the theoretical applications of sketching in sensor networks as huge randomly placed networks with too much data to store do not match the scientific applications of at most a dozen or two carefully placed sensors from which one wants to collect all the raw data. He also listed quite a few potential applications, but I think he could have included a lot more real applications if he had included Bloom filters in his application section as he did in his historical survey....
Pages
▼
Tuesday, May 17, 2011
Algorithms In The Field (#8f)
From David Eppstein's summary of day one of the Algorithm In The Field NSF workshop:
No comments:
Post a Comment