Efficient Principal Subspace Projection of Streaming Data Through Fast Similarity Matching by Andrea Giovannucci, Victor Minden, Cengiz Pehlevan, Dmitri B. Chklovskii
Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a time and must be processed online. Here, we introduce a computationally efficient version of similarity matching, a framework for online dimensionality reduction that incrementally estimates the top K-dimensional principal subspace of streamed data while keeping in memory only the last sample and the current iterate. To assess the performance of our approach, we construct and make public a test suite containing both a synthetic data generator and the infrastructure to test online dimensionality reduction algorithms on real datasets, as well as performant implementations of our algorithm and competing algorithms with similar aims. Among the algorithms considered we find our approach to be competitive, performing among the best on both synthetic and real data.
Implementations can be found: https://github.com/flatironinstitute/online_psp
and the Matlab version: https://github.com/flatironinstitute/online_psp_matlab
Image Credit: NASA/JPL-Caltech
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.
Other links:
Paris Machine Learning: Meetup.com||@Archives||LinkedIn||Facebook|| @ParisMLGroup< br/> About LightOn: Newsletter ||@LightOnIO|| on LinkedIn || on CrunchBase || our Blog
About myself: LightOn || Google Scholar || LinkedIn ||@IgorCarron ||Homepage||ArXiv
No comments:
Post a Comment