( Personal message: I will be at ICLR next week; let's grab a coffee if you are there. )
F2F: A Library For Fast Kernel Expansions by Joachim Curto, Irene Zarza, Feng Yang, Alexander J. Smola and Luc Van Gool
F2F is a C++ library for large-scale machine learning. It contains a CPU-optimized implementation of the Fastfood algorithm, which computes approximate kernel expansions in loglinear time. The algorithm requires computing products of Walsh-Hadamard Transform (WHT) matrices. A cache-friendly SIMD Fast Walsh-Hadamard Transform (FWHT) that achieves compelling speed and outperforms current state-of-the-art methods has been developed. F2F allows one to obtain non-linear classification by combining Fastfood with a linear classifier.
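For readers who want the gist of the construction: below is a minimal, hypothetical C++ sketch of one Fastfood block, the product S H G Π H B applied to an input with two in-place FWHTs. Every name in it is illustrative; this is a sketch of the Fastfood construction, not F2F's actual API.

// Hypothetical sketch of one Fastfood block V x = S H G (Pi) H B x:
// B = random sign flips, H = Walsh-Hadamard transform, Pi = a random
// permutation, G = Gaussian scaling, S = a norm correction (crudely
// approximated here). Illustrative only; NOT F2F's API.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// In-place Fast Walsh-Hadamard Transform; x.size() must be a power of two.
static void fwht(std::vector<float>& x) {
    for (std::size_t hs = 1; hs < x.size(); hs <<= 1)
        for (std::size_t i = 0; i < x.size(); i += 2 * hs)
            for (std::size_t j = i; j < i + hs; ++j) {
                const float a = x[j], b = x[j + hs];
                x[j] = a + b;
                x[j + hs] = a - b;
            }
}

int main() {
    const std::size_t d = 16;                    // input dimension, power of two
    std::mt19937 rng(0);
    std::normal_distribution<float> gauss(0.f, 1.f);
    std::bernoulli_distribution coin(0.5);

    std::vector<float> B(d), G(d);               // random diagonals
    std::vector<std::size_t> Pi(d);              // random permutation
    for (std::size_t i = 0; i < d; ++i) {
        B[i] = coin(rng) ? 1.f : -1.f;
        G[i] = gauss(rng);
        Pi[i] = i;
    }
    std::shuffle(Pi.begin(), Pi.end(), rng);

    std::vector<float> x(d, 1.f);                // toy input

    // Evaluate V x = S H G Pi H B x from right to left.
    for (std::size_t i = 0; i < d; ++i) x[i] *= B[i];     // B: sign flip
    fwht(x);                                              // H
    std::vector<float> y(d);
    for (std::size_t i = 0; i < d; ++i) y[i] = x[Pi[i]];  // Pi: permute
    for (std::size_t i = 0; i < d; ++i) y[i] *= G[i];     // G: Gaussian scale
    fwht(y);                                              // H
    for (auto& v : y) v /= static_cast<float>(d);         // crude S / normalization
    // RBF random features would then be cos/sin of y / sigma.
    return 0;
}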
I am told by one of the authors that the library should be out at some point.
2 comments:
You can get 5,000 65536-point, 32-bit floating-point Fast Walsh-Hadamard Transforms per second on a single Intel Celeron core. They must be doing something terribly wrong.
https://drive.google.com/open?id=0BwsgMLjV0BnhMG5leTFkYWJMLU0
You should be able to get something like 75 to 100 million 65536-point FWHTs per second on a top-of-the-range GPU.
The basic algorithm is:
sub wht(x() as single)
    ' In-place Fast Walsh-Hadamard Transform; x must have 2^k elements.
    dim as ulongint i, k
    dim as ulongint hs = 1, m = ubound(x)   ' m = n - 1
    dim as single a, b
    while hs <= m                ' one pass per butterfly span 1, 2, 4, ...
        i = 0
        while i <= m             ' walk the blocks of size 2*hs
            k = i + hs
            while i < k          ' butterfly each pair (i, i + hs)
                a = x(i)
                b = x(i + hs)
                x(i) = a + b
                x(i + hs) = a - b
                i += 1
            wend
            i = k + hs           ' jump to the next block
        wend
        hs += hs
    wend
end sub
Correction: that should have been 75 to 100 million 256-point FWHTs per second on a single GPU.
https://estudogeral.sib.uc.pt/bitstream/10316/27842/1/Optimized%20Fast%20Walsh%E2%80%93Hadamard%20Transform.pdf
Probably about 1 to 3 million 65536-point FWHTs/sec on a single GPU: a 65536-point transform costs n log2 n = 65536 × 16 ≈ 10^6 additions and subtractions, so each 65536-point FWHT per second needs about 1 MegaFlop/sec.
The only thing you then need for random projections (RP) is a random sign flip of the data before the FWHT; for better quality, repeat the sign-flip-and-transform step. Including random permutations could give you more entropy per step but is expensive time-wise.
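A minimal C++ sketch of that recipe, assuming a fixed random sign pattern per round and an orthonormal scaling of the transform; all names are illustrative, not from any library:

// Sketch of the random projection recipe: random sign flip, then FWHT,
// repeated a few rounds. Illustrative only.
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// In-place FWHT, as in the BASIC routine above; length must be a power of two.
static void fwht(std::vector<float>& x) {
    for (std::size_t hs = 1; hs < x.size(); hs <<= 1)
        for (std::size_t i = 0; i < x.size(); i += 2 * hs)
            for (std::size_t j = i; j < i + hs; ++j) {
                const float a = x[j], b = x[j + hs];
                x[j] = a + b;
                x[j + hs] = a - b;
            }
}

// One RP round: fixed random sign flip, then FWHT, scaled to stay orthonormal.
static void rp_round(std::vector<float>& x, const std::vector<float>& sign) {
    for (std::size_t i = 0; i < x.size(); ++i) x[i] *= sign[i];
    fwht(x);
    const float s = 1.0f / std::sqrt(static_cast<float>(x.size()));
    for (auto& v : x) v *= s;
}

int main() {
    std::mt19937 rng(42);
    std::bernoulli_distribution coin(0.5);
    std::vector<float> x = {1, 0, 0, 0, 2, 0, 0, 0};   // toy data, length 8

    for (int round = 0; round < 2; ++round) {          // "for better quality repeat"
        std::vector<float> sign(x.size());
        for (auto& v : sign) v = coin(rng) ? 1.0f : -1.0f;
        rp_round(x, sign);
    }
    return 0;
}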
One thing you could do with such projections is build a "soft" hash table, as a possible discriminator for GANs or as soft memory for deep networks.
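One hedged reading of that idea, purely as an illustration (nothing below comes from the post or from F2F): address memory slots softly, weighting each stored value by a softmax over key/query similarities, where keys and queries would be randomly projected vectors as above.

// Soft hash-table read: out = sum_i softmax_i(key_i . q / T) * value_i.
// Keys could be random projections of stored items; q is the (randomly
// projected) query. Entirely illustrative.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

static std::vector<float> soft_read(const std::vector<std::vector<float>>& keys,
                                    const std::vector<std::vector<float>>& values,
                                    const std::vector<float>& q, float temp) {
    std::vector<float> w(keys.size());
    float zmax = -1e30f;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        float dot = 0.0f;
        for (std::size_t j = 0; j < q.size(); ++j) dot += keys[i][j] * q[j];
        w[i] = dot / temp;
        zmax = std::max(zmax, w[i]);
    }
    float zsum = 0.0f;
    for (auto& v : w) { v = std::exp(v - zmax); zsum += v; }   // stable softmax
    std::vector<float> out(values[0].size(), 0.0f);
    for (std::size_t i = 0; i < keys.size(); ++i)
        for (std::size_t j = 0; j < out.size(); ++j)
            out[j] += (w[i] / zsum) * values[i][j];
    return out;
}

int main() {
    // Two slots with 2-d keys and values; a query near the first key.
    std::vector<std::vector<float>> keys = {{1.0f, 0.0f}, {0.0f, 1.0f}};
    std::vector<std::vector<float>> values = {{10.0f, 0.0f}, {0.0f, 10.0f}};
    std::vector<float> q = {0.9f, 0.1f};
    std::vector<float> out = soft_read(keys, values, q, 0.1f);  // mostly slot 0
    return 0;
}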
https://drive.google.com/open?id=0BwsgMLjV0BnhellXODdLZWlrOWc