Nuit Blanche comments feed (blog author: Igor), retrieved 2017-07-22.

SeanVN (2017-06-29 09:34):
Maybe subrandom sampling can help with compressive sensing, perhaps doing better than purely random sampling:
https://en.wikipedia.org/wiki/Low-discrepancy_sequence
I suppose there is a good chance it has already been investigated.
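As a rough illustration of the low-discrepancy idea in the comment above, here is a minimal Python sketch comparing van der Corput (base-2 subrandom) sample positions with purely random ones. The function names and the choice of 16 samples are illustrative, not from the comment:

```python
import random

def van_der_corput(n, base=2):
    """n-th element of the base-b van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while n:
        n, r = divmod(n, base)
        denom *= base
        q += r / denom
    return q

# Compare how evenly 16 sample positions cover [0, 1).
sub = sorted(van_der_corput(i) for i in range(1, 17))
rnd = sorted(random.random() for _ in range(16))

def gap(xs):
    """Largest uncovered gap in [0, 1) given sorted sample positions."""
    return max(b - a for a, b in zip([0.0] + xs, xs + [1.0]))

print("largest gap, subrandom:", gap(sub))  # 0.125 for these 16 points
print("largest gap, random:   ", gap(rnd))  # typically larger
```

With subrandom positions the largest unsampled gap shrinks roughly like O(1/n), versus the O(log n / n) typical of purely random sampling.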
Igor (2017-06-24 18:08):
Not at the moment, it looks like.

Igor.

Gokul (2017-06-24 14:08):
Are slides available for this talk?

SeanVN (2017-06-04 07:52):
There has been a lack of discussion about binarization in neural networks. Multiplying those +1/-1 values by weights and summing lets you store values with a high degree of independence. For a given binary input and target value you get an error. Divide the error by the number of binary values, then correct each weight by that reduced error, taking account of the binary sign. That gives a full correction to the correct target output. In higher-dimensional space most vectors are orthogonal, so for a different binary input the adjustments you made to the weights will not align at all; in fact they will sum to Gaussian noise, by the central limit theorem. The value you previously stored for the second binary input will now be contaminated by a slight amount of Gaussian noise, which you can correct for. That in turn introduces an even smaller amount of Gaussian noise on the value for the first binary input. Iterating back and forth gets rid of the noise entirely for both binary inputs.
This is very useful in random projection, reservoir and extreme learning machine computing.
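The storage-and-correction scheme described above can be sketched in Python. The dimension, the two targets and the iteration count are arbitrary choices for illustration:

```python
import random

random.seed(0)
n = 1024                                   # number of +1/-1 components
x1 = [random.choice((-1.0, 1.0)) for _ in range(n)]
x2 = [random.choice((-1.0, 1.0)) for _ in range(n)]
w = [0.0] * n                              # shared weight vector
t1, t2 = 5.0, -3.0                         # values to store for x1, x2

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def store(x, target):
    e = (target - dot(w, x)) / n           # error divided by the number of binary values
    for i in range(n):
        w[i] += e * x[i]                   # correct each weight, taking account of the sign

for _ in range(10):                        # bounce back and forth; interference decays
    store(x1, t1)
    store(x2, t2)

print(dot(w, x1), dot(w, x2))              # both targets recovered almost exactly
```

Each `store` makes its own input exact while perturbing the other by a factor of (x1 . x2)/n, which is O(1/sqrt(n)) for near-orthogonal binary vectors, so the alternation converges quickly.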
SeanVN (2017-05-30 19:15):
Tested in code:
http://www.freebasic.net/forum/viewtopic.php?f=7&t=25710
Conclusion: Very good.

The idea is very applicable to locality-sensitive hashing as well.

SeanVN (2017-05-30 01:29):
It would seem you could fit about 100 million integer add/subtract logic units on a current semiconductor die. Clock them at 1 billion operations per second and you have 100 Peta operations per second available for "no multiply" nets.
https://discourse.numenta.org/t/no-multiply-neural-networks/2361

SeanVN (2017-05-30 01:25):
1 billion times 100 million operations is 100 Peta operations per second, dude.

SeanVN (2017-05-30 00:10):
This comment has been removed by the author.

SeanVN (2017-05-29 19:05):
It should be possible to do "no multiply" neural nets using random sign flipping + WHT random projections and the sign-of function, using the architecture in this paper:
https://arxiv.org/pdf/1705.07441.pdf

In hardware all you would need are low-transistor-count, low-power integer add and subtract operations and a few other simple bit operators.
This avoids the much more complex and space-consuming multiply logic circuits. It should be quite easy to pipeline the operations on an FPGA. One other thing is that you may not need full-precision integer add/subtract, because 2's-complement overflow would simply increase the amount of nonlinearity, but probably not by too much.

SeanVN (2017-05-23 17:16):
I noticed this paper, entitled "Exponential Capacity in an Autoencoder Neural Network with a Hidden Layer."
https://arxiv.org/pdf/1705.07441.pdf
I would guess the exponential capacity is because real-valued weights are used to encode the binary output. With finite-precision arithmetic, say 16-bit half floats or 32-bit floats, there is probably an optimal number of weights to sum together to get a result. Beyond that you should likely use locality-sensitive hashing to switch in different weight vectors. I also noticed recently that the new nVidia GPU chip offers a 120 Tflop half-float matrix operation that might be suitable for random projections. If you were lucky you might get, say, 40 million 65536-point RPs per second out of it. That would be about 4000 times faster than I can get from my dual-core CPU using SIMD.

Anonymous (2017-05-19 12:52):
Does it beat HYPERBAND though?
SeanVN (2017-05-16 04:17):
I'll read that paper later.
On this video about lasso:
https://youtu.be/Hn8NtydkeDs

I made this comment:

"You are saying that the reconstructed data lies on an L1 manifold. You can learn a manifold using, say, a single-layer neural network autoencoder. Then to reconstruct, you can invert the dimensionally reduced data, get the autoencoder to correct it, send it back through the dimensional reduction and correct only the reduced aspect. Just bounce back and forth between the two.
Or you could set the manifold to be the moving average of the data, which is a very easy manifold to correct to, and bounce between the two. Anyway: https://drive.google.com/open?id=0BwsgMLjV0BnhOGNxOTVITHY1U28"

SeanVN (2017-05-03 23:33):
There is also an algorithm tsunami, not just a data one!
Another possibility would be to do computational self-assembly of neural nets.
http://www.exa.unicen.edu.ar/escuelapav/cursos/bio/l21.pdf

Ravi Kiran (2017-05-03 11:05):
Interesting bunch of articles Igor!

Cheers,
Ravi

Anonymous (2017-05-02 17:30):
It would be interesting to see this with ResNets too.

SeanVN (2017-04-26 00:38):
I think what is happening is that you are getting unsupervised feature learning in the deeper layers and then one final readout layer. That may give a boost in performance in some circumstances. There are probably better ways to do unsupervised feature learning prior to a readout layer. There are also some aspects to do with noise, and maybe some cooling effect over time as the system adapts. Thumbs up or thumbs down, I don't know. You be the judge.

SeanVN (2017-04-25 23:45):
Re: https://openreview.net/pdf?id=HkXKUTVFl
I'm trying dropout in relation to the back error projection. Anyway, there are tons of ideas to explore, especially if you start using fast random projection algorithms for both the back error projection and the forward aspects of a network.

SeanVN (2017-04-24 07:51):
Neat, will try.

SeanVN (2017-04-18 19:59):
I was kind of interested in how you would evolve low-precision neural networks.
There is the idea of scale-free optimization, where you don't pick a characteristic scale for mutations; instead, mutations are spread evenly across the decades of magnitude you are interested in, and obviously there are not so many of those.
An example of such a mutation would be randomly adding + or - exp(-c*rnd()).
It is also highly related to using simple bit flipping as a mutation, since a mutation of one bit in an 8-bit unsigned integer results in a change of 1, 2, 4, 8, 16, 32, 64 or 128, which also follows an exponential curve.

SeanVN (2017-04-18 09:43):
That should have been 75 to 100 million 256-point FWHTs per second on a single GPU.

https://estudogeral.sib.uc.pt/bitstream/10316/27842/1/Optimized%20Fast%20Walsh%E2%80%93Hadamard%20Transform.pdf

Probably about 1 to 3 million 65536-point FWHTs/sec on a single GPU. A 65536-point FWHT takes 65536 x 16, roughly 1 million, add/subtract operations, so 1 FWHT/sec needs about 1 MegaFlop/sec.
The only thing you then need for random projections (RP) is a random sign flip of the data before the FWHT. For better quality, repeat. Including random permutations could give you more entropy per step, but is expensive time-wise.
One thing you could do is make a "soft" hash table as a possible discriminator for GAN neural nets, or as soft memory for deep networks.
https://drive.google.com/open?id=0BwsgMLjV0BnhellXODdLZWlrOWc

Moseba (2017-04-18 07:35):
Hi, great work! Can you show me how the AlexNet arch looks after using your method?

Moseba (2017-04-18 07:33):
Hi, great work! Can you give me an example of how the AlexNet arch looks after using your method?

SeanVN (2017-04-17 23:56):
You can get 5000 65536-point 32-bit floating point Fast Walsh-Hadamard Transforms per second on a single Intel Celeron core. They must be doing something terribly wrong.
https://drive.google.com/open?id=0BwsgMLjV0BnhMG5leTFkYWJMLU0

You should be able to get something like 75 million to 100 million 65536-point FWHTs per second on a top-of-the-range GPU.
The basic algorithm is:

    sub wht(x() as single)
        dim as ulongint i, k
        dim as ulongint hs = 1, m = ubound(x)  'm = n-1
        dim as single a, b
        while hs <= m  'Walsh-Hadamard transform
            i = 0
            while i <= m
                k = i + hs
                while i < k
                    a = x(i)
                    b = x(i + hs)
                    x(i) = a + b
                    x(i + hs) = a - b
                    i += 1
                wend
                i = k + hs
            wend
            hs += hs
        wend
    end sub

Laurent Duval (2017-04-16 02:42):
What can you do about the remaining bit? A parity check?

SeanVN (2017-04-14 01:10):
The specialized hardware these days tends to use 8-bit or lower precision for speed.
Some of the early genetic algorithms used bit flipping as a mutation. That sounds naive; however, really it is a sort of scale-free mutation, where a mutation of an 8-bit unsigned number might be 1, 2, 4, 8, 16, 32, 64 or 128: a sort of exponential distribution.
You can compare that to the scale-free mutation random + or - exp(-c*rnd()), where c is a positive number and rnd() returns 0 to 1 uniform. That mutation has uniform density across the decades of magnitude: p(1) = p(0.1) = p(0.01), etc.
So I think random bit flipping is something you can try if you want to evolve deep neural nets of low precision, especially as back propagation is more problematic in such cases.
I'm sure I gave this reference before: https://pdfs.semanticscholar.org/c980/dc8942b4d058be301d463dc3177e8aab850e.pdf
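The sign-flip-then-FWHT random projection recipe discussed in the comments above can be sketched in Python (the FreeBASIC routine above computes the same transform); the array size and seed are arbitrary illustrative choices:

```python
import random

def fwht(x):
    """In-place fast Walsh-Hadamard transform; len(x) must be a power of two.
    Uses only additions and subtractions, n*log2(n) of them."""
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

# Random projection: multiply by a fixed random sign pattern, then transform.
# Repeating the (sign flip, FWHT) pair improves the quality of the projection.
n = 256
random.seed(1)
signs = [random.choice((-1.0, 1.0)) for _ in range(n)]
data = [1.0] * n                          # any length-n input vector
proj = fwht([s * v for s, v in zip(signs, data)])
```

The unnormalized transform satisfies ||Hx||^2 = n ||x||^2, and applying it twice returns n times the input, which makes the projection cheap to invert.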
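The scale-free mutation in the last comment can be sketched as follows; the value c = 10 and the 8-bit width are illustrative choices, not from the comment:

```python
import math
import random

def scale_free_mutation(c=10.0):
    """Random +/- exp(-c*u), u uniform in [0,1): the magnitude is log-uniform,
    i.e. equally likely to land in each decade down to about exp(-c)."""
    return random.choice((-1.0, 1.0)) * math.exp(-c * random.random())

def bit_flip_mutation(w):
    """Flip one random bit of an 8-bit unsigned weight: the change is
    1, 2, 4, ..., or 128, again exponentially distributed in magnitude."""
    return w ^ (1 << random.randrange(8))
```

Both give many small mutations and an occasional large one, which is what makes plain bit flipping a reasonable mutation operator for low-precision nets.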