Compressing 35GB of Data in 35 Pages of Numbers by Philon Nguyen
Usual information-theoretic results show a logarithmic compression factor from value spaces to digital binary spaces using $p$-adic numbering systems. The following paper discusses a less commonly used case: it applies the same results to the difference space of bijective mappings of $n$-dimensional spaces to the line. It discusses a method where the logarithmic compression factor is provided over the Hamming radius of the code. An example is provided using the 35GB data dump of the Wikipedia website. This technique was initially developed for the study and computation of large permutation vectors on small clusters.
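For context (a gloss of mine, not the paper's derivation): the "usual information theoretical result" is that indexing a value space of size $M$ in a base-$p$ positional system takes $\lceil \log_p M \rceil$ digits. For the space of bijections of $N$ items, i.e. the permutation vectors mentioned above, Stirling's approximation gives

$$\lceil \log_p N! \rceil \;\approx\; \frac{N \ln N - N}{\ln p},$$

which is presumably the kind of logarithmic factor the abstract alludes to.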
Combinatorial Spaces And Order Topologies by Philon Nguyen
An archetypal problem discussed in computer science is the problem of searching for a given number in a given set of numbers. Other than sequential search, the classic solution is to sort the list of numbers and then apply binary search. The binary search problem has a complexity of O(log N) for a list of N numbers, while the sorting problem cannot be better than O(N) on any sequential computer following the usual assumptions. Whenever the problem of deciding partial order can be solved in O(1), a variation of the problem on a bounded list of numbers is to apply binary search directly, without sorting. The overall complexity of the problem is then O(log R) for some radius R. A logarithmic upper bound for finite encodings is shown. Also, the topology of orderings can provide efficient algorithms for search problems in combinatorial spaces. The main characteristic of those spaces is that they have typically exponential space complexities. The factorial case describes an order topology that can be illustrated using the combinatorial polytope. When a known order topology can be combined with a given formulation of a search problem, the resulting search problem has a polylogarithmic complexity. This logarithmic complexity can then become useful in combinatorial search by providing a logarithmic breakdown. These algorithms can be termed the class of search algorithms that do not require reads and are equivalent to the class of logarithmically recursive functions.
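As a reading aid (my sketch, not the paper's algorithm): searching a bounded range without sorting amounts to bisection on a monotone predicate that can be evaluated in O(1), which costs O(log R) evaluations for a range of radius R. A minimal Python illustration, where the predicate and the bounds are assumed to be given:

```python
# Minimal sketch (not the paper's construction): bisection over a bounded
# integer range [lo, hi) using an O(1) monotone predicate, i.e. binary search
# on the value space itself rather than on a sorted list.

def search_bounded_range(is_at_least, lo, hi):
    """Return the smallest x in [lo, hi) with is_at_least(x) True, or hi if none.
    Assumes is_at_least is monotone (False...False True...True) over the range."""
    while lo < hi:
        mid = (lo + hi) // 2
        if is_at_least(mid):
            hi = mid          # the answer is mid or lies to its left
        else:
            lo = mid + 1      # the answer lies strictly to the right of mid
    return lo

# Example: integer square root of 10**12 in O(log R) predicate calls,
# with no sorted data structure involved.
assert search_bounded_range(lambda x: x * x >= 10**12, 0, 10**12 + 1) == 10**6
```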
Iterative Decoding Beyond Belief Propagation by Shiva Kumar Planjery, Shashi Kiran Chilappagari, Bane Vasic, David Declercq, Ludovic Danjean. The abstract reads:
At the heart of modern coding theory lies the fact that low-density parity-check (LDPC) codes can be efficiently decoded by belief propagation (BP). BP is an inference algorithm which operates on a graphical model of a code, and lends itself to low-complexity and high-speed implementations, making it the algorithm of choice in many applications. It has unprecedentedly good error rate performance, so good that when decoded by BP, LDPC codes approach the theoretical limits of channel capacity. However, this capacity-approaching property holds only in the asymptotic limit of code length, while codes of practical lengths suffer abrupt performance degradation in the low-noise regime known as the error floor phenomenon. Our study of the error floor has led to an interesting and surprising finding: it is possible to design iterative decoders which are much simpler yet better than belief propagation! These decoders do not propagate beliefs but rather a different kind of message that reflects the local structure of the code graph. This has opened a plethora of exciting theoretical problems and applications. This paper introduces this new paradigm.
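To make the contrast concrete, here is a classical hard-decision bit-flipping decoder in the spirit of Gallager's early algorithms; it is emphatically not one of the new decoders of the paper, but it already illustrates an iterative decoder that never propagates beliefs, only counts of violated parity checks:

```python
import numpy as np

# Illustrative only: a classical hard-decision bit-flipping decoder, NOT the
# decoders introduced in the paper. Each iteration counts, for every bit, the
# unsatisfied parity checks it participates in and flips the worst offenders.

def bit_flip_decode(H, y, max_iters=50):
    """H: (m, n) binary parity-check matrix; y: length-n hard-decision vector."""
    x = y.copy()
    for _ in range(max_iters):
        syndrome = H.dot(x) % 2                 # 1 where a parity check is violated
        if not syndrome.any():
            return x, True                      # all checks satisfied: valid codeword
        unsat = H.T.dot(syndrome)               # violated checks touching each bit
        x = (x + (unsat == unsat.max())) % 2    # flip the most-suspect bit(s)
    return x, False

# Toy run with the (7,4) Hamming code standing in for a real LDPC code.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
received = np.zeros(7, dtype=int)
received[2] = 1                                 # inject a single bit error
decoded, ok = bit_flip_decode(H, received)      # recovers the all-zero codeword
```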
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.
The thesis referenced in "Compressing 35GB of Data in 35 Pages of Numbers" by the optimistic Mr. Nguyen does not appear to exist yet.
Irchans,
You are absolutely right.
Igor.
Luckily, putting this putative thesis as a mere reference between Turing, Shannon and Hamming should suffice.
Yes Laurent, we are indeed anxiously waiting for the thesis to come out.
Igor.
Here is the thesis:
3141592
Njh,
Nice!
Igor
Hi,
I am not aware of the easter eggs in arxiv, but is there a way to improve one's h-index with this kind of publication?
1) He references himself and three legendary people with high h-index.
2) He works in a field with little access for the noobs.
3) The size of the PDF file without the first and the last page is 512 KB... It reminds me of the size of a direct-mapped cache.
4) I guess his work is in fact about the arxiv algorithm and indexation.
Very nice one for 3141592 :)
Nico.
Nico,
I don't think the h-index takes into account preprints such as those on arXiv. To get a high h-index you have to be cited in the references of other papers, which I doubt will happen if the paper is not what it is promising.
Igor.
LOL, that's neither the first nor the last bogus claim of astounding compression ratios.
There seems to be something "magical" about compression which draws crackpots in droves.
OTOH it is true that you can compress ONE data set of your choosing to a single bit: you prefix a 0 bit to any data set which is not the "chosen one" to indicate no compression, and send only a 1 bit to specify that the data set IS the chosen one. Et voilà! You "sent" the full data set with a single bit, at the cost of a very minor increase in the size of everything else... :-D
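(Kev's one-bit "codec", spelled out as a toy sketch; a whole byte stands in for each flag bit purely for readability, and the CHOSEN blob is of course a placeholder:)

```python
# Toy sketch of the scheme described above: the one pre-agreed data set
# compresses to a single flag; everything else grows by the size of the flag.

CHOSEN = b"the one 35GB dump both sides agreed on in advance"  # placeholder

def encode(data: bytes) -> bytes:
    return b"\x01" if data == CHOSEN else b"\x00" + data

def decode(blob: bytes) -> bytes:
    return CHOSEN if blob == b"\x01" else blob[1:]

assert decode(encode(CHOSEN)) == CHOSEN
assert decode(encode(b"any other data set")) == b"any other data set"
```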
Kev,
You are absolutely right. I am still waiting to see what will come out of the thesis but, like you, I am not holding my breath.
Igor.
The thesis got censored by the Canadian government and the student received death threats.