Saturday, June 30, 2007

Of Mice and Floating Men: Dimensionality Reduction of Biological Processes

Joe Z Tsien in the recent Scientific American (free pdf here until today) talks about how he devised different techniques with mice in order to understand what a memory is. In the course of the analysis he used Multiple Discriminant Analysis (MDA) to analyze how cell activation location where related to specific memories of different events (such as falling or an earthquake). The video of the MDA in action is here. This is fascinating on two counts: a Machine learning technique producing dimension reduction is used to isolate specific elements of an actual learning process. In the detailed version of the article, Tsien remarks that the initial feature space is about 260 and the subspace of interest is 3:

After collecting the data, we first attempted to tease out any patterns that might encode memories of these startling events. Remus Osan--another postdoctoral fellow--and I analyzed the recordings using powerful pattern-recognition methods, especially multiple discriminant analysis, or MDA. This mathematical method collapses what would otherwise be a problem with a large number of dimensions (for instance, the activities of 260 neurons before and after an event, which would make 520 dimensions) into a graphical space with only three dimensions. Sadly, for classically trained biologists the axes no longer correspond to any tangible measure of neuronal activity but they do map out a mathematical subspace capable of discriminating distinct patterns generated by different events.

One cannot be but thinking how this tool (MDA) is in fact mimicking the biological process. Tsien then goes on to talk about the hierarchical structure of the memory process not unlike the primary visual cortex model from MIT whereby the visual process reduces the feature space from several hundred dimensions down to very few.

The second fascinating aspect of this experiment with mice is the similarity with an experiment I went through multiple times: flying in zero gravity. Besides the floating experience, I have always noted the inability of most flyers to clearly remembers their experiences. This is so bad, that in any of the free flying planes used by either NASA or ESA, there are large counters of parabolas informing people how many remaining zero-g periods they will have to experience over the course of two hours.
One could think that this is due to the fact that people are not accustomed to the conditions and that it is somehow traumatic. It so happens, that even after 50 flights, I still have the hardest time remembering precisely what happened during these flights. My point is : Is the paper by Tsien really locating a real memory or something else ?

Thursday, June 21, 2007

On-going concern

It is not a fair picture of Pegasus Bridge 1, as it is obviously missing the light bulbs. DARPA will visit us next week. As usual we are going through some intense last minute implementations and preparations. One of them is having to find enough people to show up at the visit (all of us have day jobs or cannot make it in one day to Texas) as we are required to have five people on-site.

Wednesday, June 20, 2007

Tuesday, June 19, 2007

Google Press Day 2007

I took a break and went to Google Press Day. Of all things, we were entertained with how Google want to reduce Carbon emission by using evaporative cooling for their data centers (this is why they want to be close to rivers). You never know when people are going to talk about two-phase flow. Nice pens and tee-shirts.

Machine Learning and Compressed Sensing

I just did an entry on the subject in John Langford's blog. It is entitled: How is Compressed Sensing going to change Machine Learning ?

Tuesday, June 12, 2007

Compressed Sensing: Universal Dimensionality Reduction, Pattern Matching and a Biology Inspired Faster Reconstruction Algorithm

[March 2, 2010: Wired/Slashdot readers, you may want to read this dedicated blog entry first!]

Mike Wakin and Richard Baraniuk have improved the initial draft on Random Projections of Smooth Manifolds and they are now fully mentioning the point that Compressed Sensing should be seen as a Universal Dimensionality Reduction tool. While they write about how that technique can help in the rapid computation of other schemes such as ISOMAP or MVU, it also propels the technique in the world of Machine Learning. There are several advantages for using this technique for manifold discovery. Not the least is the ability to avoid computing pairwise distances first. Another aspect that will certainly be investigated is how the approach of Shihao Ji, Ya Xue, and Lawrence Carin on Bayesian compressive sensing could be used to discover the dimension of the manifold. In effect, most current techniques in dimensionality reduction cannot do this since the dimension of the manifold is left as a parameter to be discovered through fitting. One of the most significant idea to take from the paper is that the manifold itself allows for reduced number of element to describe the data. This is new and tighter result in the world of randomized dimensionality reduction:

Additionally, our results suggest that, for processing large volumes of data concentrated on manifolds, the number of requisite dimensions for a structure-preserving mapping should derive from the properties of the manifold itself, rather than the number of data points (in contrast to the JL lemma).

Indeed, the Johnson-Lindenstrauss (JL) Lemma has been known since 1984 and while a Fast Transform has been developed for it [2], it is dependent on the number of samples rather than on the manifold on which these samples are from. To put it in a different light: ever since JL, random projections have been a subject of much interest. But because the bound on the minimum number of dimensions grows with the number of points, one is still facing what people call the curse of dimensionality (albeit a small version of it.) What the compressed sensing framework and what this paper bring to the table is that there is a limit on the number of random projections you need to compute (or obtain) in order to obtain all the information about the object you are studying. And that limit is not connected to the size of the sample set. This is big.

Another item caught my limited attention on the Rice Compressed Sensing page. It is an article by Jong Chul Ye on Compressed sensing shape estimation of star-shaped objects in Fourier imaging. The reason it caught my eyes is because one could begin to use Compressed Sensing as part of a pattern matching algorithm used for the rapid detection of extraterrestrial objects or flower constellations.

Compressed Sensing seems to get more traction in the MRI field with an entire session at a meeting dedicated to it , for more information check out: HYPR -HighlY constrained back PRojection- and Compressed Sensing.

My attention and writings are limited these days because of the upcoming DARPA site visit for our vehicle but I am sure I will come back and write something more in-depth about the very interesting paper by Christopher Rozell, Don Johnson, Richard Baraniuk and Bruno Olshausen on Locally Competitive Algorithms For Sparse Approximation. It is interesting because, for the first time, a scheme is devised toward solving the sparsity issue in time and has the potential to imitate the visual cortex and produce new codecs.

[1] Rice Compressed Sensing page.
[2] Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform [ pdf ] [slides (ppt)] by Nir Ailon, Bernard Chazelle.

France: Technologies de rupture et quantification de la maturite d'une technologie.

Lors de ma visite au Salon Europeen de la recherche, j'ai parle avec certaines personnes de OSEO, anciennement ANVAR (plus une autre entite dont je ne me rappelle plus le nom). Lors de la discussion, nous sommes arrives au sujet des baremes/echelles qui sont utilises en France de facon a quantifier le niveau de maturite d'une technologie. C'est important car du point de vue programmatique, les administrations et autres donneurs d'ordres prives doivent etre capable de dire aux chercheurs leurs besoins dans des termes qui sont simples. Cela permet de ne pas perdre son temps sur des technologies qui ne sont pas avancees ou qui le sont trop. Il semble que ce processus d'identification se fait au sein d'OSEO grace a l'utilisation d'experts. C'est interessant mais ce n'est pas le plus important. Il y a beaucoup de technologies que meme les experts ne peuvent juger correctement soit de par leur formation ou a cause d'une connaissance trop profonde des choses qui se font maintenant dans leur domaine. Il y a un vrai risque que nous passions, en France, a cote de technologies de rupture. Bien que ce mot soit a la mode, il est utilise, avec en tete, la definition de Clayton Christensen qui a defini le concept avec son livre "The Innovator's Dilemna" dont le premier chapitre se trouve ici. En resume, les technologies de rupture sont souvent des technologies que les experts ne considerent pas comme viable mais qui est capable d'avoir des parts de marche tres importantes dans des marches "exotiques". Cela leur permet de survivre et de s'affiner jusqu'au jour ou elles supplantent les technologies qui sont deja sur les marches plus traditionnels.

Pour en revenir a l'evaluation des technologies, il y a ce qu'on appelle le niveau de maturite d'une technologie, ou ce que l'on appelle en Americain: Technology Readiness Level (TRL). C'est un concept qui permet aux decideurs techniques, economiques et politiques de mieux cerner les differents niveaux d'avancement ou de maturite de certaines technologies de facon a permettre de repondre a certains besoins. Par exemple, la NASA ne finance en ce moment que des technologies de niveau TRL 8 a 9 pour une majorite de systemes qui iront sur la station spatiale alors que la NSF est dans le financement de technologies de niveaux TRL 1 a 4 (au maximum). Ce tableau est issue d'une traduction de l'entree de TRL sur wikipedia que j'ai modifie (je ne suis pas expert en traduction donc je suis ouvert a tout changement). Il est assez recent et a ete compose par John Mankins parcequ'il y avait beaucoup de confusion au sein de la NASA sur le choix des technologies a developer.

Niveaux de maturite des technologies a la NASA (Source : Mankins (1995), niveaux de maturite des technologies : Un livre blanc)
Niveau de maturite des technologie Description
TRL 1. Principes de base observés et rapportés C'est le « niveau le plus bas » de maturite d'une technologie. À ce niveau, la recherche scientifique commence à être traduite en recherche et développement appliqués.

This is the lowest "level" of technology maturation. At this level, scientific research begins to be translated into applied research and development.
TRL 2. Concept et/ou application de technologie formulés Une fois qu'on observe les principes physiques de base de cette technologie, des applications pratiques de ces caractéristiques peuvent « être inventées » ou identifiées au prochain niveau de maturite. À ce niveau, l'application de la technologie est encore spéculative : il n'y a pas de preuve expérimentale ou d'analyse détaillée pour soutenir la conjecture.

Once basic physical principles are observed, then at the next level of maturation, practical applications of those characteristics can be 'invented' or identified. At this level, the application is still speculative: there is not experimental proof or detailed analysis to support the conjecture.
TRL 3. Fonction critique analytique et expérimentale et/ou preuve caractéristique du concept À cette étape dans le processus de maturation, la recherche et le développement actifs (R&D) sont lancés. Ceci doit inclure des études analytiques pour placer la technologie dans un contexte approprié et des études en laboratoire pour valider physiquement que les prévisions analytiques sont correctes. Ces études et expériences devraient constituer la preuve de la validation des applications et des concepts formulés niveau precedent (TRL 2).

At this step in the maturation process, active research and development (R&D) is initiated. This must include both analytical studies to set the technology into an appropriate context and laboratory-based studies to physically validate that the analytical predictions are correct. These studies and experiments should constitute "proof-of-concept" validation of the applications/concepts formulated at TRL 2.
TRL 4. Validation de composant et/ou en prototype dans l'environnement du laboratoire Après avoir valider les applications et les concepts formules au niveau TRL2, des éléments technologiques de base doivent être intégrés de facon a établir que chacun des « morceaux » de la technologie travailleront bien ensemble. Ceci afin de documenter et prouver des niveaux de performance d'un composant et/ou d'un prototype. Cette validation doit être conçue pour soutenir le concept qui a été formulé plus tôt, et devrait également adherer aux conditions des applications potentielles de système. La validation est relativement de « basse fidélité » comparée au système final : elle pourrait se composer de composants mis en place ensemble dans un laboratoire.

Following successful "proof-of-concept" work, basic technological elements must be integrated to establish that the "pieces" will work together to achieve concept-enabling levels of performance for a component and/or breadboard. This validation must be devised to support the concept that was formulated earlier, and should also be consistent with the requirements of potential system applications. The validation is relatively "low-fidelity" compared to the eventual system: it could be composed of ad hoc discrete components in a laboratory.
TRL 5. Validation de composant et/ou du prototype dans l'environnement approprié À ce niveau de maturite, la fidélité du composant et/ou du prototype au produit final doit avoir augmenter de manière significative. Les éléments technologiques de base doivent être intégrés avec des éléments de support raisonnablement réalistes de sorte que toutes les applications (niveau composant, niveau de sous-ensemble, ou niveau système) puissent être examinées dans un environnement « simulé » ou quelque peu réaliste.

At this level, the fidelity of the component and/or breadboard being tested has to increase significantly. The basic technological elements must be integrated with reasonably realistic supporting elements so that the total applications (component-level, sub-system level, or system-level) can be tested in a 'simulated' or somewhat realistic environment.
6. Système/modèle de sous-ensemble ou démonstration de prototype dans un environnement approprié (sur terre ou dans l'espace) Une étape importante au niveau de la fidélité de la démonstration de la technologie suit l'accomplissement du niveau TRL 5. Au niveau TRL 6, un système représentatif de modèle ou de prototype ou du système - qui dépasseraient bien un agencement ad hoc de composants ou un prototype avec des composants simple non integres - serait examinée dans un environnement approprié. À ce niveau, si le seul « environnement approprié » est l'environnement de l'espace, alors le modèle/prototype doit être démontré dans l'espace.

A major step in the level of fidelity of the technology demonstration follows the completion of TRL 5. At TRL 6, a representative model or prototype system or system - which would go well beyond ad hoc, 'patch-cord' or discrete component level breadboarding - would be tested in a relevant environment. At this level, if the only 'relevant environment' is the environment of space, then the model/prototype must be demonstrated in space.
TRL7. Démonstration de prototype de système dans un environnement de l'espace Le niveau TRL 7 est une étape significative au delà du niveau TRL 6, qui exige une démonstration réelle de prototype de système dans un environnement de l'espace. Le prototype devrait être près d'un niveau opérationnel et la démonstration de cette technologie doit avoir lieu dans l'espace.

TRL 7 is a significant step beyond TRL 6, requiring an actual system prototype demonstration in a space environment. The prototype should be near or at the scale of the planned operational system and the demonstration must take place in space.
TRL8. Système réel accompli et « vol qualifié » par l'essai et la démonstration (sur la terre ou dans l'espace) Dans presque tous les cas, ce niveau est la fin du « développement d'un système technologique» pour la plupart des éléments de cette technologie. Ce niveau pourrait etre l'intégration de cette nouvelle technologie dans un système existant.

In almost all cases, this level is the end of true 'system development' for most technology elements. This might include integration of new technology into an existing system.
TRL9. Système réel « vol prouvé » par des opérations réussies de mission Dans presque tous les cas, la fin des aspects de réparation de dernier « bogue » du « développement du systeme technologique » final. Ce niveau pourrait inclure l'intégration de cette nouvelle technologie dans un système existant. Ce niveau de maturite n'inclut pas l'amélioration des systèmes en operation ou réutilisables.

In almost all cases, the end of last 'bug fixing' aspects of true 'system development'. This might include integration of new technology into an existing system. This TRL does not include planned product improvement of ongoing or reusable systems.

Il y a un tableau similaire pour l'armee (Air Force).

[ PS: Bien que cette table existe depuis 1995 et est utilise au sein de la NASA depuis 1998, il se trouve que les administrations et organismes d'etats francais commencent seulement a l'utiliser: Exemple les recentes traductions du CNES ou dans le document 2006 de la politique et objectifs scientifiques de la DGA


Friday, June 01, 2007

Sparse Representations and High Dimensional Geometry: The upcoming success of randomized algorithms

In the short course on Sparse Representations and High Dimensional Geometry, there are obviously several items of interest to compressed sensing and the general framework around it. While Compressed Sensing could be done without random measurements, it seems that its connection to that subject has brought a lot exposure to subject areas that were using randomization as well. Joel Tropp's presentation on matching pursuit is interesting to follow in bringing some theoretical background to matlab codes already available.

Because the Fourier transform is at the heart of the first revolution in numerical analysis, I am very anxious to see how both developments in reconstruction from Fourier measurements as highlighted by Roman Vershynin and the algorithm for the Very Fast Fourier Transform of Jing Zou will pan out.

As a way to tickle one's interest in randomized algorithms, one can take a look at this intriguing paper from Roman on A randomized solver for linear systems with exponential convergence.