Nuit Blanche community
@NuitBlog || Facebook (462) || Reddit (2403)
Compressive Sensing on LinkedIn (3984)
Advanced Matrix Factorization on Linkedin (13335)||
The Advanced Matrix Factorization Jungle Page ||
Monday, May 21, 2007
Tuesday, May 15, 2007
We now present a second model that uses Laplacians instead of Gaussians to model the posterior distribution of the depths. Our motivation for doing so is three-fold. First, a histogram of the relative depths (di - dj ) empirically appears Laplacian, which strongly suggests that it is better modeled as one. Second, the Laplacian distribution has heavier tails, and is therefore more robust to outliers in the image features and error in the trainingset depthmaps (collected with a laser scanner; see Section 5.1). Third, the Gaussian model was generally unable to give depthmaps with sharp edges; in contrast, Laplacians tend to model sharp transitions/outliers better.
Tree-based pursuit efficiently trades off complexity and approximation performance for overcomplete signal expansions. Finding the sparsest representation of a signal using a redundant dictionary is, in general, a NP-Hard problem. Even sub-optimal algorithms such as Matching Pursuit remain highly complex. We propose a structuring strategy that can be applied to any redundant set of functions, and which basically groups similar atoms together. A measure of similarity based on coherence allows for representing a highly redundant sub-dictionary of atoms by a unique element, called molecule. When the clustering is applied recursively on atoms and then on molecules,it naturally leads to the creation of a tree structure. We then present a new pursuit algorithm that uses the structure created by clustering as a decision tree. This tree-based algorithm offers important complexity reduction with respect to Matching Pursuit, as it prunes important parts of the dictionary when traversing the tree. Recent results on incoherent dictionaries are extended to molecules, while the true highly redundant nature of the dictionary stays hidden by the tree structure.
Monday, May 14, 2007
Last month, Andrew Gould, the CEO of Schlumberger gave a prep talk at an open house.
SCHLUMBERGER OPEN HOUSE
Schlumberger businesses and technologies demonstrations will include subsurface fluid sampling, integrated well completions, robotic tractors in a wellbore, reservoir modeling software, and geophysical seismic exploration.
10:00 a.m. to 4:00 p.m., Zachry Lobby
Chairman and CEO, Schlumberger
TITLE: Engineering Challenges (and Successes) in the Search for Oil and Gas
4:00 p.m., Room 102 Zachry
The open presentation attracted a large crowd. During the presentation, I was intrigued by the statement by Andrew that Schlumberger was positioning itself to be a provider of service for Carbon burying technology. But when you think about it, it makes sense as they have devised many services and technologies that are needed for this type of undertaking.
The room was full of people who looked like they wanted to be hired and so it was difficult to have any of them ask questions at the very end of the talk. Pissing off the CEO of the company you want to join, is a very compelling argument to not talk or ask question, or so they believe.... So I ended up having to do the dirty deed, but I was in fact really interested in several answers.
I have mentioned Schlumberger in this blog a while back, it was because of their ability to get signals from 3000 meters underground by using pulsed mud telemetry in the process generally known as Logging While Drilling. The main point was that, in order to save about 200 to 300K$ per day, they had to gather data at the drilling post in real-time so that they could steer the drilling bit (yes, drilling bits can go horizontal). Some people at Sandia have devised a Disposable Fiber Optic Telemetry System but it does not seem to have gain any traction in that industry. Pulsed mud bit rate is equivalent to an astonishing 30 bits per second transmission rate last time I checked. My question to Andrew was: have you guys done better in the past few years ? and the answer looked like a big maybe. He mentioned a new technology that uses some type of radio transmitter between each of the drilling rods but it did not seem to be a system that was yet currently used in the field. The mud communication system is an amazing piece of inventivness and the communication aspect of it is one of the most interesting problem to work on. Because of the very harsh constraints on the system (pressure, temperature,...) I am barely surprised that there isn't a better solution but I also think they should think outside the box on this one. My take would probably include using compressed sensing so that the amount of power generated in the measuring bit can be decreased tremendously. Heat generation (by the computers/electronics of the measuring bit) is non-trivial as there is little in the way of cooling when producing heat in these depths (the soil surrounding the bit is already warmer than the inside). Because of the high temperature environment, one also has to develop some better electronics to deal with these high temperature environment (see Sandia's presentation on electronics development and the need for new technology (SOI))
I then asked a question about the Canadian tar pits and the use of technology such as heat pipe to transfer energy from geothermal wells all the way up to the tar pits in order to warm them up so that they become liquid (i.e. less viscous and therefore more enconomical to retrieve from the ground). The answer looked like there is already have a program called "HTPT" that looks at that. HT may mean high temperature but I am sure what PT stands for.
And then I asked the "forward looking" question: if you wanted to differentiate yourself from your competitors in the next two or three years, where would you put your money in ? The answer was interesting because I was not expecting it. The way I interpreted what he said was: Data fusion, how do you combine the large amount of data produced in the field to have a clearer picture of your oil field (not just in three dimensions but also including time). When I went to talk to each of the engineers present at the different booth after the presentation, it did not seem that they had a view of what that entailed. One of the reasons mentioned was that most customers were not willing to put money into this type of analysis and so the company did not have a specific research team dedicated to that. The company itself is known to be dealing with very large amount of data and making sense of them for their customers. Yet summarizing that knowledge seems to be a difficult undertaking that most customers are only willing to do in-house. I am sure that an enterprising person with views on this issue could help them out. There is no reason to believe that developments in dimensionality reduction in the past few years should not be considered for those gigantic datasets.
Data fusion is also some kind of buzzword, so it may be productive to define what that means. In the measuring bit, there are different kinds of instruments, including neutron generators, radiation detectors, NMR and electromagnetic. Some of the current work seems to have been able to correlate seismic and flow measurements in order to provide a better assessment of the borehole condition. Therefore, a data fusion scheme would be aimed at correlating all the measurements from several types of sensors in order to provide additional information about either the location of the measuring bit and the time dependent geological conditions around that bit.
In order to do that, one has to compare measurements with computations. One of current generic concern is the ability to do inversion with Monte-Carlo codes such as MCNP (This is a very difficult problem because the solving of this inverse problem requires several many runs of forward computation by MCNP) or faster but coarser deterministic methods. You have many different parameters that you change (sensitivity studies) in order to figure out the distribution of parameters for the situation of interest.
Since MCNP or deterministic codes have many different parameters and are running in a finite time, one needs to have tools that provide a way of "interpolating" between parameters family you have not explored computationally. In the end, this problem is not unlike the problem faced in nuclear engineering when one runs a complex thermal hydraulics code: The Experimental Probabilistic Hypersurface tries to help in that respect.
Friday, May 11, 2007
Markerless Human Motion Capture with Multiple Cameras My Ph.D manuscript deals with the problem of markerless human motion capture. We propose an approach that relies on the use of multiple cameras and that avoids most of the constraints on the environment and the use of markers to perform the motion capture, as it is generally the case for industrial systems. The absence of markers makes harder the problem of extracting relevant information from images but also to correlate this information between. Moreover, interpreting this extracted information in terms of joint parameters motion is not an easy task. We propose an approach that relies on occluding contours of the human body. We studied the link between motion parameters and the apparent motion of the edges in images. Minimizing the error between the extracted edges and the projection of the 3D model onto the images allows to estimate the motion parameters of the actor. Among the opened issues, we show that using video based motion capture allows to provide additional hints such as contacts between body parts or between the actor and its environment. This information is particularly relevant for improving character animation.
While the initial interest is in capturing human motion at low cost (as opposed to current systems which cost up to $400,000), I believe this is the beginning of technology development that is central to the study and detection of autism in infants (3-6 months old). The current state of the affairs with regards to Autism detection is that one waits until speech is shown to be very late (about 2 year old) to begin diagnosing the disease even though it has been shown that the brain growth has been abnormal from 0 to 2 as shown by Eric Courchesne. With the advent of the digital world, home movies are beginning to be records of the state of knowledge on the condition of people. Some studies have shown at the same time that home movies could be used to figure out very early that something is not right (search Pub Med with the keywords: Autism movies). For instance, in
"Early recognition of 1-year-old infants with autism spectrum disorder versus mental retardation" by Osterling JA,Dawson G,Munson JA (Dev Psychopathol. 2002 Spring;14(2):239-51.), one can read the following:
Results indicated that 1-year-olds with autism spectrum disorder can be distinguished from 1-year-olds with typical development and those with mental retardation. The infants with autism spectrum disorder looked at others and oriented to their names less frequently than infants with mental retardation. The infants with autism spectrum disorder and those with mental retardation used gestures and looked to objects held by others less frequently and engaged in repetitive motor actions more frequently than typically developing infants.
Laurens van der Maaten just released a large matlab toolbox featuring 25 dimensionality reduction techniques [Updated: Dec 1, 2008]. And then you can use PRTools to do your own classification. Be careful what you wish for :-)
The list of techniques readily implemented:
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
Multidimensional scaling (MDS)
Local Linear Embedding (LLE)
Local Tangent Space Alignment (LTSA)
Conformal Eigenmaps (extension of LLE)
Maximum Variance Unfolding (extension of LLE)
Fast Maximum Variance Unfolding (FastMVU)
Generalized Discriminant Analysis (GDA)
Stochastic Neighbor Embedding (SNE)
Neighborhood Preserving Embedding (NPE)
Locality Preserving Projection (LPP)
Linear Local Tangent Space Alignment (LLTSA)
Stochastic Proximity Embedding (SPE)
Multilayer autoencoders (training by RBM + backpropagation or by an evolutionary algorithm)
Local Linear Coordination (LLC)
Coordinated Factor Analysis (CFA)
I have had requests from several parties to give an overview on the type of obstacle avoidance scheme that might be most promising. Right now, we (Pegasus) are still evaluating some of these, so this entry should not be construed as part of our algorithm unveiling entries but rather a general overview we did a while back. It is important to realize that the main contribution of this entry is really about defining a hardware + software solution to the localization of cues that will be later learned/categorized. One large component of an obstacle avoidance solution is the machine learning/statistical device used to identify rapidly these cues as problematic for the autonomous vehicle or not. This is not unlike the human cortex (see reference section   ).
In the case of the Urban Challenge, we are facing not only stopped obstacles but moving one as well. The moving obstacle have behaviors from which one needs to learn from as well. In other words, in the case of vision, a lot of work boils down to producing some amount of cues/features (a small number) from a very large set of data (pixels from an image). In some areas of computer science this is called dimensionality reduction.
- The fly algorithm, a robust stereo algorithm using real time a genetic algorithm (yes, there is such thing as real time genetic algorithm!) and has been tried on cars. and specifically to has been used to avoid people and other objects. The initial thesis with the algorithm is in french. Improvement over the thesis have been focused on the car driving experience.
- There are also numerous commercial solutions as listed by the folks at V2_lab's where they discuss each of them. I found this entry pretty revealing about the state of the affairs with regards to stereovision, you have to look at the comment section
For most stereo matching algorithms the Firewire cameras produce higher quality uncompressed images that do not wreak havoc on sensitive feature detectors. I use the Unibrain Fire-I board camera http://www.unibrain.com/index.html with the CMU 1394 Digital Camera API (Windows), which gives you very complete control and works with just about any Firewire camera, because they all use the same standard interface. http://www.cs.cmu.edu/~iwan/1394/ . When I read the technical reports for the 2005 DARPA Grand Challenge almost every report showed pictures of vehicles equiped with stereo pairs of cameras, but at the race just about all of them had been removed, presumably because of basic issues such as camera synchronization.
- Monocular evaluation of the distance field. There are two approaches that caught our attention:
- Vision-based Motion Planning for an Autonomous Motorcycle on Ill-Structured Road, Dezhen Song, Hyun Nam Lee, Jingang Yi and Anthony Levandowski from the bike entry at the last Grand Challenge, and
- Depth Estimation Using Monocular and Stereo Cues / 2197, Ashutosh Saxena, Jamie Schulte, Andrew Y.Ng and Learning Depth from Single Monocular Images, Ashutosh Saxena, Sung Chung, and Andrew Y. Ng. In NIPS 18, 2006. [ps, pdf]
- (With regards to Monocular information, we should not forget the excellent Mono-SLAM : This website has a Matlab implementation of the SLAM using monocular vision. There is a timely thesis on the subject here where it looks like using two cameras implementing both the monoslam algorithm.)
- A Random Lens Imager, it is a hardware implementation of a totally new concept in data processing known as compressed sensing (don't ask anybody around you about it because it is too new). It needs only one camera but much of the work goes into the calibration.
 T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber and T. Poggio. Object recognition with cortex-like mechanisms. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 29 (3), pp. 411-426 , 2007 T. Serre. Learning a dictionary of shape-components in visual cortex: Comparison with neurons, humans and machines, PhD Thesis, CBCL Paper #260/MIT-CSAIL-TR #2006-028, Massachusetts Institute of Technology, Cambridge, MA, April, 2006
(Page 154-163 have the model parameters).
 T. Serre, L. Wolf and T. Poggio. Object recognition with features inspired by visual cortex. In: Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), IEEE Computer Society Press, San Diego, June 2005
Thursday, May 10, 2007
Sparsify is a set of Matlab m-files implementing a range of different algorithms to calculate sparse signal approximations. Currently sparsify contains two main sets of algorithms, greedy methods (collected under the name of GreedLab) and hard thresholding algorithms (collected in HardLab)Algorithms implemented include:
- Orthogonal Matching Pursuit (in several flavors),
- Matching Pursuit algorithm, Greedy Pursuit,
- Approximate Conjugate Gradient Pursuit,
- Nearly Orthogonal Matching Pursuit with partial Conjugate Gradients,
- Iterative hard thresholding algorithm.
Reference: Rice Compressed sensing site.
Monday, May 07, 2007
As a small team, we had to make choices that are reflected in our choice of hardware and algorithm. One of the most time intensive aspect of devising a robot is the development of an accurate motion model and then an adequate sensor/perceptual model .
We could not spend an entire team and the time to devise a motor model as well as a sensor model as what other teams have done. Some of it stems from the fact that we are not interested in getting M.S. theses written for the moment as it is the case here. Furthermore, our vehicle is far from being protected from the elements. And we considered that it would become difficult to rely on a model that could become less and less accurate as the car was getting older. Another aspect of our decision making came from the actuators we used. In the case of the steering wheel, there are moments when the steering motor can slip (especially in large deviations). This is owed to two factors: the laptop is busy doing other tasks can sometime send only the steering task with a delay or there is too much torque too be applied too fast. This results in a command control that would be difficult to model. Finally, we were interested in developing a robot that would learn it's own behavior while being supervised with very little labeled data. In other words, we want the learning to be based on data coming from an actual driving behavior. Not much data understanding should come from any subsequent data processing.
Our motor model relies on our ability to drive around while in the car through the keyboard of the command laptop. This supervised learning of the motor model was built by producing a probability distribution of the next state of the car as a function of the previous state and the commands sent by the "supervised" commands. We used a a variety of techniques to build this probability distribution including the simple but robust Experimental Probabilistic Hypersurface devised at SCM  as well as Subjective mapping representation techniques (by Michael Bowling, Ali Ghodsi and Dana Wilkinson, Subjective Localization with Action Respecting Embedding and ,   )
In the latter case, we decided against using the SDE approach but use Compressed Sensing instead for the dimensionality reduction aspect of the problem for building the sensor model. Some aspect of the reduction include some of the ideas in .
 D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello., Bayesian Filtering for Location Estimation, IEEE Pervasive Computing, 2003.
 Robust Mathematical Modeling, SCM, Experimental Probabilistic Hypersurfaces
 [pdf] [ps.gz] Subjective mapping, Michael Bowling, Dana Wilkinson, and Ali Ghodsi.
In New Scientific and Technical Advances in Research (NECTAR) of the Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI) , pages 1569--1572, 2006.
 [pdf] [ps.gz] Subjective localization with action respecting embedding, Michael Bowling, Dana Wilkinson, Ali Ghodsi, and Adam Milstein. In Proceedings of the International Symposium of Robotics Research (ISRR) , 2005.
 [pdf] Action respecting embedding, Michael Bowling, Ali Ghodsi, and Dana Wilkinson.
In Proceedings of the Twenty-Second International Conference on Machine Learning (ICML) , pages 65--72, 2005.
 [pdf] [ps.gz] Learning subjective representations for planning, Dana Wilkinson, Michael Bowling, and Ali Ghodsi. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI) , pages 889--894, 2005.
 INRIA :: [inria-00117116, version 1] Real-time Vehicle Motion Estimation Using Texture Learning and Monocular Vision
Friday, May 04, 2007
In a previous entry, I mentioned that we would be unveiling our algorithm. Before we do this, I will talk about the different sensors we have and use.
- GPS: Since the race is mostly a mapping between the vehicles knowledge of its surrounding and a set of maps given by DARPA with GPS coordinates ( see Route Network Definition File (RNDF) and Mission Data File (MDF) ), we need to have a GPS. We do not have a DGPS, just a normal GPS chip with which we are interfacing with a an RS-232 interface at a rate of 1 Hz. To be specific this is a GARMIN GPS 18 PC, OEM SYSTEM, WAAS.
- IMU: We also have a Microstrain MEMS IMU that provides acceleration, turning rate and heading (3DM-GX1). It provides information at 100 Hz.
- Vision system: We use a series of webcams and a set of Unibrain firewire cameras. Right now our frame rate os about 15 Hz.
All these sensors interface with the Python program through an RS-232 channel. In terms of sophistication, it just does not get any better for us. One of the underlying reason is the realization that gathering data is one thing, using them efficiently is a totally different one. In particular, there are instances where reading and processing information from the IMU is not interesting.
With regards to the actuators, we currently have two large stepper motors connecting to the python program through the serial port. The first stepper motor rotates the steering wheel and the second one activates either the brake or the acceleration pedal.
One of the ways to do supervised learning, is to run the python program from a laptop that connects to both the sensors and the stepper motors. One can then run the car through the keyboard of the laptop. It works well as long as Skype is not running on the laptop at the time ( yes, we tried :-), it's a little bit like talking on your cell while driving....
In my next entry, I will discuss the modification to the webcams and firewire cameras so that they provide meaningful information. In particular, I will talk about the algorithm for the stereo system as well as the hardware and software implementation of the compressed sensing element of our algorithm (a random lens imager using a webcam). Both are in competition and we believe that the stereo system will not need to be used eventually.
Tuesday, May 01, 2007
A. Basic NavigationThe current rules are here. The site visit is scheduled to occur in San Antonio. The RNDF file describes the road to the vehicle. The MDF is the file that describes what is expected from the vehicle for its mission. There is no mention in the MDF of potential obstacles, traffic or GPS outage (as generally seen in cities with large buildings like Paris or New-York, a phenomenon called the canyon effect). The expectation is for the autonomous vehicle to follow the rules to get out these unexpected problems and accomplish its mission within the alloted time. DARPA decides on May 10th which team to invite at the Site visits.
B. Basic Traffic
- A. 1Preparation for run
- A.2 Mission start
- A.3. Checkpoints
- A.4. Stay in lane
- A.5. Speed limits
- A.6. Excess delay
- A.7. Collisions
- A.8. Stop line
- A.9. Vehicle Separation
- A.10. Leaving lane to pass
- A.11. Returning to lane after pass
- A.12. U-turn
C. Advanced Navigation
- B.1. Basic Navigation
- B.2. Intersection precedence
- B.3. Minimum following distance
- B.4. Queueing
D. Advanced Traffic
- C.1. Basic Traffic
- C.2. Obstacle field
- C.3. Parking lot
- C.4. Dynamic re-planning
- C.5. Road following
- C.6. GPS outage
- D.1. Advanced Navigation
- D.2. Merge
- D.3. Vehicle separation during merge
- D.4. Left turn
- D.5. Vehicle separation during left turn
- D.6. Zones
- D.7. Emergency braking
- D.8. Defensive driving
- D.9. Traffic jam
Technical Note: A parser for the RNDF and MDF can be found here.