Showing posts with label data fusion.

Wednesday, December 07, 2011

Leonardo's Challenge

Forget DARPA's Shredder Challenge, this one has waited millions of years to get to us. You probably recall this entry on Leonardo, the fossilized mummy of a 65-million-year-old dinosaur (i.e. the internal organs and the skin have been fossilized!). If you don't remember that entry, please take a second to read it again. I also added the attendant video:

I have been talking to some of the folks involved in this video, in particular Tom Kaye and Art Anderson. Tom provided the software that could enhance the shots, while Art was behind the actual X-ray acquisition at Ellington Fields (the shots were so powerful that they had to be taken at night, when nobody would be around the hangars where the X-ray machine operated).

These X-ray shots are unique, and Art kindly provided a large suite of them for this challenge. Leonardo's challenge is pretty simple: can we reconstruct something beyond just playing with the contrast? Reconstruction does not necessarily mean 3D reconstruction; one could simply play with the contrast and SIFT points to assemble the different images together (we have no reference on how these shots were taken). It's your turn to be smart about how to use this pretty unique dataset. I'll feature the best efforts on the blog.
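
For anyone who wants to try the SIFT route, here is a minimal sketch of aligning two overlapping shots with OpenCV after a contrast stretch. The filenames are placeholders and the matching parameters are common defaults, not a recommendation.

# Minimal sketch: align two overlapping X-ray shots with SIFT + a homography.
# Assumes OpenCV (cv2) and numpy; 'shot_a.png' / 'shot_b.png' are placeholder filenames.
import cv2
import numpy as np

a = cv2.imread("shot_a.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("shot_b.png", cv2.IMREAD_GRAYSCALE)

# Stretch the contrast first: film scans tend to use a narrow band of gray levels.
a_eq = cv2.equalizeHist(a)
b_eq = cv2.equalizeHist(b)

sift = cv2.SIFT_create()
kp_a, des_a = sift.detectAndCompute(a_eq, None)
kp_b, des_b = sift.detectAndCompute(b_eq, None)

# Ratio-test matching, then a RANSAC homography between the two shots.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_a, des_b, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp shot A into shot B's frame so the two can be blended into a larger view.
warped = cv2.warpPerspective(a_eq, H, (b.shape[1], b.shape[0]))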

Here is an example of the enhancement Tom did:

Before

After



The Leonardo Challenge Dataset is here.

Thanks Art for making it happen.

Saturday, September 13, 2008

Leaving Houston

I have talked about Leaving Houston before, when under pressure from the elements. Here are data that I did not know existed in 2005: Doppler radar data over Houston (it looks like it is avoiding Texas A&M University),

and a view of what a 4-meter or 6-meter flood can do to the coast, and a live view of the weather.



Thank you Pedro for the one-week heads-up.

Monday, July 21, 2008

CS: Learning Sparse Representations for Audiovisual Signals



Gianluca Monaci and Friedrich Sommer submitted a poster at the Computational and Systems Neuroscience 2008 (COSYNE08) conference entitled Learning Sparse Representations for Audiovisual Signals. They try to build a model of the early, cross-modal integration of audio and visual signals.



Tuesday, April 15, 2008

CoSaMP, CVX , Mapping and Search and Rescue.


I have reshaped David Mary's script presented in MMA14 for CoSaMP and put it in the Compressive Sensing Code section. If there is any mistake, it's mine.


CVX: Matlab Software for Disciplined Convex Programming by Michael Grant, Stephen Boyd, and Yinyu Ye seems to have a larger Users' Guide.

I have found this interesting kit that may provide much ability in terms of performing some type of compressive sensing with cameras. Please note the heart shape coming from out-of-focus light in the photo above. More information can be found here or on the lensbabies site. Instead of cutting off a lens housing, one can directly buy one to build a heterodyning camera. But I am sure we can think of even better things.




While some people seem to think that you need a GPS-camera phone to know where you are (the eye-phone), other people like James Hays and Alexei Efros seem to trust the crowd to do part of the job, as mentioned in im2gps: estimating geographic information from a single image.


It now looks like we can use satellite imagery to find the boats that slowed down the internet. This needs to be implemented in search and rescue operations. The search for Jim Gray showed that this type of imagery took too long to process to be useful to search and rescue teams. On May 31, 2008, there will be a tribute to Jim Gray at Berkeley; I don't think I will be able to attend. I still believe that the main challenges listed here are not solved. I have mentioned some solutions (1, 2) but I am sure more can be found. All the entries on search and rescue operations can be found here. Following up on that, Alexandre Jenny tells me they have started to implement the orthographic projection capability in Autopano. This is very good news, as it will provide the ability to fly low, collect images and make maps out of them. This would be a very important capability in case of a major disaster ("but that can’t be — it’s still in Google Maps!": Making maps using commercial overflights). I definitely need to put together a page that summarizes all the entries written in this blog on the subject, since it seems to be of interest to different sets of people (search and rescue teams, journalists, ...).

Thursday, December 27, 2007

Building a Roaming Dinosaur: Why is Policy Learning Not Used ?


Dubai is going to host a Jurassic Park. It is about time: the current displays of animatronics are very much underwhelming and certainly do not yield the type of magic moment displayed on Laura Dern's face in Jurassic Park. Yet, all over the world, kids and families line up and pay from $10 to $100 for the privilege of being wowed. The most interesting feature of the Dubai 'Restless Planet' undertaking will be the claimed ability of the dinosaurs to roam. This is no small feat, as none of the current animatronics can do that. I have a keen interest in this, as you have probably noticed from the different entries on muscles, scaling and various autonomous robotics undertakings.

So over the years I have kept an eye on the current understanding of dinosaur gait. Examples on the web can be found here and here. One sentence is pretty much a good summary:

When the American Museum of Natural History wanted to create a digital walking Tyrannosaurus rex for a new dinosaur exhibit, it turned to dinosaur locomotion experts John Hutchinson and Stephen Gatesy for guidance.

The pair found the process humbling.

With powerful computers and sophisticated modeling software, animators can take a pile of digital bones and move them in any way they want. This part is easy; but choosing the most likely motion from the myriad of possibilities proved difficult...

The researchers think that one way to narrow down the possibilities of dinosaur movement is to use more rigorous physical constraints in computer models. These constraints fall into two broad categories: kinematic (motion-based) and kinetic (force-based). One simple kinematic constraint, for example, is that the ankle and knee cannot bend backwards.

So, in short, it is one thing to know the skeleton; it is another to devise how the movement goes. Yet the current methods devised to figure out gait are not very sophisticated. As it turns out, we have a similar problem in robotics and machine learning. Because robots are becoming increasingly complex, there need to be new methods of collecting data and summarizing them in what are called 'policies'. New methods are able to learn behavior for robots with many degrees of freedom through some type of supervised learning. Some of the techniques include Non-negative Matrix Factorization (NMF), diffusion processes and some of the techniques we tried in our unsuccessful attempt at DARPA's race in the future.

[ Update: a dinosaur finding showing preserved organic parts shows us that basing our intuition on bones alone is not enough. It looks as though dinosaurs may have been much larger. On a different note, it is one thing to model human behavior (and by extension dinosaur behavior) using differential equations, but the problem you are really trying to solve is: given a certain behavior, how can it fit the model set forth by the differential equations? This is what is called an inverse problem, and while a set of differential equations may give you the sentiment that you are modeling everything right, they generally are simplifications of the real joint behavior and its interaction with the environment (soil, ...). In short, to give a sense of realness, you have to go beyond a description with differential equations alone. For this reason, building a real roaming dinosaur needs the type of undertaking mentioned above in this entry. ]

Tuesday, September 18, 2007

Imaging from the sky: When You Become The Map

There is a new call from the HASP folks for submitting new payloads to be flown next year on a NASA high altitude balloon. The deadline is December 18, 2007, and it is directed toward undergraduate projects.
From the HASP website:

September 17, 2007: HASP CALL FOR PAYLOADS 2007-2008 RELEASED: The HASP Call for Payloads 2007-2008 (CFP) has been released and application materials are now available on the HASP website “Participant Info” page. Student groups interested in applying for a seat on the September 2008 flight of HASP should download these materials and prepare an application. New for this year is an increase in the allowed weight of the student payloads. Small class payloads can now mass up to 3 kilograms and large class payloads can weigh as heavy as 20 kilograms. Applications are due December 18, 2007 and selections will be announced by mid-January 2008.


The photos below and sideways are a 10 percent composite of several photos taken at 30,000 feet, with a 3x optical zoom, at 500 mph. The speed makes it very unlikely to get any good details without some type of processing. And so, for the time being, imaging the ground with some precision with a point-and-shoot camera seems feasible only from payloads on balloons.
Compared to satellite imagery, one of the interesting capabilities is the possibility of removing the effect of clouds. In satellite imagery, cameras use pushbroom technology, where the imager is a line of pixels (not a square die). One consequence is the inability to photograph the same object twice in one sweep. Using off-the-shelf cameras on much slower balloons allows one to obtain multiple images of the same object at different angles. This is important when one wants to evaluate whether the object is noise or not.
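
As a toy illustration of why multiple looks help, here is a sketch that stacks several already-registered shots of the same ground patch and takes a per-pixel median, so that clouds and other transients show up as large deviations. It assumes the registration step has been done elsewhere.

# Sketch: suppress clouds/transients with a per-pixel median across several
# registered shots of the same ground patch. Assumes the frames have already
# been aligned and loaded as a list of equally sized grayscale arrays.
import numpy as np

def composite(frames):
    stack = np.stack(frames, axis=0).astype(np.float32)  # (n_frames, H, W)
    median = np.median(stack, axis=0)                     # clouds appear in few frames
    # A pixel that deviates strongly from the median in only one frame is
    # likely transient (cloud, glint) rather than a ground feature.
    deviation = np.abs(stack - median)
    return median, deviation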

Chris Anderson of the Long Tail book mentioned a different approach by Pict'Earth to using images from the sky: UAVs, with the images patched into Google Earth. This is interesting, but as I have mentioned before, when you take enough images you don't need Google Earth, you don't need the headache of re-projecting these images onto some maps (even though it looks easier with Yahoo Map Mixer for small images), because you are the map. No need for IMUs or GPS instrumentation. This is clearly an instance of advances in stitching algorithms removing hardware requirements on the sensors. As for the current results Chris is getting from PTGui, I am pretty sure the Autopano folks will enable the orthographic projection soon in order to cater to that market. With balloons, the view is from very far, so the patching algorithm has no problem stitching images together. In the case of UAVs, you need the orthographic projection.

Eventually, two other issues become tremendously important (especially in the context of search and rescue). Cameras and memory are getting cheaper, and one is faced with GBs of data to store, map and share. Our experience is that sharing becomes challenging when you go over 2 GB of data, mostly because of file format limits (2 GB). Zoomify is interesting, but they need to figure out a way to deal with larger images. While Autopano allows images taken at different times to be overlaid with each other (a very nice feature), the viewer might be interested in this time information. Right now I know of no tool that allows one to switch back and forth between different times for the same map.

References:

1. Comparing Satellite Imagery and GeoCam Data
2. A 150-km panoramic image of New Mexico

Sunday, August 26, 2007

Hard Problems: Walking Dinosaurs wearing make-up while sleeping.

I am always a little astonished by things that are not implemented because they are deemed too hard. Yet I don't even see them being attempted in the first place, even though there is a very large market for each of them. Here they are:
  • How come we don't have a Jurassic Park with walking dinosaurs? Every time an animatronics show comes to town, you have lines of kids waiting to see those things, even if the time spent waiting will be longer than the time spent watching them, and yet we still don't have walking dinosaurs (except when a human is inside). How come? (My interest here lies in muscle, autonomous.) It looks as though we have only recently been able to devise their gait. Some people are already making a business case that building them will get people to come.
  • Knowing that some women spend as much as two hours every day on their make-up, how come there is no make-up robot for women? (autonomous). This is all the more interesting given that much technology goes into changing the face/shape of women in magazines. How come there isn't a similar technology to evaluate whether the make-up is good enough? Think of the Snow White mirror.
  • People spend an average of 8 hours sleeping, yet there is no really good technology to improve sleep. How come there isn't an autonomous pillow that shapes itself around one's head over the course of the night? And since a 32 GB SD card can let people record entire sleeping patterns for over 8 hours, what is the software that will allow one to check whether a pattern is a good one or a detrimental one?

Friday, July 13, 2007

Adding Search and Rescue Capabilities (part II): Modeling what we see and do not see

One of the concerns one has during a search and rescue operation (part I is here) is whether or not the item of interest was seen or detected. I am not entirely sure, for instance, that SAROPS includes this, so here is the result of some of the discussions I have had with friends on this. While the discussions were about the Tenacious, one should keep an eye on how it applies to other types of mishaps that may lead to a similar undertaking.

In the search for the Tenacious, there were several sensors used at different times:
  • the Mark One Eyeball, from the Coast Guard, from private parties, or from onlookers on the coast
  • sensors used by the Coast Guard on planes and boats
  • sensors (radar, visual, IR, multispectral) on satellites or high altitude planes
  • webcams looking at the SF port and bay.

Each of these sensors gives some information about its field of view, but each is limited by its capabilities. The information from a sensor depends on its resolution and other elements. While the issue of resolution is well understood, at least spatially, sensor visibility depends on:
  • cloud cover (high altitude, satellites), haze (low altitude)
  • the calmness of the sea
  • the orientation of the sensor (was the object of interest in the sensor cone ?)
  • the ability of the sensor to discriminate the target of interest from the background (signature of the target)
  • the size of the target (are we looking for a full boat or debris ?)
And so, whenever there is a negative sighting over an area, the statement is really about the inability of the detector to detect the target of interest, given the elements listed above. The probability that the target of interest is nonetheless there is therefore not zero (except in very specific circumstances). In effect, when fusing information from all these sensors, it is important to quantify what we don't know as much as what we know. It is also important to realize that different maps are really needed for each scenario. A scenario about searching for debris is different from one about searching for a full-size boat: what the detectors/sensors see is different in the two cases. While one can expect a good signal when searching for a full-size boat, most sensors are useless when it comes to detecting minute debris.
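
To make the point concrete, here is a small sketch (not SAROPS, just the textbook Bayes update for an unsuccessful search) showing how a negative sweep only discounts a cell by the sensor's probability of detection, so the debris scenario and the full-size-boat scenario end up with very different maps after the same empty sweep. The grid, sweep and Pd values are made up.

# A negative sweep does not zero out a cell's probability; it only discounts it
# by that sensor's probability of detection (Pd), which depends on sea state,
# sensor cone, target size, etc.
import numpy as np

def negative_search_update(prior, searched, pd):
    """prior: grid of P(target in cell); searched: boolean grid of swept cells;
    pd: probability of detection for this sensor/scenario (scalar or grid)."""
    posterior = prior * np.where(searched, 1.0 - pd, 1.0)
    return posterior / posterior.sum()   # renormalize over the whole grid

# Example: a full-size boat (high Pd) vs. small debris (low Pd), same sweep.
prior = np.full((10, 10), 1.0 / 100)
swept = np.zeros((10, 10), bool); swept[4:7, 4:7] = True
boat_map   = negative_search_update(prior, swept, pd=0.9)
debris_map = negative_search_update(prior, swept, pd=0.2)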

In the aerospace business, some of us use software like STK, which provides different modules to schedule and understand information about specific satellite trajectories and so forth. It may be a good add-on to the current SAROPS capabilities in terms of quantifying the field of view.



But the main issue is really about building the right probability distribution as the search goes on, and about how one can fold heterogeneous information into a coherent view of the search.

Time is also a variable that becomes more and more important as the search goes on. In particular, it is important to figure out how to do data fusion with time-stamped data. One can see in this presentation that, while the search grid is regular, some elements drift out of the field of view as the search is underway. So the issue is really about quantifying data fusion with sensor inputs as well as maritime currents, and providing a probability of escaping the search grid. SAROPS already does some of this, but I am not sure the timing of the actual search (made by CG planes and boats) is entered in the software as the search goes on. It was difficult for us to get that timing back from the search effort (it was rightfully not their priority), and one simply wonders whether this is an input to SAROPS when iterating on the first empty searches. If one thinks along the lines of the 8000-containers scenario, this is important, as it has been shown that some of these containers have different lifespans at the surface and right below it. In this case, the correlation between time-stamped sensor outputs becomes central, as a container submerged just a few feet underwater may not be visible to specific sensors (but would remain dangerous to navigation). Also, the fact that we did not see anything on the second pass at the same location (assuming no current) does not mean the object is not there anymore; it may simply mean the sensor did not detect it. In the illustration below, one can see the different targets found by the Radarsat/Johns Hopkins team for the Tenacious. Without time stamps it is nearly impossible to correlate hits between the first and the second satellite passes.
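
Here is an equally simplified illustration of the time-stamp point: the probability map has to be advected with the current between passes before the second (negative) pass is applied. The current field, grid size and whole-cell shift are placeholder simplifications, not a drift model.

# Toy illustration: between two sensor passes the probability mass drifts with
# the current, so a negative result at time t2 must be applied to the advected
# map, not the t1 map.
import numpy as np

def advect(prob, drift_cells):
    """Shift the probability grid by an integer number of cells (dy, dx)."""
    dy, dx = drift_cells
    return np.roll(np.roll(prob, dy, axis=0), dx, axis=1)

prob_t1 = np.full((20, 20), 1.0 / 400)           # map after the first pass
prob_t2 = advect(prob_t1, drift_cells=(0, 3))    # ~3 cells of eastward drift by t2

# Apply the second (negative) pass with its own probability of detection.
swept = np.zeros((20, 20), bool); swept[8:12, 8:12] = True
pd = 0.8
prob_t2 = prob_t2 * np.where(swept, 1.0 - pd, 1.0)
prob_t2 /= prob_t2.sum()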

The Bayesian framework seems to have already been adopted by SAROPS and its previous versions. It may need some additional capabilities to take into account most of the issues mentioned above (sensor networks or EPH). In either case, a challenge of some kind, with real data, might be a way to advance the current state of the art.

Monday, July 09, 2007

Adding Search and Rescue Capabilities (part I): Using Hyperspectral and Multispectral Cameras to Search for Non-Evading Targets

In the search for the Tenacious, I mentioned the possibility of using hyperspectral or multispectral cameras on board current satellites to see if there was a way to locate her. The big challenge resides in the resolution: most of these cameras have a resolution coarser than, say, either the Tenacious or any medium sailing boat (i.e. one pixel covers more than the size of the boat). Hence the generic expectation is that one cannot locate a boat using them. These cameras are also mostly used for other purposes, such as environmental studies, and access is rightfully restricted to a small circle of experts. Because of that, there is a large amount of convincing to do in order to get access to that imagery. The underlying reasoning as to why we could, in effect, discriminate between something that is interesting and something that is not can be put in two categories:
  • the boat and its wake span a large part of the pixel, and using a few bands one can see a large difference between a man-made object and the sea. In other words, the underlying scene is very sparse, and one in effect detects interesting artifacts very rapidly. This is a little bit like superresolution.
  • in some cameras, like Hyperion on EO-1 or MERIS (Envisat), there are 250 spectral channels. Even if the spatial resolution is coarse, we are bound to see something different when using 250 bands (as opposed to the traditional three color bands), especially against a very uniform background (the sea). Techniques such as the ones developed by Mauro Maggioni and Ronald Coifman should be evaluated for that purpose (a minimal anomaly-detection sketch in code follows this list).
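
Here is the minimal anomaly-detection sketch promised above: the classic RX detector, which flags pixels whose spectrum deviates from the background statistics. It is a stand-in for the Maggioni/Coifman-style techniques mentioned in the list, not an implementation of them, and the data cube is hypothetical.

# Flag pixels whose spectrum deviates from the (very uniform) sea background.
# 'cube' is a hypothetical (rows, cols, bands) array of radiances.
import numpy as np

def rx_scores(cube):
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(bands)  # regularize
    inv_cov = np.linalg.inv(cov)
    centered = pixels - mean
    # Mahalanobis distance of each pixel spectrum to the background statistics.
    scores = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return scores.reshape(rows, cols)

# Pixels in the top fraction of scores become candidates for the data-fusion step:
# scores = rx_scores(cube); candidates = scores > np.quantile(scores, 0.999)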

Recall that the idea is to produce a map of what is potentially interesting, not an exact match of where the target is. A second step dealing with data fusion is responsible for eliminating false positives, given information from other sensors. With the help of Lawrence Ong, Sorin Popescu and I showed that you could see boats with Landsat 7. This is new, but it falls into the first category highlighted above. The second category has not been investigated as far as I know; maybe it should be. There are three categories of targets/signatures that should be investigated:

  • During the Tenacious search, a false positive came in the form of a sighting of a green dye. These dyes are generally part of "distress kits" and are used whenever a boat wants to make it clear it is in trouble. While it was a false positive for other reasons, I had a discussion with the EO-1 folks (at JPL and Goddard), who mentioned that producing ground truth data with green dye and Hyperion could probably lead to a capability similar to the one we currently have for detecting volcanoes. In other words, produce a calibration formula to be fed to EO-1 so that, in the future, its autonomous detection capability can notify the ground that this green dye has been detected over a specific area. Since one can schedule imagery on EO-1 and figure out Envisat's data-gathering cycle, this could be done as a small private endeavor.
  • Another signature of interest is that of the boat itself as seen by the camera. If it is a boat or a plane, it is very likely that it has been imaged before by the same camera, over the same harbour or airport, at some other time. But for a boat at sea, a signature is not really that important per se: a large signal over the background noise on some channels should be enough to find that boat/plane. In the case of a plane, the signature may be more interesting, as the background is generally cluttered.
  • In order to verify the ability to find boats currently at sea, one could try to locate boats involved in much-advertised journeys or races. One could find out from the current stock of Envisat and EO-1 photos whether boats like the Schooner Anne can be located. That boat is part of a 1000-days-at-sea journey, and they have a map of their location day after day. The boat is a schooner (about 120 feet long).


Another item that would have sped up the search is the ability to query simultaneously different databases on the availability of hyperspectral or multispectral images from different platforms. Both the USGS and ESA platforms are very nice, but a single search across them would have been a nice time saver. I am also pretty sure there are other Earth observation platforms from Asia (India in particular) that could have been used, had I known about them. Yet I cannot find anywhere on the web a catalog of civilian hyperspectral or multispectral imagers on current satellites.


Finally, let us recall that doing this can help us locate hard cases like the Tenacious, but it may also help us in a totally different endeavor. As one can see from the extraordinary effort of the Coast Guard for the Tenacious, one boat can consume a large amount of manpower. Let us imagine a case where you have to track 8000 targets lost at sea.

In the search for Cessna N2700Q, the Civil Air Patrol tried the new ARCHER system, without success. And it looks like this is not happening only on this search, as some people are doubting its capability for search and rescue operations.
As indicated by Bradley,

CAP forum posts indicate ARCHER requires a very narrow search area to be of much use. Our problem is that we're not sure where this Cessna pilot went after he dropped below radar (N34° 48' 7", W111° 56' 52").
This is the same problem that arises for EO-1: the swath of interest is generally very narrow compared to the size of the problem. We should probably think of a way of integrating compressed sensing into current hyperspectral imagery to increase the field of view. Let us recall that one of the reasons this would be interesting is that these systems are there to point out major differences from the background; they are not there to produce very nice imagery.
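
For readers who have not seen it in action, here is a toy sketch of the recovery side of that idea: a sparse scene measured with far fewer random projections than pixels and recovered with ISTA. It is purely illustrative (all sizes are arbitrary) and says nothing about how an actual wide-swath instrument would be built.

# Toy compressed sensing recovery: if the scene (boat against open sea) is
# sparse, m random measurements with m well below n can recover it.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 400, 80, 5                          # scene size, measurements, nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(0, 1, k)

A = rng.normal(0, 1.0 / np.sqrt(m), (m, n))   # random measurement matrix
y = A @ x_true

# ISTA: x <- soft_threshold(x + t * A^T (y - A x), t * lam)
lam, t = 0.05, 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(500):
    r = x + t * A.T @ (y - A @ x)
    x = np.sign(r) * np.maximum(np.abs(r) - t * lam, 0.0)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))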


If any of those items are of interest to you, please contact me. I am particularly interested in people (from Asia, Europe or the U.S.) who have direct access to this type of imagery, so we can test some of what is said in this entry.

[ If you think this subject is important and should be studied, I would be very happy to help. Do not hesitate to contact me. ]

Monday, May 14, 2007

Deep down, Making sense of it all one bit at a time


Last month, Andrew Gould, the CEO of Schlumberger, gave a talk at an open house.

SCHLUMBERGER OPEN HOUSE
Schlumberger businesses and technologies demonstrations will include subsurface fluid sampling, integrated well completions, robotic tractors in a wellbore, reservoir modeling software, and geophysical seismic exploration.
10:00 a.m. to 4:00 p.m., Zachry Lobby

OPEN PRESENTATION
Andrew Gould
Chairman and CEO, Schlumberger
TITLE: Engineering Challenges (and Successes) in the Search for Oil and Gas
4:00 p.m., Room 102 Zachry


The open presentation attracted a large crowd. During the presentation, I was intrigued by Andrew's statement that Schlumberger was positioning itself to be a provider of services for carbon burying technology. But when you think about it, it makes sense, as they have devised many of the services and technologies needed for this type of undertaking.

The room was full of people who looked like they wanted to be hired, so it was difficult to get any of them to ask questions at the end of the talk. Pissing off the CEO of the company you want to join is a very compelling argument for not talking or asking questions, or so they believe... So I ended up having to do the dirty deed, but I was in fact really interested in several answers.

I mentioned Schlumberger in this blog a while back because of their ability to get signals from 3000 meters underground using pulsed mud telemetry, in the process generally known as Logging While Drilling. The main point was that, in order to save about 200 to 300 K$ per day, they have to gather data at the drilling site in real time so that they can steer the drill bit (yes, drill bits can go horizontal). Some people at Sandia have devised a Disposable Fiber Optic Telemetry System, but it does not seem to have gained any traction in that industry. The pulsed-mud bit rate was equivalent to an astonishing 30 bits per second the last time I checked. My question to Andrew was: have you guys done better in the past few years? The answer looked like a big maybe. He mentioned a new technology that uses some type of radio transmitter between each of the drilling rods, but it did not seem to be a system yet used in the field. The mud communication system is an amazing piece of inventiveness, and the communication aspect of it is one of the most interesting problems to work on. Because of the very harsh constraints on the system (pressure, temperature, ...), I am barely surprised there isn't a better solution, but I also think they should think outside the box on this one. My take would probably include using compressed sensing so that the amount of power needed in the measuring bit can be decreased tremendously. Heat generation (by the computers/electronics of the measuring bit) is non-trivial, as there is little in the way of cooling when producing heat at these depths (the soil surrounding the bit is already warmer than the inside). Because of the high temperature environment, one also has to develop better electronics for these conditions (see Sandia's presentation on electronics development and the need for new technology (SOI)).
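
To make the compressed sensing suggestion a little more concrete, here is a very rough sketch of what the downhole end could look like: a fixed ±1 random projection of each block of samples, so that only a handful of compressed values go up the roughly 30 bit/s channel. All sizes, the quantization and the timing arithmetic are illustrative assumptions, not a design.

# Very rough sketch of a compressive encoder downhole: a fixed +/-1 random
# projection of a block of sensor samples (conceptually just adds/subtracts,
# hence low power), with only the m compressed values sent uphole.
import numpy as np

n, m = 512, 64                               # samples per block, measurements kept
rng = np.random.default_rng(1)
Phi = rng.choice([-1.0, 1.0], size=(m, n))   # fixed, known at both ends

def encode(block, bits_per_value=8):
    y = Phi @ block
    q = np.clip(np.round(y / np.abs(y).max() * 127), -128, 127)  # crude 8-bit quantizer
    payload_bits = m * bits_per_value
    seconds_at_30bps = payload_bits / 30.0
    return q, seconds_at_30bps

block = rng.normal(size=n)                   # stand-in for one block of downhole data
q, t = encode(block)
print(f"{len(q) * 8} bits, ~{t:.0f} s to send at 30 bit/s (vs {n * 8} bits raw)")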

I then asked a question about the Canadian tar sands and the use of technologies such as heat pipes to transfer energy from geothermal wells all the way up to the tar sands in order to warm them up so that they become liquid (i.e. less viscous and therefore more economical to retrieve from the ground). The answer suggested there is already a program called "HTPT" that looks at that. HT may mean high temperature, but I am not sure what PT stands for.

And then I asked the "forward looking" question: if you wanted to differentiate yourself from your competitors in the next two or three years, where would you put your money? The answer was interesting because I was not expecting it. The way I interpreted what he said was: data fusion, i.e. how do you combine the large amount of data produced in the field to get a clearer picture of your oil field (not just in three dimensions, but also in time)? When I went to talk to the engineers present at the different booths after the presentation, they did not seem to have a view of what that entailed. One of the reasons mentioned was that most customers were not willing to put money into this type of analysis, so the company did not have a specific research team dedicated to it. The company itself is known to deal with very large amounts of data and to make sense of them for its customers. Yet summarizing that knowledge seems to be a difficult undertaking that most customers are only willing to do in-house. I am sure that an enterprising person with views on this issue could help them out. There is no reason to believe that developments in dimensionality reduction over the past few years should not be considered for those gigantic datasets.

Data fusion is also something of a buzzword, so it may be productive to define what it means here. In the measuring bit, there are different kinds of instruments, including neutron generators, radiation detectors, NMR and electromagnetic sensors. Some of the current work seems to have been able to correlate seismic and flow measurements in order to provide a better assessment of the borehole condition. Therefore, a data fusion scheme would aim at correlating all the measurements from several types of sensors in order to provide additional information about both the location of the measuring bit and the time-dependent geological conditions around that bit.

In order to do that, one has to compare measurements with computations. One current generic concern is the ability to do inversion with Monte Carlo codes such as MCNP (this is a very difficult problem because solving the inverse problem requires many runs of forward computation by MCNP) or with faster but coarser deterministic methods. You have many different parameters that you change (sensitivity studies) in order to figure out the distribution of parameters for the situation of interest.

Since MCNP and deterministic codes have many different parameters and run in finite time, one needs tools that provide a way of "interpolating" between parameter families you have not explored computationally. In the end, this problem is not unlike the one faced in nuclear engineering when one runs a complex thermal-hydraulics code: the Experimental Probabilistic Hypersurface tries to help in that respect.
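
As a generic illustration of that "interpolation" idea (and only that; the Experimental Probabilistic Hypersurface additionally carries uncertainty information, which a plain interpolant does not), here is a sketch that fits a cheap surrogate to a handful of expensive forward runs and queries it elsewhere in parameter space. The parameter values and responses are made up.

# Fit a cheap surrogate to a few expensive forward runs (MCNP or a deterministic
# code) and query it at parameter combinations that were never run.
import numpy as np
from scipy.interpolate import RBFInterpolator

# Pretend each row is a (density, enrichment) pair already run through the code,
# and 'responses' the corresponding detector response. Values are made up.
params = np.array([[1.0, 0.02], [1.0, 0.05], [2.5, 0.02], [2.5, 0.05], [1.8, 0.035]])
responses = np.array([0.11, 0.19, 0.27, 0.41, 0.24])

surrogate = RBFInterpolator(params, responses, kernel="thin_plate_spline")
print(surrogate(np.array([[1.5, 0.03]])))    # estimate without a new forward run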

Monday, May 07, 2007

DARPA Urban Challenge: Unveiling our algorithm, Part Three, Building the Motor/Perceptual models



As a small team, we had to make choices that are reflected in our choice of hardware and algorithms. One of the most time-intensive aspects of devising a robot is the development of an accurate motion model and then an adequate sensor/perceptual model [1].

We could not spend an entire team and the time to devise a motor model as well as a sensor model, as other teams have done. Some of this stems from the fact that we are not interested in getting M.S. theses written for the moment, as is the case here. Furthermore, our vehicle is far from being protected from the elements, and we considered that it would be difficult to rely on a model that could become less and less accurate as the car got older. Another aspect of our decision came from the actuators we used. In the case of the steering wheel, there are moments when the steering motor can slip (especially on large deviations). This is due to two factors: the laptop, busy doing other tasks, can sometimes send the steering command with a delay, or there is too much torque to be applied too fast. This results in command control that would be difficult to model. Finally, we were interested in developing a robot that would learn its own behavior while being supervised with very little labeled data. In other words, we want the learning to be based on data coming from actual driving behavior; not much data understanding should come from any subsequent data processing.

Our motor model relies on our ability to drive the car around through the keyboard of the command laptop. This supervised learning of the motor model was built by producing a probability distribution of the next state of the car as a function of the previous state and the commands sent by the "supervising" driver. We used a variety of techniques to build this probability distribution, including the simple but robust Experimental Probabilistic Hypersurface devised at SCM [2], as well as subjective mapping representation techniques (by Michael Bowling, Ali Ghodsi and Dana Wilkinson, Subjective Localization with Action Respecting Embedding, and [3], [4], [5], [6]).
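
Here is a minimal, tabular version of that idea, with made-up state variables and bins. It is not the EPH or the action-respecting-embedding code, just the bookkeeping of turning logged (state, command, next state) triples into an empirical next-state distribution.

# Log (state, command, next_state) triples while a human drives, then turn the
# counts into an empirical next-state distribution. Discretization is illustrative.
import numpy as np
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))

def discretize(speed, heading_rate):
    return (int(speed // 0.5), int(heading_rate // 2))   # crude bins

def record(state, command, next_state):
    counts[(discretize(*state), command)][discretize(*next_state)] += 1

def next_state_distribution(state, command):
    table = counts[(discretize(*state), command)]
    total = sum(table.values())
    return {s: c / total for s, c in table.items()} if total else {}

# One logged transition: at 3.2 m/s turning 1 deg/s, command "left",
# the car ended up at 3.1 m/s turning 4 deg/s.
record((3.2, 1.0), "left", (3.1, 4.0))
print(next_state_distribution((3.2, 1.0), "left"))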



In the latter case, we decided against using the SDE approach and used compressed sensing instead for the dimensionality reduction aspect of building the sensor model. Some aspects of the reduction include some of the ideas in [7].

References:
[1] D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello., Bayesian Filtering for Location Estimation, IEEE Pervasive Computing, 2003.

[2] Robust Mathematical Modeling, SCM, Experimental Probabilistic Hypersurfaces

[3] [pdf] [ps.gz] Subjective mapping, Michael Bowling, Dana Wilkinson, and Ali Ghodsi.
In New Scientific and Technical Advances in Research (NECTAR) of the Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI) , pages 1569--1572, 2006.

[4] [pdf] [ps.gz] Subjective localization with action respecting embedding, Michael Bowling, Dana Wilkinson, Ali Ghodsi, and Adam Milstein. In Proceedings of the International Symposium of Robotics Research (ISRR) , 2005.

[5] [pdf] Action respecting embedding, Michael Bowling, Ali Ghodsi, and Dana Wilkinson.
In Proceedings of the Twenty-Second International Conference on Machine Learning (ICML) , pages 65--72, 2005.

[6] [pdf] [ps.gz] Learning subjective representations for planning, Dana Wilkinson, Michael Bowling, and Ali Ghodsi. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI) , pages 889--894, 2005.

[7] INRIA :: [inria-00117116, version 1] Real-time Vehicle Motion Estimation Using Texture Learning and Monocular Vision

Friday, May 04, 2007

DARPA Urban Challenge: Unveiling our algorithm, Part Deux, Sensor and Actuator basics.


In a previous entry, I mentioned that we would be unveiling our algorithm. Before we do so, I will describe the different sensors we have and use.
  • GPS: Since the race is mostly about mapping between the vehicle's knowledge of its surroundings and a set of maps given by DARPA with GPS coordinates (see Route Network Definition File (RNDF) and Mission Data File (MDF)), we need a GPS. We do not have a DGPS, just a normal GPS chip, which we interface with over RS-232 at a rate of 1 Hz. To be specific, this is a GARMIN GPS 18 PC, OEM SYSTEM, WAAS.

  • IMU: We also have a Microstrain MEMS IMU that provides acceleration, turning rate and heading (3DM-GX1). It provides information at 100 Hz.
  • Vision system: We use a series of webcams and a set of Unibrain FireWire cameras. Right now our frame rate is about 15 Hz.

All these sensors interface with the Python program through an RS-232 channel. In terms of sophistication, it just does not get any better for us. One of the underlying reasons is the realization that gathering data is one thing; using it efficiently is a totally different one. In particular, there are instances where reading and processing information from the IMU is not interesting.
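
For the curious, here is the kind of RS-232 polling loop this amounts to, reading NMEA sentences from the GPS with pyserial and pulling latitude/longitude out of $GPRMC. The port name and baud rate are illustrative assumptions.

# Read NMEA sentences over RS-232 with pyserial and parse $GPRMC fixes.
import serial

def nmea_to_deg(value, hemi):
    # NMEA ddmm.mmmm -> decimal degrees
    d, m = divmod(float(value), 100.0)
    deg = d + m / 60.0
    return -deg if hemi in ("S", "W") else deg

with serial.Serial("/dev/ttyS0", 4800, timeout=2) as port:
    while True:
        line = port.readline().decode("ascii", errors="ignore").strip()
        if line.startswith("$GPRMC"):
            f = line.split(",")
            if f[2] == "A":                      # 'A' = valid fix
                lat = nmea_to_deg(f[3], f[4])
                lon = nmea_to_deg(f[5], f[6])
                print(lat, lon)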

With regards to the actuators, we currently have two large stepper motors connected to the Python program through the serial port. The first stepper motor rotates the steering wheel and the second one actuates either the brake or the accelerator pedal.

One way to do supervised learning is to run the Python program from a laptop that connects to both the sensors and the stepper motors. One can then drive the car through the keyboard of the laptop. It works well as long as Skype is not running on the laptop at the time (yes, we tried :-); it's a little bit like talking on your cell while driving....

In my next entry, I will discuss the modifications to the webcams and FireWire cameras so that they provide meaningful information. In particular, I will talk about the algorithm for the stereo system as well as the hardware and software implementation of the compressed sensing element of our algorithm (a random lens imager using a webcam). Both are in competition, and we believe that the stereo system will eventually not need to be used.

Sunday, April 08, 2007

DARPA Urban Challenge: Unveiling our algorithm

In a previous entry, I mentioned that we were unveiling our strategy for our entry in the DARPA Urban Challenge (the DARPA Urban Challenge is about driving at a record pace in an urban environment past many urban difficulties, including the loss of GPS capability). This was not accurate; in fact, we really are going to unveil our algorithm. You'll be privy to the development quirks and everything that goes into implementing an algorithm that has to respond on-line to a challenging situation. I'll talk about the history of why we chose specific algorithms over others. I will specifically say more about manifold-based models for decision making in the race and the use of techniques devised to produce a storage device of previous actions in order to provide some sort of supervised learning capability. In the previous race, we were eliminated early, mostly because we were plagued with mechanical problems that most of us had never faced before (none of us had a robotics background); we hope to go farther this time, as the vehicle is OK. For reference, we have already shown some of our drive-by-wire program before as well. We made some of our data available before, and I expect, as time permits, to do the same as we go along. Because our entry is truly innovative, we are trying to balance not getting eliminated (by passing every step of the application) with the innovation in the algorithm. However, since none of us is interested in just an autonomous car, our emphasis will always be on doing something that most other entries are not attempting, such as using compressed sensing and robust mathematical techniques.

Saturday, March 24, 2007

Driving on a manifold: unveiling our strategy

Our entry in the DARPA Urban Challenge will feature compressed sensing as a way to reduce the dimensionality of our vision sensor. We will then have to infer the connection between our GPS track (RNDF and MDF) and the reduced parameters obtained from random projections.
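
In case the phrase "random projections" sounds mysterious, here is a minimal sketch of that reduction step: each grayscale frame is flattened and multiplied by a fixed random matrix, keeping only a few hundred numbers per frame. The frame size and the number of projections are illustrative choices, not the values used on the vehicle.

# Reduce each camera frame to a short feature vector with a fixed random matrix.
import numpy as np

h, w, k = 120, 160, 200                     # downsampled frame, projections kept
rng = np.random.default_rng(42)
R = rng.normal(0.0, 1.0 / np.sqrt(k), (k, h * w))   # fixed for the whole run

def project(frame):
    """frame: (h, w) grayscale array -> length-k feature vector."""
    return R @ frame.reshape(-1).astype(np.float64)

features = project(rng.integers(0, 256, (h, w)))     # stand-in for a webcam frame
print(features.shape)                                 # (200,)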

Thursday, March 22, 2007

Compressed Sensing, Primary Visual Cortex, Dimensionality Reduction, Manifolds and Autism

In a previous entry, I mentioned the potential connection between compressed sensing, the primary visual cortex and cognition-deficit conditions like autism, without much explanation. Here is an attempt at filling the holes.

When David Field and Bruno Olshausen showed that the primary cortex receives inputs from our eyes as a set of sparse functions that look like ridgelets and curvelets, it became obvious that one result was missing: if natural images are sparse and our visual system has sparse receptors, is there a way our brain finds a sparse decomposition of the world that works in a linear fashion? The thinking goes that our brain is capable of understanding scenes without an iterative process (an iterative process is nonlinear and has a high cost in terms of energy). When Emmanuel Candes and David Donoho showed that non-adaptive schemes using curvelets could in fact decompose natural images, it became obvious that a good parallel could be made between the physiology of the primary cortex and this new type of decomposition. But how do you do this decomposition? While an m-term curvelet expansion of a scene can be thresholded and can rival complex adaptive approximation schemes, that does not answer how the primary cortex eventually comes up with that number m.

The state of the art of our thinking about the primary cortex can be found in this review by Graham and Field on sparse coding in the neocortex. It specifically addresses the bounds imposed on the primary cortex by metabolic constraints:

We conclude our discussion by returning to the issue of metabolic constraints. Could we argue that primary evolutionary pressure driving towards sparse coding is one related to the metabolic costs of neural firing? As noted earlier, both Attwell and Laughlin (2001) and Lennie (2003) argue that there are not enough resources to achieve anything but a low-activity system. Moreover, when we find sparse activity in frontal cortex (Abeles et al., 1990), it is more difficult to argue that the sparse activity must arise because it is mapping the sparse structure of the world. Even at early levels, if sparseness were metabolically desirable, there are a number of ways of achieving sparseness without matching the structure of the world. Any one of a wide variety of positively accelerating nonlinearities would do. Simply giving the neurons a very high threshold would achieve a sparse code, but the system would lose information. We argue that the form of sparse coding found in sensory systems is useful because such codes maintain the information in the environment, but do so more efficiently. We argue that the evolutionary pressure to move the system towards a sparse code comes from the representational power of sparse codes. However, we do accept that metabolic constraints are quite important. It has been demonstrated that at the earliest levels of the visual system, ganglion cells (Berry, Warland and Meister, 1997) and lateral geniculate nucleus cells (Reinagel and Reid, 2000) show sparse (non-Gaussian) responses to temporal noise. A linear code, no matter how efficiently it was designed would not show such sparse activity, so we must assume that the sparseness is at least in part due to the nonlinearities in the system and not due to the match between the receptive fields and the sparse structure of the input. As with the results show sparse responses in non-sensory areas, we must accept that metabolic may also be playing a significant role.

Well, it's nice to acknowledge we have physical limitations, but to assume that a linear code cannot exist simply because we currently do not have a model for it is probably assuming too much. So what do we know? The primary cortex is a low-energy system, which basically removes from consideration any complex scheme that requires resources (like an iterative scheme). This situation favors a linear system, but so far we have not found a good model for that. There is something deeper still. Even if we knew much about the sparsity of a scene, understanding the brain is really about understanding how a large amount of information is reduced when traveling from the eye into the brain. In other words, we need to reduce the amount of data that our megapixel sensor called the eye is bringing in, and we must do this very fast (30 times a second). To put things in perspective, let us take an example: imagine we are seeing a scene where somebody waves a hand. If we were to take a video of this scene, we would probably get a 40 MB AVI file (uncompressed). That file could then be compressed to 1 MB using MPEG, for instance. While the compression is impressive, it is not impressive enough. In effect, when our brain sees this video, it can remember the hand and how it moved. The movement is probably two-dimensional, so the brain really remembers the two parameters needed to produce a hand that waves in the manner shown in the video. In other words, the brain is probably not storing 1 MB of information when it stores this hand-waving activity; it is most probably storing how two parameters changed over time, which in most cases is much less than 1 MB of data. The real question becomes: is there a way to reduce that 1 MB of information further? We are not asking ourselves whether the input or the receptors are sparse (that is a necessary condition); we are interested in how much sparser we can make this information by using the connections between these sparse elements. Can we reduce the dimensionality of the signal further and exploit it?

Enter dimensionality reduction: ever since the discovery of dimensionality reduction schemes (LLE, Isomap, Laplacian-diffusion, ...) that take high-dimensional data and map them onto low-dimensional manifolds, researchers have been trying to extend these techniques to wider sets of problems. In the cognition world, for instance, Jonathan Pillow and Eero Simoncelli perform dimensionality reduction applied to neural models, but it is not obvious how these techniques translate directly into a specific functionality of the primary cortex, even if they take the example of a V1 cell. It is also not obvious how robust some of these techniques are to noise. But, as stated earlier, there are different ways to go about dimensionality reduction. One of the most intriguing, which has robustness built in, is compressed sensing. Compressed sensing has the ability to produce a robust decomposition of a manifold. Mike Wakin looked into that during his dissertation and found that smooth manifolds can be readily compressed using compressed sensing, thereby making it a very simple solution to dimensionality reduction (see R. G. Baraniuk and M. B. Wakin in Random Projections of Smooth Manifolds), but as Donoho and Grimes had shown earlier, sharp objects such as arms and legs have edges, and that makes the manifold non-differentiable. This is a problem because it means that one cannot easily extract parameters from a video if we have these sharp edges. In order to take care of that problem, one can be inspired by biology, i.e. smooth these images using Gabor wavelets as in the human visual system (Object Recognition with Features Inspired by Visual Cortex by Thomas Serre, Lior Wolf and Tomaso Poggio) and then use random projections of smooth manifolds to eventually figure out the parameters of the movements (for more information on how to do that, see High-Resolution Navigation on Non-Differentiable Image Manifolds or The Multiscale Structure of Non-Differentiable Image Manifolds).

[In Object Recognition with Features Inspired by Visual Cortex by Thomas Serre, Lior Wolf and Tomaso Poggio, one can only be struck by the pains the algorithm goes through in order to be robust (a small S1/C1 sketch in code follows this note).

  • S1: Apply a battery of Gabor filters to the input image. The filters come in 4 orientations θ and 16 scales s (see Table 1). Obtain 16×4 = 64 maps (S1)sθ that are arranged in 8 bands (e.g., band 1 contains filter outputs of size 7 and 9, in all four orientations, band 2 contains filter outputs of size 11 and 13, etc).
  • C1: For each band, take the max over scales and positions: each band member is sub-sampled by taking the max over a grid with cells of size NΣ first and the max between the two scale members second, e.g., for band 1, a spatial max is taken over an 8×8 grid first and then across the two scales (size 7 and 9). Note that we do not take a max over different orientations, hence, each band (C1)Σ contains 4 maps.
  • During training only: Extract K patches Pi=1,...K of various sizes ni × ni and all four orientations (thus containing ni × ni × 4 elements) at random from the (C1)Σ maps from all training images.
  • S2: For each C1 image (C1)Σ, compute: Y = exp(−γ||X − Pi||2) for all image patches X (at all positions) and each patch P learned during training for each band independently. Obtain S2 maps (S2)Σi .
  • C2: Compute the max over all positions and scales for each S2 map type (S2)i (i.e., corresponding to a particular patch Pi) and obtain shift- and scale-invariant C2 features (C2)i , for i = 1 . . .K.


]
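
Here is the small S1/C1 sketch mentioned above: a Gabor filter bank over a few orientations and scales followed by a local max over position and adjacent scales. The filter and pooling sizes are trimmed for brevity and do not follow Serre et al.'s Table 1 exactly.

# Gabor filter bank (S1) followed by local max pooling over space and scales (C1).
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import fftconvolve

def gabor(size, theta, wavelength, sigma, gamma=0.3):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()

def s1_c1(image, sizes=(7, 9), orientations=4, pool=8):
    c1 = []
    for i in range(orientations):
        theta = i * np.pi / orientations
        # S1: absolute Gabor responses at the two adjacent scales of this band.
        s1 = [np.abs(fftconvolve(image, gabor(s, theta, 0.8 * s, 0.36 * s), mode="same"))
              for s in sizes]
        # C1: max over a local spatial grid, then max across the two scales.
        pooled = [maximum_filter(r, size=pool)[::pool, ::pool] for r in s1]
        c1.append(np.maximum(*pooled))
    return c1   # one map per orientation for this band

maps = s1_c1(np.random.rand(128, 128))
print(len(maps), maps[0].shape)
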
Besides Wakin, Donoho, Baraniuk and others collaborating with them, few have made that connection. Yves Meyer makes a passing reference to the use of compressed sensing in physiology (in Perception et compression des images fixes). Gabriel Peyre, however, is a little more specific here:
Analogies in physiology:
This compressed sampling strategy could potentially lead to interesting models for various sensing operations performed biologically. Skarda and Freeman have proposed a non-linear chaotic dynamic to explain the analysis of sensory inputs. This chaotic state of the brain ensures robustness toward unknown events and unreliable measurements, without using too many computing resources. While the theory of compressed sensing is presented here as a random acquisition process, its extension to deterministic or dynamic settings is a fascinating area for future research in signal processing.
I am mentioning Gabriel Peyre's work because he works on bandelets. I had mentioned bandelets before in the context of an announcement made by the company Let It Wave (headed by Stephane Mallat), where they showed that faces could be compressed down to 500 bytes, or the size of a bar code.



If bandelets can represent a recognizable face nonlinearly with 500 bytes, one only needs about 5 x 500 bytes = 2.5 KB of random samples (in the compressed sensing sense) of that face to be able to reconstruct it. 2.5 KB is better than 10 MP or 1 MB. The factor of 5 is the usual rule of thumb for compressed sensing (more specific asymptotic laws/results can be found in Terry Tao's summary).

However, the strongest result so far is the one found by Mike Wakin on random projections onto a manifold, where he uses neighborhood criteria in the compressed sensing space to permit an even better reconstruction than just assuming sparsity. In the figure below, one can compare the results of manifold-based recovery from 5 random projections with those of the traditional algorithms, Orthogonal Matching Pursuit and Basis Pursuit.
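
For reference, here is a bare-bones version of Orthogonal Matching Pursuit, one of the two "traditional" algorithms in that comparison; it is the textbook greedy loop, not the code behind the figure.

# Bare-bones Orthogonal Matching Pursuit: recover a k-sparse x from y = A x.
import numpy as np

def omp(A, y, k):
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x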



Naturally, an extension of this is target detection, as featured in the Smashed Filter article from Rice. With this type of result/framework, I am betting we can go lower than 2.5 KB of universal samples to characterize a face. The connection between cognition, compressed sensing and autism is simple: face processing seems deficient in people affected by autism, and we don't know why. A model based on the elements I just mentioned might give insight into this issue.

Thursday, March 01, 2007

Current use of EO data for Search And Rescue operations (SAR)

The current use of Earth observation data is directed toward using wind scatterometer data and including them directly in the drift modeling used for search and rescue operations on known objects (objects for which we know the original location but want to know more about the drift).


What is interesting is the apparent mismatch between the model data and the actual EO observations, as shown by Michel Olagnon from IFREMER in the photo below (the picture shows the model in color, with satellite swath lines providing real information; the mismatch shows the inaccuracy of the model).


With regards to data fusion in the current SAROPS software used by the USCG, it looks like the current configuration only includes low-resolution overlays.



It does not seem to address the ability to image the objects of interest directly (either a boat lost at sea or drifting containers).

Drifting behavior while searching

This series of images comes from the presentation by Art Allen (USCG) on SAROPS. One can clearly see that drift is an important component of the search activity while the search is underway. Another question begs to be answered: how come the search grid is not uniform, in order to provide efficient information?




