Friday afternoon is Hamming's time. Today I decided to compete in the Best Camera Application contest of XIMEA, a maker of small hyperspectral cameras. Here is my entry:
Challenging task: Make hyperspectral imaging mainstream
Idea: Create a large database of hyperspectral imagery for use in Machine/Deep Learning Competitions
Machine Learning is the field concerned with creating, training and using algorithms dedicated to making sense of data. These algorithms take advantage of training data (images, videos) to improve at tasks such as detection, classification, etc. In recent years, we have witnessed spectacular growth in this field thanks to the joint availability of large datasets originating from the internet and the attendant curating/labeling efforts applied to said images and videos.
Numerous labeled datasets such as CIFAR [1], ImageNet [2], etc. routinely permit algorithms of increasing complexity to be developed and to compete in state-of-the-art classification contests. For instance, the rise of deep learning algorithms began when a deep network broke all the state-of-the-art classification results in the ILSVRC-2012 competition and "achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry" [3]. More recent results of this heated competition were shown at the NIPS conference last week, where teams at Microsoft Research produced breakthroughs in classification with an astounding 152-layer neural network [4]. This intense competition between highly capable teams at universities and large internet companies is only possible because large amounts of training data are being made available.
Image and even video processing for hyperspectral imagery cannot follow the same development path that conventional image processing took over the past 40 years. The underlying reason stems from the fact that this development was funded at considerable expense by companies and governments alike and eventually yielded standards such as JPEG, GIF, JPEG 2000, MPEG, etc. Because such funding is no longer available, we need to find new ways of improving and working with new imaging modalities.
Technically, since hyperspectral imagery is still a niche market, most analysis performed in this field runs the risk of being treated as an outgrowth of conventional imagery: i.e., substandard tools such as JPEG or labor-intensive computer vision tools are being used to classify and exploit this imagery without much thought given to the additional structure of the spectral information. More sophisticated tools such as advanced matrix factorizations (NMF, PCA, Sparse PCA, dictionary learning, ...) in turn focus on the spectral information but seldom use the spatial information. Both approaches suffer from not investigating more fully the inherent robust structure of this imagery.
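To make the spectral-only limitation concrete, here is a minimal sketch (not part of the proposal itself) of NMF applied to a synthetic hyperspectral cube using plain Lee-Seung multiplicative updates; the cube dimensions and the number of endmembers are hypothetical, and note how flattening the spatial axes discards the spatial structure entirely:

```python
# Sketch: spectral-only NMF on a synthetic hyperspectral cube.
# All sizes below are illustrative assumptions, not real camera specs.
import numpy as np

rng = np.random.default_rng(0)
h, w, bands, k = 16, 16, 64, 4            # hypothetical cube size, 4 endmembers
abund = rng.random((h * w, k))            # synthetic per-pixel abundances
spectra = rng.random((k, bands))          # synthetic endmember spectra
X = abund @ spectra                       # flatten spatial axes: pixels x bands

# Lee-Seung multiplicative updates for Frobenius-norm NMF
W = rng.random((h * w, k)) + 0.1
H = rng.random((k, bands)) + 0.1
for _ in range(300):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-12)

err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape)                   # (256, 4) abundances, (4, 64) spectra
```

The factorization recovers per-pixel abundances and spectral signatures, but the row ordering of `X` is arbitrary: shuffling the pixels changes nothing, which is precisely the spatial information these methods leave on the table.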
For hyperspectral imagery to become mainstream, algorithms for compression and for its day-to-day use have to take advantage of the currently very active and highly competitive development of Machine Learning algorithms. In short, creating large and rich hyperspectral imagery datasets beyond what is currently available ([5-8]) is central for this technology to grow out of its niche markets and become central to our everyday lives.
In order to make hyperspectral imagery mainstream, I propose to use a XIMEA camera to shoot imagery and video of different objects and locations, and to label these datasets.
The datasets will then be made available on the internet for use by parties interested in running classification competitions based on them (Kaggle, academic competitions, ...).
As a co-organizer of the meetup, I also intend to enlist some of the folks in the Paris Machine Learning meetup group (with close to 3000 members, it is one of the largest Machine Learning meetups in the world) to help enrich this dataset.
The dataset should be available from servers probably co-located at a university or some non-profit organization (to be identified). A report presenting the dataset should eventually be academically citable.
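To give a sense of what "labeled" could mean in practice, here is a hypothetical metadata record for one capture; every field name, value, and the license choice below are illustrative assumptions, not a fixed schema for the project:

```python
# Hypothetical metadata record for one labeled hyperspectral capture.
# Field names and values are placeholders, not a decided schema.
import json

record = {
    "capture_id": "paris-0001",            # hypothetical identifier
    "camera": "XIMEA snapshot camera",     # sensor description
    "shape": [1024, 2048, 16],             # rows, cols, spectral bands
    "wavelengths_nm": [470, 480, 490],     # band centers (truncated example)
    "labels": ["vegetation", "asphalt"],   # scene-level labels
    "license": "CC BY 4.0",                # to be decided by the project
}
serialized = json.dumps(record, indent=2)  # plain JSON keeps it tool-agnostic
print(serialized)
```

Storing such records as plain JSON alongside the raw cubes would keep the dataset usable from any language the competition participants happen to prefer.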
References
[1] Alex Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009, https://www.cs.toronto.edu/~kriz/cifar.html
[2] ImageNet dataset, http://www.image-net.org/
[3] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[4] Microsoft researchers win ImageNet computer vision challenge, http://blogs.microsoft.com/next/2015/12/10/microsoft-researchers-win-imagenet-computer-vision-challenge/
[5] T. Skauli and J. Farrell, "A collection of hyperspectral images for imaging systems research," in Proceedings of SPIE Electronic Imaging 2013, https://scien.stanford.edu/index.php/hyperspectral-image-data/
[6] D.H. Foster, K. Amano, S.M.C. Nascimento, M.J. Foster (2006), "Frequency of metamerism in natural scenes," Journal of the Optical Society of America A, 23, 2359-2372, http://personalpages.manchester.ac.uk/staff/david.foster/Hyperspectral_images_of_natural_scenes_04.html
[7] C.A. Parraga, G. Brelstaff, T. Troscianko, I.R. Moorhead, Journal of the Optical Society of America A, 15(3), 563-569, 1998; also G. Brelstaff, A. Párraga, T. Troscianko, D. Carr, SPIE Vol. 2587, Geog. Inf. Sys. Photogram. and Geolog./Geophys. Remote Sensing, 150-159, 1995
[8] Paris Machine Learning meetup, http://www.meetup.com/Paris-Machine-learning-applications-group/
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.