Friday, November 26, 2010

Using Kinect for a compressive sensing hack ? and a dictionary attack

As some of you know, I am very much interested in hardware hacking as a way to showcase compressive sensing on the cheap. A few days ago, I mentioned the numerous hacks surrounding the release of the Kinect, a USB camera that connects to the Xbox and provides 3D map information. As it turns out, here is what Wikipedia has on the reverse engineering undertaken so far:

The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, and allows the Kinect sensor to see in 3D under any ambient light conditions.[9][22] The sensing range of the depth sensor is adjustable, with the Kinect software capable of automatically calibrating the sensor based on gameplay and the player's physical environment, such as the presence of furniture.[23]
Described by Microsoft personnel as the primary innovation of Kinect,[24][25][26][27] the software technology enables advanced gesture recognition, facial recognition, and voice recognition.[28] According to information supplied to retailers, the Kinect is capable of simultaneously tracking up to six people, including two active players for motion analysis with a feature extraction of 20 joints per player.[29]
Through reverse engineering efforts,[30] it has been determined that the Kinect sensor outputs video at a frame rate of 30 Hz, with the RGB video stream at 8-bit VGA resolution (640 × 480 pixels) with a Bayer color filter, and the monochrome video stream used for depth sensing at 11-bit VGA resolution (640 × 480 pixels with 2,048 levels of sensitivity). The Kinect sensor has a practical ranging limit of 1.2–3.5 metres (3.9–11 ft) distance when used with the Xbox software. The area required to play Kinect is roughly around a 6m² area, although the sensor can maintain tracking through an extended range of approximately 0.7–6 metres (2.3–20 ft). The sensor has an angular field of view of 57° horizontally and a 43° vertically, while the motorized pivot is capable of tilting the sensor as much as 27° either up or down. The microphone array features four microphone capsules,[31] and operates with each channel processing 16-bit audio at a sampling rate of 16 kHz.[29]
The description is not really conclusive about the underlying technology that actually produces the depth map. So I turned to Daniel Reetz (you may recall our previous interaction) for some answers, and here is what he said:
Hey Igor,
The projector is not modulated in time at all. It is just a laser continuously shining through some holographic diffuser. The diffuser material creates a fixed pattern of speckle. The speckle is essentially “in focus” from the front of the Kinect sensor all the way to the furthest back surface that it can see.
So there seems to be no time-domain subsampling to be had from the projector. The camera which images the projected images to do depth estimation is rather high resolution (1280×920, offhand), and seems to be being read out at 30hz. The resulting depth map has only 320×240 pixels, but not necessarily 320×240 actual resolution.
So the answer is a little complicated. The projected speckle image is unchanging. In this case, I took a bunch of pictures of this non-changing pattern so that the images could be averaged together for better speckle counting.
I can’t wait to get back to LA where I can do the speckle generation demonstration I have in mind, and spend some more time with these patents/papers to get to the core of this technology.
Thanks Daniel !
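
Daniel's point about averaging a bunch of pictures of the unchanging pattern can be illustrated with a small sketch. Everything here is an assumption made for illustration: the speckle pattern is a made-up list of 0/1 pixels, the camera noise is modeled as additive Gaussian noise, and the helper names (average_frames, noisy_capture, rms_error) are hypothetical. It is not Daniel's actual procedure, only a toy showing why averaging a static scene helps before counting speckles.

```python
import random

def average_frames(frames):
    """Pixel-wise mean of equally sized frames (each a flat list of floats)."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]

def rms_error(a, b):
    """Root-mean-square difference between two frames of the same size."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def noisy_capture(pattern, sigma, rng):
    """Hypothetical camera model: the fixed pattern plus Gaussian noise."""
    return [p + rng.gauss(0.0, sigma) for p in pattern]

rng = random.Random(0)

# Made-up static speckle pattern: 1.0 where a dot lands, 0.0 elsewhere.
pattern = [1.0 if rng.random() < 0.1 else 0.0 for _ in range(1024)]

# Take 32 noisy pictures of the same unchanging pattern and average them.
frames = [noisy_capture(pattern, 0.3, rng) for _ in range(32)]
averaged = average_frames(frames)

single_err = rms_error(frames[0], pattern)
avg_err = rms_error(averaged, pattern)
print("single frame RMS error:", single_err)
print("averaged RMS error:    ", avg_err)
```

Under this noise model the error of the averaged frame shrinks roughly like one over the square root of the number of frames, which only works because the projected pattern does not change between pictures.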

So it looks like time modulation is out the window, but I keep wondering whether this set-up can be used for some compressive sensing experiments. You can buy the Kinect with an Xbox here or here. Most hacks performed so far are documented in the KinectHacks blog.

In a different direction, I found this project fascinating: Why blurring sensitive information is a bad idea. It touches on dictionary attacks for images, a subject not unknown on this blog: one compares known elements of a dictionary, muddled through a blurring (random ?) function, against an unknown scene.
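
The comparison described above can be sketched in a few lines. This is a toy under stated assumptions, not the method of the linked project: the "image" is a hypothetical 1-D strip where each character is drawn as its 8 ASCII bits, the blur is a zero-padded moving average, and the dictionary is every 3-digit number. The names render, box_blur and dictionary_attack are made up for this example.

```python
from itertools import product

def render(text):
    """Hypothetical 1-D 'rendering': 8 binary pixels per character (its ASCII bits)."""
    return [int(bit) for ch in text for bit in format(ord(ch), "08b")]

def box_blur(signal, width=5):
    """Zero-padded moving average over `width` pixels (full convolution)."""
    out = []
    for i in range(len(signal) + width - 1):
        window = signal[max(0, i - width + 1): i + 1]
        out.append(sum(window) / width)
    return out

def distance(a, b):
    """Sum of squared differences between two blurred strips of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dictionary_attack(observed, candidates, blur):
    """Blur every candidate the same way and keep the one closest to the observation."""
    return min(candidates, key=lambda c: distance(blur(render(c)), observed))

# The attacker only sees the blurred strip, never the secret itself.
secret = "407"
observed = box_blur(render(secret))

# Dictionary: every possible 3-digit string.
candidates = ["".join(digits) for digits in product("0123456789", repeat=3)]
recovered = dictionary_attack(observed, candidates, box_blur)
print(recovered)
```

The attack never inverts the blur. It only needs to know (or guess) the blurring function and to enumerate a dictionary small enough to push through it, which is the case for things like account numbers set in a known font.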
