Thursday, June 02, 2011

Mining the Cell Phone Data to Find the Source of the German Super E-Coli outbreak

A commenter on "A solution to finding the E-Coli outbreak source in Germany" stated that with 500 cases, mining the cell phone data was not a major undertaking. Let me make the case that it is. As far as we understand, we still don't know the nature of the contamination. In effect, everything is on the table: water, food, an airborne pathogen, and it looks like the incubation period is three to eight days. It is not a case of finding out whether all these people went to one restaurant or something similar. The issue is figuring out whether, given all these records, there are other records linking them together. For instance, let us imagine the source is a cucumber foodstuff (it looks like it is not). The data mining would trace the paths of all the sick people and then determine whether these locations are connected by other cell phone location data (those of the people who sold the sick people the foodstuff). In the end, it may take two or three iterations and additional constraints before arriving at some set of places of interest.
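
To make the linking step a little more concrete, here is a minimal Python sketch of one pass. Everything in it is an assumption made for illustration: the record layout (phone id, cell id, hour bucket), the toy data, and the function name `candidate_links` are all hypothetical, and real cell phone records would be far messier. The idea is simply to count, for each phone that does not belong to a sick person, how many distinct sick people it shared a cell with during the same hour.

```python
from collections import defaultdict

def candidate_links(records, patients, min_patients=2):
    """Rank non-patient phones by how many distinct patients they co-occurred with.

    records:  iterable of (phone_id, cell_id, hour) tuples (hypothetical layout).
    patients: set of phone_ids belonging to the sick people.
    """
    # Index which phones were seen in each (cell, hour) bucket.
    seen = defaultdict(set)
    for phone, cell, hour in records:
        seen[(cell, hour)].add(phone)

    # For every bucket containing at least one patient, credit each
    # non-patient phone present with the patients it shared the bucket with.
    met = defaultdict(set)
    for phones in seen.values():
        sick_here = phones & patients
        if not sick_here:
            continue
        for other in phones - patients:
            met[other] |= sick_here

    # Keep phones linked to enough distinct patients, most-linked first.
    ranked = [(phone, len(p)) for phone, p in met.items() if len(p) >= min_patients]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy data: "vendor" crosses paths with all three patients, each in a different cell.
records = [
    ("p1", "cellA", 10), ("vendor", "cellA", 10),
    ("p2", "cellB", 34), ("vendor", "cellB", 34),
    ("p3", "cellC", 58), ("vendor", "cellC", 58),
    ("p1", "cellD", 12), ("bystander", "cellD", 12),
]
patients = {"p1", "p2", "p3"}
print(candidate_links(records, patients))  # [('vendor', 3)]
```

A single pass like this only surfaces candidates. The further iterations and constraints mentioned above might, for example, restrict the records to the three to eight day incubation window or follow the candidate phones' own paths, and each of those steps multiplies the number of records that have to be examined.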


Anonymous said...

nonnative english speakers might miss the fact you wrote "as fart as we know" haha good one!

Igor said...

Fixed. thanks Anonymous,

Anonymous said...

I take your point, but I still don't think it is a major undertaking by the standards of the computing power that is available. The Netflix prize had 100 million data points and people were routinely solving the system to a reasonable accuracy (not the winners of course) in less than a day on a single commodity desktop machine. If you throw a typical university cluster with 100 machines at the problem, we are talking about a problem that might take minutes to hours. If you could use the spare capacity at a Google data center it might take only seconds.

From a North American perspective, an additional source of data about the movements of individuals would be credit card data and there would be no reason why credit card and telephone data could not be combined to map movements. So far as privacy goes - asking the sick people to consent to disclosure should take care of that.

Anonymous said...

Using the mobile data to track individuals' movements would partly depend on whether the handset (3G iPhone, Android, etc.) supports location-based services. With luck there will be many small packets of data for email or some data application. This would provide many frequent data points you could use to track detailed individual movements, say every 20 minutes. If only voice calls and SMS/text are used, then there will be fewer points of data to work with and you will likely need many thousands of individuals (rather than 500) to build detailed insight.

What you describe is definitely a good idea, and not dissimilar to marketing analytics the telco might do themselves.