Wednesday, November 09, 2016

Paris Machine Learning Meetup #3 Season 4: OPECST, Correlations, Transfer Learning, DL @Amazon, Car Sales

The video of the live stream is here:

The meetup will be hosted by AAA-data / Comité des Constructeurs Français d'Automobiles and the networking event is sponsored by . A big thank you to them.

The program for this third regular meetup of the season (and the fifth overall for season 4) is a little out of the ordinary and will feature the following:

You may have already read many times that the job of a Data Scientist is to skim through a huge amount of data searching for correlations between variables of interest, and that one of their worst enemies (besides the fact that correlation does not imply causation) is spurious correlation. But what really is correlation? Are there several types of correlation? Some "good", some "bad"? What about their estimation? This talk will be a very visual presentation of the notions of correlation and dependence. I will first illustrate how the standard linear correlation (the Pearson coefficient) is estimated, then a more robust alternative: the Spearman coefficient. Building on a geometric understanding of their nature, I will present a generalization that can help Data Scientists explore, interpret, and measure the dependence (not necessarily linear or comonotonic) between the variables of a given dataset. Financial time series (stocks, credit default swaps, FX rates) and features from the UCI datasets serve as use cases.
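As a quick illustration of the two classical coefficients mentioned above, here is a minimal pure-Python sketch (not the speaker's code, and not the generalization presented in the talk). Pearson is computed directly from centered sums; Spearman is computed as the Pearson coefficient of the ranks, with tied values given their average rank. The function names and the toy data are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    """1-based ranks of v; tied values all receive their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

# Monotonic but strongly nonlinear relationship.
x = list(range(1, 11))
y = [math.exp(a) for a in x]
print("exp:", round(pearson(x, y), 3), round(spearman(x, y), 3))

# Deterministic but non-monotonic relationship.
u = list(range(-3, 4))
w = [a * a for a in u]
print("square:", round(pearson(u, w), 3), round(spearman(u, w), 3))
```

With `x` and `exp(x)`, Spearman is 1 (up to rounding) while Pearson stays well below 1, because the relationship is monotonic but not linear. With `u` and `u²`, both coefficients are 0 even though `w` is fully determined by `u`: this is the kind of non-monotonic dependence that neither coefficient captures.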

Dataiku recently worked on a recommender system for an e-business vacation retailer, based on users' previously visited products. We built a meta-model on top of a classical recommender system that increased revenue by 7% during the A/B test. For this type of business, the content of the product image is paramount, so the obvious next step was to add image information to the recommender. The key takeaway is this: you don’t need a deep learning expert to solve the tagging problem. Because labeled datasets and corresponding pre-trained neural networks are available on the Internet, you can use “transfer learning” and map your problem to an existing one. The post-processing step consists of grouping labels to obtain features associated with broader visual themes. For instance, "theme beach" = coast + ocean + sandbar. We use these themes to recommend personalized products to customers or to address marketing questions such as: what kind of image should I show for this product?
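The label-grouping post-processing step can be sketched in a few lines of Python. This is a hypothetical illustration, not Dataiku's implementation: it assumes the pre-trained classifier returns a dict mapping labels to probabilities, and it scores each theme by summing the probabilities of its member labels (summing is an assumed rule; a max or a learned weighting would also be plausible). Only the beach grouping comes from the abstract; the "mountain" theme, its labels, and all probabilities are made up.

```python
# Hypothetical theme definitions. Only "beach" = coast + ocean + sandbar
# comes from the abstract; "mountain" is an invented example.
THEMES = {
    "beach": ["coast", "ocean", "sandbar"],
    "mountain": ["alp", "valley", "cliff"],
}

def theme_features(label_probs, themes=THEMES):
    """Aggregate per-label probabilities into one score per visual theme.

    label_probs: dict mapping a classifier label to its predicted probability.
    Each theme score is the sum of its member labels' probabilities
    (an assumed aggregation rule); missing labels count as 0.
    """
    return {
        theme: sum(label_probs.get(label, 0.0) for label in labels)
        for theme, labels in themes.items()
    }

# Made-up classifier output for a single product image.
probs = {"coast": 0.40, "sandbar": 0.25, "ocean": 0.10, "valley": 0.05}
features = theme_features(probs)
best = max(features, key=features.get)
print(features, best)
```

The resulting per-theme scores are low-dimensional features that could be fed to a recommender alongside behavioral signals, or used directly to pick the most "beach-like" image for a product.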

The AAA DATA database contains more than 300 million verified events, with a unique history of more than 15 years that makes it possible to validate the predictive solution offered to its clients. The challenge was to apply machine learning to these 300 million events.

