Wednesday, June 19, 2019

Accelerating the Nelder–Mead Method with Predictive Parallel Evaluation - implementation -

** Nuit Blanche is now on Twitter: @NuitBlog **



The Nelder–Mead (NM) method has been recently proposed for application in hyperparameter optimization (HPO) of deep neural networks. However, the NM method is not suitable for parallelization, which is a serious drawback for its practical application in HPO. In this study, we propose a novel approach to accelerate the NM method with respect to the parallel computing resources. The numerical results indicate that the proposed method is significantly faster and more efficient when compared with the previous naive approaches with respect to the HPO tabular benchmarks.
The attendant implementation is here.
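The predictive parallel evaluation itself lives in the linked code; the sequential baseline it accelerates can be sketched with SciPy's off-the-shelf Nelder-Mead optimizer on a toy hyperparameter surface (the quadratic objective below is an illustrative stand-in for a validation loss, not the paper's benchmark):

```python
from scipy.optimize import minimize

# Baseline sequential Nelder-Mead on a made-up hyperparameter surface.
def val_loss(hp):
    lr, wd = hp
    return (lr - 0.1) ** 2 + (wd - 0.01) ** 2

result = minimize(val_loss, x0=[0.5, 0.5], method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-8})
print(result.x)  # close to [0.1, 0.01]
```

Each Nelder-Mead iteration evaluates only a handful of simplex vertices, which is exactly why naive parallelization helps so little and why the paper's predictive evaluation of likely future vertices is needed.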



Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn, or the Advanced Matrix Factorization group on LinkedIn.

Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Other links:
Paris Machine Learning: Meetup.com || @Archives || LinkedIn || Facebook || @ParisMLGroup
About LightOn: Newsletter || @LightOnIO || on LinkedIn || on CrunchBase || our Blog
About myself: LightOn || Google Scholar || LinkedIn || @IgorCarron || Homepage || ArXiv

Tuesday, June 18, 2019

Toward Instance-aware Neural Architecture Search - implementation -




Recent advancements in Neural Architecture Search (NAS) have achieved significant improvements in both single and multiple objectives settings. However, current lines of research only consider searching for a single best architecture within a search space. Such an assumption restricts the model from capturing the high diversity and variety of real-world data. With this observation, we propose InstaNAS, an instance-aware NAS framework that aims to search for a distribution of architectures. Intuitively, we assume that real-world data consists of many domains (e.g., different difficulties or structural characteristics), and each domain can have one or multiple experts that have relatively more preferable performance. The controller of InstaNAS is not only responsible for sampling architectures during its search phase, but also needs to identify which down-stream expert architecture to use for each input instance during the inference phase. We demonstrate the effectiveness of InstaNAS in a multiple-objective NAS setting that considers the trade-offs between accuracy and latency. Within a search space inspired by MobileNetV2 on a series of datasets, experiments show that InstaNAS can achieve either higher accuracy with same latency or significant latency reduction without compromising accuracy against MobileNetV2.
The attendant implementation is here: https://github.com/AnjieZheng/InstaNAS
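The instance-aware idea can be sketched in a few lines, with made-up linear "experts" and a linear controller standing in for InstaNAS's learned networks: the controller routes each input to one expert, so only that expert's compute is paid at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two "expert" linear models plus a linear controller.
experts = [rng.standard_normal((4, 2)) for _ in range(2)]
controller = rng.standard_normal((4, 2))

def predict(x):
    route = int(np.argmax(x @ controller))  # per-instance expert choice
    return x @ experts[route]               # only that expert runs

x = rng.standard_normal(4)
print(predict(x).shape)  # (2,)
```

In InstaNAS proper, the experts are full architectures sampled from the search space and the controller is trained jointly with them, but the routing structure is the same.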



A simple dynamic bandit algorithm for hyper-parameter tuning - implementation -





Hyper-parameter tuning is a major part of modern machine learning systems. The tuning itself can be seen as a sequential resource allocation problem. As such, methods for multi-armed bandits have been already applied. In this paper, we view hyper-parameter optimization as an instance of best-arm identification in infinitely many-armed bandits. We propose D-TTTS, a new adaptive algorithm inspired by Thompson sampling, which dynamically balances between refining the estimate of the quality of hyper-parameter configurations previously explored and adding new hyper-parameter configurations to the pool of candidates. The algorithm is easy to implement and shows competitive performance compared to state-of-the-art algorithms for hyper-parameter tuning.
The attendant code is here.
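D-TTTS itself dynamically grows the candidate pool; the classical Thompson-sampling core it builds on can be sketched with a fixed pool of Bernoulli arms (the success rates below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta-Bernoulli Thompson sampling over a fixed pool of configurations.
true_quality = np.array([0.3, 0.5, 0.7])   # hypothetical success rates
wins = np.ones(3)                          # Beta(1, 1) priors
losses = np.ones(3)

for _ in range(2000):
    samples = rng.beta(wins, losses)       # one posterior draw per arm
    arm = int(np.argmax(samples))          # play the most promising arm
    reward = rng.random() < true_quality[arm]
    wins[arm] += reward
    losses[arm] += 1 - reward

print(int(np.argmax(wins / (wins + losses))))  # best arm index
```

The dynamic part of D-TTTS replaces the fixed pool with one that receives new hyperparameter configurations over time, balancing refinement of known arms against exploration of new ones.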



Monday, June 17, 2019

Random Search and Reproducibility for Neural Architecture Search






Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures. In order to help ground the empirical results in this field, we propose new NAS baselines that build off the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results.

An implementation of the paper is at: https://github.com/liamcli/randomNAS_release

The code base requires the following additional repositories:


Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks - implementation -






Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization relying on the use of low-precision arithmetic representation for the weights and the activations. Another popular method is the pruning of the number of filters in each layer. While mainstream deep learning methods train the neural networks weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter also amenable to training. In this paper, we formulate optimal arithmetic bit length allocation and neural network pruning as a NAS problem, searching for the configurations satisfying a computational complexity budget while maximizing the accuracy. We use a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al. (2019a). We show, by grid search, that heterogeneous quantized networks suffer from a high variance which renders the benefit of the search questionable. For pruning, improvement over homogeneous cases is possible, but it is still challenging to find those configurations with the proposed method. The code is publicly available at https://github.com/yochaiz/Slimmable and https://github.com/yochaiz/darts-UNIQ.
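As a point of reference for the bit-allocation search, plain uniform b-bit quantization of a weight tensor looks like this (a generic sketch, not the UNIQ scheme used in the paper):

```python
import numpy as np

# Map every weight to the nearest of 2**bits evenly spaced levels.
def quantize(w, bits):
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    step = (hi - lo) / levels
    return lo + np.round((w - lo) / step) * step

w = np.linspace(-1.0, 1.0, 9)
print(quantize(w, 2))  # only 4 distinct values remain
```

The NAS formulation in the paper then treats the bit width per layer (or the number of surviving filters) as an architectural choice to be searched under a complexity budget.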




Saturday, June 15, 2019

Saturday Morning Videos: AutoML Workshop at ICML 2019



Katharina Eggensperger, Matthias Feurer, Frank Hutter, and Joaquin Vanschoren organized the AutoML workshop at ICML, and there are videos of the event that took place yesterday. Awesome! Here is the intro for the workshop:
Machine learning has achieved considerable successes in recent years, but this success often relies on human experts, who construct appropriate features, design learning architectures, set their hyperparameters, and develop new learning algorithms. Driven by the demand for off-the-shelf machine learning methods from an ever-growing community, the research area of AutoML targets the progressive automation of machine learning aiming to make effective methods available to everyone. The workshop targets a broad audience ranging from core machine learning researchers in different fields of ML connected to AutoML, such as neural architecture search, hyperparameter optimization, meta-learning, and learning to learn, to domain experts aiming to apply machine learning to new types of problems.

All the videos are here.

Bayesian optimization is a powerful and flexible tool for AutoML. While BayesOpt was first deployed for AutoML simply as a black-box optimizer, recent approaches perform grey-box optimization: they leverage capabilities and problem structure specific to AutoML such as freezing and thawing training, early stopping, treating cross-validation error minimization as multi-task learning, and warm starting from previously tuned models. We provide an overview of this area and describe recent advances for optimizing sampling-based acquisition functions that make grey-box BayesOpt significantly more efficient.
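The black-box starting point that these grey-box extensions build on is the standard BayesOpt loop; here is a minimal sketch of one iteration with a Gaussian-process surrogate and an expected-improvement acquisition (toy objective, default scikit-learn kernel):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Toy objective to minimize, standing in for a black-box tuning target.
def f(x):
    return np.sin(3 * x) + x ** 2

X = rng.uniform(-2, 2, size=(5, 1))     # initial evaluations
y = f(X).ravel()

gp = GaussianProcessRegressor().fit(X, y)
grid = np.linspace(-2, 2, 200).reshape(-1, 1)
mu, sigma = gp.predict(grid, return_std=True)

best = y.min()                          # incumbent (minimization)
z = (best - mu) / np.maximum(sigma, 1e-9)
ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
print(grid[np.argmax(ei), 0])           # next point to evaluate
```

Grey-box variants replace or augment this loop with AutoML-specific structure: partial training curves, freeze-thaw schedules, or cross-validation folds treated as related tasks.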
The mission of AutoML is to make ML available for non-ML experts and to accelerate research on ML. We have a very similar mission at fast.ai and have helped over 200,000 non-ML experts use state-of-the-art ML (via our research, software, & teaching), yet we do not use methods from the AutoML literature. I will share several insights we've learned through this work, with the hope that they may be helpful to AutoML researchers.



AutoML aims at automating the process of designing good machine learning pipelines to solve different kinds of problems. However, existing AutoML systems are mainly designed for isolated learning by training a static model on a single batch of data; while in many real-world applications, data may arrive continuously in batches, possibly with concept drift. This raises a lifelong machine learning challenge for AutoML, as most existing AutoML systems can not evolve over time to learn from streaming data and adapt to concept drift. In this paper, we propose a novel AutoML system for this new scenario, i.e. a boosting tree based AutoML system for lifelong machine learning, which won the second place in the NeurIPS 2018 AutoML Challenge. 


In this talk I'll survey work by Google researchers over the past several years on the topic of AutoML, or learning-to-learn. The talk will touch on basic approaches, some successful applications of AutoML to a variety of domains, and sketch out some directions for future AutoML systems that can leverage massively multi-task learning systems for automatically solving new problems.


Recent advances in Neural Architecture Search (NAS) have produced state-of-the-art architectures on several tasks. NAS shifts the efforts of human experts from developing novel architectures directly to designing architecture search spaces and methods to explore them efficiently. The search space definition captures prior knowledge about the properties of the architectures and it is crucial for the complexity and the performance of the search algorithm. However, different search space definitions require restarting the learning process from scratch. We propose a novel agent based on the Transformer that supports joint training and efficient transfer of prior knowledge between multiple search spaces and tasks.
Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures. In order to help ground the empirical results in this field, we propose new NAS baselines that build off the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results.
The practical work of deploying a machine learning system is dominated by issues outside of training a model: data preparation, data cleaning, understanding the data set, debugging models, and so on. What does it mean to apply ML to this “grunt work” of machine learning and data science? I will describe first steps towards tools in these directions, based on the idea of semi-automating ML: using unsupervised learning to find patterns in the data that can be used to guide data analysts. I will also describe a new notebook system for pulling these tools together: if we augment Jupyter-style notebooks with data-flow and provenance information, this enables a new class of data-aware notebooks which are much more natural for data manipulation.
Panel Discussion






Thursday, June 13, 2019

Inverse Scattering via Transmission Matrices: Broadband Illumination and Fast Phase Retrieval Algorithms


Here is an enduring example of the Donoho–Tanner phase transition and of its ability to differentiate between reconstruction algorithms (I used to use an earlier example to talk about the peer review process).



When a narrowband coherent wavefront passes through or reflects off of a scattering medium, the input and output relationship of the incident field is linear and so can be described by a transmission matrix (TM). If the TM for a given scattering medium is known, one can computationally “invert” the scattering process and image through the medium. In this work, we investigate the effect of broadband illumination, i.e., what happens when the wavefront is only partially coherent? Can one still measure a TM and “invert” the scattering? To accomplish this task, we measure TMs using the double phase retrieval technique, a method which uses phase retrieval algorithms to avoid difficult-to-capture interferometric measurements. Generally, using the double phase retrieval method requires performing massive amounts of computation. We alleviate this burden by developing a fast, GPU-accelerated algorithm, prVAMP, which lets us reconstruct 256² × 64² TMs in under five hours. After reconstructing several TMs using this method, we find that, as expected, reducing the coherence of the illumination significantly restricts our ability to invert the scattering process. Moreover, we find that past a certain bandwidth, an incoherent, intensity-based scattering model better describes the scattering process and is easier to invert.
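Not prVAMP itself, but the alternating-projection core of intensity-only phase retrieval can be sketched in a few lines (toy sizes, heavy oversampling, and an initialization near the truth so the sketch only demonstrates local refinement):

```python
import numpy as np

rng = np.random.default_rng(0)

# Recover x (up to a global phase) from magnitude-only measurements |Ax|
# by alternating projections (Gerchberg-Saxton-style error reduction).
n, m = 16, 160
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x_true)                    # measured magnitudes only

A_pinv = np.linalg.pinv(A)
# start near the truth: this demo shows local refinement only
x = x_true + 0.3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
for _ in range(200):
    y = b * np.exp(1j * np.angle(A @ x))  # impose measured magnitudes
    x = A_pinv @ y                        # least-squares back-projection

corr = abs(np.vdot(x, x_true)) / (np.linalg.norm(x) * np.linalg.norm(x_true))
print(corr)  # close to 1 when refinement succeeds
```

prVAMP replaces this naive iteration with a vector-approximate-message-passing scheme, which is what makes reconstructing millions of TM rows tractable on a GPU.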
 Implementations: 


Wednesday, June 12, 2019

Ce soir: Paris Machine Learning Meetup, Season 6 Finale. Data4Good, Hybridation, Sequence Predictions, Unreliable Data






Tonight is the season 6 finale for the Paris Machine Learning Meetup. We'll talk Data4Good, Hybridation, Sequence Predictions and Unreliable Data

Thank you to Morning-coworking for welcoming us and providing the drinks! If you want to register to attend, it is here. Jacqueline Forien, Claude Falguieres, Franck Bardol and myself will see you tonight.

Streaming is here: 


SCHEDULE:
6:45PM doors open - 7PM talks - 9PM networking - 10PM end

Here are the abstracts and, when available, the presentation slides.

** Emmanuel Bacry, CNRS, University Paris-Dauphine, Two ambitious AI for Good projects
I will present two ambitious AI-related projects I am involved in: the National Health Data Hub, which started a few months ago and where I serve as CSO, and nam.R, a startup I co-founded two years ago (35 employees as of today).

** John Whitbeck, Liftoff, Reliable ML on unreliable data
In academic research, supervised learning benchmarks are performed against static datasets (e.g. MNIST, Imagenet). However, in industrial applications of machine learning, dynamic datasets are the norm. Data churn, label drift, delayed data, censored data, corrupted data, training-serving skew, human error are but a few factors that can dramatically impact the performance of a machine learning system.
Over the years, Liftoff has developed many strategies to deliver reliable predictions from unreliable data. In this talk, we'll discuss real-time monitoring of serving accuracy, automated model safety checks, end-to-end feature-integrity tests, and how to efficiently patch immutable append-only datasets.
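One concrete instance of such monitoring is a distribution check on incoming features; a minimal sketch using a two-sample Kolmogorov-Smirnov test (the data here is simulated, not Liftoff's pipeline):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Compare a feature's live distribution against its training distribution.
train_feature = rng.normal(0.0, 1.0, 5000)
live_feature = rng.normal(0.5, 1.0, 5000)   # simulated drift

stat, p_value = ks_2samp(train_feature, live_feature)
print("drift detected" if p_value < 0.01 else "ok")
```

Checks like this run per feature on a schedule; a rejected null is a signal to quarantine the feature or retrain before serving accuracy degrades silently.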

** Nabil Benmerad & Arnaud de Moissac, DCBrain, ML/RL for Combinatorial Optimisation: the next big thing
Deep RL is nice for winning Atari games but useless for solving large and complex real-life problems like combinatorial optimisation. We find these problems everywhere: in industry, transportation, grid management, the supply chain… Usually solved with Operational Research, they call for new paradigms if we want to solve them fast enough for everyday optimisation. A hybrid approach between ML & OR can be very efficient, as we have tested at DCBrain. And as Yoshua Bengio said, « we strongly believe that this is just the beginning of a new era for combinatorial optimisation algorithms ».

** Gilles Madi, Prevision.io, Auto-Lag Networks for Real Valued Sequence to Sequence Prediction
Many machine learning problems involve predicting a sequence of future values of a target variable. State-of-the-art approaches for such use cases involve LSTM-based sequence-to-sequence models.
To improve performance, those models generally use lagged values of the target variable as additional input features. An appropriate lag factor therefore has to be chosen during feature engineering, a choice that often requires business knowledge of the data. Furthermore, state-of-the-art sequence-to-sequence models are not designed to naturally handle hierarchical time series use cases.
In this paper, we propose a novel architecture that naturally handles hierarchical time series. The contribution of this paper is thus twofold. First, we show the limitations of classical sequence-to-sequence models in the case of problems involving a real-valued target variable, namely the error accumulation problem, and we propose a novel LSTM-based approach to overcome those limitations.
Second, we highlight the limitations of manually selecting fixed lag values to improve the performance of a model. We then use an attention mechanism to introduce a dynamic and automatic lag factor selection that overcomes the former limitations, and requires no business knowledge of the data.
We call this architecture Auto-Lag Network (AL-Net). We finally validate our Auto-Lag Net model against state-of-the-art results on real-world time series and hierarchical time series data sets.
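The dynamic lag selection can be illustrated with a bare attention step over lagged target values; the scores below are illustrative constants standing in for AL-Net's learned attention weights:

```python
import numpy as np

# Softmax-normalized scores weight every past value into one context
# value, instead of hand-picking a single fixed lag.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

history = np.array([1.0, 2.0, 4.0, 8.0])   # past target values, oldest first
scores = np.array([0.1, 0.2, 0.5, 3.0])    # illustrative attention scores

weights = softmax(scores)
context = weights @ history                 # lag-weighted summary
print(context)  # dominated by the heavily weighted recent value
```

Because the weights are produced by the network rather than fixed, the effective lag can shift from one series (or one hierarchy level) to another without any business knowledge of the data.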






Tuesday, June 11, 2019

Deep Learning based compressive sensing - Highly Technical Reference Page/Aggregator, implementation -




Thuong Nguyen Canh just sent me the following e-mail featuring a new Highly Technical Reference Page/Aggregator on Deep Learning based compressive sensing as well as two of his recent papers with their implementations.
Dear Igor,


I just want to share a Github repo that I am maintaining for Deep Learning based compressive sensing and our recent paper toward multi-scale deep compressive sensing.

1. Multi-Scale Deep Compressive Sensing Network, IEEE VCIP 2018.  
Abstract: With joint learning of the sampling and recovery, deep learning-based compressive sensing (DCS) has shown significant improvement in performance and running time. Its reconstructed image, however, loses high-frequency content, especially at low subrates, which is understood to be due to the relatively large amount of low-frequency information captured by the sampling matrix. This behavior occurs similarly in the multi-scale sampling scheme, which also samples more low-frequency components. This paper proposes a multi-scale DCS network (MS-DCSNet) based on convolutional neural networks. Firstly, we convert the image signal using a multi-scale wavelet transform. Then, the signal is captured through convolution, block by block, across scales. The initial reconstructed image is recovered directly from the multi-scale measurements, and multi-scale wavelet convolution is utilized to enhance the final reconstruction quality. The network learns to perform both sampling and reconstruction at multiple scales, and thus achieves better reconstruction quality.
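The linear skeleton that such networks learn end-to-end can be sketched with a fixed random sampling matrix and a pseudo-inverse initial reconstruction (in MS-DCSNet both stages are learned convolutions across wavelet scales; this toy keeps them as plain matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sampling matrix produces the measurements; a linear map gives the
# initial reconstruction that a network would then refine.
n, m = 64, 16                                     # subrate m/n = 0.25
phi = rng.standard_normal((m, n)) / np.sqrt(m)    # sampling matrix
x = rng.standard_normal(n)                        # flattened image block

y = phi @ x                                       # compressive measurements
x0 = np.linalg.pinv(phi) @ y                      # initial reconstruction

print(y.shape, x0.shape)  # (16,) (64,)
```

The learned versions gain over this baseline precisely by shaping the sampling operator and the refinement network jointly during training.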

Source Code


2. Difference of Convolution for Deep Compressive Sensing, IEEE ICIP 2019 
Deep learning-based compressive sensing (DCS) has improved compressive sensing (CS), offering fast, high-quality reconstruction. Researchers have further extended it to multi-scale DCS, which improves reconstruction quality based on wavelet decomposition. In this work, we mimic the Difference of Gaussians via convolution and propose a scheme named Difference-of-Convolution multi-scale DCS (DoC-DCS). Unlike multi-scale DCS, which is based on a well-designed filter in the wavelet domain, the proposed DoC-DCS learns the decomposition and thereby outperforms other state-of-the-art compressive sensing methods.
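The classical Difference of Gaussians that DoC-DCS replaces with learned convolutions is easy to reproduce: subtracting two blurs of different widths isolates a band of spatial frequencies.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

# Band-pass an image by subtracting a wide blur from a narrow one.
img = rng.random((32, 32))
dog = gaussian_filter(img, sigma=1.0) - gaussian_filter(img, sigma=2.0)
print(dog.shape)  # (32, 32)
```

DoC-DCS keeps this subtractive structure but learns the two convolution kernels end-to-end instead of fixing them to Gaussians.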

Source code

Best regards,
Thuong Nguyen Canh

Thanks Thuong!



Monday, June 10, 2019

Random Projections for Quadratic Programs over a Euclidean Ball


Using random projections to shrink Linear and Quadratic programs!


From the presentation Random Projections for Quadratic Programming over a Euclidean Ball.



Random projections are used as dimensionality reduction techniques in many situations. They project a set of points in a high-dimensional space to a lower-dimensional one while approximately preserving all pairwise Euclidean distances. Usually, random projections are applied to numerical data. In this paper, however, we present a successful application of random projections to quadratic programming problems subject to polyhedral constraints and a Euclidean ball constraint. We derive approximate feasibility and optimality results for the lower-dimensional problem. We then show the practical usefulness of this idea on many random instances, as well as on two portfolio optimization instances with over 25M nonzeros in the (quadratic) risk term.
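The underlying Johnson-Lindenstrauss effect is easy to check numerically (toy dimensions): a Gaussian projection to k ≪ n dimensions approximately preserves pairwise Euclidean distances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Project 20 points from n = 10000 down to k = 400 dimensions.
n, k = 10000, 400
X = rng.standard_normal((20, n))
T = rng.standard_normal((k, n)) / np.sqrt(k)   # JL projection
Y = X @ T.T

d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(Y[0] - Y[1])
print(d_proj / d_orig)  # close to 1
```

The distortion concentrates around 1 at rate roughly 1/√k, which is what lets the papers above project entire constraint systems while keeping feasibility and optimality approximately intact.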


We discuss the application of Gaussian random projections to the fundamental problem of deciding whether a given point in a Euclidean space belongs to a given set. In particular, we consider two cases, in which the target set is either at most countable or of low doubling dimension. We show that, under a number of different assumptions, the feasibility (or infeasibility) of this problem is preserved almost surely when the problem data is projected to a lower-dimensional space. We also consider the threshold version of this problem, in which we require that the projected point and the projected set be separated by a certain distance error. As a consequence of these results, we are able to improve the bound of Indyk–Naor on nearest-neighbour-preserving embeddings. Our results are applicable to any algorithmic setting which needs to solve Euclidean membership problems in a high-dimensional space.



A celebrated result of Johnson and Lindenstrauss asserts that, in high enough dimensional spaces, Euclidean distances defined by a finite set of points are approximately preserved when these points are projected to a certain lower dimensional space. We show that the distance from a point to a convex set is another approximate invariant, and leverage this result to approximately solve linear programs with a logarithmic number of rows.
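The row-projection idea can be checked on a random LP: replacing the equality constraints Ax = b by (TA)x = Tb with a short random T yields a relaxation, so the projected optimum lower-bounds the original (a numerical sketch, not the paper's calibrated construction):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Original LP: min c.x  s.t.  Ax = b, x >= 0 (built to be feasible).
n, m, k = 60, 40, 15
A = rng.random((m, n))
b = A @ rng.random(n)               # feasible by construction
c = rng.random(n)
orig = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))

# Projected LP: k random combinations of the m original rows.
T = rng.standard_normal((k, m)) / np.sqrt(k)
proj = linprog(c, A_eq=T @ A, b_eq=T @ b, bounds=(0, None))

print(proj.fun <= orig.fun + 1e-6)  # True: fewer constraints can only help
```

The papers above quantify the other direction: with enough projected rows (logarithmic in the problem size), the projected optimum is not just a lower bound but provably close to the original one with high probability.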

 Random projections are random linear maps, sampled from appropriate distributions, that approximately preserve certain geometrical invariants so that the approximation improves as the dimension of the space grows. The well-known Johnson-Lindenstrauss lemma states that there are random matrices with surprisingly few rows that approximately preserve pairwise Euclidean distances among a set of points. This is commonly used to speed up algorithms based on Euclidean distances. We prove that these matrices also preserve other quantities, such as the distance to a cone. We exploit this result to devise a probabilistic algorithm to solve linear programs approximately. We show that this algorithm can approximately solve very large randomly generated LP instances. We also showcase its application to an error correction coding problem.
Abstract In this thesis, we will use random projection to reduce either the number of variables or the number of constraints (or both in some cases) in some well-known optimization problems. By projecting data into lower dimensional spaces, we obtain new problems with similar structures, but much easier to solve. Moreover, we try to establish conditions such that the two problems (original and projected) are strongly related (in probability sense). If it is the case, then by solving the projected problem, we can either find approximate solutions or approximate objective value for the original one. We will apply random projection to study a number of important optimization problems, including linear and integer programming (Chapter 2), convex optimization with linear constraints (Chapter 3), membership and approximate nearest neighbor (Chapter 4) and trustregion subproblems (Chapter 5). All these results are taken from the papers that I am co-authored with [26, 25, 24, 27]. This thesis will be constructed as follows. In the first chapter, we will present some basic concepts and results in probability theory. Since this thesis extensively uses elementary probability, this informal introduction will make it easier for readers with little background on this field to follow our works. In Chapter 2, we will briefly introduce to random projection and the Johnson-Lindenstrauss lemma. We will present several constructions of random projectors and explain the reason why they work. In particular, sub-gaussian random matrices will be treated in details, together with some discussion on fast and sparse random projections. In Chapter 3, we study optimization problems in their feasibility forms. In particular, we study the so-called restricted linear membership problem, which asks for the feasibility of the system {Ax = b, x ∈ C} where C is some set that restricts the choice of parameters x. This class contains many important problems such as linear and integer feasibility. 
We propose to apply a random projection T to the linear constraints and obtain the corresponding projected problem: {T Ax = T b, x ∈ C}. We want to find conditions on T, so that the two feasibility 4 problems are equivalent with high probability. The answer is simple when C is finite and bounded by a polynomial (in n). In that case, any random projection T with O(log n) rows is sufficient. When C = R n +, we use the idea of separating hyperplane to separate b from the cone {Ax | x ≥ 0} and show that T b is still separated from the projected cone {T Ax | x ≥ 0} under certain conditions. If these conditions do not hold, for example when the cone {Ax | x ≥ 0} is non-pointed, we employ the idea in the Johnson-Lindenstrauss lemma to prove that, if b /∈ {Ax | x ≥ 0}, then the distance between b and that cone is slightly distorted under T, thus still remains positive. However, the number of rows of T depends on unknown parameters that are hard to estimate. In Chapter 4, we continue to study the above problem in the case when C is a convex set. Under that assumption, we can define a tangent cone K of C at x ∗ ∈ arg minx∈C kAx − bk. We establish the relations between the original and projected problems based on the concept of Gaussian width, which is popular in compressed sensing. In particular, we prove that the two problems are equivalent with high probability as long as the random projection T is sampled from sub-gaussian distributions and has at least O(W2 (AK)) rows, where W(AK) is the Gaussian-width of AK. We also extend this result to the case when T is sampled from randomized orthonormal systems in order to exploit its fast matrix-vector multiplication. Our results are similar to those in [21], however they are more useful in privacy-preservation applications when the access to the original data A, b is limited or unavailable. In Chapter 5, we study the Euclidean membership problem: “Given a vector b and a closed set X in R n , decide whether b ∈ X or not”. 
This is a generalization of the restricted linear membership problem considered previously. We employ a Gaussian random projection T to embed both b and X into a lower-dimensional space and study the corresponding projected version: “Decide whether T b ∈ T(X) or not”. When X is finite or countable, a straightforward argument shows that the two problems are equivalent almost surely, regardless of the projected dimension. However, this result is only of theoretical interest: round-off errors in floating-point arithmetic make its practical application difficult. We address this issue by introducing a threshold τ > 0 and studying the corresponding “thresholded” problem: “Decide whether dist(T b, T(X)) ≥ τ”. In the case when X may be uncountable, we prove that the original and projected problems are also equivalent if the projected dimension d is proportional to some intrinsic dimension of the set X. In particular, we employ the doubling dimension to prove that, if b ∉ X, then T b ∉ T(X) almost surely as long as d = Ω(ddim(X)). Here, ddim(X) is the doubling dimension of X, defined as the smallest number such that each ball in X can be covered by at most 2^{ddim(X)} balls of half the radius. We extend this result to the thresholded case and obtain a more useful bound for d. As a consequence of that result, we are able to improve a bound of Indyk and Naor on nearest-neighbor-preserving embeddings by a factor of log(1/δ)/ε.

In Chapter 6, we propose to apply random projections to the trust-region subproblem, stated as min{c^T x + x^T Q x | Ax ≤ b, ‖x‖ ≤ 1}. Such problems arise in trust-region methods for derivative-free optimization.
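The “almost surely, regardless of the projected dimension” claim for a finite X, and the numerical caveat that motivates the threshold τ, can be illustrated with a small sketch (all data hypothetical): even projecting down to d = 3, a query point not in the set stays outside its projection, but only by a distance that a floating-point test must compare against a threshold:

```python
import numpy as np

rng = np.random.default_rng(2)

n, d = 100, 3  # even a very low projected dimension suffices for finite X
X = rng.standard_normal((50, n))   # a finite set in R^n
b = rng.standard_normal(n)         # b is (almost surely) not in X

T = rng.standard_normal((d, n))

# Distance from the projected query Tb to the projected set T(X).
dist_proj = np.min(np.linalg.norm(X @ T.T - T @ b, axis=1))
print("dist(Tb, T(X)) =", dist_proj)
```

In exact arithmetic this distance is positive with probability 1 whenever b ∉ X; in practice it can be tiny, which is why the thesis replaces the yes/no test with the thresholded question dist(Tb, T(X)) ≥ τ.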
Let P ∈ R^{d×n} be a random matrix with Gaussian entries. We then consider the following “projected” problem: min{c^T P^T P x + x^T P^T P Q P^T P x | A P^T P x ≤ b, ‖P x‖ ≤ 1}, which reduces to min{(P c)^T u + u^T (P Q P^T) u | A P^T u ≤ b, ‖u‖ ≤ 1} by setting u := P x. The latter problem is of low dimension and can be solved much faster than the original. Moreover, we prove that if u* is its optimal solution, then with high probability x* := P^T u* is a (1 + O(ε))-approximation for the original problem. This is done using recent results on the concentration of eigenvalues of Gaussian matrices.
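One piece of this construction is purely algebraic and easy to check numerically: for any u, the lifted point x := P^T u attains exactly the value of the projected objective, since c^T(P^T u) + (P^T u)^T Q (P^T u) = (Pc)^T u + u^T (P Q P^T) u. A hedged sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 200, 30
c = rng.standard_normal(n)
Q0 = rng.standard_normal((n, n))
Q = (Q0 + Q0.T) / 2                      # symmetric quadratic term

P = rng.standard_normal((d, n)) / np.sqrt(d)

def f_original(x):
    return c @ x + x @ Q @ x

def f_projected(u):
    return (P @ c) @ u + u @ (P @ Q @ P.T) @ u

# Lifting u back to x = P^T u reproduces the projected objective exactly;
# the probabilistic work in the chapter is showing this lift is also
# near-optimal for the original problem.
u = rng.standard_normal(d)
x = P.T @ u
print(abs(f_original(x) - f_projected(u)))
```

The (1 + O(ε))-approximation guarantee itself is where the concentration-of-eigenvalues results come in; this snippet only verifies the substitution u := P x that makes the low-dimensional problem well defined.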

Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn  or the Advanced Matrix Factorization group on LinkedIn

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Other links:
Paris Machine Learning: Meetup.com || @Archives || LinkedIn || Facebook || @ParisMLGroup
About LightOn: Newsletter || @LightOnIO || on LinkedIn || on CrunchBase || our Blog
About myself: LightOn || Google Scholar || LinkedIn || @IgorCarron || Homepage || ArXiv

Saturday, June 08, 2019

Saturday Morning Videos: L4DC: Learning for Dynamics and Control, May 30 & 31, 2019, MIT

** Nuit Blanche is now on Twitter: @NuitBlog **


The L4DC: Learning for Dynamics and Control conference took place on May 30 & 31, 2019 at MIT. From the website page:


Over the next decade, the biggest generator of data is expected to be devices which sense and control the physical world.
This explosion of real-time data that is emerging from the physical world requires a rapprochement of areas such as machine learning, control theory, and optimization. While control theory has been firmly rooted in tradition of model-based design, the availability and scale of data (both temporal and spatial) will require rethinking of the foundations of our discipline. From a machine learning perspective, one of the main challenges going forward is to go beyond pattern recognition and address problems in data driven control and optimization of dynamical processes.
Our overall goal is to create a new community of people who think rigorously across disciplines, ask new questions, and develop the foundations of this new scientific area.

Here are the videos followed by the posters at the conference:

Emo Todorov (University of Washington): “Acceleration-based methods for trajectory optimization through contacts”

Posters 

  • Murad Abu-Khalaf, Sertac Karaman & Daniela Rus, MIT, “Shared Linear Quadratic Regulation Control: A Reinforcement Learning Approach” PDF
  • Aaron Ames & Andrew J. Taylor, CalTech, “A Control Lyapunov Function Approach to Episodic Learning” PDF
  • Anuradha M. Annaswamy, MIT, “Stable and Fast Learning with Momentum and Adaptive Rates” PDF
  • Thomas Beckers, Technical University of Munich, “Gaussian Process Based Identification and Control with Formal Guarantees” PDF
  • Kostas Bekris, Rutgers University, “Closing the Reality Gap of Physics-Based Robot Simulation Through Task-Oriented Bayesian Optimization”
  • Julian Berberich, University of Stuttgart, “Data-Driven Model Predictive Control with Stability and Robustness Guarantees” PDF
  • Tom Bertalan, MIT, “On Learning Hamiltonian Systems from Data”
  • Calin Belta & Xiao Li, Boston University, “A Formal Methods Approach to Reinforcement Learning For Robotic Control”
  • Nicholas M. Boffi, Harvard University, and Jean-Jacques Slotine, MIT, “A continuous-time analysis of distributed stochastic gradient”
  • Byron Boots, Georgia Institute of Technology, “An Online Learning Approach to Model Predictive Control”
  • Octavia Camps, Northeastern University College of Engineering, “KW-DYAN: A Recurrent Dynamics-Based Network for Video Prediction” PDF
  • Bugra Can, Rutgers University, “Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances”
  • Pratik Chaudhari, Amazon Web Services, “P3O: Policy-on Policy-off Policy Optimization”
  • Alessandro Chiuso, ETH Zuerich, “CoRe: Control Oriented Learning – A Regularisation-Based Approach”
  • Glen Chou, University of Michigan, “Learning Constraints from Demonstrations”
  • Claus Danielson, Ankush Chakrabarty, Stefano Di Cairano, Mitsubishi Electric Research Laboratories, “Invariance for Safe Learning in Constraint-Enforcing Control”
  • Adithya Devraj, University of Florida, “Stochastic Approximation and the Need for Speed” PDF
  • Vikas Dhiman, UC San Diego, “Model-based Transfer Learning of Skills across Robots and Tools”
  • Zhe Du, University of Michigan, “Online Robust Switched System Identification”
  • Alec Farid, Princeton University, “PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments” PDF
  • Dylan Foster, MIT, “Model Selection for Contextual Bandits”
  • Travis Gibson, Harvard Medical School, “Connections Between Adaptive Control and Machine Learning”
  • Stephanie Gil, Arizona State University, “Generalized Rollout Algorithms for POMDP with Application to Sequential Repair Problems”
  • Mert Gurbuzbalaban, Rutgers University, “Robust Accelerated Gradient Methods”
  • Josiah Hanna, University of Texas, “Robot Learning in Simulation with Action Transformations”
  • Hamed Hassani, University of Pennsylvania, “Distributed Scenarios in Submodular Optimization”
  • Jonathan How, MIT, “Knowledge Transfer via Learning to Teach in Cooperative Multiagent Reinforcement Learning”
  • Ameya Jagtap, Brown University, “Time-Parallel and Fractional Physics-Informed Neural Networks for Solving Transient PDEs” PDF
  • Yassir Jedra, KTH Royal Institute of Technology in Stockholm, “Sample Complexity Lower Bounds for Linear System Identification”
  • Angjoo Kanazawa, UC Berkeley, “SFV: Reinforcement Learning of Physical Skills from Videos”
  • Bachir El Khadir & Amir Ali Ahmadi, Princeton University, “Learning Dynamical Systems With Side Information”
  • Reza Khodayi-mehr, Duke University, “Model-Based Learning of Turbulent Flows using Mobile Robots” PDF
  • Dong-Ki Kim, MIT, “Knowledge Transfer via Learning to Teach in Cooperative Multiagent Reinforcement Learning”
  • George Kissas & Yibo Yang, University of Pennsylvania, “Learning the Flow Map of Dynamical Systems with Self-Supervised Neural Runge-Kutta Networks”
  • Alec Koppel, University of Pennsylvania, “Global Convergence of Policy Gradient Methods: A Nonconvex Optimization Perspective”
  • Abdul Rahman Kreidieh, UC Berkeley, “Scalable methods for the control of mixed autonomous system”
  • Nevena Lazic, Google, “POLITEX: Regret Bounds for Policy Iteration using Expert Prediction” PDF
  • Armin Lederer, Technical University of Munich, “Stable Feedback Linearization and Optimal Control for Gaussian Processes” PDF
  • Na Li, Harvard University, “The Role of Prediction in Online Control”
  • Jason Liang, MIT, “Learning the Contextual Demand Curve in Repeated Auctions” 
  • Nikolai Matni, UC Berkeley, “Robust Guarantees for Perception-Based Control”
  • Jared Miller, Yang Zheng, Biel Roig-Solvas, Mario Sznaier, Antonis Papachristodoulou, Northeastern University/Harvard University/University of Oxford, “Chordal Decomposition in Rank Minimized SDPs”
  • Yannis Paschalidis, Boston University, “Distributionally Robust Learning and Applications to Predictive and Prescriptive Health Analytics” PDF
  • Panagiotis Patrinos & Mathijs Schuurmans, KU Leuven, “Safe Learning-Based Control of Stochastic Jump Linear Systems: A Distributionally Robust Approach” PDF
  • Amirhossein Reisizadeh, UC Santa Barbara, “Robust and Communication-Efficient Collaborative Learning” PDF
  • Anders Rantzer, Lund University, “On the Non-Robustness of Certainty Equivalence Control”
  • Lilian Ratliff, Sam Burden, Sam Coogan, Benjamin Chasnov & Tanner Fiez, University of Washington, “Certifiable Algorithms for Learning and Control in Multiagent Systems” PDF
  • Alejandro Ribeiro, University of Pennsylvania, “Know Your Limits: Learning Feasible Specifications Using Counterfactual Optimization”
  • Thomas Schön, Uppsala University, “Robust Exploration for Data-Driven Linear Quadratic Control”
  • Artin Spiridonof, Boston University, “Network Independence in Distributed Optimization” PDF
  • Lili Su, MIT, “Distributed Learning and Estimation in the Presence of Byzantine Agents”
  • Friedrich Solowjow & Sebastian Trimpe, Max Planck Institute for Intelligent Systems – Stuttgart, Germany, “Event-Triggered Learning”
  • Karan Singh, Princeton University, “Online Control with Adversarial Disturbances”
  • Madeleine Udell, Cornell University, “OBOE: Collaborative Filtering for Automated Machine Learning” PDF
  • Min Wen, University of Pennsylvania, “Constrained Cross-Entropy Method for Safe Reinforcement Learning”
  • Zhi Xu, MIT, “On Reinforcement Learning Using Monte Carlo Tree Search with Supervised Learning: Non-Asymptotic Analysis”
  • Lin F. Yang, Princeton University, “Sample-Optimal Parametric Q-Learning with Linear Transition Models” PDF
  • Yan Zhang, Duke University, “Distributed Off-Policy Actor-Critic Reinforcement Learning with Policy Consensus”



Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn  or the Advanced Matrix Factorization group on LinkedIn


Other links:
Paris Machine Learning: Meetup.com || @Archives || LinkedIn || Facebook || @ParisMLGroup
About LightOn: Newsletter || @LightOnIO || on LinkedIn || on CrunchBase || our Blog
