Tuesday, September 29, 2015

Anomaly Detection in High Dimensional Data: Hyperspectral data, movies and more...

From the CAIB report featured in The Modeler's Known Unknowns and Unknown Knowns
Anomaly detection is what you do when you are concerned with the "unknown unknowns" or, to put it in a perspective that is sorely missing from many algorithms: when you are dealing with sometimes adversarial or evasive counterparties, or with behaviors that fall outside your model (outliers). Machine learning and compressive sensing feature some very sophisticated algorithms for detailed classification, but when faced with unknown unknowns you want to quantify anomaly detection, i.e. how far the data is from your "frame-of-mind" model. The high dimensional data afforded by cheap memory and CMOS sensors is likely making these needles harder to find. Here are some recent preprints on the subject that showed up on my radar screen. And yes, sparsity is sometimes key to detecting them. Enjoy!
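To make the "how far is the data from your frame-of-mind model" idea concrete, here is a minimal sketch (not tied to any of the papers below): fit a Gaussian to normal samples and score new points by their Mahalanobis distance. The data and the two test points are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# "Frame-of-mind" model: a Gaussian fitted to normal training data.
normal = rng.normal(0.0, 1.0, size=(500, 3))
mu = normal.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal, rowvar=False))

def anomaly_score(x):
    """Mahalanobis distance of x from the fitted model of 'normal'."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# A point far outside the model scores much higher than a typical one.
print(anomaly_score(np.zeros(3)) < anomaly_score(np.full(3, 8.0)))
```

Anything above a threshold calibrated on held-out normal data gets flagged; everything below stays inside the model's comfort zone.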

We discuss recent progress in techniques for modeling and analyzing hyperspectral images and movies, in particular for detecting plumes of both known and unknown chemicals. We discuss novel techniques for robust modeling of the background in a hyperspectral scene, and for detecting chemicals of known spectrum we use partial least squares regression on a resampled training set to boost performance. For the detection of unknown chemicals we view the problem as an anomaly detection problem, and use novel estimators with low sample complexity for intrinsically low-dimensional data in high dimensions that enable us to model the "normal" spectra and detect anomalies. We apply these algorithms to benchmark data sets made available by Lincoln Labs under the Automated Target Detection program co-funded by NSF, DTRA and NGA, and compare, when applicable, to current state-of-the-art algorithms, with favorable results.
Optimal Sparse Kernel Learning for Hyperspectral Anomaly Detection
Zhimin Peng, Prudhvi Gurram, Heesung Kwon, Wotao Yin

In this paper, a novel framework of sparse kernel learning for Support Vector Data Description (SVDD) based anomaly detection is presented. Optimal sparse feature selection for anomaly detection is first modeled as a Mixed Integer Programming (MIP) problem. Due to the prohibitively high computational complexity of the MIP, it is relaxed into a Quadratically Constrained Linear Programming (QCLP) problem. The QCLP problem can then be practically solved by using an iterative optimization method, in which multiple subsets of features are iteratively found as opposed to a single subset. The QCLP-based iterative optimization problem is solved in a finite space called the \emph{Empirical Kernel Feature Space} (EKFS) instead of in the input space or \emph{Reproducing Kernel Hilbert Space} (RKHS). This is possible because the geometrical properties of the EKFS and the corresponding RKHS remain the same. An explicit nonlinear exploitation of the data in a finite EKFS is then achievable, which results in optimal feature ranking. Experimental results based on a hyperspectral image show that the proposed method can provide improved performance over the current state-of-the-art techniques.
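SVDD with an RBF kernel coincides with the one-class SVM, so a quick way to experiment with the kind of detector being optimized here is scikit-learn's `OneClassSVM`. This sketch uses synthetic "background" data and does not implement the paper's sparse kernel learning; the `gamma` and `nu` values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Normal-only training data, e.g. background spectral features.
background = rng.normal(0.0, 1.0, size=(300, 5))

# With an RBF kernel, the one-class SVM is equivalent to SVDD:
# it encloses the data in a minimal ball in feature space.
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(background)

inlier = np.zeros((1, 5))
outlier = np.full((1, 5), 6.0)
# predict returns +1 for normal points and -1 for anomalies.
print(model.predict(inlier)[0], model.predict(outlier)[0])
```

The paper's contribution is, roughly, choosing *which* features feed such a detector, by ranking them through the QCLP relaxation.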

MultiView Diffusion Maps
Ofir Lindenbaum, Arie Yeredor, Moshe Salhov, Amir Averbuch

In this study we consider learning a reduced dimensionality representation from datasets obtained under multiple views. Such multiple views of datasets can be obtained, for example, when the same underlying process is observed using several different modalities, or measured with different instrumentation. Our goal is to effectively exploit the availability of such multiple views for various purposes, such as non-linear embedding, manifold learning, spectral clustering, anomaly detection and non-linear system identification. Our proposed method exploits the intrinsic relation within each view, as well as the mutual relations between views. We do this by defining a cross-view model, in which an implied Random Walk process between objects is restrained to hop between the different views. Our method is robust to scaling of each dataset, and is insensitive to small structural changes in the data. Within this framework, we define new diffusion distances and analyze the spectra of the implied kernels. We demonstrate the applicability of the proposed approach on both artificial and real data sets.
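For readers unfamiliar with the single-view baseline, here is a compact diffusion-map sketch on synthetic data: build a Gaussian affinity kernel, normalize it into a random-walk transition matrix, and embed with its leading non-trivial eigenvectors. The multiview cross-view kernel of the paper is not implemented here, and the bandwidth choice is a common heuristic, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))

# Gaussian affinity kernel with a median-distance bandwidth.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / np.median(d2))

# Row-normalize into a Markov (random-walk) transition matrix.
P = K / K.sum(axis=1, keepdims=True)

# Diffusion map: embed with the top non-trivial eigenvectors of P,
# scaled by their eigenvalues (diffusion time t = 1).
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
embedding = vecs.real[:, order[1:3]] * vals.real[order[1:3]]
print(embedding.shape)
```

In the multiview setting, the paper's implied random walk is additionally constrained to hop between views, which changes the kernel whose spectrum is analyzed.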

A Framework of Sparse Online Learning and Its Applications by Dayong Wang, Pengcheng Wu, Peilin Zhao, Steven C.H. Hoi

The amount of data in our society has been exploding in the era of big data. In this paper, we address several open challenges of big data stream classification, including high volume, high velocity, high dimensionality, high sparsity, and high class imbalance. Many existing studies in the data mining literature solve data stream classification tasks in a batch learning setting, which suffers from poor efficiency and scalability when dealing with big data. To overcome these limitations, this paper investigates an online learning framework for big data stream classification tasks. Unlike some existing online data stream classification techniques that are often based on first-order online learning, we propose a framework of Sparse Online Classification (SOC) for data stream classification, which includes some state-of-the-art first-order sparse online learning algorithms as special cases and allows us to derive a new effective second-order online learning algorithm for data stream classification. In addition, we propose a new cost-sensitive sparse online learning algorithm by extending the framework, with application to online anomaly detection tasks where the class distribution of the data can be very imbalanced. We also analyze the theoretical bounds of the proposed method, and finally conduct an extensive set of experiments in which encouraging results validate the efficacy of the proposed algorithms in comparison to a family of state-of-the-art techniques on a variety of data stream classification tasks.
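A minimal first-order instance of sparse online learning looks like this: an online logistic-regression gradient step followed by soft-thresholding (an L1 proximal step), so that rarely useful features are driven to exactly zero. The stream, learning rate, and regularization weight are invented for the sketch; the paper's framework and its second-order and cost-sensitive variants go well beyond it.

```python
import numpy as np

def soft_threshold(w, t):
    """L1 proximal step: shrink toward zero, clipping small weights to exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

rng = np.random.default_rng(3)
d = 50
w_true = np.zeros(d)
w_true[:5] = 2.0                  # only 5 of 50 features are informative
w, eta, lam = np.zeros(d), 0.1, 0.1

for _ in range(2000):             # one pass over a simulated stream
    x = rng.normal(size=d)
    y = 1.0 if x @ w_true > 0 else -1.0
    grad = -y * x / (1.0 + np.exp(y * (x @ w)))     # logistic loss gradient
    w = soft_threshold(w - eta * grad, eta * lam)   # sparse online update

print(np.count_nonzero(w), "of", d, "weights are nonzero")
```

Each example is processed once and discarded, which is what makes this kind of update attractive for high-velocity streams.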

This paper presents a new approach, based on polynomial optimization and the method of moments, to the problem of anomaly detection. The proposed technique only requires information about the statistical moments of the normal-state distribution of the features of interest and compares favorably with existing approaches (such as Parzen windows and 1-class SVM). In addition, it provides a succinct description of the normal state. Thus, it leads to a substantial simplification of the anomaly detection problem when working with higher dimensional datasets.
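One simple moments-only score in this spirit is the inverse Christoffel-function value built from the degree-one moment matrix of the normal-state data. The paper's polynomial-optimization machinery goes well beyond this, so treat the following as a cartoon on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 2))    # samples of the normal state

# Degree-1 moment matrix M = E[v(x) v(x)^T] with v(x) = (1, x1, x2).
V = np.hstack([np.ones((len(X), 1)), X])
M_inv = np.linalg.inv(V.T @ V / len(X))

def score(x):
    """Inverse Christoffel-function value: large means far from the normal state."""
    v = np.concatenate(([1.0], x))
    return float(v @ M_inv @ v)

print(score(np.zeros(2)) < score(np.array([10.0, 10.0])))
```

Note that only the moment matrix, not the raw data, is needed at detection time, which is what gives the "succinct description of the normal state."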

Anomaly Detection in Unstructured Environments using Bayesian Nonparametric Scene Modeling
Yogesh Girdhar, Walter Cho, Matthew Campbell, Jesus Pineda, Elizabeth Clarke, Hanumant Singh

This paper explores the use of a Bayesian non-parametric topic modeling technique for the purpose of anomaly detection in video data. We present results from two experiments. The first experiment shows that the proposed technique is automatically able to characterize the underlying terrain and detect anomalous flora in image data collected by an underwater robot. The second experiment shows that the same technique can be used on images from a static camera in a dynamic unstructured environment. The second dataset consists of video data from a static seafloor camera capturing images of a busy coral reef. The proposed technique was able to detect all three instances of an underwater vehicle passing in front of the camera, amongst many other observations of fish, debris, lighting changes due to surface waves, and benthic flora.
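A rough batch stand-in for the idea — topic-model the "visual words" of normal frames and flag frames the learned topics explain poorly — can be put together with scikit-learn's LDA. The real system (Bayesian non-parametric, online, spatiotemporal) is considerably more involved, and the visual-word counts below are synthetic.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(5)
vocab = 20
# "Normal" frames: visual-word counts drawn from two recurring scene topics.
topic_a = rng.dirichlet(np.ones(vocab) * 0.1)
topic_b = rng.dirichlet(np.ones(vocab) * 0.1)
frames = np.array([rng.multinomial(200, topic_a if i % 2 else topic_b)
                   for i in range(100)])

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(frames)

# Score new frames by how well the learned topics explain their words;
# a frame full of words the topics rarely generate is a candidate anomaly.
normal_frame = rng.multinomial(200, topic_a)[None, :]
odd_frame = rng.multinomial(200, np.ones(vocab) / vocab)[None, :]
print(lda.score(normal_frame), lda.score(odd_frame))
```

A vehicle crossing a coral-reef scene would, in this analogy, produce a burst of visual words with low likelihood under the reef's learned topics.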

Sparsity in Multivariate Extremes with Applications to Anomaly Detection
Nicolas Goix (LTCI), Anne Sabourin (LTCI), Stéphan Clémençon (LTCI)
(Submitted on 21 Jul 2015)
Capturing the dependence structure of multivariate extreme events is a major concern in many fields involving the management of risks stemming from multiple sources, e.g. portfolio monitoring, insurance, environmental risk management and anomaly detection. One convenient (non-parametric) characterization of extremal dependence in the framework of multivariate Extreme Value Theory (EVT) is the angular measure, which provides direct information about the probable 'directions' of extremes, that is, the relative contribution of each feature/coordinate of the 'largest' observations. Modeling the angular measure in high dimensional problems is a major challenge for the multivariate analysis of rare events. The present paper proposes a novel methodology aiming at exhibiting a sparsity pattern within the dependence structure of extremes. This is done by estimating the amount of mass spread by the angular measure on representative sets of directions, corresponding to specific sub-cones of Rd_+. This dimension reduction technique paves the way towards scaling up existing multivariate EVT methods. Beyond a non-asymptotic study providing a theoretical validity framework for our method, we propose as a direct application a --first-- anomaly detection algorithm based on multivariate EVT. This algorithm builds a sparse 'normal profile' of extreme behaviours, to be confronted with new (possibly abnormal) extreme observations. Illustrative experimental results provide strong empirical evidence of the relevance of our approach.
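The mechanics of the rank transform and of counting mass on sub-cones can be sketched on synthetic heavy-tailed data in which two coordinates share one heavy-tailed factor; the thresholds and quantiles below are arbitrary choices for the illustration, not the paper's estimator.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(6)
n = 2000
# Coordinates 0 and 1 share one heavy-tailed factor, so their extremes
# occur together; coordinate 2 is extreme on its own.
z = np.abs(rng.standard_cauchy(n))
X = np.column_stack([z * rng.uniform(0.5, 1.5, n),
                     z * rng.uniform(0.5, 1.5, n),
                     np.abs(rng.standard_cauchy(n))])

# Rank-transform each margin to standard Pareto, as in multivariate EVT.
ranks = X.argsort(axis=0).argsort(axis=0) + 1
V = n / (n + 1.0 - ranks)

# Among the most extreme points, record which coordinates are jointly large:
# the empirical mass on each sub-cone of R^d_+ exposes the sparsity pattern.
radius = V.max(axis=1)
extreme = V[radius >= np.quantile(radius, 0.99)]
pattern = extreme >= 0.1 * extreme.max(axis=1, keepdims=True)
print(Counter(map(tuple, pattern)).most_common())
```

New extreme observations whose pattern lands on a sub-cone carrying little empirical mass are the candidates the paper's algorithm would flag as abnormal extremes.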

Universal Anomaly Detection: Algorithms and Applications
Shachar Siboni, Asaf Cohen

Modern computer threats are far more complicated than those seen in the past. They are constantly evolving, altering their appearance, perpetually changing disguise. Under such circumstances, detecting known threats, a fortiori zero-day attacks, requires new tools, which are able to capture the essence of their behavior, rather than some fixed signatures. In this work, we propose novel universal anomaly detection algorithms, which are able to learn the normal behavior of systems and alert for abnormalities, without any prior knowledge on the system model, nor any knowledge on the characteristics of the attack. The suggested method utilizes the Lempel-Ziv universal compression algorithm in order to optimally give probability assignments for normal behavior (during learning), then estimate the likelihood of new data (during operation) and classify it accordingly. The suggested technique is generic, and can be applied to different scenarios. Indeed, we apply it to key problems in computer security. The first is detecting Botnets Command and Control (C&C) channels. A Botnet is a logical network of compromised machines which are remotely controlled by an attacker using a C&C infrastructure, in order to perform malicious activities. We derive a detection algorithm based on timing data, which can be collected without deep inspection, from open as well as encrypted flows. We evaluate the algorithm on real-world network traces, showing how a universal, low complexity C&C identification system can be built, with high detection rates and low false-alarm probabilities. Further applications include malicious tools detection via system calls monitoring and data leakage identification.
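The compression trick is easy to try with the standard library: "train" by compressing normal data, then score a new sample by the extra bytes needed to compress it alongside the training data. Here `zlib` stands in for the paper's Lempel-Ziv probability assignments, and the byte strings are made up.

```python
import zlib

def compression_score(train: bytes, sample: bytes) -> float:
    """Extra bytes needed to encode `sample` after seeing `train`.
    A high score means the sample is poorly predicted by normal behavior."""
    base = len(zlib.compress(train))
    joint = len(zlib.compress(train + sample))
    return (joint - base) / max(len(sample), 1)

# "Normal" behavior: a regular, repetitive event sequence (e.g. timing bins).
normal = b"abcabcabc" * 100
typical = b"abcabcabc" * 5
anomalous = b"zqxwvu!!" * 5

print(compression_score(normal, typical) < compression_score(normal, anomalous))
```

The same per-byte "surprise" score applies unchanged to C&C timing sequences, system-call traces, or outbound data, which is what makes the approach universal.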

Anomaly Detection for malware identification using Hardware Performance Counters by Alberto Garcia-Serrano

Computers are widely used today by most people. Internet-based applications such as e-commerce or e-banking attract criminals, who use sophisticated techniques to introduce malware onto victims' computers. And not only computer users are at risk: so are smartphone and smartwatch users, smart cities, Internet of Things devices, etc. Different techniques have been tested against malware. Currently, pattern matching is the default approach in antivirus software, and machine learning is also being used successfully. Continuing this trend, in this article we propose an anomaly-based method using the hardware performance counters (HPCs) available in almost any modern computer architecture. Because anomaly detection is an unsupervised process, new malware and APTs can be detected even if they are unknown.
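A hedged sketch of the unsupervised setup: collect HPC vectors while only benign software runs, then flag intervals whose standardized counter vector sits far from the benign centroid. The counter names, scales, and threshold below are invented for the example; the paper's actual feature set and detector may differ.

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated HPC vectors (instructions, cache-misses, branch-misses) per
# sampling interval, collected while only benign software runs.
benign = rng.normal(loc=[1e6, 5e3, 2e3], scale=[5e4, 5e2, 2e2], size=(200, 3))

# Standardize, then score new samples by distance to the benign centroid,
# with the threshold set at the 99th percentile of the benign distances.
mu, sigma = benign.mean(axis=0), benign.std(axis=0)
Z = (benign - mu) / sigma
centroid = Z.mean(axis=0)
threshold = np.percentile(np.linalg.norm(Z - centroid, axis=1), 99)

def is_anomalous(sample):
    z = (np.asarray(sample, dtype=float) - mu) / sigma
    return bool(np.linalg.norm(z - centroid) > threshold)

# A typical interval passes; one with 10x cache/branch misses is flagged.
print(is_anomalous([1e6, 5e3, 2e3]), is_anomalous([1e6, 5e4, 2e4]))
```

Since no malware samples are needed to set the threshold, previously unseen threats that perturb the counters can still trip the detector.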

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
