Monday, December 22, 2014

Distinguishing Cause from Effect using Observational Data: Methods and Benchmarks

More work paralleling that of David Lopez-Paz et al.'s Non-linear Causal Inference using Gaussianity Measures - implementation - (we featured David at Paris Machine Learning #2 Season 2). A quick note when one looks at this from a signal processing standpoint: there, we usually set up problems as

y = A x + epsilon

where epsilon is some Gaussian noise. In these papers on causality, that specific shape for the noise would tend to imply that y is not the result of x, since "residuals in the anti-causal direction are more Gaussian than the residuals in the actual causal direction". Anyway, today we have the following (including a cause-effect pairs database).
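To make that Gaussianity-of-residuals remark concrete, here is a small numerical sketch (my own toy illustration, not code from any of the papers; the uniform-noise model and the kurtosis-based Gaussianity proxy are simplifying choices): with non-Gaussian additive noise in the true direction, the residuals of the anti-causal regression come out closer to Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Causal model: y = x + epsilon, with deliberately NON-Gaussian (uniform) noise.
x = rng.uniform(-1.0, 1.0, n)
y = x + rng.uniform(-1.0, 1.0, n)

def ols_residuals(a, b):
    """Residuals of a least-squares linear fit of b on a."""
    slope, intercept = np.polyfit(a, b, 1)
    return b - (slope * a + intercept)

def excess_kurtosis(r):
    """Excess kurtosis: 0 for a Gaussian, -1.2 for a uniform distribution."""
    r = (r - r.mean()) / r.std()
    return np.mean(r**4) - 3.0

k_causal = excess_kurtosis(ols_residuals(x, y))      # regress y on x (causal)
k_anticausal = excess_kurtosis(ols_residuals(y, x))  # regress x on y (anti-causal)

print(f"causal residual kurtosis     : {k_causal:+.2f}")
print(f"anti-causal residual kurtosis: {k_anticausal:+.2f}")
```

On this synthetic example the causal residuals stay close to the uniform noise (excess kurtosis near -1.2) while the anti-causal residuals are a mixture that drifts toward Gaussianity (excess kurtosis nearer 0).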

Distinguishing cause from effect using observational data: methods and benchmarks by Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different "cause-effect pairs" selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
Of additional interest is a Database with cause-effect pairs.
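A back-of-the-envelope version of the additive-noise idea can be sketched as follows (a toy illustration, not the benchmark code: the degree-3 polynomial regression and the biased HSIC estimator with a median-heuristic bandwidth are simplifying choices of mine). Fit y as a function of x, test whether the residual is independent of x, repeat in the reverse direction, and prefer the direction with the more independent residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Additive-noise ground truth: y = x^3 + x + Gaussian noise.
x = rng.uniform(-1.0, 1.0, n)
y = x**3 + x + 0.2 * rng.normal(size=n)

def poly_residuals(a, b, deg=3):
    """Residuals of a degree-`deg` polynomial least-squares fit of b on a."""
    coeffs = np.polyfit(a, b, deg)
    return b - np.polyval(coeffs, a)

def hsic(u, v):
    """Biased HSIC estimate with RBF kernels (median squared-distance bandwidth)."""
    def gram(z):
        d2 = (z[:, None] - z[None, :]) ** 2
        sigma2 = np.median(d2[d2 > 0])
        return np.exp(-d2 / sigma2)
    K, L = gram(u), gram(v)
    H = np.eye(len(u)) - 1.0 / len(u)  # centering matrix
    return np.trace(H @ K @ H @ L) / len(u) ** 2

score_xy = hsic(x, poly_residuals(x, y))  # x -> y hypothesis
score_yx = hsic(y, poly_residuals(y, x))  # y -> x hypothesis

direction = "x -> y" if score_xy < score_yx else "y -> x"
print(f"HSIC(x, res_y|x) = {score_xy:.4f}, HSIC(y, res_x|y) = {score_yx:.4f}")
print("inferred direction:", direction)
```

In the anti-causal direction no additive-noise model fits, so the backward residuals remain dependent on the input and their HSIC score stays higher.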

Causal Discovery with Continuous Additive Noise Models by Jonas Peters, Joris Mooij, Dominik Janzing, Bernhard Schölkopf
We consider the problem of learning causal directed acyclic graphs from an observational joint distribution. One can use these graphs to predict the outcome of interventional experiments, from which data are often not available. We show that if the observational distribution follows a structural equation model with an additive noise structure, the directed acyclic graph becomes identifiable from the distribution under mild conditions. This constitutes an interesting alternative to traditional methods that assume faithfulness and identify only the Markov equivalence class of the graph, thus leaving some edges undirected. We provide practical algorithms for finitely many samples, RESIT (Regression with Subsequent Independence Test) and two methods based on an independence score. We prove that RESIT is correct in the population setting and provide an empirical evaluation.  
 Implementations are available here.
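To give a rough sense of how RESIT's sink-finding phase works, here is a simplified, hypothetical sketch (not the published implementation: the per-variable polynomial features and the squared-correlation dependence proxy stand in for proper nonparametric regression and independence tests). Repeatedly regress each remaining variable on all the others, declare the one with the most independent-looking residuals a sink, remove it, and recurse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1500

# Ground-truth chain X0 -> X1 -> X2 with additive (uniform) noise.
x0 = rng.uniform(-1.0, 1.0, n)
x1 = x0 + x0**3 + 0.3 * rng.uniform(-1.0, 1.0, n)
x2 = x1 - 0.5 * x1**2 + 0.3 * rng.uniform(-1.0, 1.0, n)
X = np.column_stack([x0, x1, x2])

def regress_residuals(target, predictors):
    """Least-squares fit of target on per-predictor polynomial features (deg 3)."""
    feats = [np.ones_like(target)]
    for z in predictors.T:
        feats += [z, z**2, z**3]
    A = np.column_stack(feats)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def dependence_proxy(r, predictors):
    """Crude independence check: correlation between squared residuals and
    squared predictors (close to zero when the residual is independent)."""
    return sum(abs(np.corrcoef(r**2, z**2)[0, 1]) for z in predictors.T)

remaining = [0, 1, 2]
order = []
while len(remaining) > 1:
    scores = {}
    for j in remaining:
        others = [i for i in remaining if i != j]
        r = regress_residuals(X[:, j], X[:, others])
        scores[j] = dependence_proxy(r, X[:, others])
    sink = min(scores, key=scores.get)  # most independent residuals -> sink
    order.insert(0, sink)
    remaining.remove(sink)
order.insert(0, remaining[0])

print("estimated causal order:", order)
```

Only in the true causal direction do the residuals reduce to the independent noise terms, which is what makes the sink identifiable at each step.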

Learning Sparse Causal Models is not NP-hard by Tom Claassen, Joris M. Mooij, Tom Heskes
This paper shows that causal model discovery is not an NP-hard problem, in the sense that for sparse graphs bounded by node degree k the sound and complete causal model can be obtained in worst case order N^(2(k+2)) independence tests, even when latent variables and selection bias may be present. We present a modification of the well-known FCI algorithm that implements the method for an independence oracle, and suggest improvements for sample/real-world data versions. It does not contradict any known hardness results, and does not solve an NP-hard problem: it just proves that sparse causal discovery is perhaps more complicated, but not as hard as learning minimal Bayesian networks.

Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013) by Tom Claassen, Joris M. Mooij, Tom Heskes
This article contains detailed proofs and additional examples related to the UAI-2013 submission `Learning Sparse Causal Models is not NP-hard'. It describes the FCI+ algorithm: a method for sound and complete causal model discovery in the presence of latent confounders and/or selection bias, that has worst case polynomial complexity of order N^(2(k+1)) in the number of independence tests, for sparse graphs over N nodes, bounded by node degree k. The algorithm is an adaptation of the well-known FCI algorithm by Spirtes et al. (2000) that is also sound and complete, but has worst case complexity exponential in N.
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
