Bachelor / Master Thesis: Large Scale Survey and Benchmarking of Causal Discovery Algorithms

“Causal inference from observational data demands a good deal of humility.”

— Cochran

Note: The topic can be extended to accommodate two students. Details will be discussed with the supervisor.

Background

Causal discovery is a branch of causal inference, a field shared by many disciplines such as medicine, epidemiology, social science, and economics. Its goal is to derive causal relationships between observed variables, in the form of a causal graph, when experiments are infeasible.
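To make "deriving a causal graph from observed variables" concrete, the sketch below simulates a hypothetical linear chain X -> Y -> Z (the coefficients and noise are illustrative assumptions) and compares the marginal correlation of X and Z with their partial correlation given Y. Conditional independences of this kind are one of the signals that constraint-based methods, such as those reviewed in [1], exploit. This is an illustration, not any particular published algorithm.

```python
import numpy as np


def simulate_chain(n: int, seed: int = 0) -> np.ndarray:
    """Sample n rows from a hypothetical linear chain X -> Y -> Z."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)  # Y is caused by X plus noise
    z = 2.0 * y + rng.normal(size=n)  # Z is caused by Y only, plus noise
    return np.column_stack([x, y, z])


def partial_corr(data: np.ndarray, i: int, j: int, k: int) -> float:
    """Correlation of columns i and j after controlling for column k."""
    r = np.corrcoef(data, rowvar=False)
    num = r[i, j] - r[i, k] * r[j, k]
    den = np.sqrt((1.0 - r[i, k] ** 2) * (1.0 - r[j, k] ** 2))
    return float(num / den)


data = simulate_chain(10_000)
marginal = float(np.corrcoef(data, rowvar=False)[0, 2])  # X vs Z
conditional = partial_corr(data, 0, 2, 1)                # X vs Z given Y
print(f"corr(X, Z)     = {marginal:.3f}")
print(f"corr(X, Z | Y) = {conditional:.3f}")
```

X and Z are strongly correlated, yet nearly uncorrelated once Y is controlled for, which suggests there is no direct X-Z edge. The same pattern also arises for X <- Y <- Z and X <- Y -> Z, so observational data alone cannot orient these edges; this is one concrete reason why causal discovery calls for humility.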

Many causal discovery methods and algorithms already exist, and new ones continue to be published, each claiming to be more accurate than the others or better suited to particular scenarios. However, unlike in supervised learning, where predictions can be checked against held-out labels, the true causal graph behind real-world data is rarely known, so in causal inference we can only “crawl towards the truth”. Because algorithm performance is difficult to verify on real-world data, practitioners must rely on the authors’ claims when selecting a method. Unfortunately, the published methods and algorithms are neither tested on the same data sets nor evaluated with the same metrics. Only a few review papers exist on methods [1][2] and on evaluation frameworks [3], and none to date attempts to benchmark the methods on equal grounds.
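One widely used metric for comparing an estimated graph against a known ground truth is the structural Hamming distance (SHD): the number of edge insertions, deletions, and reversals needed to turn one graph into the other. Published implementations differ in details, for example whether a reversed edge counts as one error or two, which is one small reason why scores from different papers are hard to compare. The sketch below assumes binary adjacency matrices with `A[i, j] == 1` meaning i -> j and exposes the reversal cost as a parameter; the function name `shd` is illustrative.

```python
import numpy as np


def shd(true_adj: np.ndarray, est_adj: np.ndarray, reversal_cost: int = 1) -> int:
    """Structural Hamming distance between two directed graphs.

    Assumes binary adjacency matrices where A[i, j] == 1 means i -> j.
    A missing or extra edge costs 1; an edge present in both graphs but
    with a different orientation costs `reversal_cost`.
    """
    t = np.asarray(true_adj, dtype=bool)
    e = np.asarray(est_adj, dtype=bool)
    d = t.shape[0]
    dist = 0
    for i in range(d):
        for j in range(i + 1, d):  # visit each unordered pair of nodes once
            pair_t = (bool(t[i, j]), bool(t[j, i]))
            pair_e = (bool(e[i, j]), bool(e[j, i]))
            if pair_t == pair_e:
                continue
            if any(pair_t) and any(pair_e):
                dist += reversal_cost  # edge in both graphs, orientation differs
            else:
                dist += 1  # edge missing from or extra in the estimate
    return dist
```

For example, with the true chain X -> Y -> Z and an estimate containing X -> Y, Z -> Y, and an extra X -> Z, the SHD is 2 when a reversal counts once and 3 when it counts twice.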

Goals

The ultimate goals of this project are to update the taxonomy of causal discovery methods, to provide evidence of their performance, and to derive guidelines for method selection.

Tasks

Research and unify evaluation frameworks for causal discovery
Curate real-world data sets suitable for causal discovery
Review the literature on causal discovery methods, sort them according to an established classification system, and update that system if necessary
Design experimental settings and criteria to evaluate the reviewed causal discovery methods
Evaluate the causal discovery methods with the developed or unified framework
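The tasks above amount to a benchmarking loop: every method sees the same data sets and is scored with the same metric. The following is a minimal sketch under stated assumptions, not the project's framework: synthetic data come from a linear-Gaussian structural equation model on a random DAG, a method is any callable with the hypothetical `Method` signature (data matrix in, adjacency matrix out), and the score is directed-edge precision, recall, and F1. `empty_graph` and `random_graph` are placeholder baselines, not causal discovery algorithms; real methods would be wrapped behind the same signature, and curated real-world data sets with a known ground truth could replace the simulated ones.

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical interface: data (n_samples x n_vars) -> binary adjacency
# matrix with A[i, j] == 1 meaning i -> j.
Method = Callable[[np.ndarray], np.ndarray]


def simulate_dataset(d: int, p_edge: float, n: int, rng: np.random.Generator):
    """Linear-Gaussian SEM on a random DAG; returns (true_adj, data)."""
    adj = np.triu(rng.random((d, d)) < p_edge, k=1).astype(int)
    weights = adj * rng.uniform(0.5, 2.0, size=(d, d))
    data = np.zeros((n, d))
    for j in range(d):  # columns are in causal order before relabelling
        data[:, j] = data @ weights[:, j] + rng.normal(size=n)
    perm = rng.permutation(d)  # hide the causal order from the methods
    return adj[np.ix_(perm, perm)], data[:, perm]


def edge_scores(true_adj: np.ndarray, est_adj: np.ndarray) -> Dict[str, float]:
    """Precision, recall and F1 over directed edges (0.0 when undefined)."""
    t, e = true_adj.astype(bool), est_adj.astype(bool)
    tp = int(np.sum(t & e))
    precision = tp / e.sum() if e.sum() else 0.0
    recall = tp / t.sum() if t.sum() else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": float(precision), "recall": float(recall), "f1": float(f1)}


def empty_graph(data: np.ndarray) -> np.ndarray:
    """Placeholder baseline: predicts no edges at all."""
    d = data.shape[1]
    return np.zeros((d, d), dtype=int)


def make_random_graph(seed: int) -> Method:
    """Placeholder baseline: a random DAG, seeded so runs are reproducible."""
    def random_graph(data: np.ndarray) -> np.ndarray:
        rng = np.random.default_rng(seed)
        d = data.shape[1]
        adj = np.triu(rng.random((d, d)) < 0.3, k=1).astype(int)
        perm = rng.permutation(d)
        return adj[np.ix_(perm, perm)]
    return random_graph


def run_benchmark(methods: Dict[str, Method], n_datasets: int = 5, d: int = 6,
                  p_edge: float = 0.3, n: int = 500, seed: int = 0) -> Dict[str, float]:
    """Run every method on the same seeded data sets; report mean F1."""
    rng = np.random.default_rng(seed)
    datasets = [simulate_dataset(d, p_edge, n, rng) for _ in range(n_datasets)]
    results = {}
    for name, method in methods.items():
        f1s = [edge_scores(true_adj, method(data))["f1"] for true_adj, data in datasets]
        results[name] = float(np.mean(f1s))
    return results


if __name__ == "__main__":
    scores = run_benchmark({"empty": empty_graph, "random": make_random_graph(1)})
    for name, f1 in scores.items():
        print(f"{name:>8}: mean F1 = {f1:.3f}")
```

Generating all data sets once, from a single seed, before any method runs is the design choice that puts methods on equal grounds: differences in the scores then come from the methods, not from the data they happened to see.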

Qualification

Proactive and communicative work style
Good English reading and writing skills
Good programming skills in Python and/or R
Not afraid of complicated-looking mathematical symbols, statistics, and probability theory

Interested? Please contact: Ployplearn Ravivanpong (ployplearn.ravivanpong@kit.edu)

References

[1] Review of Causal Discovery Methods Based on Graphical Models. Glymour, C., Zhang, K., & Spirtes, P. Frontiers in Genetics, 10, 524 (2019). https://doi.org/10.3389/fgene.2019.00524

[2] D’ya like DAGs? A Survey on Structure Learning and Causal Discovery. Vowels, M. J., Camgoz, N. C., & Bowden, R. arXiv preprint arXiv:2103.02582 (2021).

[3] The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data. Gentzel, A., Garant, D., & Jensen, D. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada (2019).

[4] CauseMe: An online system for benchmarking causal discovery methods. Muñoz-Marí, J., Mateo, G., Runge, J., & Camps-Valls, G. In preparation (2020).