Master Thesis: Gaussian Process Regression of Air Quality Data with non-Gaussian Likelihoods

Background

In this thesis you will work on air quality prediction. Traditional methods of air quality prediction rely on expensive physical simulation models, which require a carefully maintained inventory of sources in the domain and considerable expertise to run. Inverse modelling is a complementary approach that interpolates sensor data, rather than evolving source information, to estimate the spatial distribution of air pollutants. Gaussian Process Regression is a Bayesian inference method that interpolates between support points using a kernel function. One of its advantages is that every prediction is probabilistic, giving a mean estimate together with a standard deviation.
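To make the idea of kernel-based interpolation with a mean and a standard deviation concrete, here is a minimal NumPy sketch of Gaussian Process Regression with a Gaussian likelihood. It is an illustration only: the squared-exponential kernel, the hyperparameter values, the toy data and the function names rbf_kernel and gp_predict are assumptions made for this example, not part of the thesis specification.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2, lengthscale=1.0, variance=1.0):
    """Posterior mean and standard deviation of the latent function at x_test."""
    n = len(x_train)
    # Kernel matrix of the support points plus Gaussian observation noise.
    K = rbf_kernel(x_train, x_train, lengthscale, variance) + noise_var * np.eye(n)
    # Cross-covariance between support points and test points, shape (n, m).
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)
    # Cholesky factorisation for a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance at each test point minus the variance explained by the data.
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy data standing in for sensor readings along one spatial axis.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_test = np.array([1.5, 10.0])
mean, std = gp_predict(x_train, y_train, x_test)
```

In this toy setup the standard deviation is small between the support points and returns to the prior level far away from them, which is the behaviour that makes the prediction useful in regions between sensors.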
Air quality sensors have received increasing attention in recent years as their hardware and software have become more advanced and cheaper, making it possible to deploy large sensor networks in cities. At the same time, the challenges of climate change prompt countries to adopt laws with stricter air quality limits, which in turn incentivizes cities to invest in air quality monitoring. Measurements of particulate matter with low-cost sensors are fundamentally counting experiments using laser diffraction. This raises the question of whether Poisson statistics are required to adequately model the interpolation of particulate matter.
The objective of this thesis is to implement non-Gaussian likelihood functions in a Gaussian Process Regression and evaluate whether they are better suited to predict particulate matter concentrations in regions between sensors than Gaussian likelihoods.

Tasks

  1. Understanding the principles of Gaussian Process Regression and the role of the likelihood
  2. Implementation of a Gaussian Process Regression (GPR) based on a Poisson likelihood
  3. Evaluation of the Poisson likelihood GPR against a Gaussian likelihood GPR on an air quality dataset according to scientific principles
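As a rough orientation for Task 2, the sketch below shows one possible way to combine a Gaussian Process prior with a Poisson likelihood in NumPy. It assumes a log link (counts are Poisson-distributed with rate exp(f)) and uses a Laplace approximation with damped Newton steps to find the posterior mode. The choice of inference method, the kernel, the toy counts and all function names are assumptions for illustration; the thesis may well use a different approach such as variational inference or MCMC.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def laplace_poisson_mode(K, y, max_iter=100, tol=1e-8):
    """Posterior mode of the latent f for y_i ~ Poisson(exp(f_i)), f ~ GP(0, K)."""
    n = len(y)
    a = np.zeros(n)          # a = K^{-1} f, so f = K a is kept consistent throughout
    f = np.zeros(n)

    def objective(a_vec, f_vec):
        # Log posterior up to constants: log-likelihood minus 0.5 * f^T K^{-1} f.
        return -0.5 * a_vec @ f_vec + np.sum(y * f_vec - np.exp(f_vec))

    obj = objective(a, f)
    for _ in range(max_iter):
        w = np.exp(f)        # negative Hessian of the Poisson log-likelihood (diagonal)
        sw = np.sqrt(w)
        B = np.eye(n) + sw[:, None] * K * sw[None, :]
        L = np.linalg.cholesky(B)
        b = w * f + (y - w)  # W f + gradient of the log-likelihood
        a_new = b - sw * np.linalg.solve(L.T, np.linalg.solve(L, sw * (K @ b)))
        # Step halving guards against Newton overshoot caused by the exponential.
        step = 1.0
        while True:
            a_try = a + step * (a_new - a)
            f_try = K @ a_try
            obj_try = objective(a_try, f_try)
            if obj_try >= obj or step < 1e-6:
                break
            step *= 0.5
        converged = np.max(np.abs(f_try - f)) < tol
        a, f, obj = a_try, f_try, obj_try
        if converged:
            break
    return f, a

def predict_rate(K, K_s, prior_var, y, f_hat):
    """Approximate predictive mean of the Poisson rate and latent variance at test points."""
    w = np.exp(f_hat)
    sw = np.sqrt(w)
    L = np.linalg.cholesky(np.eye(len(y)) + sw[:, None] * K * sw[None, :])
    mean_f = K_s.T @ (y - w)
    v = np.linalg.solve(L, sw[:, None] * K_s)
    var_f = np.maximum(prior_var - np.sum(v**2, axis=0), 0.0)
    # Mean of exp(f) when f is Gaussian with the approximated mean and variance.
    return np.exp(mean_f + 0.5 * var_f), var_f

# Made-up particle counts at 12 positions, used only to exercise the code.
x_train = np.linspace(0.0, 5.0, 12)
counts = np.array([3, 2, 4, 5, 7, 6, 8, 5, 4, 2, 1, 0], dtype=float)
x_test = np.array([2.5, 20.0])
K = rbf_kernel(x_train, x_train)
f_hat, a_hat = laplace_poisson_mode(K, counts)
rate, var_f = predict_rate(K, rbf_kernel(x_train, x_test), 1.0, counts, f_hat)
```

Unlike the Gaussian case, the posterior here has no closed form, which is why an approximation step is needed. Comparing the predicted rates and their uncertainty against a Gaussian-likelihood model on held-out sensors would be one way to approach Task 3.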

Skillset

  • Basic understanding of Bayesian Statistics
  • Good Python skills and experience with NumPy (or PyTorch)

If you are interested in this topic, please contact Paul Tremper (tremper@teco.edu).