Pattern extraction from point-cloud datasets and cosmological applications


Point-cloud datasets are ubiquitous across scientific and non-scientific fields. Such data usually exhibit characteristic patterns that are linked to the underlying phenomenon that generated them and that dedicated algorithms aim to extract. In this presentation, motivated by cosmological problems, we will focus on two kinds of spatially structured datasets. The first kind consists of clustered patterns, in which the datapoints are separated into multiple groups in the input space. We will show that unsupervised clustering with a Gaussian Mixture Model can be formulated as a statistical physics optimisation problem. This formulation enables the unsupervised extraction of key information about the dataset itself, such as the number of clusters, their sizes and how they are embedded in space, which is particularly interesting for high-dimensional input spaces where visualisation is not possible. The second kind consists of spatially continuous datasets, assumed to lie on an underlying 1D structure that we aim to learn. To this end, we resort to a regularisation of the Gaussian Mixture Model in which a spatial graph is used as a prior to approximate the underlying 1D structure. The overall graph is learnt efficiently by means of the Expectation-Maximisation algorithm, with guaranteed convergence, together with the local width of the structure. We then illustrate applications of the algorithm to modelling and identifying the filamentary pattern traced by the distribution of galaxies in the Universe in cosmological datasets.

Dec 2, 2021 10:00 AM — 11:00 AM
IECL, probability and statistics
Tony Bonnaire
Postdoctoral researcher