The late-time matter distribution exhibits a complex pattern commonly called the cosmic web. In this picture, matter is arranged in dense anchors, the nodes, linked together by elongated bridges of matter, the filaments, which lie at the intersection of thin, mildly dense walls that themselves surround large empty voids. This distribution, shaped by gravity over billions of years, carries crucial information on the underlying cosmological model and on the evolution of large-scale structures. Detecting and studying the elements of the cosmic web, which also play a key role in the formation and evolution of galaxies, are challenging tasks: they require optimised methods able to handle the intrinsic complexity of a pattern made of multi-scale structures of various shapes and densities.

With the aim of identifying and characterising the cosmic web environments, we propose several approaches to analyse spatially structured point-cloud datasets, not restricted to cosmological ones, by means of unsupervised machine learning methods based on mixture models. In particular, we use principles from statistical physics to better understand the learning dynamics of a clustering algorithm, and we show how statistical physics can be used to explore the data distribution and obtain key insights into its structure.

To identify the filamentary part of the pattern, its most prominent feature, we propose a regularisation of the clustering procedure that iteratively learns a non-linear representation of a structured dataset, assuming the data were generated from an underlying one-dimensional manifold.
The method models this latent structure as a graph, embedded as a prior in the Bayesian formulation of the problem, to estimate a principal graph passing through the ridges of the matter distribution as traced by galaxies or halos.

We show that this formulation is especially well suited to describing filaments, since it gives access to their geometrical properties (lengths, widths, etc.) and assigns to each tracer a probability of residing in a given filament. The resulting algorithm is successfully used to detect filaments in state-of-the-art numerical simulations. It also allows us to study the relation between the connectivity of galaxy clusters to the cosmic web and their dynamical and morphological properties.

Finally, based on a large suite of N-body simulations, we perform a comprehensive analysis of the cosmological information content of the two-point statistics measured in the cosmic web environments (nodes, filaments, walls and voids). We show that these statistics can break some degeneracies among key parameters of the model, making them a suitable alternative probe that can significantly improve the constraints on cosmological parameters obtained from standard analyses.
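
As an illustration of the graph-regularised clustering idea described above, the following is a minimal sketch and not the algorithm developed in this work. It rests on several assumptions that are ours: isotropic Gaussian components with a fixed width `sigma` and equal weights, a minimum spanning tree over the component centres as the graph, and a quadratic (graph Laplacian) prior of strength `lam` on the centres. All function and parameter names are hypothetical.

```python
import numpy as np


def mst_edges(centres):
    """Edges of the minimum spanning tree over the centres (Prim's algorithm)."""
    n = len(centres)
    dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()          # cheapest link from each node to the tree
    best_from = np.zeros(n, dtype=int)  # tree node realising that cheapest link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))
        edges.append((int(best_from[j]), j))
        in_tree[j] = True
        closer = dist[j] < best_dist
        best_dist[closer] = dist[j][closer]
        best_from[closer] = j
    return edges


def graph_laplacian(edges, n):
    """Unweighted graph Laplacian (degree matrix minus adjacency matrix)."""
    lap = np.zeros((n, n))
    for a, b in edges:
        lap[a, a] += 1.0
        lap[b, b] += 1.0
        lap[a, b] -= 1.0
        lap[b, a] -= 1.0
    return lap


def fit_principal_tree(points, n_centres=20, sigma=0.1, lam=1.0, n_iter=30, seed=0):
    """EM for an equal-weight isotropic Gaussian mixture with a tree prior on centres."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=n_centres, replace=False)
    centres = points[start].copy()
    for _ in range(n_iter):
        # E-step: probability that each point was generated by each centre.
        sq = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        log_r = -0.5 * sq / sigma**2
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # Prior: rebuild the tree on the current centres.
        lap = graph_laplacian(mst_edges(centres), n_centres)
        # M-step: the penalised update is the linear system
        # (diag(N_k) + lam * sigma^2 * L) centres = resp^T points.
        lhs = np.diag(resp.sum(axis=0)) + lam * sigma**2 * lap
        centres = np.linalg.solve(lhs, resp.T @ points)
    # resp comes from the last E-step; the tree is rebuilt on the final centres.
    return centres, resp, mst_edges(centres)
```

In this sketch, each row of `resp` gives the probability that a tracer belongs to each component, and the returned edges connect the centres into a tree that follows the ridge of the point distribution. Setting `lam=0` recovers plain mixture-model clustering without any graph prior.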