Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection - 3IA Côte d’Azur – Interdisciplinary Institute for Artificial Intelligence
Journal article, Statistics and Computing, Year: 2024

Abstract

Feature selection in clustering is a hard task: it requires discovering relevant clusters and, at the same time, the variables that are relevant to those clusters. Feature selection algorithms are often model-based, relying on optimised model selection or on strong assumptions about the data distribution. In contrast, we introduce a discriminative clustering model that maximises a geometry-aware generalisation of the mutual information, called GEMINI, under a simple ℓ1 penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and scales easily to high-dimensional data and large numbers of samples, while requiring only the design of a discriminative clustering model. We demonstrate the performance of Sparse GEMINI on synthetic datasets and large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and can select subsets of variables relevant to the clustering without using relevance criteria or prior hypotheses.
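To make the idea of an ℓ1-penalised, geometry-aware clustering objective more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the model, the distance, or the exact form of the penalty. The linear softmax model, the linear kernel, the one-vs-all maximum mean discrepancy (MMD) used as the geometry-aware distance, and all function names (`mmd_ova_gemini`, `sparse_objective`) are assumptions chosen for illustration.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mmd_ova_gemini(probs, K):
    """Assumed one-vs-all MMD form: sum_k pi_k * MMD(p(x | y=k), p(x)).

    probs: (n, k) soft cluster assignments p(y=k | x_i).
    K:     (n, n) kernel matrix between samples.
    Assumes no cluster has exactly zero total mass.
    """
    n = probs.shape[0]
    pi = probs.mean(axis=0)              # cluster proportions, shape (k,)
    w = probs / (n * pi)                 # importance weights, columns sum to 1
    diff = w - 1.0 / n                   # weights of p(x|y=k) minus those of p(x)
    mmd2 = np.einsum("ik,ij,jk->k", diff, K, diff)
    mmd = np.sqrt(np.clip(mmd2, 0.0, None))
    return float(pi @ mmd)


def sparse_objective(W, X, lam):
    """Objective to maximise: assumed GEMINI term minus an l1 penalty on W.

    W: (k, d) weights of a hypothetical linear clustering model.
    X: (n, d) data matrix.
    """
    probs = softmax(X @ W.T)             # discriminative model p(y | x)
    K = X @ X.T                          # linear kernel (an assumption)
    penalty = np.abs(W).sum()            # l1 penalty pushes weights to zero
    return mmd_ova_gemini(probs, K) - lam * penalty


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
W = rng.normal(size=(3, 5))
print(sparse_objective(W, X, lam=0.1))
```

In this sketch, a feature whose column of `W` is driven entirely to zero no longer influences the cluster assignments, which is how a sparsity penalty can perform feature selection without enumerating feature subsets.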
Main file
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04755942, version 1 (20-12-2024)

Cite

Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickaël Leclercq, Arnaud Droit, et al. Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection. Statistics and Computing, 2024, 34 (5), pp.155. ⟨10.1007/s11222-024-10467-9⟩. ⟨hal-04755942⟩