Sparse and group-sparse clustering for mixed data: An illustration of the vimpclust package
hal.structure.identifier | Université de Bordeaux [UB] | |
hal.structure.identifier | Méthodes avancées d’apprentissage statistique et de contrôle [ASTRAL] | |
dc.contributor.author | CHAVENT, Marie | |
hal.structure.identifier | Safran Aircraft Engines | |
dc.contributor.author | LACAILLE, Jérôme | |
hal.structure.identifier | Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne) [SAMM] | |
dc.contributor.author | MOURER, Alex | |
hal.structure.identifier | CEntre de REcherches en MAthématiques de la DEcision [CEREMADE] | |
dc.contributor.author | OLTEANU, Madalina | |
dc.date.accessioned | 2024-04-04T02:39:47Z | |
dc.date.available | 2024-04-04T02:39:47Z | |
dc.date.conference | 2022-06-13 | |
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/191002 | |
dc.description.abstractEn | High-dimensional data often contain both numerical and categorical features, and in some cases the features come in natural groups (repeated measurements, categories of features, ...). Clustering this kind of data raises several issues: how to deal with numerical and categorical features simultaneously, how to build meaningful clusters of the input entities, and how to select the most informative features or groups of features for the clustering. In the k-means framework, one may rely on a penalised version of the between-cluster variance to find both the best partitioning of the data and the most informative features or groups of features. The present manuscript illustrates sparse k-means and group-sparse k-means for mixed data, using the vimpclust package. The example provided on a small real-life dataset shows how feature selection may be directly combined with clustering, providing a meaningful selection while preserving the quality of the clustering. | |
dc.language.iso | en | |
dc.subject | clustering | |
dc.subject | k-means parcimonieux | |
dc.subject | pénalités L1 et L1-groupe | |
dc.subject | données mixtes | |
dc.subject | packages R | |
dc.subject.en | clustering | |
dc.subject.en | sparse k-means | |
dc.subject.en | L1 and group-L1 penalties | |
dc.subject.en | mixed data | |
dc.subject.en | R packages | |
dc.title.en | Sparse and group-sparse clustering for mixed data: An illustration of the vimpclust package | |
dc.type | Conference paper | |
dc.subject.hal | Mathematics [math]/Statistics [math.ST] | |
bordeaux.hal.laboratories | Institut de Mathématiques de Bordeaux (IMB) - UMR 5251 | * |
bordeaux.institution | Université de Bordeaux | |
bordeaux.institution | Bordeaux INP | |
bordeaux.institution | CNRS | |
bordeaux.conference.title | JDS 2022 - 53èmes Journées de Statistique de la Société Française de Statistique (SFdS) | |
bordeaux.country | FR | |
bordeaux.conference.city | Lyon | |
bordeaux.peerReviewed | yes | |
hal.identifier | hal-03839521 | |
hal.version | 1 | |
hal.invited | no | |
hal.proceedings | no | |
hal.conference.end | 2022-06-17 | |
hal.popular | no | |
hal.audience | International | |
hal.origin.link | https://hal.archives-ouvertes.fr//hal-03839521v1 | |