Sparse k-means for mixed data via group-sparse clustering
hal.structure.identifier | Quality control and dynamic reliability [CQFD] | |
dc.contributor.author | CHAVENT, Marie | |
hal.structure.identifier | Safran Aircraft Engines | |
dc.contributor.author | LACAILLE, Jerome | |
hal.structure.identifier | Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne) [SAMM] | |
hal.structure.identifier | Safran Aircraft Engines | |
hal.structure.identifier | Quality control and dynamic reliability [CQFD] | |
dc.contributor.author | MOURER, Alex | |
hal.structure.identifier | CEntre de REcherches en MAthématiques de la DEcision [CEREMADE] | |
dc.contributor.author | OLTEANU, Madalina | |
dc.date.accessioned | 2024-04-04T02:47:29Z | |
dc.date.available | 2024-04-04T02:47:29Z | |
dc.date.issued | 2020-10-02 | |
dc.date.conference | 2020-10-02 | |
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/191645 | |
dc.description.abstractEn | The present manuscript tackles the issue of variable selection for clustering in high-dimensional data described by both numerical and categorical features. First, we build upon the sparse k-means algorithm with lasso penalty and introduce the group-L1 penalty, already known in regression, into the unsupervised context. Second, we preprocess mixed data by transforming categorical features into groups of dummy variables with appropriate scaling, to which the group-sparse clustering procedure can then be applied. The proposed method performs clustering and feature selection simultaneously, and provides meaningful partitions together with meaningful features, both numerical and categorical, for describing them. | |
dc.language.iso | en | |
dc.subject.en | Clustering | |
dc.subject.en | K-means algorithm | |
dc.subject.en | Variable selection | |
dc.subject.en | Sparse Models | |
dc.subject.en | Lasso penalty | |
dc.subject.en | Group lasso | |
dc.subject.en | Interpretability | |
dc.subject.en | Explainability | |
dc.subject.en | Weighted k-means | |
dc.title.en | Sparse k-means for mixed data via group-sparse clustering | |
dc.type | Conference paper | |
dc.subject.hal | Statistics [stat] | |
bordeaux.volume | 978-2-87587-074-2 | |
bordeaux.hal.laboratories | Institut de Mathématiques de Bordeaux (IMB) - UMR 5251 | * |
bordeaux.institution | Université de Bordeaux | |
bordeaux.institution | Bordeaux INP | |
bordeaux.institution | CNRS | |
bordeaux.conference.title | ESANN 2020 - 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning | |
bordeaux.country | BE | |
bordeaux.conference.city | Bruges / Virtual | |
bordeaux.peerReviewed | yes | |
hal.identifier | hal-03130672 | |
hal.version | 1 | |
hal.invited | no | |
hal.proceedings | no | |
hal.conference.end | 2020-10-04 | |
hal.popular | no | |
hal.audience | International | |
hal.origin.link | https://hal.archives-ouvertes.fr//hal-03130672v1 | |
bordeaux.COinS | ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2020-10-02&rft.volume=978-2-87587-074-2&rft.au=CHAVENT,%20Marie&LACAILLE,%20Jerome&MOURER,%20Alex&OLTEANU,%20Madalina&rft.genre=unknown |
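The abstract describes the method only at a high level. The following is a minimal pure-Python sketch of a group-lasso variant of sparse k-means on mixed data, written in the spirit of Witten and Tibshirani's sparse k-means. It is not the authors' implementation, and several choices in it are assumptions. Numerical columns are standardized and each forms a singleton group. Each categorical column becomes one group of centered dummy columns divided by the square root of the level frequency, an MCA/PCAmix-style scaling chosen here for illustration. Feature weights are updated by group soft-thresholding of the per-column between-cluster sums of squares with penalty `lam * sqrt(p_g)`, then rescaled to unit L2 norm. k-means uses a deterministic farthest-first initialization. All function names and the parameter `lam` are hypothetical.

```python
# Hypothetical sketch of group-sparse k-means for mixed data; the dummy scaling,
# the weight update and the initialization are illustrative assumptions.
import math

def encode_mixed(num_cols, cat_cols):
    """Return (rows, groups): standardized numerical columns (one group each) and,
    per categorical column, a group of centered dummies divided by sqrt(frequency)."""
    n = len(num_cols[0]) if num_cols else len(cat_cols[0])
    columns, groups = [], []
    for col in num_cols:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        groups.append([len(columns)])
        columns.append([(x - mean) / sd for x in col])
    for col in cat_cols:
        group = []
        for level in sorted(set(col)):
            freq = col.count(level) / n
            group.append(len(columns))
            columns.append([((1.0 if v == level else 0.0) - freq) / math.sqrt(freq)
                            for v in col])
        groups.append(group)
    return [list(row) for row in zip(*columns)], groups

def weighted_kmeans(X, w, k, n_iter=50):
    """Lloyd's algorithm with distance sum_j w_j (x_j - c_j)^2 and a deterministic
    farthest-first initialization starting from row 0."""
    def dist(a, b):
        return sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b))

    centers = [X[0][:]]
    while len(centers) < k:
        far = max(range(len(X)), key=lambda i: min(dist(X[i], c) for c in centers))
        centers.append(X[far][:])
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: dist(x, centers[c])) for x in X]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def between_ss(X, labels, k):
    """Per-column between-cluster sum of squares (total SS minus within SS)."""
    out = []
    for col in zip(*X):
        mean = sum(col) / len(col)
        within = 0.0
        for c in range(k):
            vals = [v for v, lab in zip(col, labels) if lab == c]
            if vals:
                m = sum(vals) / len(vals)
                within += sum((v - m) ** 2 for v in vals)
        out.append(max(sum((v - mean) ** 2 for v in col) - within, 0.0))
    return out

def group_weights(a, groups, lam):
    """Group soft-thresholding: w_g ~ (1 - lam*sqrt(p_g)/||a_g||)_+ a_g, then ||w||_2 = 1."""
    w = [0.0] * len(a)
    for g in groups:
        norm = math.sqrt(sum(a[j] ** 2 for j in g))
        shrink = max(0.0, 1.0 - lam * math.sqrt(len(g)) / norm) if norm > 0 else 0.0
        for j in g:
            w[j] = shrink * a[j]
    total = math.sqrt(sum(v * v for v in w))
    if total == 0:
        raise ValueError("lam is too large: every group was zeroed out")
    return [v / total for v in w]

def group_sparse_kmeans(X, groups, k, lam, n_outer=20):
    """Alternate weighted k-means and the group-sparse weight update."""
    w = [1.0 / math.sqrt(len(X[0]))] * len(X[0])
    labels = None
    for _ in range(n_outer):
        new = weighted_kmeans(X, w, k)
        w = group_weights(between_ss(X, new, k), groups, lam)
        if new == labels:
            break
        labels = new
    return labels, w

X, groups = encode_mixed(
    num_cols=[[0.0, 0.1, 0.2, 0.1, 5.0, 5.1, 5.2, 5.1],       # informative
              [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]],  # pure noise
    cat_cols=[["a", "a", "a", "a", "b", "b", "b", "b"]],      # informative
)
labels, w = group_sparse_kmeans(X, groups, k=2, lam=1.0)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(w[1])    # 0.0: the noise column gets zero weight
```

Treating all dummies of one categorical variable as a single group means that the variable is kept or discarded as a whole. That is the motivation for a group penalty rather than a plain lasso. In this toy example the noise column is dropped for any `lam >= 0`, because its between-cluster sum of squares is exactly zero for the recovered partition. Larger values of `lam` shrink more groups to zero.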