CHAVENT, Marie; LACAILLE, Jérôme; MOURER, Alex; OLTEANU, Madalina

hal.structure.identifier	Université de Bordeaux [UB]
hal.structure.identifier	Méthodes avancées d’apprentissage statistique et de contrôle [ASTRAL]
dc.contributor.author	CHAVENT, Marie
hal.structure.identifier	Safran Aircraft Engines
dc.contributor.author	LACAILLE, Jérôme
hal.structure.identifier	Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne) [SAMM]
dc.contributor.author	MOURER, Alex
hal.structure.identifier	CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
dc.contributor.author	OLTEANU, Madalina
dc.date.accessioned	2024-04-04T02:39:47Z
dc.date.available	2024-04-04T02:39:47Z
dc.date.conference	2022-06-13
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/191002
dc.description.abstractEn	High-dimensional data often contain both numerical and categorical features, and in some cases the features come in natural groups (repeated measurements, categories of features, ...). Clustering this kind of data raises several issues: how to deal with numerical and categorical features simultaneously, how to build meaningful clusters of the input entities, and how to select the most informative features or groups of features for the clustering. In the k-means framework, one may rely on a penalised version of the between-cluster variance to find both the best partitioning of the data and the most informative features or groups of features. The present manuscript illustrates sparse k-means and group-sparse k-means for mixed data, using the vimpclust package. The example provided on a small real-life dataset shows how feature selection may be directly combined with clustering and provides a meaningful selection while preserving the quality of the clustering.
dc.language.iso	en
dc.subject	clustering
dc.subject	k-means parcimonieux
dc.subject	pénalités L1 et L1-groupe
dc.subject	données mixtes
dc.subject	packages R
dc.subject.en	clustering
dc.subject.en	sparse k-means
dc.subject.en	L1 and group-L1 penalties
dc.subject.en	mixed data
dc.subject.en	R packages
dc.title.en	Sparse and group-sparse clustering for mixed data: An illustration of the vimpclust package
dc.type	Conference paper
dc.subject.hal	Mathématiques [math]/Statistiques [math.ST]
bordeaux.hal.laboratories	Institut de Mathématiques de Bordeaux (IMB) - UMR 5251	*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.conference.title	JDS 2022 - 53èmes Journées de Statistique de la Société Française de Statistique (SFdS)
bordeaux.country	FR
bordeaux.conference.city	Lyon
bordeaux.peerReviewed	yes
hal.identifier	hal-03839521
hal.version	1
hal.invited	no
hal.proceedings	no
hal.conference.end	2022-06-17
hal.popular	no
hal.audience	International
hal.origin.link	https://hal.archives-ouvertes.fr//hal-03839521v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.au=CHAVENT,%20Marie&LACAILLE,%20J%C3%A9r%C3%B4me&MOURER,%20Alex&OLTEANU,%20Madalina&rft.genre=unknown

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Institut de Mathématiques de Bordeaux (IMB) - UMR 5251