Combining clustering of variables and feature selection using random forests
CHAVENT, Marie
Quality control and dynamic reliability [CQFD]
Institut de Mathématiques de Bordeaux [IMB]
SARACCO, Jerome
Quality control and dynamic reliability [CQFD]
Institut de Mathématiques de Bordeaux [IMB]
Ecole Nationale Supérieure de Cognitique [ENSC]
Language
English
Journal article
This item is published in
Communications in Statistics - Simulation and Computation. 2021-01-11, vol. 50, n° 2, p. 426-445
Taylor & Francis
Abstract
Standard approaches to high-dimensional supervised classification often include variable selection and dimension reduction. The proposed methodology combines clustering of variables and feature selection. Hierarchical clustering of variables is used to build groups of correlated variables and to summarize each group by a synthetic variable. The originality is that the groups of variables are not known a priori. Moreover, the clustering approach handles both numerical and categorical variables. Among all the possible partitions, the most relevant synthetic variables are selected with a procedure based on random forests. The numerical performance of the method is illustrated on simulated and real datasets. Selecting groups of variables makes the results easier to interpret.
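The clustering step described in the abstract can be sketched for the purely numerical case. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles numerical variables only (the paper's approach also covers categorical ones), it assumes the homogeneity of a group is the largest eigenvalue of its correlation matrix (equivalently, the sum of squared correlations with the group's first principal component), and it takes the first principal component score as the synthetic variable. The function names `homogeneity`, `synthetic_variable` and `cluster_variables` are hypothetical, and the random-forest selection step is not shown.

```python
import numpy as np

def homogeneity(X):
    """Assumed criterion: largest eigenvalue of the correlation matrix of X's columns.
    For numerical variables this equals the sum of squared correlations
    between each variable and the group's first principal component."""
    if X.shape[1] == 1:
        return 1.0
    R = np.corrcoef(X, rowvar=False)
    return float(np.linalg.eigvalsh(R)[-1])  # eigenvalues are sorted ascending

def synthetic_variable(X):
    """Assumed synthetic variable: first principal component score of the
    standardized columns of X (its sign is arbitrary)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]

def cluster_variables(X, n_clusters):
    """Ascending hierarchical clustering of the columns of X: at each step,
    merge the two groups whose union loses the least homogeneity."""
    clusters = [[j] for j in range(X.shape[1])]
    H = [1.0] * len(clusters)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                h = homogeneity(X[:, clusters[a] + clusters[b]])
                loss = H[a] + H[b] - h  # homogeneity lost by merging a and b
                if best is None or loss < best[0]:
                    best = (loss, a, b, h)
        _, a, b, h = best
        clusters[a] = clusters[a] + clusters[b]
        H[a] = h
        del clusters[b], H[b]  # b > a, so index a stays valid
    return clusters

# Hypothetical data: columns 0-2 share latent factor f1, columns 3-4 share f2.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1 + 0.1 * rng.normal(size=n) for _ in range(3)]
                    + [f2 + 0.1 * rng.normal(size=n) for _ in range(2)])
parts = cluster_variables(X, 2)
synth = np.column_stack([synthetic_variable(X[:, c]) for c in parts])
```

Here the two correlated groups should be recovered, and each synthetic variable's variance equals its group's homogeneity. In the spirit of the abstract, the columns of `synth` (computed for each candidate number of groups) would then be the candidate features passed to a random-forest-based selection procedure.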
Keywords
Clustering of variables
Random forests
Supervised classification
Variable selection
Origin
Imported from HAL