Post-clustering difference testing: valid inference and practical considerations
HIVERT, Benjamin
Statistics In System biology and Translational Medicine [SISTM]
Bordeaux population health [BPH]
THIEBAUT, Rodolphe
Statistics In System biology and Translational Medicine [SISTM]
Bordeaux population health [BPH]
HEJBLUM, Boris
Statistics In System biology and Translational Medicine [SISTM]
Bordeaux population health [BPH]

Language
EN
Journal article
Published in
Computational Statistics and Data Analysis.
Date
2024
English abstract
Clustering is an unsupervised analysis method that consists of grouping samples into homogeneous and well-separated subgroups of observations, called clusters. To interpret the clusters, statistical hypothesis testing is often used to infer the variables that significantly separate the estimated clusters from each other. However, the hypotheses tested in this inference step are data-driven, since they are derived from the clustering results. This double use of the data causes traditional hypothesis tests to fail to control the Type I error rate, particularly because of the uncertainty in the clustering process and the artificial differences it can create. We propose three novel statistical hypothesis tests that account for the clustering process. Our tests efficiently control the Type I error rate by identifying only the variables that contain a true signal separating groups of observations.
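The double-dipping problem described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not one of the authors' proposed tests: it assumes a single variable drawn from one Gaussian population (so no true clusters exist), a hand-written 2-means (Lloyd's algorithm) clustering step, and the Student two-sample t-test from `scipy.stats.ttest_ind` as the "traditional" test. The helper names `two_means_1d`, `naive_type1_rate`, and `random_split_type1_rate` are hypothetical and chosen for illustration only.

```python
import numpy as np
from scipy import stats


def two_means_1d(x, n_iter=50):
    """Lloyd's algorithm with k = 2 on a 1-D sample; returns labels in {0, 1}."""
    # Initialise the two centroids at the lower and upper quartiles.
    centers = np.quantile(x, [0.25, 0.75])
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        # Recompute each centroid as the mean of its assigned points.
        new_centers = np.array([x[labels == k].mean() for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def naive_type1_rate(n_sims=500, n=100, alpha=0.05, seed=0):
    """Rejection rate when clusters are estimated and tested on the same data."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)      # one homogeneous population: the null is true
        labels = two_means_1d(x)    # clusters derived from x ...
        _, p = stats.ttest_ind(x[labels == 0], x[labels == 1])  # ... then tested on x
        rejections += int(p < alpha)
    return rejections / n_sims


def random_split_type1_rate(n_sims=500, n=100, alpha=0.05, seed=0):
    """Control: groups assigned independently of x, so no double use of the data."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        labels = rng.permutation(np.repeat([0, 1], n // 2))
        _, p = stats.ttest_ind(x[labels == 0], x[labels == 1])
        rejections += int(p < alpha)
    return rejections / n_sims


if __name__ == "__main__":
    print("naive post-clustering t-test:", naive_type1_rate())
    print("random split (no double dipping):", random_split_type1_rate())
```

Because the 2-means step always cuts the unimodal sample into a lower and an upper half, the naive test rejects far more often than the nominal 5% level, whereas the random-split control stays close to it. This is the Type I error inflation that post-clustering tests are designed to avoid.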
English keywords
Clustering
Hypothesis testing
Double-dipping
Circular analysis
Selective inference
Multimodality test
Dip Test
Research centers