HIVERT, Benjamin; AGNIEL, Denis; THIEBAUT, Rodolphe; HEJBLUM, Boris

HIVERT, Benjamin
Statistics In System biology and Translational Medicine [SISTM]
Bordeaux population health [BPH]

AGNIEL, Denis

THIEBAUT, Rodolphe

Statistics In System biology and Translational Medicine [SISTM]
Bordeaux population health [BPH]

Language

Journal article

This document was published in

Computational Statistics and Data Analysis.

Publication date

2024

Abstract

Clustering is an unsupervised analysis method that groups samples into homogeneous and well-separated subgroups of observations, called clusters. To interpret the clusters, statistical hypothesis testing is often used to infer which variables significantly separate the estimated clusters from each other. However, the hypotheses tested are data-driven, since they are derived from the clustering results. This double use of the data causes traditional hypothesis tests to fail to control the Type I error rate, particularly because of uncertainty in the clustering process and the artificial differences it may create. We propose three novel statistical hypothesis tests that account for the clustering process. Our tests efficiently control the Type I error rate by identifying only variables that contain a true signal separating groups of observations.
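The double use of the data described in the abstract can be illustrated with a small simulation. The sketch below is not one of the three tests proposed by the authors; it only shows the problem they address. It assumes one-dimensional data drawn from a single Gaussian (so no true clusters exist), a hand-written two-cluster k-means, and a large-sample Welch z-test as a stand-in for a "traditional" test. All function names and settings are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean, variance


def two_means_1d(x, n_iter=20):
    """Tiny 1-D k-means with k=2 (illustrative); returns a 0/1 label per point."""
    c0, c1 = min(x), max(x)  # start the two centers at the extremes
    labels = [0] * len(x)
    for _ in range(n_iter):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        c0 = mean(v for v, l in zip(x, labels) if l == 0)
        c1 = mean(v for v, l in zip(x, labels) if l == 1)
    return labels


def alternating_split(x):
    """Labels fixed in advance, independent of the observed values."""
    return [i % 2 for i in range(len(x))]


def welch_z_pvalue(a, b):
    """Two-sided p-value, normal approximation to Welch's test (assumes large n)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(split, n_sim=200, n=100, alpha=0.05, seed=0):
    """Share of null datasets in which the two groups are declared different."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0, 1) for _ in range(n)]  # one Gaussian: no real clusters
        labels = split(x)
        a = [v for v, l in zip(x, labels) if l == 0]
        b = [v for v, l in zip(x, labels) if l == 1]
        if welch_z_pvalue(a, b) < alpha:
            rejections += 1
    return rejections / n_sim


# Groups defined by clustering the same data that is then tested
naive_rate = rejection_rate(two_means_1d)
# Groups defined without looking at the data
fixed_rate = rejection_rate(alternating_split)
print(f"rejection rate, clusters estimated from the data: {naive_rate:.2f}")
print(f"rejection rate, groups fixed in advance:          {fixed_rate:.2f}")
```

Under these assumptions, the test applied to estimated clusters rejects far more often than the nominal 5% level, because the clustering step itself creates a difference in means between the groups. The same test applied to groups fixed in advance stays close to the nominal level.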

Keywords

Clustering

Hypothesis testing

Double-dipping

Circular analysis

Selective inference

Multimodality test

Dip Test

URI

https://oskar-bordeaux.fr/handle/20.500.12278/172203

Research units

Bordeaux Population Health Research Center (BPH) - UMR 1219