HIVERT, Benjamin; AGNIEL, Denis; THIEBAUT, Rodolphe; HEJBLUM, Boris

Statistics In System biology and Translational Medicine [SISTM]
Bordeaux population health [BPH]

AGNIEL, Denis
Rand Corporation

THIEBAUT, Rodolphe

Statistics In System biology and Translational Medicine [SISTM]
Bordeaux population health [BPH]

Language

Working paper - Preprint

English Abstract

Post-clustering inference in scRNA-seq analysis presents significant challenges in controlling the Type I error rate during Differential Expression Analysis. Data fission, a promising approach, aims to split the data into two new independent parts, but it relies on strong parametric assumptions of non-mixture distributions, which are violated in clustered data. We show that applying data fission to these mixtures requires knowledge of the clustering structure to accurately estimate component-specific scale parameters. These estimates are critical for ensuring decomposition and independence. We theoretically quantify the direct impact of bias in estimating these scale parameters on the inflation of the Type I error rate, which is caused by the resulting departure from independence. Since component structures are unknown in practice, we propose a heteroscedastic model with non-parametric estimators for individual scale parameters. This model uses proximity between observations to capture the effect of the underlying mixture on data dispersion. While this approach works well when clusters are well separated, it introduces bias when separation is weak, highlighting the difficulty of applying data fission in real-world scenarios where the degree of separation is unknown.
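The dependence on the scale parameter described in the abstract can be illustrated with a minimal sketch. It assumes the simplest setting, a single Gaussian component, and uses the standard Gaussian fission construction X1 = X + tau*Z, X2 = X - Z/tau with Z drawn from N(0, sigma_hat^2). The function names, sample size, and parameter values are hypothetical choices made for illustration and are not taken from the paper. In this setting Cov(X1, X2) = sigma^2 - sigma_hat^2, so the two parts are uncorrelated only when the assumed scale matches the true one.

```python
import numpy as np

def gaussian_fission(x, sigma_hat, tau=1.0, rng=None):
    """Split x into two parts using Gaussian noise with an ASSUMED scale sigma_hat."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(0.0, sigma_hat, size=np.shape(x))
    # If x ~ N(mu, sigma^2) and sigma_hat == sigma, x1 and x2 are independent.
    return x + tau * z, x - z / tau

def empirical_cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return float(np.cov(a, b)[0, 1])

rng = np.random.default_rng(0)
n = 200_000          # illustrative sample size
sigma_true = 1.0     # illustrative true scale
x = rng.normal(0.0, sigma_true, size=n)

# Correct scale: the covariance between the two parts is close to 0.
x1, x2 = gaussian_fission(x, sigma_true, rng=rng)
cov_correct = empirical_cov(x1, x2)

# Overestimated scale (for instance, a pooled variance computed across clusters):
# the covariance is close to sigma_true**2 - sigma_biased**2, here -3.
sigma_biased = 2.0
y1, y2 = gaussian_fission(x, sigma_biased, rng=rng)
cov_biased = empirical_cov(y1, y2)

print(f"covariance with correct scale: {cov_correct:.3f}")
print(f"covariance with biased scale:  {cov_biased:.3f}")
```

In a mixture, a scale estimated without knowing the component memberships generally plays the role of `sigma_biased` here, which is one way to read the link the abstract draws between biased scale estimates and a loss of independence.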

English Keywords

Unsupervised learning

Mixture Model

Post-clustering inference

Type I error

Non-parametric estimation

Local variance

URI

https://oskar-bordeaux.fr/handle/20.500.12278/202848

European Project

European HIV Vaccine Alliance (EHVA): a EU platform for the discovery and evaluation of novel prophylactic and therapeutic vaccine candidates

ANR Project

MultiScale AI for SingleCell-Based Precision Medicine - ANR-22-PESN-0002
University of Bordeaux Graduate School in Digital Public Health - ANR-17-EURE-0019

Collections

Bordeaux Population Health Research Center (BPH) - UMR 1219


Running in circles: practical limitations for real-life application of data fission and data thinning in post-clustering differential analysis