Running in circles: practical limitations for real-life application of data fission and data thinning in post-clustering differential analysis
dc.rights.license | open | en_US |
hal.structure.identifier | Statistics In System biology and Translational Medicine [SISTM] | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | HIVERT, Benjamin | |
hal.structure.identifier | Rand Corporation | |
dc.contributor.author | AGNIEL, Denis | |
hal.structure.identifier | Statistics In System biology and Translational Medicine [SISTM] | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | THIEBAUT, Rodolphe | |
hal.structure.identifier | Statistics In System biology and Translational Medicine [SISTM] | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | HEJBLUM, Boris (ORCID: 0000-0003-0646-452X; IDREF: 189970316) | |
dc.date.accessioned | 2024-10-28T10:35:08Z | |
dc.date.available | 2024-10-28T10:35:08Z | |
dc.date.created | 2024 | |
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/202848 | |
dc.description.abstractEn | Post-clustering inference in scRNA-seq analysis presents significant challenges in controlling Type I error during Differential Expression Analysis. Data fission, a promising approach, aims to split the data into two new independent parts, but relies on strong parametric assumptions of non-mixture distributions, which are violated in clustered data. We show that applying data fission to these mixtures requires knowledge of the clustering structure to accurately estimate component-specific scale parameters. These estimates are critical for ensuring decomposition and independence. We theoretically quantify the direct impact of the bias in estimating these scale parameters on the inflation of the Type I error rate, which results from a departure from independence. Since component structures are unknown in practice, we propose a heteroscedastic model with non-parametric estimators for individual scale parameters. This model uses proximity between observations to capture the effect of the underlying mixture on data dispersion. While this approach works well when clusters are well-separated, it introduces bias when separation is weak, highlighting the difficulty of applying data fission in real-world scenarios with unknown degrees of separation. | |
dc.description.sponsorship | MultiScale AI for SingleCell-Based Precision Medicine - ANR-22-PESN-0002 | en_US |
dc.description.sponsorship | University of Bordeaux Graduate School in Digital Public Health - ANR-17-EURE-0019 | en_US |
dc.language.iso | EN | en_US |
dc.subject.en | Unsupervised learning | |
dc.subject.en | Mixture Model | |
dc.subject.en | Post-clustering inference | |
dc.subject.en | Type I error | |
dc.subject.en | Non-parametric estimation | |
dc.subject.en | Local variance | |
dc.title.en | Running in circles: practical limitations for real-life application of data fission and data thinning in post-clustering differential analysis | |
dc.type | Working paper - Preprint | en_US |
dc.subject.hal | Statistics [stat]/Applications [stat.AP] | en_US |
dc.subject.hal | Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics, Transcriptomics and Proteomics [q-bio.GN] | en_US |
dc.subject.hal | Statistics [stat]/Methodology [stat.ME] | en_US |
dc.subject.hal | Statistics [stat]/Machine Learning [stat.ML] | en_US |
dc.identifier.arxiv | 2405.13591 | en_US |
dc.description.sponsorshipEurope | European HIV Vaccine Alliance (EHVA): a EU platform for the discovery and evaluation of novel prophylactic and therapeutic vaccine candidates | en_US |
bordeaux.hal.laboratories | Bordeaux Population Health Research Center (BPH) - UMR 1219 | en_US |
bordeaux.institution | Université de Bordeaux | en_US |
bordeaux.institution | INSERM | en_US |
bordeaux.institution | INRIA | en_US |
bordeaux.team | SISTM_BPH | en_US |
bordeaux.import.source | hal | |
hal.identifier | hal-04745517 | |
hal.version | 1 | |
hal.popular | no | en_US |
hal.audience | International | en_US |
hal.export | false | |
workflow.import.source | hal | |
dc.rights.cc | No CC license | en_US |
bordeaux.subtype | Prepublication/Preprint | en_US |
bordeaux.COinS | ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.au=HIVERT,%20Benjamin&AGNIEL,%20Denis&THIEBAUT,%20Rodolphe&HEJBLUM,%20Boris&rft.genre=preprint |