
dc.rights.licenseopenen_US
hal.structure.identifierStatistics In System biology and Translational Medicine [SISTM]
hal.structure.identifierBordeaux population health [BPH]
dc.contributor.authorHIVERT, Benjamin
hal.structure.identifierRand Corporation
dc.contributor.authorAGNIEL, Denis
hal.structure.identifierStatistics In System biology and Translational Medicine [SISTM]
hal.structure.identifierBordeaux population health [BPH]
dc.contributor.authorTHIEBAUT, Rodolphe
hal.structure.identifierStatistics In System biology and Translational Medicine [SISTM]
hal.structure.identifierBordeaux population health [BPH]
dc.contributor.authorHEJBLUM, Boris
ORCID: 0000-0003-0646-452X
IDREF: 189970316
dc.date.accessioned2024-10-28T10:35:08Z
dc.date.available2024-10-28T10:35:08Z
dc.date.created2024
dc.identifier.urihttps://oskar-bordeaux.fr/handle/20.500.12278/202848
dc.description.abstractEnPost-clustering inference in scRNA-seq analysis presents significant challenges in controlling Type I error during Differential Expression Analysis. Data fission, a promising approach, aims to split the data into two new independent parts, but relies on strong parametric assumptions of non-mixture distributions, which are violated in clustered data. We show that applying data fission to these mixtures requires knowledge of the clustering structure to accurately estimate component-specific scale parameters. These estimates are critical for ensuring decomposition and independence. We theoretically quantify the direct impact of the bias in estimating these scale parameters on the inflation of the Type I error rate, caused by a deviation from independence. Since component structures are unknown in practice, we propose a heteroscedastic model with non-parametric estimators for individual scale parameters. This model uses proximity between observations to capture the effect of the underlying mixture on data dispersion. While this approach works well when clusters are well-separated, it introduces bias when separation is weak, highlighting the difficulty of applying data fission in real-world scenarios with unknown degrees of separation.
dc.description.sponsorshipMultiScale AI for SingleCell-Based Precision Medicine - ANR-22-PESN-0002en_US
dc.description.sponsorshipUniversity of Bordeaux Graduate School in Digital Public Health - ANR-17-EURE-0019en_US
dc.language.isoENen_US
dc.subject.enUnsupervised learning
dc.subject.enMixture Model
dc.subject.enPost-clustering inference
dc.subject.enType I error
dc.subject.enNon-parametric estimation
dc.subject.enlocal variance
dc.title.enRunning in circles: practical limitations for real-life application of data fission and data thinning in post-clustering differential analysis
dc.typeDocument de travail - Pré-publicationen_US
dc.subject.halStatistiques [stat]/Applications [stat.AP]en_US
dc.subject.halSciences du Vivant [q-bio]/Biochimie, Biologie Moléculaire/Génomique, Transcriptomique et Protéomique [q-bio.GN]en_US
dc.subject.halStatistiques [stat]/Méthodologie [stat.ME]en_US
dc.subject.halStatistiques [stat]/Machine Learning [stat.ML]en_US
dc.identifier.arxiv2405.13591en_US
dc.description.sponsorshipEuropeEuropean HIV Vaccine Alliance (EHVA): a EU platform for the discovery and evaluation of novel prophylactic and therapeutic vaccine candidatesen_US
bordeaux.hal.laboratoriesBordeaux Population Health Research Center (BPH) - UMR 1219en_US
bordeaux.institutionUniversité de Bordeauxen_US
bordeaux.institutionINSERMen_US
bordeaux.institutionINRIAen_US
bordeaux.teamSISTM_BPHen_US
bordeaux.import.sourcehal
hal.identifierhal-04745517
hal.version1
hal.popularnonen_US
hal.audienceInternationaleen_US
hal.exportfalse
workflow.import.sourcehal
dc.rights.ccPas de Licence CCen_US
bordeaux.subtypePrepublication/Preprinten_US
bordeaux.COinSctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.au=HIVERT,%20Benjamin&AGNIEL,%20Denis&THIEBAUT,%20Rodolphe&HEJBLUM,%20Boris&rft.genre=preprint

