HEJBLUM, Boris; WEBER, G. M.; LIAO, K. P.; PALMER, N. P.; CHURCHILL, S.; SHADICK, N. A.; SZOLOVITS, P.; MURPHY, S. N.; KOHANE, I. S.; CAI, T.

doi:10.1038/sdata.2018.298


Bordeaux population health [BPH]



Journal article

Published in

Scientific Data. 2019-01-08, vol. 6, p. 180298

Abstract

We develop an algorithm for probabilistic linkage of de-identified research datasets at the patient level, when only diagnosis codes with discrepancies and no personal health identifiers such as name or date of birth are available. It relies on Bayesian modelling of binarized diagnosis codes, and provides a posterior probability of matching for each patient pair, while considering all the data at once. Both in our simulation study (using an administrative claims dataset for data generation) and in two real use-cases linking patient electronic health records from a large tertiary care network, our method exhibits good performance and compares favourably to the standard baseline Fellegi-Sunter algorithm. We propose a scalable, fast and efficient open-source implementation in the ludic R package available on CRAN, which also includes the anonymized diagnosis code data from our real use-case. This work suggests it is possible to link de-identified research databases stripped of any personal health identifiers using only diagnosis codes, provided sufficient information is shared between the data sources.
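To make the idea of a posterior matching probability on binarized diagnosis codes concrete, here is a minimal Python sketch. It is not the algorithm of the paper and does not use the ludic package: it scores one patient pair in isolation with a Fellegi-Sunter style likelihood ratio and Bayes' rule, whereas the method described above considers all the data at once. The function names, the per-code agreement probabilities `m` and `u`, and the prior are hypothetical values chosen for illustration only.

```python
import math

def match_log_odds(a, b, m, u):
    """Sum per-code log-likelihood ratios for two binary code vectors.

    a, b : sequences of 0/1 flags (code absent/present) for two records.
    m[k] : assumed P(records agree on code k | same patient).
    u[k] : assumed P(records agree on code k | different patients).
    """
    total = 0.0
    for x, y, mk, uk in zip(a, b, m, u):
        if x == y:
            total += math.log(mk / uk)              # agreement supports a match
        else:
            total += math.log((1 - mk) / (1 - uk))  # disagreement counts against it
    return total

def posterior_match_probability(a, b, m, u, prior):
    """Combine prior match odds with the likelihood ratio (Bayes' rule)."""
    log_odds = math.log(prior / (1 - prior)) + match_log_odds(a, b, m, u)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical example: three diagnosis codes, illustrative probabilities.
m = [0.9, 0.9, 0.9]
u = [0.5, 0.5, 0.5]
same = posterior_match_probability([1, 0, 1], [1, 0, 1], m, u, prior=0.5)
diff = posterior_match_probability([1, 0, 1], [0, 1, 0], m, u, prior=0.5)
print(same > 0.5 > diff)
```

Scoring pairs independently like this is the simplification. A joint approach would additionally account for the fact that a record in one dataset can match at most one record in the other.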

Keywords

SISTM

URI

https://oskar-bordeaux.fr/handle/20.500.12278/8118

DOI

http://dx.doi.org/10.1038/sdata.2018.298

Research units

Bordeaux Population Health Research Center (BPH) - UMR 1219


Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes
