
dc.rights.license: open (en_US)
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: XU, Binbin
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: GIL-JARDINE, Cedric
ORCID: 0000-0001-5329-6405
IDREF: 159039223
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: THIESSARD, Frantz
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: TELLIER, Eric
hal.structure.identifier: Statistics In System biology and Translational Medicine [SISTM]
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: AVALOS FERNANDEZ, Marta
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: LAGARDE, Emmanuel
dc.date.accessioned: 2021-04-30T10:46:44Z
dc.date.available: 2021-04-30T10:46:44Z
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/27139
dc.description.abstractEn: In order to build a national injury surveillance system based on emergency room (ER) visits, we are developing a coding system to classify their causes from free-text clinical notes. Supervised learning techniques have shown good results in this area but require a large number of annotated samples. New levels of performance have recently been achieved with neural language models (NLM) based on the Transformer architecture that incorporate an unsupervised generative pre-training step. Our hypothesis is that methods involving a generative self-supervised pre-training step can significantly reduce the number of annotated samples required for supervised fine-tuning. In this case study, we assessed whether we could predict from free-text clinical notes whether a visit was the consequence of a traumatic or non-traumatic event. Using fully re-trained GPT-2 models (without OpenAI pre-trained weights), we compared two scenarios. Scenario A (26 study cases of different training data sizes) consisted of training GPT-2 directly on trauma/non-trauma labeled clinical notes (up to 161 930). Scenario B (19 study cases) consisted of a first self-supervised pre-training step on unlabeled notes (up to 151 930), followed by a second step of supervised fine-tuning on labeled notes (up to 10 000). Results showed that Scenario A needed to process more than 6 000 labeled notes to achieve good performance (AUC > 0.95), whereas Scenario B needed only 600 notes, a ten-fold reduction. Comparing the final case of each scenario, with 16 times more labeled data (161 930 vs. 10 000), Scenario A improved on Scenario B by only 0.89% in AUC and 2.12% in F1 score. To conclude, it is possible to adapt a multi-purpose NLM such as GPT-2 into a powerful tool for the classification of free-text notes with only a very small number of labeled samples.
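The abstract contrasts direct supervised training of GPT-2 (Scenario A) with self-supervised pre-training followed by supervised fine-tuning on a small labeled set (Scenario B). The following is a minimal sketch of that workflow, assuming the Hugging Face transformers library; the tokenizer choice, model path and omitted training loops are illustrative assumptions, not the authors' implementation.

```python
from transformers import (GPT2Config, GPT2LMHeadModel,
                          GPT2ForSequenceClassification, GPT2TokenizerFast)

# Vocabulary only: the study re-trains GPT-2 weights from scratch, without
# OpenAI's pre-trained weights (using the stock GPT-2 tokenizer is an assumption).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

config = GPT2Config(vocab_size=tokenizer.vocab_size,
                    pad_token_id=tokenizer.pad_token_id,
                    num_labels=2)  # trauma vs. non-trauma

# Scenario A: a randomly initialised GPT-2 trained directly as a classifier
# on the trauma/non-trauma labeled notes.
clf_a = GPT2ForSequenceClassification(config)
# ... supervised training loop over labeled notes goes here ...

# Scenario B, step 1: self-supervised (causal language modelling) pre-training
# on unlabeled clinical notes.
lm = GPT2LMHeadModel(config)
# ... language-modelling training loop over unlabeled notes goes here ...
lm.save_pretrained("gpt2-er-notes")  # hypothetical output path

# Scenario B, step 2: supervised fine-tuning of the pre-trained backbone on a
# small labeled subset (per the abstract, ~600 labeled notes already reach AUC > 0.95).
clf_b = GPT2ForSequenceClassification.from_pretrained(
    "gpt2-er-notes", num_labels=2, pad_token_id=tokenizer.pad_token_id)
# ... supervised fine-tuning loop over labeled notes goes here ...
```

The note preprocessing, sequence lengths and training hyperparameters used in the study are not reproduced in this sketch.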
dc.language.iso: EN (en_US)
dc.subject.en: Neural Language Model
dc.subject.en: Pre-training
dc.subject.en: Transformer
dc.subject.en: GPT-2
dc.title.en: Neural Language Model for Automated Classification of Electronic Medical Records at the Emergency Room. The Significant Benefit of Unsupervised Generative Pre-training
dc.type: Document de travail - Pré-publication (working paper / preprint) (en_US)
dc.subject.hal: Statistiques [stat]/Machine Learning [stat.ML] (en_US)
dc.subject.hal: Statistiques [stat]/Méthodologie [stat.ME] (en_US)
dc.subject.hal: Statistiques [stat]/Calcul [stat.CO] (en_US)
dc.subject.hal: Statistiques [stat]/Applications [stat.AP] (en_US)
dc.subject.hal: Informatique [cs]/Apprentissage [cs.LG] (en_US)
dc.subject.hal: Sciences du Vivant [q-bio]/Santé publique et épidémiologie (en_US)
dc.identifier.arxiv: 1909.01136 (en_US)
bordeaux.hal.laboratories: Bordeaux Population Health Research Center (BPH) - UMR 1219 (en_US)
bordeaux.institution: Université de Bordeaux (en_US)
bordeaux.institution: INSERM (en_US)
bordeaux.team: SISTM (en_US)
bordeaux.team: ERIAS (en_US)
bordeaux.team: SISTM_BPH
bordeaux.team: IETO (en_US)
bordeaux.import.source: hal
hal.identifier: hal-02425097
hal.version: 1
hal.export: false
workflow.import.source: hal
bordeaux.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.au=XU,%20Binbin&GIL-JARDINE,%20Cedric&THIESSARD,%20Frantz&TELLIER,%20Eric&AVALOS%20FERNANDEZ,%20Marta&rft.genre=preprint

