Show simple item record
Pre-Training a Neural Language Model Improves the Sample Efficiency of an Emergency Room Classification Model
dc.rights.license | open | en_US |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | XU, Binbin | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | GIL-JARDINE, Cedric ORCID: 0000-0001-5329-6405 IDREF: 159039223 | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | THIESSARD, Frantz | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | TELLIER, Eric | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | AVALOS FERNANDEZ, Marta | |
hal.structure.identifier | Bordeaux population health [BPH] | |
dc.contributor.author | LAGARDE, Emmanuel | |
dc.date.accessioned | 2021-02-23T09:02:28Z | |
dc.date.available | 2021-02-23T09:02:28Z | |
dc.date.issued | 2020 | |
dc.date.conference | 2020-05-19 | |
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/26321 | |
dc.description.abstractEn | To build a French national electronic injury surveillance system based on emergency room visits, we aim to develop a coding system that classifies their causes from free-text clinical notes. Supervised learning techniques have shown good results in this area but require a large expert-annotated dataset, which is time-consuming and costly to obtain. We hypothesize that a Natural Language Processing Transformer model incorporating a generative self-supervised pre-training step can significantly reduce the number of annotated samples required for supervised fine-tuning. In this preliminary study, we test our hypothesis on the simplified problem of predicting, from free-text clinical notes, whether a visit is the consequence of a traumatic event. Using fully retrained GPT-2 models (without OpenAI pre-trained weights), we assess the gain of applying a self-supervised pre-training phase on unlabeled notes prior to the supervised learning task. Results show that the amount of labeled data required to achieve a given level of performance (AUC>0.95) was reduced by a factor of 10 when applying pre-training. Namely, with 16 times more data, the fully supervised model achieved an improvement of <1% in AUC. To conclude, it is possible to adapt a multipurpose neural language model such as GPT-2 to create a powerful tool for classifying free-text notes with only a small number of labeled samples. | |
dc.language.iso | EN | en_US |
dc.publisher | The AAAI Press | en_US |
dc.subject | SISTM | |
dc.subject | ERIAS | |
dc.subject | IETO | |
dc.title.en | Pre-Training a Neural Language Model Improves the Sample Efficiency of an Emergency Room Classification Model | |
dc.title.alternative | FLAIRS-32 - Thirty-Third International Flairs Conference | en_US |
dc.type | Conference paper with proceedings | en_US |
dc.subject.hal | Sciences du Vivant [q-bio]/Santé publique et épidémiologie | en_US |
bordeaux.page | 264-9 | en_US |
bordeaux.hal.laboratories | Bordeaux Population Health Research Center (BPH) - UMR 1219 | en_US |
bordeaux.institution | Université de Bordeaux | en_US |
bordeaux.conference.title | FLAIRS-32 - Thirty-Third International Flairs Conference | en_US |
bordeaux.country | us | en_US |
bordeaux.title.proceeding | Proceedings of the Thirty-Third International Florida Artificial Intelligence Research Society Conference | en_US |
bordeaux.team | SISTM_BPH | |
bordeaux.team | ERIAS | en_US |
bordeaux.team | IETO | en_US |
bordeaux.conference.city | Palo Alto | en_US |
bordeaux.peerReviewed | yes | en_US |
hal.export | false | |
bordeaux.COinS | ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2020&rft.spage=264-9&rft.epage=264-9&rft.au=XU,%20Binbin&GIL-JARDINE,%20Cedric&THIESSARD,%20Frantz&TELLIER,%20Eric&AVALOS%20FERNANDEZ,%20Marta&rft.genre=proceeding |
Files in this item
Files | Size | Format | View |
---|---|---|---|
There are no files associated with this item. |