Show simple item record

dc.rights.license: open [en_US]
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: XU, Binbin
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: GIL-JARDINE, Cedric
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: THIESSARD, Frantz
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: TELLIER, Eric
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: AVALOS FERNANDEZ, Marta
hal.structure.identifier: Bordeaux population health [BPH]
dc.contributor.author: LAGARDE, Emmanuel
dc.date.accessioned: 2021-02-23T09:02:28Z
dc.date.available: 2021-02-23T09:02:28Z
dc.date.issued: 2020
dc.date.conference: 2020-05-19
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/26321
dc.description.abstractEn: To build a French national electronic injury surveillance system based on emergency room visits, we aim to develop a coding system that classifies visit causes from free-text clinical notes. Supervised learning techniques have shown good results in this area but require large expert-annotated datasets, which are time-consuming and costly to obtain. We hypothesize that a Transformer-based natural language processing model incorporating a generative self-supervised pre-training step can significantly reduce the number of annotated samples required for supervised fine-tuning. In this preliminary study, we test our hypothesis on the simplified problem of predicting from free-text clinical notes whether a visit is the consequence of a traumatic event. Using fully retrained GPT-2 models (without OpenAI pre-trained weights), we assess the gain of applying a self-supervised pre-training phase on unlabeled notes prior to the supervised learning task. Results show that the amount of labeled data required to achieve a given level of performance (AUC > 0.95) was reduced by a factor of 10 when pre-training was applied. Namely, with 16 times more data, the fully supervised model achieved an AUC improvement of less than 1%. To conclude, it is possible to adapt a multipurpose neural language model such as GPT-2 to create a powerful tool for classifying free-text notes with only a small number of labeled samples.
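
The abstract's two-phase recipe can be illustrated with a minimal sketch. This is not the authors' code: it assumes PyTorch and the Hugging Face transformers library, and the model size, pad token id, and the random tensors standing in for tokenized notes are illustrative placeholders.

```python
# Hedged sketch of the approach described in the abstract: train a GPT-2
# from scratch as a language model on unlabeled notes, then reuse its
# transformer body for supervised classification of trauma vs. non-trauma.
import torch
from transformers import (GPT2Config, GPT2LMHeadModel,
                          GPT2ForSequenceClassification)

config = GPT2Config(vocab_size=5000, n_positions=256,
                    n_embd=256, n_layer=4, n_head=4)  # toy size, not the paper's

# Phase 1: generative self-supervised pre-training on unlabeled notes.
# The model is randomly initialized, i.e. no OpenAI pre-trained weights.
lm = GPT2LMHeadModel(config)
opt = torch.optim.AdamW(lm.parameters(), lr=5e-5)
unlabeled = torch.randint(1, config.vocab_size, (8, 128))  # toy token ids
loss = lm(input_ids=unlabeled, labels=unlabeled).loss      # next-token loss
loss.backward()
opt.step()
opt.zero_grad()

# Phase 2: supervised fine-tuning on the binary trauma / non-trauma task,
# initializing the classifier's transformer body with the pre-trained weights.
config.num_labels = 2
config.pad_token_id = 0  # assumption: token id 0 is the pad token
clf = GPT2ForSequenceClassification(config)
clf.transformer.load_state_dict(lm.transformer.state_dict())

opt = torch.optim.AdamW(clf.parameters(), lr=2e-5)
notes = torch.randint(1, config.vocab_size, (8, 128))  # labeled toy notes
labels = torch.randint(0, 2, (8,))                     # 1 = traumatic event
loss = clf(input_ids=notes, labels=labels).loss        # cross-entropy, 2 classes
loss.backward()
opt.step()
opt.zero_grad()
```

The point of the sketch is the weight transfer between phases: only the classification head starts from random initialization in phase 2, which is what lets the fine-tuning stage make do with far fewer labeled notes.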
dc.language.iso: EN [en_US]
dc.publisher: The AAAI Press [en_US]
dc.subject: SISTM
dc.subject: ERIAS
dc.subject: IETO
dc.title.en: Pre-Training a Neural Language Model Improves the Sample Efficiency of an Emergency Room Classification Model
dc.title.alternative: FLAIRS-33 - Thirty-Third International FLAIRS Conference [en_US]
dc.type: Conference paper with proceedings [en_US]
dc.subject.hal: Life Sciences [q-bio]/Public health and epidemiology [en_US]
bordeaux.page: 264-9 [en_US]
bordeaux.hal.laboratories: Bordeaux Population Health Research Center (BPH) - U1219 [en_US]
bordeaux.institution: Université de Bordeaux [en_US]
bordeaux.conference.title: FLAIRS-33 - Thirty-Third International FLAIRS Conference [en_US]
bordeaux.country: us [en_US]
bordeaux.title.proceeding: Proceedings of the Thirty-Third International Florida Artificial Intelligence Research Society Conference [en_US]
bordeaux.team: SISTM_BPH
bordeaux.team: ERIAS [en_US]
bordeaux.team: IETO [en_US]
bordeaux.conference.city: Palo Alto [en_US]
bordeaux.peerReviewed: yes [en_US]
hal.export: false
bordeaux.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2020&rft.spage=264-9&rft.epage=264-9&rft.au=XU,%20Binbin&GIL-JARDINE,%20Cedric&THIESSARD,%20Frantz&TELLIER,%20Eric&AVALOS%20FERNANDEZ,%20Marta&rft.genre=proceeding


Files in this item

No files associated with this item.

This item appears in the following collection(s)
