DRAME, Khadim; DIALLO, Abdourahmane Gayo; SAMBE, Gorgoumack

doi:10.1007/978-3-031-24197-0_2

dc.rights.license	open	en_US
dc.contributor.author	DRAME, Khadim
hal.structure.identifier	Statistics In System biology and Translational Medicine [SISTM]
hal.structure.identifier	Bordeaux population health [BPH]
dc.contributor.author	DIALLO, Abdourahmane Gayo ORCID: 0000-0002-9799-9484 IDREF: 112800084
dc.contributor.author	SAMBE, Gorgoumack
dc.date.accessioned	2023-06-23T13:01:52Z
dc.date.available	2023-06-23T13:01:52Z
dc.date.issued	2023-01-18
dc.date.conference	2021-10-26
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/182787
dc.description.abstractEn	Finding similar sentences or paragraphs is a key issue when dealing with text redundancy. This is particularly the case in the clinical domain, where redundancy in clinical notes limits their secondary use. Due to a lack of resources, this task is a key challenge for French clinical documents. In this paper, we introduce an approach for computing semantic similarity between French clinical sentences based on supervised machine learning algorithms. The proposed approach is implemented in a system called CONCORDIA, for COmputing semaNtic sentenCes for fRench Clinical Documents sImilArity. After briefly reviewing various semantic textual similarity measures reported in the literature, we describe the approach, which relies on Random Forest (RF), Multilayer Perceptron (MLP) and Linear Regression (LR) algorithms to build different supervised models. These models are then used to determine the degree of semantic similarity between clinical sentences. CONCORDIA is evaluated using traditional evaluation metrics, EDRM (Accuracy in relative distance to the average solution) and Spearman correlation, on standard benchmarks provided in the context of the DEFT 2020 challenge. According to the official results of this challenge, our MLP-based model ranked first out of the 15 submitted systems with an EDRM of 0.8217 and a Spearman correlation coefficient of 0.7691. The post-challenge development of CONCORDIA and the experiments performed after the DEFT 2020 edition showed a significant improvement in the performance of the different implemented models. In particular, the new MLP-based model achieves a Spearman correlation coefficient of 0.80. The LR model, which combines the output of the MLP model with word embedding similarity scores, obtains the highest Spearman correlation coefficient, with a score of 0.8030. The experiments therefore show the effectiveness and relevance of the proposed approach for finding similar sentences in French clinical notes.
dc.language.iso	EN	en_US
dc.publisher	Springer, Cham	en_US
dc.subject.en	Sentence similarity
dc.subject.en	Machine learning
dc.subject.en	Random forest
dc.subject.en	Multilayer perceptron
dc.subject.en	French clinical notes
dc.title.en	Machine Learning Based Finding of Similar Sentences from French Clinical Notes
dc.type	Conference paper with proceedings	en_US
dc.identifier.doi	10.1007/978-3-031-24197-0_2	en_US
dc.subject.hal	Sciences du Vivant [q-bio]/Santé publique et épidémiologie	en_US
bordeaux.page	26-42	en_US
bordeaux.volume	469	en_US
bordeaux.hal.laboratories	Bordeaux Population Health Research Center (BPH) - UMR 1219	en_US
bordeaux.institution	Université de Bordeaux	en_US
bordeaux.institution	INSERM	en_US
bordeaux.conference.title	International Conference on Web Information Systems and Technologies	en_US
bordeaux.country	fr	en_US
bordeaux.title.proceeding	Lecture Notes in Business Information Processing	en_US
bordeaux.team	AHEAD_BPH	en_US
bordeaux.team	SISTM_BPH	en_US
bordeaux.conference.city	Virtual	en_US
bordeaux.peerReviewed	yes	en_US
hal.identifier	hal-04139356
hal.version	1
hal.date.transferred	2023-06-23T13:01:54Z
hal.export	true
dc.rights.cc	No CC license	en_US
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2023-01-18&rft.volume=469&rft.spage=26-42&rft.epage=26-42&rft.au=DRAME,%20Khadim&DIALLO,%20Abdourahmane%20Gayo&SAMBE,%20Gorgoumack&rft.genre=proceeding

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Bordeaux Population Health Research Center (BPH) - UMR 1219
