
Field | Value | Language
dc.rights.license | open | en_US
dc.contributor.author | DRAME, Khadim |
hal.structure.identifier | Statistics In System biology and Translational Medicine [SISTM] |
hal.structure.identifier | Bordeaux population health [BPH] |
dc.contributor.author | DIALLO, Abdourahmane Gayo (ORCID: 0000-0002-9799-9484; IDREF: 112800084) |
dc.contributor.author | SAMBE, Gorgoumack |
dc.date.accessioned | 2023-06-23T13:01:52Z |
dc.date.available | 2023-06-23T13:01:52Z |
dc.date.issued | 2023-01-18 |
dc.date.conference | 2021-10-26 |
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/182787 |
dc.description.abstractEn | Finding similar sentences or paragraphs is a key issue when dealing with text redundancy. This is particularly the case in the clinical domain, where redundancy in clinical notes limits their secondary use. Due to the lack of resources, this task is a key challenge for French clinical documents. In this paper, we introduce an approach for computing semantic similarity between French clinical sentences based on supervised machine learning algorithms. The proposed approach is implemented in a system called CONCORDIA, for COmputing semaNtic sentenCes for fRench Clinical Documents sImilArity. After briefly reviewing various semantic textual similarity measures reported in the literature, we describe the approach, which relies on Random Forest (RF), Multilayer Perceptron (MLP) and Linear Regression (LR) algorithms to build different supervised models. These models are then used to determine the degree of semantic similarity between clinical sentences. CONCORDIA is evaluated using traditional evaluation metrics, EDRM (accuracy in relative distance to the average solution) and Spearman correlation, on standard benchmarks provided in the context of the DEFT 2020 challenge. According to the official results of this challenge, our MLP-based model ranked first out of the 15 submitted systems, with an EDRM of 0.8217 and a Spearman correlation coefficient of 0.7691. The post-challenge development of CONCORDIA and the experiments performed after the DEFT 2020 edition showed a significant improvement in the performance of the different implemented models. In particular, the new MLP-based model achieves a Spearman correlation coefficient of 0.80. The LR model, which combines the output of the MLP model with word embedding similarity scores, obtains the highest Spearman correlation coefficient, 0.8030. Therefore, the experiments show the effectiveness and relevance of the proposed approach for finding similar sentences in French clinical notes. |
dc.language.iso | EN | en_US
dc.publisher | Springer, Cham | en_US
dc.subject.en | Sentence similarity |
dc.subject.en | Machine learning |
dc.subject.en | Random forest |
dc.subject.en | Multilayer perceptron |
dc.subject.en | French clinical notes |
dc.title.en | Machine Learning Based Finding of Similar Sentences from French Clinical Notes |
dc.type | Communication dans un congrès avec actes (conference paper with proceedings) | en_US
dc.identifier.doi | 10.1007/978-3-031-24197-0_2 | en_US
dc.subject.hal | Sciences du Vivant [q-bio]/Santé publique et épidémiologie (Life sciences [q-bio]/Public health and epidemiology) | en_US
bordeaux.page | 26-42 | en_US
bordeaux.volume | 469 | en_US
bordeaux.hal.laboratories | Bordeaux Population Health Research Center (BPH) - UMR 1219 | en_US
bordeaux.institution | Université de Bordeaux | en_US
bordeaux.institution | INSERM | en_US
bordeaux.conference.title | International Conference on Web Information Systems and Technologies | en_US
bordeaux.country | fr | en_US
bordeaux.title.proceeding | Lecture Notes in Business Information Processing | en_US
bordeaux.team | AHEAD_BPH | en_US
bordeaux.team | SISTM_BPH | en_US
bordeaux.conference.city | Virtual | en_US
bordeaux.peerReviewed | oui (yes) | en_US
hal.identifier | hal-04139356 |
hal.version | 1 |
hal.date.transferred | 2023-06-23T13:01:54Z |
hal.export | true |
dc.rights.cc | Pas de Licence CC (no CC license) | en_US
bordeaux.COinS | ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2023-01-18&rft.volume=469&rft.spage=26&rft.epage=42&rft.au=DRAME,%20Khadim&rft.au=DIALLO,%20Abdourahmane%20Gayo&rft.au=SAMBE,%20Gorgoumack&rft.genre=proceeding |


Files in this item

There are no files associated with this item.

This item appears in the following collection(s)
