
dc.rights.license: open
dc.contributor.author: BEGAZO, Rolinson
dc.contributor.author: AGUILERA, Ana
hal.structure.identifier: ESTIA - Institute of technology [ESTIA]
dc.contributor.author: DONGO, Irvin
dc.contributor.author: CARDINALE, Yudith
dc.date.accessioned: 2025-03-10T14:31:26Z
dc.date.available: 2025-03-10T14:31:26Z
dc.date.issued: 2024-09-06
dc.identifier.issn: 1424-8220
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/205470
dc.description.abstractEn: Emotion recognition through speech is a technique employed in various scenarios of Human–Computer Interaction (HCI). Existing approaches have achieved significant results; however, limitations persist, most notably the quantity and diversity of data required when deep learning techniques are used. The lack of a standard in feature selection leads to continuous development and experimentation. Choosing and designing the appropriate network architecture constitutes another challenge. This study addresses the challenge of recognizing emotions in the human voice using deep learning techniques, proposing a comprehensive approach, and developing preprocessing and feature selection stages while constructing a dataset called EmoDSc as a result of combining several available databases. The synergy between spectral features and spectrogram images is investigated. Independently, the weighted accuracy obtained using only spectral features was 89%, while using only spectrogram images, the weighted accuracy reached 90%. These results, although surpassing previous research, highlight the strengths and limitations when operating in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified dataset EmoDSc, demonstrates a remarkable accuracy of 96%.
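The fused architecture summarized in the abstract (a CNN1D branch for spectral feature vectors, a CNN2D branch for spectrogram images, and an MLP head combining both) could be sketched roughly as follows. This is a hypothetical minimal sketch in PyTorch, not the authors' implementation: the layer sizes, the 40-coefficient spectral input, the 64×64 spectrogram, and the seven emotion classes are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class FusionSER(nn.Module):
    """Hypothetical sketch of a two-branch speech-emotion model:
    a 1-D CNN over spectral features, a 2-D CNN over spectrograms,
    and an MLP that classifies their concatenated embeddings."""

    def __init__(self, n_emotions: int = 7):  # 7 classes: assumption
        super().__init__()
        # Branch 1: 1-D convolution over a spectral feature vector.
        self.branch1d = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten())   # -> 16 * 8 = 128
        # Branch 2: 2-D convolution over a spectrogram image.
        self.branch2d = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten())  # -> 8 * 16 = 128
        # MLP head fusing both 128-dim embeddings.
        self.mlp = nn.Sequential(
            nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, n_emotions))

    def forward(self, spectral, spectrogram):
        fused = torch.cat([self.branch1d(spectral),
                           self.branch2d(spectrogram)], dim=1)
        return self.mlp(fused)

model = FusionSER()
# Batch of 2: spectral vectors (1 x 40) and spectrograms (1 x 64 x 64).
logits = model(torch.randn(2, 1, 40), torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 7])
```

The design point the abstract emphasizes is the fusion step: each modality is reduced to a fixed-size embedding so the MLP can learn from both jointly rather than from either in isolation.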
dc.language.iso: EN
dc.rights: Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.subject.en: Speech emotion recognition
dc.subject.en: Deep learning
dc.subject.en: Spectral features
dc.subject.en: Spectrogram imaging
dc.subject.en: Feature fusion
dc.subject.en: Convolutional neural network
dc.title.en: A Combined CNN Architecture for Speech Emotion Recognition
dc.type: Journal article
dc.identifier.doi: 10.3390/s24175797
dc.subject.hal: Engineering sciences [physics]
bordeaux.journal: Sensors
bordeaux.page: 5797
bordeaux.volume: 24
bordeaux.hal.laboratories: ESTIA - Recherche
bordeaux.issue: 17
bordeaux.institution: Université de Bordeaux
bordeaux.peerReviewed: yes
bordeaux.inpress: no
bordeaux.import.source: crossref
hal.identifier: hal-04984986
hal.version: 1
hal.date.transferred: 2025-03-10T14:31:29Z
hal.popular: no
hal.audience: International
hal.export: true
workflow.import.source: crossref
dc.rights.cc: CC BY
bordeaux.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Sensors&rft.date=2024-09-06&rft.volume=24&rft.issue=17&rft.spage=5797&rft.epage=5797&rft.eissn=1424-8220&rft.issn=1424-8220&rft.au=BEGAZO,%20Rolinson&AGUILERA,%20Ana&DONGO,%20Irvin&CARDINALE,%20Yudith&rft.genre=article

