Speech Emotion Recognition using Time-frequency Random Circular Shift and Deep Neural Networks
Field | Value
---|---
hal.structure.identifier | Informatique, BioInformatique, Systèmes Complexes [IBISC]
dc.contributor.author | XIA, Sylvain
hal.structure.identifier | Informatique, BioInformatique, Systèmes Complexes [IBISC]
dc.contributor.author | FOURER, Dominique
hal.structure.identifier | Laboratoire de l'intégration, du matériau au système [IMS]
dc.contributor.author | AUDIN, Liliana
hal.structure.identifier | Laboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.author | ROUAS, Jean-Luc
hal.structure.identifier | Laboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.author | SHOCHI, Takaaki
dc.date.accessioned | 2022-03-07T14:26:26Z
dc.date.available | 2022-03-07T14:26:26Z
dc.date.conference | 2022-05-23
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/129775
dc.description.abstractEn | This paper addresses the problem of emotion recognition from a speech signal. We investigate a data augmentation technique based on a random circular shift of the input time-frequency representation, which significantly improves the emotion prediction results obtained with a deep convolutional neural network. After investigating the best combination of method parameters, we comparatively assess several neural network architectures (AlexNet, ResNet and Inception) using our approach on two publicly available datasets: eNTERFACE05 and EMO-DB. Our results show an improvement in prediction accuracy over a more complex state-of-the-art technique based on Discriminant Temporal Pyramid Matching (DCNN-DTPM).
dc.language.iso | en
dc.subject.en | Speech Emotion Recognition (SER)
dc.subject.en | Deep Convolutional Neural Networks
dc.subject.en | Time-frequency
dc.subject.en | Random Circular Shift (RCS)
dc.title.en | Speech Emotion Recognition using Time-frequency Random Circular Shift and Deep Neural Networks
dc.type | Conference paper with published proceedings
dc.subject.hal | Computer Science [cs]/Sound [cs.SD]
dc.subject.hal | Computer Science [cs]/Signal and Image Processing
dc.subject.hal | Computer Science [cs]/Artificial Intelligence [cs.AI]
bordeaux.hal.laboratories | CLLE Montaigne : Cognition, langues, Langages, Ergonomie - UMR 5263
bordeaux.institution | Université Bordeaux Montaigne
bordeaux.country | PT
bordeaux.title.proceeding | Speech Prosody 2022
bordeaux.conference.city | Lisbon
bordeaux.peerReviewed | yes
hal.identifier | hal-03583535
hal.version | 1
hal.origin.link | https://hal.archives-ouvertes.fr//hal-03583535v1
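For readers wondering what the random circular shift (RCS) augmentation summarised in the abstract might look like in practice, here is a minimal sketch that rolls a spectrogram along its time axis by a random offset. The choice of axis, the shift range, and the log-mel input shape are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def random_circular_shift(tf_rep: np.ndarray, rng: np.random.Generator,
                          axis: int = -1) -> np.ndarray:
    """Roll a time-frequency representation by a random offset along `axis`
    (assumed here to be the time axis), wrapping around circularly."""
    n_frames = tf_rep.shape[axis]
    offset = int(rng.integers(0, n_frames))  # random shift in [0, n_frames)
    return np.roll(tf_rep, offset, axis=axis)

# Hypothetical usage: augment a (mel bins x time frames) spectrogram before
# feeding it to a CNN such as AlexNet, ResNet or Inception. The 128 x 300
# shape is a placeholder, not a parameter reported in the paper.
rng = np.random.default_rng(seed=0)
spectrogram = rng.random((128, 300))
augmented = random_circular_shift(spectrogram, rng)
assert augmented.shape == spectrogram.shape
```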