A Framework to Evaluate Fusion Methods for Multimodal Emotion Recognition
Language
EN
Journal article
This document was published in
IEEE Access. 2023, vol. 11, p. 10218-10237
English abstract
Multimodal methods for emotion recognition consider several sources of data to predict
emotions; thus, a fusion method is needed to aggregate the individual results. In the literature, there is a
high variety of fusion methods to perform this task, but they are not suitable for all scenarios. In particular,
there are two relevant aspects that can vary from one application to another: (i) in many scenarios, individual
modalities can have different levels of data quality or even be absent, which demands fusion methods
able to discriminate non-useful from relevant data; and (ii) in many applications, there are hardware
restrictions that limit the use of complex fusion methods (e.g., a deep learning model), which could be
quite computationally intensive. In this context, developers and researchers need metrics, guidelines, and a
systematic process to evaluate and compare different fusion methods that can fit to their particular application
scenarios. As a response to this need, this paper presents a framework that establishes a base to perform
a comparative evaluation of fusion methods to demonstrate how they adapt to the quality differences of
individual modalities and to evaluate their performance. The framework provides equivalent conditions
to perform a fair assessment of fusion methods. Based on this framework, we evaluate several fusion
methods for multimodal emotion recognition. Results demonstrate that for the architecture and dataset
selected, the methods that best fit are: Self-Attention and Weighted methods when all modalities are available, and
Self-Attention and EmbraceNet+ when a modality is missing. Concerning execution time, the best times correspond
to the Multilayer Perceptron (MLP) and Self-Attention models, due to their small number of operations. Thus,
the proposed framework provides insights for researchers in this area to identify which fusion methods best
fit their requirements and to justify that selection.
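The weighted fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the quality weights, and the three-class probability layout are all illustrative assumptions. It only shows the general idea of late fusion that down-weights or skips low-quality and missing modalities.

```python
# Hypothetical sketch of weighted (late) fusion: each modality emits a
# probability distribution over emotion classes; the fused prediction is
# a weighted average over the modalities that are actually available.

def weighted_fusion(predictions, weights):
    """Fuse per-modality class probabilities with a weighted average.

    predictions: dict mapping modality name -> list of class probabilities,
                 or None when that modality is missing.
    weights:     dict mapping modality name -> non-negative quality weight
                 (illustrative values; a real system would estimate these).
    """
    available = [m for m, p in predictions.items() if p is not None]
    if not available:
        raise ValueError("no modality available to fuse")
    total = sum(weights[m] for m in available)
    n_classes = len(predictions[available[0]])
    return [
        sum(weights[m] * predictions[m][c] for m in available) / total
        for c in range(n_classes)
    ]

# Example: the audio modality is absent, so the fused result relies only
# on the face and text modalities, weighted by their assumed quality.
preds = {
    "face":  [0.7, 0.2, 0.1],
    "text":  [0.5, 0.3, 0.2],
    "audio": None,
}
w = {"face": 2.0, "text": 1.0, "audio": 1.0}
fused = weighted_fusion(preds, w)
```

Because each input is a probability distribution and the weights are renormalized over the available modalities, the fused output is again a valid distribution, which is what makes this scheme robust to a missing modality.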
English keywords
Emotion recognition
Fusion methods
Multimodality
Research units