Metadata
Asymmetric multi-task learning for interpretable gaze-driven grasping action forecasting
Language
EN
Journal article
This item is published in
IEEE Journal of Biomedical and Health Informatics, 2024-07-18, p. 1-17
English abstract
This work tackles the problem of automatically predicting the grasping intention of humans observing their environment, with eye-tracker glasses and video cameras recording the scene view. Our target application is assistive robotics for people with motor disabilities and potential cognitive impairments. Our proposal leverages the analysis of human attention, captured in the form of gaze fixations recorded by an eye-tracker on the first-person video, since the anticipation of prehension actions is a well-studied and well-known phenomenon. We propose a multi-task system that simultaneously addresses the prediction of human attention in the near future and the anticipation of grasping actions. In our model, visual attention is modeled as a competitive process between a discrete set of states, each associated with a well-known gaze movement pattern from visual psychology. We additionally consider an asymmetric multi-task problem, where attention modeling is an auxiliary task that helps to regularize the learning process of the main action prediction task, and propose a constrained multi-task loss that naturally deals with this asymmetry. Our model shows superior performance to other losses for dynamic multi-task learning, to current dominant deep architectures for general action forecasting, and to models specifically tailored to predicting grasping intention. In particular, it provides state-of-the-art performance on three datasets for egocentric action anticipation, with an average precision of 0.569 and 0.524 on the GITW and Sharon datasets, respectively, and an accuracy of 89.2% and a success rate of 51.7% on the Invisible dataset.
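The asymmetric design described above, where the auxiliary attention task may only regularize and never dominate the main action-forecasting task, can be sketched as a constrained loss combination. This is a minimal illustrative sketch, not the paper's actual formulation: the function name, the auxiliary weight `w_aux`, and the cap on the auxiliary term relative to the main loss are all assumptions made for illustration.

```python
def constrained_multitask_loss(main_loss, aux_loss, w_aux=0.5, cap=1.0):
    """Hypothetical asymmetric multi-task loss.

    main_loss: loss of the main task (grasping action forecasting).
    aux_loss:  loss of the auxiliary task (attention prediction).
    w_aux:     fixed down-weighting of the auxiliary task (assumed value).
    cap:       maximum auxiliary contribution, expressed as a fraction of
               the main loss, so the auxiliary task can regularize but
               never dominate the optimization (assumed constraint form).
    """
    # Clip the auxiliary term so it never exceeds `cap` times the main loss.
    aux_term = w_aux * min(aux_loss, cap * main_loss)
    return main_loss + aux_term
```

For example, with `main_loss=2.0` and a large `aux_loss=10.0`, the auxiliary term is capped at `0.5 * 2.0 = 1.0`, giving a total of 3.0; with a small `aux_loss=0.4` the cap is inactive and the total is `2.0 + 0.5 * 0.4 = 2.2`.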
English keywords
Grasping Action Forecasting
Multi-Task Learning
Interpretable Attention Prediction
Constrained Loss
Research centers