AUGONNET, Cédric; CLET-ORTEGA, Jérôme; THIBAULT, Samuel; NAMYST, Raymond

hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	AUGONNET, Cédric
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	CLET-ORTEGA, Jérôme
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	THIBAULT, Samuel
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	NAMYST, Raymond
dc.date.accessioned	2024-04-15T09:49:08Z
dc.date.available	2024-04-15T09:49:08Z
dc.date.issued	2010-12
dc.date.conference	2010-12-08
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/198232
dc.description.abstractEn	To fully tap into the potential of heterogeneous machines composed of multicore processors and multiple accelerators, simple offloading approaches in which the main trunk of the application runs on regular cores while only specific parts are offloaded on accelerators are not sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. To face this challenge, we previously proposed StarPU, a runtime system capable of scheduling tasks over multicore machines equipped with GPU accelerators. StarPU uses a software virtual shared memory (VSM) that provides a high-level programming interface and automates data transfers between processing units so as to enable a dynamic scheduling of tasks. We now present how we have extended StarPU to minimize the cost of transfers between processing units in order to efficiently cope with multi-GPU hardware configurations. To this end, our runtime system implements data prefetching based on asynchronous data transfers, and uses data transfer cost prediction to influence the decisions taken by the task scheduler. We demonstrate the relevance of our approach by benchmarking two parallel numerical algorithms using our runtime system. We obtain significant speedups and high efficiency over multicore machines equipped with multiple accelerators. We also evaluate the behaviour of these applications over clusters featuring multiple GPUs per node, showing how our runtime system can combine with MPI.
dc.description.sponsorship	Programmation des technologies multicoeurs hétérogènes - ANR-08-COSI-0013
dc.language.iso	en
dc.title.en	Data-Aware Task Scheduling on Multi-Accelerator based Platforms
dc.type	Conference paper
dc.subject.hal	Computer Science [cs]/Operating Systems [cs.OS]
dc.description.sponsorshipEurope	Performance Portability and Programmability for Heterogeneous Many-core Architectures
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800	*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.conference.title	16th International Conference on Parallel and Distributed Systems
bordeaux.country	CN
bordeaux.conference.city	Shanghai
bordeaux.peerReviewed	yes
hal.identifier	inria-00523937
hal.version	1
hal.invited	no
hal.proceedings	yes
hal.conference.end	2010-12-10
hal.popular	no
hal.audience	International
hal.origin.link	https://hal.archives-ouvertes.fr//inria-00523937v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2010-12&rft.au=AUGONNET,%20C%C3%A9dric&CLET-ORTEGA,%20J%C3%A9r%C3%B4me&THIBAULT,%20Samuel&NAMYST,%20Raymond&rft.genre=unknown

Files in this item

Files	Size	Format	View
No files are associated with this item.

This item appears in the following collection(s)

Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800