AUGONNET, Cédric; AUMAGE, Olivier; FURMENTO, Nathalie; THIBAULT, Samuel; NAMYST, Raymond

hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	AUGONNET, Cédric
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	AUMAGE, Olivier
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.author	FURMENTO, Nathalie
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	THIBAULT, Samuel
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	NAMYST, Raymond
dc.date.accessioned	2024-04-15T09:41:20Z
dc.date.available	2024-04-15T09:41:20Z
dc.date.issued	2014-05-16
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/197592
dc.description.abstractEn	GPUs have become widely adopted in HPC clusters, as shown by the top entries of the latest TOP500 list. Exploiting such machines is however very challenging, not only because it requires combining two separate paradigms, MPI and CUDA or OpenCL, but also because nodes are heterogeneous and thus require careful load balancing within the nodes themselves. Current approaches are usually limited to offloading parts of the computation while leaving CPUs idle, or they require a static partitioning of work between CPUs and GPUs. To handle single-node heterogeneity, we have previously proposed StarPU, a runtime system capable of dynamically scheduling tasks in an optimized way on such machines. We show here how the task paradigm of StarPU has been combined with MPI communications, and how we extended the task paradigm itself to allow mapping the task graph onto MPI clusters so as to automatically achieve an optimized distributed execution. We show how a sequential-like Cholesky source code can easily be extended into a scalable distributed parallel execution, which already exhibits a speedup of 5 on 6 nodes.
dc.language.iso	en
dc.title.en	StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators
dc.type	Report
dc.subject.hal	Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800	*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.type.institution	INRIA
bordeaux.type.report	rr
hal.identifier	hal-00992208
hal.version	1
hal.audience	Not specified
hal.origin.link	https://hal.archives-ouvertes.fr//hal-00992208v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2014-05-16&rft.au=AUGONNET,%20C%C3%A9dric&AUMAGE,%20Olivier&FURMENTO,%20Nathalie&THIBAULT,%20Samuel&NAMYST,%20Raymond&rft.genre=unknown
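The abstract describes mapping a sequential-like task graph, such as a tiled Cholesky factorization, onto an MPI cluster so that the distributed execution follows automatically. The following is a minimal Python simulation of that idea, not StarPU code. It assumes a 2D block-cyclic tile distribution and an owner-computes rule (each task runs on the process that owns the tile it writes), and it counts the tile transfers that would be needed if a received tile stays cached until its owner modifies it. The helper names `owner`, `cholesky_tasks`, and `map_tasks` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def owner(i, j, p, q):
    """Rank owning tile (i, j) under an assumed 2D block-cyclic p x q grid."""
    return (i % p) * q + (j % q)


def cholesky_tasks(n):
    """Yield (kernel, written_tile, read_tiles) for a right-looking tiled
    Cholesky on an n x n grid of tiles (lower triangle), in sequential order."""
    for k in range(n):
        yield ("POTRF", (k, k), [])
        for i in range(k + 1, n):
            yield ("TRSM", (i, k), [(k, k)])
        for i in range(k + 1, n):
            yield ("SYRK", (i, i), [(i, k)])
            for j in range(k + 1, i):
                yield ("GEMM", (i, j), [(i, k), (j, k)])


def map_tasks(n, p, q):
    """Owner-computes mapping of the task sequence onto p * q ranks.

    Returns (tasks_per_rank, transfers). A tile read by a rank that does not
    own it costs one transfer per tile version, since the receiver is assumed
    to cache it until the owner writes that tile again."""
    version = defaultdict(int)   # tile -> number of writes so far
    sent = set()                 # (tile, version, destination) already transferred
    tasks_per_rank = defaultdict(int)
    transfers = 0
    for _kernel, written, reads in cholesky_tasks(n):
        rank = owner(*written, p, q)   # task executes where its output lives
        tasks_per_rank[rank] += 1
        for tile in reads:
            key = (tile, version[tile], rank)
            if owner(*tile, p, q) != rank and key not in sent:
                sent.add(key)
                transfers += 1
        version[written] += 1          # the written tile now has a new version
    return dict(tasks_per_rank), transfers


if __name__ == "__main__":
    # 8 x 8 tiles on a hypothetical 2 x 3 process grid (6 nodes).
    print(map_tasks(8, 2, 3))
```

Every process in such a scheme would walk the same sequential task order; the single loop here stands in for that, and each rank would only execute its own tasks and post the sends and receives implied by the reads.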

File(s) in this document

Files	Size	Format	View
There are no files associated with this document.

This document appears in the following collection(s)

Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
