Afficher la notice abrégée

hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorAUGONNET, Cédric
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorAUMAGE, Olivier
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.authorFURMENTO, Nathalie
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorTHIBAULT, Samuel
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorNAMYST, Raymond
dc.date.accessioned2024-04-15T09:41:20Z
dc.date.available2024-04-15T09:41:20Z
dc.date.issued2014-05-16
dc.identifier.urihttps://oskar-bordeaux.fr/handle/20.500.12278/197592
dc.description.abstractEnGPUs have largely entered HPC clusters, as shown by the top entries of the latest top500 issue. Exploiting such machines is however very challenging, not only because of combining two separate paradigms, MPI and CUDA or OpenCL, but also because nodes are heterogeneous and thus require careful load balancing within nodes themselves. The current paradigms are usually limited to only offloading parts of the computation and leaving CPUs idle, or they require static work partitioning between CPUs and GPUs. To handle single-node architecture heterogeneity, we have previously proposed StarPU, a runtime system capable of dynamically scheduling tasks in an optimized way on such machines. We show here how the task paradigm of StarPU has been combined with MPI communications, and how we extended the task paradigm itself to allow mapping the task graph on MPI clusters such as to automatically achieve an optimized distributed execution. We show how a sequential-like Cholesky source code can be easily extended into a scalable distributed parallel execution, and already exhibits a speedup of 5 on 6 nodes.
dc.language.isoen
dc.title.enStarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators
dc.typeRapport
dc.subject.halInformatique [cs]/Calcul parallèle, distribué et partagé [cs.DC]
bordeaux.hal.laboratoriesLaboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.institutionUniversité de Bordeaux
bordeaux.institutionBordeaux INP
bordeaux.institutionCNRS
bordeaux.type.institutionINRIA
bordeaux.type.reportrr
hal.identifierhal-00992208
hal.version1
hal.audienceNon spécifiée
hal.origin.linkhttps://hal.archives-ouvertes.fr//hal-00992208v1
bordeaux.COinSctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2014-05-16&rft.au=AUGONNET,%20C%C3%A9dric&AUMAGE,%20Olivier&FURMENTO,%20Nathalie&THIBAULT,%20Samuel&NAMYST,%20Raymond&rft.genre=unknown


Fichier(s) constituant ce document

FichiersTailleFormatVue

Il n'y a pas de fichiers associés à ce document.

Ce document figure dans la(les) collection(s) suivante(s)

Afficher la notice abrégée