hal.structure.identifier: Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier: Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author: AUGONNET, Cédric
hal.structure.identifier: Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier: Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author: CLET-ORTEGA, Jérôme
hal.structure.identifier: Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier: Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author: THIBAULT, Samuel
hal.structure.identifier: Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier: Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author: NAMYST, Raymond
dc.date.accessioned: 2024-04-15T09:49:08Z
dc.date.available: 2024-04-15T09:49:08Z
dc.date.issued: 2010-12
dc.date.conference: 2010-12-08
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/198232
dc.description.abstractEn: To fully tap into the potential of heterogeneous machines composed of multicore processors and multiple accelerators, simple offloading approaches in which the main trunk of the application runs on regular cores while only specific parts are offloaded on accelerators are not sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. To face this challenge, we previously proposed StarPU, a runtime system capable of scheduling tasks over multicore machines equipped with GPU accelerators. StarPU uses a software virtual shared memory (VSM) that provides a high-level programming interface and automates data transfers between processing units so as to enable a dynamic scheduling of tasks. We now present how we have extended StarPU to minimize the cost of transfers between processing units in order to efficiently cope with multi-GPU hardware configurations. To this end, our runtime system implements data prefetching based on asynchronous data transfers, and uses data transfer cost prediction to influence the decisions taken by the task scheduler. We demonstrate the relevance of our approach by benchmarking two parallel numerical algorithms using our runtime system. We obtain significant speedups and high efficiency over multicore machines equipped with multiple accelerators. We also evaluate the behaviour of these applications over clusters featuring multiple GPUs per node, showing how our runtime system can combine with MPI.
dc.description.sponsorship: Programmation des technologies multicoeurs hétérogènes - ANR-08-COSI-0013
dc.language.iso: en
dc.title.en: Data-Aware Task Scheduling on Multi-Accelerator based Platforms
dc.type: Conference paper
dc.subject.hal: Computer Science [cs]/Operating Systems [cs.OS]
dc.description.sponsorshipEurope: Performance Portability and Programmability for Heterogeneous Many-core Architectures
bordeaux.hal.laboratories: Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.institution: Université de Bordeaux
bordeaux.institution: Bordeaux INP
bordeaux.institution: CNRS
bordeaux.conference.title: 16th International Conference on Parallel and Distributed Systems
bordeaux.country: CN
bordeaux.conference.city: Shanghai
bordeaux.peerReviewed: yes
hal.identifier: inria-00523937
hal.version: 1
hal.invited: no
hal.proceedings: yes
hal.conference.end: 2010-12-10
hal.popular: no
hal.audience: International
hal.origin.link: https://hal.archives-ouvertes.fr//inria-00523937v1
bordeaux.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2010-12&rft.au=AUGONNET,%20C%C3%A9dric&CLET-ORTEGA,%20J%C3%A9r%C3%B4me&THIBAULT,%20Samuel&NAMYST,%20Raymond&rft.genre=unknown