hal.structure.identifierHigh-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.authorAGULLO, Emmanuel
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorAUGONNET, Cédric
hal.structure.identifierInnovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.authorDONGARRA, Jack
hal.structure.identifierInnovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.authorFAVERGE, Mathieu
hal.structure.identifierInnovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.authorLTAIEF, Hatem
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorTHIBAULT, Samuel
hal.structure.identifierInnovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.authorTOMOV, Stanimire
dc.date.accessioned2024-04-15T09:48:13Z
dc.date.available2024-04-15T09:48:13Z
dc.date.issued2011-05
dc.date.conference2011-05-16
dc.identifier.urihttps://oskar-bordeaux.fr/handle/20.500.12278/198163
dc.description.abstractEnOne of the major trends in the design of exascale architectures is the use of multicore nodes enhanced with GPU accelerators. Exploiting all resources of a hybrid accelerator-based node at their maximum potential is thus a fundamental step towards exascale computing. In this article, we present the design of a highly efficient QR factorization for such a node. Our method has three steps. The first step consists of expressing the QR factorization as a sequence of tasks of well-chosen granularity, each intended to be executed on a CPU core or a GPU. We show that we can efficiently adapt high-level algorithms from the literature that were initially designed for homogeneous multicore architectures. The second step consists of designing the kernels that implement each individual task. We use CPU kernels from previous work and present new kernels for GPUs that complement kernels already available in the MAGMA library. We show the impact of these GPU kernels on performance. In particular, we present the benefits of new hybrid CPU/GPU kernels. The last step consists of scheduling these tasks on the computational units. We present two alternative approaches, based respectively on static and dynamic scheduling. In the case of static scheduling, we exploit the a priori knowledge of the schedule to perform successive optimizations leading to very high performance. We highlight, however, the lack of portability of this approach and its limitation to relatively simple algorithms on relatively homogeneous nodes. Alternatively, by relying on an efficient runtime system, StarPU, in charge of ensuring data availability and coherency, we can schedule more complex algorithms on complex heterogeneous nodes with much higher productivity. In this latter case, we show that we can achieve high performance in a portable way thanks to a fine interaction between the application and the runtime system.
We demonstrate that the obtained performance is very close to the theoretical upper bounds that we obtained using Linear Programming.
dc.language.isoen
dc.title.enQR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
dc.typeConference paper
dc.identifier.doi10.1109/IPDPS.2011.90
dc.subject.halComputer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
bordeaux.hal.laboratoriesLaboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.institutionUniversité de Bordeaux
bordeaux.institutionBordeaux INP
bordeaux.institutionCNRS
bordeaux.conference.title25th IEEE International Parallel & Distributed Processing Symposium
bordeaux.countryUS
bordeaux.conference.cityAnchorage
bordeaux.peerReviewedyes
hal.identifierinria-00547614
hal.version1
hal.invitedno
hal.proceedingsyes
hal.conference.end2011-05-20
hal.popularno
hal.audienceInternational
hal.origin.linkhttps://hal.archives-ouvertes.fr//inria-00547614v1