hal.structure.identifierHigh-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.authorAGULLO, Emmanuel
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorAUGONNET, Cédric
hal.structure.identifierDepartment of Computer Science. University of Tennessee
hal.structure.identifierOak Ridge National Laboratory [Oak Ridge] [ORNL]
hal.structure.identifierSchool of Computer Science [Manchester]
dc.contributor.authorDONGARRA, Jack
hal.structure.identifierDepartment of Computer Science. University of Tennessee
dc.contributor.authorLTAIEF, Hatem
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorNAMYST, Raymond
hal.structure.identifierHigh-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
dc.contributor.authorROMAN, Jean
hal.structure.identifierLaboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifierEfficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.authorTHIBAULT, Samuel
hal.structure.identifierDepartment of Computer Science. University of Tennessee
dc.contributor.authorTOMOV, Stanimire
dc.date.accessioned2024-04-15T09:48:12Z
dc.date.available2024-04-15T09:48:12Z
dc.date.issued2010-07
dc.date.conference2010-07-13
dc.identifier.urihttps://oskar-bordeaux.fr/handle/20.500.12278/198162
dc.description.abstractEnAlthough hardware has changed dramatically in the last few years, nodes of multicore chips augmented by Graphics Processing Units (GPUs) seem to be a trend of major importance. Previous approaches for scheduling dense linear algebra operations on such complex nodes led to high performance, but at the double cost of not using the potential of all the cores and of producing static, non-generic code. In this extended abstract, we present a new approach for scheduling dense linear algebra operations on multicore architectures with GPU accelerators, using a dynamic scheduler capable of exploiting the full potential of the node [1]. We underline the benefits in terms of both programmability and performance. We illustrate our approach with a Cholesky factorization relying on cutting-edge GPU and CPU kernels [2], [3], achieving roughly 900 Gflop/s on an eight-core node accelerated with three NVIDIA Tesla GPUs.
dc.language.isoen
dc.title.enDynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators.
dc.typeConference paper
dc.subject.halComputer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
bordeaux.hal.laboratoriesLaboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.institutionUniversité de Bordeaux
bordeaux.institutionBordeaux INP
bordeaux.institutionCNRS
bordeaux.conference.titleSymposium on Application Accelerators in High Performance Computing (SAAHPC)
bordeaux.countryUS
bordeaux.conference.cityKnoxville
bordeaux.peerReviewedyes
hal.identifierinria-00547616
hal.version1
hal.invitedno
hal.proceedingsyes
hal.conference.end2010-07-15
hal.popularno
hal.audienceInternational
hal.origin.linkhttps://hal.archives-ouvertes.fr//inria-00547616v1

