Show simple item record

hal.structure.identifier	Mathematics and Computer Science Division [ANL] [MCS]
dc.contributor.author	BUNTINAS, Darius
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	GOGLIN, Brice
hal.structure.identifier	Mathematics and Computer Science Division [ANL] [MCS]
dc.contributor.author	GOODELL, David
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
hal.structure.identifier	Ecole Nationale Supérieure d'Electronique, Informatique et Radiocommunications de Bordeaux [ENSEIRB]
dc.contributor.author	MERCIER, Guillaume
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	MOREAUD, Stéphanie
dc.contributor.editor	IEEE
dc.date.accessioned	2024-04-15T09:51:40Z
dc.date.available	2024-04-15T09:51:40Z
dc.date.issued	2009
dc.date.conference	2009-09-22
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/198448
dc.description.abstractEn	The emergence of multicore processors raises the need to efficiently transfer large amounts of data between local processes. MPICH2 is a highly portable MPI implementation whose large-message communication schemes suffer from high CPU utilization and cache pollution because of the use of a double-buffering strategy, common to many MPI implementations. We introduce two strategies offering a kernel-assisted, single-copy model with support for noncontiguous and asynchronous transfers. The first one uses the now widely available vmsplice Linux system call; the second one further improves performance thanks to a custom kernel module called KNEM. The latter also offers I/OAT copy offload, which is dynamically enabled depending on both hardware cache characteristics and message size. These new solutions outperform the standard transfer method in the MPICH2 implementation when no cache is shared between the processing cores or when very large messages are being transferred. Collective communication operations show a dramatic improvement, and the IS NAS parallel benchmark shows a 25% speedup and better cache efficiency.
dc.language.iso	en
dc.title.en	Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis
dc.type	Conference paper (Communication dans un congrès)
dc.identifier.doi	10.1109/ICPP.2009.22
dc.subject.hal	Computer Science [cs]/Operating Systems [cs.OS]
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.conference.title	38th International Conference on Parallel Processing (ICPP-2009)
bordeaux.country	AT
bordeaux.conference.city	Vienna
bordeaux.peerReviewed	yes
hal.identifier	inria-00390064
hal.version	1
hal.invited	no
hal.proceedings	yes
hal.popular	no
hal.audience	International
hal.origin.link	https://hal.archives-ouvertes.fr//inria-00390064v1


Files in this item

No files are associated with this item.

This item appears in the following collection(s)
