Show simple item record

hal.structure.identifier	Mathematics and Computer Science Division [ANL] [MCS]
dc.contributor.author	BUNTINAS, Darius
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	GOGLIN, Brice
hal.structure.identifier	Mathematics and Computer Science Division [ANL] [MCS]
dc.contributor.author	GOODELL, David
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
hal.structure.identifier	Ecole Nationale Supérieure d'Electronique, Informatique et Radiocommunications de Bordeaux [ENSEIRB]
dc.contributor.author	MERCIER, Guillaume
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	MOREAUD, Stéphanie
dc.contributor.editor	IEEE
dc.date.accessioned	2024-04-15T09:51:40Z
dc.date.available	2024-04-15T09:51:40Z
dc.date.issued	2009
dc.date.conference	2009-09-22
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/198448
dc.description.abstractEn	The emergence of multicore processors raises the need to efficiently transfer large amounts of data between local processes. MPICH2 is a highly portable MPI implementation whose large-message communication schemes suffer from high CPU utilization and cache pollution because of the use of a double-buffering strategy, common to many MPI implementations. We introduce two strategies offering a kernel-assisted, single-copy model with support for noncontiguous and asynchronous transfers. The first one uses the now widely available vmsplice Linux system call; the second one further improves performance thanks to a custom kernel module called KNEM. The latter also offers I/OAT copy offload, which is dynamically enabled depending on both hardware cache characteristics and message size. These new solutions outperform the standard transfer method in the MPICH2 implementation when no cache is shared between the processing cores or when very large messages are being transferred. Collective communication operations show a dramatic improvement, and the IS NAS parallel benchmark shows a 25% speedup and better cache efficiency.
dc.language.iso	en
dc.title.en	Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis
dc.type	Conference paper (Communication dans un congrès)
dc.identifier.doi	10.1109/ICPP.2009.22
dc.subject.hal	Computer Science [cs]/Operating Systems [cs.OS]
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.conference.title	38th International Conference on Parallel Processing (ICPP-2009)
bordeaux.country	AT
bordeaux.conference.city	Vienna
bordeaux.peerReviewed	yes
hal.identifier	inria-00390064
hal.version	1
hal.invited	no
hal.proceedings	yes
hal.popular	no
hal.audience	International
hal.origin.link	https://hal.archives-ouvertes.fr//inria-00390064v1


Files in this item

No files are associated with this item.

This item appears in the following collection(s)
