Show simple item record

hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: MA, Teng
hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: BOSILCA, George
hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: BOUTEILLER, Aurélien
hal.structure.identifier: Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier: Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author: GOGLIN, Brice
hal.structure.identifier: Cisco Systems
dc.contributor.author: SQUYRES, Jeffrey
hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: DONGARRA, Jack
dc.contributor.editor: IEEE
dc.date.accessioned: 2024-04-15T09:47:18Z
dc.date.available: 2024-04-15T09:47:18Z
dc.date.issued: 2011-09
dc.date.conference: 2011-09-13
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/198087
dc.description.abstractEn: Shared memory is among the most common approaches to implementing message passing within multi-core nodes. However, current shared memory techniques do not scale with increasing numbers of cores and expanding memory hierarchies -- most notably when handling large data transfers and collective communication. Neglecting the underlying hardware topology, using copy-in/copy-out memory transfer operations, and overloading the memory subsystem with one-to-many types of operations are some of the most common mistakes in today's shared memory implementations. Unfortunately, they all negatively impact the performance and scalability of MPI libraries -- and therefore applications. In this paper, we present several kernel-assisted intra-node collective communication techniques that address these three issues on many-core systems. We also present a new Open MPI collective communication component that uses the KNEM Linux module for direct inter-process memory copying. Our Open MPI component implements several novel strategies to decrease the number of intermediate memory copies and improve data locality in order to diminish both cache pollution and memory pressure. Experimental results show that our KNEM-enabled Open MPI collective component can outperform state-of-the-art MPI libraries (Open MPI and MPICH2) on synthetic benchmarks, resulting in a significant improvement for a typical graph application.
dc.language.iso: en
dc.title.en: Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs
dc.type: Conference paper (Communication dans un congrès)
dc.identifier.doi: 10.1109/ICPP.2011.29
dc.subject.hal: Computer Science [cs]/Operating Systems [cs.OS]
bordeaux.hal.laboratories: Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
bordeaux.institution: Université de Bordeaux
bordeaux.institution: Bordeaux INP
bordeaux.institution: CNRS
bordeaux.conference.title: 40th International Conference on Parallel Processing (ICPP-2011)
bordeaux.country: TW
bordeaux.conference.city: Taipei
bordeaux.peerReviewed: yes
hal.identifier: inria-00602877
hal.version: 1
hal.invited: no
hal.proceedings: yes
hal.conference.end: 2011-09-16
hal.popular: no
hal.audience: International
hal.origin.link: https://hal.archives-ouvertes.fr//inria-00602877v1
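As context for the abstract above: the paper targets intra-node collective operations on large buffers, where copy-in/copy-out through shared memory becomes the bottleneck and kernel-assisted single-copy transfers (via the KNEM Linux module) help. The following is a minimal sketch, not taken from the paper, of the kind of collective such a component accelerates; the buffer size, program name, and build commands are illustrative assumptions, and it only requires a standard MPI installation (e.g. Open MPI).

/*
 * Minimal sketch (illustrative, not from the paper): an intra-node
 * MPI_Bcast of a large buffer, the kind of collective that a
 * kernel-assisted component aims to speed up by avoiding intermediate
 * shared-memory copies.
 *
 * Assumed build/run (hypothetical file name):
 *   mpicc bcast_demo.c -o bcast_demo && mpirun -np 8 ./bcast_demo
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 64 MiB payload: large messages are where single-copy strategies
       matter most, per the abstract. The size is an arbitrary choice. */
    const size_t count = 64 * 1024 * 1024;
    char *buf = malloc(count);
    if (rank == 0)
        for (size_t i = 0; i < count; i++)
            buf[i] = (char)(i & 0xff);

    double t0 = MPI_Wtime();
    MPI_Bcast(buf, (int)count, MPI_CHAR, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("Bcast of %zu bytes across %d ranks: %.3f ms\n",
               count, size, (t1 - t0) * 1e3);

    free(buf);
    MPI_Finalize();
    return 0;
}

When all ranks run on one node, the time reported by this benchmark is dominated by how the MPI library moves the buffer between processes, which is exactly the path the paper's KNEM-based collective component optimizes.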


Files in this item

There are no files associated with this item.

This item appears in the following collection(s)
