MA, Teng; BOSILCA, George; BOUTEILLER, Aurélien; GOGLIN, Brice; SQUYRES, Jeffrey; DONGARRA, Jack

hal.structure.identifier	Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author	MA, Teng
hal.structure.identifier	Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author	BOSILCA, George
hal.structure.identifier	Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author	BOUTEILLER, Aurélien
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	GOGLIN, Brice
hal.structure.identifier	Cisco Systems
dc.contributor.author	SQUYRES, Jeffrey
hal.structure.identifier	Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author	DONGARRA, Jack
dc.date.accessioned	2024-04-15T09:48:20Z
dc.date.available	2024-04-15T09:48:20Z
dc.date.issued	2010-12
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/198171
dc.description.abstractEn	Deeper memory hierarchies, NUMA architectures, and network-style interconnects are widely used in modern many-core CPU designs to achieve performance scalability. Because the Message Passing Interface (MPI) is a leading intra-node programming model, its implementations must exploit these architectures to provide reliable performance portability. These new architectures require not only specialized MPI point-to-point messaging protocols but also carefully designed and tuned algorithms for MPI collective operations. Multiple issues must be taken into account: 1) minimizing the number of copies required, 2) minimizing traffic to "remote" NUMA memory, and 3) carefully avoiding memory bottlenecks for "rooted" collective operations. In this paper, we present a kernel-assisted intra-node collective module that addresses these three issues on many-core systems. A kernel-level inter-process memory copy module, called KNEM, is used by a novel Open MPI collective module to implement several improved strategies that decrease the number of intermediate memory copies and improve locality, reducing both the pressure on the memory banks and cache pollution. The collective topology is mapped onto the NUMA topology to minimize cross traffic on inter-socket links. Experiments show that the KNEM-enabled Open MPI collective module achieves up to a threefold speedup on synthetic benchmarks, resulting in a 12% improvement for a parallel graph shortest path discovery application.
dc.language.iso	en
dc.subject.en	MPI
dc.subject.en	Multicore
dc.subject.en	Shared memory
dc.subject.en	NUMA
dc.subject.en	Kernel
dc.subject.en	Collective communication
dc.title.en	Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs
dc.type	Report
dc.subject.hal	Computer Science [cs]/Operating Systems [cs.OS]
bordeaux.page	11
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800	*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.type.report	rr
hal.identifier	inria-00544872
hal.version	1
hal.audience	Not specified
hal.origin.link	https://hal.archives-ouvertes.fr//inria-00544872v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2010-12&rft.spage=11&rft.epage=11&rft.au=MA,%20Teng&BOSILCA,%20George&BOUTEILLER,%20Aur%C3%A9lien&GOGLIN,%20Brice&SQUYRES,%20Jeffrey&rft.genre=unknown

Files in this item

No files are associated with this item.

This item appears in the following collection(s)

Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800