hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: MA, Teng
hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: BOSILCA, George
hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: BOUTEILLER, Aurélien
hal.structure.identifier: Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier: Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author: GOGLIN, Brice
hal.structure.identifier: Cisco Systems
dc.contributor.author: SQUYRES, Jeffrey
hal.structure.identifier: Innovative Computing Laboratory [Knoxville] [ICL]
dc.contributor.author: DONGARRA, Jack
dc.date.accessioned: 2024-04-15T09:48:20Z
dc.date.available: 2024-04-15T09:48:20Z
dc.date.issued: 2010-12
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/198171
dc.description.abstractEn: Deeper memory hierarchies, NUMA architectures, and network-style interconnects are widely used in modern many-core CPU designs to achieve performance scalability. As a leading intra-node programming model, Message Passing Interface (MPI) implementations must exploit these architectures to provide reliable performance portability. These new architectures not only require specialized MPI point-to-point messaging protocols, they also require carefully designed and tuned algorithms for MPI collective operations. Multiple issues must be taken into account: 1) minimizing the number of copies required, 2) minimizing traffic to "remote" NUMA memory, and 3) carefully avoiding memory bottlenecks for "rooted" collective operations. In this paper, we present a kernel-assisted intra-node collective module that addresses these three issues on many-core systems. A kernel-level inter-process memory copy module, called KNEM, is used by a novel Open MPI collective module to implement several improved strategies that decrease the number of intermediate memory copies and improve locality, reducing both the pressure on the memory banks and cache pollution. The collective topology is mapped onto the NUMA topology to minimize cross-traffic on inter-socket links. Experiments illustrate that the KNEM-enabled Open MPI collective module can achieve up to a threefold speedup on synthetic benchmarks, resulting in a 12% improvement for a parallel graph shortest path discovery application.
dc.language.iso: en
dc.subject.en: MPI
dc.subject.en: Multicore
dc.subject.en: Shared memory
dc.subject.en: NUMA
dc.subject.en: Kernel
dc.subject.en: Collective communication
dc.title.en: Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs
dc.type: Report
dc.subject.hal: Computer Science [cs]/Operating Systems [cs.OS]
bordeaux.page: 11
bordeaux.hal.laboratories: Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800*
bordeaux.institution: Université de Bordeaux
bordeaux.institution: Bordeaux INP
bordeaux.institution: CNRS
bordeaux.type.report: rr
hal.identifier: inria-00544872
hal.version: 1
hal.audience: Not specified
hal.origin.link: https://hal.archives-ouvertes.fr//inria-00544872v1
bordeaux.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2010-12&rft.spage=11&rft.epage=11&rft.au=MA,%20Teng&BOSILCA,%20George&BOUTEILLER,%20Aur%C3%A9lien&GOGLIN,%20Brice&SQUYRES,%20Jeffrey&rft.genre=unknown


Files in this item

There are no files associated with this item.
