JEANNOT, Emmanuel; MERCIER, Guillaume; TESSIER, François

The system will be going down for regular maintenance. Please save your work and logout.

Metadata

License

JEANNOT, Emmanuel
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]

MERCIER, Guillaume
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]

TESSIER, François
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]

Language

Rapport

This item was published in

2013-03-22p. 32

Abstract

Les générations actuelles de grappes de noeuds NUMA possèdent des processeurs multicoeurs ou manycore. Le programmation efficace de telles architectures est un véritable défi parce que de nombreux détails matériels doivent être pris en considération, en particulier la hiérarchie mémoire. Afin d'améliorer les performances des applications parallèles, une idée séduisante est de diminuer le coût de leurs communications en faisant correspondre leur schéma de communication à l'architecture matérielle sous-jacente. Dans ce rapport de recherche, nous détaillons l'algorithme et les techniques proposés afin d'obtenir ce résultat : d'abord, nous collectons deux informations-clefs, à savoir, le schéma de communication et les détails matériels de l'architecture-cible. Ensuite, nous calculons une permutation des numéros de rang des processus de l'application. Pour finir, ces nouveaux numéros de rang sont utilisés dans les opérations de communication en vue de diminuer les coûts de communication de l'application.Read less <

English Abstract

Current generations of NUMA node clusters feature multicore or manycore processors. Programming such architectures efficiently is a challenge because numerous hardware characteristics have to be taken into account, especially the memory hierarchy. One appealing idea to improve the performance of parallel applications is to decrease their communication costs by matching the communication pattern to the underlying hardware architecture. In this report, we detail the algorithm and techniques proposed to achieve such a result: first, we gather both the communication pattern information and the hardware details. Then we compute a relevant reordering of the various process ranks of the application. Finally, those new ranks are used to reduce the communication costs of the application.Read less <

English Keywords

Parallel programming

High performance computing

Multicore processing

Metadata

Share this item!

License

Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques

Language

This item was published in

Abstract

English Abstract

English Keywords

URI

Origin

Collections