Scheduling Dynamic OpenMP Applications over Multicore Architectures
BROQUEDIS, François
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
DIAKHATE, François
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
DAM Île-de-France [DAM/DIF]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
DAM Île-de-France [DAM/DIF]
THIBAULT, Samuel
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Voir plus >
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
BROQUEDIS, François
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
DIAKHATE, François
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
DAM Île-de-France [DAM/DIF]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
DAM Île-de-France [DAM/DIF]
THIBAULT, Samuel
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
AUMAGE, Olivier
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
NAMYST, Raymond
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
WACRENIER, Pierre-André
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
< Réduire
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Langue
en
Communication dans un congrès
Ce document a été publié dans
International Workshop on OpenMP, 2008-05, West Lafayette, IN. 2008
Résumé en anglais
Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data among the underlying non-uniform architecture in order to minimize cache misses and NUMA ...Lire la suite >
Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data among the underlying non-uniform architecture in order to minimize cache misses and NUMA penalties. While it is acknowledged that OpenMP can enhance the quality of thread scheduling on such architectures in a portable way, by transmitting precious information about the affinities between threads and data to the underlying runtime system, most OpenMP runtime systems are actually unable to efficiently support highly irregular, massively parallel applications on NUMA machines. In this paper, we present a thread scheduling policy suited to the execution of OpenMP programs featuring irregular and massive nested parallelism over hierarchical architectures. Our policy enforces a distribution of threads that maximizes the proximity of threads belonging to the same parallel section, and uses a NUMA-aware work stealing strategy when load balancing is needed. It has been developed as a plug-in to the ForestGOMP OpenMP platform. We demonstrate the efficiency of our approach with a highly irregular recursive OpenMP program resulting from the generic parallelization of a surface reconstruction application. We achieve a speedup of 14 on a 16-core machine with no application-level optimization.< Réduire
Mots clés en anglais
OpenMP
Nested Parallelism
Hierarchical Thread Scheduling
Bubbles
Multi-Core
NUMA
SMP
Project ANR
Parallélisme et Amélioration du Rendement des Apllications - ANR-05-CIGC-0001
Origine
Importé de halUnités de recherche