Automatic Mapping of Stream Programs on Multicore Architectures
BARTHOU, Denis
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Language
en
Conference paper
This document was published in
International Workshop on Compilers for Parallel Computers, 2010-07-07, Vienna.
Abstract
Stream languages explicitly describe fork-join and pipeline parallelism, offering a powerful programming model for general multicore systems. This parallelism description can be exploited on hybrid architectures, e.g. those composed of Graphics Processing Units (GPUs) and general-purpose multicore processors. In this paper, we present a novel approach to optimize stream programs for hybrid architectures composed of GPUs and multicore CPUs. The approach focuses on the memory and communication performance bottlenecks of this kind of architecture. The initial task graph of the stream program is first transformed so as to reduce fork-join synchronization costs. The transformation is obtained by applying a sequence of elementary optimizing stream restructurings that enable communication-efficient mappings. Tasks are then scheduled in a software pipeline and coarsened with a coarsening level adapted to their placement (CPU or GPU). Our experiments show the importance of both the synchronization cost reduction and the coarsening step for performance, adapting the grain of parallelism to the CPUs and to the GPU.
Origin
Imported from HAL