Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs
BARTHOU, Denis
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Language
en
Conference paper
This item is published in
IEEE Proceedings of High Performance Computing conference, High Performance Computing conference, 2012-12-18, p. 1-10
Abstract in English
Stencil-based computation on structured grids is a kernel at the heart of a large number of scientific applications. The variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a high-performance computing library. As the number of cores on a single chip has grown, meeting architectural alignment requirements has become even more important for high performance. In addition to vector accesses, data layout optimization must also consider concurrent parallel accesses. In this paper, we develop a strategy to automatically generate stencil codes for multicore vector architectures, searching for the best possible data layout to address architectural alignment problems. We introduce a new method for aligning multidimensional data structures, called multipadding, that can be adapted to the specificities of multicore and GPU architectures. We present multiple methods with different levels of complexity. We show on different stencil patterns that codes generated with multipadding perform better than existing optimizations.
ANR project
Vers le Petaflop pour LQCD - ANR-08-COSI-0010
Origin
Imported from HAL