Performance Analysis and Optimization of the Tiled Cholesky Factorization on NUMA Machines
JEANNOT, Emmanuel
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
JEANNOT, Emmanuel
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
< Reduce
Efficient runtime systems for parallel architectures [RUNTIME]
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Language
en
Communication dans un congrès
This item was published in
PAAP 2012 - IEEE International Symposium on Parallel Architectures, Algorithms and Programming, 2012-12-17, Taipei. 2012-12
IEEE
English Abstract
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access-time (NUMA) shared memory machines. We show how to optimize thread placement and data placement in order to achieve ...Read more >
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access-time (NUMA) shared memory machines. We show how to optimize thread placement and data placement in order to achieve performance gain up to 50% compared to state-of-the-art libraries such as Plasma or MKL.Read less <
Origin
Hal imported