Improving Message Passing over Ethernet with I/OAT Copy Offload in Open-MX
GOGLIN, Brice
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Efficient runtime systems for parallel architectures [RUNTIME]
Language
en
Conference paper
Published in
Cluster 2008, 2008-09-29, Tsukuba. 2008
Abstract
Open-MX is a new message passing layer implemented on top of the generic Ethernet stack of the Linux kernel. Open-MX works on all Ethernet hardware, but it suffers from expensive memory copy requirements on the receiver side because the hardware cannot deposit messages directly into the target application buffers. This article presents the implementation of an asynchronous memory copy offload in the Open-MX stack using Intel I/O Acceleration Technology (I/OAT). Overlapping the copies of large message fragments with processing increases the receive throughput by 30% while reducing CPU usage by up to 40%. It enables Open-MX to reach the 10 gigabit/s Ethernet line rate for large messages. Large intra-node communication in Open-MX also benefits significantly from the I/OAT hardware, since the performance of its one-copy-based local communication mechanism is almost doubled by using blocking I/OAT memory copies. With all these optimizations combined, Open-MX large message performance on top of 10G hardware is now able to bridge the gap with the native Myrinet Express stack.
Origin
Imported from HAL