A Generic and High Performance Approach for Fault Tolerance in Communication Library
hal.structure.identifier | Computer Science Department [CST] | |
dc.contributor.author | TRAHAY, François | |
hal.structure.identifier | Laboratoire Bordelais de Recherche en Informatique [LaBRI] | |
hal.structure.identifier | Efficient runtime systems for parallel architectures [RUNTIME] | |
dc.contributor.author | DENIS, Alexandre | |
hal.structure.identifier | Computer Science Department [CST] | |
dc.contributor.author | ISHIKAWA, Yutaka | |
dc.date.accessioned | 2024-04-15T09:43:48Z | |
dc.date.available | 2024-04-15T09:43:48Z | |
dc.date.issued | 2010-12-10 | |
dc.identifier.uri | https://oskar-bordeaux.fr/handle/20.500.12278/197791 | |
dc.description.abstractEn | As the number of nodes in clusters increases, so does the probability of failures. In this paper, we study failures in the network stack of high-performance networks. We present the design of several fault-tolerance mechanisms that allow communication libraries to detect failures and to ensure message integrity. We have implemented these mechanisms in the NewMadeleine communication library, which detects failures quickly and portably, and falls back to the remaining available links when an error occurs. Our mechanisms ensure message integrity without significantly degrading network performance. Our evaluation shows that ensuring fault tolerance does not significantly impact the performance of most applications. | |
dc.language.iso | en | |
dc.subject.en | NewMadeleine | |
dc.subject.en | MPI | |
dc.subject.en | MadMPI | |
dc.subject.en | pioman | |
dc.title.en | A Generic and High Performance Approach for Fault Tolerance in Communication Library | |
dc.type | Report | |
dc.subject.hal | Computer Science [cs]/Networking and Internet Architecture [cs.NI] | |
bordeaux.hal.laboratories | Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800 | * |
bordeaux.institution | Université de Bordeaux | |
bordeaux.institution | Bordeaux INP | |
bordeaux.institution | CNRS | |
bordeaux.type.institution | INRIA Bordeaux | |
bordeaux.type.report | rr | |
hal.identifier | hal-00793176 | |
hal.version | 1 | |
hal.origin.link | https://hal.archives-ouvertes.fr//hal-00793176v1 | |
Files in this item
Files | Size | Format | View |
---|---|---|---|
There are no files associated with this item. |