AUGONNET, Cédric; THIBAULT, Samuel; NAMYST, Raymond


hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	AUGONNET, Cédric
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	THIBAULT, Samuel
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	NAMYST, Raymond
dc.date.accessioned	2024-04-15T09:49:29Z
dc.date.available	2024-04-15T09:49:29Z
dc.date.issued	2010-03
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/198263
dc.description.abstractEn	Multicore machines equipped with accelerators are becoming increasingly popular. The TOP500-leading RoadRunner machine is probably the most famous example of a parallel computer mixing IBM Cell Broadband Engines and AMD Opteron processors. Other architectures, featuring GPU accelerators, are expected to appear in the near future. To fully tap into the potential of these hybrid machines, pure offloading approaches, in which the core of the application runs on regular processors and offloads specific parts to accelerators, are not sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. To face this challenge, we propose a new runtime system capable of scheduling tasks over heterogeneous, accelerator-based machines. Our system features a software virtual shared memory that provides a weak consistency model. The system keeps track of data copies within the accelerators' embedded memories and features a data-prefetching engine. Such facilities, together with a database of self-tuned per-task performance models, can be used to greatly improve the quality of scheduling policies in this context. We demonstrate the relevance of our approach by benchmarking various parallel numerical kernel implementations over our runtime system. We obtain significant speedups and a very high efficiency on various typical workloads over multicore machines equipped with multiple accelerators.
dc.description.sponsorship	Programmation des technologies multicoeurs hétérogènes - ANR-08-COSI-0013
dc.language.iso	en
dc.title.en	StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines
dc.type	Report
dc.subject.hal	Computer Science [cs]/Operating Systems [cs.OS]
bordeaux.page	33
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800	*
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.type.institution	INRIA
bordeaux.type.report	rr
hal.identifier	inria-00467677
hal.version	1
hal.audience	Not specified
hal.origin.link	https://hal.archives-ouvertes.fr//inria-00467677v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2010-03&rft.spage=33&rft.epage=33&rft.au=AUGONNET,%20C%C3%A9dric&rft.au=THIBAULT,%20Samuel&rft.au=NAMYST,%20Raymond&rft.genre=unknown
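The abstract's central scheduling idea, that a database of per-task performance models lets the runtime decide which processing unit should execute each task, can be illustrated with a greedy earliest-finish-time rule. The sketch below is a hypothetical Python illustration, not StarPU's C API and not one of its actual scheduling policies. `Worker`, `schedule`, and the `perf_model` table of predicted durations are assumptions introduced here. For brevity it ignores data transfers, prefetching, and task dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    kind: str                   # processing-unit class, e.g. "cpu" or "gpu" (hypothetical labels)
    available_at: float = 0.0   # predicted time at which this worker becomes idle
    assigned: list = field(default_factory=list)


def schedule(tasks, workers, perf_model):
    """Greedy earliest-finish-time assignment driven by a performance model.

    tasks: (task_id, task_type) pairs, in submission order.
    perf_model: maps (task_type, worker_kind) to a predicted duration; a missing
        entry means no implementation of that task type exists for that kind of unit.
    Ties are broken in favour of the worker listed first.
    """
    for task_id, task_type in tasks:
        best, best_finish = None, float("inf")
        for w in workers:
            duration = perf_model.get((task_type, w.kind))
            if duration is None:
                continue  # this unit cannot run the task
            finish = w.available_at + duration
            if finish < best_finish:
                best, best_finish = w, finish
        if best is None:
            raise ValueError(f"no implementation available for task {task_id!r}")
        best.assigned.append(task_id)
        best.available_at = best_finish
    return {w.name: w.assigned for w in workers}


if __name__ == "__main__":
    workers = [Worker("cpu0", "cpu"), Worker("cpu1", "cpu"), Worker("gpu0", "gpu")]
    # Hypothetical model: the kernel is predicted to run 5x faster on the GPU.
    perf_model = {("gemm", "cpu"): 10.0, ("gemm", "gpu"): 2.0}
    tasks = [(f"t{i}", "gemm") for i in range(6)]
    print(schedule(tasks, workers, perf_model))
    # {'cpu0': ['t4'], 'cpu1': ['t5'], 'gpu0': ['t0', 't1', 't2', 't3']}
```

In this toy run the GPU absorbs the first four tasks because it is predicted to be much faster. Once its queue grows long enough, the CPUs become the earliest-finishing choice and receive work too. That is the "spread across the entire machine" behaviour the abstract contrasts with pure offloading, where the CPUs would stay idle.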


This item appears in the following Collection(s)

Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
