STANISIC, Luka; THIBAULT, Samuel; LEGRAND, Arnaud; VIDEAU, Brice; MÉHAUT, Jean-François

hal.structure.identifier	Middleware efficiently scalable [MESCAL]
hal.structure.identifier	Laboratoire d'Informatique de Grenoble [LIG]
dc.contributor.author	STANISIC, Luka
hal.structure.identifier	Laboratoire Bordelais de Recherche en Informatique [LaBRI]
hal.structure.identifier	Efficient runtime systems for parallel architectures [RUNTIME]
dc.contributor.author	THIBAULT, Samuel
hal.structure.identifier	Middleware efficiently scalable [MESCAL]
hal.structure.identifier	Laboratoire d'Informatique de Grenoble [LIG]
dc.contributor.author	LEGRAND, Arnaud
hal.structure.identifier	Laboratoire d'Informatique de Grenoble [LIG]
dc.contributor.author	VIDEAU, Brice
hal.structure.identifier	Laboratoire d'Informatique de Grenoble [LIG]
dc.contributor.author	MÉHAUT, Jean-François
dc.date.accessioned	2024-04-15T09:41:35Z
dc.date.available	2024-04-15T09:41:35Z
dc.date.created	2014-02-13
dc.date.issued	2014-03-27
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/197611
dc.description.abstract	Multi-core architectures comprising several GPUs have become common in high-performance computing (HPC). However, extracting the maximum performance from such heterogeneous architectures is a challenge that requires precisely scheduling the computations on the different processing units, together with the associated data transfers. The most promising approaches rely on task-based middleware (runtimes) that present an abstraction of the machine, on which the tasks are scheduled opportunistically. Consequently, the whole difficulty becomes choosing the task granularity and the structure of the task graph, and optimizing the scheduling algorithms. Evaluating combinations of these parameters is a challenge in itself. Indeed, obtaining accurate measurements requires exclusive access to the resources for the entire duration of the experiments. Moreover, since these observations are limited to the few available systems, it can be difficult to draw general conclusions from them. In this research report, we show how we simulated/emulated StarPU, a dynamic runtime for hybrid architectures, using SimGrid, a versatile simulator of distributed architectures. This approach makes it possible to obtain, very quickly, performance predictions accurate to within a few percent on dense linear algebra kernels. This allows application and runtime developers to quickly decide which optimization to enable, or to assess whether it is worth upgrading the architecture.
dc.description.abstractEn	Multi-core architectures comprising several GPUs have become mainstream in the field of High-Performance Computing. However, obtaining the maximum performance from such heterogeneous machines is challenging, as it requires carefully offloading computations and managing data movements between the different processing units. The most promising and successful approaches so far rely on task-based runtimes that abstract the machine and rely on opportunistic scheduling algorithms. As a consequence, the problem shifts to choosing the task granularity and the task graph structure, and to optimizing the scheduling strategies. Trying different combinations of these alternatives is itself a challenge. Indeed, getting accurate measurements requires reserving the target system for the whole duration of the experiments. Furthermore, observations are limited to the few systems at hand and may be difficult to generalize. In this research report, we show how we crafted a coarse-grain hybrid simulation/emulation of StarPU, a dynamic runtime for hybrid architectures, over SimGrid, a versatile simulator for distributed systems. This approach makes it possible to obtain performance predictions accurate to within a few percent on classical dense linear algebra kernels in a matter of seconds, which allows both runtime and application designers to quickly decide which optimization to enable, or whether it is worth investing in higher-end GPUs.
dc.description.sponsorship	Simulation de systèmes de prochaine génération (Simulation of Next-Generation Systems) - ANR-11-INFR-0013
dc.language.iso	en
dc.subject.en	Simulation
dc.subject.en	Runtime
dc.subject.en	Hybrid platforms
dc.title.en	Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures
dc.type	Report
dc.subject.hal	Computer Science [cs]/Modeling and Simulation
dc.subject.hal	Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
bordeaux.hal.laboratories	Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.type.institution	INRIA
bordeaux.type.report	rr
hal.identifier	hal-00966862
hal.version	1
hal.audience	Not specified
hal.origin.link	https://hal.archives-ouvertes.fr//hal-00966862v1
bordeaux.COinS	ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.date=2014-03-27&rft.au=STANISIC,%20Luka&THIBAULT,%20Samuel&LEGRAND,%20Arnaud&VIDEAU,%20Brice&M%C3%89HAUT,%20Jean-Fran%C3%A7ois&rft.genre=unknown

Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

Laboratoire Bordelais de Recherche en Informatique (LaBRI) - UMR 5800
