BEAUMONT, Olivier; LAMBERT, Thomas; MARCHAL, Loris; THOMAS, Bastien

La plateforme OSKAR Bordeaux évolue pour rejoindre l'archive ouverte HAL. Retrouvez tous vos dépôts sur le nouveau portail HAL UB : https://u-bordeaux.hal.science/. Pour toute aide ou information, contactez-nous info@oskar-bordeaux.fr

Métadonnées

Afficher la notice complète

Licence d’utilisation du document

BEAUMONT, Olivier
Reformulations based algorithms for Combinatorial Optimization [Realopt]

LAMBERT, Thomas
Reformulations based algorithms for Combinatorial Optimization [Realopt]

MARCHAL, Loris
Optimisation des ressources : modèles, algorithmes et ordonnancement [ROMA]
Laboratoire de l'Informatique du Parallélisme [LIP]

Langue

Rapport

Ce document a été publié dans

2017-02

Résumé en anglais

MapReduce is a well-know framework for distributing data-processingcomputations onto parallel clusters. In MapReduce, a large computationis broken into small tasks that run in parallel on multiple machines,and scales easily to very large clusters of inexpensive commoditycomputers. Before the Map phase, the original dataset is split intodata chunks that are replicated (a constant number of times, usually3) and distributed randomly onto computing nodes. During the Mapphase, local tasks (i.e., tasks whose data chunks are stored locally)are assigned in priority when processors request tasks. In thispaper, we provide the first complete theoretical analysis of datalocality in the Map phase of MapReduce, and more generally, forbag-of-tasks applications that behave like MapReduce. We prove that iftasks are homogeneous (in terms of processing time), as soon as thereplication factor is larger than 2, FindAssignment, a matching basedalgorithm, achieves a quasi-perfect makespan (i.e., optimal up to anadditive constant of one step) using a sophisticated matchingalgorithm. Above result is proved with high probability when thenumber of tasks becomes arbitrarily large, and we therefore complementtheoretical results with simulations that corroborate them even forsmall number of tasks. We also show that the matching-based approachleads to an improvement of data locality during the Map phase andtherefore decreases the amount of communications needed to achieveperfect makespan, compared to the classical MapReduce greedy approach.< Réduire

Mots clés en anglais

Resource Allocation and Scheduling

Analysis of Randomized Algorithms

Matchings

MapReduce

Balls-into-bins

URI

https://oskar-bordeaux.fr/handle/20.500.12278/193525

Project ANR

Solveurs pour architectures hétérogènes utilisant des supports d'exécution - ANR-13-MONU-0007

Origine

Importé de hal

Unités de recherche

Institut de Mathématiques de Bordeaux (IMB) - UMR 5251