Conference proceedings article
A Malleable and Fault-Tolerant Task Pool Framework for X10



Publication Details
Authors:
Bungart, M.; Fohry, C.
Editor:
IEEE
Publisher:
Curran Associates
Place:
Red Hook, New York
Publication year:
2017
Page range:
749-757
Book title:
2017 IEEE International Conference on Cluster Computing (CLUSTER)
ISBN:
978-1-5386-2326-8

Abstract
Current HPC environments require parallel programs that are both malleable and fault-tolerant. Malleability denotes the ability to embrace system-initiated resource changes, and fault tolerance denotes the ability to cope with, e.g., permanent node failures. This paper considers the task pool pattern, specifically its lifeline-based variant. It builds on a previous fault-tolerant realization, and integrates the ability to add resources. We suggest a growth protocol that is able to cope with failures of old and new resources during its execution. New resources replace failed ones in one data structure, while getting a new role in another. The algorithm was implemented in a framework for the programming language X10. Correctness tests and performance measurements used the Unbalanced Tree Search (UTS) benchmark. We compared the performance on a constant vs. varying number of workers with the same average, and observed negligible differences in execution time and task throughput.

