Contribution to conference proceedings
A Malleable and Fault-Tolerant Task Pool Framework for X10
Publication details
Authors: | Bungart, M.; Fohry, C. |
Editor: | IEEE |
Publisher: | Curran Associates |
Place of publication: | Red Hook, New York |
Year of publication: | 2017 |
Pages: | 749-757 |
Book title: | 2017 IEEE International Conference on Cluster Computing (CLUSTER) |
ISBN: | 978-1-5386-2326-8 |
DOI link of the original publication: |
Abstract
Current HPC environments require parallel programs that are both malleable and fault-tolerant. Malleability denotes the ability to embrace system-initiated resource changes, and fault tolerance denotes the ability to cope with, e.g., permanent node failures. This paper considers the task pool pattern, specifically its lifeline-based variant. It builds on a previous fault-tolerant realization and integrates the ability to add resources. We suggest a growth protocol that is able to cope with failures of old and new resources during its execution. New resources replace failed ones in one data structure, while getting a new role in another. The algorithm was implemented in a framework for the programming language X10. Correctness tests and performance measurements used the Unbalanced Tree Search (UTS) benchmark. We compared the performance on a constant vs. a varying number of workers with the same average, and observed negligible differences in execution time and task throughput.