Contribution to conference proceedings
OpenMP for next generation heterogeneous clusters

Publication details
Breitbart, J.
Place of publication:
Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HOTPAR 2010)

Abstract
Recent years have seen great diversity in new hardware, with GPUs, for example, providing several times the processing power of CPUs. Programming GPUs or clusters of GPUs, however, is still complicated and time-consuming. In this paper we present extensions to OpenMP that allow a single program to scale from one multi-core CPU to a many-core cluster (e.g., a GPU cluster). We extend OpenMP with a new scheduling clause that lets developers request automatic tiling, together with library functions that return the tile size and the number of the tile currently being computed. We further demonstrate that the intra-tile parallelization can be derived automatically from the inter-tile parallelization, which allows programs to scale to shared-memory many-core architectures. To use OpenMP on distributed-memory systems, we propose a PGAS-like memory level called world memory. World memory not only allows data to be shared among multiple processes but also enables fine-grained synchronization between them. World memory has two states: initialized and uninitialized. A process that reads from uninitialized memory is suspended until another process writes to that memory and thereby initializes it. This concept requires oversaturating the available hardware with processes.
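The abstract describes world memory only in terms of its semantics: a read of uninitialized memory suspends the reader until a write initializes it. As a rough single-machine illustration of that behaviour, the C sketch below models one word of world memory as a single-assignment cell built on POSIX threads. The type `world_cell` and its functions are hypothetical names chosen for this example, not the paper's API, and threads stand in for processes; an actual distributed implementation would sit on a PGAS-style runtime rather than a mutex and condition variable.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one word of "world memory" (names are illustrative,
 * not from the paper). Threads stand in for processes in this sketch. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  became_initialized;
    bool            initialized;   /* the two states: initialized / uninitialized */
    int             value;
} world_cell;

void world_cell_init(world_cell *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->became_initialized, NULL);
    c->initialized = false;        /* every cell starts uninitialized */
    c->value = 0;
}

void world_cell_destroy(world_cell *c) {
    pthread_cond_destroy(&c->became_initialized);
    pthread_mutex_destroy(&c->lock);
}

/* A write stores the value, marks the cell initialized,
 * and wakes every reader suspended on it. */
void world_cell_write(world_cell *c, int v) {
    pthread_mutex_lock(&c->lock);
    c->value = v;
    c->initialized = true;
    pthread_cond_broadcast(&c->became_initialized);
    pthread_mutex_unlock(&c->lock);
}

/* A read suspends the caller for as long as the cell is uninitialized. */
int world_cell_read(world_cell *c) {
    pthread_mutex_lock(&c->lock);
    while (!c->initialized)        /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->became_initialized, &c->lock);
    int v = c->value;
    pthread_mutex_unlock(&c->lock);
    return v;
}

/* Producer: initializes the cell, possibly after the consumer started reading. */
static void *producer(void *arg) {
    world_cell_write((world_cell *)arg, 42);
    return NULL;
}

/* Demo: the calling thread reads the cell and is suspended
 * if the producer has not written it yet. */
int demo_suspended_read(void) {
    world_cell cell;
    world_cell_init(&cell);
    pthread_t t;
    pthread_create(&t, NULL, producer, &cell);
    int v = world_cell_read(&cell);   /* returns only after the write */
    pthread_join(t, NULL);
    world_cell_destroy(&cell);
    return v;
}
```

Because a suspended reader leaves its thread idle, a runtime built on this idea needs more processes (or threads) than cores to keep the hardware busy, which is consistent with the abstract's remark about oversaturating the hardware with processes.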


Last updated 2019-07-25 at 11:01