Conference proceedings paper
Semiautomatic cache optimizations using OpenMP



Publication details
Author(s):
Breitbart, J.
Publisher:
Springer-Verlag
Place of publication:
Iceland
Year of publication:
2011
Page range:
TBD
Book title:
Proceedings of the 10th Para: State of the Art in Scientific and Parallel Computing Conference (Para 2010) Reykjavík

Abstract
The processing power of multicore CPUs increases at a high rate, whereas memory bandwidth is falling behind. Almost all modern processors use multiple cache levels to overcome the penalty of slow main memory; however, cache efficiency is directly bound to data locality. This paper studies a possible way to incorporate data locality exposure into the syntax of the parallel programming system OpenMP. We study data locality optimizations on two applications: matrix multiplication and a Gauss-Seidel stencil. We show that only small changes to OpenMP are required to expose data locality so that a compiler can transform the code. Our notion of tiled loops allows developers to easily describe data locality even in scenarios with non-trivial data dependencies. Furthermore, we describe two new optimization techniques. The first explicitly uses a form of local memory to prevent conflict cache misses, whereas the second modifies the wavefront parallel programming pattern with dynamically sized blocks to increase the number of parallel tasks. As an additional contribution, we explore the benefit of using multiple levels of tiling.
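
To illustrate the kind of data locality the abstract refers to, the following is a minimal sketch of a hand-tiled matrix multiplication using only standard OpenMP. It is not the tiled-loop syntax proposed in the paper, which this record does not reproduce. The function name `matmul_tiled`, the helper `min_int`, and the `tile` parameter are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two ints; used to clip the last, possibly partial, tile. */
static int min_int(int a, int b) { return a < b ? a : b; }

/*
 * C += A * B for row-major n x n matrices, computed tile by tile.
 * `tile` is a hypothetical tuning parameter (tile edge length in elements);
 * it would normally be chosen so that three tiles fit into the cache.
 */
void matmul_tiled(int n, int tile, const double *A, const double *B, double *C)
{
    assert(n > 0 && tile > 0);

    /* Parallelize over tile rows of C: each thread writes disjoint rows. */
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < n; ii += tile) {
        for (int kk = 0; kk < n; kk += tile) {
            for (int jj = 0; jj < n; jj += tile) {
                int i_end = min_int(ii + tile, n);
                int k_end = min_int(kk + tile, n);
                int j_end = min_int(jj + tile, n);

                /* Work inside one tile: operands are reused while cached. */
                for (int i = ii; i < i_end; ++i) {
                    for (int k = kk; k < k_end; ++k) {
                        double a = A[(size_t)i * n + k];
                        for (int j = jj; j < j_end; ++j) {
                            C[(size_t)i * n + j] += a * B[(size_t)k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The three outer loops select a tile and the three inner loops do the arithmetic within it. This manual restructuring is the sort of transformation the abstract suggests a compiler could perform once data locality is exposed in the source. The caller is expected to zero-initialize `C` if a plain product is wanted.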


Last updated 2019-25-07 at 15:43
