Conference proceedings article
Semiautomatic cache optimizations using OpenMP

Publication Details
Breitbart, J.
Publication year:
Pages range:
Book title:
Proceedings of the 10th Para: State of the Art in Scientific and Parallel Computing Conference (Para 2010) Reykjavík

The processing power of multicore CPUs increases at a high rate, whereas memory bandwidth is falling behind. Almost all modern processors use multiple cache levels to overcome the penalty of slow main memory; however, cache efficiency is directly bound to data locality. This paper studies a possible way to incorporate data locality exposure into the syntax of the parallel programming system OpenMP. We study data locality optimizations on two applications: matrix multiplication and the Gauß-Seidel stencil. We show that only small changes to OpenMP are required to expose data locality so a compiler can transform the code. Our notion of tiled loops allows developers to easily describe data locality even in scenarios with non-trivial data dependencies. Furthermore, we describe two new optimization techniques. One explicitly uses a form of local memory to prevent conflict cache misses, whereas the second one modifies the wavefront parallel programming pattern with dynamically sized blocks to increase the number of parallel tasks. As an additional contribution we explore the benefit of using multiple levels of tiling.

