Performance Portable Stencil Code Generation with Lift


Slides of the talk

Stencil computations are used in a wide range of applications, from physical simulations to machine learning. They expose a large amount of parallelism and are a perfect fit for modern parallel hardware such as GPUs (Graphics Processing Units). Although stencil computations have been studied for several decades, optimizing and tuning them for different hardware remains challenging for most programmers.
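To make the notion of a stencil concrete, here is a minimal sketch of a one-dimensional 3-point stencil. The function name `stencil_3pt`, the weights, and the clamped boundary handling are assumptions chosen for illustration; they do not come from the talk.

```python
def stencil_3pt(xs, weights=(0.25, 0.5, 0.25)):
    """Apply a weighted 3-point stencil to a 1D sequence.

    Out-of-range neighbours are clamped to the nearest valid element
    (an illustrative boundary condition, not the only possible one).
    """
    n = len(xs)
    out = []
    for i in range(n):
        left = xs[max(i - 1, 0)]        # clamp at the left edge
        right = xs[min(i + 1, n - 1)]   # clamp at the right edge
        out.append(weights[0] * left + weights[1] * xs[i] + weights[2] * right)
    return out


if __name__ == "__main__":
    print(stencil_3pt([0.0, 4.0, 0.0]))
```

Each output element reads only from the input, never from other outputs, so every loop iteration is independent. That independence is the source of the parallelism mentioned above.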

DSLs (Domain-Specific Languages) have shown that it is possible to raise the programming abstraction and offer good performance at the same time. However, these approaches shift the burden onto DSL implementers, who have to write an almost full-fledged compiler and optimizer, with little reuse between different DSLs. LIFT has recently emerged as a new approach to achieving performance portability. It is based on a small set of reusable parallel primitives that DSL or library writers can target. LIFT's key novelty is its encoding of optimizations as a system of rewrite rules, which are used to explore the optimization space. This approach to optimization is easily extensible and requires very little effort from compiler writers. However, LIFT has mostly focused on linear algebra operations, and it remains to be seen whether this approach is applicable to other domains.
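As a rough illustration of what "optimizations as rewrite rules" can mean, the sketch below applies the classic map-fusion rule, map(f, map(g, xs)) → map(f ∘ g, xs), to a tiny expression tree. The `Map` class and the `fuse_maps` and `evaluate` helpers are hypothetical names invented for this example; they are not LIFT's actual API or rule set.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Map:
    """Hypothetical expression node: apply `f` to every element of `source`."""
    f: Callable[[Any], Any]
    source: Any  # either another Map node or a plain list


def fuse_maps(expr):
    """Rewrite Map(f, Map(g, src)) into Map(f . g, src); leave other shapes unchanged."""
    if isinstance(expr, Map) and isinstance(expr.source, Map):
        f, g, src = expr.f, expr.source.f, expr.source.source
        return Map(lambda x: f(g(x)), src)
    return expr


def evaluate(expr):
    """Reference interpreter, used to check that a rewrite preserves meaning."""
    if isinstance(expr, Map):
        return [expr.f(x) for x in evaluate(expr.source)]
    return list(expr)


if __name__ == "__main__":
    before = Map(lambda x: x + 1, Map(lambda x: x * 2, [1, 2, 3]))
    after = fuse_maps(before)
    print(evaluate(before), evaluate(after))
```

The rewritten expression computes the same result with one traversal instead of two. A rule-based optimizer explores many such semantics-preserving rewrites and keeps the variants that perform best on the target hardware.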

In this talk, I will explain how we put this to the test by extending LIFT with support for stencil computations. By leveraging the existing LIFT primitives and optimizations, we only require the addition of a very small number of primitives together with a few rewrite rules. Performance results on several stencil applications show that this approach leads to high performance.
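The abstract does not name the added primitives. One way such a decomposition could look is sketched below, assuming two hypothetical helpers: `pad` for boundary handling and `slide` for building overlapping neighbourhoods, composed with an ordinary map. The names, signatures, and clamping behaviour are assumptions for illustration only.

```python
def pad(xs, left, right):
    """Extend a non-empty list by repeating its edge elements (clamped boundary)."""
    return [xs[0]] * left + list(xs) + [xs[-1]] * right


def slide(xs, size, step):
    """Return overlapping windows of length `size`, advancing by `step`."""
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]


def stencil(xs, f):
    """3-point stencil built from pad, slide, and a map over neighbourhoods."""
    if not xs:
        return []
    return [f(neighbourhood) for neighbourhood in slide(pad(xs, 1, 1), 3, 1)]


if __name__ == "__main__":
    print(stencil([1, 2, 3, 4], sum))
```

Separating boundary handling, neighbourhood creation, and the per-neighbourhood function means each piece can be reused and transformed on its own, which is the kind of composition a small set of primitives makes possible.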