Type of publication
Poster presentation

Raul Torres, Vivek Kale, Abid Malik, Tom Scogland, Roger Ferrer, Barbara Chapman

Conference / Journal
SC 21
Year of publication

Nodes of emerging supercomputers contain multiple GPUs, i.e., each node is a multi-GPU node. Applications are often parallelized across the GPUs of such a node using MPI, but a more performant and portable solution is needed. OpenMP, which is already used to parallelize computation within a multi-core CPU or a single GPU, could facilitate parallelization across the GPUs in a performant and portable way, e.g., through lower memory requirements than MPI and through directive-based parallelization. In this work, we present a solution that adds support in OpenMP for parallelizing an application across the GPUs of a multi-GPU node through language extensions and compiler optimizations developed in LLVM's OpenMP implementation. Preliminary experiments with our solution using the Stream benchmark on a cluster node with four GPUs suggest that our approach can be a performant, portable, and easy-to-use way for application programmers to harness the computational power of a node's GPUs.
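For context, the sketch below shows how a Stream-style triad kernel can be split across the GPUs of a node by hand, using only standard OpenMP target offloading (the `device` and `nowait` clauses). It is not the language extension presented in this work, whose syntax the abstract does not give. It only illustrates the manual chunking that such an extension would presumably aim to hide. The function name `triad_multi_device` and the even chunking scheme are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not from the poster): STREAM-style triad
   a[i] = b[i] + scalar * c[i], split into contiguous chunks, one per
   OpenMP target device. Uses standard OpenMP 4.5 offloading only. */
void triad_multi_device(double *a, const double *b, const double *c,
                        double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    int ndev = 1;
#ifdef _OPENMP
    ndev = omp_get_num_devices();   /* number of non-host devices */
    if (ndev < 1)
        ndev = 1;                   /* no GPU found: run one chunk on the host */
#endif
    size_t chunk = (n + (size_t)ndev - 1) / (size_t)ndev;

    for (int d = 0; d < ndev; ++d) {
        size_t lo = (size_t)d * chunk;
        if (lo >= n)
            break;
        size_t len = (n - lo < chunk) ? (n - lo) : chunk;

        /* One asynchronous target region per device; each device receives
           and returns only its own slice of the arrays. */
        #pragma omp target teams distribute parallel for device(d) nowait \
            map(to: b[lo:len], c[lo:len]) map(from: a[lo:len])
        for (size_t i = lo; i < lo + len; ++i)
            a[i] = b[i] + scalar * c[i];
    }

    /* Wait for all deferred target tasks launched above. */
    #pragma omp taskwait
}
```

Built without OpenMP support, the pragmas are ignored and the loop runs serially on the host, so the function gives the same result either way. The per-device bookkeeping (`lo`, `len`, array-section maps) is the part the programmer must currently write and maintain by hand.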