New hybrid computing systems consist of a multicore CPU (central processing unit) and one or more massively parallel accelerator devices, such as GPUs (graphics processing units). Effectively utilizing these systems means using all of the available computational resources, which can be difficult to program. Computing libraries have long existed to alleviate programmer burden and to provide high-performing, well-tested implementations of common tasks. Unfortunately, the library space is very fragmented, even where libraries cover similar functionality. In the accelerated computing space, this fragmentation is compounded by competing, manufacturer-specific libraries and products that are incompatible with one another. The work described here seeks to overcome much of this burden by creating libraries that are truly cross-platform, supporting hardware from different manufacturers, of different generations, and of differing levels of parallelism.

A partial implementation of the BLAS interface library was created for OpenCL devices. The work builds on the prior art in two specific ways:

  1. Two-layered implementation approach: The first layer consists of “reference” implementations designed to run on a wide variety of hardware from a variety of manufacturers. The second layer is an apparatus for hooking in other pieces of code, either open- or closed-source, that are optimized for a particular scenario (a specific hardware manufacturer, a specific device, or a specific set of input parameters); a sketch of such a hook follows this list.
  2. Support across entire hardware classes: Despite its promise of being cross-platform, OpenCL in many situations fails to deliver the ability to run the same code on both the CPU and the GPU. The hooking apparatus described in item 1 above allows the libraries to provide this capability.
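The layering can be pictured as a small per-routine registry: the reference implementation is always present, and optimized variants are hooked in alongside it. The C sketch below is purely illustrative; the type and function names (`gemm_context`, `register_gemm_impl`, and so on) are hypothetical and are not part of the library's actual API.

```c
#include <stddef.h>

/* Coarse device class; a real implementation would query OpenCL
 * (e.g., CL_DEVICE_TYPE) rather than use this stand-in enum. */
typedef enum { DEVICE_CPU, DEVICE_GPU } device_kind;

/* Everything the selector needs to know about one library call. */
typedef struct {
    const char *vendor;   /* device vendor string */
    device_kind device;   /* CPU or GPU */
    size_t m, n, k;       /* problem dimensions for a GEMM-like routine */
} gemm_context;

/* An optimized variant advertises when it applies and how to run. */
typedef int  (*gemm_applies_fn)(const gemm_context *ctx);
typedef void (*gemm_impl_fn)(const gemm_context *ctx,
                             float alpha, const float *A, const float *B,
                             float beta, float *C);

typedef struct {
    gemm_applies_fn applies;
    gemm_impl_fn    run;
} gemm_entry;

#define MAX_GEMM_IMPLS 16
static gemm_entry gemm_registry[MAX_GEMM_IMPLS];
static size_t     gemm_count;

/* Second layer: hook in an optimized (possibly closed-source) variant.
 * Returns 0 on success, -1 if the registry is full. */
int register_gemm_impl(gemm_applies_fn applies, gemm_impl_fn run)
{
    if (gemm_count >= MAX_GEMM_IMPLS)
        return -1;
    gemm_registry[gemm_count].applies = applies;
    gemm_registry[gemm_count].run     = run;
    gemm_count++;
    return 0;
}
```

Under this kind of scheme, an optimized variant supplied by a hardware vendor would simply register itself with a predicate that matches its own devices or problem sizes.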

The key to achieving both goals is the selection code that chooses the appropriate implementation for a given library call.
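Continuing the hypothetical registry above, the selection code might amount to little more than a walk over the registered optimized variants with a guaranteed fall-back to the portable reference layer. Again, this is a sketch under those assumptions, not the library's actual dispatch logic.

```c
/* Continues the hypothetical registry sketch above. reference_gemm() stands
 * for the portable first-layer kernel that runs on any OpenCL device. */
void reference_gemm(const gemm_context *ctx,
                    float alpha, const float *A, const float *B,
                    float beta, float *C);

/* Try the registered optimized variants first; fall back to the reference layer. */
void dispatch_gemm(const gemm_context *ctx,
                   float alpha, const float *A, const float *B,
                   float beta, float *C)
{
    for (size_t i = 0; i < gemm_count; i++) {
        if (gemm_registry[i].applies && gemm_registry[i].applies(ctx)) {
            gemm_registry[i].run(ctx, alpha, A, B, beta, C);
            return;
        }
    }
    reference_gemm(ctx, alpha, A, B, beta, C);
}
```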

In the first phase, the work was limited to the BLAS library, which contains fundamental linear algebra operations. A small exploratory effort was made into the LAPACK library, which builds on BLAS and contains more advanced operations. Future work will expand into many other library areas.
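For concreteness, the kind of operation BLAS provides: the Level-3 routine GEMM computes C ← αAB + βC. The call below uses the standard CBLAS interface (available from any conforming BLAS implementation) purely to illustrate the interface area the work targets; it is not this library's own binding.

```c
#include <cblas.h>

/* GEMM, the canonical BLAS Level-3 routine: C <- alpha*A*B + beta*C,
 * with A an MxK matrix, B KxN, and C MxN, all stored row-major. */
void multiply(int M, int N, int K,
              const float *A, const float *B, float *C)
{
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f,       /* alpha */
                A, K,       /* A and its leading dimension */
                B, N,       /* B and its leading dimension */
                0.0f,       /* beta */
                C, N);      /* C and its leading dimension */
}
```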

The principal advantage to developers is the ability to adapt rapidly to different hardware architectures, with or without accelerator technology, or to switch to hardware from a different manufacturer. A secondary advantage is a much more consistent experience across the different mathematical libraries that are often used together in real-world applications; at present that experience is very disjointed.

This work was done by John Humphrey of EM Photonics, Inc. for Goddard Space Flight Center. GSC-16694-1