What is One API? This has been a common question since Intel announced its One API vision during Intel Architecture Day back in December 2018. The aim is to deliver a uniform software experience with optimal performance across a broad range of Intel hardware. There have been some press releases and high-level presentations depicting how One API is going to solve the programming challenge for scalar processors (CPUs), vector processors (GPUs), matrix processors (AI accelerators) as well as spatial processing elements (FPGAs). For people waiting for Intel Xe as the Xeon Phi successor, this is exciting. But for application developers the current situation is somewhat confusing, as it is difficult to answer some simple questions:
- What will the One API programming model look like?
- Will this be Intel's proprietary solution?
- Will this be fully compatible with AMD or NVIDIA GPUs?
- Should I port my application to CUDA/HIP/OpenACC/OpenMP or wait for One API?
Obviously Intel engineers and institutes in the early access program can answer these questions. For the rest of us, these are all still unknowns. I am not in a position to answer all of them directly, but I have been looking at related announcements, conference presentations, mailing lists and code repositories. One can try to correlate this information to figure out how One API might look. I have tried to put it all together in this blog post and I think it gives a good picture of the current state.
What has Intel announced?
There have been a few press releases from Intel about One API (see the list below). Intel announced One API during Intel Architecture Day (December 11th, 2018). Not many details were provided, but it was clear that Intel wants to provide a unified programming model to simplify application development across diverse computing architectures. Intel said a public project release would be available in 2019. In June 2019, during Intel's Software Technology Day in London, Intel provided an update on the One API project. It was announced that One API will support both a Direct Programming and an API Programming model. Direct Programming will be based on a new direct programming language called Data Parallel C++ (DPC++). The API Programming approach will be based on optimized libraries to accelerate workloads from different domains. DPC++ is what is most interesting to many of us and will be based on the Khronos Group's SYCL heterogeneous programming model (more details in the next section). A developer beta of the One API project is expected to be released in Q4 2019.
If you would like to read these announcements (which are quite vague in my opinion), here are the links:
- Intel Newsroom | December 12, 2018 : New Intel Architectures and Technologies Target Expanded Market Opportunities
- Intel Newsroom | June 19, 2019 : Intel’s ‘One API’ Project Delivers Unified Programming Model Across Diverse Architectures
- Intel Blogs | June 20, 2019 : Intel's One API will allow to write code once, then target many processing resources: CPUs, GPUs, FPGAs, AI engines
What is in the press, at conferences, on mailing lists and in repositories?
As Intel has revealed very few details, different tech news portals have merely summarized the above-mentioned announcements and there is little (new) information. During the last year Intel has emphasized that it would like to keep the One API effort open, standards-based and portable. This claim is supported by the RFC that the Intel team submitted to the LLVM mailing list in January 2019. The RFC states that Intel would like to add SYCL programming model support and facilitate collaboration on C++ single-source heterogeneous programming for accelerators like GPUs, FPGAs, DSPs, etc. from different hardware and software vendors. A couple of weeks later, Intel open sourced its SYCL Compiler and Runtime, which is available on GitHub. This repository is seen as a staging area for upstreaming SYCL support to LLVM.
Once we connect One API with SYCL, a lot of things become clearer. We can find more information about Intel's efforts in the SYCL ecosystem and the programming model that Intel is likely trying to build. During EuroLLVM 2019, Andrew Savonichev from Intel presented the SYCL compiler. During the Embedded Vision Summit 2019, Konstantin Bobrovski from Intel presented the Intel open source SYCL project. As the OpenCL driver will be an important component, there is a push for related development as well.
From these developments it is clear that One API will be closely connected to SYCL. Here are some references that provide more insight:
- LLVM mailing list | January 11, 2019 : [RFC] Add SYCL programming model support
- EuroLLVM | April 2019 : SYCL compiler: zero-cost abstraction and type safety for heterogeneous computing
- EuroLLVM | April 2019 : RFC: Reference OpenCL Runtime library for LLVM
- EVS | May 2019 : Intel Open Source SYCL Compiler Project
- linux.conf.au | January 2019 : But Mummy I don't want to use CUDA - Open source GPU compute
- servethehome.com | December 24, 2018 : Intel One API to Rule Them All Is Much Needed
- fudzilla.com | May 10, 2019 : Raja Koduri announces OneAPI launch in Q4 19
So What is SYCL?
SYCL is a cross-platform, single-source, C++ abstraction layer on top of OpenCL. It allows developers to leverage the standard C++ language to target heterogeneous devices supported by OpenCL. In contrast to Microsoft's C++ AMP and NVIDIA's CUDA, SYCL is a pure C++ DSEL (domain specific embedded language) without any C++ extensions. This allows one to develop an application using standard C++ for standard CPUs or for a new architecture without having the hardware and a specific compiler available. The SYCL specification has been around for quite some time; the first version, SYCL 1.2, was announced back at GDC 2014. There are multiple implementations available: ComputeCpp, triSYCL, hipSYCL and Intel's LLVM SYCL.
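To give a quick flavor of this single-source style before the hands-on section below, here is a minimal vector-addition sketch of my own (written against the SYCL 1.2.1 `cl::sycl` API, so details may differ in other implementations): host code and kernel live in the same standard C++ file, with the kernel expressed as a plain lambda.

```cpp
// vector_add.cpp : minimal single-source SYCL sketch (SYCL 1.2.1 API)
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

namespace sycl = cl::sycl;

int main() {
    const size_t N = 1024;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

    {
        // queue targets the default device; buffers wrap host data
        sycl::queue q(sycl::default_selector{});
        sycl::buffer<float, 1> bufA(a.data(), sycl::range<1>(N));
        sycl::buffer<float, 1> bufB(b.data(), sycl::range<1>(N));
        sycl::buffer<float, 1> bufC(c.data(), sycl::range<1>(N));

        q.submit([&](sycl::handler& cgh) {
            auto A = bufA.get_access<sycl::access::mode::read>(cgh);
            auto B = bufB.get_access<sycl::access::mode::read>(cgh);
            auto C = bufC.get_access<sycl::access::mode::write>(cgh);
            // the kernel is just a C++ lambda, no language extension needed
            cgh.parallel_for<class vadd>(sycl::range<1>(N), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    } // buffer destructors synchronize results back to the host vectors

    std::cout << "c[0] = " << c[0] << std::endl; // expect 3
    return 0;
}
```

The buffer destructors at the end of the scope copy results back to the host vectors, which is the usual SYCL pattern for implicit data movement.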
There are already good resources/tutorials covering SYCL in depth. Instead of repeating them here, I will close this section with some handy references:
- Andrew Richards | EVS, May 2019 : OpenCL and SYCL
- Michael Wong | LLVM Developers Meeting, October 2018 : The Future Direction of C++ and the Four Horsemen of Heterogeneous C++
- Gordon Brown | CppCon, September 2018 : A Modern C++ Programming Model for GPUs using Khronos SYCL
- David Airlie | LPC, November 2018 : Open Source GPU compute stack - Not dancing the CUDA dance
- Michael Wong | CppCon, September 2017 : C++17 ParallelSTL: A Standardization Experience Report for CPU and GPU on SYCL
- Ronan Keryell | November 2017 : TensorFlow SYCL with triSYCL
- Ronan Keryell | IWOCL, December 2015 : Modern C++ Heterogeneous Computing And SYCL for OpenCL
- A J Guillon | March 2014 : SYCL 1.2: Unofficial High-Level Overview
sycl.tech is a good place to get the latest updates about SYCL.
And How Can I Try It?
Although SYCL is based on standard C++, some compiler changes are required to enable code execution on accelerators (e.g. to annotate functions for device execution). Intel has implemented these changes in LLVM and open sourced its SYCL implementation on GitHub. This has two components: the SYCL compiler and the runtime library. There is a Getting Started guide which is quite straightforward to follow. By the way, I don't think there is support for OSX yet. Below are the steps I used to set up Intel's SYCL implementation on my Linux box (Ubuntu 18.04).
Step I: First we have to install the Intel CPU Runtime for OpenCL Applications with SYCL support, provided here. There is a newer release, but it's a source release and a binary distribution is not provided yet. Following these instructions, I installed the libraries as:
```sh
# run with sudo OR switch to root
sudo su

# download and extract binaries
mkdir -p /opt/intel
cd /opt/intel
wget https://github.com/intel/llvm/releases/download/expoclcpu-1.0.0/oclcpuexp.tar.gz
tar zxvf oclcpuexp.tar.gz

# create ICD file pointing to the new RT
echo /opt/intel/oclcpuexp/libintelocl.so > /etc/OpenCL/vendors/intel_expcpu.icd

# configure library paths
echo /opt/intel/oclcpuexp > /etc/ld.so.conf.d/libintelopenclexp.conf
ldconfig -f /etc/ld.so.conf.d/libintelopenclexp.conf

# remove tarball
rm oclcpuexp.tar.gz

# if using root, exit shell
```
Step II: This step is optional and only required if we want to run on a GPU device. Intel has provided the OpenCL runtime for GPUs here.
From my understanding, only Intel GPUs are currently supported. There might be a possibility to target other GPUs using a SPIR backend, but I haven't tried that yet.
On my Linux box I have an NVIDIA GPU and hence installed the OpenCL libraries as:
```sh
$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2)

$ sudo apt install -y ocl-icd-libopencl1 opencl-headers clinfo ocl-icd-opencl-dev
```
Now we can query all supported OpenCL devices using the clinfo command:
```sh
$ clinfo | grep Device
  Device Name          Quadro K620
  Device Vendor        NVIDIA Corporation
  Device Vendor ID     0x10de
  Device Version       OpenCL 1.2 CUDA
  Device Type          GPU
  ...
  Device Name          Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Device Vendor        Intel(R) Corporation
  Device Vendor ID     0x8086
  Device Version       OpenCL 2.1 (Build 0)
  Device Type          CPU
  ...
```
We now have two OpenCL-enabled devices available: an NVIDIA Quadro GPU and an Intel Haswell CPU.
Step III: Next we have to install the SYCL compiler. This is similar to building LLVM from source with some extra projects (which is a bit heavy to build). Assuming you have the necessary build tools, we can download and build LLVM with SYCL support as:
```sh
# clone intel's llvm fork
git clone https://github.com/intel/llvm -b sycl $HOME/sycl/llvm

# build llvm
mkdir $HOME/sycl/llvm/build
cd $HOME/sycl/llvm/build

cmake -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
      -DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$HOME/sycl/llvm/sycl \
      -DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$HOME/sycl/llvm/llvm-spirv \
      -DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl" \
      -DCMAKE_INSTALL_PREFIX=$HOME/sycl/install \
      $HOME/sycl/llvm/llvm

make -j$(nproc) sycl-toolchain
```
To use the clang++ that was just built, set the PATH and LD_LIBRARY_PATH environment variables as:
```sh
export PATH=$HOME/sycl/llvm/build/bin:$PATH
export LD_LIBRARY_PATH=$HOME/sycl/llvm/build/lib:$LD_LIBRARY_PATH
```
Step IV: With the development environment set up, we can now test small SYCL programs. Here is a hello-world program to list all devices. The program is self-explanatory and you can easily guess what is going on:
```cpp
// list_devices.cpp
#include <cassert>
#include <iostream>
#include <utility>

#include <CL/sycl.hpp>

namespace sycl = cl::sycl;

sycl::string_class get_type(const sycl::device& dev) {
    if (dev.is_host()) {
        return "host";
    } else if (dev.is_gpu()) {
        return "OpenCL.GPU";
    } else if (dev.is_accelerator()) {
        return "OpenCL.ACC";
    } else {
        return "OpenCL.CPU";
    }
}

int main() {
    sycl::device d;
    std::cout << "Default device type: " << get_type(d) << std::endl;

    int i = 1;
    for (const auto& dev : sycl::device::get_devices()) {
        std::cout << "Device " << i++ << " is available of type: "
                  << get_type(dev) << std::endl;
    }
}
```
We can compile this program with the clang++ we built before (make sure it's in $PATH) and restrict the devices SYCL exposes using the environment variable SYCL_DEVICE_TYPE:
```sh
$ clang++ -std=c++11 -fsycl list_devices.cpp -lOpenCL -o list_devices

$ SYCL_DEVICE_TYPE=HOST ./list_devices
Default device type: host
Device 1 is available of type: host

$ SYCL_DEVICE_TYPE=GPU ./list_devices
Default device type: host
Device 1 is available of type: OpenCL.GPU

$ ./list_devices
Default device type: host
Device 1 is available of type: OpenCL.GPU
Device 2 is available of type: OpenCL.CPU
Device 3 is available of type: host
```
Here is a more involved example, adapted from the SYCL reference guide[^sycl-reference-card]. I have added comments so that you can follow the example without much effort:
```cpp
// fill_array.cpp
#include <CL/sycl.hpp>
#include <iostream>

namespace sycl = cl::sycl;

constexpr int LENGTH = 64;

int main() {
    int data[LENGTH];

    // new block scope to ensure all SYCL tasks are completed before exiting block
    {
        // create a queue to enqueue work on cpu device (there is also gpu_selector)
        sycl::queue myQueue(sycl::cpu_selector{});

        // wrap the data variable in a buffer
        sycl::buffer<int, 1> resultBuf(data, sycl::range<1>(LENGTH));

        // submit commands to the queue
        myQueue.submit([&](sycl::handler& cgh) {
            // get access to the buffer for writing
            auto writeResult = resultBuf.get_access<sycl::access::mode::write>(cgh);

            // enqueue a parallel_for task: this is the kernel function that will be
            // compiled by a device compiler and executed on a device
            cgh.parallel_for<class simple_test>(sycl::range<1>(LENGTH), [=](sycl::id<1> idx) {
                writeResult[idx[0]] = static_cast<int>(idx[0]);
            }); // end of the kernel function
        }); // end of the queue commands
    } // end of scope, so wait for the queued work to complete

    // buffer will be up-to-date, print result
    for (int i = 0; i < LENGTH; i++) {
        std::cout << "data[" << i << "] = " << data[i] << std::endl;
    }
    return 0;
}
```
We can now compile and run this example as:
```sh
$ clang++ -std=c++11 -fsycl fill_array.cpp -lOpenCL -o fill_array

$ ./fill_array
data[0] = 0
data[1] = 1
data[2] = 2
...
data[61] = 61
data[62] = 62
data[63] = 63
```
All good! We now have a working SYCL installation and you can dive in deeper yourself! If you are interested, Codeplay has put together a nice tutorial for their own SYCL implementation called ComputeCpp. Another way to learn more is the SYCL specification and the tests under the LLVM SYCL source :).
Conclusion
Intel has been putting significant effort into the SYCL ecosystem, and this would be a major contribution to the LLVM/Clang infrastructure.
I think One API won't be a magical solution but rather SYCL with specific extensions plus optimized libraries for Intel architectures. It's clear that the new C++ standards (C++11, C++17, C++20) are taking centre stage and different vendors are already pushing in this direction. Based on the above developments, we can try to answer the questions raised at the beginning:
- What will the One API programming model look like? : A SYCL-based C++ programming model with some extensions and optimized libraries?
- Will this be Intel's proprietary solution? : Not entirely, but there will likely be some Intel-specific extensions?
- Will this be compatible with AMD or NVIDIA GPUs? : SYCL is an open standard, so "theoretically" yes, via other implementations?
- Should I port my application to CUDA/HIP/OpenACC/OpenMP or wait for One API? : It's more a question of: can you move to future C++17/20 programming models with a SYCL-like interface? Implementations are still going to use OpenMP, ROCm, CUDA, etc. underneath.
Until Intel unveils the beta release in Q4 2019, there is sufficient material for us to learn about modern C++ and the SYCL programming model. That's all for this post! Happy weekend!