I've been looking at what it would take to convert GPULib from CUDA to OpenCL. As part of that effort, I have created a simple DLM that provides some basic arithmetic operations and a few more advanced features. For example, we can create variables on the GPU in the standard ways:
IDL> dx = cl_findgen(10)
IDL> dresult = cl_fltarr(10)
We can also transfer data from the CPU to the GPU:
IDL> hy = 2.0 * findgen(10)
IDL> dy = cl_putvar(hy)
Performing a simple operation is easy (though I will eventually change the form of this call to return the value of the result):
IDL> status = cl_add(dx, dy, dresult)
Retrieve the result:
IDL> result = cl_getvar(dresult)
I have implemented memory operations for allocating, transferring data, and freeing memory. I also have routines for obtaining metadata on variables, as well as the available OpenCL devices and platforms.
OpenCL kernels are always constructed from strings, so it is easy to build custom kernels on the fly to evaluate user-entered expressions. In the next example, we will evaluate the expression:
dz = 2. * dx + sin(dy)
First, allocate and initialize the inputs/outputs:
dx = cl_findgen(10)
dy = cl_putvar(2. * findgen(10))
dz = cl_fltarr(10, /nozero)
Next, create the kernel using a string representing the expression, a string array of the variable names in the expression, and an array of type codes of the variables in the expression:
kernel = cl_compile('z[i] = 2. * x[i] + sin(y[i])', $
                    ['x', 'y', 'z'], $
                    lonarr(3) + 4L, $  ; IDL type code 4 = float
                    /simple)
The SIMPLE keyword indicates that we are just filling in an expression in a simple element-wise kernel. To run the kernel on our data, call CL_EXECUTE with the kernel identifier and a structure containing the values for the variables in the expression:
status = cl_execute(kernel, { x: dx, y: dy, z: dz })
Finally, retrieve the results as before:
z = cl_getvar(dz)
Without the SIMPLE keyword set, the user can specify the full code for the kernel.
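To make the idea of "filling in an expression" more concrete, here is a rough sketch in C (the language of the OpenCL host API) of how an expression string and a list of variable names could be wrapped into the source of an element-wise kernel. This is not the code in the DLM: the function name build_simple_kernel, the kernel name simple_kernel, the extra length argument n, and the restriction to float arrays are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the DLM described in this post.
 * Writes the source of an element-wise OpenCL kernel into `out`,
 * with one `__global float *` argument per name (float-only is an
 * assumption) plus a trailing element count `n`.
 * Returns the length of the generated source, or -1 if `out` is too small. */
int build_simple_kernel(char *out, size_t out_size,
                        const char *expr,
                        const char *const *names, size_t n_names)
{
    size_t used;
    int n;

    /* Kernel signature prefix. */
    n = snprintf(out, out_size, "__kernel void simple_kernel(");
    if (n < 0 || (size_t)n >= out_size) return -1;
    used = (size_t)n;

    /* One global float pointer per variable in the expression. */
    for (size_t k = 0; k < n_names; k++) {
        n = snprintf(out + used, out_size - used,
                     "__global float *%s, ", names[k]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }

    /* Body: each work-item evaluates the expression for its own index i. */
    n = snprintf(out + used, out_size - used,
                 "const int n)\n"
                 "{\n"
                 "  int i = get_global_id(0);\n"
                 "  if (i < n) {\n"
                 "    %s;\n"
                 "  }\n"
                 "}\n", expr);
    if (n < 0 || (size_t)n >= out_size - used) return -1;
    used += (size_t)n;

    return (int)used;
}
```

Called with the expression and names from the example above, this would produce a kernel taking x, y, and z as float pointers and computing z[i] for each index. In a real host program, the resulting string would then be handed to clCreateProgramWithSource and built with clBuildProgram.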
There are some great advantages to using OpenCL:
- OpenCL is an open standard.
- OpenCL is not limited to NVIDIA GPUs, or even to GPUs at all: it can run on various platforms, including CPUs.
- It is easy to create on-the-fly kernels.
- OpenCL uses the same concepts as CUDA, so many kernels can be translated fairly easily.
Unfortunately, there is one big disadvantage right now:
- OpenCL libraries are not as far along as CUDA libraries, whether provided by NVIDIA or by third parties.
This means that an OpenCL port would currently have to drop anything in GPULib beyond basic arithmetic. I don't think it will be long before OpenCL catches up in this regard, but it is a serious issue for now.