GPULib: 2009

Tuesday, November 17, 2009

GPULib 1.2.2 Released

GPULib 1.2.2 is now available from the Tech-X website.

We've made extensive changes to the build system, which is now cleaner and more robust. Full release notes follow.

Build system changes
- Added a --with-extra-nvcc-flags=... option to configure, which allows extra flags to be passed to nvcc.
- If --prefix is not set, make install will fail gracefully, instead of attempting to install in /.
- Fixed --with-matlab-dir=... configure option.
- Added IDL and MATLAB configuration info to config.summary to make it easier to troubleshoot problems.
- Added several missing Windows build files.
- Removed several obsolete files and directories.
- Running 'make clean' will not affect documentation.
- Running 'make install' will properly build code if not already built.
- Install directory is now laid out properly.
- Fixed "No rule to make target `docs/GPULib_UsersGuide.pdf', needed by `all-am'" error.

IDL changes
- Fixed bug whereby FFT was only operating on a single row.
- Fixed bug whereby GPUPOW was not found.
- Corrected bwtest example.
- Fixed time reporting for FDTD demo.
- Fixed typo in FDTD demo README.

MATLAB changes
- Added the ability to specify the device number from gpuInit() and accInit().
- Fixed bug in gpuSet function.
- Fixed Makefile which was incorrect for 32-bit Linux.

2nd edition of Mort Canty's book uses GPULib

From the CRC Press site for Image Analysis, Classification, and Change Detection in Remote Sensing: With Algorithms for ENVI/IDL, Second Edition:

This popular introduction to the processing of remote sensing imagery has been updated to include coverage of the latest versions of the ENVI software environment. This new edition covers support vector machines and other kernel-based methods. Illustrating many programming examples in the array-oriented language IDL, the text includes coverage of basic Fourier, wavelet, principal components and minimum noise fraction transformations; convolution filters, topographic modeling, image-to-image registration and ortho-rectification; image fusion; supervised and unsupervised land cover classification with neural networks; hyperspectral analysis; multivariate change detection.

I was excited to hear that GPULib was used in this version of the book. Mort says:

In the text I discuss routines for nonlinear principal component analysis, supervised classification and nonlinear clustering, and explain that they can take advantage of GPULib/CUDA, if installed. (I use your routine GPU_DETECT() to check for GPULib).

Friday, July 24, 2009

GPULib docs from ENVI menu

A recent post by Mort Canty provides a handy program that adds an item to ENVI's help menu that will bring up the GPULib docs.

Tuesday, July 14, 2009

GPULib 1.2 released

GPULib 1.2 is available from the Tech-X website. This release focused on improved MATLAB bindings with a few important bug fixes for the IDL bindings along with a few new kernels. Full release notes follow:

Changes/new features in GPULib version 1.2

General

The main focus of this release is on the improved MATLAB bindings. Some new kernels were added since the release of version 1.0.8.

GPULib kernels

gpuAtan2, gpuFmod, gpuPow

IDL bindings

- Support for the new kernels. For the time being, these functions only support float and double (so no complex types) and no affine transform arguments.
- Added example bwtest.pro showing the use of page-locked variables for fast CPU/GPU data transfer.
- Added finite-difference time-domain example demonstrating the use of views for efficient array sub-selection.
- Added spectral angle mapper example.
- Bug fixes for decon_hubble example.
- Improved documentation.

MATLAB bindings

MATLAB GPULib version 1.2 has many major changes from the previous release. READ the README!

First and foremost, there are two distinct and completely separate interfaces to the library. They should NEVER be intermingled.

1.) The accArray class replaces the old gpuArray class from the previous release. This interface requires MATLAB R2008a or higher. This interface has automatic garbage collection, overloaded operators, and overloaded versions of native MATLAB functions, ...
2.) The "gpu" interface can be used with older versions of MATLAB, though it's not clear how far back one can go.

The interface was redesigned for speed. The accArray class is about 2.5X faster than the gpuArray interface for many functions tested. Some of the "gpu"-prefixed functions can be up to 10X faster than the gpuArray interface.

MATLAB GPULib has many new functions including
1.) fft, ifft, fft2, and ifft2
2.) Reduction operations, including sum, cumsum, prod, cumprod,... These functions support 1D vectors and 2D matrices currently.
3.) Single (Complex) and Double (Complex) precision versions of Matrix Multiplication, Transpose and Complex Conjugate Transpose.

The accArray class does not support the subsref.m (i.e. b=A(i)), subsasgn.m (i.e. A(i)=b), or array concatenation functions like A=[B; C; D]. These will be supported in future releases.

The "gpu" interface supports subscripting through the gpuSubsref, gpuSubsasgn, and gpuSub2ind functions.

Both interfaces support page-locked host memory allocation via cudaMallocHost. This can make transfers between CPU memory and GPU memory much faster.

Both interfaces include more comprehensive native MATLAB-like documentation.

New examples include: bench, bwtest, fdtd, and fftExample.

Friday, May 8, 2009

GPULib slides from VISualize 2009

Here are the slides from Peter Messmer's GPULib talk at VISualize 2009 a couple of weeks ago. Peter and I also stayed around after the IDL presentations to do hands-on training for GPULib. The training was well attended; it was good to see the interest in GPU computing with IDL. Most people did not bring a laptop, making it less “hands-on” than we had originally intended, but it was good to be able to answer individual questions.

Tuesday, April 21, 2009

NVIDIA Releases OpenCL Driver

OpenCL is officially out -- have a look at the press release, or the OpenCL site.

Monday, March 16, 2009

Detecting presence of GPULib

Over the weekend, a user (Mort Canty) asked the GPULib user's mailing list for a way to detect the presence of GPULib. One of our developers wrote a routine to do just that, and I thought it might be of use to other GPULib users as well.

Below is the routine GPU_DETECT that will return a boolean indicating if GPULib is present. It will also initialize the GPU if it finds GPUINIT. If you have any questions, feel free to ask!

; docformat = 'rst'

;+
; Initializes GPULib if it is present.
;
; :Returns:
;    1 if GPULib is present, 0 if not
;
; :Params:
;    devId : in, optional, type=numtype
;       id of the GPU device to be used for GPU computations
;
; :Keywords:
;    _ref_extra : in, optional, type=keywords
;       keywords to GPUINIT
;-
function gpu_detect, devId, _ref_extra=e
  compile_opt strictarr

  ; if GPUINIT is missing or fails, the error lands here and we report 0
  catch, error
  if (error ne 0L) then begin
    catch, /cancel
    return, 0
  endif

  ; pass any extra keywords through to GPUINIT
  gpuinit, devId, _extra=e

  return, 1
end

Friday, February 27, 2009

GPULib 1.0.8 update available

GPULib 1.0.8 is available at

http://gpulib.txcorp.com

This release addresses several bugs pointed out by users; we're still working on the MATLAB refactoring for a future version. As always, if you have any questions, feel free to post in the comments, the GPULib mailing list, or contact us directly at support@txcorp.com.

One note: during testing of this release, we found some problems with running it on Windows Vista. We're looking for solutions, but at the moment XP is the better choice for running GPULib.

Thursday, January 22, 2009

32- vs. 64-bit IDL on Mac OS X

IDL 7.0.4 was recently released, providing full 64-bit support for Mac OS X. Some users who have upgraded have asked how this affects GPULib, because the library does not work when IDL runs in 64-bit mode.

Currently, in order to build 64-bit code using GCC on OS X, one needs to set an additional flag, depending on the architecture targeted. We don't do this at the moment for GPULib, so it defaults to building the more compatible 32-bit libraries. Full details at

http://developer.apple.com/documentation/Darwin/Conceptual/64bitPorting/building/chapter_5_section_2.html

Apple goes so far as to say, "You should transition your application to a 64-bit executable format only when the 64-bit environment offers a compelling advantage for your specific application." We haven't switched to using 64-bit libraries with GPULib on Mac OS X, since thus far there hasn't been a need. If you do need the extra address space when using GPULib, please let us know! Otherwise, you can start IDL 7.0.4 with the "-32" flag to run IDL in 32-bit mode, which is compatible with the current release of GPULib.

Tuesday, January 13, 2009

Performing a shift in IDL with GPULib

A couple of users recently asked about doing a shift of a matrix using GPULib. While there is no interface that mimics 'shift' at the moment, all the functionality for shifting arrays is there. For example, you can use the following to shift an array along the x-direction:

IDL> x = findgen(5, 5)
IDL> print, x
0.00000 1.00000 2.00000 3.00000 4.00000
5.00000 6.00000 7.00000 8.00000 9.00000
10.0000 11.0000 12.0000 13.0000 14.0000
15.0000 16.0000 17.0000 18.0000 19.0000
20.0000 21.0000 22.0000 23.0000 24.0000

IDL> a = gpuPutArr(x)
IDL> b = gpuFltarr(5, 5)

; here comes the actual shift:
; we first perform b[0:3, *] = a[1:*, *]
; see the documentation for 'gpuSubArr' for an
; explanation of the arguments.
IDL> gpuSubArr, a, [1, -1], -1, b, [0, 3], -1

; and then b[4, *] = a[0, *]
IDL> gpuSubArr, a, 0, -1, b, 4, -1

IDL> res = gpuGetArr(b)
IDL> print, res

1.00000 2.00000 3.00000 4.00000 0.00000
6.00000 7.00000 8.00000 9.00000 5.00000
11.0000 12.0000 13.0000 14.0000 10.0000
16.0000 17.0000 18.0000 19.0000 15.0000
21.0000 22.0000 23.0000 24.0000 20.0000

Shifts in other directions can be implemented similarly. This is not the fastest possible shift, but at least it should allow you to perform the shift on the GPU, rather than transferring it back to the CPU.
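The decomposition used above generalizes: a circular shift by k positions along an axis of length n is always two block copies. The following is a plain-Python sketch of that index arithmetic, meant for checking ranges on the CPU before writing the corresponding gpuSubArr calls. It does not call GPULib, and the helper names shift_ranges and shift_left are hypothetical, introduced only for illustration.

```python
def shift_ranges(n, k):
    """Block copies for a left circular shift by k of n elements.

    Each entry is ((src_first, src_last), (dst_first, dst_last)),
    with inclusive bounds, matching the b[0:3, *] = a[1:*, *] notation.
    """
    k %= n
    if k == 0:
        return [((0, n - 1), (0, n - 1))]  # nothing wraps: one full copy
    return [
        ((k, n - 1), (0, n - k - 1)),  # main block moves toward index 0
        ((0, k - 1), (n - k, n - 1)),  # wrapped block fills the far end
    ]


def shift_left(row, k):
    """Apply the block copies from shift_ranges to a Python list."""
    out = [None] * len(row)
    for (s0, s1), (d0, d1) in shift_ranges(len(row), k):
        out[d0:d1 + 1] = row[s0:s1 + 1]
    return out


# Same data as findgen(5, 5): each inner list is one printed row.
x = [[float(5 * j + i) for i in range(5)] for j in range(5)]
res = [shift_left(row, 1) for row in x]

print(shift_ranges(5, 1))  # [((1, 4), (0, 3)), ((0, 0), (4, 4))]
print(res[0])              # [1.0, 2.0, 3.0, 4.0, 0.0]
```

For n = 5 and k = 1 the two ranges are exactly the ones used in the two gpuSubArr calls above. A shift along the other dimension uses the same ranges on the second index.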

As always, if you have questions about this or any other use of GPULib, feel free to leave comments here, or email us at support@txcorp.com.

Double precision and GPULib

A few users have asked about double precision calculations using GPULib. I thought I'd try to clear up some common questions.

First, you may notice when running the GPULib unit tests in IDL (using 'make check') that the double precision tests fail. There are two likely reasons for this. The first is that not all CUDA-enabled hardware is capable of double precision calculations -- currently, the GTX 200 series are the only cards that can do this; check the NVIDIA Web site for details on your specific card.

Second, you may need to update your video card drivers. We've had some issues with the 177.x series of CUDA drivers, but the 180.x series is doing much better. I'd recommend upgrading to the most recent drivers available from NVIDIA in any case, as they are constantly improving both features and performance. The downloads are available at http://www.nvidia.com/object/cuda_get.html
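The hardware requirement can also be stated in terms of CUDA compute capability: double precision arithmetic requires compute capability 1.3 or higher, which is what the GTX 200 series provides. Here is a minimal Python sketch of that check. The helper supports_double is hypothetical (it is not a GPULib or CUDA function), and the capability numbers are assumed to come from your own device query or from NVIDIA's published list.

```python
def supports_double(major, minor):
    """True if a device with compute capability major.minor can run
    double precision kernels (CUDA requires 1.3 or higher)."""
    return (major, minor) >= (1, 3)


print(supports_double(1, 3))  # True: GTX 280 class hardware
print(supports_double(1, 1))  # False: earlier CUDA-enabled cards
```

If this check passes and the double precision tests still fail, the driver version is the next thing to look at.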