Thursday, February 23, 2012

Multiplying a scalar by an array

Currently, multiplying a scalar by an array takes a small trick. (The next version of GPULib will have an efficient, direct way to do scalar/array operations.)

Most GPULib operations have a short form and a long form. The short form is a simple version of the operation that takes two array arguments. For addition, the short form is equivalent to:

C = A + B
where A, B, and C are all arrays of the same size. In code, this is done with:
c = gpuadd(a, b)
The long form takes five arguments, three scalars and two arrays, in the form of:
C = s1*A + s2*B + s3
where A, B, and C are arrays of the same size and s1, s2, and s3 are scalars. In code, this is:
c = gpuadd(s1, a, s2, b, s3)
If you just want to multiply an array by a scalar, use the long form with the same array for both A and B, setting s1 to the scalar and both s2 and s3 to 0.0. For example, to compute 2. * findgen(10), do:
IDL> gpuinit
Welcome to GPULib 1.5.0 (Revision: 2199M)
Graphics card: Tesla C2070, compute capability: 2.0, memory: 751 MB available, 1279 MB total
Checking GPU memory allocation...no errors
IDL> a = findgen(10)
IDL> da = gpuputarr(a)
IDL> dc = gpufltarr(10)
IDL> scalar = 2.0
IDL> ; dc = scalar * da + 0. * da + 0.
IDL> dc = gpuadd(scalar, da, 0., da, 0., lhs=dc) 
IDL> c = gpugetarr(dc)
IDL> print, c
      0.00000      2.00000      4.00000      6.00000      8.00000
      10.0000      12.0000      14.0000      16.0000      18.0000
Note that most operations have a long form that applies an affine transform along with the operation.
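
To make the arithmetic concrete, here is a small CPU-only model of the long form in plain Python. It is only a sketch of the semantics described above, not GPULib code: affine_add and scale are hypothetical names made up for this illustration, and ordinary Python lists stand in for GPU arrays.

```python
def affine_add(s1, a, s2, b, s3):
    """Model of the long form of addition: s1*a + s2*b + s3, element-wise.

    Hypothetical helper for illustration only; it is not part of GPULib.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of elements")
    return [s1 * x + s2 * y + s3 for x, y in zip(a, b)]


def scale(scalar, a):
    """Scalar times array, expressed through the long form.

    The same array is passed as both operands, and s2 and s3 are zero,
    so the result reduces to scalar * a.
    """
    return affine_add(scalar, a, 0.0, a, 0.0)


# Plays the role of findgen(10): the floats 0.0 through 9.0.
a = [float(i) for i in range(10)]
c = scale(2.0, a)
print(c)
```

The second operand contributes nothing once s2 is 0.0, so any array of the right size would do; reusing the first array simply avoids allocating another one.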

Monday, January 31, 2011

GPULib 1.4.4 released!

We are pleased to announce that GPULib 1.4.4 is available from the Tech-X website.


The following changes have been made for version 1.4.4:
  • Support for CUDA 3.2 (rather than CUDA 3.2 rc2)
  • Added more extensive checks for issues in GPUINIT.
  • Added ability to transpose 1-dimensional arrays like TRANSPOSE does.
  • Now handles ranges, index arrays, and * in any indexing position of an array in the same manner as IDL.
  • Now can use single-dimensional indexing for multi-dimensional arrays.
  • Trying to create a view from discontinuous memory is now disallowed.
  • Fixed garbage collection issue with repeated assignment to the same variable name.
  • Fixed an issue whereby multiple calls to GPUINIT were causing cudaErrorSetOnActiveProcess errors.
  • Fixed memory allocation bug in GPUTRANSPOSE.
  • Fixed typo in GPUMATRIX_MULTIPLY.
  • Added unit tests for GPUTRANSPOSE, GPUMAXOP.
  • Now prints an error message when trying to create a view from discontinuous memory.
  • Fixed build issue on Mac OS X Leopard.
  • Fixed several install issues for Windows when building from source.
  • Improvements to the QuickStartGuide for Windows.
  • Fixed getDimensions bug for certain views.
  • Added error checking for changing dimensions.
  • Fixed memory leak in GPUFFT.
  • Fixed errors in GPUMAXOP.
  • Fixed failing unit tests of GPUMAXOP.
  • Added missing demo file to install list.
  • Fixed digital signing of installer file. 
  • Fixed disappearing graphic in transform3d demo.
  • Corrected IDL_LIBDIR path.
  • Fixed path of file gpulib_version.txt in CMakeLists.txt.
  • Fixed full reduction in GPUTOTAL.
  • Fixed -fPIC warnings when building from source on Windows.
  • Fixed color table issue in GPU_SWIRL_DEMO.
  • Added "press q to stop" option to GPU_FDTD_DEMO.
  • Fixed flipped image in GPU_DECONHUBBLE_DEMO.


Monday, November 15, 2010

GPULib 1.4.2 released!

We are pleased to announce that GPULib 1.4.2 is available from the Tech-X website.

For Windows users, the installer contains the following pre-built versions of GPULib 1.4.2:
  1. 32- or 64-bit single precision (compute capability 1.0) built with CUDA 3.2
  2. 32- or 64-bit double precision (compute capability 1.3) built with CUDA 3.2
  3. 32- or 64-bit Fermi (compute capability 2.0) built with CUDA 3.2
(As before, Windows users can opt to build GPULib from source if they so desire.)

The following changes have been made for version 1.4.2:
  • Fixed make install so that appropriate files are now installed.
  • All demos cleaned up and numerous memory leaks fixed.
  • Created a new comprehensive PDF for Windows users, covering general use as well as how to build from source.
  • Windows installation now includes all source files in a zip archive, so that users who are not building from source do not have their install folder cluttered with source files.
  • gpuinit() now displays amount of free memory as well as total memory available on the device.
  • Fixed bug in GPUMATRIX_MULTIPLY. 
  • Fixed several small bugs and memory leaks.

We are grateful to Mort Canty and David Grier for submitting bug reports which led to fixes for this release!

Friday, October 1, 2010

GPULib 1.4.0 released!

We are pleased to announce that GPULib 1.4.0 is available from the Tech-X website.

For Windows users, we created an installer which installs one of the following pre-built versions of GPULib 1.4.0:
  1. 32- or 64-bit single precision (compute capability 1.0) built with CUDA 3.1
  2. 32- or 64-bit double precision (compute capability 1.3) built with CUDA 3.1
  3. 32- or 64-bit Fermi (compute capability 2.0) built with CUDA 3.1
  4. 32-bit single precision (compute capability 1.0) built with CUDA 2.3
  5. 32-bit double precision (compute capability 1.3) built with CUDA 2.3
(As before, Windows users can opt to build GPULib from source.)

In addition, the following changes have been made:
  • Now builds with CMake for cross-platform compatibility.
  • Now supports CUDA streams, enabling concurrent execution of multiple kernels.
  • Now supports asynchronous data transfer. 
  • Now leverages new features of IDL 8.0 enabling more seamless integration between the two products. 
  • Includes a variety of new algorithms, such as functions for sorting and large histogramming. 
  • gpuinit() now provides additional information, including GPULib version and model of your CUDA-enabled card, and additionally performs a small allocation to ensure the card is available.
  • Demos updated to fail gracefully if you do not have enough memory on your card, or if your card does not have the compute capability to run the demo.
  • Fixed a bug whereby assigning a regular IDL variable to a slice failed.
  • gpushift() fixed for the case where a dimension is shifted by 0.
  • Fixed problems with gpuInterpolate functions.
  • Fixed memory leak in gpufix().
  • Fixed memory leak in gpufft().
  • MATLAB support has been dropped.

Tuesday, November 17, 2009

GPULib 1.2.2 Released

GPULib 1.2.2 is now available from the Tech-X website.

We've made extensive changes to the build system, which is now cleaner and more robust. Full release notes follow.

Build system changes 
- Added --with-extra-nvcc-flags=... to configure which allows extra flags to be passed to nvcc.
- If --prefix is not set, make install will fail gracefully, instead of attempting to install in /.
- Fixed --with-matlab-dir=... configure option.
- Added IDL and MATLAB configuration info to config.summary to make it easier to troubleshoot problems.
- Added several missing Windows build files.
- Removed several obsolete files and directories. 
- Running 'make clean' will not affect documentation.
- Running 'make install' will properly build code if not already built.
- Install directory is now laid out properly.
- Fixed "No rule to make target `docs/GPULib_UsersGuide.pdf', needed by `all-am'" error.

IDL changes
- Fixed bug whereby FFT was only operating on a single row.
- Fixed bug whereby GPUPOW was not found.
- Corrected bwtest example.
- Fixed time reporting for FDTD demo.
- Fixed typo in FDTD demo README.

MATLAB changes 
- Added the ability to specify the device number from gpuInit() and accInit().
- Fixed bug in gpuSet function.
- Fixed Makefile which was incorrect for 32-bit Linux.

2nd edition of Mort Canty's book uses GPULib

From the CRC Press site for Image Analysis, Classification, and Change Detection in Remote Sensing: With Algorithms for ENVI/IDL, Second Edition:

This popular introduction to the processing of remote sensing imagery has been updated to include coverage of the latest versions of the ENVI software environment. This new edition covers support vector machines and other kernel-based methods. Illustrating many programming examples in the array-oriented language IDL, the text includes coverage of basic Fourier, wavelet, principal components and minimum noise fraction transformations; convolution filters, topographic modeling, image-to-image registration and ortho-rectification; image fusion; supervised and unsupervised land cover classification with neural networks; hyperspectral analysis; multivariate change detection.

I was excited to hear that GPULib was used in this version of the book. Mort says:

In the text I discuss routines for nonlinear principal component analysis, supervised classification and nonlinear clustering, and explain that they can take advantage of GPULib/CUDA, if installed. (I use your routine GPU_DETECT() to check for GPULib).

Friday, July 24, 2009

GPULib docs from ENVI menu

A recent post by Mort Canty provides a handy program that adds an item to ENVI's help menu that will bring up the GPULib docs.