Available to registered developers.

Here is the interesting new stuff I found:

    • Support for malloc() and free() in kernels: dynamic global memory allocation!
    • This is implemented with a new syscall linking mechanism that seems to allow kernels to be linked to precompiled system calls. Information on the linking mechanism (.calltargets, .callprototype) can be found in section 10.3 of the PTX ISA manual. I hope this mechanism will get exposed for user functions in the API!
    • 64-bit addressing support in the CUDA driver API: allows manipulating more than 4GB of device memory.
    • New System Management Interface (nvidia-smi) for reporting various hardware counter information.
    • New stream synchronization function cudaStreamWaitEvent(): allows GPU-side inter-stream synchronization.
    • A set of new calls allows the creation of CUDA devices that interoperate with Direct3D devices using SLI in AFR (Alternate Frame Rendering) mode.
    • New flag for driver API texture references (CU_TRSF_SRGB), which enables sRGB->linear conversion on read.
    • Reference manual adds architecture information on GF10x (GF104, GF106, GF108) class hardware (compute capability 2.1)
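
To give an idea of what in-kernel malloc() and free() look like in practice, here is a minimal sketch. It is my own illustration, not taken from the release notes: the kernel name scratchKernel, the buffer sizes and the 8 MB heap limit are arbitrary choices. It assumes a device of compute capability 2.0 or higher and is written against a more recent runtime, where the heap size is set with cudaDeviceSetLimit(); as far as I know the 3.2-era equivalent was cudaThreadSetLimit().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread allocates a private scratch buffer from the device heap,
// fills it, sums it, and frees it again. (Hypothetical example kernel.)
__global__ void scratchKernel(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    int *scratch = (int *)malloc(n * sizeof(int));
    if (scratch == NULL) {          // device malloc returns NULL when the heap is exhausted
        out[tid] = -1;
        return;
    }

    int sum = 0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = tid + i;
        sum += scratch[i];
    }
    out[tid] = sum;

    free(scratch);                  // memory must be released with the device-side free()
}

int main()
{
    const int threads = 64;
    const int n = 16;

    // The device heap size has to be set before the first kernel using malloc() runs.
    // 8 MB is an arbitrary value chosen for this sketch.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8 * 1024 * 1024);

    int *d_out = NULL;
    cudaMalloc((void **)&d_out, threads * sizeof(int));

    scratchKernel<<<1, threads>>>(d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_out[threads];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Thread t sums (t + 0) .. (t + n - 1), i.e. n*t + n*(n-1)/2.
    int ok = 1;
    for (int t = 0; t < threads; ++t) {
        if (h_out[t] != n * t + n * (n - 1) / 2) ok = 0;
    }
    printf(ok ? "all sums match\n" : "mismatch\n");

    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

Note that memory allocated with the in-kernel malloc() lives in a separate device heap: it cannot be released with cudaFree() from the host, only with free() in device code.
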
    Changes in PTX ISA 2.2:
    • Add tld4 (fetch4) instruction for loading a component (r, g, b, or a) from the four texels composing the bilinear interpolation footprint of a given texture location.
    • Add the ability to specify, for kernel pointer parameters, the state space and alignment of the memory being pointed to.

    New CUDA Libraries
    • CUSPARSE, supporting sparse matrix computations.
    • CURAND, supporting random number generation for both host and device code with Sobol quasi-random and XORWOW pseudo-random routines.
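
As a quick illustration of the CURAND host API, here is a minimal sketch that fills a device buffer with uniform floats using the XORWOW generator. The buffer size and seed are arbitrary values I picked for the example, and error handling is reduced to the bare minimum. It needs to be linked against the library (e.g. with -lcurand).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand.h>

int main()
{
    const size_t n = 1024;          // arbitrary number of samples for this sketch

    float *d_vals = NULL;
    cudaMalloc((void **)&d_vals, n * sizeof(float));

    // Create an XORWOW pseudo-random generator and seed it.
    curandGenerator_t gen;
    if (curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_XORWOW) != CURAND_STATUS_SUCCESS) {
        fprintf(stderr, "could not create generator\n");
        return 1;
    }
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);

    // Fill the device buffer with uniform floats in (0, 1].
    if (curandGenerateUniform(gen, d_vals, n) != CURAND_STATUS_SUCCESS) {
        fprintf(stderr, "generation failed\n");
        return 1;
    }

    // Copy back and check that every value is inside the documented range.
    static float h_vals[1024];
    cudaMemcpy(h_vals, d_vals, n * sizeof(float), cudaMemcpyDeviceToHost);

    int ok = 1;
    for (size_t i = 0; i < n; ++i) {
        if (!(h_vals[i] > 0.0f && h_vals[i] <= 1.0f)) ok = 0;
    }
    printf(ok ? "all values in (0, 1]\n" : "value out of range\n");

    curandDestroyGenerator(gen);
    cudaFree(d_vals);
    return ok ? 0 : 1;
}
```

Switching to the quasi-random sequence should mostly be a matter of creating the generator with CURAND_RNG_QUASI_SOBOL32 instead, and setting the number of dimensions rather than a seed.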