Available to registered developers.
Here is the interesting new stuff I found:
- Support for malloc() and free() in kernels: dynamic global memory allocation! This is implemented with a new syscall linking mechanism that seems to allow kernels to be linked to precompiled system calls. Information on the linking mechanism (.calltargets, .callprototype) can be found in section 10.3 of the PTX ISA manual. I hope this mechanism will get exposed for user functions in the API!
- 64-bit addressing support in the CUDA driver API: allows manipulating more than 4 GB of device memory.
- New System Management Interface (nvidia-smi) for reporting various hardware counter information.
- New stream synchronization function cudaStreamWaitEvent(): allows GPU-side inter-stream synchronization (a stream waits on an event recorded in another stream, without blocking the host).
- A set of new calls allows creating CUDA devices that interoperate with Direct3D devices using SLI in AFR (Alternate Frame Rendering) mode.
- New flag to driver API texture reference (CU_TRSF_SRGB), which enables sRGB->linear conversion on a read.
- Reference manual adds architecture information on GF10x (GF104, GF106, GF108) class hardware (compute capability 2.1)
- Add tld4 (fetch4) instruction for loading a component (r, g, b, or a) from the four texels comprising the bilinear interpolation footprint of a given texture location.
- Add support for specifying the state space of kernel pointer parameters and the alignment of the memory being pointed to.
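To make the CU_TRSF_SRGB item above more concrete, here is a minimal CPU-side sketch of the conversion that the flag asks the texture unit to perform on a read. It uses the standard sRGB decoding formula. The function names `srgb_to_linear` and `srgb8_to_linear` are made up for illustration and are not part of the CUDA API, and the hardware may well use a lookup table or an approximation rather than this exact arithmetic.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a CUDA API call): decode one sRGB-encoded channel
// value in [0, 1] to linear light using the standard piecewise sRGB formula.
inline float srgb_to_linear(float c) {
    if (c <= 0.04045f) {
        return c / 12.92f;                          // linear segment near black
    }
    return std::pow((c + 0.055f) / 1.055f, 2.4f);   // power-law segment
}

// Hypothetical helper: same conversion for an 8-bit texel channel,
// normalized to [0, 1] first, as a normalized texture read would do.
inline float srgb8_to_linear(std::uint8_t v) {
    return srgb_to_linear(static_cast<float>(v) / 255.0f);
}
```

For example, a mid-gray sRGB texel (128) decodes to a linear value of roughly 0.216, which is why filtering sRGB data without this conversion gives results that look too dark. The alpha channel is normally left untouched by sRGB decoding.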
New CUDA Libraries
- CUSPARSE, supporting sparse matrix computations.
- CURAND, supporting random number generation for both host and device code with Sobol quasi-random and XORWOW pseudo-random routines.
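For readers wondering what XORWOW is, here is a small CPU-side sketch of the xorwow recurrence from Marsaglia's "Xorshift RNGs" paper: five xorshift state words combined with a Weyl counter. This is only an illustration of the algorithm family, assuming the paper's default constants. The names `XorwowState`, `xorwow_default_state`, `xorwow_next`, and `xorwow_uniform` are made up here, and CURAND's own seeding, state layout, and output sequence are not reproduced.

```cpp
#include <cstdint>

// Illustrative state for Marsaglia's xorwow generator (not CURAND's layout):
// five xorshift words plus a Weyl counter d.
struct XorwowState {
    std::uint32_t x, y, z, w, v;
    std::uint32_t d;
};

// Default constants from Marsaglia's paper; CURAND seeds its state differently.
inline XorwowState xorwow_default_state() {
    return {123456789u, 362436069u, 521288629u, 88675123u, 5783321u, 6615241u};
}

// Advance the generator by one step and return a 32-bit pseudo-random value.
inline std::uint32_t xorwow_next(XorwowState& s) {
    std::uint32_t t = s.x ^ (s.x >> 2);
    // Shift the state words down by one position.
    s.x = s.y;
    s.y = s.z;
    s.z = s.w;
    s.w = s.v;
    // New v mixes the old v with the scrambled old x.
    s.v = (s.v ^ (s.v << 4)) ^ (t ^ (t << 1));
    // The Weyl counter is added to the xorshift output.
    s.d += 362437u;
    return s.d + s.v;
}

// Map the top 24 bits of the next output to a float in [0, 1).
inline float xorwow_uniform(XorwowState& s) {
    return static_cast<float>(xorwow_next(s) >> 8) * (1.0f / 16777216.0f);
}
```

The state is only six 32-bit words and the update uses just shifts, XORs, and one addition, which makes this kind of generator cheap to keep per thread on a GPU.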