NVIDIA has made available the specification of the PTX 2.0 ISA for Fermi. It can be downloaded there:
Among the interesting things I saw:
- New texture, sampler and surface types: Opaque types for manipulating texture, sampler and surface descriptors as normal variables. -> More flexible texture manipulation; allows arrays of textures, for instance.
- New syntax for abstracting an underlying ABI (Application Binary Interface): Defines a syntax for function definitions/calls, parameter passing, variadic functions, and dynamic memory allocation on the stack ("alloca"). -> True function calls, and recursion! But not yet implemented in CUDA 3.0.
- New bit-manipulation instructions: popc (population count, number of one bits), clz (count leading zeros), bfind (find most significant non-sign bit), brev (bit reverse), bfe/bfi (bit field extract/insert), prmt (byte permute).
- Cache operators (18.104.22.168): Allow selecting, per operation, the caching level in the cache hierarchy (L1/L2) used by load/store instructions.
- Prefetch instructions (Table 84): Allow forcing the load of a cache line from global/local memory into a specific cache level.
- Surface load/store (suld/sust, Tables 90/91): Read/write (through ROPs?) into render targets. (Supports 3D R/W! Hmm... really working?)
- Video instructions: Vector operations on bytes/half-words/words.
- Performance tuning directives (10.3): Help the compiler optimize the code based on block configurations.