NVIDIA has released the specification of the PTX 2.0 ISA for Fermi; it can be downloaded here:

Among the interesting things I noticed:
  • New texture, sampler and surface types: opaque types for manipulating texture, sampler and surface descriptors as normal variables. -> More flexible texture manipulation; allows arrays of textures, for instance.
  • New syntax for abstracting the underlying ABI (Application Binary Interface): defines a syntax for function definitions/calls, parameter passing, variadic functions, and dynamic memory allocation on the stack ("alloca"). -> True function calls, and recursion! But not yet implemented in CUDA 3.0.
  • New bit-manipulation instructions: popc (population count, number of one bits), clz (count leading zeros), bfind (find most significant non-sign bit), brev (bit reverse), bfe/bfi (bit field extract/insert), prmt (byte permute).
  • Cache operators (8.7.5.1): Allow selecting, per operation, the level of the cache hierarchy (L1/L2) at which load/store instructions are cached.
  • Prefetch instructions (Table 84): Allow forcing the load of a cache line from global/local memory into a specific cache level.
  • Surface load/store (suld/sust, Tables 90/91): Read/write (through the ROPs?) into render targets. (Supports 3D R/W! Hmm... does it really work?)
  • Video instructions: Vector operations on bytes/half-words/words.
  • Performance tuning directives (10.3): Help the compiler optimize the code based on block configurations (directives such as .maxnreg and .maxntid).
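
To make the bit-manipulation bullet more concrete, here is a minimal host-side C++ sketch of what popc, clz, brev, bfind, bfe and bfi compute on 32-bit unsigned operands. It is only a reference emulation based on my reading of the spec, not GPU code: the ptx_ref namespace, the function names and run_examples are my own (hypothetical), only the unsigned variants are covered, and prmt is left out. The edge cases (clz of 0 giving 32, bfind of 0 giving 0xFFFFFFFF, position and length taken from the low 8 bits of their operands) are assumptions that should be checked against the document itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference helpers; names are illustrative, not part of PTX or CUDA.
namespace ptx_ref {

// popc: number of one bits.
inline std::uint32_t popc(std::uint32_t x) {
    std::uint32_t n = 0;
    while (x) { x &= x - 1; ++n; }  // clear the lowest set bit each round
    return n;
}

// clz: number of leading zero bits (32 when x == 0).
inline std::uint32_t clz(std::uint32_t x) {
    std::uint32_t n = 0;
    for (std::uint32_t mask = 0x80000000u; mask && !(x & mask); mask >>= 1) ++n;
    return n;
}

// brev: reverse the order of the 32 bits.
inline std::uint32_t brev(std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) { r = (r << 1) | (x & 1u); x >>= 1; }
    return r;
}

// bfind (unsigned): position of the most significant one bit,
// 0xFFFFFFFF when no bit is set (assumed from the spec).
inline std::uint32_t bfind_u32(std::uint32_t x) {
    return x == 0 ? 0xFFFFFFFFu : 31u - clz(x);
}

// Mask with the low `len` bits set (len >= 32 gives all ones).
inline std::uint32_t low_mask(std::uint32_t len) {
    return len >= 32 ? 0xFFFFFFFFu : ((1u << len) - 1u);
}

// bfe (unsigned): extract `len` bits of `a` starting at bit `pos`, zero-extended.
inline std::uint32_t bfe_u32(std::uint32_t a, std::uint32_t pos, std::uint32_t len) {
    pos &= 0xFFu; len &= 0xFFu;
    if (len == 0 || pos >= 32) return 0;
    return (a >> pos) & low_mask(len);
}

// bfi: insert the low `len` bits of `a` into `b` starting at bit `pos`.
inline std::uint32_t bfi_b32(std::uint32_t a, std::uint32_t b,
                             std::uint32_t pos, std::uint32_t len) {
    pos &= 0xFFu; len &= 0xFFu;
    if (len == 0 || pos >= 32) return b;
    const std::uint32_t mask = low_mask(len) << pos;  // bits past bit 31 are dropped
    return (b & ~mask) | ((a << pos) & mask);
}

// A few worked examples.
inline void run_examples() {
    assert(popc(0xF0F0F0F0u) == 16u);
    assert(clz(1u) == 31u);
    assert(brev(1u) == 0x80000000u);
    assert(bfind_u32(0x10u) == 4u);
    assert(bfe_u32(0xABCD1234u, 8, 8) == 0x12u);
    assert(bfi_b32(0xFFu, 0x12345678u, 8, 8) == 0x1234FF78u);
}

}  // namespace ptx_ref
```

On the CUDA C side, some of these operations are already reachable through intrinsics such as __popc, __clz and __brev, so the sketch above is mostly useful for checking expected results on the host.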