There is also the support for direct peer-to-peer communication between GPUs and mappping multiple GPUs (and potentially other third party devices like network/infiniband) memory into the same address space in order to provide direct memory access (Unified Virtual Addressing, UVA). Virtual functions should also now be supported, along with the New and Delete functions for dynamic memory allocations from kernels.
Update: CUDA 4.0 RC released to registered developers
Slides are available there: http://bit.ly/cuda4features
Among the interesting novelties I did not see before, it seems inline PTX will be officially supported with this release ! Also the dissasembler (cuobjdump) that were previously limited to Tesla ISA now support Fermi ISA disassembly. Take a look as the manual for the list of supported instructions.