One big pain with CUDA under Windows Vista or Windows 7 is that performance suffers a lot from the limits and overheads imposed by WDDM (Windows Display Driver Model), which the driver has to comply with.
This means slower kernel launches, a limit on the size of memory allocations, and many constraints that prevent NVIDIA from efficiently implementing a lot of features in CUDA.
Tim Murray on the CUDA forum:
"Welcome to WDDM. Kernel launch overhead is ~3us on non-WDDM platforms. On WDDM, it's 40 at a minimum and can potentially be much larger. Considering the number of kernels you're launching in 10ms, that's going to add up."
"WDDM is a lot more than just a rendering interface. It manages all the memory on the device so it can page it in and out as necessary, which is a good thing for display cards. However, we get zero benefit from it in CUDA, because we have pointers! As a result, you can't really do paging in a CUDA app, so you get zero benefit from WDDM. However, because it's the memory manager, we can't just go around it for CUDA because WDDM will assume it owns the card completely, start moving memory, and whoops your CUDA app just exploded. So no, there's not really some magic workaround for cards that can also be used as display."
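To get a feel for the launch overhead mentioned in the first quote on your own machine, something like the following micro-benchmark can be used. This is only a rough sketch, not a rigorous measurement: the kernel name `emptyKernel` and the launch count `kLaunches` are arbitrary choices of mine, and because it times back-to-back launches (which the driver may batch), the average it reports is an amortized figure that can differ from the latency of a single isolated launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: it does no work, so the measured time is dominated
// by launch overhead rather than by computation.
__global__ void emptyKernel() {}

int main() {
    const int kLaunches = 10000;  // arbitrary sample size (assumption)

    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess ||
        cudaEventCreate(&stop) != cudaSuccess) {
        fprintf(stderr, "Could not create CUDA events (no usable device?)\n");
        return 1;
    }

    // Warm-up launch so that context creation is not part of the timing.
    emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    // Time a burst of back-to-back launches on the default stream.
    cudaEventRecord(start);
    for (int i = 0; i < kLaunches; ++i) {
        emptyKernel<<<1, 1>>>();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("Average time per launch: %.2f us\n", ms * 1000.0f / kLaunches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

Compile it with `nvcc` and run it once under the regular WDDM driver and once under a compute-only driver to compare the two numbers.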
To overcome this problem, NVIDIA provides compute-only drivers for Tesla boards. With a little effort, they can also be installed on GeForce cards.
How to install them on GeForce: