A common problem when using templates to parametrize CUDA kernels (cf. my previous post) is selecting, at runtime, the set of template parameters to use for a given call. Template arguments must be known at compile time, so this usually leads to an explosion of cascaded if/switch statements and a lot of copy/pasted code to instantiate the whole parameter tree for each kernel call.
This situation is illustrated by the following code for boolean parameters:
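
As a minimal sketch of the pattern (not the original listing), here is what the cascade looks like for three boolean parameters. The names `myKernel`, `launchMyKernel` and the three flags are hypothetical, and a plain host function stands in for the `__global__` kernel and its `<<<grid, block>>>` launch so that the example compiles without a GPU:

```cpp
#include <string>

// Host-side stand-in for a templated CUDA kernel (hypothetical name and flags).
// In real CUDA code this would be a __global__ function launched with <<<grid, block>>>;
// here it just returns a tag identifying which instantiation ran.
template <bool UseTexture, bool UseShared, bool Normalize>
std::string myKernel() {
    return std::string(UseTexture ? "T" : "t") + (UseShared ? "S" : "s") +
           (Normalize ? "N" : "n");
}

// Hand-written selection: 3 runtime booleans require 2^3 = 8 explicit calls.
std::string launchMyKernel(bool useTexture, bool useShared, bool normalize) {
    if (useTexture) {
        if (useShared) {
            if (normalize) return myKernel<true, true, true>();
            return myKernel<true, true, false>();
        }
        if (normalize) return myKernel<true, false, true>();
        return myKernel<true, false, false>();
    }
    if (useShared) {
        if (normalize) return myKernel<false, true, true>();
        return myKernel<false, true, false>();
    }
    if (normalize) return myKernel<false, false, true>();
    return myKernel<false, false, false>();
}
```

Every additional boolean doubles the number of branches, and the whole cascade has to be repeated for each kernel that takes the same parameters.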

Besides being painful to write, such code forces the compiler to generate an exploding number of versions of the same kernel, one for each instantiated template configuration (2^n versions for n boolean parameters).

Dynamic template parameters with JIT kernel compilation
There is a CUDA feature I have been dreaming about for some time now that would solve both problems: dynamic template parameters. What I mean by this is the ability for a CUDA kernel to accept true C variables (containing runtime values) as integer template parameters. The syntax would simply look like this:

This feature would be implemented by taking advantage of C-level JIT (Just In Time) kernel compilation (the current CUDA JIT compiler operates at the PTX level). It implies recompiling the kernel at runtime with a new set of template parameters each time a value changes. It requires tracking the last value of each parameter so that recompilation happens only when necessary. To be a bit more efficient, the generated code could also be cached so that a configuration that has already been compiled can be reused.
This would change the kernel compilation paradigm to something closer to the OpenCL compilation model, while keeping the nice CUDA-C syntax provided by nvcc.
That feature would be very useful, and it would be great if NVIDIA made CUDA evolve in that direction, or if someone wrote a JIT CUDA-C compiler that allows it!
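
To make the tracking and caching idea a bit more concrete, here is a rough host-side sketch of what such a runtime could keep around. Everything in it is hypothetical: `KernelCache`, `CompiledKernel` and the two-integer parameter tuple are names I made up for illustration, and `compile()` is only a stand-in that records the requested parameters instead of invoking a real compiler.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical handle to one compiled specialization of a kernel.
struct CompiledKernel {
    std::string source;  // stands in for the generated binary
};

// Hypothetical JIT cache: one entry per distinct set of template parameter values.
class KernelCache {
public:
    using Params = std::tuple<int, int>;  // assumed: two integer template parameters

    // Returns the specialization for `params`, compiling it only on a cache miss.
    const CompiledKernel& get(const Params& params) {
        auto it = cache_.find(params);
        if (it == cache_.end()) {
            it = cache_.emplace(params, compile(params)).first;
            ++compileCount_;
        }
        return it->second;
    }

    // Number of (stand-in) compilations performed so far.
    int compileCount() const { return compileCount_; }

private:
    // Stand-in for the C-level JIT compiler: it only records the parameter values.
    static CompiledKernel compile(const Params& params) {
        return CompiledKernel{"myKernel<" + std::to_string(std::get<0>(params)) + ", " +
                              std::to_string(std::get<1>(params)) + ">"};
    }

    std::map<Params, CompiledKernel> cache_;
    int compileCount_ = 0;
};
```

Calling `get(std::make_tuple(128, 4))` twice in a row compiles only once; switching to `(256, 4)` triggers a second compilation, and going back to `(128, 4)` afterwards is again a cache hit.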

Emulating dynamic templates... with templates!
While waiting for that feature, dynamic integer template parameters can be partially emulated today... with template metaprogramming! The idea is to instantiate the whole parameter tree at compile time using templates, and to select the right instantiation at runtime, based on the values of the variables.
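
As a taste of the idea, here is one possible sketch for boolean parameters. It assumes C++11 variadic templates, which may or may not be available in your nvcc version, and it is not necessarily the implementation detailed in the next post. The names `myKernel`, `MyKernelLauncher` and `dispatch` are hypothetical, and a host function again stands in for the real kernel launch:

```cpp
#include <string>

// Host-side stand-in for a templated CUDA kernel (hypothetical name and flags).
template <bool UseTexture, bool UseShared, bool Normalize>
std::string myKernel() {
    return std::string(UseTexture ? "T" : "t") + (UseShared ? "S" : "s") +
           (Normalize ? "N" : "n");
}

// Wraps the kernel call so the dispatcher can invoke it with a pack of flags.
struct MyKernelLauncher {
    template <bool... Flags>
    static std::string run() { return myKernel<Flags...>(); }
};

// Base case: every runtime flag has been converted into a template argument.
template <typename Launcher, bool... Known>
std::string dispatch() {
    return Launcher::template run<Known...>();
}

// Recursive case: convert the first runtime bool into a compile-time one.
// Both branches are instantiated, so the compiler generates the full tree of
// 2^n configurations, and the right leaf is picked at runtime.
template <typename Launcher, bool... Known, typename... Rest>
std::string dispatch(bool first, Rest... rest) {
    return first ? dispatch<Launcher, Known..., true>(rest...)
                 : dispatch<Launcher, Known..., false>(rest...);
}
```

A call such as `dispatch<MyKernelLauncher>(useTexture, useShared, normalize)` then replaces the whole hand-written cascade. Note that this only removes the copy/paste: all the configurations are still compiled, which is why the emulation is only partial.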

More details coming in the next post!