Following my last post, I will describe how template metaprogramming can be used to "emulate" dynamic kernel template parameters. This approach does not prevent the explosion of the amount of code generated at compile time, but it provides a simple interface to "transform" runtime variable values into kernel template parameters.

If you are not familiar with template metaprogramming, I suggest reading Andrei Alexandrescu's great book, Modern C++ Design: Generic Programming and Design Patterns Applied. It comes with a small and simple metaprogramming library called Loki, which is just a set of helper headers. My code relies on a few structures from it.

Code description
Here is what a kernel call looks like with this mechanism:

The actual kernel call is encapsulated in a functor (here MyKernelCallFunctor) that is passed, together with the runtime variables, to the call function of what I call the KernelCaller. The KernelCaller "transforms" the set of runtime variables passed to call into a set of compile-time parameters, which it hands to the functor in the form of a typelist.
The functor is in charge of making the actual kernel call: it extracts the template values from the typelist and passes to the __global__ function the arguments that were previously given to its constructor and stored in the structure. A functor has to be written for each kernel you want to launch:
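As an illustration, here is a minimal host-only sketch of such a functor for a hypothetical kernel templated on a bool and an int. The typelist helpers are simplified stand-ins for the Loki structures (not Loki's own headers), and `myKernel` is an ordinary function standing in for a `__global__` kernel so that the sketch runs without a GPU; with CUDA, the call inside `operator()` would become a `myKernel<...><<<grid, block>>>(data, n)` launch.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the Loki structures (not Loki's own headers).
struct NullType {};
template <class H, class T> struct Typelist { typedef H Head; typedef T Tail; };
template <int V> struct Int2Type { enum { value = V }; };

// TypeAt<TList, I>::Result is the I-th type (0-based) of the typelist.
template <class TList, unsigned I> struct TypeAt;
template <class H, class T> struct TypeAt<Typelist<H, T>, 0> { typedef H Result; };
template <class H, class T, unsigned I> struct TypeAt<Typelist<H, T>, I> {
    typedef typename TypeAt<T, I - 1>::Result Result;
};

// Hypothetical kernel templated on a bool and an int. A host loop stands in
// for myKernel<Scale, Amount><<<grid, block>>>(data, n).
template <bool Scale, int Amount>
void myKernel(int* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] = Scale ? data[i] * Amount : data[i] + Amount;
}

// One functor per kernel: the constructor stores the runtime arguments,
// operator() reads the compile-time values out of the typelist and launches.
struct MyKernelCallFunctor {
    int* data;
    std::size_t n;
    MyKernelCallFunctor(int* d, std::size_t count) : data(d), n(count) {
        assert(d != 0 || count == 0);
    }

    template <class TList>
    void operator()(TList) const {
        typedef typename TypeAt<TList, 0>::Result P0;  // Int2Type<0> or Int2Type<1>
        typedef typename TypeAt<TList, 1>::Result P1;  // Int2Type<Amount>
        myKernel<P0::value != 0, P1::value>(data, n);
    }
};
```

The typelist argument is passed by value only to carry its type; it holds no data.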

The KernelCaller is the core of the mechanism. It works by recursively adding static values to the typelist, choosing each one by testing the value of the corresponding runtime variable. In reality, the compiler generates code for every branch and therefore instantiates every possible set of compile-time parameters as a typelist; the right one is selected at runtime. To ease usage, the runtime variables are passed as a variable argument list and converted into a vector containing their values.
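The vararg-to-vector step could be sketched as follows, assuming the runtime values are passed as `int`s. `collectArgs` is a hypothetical helper name, not the function from the sample; its pointer parameter follows the fix described in the update at the end of this post.

```cpp
#include <cassert>
#include <cstdarg>
#include <vector>

// Reads NumArgs runtime values from a C variable argument list into a vector.
// The named parameter handed to va_start is a pointer: using a reference there
// is undefined behavior in C++. Callers must pass int (bool, char and short are
// promoted to int); passing e.g. a long or a double here would be undefined.
template <int NumArgs, class Functor>
std::vector<int> collectArgs(Functor* functor, ...) {
    assert(functor != 0);
    std::vector<int> values;
    values.reserve(NumArgs);
    std::va_list ap;
    va_start(ap, functor);
    for (int i = 0; i < NumArgs; ++i)
        values.push_back(va_arg(ap, int));
    va_end(ap);
    return values;
}
```

In the real mechanism, this conversion would happen inside KernelCaller::call before the recursion starts.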

The compile-time recursion is stopped using the number of arguments, which is passed to the KernelCaller as a template parameter:
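Putting the pieces together for boolean parameters, a self-contained sketch could look like this. The names `KernelCaller` and `MyKernelCallFunctor` come from the post, but this is a reconstruction under stated assumptions rather than the sample's source: it takes the runtime values as a `std::vector<int>` directly (skipping the vararg front end), uses simplified stand-ins for Loki's typelist structures, and replaces the `__global__` kernel with a hypothetical host function `myKernel` so it can run anywhere.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for the Loki structures (not Loki's own headers).
struct NullType {};
template <class H, class T> struct Typelist { typedef H Head; typedef T Tail; };
template <int V> struct Int2Type { enum { value = V }; };

// KernelCaller<N, TList>: N runtime values are still to be converted, and TList
// already holds the compile-time values chosen for args[N], ..., args[last].
template <int N, class TList = NullType>
struct KernelCaller {
    template <class Functor>
    static void call(Functor& functor, const std::vector<int>& args) {
        assert(static_cast<int>(args.size()) >= N);
        // Both branches are instantiated by the compiler; only one runs.
        if (args[N - 1])
            KernelCaller<N - 1, Typelist<Int2Type<1>, TList> >::call(functor, args);
        else
            KernelCaller<N - 1, Typelist<Int2Type<0>, TList> >::call(functor, args);
    }
};

// N == 0 stops the compile-time recursion: the typelist is complete.
template <class TList>
struct KernelCaller<0, TList> {
    template <class Functor>
    static void call(Functor& functor, const std::vector<int>&) {
        functor(TList());
    }
};

// Hypothetical kernel with three boolean parameters; a host function stands in
// for myKernel<A, B, C><<<grid, block>>>(...).
template <bool A, bool B, bool C>
int myKernel() { return (A ? 4 : 0) + (B ? 2 : 0) + (C ? 1 : 0); }

struct MyKernelCallFunctor {
    int result;
    MyKernelCallFunctor() : result(-1) {}

    template <class TList>
    void operator()(TList) {
        typedef typename TList::Head P0;              // value chosen for args[0]
        typedef typename TList::Tail::Head P1;        // value chosen for args[1]
        typedef typename TList::Tail::Tail::Head P2;  // value chosen for args[2]
        result = myKernel<P0::value != 0, P1::value != 0, P2::value != 0>();
    }
};
```

Walking the arguments from last to first and prepending each chosen value keeps the final typelist in argument order without needing an append operation, and the partial specialization for `N == 0` is what ends the compile-time recursion.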

Here it is: as is, we already have a working mechanism able to deal with an arbitrary number of boolean parameters!

Extending the approach to integer parameters
The approach as presented so far works well for boolean template parameters. If we want to handle more than the values 0 and 1 per variable, we need to write something like this:
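For a single integer parameter taking the values 0 to 3, such a hand-written dispatch might look like the following hypothetical sketch (`myKernel` and `launch` are illustrative names, and a host function again stands in for the `__global__` kernel). Inside the KernelCaller, each case would add a different `Int2Type` to the typelist instead of launching directly.

```cpp
#include <cassert>

// Hypothetical kernel templated on an int; a host function stands in for
// myKernel<Radius><<<grid, block>>>(...).
template <int Radius>
int myKernel() { return 10 * Radius; }

// Hand-written dispatch for one integer parameter in [0, 3]:
// one case per value, repeated for every parameter of every kernel.
int launch(int radius) {
    assert(radius >= 0 && radius <= 3);
    switch (radius) {
    case 0: return myKernel<0>();
    case 1: return myKernel<1>();
    case 2: return myKernel<2>();
    case 3: return myKernel<3>();
    default: return -1;  // out of range
    }
}
```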

But this is not much fun to write, and it does not scale. So once again, we can rely on templates to generate the code for us!
In this case, here is what the KernelCaller::call function looks like:

And here is the ValueSelector mechanism:
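A possible shape for this, as a host-only sketch: `ValueSelector` walks V = 0, 1, ..., MaxArgValue - 1 at compile time and compares each candidate with the runtime value. The names follow the post, but the signatures (the extra template parameters of `ValueSelector`, the `std::out_of_range` thrown for values outside the range, the vector of `int`s, the hypothetical `myKernel`) are assumptions, not the code from the sample.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Simplified stand-ins for the Loki structures (not Loki's own headers).
struct NullType {};
template <class H, class T> struct Typelist { typedef H Head; typedef T Tail; };
template <int V> struct Int2Type { enum { value = V }; };

template <int N, int MaxArgValue, class TList = NullType> struct KernelCaller;

// Compares the runtime value with V: on a match, Int2Type<V> is added to the
// typelist and the KernelCaller moves on; otherwise V + 1 is tried.
// MaxArgValue is forwarded at every step so the user's bound is honored.
template <int V, int MaxArgValue, int N, class TList>
struct ValueSelector {
    template <class Functor>
    static void select(int value, Functor& functor, const std::vector<int>& args) {
        if (value == V)
            KernelCaller<N - 1, MaxArgValue, Typelist<Int2Type<V>, TList> >::call(functor, args);
        else
            ValueSelector<V + 1, MaxArgValue, N, TList>::select(value, functor, args);
    }
};

// Every value in [0, MaxArgValue) was tried without a match.
template <int MaxArgValue, int N, class TList>
struct ValueSelector<MaxArgValue, MaxArgValue, N, TList> {
    template <class Functor>
    static void select(int, Functor&, const std::vector<int>&) {
        throw std::out_of_range("runtime value outside [0, MaxArgValue)");
    }
};

template <int N, int MaxArgValue, class TList>
struct KernelCaller {
    template <class Functor>
    static void call(Functor& functor, const std::vector<int>& args) {
        assert(static_cast<int>(args.size()) >= N);
        ValueSelector<0, MaxArgValue, N, TList>::select(args[N - 1], functor, args);
    }
};

// No runtime value left: hand the complete typelist to the functor.
template <int MaxArgValue, class TList>
struct KernelCaller<0, MaxArgValue, TList> {
    template <class Functor>
    static void call(Functor& functor, const std::vector<int>&) { functor(TList()); }
};

// Hypothetical kernel with two integer parameters (host stand-in).
template <int A, int B>
int myKernel() { return 10 * A + B; }

struct MyKernelCallFunctor {
    int result;
    MyKernelCallFunctor() : result(-1) {}
    template <class TList>
    void operator()(TList) {
        result = myKernel<TList::Head::value, TList::Tail::Head::value>();
    }
};
```

MaxArgValue travels through both templates here; losing it at the ValueSelector level is the kind of bug mentioned in the update below.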


Loki structures
Here are the Loki structures we rely on:
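For readers without Loki at hand, the following simplified re-implementations mimic the structures described in Modern C++ Design (`NullType`, `Typelist`, `Int2Type`, and the `TL::Length` and `TL::TypeAt` algorithms). They live in a hypothetical `mini` namespace and are not Loki's actual headers, which offer many more operations.

```cpp
namespace mini {  // simplified stand-ins for Loki's headers, not Loki itself

// Marks the end of a typelist.
class NullType {};

// A typelist is a head type followed by another typelist (or NullType).
template <class T, class U>
struct Typelist {
    typedef T Head;
    typedef U Tail;
};

// Turns an integral constant into a distinct type.
template <int v>
struct Int2Type {
    enum { value = v };
};

namespace TL {
// Number of types in a typelist.
template <class TList> struct Length;
template <> struct Length<NullType> { enum { value = 0 }; };
template <class T, class U>
struct Length<Typelist<T, U> > { enum { value = 1 + Length<U>::value }; };

// Type at position index (0-based); fails to compile if index is out of range.
template <class TList, unsigned int index> struct TypeAt;
template <class Head, class Tail>
struct TypeAt<Typelist<Head, Tail>, 0> { typedef Head Result; };
template <class Head, class Tail, unsigned int i>
struct TypeAt<Typelist<Head, Tail>, i> {
    typedef typename TypeAt<Tail, i - 1>::Result Result;
};
}  // namespace TL
}  // namespace mini
```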


Full source code
A complete, runnable sample can be downloaded here.

Have fun, and... sorry for this crazy long post!

[Update June 21]
Code updated: there were 2 bugs in the code:
  • Using a reference as the first parameter of a vararg function (the named parameter given to va_start) is undefined behavior in C++ and resulted in a crash on 32-bit machines, so it was changed into a pointer.
  • The MaxArgValue argument to the KernelCaller was not forwarded by the ValueSelector, so the default value of 10 was used whatever the user specified; as a result, compilation was very slow even for boolean values.
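To put a number on that slowdown: with P runtime parameters that can each take V values, every branch is compiled, so the KernelCaller instantiates V^P functor calls (and as many kernel instantiations). A small constexpr helper (a hypothetical name, not part of the sample) makes the arithmetic explicit:

```cpp
// Number of leaf instantiations (functor/kernel combinations) produced for
// numParams runtime parameters that may each take numValues values.
constexpr long long instantiationCount(int numValues, int numParams) {
    return numParams == 0 ? 1 : numValues * instantiationCount(numValues, numParams - 1);
}

static_assert(instantiationCount(2, 3) == 8, "three boolean parameters");
static_assert(instantiationCount(10, 3) == 1000, "same parameters with a bound of 10");
```

With three boolean parameters, honoring the bound gives 8 instantiations, whereas falling back to the default bound of 10 means 10^3 = 1000.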
I added two #defines to control the sample kernel call: SAMPLE_NUMBER_OF_PARAMS, indicating the number of parameters to use, and SAMPLE_NUMBER_OF_INTEGER_VALUES, indicating the number of values to use for each parameter (2 for boolean parameters).
I also added a #pragma message to follow the progress of the recursive compilation.
The dependency on cutil_inline.h has been removed.