Following my last post, I will describe how template metaprogramming can be used to "emulate" dynamic kernel template parameters. This approach does not prevent the explosion of the amount of code generated at compile time, but it provides a simple interface to "transform" runtime variable values into kernel template parameters.
If you are not familiar with template metaprogramming, I suggest you read Andrei Alexandrescu's great book: Modern C++ Design: Generic Programming and Design Patterns Applied. The book comes with a small and simple metaprogramming library called Loki, which is just a series of helper headers. My code relies on a few structures from it.
Code description
Here is what a kernel call looks like with this mechanism:
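For illustration, here is a minimal sketch of such a call (the kernel, functor and launch configuration are hypothetical, not the original code). myKernel has two boolean template parameters that we want to drive from two runtime flags; MyKernelCallFunctor and KernelCaller are sketched further down.

```cpp
// Hypothetical kernel with two boolean "feature switches" as template parameters.
template <bool UseShadows, bool UseFog>
__global__ void myKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    float v = data[i];
    if (UseShadows) v *= 0.5f;   // dead branches are removed at compile time
    if (UseFog)     v += 1.0f;
    data[i] = v;
}

void launchMyKernel(float* d_data, int n, bool useShadows, bool useFog)
{
    dim3 blockSize(256);
    dim3 gridSize((n + blockSize.x - 1) / blockSize.x);

    MyKernelCallFunctor functor(gridSize, blockSize, d_data, n);

    // The two runtime flags are "transformed" into the two template parameters.
    KernelCaller<2>::call(&functor, (int)useShadows, (int)useFog);
}
```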
The actual kernel call is encapsulated into a functor (here MyKernelCallFunctor) that is passed, together with the runtime variables, to the call function of what I called the KernelCaller. The KernelCaller "transforms" the set of runtime variables passed to the call function into a set of compile-time parameters passed to the functor in the form of a typelist.
The functor is in charge of making the actual kernel call: it extracts the template values from the typelist and passes to the __global__ function the arguments previously given to its constructor and stored in the structure. It has to be written for each kernel you want to launch:
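Below is a minimal sketch of such a functor for the hypothetical myKernel above (names and stored arguments are illustrative); it reads the compile-time values out of the typelist with Loki::TL::TypeAt.

```cpp
#include <loki/Typelist.h>   // Loki::TL::TypeAt

// Functor encapsulating the call to myKernel. The launch configuration and the
// kernel arguments are stored at construction time; operator() receives the
// typelist built by the KernelCaller and extracts the compile-time values from it.
struct MyKernelCallFunctor
{
    MyKernelCallFunctor(dim3 gridSize, dim3 blockSize, float* data, int n)
        : gridSize(gridSize), blockSize(blockSize), data(data), n(n) {}

    template <class TList>
    void operator()(TList)
    {
        myKernel< Loki::TL::TypeAt<TList, 0>::Result::value != 0,
                  Loki::TL::TypeAt<TList, 1>::Result::value != 0 >
            <<<gridSize, blockSize>>>(data, n);
    }

    dim3   gridSize, blockSize;
    float* data;
    int    n;
};
```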
The KernelCaller is the core of the mechanism. It operates by recursively adding static values into the typelist by testing the value of the runtime variables. In reality the compiler generates code for all the branches, and so instantiates all possible sets of compile-time parameters as typelists, the right one being selected at runtime. To ease usage, the runtime variables are passed using a variable argument list and transformed into a vector containing the values.
The compile-time recursion is stopped thanks to the number of arguments, passed as a template parameter to the KernelCaller:
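Here is a sketch of how such a boolean-only KernelCaller could be written with Loki's Typelist, NullType and Int2Type (include paths may differ depending on the Loki installation). Following the update note at the end of the post, the functor is taken by pointer rather than by reference, since a reference in front of a variable argument list is not allowed.

```cpp
#include <cstdarg>
#include <vector>
#include <loki/Typelist.h>   // Loki::Typelist, Loki::NullType, Loki::TL::TypeAt
#include <loki/TypeManip.h>  // Loki::Int2Type

// KernelCaller<NumArgs>: turns NumArgs runtime booleans into compile-time values.
template <int NumArgs>
struct KernelCaller
{
    // Entry point: the functor is passed by pointer, followed by the NumArgs
    // runtime values as a variable argument list.
    template <class Functor>
    static void call(Functor* functor, ...)
    {
        std::vector<int> args(NumArgs);
        va_list ap;
        va_start(ap, functor);
        for (int i = 0; i < NumArgs; ++i)
            args[i] = va_arg(ap, int);   // bool arguments are promoted to int
        va_end(ap);

        callRec<Loki::NullType>(*functor, args);
    }

    // Compile-time recursion over the arguments: test the runtime value of
    // argument NumArgs-1 and prepend the matching constant to the typelist.
    // Both branches are instantiated, so every combination of values exists
    // in the binary; the right one is selected at runtime.
    template <class TList, class Functor>
    static void callRec(Functor& functor, const std::vector<int>& args)
    {
        if (args[NumArgs - 1])
            KernelCaller<NumArgs - 1>::template callRec<
                Loki::Typelist<Loki::Int2Type<true>, TList> >(functor, args);
        else
            KernelCaller<NumArgs - 1>::template callRec<
                Loki::Typelist<Loki::Int2Type<false>, TList> >(functor, args);
    }
};

// Recursion termination: the typelist is complete, call the functor with it.
template <>
struct KernelCaller<0>
{
    template <class TList, class Functor>
    static void callRec(Functor& functor, const std::vector<int>&)
    {
        functor(TList());
    }
};
```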
Here it is: as is, we already have a working mechanism able to deal with an arbitrary number of boolean parameters!
But this is not much fun to write and not very scalable. So once again we can rely on templates to generate the code for us!
In this case here is what the KernelCaller::call function looks like:
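A sketch of this generalized entry point, assuming it replaces the boolean-only KernelCaller above and forwards to the ValueSelector shown next; the MaxArgValue template parameter bounds the values an argument may take and defaults to 10, as mentioned in the update note.

```cpp
// Forward declaration of the ValueSelector shown next.
template <int NumArgsLeft, int MaxArgValue, class TList, int CurValue>
struct ValueSelector;

// Generalized KernelCaller (replaces the boolean-only version above). Each of
// the NumArgs runtime values may now be any integer in [0, MaxArgValue].
template <int NumArgs, int MaxArgValue = 10>
struct KernelCaller
{
    template <class Functor>
    static void call(Functor* functor, ...)
    {
        // Pack the runtime values into a vector (the varargs promote them to int).
        std::vector<int> args(NumArgs);
        va_list ap;
        va_start(ap, functor);
        for (int i = 0; i < NumArgs; ++i)
            args[i] = va_arg(ap, int);
        va_end(ap);

        // Hand over to the ValueSelector, starting with an empty typelist and
        // the highest candidate value for the last argument.
        ValueSelector<NumArgs, MaxArgValue, Loki::NullType, MaxArgValue>
            ::select(*functor, args);
    }
};
```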
And here is the ValueSelector mechanism:
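Below is a ValueSelector sketch consistent with the KernelCaller above: for the current argument it compares the runtime value with every candidate compile-time value, from MaxArgValue down to 0, prepends the matching Loki::Int2Type to the typelist, and then moves on to the next argument; once all arguments are processed it invokes the functor.

```cpp
// ValueSelector: two-level compile-time recursion. The outer level walks over the
// arguments (NumArgsLeft), the inner level walks over the candidate values of the
// current argument (CurValue, from MaxArgValue down to 0).
template <int NumArgsLeft, int MaxArgValue, class TList, int CurValue>
struct ValueSelector
{
    template <class Functor>
    static void select(Functor& functor, const std::vector<int>& args)
    {
        if (args[NumArgsLeft - 1] == CurValue)
        {
            // Value found: record it and move on to the previous argument.
            ValueSelector<NumArgsLeft - 1, MaxArgValue,
                          Loki::Typelist<Loki::Int2Type<CurValue>, TList>,
                          MaxArgValue>::select(functor, args);
        }
        else
        {
            // Try the next candidate value for the same argument.
            ValueSelector<NumArgsLeft, MaxArgValue, TList, CurValue - 1>
                ::select(functor, args);
        }
    }
};

// No candidate matched: the runtime value was outside [0, MaxArgValue].
template <int NumArgsLeft, int MaxArgValue, class TList>
struct ValueSelector<NumArgsLeft, MaxArgValue, TList, -1>
{
    template <class Functor>
    static void select(Functor&, const std::vector<int>&) { /* out-of-range value */ }
};

// All arguments processed: the typelist is complete, invoke the functor.
template <int MaxArgValue, class TList, int CurValue>
struct ValueSelector<0, MaxArgValue, TList, CurValue>
{
    template <class Functor>
    static void select(Functor& functor, const std::vector<int>&)
    {
        functor(TList());
    }
};
```

With this version every combination is instantiated, i.e. up to (MaxArgValue + 1)^NumArgs kernel instantiations, which is exactly the compile-time code explosion mentioned at the beginning of the post, so MaxArgValue should be kept as small as possible.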
Have fun, and... sorry for this crazy long post !
[Update June 21]
Code updated: there were two bugs in the code:
- Using a reference as the first parameter of a vararg function is forbidden by C++ and resulted in a crash on 32-bit machines -> transformed into a pointer.
- The MaxArgValue argument to the KernelCaller was not kept by the ValueSelector, resulting in the use of the default value of 10 regardless of what the user specified, so it was very slow to compile even for boolean values.
I also added a #pragma message in order to follow the recursive compilation.
Dependency on cutil_inline.h removed.
June 20, 2010 at 12:15 AM
This is an awesome technique, it saved hundreds of lines of code. Generally, this solution is the most universal I have seen so far. So far I have only tried it with bool params, but it'd be interesting to check out int params as well!
June 24, 2010 at 1:42 AM
A bit off-topic, but I wanted to know what system you develop on? Windows or Linux (which distribution?), CUDA version? etc.
Also, regarding the volumes, I would like to know where you find the volumes for your renderings, like the GigaVoxels rendering examples. Since I am studying this technique, I would have liked something to run tests with.
And now, concerning the article, I find this technique quite interesting; I will try to understand it and implement an example. For the moment I don't really have a use for it, but it could come up :D
June 24, 2010 at 9:29 AM
Hmm, the question about the OS sounds a bit like an attempt at trolling ;-) Windows, and the latest available version of CUDA (3.1 Beta). As for the volumes, it depends; they can come from various sources, and there is also quite a lot of procedural content or voxelization of objects that I generate myself.
June 24, 2010 at 11:34 AM
Oh no, no such attempt, it was just to get an idea of what you work on. Personally, I work on Linux, but I admit I sometimes have trouble with updates that break my CUDA installation. So it was just to know whether you were on Linux; I won't say anything about the fact that you are on Windows ;)
Thank you for your answers.
July 20, 2010 at 6:41 PM
Obviously this is a very cool technique, I always like this kind of engineering :) However, as you are building kernel calls here, what is the performance implication of building & iterating over std::vector<> parameter lists at runtime (as opposed to lack of overhead at compile-time, as usually is the case for metaprogramming techniques); or should I not be worried about performance in this type of code?
July 20, 2010 at 10:09 PM
Hi Nico,
I really think that the cost of manipulating an std::vector is negligible compared to the cost of launching a kernel on the GPU, so that should not be a problem in practice.