Yesterday NVIDIA released an official disassembler for the sm_1.x (pre-Fermi) real hardware ISA. It's like an official version of DECUDA :-) (which Wladimir stopped developing).
It takes an ELF CUDA binary, a cubin, or even an .exe file as input, and outputs the low-level assembly code of the CUDA kernels it contains.
For now it is only available to registered developers, but you can get a little more information on the CUDA forum.
A lot of developers have been asking for this for a while. It lets you see the impact of optimizations on the real microcode. This is particularly important for register usage, for instance, since register allocation is done after the PTX level.
It's nice that NVIDIA has finally unveiled its real hardware ISA instructions. AMD is still a little ahead on this, since its ISA instructions and microcode are documented even for the Evergreen architecture (RV870): http://developer.amd.com/gpu/ATIStreamSDK/assets/AMD_Evergreen-Family_ISA_Instructions_and_Microcode.pdf
Official CUDA disassembler for sm_1.x real ISA
January 19, 2011 at 4:14 PM
That's very nice :) Now I finally don't have to feel guilty anymore about not having time to maintain decuda.
I wonder if they're going to release a sm2_0 version as well.
January 19, 2011 at 5:44 PM
Hi Wladimir :-) Yes, releasing an sm2_0 version would be very interesting, but I am not sure they are ready to expose that kind of detail about the Fermi hardware before the next generation is released... :-( Anyway, nice to hear from you! What are you working on at the moment?
January 20, 2011 at 2:00 PM
Hello Cyril :)
Well, they have now seen that people will always reverse-engineer their instruction formats, and AMD publicly releases PDFs covering its ISAs, so I hope they realize that trying to keep it under wraps is ineffective. I'm finally going to get a Fermi card soon, which will rekindle my interest in CUDA.
Currently I work at ASML, parallelizing some of their mathematical modeling code for setting lens parameters.
As for side projects, I'm reading up on machine learning and AI again. A lot seems to be happening in that area lately, and it's very well suited to parallelism.