After Intel officially admitted (through this blog post, also an interesting article here) that Larrabee is not going to play in the consumer gaming market in the "near future", BsN published an interesting post-mortem article by Andrew Richards: Why Intel Larrabee Really Stumbled: Developer Analysis
At some point in your PhD, even trees start looking like the Stanford Bunny!
And it's not Photoshopped (I saw it during a hike near Grenoble).
Just discovered a good review of OpenGL 4.0 written by Christophe Riccio "Groovounet"; I missed it back in March.
NVIDIA released a beta version of the CUDA 3.1 toolkit for registered developers.
New features from the programming guide:
- 16-bit float textures supported by the runtime API. __float2half_rn() and __half2float() intrinsics added (Table C-3).
- Surface memory interface exposed in the runtime API (Section 3.2.5, B.9). It gives read/write access to textures (CUDA Arrays), but is limited to 1D and 2D arrays for now.
- Up to 16 kernels can run concurrently on Fermi (it was only 4 in CUDA 3.0). Not sure how it is really implemented (one per SM? multiple per SM?).
- Recursive calls supported in device functions on Fermi (B.1.4). Stack size query and setting functions added (cudaThreadGetLimit(), cudaThreadSetLimit()).
- Function pointers to device functions supported on Fermi (B.1.4). Function pointers to global functions supported on all GPUs.
- Just noticed that a __CUDA_ARCH__ macro, which lets you write different code paths depending on the target architecture (or for code executed on the host), has been available since CUDA 3.0 (B.1.4).
- printf support in kernels integrated into the API for sm_20 (B.14). Note that a cuprintf supporting all architectures was provided to registered developers a few months ago.
- New __byte_perm(x,y,s) intrinsic (C.2.3).
- New __forceinline__ function qualifier to force inlining on Fermi. A __noinline__ qualifier was already available to force an actual function call on sm_1.x.
- New -dlcm compilation flag to specify the global memory caching strategy on Fermi (G.4.2).
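To make two of the items above more concrete, here is a minimal sketch combining the __CUDA_ARCH__ macro with the new __byte_perm() intrinsic. The names `HD` and `byte_perm` are mine (hypothetical), not from the toolkit. The host fallback follows my reading of the selector layout (the low 3 bits of each of the four low nibbles of s pick one of the 8 input bytes, x first), so treat it as an assumption to check against the programming guide. It is written so that it also compiles as plain C++ without nvcc.

```cpp
#include <cstdint>

// Under nvcc, mark the function usable on both host and device;
// under a plain C++ compiler the qualifiers simply disappear.
#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical wrapper: picks 4 bytes out of the 8 bytes of (x, y).
HD inline std::uint32_t byte_perm(std::uint32_t x, std::uint32_t y, std::uint32_t s)
{
#if defined(__CUDA_ARCH__)
    // Device code path: use the hardware intrinsic.
    return __byte_perm(x, y, s);
#else
    // Host code path: portable emulation (assumed semantics, see above).
    const std::uint64_t bytes = (static_cast<std::uint64_t>(y) << 32) | x;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t sel  = (s >> (4 * i)) & 0x7u;   // which input byte
        const std::uint32_t byte = static_cast<std::uint32_t>((bytes >> (8 * sel)) & 0xFFu);
        result |= byte << (8 * i);                           // place it in output byte i
    }
    return result;
#endif
}
```

With this reading, a selector of 0x0123 reverses the bytes of x, and 0x7654 simply returns y.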
Interesting new stuff in the Fermi Compatibility Guide:
- Just-in-time kernel compilation can be used with the runtime API with R195 drivers (Section 1.2.1).
- Details on using the volatile keyword for intra-warp communication (Section 1.2.2).
Interesting new stuff in the Best Practices Guide:
- Use signed integers instead of unsigned ones as loop counters. Signed overflow is undefined, so the compiler is free to perform strength reduction, which can give better performance (Section 6.3).
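The loop-counter advice is easier to see on a small strided copy. This is only a sketch in plain C++ (the same reasoning applies to the body of a kernel), and the function names are made up for illustration. Both versions produce the same result for in-range indices. The difference is only in what the compiler is allowed to assume about `stride * i`.

```cpp
// Signed counter: overflow of stride * i would be undefined behaviour, so the
// compiler may assume it never happens and replace the multiply with a
// running addition (strength reduction).
void gather_signed(float* out, const float* in, int offset, int stride, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = in[offset + stride * i];
}

// Unsigned counter: stride * i is defined to wrap modulo 2^32, and the
// compiler has to preserve that behaviour, which can block the optimization.
void gather_unsigned(float* out, const float* in,
                     unsigned offset, unsigned stride, unsigned n)
{
    for (unsigned i = 0; i < n; ++i)
        out[i] = in[offset + stride * i];
}
```

Whether the compiler actually generates different code depends on the compiler version and target, so it is worth checking the generated PTX or assembly before relying on it.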
Jeremy Sugerman defended his PhD yesterday; the slides from his talk on GRAMPS can be found here:
Programming Many-Core Systems with GRAMPS
Here is also a talk he gave on GRAMPS at UC Davis in February:
Some fairly old stuff (from January) that I just read again. I like this point of view :-)