NVIDIA Turing Vulkan/OpenGL extensions (Icare3D Blog, Cyril Crassin, 2018-09-19)<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzG9KGuKYUZ5ATkehogi_Jq90lQqnqVK8lWeibwR1hN6F4VEE6D_6XVlUX_3TZdTtNj6zVIYUOhSkoFDVMss3NH_i6MYYmLPXHyShN0LGQeJV2bi4k3eDBz1xFXssBPjG3EL0o1ziNGic/s1600/QuadroTuring01.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="1443" data-original-width="1600" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzG9KGuKYUZ5ATkehogi_Jq90lQqnqVK8lWeibwR1hN6F4VEE6D_6XVlUX_3TZdTtNj6zVIYUOhSkoFDVMss3NH_i6MYYmLPXHyShN0LGQeJV2bi4k3eDBz1xFXssBPjG3EL0o1ziNGic/s200/QuadroTuring01.jpg" width="200" /></a>NVIDIA just launched the TU102/TU104 (GeForce RTX 2080ti/2080), the first GPUs based on the <b>Turing</b> architecture. This new architecture brings hardware ray-tracing acceleration, as well as many other new and really cool graphics features. A good high-level overview of the architecture and new graphics features can be found in the <a href="https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf" target="_blank">Turing Architecture whitepaper</a> as well as in this <a href="https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/" target="_blank">blog post</a>. Most of these features are exposed through both <b>Vulkan</b> and <b>OpenGL</b> extensions, and I will quickly go through each of them in this post. A big thanks to the many people at NVIDIA who worked hard to provide us with these extensions!</div>
<br />
Most features split into a Vulkan- or OpenGL-specific extension (<span style="font-family: "courier new" , "courier" , monospace;">GL_*/VK_*</span>), and a GLSL or SPIR-V shader extension (<span style="font-family: "courier new" , "courier" , monospace;">GLSL_*/SPV_*</span>).<br />
<br />
<h2>
Ray-Tracing Acceleration</h2>
<ul>
<li><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NVX_raytracing" target="_blank"><span style="font-family: "courier new" , "courier" , monospace;">VK_NVX_raytracing</span></a> (<span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/KhronosGroup/GLSL/blob/master/extensions/nvx/GLSL_NVX_raytracing.txt" target="_blank">GLSL_NVX_raytracing</a> / <a href="http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NVX_raytracing.html" target="_blank">SPV_NVX_raytracing</a></span>)</li>
</ul>
<div style="text-align: justify;">
<a href="https://devblogs.nvidia.com/wp-content/uploads/2018/03/raytrace_01-625x630.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="630" data-original-width="625" height="200" src="https://devblogs.nvidia.com/wp-content/uploads/2018/03/raytrace_01-625x630.png" width="198" /></a>Turing brings hardware acceleration for ray-tracing through dedicated units called RT cores. The RT cores provide BVH traversal as well as ray-triangle intersection. This acceleration is exposed in Vulkan through a new ray-tracing pipeline, associated with a series of new shader stages. This programming model maps to the DXR (DirectX Ray-Tracing) model, which is quickly described in <a href="https://devblogs.nvidia.com/introduction-nvidia-rtx-directx-ray-tracing/">this blog post</a>, while <a href="https://devblogs.nvidia.com/vulkan-raytracing/">this blog post</a> details the Vulkan implementation.<br />
<br />
<div style="text-align: left;">
A GTC 2018 presentation about Vulkan Ray-Tracing can also be found here: <a href="http://on-demand.gputechconf.com/gtc/2018/video/S8521/">http://on-demand.gputechconf.com/gtc/2018/video/S8521/</a> (Slides <a href="http://on-demand.gputechconf.com/gtc/2018/presentation/s8521-advanced-graphics-extensions-for-vulkan.pdf" target="_blank">here</a>).</div>
<br /></div>
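To give a flavor of the new shader stages, here is a minimal GLSL ray generation shader sketch for this extension. The descriptor bindings, resource names and camera setup are illustrative assumptions, not taken from the post:

```glsl
#version 460
#extension GL_NVX_raytracing : require

// Bindings are hypothetical; match them to your own descriptor set layout.
layout(binding = 0) uniform accelerationStructureNVX topLevelAS;
layout(binding = 1, rgba8) uniform image2D outputImage;
layout(location = 0) rayPayloadNVX vec4 payload;

void main()
{
    // One launch thread per pixel.
    vec2 uv     = (vec2(gl_LaunchIDNVX.xy) + 0.5) / vec2(gl_LaunchSizeNVX.xy);
    vec3 origin = vec3(uv * 2.0 - 1.0, -1.0);
    vec3 dir    = vec3(0.0, 0.0, 1.0);

    // Trace against the BVH; the closest-hit or miss shader writes the payload.
    traceNVX(topLevelAS, gl_RayFlagsOpaqueNVX, 0xFF,
             0 /*sbtOffset*/, 0 /*sbtStride*/, 0 /*missIndex*/,
             origin, 0.0 /*tMin*/, dir, 100.0 /*tMax*/,
             0 /*payload location*/);

    imageStore(outputImage, ivec2(gl_LaunchIDNVX.xy), payload);
}
```

The closest-hit, any-hit, miss, intersection and callable stages follow the same GLSL model, each with its own file/stage suffix.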
<br />
<h2>
Mesh Shading </h2>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJsCUF4Ac_2SMc7HgenyAOz4YdV3MH4SFl9CEIpa3epK0tFQW7Hmb8-tFBtqtFUVdY0fbU7jeiB4Il6xgNnsbjVDJCytTXUvf0lcDpoCHx8CUgYcY3fpTqa8Q0E_RhsOnZwlgE0N8yCbc/s1600/Meshlets2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="471" data-original-width="1600" height="188" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJsCUF4Ac_2SMc7HgenyAOz4YdV3MH4SFl9CEIpa3epK0tFQW7Hmb8-tFBtqtFUVdY0fbU7jeiB4Il6xgNnsbjVDJCytTXUvf0lcDpoCHx8CUgYcY3fpTqa8Q0E_RhsOnZwlgE0N8yCbc/s640/Meshlets2.png" width="640" /></a></div>
<br />
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_mesh_shader" target="_blank">VK_NV_mesh_shader</a> / <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_mesh_shader.txt" target="_blank">GL_NV_mesh_shader</a></span> <span style="font-size: small;">(<span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_mesh_shader.txt" target="_blank">GLSL_NV_mesh_shader</a> / <a href="http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_mesh_shader.html" target="_blank">SPV_NV_mesh_shader</a></span>)</span></li>
</ul>
<div style="text-align: justify;">
<br />
This is a new programmable geometry pipeline which replaces the traditional VS/HS/DS/GS pipeline with what is essentially a Compute-based programming model. This new pipeline is based on two shader stages, a Task Shader and a Mesh Shader (separated by an expansion stage), which ultimately generate a compact mesh description called a Meshlet. A Meshlet is a mini indexed geometry representation which is maintained on-chip and fed directly to the rasterizer for consumption. This exposes a very flexible and very efficient model with Compute Shader features and generic cooperative thread groups (workgroups, shared memory, barrier synchronizations...). Applications are endless: this can for instance be used to implement efficient culling or LOD schemes, or to perform procedural geometry generation.</div>
<div style="text-align: left;">
<br />
Many details can be found in this excellent blog post by Christoph Kubisch: <a href="https://devblogs.nvidia.com/introduction-turing-mesh-shaders/">https://devblogs.nvidia.com/introduction-turing-mesh-shaders/</a></div>
<div style="text-align: left;">
As well as in his Siggraph 2018 presentation: <a href="http://on-demand.gputechconf.com/siggraph/2018/video/sig1811-3-christoph-kubisch-mesh-shaders.html">http://on-demand.gputechconf.com/siggraph/2018/video/sig1811-3-christoph-kubisch-mesh-shaders.html</a><br />
<br />
A full OpenGL sample code which implements a <a href="http://onrendering.com/data/papers/isubd/isubd.pdf">compute-based adaptive tessellation technique</a> can also be found there: <a href="https://github.com/jdupuy/opengl-framework/tree/master/demo-isubd-terrain">https://github.com/jdupuy/opengl-framework/tree/master/demo-isubd-terrain</a><br />
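As a minimal sketch of the model, a GLSL mesh shader that emits a single hard-coded triangle per workgroup could look like this (the built-in output arrays come straight from GLSL_NV_mesh_shader; everything else is illustrative):

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 32) in;
layout(triangles, max_vertices = 64, max_primitives = 126) out;

void main()
{
    // A trivial meshlet: one invocation of each workgroup emits one triangle.
    if (gl_LocalInvocationID.x == 0)
    {
        gl_MeshVerticesNV[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
        gl_MeshVerticesNV[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
        gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);

        gl_PrimitiveIndicesNV[0] = 0;
        gl_PrimitiveIndicesNV[1] = 1;
        gl_PrimitiveIndicesNV[2] = 2;

        // Number of primitives actually written by this meshlet.
        gl_PrimitiveCountNV = 1;
    }
}
```

In a real meshlet pipeline all 32 invocations would cooperatively fill the vertex and index arrays, optionally driven by a Task Shader that culls and dispatches meshlets.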
<h2>
Variable Rate Shading</h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_shading_rate_image" target="_blank">VK_NV_shading_rate_image</a> / <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shading_rate_image.txt" target="_blank">GL_NV_shading_rate_image</a></span> (<span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shading_rate_image.txt" target="_blank">GLSL_NV_shading_rate_image</a> / <a href="http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_shading_rate.html" target="_blank">SPV_NV_shading_rate</a></span>)</li>
</ul>
<div style="text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHeefiXf14L1xsyJz2VOjS5Ryt4aQav6QW_2bUxjmduiyaEX0F4ajeJUuaPxR4oFl6qDcdkpXAKfMtk98NifSYC8isOKeuix4e7Cs2L5c2l2nHRONcLwBXVS8m0iUTvCgSNw-wnAuLVIg/s1600/VRS2.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="420" data-original-width="605" height="220" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHeefiXf14L1xsyJz2VOjS5Ryt4aQav6QW_2bUxjmduiyaEX0F4ajeJUuaPxR4oFl6qDcdkpXAKfMtk98NifSYC8isOKeuix4e7Cs2L5c2l2nHRONcLwBXVS8m0iUTvCgSNw-wnAuLVIg/s320/VRS2.png" width="320" /></a>This is a very powerful hardware feature which allows the application to dynamically control the number of fragment shader invocations (independently of the visibility rate) and vary this <i>shading rate</i> across the framebuffer. The <i>shading rate</i> is controlled using a texture image ("Shading Rate Image", 8b/texel) where each texel specifies an independent shading rate for blocks of 16x16 pixels. The rate is actually specified indirectly
using 8b indices into a palette which is specified per-viewport and stores the actual
shading rate flags.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Not only does the feature allow varying the MSAA shading rate per-pixel (allowing 1x, 4x, 8x, and now even 16x SSAA, but with a maximum of 8x depth test and color storage), it also allows dropping the shading rate below one invocation per pixel, down to one invocation per block of 4x4 pixels (through 1x2, 2x1, 2x2, 2x4 and 4x2 blocks), and even zero invocations. </div>
<div style="text-align: justify;">
<br />
The GLSL extension also exposes intrinsics allowing fragment shaders to read the effective fragment size in pixels (<span style="font-family: "courier new" , "courier" , monospace;">gl_FragmentSizeNV</span>) as well as the number of fragment shader invocations for a fully covered pixel (<span style="font-family: "courier new" , "courier" , monospace;">gl_InvocationsPerPixelNV</span>). This opens the road to many new algorithms and more efficient implementations of optimized shading rate techniques, like Foveated Rendering, Lens Adaptation (for VR), and Content- or Motion-Adaptive Shading.</div>
More info on Variable Rate Shading can be found in this blog post: <a href="https://devblogs.nvidia.com/turing-variable-rate-shading-vrworks/">https://devblogs.nvidia.com/turing-variable-rate-shading-vrworks/</a><br />
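The two built-ins above can be used directly in a fragment shader, for instance to visualize the effective shading rate across the framebuffer (a debugging sketch; the color mapping is arbitrary):

```glsl
#version 450
#extension GL_NV_shading_rate_image : require

layout(location = 0) out vec4 color;

void main()
{
    // Footprint of this invocation in pixels, e.g. ivec2(2, 2) for 2x2 coarse shading.
    ivec2 fragSize = gl_FragmentSizeNV;

    // Invocations per pixel for a fully covered pixel (supersampling rates > 1).
    int rate = gl_InvocationsPerPixelNV;

    // Red encodes coarseness, green encodes supersampling.
    color = vec4(float(fragSize.x * fragSize.y) / 16.0,
                 float(rate) / 16.0, 0.0, 1.0);
}
```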
<h2>
Exclusive Scissor Test</h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_scissor_exclusive" target="_blank">VK_NV_scissor_exclusive</a> / <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_scissor_exclusive.txt" target="_blank">GL_NV_scissor_exclusive</a></span></li>
</ul>
<div style="text-align: justify;">
This adds a second per-viewport scissor test, which culls fragments *inside* (exclusive) the specified rectangle, unlike the standard scissor test which culls *outside* (inclusive). This can be used for instance to implement more efficient multi-resolution foveated-rendering techniques (in conjunction with Variable Rate Shading), where raster passes fill concentric strips of pixels by enabling both inclusive and exclusive scissor tests.</div>
<br />
<h2>
Texture Access Footprint</h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_shader_image_footprint" target="_blank">VK_NV_shader_image_footprint</a> / <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_texture_footprint.txt" target="_blank">GL_NV_shader_texture_footprint</a></span> (<span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_texture_footprint.txt" target="_blank">GLSL_NV_shader_texture_footprint</a> / <a href="http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_shader_image_footprint.html" target="_blank">SPV_NV_shader_image_footprint</a></span>)</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeug52RI0zAftuBlATnKkTG60WQVPNKkARjdn4FqNGrYH6fmQiI0CMjU72xgIT21pK4ebTXUvPE9q-CcvqYPescUiRDrHcdEKLedx6TVj_ZP4IBO_eiug9xBKMVeLI4P42sP9HBCj9_xY/s1600/TexFootprint2.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="304" data-original-width="308" height="197" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeug52RI0zAftuBlATnKkTG60WQVPNKkARjdn4FqNGrYH6fmQiI0CMjU72xgIT21pK4ebTXUvPE9q-CcvqYPescUiRDrHcdEKLedx6TVj_ZP4IBO_eiug9xBKMVeLI4P42sP9HBCj9_xY/s200/TexFootprint2.png" width="200" /></a></div>
<div style="text-align: justify;">
These extensions expose a set of GLSL (and SPIR-V) query functions which report the texture-space footprints of texture lookups, i.e., a description of the set of all texels that may be accessed in order to produce a filtered result for the corresponding texture access (which may use anisotropic filtering and potentially cover a large footprint). Footprints are returned as an LOD value, an anchor point, and a 64-bit bitfield where each bit represents coverage for a group of neighboring texels (in 2D, the group granularity can range from 2x2 to 256x256 texels). </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This is actually an important component for implementing multi-pass decoupled and texture-space shading pipelines, where a restricted set of actually visible pixels must be determined in order to efficiently perform shading in a subsequent pass.</div>
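A GLSL sketch of the query, as I understand it from the extension (the granularity code, bindings and output encoding here are illustrative; check GLSL_NV_shader_texture_footprint for the exact granularity mapping):

```glsl
#version 450
#extension GL_NV_shader_texture_footprint : require

layout(binding = 0) uniform sampler2D tex;
layout(location = 0) in vec2 uv;
layout(location = 0) out vec4 result;

void main()
{
    gl_TextureFootprint2DNV footprint;

    // Query which texel groups a (possibly anisotropic) lookup at uv would touch.
    // The integer granularity code selects the texel-group size (see the spec).
    bool singleLod = textureFootprintNV(tex, uv, 4, false /*coarse*/, footprint);

    // footprint.anchor/offset locate the bitfield window in texel-group space;
    // footprint.mask is the 64-bit coverage bitfield (as a uvec2);
    // footprint.lod is the accessed mip level.
    result = vec4(vec2(footprint.anchor), float(footprint.lod),
                  singleLod ? 1.0 : 0.0);
}
```

A texture-space shading pass would typically scatter these footprints into a "texel visibility" mask instead of writing them to a color output.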
<br />
<h2>
Derivatives in Compute Shader </h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_compute_shader_derivatives" target="_blank">VK_NV_compute_shader_derivatives</a> / <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_compute_shader_derivatives.txt" target="_blank">GL_NV_compute_shader_derivatives</a></span> (<span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_compute_shader_derivatives.txt" target="_blank">GLSL_NV_compute_shader_derivatives</a> / <a href="http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_compute_shader_derivatives.html" target="_blank">SPV_NV_compute_shader_derivatives</a></span>)</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizwbhl3BefpUN556sW03snoUdxp7uKkDiJyPXstFQvHr8BbrLW1ud6Im03fYEtdnf8aXA8VmXtpmA5Fai-WUo2lDPzmDFIRDFQ-PR4ve0HM54siHq2oRCcrhyfNfhgNwyl_t2VI4wOARI/s1600/ComputeQuad.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="329" data-original-width="329" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizwbhl3BefpUN556sW03snoUdxp7uKkDiJyPXstFQvHr8BbrLW1ud6Im03fYEtdnf8aXA8VmXtpmA5Fai-WUo2lDPzmDFIRDFQ-PR4ve0HM54siHq2oRCcrhyfNfhgNwyl_t2VI4wOARI/s200/ComputeQuad.png" width="150" /></a></div>
<div style="text-align: justify;">
These extensions bring Compute even closer to Graphics by adding support for Quad-based derivatives in Compute Shaders, using the x and y coordinates of the local workgroup invocation ID. This allows Compute Shaders to use built-in derivative functions like dFdx(), texture lookup functions with automatic LOD computation, and the texture level of detail query function (<span style="font-family: "courier new" , "courier" , monospace;">textureQueryLod()</span>). <br />
Two layout qualifiers, <span style="font-family: "courier new" , "courier" , monospace;">derivative_group_quadsNV</span> and <span style="font-family: "courier new" , "courier" , monospace;">derivative_group_linearNV</span>, allow specifying the Quad arrangement based on 2D indices or a linear index.</div>
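A compute shader sketch using the 2D-quad arrangement (the texture/image bindings are illustrative):

```glsl
#version 450
#extension GL_NV_compute_shader_derivatives : require

// Arrange invocations into 2x2 quads based on the x/y local invocation IDs.
layout(derivative_group_quadsNV) in;
layout(local_size_x = 8, local_size_y = 8) in;

layout(binding = 0) uniform sampler2D tex;
layout(binding = 0, rgba8) uniform writeonly image2D dst;

void main()
{
    vec2 uv = (vec2(gl_GlobalInvocationID.xy) + 0.5) / vec2(imageSize(dst));

    // Automatic LOD selection, LOD queries and derivatives now work in compute.
    vec4  c   = texture(tex, uv);
    float lod = textureQueryLod(tex, uv).y;
    vec2  ddx = dFdx(uv);

    imageStore(dst, ivec2(gl_GlobalInvocationID.xy), c);
}
```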
<br />
<br />
<h2>
Shader Subgroup Operations</h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_shader_subgroup_partitioned" target="_blank">VK_NV_shader_subgroup_partitioned</a> / <a href="https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GL_NV_shader_subgroup_partitioned.txt" target="_blank">GL_NV_shader_subgroup_partitioned</a></span> (<span style="font-family: "courier new" , "courier" , monospace;"><a href="http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_shader_subgroup_partitioned.html" target="_blank">SPV_NV_shader_subgroup_partitioned</a></span>)</li>
</ul>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
These shader extensions provide a series of ballot-based partitioning and scan/reduce operations which operate on "subgroups" of shader invocations. This can be used for instance to implement clustering and de-duplication operations on sets of values distributed among different shader invocations.</div>
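For instance, a de-duplication/counting sketch in a compute shader (buffer layout and names are illustrative):

```glsl
#version 450
#extension GL_NV_shader_subgroup_partitioned : require

layout(local_size_x = 32) in;

layout(binding = 0) buffer Keys   { uint keys[]; };
layout(binding = 1) buffer Counts { uint counts[]; };

void main()
{
    uint i   = gl_GlobalInvocationID.x;
    uint key = keys[i];

    // Partition the subgroup: invocations holding equal keys land in the same
    // subset, described by a ballot mask.
    uvec4 subset = subgroupPartitionNV(key);

    // Partitioned reduction within each subset: count duplicates of this key.
    counts[i] = subgroupPartitionedAddNV(1u, subset);
}
```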
<br />
<h2>
Barycentric Coordinates and Manual Attribute Interpolation</h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_fragment_shader_barycentric" target="_blank">VK_NV_fragment_shader_barycentric</a> / <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_fragment_shader_barycentric.txt" target="_blank">GL_NV_fragment_shader_barycentric</a></span> (<span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_fragment_shader_barycentric.txt" target="_blank">GLSL_NV_fragment_shader_barycentric</a> / <a href="http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_fragment_shader_barycentric.html" target="_blank">SPV_NV_fragment_shader_barycentric</a></span>)</li>
</ul>
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://www.scratchapixel.com/images/upload/ray-triangle/barycentriccolor.png" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="348" data-original-width="300" height="200" src="https://www.scratchapixel.com/images/upload/ray-triangle/barycentriccolor.png" width="171" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><i>Illustration courtesy of Jean-Colas Prunier, <br /><a href="https://www.scratchapixel.com/">https://www.scratchapixel.com/</a></i></td></tr>
</tbody></table>
<div style="text-align: justify;">
This feature exposes barycentric coordinates as Fragment Shader input in GLSL (and SPIR-V), and provides the ability for a Fragment Shader to directly fetch raw per-vertex values in order to perform manual barycentric interpolation. </div>
<div style="text-align: justify;">
A three-component built-in input vector <span style="font-family: "courier new" , "courier" , monospace;">gl_BaryCoordNV</span> provides perspective-corrected barycentric coordinates (<span style="font-family: "courier new" , "courier" , monospace;">gl_BaryCoordNoPerspNV</span> for non-perspective-corrected ones). Per-vertex inputs use the same bracketed array syntax as Tessellation and Geometry Shader inputs, and a <span style="font-family: "courier new" , "courier" , monospace;">pervertexNV</span> qualifier is added to identify input blocks and variables which read raw per-vertex values from the vertices of the original primitive.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This feature potentially allows passing data to the Fragment Shader more efficiently, for instance using compact or compressed data formats. It could also allow interpolating vertex values fetched directly from memory, user-defined interpolations, or various reconstructions and computations using the raw attributes of the three vertices.</div>
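Putting the two pieces together, a fragment shader performing manual interpolation could be sketched as follows (the input name and location are illustrative):

```glsl
#version 450
#extension GL_NV_fragment_shader_barycentric : require

// Raw, non-interpolated per-vertex values from the rasterized triangle.
layout(location = 0) pervertexNV in vec3 vColor[];
layout(location = 0) out vec4 fragColor;

void main()
{
    // Manual perspective-corrected interpolation using the built-in barycentrics.
    vec3 c = gl_BaryCoordNV.x * vColor[0]
           + gl_BaryCoordNV.y * vColor[1]
           + gl_BaryCoordNV.z * vColor[2];
    fragColor = vec4(c, 1.0);
}
```

Any other reconstruction (min/max, custom basis functions, decompression before blending) can be substituted for the weighted sum.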
<br />
<h2>
Ptex Hardware Acceleration</h2>
<ul>
<li><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_corner_sampled_image" target="_blank"><span style="font-family: "courier new" , "courier" , monospace;">VK_NV_corner_sampled_image</span></a></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg20X5aW9WW3D6XrYbINfIEknd1AQt69689dLdyKxwXFy59oaEzCRKZzfrIrghSTz442svKad_CQa2Tzk3R_tse4a8hpHAtaHLDB4Z4rHb9fkS2dYI9bUKc-OqUAYP0q8hOW1Y4PkQb70c/s1600/ptex.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="893" data-original-width="966" height="184" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg20X5aW9WW3D6XrYbINfIEknd1AQt69689dLdyKxwXFy59oaEzCRKZzfrIrghSTz442svKad_CQa2Tzk3R_tse4a8hpHAtaHLDB4Z4rHb9fkS2dYI9bUKc-OqUAYP0q8hOW1Y4PkQb70c/s200/ptex.png" width="200" /></a></div>
<div style="text-align: justify;">
A corner-sampled image has texels centered on integer coordinates instead of halfway between them, which allows sampling coordinates on the edges of the texture to filter exactly to the edge texels. This facilitates implementing Ptex (Per-face Texture [<a href="http://ptex.us/ptexpaper.html" target="_blank">Burley and Lacewell 2008</a>], cf. <a href="https://developer.nvidia.com/sites/default/files/akamai/gamedev/docs/Borderless%20Ptex.pdf">https://developer.nvidia.com/sites/default/files/akamai/gamedev/docs/Borderless%20Ptex.pdf</a>) texturing in real-time applications by providing proper filtering and interpolation. Ptex uses separate textures for each face of a subdivision surface or polygon mesh, with sample locations placed at pixel corners, maintaining continuity between adjacent patches by duplicating values along shared edges.</div>
<br />
<br /></div>
<h2>
Representative Fragment Test</h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_NV_representative_fragment_test" target="_blank">VK_NV_representative_fragment_test</a></span> / <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_representative_fragment_test.txt" target="_blank"><span style="font-family: "courier new" , "courier" , monospace;">GL_NV_representative_fragment_test</span></a></li>
</ul>
<div style="text-align: justify;">
<br />
This extension has been designed to allow optimizing occlusion-query techniques which rely on per-fragment recording of visible primitives. It allows the hardware to stop generating fragments and emitting fragment shader invocations for a given primitive once a single fragment has passed the early depth and stencil tests. This reduced subset of fragment shader invocations can then be used to record visible primitives in a more performant way. This is only a performance optimization: no guarantee is given on the number of discarded fragments, and consequently on the number of fragment shader invocations that will actually be executed.</div>
A typical use case for this feature can be found in this occlusion culling sample code: <a href="https://github.com/nvpro-samples/gl_occlusion_culling">https://github.com/nvpro-samples/gl_occlusion_culling</a><br />
<br />
<h2>
Multi-View Rendering</h2>
<ul>
<li><span style="font-family: "courier new" , "courier" , monospace;"><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_KHR_multiview" target="_blank">VK_KHR_multiview</a> / <a href="https://www.khronos.org/registry/OpenGL/extensions/OVR/OVR_multiview.txt" target="_blank">GL_OVR_multiview</a></span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogs.nvidia.com/wp-content/uploads/2018/08/20-vr-realism.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="450" data-original-width="800" height="112" src="https://blogs.nvidia.com/wp-content/uploads/2018/08/20-vr-realism.png" width="200" /></a></div>
<div style="text-align: justify;">
Pascal's Simultaneous Multi-Projection (SMP) and stereo view features allowed broadcasting the same geometric data for rasterization to multiple views (up to 2) and viewports (up to 16), with very limited possibilities of per-view (and per-viewport) variations (a horizontal position offset per-view, plus an xyz swizzle and viewport attributes per-viewport). Turing generalizes and improves over this feature by allowing fully independent per-view attribute values (including vertex positions) to be specified, and exposes up to 4 views. No new extension is provided; the feature is exposed transparently as an optimization to the existing standardized Vulkan VK_KHR_multiview and OpenGL GL_OVR_multiview extensions.<br />
More info on Multi-View Rendering in this blog post: <a href="https://devblogs.nvidia.com/turing-multi-view-rendering-vrworks/">https://devblogs.nvidia.com/turing-multi-view-rendering-vrworks/</a> </div>
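On the shader side nothing changes with Turing; a standard multiview vertex shader like the sketch below (uniform block layout is illustrative) simply runs faster when the per-view work can be broadcast in hardware:

```glsl
#version 450
#extension GL_OVR_multiview : require

// Broadcast each draw to 2 views; gl_ViewID_OVR selects the per-view matrix.
layout(num_views = 2) in;

layout(location = 0) in vec3 position;
layout(binding = 0) uniform Matrices { mat4 viewProj[2]; };

void main()
{
    // Per-view position variation: this is what Turing MVR accelerates.
    gl_Position = viewProj[gl_ViewID_OVR] * vec4(position, 1.0);
}
```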
<br />NVIDIA Maxwell OpenGL extensions (Cyril Crassin, 2014-09-19)<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPpFt80wYBL0t2Lhlzq6fUFT3qFU-wcKbajrI9wjAsGymy6rAZBd9L36vdAmxRRVEZlTEXrEdEt73iDO50y-ol8t4r6mm9ENAjkd_zDZtPrYXJrN20cpjDgxdJNggC1GORu6wkSzaOtSY/s1600/GM204Die.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPpFt80wYBL0t2Lhlzq6fUFT3qFU-wcKbajrI9wjAsGymy6rAZBd9L36vdAmxRRVEZlTEXrEdEt73iDO50y-ol8t4r6mm9ENAjkd_zDZtPrYXJrN20cpjDgxdJNggC1GORu6wkSzaOtSY/s1600/GM204Die.jpg" width="200" /></a></div>
<div style="text-align: justify;">
NVIDIA just launched the second-generation <a href="http://devblogs.nvidia.com/parallelforall/maxwell-most-advanced-cuda-gpu-ever-made/" target="_blank">Maxwell architecture</a> with the GM204 GPU, which is, I believe, an incredible chip. The Maxwell 2 architecture is both highly energy efficient (~2x perf/watt of Kepler in games), and provides a lot of very exciting new graphics features (<a href="http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/4" target="_blank">some of them are exposed in Direct3D</a>). These features are exposed in the form of new OpenGL extensions in the <a href="http://www.nvidia.com/Download/index.aspx?lang=en-us" target="_blank">R344 driver</a> that was released today, and the specification for all NVIDIA supported GL extensions can be found <a href="https://developer.nvidia.com/nvidia-opengl-specs" target="_blank">here</a>. NVIDIA also released <a href="https://developer.nvidia.com/gameworks-opengl-samples" target="_blank">new SDK samples</a> using these extensions.</div>
<br />
<h2>
<span style="font-size: x-large;">List of new extensions</span></h2>
<ul>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_post_depth_coverage.txt" target="_blank">EXT_post_depth_coverage</a> </li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_raster_multisample.txt" target="_blank"> EXT_raster_multisample</a></li>
<li> <a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_sparse_texture2.txt" target="_blank">EXT_sparse_texture2</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_texture_filter_minmax.txt" target="_blank"> EXT_texture_filter_minmax</a></li>
<li> <a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_conservative_raster.txt" target="_blank">NV_conservative_raster</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fill_rectangle.txt" target="_blank"> NV_fill_rectangle</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_coverage_to_color.txt" target="_blank"> NV_fragment_coverage_to_color</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_shader_interlock.txt" target="_blank"> NV_fragment_shader_interlock</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_framebuffer_mixed_samples.txt" target="_blank"> NV_framebuffer_mixed_samples</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_geometry_shader_passthrough.txt" target="_blank"> NV_geometry_shader_passthrough</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_sample_locations.txt" target="_blank"> NV_sample_locations</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_sample_mask_override_coverage.txt" target="_blank"> NV_sample_mask_override_coverage</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_shader_atomic_fp16_vector.txt" target="_blank"> NV_shader_atomic_fp16_vector</a></li>
<li><a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_path_rendering_shared_edge.txt" target="_blank"> NV_path_rendering_shared_edge</a></li>
<li><a href="https://www.opengl.org/registry/specs/NV/viewport_array2.txt" target="_blank"> NV_viewport_array2</a></li>
</ul>
<h2>
<span style="font-size: x-large;">Quick description of the new extensions</span></h2>
<h4>
<span style="font-size: large;">Multisampling</span></h4>
<ul>
<li><span style="font-size: large;"><b>Target-independent multisampling control and mixed samples</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_framebuffer_mixed_samples.txt" target="_blank">NV_framebuffer_mixed_samples</a> + <a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_raster_multisample.txt" target="_blank">EXT_raster_multisample</a>):</li>
</ul>
<div style="text-align: justify;">
<a href="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagMixedSamples.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="73" src="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagMixedSamples.jpg" width="320" /></a>This feature adds a lot of flexibility to multisampled rasterization. It decouples the rasterization sampling frequency (which can be set explicitly) from the actual framebuffer storage.
This enables rasterization to operate at a higher sampling frequency than that of the target color render buffers.
Depth and stencil testing can also run at this frequency, provided the depth and stencil buffers are allocated accordingly (their sample count must be a multiple of the number of samples in the color buffers). <br />
There are still some constraints: all color buffers must have the same number of samples; when depth or stencil testing is enabled, the raster sample count must match the depth and stencil buffer sample count; and the raster sample count must be greater than or equal to the color buffer sample count.
<br />
<br />
A new “coverage reduction stage” is introduced in the per-fragment operations (after the fragment shader in early-z mode, after the depth-test in late-z), which converts a set of covered raster/depth/stencil samples to a set of covered color samples.
There is an implementation-dependent association of raster samples to color samples. The reduced "color coverage" is computed such that the coverage bit for each color sample is 1 if any of the associated bits in the fragment's coverage is set, and 0 otherwise.
This feature can be used in conjunction with the coverage to color feature (cf. below), in order to get the FS output coverage mask automatically transformed into a color by ROP.
According to <a href="http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/4" target="_blank">AnandTech</a>, when rasterizing with explicit multisampling and no render-target, Maxwell allows evaluating primitive coverage at 16x MSAA.<br />
<br />
Note that <span style="font-family: "Courier New",Courier,monospace;">EXT_raster_multisample</span> is equivalent to "<i>Target-Independent Rasterization</i>" in Direct3D 11.1, which allows using multiple raster samples with a single color sample as long as depth and stencil tests are disabled. It is effectively a subset of <span style="font-family: "Courier New",Courier,monospace;">NV_framebuffer_mixed_samples</span>, which is more general and exposes more flexibility.</div>
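As a rough host-side sketch (assuming a current context exposing both extensions; the 8x raster rate and the use of coverage modulation are illustrative choices, not requirements):

```c
/* Rasterize at 8x while the bound color targets are single-sampled. */
glEnable(GL_RASTER_MULTISAMPLE_EXT);
glRasterSamplesEXT(8, GL_TRUE);   /* 8 raster samples, fixed sample locations */

/* With mixed samples, the covered fraction of each color sample can also
   modulate the color output (useful for antialiased path/edge rendering). */
glCoverageModulationNV(GL_RGBA);
```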
<br />
<br />
<ul>
<li><span style="font-size: large;"><b>Coverage to color conversion</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_coverage_to_color.txt" target="_blank">NV_fragment_coverage_to_color</a>):</li>
</ul>
<div style="text-align: justify;">
This allows using the ROP unit to automatically convert the post depth/stencil/alpha-test coverage mask into a color and write it into a color render target.
This conversion is performed before the new coverage reduction stage (cf. <span style="font-family: "Courier New",Courier,monospace;">NV_framebuffer_mixed_samples</span>) and can be useful for saving coverage information in the context of deferred shading.</div>
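A minimal host-side sketch (the attachment index and the G-buffer scenario are assumptions for illustration):

```c
/* Route the post-test coverage mask into color output 1, e.g. an R32UI
   G-buffer attachment that a deferred shading pass reads back later. */
glEnable(GL_FRAGMENT_COVERAGE_TO_COLOR_NV);
glFragmentCoverageColorNV(1);   /* index of the color output receiving the mask */
```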
<br />
<br />
<ul>
<li><span style="font-size: large;"><b>Post-depth coverage</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_post_depth_coverage.txt" target="_blank">EXT_post_depth_coverage</a>):</li>
</ul>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagPostDepthCvg.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="107" src="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagPostDepthCvg.jpg" width="320" /></a></div>
When operating in early-depth mode (<span style="font-family: "Courier New",Courier,monospace;">layout(early_fragment_tests) in;</span>, see <a href="https://www.opengl.org/wiki/Early_Fragment_Test" target="_blank">here</a> for more information),
this extension allows the fragment shader to get the post depth-test coverage mask of the current fragment as input (<span style="font-family: "Courier New",Courier,monospace;">gl_SampleMaskIn[]</span>, in which only the samples that passed the depth test are set),
unlike the standard GL 4.5 behavior, which provides the pre-depth-test coverage (the actual triangle coverage).
</div>
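In GLSL this might look like the following sketch (the visibility computation is just an illustration):

```glsl
#version 450
#extension GL_EXT_post_depth_coverage : require

layout(early_fragment_tests) in;   // force the depth test before the shader
layout(post_depth_coverage) in;    // gl_SampleMaskIn[] is now post-depth-test

layout(location = 0) out vec4 color;

void main() {
    // Fraction of this fragment's samples that survived the depth test
    float visibility = float(bitCount(gl_SampleMaskIn[0])) / float(gl_NumSamples);
    color = vec4(visibility);
}
```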
<br />
<br />
<ul>
<li> <span style="font-size: large;"><b>Multisample coverage override</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_sample_mask_override_coverage.txt" target="_blank">NV_sample_mask_override_coverage</a>)</li>
</ul>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagCovgOverride.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="71" src="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagCovgOverride.jpg" width="320" /></a></div>
With standard OpenGL, the Fragment Shader output coverage mask (<span style="font-family: "Courier New",Courier,monospace;">gl_SampleMask[]</span>) is ANDed with the actual primitive input coverage mask before being used in subsequent pipeline stages.
This extension disables this AND operation, which allows the fragment shader to entirely override the primitive coverage, and enables setting coverage bits that are not present in the input mask.
This is actually very nice, because it allows using the output coverage as a way to dynamically route color output values into arbitrary sample locations inside a multisampled render target.
</div>
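A fragment shader sketch (routing the color into sample 3 is an arbitrary example of the "dynamic routing" use case):

```glsl
#version 450
#extension GL_NV_sample_mask_override_coverage : require

// Redeclaring gl_SampleMask[] with this layout makes the written mask
// replace the rasterizer's coverage instead of being ANDed with it.
layout(override_coverage) out int gl_SampleMask[];

layout(location = 0) out vec4 color;

void main() {
    gl_SampleMask[0] = 1 << 3;  // write this fragment's color into sample 3 only
    color = vec4(1.0);
}
```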
<br />
<br />
<ul>
<li><span style="font-size: large;"><b>Programmable sample locations</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_sample_locations.txt" target="_blank">NV_sample_locations</a>):</li>
</ul>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrpPlLodcjxyZQYWsyAK5CB0GYT_zrcK7hg9-I_0spSg4p2e00WQJhCt5owm5O_nHz4EU4SXmQ4wYt4kOh5liesizGswc6_U_0DP3LG9vtFzJhxpw8EfaGsvniTMbPPtdaLTQmzWnntgg/s1600/ProgSampling.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="111" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrpPlLodcjxyZQYWsyAK5CB0GYT_zrcK7hg9-I_0spSg4p2e00WQJhCt5owm5O_nHz4EU4SXmQ4wYt4kOh5liesizGswc6_U_0DP3LG9vtFzJhxpw8EfaGsvniTMbPPtdaLTQmzWnntgg/s1600/ProgSampling.jpg" width="200" /></a></div>
Allows applications to explicitly set the locations of the sub-pixel samples used for multisample rasterization, providing fully programmable sampling patterns. Sampling patterns can be defined within a grid of adjacent pixels, whose size depends on the number of samples. According to the provided queries, sub-pixel positions are snapped to a 16x16 sub-pixel grid.
</div>
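For instance (sketch; the rotated-grid pattern below is just an example, with positions given in [0,1) within the pixel):

```c
const GLfloat locations[8] = {
    0.375f, 0.125f,   0.875f, 0.375f,
    0.125f, 0.625f,   0.625f, 0.875f,
};
/* Switch the bound 4x framebuffer to programmable sample locations... */
glFramebufferParameteri(GL_FRAMEBUFFER,
                        GL_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV, GL_TRUE);
/* ...and set the 4 sub-pixel positions (snapped to the 16x16 grid). */
glFramebufferSampleLocationsfvNV(GL_FRAMEBUFFER, 0, 4, locations);
```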
<br />
<br />
<h4>
<span style="font-size: large;">Rasterization </span></h4>
<ul>
<li><span style="font-size: large;"><b>Conservative rasterization</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_conservative_raster.txt" target="_blank">NV_conservative_raster</a>):</li>
</ul>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtEeg8q-eaPrkoZMC6WWDDK0KfgyRZdTjFOWW6mYKzmO5tW_BYMZCManKb42WSiXdZYGQoFoD_a-ibpuHCEG1z_2va-QEq2JE849S7kTnoco6UTewpDjZwXWeCJ_sRtOqNe7qgsxvYj9U/s1600/Conrast.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtEeg8q-eaPrkoZMC6WWDDK0KfgyRZdTjFOWW6mYKzmO5tW_BYMZCManKb42WSiXdZYGQoFoD_a-ibpuHCEG1z_2va-QEq2JE849S7kTnoco6UTewpDjZwXWeCJ_sRtOqNe7qgsxvYj9U/s1600/Conrast.jpg" width="200" /></a></div>
This is a really great feature: it allows rasterization to generate fragments for every pixel touched by a triangle, even if none of the pixel's sample locations are covered.
A new control is also provided to modify the window-coordinate snapping precision, allowing the application to match conservative-rasterization triangle snapping with the snapping that would have occurred at a higher resolution.
Polygons with zero area generate no fragments. Any location within a pixel may be used for interpolating attributes, potentially causing attribute extrapolation outside the triangle.
This can be useful for binning purposes, for instance (using one pixel per tile).
</div>
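A binning pass might be set up like this sketch (the subpixel bias values are assumptions):

```c
glEnable(GL_CONSERVATIVE_RASTERIZATION_NV);
/* Add extra snapping bits so triangle snapping matches what would have
   happened when rasterizing at a higher resolution. */
glSubpixelPrecisionBiasNV(4, 4);   /* extra sub-pixel bits in x and y */
/* ... draw the scene into a low-resolution target, one pixel per tile ... */
glDisable(GL_CONSERVATIVE_RASTERIZATION_NV);
```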
<br />
<br />
<ul>
<li><span style="font-size: large;"><b>Fragment Shader Interlock</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_shader_interlock.txt" target="_blank">NV_fragment_shader_interlock</a>):</li>
</ul>
<div style="text-align: justify;">
This extension exposes a hardware-accelerated critical section for the fragment shader, allowing hazard-free read-modify-write operations on a per-pixel basis.
It also allows enforcing primitive ordering for threads entering the critical section.
It provides new GLSL calls <span style="font-family: "Courier New",Courier,monospace;">beginInvocationInterlockNV()</span> and <span style="font-family: "Courier New",Courier,monospace;">endInvocationInterlockNV()</span>, which delimit a critical section guaranteed to be executed by only one fragment at a time.
The interlock can operate on a per-pixel or a per-sample basis when multisampled rasterization is used.
This feature is useful for algorithms that need to access per-pixel data structures via shader load and store operations while avoiding race conditions. Obvious applications include order-independent transparency (OIT) and programmable blending.
</div>
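A sketch of such a critical section in GLSL (the per-pixel counter image is a hypothetical stand-in for an OIT-style data structure):

```glsl
#version 450
#extension GL_NV_fragment_shader_interlock : require

layout(pixel_interlock_ordered) in;  // per-pixel, primitive-ordered interlock

layout(binding = 0, r32ui) uniform coherent uimage2D perPixelCounter;

void main() {
    ivec2 p = ivec2(gl_FragCoord.xy);
    beginInvocationInterlockNV();
    // Read-modify-write of per-pixel data is hazard-free in here
    uint n = imageLoad(perPixelCounter, p).x;
    imageStore(perPixelCounter, p, uvec4(n + 1u));
    endInvocationInterlockNV();
}
```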
<br />
<br />
<ul>
<li><span style="font-size: large;"><b>Screen-space bounding-box rasterization</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fill_rectangle.txt" target="_blank">NV_fill_rectangle</a>):</li>
</ul>
<div style="text-align: justify;">
This allows rasterizing the axis-aligned screen-space bounding box of submitted triangles, disregarding the actual triangle edges.
It can be useful for drawing a full-screen quad without generating an internal edge, for instance, or for more efficiently drawing user interfaces.
</div>
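For instance (sketch; assumes a vertex setup whose screen-space bounding box covers the region to fill):

```c
/* Rasterize the screen-space bounding box of each triangle: a single
   triangle whose bbox spans the viewport yields a full-screen "quad"
   with no internal diagonal edge. */
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL_RECTANGLE_NV);
glDrawArrays(GL_TRIANGLES, 0, 3);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
```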
<br />
<br />
<h4>
<span style="font-size: large;">Geometry processing </span></h4>
<ul>
<li><span style="font-size: large;"><b>Geometry Shader Passthrough</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_geometry_shader_passthrough.txt" target="_blank">NV_geometry_shader_passthrough</a>):</li>
</ul>
<div style="text-align: justify;">
This extension allows making geometry shaders more efficient in the case where they are pass-through, i.e. when there is a one-to-one mapping between input and output primitives.
In this case, per-vertex attributes are simply copied from the input primitive to the output primitive, and the geometry shader is only used to set per-primitive attributes (like <span style="font-family: "Courier New",Courier,monospace;">gl_Layer, gl_ViewportMask[]</span>, ...), which can be computed from the input vertex attributes.
</div>
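A passthrough geometry shader might look like this sketch (the interface block name and the layer choice are assumptions):

```glsl
#version 450
#extension GL_NV_geometry_shader_passthrough : require

layout(triangles) in;

// Inputs tagged "passthrough" are copied verbatim to the output primitive;
// no EmitVertex()/EndPrimitive() loop is needed.
layout(passthrough) in gl_PerVertex { vec4 gl_Position; } gl_in[];
layout(passthrough) in Inputs { vec2 texcoord; } v_in[];

void main() {
    // Only per-primitive outputs are computed here
    gl_Layer = 0;  // e.g. select a render-target layer per primitive
}
```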
<br />
<ul>
<li><span style="font-size: large;"><b>Viewport Multicast</b></span> (<a href="https://www.opengl.org/registry/specs/NV/viewport_array2.txt" target="_blank">NV_viewport_array2</a>):</li>
</ul>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjklT52wjLOeVulW38aW6AmNMw_ymh6AtKk_1UYg5m1vHXIxDDkopZrIgJwLK-sEqyxTDmaVI2F0I3BfQGOgvYi4bMwAnTOIiMUYciX7nCpcqu6URF8Q83lnVwQhZakFSSjs5dptM3BBnE/s1600/csm.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="111" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjklT52wjLOeVulW38aW6AmNMw_ymh6AtKk_1UYg5m1vHXIxDDkopZrIgJwLK-sEqyxTDmaVI2F0I3BfQGOgvYi4bMwAnTOIiMUYciX7nCpcqu6URF8Q83lnVwQhZakFSSjs5dptM3BBnE/s1600/csm.png" width="200" /></a></div>
Viewport multicast allows automatically broadcasting the same primitive to multiple viewports (and/or multiple layers when using layered render-targets) simultaneously, in order to be rasterized multiple times.
It is exposed through a new <span style="font-family: "Courier New",Courier,monospace;">gl_ViewportMask[]</span> GLSL output attribute which is available in both the vertex shader and the geometry shader.
This can be especially powerful when combined with the new passthrough geometry shader. A sample using it to speed up cascaded shadow maps is available <a href="https://github.com/NVIDIAGameWorks/OpenGLSamples/tree/master/samples/gl4-maxwell/CascadedShadowMapping" target="_blank">here</a>.</div>
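Combined with a passthrough geometry shader, broadcasting each triangle to four shadow-map cascades might be sketched as (the 0xF mask targeting viewports 0..3 is an example):

```glsl
#version 450
#extension GL_NV_geometry_shader_passthrough : require
#extension GL_NV_viewport_array2 : require

layout(triangles) in;
layout(passthrough) in gl_PerVertex { vec4 gl_Position; } gl_in[];

void main() {
    // Rasterize this triangle once into each of viewports 0..3
    gl_ViewportMask[0] = 0xF;
}
```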
<h4>
<span style="font-size: large;">Texturing</span> </h4>
<ul>
<li><span style="font-size: large;"><b>Enhanced sparse textures</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_sparse_texture2.txt" target="_blank">EXT_sparse_texture2</a>) :</li>
</ul>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyLFkRY4_Qv91W5YpfKsRxCsXHMCrfbqzG0OJ_3bArQlaxeoaNeIcHY4exS68hrqxQJ78baJJ6C_U7bBFgSj3wGYdXqMibILjFtRmhCB8J4KRr2zJbmOE-pi72EO5hmeVrZW-0KE9NQGU/s1600/VTR.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyLFkRY4_Qv91W5YpfKsRxCsXHMCrfbqzG0OJ_3bArQlaxeoaNeIcHY4exS68hrqxQJ78baJJ6C_U7bBFgSj3wGYdXqMibILjFtRmhCB8J4KRr2zJbmOE-pi72EO5hmeVrZW-0KE9NQGU/s1600/VTR.jpg" width="200" /></a></div>
This extension improves on <a href="https://www.opengl.org/registry/specs/ARB/sparse_texture.txt" target="_blank"><span style="font-family: "Courier New",Courier,monospace;">ARB_sparse_texture</span></a>, which separates the allocation of virtual address space from the physical memory of textures, and provides the ability to sparsely allocate the physical backing-store of 2D/3D/2DArray textures on a per-tile basis.
This new extension adds the ability to retrieve texture-access residency information from GLSL, to specify a minimum allocated LOD for texture fetches, and to have lookups into unallocated pages return a constant zero value. It also adds support for multi-sampled textures.
</div>
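The GLSL side can be sketched with the equivalent ARB_sparse_texture2 names (assuming the same functionality is exposed through them; the magenta fallback color is arbitrary):

```glsl
#version 450
#extension GL_ARB_sparse_texture2 : require

layout(binding = 0) uniform sampler2D sparseTex;
layout(location = 0) in vec2 uv;
layout(location = 0) out vec4 color;

void main() {
    vec4 texel;
    // The fetch also returns a residency code for the touched pages
    int code = sparseTextureARB(sparseTex, uv, texel);
    // Fall back to magenta when the page is not resident
    color = sparseTexelsResidentARB(code) ? texel : vec4(1.0, 0.0, 1.0, 1.0);
}
```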
<br />
<br />
<ul>
<li><span style="font-size: large;"><b>Texture Filter min/max</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_texture_filter_minmax.txt" target="_blank">EXT_texture_filter_minmax</a>):</li>
</ul>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagTexMinMax.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/maxwellblog/DiagTexMinMax.jpg" width="175" /></a></div>
This exposes a new sampler parameter which allows performing a min or max reduction operation on the values sampled inside a texture filtering footprint, instead of the regular linear interpolation.
It is supported for all kinds of textures, as well as for anisotropic filtering.
</div>
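Host-side, selecting a min reduction is a single sampler-object parameter (sketch; the depth-pyramid use case is an assumption):

```c
GLuint sampler;
glGenSamplers(1, &sampler);
glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glSamplerParameteri(sampler, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
/* Return the minimum of the filtered texels instead of their average,
   e.g. for a conservative depth-pyramid lookup. */
glSamplerParameteri(sampler, GL_TEXTURE_REDUCTION_MODE_EXT, GL_MIN);
glBindSampler(0, sampler);
```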
<br />
<br />
<h4>
<span style="font-size: large;">Atomics</span> </h4>
<ul>
<li><span style="font-size: large;"><b>FP16 global atomics</b></span> (<a href="https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_shader_atomic_fp16_vector.txt" target="_blank">NV_shader_atomic_fp16_vector</a>) :</li>
</ul>
<div style="text-align: justify;">
This extension provides a set of new atomic operations operating on two- and four-component vectors of 16-bit floating-point values, for images, bindless pointers to global memory, and storage buffers.<br />
<br />
<br /></div>
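In GLSL this might be sketched as follows (the radiance-volume image and the splatting scenario are hypothetical):

```glsl
#version 450
#extension GL_NV_gpu_shader5 : require               // f16vec types
#extension GL_NV_shader_atomic_fp16_vector : require

layout(binding = 0, rgba16f) uniform coherent image3D radianceVolume;

void accumulate(ivec3 voxel, f16vec4 radiance) {
    // Single atomic add of a 4-component fp16 vector,
    // with no manual read-modify-write loop
    imageAtomicAdd(radianceVolume, voxel, radiance);
}
```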
Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-3898118748763274422012-06-08T18:31:00.000+02:002012-10-25T12:09:00.510+02:00Unreal Engine 4 : real-time GI using voxel cone-tracing !<div style="text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM4MUqG_jr2cZ_zIC_BbdCTkmmA-ojAoV2hC2P6UXN7l9WX0oJ0sv2hZyBusP1fVKqj4Bs_aes9zgz5pZ2gZVBWxOY4bZnQITHMDPF0ILNqRYEu1T5bdEtec9VOszPQ-tgaiP4IaZAE68/s1600/UE4-Showcase.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM4MUqG_jr2cZ_zIC_BbdCTkmmA-ojAoV2hC2P6UXN7l9WX0oJ0sv2hZyBusP1fVKqj4Bs_aes9zgz5pZ2gZVBWxOY4bZnQITHMDPF0ILNqRYEu1T5bdEtec9VOszPQ-tgaiP4IaZAE68/s320/UE4-Showcase.jpg" width="320" /></a>Epic Games just released two impressive videos presenting their new <a href="http://www.unrealengine.com/unreal_engine_4">UE4 game engine</a>, featuring real-time global illumination that supports diffuse and specular bounces in dynamic scenes.<br />
<br />
According to <a href="http://www.geforce.com/whats-new/articles/stunning-videos-show-unreal-engine-4s-next-gen-gtx-680-powered-real-time-graphics/">this interview with Tim Sweeney</a>, their technique is called SVOGI (Sparse Voxel Octree Global Illumination) and... it is based on the <a href="http://www.icare3d.org/research-cat/publications/interactive-indirect-illumination-using-voxel-cone-tracing.html">voxel cone-tracing</a> technique we published last year at Pacific Graphics! :-D<br />
<br />
It's really awesome to see research results being used so quickly in a game engine! Beyond the <a href="http://research.nvidia.com/publication/interactive-indirect-illumination-using-voxel-cone-tracing">paper</a>, you can find more details about our voxel cone-tracing approach in my <a href="http://blog.icare3d.org/2012/05/gtc-2012-talk-octree-based-sparse.html">GTC 2012 talk</a>, my <a href="http://maverick.inria.fr/Publications/2011/CNSGE11a/">Siggraph talk</a>, as well as in my <a href="http://maverick.inria.fr/Publications/2011/Cra11/">Ph.D thesis on GigaVoxels</a>.</div>
<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/dD9CPqSKjTU" width="560"></iframe>
<iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/MOvfn1p92_8" width="560"></iframe>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com6tag:blogger.com,1999:blog-7773532993303488759.post-26999974493817356842012-05-18T03:19:00.000+02:002012-08-03T20:47:27.802+02:00GTC 2012 Talk: "Octree-Based Sparse Voxelization for Real-Time Global Illumination"<div style="text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwdz84DyvifzpTy_W2PAxR317t50Mqojd85xJRcGZtq5v86hkGkRCMx8zt4EzgQ7isgxYf-ElSgkIjfhSE-vDLKXX4JhsWqtd_wHh1EFdrvmrDm28CxUDqDsRRyGLY5Jj8DfI3Sa0_8S0/s1600/OpenGL_Insight_Cover.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwdz84DyvifzpTy_W2PAxR317t50Mqojd85xJRcGZtq5v86hkGkRCMx8zt4EzgQ7isgxYf-ElSgkIjfhSE-vDLKXX4JhsWqtd_wHh1EFdrvmrDm28CxUDqDsRRyGLY5Jj8DfI3Sa0_8S0/s200/OpenGL_Insight_Cover.png" width="161" /></a>This week I gave a talk at the NVIDIA <a href="http://www.gputechconf.com/">GPU Technology Conference</a> about a fast sparse voxelization technique that I worked on at NVIDIA in the context of my <a href="http://research.nvidia.com/publication/interactive-indirect-illumination-using-voxel-cone-tracing">real-time global illumination approach</a> (using voxel cone-tracing inside a sparse voxel octree).
</div>
<br />
<div style="text-align: justify;">
Slides: <a href="http://www.icare3d.org/research/GTC2012_Voxelization_public.pptx">(PPTX, 50MB)</a>, <a href="http://www.icare3d.org/research/GTC2012_Voxelization_public.pdf">(PDF, 7MB)</a></div>
<div style="text-align: left;">
Video (if you want to enjoy my French accent): <a href="http://www.gputechconf.com/gtcnew/on-demand-gtc.php?sessionTopic=&searchByKeyword=&submit=&select=+&sessionEvent=&sessionYear=&sessionFormat=#1465">http://www.gputechconf.com/</a></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: justify;">
In this talk I first give an overview of the real-time GI approach, before quickly detailing the new GPU voxelization and octree construction algorithm.</div>
<br />
This sparse voxelization technique will be published in the <a href="http://www.openglinsights.com/">OpenGL Insights</a> book that will be out for Siggraph.<br />
<br />
[UPDATE 07/12] The book chapter in <a href="http://openglinsights.com/">OpenGL Insights</a> has been published online <a href="http://www.seas.upenn.edu/%7Epcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">here</a>. <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiM4KTkPVX0UUKwv7mSFNi7oUQZAH-xNyJTub8coehB-82peV7pWLkTleTPfx4btK4zkKSIv-JGoOWmp09POI4kkdDUd186Aa2ljMzckg-StQreZqkZk15Fl3QRBPfLTalNcxZx4xmwf9k/s1600/GIVoxels+2012-04-11+16-01-43-31.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiM4KTkPVX0UUKwv7mSFNi7oUQZAH-xNyJTub8coehB-82peV7pWLkTleTPfx4btK4zkKSIv-JGoOWmp09POI4kkdDUd186Aa2ljMzckg-StQreZqkZk15Fl3QRBPfLTalNcxZx4xmwf9k/s640/GIVoxels+2012-04-11+16-01-43-31.png" width="550" /></a></div>
<br />Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com7tag:blogger.com,1999:blog-7773532993303488759.post-63008541943434204402012-01-26T03:01:00.000+01:002012-02-04T22:27:23.155+01:00Ph.D thesis: GigaVoxels<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAVqPwO_DRbogVxB7vcxHp0digu45-qZedVG3jTpQwMIIXy2TUJoR0ThcRrMlnlWqa5nCVWJO0K3ToHek-ejDDXmIS7SDlbomdRQh3YthtpabTaiKcR2CGLNH379tg1k3MLs9xYwUc9HU/s1600/A_cudaGigaVoxels_LionOctree_01.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAVqPwO_DRbogVxB7vcxHp0digu45-qZedVG3jTpQwMIIXy2TUJoR0ThcRrMlnlWqa5nCVWJO0K3ToHek-ejDDXmIS7SDlbomdRQh3YthtpabTaiKcR2CGLNH379tg1k3MLs9xYwUc9HU/s200/A_cudaGigaVoxels_LionOctree_01.png" width="200" /></a>I defended my Ph.D thesis on GigaVoxels last July, and the document is now online.<br />
<br />
You can download it there:<br />
<a href="http://maverick.inria.fr/Publications/2011/Cra11/">GigaVoxels: A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes</a><br />
<br />
You can also check my other publications on my <a href="http://maverick.inria.fr/Membres/Cyril.Crassin/">Ph.D webpage</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5ql1pvMqucXIS13NKk2H371f191TMC-6LrhfK-55flT4sHv_PxHWidDi4zyIux8YCK9pEA1RJqAK4GjpSzW4gOAWCYCoMqR3ZQE7MHCgBIuAZhyyh0xl99PdXY2X2d8li3EtSUb3MNrY/s1600/Sierpinski_Montage1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5ql1pvMqucXIS13NKk2H371f191TMC-6LrhfK-55flT4sHv_PxHWidDi4zyIux8YCK9pEA1RJqAK4GjpSzW4gOAWCYCoMqR3ZQE7MHCgBIuAZhyyh0xl99PdXY2X2d8li3EtSUb3MNrY/s200/Sierpinski_Montage1.jpg" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiFtk55ZpSrKWEMCbnksruwyLjxrTmGU7Bq0bMiq-BpohGmbttKQY4OzF7Idd08fnlTTPbLf_oVrRWPN7ZuANibw6524HsIMCd-gzkI0LgcxI1rThlLEt6-w3Xvwm74q4FMjEOtJqUHvs/s1600/BlastedBunny2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiFtk55ZpSrKWEMCbnksruwyLjxrTmGU7Bq0bMiq-BpohGmbttKQY4OzF7Idd08fnlTTPbLf_oVrRWPN7ZuANibw6524HsIMCd-gzkI0LgcxI1rThlLEt6-w3Xvwm74q4FMjEOtJqUHvs/s200/BlastedBunny2.png" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigl6u3yixJvGegRyMmeSli5WJ3PJGByvB9Nz-tR8VURjuT7pJUETe92-zGrmaHxAbUIwlWIw5C0vGTDqMDbkM4EymIZL1_7Av9SzgL-yrSibgmjkzU8vhHrHHtfYtCk7uJwo0Csq_fEn8/s1600/CNSGE11_04.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigl6u3yixJvGegRyMmeSli5WJ3PJGByvB9Nz-tR8VURjuT7pJUETe92-zGrmaHxAbUIwlWIw5C0vGTDqMDbkM4EymIZL1_7Av9SzgL-yrSibgmjkzU8vhHrHHtfYtCk7uJwo0Csq_fEn8/s200/CNSGE11_04.jpg" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKUTZZ0d39Cuy5qBuIZUET86MrYBgbqGh1gwxVpdIbrJQhq1Odf7AVKOh4cS4iCpy9GQn3fpWds5AeBCqxyMQqcZ_Gq-djfI8qFKczCPKlhPJ_iVPX515S_ZcvIbHn3r2b4dHWYK-Z4Mc/s1600/cudaGigaVoxel_Coral01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKUTZZ0d39Cuy5qBuIZUET86MrYBgbqGh1gwxVpdIbrJQhq1Odf7AVKOh4cS4iCpy9GQn3fpWds5AeBCqxyMQqcZ_Gq-djfI8qFKczCPKlhPJ_iVPX515S_ZcvIbHn3r2b4dHWYK-Z4Mc/s200/cudaGigaVoxel_Coral01.png" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5SUicjyS0yRnmDOmrd4n1ISk1PXYRoMpXZ_UMc2Zyb_4SqhML-_xcjSMETAoFrnzxFoORNHNzNo2_wsUVcHPQotvgR1PPuBAFKNNtQJmsB2hSDX7KhUbWRUn8eQAx6Rba3_hwMd_4hPw/s1600/GigaBroccoli5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5SUicjyS0yRnmDOmrd4n1ISk1PXYRoMpXZ_UMc2Zyb_4SqhML-_xcjSMETAoFrnzxFoORNHNzNo2_wsUVcHPQotvgR1PPuBAFKNNtQJmsB2hSDX7KhUbWRUn8eQAx6Rba3_hwMd_4hPw/s200/GigaBroccoli5.png" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXlNbo-Pwn6NE4qLxyk7xOQfl9SmqE497ONqh70sPaHg7YMMc0kEhCkV1Cj-cwqquFdKIgRMBpH2Z261Try_GCHP1QutUgHDvdeIQR0IHI5F9KoveHlTWzUuy2kuq0Piuqf_Eq1NyZRas/s1600/GigaVoxels_Bones01.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXlNbo-Pwn6NE4qLxyk7xOQfl9SmqE497ONqh70sPaHg7YMMc0kEhCkV1Cj-cwqquFdKIgRMBpH2Z261Try_GCHP1QutUgHDvdeIQR0IHI5F9KoveHlTWzUuy2kuq0Piuqf_Eq1NyZRas/s200/GigaVoxels_Bones01.jpg" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguQWwy8Eazmsc3vS1_MS_bSQ4GaqoOeDJsqlvGircTfvKwxhuwfEbbyD39SyLmt1wNfcf-qmN_hpBWhEbzH0kU8JObVqQVh2u2D2pz6MVjwqzKrOY6hyydG_Wm-cdgLJe8g5uCK9E58M8/s1600/GIVoxels_New1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguQWwy8Eazmsc3vS1_MS_bSQ4GaqoOeDJsqlvGircTfvKwxhuwfEbbyD39SyLmt1wNfcf-qmN_hpBWhEbzH0kU8JObVqQVh2u2D2pz6MVjwqzKrOY6hyydG_Wm-cdgLJe8g5uCK9E58M8/s200/GIVoxels_New1.png" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaJyv4HxS9iYvKh29uu5FM39dVy0evEce0_ttIj483u3T2q3JMz7KHokiDdo4y-OOYUcUEM440W_JxXZRefmD78onLXuEZaWGBf9wjLRmubnO2kECylcWtUxdM0ATDpnpnS4UFBCcnzm4/s1600/VoxelGI_VideoShot2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaJyv4HxS9iYvKh29uu5FM39dVy0evEce0_ttIj483u3T2q3JMz7KHokiDdo4y-OOYUcUEM440W_JxXZRefmD78onLXuEZaWGBf9wjLRmubnO2kECylcWtUxdM0ATDpnpnS4UFBCcnzm4/s200/VoxelGI_VideoShot2.png" /></a>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaBMSsFbjK0eNieeQcLSoXc5QpZ-EPX6EAuMjHKytqqzKeMEn7abgOzxxau9RomydFgd3asl8QLC626xE4vSAuDob-Hp9vo2b8XJ0Y_1EfmCo8lxWhgfrGGn0kf0gEjqThB7KLRCgWqX8/s1600/VoxelGI_VideoShot12.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaBMSsFbjK0eNieeQcLSoXc5QpZ-EPX6EAuMjHKytqqzKeMEn7abgOzxxau9RomydFgd3asl8QLC626xE4vSAuDob-Hp9vo2b8XJ0Y_1EfmCo8lxWhgfrGGn0kf0gEjqThB7KLRCgWqX8/s200/VoxelGI_VideoShot12.png" /></a>
</div>
<br />Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com15tag:blogger.com,1999:blog-7773532993303488759.post-60962301991000424722011-12-02T19:00:00.001+01:002012-01-26T19:52:13.599+01:00Interview on 3DVF<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAya4kpJY5ZgkQDGo4AnX4qLI1t-Cv63CvjHFaHeeBjLviEEk9DdCvMuRmYgOUUrWuAjZI-z_XeS1CNHswIqPri3M1CfgJ1TdxXNSeJpuM6XdQu5l75fRTAg7ZCiHRYJjKl6BxyrUyxKQ/s1600/3DVFLogo.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAya4kpJY5ZgkQDGo4AnX4qLI1t-Cv63CvjHFaHeeBjLviEEk9DdCvMuRmYgOUUrWuAjZI-z_XeS1CNHswIqPri3M1CfgJ1TdxXNSeJpuM6XdQu5l75fRTAg7ZCiHRYJjKl6BxyrUyxKQ/s1600/3DVFLogo.png" /></a></div>
<div style="text-align: justify;">
I gave an interview to 3DVF, a great French community website dedicated to image synthesis and digital creation in general. The interview is mainly centered on our voxel-based GI approach, as well as my former INRIA research team and computer graphics research in France.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The interview can be read <a href="http://www.3dvf.com/actualite-2363-interview-cyril-crassin.html">here</a> (in French).</div>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-26061514810895941602011-11-09T23:42:00.000+01:002011-11-09T23:42:28.268+01:00Ph.D defended and postdoc at NVIDIA Research<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjFW62AWlTgwt5BKn5WVh_REkf2lNA6iD_fauJ1UaOPx6dLgPrnQRl9GXU-qpiaewc77_M49iIvA65N2VacIKOq_LjTP3ZJh25_Kz8Tt-316m20kzKuF0f2PiV835rFSxk7BlXM3wsBxo/s1600/nvidia-official-3d-logo.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="135" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjFW62AWlTgwt5BKn5WVh_REkf2lNA6iD_fauJ1UaOPx6dLgPrnQRl9GXU-qpiaewc77_M49iIvA65N2VacIKOq_LjTP3ZJh25_Kz8Tt-316m20kzKuF0f2PiV835rFSxk7BlXM3wsBxo/s200/nvidia-official-3d-logo.jpg" width="200" /></a></div>
<div style="text-align: justify;">
I did not take the time to report on this before, but I defended my Ph.D thesis on GigaVoxels at INRIA this summer (the thesis will soon be available <a href="http://maverick.inria.fr/Membres/Cyril.Crassin/thesis/">there</a>), and I started a one-year postdoc at <a href="http://research.nvidia.com/users/cyril-crassin">NVIDIA Research</a> in the SF bay area. There, I continue working on voxel representations, of course :-D</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Even if I am now an NVIDIA employee, this continues to be a personal blog: everything I publish here is entirely my own, and my opinions do not necessarily reflect those of my employer!</div>
<br />Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com1tag:blogger.com,1999:blog-7773532993303488759.post-73483630923761993082011-09-30T11:06:00.000+02:002011-09-30T20:34:21.010+02:00Real-Time Shadows Book<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzIHkKN9uqzxLScaEh3fM3-nDL11WhQJ8bKFS4Is-DGDCYeB2aS3NhLqZNOm1KWAYtsjX2J0Qisjm87Pe4jeb7kgMcrr50yunFlscrgyzY_h-b-7LIBXOn9RQSkjiQj6rDI3C9CoOGeoM/s1600/realtimeshadows.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="253" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzIHkKN9uqzxLScaEh3fM3-nDL11WhQJ8bKFS4Is-DGDCYeB2aS3NhLqZNOm1KWAYtsjX2J0Qisjm87Pe4jeb7kgMcrr50yunFlscrgyzY_h-b-7LIBXOn9RQSkjiQj6rDI3C9CoOGeoM/s320/realtimeshadows.png" width="180" /></a></div>
<div style="text-align: justify;">
Just a quick post to recommend the great book "Real-Time Shadows", which details, explores, and clearly explains a large number of relevant techniques for real-time shadow rendering, a field we all know to be one of the most prolific in computer graphics !</div>
<div style="text-align: justify;">
In addition, this book is written by four leading experts in the field: Elmar Eisemann, Michael Schwarz, Ulf Assarsson and Michael Wimmer.</div>
<div style="text-align: justify;">
For me, it is now THE reference on shadow rendering !</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
More info on the <a href="http://www.realtimeshadows.com/">book website</a>, and a good description on the <a href="http://www.crcpress.com/ecommerce_product/product_detail.jsf?isbn=9781568814384">publisher's website</a>. Of course, it can be bought on <a href="http://www.amazon.com/gp/product/1568814380/ref=s9_simh_gw_p14_d0_g14_i1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-2&pf_rd_r=0E1ZMJPVAG6TF4SEPE2G&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846">Amazon</a>.</div>
Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-82438855794327976702011-09-13T22:30:00.002+02:002011-09-30T20:34:51.933+02:00Siggraph 2011 Slides<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwzoIvJx4csOYxdCHF41uFzjECus7IX8kGQqB3g4msLMCkeqIEQthKGe41EILa1pEPP5J0iFzqp3KvFI_Wcb8QP-UoF-nUKIlo0lS4MJ8Pap0zPUcvaeNEB-Iwg8rHVst_HF9ar-pArgg/s1600/VancouverOrca.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="140" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwzoIvJx4csOYxdCHF41uFzjECus7IX8kGQqB3g4msLMCkeqIEQthKGe41EILa1pEPP5J0iFzqp3KvFI_Wcb8QP-UoF-nUKIlo0lS4MJ8Pap0zPUcvaeNEB-Iwg8rHVst_HF9ar-pArgg/s200/VancouverOrca.jpg" width="200" /></a></div>
Here is a quick list of interesting Siggraph 2011 courses and talks available online.<br />
<br />
<span style="font-size: large;"><b>Courses/Talks :</b></span><br />
<a href="http://advances.realtimerendering.com/s2011/index.html">Advances in Real-Time Rendering in Games</a><br />
<a href="http://bps11.idav.ucdavis.edu/">Beyond Programmable Shading</a><br />
<a href="http://www.cs.uiowa.edu/%7Ecwyman/publications/talks/s2011/">State-of-the-Art: Interactive Global Illumination</a><br />
<a href="http://bulletphysics.org/siggraph2011/">Destruction and Dynamics for Film and Game Production</a><br />
<a href="http://iryoku.com/aacourse/">Filtering Approaches for Real-Time Anti-Aliasing</a><br />
<a href="http://magnuswrenninge.com/productionvolumerendering">Production Volume Rendering</a><br />
<a href="http://sites.google.com/site/s2011compilers/">Compiler Technology for Rendering</a><br />
<a href="http://pub.ist.ac.at/group_wojtan/meshyfluidscourse/meshyfluidscourse.html">Liquid Simulation with mesh-based Surface Tracking</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOfNrHWO-fD31_AxhyRNXjforbkbDtLo0Iriqa8ks48253Ks27sWdmgTo3zNO-Qnu0VusMOCS-ZKa5MJSE6BuQrMAkmu3IvSpfwJmDqh4m5IUVi_qVnhyphenhyphenvorkPBQZJjo34Dtkfkt7CdXI/s1600/GL42_tshirt.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOfNrHWO-fD31_AxhyRNXjforbkbDtLo0Iriqa8ks48253Ks27sWdmgTo3zNO-Qnu0VusMOCS-ZKa5MJSE6BuQrMAkmu3IvSpfwJmDqh4m5IUVi_qVnhyphenhyphenvorkPBQZJjo34Dtkfkt7CdXI/s200/GL42_tshirt.jpg" width="190" /></a></div>
<span style="font-size: large;"><b>Companies:</b></span><br />
<a href="http://crytek.com/cryengine/presentations">CRYTEK</a><br />
NVIDIA <a href="http://developer.nvidia.com/siggraph-2011">Slides</a> , <a href="http://fullviewmedia.com/fb/nvidia/archive/video.html">Videos</a><br />
<a href="http://software.intel.com/en-us/articles/siggraph-2011-event/%20">INTEL</a><br />
<br />
<span style="font-size: large;"><b>OpenGL/OpenCL :</b></span><br />
OpenGL BOF <a href="http://www.khronos.org/developers/library/2011-siggraph-opengl-bof%20">Slides</a><br />
<a href="http://www.daveshreiner.com/SIGGRAPH/s11/">Introduction to Modern OpenGL Programming</a><br />
OpenCL BOF <a href="http://www.khronos.org/developers/library/2011-siggraph-opencl-bof">Slides</a> <br />
<br />
<span style="font-size: large;"><b>Others:</b></span><br />
<a href="http://highperformancegraphics.org/program.php">HPG 2011</a> <br />
<br />
Also a great list with more posters and talks is available on <a href="http://blog.selfshadow.com/2011/08/13/hpg-siggraph-2011/">Stephen Hill's Blog</a><br />
There is also my talk on <a href="http://artis.imag.fr/Publications/2011/CNSGE11a/">Interactive Indirect Illumination Using Voxel Cone Tracing</a> :-)Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-29616311867292071972011-08-08T17:54:00.004+02:002011-08-08T18:38:25.303+02:00OpenGL 4.2 specification released<div class="separator" style="clear: both; text-align: center;"><a href="http://www.opengl.org/img/opengl_logo.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="97" src="http://www.opengl.org/img/opengl_logo.jpg" width="220" /></a></div><div style="text-align: justify;">Specifications can be found in the GL registry: <a href="http://www.opengl.org/registry/">http://www.opengl.org/registry/</a></div><div style="text-align: justify;">NVIDIA drivers already supporting it, as well as the new ARB extensions, can be found here: <a href="http://developer.nvidia.com/opengl-driver">http://developer.nvidia.com/opengl-driver</a></div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">A full and very good review can be found on <i>G-Truc Creation</i>: <a href="http://www.g-truc.net/post-0414.html">http://www.g-truc.net/post-0414.html</a><br />
Among the interesting additions: <a href="http://developer.download.nvidia.com/opengl/specs/GL_ARB_shader_atomic_counters.txt">shader_atomic_counters</a>, <a href="http://developer.download.nvidia.com/opengl/specs/GL_ARB_conservative_depth.txt">conservative depth</a>, <a href="http://developer.download.nvidia.com/opengl/specs/GL_ARB_transform_feedback_instanced.txt">instanced transform feedback</a>, the integration of <a href="http://developer.download.nvidia.com/opengl/specs/GL_ARB_shader_image_load_store.txt">shader_image_load_store</a>, and more !<br />
<br />
And congrats to <a href="http://www.g-truc.net/">Christophe</a> for his first credit in the spec :-D</div>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com2tag:blogger.com,1999:blog-7773532993303488759.post-21797739981622818772011-08-04T17:55:00.002+02:002012-11-02T12:33:06.749+01:00"Interactive Indirect Illumination Using Voxel Cone Tracing" paper accepted at Pacific Graphics 2011<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj68ad75ICcf3XqVBLztbCVlykg0EoQZSgwWz7OaFCNYuFtgYPXG50HSIyarmuUgMbk74CNrpygJ_oJpeMkfYtSn86N5k0FNQ_1GgXwLuCDpF2XMhyphenhyphenRg5SPBZMi9_sm517QaFfgyHSmUAc/s1600/VoxelGI_VideoShot2.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj68ad75ICcf3XqVBLztbCVlykg0EoQZSgwWz7OaFCNYuFtgYPXG50HSIyarmuUgMbk74CNrpygJ_oJpeMkfYtSn86N5k0FNQ_1GgXwLuCDpF2XMhyphenhyphenRg5SPBZMi9_sm517QaFfgyHSmUAc/s200/VoxelGI_VideoShot2.png" width="200" /></a></div>
<div style="text-align: justify;">
I am glad to announce that our paper <i>"Interactive Indirect Illumination Using Voxel Cone Tracing"</i> (cf. my previous post <a href="http://blog.icare3d.org/2011/06/interactive-indirect-illumination-and.html">http://blog.icare3d.org/2011/06/interactive-indirect-illumination-and.html</a>) has been accepted at Pacific Graphics 2011 !</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You can find the author's version of the paper on my research page :<br />
<a href="http://www.icare3d.org/research-cat/publications/interactive-indirect-illumination-using-voxel-cone-tracing.html">http://www.icare3d.org/research-cat/publications/interactive-indirect-illumination-using-voxel-cone-tracing.html</a><br />
On my INRIA webpage: </div>
<div style="text-align: justify;">
<a href="http://artis.imag.fr/Publications/2011/CNSGE11b/">http://artis.imag.fr/Publications/2011/CNSGE11b/</a></div>
<div style="text-align: justify;">
It is also on my NVIDIA webpage:<br />
<a href="http://research.nvidia.com/publication/interactive-indirect-illumination-using-voxel-cone-tracing">http://research.nvidia.com/publication/interactive-indirect-illumination-using-voxel-cone-tracing</a></div>
<div style="text-align: justify;">
<br />
Also, don't forget to attend my talk if you are going to <b>Siggraph 2011</b> !</div>
<div style="text-align: justify;">
<i><span class="date-display-single">Tuesday, 9 August <span class="date-display-start">9:00 am</span><span class="date-display-separator"> - </span><span class="date-display-end">10:30 am</span></span> | West Building, Rooms 109/110</i></div>
Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com2tag:blogger.com,1999:blog-7773532993303488759.post-7211341037037706062011-06-26T16:58:00.002+02:002012-01-26T04:10:18.874+01:00Free 3D meshes linksI started compiling links to websites where free 3D models can be found. If you know other good websites, feel free to post them in the comments :-)<br />
<br />
<b>Static models and scenes:</b><br />
<ul>
<li>Great collection of models for scientific publications on Morgan McGuire webpage: <a href="http://graphics.cs.williams.edu/data/meshes.xml">http://graphics.cs.williams.edu/data/meshes.xml </a></li>
<li>3D Render challenge: <a class="external free" href="http://www.3drender.com/challenges/" rel="nofollow">http://www.3drender.com/challenges/</a> </li>
<li> Crytek: <a class="external free" href="http://www.crytek.com/cryengine/cryengine3/downloads" rel="nofollow">http://www.crytek.com/cryengine/cryengine3/downloads</a> </li>
<li> Keenan Crane : <a class="external free" href="http://www.cs.caltech.edu/%7Ekeenan/models.html" rel="nofollow">http://www.cs.caltech.edu/~keenan/models.html</a> </li>
<li> Sibenik model: <a class="external free" href="http://hdri.cgtechniques.com/%7Esibenik2/download/" rel="nofollow">http://hdri.cgtechniques.com/~sibenik2/download/</a> </li>
<li> AIM@Shape : <a class="external free" href="http://shapes.aimatshape.net/" rel="nofollow">http://shapes.aimatshape.net/</a> </li>
<li> Characters Creative Common: <a class="external free" href="http://artist-3d.com/" rel="nofollow">http://artist-3d.com/</a> </li>
<li> Characters: <a class="external free" href="http://www.3dvalley.com/3d-models/characters" rel="nofollow">http://www.3dvalley.com/3d-models/characters</a> </li>
<li> Blender files: <a class="external free" href="http://www.blendswap.com/3D-models/category/featured/" rel="nofollow">http://www.blendswap.com/3D-models/category/featured/</a> </li>
<li> Archive 3D: <a class="external free" href="http://archive3d.net/" rel="nofollow">http://archive3d.net/</a></li>
</ul>
<br />
<b>Animated models and scenes:</b><br />
<b></b><br />
<ul>
<li>Ingo Wald: <a class="external free" href="http://www.sci.utah.edu/%7Ewald/animrep/" rel="nofollow">http://www.sci.utah.edu/~wald/animrep/</a> </li>
<li>MIT CSAIL 1: <a class="external free" href="http://people.csail.mit.edu/drdaniel/mesh_animation/index.html" rel="nofollow">http://people.csail.mit.edu/drdaniel/mesh_animation/index.html</a> </li>
<li>MIT CSAIL 2: <a class="external free" href="http://people.csail.mit.edu/drdaniel/dynamic_shape/index.html" rel="nofollow">http://people.csail.mit.edu/drdaniel/dynamic_shape/index.html</a></li>
<li>MIT Animals and Face: <a href="http://people.csail.mit.edu/sumner/research/deftransfer/data.html%20">http://people.csail.mit.edu/sumner/research/deftransfer/data.html</a></li>
<li>Face data: <a href="http://grail.cs.washington.edu/software-data/stfaces/index.html">http://grail.cs.washington.edu/software-data/stfaces/index.html</a> </li>
<li>Pants: <a href="http://www.ryanmwhite.com/research/cloth_cap.html">http://www.ryanmwhite.com/research/cloth_cap.html</a> </li>
</ul>
<br />
<ul></ul>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com6tag:blogger.com,1999:blog-7773532993303488759.post-38473296379074835282011-06-24T15:02:00.004+02:002012-11-02T12:34:02.483+01:00Interactive Indirect Illumination and Ambient Occlusion Using Voxel Cone Tracing<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl5x4Gkkm80NRFC2V8XQp2USEHk5_TlwRU_9tjqSmLtTwrvpSda6TCM4WTgaTzipR7qdQyHUIqIo1ZtfX5AwBrsOn5FPleV4j5f2-1dlS_fj38wK3436aVG2TDuVM9sUAefvGLe4DZeow/s1600/SiggraphRepImage.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="191" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl5x4Gkkm80NRFC2V8XQp2USEHk5_TlwRU_9tjqSmLtTwrvpSda6TCM4WTgaTzipR7qdQyHUIqIo1ZtfX5AwBrsOn5FPleV4j5f2-1dlS_fj38wK3436aVG2TDuVM9sUAefvGLe4DZeow/s320/SiggraphRepImage.jpg" width="320" /></a></div>
I am happy to present a new approach to computing indirect illumination and ambient occlusion in real time, using a pre-filtered geometry representation stored inside a sparse voxel octree, together with a new approximate voxel-based cone tracing. The sparse voxel octree is used as a proxy for occlusion information and indirect illumination; it supports dynamic objects and environments thanks to a new, very fast voxelization algorithm that updates the octree structure and pre-filters geometry and lighting information.</div>
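<div style="text-align: justify;">
As a rough illustration of the cone-tracing idea (a hypothetical sketch, not the actual GPU implementation from the paper): a cone is approximated by marching along its axis and sampling pre-filtered opacity with a footprint that grows with distance, compositing the samples front-to-back. The <tt>sample_opacity</tt> callback stands in for the octree lookup and is an assumption of this sketch.</div>

```python
def cone_trace_occlusion(sample_opacity, origin, direction, tan_half_angle, max_dist):
    """Approximate the occlusion gathered by one cone.

    sample_opacity(pos, diameter) is assumed to return the pre-filtered
    opacity of the scene for a footprint of 'diameter' at 'pos' (in the
    paper, a lookup in the sparse voxel octree at the level of detail
    matching that footprint).
    """
    occlusion = 0.0
    t = 1e-3  # small offset to avoid self-occlusion at the cone apex
    while t < max_dist and occlusion < 1.0:
        diameter = max(2.0 * tan_half_angle * t, 1e-3)  # local cone width
        pos = tuple(origin[i] + t * direction[i] for i in range(3))
        alpha = sample_opacity(pos, diameter)
        occlusion += (1.0 - occlusion) * alpha  # front-to-back compositing
        t += 0.5 * diameter  # step size proportional to the footprint
    return min(occlusion, 1.0)
```

<div style="text-align: justify;">
Ambient occlusion is then a weighted sum of a few such cones distributed over the hemisphere around the surface normal; indirect lighting works the same way, but accumulates pre-filtered radiance instead of opacity.</div>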
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This work has been done in collaboration with Miguel Sainz and Simon Green from NVIDIA, as well as Fabrice Neyret from CNRS and Elmar Eisemann from ParisTech.</div>
<div style="text-align: justify;">
<br />
<b>The paper has been published at Pacific Graphics. You can find the author's version on my research page:</b><br />
<a href="http://www.icare3d.org/research-cat/publications/interactive-indirect-illumination-using-voxel-cone-tracing.html">http://www.icare3d.org/research-cat/publications/interactive-indirect-illumination-using-voxel-cone-tracing.html</a><b> </b><br />
<br /></div>
I also presented this work as a talk at <a href="https://artis.imag.fr/Publications/2011/CNSGE11a/"><b>Siggraph 2011 in Vancouver</b></a> as well as a <a href="http://artis.imag.fr/Publications/2011/CNSGE11/">poster at I3D 2011</a>.<br />
<br />
<i>[Update 30/06/2011]: </i>Benoit Rogez wrote a very good article about our approach on 3DVF (in French) : <a href="http://www.3dvf.com/actualite-1187-cyril-crassin-nouvelle-methode-d-illumination-globale-et-d-ao-temps-reel.html">here</a><br />
<i>[Update 10/06/2012]: </i>More details on the algorithm and the fast voxelization and octree construction can be found in my <a href="http://blog.icare3d.org/2012/05/gtc-2012-talk-octree-based-sparse.html">GTC 2012 presentation</a>.<br />
<br />
<span style="font-size: large;"><b>Indirect lighting (global illumination):</b></span><br />
High quality video: <a href="http://artis.imag.fr/Membres/Cyril.Crassin/GIVoxels/Siggraph11_GI1.mov">http://artis.imag.fr/Membres/Cyril.Crassin/GIVoxels/Siggraph11_GI1.mov</a><br />
<br />
<iframe allowfullscreen="" frameborder="0" height="390" src="http://www.youtube.com/embed/fAsg_xNzhcQ" width="480"></iframe><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTDgltLIbS4Q6i91R9dplMHnI6cbhEFmhaCOef71iYD6ZE5tbhQPjwde6oqD_6De-C_io_PfhSxRc20cBCl0tjRRNV8ngU_VxhXKq9CNp7Hm4sTTlN1OuygE_LG0Nwkd5Gge8yn23aBkE/s1600/VoxelGI_VideoShot2.png" imageanchor="1" style="float: left; margin-bottom: 0em; margin-right: 0em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTDgltLIbS4Q6i91R9dplMHnI6cbhEFmhaCOef71iYD6ZE5tbhQPjwde6oqD_6De-C_io_PfhSxRc20cBCl0tjRRNV8ngU_VxhXKq9CNp7Hm4sTTlN1OuygE_LG0Nwkd5Gge8yn23aBkE/s200/VoxelGI_VideoShot2.png" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT7A_Ju1ZkKgVyhlr_Aok5OGpnHpRNLbvXj2K1TEf4nOF6RVhLhfeeqXlUd9-bJhPrjhWJRLT8-UolE-bumKB_1X2IHrpvqEa6-l1xLYo6uMEx-bJFF8gp2pn7ZHG_c8g1ZR6D5lwNM5w/s1600/VoxelGI_VideoShot4.png" imageanchor="1" style="float: left; margin-bottom: 0em; margin-right: 0em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT7A_Ju1ZkKgVyhlr_Aok5OGpnHpRNLbvXj2K1TEf4nOF6RVhLhfeeqXlUd9-bJhPrjhWJRLT8-UolE-bumKB_1X2IHrpvqEa6-l1xLYo6uMEx-bJFF8gp2pn7ZHG_c8g1ZR6D5lwNM5w/s200/VoxelGI_VideoShot4.png" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsZumyWU1Rpmry7UU89bLnMuss4JkbzTWK1UlAc02h1rFOge7r5sFQ1vW_gdBjGdXgFztvLC5qjZ5-O7qG1oPl7veBo3VwtiwlxTmAe8OgGHeNN9UThBWwWBzVNbdpeYCwFDqVulUHw4s/s1600/VoxelGI_VideoShot3.png" imageanchor="1" style="float: left; margin-bottom: 0em; margin-right: 0em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsZumyWU1Rpmry7UU89bLnMuss4JkbzTWK1UlAc02h1rFOge7r5sFQ1vW_gdBjGdXgFztvLC5qjZ5-O7qG1oPl7veBo3VwtiwlxTmAe8OgGHeNN9UThBWwWBzVNbdpeYCwFDqVulUHw4s/s200/VoxelGI_VideoShot3.png" /></a><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOzqI7g5SDX6s_9xxJ74An36QzGYv7Yqvb4HT2B6jSbTW19su4UT-XJC9LZ7wtcVPdt9qi5vhs8nzylq2dk4OcE156RdJ5Nx09y_SYrwI6hMs-NdXHBoN0RmYol1URGbqc6lhrFvTjE00/s1600/VoxelGI_VideoShot1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOzqI7g5SDX6s_9xxJ74An36QzGYv7Yqvb4HT2B6jSbTW19su4UT-XJC9LZ7wtcVPdt9qi5vhs8nzylq2dk4OcE156RdJ5Nx09y_SYrwI6hMs-NdXHBoN0RmYol1URGbqc6lhrFvTjE00/s320/VoxelGI_VideoShot1.png" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYTAugO8qw5WSpziIOytEnfTKpPdslDlkkx-VAVieZKXe0wQ5vBGwhuk7kazlg_KuO61TN0xtJ8qPk-Y4gZiXr0KQj5LCt7ooe7mf8ZCZBF_ivYtLuNuBQT-zrT0lErAYpvudgzHvmxqY/s1600/CNSGE11_04.jpg" imageanchor="1" style="margin-left: 0em; margin-right: 0em;"><img border="0" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYTAugO8qw5WSpziIOytEnfTKpPdslDlkkx-VAVieZKXe0wQ5vBGwhuk7kazlg_KuO61TN0xtJ8qPk-Y4gZiXr0KQj5LCt7ooe7mf8ZCZBF_ivYtLuNuBQT-zrT0lErAYpvudgzHvmxqY/s320/CNSGE11_04.jpg" /></a><br />
<br />
<br />
<span style="font-size: large;"><b>Ambient occlusion:</b></span><br />
High quality video: <a href="http://artis.imag.fr/Membres/Cyril.Crassin/GIVoxels/Siggraph11_AO1.mov">http://artis.imag.fr/Membres/Cyril.Crassin/GIVoxels/Siggraph11_AO1.mov</a><br />
<iframe allowfullscreen="" frameborder="0" height="390" src="http://www.youtube.com/embed/QNQtwzVGmsM" width="480"></iframe><br />
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCbSJPLGBwMZZgFm4ixeuM7XPGZUEJCIgLfhR_AF6_X0rOcZIjJTU2_RMqsAP7eg3kcUqVUK9hnwWf4xnA-eGY6Qr_8RQK5YiK4hIXrjNYsoVHRKSNMg83ZwBGtBnrZbV51_-uTcWI3F8/s1600/vlc+2011-06-07+16-09-53-95.png" imageanchor="1" style="float: left; margin-bottom: 0em; margin-left: 0em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCbSJPLGBwMZZgFm4ixeuM7XPGZUEJCIgLfhR_AF6_X0rOcZIjJTU2_RMqsAP7eg3kcUqVUK9hnwWf4xnA-eGY6Qr_8RQK5YiK4hIXrjNYsoVHRKSNMg83ZwBGtBnrZbV51_-uTcWI3F8/s320/vlc+2011-06-07+16-09-53-95.png" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOVWbMZOfyvSsG5hhFXXKsqv5VolYkBkVqd3mZJZmqVNXktLWyYBokhjoULX1uv-JnFewGMDYKLJYVUbVdYfNXFpuZ3fAennKOkB000qQg7dXmJYCorNRsliJ_FBuKmwf3niBUilL7eEM/s1600/VoxelGI_VideoShot12.png" imageanchor="1" style="float: left; margin-bottom: 0em; margin-right: 0em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOVWbMZOfyvSsG5hhFXXKsqv5VolYkBkVqd3mZJZmqVNXktLWyYBokhjoULX1uv-JnFewGMDYKLJYVUbVdYfNXFpuZ3fAennKOkB000qQg7dXmJYCorNRsliJ_FBuKmwf3niBUilL7eEM/s320/VoxelGI_VideoShot12.png" /></a> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdhCqe6mreo9vaqEC_z-9UElphX7RYip54AWQpN9fWpGKR2_cLEf-2YSSUtK3onlnPZ63cIT3xZyW94qXtrw7sMRxdQiqyfoMAX1vWQi1t9ePlEwutmomLxzAB4IG5oYATGSoalxx_CYo/s1600/VoxelGI_VideoShot11.png" imageanchor="1" style="float: left; margin-left: 0em; margin-right: 0em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdhCqe6mreo9vaqEC_z-9UElphX7RYip54AWQpN9fWpGKR2_cLEf-2YSSUtK3onlnPZ63cIT3xZyW94qXtrw7sMRxdQiqyfoMAX1vWQi1t9ePlEwutmomLxzAB4IG5oYATGSoalxx_CYo/s320/VoxelGI_VideoShot11.png" style="cursor: move;" /></a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4tOh0JnMdviBXK3g0BYfigVYbFANC-Wls85T9ibbbv4PNpltiW4QJxYmiA1Jf4OPzxmbk7VfUxvDkFfEE-sFyM3zgLqmedczb3hw1GDHc1_XVGVLJ7O0GnbcdjWAEATg4PLHFsY79aJo/s1600/CNSGE11_05.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="191" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4tOh0JnMdviBXK3g0BYfigVYbFANC-Wls85T9ibbbv4PNpltiW4QJxYmiA1Jf4OPzxmbk7VfUxvDkFfEE-sFyM3zgLqmedczb3hw1GDHc1_XVGVLJ7O0GnbcdjWAEATg4PLHFsY79aJo/s320/CNSGE11_05.jpg" width="320" /></a></div>
Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com25tag:blogger.com,1999:blog-7773532993303488759.post-73669863320867164592011-04-05T05:37:00.001+02:002011-04-05T05:38:47.787+02:00The vicious circle of generalization @AltDevBlogADay<span style="font-size: small;">Full story <a href="http://altdevblogaday.org/2011/04/01/vicious-circle-of-generalization/">here</a> :-D</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRWf61j3gwvUDUcx0pdttAEtB34ERHSZImgD_RpiPmDFiEojx_K93TUWYT3-85kyv5BVITCWhUrBLl7fvIWG-N44BEZm9LJTDdsn0_y_05yw95WmDjc8UM_XAVZdY56UA8FPyMlbezSLo/s1600/ViciousCircleOfGeneralization.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRWf61j3gwvUDUcx0pdttAEtB34ERHSZImgD_RpiPmDFiEojx_K93TUWYT3-85kyv5BVITCWhUrBLl7fvIWG-N44BEZm9LJTDdsn0_y_05yw95WmDjc8UM_XAVZdY56UA8FPyMlbezSLo/s400/ViciousCircleOfGeneralization.png" width="400" /></a></div><br />
<br />
This can be described as the vicious circle of generalization:<br />
<ol><li>We want the code to be as generalized as possible.</li>
<li>We design everything future-proof and extendible.</li>
<li>When a feature request arrives, <span style="text-decoration: line-through;">we’re doomed</span> we need to change a lot of code.</li>
<li>Why?</li>
<li>Because everything was designed as generalized as possible.</li>
<li><tt>goto 1; </tt></li>
</ol>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-3648353728034838292011-03-10T13:02:00.004+01:002011-03-21T08:31:18.931+01:00GDC 2011 Technical PresentationsGDC 2011 is now over, and presentations are starting to appear online :-)<br />
<br />
DICE: <a href="http://publications.dice.se/">http://publications.dice.se/</a><br />
NVIDIA: <a href="http://www.nvidia.com/object/gdc2011.html">http://www.nvidia.com/object/gdc2011.html</a><br />
AMD: <a href="http://developer.amd.com/documentation/presentations/Pages/default.aspx#GDC">http://developer.amd.com/documentation/presentations/Pages/default.aspx#GDC</a> <br />
INTEL: <a href="http://software.intel.com/en-us/articles/intelgdc2011/">http://software.intel.com/en-us/articles/intelgdc2011/</a><br />
Khronos on OpenGL: <a href="http://www.khronos.org/library/detail/2011-gdc-opengl">http://www.khronos.org/library/detail/2011-gdc-opengl</a><br />
<br />
More links can be found on this blog: <a href="http://msinilo.pl/blog/?p=724">http://msinilo.pl/blog/?p=724 </a>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com3tag:blogger.com,1999:blog-7773532993303488759.post-2831203699124544432011-03-10T12:53:00.001+01:002011-03-10T12:55:53.830+01:00Season of next-gen game enginesThis is the season of the new game engines : Dice Frostbite 2, Crytek CryEngine 3 and Epic Unreal Engine 3 !<br />
<br />
<iframe title="YouTube video player" width="560" height="349" src="http://www.youtube.com/embed/uA378g_gD1I" frameborder="0" allowfullscreen></iframe><br />
<br />
<iframe title="YouTube video player" width="560" height="349" src="http://www.youtube.com/embed/dtlqujRQiXg" frameborder="0" allowfullscreen></iframe><br />
<br />
<iframe title="YouTube video player" width="560" height="349" src="http://www.youtube.com/embed/XgS67BwPfFY" frameborder="0" allowfullscreen></iframe>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-50449182888499763162011-03-04T12:33:00.003+01:002011-03-04T12:42:21.837+01:00CUDA 4.0 RC Released to Registered DevelopersHere it is, CUDA 4.0 RC just got released to NVIDIA Registered developers.<br />
<br />
<b>Interesting stuff from the CUDA manual:</b><br />
<ul><li>Layered Textures Support (GL_TEXTURE_1D/2D_ARRAY) : New <span style="font-family: "Courier New",Courier,monospace;">tex.a1d/.a2d</span> modifiers in PTX. But unfortunately the surface instructions do not support them yet, Grrrr<br />
Layered textures are created using <span style="font-family: "Courier New",Courier,monospace;">cudaMalloc3DArray()</span> with the <span style="font-family: "Courier New",Courier,monospace;">cudaArrayLayered</span> flag. New <span style="font-family: "Courier New",Courier,monospace;">cudaTextureType1DLayered/cudaTextureType2DLayered</span> texture sampler types and <span style="font-family: "Courier New",Courier,monospace;">tex1DLayered()/tex2DLayered()</span> access intrinsics.</li>
</ul><ul><li>New .address_size PTX specifier : Allows to specify the address size (32b/64b) used throughout a PTX module.</li>
<li>Inline PTX assembly: This feature was already present since CUDA 2.x but was not officially supported. It's now fully supported and documented :-D</li>
<li>Driver API, new thread-safe stateless launch API function <span style="font-family: "Courier New",Courier,monospace;">cuLaunchKernel()</span>: <span style="font-family: "Courier New",Courier,monospace;">cuLaunchKernel(kernelObj, blocksPerGrid, 1, 1, threadsPerBlock, 1, 1, 0, 0, args, 0); </span></li>
<li>FERMI ISA documented and supported by <i>cuobjdump</i>.</li>
<li>Enhanced C++: Support for operators <span style="font-family: "Courier New",Courier,monospace;">new</span> and <span style="font-family: "Courier New",Courier,monospace;">delete</span>, virtual functions.</li>
</ul>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-37177299594896026732011-02-28T17:05:00.007+01:002011-03-04T10:57:22.294+01:00CUDA 4.0 announced @GDC 2011<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4_552NpDT0l96is3WAdo67jHwSMW0UJ2DokHgvbjWGG4IosL-MNxzHDoJuYc8uK85YsNr4t_QLqe1Vjcb11O7jsSvorbL_o0_mmfkUD4LVi3JQCYyYAFwmzswEMLJzDNnUY4kfbrf9vk/s1600/CUDA4Highlights.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4_552NpDT0l96is3WAdo67jHwSMW0UJ2DokHgvbjWGG4IosL-MNxzHDoJuYc8uK85YsNr4t_QLqe1Vjcb11O7jsSvorbL_o0_mmfkUD4LVi3JQCYyYAFwmzswEMLJzDNnUY4kfbrf9vk/s200/CUDA4Highlights.png" width="200" /></a></div><div style="text-align: justify;">Today NVIDIA announced CUDA 4.0 at the GDC. It will be available to registered developers on March 4<sup>th</sup> :-D</div><div style="text-align: justify;">Among interesting novelties, there is the support for layered textures (GL_TEXTURE_2D_ARRAY) that I hope will also be supported for surface access !<br />
There is also support for direct peer-to-peer communication between GPUs, and for mapping the memory of multiple GPUs (and potentially other third-party devices like network/InfiniBand adapters) into the same address space in order to provide direct memory access (Unified Virtual Addressing, UVA). Virtual functions should also now be supported, along with the new and delete operators for dynamic memory allocation from kernels.</div><div style="text-align: justify;">Looking forward to testing all of this !</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">More info : <a href="http://www.anandtech.com/show/4198/nvidia-announces-cuda-40">Anandtech</a>, <a href="http://pressroom.nvidia.com/easyir/customrel.do?easyirid=A0D622CE9F579F09&version=live&prid=726171&releasejsp=release_157&xhtml=true">NVIDIA pressroom</a>, <a href="http://www.ddj.com/high-performance-computing/229219474">Dr Dobbs</a> <a href="http://gpgpu.org/2011/03/01/cuda-4-0-release">GPGPU.org</a><br />
<br />
<b>Update: CUDA 4.0 RC released to registered developers</b><br />
Slides are available there: <a href="http://bit.ly/cuda4features">http://bit.ly/cuda4features</a><br />
Among the interesting novelties I did not see before, it seems inline PTX will be officially supported with this release ! Also, the disassembler (cuobjdump), which was previously limited to the Tesla ISA, now supports Fermi ISA disassembly. Take a look at the manual for the list of supported instructions.<br />
<br />
</div>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-51327371783772210992011-02-19T20:48:00.000+01:002011-02-19T20:48:30.698+01:00Writing a PhD thesis using FreeMind and LateX : My FreeMind Latex import plug-in<div style="text-align: justify;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKDoeOkCaEdz-oQ73V9A7MiFbphaGXZS55N45mRmOHHe_Kcva4AtiIf4Ttw-ejyuWA9YkGVbwd-btzIwB_biDSjvi4KNzZImfJFpk2gKyrureDKvHD_E-vfadB8mSlSs73b06-y26h6e8/s1600/Freemind_logo_by_dyvim.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKDoeOkCaEdz-oQ73V9A7MiFbphaGXZS55N45mRmOHHe_Kcva4AtiIf4Ttw-ejyuWA9YkGVbwd-btzIwB_biDSjvi4KNzZImfJFpk2gKyrureDKvHD_E-vfadB8mSlSs73b06-y26h6e8/s200/Freemind_logo_by_dyvim.png" width="177" /></a>I am currently in the process of writing my PhD thesis, and I discovered that relying on mind-mapping software can in fact prove very useful, especially at the beginning of the process. Indeed, I found mind-mapping software like <a href="http://freemind.sourceforge.net/wiki/index.php/Main_Page">FreeMind</a> a very useful tool to graphically organize ideas into a hierarchy, allowing you to quickly and very efficiently construct and update the outline of your thesis. </div><div style="text-align: justify;">It is actually easy to export a mindmap from FreeMind into a hierarchy of LaTeX sections, in order to directly get your document structure ready to fill. </div><div style="text-align: justify;">I am actually using <a href="http://www.icare3d.org/Blogger/Files/mm2latexr_richcontent_v3.00.xsl">this XSLT definition</a> to do so.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">The only problem when working like this is that once you have exported your document structure from FreeMind to LaTeX, made modifications, and written some sections in the LaTeX source, there is no way to import it back into FreeMind in order to reorganize your document.</div><br />
<div style="text-align: justify;">To allow this usage, I wrote a small Groovy script for FreeMind that reads a LaTeX file and builds the tree hierarchy out of it. It also imports the content of the sections into the text attribute of the nodes, and this content gets correctly exported by the LaTeX export XSLT.</div><div style="text-align: justify;">My Groovy script: <a href="http://www.icare3d.org/Blogger/Files/FreeMindLatexImportScript.groovy">FreeMindLatexImportScript.groovy</a></div><div style="text-align: justify;"><br />
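For the curious, the core of the import logic is simply to map the LaTeX sectioning commands to tree depths, keep a stack of open nodes, and attach body text to the innermost open section. Here is a rough sketch of that idea (in Python rather than Groovy, and much simplified compared to the actual script: the nodes are plain dictionaries, not FreeMind's map format):

```python
import re

# Map LaTeX sectioning commands to a depth in the tree.
LEVELS = {"chapter": 0, "section": 1, "subsection": 2, "subsubsection": 3}
HEADING = re.compile(r"\\(chapter|section|subsection|subsubsection)\{([^}]*)\}")

def latex_to_tree(source):
    """Build a nested {title, text, children} tree from a LaTeX string."""
    root = {"title": "root", "text": "", "children": []}
    stack = [(-1, root)]  # (depth, node); root sits below any real depth
    for line in source.splitlines():
        match = HEADING.search(line)
        if match:
            depth = LEVELS[match.group(1)]
            node = {"title": match.group(2), "text": "", "children": []}
            # Close any open sections at the same depth or deeper.
            while stack[-1][0] >= depth:
                stack.pop()
            stack[-1][1]["children"].append(node)
            stack.append((depth, node))
        else:
            # Body text goes into the innermost open section.
            stack[-1][1]["text"] += line + "\n"
    return root
```

The real script additionally stores each section's body in the node's text attribute in a form that round-trips through the LaTeX export XSLT above.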
</div><div style="text-align: justify;">This script must be put into the <b>FreeMind/plugins/script</b> directory, and the following section must be added to the <b>FreeMind/plugins/ScriptingEngine.xml</b> file in order to make the script visible in the Tools menu: </div><script class="brush: xml" type="syntaxhighlighter">
<![CDATA[
<plugin_action
    name="LatexImporter"
    documentation="Freemind Latex import groovy script."
    label="plugins/GroovyScript1"
    key_stroke="control shift L"
    base="freemind.extensions.ModeControllerHookAdapter"
    class_name="plugins.script.ScriptingEngine">
  <plugin_mode class_name="freemind.modes.mindmapmode"/>
  <plugin_menu location="menu_bar/extras/first/scripting/latexImport"/>
  <plugin_property name="ScriptLocation" value="C:/Program Files/FreeMind/plugins/script/FreeMindLatexImportScript.groovy" />
</plugin_action>
]]>
</script><br />
<div style="text-align: justify;">Of course, this code is provided with no warranty, but it should work well :-) </div><div style="text-align: justify;">Have fun !</div><div style="text-align: justify;"><br />
</div>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com11tag:blogger.com,1999:blog-7773532993303488759.post-73392311867761695032011-02-18T20:22:00.000+01:002011-02-18T20:22:44.085+01:00#AltDevBlogADayToday, I would like to invite you to follow an awesome new collaborative game development blog called <a href="http://altdevblogaday.org/">#AltDevBlogADay</a>. This blog was founded in January by Mike Acton, who is <i>engine director</i> at Insomniac Games, and it groups together many very good game developers who publish one article each day on the website.<br />
<br />
Here is a short list of the articles I particularly liked among those already published, and that I advise you to take a look at:<br />
<ul><li><a href="http://altdevblogaday.org/2011/01/17/aliasing-the-silent-killer/">Aliasing, the silent killer</a> </li>
<li><a href="http://altdevblogaday.org/2011/01/18/forward-vs-deferred-rendering-whywhen-forward-rendering-still-matters/">Forward vs. Deferred Rendering: why/when forward rendering still matters</a></li>
<li><a href="http://altdevblogaday.org/2011/01/18/think-low-level-write-high-level/">Think low level, write high level</a></li>
<li><a href="http://altdevblogaday.org/2011/01/19/preparing-for-parallelism/">Preparing for Parallelism</a></li>
<li><a href="http://altdevblogaday.org/2011/01/22/intuition-for-gamma-correct-rendering/">Intuition for Gamma Correct Rendering</a></li>
<li><a href="http://altdevblogaday.org/2011/01/22/link-time-sorting/">Link time sorting</a></li>
<li><a href="http://altdevblogaday.org/2011/01/24/research-tastes-better-when-served-with-source/">Research tastes better when served with source</a></li>
<li><a href="http://altdevblogaday.org/2011/01/26/managing-decoupling/">Managing Decoupling</a></li>
<li><a href="http://altdevblogaday.org/2011/01/28/radix-sort-for-humans/">Radix Sort for Humans</a></li>
<li><a href="http://altdevblogaday.org/2011/01/28/non-virtual-interfaces/">Non Virtual Interfaces</a></li>
<li><a href="http://altdevblogaday.org/2011/01/31/vectiquette/">Vectiquette</a></li>
<li><a href="http://altdevblogaday.org/2011/02/01/the-virtual-and-no-virtual/">The Virtual and No-Virtual</a></li>
<li><a href="http://altdevblogaday.org/2011/02/03/refactoring-for-task-concurrency/">Refactoring for Task Concurrency</a></li>
<li><a href="http://altdevblogaday.org/2011/02/07/performance-metaprogramming/">Performance metaprogramming</a></li>
<li><a href="http://altdevblogaday.org/2011/02/09/the-rendering-equation-a-pictorial-introduction/">The Rendering Equation - A Pictorial Introduction</a></li>
<li><a href="http://altdevblogaday.org/2011/02/11/managing-coupling-part-2-%e2%80%94-polling-callbacks-and-events/">Managing Coupling Part 2 — Polling, Callbacks and Events</a></li>
<li><a href="http://altdevblogaday.org/2011/02/12/alternatives-to-malloc-and-new/">Alternatives to malloc and new</a></li>
<li><a href="http://altdevblogaday.org/2011/02/15/data-structures-one-size-does-not-fit-all/">Data Structures: One size does not fit all</a></li>
</ul>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-5578396295742925472011-01-18T18:30:00.001+01:002011-01-18T18:31:00.068+01:00January pop stack post !<div style="text-align: justify;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8PhyTf4DaRGlF1QqIHV1F6T1LPh1K0Kk_dJ6KIy2ZJ0TpxaaibhOwdn0m3Nmj1hyGTLob_RPBl25q8O2q2NSczquEgjyUDjmXGxKMO9Z0t4Z25zhd3iBu31gq-KUyZuzRegOQQvIedGY/s1600/Denver1.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8PhyTf4DaRGlF1QqIHV1F6T1LPh1K0Kk_dJ6KIy2ZJ0TpxaaibhOwdn0m3Nmj1hyGTLob_RPBl25q8O2q2NSczquEgjyUDjmXGxKMO9Z0t4Z25zhd3iBu31gq-KUyZuzRegOQQvIedGY/s320/Denver1.png" width="213" /></a></div>I have not been posting here for a while and I am sorry about this. I have been quite busy lately with a paper submission and also the writing of my PhD thesis ! Anyway, I wish everybody a happy new year :-) </div>There are a couple of links I have stacked up lately that I would like to share, so here they are !<br />
<br />
<ul><li style="text-align: justify;"><a href="http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/">NVIDIA Announced Project Denver</a> : A fully integrated CPU+GPU chip dedicated to high-performance computing and based on ARM cores. It is designed to power future products ranging from personal computers to servers and supercomputers. It looks like NVIDIA's answer to AMD Fusion, and it is also clearly a "we don't believe in the future of x86 architectures" message launched to the industry. I really believe this is the right approach to bringing high-performance CPUs to all market segments. Thanks to the rise of mobile and tablet-like devices, we may finally see the end of the hegemony of the x86 architecture. I really hope that it will not be the last dinosaur from NVIDIA ;-) </li>
<li style="text-align: justify;"><a href="http://www.realtimerendering.com/blog/gdebugger-is-now-free/">gDebugger is now free</a> !</li>
<li style="text-align: justify;"> <a href="http://www.realtimerendering.com/blog/principles-of-digital-image-synthesis-now-free-for-download/">“Principles of Digital Image Synthesis” book is now free for download</a>. And a nicely formatted PDF can be found <a href="https://docs.google.com/leaf?id=0B6YI2wFNYg_QYmZmNjcwNTktNTBlMi00NTllLTk5ODgtYzA4YjNjODNmMDRl&hl=en&pli=1">here</a>. </li>
<li style="text-align: justify;"><a href="http://www.asawicki.info/news_1422_data-oriented_design_-_links_and_thoughts.html">Data-Oriented Design - Links and Thoughts</a> : The concept of Data-Oriented Design (DOD), as opposed to Object-Oriented Programming (OOP), has been emerging and gaining popularity in the game developer community lately. The idea is that instead of building performance-sensitive parts of applications around classes that represent individual objects, with inefficient, non-cache-coherent memory access patterns, it is more efficient to build them by thinking in terms of data flows and memory access patterns. That's a concept that may not be unfamiliar to GPU developers :-D</li>
<li style="text-align: justify;"><a href="http://vfxsolution.com/allanmckay/2010/10/overtime-vs-productivity/">Overtime vs Productivity</a> : Two very interesting blog posts by Allan McKay that discuss how people work, why we often get burned out, and how to avoid it.</li>
<li style="text-align: justify;"><a href="https://github.com/pathscale/pscnv/wiki/_pages">A lot of very interesting technical documentation</a> about NVIDIA GPUs, coming from reverse engineering of the various hardware generations, can be found in the pscnv Git repository. Pscnv is a fork of the nouveau project, an open source driver for NVIDIA GPUs.</li>
</ul>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com1tag:blogger.com,1999:blog-7773532993303488759.post-18558389675675315962010-11-24T09:45:00.001+01:002010-11-26T17:46:42.576+01:00"How the GPU works" @c0de517eI rediscovered a very good in-depth explanation of how the GPU works, published in 2008 on the c0de517e blog:<br />
<a href="http://c0de517e.blogspot.com/2008/04/gpu-part-1.html">Part 1</a> <a href="http://c0de517e.blogspot.com/2008/04/how-gpu-works-part-2.html">Part 2</a> <a href="http://c0de517e.blogspot.com/2008/04/how-gpu-works-part-3.html">Part 3</a>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-54959866379948113712010-11-23T18:26:00.001+01:002010-11-23T18:43:40.148+01:00CUDA "Better Performance at Lower Occupancy" @GTC2010A friend pointed me to this very interesting talk from NVIDIA GTC:<br />
<a href="http://www.cs.berkeley.edu/%7Evolkov/volkov10-GTC.pdf">Better Performance at Lower Occupancy </a><br />
<br />
It debunks two common fallacies that CUDA developers usually believe in:<br />
<ul><li>Multithreading is the only way to hide latency on GPU</li>
<li>Shared memory is as fast as registers</li>
</ul><br />
All the GTC2010 presentations can be found here (with slides and videos !):<br />
<a href="http://www.nvidia.com/object/gtc2010-presentation-archive.html">http://www.nvidia.com/object/gtc2010-presentation-archive.html</a>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-89163246779611340782010-11-19T16:13:00.002+01:002010-11-20T13:47:53.006+01:00Old Real-Time GPU Raytracer<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJMG6l9nbS6r75JplksRmNZ4ISVrDmXUsRJqOc8HI5dEayOE56l3wRaPZQe5QT76v2pqkp8iQaIUWCby3mbCRFV29bro5Bqv5KYs9qgM2UpvONCFze5SApxoRVBJKPBr_rTvOxBwDK6D0/s1600/CornellJungle2.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJMG6l9nbS6r75JplksRmNZ4ISVrDmXUsRJqOc8HI5dEayOE56l3wRaPZQe5QT76v2pqkp8iQaIUWCby3mbCRFV29bro5Bqv5KYs9qgM2UpvONCFze5SApxoRVBJKPBr_rTvOxBwDK6D0/s200/CornellJungle2.jpg" width="200" /></a></div><div style="text-align: justify;">I just translated from French to English an old page on my website about a real-time GPU raytracer I developed for fun 4 years ago, during my Master's thesis. It is old-school GPGPU in OpenGL and Cg that can run on an NV40 (GeForce 6800). No need for CUDA or a GF110 to do GPU raytracing ! ;-)</div><div style="text-align: justify;">The application also features a slow and unoptimized CPU raytracer.<br />
<br />
</div>See here: <a href="http://www.icare3d.org/myprojects/opengl_projects/raytracer_gpu_full_1.0.html">http://www.icare3d.org/myprojects/opengl_projects/raytracer_gpu_full_1.0.html</a><br />
<br />
PS: It is funny to see what was possible at the time, but it was developed quickly and the shader code itself is not a reference !Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com0tag:blogger.com,1999:blog-7773532993303488759.post-35212528261736178612010-11-18T18:16:00.000+01:002010-11-18T18:16:16.406+01:00Fluid Simulation for Video Games @INTEL<div style="text-align: justify;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir2bxa58IYp2gghUQ2n37bohBOdkSXW5z9OtjWfVwP6CrYeJ0vGzOPpBbnxP684P6cDwf9YY1bgDFVw2SfpAOKh5i4ZcSleBwO0nBOU4wz-h1A78cDnRtqYFz8XjFAzn5uwvJ47MntSiU/s1600/fluid-6-figure-1.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="97" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir2bxa58IYp2gghUQ2n37bohBOdkSXW5z9OtjWfVwP6CrYeJ0vGzOPpBbnxP684P6cDwf9YY1bgDFVw2SfpAOKh5i4ZcSleBwO0nBOU4wz-h1A78cDnRtqYFz8XjFAzn5uwvJ47MntSiU/s200/fluid-6-figure-1.jpg" width="200" /></a>There is a very interesting series of articles about fluid simulation for video games written by <i>Michael J. Gourlay</i> on the Intel developer website. 
Source code is also provided.</div>Parts: <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-1/">1</a>, <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-2/">2</a>, <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-3/">3</a>, <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-4/">4</a>, <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-5/">5</a>, <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-6/">6</a>, <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-7/">7</a>, <a href="http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-8/">8</a>Cyril Crassinhttp://www.blogger.com/profile/16474299434636795969noreply@blogger.com1