Various GPU stuff from Siggraph time

NVIDIA's Siggraph 2010 presentations are available for streaming there.

Siggraph 2010 Khronos OpenGL BOF and OpenCL BOF slides available.

Reference pages for OpenGL 3.3 and OpenGL 4.1 are online !
  • I already said it, but I love the way OpenGL has been evolving since OpenGL 3.0 ! There really seems to be a genuine will from the vendors to make it a first-class, innovative API again :-)

OptiX 2.0, and Cg Toolkit 3.0 released by NVIDIA
  • SM 5 support in Cg at last ! 
  • I tried OptiX (previously NVIRT) recently and I was really impressed, especially by how easy the "high level" optixu interface is to use. That's really an awesome tool.

OpenGL 4.1 Specifications released + NVIDIA drivers

The specifications of OpenGL 4.1 just got released by the Khronos group (But why didn't they wait for the OpenGL BOF ??).

It does not bring a lot of new features, but it's still great to see OpenGL quickly evolving ! Direct State Access does not get into the core yet (sorry Christophe ;-), and I am not sure we will get it before OpenGL 5.0...

As usual, NVIDIA is very likely to announce the release of drivers supporting OpenGL 4.1 during the OpenGL BOF :-) The official forum thread is here.

Here are the main new features:
  • Viewport Array (ARB_viewport_array). This is, for me, the most interesting new feature. It allows multiple viewports to be manipulated inside a given render call. Viewports control the behavior of the "viewport transformation" stage (normalized device coordinates -> window coordinates, scissor test). Multiple viewports can be created and the geometry shader can direct emitted primitives to a selected viewport. A separate viewport rectangle and scissor region can be specified for each viewport.
  • Ability to get the binary representation of a program object (ARB_get_program_binary). This is a long-awaited feature that has been present in DX for a while.
  • Separate shader objects (ARB_separate_shader_objects). It allows compiling and linking a separate program for each shader stage (VS/GS/TCS/TES/FS). A Program Pipeline Object is introduced to manipulate and bind the separate programs. That's also a useful feature, and that was already the way to do it in Cg.
  • Improved compatibility with OpenGL ES 2.0 (ARB_ES2_compatibility). Adds a few missing functions and tokens.
  • Support for 64-bit vertex attributes in GLSL (ARB_vertex_attrib_64bit).
  • Increases required size for textures/renderbuffers.
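
To make the viewport array idea a bit more concrete, here is a minimal CPU-side sketch in Python of what the viewport transformation does for each viewport, and of how a geometry shader picks one by writing gl_ViewportIndex. This is not OpenGL code: the Viewport class, the viewports list and route_primitive are hypothetical names used only for illustration, and depth range and scissoring are left out.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    x: float       # lower-left corner of the rectangle, in window pixels
    y: float
    width: float
    height: float


def ndc_to_window(viewport, x_ndc, y_ndc):
    """Map normalized device coordinates in [-1, 1] to window coordinates."""
    x_w = viewport.x + (x_ndc + 1.0) * 0.5 * viewport.width
    y_w = viewport.y + (y_ndc + 1.0) * 0.5 * viewport.height
    return x_w, y_w


# Hypothetical setup: an 800x300 window split into two side-by-side viewports.
viewports = [Viewport(0, 0, 400, 300), Viewport(400, 0, 400, 300)]


def route_primitive(vertices_ndc, viewport_index):
    """Mimic a geometry shader selecting a viewport for one emitted primitive."""
    vp = viewports[viewport_index]
    return [ndc_to_window(vp, x, y) for x, y in vertices_ndc]
```

With this setup, the same NDC point (0, 0) lands at the center of the left viewport when routed to index 0 and at the center of the right one when routed to index 1, all within a single draw call.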

    Some interesting new extensions were also released:
    • ARB_debug_output: Callback mechanisms to receive enhanced errors and warning messages.
    • ARB_robustness: Addresses multiple specific goals to improve robustness, for example when running WebGL applications. For instance, it provides additional "safe" APIs that bound the amount of data returned by an API query.
    • ARB_shader_stencil_export: Ability to set stencil values in a fragment shader for enhanced rendering flexibility :-)
    • ARB_cl_event: Link OpenGL sync objects to OpenCL event objects for enhanced OpenCL interoperability. 

    UPDATE 27/07: That's done, NVIDIA released its OpenGL 4.1 drivers ! Everything there.

        OpenGL 4.0+ ABuffer V2.0: Linked lists of fragment pages

        The main problem with my first ABuffer implementation (cf. my previous post) was that a fixed maximum number of fragments per pixel has to be allocated at initialization time. With this approach, the size of the ABuffer can quickly become very large when the screen resolution and depth complexity of the scene increase.

        Using linked lists of fragment pages per pixel

        Original basic approach

        To try to solve this problem, I implemented a variant of the recent OIT method presented at GDC 2010 by AMD, which uses per-pixel linked lists. The main difference in my implementation is that fragments are not stored and linked individually but in small pages of fragments (containing 4-6 fragments). Those pages are stored and allocated in a shared pool whose size is changed dynamically depending on the scene demands.
        Using pages improves cache coherency when accessing the fragments, improves the efficiency of concurrent access to the shared pool and decreases the storage cost of the links. This is at the cost of a slight over-allocation of fragments.
        The shared pool is composed of a fragment buffer where fragment data is stored, and a link buffer storing links between the pages that are reverse chained. Each pixel of the screen contains the index of the last page it references, as well as a counter with the total number of fragments stored in that pixel (incremented using atomics).
        Access to the shared pool is managed through a global page counter, incremented using an atomic operation each time a page is needed by a fragment. The allocation of a page is done by a fragment when it detects that the current page is full, or that there is no page yet for the pixel. This is done inside a critical section to ensure that multiple fragments in flight in the pipeline at the same time and falling into the same pixel will be handled correctly.
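
The data layout described above can be sketched on the CPU with a few arrays. The following Python model is only an illustration under simplifying assumptions: it runs sequentially, so the atomics and the critical section are not modeled, the pool has a fixed size instead of growing dynamically, and all names (PagedABuffer, insert, fragments_of) are hypothetical rather than taken from the demo source.

```python
PAGE_SIZE = 4   # fragments per page (the post uses pages of 4 to 6 fragments)
NO_PAGE = -1    # marker for "no page" in the link buffer and per-pixel heads


class PagedABuffer:
    def __init__(self, num_pixels, max_pages):
        self.fragments = [None] * (max_pages * PAGE_SIZE)  # fragment buffer
        self.links = [NO_PAGE] * max_pages       # link buffer: previous page of each page
        self.last_page = [NO_PAGE] * num_pixels  # per-pixel index of the last page
        self.count = [0] * num_pixels            # per-pixel fragment counter
        self.page_counter = 0                    # global page counter (atomic on the GPU)

    def insert(self, pixel, fragment):
        slot = self.count[pixel] % PAGE_SIZE
        if slot == 0:  # no page yet for this pixel, or the current page is full
            if self.page_counter >= len(self.links):
                raise MemoryError("shared page pool exhausted")
            page = self.page_counter
            self.page_counter += 1
            self.links[page] = self.last_page[pixel]  # reverse chaining
            self.last_page[pixel] = page
        self.fragments[self.last_page[pixel] * PAGE_SIZE + slot] = fragment
        self.count[pixel] += 1

    def fragments_of(self, pixel):
        """Walk the reverse-chained pages of a pixel, newest page first."""
        out = []
        page = self.last_page[pixel]
        remaining = self.count[pixel]
        while page != NO_PAGE:
            used = remaining % PAGE_SIZE or PAGE_SIZE  # only the newest page can be partial
            start = page * PAGE_SIZE
            out.extend(self.fragments[start:start + used])
            remaining -= used
            page = self.links[page]
        return out
```

Inserting six fragments into one pixel consumes two pages from the pool, and fragments_of returns them page by page starting from the most recent page, so a resolve pass would still have to sort them by depth.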

        ABuffer memory occupancy differences:

        Some memory occupancy examples of the fragments storage depending on screen resolution (Basic vs Linked Lists):
        • 512x512:    64MB vs 6.5MB 
        • 768x708:   132.7MB vs 11.7MB
        • 1680x988:  405MB vs 27.42MB
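
As a sanity check on the "Basic" column, here is a small Python sketch of the arithmetic. The two constants are my assumption, not something stated in the post: 16 fragments per pixel at 16 bytes each (256 bytes per pixel) happens to reproduce the three figures above, but the actual demo settings may differ.

```python
BYTES_PER_FRAGMENT = 16        # assumption: e.g. one 4 x 32-bit value per fragment
MAX_FRAGMENTS_PER_PIXEL = 16   # assumption: fixed per-pixel capacity of the basic ABuffer


def basic_abuffer_mib(width, height):
    """Storage of a fixed-size ABuffer in MiB: every pixel reserves the maximum."""
    total_bytes = width * height * MAX_FRAGMENTS_PER_PIXEL * BYTES_PER_FRAGMENT
    return total_bytes / (1024 * 1024)


for width, height in [(512, 512), (768, 708), (1680, 988)]:
    print(f"{width}x{height}: {basic_abuffer_mib(width, height):.2f} MiB")
```

The basic cost grows linearly with the pixel count regardless of the scene, whereas the linked-list figures depend on how many fragments the scene actually produces, so they cannot be derived from the resolution alone.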

        The cost of this huge reduction in storage is that the rendering speed decreases compared to the basic approach. Linked lists can run at as little as half the speed of the basic approach when additional per-fragment costs are low, due to the additional memory accesses and the increased complexity of the fragment shader (more code, more registers). But this cost seems well amortized when the per-fragment shading costs increase.

        Order Independent Transparency (OIT) demo application & source code
        New keys:
        • 'x' : Switch between ABuffer Algorithms (V1.0 Basic and V2.0 Linked List)
        • 'n' : Display the number of fragments per pixel.
        • 'g' : Switch between Alpha-Blending and Gelly resolve modes.

        UPDATE 28/10/2010: Oscarbg did a port of the demo so that it can run on AMD (mainly removing everything related to shader_load/store), more info there:
        But sadly it still does not work on AMD, so if someone from AMD reads this, your help is welcome !
        I can't try myself since I don't have any AMD card :-(

          Copyright © Icare3D Blog