NVIDIA just launched the second-generation Maxwell architecture with the GM204 GPU, which is, I believe, an incredible chip. The Maxwell 2 architecture is both highly energy efficient (~2x the perf/watt of Kepler in games) and provides a lot of very exciting new graphics features (some of them are exposed in Direct3D). These features are exposed in the form of new OpenGL extensions in the R344 driver that was released today, and the specifications for all NVIDIA-supported GL extensions can be found here. NVIDIA also released new SDK samples using these extensions.
List of new extensions
Quick description of the new extensions
Multisampling
NV_framebuffer_mixed_samples adds a lot of flexibility to multisampled rasterization. It decouples the rasterization sampling frequency (which can be set explicitly) from the actual framebuffer storage.
This enables rasterization to operate at a higher sampling frequency than that of the target color buffers.
It supports both depth and stencil testing at this frequency, provided the depth and stencil buffers have a matching sample count (which must be a multiple of the number of samples in the color buffers).
There are still some constraints: all color buffers must have the same number of samples, the raster sample count must match the depth and stencil buffer sample count if the depth or stencil test is enabled, and it must be greater than or equal to the color buffer sample count.
A new “coverage reduction stage” is introduced in the per-fragment operations (after the fragment shader in early-z mode, after the depth-test in late-z), which converts a set of covered raster/depth/stencil samples to a set of covered color samples.
There is an implementation-dependent association of raster samples to color samples. The reduced "color coverage" is computed such that the coverage bit for each color sample is 1 if any of the associated bits in the fragment's coverage is set, and 0 otherwise.
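The reduction rule above can be sketched on the CPU with plain bit masks. This is only an illustration: the function name is hypothetical, and since the real association of raster samples to color samples is implementation-dependent, the sketch assumes that color sample c owns a contiguous group of raster samples.

```python
def reduce_coverage(raster_mask: int, raster_samples: int, color_samples: int) -> int:
    """Reduce a raster-sample coverage mask to a color-sample coverage mask."""
    if color_samples < 1 or raster_samples % color_samples != 0:
        raise ValueError("raster sample count must be a multiple of the color sample count")
    group = raster_samples // color_samples  # raster samples per color sample
    group_bits = (1 << group) - 1
    color_mask = 0
    for c in range(color_samples):
        # Assumed association: color sample c owns raster samples
        # [c * group, (c + 1) * group). Real hardware picks its own mapping.
        if (raster_mask >> (c * group)) & group_bits:
            color_mask |= 1 << c  # set if ANY associated raster bit is set
    return color_mask
```

With 8 raster samples and 2 color samples, a fragment covering only raster sample 4 ends up covering color sample 1 under this assumed mapping.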
This feature can be used in conjunction with the coverage-to-color feature (cf. below) to have the fragment shader's output coverage mask automatically transformed into a color by the ROP.
According to AnandTech, when rasterizing with explicit multisampling and no render target, Maxwell allows evaluating primitive coverage at 16x MSAA.
Note that EXT_raster_multisample is equivalent to "Target-Independent Rasterization" in Direct3D 11.1, which allows using multiple raster samples with a single color sample, as long as depth and stencil tests are disabled. It is actually a subset of NV_framebuffer_mixed_samples, which is more general and exposes more flexibility.
NV_fragment_coverage_to_color allows using the ROP to automatically convert the post depth/stencil/alpha-test coverage mask into a color and write it into a color render target.
This conversion is performed before the new coverage reduction stage (cf. NV_framebuffer_mixed_samples) and can be useful for saving coverage in the context of deferred shading.
When operating in early-depth mode (layout(early_fragment_tests) in;, see here for more information), EXT_post_depth_coverage allows the fragment shader to get the post-depth-test coverage mask of the current fragment as input (gl_SampleMaskIn[], in which only the samples passing the depth test are set), unlike the standard GL 4.5 behavior, which provides the pre-depth-test coverage (the actual triangle coverage).
With standard OpenGL, the fragment shader output coverage mask (gl_SampleMask[]) is ANDed with the actual primitive input coverage mask before being used in subsequent pipeline stages.
NV_sample_mask_override_coverage disables this AND operation, which allows the fragment shader to entirely override the primitive coverage and to set coverage bits that are not present in the input mask.
This is actually very nice, because it allows using the output coverage as a way to dynamically route color output values into arbitrary sample locations inside a multisampled render target.
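To make the difference concrete, here is a minimal CPU model of the two behaviors. The function and parameter names are hypothetical and only model the masking logic described above, not the actual GL state or GLSL syntax.

```python
def effective_coverage(input_mask: int, shader_mask: int, override: bool, samples: int) -> int:
    """Coverage seen by later pipeline stages for one fragment."""
    all_samples = (1 << samples) - 1
    shader_mask &= all_samples  # bits beyond the sample count are meaningless
    if override:
        # Override mode: the shader's mask is used as-is.
        return shader_mask
    # Standard behavior: the shader can only remove samples, never add them.
    return shader_mask & input_mask


def route_to_sample(sample_index: int) -> int:
    """Mask that sends the fragment's color to a single chosen sample."""
    return 1 << sample_index
```

For a 4-sample target where the primitive covers samples 0 and 1, routing the output to sample 2 yields an empty mask in standard mode, but writes sample 2 when the override is active.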
NV_sample_locations allows applications to explicitly set the location of sub-pixel samples for multisample rasterization, providing fully programmable sampling patterns. Sampling patterns can be defined within a grid of adjacent pixels, the size of which depends on the number of samples. According to the provided queries, the sub-pixel positions are snapped to a 16x16 sub-pixel grid.
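As a rough illustration of what a 16x16 sub-pixel grid implies for requested positions, the sketch below quantizes a location given in [0, 1) pixel units. The helper name is hypothetical, and rounding down is an assumption; the hardware may round differently.

```python
import math


def snap_sample_location(x: float, y: float, subpixel_bits: int = 4) -> tuple:
    """Quantize a sample position to a (2**bits x 2**bits) sub-pixel grid."""
    steps = 1 << subpixel_bits  # 16 positions per axis for 4 bits

    def snap(v: float) -> float:
        if not 0.0 <= v < 1.0:
            raise ValueError("sample location must lie in [0, 1)")
        # Assumption: positions are rounded down to the nearest grid step.
        return math.floor(v * steps) / steps

    return snap(x), snap(y)
```

Under these assumptions, a requested position of (0.53, 0.99) lands on (0.5, 0.9375), so two patterns that differ by less than 1/16 of a pixel may end up identical.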
Rasterization
NV_conservative_raster is a really great feature. It allows rasterization to generate fragments for any pixel touched by a triangle, even if no sample location in the pixel is covered.
A new control is also provided to modify the window coordinate snapping precision in order to allow the application to match conservative rasterization triangle snapping with the snapping that would have occurred at higher resolution.
Polygons with zero area generate no fragments. Any location within a pixel may be used for interpolating attributes, potentially causing attribute extrapolation if outside the triangle.
This can be useful for binning, for instance (using one pixel per tile).
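The "any pixel touched by a triangle" rule can be modeled on the CPU with a 2D separating-axis test between the pixel square and the triangle. This is an idealized sketch with hypothetical function names; it ignores the snapping precision control, and the hardware may generate fragments for more pixels than this exact test does.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: positive when (px, py) lies to the left of a -> b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def pixel_touched(tri, px, py):
    """True if the pixel square [px, px+1] x [py, py+1] overlaps the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return False  # zero-area polygons generate no fragments
    if area < 0:  # flip to counter-clockwise winding
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
    # Axes of the pixel square: the pixel must overlap the triangle's bounding box.
    if px + 1 < min(x0, x1, x2) or px > max(x0, x1, x2):
        return False
    if py + 1 < min(y0, y1, y2) or py > max(y0, y1, y2):
        return False
    # Axes of the triangle: each edge needs at least one pixel corner on its inner side.
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    edges = (((x0, y0), (x1, y1)), ((x1, y1), (x2, y2)), ((x2, y2), (x0, y0)))
    for (ax, ay), (bx, by) in edges:
        if max(edge(ax, ay, bx, by, cx, cy) for cx, cy in corners) < 0:
            return False
    return True


def conservative_fragments(tri, width, height):
    """All (x, y) pixels of a width x height grid touched by the triangle."""
    return [(x, y) for y in range(height) for x in range(width)
            if pixel_touched(tri, x, y)]
```

For binning, each pixel of the grid would stand for one screen tile: a small triangle that misses every tile center still lands in the bin of the tile it touches.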
NV_fragment_shader_interlock exposes a hardware-accelerated critical section for the fragment shader, allowing hazard-free read-modify-write operations on a per-pixel basis.
It also allows enforcing primitive-ordering for threads entering the critical section.
It provides new GLSL calls beginInvocationInterlockNV() and endInvocationInterlockNV() defining a critical section which is guaranteed to be executed only for one fragment at a time.
Interlock can be done on a per-pixel or a per-sample basis if multi-sampled rasterization is used.
This feature is useful for algorithms that need to access per-pixel data structures via shader load and store operations while avoiding race conditions. Obvious applications are order-independent transparency (OIT) and programmable blending.
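The guarantee is easiest to see with programmable blending. The sketch below is a CPU emulation, not shader code: it assumes fragments arrive in arbitrary order and models the ordered interlock by applying each pixel's read-modify-write one fragment at a time, in primitive submission order. The function name and fragment layout are hypothetical.

```python
from collections import defaultdict


def resolve_fragments(fragments, background=0.0):
    """Blend (primitive_id, pixel, color, alpha) fragments into a framebuffer dict."""
    per_pixel = defaultdict(list)
    for prim_id, pixel, color, alpha in fragments:
        per_pixel[pixel].append((prim_id, color, alpha))

    framebuffer = {}
    for pixel, frags in per_pixel.items():
        value = background
        # Ordered interlock: fragments of a pixel enter the critical section
        # one at a time, in primitive submission order.
        for _, color, alpha in sorted(frags, key=lambda f: f[0]):
            value = color * alpha + value * (1.0 - alpha)  # read-modify-write
        framebuffer[pixel] = value
    return framebuffer
```

Because "over" blending is not commutative, the result would depend on thread scheduling without the ordering guarantee; with it, any arrival order gives the same image.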
NV_fill_rectangle allows rasterizing the axis-aligned screen-space bounding box of submitted triangles, disregarding the actual triangle edges.
It can be useful for drawing a full-screen quad without generating an internal edge, for instance, or for drawing user interfaces more efficiently.
Geometry processing
NV_geometry_shader_passthrough makes geometry shaders more efficient in the case where they are pass-through, i.e. there is a one-to-one mapping between input and output primitives.
In this case, per-vertex attributes are simply copied from the input primitive into the output primitive, and the geometry shader is only used to set per-primitive attributes (like gl_Layer, gl_ViewportMask[], ...), which can be computed from the input vertex attributes.
Viewport multicast (NV_viewport_array2) allows automatically broadcasting the same primitive to multiple viewports (and/or multiple layers when using layered render targets) simultaneously, in order to be rasterized multiple times.
It is exposed through a new gl_ViewportMask[] GLSL output attribute, which is available in both the vertex shader and the geometry shader.
This can be especially powerful when combined with the new passthrough geometry shader. A sample using it to speed up cascaded shadow maps is available here.
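For the cascaded shadow map case, the per-primitive work boils down to computing a bit mask of the cascades a primitive falls into. The following CPU sketch is not taken from the SDK sample; the helper names and the 2D box representation (xmin, ymin, xmax, ymax) are assumptions made for illustration.

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test on (xmin, ymin, xmax, ymax) boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cascade_viewport_mask(prim_box, cascade_boxes):
    """Bit i is set when the primitive's bounding box overlaps cascade i."""
    mask = 0
    for index, cascade in enumerate(cascade_boxes):
        if boxes_overlap(prim_box, cascade):
            mask |= 1 << index
    return mask


def viewports_from_mask(mask):
    """Viewport indices the primitive would be broadcast to."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]
```

A primitive that lies outside the innermost cascade but inside the two outer ones gets mask 0b110 and is rasterized into viewports 1 and 2 only.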
Texturing
EXT_sparse_texture2 improves on ARB_sparse_texture, which separates the allocation of virtual address space from the physical memory of textures and provides the ability to sparsely allocate the physical backing store of 2D/3D/2DArray textures on a per-tile basis.
This new extension adds the ability to retrieve texture access residency information from GLSL, to specify a minimum allocated LOD for texture fetches, and to return a constant zero value for lookups into unallocated pages. It also adds support for multisampled textures.
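The residency and zero-return behavior can be pictured with a toy page table. This class is purely illustrative and its names are hypothetical; it models a single-level 2D texture with square pages and says nothing about the real page sizes or GLSL functions.

```python
class SparseTexture2D:
    """Toy sparse texture: fixed virtual size, pages committed on demand."""

    def __init__(self, width, height, page_size):
        self.width, self.height, self.page_size = width, height, page_size
        self.pages = {}  # (page_x, page_y) -> rows of texels; absent = uncommitted

    def commit(self, page_x, page_y, fill=0.0):
        """Allocate physical backing for one page (tile) and fill it."""
        n = self.page_size
        self.pages[(page_x, page_y)] = [[fill] * n for _ in range(n)]

    def fetch(self, x, y):
        """Return (value, resident); unallocated pages read as a constant zero."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("texel outside the virtual texture")
        page = self.pages.get((x // self.page_size, y // self.page_size))
        if page is None:
            return 0.0, False
        return page[y % self.page_size][x % self.page_size], True
```

A shader-side algorithm can use the residency flag to fall back to a coarser, allocated LOD instead of trusting the zero value.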
EXT_texture_filter_minmax exposes a new sampler parameter that allows performing a min or max reduction on the values sampled inside a texture filtering footprint, instead of the regular linear interpolation.
It is supported for all kinds of textures, as well as with anisotropic filtering.
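Here is a small CPU sketch of the idea for a 2x2 bilinear footprint on a single-channel texture. The function name is hypothetical, coordinates are in texel units with clamp-to-edge addressing, and excluding zero-weight texels from the reduction is an assumption of this sketch.

```python
import math


def filter_2x2(texture, u, v, mode="linear"):
    """Sample a list-of-rows texture at (u, v) with linear, min or max filtering."""
    h, w = len(texture), len(texture[0])
    x, y = u - 0.5, v - 0.5  # texel centers sit at integer + 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(ix, iy):  # clamp-to-edge addressing
        return texture[min(max(iy, 0), h - 1)][min(max(ix, 0), w - 1)]

    taps = [
        (texel(x0, y0), (1 - fx) * (1 - fy)),
        (texel(x0 + 1, y0), fx * (1 - fy)),
        (texel(x0, y0 + 1), (1 - fx) * fy),
        (texel(x0 + 1, y0 + 1), fx * fy),
    ]
    if mode == "linear":
        return sum(value * weight for value, weight in taps)
    # Assumption: texels with zero weight do not take part in the reduction.
    values = [value for value, weight in taps if weight > 0]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    raise ValueError("mode must be 'linear', 'min' or 'max'")
```

Sampling the texture [[1, 2], [3, 4]] at its center gives 2.5 with linear filtering, 1 with min and 4 with max, all over the same four texels.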
Atomics
NV_shader_atomic_fp16_vector provides a set of new atomic operations operating on 2- and 4-component vectors of 16-bit floating-point values for images, bindless pointers to global memory, and storage buffers.
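To illustrate what a single atomic on a 2-component fp16 vector means, the sketch below packs two half-precision values into one 32-bit word and updates both components under one lock. It is a CPU emulation with hypothetical names; the lock stands in for hardware atomicity, and it assumes a component-wise add is among the supported operations.

```python
import struct
import threading


def pack_f16x2(x: float, y: float) -> int:
    """Pack two values, rounded to half precision, into one 32-bit word."""
    return struct.unpack("<I", struct.pack("<2e", x, y))[0]


def unpack_f16x2(word: int) -> tuple:
    """Unpack a 32-bit word into its two half-precision components."""
    return struct.unpack("<2e", struct.pack("<I", word))


class F16x2Cell:
    """One 32-bit memory location holding a 2-component fp16 vector."""

    def __init__(self, x=0.0, y=0.0):
        self._word = pack_f16x2(x, y)
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def atomic_add(self, x, y):
        """Add component-wise and return the previous value."""
        with self._lock:
            old = unpack_f16x2(self._word)
            self._word = pack_f16x2(old[0] + x, old[1] + y)
            return old

    def load(self):
        return unpack_f16x2(self._word)
```

Both components change in one indivisible step, so a concurrent reader never observes one component updated and the other stale.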