Yakiimo3D

Mostly DirectX 11 Programming

Siggraph 2010: 3 New Intel DirectX11 Demos

http://visual-computing.intel-research.net/art/publications/sdsm/
Sample Distribution Shadow Maps

http://visual-computing.intel-research.net/art/publications/avsm/
Adaptive Volumetric Shadow Maps

http://visual-computing.intel-research.net/art/publications/deferred_rendering/
Deferred Rendering for Current and Future Rendering Pipelines

I learned about them from this Beyond3D thread (http://forum.beyond3d.com/showthread.php?t=58180). All 3 demos come with source code and ran fine on my HD5750.

DX11 Order Independent Transparency with MSAA

Introduction

I extended my DX11 OIT demo (http://www.yakiimo3d.com/2010/07/19/dx11-order-independent-transparency/) to support MSAA.

A CodePlex link to my program’s source code and binary is provided at the end of the article.

Relevant Links

1) http://www.yakiimo3d.com/2010/07/19/dx11-order-independent-transparency/
Blog article for my original DX11 OIT demo. My original demo has no support for MSAA, so jaggies are very noticeable along the edges of the colored quads.

2) http://developer.amd.com/documentation/presentations/Pages/default.aspx
ATI’s conference presentation list contains a link to their “GDC 2010: OIT and GI using DX11 linked lists” presentation ppt. My original demo and this MSAA-enabled version are both based on this presentation.

Screen Captures






From top to bottom: no MSAA, 2x MSAA, 4x MSAA, 8x MSAA. I notice big quality improvements going from none to 2x to 4x, but not much between 4x and 8x. I think the anti-aliasing starts to look pretty nice around 4x. Note that as the sample count increases, the image quality improves but the fps drops.

Implementation Details

Adding MSAA support did not require many changes to the original demo. The main change was adding SV_COVERAGE and SV_SAMPLEINDEX semantic parameters to my pixel shader input structures and using the information they provide.

Changes to StoreFragments.hlsl

struct ScenePS_Input
{
    float4 pos       : SV_POSITION;
    float4 color     : COLOR0;
    uint   nCoverage : SV_COVERAGE;    // Bits are set according to which samples are covered.
};

I added an nCoverage parameter with the SV_COVERAGE semantic, which tells me which samples are covered by the current fragment. For example, bits 1 and 4 of nCoverage are set if samples 1 and 4 are covered. Now, in addition to storing each fragment’s color and depth, I also store the fragment’s sample coverage information.
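To make the coverage bits concrete, here is a minimal CPU-side C++ sketch of packing depth and coverage into one 32-bit value. The field name nDepthAndCoverage in the shader suggests the two are stored together, but the exact layout below (24 bits of depth on top, 8 bits of coverage at the bottom) and the helper names PackDepthAndCoverage, CoversSample and UnpackDepth24 are my assumptions for illustration, not code taken from the demo.

```cpp
#include <cstdint>

// Hypothetical layout (an assumption): upper 24 bits hold the quantized
// depth, lower 8 bits hold the SV_COVERAGE-style sample mask (up to 8x MSAA).
constexpr uint32_t kCoverageBits = 8;
constexpr uint32_t kCoverageMask = (1u << kCoverageBits) - 1u;

// Pack a depth in [0, 1] and a coverage mask into a single 32-bit value.
inline uint32_t PackDepthAndCoverage(float depth, uint32_t coverage)
{
    if (depth < 0.0f) depth = 0.0f;
    if (depth > 1.0f) depth = 1.0f;
    // Quantize in double precision so that depth == 1.0 maps to exactly
    // 0xFFFFFF and never rounds up into a 25th bit.
    const uint32_t depth24 =
        static_cast<uint32_t>(static_cast<double>(depth) * 16777215.0 + 0.5);
    return (depth24 << kCoverageBits) | (coverage & kCoverageMask);
}

// True if the fragment covered the given sample (bit sampleIndex is set).
inline bool CoversSample(uint32_t packed, uint32_t sampleIndex)
{
    return (packed & (1u << sampleIndex)) != 0u;
}

// Recover the 24-bit quantized depth.
inline uint32_t UnpackDepth24(uint32_t packed)
{
    return packed >> kCoverageBits;
}
```

With this layout, a fragment covering samples 1 and 4 has the mask (1 << 1) | (1 << 4), and because depth sits in the upper bits, comparing two packed values compares their depths first.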

Changes to SortFragmentsAndRender.hlsl

struct QuadPS_Input
{
    float4 pos          : SV_POSITION;
    uint   nSampleIndex : SV_SAMPLEINDEX;    // Declaring this input runs the pixel shader per sample instead of per pixel.
};
...
// Read and store linked list data to the temporary buffer.
while( nNext != 0xFFFFFFFF )
{
    FragmentLink element = FragmentLinkSRV[nNext];
    aData[ nNumFragment ] = element.fragmentData;

    // Use sample only if the fragment covered it.
    if( aData[ nNumFragment ].nDepthAndCoverage & (1<<input.nSampleIndex) )
    {
        anIndex[ nNumFragment ] = nNumFragment;
        ++nNumFragment;
    }          
    nNext = element.nNext;
}

The SV_SAMPLEINDEX semantic was added in DX10.1 and can be used in DX11. By adding a pixel shader input parameter with the SV_SAMPLEINDEX semantic, it’s possible to make the pixel shader run once per sample instead of once per pixel. By running the sorting & blending pixel shader per sample and skipping the fragments that do not cover the current sample, it’s possible to get MSAA anti-aliasing even with the StructuredBuffer input fragment data.
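The per-sample idea can be sketched on the CPU in C++. This is only an illustration under my own assumptions: the Fragment struct, the ResolveSample function, the use of straight (non-premultiplied) alpha and the "larger depth is farther" convention are all hypothetical and are not taken from the shader source.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of one stored fragment.
struct Fragment
{
    std::array<float, 4> color;    // RGBA with straight alpha
    float                depth;    // larger value = farther from the camera
    uint32_t             coverage; // SV_COVERAGE-style sample mask
};

// Resolve one MSAA sample of one pixel: keep only the fragments that cover
// the sample, sort them far to near, then blend them over the background.
std::array<float, 3> ResolveSample(std::vector<Fragment> fragments,
                                   uint32_t sampleIndex,
                                   std::array<float, 3> background)
{
    // Drop fragments whose coverage mask does not include this sample.
    fragments.erase(
        std::remove_if(fragments.begin(), fragments.end(),
                       [sampleIndex](const Fragment& f) {
                           return (f.coverage & (1u << sampleIndex)) == 0u;
                       }),
        fragments.end());

    // Back-to-front order: farthest fragment first.
    std::sort(fragments.begin(), fragments.end(),
              [](const Fragment& a, const Fragment& b) {
                  return a.depth > b.depth;
              });

    // Standard "over" blending, one fragment at a time.
    std::array<float, 3> result = background;
    for (const Fragment& f : fragments)
    {
        const float alpha = f.color[3];
        for (int c = 0; c < 3; ++c)
            result[c] = f.color[c] * alpha + result[c] * (1.0f - alpha);
    }
    return result;
}
```

Calling ResolveSample once per sample index gives a separate color for each sample of a pixel, so a quad edge that covers only some of the samples ends up partially blended once the samples are averaged, which is where the anti-aliasing comes from.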

OIT with MSAA Demo

Source Code & Binary
http://yakiimo3d.codeplex.com/releases/view/49570
The demo starts with a default setting of 4x MSAA. Open the Change Device menu and change the MSAA setting using the Multisample Count combobox.

Comments

This demo was my first time using the SV_SAMPLEINDEX pixel shader semantic and running a pixel shader per sample instead of per pixel. It’s apparently useful for other things such as MSAA in deferred rendering, so I’ll probably find myself using it again some time in the future. Also, I bought the XBOX Live Arcade game Limbo and played it while taking a break from writing this blog post. From what I’ve played so far, it’s a dark but very artistic and beautiful game. So far, I’m glad I bought it.

DX11 Order Independent Transparency

Introduction

This is my implementation of the ATI DX11 linked list OIT algorithm. The DirectX11 SDK OIT sample is pretty slow (it is a compute shader usage sample), but I have read from numerous sources that the ATI OIT algorithm is pretty fast (it is), so I decided to try implementing it. Lots of people have already implemented ATI’s linked list OIT and made their implementations public, so I had a lot of good references for my simple implementation.

A CodePlex link to my program’s source code and binary is provided at the end of the article.

Relevant Links

1) http://developer.amd.com/documentation/presentations/Pages/default.aspx
ATI’s conference presentation list contains a link to their “GDC 2010: OIT and GI using DX11 linked lists” presentation ppt. This demo is an implementation of the algorithm described in this presentation.

2) http://www4.atword.jp/cathy39/category/direct3d11/oit-direct3d11/
GPU Pro contributor Kaori Kubota-san’s linked list OIT implementation. She walks through the linked list OIT implementation pretty carefully and her explanations are interspersed with source code. I couldn’t find sample code for things like D3D11_BUFFER_UAV_FLAG_COUNTER, so her explanations were very helpful. My implementation is based on the ATI presentation ppt and her blog posts.

3) http://orenk2k.blogspot.com/2010/03/oit-order-independent-transparency.html
A blog post about a linked-list OIT implementation. Good explanations.

4) http://sites.google.com/site/monshonosuana/directxno-hanashi-1/directx-110
A Japanese graphics programming website recently posted a linked list OIT implementation. Good explanations (in Japanese) with source code and binary.

5) http://www.uraldev.ru/articles/id/36
Another linked list OIT implementation. The article is in Russian, so I used Google Translate to read it. Nice article with source code and binary provided. I learned about the implementation from the author’s blog post comment here: http://www.wolfgang-engel.info/blogs/?p=116

6) http://joescg.blogspot.com/2010/01/oita-buffer-demo.html
Another Russian OIT implementation. Good explanations, again read using Google Translate. I learned about the implementation from here: http://www.geeks3d.com/forums/index.php/topic,861.0.html

7) http://www.yakiimo3d.com/2010/06/19/dx11-high-quality-global-illumination-rendering-using-rasterization/
The DX11 linked list technique can be used to speed up algorithms besides alpha blending. Toshiya Hachisuka-san’s “High-Quality Global Illumination Rendering Using Rasterization” algorithm can be sped up using ATI’s StructuredBuffer linked list technique.

Screen Captures From My OIT Demo



Regular alpha blending using ID3D11BlendState. Notice that the far section of the blue quad is incorrectly rendered over the red quad. Since each colored quad is drawn in a single draw call, it’s not possible to sort the geometry to get correct alpha blending.
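The reason draw order matters is that the usual alpha blend is not commutative. A small C++ sketch of this, assuming straight-alpha "over" blending (the BlendOver helper and the Rgb/Rgba aliases are hypothetical names for illustration, not part of the demo):

```cpp
#include <array>

using Rgb  = std::array<float, 3>;
using Rgba = std::array<float, 4>;

// Straight-alpha "over" blend: src * alpha + dst * (1 - alpha).
Rgb BlendOver(const Rgba& src, const Rgb& dst)
{
    const float alpha = src[3];
    return Rgb{ src[0] * alpha + dst[0] * (1.0f - alpha),
                src[1] * alpha + dst[1] * (1.0f - alpha),
                src[2] * alpha + dst[2] * (1.0f - alpha) };
}
```

Blending a 50% red quad and a 50% blue quad over black gives (0.5, 0, 0.25) when red is drawn last and (0.25, 0, 0.5) when blue is drawn last, so whichever quad is drawn second always looks like it is in front, no matter where it actually is in depth.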



Rendered with the OIT algorithm. In the first pass, all pixel fragments are stored into a large StructuredBuffer. In the second pass, the stored pixel fragments are read and sorted per pixel inside a pixel shader program, and everything is rendered correctly.
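As a rough picture of the first pass, here is a CPU-side C++ sketch of the per-pixel linked list storage. It is a simplification under my own assumptions: the FragmentLists class, its method names and the packed uint32_t color are hypothetical. As I understand it, on the GPU the node index comes from the structured buffer’s hidden counter and the head pointer is swapped with an atomic exchange, which this single-threaded version simply does in order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu; // same sentinel as the shader loop

// Hypothetical CPU mirror of one node in the fragment buffer.
struct FragmentLink
{
    uint32_t color; // packed RGBA
    float    depth;
    uint32_t next;  // index of the next node for this pixel, or kEndOfList
};

class FragmentLists
{
public:
    FragmentLists(uint32_t width, uint32_t height, std::size_t capacity)
        : width_(width), capacity_(capacity),
          heads_(static_cast<std::size_t>(width) * height, kEndOfList)
    {
        links_.reserve(capacity);
    }

    // First pass: append a fragment and make it the new head of its pixel's list.
    // Returns false when the fixed-size buffer is full and the fragment is dropped.
    bool Store(uint32_t x, uint32_t y, uint32_t color, float depth)
    {
        if (links_.size() >= capacity_)
            return false;
        const uint32_t index = static_cast<uint32_t>(links_.size()); // "counter"
        uint32_t& head = heads_[static_cast<std::size_t>(y) * width_ + x];
        links_.push_back(FragmentLink{color, depth, head}); // link to old head
        head = index;                                       // "exchange"
        return true;
    }

    // Second pass: walk one pixel's list (newest fragment first, unsorted).
    std::vector<FragmentLink> Gather(uint32_t x, uint32_t y) const
    {
        std::vector<FragmentLink> out;
        for (uint32_t n = heads_[static_cast<std::size_t>(y) * width_ + x];
             n != kEndOfList; n = links_[n].next)
            out.push_back(links_[n]);
        return out;
    }

private:
    uint32_t                  width_;
    std::size_t               capacity_;
    std::vector<uint32_t>     heads_; // one head index per pixel
    std::vector<FragmentLink> links_; // shared node pool for all pixels
};
```

Gather returns the fragments in reverse submission order, not depth order, which is why the second pass still has to sort each pixel’s fragments before blending them.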

Implementation Details

Other websites already explain the linked list OIT algorithm, so I’m not going to do that in this blog post. On my ATI HD5750, the DirectX SDK OIT sample runs at around 10 fps at a screen size of 320×240. My ATI OIT implementation runs at around 700 fps at a screen size of 640×480. Regular ID3D11BlendState alpha blending runs at around 3000 fps at a screen size of 640×480.

OIT Demo

Source Code & Binary
http://yakiimo3d.codeplex.com/releases/view/49213

Comments

It’s been over 3 months since my last demo. For the past month, I had been feeling somewhat down, but now I feel much better, so maybe I will be more productive. I’m not sure what I want to implement next. Maybe a translucent material demo based on this OIT demo. Or maybe I’ll experiment with DX11 tessellation and implement a DX11 Phong Tessellation demo.

CEDEC 2010: LostPlanet2 DirectX11 Features

http://cedec.cesa.or.jp/2010/program/PG/C10_P0167.html

Looks like Capcom is hosting a DirectX11 session at CEDEC 2010. The session summary says they are going to be talking about the graphical and performance enhancements gained by adding DirectX11 support for Lost Planet 2. The summary says the session is also going to talk about the difficulties involved in adding DirectX11 support to existing titles. As someone interested in DirectX11 programming, this session looks great! I hope I get to go to this year’s CEDEC and hear this session.

DX11 “High-Quality Global Illumination Rendering Using Rasterization”

http://otd7.jbbs.livedoor.jp/709763/bbs_plain?06190300
BBS post 1576 is Toshiya Hachisuka-san’s comment about the possibility of a faster implementation of “High-Quality Global Illumination Rendering Using Rasterization” using DX11 (in Japanese.)

In my blog post about GameFest 2010 presentations (http://www.yakiimo3d.com/2010/06/18/the-entire-gamefest-2010-presentations-now-available/), I wrote about Toshiya Hachisuka-san’s GPU Gems 2 article “High-Quality Global Illumination Rendering Using Rasterization”. Today, I noticed that Hachisuka-san commented on his BBS that his GPU Gems 2 method can be updated to use a DX11 OIT linked-list type of implementation (with no sorting necessary) and sped up. It seems like an interesting implementation to try out, and one that could give really good results.