<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Yakiimo3D &#187; DirectCompute</title>
	<atom:link href="http://www.yakiimo3d.com/category/directcompute/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.yakiimo3d.com</link>
	<description>Mostly DirectX 11 Programming</description>
	<lastBuildDate>Sun, 15 May 2011 07:58:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>DirectX11 Interop CUDA Mandelbrot Fractal</title>
		<link>http://www.yakiimo3d.com/2011/03/06/directx11-interop-cuda-mandelbrot-fractal/</link>
		<comments>http://www.yakiimo3d.com/2011/03/06/directx11-interop-cuda-mandelbrot-fractal/#comments</comments>
		<pubDate>Sun, 06 Mar 2011 14:36:30 +0000</pubDate>
		<dc:creator>yakiimo02</dc:creator>
				<category><![CDATA[Cuda]]></category>
		<category><![CDATA[Demo]]></category>
		<category><![CDATA[DirectCompute]]></category>
		<category><![CDATA[DirectX11]]></category>

		<guid isPermaLink="false">http://www.yakiimo3d.com/?p=1516</guid>
		<description><![CDATA[Introduction I updated my DirectCompute Mandelbrot fractal demo to be able to render using both CUDA and DirectCompute. The program is a simple Mandelbrot renderer, but you can dynamically switch between a CUDA and DirectCompute render. Relevant Links http://developer.download.nvidia.com/compute/cuda/sdk/website/Graphics_Interop.html#simpleD3D11Texture The NVIDIA CUDA 3.2 SDK includes a DirectX11 interop sample &#8220;Simple D3D11 Texture&#8221; which I used [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>I updated my DirectCompute Mandelbrot fractal demo so that it can render with either CUDA or DirectCompute. The program is a simple Mandelbrot renderer, but you can switch between CUDA and DirectCompute rendering at runtime.</p>
<h2>Relevant Links</h2>
<p><a href="http://developer.download.nvidia.com/compute/cuda/sdk/website/Graphics_Interop.html#simpleD3D11Texture" onclick="pageTracker._trackPageview('/outgoing/developer.download.nvidia.com/compute/cuda/sdk/website/Graphics_Interop.html_simpleD3D11Texture?referer=');">http://developer.download.nvidia.com/compute/cuda/sdk/website/Graphics_Interop.html#simpleD3D11Texture</a><br />
The NVIDIA CUDA 3.2 SDK includes a DirectX11 interop sample, &#8220;Simple D3D11 Texture&#8221;, which I used as a reference. As far as I can tell, it is the only CUDA sample in the 3.2 SDK that uses D3D11. The SDK sample uses an ID3D11Texture3D as a CUDA resource, while my Mandelbrot fractal program uses an ID3D11Buffer as the CUDA resource.<br />
<br />
<a href="http://developer.nvidia.com/object/cuda-by-example.html" onclick="pageTracker._trackPageview('/outgoing/developer.nvidia.com/object/cuda-by-example.html?referer=');">http://developer.nvidia.com/object/cuda-by-example.html</a><br />
I mainly used the CUDA 3.2 SDK programming manual to learn CUDA, but I also read the NVIDIA &#8220;CUDA by Example&#8221; book on <a href="http://my.safaribooksonline.com/?portal=informit" onclick="pageTracker._trackPageview('/outgoing/my.safaribooksonline.com/?portal=informit&amp;referer=');">Safari Informit</a>. The book is very easy to follow, and I read it in one sitting. It teaches the basics of CUDA in simple language, and I thought it was a good first book.<br />
<br />
<a href="http://www.yakiimo3d.com/2010/02/02/directcompute-mandelbrot-fractal-viewer/">http://www.yakiimo3d.com/2010/02/02/directcompute-mandelbrot-fractal-viewer/</a><br />
My old DirectCompute Mandelbrot fractal viewer. It has a bug (oh no!) when the screen dimensions are not divisible by the thread group dimensions; I fixed this in the new CUDA interop demo.</p>
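<p>For reference, the usual fix for this kind of bug is to round the thread group count up and have each thread skip pixels that fall outside the screen. The C++ sketch below emulates that on the CPU; it is a minimal illustration, not the demo&#8217;s actual shader or kernel code. The 16&#215;16 group size, the names <code>groupCount</code> and <code>dispatch</code>, and the [-2,1]&#215;[-1.5,1.5] view are assumptions.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical thread group size, like [numthreads(16, 16, 1)] in HLSL
// or a 16x16 blockDim in CUDA.
constexpr uint32_t kGroupX = 16;
constexpr uint32_t kGroupY = 16;

// Number of groups needed to cover `size` pixels, rounded up so that a
// partial group covers the leftover pixels at the edge.
uint32_t groupCount(uint32_t size, uint32_t groupSize) {
    assert(groupSize > 0);
    return (size + groupSize - 1) / groupSize;
}

// Escape-time iteration count for the point c = (cr, ci).
uint32_t mandelbrot(float cr, float ci, uint32_t maxIter) {
    float zr = 0.0f, zi = 0.0f;
    uint32_t i = 0;
    while (i < maxIter && zr * zr + zi * zi <= 4.0f) {
        const float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        ++i;
    }
    return i;
}

// Emulates Dispatch(groupsX, groupsY, 1): each "thread" computes its global ID
// and returns early when it falls outside the width x height image.
std::vector<uint32_t> dispatch(uint32_t width, uint32_t height, uint32_t maxIter) {
    std::vector<uint32_t> image(static_cast<std::size_t>(width) * height, 0);
    const uint32_t groupsX = groupCount(width, kGroupX);
    const uint32_t groupsY = groupCount(height, kGroupY);
    for (uint32_t gy = 0; gy < groupsY; ++gy)
        for (uint32_t gx = 0; gx < groupsX; ++gx)
            for (uint32_t ty = 0; ty < kGroupY; ++ty)
                for (uint32_t tx = 0; tx < kGroupX; ++tx) {
                    const uint32_t x = gx * kGroupX + tx;  // like SV_DispatchThreadID.x
                    const uint32_t y = gy * kGroupY + ty;  // like SV_DispatchThreadID.y
                    if (x >= width || y >= height) continue;  // the bounds check
                    const float cr = -2.0f + 3.0f * x / width;
                    const float ci = -1.5f + 3.0f * y / height;
                    image[static_cast<std::size_t>(y) * width + x] = mandelbrot(cr, ci, maxIter);
                }
    return image;
}
```

<p>For example, a 1000&#215;700 window needs 63&#215;44 groups of 16&#215;16 threads, and the threads in the last partial column and row of groups simply do nothing. Plain integer division would give 62&#215;43 groups and leave the right and bottom edges unrendered.</p>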
<h2>Demo Notes</h2>
<p>I took some timings to compare the performance of CUDA with DirectX11 interop against DirectCompute. Timings were taken on my GeForce GTX 460 (driver 266.58, Vista 64-bit SP2, CUDA SDK 3.2). Neither the CUDA nor the DirectCompute implementation has been optimized. With better documentation, better tools, and finer control over the program, I think the CUDA version has a better chance of being optimized well.</p>
<table>
<tr>
<td>Num Iterations</td>
<td>DirectCompute</td>
<td>CUDA</td>
</tr>
<tr>
<td>8</td>
<td>0.588ms/frame (1700fps)</td>
<td>0.749ms/frame (1335fps)</td>
</tr>
<tr>
<td>256</td>
<td>1.439ms/frame (695fps)</td>
<td>1.639ms/frame (610fps)</td>
</tr>
<tr>
<td>624</td>
<td>3.185ms/frame (314fps)</td>
<td>2.907ms/frame (344fps)</td>
</tr>
<tr>
<td>1024</td>
<td>4.878ms/frame (205fps)</td>
<td>4.274ms/frame (234fps)</td>
</tr>
</table>
<p>
The higher the iteration count, the more work the compute shader has to do. When the iteration count is low, DirectCompute is faster; as the iteration count grows, CUDA becomes faster than DirectCompute. My guess is that, with my current code, the CUDA kernel itself runs faster, but a fixed cost for DirectX11 interop makes CUDA slower at low iteration counts.<br />
<br />
<a href="http://nvidia.fullviewmedia.com/gdc2011/agenda.html" onclick="pageTracker._trackPageview('/outgoing/nvidia.fullviewmedia.com/gdc2011/agenda.html?referer=');">http://nvidia.fullviewmedia.com/gdc2011/agenda.html</a><br />
On Friday, I watched the NVIDIA GDC 2011 presentation &#8220;GPU Radiosity: Porting the Enlighten runtime to CUDA&#8221;, and at around 28:23 the speaker mentions that &#8220;Switching between D3D and CUDA is expensive (it&#8217;s a power cycle!)&#8221;. I&#8217;m guessing this switching cost is what makes CUDA slower at low iteration counts in my Mandelbrot fractal program. I watched the entire Enlighten presentation and found it very interesting. The stream had some technical information, but if you are curious about Enlighten, the DICE &#038; Geomerics SIGGRAPH 2010 presentation contains even more detailed technical information: <a href="http://advances.realtimerendering.com/s2010/index.html" onclick="pageTracker._trackPageview('/outgoing/advances.realtimerendering.com/s2010/index.html?referer=');">http://advances.realtimerendering.com/s2010/index.html</a>.</p>
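<p>One rough way to check the fixed-cost guess against the timing table is to fit time per frame &#8776; fixed cost + per-iteration cost &#215; iterations for each API by least squares. The C++ sketch below does that with the four measurements from the table. The linear model is an assumption (the real cost is not necessarily linear in the iteration count), and the names <code>LineFit</code>, <code>fitLine</code> and <code>crossover</code> are illustrative.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Timings from the table above, in ms/frame (GeForce GTX 460).
constexpr std::array<double, 4> kIterations = {8.0, 256.0, 624.0, 1024.0};
constexpr std::array<double, 4> kDirectComputeMs = {0.588, 1.439, 3.185, 4.878};
constexpr std::array<double, 4> kCudaMs = {0.749, 1.639, 2.907, 4.274};

// Assumed model: time = fixedMs + perIterMs * iterations.
struct LineFit {
    double fixedMs;    // intercept: cost that does not depend on the iteration count
    double perIterMs;  // slope: cost of one extra iteration for the whole frame
};

// Ordinary least-squares fit of t = fixedMs + perIterMs * n.
template <std::size_t N>
LineFit fitLine(const std::array<double, N>& n, const std::array<double, N>& t) {
    static_assert(N >= 2, "need at least two points");
    double meanN = 0.0, meanT = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        meanN += n[i];
        meanT += t[i];
    }
    meanN /= N;
    meanT /= N;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        sxy += (n[i] - meanN) * (t[i] - meanT);
        sxx += (n[i] - meanN) * (n[i] - meanN);
    }
    assert(sxx > 0.0);  // the iteration counts must not all be equal
    const double slope = sxy / sxx;
    return {meanT - slope * meanN, slope};
}

// Iteration count where the two fitted lines intersect.
double crossover(const LineFit& a, const LineFit& b) {
    assert(a.perIterMs != b.perIterMs);
    return (b.fixedMs - a.fixedMs) / (a.perIterMs - b.perIterMs);
}
```

<p>With the table&#8217;s data, this fit gives roughly 0.47 ms + 4.3 &#181;s per iteration for DirectCompute and 0.74 ms + 3.5 &#181;s per iteration for CUDA, with the lines crossing near 320 iterations. That agrees with the table (DirectCompute is faster at 256 iterations, CUDA at 624) and is consistent with an extra fixed overhead of roughly a quarter of a millisecond on the CUDA side, although four points are not enough to prove the model.</p>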
<h2>Demo</h2>
<p>Source Code &#038; Binary<br />
<a href="http://yakiimo3d.codeplex.com/releases/view/62087" onclick="pageTracker._trackPageview('/outgoing/yakiimo3d.codeplex.com/releases/view/62087?referer=');">http://yakiimo3d.codeplex.com/releases/view/62087</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.yakiimo3d.com/2011/03/06/directx11-interop-cuda-mandelbrot-fractal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DirectCompute Cloth Sample Included In Bullet Physics 2.77</title>
		<link>http://www.yakiimo3d.com/2010/09/23/directcompute-cloth-sample-included-in-bullet-physics-2-77/</link>
		<comments>http://www.yakiimo3d.com/2010/09/23/directcompute-cloth-sample-included-in-bullet-physics-2-77/#comments</comments>
		<pubDate>Thu, 23 Sep 2010 03:51:45 +0000</pubDate>
		<dc:creator>yakiimo02</dc:creator>
				<category><![CDATA[DirectCompute]]></category>
		<category><![CDATA[DirectX11]]></category>

		<guid isPermaLink="false">http://www.yakiimo3d.com/?p=1023</guid>
		<description><![CDATA[http://bulletphysics.org/Bullet/phpBB3/viewtopic.php?t=5681 Just found out that the recently released Bullet Physics 2.77 contains OpenCL and DirectCompute hardware accelerated cloth simulation samples contributed by AMD. http://channel9.msdn.com/Blogs/gclassy/DirectCompute-Lecture-Series-230-GPU-Accelerated-Physics You can watch a video of ATI/AMD&#8217;s Lee Howes presenting on the DirectCompute cloth implementation on MSDN (linked in the above Erwin Coumans&#8217;s announcement post.) If you don&#8217;t want to watch [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://bulletphysics.org/Bullet/phpBB3/viewtopic.php?t=5681" onclick="pageTracker._trackPageview('/outgoing/bulletphysics.org/Bullet/phpBB3/viewtopic.php?t=5681&amp;referer=');">http://bulletphysics.org/Bullet/phpBB3/viewtopic.php?t=5681</a><br />
I just found out that the recently released Bullet Physics 2.77 contains OpenCL and DirectCompute hardware-accelerated cloth simulation samples contributed by AMD.<br />
<br />
<a href="http://channel9.msdn.com/Blogs/gclassy/DirectCompute-Lecture-Series-230-GPU-Accelerated-Physics" onclick="pageTracker._trackPageview('/outgoing/channel9.msdn.com/Blogs/gclassy/DirectCompute-Lecture-Series-230-GPU-Accelerated-Physics?referer=');">http://channel9.msdn.com/Blogs/gclassy/DirectCompute-Lecture-Series-230-GPU-Accelerated-Physics</a><br />
You can watch a video of ATI/AMD&#8217;s Lee Howes presenting the DirectCompute cloth implementation on MSDN (linked from Erwin Coumans&#8217;s announcement post above). If you don&#8217;t want to watch the full video, note that the slides are also available for download.<br />
<br />
I downloaded Bullet Physics 2.77, then compiled and ran the included DX11 DirectCompute cloth sample. On my HD5750, with the Release build, I get around 310-350 fps for the five-cloth scene. As mentioned in the forum announcement, there is no collision detection yet, so the cloth sometimes penetrates itself, but overall it looks nice.<br />
<br />
<a href="http://cedec.cesa.or.jp/2010/en/sessions/PG/C10_P0206.html" onclick="pageTracker._trackPageview('/outgoing/cedec.cesa.or.jp/2010/en/sessions/PG/C10_P0206.html?referer=');">http://cedec.cesa.or.jp/2010/en/sessions/PG/C10_P0206.html</a><br />
Apparently the AMD DirectCompute session at CEDEC 2010 (Japan&#8217;s GDC) talked about this cloth simulation sample integrated into Bullet. I went to a couple of CEDEC sessions this year, but not to that AMD session.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yakiimo3d.com/2010/09/23/directcompute-cloth-sample-included-in-bullet-physics-2-77/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Siggraph 2010: 3 New Intel DirectX11 Demos</title>
		<link>http://www.yakiimo3d.com/2010/07/31/siggraph-2010-3-new-intel-directx11-demos/</link>
		<comments>http://www.yakiimo3d.com/2010/07/31/siggraph-2010-3-new-intel-directx11-demos/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 05:50:31 +0000</pubDate>
		<dc:creator>yakiimo02</dc:creator>
				<category><![CDATA[DirectCompute]]></category>
		<category><![CDATA[DirectX11]]></category>

		<guid isPermaLink="false">http://www.yakiimo3d.com/?p=978</guid>
		<description><![CDATA[http://visual-computing.intel-research.net/art/publications/sdsm/ Sample Distribution Shadow Maps http://visual-computing.intel-research.net/art/publications/avsm/ Adaptive Volumetric Shadow Maps http://visual-computing.intel-research.net/art/publications/deferred_rendering/ Deferred Rendering for Current and Future Rendering Pipelines I learned about them from this Beyond3D thread (http://forum.beyond3d.com/showthread.php?t=58180). All three demos come with source code and ran fine on my HD5750.]]></description>
			<content:encoded><![CDATA[<p><a href="http://visual-computing.intel-research.net/art/publications/sdsm/" onclick="pageTracker._trackPageview('/outgoing/visual-computing.intel-research.net/art/publications/sdsm/?referer=');">http://visual-computing.intel-research.net/art/publications/sdsm/</a><br />
Sample Distribution Shadow Maps<br />
<br />
<a href="http://visual-computing.intel-research.net/art/publications/avsm/" onclick="pageTracker._trackPageview('/outgoing/visual-computing.intel-research.net/art/publications/avsm/?referer=');">http://visual-computing.intel-research.net/art/publications/avsm/</a><br />
Adaptive Volumetric Shadow Maps<br />
<br />
<a href="http://visual-computing.intel-research.net/art/publications/deferred_rendering/" onclick="pageTracker._trackPageview('/outgoing/visual-computing.intel-research.net/art/publications/deferred_rendering/?referer=');">http://visual-computing.intel-research.net/art/publications/deferred_rendering/</a><br />
Deferred Rendering for Current and Future Rendering Pipelines<br />
<br />
I learned about them from this Beyond3D thread (<a href="http://forum.beyond3d.com/showthread.php?t=58180" onclick="pageTracker._trackPageview('/outgoing/forum.beyond3d.com/showthread.php?t=58180&amp;referer=');">http://forum.beyond3d.com/showthread.php?t=58180</a>). All three demos come with source code and ran fine on my HD5750.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yakiimo3d.com/2010/07/31/siggraph-2010-3-new-intel-directx11-demos/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beyond3D DirectCompute Buddhabrot Thread</title>
		<link>http://www.yakiimo3d.com/2010/05/15/beyond3d-directcompute-buddhabrot-thread/</link>
		<comments>http://www.yakiimo3d.com/2010/05/15/beyond3d-directcompute-buddhabrot-thread/#comments</comments>
		<pubDate>Sat, 15 May 2010 12:52:11 +0000</pubDate>
		<dc:creator>yakiimo02</dc:creator>
				<category><![CDATA[DirectCompute]]></category>
		<category><![CDATA[DirectX11]]></category>
		<category><![CDATA[WebSite]]></category>

		<guid isPermaLink="false">http://www.yakiimo3d.com/?p=655</guid>
		<description><![CDATA[http://forum.beyond3d.com/showthread.php?t=57042 People on the Beyond3D GPGPU forum have optimized my DirectCompute Buddhabrot implementation, and there are good discussions there that I can learn a lot from. Running pcchen&#8217;s optimized version on my HD5750 (you can download it from the forum), I saw a 6x speedup over my original code. Keldor, the author of GPGPU Flam4, also gave me optimization advice [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://forum.beyond3d.com/showthread.php?t=57042" onclick="pageTracker._trackPageview('/outgoing/forum.beyond3d.com/showthread.php?t=57042&amp;referer=');">http://forum.beyond3d.com/showthread.php?t=57042</a><br />
<br />
People on the Beyond3D GPGPU forum have optimized my DirectCompute Buddhabrot implementation, and there are good discussions there that I can learn a lot from. Running pcchen&#8217;s optimized version on my HD5750 (you can download it from the forum), I saw a 6x speedup over my original code. Keldor, the author of GPGPU Flam4, also gave me optimization advice in a comment on my blog post (<a href="/?p=481">http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/</a>). I was planning to write an optimized version of my Buddhabrot implementation and had been looking up DirectCompute RNG implementations, but I got really busy at work, and now that some time has passed, I want to do something else. My next demo is probably not going to be an optimized DirectCompute Buddhabrot implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yakiimo3d.com/2010/05/15/beyond3d-directcompute-buddhabrot-thread/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DX11 DirectCompute Buddhabrot &amp; Nebulabrot Renderer</title>
		<link>http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/</link>
		<comments>http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/#comments</comments>
		<pubDate>Mon, 29 Mar 2010 01:17:30 +0000</pubDate>
		<dc:creator>yakiimo02</dc:creator>
				<category><![CDATA[Demo]]></category>
		<category><![CDATA[DirectCompute]]></category>
		<category><![CDATA[DirectX11]]></category>

		<guid isPermaLink="false">http://www.yakiimo3d.com/?p=481</guid>
		<description><![CDATA[Introduction I wrote a DX11 DirectCompute implementation of the famous Buddhabrot fractal. The implementation is an extension of my earlier Mandelbrot fractal DX11 DirectCompute program (http://www.yakiimo3d.com/2010/02/02/directcompute-mandelbrot-fractal-viewer/). I also use my Reinhard tonemapping code (http://www.yakiimo3d.com/2010/03/13/dx11-directcompute-global-operator-photographic-tonemapping/) to bring the HDR Buddhabrot color values into the LDR framebuffer&#8217;s [0,1] range. It&#8217;s good that I used code from my [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>I wrote a DX11 DirectCompute implementation of the famous Buddhabrot fractal. It extends my earlier DX11 DirectCompute Mandelbrot fractal program (<a href="/?p=226">http://www.yakiimo3d.com/2010/02/02/directcompute-mandelbrot-fractal-viewer/</a>). I also use my Reinhard tonemapping code (<a href="/?p=333">http://www.yakiimo3d.com/2010/03/13/dx11-directcompute-global-operator-photographic-tonemapping/</a>) to bring the HDR Buddhabrot color values into the LDR framebuffer&#8217;s [0,1] range. Reusing code from my old demos paid off: I found and fixed bugs in my tonemapping code, and I realized I had forgotten to upload the source and binary for my DX11 Mandelbrot demo to CodePlex (doh!).<br />
<br />
Regular Buddhabrot renderings are monochrome because the same value is written to each RGB channel. The Nebulabrot is a simple extension of the Buddhabrot: you plot the Buddhabrot three times, each with a different maximum iteration (escape) limit, and assign each plot to a different RGB channel. (In practice, you can render all three plots in a single pass by using if branches.) Since the implementation is easy and the resulting renders are more interesting, my Buddhabrot implementation supports Nebulabrot renderings as well.<br />
<br />
As usual, a CodePlex link to my Buddhabrot program&#8217;s source code and binary is provided near the end of the article.</p>
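<p>The single-pass Nebulabrot idea can be sketched on the CPU in C++. This is a minimal, single-threaded illustration under stated assumptions, not the demo&#8217;s compute shader: the [-2,2]&#215;[-2,2] view, the <code>std::mt19937</code> generator, and the names <code>Nebulabrot</code>, <code>sample</code> and <code>render</code> are hypothetical. Each random point c is iterated once, up to the largest channel limit; if its orbit escapes after k iterations, the orbit is added to every channel whose limit is at least k.</p>

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical CPU Nebulabrot accumulator over the square [-2,2] x [-2,2].
struct Nebulabrot {
    int width, height;
    std::array<int, 3> maxIter;                   // per-channel limits: R, G, B
    std::vector<std::array<uint32_t, 3>> counts;  // per-pixel hit counts (zeroed)

    Nebulabrot(int w, int h, std::array<int, 3> limits)
        : width(w), height(h), maxIter(limits), counts(static_cast<std::size_t>(w) * h) {
        assert(w > 0 && h > 0);
    }

    // Maps a point to a pixel; returns false if it lies outside the image.
    bool toPixel(std::complex<double> z, int& px, int& py) const {
        const double fx = (z.real() + 2.0) / 4.0 * width;
        const double fy = (z.imag() + 2.0) / 4.0 * height;
        if (fx < 0.0 || fx >= width || fy < 0.0 || fy >= height) return false;
        px = static_cast<int>(fx);
        py = static_cast<int>(fy);
        return true;
    }

    // One Buddhabrot sample for the point c, shared by all three channels.
    void sample(std::complex<double> c) {
        const int limit = *std::max_element(maxIter.begin(), maxIter.end());
        std::vector<std::complex<double>> orbit;
        std::complex<double> z = 0.0;
        int k = 0;
        while (k < limit && std::norm(z) <= 4.0) {  // std::norm is |z|^2
            z = z * z + c;
            orbit.push_back(z);
            ++k;
        }
        if (std::norm(z) <= 4.0) return;  // did not escape: plotted in no channel
        // Escaped after k iterations: add the orbit to each channel whose limit >= k.
        for (int ch = 0; ch < 3; ++ch) {
            if (k > maxIter[ch]) continue;
            for (const auto& p : orbit) {
                int px = 0, py = 0;
                if (toPixel(p, px, py))
                    ++counts[static_cast<std::size_t>(py) * width + px][ch];
            }
        }
    }
};

// Draws `samples` uniform random points c in [-2,2) x [-2,2).
void render(Nebulabrot& nb, int samples, uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(-2.0, 2.0);
    for (int i = 0; i < samples; ++i) {
        const double re = u(rng);
        const double im = u(rng);
        nb.sample({re, im});
    }
}
```

<p>Because the channel limits are nested (red 10,000 &#8805; green 1,000 &#8805; blue 100 with the demo&#8217;s defaults), every orbit added to the blue channel is also added to green and red, so a pixel&#8217;s blue count never exceeds its green or red count. A GPU version would run <code>sample</code> once per thread and turn the <code>++</code> into an atomic add.</p>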
<h2>Relevant Links</h2>
<p>1) <a href="http://www.superliminal.com/fractals/bbrot/bbrot.htm" onclick="pageTracker._trackPageview('/outgoing/www.superliminal.com/fractals/bbrot/bbrot.htm?referer=');">http://www.superliminal.com/fractals/bbrot/bbrot.htm</a><br />
Web page by Melinda Green, the discoverer of the Buddhabrot. It contains good explanations of the Buddhabrot and the Nebulabrot, and a list of links to Buddhabrot implementations at the bottom. While searching for information about the Buddhabrot, I came across Melinda Green&#8217;s stackoverflow.com profile (<a href="http://stackoverflow.com/users/181535/melinda-green" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/users/181535/melinda-green?referer=');">http://stackoverflow.com/users/181535/melinda-green</a>). I thought it was cool that the discoverer is a programmer who uses sites like Stack Overflow.<br />
<br />
2) <a href="http://local.wasp.uwa.edu.au/~pbourke/fractals/buddhabrot/" onclick="pageTracker._trackPageview('/outgoing/local.wasp.uwa.edu.au/_pbourke/fractals/buddhabrot/?referer=');">http://local.wasp.uwa.edu.au/~pbourke/fractals/buddhabrot/</a><br />
Paul Bourke&#8217;s web page has great explanations of a wide variety of mathematical and geometric algorithms; I&#8217;ve used it as a reference on numerous other occasions. His page on the Buddhabrot includes an implementation with source code.<br />
<br />
3) <a href="http://iquilezles.org/www/articles/budhabrot/budhabrot.htm" onclick="pageTracker._trackPageview('/outgoing/iquilezles.org/www/articles/budhabrot/budhabrot.htm?referer=');">http://iquilezles.org/www/articles/budhabrot/budhabrot.htm</a><br />
Very nice Nebulabrot renderings. This page made me realize that I could apply a square or cube function to increase the contrast of my Buddhabrot and make the renderings prettier.<br />
<br />
4) <a href="http://brnz.org/hbr/?p=297" onclick="pageTracker._trackPageview('/outgoing/brnz.org/hbr/?p=297&amp;referer=');">http://brnz.org/hbr/?p=297</a><br />
A very cool PS3 SPU Buddhabrot implementation with source code. I follow the author on Twitter (<a href="http://twitter.com/twoscomplement" onclick="pageTracker._trackPageview('/outgoing/twitter.com/twoscomplement?referer=');">http://twitter.com/twoscomplement</a>), and his PS3 implementation was one of my first encounters with the Buddhabrot.<br />
<br />
5) <a href="http://www.steckles.com/buddha/" onclick="pageTracker._trackPageview('/outgoing/www.steckles.com/buddha/?referer=');">http://www.steckles.com/buddha/</a><br />
The core Buddhabrot algorithm is similar to path tracing in that it relies on random sampling every frame. This page suggests applying the Metropolis-Hastings algorithm, as used in path tracing, to statistically speed up convergence of the Buddhabrot image. The results look impressive, and if I feel up to it, I would like to try this technique in the future.<br />
<br />
6) <a href="http://en.wikipedia.org/wiki/Buddhabrot" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Buddhabrot?referer=');">http://en.wikipedia.org/wiki/Buddhabrot</a><br />
The Wikipedia page for the Buddhabrot has good information and links.  </p>
<h2>Movie of My Buddhabrot Demo</h2>
<p><object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/K0-mZONfoMU&#038;hl=ja_JP&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/K0-mZONfoMU&#038;hl=ja_JP&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p>I posted a movie of my Buddhabrot demo to YouTube. The Buddhabrot in the demo is a Nebulabrot with max iteration values of red: 10,000, green: 1,000, and blue: 100 (the default values in my program).</p>
<h2>A Screen Capture of My Buddhabrot Demo</h2>
<p><a href="http://www.yakiimo3d.com/wp-content/uploads/2010/03/buddhabrot_render.png"><img src="http://www.yakiimo3d.com/wp-content/uploads/2010/03/buddhabrot_render.png" alt="1592x1028 Buddhabrot" width="800" height="525"/></a><br />
<br />
I let my demo run for around 20 minutes and then took a screenshot. The frame rate was around 42 fps during the render. My implementation collects 10,000 samples per frame, so the Nebulabrot above was rendered at roughly 420,000 samples per second. Click on the image to see it at its full 1592&#215;1028 resolution. The Nebulabrot parameters are the same as the defaults used in the movie, though the tonemapping middle-grey value is probably different.</p>
<h2>Implementation Details</h2>
<h3>Compute Shader Thread Setup</h3>
<p>I call Dispatch with a thread group count of (10,10,1), i.e. 10&#215;10 = 100 thread groups in total, and declare numthreads as (10,10,1), so each thread group contains 100 threads. 100 thread groups of 100 threads each gives a total of 10,000 threads per compute shader dispatch. Each thread takes only one random sample, so 10,000 Buddhabrot samples are taken per frame.</p>
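<p>As a rough illustration of that arithmetic, the C++ sketch below emulates <code>Dispatch(10, 10, 1)</code> with <code>[numthreads(10, 10, 1)]</code> and flattens each thread&#8217;s <code>SV_DispatchThreadID</code> into a sample index, the way a shader might pick its random number out of a buffer. The row-major flattening and the names <code>Dim3</code>, <code>flatIndex</code> and <code>emulateDispatch</code> are assumptions for illustration; the demo&#8217;s shader may index differently.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Dim3 {
    uint32_t x, y, z;
};

// SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID.
Dim3 dispatchThreadId(Dim3 groupId, Dim3 groupThreadId, Dim3 numThreads) {
    return {groupId.x * numThreads.x + groupThreadId.x,
            groupId.y * numThreads.y + groupThreadId.y,
            groupId.z * numThreads.z + groupThreadId.z};
}

// Row-major flattening of a 2D dispatch into one index per thread.
uint32_t flatIndex(Dim3 id, Dim3 groups, Dim3 numThreads) {
    const uint32_t rowWidth = groups.x * numThreads.x;  // threads per row (100 here)
    return id.y * rowWidth + id.x;
}

// Returns how many times each flat index is visited by a full 2D dispatch.
std::vector<uint32_t> emulateDispatch(Dim3 groups, Dim3 numThreads) {
    assert(groups.z == 1 && numThreads.z == 1);  // 2D dispatch only
    const uint32_t total = groups.x * groups.y * numThreads.x * numThreads.y;
    std::vector<uint32_t> visits(total, 0);
    for (uint32_t gy = 0; gy < groups.y; ++gy)
        for (uint32_t gx = 0; gx < groups.x; ++gx)
            for (uint32_t ty = 0; ty < numThreads.y; ++ty)
                for (uint32_t tx = 0; tx < numThreads.x; ++tx) {
                    const Dim3 id = dispatchThreadId({gx, gy, 0}, {tx, ty, 0}, numThreads);
                    ++visits.at(flatIndex(id, groups, numThreads));  // throws if out of range
                }
    return visits;
}
```

<p><code>emulateDispatch({10, 10, 1}, {10, 10, 1})</code> returns 10,000 counters that are all 1, so every thread gets its own sample index, which matches the one-sample-per-thread setup described above.</p>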
<h3>Random Number Generation</h3>
<p>I generate random numbers on the CPU side and simply lock and copy the data into a GPU buffer every frame (I should probably use multiple buffers to avoid waiting on the lock, but this demo doesn&#8217;t). I tried giving each thread its own multiply-with-carry random number generator and generating an RNG sequence inside the compute shader, but the results showed repetition. Finding and implementing a parallel GPU RNG algorithm seemed like a lot of work, so I didn&#8217;t investigate further.</p>
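<p>A minimal sketch of the per-frame CPU side, assuming each sample needs one random point c = (re, im) stored as two floats, that <code>std::mt19937</code> is an acceptable generator (the post does not say which generator the demo uses), and that points are drawn from [-2,2]&#215;[-2,2]. <code>SamplePoint</code> and <code>SampleGenerator</code> are hypothetical names. In the real program the filled array would then be copied into the GPU buffer, for example through <code>ID3D11DeviceContext::Map</code> with <code>D3D11_MAP_WRITE_DISCARD</code> on a dynamic buffer; that D3D11 part is omitted so the code runs anywhere.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// One random sample point for the Buddhabrot: c = re + i*im.
struct SamplePoint {
    float re, im;
};

// Hypothetical helper that regenerates the per-frame sample buffer on the CPU.
class SampleGenerator {
public:
    SampleGenerator(std::size_t samplesPerFrame, uint32_t seed)
        : rng_(seed), dist_(-2.0f, 2.0f), points_(samplesPerFrame) {
        assert(samplesPerFrame > 0);
    }

    // Fills the buffer with fresh points; call once per frame before uploading.
    const std::vector<SamplePoint>& nextFrame() {
        for (auto& p : points_) {
            p.re = dist_(rng_);
            p.im = dist_(rng_);
        }
        return points_;
    }

    // Number of bytes to copy into the GPU buffer each frame.
    std::size_t byteSize() const { return points_.size() * sizeof(SamplePoint); }

private:
    std::mt19937 rng_;
    std::uniform_real_distribution<float> dist_;
    std::vector<SamplePoint> points_;
};
```

<p>With 10,000 samples per frame, this is 80,000 bytes of upload per frame on typical platforms, where <code>SamplePoint</code> is two 4-byte floats with no padding.</p>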
<h3>Atomic Operations</h3>
<p>I used InterlockedAdd to write the Buddhabrot plot into a global memory iteration count buffer. Writing to global memory repeatedly during each thread&#8217;s execution (in the very rare worst case, max iterations - 1 times per channel, so 9,999 + 999 + 99 writes for the demo Nebulabrot), across 10,000 threads in the demo, probably causes a big performance hit. I&#8217;ve read that atomic operations should be minimized where possible, but I actually got slower performance when I didn&#8217;t use InterlockedAdd (the writes are then unsynchronized and the results wrong, but it was just for testing). I&#8217;m not exactly sure why.</p>
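<p>On the CPU, the closest analog to <code>InterlockedAdd</code> on a shared count buffer is an atomic fetch-and-add. The C++ sketch below, an illustration rather than the demo&#8217;s shader, has several threads add hits to one shared histogram with <code>std::atomic&lt;uint32_t&gt;::fetch_add</code>. Like the GPU version, the totals come out exact no matter how the threads interleave, whereas a plain <code>++</code> on shared non-atomic counters would be a data race. The orbit pixels are replaced by a hypothetical toy hash (<code>splatHits</code>) so the result can be checked.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Shared iteration count buffer, one atomic counter per pixel (zero-initialized).
class HitBuffer {
public:
    explicit HitBuffer(std::size_t pixels) : counts_(pixels) {}

    // Same idea as InterlockedAdd(buffer[index], 1) in HLSL. Relaxed ordering is
    // enough because the totals are only read after the writer threads are joined.
    void add(std::size_t index) {
        assert(index < counts_.size());
        counts_[index].fetch_add(1, std::memory_order_relaxed);
    }

    uint32_t at(std::size_t index) const { return counts_[index].load(); }
    std::size_t size() const { return counts_.size(); }

private:
    std::vector<std::atomic<uint32_t>> counts_;
};

// Each worker stands in for GPU threads writing orbit hits; the pixel indices
// come from a toy hash instead of real orbit points.
void splatHits(HitBuffer& buf, uint32_t worker, uint32_t hitsPerWorker) {
    for (uint32_t i = 0; i < hitsPerWorker; ++i) {
        const uint32_t h = (worker * 2654435761u) ^ (i * 40503u);
        buf.add(h % buf.size());
    }
}

// Runs `workers` threads concurrently and returns the sum of all counters.
uint64_t runSplat(HitBuffer& buf, uint32_t workers, uint32_t hitsPerWorker) {
    std::vector<std::thread> threads;
    for (uint32_t w = 0; w < workers; ++w)
        threads.emplace_back(splatHits, std::ref(buf), w, hitsPerWorker);
    for (auto& t : threads) t.join();  // join() synchronizes with every worker
    uint64_t total = 0;
    for (std::size_t i = 0; i < buf.size(); ++i) total += buf.at(i);
    return total;
}
```

<p>For example, eight threads each adding 100,000 hits to a 1,024-pixel buffer always sum to exactly 800,000.</p>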
<h3>Tone Mapping</h3>
<p>I used my earlier implementation of the Reinhard global tonemapping operator to map the Buddhabrot output buffer to an LDR range. Some Buddhabrot renders on the Internet have no detail in the dark areas, probably because their authors didn&#8217;t have time to implement a good HDR-to-LDR mapping algorithm. The shape of the Reinhard tone curve (less compression in the darker areas) and the adjustable middle-grey value help my Buddhabrot renders retain detail in both the bright and dark areas of the image.</p>
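<p>For reference, the global operator from Reinhard et al.&#8217;s photographic tone reproduction paper computes the log-average luminance, scales each pixel&#8217;s luminance by middle grey / log-average, and compresses the result with L / (1 + L). The C++ sketch below implements that published formula on a plain luminance array. It is not a copy of the demo&#8217;s compute shader, and the small <code>delta</code> that guards log(0) on black pixels (many Buddhabrot pixels get zero hits) is the usual assumption.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log-average (geometric mean) luminance; delta avoids log(0) on black pixels.
double logAverageLuminance(const std::vector<double>& lum, double delta = 1e-4) {
    assert(!lum.empty());
    double sum = 0.0;
    for (double l : lum) sum += std::log(delta + l);
    return std::exp(sum / lum.size());
}

// Reinhard global operator: scale by middleGrey / Lavg, then apply L / (1 + L).
std::vector<double> reinhardGlobal(const std::vector<double>& lum, double middleGrey) {
    assert(middleGrey > 0.0);
    const double lavg = logAverageLuminance(lum);
    std::vector<double> out;
    out.reserve(lum.size());
    for (double l : lum) {
        const double scaled = middleGrey / lavg * l;
        out.push_back(scaled / (1.0 + scaled));  // maps [0, inf) into [0, 1)
    }
    return out;
}
```

<p>Because L / (1 + L) is close to linear for small L and flattens out for large L, bright pixels are compressed much more than dark ones, which is why dark detail survives; raising the middle-grey value brightens the whole image.</p>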
<h3>Contrast Brightening</h3>
<p>In order to increase the contrast of my Buddhabrot renders and make the bright areas really bright, I cube the luminance value of the Buddhabrot output buffer in Yxy color space. Combined with Reinhard tonemapping, this gives images with strong bright areas that still retain detail in the darker areas.<br />
</p>
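<p>A minimal sketch of the contrast step, assuming linear RGB with sRGB/Rec. 709 primaries and a D65 white point (the post does not say which RGB-to-XYZ matrix the demo uses). RGB is converted to XYZ and then to Yxy; only Y is raised to a power, so the chromaticity (x, y) is unchanged, and the result is converted back. The post also does not say exactly where in the pipeline the cube is applied, so this sketch only shows the color-space round trip; <code>boostContrast</code> is a hypothetical name.</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Linear sRGB (Rec. 709 primaries, D65) to CIE XYZ.
Vec3 rgbToXyz(const Vec3& c) {
    return {0.4124564 * c[0] + 0.3575761 * c[1] + 0.1804375 * c[2],
            0.2126729 * c[0] + 0.7151522 * c[1] + 0.0721750 * c[2],
            0.0193339 * c[0] + 0.1191920 * c[1] + 0.9503041 * c[2]};
}

// CIE XYZ back to linear sRGB (inverse of the matrix above).
Vec3 xyzToRgb(const Vec3& v) {
    return {3.2404542 * v[0] - 1.5371385 * v[1] - 0.4985314 * v[2],
            -0.9692660 * v[0] + 1.8760108 * v[1] + 0.0415560 * v[2],
            0.0556434 * v[0] - 0.2040259 * v[1] + 1.0572252 * v[2]};
}

// XYZ -> Yxy: luminance Y plus chromaticity coordinates x and y.
Vec3 xyzToYxy(const Vec3& v) {
    const double sum = v[0] + v[1] + v[2];
    if (sum <= 0.0) return {0.0, 0.0, 0.0};  // black: chromaticity undefined
    return {v[1], v[0] / sum, v[1] / sum};
}

// Yxy -> XYZ.
Vec3 yxyToXyz(const Vec3& v) {
    const double Y = v[0], x = v[1], y = v[2];
    if (y <= 0.0) return {0.0, 0.0, 0.0};
    return {x * Y / y, Y, (1.0 - x - y) * Y / y};
}

// Raises luminance to `power` (3 = cube) while keeping the chromaticity.
Vec3 boostContrast(const Vec3& rgb, double power = 3.0) {
    assert(power > 0.0);
    Vec3 yxy = xyzToYxy(rgbToXyz(rgb));
    yxy[0] = std::pow(yxy[0], power);
    return xyzToRgb(yxyToXyz(yxy));
}
```

<p>With <code>power = 1</code> the function returns the input color up to rounding, which is a quick way to check the two matrices. On HDR values, cubing Y widens the gap between bright and dim pixels (values above 1 grow, values below 1 shrink).</p>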
<h2>Buddhabrot Demo</h2>
<p>In terms of hardware, I&#8217;m using the same machine as when I wrote my Mandelbrot DirectCompute implementation (with more recent ATI Catalyst drivers). The CPU is an Intel Core 2 Quad Q6600 at 2.4 GHz (not overclocked), and the GPU is an ATI Radeon HD5750.<br />
<br />
My Buddhabrot DirectCompute implementation turned out to be 4-6 times faster than my reference CPU implementation (single-threaded, non-SIMD). That is a smaller gain than my Mandelbrot DirectCompute implementation, which was more than 50x faster than its CPU counterpart (also single-threaded, non-SIMD). I&#8217;m still impressed with DirectCompute&#8217;s performance, though, since the Buddhabrot is more computationally complex than the Mandelbrot fractal, and a 4-6x speedup is a noticeable one when doing renders.<br />
<br />
Source Code &#038; Binary<br />
<a href="http://yakiimo3d.codeplex.com/releases/view/42716" onclick="pageTracker._trackPageview('/outgoing/yakiimo3d.codeplex.com/releases/view/42716?referer=');">http://yakiimo3d.codeplex.com/releases/view/42716</a><br />
</p>
<h2>Conclusions</h2>
<p>I really enjoyed writing this Buddhabrot DirectCompute program. Being able to create a beautiful, interesting image like the Buddhabrot is really cool, and it made me glad I started learning to program more than 10 years ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yakiimo3d.com/2010/03/29/dx11-directcompute-buddhabrot-nebulabrot/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

