DX11 DirectCompute Buddhabrot & Nebulabrot Renderer
I wrote a DX11 DirectCompute implementation of the famous Buddhabrot fractal. The implementation is an extension of my earlier Mandelbrot fractal DX11 DirectCompute program (http://www.yakiimo3d.com/2010/02/02/directcompute-mandelbrot-fractal-viewer/). I also use my Reinhard tonemapping code (http://www.yakiimo3d.com/2010/03/13/dx11-directcompute-global-operator-photographic-tonemapping/) to bring the HDR Buddhabrot color values into the [0,1] range of the LDR framebuffer. Reusing code from my old demos paid off: I found and fixed bugs in my tonemapping code, and I also realized I had forgotten to upload the source and binary for my DX11 Mandelbrot demo to CodePlex (doh!).
Regular Buddhabrot renderings produce monochrome images because the same value is written to each RGB channel. The Nebulabrot is a simple extension of the Buddhabrot: you plot the Buddhabrot three times, each time with a different maximum iteration count for the escape test, and assign each of the three plots to a different RGB channel. In an actual implementation, you can render all three plots in a single pass by just using if branches. Since the implementation is easy and the resulting renders are more interesting, my Buddhabrot implementation supports Nebulabrot renderings as well.
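To illustrate the if-branch approach, here is a minimal C++ sketch (not the demo's HLSL code) that iterates a sample once up to the largest limit and then decides, per channel, whether the orbit belongs in that channel's plot. The escape radius of 2 and the names `escapeIteration`, `nebulabrotChannels`, and `kMaxIter` are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <complex>

// Hypothetical per-channel iteration limits (red, green, blue), matching the
// demo defaults mentioned in this article: 10,000 / 1,000 / 100.
constexpr std::array<int, 3> kMaxIter = {10000, 1000, 100};

// Returns how many iterations of z = z*z + c (starting from z = 0) it takes
// for |z| to exceed 2, or -1 if the orbit stays bounded for `limit` iterations.
int escapeIteration(std::complex<double> c, int limit) {
    std::complex<double> z = 0.0;
    for (int i = 1; i <= limit; ++i) {
        z = z * z + c;
        if (std::norm(z) > 4.0) return i;  // |z|^2 > 4  <=>  |z| > 2
    }
    return -1;
}

// One iteration pass with the largest limit, then one if branch per channel:
// channel k plots this orbit only if it escaped within kMaxIter[k] iterations.
std::array<bool, 3> nebulabrotChannels(std::complex<double> c) {
    int maxLimit = kMaxIter[0];
    for (int m : kMaxIter) if (m > maxLimit) maxLimit = m;
    const int n = escapeIteration(c, maxLimit);
    std::array<bool, 3> plot{};
    for (int k = 0; k < 3; ++k) plot[k] = (n != -1 && n <= kMaxIter[k]);
    return plot;
}
```

For example, c = 1 escapes after 3 iterations and is plotted in all three channels, while c = 0 never escapes and is plotted in none.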
As usual, a CodePlex link to my Buddhabrot program's source code and binary is provided near the end of the article.
Web page by Melinda Green, the discoverer of the Buddhabrot. It contains good explanations of the Buddhabrot and the Nebulabrot, and a list of links to Buddhabrot implementations at the bottom. While searching for information about the Buddhabrot, I came across Melinda Green's stackoverflow.com profile (http://stackoverflow.com/users/181535/melinda-green). I thought it was cool that the discoverer is a programmer and uses sites like Stack Overflow.
Paul Bourke's web page has great explanations of a wide variety of mathematical and geometric algorithms. I've used it as a reference on numerous other occasions. His page on the Buddhabrot contains an implementation with source code.
Very nice Nebulabrot renderings. This page made me realize that I could use a square or cubic function to increase the contrast of my Buddhabrot and make the renderings prettier.
A very cool PS3 SPU Buddhabrot implementation with source code. I follow the author on Twitter (http://twitter.com/twoscomplement), and his PS3 implementation was one of my first encounters with the Buddhabrot.
The core of the Buddhabrot algorithm is similar to path tracing and involves random sampling every frame. This page suggests applying the Metropolis-Hastings algorithm, as used in path tracing (Metropolis light transport), to speed up the statistical convergence of the Buddhabrot image. The results look impressive, and if I feel up to it, I would like to try this technique in the future.
The Wikipedia page for the Buddhabrot has good information and links.
Movie of My Buddhabrot Demo
I posted a movie of my Buddhabrot demo to YouTube. The Buddhabrot in the demo is a Nebulabrot with maximum iteration values of 10,000 (red), 1,000 (green), and 100 (blue), which are the default values in my program.
A Screen Capture of My Buddhabrot Demo
I let my demo run for around 20 minutes and then took a screenshot. The frame rate was around 42 fps during the render. My implementation collects 10,000 samples per frame, so the Nebulabrot above was rendered at roughly 420,000 samples per second. The full-size image is 1592×1028. The Nebulabrot parameters are the same defaults used in the movie; the tonemap middle-grey value is probably different.
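As a quick check of these numbers, assuming the frame rate held at about 42 fps for the whole 20-minute run, the screenshot accumulates on the order of half a billion samples:

```latex
\begin{aligned}
10{,}000\ \tfrac{\text{samples}}{\text{frame}} \times 42\ \tfrac{\text{frames}}{\text{s}} &= 420{,}000\ \tfrac{\text{samples}}{\text{s}} \\
420{,}000\ \tfrac{\text{samples}}{\text{s}} \times (20 \times 60)\ \text{s} &\approx 5.04 \times 10^{8}\ \text{samples}
\end{aligned}
```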
Compute Shader Thread Setup
I call Dispatch with a thread group count of <10,10,1> (10×10 = 100 thread groups in total) and declare numthreads with a size of (10,10,1), so each thread group contains 100 threads. 100 thread groups of 100 threads each means that a total of 10,000 threads are launched by a single Dispatch call. Each thread takes only one random sample, so 10,000 Buddhabrot samples are computed per frame.
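The thread layout can be checked with a small C++ sketch that mirrors the HLSL index arithmetic (SV_DispatchThreadID = SV_GroupID × numthreads + SV_GroupThreadID) and confirms that 10×10 groups of 10×10 threads cover exactly 10,000 distinct indices. Flattening to a single index, used here to pick one pre-generated sample per thread, is an assumption about how the demo addresses its sample buffer; `flatThreadIndex` and `countIndices` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// CPU-side mirror of Dispatch(10, 10, 1) with [numthreads(10, 10, 1)].
constexpr unsigned kGroupsX = 10, kGroupsY = 10;    // Dispatch dimensions
constexpr unsigned kThreadsX = 10, kThreadsY = 10;  // numthreads dimensions
constexpr unsigned kWidth = kGroupsX * kThreadsX;   // 100 threads across
constexpr unsigned kTotalThreads = kWidth * kGroupsY * kThreadsY;  // 10,000

// Same arithmetic HLSL uses for SV_DispatchThreadID, flattened to 1D
// (hypothetical: one random sample per thread, indexed row by row).
unsigned flatThreadIndex(unsigned groupX, unsigned groupY,
                         unsigned threadX, unsigned threadY) {
    const unsigned x = groupX * kThreadsX + threadX;
    const unsigned y = groupY * kThreadsY + threadY;
    return y * kWidth + x;
}

// Visits every (group, thread) pair and counts how often each index occurs.
std::vector<unsigned> countIndices() {
    std::vector<unsigned> hits(kTotalThreads, 0);
    for (unsigned gy = 0; gy < kGroupsY; ++gy)
        for (unsigned gx = 0; gx < kGroupsX; ++gx)
            for (unsigned ty = 0; ty < kThreadsY; ++ty)
                for (unsigned tx = 0; tx < kThreadsX; ++tx)
                    ++hits[flatThreadIndex(gx, gy, tx, ty)];
    return hits;
}
```

Every index from 0 to 9,999 is hit exactly once, so each thread reads its own sample and no sample is processed twice.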
Random Number Generation
I do the random number generation on the CPU side and just lock and copy the data into a GPU buffer every frame. (I should probably use multiple buffers to avoid stalling on the lock, but I don't do that in this demo.) I tried giving each thread its own multiply-with-carry random number generator and generating an RNG sequence inside the compute shader, but the results showed repetition. Finding and implementing a parallel GPU RNG algorithm seemed like a lot of work, so I didn't investigate further.
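A minimal sketch of the CPU side, assuming one complex sample point per GPU thread drawn uniformly from [-2,2]×[-2,2] with `std::mt19937`; the demo's actual RNG and sampling window are not specified here, and `SampleGenerator` and `Sample` are hypothetical names. In D3D11 the resulting array would then be copied into the GPU buffer, for example via `ID3D11DeviceContext::Map` with `D3D11_MAP_WRITE_DISCARD` on a dynamic buffer, which typically lets the driver hand back fresh memory instead of waiting for the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// A sample point c = (re, im) in the complex plane, laid out like an HLSL float2.
struct Sample { float re, im; };

// Hypothetical per-frame generator: one uniformly distributed point per thread.
class SampleGenerator {
public:
    explicit SampleGenerator(std::uint32_t seed) : rng_(seed), dist_(-2.0f, 2.0f) {}

    // Regenerates `count` samples; the caller would then copy this array into
    // the GPU buffer that the compute shader reads this frame.
    const std::vector<Sample>& nextFrame(std::size_t count) {
        samples_.resize(count);
        for (Sample& s : samples_) {
            s.re = dist_(rng_);
            s.im = dist_(rng_);
        }
        return samples_;
    }

private:
    std::mt19937 rng_;                             // CPU-side RNG state
    std::uniform_real_distribution<float> dist_;  // assumed window [-2, 2]
    std::vector<Sample> samples_;                 // reused each frame
};
```

Keeping the RNG on the CPU sidesteps the per-thread correlation problem at the cost of an upload every frame (10,000 × 8 bytes here).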
I used InterlockedAdd to write the Buddhabrot plot into an iteration count buffer in global memory. Writing to global memory repeatedly during each thread's execution (in the very rare worst case, max iterations minus 1 times per channel, so 9,999 + 999 + 99 writes for the demo Nebulabrot), across 10,000 threads in total, probably causes a big performance hit. I've read that atomic operations should be minimized where possible, but I actually got slower performance without InterlockedAdd (using plain, unsynchronized writes that give wrong results, just for testing). I'm not exactly sure why.
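The per-sample work can be sketched on the CPU in C++, with `std::atomic<unsigned>::fetch_add` standing in for HLSL's `InterlockedAdd`: an orbit that escapes within `maxIter` iterations is replayed, and each of its first n-1 points (matching the "max iterations minus 1" worst case above) increments the histogram pixel it lands in. The `Histogram` struct, its complex-plane window, and `plotSample` are hypothetical; the demo does this inside the compute shader with one sample per thread.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical shared count buffer; fetch_add plays the role of InterlockedAdd.
struct Histogram {
    int width, height;
    double xMin, xMax, yMin, yMax;            // complex-plane window (assumed)
    std::vector<std::atomic<unsigned>> bins;  // value-initialized to 0
    Histogram(int w, int h, double x0, double x1, double y0, double y1)
        : width(w), height(h), xMin(x0), xMax(x1), yMin(y0), yMax(y1), bins(w * h) {}

    // Atomically adds one hit to the pixel containing z, if it is on screen.
    void splat(std::complex<double> z) {
        const double fx = std::floor((z.real() - xMin) / (xMax - xMin) * width);
        const double fy = std::floor((z.imag() - yMin) / (yMax - yMin) * height);
        if (fx < 0 || fx >= width || fy < 0 || fy >= height) return;  // off screen
        const int idx = static_cast<int>(fy) * width + static_cast<int>(fx);
        bins[idx].fetch_add(1u, std::memory_order_relaxed);
    }
};

// Plots one Buddhabrot sample: iterate z = z*z + c; if the orbit escapes within
// maxIter iterations, replay it and splat every point before the escaping one.
// Returns the number of orbit points replayed (0 for bounded orbits).
int plotSample(Histogram& h, std::complex<double> c, int maxIter) {
    std::complex<double> z = 0.0;
    int n = 0;
    while (n < maxIter && std::norm(z) <= 4.0) { z = z * z + c; ++n; }
    if (std::norm(z) <= 4.0) return 0;  // bounded: not part of the Buddhabrot
    z = 0.0;
    for (int i = 0; i < n - 1; ++i) { z = z * z + c; h.splat(z); }
    return n - 1;
}
```

Relaxed ordering is enough here because only the final counts matter, not the order in which threads increment them.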
I used my earlier implementation of the Reinhard global tonemapping operator to map the Buddhabrot output buffer to an LDR range. Some Buddhabrot renders on the Internet show no detail in the dark areas, probably because their authors didn't have time to implement a good HDR-to-LDR mapping algorithm. The shape of the Reinhard tone curve (less compression in the darker areas) and the ability to set the middle-grey value help my Buddhabrot renders retain detail in both the bright and dark areas of the image.
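For reference, the global Reinhard operator boils down to two steps: compute the log-average luminance of the image, then scale by the middle-grey (key) value and compress with L/(1+L). The C++ sketch below works on a flat array of luminances; the `delta` default and the function names are assumptions, and it omits refinements such as a white-point term that a full implementation may include.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log-average luminance; delta avoids log(0) for empty pixels, which are
// common in a Buddhabrot.
double logAverageLuminance(const std::vector<double>& lum, double delta = 1e-4) {
    double sum = 0.0;
    for (double L : lum) sum += std::log(delta + L);
    return std::exp(sum / static_cast<double>(lum.size()));
}

// Global Reinhard operator: scale so the log-average maps to `middleGrey`,
// then compress with L / (1 + L), which maps [0, inf) into [0, 1).
std::vector<double> reinhardTonemap(const std::vector<double>& lum, double middleGrey) {
    const double Lavg = logAverageLuminance(lum);
    std::vector<double> out;
    out.reserve(lum.size());
    for (double L : lum) {
        const double scaled = middleGrey / Lavg * L;
        out.push_back(scaled / (1.0 + scaled));
    }
    return out;
}
```

Dark values pass through almost linearly (L/(1+L) ≈ L for small L), while very bright values are squeezed toward 1, which is why detail survives at both ends.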
To increase the contrast of my Buddhabrot renders and make the bright areas really bright, I cube the luminance (Y) of the Buddhabrot output buffer in Yxy color space. Combined with Reinhard tonemapping, this gives images with strong bright areas that still retain detail in the darker areas.
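A minimal C++ sketch of the luminance-cubing step: convert linear RGB to XYZ, take Y and the chromaticity (x, y), cube Y, and convert back. The Rec. 709/sRGB (D65) matrices are an assumption, since the article doesn't name the RGB space, and the function names are hypothetical. Because x and y are held fixed, the result equals the input RGB scaled by Y², which makes a handy sanity check.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Linear RGB -> CIE XYZ (assumed Rec. 709 / sRGB primaries, D65 white point).
Vec3 rgbToXyz(const Vec3& c) {
    return {0.4124 * c[0] + 0.3576 * c[1] + 0.1805 * c[2],
            0.2126 * c[0] + 0.7152 * c[1] + 0.0722 * c[2],
            0.0193 * c[0] + 0.1192 * c[1] + 0.9505 * c[2]};
}

// CIE XYZ -> linear RGB (inverse of the matrix above, to 4 decimal places).
Vec3 xyzToRgb(const Vec3& v) {
    return { 3.2406 * v[0] - 1.5372 * v[1] - 0.4986 * v[2],
            -0.9689 * v[0] + 1.8758 * v[1] + 0.0415 * v[2],
             0.0557 * v[0] - 0.2040 * v[1] + 1.0570 * v[2]};
}

// Cubes the luminance Y while keeping the chromaticity (x, y) fixed.
Vec3 cubeLuminance(const Vec3& rgb) {
    const Vec3 xyz = rgbToXyz(rgb);
    const double sum = xyz[0] + xyz[1] + xyz[2];
    if (xyz[1] <= 0.0 || sum <= 0.0) return {0.0, 0.0, 0.0};  // black stays black
    // XYZ -> Yxy
    const double Y = xyz[1];
    const double x = xyz[0] / sum;
    const double y = xyz[1] / sum;
    // Boost contrast: bright pixels grow much faster than dim ones.
    const double Y3 = Y * Y * Y;
    // Yxy -> XYZ with the new luminance, then back to RGB.
    const Vec3 boosted = {x * Y3 / y, Y3, (1.0 - x - y) * Y3 / y};
    return xyzToRgb(boosted);
}
```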
In terms of hardware, I'm using the same machine as when I wrote my Mandelbrot DirectCompute implementation (with more recent ATI Catalyst drivers). The CPU is an Intel Core 2 Quad Q6600 at 2.4 GHz (not overclocked) and the GPU is an ATI Radeon HD 5750.
My Buddhabrot DirectCompute implementation turned out 4-6 times faster than my reference CPU implementation (single-threaded, non-SIMD). This is a smaller gain than my Mandelbrot DirectCompute implementation, which was more than 50x faster than its CPU counterpart (also single-threaded, non-SIMD). I'm still impressed with the DirectCompute performance, though: the Buddhabrot is more computationally demanding than the Mandelbrot fractal, and a 4-6x speedup is noticeable when rendering.
Source Code & Binary
I really enjoyed writing this Buddhabrot DirectCompute program. Being able to create a beautiful, interesting image like the Buddhabrot is really cool and made me glad I began learning programming 10+ years ago.