hpdz.net

High-Precision Deep Zoom

Technical Info

Fractal Rendering Software

All images on this site were created with custom software I have been slowly developing over the past 7-8 years. It is written in C++ with a little hand-coded assembly language. The compiler is Visual Studio, and the user interface is based on MFC.

Topics

Math Operations

For low-precision rendering, the software uses the Pentium's native double-precision IEEE-754 floating point data type. This is fine as long as the pixel spacing can be sufficiently accurately represented in this data type. This format has 52 bits of precision for the fractional part of a number, which implies a precision limit of 2.2e-16. That imposes a practical limit of about 2e-14 as the smallest-size image that can be drawn with the native double-precision type, depending on exactly where on the complex plane you are and also on how many pixels are in the image. If you zoom much smaller than that the granularity of the bits in the numbers becomes comparable to the spacing between pixels, and you can't accurately step through the image. For images that require smaller pixel steps than about 2e-14, specialized high-precision arithmetic software has to be used. This is the case for all fractal imaging systems capable of deep zooming. Unfortunately, even the best software high-precision arithmetic functions in the world are much slower than the native hardware-based floating point operations in the Pentium.

The critical high-precision math sections in my software are written in hand-coded assembly language and have evolved through several generations. Multiplication, which is the most time-consuming operation, is coded using the Pentium IV SSE2 instructions. This instruction set has instructions for operating on four DWORDs at a time, and when data is 16-byte aligned these instructions are stunningly fast. However, there are some significant drawbacks to using the SSE2 instructions. They do not support a carry flag, and they require complicated shuffling of the operands load them into the right parts of the 128-bit registers. Nevertheless, the speed gain offsets the additional complications, and the end result is that I have a multiplication function that is more than twice as fast as my previous one that used only the general-purpose registers. Originally, when I only had a Pentium III, I worked really hard on getting a faster multiplication function using the P-III's MMX instructions, which do not include the SSE2 extensions, but the overhead was more than the gain that could be realized using only the 64-bit SSE/MMX instructions in the Pentium III.

Addition, which is much less time-consuming than multiplication, is still implemented with the general purpose instructions.

Performance data for the program are shown in the table below. The multiplications/second benchmark is measured by repeatedly multiplying two numbers with no other calculations being performed. The Mandelbrot operations/second data refers to repeated iterations of the second-order Mandelbrot polynomial as would be used in an actual drawing of an image.

The numerical format uses one 32-bit DWORD to represent the integer part and the remaining bits to represent the fractional part. The value shown in the third column is the number of digits that can be represented by the fractional part. This is not the same as the practical limit for rendering images, since the coordinate increments need to be very accurate across the whole image, which requires an additional safety factor of at least a factor of 100, and I usually use 1000 just to be conservative.

High-Precision Arithmetic Performance on Pentium-IV System
Total BitsFractional BitsFractional Part Decimal DigitsMultiplications, Millions/secondMandelbrot Operations, Thousands/second
1289628.8910.641938
19216048.167.141488
25622467.434.571067
32028886.702.91771
384352105.962.29609
448416125.231.78488
512480144.491.45388

To put this in perspective, this software can draw a 640x480 image of the interior of the Mandelbrot set to 1000 iterations at 160 digits of precision in 8.4 minutes on my 3.2 GHz Pentium IV [640 * 480 * 1000 / 609,000 = 504 seconds].

Colorization

Colorization is done by several methods. I often use a simple periodic linear method where count numbers are mapped to a wrap-around color table, with interpolation between table values as needed. I use fractional count generation to make smooth gradients of colors, greatly reducing the visible dwell bands. I sometimes use an adaptive palette mapping algorithm that balances the spectrum of colors across the range of counts within an image, but this method works much better for still images than it does for animations because the range of counts tends to be very unstable as you zoom into an image, resulting in very annoying jitter of the color mapping. I have experimented with a rank-order system for coloring, and it reduces this problem considerably. I made a test animation and it looks OK. I may incorporate this into some future artistic animation projects.

Distributed Processing

The software is capable of distributed processing (DP) and can divide up the task of drawing an image among several computers, or among several cores on a multi-core system. This was more useful when I had several nearly equivalent Pentium III systems. Once I got a Pentium-IV, it was so much faster that adding the P-III systems to the processing didn't make much difference, and I didn't use the DP capability very much. Now that I have a quad-core system, the DP subsystem allows the software to divide up the rendering work among the four cores.

Animation

Animation paths are calculated using an exponential function to provide zooming. A polynomial function provides smooth start and stop of zooming. Similarly, a polynomial function scaled to the frame size controls panning. Panning and zooming are implemented with a complicated scale-invariant panning adjustment so panning is well-behaved even while zooming (see HowNotToZoom for an example of what motivated me to develop this).

Types of Fractals

The software can generate images in high-precision for any order polynomial Mandelbrot function. I have not implemented other types of fractal generating functions at this point.

Drawing Operations

The software has a high-precision distance estimator function. I like the DE method of drawing for many things, especially areas where the set is predominantly filamentous. It keeps the details of the points near the set in sharp contrast, and if the parameters are set right, it also adds a texture to the background that makes the whole image much more visually interesting. A similar effect can be achieved in the standard count-based rendering method by reducing the escape radius to 4, rather than keeping it at a very high value like 10000. Higher values for the escape radius, combined with fractional count generation, can produce very smooth gradients in the background, but sometimes that's not what you want and it can look rather dull.

I have in the past used a rectangular decomposition method (a slight variation of the Mariani-Silver algorithm) to speed up drawing of areas that have a lot of points in the set. My variation was to only use the algorithm to fill in the interior of the set, so that smooth gradients outside the set can still be drawn. I gave this whole thing up when I was focusing my efforts on still images because typically they do not have many close-up encounters with mini-brots within them. However, when animating deep zooms, mini-brots often appear, and I am actively working on bringing back rectangular decomposition with an adaptive algorithm to detect when it will be optimal to render the image this way.

Anti-Aliasing

Aliasing is inevitable when drawing the Mandelbrot set because it has an infinitely high spatial bandwidth and you have to sample it before you can make any attempt to filter it. No matter how fine a sampling grid you make, you can never establish a Nyquist frequency higher than the highest spatial frequency of the image data, so there will always be aliasing. The dominant form this aliasing takes is occasional (or overwhelming, depending on where you are trying to view) bright pixels showing up in areas where it looks like they don't belong.

Although it is impossible to eliminate this noise altogether, there are ways to mitigate its effects. One easy method is to use a simple windowed average, where each pixel in the final image is the average of several pixels in a small (smaller than the size of the pixel) neighborhood around its center. The average can be windowed so that subpixels farther from the center are weighted less than subpixels near the center. This works pretty well, but multiplies the rendering time by gigantic factors like 4, 9, 16, 25, etc. I also tried a median filter, which proved to be a bit too severe, even with only a 2 pixel radius. A median filter using just the four nearest neighbors (up, down, left, and right) works fairly well, and a median filter with subpixel oversampling works very well indeed. Still, there is a substantial increase in rendering time, and I don't use anti-aliasing methods very much for animations. This is definitely an area where throwing money at the problem will help tremendously.

There are highly sophisticated methods for reducing aliasing and also filtering out the noise that I call "sparkle" noise. This is basically what the mainstream image processing community would call "salt-and-pepper" noise, although in this case there is no pepper, only salt. A median filter will help, but I am investigating far more sophisticated, relatively new, nonlinear filters for reducing this noise.

Frame Interpolation

This is, in my opinion, a controversial technique for speeding up drawing of extreme deep zooms. First let me explain what it is and how it works, then why I think it is controversial. I will also explain how it can be done in a way that isn't "cheating."

Drawing every frame individually is extraordinarily time-consuming -- it can easily take 5 minutes to draw a single 640x480 frame at modest zoom levels, and at extreme magnifications it can take an hour or more. It is also very wasteful since there is a lot of redundant information from one frame to the next. This is especially true when panning without zooming -- you're essentially recalculating 99% of exactly the same image information as in the previous frame at each step.

Frame interpolation is a way to speed this up. The idea is to draw an area once, then zoom in or pan around in that area using standard digital image processing methods to interpolate between the pixels in the larger image. Interpolating the pixels for the video frames from the larger image is much faster than drawing them all from scratch for every frame, especially for images with a very high average iteration count. For example, if you want a 320x240 video frame, you can draw, say, a 640x480 master image (sometimes called a "key frame"), hold it in memory, and zoom in to that image digitally by interpolating between the pixels in the calculated 640x480 image to generate each video frame (also called "interpolation frame"), rather than calculating every single video output frame directly.

The pragmatist in me loves this, since it can easily cut the time required to render a video substantially -- a factor of 10 or even 100, depending on how the interpolation is configured. But the purist in me balks at this. Yes, I balk. It somehow seems like cheating. Now, there are different ways of doing this, and some seem to me less like cheating than others. The criterion I would suggest is that the pixel increment size in any video frame should be no smaller than the pixel increment in the master image. Stated another way, the spatial sampling rate of the video frame should be no smaller than the spatial sample rate of the master (calculated) image. Or, stated yet another way, the spatial frequency of the video frames should be no higher than that of the master image. The video output frame should always be at a size where the master frame is oversampled relative to the spatial frequency of the video frame. You shouldn't be crossing into the domain where you're putting more than one pixel of video output between individual pixels in the master image, which would mean it is undersampled. As long as you're zooming into a master image with higher spatial resolution than the video output, you're not creating information that wasn't calculated. Make sense?

Anyway, another advantage of this method is that it reduces the amount of flickering caused by random fluctuations in the high-frequency spatial noise of the set. However, it also results in an odd jerky appearance each time a new large master frame is rendered. The solution is to overlap the child video frames from each master image by smoothly blending them as a new master/key frame is used. This is something I plan to implement soon. I expect that frame interpolation will open areas of the set that have so far been prohibitively intense computationally for serious (E100-level) deep zooming, like the cusps and the Seahorse Valleys.

Any video that I create with frame interpolation will be clearly labeled as such. I have not yet done any spatially undersampled interpolations at this point, but I might. I have seen a few staggeringly deep zooms on the internet (like E300 to E1000) and I am sure they were done this way.