**64-bit math to accelerate fractal calculations**

When AMD announced their 64-bit processor architecture around 2002 we immediately realize that these processors, which are now standard in almost all computers, would greatly enhance deep-zoom fractal calculations. While a 32-bit processor can multiply two 32-bit numbers and get a 64-bit result, a 64-bit processor can multiply two 64-bit numbers and get a 128-bit result, potentially giving a four times speedup when doing high-precision math. In addition to this significant advantage the 64-bit processors have 64-bit registers, they have twice as many registers, and they can reference many millions of times more memory than the old 32-bit processors.

The 2.0 version of Fractal eXtreme comes in two versions: 32-bit and 64-bit. The 64-bit version uses the power of 64-bit processors to calculate deep-zoom fractals much faster than a 32-bit pogram can. Given a 64-bit processor (most processors from 2006 or later) and a 64-bit operating system (64-bit Vista for instance) the 64-bit version of Fractal eXtreme can calculate deep-zoom fractals an average of 3.5 times faster and up to 5 times faster than the 32-bit version running on the same machine. Together with many other calculation tricks this lets Fractal eXtreme calculate fractal images using hundreds of digits of precision without undue slowdown.

Cygnus Software strongly recommends using the 64-bit version of Windows Vista because only then can the 64-bit version of Fractal eXtreme be used. Using the 64-bit version of Vista also allows more memory to be used by your computer which leads to better performance, and lets memory hungry programs such as games run more reliably. With 4 GB of memory costing just $100 in April 2008 there is no doubt that 64-bit Windows is the right direction to move in. Since 64-bit Vista can run 32-bit applications at full speed there is little reason not to install the 64-bit version of Vista.

**Technical Details**

At low zoom levels Fractal eXtreme uses the floating-point unit (FPU) of the processor to do fractal calculations. This is fast and convenient, but after about 45 zooms (a magnification of about 30 trillion) the FPU no longer has enough precision. At that point Fractal eXtreme has to switch to doing high-precision math using the integer unit. Fractal eXtreme does high-precision math in a way similar to how a person using pencil and paper would do it.

If a person is using pencil and paper to multiply two four-digit numbers together then they have to multiply each digit of the bottom number by each digit of the top number, as shown below:

5123 x 9823 ---- 15369 (3 * 5123) 10246 (2 * 5123) 40984 (8 * 5123) 46107 (9 * 5123) -------- 50323229

Thus, to multiply two four-digit numbers a person has to do sixteen multiplications (four digits times four digits) plus various additions.

Imagine if instead of learning your times-tables up to 12x12 you had learned them up to 99x99. Then you would be able to multiply two digits at a time, and the number of steps would be reduced from sixteen multiplications to just four, a 75% savings, as shown below.

5123 x 9823 ---- 117829 (23 * 5123) 502054 (98 * 5123) -------- 50323229

The diagram above doesn't show the saving very well as it can easily be misinterpreted as just being a 50% saving in steps, but the tables below should make it clearer. We can picture multi-digit multiplication as being a square grid with the digits of the numbers being written along the top edge and left side. Each box in the grid is filled in with the product of the top-edge number above it and the left-edge number to its left, as shown below. The results then have to be added up, along diagonals, but that's not important for this discussion. We can see in the table below that there are sixteen entries in the table that have to be filled in with the result of a multiplication.

5 | 1 | 2 | 3 | |

9 | 9*5 | 9*1 | 9*2 | 9*3 |

8 | 8*5 | 8*1 | 8*2 | 8*3 |

2 | 2*5 | 2*1 | 2*2 | 2*3 |

3 | 3*5 | 3*1 | 3*2 | 3*3 |

In the table below we show what happens if we can do math two digits at a time. We have half as many 'digits' in both numbers so the number of multiplications is only one quarter of what it would otherwise be. This is the 64-bit math savings.

51 | 23 | |

98 | 98*51 | 98*23 |

23 | 23*51 | 23*23 |

Human beings typically do math using digits that can hold numbers from zero to nine. A 32-bit computer, which includes most computers in use for the past ten or fifteen years, does math using 32-bit 'digits' that can hold numbers from zero to 4294967295, equivalent to more than nine base-10 digits. This gives them a huge advantage over human beings when doing high-precision math (especially since they can multiply these together with perfect accuracy around a billion times a second).

A 64-bit computer can, if programmed correctly, do math using *64-bit* digits that can hold numbers from zero to 18446744073709551615, equivalent to more than nineteen base-10 digits! Because these 64-bit processors can typically multiply and add these 64-bit numbers just as quickly as 32-bit numbers, and because 75%
fewer multiplications are needed, it means that doing high-precision math using 64-bit digits can run four times as fast as high-precision
math using 32-bit digits.

Fractal eXtreme employs some other tricks to help accelerate its math. Let's imagine that Fractal eXtreme is doing 192-bit calcs, a relatively modest deep-zoom precision equivalent to about 58 decimal digits, and well beyond the precision of a floating-point unit. Each 192-bit number consists of six 32-bit 'digits' and we can represent the digits of these two numbers as ABCDEF and GHIJKL. The obvious way to multiply them together is to do thirty-six multiplications, like this:

A | B | C | D | E | F | |

G | G*A | G*B | G*C | G*D | G*E | G*F |

H | H*A | H*B | H*C | H*D | H*E | H*F |

I | I*A | I*B | I*C | I*D | I*E | I*F |

J | J*A | J*B | J*C | J*D | J*E | J*F |

K | K*A | K*B | K*C | K*D | K*E | K*F |

L | L*A | L*B | L*C | L*D | L*E | L*F |

However we are only interested in having a 192-bit result, so some of these multiplications are unnecessary. For our fractal calculation purposes (using fixed-point math with the sign-bit and three integral bits to the left of the decimal point, for those who care) we are only interested in the most significant portions of the result so we can simply not bother doing fifteen of the multiplications, saving a significant amount of time, as shown below:

A | B | C | D | E | F | |

G | G*A | G*B | G*C | G*D | G*E | G*F |

H | H*A | H*B | H*C | H*D | H*E | - |

I | I*A | I*B | I*C | I*D | - | - |

J | J*A | J*B | J*C | - | - | - |

K | K*A | K*B | - | - | - | - |

L | L*A | - | - | - | - | - |

At high precisions this trick reduces the work to be done by almost 50%. With 64-bit math the 192 digit numbers consist of three 64-bit 'digits' each and only six multiplications are needed, as shown below.

AB | CD | EF | |

GH | GH*AB | GH*CD | GH*EF |

IJ | IJ*AB | IJ*CD | - |

KL | KL*AB | - | - |

But wait, there's more! Much of the multiplication done in fractal calculations is squaring a number -- multiplying it by itself. There are some additional shortcuts available when squaring which are best demonstrated with the 32-bit calculations. As before we will represent the first number's digit as ABCDEF, and this number is multiplied by itself. The obvious multiplication grid for ABCDEF times ABCDEF looks like this:

A | B | C | D | E | F | |

A | A*A | A*B | A*C | A*D | A*E | A*F |

B | B*A | B*B | B*C | B*D | B*E | - |

C | C*A | C*B | C*C | C*D | - | - |

D | D*A | D*B | D*C | - | - | - |

E | E*A | E*B | - | - | - | - |

F | F*A | - | - | - | - | - |

However, we can see that there is some duplication here. We are calculating A*B and B*A, and that is wasteful because we know that A*B is equal to B*A. Instead we could calculate A*B just once and then add it to the result twice. This means that there are another *nine* multiplications (represented with periods in
the table below) that we can avoid doing, so our total has dropped, in this case, from thirty-six multiplications down to just twelve! That's a huge savings over doing it with base-10 digits, which could take up to 3,364 multiplications. In the 64-bit math case we can square a 192-bit number with just four multiplications!

A | B | C | D | E | F | |

A | A*A | A*B | A*C | A*D | A*E | A*F |

B | . | B*B | B*C | B*D | B*E | - |

C | . | . | C*C | C*D | - | - |

D | . | . | . | - | - | - |

E | . | . | - | - | - | - |

F | . | - | - | - | - | - |

The high-precision math experts who calculate billions of digits of pi use even more advanced techniques, using fourier transforms to generate polynomials to further reduce the number of multiplications needed, but these techniques don't become cost-effective until you are doing math with millions of bits of accuracy. Meanwhile Fractal eXtreme still has a few tricks up its sleeve. Fractal eXtreme has to handle variable precision, supporting from two 'digits' of precision all the way up to 240. Rather than using one routine that can handle an arbitrary number of digits, with all of the testing and branching that this implies, Fractal eXtreme generates over a hundred different multiplication functions to handle the full range of 64-bit precision multiplications, and many of the 32-bit precision multiplications, without a single branch. Additionally these high-precision multiply routines use a special technique that allows them to avoid the cost of writing temporary results to memory, even when working with the largest numbers. And, because avoiding work is even faster than doing it, Fractal eXtreme uses loop-detection and guessing to try to avoid doing calculations whenever possible.

Given the huge speedups that 64-bit processors have given us, it is reasonable to
ask whether 128-bit processors will soon give even greater advantages. While 128-bit
processors *would* give huge advantages, they are unlikely to ever exist.
Moore's law would have to continue for about another fifty years (doubling transistor-counts
32 more times) before processors
would have enough memory to require a 128-bit processor, and it is pretty clear
that we will hit atomic limits long before that happens. Future speedups in fractal
calculation will come from having more processors. With eight processor machines
already reasonably priced we can look forward to many years of geometrically increasing processing
power that Fractal eXtreme is well suited to take advantage of.

Some GPUs (Graphics Processing Units) already have more than 100 processors, and can do math at impressive rates. However they tend to do poorly at high-precision math, and have other disadvantages which so far make them slower than CPUs for fractal calculations.