Sharedwww / wiki / 09(2f)583e(2f)schedule(2f)toom7.htmlOpen in CoCalc
Author: William A. Stein
09/583e/schedule/toom7
 [top] [TitleIndex] [WordIndex]

09/583e/schedule/toom7

Bill Hart: Toom 7

<<TableOfContents>>

Motivation: be fast for certain ranges of integer multiplication. On core 2, will kick "their pants".

Work with 64-bit computer.

$B=2^{64}$

A "limb" is a "machine word".

To multiply integers

  1. split into polynomials
  2. evaluate polynomials
  3. multiplications
  4. interpolate --> polynomials

  5. reconstruct our product

Now, write in symbols $M=f_1(2^k)$ and $N=f_2(2^k)$, for sufficiently big $k$, where $f_1(x) = a_0 + a_1 x + \cdots + a_n x^n$, and $f_2(x) = b_0 + b_1 x + \cdots + b_n x^n$, where the $a_i$ and $b_i$ are the "digits in base $2^k$".

Then $MN = f_1(2^k) \cdot f_2(2^k) = (f_1 \cdot f_2)(2^k)$. Evaluating $f_1\cdot f_2$ (reconstruct -- part 5) is easy bit shifting. So now we focus on polynomial multiplication.

Karatsuba (1962)

$$(c_0 + c_1y)(d_0 + d_1y) = c_0d_0 + c_1d_1 y^2 + [(c_0 + c_1)(d_0 + d_1) -c_0d_0 - c_1d_1]y$$

  • 3 distinct multiplications
  • 5 additions/subtractions

For larger polynomials, divide and conquer.

$$c_0 = a_0 + a_1 x + ... + a_{\lceil n/2\rceil} x^{\lceil n/2\rceil}$$

$$c_1 = a_{\lceil n/2\rceil+1} x^{\lceil n/2\rceil+1} + \cdots + a_n x^n$$

$$d_0 = ...$$

$$d_1 = ...$$

Knuth Multiplication

Very very hard to find in the literature (impossible?). I couldn't find a reference.

  • $$(c_0 + c_1y)(d_0 + d_1y) = c_0d_0 + c_1d_1 y^2 + [-(c_0 - c_1)(d_0 - d_1)  + c_0d_0 + c_1d_1]y$$

This is better, since some intermediate results are smaller. Overflowing into the next limb in Karatsuba is potentially quite nasty. Worrying about sign is less work than dealing with an extra limb. You have to use signed representations -- there is extra branching in both cases, and it really does make a difference. GMP really uses this and it does make a difference. At this level, saving one cycle per limb is a big improvement.

Another way of thinking

The above is difficult to generalize without being more systematic. Toom 3 is the next logical thing, where we break into 3 pieces instead of 2.

Consider

$$f(x)g(x) = h(x)$$

where $f,g$ have length $n$, where the length is the degree plus $1$. Then $h$ has length $2n-1$. To find the coefficients of $h$ we only need values of $h$ at $2n-1$ distinct points.

Example: Karatsuba

  • Evaluate at $v_\infty = \infty, v_1 = 1, v_2 = 0$.

    $h(0) = f(0) g(0) = c_0 d_0$

    $h(1) = f(1) g(1) = (c_0 + d_0)(c_1 + d_1)$

    $h(\infty) = f(\infty) g(\infty) = y^2(c_0 + c_1/y)(d_0 + d_1/y)|_{y=0} = c_1 d_1$.

So we end up doing the same three multiplies as with Karatsuba.

Example: Knuth

Same, but evaluate at $v_\infty = \infty, v_1 = -1, v_2 = 0$.

Interpolation

Now generalizing we evaluate at $2n-1$ points and need to find the resulting polynomial.

$$h(x) = m_0 + m_1 x + \cdots + m_s x^s \qquad s = 2n-2$$

and we have $w_i = h(v_i)$ for $s+1$ values $v_0,\ldots,v_{s-1},v_s$. We recover $m_0,\ldots,m_s$ by solving a linear system with $s+1$ rows: $A m = w,$ so $m = A^{-1}w$.

Theorem: generically $A$ is invertible.

So solve the system explicitly and get a set of linear expressions.

Toom 3 (Toom 1963, Cook 1966)

$$f(x) = c_0 + c_1 y + c_2 y^2$$

$$g(x) = d_0 + d_1 y + d_2 y^2$$

$$h(x) = f(x) g(x)\qquad{\rm (degree 4)}$$

Evaluate at $4+1 = 5$ points. One gets a messy identity.

New Idea: Winograd (1980)

Winograd made some suggestions:

Evaluate at $\infty$ and fractions. Amazingly, nobody thought of this for quite a while -- sometimes the good ideas are the simple ones. Say $v_i = 1/2$, so

$$2^{2n} f(1/2)g(1/2) = 2^{2n} h(1/2).$$

This greatly improves the linear algebra we have to do.

Toom 3 matrix. Evaluate at $\infty, 2, -1, 1, 0$. The matrix is: $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 16 & 8 & 4 & 2 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One then worries about finding a minimal list of row operations to reduce this to the identity.

Evaluate at $\infty, -1, 1, 1/2, 0$: $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One can solve the resulting systems more efficiently on actual hardware with this matrix.

Unbalanced operands

$$f(x) = a_0 + a_1x +\cdots +a_n x^n$$

and

$$g(x) = b_0 + a_1x +\cdots +b_m x^m$$

Compute $h(x) = f(x)g(x)$ simply by evaluating at $m+1+n+1-1=m+n+1$ distinct points.

Toom 4 2

  • For example, suppose we have $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ and $b_0 + b_1 x$. Evaluate at $5$ points. Pretty much the same as Toom 3... the interpolation (the most involved part) is exactly the same as Toom 3.

Odd/even Karatsuba (Hanrot-Zimmerman)

  • David Harvey reinvented this, not knowing about the original paper.

The trick here is to write

$$f(x) = A_1(x^2) + x A_2(x^2)$$

and similarly

$$g(x) = B_1(x^2) + x B_2(x^2).$$

Then $f(x)g(x) = A_1 B_1  + x^2 A_2 B_2 + x[(A_1+A_2)(B_1 + B_2) - A_1B_1 - A_2B_2]$

Advantage: $A_1$ and $B_1$ do not have to be the same size, so get balanced multiplication no matter what.

So far though, nobody has figured out how to make this really practical/fast/better than anything else.


2013-05-11 18:32