|
A Tour of NTL: Some Performance Data
Here are some timing figures from using NTL. The figures were obtained using a 700MHz Pentium III running Linux, and using gcc version 2.95.2. Also, GMP version 3.1.1 was used as the primary long integer package.
Basic arithmeticFor various values of n, we give the running times for the following tasks:
The time for IMUL, IREM, IINV is stated in units of 10^{-6}s, and that of PMUL in units of seconds.
The IMUL, IREM, and IINV times essentially reflect the performance of the underlying GMP routines (NTL's interface to these routines is very "light weight"). The PMUL time reflects the performance of NTL's FFT implementation of polynomial arithmetic.
Factoring polynomials over finite fieldsWe next consider the problem of factoriing of univariate polynomials modulo a prime p. As test polynomials, we take the family of polynomials defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995]. For every n, we define p to be the first prime greater than 2^{n-2}*PI, and the polynomial is sum(a[n-i]*X^i, i = 0..n), where a[0] = 1, and a[i+1] = a[i]^2 + 1.
Here are some running times, in seconds, for various values of n:
Also of interest is space usage.
The n = 512 case used about 5MB main memory,
and the n= 1024 case used 18MB main memory.
The second problem considered is factoring univariate polynomials
over the integers.
In NTL 5.2, we have implemented
Mark van Hoeij's new
method for
polynomial factorization.
Prior to this, NTL offered an implementation of the classical
``Zassenhaus method,'' which can take exponential time
in the worst case, although the NTL algorithm implements
several novel ``pruning techniques'' that can subsantially
speed up the brute-force search stage of the Zassenhaus method.
However, van Hoeij's algorithm seems to have made all of these
techniques more or less obsolete.
Note that van Hoeij's method is more a general algorithmic technique
than a specific algorithm.
To obtain a specific algorithm, several parameters and strategies
must be chosen.
We have attempted to make these choices so as to obtain
a good, general-purpose algorithm that works fairly well
on a wide variety of input polynomials.
All of these choices are a bit heuristic, and any feedback
refarding possible improvements is most welcome.
We consider a number of test polynomials, collected and/or suggested
by Paul Zimmermann
and Mark van Hoeij.
P1 comes from the Rational
Univariate Representation (RUR) of the Cyclic 6 system,
and P2 and P3 come from the RUR of Cyclic 7.
P4 was contributed by A. Hulpke and H. Matzat: it is the 5-set resolvent of
the polynomial f = x^11 + 101*x^10 + 4151*x^9 + 87851*x^8 + 976826*x^7 +
4621826*x^6 - 5948674*x^5 - 113111674*x^4 - 12236299*x^3 + 1119536201*x^2 -
1660753125*x - 332150625 and its factorization would prove that f has
Galois group M11.
P5 is the Swinnerton-Dyer polynomial for 2,3,5,7,11,13, i.e. the
product of all terms of the form x+/-sqrt(2)+/-...+/-sqrt(13).
It is a "bad" polynomial for the method of factoring in Fp and
combining the factors in Z, because all its factors in Fp are of
degree at most 2.
P6 is the resultant with respect to x of the polynomial
p = -2566974800*x^2 +134697056*x^4
-3312297*x^6 +43109*x^8 -308*x^10 +x^12 +1142440000 with p(y-2*x);
it was contributed by Frederic Lehobey and
Nicolas Rennert.
P7 (contributed by John Abbott) is the norm of
x^6 + (a1+a2+a3+a4+a5+a6)*(x^4-x^2) + 1 with a6^2-2, a5^2-3, a4^2-5,
a3^2-7, a2^2-11 and a1^2-13, i.e. the resultant of that polynomial with
a6^2-2 with respect to a6, then with a5^2-3 wrt a5, and so on.
P8 (contributed by Jean-Charles Faugere) comes from the Groebner basis
of Cyclic 9.
M12_5 and M12_6 are the 5th and 6th resolvents of the polynomial f
with Galois group M_{12} from the example in van Hoeij's paper.
These polynomials are non-monic (unlike the others).
We also used the polynomials S6, S7, S8, S9, and S10,
where Si is the Swinnerton-Dyer polynomial
of degree 2^i corresponding to the first i primes.
These are all irreducible polynomials,
and are particularly bad for the Zassenhaus method.
The following two polynomials, H1 and H2, were suggested by Mark van Hoeij
as examples that may
be particularly challenging to factor using his algorithm:
The polynomial H1 is defined as it is, rather than as x^960-1,
simply to "hide" the specail structure of the polynomial -- many
factorizers would apply special techniques to factor x^960-1,
but will most likely not apply these techniques to (x-1)^960-1.
To factor this polynomial with van Hoeij's method, one has to consider
very high traces, and thus it is essential that a good implementation
at some point crosses over to a "dense" strategy (or alternatively,
implements a brute-force factor search in combination with van Hoeij's
method).
You can download
all of the above polynomials (except H1, which is easy trivial to generate
from scratch).
We collected timing data for both the Zassenhaus and van Hoeij
algorithms, both with and without the so-called "power hack".
By setting a run-time switch, the user can choose to employ a "power hack",
which attempts to speed up the factorization if the polynomial
f(x) is of the form g(x^m) for some m > 1.
However, while this "power hack" may sometimes speed things up substantially,
it can also slow things down -- for the Zassenhaus method, this slowdown
is usually negligible, but for our implementation of van Hoeij's algorithm,
this slowdown can be quite substantial.
Because of this, in our implementation of van Hoeij, we have modified
the power hack so that if it "seems" to be going slowly, it is abandoned.
This "early abort" power hack strategy
is "on" by default, but can be turned off by the
user by setting a run-time switch.
In the table below, running time is presented in seconds
for each of the 4 methods:
We also state for each polynomial the number r of modular
factors of the polynomial (this is not necessarily relevant for the
power-hack running times),
For comparison, we also state some running times for Maple version 5.1,
running on a 500MHz Alpha 21264/EV6,
as reported by Paul Zimmermann.
Some remarks:
|