18

What is the complexity (on the standard integer RAM) of computing the standard discrete Fourier transform of a vector of $n$ integers?

The classical algorithm for fast Fourier transforms, inappropriately[1] attributed to Cooley and Tukey, is usually described as running in $O(n \log n)$ time. But most of the arithmetic operations executed in this algorithm involve complex $n$th roots of unity, which are (for most $n$) irrational, so exact evaluation in constant time is not reasonable. The same issue arises with the naive $O(n^2)$-time algorithm (multiplying by a Vandermonde matrix of complex roots of unity).

It's not even clear how to represent the output of the DFT exactly (in any useful form). In other words, it's not clear that computing DFTs is actually possible!

So suppose we only need $b$ bits of precision in each output value. What's the complexity of computing the discrete Fourier transform, as a function of $n$ and $b$? (For concreteness, feel free to assume $n$ is a power of $2$.)
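
For concreteness, here is a minimal sketch (a textbook radix-2 Cooley-Tukey in double precision, nothing clever; the variable names are mine) of the issue: even for integer input, a forward-plus-inverse transform returns the input only up to roundoff, so on a real machine "computing the DFT" already implicitly means "computing it to some number of bits $b$".

```python
# A toy illustration, not a proposed algorithm: textbook radix-2 FFT in
# double precision.  The roots of unity are irrational, so an integer
# input survives a forward+inverse transform only approximately.
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for j in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * j / n)   # irrational for most j, n
        out[j] = even[j] + w * odd[j]
        out[j + n // 2] = even[j] - w * odd[j]
    return out

def ifft(a):
    """Inverse transform via conjugation."""
    n = len(a)
    return [x.conjugate() / n for x in fft([y.conjugate() for y in a])]

if __name__ == "__main__":
    n = 1024
    x = [float((7 * i) % 101) for i in range(n)]               # integer-valued input
    print(max(abs(y - xi) for y, xi in zip(ifft(fft(x)), x)))  # tiny but (typically) nonzero
```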

Or does every instance of "FFT" in the literature actually mean "fast number-theoretic transform"?[2]

See my related questions on the complexity of Gaussian elimination and Euclidean shortest paths.

[1] It should really be called (some prefix of) the Gauss-Runge-König-Yates-Stumpf-Danielson-Lánczos-Cooley-Tukey algorithm.

[2] And if so, why do most textbooks describe only the complex-number algorithm?

Jeffε
  • I think your claim that the algorithm is incorrectly named detracts from the good question that follows. – Tyson Williams Sep 12 '11 at 14:41
  • +1 An eye opener. I always blindly accepted the $O(n \log n)$ time complexity for DFT. – Mohammad Al-Turkistany Sep 12 '11 at 15:37
  • Totally awesome question. – Suresh Venkat Sep 12 '11 at 16:58
  • Is the variable $b$ really needed for a proper definition? If I understand correctly, the addition and multiplication operations are on words of infinite precision, and isn't it sufficient to place different DFT coefficients in different words for a proper definition? And in fact, the actual question on complexity is to minimize the number of real or complex multiplications and additions. – v s Sep 12 '11 at 17:13
  • 1
    I think that's his point: in theory you don't have to worry about $b$, but in any ACTUAL implementation you DO have to worry about it and the error that might be incurred. – Suresh Venkat Sep 12 '11 at 17:17
  • Thank you for the clarification. A straightforward approach that is actually used is to represent each word to the necessary bits of precision (this is the best that can be done) and compute using the FFT algorithm. A variant that is probably useful for efficient use of word space in signal-processing processors is the following: supposing the intermediate word sizes can be expanded to $kb$ bits while the final representation is $b$ bits, what is the complexity? This would probably help in designing or modifying existing instruction sets in fixed/floating-point processors. – v s Sep 12 '11 at 17:27
  • 1
    Actually this is a good question: each additional bit of precision adds $3$ dB to the signal strength (a factor of $2$). So I think the question will be most useful if the intermediate word sizes can be expanded! – v s Sep 12 '11 at 17:29
  • Just an example: any additional signal strength is a real 'big' deal in communication systems. If, for instance, your algorithm provides $1$ additional bit of accuracy over existing results, then if I understand correctly you have increased signal quality by roughly a factor of $2$, and hence you have doubled (at least increased by a constant multiple, since you have to consider variables like beam patterns etc.) cell phone coverage, probably doubled (at least increased by a constant multiple) revenue for service providers, or cut error-correction complexity in half. – v s Sep 12 '11 at 17:39
  • 3
    Computable analysis has considered this and related questions. This paper produces a complexity bound for computation of the Fourier transform within the framework of Weihrauch's Type II effectivity. The bound is that it is linear in the presentation of the (infinite, real-valued) input. Both the input and the output are defined wrt precision parameters in this system, so there may be a way to translate this into the RAM model. – Aaron Sterling Sep 12 '11 at 17:48
  • 3
    Have a look at Method A in the paper of Schönhage and Strassen on integer multiplication. It uses complex Fourier transforms with bounded precision. I think, it is also described in Knuth Vol. 2. – Markus Bläser Sep 12 '11 at 19:29
  • Jeff, having just taught the FFT in the graduate algorithms course that you regularly teach, let me say that one reason to use complex numbers is that most students are familiar with them. Bringing up finite fields etc. takes time. The issue of computing an exact convolution for integer sequences came up and I simply pointed them to the Wikipedia article. For signal processing I guess floating point arithmetic is OK, and it appears that the numerical properties of the FFT have been analyzed, though I do not know the details. – Chandra Chekuri Sep 12 '11 at 20:51
  • in my class, I find that complex numbers and finite fields draw the same kind of glazed look :) – Suresh Venkat Sep 13 '11 at 10:52
  • 2
    Markus, Aaron: convert to answers ? – Suresh Venkat Sep 13 '11 at 10:52
  • Chandra: I agree that the complex number version is more familiar (despite the glazed eyes that both Suresh and I see). My question is why standard references don't include another half page explaining and motivating the modular-arithmetic version. – Jeffε Sep 13 '11 at 12:04
  • Aaron: The paper you mention describes a non-standard model of computation for integrable functions and establishes a linear-time(!) algorithm for the CONTINUOUS Fourier transform in that model. It's completely unclear how to map that result onto the integer-RAM complexity of the discrete Fourier transform. – Jeffε Sep 13 '11 at 12:08
  • 1
    v s: No, my actual question is how to minimize the number of operations on the standard integer RAM. (I'm well aware that this is not the standard question, which is asked and answered in standard textbooks.) It's perfectly fine to expand to (say) $2^b$ bits of precision in the intermediate values, if that's what's required, as long as you remember to analyze the resulting arithmetic operations correctly. – Jeffε Sep 13 '11 at 12:10
  • Jeff, Dasgupta etal book has an exercise discussing the modular-arithmetic version. – Chandra Chekuri Sep 13 '11 at 14:30
  • 1
    @Jeff: The operation is linear wrt the presentation. Type II effectivity is an attempt to avoid the problems in the real-RAM model, where, eg, one can determine whether two reals are equal in one time step. The function in the paper is only continuous in the sense that the FT function really takes as input a naming function, which, when given a precision parameter, outputs a name of a real to that precision. I'm not saying I know how to map TTE to the integer-RAM model, but I wouldn't be surprised if that has been considered somewhere. I'll take a look. – Aaron Sterling Sep 13 '11 at 16:03
  • 1
    This is the kind of question that should have been studied in computable analysis and complexity of real functions, but I couldn't find them in the standard references (Ko's and Weihrauch's books). – Kaveh Sep 13 '11 at 17:51
  • 1
    Possibly, Gauss-Runge-König-Yates-Stumpf-Danielson-Lánczos-Good-Cooley-Tukey; Cooley and Tukey describe their algorithm as an application of an algorithmic approach due to Good, although "derived and presented in a rather different form". – Ken Clarkson Sep 14 '11 at 21:58
  • Do you know that there exist versions of the FFT not only over $\mathbb{C}$ but also over finite fields and other rings? For many applications these versions are more useful. – Klim Sep 22 '11 at 09:04
  • @Klim: Sure; that's the "number theoretic transform". – Jeffε Sep 24 '11 at 16:49

2 Answers

9

This answer is a variant of the analysis of the first algorithm ("Methode A") by Schönhage and Strassen for multiplication of long integers.

Assume we want to compute an FFT of length $K = 2^k$. Scale your input such that all values are smaller than 1. Let us first assume that we compute with $m$-bit fixed point arithmetic ($m$ bits after the binary point). Let $\delta = 2^{1/2 -m}$ be the ("complex") unit of least position. Let $\omega = \exp(2\pi i/K)$.

1) One can compute approximations $\omega_j'$ such that $|\omega_j' - \omega^j| \le (2k-1)\delta$ for all $0 \le j \le K-1$. This can be done in time $O(K M(m))$, where $M(m)$ is the time needed to multiply $m$-bit numbers (see Knuth, Vol. 2, 3rd ed., page 309).

If standard integer RAM means logarithmic cost, then $M(m) = O(m \log m)$. If standard integer RAM means word RAM, then $M(m) = O(m)$. (Schönhage and Strassen show in "Methode A" how to reduce, in linear time, the multiplication of $m$-bit numbers to $m$ multiplications of $O(\log m)$-bit numbers. The latter can be done at unit cost.)
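
Here is a small sketch of the construction in 1) as I read it; double precision stands in for exact $m$-bit fixed point (so it is only meaningful for $m$ well below $53$), and it spends $O(k)$ truncated multiplications per root rather than the $O(1)$ amortized of the reference, which does not affect the overall bound in 4).

```python
# Hedged sketch of step 1: approximate omega^j in m-bit fixed point,
# multiplying at most k truncated seed values omega^(2^t) per power,
# so the accumulated error stays O(k * delta).
import cmath

def fix(z, m):
    """Truncate a complex value to m fractional bits per component."""
    scale = 1 << m
    return complex(int(z.real * scale) / scale, int(z.imag * scale) / scale)

def fixed_mul(x, y, m):
    """Multiply two fixed-point complex values and truncate the product."""
    return fix(x * y, m)

def root_table(K, m):
    """Approximate omega^j, omega = exp(2*pi*i/K), for j = 0..K-1."""
    k = K.bit_length() - 1
    # omega^(2^t), t = 0..k-1, each truncated from the exact value,
    # so every seed is within delta of the true root.
    seeds = [fix(cmath.exp(2j * cmath.pi * (1 << t) / K), m) for t in range(k)]
    table = []
    for j in range(K):
        w = complex(1.0, 0.0)
        for t in range(k):                 # multiply seeds per the bits of j
            if (j >> t) & 1:
                w = fixed_mul(w, seeds[t], m)
        table.append(w)
    return table

if __name__ == "__main__":
    K, m, k = 1 << 10, 40, 10
    delta = 2 ** (0.5 - m)
    tab = root_table(K, m)
    worst = max(abs(tab[j] - cmath.exp(2j * cmath.pi * j / K)) for j in range(K))
    print(worst, (2 * k - 1) * delta)      # observed max error vs. the (2k-1)*delta target
```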

2) The classical Cooley-Tukey FFT performs operations of the form $a = b + \omega^j c$. With $m$-bit fixed point arithmetic, these operations become $a' = \mathrm{truncate}(b' + \omega_j' c')$. If we know $b'$ and $c'$ up to an error of $\epsilon$, we get $a'$ up to an error of $2\epsilon + 2k\delta$.

3) Using induction, it is easy to see that we get the final result with error at most $(2^k - 1) \cdot 2k\delta$. To get precision $b$ in the end, it suffices to take $m \ge k + \log k + b + O(1)$.
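
Spelled out (taking the input values themselves as exact): if $\epsilon_t$ denotes the error after $t$ of the $k$ levels of butterflies, then
$$\epsilon_0 = 0, \qquad \epsilon_{t+1} \le 2\epsilon_t + 2k\delta, \qquad\text{hence}\qquad \epsilon_k \le (2^k - 1)\cdot 2k\delta.$$
Requiring $\epsilon_k \le 2^{-b}$ with $\delta = 2^{1/2-m}$ gives $2^k \cdot 2k \cdot 2^{1/2-m} \le 2^{-b}$, i.e. $m \ge k + \log k + b + O(1)$.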

4) Thus the final running time is $O(K k M(k+b))$.

This should also work with floating point numbers: 1) can still be done with fixed point arithmetic, 2) is also true for floating point numbers.


In fixed point arithmetic, I think, it can even be done faster. First we reduce the computation of the FFT to the multiplication of polynomials using Bluestein's trick. The length of the coefficients needed to get the desired precision should be $O(k + b)$. Then we reduce the multiplication of polynomials to the multiplication of long integers, as sketched below. (Concatenate the coefficients into one long number, separating them by blocks of zeros of length $O(k+b)$.) The length of the integers is $O(K(k+b))$.
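
As an illustration of the packing step only (my own sketch, assuming nonnegative integer coefficients; the block length is chosen just large enough that coefficients of the product cannot overlap):

```python
# Hedged sketch of Kronecker substitution: pack polynomial coefficients
# into one long integer, multiply, and unpack the product coefficients.
def poly_mul_kronecker(f, g):
    """Multiply polynomials f, g given as lists of nonnegative int coefficients."""
    n = max(len(f), len(g))
    c = max(max(f, default=0), max(g, default=0)).bit_length()
    # Each product coefficient is < n * 2^(2c), so blocks this wide cannot overlap.
    block = 2 * c + n.bit_length() + 1
    def pack(p):
        x = 0
        for i, a in enumerate(p):
            x |= a << (i * block)
        return x
    prod = pack(f) * pack(g)              # one long-integer multiplication
    mask = (1 << block) - 1
    return [(prod >> (i * block)) & mask for i in range(len(f) + len(g) - 1)]

if __name__ == "__main__":
    print(poly_mul_kronecker([1, 2, 3], [4, 5]))   # [4, 13, 22, 15]
```

Here Python's built-in long-integer multiplication merely stands in for the fast integer multiplication used in the reduction above.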

Markus Bläser
  • So from point (4), setting K = n and b = O(log n), and assuming we're running on the word RAM, we get a running time of $O(n \log^2 n)$. Right? – Jeffε Sep 14 '11 at 12:56
  • Yes. The second algorithm even yields $O(n \log n)$, assuming that precision $O(k+b)$ is sufficient. (I do not see any point why this is not sufficient, but I did not do the details.) – Markus Bläser Sep 14 '11 at 14:02
  • 2
    BTW, if $b$ is as small as $O(\log n)$, then also the first algorithm gives running time $O(n \log n)$ since $M(O(\log n)) = 1$. – Markus Bläser Sep 14 '11 at 15:49
  • I happened to look at the book of Aho, Hopcroft and Ullman on "The Design and Analysis of Algorithms" and they discuss the algorithm in the bit model and related issues in some detail. – Chandra Chekuri Sep 21 '11 at 18:38
  • But as far as I remember, they only discuss the "number-theoretic FFT" in the bit-model. – Markus Bläser Sep 26 '11 at 13:18
  • We should note that the assumption of significand size $b = O(\log n)$ is totally arbitrary. An equally plausible situation is $b = O(w)$. Then, assuming we have floating-point numbers that fit in a word (of size $w$) and assuming that word operations generally take $O(1)$ time each, the time will be $O(n \cdot \log^k(n))$ with $k = 1$, given that we want to support all numbers representable by such a floating-point number. Also, of course, Bläser is being generous to Jeffε in that technically $O(n \cdot \log(n))$ is in $O(n \cdot \log^2(n))$. – bzliu94 Jan 28 '18 at 04:10
8

This is not a complete answer, but I can point you to some relevant papers and also partially explain why it's not so easy to extract an answer to your specific question from the literature.

Let me start by asking, why do you want to know the answer to this question? Typically, the people who find themselves caring about this sort of issue are those faced with actually implementing a high-performance FFT for a practical application. Such people care less about asymptotic complexity in some idealized computational model than about maximizing performance under their particular hardware and software constraints. For example, the developers of the Fastest Fourier Transform in the West write in their paper:

The best choice depends upon hardware details like the number of registers, latency and throughput of instructions, size and associativity of caches, structure of the processor pipeline, etc.

These are issues that theorists typically don't want to sully their hands with, but they are of great importance in actual implementations. If a theorist declares, "I've figured out the absolute best asymptotic bit complexity in the RAM model," the practitioner might say, "That's nice," but may find such a theoretical result useless for his or her purposes.

Having said that, I think that your best bet is to look at the numerical analysis literature. For example, Tasche and Zeuner have taken a close look at the numerical stability of the FFT algorithm. This may still not be exactly what you want, because the general consensus among practitioners seems to be that to achieve a given amount of numerical precision, the best practical approach is to precompute certain numbers called "twiddle factors" to high accuracy. If you're doing only one FFT, then this is not going to be the fastest approach, because you don't get to amortize the cost of your one-time precomputation over a large number of FFT computations. Still, their analysis of the worst-case roundoff error should be relevant to your question.
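
For illustration only (my own sketch, not FFTW's or Tasche and Zeuner's code), the precompute-once, reuse-many-times pattern looks like this:

```python
# Hedged sketch: precompute the twiddle factors once per transform size,
# then reuse the table across many transforms of that size.
import cmath

def make_twiddles(n):
    """Precompute exp(-2*pi*i*j/n) for j = 0..n/2-1; done once per size n."""
    return [cmath.exp(-2j * cmath.pi * j / n) for j in range(n // 2)]

def fft_iterative(a, tw):
    """Iterative radix-2 FFT using the precomputed twiddle table tw."""
    n = len(a)
    a = list(a)
    j = 0                                  # bit-reversal permutation
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:                     # butterfly passes
        step = n // length
        for start in range(0, n, length):
            for t in range(length // 2):
                w = tw[t * step]           # table lookup, no exp() here
                u, v = a[start + t], w * a[start + t + length // 2]
                a[start + t] = u + v
                a[start + t + length // 2] = u - v
        length <<= 1
    return a

if __name__ == "__main__":
    n = 8
    tw = make_twiddles(n)                  # pay the table cost once...
    for x in ([1] + [0] * 7, list(range(8))):
        print(fft_iterative(x, tw))        # ...and reuse it for every transform
```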

Timothy Chow
  • I bet people would be interested in knowing whether they can squeeze out $1$ extra bit of precision, say, if they can do a $1024$-point FFT (OFDM in WLANs) with, say, $100$ extra multiplications over the current algorithms. – v s Sep 13 '11 at 15:57
  • 1
    I'm interested as a purely theoretical question, in the interest of correct and honest scholarship. It is quite common to read "and here we use an FFT, which as everyone knows runs in O(n log n) time" in the middle of an otherwise purely combinatorial algorithm, analyzed in terms of pointer traversals and O(log n)-bit integer arithmetic. If, in fact, integer convolution can be performed in O(n log n) time using a slight variant of the FFT, this is perhaps forgivable but still sloppy. If not, any poor schmuck who tries to implement the algorithm is going to get THE WRONG ANSWER. – Jeffε Sep 14 '11 at 12:51
  • And of course, I don't expect the answer to my question to have any impact in practice whatsoever. – Jeffε Sep 14 '11 at 12:58
  • 2
    Jeff, as far as honest scholarship is concerned, isn't it enough to say that the FFT requires O(n log n) ring operations? That is the natural way to measure the complexity of the FFT algorithm. I don't see the motivation for converting everything into one particular model of computation. Is there some theorem you're trying to prove where it's crucial to keep track of the number of bits of precision? As for your poor schmuck, I don't buy that he'll get the "wrong answer." In any actual implementation, the question you're asking here is very unlikely to be the dominant concern. – Timothy Chow Sep 14 '11 at 14:53
  • Tim: Sure, it's enough to say $O(n \log n)$ ring operations if you're analyzing the FFT in isolation. But if the FFT is just one component of a larger algorithm, reporting the running time of the larger algorithm requires a consistent model of computation for all its constituent subroutines, including the FFT. For example, "convolve the two integer sequences using the Cooley-Tukey FFT algorithm and then insert the resulting coefficients into a hash table" (to make up a totally fake example) is asking for trouble. – Jeffε Sep 15 '11 at 12:10