One-dimensional Fast Fourier Transform (FFT) operations work fastest on grids whose size is divisible by a power of two. Because of this, padding grids (that are not already sized to a power of two) so that their size is the next highest power of two can speed up operations. While this works well for one-dimensional grids, it does not work well for two-dimensional grids.
For a two-dimensional grid, there are certain pad sizes that work better than others. Therefore, the need exists to generalize a strategy for determining optimal pad sizes. There are three steps in the FFT algorithm. The first is to perform a one-dimensional transform on each row in the grid. The second step is to transpose the resulting matrix. The third step is to perform a one-dimensional transform on each row in the resulting grid. Steps one and three both benefit from padding the row to the next highest power of two, but the second step needs a novel approach.
An algorithm was developed that struck a balance between optimizing the grid pad size with prime factors that are small (which are optimal for one-dimensional operations), and with prime factors that are large (which are optimal for two-dimensional operations). This algorithm optimizes based on average run times, and is not fine-tuned for any specific application. It increases the amount of times that processor-requested data is found in the set-associative processor cache. Cache retrievals are 4–10 times faster than conventional memory retrievals.
The tested implementation of the algorithm resulted in faster execution times on all platforms tested, but with varying sized grids. This is because various computer architectures process commands differently. The test grid was 512×512. Using a 540×540 grid on a Pentium V processor, the code ran 30 percent faster. On a PowerPC, a 256×256 grid worked best. A Core2Duo computer preferred either a 1040×1040 (15 percent faster) or a 1008×1008 (30 percent faster) grid.
There are many industries that can benefit from this algorithm, including optics, image-processing, signal-processing, and engineering applications.
This work was done by Bruce H. Dean, David L. Aronstein, and Jeffery S. Smith of Goddard Space Flight Center.
This invention is owned by NASA, and a patent application has been filed. Inquiries concerning nonexclusive or exclusive license for its commercial development should be addressed to the Patent Counsel, Goddard Space Flight Center, (301) 286-7351. Refer to GSC-15678-1.