pmeerw's blog

Mon, 19 Sep 2011

FFT performance on ARM Cortex-A8

All measurements are in seconds for 10000 repetitions of each transform, so lower is better. The benchmarks were run on a BeagleBoard-xM clocked at 900 MHz with RunFast mode not enabled. The code was compiled using -O2 -march=armv7-a -ffast-math -fPIC -mfloat-abi=softfp -mfpu=neon.

complex-to-complex

length (N)  ooura  djb    kiss   libav  fftw2  fftw3  fftw3/neon  fftw3/new
2048        10.22  11.56  14.2   1.0    10.92  16.16  2.82        2.87
1024        4.5    5.2    5.61   0.46   5.11   7.22   1.16        1.16
512         2.07   2.3    2.98   0.2    2.59   2.89   0.36        0.34
256         0.88   1.0    1.12   0.08   1.01   1.12   0.12        0.11
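The totals can be compared with other published FFT benchmarks by converting them to a per-transform time. They can also be converted to a nominal "mflops" rate using the 5 N log2 N operation count that the benchFFT project uses for complex transforms. This is a conventional figure, not a count of the operations each library actually executes. Here T is a table entry in seconds and t_1 is the time for one transform. The N = 2048 entries for libav and ooura work out as follows:

```latex
t_1 = \frac{T}{10000}, \qquad
\mathrm{mflops} = \frac{5 N \log_2 N}{t_1 \,/\, 1\,\mu\mathrm{s}}

% libav, N = 2048, T = 1.0 s
t_1 = \frac{1.0\,\mathrm{s}}{10000} = 100\,\mu\mathrm{s}, \qquad
\mathrm{mflops} = \frac{5 \cdot 2048 \cdot 11}{100} = \frac{112640}{100} = 1126.4

% ooura, N = 2048, T = 10.22 s
t_1 = \frac{10.22\,\mathrm{s}}{10000} = 1022\,\mu\mathrm{s}, \qquad
\mathrm{mflops} = \frac{112640}{1022} \approx 110.2
```

Because every library is measured with the same N, the ratio of the mflops figures equals the inverse ratio of the raw entries. By either measure, libav is roughly ten times faster than ooura at this size.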

real-to-complex

length (N)  ooura  djb    kiss   libav  fftw2  fftw3  fftw3/neon  fftw3/new
2048        5.37   -      6.91   0.7    4.71   7.37   7.38        7.38
1024        2.49   -      3.45   0.32   2.19   3.14   3.13        3.13
512         1.09   -      1.43   0.2    1.09   1.2    1.2         1.2
256         0.49   -      0.72   0.08   0.41   0.46   0.46        0.47

oourafft (version of 2006/12/28) is free and available at http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html.
djbfft is available at http://cr.yp.to/djbfft.html.
kissfft is BSD-licensed and available at http://sourceforge.net/projects/kissfft/.
fftw2 (version 2.1.5) is GPL-licensed and available at http://www.fftw.org/. fftw3 is version 3.2.2, also GPL-licensed. fftw3/neon is fftw 3.2.2 with ARM/NEON patches applied. fftw3/new is the GPL-licensed version 3.3.1-beta, which includes ARM/NEON support.

posted at: 14:00 | path: /programming
