pmeerw's blog

Fri, 16 Sep 2011

KissFFT and ARM NEON

I added ARM NEON SIMD support to KissFFT. Beware: this primarily enables running 2 or 4 FFTs in parallel; it does not necessarily speed up a single transform (well, in fact it does :-))

Runtime for real-to-complex transform (N=256, forward and inverse transform, 10000 repetitions) in seconds:
float    float (RunFast)    float32x2_t    float32x4_t
1.62     1.22               0.66           0.98
Note: float32x2_t and float32x4_t compute two and four FFTs in parallel, respectively, so those timings cover two and four transforms per repetition!
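To make "FFTs in parallel" concrete, here is a minimal portable sketch of the data layout such a SIMD build implies: each element of the input buffer holds one sample from each of several independent signals, one per vector lane. This is not the code from the patch. `lane_vec`, `pack_lanes`, `unpack_lanes` and `lane_add` are hypothetical names, and a plain struct stands in for `float32x4_t` so the example compiles without NEON. That the patch uses exactly this lane-per-signal layout is an assumption based on the note above.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* 4 mimics float32x4_t; 2 would mimic float32x2_t */

/* Hypothetical stand-in for a NEON vector: one sample from each of LANES signals. */
typedef struct { float lane[LANES]; } lane_vec;

/* Interleave LANES independent signals of length n: out[i].lane[k] = in[k][i]. */
static void pack_lanes(const float *in[LANES], size_t n, lane_vec *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < LANES; k++)
            out[i].lane[k] = in[k][i];
}

/* Inverse of pack_lanes: split the interleaved buffer back into LANES signals. */
static void unpack_lanes(const lane_vec *in, size_t n, float *out[LANES])
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < LANES; k++)
            out[k][i] = in[i].lane[k];
}

/* Lane-wise add: on NEON this would be a single vector instruction,
   doing the same arithmetic step for all LANES signals at once. */
static lane_vec lane_add(lane_vec a, lane_vec b)
{
    lane_vec r;
    for (size_t k = 0; k < LANES; k++)
        r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}
```

Every arithmetic step of the transform then operates on whole vectors, which is why the batch of signals finishes together while a lone signal would leave the other lanes unused.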

posted at: 15:33
