I added ARM NEON SIMD support to kiss FFT. Beware: this primarily enables computing 2 or 4 independent FFTs in parallel (one per SIMD lane); it does not necessarily speed up a single transform (well, in fact it does).
Runtime in seconds for the real-to-complex transform (N = 256, forward and inverse transform, 10000 repetitions):
float | float (RunFast) | float32x2_t | float32x4_t
1.62  | 1.22            | 0.66        | 0.98
posted at: 15:33