The following code (from math_runfast.c) improves
kiss FFT's real-to-complex transform (N=256) runtime from
1.62 to 1.22 seconds (forward and inverse transform, 10000 repetitions).
void enable_runfast() { #ifdef __arm__ static const unsigned int x = 0x04086060; static const unsigned int y = 0x03000000; int r; asm volatile ( "fmrx %0, fpscr \n\t" //r0 = FPSCR "and %0, %0, %1 \n\t" //r0 = r0 & 0x04086060 "orr %0, %0, %2 \n\t" //r0 = r0 | 0x03000000 "fmxr fpscr, %0 \n\t" //FPSCR = r0 : "=r"(r) : "r"(x), "r"(y) ); #endif }
In RunFast mode the VFP11 coprocessor, there are no user exception traps, rounding behaviour is slightly different (no negative zeros) and NaNs are handled differently.
Ideal speedup on Cortex-A8 for RunFast is reportedly 40%. There is a patch for eglibc on meego: http://permalink.gmane.org/gmane.comp.handhelds.meego.devel/7937
posted at: 13:13 | path: /programming | permanent link