GCC has supported profile-guided optimization for some time now. I gave it a try on the MC-EZBC video codec, which can certainly use some speedup from optimization.
To generate an executable which collects runtime information, use
gcc -fprofile-generate -o prog_gen_gpo prog.c
Building and then running this instrumented executable on typical input creates files ending in *.gcda and *.gcno in your source tree.
In a second compilation pass, instruct gcc to make use of the runtime profile information
gcc -fprofile-use -o prog prog.c
The profiling options must be given in the compile and link stage!
Results for decoding 32 frames, Foreman sequence, on an Intel Core2 Duo, 2.66 GHz, compiled with gcc 4.3:
The following CFLAGS provide the best results:
-DNDEBUG -O3 -g -march=core2 -fomit-frame-pointer -fprofile-use -msse3 -mfpmath=sse
The new (with gcc 4.3) -march=core2 (or -march=native) helps a lot (2.1 sec), and -mfpmath=sse brings 0.5 sec.
For comparison: gcc 4.2.3 -O3 is dead slow, 46.2 sec!