pmeerw's blog

Thu, 17 Jul 2008

Profile-guided optimization (PGO) with GCC

GCC has supported profile-guided optimization for some time now. I gave it a try on the MC-EZBC video codec, which can certainly use some speedup from optimization :-).

To generate an executable which collects runtime information, use

gcc -fprofile-generate -o prog_gen_gpo prog.c
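This creates files ending in *.gcda and *.gcno in your source tree. The *.gcda files hold the actual profile data and are written when the instrumented program runs and exits, so run it on representative input before the second pass.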

In a second compilation pass, instruct gcc to make use of the runtime profile information:

gcc -fprofile-use -o prog prog.c
The profiling options must be given in the compile and link stage!

Results for decoding 32 frames of the Foreman sequence on an Intel Core2 Duo, 2.66 GHz, compiled with gcc 4.3:

The CFLAGS -DNDEBUG -O3 -g -march=core2 -fomit-frame-pointer -fprofile-use -msse3 -mfpmath=sse provide the best results. The -march=core2 option (or -march=native), new with gcc 4.3, helps a lot (2.1 sec), and -mfpmath=sse brings 0.5 sec.

For comparison: gcc 4.2.3 -O3 is dead slow, 46.2 sec!

posted at: 13:47 | path: /programming
