 pmeerw's blog
 
GCC has supported profile-guided optimization for some time now.
I gave it a try on the MC-EZBC video codec,
which can certainly use some speedup from optimization.
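To generate an executable which collects runtime information, use
    gcc -fprofile-generate -o prog_gen_gpo prog.c
Building and then running this executable creates files ending in
*.gcda and *.gcno in your source tree; the *.gcda files hold the profile data and are written when the program exits.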
In a second compilation pass, instruct gcc to make use of the runtime profile information:
    gcc -fprofile-use -o prog prog.c
The profiling options must be given at both the compile and the link stage!
Results for decoding 32 frames, Foreman sequence, on an Intel Core2 Duo, 2.66 GHz,
compiled with gcc 4.3:
The following CFLAGS provide the best results:
    -DNDEBUG -O3 -g -march=core2 -fomit-frame-pointer -fprofile-use -msse3 -mfpmath=sse
The new (with gcc 4.3) -march=core2 (or -march=native) helps a lot (2.1 sec),
and -mfpmath=sse brings 0.5 sec.
For comparison: gcc 4.2.3 -O3 is dead slow, 46.2 sec!
posted at: 13:47 | path: /programming | permanent link