Profiling-based optimization with GCC
Getting performance information with gprof (1)
gprof (1) gives you statistics about your program: total execution time, per-function CPU time, call counts, and so on.
In order to use gprof, you must compile and link with the -pg option.
You can profile optimized code, but you can't use the -fomit-frame-pointer option. For example:
$ g++ -Wall -O3 -mcpu=athlon-mp -ffast-math -pg -c mySource.cxx
$ g++ -pg -o myMightyProgram mySource.o -lz -lpng
Once done, run your program so that it writes its profile data (gmon.out) on exit, then run gprof like this:
$ gprof myMightyProgram -b --ignore-non-functions >profiling.txt
Sample gprof output:
Each sample counts as 0.001 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 26.90      0.39     0.16                             main
 22.67      0.53     0.13  7604961     0.00     0.00  CBuffer<cvertex>::operator[](unsigned)
  8.12      0.58     0.05     9054     0.01     0.03  CControl::process()
  0.85      0.58     0.01  7605360     0.00     0.00  CBuffer<unsigned>::operator[](unsigned)
  0.51      0.58     0.00     9053     0.00     0.00  DefaultInverseSqrt::operator()(float)
  0.51      0.59     0.00     9053     0.00     0.00  CCamera::apply(CCamera::ERotationMode)
...
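To see why a tiny accessor like CBuffer<cvertex>::operator[] can rank so high in the flat profile, here is a minimal sketch of that kind of code. Buffer, fillSequential and sum are hypothetical names made up for illustration; they are not the classes from the program profiled above. Each call to operator[] is cheap, but the loops call it once per element, so the call count (and the accumulated self time) grows with the data size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer class like the CBuffer seen in the
// sample output; not the original implementation.
template <typename T>
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(n) {}

    std::size_t size() const { return data_.size(); }

    // Tiny accessor: cheap per call, but the total adds up when it is
    // called once per element in every loop.
    T& operator[](std::size_t i) {
        assert(i < data_.size());
        return data_[i];
    }
    const T& operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }

private:
    std::vector<T> data_;
};

// Fills the buffer with 0, 1, 2, ... going through operator[] each time.
void fillSequential(Buffer<long>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) {
        buf[i] = static_cast<long>(i);
    }
}

// Sums all elements, again one operator[] call per element.
long sum(const Buffer<long>& buf) {
    long total = 0;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        total += buf[i];
    }
    return total;
}
```

Whether the accessor shows up as its own row in gprof depends on whether the compiler inlines it; when it is inlined, its time is charged to the calling function instead.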
Optimizing GCC output based on profiling information
GCC has two interesting options: -fprofile-arcs and -fbranch-probabilities.
When using profiling information, compilation becomes a 2-pass process:
- First pass: Gathering of profiling information:
- Compile your project with your usual optimization options, plus -fprofile-arcs.
$ g++ -Wall -O3 -mcpu=athlon-mp -ffast-math -fprofile-arcs -c mySource.cxx ...
- Run your program and exercise it in a "production-like" way, so that the recorded branch counts reflect real use. It will create a bunch of .da files (newer GCC releases name them .gcda).
- Second pass: Profiling-based optimization:
- Compile your project again, replacing -fprofile-arcs with -fbranch-probabilities.
$ g++ -Wall -O3 -mcpu=athlon-mp -ffast-math -fbranch-probabilities -c mySource.cxx ...
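As an illustration of what the two passes can work with, here is a small sketch of code with a heavily skewed branch. Stats, tally and makeSamples are hypothetical names invented for this example. During the first pass, the instrumented binary counts how often each side of the if is taken; during the second pass, the compiler can use those counts to favor the common path. How much this helps depends on the program and on how representative the training run was.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct Stats {
    std::size_t valid;     // number of non-negative samples
    std::size_t rejected;  // number of negative samples
    long total;            // sum of the valid samples
};

// Walks the samples once. With typical input the "rejected" branch is
// rare, which is exactly the kind of fact arc profiling records.
Stats tally(const std::vector<int>& samples) {
    Stats s = {0, 0, 0};
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (samples[i] < 0) {
            ++s.rejected;            // rare path
        } else {
            ++s.valid;               // common path
            s.total += samples[i];
        }
    }
    return s;
}

// Builds a "production-like" input of n samples in which the last value
// of every block of `period` samples is negative. period must be > 0.
std::vector<int> makeSamples(std::size_t n, std::size_t period) {
    assert(period > 0);
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = (i % period == period - 1) ? -1 : static_cast<int>(i % 100);
    }
    return v;
}
```

If the training run fed tally mostly negative values instead, the recorded probabilities would point the other way, which is why the first pass should resemble real usage.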
Other interesting experimental GCC features are -fssa, -fssa-ccp, and -fssa-dce (see man gcc).
Also, here is a little profiler I wrote for ia32/ia64.