Profiling-based optimization with GCC

Getting performance information with gprof(1)

gprof(1) gives you statistics about your program: total execution time, per-function CPU time, and so on.
To use gprof, you must both compile and link with the -pg option.
You can profile optimized code, but you cannot combine -pg with the -fomit-frame-pointer option. For example:

$ g++ -Wall -O3 -mcpu=athlon-mp -ffast-math -pg -c mySource.cxx
$ g++ -pg -o myMightyProgram mySource.o -lz -lpng

Then run your program once so that it writes its profile data (gmon.out) to the current directory, and run gprof like this:

$ gprof myMightyProgram -b --ignore-non-functions >profiling.txt

Sample gprof output:

Each sample counts as 0.001 seconds.   
   %   cumulative   self              self     total           
  time   seconds   seconds    calls  ms/call  ms/call  name    
  26.90      0.39     0.16                             main
  22.67      0.53     0.13  7604961     0.00     0.00  CBuffer<cvertex>::operator[](unsigned)
   8.12      0.58     0.05     9054     0.01     0.03  CControl::process()
   0.85      0.58     0.01  7605360     0.00     0.00  CBuffer<unsigned>::operator[](unsigned)
   0.51      0.58     0.00     9053     0.00     0.00  DefaultInverseSqrt::operator()(float)
   0.51      0.59     0.00     9053     0.00     0.00  CCamera::apply(CCamera::ERotationMode)
...

Optimizing GCC output based on profiling information

GCC has two interesting options: -fprofile-arcs and -fbranch-probabilities.
When using profiling information, compilation becomes a 2-pass process:

  • First pass: Gathering of profiling information:
    • Compile your project with your usual optimization options, plus -fprofile-arcs.
      $ g++ -Wall -O3 -mcpu=athlon-mp -ffast-math -fprofile-arcs -c mySource.cxx
      ...
      
    • Run your program and exercise it in a "production-like" way. It will create a set of .da files (newer GCC releases name them .gcda).
  • Second pass: Profiling-based optimization:
    • Compile your project again, replacing -fprofile-arcs with -fbranch-probabilities.
      $ g++ -Wall -O3 -mcpu=athlon-mp -ffast-math -fbranch-probabilities -c mySource.cxx
      ...
      

Other interesting experimental GCC features are -fssa, -fssa-ccp, and -fssa-dce (see man gcc).
Also, here is a little profiler I wrote for ia32/ia64.
