Python Profiling: Deterministic vs Statistical Profilers

Different Python profiling tools use different methodologies for gathering performance data and hence incur different runtime overheads. Before choosing a profiling tool, it is helpful to understand the two commonly employed techniques for collecting performance data:

  • Deterministic profiling: Deterministic profilers execute trace functions at various points of interest (e.g. function call, function return) and record precise timings of these events. Typically this requires source-code instrumentation, but Python provides hooks (optional callbacks) that can be used to install trace functions (a minimal sketch of this mechanism follows this list).
  • Statistical profiling: Instead of tracking every event (e.g. every function call), statistical profilers interrupt the application periodically and collect samples of the execution state (call-stack snapshots). The call stacks are then analysed to determine how execution time is distributed across different parts of the application.

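As a rough illustration of the hook mechanism mentioned above (a sketch of the idea, not the internals of any particular profiler), the Python-level hook sys.setprofile installs a callback that fires on every function call and return; the pure-Python profile module is built on this hook, while cProfile uses the equivalent C-level hook:

```python
import sys
import time

def tracer(frame, event, arg):
    # invoked by the interpreter for profiling events such as 'call' and 'return'
    if event in ("call", "return"):
        print("%-7s %-12s at %.6f s" % (event, frame.f_code.co_name, time.perf_counter()))

def work():
    total = 0
    for i in range(1000):
        total += i * i
    return total

sys.setprofile(tracer)   # install the profiling hook
work()
sys.setprofile(None)     # remove the hook again
```
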
There are several deterministic profilers available, based on the built-in profile and cProfile modules. These profilers can gather high-resolution profiling information, but they have one main drawback: high overhead. If the application makes a large number of function calls, the profiler ends up collecting a lot of records. And if those functions are small, the measured execution times can be inaccurate because the measurement overhead itself becomes significant relative to the work being measured.

If we are developing a small application or debugging on a workstation, this overhead is acceptable. But in a production environment we do not want to use such tools if they cause a noticeable performance impact. This is where statistical profilers come to the rescue. These profilers sample the effective instruction pointer (or the Python call stack) to determine where execution time is spent. One can also adjust the sampling interval as needed: keep the sampling rate very low for uninteresting parts of the application and increase it only when high-resolution profiling data is needed. This is very helpful in a production environment, where you can adjust the profiling settings dynamically.
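
To make the idea concrete, here is a minimal in-process sampler sketch (Unix-only, with hypothetical names; not how PyFlame, Py-Spy or the other tools mentioned below are actually implemented): a timer signal periodically interrupts the program and records the current Python call stack.

```python
import collections
import signal
import traceback

samples = collections.Counter()

def take_sample(signum, frame):
    # 'frame' is wherever execution happened to be when the timer fired;
    # the joined stack string mimics the "collapsed stack" format used for flamegraphs
    stack = ";".join(f.name for f in traceback.extract_stack(frame))
    samples[stack] += 1

def busy():
    # stand-in workload so the sampler has something to observe
    total = 0
    for i in range(5_000_000):
        total += i * i
    return total

signal.signal(signal.SIGPROF, take_sample)
signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)   # sample every 10 ms of CPU time

busy()

signal.setitimer(signal.ITIMER_PROF, 0)            # stop sampling
for stack, count in samples.most_common():
    print("%5d  %s" % (count, stack))
```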

Consider the toy example below with the call chain shown in the following figure:

[Figure: sample toy example call-chain]

And here is sample code for demonstration; the listing below is a minimal sketch along these lines (the function names and the simple arithmetic workload are illustrative assumptions, not necessarily the exact code behind the timings that follow):
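
```python
import time

# Hypothetical call chain: main() -> level_one() -> level_two() -> level_three().
# Many small, cheap calls are made so that per-call profiling overhead shows up clearly.

def level_three(x):
    return x * x              # tiny amount of work per call

def level_two(x):
    return level_three(x) + 1

def level_one(x):
    return level_two(x) + 1

def main():
    total = 0
    for i in range(5_000_000):
        total += level_one(i)
    return total

if __name__ == "__main__":
    start = time.perf_counter()
    main()
    print("elapsed: %.2f s" % (time.perf_counter() - start))
```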

If we run the above example without any profiler, it takes roughly 1.8 seconds on my workstation.

When we run the same example under cProfile, the execution time increases to roughly 4.5 seconds.
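
For reference, this kind of measurement can be reproduced by running the script under cProfile, for example from the command line with "python -m cProfile -s cumtime toy_example.py" (toy_example.py being the assumed file name of the sketch above), or from Python:

```python
import cProfile
from toy_example import main   # 'toy_example' is the assumed file name of the sketch above

# run main() under the deterministic profiler and sort the report by cumulative time
cProfile.run("main()", sort="cumtime")
```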

So there is roughly a 2.5x slowdown. If, instead of cProfile, we use a sampling profiler like PyFlame, Py-Spy or Plop, the picture changes considerably.

With these sampling profilers the overhead is considerably lower (around 10%). This is why sampling profilers are the more interesting option for large, complex applications running in a production environment. Read more about this on the Uber engineering blog.
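
For reference, a flamegraph like the ones below can typically be generated with an invocation along the lines of "py-spy record -o profile.svg -- python toy_example.py" (or "py-spy record -o profile.svg --pid <PID>" to attach to an already running process); the exact flags may vary between versions, and toy_example.py is the assumed file name of the sketch above. PyFlame works similarly by attaching to a running process via ptrace.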

Here is the flamegraph produced by PyFlame:

And this one from Py-Spy:

If you are wondering about the different profiling tools available for Python applications, here is a short list: cProfile, PyCallGraph, RunSnakeRun, vprof, line_profiler, What Studio Profiling, PyFlame, plop, pprofile, StackImpact Python Profiler, Py-Spy, memory_profiler, vmprof, Pyinstrument, Python-Flamegraph, PyVmMonitor.

Similar to the Summary of Profiling Tools For Parallel Applications, a comprehensive summary of 30 different Python profilers will follow soon!