Sunday, 21st April 2024: Having worked with different HPC performance tools, I have always been intrigued by Linux perf. Over the years I have curiously watched Brendan Gregg's talks and learned a lot from his various examples. As a performance engineer, I find Linux perf's capabilities fascinating. However, I have struggled to fully engage with it. This disconnect can be attributed to several factors. For instance, perf is often unavailable on HPC systems (or users lack the permissions needed to run it), and its focus is primarily on single-server analysis rather than on distributed applications executing across a large number of servers.…
Methods
Navigating the Complexity of Large Codebases Using VTune + xdot (or perf + gprof2dot)
Sunday, 7th April 2024: Back in September 2013, when I started my journey at the Blue Brain Project, I was navigating relatively large codebases for the first time. I was eager to gain a comprehensive understanding of their code structure, execution workflow, and performance characteristics. During this period I started using Intel VTune with xdot/gprof2dot and found the combination extremely useful: I could generate detailed execution call graphs of applications and then sit down with the senior engineers to dive deep into both the structural and performance aspects of the code.…
C-Reduce: Systematically Tackling (Not Only) Compiler Bugs
Sunday, 14th January 2024: Happy to get this post out within the first month of 2024! I started writing it in November 2023 and was hoping to publish it sooner, but then my second daughter 👧 arrived, leaving me with much less free time. Now that I am finally getting back on schedule, I am delighted to bring this post to completion.
This blog post deviates a bit from the usual focus on performance-related aspects, and there's a specific reason for this.…
core-to-core-latency: A Nice Little Tool!
Saturday, 23rd September 2023: I've been staring at my blog for quite some time, and it keeps reminding me that it's been nearly two years since I last wrote new content here 😞. I have a few work-in-progress articles, but unfortunately they have remained incomplete. It's been challenging to find the long weekend hours needed to write the detailed posts that I really love.…
Python Profiling: Deterministic vs Statistical Profilers
Different Python profiling tools use different methodologies for gathering performance data and hence have different runtime overheads. Before choosing a profiler, it is helpful to understand two commonly employed techniques for collecting performance data:
- Deterministic profiling: Deterministic profilers execute trace functions at various points of interest (e.g. function call, function return) and record precise timings of these events. Typically this requires source code instrumentation, but Python provides hooks (optional callbacks) which can be used to insert trace functions.
- Statistical profiling: Instead of tracking every event (e.g. every function call), statistical profilers periodically interrupt the application and collect samples of its execution state (call stack snapshots).
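To make the two techniques concrete, here is a minimal sketch assuming CPython 3. The helper names `deterministic_profile` and `statistical_profile`, the `interval` parameter and the `fib`/`busy` workload are hypothetical, chosen only for illustration. The deterministic version registers a callback with `sys.setprofile` (the hook the standard library's pure-Python `profile` module is built on), so it runs on every call and return. The statistical version starts a background thread that wakes up every `interval` seconds and records which function the profiled thread is executing, using the CPython-specific `sys._current_frames()`.

```python
import sys
import threading
import time


def deterministic_profile(func, *args, **kwargs):
    """Count calls and accumulate inclusive time for every Python-level call.

    A callback registered with sys.setprofile runs on *every* call and
    return event, so the counts are exact but each event adds overhead.
    """
    calls = {}       # function name -> number of calls
    total_time = {}  # function name -> summed inclusive wall time (seconds)
    stack = []       # (function name, start time) for each active frame

    def tracer(frame, event, arg):
        if event == "call":
            stack.append((frame.f_code.co_name, time.perf_counter()))
        elif event == "return" and stack:
            name, start = stack.pop()
            calls[name] = calls.get(name, 0) + 1
            total_time[name] = total_time.get(name, 0.0) + time.perf_counter() - start
        # "c_call"/"c_return"/"c_exception" events (C functions) are ignored

    sys.setprofile(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always uninstall the hook
    return result, calls, total_time


def statistical_profile(func, *args, interval=0.001, **kwargs):
    """Sample the innermost function of the calling thread every `interval` s.

    Only periodic snapshots are recorded, so counts are approximate, but the
    overhead does not grow with the number of function calls.
    """
    samples = {}  # function name -> number of samples in which it was running
    target = threading.get_ident()
    done = threading.Event()

    def sampler():
        while not done.is_set():
            # CPython-specific: maps thread id -> that thread's current frame
            frame = sys._current_frames().get(target)
            if frame is not None:
                name = frame.f_code.co_name
                samples[name] = samples.get(name, 0) + 1
            done.wait(interval)  # sleep until the next sample or until stopped

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = func(*args, **kwargs)
    finally:
        done.set()
        thread.join()
    return result, samples


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def busy():
    return fib(22)


if __name__ == "__main__":
    result, calls, total_time = deterministic_profile(busy)
    print("deterministic:", result, calls)  # calls["fib"] is exactly 57313
    result, samples = statistical_profile(busy)
    print("statistical:", result, samples)  # counts vary from run to run
```

The contrast mirrors the two bullets above: the deterministic version counts every one of the 57,313 recursive `fib` calls exactly but pays for a callback on each event, whereas the sampler's cost is governed by `interval` alone. Because of the GIL, the sampler thread can only take a snapshot when it gets scheduled, so the effective sampling rate may be lower than one sample per `interval`.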