C-Reduce: Systematically Tackling (Not Only) Compiler Bugs

Sunday, 14th January 2024: Happy to get this post out within the first month of 2024! I started writing this in November 2023 and was hoping to get this out sooner. But then my second daughter 👧 arrived, putting some extra constraints on how much free time I could find. Now, as I am finally getting back to the schedule, I am delighted to bring this post to completion.

This blog post deviates a bit from the usual focus on performance-related aspects, and there's a specific reason for this.… Read the rest

pahole: Analysing Memory Layout of Complex Data Structures With Ease

Sunday, 5th November 2023: Putting together this blog post feels like a positive stride! As I mentioned in the previous post (core-to-core latency tool), I'm aiming to integrate more consistent writing into my routine. While it took a month to pen this down, it's progress from the previous year 😇. Hoping that the upward trend will continue...!

During the past few weekends, I've been reading about the capabilities of the perf C2C (Cache To Cache) tool.… Read the rest

core-to-core-latency: A Nice Little Tool!

Saturday, 23rd Sept 2023: I've been curiously staring at my blog for quite some time, and it reminds me over and over again that it's been nearly two years since I managed to write new content here 😞. I have a few work-in-progress articles, and unfortunately, they've remained incomplete for quite some time. It's been a bit challenging to find dedicated long weekend hours to write the detailed posts that I really love.… Read the rest

LinkTest : Measuring Communication Latency and Bandwidth At Scale

October 2020: What makes supercomputers special? They have state-of-the-art processors, fast parallel file systems, specialized power & cooling infrastructure and a complex software stack to run. But a high-speed interconnect that tightly integrates thousands of nodes is what differentiates a supercomputer from a commodity cluster. Data movement within a node or across nodes is an important aspect of many scientific applications, and hence a low-latency, high-bandwidth interconnect technology is one of the key elements of HPC systems.

Setting up such a system with tens of thousands of nodes and tuning its performance is not an easy task, especially during the early days of deployment and acceptance benchmarking, where we often have to run various tests for weeks to identify issues, fix them and reach expected performance.… Read the rest

Understanding CPU Architecture And Performance Using LIKWID

March 2020: I was planning to write about CPU microarchitecture analysis for a long time. I started writing this post more than a year ago, just before the beginning of COVID-19. But with so many things happening around (and new parenting responsibilities 👧), this got delayed for quite a long time. Finally getting some weekend time to get this out!

Like previous blog posts, this also became longer and longer as I started writing details.… Read the rest

I/O Performance Analysis with Darshan

When optimizing parallel applications at scale, we often focus on computation and communication aspects, while I/O gets limited attention. With the increasing performance gap between compute and I/O subsystems, improving I/O performance remains one of the major challenges. As the filesystem is a shared resource, a few jobs running on a system can significantly impact the performance of other applications. In such a scenario, even if we use a profiling tool (see list here) to identify slow I/O routines, it's difficult to understand the real cause. For example, there might be other applications dominating the filesystem, resulting in poor I/O performance.… Read the rest

Summary of Debugging Tools for Parallel Applications

Nowadays it's not uncommon to run parallel applications with hundreds of thousands of processes on supercomputing platforms. Debugging these parallel applications with sporadic crashes, deadlocks, memory errors or incorrect results is a challenging task. There are a number of tools available that help identify and fix bugs, but one needs to understand the tools, their capabilities and when they can be used. This post tries to summarise various debugging tools (open source as well as commercial).

Note that not all tools can be used with distributed applications. For example, open source tools like GDB and Valgrind are commonly used for debugging serial and multi-threaded applications.… Read the rest