Summary of Debugging Tools for Parallel Applications

Nowadays it's not uncommon to run parallel applications with hundreds of thousands of processes on supercomputing platforms. Debugging these parallel applications with sporadic crashes, deadlocks, memory errors or incorrect results is a challenging task. There are number of tools available that help identifying and fixing bugs but one needs to understand tools, their capabilities and when they can be used. This post tries to summarise various debugging tools (open source as well as commercial).

Note that not all tools can be used with distributed applications. For example, open source tools like GDB and Valgrind are commonly used for debugging serial, multi-threaded applications.… Read the rest

Summary of Profiling Tools for Parallel Applications

Many scientific/industrial applications run on workstation to largest supercomputers in the world. With the continuous evolution of hardware platforms, achieving good performance is a challenging task. There are many profiling tools available to analyse and optimise the performance. But not all tools/methods are available on every platform, especially in high performance computing. First step in performance engineering workflow is to understand which tools are available and when they can be used. There is no one-size-fits-all solution : some are designed with broad feature list for high level analysis and others for specific platform with low level hardware metrics.

While choosing profiling tool one need to consider different aspects:

  • Goal : Are you interested in high level performance metrics?
Read the rest