LinkTest : Measuring Communication Latency and Bandwidth At Scale

October 2020: What makes supercomputers special? They have state-of-the-art processors, fast parallel file systems, specialized power & cooling infrastructure and complex software stack to run. But, a high-speed interconnect that tightly integrates thousands of nodes differentiate a supercomputer from a commodity cluster. Data movement within a node or across nodes is an important aspect for many scientific applications and hence low latency, high bandwidth interconnect technology is one of the key elements of the HPC systems.

Setting up such a system with tens of thousands of nodes and performance tuning is not an easy task. Especially during the early days of deployment and acceptance benchmarking where we often have to run various tests for weeks to identify issues, fix them and reach expected performance.… Read the rest