System benchmarks.


In what year would this be the world’s fastest computer?

All 500 of the world’s 500 fastest computers run the Linux kernel. The list records which distribution each of the top 500 clusters uses, but does not really summarize it. For instance, in the “Operating System” category of the November 2023 list, RHEL, Ubuntu, etc. are listed in many different versions. Plain “Linux” is listed for 45% of the clusters.

By my rough summary, 272 of the 500 machines on the November 2023 list have a known distro. Broken down into major distro categories, it is roughly:

  • RHEL: 56%, 152 systems.

  • SLES: 29%, 79 systems.

  • Ubuntu: 15%, 41 systems.
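The percentages above can be rechecked from the counts. This is a small sketch, assuming every system that does not name a distro is one of the plain “Linux” entries; the variable names are mine.

```python
# Counts per major distro family, as given in the list above (November 2023).
known = {"RHEL": 152, "SLES": 79, "Ubuntu": 41}
total_systems = 500

known_total = sum(known.values())  # systems that name a distro
shares = {name: round(100 * count / known_total) for name, count in known.items()}

# Assumption: everything else on the list is reported as plain "Linux".
plain_linux_pct = 100 * (total_systems - known_total) / total_systems

print(f"known distro: {known_total} systems")
for name, pct in shares.items():
    print(f"{name}: {pct}% of known")
print(f"plain Linux: {plain_linux_pct:.1f}% of all systems")
```

The shares are percentages of the 272 systems with a known distro, not of all 500.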

Although Ubuntu is a Debian derivative, none of the systems listed Debian. No Arch Linux, Gentoo, or other similar distros were listed. Of the RHEL clones, Rocky Linux appears to be ascendant.


The Linpack TPP benchmark “measures the floating point rate of execution for solving a linear system of equations.”
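To make the quoted definition concrete, here is a toy sketch in plain Python: it solves a small dense system by Gaussian elimination with partial pivoting and converts the time into a floating point rate. It assumes the conventional LINPACK operation count of (2/3)n^3 + 2n^2. The function names are mine, and this is nothing like the tuned HPL code.

```python
import random
import time

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs stay intact
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_flops(n):
    """Conventional operation count for solving an n x n system (an assumption here)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

n = 60
rng = random.Random(0)
a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [rng.uniform(-0.5, 0.5) for _ in range(n)]

start = time.perf_counter()
x = solve(a, b)
elapsed = time.perf_counter() - start

# Largest entry of a x - b, as a correctness check on the solution.
residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
print(f"n={n} residual={residual:.2e} rate={linpack_flops(n) / elapsed / 1e6:.2f} MFLOP/s")
```

The rate printed here mostly measures the Python interpreter. The point is only how the operation count and the wall time combine into a FLOP/s figure.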


There is a ROCm optimized version of HPL.

  • It looks like it hasn’t been updated for ROCm release 6.0.2 though. The gfx1100 isn’t listed.

  • Depends on roctracer and roctx.

  • May need MPI recompiled for GPU.

  • OpenMP may be needed too (if not here, elsewhere).


DGEMM “measures the floating point rate of execution of double precision real matrix-matrix multiplication.”
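As a rough illustration of what is being timed, the sketch below does the DGEMM update C = alpha*A*B + beta*C on small square matrices in plain Python and counts 2n^3 floating point operations. The `dgemm` function is my own toy, not the BLAS routine.

```python
import time

def dgemm(alpha, a, b, beta, c):
    """Return alpha * (a times b) + beta * c for matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    bt = list(zip(*b))  # columns of b, so the inner loop walks one tuple at a time
    return [[alpha * sum(a[i][p] * bt[j][p] for p in range(inner)) + beta * c[i][j]
             for j in range(cols)]
            for i in range(rows)]

n = 64
a = [[(i + j) % 7 - 3.0 for j in range(n)] for i in range(n)]
b = [[(i * j) % 5 - 2.0 for j in range(n)] for i in range(n)]
c = [[1.0] * n for _ in range(n)]

start = time.perf_counter()
result = dgemm(1.0, a, b, 0.5, c)
elapsed = time.perf_counter() - start

flops = 2.0 * n**3  # one multiply and one add per inner-loop step
print(f"n={n}: {flops / elapsed / 1e6:.1f} MFLOP/s")
```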


STREAM is “a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for simple vector kernel.”
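The “simple vector kernel” can be sketched with the Triad operation, a[i] = b[i] + scalar * c[i]. The snippet below is a toy version that assumes the usual accounting of two loads and one store of an 8-byte double per element. In Python the interpreter overhead dominates, so the number it prints says nothing about real memory bandwidth.

```python
import time
from array import array

n = 200_000
scalar = 3.0
a = array("d", [0.0]) * n
b = array("d", [1.0]) * n
c = array("d", [2.0]) * n

start = time.perf_counter()
for i in range(n):
    a[i] = b[i] + scalar * c[i]  # Triad kernel
elapsed = time.perf_counter() - start

bytes_moved = 3 * 8 * n  # two loads and one store of an 8-byte double per element
print(f"Triad: {bytes_moved / elapsed / 1e9:.4f} GB/s")
```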


PTRANS (parallel matrix transpose) “exercises the communications where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network.”
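The pairwise communication comes from the operation itself, which as far as I know is A = A^T + B on a matrix spread over a process grid: the owner of block (p, q) needs block (q, p). The sketch below simulates that on one machine with a hypothetical square grid; `ptrans` and the transfer list are mine, and no MPI is involved.

```python
def ptrans(a, b, grid):
    """Compute A^T + B for an n x n matrix split into grid x grid square blocks."""
    n = len(a)
    bs = n // grid  # block size; assumes grid divides n evenly
    out = [row[:] for row in b]
    transfers = []
    for p in range(grid):
        for q in range(grid):
            if p != q:
                # The owner of block (p, q) receives block (q, p) from its partner.
                transfers.append(((q, p), (p, q)))
            for i in range(bs):
                for j in range(bs):
                    # Element (i, j) of block (p, q) is element (j, i) of block (q, p).
                    out[p * bs + i][q * bs + j] += a[q * bs + j][p * bs + i]
    return out, transfers

n, grid = 4, 2
a = [[float(4 * i + j) for j in range(n)] for i in range(n)]
b = [[1.0] * n for _ in range(n)]
result, transfers = ptrans(a, b, grid)

for row in result:
    print(row)
print("block transfers (source, destination):", transfers)
```

Every off-diagonal block is both sent and received, which is why all the pairs end up talking at the same time.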


“RandomAccess measures the rate of integer random updates of memory (GUPS).”
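A minimal sketch of the update loop, assuming a power-of-two table, four updates per table entry, and a 64-bit shift-register random sequence with feedback polynomial 0x7. Those details are from my memory of the reference code and should be treated as assumptions; the function names are mine.

```python
import time

MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback polynomial, assumed to match the reference code

def next_random(r):
    """Advance the 64-bit shift-register sequence by one step."""
    high_bit = r >> 63
    return ((r << 1) & MASK64) ^ (POLY if high_bit else 0)

def run_updates(table, n_updates, seed=1):
    """Apply n_updates random XOR updates to the table in place."""
    mask = len(table) - 1  # table length must be a power of two
    r = seed
    for _ in range(n_updates):
        r = next_random(r)
        table[r & mask] ^= r  # the random read-modify-write being measured
    return table

log2_size = 12
table = list(range(1 << log2_size))
n_updates = 4 * len(table)

start = time.perf_counter()
run_updates(table, n_updates)
elapsed = time.perf_counter() - start

gups = n_updates / elapsed / 1e9  # giga-updates per second
print(f"{n_updates} updates in {elapsed:.4f} s = {gups:.6f} GUPS")
```

Because XOR is its own inverse, running the same update sequence a second time restores the table, which is a cheap way to check the result.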


“FFT measures the floating point rate of execution of double precision complex one-dimensional Discrete Fourier Transform (DFT).”
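For illustration, here is a textbook recursive radix-2 FFT in plain Python, checked against a direct DFT. The rate uses the 5 n log2(n) operation count that, as far as I know, is the usual convention for FFT benchmarks. This is a sketch, not the benchmark's implementation.

```python
import cmath
import math
import time

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft(x):
    """Direct O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

n = 1024
signal = [complex(math.sin(0.1 * i), math.cos(0.3 * i)) for i in range(n)]

start = time.perf_counter()
spectrum = fft(signal)
elapsed = time.perf_counter() - start

flops = 5 * n * math.log2(n)  # conventional operation count (an assumption here)
print(f"n={n}: {flops / elapsed / 1e6:.2f} MFLOP/s")
```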

Communication Bandwidth and Latency

Communication bandwidth and latency is “a set of tests to measure latency and bandwidth of a number of simultaneous communication patterns; based on b_eff (effective bandwidth benchmark).”
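Latency and bandwidth are often separated with a simple linear model, time = latency + size / bandwidth, fitted to timings for several message sizes. The sketch below shows only the fit. The timings are synthetic, generated from made-up values of 2 microseconds and 10 GB/s, not measurements, and `fit_latency_bandwidth` is a hypothetical helper.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of time = latency + size / bandwidth."""
    count = len(sizes)
    mean_s = sum(sizes) / count
    mean_t = sum(times) / count
    slope = (sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
             / sum((s - mean_s) ** 2 for s in sizes))
    latency = mean_t - slope * mean_s  # intercept: cost of a zero-byte message
    return latency, 1.0 / slope        # slope is seconds per byte

# Synthetic timings from made-up values, NOT measurements of any system.
true_latency, true_bandwidth = 2e-6, 10e9
sizes = [8, 1024, 65536, 1048576, 8388608]  # message sizes in bytes
times = [true_latency + s / true_bandwidth for s in sizes]

latency, bandwidth = fit_latency_bandwidth(sizes, times)
print(f"latency = {latency * 1e6:.2f} us, bandwidth = {bandwidth / 1e9:.2f} GB/s")
```

Small messages pin down the latency and large ones pin down the bandwidth, so a real test sweeps a wide range of sizes.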


HPC Challenge benchmarks.

The HPC Challenge benchmarks are in the Debian hpcc package.

cp -p /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt

See the Output section of this documentation for benchmark results.


Benchmarks in tinygrad.


Proprietary with a few libre datasets and benchmarks available.

Don’t let “Commons” in the name lead you to think this is available to the mere public. Lots of proprietary bits are involved: closed lists, corporate signups and signatures, etc. Their use of “Commons” in their name perhaps causes confusion in the marketplace with Wikimedia Commons (and other groups that serve the public). This isn’t like Wikimedia Commons at all.

The upstream tinycorp is working on implementing some of their benchmarks using tinygrad and AMD GPUs.

Phoronix Test Suite

Phoronix test suite:

git clone
cd phoronix-test-suite/
apt install php-cli php-xml
./phoronix-test-suite list-missing-dependencies
./phoronix-test-suite list-tests
./phoronix-test-suite install pts/hpcc

Meh, this automatically installs dependencies and builds, but doesn’t use ROCm.


Benchmarks optimized for ROCm.


HPL for ROCm from AMD.

git clone
cd rocHPL/
# git checkout v6.0.0 # build fails in Ubuntu
# ./build/bin/rochpl --input ./build/rocHPL/HPL.dat
# 1 GPU (works, then fails on subsequent runs)
./mpirun_rochpl -P 1 -Q 1 -N 45056 --NB 384
Node Binding: Process 0 [(p,q)=(0,0)] CPU Cores: 64 - {0-63}
GPU  Binding: Process 0 [(p,q)=(0,0)] GPU: 0, pciBusID c3
Local matrix size = 15.1361 GBs
./mpirun_rochpl -P 1 -Q 2 -N  64000 --NB 384
./mpirun_rochpl -P 2 -Q 2 -N  90112 --NB 384
./mpirun_rochpl -P 2 -Q 4 -N  128000 --NB 384
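The problem sizes in these runs appear to be chosen so that each process holds roughly the same amount of matrix data. A quick sketch to check that, assuming an N x N matrix of 8-byte doubles split evenly over the P x Q grid (the helper name is mine):

```python
GIB = 2**30

def local_matrix_gib(n, p, q):
    """Approximate per-process share of an n x n float64 matrix on a p x q grid."""
    return n * n * 8 / (p * q) / GIB

# (P, Q, N) taken from the mpirun_rochpl commands above.
runs = [(1, 1, 45056), (1, 2, 64000), (2, 2, 90112), (2, 4, 128000)]
for p, q, n in runs:
    print(f"P={p} Q={q} N={n}: about {local_matrix_gib(n, p, q):.3f} GiB per process")
```

The single-process estimate comes out a little under the 15.1361 GBs reported in the output above, presumably because the benchmark allocates some workspace beyond the bare matrix.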


HPCG for ROCm.

git clone
cd rocHPCG/