Benchmarks
System benchmarks.
Top500
In what year would this be the world’s fastest computer?
All 500 of the 500 fastest computers in the world run the Linux kernel. The Top500 site lists the distributions used by these clusters, but does not really summarize them. For instance, at this URL for the November 2023 list, the “Operating System” category lists RHEL, Ubuntu, etc. in many different versions. Just plain “Linux” is listed for 45% of the clusters.
By my rough summary, of the 500 machines on the list in November 2023, 272 have a known distro. Broken down into major distro categories, as a share of those 272 systems, it is roughly:
RHEL: 56%, 152 systems.
SLES: 29%, 79 systems.
Ubuntu: 15%, 41 systems.
Although Ubuntu is a Debian derivative, none of the systems listed Debian. No Arch Linux, Gentoo, or other similar distros were listed. Of the RHEL clones, Rocky Linux appears to be ascendant.
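The percentages above can be re-derived from the counts. A minimal sketch, assuming the three counts from my rough summary are right; the variable and function names are just for illustration:

```python
# Counts taken from the rough summary above (November 2023 list).
known_distro_counts = {"RHEL": 152, "SLES": 79, "Ubuntu": 41}

total_known = sum(known_distro_counts.values())


def distro_shares(counts):
    """Return each family's share of the known-distro systems, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = distro_shares(known_distro_counts)
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%, {known_distro_counts[name]} systems")

# Share of the full list of 500 that names a distro at all.
print(f"Known distro: {total_known} of 500")
```

Note the shares are relative to the 272 systems with a known distro, not to all 500.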
Linpack
The Linpack TPP benchmark “measures the floating point rate of execution for solving a linear system of equations.”
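To make the "floating point rate" concrete: HPL-style results are usually derived from the conventional operation count for an LU solve of an N x N system, 2/3 N^3 + 3/2 N^2, divided by the wall time. A minimal sketch under that assumption; the 100 second timing below is made up for illustration, not a measured result:

```python
def hpl_flop_count(n):
    """Conventional operation count for solving an n x n dense system by LU."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n, seconds):
    """Floating point rate in GFLOPS for a solve of size n taking `seconds`."""
    return hpl_flop_count(n) / seconds / 1e9


# Hypothetical example: N = 45056 with a made-up run time of 100 seconds.
print(f"{hpl_gflops(45056, 100.0):.1f} GFLOPS")
```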
rocHPL
There is a ROCm optimized version of HPL.
It looks like it hasn’t been updated for ROCm release 6.0.2 though. The gfx1100 isn’t listed.
Depends on roctracer and roctx.
May need MPI recompiled for GPU.
OpenMP may be needed too (if not here, elsewhere).
DGEMM
DGEMM “measures the floating point rate of execution of double precision real matrix-matrix multiplication.”
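A rough sketch of what that rate means: multiplying an m x k matrix by a k x n matrix takes about 2*m*n*k floating point operations (one multiply and one add per inner step), and the rate is that count over the elapsed time. This is a naive pure-Python loop, so it measures the interpreter, not the hardware; real DGEMM runs go through a tuned BLAS. All names here are illustrative:

```python
import time


def dgemm_flops(m, n, k):
    """Operation count for C = A * B with A of shape m x k and B of shape k x n."""
    return 2 * m * n * k


def naive_dgemm(a, b):
    """Plain triple-loop matrix multiply on lists of lists of floats."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


size = 64
a = [[float(i + j) for j in range(size)] for i in range(size)]
b = [[float(i - j) for j in range(size)] for i in range(size)]

start = time.perf_counter()
c = naive_dgemm(a, b)
elapsed = time.perf_counter() - start

rate = dgemm_flops(size, size, size) / elapsed / 1e6
print(f"{rate:.1f} MFLOPS (pure Python, for illustration only)")
```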
STREAM
STREAM is “a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for simple vector kernel.”
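One of the STREAM vector kernels is the triad, a[i] = b[i] + scalar * c[i]. With 8-byte doubles it is counted as moving 24 bytes per element (two reads, one write), and the bandwidth is bytes moved over elapsed time. A minimal sketch of that accounting; in pure Python the loop overhead dominates, so the number printed says nothing about real memory bandwidth:

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM-style triad: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, word_size=8):
    """Bytes counted for a triad over n elements: two reads and one write each."""
    return 3 * word_size * n


n = 200_000
a = array("d", [0.0]) * n
b = array("d", [1.0]) * n
c = array("d", [2.0]) * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = time.perf_counter() - start

print(f"{triad_bytes(n) / elapsed / 1e9:.4f} GB/s (interpreter-bound)")
```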
PTRANS
PTRANS (parallel matrix transpose) “exercises the communications where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network.”
RandomAccess
“RandomAccess measures the rate of integer random updates of memory (GUPS).”
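A sketch of the idea behind GUPS (giga updates per second): XOR a stream of pseudo-random 64-bit values into a power-of-two sized table, indexed by the low bits of each value, and divide the update count by the elapsed time. The shift-and-XOR generator below follows the scheme as it is commonly described for this benchmark; the seed of 1 and the function names are assumptions for this sketch, not the reference implementation:

```python
import time

POLY = 0x7
MASK64 = (1 << 64) - 1


def next_random(ran):
    """Shift left by one; XOR in POLY when the top bit was set (64-bit wrap)."""
    top_bit_set = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if top_bit_set else ran


def apply_updates(table, updates, seed=1):
    """XOR `updates` pseudo-random values into `table` (length must be 2**k)."""
    index_mask = len(table) - 1
    ran = seed
    for _ in range(updates):
        ran = next_random(ran)
        table[ran & index_mask] ^= ran


table = list(range(1 << 12))
updates = 4 * len(table)

start = time.perf_counter()
apply_updates(table, updates)
elapsed = time.perf_counter() - start

print(f"{updates / elapsed / 1e9:.6f} GUPS (pure Python)")
```

Because the updates are XORs, replaying the same stream a second time restores the original table, which makes a handy self-check.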
FFT
“FFT measures the floating point rate of execution of double precision complex one-dimensional Discrete Fourier Transform (DFT).”
Communication Bandwidth and Latency
Communication bandwidth and latency is “a set of tests to measure latency and bandwidth of a number of simultaneous communication patterns; based on b_eff (effective bandwidth benchmark).”
hpcc
HPC Challenge benchmarks.
The HPC Challenge benchmarks are in the Debian hpcc package.
cp -p /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt
hpcc
See the Output section of this documentation for benchmark results.
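For pulling numbers out of a run, a small parser helps. This sketch assumes the output file contains a summary block of key=value lines between "Begin of Summary section." and "End of Summary section." markers; the sample text and its values are made up for illustration, so check the keys against your own output file:

```python
def parse_hpcc_summary(text):
    """Collect key=value pairs found between the summary section markers."""
    results = {}
    in_summary = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Begin of Summary section"):
            in_summary = True
        elif line.startswith("End of Summary section"):
            in_summary = False
        elif in_summary and "=" in line:
            key, _, value = line.partition("=")
            try:
                results[key] = float(value)
            except ValueError:
                results[key] = value  # keep non-numeric values as text
    return results


# Made-up sample in the assumed format; not real benchmark results.
sample = """\
Some log output before the summary.
Begin of Summary section.
HPL_Tflops=0.5
StarSTREAM_Triad=12.25
End of Summary section.
Trailing=ignored
"""

summary = parse_hpcc_summary(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```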
tinygrad
Benchmarks in tinygrad.
mlnotcommons
Proprietary with a few libre datasets and benchmarks available.
Don’t let “Commons” in the name lead you to think this is available to the mere public. Lots of proprietary bits involved, closed lists, corporate signups and signatures, etc. Their use of “Commons” in their name perhaps causes confusion in the marketplace with Wikimedia Commons (and other groups that serve the public). This isn’t like Wikimedia Commons at all.
The upstream tinycorp is working on implementing some of their benchmarks using tinygrad and AMD GPUs.
Phoronix Test Suite
Phoronix test suite:
git clone https://github.com/phoronix-test-suite/phoronix-test-suite/
cd phoronix-test-suite/
apt install php-cli php-xml
./phoronix-test-suite list-missing-dependencies
./phoronix-test-suite list-tests
./phoronix-test-suite install pts/hpcc
Meh, this automatically installs dependencies and builds, but doesn’t use ROCm.
ROCm
Benchmarks optimized for ROCm.
HPL
HPL for ROCm from AMD.
git clone https://github.com/ROCm/rocHPL
cd rocHPL/
# git checkout v6.0.0 # build fails in Ubuntu
./install.sh
# ./build/bin/rochpl --input ./build/rocHPL/HPL.dat
# 1 GPU (works then fails subsequent runs)
./mpirun_rochpl -P 1 -Q 1 -N 45056 --NB 384
Node Binding: Process 0 [(p,q)=(0,0)] CPU Cores: 64 - {0-63}
GPU Binding: Process 0 [(p,q)=(0,0)] GPU: 0, pciBusID c3
Local matrix size = 15.1361 GBs
./mpirun_rochpl -P 1 -Q 2 -N 64000 --NB 384
./mpirun_rochpl -P 2 -Q 2 -N 90112 --NB 384
./mpirun_rochpl -P 2 -Q 4 -N 128000 --NB 384
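The N values in the runs above appear to grow with the square root of the number of GPUs (P x Q), which keeps the per-GPU share of the matrix about constant. A minimal sketch of that arithmetic, assuming 8-byte doubles and an even split of the N x N matrix across the process grid. It gives 15.125 GiB for the single-GPU run, slightly below the 15.1361 reported above; presumably the real allocation includes some extra workspace, but I haven't checked:

```python
def local_matrix_gib(n, p, q, bytes_per_element=8):
    """Approximate per-process share of an n x n matrix of doubles, in GiB."""
    return bytes_per_element * n * n / (p * q) / 2**30


# (P, Q, N) taken from the mpirun_rochpl command lines above.
runs = [(1, 1, 45056), (1, 2, 64000), (2, 2, 90112), (2, 4, 128000)]

for p, q, n in runs:
    gib = local_matrix_gib(n, p, q)
    print(f"P={p} Q={q} N={n}: about {gib:.3f} GiB per process")
```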
HPCG
HPCG for ROCm.
git clone https://github.com/ROCm/rocHPCG
cd rocHPCG/
./install.sh