Benchmark Output
Benchmark output.
HPCC
Ran with default configuration from
/usr/share/doc/hpcc/examples/_hpccinf.txt
,
not configured for system.
Just using one GPU, if any at all. Not a benchmark of the system, but an example of the benchmarks’ output.
########################################################################
This is the DARPA/DOE HPC Challenge Benchmark version 1.5.0 October 2012
Produced by Jack Dongarra and Piotr Luszczek
Innovative Computing Laboratory
University of Tennessee Knoxville and Oak Ridge National Laboratory
See the source files for authors of specific codes.
Compiled on Jul 28 2022 at 20:21:34
Current time (1707329577) is Wed Feb 7 11:12:57 2024
Hostname: 'tinyrocs-01'
########################################################################
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 2560
NB : 80
PMAP : Column-major process mapping
P : 1
Q : 1
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
Begin of MPIRandomAccess section.
Running on 1 processors (PowerofTwo)
Total Main table size = 2^22 = 4194304 words
PE Main table size = 2^22 = 4194304 words/PE
Default number of updates (RECOMMENDED) = 16777216
Number of updates EXECUTED = 16777216 (for a TIME BOUND of 60.00 secs)
CPU time used = 0.711323 seconds
Real time used = 0.775589 seconds
0.021631567 Billion(10^9) Updates per second [GUP/s]
0.021631567 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 0.116873 seconds
Verification: Real time used = 0.122269 seconds
Found 0 errors in 4194304 locations (passed).
Current time (1707329578) is Wed Feb 7 11:12:58 2024
End of MPIRandomAccess section.
Begin of StarRandomAccess section.
Main table size = 2^22 = 4194304 words
Number of updates = 16777216
CPU time used = 0.116436 seconds
Real time used = 0.120543 seconds
0.139180173 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 4194304 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.139180
Average GUP/s 0.139180
Maximum GUP/s 0.139180
Current time (1707329579) is Wed Feb 7 11:12:59 2024
End of StarRandomAccess section.
Begin of SingleRandomAccess section.
Main table size = 2^22 = 4194304 words
Number of updates = 16777216
CPU time used = 0.117685 seconds
Real time used = 0.121014 seconds
0.138638948 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 4194304 locations (passed).
Node(s) with error 0
Node selected 0
Single GUP/s 0.138639
Current time (1707329579) is Wed Feb 7 11:12:59 2024
End of SingleRandomAccess section.
Begin of MPIRandomAccess_LCG section.
Running on 1 processors (PowerofTwo)
Total Main table size = 2^22 = 4194304 words
PE Main table size = 2^22 = 4194304 words/PE
Default number of updates (RECOMMENDED) = 16777216
Number of updates EXECUTED = 16777216 (for a TIME BOUND of 60.00 secs)
CPU time used = 0.823503 seconds
Real time used = 0.835786 seconds
0.020073589 Billion(10^9) Updates per second [GUP/s]
0.020073589 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 0.129699 seconds
Verification: Real time used = 0.131317 seconds
Found 0 errors in 4194304 locations (passed).
Current time (1707329580) is Wed Feb 7 11:13:00 2024
End of MPIRandomAccess_LCG section.
Begin of StarRandomAccess_LCG section.
Main table size = 2^22 = 4194304 words
Number of updates = 16777216
CPU time used = 0.105328 seconds
Real time used = 0.105761 seconds
0.158632666 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 4194304 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.158633
Average GUP/s 0.158633
Maximum GUP/s 0.158633
Current time (1707329580) is Wed Feb 7 11:13:00 2024
End of StarRandomAccess_LCG section.
Begin of SingleRandomAccess_LCG section.
Main table size = 2^22 = 4194304 words
Number of updates = 16777216
CPU time used = 0.104021 seconds
Real time used = 0.105718 seconds
0.158697654 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 4194304 locations (passed).
Node(s) with error 0
Node selected 0
Single GUP/s 0.158698
Current time (1707329580) is Wed Feb 7 11:13:00 2024
End of SingleRandomAccess_LCG section.
Begin of PTRANS section.
M: 1280
N: 1280
MB: 80
NB: 80
P: 1
Q: 1
TIME M N MB NB P Q TIME CHECK GB/s RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL 1280 1280 80 80 1 1 0.03 PASSED 0.463 0.00
CPU 1280 1280 80 80 1 1 0.03 PASSED 0.486 0.00
WALL 1280 1280 80 80 1 1 0.03 PASSED 0.460 0.00
CPU 1280 1280 80 80 1 1 0.03 PASSED 0.460 0.00
WALL 1280 1280 80 80 1 1 0.03 PASSED 0.460 0.00
CPU 1280 1280 80 80 1 1 0.03 PASSED 0.464 0.00
WALL 1280 1280 80 80 1 1 0.03 PASSED 0.460 0.00
CPU 1280 1280 80 80 1 1 0.03 PASSED 0.460 0.00
WALL 1280 1280 80 80 1 1 0.03 PASSED 0.460 0.00
CPU 1280 1280 80 80 1 1 0.03 PASSED 0.463 0.00
Finished 5 tests, with the following results:
5 tests completed and passed residual checks.
0 tests completed and failed residual checks.
0 tests skipped because of illegal input values.
END OF TESTS.
Current time (1707329581) is Wed Feb 7 11:13:01 2024
End of PTRANS section.
Begin of StarDGEMM section.
Scaled residual: 0.013912
Node(s) with error 0
Minimum Gflop/s 146.361991
Average Gflop/s 146.361991
Maximum Gflop/s 146.361991
Current time (1707329581) is Wed Feb 7 11:13:01 2024
End of StarDGEMM section.
Begin of SingleDGEMM section.
Scaled residual: 0.013912
Node(s) with error 0
Node selected 0
Single DGEMM Gflop/s 195.359250
Current time (1707329581) is Wed Feb 7 11:13:01 2024
End of SingleDGEMM section.
Begin of StarSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2184533, Offset = 0
Total memory required = 0.0488 GiB.
Each test is run 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
The SCALAR value used for this run is 0.420000
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 1531 microseconds.
(= 1531 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
VERBOSE: total setup time for rank 0 = 0.015617 seconds
-------------------------------------------------------------
Function Rate (GB/s) Avg time Min time Max time
Copy: 28.9555 0.0012 0.0012 0.0012
Scale: 20.7960 0.0017 0.0017 0.0017
Add: 22.1379 0.0024 0.0024 0.0024
Triad: 22.5178 0.0023 0.0023 0.0024
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
Node(s) with error 0
Minimum Copy GB/s 28.955521
Average Copy GB/s 28.955521
Maximum Copy GB/s 28.955521
Minimum Scale GB/s 20.796040
Average Scale GB/s 20.796040
Maximum Scale GB/s 20.796040
Minimum Add GB/s 22.137910
Average Add GB/s 22.137910
Maximum Add GB/s 22.137910
Minimum Triad GB/s 22.517757
Average Triad GB/s 22.517757
Maximum Triad GB/s 22.517757
Current time (1707329581) is Wed Feb 7 11:13:01 2024
End of StarSTREAM section.
Begin of SingleSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2184533, Offset = 0
Total memory required = 0.0488 GiB.
Each test is run 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
The SCALAR value used for this run is 0.420000
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 1545 microseconds.
(= 1545 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
VERBOSE: total setup time for rank 0 = 0.008809 seconds
-------------------------------------------------------------
Function Rate (GB/s) Avg time Min time Max time
Copy: 28.9555 0.0014 0.0012 0.0013
Scale: 21.6015 0.0018 0.0016 0.0017
Add: 23.0975 0.0026 0.0023 0.0024
Triad: 23.5923 0.0025 0.0022 0.0024
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
Node(s) with error 0
Node selected 0
Single STREAM Copy GB/s 28.955521
Single STREAM Scale GB/s 21.601503
Single STREAM Add GB/s 23.097493
Single STREAM Triad GB/s 23.592322
Current time (1707329581) is Wed Feb 7 11:13:01 2024
End of SingleSTREAM section.
Begin of MPIFFT section.
Number of nodes: 1
Vector size: 524288
Generation time: 0.022
Tuning: 0.009
Computing: 0.030
Inverse FFT: 0.030
max(|x-x0|): 1.423e-15
Gflop/s: 1.635
Current time (1707329581) is Wed Feb 7 11:13:01 2024
End of MPIFFT section.
Begin of StarFFT section.
Vector size: 1048576
Generation time: 0.043
Tuning: 0.000
Computing: 0.027
Inverse FFT: 0.029
max(|x-x0|): 1.698e-15
Node(s) with error 0
Minimum Gflop/s 3.815061
Average Gflop/s 3.815061
Maximum Gflop/s 3.815061
Current time (1707329581) is Wed Feb 7 11:13:01 2024
End of StarFFT section.
Begin of SingleFFT section.
Vector size: 1048576
Generation time: 0.043
Tuning: 0.000
Computing: 0.029
Inverse FFT: 0.029
max(|x-x0|): 1.698e-15
Node(s) with error 0
Node selected 0
Single FFT Gflop/s 3.623061
Current time (1707329582) is Wed Feb 7 11:13:02 2024
End of SingleFFT section.
Begin of LatencyBandwidth section.
Current time (1707329582) is Wed Feb 7 11:13:02 2024
End of LatencyBandwidth section.
Begin of HPL section.
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 2560
NB : 80
PMAP : Column-major process mapping
P : 1
Q : 1
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WC11C2R4 2560 80 1 1 0.31 3.650e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0034130 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
Current time (1707329582) is Wed Feb 7 11:13:02 2024
End of HPL section.
Begin of Summary section.
VersionMajor=1
VersionMinor=5
VersionMicro=0
VersionRelease=f
LANG=C
Success=1
sizeof_char=1
sizeof_short=2
sizeof_int=4
sizeof_long=8
sizeof_void_ptr=8
sizeof_size_t=8
sizeof_float=4
sizeof_double=8
sizeof_s64Int=8
sizeof_u64Int=8
sizeof_struct_double_double=16
CommWorldProcs=1
MPI_Wtick=1.000000e-09
HPL_Tflops=0.0364979
HPL_time=0.30672
HPL_eps=1.11022e-16
HPL_RnormI=1.78083e-12
HPL_Anorm1=666.101
HPL_AnormI=663.835
HPL_Xnorm1=1494.04
HPL_XnormI=2.76479
HPL_BnormI=0.499975
HPL_N=2560
HPL_NB=80
HPL_nprow=1
HPL_npcol=1
HPL_depth=1
HPL_nbdiv=2
HPL_nbmin=4
HPL_cpfact=R
HPL_crfact=C
HPL_ctop=1
HPL_order=C
HPL_dMACH_EPS=1.110223e-16
HPL_dMACH_SFMIN=2.225074e-308
HPL_dMACH_BASE=2.000000e+00
HPL_dMACH_PREC=2.220446e-16
HPL_dMACH_MLEN=5.300000e+01
HPL_dMACH_RND=1.000000e+00
HPL_dMACH_EMIN=-1.021000e+03
HPL_dMACH_RMIN=2.225074e-308
HPL_dMACH_EMAX=1.024000e+03
HPL_dMACH_RMAX=1.797693e+308
HPL_sMACH_EPS=5.960464e-08
HPL_sMACH_SFMIN=1.175494e-38
HPL_sMACH_BASE=2.000000e+00
HPL_sMACH_PREC=1.192093e-07
HPL_sMACH_MLEN=2.400000e+01
HPL_sMACH_RND=1.000000e+00
HPL_sMACH_EMIN=-1.250000e+02
HPL_sMACH_RMIN=1.175494e-38
HPL_sMACH_EMAX=1.280000e+02
HPL_sMACH_RMAX=3.402823e+38
dweps=1.110223e-16
sweps=5.960464e-08
HPLMaxProcs=1
HPLMinProcs=1
DGEMM_N=1477
StarDGEMM_Gflops=146.362
SingleDGEMM_Gflops=195.359
PTRANS_GBs=0.459873
PTRANS_time=0.0283251
PTRANS_residual=0
PTRANS_n=1280
PTRANS_nb=80
PTRANS_nprow=1
PTRANS_npcol=1
MPIRandomAccess_LCG_N=4194304
MPIRandomAccess_LCG_time=0.835786
MPIRandomAccess_LCG_CheckTime=0.131317
MPIRandomAccess_LCG_Errors=0
MPIRandomAccess_LCG_ErrorsFraction=0
MPIRandomAccess_LCG_ExeUpdates=16777216
MPIRandomAccess_LCG_GUPs=0.0200736
MPIRandomAccess_LCG_TimeBound=60
MPIRandomAccess_LCG_Algorithm=0
MPIRandomAccess_N=4194304
MPIRandomAccess_time=0.775589
MPIRandomAccess_CheckTime=0.122269
MPIRandomAccess_Errors=0
MPIRandomAccess_ErrorsFraction=0
MPIRandomAccess_ExeUpdates=16777216
MPIRandomAccess_GUPs=0.0216316
MPIRandomAccess_TimeBound=60
MPIRandomAccess_Algorithm=0
RandomAccess_LCG_N=4194304
StarRandomAccess_LCG_GUPs=0.158633
SingleRandomAccess_LCG_GUPs=0.158698
RandomAccess_N=4194304
StarRandomAccess_GUPs=0.13918
SingleRandomAccess_GUPs=0.138639
STREAM_VectorSize=2184533
STREAM_Threads=1
StarSTREAM_Copy=28.9555
StarSTREAM_Scale=20.796
StarSTREAM_Add=22.1379
StarSTREAM_Triad=22.5178
SingleSTREAM_Copy=28.9555
SingleSTREAM_Scale=21.6015
SingleSTREAM_Add=23.0975
SingleSTREAM_Triad=23.5923
FFT_N=1048576
StarFFT_Gflops=3.81506
SingleFFT_Gflops=3.62306
MPIFFT_N=524288
MPIFFT_Gflops=1.63489
MPIFFT_maxErr=1.42286e-15
MPIFFT_Procs=1
MaxPingPongLatency_usec=-1
RandomlyOrderedRingLatency_usec=-1
MinPingPongBandwidth_GBytes=-1
NaturallyOrderedRingBandwidth_GBytes=-1
RandomlyOrderedRingBandwidth_GBytes=-1
MinPingPongLatency_usec=-1
AvgPingPongLatency_usec=-1
MaxPingPongBandwidth_GBytes=-1
AvgPingPongBandwidth_GBytes=-1
NaturallyOrderedRingLatency_usec=-1
FFTEnblk=16
FFTEnp=8
FFTEl2size=1048576
M_OPENMP=-1
omp_get_num_threads=0
omp_get_max_threads=0
omp_get_num_procs=0
MemProc=64
MemSpec=-1
MemVal=-1
MPIFFT_time0=1.4e-07
MPIFFT_time1=0.00198887
MPIFFT_time2=0.0114458
MPIFFT_time3=0.00053736
MPIFFT_time4=0.0140529
MPIFFT_time5=0.00190846
MPIFFT_time6=1.3e-07
CPS_HPCC_FFT_235=0
CPS_HPCC_FFTW_ESTIMATE=0
CPS_HPCC_MEMALLCTR=0
CPS_HPL_USE_GETPROCESSTIMES=0
CPS_RA_SANDIA_NOPT=0
CPS_RA_SANDIA_OPT2=0
CPS_USING_FFTW=0
End of Summary section.
########################################################################
End of HPC Challenge tests.
Current time (1707329582) is Wed Feb 7 11:13:02 2024
########################################################################