1. Introduction
This guide describes the benchmark examples provided by the unclebench-benchmarks package and their main customizable parameters.
2. Stream
Description:
STREAM is a memory bandwidth benchmark (https://www.cs.virginia.edu/stream/).
Parameters:
Parameter | Description |
---|---|
stream_array_size | Size of the STREAM work arrays. It is recommended to be at least 4 times the size of the last-level cache. The default value should not be modified. |
compile_options | Better results can be obtained by using specific compiler options. |
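As a rough illustration of the sizing rule for stream_array_size, the following Python sketch computes a minimum array length from a last-level cache size. It assumes the reference STREAM setup of 8-byte double-precision elements; the function name and the 32 MiB cache size in the usage part are hypothetical values chosen for illustration.

```python
# Minimal sketch: minimum STREAM array length (in elements) so that each
# work array is at least 4x the size of the last-level cache (LLC).
# Assumption: elements are 8-byte doubles, as in the reference STREAM code.

BYTES_PER_ELEMENT = 8  # sizeof(double)
CACHE_FACTOR = 4       # "at least 4 times the size of the last-level cache"


def min_stream_array_size(llc_bytes: int) -> int:
    """Return the smallest element count whose array size is >= 4 * LLC."""
    if llc_bytes <= 0:
        raise ValueError("llc_bytes must be positive")
    required_bytes = CACHE_FACTOR * llc_bytes
    # Ceiling division so the array is never smaller than required.
    return -(-required_bytes // BYTES_PER_ELEMENT)


if __name__ == "__main__":
    llc = 32 * 1024 * 1024  # hypothetical 32 MiB last-level cache
    print(min_stream_array_size(llc))  # 16777216 elements (128 MiB per array)
```

On multi-socket nodes, the cache size to use is usually the sum of all last-level caches, since every socket runs part of the arrays.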
3. HPL
Description:
HPL is the High Performance Linpack benchmark (http://www.netlib.org/benchmark/hpl/).
It is used to measure the maximum number of double-precision floating-point operations per second that can be executed on a system.
It is compute-bound rather than memory-bound, fully vectorized, and highly scalable.
Parameters:
Parameter | Description |
---|---|
variant_NB | Block size; it can impact HPL performance. |
memory_proportion | Proportion of available memory used to define the matrix size. It is set to 0.4 (40%) by default. |
variant_Ntemp | May be modified to directly specify the main matrix size. |
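The exact formula used by unclebench-benchmarks to derive the matrix size is not reproduced here. As an assumption, the following Python sketch uses the common HPL rule of thumb: an N x N double-precision matrix occupies 8 * N^2 bytes, so N is about sqrt(memory_proportion * memory / 8), rounded down to a multiple of the block size NB. The function hpl_matrix_size and the 64 GiB / NB = 192 inputs are hypothetical.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL works on double-precision matrices


def hpl_matrix_size(total_mem_bytes: int, nb: int, memory_proportion: float = 0.4) -> int:
    """Return an HPL problem size N, a multiple of nb, whose matrix
    fits in memory_proportion of total_mem_bytes (rule of thumb only)."""
    if not 0 < memory_proportion <= 1:
        raise ValueError("memory_proportion must be in (0, 1]")
    if nb <= 0:
        raise ValueError("nb must be positive")
    usable_bytes = memory_proportion * total_mem_bytes
    # Largest N such that 8 * N^2 <= usable_bytes.
    n = math.isqrt(int(usable_bytes // BYTES_PER_DOUBLE))
    # Round down to a multiple of the block size.
    return (n // nb) * nb


if __name__ == "__main__":
    mem = 64 * 1024**3  # hypothetical: 64 GiB available across all nodes used
    print(hpl_matrix_size(mem, nb=192))  # 58560
```

Rounding N down to a multiple of NB avoids a partial last block; specifying variant_Ntemp directly bypasses this kind of computation entirely.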
4. NPB
Description:
The NAS Parallel Benchmarks (NPB) are a set of programs with diverse communication schemes and are especially useful to evaluate MPI performance (https://www.nas.nasa.gov/publications/npb.html).
Parameters:
Parameter | Description |
---|---|
Bench | Benchmarks to run; possible values are cg, lu, is, ep, ft, mg. |
Class | Problem size class (A, B, …); possible values are listed at https://www.nas.nasa.gov/publications/npb_problem_sizes.html |
processes | Number of processes. |
5. IOR
Description:
IOR is a parallel I/O benchmark that can be used to test the performance of parallel file systems using various interfaces and access patterns.
Parameters:
Parameter | Description |
---|---|
variant_v | 0: a single file is shared among all MPI tasks; 1: one file per MPI task. |
file_size | File size. |
xfer | Size of each write/read, in MBytes. |
block_size | Size of the contiguous block written by each task (must be a multiple of 'xfer'). |
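The relationship between 'xfer', 'block_size' and 'variant_v' can be illustrated with a small Python sketch. It assumes both sizes are expressed in MiB and that each task writes a single segment (IOR's default); ior_layout and IorLayout are hypothetical helpers, not part of unclebench or IOR.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class IorLayout:
    num_files: int            # files created by the run
    bytes_per_file: int       # size of each file
    transfers_per_block: int  # write/read calls needed per task block


def ior_layout(xfer_mib: int, block_size_mib: int, tasks: int, variant_v: int) -> IorLayout:
    """Describe the files a single-segment IOR run would write (sketch)."""
    if xfer_mib <= 0 or block_size_mib <= 0 or tasks <= 0:
        raise ValueError("sizes and task count must be positive")
    if block_size_mib % xfer_mib != 0:
        raise ValueError("block_size must be a multiple of xfer")
    if variant_v not in (0, 1):
        raise ValueError("variant_v must be 0 (shared file) or 1 (file per task)")
    block_bytes = block_size_mib * MIB
    transfers = block_size_mib // xfer_mib
    if variant_v == 0:
        # One shared file holding one contiguous block per task.
        return IorLayout(1, tasks * block_bytes, transfers)
    # One file per task, each holding only that task's block.
    return IorLayout(tasks, block_bytes, transfers)


if __name__ == "__main__":
    print(ior_layout(xfer_mib=4, block_size_mib=64, tasks=16, variant_v=0))
    print(ior_layout(xfer_mib=4, block_size_mib=64, tasks=16, variant_v=1))
```

In both modes the total amount of data is the same (tasks x block_size); only the number of files and their size change, which is what distinguishes shared-file from file-per-task access patterns.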
6. IMB
Description:
Intel MPI Benchmarks (IMB) perform a set of MPI performance measurements for point-to-point and global (collective) communication operations over a range of message sizes.
Parameters:
Parameter | Description |
---|---|
nodes | Number of nodes to use. Be aware that a high number of nodes can drastically increase the time of collective MPI communications. |
args_exec | Benchmark to execute (pingpong, sendrecv, allreduce, alltoallv), optionally followed by IMB options such as -map or -msglog (see the official IMB documentation for their meaning). |
7. HPCC
Description:
HPCC is a suite of benchmarks that measure the performance of the processor, the memory subsystem, and the interconnect.
Parameters:
Parameter | Description |
---|---|
variant_NB | Block size (it can affect Linpack performance). |
variant_Ntemp | The formula can be modified to use a bigger matrix for Linpack (by default, the matrix size is limited to 40% of available memory). |
8. IO500
Description:
IO500 is a suite of benchmarks, including IOR and mdtest tests, that measure I/O performance; see https://www.vi4io.org/std/io500 for more details.
Useful Parameters:
Parameter | Description |
---|---|
tasks_per_node | Defaults to the number of NUMA regions of a node, but a different number of tasks per node may give better results. |
ior_params | Default is "-t 2048k -b 2g -F", which means a transfer size of 2048 KiB, 2 GiB written per process, and one file per process. |
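To check how much data a given ior_params string implies, the following Python sketch decodes the three flags of the default value. It assumes IOR's convention that the k, m and g suffixes denote powers of 1024, handles only -t, -b and -F, and uses a hypothetical total of 8 processes; parse_size and parse_ior_params are illustrative helpers, not part of IO500.

```python
import shlex

# Assumption: IOR size suffixes are binary multiples (k = 1024, ...).
SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Convert an IOR-style size such as '2048k' or '2g' to bytes."""
    unit = text[-1].lower()
    if unit in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[unit]
    return int(text)


def parse_ior_params(params: str) -> dict:
    """Extract transfer size (-t), block size (-b) and file-per-process (-F)."""
    tokens = shlex.split(params)
    result = {"transfer_bytes": None, "block_bytes": None, "file_per_process": False}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-t":
            result["transfer_bytes"] = parse_size(tokens[i + 1])
            i += 2
        elif tok == "-b":
            result["block_bytes"] = parse_size(tokens[i + 1])
            i += 2
        elif tok == "-F":
            result["file_per_process"] = True
            i += 1
        else:
            i += 1  # any other token is skipped by this sketch
    return result


if __name__ == "__main__":
    p = parse_ior_params("-t 2048k -b 2g -F")
    processes = 8  # hypothetical: e.g. 2 nodes x 4 tasks_per_node
    print(p["transfer_bytes"])                      # 2097152 (2 MiB per I/O call)
    print(p["block_bytes"] // p["transfer_bytes"])  # 1024 transfers per process
    print(processes * p["block_bytes"])             # 17179869184 (16 GiB in total)
```

Because -b is a per-process amount, the total data written grows linearly with the number of processes, so changing tasks_per_node also changes the total I/O volume.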
9. Tensorflow
Description:
It runs TensorFlow benchmarks on GPUs and is composed of 4 different benchmarks using well-known neural network models. See https://www.tensorflow.org/performance/benchmarks for more details.
The benchmark can be run on several machines; however, the same benchmark instance runs independently on each machine, since the benchmarks used have no distributed implementation.
Useful Parameters:
Parameter | Description |
---|---|
bench_id | Selects a model together with the appropriate batch size for the benchmark. By default, all possible benchmarks are executed: 0, 1, 2, 3. |
model | Neural network model used during benchmarking; options are inceptionv3, resnet-50, resnet-152, vgg16, alexNet. By default, all models are run. |
batch_size | Batch size, i.e. the number of images fed to the neural network at a time. Values depend on the model used; default values are taken from https://www.tensorflow.org/performance/benchmarks. |