1. Introduction

This guide describes benchmark examples that are provided by unclebench-benchmarks package and their main customizable parameters.

2. Stream

Description:
STREAM is a memory bandwidth benchmark (https://www.cs.virginia.edu/stream/).

Parameters

Parameter Description

stream_array_size

Size of stream work array, it is recommanded to at least 4 times the size of the last level of cache. Default value should not be modified.

compile_options

Better results can be obtained by using some specific compiler options.

3. HPL

Description:
HPL is the High Performance Linpack benchmark (http://www.netlib.org/benchmark/hpl/).
It is used to measure the maximal number of double precision operations per second that can be executed on a system.
It is not memory-bound, fully vectorized and highly scalable.

Parameters:

Parameter Description

variant_NB

Block size, can impact HPL performance.

memory_proportion

Proportion of available memory that is used to define the matrix size. It is set to 0.4 by default (40%).

variant_Ntemp

May be modified to directly specify the main matrix size.

4. NPB

Description:
NAS Parallel benchmark is a set of programs with diverse communication schemes and is especially useful to evaluate MPI performance. (https://www.nas.nasa.gov/publications/npb.html).

Parameters:

Parameter Description

Bench

Benchmarks to test, possible values are : cg, lu, is, ep, ft, mg

Class

Possible values can be found on https://www.nas.nasa.gov/publications/npb_problem_sizes.html (A, B, …​)

processes

Number of processes

5. IOR

Description:

Parallel filesystem I/O benchmark can be used for testing performance of parallel file systems using various interfaces and access patterns.

Parameters:

Parameter Description

variant_v

0 indicates a file is shared among all MPI task, 1 indicates one file per MPI task

file_size

File size

xfer

Size of each write/read in MBytes

block_size

Size of contiguous blocks written per task (it is a multiple of 'xfer')

6. IMB

Description:

Intel MPI benchmarks performs a set of MPI performance measurements for point-to-point and global communication operations for a range of message sizes.

Parameters:

Parameter Description

nodes

Number of nodes to use (Be aware, a high number of nodes used could increase drastically the time of collective MPI commuications)

args_exec

Benchmark to execute: pingpong, sendrecv, allreduce, alltoallv and IMB options like -map or msglog (see official IMB documentation for their meaning).

7. HPCC

Description:

It is a suite of benchmarks that measure performance of processor, memory subsytem, and the interconnect

Parameters:

Parameter Description

variant_NB

Block size (It could affect Linpack performance)

variant_Ntemp

It is possible to modify the formula and use bigger matrix for Linpack (The memory size is limited by default to 40% of available memory).

8. IO500

Description:

It is a suite of benchmarks including ior and mdtest tests that measure I/O performance, see https://www.vi4io.org/std/io500 for more details.

Useful Parameters:

Parameter Description

tasks_per_node

Default is the number of NUMA regions of a node, but different number of tasks per node could give better results.

ior_params

Default is "-t 2048k -b 2g -F" which means 2048 block sizes, 2 GB writen per processus and one file per processus.

9. Tensorflow

Description:

It performs Tensorflow benchmarks on GPU, it is composed of 4 different benchmarks using very well known neural network models. See https://www.tensorflow.org/performance/benchmarks for more details.
The benchmark can be run on several machines, however the same instance of the benchmark will be run on each machine. There is no distributed implementation of the benchmarks used.

Useful Parameters:

Parameter Description

bench_id

chooses a model as well as the right batch size for the bechmark. By default all possible benchmarks are executed: 0, 1, 2, 3.

model

model of neural network to be used during benchmarking, options are: inceptionv3, resnet-50, resnet-152, vgg16, alexNex. By default all models are run.

batch_size

Size of batch, number of images to feed to the neural network at a time. Values depend on the model used, default values are taken from https://www.tensorflow.org/performance/benchmarks.