1. What’s UncleBench?

UncleBench is a benchmarking tool that currently uses JUBE as its benchmarking environment engine. Its architecture makes it easy to handle platform settings, benchmark descriptions, sources, and test cases as separate resources. It provides useful commands to modify parameters on the fly without having to change the benchmark or platform description files. Additionally, a report command makes it possible to extract performance results and build an annotated HTML performance report.

2. Obtaining, building and installing UncleBench

UncleBench’s code is available on GitHub at https://github.com/edf-hpc/unclebench. You can clone it using git:

$ git clone https://github.com/edf-hpc/unclebench.git

Debian packaging files can be found in a separate repository:

$ git clone https://github.com/scibian/unclebench.git

If you are using Debian or a Debian-derivative system, you only need to build a Debian package and install it. Before building the package, install the following build dependencies:

$ apt-get install debhelper dh-python python-all python-setuptools pandoc dpkg-dev
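
With those packages installed, a standard Debian packaging workflow should be enough to build and install UncleBench. The commands below are a generic sketch of that workflow, not project-specific instructions:

$ git clone https://github.com/scibian/unclebench.git
$ cd unclebench
$ dpkg-buildpackage -us -uc
$ sudo dpkg -i ../unclebench_*.deb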

If you are not using Debian, UncleBench provides a setuptools script for its installation. Just run:

$ python setup.py install

3. Benchmark and platform files

JUBE benchmark and platform description files should be located under the directories listed in the following table:

  Description              | Default directory                | Environment variable for custom path |
  -----------------------------------------------------------------------------------------------------
  Platform files location  | /usr/share/unclebench/platform   | UBENCH_PLATFORM_DIR                  |
  Benchmark files location | /usr/share/unclebench/benchmarks | UBENCH_BENCHMARK_DIR                 |

All platform and benchmark files located under those directories will be recognized by UncleBench.
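
For example, to work from a personal copy of the description files instead of the system-wide ones, you can point both environment variables at your own directories (the paths below are illustrative):

$ export UBENCH_PLATFORM_DIR=$HOME/ubench/platform
$ export UBENCH_BENCHMARK_DIR=$HOME/ubench/benchmarks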

3.1. Benchmark file writing

In order to add a benchmark, we have to create a directory named after the benchmark. This directory will contain a JUBE file, also named after the benchmark:

  benchmarks/
  ├── hpl
  │   ├── HPL.dat.in
  │   ├── hpl.xml
  │   ├── Make.gnu
  │   ├── Make.intel
  │   └── pq_script.py
  └── stream
      └── stream.xml

The benchmark directory can contain any additional files necessary for the benchmark configuration (e.g. HPL.dat.in, Make.gnu, etc.). Benchmark files (e.g. hpl.xml, stream.xml) are written using the XML-based JUBE format.

The following constraints enable UncleBench integration:

A parameter named nodes must be used to define the number of nodes on which the benchmark should be run:

  <parameter name="nodes" type="int">1</parameter>

A parameter named submit must be used to launch the batch script:

  <do> $submit $submit_script </do>

A platform loading header must be added to enable the unclebench -p option:

  <include-path>
	<path> $UBENCH_PLATFORM_DIR </path>
  </include-path>
  <include from="platforms.xml" path="include-path"/>

A multisource section is needed in order to describe where the sources and test cases can be found.

  <multisource>
        <source protocol="https">
	     <url>https://<server_url>/benchs/public</url>
	     <file>/stream/stream-5.10.tar.gz </file>
        </source>
  </multisource>

The sources of a benchmark can be downloaded using the following protocols: git, svn, local, and https. The git and svn protocols accept a revision parameter, which selects the version of the benchmark to download by giving the corresponding commit; a symbolic link whose name is composed of the revision and the repository name is then created and used during the execution. <file> and <do> are two optional elements used to select specific files and to execute shell commands within the multisource section.

For example, to use the configuration files located in the benchmark directory, we add a local source:

  <source protocol="local" >
  	<file>$UBENCH_BENCHMARK_DIR/hpl/HPL.dat.in</file>
	<file>$UBENCH_BENCHMARK_DIR/hpl/Make.gnu</file>
	<file>$UBENCH_BENCHMARK_DIR/hpl/Make.intel</file>
  </source>

A second example uses configuration files located in a Git repository at a specific commit:

  <source protocol="git" name="io500">
        <url>https://github.com/VI4IO/io-500-dev.git</url>
	<revision>b316f68fb724172627ac5add193e3374de879d3c</revision>
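	<!-- Rewrite prepare.sh on the fly: drop the first build_ior call and
	     neutralize ./configure and ./bootstrap before piping it to bash -->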
	<do> sed "0,/build_ior/ s/build_ior//" utilities/prepare.sh \
	| sed "s#./configure#true#;s#./bootstrap#true#" | bash </do>
	<file>IO500/IO500.xml</file>
  </source>

The definition of a complex benchmark can be divided into several files, commonly one per step (e.g. prepare, compile, execute). In that case the benchmark definition is composed of several files organized as follows:

  bundle/
  ├── bench_execute.xml
  ├── bench_compile.xml
  ├── bench_prepare.xml
  └── bench.xml

The main file bench.xml will look something like:

<jube>

    <include-path>
	<path> $UBENCH_PLATFORM_DIR </path>
    </include-path>
    <include from="platforms.xml" path="include-path"/>

    <multisource>
    .
    .
    .
    </multisource>

    <benchmark name="bench" outpath="benchmark_runs">

	<!-- =====================  Compile  ===================== -->

	<include from="bench_compile.xml" path="fileset"/>
	<include from="bench_compile.xml" path="parameterset"/>
	<include from="bench_compile.xml" path="step"/>

	<!-- =====================  Prepare  ===================== -->

	<include from="bench_prepare.xml" path="fileset"/>
	<include from="bench_prepare.xml" path="step"/>

	<!-- =====================  Execute  ===================== -->

	<include from="bench_execute.xml" path="fileset"/>
	<include from="bench_execute.xml" path="parameterset"/>
	<include from="bench_execute.xml" path="substituteset"/>
	<include from="bench_execute.xml" path="step"/>


    </benchmark>
</jube>

All benchmarks should have an execute step whose minimal structure looks like this (the done_file attribute lets JUBE treat the asynchronous batch submission as finished once the job has created that file):

<step depend="prepare" name="execute">

      <use>system_parameters</use>
      <use from="platform.xml">cluster_specs</use>
      <use from="platform.xml">execute_set</use>
      <use from="platform.xml">execute_sub</use>
      <use from="platform.xml">jobfiles</use>

      <do done_file="$done_file">$submit $submit_script</do>

</step>

More information about the syntax can be found on the JUBE website.

3.2. Platform file writing

Platforms are organized under the platform directory: each platform has its own subdirectory, in which a platform.xml file is located. The structure looks something like this:


├── platform
│   ├── template.xml
│   └── Zbook15
│       ├── platform.xml
│       └── submit.job.in

Here, only the platform Zbook15 is defined. The file template.xml is used by UncleBench to automatically generate a global platforms.xml file, which allows any platform to be included from a benchmark description. The parameters normally used in this file are described here.
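
A complete platform.xml is not shown in this document; the sketch below is inferred from the set names used by the execute step in section 3.1 (cluster_specs, execute_set, execute_sub, jobfiles) and from parameter names appearing elsewhere in this guide, so treat it as an illustrative skeleton rather than a reference:

  <jube>
      <parameterset name="cluster_specs">
          <parameter name="platform_name">Zbook15</parameter>
          <parameter name="NUMA_regions" type="int">2</parameter>
          <parameter name="cores_per_NUMA_region" type="int">2</parameter>
      </parameterset>
      <parameterset name="execute_set">
          <parameter name="submit">sbatch</parameter>
          <parameter name="starter">mpirun</parameter>
          <parameter name="submit_script">submit.job</parameter>
          <parameter name="done_file">ready</parameter>
      </parameterset>
      <substituteset name="execute_sub">
          <!-- Fill the job template with the values of the current run;
               the #NODES# placeholder is an assumption for illustration -->
          <iofile in="submit.job.in" out="submit.job"/>
          <sub source="#NODES#" dest="$nodes"/>
      </substituteset>
      <fileset name="jobfiles">
          <copy>submit.job.in</copy>
      </fileset>
  </jube>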

4. Download benchmark sources and test cases

4.1. multisource sections

A multisource section is needed in the <benchmark_name>.xml Jube file to describe where the sources and test cases can be found.

  <multisource>
        <source protocol="https">
	     <url>https://<server_url>/benchs/public</url>
	     <file>/stream/stream-5.10.tar.gz </file>
        </source>
  </multisource>

Different protocols can be used: local, https, git, and svn. You have the possibility of referencing the downloaded files by assigning them a name, for example:

  <multisource>
        <source protocol="https" name="stream">
	     <url>https://<server_url>/benchs/public</url>
	     <file>/stream/stream-5.10.tar.gz </file>
        </source>
  </multisource>

In this case a JUBE variable called stream will be available to be referenced throughout the benchmark description, for example:

  <fileset name="source">
    <prepare>tar -xvf ${stream} .</prepare>
  </fileset>

You can use this functionality to automatically create JUBE workspaces for each downloaded file. For example, if you want to test two different versions:

  <multisource>
        <source protocol="https" name="stream">
	     <url>https://<server_url>/benchs/public</url>
	     <file>/stream/stream-4.10.tar.gz </file>
	     <file>/stream/stream-5.10.tar.gz </file>
        </source>
  </multisource>
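
The document does not spell out how a multi-file source is exposed, but judging from the hpl parameters listed by listparams in section 6.1 (an hpl_id index selecting an entry of ['hpl-2.2.tar.gz']), each named source appears to come with an index parameter selecting among its files. By analogy, the stream example above would presumably expose something like:

  stream_id : 0
  stream    : ['stream-4.10.tar.gz','stream-5.10.tar.gz'][${stream_id}]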

To use the generated JUBE variables from other steps, you have to add <use> ubench_config </use> and <use> ubench_files </use> elements to the step. For example, if you want the previous files to be accessible in a compile step:

  <step name="compile" export="true">

      <use> ubench_config </use>
      <use> ubench_files </use>

      <do> tar -xvf ${stream} -C stream </do>
      <!-- Choose compiler and MPI versions -->
      <do> module purge </do>
      <do> module load $module_compile $module_mpi </do>
      <do workdir="stream">
	${mpi_cc} ${cflags_opt} -std=c99 -DSTREAM_ARRAY_SIZE=${stream_array_size} -mcmodel=large -fopenmp stream_mpi.c -o stream_mpi.exe
      </do>
    </step>

Physically, the files are accessible through symbolic links in the step directory.

4.2. ubench fetch

$ ubench fetch -b <benchmark_name>

This command downloads benchmark sources and test case files from the locations specified by the multisource section to a local directory. The default local directory where resources are fetched is /scratch/<user>/Ubench/resource, but it can be customized with the UBENCH_RESOURCE_DIR environment variable.
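
For example, to fetch the stream benchmark into a custom resource directory (the path below is illustrative):

$ export UBENCH_RESOURCE_DIR=/data/$USER/ubench_resources
$ ubench fetch -b stream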

5. Run benchmark(s)

5.1. ubench run

The ubench run command launches one or multiple benchmarks on a given platform. The nodes to run on can be specified either as a number of nodes or as a node list in ClusterShell syntax. It is also possible to customize benchmark parameters on the fly.

$ ubench run -b <benchmark_name> -p <platform> [-w <nodes_list|number_of_nodes>] [-c <parameter_name>:<value>]

Run <benchmark_name> on 4 nodes:

$ ubench run -b <benchmark_name> -p <platform> -w 4
--- Ubench platform name set to : <platform>
---- <benchmark_name> description files are already present in run directory and will be overwritten.
---- benchmark run directory : /scratch/<user>/Ubench/benchmarks/<platform>/<benchmark_name>/benchmark_runs/000000
---- Use the following command to follow benchmark progress: " ubench log -p <platform> -b <benchmark_name> -i 000000"

Run <benchmark_name> three times, on 4, 8, and 16 nodes:

$ ubench run -b <benchmark_name> -p <platform> -w 4 8 16

Run <benchmark_name> on a precise set of nodes given by their IDs:

$ ubench run -p <platform> -b <benchmark_name> -w cn[100-105,205,207]
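
The node list follows ClusterShell's nodeset syntax; if the ClusterShell tools are installed, you can preview which nodes such an expression expands to:

$ nodeset -e cn[100-105,205,207]
cn100 cn101 cn102 cn103 cn104 cn105 cn205 cn207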

5.2. ubench list / ubench result

The ubench list command prints a table of all runs that have been launched for a benchmark on a given platform.

  $ ubench list -p <platform> -b <benchmark_name>

  Benchmark_name   | Platform   | ID     | Date              | Run_directory  | Nodes |
  ------------------------------------------------------------------------------------
  <benchmark_name> | <platform> | 000000 | Mon Jan 16 10...  | /scratch/....  | 1     |
  <benchmark_name> | <platform> | 000001 | Mon Jan 16 10...  | /scratch/....  | 4     |

The ubench result command calls the jube result command, which parses benchmark result files according to the content of the <analyse> and <result> XML sections. If no ID is given, the results of the last executed benchmark are printed.

$ ubench result -p <platform> -b <benchmark_name> [-i <benchmark_id>]

Example with an hpcc benchmark launched with the -w 4 8 16 option:

  $ ubench result -p <platform> -b hpcc

  Processing hpcc benchmark :
  ----analysing results
  ----extracting analysis
  nodes  MPIRandAcc_GUPs  MPIFFT_Gflops  PTRANS_GBs  StarDGEMM_Gflops  RORingBandwidth_GBytes  RORingLatency_usec  ...
  4      0.862475         106.63         33.4613     34.5633           0.496759                1.27526             ...
  8      1.40598          177.489        70.8118     34.0936           0.401475                1.28486             ...
  16     2.62665          345.495        121.784     33.5744           0.348505                1.32338             ...

Additionally, a result file in YAML format is generated, which contains environment-related information and results. An excerpt of such a file is shown below:

Benchmark_name: hpcc
Date: Wed Feb  7 10:11:17 2018
ID: '000000'
Nodes: 1 4 8 16
Platform: cluster
Run_directory: /scratch/user/hpcc/benchmark_runs/000000
cmdline: ubench run -b hpcc -p cluster -w 32 48
runs:
  '1':
    GB_per_node: '64'
    LLC_cache_line_size: '64'
    MB_LLC_size: '2.5'
    NUMA_regions: '2'
    OMP_NUM_THREADS: '1'
    arch: intel
    args_exec: ''
    args_starter: -n 24
    cc: icc
    cflags: -O2
    cflags_opt: -O3 -xHost
    chainjob_needs_submit: 'false'
    chainjob_script: ./judge-chainJobs.sh
    comp_v: '2'
    comp_version: intel
    context_fields:
    - nodes
    - modules
    cores_per_NUMA_region: '12'
    custom_id: '0'
    custom_nodes: '1'
    done_file: ready
    env: export OMP_NUM_THREADS=1
    errlogfile: job.err
    executable: ./hpcc
    fc: ifort
    fflags: -O2
    fflags_opt: -O3 -xHost
    job_id_ubench: '13854133'
    mail: ''
    memory_proportion: '0.1'
    module_compile: icc/2016.4.072 ifort/2016.4.072
    module_mpi: impi/2016.4.072
    mpi_cc: mpicc
    mpi_cxx: mpicxx
    mpi_f77: mpif77
    mpi_f90: mpif90
    mpi_v: '4'
    nodes: '1'
    notification: ''
    outlogfile: job.out
    platform_name: cluster
    results_bench:
      MPIFFT_Gflops: '31.0547'
      MPIRandAcc_GUPs: '0.33787'
      PTRANS_GBs: '8.4763'
      RORingBandwidth_GBytes: '1.4881'
      RORingLatency_usec: '0.842583'
      StarDGEMM_Gflops: '21.5038'
      StarSTREAM_Triad: '3.34193'
    starter: mpirun
    submit: sbatch
    submit_script: submit.job
    taskspernode: '24'

The ubench result command has a debug mode that prints each execution’s path, for better error traceability. It can be activated by adding the -d or --debug option, as shown below:

  $ ubench result --debug -p <platform> -b <benchmark_name> [-i <benchmark_id>]

5.3. ubench status / ubench log

The ubench status command prints the status of each step of a benchmark run.

  $ ubench status -p <platform> -b <benchmark_name>

  Status run dir: /scratch/<user>/Ubench/benchmarks/<platform>
  Processing <benchmark_name> benchmark :

    Status for step: compile
    --------------------------------
    id	started	done	workdir
    --------------------------------
    0 	true	true	/scratch/....

    Status for step: execute
    --------------------------------
    id	started	done	workdir
    --------------------------------
    1	true	false	/scratch/....

The ubench log command prints the concatenation of every file listed in the <analyse> section, plus standard files such as stdout, stderr, and run.log. It is useful to follow the benchmark process precisely without having to dig into the benchmark run directory.
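
For example, to follow the run launched earlier:

  $ ubench log -p <platform> -b <benchmark_name> -i 000000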

5.4. ubench compare

The ubench compare command compares results generated by the ubench result command. The comparison is made between results that were obtained with the same context (i.e. mpi_version, fflags, cc, nodes).


$ ubench compare -d ~/15-03-2018/hpl/benchmark_runs/000001 ~/23-03-2018/hpl/benchmark_runs/000001
   comparing :
   - /home/user/15-03-2018/hpl/benchmark_runs/000001/
   - /home/user/23-03-2018/hpl/benchmark_runs/000001/
nodes                                            modules custom_nodes_id result_pre result_post_0  result_diff_0
   32  icc/2015.1.133 ifort/2015.1.133 openmpi/1.10.4...            None    11660.0       11630.0       -0.25729
   48  icc/2015.1.133 ifort/2015.1.133 openmpi/1.10.4...            None    17060.0       17210.0        0.87925

We can, for example, compare performance values for different MPI implementations by using -c to take nodes as the context, -a to compare across mpi_version values, and passing the result directory twice:


$ ubench compare -d ~/15-03-2018/hpl/ ~/15-03-2018/hpl/ -c nodes -a mpi_version
  - /home/user/15-03-2018/hpl/benchmark_runs/000001/
  - /home/user/15-03-2018/hpl/benchmark_runs/000001/
nodes      mpi_version_pre result_pre   mpi_version_post_0 result_post_0  result_diff_0
  1         OpenMPI-1.10      448.6         OpenMPI-1.10         448.6       0.000000
  1        OpenMPI-2.0.1      457.9        OpenMPI-2.0.1         457.9       0.000000
  1        OpenMPI-2.0.1      457.9  IntelMPI-2016.4.072         459.6       0.371260
  1        OpenMPI-2.0.1      457.9        OpenMPI-1.6.5         450.7      -1.572396
  1        OpenMPI-2.0.1      457.9         OpenMPI-1.10         448.6      -2.031011
  1  IntelMPI-2016.4.072      459.6  IntelMPI-2016.4.072         459.6       0.000000
  1  IntelMPI-2016.4.072      459.6        OpenMPI-1.6.5         450.7      -1.936466
  1  IntelMPI-2016.4.072      459.6         OpenMPI-1.10         448.6      -2.393386
  1  IntelMPI-2016.4.072      459.6        OpenMPI-2.0.1         457.9      -0.369887

6. Advanced usage

6.1. Customize parameters on the fly

With UncleBench it is possible to customize benchmark and platform parameters on the fly. The listparams command lists every customizable parameter:

 $ ubench listparams -b hpl -p <platform_name>

platform parameters
-----------------------------------------------
              comp_v : 2
        comp_version : ["gnu","intel","intel"][$comp_v]
                  cc : ["gcc","icc","icc"][$comp_v]
                  fc : ["gfortran","ifort","ifort"][$comp_v]
              cflags : -O2
              fflags : -O2
          cflags_opt : ["-O3 -march=core-avx2","-O3 -xHost","-O3 -xHost"][$comp_v]
          fflags_opt : "-O3 -march=core-avx2"
      module_compile : ["","icc/2016 ifort/2016","icc/2017 ifort/2017"][$comp_v]
         module_blas : ["","mkl/2016","mkl/2017"][$comp_v]
           blas_root : ["","/opt/intel/2016.0.047","/opt/mkl-2017.0.098"][$comp_v]
             starter : srun
               mpi_v : 2
            mpi_root : ["/opt/openmpi-1/","/opt/openmpi-2/","/opt/impi/2016","/opt/impi/2018/"][$mpi_v]
              mpi_cc : mpicc
             mpi_cxx : mpicxx
             mpi_f90 : mpif90
             mpi_f77 : mpif77
          module_mpi : ["","openmpi/1","openmpi/2","impi/2016-srun","impi/2017-srun"][$mpi_v]
          cuda_tlk_v : 0
             cudnn_v : 1
       platform_name : Cluster1
           partition : cn
         GB_per_node : 126
         MB_LLC_size : 70
 LLC_cache_line_size : 64
        NUMA_regions : 2
cores_per_NUMA_region : 14
cores_per_half_NUMA_region : 7

benchmark parameters
-----------------------------------------------
              hpl_id : 0
                 hpl : ['hpl-2.2.tar.gz'][${hpl_id}]
           variant_v : 0
        variant_name : ["Full_MPI"][$variant_v]
          variant_NB : 192
   memory_proportion : 0.2
       variant_Ntemp : (${memory_proportion}*${nodes}*(${GB_per_node})*1e9/8) ** 0.5
           variant_N : int( round( ${variant_Ntemp} / ${variant_NB} ) * ${variant_NB})
                arch : ${comp_version}
               nodes : 1
        taskspernode : $NUMA_regions*$cores_per_NUMA_region
      threadspertask : 1
          executable : ./xhpl
             modules : $module_compile $module_mpi $module_blas
           timelimit : 24:00:00
        args_starter : ${binding_full_node}

For example, an HPL benchmark can be launched with the memory_proportion parameter customized so that HPL consumes only 40 percent of the available memory:

  $ ubench run -b hpl -p <platform_name> -c memory_proportion:0.4

  -- Ubench platform name set to : <platform_name>
  ---- hpl description files are already present in run directory and will be overwritten.
  ---- memory_proportion parameter was modified from 0.8 to 0.4 for this run
  ---- benchmark run directory : /scratch/..../hpl/benchmark_runs/000000
  ---- Use the following command to follow benchmark progress :  " ubench log -p eocn -b hpl -i 000000"

To run with a different compiler and a different MPI version, we would type:

  $ ubench run -p eocn -b hpl -c comp_v:0 mpi_v:2

A file containing the parameters to change can also be passed; the previous command can then be executed as follows:

  $ ubench run -p eocn -b hpl -f parameters.yaml

The content of the file looks like:

comp_v: 0
mpi_v: 2

UncleBench can also perform only the execute step of a given benchmark using the -e option; in that case some parameters, such as the executable and input files, have to be set explicitly:

  $ ubench run -p eocn -b imb -e -c executable:'/scratch/user/imb/exe/IMB-MPI1' modules:'icc/2016 ifort/2016 impi/2016'

A second way to run in execution-only mode is to pass a YAML file that contains all the necessary changes and parameters, such as the binary, modules, executable, and input files:

  $ ubench run -p eocn -b hpl -e -f parameters.yaml

For example, the HPL benchmark can be re-executed with a file hpl.yaml that looks like:

executable: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/xhpl
binary: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/xhpl
pq_script: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/pq_script.py
data: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/HPL.dat
modules: 'icc/2018.1.038 ifort/2018.1.038 impi/2018.1.038-srun'

6.2. Build performance report

The ubench report command builds Asciidoctor source files that are compiled into a global performance report. It requires a YAML metadata file describing where performance result files are located and which information to extract and write into the performance report.

  $ ubench report -m <metadata_file> -o <output_dir>

The YAML file to provide must be organized as in the example below:

author: '<Report author>'
sessions:
    - default:
        platform: '<platform_name>'
    - '<session1_name>':
        tester: '<session1_tester_name>'
        dir: '<session1_result_directory>'
        date_start: <start_date_session1>
        date_end: <end_date_session1>
    - '<session2_name>':
        dir: '<session2_result_directory>'
        tester: '<session2_tester_name>'
        date_start: <start_date_session2>
        date_end: <end_date_session2>
    - ...

contexts:
    - default:
        compare: True
        compare_threshold: '5'
        context:
            - 'nodes'
            - 'tasks'
        context_res: 'mpi_version'
    - '<bench1_name>':
        compare_threshold: '1'
        context:
            - 'nodes'
        context_res: 'mpi_version'
    - ...

benchmarks:
    - default:
        result: 'ok'
    - '<bench1_name>':
        '<session1_name>':
            comment: '<bench1_name_s1_comment>'
        '<session2_name>':
            comment: '<bench1_name_s2_comment>'
    - '<bench2_name>':
        '<session1_name>':
            comment: '<bench2_name_s1_comment>'
        '<session2_name>':
            comment: '<bench2_name_s2_comment>'
    - ...
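
The generated Asciidoctor sources can then be rendered to HTML with the asciidoctor tool, assuming it is installed. The exact file names depend on the report layout, so check the output directory; a plausible invocation would be:

  $ asciidoctor <output_dir>/*.adoc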