1. What’s UncleBench?
UncleBench is benchmarking software that currently uses JUBE as its benchmarking environment engine. Its architecture makes it easy to handle platform settings, benchmark descriptions, sources, and test cases as separate resources. It provides useful commands to modify parameters on the fly without having to change the benchmark or platform description files. Additionally, a report command makes it possible to extract performance results and build an annotated HTML performance report.
2. Obtaining, building and installing UncleBench
UncleBench’s code is available on GitHub at https://github.com/edf-hpc/unclebench. You can clone it using git:
$ git clone https://github.com/edf-hpc/unclebench.git
Debian packaging files can be found here:
$ git clone https://github.com/scibian/unclebench.git
If you are using Debian or a Debian-derived system, you only need to build a Debian package and install it. Before building the package, you need to install the following packages:
$ apt-get install debhelper dh-python python-all python-setuptools pandoc dpkg-dev
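With the build dependencies installed, the package can then be built from the packaging repository and installed using standard Debian tooling (a minimal sketch; the exact name of the generated .deb file depends on the packaged version):

$ cd unclebench
$ dpkg-buildpackage -us -uc
$ sudo dpkg -i ../unclebench_*.deb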
If you are not using Debian, UncleBench provides a setuptools script for its installation. Just run:
$ python setup.py install
3. Benchmark and platform files
JUBE benchmark and platform description files should be located under the directories listed in the following table.
Description              | Default directory                 | Environment variable for custom path
-------------------------|-----------------------------------|--------------------------------------
Platform files location  | /usr/share/unclebench/platform    | UBENCH_PLATFORM_DIR
Benchmark files location | /usr/share/unclebench/benchmarks  | UBENCH_BENCHMARK_DIR
All platform and benchmark files located under these directories will be recognized by UncleBench.
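For instance, to make UncleBench pick up platform and benchmark files from a custom location instead of the system defaults, export the two variables before invoking ubench (the paths below are illustrative):

$ export UBENCH_PLATFORM_DIR=$HOME/unclebench/platform
$ export UBENCH_BENCHMARK_DIR=$HOME/unclebench/benchmarks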
3.1. Benchmark file writing
To add a benchmark, create a directory named after the benchmark. This directory must contain a JUBE file, also named after the benchmark:
benchmarks/
├── hpl
│   ├── HPL.dat.in
│   ├── hpl.xml
│   ├── Make.gnu
│   ├── Make.intel
│   └── pq_script.py
└── stream
    └── stream.xml
The benchmark directory can also contain any additional files necessary for the benchmark configuration (e.g. HPL.dat.in, Make.gnu). Benchmark files (e.g. hpl.xml, stream.xml) are written in the XML-based JUBE format.
The following constraints enable UncleBench integration:
A parameter named nodes must be used to define the number of nodes on which the benchmark should be run:
<parameter name="nodes" type="int">1</parameter>
A parameter named submit must be used to launch the batch script:
<do> $submit $submit_script </do>
A platform loading header must be added to enable the unclebench -p option:
<include-path>
  <path> $UBENCH_PLATFORM_DIR </path>
</include-path>
<include from="platforms.xml" path="include-path"/>
A multisource section is needed to describe where the sources and test cases can be found:
<multisource>
  <source protocol="https">
    <url>https://<server_url>/benchs/public</url>
    <file>/stream/stream-5.10.tar.gz</file>
  </source>
</multisource>
The sources of a benchmark can be downloaded using the following protocols: git, svn, local, and https. The git and svn protocols accept a revision parameter, which selects the version of the benchmark to download by giving the corresponding commit or revision number. This also creates a symbolic link, named after the revision and the repository name, which is used during execution. <file> and <do> are two optional elements used, respectively, to select specific files and to execute shell commands within the multisource section.
For example, to use the configuration files located in the benchmark directory, we add a local source:
<source protocol="local" > <file>$UBENCH_BENCHMARK_DIR/hpl/HPL.dat.in</file> <file>$UBENCH_BENCHMARK_DIR/hpl/Make.gnu</file> <file>$UBENCH_BENCHMARK_DIR/hpl/Make.intel</file> </source>
A second example uses configuration files located in a Git repository, pinned to a specific commit:
<source protocol="git" name="io500"> <url>https://github.com/VI4IO/io-500-dev.git</url> <revision>b316f68fb724172627ac5add193e3374de879d3c</revision> <do> sed "0,/build_ior/ s/build_ior//" utilities/prepare.sh \ | sed "s#./configure#true#;s#./bootstrap#true#" | bash </do> <file>IO500/IO500.xml</file> </source>
The definition of a complex benchmark can be split across several files, commonly one per step (e.g. prepare, compile, execute). Such a benchmark definition is then composed of several files organized as follows:
bundle/
├── bench_execute.xml
├── bench_compile.xml
├── bench_prepare.xml
└── bench.xml
The main file bench.xml will look something like:
<jube>
  <include-path>
    <path> $UBENCH_PLATFORM_DIR </path>
  </include-path>
  <include from="platforms.xml" path="include-path"/>
  <multisource>
    ...
  </multisource>
  <benchmark name="bench" outpath="benchmark_runs">
    <!-- ===================== Compile ===================== -->
    <include from="bench_compile.xml" path="fileset"/>
    <include from="bench_compile.xml" path="parameterset"/>
    <include from="bench_compile.xml" path="step"/>
    <!-- ===================== Prepare ===================== -->
    <include from="bench_prepare.xml" path="fileset"/>
    <include from="bench_prepare.xml" path="step"/>
    <!-- ===================== Execute ===================== -->
    <include from="bench_execute.xml" path="fileset"/>
    <include from="bench_execute.xml" path="parameterset"/>
    <include from="bench_execute.xml" path="substituteset"/>
    <include from="bench_execute.xml" path="step"/>
  </benchmark>
</jube>
Every benchmark should have an execute step whose minimal structure looks like this:
<step depend="prepare" name="execute"> <use>system_parameters</use> <use from="platform.xml">cluster_specs</use> <use from="platform.xml">execute_set</use> <use from="platform.xml">execute_sub</use> <use from="platform.xml">jobfiles</use> <do done_file="$done_file">$submit $submit_script</do> </step>
More information about the syntax can be found on the JUBE website.
3.2. Platform file writing
Platforms are organized under the platform directory: each platform has its own subdirectory containing a platform.xml file. The structure looks something like this:
platform/
├── template.xml
└── Zbook15
    ├── platform.xml
    └── submit.job.in
Here, only the platform Zbook15 is defined. The file template.xml is used by UncleBench to automatically generate a global platforms.xml file, which allows any platform to be included from a benchmark description. The parameters normally used in this file are described here.
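For illustration, a minimal platform.xml would mainly define the parametersets that benchmark steps pull in with <use from="platform.xml">. In the sketch below, the set names match the execute step shown in section 3.1, while the parameter names and values are placeholders borrowed from examples elsewhere in this document:

<jube>
  <!-- Hardware description of the platform (placeholder values) -->
  <parameterset name="cluster_specs">
    <parameter name="NUMA_regions" type="int">2</parameter>
    <parameter name="cores_per_NUMA_region" type="int">12</parameter>
  </parameterset>
  <!-- Batch submission settings (placeholder values) -->
  <parameterset name="execute_set">
    <parameter name="submit">sbatch</parameter>
    <parameter name="submit_script">submit.job</parameter>
    <parameter name="done_file">ready</parameter>
  </parameterset>
  <!-- The execute_sub substituteset and jobfiles fileset used by the
       execute step are omitted from this sketch -->
</jube>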
4. Download benchmark sources and test cases
4.1. multisource sections
A multisource section is needed in the <benchmark_name>.xml JUBE file to describe where the sources and test cases can be found.
<multisource>
  <source protocol="https">
    <url>https://<server_url>/benchs/public</url>
    <file>/stream/stream-5.10.tar.gz</file>
  </source>
</multisource>
Different protocols can be used: git, svn, local, and https. Downloaded files can be referenced later by assigning the source a name, for example:
<multisource>
  <source protocol="https" name="stream">
    <url>https://<server_url>/benchs/public</url>
    <file>/stream/stream-5.10.tar.gz</file>
  </source>
</multisource>
In this case a JUBE variable called stream will be available to be referenced throughout the benchmark description, for example:
<fileset name="source"> <prepare>tar -xvf ${stream} .</prepare> </fileset>
This functionality can be used to automatically create a JUBE workspace for each downloaded file. For example, if you want to test two different versions:
<multisource>
  <source protocol="https" name="stream">
    <url>https://<server_url>/benchs/public</url>
    <file>/stream/stream-4.10.tar.gz</file>
    <file>/stream/stream-5.10.tar.gz</file>
  </source>
</multisource>
To use the generated JUBE variables from other steps, you have to add <use> ubench_config </use> and <use> ubench_files </use> elements to the step. For example, if you want the previous files to be accessible in a compile step:
<step name="compile" export="true"> <use> ubench_config </use> <use> ubench_files </use> <do> tar -xvf ${stream} -C stream </do> <!-- Choose compiler and MPI versions --> <do> module purge </do> <do> module load $module_compile $module_mpi </do> <do workdir="stream"> ${mpi_cc} ${cflags_opt} -std=c99 -DSTREAM_ARRAY_SIZE=${stream_array_size} -mcmodel=large -fopenmp stream_mpi.c -o stream_mpi.exe </do> </step>
Physically, the files would be accessible through symbolic links in the step directory.
4.2. ubench fetch
$ ubench fetch -b <benchmark_name>
This command downloads benchmark sources and test case files from the location specified by the multisource section to a local directory. The default directory where resources are fetched is /scratch/<user>/Ubench/resource, but it can be customized with the UBENCH_RESOURCE_DIR environment variable.
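For example, to fetch the stream benchmark sources into a custom location (the path below is illustrative):

$ export UBENCH_RESOURCE_DIR=$HOME/ubench_resources
$ ubench fetch -b stream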
5. Run benchmark(s)
5.1. ubench run
The ubench run command launches one or more benchmarks on a given platform. The nodes to run on can be specified either as a number of nodes or as a node list in ClusterShell syntax. It is also possible to customize benchmark parameters on the fly.
$ ubench run -b <benchmark_name> -p <platform> [-w <nodes_list|number_of_nodes>] [-c <parameter_name>:<value>]
Run <benchmark_name> on 4 nodes:
$ ubench run -b <benchmark_name> -p <platform> -w 4
---- Ubench platform name set to : <platform>
---- <benchmark_name> description files are already present in run directory and will be overwritten.
---- benchmark run directory : /scratch/<user>/Ubench/benchmarks/<platform>/<benchmark_name>/benchmark_runs/000000
---- Use the following command to follow benchmark progress: " ubench log -p <platform> -b <benchmark_name> -i 000000"
Run <benchmark_name> three times, on 4, 8, and 16 nodes:
$ ubench run -b <benchmark_name> -p <platform> -w 4 8 16
Run <benchmark_name> on a precise set of nodes given by their IDs:
$ ubench run -p <platform> -b <benchmark_name> -w cn[100-105,205,207]
5.2. ubench list / ubench result
The ubench list command prints a table of all runs that have been launched for a benchmark on a given platform.
$ ubench list -p <platform> -b <benchmark_name>
Benchmark_name   | Platform   | ID     | Date             | Run_directory | Nodes |
------------------------------------------------------------------------------------
<benchmark_name> | <platform> | 000000 | Mon Jan 16 10... | /scratch/.... | 1     |
<benchmark_name> | <platform> | 000001 | Mon Jan 16 10... | /scratch/.... | 4     |
The ubench result command calls the jube result command, which parses benchmark result files according to the content of the <analyse> and <result> XML sections (a sketch of such sections is shown below). If no ID is given, the results of the last executed benchmark are printed.
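As a hedged illustration of what ubench result relies on, a benchmark file typically pairs a patternset with <analyser> and <result> sections. The triad_bw pattern below, extracting a STREAM-like Triad figure, is an illustrative assumption, not something defined by UncleBench:

<patternset name="results_pattern">
  <!-- $jube_pat_fp is JUBE's built-in floating-point pattern -->
  <pattern name="triad_bw" type="float">Triad:\s+$jube_pat_fp</pattern>
</patternset>
<analyser name="analyse">
  <use>results_pattern</use>
  <analyse step="execute">
    <file>stdout</file>
  </analyse>
</analyser>
<result>
  <use>analyse</use>
  <table name="result" style="pretty">
    <column>nodes</column>
    <column>triad_bw</column>
  </table>
</result>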
$ ubench result -p <platform> -b <benchmark_name> [-i <benchmark_id>]
Example with an hpcc benchmark launched with the -w 4 8 16 option:
$ ubench result -p <platform> -b hpcc
Processing hpcc benchmark :
----analysing results
----extracting analysis
nodes  MPIRandAcc_GUPs  MPIFFT_Gflops  PTRANS_GBs  StarDGEMM_Gflops  RORingBandwidth_GBytes  RORingLatency_usec ...
4      0.862475         106.63         33.4613     34.5633           0.496759                1.27526 ...
8      1.40598          177.489        70.8118     34.0936           0.401475                1.28486 ...
16     2.62665          345.495        121.784     33.5744           0.348505                1.32338 ...
Additionally, a result file in YAML format is generated, containing environment-related information along with the results. An excerpt of such a file is shown below:
Benchmark_name: hpcc
Date: Wed Feb 7 10:11:17 2018
ID: '000000'
Nodes: 1 4 8 16
Platform: cluster
Run_directory: /scratch/user/hpcc/benchmark_runs/000000
cmdline: ubench run -b hpcc -p cluster -w 32 48
runs:
  '1':
    GB_per_node: '64'
    LLC_cache_line_size: '64'
    MB_LLC_size: '2.5'
    NUMA_regions: '2'
    OMP_NUM_THREADS: '1'
    arch: intel
    args_exec: ''
    args_starter: -n 24
    cc: icc
    cflags: -O2
    cflags_opt: -O3 -xHost
    chainjob_needs_submit: 'false'
    chainjob_script: ./judge-chainJobs.sh
    comp_v: '2'
    comp_version: intel
    context_fields:
    - nodes
    - modules
    cores_per_NUMA_region: '12'
    custom_id: '0'
    custom_nodes: '1'
    done_file: ready
    env: export OMP_NUM_THREADS=1
    errlogfile: job.err
    executable: ./hpcc
    fc: ifort
    fflags: -O2
    fflags_opt: -O3 -xHost
    job_id_ubench: '13854133'
    mail: ''
    memory_proportion: '0.1'
    module_compile: icc/2016.4.072 ifort/2016.4.072
    module_mpi: impi/2016.4.072
    mpi_cc: mpicc
    mpi_cxx: mpicxx
    mpi_f77: mpif77
    mpi_f90: mpif90
    mpi_v: '4'
    nodes: '1'
    notification: ''
    outlogfile: job.out
    platform_name: cluster
    results_bench:
      MPIFFT_Gflops: '31.0547'
      MPIRandAcc_GUPs: '0.33787'
      PTRANS_GBs: '8.4763'
      RORingBandwidth_GBytes: '1.4881'
      RORingLatency_usec: '0.842583'
      StarDGEMM_Gflops: '21.5038'
      StarSTREAM_Triad: '3.34193'
    starter: mpirun
    submit: sbatch
    submit_script: submit.job
    taskspernode: '24'
The ubench result command has a debug mode that reports each execution’s path, making errors easier to trace. It can be activated by adding the -d or --debug option, as shown below:
$ ubench result --debug -p <platform> -b <benchmark_name> [-i <benchmark_id>]
5.3. ubench status / ubench log
The ubench status command prints the status of each step of a benchmark run.
$ ubench status -p <platform> -b <benchmark_name>
Status run dir: /scratch/<user>/Ubench/benchmarks/<platform>
Processing <benchmark_name> benchmark :

Status for step: compile
--------------------------------
id  started  done   workdir
--------------------------------
0   true     true   /scratch/....

Status for step: execute
--------------------------------
id  started  done   workdir
--------------------------------
1   true     false  /scratch/....
ubench log prints the concatenation of every file referenced in the <analyse> section, plus standard files such as stdout, stderr, and run.log. It is useful for following the benchmark process precisely without having to dig through the benchmark run directory.
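For example, to follow the first run of a benchmark (run IDs such as 000000 are assigned sequentially, as shown by ubench list):

$ ubench log -p <platform> -b <benchmark_name> -i 000000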
5.4. ubench compare
Compares results generated by the ubench result command. The comparison is made between results that were obtained with the same context (i.e., mpi_version, fflags, cc, nodes).
$ ubench compare -d ~/15-03-2018/hpl/benchmark_runs/000001 ~/23-03-2018/hpl/benchmark_runs/000001
comparing :
- /home/user/15-03-2018/hpl/benchmark_runs/000001/
- /home/user/23-03-2018/hpl/benchmark_runs/000001/
nodes  modules                                            custom_nodes_id  result_pre  result_post_0  result_diff_0
32     icc/2015.1.133 ifort/2015.1.133 openmpi/1.10.4...  None             11660.0     11630.0        -0.25729
48     icc/2015.1.133 ifort/2015.1.133 openmpi/1.10.4...  None             17060.0     17210.0        0.87925
We can, for example, compare performance values for different MPI implementations by using -c to take nodes as the context, -a to compare between mpi_version values, and passing the result directory twice:
$ ubench compare -d ~/15-03-2018/hpl/ ~/15-03-2018/hpl/ -c nodes -a mpi_version
- /home/user/15-03-2018/hpl/benchmark_runs/000001/
- /home/user/15-03-2018/hpl/benchmark_runs/000001/
nodes  mpi_version_pre      result_pre  mpi_version_post_0   result_post_0  result_diff_0
1      OpenMPI-1.10         448.6       OpenMPI-1.10         448.6          0.000000
1      OpenMPI-2.0.1        457.9       OpenMPI-2.0.1        457.9          0.000000
1      OpenMPI-2.0.1        457.9       IntelMPI-2016.4.072  459.6          0.371260
1      OpenMPI-2.0.1        457.9       OpenMPI-1.6.5        450.7          -1.572396
1      OpenMPI-2.0.1        457.9       OpenMPI-1.10         448.6          -2.031011
1      IntelMPI-2016.4.072  459.6       IntelMPI-2016.4.072  459.6          0.000000
1      IntelMPI-2016.4.072  459.6       OpenMPI-1.6.5        450.7          -1.936466
1      IntelMPI-2016.4.072  459.6       OpenMPI-1.10         448.6          -2.393386
1      IntelMPI-2016.4.072  459.6       OpenMPI-2.0.1        457.9          -0.369887
6. Advanced usage
6.1. Customize parameters on the fly
With UncleBench it is possible to customize benchmark and platform parameters on the fly. The listparams command lists every customizable parameter:
$ ubench listparams -b hpl -p <platform_name>

platform parameters
-----------------------------------------------
comp_v : 2
comp_version : ["gnu","intel","intel"][$comp_v]
cc : ["gcc","icc","icc"][$comp_v]
fc : ["gfortran","ifort","ifort"][$comp_v]
cflags : -O2
fflags : -O2
cflags_opt : ["-O3 -march=core-avx2","-O3 -xHost","-O3 -xHost"][$comp_v]
fflags_opt : "-O3 -march=core-avx2"
module_compile : ["","icc/2016 ifort/2016","icc/2017 ifort/2017"][$comp_v]
module_blas : ["","mkl/2016","mkl/2017"][$comp_v]
blas_root : ["","/opt/intel/2016.0.047","/opt/mkl-2017.0.098"][$comp_v]
starter : srun
mpi_v : 2
mpi_root : ["/opt/openmpi-1/","/opt/openmpi-2/","/opt/impi/2016","/opt/impi/2018/"][$mpi_v]
mpi_cc : mpicc
mpi_cxx : mpicxx
mpi_f90 : mpif90
mpi_f77 : mpif77
module_mpi : ["","openmpi/1","openmpi/2","impi/2016-srun","impi/2017-srun"][$mpi_v]
cuda_tlk_v : 0
cudnn_v : 1
platform_name : Cluster1
partition : cn
GB_per_node : 126
MB_LLC_size : 70
LLC_cache_line_size : 64
NUMA_regions : 2
cores_per_NUMA_region : 14
cores_per_half_NUMA_region : 7

benchmark parameters
-----------------------------------------------
hpl_id : 0
hpl : ['hpl-2.2.tar.gz'][${hpl_id}]
variant_v : 0
variant_name : ["Full_MPI"][$variant_v]
variant_NB : 192
memory_proportion : 0.2
variant_Ntemp : (${memory_proportion}*${nodes}*(${GB_per_node})*1e9/8) ** 0.5
variant_N : int( round( ${variant_Ntemp} / ${variant_NB} ) * ${variant_NB})
arch : ${comp_version}
nodes : 1
taskspernode : $NUMA_regions*$cores_per_NUMA_region
threadspertask : 1
executable : ./xhpl
modules : $module_compile $module_mpi $module_blas
timelimit : 24:00:00
args_starter : ${binding_full_node}
For example, an HPL benchmark can be launched with the memory_proportion parameter customized so that HPL consumes only 40 percent of the available memory.
$ ubench run -b hpl -p <platform_name> -c memory_proportion:0.4
---- Ubench platform name set to : <platform_name>
---- hpl description files are already present in run directory and will be overwritten.
---- memory_proportion parameter was modified from 0.8 to 0.4 for this run
---- benchmark run directory : /scratch/..../hpl/benchmark_runs/000000
---- Use the following command to follow benchmark progress : " ubench log -p eocn -b hpl -i 000000"
To run with a different compiler and a different MPI version, type:
$ ubench run -p eocn -b hpl -c comp_v:0 mpi_v:2
A file containing the parameters to change can also be passed; the previous command can then be executed as follows:
$ ubench run -p eocn -b hpl -f parameters.yaml
The content of the file looks like:
comp_v: 0
mpi_v: 2
UncleBench also allows performing only the execute step of a given benchmark using the -e option; in this case, some parameters have to be set explicitly, such as the executable and the input files:
$ ubench run -p eocn -b imb -e -c executable:'/scratch/user/imb/exe/IMB-MPI1' modules:'icc/2016 ifort/2016 impi/2016'
A second way to perform an execution-only run is to pass a YAML file that contains all the necessary changes and parameters, such as the binary, modules, executable, and input files:
$ ubench run -p eocn -b hpl -e -f parameters.yaml
For example, the HPL benchmark can be re-executed with a file hpl.yaml that looks like:
executable: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/xhpl
binary: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/xhpl
pq_script: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/pq_script.py
data: $UBENCH_RUN_DIR_BENCH/eogn/hpl/benchmark_runs/000001/000001_execute/work/HPL.dat
modules: 'icc/2018.1.038 ifort/2018.1.038 impi/2018.1.038-srun'
6.2. Build performance report
The ubench report command builds Asciidoctor source files that are compiled into a global performance report. It requires a YAML metadata file describing where the performance result files are located and which information to extract and write into the performance report.
$ ubench report -m <metadata_file> -o <output_dir>
The YAML file must be organized as in the example below:
author: '<Report author>'
sessions:
  - default:
      platform: '<platform_name>'
  - '<session1_name>':
      tester: '<session1_tester_name>'
      dir: <session1_result_directory>
      date_start: <start_date_session1>
      date_end: <end_date_session1>
  - '<session2_name>':
      dir: <session2_result_directory>
      tester: '<session2_tester_name>'
      date_start: <start_date_session2>
      date_end: <end_date_session2>
  - ...
contexts:
  - default:
      compare: True
      compare_threshold: '5'
      context:
        - 'nodes'
        - 'tasks'
      context_res: 'mpi_version'
  - '<bench1_name>':
      compare_threshold: '1'
      context:
        - 'nodes'
      context_res: 'mpi_version'
  - ...
benchmarks:
  - default:
      result: 'ok'
  - '<bench1_name>':
      '<session1_name>':
        comment: '<bench1_name_s1_comment>'
      '<session2_name>':
        comment: '<bench1_name_s2_comment>'
  - '<bench2_name>':
      '<session1_name>':
        comment: '<bench2_name_s1_comment>'
      '<session2_name>':
        comment: '<bench2_name_s2_comment>'
  - ...