1. Introduction

This guide describes how the Zbook15 platform file example is organized. This example may not be optimal but can still help you to build your own platform files.

2. Zbook15 platform parameters

2.1. compiler_set

These parameters define available compiler suites and associated commands and modules.

Parameter Description Value example

compv

Enable the benchmark to select one or multiple compiler versions

0,1

comp_version

Name associated with the compiler suite

[gnu,<other_compiler>][$comp_v]

cc

C/C++ compiler command

["gfortran","<?>"][$comp_v]

cflags

Default C/C++ compilation flags

-O2

fflags

Default Fortran compilation flags

-O2

cflags_opt

Aggresive C optimization flags

["-O3 -march=native","?"][$comp_v]

fflags_opt

Aggresive Fortran optimization flags

["-O3 -march=native","?"][$comp_v]

module_compile

Modules that are needed to compile

module_blas

Modules that provide Blas library

blas_root

Blas Root directory

<Needs a local blas installation path>

2.2. mpi_set

These parameters define available MPI libraries and associated commands and modules.

Parameter Description Value example

mpiv

Enable the benchmark to select one or multiple compiler versions

0,1

mpi_version

Name associated with the compiler suite

[OpenMPI-1.6.5,<other_mpi>][$comp_v]

mpi_cc

C compiler MPI wrapper command

mpicc

mpi_cxx

C++ compiler MPI wrapper command

mpic++

mpi_f90

Fortran compiler MPI wrapper command

mpif90

module_mpi

Module providing MPI

binding_full_node

-n $tasks

Optimal binding when using one task per core

binding_half_node

--mca rmaps_base_schedule_policy slot --bind-to-core -cpus-per-rank 2 -num-sockets 1 --npersocket 2 -n $tasks

Optimal binding when using two times less tasks than available cores.

binding_hybrid

--mca rmaps_base_schedule_policy slot --bind-to-core -cpus-per-rank 2 -num-sockets 1 --npersocket 2 -n $tasks

Optimal binding when using two times less tasks than available cores and two threads per task

binding_stream

--mca rmaps_base_schedule_policy slot --bind-to-core -cpus-per-rank 4 -num-sockets 1 --npersocket 1 -n $tasks

Optimal binding when using one task per socket filling socket cores with threads.

2.3. cuda_set

These parameters define available Nvidia CUDA related libraries and moudles.

Parameter Description Value example

cuda_tlk_v

Enable the benchmark to select one or multiple CUDA toolkit versions

0,1

cudnn_v

Enable the benchmark to select one or multiple CUDA Deep Neural Network (CuDNN) library versions

0,1

module_mpi

Module providing CUDA toolkit

module_mpi

Module providing CuDNN library

2.4. execute_set

Define batch system dependent parameters. As Zbook15 does not use a job scheduler, a Slurm cluster execute_set is shown as an example.

2.4.1. Zbook15

Parameter Description Value example

submit

Command to submit jobs

bash

submit_singleton

Command to submit singleton jobs

(See Slurm Cluster for an example)

submit_script

Name of the batch script template

job.submit

starter

Launcher command

mpirun

args_starter

Optionals starter arguments that may be set by the benchmark

2.4.2. Slurm Cluster

Parameter Description Value example

submit

Command to submit jobs

sbatch

submit_singleton

Command to submit singleton jobs

sbatch --dependency=singleton

submit_script

Name of the batch script template

job.submit

starter

Launcher command

srun

args_starter

Optionals starter arguments that may be set by the benchmark

2.5. cluster_specs

This parameter set defines platform hardware specifications.

Parameter Description Value example

platform_name

Name of the platform

Zbook15

GB_per_node

Available RAM per node (GB)

16

MB_LLC_size

Size of the last level of cache (MB

6

LLC_cache_line_size

Size of LLC cache line (B)

64

NUMA_regions

Number of non uniform memory access regions

1

core_per_NUMA_region

Number of cores per NUMA region

4

2.6. system_parameters

These parameters define default values used to fill the job template file. Values are most of the times reset by benchmark description file.

Parameter Description Value example

nodes

Default number of nodes, set by benchmarks and modified by the -w option

1

taskspernode

Default number of tasks per node

4

threadspertask

Default number of threads per task

1

tasks

Number of tasks

$nodes * $taskspernode

OMP_NUM_THREADS

Number of OpenMP threads

$threadspertask

executable, args_exec,mail, ..

See job.submit template to understand theses parameters

2.7. execute_sub / jobfiles

jobfiles defines the job submission script template file. execute_sub describes all the subsitutions that need to be done to get a complete submission script from the template.