PRACTICAL: Benchmarking Molecular Dynamics Using GROMACS 3

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • Does simultaneous multithreading (SMT) improve GROMACS performance on ARCHER2?

  • How does load balancing affect GROMACS performance?

Objectives
  • Understand how to use simultaneous multithreading (SMT) on ARCHER2.

  • Learn how to disable GROMACS dynamic load-balancing and appreciate the effect load balancing can have on performance.

Two hardware threads per core

The --hint=nomultithread option asks SLURM not to use the second hardware thread available on each core. Removing it, or replacing it with --hint=multithread, makes 256 “cpus” available per node (two hardware threads per physical core). To run 8 MPI tasks, one per NUMA region, each with 32 OpenMP threads, the script would look like:

#!/usr/bin/env bash

#SBATCH --partition=standard
#SBATCH --time=00:10:00

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

# Use both hardware threads per core; distribute tasks block-wise
# across nodes and cyclically across the sockets within a node
#SBATCH --hint=multithread
#SBATCH --distribution=block:cyclic

# 32 "cpus" (hardware threads) per task = the 16 physical cores of one NUMA region
#SBATCH --cpus-per-task=32

module load epcc-job-env
module load xthi/1.0
# Assumed module name: load whichever GROMACS module you used in earlier runs
module load gromacs

export OMP_PLACES=cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -s benchMEM.tpr

Note: the physical cores appear with affinity indices 0-127, while the extra “logical” cores are numbered 128-255. Logical cores 0 and 128 occupy the same physical core, and so on.
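
You can confirm this pairing on a compute node by querying the Linux CPU topology. The following is a minimal sketch: the sysfs path is standard Linux, but the exact sibling lists reported depend on how the node enumerates its hardware threads.

# Show which hardware threads share a physical core.
# On an ARCHER2 node we expect cpu0 to report "0,128", cpu1 "1,129", etc.
for cpu in 0 1 127; do
  echo -n "cpu${cpu}: "
  cat /sys/devices/system/cpu/cpu${cpu}/topology/thread_siblings_list
done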

Multithreading and GROMACS?

Starting with the MPI-only case, how does enabling multithreading affect GROMACS performance? (A sketch of a pure-MPI job script follows after these questions.)

What about the performance of hybrid MPI+OpenMP jobs?
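
For the MPI-only case, a minimal sketch of a job script that places one MPI task on every hardware thread might look like the following. The GROMACS module name is an assumption (load whichever module you used in earlier runs), and you should check that the domain decomposition still succeeds with this many ranks:

#!/usr/bin/env bash

#SBATCH --partition=standard
#SBATCH --time=00:10:00

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=256

#SBATCH --hint=multithread
#SBATCH --cpus-per-task=1

module load epcc-job-env
# Assumed module name: load the GROMACS module used in your earlier runs
module load gromacs

# Pure MPI: one rank per hardware thread, a single OpenMP thread each
export OMP_NUM_THREADS=1

srun gmx_mpi mdrun -ntomp 1 -s benchMEM.tpr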

Load balancing

GROMACS performs dynamic load balancing when it deems this necessary. Can you tell from your md.log files so far whether it has been doing so, and what load imbalance it measured before deciding to act?
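
One way to check is to search the run log for the load-balancing report; a sketch (the exact log wording varies between GROMACS versions):

# Look for dynamic load balancing activity and the measured imbalance
# in the run log; exact phrasing differs between GROMACS versions.
grep -i "load balancing" md.log
grep -i "imbalance" md.log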

To demonstrate the effect of the load imbalance that GROMACS’s dynamic load balancing scheme counteracts, investigate what happens when the scheme is turned off by passing the -dlb no option to gmx_mpi mdrun.
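
Only the mdrun command needs to change; for example, in the hybrid script above:

# Identical run, but with dynamic load balancing disabled so that any
# load imbalance goes uncorrected
srun gmx_mpi mdrun -dlb no -ntomp $SLURM_CPUS_PER_TASK -s benchMEM.tpr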

Key Points
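  • Removing --hint=nomultithread (or setting --hint=multithread) exposes 256 “cpus” per node: two hardware threads on each of the 128 physical cores.

  • GROMACS balances load between domains dynamically when it detects sufficient imbalance; running with -dlb no shows how much performance this scheme recovers.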