Traffic model simulation: Threaded computation
Having run this example in serial we can now explore running the simulation in parallel in order to test how to speed up getting the average velocity of cars changes with traffic density.
The source code
In this exercise we will be using the traffic simulation program. The source code is available in Git repository EPCC-Exercises we have already downloaded.
We will now be looking at the multi threaded version located in the C-OMP
folder. The uses openmp to parallelise the execution of the model.
Compiling the source code
We will compile the openmp version of the source code using a Makefile.
Move into the C-OMP
directory and list the contents.
cd C-OMP ls
Output:
traffic.c traffic.h trafficlib.c uni.c uni.h Makefile
You will see that there are various code files. The Makefile contains the commands to compile them together to produce the executable program. To use the Makefile type make
command.
Note
We don’t need to set a new environment file as the ‘EPCC-Exercises/Env/env-{ machine_name }.sh’ we sourced earlier also set the correct environment parameters to correctly build the parallel examples of the code on { machine_name }.
make
Output:
cc -g -O3 -c traffic.c
cc -g -O3 -c trafficlib.c
cc -g -O3 -c uni.c
cc -g -O3 -o traffic traffic.o trafficlib.o uni.o -lm
This should produce an executable file called traffic
.
Running: Threaded code
We can run the threaded program directly on the login nodes,
./traffic 0.52
Again the argument is setting the target traffic density of cars for the model.
By default the number of threads used is 256. This can be changed by setting,
export OMP_NUM_THREADS=8
for example this sets the number of threads to 8.
Running this quickly on the login node leads to
Output:
Length of road is 32000000
Number of iterations is 100
Target density of cars is 0.520000
Running on 8 thread(s)
Initialising road ...
...done
Actual density of cars is 0.519982
At iteration 10 average velocity is 0.789461
At iteration 20 average velocity is 0.837157
At iteration 30 average velocity is 0.858209
At iteration 40 average velocity is 0.870573
At iteration 50 average velocity is 0.878940
At iteration 60 average velocity is 0.885087
At iteration 70 average velocity is 0.889761
At iteration 80 average velocity is 0.893441
At iteration 90 average velocity is 0.896447
At iteration 100 average velocity is 0.898955
Finished
Time taken was 2.446868 seconds
Update rate was 1307.794488 MCOPs
Running: Batch system
Now we have tested the simulation on the login node we now need to set this up to run on the compute nodes.
Note
In general jobs should not be run on the login nodes. It is however reasonable to run small and short duration tests to validate your software before submitting to the compute noe queues.
Example substitution:
archer2.slurm:
To run the simulation using the compule nodes you need to a job script,
#!/bin/bash
#SBATCH --job-name=sharpen
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:01:00
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
# Setup the batch environment
module load epcc-job-env
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./traffic 0.52
This is an OpenMP program so we control the number of parallel threads used with the --cpus-per-task
variable.
To submit the job to run on the compute nodes we use the sbatch
command
sbatch archer2.slurm
Output:
Submitted batch job 1793266
Where the number is the unique job ID.
Note
On ARCHER2 you must submit jobs from the ``/work`` filesystem.
Monitoring the batch job
The slurm command squeue
can be used to show the status of the jobs. Without any options or arguments it lists all jobs known by the scheduler.
squeue
To show just your jobs add the -u $USER
option
squeue -u $USER
Note that for this example it runs very quickly so you may not see it in the queue before it finishes running.
Finding the output
Slurm places the output from your job in a file called slurm-<jobID>.out
. You can view it using the cat
command
cat slurm-1793266.out
Length of road is 32000000
Number of iterations is 100
Target density of cars is 0.520000
Running on 4 thread(s)
Initialising road ...
...done
Actual density of cars is 0.519982
At iteration 10 average velocity is 0.789461
At iteration 20 average velocity is 0.837157
At iteration 30 average velocity is 0.858209
At iteration 40 average velocity is 0.870573
At iteration 50 average velocity is 0.878940
At iteration 60 average velocity is 0.885087
At iteration 70 average velocity is 0.889761
At iteration 80 average velocity is 0.893441
At iteration 90 average velocity is 0.896447
At iteration 100 average velocity is 0.898955
Finished
Time taken was 2.779974 seconds
Update rate was 1151.089820 MCOPs
Conclusion
Executing this for different values of the traffic density can now be acheived much more quickly as each simulation is fast than the serial case and we can submit jobs to multiple nodes simultaneously allowing us to run in parallel reducing our time to science.
Compare the time taken to perform calculations with the same traffic densities using the serial code and compare the time to solution. Plotting this data can be used to calculate the speed up of the threaded code over the serial version.
The simulation bares out a plot of what we expected that as te traffic density increases the speed of the traffic drops.