Tursa scheduler configuration update 2026

Last update

This information was last updated on 15 July 2026.

On 9 July 2026 the Tursa scheduler configuration underwent major changes to make the resource more flexible. This page provides an overview of the changes and what they mean for users.

Summary of user impact

This list provides a high-level overview of user impacts. More detail can be found further down this page.

Strict topology blocking is no longer enforced on the system.
Users who want to keep topology blocking need to request it with the --switches option to sbatch/srun/salloc.
Jobs that were queued during the migration retain the topology blocking request that was the default at the time of submission. Users who want to remove this constraint need to delete those jobs and resubmit them.
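
For jobs submitted before the change, the delete-and-resubmit step might look like the sketch below. The helper name `resubmit_job` is hypothetical, as are the job ID and script name in the usage line; `squeue`, `scancel` and `sbatch` are the standard Slurm commands.

```bash
# Hypothetical helper: cancel a queued job and submit its script again so
# that the new job is queued without the old default topology constraint.
# Usage: resubmit_job <jobid> <batch-script>
resubmit_job() {
    # scancel removes the queued job; sbatch then submits the script afresh.
    scancel "$1" && sbatch "$2"
}

# To find the IDs of your own pending jobs first:
#   squeue -u "$USER" -t PENDING
```

For example, `resubmit_job 123456 submit.slurm` (both values illustrative). A resubmitted job is a new job, so any priority it gained from time spent in the queue may not carry over.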

Overview of Tursa interconnect topology

The original configuration of the scheduler was designed to match the interconnect topology, because the initial use cases for Tursa depended critically on the best interconnect performance, which can only be achieved by matching job layout to interconnect topology.

Many of the changes relax topology restrictions, so a very brief description of the topology layout is given here as context for the sections that follow.

Tursa is composed of 181 compute nodes, all but five of which are contained within blocks connected to a set of 4 L1 switches. These are divided into:

14 blocks of 8 A100-40 nodes
6 blocks of 8 A100-80 nodes
4 blocks of 4 A100-80 nodes

All blocks are connected via 20 L2 switches in a fat tree topology.
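
As a quick consistency check, the block counts listed above can be added up with shell arithmetic. The variable names are illustrative; the numbers are the ones given on this page.

```bash
# Node counts per block type, taken from the block list on this page.
a100_40_nodes=$((14 * 8))    # 14 blocks of 8 A100-40 nodes
a100_80_large=$((6 * 8))     # 6 blocks of 8 A100-80 nodes
a100_80_small=$((4 * 4))     # 4 blocks of 4 A100-80 nodes

# Nodes that sit inside a block, and nodes left over out of 181 in total.
blocked_nodes=$((a100_40_nodes + a100_80_large + a100_80_small))
unblocked_nodes=$((181 - blocked_nodes))

echo "nodes in blocks: ${blocked_nodes}"
echo "nodes outside blocks: ${unblocked_nodes}"
```

This gives 176 nodes in blocks and 5 outside, matching the "all but five" statement. It also shows that only 112 A100-40 nodes and 64 A100-80 nodes sit inside blocks, which is useful context for the largest job sizes.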

What has changed?

Removal of enforced topology blocking

Originally, all jobs on Tursa were subject to enforced topology blocking. For multi-node jobs of 8 nodes or fewer, all the nodes in the job had to come from a single block that shared an L1 switch. For jobs larger than 8 nodes, the scheduler was further configured to allocate jobs into predefined 16, 32 and 64 node blocks based on tested performance between groups of individual 8 node blocks.

For users, the effect of these restrictions was that single node failures in a block could render larger jobs very difficult to place and lead to long queue times.

The change removed this enforced blocking, so by default jobs can now be assigned nodes from anywhere in the interconnect topology. This allows more flexibility in job placement and reduces the impact of single node failures on the availability of resources.

Users can recover the strict blocking behaviour by using the Slurm --switches option. See below for specific examples of how to use this.

Performance of jobs of 8 nodes or fewer

If an application running on 8 nodes or fewer depends critically on interconnect performance, it makes sense to ensure that all nodes in the job come from the same block and share an L1 switch. Many HPC applications do not have their performance limited in this way.

Removal of power of two job size restriction

Another consequence of the strict interconnect blocking was that jobs were restricted to power of two node counts (to match the topology layout).

This change removed the restriction of job node counts to powers of two. This allows more flexibility in job placement.

Changes to priority formula

Along with changes to blocking restrictions, we implemented some changes to the job priority setup:

Enable allocation-tied fairshare: originally, all projects had the same number of shares on the service. This change linked each project's number of shares to the size of its current GPUh allocation, which helps ensure that projects get priorities that allow them to use the level of allocation they have been granted on the service.
Update priority weights: to support allocation-tied fairshare, we updated the weights in the priority formula to increase the weight of the fairshare priority component relative to the other components.
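
To illustrate what tying shares to allocations means, the sketch below computes each project's fraction of the total shares when shares scale with GPUh allocation. The three projects and their allocation sizes are hypothetical, and the actual share values and priority weights configured on Tursa are not given on this page.

```bash
# Hypothetical GPUh allocations for three illustrative projects.
alloc_a=100000
alloc_b=300000
alloc_c=600000
total=$((alloc_a + alloc_b + alloc_c))

# Percentage of the total shares a project holds if its shares are
# proportional to its allocation (integer arithmetic, rounds down).
share_percent() {
    echo $(( 100 * $1 / total ))
}

echo "project A: $(share_percent "${alloc_a}")% of shares"
echo "project B: $(share_percent "${alloc_b}")% of shares"
echo "project C: $(share_percent "${alloc_c}")% of shares"
```

Under the original equal-shares setup, each of these three projects would have held one third of the shares regardless of allocation size.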

Slurm `--switches` option

Users can use the Slurm --switches option to partially recover the job topology blocking behaviour from before the change.

Static setup of blocks of 16 nodes or more cannot be recovered

In the original configuration, the blocks in the topology that made up 16, 32, 64 or 128 node blocks were statically defined. In the updated configuration, any combination of 8 node blocks can make up larger jobs, with strict blocking defined by --switches options as described below.

Default behaviour

In the updated configuration, no interconnect blocking is enforced by default. If you do not supply any additional options to your Slurm jobs, the scheduler is free to pick nodes from anywhere on the system.

Enforcing blocking: 8 nodes or fewer

A single interconnect block on the Tursa topology (where all nodes share L1 switches) contains 8 nodes. If you are running jobs of 8 nodes or fewer and want to ensure all nodes are in a single block (i.e. share an L1 switch), you should add the --switches=1 option to your Slurm submission commands. Most commonly, this would mean adding the following to your batch submission script:

#SBATCH --switches=1

Adding blocking is likely to increase queue time

Constraining a job to specific interconnect blocks with the --switches option gives the scheduler fewer placement choices, so the job will likely wait longer in the queue.
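
One way to bound the extra wait is the optional time suffix that Slurm accepts on this option, `--switches=<count>@<max-time>`. With it, the job waits for the requested switch count only up to the given time and may then be placed across more switches. The fragment below is a sketch: the 8-node job size and the 2-hour limit are assumed values, and this page does not say whether Tursa caps this wait through its scheduler settings.

```bash
# Request 8 nodes (assumed job size for this example).
#SBATCH --nodes=8
# Prefer a single block, but wait at most 2 hours for one (assumed limit).
# After that time the job may start on nodes spread over more switches.
#SBATCH --switches=1@02:00:00
```

Because the job can fall back to a less compact placement, this trades strict blocking for a shorter worst-case queue time. Omit the `@` suffix if the application genuinely needs all nodes in one block.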

Enforcing blocking: 16 nodes or more

Beyond 8 nodes, to ensure strict interconnect blocking, you need to increase the number of switches along with the node count:

Node count	`--switches` option	Notes
16	`--switches=2`
32	`--switches=4`
64	`--switches=8`	Not available for A100-80
128	`--switches=16`	Not available for A100-80. Only available when A100-40 and A100-80 nodes are mixed in the job.
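
The values in the table follow one rule: the node count divided by the 8 nodes in a block, rounded up. A small sketch of that calculation is shown below. The function name `switches_for_nodes` is hypothetical, and the calculation assumes every block used contributes 8 nodes, so it does not account for the 4-node A100-80 blocks.

```bash
# Nodes per L1 block, as described for the 8-node blocks on this page.
NODES_PER_BLOCK=8

# Print the smallest --switches value that can hold the given node count,
# assuming every block used contributes 8 nodes (ceiling division).
switches_for_nodes() {
    echo $(( ($1 + NODES_PER_BLOCK - 1) / NODES_PER_BLOCK ))
}

# Reproduce the node counts from the table.
for n in 16 32 64 128; do
    echo "--nodes=${n} --switches=$(switches_for_nodes "${n}")"
done
```

The same rule gives a value for other sizes, for example 3 switches for 24 nodes. A node count that is not a multiple of 8 leaves part of a block unused by the job.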

Power of two job sizes make sense for strict blocking

While the updated configuration no longer restricts jobs to power of two node counts, it will usually make sense to stick to these sizes if you want to request strict interconnect topology blocking.