Connecting to the remote HPC system
To connect to a remote HPC system using SSH, we would run the following command and enter our SSH key passphrase and the machine password (for ARCHER2, this is a time-based one-time password, TOTP).
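A minimal sketch, using a placeholder username and login address (substitute the values for your own account and system):

```bash
# Connect to the cluster's login node; both the username and hostname are placeholders
ssh yourUsername@login.cluster.example.org
```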
Why do we use HPC?
- High Performance Computing (HPC) typically involves connecting to very large computing systems located elsewhere in the world.
- These systems can perform tasks that would be impossible or much slower on smaller, personal computers.
- We already rely on remote servers every day.
Working on a remote HPC system
- HPC systems are large, fixed-location clusters designed for computationally intensive tasks, unlike cloud systems which are flexible and distributed.
- HPC systems typically provide login nodes and a set of worker nodes.
- The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
- Files and environments are often shared across nodes, meaning users can access their data and run jobs anywhere within the cluster.
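As a quick illustration (assuming a Slurm-based cluster; other schedulers provide different commands), you can inspect the node you are logged into and list the worker nodes the scheduler manages:

```bash
hostname    # which node am I on right now?
nproc       # how many CPU cores does this node have?
free -h     # how much memory is available here?
sinfo       # Slurm: list partitions and the state of the worker nodes
```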
Working with the scheduler
- Schedulers manage fairness and efficiency on HPC systems, deciding which user jobs run and when.
- A job is any command or script submitted for execution.
- The scheduler handles how compute resources are shared between users.
- Jobs should not run on login nodes — they must be submitted to the scheduler.
- MPI jobs require special launch commands (srun, mpirun, etc.) and explicit process counts to utilize multiple cores or nodes effectively.
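A minimal sketch of a batch script for a Slurm scheduler (the partition name, resource requests, and executable below are assumptions, not values for any particular system):

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00
#SBATCH --partition=standard

# Launch a hypothetical MPI program on all allocated tasks
srun ./my_mpi_program
```

It would be submitted with `sbatch example.sh`, and its state checked with `squeue -u $USER`.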
Accessing software via Modules
- HPC systems use modules to help deal with software incompatibilities, versioning and dependencies
- We can see what modules we currently have loaded with `module list`.
- We can see what modules are available with `module avail`.
- We can load a module with `module load softwareName`.
- We can unload a module with `module unload softwareName`.
- We can swap modules for different versions with `module swap old-softwareName new-softwareName`.
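A short worked example (the package name and version numbers are made up; check `module avail` on your own system for real ones):

```bash
module avail gcc                     # which versions of this (hypothetical) package exist?
module load gcc/11.2.0               # load a specific version
module list                          # confirm what is now loaded
module swap gcc/11.2.0 gcc/12.1.0    # switch to a different version
```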
Transferring files with remote computers
- It is an essential skill to be able to transfer files to and from a cluster.
- `wget` and `curl -O` can be used to download a file from the internet.
- `scp` transfers files to and from your computer.
- If you have a lot of data to transfer, it is good practice to archive and compress the data first.
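A few illustrative commands (URLs, usernames, hostnames, and filenames are all placeholders):

```bash
# Download a file from the internet onto the machine you are logged into
wget https://example.org/data/input.tar.gz
curl -O https://example.org/data/input.tar.gz

# Copy a file from your local machine to the cluster, and a result back again
scp input.tar.gz yourUsername@login.cluster.example.org:~/project/
scp yourUsername@login.cluster.example.org:~/project/results.txt .

# Bundle and compress a directory of many small files before transferring it
tar -czvf results.tar.gz results/
```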
Using resources effectively
- Benchmarking is an essential practice for understanding your workload and using resources efficiently
- Efficient usage is not just about getting the time-to-solution as low as possible; it also means not requesting (and wasting) more resources than your job can actually use.
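As a sketch, assuming a Slurm-based system and a hypothetical program `./my_app`: time a small benchmark run, then check what a finished batch job actually consumed before scaling up.

```bash
# Time a short benchmark run of a hypothetical program on a small input
time ./my_app small_input.dat

# After a batch job finishes, inspect its recorded usage (Slurm accounting;
# replace 123456 with a real job ID)
sacct -j 123456 --format=JobID,Elapsed,TotalCPU,MaxRSS,AllocCPUS
```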
Using shared resources responsibly
- Login nodes are a shared resource - be a good citizen!
- Your data on the system is your responsibility.
- Plan and test your large-scale work to prevent inefficient use of resources
- It is often best to convert many files to a single archive file before transferring.
- Again, don’t run stuff on the login node.