Introduction to High-Performance Computing

Bootstrapping your use of HPC

Overview

Teaching: 10 min
Exercises: 140 min
Questions
  • How can I get started on using HPC?

  • Where can I get help to start using HPC?

Objectives
  • Get help to get your work up and running on an HPC system

  • Understand where you can get help from in the future

Now you know enough about HPC to explore how to use it for your work, or to understand what its potential benefits are for you. You may also have ideas about where the barriers and difficulties may lie, and further questions about how you can start using or trying HPC in your area.

This session is designed to give you the opportunity to explore these questions and issues. The instructors and helpers on the course will be on hand to answer your questions and discuss next steps with you.

Potential discussions

Topics you could discuss with the instructors and helpers include:

Options for this session

There are several options for practical work during this session. The challenges below include: exploring your own work; an extended example using a parallel HPC application; and an extended example using high throughput computing on multiple serial analyses. If you would rather use the session for something else (e.g. to discuss things with the instructors/helpers as described above), please feel free to do so. The idea of the session is to help you bootstrap your use of advanced computing, and this will differ from individual to individual!

Exploring your work using HPC

If you have a practical example from your area of work that you would like help getting up and running on an HPC system, or whose performance you would like to explore on an HPC system, this is great! Please feel free to discuss this with us and ask questions (both technical and non-technical).

Exploring the performance of GROMACS

GROMACS is a world-leading biomolecular modelling package that is heavily used on HPC systems around the world. Choosing the best resources for GROMACS calculations is non-trivial as it depends on many factors, including:

In this exercise, you should try to decide on a good choice of resources and settings on Cirrus for a typical biomolecular system. This will involve:

If you want to explore beyond this initial task, there are several interesting ways to do so. For example:

Please ask for more information on these options from a helper!
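When comparing benchmark runs at different core counts, it can help to turn the measured run times into speedup and parallel efficiency. The sketch below shows one way to do that calculation. The function name `scaling_table` and the timings are made up for illustration and are not GROMACS results; replace them with the wall-clock times from your own runs.

```python
def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    timings maps a core count to the measured wall-clock time in seconds.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / timings[cores]
        # Efficiency: speedup divided by the increase in core count.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Placeholder timings in seconds, invented purely to show the calculation.
example_timings = {1: 800.0, 2: 400.0, 4: 250.0}

for cores, speedup, efficiency in scaling_table(example_timings):
    print(f"{cores:4d} cores  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

An efficiency well below 1 suggests that the extra cores are not being used effectively for that system size, which is one input to deciding what a "good" choice of resources looks like.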

Running many serial BLAST+ analyses in parallel

BLAST+ is a…

In this exercise, you should use what you have learnt so far to set up a way to run multiple serial BLAST+ analyses in parallel. There are many different ways to do this that can be used on their own or in combination. Some ideas include:

We have prepared an example dataset that has 100 sequences to analyse (in fact, 10 sequences repeated 10 times). This set is based on the BLAST GNU Parallel example at: https://github.com/LangilleLab/microbiome_helper/wiki/Quick-Introduction-to-GNU-Parallel
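One possible first step, assuming the sequences are stored together in a single FASTA file, is to split that file into smaller files that can each be handed to a separate serial analysis. The sketch below is only an illustration of the idea: the function names and the `chunk_000.fasta` naming scheme are hypothetical, and the layout of the course dataset may differ.

```python
from pathlib import Path


def read_fasta_records(lines):
    """Group lines into records, each starting with a '>' header line."""
    records = []
    for line in lines:
        if line.startswith(">"):
            records.append([line])
        elif records:
            # Sequence line belonging to the most recent header.
            records[-1].append(line)
    return records


def split_records(records, n_chunks):
    """Deal records round-robin into n_chunks lists (some may be empty)."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def split_fasta(fasta_path, n_chunks, out_dir):
    """Write up to n_chunks FASTA files into out_dir and return their paths."""
    lines = Path(fasta_path).read_text().splitlines(keepends=True)
    records = read_fasta_records(lines)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_records(records, n_chunks)):
        if not chunk:
            continue  # Fewer records than chunks: skip empty files.
        path = out_dir / f"chunk_{i:03d}.fasta"
        path.write_text("".join("".join(record) for record in chunk))
        paths.append(path)
    return paths
```

Dealing the records out round-robin keeps the chunks close to equal in size, which helps when each chunk is processed by its own core.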

This exercise involves:

You can explore further by investigating different ways to parallelise this problem and/or combining multiple parallel strategies.
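As one illustration of a parallel strategy, the sketch below runs a list of independent serial commands with a fixed number running at any one time, using only the Python standard library. It is a minimal sketch, not the course's recommended method. The helper `blast_command` is hypothetical: it assumes the `blastp` program with its `-query`, `-db` and `-out` options, and the file and database names are placeholders, so check it against the BLAST+ installation on your system.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def blast_command(query, database, output):
    """Build one serial blastp command line (hypothetical helper; names are placeholders)."""
    return ["blastp", "-query", str(query), "-db", str(database), "-out", str(output)]


def run_one(command):
    """Run one serial command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_many(commands, max_workers):
    """Run the commands with at most max_workers running at the same time.

    Exit codes are returned in the same order as the commands.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

Threads are sufficient here because each thread only waits for an external process to finish; the real work happens in the separate serial programs. Setting `max_workers` to the number of cores you have been allocated is a reasonable starting point.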

You could also investigate the variation in performance as you run multiple copies on a node. At what point does the hardware become overloaded?
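One way to look for that point is to time how long a fixed number of identical copies takes to finish as you increase the number of copies. The sketch below does this for an arbitrary command. The `WORKLOAD` shown is a stand-in Python busy loop chosen only so that the example is self-contained; it is an assumption, not part of the exercise, and you would substitute your own serial analysis command.

```python
import subprocess
import sys
import time

# Stand-in CPU-bound workload (an assumption for illustration only).
WORKLOAD = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]


def time_copies(command, n_copies):
    """Start n_copies of command together; return wall-clock seconds until all finish."""
    start = time.perf_counter()
    processes = [subprocess.Popen(command) for _ in range(n_copies)]
    for process in processes:
        process.wait()
    return time.perf_counter() - start


def sweep(command, copy_counts):
    """Return a dict mapping each number of copies to its elapsed time in seconds."""
    return {n: time_copies(command, n) for n in copy_counts}
```

For example, `sweep(WORKLOAD, [1, 2, 4, 8])` returns the elapsed time for each number of copies. While the node has spare cores, the elapsed time should stay roughly constant as copies are added; once it starts to grow, the copies are competing for the same hardware.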

Key Points