Introduction to High Performance Computing for Life Scientists

Description

High-performance computing (HPC) is a fundamental technology used to solve a wide range of scientific research problems. Many important scientific challenges, such as protein folding, the search for the Higgs boson, drug discovery, and the development of nuclear fusion, depend on simulations, models, and analyses run on HPC facilities to make progress.

This course introduces HPC to life science researchers, focusing on the aspects that are most important for newcomers to this technology to understand. It will help you judge how HPC can best benefit your research, and equip you to make successful and efficient use of HPC facilities in future. The course will cover basic concepts in HPC hardware, software, user environments, filesystems, and programming models. It also provides an opportunity to gain hands-on practical experience and assistance using an HPC system (ARCHER2, the UK national supercomputing service) through examples drawn from the life sciences, such as biomolecular simulation.

This course is presented by:


General Information

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below). They are also required to abide by the ARCHER2 Training Code of Conduct.

Accessibility: We are committed to making this workshop accessible to everybody. Where the course is run in person, the workshop organizers have checked that:

All course materials will be provided online in advance of the lesson. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email J.Sindt@ed.ac.uk for more information.


Prerequisites

You should be comfortable using basic bash commands, such as cd, mv, cp, and ssh. Familiarity with the Linux command line will be useful, but you do not need to be an expert in shell scripting. You should also be comfortable editing plain text files in a remote terminal (or, alternatively, editing them on your local system and copying them to the remote HPC system using scp).


Schedule

Setup Download files required for the lesson
Day 1 10:00 1. Welcome What can I expect from this course?
How will the course work and how will I get help?
How can I give feedback to improve the course?
10:15 2. Who we are: Introducing BioExcel, PRACE, and EPCC training What is BioExcel?
What is PRACE?
What is EPCC?
Who else is attending this course?
11:00 3. LECTURE: High-Performance Computing (HPC) What is high-performance computing?
11:20 4. PRACTICAL: Connecting to ARCHER2 and transferring data How can I access ARCHER2 interactively and transfer data?
11:40 5. BREAK Break
12:00 6. LECTURE: HPC Architectures [pre-recorded] What components make up a high-performance computer?
How are they typically connected together?
What does this mean for how software runs?
12:20 7. PRACTICAL: Overview of the ARCHER2 system and modules What hardware and software is available on ARCHER2?
How does the hardware fit together?
What software is available on ARCHER2 and how can I use it?
13:00 8. LUNCH Break
13:40 9. LECTURE: Batch systems and parallel application launchers [pre-recorded] What is a queueing system?
What is a (parallel) application launcher?
How do these work?
14:00 10. PRACTICAL: Batch Systems and ARCHER2 Slurm Scheduler How do I write job submission scripts?
How do I control jobs?
How do I find out what resources are available?
14:30 11. BREAK Break
14:50 12. LECTURE: Parallel Computing Patterns How can software solve problems in parallel?
What are the (dis)advantages of common patterns of parallel computing?
What are common challenges software faces in using many processors effectively?
What does this mean for your use of HPC?
15:20 13. PRACTICAL: HMMER (1 of 2) How does genomic sequence alignment software run in parallel using HPC?
15:55 14. REVIEW: Review of Day 1 What have we learned today?
What would you like us to focus on more tomorrow?
16:00 Finish
Day 2 10:00 15. Welcome What did we cover in day 1?
What will we cover in day 2 and how does this relate to day 1?
10:05 16. LECTURE: Measuring Parallel Performance What is performance and how is it measured?
What is scalability?
What is meant by strong and weak scaling?
10:35 17. PRACTICAL: HMMER (2 of 2) How efficiently can genomic sequence alignment software scale when run in parallel using HPC?
11:20 18. BREAK Break
11:40 19. LECTURE: Computational Building Blocks: Software What is an operating system (OS)?
What OSs do HPC facilities use?
What are processes?
What are threads?
How are processes and threads relevant to parallel computing?
12:10 20. LECTURE: Computational Building Blocks: Hardware What happens in a computer’s hardware when it runs your software?
What is a processor / core / CPU?
What is an accelerator?
How do processor, accelerator, memory, interconnect, disk, etc. make a parallel computer fast or slow?
12:40 21. LUNCH Break
13:30 22. PRACTICAL: Benchmarking Molecular Dynamics Performance Using GROMACS 1 How does the performance of a small, 80k-atom system scale as more cores are used?
What about a larger, 12M-atom system?
14:00 23. LECTURE: Parallel Programming Models How is software actually written to run in parallel?
Does this differ in the shared-memory vs distributed-memory context?
What are the roles of processes and threads?
How is software written to use accelerators (GPUs)?
Why is this useful to know for running (if not writing) software on HPC?
14:30 24. BREAK Break
14:50 25. PRACTICAL: Benchmarking Molecular Dynamics Using GROMACS 2 How do we run hybrid MPI and OpenMP jobs on ARCHER2?
Does adding OpenMP to MPI GROMACS affect performance?
15:20 26. LECTURE: Compiling software: from source code to executable What is involved in compiling software?
Is it useful to be able to compile software myself?
Can a compiler speed up and/or parallelise code?
What compiler should I use?
15:45 27. REVIEW: Review of Day 2 What have we learned today?
What would you like us to focus on more tomorrow?
16:00 Finish
Day 3 10:00 28. Welcome What did we cover in day 2?
What will we cover in day 3 and how does this relate to days 1 & 2?
10:05 29. PRACTICAL: Benchmarking Molecular Dynamics Using GROMACS 3 Does simultaneous multithreading (SMT) improve GROMACS performance on ARCHER2?
How does load balancing affect GROMACS performance?
10:35 30. LECTURE: Pipelines and Workflows Is it possible to run data-intensive computational pipelines (e.g. bioinformatics) in parallel on HPC?
How about biomolecular simulation workflows?
11:20 31. BREAK Break
11:40 32. PRACTICAL: QM/MM Simulations Using CP2K 1 How can using MPI+OpenMP benefit the performance of an application?
Why is profiling code useful?
What are communication overheads, and how might they change with different numbers of processes or threads?
12:40 33. LUNCH Break
13:30 34. LECTURE: The future of HPC What are some of the key trends in HPC in the next few years?
What does this mean for my use of HPC for science?
14:00 35. PRACTICAL: QM/MM Simulations Using CP2K 2 How can using MPI+OpenMP benefit the performance of an application?
Why is profiling code useful?
What are communication overheads, and how might they change with different numbers of processes or threads?
14:30 36. BREAK Break
14:50 37. LECTURE: The HPC landscape in the EU and UK What HPC resources are available in the EU and in the UK?
What are the different tiers of HPC facilities, and what does this mean?
15:20 38. REVIEW: Course review and where next? What have we learned over this course?
Where do we go from here?
What other training do BioExcel, PRACE, and EPCC offer?
15:40 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.