Reproducible computational environments using containers: Introduction to Singularity for RSEs

Introducing Containers

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • What are containers, and why might they be useful to me?

Objectives
  • Show how software depending on other software leads to configuration management problems.

  • Identify the problems that software installation can pose for research.

  • Explain the advantages of containerization.

  • Explain how using containers can solve software configuration problems.

Learning about Containers

The Australian Research Data Commons has produced a short introductory video about containers that covers many of the points below. Watch it before or after you go through this section to reinforce your understanding!

How can software containers help your research?

Australian Research Data Commons, 2021. How can software containers help your research? [video] Available at: https://www.youtube.com/watch?v=HelrQnm3v4g DOI: http://doi.org/10.5281/zenodo.5091260

Scientific Software Challenges

What’s Your Experience?

Take a minute to think about challenges that you have experienced in using scientific software (or software in general!) for your research. Then, share with your neighbors and try to come up with a list of common gripes or challenges.

You may have come up with some of the following:

A lot of these characteristics boil down to one fact: the main program you want to use likely depends on many other different programs (including the operating system!), creating a very complex, and often fragile system. One change or missing piece may stop the whole thing from working or break something that was already running. It’s no surprise that this situation is sometimes informally termed “dependency hell”.

Software and Science

Again, take a minute to think about how the software challenges we’ve discussed could impact (or have impacted!) the quality of your work. Share your thoughts with your neighbors. What can go wrong if our software doesn’t work?

Unsurprisingly, software installation and configuration challenges can have negative consequences for research:

Thankfully there are ways to get underneath (a lot of) this mess: containers to the rescue! Containers provide a way to package up software and its dependencies while also offering ways to provide access to resources such as files and communications networks in a uniform manner.

What is a Container?

To understand containers, let’s first talk briefly about your computer.

Your computer has some standard pieces that allow it to work – often what’s called the hardware. One of these pieces is the CPU or processor; another is the amount of memory or RAM that your computer can use to store information temporarily while running programs; another is the hard drive, which can store information over the long-term. All these pieces work together to do the “computing” of a computer, but we don’t see them because they’re hidden from view (usually).

Instead, what we see is our desktop, program windows, different folders, and files. These all live in what’s called the filesystem. Everything on your computer – programs, pictures, documents, the operating system itself – lives somewhere in the filesystem.

Now, imagine you want to install some new software but don’t want to take the chance of making a mess of your existing system by installing a bunch of additional stuff (libraries/dependencies/etc.). You don’t want to buy a whole new computer because it’s too expensive. What if, instead, you could have another independent filesystem and running operating system that you could access from your main computer, and that is actually stored within this existing computer?

Or, imagine you have two tools you want to use in your groundbreaking research on cat memes: PurrLOLing, a tool that does AMAZINGLY well at predicting the best text for a meme based on the cat species and WhiskerSpot, the only tool available for identifying cat species from images. You want to send cat pictures to WhiskerSpot, and then send the species output to PurrLOLing. But there’s a problem: PurrLOLing only works on Ubuntu and WhiskerSpot is only supported for OpenSUSE so you can’t have them on the same system! Again, we really want another filesystem (or two) on our computer that we could use to chain together WhiskerSpot and PurrLOLing in a “pipeline”…

Container systems, like Singularity, are special programs on your computer that make it possible! The term “container” can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time consuming and error prone, with high potential for different clients’ goods to become mixed up. Just like shipping containers keep things together that should stay together, software containers standardize the description and creation of a complete software system: you can drop a container into any computer with the container software installed (the ‘container host’), and it should “just work”.

Virtualization

Containers are an example of what’s called virtualization – having a second “virtual” computer running and accessible from a main or host computer. Virtual machines (VMs) are another example of virtualization. A virtual machine typically contains a whole copy of an operating system in addition to its own filesystem and has to get booted up in the same way a computer would. A container is considered a lightweight version of a virtual machine; underneath, the container is (usually) using the Linux kernel and simply has some flavour of Linux + the filesystem inside.

One final term: while the container is an alternative filesystem layer that you can access and run from your computer, the container image is the template for a container. The container image has all the required information, including the necessary files, encapsulated within a digital “object”. This can then be used to start up a running copy of the container. A running container tends to be transient and can be started and shut down. The container image is more long-lived, as a template that can be used for creating containers. You could think of the container image like a cookie cutter – it can be used to create multiple copies of the same shape (or container) and remains unchanged, where cookies come and go. If you want a different type of container (cookie) you need a different container image (cookie cutter).

Putting the Pieces Together

Think back to some of the challenges we described at the beginning. The many layers of scientific software installations make it hard to install and re-install scientific software – which ultimately, hinders reliability and reproducibility.

But now, think about what a container is – a self-contained, complete, separate computer filesystem. What advantages are there if you put your scientific software tools into containers?

This solves several of our problems:

The rest of this workshop will show you how to download and run containers from pre-existing container images on your own computer, and how to create and share your own container images.

Use cases for containers

Now that we have discussed a little bit about containers – what they do and the issues they attempt to address – you may be able to think of a few potential use cases in your area of work. Some examples of common use cases for containers in a research context include:

Key Points

  • Almost all software depends on other software components to function, but these components have independent evolutionary paths.

  • Small environments that contain only the software that is needed for a given task are easier to replicate and maintain.

  • Critical systems that cannot be upgraded, due to cost, difficulty, etc. need to be reproduced on newer systems in a maintainable and self-documented way.

  • Virtualization allows multiple environments to run on a single computer.

  • Containerization improves upon the virtualization of whole computers by allowing efficient management of the host computer’s memory and storage resources.

  • Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image.

  • Singularity is just one software platform that can create containers and the resources they use.


Singularity: Getting started

Overview

Teaching: 30 min
Exercises: 20 min
Questions
  • What is Singularity and why might I want to use it?

  • How do I run different commands within a container?

  • How do I access an interactive shell within a container?

Objectives
  • Understand what Singularity is and when you might want to use it.

  • Undertake your first run of a simple Singularity container.

  • Learn how to run different commands when starting a container.

  • Learn how to open an interactive shell within a container environment.

The episodes in this lesson will introduce you to the Singularity container platform and demonstrate how to set up and use Singularity.

Work in progress…

This lesson is new material that is under ongoing development. We will introduce Singularity and demonstrate how to work with it. As the tools and best practices continue to develop, elements of this material are likely to evolve. We welcome any comments or suggestions on how the material can be improved or extended.

What is Singularity?

Singularity (or Apptainer, we’ll get to this in a minute…) is a container platform that supports packaging and deploying software and tools in a portable and reproducible manner.

You may be familiar with Docker, another container platform that is now used widely. If you are, you will see that in some ways, Singularity is similar to Docker. However, in other ways, particularly in terms of the system’s architecture, it is fundamentally different. These differences mean that Singularity is particularly well-suited to running on shared platforms such as distributed, High Performance Computing (HPC) infrastructure, as well as on a Linux laptop or desktop.

Singularity runs containers from container images which, as we discussed, are essentially a virtual computer disk that contains all of the necessary software, libraries and configuration to run one or more applications or undertake a particular task, e.g. to support a specific research project. This saves you the time and effort of installing and configuring software on your own system or setting up a new computer from scratch, as you can simply run a Singularity container from an image and have a virtual environment that is equivalent to the one used by the person who created the image. Singularity/Apptainer is increasingly widely used in the research community for supporting research projects due to its support for shared computing platforms.

System administrators will not, generally, install Docker on shared computing platforms such as lab desktops, research clusters or HPC platforms because the design of Docker presents potential security issues for shared platforms with multiple users. Singularity/Apptainer, on the other hand, can be run by end-users entirely within “user space”, that is, no special administrative privileges need to be assigned to a user in order for them to run and interact with containers on a platform where Singularity has been installed.

A little history…

Singularity is open source software and was initially developed within the research community. A couple of years ago, the project was “forked”, something that is not uncommon within the open source software community, with the software effectively splitting into two projects going in different directions. The fork is being developed by a commercial entity, Sylabs.io, who provide both the free, open source SingularityCE (Community Edition) and Pro/Enterprise editions of the software. The original open source Singularity project has recently been renamed to Apptainer and has moved into the Linux Foundation. The initial release of Apptainer was made about a year ago, at the time of writing. While earlier versions of this course focused on versions of Singularity released before the project fork, we now base the course material on recent Apptainer releases. Despite this, the basic features of Apptainer/Singularity remain the same and so this material is equally applicable whether you’re working with a recent Apptainer release or a slightly older Singularity version. Nonetheless, it is useful to be aware of this history and that you may see both Singularity and Apptainer being used within the research community over the coming months and years.

Another point to note is that some systems that have a recent Apptainer release installed may also provide a singularity command that is simply a link to the apptainer executable on the system. This helps to ensure that existing scripts being used on the system that were developed before the migration to Apptainer will still function correctly.

For now, the remainder of this material refers to Singularity but where you have a release of Apptainer installed on your local system, you can simply replace references to singularity with apptainer, if you wish.

Open a terminal on the system that you are using for the course and check that the singularity command is available in your terminal:

$ singularity --version
singularity-ce version 3.11.0

Depending on the version of Singularity installed on your system, you may see a different version.
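
Since some systems provide the singularity command, some provide apptainer, and some provide both, a script may want to detect which one is present before using it. The following is a minimal sketch of one way to do this and is not part of the original lesson: the function name first_available and the variable CONTAINER_CMD are hypothetical names chosen for illustration.

```shell
# Print the first command from the argument list that exists on this system.
# Returns a non-zero status if none of them is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer singularity (used throughout this material), fall back to apptainer.
# CONTAINER_CMD is a hypothetical variable name used only in this sketch.
if CONTAINER_CMD=$(first_available singularity apptainer); then
    "$CONTAINER_CMD" --version
else
    echo "Neither singularity nor apptainer was found on the PATH" >&2
fi
```

Passing the candidate names as arguments keeps the order of preference in one place, so it can be swapped if you would rather try apptainer first.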

Loading a module

HPC systems often use modules to provide access to software on the system so you may need to use the command:

$ module load singularity

before you can use the singularity command on remote systems. However, this depends on how the system is configured. If in doubt, consult the documentation for the system you are using or contact the support team.
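
If the same script needs to work both on systems that use modules and on systems that do not, the module load step can be made conditional. This is only a sketch under assumptions: the helper name load_module_if_available is hypothetical, the module name singularity may differ on your system, and on some systems the module command is only defined in interactive or login shells.

```shell
# Load the named module only if a module command is available in this shell.
load_module_if_available() {
    if command -v module >/dev/null 2>&1; then
        module load "$1"
    else
        echo "No module command found; assuming $1 is already on the PATH"
    fi
}

# The module name is an assumption; check your system's documentation.
load_module_if_available singularity
```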

Images and containers: reminder

A quick reminder on terminology: we refer to both container images and containers. What is the difference between these two terms?

Container images (sometimes just images) are bundles of files including an operating system, software and potentially data and other application-related files. They may sometimes be referred to as a disk image or image and they may be stored in different ways, perhaps as a single file, or as a group of files. Either way, we refer to this file, or collection of files, as an image.

A container is a virtual environment that is based on a container image. That is, the files, applications, tools, etc that are available within a running container are determined by the image that the container is started from. It may be possible to start multiple container instances from an image. You could, perhaps, consider an image to be a form of template from which running container instances can be started.

Getting a container image and running a Singularity container

Singularity uses the Singularity Image Format (SIF) and container images are provided as single SIF files (usually with a .sif or .img filename extension). Singularity container images can be pulled from the Sylabs Cloud Library, a registry for Singularity container images. Singularity is also capable of running containers based on container images pulled from Docker Hub and other Docker image repositories (e.g. Quay.io). We will look at accessing container images from Docker Hub later in the course.

Sylabs Remote Builder

Note that in addition to providing a repository that you can pull container images from, Sylabs Cloud Library can also build Singularity images for you from a recipe - a configuration file defining the steps to build an image. We will look at recipes and building images later in the workshop.

Pulling a container image from a remote library

Let’s begin by creating a test directory, changing into it and pulling an existing container image from a remote library:

$ mkdir test
$ cd test
$ singularity pull hello-world.sif docker://hello-world
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 70f5ac315c5a done  
Copying config 286cc18214 done  
Writing manifest to image destination
Storing signatures
2023/09/06 14:59:34  info unpack layer: sha256:70f5ac315c5af948332962ff4678084ebcc215809506e4b8cd9e509660417205
INFO:    Creating SIF file...

What just happened? We pulled a container image from a remote repository (in this case, stored on Docker Hub) using the singularity pull command and directed it to store the container image in a file using the name hello-world.sif in the current directory. If you run the ls command, you should see that the hello-world.sif file is now present in the current directory.

Why is the protocol docker://?

The OCI format has become the de facto standard for distributing containers - this has evolved from the format Docker originally developed. We need to tell Singularity when we want to convert remote containers that are saved in OCI format into its native format - we do that using the docker:// syntax.

$ ls -lh
total 72M
-rwxr-xr-x. 1 auser group   45056 Sep  6 15:00 hello-world.sif
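
Once an image file has been pulled there is usually no need to download it again. The sketch below wraps the pull step so that it is skipped when the SIF file is already present. It is an illustration rather than part of the lesson: pull_if_missing and CONTAINER_CMD are hypothetical names, and the example call is left commented out so that nothing is downloaded unintentionally.

```shell
# Pull an image only when the target SIF file is not already present.
# CONTAINER_CMD is a hypothetical variable naming the command to call;
# it defaults to singularity when unset.
pull_if_missing() {
    sif_file=$1
    source_uri=$2
    if [ -e "$sif_file" ]; then
        echo "$sif_file already exists, skipping pull"
    else
        "${CONTAINER_CMD:-singularity}" pull "$sif_file" "$source_uri"
    fi
}

# Example call, commented out so that nothing is downloaded by accident:
# pull_if_missing hello-world.sif docker://hello-world
```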

Running a Singularity container

We can now run a container based on the hello-world.sif container image:

$ singularity run hello-world.sif
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/


The above command ran a hello-world container based on the container image we downloaded from Docker Hub and the resulting output was shown.

Extra warnings

You may also see an informational message such as:

INFO:    underlay of /etc/localtime required more than 50 (77) bind mounts

We will explain mounts and what this info statement means later in the workshop.

What just happened? When we use the singularity run command, Singularity does three things:

1. Starts a running container, based on the container image. Think of this as the “alive” or “inflated” version of the container – it’s actually doing something.
2. Performs the default action. If the container has a default action set, it will perform that default action. This could be as simple as printing a message (as above) or running a whole analysis pipeline!
3. Shuts down the container. Once the default action is complete, the container stops running (or exits).

Default action

How did the container determine what to do when we ran it? What did running the container actually do to result in the displayed output?

When you run a container from a Singularity container image using the singularity run command, the container runs the default run script that is embedded within the container image. This is a shell script that can be used to run commands, tools or applications stored within the container image on container startup. We can inspect the container image’s run script using the singularity inspect command:

$ singularity inspect -r hello-world.sif

This shows us the script within the hello-world.sif image configured to run by default when we use the singularity run command.

Running specific commands within a container

We now know that we can use the singularity inspect command to see the run script that a container is configured to run by default. What if we want to run a different command within a container?

If we know the path of an executable that we want to run within a container, we can use the singularity exec command. For example, using a container based on the lightweight Alpine Linux distribution which we can pull from Docker Hub we can run the following to first create a Singularity container image file from the image on Docker Hub and then print a message from a running container:

$ singularity pull alpine.sif docker://alpine
$ singularity exec alpine.sif echo "Hello, world"
Hello, world

Here we see that a container has been started from the alpine.sif image and the echo command has been run within the container, passing the input Hello, world. The command has echoed the provided input to the console and the container has terminated.

Note that the use of singularity exec has overridden any run script set within the image metadata and the command that we specified as an argument to singularity exec has been run instead.

Basic exercise: Running a different command within the “alpine” container

Can you run a container based on the alpine.sif image that prints the current date and time?

Solution

$ singularity exec alpine.sif /bin/date
Fri Jun 26 15:17:44 BST 2020

Difference between singularity run and singularity exec

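Above, we used the singularity exec command. Earlier in this episode we used singularity run. To clarify, the difference between these two commands is:

  • singularity run: starts a container from the specified image and runs the default run script embedded in that image.

  • singularity exec: starts a container from the specified image and runs the command given on the command line, ignoring the image’s run script.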
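
As a rough illustration of that difference, the sketch below chooses between the two subcommands depending on whether a command is supplied after the image name. The function name run_or_exec and the variable CONTAINER_CMD are hypothetical and exist only for this example; the calls at the end are commented out because they need the image files pulled earlier.

```shell
# With only an image argument, use the image's default run script (run).
# With extra arguments, run those as a command inside the container (exec).
# CONTAINER_CMD is a hypothetical variable; it defaults to singularity.
run_or_exec() {
    image=$1
    shift
    if [ "$#" -eq 0 ]; then
        "${CONTAINER_CMD:-singularity}" run "$image"
    else
        "${CONTAINER_CMD:-singularity}" exec "$image" "$@"
    fi
}

# Example calls, commented out because they need the images pulled earlier:
# run_or_exec hello-world.sif                   # runs the default run script
# run_or_exec alpine.sif echo "Hello, world"    # runs echo inside the container
```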

Opening an interactive shell within a container

If you want to open an interactive shell within a container, Singularity provides the singularity shell command. Again, using the alpine.sif image, and within our test directory, we can run a shell within a container started from the alpine image:

$ singularity shell alpine.sif
Singularity> whoami
[<your username>]
Singularity> ls
alpine.sif       hello-world.sif
Singularity> 

As shown above, we have opened a shell in a new container started from the alpine.sif image. Note that the shell prompt has changed to show we are now within the Singularity container.

Use the exit command to exit from the container shell.
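
Besides the changed prompt, a script can check whether it is running inside a container by looking at environment variables that the container runtime sets. The sketch below assumes that SINGULARITY_NAME (for Singularity) or APPTAINER_NAME (for Apptainer) is set inside a running container; treat this as an assumption and check the documentation for your installed version. The function name inside_container is hypothetical.

```shell
# Report whether the current shell appears to be running inside a container.
# Assumption: the runtime sets SINGULARITY_NAME or APPTAINER_NAME in the
# container environment; neither is expected to be set on the host.
inside_container() {
    [ -n "${SINGULARITY_NAME:-}${APPTAINER_NAME:-}" ]
}

if inside_container; then
    echo "Inside container: ${SINGULARITY_NAME:-$APPTAINER_NAME}"
else
    echo "Not inside a container"
fi
```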

Key Points

  • Singularity is another container platform and it is often used in cluster/HPC/research environments.

  • Singularity has a different security model to other container platforms, one of the key reasons that it is well suited to HPC and cluster environments.

  • Singularity has its own container image format (SIF).

  • The singularity command can be used to pull images from remote repositories such as Docker Hub or the Sylabs Cloud Library and run a container from an image file.

  • The singularity exec command is an alternative to singularity run that allows you to start a container running a specific command.

  • The singularity shell command can be used to start a container and run an interactive shell within it.


Using Docker images with Singularity

Overview

Teaching: 5 min
Exercises: 10 min
Questions
  • How do I use Docker images with Singularity?

Objectives
  • Learn how to run Singularity containers based on Docker images.

Using Docker images with Singularity

Singularity can also start containers directly from Docker container images, opening up access to a huge number of existing container images available on Docker Hub and other registries.

While Singularity doesn’t actually run a container using the Docker container image (it first converts it to a format suitable for use by Singularity), the approach used provides a seamless experience for the end user. When you direct Singularity to run a container based on a Docker container image, Singularity pulls the slices or layers that make up the Docker container image and converts them into a single-file Singularity SIF container image.

For example, moving on from the simple Hello World examples that we’ve looked at so far, let’s pull one of the official Docker Python container images. We’ll use the image with the tag 3.9.6-slim-buster which has Python 3.9.6 installed on Debian’s Buster (v10) Linux distribution:

$ singularity pull python-3.9.6.sif docker://python:3.9.6-slim-buster
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 33847f680f63 done  
Copying blob b693dfa28d38 done  
Copying blob ef8f1a8cefd1 done  
Copying blob 248d7d56b4a7 done  
Copying blob 478d2dfa1a8d done  
Copying config c7d70af7c3 done  
Writing manifest to image destination
Storing signatures
2021/07/27 17:23:38  info unpack layer: sha256:33847f680f63fb1b343a9fc782e267b5abdbdb50d65d4b9bd2a136291d67cf75
2021/07/27 17:23:40  info unpack layer: sha256:b693dfa28d38fd92288f84a9e7ffeba93eba5caff2c1b7d9fe3385b6dd972b5d
2021/07/27 17:23:40  info unpack layer: sha256:ef8f1a8cefd144b4ee4871a7d0d9e34f67c8c266f516c221e6d20bca001ce2a5
2021/07/27 17:23:40  info unpack layer: sha256:248d7d56b4a792ca7bdfe866fde773a9cf2028f973216160323684ceabb36451
2021/07/27 17:23:40  info unpack layer: sha256:478d2dfa1a8d7fc4d9957aca29ae4f4187bc2e5365400a842aaefce8b01c2658
INFO:    Creating SIF file...

Note how we see Singularity saying that it’s “Converting OCI blobs to SIF format”. We then see the layers of the Docker container image being downloaded and unpacked and written into a single SIF file. Once the process is complete, we should see the python-3.9.6.sif container image file in the current directory.

We can now run a container from this container image as we would with any other Singularity container image.
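
To make the structure of the source string passed to singularity pull more concrete, the following sketch splits it into the scheme, the image name and the tag using only shell parameter expansion. It is a simplified illustration: the function name describe_source is hypothetical, and the parsing does not handle registry addresses that include a port number.

```shell
# Split a source such as docker://python:3.9.6-slim-buster into its parts.
describe_source() {
    uri=$1
    scheme=${uri%%://*}       # text before "://", e.g. docker
    reference=${uri#*://}     # text after "://", e.g. python:3.9.6-slim-buster
    case $reference in
        *:*) name=${reference%:*}; tag=${reference##*:} ;;
        *)   name=$reference;      tag=latest ;;   # no tag given: "latest" is used
    esac
    echo "scheme=$scheme name=$name tag=$tag"
}

describe_source docker://python:3.9.6-slim-buster
# prints: scheme=docker name=python tag=3.9.6-slim-buster
describe_source docker://alpine
# prints: scheme=docker name=alpine tag=latest
```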

Running the Python 3.9.6 image that we just pulled from Docker Hub

Try running the Python 3.9.6 container image. What happens?

Try running some simple Python statements…

Running the Python 3.9.6 image

$ singularity run python-3.9.6.sif

This should put you straight into a Python interactive shell within the running container:

Python 3.9.6 (default, Jul 22 2021, 15:24:21) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Now try running some simple Python statements:

>>> import math
>>> math.pi
3.141592653589793
>>> 

In addition to running a container and having it run the default run script, you could also start a container running a shell in case you want to undertake any configuration prior to running Python. This is covered in the following exercise:

Open a shell within a Python container

Try to run a shell within a Singularity container based on the python-3.9.6.sif container image. That is, run a container that opens a shell rather than the default Python interactive console as we saw above. See if you can find more than one way to achieve this.

Within the shell, try starting the Python interactive console and running some Python commands.

Solution

Recall from the earlier material that we can use the singularity shell command to open a shell within a container. To open a regular shell within a container based on the python-3.9.6.sif container image, we can therefore simply run:

$ singularity shell python-3.9.6.sif
Singularity> echo $SHELL
/bin/bash
Singularity> cat /etc/issue
Debian GNU/Linux 10 \n \l

Singularity> python
Python 3.9.6 (default, Jul 22 2021, 15:24:21) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Hello World!')
Hello World!
>>> exit()

Singularity> exit
$ 

It is also possible to use the singularity exec command to run an executable within a container. We could, therefore, use the exec command to run /bin/bash:

$ singularity exec python-3.9.6.sif /bin/bash
Singularity> echo $SHELL
/bin/bash

You can run the Python console from your container shell simply by running the python command.

References

[1] Gregory M. Kurtzer, Containers for Science, Reproducibility and Mobility: Singularity P2. Intel HPC Developer Conference, 2017

Key Points

  • Singularity can start a container from a Docker image which can be pulled directly from Docker Hub.


Files in Singularity containers

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How do I make data available in a Singularity container?

  • What data is made available by default in a Singularity container?

Objectives
  • Understand that some data from the host system is usually made available by default within a container.

  • Learn more about how Singularity handles users and binds directories from the host filesystem.

The key concept to remember when running a Singularity container is that you only have the same permissions to access files as the user on the host system that you start the container as. (If you are familiar with Docker, you may note that this is different behaviour than you would see with that tool.)

In this episode we will look at working with files in the context of Singularity containers and how this links with Singularity’s approach to users and permissions within containers.

Users within a Singularity container

The first thing to note is that when you ran whoami within the container shell you started earlier in this lesson, you should have seen the same username that you have on the host system where you ran the container.

For example, if my username were jc1000, I would expect to see the following:

$ singularity shell lolcow.sif
Singularity> whoami
jc1000

But wait! I downloaded the standard, public version of the lolcow container image from the Cloud Library. I haven’t customised it in any way. How is it configured with my own user details?!

If you have any familiarity with Linux system administration, you may be aware that in Linux, users and their Unix groups are configured in the /etc/passwd and /etc/group files respectively. In order for the running container to know of my user, the relevant user information needs to be available within these files within the container.

Assuming this feature is enabled within the installation of Singularity on your system, when the container is started, Singularity appends the relevant user and group lines from the host system to the /etc/passwd and /etc/group files within the container[1].

Because you are restricted to the same user permissions within the container as on the host system, the host system can effectively ensure that, from within the container, you cannot access, modify or delete any data that you should not be able to on the host system, and you cannot run anything that you would not have permission to run on the host system.
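
One way to check this behaviour for yourself is to compare the username reported on the host with the one reported inside a container. The sketch below is an illustration only: same_user_in_container and CONTAINER_CMD are hypothetical names, and the example call is commented out because it needs an image file to be present.

```shell
# Compare the username on the host with the one reported inside a container.
# Succeeds (status 0) when the two names match.
# CONTAINER_CMD is a hypothetical variable; it defaults to singularity.
same_user_in_container() {
    image=$1
    host_user=$(id -un)
    container_user=$("${CONTAINER_CMD:-singularity}" exec "$image" id -un)
    echo "host=$host_user container=$container_user"
    [ "$host_user" = "$container_user" ]
}

# Example call, commented out because it needs an image file:
# same_user_in_container lolcow.sif
```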

Files and directories within a Singularity container

Singularity also binds some directories from the host system where you are running the singularity command into the container that you are starting. Note that this bind process is not copying files into the running container, it is making an existing directory on the host system visible and accessible within the container environment. If you write files to this directory within the running container, when the container shuts down, those changes will persist in the relevant location on the host system.

There is a default configuration of which files and directories are bound into the container but ultimate control of how things are set up on the system where you are running Singularity is determined by the system administrator. As a result, this section provides an overview but you may find that things are a little different on the system that you’re running on.

One directory that is likely to be accessible within a container that you start is your home directory. You may also find that the directory from which you issued the singularity command (the current working directory) is also bound.

The binding of file content and directories from a host system into a Singularity container is illustrated in the example below showing a subset of the directories on the host Linux system and in a running Singularity container:

Host system:                                                      Singularity container:
-------------                                                     ----------------------
/                                                                 /
├── bin                                                           ├── bin
├── etc                                                           ├── etc
│   ├── ...                                                       │   ├── ...
│   ├── group  ─> user's group added to group file in container ─>│   ├── group
│   └── passwd ──> user info added to passwd file in container ──>│   └── passwd
├── home                                                          ├── usr
│   └── jc1000 ───> user home directory made available ──> ─┐     ├── sbin
├── usr                 in container via bind mount         │     ├── home
├── sbin                                                    └────────>└── jc1000
└── ...                                                           └── ...

Questions and exercises: Files in Singularity containers

Q1: What do you notice about the ownership of files in a container started from the lolcow.sif image? (e.g. take a look at the ownership of files in the root directory (/) and your home directory (~/)).

Exercise 1: In this container, try creating a file in the root directory / (e.g. using touch /myfile.dat). What do you notice? Try removing the /singularity file. What happens in these two cases?

Exercise 2: In your home directory within the container shell, try and create a simple text file (e.g. echo "Some text" > ~/test-file.txt). Is it possible to do this? If so, why? If not, why not?! If you can successfully create a file, what happens to it when you exit the shell and the container shuts down?

Answers

A1: Use the ls -l / command to see a detailed file listing including file ownership and permission details. You should see that most of the files in the / directory are owned by root, as you would probably expect on any Linux system. If you look at the files in your home directory, they should be owned by you.

A Ex1: We’ve already seen from the previous answer that the files in / are owned by root so we would not expect to be able to create files there if we’re not the root user. However, if you tried to remove /singularity you would have seen an error similar to the following: cannot remove '/singularity': Read-only file system. This tells us something else about the filesystem: it’s not just that we do not have permission to delete the file; the filesystem itself is read-only, so even the root user would not be able to edit or delete this file. We will look at this in more detail shortly.

A Ex2: Within your home directory, you should be able to successfully create a file. Since you’re seeing your home directory on the host system which has been bound into the container, when you exit and the container shuts down, the file that you created within the container should still be present when you look at your home directory on the host system.

Binding additional host system directories to the container

You will sometimes need to bind additional host system directories into a container you are using, over and above those bound by default.

The -B option to the singularity command is used to specify additional binds. For example, to bind the /mnt/c/Users/Andrew directory (my Windows home directory in WSL2) into a container you could use (note this directory is unlikely to exist on the host system you are using so you will need to test this using a different directory):

singularity shell -B /mnt/c/Users/Andrew lolcow.sif
Singularity> ls /mnt/c/Users/Andrew

Note that, by default, a bind is mounted at the same path in the container as on the host system. You can also specify where a host directory is mounted in the container by separating the host path from the container path by a colon (:) in the option:

singularity shell -B /mnt/c/Users/Andrew:/Windows lolcow.sif
Singularity> ls /Windows

You can also specify multiple binds to -B by separating them by commas (,).
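As a rough illustration of the comma-separated form, the sketch below joins several bind specifications into a single value for -B and prints the resulting command rather than running it. The join_binds helper is hypothetical and /tmp is just a placeholder for a second directory you might want to bind.

```shell
# Sketch: build a comma-separated value for -B from several bind specs.
# join_binds is a hypothetical helper; the paths are placeholders.
join_binds() {
    # "$*" joins the arguments using the first character of IFS (a comma here).
    (IFS=,; printf '%s\n' "$*")
}

binds="$(join_binds /mnt/c/Users/Andrew:/Windows /tmp)"

# Print the command that would be run:
echo "singularity shell -B $binds lolcow.sif"
# prints: singularity shell -B /mnt/c/Users/Andrew:/Windows,/tmp lolcow.sif
```

Each entry may use either the plain form (same path on the host and in the container) or the host:container form, and the two can be mixed in one list.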

You can also copy data into a container image at build time if there is some static data required in the image. We cover this later in the section on building Singularity container images.

References

[1] Gregory M. Kurtzer, Containers for Science, Reproducibility and Mobility: Singularity P2. Intel HPC Developer Conference, 2017. Available at: https://www.intel.com/content/dam/www/public/us/en/documents/presentation/hpc-containers-singularity-advanced.pdf

Key Points

  • Your current directory and home directory are usually available by default in a container.

  • You have the same username and permissions in a container as on the host system.

  • You can specify additional host system directories to be available in the container.


Creating Your Own Container Images

Overview

Teaching: 40 min
Exercises: 40 min
Questions
  • How can I make my own Singularity container images?

  • How do I document the ‘recipe’ for a Singularity container image?

  • How can I make more complex container images?

Objectives
  • Explain the purpose of a Singularity recipe file and show some simple examples.

  • Demonstrate how to build a Singularity container image from a recipe file.

  • Compare the steps of creating a container image interactively versus a recipe file.

  • Create an installation strategy for a container image.

  • Explain how you can include files within Singularity container images when you build them.

  • Explain how you can access files on the Singularity host from your Singularity containers.

There are lots of reasons why you might want to create your own Singularity container image.

Building using Docker rather than Singularity

An alternative to building container images using Singularity itself is to use Docker to build the images that you then run using Singularity. This has a number of advantages:

  • Docker/Docker Desktop is often easier to install than SingularityCE/Apptainer (particularly on macOS/Windows systems)
  • Docker can build cross-platform - you can build container images for x86 systems on Arm-based systems (such as Mac M1/M2 systems)
  • Docker is generally more efficient at uploading and downloading container image data, which makes it better for moving your container images to remote HPC facilities

This session primarily uses Singularity to build the container images but we also provide the equivalent Dockerfiles in case you want to build container images using Docker rather than Singularity.

Building and uploading images using Docker

Typically, you will build using Docker with a command such as (assuming you are issuing the command from the same directory as the Dockerfile and that your Docker Hub username is alice):

docker image build --platform linux/amd64 -t alice/image-name .

You can then push your built image to Docker Hub with:

docker push alice/image-name

Finally, you can log into the remote system and build a Singularity image from the image on Docker Hub with:

singularity build image-name.sif docker://alice/image-name

You can also build directly from a tar archive exported from Docker using the docker-archive image type if you do not want to upload via Docker Hub or another online repository.
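A minimal sketch of building from a Docker tar archive might look like the following. The docker_to_sif helper and the file names are hypothetical, and the exact form of the docker-archive reference (with or without //) may differ between Singularity versions, so check the documentation for the version you are using.

```shell
# Sketch: export a Docker image to a tar archive and build a SIF file from it.
# docker_to_sif is a hypothetical helper; image and file names are placeholders.
docker_to_sif() {
    docker_image="$1"   # e.g. alice/image-name
    name="$2"           # base name used for the .tar and .sif files
    # Write the image to a local tar archive, then build from that archive.
    docker save -o "$name.tar" "$docker_image" &&
        singularity build "$name.sif" "docker-archive://$name.tar"
}

# Example (requires both Docker and Singularity on the same machine):
#   docker_to_sif alice/image-name image-name
```

If Docker and Singularity are on different machines, the tar file would need to be copied to the system with Singularity between the two steps.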

Starting with a basic Alpine Linux image

Before creating a reproducible installation, let’s start with a minimal Linux container image. Create a Singularity container image from an alpine Docker container image:

singularity pull alpine.sif docker://alpine 
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 31e352740f53 done
Copying config f4b9357049 done
Writing manifest to image destination
Storing signatures
2023/06/17 09:38:24  info unpack layer: sha256:31e352740f534f9ad170f75378a84fe453d6156e40700b882d737a8f4a6988a3
INFO:    Creating SIF file...

Now, start a shell in a container based on the new container image:

singularity shell alpine.sif

Because this is a basic container, there are a lot of things that are not installed – for example, python3.

Singularity> python3
/bin/sh: python3: not found

Python 3 is not provided by the alpine container image. However, the Alpine version of Linux has an installation tool (a package manager) called apk that we can use to install Python 3, or indeed a wide range of other software, libraries or tools. So, we could build our own container image that adds Python 3 to the alpine container image. Software can be installed using the apk add command, e.g. to install Python 3 (as well as a couple of additional related packages) we might use:

apk add --update python3 py3-pip python3-dev

Interactive installation

You may wonder why we cannot install Python 3 directly in the running container itself by using the command above. If you try to do this, you will get an error:

ERROR: Unable to lock database: Read-only file system
ERROR: Failed to open apk database: Read-only file system

This is because the system directories where apk would install Python are in read-only areas of the file system in the running container. Installing software interactively is not ideal from a reproducibility perspective anyway, as it makes it difficult to know exactly what process was followed to install the software in the container image and to track changes to this process over time and/or across versions of the container image.

Writable container images

It is possible to create an image that can be written to, but the process is a bit convoluted and not as useful as you might first expect, due in large part to the reproducibility issues discussed above. To be able to install Python 3 in a running Alpine container we need to build and run the container in a different way. First, build the image with the --sandbox flag:

singularity build --sandbox alpine-writable.sandbox docker://alpine

Once the sandbox container image has been built, we need to open a shell in a container based on the sandbox container image:

singularity shell --writable alpine-writable.sandbox

Now, finally, we can use the apk add --update python3 py3-pip python3-dev command in the running container to install Python 3. Note, the installation will persist in the sandbox container image even if you shut down the running container and start a new one.

If you then want to convert the sandbox container image to a standard container image you can use the build command:

singularity build alpine-python.sif alpine-writable.sandbox

This approach can be useful for exploring the install commands to use to create your container images but it is not generally a good way to create reproducible container images.

Singularity CE docs on sandbox images

Put installation instructions in a Singularity recipe file

A Singularity recipe file is a plain text file with keywords and commands that can be used to create a new container image. This is a much more reproducible approach than installing things interactively as it allows us to have a record of exactly how we installed software in the container image and, as it is plain text, it lends itself well to being placed under version control (e.g. using git and GitHub/GitLab) to track and manage versions of the recipe file. We will start by creating a very simple recipe file that defines an image based on Alpine Linux with Python 3 installed.

Using your favourite text editor, create a file called alpine-python.def and add the following lines:

Bootstrap: docker
From: alpine:latest

%post
    apk add --update python3 py3-pip python3-dev

%runscript
    python3 --version

Let’s break this file down:

Build with Docker

Dockerfile

FROM alpine:latest

RUN apk add --update python3 py3-pip python3-dev

CMD ["python3", "--version"]

Extending our recipe file

So far, we only have a text file named alpine-python.def – we do not yet have a container image. Also, in order to create and use your own container images, you may need to provide some more detailed instructions than the very basic recipe we’ve created so far. For example, you may want to include files in your container image that are currently on your host system by copying them into the image when it is built. You may also want to undertake more advanced software installation or configuration.

Before we go ahead and use our recipe to build a container image, we’re going to create a simple Python script on our host system and update our recipe to have this copied into our container image when it is created.

Use your text editor to create a Python script called sum.py with the following contents:

#!/usr/bin/env python3

# Comment added in container

import sys
try:
   total = sum(int(arg) for arg in sys.argv[1:])
   print('sum =', total)
except ValueError:
   print('Please supply integer arguments')

Including your scripts and data within a container image

We’re now going to add some configuration to enable us to include one or more files from our local system into the container image that we’re going to build.

You might want to do this if you have a local script that you’d like to include within your image, for example, or perhaps some static input data that will always be required by software within your image. To demonstrate the process, we’re going to modify our recipe file to add our sum.py script into the container image.

We’ll do this by modifying our recipe file to include a new %files section:

Bootstrap: docker
From: alpine:latest

%files
    sum.py /home

%post
    apk add --update python3 py3-pip python3-dev

%runscript
    python3 --version

The %files section here specifies that the file sum.py (in the current directory from where we initiate the container image build process) should be copied to the target location /home inside the container image that will be built based on this recipe file. Multiple lines can be added to the %files section to have additional files copied from the host filesystem into the container image.

Build with Docker

Dockerfile

FROM alpine:latest

COPY sum.py /home

RUN apk add --update python3 py3-pip python3-dev

CMD ["python3", "--version"]

Note that it’s not necessarily a good idea to put your scripts inside the container image if you’re constantly changing or editing them. In this case, referencing the scripts in a shared location that is mounted into a running container from the host system is often a better approach. You should also think carefully about container size – if you run ls -lh *.sif you’ll see the size of each container image in the current directory. The bigger your container image becomes, the more impractical it will be to share and download.

Build a new Singularity container image from the recipe

We now want Singularity to take this recipe file, run the installation commands contained within it, and then save the resulting container as a new container image. To do this we will use the singularity build command.

We have to provide the singularity build command with two pieces of information:

As we are building a container image we need admin/root privileges so we need to use the sudo command to run our singularity build command.

sudo Password

As you are using sudo, you may be asked by the system for your password when you run this command. Your system will typically ask for the password when using sudo for the first time after an expiry period is reached (this can be every 5 mins but is sometimes longer, it depends on the system you are using).

Running singularity build with sudo

The statement above that we need to run singularity build with sudo is not correct in all cases – there are some cases where singularity build can be run without sudo. You may, for example, find this is the case if you’re running via lima on macOS. However, as a general rule we suggest using sudo since this ensures that the running process has the necessary privileges to create files with the correct ownership and permissions within the generated container image.

All together, the build command that you should run on your computer will have a structure like the following:

sudo singularity build <container image file name> <recipe file name>

For example, if my recipe is in the file alpine-python.def and I wanted to call my container image file alpine-python.sif, I would use this command:

sudo singularity build alpine-python.sif alpine-python.def
INFO:    Starting build...
Getting image source signatures
Copying blob 31e352740f53 done  
Copying config f4b9357049 done  
Writing manifest to image destination
Storing signatures
2023/08/03 18:25:46  info unpack layer: sha256:31e352740f534f9ad170f75378a84fe453d6156e40700b882d737a8f4a6988a3
INFO:    Copying sum.py to /home
INFO:    Running post scriptlet
+ apk add --update python3 py3-pip python3-dev
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz
(1/27) Installing libbz2 (1.0.8-r5)
(2/27) Installing libexpat (2.5.0-r1)
(3/27) Installing libffi (3.4.4-r2)
(4/27) Installing gdbm (1.23-r1)
(5/27) Installing xz-libs (5.4.3-r0)
(6/27) Installing libgcc (12.2.1_git20220924-r10)
(7/27) Installing libstdc++ (12.2.1_git20220924-r10)
(8/27) Installing mpdecimal (2.5.1-r2)
(9/27) Installing ncurses-terminfo-base (6.4_p20230506-r0)
(10/27) Installing libncursesw (6.4_p20230506-r0)
(11/27) Installing libpanelw (6.4_p20230506-r0)
(12/27) Installing readline (8.2.1-r1)
(13/27) Installing sqlite-libs (3.41.2-r2)
(14/27) Installing python3 (3.11.4-r0)
(15/27) Installing python3-pycache-pyc0 (3.11.4-r0)
(16/27) Installing pyc (0.1-r0)
(17/27) Installing py3-setuptools-pyc (67.7.2-r0)
(18/27) Installing py3-pip-pyc (23.1.2-r0)
(19/27) Installing py3-parsing (3.0.9-r2)
(20/27) Installing py3-parsing-pyc (3.0.9-r2)
(21/27) Installing py3-packaging-pyc (23.1-r1)
(22/27) Installing python3-pyc (3.11.4-r0)
(23/27) Installing py3-packaging (23.1-r1)
(24/27) Installing py3-setuptools (67.7.2-r0)
(25/27) Installing py3-pip (23.1.2-r0)
(26/27) Installing pkgconf (1.9.5-r0)
(27/27) Installing python3-dev (3.11.4-r0)
Executing busybox-1.36.1-r0.trigger
OK: 142 MiB in 42 packages
INFO:    Adding runscript
INFO:    Creating SIF file...
INFO:    Build complete: alpine-python.sif

Exercise: Container images and running a container

  1. How might you check that your container image file was created successfully?

  2. What command will run a container based on the container image you’ve created and perform the default action?

  3. What is causing this default action to run and how could you change the default action?

  4. Can you make it do something different, like print “hello world”?

Solution

  1. To check that the file for your new image has been created, run ls. You should see the name of your new container image file listed.

  2. We would use singularity run alpine-python.sif to run a container based on the alpine-python.sif container image and perform the default action.

  3. The default action is being triggered by the run script embedded in the image. This is defined in the %runscript section of the recipe file we created earlier. To update the default action within the image, one option is to edit the recipe file and rebuild the image.

  4. We could use singularity exec alpine-python.sif echo "hello world" to run a container based on the container image and run the echo command in place of the default action.
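The first and third answers can be combined into a small check. The sketch below is only illustrative: check_image is a hypothetical helper that confirms the image file exists and is not empty, then asks Singularity to print the run script stored in the image using singularity inspect --runscript.

```shell
# Sketch: confirm an image file exists and show its embedded run script.
# check_image is a hypothetical helper name.
check_image() {
    image="$1"
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$image" ]; then
        echo "missing or empty: $image" >&2
        return 1
    fi
    # Show the size of the image file.
    ls -lh "$image"
    # Print the %runscript content stored in the image.
    singularity inspect --runscript "$image"
}

# Example (requires Singularity and the image built above):
#   check_image alpine-python.sif
```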

Exercise: Explore the sum.py script

Start a container from your image that runs the sum.py script. What happens if you use the singularity exec command above and put numbers after the script name?

Solution

This script comes from the Python Wiki and is set to add all numbers that are passed to it as arguments.
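One possible way to try this is sketched below. The run_sum wrapper is hypothetical; it assumes the image is called alpine-python.sif and that the %files section copied the script to /home/sum.py, as in the recipe above.

```shell
# Sketch: run sum.py inside a container and pass it some arguments.
# run_sum is a hypothetical wrapper around singularity exec.
run_sum() {
    # "$@" passes every argument given to run_sum on to sum.py unchanged.
    singularity exec alpine-python.sif python3 /home/sum.py "$@"
}

# Examples (require Singularity and the alpine-python.sif image):
#   run_sum 10 11 12     # sum.py prints: sum = 33
#   run_sum 10 eleven    # sum.py prints: Please supply integer arguments
```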

Exercise: Interactive use

We can also use the script interactively within a running container. What commands would you use to start a container from your alpine-python.sif container image and then run the sum.py script interactively in this container?

Solution

The Singularity command to run the container interactively is:

singularity shell alpine-python.sif
Singularity> python3 sum.py 10 12 10
sum = 32

Making the sum.py script run automatically

To close out our practical work on building containers, there’s one thing we haven’t yet done. At present, when we run a container from our image, the default run script prints the Python version. Let’s modify this to run the sum.py script by default:

Make the sum.py script run automatically

Can you create a new recipe file called alpine-sum.def, based on alpine-python.def, so that sum.py is run automatically when using the singularity run command?

Solution

Bootstrap: docker
From: alpine:latest

%post
    apk add --update python3 py3-pip python3-dev

%files
    sum.py /home

%runscript
    python3 /home/sum.py

Build and test it:

sudo singularity build alpine-sum.sif alpine-sum.def
singularity run alpine-sum.sif

You’ll notice that you can run the container without arguments just fine, resulting in sum = 0, but this is boring. Supplying arguments however doesn’t work:

singularity run alpine-sum.sif 10 11 12

still results in

sum = 0

This is because the runscript in the container does not pass the arguments 10 11 12 on to sum.py, so the script runs with no arguments.

To achieve the goal of having a command that always runs when a container is run from the container image and can be passed the arguments given on the command line, we need to tell the runscript to use the arguments:

Handling command line arguments in run scripts

Can you update the alpine-sum.def recipe file to handle command line arguments passed to sum.py?

Solution

Bootstrap: docker
From: alpine:latest

%post
    apk add --update python3 py3-pip python3-dev

%files
    sum.py /home

%runscript
    python3 /home/sum.py "$@"

Build and test your updated image:

sudo singularity build alpine-sum.sif alpine-sum.def
singularity run alpine-sum.sif 10 11 12
sum = 33
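The difference between the two run scripts can be reproduced in plain shell, without a container. In this sketch the function add_up is a stand-in for sum.py, and the two runscript_* functions are hypothetical stand-ins for the two versions of %runscript; none of these names come from Singularity itself.

```shell
# Sketch: why "$@" matters in a run script.
# add_up is a stand-in for sum.py: it adds up its integer arguments.
add_up() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "sum = $total"
}

# Like `python3 /home/sum.py`: the arguments are received but never forwarded.
runscript_without_args() {
    add_up
}

# Like `python3 /home/sum.py "$@"`: every argument is forwarded unchanged.
runscript_with_args() {
    add_up "$@"
}

runscript_without_args 10 11 12   # prints: sum = 0
runscript_with_args 10 11 12      # prints: sum = 33
```

Quoting "$@" keeps each argument as a separate word, which matters if an argument ever contains spaces.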



While it may not look like you have achieved much at this stage, you have already created an image that combines a lightweight Linux operating system installation with your own configuration and software installed. This can now be used to run a given command and it can also operate reliably across different platforms that have Singularity or Apptainer installed.



Build with Docker

Dockerfile

FROM alpine:latest

COPY sum.py /home

RUN apk add --update python3 py3-pip python3-dev

ENTRYPOINT ["python3", "/home/sum.py"]

Some additional notes and warnings

Security Warning

Login credentials, including passwords, tokens, secure access tokens or other secrets, must never be stored in a container image. If secrets are stored in an image, they are at high risk of being found and exploited once the image is made public.

Alternatives for copying files into a container image

Another approach for getting your own files into a container image is by using the %post section and adding commands that download the files from the Internet. For example, if your code is in a GitHub repository, you could include this statement in your recipe file to download the latest version every time you build the container image:

%post
    ...other installation commands...
    git clone https://github.com/alice/mycode

Similarly, the wget command can be used to download any file publicly available on the internet:

%post
    ...other installation commands...
    wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ncbi-blast-2.10.0+-x64-linux.tar.gz

Note that the above examples depend on commands (git and wget respectively) that must be available within your container: Linux distributions such as Alpine may require you to install such commands before using them.

Boring but important notes about installation

There are a lot of choices when it comes to installing software – sometimes too many! Here are some things to consider when creating your own container image:

In general, a good strategy for installing software is:

Sharing your Singularity container images

You have a few different options available to share your container image files with other people, including:

More advanced definition files

Here we’ve looked at a very simple example of how to create an image. At this stage, you might want to have a go at creating your own definition file for some code of your own or an application that you work with regularly. There are several definition file sections that were not used in the above example, these are:

The Sections part of the definition file documentation details all the sections and provides an example definition file that makes use of all the sections.

Additional Singularity features

Singularity has a wide range of features. You can find full details in the Singularity User Guide and we highlight a couple of key features here that may be of use/interest:

Remote Builder Capabilities: If you have access to a platform with Singularity installed but you don’t have root access to create containers, you may be able to use the Remote Builder functionality to offload the process of building an image to remote cloud resources. You’ll need to register for a cloud token via the link on the Remote Builder page.
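As a rough sketch, a remote build might be started as shown below. This assumes a Singularity installation that supports the --remote option to singularity build and that you have already stored your access token using singularity remote login. The remote_build helper and the file names are hypothetical.

```shell
# Sketch: build an image with the Remote Builder instead of local root access.
# remote_build is a hypothetical helper; file names are placeholders.
# Assumptions: your Singularity version supports `build --remote` and you
# have already run `singularity remote login` to store your token.
remote_build() {
    image="$1"    # name of the .sif file to create
    recipe="$2"   # recipe (definition) file to build from
    singularity build --remote "$image" "$recipe"
}

# Example:
#   remote_build my-image.sif my-recipe.def
```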

Signing containers: If you do want to share container image (.sif) files directly with colleagues or collaborators, how can the people you send an image to be sure that they have received the file without it being tampered with or suffering from corruption during transfer/storage? And how can you be sure that the same goes for any container image file you receive from others? Singularity supports signing containers. This allows a digital signature to be linked to an image file. This signature can be used to verify that an image file has been signed by the holder of a specific key and that the file is unchanged from when it was signed. You can find full details of how to use this functionality in the Singularity documentation on Signing and Verifying Containers.
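A minimal sketch of the signing workflow is shown below. It assumes you have already created a key pair (for example with singularity key newpair); the sign_and_verify helper is hypothetical and simply runs singularity sign followed by singularity verify on the same image file.

```shell
# Sketch: sign an image file and then verify the signature.
# sign_and_verify is a hypothetical helper.
# Assumption: a key pair already exists, e.g. created with
# `singularity key newpair`.
sign_and_verify() {
    image="$1"
    # Only attempt verification if signing succeeded.
    singularity sign "$image" && singularity verify "$image"
}

# Example:
#   sign_and_verify alpine-python.sif
```

Someone receiving the image would run only the verify step, after obtaining your public key.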

Best practices for writing container image definition files

Take a look at Nüst et al.’s “Ten simple rules for writing Dockerfiles for reproducible data science” [1] for some great examples of best practices to use when writing Dockerfiles. The GitHub repository associated with the paper also has a set of example Singularityfiles demonstrating how the rules highlighted by the paper can be applied.

[1] Nüst D, Sochat V, Marwick B, Eglen SJ, Head T, et al. (2020) Ten simple rules for writing Dockerfiles for reproducible data science. PLOS Computational Biology 16(11): e1008316. https://doi.org/10.1371/journal.pcbi.1008316

Key Points

  • Singularity recipe files specify what is within Singularity container images.

  • The singularity build command is used to build a container image from a recipe file.

  • singularity build requires admin/root privileges so usually needs to be prefixed with sudo.

  • Singularity allows containers to read and write files from the Singularity host.

  • You can copy files from your host system into your Singularity container images at creation time by using the %files section in the recipe file.


Running MPI parallel jobs using Singularity containers

Overview

Teaching: 30 min
Exercises: 40 min
Questions
  • How do I set up and run an MPI job from a Singularity container?

Objectives
  • Learn how MPI applications within Singularity containers can be run on HPC platforms

  • Understand the challenges and related performance implications when running MPI jobs via Singularity

We assume that everyone here is familiar with what MPI is, even if they are not experienced MPI developers.

What is MPI?

MPI - Message Passing Interface - is a widely used standard for parallel programming. It is used for exchanging messages/data between processes in a parallel application. If you’ve been involved in developing or working with computational science software, you may already be familiar with MPI and running MPI applications.

Usually, when working on HPC systems, you compile your application against the MPI libraries provided on the system or you use applications that have been compiled by the HPC system support team. This approach, source code portability, is the traditional way of making applications portable to different HPC platforms.

However, compiling complex HPC applications that have lots of dependencies (including MPI) is not always straightforward and can be a significant challenge as most HPC systems differ in various ways in terms of OS and base software available. There are a number of different approaches that can be taken to make it easier to deploy applications on HPC systems; for example, the Spack software automates the dependency resolution and compilation of applications. Containers provide another potential way to resolve these problems but care needs to be taken when interfacing with MPI on the host system which adds more complexity to running containers in parallel on HPC systems.

MPI codes with Singularity containers

Obviously, we will not have admin/root access on the HPC platform we are using so cannot (usually) build our container images on the HPC system itself. However, we do need to ensure our container is using the MPI library on the HPC system itself so we can get the performance benefit of the HPC interconnect. How do we overcome these contradictions?

The answer is that we install a version of the MPI library in our container image that is binary compatible with the MPI library on the host system and install our software in the container image using the local version of the MPI library. At runtime, we then ensure that the MPI library from the host is used within the running container rather than the locally-installed version of MPI.

There are two widely used open source MPI library distributions on HPC systems:

This typically means that if you want to distribute HPC software that uses MPI within a container image you will need to maintain versions that are compatible with both MPICH and OpenMPI. There are efforts underway to provide tools that will provide a binary interface between different MPI implementations, e.g. HPE Cray’s MPIxlate software but these are not generally available yet.

Building a Singularity container image with MPI software

This example assumes that you’ll be building a container image on a local platform and then deploying it to an HPC system with a different but compatible MPI implementation, using a combination of the Hybrid and Bind models from the Singularity documentation. We will build our application using MPI in the container image but will bind the MPI library from the host into the container at runtime. See Singularity and MPI applications in the Singularity documentation for more technical details.

The example we will build will:

The Singularity container image definition file to install MPICH and the OSU micro-benchmark we will use to build the container image is shown below. Save this in a file called osu_benchmarks.def

Bootstrap: docker
From: ubuntu:20.04

%environment
    export OSU_DIR=/usr/local/libexec/osu-micro-benchmarks/mpi
    export LD_LIBRARY_PATH=/usr/lib/libibverbs:$LD_LIBRARY_PATH
    export PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/startup:$PATH
    export PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt:$PATH
    export PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/collective:$PATH

%post
    export DEBIAN_FRONTEND=noninteractive TZ=Europe/London
    # Install required dependencies
    apt-get update && apt-get install -y --no-install-recommends \
        apt-utils \
        build-essential \
        curl \
        libcurl4-openssl-dev \
        libzmq3-dev \
        pkg-config \
        software-properties-common
    apt-get clean
    apt-get install -y dkms
    apt-get install -y autoconf automake build-essential numactl libnuma-dev gcc g++ git libtool

    # Download and build an ABI compatible MPICH
    cd /
    curl -k -sSLO http://www.mpich.org/static/downloads/3.4.2/mpich-3.4.2.tar.gz \
        && tar -xzf mpich-3.4.2.tar.gz -C /root \
        && cd /root/mpich-3.4.2 \
        && ./configure --prefix=/usr --with-device=ch4:ofi --disable-fortran \
        && make -j8 install \
        && cd / \
        && rm -rf /root/mpich-3.4.2 \
        && rm /mpich-3.4.2.tar.gz

    # Download and build OSU benchmarks
    curl -k -sSLO http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.4.1.tar.gz \
        && tar -xzf osu-micro-benchmarks-5.4.1.tar.gz -C /root \
        && cd /root/osu-micro-benchmarks-5.4.1 \
        && ./configure --prefix=/usr/local CC=/usr/bin/mpicc CXX=/usr/bin/mpicxx \
        && cd mpi \
        && make -j8 install \
        && cd / \
        && rm -rf /root/osu-micro-benchmarks-5.4.1 \
        && rm /osu-micro-benchmarks-5.4.1.tar.gz

A quick overview of what the above definition file is doing:

Build with Docker

Dockerfile

FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

# Install the necessary packages (from repo)
RUN apt-get update && apt-get install -y --no-install-recommends \
 apt-utils \
 build-essential \
 curl \
 libcurl4-openssl-dev \
 libzmq3-dev \
 pkg-config \
 software-properties-common
RUN apt-get clean
RUN apt-get install -y dkms
RUN apt-get install -y autoconf automake numactl libnuma-dev gcc g++ git libtool

# Download and build an ABI compatible MPICH
RUN curl -sSLO http://www.mpich.org/static/downloads/3.4.2/mpich-3.4.2.tar.gz \
   && tar -xzf mpich-3.4.2.tar.gz -C /root \
   && cd /root/mpich-3.4.2 \
   && ./configure --prefix=/usr --with-device=ch4:ofi --disable-fortran \
   && make -j8 install \
   && cd / \
   && rm -rf /root/mpich-3.4.2 \
   && rm /mpich-3.4.2.tar.gz

# OSU benchmarks
RUN curl -sSLO http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.4.1.tar.gz \
   && tar -xzf osu-micro-benchmarks-5.4.1.tar.gz -C /root \
   && cd /root/osu-micro-benchmarks-5.4.1 \
   && ./configure --prefix=/usr/local CC=/usr/bin/mpicc CXX=/usr/bin/mpicxx \
   && cd mpi \
   && make -j8 install \
   && cd / \
   && rm -rf /root/osu-micro-benchmarks-5.4.1 \
   && rm /osu-micro-benchmarks-5.4.1.tar.gz

# Add the OSU benchmark executables to the PATH
ENV PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/startup:$PATH
ENV PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt:$PATH
ENV PATH=/usr/local/libexec/osu-micro-benchmarks/mpi/collective:$PATH
ENV OSU_DIR=/usr/local/libexec/osu-micro-benchmarks/mpi

# path to mlx IB libraries in Ubuntu
ENV LD_LIBRARY_PATH=/usr/lib/libibverbs:$LD_LIBRARY_PATH

Build and test the OSU Micro-Benchmarks image

Using the above definition file, build a Singularity container image named osu_benchmarks.sif.

Once the image has finished building, test it by running the osu_hello benchmark that is found in the startup benchmark folder.

Note: the build process can take a while. If you want to test running while the build is happening, you can log into ARCHER2 and use a pre-built version of the container image to test. You can find this container image at:

${EPCC_SINGULARITY_DIR}/osu_benchmarks.sif

Solution

You should be able to build an image from the definition file as follows:

$ singularity build osu_benchmarks.sif osu_benchmarks.def

Assuming the image builds successfully, you can then try running the container locally and also transfer the SIF file to a cluster platform that you have access to (that has Singularity installed) and run it there.

Let’s begin with a single-process run of startup/osu_hello on your local system (where you built the container) to ensure that we can run the container as expected. We’ll use the MPI installation within the container for this test. Note that when we run a parallel job on an HPC cluster platform, we use the MPI installation on the cluster to coordinate the run so things are a little different…

Start a shell in the Singularity container based on your image and then run a single process job via mpirun:

$ singularity shell --contain osu_benchmarks.sif
Singularity> mpirun -np 1 osu_hello

You should see output similar to the following (the version number in the first line depends on the version of the OSU benchmarks in your image):

# OSU MPI Hello World Test v5.7.1
This is a test with 1 processes

Running Singularity containers with MPI on an HPC system

Assuming the above tests worked, we can now try undertaking a parallel run of one of the OSU benchmarking tools within our container image on the remote HPC platform.

This is where things get interesting and we will begin by looking at how Singularity containers are run within an MPI environment.

If you’re familiar with running MPI codes, you’ll know that you use mpirun (as we did in the previous example), mpiexec, srun or a similar MPI executable to start your application. This executable may be run directly on the local system or cluster platform that you’re using, or you may need to run it through a job script submitted to a job scheduler. Your MPI-based application code, which will be linked against the MPI libraries, will make MPI API calls into these MPI libraries which in turn talk to the MPI daemon process running on the host system. This daemon process handles the communication between MPI processes, including talking to the daemons on other nodes to exchange information between processes running on different machines, as necessary.

When running code within a Singularity container, we don’t use the MPI executables stored within the container, i.e. we DO NOT run:

singularity exec <image file> mpirun -np <numprocs> /path/to/my/executable

Instead, we use the MPI installation on the host system to run Singularity and start an instance of our executable from within a container for each MPI process. Without Singularity support in an MPI implementation, this results in a separate Singularity container instance being started for each process, which can add overhead when a large number of processes run on a host. An MPI implementation with built-in Singularity support can reduce this overhead.
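The ordering described above (host launcher first, then singularity exec, then the executable inside the image) can be sketched as a small helper that only prints the command line. The function name, the image name my_image.sif and the process count are placeholders chosen for illustration, not values from this lesson.

```shell
#!/bin/bash
# Sketch: compose the "host MPI launches Singularity" command line.
# The launcher, image and executable below are illustrative placeholders.

host_mpi_command() {
    local launcher="$1"   # host-side launcher, e.g. "mpirun -np 4" or "srun"
    local image="$2"      # path to the SIF image
    shift 2
    # The host launcher comes first, so one `singularity exec` is started
    # per MPI process; the remaining arguments run inside the container.
    echo "${launcher} singularity exec ${image} $*"
}

# Print the command instead of running it, so it can be inspected first.
host_mpi_command "mpirun -np 4" my_image.sif /path/to/my/executable
```

Printing the command rather than executing it makes it easy to confirm that mpirun or srun appears outside the container, which is the opposite of the pattern warned against above.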

Ultimately, this means that our running MPI code is linking to the MPI libraries from the MPI install within our container and these are, in turn, communicating with the MPI daemon on the host system which is part of the host system’s MPI installation. In the case of MPICH, these two installations of MPI may be different but as long as there is ABI compatibility between the version of MPI installed in your container image and the version on the host system, your job should run successfully.
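One rough way to check which MPI library an executable expects is to look at its shared-library dependencies with ldd. The helper below is a sketch that filters ldd output for library names beginning with libmpi; the function name is made up for this example. MPICH ABI-compatible builds generally report libmpi.so.12, but treat that as something to verify on your own system rather than a guarantee.

```shell
#!/bin/bash
# Sketch: read `ldd` output on standard input and print the name of each
# MPI shared library that the executable was linked against.
linked_mpi_libs() {
    awk '$1 ~ /^libmpi/ {print $1}'
}

# Hypothetical usage (requires the image built earlier; assumes osu_latency
# is on the PATH inside the container, as set up in the definition file):
#   singularity exec osu_benchmarks.sif \
#       sh -c 'ldd "$(command -v osu_latency)"' | linked_mpi_libs
```

If the library name reported inside the container matches the one provided by the host MPI installation that you bind in, that is a reasonable first indication that the two are ABI compatible.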

We can now try a 2-process MPI run of a point-to-point benchmark, osu_latency, on ARCHER2.

Undertake a parallel run of the osu_latency benchmark (general example)

Use the scp command or similar to copy the osu_benchmarks.sif Singularity image to ARCHER2, where you are going to undertake your benchmark run. Alternatively, you can use the pre-built container image on ARCHER2 at ${EPCC_SINGULARITY_DIR}/osu_benchmarks.sif.

Next, create a job submission script called submit.slurm on the /work file system on ARCHER2 to run containers based on the container image across two nodes on ARCHER2. A template based on the example in the ARCHER2 documentation is:

#!/bin/bash

#SBATCH --job-name=singularity_parallel
#SBATCH --time=0:10:0
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

#SBATCH --partition=standard
#SBATCH --qos=short
#SBATCH --account=ta121

# Load the module to make the Cray MPICH ABI available
module load cray-mpich-abi

export OMP_NUM_THREADS=1
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

# Set the LD_LIBRARY_PATH environment variable within the Singularity container
# to ensure that it uses the correct MPI libraries.
export SINGULARITYENV_LD_LIBRARY_PATH="/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/pe/mpich/8.1.23/gtl/lib:/opt/cray/libfabric/1.12.1.2.2.0.0/lib64:/opt/cray/pe/gcc-libs:/opt/cray/pe/gcc-libs:/opt/cray/pe/lib64:/opt/cray/pe/lib64:/opt/cray/xpmem/default/lib64:/usr/lib64/libibverbs:/usr/lib64:/usr/lib64"

# This makes sure HPE Cray Slingshot interconnect libraries are available
# from inside the container.
export SINGULARITY_BIND="/opt/cray,/var/spool,/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/pe/mpich/8.1.23/gtl/lib,/etc/host.conf,/etc/libibverbs.d/mlx5.driver,/etc/libnl/classid,/etc/resolv.conf,/opt/cray/libfabric/1.12.1.2.2.0.0/lib64/libfabric.so.1,/opt/cray/pe/gcc-libs/libatomic.so.1,/opt/cray/pe/gcc-libs/libgcc_s.so.1,/opt/cray/pe/gcc-libs/libgfortran.so.5,/opt/cray/pe/gcc-libs/libquadmath.so.0,/opt/cray/pe/lib64/libpals.so.0,/opt/cray/pe/lib64/libpmi2.so.0,/opt/cray/pe/lib64/libpmi.so.0,/opt/cray/xpmem/default/lib64/libxpmem.so.0,/run/munge/munge.socket.2,/usr/lib64/libibverbs/libmlx5-rdmav34.so,/usr/lib64/libibverbs.so.1,/usr/lib64/libkeyutils.so.1,/usr/lib64/liblnetconfig.so.4,/usr/lib64/liblustreapi.so,/usr/lib64/libmunge.so.2,/usr/lib64/libnl-3.so.200,/usr/lib64/libnl-genl-3.so.200,/usr/lib64/libnl-route-3.so.200,/usr/lib64/librdmacm.so.1,/usr/lib64/libyaml-0.so.2"

# Launch the parallel job.
srun --hint=nomultithread --distribution=block:block \
    singularity exec ${EPCC_SINGULARITY_DIR}/osu_benchmarks.sif \
        osu_latency

Finally, submit the job to the batch system with

sbatch submit.slurm
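If you want to follow the job after submitting it, the sketch below captures the job ID from the "Submitted batch job <id>" message that sbatch prints. The helper name is an assumption for this example. Because the template script does not set an output file, Slurm should write the job output to slurm-<jobid>.out in the directory you submitted from.

```shell
#!/bin/bash
# Sketch: extract the job ID from the message printed by sbatch,
# which normally has the form "Submitted batch job 123456".
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Hypothetical usage on a system with Slurm:
#   jobid=$(sbatch submit.slurm | job_id_from_sbatch)
#   squeue -j "$jobid"           # check whether the job is queued or running
#   cat "slurm-${jobid}.out"     # view the output once the job has finished
```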

Solution

As you can see in the srun command shown above, we have called srun on the host system and passed it the singularity executable, whose parameters are the image file and the name of the benchmark executable we want to run.

The following shows an example of the output you should expect to see. You should have latency values reported for message sizes up to 4MB.

Rank 1 - About to run: /.../mpi/pt2pt/osu_latency
Rank 0 - About to run: /.../mpi/pt2pt/osu_latency
# OSU MPI Latency Test v5.6.2
# Size          Latency (us)
0                       0.38
1                       0.34
...

This has demonstrated that we can successfully run a parallel MPI executable from within a Singularity container.
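To pull a single value out of the benchmark output, for example to compare runs, a short awk filter is enough. This is a minimal sketch: the function name is hypothetical, and it assumes the two-column "Size / Latency (us)" layout shown above, skipping comment lines that start with #.

```shell
#!/bin/bash
# Sketch: print the latency (second column) reported for a given message
# size (first column) in osu_latency output read from standard input.
latency_for_size() {
    local size="$1"
    awk -v s="$size" '!/^#/ && $1 == s {print $2}'
}

# Hypothetical usage with a Slurm output file:
#   latency_for_size 1 < slurm-<jobid>.out
```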

Investigate performance of native benchmark compared to containerised version

To get an idea of any performance difference between the code within your Singularity image and the same code built natively on the target HPC platform, try running the osu_allreduce benchmark natively on ARCHER2 on all cores of at least 16 nodes (if you want to use more than 32 nodes, you will need to use the standard QoS rather than the short QoS). Then run the same benchmark via the Singularity container. Do you see any performance differences?

What do you see?

Do you see the same when you run on small node counts - particularly a single node?

Note: a native version of the OSU micro-benchmark suite is available on ARCHER2 via module load osu-benchmarks.

Discussion

Here are some selected results measured on ARCHER2:

1 node:

  • 4 B
    • Native: 6.13 us
    • Container: 5.30 us (16% faster)
  • 128 KiB
    • Native: 173.00 us
    • Container: 230.38 us (25% slower)
  • 1 MiB
    • Native: 1291.18 us
    • Container: 2101.02 us (39% slower)

16 nodes:

  • 4 B
    • Native: 17.66 us
    • Container: 18.15 us (3% slower)
  • 128 KiB
    • Native: 237.29 us
    • Container: 303.92 us (22% slower)
  • 1 MiB
    • Native: 1501.25 us
    • Container: 2359.11 us (36% slower)

32 nodes:

  • 4 B
    • Native: 30.72 us
    • Container: 24.41 us (26% faster)
  • 128 KiB
    • Native: 265.36 us
    • Container: 363.58 us (27% slower)
  • 1 MiB
    • Native: 1520.58 us
    • Container: 2429.24 us (37% slower)

For the medium and large messages, using a container produces substantially worse MPI performance for this benchmark on ARCHER2. When the messages are very small, containers match the native performance and can actually be faster.
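The "faster" and "slower" percentages can be computed from a pair of timings. The sketch below assumes the relative change is defined as (1 - native/container) × 100, with positive values reported as slower and negative values as faster. This definition is an assumption, although it reproduces the 1-node figures listed above. The function name is made up for this example.

```shell
#!/bin/bash
# Sketch: compare a native and a containerised timing (both in us).
# Assumed definition: change = (1 - native / container) * 100
relative_perf() {
    awk -v n="$1" -v c="$2" 'BEGIN {
        change = (1 - n / c) * 100
        if (change >= 0) printf "%.0f%% slower\n", change
        else             printf "%.0f%% faster\n", -change
    }'
}

# 1-node measurements from the list above
relative_perf 173.00 230.38   # 128 KiB
relative_perf 6.13 5.30       # 4 B
```

Using one definition for every message size keeps the comparison consistent when you add your own measurements.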

Is this true for other MPI benchmarks that use all the cores on a node or is it specific to Allreduce?

Summary

Singularity can be combined with MPI to create portable containers that run software in parallel across multiple compute nodes. However, there are some limitations, specifically:

Key Points

  • Singularity images containing MPI applications can be built on one platform and then run on another (e.g. an HPC cluster) if the two platforms have compatible MPI implementations.

  • When running an MPI application within a Singularity container, use the MPI executable on the host system to launch a Singularity container for each process.

  • Think about the performance requirements of your parallel application and how the platforms on which you build and run your image may affect them.


Containers in Research Workflows: Reproducibility and Granularity

Overview

Teaching: 25 min
Exercises: 15 min
Questions
  • How can I use container images to make my research more reproducible?

  • How do I incorporate containers into my research workflow?

Objectives
  • Understand how container images can help make research more reproducible.

  • Understand what practical steps I can take to improve the reproducibility of my research using containers.

Although this workshop is titled “Reproducible computational environments using containers”, so far we have mostly covered the mechanics of using Singularity with only passing reference to the reproducibility aspects. In this section, we discuss these aspects in more detail.

Work in progress…

Note that reproducibility aspects of software and containers are an active area of research, discussion and development so are subject to many changes. We will present some ideas and approaches here but best practices will likely evolve in the near future.

Reproducibility

By reproducibility here we mean the ability of someone else (or your future self) to reproduce what you did computationally at a particular time (be this in research, analysis or something else) as closely as possible, even if they do not have access to exactly the same hardware resources that you had when you did the original work.

Some examples of why containers are an attractive technology to help with reproducibility include:

Sharing images

We have made use of a few different online repositories during this course, such as Sylabs Cloud Library and Docker Hub which provide platforms for sharing container images publicly. Once you have uploaded a container image, you can point people to its public location and they can download and build upon it.

This is fine for working collaboratively with container images on a day-to-day basis but these repositories are not a good option for long-term archiving of container images in support of research and publications as:

Archiving and persistently identifying container images using Zenodo

When you publish your work or make it publicly available in some way it is good practice to make container images that you used for computational work available in an immutable, persistent way and to have an identifier that allows people to cite and give you credit for the work you have done. Zenodo is one service that provides this functionality.

Zenodo supports the upload of zip archives and we can capture our Singularity container images as zip archives. For example, to convert alpine-sum.sif, the container image we created earlier in this lesson, to a zip archive (on the command line):

zip alpine-sum.zip alpine-sum.sif

Note: These zipped container images can become quite large and Zenodo supports uploads up to 50GB. If your container image is too large, you may need to look at other options for archiving it or work to reduce its size.
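Before depositing an archive, it can be useful to record a checksum so that anyone who downloads it later can confirm the file is intact. The sketch below uses sha256sum from GNU coreutils; the helper name and the .sha256 file naming are choices made for this example, not a Zenodo requirement.

```shell
#!/bin/bash
# Sketch: write the SHA-256 checksum of a file to <file>.sha256 so that
# the archive can be verified after it has been downloaded again.
record_checksum() {
    local file="$1"
    sha256sum "$file" > "${file}.sha256"
}

# Hypothetical usage with the archive created above:
#   record_checksum alpine-sum.zip
#   sha256sum -c alpine-sum.zip.sha256   # run in the same directory to verify
```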

Once you have your archive, you can deposit it on Zenodo and this will:

In addition to the archive file itself, the deposit process will ask you to provide some basic metadata to classify the container image and the associated work.

Note that Zenodo is not the only option for archiving and generating persistent DOIs for container images. There are other services out there – for example, some organizations may provide their own, equivalent, service.

Reproducibility good practice

Container Granularity

As mentioned above, one of the decisions you may need to make when containerising your research workflows is what level of granularity you wish to employ. The two extremes of this decision could be characterized as:

Of course, many real applications will sit somewhere between these two extremes.

Positives and negatives

What are the advantages and disadvantages of the two approaches to container granularity for research workflows described above? Think about this and write a few bullet points for advantages and disadvantages for each approach in the course Etherpad.

Solution

This is not an exhaustive list but some of the advantages and disadvantages could be:

Single large container image

  • Advantages:
    • Simpler to document
    • Full set of requirements packaged in one place
    • Potentially easier to maintain (though could be opposite if working with large, distributed group)
  • Disadvantages:
    • Could get very large in size, making it more difficult to distribute
    • May end up with same dependency issues within the container image from different software requirements
    • Potentially more complex to test
    • Less re-useable for different, but related, work

Multiple smaller container images

  • Advantages:
    • Individual components can be re-used for different, but related, work
    • Individual parts are smaller in size making them easier to distribute
    • Avoid dependency issues between different pieces of software
    • Easier to test
  • Disadvantages:
    • More difficult to document
    • Potentially more difficult to maintain (though could be easier if working with large, distributed group)
    • May end up with dependency issues between component container images if they get out of sync

Next steps with containers

Now that we’re at the end of the lesson material, take a moment to reflect on what you’ve learned, how it applies to you, and what to do next.

  1. In your own notes, write down or diagram your understanding of Singularity containers and container images: concepts, commands, and how they work.
  2. In your own notes, write down how you think you might use containers in your daily work. If there’s something you want to try doing with containers right away, what is a next step after this workshop to make that happen?

Key Points

  • Container images allow us to encapsulate the computation (and data) we have used in our research.

  • Using online container image repositories allows us to easily share computational work we have done.

  • Using container images along with a DOI service such as Zenodo allows us to capture our work and enables reproducibility.


[Optional] The Singularity cache

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • Why does Singularity use a local cache?

  • Where does Singularity store images?

Objectives
  • Learn about Singularity’s image cache.

  • Learn how to manage Singularity images stored locally.

Singularity’s image cache

Singularity uses a local cache to save downloaded container image files in addition to storing them as the file you specify. As we saw in the previous episode, images are simply .sif files stored on your local disk.

If you delete a local .sif container image that you pulled from a remote container image repository and then pull it again, Singularity gives you a copy from your local cache rather than downloading the image again, provided the image in the repository is unchanged from the version you previously pulled. This avoids unnecessary network transfers and is particularly useful for large container images, which may take some time to transfer over the network. To demonstrate this, remove the lolcow.sif file stored in your test directory and then issue the pull command again:

$ rm lolcow.sif
$ singularity pull lolcow.sif docker://ghcr.io/apptainer/lolcow
INFO:    Using cached image

As we can see in the above output, the container image has been returned from the cache and we do not see the output that we saw previously showing the container image being downloaded from the remote repository.

How do we know what is stored in the local cache? We can find out using the singularity cache command:

$ singularity cache list
There are 1 container file(s) using 71.54 MiB and 4 oci blob file(s) using 73.00 MiB of space
Total space used: 144.54 MiB

This tells us how many container image files are stored in the cache and how much disk space the cache is using but it doesn’t tell us what is actually being stored. To find out more information we can add the -v verbose flag to the list command:

$ singularity cache list -v
NAME                     DATE CREATED           SIZE             TYPE
16ec32c2132b43494832a0   2023-09-04 07:55:27    27.24 MiB        blob
5ca731fc36c28789c5ddc3   2023-09-04 07:55:28    45.75 MiB        blob
9a3b8e28e0be343c2f8828   2023-09-04 07:55:29    0.50 KiB         blob
fd0daa4d897cbb381c3bad   2023-09-04 07:55:29    1.36 KiB         blob
5b140746df59d5a75498f9   2023-09-04 07:55:34    71.54 MiB        oci-tmp

There are 1 container file(s) using 71.54 MiB and 4 oci blob file(s) using 73.00 MiB of space
Total space used: 144.54 MiB

This provides us with some more useful information about the actual container images stored in the cache. In the TYPE column we can see that our container image type is oci-tmp because it’s a SIF container image that has been created by merging together OCI layers from a Docker container image. We also have each of the original OCI layers pulled from the remote registry stored in the cache with type blob.

Cleaning the Singularity image cache

We can remove container images from the cache using the singularity cache clean command. Running the command without any options will display a warning and ask you to confirm that you want to remove everything from your cache.

You can also remove specific container images or all container images of a particular type. Look at the output of singularity cache clean --help for more information.

Cache location

By default, Singularity uses $HOME/.singularity/cache as the location for the cache. You can change the location of the cache by setting the SINGULARITY_CACHEDIR environment variable to the cache location you want to use.
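On systems where home directory space is limited, you may prefer to keep the cache somewhere else. The following is a minimal sketch of setting SINGULARITY_CACHEDIR for the current shell session; the helper name and the example path are assumptions, so substitute a location that suits your system.

```shell
#!/bin/bash
# Sketch: create a directory and point Singularity's cache at it for the
# current shell session by exporting SINGULARITY_CACHEDIR.
use_cache_dir() {
    local dir="$1"
    mkdir -p "$dir" && export SINGULARITY_CACHEDIR="$dir"
}

# Hypothetical usage (the path is only an example):
#   use_cache_dir /work/myproject/singularity-cache
#   singularity cache list
```

To make the change persistent you would need to set the variable in your shell startup file or job scripts, since an export only lasts for the current session.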

Key Points

  • Singularity caches downloaded images so that an unchanged image isn’t downloaded again when it is requested using the singularity pull command.

  • You can free up space in the cache by removing all locally cached images or by specifying individual images to remove.


[Optional] Using Singularity to run BLAST+

Overview

Teaching: 30 min
Exercises: 30 min
Questions
  • How can I use Singularity to run bioinformatics workflows with BLAST+?

Objectives
  • Show example of using Singularity with a common bioinformatics tool.

We have now learned enough to be able to use Singularity to deploy software without needing to install the software itself on the host system.

In this section we will demonstrate the use of a Singularity container image that provides the BLAST+ software.

Source material

This example is based on the example from the official NCBI BLAST+ Docker container documentation. Note: the efetch parts of the step-by-step guide do not currently work using the Singularity version of the image, so we provide a dataset with the data already downloaded.

(This is because the NCBI BLAST+ Docker container image has the efetch tool installed in the /root directory and this special location gets overwritten during the conversion to a Singularity container image.)

Download the required data

Download the blast_example.tar.gz.

Unpack the archive which contains the downloaded data required for the BLAST+ example:

tar -xvf blast_example.tar.gz
x blast/
x blast/blastdb/
x blast/queries/
x blast/fasta/
x blast/results/
x blast/blastdb_custom/
x blast/fasta/nurse-shark-proteins.fsa
x blast/queries/P01349.fsa

Finally, move into the newly created directory:

cd blast
ls  
blastdb        blastdb_custom fasta          queries        results

Create the Singularity container image

NCBI provide official Docker containers with the BLAST+ software hosted on Docker Hub. We can create a Singularity container image from the Docker container image with:

singularity pull ncbi-blast.sif docker://ncbi/blast
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob f3b81f6693c5 done  
Copying blob 9e3ea8720c6d done  
Copying blob f1910abb61ed done  
Copying blob 5ac33d4de47b done  
Copying blob 8402427c8382 done  
Copying blob 06add1a477bc done  
Copying blob d9781f222125 done  
Copying blob 4aae31cc8a8b done  
Copying blob 6a61413c1ffa done  
Copying blob c657bf8fc6ca done  
Copying blob 1776e565f5f8 done  
Copying blob d90474a0d8c8 done  
Copying blob 0bc89cb1b9d7 done  
Copying blob b8a272fccf13 done  
Copying blob 891eb09f891f done  
Copying blob 4c64befa8a35 done  
Copying blob 7ab0b7afbc21 done  
Copying blob b007c620c60b done  
Copying blob f877ffc04713 done  
Copying blob 6ee97c348001 done  
Copying blob 03f0ee97190b done  
Copying config 28914b3519 done  
Writing manifest to image destination
Storing signatures
2023/06/16 08:26:53  info unpack layer: sha256:9e3ea8720c6de96cc9ad544dddc695a3ab73f5581c5d954e0504cc4f80fb5e5c
2023/06/16 08:26:53  info unpack layer: sha256:06add1a477bcffec8bac0529923aa8ae25d51f0660f0c8ef658e66aa89ac82c2
2023/06/16 08:26:53  info unpack layer: sha256:f3b81f6693c592ab94c8ebff2109dc60464d7220578331c39972407ef7b9e5ec
2023/06/16 08:26:53  info unpack layer: sha256:5ac33d4de47beb37ae35e9cad976d27afa514ab8cbc66e0e60c828a98e7531f4
2023/06/16 08:27:03  info unpack layer: sha256:8402427c8382ab723ac504155561fb6d3e5ea1e7b4f3deac8449cec9e44ae65a
2023/06/16 08:27:03  info unpack layer: sha256:f1910abb61edef8947e9b5556ec756fd989fa13f329ac503417728bf3b0bae5e
2023/06/16 08:27:03  info unpack layer: sha256:d9781f222125b5ad192d0df0b59570f75b797b2ab1dc0d82064c1b6cead04840
2023/06/16 08:27:03  info unpack layer: sha256:4aae31cc8a8b726dce085e4e2dc4671a9be28162b8d4e1b1c00b8754f14e6fe6
2023/06/16 08:27:03  info unpack layer: sha256:6a61413c1ffa309d92931265a5b0ecc9448568f13ccf3920e16aaacc8fdfc671
2023/06/16 08:27:03  info unpack layer: sha256:c657bf8fc6cae341e3835cb101dc4c6839ba4aad69578ff8538b3c1eba7abb21
2023/06/16 08:27:04  info unpack layer: sha256:1776e565f5f85562b8601edfd29c35f3fba76eb53177c8e89105f709387e3627
2023/06/16 08:27:04  info unpack layer: sha256:d90474a0d8c8e6165d909cc0ebbf97dbe70fd759a93eff11a5a3f91fa09a470e
2023/06/16 08:27:04  warn rootless{root/edirect/aux/lib/perl5/Mozilla/CA/cacert.pem} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"
2023/06/16 08:27:04  warn rootless{root/edirect/aux/lib/perl5/Mozilla/CA.pm} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"
2023/06/16 08:27:04  warn rootless{root/edirect/aux/lib/perl5/Mozilla/mk-ca-bundle.pl} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"
2023/06/16 08:27:04  info unpack layer: sha256:0bc89cb1b9d7ca198a7a1b95258006560feffaff858509be8eb7388b315b9cf5
2023/06/16 08:27:04  info unpack layer: sha256:b8a272fccf13b721fa68826f17f0c2bb395de377e0d22c98d38748eb5957a4c6
2023/06/16 08:27:04  info unpack layer: sha256:891eb09f891ff2c26f24a5466112e134f6fb30bd3d0e78c14c0d676b0e68d60a
2023/06/16 08:27:04  info unpack layer: sha256:4c64befa8a35c9f8518324524dfc27966753462a4c07b2234811865387058bf4
2023/06/16 08:27:04  info unpack layer: sha256:7ab0b7afbc21b75697a7b8ed907ee9b81e5b17a04895dc6ff7d25ea2ba1eeba4
2023/06/16 08:27:04  info unpack layer: sha256:b007c620c60b91ce6a9e76584ecc4bc062c822822c204d8c2b1c8668193d44d1
2023/06/16 08:27:04  info unpack layer: sha256:f877ffc04713a03dffd995f540ee13b65f426b350cdc8c5f1e20c290de129571
2023/06/16 08:27:04  info unpack layer: sha256:6ee97c348001fca7c98e56f02b787ce5e91d8cc7af7c7f96810a9ecf4a833504
2023/06/16 08:27:04  info unpack layer: sha256:03f0ee97190baebded2f82136bad72239254175c567b19def105b755247b0193
INFO:    Creating SIF file...

Now that we have a container image with the software in it, we can use it.

Build and verify the BLAST database

Our example dataset already contains the downloaded query and database sequences. We first use these downloaded data to create a custom BLAST database by using a container to run the command makeblastdb with the correct options.

singularity exec ncbi-blast.sif \
    makeblastdb -in fasta/nurse-shark-proteins.fsa -dbtype prot \
    -parse_seqids -out nurse-shark-proteins -title "Nurse shark proteins" \
    -taxid 7801 -blastdb_version 5

Building a new DB, current time: 06/16/2023 14:35:07
New DB name:   /home/auser/test/blast/blast/nurse-shark-proteins
New DB title:  Nurse shark proteins
Sequence type: Protein
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 7 sequences in 0.0199499 seconds.

To verify the newly created BLAST database above, you can run the blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %T" command to display the accession, sequence length, and taxonomy ID of each sequence in the database.

singularity exec ncbi-blast.sif \
    blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %T"
Q90523.1 106 7801
P80049.1 132 7801
P83981.1 53 7801
P83977.1 95 7801
P83984.1 190 7801
P83985.1 195 7801
P27950.1 151 7801

Now we have our database we can run queries against it.

Run a query against the BLAST database

Let’s execute a query on our database using the blastp command:

singularity exec ncbi-blast.sif \
    blastp -query queries/P01349.fsa -db nurse-shark-proteins \
    -out results/blastp.out

At this point, you should see the results of the query in the output file results/blastp.out. To view the content of this output file, use the command less results/blastp.out.

less results/blastp.out
...output trimmed...

Query= sp|P01349.2|RELX_CARTA RecName: Full=Relaxin; Contains: RecName:
Full=Relaxin B chain; Contains: RecName: Full=Relaxin A chain

Length=44
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

P80049.1 RecName: Full=Fatty acid-binding protein, liver; AltName...  14.2    0.96


>P80049.1 RecName: Full=Fatty acid-binding protein, liver; AltName: Full=Liver-type
fatty acid-binding protein; Short=L-FABP
Length=132

...output trimmed...

With your query, BLAST identified the protein sequence P80049.1 as a match with a score of 14.2 and an E-value of 0.96.
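For scripted analysis, the tabular output format (-outfmt 6) is often easier to process than the default report. In that format, column 2 is the subject sequence ID, column 11 the E-value and column 12 the bit score. The sketch below filters such output by E-value; the function name, the output file name and the threshold are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: read BLAST tabular output (-outfmt 6) on standard input and print
# the subject ID, E-value and bit score of hits at or below a threshold.
hits_below_evalue() {
    local threshold="$1"
    # Adding 0 forces a numeric comparison, so values like 1e-20 work.
    awk -v t="$threshold" '($11 + 0) <= (t + 0) {print $2, $11, $12}'
}

# Hypothetical usage (results/blastp.tsv is an assumed file name):
#   singularity exec ncbi-blast.sif \
#       blastp -query queries/P01349.fsa -db nurse-shark-proteins \
#       -outfmt 6 -out results/blastp.tsv
#   hits_below_evalue 1.0 < results/blastp.tsv
```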

Accessing online BLAST databases

As well as building your own local database to query, you can also access databases that are available online. For example, to see which databases are available online in the Google Cloud Platform (GCP):

singularity exec ncbi-blast.sif update_blastdb.pl --showall pretty --source gcp
Connected to GCP
BLASTDB                                                      DESCRIPTION                                                                                                              SIZE (GB)      LAST_UPDATED
nr                                                           All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects        369.4824      2023-06-10
swissprot                                                    Non-redundant UniProtKB/SwissProt sequences                                                                                 0.3576      2023-06-10
refseq_protein                                               NCBI Protein Reference Sequences                                                                                          146.5088      2023-06-12
landmark                                                     Landmark database for SmartBLAST                                                                                            0.3817      2023-04-25
pdbaa                                                        PDB protein database                                                                                                        0.1967      2023-06-10
nt                                                           Nucleotide collection (nt)                                                                                                319.5044      2023-06-11
pdbnt                                                        PDB nucleotide database                                                                                                     0.0145      2023-06-09
patnt                                                        Nucleotide sequences derived from the Patent division of GenBank                                                           15.7342      2023-06-09
refseq_rna                                                   NCBI Transcript Reference Sequences                                                                                        47.8721      2023-06-12

...output trimmed...

Similarly, for databases hosted at NCBI:

singularity exec ncbi-blast.sif update_blastdb.pl --showall pretty --source ncbi
Connected to NCBI
BLASTDB                                                      DESCRIPTION                                                                                                              SIZE (GB)      LAST_UPDATED
env_nr                                                       Proteins from WGS metagenomic projects (env_nr).                                                                            3.9459      2023-06-11
SSU_eukaryote_rRNA                                           Small subunit ribosomal nucleic acid for Eukaryotes                                                                         0.0063      2022-12-05
LSU_prokaryote_rRNA                                          Large subunit ribosomal nucleic acid for Prokaryotes                                                                        0.0041      2022-12-05
16S_ribosomal_RNA                                            16S ribosomal RNA (Bacteria and Archaea type strains)                                                                       0.0178      2023-06-16
env_nt                                                       environmental samples                                                                                                      48.8599      2023-06-08
LSU_eukaryote_rRNA                                           Large subunit ribosomal nucleic acid for Eukaryotes                                                                         0.0053      2022-12-05
ITS_RefSeq_Fungi                                             Internal transcribed spacer region (ITS) from Fungi type and reference material                                             0.0067      2022-10-28
Betacoronavirus                                              Betacoronavirus                                                                                                            55.3705      2023-06-16

...output trimmed...
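
These listings are long, so it can help to filter them before choosing a database to download. The following is a minimal sketch rather than part of the NCBI tooling: the function name small_blast_dbs is hypothetical, and it assumes the "pretty" layout shown above, where the size in GB is the second-to-last column and the last column is a date.

```shell
# Hypothetical helper (not part of BLAST): read a "pretty" database listing
# on standard input and print the name and size of every database smaller
# than a given limit in GB (default: 1).
small_blast_dbs() {
  limit_gb="${1:-1}"
  awk -v limit="$limit_gb" '
    # Data rows end with a date such as 2023-06-10; this skips the
    # "Connected to ..." line, the column headers and any trimmed-output text.
    $NF ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ &&
    ($(NF-1) + 0) < limit { print $1, $(NF-1) }
  '
}

# Example use with the container from this episode (needs network access):
#   singularity exec ncbi-blast.sif update_blastdb.pl --showall pretty --source gcp | small_blast_dbs 1
```

Because the helper only reads text from standard input, it runs on the host and works the same way for the GCP and NCBI listings.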

Notes

You have now completed a simple example of using a complex piece of bioinformatics software through Singularity containers. You may have noticed that some things just worked, without you needing to set them up, even though you were running inside containers:

  1. We did not need to explicitly bind any files/directories into the container. This worked because Singularity automatically binds the current directory into the running container, so any data in the current directory (or its subdirectories) will generally be available in running Singularity containers. (If you have used Docker containers, you will notice that this is different from the default behaviour there.)
  2. Access to the internet is automatically available within the running container in the same way as it is on the host system, without us needing to specify any additional options.
  3. Files and data we create within the container have the right ownership and permissions for us to access outside the container.
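
If you would like to see points 1 and 3 for yourself, the sketch below is one way to check them. It is illustrative only: the function name check_container_defaults and the file name made_in_container.txt are hypothetical, and it assumes that your current directory is writable and that your site has not changed the default Singularity bind configuration.

```shell
# Hypothetical helper: show that the current directory is visible inside the
# container and that files created there belong to you on the host.
check_container_defaults() {
  image="${1:-ncbi-blast.sif}"

  # Do nothing useful if Singularity or the image file is missing.
  if ! command -v singularity >/dev/null 2>&1 || [ ! -f "$image" ]; then
    echo "skipped: singularity or $image not available"
    return 0
  fi

  # Point 1: the working directory reported inside the container should
  # match the directory you are in on the host.
  echo "host:      $(pwd)"
  echo "container: $(singularity exec "$image" pwd)"

  # Point 3: create a file from inside the container, then inspect its
  # ownership and permissions from the host.
  singularity exec "$image" touch made_in_container.txt
  ls -l made_in_container.txt
  rm -f made_in_container.txt
}

# Example use, from the directory that holds the image:
#   check_container_defaults ncbi-blast.sif
```

If a directory you need is not bound automatically, Singularity's --bind option can be used to add it explicitly.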

In addition, we were able to use the tools in the container image provided by NCBI without having to do any work to install the software, irrespective of the computing platform that we are using. (In fact, the example this is based on runs the pipeline using Docker on a cloud computing platform rather than on your local system.)

Key Points

  • We can use containers to run software without having to install it

  • The commands we use are very similar to those we would use natively

  • Singularity handles a lot of complexity around data and internet access for us


[Optional] Additional topics and next steps

Overview

Teaching: 60 min
Exercises: 0 min
Questions
  • How can I learn more about how containers work?

  • What different container technologies are there, and what are the differences and implications?

  • How can I orchestrate different containers?

Objectives
  • Understand container technologies better.

  • Provide useful links to continue your journey with containers.

Additional topics

Key Points

  • TBC