Reproducible Computational Environments Using Containers: Introduction to Docker and Singularity

Introducing Containers

Overview

Teaching: 15 min
Exercises: 5 min
Questions
  • What are containers, and why might they be useful to me?

Objectives
  • Show how software depending on other software leads to configuration management problems.

  • Identify the problems that software installation can pose for research.

  • Give two examples of how containers can solve software configuration problems.

Disclaimers

  1. Docker is complex software used for many different purposes. We are unlikely to give examples that suit all of your potential ideal use-cases, but would be delighted to at least open up discussion of what those use-cases might be.

  2. Containers are a topic that requires significant amounts of technical background to understand in detail. Most of the time containers, particularly as wrapped up by Docker, do not require you to have a deep technical understanding of container technology, but when things go wrong, the diagnostic messages may turn opaque rather quickly.

Scientific Software Challenges

What’s Your Experience?

Take a minute to think about challenges that you have experienced in using scientific software (or software in general!) for your research. Then, share with your neighbors and try to come up with a list of common gripes or challenges.

You may have come up with some of the following:

Etc.

A lot of these challenges boil down to one fact: the main program you want to use likely depends on many other programs (including the operating system!), creating a very complex and often fragile system. One change or missing piece may stop the whole thing from working or break something that was already running. It’s no surprise that this situation is sometimes informally termed “dependency hell”.

Software and Science

Again, take a minute to think about how the software challenges we’ve discussed could impact (or have impacted!) the quality of your work. Share your thoughts with your neighbors. What can go wrong if our software doesn’t work?

Unsurprisingly, software installation and configuration challenges can have negative consequences for research:

Thankfully there are ways to get underneath (a lot of) this mess: containers to the rescue! Containers provide a way to package up software dependencies and access to resources such as files and communications networks in a uniform manner.

What is a Container?

Docker is a tool that allows you to build what are called “containers.” It’s not the only tool that can create containers, but is the one we’ve chosen for this part of the workshop. But what is a container?

To understand containers, let’s first talk briefly about your computer.

Your computer has some standard pieces that allow it to work - often what’s called the hardware. One of these pieces is the CPU or processor; another is the amount of memory or RAM that your computer can use to store information temporarily while running programs; another is the hard drive, which can store information over the long-term. All these pieces work together to do the “computing” of a computer, but we don’t see them, because they’re hidden away.

Instead, what we see is our desktop, program windows, different folders, and files. These all live in what’s called the file system. Everything on your computer - programs, pictures, documents - lives somewhere in the file system. One way to think of the file system is as the layer of programs and data that can be activated to use the CPU, memory and hard drive of your computer.

Now, imagine you wanted to have a second computer. You don’t want to buy a whole new computer because it’s too expensive. What if, instead, you could have another file system that you could store and access from your main computer, but that is self-contained?

A container system (like Docker) is a special program on your computer that does this. The term “container” can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time consuming, and error prone, with high potential for different clients’ goods to become mixed up. Software containers standardise the packaging of a complete software system: you can drop a container into a computer with the container software installed (also called a container host), and it should “just work”.

Virtualization

Containers are an example of what’s called virtualization: having a second “virtual” computer running and accessible from a main or host computer. Another example of virtualization is the virtual machine, or VM. A virtual machine typically contains a whole copy of an operating system in addition to its own file system and has to be booted up in the same way a computer would. A container is considered a lightweight version of a virtual machine; underneath, the container shares a Linux kernel with its host rather than booting its own, and simply has some flavor of Linux plus the file system inside.

One final term: if the container is an alternative file system layer that you can access and run from your computer, the container image is like a template for that container. The container image has all the needed information to start up a running copy of the container. A running container tends to be transient and can be started and shut down. The image is more long-lived, as a source file for the container. You could think of the container image like a cookie cutter: it can be used to create multiple copies of the same shape (or container) and is relatively unchanging, whereas cookies come and go. If you want a different type of container (cookie) you need a different image (cookie cutter).

Putting the Pieces Together

Think back to some of the challenges we described at the beginning. The many layers of scientific software installations make it hard to install and re-install scientific software, which ultimately hinders reliability and reproducibility.

But now, think about what a container is - a self-contained, complete, separate computer file system. What if you put your scientific software tools into a container?

This solves several of our problems:

The rest of this workshop will show you how to download and run pre-existing containers on your own computer, and how to create and share your own containers.

Key Points

  • Almost all software depends on other software components to function, but these components have independent evolutionary paths.

  • Projects involving many software components can rapidly run into a combinatoric explosion in the number of software version configurations available, yet only a subset of possible configurations actually works as desired.

  • Containers collect software components together and can help avoid software dependency problems.

  • Virtualisation is an old technology that container technology makes more practical.

  • Docker is just one software platform that can create containers and the resources they use.


Introducing the Docker command line

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I interact with Docker?

Objectives
  • Explain how to check that Docker is installed and is ready to use.

  • Demonstrate some initial Docker command line interactions.

Docker command line

Start the Docker application that you installed in working through the setup instructions for this session. Note that this might not be necessary if your laptop is running Linux or if the installation added the Docker application to your startup process.

You may need to log in to Docker Hub

The Docker application will usually provide a way for you to log in to the Docker Hub using the application’s menu (macOS) or systray icon (Windows) and it is usually convenient to do this when the application starts. This will require you to use your Docker Hub username and your password. We will not actually require access to the Docker Hub until later in the course, but if you can log in now, you should do so.

Determining your Docker Hub username

If you no longer recall your Docker Hub username, e.g., because you have been logging into the Docker Hub using your email address, you can find out what it is by following these steps:

  • Open http://hub.docker.com/ in a web browser window
  • Sign in using your email and password (don’t tell us what it is)
  • In the top-right of the screen you will see your username

Once your Docker application is running, open a shell (terminal) window, and run the following command to check that Docker is installed and the command line tools are working correctly. I have appended the output that I see on my Mac, but the specific version is unlikely to matter much: it certainly does not have to precisely match mine.

$ docker --version
Docker version 19.03.5, build 633a0ea

The above command does not rely on the part of Docker that runs containers; it only checks that Docker is installed and that you can access it correctly from the command line.

A command that checks that Docker is working correctly is the docker container ls command (we cover this command in more detail later in the course).

Without explaining the details, output on a newly installed system would likely be:

$ docker container ls
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

(The command docker info will achieve a similar end but produces a larger amount of output.)

However, if you instead get a message similar to the following

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

then you need to check that you have started Docker Desktop, Docker Engine, or whichever Docker application you installed when working through the setup instructions.
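The two checks above (is the docker command installed, and can it reach the Docker daemon) can also be scripted. The following is a minimal Python sketch, not part of Docker itself: the function name docker_status and its three return values are invented for illustration, and it assumes that docker container ls exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_status():
    """Return 'missing', 'daemon-unreachable', or 'ready' (hypothetical labels)."""
    # Check 1: is the docker command-line tool on the PATH at all?
    if shutil.which("docker") is None:
        return "missing"
    # Check 2: can the client talk to the daemon? This runs the same
    # `docker container ls` command used above and looks at its exit status.
    try:
        result = subprocess.run(
            ["docker", "container", "ls"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "daemon-unreachable"
    return "ready" if result.returncode == 0 else "daemon-unreachable"


if __name__ == "__main__":
    print(docker_status())
```

If this reports daemon-unreachable, the fix is the same as for the error message above: start the Docker application and try again.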

Key Points

  • A toolbar icon indicates that Docker is ready to use.

  • You will typically interact with Docker using the command line.


Break


Comfort break



Exploring and Running Containers

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How do I interact with a Docker container on my computer?

Objectives
  • Use the correct command to see which Docker images are on your computer.

  • Download new Docker images.

  • Demonstrate how to start an instance of a container from an image.

  • Describe at least two ways to run commands inside a running Docker container.

Reminder of terminology: images and containers

Recall that a container “image” is the template from which particular instances of containers will be created.

Let’s explore our first Docker container. The Docker team provides a simple container image online called hello-world. We’ll start with that one.

Downloading Docker images

The docker image command is used to list and modify Docker images. You can find out what container images you have on your computer by using the following command (“ls” is short for “list”):

$ docker image ls

If you’ve just installed Docker, you won’t see any images listed.

To get a copy of the hello-world Docker image from the internet, run this command:

$ docker pull hello-world

You should see output like this:

Using default tag: latest
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:f9dfddf63636d84ef479d645ab5885156ae030f611a56f3a7ac7f2fdd86d7e4e
Status: Downloaded newer image for hello-world:latest
docker.io/library/hello-world:latest

Docker Hub

Where did the hello-world image come from? It came from the Docker Hub website, which is a place to share Docker images with other people. More on that in a later episode.

Exercise: Check on Your Images

What command would you use to see if the hello-world Docker image had downloaded successfully and was on your computer? Give it a try before checking the solution.

Solution

To see if the hello-world image is now on your computer, run:

$ docker image ls

Note that the downloaded hello-world image is not in the folder where you are in the terminal! (Run ls by itself to check.) The image is not a file like our normal programs and files; Docker stores it in a specific location that isn’t commonly accessed, so it’s necessary to use the special docker image command to see what Docker images you have on your computer.

Running the hello-world container

To create and run containers from named Docker images you use the docker run command. Try the following docker run invocation. Note that it does not matter what your current working directory is.

$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

What just happened? When we use the docker run command, Docker does three things:

  1. Starts a running container, based on the image. Think of this as the “alive” or “inflated” version of the container: it’s actually doing something.

  2. Performs the default action. If the container has a default action set, it will perform that default action. This could be as simple as printing a message (as above) or running a whole analysis pipeline!

  3. Shuts down the container. Once the default action is complete, the container stops running (or exits). The image is still there, but nothing is actively running.

The hello-world container is set up to run an action by default - namely to print this message.

Using docker run to get the image

We could have skipped the docker pull step; if you use the docker run command and you don’t already have a copy of the Docker image, Docker will automatically pull the image first and then run it.

Running a container with a chosen command

But what if we wanted to do something different with the container? The output just gave us a suggestion of what to do, so let’s use a different Docker image to explore what else we can do with the docker run command. The suggestion above is to use ubuntu, but we’re going to run a different type of Linux, alpine, instead because it’s quicker to download.

Run the Alpine Docker container

Try downloading and running the alpine Docker container. You can do it in two steps, or one. What are they?

What happened when you ran the Alpine Docker container?

$ docker run alpine

If you have never used the alpine Docker image on your computer, Docker probably printed a message that it couldn’t find the image and had to download it. If you have used the alpine image before, the command will probably show no output. That’s because this particular container is designed for you to provide commands yourself. Try running this instead:

$ docker run alpine cat /etc/os-release

You should see the output of the cat /etc/os-release command, which prints out the version of Alpine Linux that this container is using and a few additional bits of information.

Hello World, Part 2

Can you run the container and make it print a “hello world” message?

Give it a try before checking the solution.

Solution

Use the same command as above, but with the echo command to print a message.

$ docker run alpine echo 'Hello World'

So here, we see another option – we can provide commands at the end of the docker run command and they will execute inside the running container.

Running containers interactively

In all the examples above, Docker has started the container, run a command, and then immediately shut down the container. But what if we wanted to keep the container running so we could log into it and test drive more commands? The way to do this is by adding the interactive flags -it to the docker run command and by providing a shell (usually bash or sh) as our command. The alpine Docker image doesn’t include bash so we need to use sh.

$ docker run -it alpine sh

Technically…

Technically, the interactive flag is just -i; the extra -t (combined as -it above) is an option that allocates a terminal so that you can connect to a shell like bash. Since you usually want a command line when running interactively, it nearly always makes sense to use the two together.

Your prompt should change significantly to look like this:

/ #

That’s because you’re now inside the running container! Try these commands:

All of these are being run from inside the running container, so you’ll get information about the container itself, instead of your computer. To finish using the container, just type exit.

/ # exit
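The three ways of using docker run seen so far (default action, a chosen command, an interactive shell) differ only in the arguments placed around the image name. The Python sketch below makes that ordering explicit by building the argument list without running anything; docker_run_args is a hypothetical helper written for this illustration, not a Docker API.

```python
def docker_run_args(image, command=None, interactive=False):
    """Build the argument list for a `docker run` invocation.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    args = ["docker", "run"]
    if interactive:
        # -i keeps standard input open, -t allocates a terminal
        args.append("-it")
    # Options go before the image name ...
    args.append(image)
    if command:
        # ... and anything after the image name is run inside the container
        args.extend(command)
    return args


# The three patterns from this episode:
print(docker_run_args("hello-world"))                        # default action
print(docker_run_args("alpine", ["echo", "Hello World"]))    # chosen command
print(docker_run_args("alpine", ["sh"], interactive=True))   # interactive shell
```

On a machine with Docker running, a list like these could be passed to subprocess.run to actually start the container.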

Practice Makes Perfect

Can you find out the version of Linux installed on the busybox container? Can you find the busybox program? What does it do? (Hint: passing --help to almost any command will give you more information.)

Solution 1 - Interactive

Run the busybox container interactively – you can use docker pull first, or just run it with this command:

$ docker run -it busybox sh

Then try running these commands:

/# cat /proc/version
/# busybox --help

Exit when you’re done.

/# exit

Solution 2 - Run commands

Run the busybox container, first with a command to read out the Linux version:

$ docker run busybox cat /proc/version

Then run the container again with a command to print out the busybox help:

$ docker run busybox busybox --help

Conclusion

So far, we’ve seen how to download Docker images, use them to run commands inside running containers, and even how to explore a running container from the inside. Next, we’ll take a closer look at all the different kinds of Docker images that are out there.

Key Points

  • The docker pull command downloads Docker images from the internet.

  • The docker image ls command lists Docker images that are (now) on your computer.

  • The docker run command creates running containers from images and can run commands inside them.

  • When using the docker run command, a container can run a default action (if it has one), a user specified action, or a shell to be used interactively.


Finding Containers on the Docker Hub

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What is the Docker Hub, and why is it useful?

Objectives
  • Explain how the Docker Hub augments Docker use.

  • Explore the Docker Hub webpage for a popular Docker image.

  • Find the list of tags for a particular Docker image.

  • Identify the three components of a container’s identifier.

In the previous episode, we ran a few different containers: hello-world, alpine, and maybe busybox. Where did these containers come from? The Docker Hub!

Introducing the Docker Hub

The Docker Hub is an online repository of container images, a vast number of which are publicly available. A large number of the images are curated by the developers of the software that they package. Also, many commonly used pieces of software that have been containerised into images are specifically endorsed, which means that you can trust the containers to have been checked for functionality and stability, and not to contain malware.

Docker can be used without connecting to the Docker Hub

Note that while the Docker Hub is well integrated into Docker functionality, the Docker Hub is certainly not required for all types of use of Docker containers. For example, some organisations may run container infrastructure that is entirely disconnected from the Internet.

Exploring an Example Docker Hub Page

As an example of a Docker Hub page, let’s explore the page for the Python language. The most basic form of containerised Python is in the “python” image (which is endorsed by the Docker team). Open your web browser to https://hub.docker.com/_/python to see what is on a typical Docker Hub software page.

The top-left provides information about the name, short description, popularity (i.e., over a million downloads in the case of this image), and endorsements.

The top-right provides the command to pull this image to your computer.

The main body of the page contains many commonly used headings, such as:

At least in my experience, the “Examples of how to use the image” section of most images’ pages will provide examples that are likely to adequately cover your intended use of the image.

Exploring Image Versions

A single Docker Hub page can have many different versions of container images, based on the version of the software inside. These versions are indicated by “tags”. When referring to the specific version of a container by its tag, you use a colon, :, like this:

CONTAINERNAME:TAG

So if I wanted to download the python container, with Python 3.8, I would use this name:

$ docker pull python:3.8

But if I wanted to download a Python 3.6 container, I would use this name:

$ docker pull python:3.6

The default tag (which is used if you don’t specify one) is called latest.

So far, we’ve only seen containers that are maintained by the Docker team. However, it’s equally common to use containers that have been produced by individual owners or organizations. Containers that you create and upload to Docker Hub would fall into this category, as would the containers maintained by organizations like ContinuumIO (the folks who develop the Anaconda Python environment) or community groups like rocker, a group that builds community R containers.

The names of these group- or individually-managed containers have this format:

OWNER/CONTAINERNAME:TAG

Repositories

The technical name for the contents of a Docker Hub page is a “repository.” The tag indicates the specific version of the container image that you’d like to use from a particular repository. So a slightly more accurate version of the above example is:

OWNER/REPOSITORY:TAG
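To make the three components concrete, here is a small Python sketch that splits a name of this form into owner, repository, and tag. The function parse_image_name is hypothetical, written only for this illustration. It fills in latest when no tag is given, and library when no owner is given (the owner that appeared in the docker pull hello-world output earlier). It assumes the name contains no registry hostname and no digest, both of which Docker accepts but this sketch does not handle.

```python
from typing import NamedTuple


class ImageName(NamedTuple):
    owner: str
    repository: str
    tag: str


def parse_image_name(name):
    """Split OWNER/REPOSITORY:TAG into its three components.

    Sketch only: registry hostnames and @sha256 digests are not handled.
    """
    # The tag is whatever follows the colon; the default tag is "latest".
    repo_part, sep, tag = name.partition(":")
    if not sep:
        tag = "latest"
    # Images without an explicit owner, such as hello-world, are pulled
    # from "library".
    owner, sep, repository = repo_part.partition("/")
    if not sep:
        owner, repository = "library", repo_part
    return ImageName(owner, repository, tag)


print(parse_image_name("rocker/tidyverse:3.6.1"))
print(parse_image_name("python:3.8"))
print(parse_image_name("hello-world"))
```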

What’s in a name?

How would I download the Docker container produced by the rocker group that has version 3.6.1 of R and the tidyverse installed?

Solution

First, search for rocker in Docker Hub. Then look for their tidyverse image. You can look at the list of tags, or just guess that the tag is 3.6.1. Altogether, that means that the name of the container we want to download is:

$ docker pull rocker/tidyverse:3.6.1

Many Different Containers

There are many different containers on Docker Hub. This is where the real advantage of using containers shows up – each container represents a complete software installation that you can use and access without any extra work!

The easiest way to find containers is to search on Docker Hub, but sometimes software pages have a link to their containers from their home page.

What container is right for you?

Find a Docker container that’s relevant to you. If you’re unsuccessful in your search, or don’t know what to look for, you can use the R or Python containers we’ve already seen.

Once you find a container, use the skills from the previous episode to download the image and explore it.

Key Points

  • The Docker Hub is an online repository of container images.

  • Many Docker Hub images are public, and may be officially endorsed.

  • Each Docker Hub page about an image provides structured information and subheadings.

  • Most Docker Hub pages about images contain sections that provide examples of how to use those images.

  • Many Docker Hub images have multiple versions, indicated by tags.

  • The naming convention for Docker containers is: OWNER/CONTAINERNAME:TAG


Cleaning Up Containers

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I interact with a Docker container on my computer?

Objectives
  • Explain how to list running and completed containers.

Removing images

The images and their corresponding containers can start to take up a lot of disk space if you don’t clean them up occasionally, so it’s a good idea to periodically remove container images that you won’t be using anymore.

In order to remove a specific image, you need to find out details about the image, specifically, the “image ID”. For example, say my laptop contained the following image:

$ docker image ls
REPOSITORY       TAG         IMAGE ID       CREATED          SIZE
hello-world      latest      fce289e99eb9   15 months ago    1.84kB

You can remove the image with a docker image rm command that includes the image ID, such as:

$ docker image rm fce289e99eb9

or use the image name, like so:

$ docker image rm hello-world

However, you may see this output:

Error response from daemon: conflict: unable to remove repository reference "hello-world" (must force) - container e7d3b76b00f4 is using its referenced image fce289e99eb9

This happens when Docker is still keeping track of containers that were created from the image, even though they have finished running. So before removing the container image, we need to be able to see which containers are currently running, or have been run recently, and know how to remove them.

What containers are running?

Working with containers, we are going to shift to a new docker command: docker container. Similar to docker image, we can list running containers by typing:

$ docker container ls
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Notice that this command didn’t return any containers because our containers all exited and thus stopped running after they completed their work.

docker ps

The command docker ps serves the same purpose as docker container ls, and comes from the Unix shell command ps which describes running processes.

What containers have run recently?

There is also a way to list running containers, and those that have completed recently, which is to add the --all/-a flag to the docker container ls command as shown below.

$ docker container ls --all
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                     PORTS               NAMES
9c698655416a        hello-world         "/hello"            2 minutes ago       Exited (0) 2 minutes ago                       zen_dubinsky
6dd822cf6ca9        hello-world         "/hello"            3 minutes ago       Exited (0) 3 minutes ago                       eager_engelbart

Keeping it clean

You might be surprised at the number of containers Docker is still keeping track of. One way to prevent this from happening is to add the --rm flag to docker run. This removes the container, along with Docker’s record of it, as soon as it exits. If you need to refer back to the container after it exits for any reason, don’t use this flag.

How do I remove an exited container?

To delete an exited container you can run the following command, inserting the CONTAINER ID for the container you wish to remove. It will repeat the CONTAINER ID back to you, if successful.

$ docker container rm 9c698655416a
9c698655416a

If you want to remove all exited containers at once you can use the docker container prune command. Be careful with this command. If you have containers you may want to reconnect to, you should not use this command. It will ask you to confirm that you want to remove these containers (see the output below). If successful, it will print the full CONTAINER ID of each removed container back to you.

$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
9c698655416a848278d16bb1352b97e72b7ea85884bff8f106877afe0210acfc
6dd822cf6ca92f3040eaecbd26ad2af63595f30bb7e7a20eacf4554f6ccc9b2b

Removing images, for real this time

Now that we’ve removed any potentially running or stopped containers, we can try again to delete the hello-world image.

$ docker image rm hello-world
Untagged: hello-world:latest
Untagged: hello-world@sha256:5f179596a7335398b805f036f7e8561b6f0e32cd30a32f5e19d17a3cda6cc33d
Deleted: sha256:fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e
Deleted: sha256:af0b15c8625bb1938f1d7b17081031f649fd14e6b233688eea3c5483994a66a3

The reason that there are several lines of output is that a given image may have been formed by merging multiple underlying layers. Any layers that are used by multiple Docker images will only be stored once. Now the result of docker image ls should no longer include the hello-world image.
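The idea that shared layers are stored only once can be illustrated with a few lines of Python. This is a toy model, not how Docker is implemented: the image names and layer labels are invented for the sketch, and real layers are identified by sha256 digests like those in the output above.

```python
# Hypothetical images and layer labels, invented for this sketch.
images = {
    "alice/alpine-python": ["layer-alpine", "layer-python", "layer-cython"],
    "alice/alpine-r": ["layer-alpine", "layer-r"],
}


def stored_layers(images):
    """Return the set of distinct layers that need to be kept on disk."""
    layers = set()
    for image_layers in images.values():
        layers.update(image_layers)
    return layers


def layers_freed_by_removing(images, name):
    """Return the layers used by `name` and by no other image."""
    others = {k: v for k, v in images.items() if k != name}
    return set(images[name]) - stored_layers(others)


# Five layer references in total, but only four distinct layers to store.
print(len(stored_layers(images)))
# Removing alice/alpine-r frees only the layer that no other image uses.
print(sorted(layers_freed_by_removing(images, "alice/alpine-r")))
```

In this model, removing an image deletes only the layers that no other image still needs, which is why the number of Deleted lines can vary from one image to another.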

Key Points

  • The docker container ls command lists running containers; adding --all also lists containers that have exited.


Creating your own container images

Overview

Teaching: 20 min
Exercises: 15 min
Questions
  • How can I make my own Docker images?

Objectives
  • Explain the purpose of a Dockerfile and show some simple examples.

  • Demonstrate how to build a Docker image from a Dockerfile.

  • Compare the steps of creating a container interactively versus a Dockerfile.

  • Create an installation strategy for a container.

  • Demonstrate how to upload (‘push’) your container images to the Docker Hub.

  • Describe the significance of the Docker Hub naming scheme.

There are lots of reasons why you might want to create your own Docker image.

Interactive installation

Before creating a reproducible installation, let’s experiment with installing software inside a container. Start the alpine container from before, interactively:

$ docker run -it alpine sh

Because this is a basic container, there are a lot of things not installed, for example, python3.

/# python3
sh: python3: not found

Inside the container, we can run commands to install Python 3. The Alpine version of Linux has an installation tool called apk that we can use to install Python 3.

/# apk add --update python3 py3-pip python3-dev

We can test our installation by running a Python command:

/# python3 --version

Once Python is installed, we can add Python packages using the pip package installer:

/# pip install cython

Exercise: Searching for Help

Can you find instructions for installing R on Alpine Linux? Do they work?

Solution

A quick search should hopefully show that the way to install R on Alpine Linux is:

/# apk add R

Once we exit, these changes are not saved to a new container image by default. There is a command that will “snapshot” our changes, but building containers this way is not very reproducible. Instead, we’re going to take what we’ve learned from this interactive installation and create our container from a reproducible recipe, known as a Dockerfile.

If you haven’t already, exit out of the interactively running container.

/# exit

Put installation instructions in a Dockerfile

A Dockerfile is a plain text file with keywords and commands that can be used to create a new container image.

From your shell, go to the folder you downloaded at the start of the lesson and print out the Dockerfile inside:

$ cd ~/Desktop/docker-intro/basic
$ cat Dockerfile
FROM <EXISTING IMAGE>
RUN <INSTALL CMDS FROM SHELL>
RUN <INSTALL CMDS FROM SHELL>
CMD <CMD TO RUN BY DEFAULT>

Let’s break this file down:

Exercise: Take a Guess

Do you have any ideas about what we should use to fill in the sample Dockerfile to replicate the installation we did above?

Solution:

Based on our experience above, edit the Dockerfile (in your text editor of choice) to look like this:

FROM alpine
RUN apk add --update python3 py3-pip python3-dev
RUN pip install cython
CMD cat /proc/version && python3 --version

The recipe provided by this Dockerfile will use Alpine Linux as the base container, add Python and the Cython library, and set a default print command.

Create a new Docker image

So far, we just have a file. We want Docker to take this file, run the install commands inside, and then save the resulting container as a new container image. To do this we will use the docker build command.

We have to provide docker build with two pieces of information:

$ docker build -t USERNAME/CONTAINERNAME .

The -t option names the new image; the final dot tells Docker that the Dockerfile is in our current directory.

For example, if my user name was alice and I wanted to call my image alpine-python, I would use this command:

$ docker build -t alice/alpine-python .

Exercise: Review!

  1. Think back to earlier. What command can you run to check if your image was created successfully? (Hint: what command shows the images on your computer?)

  2. We didn’t specify a tag for our image name. What did Docker automatically use?

  3. What command will run the container you’ve created? What should happen by default if you run the container? Can you make it do something different, like print “hello world”?

Solution

  1. To see your new image, run docker image ls. You should see the name of your new image under the “REPOSITORY” heading.

  2. In the output of docker image ls, you can see that Docker has automatically used the latest tag for our new image.

  3. We want to use docker run to run the container.

$ docker run alice/alpine-python

should run the container and print out our default message, including the version of Linux and Python.

$ docker run alice/alpine-python echo "Hello World"

will run the container and print out “Hello World” instead.

While it may not look like you have achieved much, you have combined a lightweight Linux operating system with a specification of the command to run, in a form that operates reliably on macOS, Microsoft Windows, Linux and in the cloud!

Boring but important notes about installation

There are a lot of choices when it comes to installing software - sometimes too many! Here are some things to consider when creating your own container:

In general, a good strategy for installing software is:

Share your new container on Docker Hub

Images that you release publicly can be stored on the Docker Hub for free. If you name your image as described above, with your Docker Hub username, all you need to do is run the opposite of docker pull: docker push.

$ docker push alice/alpine-python

Make sure to substitute the full name of your container!

In a web browser, open https://hub.docker.com, and on your user page you should now see your container listed, for anyone to use or build on.

Logging In

Technically, you have to be logged into Docker on your computer for this to work. Usually it happens by default, but if docker push doesn’t work for you, run docker login first, enter your Docker Hub username and password, and then try docker push again.

What’s in a name? (again)

You don’t have to name your containers using the USERNAME/CONTAINER:TAG naming scheme. On your own computer, you can call containers whatever you want and refer to them by the names you choose. It’s only when you want to share a container that it needs the correct naming format.

You can rename images using the docker tag command. For example, imagine someone named Alice has been working on a workflow container and called it workflow-test on her own computer. She now wants to share it in her alice Docker Hub account with the name workflow-complete and a tag of v1. Her docker tag command would look like this:

$ docker tag workflow-test alice/workflow-complete:v1

She could then push the renamed container to Docker Hub using docker push alice/workflow-complete:v1.
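
To make the naming scheme concrete, the sketch below splits a full image name into its USERNAME, CONTAINER and TAG parts using plain shell parameter expansion. It does not call Docker. The variable names are ours, and the fallback to latest mirrors the default tag seen in the review exercise.

```shell
# Split a name of the form USERNAME/CONTAINER:TAG into its parts.
name="alice/workflow-complete:v1"

username="${name%%/*}"     # text before the first "/"
rest="${name#*/}"          # text after the first "/"
container="${rest%%:*}"    # text before the ":"

case "$rest" in
    *:*) tag="${rest##*:}" ;;   # an explicit tag was given
    *)   tag="latest" ;;        # no tag: Docker falls back to "latest"
esac

echo "username=$username container=$container tag=$tag"
# prints: username=alice container=workflow-complete tag=v1
```

Changing name to alice/alpine-python (no tag) should report tag=latest.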

Key Points

  • Dockerfiles specify what is within Docker images.

  • The docker build command is used to build an image from a Dockerfile.

  • You can share your Docker images through the Docker Hub so that others can create Docker containers from your images.


Lunch

Lunch break


Creating More Complex Container Images

Overview

Teaching: 30 min
Exercises: 30 min
Questions
  • How can I make more complex container images?

Objectives
  • Explain how you can include files within Docker images when you build them.

  • Explain how you can access files on the Docker host from your Docker containers.

In order to create and use your own containers, you may need more information than our previous example. You may want to use files from outside the container, copy those files into the container, and just generally learn a little bit about software installation. This episode will cover these. Note that the examples will get gradually more and more complex - most day-to-day use of containers can be accomplished using the first 1-2 sections on this page.

Using scripts and files from outside the container

In your shell, change to the sum folder in the docker-intro folder and look at the files inside.

$ cd ~/Desktop/docker-intro/sum
$ ls

This folder has both a Dockerfile and a Python script called sum.py. Let’s say we wanted to try running the script using our recently created alpine-python container.

Running containers

What command would we use to run python from the alpine-python container?

If we try running the container and Python script, what happens?

$ docker run alice/alpine-python python3 sum.py
python3: can't open file 'sum.py': [Errno 2] No such file or directory

No such file or directory

What does the error message mean? Why might the Python inside the container not be able to find or open our script?

The problem here is that the container and its file system are separate from our host computer’s file system. When the container runs, it can’t see anything outside itself, including any of the files on our computer. In order to use Python (inside the container) and our script (outside the container, on our computer), we need to create a link between the directory on our computer and the container.

This link is called a “mount” and is what happens automatically when a USB drive or other external hard drive gets connected to a computer - you can see the contents appear as if they were on your computer.

We can create a mount between our computer and the running container by using an additional option to docker run. We’ll also use the variable $PWD, which will substitute in our current working directory. The option will look like this:

-v $PWD:/temp

This means: link my current directory with the container and, inside the container, name that directory /temp.

Let’s try running the command now:

$ docker run -v $PWD:/temp alice/alpine-python python3 sum.py

But we get the same error!

python3: can't open file 'sum.py': [Errno 2] No such file or directory

This final piece is a bit tricky – we really have to remember to put ourselves inside the container. Where is the sum.py file? It’s in the directory that’s been mapped to /temp – so we need to include that in the path to the script. This command should give us what we need:

$ docker run -v $PWD:/temp alice/alpine-python python3 /temp/sum.py

Note that if we create any files in the /temp directory while the container is running, these files will appear on our host filesystem in the original directory and will stay there even when the container stops.
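
One way to keep the two views of the same file straight is to work out the in-container path from the host path. The sketch below only manipulates strings (it does not run Docker), and the variable names are assumptions made for illustration.

```shell
# The mount option -v HOST_DIR:CONTAINER_DIR pairs two directories.
host_dir="$PWD"
container_dir="/temp"

# A file as seen from the host ...
host_file="$host_dir/sum.py"

# ... and the same file as seen from inside the container:
# swap the host directory prefix for the container directory.
container_file="$container_dir${host_file#"$host_dir"}"

echo "host:      $host_file"
echo "container: $container_file"
```

The second line printed is the path we had to give to python3 in the working command above.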

Exercise: Explore the script

What happens if you use the docker run command above and put numbers after the script name?

Solution

This script comes from the Python Wiki and is set to add all numbers that are passed to it as arguments.

Exercise: Checking the options

Our Docker command has gotten much longer! Can you go through each piece of the Docker command above and explain what it does? How would you characterize the key components of a Docker command?

Solution

Here’s a breakdown of each piece of the command above

  • docker run: use Docker to run a container
  • -v $PWD:/temp: connect my current working directory ($PWD) as a folder inside the container called /temp
  • alice/alpine-python: name of the container to run
  • python3 /temp/sum.py: what commands to run in the container

More generally, every Docker command will have the form: docker [action] [docker options] [docker image] [command to run inside]

Exercise: Interactive jobs

Try using the directory mount option but run the container interactively. Can you find the folder that’s connected to your computer? What’s inside?

Solution

The docker command to run the container interactively is:

$ docker run -v $PWD:/temp -it alice/alpine-python sh

Once inside, you should be able to navigate to the /temp folder and see that its contents are the same as the files on your computer:

/# cd /temp
/# ls

Mounting a folder can be very useful when you want to run the software inside your container on many different input files. In other situations, you may want to save or archive an authoritative version of your data by adding it to the container permanently. That’s what we will cover next.

Including personal scripts and data in a container

Our next project will be to add our own files to a container - something you might want to do if you’re sharing a finished analysis or just want to have an archived copy of your entire analysis including the data. Let’s assume that we’ve finished with our sum.py script and want to add it to the container itself.

In your shell, you should still be in the sum folder in the docker-intro folder.

$ pwd
/Users/yourname/Desktop/docker-intro/sum

We will modify our Dockerfile again to build an image based on Alpine Linux with Python 3 installed (just as we did previously). This time we will add an additional line before the CMD line:

COPY sum.py /home

This line will cause Docker to copy the file from your computer into the container’s file system at build time. Modify the Dockerfile as before (or copy the version from the basic/ subdirectory) and add the extra copy line. Once you have done that, build the container like before, but give it a different name:

$ docker build -t alice/alpine-sum .

Exercise: Did it work?

Can you remember how to run a container interactively? Try that with this one. Once inside, try running the Python script.

Solution

You can start the container interactively like so:

$ docker run -it alice/alpine-sum sh

You should be able to run the python command inside the container like this:

/# python3 /home/sum.py

This COPY keyword can be used to place your own scripts or own data into a container that you want to publish or use as a record. Note that it’s not necessarily a good idea to put your scripts inside the container if you’re constantly changing or editing them. Then, referencing the scripts from outside the container is a good idea, as we did in the previous section. You also want to think carefully about size – if you run docker image ls you’ll see the size of each image all the way on the right of the screen. The bigger your image becomes, the harder it will be to easily download.

Copying alternatives

Another trick for getting your own files into a container is by using the RUN keyword and downloading the files from the internet. For example, if your code is in a GitHub repository, you could include this statement in your Dockerfile to download the latest version every time you build the container:

RUN git clone https://github.com/alice/mycode

Similarly, the wget command can be used to download any file publicly available on the internet:

RUN wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ncbi-blast-2.10.0+-x64-linux.tar.gz

More fancy Dockerfile options

We can expand on the example above to make our container even more “automatic”. Here are some ideas:

Make the sum.py script run automatically:

FROM alpine

COPY sum.py /home
RUN apk add --update python3 py3-pip python3-dev

# Run the sum.py script as the default command
CMD python3 /home/sum.py
# OR
# CMD ["python3", "/home/sum.py"]

Build and test it:

$ docker build -t alpine-sum:v1 .
$ docker run alpine-sum:v1

Make the sum.py script run automatically and accept arguments from the command line:

FROM alpine

COPY sum.py /home
RUN apk add --update python3 py3-pip python3-dev

# Run the sum.py script as the default command and
# allow people to enter arguments for it
ENTRYPOINT ["python3", "/home/sum.py"]

Build and test it:

$ docker build -t alpine-sum:v2 .
$ docker run alpine-sum:v2 1 2 3 4

Add the sum.py script to the PATH so that it can be run by name:

FROM alpine

COPY sum.py /home
# set script permissions
RUN chmod +x /home/sum.py
# add /home folder to the PATH
ENV PATH="/home:$PATH"

RUN apk add --update python3 py3-pip python3-dev

Build and test it:

$ docker build -t alpine-sum:v3 .
$ docker run alpine-sum:v3 sum.py 1 2 3 4
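
The three variants differ in how Docker combines the image’s settings with anything typed after the image name. The sketch below is a simplified model of that rule, written as a shell function so it can be tried without Docker. The function name resolve_command is hypothetical, and the model ignores details such as the /bin/sh -c wrapper that Docker adds around a CMD written as a plain string.

```shell
# resolve_command ENTRYPOINT DEFAULT_CMD [ARGS...]
# Prints the command a container would run, under a simplified model:
#   - arguments given after the image name replace CMD
#   - whatever remains is appended to ENTRYPOINT
resolve_command() {
    entrypoint="$1"
    default_cmd="$2"
    shift 2
    if [ "$#" -gt 0 ]; then
        extra="$*"             # arguments from the command line win over CMD
    else
        extra="$default_cmd"   # otherwise fall back to CMD
    fi
    if [ -n "$entrypoint" ] && [ -n "$extra" ]; then
        echo "$entrypoint $extra"
    else
        echo "${entrypoint}${extra}"
    fi
}

# v1: CMD only, no arguments
resolve_command "" "python3 /home/sum.py"
# prints: python3 /home/sum.py

# v1: CMD only, arguments replace it
resolve_command "" "python3 /home/sum.py" echo hello
# prints: echo hello

# v2: ENTRYPOINT, arguments are appended
resolve_command "python3 /home/sum.py" "" 1 2 3 4
# prints: python3 /home/sum.py 1 2 3 4
```

In this model the v3 image has no ENTRYPOINT, so sum.py 1 2 3 4 becomes the whole command and is found through the PATH.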

Key Points

  • You can include files from your Docker host into your Docker images by using the COPY instruction in your Dockerfile.

  • Docker allows containers to read and write files from the Docker host.


Break

Comfort break


Containers used in generating this lesson

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How can containers be useful to me for building websites?

Objectives
  • Demonstrate how to construct a website using containers to transform a specification into a fully-presented website.

The website for this lesson is generated mechanically from a set of files that specify the configuration of the site, its presentation template, and the content to go on the pages. This is far more manageable than editing each webpage of the lesson separately. For example, if the page header needs to change, the change can be made in one place and all the pages regenerated. The alternative would be to edit every page to repeat the change, which is not productive or suitable work for humans to do!

In your shell window, move to your docker-intro directory. We will be expanding a ZIP file into this directory later.

Now open a web browser window and:

  1. Navigate to the GitHub repository that contains the files for this session, at https://github.com/carpentries-incubator/docker-introduction/;
  2. Click the green “Clone or download” button on the right-hand side of the page;
  3. Click “Download ZIP”.
  4. The downloaded ZIP file should contain one directory named docker-introduction-gh-pages.
  5. Move the docker-introduction-gh-pages folder into the docker-intro folder.

There are many ways to work with ZIP files

Note that the last two steps can be achieved using a Mac or Windows graphical user interface. There are also ways to expand the ZIP archive on the command line; for example, on my Mac I can achieve the effect of those last two steps by running the command unzip ~/Downloads/docker-introduction-gh-pages.zip.

In your shell window, if you cd into the docker-introduction-gh-pages folder and list the files, you should see something similar to what I see:

$ cd docker-introduction-gh-pages
$ ls
AUTHORS			_episodes		code
CITATION		_episodes_rmd		data
CODE_OF_CONDUCT.md	_extras			fig
CONTRIBUTING.md		_includes		files
LICENSE.md		_layouts		index.md
Makefile		aio.md			reference.md
README.md		assets			setup.md
_config.yml		bin

You can now request that a container is created that will compile the files in this set into the lesson website, and will run a simple webserver to allow you to view your version of the website locally. Note that this command will be long and fiddly to type, so you probably want to copy-and-paste it into your shell window. This command will continue to (re-)generate and serve up your version of the lesson website, so you will not get your shell prompt back until you type control+c. Typing control+c stops the webserver and, because of the --rm option, also removes the container.

For macOS, Linux and PowerShell:

$ docker run --rm -it -v ${PWD}:/srv/jekyll -p 127.0.0.1:4000:4000 jekyll/jekyll:latest jekyll serve

When I ran the macOS command, the output was as follows:

Unable to find image 'jekyll/jekyll:latest' locally
latest: Pulling from jekyll/jekyll
cbdbe7a5bc2a: Pull complete 
aa8ae8202b42: Pull complete 
b21786fe7c0d: Pull complete 
68296e6645b2: Pull complete 
6b1c37303e2d: Pull complete 
0fb11dc849c1: Pull complete 
Digest: sha256:bb45414c3fefa80a75c5001f30baf1dff48ae31dc961b8b51003b93b60675334
Status: Downloaded newer image for jekyll/jekyll:latest

jekyll serve
Fetching gem metadata from https://rubygems.org/...........
Fetching gem metadata from https://rubygems.org/.
Resolving dependencies.....
Using concurrent-ruby 1.1.7
Using i18n 0.9.5
Fetching minitest 5.14.2
Installing minitest 5.14.2
Using thread_safe 0.3.6
Fetching tzinfo 1.2.8
Installing tzinfo 1.2.8
Fetching zeitwerk 2.4.2
Installing zeitwerk 2.4.2
Fetching activesupport 6.0.3.4
Installing activesupport 6.0.3.4
Fetching public_suffix 3.1.1
Installing public_suffix 3.1.1
Using addressable 2.7.0
Using bundler 2.1.4
Fetching coffee-script-source 1.11.1
Installing coffee-script-source 1.11.1
Using execjs 2.7.0
Using coffee-script 2.4.1
Using colorator 1.1.0
Using ruby-enum 0.8.0
Fetching commonmarker 0.17.13
Installing commonmarker 0.17.13 with native extensions
Fetching unf_ext 0.0.7.7
Installing unf_ext 0.0.7.7 with native extensions
Fetching unf 0.1.4
Installing unf 0.1.4
Fetching simpleidn 0.1.1
Installing simpleidn 0.1.1
Fetching dnsruby 1.61.5
Installing dnsruby 1.61.5
Using eventmachine 1.2.7
Using http_parser.rb 0.6.0
Fetching em-websocket 0.5.2
Installing em-websocket 0.5.2
Using ffi 1.13.1
Using ethon 0.12.0
Fetching multipart-post 2.1.1
Installing multipart-post 2.1.1
Fetching ruby2_keywords 0.0.2
Installing ruby2_keywords 0.0.2
Fetching faraday 1.1.0
Installing faraday 1.1.0
Using forwardable-extended 2.6.0
Using gemoji 3.0.1
Fetching sawyer 0.8.2
Installing sawyer 0.8.2
Fetching octokit 4.19.0
Installing octokit 4.19.0
Using typhoeus 1.4.0
Fetching github-pages-health-check 1.16.1
Installing github-pages-health-check 1.16.1
Using rb-fsevent 0.10.4
Using rb-inotify 0.10.1
Using sass-listen 4.0.0
Using sass 3.7.4
Using jekyll-sass-converter 1.5.2
Fetching listen 3.3.3
Installing listen 3.3.3
Using jekyll-watch 2.2.1
Fetching rexml 3.2.4
Installing rexml 3.2.4
Using kramdown 2.3.0
Using liquid 4.0.3
Using mercenary 0.3.6
Using pathutil 0.16.2
Fetching rouge 3.23.0
Installing rouge 3.23.0
Using safe_yaml 1.0.5
Using jekyll 3.9.0
Fetching jekyll-avatar 0.7.0
Installing jekyll-avatar 0.7.0
Fetching jekyll-coffeescript 1.1.1
Installing jekyll-coffeescript 1.1.1
Using jekyll-commonmark 1.3.1
Fetching jekyll-commonmark-ghpages 0.1.6
Installing jekyll-commonmark-ghpages 0.1.6
Fetching jekyll-default-layout 0.1.4
Installing jekyll-default-layout 0.1.4
Fetching jekyll-feed 0.15.1
Installing jekyll-feed 0.15.1
Fetching jekyll-gist 1.5.0
Installing jekyll-gist 1.5.0
Fetching jekyll-github-metadata 2.13.0
Installing jekyll-github-metadata 2.13.0
Using mini_portile2 2.4.0
Using nokogiri 1.10.10
Using html-pipeline 2.14.0
Using jekyll-mentions 1.6.0
Fetching jekyll-optional-front-matter 0.3.2
Installing jekyll-optional-front-matter 0.3.2
Using jekyll-paginate 1.1.0
Fetching jekyll-readme-index 0.3.0
Installing jekyll-readme-index 0.3.0
Using jekyll-redirect-from 0.16.0
Fetching jekyll-relative-links 0.6.1
Installing jekyll-relative-links 0.6.1
Fetching rubyzip 2.3.0
Installing rubyzip 2.3.0
Fetching jekyll-remote-theme 0.4.2
Installing jekyll-remote-theme 0.4.2
Using jekyll-seo-tag 2.6.1
Using jekyll-sitemap 1.4.0
Fetching jekyll-swiss 1.0.0
Installing jekyll-swiss 1.0.0
Fetching jekyll-theme-architect 0.1.1
Installing jekyll-theme-architect 0.1.1
Fetching jekyll-theme-cayman 0.1.1
Installing jekyll-theme-cayman 0.1.1
Fetching jekyll-theme-dinky 0.1.1
Installing jekyll-theme-dinky 0.1.1
Fetching jekyll-theme-hacker 0.1.2
Installing jekyll-theme-hacker 0.1.2
Fetching jekyll-theme-leap-day 0.1.1
Installing jekyll-theme-leap-day 0.1.1
Fetching jekyll-theme-merlot 0.1.1
Installing jekyll-theme-merlot 0.1.1
Fetching jekyll-theme-midnight 0.1.1
Installing jekyll-theme-midnight 0.1.1
Fetching jekyll-theme-minimal 0.1.1
Installing jekyll-theme-minimal 0.1.1
Fetching jekyll-theme-modernist 0.1.1
Installing jekyll-theme-modernist 0.1.1
Fetching jekyll-theme-primer 0.5.4
Installing jekyll-theme-primer 0.5.4
Fetching jekyll-theme-slate 0.1.1
Installing jekyll-theme-slate 0.1.1
Fetching jekyll-theme-tactile 0.1.1
Installing jekyll-theme-tactile 0.1.1
Fetching jekyll-theme-time-machine 0.1.1
Installing jekyll-theme-time-machine 0.1.1
Fetching jekyll-titles-from-headings 0.5.3
Installing jekyll-titles-from-headings 0.5.3
Using jemoji 0.12.0
Using kramdown-parser-gfm 1.1.0
Using minima 2.5.1
Using unicode-display_width 1.7.0
Using terminal-table 1.8.0
Fetching github-pages 209
Installing github-pages 209
Bundle complete! 1 Gemfile dependency, 91 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux-musl]
Configuration file: /srv/jekyll/_config.yml
            Source: /srv/jekyll
       Destination: /srv/jekyll/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
      Remote Theme: Using theme carpentries/carpentries-theme
                    done in 7.46 seconds.
 Auto-regeneration: enabled for '/srv/jekyll'
    Server address: http://0.0.0.0:4000
  Server running... press ctrl-c to stop

In the preceding output, you see Docker downloading the image for Jekyll, which is a tool for building websites from specification files such as those used for this lesson. The line jekyll serve indicates a command that runs within the Docker container instance. The output below that is from the Jekyll tool itself, highlighting that the website has been built, and indicating that there is a server running.
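
The long docker run command used above follows the docker [action] [docker options] [docker image] [command to run inside] pattern from the previous episode. As a sketch, the snippet below rebuilds the command one piece at a time and prints it rather than running it; the comments give our reading of each option.

```shell
# Assemble the Jekyll command piece by piece (printed, not executed).
cmd="docker run"                      # action
cmd="$cmd --rm"                       # remove the container when it exits
cmd="$cmd -it"                        # keep an interactive terminal attached
cmd="$cmd -v ${PWD}:/srv/jekyll"      # mount the current directory at /srv/jekyll
cmd="$cmd -p 127.0.0.1:4000:4000"     # publish container port 4000 on localhost only
cmd="$cmd jekyll/jekyll:latest"       # image to run
cmd="$cmd jekyll serve"               # command to run inside the container

echo "$cmd"
```

The -p option is what makes http://localhost:4000/ reachable from a browser on your own computer.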

Open a web browser window and visit the address http://localhost:4000/. You should see a site that looks very similar to that at https://carpentries-incubator.github.io/docker-introduction/.

Using a new shell window, or using your laptop’s GUI, locate the file index.md within the docker-introduction-gh-pages directory, and open it in your preferred editor program.

Near the top of this file you should see the description starting “This session aims to introduce the use of Docker containers with the goal of using them to effect reproducible computational environments.” Make a change to this message, and save the file.

If you reload your web browser, the change that you just made should be visible. This is because the Jekyll container saw that you changed the index.md file, and regenerated the website.

You can stop the Jekyll container by clicking in its terminal window and typing control+c.

You have now used a reproducible computational environment to reproduce a lesson about reproducible computing environments.

Key Points

  • This lesson website can be generated using a container.


Containers in research workflows: reproducibility and granularity

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How can I use container images to make my research more reproducible?

  • How do I incorporate containers into my research workflow?

  • What are container orchestration tools and how can they potentially help me?

Objectives
  • Understand how container images can help make research more reproducible.

  • Understand what practical steps I can take to improve the reproducibility of my research using containers.

  • Know what container orchestration tools are and what they can do.

Although this workshop is titled “Reproducible computational environments using containers”, so far we have mostly covered the mechanics of using Docker with only passing reference to the reproducibility aspects. In this section, we discuss these aspects in more detail.

Work in progress…

Note that reproducibility aspects of software and containers are an active area of research, discussion and development so are subject to many changes. We will present some ideas and approaches here but best practices will likely evolve in the near future.

Reproducibility

By reproducibility here we mean the ability of someone else (or your future self) to reproduce what you did computationally at a particular time (be this in research, analysis or something else) as closely as possible, even if they do not have access to exactly the same hardware resources that you had when you did the original work.

Some examples of why containers are an attractive technology to help with reproducibility include:

Sharing images

As we have already seen, the Docker Hub provides a platform for sharing images publicly. Once you have uploaded an image, you can point people to its public location and they can download and build upon it.

This is fine for working collaboratively with images on a day-to-day basis, but the Docker Hub is not a good option for long-term archiving of images in support of research and publications as:

Archiving and persistently identifying images using Zenodo

When you publish your work or make it publicly available in some way it is good practice to make images that you used for computational work available in an immutable, persistent way and to have an identifier that allows people to cite and give you credit for the work you have done. Zenodo provides this functionality.

Zenodo supports the archiving of tar archives and we can capture our Docker images as tar archives using the docker save command. For example, to export the image we created earlier in this lesson:

docker save -o alpine-python.tar alice/alpine-python

These tar images can become quite large and Zenodo supports uploads up to 50GB so you may need to compress your archive to make it fit on Zenodo using a tool such as gzip (or zip):

gzip alpine-python.tar
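
If you would like to see what gzip does before trying it on a real image archive, the sketch below uses a small stand-in tar file in a scratch directory. The file layer.txt and the variable scratch are assumptions made for illustration; no Docker image is involved.

```shell
# Build a tiny stand-in archive in a scratch directory.
scratch="$(mktemp -d)"
echo "placeholder for image layers" > "$scratch/layer.txt"
tar -cf "$scratch/alpine-python.tar" -C "$scratch" layer.txt

# gzip replaces alpine-python.tar with alpine-python.tar.gz
gzip "$scratch/alpine-python.tar"

# Check that the compressed file is intact before uploading it anywhere
gzip -t "$scratch/alpine-python.tar.gz" && echo "archive OK"
ls "$scratch"
```

Note that the uncompressed .tar file is gone afterwards, so only the .tar.gz file is left to deposit.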

Once you have your archive, you can deposit it on Zenodo and this will:

In addition to the archive file itself, the deposit process will ask you to provide some basic metadata to classify the image and the associated work.

Note that Zenodo is not the only option for archiving and generating persistent DOIs for images. There are other services out there - for example, some organizations may provide their own, equivalent, service.

Restoring the image from a save

Unsurprisingly, the command docker load -i alpine-python.tar.gz would be used to load the saved image and make it available on your system. Note that the command can restore the compressed archive directly, without the need to uncompress it first.

Reproducibility good practice

Container Granularity

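As mentioned above, one of the decisions you may need to make when containerising your research workflows is what level of granularity you wish to employ. The two extremes of this decision could be characterised as a single large container that holds the whole workflow, and multiple smaller containers that each hold one part of it.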

Of course, many real applications will sit somewhere between these two extremes.

Positives and negatives

What are the advantages and disadvantages of the two approaches to container granularity for research workflows described above? Think about this and write a few bullet points for advantages and disadvantages for each approach in the course Etherpad.

Solution

This is not an exhaustive list but some of the advantages and disadvantages could be:

Single large container:

  • Advantages:
    • Simpler to document
    • Full set of requirements packaged in one place
    • Potentially easier to maintain (though could be opposite if working with large, distributed group)
  • Disadvantages:
    • Could get very large in size, making it more difficult to distribute
      • Could use Docker multi-stage build docs.docker.com/develop/develop-images/multistage-build to reduce size
      • Singularity also has a multistage build feature: sylabs.io/guides/3.2/user-guide/definition_files.html#multi-stage-builds
    • May end up with same dependency issues within the container from different software requirements
    • Potentially more complex to test
    • Less re-useable for different, but related, work

Multiple smaller containers:

  • Advantages:
    • Individual components can be re-used for different, but related, work
    • Individual parts are smaller in size making them easier to distribute
    • Avoid dependency issues between different software packages
    • Easier to test
  • Disadvantages:
    • More difficult to document
    • Potentially more difficult to maintain (though could be easier if working with large, distributed group)
    • May end up with dependency issues between component containers if they get out of sync

Container Orchestration

Although you can certainly manage research workflows that use multiple containers manually, there are a number of container orchestration tools that you may find useful when managing workflows that use multiple containers. We won’t go in depth on using these tools in this lesson but instead briefly describe a few options and point to useful resources on using these tools to allow you to explore them yourself.

The Wild West

Use of container orchestration tools for research workflows is a relatively new concept and so there is not a huge amount of documentation and experience out there at the moment. You may need to search around for useful information or, better still, contact your friendly neighbourhood RSE to discuss what you want to do.

Docker Compose provides a way of constructing a unified workflow (or service) made up of multiple individual Docker containers. In addition to the individual Dockerfiles for each container, you provide a higher-level configuration file which describes the different containers and how they link together along with shared storage definitions between the containers. Once this high-level configuration has been defined, you can use single commands to start and stop the orchestrated set of containers.

Kubernetes is an open source framework that provides similar functionality to Docker Compose. Its particular strengths are that it is platform independent, that it can be used with many different container technologies, and that it is widely available on cloud platforms, so once you have implemented your workflow in Kubernetes it can be deployed in different locations as required. It has become the de facto standard for container orchestration.

Docker Swarm provides a way to scale out to multiple copies of similar containers. This potentially allows you to parallelise and scale out your research workflow so that you can run multiple copies and increase throughput. This would allow you, for example, to take advantage of multiple cores on a local system or run your workflow in the cloud to access more resources. Docker Swarm uses the concept of manager nodes and worker nodes to implement this distribution.
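
To give a flavour of the higher-level configuration file that Docker Compose expects, here is a minimal sketch that writes one into a scratch directory. The service names prepare and analyse, the ./data folder and the choice of commands are all hypothetical; only the two image names come from earlier in this lesson.

```shell
# Write a small Compose file describing two containers that share a folder.
compose_dir="$(mktemp -d)"

cat > "$compose_dir/docker-compose.yml" <<'EOF'
services:
  prepare:
    image: alice/alpine-python
    volumes:
      - ./data:/temp
    command: python3 --version
  analyse:
    image: alice/alpine-sum
    volumes:
      - ./data:/temp
    command: python3 /home/sum.py 1 2 3 4
    depends_on:
      - prepare
EOF

cat "$compose_dir/docker-compose.yml"
```

With a file like this in place, running docker compose up from that directory should start both containers and docker compose down should remove them. Treat it as a starting point for exploration, not a tested workflow.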

Key Points

  • Container images allow us to encapsulate the computation (and data) we have used in our research.

  • Using a service such as Docker Hub allows us to easily share computational work we have done.

  • Using container images along with a DOI service such as Zenodo allows us to capture our work and enables reproducibility.

  • Tools such as Docker Compose, Docker Swarm and Kubernetes allow us to describe how multiple containers work together.


Singularity: Getting started

Overview

Teaching: 30 min
Exercises: 20 min
Questions
  • What is Singularity and why might I want to use it?

Objectives
  • Understand what Singularity is and when you might want to use it.

  • Undertake your first run of a simple Singularity container.

The episodes in this lesson will introduce you to the Singularity container platform and demonstrate how to set up and use Singularity.

This material is split into 2 parts:

Part I: Basic usage, working with images

  1. Singularity: Getting started: This introductory episode
  2. Working with Singularity containers: Going into a little more detail about Singularity containers and how to work with them

Part II: Creating images, running parallel codes

  1. Building Singularity images: Explaining how to build and share your own Singularity images
  2. Running MPI parallel jobs using Singularity containers: Explaining how to run MPI parallel codes from within Singularity containers

Work in progress…

This lesson is new material that is under ongoing development. We will introduce Singularity and demonstrate how to work with it. As the tools and best practices continue to develop, elements of this material are likely to evolve. We welcome any comments or suggestions on how the material can be improved or extended.

Singularity - Part I

What is Singularity?

Singularity is another container platform. In some ways it appears similar to Docker from a user perspective, but in others, particularly in the system’s architecture, it is fundamentally different. These differences mean that Singularity is particularly well-suited to running on distributed, High Performance Computing (HPC) infrastructure, as well as a Linux laptop or desktop!

System administrators will not, generally, install Docker on shared computing platforms such as lab desktops, research clusters or HPC platforms because the design of Docker presents potential security issues for shared platforms with multiple users. Singularity, on the other hand, can be run by end-users entirely within “user space”, that is, no special administrative privileges need to be assigned to a user in order for them to run and interact with containers on a platform where Singularity has been installed.

Getting started with Singularity

Initially developed within the research community, Singularity is open source and the repository is currently available in the GitHub organisation “The Next Generation of High Performance Computing”. Part I of this Singularity material is intended to be undertaken on a remote platform where Singularity has been pre-installed.

If you’re attending a taught version of this course, you will be provided with access details for a remote platform made available to you for use for Part I of the Singularity material. This platform will have the Singularity software pre-installed.

Installing Singularity on your own laptop/desktop

If you have a Linux system on which you have administrator access and you would like to install Singularity locally on this system, some information is provided in Part II of the Singularity material.

Check that Singularity is available

Sign in to the remote platform, with Singularity installed, that you’ve been provided with access to. Check that the singularity command is available in your terminal:

$ singularity --version
singularity version 3.7.0

Depending on the version of Singularity installed on your system, you may see a different version. At the time of writing, v3.7.0 is the latest release of Singularity.

Singularity on HPC systems: Loading a module

HPC systems often use modules to provide access to software on the system. If you get a command not found error (e.g. bash: singularity: command not found or similar) you may need to load the singularity module before you can use the singularity command:

$ module load singularity
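If you would rather make this check from a script than by hand, the following sketch may help. It assumes that your site provides a module named singularity (the name may differ) and that the module command is defined in the shell running the script, which is not always the case for non-interactive shells. The have_command helper is a name introduced here for illustration only.

```shell
#!/bin/sh
# Return 0 if the named command (or shell function) can be found.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command singularity; then
    # Already on the PATH: just report the version.
    singularity --version
elif have_command module && module load singularity 2>/dev/null; then
    # The module name "singularity" is an assumption; check `module avail` on your system.
    singularity --version
else
    echo "singularity is not available in this shell" >&2
fi
```

If the final message is printed, ask your system administrator how Singularity is provided on the platform you are using.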

Images and containers

We’ll start with a brief note on the terminology used in this section of the course. We refer to both images and containers. What is the distinction between these two terms?

Images are bundles of files including an operating system, software and potentially data and other application-related files. They may sometimes be referred to as a disk image or container image and they may be stored in different ways, perhaps as a single file, or as a group of files. Either way, we refer to this file, or collection of files, as an image.

A container is a virtual environment that is based on an image. That is, the files, applications, tools, etc that are available within a running container are determined by the image that the container is started from. It may be possible to start multiple container instances from an image. You could, perhaps, consider an image to be a form of template from which running container instances can be started.

Getting an image and running a Singularity container

If you recall from learning about Docker, Docker images are formed of a set of layers that make up the complete image. When you pull a Docker image from Docker Hub, you see the different layers being downloaded to your system. They are stored in your local Docker repository on your system and you can see details of the available images using the docker command.

Singularity images are a little different. Singularity uses the Singularity Image Format (SIF) and images are provided as single SIF files. Singularity images can be pulled from Singularity Hub, a registry for container images. Singularity is also capable of running containers based on images pulled from Docker Hub and some other sources. We’ll look at accessing containers from Docker Hub later in the Singularity material.

Singularity Hub

Note that in addition to providing a repository that you can pull images from, Singularity Hub can also build Singularity images for you from a recipe - a configuration file defining the steps to build an image. We’ll look at recipes and building images later.

Let’s begin by creating a test directory, changing into it and pulling a test Hello World image from Singularity Hub:

$ mkdir test
$ cd test
$ singularity pull hello-world.sif shub://vsoch/hello-world
INFO:    Downloading shub image
 59.75 MiB / 59.75 MiB [===============================================================================================================] 100.00% 52.03 MiB/s 1s

What just happened?! We pulled a SIF image from Singularity Hub using the singularity pull command and directed it to store the image file using the name hello-world.sif. If you run the ls command, you should see that the hello-world.sif file is now in your current directory. This is our image and we can now run a container based on this image:

$ singularity run hello-world.sif
RaawwWWWWWRRRR!! Avocado!

The above command ran the hello-world container from the image we downloaded from Singularity Hub and the resulting output was shown.

How did the container determine what to do when we ran it?! What did running the container actually do to result in the displayed output?

When you run a container from an image without using any additional command line arguments, the container runs the default run script that is embedded within the image. This is a shell script that can be used to run commands, tools or applications stored within the image on container startup. We can inspect the image’s run script using the singularity inspect command:

$ singularity inspect -r hello-world.sif
#!/bin/sh 

exec /bin/bash /rawr.sh

This shows us the script within the hello-world.sif image configured to run by default when we use the singularity run command.
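To see what a run script like this does, here is a minimal sketch that imitates it outside of any container. A temporary directory stands in for the image contents, and the rawr.sh created here is a hypothetical stand-in for the real /rawr.sh, whose contents we have not inspected. The sketch uses /bin/sh rather than /bin/bash so that it runs on systems without bash.

```shell
#!/bin/sh
# A temporary directory stands in for the image contents (this is NOT a real container).
workdir=$(mktemp -d)

# Hypothetical stand-in for /rawr.sh inside the image.
cat > "$workdir/rawr.sh" <<'EOF'
#!/bin/sh
echo "RaawwWWWWWRRRR!! Avocado!"
EOF

# Stand-in for the run script: exec replaces the run script's process
# with the target script instead of starting it as a child.
cat > "$workdir/runscript" <<EOF
#!/bin/sh
exec /bin/sh "$workdir/rawr.sh"
EOF

# "Running the container" then amounts to executing the run script.
run_output=$(/bin/sh "$workdir/runscript")
echo "$run_output"

rm -rf "$workdir"
```

The point to take away is that the run script is an ordinary shell script: anything you could put in a shell script can be used as the default behaviour of an image.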

That concludes this introductory Singularity episode. The next episode looks in more detail at running containers.

Key Points

  • Singularity is another container platform and it is often used in cluster/HPC/research environments.

  • Singularity has a different security model to other container platforms, one of the key reasons that it is well suited to HPC and cluster environments.

  • Singularity has its own container image format (SIF).

  • The singularity command can be used to pull images from Singularity Hub and run a container from an image file.


Break

Overview

Teaching: min
Exercises: min
Questions
Objectives

Comfort break

Key Points


Working with Singularity containers

Overview

Teaching: 30 min
Exercises: 25 min
Questions
  • How do I run a shell or different commands within a container?

  • Where does Singularity store images?

Objectives
  • Learn about Singularity’s image cache.

  • Understand how to run different commands when starting a container and open an interactive shell within a container environment.

  • Learn more about how Singularity handles users and binds directories from the host filesystem.

  • Learn how to run Singularity containers based on Docker images.

Singularity’s image cache

While Singularity doesn’t have a local image repository in the same way as Docker, it does cache downloaded image files. As we saw in the previous episode, images are simply .sif files stored on your local disk.

If you delete a local .sif image that you have pulled from a remote image repository and then pull it again, if the image is unchanged from the version you previously pulled, you will be given a copy of the image file from your local cache rather than the image being downloaded again from the remote source. This removes unnecessary network transfers and is particularly useful for large images which may take some time to transfer over the network. To demonstrate this, remove the hello-world.sif file stored in your test directory and then issue the pull command again:

$ rm hello-world.sif
$ singularity pull hello-world.sif shub://vsoch/hello-world
INFO:    Use image from cache

As we can see in the above output, the image has been returned from the cache and we don’t see the output that we saw previously showing the image being downloaded from Singularity Hub.

How do we know what is stored in the local cache? We can find out using the singularity cache command:

$ singularity cache list
There are 1 container file(s) using 62.65 MB and 0 oci blob file(s) using 0.00 kB of space
Total space used: 62.65 MB

This tells us how many container files are stored in the cache and how much disk space the cache is using but it doesn’t tell us what is actually being stored. To find out more information we can add the -v verbose flag to the list command:

$ singularity cache list -v
NAME                     DATE CREATED           SIZE             TYPE
hello-world_latest.sif   2020-04-03 13:20:44    62.65 MB         shub

There are 1 container file(s) using 62.65 MB and 0 oci blob file(s) using 0.00 kB of space
Total space used: 62.65 MB

This provides us with some more useful information about the actual images stored in the cache. In the TYPE column we can see that our image type is shub because it’s a SIF image that has been pulled from Singularity Hub.

Cleaning the Singularity image cache

We can remove images from the cache using the singularity cache clean command. Running the command without any options will display a warning and ask you to confirm that you want to remove everything from your cache.

You can also remove specific images or all images of a particular type. Look at the output of singularity cache clean --help for more information.

Basic exercise: Clearing specific image types from the cache

What command would you use to remove only images of type shub from your local Singularity image cache?

How could you test this safely to ensure your command is going to do the right thing?

Solution

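To remove only the images of type shub:

$ singularity cache clean --type=shub

To test this safely, add the -n option to perform a dry run, which reports what would be removed without deleting anything:

$ singularity cache clean -n --type=shub
Removing /<cache_dir>/.singularity/cache/shub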

Working with containers

Running specific commands within a container

We saw earlier that we can use the singularity inspect command to see the run script that a container is configured to run by default. What if we want to run a different command within a container, or we want to open a shell within a container that we can interact with?

If we know the path of an executable that we want to run within a container, we can use the singularity exec command. For example, using the hello-world.sif container that we’ve already pulled from Singularity Hub, we can run the following within the test directory where the hello-world.sif file is located:

$ singularity exec hello-world.sif /bin/echo Hello World!
Hello World!

Here we see that a container has been started from the hello-world.sif image and the /bin/echo command has been run within the container, passing the input Hello World!. The command has echoed the provided input to the console and the container has terminated.
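If you find yourself running several commands against the same image, a small wrapper function can save some typing. The sketch below is one possible approach, not part of Singularity itself: run_in_image is a hypothetical helper name, and the fallback message is only there so that you can see what would be run on a system where Singularity or the image file is missing.

```shell
#!/bin/sh
# Usage: run_in_image IMAGE COMMAND [ARGS...]
run_in_image() {
    image=$1
    shift
    if command -v singularity >/dev/null 2>&1 && [ -f "$image" ]; then
        # Start a container from the image and run the given command inside it.
        singularity exec "$image" "$@"
    else
        # Singularity or the image is not available: show the command instead.
        echo "would run: singularity exec $image $*"
    fi
}

run_in_image hello-world.sif /bin/ls /
run_in_image hello-world.sif /bin/echo "Hello from the container"
```

Each call starts a new container, runs the command and then terminates the container, just as in the echo example above.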

Basic exercise: Running a different command within the “hello-world” container

Can you run a container based on the hello-world.sif image that prints the current date and time?

Solution

$ singularity exec hello-world.sif /bin/date
Fri Jun 26 15:17:44 BST 2020

Running a shell within a container

If you want to open an interactive shell within a container, Singularity provides the singularity shell command. Again, using the hello-world.sif image, and within our test directory, we can run a shell within a container from the hello-world image:

$ singularity shell hello-world.sif
Singularity> whoami
[<your username>]
Singularity> ls
hello-world.sif
Singularity> 

As shown above, we have opened a shell in a new container started from the hello-world.sif image.

Running a shell inside a Singularity container

Q: What do you notice about the output of the above commands entered within the Singularity container shell?

Q: Does this differ from what you might see within a Docker container?

Use the exit command to exit from the container shell.

Users, files and directories within a Singularity container

The first thing to note is that when you run whoami within the container you should see the username that you are signed in as on the host system when you run the container. For example, if my username is jc1000:

$ singularity shell hello-world.sif
Singularity> whoami
jc1000

But hang on! I downloaded the standard, public version of the hello-world.sif image from Singularity Hub. I haven’t customised it in any way. How is it configured with my own user details?!

If you have any familiarity with Linux system administration, you may be aware that in Linux, users and their Unix groups are configured in the /etc/passwd and /etc/group files respectively. In order for the shell within the container to know of my user, the relevant user information needs to be available within these files within the container.

Assuming this feature is enabled on your system, when the container is started, Singularity appends the relevant user and group lines from the host system to the /etc/passwd and /etc/group files within the container [1].
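One way to see this for yourself is to compare your entry in /etc/passwd on the host with the one inside a container. The following is a sketch under a few assumptions: that hello-world.sif is in the current directory, that grep is available inside the image, and that your account is defined in the local /etc/passwd file (on many clusters accounts come from a directory service such as LDAP, in which case the host check prints nothing useful).

```shell
#!/bin/sh
user=$(id -un)
image=hello-world.sif   # the image pulled earlier in this lesson

# Entry on the host, if the account is defined in the local passwd file.
echo "Host entry:"
grep "^${user}:" /etc/passwd || echo "(no local entry for ${user})"

if command -v singularity >/dev/null 2>&1 && [ -f "$image" ]; then
    # Entry inside the container, added by Singularity at startup if the feature is enabled.
    echo "Container entry:"
    singularity exec "$image" grep "^${user}:" /etc/passwd \
        || echo "(no entry for ${user} inside the container)"
else
    echo "Skipping container check: singularity or ${image} not available"
fi
```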

Singularity also binds some directories from the host system where you are running the singularity command into the container that you’re starting. Note that this bind process isn’t copying files into the running container, it is simply making an existing directory on the host system visible and accessible within the container environment. If you write files to this directory within the running container, when the container shuts down, those changes will persist in the relevant location on the host system.

There is a default configuration of which files and directories are bound into the container but ultimate control of how things are set up on the system where you’re running Singularity is determined by the system administrator. As a result, this section provides an overview but you may find that things are a little different on the system that you’re running on.

One directory that is likely to be accessible within a container that you start is your home directory. The mapping of file content and directories from a host system into a Singularity container is illustrated in the example below showing a subset of the directories on the host Linux system and in a Singularity container:

Host system:                                                      Singularity container:
-------------                                                     ----------------------
/                                                                 /
├── bin                                                           ├── bin
├── etc                                                           ├── etc
│   ├── ...                                                       │   ├── ...
│   ├── group  ─> user's group added to group file in container ─>│   ├── group
│   └── passwd ──> user info added to passwd file in container ──>│   └── passwd
├── home                                                          ├── usr
│   └── jc1000 ───> user home directory made available ──> ─┐     ├── sbin
├── usr                 in container via bind mount         │     ├── home
├── sbin                                                    └────────>└── jc1000
└── ...                                                           └── ...

Questions and exercises: Files in Singularity containers

Q1: What do you notice about the ownership of files in a container started from the hello-world image? (e.g. take a look at the ownership of files in the root directory (/))

Exercise 1: In this container, try editing (for example using the editor vi which should be available in the container) the /rawr.sh file. What do you notice?

If you’re not familiar with vi there are many quick reference pages online showing the main commands for using the editor.

Exercise 2: In your home directory within the container shell, try and create a simple text file. Is it possible to do this? If so, why? If not, why not?! If you can successfully create a file, what happens to it when you exit the shell and the container shuts down?

Answers

A1: Use the ls -l command to see a detailed file listing including file ownership and permission details. You may see that all the files are owned by you; alternatively, most files in the root (/) directory may be owned by the root user. If the files are owned by you, this looks good - you should be ready to edit something in the exercise that follows… Otherwise, if the files are owned by root, maybe not…

A Ex1: Unfortunately, it’s not so easy. Depending on how you tried to edit /rawr.sh, you probably saw an error similar to the following: Can't open file for writing or Read-only file system.

A Ex2: Within your home directory, you should be able to successfully create a file. Since you’re seeing your home directory on the host system which has been bound into the container, when you exit and the container shuts down, the file that you created within the container should still be present when you look at your home directory on the host system.

Using Docker images with Singularity

Singularity can also start containers from Docker images, opening up access to a huge number of existing container images available on Docker Hub and other registries.

While Singularity doesn’t support running Docker images directly, it can pull them from Docker Hub and convert them into a suitable format for running via Singularity. When you pull a Docker image, Singularity pulls the slices or layers that make up the Docker image and converts them into a single-file Singularity SIF image.

For example, moving on from the simple Hello World examples that we’ve looked at so far, let’s pull one of the official Docker Python images. We’ll use the image with the tag 3.8.6-slim-buster which has Python 3.8.6 installed on Debian’s Buster (v10) Linux distribution:

$ singularity pull python-3.8.6.sif docker://python:3.8.6-slim-buster
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 852e50cd189d done
Copying blob 334ed303e4ad done
Copying blob a687a65725ea done
Copying blob fe607cb30fbe done
Copying blob b8a3bc0a3645 done
Copying config 08d8e312de done
Writing manifest to image destination
Storing signatures
2020/12/07 18:36:18  info unpack layer: sha256:852e50cd189dfeb54d97680d9fa6bed21a6d7d18cfb56d6abfe2de9d7f173795
2020/12/07 18:36:19  info unpack layer: sha256:334ed303e4ad2f8dc872f2e845d79012ad648eaced444e009ae9a397cc4b4dbb
2020/12/07 18:36:19  info unpack layer: sha256:a687a65725ea883366a61d24db0f946ad384aea893297d9510e50fa13f565539
2020/12/07 18:36:19  info unpack layer: sha256:fe607cb30fbe1148b5885d58c909d0c08cbf2c0848cc871845112f3ee0a0f9ba
2020/12/07 18:36:19  info unpack layer: sha256:b8a3bc0a3645e2afcd8807830833a0df0bd243d58d518e17b2335342e2614bd3
INFO:    Creating SIF file...
INFO:    Build complete: python-3.8.6.sif

Note how we see Singularity saying that it’s “Converting OCI blobs to SIF format”. We then see the layers of the Docker image being downloaded and unpacked and written into a single SIF file. Once the process is complete, we should see the python-3.8.6.sif image file in the current directory.
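The docker:// address used in the pull command is made up of the Docker Hub repository name and the image tag. As a rough sketch of how the pieces fit together, the helper functions below (docker_uri and sif_name, both names invented for this example) build the address and a matching local file name. The file naming scheme is just one possible choice and differs from the shorter python-3.8.6.sif name used above. The pull command is only printed here, not run.

```shell
#!/bin/sh
# Build the docker:// address from a repository name and a tag.
docker_uri() {
    printf 'docker://%s:%s\n' "$1" "$2"
}

# Derive a local SIF file name from the repository name and tag (a naming choice, not a rule).
sif_name() {
    printf '%s-%s.sif\n' "$(basename "$1")" "$2"
}

repo=python
tag=3.8.6-slim-buster
uri=$(docker_uri "$repo" "$tag")
target=$(sif_name "$repo" "$tag")

# Print the command; remove the leading echo to actually pull the image.
echo singularity pull "$target" "$uri"
```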

We can now run a container from this image as we would with any other Singularity image.

Running the Python 3.8.6 image that we just pulled from Docker Hub

Try running the Python 3.8.6 image. What happens?

Try running some simple Python statements…

Running the Python 3.8.6 image

$ singularity run python-3.8.6.sif

This should put you straight into a Python interactive shell within the running container:

Python 3.8.6 (default, Nov 25 2020, 02:47:44) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Now try running some simple Python statements:

>>> import math
>>> math.pi
3.141592653589793
>>> 

In addition to running a container and having it run the default run script, you could also start a container running a shell in case you want to undertake any configuration prior to running Python. This is covered in the following exercise:

Open a shell within a Python container

Try to run a shell within a Singularity container based on the python-3.8.6.sif image. That is, run a container that opens a shell rather than the default Python interactive console as we saw above. Can you find more than one way to achieve this?

Within the shell, try starting the Python interactive console and running some Python commands.

Solution

Recall from the earlier material that we can use the singularity shell command to open a shell within a container. To open a regular shell within a container based on the python-3.8.6.sif image, we can therefore simply run:

$ singularity shell python-3.8.6.sif
Singularity> echo $SHELL
/bin/bash
Singularity> cat /etc/issue
Debian GNU/Linux 10 \n \l

Singularity> exit
$ 

It is also possible to use the singularity exec command to run an executable within a container. We could, therefore, use the exec command to run /bin/bash:

$ singularity exec python-3.8.6.sif /bin/bash
Singularity> echo $SHELL
/bin/bash

You can run the Python console from your container shell simply by running the python command.

This concludes the second episode and Part I of the Singularity material. Part II contains a further two episodes where we’ll look at creating your own images and then at more advanced use of containers for running MPI parallel applications.

References

[1] Gregory M. Kurtzer, Containers for Science, Reproducibility and Mobility: Singularity P2. Intel HPC Developer Conference, 2017. Available at: https://www.intel.com/content/dam/www/public/us/en/documents/presentation/hpc-containers-singularity-advanced.pdf

Key Points

  • Singularity caches downloaded images so that an image isn’t downloaded again when it is requested using the singularity pull command.

  • The singularity exec and singularity shell commands provide different options for starting containers.

  • Singularity can start a container from a Docker image which can be pulled directly from Docker Hub.


Lunch

Overview

Teaching: min
Exercises: min
Questions
Objectives

Lunch break

Key Points


Building Singularity images

Overview

Teaching: 30 min
Exercises: 30 min
Questions
  • How do I create my own Singularity images?

Objectives
  • Understand the different Singularity container file formats.

  • Understand how to build and share your own Singularity containers.

Singularity - Part II

Brief recap

In the first two episodes of this Singularity material we’ve seen how Singularity can be used on a computing platform where you don’t have any administrative privileges. The software was pre-installed and it was possible to work with existing images such as Singularity image files already stored on the platform or images obtained from a remote image repository such as Singularity Hub or Docker Hub.

It is clear that between Singularity Hub and Docker Hub there is a huge array of images available but what if you want to create your own images or customise existing images?

In this first of two episodes in Part II of the Singularity material, we’ll look at building Singularity images.

Preparing to use Singularity for building images

So far you’ve been able to work with Singularity from your own user account as a non-privileged user. This part of the Singularity material requires that you use Singularity in an environment where you have administrative (root) access. While it is possible to build Singularity containers without root access, it is highly recommended that you do this as the root user, as highlighted in this section of the Singularity documentation. Bear in mind that the system that you use to build containers doesn’t have to be the system where you intend to run the containers. If, for example, you are intending to build a container that you can subsequently run on a Linux-based cluster, you could build the container on your own Linux-based desktop or laptop computer. You could then transfer the built image directly to the target platform or upload it to an image repository and pull it onto the target platform from this repository.

There are three different options for accessing a suitable environment to undertake the material in this part of the course:

  1. Run Singularity from within a Docker container - this will enable you to have the required privileges to build images
  2. Install Singularity locally on a system where you have administrative access
  3. Use Singularity on a system where it is already pre-installed and you have administrative (root) access

We’ll focus on the first option in this part of the course. If you would like to install Singularity directly on your system, see the box below for some further pointers. Note that the installation process is an advanced task that is beyond the scope of this course so we won’t be covering this.

Installing Singularity on your local system (optional) [Advanced task]

If you are running Linux and would like to install Singularity locally on your system, Singularity provide the free, open source Singularity Community Edition. You will need to install various dependencies on your system and then build Singularity from source code.

If you are not familiar with building applications from source code, it is strongly recommended that you use the Docker Singularity image, as described below in the “Getting started with the Docker Singularity image” section rather than attempting to build and install Singularity yourself. The installation process is an advanced task that is beyond the scope of this session.

However, if you have Linux systems knowledge and would like to attempt a local install of Singularity, you can find details in the INSTALL.md file within the Singularity repository that explains how to install the prerequisites and build and install the software. Singularity is written in the Go programming language and Go is the main dependency that you’ll need to install on your system. The process of installing Go and any other requirements is detailed in the INSTALL.md file.

Note

If you do not have access to a system with Docker installed, or a Linux system where you can build and install Singularity but you have administrative privileges on another system, you could look at installing a virtualisation tool such as VirtualBox on which you could run a Linux Virtual Machine (VM) image. Within the Linux VM image, you will be able to install Singularity. Again this is beyond the scope of the course.

If you are not able to access/run Singularity yourself on a system where you have administrative privileges, you can still follow through this material as it is being taught (or read through it in your own time if you’re not participating in a taught version of the course) since it will be helpful to have an understanding of how Singularity images can be built.

You could also attempt to follow this section of the lesson without using root and instead using the singularity command’s --fakeroot option. However, you may encounter issues with permissions when trying to build images and run your containers and this is why running the commands as root is strongly recommended and is the approach described in this lesson.

Getting started with the Docker Singularity image

The Singularity Docker image is available from Quay.io.

Familiarise yourself with the Docker Singularity image

  • Using your previously acquired Docker knowledge, get the Singularity image for v3.7.0 and ensure that you can run a Docker container using this image. You might want to use the v3.7.0-slim version of this image since it is significantly smaller than the standard image - the slim version of the image will be used in the examples below.

  • Create a directory (e.g. $HOME/singularity_data) on your host machine that you can use for storage of definition files (we’ll introduce these shortly) and generated image files.

    This directory should be bind mounted into the Docker container at the location /home/singularity every time you run it - this will give you a location in which to store built images so that they are available on the host system once the container exits. (take a look at the -v switch)

Note: To be able to build an image using the Docker Singularity container, you’ll probably need to add the --privileged switch to your docker command line.

Questions:

  • What is happening when you run the container?
  • Can you run an interactive shell in the container?

Running the image

Having a bound directory from the host system accessible within your running Singularity container will give you somewhere to place created images so that they are accessible on the host system after the container exits. Begin by changing into the directory that you created above for storing your definition files and built images (e.g. $HOME/singularity_data).

You may choose to:

  • open a shell within the Docker image so you can work at a command prompt and run the singularity command directly
  • use the docker run command to run a new container instance every time you want to run the singularity command.

Either option is fine for this section of the material.

Some examples:

To run the singularity command within the Docker container directly from the host system’s terminal:

docker run -it --privileged --rm -v ${PWD}:/home/singularity quay.io/singularity/singularity:v3.7.0-slim cache list

To start a shell within the Singularity Docker container where the singularity command can be run directly:

docker run -it --entrypoint=/bin/sh --privileged --rm -v ${PWD}:/home/singularity quay.io/singularity/singularity:v3.7.0-slim

To make things easier to read in the remainder of the material, command examples will use the singularity command directly, e.g. singularity cache list. If you’re running a shell in the Docker container, you can enter the commands as they appear. If you’re using the container’s default run behaviour and running a container instance for each run of the command, you’ll need to replace singularity with docker run --privileged -v ${PWD}:/home/singularity quay.io/singularity/singularity:v3.7.0-slim or similar.
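If you are taking the second approach, typing the full docker run command each time quickly becomes tedious. One option is to wrap it in a shell function, as in the sketch below. The function names singularity_in_docker and show_singularity_in_docker and the SINGULARITY_DOCKER_IMAGE variable are invented for this example, and the --rm switch is included so that each container is removed once the command finishes.

```shell
#!/bin/sh
# Image used throughout this episode.
SINGULARITY_DOCKER_IMAGE=quay.io/singularity/singularity:v3.7.0-slim

# Print the docker command line that the wrapper would run, without running it.
show_singularity_in_docker() {
    echo "docker run --privileged --rm -v ${PWD}:/home/singularity ${SINGULARITY_DOCKER_IMAGE} $*"
}

# Run singularity inside a fresh Docker container, with the current
# directory bind mounted at /home/singularity.
singularity_in_docker() {
    docker run --privileged --rm -v "${PWD}:/home/singularity" \
        "${SINGULARITY_DOCKER_IMAGE}" "$@"
}

# Check what would be run before using the wrapper for real.
show_singularity_in_docker cache list
```

With the function defined in your shell, singularity_in_docker cache list can then be used wherever the material shows singularity cache list.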

Building Singularity images

Introduction

As a platform that is widely used in the scientific/research software and HPC communities, Singularity provides great support for reproducibility. If you build a Singularity container for some scientific software, it’s likely that you and/or others will want to be able to reproduce exactly the same environment again. Maybe you want to verify the results of the code or provide a means that others can use to verify the results to support a paper or report. Maybe you’re making a tool available to others and want to ensure that they have exactly the right version/configuration of the code.

Similarly to Docker and many other modern software tools, Singularity follows the “Configuration as code” approach and a container configuration can be stored in a file which can then be committed to your version control system alongside other code. Assuming it is suitably configured, this file can then be used by you or other individuals (or by automated build tools) to reproduce a container with the same configuration at some point in the future.

Different approaches to building images

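There are various approaches to building Singularity images. We highlight two different approaches here and focus on one of them:

  • Building within a sandbox: you build the image interactively, opening a shell in a writable container environment and installing and configuring software by hand.

  • Building from a Singularity Definition File: you describe the build steps in a text file from which the image is built. This is the approach we focus on in this section.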

You can take a look at Singularity’s “Build a Container” documentation for more details on different approaches to building containers.

Why look at Singularity Definition Files?

Why do you think we might be looking at the definition file approach here rather than the sandbox approach?

Discussion

The sandbox approach is great for prototyping and testing out an image configuration but it doesn’t provide the best support for our ultimate goal of reproducibility. If you spend time sitting at your terminal in front of a shell typing different commands to add configuration, maybe you realise you made a mistake so you undo one piece of configuration and change it. This goes on until you have your completed configuration but there’s no explicit record of exactly what you did to create that configuration.

Say your container image file gets deleted by accident, or someone else wants to create an equivalent image to test something. How will they do this and know for sure that they have the same configuration that you had? With a definition file, the configuration steps are explicitly defined and can be easily stored, for example within a version control system, and re-run.

Definition files are small text files while container files may be very large, multi-gigabyte files that are difficult and time consuming to move around. This makes definition files ideal for storing in a version control system along with their revisions.

Creating a Singularity Definition File

A Singularity Definition File is a text file that contains a series of statements that are used to create a container image. In line with the configuration as code approach mentioned above, the definition file can be stored in your code repository alongside your application code and used to create a reproducible image. This means that for a given commit in your repository, the version of the definition file present at that commit can be used to reproduce a container with a known state. It was pointed out earlier in the course, when covering Docker, that this property also applies for Dockerfiles.

We’ll now look at a very simple example of a definition file:

Bootstrap: docker
From: ubuntu:20.04

%post
    apt-get -y update && apt-get install -y python3

%runscript
    python3 -c 'print("Hello World! Hello from our custom Singularity image!")'

A definition file has a number of optional sections, specified using the % prefix, that are used to define or undertake different configuration during different stages of the image build process. You can find full details in Singularity’s Definition Files documentation. In our very simple example here, we only use the %post and %runscript sections.

Let’s step through this definition file and look at the lines in more detail:

Bootstrap: docker
From: ubuntu:20.04

These first two lines define where to bootstrap our image from. Why can’t we just put some application binaries into a blank image? Any applications or tools that we want to run will need to interact with standard system libraries and potentially a wide range of other libraries and tools. These need to be available within the image and we therefore need some sort of operating system as the basis for our image. The most straightforward way to achieve this is to start from an existing base image containing an operating system. In this case, we’re going to start from a minimal Ubuntu 20.04 Linux Docker image. Note that we’re using a Docker image as the basis for creating a Singularity image. This demonstrates the flexibility in being able to start from different types of images when creating a new Singularity image.

The Bootstrap: docker line is similar to prefixing an image path with docker:// when using, for example, the singularity pull command. A range of different bootstrap options are supported. From: ubuntu:20.04 says that we want to use the ubuntu image with the tag 20.04.

Next we have the %post section of the definition file:

%post
    apt-get -y update && apt-get install -y python3

In this section of the file we can do tasks such as package installation, pulling data files from remote locations and undertaking local configuration within the image. The commands that appear in this section are standard shell commands and they are run within the context of our new container image. So, in the case of this example, these commands are being run within the context of a minimal Ubuntu 20.04 image that initially has only a very small set of core packages installed.

Here we use Ubuntu’s package manager to update our package indexes and then install the python3 package along with any required dependencies (in Ubuntu 20.04, the python3 package installs Python 3.8.5). The -y switches automatically answer “yes” to any interactive prompts that would otherwise ask you to confirm package updates or installation. This is required because our definition file should be able to run in an unattended, non-interactive environment.

Finally we have the %runscript section:

%runscript
    python3 -c 'print("Hello World! Hello from our custom Singularity image!")'

This section is used to define a script that should be run when a container is started based on this image using the singularity run command. In this simple example we use python3 to print out some text to the console.
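If you prefer to create the file from the command line rather than in a text editor, a here-document is one option. The sketch below writes the definition file discussed above to my_test_image.def; when working through the Docker Singularity container, you would run it in the host directory that is bind mounted to /home/singularity.

```shell
# Write the example definition file to my_test_image.def.
# Quoting 'EOF' stops the shell from expanding anything inside the file.
cat > my_test_image.def <<'EOF'
Bootstrap: docker
From: ubuntu:20.04

%post
    apt-get -y update && apt-get install -y python3

%runscript
    python3 -c 'print("Hello World! Hello from our custom Singularity image!")'
EOF

# Quick check: list the section headers that were written
grep '^%' my_test_image.def
```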

We can now save the contents of the simple definition file shown above to a file and build an image based on it. In the case of this example, the definition file has been named my_test_image.def. (Note that the instructions here assume you’ve bound the image output directory you created to the /home/singularity directory in your Docker Singularity container):

$ singularity build /home/singularity/my_test_image.sif /home/singularity/my_test_image.def

Recall from the details at the start of this section that if you are running your command from the host system command line, running an instance of a Docker container for each run of the command, your command will look something like this:

$ docker run -it --privileged --rm -v ${PWD}:/home/singularity quay.io/singularity/singularity:v3.7.0-slim build /home/singularity/my_test_image.sif /home/singularity/my_test_image.def

The above command requests the building of an image based on the my_test_image.def file with the resulting image saved to the my_test_image.sif file. Note that you will need to prefix the command with sudo if you’re running a locally installed version of Singularity and not running via Docker because it is necessary to have administrative privileges to build the image. You should see output similar to the following:

INFO:    Starting build...
Getting image source signatures
Copying blob da7391352a9b done  
Copying blob 14428a6d4bcd done  
Copying blob 2c2d948710f2 done  
Copying config aa23411143 done  
Writing manifest to image destination
Storing signatures
2020/12/08 09:15:18  info unpack layer: sha256:da7391352a9bb76b292a568c066aa4c3cbae8d494e6a3c68e3c596d34f7c75f8
2020/12/08 09:15:19  info unpack layer: sha256:14428a6d4bcdba49a64127900a0691fb00a3f329aced25eb77e3b65646638f8d
2020/12/08 09:15:19  info unpack layer: sha256:2c2d948710f21ad82dce71743b1654b45acb5c059cf5c19da491582cef6f2601
INFO:    Running post scriptlet
+ apt-get -y update
Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
...
  [Package update output truncated]
...
Fetched 16.6 MB in 3s (6050 kB/s)
Reading package lists...
+ apt-get install -y python3
Reading package lists...
...
  [Package install output truncated]
...
Processing triggers for libc-bin (2.31-0ubuntu9.1) ...
INFO:    Adding runscript
INFO:    Creating SIF file...
INFO:    Build complete: my_test_image.sif
$ 

You should now have a my_test_image.sif file in the current directory. Note that in your version of the above output, after it says INFO: Starting build... you may see a series of skipped: already exists messages for the Copying blob lines. This happens when the Docker image layers for the Ubuntu 20.04 image have previously been downloaded and are cached on the system where this example is being run. On your system, if the image is not already cached, you will see the layers being downloaded from Docker Hub when these lines of output appear.

Permissions of the created image file

You may find that the created Singularity image file on your host filesystem is owned by the root user and not your user. In this case, you won’t be able to change the ownership/permissions of the file directly if you don’t have root access.

However, the image file will be readable by you and you should be able to take a copy of the file under a new name which you will then own. You will then be able to modify the permissions of this copy of the image and delete the original root-owned file since the default permissions should allow this.
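The copy-and-delete approach described above can be sketched as a small shell function. The name reclaim_image and the .mine suffix are hypothetical, and the final step of moving the copy back to the original file name is an optional extra that the text above does not require.

```shell
# reclaim_image: hypothetical helper illustrating the copy-and-replace
# approach. Usage: reclaim_image my_test_image.sif
reclaim_image() {
    image="$1"
    copy="${image}.mine"

    cp "${image}" "${copy}" || return 1   # the copy is owned by you
    rm -f "${image}" || return 1          # relies on write access to the directory
    mv "${copy}" "${image}"               # optional: restore the original name
}
```

Running ls -l on the image before and after calling the function should show the owner change from root to your own user.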

Testing your Singularity image

In a moment we’ll test the created image on our HPC platform but, first, you should be able to run a shell in an instance of the Docker Singularity container and run your Singularity image there.

Run the Singularity image you’ve created

Can you run the Singularity image you’ve just built from a shell within the Docker Singularity container?

Solution

$ docker run -it --entrypoint=/bin/sh --privileged --rm -v ${PWD}:/home/singularity quay.io/singularity/singularity:v3.7.0-slim
/ # cd /home/singularity
/home/singularity # singularity run my_test_image.sif
Hello World! Hello from our custom Singularity image!
/home/singularity # 

Using singularity run from within the Docker container

It is strongly recommended that you don’t use the Docker container for running Singularity images in any production setting, only for creating them, since the Singularity command runs within the container as the root user.

However, for the purposes of this simple example, the Docker Singularity container provides an ideal environment to test that you have successfully built your container.

Now we’ll test our image on an HPC platform. Move your created .sif image file to a platform with an installation of Singularity. You could, for example, do this using the command line secure copy command scp. For example, the following command would copy my_test_image.sif to the remote server identified by <target hostname> (don’t forget the colon at the end of the hostname!):

$ scp -i <full path to SSH key file> my_test_image.sif <target hostname>:

You could provide a destination path for the file straight after the colon at the end of the above command (without a space), but by default, the file will be uploaded to your home directory.

Try to run the container on the login node of the HPC platform and check that you get the expected output.

More advanced definition files

Here we’ve looked at a very simple example of how to create an image. At this stage, you might want to have a go at creating your own definition file for some code of your own or an application that you work with regularly. There are several definition file sections that were not used in the above example, including %setup, %files, %environment, %startscript, %test, %labels and %help.

The Sections part of the definition file documentation details all the sections and provides an example definition file that makes use of all the sections.
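As a rough illustration, the sketch below extends the earlier example with three of the additional sections: %environment, %labels and %help. The file name my_test_image_extended.def and the label values are placeholders chosen for this example, not recommendations.

```shell
# Write an extended version of the example definition file.
cat > my_test_image_extended.def <<'EOF'
Bootstrap: docker
From: ubuntu:20.04

%environment
    # Applied when a container starts, not during the build
    export LC_ALL=C

%post
    apt-get -y update && apt-get install -y python3

%runscript
    python3 -c 'print("Hello World! Hello from our custom Singularity image!")'

%labels
    Author your-name-here
    Version v0.0.1

%help
    Prints a greeting using python3. Run with: singularity run <image>
EOF

# Count the section headers that were written
grep -c '^%' my_test_image_extended.def
```

Once an image has been built from a file like this, singularity run-help and singularity inspect can be used to display the help text and labels respectively.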

Additional Singularity features

Singularity has a wide range of features. You can find full details in the Singularity User Guide and we highlight a couple of key features here that may be of use/interest:

Remote Builder Capabilities: If you have access to a platform with Singularity installed but you don’t have root access to create containers, you may be able to use the Remote Builder functionality to offload the process of building an image to remote cloud resources. You’ll need to register for a cloud token via the link on the Remote Builder page.

Signing containers: If you do want to share container image (.sif) files directly with colleagues or collaborators, how can the people you send an image to be sure that they have received the file without it being tampered with or suffering from corruption during transfer/storage? And how can you be sure that the same goes for any container image file you receive from others? Singularity supports signing containers. This allows a digital signature to be linked to an image file. This signature can be used to verify that an image file has been signed by the holder of a specific key and that the file is unchanged from when it was signed. You can find full details of how to use this functionality in the Singularity documentation on Signing and Verifying Containers.
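For the signing workflow, a minimal sketch using the singularity sign and singularity verify commands might look like the following. The helper name sign_and_verify is hypothetical, and the sketch assumes that you have already created a signing key (for example, interactively with singularity key newpair).

```shell
# sign_and_verify: hypothetical helper; assumes a signing key already exists.
# Usage: sign_and_verify my_test_image.sif
sign_and_verify() {
    image="$1"
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity is not installed on this system" >&2
        return 127
    fi
    # Attach a signature to the image, then check that it verifies
    singularity sign "${image}" && singularity verify "${image}"
}
```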

Key Points

  • Singularity definition files are used to define the build process and configuration for an image.

  • Singularity’s Docker container provides a way to build images on a platform where Singularity is not installed but Docker is available.

  • Existing images from remote registries such as Docker Hub and Singularity Hub can be used as a base for creating new Singularity images.


Running MPI parallel jobs using Singularity containers

Overview

Teaching: 30 min
Exercises: 30 min
Questions
  • How do I set up and run an MPI job from a Singularity container?

Objectives
  • Learn how MPI applications within Singularity containers can be run on HPC platforms

  • Understand the challenges and related performance implications when running MPI jobs via Singularity

Running MPI parallel codes with Singularity containers

MPI overview

MPI - Message Passing Interface - is a widely used standard for parallel programming. It is used for exchanging messages/data between processes in a parallel application. If you’ve been involved in developing or working with computational science software, you may already be familiar with MPI and running MPI applications.

When working with an MPI code on a large-scale cluster, a common approach is to compile the code yourself, within your own user directory on the cluster platform, building against the supported MPI implementation on the cluster. Alternatively, if the code is widely used on the cluster, the platform administrators may build and package the application as a module so that it is easily accessible by all users of the cluster.

MPI codes with Singularity containers

We’ve already seen that building Singularity containers can be impractical without root access. Since we’re highly unlikely to have root access on a large institutional, regional or national cluster, building a container directly on the target platform is not normally an option.

If our target platform uses OpenMPI, one of the two widely used open source MPI implementations, we can build/install a compatible OpenMPI version on our local build platform, or directly within the image as part of the image build process. We can then build our code that requires MPI, either interactively in an image sandbox or via a definition file.

If the target platform uses a version of MPI based on MPICH, the other widely used open source MPI implementation, there is ABI compatibility between MPICH and several other MPI implementations. In this case, you can build MPICH and your code on a local platform, within an image sandbox or as part of the image build process via a definition file, and you should be able to successfully run containers based on this image on your target cluster platform.

As described in Singularity’s MPI documentation, support for both OpenMPI and MPICH is provided. Instructions are given for building the relevant MPI version from source via a definition file and we’ll see this used in an example below.

While building a container on a local system that is intended for use on a remote HPC platform does provide some level of portability, if you’re after the best possible performance, it can present some issues. The version of MPI in the container will need to be built and configured to support the hardware on your target platform if the best possible performance is to be achieved. Where a platform has specialist hardware with proprietary drivers, building on a different platform with different hardware present means that building with the right driver support for optimal performance is not likely to be possible. This is especially true if the version of MPI available is different (but compatible).

Singularity’s MPI documentation highlights two different models for working with MPI codes. The hybrid model that we’ll be looking at here involves using the MPI executable from the MPI installation on the host system to launch singularity and run the application within the container. The application in the container is linked against and uses the MPI installation within the container which, in turn, communicates with the MPI daemon process running on the host system.

In the following section we’ll look at building a Singularity image containing a small MPI application that can then be run using the hybrid model.

Building and running a Singularity image for an MPI code

Building and testing an image

This example makes the assumption that you’ll be building a container image on a local platform and then deploying it to a cluster with a different but compatible MPI implementation. See Singularity and MPI applications in the Singularity documentation for further information on how this works.

We’ll build an image from a definition file. Containers based on this image will be able to run MPI benchmarks using the OSU Micro-Benchmarks software.

In this example, the target platform is a remote HPC cluster that uses MPICH. The container can be built via the Singularity Docker image that we used in the previous episode of the Singularity material.

Begin by creating a directory and, within that directory, downloading and saving the “tarballs” for version 5.6.3 of the OSU Micro-Benchmarks from the OSU Micro-Benchmarks page and for MPICH version 3.3.2 from the MPICH downloads page.
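Before moving on, it can be worth confirming that both archives are in place, since the build will fail at the %files stage if either is missing. The following sketch assumes the default download file names used in the definition file below; missing_tarballs is a hypothetical helper name.

```shell
# missing_tarballs: hypothetical helper that prints the name of each
# expected source archive that is absent from the current directory.
missing_tarballs() {
    for tarball in osu-micro-benchmarks-5.6.3.tar.gz mpich-3.3.2.tar.gz; do
        [ -f "${tarball}" ] || echo "${tarball}"
    done
}

# Prints nothing if both files are present
missing_tarballs
```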

In the same directory, save the following definition file content to a .def file, e.g. osu_benchmarks.def:

Bootstrap: docker
From: ubuntu:20.04

%files
    /home/singularity/osu-micro-benchmarks-5.6.3.tar.gz /root/
    /home/singularity/mpich-3.3.2.tar.gz /root/

%environment
    export SINGULARITY_MPICH_DIR=/usr

%post
    apt-get -y update && DEBIAN_FRONTEND=noninteractive apt-get -y install build-essential libfabric-dev libibverbs-dev gfortran
    cd /root
    tar zxvf mpich-3.3.2.tar.gz && cd mpich-3.3.2
    echo "Configuring and building MPICH..."
    ./configure --prefix=/usr --with-device=ch3:nemesis:ofi && make -j2 && make install
    cd /root
    tar zxvf osu-micro-benchmarks-5.6.3.tar.gz
    cd osu-micro-benchmarks-5.6.3/
    echo "Configuring and building OSU Micro-Benchmarks..."
    ./configure --prefix=/usr/local/osu CC=/usr/bin/mpicc CXX=/usr/bin/mpicxx
    make -j2 && make install

%runscript
    echo "Rank ${PMI_RANK} - About to run: /usr/local/osu/libexec/osu-micro-benchmarks/mpi/$*"
    exec /usr/local/osu/libexec/osu-micro-benchmarks/mpi/$*

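A quick overview of what the above definition file is doing:

  • The image is bootstrapped from the ubuntu:20.04 Docker image.

  • In the %files section, the OSU Micro-Benchmarks and MPICH tarballs are copied from /home/singularity into the /root directory in the image.

  • In the %environment section, an environment variable is set that will be available within all containers run using this image.

  • In the %post section, apt-get is used to update the package indexes and install the compilers and libraries required for the build. The MPICH tarball is then extracted, configured, built and installed under /usr, and the same steps are carried out for the OSU Micro-Benchmarks, which are installed under /usr/local/osu.

  • In the %runscript section, a run script is defined that echoes the rank number of the current process and then runs the benchmark named on the command line.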

Note that the base path of the executable to run is hardcoded in the run script, so the command line parameter to provide when running a container based on this image is relative to this base path, for example, startup/osu_hello, collective/osu_allgather, pt2pt/osu_latency, one-sided/osu_put_latency.
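To see how the run script turns its argument into a full path, the following sketch mimics the path construction outside of any container. The function resolve_benchmark is a hypothetical stand-in that only prints the path; the real run script goes on to exec it.

```shell
# Base path hardcoded in the image's run script
OSU_BASE=/usr/local/osu/libexec/osu-micro-benchmarks/mpi

# resolve_benchmark: hypothetical helper mirroring how the run script
# appends its arguments ($*) to the base path.
resolve_benchmark() {
    echo "${OSU_BASE}/$*"
}

resolve_benchmark startup/osu_hello
resolve_benchmark pt2pt/osu_latency
```

Because the run script uses $*, anything that follows the benchmark name on the command line is appended after it, separated by spaces.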

Build and test the OSU Micro-Benchmarks image

Using the above definition file, build a Singularity image named osu_benchmarks.sif.

Once you have built the image, use it to run the osu_hello benchmark that is found in the startup benchmark folder.

NOTE: If you’re not using the Singularity Docker image to build your Singularity image, you will need to edit the paths to the .tar.gz files in the %files section of the definition file.

Solution

You should be able to build an image from the definition file as follows:

$ singularity build osu_benchmarks.sif osu_benchmarks.def

Note that if you’re running the Singularity Docker container directly from the command line to undertake your build, you’ll need to provide the full path to the .def file at which it appears within the container - for example, if you’ve bind mounted the directory containing the file to /home/singularity within the container, the full path to the .def file will be /home/singularity/osu_benchmarks.def.

Assuming the image builds successfully, you can then try running the container locally and also transfer the SIF file to a cluster platform that you have access to (that has Singularity installed) and run it there.

Let’s begin with a single-process run of osu_hello on the local system to ensure that we can run the container as expected:

$ singularity run osu_benchmarks.sif startup/osu_hello

You should see output similar to the following:

Rank  - About to run: /usr/local/osu/libexec/osu-micro-benchmarks/mpi/startup/osu_hello
# OSU MPI Hello World Test v5.6.2
This is a test with 1 processes

Note that no rank number is shown since we didn’t run the container via mpirun and so the ${PMI_RANK} environment variable that we’d normally have set in an MPICH run process is not set.
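The effect of the unset variable can be reproduced in an ordinary shell. In this sketch, rank_banner is a hypothetical stand-in for the echo line in the run script, and the value 0 simulates what an MPICH launcher would set for the first process.

```shell
# rank_banner: hypothetical stand-in for the echo line in the run script
rank_banner() {
    echo "Rank ${PMI_RANK} - About to run: $*"
}

# Without an MPI launcher, PMI_RANK is unset and expands to nothing
(unset PMI_RANK; rank_banner startup/osu_hello)

# Simulating the value a launcher would provide for the first process
(PMI_RANK=0; rank_banner startup/osu_hello)
```

Each call runs in a subshell so that the change to PMI_RANK does not leak into your current shell session.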

Running Singularity containers via MPI

Assuming the above tests worked, we can now try undertaking a parallel run of one of the OSU benchmarking tools within our container image.

This is where things get interesting and we’ll begin by looking at how Singularity containers are run within an MPI environment.

If you’re familiar with running MPI codes, you’ll know that you use mpirun, mpiexec or a similar MPI executable to start your application. This executable may be run directly on the local system or cluster platform that you’re using, or you may need to run it through a job script submitted to a job scheduler. Your MPI-based application code, which will be linked against the MPI libraries, will make MPI API calls into these MPI libraries which in turn talk to the MPI daemon process running on the host system. This daemon process handles the communication between MPI processes, including talking to the daemons on other nodes to exchange information between processes running on different machines, as necessary.

When running code within a Singularity container, we don’t use the MPI executables stored within the container (i.e. we DO NOT run singularity exec mpirun -np <numprocs> /path/to/my/executable). Instead we use the MPI installation on the host system to run Singularity and start an instance of our executable from within a container for each MPI process. Without Singularity support in an MPI implementation, this results in starting a separate Singularity container instance within each process. This can present some overhead if a large number of processes are being run on a host. Where Singularity support is built into an MPI implementation this can address this potential issue and reduce the overhead of running code from within a container as part of an MPI job.

Ultimately, this means that our running MPI code is linking to the MPI libraries from the MPI install within our container and these are, in turn, communicating with the MPI daemon on the host system which is part of the host system’s MPI installation. These two installations of MPI may be different but as long as there is ABI compatibility between the version of MPI installed in your container image and the version on the host system, your job should run successfully.

We can now try a 2-process MPI run of the point-to-point benchmark osu_latency. If your local system has both MPI and Singularity installed and has multiple cores, you can run this test on that system. Alternatively you can run on a cluster. Note that you may need to submit this command via a job submission script submitted to a job scheduler if you’re running on a cluster. If you’re attending a taught version of this course, some information will be provided below in relation to the cluster that you’ve been provided with access to.

Undertake a parallel run of the osu_latency benchmark (general example)

Move the osu_benchmarks.sif Singularity image onto the cluster (or other suitable) platform where you’re going to undertake your benchmark run.

You should be able to run the benchmark using a command similar to the one shown below. However, if you are running on a cluster, you may need to write and submit a job submission script at this point to initiate running of the benchmark.

$ mpirun -np 2 singularity run osu_benchmarks.sif pt2pt/osu_latency

Expected output and discussion

As you can see in the mpirun command shown above, we have called mpirun on the host system and are passing to MPI the singularity executable for which the parameters are the image file and any parameters we want to pass to the image’s run script, in this case the path/name of the benchmark executable to run.

The following shows an example of the output you should expect to see. You should have latency values shown for message sizes up to 4MB.

Rank 1 - About to run: /.../mpi/pt2pt/osu_latency
Rank 0 - About to run: /.../mpi/pt2pt/osu_latency
# OSU MPI Latency Test v5.6.2
# Size          Latency (us)
0                       0.38
1                       0.34
...

Undertake a parallel run of the osu_latency benchmark (taught course cluster example)

This version of the exercise, a parallel run of the osu_latency benchmark using your Singularity container and its MPI build, is specific to this run of the course.

The information provided here is specifically tailored to the HPC platform that you’ve been given access to for this taught version of the course.

Move the osu_benchmarks.sif Singularity image onto the cluster where you’re going to undertake your benchmark run. You should use scp or a similar utility to copy the file.

The platform you’ve been provided with access to uses Slurm to schedule jobs. You now need to create a Slurm job submission script to run the benchmark.

Download this template script and edit it to suit your configuration.

Submit the modified job submission script to the Slurm scheduler using the sbatch command.

$ sbatch osu_latency.slurm
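The template script itself is not reproduced here. Purely as an illustration of the kind of content such a script might have, the sketch below writes a minimal Slurm script to a separate file, osu_latency_sketch.slurm. The job name, resource requests and time limit are assumptions, and most clusters will need additional site-specific settings such as a partition, an account or module loads.

```shell
# Write a minimal, illustrative Slurm job script (not the course template).
cat > osu_latency_sketch.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=osu_latency
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --time=00:10:00

# Site-specific setup (modules, partition, account) would go here.

mpirun -np 2 singularity run osu_benchmarks.sif pt2pt/osu_latency
EOF
```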

Expected output and discussion

As you will have seen in the commands using the provided template job submission script, we have called mpirun on the host system and are passing to MPI the singularity executable for which the parameters are the image file and any parameters we want to pass to the image’s run script. In this case, the parameters are the path/name of the benchmark executable to run.

The following shows an example of the output you should expect to see. You should have latency values shown for message sizes up to 4MB.

INFO:    Convert SIF file to sandbox...
INFO:    Convert SIF file to sandbox...
Rank 1 - About to run: /.../mpi/pt2pt/osu_latency
Rank 0 - About to run: /.../mpi/pt2pt/osu_latency
# OSU MPI Latency Test v5.6.2
# Size          Latency (us)
0                       1.49
1                       1.50
2                       1.50
...
4194304               915.44
INFO:    Cleaning up image...
INFO:    Cleaning up image...

This has demonstrated that we can successfully run a parallel MPI executable from within a Singularity container. However, in this case, the two processes will almost certainly have run on the same physical node so this is not testing the performance of the interconnects between nodes.

You could now try running a larger-scale test. You can also try running a benchmark that uses multiple processes, for example try collective/osu_gather.

Investigate performance when using a container image built on a local system and run on a cluster

To get an idea of any difference in performance between the code within your Singularity image and the same code built natively on the target HPC platform, try building the OSU benchmarks from source, locally on the cluster. Then try running the same benchmark(s) that you ran via the singularity container. Have a look at the outputs you get when running collective/osu_gather or one of the other collective benchmarks to get an idea of whether there is a performance difference and how significant it is.

Try running with enough processes that the processes are spread across different physical nodes so that you’re making use of the cluster’s network interconnects.
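When comparing the two sets of results, it may help to line them up by message size. The following is a minimal sketch, assuming you have saved the output of the container run and the native run to two text files in the two-column format shown earlier; compare_latency is a hypothetical helper, not part of the OSU Micro-Benchmarks.

```shell
# compare_latency: hypothetical helper.
# $1: saved output from the container run, $2: saved output from the native run.
# Prints "<size> <container value / native value>" for each size in both files.
compare_latency() {
    awk '
        /^#/ || NF != 2 { next }                 # skip headers and log lines
        NR == FNR { container[$1] = $2; next }   # first file: remember values
        ($1 in container) && $2 > 0 {            # second file: print the ratio
            printf "%s %.2f\n", $1, container[$1] / $2
        }
    ' "$1" "$2"
}
```

For example, compare_latency container.out native.out (file names of your choosing) prints one line per message size; a ratio above 1 means the container run reported the larger latency value for that size.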

What do you see?

Discussion

You may find that performance is significantly better with the version of the code built directly on the HPC platform. Alternatively, performance may be similar between the two versions.

How big is the performance difference between the two builds of the code?

What might account for any difference in performance between the two builds of the code?

If performance is an issue for you with codes that you’d like to run via Singularity, you are advised to take a look at using the bind model for building/running MPI applications through Singularity.

Singularity wrap-up

This concludes the 4 episodes of the course covering Singularity. We hope you found this information useful and that it has inspired you to use Singularity to help enhance the way you build/work with research software.

As a new set of material, we appreciate that there are likely to be improvements that can be made to enhance the quality of this material. We welcome your thoughts, suggestions and feedback on improvements that could be made to help others making use of these lessons.

Key Points

  • Singularity images containing MPI applications can be built on one platform and then run on another (e.g. an HPC cluster) if the two platforms have compatible MPI implementations.

  • When running an MPI application within a Singularity container, use the MPI executable on the host system to launch a Singularity container for each process.

  • Think about parallel application performance requirements and how where you build/run your image may affect that.