Reproducible Computational Environments Using Containers: Introduction to Docker and Singularity: Glossary

Key Points

Introducing Containers
  • Almost all software depends on other software components to function, but these components have independent evolutionary paths.

  • Small environments that contain only the software that is needed for a given task are easier to replicate and maintain.

  • Critical systems that cannot be upgraded, due to cost, difficulty, etc. need to be reproduced on newer systems in a maintainable and self-documented way.

  • Virtualization allows multiple environments to run on a single computer.

  • Containerization improves upon the virtualization of whole computers by allowing efficient management of the host computer’s memory and storage resources.

  • Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image.

  • Docker is just one software platform that can create containers and the resources they use.

Introducing the Docker Command Line
  • A toolbar icon indicates that Docker is ready to use (on Windows and macOS).

  • You will typically interact with Docker using the command line.

  • To learn how to run a certain Docker command, we can type the command followed by the --help flag.

Exploring and Running Containers
  • The docker image pull command downloads Docker container images from the internet.

  • The docker image ls command lists Docker container images that are (now) on your computer.

  • The docker container run command creates running containers from container images and can run commands inside them.

  • When using the docker container run command, a container can run a default action (if it has one), a user-specified action, or a shell to be used interactively.
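
The commands in this section might be combined as in the following sketch. The helper name try_image and the DOCKER override variable are assumptions made for illustration and are not part of Docker; the alpine and hello-world images in the usage comments are only examples.

```shell
# Docker client to call; set DOCKER=echo for a dry run (illustrative assumption).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: download an image, list local images, then run
# a command inside a new container created from that image.
try_image() {
    image="$1"
    shift
    "$DOCKER" image pull "$image" &&
    "$DOCKER" image ls &&
    "$DOCKER" container run "$image" "$@"
}

# Usage (needs Docker and network access):
#   try_image alpine cat /etc/os-release   # run a user-specified command
#   try_image hello-world                  # run the image's default action
```

For an interactive shell, the run step would instead look like docker container run -it alpine sh, since the -it flags must come before the image name.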

Cleaning Up Containers
  • docker container has subcommands used to interact with and manage containers.

  • docker image has subcommands used to interact with and manage container images.

  • docker container ls or docker ps can provide information on currently running containers.
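
A minimal sketch of a clean-up sequence follows. The helper name remove_container_and_image and the DOCKER override variable are assumptions for illustration; the container ID and image name are supplied by the caller.

```shell
# Docker client to call; set DOCKER=echo for a dry run (illustrative assumption).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: list all containers (running and stopped), remove one
# stopped container, then remove the image it was created from.
remove_container_and_image() {
    container="$1"
    image="$2"
    "$DOCKER" container ls --all &&
    "$DOCKER" container rm "$container" &&
    "$DOCKER" image rm "$image"
}

# Usage: remove_container_and_image CONTAINER_ID alpine
```

An image can only be removed once no container refers to it, which is why the container is removed first. docker container prune removes all stopped containers in one step.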

Finding Containers on Docker Hub
  • The Docker Hub is an online repository of container images.

  • Many Docker Hub container images are public, and may be officially endorsed.

  • Each Docker Hub page about a container image provides structured information and subheadings.

  • Most Docker Hub pages about container images contain sections that provide examples of how to use those container images.

  • Many Docker Hub container images have multiple versions, indicated by tags.

  • The naming convention for Docker container images is: OWNER/CONTAINER_IMAGE_NAME:TAG
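
To make the naming convention concrete, the sketch below splits a name into its three parts using shell parameter expansion. The function parse_image_name and the example name alice/alpine-python:v1 are hypothetical, and the sketch assumes exactly one "/" and no registry host or port in the name.

```shell
# Hypothetical helper: split a reference of the form
# OWNER/CONTAINER_IMAGE_NAME:TAG into its three parts.
parse_image_name() {
    ref="$1"
    owner="${ref%%/*}"        # text before the first "/"
    rest="${ref#*/}"          # text after the first "/"
    name="${rest%%:*}"        # text before the ":"
    case "$rest" in
        *:*) tag="${rest#*:}" ;;
        *)   tag="latest" ;;  # Docker uses "latest" when no tag is given
    esac
    echo "owner=$owner name=$name tag=$tag"
}

parse_image_name "alice/alpine-python:v1"
# prints: owner=alice name=alpine-python tag=v1
```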

Creating Your Own Container Images
  • Dockerfiles specify what is within Docker container images.

  • The docker image build command is used to build a container image from a Dockerfile.

  • You can share your Docker container images through the Docker Hub so that others can create Docker containers from your container images.
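
The build-and-share cycle could look roughly like this. The Dockerfile contents, the helper name build_and_share, and the DOCKER override variable are illustrative assumptions; USERNAME stands in for your own Docker Hub user ID.

```shell
# Docker client to call; set DOCKER=echo for a dry run (illustrative assumption).
DOCKER="${DOCKER:-docker}"

# Create an otherwise empty build context in a temporary directory.
context_dir="$(mktemp -d)"

# Write a minimal Dockerfile: start from Alpine Linux and install Python 3.
cat > "$context_dir/Dockerfile" <<'EOF'
FROM alpine
RUN apk add --update python3
CMD ["python3", "--version"]
EOF

# Hypothetical helper: build the image from the context directory under the
# given name, then upload it to Docker Hub (assumes a prior `docker login`).
build_and_share() {
    "$DOCKER" image build -t "$1" "$context_dir" &&
    "$DOCKER" image push "$1"
}

# Usage: build_and_share USERNAME/alpine-python:v1
```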

Creating More Complex Container Images
  • Docker allows containers to read files from, and write files to, the Docker host.

  • You can include files from your Docker host in your Docker container images by using the COPY instruction in your Dockerfile.
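
As a sketch of sharing host files with a running container, the helper below bind-mounts the current host directory into the container. The helper name run_with_mount, the DOCKER override variable, and the /temp mount point are assumptions chosen for this example.

```shell
# Docker client to call; set DOCKER=echo for a dry run (illustrative assumption).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: run a command in a container with the current host
# directory mounted at /temp inside the container.
run_with_mount() {
    image="$1"
    shift
    "$DOCKER" container run --mount type=bind,source="$PWD",target=/temp "$image" "$@"
}

# Usage: run_with_mount alpine ls /temp
#
# To bake a file into the image at build time instead, a Dockerfile line
# such as the following could be used (sum.py is a hypothetical file that
# must exist in the build context):
#   COPY sum.py /home
```

A bind mount gives the container live access to host files, whereas COPY stores a snapshot of the file inside the image.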

Examples of Using Container Images in Practice
  • There are many ways you might use Docker and existing container images in your research project.

Singularity: Getting started
  • Singularity is another container platform and it is often used in cluster/HPC/research environments.

  • Singularity has a different security model from other container platforms, which is one of the key reasons that it is well suited to HPC and cluster environments.

  • Singularity has its own container image format (SIF).

  • The singularity command can be used to pull images from Sylabs Cloud Library and run a container from an image file.

Using Singularity containers to run commands
  • The singularity exec command is an alternative to singularity run that allows you to start a container running a specific command.

  • The singularity shell command can be used to start a container and run an interactive shell within it.
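
The two commands might be wrapped as follows. The helper names and the SING_CMD override variable are assumptions for illustration, and hello-world.sif is a placeholder for an image file you have already pulled.

```shell
# Singularity client to call; set SING_CMD=echo for a dry run (assumption).
SING_CMD="${SING_CMD:-singularity}"

# Hypothetical helper: run a specific command inside a container started
# from a local image file.
exec_in_image() {
    image_file="$1"
    shift
    "$SING_CMD" exec "$image_file" "$@"
}

# Hypothetical helper: open an interactive shell inside a container.
shell_in_image() {
    "$SING_CMD" shell "$1"
}

# Usage:
#   exec_in_image hello-world.sif cat /etc/os-release
#   shell_in_image hello-world.sif
```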

Using Docker images with Singularity
  • Singularity can start a container from a Docker image which can be pulled directly from Docker Hub.
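
A minimal sketch of this workflow uses the docker:// prefix to tell Singularity to fetch the image from Docker Hub. The helper name pull_from_docker_hub and the SING_CMD override variable are assumptions for illustration.

```shell
# Singularity client to call; set SING_CMD=echo for a dry run (assumption).
SING_CMD="${SING_CMD:-singularity}"

# Hypothetical helper: download a Docker Hub image, convert it to a SIF file,
# and run a command in a container started from that file.
pull_from_docker_hub() {
    sif_file="$1"
    docker_image="$2"
    shift 2
    "$SING_CMD" pull "$sif_file" "docker://$docker_image" &&
    "$SING_CMD" exec "$sif_file" "$@"
}

# Usage: pull_from_docker_hub alpine.sif alpine cat /etc/os-release
```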

The Singularity cache
  • Singularity caches downloaded images so that an unchanged image isn’t downloaded again when it is requested using the singularity pull command.

  • You can free up space in the cache by removing all locally cached images or by specifying individual images to remove.
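
Inspecting and emptying the cache could be sketched as below. The helper names and the SING_CMD override variable are assumptions for illustration. The options for removing individual images differ between Singularity versions, so check singularity cache clean --help on your system rather than relying on this sketch.

```shell
# Singularity client to call; set SING_CMD=echo for a dry run (assumption).
SING_CMD="${SING_CMD:-singularity}"

# Hypothetical helper: show what is currently held in the cache.
show_cache() {
    "$SING_CMD" cache list
}

# Hypothetical helper: remove all cached images
# (some Singularity versions ask for confirmation first).
empty_cache() {
    "$SING_CMD" cache clean
}

# Usage:
#   show_cache
#   empty_cache
```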

Files in Singularity containers
  • Your current directory and home directory are usually available by default in a container.

  • You have the same username and permissions in a container as on the host system.

  • You can specify additional host system directories to be available in the container.
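
Making an extra host directory available might look like the sketch below, which uses the --bind option. The helper name bind_and_list and the SING_CMD override variable are assumptions; the directory and image file are supplied by the caller.

```shell
# Singularity client to call; set SING_CMD=echo for a dry run (assumption).
SING_CMD="${SING_CMD:-singularity}"

# Hypothetical helper: make a host directory available at the same path
# inside the container, then list its contents from within the container.
bind_and_list() {
    host_dir="$1"
    image_file="$2"
    "$SING_CMD" exec --bind "$host_dir" "$image_file" ls "$host_dir"
}

# Usage: bind_and_list /path/to/shared/data hello-world.sif
```

To mount the directory at a different location inside the container, --bind also accepts the form SOURCE:DESTINATION.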

Using Singularity to run BLAST+
  • We can use containers to run software without having to install it.

  • The commands we use are very similar to those we would use natively.

  • Singularity handles a lot of complexity around data and internet access for us.

Containers in Research Workflows: Reproducibility and Granularity
  • Container images allow us to encapsulate the computation (and data) we have used in our research.

  • Using online container image repositories allows us to easily share computational work we have done.

  • Using container images along with a DOI service such as Zenodo allows us to capture our work and enables reproducibility.

(Optional) Running MPI parallel jobs using Singularity containers
  • Singularity images containing MPI applications can be built on one platform and then run on another (e.g. an HPC cluster) if the two platforms have compatible MPI implementations.

  • When running an MPI application within a Singularity container, use the MPI executable on the host system to launch a Singularity container for each process.
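
The launch pattern described above, in which the host's MPI launcher starts one container per process, can be sketched as follows. The helper name run_mpi_in_container and the MPIRUN and SING_CMD override variables are assumptions; on many HPC systems the launcher is mpiexec or srun rather than mpirun, and its options may differ.

```shell
# Host MPI launcher and Singularity client (both overridable; assumptions).
MPIRUN="${MPIRUN:-mpirun}"
SING_CMD="${SING_CMD:-singularity}"

# Hypothetical helper: use the host's MPI launcher to start one Singularity
# container per MPI process, each running the application inside the image.
run_mpi_in_container() {
    nprocs="$1"
    image_file="$2"
    shift 2
    "$MPIRUN" -n "$nprocs" "$SING_CMD" exec "$image_file" "$@"
}

# Usage (image file and application path are placeholders):
#   run_mpi_in_container 4 mpi_app.sif /path/inside/container/app
```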

  • Think about the performance requirements of your parallel application and how the platform where you build or run your image may affect them.

(Optional) Additional topics and next steps
  • TBC

Glossary

Command-line argument/option
See the Carpentries Glossario entry
Command-line interface (CLI)
See the Carpentries Glossario entry
Container
A particular instance of a lightweight virtual machine derived from a container image. Containers are typically transient, unlike container images which persist.
Container image
The persistent binary artefact that encapsulates the set of files and configuration for running an instance of a container. Sometimes shortened to just image.
CPU/processor
See the Carpentries Glossario entry
Dependency
See the Carpentries Glossario entry
Dependency hell
A colloquial term for the frustration of some software users who run into issues with software packages which have dependencies on specific versions of other software packages. The dependency issue arises when several packages have dependencies on the same shared packages or libraries, but they depend on different and incompatible versions of the shared packages. If the shared package or library can only be installed in a single version, the user may need to address the problem by obtaining newer or older versions of the dependent packages. This, in turn, may break other dependencies and push the problem to another set of packages. Extract from Wikipedia
Digital object identifier (DOI)
See the Carpentries Glossario entry
Docker
A software framework for creating, running and managing containers.
Docker build context
The docker build command builds Docker images from a Dockerfile and a “context”. A build's context is the set of files located in the specified PATH or URL.
Docker Hub
An online library of Docker container images.
Docker Hub repository
A collection of related Docker container images hosted on Docker Hub.
Docker tag
The specific version identifier associated with a Docker container image.
Dockerfile
The file containing the instructions that, together with the Docker build context, are used to build a Docker container image.
Filesystem
See the Carpentries Glossario entry
Filesystem layer
Each container image is made up of multiple read-only filesystem layers that represent the file system differences from the layers below them in the image.
Hardware
See the Carpentries Glossario entry
Hard drive
The hardware in a computer that hosts the filesystem (or, sometimes, other storage types).
Host computer
The computer system which is running the container.
Memory/RAM
Random Access Memory (RAM) is where data the CPU is working with is temporarily stored.
Operating system (OS)
See the Carpentries Glossario entry
Reproducible research
See the Carpentries Glossario entry
Software library
See the Carpentries Glossario entry
Tar archive
A file archive format commonly used in Unix-like operating systems that combines multiple files into a single file. Tar archive files are used as the export format of Docker images.
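
As a small illustration of the format, the sketch below combines two files into one archive and lists its contents. The file names are arbitrary, and a temporary directory is used so that no existing files are touched.

```shell
# Create two small example files in a temporary directory.
work_dir="$(mktemp -d)"
echo "alpha" > "$work_dir/a.txt"
echo "beta" > "$work_dir/b.txt"

# Combine both files into a single archive (-C switches to the directory
# holding the files), then list the archive's contents.
tar -cf "$work_dir/files.tar" -C "$work_dir" a.txt b.txt
tar -tf "$work_dir/files.tar"
```

Docker writes this same kind of archive when an image is exported with docker image save -o FILE.tar IMAGE.
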
Virtualization
Containers are an example of virtualization – having a second “virtual” computer running and accessible from a host computer.