Glossary

Key Points

Introducing Containers	Almost all software depends on other software components to function, but these components have independent evolutionary paths. Small environments that contain only the software that is needed for a given task are easier to replicate and maintain. Critical systems that cannot be upgraded (due to cost, difficulty, etc.) need to be reproduced on newer systems in a maintainable and self-documented way. Virtualization allows multiple environments to run on a single computer. Containerization improves upon the virtualization of whole computers by allowing efficient management of the host computer’s memory and storage resources. Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image. Docker is just one software platform that can create containers and the resources they use.
Introducing the Docker Command Line	On Windows and macOS, a toolbar icon indicates that Docker is ready to use. You will typically interact with Docker using the command line. To learn how to use a particular Docker command, type the command followed by the `--help` flag.
Exploring and Running Containers	The `docker image pull` command downloads Docker container images from the internet. The `docker image ls` command lists Docker container images that are (now) on your computer. The `docker container run` command creates running containers from container images and can run commands inside them. When using the `docker container run` command, a container can run a default action (if it has one), a user specified action, or a shell to be used interactively.
Cleaning Up Containers	`docker container` has subcommands used to interact with and manage containers. `docker image` has subcommands used to interact with and manage container images. `docker container ls` or `docker ps` can provide information on currently running containers.
Finding Containers on Docker Hub	The Docker Hub is an online repository of container images. Many Docker Hub container images are public and may be officially endorsed. Each Docker Hub page about a container image provides structured information and subheadings. Most Docker Hub pages about container images contain sections that provide examples of how to use those container images. Many Docker Hub container images have multiple versions, indicated by tags. The naming convention for Docker container images is `OWNER/CONTAINER_IMAGE_NAME:TAG`.
Creating Your Own Container Images	`Dockerfile`s specify what is within Docker container images. The `docker image build` command is used to build a container image from a `Dockerfile`. You can share your Docker container images through the Docker Hub so that others can create Docker containers from your container images.
Creating More Complex Container Images	Docker allows containers to read and write files from the Docker host. You can include files from your Docker host into your Docker container images by using the `COPY` instruction in your `Dockerfile`.
Examples of Using Container Images in Practice	There are many ways you might use Docker and existing container images in your research project.
Singularity: Getting started	Singularity is another container platform, and it is often used in cluster/HPC/research environments. Singularity has a different security model from other container platforms, which is one of the key reasons that it is well suited to HPC and cluster environments. Singularity has its own container image format (SIF). The `singularity` command can be used to pull images from the Sylabs Cloud Library and to run a container from an image file.
Using Singularity containers to run commands	The `singularity exec` command is an alternative to `singularity run` that allows you to start a container running a specific command. The `singularity shell` command can be used to start a container and run an interactive shell within it.
Using Docker images with Singularity	Singularity can start a container from a Docker image which can be pulled directly from Docker Hub.
The Singularity cache	Singularity caches downloaded images so that an unchanged image isn’t downloaded again when it is requested using the `singularity pull` command. You can free up space in the cache by removing all locally cached images or by specifying individual images to remove.
Files in Singularity containers	Your current directory and home directory are usually available by default in a container. You have the same username and permissions in a container as on the host system. You can specify additional host system directories to be available in the container.
Using Singularity to run BLAST+	We can use containers to run software without having to install it. The commands we use are very similar to those we would use natively. Singularity handles a lot of complexity around data and internet access for us.
Containers in Research Workflows: Reproducibility and Granularity	Container images allow us to encapsulate the computation (and data) we have used in our research. Using online container image repositories allows us to easily share computational work we have done. Using container images along with a DOI service such as Zenodo allows us to capture our work and enables reproducibility.
(Optional) Running MPI parallel jobs using Singularity containers	Singularity images containing MPI applications can be built on one platform and then run on another (e.g. an HPC cluster) if the two platforms have compatible MPI implementations. When running an MPI application within a Singularity container, use the MPI executable on the host system to launch a Singularity container for each process. Think about the performance requirements of your parallel application and how the platform where you build and run your image may affect them.
(Optional) Additional topics and next steps	TBC
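The image naming convention listed under "Finding Containers on Docker Hub" (`OWNER/CONTAINER_IMAGE_NAME:TAG`) can be illustrated with a small Python sketch that splits a reference into its three parts. It assumes the Docker Hub defaults: a reference with no owner is treated as an official image in the `library` namespace, and a missing tag is taken to be `latest`. The function `parse_image_name` and the owner `alice` are hypothetical, and the sketch deliberately ignores registry hostnames and `@digest` suffixes, which real image references can also contain.

```python
from typing import NamedTuple


class ImageName(NamedTuple):
    owner: str
    name: str
    tag: str


def parse_image_name(ref: str) -> ImageName:
    """Split an OWNER/CONTAINER_IMAGE_NAME:TAG reference into its parts.

    Simplified sketch: assumes Docker Hub defaults (owner "library",
    tag "latest") and does not handle registry hosts or @digests.
    """
    # The tag follows the last colon, but only if that colon comes after
    # the final slash, so "owner/name" with no tag is handled correctly.
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:
        path, tag = ref[:last_colon], ref[last_colon + 1:]
    else:
        path, tag = ref, "latest"

    # References without an owner refer to official images in "library".
    if "/" in path:
        owner, name = path.split("/", 1)
    else:
        owner, name = "library", path
    return ImageName(owner, name, tag)


if __name__ == "__main__":
    # "alice/analysis" is a hypothetical user-owned image.
    for ref in ["alpine", "python:3.12", "alice/analysis:v1"]:
        print(ref, "->", parse_image_name(ref))
```

This is why, for example, `docker image pull alpine` and `docker image pull library/alpine:latest` refer to the same image on Docker Hub.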

Command-line argument/option: See the Carpentries Glossario entry
Command-line interface (CLI): See the Carpentries Glossario entry
Container: A particular instance of a lightweight virtual machine derived from a container image. Containers are typically transient, unlike container images which persist.
Container image: The persistent binary artefact that encapsulates the set of files and configuration for running an instance of a container. Sometimes shortened to just “image”.
CPU/processor: See the Carpentries Glossario entry
Dependency: See the Carpentries Glossario entry
Dependency hell: A colloquial term for the frustration of some software users who run into issues with software packages which have dependencies on specific versions of other software packages. The dependency issue arises when several packages have dependencies on the same shared packages or libraries, but they depend on different and incompatible versions of the shared packages. If the shared package or library can only be installed in a single version, the user may need to address the problem by obtaining newer or older versions of the dependent packages. This, in turn, may break other dependencies and push the problem to another set of packages. (Extract from Wikipedia.)
Digital object identifier (DOI): See the Carpentries Glossario entry
Docker: A software framework for creating, running and managing containers.
Docker build context: The `docker build` command builds Docker images from a `Dockerfile` and a “context”. A build’s context is the set of files located in the specified `PATH` or `URL`.
Docker Hub: An online library of Docker container images.
Docker Hub repository: A collection of related Docker container images hosted on Docker Hub.
Docker tag: The specific version identifier associated with a Docker container image.
Dockerfile: The file containing the instructions that, together with the Docker build context, are used to build a Docker container image.
Filesystem: See the Carpentries Glossario entry
Filesystem layer: Each container image is made up of multiple read-only filesystem layers, each of which represents the filesystem differences from the layers below it in the image.
Hardware: See the Carpentries Glossario entry
Hard drive: The hardware in a computer that hosts the filesystem (or, sometimes, other storage types).
Host computer: The computer system which is running the container.
Memory/RAM: Random Access Memory (RAM) is where data the CPU is working with is temporarily stored.
Operating system (OS): See the Carpentries Glossario entry
Reproducible research: See the Carpentries Glossario entry
Software library: See the Carpentries Glossario entry
Tar archive: A file archive format commonly used in Unix-like operating systems that combines multiple files into a single file. Tar archives are the format used when exporting Docker images, for example with `docker image save`.
Virtualization: Containers are an example of virtualization – having a second “virtual” computer running and accessible from a host computer.
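The "Filesystem layer" entry says each layer records only the differences from the layers below it. The following Python sketch is a conceptual model of that idea, not Docker's actual storage format: each hypothetical layer is a dictionary mapping a file path to its contents, with `None` standing for "this layer deletes the path". The function `merged_view` and all the paths and contents are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Conceptual model only: a layer maps a file path to its contents,
# or to None when that layer deletes the path.
Layer = Dict[str, Optional[str]]


def merged_view(layers: List[Layer]) -> Dict[str, str]:
    """Combine read-only layers, listed base layer first, into one view."""
    view: Dict[str, str] = {}
    for layer in layers:
        for path, contents in layer.items():
            if contents is None:
                view.pop(path, None)   # deletion recorded by an upper layer
            else:
                view[path] = contents  # addition or modification
    return view


# Hypothetical layers, roughly one per build step.
base: Layer = {"/etc/os-release": "base OS", "/bin/sh": "shell"}
install: Layer = {"/usr/bin/python3": "python", "/tmp/setup.log": "log"}
cleanup: Layer = {"/tmp/setup.log": None}
add_script: Layer = {"/app/run.py": "print('hello')"}

if __name__ == "__main__":
    print(sorted(merged_view([base, install, cleanup, add_script])))
    # ['/app/run.py', '/bin/sh', '/etc/os-release', '/usr/bin/python3']
```

One consequence the model makes visible: the `cleanup` layer hides `/tmp/setup.log` from the container's view, but the `install` layer underneath still contains it, which is why deleting files in a later layer does not make a Docker image smaller.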

Reproducible Computational Environments Using Containers: Introduction to Docker and Singularity: Glossary
