Welcome
Overview
Teaching: 15 min
Exercises: 0 min
Questions
What can I expect from this course?
How will the course work and how will I get help?
How can I give feedback to improve the course?
Objectives
Understand how this course works, how I can get help and how I can give feedback.
Code of Conduct
To make this as good a learning experience as possible for everyone involved we require all participants to adhere to the ARCHER2 Code of Conduct.
Any form of behaviour intended to exclude, intimidate, or cause discomfort is a violation of the Code of Conduct. To foster a positive and professional learning environment, we encourage the following kinds of behaviour throughout this course:
- Use welcoming and inclusive language
- Be respectful of different viewpoints and experiences
- Gracefully accept constructive criticism
- Focus on what is best for the course
- Show courtesy and respect towards other course participants
If you believe someone is violating the Code of Conduct, we ask that you report it to the ARCHER2 Training Code of Conduct Committee by completing this form. The committee will take the appropriate action to address the situation.
Course structure and method
Rather than having separate lectures and practical sessions, this course is taught following The Carpentries methodology, where we all work through the material together, learning key skills and information throughout the course. Typically, the instructor demonstrates a step and the attendees then follow along.
There are helpers available to assist you and to answer any questions you may have as we work through the material together. You should also feel free to ask questions of the instructor whenever you like. The instructor will also provide many opportunities to pause and ask questions.
We will also make use of a shared collaborative document - the etherpad. You will find a link to this collaborative document on the course page. We will use it for a number of different purposes, for example, it may be used during exercises and instructors and helpers may put useful information or links in the etherpad that help or expand on the material being taught. If you have useful information to share with the class then please do add it to the etherpad. At the end of the course, we take a copy of the information in the etherpad, remove any personally-identifiable information and post this on the course archive page so you should always be able to come back and find any information you found useful.
Feedback
Feedback is integral to how we approach training both during and after the course. In particular, we use informal feedback activities during the course to ensure we tailor the pace and content appropriately for the attendees and structured feedback after the course to help us improve our training for the future.
You will be given the opportunity to provide feedback on the course after it has finished. We welcome all feedback, both good and bad, as this information is key to allowing us to continually improve the training we offer.
Key Points
We should all understand and follow the ARCHER2 Code of Conduct to ensure this course is conducted in the best teaching environment.
The course will be flexible to best meet the learning needs of the attendees.
Feedback is an essential part of our training to allow us to continue to improve and make sure the course is as useful as possible to attendees.
Introducing Containers
Overview
Teaching: 20 min
Exercises: 0 min
Questions
What are containers, and why might they be useful to me?
Objectives
Show how software depending on other software leads to configuration management problems.
Identify the problems that software installation can pose for research.
Explain the advantages of containerization.
Explain how using containers can solve software configuration problems.
Disclaimers
- Docker is complex software used for many different purposes. We are unlikely to give examples that suit all of your potential ideal use-cases, but would be delighted to at least open up a discussion of what those use-cases might be.
- Containers are a topic that can require a significant amount of technical background to understand in detail. Most of the time containers, particularly as provided by Docker, do not require you to have a deep technical understanding of container technology in order to make use of them, but when things go wrong, the diagnostic messages may become difficult to understand.
Scientific Software Challenges
What’s Your Experience?
Take a minute to think about challenges that you have experienced in using scientific software (or software in general!) for your research.
You may have come up with some of the following:
- you want to use software that doesn’t exist for the operating system (Mac, Windows, Linux) you’d prefer.
- you struggle with installing a software tool because you have to install a number of other dependencies first. Those dependencies, in turn, require other things, and so on (i.e. combinatoric explosion).
- the software you’re setting up involves many dependencies and only a subset of all possible versions of those dependencies actually works as desired.
- you’re not actually sure what version of the software you’re using because the install process was so circuitous.
- you and a colleague are using the same software but get different results because you have installed different versions and/or are using different operating systems.
- you installed everything correctly on your computer but now need to install it on a colleague’s computer/campus computing cluster/etc.
- you’ve written a package for other people to use but a lot of your users frequently have trouble with installation.
- you need to reproduce a research project from a former colleague and the software used was on a system you no longer have access to.
A lot of these characteristics boil down to one fact: the main program you want to use likely depends on many, many, different other programs (including the operating system!), creating a very complex, and often fragile system. One change or missing piece may stop the whole thing from working or break something that was already running. It’s no surprise that this situation is sometimes informally termed “dependency hell”.
Software and Science
Again, take a minute to think about how the software challenges we’ve discussed could impact (or have impacted!) the quality of your work.
Unsurprisingly, software installation and configuration challenges can have negative consequences for research:
- you can’t use a specific tool at all, because it’s not available or installable.
- you can’t reproduce your results because you’re not sure what tools you’re actually using.
- you can’t access extra/newer resources because you’re not able to replicate your software set up.
- others cannot validate and/or build upon your work because they cannot recreate your system’s unique configuration.
Thankfully there are ways to get underneath (a lot of) this mess: containers to the rescue! Containers provide a way to package up software dependencies and access to resources such as files and communications networks in a uniform manner.
What is a Container? What is Docker?
Docker is a tool that allows you to build what are called “containers.” It’s not the only tool that can create containers, but is the one we’ve chosen for this workshop. But what is a container?
To understand containers, let’s first talk briefly about your computer.
Your computer has some standard pieces that allow it to work – often what’s called the hardware. One of these pieces is the CPU or processor; another is the amount of memory or RAM that your computer can use to store information temporarily while running programs; another is the hard drive, which can store information over the long-term. All these pieces work together to do the “computing” of a computer, but we don’t see them because they’re hidden from view (usually).
Instead, what we see is our desktop, program windows, different folders, and files. These all live in what’s called the filesystem. Everything on your computer – programs, pictures, documents, the operating system itself – lives somewhere in the filesystem.
Now, imagine you want to install some new software but don’t want to take the chance of making a mess of your existing system by installing a bunch of additional stuff (libraries/dependencies/etc.). You don’t want to buy a whole new computer because it’s too expensive. What if, instead, you could have another independent filesystem and running operating system that you could access from your main computer, and that is actually stored within this existing computer?
Or, imagine you have two tools you want to use in your groundbreaking research on cat memes: PurrLOLing, a tool that does AMAZINGLY well at predicting the best text for a meme based on the cat species and WhiskerSpot, the only tool available for identifying cat species from images. You want to send cat pictures to WhiskerSpot, and then send the species output to PurrLOLing. But there’s a problem: PurrLOLing only works on Ubuntu and WhiskerSpot is only supported for OpenSUSE so you can’t have them on the same system! Again, we really want another filesystem (or two) on our computer that we could use to chain together WhiskerSpot and PurrLOLing in a “pipeline”…
Container systems, like Docker, are special programs on your computer that make it possible! The term “container” can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time consuming and error prone, with high potential for different clients’ goods to become mixed up. Just like shipping containers keep things together that should stay together, software containers standardize the description and creation of a complete software system: you can drop a container into any computer with the container software installed (the ‘container host’), and it should “just work”.
Virtualization
Containers are an example of what’s called virtualization – having a second “virtual” computer running and accessible from a main or host computer. Virtual machines (VMs) are another example of virtualization. A virtual machine typically contains a whole copy of an operating system in addition to its own filesystem and has to be booted up in the same way a computer would. A container is considered a lightweight version of a virtual machine; underneath, the container is (usually) using the Linux kernel and simply has some flavour of Linux + the filesystem inside.
One final term: while the container is an alternative filesystem layer that you can access and run from your computer, the container image is the ‘recipe’ or template for a container. The container image has all the required information to start up a running copy of the container. A running container tends to be transient and can be started and shut down. The image is more long-lived, as a source file for the container. You could think of the container image like a cookie cutter – it can be used to create multiple copies of the same shape (or container) and is relatively unchanging, whereas cookies come and go. If you want a different type of container (cookie) you need a different image (cookie cutter).
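The cookie-cutter analogy can be sketched on the command line with the docker run command, which is covered later in this course. The following is only a minimal illustration and not part of the original lesson: the function name bake is hypothetical, and it assumes that Docker is running and that the alpine image can be downloaded from Docker Hub. It starts several short-lived containers from one unchanging image.

```shell
# bake IMAGE COUNT: start COUNT containers from the same image (hypothetical helper).
# Each container prints one line and is removed again (--rm) when it exits,
# while the image it was created from stays on disk unchanged.
bake() {
    image="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        docker run --rm "$image" echo "cookie $i cut from $image"
        i=$((i + 1))
    done
}

# Example usage (requires a running Docker installation):
#   bake alpine 3
```

Each call to docker run creates a new container (a cookie), but all of them come from the same image (the cookie cutter).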
Putting the Pieces Together
Think back to some of the challenges we described at the beginning. The many layers of scientific software installations make it hard to install and re-install scientific software – which ultimately, hinders reliability and reproducibility.
But now, think about what a container is – a self-contained, complete, separate computer filesystem. What advantages are there if you put your scientific software tools into containers?
This solves several of our problems:
- documentation – there is a clear record of what software and software dependencies were used, from bottom to top.
- portability – the container can be used on any computer that has Docker installed – it doesn’t matter whether the computer is Mac, Windows or Linux-based.
- reproducibility – you can use the exact same software and environment on your computer and on other resources (like a large-scale computing cluster).
- configurability – containers can be sized to take advantage of more resources (memory, CPU, etc.) on large systems (clusters) or less, depending on the circumstances.
The rest of this workshop will show you how to download and run pre-existing containers on your own computer, and how to create and share your own containers.
Use cases for containers
Now that we have discussed a little bit about containers – what they do and the issues they attempt to address – you may be able to think of a few potential use cases in your area of work. Some examples of common use cases for containers in a research context include:
- Using containers solely on your own computer to use a specific software tool or to test out a tool (possibly to avoid a difficult and complex installation process, to save your time or to avoid dependency hell).
- Setting up software in a container and then sharing it with your collaborators for use on their computers or a remote computing resource (e.g. cloud-based or HPC system).
- Archiving the container(s) so you can repeat analysis/modelling using the same software and configuration in the future – capturing your workflow.
Key Points
Almost all software depends on other software components to function, but these components have independent evolutionary paths.
Small environments that contain only the software that is needed for a given task are easier to replicate and maintain.
Critical systems that cannot be upgraded, due to cost, difficulty, etc. need to be reproduced on newer systems in a maintainable and self-documented way.
Virtualization allows multiple environments to run on a single computer.
Containerization improves upon the virtualization of whole computers by allowing efficient management of the host computer’s memory and storage resources.
Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image.
Docker is just one software platform that can create containers and the resources they use.
Introducing the Docker Command Line
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How do I know Docker is installed and running?
How do I interact with Docker?
Objectives
Explain how to check that Docker is installed and is ready to use.
Demonstrate some initial Docker command line interactions.
Use the built-in help for Docker commands.
Docker command line
Start the Docker application that you installed in working through the setup instructions for this session. Note that this might not be necessary if your laptop is running Linux or if the installation added the Docker application to your startup process.
You may need to log in to Docker Hub
The Docker application will usually provide a way for you to log in to the Docker Hub using the application’s menu (macOS) or systray icon (Windows) and it is usually convenient to do this when the application starts. This will require you to use your Docker Hub username and your password. We will not actually require access to the Docker Hub until later in the course but if you can log in now, you should do so.
Determining your Docker Hub username
If you no longer recall your Docker Hub username, e.g., because you have been logging into the Docker Hub using your email address, you can find out what it is through the following steps:
- Open https://hub.docker.com/ in a web browser window
- Sign in using your email and password (don’t tell us what it is)
- In the top-right of the screen you will see your username
Once your Docker application is running, open a shell (terminal) window, and run the following command to check that Docker is installed and the command line tools are working correctly. Below is the output for a Mac version, but the specific version is unlikely to matter much: it does not have to precisely match the one listed below.
$ docker --version
Docker version 20.10.5, build 55c4c88
The above command has not actually relied on the part of Docker that runs containers, just that Docker is installed and you can access it correctly from the command line.
A command that checks that Docker is working correctly is the docker container ls command (we cover this command in more detail later in the course).
Without explaining the details, output on a newly installed system would likely be:
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
(The command docker info will achieve a similar end but produces a larger amount of output.)
However, if you instead get a message similar to the following
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
then you need to check that you have started Docker Desktop, Docker Engine, or whichever Docker application you installed when working through the setup instructions.
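The two checks above can be combined into one small shell function. This is a minimal sketch rather than an official Docker tool: the function name docker_status and the three status words it prints are made up for illustration. Note also that docker container ls can fail for reasons other than a stopped daemon, for example insufficient permissions on the Docker socket.

```shell
# docker_status: print one word describing the state of the local Docker setup.
# (Hypothetical helper; the three status words are made up for this sketch.)
docker_status() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "client-missing"       # the docker command is not on the PATH
    elif ! docker container ls >/dev/null 2>&1; then
        echo "daemon-unreachable"   # client found, but the container list failed
    else
        echo "ready"                # client and daemon are both working
    fi
}

docker_status
```

If the function prints anything other than ready, revisit the setup instructions before continuing.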
Getting help
Often when working with a new command line tool, we need to get help. These tools often have some sort of subcommand or flag (usually help, -h, or --help) that displays a prompt describing how to use the tool. For Docker, it’s no different. If we run docker --help, we see the following output (running docker also produces the help message):
Usage: docker [OPTIONS] COMMAND
A self-sufficient runtime for containers
Options:
--config string Location of client config files (default "/Users/vini/.docker")
-c, --context string Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker context use")
-D, --debug Enable debug mode
-H, --host list Daemon socket(s) to connect to
-l, --log-level string Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
--tls Use TLS; implied by --tlsverify
--tlscacert string Trust certs signed only by this CA (default "/Users/vini/.docker/ca.pem")
--tlscert string Path to TLS certificate file (default "/Users/vini/.docker/cert.pem")
--tlskey string Path to TLS key file (default "/Users/vini/.docker/key.pem")
--tlsverify Use TLS and verify the remote
-v, --version Print version information and quit
Management Commands:
app* Docker App (Docker Inc., v0.9.1-beta3)
builder Manage builds
buildx* Build with BuildKit (Docker Inc., v0.5.1-docker)
config Manage Docker configs
container Manage containers
context Manage contexts
image Manage images
manifest Manage Docker image manifests and manifest lists
network Manage networks
node Manage Swarm nodes
plugin Manage plugins
scan* Docker Scan (Docker Inc., v0.6.0)
secret Manage Docker secrets
service Manage services
stack Manage Docker stacks
swarm Manage Swarm
system Manage Docker
trust Manage trust on Docker images
volume Manage volumes
Commands:
attach Attach local standard input, output, and error streams to a running container
build Build an image from a Dockerfile
commit Create a new image from a container's changes
cp Copy files/folders between a container and the local filesystem
create Create a new container
diff Inspect changes to files or directories on a container's filesystem
events Get real time events from the server
exec Run a command in a running container
export Export a container's filesystem as a tar archive
history Show the history of an image
images List images
import Import the contents from a tarball to create a filesystem image
info Display system-wide information
inspect Return low-level information on Docker objects
kill Kill one or more running containers
load Load an image from a tar archive or STDIN
login Log in to a Docker registry
logout Log out from a Docker registry
logs Fetch the logs of a container
pause Pause all processes within one or more containers
port List port mappings or a specific mapping for the container
ps List containers
pull Pull an image or a repository from a registry
push Push an image or a repository to a registry
rename Rename a container
restart Restart one or more containers
rm Remove one or more containers
rmi Remove one or more images
run Run a command in a new container
save Save one or more images to a tar archive (streamed to STDOUT by default)
search Search the Docker Hub for images
start Start one or more stopped containers
stats Display a live stream of container(s) resource usage statistics
stop Stop one or more running containers
tag Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
top Display the running processes of a container
unpause Unpause all processes within one or more containers
update Update configuration of one or more containers
version Show the Docker version information
wait Block until one or more containers stop, then print their exit codes
Run 'docker COMMAND --help' for more information on a command.
There is a list of commands and the end of the help message says: Run 'docker COMMAND --help' for more information on a command.
For example, take the docker container ls command that we ran previously. We can see from the Docker help prompt that container is a Docker command, so to get help for that command, we run:
docker container --help # or instead 'docker container'
Usage: docker container COMMAND
Manage containers
Commands:
attach Attach local standard input, output, and error streams to a running container
commit Create a new image from a container's changes
cp Copy files/folders between a container and the local filesystem
create Create a new container
diff Inspect changes to files or directories on a container's filesystem
exec Run a command in a running container
export Export a container's filesystem as a tar archive
inspect Display detailed information on one or more containers
kill Kill one or more running containers
logs Fetch the logs of a container
ls List containers
pause Pause all processes within one or more containers
port List port mappings or a specific mapping for the container
prune Remove all stopped containers
rename Rename a container
restart Restart one or more containers
rm Remove one or more containers
run Run a command in a new container
start Start one or more stopped containers
stats Display a live stream of container(s) resource usage statistics
stop Stop one or more running containers
top Display the running processes of a container
unpause Unpause all processes within one or more containers
update Update configuration of one or more containers
wait Block until one or more containers stop, then print their exit codes
Run 'docker container COMMAND --help' for more information on a command.
There’s also help for the container ls command:
docker container ls --help # this one actually requires the '--help' flag
Usage: docker container ls [OPTIONS]
List containers
Aliases:
ls, ps, list
Options:
-a, --all Show all containers (default shows just running)
-f, --filter filter Filter output based on conditions provided
--format string Pretty-print containers using a Go template
-n, --last int Show n last created containers (includes all states) (default -1)
-l, --latest Show the latest created container (includes all states)
--no-trunc Don't truncate output
-q, --quiet Only display container IDs
-s, --size Display total file sizes
You may notice that there are many commands that stem from the docker command. Instead of trying to remember all possible commands and options, it’s better to learn how to effectively get help from the command line. Although we can always search the web, getting the built-in help from our tool is often much faster and may provide the answer right away. This applies not only to Docker, but also to most command line-based tools.
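Because the built-in help is plain text, it can be filtered with standard shell tools when you are looking for one particular option. The following is a minimal sketch, assuming grep is available; the function name help_grep is hypothetical and not part of Docker.

```shell
# help_grep PATTERN COMMAND...: show only the lines of a Docker help message
# that mention PATTERN (case-insensitive). Hypothetical helper for illustration.
help_grep() {
    pattern="$1"
    shift
    # "--" stops grep from treating a pattern such as "--all" as an option
    docker "$@" --help | grep -i -- "$pattern"
}

# Example usage (requires Docker to be installed):
#   help_grep quiet container ls
```

With the help output for docker container ls shown earlier, the example call would print only the line describing the -q, --quiet option.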
Exploring a command
Run docker --help and pick a command from the list. Explore the help prompt for that command. Try to guess how a command would work by looking at the Usage: section of the prompt.
Solution
Suppose we pick the docker build command:
docker build --help
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
Options:
--add-host list Add a custom host-to-IP mapping (host:ip)
--build-arg list Set build-time variables
--cache-from strings Images to consider as cache sources
--disable-content-trust Skip image verification (default true)
-f, --file string Name of the Dockerfile (Default is 'PATH/Dockerfile')
--iidfile string Write the image ID to the file
--isolation string Container isolation technology
--label list Set metadata for an image
--network string Set the networking mode for the RUN instructions during build (default "default")
--no-cache Do not use cache when building the image
-o, --output stringArray Output destination (format: type=local,dest=path)
--platform string Set platform if server is multi-platform capable
--progress string Set type of progress output (auto, plain, tty). Use plain to show container output (default "auto")
--pull Always attempt to pull a newer version of the image
-q, --quiet Suppress the build output and print image ID on success
--secret stringArray Secret file to expose to the build (only if BuildKit enabled): id=mysecret,src=/local/secret
--ssh stringArray SSH agent socket or keys to expose to the build (only if BuildKit enabled) (format: default|<id>[=<socket>|<key>[,<key>]])
-t, --tag list Name and optionally a tag in the 'name:tag' format
--target string Set the target build stage to build.
We could try to guess that the command could be run like this:
docker build .
or
docker build https://github.com/docker/rootfs.git
where https://github.com/docker/rootfs.git could be any relevant URL that supports a Docker image.
Key Points
A toolbar icon indicates that Docker is ready to use (on Windows and macOS).
You will typically interact with Docker using the command line.
To learn how to run a certain Docker command, we can type the command followed by the --help flag.
Exploring and Running Containers
Overview
Teaching: 20 min
Exercises: 10 min
Questions
How do I interact with a Docker container on my computer?
Objectives
Use the correct command to see which Docker images are on your computer.
Be able to download new Docker images.
Demonstrate how to start an instance of a container from an image.
Describe at least two ways to execute commands inside a running Docker container.
Reminder of terminology: images and containers
Recall that a container “image” is the template from which particular instances of containers will be created.
Let’s explore our first Docker container. The Docker team provides a simple container image online called hello-world. We’ll start with that one.
Downloading Docker images
The docker image command is used to list and modify Docker images.
You can find out what container images you have on your computer by using the following command (“ls” is short for “list”):
$ docker image ls
If you’ve just installed Docker, you won’t see any images listed.
To get a copy of the hello-world Docker image from the internet, run this command:
$ docker pull hello-world
You should see output like this:
Using default tag: latest
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:f9dfddf63636d84ef479d645ab5885156ae030f611a56f3a7ac7f2fdd86d7e4e
Status: Downloaded newer image for hello-world:latest
docker.io/library/hello-world:latest
Docker Hub
Where did the hello-world image come from? It came from the Docker Hub website, which is a place to share Docker images with other people. More on that in a later episode.
Exercise: Check on Your Images
What command would you use to see if the hello-world Docker image had downloaded successfully and was on your computer? Give it a try before checking the solution.
Solution
To see if the hello-world image is now on your computer, run:
$ docker image ls
Note that the downloaded hello-world image is not in the folder where you are in the terminal! (Run ls by itself to check.) The image is not a file like our normal programs and documents; Docker stores it in a specific location that isn’t commonly accessed, so it’s necessary to use the special docker image command to see what Docker images you have on your computer.
Running the hello-world container
To create and run containers from named Docker images you use the docker run command. Try the following docker run invocation. Note that it does not matter what your current working directory is.
$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
What just happened? When we use the docker run command, Docker does three things:
1. Starts a Running Container | 2. Performs Default Action | 3. Shuts Down the Container |
---|---|---|
Starts a running container, based on the image. Think of this as the “alive” or “inflated” version of the container – it’s actually doing something. | If the container has a default action set, it will perform that default action. This could be as simple as printing a message (as above) or running a whole analysis pipeline! | Once the default action is complete, the container stops running (or exits). The image is still there, but nothing is actively running. |
The hello-world container is set up to run an action by default – namely to print this message.
Using docker run to get the image
We could have skipped the docker pull step; if you use the docker run command and you don’t already have a copy of the Docker image, Docker will automatically pull the image first and then run it.
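The automatic pull can be pictured as the following sequence of steps. This is only a sketch of the behaviour described above, not how Docker implements it internally; the function name run_image is hypothetical, and in practice docker run does all of this by itself.

```shell
# run_image IMAGE [COMMAND...]: roughly what "docker run" does with a missing image.
# Hypothetical helper; "docker run" already performs the pull step by itself.
run_image() {
    image="$1"
    shift
    # "docker image inspect" exits with an error if there is no local copy
    if ! docker image inspect "$image" >/dev/null 2>&1; then
        docker pull "$image"
    fi
    docker run "$image" "$@"
}

# Example usage (requires a running Docker installation):
#   run_image hello-world
```

The first call for a given image downloads it; later calls skip the download and start the container straight away.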
Running a container with a chosen command
But what if we wanted to do something different with the container? The output just gave us a suggestion of what to do – let’s use a different Docker image to explore what else we can do with the docker run command. The suggestion above is to use ubuntu, but we’re going to run a different type of Linux, alpine, instead because it’s quicker to download.
Run the Alpine Docker container
Try downloading and running the alpine Docker container. You can do it in two steps, or one. What are they?
What happened when you ran the Alpine Docker container?
$ docker run alpine
If you have never used the alpine Docker image on your computer, Docker probably printed a message saying that it couldn’t find the image locally and had to download it. If you have used the alpine image before, the command will probably show no output. That’s because this particular container is designed for you to provide commands yourself. Try running this instead:
$ docker run alpine cat /etc/os-release
You should see the output of the cat /etc/os-release command, which prints out the version of Alpine Linux that this container is using and a few additional bits of information.
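Since the output of a command run in a container arrives on your terminal like any other output, it can be piped into tools on your own computer. The following is a minimal sketch that extracts just the human-readable name from os-release style text. The function name pretty_name is hypothetical, and the host-side example assumes a Linux host that provides /etc/os-release (macOS and Windows do not).

```shell
# pretty_name: read os-release style text on standard input and print the
# value of its PRETTY_NAME entry without the surrounding quotes.
# (Hypothetical helper for illustration.)
pretty_name() {
    grep '^PRETTY_NAME=' | cut -d= -f2- | tr -d '"'
}

# Example usage:
#   docker run alpine cat /etc/os-release | pretty_name   # inside the container
#   pretty_name < /etc/os-release                         # on a Linux host
```

Comparing the two results is a quick way to confirm that the command really ran inside the container and not on your own system.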
Hello World, Part 2
Can you run the container and make it print a “hello world” message?
Give it a try before checking the solution.
Solution
Use the same command as above, but with the echo command to print a message.
$ docker run alpine echo 'Hello World'
So here, we see another option – we can provide commands at the end of the docker run command and they will execute inside the running container.
Running containers interactively
In all the examples above, Docker has started the container, run a command, and then
immediately shut down the container. But what if we wanted to keep the container
running so we could log into it and test drive more commands? The way to
do this is to add the interactive flag -it to the docker run command and provide a shell (bash, sh, etc.)
as our command. The alpine Docker image doesn’t include bash, so we need to use sh.
$ docker run -it alpine sh
Technically…
Technically, the interactive flag is just -i; the extra -t (combined as -it above) is the “pseudo-TTY” option, a fancy term that means a text interface. This allows you to connect to a shell, like bash, using a command line. Since you usually want to have a command line when running interactively, it makes sense to use the two together.
Your prompt should change significantly to look like this:
/ #
That’s because you’re now inside the running container! Try these commands:
pwd
ls
whoami
echo $PATH
cat /etc/os-release
All of these are being run from inside the running container, so you’ll get information
about the container itself, instead of your computer. To finish using the container,
just type exit
.
/ # exit
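The effect of the -t option described in the “Technically…” note can be checked from any shell by asking whether standard input is attached to a terminal. The following is a minimal sketch in plain shell. The function name describe_stdin is made up for illustration, and the snippet does not call Docker itself.

```shell
# describe_stdin is a hypothetical helper name, not part of Docker or Alpine.
# [ -t 0 ] is the standard shell test for "file descriptor 0 (stdin) is a terminal".
describe_stdin() {
    if [ -t 0 ]; then
        echo "stdin is a terminal"
    else
        echo "stdin is not a terminal"
    fi
}

# With input redirected from /dev/null there is no terminal attached,
# which is similar to running a container without the -t option.
describe_stdin < /dev/null
```

Pasting the same function into a shell started with docker run -it alpine sh and calling it without the redirection should report a terminal, because -t attaches a pseudo-TTY.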
Practice Makes Perfect
Can you find out the version of Linux installed on the busybox container? (Hint: If you search online, you’ll find that there are a few different ways to find out what version of Linux a computer or container is running. Because the busybox container is very simplified, you’ll want to use a command that prints out the contents of the file /proc/version.)
Can you also find the busybox program? What does it do? (Hint: passing --help to almost any command will give you more information.)
Solution 1 – Interactive
Run the busybox container interactively – you can use
docker pull
first, or just run it with this command:
$ docker run -it busybox sh
Then try running these commands:
/# cat /proc/version
/# busybox --help
Exit when you’re done.
/# exit
Solution 2 – Run commands
Run the busybox container, first with a command to read out the Linux version:
$ docker run busybox cat /proc/version
Then run the container again with a command to print out the busybox help:
$ docker run busybox busybox --help
Conclusion
So far, we’ve seen how to download Docker images, use them to run commands inside running containers, and even how to explore a running container from the inside. Next, we’ll take a closer look at all the different kinds of Docker images that are out there.
Key Points
The docker pull command downloads Docker images from the internet.
The docker image ls command lists Docker images that are (now) on your computer.
The docker run command creates running containers from images and can run commands inside them.
When using the docker run command, a container can run a default action (if it has one), a user specified action, or a shell to be used interactively.
Break
Comfort break
Finding Containers on Docker Hub
Overview
Teaching: 10 min
Exercises: 10 min
Questions
What is the Docker Hub, and why is it useful?
Objectives
Explain how the Docker Hub augments Docker use.
Explore the Docker Hub webpage for a popular Docker image.
Find the list of tags for a particular Docker image.
Identify the three components of a container’s identifier.
In the previous episode, we ran a few different containers: hello-world, alpine, and maybe busybox. Where did these containers come from? The Docker Hub!
Introducing the Docker Hub
The Docker Hub is an online repository of container images, a vast number of which are publicly available. A large number of the images are curated by the developers of the software that they package. Also, many commonly used pieces of software that have been containerized into images are officially endorsed, which means that you can trust the containers to have been checked for functionality and stability, and to be free of malware.
Docker can be used without connecting to the Docker Hub
Note that while the Docker Hub is well integrated into Docker functionality, the Docker Hub is certainly not required for all types of use of Docker containers. For example, some organizations may run container infrastructure that is entirely disconnected from the Internet.
Exploring an Example Docker Hub Page
As an example of a Docker Hub page, let’s explore the page for the Python language. The most basic form of containerised Python is in the “python” image (which is endorsed by the Docker team). Open your web browser to https://hub.docker.com/_/python to see what is on a typical Docker Hub software page.
The top-left provides information about the name, short description, popularity (i.e., more than a billion downloads in the case of this image), and endorsements.
The top-right provides the command to pull this image to your computer.
The main body of the page contains many commonly used headings, such as:
- Which tags (i.e., image versions) are supported;
- Summary information about where to get help, which computer architectures are supported, etc.;
- A longer description of the package;
- Examples of how to use the image; and
- The licence that applies.
The “Examples of how to use the image” section of most images’ pages will provide examples that are likely to adequately cover your intended use of the image.
Exploring Image Versions
A single Docker Hub page can have many different versions of container images,
based on the version of the software inside. These
versions are indicated by “tags”. When referring to the specific version of a container
by its tag, you use a colon, :
, like this:
CONTAINERNAME:TAG
So if I wanted to download the python
container, with Python 3.8, I would use this name:
$ docker pull python:3.8
But if I wanted to download a Python 3.6 container, I would use this name:
$ docker pull python:3.6
The default tag (which is used if you don’t specify one) is called latest
.
So far, we’ve only seen containers that are maintained by the Docker team. However, it’s equally common to use containers that have been produced by individual owners or organizations. Containers that you create and upload to Docker Hub would fall into this category, as would the containers maintained by organizations like ContinuumIO (the folks who develop the Anaconda Python environment) or community groups like rocker, a group that builds community R containers.
The names of these group- or individually-managed containers have this format:
OWNER/CONTAINERNAME:TAG
Repositories
The technical name for the contents of a Docker Hub page is a “repository.” The tag indicates the specific version of the container image that you’d like to use from a particular repository. So a slightly more accurate version of the above example is:
OWNER/REPOSITORY:TAG
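To make the three parts of the name concrete, here is a minimal sketch that splits an image name into owner, repository and tag using shell parameter expansion. The function name parse_image is hypothetical, and the sketch assumes only the simple naming scheme described here (no registry host name or digest). When no owner is given it prints “(none)”, and when no tag is given it falls back to latest.

```shell
# parse_image is a hypothetical helper, not a Docker command.
# It splits a name of the form [OWNER/]REPOSITORY[:TAG] into its parts.
parse_image() {
    ref="$1"
    # Tag: the text after the last colon; "latest" is the default when absent.
    case "$ref" in
        *:*) tag="${ref##*:}"; name="${ref%:*}" ;;
        *)   tag="latest";     name="$ref" ;;
    esac
    # Owner: the text before the first slash; some images have no owner part.
    case "$name" in
        */*) owner="${name%%/*}"; repo="${name#*/}" ;;
        *)   owner="(none)";      repo="$name" ;;
    esac
    echo "owner=$owner repository=$repo tag=$tag"
}

parse_image rocker/tidyverse:3.6.1
parse_image python:3.8
parse_image alpine
```

The last call shows the default described above: a name without a tag is treated as if it ended in :latest.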
What’s in a name?
How would I download the Docker container produced by the rocker group that has version 3.6.1 of R and the tidyverse installed?
Solution
First, search for rocker in Docker Hub. Then look for their tidyverse image. You can look at the list of tags, or just guess that the tag is 3.6.1. Altogether, that means that the name of the container we want to download is:
$ docker pull rocker/tidyverse:3.6.1
Finding Containers on Docker Hub
There are many different containers on Docker Hub. This is where the real advantage of using containers shows up – each container represents a complete software installation that you can use and access without any extra work!
The easiest way to find containers is to search on Docker Hub, but sometimes software pages have a link to their containers from their home page.
Note that anyone can create an account on Docker Hub and share a container there, so it’s important to exercise caution when choosing a container on Docker Hub. These are some indicators that a container on Docker Hub is consistently maintained, functional and secure:
- The image is updated regularly.
- The image is associated with a well-established company, community, or other well-known group.
- There is a Dockerfile or other listing of what has been installed to the container.
- The image page has documentation on how to use the container.
If a container is never updated, created by a random person, and does not have a lot of metadata, it is probably worth skipping over. Even if such a container is secure, it is not reproducible and not a dependable way to run research computations.
What container is right for you?
Find a Docker container that’s relevant to you. Take into account the suggestions above of what to look for as you evaluate options. If you’re unsuccessful in your search, or don’t know what to look for, you can use the R or Python containers we’ve already seen.
Once you find a container, use the skills from the previous episode to download the image and explore it.
Key Points
The Docker Hub is an online repository of container images.
Many Docker Hub images are public, and may be officially endorsed.
Each Docker Hub page about an image provides structured information under subheadings.
Most Docker Hub pages about images contain sections that provide examples of how to use those images.
Many Docker Hub images have multiple versions, indicated by tags.
The naming convention for Docker containers is:
OWNER/CONTAINER:TAG
Cleaning Up Containers
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How do I interact with a Docker container on my computer?
How do I manage my containers and images?
Objectives
Explain how to list running and completed containers.
Know how to list and remove container images.
Removing images
The images and their corresponding containers can start to take up a lot of disk space if you don’t clean them up occasionally, so it’s a good idea to periodically remove container images that you won’t be using anymore.
In order to remove a specific image, you need to find out details about the image, specifically, the “image ID”. For example, say my laptop contained the following image:
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest fce289e99eb9 15 months ago 1.84kB
You can remove the image with a docker image rm
command that includes the image ID, such as:
$ docker image rm fce289e99eb9
or use the image name, like so:
$ docker image rm hello-world
However, you may see this output:
Error response from daemon: conflict: unable to remove repository reference "hello-world" (must force) - container e7d3b76b00f4 is using its referenced image fce289e99eb9
This happens when Docker is still keeping a record of containers that were created from the image, even though they have finished running. So before removing the container image, we need to see which containers are currently running, or have been run recently, and learn how to remove them.
What containers are running?
Working with containers, we are going to shift to a new docker command: docker container
. Similar to docker image
, we can list running containers by typing:
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Notice that this command didn’t return any containers because our containers all exited and thus stopped running after they completed their work.
docker ps
The command docker ps serves the same purpose as docker container ls, and comes from the Unix shell command ps, which describes running processes.
What containers have run recently?
There is also a way to list both running containers and those that have completed recently, which is to add the --all/-a flag to the docker container ls command as shown below.
$ docker container ls --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9c698655416a hello-world "/hello" 2 minutes ago Exited (0) 2 minutes ago zen_dubinsky
6dd822cf6ca9 hello-world "/hello" 3 minutes ago Exited (0) 3 minutes ago eager_engelbart
Keeping it clean
You might be surprised at the number of containers Docker is still keeping track of. One way to prevent this from happening is to add the --rm flag to docker run. This will completely wipe out the record of the container when it exits. If you need to refer back to the container after it exits, don’t use this flag.
How do I remove an exited container?
To delete an exited container you can run the following command, inserting the CONTAINER ID
for the container you wish to remove.
It will repeat the CONTAINER ID
back to you, if successful.
$ docker container rm 9c698655416a
9c698655416a
You can remove all exited containers using the docker container prune command. Be careful
with this command. If you have containers you may want to reconnect to, you should not use this
command. It will ask you to confirm that you want to remove these containers (see the output below).
If successful, it will print the full CONTAINER ID of each removed container back to you.
$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
9c698655416a848278d16bb1352b97e72b7ea85884bff8f106877afe0210acfc
6dd822cf6ca92f3040eaecbd26ad2af63595f30bb7e7a20eacf4554f6ccc9b2b
Removing images, for real this time
Now that we’ve removed any potentially running or stopped containers, we can try again to
delete the hello-world
image.
$ docker image rm hello-world
Untagged: hello-world:latest
Untagged: hello-world@sha256:5f179596a7335398b805f036f7e8561b6f0e32cd30a32f5e19d17a3cda6cc33d
Deleted: sha256:fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e
Deleted: sha256:af0b15c8625bb1938f1d7b17081031f649fd14e6b233688eea3c5483994a66a3
The reason that there are a few lines of output is that a given image may have been formed by merging multiple underlying layers.
Any layers that are used by multiple Docker images will only be stored once.
Now the result of docker image ls
should no longer include the hello-world
image.
Key Points
docker container has subcommands used to interact with and manage containers.
docker image has subcommands used to interact with and manage images.
docker ps can provide information on currently running containers.
Lunch
Lunch break
Creating Your Own Container Images
Overview
Teaching: 20 min
Exercises: 15 min
Questions
How can I make my own Docker images?
How do I document the ‘recipe’ for a Docker image?
Objectives
Explain the purpose of a Dockerfile and show some simple examples.
Demonstrate how to build a Docker image from a Dockerfile.
Compare the steps of creating a container interactively versus a Dockerfile.
Create an installation strategy for a container.
Demonstrate how to upload (‘push’) your container images to the Docker Hub.
Describe the significance of the Docker Hub naming scheme.
There are lots of reasons why you might want to create your own Docker image.
- You can’t find a container with all the tools you need on Docker Hub.
- You want to have a container to “archive” all the specific software versions you ran for a project.
- You want to share your workflow with someone else.
Interactive installation
Before creating a reproducible installation, let’s experiment with installing
software inside a container. Start the alpine
container from before, interactively:
$ docker run -it alpine sh
Because this is a basic container, there are a lot of things not installed, for
example python3.
/# python3
sh: python3: not found
Inside the container, we can run commands to install Python3. The Alpine version of
Linux has an installation tool called apk
that we can use to install Python3.
/# apk add --update python3 py3-pip python3-dev
We can test our installation by running a Python command:
/# python3 --version
Once Python is installed, we can add Python packages using the pip package installer:
/# pip install cython
Exercise: Searching for Help
Can you find instructions for installing R on Alpine Linux? Do they work?
Solution
A quick search should hopefully show that the way to install R on Alpine Linux is:
/# apk add R
Once we exit, these changes are not saved to a new container by default. There is
a command that will “snapshot” our changes, but building containers this way is
not easily reproducible. Instead, we’re going to take what we’ve learned from this
interactive installation and create our container from a reproducible recipe,
known as a Dockerfile
.
If you haven’t already, exit out of the interactively running container.
/# exit
Put installation instructions in a Dockerfile
A Dockerfile
is a plain text file with keywords and commands that
can be used to create a new container image.
From your shell, go to the folder you downloaded at the start of the lesson and print out the Dockerfile inside:
$ cd ~/Desktop/docker-intro/basic
$ cat Dockerfile
FROM <EXISTING IMAGE>
RUN <INSTALL CMDS FROM SHELL>
RUN <INSTALL CMDS FROM SHELL>
CMD <CMD TO RUN BY DEFAULT>
Let’s break this file down:
- The first line, FROM, indicates which container we’re starting with. It is the “base” image we are going to start from.
- The next two lines, RUN, indicate installation commands we want to run. These are the same commands that we used interactively above.
- The last line, CMD, indicates the default command we want the container to run, if no other command is provided. It is recommended to provide CMD in exec-form (see the CMD section of the Dockerfile documentation for more details). It is written as a list which contains the executable to run as its first element, optionally followed by any arguments as subsequent elements. The list is enclosed in square brackets ([]) and its elements are double-quoted (") strings which are separated by commas. For example, CMD ["ls", "-lF", "--color", "/etc"] would translate to ls -lF --color /etc.
shell-form and exec-form for CMD
Another way to specify the parameter for the CMD instruction is the shell-form. Here you type the command as you would call it from the command line. Docker then silently runs this command in the image’s standard shell. CMD cat /etc/passwd is equivalent to CMD ["/bin/sh", "-c", "cat /etc/passwd"]. We recommend the more explicit exec-form because it allows more flexible container command options and keeps complex commands unambiguous.
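One practical consequence of the two forms is whether a shell gets a chance to expand variables. The sketch below is an analogy in plain shell and does not run Docker. The first command hands a string to /bin/sh -c, as the shell-form does, and the second starts the program directly with literal arguments, as the exec-form does. The variable GREETING is made up for the demonstration.

```shell
# GREETING is a made-up variable used only for this demonstration.
GREETING="hello"
export GREETING

# Like shell-form: the string is handed to /bin/sh -c, and that shell expands $GREETING.
/bin/sh -c 'echo "shell-form style: $GREETING"'

# Like exec-form: the program is started directly with literal arguments,
# so no shell is involved and $GREETING is left unexpanded.
env echo 'exec-form style: $GREETING'
```

If a default command really does need shell features such as variable expansion, the exec-form can still be used by naming the shell explicitly, as in the ["/bin/sh", "-c", "..."] equivalent shown above.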
Exercise: Take a Guess
Do you have any ideas about what we should use to fill in the sample Dockerfile to replicate the installation we did above?
Solution:
Based on our experience above, edit the Dockerfile (in your text editor of choice) to look like this:
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
RUN pip install cython
CMD ["python3", "--version"]
The recipe provided by the Dockerfile
shown in the solution to the preceding exercise will use Alpine Linux as the base container,
add Python and the Cython library, and set a default command to request Python to report its version information.
Create a new Docker image
So far, we only have a text file named Dockerfile
– we do not yet have a container image.
We want Docker to take this Dockerfile
,
run the installation commands contained within it, and then save the
resulting container as a new container image. To do this we will use the
docker build
command.
We have to provide docker build
with two pieces of information:
- the location of the
Dockerfile
- the name of the new image. Remember the naming scheme from before? You should name
your new image with your Docker Hub username and a name for the container, like this:
USERNAME/CONTAINERNAME
.
All together, the build command that you should run on your computer will have a similar structure to this:
$ docker build -t USERNAME/CONTAINERNAME .
The -t option names the image; the final dot indicates that the Dockerfile is in
our current directory.
For example, if my user name was alice
and I wanted to call my
image alpine-python
, I would use this command:
$ docker build -t alice/alpine-python .
Exercise: Review!
Think back to earlier. What command can you run to check if your image was created successfully? (Hint: what command shows the images on your computer?)
We didn’t specify a tag for our image name. What tag did Docker automatically use?
What command will run the container you’ve created? What should happen by default if you run the container? Can you make it do something different, like print “hello world”?
Solution
To see your new image, run docker image ls. You should see the name of your new image under the “REPOSITORY” heading.
In the output of docker image ls, you can see that Docker has automatically used the latest tag for our new image.
We want to use docker run to run the container.
The following command should run the container and print out our default message, the version of Python:
$ docker run alice/alpine-python
To run the container and print out “Hello world” instead:
$ docker run alice/alpine-python echo "Hello World"
While it may not look like you have achieved much, you have already combined a lightweight Linux operating system with your own specification to run a given command, in a form that can run reliably on macOS, Microsoft Windows, Linux and in the cloud!
Boring but important notes about installation
There are a lot of choices when it comes to installing software – sometimes too many! Here are some things to consider when creating your own container:
- Start smart, or, don’t install everything from scratch! If you’re using Python as your main tool, start with a Python container. Same with R. We’ve used Alpine Linux as an example in this lesson, but it’s generally not a good container to start with for initial development and experimentation because it is a less common distribution of Linux; Ubuntu, Debian and CentOS are all good options for scientific software installations. The program you’re using might recommend a particular distribution of Linux, and if so, it may be useful to start with a container image for that distribution.
- How big? How much software do you really need to install? When you have a choice, lean towards using smaller starting images and installing only what’s needed for your software, as a bigger image means longer download times to use.
- Know (or Google) your Linux. Different distributions of Linux often have distinct sets of tools for installing software. The apk command we used above is the software package installer for Alpine Linux. The installers for various common Linux distributions are listed below:
  - Ubuntu: apt or apt-get
  - Debian: apt or apt-get (which install .deb packages)
  - CentOS: yum
  Most common software installations are available to be installed via these tools. A web search for “install X on Y Linux” is usually a good start for common software installation tasks; if something isn’t available via the Linux distribution’s installation tools, try the options below.
- Use what you know. You’ve probably used commands like pip or install.packages() before on your own computer; these will also work to install things in containers (if the basic scripting language is installed).
- README. Many scientific software tools have a README or installation instructions that lay out how to install software. You want to look for instructions for Linux. If the install instructions include options like those suggested above, try those first.
In general, a good strategy for installing software is:
- Make a list of what you want to install.
- Look for pre-existing containers.
- Read through instructions for software you’ll need to install.
- Try installing everything interactively in your base container – take notes!
- From your interactive installation, create a
Dockerfile
and then try to build the container again from that.
Share your new container on Docker Hub
Images that you release publicly can be stored on the Docker Hub for free. If you
name your image as described above, with your Docker Hub username, all you need to do
is run the opposite of docker pull
– docker push
.
$ docker push alice/alpine-python
Make sure to substitute the full name of your container!
In a web browser, open https://hub.docker.com, and on your user page you should now see your container listed, for anyone to use or build on.
Logging In
Technically, you have to be logged into Docker on your computer for this to work. Usually it happens by default, but if docker push doesn’t work for you, run docker login first, enter your Docker Hub username and password, and then try docker push again.
What’s in a name? (again)
You don’t have to name your containers using the USERNAME/CONTAINER:TAG
naming scheme. On your own computer, you can call containers whatever you want, and refer to
them by the names you choose. It’s only when you want to share a container that it
needs the correct naming format.
You can rename images using the docker tag
command. For example, imagine someone
named Alice has been working on a workflow container and called it workflow-test
on her own computer. She now wants to share it in her alice
Docker Hub account
with the name workflow-complete
and a tag of v1
. Her docker tag
command
would look like this:
$ docker tag workflow-test alice/workflow-complete:v1
She could then push the re-named container to Docker Hub,
using docker push alice/workflow-complete:v1
Key Points
Dockerfiles specify what is within Docker images.
The docker build command is used to build an image from a Dockerfile.
You can share your Docker images through the Docker Hub so that others can create Docker containers from your images.
Creating More Complex Container Images
Overview
Teaching: 30 min
Exercises: 30 min
Questions
How can I make more complex container images?
Objectives
Explain how you can include files within Docker images when you build them.
Explain how you can access files on the Docker host from your Docker containers.
In order to create and use your own containers, you may need more information than our previous example. You may want to use files from outside the container, copy those files into the container, and just generally learn a little bit about software installation. This episode will cover these. Note that the examples will get gradually more and more complex – most day-to-day use of containers can be accomplished using the first 1–2 sections on this page.
Using scripts and files from outside the container
In your shell, change to the sum
folder in the docker-intro
folder and look at
the files inside.
$ cd ~/Desktop/docker-intro/sum
$ ls
This folder has both a Dockerfile
and a Python script called sum.py
. Let’s say
we wanted to try running the script using our recently created alpine-python
container.
Running containers
What command would we use to run Python from the
alpine-python
container?
If we try running the container and Python script, what happens?
$ docker run alice/alpine-python python3 sum.py
python3: can't open file 'sum.py': [Errno 2] No such file or directory
No such file or directory
What does the error message mean? Why might the Python inside the container not be able to find or open our script?
Solution
The problem here is that the container and its filesystem are separate from our host computer’s filesystem. When the container runs, it can’t see anything outside itself, including any of the files on our computer.
In order to use Python (inside the container) and our script (outside the container, on our computer), we need to create a link between the directory on our computer and the container.
This link is called a “mount” and is what happens automatically when a USB drive or other external hard drive gets connected to a computer – you can see the contents appear as if they were on your computer.
We can create a mount between our computer and the running container by using an additional
option to docker run
. We’ll also use the variable ${PWD}
which will substitute
in our current working directory. The option will look like this
-v ${PWD}:/temp
What this means is: link my current directory with the container, and inside the
container, name the directory /temp.
Let’s try running the command now:
$ docker run -v ${PWD}:/temp alice/alpine-python python3 sum.py
But we get the same error!
python3: can't open file 'sum.py': [Errno 2] No such file or directory
This final piece is a bit tricky – we really have to remember to put ourselves
inside the container. Where is the sum.py
file? It’s in the directory that’s been
mapped to /temp
– so we need to include that in the path to the script. This
command should give us what we need:
$ docker run -v ${PWD}:/temp alice/alpine-python python3 /temp/sum.py
Note that if we create any files in the /temp
directory while the container is
running, these files will appear on our host filesystem in the original directory
and will stay there even when the container stops.
Other Commonly Used Docker Run Flags
Docker run has many other useful flags to alter its function. A couple that are commonly used include -w and -u.
The --workdir/-w flag sets the working directory; that is, it runs the command being executed inside the directory specified. For example, the following code would run the pwd command in a container started from the latest ubuntu image in the /home/alice directory and print /home/alice. If the directory doesn’t exist in the image it will create it.
docker run -w /home/alice/ -i -t ubuntu pwd
The --user/-u flag lets you specify the username you would like to run the container as. This is helpful if you’d like to write files to a mounted folder and not write them as root but rather your own user identity and group. A common example of the -u flag is --user $(id -u):$(id -g), which will fetch the current user’s ID and group and run the container as that user.
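To see what the --user argument expands to on your own machine, the sketch below evaluates the two id calls and prints the resulting command instead of running it, so it works even where Docker is not installed. The variable name user_spec is an assumption made for this example. The image name and mount are the ones used earlier in this episode.

```shell
# Build the value that --user would receive: numeric user ID, a colon, numeric group ID.
user_spec="$(id -u):$(id -g)"
echo "--user ${user_spec}"

# Print the full command instead of running it, so Docker is not required here.
# alice/alpine-python is the example image name used in this lesson.
echo "docker run --user ${user_spec} -v ${PWD}:/temp alice/alpine-python python3 /temp/sum.py"
```

The exact numbers depend on your account, so the printed value will differ between machines.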
Exercise: Explore the script
What happens if you use the docker run command above and put numbers after the script name?
Solution
This script comes from the Python Wiki and is set to add all numbers that are passed to it as arguments.
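The sum.py script itself is not reproduced in this lesson text. As a rough illustration of what “add all numbers that are passed to it as arguments” means, here is a shell function with the same behaviour for whole numbers. The name sum_args is hypothetical and this is not the script’s actual code.

```shell
# sum_args is a hypothetical stand-in for sum.py; it handles integers only.
sum_args() {
    total=0
    # Add each command-line argument to the running total.
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

sum_args 10 11 12
```

The equivalent call through the container places the numbers after the script path, for example python3 /temp/sum.py 10 11 12.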
Exercise: Checking the options
Our Docker command has gotten much longer! Can you go through each piece of the Docker command above and explain what it does? How would you characterize the key components of a Docker command?
Solution
Here’s a breakdown of each piece of the command above:
- docker run: use Docker to run a container
- -v ${PWD}:/temp: connect my current working directory (${PWD}) as a folder inside the container called /temp
- alice/alpine-python: name of the container to run
- python3 /temp/sum.py: what commands to run in the container
More generally, every Docker command will have the form:
docker [action] [docker options] [docker image] [command to run inside]
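The general form can be illustrated by filling each slot from the command we just ran. This is only a sketch: the variable names are invented for the example, and the command is printed instead of executed so that it runs without Docker.

```shell
# Each variable holds one slot of the general form; the values come from the command above.
action="run"
options="-v ${PWD}:/temp"
image="alice/alpine-python"
inner_command="python3 /temp/sum.py"

# Print the assembled command rather than executing it, so Docker is not needed.
full_command="docker ${action} ${options} ${image} ${inner_command}"
echo "${full_command}"
```

Swapping a single slot, such as the image or the inner command, gives the other commands used in this episode.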
Exercise: Interactive jobs
Try using the directory mount option but run the container interactively. Can you find the folder that’s connected to your computer? What’s inside?
Solution
The docker command to run the container interactively is:
$ docker run -v ${PWD}:/temp -it alice/alpine-python sh
Once inside, you should be able to navigate to the /temp folder and see that its contents are the same as the files on your computer:
/# cd /temp
/# ls
Mounting a folder can be very useful when you want to run the software inside your container on many different input files. In other situations, you may want to save or archive an authoritative version of your data by adding it to the container permanently. That’s what we will cover next.
Including your scripts and data within a container image
Our next project will be to add our own files to a container – something you
might want to do if you’re sharing a finished analysis or just want to have
an archived copy of your entire analysis including the data. Let’s assume that we’ve finished with our sum.py
script and want to add it to the container itself.
In your shell, you should still be in the sum
folder in the docker-intro
folder.
$ pwd
/Users/yourname/Desktop/docker-intro/sum
Let’s add a new line to the Dockerfile
we’ve been using so far to create a copy of sum.py
.
We can do so by using the COPY
keyword.
COPY sum.py /home
This line will cause Docker to copy the file from your computer into the container’s filesystem. Let’s build the container like before, but give it a different name:
$ docker build -t alice/alpine-sum .
Exercise: Did it work?
Can you remember how to run a container interactively? Try that with this one. Once inside, try running the Python script.
Solution
You can start the container interactively like so:
$ docker run -it alice/alpine-sum sh
You should be able to run the python command inside the container like this:
/# python3 /home/sum.py
This COPY keyword can be used to place your own scripts or data into a container image that you want to publish or use as a record. Note that it’s not necessarily a good idea to put your scripts inside the container if you’re constantly changing or editing them. In that case, referencing the scripts from outside the container is a better idea, as we did in the previous section. You also want to think carefully about size: if you run docker image ls you’ll see the size of each image all the way on the right of the screen. The bigger your image becomes, the harder it will be to download.
Copying alternatives
Another trick for getting your own files into a container is by using the RUN keyword and downloading the files from the internet. For example, if your code is in a GitHub repository, you could include this statement in your Dockerfile to download the latest version every time you build the container:
RUN git clone https://github.com/alice/mycode
Similarly, the wget command can be used to download any file publicly available on the internet:
RUN wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ncbi-blast-2.10.0+-x64-linux.tar.gz
Note that the above RUN examples depend on commands (git and wget respectively) that must be available within your container: Linux distributions such as Alpine may require you to install such commands before using them within RUN statements.
More fancy Dockerfile options (optional, for presentation or as exercises)
We can expand on the example above to make our container even more “automatic”. Here are some ideas:
Make the sum.py script run automatically
FROM alpine
COPY sum.py /home
RUN apk add --update python3 py3-pip python3-dev
# Run the sum.py script as the default command
CMD ["python3", "/home/sum.py"]
Build and test it:
$ docker build -t alice/alpine-sum:v1 .
$ docker run alice/alpine-sum:v1
You’ll notice that you can run the container without arguments just fine, resulting in sum = 0, but this is boring. Supplying arguments, however, doesn’t work:
docker run alice/alpine-sum:v1 10 11 12
results in
docker: Error response from daemon: OCI runtime create failed:
container_linux.go:349: starting container process caused "exec:
\"10\": executable file not found in $PATH": unknown.
This is because the arguments 10 11 12 are interpreted as a command that replaces the default command given by CMD ["python3", "/home/sum.py"] in the image.
To achieve the goal of having a command that always runs when the container is run and can be passed the arguments given on the command line, use the keyword ENTRYPOINT in the Dockerfile.
FROM alpine
COPY sum.py /home
RUN apk add --update python3 py3-pip python3-dev
# Run the sum.py script as the default command and
# allow people to enter arguments for it
ENTRYPOINT ["python3", "/home/sum.py"]
# Give default arguments, in case none are supplied on
# the command-line
CMD ["10", "11"]
Build and test it:
$ docker build -t alice/alpine-sum:v2 .
# Most of the time you are interested in the sum of 10 and 11:
$ docker run alice/alpine-sum:v2
# Sometimes you have more challenging calculations to do:
$ docker run alice/alpine-sum:v2 12 13 14
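The way the two images above respond to extra arguments can be modelled with a few lines of Python. This is only a sketch of the rule for exec-form ENTRYPOINT and CMD as described in this section, not Docker code, and the function name effective_command is a hypothetical one chosen for illustration:

```python
def effective_command(entrypoint, cmd, run_args):
    """Model the command a container starts with.

    entrypoint, cmd: exec-form lists from the Dockerfile (may be empty).
    run_args: anything given after the image name on `docker run`.
    """
    # Arguments on the command line replace CMD entirely...
    tail = list(run_args) if run_args else list(cmd)
    # ...while ENTRYPOINT, if set, always stays at the front.
    return list(entrypoint) + tail


# v1 image: CMD only, so "10 11 12" replaces the whole command.
v1 = effective_command([], ["python3", "/home/sum.py"], ["10", "11", "12"])
print(v1)  # ['10', '11', '12']

# v2 image: ENTRYPOINT plus default arguments in CMD.
sum_script = ["python3", "/home/sum.py"]
v2_default = effective_command(sum_script, ["10", "11"], [])
v2_custom = effective_command(sum_script, ["10", "11"], ["12", "13", "14"])
print(v2_default)  # ['python3', '/home/sum.py', '10', '11']
print(v2_custom)   # ['python3', '/home/sum.py', '12', '13', '14']
```

In the v1 case the container is asked to execute a program called 10, which is why Docker reports that the executable was not found.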
Overriding the ENTRYPOINT
Sometimes you don’t want to run the image’s ENTRYPOINT. For example, suppose you have a specialized image that does only sums, but you need an interactive shell to examine the container. Running
$ docker run -it alice/alpine-sum:v2 /bin/sh
will yield
Please supply integer arguments
because /bin/sh is passed to sum.py as an argument instead of being run as a command. You need to override the ENTRYPOINT statement in the image like so:
$ docker run -it --entrypoint /bin/sh alice/alpine-sum:v2
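The sum.py script itself is not reproduced in this part of the lesson. The behaviour we have seen so far (sum = 0 when no arguments are given, and the message Please supply integer arguments when an argument is not a number) is consistent with a script along the following lines. Treat this as a hypothetical sketch, not the lesson's actual file; the function names total and main are assumptions made for illustration:

```python
#!/usr/bin/env python3
# Hypothetical sketch of sum.py: add up integer command-line arguments.
# The first line above is what allows a script to be executed directly
# (for example, once it is executable and on the PATH).
import sys


def total(args):
    """Return the sum of the arguments, each parsed as an integer."""
    return sum(int(a) for a in args)


def main(argv):
    """Print the sum of argv, or a hint if an argument is not an integer."""
    try:
        result = total(argv)
    except ValueError:
        # Reached, for example, when "/bin/sh" is passed as an argument.
        print("Please supply integer arguments")
        return
    print("sum =", result)


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a script like this, the v2 image started with the arguments 12 13 14 would print sum = 39.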
Add the sum.py script to the PATH so you can run it directly:
FROM alpine
COPY sum.py /home
# set script permissions
RUN chmod +x /home/sum.py
# add /home folder to the PATH
ENV PATH=/home:$PATH
RUN apk add --update python3 py3-pip python3-dev
Build and test it:
$ docker build -t alice/alpine-sum:v3 .
$ docker run alice/alpine-sum:v3 sum.py 1 2 3 4
Key Points
Docker allows containers to read and write files from the Docker host.
You can include files from your Docker host into your Docker images by using the COPY instruction in your Dockerfile.
Break
Comfort break
Examples of Using Container Images in Practice
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How can I use Docker for my own work?
Objectives
Use existing container images and Docker in a research project.
Now that we have learned the basics of working with Docker images and containers, let’s apply what we learned to an example workflow.
You may choose one or more of the following examples to practice using containers.
Jekyll Website Example
In this Jekyll Website example, you can practice rendering this lesson website on your computer using the Jekyll static website generator in a Docker container. Rendering the website in a container avoids a complicated software installation; instead of installing Jekyll and all the other tools needed to create the final website, all the work can be done in the container. Additionally, when you no longer need to render the website, you can easily and cleanly remove the software from your computer.
GitHub Actions Example
In this GitHub Actions example, you can learn more about continuous integration in the cloud and how you can use container images with GitHub to automate repetitive tasks like testing code or deploying websites.
Using Containers on an HPC Cluster
It is possible to run containers on shared computing systems run by a university or national computing center. As a researcher, you can build and test your container on your own computer and then use it to run your full-scale computing work on a shared computing system like a high performance cluster or high throughput grid.
The catch? Most university and national computing centers do not support running containers with Docker commands, and instead use a similar tool such as Singularity or Shifter. However, both of these programs can run Docker container images, so people often create their container as a Docker container image that can then be run with either Docker or Singularity.
We will see examples of how to run containers on an HPC system in day two of this workshop. This will include pulling images from Docker Hub.
Seeking Examples
Do you have another example of using Docker in a workflow related to your field? Please open a lesson issue or submit a pull request to add it to this episode and the extras section of the lesson.
Key Points
There are many ways you might use Docker and existing container images in your research project.
Containers in Research Workflows: Reproducibility and Granularity
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How can I use container images to make my research more reproducible?
How do I incorporate containers into my research workflow?
What are container orchestration tools and how can they potentially help me?
Objectives
Understand how container images can help make research more reproducible.
Understand what practical steps I can take to improve the reproducibility of my research using containers.
Know what container orchestration tools are and what they can do.
Although this workshop is titled “Reproducible computational environments using containers”, so far we have mostly covered the mechanics of using Docker with only passing reference to the reproducibility aspects. In this section, we discuss these aspects in more detail.
Work in progress…
Note that reproducibility aspects of software and containers are an active area of research, discussion and development so are subject to many changes. We will present some ideas and approaches here but best practices will likely evolve in the near future.
Reproducibility
By reproducibility here we mean the ability of someone else (or your future self) to reproduce what you did computationally at a particular time (be this in research, analysis or something else) as closely as possible, even if they do not have access to exactly the same hardware resources that you had when you did the original work.
Some examples of why containers are an attractive technology to help with reproducibility include:
- The same computational work can be run seamlessly across multiple operating systems (e.g. Windows, macOS, Linux).
- You can save the exact process that you used for your computational work (rather than relying on potentially incomplete notes).
- You can save the exact versions of software and their dependencies in the image.
- You can access legacy versions of software and underlying dependencies which may not be generally available any more.
- Depending on their size, you can also potentially store a copy of key data within the image.
- You can archive and share the image as well as associating a persistent identifier with an image to allow other researchers to reproduce and build on your work.
Sharing images
As we have already seen, the Docker Hub provides a platform for sharing images publicly. Once you have uploaded an image, you can point people to its public location and they can download and build upon it.
This is fine for working collaboratively with images on a day-to-day basis, but the Docker Hub is not a good option for long-term archiving of images in support of research and publications as:
- free accounts have a limit on how long an image will be hosted if it is not updated
- it does not support adding persistent identifiers to images
- it is easy to overwrite tagged images with newer versions by mistake.
Archiving and persistently identifying images using Zenodo
When you publish your work or make it publicly available in some way it is good practice to make images that you used for computational work available in an immutable, persistent way and to have an identifier that allows people to cite and give you credit for the work you have done. Zenodo provides this functionality.
Zenodo supports the archiving of tar archives and we can capture our Docker images as tar archives using the docker save command.
For example, to export the image we created earlier in this lesson:
docker save alice/alpine-python:v1 -o alpine-python.tar
These tar images can become quite large and Zenodo supports uploads up to 50GB so you may need to compress your archive to make it fit on Zenodo using a tool such as gzip (or zip):
gzip alpine-python.tar
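If the gzip command is not available on your system, the same compression can be done with Python's standard library. The following is a minimal sketch under that assumption; the helper name gzip_file is hypothetical, and alpine-python.tar is the file written by the docker save command above:

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Compress src into dst (default: src plus '.gz') and return the new path."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(src.name + ".gz")
    # Stream the data in chunks so large image archives need not fit in memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


if __name__ == "__main__":
    tar_path = Path("alpine-python.tar")  # created by `docker save`
    if tar_path.exists():
        gz_path = gzip_file(tar_path)
        print(tar_path.stat().st_size, "->", gz_path.stat().st_size, "bytes")
```

Unlike the gzip command, this sketch leaves the original tar file in place, so you can delete it yourself once you have checked the compressed copy.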
Once you have your archive, you can deposit it on Zenodo and this will:
- Create a long-term archive snapshot of your Docker image which people (including your future self) can download to reuse or reproduce your work.
- Create a persistent DOI (Digital Object Identifier) that you can cite in any publications or outputs to enable reproducibility and recognition of your work.
In addition to the archive file itself, the deposit process will ask you to provide some basic metadata to classify the image and the associated work.
Note that Zenodo is not the only option for archiving and generating persistent DOIs for images. There are other services out there – for example, some organizations may provide their own, equivalent, service.
Use the Zenodo Sandbox to test archiving images
You can experiment with archiving images using the Zenodo Sandbox. The sandbox allows you to explore the process for uploading to Zenodo without actually creating real archives.
Reproducibility good practice
- Make use of images to capture the computational environment required for your work.
- Decide on the appropriate granularity for the images you will use for your computational work – this will be different for each project/area. Take note of accepted practice from contemporary work in the same area. What are the right building blocks for individual images in your work?
- Document what you have done and why – this can be put in comments in the Dockerfile and the use of the image described in associated documentation and/or publications. Make sure that references are made in both directions so that the image and the documentation are appropriately linked.
- When you publish work (in whatever way) use an archiving and DOI service such as Zenodo to make sure your image is captured as it was used for the work and that it obtains a persistent DOI to allow it to be cited and referenced properly.
More information
CodeRefinery have a useful overview of data sharing along with links to services that you can use to share data such as container images.
Container Granularity
As mentioned above, one of the decisions you may need to make when containerising your research workflows is what level of granularity you wish to employ. The two extremes of this decision could be characterised as:
- Create a single container image with all the tools you require for your research or analysis workflow
- Create many container images each running a single command (or step) of the workflow and use them in sequence
Of course, many real applications will sit somewhere between these two extremes.
Positives and negatives
What are the advantages and disadvantages of the two approaches to container granularity for research workflows described above? Think about this and write a few bullet points for advantages and disadvantages for each approach in the course Etherpad.
Solution
This is not an exhaustive list but some of the advantages and disadvantages could be:
Single large container
- Advantages:
- Simpler to document
- Full set of requirements packaged in one place
- Potentially easier to maintain (though could be opposite if working with large, distributed group)
- Disadvantages:
- Could get very large in size, making it more difficult to distribute
- Could use Docker multi-stage build to reduce size
- Singularity also has a multistage build feature
- May end up with same dependency issues within the container from different software requirements
- Potentially more complex to test
- Less re-useable for different, but related, work
Multiple smaller containers
- Advantages:
- Individual components can be re-used for different, but related, work
- Individual parts are smaller in size making them easier to distribute
- Avoid dependency issues between different pieces of software
- Easier to test
- Disadvantages:
- More difficult to document
- Potentially more difficult to maintain (though could be easier if working with large, distributed group)
- May end up with dependency issues between component containers if they get out of sync
Next steps with containers
Now that we’re at the end of the lesson material, take a moment to reflect on what you’ve learned, how it applies to you, and what to do next.
- In your own notes, write down or diagram your understanding of Docker containers: concepts, commands, and how they work.
- In the workshop’s shared notes document, write down how you think you might use containers in your daily work. If there’s something you want to try doing with containers right away, what is a next step after this workshop to make that happen?
Key Points
Container images allow us to encapsulate the computation (and data) we have used in our research.
Using a service such as Docker Hub allows us to easily share computational work we have done.
Using container images along with a DOI service such as Zenodo allows us to capture our work and enables reproducibility.
Tools such as Docker Compose, Docker Swarm and Kubernetes allow us to describe how multiple containers work together.