Creating More Complex Container Images
Overview
Teaching: 15 min
Exercises: 20 min
Questions
How can I make more complex container images?
Objectives
Explain how you can include files within Docker container images when you build them.
Explain how you can access files on the Docker host from your Docker containers.
Building container images with your files included
To create and use your own container images, you may need more than our previous example covered. You may want to use files from outside the container that are not already included in the container image. You can do this by copying the files into the container image at build time.
Create a Python script
Before we go ahead and build our next container image, we’re going to create a simple Python script on our host system and create a Dockerfile to have this script copied into our container image when it is created.
In your shell, create a new directory to hold the build context for our new container image and move into the directory:
$ mkdir alpine-sum
$ cd alpine-sum
Use your text editor to create a Python script called sum.py with the following contents:
#!/usr/bin/env python3
import sys
try:
    total = sum(int(arg) for arg in sys.argv[1:])
    print('sum =', total)
except ValueError:
    print('Please supply integer arguments')
Let’s assume that we’ve finished with our sum.py script and want to add it to the container image itself.
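Before copying the script into an image, it can help to check its logic on the host. The following is a minimal sketch of the same summing behaviour wrapped in a function; the name sum_args is a hypothetical helper introduced here for illustration and is not part of sum.py.

```python
def sum_args(args):
    """Return the integer sum of a list of strings, or None if any is not an integer.

    Hypothetical helper mirroring the try/except logic in sum.py.
    """
    try:
        return sum(int(arg) for arg in args)
    except ValueError:
        return None


print(sum_args(["10", "11", "12"]))  # 33
print(sum_args([]))                  # 0 (no arguments sums to zero)
print(sum_args(["10", "x"]))         # None (sum.py prints a message instead)
```

The empty-list case is the one you will see later when the container is run without arguments.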
Create the Dockerfile
Now we have our Python script, we are going to create our Dockerfile. This is going to be similar to the Dockerfile we used in the previous section with the addition of one extra line. Here is the full Dockerfile:
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
RUN pip install cython
COPY sum.py /home
CMD ["python3", "--version"]
The additional line we have added is:
COPY sum.py /home
This line will cause Docker to copy the file from your computer into the container’s filesystem. Let’s build the container image like before, but give it a different name and then push it to Docker Hub (remember to substitute your Docker Hub username for alice):
$ docker image build --platform linux/amd64 -t alice/alpine-sum .
...output from docker build...
$ docker push alice/alpine-sum
The Importance of Command Order in a Dockerfile
When you run docker build, it executes the build in the order specified in the Dockerfile. This order matters for rebuilds, and you will typically want to put your RUN commands before your COPY commands.
Docker builds the layers of commands in order. This becomes important when you need to rebuild container images. If you change layers later in the Dockerfile and rebuild the container image, Docker doesn’t need to rebuild the earlier layers but will instead use a stored (called “cached”) version of those layers.
For example, suppose you wanted to copy multiply.py into the container image instead of sum.py. If the COPY line came before the RUN line, Docker would need to rebuild the COPY layer and every layer after it, including the RUN layer. If the COPY line came after the RUN line, Docker would use the cached RUN layer from the previous build and only rebuild the COPY layer.
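The caching rule described above can be sketched as a toy model. This is not how Docker is implemented (real builds also compare the contents of copied files, among other things); the function name layers_to_rebuild and the shortened RUN line are assumptions made purely for illustration.

```python
def layers_to_rebuild(old_steps, new_steps):
    """Return the steps of new_steps that cannot be reused from the cache.

    Toy model: a step is reused only if it, and every step before it,
    is identical to the corresponding step of the previous build.
    """
    for index, step in enumerate(new_steps):
        if index >= len(old_steps) or old_steps[index] != step:
            # Everything from the first changed step onwards is rebuilt.
            return new_steps[index:]
    return []


RUN_STEP = "RUN apk add --update python3"  # shortened for the example

# RUN before COPY: only the COPY layer is rebuilt.
run_first_old = ["FROM alpine", RUN_STEP, "COPY sum.py /home"]
run_first_new = ["FROM alpine", RUN_STEP, "COPY multiply.py /home"]
print(layers_to_rebuild(run_first_old, run_first_new))

# COPY before RUN: the COPY layer and the slow RUN layer are both rebuilt.
copy_first_old = ["FROM alpine", "COPY sum.py /home", RUN_STEP]
copy_first_new = ["FROM alpine", "COPY multiply.py /home", RUN_STEP]
print(layers_to_rebuild(copy_first_old, copy_first_new))
```

Placing the steps that change most often last keeps the expensive package installation in the cache.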
Exercise: Did it work?
Can you remember how to run a container interactively on the remote HPC system? Try that with this one. Once inside, try running the Python script you added to the container image.
Solution
You can start the container interactively on the remote HPC system like so (remember to use your Docker Hub username):
remote$ singularity pull alpine-sum.sif docker://alice/alpine-sum
remote$ singularity shell alpine-sum.sif
You should be able to run the python command inside the container like this:
Singularity> python3 /home/sum.py
The COPY keyword can be used to place your own scripts or data into a container image that you want to publish or use as a record. Note that it’s not necessarily a good idea to put your scripts inside the container image if you’re constantly changing or editing them. In that case, it is better to reference the scripts from outside the container by bind mounting host data into the running container, as we saw earlier in the workshop. You should also think carefully about size: if you run docker image ls you’ll see the size of each container image in the rightmost column. The bigger your container image becomes, the longer it will take to download.
Security Warning
Login credentials, including passwords, tokens, secure access tokens or other secrets, must never be stored in a container image. If secrets are stored in an image, they are at high risk of being found and exploited once the image is made public.
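One possible way to keep a secret out of the image is for a script to read it from the environment when the container starts, rather than from a file copied in at build time. The sketch below assumes a hypothetical variable name, MY_SERVICE_TOKEN, and a hypothetical helper, get_token; neither comes from this lesson.

```python
import os


def get_token(env=os.environ, name="MY_SERVICE_TOKEN"):
    """Return a secret supplied at run time, or fail with a clear message.

    MY_SERVICE_TOKEN is a hypothetical variable name used for illustration.
    The value is never written into the script or the container image.
    """
    token = env.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; supply it when starting the container")
    return token


# Example with an explicit mapping standing in for the real environment.
print(get_token({"MY_SERVICE_TOKEN": "example-value"}))
```

Failing loudly when the variable is missing avoids the temptation to hard-code a fallback value.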
Copying alternatives
Another trick for getting your own files into a container image is to use the RUN keyword to download the files from the internet. For example, if your code is in a GitHub repository, you could include this statement in your Dockerfile to download the latest version when you build the container image (unless Docker reuses a cached layer for this step):
RUN git clone https://github.com/alice/mycode
Similarly, the wget command can be used to download any file publicly available on the internet:
RUN wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ncbi-blast-2.10.0+-x64-linux.tar.gz
Note that the above RUN examples depend on commands (git and wget respectively) that must be available within your container: Linux distributions such as Alpine may require you to install such commands before using them within RUN statements.
More fancy Dockerfile options (optional, for presentation or as exercises)
We can expand on the example above to make our container image even more “automatic”. Here are some ideas:
Make the sum.py script run automatically
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
COPY sum.py /home
# Run the sum.py script as the default command
CMD ["python3", "/home/sum.py"]
Build and push this:
$ docker image build --platform linux/amd64 -t alice/alpine-sum:v1 .
$ docker push alice/alpine-sum:v1
You’ll notice that you can run the container without arguments just fine, resulting in sum = 0, but this is boring. Supplying arguments, however, doesn’t work:
remote$ singularity pull alpine-sum_v1.sif docker://alice/alpine-sum:v1
remote$ singularity run alpine-sum_v1.sif 10 11 12
results in:
FATAL: "10": executable file not found in $PATH
This is because arguments given on the command line replace the default CMD entirely, so the container tries to run 10 as a command instead of passing 10 11 12 as arguments to the sum.py script.
To achieve the goal of having a command that always runs when a container is run from the container image and can be passed the arguments given on the command line, use the keyword ENTRYPOINT in the Dockerfile.
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
COPY sum.py /home
# Run the sum.py script as the default command and
# allow people to enter arguments for it
ENTRYPOINT ["python3", "/home/sum.py"]
# Give default arguments, in case none are supplied on
# the command-line
CMD ["10", "11"]
Build and push this:
$ docker image build --platform linux/amd64 -t alice/alpine-sum:v2 .
$ docker push alice/alpine-sum:v2
remote$ singularity pull alpine-sum_v2.sif docker://alice/alpine-sum:v2
remote$ singularity run alpine-sum_v2.sif 10 11 12
sum = 33
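The difference between the v1 and v2 images can be summarised with a small toy model of how the final command is assembled when ENTRYPOINT and CMD are written in list form. The function effective_command is a hypothetical illustration, not part of Docker or Singularity.

```python
def effective_command(entrypoint, cmd, args):
    """Toy model: run-time arguments replace CMD, while ENTRYPOINT is always kept."""
    return list(entrypoint) + (list(args) if args else list(cmd))


# v1: CMD only. Supplying arguments discards the python3 command entirely.
print(effective_command([], ["python3", "/home/sum.py"], []))
print(effective_command([], ["python3", "/home/sum.py"], ["10", "11", "12"]))

# v2: ENTRYPOINT plus default CMD. Arguments replace only the defaults.
print(effective_command(["python3", "/home/sum.py"], ["10", "11"], []))
print(effective_command(["python3", "/home/sum.py"], ["10", "11"], ["10", "11", "12"]))
```

In the second v1 case the resulting command starts with 10, which is why the container reports that the executable cannot be found.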
Add the sum.py script to the PATH so you can run it directly:
FROM alpine
RUN apk add --update python3 py3-pip python3-dev
COPY sum.py /home
# set script permissions
RUN chmod +x /home/sum.py
# add /home folder to the PATH
ENV PATH=/home:$PATH
Build and push this:
$ docker image build --platform linux/amd64 -t alice/alpine-sum:v3 .
$ docker push alice/alpine-sum:v3
Pull and test it:
remote$ singularity pull alpine-sum_v3.sif docker://alice/alpine-sum:v3
remote$ singularity exec alpine-sum_v3.sif sum.py 10 11 12
sum = 33
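Both the chmod +x step and the ENV PATH line are needed for sum.py to be found by name. The sketch below imitates that lookup on a POSIX host using Python’s shutil.which, with a temporary directory standing in for /home; the helper find_on_path is a hypothetical name introduced for this example.

```python
import os
import shutil
import tempfile


def find_on_path(name, extra_dir):
    """Look up name with extra_dir placed at the front of the search path."""
    search_path = extra_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(name, path=search_path)


with tempfile.TemporaryDirectory() as home:
    script = os.path.join(home, "sum.py")
    with open(script, "w") as handle:
        handle.write("#!/usr/bin/env python3\nprint('sum = 0')\n")

    os.chmod(script, 0o644)  # readable but not executable
    before_chmod = find_on_path("sum.py", home)

    os.chmod(script, 0o755)  # same effect as chmod +x for this file
    after_chmod = find_on_path("sum.py", home)

print("found before chmod +x:", before_chmod == script)
print("found after chmod +x: ", after_chmod == script)
```

The lookup only succeeds once the file is executable and its directory is on the search path, which is what the two extra Dockerfile lines arrange.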
Best practices for writing Dockerfiles
Take a look at Nüst et al.’s “Ten simple rules for writing Dockerfiles for reproducible data science” [1] for some great examples of best practices to use when writing Dockerfiles. The GitHub repository associated with the paper also has a set of example Dockerfiles demonstrating how the rules highlighted by the paper can be applied.
[1] Nüst D, Sochat V, Marwick B, Eglen SJ, Head T, et al. (2020) Ten simple rules for writing Dockerfiles for reproducible data science. PLOS Computational Biology 16(11): e1008316. https://doi.org/10.1371/journal.pcbi.1008316
Key Points
Docker allows containers to read and write files from the Docker host.
You can include files from your Docker host into your Docker container images by using the COPY instruction in your Dockerfile.