Containers

Linux containers

What are containers?

https://upload.wikimedia.org/wikipedia/commons/d/d0/Container_lashing_with_rods.jpg

Source: Danny Cornelissen, Attribution, via Wikimedia Commons.

A container can be seen as a minimal virtual environment that can be used on any Linux-compatible machine (and beyond).

Using containers is time- and resource-saving as they allow:

Controlling for software installation and dependencies.
Reproducibility of the analysis.

Containers allow us to use exactly the same versions of the tools.

Virtual machines or containers?

Virtualisation	Containerisation (aka lightweight virtualisation)
Abstraction of physical hardware	Abstraction of application layer
Depends on hypervisor (software)	Depends on host kernel (OS)
Do not confuse with hardware emulator	Application and dependencies bundled all together
Enable virtual machines	Every virtual machine with an OS (Operating System)

Virtual machines vs containers

https://raw.githubusercontent.com/collabnix/dockerlabs/master/beginners/docker/images/vm-docker5.png

Source

Pros and cons

ADV	Virtualisation	Containerisation
PROS.	Very similar to a full OS. High OS diversity.	No need for a full OS installation (less space). Faster than virtual machines. Easier automation. Easier distribution of recipes. Better portability.
CONS.	Need more space and resources. Slower than containers. Harder to automate.	Some cases might not behave exactly as in a full OS. Still less OS diversity, even with current solutions.

Docker

What is Docker?

Platform for developing, shipping and running applications.
Infrastructure as application / code.
First version: 2013.
Company: originally dotCloud (2010), later named Docker.
Established Open Container Initiative.

As a software:

Docker Community Edition.
Docker Enterprise Edition.

There is an increasing number of alternative container technologies and providers. Many of them are based on software components originally from the Docker stack, and they normally try to address specific use cases or weak points. For example, Singularity, which we introduce later in this course, is focused on HPC environments. Another, Podman, keeps high functional compatibility with Docker but takes a different approach to technology (no daemon) and permissions.

Docker components

http://apachebooster.com/kb/wp-content/uploads/2017/09/docker-architecture.png

Read-only templates.
Containers are run from them.
Images are not run.
Images have several layers.

Images versus containers

Image: A set of layers, read-only templates, inert.
An instance of an image is called a container.

When you start an image, you have a running container of this image. You can have many running containers of the same image.

“The image is the recipe, the container is the cake; you can make as many cakes as you like with a given recipe.”

https://stackoverflow.com/questions/23735149/what-is-the-difference-between-a-docker-image-and-a-container

Docker vocabulary

docker

Get help:

docker run --help

Using existing images

Explore Docker hub

Images can be stored locally or shared in a registry.

Docker hub is the main public registry for Docker images.

Let’s search the keyword ubuntu:

docker pull: import image

get latest image / latest release

docker pull ubuntu

choose the version of Ubuntu you are fetching: check the different tags

docker pull ubuntu:22.04
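When no tag is given, Docker assumes latest, so docker pull ubuntu and docker pull ubuntu:22.04 may fetch different images. The sketch below spells out that defaulting rule as a small shell function. Note that with_default_tag is a hypothetical helper written only for this illustration; it is not a Docker command.

```shell
# with_default_tag is a hypothetical helper, not part of Docker.
# It prints the image reference with ":latest" appended when no tag
# (or digest) is present, mimicking how a bare image name is resolved.
with_default_tag() {
    ref=$1
    # Only inspect the part after the last "/", so that a registry port
    # such as localhost:5000/ubuntu is not mistaken for a tag.
    name=${ref##*/}
    case $name in
        *:*|*@*) printf '%s\n' "$ref" ;;
        *)       printf '%s:latest\n' "$ref" ;;
    esac
}

with_default_tag ubuntu
with_default_tag ubuntu:22.04
```

Pinning an explicit tag such as ubuntu:22.04 is usually the safer choice for reproducibility, because latest can point to a different image over time.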

docker images: list images

docker images

Each image has a unique IMAGE ID.

docker run: run image, i.e. start a container

Now we want to use what is inside the image.

docker run creates a fresh container (active instance of the image) from a Docker (static) image, and runs it.

The format is:

docker run image:tag command

docker run ubuntu:22.04 /bin/ls

Now execute ls in your current working directory: is the result the same?

You can execute any program/command that is stored inside the image:

docker run ubuntu:22.04 /bin/whoami
docker run ubuntu:22.04 cat /etc/issue

You can either execute programs in the image from the command line (see above) or execute a container interactively, i.e. “enter” the container.

With --name you can provide a name to the container.

docker run -it ubuntu:22.04 /bin/bash

docker run --name myubuntu -it ubuntu:22.04 /bin/bash

docker ps: check containers status

List running containers:

docker ps

List all containers (whether they are running or not):

docker ps -a

docker rm, docker rmi: clean up!

docker rm myubuntu
docker rm -f myubuntu

docker rmi ubuntu:22.04

Volumes

Docker containers are fully isolated. It is necessary to mount volumes in order to handle input/output files.

Syntax: --volume/-v host:container

mkdir data
# We can also copy the FASTQ we used in data
docker run --volume $(pwd)/data:/scratch --name fastqc_container biocontainers/fastqc:v0.11.9_cv7 fastqc /scratch/B7_input_s_chr19.fastq.gz
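As a minimal sketch of how a mounted volume carries files in both directions, the commands below write a file on the host, transform it inside a container, and read the result back on the host. The scratch directory, the file names and the choice of ubuntu:22.04 are assumptions made for illustration; the docker step is skipped when no Docker daemon is reachable.

```shell
# Hypothetical scratch directory on the host
workdir=$(mktemp -d)
mkdir -p "$workdir/data"
echo "hello from the host" > "$workdir/data/input.txt"

if docker info >/dev/null 2>&1; then
    # Host path on the left of ":", path inside the container on the right.
    # --rm removes the container once the command has finished.
    docker run --rm --volume "$workdir/data":/scratch ubuntu:22.04 \
        sh -c 'tr a-z A-Z < /scratch/input.txt > /scratch/output.txt' \
        && cat "$workdir/data/output.txt"   # upper-cased copy, read on the host
else
    echo "Docker is not available here; skipping the docker run step."
fi
```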

docker run --user

It is possible to run a container as a specific user by adding the `--user` option to `docker run`.

A convenient command would be:

docker run --user $(id -u):$(id -g) --volume $(pwd)/data:/scratch --name user_test biocontainers/fastqc:v0.11.9_cv7 touch /scratch/userfile
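On a typical Linux installation, processes in many images run as root by default, so files they write to a mounted volume end up owned by root on the host. The sketch below shows what $(id -u):$(id -g) expands to and, where Docker is available, lists the numeric owner of a file created from inside a container. The directory and the use of ubuntu:22.04 are assumptions for illustration.

```shell
# Numeric user and group IDs of the current host user
user_spec="$(id -u):$(id -g)"
echo "Container processes will run as $user_spec"

# Hypothetical output directory on the host
outdir=$(mktemp -d)

if docker info >/dev/null 2>&1; then
    docker run --rm --user "$user_spec" --volume "$outdir":/scratch \
        ubuntu:22.04 touch /scratch/userfile \
        && ls -ln "$outdir/userfile"   # owner columns should match user_spec
else
    echo "Docker is not available here; skipping the docker run step."
fi
```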

Build images

OS commands in image building

Depending on the underlying OS, there are different ways to build images.

Know your base system and its packages. Popular ones:

Debian
CentOS
Alpine
Conda. Anaconda, Conda-forge, Bioconda, etc.

Update and upgrade packages

In Ubuntu:

apt-get update && apt-get upgrade -y

In CentOS:

yum check-update && yum update -y

Search and install packages

In Ubuntu:

apt search libxml2
apt install -y libxml2-dev

In CentOS:

yum search libxml2
yum install -y libxml2-devel.x86_64

Note the -y option that we set for updating and for installing. It is an important option in the context of Docker: it answers yes to all questions asked during installation, which is needed because an image build cannot answer them interactively.

Building recipes

All commands should be saved in a text file, named by default Dockerfile.

Basic instructions

Each instruction in the recipe is one step of the build; instructions that change the filesystem (such as RUN, COPY and ADD) each add a layer to the final image.

FROM: parent image. Typically, an operating system. The base layer.

FROM ubuntu:22.04

RUN: the command to execute inside the image filesystem.

Think about it this way: every RUN line is essentially what you would run to install programs on a freshly installed Ubuntu OS.

RUN apt update && apt install -y wget

A basic recipe:

FROM ubuntu:22.04

RUN apt update && apt -y upgrade
RUN apt install -y wget

docker build

Implicitly looks for a file named Dockerfile in the current directory:

docker build .

Same as:

docker build --file Dockerfile .

Syntax: --file / -f

. stands for the context (in this case, the current directory) of the build process. The context matters when the recipe copies files from the host filesystem, for instance. IMPORTANT: Avoid contexts (directories) overpopulated with files (even if not actually used in the recipe).
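One way to keep the context small is to build from a dedicated directory that contains only the recipe and the files it needs. The following is a sketch under that assumption: the temporary directory and the two-line recipe are made up for illustration, and the build itself is skipped when Docker is not reachable.

```shell
# A dedicated, otherwise empty directory keeps the build context small.
build_dir=$(mktemp -d)

# Write a minimal recipe into that directory.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y wget
EOF

if docker info >/dev/null 2>&1; then
    # --file points at the recipe; the last argument is the context.
    docker build --file "$build_dir/Dockerfile" "$build_dir" \
        || echo "Build failed (for example, no network access)."
else
    echo "Docker is not available here; skipping docker build."
fi
```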

You can define a specific name for the image during the build process.

Syntax: -t imagename:tag. If `:tag` is not given, it defaults to latest.

docker build -t mytestimage-$USER .
# Same as:
docker build -t mytestimage-$USER:latest .

To prevent some directories or files in the build context from being inspected or included (e.g., by a COPY instruction in the Dockerfile), you can use a .dockerignore file in the context directory to specify which paths should be skipped. More information at: https://codefresh.io/docker-tutorial/not-ignore-dockerignore-2/
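A .dockerignore file is a plain list of patterns, one per line, placed at the top of the build context. The sketch below creates one in a temporary directory; the directory and the patterns (a data directory, FASTQ files, Git metadata) are hypothetical examples of content that is rarely needed inside an image.

```shell
# Hypothetical project directory used as the build context
context_dir=$(mktemp -d)
mkdir -p "$context_dir/data"

# One pattern per line; lines starting with # are comments.
cat > "$context_dir/.dockerignore" <<'EOF'
# Input data is not needed to build the image
data/
# FASTQ files at the top level of the context
*.fastq.gz
# Version-control metadata
.git
EOF

cat "$context_dir/.dockerignore"
```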

NOTE: We use the $USER shell environment variable to avoid conflicts between different users on the same machine. This is a common practice in HPC environments.

The last line of the build output should be Successfully built …: then you are good to go. Newer Docker versions that build with BuildKit print a different summary instead.

Check with docker images that you see the newly built image in the list…

Then let’s check the ID of the image and run it!

docker images

docker run f9f41698e2f8
docker run mytestimage-$USER

More instructions

WORKDIR: all subsequent actions will be executed in that working directory

# Use an absolute path: WORKDIR does not expand ~
WORKDIR /root

ADD, COPY: add files to the image filesystem

Difference between ADD and COPY:

COPY: lets you copy a local file or directory from your host (the machine from which you are building the image) into the image.

ADD: same, but ADD works also for URLs, and for .tar archives that will be automatically extracted upon being copied.

If we have a file, let’s say `example.jpg`, we can copy it.

# COPY source destination
COPY example.jpg .

A more sophisticated case:

FROM ubuntu:22.04

RUN apt update && apt -y upgrade
RUN apt install -y wget

RUN mkdir -p /data

WORKDIR /data

COPY example.jpg .

CMD, ENTRYPOINT: command to execute when generated container starts

The ENTRYPOINT specifies a command that will always be executed when the container starts. The CMD specifies default arguments that will be fed to the ENTRYPOINT; arguments given after the image name in docker run replace them.

In the example below, when the container is run without an argument, it will execute echo “hello world”. If it is run with the argument hello moon, it will execute echo “hello moon”.

FROM ubuntu:22.04
ENTRYPOINT ["/bin/echo"]
CMD ["hello world"]
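To see the two behaviours side by side, the recipe above can be built and run twice, once without and once with arguments. This is a sketch: the image name echo-demo and the temporary build directory are hypothetical, and the docker commands are skipped when Docker is not reachable.

```shell
# Write the recipe into a fresh, otherwise empty build context.
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
FROM ubuntu:22.04
ENTRYPOINT ["/bin/echo"]
CMD ["hello world"]
EOF

if docker info >/dev/null 2>&1; then
    # echo-demo is a hypothetical image name.
    # First run: no arguments, so CMD supplies "hello world".
    # Second run: the arguments "hello moon" replace CMD.
    docker build -t echo-demo "$demo_dir" \
        && docker run --rm echo-demo \
        && docker run --rm echo-demo hello moon
else
    echo "Docker is not available here; skipping build and run."
fi
```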

A more complex recipe (save it in a text file named Dockerfile):

FROM ubuntu:22.04

RUN mkdir -p /downloads
WORKDIR /downloads

RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y wget

ENTRYPOINT ["/usr/bin/wget"]
CMD ["https://cdn.wp.nginx.com/wp-content/uploads/2016/07/docker-swarm-hero2.png"]

docker run f9f41698e2f8 https://cdn-images-1.medium.com/max/1600/1*_NQN6_YnxS29m8vFzWYlEg.png

docker tag

To tag a local image with ID “e23aaea5dff1” into the “ubuntu_wget” image name repository with version “1.0”:

docker tag e23aaea5dff1 ubuntu_wget:1.0

More complex examples

Check in containers/alba directory

Additional docker commands

docker commit: Turn a container into an image
docker save: Save an image to a tar archive
docker load: Load an image from a tar archive
docker export: Export a container’s filesystem as a tar archive (little used)
docker import: Import the contents from a tarball to create a filesystem image (little used)

Recommended workflow: If necessary, commit a Docker container into an image and then save it into a tar archive that can be shared and loaded on another machine.
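The commit, save and load steps can be chained as in the sketch below. The container name commit_demo, the tag snapshot and the archive location are hypothetical; the docker commands are skipped when Docker is not reachable.

```shell
# Hypothetical location for the archive to be shared
archive_dir=$(mktemp -d)
archive="$archive_dir/commit_demo.tar"

if docker info >/dev/null 2>&1; then
    # 1. Create a container whose filesystem differs from its base image.
    # 2. Commit that container into a new image.
    # 3. Save the image into a tar archive.
    # 4. Remove the container, which is no longer needed.
    docker run --name commit_demo ubuntu:22.04 touch /made_in_container \
        && docker commit commit_demo commit_demo:snapshot \
        && docker save -o "$archive" commit_demo:snapshot \
        && docker rm commit_demo
    # On the other machine, after copying the archive there:
    #   docker load -i commit_demo.tar
else
    echo "Docker is not available here; skipping commit and save."
fi
```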

Reference: https://www.baeldung.com/ops/docker-save-export

Major clean

Check used space

docker system df

Remove unused containers (and others) - DO WITH CARE

docker system prune

Remove ALL non-running containers, images, etc. - DO WITH MUCH MORE CARE!!!

docker system prune -a

Reference: https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes

Singularity

Focus: reproducibility for scientific computing and the high-performance computing (HPC) world.
Origin: Lawrence Berkeley National Laboratory. Later spin-off: Sylabs.
Version 1.0: 2016.
More information: https://en.wikipedia.org/wiki/Singularity_(software)

Singularity architecture

Strengths	Weaknesses
No dependency of a daemon	At the time of writing only good support in Linux
Can be run as a simple user	Mac experimental. Desktop edition. Only running
Avoids permission headaches and hacks	For some features you need root account (or sudo)
Image/container is a file (or directory)
More easily portable
Two types of images: read-only (production) and writable (development, via sandbox)

Trivia

Nowadays, there may be some confusion since there are two projects: Apptainer and SingularityCE.

They “forked” in 2021. So far they share most of the codebase, but this could change over time, and the two tools might end up with different functionality.

On the command line, you may have apptainer installed, with singularity available as an alias for it.

Container registries

Container images, normally in several versions, are stored in container repositories.

These repositories can be browsed or discovered within container registries, which are normally public.

Docker hub

It is the first and most popular public container registry (it also provides private repositories).

Example:

https://hub.docker.com/r/biocontainers/fastqc

singularity build fastqc-0.11.9_cv7.sif docker://biocontainers/fastqc:v0.11.9_cv7

Biocontainers

Website gathering bioinformatics-focused container images from different registries.

Originally Docker Hub was used, but now other registries are preferred.

Example: https://biocontainers.pro/tools/fastqc

Via quay.io

https://quay.io/repository/biocontainers/fastqc

singularity build fastqc-0.11.9.sif docker://quay.io/biocontainers/fastqc:0.11.9--0

Via Galaxy project prebuilt images

singularity pull --name fastqc-0.11.9.sif https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0

The Galaxy project provides all bioinformatics software from the BioContainers initiative as prebuilt Singularity images. If download and conversion time of images is an issue, this might be the best option for those working in the biomedical field.

Link: https://depot.galaxyproject.org/singularity/

From Docker daemon

If you have a Docker daemon running on your machine, you can also build images from it without needing to share them in a registry first.

singularity build myubuntu.sif docker-daemon://myubuntu:latest

From a Docker tar archive

If you saved a Docker image as a tar archive, you can also build images from it. This is useful when there is no Docker daemon running on the machine where you intend to use Singularity, which is common in HPC environments.

# Where you have a Docker daemon running
docker save -o myubuntu.tar myubuntu:latest
# Where you have Singularity
singularity build myubuntu.sif docker-archive://myubuntu.tar

Running and executing containers

Once we have some image files (or directories) ready, we can run processes.

Singularity shell

The straightforward exploratory approach is equivalent to docker run -ti biocontainers/fastqc:v0.11.9_cv7 /bin/sh but with a handier syntax.

singularity shell fastqc-0.11.9.sif

Move around the directories and notice how the isolation approach is different in comparison to Docker. You can access most of the host filesystem.

Singularity exec

This is the most common way to run Singularity and the normal approach in an HPC environment. It is comparable to docker run image command: unlike docker exec, it does not need an already running container.

singularity exec fastqc-0.11.9.sif fastqc

For example, to process a FASTQ file from the data directory:

singularity exec fastqc-0.11.9_cv7.sif fastqc B7_input_s_chr19.fastq.gz

Environment control

By default, Singularity inherits a profile environment (e.g., PATH environment variable). This may be convenient in some circumstances, but it can also lead to unexpected problems when your own environment clashes with the default one from the image.

singularity shell -e fastqc-0.11.9.sif
singularity exec -e fastqc-0.11.9.sif fastqc

Compare the output of the env command with and without the -e (--cleanenv) option.

singularity exec fastqc-0.11.9.sif env
singularity exec -e fastqc-0.11.9.sif env
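To make the comparison easier to read, the two outputs can be sorted and compared line by line. The sketch below assumes the fastqc-0.11.9.sif image is in the current directory; the temporary file names are hypothetical, and nothing is run when Singularity or the image is missing.

```shell
image=fastqc-0.11.9.sif
env_dir=$(mktemp -d)

if command -v singularity >/dev/null 2>&1 && [ -f "$image" ]; then
    # Environment as seen inside the container, without and with -e
    singularity exec "$image" env | sort > "$env_dir/env_default.txt"
    singularity exec -e "$image" env | sort > "$env_dir/env_clean.txt"
    # Lines present only without -e, i.e. inherited from the host
    comm -23 "$env_dir/env_default.txt" "$env_dir/env_clean.txt"
else
    echo "Singularity or $image not found; skipping the comparison."
fi
```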

Exercise

Process the two available FASTQ files using fastqc.

Singularity tips

Troubleshooting

singularity --help

Fakeroot

Singularity permissions are an evolving field. If you don’t have access to sudo, it might be worth considering the --fakeroot/-f parameter.

More details at https://apptainer.org/docs/user/main/fakeroot.html

Singularity cache directory

$HOME/.singularity

It stores cached images from registries, instances, etc.
If you run into problems, it may be a good place to clean up. When running with sudo, $HOME is /root.
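Before cleaning anything by hand, it can help to check how much space the cache takes and what it contains. The sketch below assumes a Singularity 3.x style setup, where the cache lives under $HOME/.singularity/cache unless SINGULARITY_CACHEDIR is set; Apptainer installations use different names, so treat the paths as assumptions.

```shell
# Cache location: SINGULARITY_CACHEDIR if set, otherwise the assumed default
cache_dir="${SINGULARITY_CACHEDIR:-$HOME/.singularity/cache}"
echo "Cache directory: $cache_dir"

# Disk space used by the cache, if it exists
if [ -d "$cache_dir" ]; then
    du -sh "$cache_dir"
fi

# Summary of cached items, if Singularity is installed
if command -v singularity >/dev/null 2>&1; then
    singularity cache list
fi

# To remove cached data afterwards:
#   singularity cache clean
```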

Global singularity configuration

Normally at /etc/singularity/singularity.conf or similar (e.g., preceded by /usr/local/).

It can only be modified by users with administration permissions.
Worth noting are the bind path lines, which define the directories mounted by default in containers.