Containers

Linux containers

What are containers?

https://www.synopsys.com/blogs/software-security/wp-content/uploads/2018/04/containers-rsa.jpg

A container can be seen as a minimal virtual environment that can be used on any Linux-compatible machine (and beyond).

Using containers is time- and resource-saving as they allow:

  • Control over software installation and dependencies.

  • Reproducibility of the analysis.

Containers allow us to use exactly the same versions of the tools.

Virtual machines or containers?

Virtualisation:

  • Abstraction of physical hardware.

  • Depends on a hypervisor (software).

  • Not to be confused with a hardware emulator.

  • Enables virtual machines.

  • Every virtual machine has its own OS (Operating System).

Containerisation (aka lightweight virtualisation):

  • Abstraction of the application layer.

  • Depends on the host kernel (OS).

  • Application and dependencies are bundled together.

Virtual machines vs containers

https://raw.githubusercontent.com/collabnix/dockerlabs/master/beginners/docker/images/vm-docker5.png

Source

Pros and cons

Virtualisation

PROS.

  • Very similar to a full OS.

  • High OS diversity.

CONS.

  • Needs more space and resources.

  • Slower than containers.

  • Harder to automate.

Containerisation

PROS.

  • No need for a full OS installation (less space).

  • Better portability.

  • Faster than virtual machines.

  • Easier automation.

  • Easier distribution of recipes.

CONS.

  • In some cases the behaviour might not be exactly the same as in a full OS.

  • Still less OS diversity, even with current solutions.

Docker

https://connpass-tokyo.s3.amazonaws.com/thumbs/80/52/80521f18aec0945dfedbb471dad6aa1a.png

What is Docker?

  • Platform for developing, shipping and running applications.

  • Infrastructure as application / code.

  • First version: 2013.

  • Company: originally dotCloud (2010), later renamed Docker.

  • Established the Open Container Initiative.

As a software:

There is an increasing number of alternative container technologies and providers. Many of them are based on software components that originally came from the Docker stack, and they normally try to address specific use cases or weak points. As an example, Singularity, which we introduce later in this course, is focused on HPC environments. Another case, Podman, keeps a high functional compatibility with Docker but takes a different approach to technology (it does not rely on a daemon) and to permissions.

Docker components

http://apachebooster.com/kb/wp-content/uploads/2017/09/docker-architecture.png

  • Images are read-only templates.

  • Containers are run from them.

  • Images are not run.

  • Images have several layers.

https://i.stack.imgur.com/vGuay.png

Images versus containers

  • Image: A set of layers, read-only templates, inert.

  • An instance of an image is called a container.

When you start an image, you have a running container of this image. You can have many running containers of the same image.

“The image is the recipe, the container is the cake; you can make as many cakes as you like with a given recipe.”

https://stackoverflow.com/questions/23735149/what-is-the-difference-between-a-docker-image-and-a-container

Docker vocabulary

docker
_images/docker_vocab.png

Get help:

docker run --help
_images/docker_run_help.png

Using existing images

Explore Docker Hub

Images can be stored locally or shared in a registry.

Docker Hub is the main public registry for Docker images.

Let’s search for the keyword ubuntu:

_images/dockerhub_ubuntu.png

docker pull: import image

  • get latest image / latest release

docker pull ubuntu
_images/docker_pull.png
  • choose the version of Ubuntu you are fetching: check the different tags

_images/dockerhub_ubuntu_1804.png
docker pull ubuntu:18.04

Biocontainers

https://biocontainers.pro/

A dedicated directory of bioinformatics-related entries.

Example: FastQC

https://biocontainers.pro/#/tools/fastqc

docker pull biocontainers/fastqc:v0.11.9_cv7

docker images: list images

docker images
_images/docker_images_list.png

Each image has a unique IMAGE ID.

docker run: run image, i.e. start a container

Now we want to use what is inside the image.

docker run creates a fresh container (active instance of the image) from a Docker (static) image, and runs it.

The format is:

docker run image:tag command

docker run ubuntu:18.04 /bin/ls
_images/docker_run_ls.png

Now execute ls in your current working directory: is the result the same?

You can execute any program/command that is stored inside the image:

docker run ubuntu:18.04 /bin/whoami
docker run ubuntu:18.04 cat /etc/issue
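
Because every docker run creates a fresh container, stopped containers accumulate after a few commands. The following is a minimal sketch of how to observe this and how the --rm option avoids it. It only runs the Docker commands if a Docker daemon is reachable, and it assumes the ubuntu:18.04 image used above.

```shell
#!/usr/bin/env bash
# Sketch: containers created by "docker run" remain after the command ends,
# unless --rm is given.
IMAGE="ubuntu:18.04"

if docker info >/dev/null 2>&1; then
    # Without --rm, the container stays around in the "exited" state
    docker run "$IMAGE" /bin/whoami
    # List the exited containers that were created from this image
    docker ps -a --filter "ancestor=$IMAGE" --filter "status=exited"
    # With --rm, the container is removed as soon as the command finishes
    docker run --rm "$IMAGE" cat /etc/issue
else
    echo "Docker is not available; skipping the example."
fi
```

For one-off commands such as the ones above, --rm is usually convenient; leave it out when you want to inspect or restart the container afterwards.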

You can either execute programs in the image from the command line (see above) or execute a container interactively, i.e. “enter” the container.

docker run -it ubuntu:18.04 /bin/bash

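Run container as daemon (in background). Either keep it alive with a terminal attached (-ti) or with a long-running command:

docker run -ti --detach ubuntu:18.04

docker run --detach ubuntu:18.04 tail -f /dev/null

Run container as daemon (in background) with a given name. Container names must be unique, so use only one of these two alternatives:

docker run -ti --detach --name myubuntu ubuntu:18.04

docker run --detach --name myubuntu ubuntu:18.04 tail -f /dev/null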

docker ps: check containers status

List running containers:

docker ps

List all containers (whether they are running or not):

docker ps -a

Each container has a unique ID.
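
To see that several containers can share one image while each keeps its own ID, here is a minimal sketch. The container names demo1 and demo2 are hypothetical, chosen so that they do not clash with myubuntu, and the Docker commands only run if a Docker daemon is reachable.

```shell
#!/usr/bin/env bash
# Sketch: two containers started from the same image get different IDs.
# The names demo1 and demo2 are assumptions made for this example.
IMAGE="ubuntu:18.04"
NAMES="demo1 demo2"

if docker info >/dev/null 2>&1; then
    for name in $NAMES; do
        # Each "docker run" creates a new, independent container
        docker run --detach --name "$name" "$IMAGE" tail -f /dev/null
    done
    # Same IMAGE column, different CONTAINER ID and NAMES columns
    docker ps --filter "ancestor=$IMAGE" --format 'table {{.ID}}\t{{.Image}}\t{{.Names}}'
    # Remove the example containers again
    docker rm -f $NAMES
else
    echo "Docker is not available; skipping the example."
fi
```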

docker exec: execute process in running container

docker exec myubuntu uname -a
  • Interactively

docker exec -it myubuntu /bin/bash

docker stop, start, restart: actions on container

Stop a running container:

docker stop myubuntu

docker ps -a

Start a stopped container (does NOT create a new one):

docker start myubuntu

docker ps -a

Restart a running container:

docker restart myubuntu

docker ps -a

Run with restart enabled

docker run --restart=unless-stopped --detach --name myubuntu2 ubuntu:18.04 tail -f /dev/null
  • Restart policies: no (default), always, on-failure, unless-stopped

Update restart policy

docker update --restart unless-stopped myubuntu
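
The effect of stop, start and a changed restart policy can be checked with docker inspect. The sketch below is one possible way to do it; the container name lifecycle_demo and the helper function show_state are hypothetical, and the Docker commands only run if a Docker daemon is reachable.

```shell
#!/usr/bin/env bash
# Sketch: follow the state of a container through its life cycle.
NAME="lifecycle_demo"   # hypothetical container name
IMAGE="ubuntu:18.04"

# Print the state (running, exited, ...) and the restart policy of a container
show_state() {
    docker inspect --format '{{.Name}}: {{.State.Status}} (restart policy: {{.HostConfig.RestartPolicy.Name}})' "$1"
}

if docker info >/dev/null 2>&1; then
    docker run --detach --name "$NAME" "$IMAGE" tail -f /dev/null
    show_state "$NAME"      # state after creation
    docker stop "$NAME"
    show_state "$NAME"      # state after stopping
    docker start "$NAME"    # same container, not a new one
    docker update --restart unless-stopped "$NAME"
    show_state "$NAME"      # state and policy after the update
    docker rm -f "$NAME"    # clean up the example container
else
    echo "Docker is not available; skipping the example."
fi
```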

docker rm, docker rmi: clean up!

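# Remove a stopped container
docker rm myubuntu
# Alternatively, force removal even if the container is still running
docker rm -f myubuntu
# Remove an image (fails while a container still uses it)
docker rmi ubuntu:18.04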

Major clean

Check used space

docker system df

Remove stopped containers, unused networks, dangling images and build cache - DO WITH CARE

docker system prune

Remove, in addition, ALL images that are not used by any container - DO WITH MUCH MORE CARE!!!

docker system prune -a
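
Before pruning, it can help to look at what is there. The following sketch lists exited containers and dangling images, which are among the items that docker system prune deletes. The function names are hypothetical helpers, nothing is removed, and the Docker commands only run if a Docker daemon is reachable.

```shell
#!/usr/bin/env bash
# Sketch: inspect candidates for removal without deleting anything.

# Exited containers are deleted by "docker system prune"
list_stopped_containers() {
    docker ps -a --filter "status=exited"
}

# Dangling (untagged) images are deleted as well
list_dangling_images() {
    docker images --filter "dangling=true"
}

if docker info >/dev/null 2>&1; then
    docker system df
    list_stopped_containers
    list_dangling_images
else
    echo "Docker is not available; skipping the example."
fi
```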

Volumes

Docker containers are isolated from the host filesystem by default. It is necessary to mount volumes in order to handle input/output files.

Syntax: --volume/-v host_path:container_path

mkdir data
touch data/test
# We can also copy the FASTQ we used in previous exercises... cp ...
docker run --detach --volume $(pwd)/data:/scratch --name fastqc_container biocontainers/fastqc:v0.11.9_cv7 tail -f /dev/null
docker exec -ti fastqc_container /bin/bash
> ls -l /scratch
# We can also run fastqc from here
> cd /scratch; fastqc SRR6466185_1.fastq.gz
> exit
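
The same mount can also be tried without entering the container. This is a minimal sketch that shows data flowing in both directions; the directory name volume_demo is hypothetical, and the Docker commands only run if a Docker daemon is reachable.

```shell
#!/usr/bin/env bash
# Sketch: a host directory mounted as /scratch is shared in both directions.
mkdir -p volume_demo
echo "hello from the host" > volume_demo/test

if docker info >/dev/null 2>&1; then
    # Host to container: the file written above is readable under /scratch
    docker run --rm --volume "$(pwd)/volume_demo:/scratch" ubuntu:18.04 cat /scratch/test
    # Container to host: a file created under /scratch appears in volume_demo
    docker run --rm --volume "$(pwd)/volume_demo:/scratch" ubuntu:18.04 touch /scratch/from_container
    # With a default Docker setup the new file is typically owned by root
    ls -l volume_demo
else
    echo "Docker is not available; skipping the example."
fi
```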

Singularity

Singularity architecture

_images/singularity_architecture.png

Strengths:

  • No dependency on a daemon.

  • Can be run as a simple user.

  • Avoids permission headaches and hacks.

  • Image/container is a file (or directory).

  • More easily portable.

  • Two types of images: read-only (production) and writable (development, via sandbox).

Weaknesses:

  • At the time of writing, only Linux is well supported.

  • Mac: experimental (Desktop edition, running only).

  • Some features require a root account (or sudo).

Trivia

Nowadays, there may be some confusion since there are two projects: the community-maintained Singularity (now Apptainer) and SingularityCE by Sylabs.

They “forked” in 2021. So far they share most of the codebase, but this could change over time, and the two might end up offering different functionality.

The original Singularity project is already “End Of Life”; its development continues under the name Apptainer, with the support of the Linux Foundation.

Container registries

Container images, normally in several versions, are stored in container repositories.

These repositories can be browsed or discovered within container registries, which are normally public.

Docker Hub

It is the first and most popular public container registry (it also provides private repositories).

Example:

https://hub.docker.com/r/biocontainers/fastqc

singularity build fastqc-0.11.9_cv7.sif docker://biocontainers/fastqc:v0.11.9_cv7

Biocontainers

A website gathering bioinformatics-focused container images from different registries.

Originally Docker Hub was used, but now other registries are preferred.

Example: https://biocontainers.pro/tools/fastqc

Via quay.io

https://quay.io/repository/biocontainers/fastqc

singularity build fastqc-0.11.9.sif docker://quay.io/biocontainers/fastqc:0.11.9--0
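
The two build commands differ only in the docker:// URI: without a registry host the image is looked up on Docker Hub, while a prefix such as quay.io selects another registry. The sketch below spells out this anatomy with a hypothetical helper function, docker_uri, which is not part of Singularity and only composes the string.

```shell
#!/usr/bin/env bash
# Sketch: compose a docker:// URI for "singularity build" or "singularity pull".
# Usage: docker_uri REPOSITORY TAG [REGISTRY]
docker_uri() {
    local repo="$1" tag="$2" registry="${3:-}"
    if [ -n "$registry" ]; then
        # Explicit registry host, e.g. quay.io
        printf 'docker://%s/%s:%s\n' "$registry" "$repo" "$tag"
    else
        # No registry host: Docker Hub is used
        printf 'docker://%s:%s\n' "$repo" "$tag"
    fi
}

docker_uri biocontainers/fastqc v0.11.9_cv7
docker_uri biocontainers/fastqc 0.11.9--0 quay.io
```

The output of the second call could be passed directly to the build command, for instance singularity build fastqc-0.11.9.sif "$(docker_uri biocontainers/fastqc 0.11.9--0 quay.io)".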

Via Galaxy project prebuilt images

singularity pull --name fastqc-0.11.9.sif https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0

The Galaxy project provides all bioinformatics software from the BioContainers initiative as prebuilt Singularity images. If the download and conversion time of images is an issue, this might be the best option for those working in the biomedical field.

Link: https://depot.galaxyproject.org/singularity/

Running and executing containers

Once we have some image files (or directories) ready, we can run processes.

Singularity shell

The most straightforward exploratory approach is equivalent to docker run -ti biocontainers/fastqc:v0.11.9_cv7 /bin/bash, but with a handier syntax.

singularity shell fastqc-0.11.9.sif

Move around the directories and notice how the isolation approach is different in comparison to Docker. You can access most of the host filesystem.

Singularity exec

This is the most common way to use Singularity, and the normal approach in an HPC environment. It is loosely comparable to docker exec, except that no previously started container is needed.

singularity exec fastqc-0.11.9.sif fastqc

Processing a FASTQ file from the data directory:

singularity exec fastqc-0.11.9_cv7.sif fastqc SRR6466185_1.fastq.gz

Singularity run

This executes the runscript from the recipe definition (equivalent to docker run). It is not so common in HPC; it is more common for instances (servers).

singularity run fastqc-0.11.9.sif

Environment control

By default Singularity inherits the environment of the host shell (e.g., the PATH environment variable). This may be convenient in some circumstances, but it can also lead to unexpected problems when your own environment clashes with the default one from the image. The -e (--cleanenv) option starts the container with a clean environment instead:

singularity shell -e fastqc-0.11.9.sif
singularity exec -e fastqc-0.11.9.sif fastqc
singularity run -e fastqc-0.11.9.sif

Compare the output of the env command with and without the -e option.

singularity exec fastqc-0.11.9.sif env
singularity exec -e fastqc-0.11.9.sif env
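
A quick way to see the difference is to define a variable on the host and look for it inside the container. This is a minimal sketch; the variable name MY_HOST_VARIABLE is hypothetical, and the Singularity commands only run if singularity is installed and the image file from the previous steps exists in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: a host variable is visible by default and hidden with -e.
export MY_HOST_VARIABLE="set on the host"   # hypothetical variable
IMAGE="fastqc-0.11.9.sif"

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # Inherited environment: the variable shows up inside the container
    singularity exec "$IMAGE" env | grep MY_HOST_VARIABLE
    # Clean environment: grep finds nothing, so the message is printed
    singularity exec -e "$IMAGE" env | grep MY_HOST_VARIABLE \
        || echo "MY_HOST_VARIABLE is not set inside the container"
else
    echo "Singularity or $IMAGE is not available; skipping the example."
fi
```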

Exercise

Using the two available FASTQ files, process them with fastqc both outside and inside a mounted directory.

Suggested solution
# Let's create a dummy directory
mkdir data

# Let's copy contents of data in that directory

singularity exec fastqc.sif fastqc data/*fastq.gz

# Check you have some HTMLs there. Remove them
rm data/*html

# Let's use shell
singularity shell fastqc.sif
> cd data
> fastqc *fastq.gz
> exit

# Check you have some HTMLs there. Remove them
rm data/*html

# Now bind the data directory to /scratch inside the container
singularity exec -B ./data:/scratch fastqc.sif fastqc /scratch/*fastq.gz

# What happens here?
singularity exec -B ./data:/scratch fastqc.sif bash -c 'fastqc /scratch/*fastq.gz'
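
The difference between the last two commands comes down to which shell expands the wildcard: unquoted, it is expanded on the host, where /scratch does not exist; inside bash -c '...', it is expanded by the shell running in the container. The sketch below reproduces the mechanism in plain bash, without Singularity. The directory glob_demo and the path /no_such_dir_on_host are hypothetical stand-ins, and bash default globbing settings are assumed.

```shell
#!/usr/bin/env bash
# Sketch: who expands a wildcard, the calling shell or the inner shell?
mkdir -p glob_demo
touch glob_demo/a.fastq.gz glob_demo/b.fastq.gz

# Unquoted: the calling shell expands the pattern before the command runs
printf 'unquoted: %s\n' glob_demo/*fastq.gz

# A pattern without matches is passed on unchanged (bash default), as happens
# with /scratch/*fastq.gz when /scratch does not exist on the host
printf 'no match: %s\n' /no_such_dir_on_host/*fastq.gz

# Inside single quotes for bash -c: the inner shell expands the pattern
bash -c 'printf "inner shell: %s\n" glob_demo/*fastq.gz'
```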