3.8 Volumes

Docker containers are isolated from the host file system. To read input files and write output files, you need to mount host directories (volumes) into the container.

Syntax: --volume/-v host:container
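
The host part should be an absolute path: a bare name such as datatest is interpreted by Docker as the name of a volume, not as a directory, which is why $(pwd) appears in the commands of this section. As a minimal sketch (volume_arg is a hypothetical helper, not part of Docker), the argument can be built from a relative directory like this:

```shell
# volume_arg is a hypothetical helper (not a Docker command): it prints
# "<absolute host directory>:<container directory>" for use with --volume.
volume_arg() {
  host_dir=$1
  container_dir=$2
  # Resolve the host directory to an absolute path; fails if it does not exist.
  abs_host_dir=$(CDPATH= cd -- "$host_dir" && pwd) || return 1
  printf '%s:%s\n' "$abs_host_dir" "$container_dir"
}

# Example: the directory must exist before it can be resolved.
mkdir -p datatest
volume_arg datatest /scratch
```

The printed value can be passed directly to Docker, for example: docker run --volume "$(volume_arg datatest /scratch)" biocontainers/fastqc:v0.11.9_cv7 ls -l /scratch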


We can pull the following image to illustrate:

docker pull biocontainers/fastqc:v0.11.9_cv7

FastQC is a tool that performs quality control on .fastq files (a file format that stores nucleotide sequences and their corresponding quality scores).


# Create directory and empty file
mkdir datatest
touch datatest/test

# Run a container in the background (--detach) and mount the local volume "datatest" (we map it to directory /scratch inside the container)
docker run --detach --volume $(pwd)/datatest:/scratch --name fastqc_container_${USER} biocontainers/fastqc:v0.11.9_cv7 tail -f /dev/null

# Open an interactive shell in the running container (same name as given to --name above)
docker exec -ti fastqc_container_${USER} /bin/bash
> ls -l /scratch
> exit

Strictly speaking, the type of volume shown above is called a bind mount. A more explicit syntax (--mount) is also available, which lets you control options such as mounting the directory read-only. More details here: https://docs.docker.com/storage/bind-mounts/
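
As an illustration of the --mount syntax, here is a minimal sketch of a read-only bind mount. It assumes the command is run from the directory that contains datatest; the variable names host_dir and mount_spec are chosen here for illustration only.

```shell
# Assumption: run from the directory that contains "datatest".
host_dir="$(pwd)/datatest"
# Unlike --volume, --mount expects the host directory to exist already.
mkdir -p "$host_dir"

# --mount takes comma-separated key=value pairs; "readonly" takes no value.
mount_spec="type=bind,source=${host_dir},target=/scratch,readonly"
echo "$mount_spec"

# Only attempt the run when a Docker client is installed.
if command -v docker >/dev/null 2>&1; then
  # Listing works; attempts to write into /scratch would be rejected.
  docker run --rm --mount "$mount_spec" biocontainers/fastqc:v0.11.9_cv7 ls -l /scratch
fi
```

A read-only mount is a reasonable choice for input data that the container should never modify; results then have to be written somewhere else, for instance to a second, writable mount.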

More sophisticated kinds of volumes (which are managed by the Docker daemon) are explained here: https://docs.docker.com/storage/volumes/

HANDS-ON

  1. Copy the 2 fastq files from the GitHub repository and place them in the mounted directory.
  2. Run fastqc interactively (inside container): fastqc /scratch/*.gz
  3. Run fastqc from outside the container (from the host)
Answer

# Download the test fastq files (manually or with the following commands) and place them in "datatest":
wget -O datatest/B7_H3K4me1_s_chr19.fastq.gz "https://github.com/biocorecrg/PhD_course_containers_2021/blob/main/testdata/B7_H3K4me1_s_chr19.fastq.gz?raw=true"
wget -O datatest/B7_input_s_chr19.fastq.gz "https://github.com/biocorecrg/PhD_course_containers_2021/blob/main/testdata/B7_input_s_chr19.fastq.gz?raw=true"

# Mount volumes and start a detached container
docker run --detach -ti --volume $(pwd)/datatest:/scratch --name fastqc_container_test_${USER} biocontainers/fastqc:v0.11.9_cv7

# Open an interactive shell in the container and run fastqc
docker exec -ti fastqc_container_test_${USER} /bin/bash
> fastqc /scratch/*.gz

# Run fastqc outside the container
# One by one
docker exec fastqc_container_test_${USER} fastqc /scratch/B7_H3K4me1_s_chr19.fastq.gz
docker exec fastqc_container_test_${USER} fastqc /scratch/B7_input_s_chr19.fastq.gz
# All (using wildcard *)
docker exec fastqc_container_test_${USER} bash -c 'fastqc /scratch/*gz'
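bash -c makes bash run the command given in the quoted string. Because the string is in single quotes, the wildcard * is expanded by the bash inside the container (where /scratch exists) and not by the shell on the host.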
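
The effect of the quoting can be tried without Docker. The sketch below is only an analogy under stated assumptions: a temporary directory (workdir, a name chosen here) plays the role of /scratch, and a local bash stands in for the one inside the container.

```shell
# A temporary directory stands in for /scratch (assumption for this demo).
workdir=$(mktemp -d)
touch "$workdir/a.fastq.gz" "$workdir/b.fastq.gz"

# Single quotes keep the pattern literal in the calling shell.
pattern='*.gz'
echo "calling shell passes: $pattern"

# The bash started by "bash -c" expands the pattern in its own directory.
expanded=$(cd "$workdir" && bash -c 'echo *.gz')
echo "inner bash expands to: $expanded"

# Clean up the temporary directory.
rm -r "$workdir"
```

With docker exec the same thing happens: inside single quotes the host shell leaves /scratch/*gz alone, and the bash inside the container expands it against the files in /scratch.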

HANDS-ON

  1. Copy the 2 FASTA files from the GitHub repository and place them in the mounted directory.
  2. Inside the container, run blastp with one sequence as query and the other as subject: blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta
  3. Run blastp from outside the container and retrieve the result. Do it first with exec, then all at once with run.
Answer

# Create the container and run it detached
docker run --detach -ti --volume $(pwd)/datatest:/scratch --name blastp_test_${USER} ncbi/blast:2.10.1

# Open an interactive shell in the container and run blastp (same name as given to --name above)
docker exec -ti blastp_test_${USER} /bin/bash
> blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta > /scratch/insideout.txt

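# Run from outside and retrieve the result in different ways
# 1. Print the result to the terminal
docker exec -ti blastp_test_${USER} blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta

# 2. Redirect the result to a file on the host (no -ti here: a pseudo-TTY would add carriage returns to the redirected output)
docker exec blastp_test_${USER} blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta > out.txt

# 3. Let blastp write the result into the mounted directory (it appears in datatest/ on the host)
docker exec -ti blastp_test_${USER} blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta -out /scratch/outagain.txt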

# All at once with run (each command creates a new container, runs blastp and exits)
docker run --volume $(pwd)/datatest:/scratch --name blastp1_${USER} ncbi/blast:2.10.1 blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta > out2.txt

docker run --volume $(pwd)/datatest:/scratch --name blastp2_${USER} ncbi/blast:2.10.1 blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta -out /scratch/outagain2.txt