3.8 Volumes
By default, a container's filesystem is isolated from the host. To read input files and keep output files, you need to mount a host directory (a volume) into the container.
Syntax: --volume (or -v) host_path:container_path
We can pull the following image to illustrate:
docker pull biocontainers/fastqc:v0.11.9_cv7
FastQC is a tool that runs a quality control on .fastq files (a file format that stores nucleotide sequences and their corresponding quality scores).
# Create directory and empty file
mkdir datatest
touch datatest/test
# Run a container in the background (--detach) and mount the local volume "datatest" (we map it to directory /scratch inside the container)
docker run --detach --volume $(pwd)/datatest:/scratch --name fastqc_container biocontainers/fastqc:v0.11.9_cv7 tail -f /dev/null
# Execute the container interactively
docker exec -ti fastqc_container /bin/bash
> ls -l /scratch
> exit
Strictly speaking, the volumes shown above are called bind mounts. A more expressive syntax (--mount) is also available and lets you control options such as making the mount read-only. More details here: https://docs.docker.com/storage/bind-mounts/
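As a hedged sketch of the --mount syntax, the following starts a second container with the same directory mounted read-only. The container name fastqc_readonly is an illustrative choice and not part of the course material. The Docker commands are only attempted when the docker client is found on the host.

```shell
# Build the bind-mount specification; "readonly" blocks writes from inside the container
mount_spec="type=bind,source=$(pwd)/datatest,target=/scratch,readonly"
echo "Mount specification: $mount_spec"

if command -v docker >/dev/null 2>&1; then
    # Start a detached container (hypothetical name "fastqc_readonly") that stays alive
    docker run --detach --mount "$mount_spec" --name fastqc_readonly \
        biocontainers/fastqc:v0.11.9_cv7 tail -f /dev/null \
        || echo "Could not start the container (is the Docker daemon running?)"

    # Writing into the mount should be refused because the mount is read-only
    docker exec fastqc_readonly touch /scratch/new_file \
        || echo "Write refused or container unavailable"
fi
```

Unlike --volume, --mount takes comma-separated key=value pairs, so each option (type, source, target, readonly) is spelled out explicitly.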
More sophisticated kinds of volumes (which are managed by the Docker daemon) are explained here: https://docs.docker.com/storage/volumes/
HANDS-ON
- Copy the 2 fastq files from the GitHub repository and place them in the mounted directory.
- Run fastqc interactively (inside container):
fastqc /scratch/*.gz
- Run fastqc outside the container
Answer
# Download test fastq files (manually or using the following commands) and place them in "datatest":
wget -O - https://github.com/biocorecrg/CoursesCRG_Containers_Nextflow_May_2021/blob/main/testdata/B7_H3K4me1_s_chr19.fastq.gz?raw=true > datatest/B7_H3K4me1_s_chr19.fastq.gz
wget -O - https://github.com/biocorecrg/CoursesCRG_Containers_Nextflow_May_2021/blob/main/testdata/B7_input_s_chr19.fastq.gz?raw=true > datatest/B7_input_s_chr19.fastq.gz
# Mount volumes and start a detached container
docker run --detach -ti --volume $(pwd)/datatest:/scratch --name fastqc_container_test biocontainers/fastqc:v0.11.9_cv7
# Execute container interactively and run fastqc
docker exec -ti fastqc_container_test /bin/bash
> fastqc /scratch/*.gz
# Run fastqc outside the container
# One by one
docker exec fastqc_container_test fastqc /scratch/B7_H3K4me1_s_chr19.fastq.gz
docker exec fastqc_container_test fastqc /scratch/B7_input_s_chr19.fastq.gz
# All (using wildcard *)
docker exec fastqc_container_test bash -c 'fastqc /scratch/*gz'
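bash -c executes the command given in the string that follows it. The single quotes keep the host shell from expanding the wildcard, so it is expanded inside the container, where /scratch exists.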
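The effect of quoting can be tried without Docker. In this minimal sketch, a child bash stands in for the shell inside the container, and a temporary directory with two empty, hypothetically named files stands in for /scratch.

```shell
# Temporary directory with two dummy files (hypothetical names)
demo_dir=$(mktemp -d)
touch "$demo_dir/a.fastq.gz" "$demo_dir/b.fastq.gz"

# Quoted pattern: the calling shell passes it through unchanged
quoted=$(printf '%s\n' "$demo_dir/*.gz")

# Pattern expanded by a child bash, which is what bash -c does inside the container
expanded=$(bash -c 'printf "%s\n" "$1"/*.gz' _ "$demo_dir")

echo "quoted:   $quoted"
echo "expanded:"
echo "$expanded"

# Clean up the temporary directory
rm -r "$demo_dir"
```

The quoted value still ends in the literal *.gz, while the child shell returns one path per matching file.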
HANDS-ON
- Copy the 2 FASTA files from the GitHub repository and place them in the mounted directory.
- Run blastp inside the container, aligning the two sequences against each other:
blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta
- Run blastp from outside the container and retrieve the result. Do it first with docker exec, then all at once with docker run.
Answer
# Create the container and detach it
docker run --detach -ti --volume $(pwd)/datatest:/scratch --name blastp_test ncbi/blast:2.10.1
# Execute container interactively and run blastp
docker exec -ti blastp_test /bin/bash
> blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta > /scratch/insideout.txt
# Run from outside and retrieve in different ways
docker exec -ti blastp_test blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta
# No -ti here: a pseudo-terminal would add carriage returns to the redirected output
docker exec blastp_test blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta > out.txt
docker exec -ti blastp_test blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta -out /scratch/outagain.txt
docker run --volume $(pwd)/datatest:/scratch --name blastp1 ncbi/blast:2.10.1 blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta > out2.txt
docker run --volume $(pwd)/datatest:/scratch --name blastp2 ncbi/blast:2.10.1 blastp -query /scratch/O75976.fasta -subject /scratch/Q90240.fasta -out /scratch/outagain2.txt
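The commands above differ in where the result file ends up. A redirection with > is performed by the host shell, so the file is written to the host's current directory. With -out /scratch/..., blastp writes the file inside the container, and it appears in datatest on the host through the mount. The following Docker-free sketch illustrates the difference. It assumes a child bash as a stand-in for the container process and temporary directories as stand-ins for the working directory and for datatest.

```shell
# Stand-ins: work_dir plays the host's current directory, mount_dir plays "datatest"
work_dir=$(mktemp -d)
mount_dir=$(mktemp -d)

# Case 1: the redirection sits outside the quotes, so the calling shell creates the file
# in its own current directory (like "> out.txt" above)
(cd "$work_dir" && bash -c 'echo "alignment report"' > out.txt)

# Case 2: the child process writes the file itself (like "-out /scratch/outagain.txt"),
# so the file lands in the directory that process was pointed at
(cd "$work_dir" && bash -c 'echo "alignment report" > "$1/outagain.txt"' _ "$mount_dir")

# Record and show what each directory contains
host_files=$(ls "$work_dir")
mount_files=$(ls "$mount_dir")
echo "current directory: $host_files"
echo "mounted directory: $mount_files"

# Clean up the temporary directories
rm -r "$work_dir" "$mount_dir"
```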