6.9 Modules and code reuse
A major advance of DSL2 is that it allows the code to be modularized.
In particular, you can move a process or a named workflow into a module and keep it aside so that it can be accessed from different pipelines.
The script in the test4 folder shows how the code uses modules.
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
/*
* Input parameters: read pairs
* Params are stored in the params.config file
*/
version = "1.0"
params.help = false
// this prints the input parameters
log.info """
BIOCORE@CRG - N F TESTPIPE ~ version ${version}
=============================================
reads : ${params.reads}
"""
if (params.help) {
log.info 'This is the Biocore\'s NF test pipeline'
log.info 'Enjoy!'
log.info '\n'
exit 1
}
/*
* Defining the output folders.
*/
fastqcOutputFolder = "output_fastqc"
multiqcOutputFolder = "output_multiQC"
Channel
.fromPath( params.reads )
.ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
.set {reads_for_fastqc}
/*
* Here we include two modules from two files. We also pass the parameter OUTPUT to tell each module where to publish its results
*/
include { fastqc } from "${baseDir}/lib/fastqc" addParams(OUTPUT: fastqcOutputFolder)
include { multiqc } from "${baseDir}/lib/multiqc" addParams(OUTPUT: multiqcOutputFolder)
// The main workflow can directly call the components (processes or named workflows) included from the modules
workflow {
fastqc_out = fastqc(reads_for_fastqc)
multiqc(fastqc_out.collect())
}
workflow.onComplete {
println ( workflow.success ? "\nDone! Open the following report in your browser --> ${multiqcOutputFolder}/multiqc_report.html\n" : "Oops .. something went wrong" )
}
We now include two modules named fastqc and multiqc from ${baseDir}/lib/fastqc.nf and ${baseDir}/lib/multiqc.nf.
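Because a module lives in its own file, it can also be included more than once under different names. The following is a minimal sketch, assuming the module exposes a component named fastqc. The aliases fastqc_raw and fastqc_trimmed, the parameter params.trimmed_reads, the default glob patterns and the two output folder names are hypothetical and are not part of test4.

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Hypothetical parameters: two sets of reads checked with the same module
params.reads = "data/*.fastq.gz"
params.trimmed_reads = "trimmed/*.fastq.gz"

// Same module file, two aliases, each one with its own OUTPUT folder
include { fastqc as fastqc_raw } from "${baseDir}/lib/fastqc" addParams(OUTPUT: "output_fastqc_raw")
include { fastqc as fastqc_trimmed } from "${baseDir}/lib/fastqc" addParams(OUTPUT: "output_fastqc_trimmed")

workflow {
    // Each alias behaves as an independent component
    fastqc_raw(Channel.fromPath(params.reads))
    fastqc_trimmed(Channel.fromPath(params.trimmed_reads))
}
```

The alias is needed because a component can be invoked only once under a given name within a workflow.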
Let’s inspect the fastqc module:
/*
* fastqc module
*/
params.CONTAINER = "quay.io/biocontainers/fastqc:0.11.9--0"
params.OUTPUT = "fastqc_output"
// The process name must match the name used in the include statement of the main script
process fastqc {
publishDir(params.OUTPUT, mode: 'copy')
tag { "${reads}" }
container params.CONTAINER
input:
path(reads)
output:
path("*_fastqc*")
script:
"""
fastqc ${reads}
"""
}
The fastqc module takes a channel of reads as input and outputs the files generated by the fastqc program.
The module is quite simple: it contains the publishDir directive, the tag, the container to be used, and input, output and script sections similar to those seen previously.
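A module can also expose a named workflow that wraps one or more processes. The following is only a sketch of what that could look like, not the code shipped in test4: the file name lib/fastqc_wf.nf and the workflow name fastqc_wf are hypothetical, while the process body repeats the one shown above.

```nextflow
/*
* lib/fastqc_wf.nf (hypothetical file name)
*/
params.CONTAINER = "quay.io/biocontainers/fastqc:0.11.9--0"
params.OUTPUT = "fastqc_output"

process qc {
    publishDir(params.OUTPUT, mode: 'copy')
    tag { "${reads}" }
    container params.CONTAINER

    input:
    path(reads)

    output:
    path("*_fastqc*")

    script:
    """
    fastqc ${reads}
    """
}

// Named workflow (hypothetical name) that wraps the process
workflow fastqc_wf {
    take:
    reads          // input channel with the read files

    main:
    qc(reads)

    emit:
    qc.out         // the files produced by fastqc
}
```

Under these assumptions the main script would include it with include { fastqc_wf } from "${baseDir}/lib/fastqc_wf" and call fastqc_wf(reads_for_fastqc) exactly as it calls a process.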
A module can declare its own parameters, which connect the main script to variables inside the module.
In this example, two parameters, params.CONTAINER and params.OUTPUT, are declared at the beginning of the module.
They can be overridden from the main script that calls the module:
- The parameter params.OUTPUT connects the output folder of this module to the one defined in the main script.
- The parameter params.CONTAINER decides which image is used for this particular module.
In this example, our main script passes only the OUTPUT parameter:
include { fastqc } from "${baseDir}/lib/fastqc" addParams(OUTPUT: fastqcOutputFolder)
include { multiqc } from "${baseDir}/lib/multiqc" addParams(OUTPUT: multiqcOutputFolder)
The container information stays inside the module, in params.CONTAINER, for better reproducibility.
Note that we are not using our own image but one provided by BioContainers and stored on quay.io.
A list of fastqc images developed and stored by the BioContainers community is available at https://biocontainers.pro/#/tools/fastqc.
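If a pipeline needs a different image, CONTAINER can be overridden from the main script in the same way as OUTPUT. This is a minimal sketch: the variable fastqcImage is hypothetical, and the image shown is simply the one already set in the module, used here as a placeholder.

```nextflow
// Output folder, as defined in the main script
fastqcOutputFolder = "output_fastqc"

// Hypothetical variable holding the image to use for this module
fastqcImage = "quay.io/biocontainers/fastqc:0.11.9--0"

// Both module parameters are overridden in a single addParams call
include { fastqc } from "${baseDir}/lib/fastqc" addParams(OUTPUT: fastqcOutputFolder, CONTAINER: fastqcImage)
```

Keeping the default inside the module still documents which image the module was written for.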
Let’s have a look at the multiqc.nf module:
/*
* multiqc module
*/
params.CONTAINER = "quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0"
params.OUTPUT = "multiqc_output"
params.LABEL = ""
process multiqc {
publishDir(params.OUTPUT, mode: 'copy')
container params.CONTAINER
label (params.LABEL)
input:
path (inputfiles)
output:
path "multiqc_report.html"
script:
"""
multiqc .
"""
}
It is very similar to the fastqc module: we just add an extra parameter, LABEL, which connects the label set in the process to the resources defined for that label in the nextflow.config file.
To use it, we would need to change the main script like this:
include { multiqc } from "${baseDir}/lib/multiqc" addParams(OUTPUT: multiqcOutputFolder, LABEL: "onecpu")
This works because the label onecpu is defined in our nextflow.config file:
includeConfig "$baseDir/params.config"
process {
container = 'biocorecrg/debian-perlbrew-pyenv3-java'
memory='0.6G'
cpus='1'
time='6h'
withLabel: 'onecpu' {
memory='0.6G'
cpus='1'
}
}
singularity.cacheDir = "$baseDir/singularity"
IMPORTANT: even when containers are defined inside the modules, you still need to specify a default image when you run nextflow with -with-docker or -with-singularity.
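As a sketch of what this means in practice, the fragment below sets a default image in nextflow.config and enables Docker from the configuration file. The image is the one already used in the configuration above. Enabling the engine in the config rather than on the command line is an assumption made for illustration.

```nextflow
// nextflow.config (sketch)
process {
    // Default image, used by every process that does not set its own container
    container = 'biocorecrg/debian-perlbrew-pyenv3-java'
}

// Enable one container engine; this has the same effect as the -with-docker option
docker.enabled = true
// singularity.enabled = true
```

A container directive inside a module takes precedence over this default for that process.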