5.7 Workflow and log

As it stands, the code will not produce anything: we still need a part that actually calls the process and connects it to the input channel.

This part is called a workflow.
Let’s add a workflow to our code:

#!/usr/bin/env nextflow

nextflow.enable.dsl=2

str = Channel.from('hello', 'hola', 'bonjour')

process printHello {

   tag { "${str_in}" }

   input:        
   val str_in

   output:        
   stdout

   script:        
   """
   echo ${str_in} in Italian is ciao
   """
}

/*
 * A workflow consists of a number of process invocations,
 * in which each process is fed its expected input channels
 * as if it were a custom function. A process can be invoked
 * only once within the same workflow.
 */

workflow {
 result = printHello(str)
 result.view()
}

This time we run the script in the background (with the -bg option) and save its log in the file log.txt.

nextflow run test1.nf -bg > log.txt

5.7.1 Nextflow log

Now let’s inspect the log file:

cat log.txt

N E X T F L O W  ~  version 20.07.1
Launching `test1.nf` [high_fermat] - revision: b129d66e57
[6a/2dfcaf] Submitted process > printHello (hola)
[24/a286da] Submitted process > printHello (hello)
[04/e733db] Submitted process > printHello (bonjour)
hola in Italian is ciao

hello in Italian is ciao

bonjour in Italian is ciao

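The tag lets us see that the process printHello was launched three times, once for each value in the input channel (hello, hola and bonjour). The tasks run in parallel, so they are not necessarily submitted, nor do they print their output, in the order of the input values.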
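To check this from the log rather than by eye, the short bash function below counts the "Submitted process" lines for each process name. It is a minimal sketch: count_submissions is a hypothetical helper, not a Nextflow command, and it assumes the log lines have exactly the format shown above ([code] Submitted process > name (tag)), which may differ in other Nextflow versions.

```shell
# Hypothetical helper (not part of Nextflow): count the submitted tasks per
# process in a Nextflow log file. Assumes lines shaped like
#   [6a/2dfcaf] Submitted process > printHello (hola)
# where field 5 is the process name (prefixed by the workflow name
# when the process runs inside a named workflow).
count_submissions() {
  awk '/Submitted process >/ { n[$5]++ } END { for (p in n) print p, n[p] }' "$1" | sort
}
```

Running count_submissions log.txt on the log above should print printHello 3.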

At the start of each line there is an alphanumeric code:

[6a/2dfcaf] Submitted process > printHello (hola)

This code is the beginning of the path, inside the work directory, of the folder in which that task is “isolated” and where its temporary files are kept.

IMPORTANT: these folder names are hash codes generated by Nextflow for each task, so they will be different in your own execution!

Let’s have a look inside that folder:

# Show the folder's full name
echo work/6a/2dfcaf*
  work/6a/2dfcafc01350f475c60b2696047a87

# List what is inside the folder
ls -alht work/6a/2dfcaf*

total 40
-rw-r--r--  1 lcozzuto  staff     1B Oct  7 13:39 .exitcode
drwxr-xr-x  9 lcozzuto  staff   288B Oct  7 13:39 .
-rw-r--r--  1 lcozzuto  staff    24B Oct  7 13:39 .command.log
-rw-r--r--  1 lcozzuto  staff    24B Oct  7 13:39 .command.out
-rw-r--r--  1 lcozzuto  staff     0B Oct  7 13:39 .command.err
-rw-r--r--  1 lcozzuto  staff     0B Oct  7 13:39 .command.begin
-rw-r--r--  1 lcozzuto  staff    45B Oct  7 13:39 .command.sh
-rw-r--r--  1 lcozzuto  staff   2.5K Oct  7 13:39 .command.run
drwxr-xr-x  3 lcozzuto  staff    96B Oct  7 13:39 ..

You can see several “hidden” files (their names start with a dot):

.exitcode contains the exit status of the task: 0 if everything went well, another value if there was a problem.
.command.log contains the log of the command execution; it is often identical to .command.out.
.command.out contains the standard output of the command execution.
.command.err contains the standard error of the command execution.
.command.begin is an empty marker file created when the task starts running.
.command.sh contains the block of code written in the script section of the process.
.command.run contains the wrapper script generated by Nextflow to execute .command.sh; it sets up environment variables, any Linux container invocations, etc.
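Since .exitcode and .command.err are the first files to look at when a task fails, these checks can be scripted. The bash function below is a minimal sketch (task_summary is a hypothetical helper, not part of Nextflow): it assumes that every sub-directory two levels below work/ is a task directory, prints the exit status stored in each .exitcode, and shows the first line of .command.err for tasks that did not exit with 0.

```shell
# Hypothetical helper (not part of Nextflow): summarise the task directories
# under a work directory (default: ./work). Assumes work/<xx>/<hash>/ layout.
task_summary() {
  local workdir=${1:-work} dir status
  for dir in "$workdir"/*/*/; do
    [ -d "$dir" ] || continue          # glob did not match: no task directories
    dir=${dir%/}                       # drop the trailing slash
    if [ -f "$dir/.exitcode" ]; then
      status=$(cat "$dir/.exitcode")
    else
      status=none                      # no .exitcode yet: task presumably running or interrupted
    fi
    echo "$dir exit=$status"
    # For failed or unfinished tasks, show the first line of the standard error
    if [ "$status" != "0" ] && [ -s "$dir/.command.err" ]; then
      head -n 1 "$dir/.command.err"
    fi
  done
}
```

Running task_summary from the launch directory prints one line per task, for example work/6a/2dfcafc01350f475c60b2696047a87 exit=0 for the hola task of the run above.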

For instance, the content of .command.sh is:

cat work/6a/2dfcaf*/.command.sh

#!/bin/bash -ue
echo hola in Italian is ciao

And the content of .command.out is:

cat work/6a/2dfcaf*/.command.out

hola in Italian is ciao
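Because the folder names change at every run, it can be handy to look a task directory up from your own log instead of typing the code by hand. The bash function below is a minimal sketch of this (task_dir is a hypothetical helper, not a Nextflow command): it takes the log file and a tag, extracts the bracketed code from the first "Submitted process" line containing that tag, and expands it to the full directory name under work/, as the echo work/6a/2dfcaf* command does. It assumes the log format shown above and that it is run from the launch directory.

```shell
# Hypothetical helper (not part of Nextflow): print the work directory of the
# first task submitted with a given tag, e.g. task_dir log.txt hola
task_dir() {
  local log=$1 tag=$2 code matches
  # First field of the first matching line, e.g. "[6a/2dfcaf]"
  code=$(grep -F 'Submitted process' "$log" | grep -F "($tag)" | head -n 1 | awk '{print $1}')
  # Strip the square brackets: "[6a/2dfcaf]" -> "6a/2dfcaf"
  code=${code#\[}
  code=${code%\]}
  if [ -z "$code" ]; then
    echo "tag '$tag' not found in $log" >&2
    return 1
  fi
  # The short code is a prefix of the full directory name, so a glob expands it
  matches=( work/"$code"* )
  if [ ! -d "${matches[0]}" ]; then
    echo "no task directory matching work/$code*" >&2
    return 1
  fi
  echo "${matches[0]}"
}
```

For example, cat "$(task_dir log.txt hola)/.command.sh" should show the script of the hola task. If several processes share the same tag, only the first matching line is used.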

You can also give names to workflows, so that you can combine them in the main workflow. For instance, we can write:

#!/usr/bin/env nextflow

nextflow.enable.dsl=2

str = Channel.from('hello', 'hola', 'bonjour')

process printHello {

   tag { "${str_in}" }

   input:        
   val str_in

   output:        
   stdout

   script:        
   """
   echo ${str_in} in Italian is ciao
   """
}

/*
 * A workflow can be given a name, like a function, and receive input channels through the take: section
 */

workflow first_pipeline {
    take: str_input
    main:
    printHello(str_input).view()
}


/*
 * You can reuse the same processes and combine them as you prefer
 */

workflow second_pipeline {
    take: str_input
    main:
    printHello(str_input).collect().view()
}

/*
 * You can then invoke the different named workflows in this way
 * passing the same input channel `str` to both  
 */

workflow {
    first_pipeline(str)
    second_pipeline(str)
}

With this code we execute two workflows that contain the same process.
In the second workflow we also apply the collect operator, which gathers the outputs of the different executions and emits them as a single list.

Let’s run the code:

nextflow run test1.nf -bg > log2

cat log2

N E X T F L O W  ~  version 20.07.1
Launching `test1.nf` [irreverent_davinci] - revision: 25a5511d1d
[de/105b97] Submitted process > first_pipeline:printHello (hello)
[ba/051c23] Submitted process > first_pipeline:printHello (bonjour)
[1f/9b41b2] Submitted process > second_pipeline:printHello (hello)
[8d/270d93] Submitted process > first_pipeline:printHello (hola)
[18/7b84c3] Submitted process > second_pipeline:printHello (hola)
hello in Italian is ciao

bonjour in Italian is ciao

[0f/f78baf] Submitted process > second_pipeline:printHello (bonjour)
hola in Italian is ciao

['hello in Italian is ciao\n', 'hola in Italian is ciao\n', 'bonjour in Italian is ciao\n']