MOP_TAIL

This module takes as input the output from MOP_PREPROCESS: basecalled fast5 reads, together with their respective fastq files, alignment and assignment read ID to gene/transcript. It outputs the estimation of poly(A) tail length at read level provided by Tailfindr, Nanopolish or both.

Input Parameters

Parameter name	Description
input_path	Output folder generated by mop_preprocess
reference	reference sequences
output	Output folder
pars_tools	TSV file with optional extra command line parameters for the tool indicated in the first field.
tailfindr	It (in)activate the corresponding flow. It can be YES or NO
nanopolish	It (in)activate the corresponding flow. It can be YES or NO
email	Email for pipeline reporting.

How to run the pipeline

Before launching the pipeline,user should:

Decide which containers to use - either docker or singularity [-with-docker / -with-singularity].
Fill in both params.config and tools_opt.tsv files.

To launch the pipeline, please use the following command:

nextflow run mop_tail.nf -with-singularity > log.txt

You can run the pipeline in the background adding the nextflow parameter -bg:

nextflow run mop_tail.nf -with-singularity -bg > log.txt

You can change the parameters either by changing params.config file or by feeding the parameters via command line:

nextflow run mop_tail.nf -with-singularity -bg --output test2 > log.txt

You can specify a different working directory with temporary files:

nextflow run mop_tail.nf -with-singularity -bg -w /path/working_directory > log.txt

Note

In case of errors you can troubleshoot seeing the log file (log.txt) for more details. Furthermore, if more information is needed, you can also find the working directory of the process in the file. Then, access that directory indicated by the error output and check both the .command.log and .command.err files.

Tip

Once the error has been solved or if you change a specific parameter, you can resume the execution with the Netxtlow parameter - resume (only one dash!). If there was an error, the pipeline will resume from the process that had the error and proceed with the rest. If a parameter was changed, only processes affected by this parameter will be re-run.

nextflow run mop_tail.nf -with-singularity -bg -resume > log_resumed.txt

To check whether the pipeline has been resumed properly, please check the log file. If previous correctly executed process are found as Cached, resume worked!

Results

Several folders are created by the pipeline within the output directory specified by the output parameter:

NanoPolish: contains the output of nanopolish tool.
Tailfindr: contains the output of tailfindr tool.
PolyA_final: contains the txt files with the combined results (i.e. predicted polyA sizes). Here an example of a test:

"Read name"  "Tailfindr"     "Nanopolish"    "Gene Name"
"013a5dde-9c52-4de1-83eb-db70fb2cd130"       52.16   49.39   "YKR072C"
"01119f62-ca68-458d-aa1f-cf8c8c04cd3b"       231.64  274.28  "YDR133C"
"0154ce9c-fe6b-4ebc-bbb1-517fdc524207"       24.05   24.24   "YFL044C"
"020cde28-970d-4710-90a5-977e4b4bbc46"       41.27   56.79   "YGL238W"

If both softwares are run, an additional plot which shows the correlation of their results is generated.

Here an example of a result: