MOP_TAIL

This pipeline takes as input the output of MOP_PREPROCESS: basecalled fast5 reads together with their corresponding fastq files, the alignments, and the assignment of each read ID to a gene or transcript. It outputs read-level estimates of poly(A) tail length produced by Tailfindr, Nanopolish, or both. Tailfindr can be run in three modes: standard, Nano3P-seq protocol with R9 chemistry (n3ps_r9), and Nano3P-seq protocol with R10 chemistry (n3ps_r10).

mop_tail graph

Input Parameters

The input parameters are stored in a YAML file such as the one shown here:

input_path: "${projectDir}/../mop_preprocess/outfolder/"
reference: "${projectDir}/../anno/yeast_rRNA_ref.fa.gz"

pars_tools: "${projectDir}/tools_opt.tsv"

output: "${projectDir}/outputPoly"

tailfindr: "YES"

# Different modes: standard, n3ps_r9 or n3ps_r10
tailfindr_mode: "standard"

nanopolish: "YES"

email: "yourname@yourdomain"
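
As a rough illustration of how these settings relate to each other, the following Python sketch checks a parameter set before launch. It is not part of the pipeline: check_params is a hypothetical helper, the dictionary copies the relevant keys from the YAML above by hand (to avoid depending on a YAML parser), and treating "NO" as the value that switches a tool off is an assumption.

```python
# Hypothetical pre-launch check; not part of mop_tail itself.
ALLOWED_MODES = {"standard", "n3ps_r9", "n3ps_r10"}  # modes listed in the params file
SWITCH_VALUES = {"YES", "NO"}  # assumption: "NO" disables a tool


def check_params(params):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for tool in ("tailfindr", "nanopolish"):
        value = params.get(tool)
        if value not in SWITCH_VALUES:
            problems.append(f"{tool} should be YES or NO, got {value!r}")
    # The mode only matters when Tailfindr is actually enabled.
    if params.get("tailfindr") == "YES":
        mode = params.get("tailfindr_mode")
        if mode not in ALLOWED_MODES:
            problems.append(f"tailfindr_mode {mode!r} is not one of {sorted(ALLOWED_MODES)}")
    if params.get("tailfindr") == "NO" and params.get("nanopolish") == "NO":
        problems.append("neither tailfindr nor nanopolish is enabled")
    return problems


example = {"tailfindr": "YES", "tailfindr_mode": "standard", "nanopolish": "YES"}
for problem in check_params(example):
    print(problem)
```

With the example values shown in the YAML above, the check reports no problems.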

How to run the pipeline

Before launching the pipeline, the user should:

  1. Decide which containers to use - either docker or singularity [-with-docker / -with-singularity].

  2. Fill in both the params.yaml and tools_opt.tsv files.

To launch the pipeline, please use the following command:

nextflow run mop_tail.nf -params-file params.yaml -with-singularity > log.txt

You can run the pipeline in the background by adding the Nextflow option -bg:

nextflow run mop_tail.nf -params-file params.yaml -with-singularity -bg > log.txt

You can change the parameters either by editing the params.yaml file or by passing them on the command line:

nextflow run mop_tail.nf -params-file params.yaml -with-singularity -bg --output test2 > log.txt

You can specify a different working directory for temporary files with the Nextflow option -w:

nextflow run mop_tail.nf -params-file params.yaml -with-singularity -bg -w /path/working_directory > log.txt

Results

Several folders are created by the pipeline within the output directory specified by the output parameter:

  1. NanoPolish: contains the output of the Nanopolish tool.

  2. Tailfindr: contains the output of the Tailfindr tool.

  3. PolyA_final: contains the txt files with the combined results (i.e. the predicted poly(A) tail lengths). Here is an example from a test run:

"Read name"  "Tailfindr"     "Nanopolish"    "Gene Name"
"013a5dde-9c52-4de1-83eb-db70fb2cd130"       52.16   49.39   "YKR072C"
"01119f62-ca68-458d-aa1f-cf8c8c04cd3b"       231.64  274.28  "YDR133C"
"0154ce9c-fe6b-4ebc-bbb1-517fdc524207"       24.05   24.24   "YFL044C"
"020cde28-970d-4710-90a5-977e4b4bbc46"       41.27   56.79   "YGL238W"
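
The combined table can be loaded for downstream analysis with a few lines of Python. The sketch below is a minimal example, not part of the pipeline: parse_polya_table is a hypothetical helper, the sample text is copied from the example above, and treating "NA" as a missing estimate is an assumption about how absent values are written. Fields are split with shlex so that quoted column names containing spaces are kept whole, whether the file uses tabs or spaces.

```python
import shlex

# Sample copied from the example table above.
SAMPLE = '''"Read name"  "Tailfindr"     "Nanopolish"    "Gene Name"
"013a5dde-9c52-4de1-83eb-db70fb2cd130"       52.16   49.39   "YKR072C"
"01119f62-ca68-458d-aa1f-cf8c8c04cd3b"       231.64  274.28  "YDR133C"
"0154ce9c-fe6b-4ebc-bbb1-517fdc524207"       24.05   24.24   "YFL044C"
"020cde28-970d-4710-90a5-977e4b4bbc46"       41.27   56.79   "YGL238W"
'''


def to_float(field):
    """Convert a numeric field; 'NA' (assumed missing-value marker) becomes None."""
    return None if field == "NA" else float(field)


def parse_polya_table(text):
    """Parse the combined results into a header list and a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = shlex.split(lines[0])  # shlex removes the quotes and keeps "Read name" whole
    rows = []
    for line in lines[1:]:
        read_id, tailfindr, nanopolish, gene = shlex.split(line)
        rows.append({
            "read": read_id,
            "tailfindr": to_float(tailfindr),
            "nanopolish": to_float(nanopolish),
            "gene": gene,
        })
    return header, rows


header, rows = parse_polya_table(SAMPLE)
for row in rows:
    # Report the per-read difference only when both tools gave an estimate.
    if row["tailfindr"] is not None and row["nanopolish"] is not None:
        diff = row["nanopolish"] - row["tailfindr"]
        print(f'{row["gene"]}\t{diff:+.2f}')
```

To use it on real output, read one of the txt files in PolyA_final and pass its contents to parse_polya_table in place of SAMPLE.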

If both programs are run, an additional plot that shows the correlation of their results is generated.

_images/mod_corr.png
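
The agreement between the two tools can also be summarised as a single number. The sketch below computes a Pearson correlation coefficient on the four example reads shown earlier; it is only an illustration, and whether the pipeline's plot uses Pearson or another statistic is an assumption, not something stated here. The function name pearson is hypothetical.

```python
import math

# Estimates for the four example reads listed in the PolyA_final table.
tailfindr = [52.16, 231.64, 24.05, 41.27]
nanopolish = [49.39, 274.28, 24.24, 56.79]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


r = pearson(tailfindr, nanopolish)
print(f"Pearson r = {r:.3f}")
```

Four reads are far too few for a meaningful estimate; on real data the same function would be applied to all reads for which both tools report a tail length.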