MOP_MOD

This module takes as input the output from MOP_PREPROCESS: basecalled fast5 reads, together with their respective fastq files and unspliced alignments to the transcriptome . It runs four different RNA detection algorithms (Epinano, Nanopolish, Tombo and Nanocompore) and it outputs the predictions generated by each one of them as individual tab-delimited files.

Input Parameters

Parameter name	Description
input_path	Output folder generated by mop_preprocess
comparison	TSV file with two fields, each one will indicate the ID of the sample that has to be compared 1 vs 1
reference	reference sequences
output	Output folder
pars_tools	TSV file with optional extra command line parameters for the tool indicated in the first field.
epinano	It (in)activate the corresponding flow. It can be YES or NO
nanocompore	It (in)activate the corresponding flow. It can be YES or NO
tombo_lsc	It (in)activate the corresponding flow. It can be YES or NO
tombo_msc	It (in)activate the corresponding flow. It can be YES or NO
epinano_plots	If YES will produce a plot for each sample for each transcript.
email	Email for pipeline reporting.

How to run the pipeline

Before launching the pipeline,user should:

Decide which containers to use - either docker or singularity [-with-docker / -with-singularity].
Fill in both params.config and tools_opt.tsv files.
Fill in comparison.tsv file - please see example below:

wt_1 ko_1
wt_2 ko_2

To launch the pipeline, please use the following command:

nextflow run mop_mod.nf -with-singularity > log.txt

You can run the pipeline in the background adding the nextflow parameter -bg:

nextflow run mop_mod.nf -with-singularity -bg > log.txt

You can change the parameters either by changing params.config file or by feeding the parameters via command line:

nextflow run mop_mod.nf -with-singularity -bg --output test2 > log.txt

You can specify a different working directory with temporary files:

nextflow run mop_mod.nf -with-singularity -bg -w /path/working_directory > log.txt

Note

In case of errors you can troubleshoot seeing the log file (log.txt) for more details. Furthermore, if more information is needed, you can also find the working directory of the process in the file. Then, access that directory indicated by the error output and check both the .command.log and .command.err files.

Tip

Once the error has been solved or if you change a specific parameter, you can resume the execution with the Netxtlow parameter - resume (only one dash!). If there was an error, the pipeline will resume from the process that had the error and proceed with the rest. If a parameter was changed, only processes affected by this parameter will be re-run.

nextflow run mop_mod.nf -with-singularity -bg -resume > log_resumed.txt

To check whether the pipeline has been resumed properly, please check the log file. If previous correctly executed process are found as Cached, resume worked!

Results

Several folders are created by the pipeline within the output directory specified by the output parameter:

Epinano results are stored in epinano_flow directory. It contains two files per sample: one containing data at position level and the other, at 5-mer level. Different features frequencies as well as quality data are included in the results. See example below:

#Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del
gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941
gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367
gene_A,2517,C,45529.0,6.92130,5.00000,5.04250,0.06165301236574491,0.1505633771881658,0.13540820136616222
gene_A,2518,A,45545.0,6.49821,5.00000,5.47485,0.10802503018992206,0.10855198155670216,0.2082775277198375
gene_A,2519,T,45557.0,6.51247,5.00000,4.81853,0.09386043857145993,0.14792457800118533,0.2033057488421099

Here an example of a plot from Epinano:

Tombo results are stored in tombo_flow directory. It contains one file per comparison. It reports the p-value per position, the sum of p-values per 5-mer and coverage in both WT and KO. See example below:

"Ref_Position"       "Chr"   "Position"      "Tombo_SiteScore"       "Coverage_Sample"       "Coverage_IVT"  "Tombo_KmerScore"
"gene_A_3"   "gene_A"        "3"     "0.0000"        "92"    "87"    NA
"gene_A_4"   "gene_A"        "4"     "0.0000"        "92"    "87"    NA
"gene_A_5"   "gene_A"        "5"     "0.0000"        "92"    "87"    0
"gene_A_6"   "gene_A"        "6"     "0.0000"        "93"    "88"    0.0014
"gene_A_7"   "gene_A"        "7"     "0.0000"        "95"    "89"    0.0027
"gene_A_8"   "gene_A"        "8"     "0.0014"        "95"    "89"    0.004

Nanopolish results are stored in nanopolish-compore_flow directory. It contains two files per sample: raw eventalign output (gzipped) and another with the median raw current per position and transcript (sample_processed_perpos_median.tsv.gz). See example below:

contig       position        reference_kmer  read_name       median  coverage
gene_A       0       AAATT   1       113.35  433
gene_A       1       AATTG   1       97.24   506
gene_A       2       ATTGA   1       70.35   2034
gene_A       3       TTGAA   1       102.03  416
gene_A       4       TGAAG   1       115.315 422
gene_A       5       GAAGA   1       104.25  471

Nanocompore results are stored in nanopolish-compore_flow directory. It contains one file per comparison (wt_1_vs_ko_1_nanocompore_results.tsv). Default output from Nanocompore (see Nanocompore’s repository for a more detailed explanation).