MOP_MOD

This pipeline takes as input the output from MOP_PREPROCESS: basecalled fast5 reads, together with their respective fastq files and unspliced alignments to the transcriptome. It runs four different RNA modification detection tools (Epinano, Nanopolish, Tombo and Nanocompore) and outputs the predictions generated by each of them as individual tabular text files (tab- or comma-delimited, depending on the tool).

Figure: mop_mod workflow graph.

Input Parameters

The input parameters are stored in YAML files such as the one shown here:

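# output folder of MOP_PREPROCESS
input_path: "${projectDir}/../mop_preprocess/outfolder/"
# pairs of samples to compare (see comparison.tsv below)
comparison: "${projectDir}/comparison.tsv"

# reference sequences (gzipped FASTA)
reference: "${projectDir}/../anno/yeast_rRNA_ref.fa.gz"
# output directory
output: "${projectDir}/output_mod"
# per-tool options
pars_tools: "${projectDir}/tools_opt.tsv"

# flows: "YES" runs a tool, "NO" skips it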
epinano: "YES"
nanocompore: "NO"
tombo_lsc: "YES"
tombo_msc: "YES"
modphred: "NO"

# epinano plots
epinano_plots: "YES"
email: ""
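The flow switches above take the strings "YES" or "NO". As a minimal sketch, assuming the YAML file has already been loaded into a Python dictionary (for example with a YAML library, not shown here), the enabled flows can be listed and checked as follows. The helper `enabled_flows` is hypothetical and not part of mop_mod; treating a missing switch as "NO" is also an assumption.

```python
# Flow switches as they appear in the example params.yaml above.
FLOW_KEYS = ("epinano", "nanocompore", "tombo_lsc", "tombo_msc", "modphred")


def enabled_flows(params: dict) -> list:
    """Return the flow switches set to "YES" (hypothetical helper, not part of mop_mod).

    A missing switch is treated as "NO" (an assumption). Any value other than
    "YES" or "NO" raises ValueError.
    """
    enabled = []
    for key in FLOW_KEYS:
        value = params.get(key, "NO")
        if value not in ("YES", "NO"):
            raise ValueError(f"{key}: expected 'YES' or 'NO', got {value!r}")
        if value == "YES":
            enabled.append(key)
    return enabled


params = {
    "epinano": "YES",
    "nanocompore": "NO",
    "tombo_lsc": "YES",
    "tombo_msc": "YES",
    "modphred": "NO",
}
print(enabled_flows(params))  # ['epinano', 'tombo_lsc', 'tombo_msc']
```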

How to run the pipeline

Before launching the pipeline, the user should:

  1. Decide which container engine to use: either Docker or Singularity (-with-docker / -with-singularity).

  2. Fill in both the params.yaml and tools_opt.tsv files.

  3. Fill in the comparison.tsv file, with one pair of samples to compare per line. See the example below:

wt_1 ko_1
wt_2 ko_2
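Each line of comparison.tsv pairs two samples, and the Nanocompore output file names described under Results (e.g. wt_1_vs_ko_1_nanocompore_results.tsv) label each pair as "first_vs_second". The sketch below reads such a file and derives those labels. It assumes two whitespace-separated columns per line and skips blank lines; `read_comparisons` is a hypothetical helper, not part of the pipeline.

```python
import io


def read_comparisons(handle):
    """Parse comparison.tsv-style text into (first, second) sample pairs.

    Assumes two whitespace-separated columns per line; blank lines are skipped.
    """
    pairs = []
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


example = io.StringIO("wt_1\tko_1\nwt_2\tko_2\n")
for first, second in read_comparisons(example):
    print(f"{first}_vs_{second}")
# wt_1_vs_ko_1
# wt_2_vs_ko_2
```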

To launch the pipeline, please use the following command:

nextflow run mop_mod.nf -params-file params.yaml -with-singularity > log.txt

You can run the pipeline in the background by adding the Nextflow parameter -bg:

nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg > log.txt

You can change the parameters either by editing the params.yaml file or by passing them on the command line (e.g. --output):

nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg --output test2 > log.txt

You can specify a different working directory for temporary files with -w:

nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg -w /path/working_directory > log.txt
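The commands above mix two kinds of options: Nextflow's own options take a single dash (-params-file, -with-singularity, -bg, -w), while pipeline parameters such as --output take a double dash and, in Nextflow, override values given in the -params-file. The sketch below assembles such a command as an argument list, for instance to pass to Python's subprocess.run. The helper `build_command` and its arguments are hypothetical.

```python
def build_command(params_file, container="singularity", background=False,
                  workdir=None, overrides=None):
    """Assemble a mop_mod launch command as an argument list (hypothetical helper)."""
    if container not in ("docker", "singularity"):
        raise ValueError("container must be 'docker' or 'singularity'")
    # Nextflow options: single dash.
    cmd = ["nextflow", "run", "mop_mod.nf", "-params-file", params_file,
           f"-with-{container}"]
    if background:
        cmd.append("-bg")
    if workdir is not None:
        cmd += ["-w", workdir]
    # Pipeline parameters: double dash; these override params.yaml values.
    for key, value in (overrides or {}).items():
        cmd += [f"--{key}", str(value)]
    return cmd


cmd = build_command("params.yaml", background=True, overrides={"output": "test2"})
print(" ".join(cmd))
# nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg --output test2
```

Returning a list rather than a single string avoids shell quoting issues; the > log.txt redirection from the examples would then be done by opening log.txt and passing it as stdout to subprocess.run.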

Results

Several folders are created by the pipeline within the output directory specified by the output parameter:

  1. Epinano results are stored in the epinano_flow directory. It contains two files per sample: one with data at position level and the other at 5-mer level. The results include base-call error frequencies (mismatches, insertions and deletions) as well as base quality statistics. See example below:

#Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del
gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941
gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367
gene_A,2517,C,45529.0,6.92130,5.00000,5.04250,0.06165301236574491,0.1505633771881658,0.13540820136616222
gene_A,2518,A,45545.0,6.49821,5.00000,5.47485,0.10802503018992206,0.10855198155670216,0.2082775277198375
gene_A,2519,T,45557.0,6.51247,5.00000,4.81853,0.09386043857145993,0.14792457800118533,0.2033057488421099

Here is an example of a plot from Epinano:

Figure: example Epinano plot (epinano.png).

  2. Tombo results are stored in the tombo_flow directory. It contains one file per comparison, reporting the p-value per position, the sum of p-values per 5-mer and the coverage in both WT and KO. See example below:

"Ref_Position"       "Chr"   "Position"      "Tombo_SiteScore"       "Coverage_Sample"       "Coverage_IVT"  "Tombo_KmerScore"
"gene_A_3"   "gene_A"        "3"     "0.0000"        "92"    "87"    NA
"gene_A_4"   "gene_A"        "4"     "0.0000"        "92"    "87"    NA
"gene_A_5"   "gene_A"        "5"     "0.0000"        "92"    "87"    0
"gene_A_6"   "gene_A"        "6"     "0.0000"        "93"    "88"    0.0014
"gene_A_7"   "gene_A"        "7"     "0.0000"        "95"    "89"    0.0027
"gene_A_8"   "gene_A"        "8"     "0.0014"        "95"    "89"    0.004

  3. Nanopolish results are stored in the nanopolish-compore_flow directory. It contains two files per sample: the raw eventalign output (gzipped) and a file with the median raw current per position and transcript (sample_processed_perpos_median.tsv.gz). See example below:

contig       position        reference_kmer  read_name       median  coverage
gene_A       0       AAATT   1       113.35  433
gene_A       1       AATTG   1       97.24   506
gene_A       2       ATTGA   1       70.35   2034
gene_A       3       TTGAA   1       102.03  416
gene_A       4       TGAAG   1       115.315 422
gene_A       5       GAAGA   1       104.25  471

  4. Nanocompore results are stored in the nanopolish-compore_flow directory. It contains one file per comparison (e.g. wt_1_vs_ko_1_nanocompore_results.tsv) holding the default Nanocompore output (see Nanocompore’s repository for a more detailed explanation).
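The Epinano position-level output shown in item 1 is comma-separated, with a header line starting with #Ref. As a minimal sketch, assuming exactly those column names, it can be read with Python's csv module and filtered by mismatch frequency. The helper `high_mismatch_positions` and its 0.1 threshold are illustrative choices, not a recommended cutoff; the 5-mer-level file is not covered because its columns are not shown here.

```python
import csv
import io

# Rows copied from the Epinano example above.
EPINANO_EXAMPLE = """#Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del
gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941
gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367
gene_A,2517,C,45529.0,6.92130,5.00000,5.04250,0.06165301236574491,0.1505633771881658,0.13540820136616222
gene_A,2518,A,45545.0,6.49821,5.00000,5.47485,0.10802503018992206,0.10855198155670216,0.2082775277198375
gene_A,2519,T,45557.0,6.51247,5.00000,4.81853,0.09386043857145993,0.14792457800118533,0.2033057488421099
"""


def high_mismatch_positions(handle, min_mis=0.1):
    """Yield (ref, pos, base, mis) for rows whose mismatch frequency is >= min_mis."""
    # DictReader takes the first line as field names, so "#Ref" keeps its "#".
    for row in csv.DictReader(handle):
        mis = float(row["mis"])
        if mis >= min_mis:
            yield row["#Ref"], int(row["pos"]), row["base"], mis


for ref, pos, base, mis in high_mismatch_positions(io.StringIO(EPINANO_EXAMPLE)):
    print(ref, pos, base, round(mis, 3))
# gene_A 2516 A 0.171
# gene_A 2518 A 0.108
```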
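The Tombo table in item 2 uses quoted fields and an unquoted NA where the 5-mer score is not available. Assuming the file is tab-separated (the delimiter is not stated above), a sketch that loads it with Python's csv module and maps NA to None could look like this; `read_tombo` and `to_float` are hypothetical helpers.

```python
import csv
import io

# Rows copied from the Tombo example above, assuming tab separators.
TOMBO_EXAMPLE = "\n".join([
    '"Ref_Position"\t"Chr"\t"Position"\t"Tombo_SiteScore"\t"Coverage_Sample"\t"Coverage_IVT"\t"Tombo_KmerScore"',
    '"gene_A_3"\t"gene_A"\t"3"\t"0.0000"\t"92"\t"87"\tNA',
    '"gene_A_4"\t"gene_A"\t"4"\t"0.0000"\t"92"\t"87"\tNA',
    '"gene_A_5"\t"gene_A"\t"5"\t"0.0000"\t"92"\t"87"\t0',
    '"gene_A_6"\t"gene_A"\t"6"\t"0.0000"\t"93"\t"88"\t0.0014',
    '"gene_A_7"\t"gene_A"\t"7"\t"0.0000"\t"95"\t"89"\t0.0027',
    '"gene_A_8"\t"gene_A"\t"8"\t"0.0014"\t"95"\t"89"\t0.004',
]) + "\n"


def to_float(value):
    """Convert a score field to float, mapping the literal NA to None."""
    return None if value == "NA" else float(value)


def read_tombo(handle):
    """Return per-position records from a Tombo comparison table."""
    records = []
    # The default quotechar '"' strips the quotes around each field.
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "chr": row["Chr"],
            "position": int(row["Position"]),
            "site_score": to_float(row["Tombo_SiteScore"]),
            "kmer_score": to_float(row["Tombo_KmerScore"]),
            "cov_sample": int(row["Coverage_Sample"]),
            "cov_ivt": int(row["Coverage_IVT"]),
        })
    return records


records = read_tombo(io.StringIO(TOMBO_EXAMPLE))
scored = [r for r in records if r["kmer_score"] is not None]
print([(r["position"], r["kmer_score"]) for r in scored])
# [(5, 0.0), (6, 0.0014), (7, 0.0027), (8, 0.004)]
```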
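The Nanopolish per-position median file from item 3 (sample_processed_perpos_median.tsv.gz) is gzipped and, judging from its .tsv extension, tab-separated. The sketch below reads it with Python's gzip and csv modules into a dictionary keyed by (contig, position), which makes it easy to look up or compare positions across samples. The helper `read_perpos_median` is hypothetical, and the demo writes a temporary file from rows of the example above so that it runs on its own.

```python
import csv
import gzip
import os
import tempfile


def read_perpos_median(path):
    """Map (contig, position) -> (reference_kmer, median, coverage)."""
    table = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            key = (row["contig"], int(row["position"]))
            table[key] = (row["reference_kmer"], float(row["median"]), int(row["coverage"]))
    return table


# Demo input: a few rows copied from the Nanopolish example above.
rows = [
    ("contig", "position", "reference_kmer", "read_name", "median", "coverage"),
    ("gene_A", "0", "AAATT", "1", "113.35", "433"),
    ("gene_A", "1", "AATTG", "1", "97.24", "506"),
    ("gene_A", "2", "ATTGA", "1", "70.35", "2034"),
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_processed_perpos_median.tsv.gz")
    with gzip.open(path, "wt", newline="") as out:
        csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    table = read_perpos_median(path)

print(table[("gene_A", 2)])  # ('ATTGA', 70.35, 2034)
```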

Encoding of modification information from m6A-aware basecalled data using modPhred

Once the data have been basecalled with our m6A modification-aware basecalling model, the modification information must be encoded for downstream analysis. This step is performed by modPhred, another tool included in the mop_mod module.

To run it, set modphred: "YES" in the params.yaml file and run the commands below:

cd mop_mod
nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg > yourlog.txt