.. _home-page-mopmod:

*******************
MOP_MOD
*******************

.. autosummary::
   :toctree: generated

This pipeline takes as input the output from MOP_PREPROCESS: basecalled fast5 reads, together with their respective fastq files and unspliced alignments to the transcriptome. It runs four different RNA modification detection algorithms (Epinano, Nanopolish, Tombo and Nanocompore) and outputs the predictions generated by each of them as individual tab-delimited files.

.. image:: ../img/flow_mod.png
  :width: 600
  :alt: mop_mod graph

Input Parameters
======================

The input parameters are stored in yaml files like the one represented here:

.. literalinclude:: ../mop_mod/params.yaml
   :language: yaml


How to run the pipeline
=============================

Before launching the pipeline, the user should:

1. Decide which containers to use, either Docker or Singularity **[-with-docker / -with-singularity]**.
2. Fill in both the **params.yaml** and **tools_opt.tsv** files.
3. Fill in the **comparison.tsv** file with one comparison per line, as in the example below:

.. code-block:: console

   wt_1 ko_1
   wt_2 ko_2


To launch the pipeline, please use the following command:

.. code-block:: console

   nextflow run mop_mod.nf -params-file params.yaml -with-singularity > log.txt


You can run the pipeline in the background by adding the Nextflow option **-bg**:

.. code-block:: console

   nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg > log.txt

You can change the parameters either by editing the **params.yaml** file or by passing them on the command line:

.. code-block:: console

   nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg --output test2 > log.txt


You can specify a different working directory for temporary files with the option **-w**:

.. code-block:: console

   nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg -w /path/working_directory > log.txt
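
When the pipeline is launched from a script, the command variants above can be assembled programmatically. The sketch below is only an illustration: ``build_command`` is a hypothetical helper, it uses only the options shown in this section, and it leaves the redirection to **log.txt** to the caller.

.. code-block:: python

   import shlex

   def build_command(params_file="params.yaml", background=False,
                     output=None, work_dir=None):
       """Assemble the nextflow command line as a list of arguments."""
       cmd = ["nextflow", "run", "mop_mod.nf",
              "-params-file", params_file, "-with-singularity"]
       if background:
           cmd.append("-bg")                # run in the background
       if output is not None:
           cmd += ["--output", output]      # override the output parameter
       if work_dir is not None:
           cmd += ["-w", work_dir]          # working directory for temporary files
       return cmd

   cmd = build_command(background=True, work_dir="/path/working_directory")
   print(shlex.join(cmd))

The resulting list can be passed to ``subprocess.run`` with ``stdout`` pointing at an open log file.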


Results
====================

Several folders are created by the pipeline within the output directory specified by the **output** parameter:

1. **Epinano** results are stored in the **epinano_flow** directory. It contains two files per sample: one with data at the position level and the other at the 5-mer level. The results include the frequencies of different features (mismatches, insertions and deletions) as well as quality data. See the example below:

.. code-block:: console

   #Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del
   gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941
   gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367
   gene_A,2517,C,45529.0,6.92130,5.00000,5.04250,0.06165301236574491,0.1505633771881658,0.13540820136616222
   gene_A,2518,A,45545.0,6.49821,5.00000,5.47485,0.10802503018992206,0.10855198155670216,0.2082775277198375
   gene_A,2519,T,45557.0,6.51247,5.00000,4.81853,0.09386043857145993,0.14792457800118533,0.2033057488421099

Here is an example of a plot from Epinano:

.. image:: ../img/epinano.png
  :width: 350  
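
The per-position Epinano table is plain comma-separated text, so it can be inspected with a few lines of Python. The sketch below ranks positions by the sum of the ``mis``, ``ins`` and ``del`` columns. This sum is just one simple illustrative score, not necessarily the criterion Epinano itself applies, and ``rank_by_summed_error`` is a hypothetical name.

.. code-block:: python

   import csv
   import io

   # Three rows copied from the Epinano example shown above.
   EXAMPLE = """\
   #Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del
   gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941
   gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367
   gene_A,2517,C,45529.0,6.92130,5.00000,5.04250,0.06165301236574491,0.1505633771881658,0.13540820136616222
   """

   def rank_by_summed_error(handle):
       """Return (reference, position, mis + ins + del) tuples, highest sum first."""
       scored = []
       for row in csv.DictReader(handle):
           total = float(row["mis"]) + float(row["ins"]) + float(row["del"])
           scored.append((row["#Ref"], int(row["pos"]), total))
       return sorted(scored, key=lambda item: item[2], reverse=True)

   ranked = rank_by_summed_error(io.StringIO(EXAMPLE))
   for ref, pos, total in ranked:
       print(ref, pos, round(total, 4))

For a real result file, pass an open file handle instead of the ``io.StringIO`` object.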

2. **Tombo** results are stored in the **tombo_flow** directory. It contains one file per comparison. Each file reports the p-value per position (``Tombo_SiteScore``), the sum of p-values per 5-mer (``Tombo_KmerScore``) and the coverage in both WT and KO. See the example below:

.. code-block:: console

   "Ref_Position"	"Chr"	"Position"	"Tombo_SiteScore"	"Coverage_Sample"	"Coverage_IVT"	"Tombo_KmerScore"
   "gene_A_3"	"gene_A"	"3"	"0.0000"	"92"	"87"	NA
   "gene_A_4"	"gene_A"	"4"	"0.0000"	"92"	"87"	NA
   "gene_A_5"	"gene_A"	"5"	"0.0000"	"92"	"87"	0
   "gene_A_6"	"gene_A"	"6"	"0.0000"	"93"	"88"	0.0014
   "gene_A_7"	"gene_A"	"7"	"0.0000"	"95"	"89"	0.0027
   "gene_A_8"	"gene_A"	"8"	"0.0014"	"95"	"89"	0.004

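The Tombo table is tab-delimited, with quoted fields and ``NA`` where no 5-mer score is available. The following sketch shows one way to load it with typed values and keep only positions covered in both samples. The function name ``read_tombo`` and the coverage threshold are hypothetical choices for illustration.

.. code-block:: python

   import csv
   import io

   # Header and three rows copied from the Tombo example shown above.
   COLUMNS = ('"Ref_Position"\t"Chr"\t"Position"\t"Tombo_SiteScore"'
              '\t"Coverage_Sample"\t"Coverage_IVT"\t"Tombo_KmerScore"')
   ROWS = [
       '"gene_A_4"\t"gene_A"\t"4"\t"0.0000"\t"92"\t"87"\tNA',
       '"gene_A_6"\t"gene_A"\t"6"\t"0.0000"\t"93"\t"88"\t0.0014',
       '"gene_A_8"\t"gene_A"\t"8"\t"0.0014"\t"95"\t"89"\t0.004',
   ]
   EXAMPLE = "\n".join([COLUMNS] + ROWS) + "\n"

   def read_tombo(handle):
       """Parse a Tombo result table into a list of typed records."""
       records = []
       for row in csv.DictReader(handle, delimiter="\t"):
           kmer = row["Tombo_KmerScore"]
           records.append({
               "chr": row["Chr"],
               "position": int(row["Position"]),
               "site_score": float(row["Tombo_SiteScore"]),
               "coverage": (int(row["Coverage_Sample"]), int(row["Coverage_IVT"])),
               "kmer_score": None if kmer == "NA" else float(kmer),
           })
       return records

   records = read_tombo(io.StringIO(EXAMPLE))

   MIN_COVERAGE = 88  # hypothetical threshold, chosen only for this example
   well_covered = [r for r in records if min(r["coverage"]) >= MIN_COVERAGE]
   print([r["position"] for r in well_covered])

The ``csv`` module strips the surrounding quotes, so the column names can be used without them.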

3. **Nanopolish** results are stored in the **nanopolish-compore_flow** directory. It contains two files per sample: the raw eventalign output (gzipped) and a file with the median raw current per position and transcript (**sample_processed_perpos_median.tsv.gz**). See the example below:

.. code-block:: console

   contig	position	reference_kmer	read_name	median	coverage
   gene_A	0	AAATT	1	113.35	433
   gene_A	1	AATTG	1	97.24	506
   gene_A	2	ATTGA	1	70.35	2034
   gene_A	3	TTGAA	1	102.03	416
   gene_A	4	TGAAG	1	115.315	422
   gene_A	5	GAAGA	1	104.25	471
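
Because the per-position file is gzipped, it has to be decompressed while reading. The sketch below indexes the median current by transcript and position. ``read_perpos_median`` is a hypothetical helper, and the example data is compressed in memory only so that the snippet runs without a real result file.

.. code-block:: python

   import csv
   import gzip
   import io

   # Header and three rows copied from the Nanopolish example shown above.
   EXAMPLE = (
       "contig\tposition\treference_kmer\tread_name\tmedian\tcoverage\n"
       "gene_A\t0\tAAATT\t1\t113.35\t433\n"
       "gene_A\t1\tAATTG\t1\t97.24\t506\n"
       "gene_A\t2\tATTGA\t1\t70.35\t2034\n"
   )

   def read_perpos_median(source):
       """Map (contig, position) to (kmer, median current, coverage).

       ``source`` is a path to a .tsv.gz file or a binary file object.
       """
       table = {}
       with gzip.open(source, "rt", newline="") as handle:
           for row in csv.DictReader(handle, delimiter="\t"):
               key = (row["contig"], int(row["position"]))
               table[key] = (row["reference_kmer"],
                             float(row["median"]),
                             int(row["coverage"]))
       return table

   # Stand-in for a real file: gzip the example rows in memory.
   compressed = io.BytesIO(gzip.compress(EXAMPLE.encode()))
   table = read_perpos_median(compressed)
   print(table[("gene_A", 2)])

Loading two samples this way makes it straightforward to compare their medians position by position.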

4. **Nanocompore** results are stored in the **nanopolish-compore_flow** directory. It contains one file per comparison (e.g. **wt_1_vs_ko_1_nanocompore_results.tsv**) in Nanocompore's default output format; see Nanocompore's repository for a more detailed explanation.
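
The Nanocompore file name follows the pair of samples given in **comparison.tsv**. As a rough sketch, the expected result paths can be derived as shown below. The helper name is hypothetical, and the code assumes that the files sit directly inside **nanopolish-compore_flow**, which may not match the actual folder layout.

.. code-block:: python

   from pathlib import PurePosixPath

   def nanocompore_result_paths(output_dir, comparisons):
       """Return the expected Nanocompore result path for each comparison."""
       # Assumption: result files are placed directly in this directory.
       flow_dir = PurePosixPath(output_dir) / "nanopolish-compore_flow"
       return [
           flow_dir / f"{sample_a}_vs_{sample_b}_nanocompore_results.tsv"
           for sample_a, sample_b in comparisons
       ]

   paths = nanocompore_result_paths("output", [("wt_1", "ko_1"), ("wt_2", "ko_2")])
   for path in paths:
       print(path)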

Encoding of modification information from m6A-aware basecalled data using modPhred
=====================================================================================

Once the data has been basecalled with our m6A modification-aware basecalling model, the modification information should be encoded for downstream analysis. This step is performed by **modPhred**, another tool included in the **mop_mod** module.

To run it, set ``modphred: "YES"`` in the ``params.yaml`` file and run the commands below:

.. code-block:: console

   cd mop_mod
   nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg > yourlog.txt