.. _home-page-mopmod: ******************* MOP_MOD ******************* .. autosummary:: :toctree: generated This pipeline takes as input the output from **mop_preprocess** and the pod5 that have been used as input. It runs 6 different workflows, 4 of which can be used later by **mop_consensus**: Epinano, F5c, nanoRMS, and baseQ. To use nanoRMS, you need to base the call with dorado-mod (i.e., with the option to emit the move table). .. image:: ../img/flow_mod.png :width: 600 :alt: mop_mod graph The remaining two are modKit and M6Anet: .. image:: ../img/flow_mod2.png :width: 600 :alt: mop_mod2 graph For running modkit, you need to basecall with dorado with models that include one or more modified bases. Input Parameters ====================== The input parameters are stored in yaml files like the one represented here: .. literalinclude:: ../mop_mod/params.yaml :language: yaml How to run the pipeline ============================= Before launching the pipeline, user should: 1. Decide which containers to use - docker or singularity **[-with-docker / -with-singularity]**. 2. Fill in both **params.yaml** and **tools_opt.tsv** files. 3. Fill in **comparison.tsv** file - please see example below: .. code-block:: console wt_1 ko_1 wt_2 ko_2 To launch the pipeline, please use the following command: .. code-block:: console nextflow run mop_mod.nf -params-file params.yaml -with-singularity > log.txt You can run the pipeline in the background adding the nextflow parameter **-bg**: .. code-block:: console nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg > log.txt You can change the parameters either by changing **params.config** file or by feeding the parameters via command line: .. code-block:: console nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg --output test2 > log.txt You can specify a different working directory with temporary files: .. code-block:: console nextflow run mop_mod.nf -params-file params.yaml -with-singularity -bg -w /path/working_directory > log.txt .. note:: If you have demultiplexed data in your mop_preprocess you need to turn on the option `demulti_pod5: "ON"`. The input of mop_mod will be: .. code-block:: bash input_pod5: "${projectDir}/../mop_preprocess/output/pod5_files/**/*.pod5" Results ==================== Several folders are created by the pipeline within the output directory specified by the **output** parameter: **Epinano** ---------------- results are stored in **epinano_flow** directory. It contains two files per sample: one containing data at position level and the other, at 5-mer level. Different features frequencies as well as quality data are included in the results. See example below: .. code-block:: console #Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941 gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367 gene_A,2517,C,45529.0,6.92130,5.00000,5.04250,0.06165301236574491,0.1505633771881658,0.13540820136616222 gene_A,2518,A,45545.0,6.49821,5.00000,5.47485,0.10802503018992206,0.10855198155670216,0.2082775277198375 gene_A,2519,T,45557.0,6.51247,5.00000,4.81853,0.09386043857145993,0.14792457800118533,0.2033057488421099 To run this software, in the ``params.yaml`` file you should specify ``epinano: "YES"``. Here an example of a plot from Epinano: .. image:: ../img/epinano.png :width: 350 F5c -------- F5c resquiggles the signal from the input pod5 using the alignments and the basecalled fastq files produced by mop_preprocess. It produces a compressed TSV file with mean current intensity per position. To run this software, in the ``params.yaml`` file you should specify ``f5c: "YES"``. nanoRMS -------- nanoRMS uses the move table stored in the bam file when basecalling with dorado-mod in mop_preprocess for predicting the modified bases. It needs a comparison file (unmodified vs modified) and it generates three BED files per comparison. To run this software, in the ``params.yaml`` file you should specify ``nanoRMS: "YES"``. baseQ -------- baseQ uses just the bam files for predicting the modified bases. It needs a comparison file (unmodified vs modified) and it generates a BED file per comparison. To run this software, in the ``params.yaml`` file you should specify ``baseQ: "YES"``. Modkit -------- Modkit is used for extracting modification encoded within BAM files (basecalling with modification) Once the data has been basecalled with a modification-aware basecalling model, modification can be evaluated using modkit. To run this software, in the ``params.yaml`` file you should specify ``modkit: "YES"``. Modkit will generate a pileup file that can be filtered by keeping only the positions with a coverage higher than a value indicated in ``params.yaml``. Moreover, you can select the strand and which field to be reported. Default is 11 (percent_modified). Here the list of fields: chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, color, valid_coverage, percent_modified, count_modified, count_canonical, count_other_mode, count_delete, count_fail, count_diff, count_nocall .. code-block:: yaml modkit: modkit: "--mod-threshold 21891:0.90 --edge-filter 500,500" cov_filtering: "" strand: "" field_sel: "11" The resulting bedgraphs are then merged in a single table called union.bed. m6Anet -------- The m6Anet workflow uses f5c for resquiggling the signal stored in the POD5 files using alignments in BAM and basecalled reads in FASTQ generated by mop_preprocess. The current intensity per read position is then used by m6Anet for m6A prediction. The results is stored in a folder per sample and it consists of two CSV files, one for individual probabilities and one for site probabilities. To run this software, in the ``params.yaml`` file you should specify ``m6Anet: "YES"``.