.. _home-page-mopmod:

*******************
MOP_MOD
*******************

.. autosummary::
   :toctree: generated

This module takes as input the output from MOP_PREPROCESS: basecalled fast5 reads, together with their respective fastq files and unspliced alignments to the transcriptome. It runs four different RNA modification detection algorithms (Epinano, Nanopolish, Tombo and Nanocompore) and outputs the predictions generated by each of them as individual tabular text files.

Input Parameters
======================

.. list-table::
   :widths: 25 75
   :header-rows: 1

   * - Parameter name
     - Description
   * - **input_path**
     - Output folder generated by mop_preprocess
   * - **comparison**
     - TSV file with two fields, each one indicating the ID of a sample; the two samples in each row are compared 1 vs 1
   * - **reference**
     - Reference sequences
   * - **output**
     - Output folder
   * - **pars_tools**
     - TSV file with optional extra command-line parameters for the tool indicated in the first field
   * - **epinano**
     - Activates or deactivates the corresponding flow. It can be YES or NO
   * - **nanocompore**
     - Activates or deactivates the corresponding flow. It can be YES or NO
   * - **tombo_lsc**
     - Activates or deactivates the corresponding flow. It can be YES or NO
   * - **tombo_msc**
     - Activates or deactivates the corresponding flow. It can be YES or NO
   * - **epinano_plots**
     - If YES, produces a plot for each transcript in each sample
   * - **email**
     - Email address for pipeline reporting

How to run the pipeline
=============================

Before launching the pipeline, the user should:

1. Decide which container engine to use, either Docker or Singularity **[-with-docker / -with-singularity]**.
2. Fill in both the **params.config** and **tools_opt.tsv** files.
3. Fill in the **comparison.tsv** file; see the example below:

.. code-block:: console

   wt_1 ko_1
   wt_2 ko_2

To launch the pipeline, please use the following command:

.. code-block:: console

   nextflow run mop_mod.nf -with-singularity > log.txt

You can run the pipeline in the background by adding the Nextflow parameter **-bg**:

.. code-block:: console

   nextflow run mop_mod.nf -with-singularity -bg > log.txt

You can change the parameters either by editing the **params.config** file or by passing them on the command line:

.. code-block:: console

   nextflow run mop_mod.nf -with-singularity -bg --output test2 > log.txt

You can specify a different working directory for temporary files:

.. code-block:: console

   nextflow run mop_mod.nf -with-singularity -bg -w /path/working_directory > log.txt

.. note::

   * In case of errors, you can troubleshoot by looking at the log file (log.txt) for more details. If more information is needed, the log file also reports the working directory of the failed process. Access the directory indicated in the error output and check both the **.command.log** and **.command.err** files.

.. tip::

   Once the error has been solved, or if you change a specific parameter, you can resume the execution with the **Nextflow** parameter **-resume** (only one dash!). If there was an error, the pipeline will resume from the process that had the error and proceed with the rest. If a parameter was changed, only the processes affected by that parameter will be re-run.

.. code-block:: console

   nextflow run mop_mod.nf -with-singularity -bg -resume > log_resumed.txt

To check whether the pipeline has been resumed properly, please check the log file. If the previously completed processes are reported as *Cached*, the resume worked!

Results
====================

Several folders are created by the pipeline within the output directory specified by the **output** parameter:

1. **Epinano** results are stored in the **epinano_flow** directory. It contains two files per sample: one with data at the position level and the other at the 5-mer level. The results include the frequencies of several features (mismatches, insertions and deletions) as well as base-quality statistics. See the example below:

   .. code-block:: console

      #Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del
      gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941
      gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367
      gene_A,2517,C,45529.0,6.92130,5.00000,5.04250,0.06165301236574491,0.1505633771881658,0.13540820136616222
      gene_A,2518,A,45545.0,6.49821,5.00000,5.47485,0.10802503018992206,0.10855198155670216,0.2082775277198375
      gene_A,2519,T,45557.0,6.51247,5.00000,4.81853,0.09386043857145993,0.14792457800118533,0.2033057488421099

   Here is an example of a plot from Epinano:

   .. image:: ../img/epinano.png
      :width: 600

2. **Tombo** results are stored in the **tombo_flow** directory. It contains one file per comparison, reporting the p-value per position, the sum of p-values per 5-mer, and the coverage in both WT and KO. See the example below:

   .. code-block:: console

      "Ref_Position" "Chr" "Position" "Tombo_SiteScore" "Coverage_Sample" "Coverage_IVT" "Tombo_KmerScore"
      "gene_A_3" "gene_A" "3" "0.0000" "92" "87" NA
      "gene_A_4" "gene_A" "4" "0.0000" "92" "87" NA
      "gene_A_5" "gene_A" "5" "0.0000" "92" "87" 0
      "gene_A_6" "gene_A" "6" "0.0000" "93" "88" 0.0014
      "gene_A_7" "gene_A" "7" "0.0000" "95" "89" 0.0027
      "gene_A_8" "gene_A" "8" "0.0014" "95" "89" 0.004

3. **Nanopolish** results are stored in the **nanopolish-compore_flow** directory. It contains two files per sample: the raw eventalign output (gzipped) and a file with the median raw current per position and transcript (**sample_processed_perpos_median.tsv.gz**). See the example below:

   .. code-block:: console

      contig position reference_kmer read_name median coverage
      gene_A 0 AAATT 1 113.35 433
      gene_A 1 AATTG 1 97.24 506
      gene_A 2 ATTGA 1 70.35 2034
      gene_A 3 TTGAA 1 102.03 416
      gene_A 4 TGAAG 1 115.315 422
      gene_A 5 GAAGA 1 104.25 471

4. **Nanocompore** results are stored in the **nanopolish-compore_flow** directory. It contains one file per comparison (**wt_1_vs_ko_1_nanocompore_results.tsv**) with the default output from Nanocompore (see Nanocompore's repository for a more detailed explanation).
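The Epinano and Tombo tables in the output folder are plain text, so they can be post-processed with standard tooling. The following is a minimal Python sketch, assuming that the Epinano per-position file is comma-separated with a ``#Ref`` header line and that the Tombo table is whitespace-separated with double-quoted fields and unquoted ``NA`` values, as in the examples on this page. The function names (``open_table``, ``read_epinano_per_site``, ``read_tombo_scores``) and the 0.1 mismatch cutoff are hypothetical, chosen only for illustration; they are not part of MOP_MOD.

```python
import csv
import gzip
import io
import shlex


def open_table(path):
    """Hypothetical helper: open a plain or gzipped (.gz) text table for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_epinano_per_site(handle):
    """Parse an Epinano per-position CSV whose header line starts with '#Ref'."""
    reader = csv.reader(handle)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # '#Ref' -> 'Ref'
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        rec = dict(zip(header, fields))
        rec["pos"] = int(rec["pos"])
        for key in ("cov", "q_mean", "q_median", "q_std", "mis", "ins", "del"):
            rec[key] = float(rec[key])
        rows.append(rec)
    return rows


def read_tombo_scores(handle):
    """Parse the Tombo comparison table; quoted fields are unquoted, 'NA' becomes None."""
    lines = [line for line in handle if line.strip()]
    header = shlex.split(lines[0])  # shlex handles the double quotes and spaces/tabs
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, shlex.split(line)))
        kmer = rec["Tombo_KmerScore"]
        rows.append({
            "Chr": rec["Chr"],
            "Position": int(rec["Position"]),
            "Tombo_SiteScore": float(rec["Tombo_SiteScore"]),
            "Coverage_Sample": int(rec["Coverage_Sample"]),
            "Coverage_IVT": int(rec["Coverage_IVT"]),
            "Tombo_KmerScore": None if kmer == "NA" else float(kmer),
        })
    return rows


# Sample rows copied from the example outputs on this page.
EPINANO_SAMPLE = """#Ref,pos,base,cov,q_mean,q_median,q_std,mis,ins,del
gene_A,2515,C,45497.0,5.36995,4.00000,3.97797,0.0822032221904741,0.18715519704595907,0.2058377475437941
gene_A,2516,A,45504.0,5.38207,4.00000,4.71619,0.17128164556962025,0.20497099156118143,0.07733386075949367
"""

TOMBO_SAMPLE = '''"Ref_Position" "Chr" "Position" "Tombo_SiteScore" "Coverage_Sample" "Coverage_IVT" "Tombo_KmerScore"
"gene_A_4" "gene_A" "4" "0.0000" "92" "87" NA
"gene_A_8" "gene_A" "8" "0.0014" "95" "89" 0.004
'''

if __name__ == "__main__":
    sites = read_epinano_per_site(io.StringIO(EPINANO_SAMPLE))
    # Keep positions whose mismatch frequency exceeds an arbitrary 0.1 cutoff.
    print([(s["Ref"], s["pos"]) for s in sites if s["mis"] > 0.1])
    for row in read_tombo_scores(io.StringIO(TOMBO_SAMPLE)):
        print(row["Chr"], row["Position"], row["Tombo_KmerScore"])
```

With real outputs, pass a file handle such as ``open_table(path)`` instead of the in-memory samples. Using ``shlex.split`` for the Tombo table is a design choice that tolerates both spaces and tabs between quoted fields; if the actual files use a fixed separator, ``csv.reader`` with the matching ``delimiter`` would work as well.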