CHANGELOG

Version 3.0

  • mop_preprocess
    • Added a custom model for m6A basecalling. It is installed automatically when running INSTALL.sh. To use it, specify ``--pars_tools "drna_tool_splice_m6A_opt.tsv"``

    • Added support for CUDA 11 with guppy versions > 4.4.1.

    • Added readucks (optional) to improve demultiplexing with guppy.

    • New parameter “barcodes” for specifying a file that lists the barcodes to keep. See keep_barcodes.txt for an example.

    • Added a new model for direct RNA basecalling.

    • Added support for dorado basecalling. Demultiplexing with dorado is not yet supported.

    • Guppy versions >= 6.5.x are now supported. There is no need to specify different command lines for different guppy versions inside tool_opts: the pipeline detects the version and acts accordingly.

    • pod5 files are supported for dorado and guppy >= 6.5.x. In this case no fast5 or stats files are output, which limits the other pipelines that can be run afterwards.

  • mop_tail
    • Upgraded tailfindR to version 1.3.

    • tailfindR can be run either in standard mode or in nano3p mode (R9 or R10 chemistry) by setting tailfindr_mode to standard, n3ps_r9 or n3ps_r10.
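
The version-dependent behaviour described for guppy and pod5 can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the helper names parse_version and supports_pod5 are hypothetical, and the rule encoded here (dorado, or guppy >= 6.5) is taken only from the notes above.

```python
import re


def parse_version(text):
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_pod5(basecaller, version=""):
    """Hypothetical rule from the notes above: dorado, or guppy >= 6.5."""
    if basecaller == "dorado":
        return True
    if basecaller == "guppy":
        # Compare integer tuples, not strings, so that 6.10 sorts after 6.5.
        return parse_version(version)[:2] >= (6, 5)
    return False
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "6.10" would sort before "6.5".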

Version 2.0

  • Completely rewritten using Nextflow DSL2.

  • Subworkflows are stored in the independent repository BioNextflow.

  • The global Nextflow config is split into different profiles (cluster, cloud, local…)

  • Added the new module mop_consensus

  • mop_preprocess (formerly known as nanoPreprocess + nanoPreprocessSimple)
    • Can now read multiple runs at a time using the syntax “PATH/**/*.fast5”

    • Can also demultiplex fast5 files using guppy

    • deeplexicon can also be run on GPU

    • Parameters of each tool are stored in a tsv file. Pre-set files are provided for cDNA, DNA and dRNA (option --pars_tools)

    • Added a new discovery process that uses bambu or isoquant to discover and quantify new transcripts.

    • Demultiplexing, filtering, mapping, counting and discovery can each be switched off by setting the corresponding parameter to “NO”

    • saveSpace can be set to “YES” to reduce the amount of disk space required. WARNING: this makes it impossible to resume the pipeline!

    • Merged the old NanoPreprocess and NanoPreprocessSimple into mop_preprocess. The input type (fastq or fast5) determines which of the two execution modes is used.

    • Htseq-count now accepts alignments generated by minimap2. https://github.com/htseq/htseq/issues/33

    • A final_summary_.txt file can be specified in the params.config file for extracting kit and flowcell information. If it is not provided, that information or a custom model must be specified via extra parameters in one of the *_opt.tsv files; otherwise guppy will raise an error.

    • This module can be run on AWS Batch using the profile awsbatch

    • Demultiplexing of fast5 with deeplexicon is now faster thanks to multithreading and parallelization

  • mop_tail (formerly known as nanoTail)
    • Each analysis can now be launched independently

    • Parameters for each step can be fine-tuned in tools_opt.tsv

  • mop_mod (formerly known as nanoMod)
    • coming SOON!
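
To illustrate what a recursive pattern such as “PATH/**/*.fast5” selects, here is a minimal Python sketch that groups fast5 files by run folder. It assumes each run lives in its own sub-folder of PATH; the function name group_fast5_by_run is hypothetical, and Python's pathlib globbing may differ in detail from the matching that Nextflow performs.

```python
from collections import defaultdict
from pathlib import Path


def group_fast5_by_run(root):
    """Map each top-level run folder under root to its sorted fast5 file names."""
    root = Path(root)
    runs = defaultdict(list)
    # "*" selects a run folder, "**" descends through any nested sub-folders.
    for path in root.glob("*/**/*.fast5"):
        run_name = path.relative_to(root).parts[0]
        runs[run_name].append(path.name)
    return {run: sorted(names) for run, names in sorted(runs.items())}
```

Files nested at any depth inside a run folder are attributed to that run, while files with other extensions are ignored.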

Version 1.1

  • Added a new module called NanoPreprocessSimple that starts from fastq files instead of fast5 files. It allows the analysis of multiple files at a time.

  • Added support for vbz-compressed fast5 files (https://github.com/nanoporetech/vbz_compression) in NanoPreprocess, NanoMod and NanoTail

  • NanoPreprocess now also outputs CRAM files and can downsample with the parameter --downsampling

  • NanoPreprocess can perform variant calling using medaka (BETA)

  • NanoPreprocess can perform demultiplexing with guppy

  • Added plots for Epinano output in NanoMod

  • Added conversion of Tombo results to BED format in NanoMod

  • Added an INSTALL.sh file that automatically retrieves guppy 3.4.5 from https://mirror.oxfordnanoportal.com/, places it in NanoPreprocess/bin and creates the required links

  • Added profiles for running locally and on the CRG SGE cluster

Version 1.0

This is the original version published in the paper MasterOfPores: A Workflow for the Analysis of Oxford Nanopore Direct RNA Sequencing Datasets