This module estimates polyA tail sizes using two different methods (nanopolish and tailfindr). It reads the output produced by NanoPreprocess directly and, in particular, needs the read counts / assignment.


The module runs the following steps:

  1. check_reference: verifies whether the reference is zipped and unzips it if necessary.
  2. tailfindr: runs the tailfindr tool in parallel.
  3. collect_tailfindr_results: collects the results of tailfindr.
  4. filter_bam: filters the BAM files with samtools to keep only mapped reads and remove secondary alignments.
  5. tail_nanopolish: runs nanopolish in parallel.
  6. collect_nanopolish_results: collects the results of tail_nanopolish.
  7. join_results: merges the results from the two algorithms and plots their correlation.
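
The filter_bam step keeps mapped reads and drops secondary alignments. In terms of SAM flags, that means rejecting any record with bit 0x4 (unmapped) or bit 0x100 (secondary) set. The following is a minimal Python sketch of that rule only; the function name keep_alignment is hypothetical, and the exact samtools options used by the pipeline are not stated in this document.

```python
# SAM flag bits as defined in the SAM specification.
UNMAPPED = 0x4      # read is unmapped
SECONDARY = 0x100   # secondary alignment


def keep_alignment(flag: int) -> bool:
    """Return True if a record with this SAM flag is mapped and not secondary.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    return (flag & (UNMAPPED | SECONDARY)) == 0


# A few example flags and whether they would be kept.
for flag in (0, 16, 4, 256, 272):
    print(flag, keep_alignment(flag))
```

With samtools itself, the same rule can be written as `samtools view -b -F 260` (260 = 4 + 256); whether the pipeline uses exactly this form is an assumption.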

Input Parameters

  1. input_folders: path to the folders produced by the NanoPreprocess step.
  2. nanopolish_opt: options for the nanopolish program.
  3. tailfindr_opt: options for the tailfindr program.
  4. reference: reference genome / transcriptome.
  5. output: output folder.
  6. email


Three folders are created by the pipeline within the output folder:

  1. NanoPolish: contains the output of the nanopolish tool.
  2. Tailfindr: contains the output of the tailfindr tool.
  3. PolyA_final: contains the txt files with the combined results (i.e. predicted polyA sizes). Here is an example from a test run:
"Read name"	"Tailfindr"	"Nanopolish"	"Gene Name"
"013a5dde-9c52-4de1-83eb-db70fb2cd130"	52.16	49.39	"YKR072C"
"01119f62-ca68-458d-aa1f-cf8c8c04cd3b"	231.64	274.28	"YDR133C"
"0154ce9c-fe6b-4ebc-bbb1-517fdc524207"	24.05	24.24	"YFL044C"
"020cde28-970d-4710-90a5-977e4b4bbc46"	41.27	56.79	"YGL238W"
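
A table like the one above can be obtained by joining the per-read estimates of the two tools on the read name. The sketch below illustrates one way to do this in Python. It assumes the estimates are already available as dictionaries keyed by read name and that only reads seen by both tools are kept (an inner join); the function name join_estimates and the read "read-only-in-tailfindr" are hypothetical, and the pipeline's own join_results step may behave differently.

```python
def join_estimates(tailfindr, nanopolish, genes):
    """Inner-join two {read_name: tail_length} dicts and attach gene names.

    Hypothetical helper: reads missing from either tool are dropped,
    which is an assumption and not a documented pipeline behaviour.
    """
    shared = sorted(tailfindr.keys() & nanopolish.keys())
    return [(read, tailfindr[read], nanopolish[read], genes.get(read, "NA"))
            for read in shared]


# Values taken from the example table; the last tailfindr entry is invented
# to show that reads without a nanopolish estimate are left out.
tailfindr = {
    "013a5dde-9c52-4de1-83eb-db70fb2cd130": 52.16,
    "01119f62-ca68-458d-aa1f-cf8c8c04cd3b": 231.64,
    "read-only-in-tailfindr": 30.0,
}
nanopolish = {
    "013a5dde-9c52-4de1-83eb-db70fb2cd130": 49.39,
    "01119f62-ca68-458d-aa1f-cf8c8c04cd3b": 274.28,
}
genes = {
    "013a5dde-9c52-4de1-83eb-db70fb2cd130": "YKR072C",
    "01119f62-ca68-458d-aa1f-cf8c8c04cd3b": "YDR133C",
}

rows = join_estimates(tailfindr, nanopolish, genes)

# Print the joined rows as tab-separated text with a header line.
print("\t".join(["Read name", "Tailfindr", "Nanopolish", "Gene Name"]))
for read, tf_len, np_len, gene in rows:
    print("\t".join([read, str(tf_len), str(np_len), gene]))
```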

A plot showing the correlation between the two methods is also produced.
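
To check the agreement between the two methods numerically, the combined table can be parsed and a Pearson correlation coefficient computed. The following Python sketch uses only the standard library and the four example rows shown above, embedded as a string. The function names load_estimates and pearson are hypothetical, and the choice of Pearson correlation is an assumption; the statistic used for the pipeline's own plot is not stated here.

```python
import csv
import io
import math

# The example rows from the combined results, as tab-separated text.
SAMPLE = "\n".join([
    '"Read name"\t"Tailfindr"\t"Nanopolish"\t"Gene Name"',
    '"013a5dde-9c52-4de1-83eb-db70fb2cd130"\t52.16\t49.39\t"YKR072C"',
    '"01119f62-ca68-458d-aa1f-cf8c8c04cd3b"\t231.64\t274.28\t"YDR133C"',
    '"0154ce9c-fe6b-4ebc-bbb1-517fdc524207"\t24.05\t24.24\t"YFL044C"',
    '"020cde28-970d-4710-90a5-977e4b4bbc46"\t41.27\t56.79\t"YGL238W"',
])


def load_estimates(handle):
    """Read (tailfindr, nanopolish) length pairs from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t", quotechar='"')
    return [(float(row["Tailfindr"]), float(row["Nanopolish"])) for row in reader]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


pairs = load_estimates(io.StringIO(SAMPLE))
r = pearson(pairs)
print(f"reads: {len(pairs)}, Pearson r: {r:.3f}")
```

For a real run, the io.StringIO object would be replaced by an open file handle on one of the txt files in the PolyA_final folder.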