Small project

Data set selection

We will try to analyze the following data set.


GEO data set GSE128010 studies the effect of knocking out the gene SPP381 in Saccharomyces cerevisiae.
The data is paired-end! You need to adapt the pipeline!
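
Since every paired-end run yields two read files, it can help to pair them up explicitly before running any tool. The sketch below is only an illustration: it assumes the files follow the `<run>_1.fastq.gz` / `<run>_2.fastq.gz` naming that `fastq-dump --split-files` produces, and the function name `pair_fastq_files` and the accessions are hypothetical placeholders, not the real runs of this data set.

```python
import re
from collections import defaultdict

# Matches names such as "SRR0000001_1.fastq.gz" (hypothetical accession).
MATE_PATTERN = re.compile(r"^(?P<run>.+)_(?P<mate>[12])\.fastq(\.gz)?$")


def pair_fastq_files(filenames):
    """Group FASTQ file names into (mate1, mate2) tuples keyed by run."""
    mates = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"not a split paired-end FASTQ name: {name}")
        mates[match.group("run")][match.group("mate")] = name

    pairs = {}
    for run, found in mates.items():
        # Both mates must be present, otherwise the run cannot be used as paired-end.
        if set(found) != {"1", "2"}:
            raise ValueError(f"run {run} is missing a mate file")
        pairs[run] = (found["1"], found["2"])
    return pairs


# Placeholder file names, listed in arbitrary order.
example = [
    "SRR0000002_2.fastq.gz",
    "SRR0000001_1.fastq.gz",
    "SRR0000001_2.fastq.gz",
    "SRR0000002_1.fastq.gz",
]
for run, (mate1, mate2) in sorted(pair_fastq_files(example).items()):
    print(run, mate1, mate2)
```

Failing early on a missing mate avoids silently treating a paired-end run as single-end further down the pipeline.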

Pipeline

  • Create a new folder in ~/rnaseq_course
  • Download raw data from SRA Run Selector with fastq-dump
    • Backup option: alternatively, you can download the data with wget from: https://public-docs.crg.es/biocore/projects/training/PHINDaccess2020/miniproject
  • Quality control of the data with FastQC.
  • Decide if you need to trim the data! If so, use skewer.
  • Retrieve the reference genome and annotation files from ENSEMBL. ENSEMBL also provides FASTA files for transcripts in the cdna folder of the FTP site.
  • Prepare SALMON index.
  • Map data with SALMON
  • Proceed with the differential expression analysis with DESeq2: import the data, fit the model, extract the differential expression of WT vs KO, build a dendrogram and run a PCA. Filter genes.
  • If any time is left, explore what kind of functional analysis you can run on this data set.
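
The command-line steps of the pipeline above (download, quality control, optional trimming, indexing and quantification) could be assembled roughly as follows for one paired-end run. This is a sketch under assumptions, not the course's reference solution: the run accession, file names and directory names are hypothetical placeholders, the helper functions are invented for illustration, and the flags should be checked against the `--help` output of the versions you have installed. The code only builds and prints the commands; it does not execute them.

```python
import shlex


def fastq_dump_command(run, out_dir):
    # --split-files writes the two mates to separate files (<run>_1 and <run>_2).
    return ["fastq-dump", "--split-files", "--gzip", "--outdir", out_dir, run]


def fastqc_command(fastq_files, out_dir):
    return ["fastqc", "--outdir", out_dir, *fastq_files]


def skewer_command(mate1, mate2, out_prefix):
    # -m pe selects paired-end trimming mode; -o sets the output prefix.
    return ["skewer", "-m", "pe", "-o", out_prefix, mate1, mate2]


def salmon_index_command(transcripts_fasta, index_dir):
    return ["salmon", "index", "-t", transcripts_fasta, "-i", index_dir]


def salmon_quant_command(index_dir, mate1, mate2, out_dir, threads=4):
    # -l A lets Salmon infer the library type; -1/-2 pass the two mates.
    return ["salmon", "quant", "-i", index_dir, "-l", "A",
            "-1", mate1, "-2", mate2, "-p", str(threads), "-o", out_dir]


def paired_end_commands(run, transcripts_fasta):
    """Return the commands for one run, in pipeline order."""
    mate1 = f"fastq/{run}_1.fastq.gz"
    mate2 = f"fastq/{run}_2.fastq.gz"
    return [
        fastq_dump_command(run, "fastq"),
        fastqc_command([mate1, mate2], "qc"),
        # Optional: only needed if the FastQC reports suggest trimming.
        skewer_command(mate1, mate2, f"trimmed/{run}"),
        salmon_index_command(transcripts_fasta, "salmon_index"),
        # Uses the raw mates; substitute the trimmed files if you ran skewer.
        salmon_quant_command("salmon_index", mate1, mate2, f"quant/{run}"),
    ]


# "SRR0000001" and "transcripts.fa" are placeholders, not real inputs.
for command in paired_end_commands("SRR0000001", "transcripts.fa"):
    print(shlex.join(command))
```

Keeping each command as a list of arguments makes it easy to pass them to `subprocess.run` later, once the printed commands look right. The index only needs to be built once, not once per run.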

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67163