Part 1 Welcome

About the course

The tidyverse is a set of packages widely used in the R community for reading, tidying, manipulating and visualizing data. It is one of the most popular and actively maintained sets of tools for data analysis and data science in R.

All tidyverse packages share a common vocabulary and grammar, which makes code more intuitive and easier to read than the equivalent base R code.
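As a rough illustration of this shared grammar, the sketch below computes the same per-group mean twice, once with base R and once with {dplyr} verbs. The `genes` data frame and its column names are invented for this example and are not part of the course material.

```r
library(dplyr)

# Toy data frame, invented for illustration only
genes <- data.frame(
  gene = c("A", "B", "C", "D"),
  chr = c("chr1", "chr1", "chr2", "chr2"),
  expression = c(10, 25, 5, 40)
)

# Base R: subset rows with bracket indexing, then aggregate by chromosome
kept <- genes[genes$expression > 8, ]
base_result <- aggregate(expression ~ chr, data = kept, FUN = mean)

# tidyverse: the same steps written as a chain of verbs
tidy_result <- genes %>%
  filter(expression > 8) %>%
  group_by(chr) %>%
  summarise(mean_expression = mean(expression))
```

Both versions give the same means per chromosome; the piped version reads top to bottom as "filter, group, summarise".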

This 8-hour training introduces some of the tidyverse packages and functions for data wrangling and manipulation: dplyr, tidyr, stringr and readr.

Dates, time & location

  • Dates (2021):
    • April 19th and 21st
  • Time:
    • 9:30-13:30
  • Location:
    • Online (Zoom: find details in the dedicated Moodle page)

Instructors

Sarah Bonnin
Julia Ponomarenko
from the CRG Bioinformatics core facility (office , 4th floor hospital side)

Prerequisites

This is an intermediate course.
Familiarity with R scripting is required: syntax, package installation, object manipulation, data import/export.

Material

All material is available from this page (https://biocorecrg.github.io/CRG_R_tidyverse_2021) and will be regularly updated.

If you want to get the latest version locally, you can:

  • download and uncompress the zip archive,
  • keep only the “docs” and “images” folders,
  • open the “index.html” file in a web browser.

Program

  • Data import & export with {readr}
  • Tibble characteristics and manipulation
  • Tidy data definition
  • Tidying data with {tidyr}:
    • separate & unite
    • pivot (long and wide formats): pivot_longer, pivot_wider
    • complete (missing values)
  • “forward-pipe”: %>% from the {magrittr} package
  • Data manipulation with {dplyr}:
    • mutate, mutate_at, transmute
    • select, select_if
    • filter
    • summarise, group_by
    • arrange
    • “join” functions
    • count
  • Handling missing data
  • String manipulation with {stringr}:
    • str_remove
    • str_length
    • str_c (paste)
    • str_sub
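
To give a feel for how several of the functions listed above fit together, here is a minimal sketch that reshapes a small table with {tidyr} and cleans up labels with {stringr}. The `counts_wide` table and all of its column names are hypothetical, made up for this example only.

```r
library(tidyr)
library(dplyr)
library(stringr)

# Hypothetical wide table: one row per gene, one column per sample
counts_wide <- tibble(
  gene = c("geneA", "geneB"),
  sample_1 = c(10, 3),
  sample_2 = c(12, 7)
)

# Wide to long: one row per gene/sample pair
counts_long <- counts_wide %>%
  pivot_longer(
    cols = starts_with("sample_"),
    names_to = "sample",
    values_to = "count"
  ) %>%
  mutate(
    sample = str_remove(sample, "sample_"),   # drop the common prefix
    gene_letter = str_sub(gene, 5, 5),        # 5th character of the gene name
    label = str_c(gene, sample, sep = "_"),   # paste gene and sample together
    label_length = str_length(label)          # number of characters in the label
  )

# Long back to wide: one column per sample again
counts_back <- counts_long %>%
  select(gene, sample, count) %>%
  pivot_wider(names_from = sample, values_from = count)
```

Printing `counts_long` and `counts_back` after each step is an easy way to see what every verb does to the table.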