Intermediate R: introduction to data wrangling with the Tidyverse (2021)
2021-04-21
Part 1 Welcome
About the course
The tidyverse is a set of packages widely used in the R community for powerful and efficient data reading, tidying, manipulation and visualization. It is one of the most popular and up-to-date sets of tools for data analysis and data science using the R language.
All tidyverse packages share a common vocabulary and grammar, which makes code more intuitive and easier to read than base R.
This 8-hour training introduces some of the tidyverse packages and functions for data wrangling and manipulation: dplyr, tidyr, stringr and readr.
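As a rough illustration of the shared grammar mentioned above, the sketch below computes the same summary twice, once with base R and once with dplyr. The choice of the built-in mtcars dataset, the threshold of 100 horsepower and the object names base_result and tidy_result are assumptions made for this example only; they are not taken from the course material.

```r
library(dplyr)

# Base R: mean miles-per-gallon per cylinder count, for cars with hp > 100
base_result <- aggregate(mpg ~ cyl, data = mtcars[mtcars$hp > 100, ], FUN = mean)

# dplyr: the same computation written as a sequence of verbs,
# chained with the forward-pipe operator %>%
tidy_result <- mtcars %>%
  filter(hp > 100) %>%          # keep rows matching a condition
  group_by(cyl) %>%             # define the groups
  summarise(mpg = mean(mpg))    # one summary row per group

print(base_result)
print(tidy_result)
```

Both versions should return one row per cylinder count; the dplyr version reads top to bottom as "filter, then group, then summarise".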
Dates, time & location
- Dates (2021):
- April 19th and 21st
- Time:
- 9:30-13:30
- 9:30-13:30
- Location:
- Online (Zoom: find details in the dedicated Moodle page)
Instructors
Sarah Bonnin
Julia Ponomarenko
from the CRG Bioinformatics core facility (office: 4th floor, hospital side)
Prerequisites
This is an intermediate course.
Familiarity with R scripting is required: syntax, package installation, object manipulation, and data import/export.
Material
All material is available from this page (https://biocorecrg.github.io/CRG_R_tidyverse_2021) and will be regularly updated.
If you want to get the latest version locally, you can:
- download and uncompress the zip archive
- keep only the “docs” and “images” folders
- open the “index.html” file in a web browser
Program
- Data import & export with {readr}
- Tibble characteristics and manipulation
- Tidy data definition
- Tidying data with {tidyr}:
- separate & unite
- pivot (long and wide formats): pivot_longer, pivot_wider
- complete (missing values)
- The “forward-pipe” operator: %>% from the {magrittr} package
- Data manipulation with {dplyr}:
- mutate, mutate_at, transmute
- select, select_if
- filter
- summarise, group_by
- arrange
- “join” functions
- count
- Handling missing data
- String manipulation with {stringr}:
- str_remove
- str_length
- str_c (paste)
- str_sub
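To give a feel for how several functions from the program fit together, here is a minimal sketch that tidies a small table and summarises it. The toy tibble expr, its column names and its values are hypothetical and invented for this example; the course itself may use different data.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(stringr)

# Hypothetical toy data: one row per gene, one column per sample
expr <- tibble(
  gene     = c("geneA_chr1", "geneB_chr2"),
  sample_1 = c(10, 25),
  sample_2 = c(30, NA)
)

tidy_expr <- expr %>%
  # tidyr: split one column into two
  separate(gene, into = c("gene", "chromosome"), sep = "_") %>%
  # tidyr: wide to long format, one row per gene/sample pair
  pivot_longer(starts_with("sample_"),
               names_to = "sample", values_to = "count") %>%
  # stringr inside dplyr::mutate: strip the "sample_" prefix
  mutate(sample = str_remove(sample, "sample_")) %>%
  # dplyr: drop rows with a missing count
  filter(!is.na(count))

summary_expr <- tidy_expr %>%
  group_by(gene) %>%
  summarise(mean_count = mean(count)) %>%
  arrange(desc(mean_count))

print(tidy_expr)
print(summary_expr)
```

Each step takes a tibble and returns a tibble, which is what lets the %>% operator chain functions from different packages in a single pipeline.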