4.1 Read

read_delim, read_csv and read_tsv each read a delimited file into a tibble. They differ only in the delimiter they expect:

Table 4.1: read_delim and derived functions

name        delim
read_delim  needs to be set
read_csv    ","  (comma-separated)
read_tsv    "\t" (tab-separated)
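
To illustrate the first row of the table, here is a minimal sketch of read_delim with an explicit delimiter. The file and its columns (name, height, smoker) are made up for the example and written to a temporary path; they are not part of the course data.

```r
library(readr)

# Hypothetical sample data: three fields separated by semicolons.
tmp <- tempfile(fileext = ".txt")
writeLines(c("name;height;smoker",
             "Ann;1.62;TRUE",
             "Bob;1.80;FALSE"), tmp)

# Neither read_csv nor read_tsv fits this file,
# so the delimiter is passed explicitly to read_delim.
d <- read_delim(tmp, delim = ";")
d
```

The same call with delim="," or delim="\t" should behave like read_csv or read_tsv, respectively.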


Read in a file provided as an example with the readr package:

f1 <- read_csv(readr_example("mtcars.csv"))
## 
## ── Column specification ───────────────────────────────────────────────────────────────────────
## cols(
##   mpg = col_double(),
##   cyl = col_double(),
##   disp = col_double(),
##   hp = col_double(),
##   drat = col_double(),
##   wt = col_double(),
##   qsec = col_double(),
##   vs = col_double(),
##   am = col_double(),
##   gear = col_double(),
##   carb = col_double()
## )

Read in a file available online (https protocol):

f2 <- read_tsv("https://public-docs.crg.es/biocore/projects/training/R_tidyverse_2021/inputB.txt")
## 
## ── Column specification ───────────────────────────────────────────────────────────────────────
## cols(
##   State = col_character(),
##   Population = col_double(),
##   Capital = col_character(),
##   Eurozone = col_logical()
## )

As you can see, readr prints the column specification it used, so you can check that each column was read with the type you intended.
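
If the printed specification is not what you intended, one option is to state the types yourself with the col_types argument. The sketch below assumes a small, made-up file (columns id, zip, score) written to a temporary path; spec() then returns the specification that was actually applied.

```r
library(readr)

# Hypothetical sample data; "zip" should stay text to keep its leading zero.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,zip,score",
             "1,08003,4.5",
             "2,08028,3.9"), tmp)

# State the type of each column instead of relying on readr's guess.
d2 <- read_csv(tmp,
               col_types = cols(id    = col_integer(),
                                zip   = col_character(),
                                score = col_double()))

# Inspect the column specification attached to the tibble.
spec(d2)
```

When col_types is supplied for every column, readr has nothing left to guess, which makes the import reproducible across files with the same layout.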

Useful arguments to consider, as you read in the file:

  • n_max=k : read in a subset (first k rows).
  • col_names:
    • col_names=FALSE : your data doesn’t contain headers/column names.
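    • col_names=c("A", "B", "C") : you are providing a vector of column names (the header).
  • skip=j : skip the first j lines of the file before reading.

For example, skip the original header line, supply new column names, and read only the first 5 rows:
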
f1 <- read_csv(readr_example("mtcars.csv"),
               n_max=5,                  # read only the first 5 data rows
               skip=1,                   # skip the original header line
               col_names=LETTERS[1:11])  # name the 11 columns A to K
## 
## ── Column specification ───────────────────────────────────────────────────────────────────────
## cols(
##   A = col_double(),
##   B = col_double(),
##   C = col_double(),
##   D = col_double(),
##   E = col_double(),
##   F = col_double(),
##   G = col_double(),
##   H = col_double(),
##   I = col_double(),
##   J = col_double(),
##   K = col_double()
## )
f1
## # A tibble: 5 x 11
##       A     B     C     D     E     F     G     H     I     J     K
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
## 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
## 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
## 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
## 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
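
The col_names=FALSE case from the list above can be sketched in the same way. This example assumes a small, made-up file without a header line, written to a temporary path; readr then names the columns X1, X2, and so on.

```r
library(readr)

# Hypothetical sample data with no header line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,a,TRUE",
             "2,b,FALSE",
             "3,c,TRUE"), tmp)

# With col_names=FALSE the first line is treated as data, not as a header.
h <- read_csv(tmp, col_names = FALSE)
names(h)
```

Without col_names=FALSE, the first line (1,a,TRUE) would be consumed as the header and only two rows of data would remain.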