Part 7 Data manipulation
The {dplyr} package introduces a grammar of data manipulation. See the nice cheat sheet.
We will first introduce the 5 intuitively-named key functions from {dplyr}:
name | what it does |
---|---|
mutate | adds new variables (columns) that are functions of existing variables |
select | picks variables (columns) based on their names |
filter | picks observations (rows) based on their values |
summarise | collapses multiple values down to a single summary |
arrange | changes the ordering of the rows |
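To make the five verbs concrete, here is a minimal sketch that applies each of them once. The `scores` table and its columns are hypothetical, made up only for illustration; the verbs themselves come from {dplyr}.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
scores <- tibble(
  student = c("Ana", "Ben", "Cleo", "Dev"),
  group   = c("a", "b", "a", "b"),
  points  = c(12, 18, 15, 9)
)

# mutate(): add a column computed from an existing one
with_pct <- mutate(scores, pct = points / 20 * 100)

# select(): keep only some columns, chosen by name
names_points <- select(scores, student, points)

# filter(): keep only the rows that satisfy a condition
high <- filter(scores, points >= 15)

# summarise(): collapse all rows into a single summary row
avg <- summarise(scores, mean_points = mean(points))

# arrange(): reorder the rows, here by decreasing points
ranked <- arrange(scores, desc(points))
```

Each call takes the table as its first argument and returns a new table, which is why every result is stored under its own name here.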
All 5 functions work in a similar and consistent way:
- The first argument is the input: a data frame or a tibble.
- The output is a new object of the same type as the input: a tibble in gives a tibble out.
Note that {dplyr} never modifies the input: to keep a result, you must assign the output to a new object, or back to the same one.
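The following sketch illustrates this behaviour. The objects `df`, `df_sorted` and `scratch` are hypothetical names chosen for the example.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
df <- tibble(x = c(3, 1, 2))

# The sorted result is returned (and printed at the console),
# but it is not stored anywhere
arrange(df, x)

# df itself is unchanged: its rows are still in the original order
df$x

# Save the result in a new object ...
df_sorted <- arrange(df, x)

# ... or deliberately overwrite the input with the result
scratch <- df
scratch <- mutate(scratch, y = x * 2)
```

Overwriting the input is convenient in short scripts, but a new name keeps the original data available for later steps.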
We will use the presidential data set from the {ggplot2} package.
It describes the terms of the presidents of the USA, from Eisenhower to Obama:
- Name
- Term starting date
- Term ending date
- Political party
name | start | end | party |
---|---|---|---|
Eisenhower | 1953-01-20 | 1961-01-20 | Republican |
Kennedy | 1961-01-20 | 1963-11-22 | Democratic |
Johnson | 1963-11-22 | 1969-01-20 | Democratic |
Nixon | 1969-01-20 | 1974-08-09 | Republican |
Ford | 1974-08-09 | 1977-01-20 | Republican |
Carter | 1977-01-20 | 1981-01-20 | Democratic |
Reagan | 1981-01-20 | 1989-01-20 | Republican |
Bush | 1989-01-20 | 1993-01-20 | Republican |
Clinton | 1993-01-20 | 2001-01-20 | Democratic |
Bush | 2001-01-20 | 2009-01-20 | Republican |
Obama | 2009-01-20 | 2017-01-20 | Democratic |
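As a minimal sketch, the data set can be loaded and inspected as follows. This assumes the {ggplot2} package is installed; depending on the installed version, the data set may contain rows for later presidents than the table above shows. The name `democratic_terms` is chosen only for this example.

```r
library(dplyr)

# Assumes ggplot2 is installed; the data set ships with that package
presidential <- ggplot2::presidential

# Column names and types at a glance
glimpse(presidential)

# A first use of one of the verbs: keep only the Democratic presidencies
democratic_terms <- filter(presidential, party == "Democratic")
democratic_terms
```

Accessing the data set with `ggplot2::presidential` avoids attaching the whole {ggplot2} package when only the data are needed.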