## 5.2 Separate & Unite

Let’s work on the untidy tibble `table5`, a built-in data set from the `tidyr` package (table5 displays the number of tuberculosis cases documented by the WHO in selected countries):

Table 5.1: `table5`
country century year rate
Afghanistan 19 99 745/19987071
Afghanistan 20 00 2666/20595360
Brazil 19 99 37737/172006362
Brazil 20 00 80488/174504898
China 19 99 212258/1272915272
China 20 00 213766/1280428583

### 5.2.1 Separate

Function `separate` separates (indeed!) a column into 2 or more columns, given a specified field separator: `separate(data, col, into, sep)`

``````# separate column "rate" into 2 columns that we name cases and population, based on the "/" separator.
table5_1 <- separate(data=table5,
col=rate,
into=c("cases", "population"),
sep="/"
)

table5_1``````
``````## # A tibble: 6 x 5
##   country     century year  cases  population
##   <chr>       <chr>   <chr> <chr>  <chr>
## 1 Afghanistan 19      99    745    19987071
## 2 Afghanistan 20      00    2666   20595360
## 3 Brazil      19      99    37737  172006362
## 4 Brazil      20      00    80488  174504898
## 5 China       19      99    212258 1272915272
## 6 China       20      00    213766 1280428583``````

The default field separator is a regular expression that matches any sequence of non-alphanumeric values.
So this would work well in our example:

``````separate(data=table5,
col=rate,
into=c("cases", "population")
)``````
``````## # A tibble: 6 x 5
##   country     century year  cases  population
##   <chr>       <chr>   <chr> <chr>  <chr>
## 1 Afghanistan 19      99    745    19987071
## 2 Afghanistan 20      00    2666   20595360
## 3 Brazil      19      99    37737  172006362
## 4 Brazil      20      00    80488  174504898
## 5 China       19      99    212258 1272915272
## 6 China       20      00    213766 1280428583``````

It is good practice to explicitly specify the field separator, so as not to get any bad surprise.

### 5.2.2 Unite

Function `unite` does just the opposite: it will unite/stick together 2 or (more) columns: `unite(data, col, column names, sep)`.

``````# unite columns "century" and "year"
table5a_2 <- unite(data=table5_1,
col=year,
c("century", "year"),
sep="")

table5a_2``````
``````## # A tibble: 6 x 4
##   country     year  cases  population
##   <chr>       <chr> <chr>  <chr>
## 1 Afghanistan 1999  745    19987071
## 2 Afghanistan 2000  2666   20595360
## 3 Brazil      1999  37737  172006362
## 4 Brazil      2000  80488  174504898
## 5 China       1999  212258 1272915272
## 6 China       2000  213766 1280428583``````

#### HANDS-ON

Fix the following untidy `tibble`:

``````patients <- tibble(
ID1=c("A", "A", "B", "B"),
ID2=c(12, 13, 14, 15),
age=c( 21,  32, 25, 43),
c("188_93", "167_55", "155_51", "175_72")
)``````
• Separate the two variables that are found in the last column into columns height and weight: save in object `patients2`.
• Unite columns ID1 and ID2 so as to create a unique ID per patient: save in `patients3`.
``````# Separate the two variables that are found in the last column.
patients2 <- separate(data=patients,
col=4,
into=c("height", "weight"),
sep="_")

# Unite columns ID1 and ID2 so as to create a unique ID per patient.
patients3 <- unite(data=patients2,
col="ID",
c("ID1", "ID2"),
sep="")``````
• Is there a problem with `patients3`? Display the tibble in the R console…

Columns `height` and `weight` are treated as characters (because they originally contained underscores "_" !).
We can convert them into numbers with the `convert=TRUE` parameter in `separate`:

``````# convert=TRUE: runs `type.convert` fonction (that converts data to appropriate type)
patients2 <- separate(data=patients,
col=4,
into=c("height", "weight"),
sep="_",
convert=TRUE)``````