17.6 Exercise 10: Regular expressions
Create the script “exercise8.R” and save it to the “Rcourse/ExtraModule” directory: you will save all the commands of exercise 8 in that script.
Remember you can comment the code using #.
Answer
getwd()
setwd("~/Rcourse/ExtraModule")
1- Play with grep
- Create the following data frame
<- data.frame(age=c(32, 45, 12, 67, 40, 27),
df2 citizenship=c("England", "India", "Spain", "Brasil", "Tunisia", "Poland"),
row.names=paste(rep(c("Patient", "Doctor"), c(4, 2)), 1:6, sep=""),
stringsAsFactors=FALSE)
Using grep: create the smaller data frame df3 that contains only the Patient’s but NOT the Doctor’s information.
Answer
# Select row names
rownames(df2)
# Select only rownames that correspond to patients
grep("Patient", rownames(df2))
# Create data frame that contains only those rows
<- df2[grep("Patient", rownames(df2)), ] df3
- Use grep and one regular expression to retrieve from df2 patients/doctors coming from either Brasil or Spain. Brainstorm a bit!
Answer
grep("a[a-z]*i", df2$citizenship),] df2[
- Use grep and one regular expression to retrieve from df2 patients/doctors coming from either Brasil or England.
Answer
grep("[gi]l", df2$citizenship),] df2[
2- Play with gsub
Build this vector of file names:
<- c("L2_sample1_GTAGCG.fastq.gz", "L1_sample2_ATTGCC.fastq.gz",
vector1 "L1_sample3_TGTTAC.fastq.gz", "L4_sample4_ATGGTA.fastq.gz")
Use gsub and an appropriate regular expression on vector1 to retrieve only “sample1,” “sample2,” “sample3” and “sample4.”
Answer
# | is used as OR
gsub(pattern="L[124]{1}_|_[ATGC]{6}.fastq.gz",
replacement="",
x=vector1)