16.8 Exercise 9: ggplot2
Create the script “exercise9.R” and save it to the “Rcourse/Module3” directory: you will save all the commands of exercise 9 in that script.
Remember you can comment the code using #.
Answer
getwd()
setwd("~/Rcourse/Module3")
16.8.1 Exercise 9a- Scatter plot
1- Load ggplot2 package
Answer
library(ggplot2)
2- Download the data we will use for plotting:
download.file("https://raw.githubusercontent.com/biocorecrg/CRG_RIntroduction/master/ex9_normalized_intensities.csv", "ex9_normalized_intensities.csv", method="curl")
3- Read file into object “project1” (remember the input/output tutorial!)
About this file:
- It is comma separated (csv format).
- The first row is the header.
- Take the row names from the first column.
Answer
<- read.table("ex9_normalized_intensities.csv",
project1 sep=",",
header=TRUE,
row.names = 1)
4- Using ggplot, create a simple scatter plot representing gene expression of “sampleB” on the x-axis and “sampleH” on the y-axis.
Answer
ggplot(data=project1, mapping=aes(x=sampleB, y=sampleH)) +
geom_point()
5- Add a column to the data frame “project1” (call this column “expr_limits”), that will be filled the following way:
- if the expression of a gene is > 13 in both sampleB and sampleH, set to the value in “expr_limits” to “high”
- if the expression of a gene is < 6 in both sampleB and sampleH, set it to “low”
- if different, set it to “normal.”
Answer
# Initialize all values to "normal"
$expr_limits <- "normal"
project1# "high" if project1$sampleB > 13 and project1$sampleH > 13
$expr_limits[project1$sampleB > 13 & project1$sampleH > 13] <- "high"
project1# "low" if project1$sampleB < 6 and project1$sampleH < 6
$expr_limits[project1$sampleB < 6 & project1$sampleH < 6] <- "low"
project1
## more complicated version, using a for loop and if statement
# initialize column "expr_limits" with "normal"
$expr_limits <- "normal"
project1# loop around each row of "project1"
for(i in 1:nrow(project1)){
# create an object that contains only row "i" (the row will be different at every iteration)
<- project1[i,]
rowi # test values in rowi: assign expr_limits accordingly
if(rowi$sampleB > 13 & rowi$sampleH > 13){
$expr_limits[i] <- "high"
project1else if(rowi$sampleB < 6 & rowi$sampleH < 6){
}$expr_limits[i] <- "low"
project1
} }
6- Color the points of the scatter plot according to the newly created column “expr_limits.” Save that plot in the object “p”
Answer
<- ggplot(data=project1, mapping=aes(x=sampleB, y=sampleH, color=expr_limits)) +
p geom_point()
7- Add a layer to “p” in order to change the points colors to blue (for low), grey (for normal) and red (for high). Save this plot in the object “p2.”
Answer
<- p + scale_color_manual(values=c("red", "blue", "grey")) p2
8- Save p2 in a jpeg file.
a. Try with RStudio Plots window (Export)
b. Try in the console:
Answer
jpeg("myscatterggplot.jpg")
p2dev.off()
16.8.2 Exercise 9b- Box plot
1- Convert “project1” from a wide format to a long format: save in the object “project_long” Note: remember melt function from reshape2 package.
Answer
library(reshape2)
<- melt(data=project1) project_long
2- Produce a boxplot of the expression of all samples (i.e. each sample is represented by a box)
Answer
ggplot(data=project_long, mapping=aes(x=variable, y=value)) +
geom_boxplot()
3- Modify the previous boxplot so as to obtain 3 “sub-boxplots” per sample, each representing the expression of either “low,” “normal” or “high” genes.
Answer
ggplot(data=project_long, mapping=aes(x=variable, y=value, color=expr_limits)) +
geom_boxplot()
4- Rotate the x-axis labels (90 degrees angle).
This is new ! Google it !!
Answer
ggplot(data=project_long, mapping=aes(x=variable, y=value, color=expr_limits)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90))
5- Finally, add a title of your choice to the plot.
Answer
ggplot(data=project_long, mapping=aes(x=variable, y=value, color=expr_limits)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90)) +
ggtitle("My boxplots")
16.8.3 Exercise 9c- Bar plot
1- Produce a bar plot of how many low/normal/high genes are in the column “expr_limits” of “project1.”
Answer
ggplot(data=project1, mapping=aes(x=expr_limits)) +
geom_bar()
2- Add an horizontal line at counts 250 (y-axis). Save the plot in the object “bar”
Answer
<- ggplot(data=project1, mapping=aes(x=expr_limits)) +
bar geom_bar() +
geom_hline(yintercept=250)
3- Swap the x and y axis. Save in object “bar2.”
Answer
<- bar + coord_flip() bar2
4- Save “bar” and “bar2” plots in a “png” file, using the png()** function, in the console: use grid.arrange (from the gridExtra package) to organize both plots in one page !**
Answer
png("mybarplots.png", width=1000)
grid.arrange(bar, bar2, nrow=1, ncol=2)
dev.off()
16.8.4 Exercise 9d- Histogram
1- Create a simple histogram using project_long (column “value”).
Answer
ggplot(data=project_long, mapping=aes(x=value)) +
geom_histogram()
2- Notice that you get the following warning message" stat_bin() using bins = 30
. Pick better value with binwidth
.
Set “bins”" parameter of geom_histogram() to 50.
Answer
ggplot(data=project_long, mapping=aes(x=value)) +
geom_histogram(bins=50)
3- The histogram plots the expression values for All samples.
Change the plot so as to obtain one histograms per sample.
Answer
ggplot(data=project_long, mapping=aes(x=value, fill=variable)) +
geom_histogram(bins=50)
4- By default, geom_histogram produces a stacked histogram.
Change argument “position” to “dodge.”
Answer
<- ggplot(data=project_long, mapping=aes(x=value, fill=variable)) +
hist1 geom_histogram(position="dodge")
5- A bit messy ?? Run the following:
<- ggplot(data=project_long, mapping=aes(x=value, fill=variable)) +
hist2 geom_histogram(bins=50) +
facet_grid(~variable)
facet_grid() is another easy way to split the views!
6- Change the default colors with scale_fill_manual().
You can try the rainbow() function for coloring.
Answer
<- hist2 + scale_fill_manual(values=rainbow(8)) hist3
7- “Zoom in” the plots: set the x-axis limits from from 6 to 13.
Add the xlim() layer.
Answer
<- hist3 + xlim(6, 13) hist4
8- Change the default theme to theme_minimal()
Answer
<- hist4 + theme_minimal() hist5
9- Save that last plot to a file (format of your choice) with ggsave()
Answer
ggsave(filename="myhistograms.png", plot=hist5, device="png", width=20)