12.8 Exercise 6: about Module 2…
Create the script “exercise6.R” and save it to the “Rcourse/Module2” directory: you will save all the commands of exercise 6 in that script.
Remember you can comment the code using #.
Answer
getwd()
setwd("Rcourse/Module2")
setwd("~/Rcourse/Module2")
1- Install and load package ggplot2
Answer
# Install
install.packages(pkgs="ggplot2")
# Load in the environment
library("ggplot2")
Check with sessionInfo() that the package is indeed loaded. Was version of ggplot2 did you get?
2- ggplot2 loads automatically the diamonds dataset in the working environment: you can use it as an object after ggplot2 is loaded. You can see it in “Environment -> change”Global Environment" to “package:ggplot2.”
What are the dimensions of diamonds? What are the column names of diamond?
Answer
# Dimensions of diamonds
dim(diamonds)
# Column names of diamonds
colnames(diamonds)
You can read the help page of the diamonds dataset to understand what it contains!
Note: diamonds is a data frame: you can test it with is.data.frame(diamonds) (returns TRUE).
3. How many diamonds have a “Premium” cut ?
Answer
nrow(diamonds[diamonds$cut == "Premium",])
4. Select diamonds that have a “Premium” cut AND an “E” color. Save in the new object diams1. How many rows does diams1 have ?
Answer
<- diamonds[diamonds$cut == "Premium" & diamonds$color == "E",] diams1
5- Install and load the package dplyr from the R Console.
Answer
# Install package
install.packages(pkgs="dplyr")
# Load package
library("dplyr")
6- Use the function “sample_n” from the dplyr package (check the help page for sample_n) to randomly sample 200 lines of diams1: save in the new diams object.
Answer
# Subset data frame
<- sample_n(tbl=diams1, size=200) diams
7- What are the minimum, average and median “carat” of the 200 diamonds in diams ?
Answer
min(diams$carat)
mean(diams$carat)
median(diams$carat)
summary(diams$carat)
8- Save diams into a csv file (try the “write.csv” function.)
Answer
# Write a csv file with csv
write.csv(x=diams,
file="diamonds_Premium_E_200.csv",
row.names=FALSE,
quote=FALSE)