12.8 Exercise 6: about Module 2…

Create the script “exercise6.R” and save it to the “Rcourse/Module2” directory: you will save all the commands of exercise 6 in that script.
Remember you can comment the code using #.

Answer
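# Check the current working directory
getwd()
# Go to Rcourse/Module2 with a path relative to the current directory...
setwd("Rcourse/Module2")
# ...or with a path starting from the home directory (use one or the other)
setwd("~/Rcourse/Module2")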
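
The relative path only works if the current directory is the parent of “Rcourse”, which is why it helps to check getwd() first. Here is a minimal sketch of moving into a directory and back. It assumes a throwaway folder under tempdir() as a stand-in for the course folder; the names target_dir, old_dir and current_dir are chosen for illustration only.

```r
# Stand-in for "~/Rcourse/Module2", built under a temporary directory
# so that the sketch does not touch your home directory (assumption)
target_dir <- file.path(tempdir(), "Rcourse", "Module2")
dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)

# setwd() invisibly returns the previous working directory
old_dir <- setwd(target_dir)

# Confirm where we are now
current_dir <- getwd()
print(basename(current_dir))

# Go back to where we started
setwd(old_dir)
```

Storing the value returned by setwd() makes it easy to return to the starting directory afterwards.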

1- Install and load the package ggplot2.

Answer
# Install
install.packages(pkgs="ggplot2")
# Load in the environment
library("ggplot2")

Check with sessionInfo() that the package is indeed loaded. What version of ggplot2 did you get?
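
Besides reading the output of sessionInfo(), the version of a single package can be queried directly with packageVersion(). Here is a minimal sketch, assuming you only want the version number as text. The variable names pkg and pkg_version are illustrative, and the requireNamespace() guard is only there so that the code also runs when ggplot2 is not installed.

```r
pkg <- "ggplot2"

if (requireNamespace(pkg, quietly = TRUE)) {
  # packageVersion() returns a version object; convert it to text
  pkg_version <- as.character(packageVersion(pkg))
  message(pkg, " version: ", pkg_version)
} else {
  # Package not installed: keep a missing value instead of failing
  pkg_version <- NA_character_
  message(pkg, " is not installed")
}
```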

2- ggplot2 automatically makes the diamonds dataset available: you can use it as an object once ggplot2 is loaded. You can see it in the “Environment” tab: change “Global Environment” to “package:ggplot2”.

What are the dimensions of diamonds? What are the column names of diamonds?

Answer
# Dimensions of diamonds
dim(diamonds)
# Column names of diamonds
colnames(diamonds)
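
The same functions work on any data frame. Here is a minimal sketch on a small made-up data frame (toy is a hypothetical stand-in, not the real diamonds data), showing how dim() relates to nrow() and ncol():

```r
# Small made-up data frame used as a stand-in for diamonds (assumption)
toy <- data.frame(
  carat = c(0.23, 0.31, 0.90, 1.20),
  cut   = c("Ideal", "Premium", "Premium", "Good"),
  color = c("E", "E", "G", "E")
)

# dim() returns the number of rows, then the number of columns
dim(toy)
nrow(toy)
ncol(toy)

# Column names
colnames(toy)

# Compact overview: type and first values of each column
str(toy)
```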

You can read the help page of the diamonds dataset to understand what it contains!

Note: diamonds is a data frame: you can test it with is.data.frame(diamonds) (returns TRUE).

3- How many diamonds have a “Premium” cut?

Answer
nrow(diamonds[diamonds$cut == "Premium",])
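
Counting rows of a subset is one way to answer; summing the logical condition or tabulating the column gives the same count. Here is a minimal sketch on a small made-up data frame (toy and the variable names are hypothetical):

```r
# Small made-up data frame used as a stand-in for diamonds (assumption)
toy <- data.frame(
  cut   = c("Ideal", "Premium", "Premium", "Good", "Premium"),
  price = c(326, 400, 550, 335, 610)
)

# Logical vector: TRUE where the cut is "Premium"
is_premium <- toy$cut == "Premium"

# Option 1: subset the rows, then count them
n_subset <- nrow(toy[is_premium, ])

# Option 2: TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs
n_sum <- sum(is_premium)

# Option 3: count every category at once
cut_counts <- table(toy$cut)

print(n_subset)
print(n_sum)
print(cut_counts)
```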

4- Select diamonds that have a “Premium” cut AND an “E” color. Save them in the new object diams1. How many rows does diams1 have?

Answer
# Select diamonds with a "Premium" cut AND an "E" color
diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E",]
# Number of rows of diams1
nrow(diams1)
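
The & operator compares the two conditions element by element, so a row is kept only when both are TRUE. Here is a minimal sketch on a small made-up data frame (toy, keep, toy_sel and toy_sel2 are hypothetical names), which also shows the base R function subset() as an alternative way of writing the same selection:

```r
# Small made-up data frame used as a stand-in for diamonds (assumption)
toy <- data.frame(
  cut   = c("Premium", "Premium", "Ideal", "Premium"),
  color = c("E", "G", "E", "E"),
  carat = c(0.3, 0.5, 0.7, 1.1)
)

# Element-wise AND: TRUE only where both conditions hold
keep <- toy$cut == "Premium" & toy$color == "E"
print(keep)

# Keep the matching rows and all columns
toy_sel <- toy[keep, ]

# Same selection written with subset(): column names are used directly
toy_sel2 <- subset(toy, cut == "Premium" & color == "E")

print(nrow(toy_sel))
```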

5- Install and load the package dplyr from the R Console.

Answer
# Install package
install.packages(pkgs="dplyr")
# Load package
library("dplyr")

6- Use the function “sample_n” from the dplyr package (check the help page for sample_n) to randomly sample 200 rows of diams1, and save them in the new object diams.

Answer
# Subset data frame
diams <- sample_n(tbl=diams1, size=200)
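
Because the sampling is random, every run gives a different diams unless the random seed is fixed first with set.seed(). Here is a minimal sketch of the idea in base R, on a small made-up data frame (toy and the other names are hypothetical, and the seed value 123 is arbitrary). Note also that recent dplyr releases document sample_n() as superseded by slice_sample(); sample_n() still works.

```r
# Small made-up data frame used as a stand-in for diams1 (assumption)
toy <- data.frame(id = 1:20, carat = (2:21) / 10)

# Fix the seed so that the random draw can be reproduced
set.seed(123)
# Draw 5 row numbers without replacement
rows <- sample(nrow(toy), size = 5)
toy_sample <- toy[rows, ]

# Resetting the same seed gives the same draw again
set.seed(123)
rows_again <- sample(nrow(toy), size = 5)

print(toy_sample)
print(identical(rows, rows_again))
```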

7- What are the minimum, average and median “carat” of the 200 diamonds in diams?

Answer
min(diams$carat)
mean(diams$carat)
median(diams$carat)
summary(diams$carat)
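
summary() reports all three values at once, and individual values can be pulled out of its result by name. Here is a minimal sketch on a small made-up vector (carat and carat_summary are hypothetical names, not the real diams data):

```r
# Small made-up vector used as a stand-in for diams$carat (assumption)
carat <- c(0.3, 0.5, 0.7, 1.0, 1.5)

# summary() returns a named vector: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
carat_summary <- summary(carat)
print(carat_summary)

# Extract single values by name
carat_min    <- carat_summary[["Min."]]
carat_mean   <- carat_summary[["Mean"]]
carat_median <- carat_summary[["Median"]]
```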

8- Save diams into a csv file (try the “write.csv” function).

Answer
# Write a csv file with write.csv
write.csv(x=diams,
          file="diamonds_Premium_E_200.csv",
          row.names=FALSE,
          quote=FALSE)
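
A quick way to check the file is to read it back with read.csv() and compare it with the original object. Here is a minimal sketch on a small made-up data frame, written to a temporary file so that nothing is left in the working directory (toy, csv_file and toy_back are hypothetical names):

```r
# Small made-up data frame used as a stand-in for diams (assumption)
toy <- data.frame(
  carat = c(0.3, 0.5, 0.7),
  cut   = c("Premium", "Premium", "Premium"),
  color = c("E", "E", "E")
)

# Temporary file instead of a file in the working directory
csv_file <- tempfile(fileext = ".csv")

# Same arguments as in the answer above
write.csv(x = toy, file = csv_file, row.names = FALSE, quote = FALSE)

# Read the file back and compare with the original
toy_back <- read.csv(csv_file)
print(dim(toy_back))
print(colnames(toy_back))
```

With quote=FALSE, text values are written without surrounding quotes. That is fine here, but a text value containing a comma would be split across columns when the file is read back.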