12.8 Exercise 6: about Module 2…
Create the script “exercise6.R” and save it to the “Rcourse/Module2” directory: you will save all the commands of exercise 6 in that script.
Remember you can comment the code using #.
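# Check the current working directory
getwd()
# Move to the Module2 folder (path relative to the current directory)
setwd("Rcourse/Module2")
# or give the path from your home directory
setwd("~/Rcourse/Module2")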
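setwd() stops with an error if the target folder does not exist yet. A minimal sketch, assuming you want the course folder under your home directory, that creates the folder only when it is missing and then moves into it (`course_dir` is an illustrative name, not part of the exercise):

```r
# Illustrative variable: the course folder under your home directory
course_dir <- path.expand(file.path("~", "Rcourse", "Module2"))

# Create the folder (and any missing parent folders) only if it is absent
if (!dir.exists(course_dir)) {
  dir.create(course_dir, recursive = TRUE)
}

# Move into the folder and confirm the new working directory
setwd(course_dir)
getwd()
```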
1- Install and load the ggplot2 package
# Install
install.packages(pkgs="ggplot2")
# Load in the environment
library("ggplot2")
Check with sessionInfo() that the package is indeed loaded. Which version of ggplot2 did you get?
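sessionInfo() lists every attached package. If you only need the ggplot2 version, packageVersion() (from the utils package) returns it directly, and search() shows whether the package is attached. A small sketch:

```r
library("ggplot2")

# Version of the installed ggplot2 package
ggplot2_version <- packageVersion("ggplot2")
print(ggplot2_version)

# TRUE if ggplot2 is attached to the search path (i.e. loaded with library())
"package:ggplot2" %in% search()
```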
2- Loading ggplot2 makes the diamonds dataset available in your session: you can use it as an object as soon as ggplot2 is loaded. To see it in the Environment tab, switch the drop-down menu from “Global Environment” to “package:ggplot2”.
What are the dimensions of diamonds? What are the column names of diamonds?
# Dimensions of diamonds
dim(diamonds)
# Column names of diamonds
colnames(diamonds)
Read the help page of the diamonds dataset (?diamonds) to understand what it contains.
Note: diamonds is a data frame; you can check this with is.data.frame(diamonds), which returns TRUE.
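Beyond is.data.frame(), a few base R functions give a quick overview of a dataset. In ggplot2, diamonds is stored as a tibble, a subclass of data.frame, which is why class() returns more than one value. A sketch:

```r
library("ggplot2")

# TRUE for data frames and data frame subclasses such as tibbles
is.data.frame(diamonds)

# Full class vector: the tibble classes followed by "data.frame"
class(diamonds)

# Compact overview: dimensions, column names, column types and first values
str(diamonds)

# First six rows
head(diamonds)
```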
3- How many diamonds have a “Premium” cut?
nrow(diamonds[diamonds$cut == "Premium",])
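Subsetting the whole data frame only to count its rows works, but the comparison diamonds$cut == "Premium" already returns a logical vector, and sum() counts its TRUE values. table() gives the count for every cut level at once. A sketch of both alternatives:

```r
library("ggplot2")

# Logical vector: TRUE where the cut is "Premium"
is_premium <- diamonds$cut == "Premium"

# sum() treats TRUE as 1 and FALSE as 0, so this counts Premium diamonds
n_premium <- sum(is_premium)
n_premium

# Counts for every level of cut
table(diamonds$cut)
```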
4- Select the diamonds that have a “Premium” cut AND an “E” color, and save them in a new object, diams1. How many rows does diams1 have?
# Keep Premium-cut diamonds with color E
diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E",]
# Number of rows
nrow(diams1)
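Base R's subset() expresses the same filter without repeating diamonds$ in front of each column name. A sketch comparing it with bracket indexing (diams1_subset is an illustrative name):

```r
library("ggplot2")

# Bracket indexing, as in the exercise
diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E", ]

# Same filter with subset(): columns are referenced by name
diams1_subset <- subset(diamonds, cut == "Premium" & color == "E")

# Both approaches keep the same number of rows
nrow(diams1)
nrow(diams1_subset)
```

subset() is convenient for interactive work; inside functions, bracket indexing is usually preferred because it does not rely on evaluating column names inside the data frame.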
5- Install and load the package dplyr from the R Console.
# Install package
install.packages(pkgs="dplyr")
# Load package
library("dplyr")
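Once dplyr is loaded, its filter() function can replace the bracket indexing used in step 4; conditions separated by a comma are combined with AND. A sketch (diams1_dplyr is an illustrative name):

```r
library("ggplot2")
library("dplyr")

# Keep rows where both conditions hold (comma = logical AND)
diams1_dplyr <- filter(diamonds, cut == "Premium", color == "E")

# Bracket-indexing version from step 4
diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E", ]

# Both give the same number of rows
nrow(diams1_dplyr) == nrow(diams1)
```

Note that loading dplyr masks stats::filter(), so a message about masked objects is expected when the package is attached.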
6- Use the sample_n() function from the dplyr package (check its help page, ?sample_n) to randomly sample 200 rows of diams1, and save the result in a new object, diams.
# Randomly sample 200 rows of diams1
diams <- sample_n(tbl=diams1, size=200)
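sample_n() draws a different random set of rows every time it runs. Calling set.seed() first makes the sample reproducible. In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), which takes n instead of size. A sketch (the seed value 42 is arbitrary):

```r
library("ggplot2")
library("dplyr")

diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E", ]

# Fix the random number generator so the same rows are drawn each run
set.seed(42)
diams <- sample_n(tbl = diams1, size = 200)

# Re-running with the same seed reproduces exactly the same sample
set.seed(42)
check <- sample_n(tbl = diams1, size = 200)
identical(diams, check)

# Newer equivalent in dplyr >= 1.0.0
set.seed(42)
diams_new <- slice_sample(diams1, n = 200)
nrow(diams_new)
```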
7- What are the minimum, mean and median carat values of the 200 diamonds in diams?
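# Minimum, mean and median carat
min(diams$carat)
mean(diams$carat)
median(diams$carat)
# summary() reports all three (plus quartiles and maximum) at once
summary(diams$carat)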
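With dplyr, summarise() returns the same statistics as a one-row table, which is convenient for saving or for comparing groups later. A sketch, assuming diams was created as in step 6 (recreated here with an arbitrary seed so the block runs on its own):

```r
library("ggplot2")
library("dplyr")

diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E", ]
set.seed(42)
diams <- sample_n(tbl = diams1, size = 200)

# One-row table with the three carat statistics
carat_stats <- summarise(diams,
                         min_carat = min(carat),
                         mean_carat = mean(carat),
                         median_carat = median(carat))
carat_stats
```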
8- Save diams to a CSV file (try the write.csv() function).
# Write diams to a CSV file, without row names or quotes
write.csv(x=diams, file="diamonds_Premium_E_200.csv", row.names=FALSE, quote=FALSE)
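A quick way to check the export is to read the file back with read.csv() and compare it with the original object. A sketch, recreating diams with an arbitrary seed so the block runs on its own (diams_back is an illustrative name):

```r
library("ggplot2")
library("dplyr")

diams1 <- diamonds[diamonds$cut == "Premium" & diamonds$color == "E", ]
set.seed(42)
diams <- sample_n(tbl = diams1, size = 200)

# Write the file as in step 8
write.csv(x = diams, file = "diamonds_Premium_E_200.csv",
          row.names = FALSE, quote = FALSE)

# Read it back to check that nothing was lost
diams_back <- read.csv("diamonds_Premium_E_200.csv")

dim(diams_back)                            # same dimensions as dim(diams)
all.equal(diams_back$carat, diams$carat)   # numeric values survive the round trip
```

quote = FALSE is safe here because no value in diamonds contains a comma; for data whose text fields may contain commas or quotes, keep the default quote = TRUE.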