9.9 Exercise 5. Data frame manipulation

Create the script “exercise5.R” and save it to the “Rcourse/Module1” directory: you will save all the commands of exercise 5 in that script.
Remember you can comment the code using #.

9.9.1 Exercise 5a

1- Create the following data frame and call it mydf:

43 181 M
34 172 F
22 189 M
27 167 F

Use the row names John, Jessica, Steve, Rachel and the column names Age, Height, Sex.
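
Below is a minimal sketch of step 1. It is one of several valid approaches. The three columns are passed to data.frame(), and the names are passed to its row.names argument. Assigning rownames(mydf) after creation works just as well. Since R 4.0, Sex is stored as character; earlier versions turned it into a factor by default.

```r
# Build mydf column by column; row.names sets the row labels
mydf <- data.frame(
  Age = c(43, 34, 22, 27),
  Height = c(181, 172, 189, 167),
  Sex = c("M", "F", "M", "F"),
  row.names = c("John", "Jessica", "Steve", "Rachel")
)
mydf
```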

2- Check the structure of mydf with str().
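
This is a quick check for step 2. The snippet recreates mydf first so that it runs on its own. str() should report Age and Height as num, and Sex as chr (Factor before R 4.0).

```r
# Recreate mydf (step 1) so this snippet is self-contained
mydf <- data.frame(Age = c(43, 34, 22, 27),
                   Height = c(181, 172, 189, 167),
                   Sex = c("M", "F", "M", "F"),
                   row.names = c("John", "Jessica", "Steve", "Rachel"))
# One line per column: name, type and first values
str(mydf)
```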

3- Calculate the average age and height in mydf.

Try different approaches:

  • Calculate the average for each column separately.

  • Calculate the average of both columns simultaneously using the apply() function.
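
Here is a sketch of both approaches from step 3. apply() first converts the data frame to a matrix, so only the numeric columns are passed to it. If Sex were included, the result would be a character matrix, and mean() would then return NA with a warning.

```r
# Recreate mydf (step 1)
mydf <- data.frame(Age = c(43, 34, 22, 27),
                   Height = c(181, 172, 189, 167),
                   Sex = c("M", "F", "M", "F"),
                   row.names = c("John", "Jessica", "Steve", "Rachel"))

# Approach 1: one column at a time, extracted with $
mean_age <- mean(mydf$Age)
mean_height <- mean(mydf$Height)

# Approach 2: apply() mean over the columns (MARGIN = 2) of the numeric columns
avg <- apply(mydf[, c("Age", "Height")], 2, mean)
avg
```

colMeans(mydf[, c("Age", "Height")]) gives the same named vector in a single call.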

4- Add one row to mydf: Georges, who is 53 years old and 168 cm tall.
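
Below is one way to do step 4, as a sketch. The exercise does not give Georges' sex, so the snippet assumes "M". Change it (or use NA) if needed. rbind() matches data frame columns by name, so the new row must use the same column names as mydf.

```r
# Recreate mydf (step 1)
mydf <- data.frame(Age = c(43, 34, 22, 27),
                   Height = c(181, 172, 189, 167),
                   Sex = c("M", "F", "M", "F"),
                   row.names = c("John", "Jessica", "Steve", "Rachel"))

# Georges as a one-row data frame with the same column names.
# ASSUMPTION: Sex = "M" (not stated in the exercise)
georges <- data.frame(Age = 53, Height = 168, Sex = "M", row.names = "Georges")

# rbind() appends the row, matching columns by name and keeping row names
mydf <- rbind(mydf, georges)
mydf
```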

5- Change the row names of mydf so that the data become anonymous: use Patient1, Patient2, etc. instead of the actual names.
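
This sketch of step 5 starts from mydf as it looks after step 4, with Georges' sex again assumed to be "M". paste0() and seq_len() generate one label per row, so the same line still works if more patients are added later.

```r
# Recreate mydf as after step 4 (Sex of Georges assumed "M")
mydf <- data.frame(Age = c(43, 34, 22, 27, 53),
                   Height = c(181, 172, 189, 167, 168),
                   Sex = c("M", "F", "M", "F", "M"),
                   row.names = c("John", "Jessica", "Steve", "Rachel", "Georges"))

# Replace the names with "Patient1", "Patient2", ... (one label per row)
rownames(mydf) <- paste0("Patient", seq_len(nrow(mydf)))
mydf
```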

6- Create the data frame mydf2 that is a subset of mydf containing only the female entries.
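
Step 6 can be done in two equivalent ways, shown here on the anonymized mydf from step 5. The first uses logical indexing with [ , ]; the second uses the subset() function.

```r
# Recreate the anonymized mydf from step 5 (Sex of Patient5 assumed "M")
mydf <- data.frame(Age = c(43, 34, 22, 27, 53),
                   Height = c(181, 172, 189, 167, 168),
                   Sex = c("M", "F", "M", "F", "M"),
                   row.names = paste0("Patient", 1:5))

# Logical indexing: rows where Sex is "F", all columns (empty column index)
mydf2 <- mydf[mydf$Sex == "F", ]

# The same selection with subset()
mydf2_alt <- subset(mydf, Sex == "F")
mydf2
```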

7- Create the data frame mydf3 that is a subset of mydf containing only the entries of males taller than 170 cm.
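
Step 7 combines two conditions with the element-wise AND operator &. Here is a sketch on the same data. Even with Georges assumed to be male, he is excluded because he is only 168 cm tall.

```r
# Recreate the anonymized mydf from step 5 (Sex of Patient5 assumed "M")
mydf <- data.frame(Age = c(43, 34, 22, 27, 53),
                   Height = c(181, 172, 189, 167, 168),
                   Sex = c("M", "F", "M", "F", "M"),
                   row.names = paste0("Patient", 1:5))

# Keep rows that are male AND taller than 170 cm
mydf3 <- mydf[mydf$Sex == "M" & mydf$Height > 170, ]

# The same selection with subset()
mydf3_alt <- subset(mydf, Sex == "M" & Height > 170)
mydf3
```

If the Sex column contained NA values, the [ , ] version would return rows filled with NA for them, whereas subset() simply drops them.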

9.9.2 Exercise 5b

1- Create two data frames, mydf1 and mydf2, as follows:

mydf1:

1 14
2 12
3 15
4 10

mydf2:

1 paul
2 helen
3 emily
4 john
5 mark

With column names: “id”, “age” for mydf1, and “id”, “name” for mydf2.
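
Here is a sketch of step 1 of exercise 5b. Note that the name mydf2 is reused from exercise 5a, so the earlier object is overwritten.

```r
# id is an integer sequence; the second column holds the values
mydf1 <- data.frame(id = 1:4, age = c(14, 12, 15, 10))
mydf2 <- data.frame(id = 1:5,
                    name = c("paul", "helen", "emily", "john", "mark"))
mydf1
mydf2
```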

2- Merge mydf1 and mydf2 by their “id” column and store the result in mydf3. Look for the help page of merge and/or Google it!
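
This sketch covers step 2. By default, merge() keeps only the ids found in both data frames. As a result, mark (id 5, who has no age) is dropped. Setting all = TRUE keeps him, with an NA age.

```r
mydf1 <- data.frame(id = 1:4, age = c(14, 12, 15, 10))
mydf2 <- data.frame(id = 1:5,
                    name = c("paul", "helen", "emily", "john", "mark"))

# Inner join on the shared "id" column
mydf3 <- merge(mydf1, mydf2, by = "id")
mydf3

# Outer join: every id is kept, missing values become NA
mydf3_all <- merge(mydf1, mydf2, by = "id", all = TRUE)
mydf3_all
```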

3- Order mydf3 by decreasing age. Look for the help page of order.
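
This sketch of step 3 recreates mydf3 by merging mydf1 and mydf2 first. order() returns the row positions that would sort the age column. Indexing the rows with that vector then reorders the whole data frame.

```r
mydf1 <- data.frame(id = 1:4, age = c(14, 12, 15, 10))
mydf2 <- data.frame(id = 1:5,
                    name = c("paul", "helen", "emily", "john", "mark"))
mydf3 <- merge(mydf1, mydf2, by = "id")

# decreasing = TRUE puts the oldest first
mydf3_sorted <- mydf3[order(mydf3$age, decreasing = TRUE), ]
mydf3_sorted
```

For a numeric column, order(-mydf3$age) is a common equivalent.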

9.9.3 Exercise 5c

1- Using the download.file() function, download this file to your current working directory (right-click on “this file” and choose “Copy link location” to get its full path).

2- The function dir() lists the files and directories present in the current directory: check whether genes_dataframe.RData was downloaded.

3- Load genes_dataframe.RData into your environment using the load() function.

4- genes_dataframe.RData contains the mydf_genes object: is it now present in your environment?
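
The link to genes_dataframe.RData is on the course page and is not reproduced here. This sketch therefore rehearses steps 2 to 4 offline. It saves a tiny toy mydf_genes (invented, not the course data) to the temporary directory, then checks for it with dir() and reloads it with load(). For the real file, uncomment the download line and paste in the link you copied.

```r
# Step 1 placeholder: paste the link copied from the course page.
# mode = "wb" (binary) matters on Windows for files such as .RData
# download.file(url = "<link copied from the course page>",
#               destfile = "genes_dataframe.RData", mode = "wb")

# Toy stand-in object, NOT the real course data
mydf_genes <- data.frame(x = 1:3)
rdata_file <- file.path(tempdir(), "genes_dataframe.RData")
save(mydf_genes, file = rdata_file)
rm(mydf_genes)  # start as if the object had never existed

# Step 2: is the file listed in the directory?
file_listed <- "genes_dataframe.RData" %in% dir(tempdir())

# Steps 3-4: load() restores the saved objects and invisibly returns their names
loaded <- load(rdata_file)
loaded
exists("mydf_genes")
```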

5- Explore mydf_genes to see what it contains. You can use a variety of functions: str, head, tail, dim, colnames, rownames, class…

6- Select the rows for which pvalue_KOvsWT < 0.05 AND log2FoldChange_KOvsWT > 0.5. Store the result in the up object.

How many rows (genes) were selected?
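
The real mydf_genes is not reproduced here, so this sketch of step 6 uses a small invented table. The column names pvalue_KOvsWT and log2FoldChange_KOvsWT come from the exercise. However, gene_symbol is a hypothetical name: check colnames(mydf_genes) to see where the real object stores the gene symbols.

```r
# Invented toy data, NOT the course data; gene_symbol is a HYPOTHETICAL column
mydf_genes <- data.frame(
  gene_symbol = c("ZfpA", "GeneX", "ZfpB", "GeneY", "GeneZ", "ZfpC", "GeneW"),
  pvalue_KOvsWT = c(0.001, 0.20, 0.03, 0.04, 0.01, 0.002, 0.02),
  log2FoldChange_KOvsWT = c(1.2, 0.8, 0.7, -1.1, 0.3, -0.9, 2.0)
)

# Both conditions must be TRUE: element-wise AND with &
up <- mydf_genes[mydf_genes$pvalue_KOvsWT < 0.05 &
                   mydf_genes$log2FoldChange_KOvsWT > 0.5, ]
nrow(up)  # number of selected genes
```

If either column contains NA values, [ , ] returns rows of NA for them. Wrapping the condition in which() or using subset() avoids that.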

7- From the up object, select the genes coding for zinc finger proteins (i.e. genes whose symbol starts with Zfp). Use the grep() function.
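
This sketch of step 7 uses a small invented table, and gene_symbol is again a hypothetical column name. grep() returns the positions of the elements that match a regular expression. The ^ anchors "Zfp" to the start of the symbol, so a symbol that merely contains "Zfp" elsewhere is not picked up.

```r
# Invented toy data, NOT the course data; gene_symbol is a HYPOTHETICAL column
mydf_genes <- data.frame(
  gene_symbol = c("ZfpA", "GeneX", "ZfpB", "GeneY", "GeneZ", "ZfpC", "GeneW"),
  pvalue_KOvsWT = c(0.001, 0.20, 0.03, 0.04, 0.01, 0.002, 0.02),
  log2FoldChange_KOvsWT = c(1.2, 0.8, 0.7, -1.1, 0.3, -0.9, 2.0)
)
up <- mydf_genes[mydf_genes$pvalue_KOvsWT < 0.05 &
                   mydf_genes$log2FoldChange_KOvsWT > 0.5, ]

# Row positions of the symbols starting with "Zfp", used to index up
zfp_up <- up[grep("^Zfp", up$gene_symbol), ]
zfp_up
```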

8- Select the rows for which pvalue_KOvsWT < 0.05 AND log2FoldChange_KOvsWT is either > 0.5 OR < -0.5. For the log2FoldChange condition, give the abs() function a try!
Store the result in the diff_genes object.

How many rows (genes) were selected?
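
This sketch of step 8 uses a small invented table with a hypothetical gene_symbol column. abs() folds "> 0.5 OR < -0.5" into a single test. The second version writes the OR explicitly with |, for comparison.

```r
# Invented toy data, NOT the course data; gene_symbol is a HYPOTHETICAL column
mydf_genes <- data.frame(
  gene_symbol = c("ZfpA", "GeneX", "ZfpB", "GeneY", "GeneZ", "ZfpC", "GeneW"),
  pvalue_KOvsWT = c(0.001, 0.20, 0.03, 0.04, 0.01, 0.002, 0.02),
  log2FoldChange_KOvsWT = c(1.2, 0.8, 0.7, -1.1, 0.3, -0.9, 2.0)
)

# abs() turns the two-sided fold-change condition into one comparison
diff_genes <- mydf_genes[mydf_genes$pvalue_KOvsWT < 0.05 &
                           abs(mydf_genes$log2FoldChange_KOvsWT) > 0.5, ]
nrow(diff_genes)

# The same selection with an explicit OR (|); the parentheses are required
diff_genes_or <- mydf_genes[mydf_genes$pvalue_KOvsWT < 0.05 &
                              (mydf_genes$log2FoldChange_KOvsWT > 0.5 |
                                 mydf_genes$log2FoldChange_KOvsWT < -0.5), ]
```

The parentheses matter because & has higher precedence than | in R. Without them, the p-value filter would apply only to the first fold-change condition.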