© Copyright 2016 Dr Marta Milo, University of Sheffield.
Week 5 Practical
This Notebook contains practical assignments for Week 5.
This is a guidance on how to implement basic workflow for gene expression data analysis. The following steps will be implemented so to generate the full analysis of Affymetrix gene expression data.
Step1: Load packages with data from Bioconductor and/or access it from file in the data directory
Step 2: Arrange the data in an affybatch using Bioconductor commands. Annotate the PhenoData
Step 3: Analysis of gene expression data with different methods and normalisation techniques
Step 4: Diagnostics of the data with plotting techniques
Step 5: Basic use of limma and puma for Differential Expression Analysis
Step 6: Visualisation of the data with PCA
Step 7: Hierarchical clustering of DE (Differentially Expressed) genes
Step 8: Functional analysis of DE targets using PANTHER or DAVID
Step 9: Pathway analysis using PANTHER
We will be plotting large amounts of data. To avoid problems with file size, you should run the following line which changes plot outputs to small .png files
Step1: Load packages with data from Bioconductor and/or access it from file in the data directory
Load data from the data directory data_wk5
Step 2: Arrange the data in an affybatch using Bioconductor commands. Annotate the PhenoData
There is no need to reannotate since the annotation is clear and explicatory.
Step 3: Analysis of gene expression data with different methods and normalisation techniques
Step 4: Diagnostics of the data with plotting techniques
We can repeat the same plotting for puma data.
We can also use boxplots and densities to visualise the data. it is possible to write a small function that is able to plot all the densities on the same plot to check the spread.
this diagnostics shows the difference in the two methods. in this particular data set RMA seems to be more appropriate because the change in gene expressio are only in few genes and the normalisation methods used by RMA helps to spread them out better.
Step 5: Basic use of limma and puma for Differential Expression Analysis
Before performing any type of DE analysis you need to combine the data as explained in the lectures. For puma
we combine the data using an bayesian Hierarchical model, which might take some time when implemented on data with the comand pumaCombImproved()
. Load the puma combined data for this exercise from the data folder using the command load("eset_puma_comb.RDA")
.
We have seen in week 4 initial use of limma
for DE analysis. Remember the three core steps of limma
Step 1: build the design contrast matrix
Step 2: fit the linear model
Step 3: calculate the p-values and FDRs with a empirical Bayes test we used built-in function in
puma
to create the design and constrast matrix.
Extract the top DE genes using topTable()
as seen in week10. Explore the results. You might want to extract more genes from using topTable... look at the parameters.( n=100, for examples...)
Rembenber the threshold for selecting FC after they passed the significance threhold is FC<-1 and FC>1. Assign the thresholds to variables in the script, this helps to modify them and keep the code correct. In this way you will change only the value of one variable and not in all instances when it is recalled.
Always make sure that the p-values are distributed as expected. Use the hist()
to diagnose this.
When the data is not combined, there are 2311 potential DE genes identified initially.
However, only 153 of those are DE in both times (10 and 48 hours after estrogen administration).
Combined data:
As you can see from the presence of NA puma as discovered different DE genes from the ones annotated with RMA.
Step 6: Pricipal component analysis.
Step 7: Hierarchical clustering of DE (Differentially Expressed) genes
To perform this we need to activate a library called gplots
. We will use the command heatmap.2()
. We do clustering a the selected genes from our DE analysis this is to search for patterns in of differentially regulatend pathways.
For the RMA processed we can do:
We changed the name of teh PROBEID with teh correspondenty gene SYMBOLS.
For Step 7 and 8 you need to use the lists of symbols and upload them onto the web portal. Save the SYMBOL into a tab delimited file usint the write.table()
comand and use that into the PANTHER and DAVID web portal.