© Copyright 2016 Dr Marta Milo and Dr Mike Croucher, University of Sheffield.
Week 6 Practical
This Notebook contains practical assignments for Week 6. The practical consists of writing the pipeline for data analysis of the final project.
The pipeline should reflect the basic workflow for gene expression data analysis that we have defined in Week 4 and Week 5, as well as all the details on thresholds and methods that you are going to use to develop the final project.
Once you have defined your pipeline, you will discuss this practical in the group to which you are allocated for the final project. In the Week 6 folder you will find this notebook, the notebook containing details for the project and the data folder with the .CEL files you will have to use.
Example of basic workflow:
Step 1: Load packages with data from Bioconductor and/or access the data from files in the data directory
Step 2: Arrange the data in an affybatch using Bioconductor commands. Annotate the PhenoData
Step 3: Analysis of gene expression data with different methods and normalisation techniques
Step 4: Diagnostics of the data with plotting techniques
Step 5: Differential Expression Analysis
Step 6: Visualisation of the data with PCA
Step 7: Hierarchical clustering of DE (Differentially Expressed) genes
Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID
To complete this practical, you will need to create a new notebook in the Week 6 folder of your SageMathCloud account and name it with your username followed by _week6.ipynb.
The notebooks will be assessed, but no formative feedback will be given unless a completely wrong direction has been taken. In that case you will receive a clear explanation of how to rectify it.
**Exercise 1**: Write a pipeline for the data analysis of the project allocated to your group
Step 1: Load packages with data from Bioconductor, library(affy) - mas5, rma, library(puma)
Step 2: Load and read data, create affybatch. Annotate with pData.
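Steps 1 and 2 could be sketched roughly as follows. This is only an illustration: the directory name "data", the helper name load_affybatch and the "control"/"treated" group labels are assumptions, not part of the project description. The helper returns NULL when the affy package or the .CEL files are not available, so the cell can be run safely anywhere.

```r
# Hypothetical helper: read all .CEL files in a directory into an AffyBatch.
load_affybatch <- function(cel_dir = "data") {
  cel_files <- list.files(cel_dir, pattern = "\\.CEL$", ignore.case = TRUE)
  if (!requireNamespace("affy", quietly = TRUE) || length(cel_files) == 0) {
    message("affy not installed or no .CEL files found in '", cel_dir, "'")
    return(NULL)
  }
  # ReadAffy() reads every CEL file found in celfile.path
  affy::ReadAffy(celfile.path = cel_dir)
}

batch <- load_affybatch("data")
if (!is.null(batch)) {
  suppressPackageStartupMessages(library(affy))
  # Annotate the samples: add an assumed 'group' column to the phenoData
  pd <- pData(batch)
  pd$group <- rep(c("control", "treated"), length.out = nrow(pd))
  pData(batch) <- pd
  print(pData(batch))
}
```

In your own pipeline the group labels should come from the experimental design of your project, not from a repeating pattern as in this sketch.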
Step 3: Analysis of gene expression data with different methods and normalisation techniques.
Create eset
Extract gene expression
First diagnostic using density() and boxplot()
Normalisation by log2 if required
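The first diagnostic in Step 3 can be tried on simulated data before running it on the real arrays. The sketch below uses a made-up matrix in place of the expression values extracted from your eset; the matrix size and array names are assumptions. Note that mas5() returns values on the raw intensity scale, whereas rma() already returns log2 values, which is why the log2 step is only needed in some cases.

```r
set.seed(1)
# Simulated stand-in for the extracted expression matrix:
# 500 genes x 4 arrays on the raw intensity scale
raw <- matrix(2^rnorm(2000, mean = 8, sd = 1.5), nrow = 500,
              dimnames = list(paste0("gene", 1:500), paste0("array", 1:4)))

# Raw intensities are strongly right-skewed, so move to the log2 scale
log_expr <- log2(raw)

# One density curve per array, overlaid on the same axes
dens <- lapply(seq_len(ncol(log_expr)), function(j) density(log_expr[, j]))
plot(dens[[1]], main = "Density of log2 expression", xlab = "log2 intensity",
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))))
for (j in seq_along(dens)[-1]) lines(dens[[j]], col = j)

# One box per array; boxes at different heights suggest normalisation is needed
boxplot(log_expr, main = "log2 expression per array", ylab = "log2 intensity")
```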
Step 4: Diagnostics of the data with plotting techniques
MAPlot
ggplot
boxplot
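To see what an MA plot shows, it can help to build one by hand in base R. The sketch below uses two simulated arrays on the log2 scale (the data are invented for illustration): M is the log ratio between the arrays and A is their average log intensity.

```r
set.seed(2)
# Two simulated arrays, already on the log2 scale
x <- rnorm(500, mean = 8, sd = 1.5)
y <- x + rnorm(500, mean = 0, sd = 0.3)

M <- y - x        # log2 ratio (difference of log2 values)
A <- (x + y) / 2  # average log2 intensity

plot(A, M, pch = 20, cex = 0.5, main = "MA plot (simulated arrays)")
abline(h = 0, col = "red")  # most genes should scatter around M = 0
```

A cloud that bends away from the horizontal line at M = 0 is a sign of intensity-dependent bias between the two arrays.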
Step 5: Differential Expression Analysis
For puma, combine the data using a Bayesian hierarchical model.
Check the dimension and the pData() for the eset of the combined values.
Calculate the FC and plot the data with an MA plot using the command ma.plot().
MAPlot
Use limma for DE analysis. Remember the three core steps of limma:
Step 1: build the design contrast matrix
Step 2: fit the linear model
Step 3: calculate the p-values and FDRs with an empirical Bayes test
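The three limma steps can be sketched on simulated data as below. The expression matrix, the two group names and the number of replicates are assumptions made for illustration; in the project they come from your own eset and pData(). The code is skipped if limma is not installed.

```r
results <- NULL
if (requireNamespace("limma", quietly = TRUE)) {
  suppressPackageStartupMessages(library(limma))
  set.seed(3)
  # Simulated log2 expression: 100 genes x 6 samples, first 10 genes up in 'treated'
  expr_mat <- matrix(rnorm(600, mean = 8, sd = 1), nrow = 100,
                     dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
  expr_mat[1:10, 4:6] <- expr_mat[1:10, 4:6] + 3
  group <- factor(rep(c("control", "treated"), each = 3))

  # Step 1: design matrix (one column per group) and contrast matrix
  design <- model.matrix(~ 0 + group)
  colnames(design) <- levels(group)
  contrast <- makeContrasts(treated - control, levels = design)

  # Step 2: fit the linear model, then re-express it in terms of the contrast
  fit <- lmFit(expr_mat, design)
  fit2 <- contrasts.fit(fit, contrast)

  # Step 3: empirical Bayes moderated t-test; adj.P.Val holds the BH-adjusted FDR
  fit2 <- eBayes(fit2)
  results <- topTable(fit2, number = Inf, adjust.method = "BH")
  print(head(results))
}
```

The logFC and adj.P.Val columns of the table are the ones you would threshold to select DE genes; the thresholds themselves are a choice you need to justify in your pipeline.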
Step 6: Visualisation of Data with PCA
Perform PCA in R using the command prcomp().
It needs the transpose command t(), since prcomp() expects the genes in the columns.
For probabilistic PCA you can use pumaPCA().
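A minimal sketch of the prcomp() step, using a simulated genes-by-samples matrix (the matrix and the two-group colouring are assumptions for illustration). Expression matrices usually have genes in rows, so the matrix is transposed before the call.

```r
set.seed(4)
# Simulated log2 expression: 200 genes (rows) x 6 samples (columns)
expr_mat <- matrix(rnorm(1200, mean = 8, sd = 1), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
# Shift 50 genes in the last three samples so the two groups separate
expr_mat[1:50, 4:6] <- expr_mat[1:50, 4:6] + 2

# t() puts samples in rows and genes in columns, as prcomp() expects
pca <- prcomp(t(expr_mat))
print(summary(pca))  # proportion of variance explained by each component

# Samples plotted on the first two principal components
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     col = rep(c("blue", "red"), each = 3), xlab = "PC1", ylab = "PC2")
```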
Step 7: Hierarchical clustering of DE (Differentially Expressed) genes
To perform this we need to load a library called gplots. We will use the command heatmap.2().
We cluster the selected genes from our DE analysis in order to search for patterns of differentially regulated pathways.
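The clustering step might look like the sketch below, run here on a simulated matrix standing in for the expression values of your selected DE genes (the matrix, the colour scheme and the choice of two clusters are assumptions). By default heatmap.2() clusters with dist() and hclust(), so the same tree is also built explicitly. If gplots is not installed, the base heatmap() function is used instead as a stand-in.

```r
set.seed(5)
# Simulated expression of 30 selected DE genes across 6 samples
de_expr <- matrix(rnorm(180), nrow = 30,
                  dimnames = list(paste0("gene", 1:30), paste0("s", 1:6)))
de_expr[1:15, 4:6] <- de_expr[1:15, 4:6] + 2

# Explicit hierarchical clustering of the genes, cut into two groups
gene_tree <- hclust(dist(de_expr))
clusters <- cutree(gene_tree, k = 2)
print(table(clusters))

if (requireNamespace("gplots", quietly = TRUE)) {
  # Row-scaled heatmap with dendrograms for genes and samples
  gplots::heatmap.2(de_expr, scale = "row", trace = "none",
                    col = gplots::bluered(50))
} else {
  heatmap(de_expr, scale = "row")  # base R fallback, not heatmap.2()
}
```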
Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID