Week 6 Practical

This notebook contains the practical assignment for Week 6. The practical consists of writing the data analysis pipeline for the final project.

The pipeline should reflect the basic workflow for gene expression data analysis that we defined in Week 4 and Week 5, as well as all the details on thresholds and methods that you are going to use to develop the final project.

Once you have defined your pipeline, you will discuss this practical in the group to which you are allocated for the final project. In the Week 6 folder you will find this notebook, the notebook containing the details of the project, and the data folder with the .CEL files you will have to use.

Example of basic workflow:

Step 1: Load packages with data from Bioconductor and/or access the data from files in the data directory
Step 2: Arrange the data in an AffyBatch using Bioconductor commands. Annotate the phenoData
Step 3: Analysis of gene expression data with different methods and normalisation techniques
Step 4: Diagnostics of the data with plotting techniques
Step 5: Differential Expression Analysis
Step 6: Visualisation of the data with PCA
Step 7: Hierarchical clustering of DE (Differentially Expressed) genes
Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID

To complete this practical, you will need to create a new notebook in the Week 6 folder of your SageMathCloud account, named username_week6.ipynb (with your own username in place of username).

The notebooks will be assessed, but no formative feedback will be given unless a completely wrong direction has been taken. In that case you will receive a clear explanation of how to rectify it.

**Exercise 1**: Write a pipeline for the data analysis of the project allocated to your group

In [1]:

?require

Step 1: Load packages with data from Bioconductor: library(affy) (for mas5 and rma) and library(puma)

Step 2: Load and read the data and create an AffyBatch. Annotate the samples with pData().
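A minimal sketch of Step 2, assuming the .CEL files sit in a directory called data and that the samples alternate between two groups. Both the directory name and the group labels (control, treated) are placeholders for illustration, not the actual project design. The block does nothing if affy or the files are not available.

```r
# Assumption: CEL files live in "data"; adjust to your own folder layout.
cel_dir <- "data"
cel_files <- list.files(cel_dir, pattern = "\\.CEL$", ignore.case = TRUE)

if (requireNamespace("affy", quietly = TRUE) && length(cel_files) > 0) {
  library(affy)

  # Read every CEL file in the directory into a single AffyBatch object
  raw <- ReadAffy(celfile.path = cel_dir)

  # Annotate the samples; these labels are hypothetical placeholders
  pData(raw)$condition <- rep(c("control", "treated"),
                              length.out = length(sampleNames(raw)))

  print(pData(raw))
}
```

Checking pData(raw) straight after annotation is a cheap way to confirm that each file is matched to the intended condition.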

Step 3: Analysis of gene expression data with different methods and normalisation techniques.

Create eset
Extract gene expression
First diagnostic using density() and boxplot()
Log2 transformation if required (rma output is already on the log2 scale, mas5 output is not)
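The first diagnostics above can be sketched in base R. This example uses a simulated intensity matrix as a stand-in for the output of exprs() on a real eset. The matrix dimensions and the threshold used to decide whether a log2 transformation is needed are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for exprs(eset): 1000 probesets x 6 arrays, raw intensity scale
expr_raw <- matrix(2^rnorm(6000, mean = 8, sd = 2), nrow = 1000,
                   dimnames = list(paste0("probe", 1:1000),
                                   paste0("array", 1:6)))

# Heuristic (an assumption): log2-scale data rarely exceed 100
needs_log <- max(expr_raw, na.rm = TRUE) > 100
expr <- if (needs_log) log2(expr_raw) else expr_raw

# Boxplot: one box per array, to compare the overall distributions
boxplot(expr, las = 2, ylab = "log2 expression",
        main = "Per-array distributions")

# Density: one curve per array on the same axes
dens <- lapply(seq_len(ncol(expr)), function(j) density(expr[, j]))
y_max <- max(sapply(dens, function(d) max(d$y)))
plot(dens[[1]], ylim = c(0, y_max), main = "Per-array densities",
     xlab = "log2 expression")
for (j in seq_along(dens)[-1]) lines(dens[[j]], col = j)
```

Arrays whose box or density curve is clearly shifted from the others are the ones to look at more closely before and after normalisation.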

Step 4: Diagnostics of the data with plotting techniques

MAPlot
ggplot
boxplot
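As an illustration of what an MA plot shows, the sketch below computes M (the log ratio) and A (the average log intensity) by hand for two simulated arrays and plots them with base R. The data are simulated and the variable names are hypothetical. With real data the two vectors would be columns of the log2 expression matrix.

```r
set.seed(2)

# Two simulated arrays on the log2 scale (placeholders for real columns)
x <- rnorm(500, mean = 8, sd = 1.5)
y <- x + rnorm(500, mean = 0, sd = 0.3)

M <- y - x        # log2 fold change between the two arrays
A <- (x + y) / 2  # average log2 intensity

plot(A, M, pch = 16, cex = 0.5, main = "MA plot",
     xlab = "A (mean log2 intensity)", ylab = "M (log2 ratio)")
abline(h = 0, col = "red")            # no-change reference line
lines(lowess(A, M), col = "blue")     # trend; should stay close to zero
```

A lowess curve that drifts away from zero suggests an intensity-dependent bias that normalisation has not removed.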

Step 5: Differential Expression Analysis

For puma, combine the data using a Bayesian hierarchical model
Check the dimensions and the pData() of the eset of combined values. Calculate the fold change (FC) and plot the data as an MA plot using the command ma.plot()
MAPlot
Use limma for DE analysis. Remember the three core steps of limma:

Step 1: build the design and contrast matrices
Step 2: fit the linear model
Step 3: calculate the p-values and FDRs with an empirical Bayes test
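The three limma steps can be sketched as follows on simulated data. The two-group design, the group names and the cut-offs (adjusted p-value below 0.05 and absolute log2 fold change above 1) are assumptions for illustration and should be replaced by the design and thresholds chosen for the project. The block is skipped if limma is not installed.

```r
if (requireNamespace("limma", quietly = TRUE)) {
  library(limma)
  set.seed(3)

  # Simulated log2 expression: 200 genes x 6 arrays, first 10 genes shifted
  expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
                 dimnames = list(paste0("gene", 1:200),
                                 paste0("array", 1:6)))
  expr[1:10, 4:6] <- expr[1:10, 4:6] + 4
  group <- factor(rep(c("control", "treated"), each = 3))

  # Step 1: design matrix (one column per group) and contrast matrix
  design <- model.matrix(~ 0 + group)
  colnames(design) <- levels(group)
  contrast <- makeContrasts(treated - control, levels = design)

  # Step 2: fit the linear model and apply the contrast
  fit <- contrasts.fit(lmFit(expr, design), contrast)

  # Step 3: empirical Bayes moderation, p-values and BH-adjusted FDRs
  fit <- eBayes(fit)
  top <- topTable(fit, coef = 1, number = Inf, adjust.method = "BH")

  # Hypothetical thresholds for calling a gene differentially expressed
  de_genes <- rownames(top)[top$adj.P.Val < 0.05 & abs(top$logFC) > 1]
  print(head(top))
}
```

The vector de_genes is the natural input for the clustering and functional analysis steps later in the pipeline.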

Step 6: Visualisation of Data with PCA

Perform PCA in R using the command prcomp()
The expression matrix must be transposed with t(), since prcomp() expects the samples in the rows and the genes in the columns
For probabilistic PCA you can use pumaPCA()
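A short sketch of the prcomp() route on a simulated expression matrix with genes in rows and arrays in columns. The data and the group labels are placeholders. With real data the input would be the log2 expression matrix extracted from the eset.

```r
set.seed(4)

# Simulated log2 expression: 300 genes x 6 arrays, 30 genes shifted in group 2
expr <- matrix(rnorm(300 * 6, mean = 8), nrow = 300,
               dimnames = list(paste0("gene", 1:300),
                               paste0("array", 1:6)))
expr[1:30, 4:6] <- expr[1:30, 4:6] + 3
group <- factor(rep(c("control", "treated"), each = 3))

# Transpose so that arrays are rows (observations) and genes are columns
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 16,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]),
     main = "PCA of arrays")
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), pch = 16)
```

If the arrays separate by condition along PC1 or PC2, the experimental factor is a major source of variation in the data.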

Step 7: Hierarchical clustering of DE (Differentially Expressed) genes

To perform this we need to load a library called gplots. We will use the command heatmap.2().
We cluster the genes selected by our DE analysis in order to search for patterns of differentially regulated pathways.
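The clustering step might look like the sketch below, which uses a simulated matrix in place of the expression values of the selected DE genes. The choice of Euclidean distance, average linkage and two clusters is an assumption for illustration. The heatmap.2() call is skipped if gplots is not installed.

```r
set.seed(5)

# Simulated stand-in for the DE genes: 40 genes x 6 arrays
de_expr <- matrix(rnorm(40 * 6), nrow = 40,
                  dimnames = list(paste0("gene", 1:40),
                                  paste0("array", 1:6)))
de_expr[1:20, 4:6] <- de_expr[1:20, 4:6] + 3

# Scale each gene to mean 0 and sd 1 so clustering reflects pattern, not level
scaled <- t(scale(t(de_expr)))

# Hierarchical clustering of genes (distance and linkage are assumptions)
gene_clust <- hclust(dist(scaled), method = "average")
clusters <- cutree(gene_clust, k = 2)
print(table(clusters))

# Heatmap with row and column dendrograms; heatmap.2 does its own clustering
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(de_expr, scale = "row", trace = "none",
                    col = gplots::bluered(50), margins = c(6, 6))
}
```

The gene lists obtained from cutree() can be exported per cluster and submitted separately to PANTHER or DAVID.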

Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID

In [ ]: