Kernel: R (R-Project)

© Copyright 2016 Dr Marta Milo and Dr Mike Croucher, University of Sheffield.

Week 6 Practical

This notebook contains the practical assignments for Week 6. The practical consists of writing the data analysis pipeline for the final project.

The pipeline should reflect the basic workflow for gene expression data analysis that we defined in Week 4 and Week 5, as well as all the details of the thresholds and methods that you are going to use to develop the final project.

Once you have defined your pipeline, you will discuss this practical in the group to which you are allocated for the final project. In the Week 6 folder you will find this notebook, the notebook containing the details of the project, and the data folder with the .CEL files you will have to use.

Example of basic workflow:

  • Step 1: Load packages with data from Bioconductor and/or access the data from files in the data directory

  • Step 2: Arrange the data in an affybatch using Bioconductor commands. Annotate the PhenoData

  • Step 3: Analysis of gene expression data with different methods and normalisation techniques

  • Step 4: Diagnostics of the data with plotting techniques

  • Step 5: Differential Expression Analysis

  • Step 6: Visualisation of the data with PCA

  • Step 7: Hierarchical clustering of DE (Differentially Expressed) genes

  • Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID

To complete this practical, you will need to create a new notebook in the Week 6 folder of your SageMathCloud account, named username_week6.ipynb (using your own username).

The notebooks will be assessed, but no formative feedback will be given unless a completely wrong direction has been taken. In that case you will receive a clear explanation of how to rectify it.

**Exercise 1**: Write a pipeline for the data analysis of the project allocated to your group

?require

Step 1: Load packages with data from Bioconductor: library(affy), which provides mas5() and rma(), and library(puma)

Step 2: Load and read data, create affybatch. Annotate with pData.
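Step 2 could be sketched as below. This is only an illustration under stated assumptions: load_cel_batch, cel_dir and conditions are hypothetical names introduced here, and the sample labels must come from your own project description. ReadAffy() is the affy command that builds the AffyBatch, and pData() holds the sample annotation.

```r
# Hypothetical helper (not part of affy): reads all .CEL files in a folder
# into an AffyBatch and annotates the samples.
# cel_dir    : path to the folder holding the .CEL files (assumption)
# conditions : one label per array, in the order the files are read (assumption)
load_cel_batch <- function(cel_dir, conditions) {
  stopifnot(dir.exists(cel_dir))

  # With no file names given, ReadAffy() reads every CEL file in celfile.path.
  ab <- affy::ReadAffy(celfile.path = cel_dir)

  # There must be exactly one label per array.
  stopifnot(length(conditions) == length(Biobase::sampleNames(ab)))

  # Annotate the phenotype data with the experimental condition.
  Biobase::pData(ab)$condition <- factor(conditions)
  ab
}
```

Calling the helper with the path of your data folder and a vector of labels should return an annotated AffyBatch, which can then be inspected with pData().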

Step 3: Analysis of gene expression data with different methods and normalisation techniques.

  • Create eset

  • Extract gene expression

  • First diagnostic using density() and boxplot()

  • Normalisation by log2 if required
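The first diagnostics listed above can be sketched in base R. The matrix below is simulated toy data (hypothetical probes and samples, not the project data). In the real pipeline the matrix would come from exprs() applied to your eset.

```r
# Toy data only: 1000 hypothetical probesets x 4 hypothetical samples,
# simulated on the raw intensity scale.
set.seed(1)
raw_expr <- matrix(2^rnorm(4000, mean = 8, sd = 2), nrow = 1000,
                   dimnames = list(paste0("probe", 1:1000),
                                   paste0("sample", 1:4)))

# Raw intensities are strongly right-skewed, so move to the log2 scale.
log_expr <- log2(raw_expr)

# One density curve per sample: similar curves suggest comparable arrays.
plot(density(log_expr[, 1]), main = "log2 intensity per sample")
for (j in 2:ncol(log_expr)) lines(density(log_expr[, j]), col = j)

# Boxplots give the same comparison sample by sample.
boxplot(log_expr, ylab = "log2 intensity")
```

Whether the log2 step is required depends on the method: rma() already returns values on the log2 scale, while mas5() returns values on the original intensity scale.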

Step 4: Diagnostics of the data with plotting techniques

  • MAPlot

  • ggplot

  • boxplot
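As a reminder of what an MA plot shows, here is a minimal base R sketch for two arrays. The two log2 intensity vectors are simulated toy data, and the names log_a and log_b are hypothetical. M is the log ratio between the two arrays and A is their average log intensity.

```r
# Toy data only: log2 intensities of 1000 hypothetical probes on two arrays.
set.seed(2)
log_a <- rnorm(1000, mean = 8, sd = 2)
log_b <- log_a + rnorm(1000, sd = 0.3)

M <- log_b - log_a        # log2 ratio (difference of log intensities)
A <- (log_a + log_b) / 2  # average log2 intensity

plot(A, M, pch = ".", main = "MA plot: array b vs array a")
abline(h = 0, col = "red")         # no-change reference line
lines(lowess(A, M), col = "blue")  # intensity-dependent trend, if any
```

For well normalised arrays the cloud should be centred on M = 0 across the whole range of A. A curved trend line suggests that further normalisation is needed.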

Step 5: Differential Expression Analysis

  • For puma, combine the data using a Bayesian hierarchical model

  • Check the dimensions and the pData() of the eset of combined values. Calculate the fold change (FC) and plot the data as an MA plot using the command ma.plot()

  • MAPlot

  • Use limma for DE analysis. Remember the three core steps of limma:

  • Step 1: build the design and contrast matrices

  • Step 2: fit the linear model

  • Step 3: calculate the p-values and FDRs with an empirical Bayes test
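The three limma steps above might look like the sketch below. It uses a simulated toy matrix with a hypothetical two-group design (three control and three treated arrays), so the group labels and the contrast are assumptions to be replaced by those of your project. The limma part is skipped if the package is not installed.

```r
# Toy data only: 200 hypothetical genes x 6 hypothetical arrays (log2 scale).
set.seed(3)
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("array", 1:6)))
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3  # 10 genes simulated as up-regulated
group <- factor(rep(c("control", "treated"), each = 3))

# Step 1: design matrix (one column per group) and contrast matrix.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

top <- NULL  # stays NULL if limma is not installed
if (requireNamespace("limma", quietly = TRUE)) {
  contrast <- limma::makeContrasts(treated - control, levels = design)

  # Step 2: fit the linear model, then re-express it in terms of the contrast.
  fit <- limma::lmFit(expr, design)
  fit2 <- limma::contrasts.fit(fit, contrast)

  # Step 3: moderated t-statistics, p-values and FDR (Benjamini-Hochberg).
  fit2 <- limma::eBayes(fit2)
  top <- limma::topTable(fit2, coef = 1, number = 10, adjust.method = "BH")
}
```

The adj.P.Val column of the resulting table holds the FDR-adjusted p-values, and logFC holds the estimated log2 fold change for the contrast. The thresholds applied to them are a choice you should state in your pipeline.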

Step 6: Visualisation of Data with PCA

  • Perform PCA in R using the command prcomp()

  • The input must be transposed with t(), since prcomp() expects the samples in the rows and the genes in the columns

  • For probabilistic PCA you can use pumaPCA()
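A minimal sketch of the prcomp() step, using a simulated toy matrix with genes in rows and samples in columns (the layout returned by exprs()). The sample names and the two-group colouring are hypothetical.

```r
# Toy data only: 500 hypothetical genes x 6 hypothetical samples.
set.seed(4)
expr <- matrix(rnorm(500 * 6), nrow = 500,
               dimnames = list(paste0("gene", 1:500), paste0("sample", 1:6)))
expr[1:100, 4:6] <- expr[1:100, 4:6] + 3  # simulated group difference

# prcomp() treats rows as observations, so transpose to samples x genes.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of the total variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot the samples on the first two principal components.
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     col = rep(c("blue", "red"), each = 3),
     xlab = "PC1", ylab = "PC2")
```

The scores in pca$x have one row per sample, which is what makes the plot a picture of the arrays rather than of the genes. pumaPCA() is the probabilistic alternative from puma and is not shown here.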

Step 7: Hierarchical clustering of DE (Differentially Expressed) genes

  • To perform this we need to load a library called gplots. We will use the command heatmap.2().

  • We cluster the genes selected by our DE analysis in order to search for patterns of differentially regulated pathways.
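The clustering step can be sketched as follows, again on simulated toy data: de_expr stands for the expression values of the selected DE genes (a hypothetical name), with 15 genes simulated as up-regulated and 15 as down-regulated. The sketch falls back to the base R heatmap() if gplots is not installed.

```r
# Toy data only: 30 hypothetical DE genes x 6 hypothetical samples.
set.seed(5)
de_expr <- matrix(rnorm(30 * 6, mean = 8, sd = 0.5), nrow = 30,
                  dimnames = list(paste0("gene", 1:30), paste0("sample", 1:6)))
de_expr[1:15, 4:6]  <- de_expr[1:15, 4:6] + 3   # simulated up-regulation
de_expr[16:30, 4:6] <- de_expr[16:30, 4:6] - 3  # simulated down-regulation

# Hierarchical clustering of the genes (rows) on Euclidean distance.
hc <- hclust(dist(de_expr), method = "complete")
clusters <- cutree(hc, k = 2)  # cut the tree into two gene clusters

# Heatmap with row and column dendrograms; rows are scaled for display.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(de_expr, scale = "row", trace = "none")
} else {
  heatmap(de_expr, scale = "row")
}
```

Genes that fall into the same cluster share an expression pattern across the samples, which makes each cluster a natural gene list to take forward to the functional analysis.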

Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID