The Project: Group C

This notebook contains specifications for the implementation of the data analysis for the final project assigned to group C. The practical assignments for Weeks 4, 5 and 6 are the basis for building the final project notebook. ALL the steps suggested in the notebook Week_6_practical must be reported, and at least one method must be used to estimate the gene expression levels. Differential expression analysis needs to be carried out using fold changes as well as a measure of their significance (e.g. p-values, FDRs, PPLRs).

The final project must be described in a notebook containing the implementation of the data analysis, with detailed documentation and justification for each step in markdown cells within the same notebook. ALL the code cells in the notebook must be executed and their outputs visible. Make sure that your notebook looks like a final report, with good headings and detailed text. The graphs must be clear and fully explained in markdown cells, the results must be discussed in full, and a critical analysis of the findings must be present in the report.

Figures from functional/pathway analysis must be included in the notebook and the results discussed in the report. You can insert image files (e.g. .jpg or .png) in the notebook as follows:

Place the image you want to link to in the same folder as your notebook. You can also use a subfolder to store images (e.g. a folder called images). To link to the image, use the following:

![An image from Chen et al (2014)](images/Chen_et_al.png)

The final notebook must include your final pipeline that should reflect the basic workflow for gene expression data analysis, as well as all the details on thresholds and methods that you are going to use to develop the final project. It can be the same as Week_6_practical or a refined version of it.

THE DEADLINE FOR THE SUBMISSION OF THE FINAL PROJECT IS 15TH OF JANUARY 2017 AT 17:00

TIPS and SUGGESTIONS:

To read .CEL files, use the function ReadAffy() from the affy package. This returns an AffyBatch object that you will process to estimate the gene expression values. An example of its use is shown below:

library(affy)
# the .CEL files are expected in the current working directory
sp1_filenames <- c("LPGMa.CEL", "LPGMb.CEL", "LPHa.CEL", "LPHb.CEL")
affybatch.sp1 <- ReadAffy(filenames=sp1_filenames)

If you are using puma as well as RMA, run rma() first to avoid conflicts between the libraries, then load the puma package to run mmgmos().
To combine replicated experiments in puma you need to use pumaCombImproved(), which might take some time to complete.
To avoid recomputing puma data that took a long time to process, you can store the object with save() and later restore it into your workspace with load(). An example of usage is:

save(your_eset, file="puma_eset.RDA")
load("puma_eset.RDA")
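
As a minimal sketch of this round trip, the example below uses a small random matrix as a stand-in for a processed puma object and writes to a temporary folder. The matrix contents and the use of tempdir() are assumptions made for illustration only; in the project you would save the real object next to your notebook.

```r
# Stand-in for a slow-to-compute result (hypothetical values, not puma output)
your_eset <- matrix(rnorm(12), nrow = 3,
                    dimnames = list(paste0("probe_", 1:3),
                                    c("LPGMa", "LPGMb", "LPHa", "LPHb")))
original <- your_eset

# Write to a temporary folder so the sketch leaves the working directory untouched
rda_path <- file.path(tempdir(), "puma_eset.RDA")
save(your_eset, file = rda_path)

# Remove the object, then restore it from disk;
# load() returns the names of the objects it restored
rm(your_eset)
restored_names <- load(rda_path)
print(restored_names)
```

Note that load() restores the object under the name it had when it was saved, and silently overwrites any object with the same name already in the workspace.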

Differential expression analysis with puma requires you to save the results with write.reslts(), as specified in the puma user guide. This saves the fold changes for all possible comparisons together with their statistics, which in this case are the PPLRs. An example of writing pumaDE() output is:

pumaDERes <- pumaDE(eset_estrogen_comb)
# write results
write.reslts(pumaDERes, file="pumaDERes")
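
Once fold changes and PPLRs are available, one possible way to call genes as changing is sketched below. PPLR values close to 1 suggest up-regulation and values close to 0 suggest down-regulation. The table, its column names, and both cutoffs are hypothetical and chosen only for illustration, and the sketch assumes that the fold changes are on the log2 scale; you should choose and justify your own thresholds.

```r
# Toy table for illustration only (made-up values, not real results)
de_table <- data.frame(
  probe       = c("probe_1", "probe_2", "probe_3", "probe_4"),
  fold_change = c(2.1, -1.8, 0.2, 1.5),   # assumed log2 scale, hypoxia vs normal
  pplr        = c(0.99, 0.02, 0.55, 0.70)
)

pplr_cutoff <- 0.9   # assumed threshold, to be justified in the report
fc_cutoff   <- 1     # assumed threshold: at least a two-fold change

# Up-regulated: PPLR near 1 and a large positive fold change
up   <- de_table$pplr >= pplr_cutoff     & de_table$fold_change >=  fc_cutoff
# Down-regulated: PPLR near 0 and a large negative fold change
down <- de_table$pplr <= 1 - pplr_cutoff & de_table$fold_change <= -fc_cutoff

de_table$call <- ifelse(up, "up", ifelse(down, "down", "unchanged"))
print(de_table)
```

Combining a significance cutoff with a fold-change cutoff keeps only genes whose change is both well supported and large enough to be of biological interest.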

For the heatmap you need to load the gplots package. If you want to use hierarchical clustering with Pearson correlation, you need to define a function that computes the correlation-based distance. An example of usage is as follows:

library(gplots)
# hierarchical clustering with average linkage
hclust2 <- function(x, method="average", ...)
  hclust(x, method=method, ...)
# distance between rows (genes) defined as 1 - Pearson correlation
dist2 <- function(x, ...)
  as.dist(1-cor(t(x), method="pearson"))
heatmap.2(as.matrix(expression_values), distfun=dist2, hclustfun=hclust2, col=redgreen(75), scale="row",
          key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.9, cexCol=0.8, lwid=c(0.1,0.1))
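
To see what the correlation-based distance does, the following sketch applies the same dist2 definition to a tiny made-up matrix using base R only. The gene names and values are hypothetical. Genes whose profiles rise and fall together get a distance near 0, while genes with opposite profiles get a distance near 2.

```r
# Toy expression matrix: rows are genes, columns are samples (made-up values)
expression_values <- rbind(
  gene_a = c(1, 2, 3, 4),
  gene_b = c(2, 4, 6, 8),   # same shape as gene_a, different magnitude
  gene_c = c(4, 3, 2, 1)    # opposite trend to gene_a
)

# Same distance definition as in the heatmap example: 1 - Pearson correlation
dist2 <- function(x, ...) as.dist(1 - cor(t(x), method = "pearson"))

d <- dist2(expression_values)
print(round(as.matrix(d), 3))

# Average-linkage clustering on that distance, cut into two groups
tree   <- hclust(d, method = "average")
groups <- cutree(tree, k = 2)
print(groups)
```

Because the correlation ignores absolute magnitude, gene_a and gene_b end up in the same cluster even though their expression levels differ, which is usually the desired behaviour when grouping genes by expression pattern.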

To annotate the probesets you need to load the packages annotate and hgu133plus2.db. Use the function select() to retrieve annotations for the probeset IDs. For example, you can use:

annotated_list<-select(hgu133plus2.db, topProbes, c("SYMBOL","GENENAME"), "PROBEID")

with topProbes being your selected probeset IDs.

The Project

This study aims to explain the effect of hypoxia on human neutrophils, in order to identify the possible involvement of the inflammatory response in the adverse prognosis of hypoxia-related diseases, e.g. pulmonary hypertension and myocardial infarction. To elucidate this effect, primary cultures of human neutrophils were studied under normal conditions and under hypoxia. Gene expression profiles of the neutrophils under both conditions were obtained after a certain number of hours in culture.

The expression profiles were quantified using the Affymetrix GeneChip HG-U133 Plus 2.0 array. The files containing the data are as follows:

LPGMa.CEL neutrophils at normal condition in culture - sample 1
LPGMb.CEL neutrophils at normal condition in culture - sample 2
LPHa.CEL neutrophils with hypoxia induced in culture - sample 1
LPHb.CEL neutrophils with hypoxia induced in culture - sample 2

After estimating gene expression levels, visualise the data and describe the findings. Identify which genes are changing between conditions and define any potential pathway that the hypoxia might have altered in neutrophils.
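
As one possible starting point for identifying changing genes, the sketch below computes log2 fold changes, per-gene t-test p-values and Benjamini-Hochberg FDRs in base R. The expression matrix is simulated (it is not the project data), and the choice of an equal-variance t-test is an assumption made for illustration. With only two replicates per condition such a test has very little power, so treat this as an outline of the calculation and not as a recommended method.

```r
set.seed(1)

# Simulated log2 expression matrix standing in for real estimates (not project data)
n_genes <- 100
expr <- matrix(rnorm(n_genes * 4, mean = 8, sd = 0.3), nrow = n_genes,
               dimnames = list(paste0("probe_", seq_len(n_genes)),
                               c("LPGMa", "LPGMb", "LPHa", "LPHb")))
# Spike in a change for the first five probes in the hypoxia samples
expr[1:5, c("LPHa", "LPHb")] <- expr[1:5, c("LPHa", "LPHb")] + 3

normal  <- c("LPGMa", "LPGMb")
hypoxia <- c("LPHa", "LPHb")

# On the log2 scale the fold change is a difference of group means
log2_fc <- rowMeans(expr[, hypoxia]) - rowMeans(expr[, normal])

# Per-gene two-sample t-test (equal variances assumed because n = 2 per group)
p_value <- apply(expr, 1, function(x)
  t.test(x[hypoxia], x[normal], var.equal = TRUE)$p.value)

# Benjamini-Hochberg correction for multiple testing
fdr <- p.adjust(p_value, method = "BH")

results <- data.frame(log2_fc, p_value, fdr)
results <- results[order(results$p_value), ]
head(results)
```

The ranked table can then be filtered with thresholds of your choice on the fold change and on the FDR, and the selected probeset IDs passed on to annotation and pathway analysis.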
