Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

R

Views: 4031
Kernel: SageMath (stable)
options(jupyter.plot_mimetypes ='image/png')

** THE EFFECT OF HYPOXIA ON HUMAN NEUTROPHILS **

In this study, the effects of Hypoxia on human Neutrophils were investigated in order to identify the possible involvement of inflammatory response in adverse prognosis of hypoxia-related disease, such as pulmonary hypertension and myocardial infarction. Primary cultures of human neutrophils were studied in both normal and hypoxia conditions. A gene expression profile of the neutrophils in both conditions were done after centain amounts of time in culture, and quantified using Affymetrix GeneChip HGU133 PLUS 2. The study was conducted on two separate samples.

This report aims to estimate gene expression levels, and analyse the results to identify the genes that are changing between the two conditions, defining the potential pathways that hypoxia may have altered in neutrophils.

** Data Analysis **

Workflow

Step 1: Load packages with data from Bioconductor, library(affy) - mas5, rma, library(puma)

Step 2: Load and read data, create affybatch. Annotate with pData.

Step 3: Analysis of gene expression data with different methods and normalisation techniques.

  • Create eset

  • Extract gene expression

  • First diagnostic using density() and boxplot()

  • Normalisation by log2 if required

Step 4: Diagnostics of the data with plotting techniques

  • MAPlot

  • boxplot

Step 5: Differential Expression Analysis

  • For puma, combine the data using an bayesian Hierarchical model

  • Check the dimension and the pData() for the eset of the combined values. Calculate the FC and plot the data with a MA plot using the command ma.plot()

  • MAPlot

  • use of limma for DE analysis. Remember the three core steps of limma

  • Step 1: build the design contrast matrix

  • Step 2: fit the linear model

  • Step 3: calculate the p-values and FDRs with a empirical Bayes test

Step 6: Visualisation of Data with PCA

  • perform PCA in R using the command prcomp()

  • It needs the traspose command t() since the input for the prcomp() wants the genes in the columns

  • For probabilistic PCA you can use pumaPCA()

Step 7: Hierarchical clustering of DE (Differentially Expressed) genes

  • To perform this we need to activate a library called gplots. We will use the command heatmap.2().

  • We do clustering a the selected genes from our DE analysis this is to search for patterns in of differentially regulatend pathways.

Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID

Step 1:

library(affy)
Loading required package: BiocGenerics Loading required package: parallel Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB The following objects are masked from ‘package:stats’: IQR, mad, xtabs The following objects are masked from ‘package:base’: anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist, unsplit Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

This step loads the affy package, which is part of the BioConductor project, allowing for data analysis and exploration of Affymetrix oligonucleotide array probe level data. It summarises the probe set intensities, forming one expression measure (data available for analysis) for each gene. The package includes plotting functions for the probe level data useful for quality control, making it useful in the initial analysis of the data, it includes plotting functions for the data that can be useful for quality control of data, RNA degradation assessments, normaliasation and background correction procedures. It also allows for probe level data to be converted to expression measures. In this project, MAS 5.0 and RMA are used for perform the analysis.

Step 2:

Set working directory

setwd("~/Autumn2016/ProjectC/data_projectC")
getwd()
'/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC'

In order to load the data that is required, a working directory must be set, leading to where the data is saved.

hypoxia_filenames <- c("LPGMa.CEL","LPGMb.CEL","LPHa.CEL","LPHb.CEL") affybatch.hypoxia <- ReadAffy(filenames=hypoxia_filenames)

The files that contain the data are saved in the .CEL format, indicating the files contain measured intensities and locations for an array that has been hybridised.

show(affybatch.hypoxia)
Warning message: “replacing previous import ‘AnnotationDbi::tail’ by ‘utils::tail’ when loading ‘hgu133plus2cdf’”Warning message: “replacing previous import ‘AnnotationDbi::head’ by ‘utils::head’ when loading ‘hgu133plus2cdf’”
AffyBatch object size of arrays=1164x1164 features (18 kb) cdf=HG-U133_Plus_2 (54675 affyids) number of samples=4 number of genes=54675 annotation=hgu133plus2 notes=

The data shows the size of the array is 1154x1164 (18kb), cdf maps each gene that is in the array (54675 genes), and there are 4 samples.

phenoData(affybatch.hypoxia) pData(affybatch.hypoxia)
An object of class 'AnnotatedDataFrame' sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL varLabels: sample varMetadata: labelDescription
sample
LPGMa.CEL1
LPGMb.CEL2
LPHa.CEL3
LPHb.CEL4

pData retrieves information on experimental phenotypes that are recorded.

pData(affybatch.hypoxia)<- data.frame( "Condition"=c("Normal", "Normal", "Hypoxia", "Hypoxia"), "Sample"=c("1", "2", "1", "2"), row.names=rownames(pData(affybatch.hypoxia))) pData(affybatch.hypoxia)
ConditionSample
LPGMa.CELNormal 1
LPGMb.CELNormal 2
LPHa.CELHypoxia1
LPHb.CELHypoxia2

Step 3:

This step involves the analysis of gene expression data with different methods and normalisation techniques. The methods convert the probe level data to expression values, which is achieved through:

  • Reading in probe level data

  • Background correction

  • Normalization

  • Probe specific background correction

  • Summarising the probe set values into one expression measure

RMA and MAS 5.0 creates two different types of ExpressionSets, from which the gene expression values will be extracted.

eset_rma<-rma(affybatch.hypoxia) show(eset_rma)
Background correcting Normalizing Calculating Expression ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 4 samples element names: exprs protocolData sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL varLabels: ScanDate varMetadata: labelDescription phenoData sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL varLabels: Condition Sample varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: hgu133plus2
eset_mas5<-mas5(affybatch.hypoxia) show(eset_mas5)
background correction: mas PM/MM correction : mas expression values: mas background correcting...done. 54675 ids to be processed | | |####################| ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 4 samples element names: exprs, se.exprs protocolData sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL varLabels: ScanDate varMetadata: labelDescription phenoData sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL varLabels: Condition Sample varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: hgu133plus2
e_rma<-exprs(eset_rma) head(e_rma)
LPGMa.CELLPGMb.CELLPHa.CELLPHb.CEL
1007_s_at 6.359706 6.378816 6.399439 6.320410
1053_at 7.488801 7.490398 7.339287 7.136260
117_at10.88026010.73816011.34817911.316371
121_at 7.359978 7.344536 7.452264 7.372993
1255_g_at 2.471620 2.503735 2.670402 2.575980
1294_at 6.208576 6.346277 6.331029 6.551821
e_mas5<-exprs(eset_mas5) head(e_mas5)
LPGMa.CELLPGMb.CELLPHa.CELLPHb.CEL
1007_s_at 428.46053 362.13667 437.20195 404.803198
1053_at 604.36568 540.84643 549.49801 592.917234
117_at7722.59667 7774.66079 11815.21462 11637.983040
121_at 461.12158 379.06718 470.19313 480.864624
1255_g_at 41.02802 48.25334 72.33834 4.947423
1294_at 246.59488 251.62389 293.92585 322.333187
density(e_rma) density(e_mas5)
Call: density.default(x = e_rma) Data: e_rma (218700 obs.); Bandwidth 'bw' = 0.1711 x y Min. : 1.276 Min. :8.300e-07 1st Qu.: 4.654 1st Qu.:1.038e-02 Median : 8.031 Median :4.186e-02 Mean : 8.031 Mean :7.396e-02 3rd Qu.:11.408 3rd Qu.:1.444e-01 Max. :14.785 Max. :2.408e-01
Call: density.default(x = e_mas5) Data: e_mas5 (218700 obs.); Bandwidth 'bw' = 14.33 x y Min. : -42.87 Min. :0.000e+00 1st Qu.:16716.38 1st Qu.:2.380e-07 Median :33475.62 Median :7.530e-07 Mean :33475.62 Mean :5.338e-05 3rd Qu.:50234.86 3rd Qu.:3.687e-06 Max. :66994.10 Max. :1.014e-02
par(mfrow=c(1,1)) plot(density(e_rma[,1]),col="red", main="RMA Estimation") lines(density(e_rma[,2]),col="blue") lines(density(e_rma[,3]),col="green") lines(density(e_rma[,4]),col="purple") plot(density(e_mas5[,1]),col="red",main="Mas5 Estimation") lines(density(e_mas5[,2]),col="blue") lines(density(e_mas5[,3]),col="green") lines(density(e_mas5[,4]),col="purple")
Image in a Jupyter notebookImage in a Jupyter notebook
par(mfrow=c(1,1)) boxplot((e_rma), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using rma") boxplot((e_mas5), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using mas5")
Image in a Jupyter notebookImage in a Jupyter notebook
log2e_mas5<-log2(e_mas5) head(log2e_mas5)
LPGMa.CELLPGMb.CELLPHa.CELLPHb.CEL
1007_s_at 8.743019 8.500390 8.772156 8.661077
1053_at 9.239278 9.079075 9.101970 9.211687
117_at12.91487012.92456413.52835813.506553
121_at 8.849003 8.566310 8.877110 8.909487
1255_g_at 5.358538 5.592557 6.176689 2.306677
1294_at 7.945999 7.975125 8.199308 8.332409
density(log2e_mas5)
Call: density.default(x = log2e_mas5) Data: log2e_mas5 (218700 obs.); Bandwidth 'bw' = 0.2168 x y Min. :-3.749 Min. :1.800e-07 1st Qu.: 1.358 1st Qu.:4.223e-03 Median : 6.466 Median :3.180e-02 Mean : 6.466 Mean :4.890e-02 3rd Qu.:11.574 3rd Qu.:8.519e-02 Max. :16.681 Max. :1.705e-01
par(mfrow=c(1,1)) plot(density(log2e_mas5[,1]),col="red",main="Mas5 Estimation - normalised") lines(density(log2e_mas5[,2]),col="blue") lines(density(log2e_mas5[,3]),col="green") lines(density(log2e_mas5[,4]),col="purple") boxplot((log2e_mas5), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using mas5 - normalised")
Image in a Jupyter notebookImage in a Jupyter notebook

Expression values for mas5 and rma are extracted and the first diagnostics performed on the data, using density() and boxplot(). Initial mas5 estimation showed the data was difficult to read due to the large values of the outliers, therefore a log2 transformation was performed, to change the scale and make the plots more readable. The transformation also eliminated much of the negative values. No transformation or normalisation was required however, as the medians are aligned, with no negative outliers. Therefore, further analysis is continued with the use of rma extracted expressions.

require(puma)
Loading required package: puma Loading required package: oligo Loading required package: oligoClasses Welcome to oligoClasses version 1.32.0 Attaching package: ‘oligoClasses’ The following object is masked from ‘package:affy’: list.celfiles Loading required package: Biostrings Loading required package: S4Vectors Loading required package: stats4 Loading required package: IRanges Loading required package: XVector ================================================================================ Welcome to oligo version 1.34.2 ================================================================================ Attaching package: ‘oligo’ The following objects are masked from ‘package:affy’: intensity, MAplot, mm, mm<-, mmindex, pm, pm<-, pmindex, probeNames, rma Loading required package: mclust Package 'mclust' version 5.2 Type 'citation("mclust")' for citing this R package in publications.

puma (Propagating Uncertainty in Microarray Analysis) is another bioconductor package. Microarrays measure the expression level of thousands of genes simultaneously, therefore there are many significant soutces of uncertainties associated with it; these uncertainties must be considered to accurately infer from the data. Earlier methods used (mas5 and rma) only provide single point estimates that summarises the target concentration. By using probabilistic models such as puma for probe-level analysis, it is possible to associate gene expression levels with credibility intervals that quantify the measurement uncentainty associated with the estimate of target concentration with a sample. puma performs analysis through:

  • Calculation of expression levels and confidence measures for those levels from raw .CEL data

  • Combine uncertainty information from replicated arrays

  • Determine differential expression between conditions, or between more complex contrasts such as interaction terms

  • Cluster data taking the expression level uncertainty into account

  • Perform a noise-propagation version of principal compinent analysis (PCA)

eset_puma<-mmgmos(affybatch.hypoxia) show(eset_puma)
Model optimising .............................................................................................................. Expression values calculating .............................................................................................................. Done. Expression Set (exprReslt) with 54675 genes 4 samples An object of class 'AnnotatedDataFrame' sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL varLabels: Condition Sample varMetadata: labelDescription
eset_puma_normd <-pumaNormalize(eset_puma)
e_puma<-exprs(eset_puma) head(e_puma)
LPGMa.CELLPGMb.CELLPHa.CELLPHb.CEL
1007_s_at 5.902316 5.6103537 5.6174054 5.737430
1053_at 6.626212 6.6790712 6.3627049 6.180460
117_at10.170737 10.121731610.685271710.553831
121_at 5.211771 4.8356632 5.5315978 5.400501
1255_g_at-1.389828 0.2041968 0.2107774-1.592713
1294_at 5.458750 5.9264062 5.8934533 6.153923
density(e_puma)
Call: density.default(x = e_puma) Data: e_puma (218700 obs.); Bandwidth 'bw' = 0.2729 x y Min. :-35.061 Min. :0.000e+00 1st Qu.:-22.699 1st Qu.:2.170e-06 Median :-10.336 Median :2.474e-05 Mean :-10.336 Mean :2.020e-02 3rd Qu.: 2.026 3rd Qu.:2.640e-02 Max. : 14.388 Max. :1.145e-01
e_puma_normd<-exprs(eset_puma_normd) head(e_puma_normd)
LPGMa.CELLPGMb.CELLPHa.CELLPHb.CEL
1007_s_at 5.902316 5.6103537 5.6174054 5.737430
1053_at 6.626212 6.6790712 6.3627049 6.180460
117_at10.170737 10.121731610.685271710.553831
121_at 5.211771 4.8356632 5.5315978 5.400501
1255_g_at-1.389828 0.2041968 0.2107774-1.592713
1294_at 5.458750 5.9264062 5.8934533 6.153923
density(e_puma_normd)
Call: density.default(x = e_puma_normd) Data: e_puma_normd (218700 obs.); Bandwidth 'bw' = 0.2729 x y Min. :-35.061 Min. :0.000e+00 1st Qu.:-22.699 1st Qu.:2.170e-06 Median :-10.336 Median :2.474e-05 Mean :-10.336 Mean :2.020e-02 3rd Qu.: 2.026 3rd Qu.:2.640e-02 Max. : 14.388 Max. :1.145e-01

After performing pumaNormalize() on the data, the first diagnostic tests showed that there is no difference to the data prior to normalisation, therefore indicating that the pumadata is already normalised.

plot(density(e_puma[,1]),col="red", main="PUMA Estimation") lines(density(e_puma[,2]),col="blue") lines(density(e_puma[,3]),col="green") lines(density(e_puma[,4]),col="purple")
Image in a Jupyter notebook
boxplot((e_puma), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using puma")
Image in a Jupyter notebook

Although the data is shown to be normalized, and the medians are aligned, it can also be seem from the boxplot that there is a large number of negative outliers, therefore the negative gene expression values are set to zero, to further normalise the data.

for (i in 1:4) { y<-e_puma[,i] y[y<0] <-0 e_puma[,i] <- y } head(e_puma)
LPGMa.CELLPGMb.CELLPHa.CELLPHb.CEL
1007_s_at 5.902316 5.6103537 5.6174054 5.737430
1053_at 6.626212 6.6790712 6.3627049 6.180460
117_at10.170737 10.121731610.685271710.553831
121_at 5.211771 4.8356632 5.5315978 5.400501
1255_g_at 0.000000 0.2041968 0.2107774 0.000000
1294_at 5.458750 5.9264062 5.8934533 6.153923
density(e_puma)
Call: density.default(x = e_puma) Data: e_puma (218700 obs.); Bandwidth 'bw' = 0.2384 x y Min. :-0.7153 Min. :0.0000002 1st Qu.: 3.0348 1st Qu.:0.0172600 Median : 6.7849 Median :0.0533883 Mean : 6.7849 Mean :0.0665816 3rd Qu.:10.5351 3rd Qu.:0.0965064 Max. :14.2852 Max. :0.4313578
plot(density(e_puma[,1]), col="red", main="PUMA Estimation") lines(density(e_puma[,2]),col="green") lines(density(e_puma[,3]),col="blue") lines(density(e_puma[,4]),col="purple")
Image in a Jupyter notebook
boxplot(e_puma,main="Boxplot of gene expression extracted using puma - normalised", xlab="Neutrophil Samples", ylab="gene expression")
Image in a Jupyter notebook

Boxplots show the differences in probe intensity behaviour between arrays. Boxplots are useful in the visualisation of data for first diagnostics, ensuring all the samples are comparable. Box plots show are able to illustrate:

  • Median

  • Upper Quartile

  • Lower Quartile

  • Range

  • Individual extreme values (Outliers)

The boxplots above show that gene expression extracted using rma does not need to be normalised as the medians are aligned, and no negative outliers. The mas5 boxplot showed the data must be log2 transformed in order for comparison to be possible. For puma, the results needed to be normalised due to the high number of negative outliers present, although the medians are aligned.

All three analysis techniques showed a similar range of values following normalisation.

par(mfrow=c(2,2)) MAplot(e_rma)
Image in a Jupyter notebook
par(mfrow=c(2,2)) MAplot(e_puma)
Image in a Jupyter notebook

In MA plots, each Affymetrix marray is compared to a pseudo-array, which consist of the median intensity of each probe over all arrays, the plot shows to what extent the variability in expression depends on the expression level. M is the difference between the intensity of a probe on the array and the median intensity of that probe over all arrays A is the average intensity of a probe on that array and the median intensity of that probe over all arrays.

The cloud of data points in the MA plot is centered around M=0, based on the assumption that the majority of the genes are not differentially expressed, an the number of upregulated genes is similar to the number of downregulated genes.

From the MA plots above, it can be deduced that there appears to be a greater number of downregulated genes in neutrophils under hypoxia conditions than in normal conditions.

Step 5:

eset_puma_comb<- pumaCombImproved(eset_puma_normd)
pumaComb expected completion time is 3 hours .......20%.......40%.......60%.......80%......100% ..................................................
save(eset_puma_comb, file="eset_pumacomb.RDA")
load("eset_pumacomb.RDA") ls()
  1. 'affybatch.hypoxia'
  2. 'e_mas5'
  3. 'e_puma'
  4. 'e_puma_normd'
  5. 'e_rma'
  6. 'eset_mas5'
  7. 'eset_puma'
  8. 'eset_puma_comb'
  9. 'eset_puma_normd'
  10. 'eset_rma'
  11. 'hypoxia_filenames'
  12. 'i'
  13. 'log2e_mas5'
  14. 'y'
show(eset_puma_comb)
ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 4 samples element names: exprs, se.exprs protocolData: none phenoData sampleNames: Hypoxia.1 Normal.1 Hypoxia.2 Normal.2 varLabels: Condition Sample varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation:
pData(eset_puma_comb)
ConditionSample
Hypoxia.1Hypoxia1
Normal.1Normal 1
Hypoxia.2Hypoxia2
Normal.2Normal 2
dim(eset_puma_comb)
Features
54675
Samples
4
hypoxia_comb_puma<-exprs(eset_puma_comb) for(i in 1:4) { temp<-hypoxia_comb_puma[,i] temp[temp<0] <-0 hypoxia_comb_puma[,i]<- temp }
FC_puma<- hypoxia_comb_puma[,1:2] - hypoxia_comb_puma[,3:4] colnames(FC_puma) <- c("Hypoxia-Normal 1","Hypoxia-Normal 2") head(FC_puma)
Hypoxia-Normal 1Hypoxia-Normal 2
1007_s_at-0.0003167666 0.0009128018
1053_at 0.0007166274-0.0002780249
117_at 0.0032436276 0.0011207505
121_at 0.0001154092 0.0002085455
1255_g_at 0.0000000000 0.0000000000
1294_at-0.0019379157-0.0020818635
MAplot(FC_puma)
Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, : “Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, : “Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”
Image in a Jupyter notebookImage in a Jupyter notebook
FC_rma<- e_rma[,1:2] - e_rma[,3:4] colnames(FC_rma) <- c("Hypoxia-Normal 1","Hypoxia-Normal 2") head(FC_rma)
Hypoxia-Normal 1Hypoxia-Normal 2
1007_s_at-0.03973322 0.05840595
1053_at 0.14951411 0.35413809
117_at-0.46791889-0.57821119
121_at-0.09228612-0.02845722
1255_g_at-0.19878260-0.07224494
1294_at-0.12245352-0.20554369
MAplot(FC_rma)
Image in a Jupyter notebookImage in a Jupyter notebook
groups<-c("H1","N1","H2","N2") hypoxia_table<-data.frame(sampleNames(eset_puma_comb),groups) group1<-factor(groups[1:2]) group2<-factor(groups[3:4]) group1 group2
  1. H1
  2. N1
  1. H2
  2. N2
hypoxia_table
sampleNames.eset_puma_comb.groups
Hypoxia.1H1
Normal.1 N1
Hypoxia.2H2
Normal.2 N2
par(mfrow=c(2,2)) MAplot(eset_puma_comb)
Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, : “Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, : “Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, : “Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, : “Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”
Image in a Jupyter notebook
library(limma) group<-factor(c("Normal","Normal","Hypoxia","Hypoxia")) design<-model.matrix(~0+group) colnames(design)<-c("Normal","Hypoxia") contrast.matrix_puma<- makeContrasts(Normal,Hypoxia,levels=design) design contrast.matrix_puma fit<-lmFit(eset_puma,design) fit2<-contrasts.fit(fit,contrast.matrix_puma) fit3<-eBayes(fit2) topDEGenes<-topTable(fit3, coef=1, adjust="BH", n=100, lfc=1) topDEGenes
Attaching package: ‘limma’ The following object is masked from ‘package:oligo’: backgroundCorrect The following object is masked from ‘package:BiocGenerics’: plotMA
NormalHypoxia
101
201
310
410
NormalHypoxia
Normal10
Hypoxia01
logFCAveExprtP.Valueadj.P.ValB
200958_s_at13.16920 13.11567 124.3121 4.891234e-072.954137e-055.351285
204006_s_at12.87494 12.61830 123.9542 4.936824e-072.954137e-055.350019
200748_s_at13.33368 13.34257 122.0398 5.190456e-072.954137e-055.343069
211919_s_at12.99841 12.80683 121.8356 5.218509e-072.954137e-055.342310
211742_s_at12.62507 12.33153 121.0488 5.328459e-072.954137e-055.339351
1555745_a_at12.85586 12.53620 121.0256 5.331751e-072.954137e-055.339263
AFFX-HSAC07/X00351_3_at12.84239 12.95069 120.9210 5.346613e-072.954137e-055.338865
AFFX-hum_alu_at13.11747 13.12849 120.9017 5.349360e-072.954137e-055.338792
212560_at12.62891 12.38139 120.4412 5.415465e-072.954137e-055.337029
217028_at13.12695 12.97323 120.2231 5.447149e-072.954137e-055.336188
208980_s_at12.53417 12.55008 120.1028 5.464738e-072.954137e-055.335722
202727_s_at12.53732 12.55374 119.9873 5.481680e-072.954137e-055.335273
202917_s_at13.44072 13.46929 119.8412 5.503220e-072.954137e-055.334704
200801_x_at12.79446 12.86409 119.7931 5.510328e-072.954137e-055.334517
200668_s_at12.48924 12.47371 119.5855 5.541183e-072.954137e-055.333703
201368_at12.71576 12.42925 119.5836 5.541462e-072.954137e-055.333696
212587_s_at12.65005 12.59432 119.5826 5.541612e-072.954137e-055.333692
202388_at12.95929 12.60095 119.3217 5.580716e-072.954137e-055.332665
200704_at12.84004 12.89974 119.2973 5.584377e-072.954137e-055.332569
200794_x_at12.66494 12.71327 119.1220 5.610873e-072.954137e-055.331874
215952_s_at12.73163 12.62020 118.9827 5.632039e-072.954137e-055.331321
209201_x_at12.96794 12.76079 118.9804 5.632398e-072.954137e-055.331311
AFFX-HSAC07/X00351_5_at12.36732 12.55121 118.8585 5.651011e-072.954137e-055.330826
201721_s_at13.02248 13.02379 118.8531 5.651844e-072.954137e-055.330804
224765_at12.35501 11.72387 118.6302 5.686084e-072.954137e-055.329912
228754_at12.34079 12.06019 118.5799 5.693850e-072.954137e-055.329710
AFFX-HSAC07/X00351_M_at12.53094 12.68500 118.5455 5.699171e-072.954137e-055.329572
208679_s_at12.82208 12.76857 118.4661 5.711483e-072.954137e-055.329252
204122_at12.78061 12.81913 118.3847 5.724132e-072.954137e-055.328923
211296_x_at12.73171 12.73269 118.3529 5.729075e-072.954137e-055.328795
7112.16112 11.66354 115.4135 6.212097e-072.954137e-055.316500
7212.60173 12.61777 115.4121 6.212343e-072.954137e-055.316493
7312.82313 13.02037 115.2926 6.233099e-072.954137e-055.315975
7412.35619 11.88091 115.2806 6.235184e-072.954137e-055.315923
7512.01287 11.82985 115.2349 6.243133e-072.954137e-055.315725
7612.08655 11.76656 115.1775 6.253161e-072.954137e-055.315475
7712.08300 11.74270 115.1515 6.257710e-072.954137e-055.315361
7813.05271 13.04967 115.0145 6.281731e-072.954137e-055.314763
7912.08619 11.89104 114.9890 6.286219e-072.954137e-055.314652
8012.63099 12.73537 114.9662 6.290222e-072.954137e-055.314552
8112.04174 12.15283 114.9463 6.293735e-072.954137e-055.314465
8212.28123 12.27198 114.9459 6.293805e-072.954137e-055.314463
8312.36272 12.52576 114.9123 6.299725e-072.954137e-055.314316
8412.09863 11.61412 114.8484 6.311019e-072.954137e-055.314035
8512.11993 12.16425 114.7207 6.333660e-072.954137e-055.313474
8612.14712 11.62236 114.6764 6.341531e-072.954137e-055.313279
8712.16596 12.04887 114.6760 6.341598e-072.954137e-055.313277
8812.07946 12.06161 114.6299 6.349816e-072.954137e-055.313074
8911.99899 12.09922 114.6182 6.351897e-072.954137e-055.313022
9011.92110 11.82842 114.6063 6.354027e-072.954137e-055.312970
9112.19644 12.19601 114.5656 6.361297e-072.954137e-055.312790
9212.22030 12.22629 114.5647 6.361456e-072.954137e-055.312786
9312.28066 12.24067 114.5480 6.364439e-072.954137e-055.312712
9412.13352 12.16809 114.5474 6.364545e-072.954137e-055.312709
9512.04900 11.96460 114.5259 6.368395e-072.954137e-055.312614
9612.27975 12.26686 114.4355 6.384589e-072.954137e-055.312214
9712.19763 12.27570 114.4078 6.389574e-072.954137e-055.312091
9812.86782 13.04416 114.4035 6.390343e-072.954137e-055.312072
9912.04219 11.50384 114.3701 6.396356e-072.954137e-055.311924
10012.72957 12.72944 114.3418 6.401459e-072.954137e-055.311798

Limma is a package for differential expression analysis of data arising from microarray experiments. A linear model is fit to the expression data for each gene. Empirical Beyes (a shrinkage method) is used to borrow information across genes making the analyses stable. Linear models are used to analyse designed microarray experiments, allowing for very general experiments to be analysed easily. Two matrices need to be specified. The design matrix provides a representation of the different RNA targets which have been hybridized to the arrays. The contrast matrix allows the coefficients designed by the design matrix to be combined into contrasts of interest. Each contrast corresponds to a comparison of interest between the RNA targets.

results_puma<-decideTests(fit3, method="global",lfc=1) vennDiagram(results_puma)
Image in a Jupyter notebook

The Venn Diagram shows that 3761 genes and 3390 genes were expressed in only normal and only hypoxia conditions, respectively. 35170 genes were expressed in both normal and hypoxia conditions.

hist(fit3$p.value)
Image in a Jupyter notebook
dim(topDEGenes)
  1. 100
  2. 6
rownames(topDEGenes)
  1. '200958_s_at'
  2. '204006_s_at'
  3. '200748_s_at'
  4. '211919_s_at'
  5. '211742_s_at'
  6. '1555745_a_at'
  7. 'AFFX-HSAC07/X00351_3_at'
  8. 'AFFX-hum_alu_at'
  9. '212560_at'
  10. '217028_at'
  11. '208980_s_at'
  12. '202727_s_at'
  13. '202917_s_at'
  14. '200801_x_at'
  15. '200668_s_at'
  16. '201368_at'
  17. '212587_s_at'
  18. '202388_at'
  19. '200704_at'
  20. '200794_x_at'
  21. '215952_s_at'
  22. '209201_x_at'
  23. 'AFFX-HSAC07/X00351_5_at'
  24. '201721_s_at'
  25. '224765_at'
  26. '228754_at'
  27. 'AFFX-HSAC07/X00351_M_at'
  28. '208679_s_at'
  29. '204122_at'
  30. '211296_x_at'
  31. '202391_at'
  32. '207238_s_at'
  33. '211940_x_at'
  34. '208763_s_at'
  35. '225414_at'
  36. '203535_at'
  37. '207008_at'
  38. '211911_x_at'
  39. '1555756_a_at'
  40. '201858_s_at'
  41. '204774_at'
  42. '1553588_at'
  43. '213828_x_at'
  44. '228846_at'
  45. '209732_at'
  46. '210774_s_at'
  47. '211997_x_at'
  48. '224373_s_at'
  49. '216231_s_at'
  50. '208616_s_at'
  51. '204959_at'
  52. '213702_x_at'
  53. '202902_s_at'
  54. '208718_at'
  55. '220990_s_at'
  56. '225364_at'
  57. '216438_s_at'
  58. '201210_at'
  59. '200774_at'
  60. '218614_at'
  61. '224761_at'
  62. '226810_at'
  63. '208783_s_at'
  64. '200059_s_at'
  65. '208788_at'
  66. '224583_at'
  67. '205922_at'
  68. 'AFFX-HUMGAPDH/M33197_3_at'
  69. '232617_at'
  70. '218205_s_at'
  71. '226979_at'
  72. '211676_s_at'
  73. '211956_s_at'
  74. '200921_s_at'
  75. '209083_at'
  76. '203509_at'
  77. '221059_s_at'
  78. '212788_x_at'
  79. '207988_s_at'
  80. '200706_s_at'
  81. '202833_s_at'
  82. '204563_at'
  83. '201779_s_at'
  84. '200920_s_at'
  85. '200729_s_at'
  86. '209112_at'
  87. '217983_s_at'
  88. '224372_at'
  89. '208736_at'
  90. '224451_x_at'
  91. '1553570_x_at'
  92. '217967_s_at'
  93. '204351_at'
  94. '209933_s_at'
  95. '201862_s_at'
  96. '200904_at'
  97. '205568_at'
  98. '211506_s_at'
  99. '213241_at'
  100. '209949_at'
write.table(rownames(topDEGenes),"/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC.txt")
pumaDERes<-pumaDE(eset_puma_comb) pumaDERes
DEResult object: DEMethod = pumaDE statisticDescription = Probability of Positive Log Ratio (PPLR) statistic = 54675 probesets x 7 contrasts
getwd()
'/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC'
write.reslts(pumaDERes, file="pumaDERes")
library(hgu133plus2.db) library(annotate) geneProbes<-as.character(rownames(topDEGenes)) annotated_list<-select(hgu133plus2.db, geneProbes,c("SYMBOL","GENENAME")) annotated_list
Loading required package: AnnotationDbi Loading required package: org.Hs.eg.db Loading required package: DBI Loading required package: XML 'select()' returned 1:many mapping between keys and columns
PROBEIDSYMBOLGENENAME
200958_s_at SDCBP syndecan binding protein (syntenin)
204006_s_at FCGR3B Fc fragment of IgG, low affinity IIIb, receptor (CD16b)
204006_s_at FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a)
200748_s_at FTH1 ferritin, heavy polypeptide 1
211919_s_at CXCR4 chemokine (C-X-C motif) receptor 4
211742_s_at EVI2B ecotropic viral integration site 2B
1555745_a_at LYZ lysozyme
AFFX-HSAC07/X00351_3_at ACTB actin, beta
AFFX-hum_alu_at NA NA
212560_at SORL1 sortilin-related receptor, L(DLR class) A repeats containing
217028_at CXCR4 chemokine (C-X-C motif) receptor 4
208980_s_at UBC ubiquitin C
202727_s_at IFNGR1 interferon gamma receptor 1
202917_s_at S100A8 S100 calcium binding protein A8
200801_x_at ACTB actin, beta
200668_s_at UBE2D3 ubiquitin-conjugating enzyme E2D 3
201368_at ZFP36L2 ZFP36 ring finger protein-like 2
212587_s_at PTPRC protein tyrosine phosphatase, receptor type, C
202388_at RGS2 regulator of G-protein signaling 2
200704_at LITAF lipopolysaccharide-induced TNF factor
200794_x_at DAZAP2 DAZ associated protein 2
215952_s_at OAZ1 ornithine decarboxylase antizyme 1
209201_x_at CXCR4 chemokine (C-X-C motif) receptor 4
AFFX-HSAC07/X00351_5_at ACTB actin, beta
201721_s_at LAPTM5 lysosomal protein transmembrane 5
224765_at MSL1 male-specific lethal 1 homolog (Drosophila)
228754_at SLC6A6 solute carrier family 6 (neurotransmitter transporter), member 6
AFFX-HSAC07/X00351_M_at ACTB actin, beta
208679_s_at ARPC2 actin related protein 2/3 complex, subunit 2, 34kDa
204122_at TYROBP TYRO protein tyrosine kinase binding protein
211676_s_at IFNGR1 interferon gamma receptor 1
211956_s_at EIF1 eukaryotic translation initiation factor 1
200921_s_at BTG1 B-cell translocation gene 1, anti-proliferative
209083_at CORO1A coronin, actin binding protein, 1A
203509_at SORL1 sortilin-related receptor, L(DLR class) A repeats containing
221059_s_at CHST6 carbohydrate (N-acetylglucosamine 6-O) sulfotransferase 6
221059_s_at COTL1 coactosin-like F-actin binding protein 1
212788_x_at FTL ferritin, light polypeptide
207988_s_at ARPC2 actin related protein 2/3 complex, subunit 2, 34kDa
200706_s_at LITAF lipopolysaccharide-induced TNF factor
202833_s_at SERPINA1 serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1
204563_at SELL selectin L
201779_s_at RNF13 ring finger protein 13
200920_s_at BTG1 B-cell translocation gene 1, anti-proliferative
200729_s_at ACTR2 ARP2 actin-related protein 2 homolog (yeast)
209112_at CDKN1B cyclin-dependent kinase inhibitor 1B (p27, Kip1)
217983_s_at RNASET2 ribonuclease T2
224372_at ND4 NADH dehydrogenase, subunit 4 (complex I)
208736_at ARPC3 actin related protein 2/3 complex, subunit 3, 21kDa
224451_x_at ARHGAP9 Rho GTPase activating protein 9
1553570_x_at COX2 cytochrome c oxidase subunit II
217967_s_at FAM129A family with sequence similarity 129, member A
204351_at S100P S100 calcium binding protein P
209933_s_at CD300A CD300a molecule
201862_s_at LRRFIP1 leucine rich repeat (in FLII) interacting protein 1
200904_at HLA-E major histocompatibility complex, class I, E
205568_at AQP9 aquaporin 9
211506_s_at CXCL8 chemokine (C-X-C motif) ligand 8
213241_at PLXNC1 plexin C1
209949_at NCF2 neutrophil cytosolic factor 2
annotated_list[,2]
  1. 'SDCBP'
  2. 'FCGR3B'
  3. 'FCGR3A'
  4. 'FTH1'
  5. 'CXCR4'
  6. 'EVI2B'
  7. 'LYZ'
  8. 'ACTB'
  9. 'NA'
  10. 'SORL1'
  11. 'CXCR4'
  12. 'UBC'
  13. 'IFNGR1'
  14. 'S100A8'
  15. 'ACTB'
  16. 'UBE2D3'
  17. 'ZFP36L2'
  18. 'PTPRC'
  19. 'RGS2'
  20. 'LITAF'
  21. 'DAZAP2'
  22. 'OAZ1'
  23. 'CXCR4'
  24. 'ACTB'
  25. 'LAPTM5'
  26. 'MSL1'
  27. 'SLC6A6'
  28. 'ACTB'
  29. 'ARPC2'
  30. 'TYROBP'
  31. 'UBC'
  32. 'BASP1'
  33. 'PTPRC'
  34. 'H3F3A'
  35. 'H3F3B'
  36. 'H3F3AP4'
  37. 'TSC22D3'
  38. 'RNF149'
  39. 'S100A9'
  40. 'CXCR2'
  41. 'HLA-B'
  42. 'CLEC7A'
  43. 'SRGN'
  44. 'EVI2A'
  45. 'ND3'
  46. 'SH3KBP1'
  47. 'H3F3A'
  48. 'H3F3B'
  49. 'H3F3AP4'
  50. 'MXD1'
  51. 'CLEC2B'
  52. 'NCOA4'
  53. 'H3F3B'
  54. 'H3F3A'
  55. 'MIR4738'
  56. 'ND4'
  57. 'B2M'
  58. 'PTP4A2'
  59. 'MNDA'
  60. 'ASAH1'
  61. 'CTSS'
  62. 'DDX17'
  63. 'VMP1'
  64. 'MIR21'
  65. 'STK4'
  66. 'TMSB4X'
  67. 'DDX3X'
  68. 'FAM120A'
  69. 'KIAA1551'
  70. 'GNA13'
  71. 'OGFRL1'
  72. 'CD46'
  73. 'RHOA'
  74. 'ELOVL5'
  75. 'COTL1'
  76. 'VNN2'
  77. 'GAPDH'
  78. 'CTSS'
  79. 'MKNK2'
  80. 'MAP3K2'
  81. 'IFNGR1'
  82. 'EIF1'
  83. 'BTG1'
  84. 'CORO1A'
  85. 'SORL1'
  86. 'CHST6'
  87. 'COTL1'
  88. 'FTL'
  89. 'ARPC2'
  90. 'LITAF'
  91. 'SERPINA1'
  92. 'SELL'
  93. 'RNF13'
  94. 'BTG1'
  95. 'ACTR2'
  96. 'CDKN1B'
  97. 'RNASET2'
  98. 'ND4'
  99. 'ARPC3'
  100. 'ARHGAP9'
  101. 'COX2'
  102. 'FAM129A'
  103. 'S100P'
  104. 'CD300A'
  105. 'LRRFIP1'
  106. 'HLA-E'
  107. 'AQP9'
  108. 'CXCL8'
  109. 'PLXNC1'
  110. 'NCF2'
write.table(annotated_list[,2],"/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC/SYMBOL.txt")
dir()
  1. 'DEGenesSYMBOL.txt'
  2. 'eset_puma.RDA'
  3. 'eset_pumacomb.RDA'
  4. 'LPGMa.CEL'
  5. 'LPGMb.CEL'
  6. 'LPHa.CEL'
  7. 'LPHb.CEL'
  8. 'PANTHER_Pathway.png'
  9. 'pantherChart.txt'
  10. 'pantherGeneList.txt'
  11. 'pumaDERes_FCs.csv'
  12. 'pumaDERes_statistics.csv'
  13. 'SYMBOL.txt'
pumaDE_stat<-read.csv("pumaDERes_statistics.csv") pumaDE_FC<-read.csv("pumaDERes_FCs.csv")
head(pumaDE_stat)
XNormal.1_vs_Hypoxia.1Hypoxia.2_vs_Hypoxia.1Normal.2_vs_Normal.1Normal.2_vs_Hypoxia.2Condition_Hypoxia_vs_NormalSample_1_vs_2Int__Condition_Hypoxia.Normal_vs_Sample_1.2
1007_s_at0.51137500.50407630.48825520.49555390.49754980.50271180.4944060
1053_at 0.51642130.48955190.50405390.53089810.48326240.50226110.5051275
117_at 0.40820590.47613490.49174970.42348430.55974130.51135820.5055253
121_at 0.49657470.49859100.49745390.49543770.50282410.50139830.4995980
1255_g_at0.49659980.49631110.50341080.50369960.49989410.50009830.5025101
1294_at 0.48078380.51981140.52128160.48225440.51307040.48546820.5005205
probeid<-pumaDE_stat[,1] PPLR_N1vsH1<-pumaDE_stat[,2] PPLR_N2vsH2<-pumaDE_stat[,5] pumaRes<-data.frame(probeid,PPLR_N1vsH1,PPLR_N2vsH2) pumaRes
probeidPPLR_N1vsH1PPLR_N2vsH2
1007_s_at 0.5113750 0.4955539
1053_at 0.5164213 0.5308981
117_at 0.4082059 0.4234843
121_at 0.4965747 0.4954377
1255_g_at 0.4965998 0.5036996
1294_at 0.4807838 0.4822544
1316_at 0.5027632 0.4971939
1320_at 0.4974770 0.5011232
1405_i_at 0.4949343 0.4911106
1431_at 0.4815598 0.5103977
1438_at 0.5009881 0.5005768
1487_at 0.5521465 0.5394558
1494_f_at 0.5036250 0.4874812
1552256_a_at0.4859761 0.5094893
1552257_a_at0.5269255 0.5426862
1552258_at 0.5296864 0.5360022
1552261_at 0.5024123 0.4974071
1552263_at 0.5002512 0.5120329
1552264_a_at0.5087401 0.5142300
1552266_at 0.4986212 0.5178322
1552269_at 0.5007328 0.5008457
1552271_at 0.5004264 0.5022765
1552272_a_at0.4978715 0.4888625
1552274_at 0.9742525 0.9869562
1552275_s_at0.9382355 0.9633200
1552276_a_at0.5002337 0.5004873
1552277_a_at0.5154551 0.5125756
1552278_a_at0.4979116 0.4996785
1552279_a_at0.4968687 0.4988454
1552280_at 0.5000570 0.4988962
AFFX-PheX-3_at 6.383827e-20 2.273321e-19
AFFX-PheX-5_at 4.565683e-10 4.361088e-10
AFFX-PheX-M_at 1.048477e-09 1.116457e-09
AFFX-r2-Bs-dap-3_at 9.091292e-10 4.329772e-10
AFFX-r2-Bs-dap-5_at 9.638324e-14 1.339950e-13
AFFX-r2-Bs-dap-M_at 8.537546e-08 1.068900e-07
AFFX-r2-Bs-lys-3_at 1.101316e-17 2.225752e-17
AFFX-r2-Bs-lys-5_at 3.508264e-27 1.746063e-27
AFFX-r2-Bs-lys-M_at 1.291064e-21 1.064751e-21
AFFX-r2-Bs-phe-3_at 5.069334e-23 1.557718e-22
AFFX-r2-Bs-phe-5_at 2.727829e-19 8.052990e-19
AFFX-r2-Bs-phe-M_at 6.153183e-12 5.725746e-12
AFFX-r2-Bs-thr-3_s_at2.039883e-20 5.979694e-20
AFFX-r2-Bs-thr-5_s_at9.929500e-17 1.425241e-16
AFFX-r2-Bs-thr-M_s_at2.017511e-20 4.427426e-19
AFFX-r2-Ec-bioB-3_at 5.074861e-01 5.188246e-01
AFFX-r2-Ec-bioB-5_at 4.782735e-01 4.845650e-01
AFFX-r2-Ec-bioB-M_at 4.750756e-01 5.299473e-01
AFFX-r2-Ec-bioC-3_at 4.848563e-01 5.102748e-01
AFFX-r2-Ec-bioC-5_at 4.846223e-01 5.212988e-01
AFFX-r2-Ec-bioD-3_at 4.914816e-01 5.154334e-01
AFFX-r2-Ec-bioD-5_at 4.877219e-01 5.165479e-01
AFFX-r2-P1-cre-3_at 4.936170e-01 5.089681e-01
AFFX-r2-P1-cre-5_at 4.963582e-01 5.091534e-01
AFFX-ThrX-3_at 1.478307e-11 1.845208e-11
AFFX-ThrX-5_at 3.374282e-01 3.317903e-01
AFFX-ThrX-M_at 2.961146e-21 1.959100e-21
AFFX-TrpnX-3_at 4.753222e-01 5.152153e-01
AFFX-TrpnX-5_at 4.894762e-01 4.862144e-01
AFFX-TrpnX-M_at 4.999410e-01 5.024540e-01
down_N1vsH1<-pumaRes[pumaRes$PPLR_N1vsH1<=0.2,1] up_N1vsH1<-pumaRes[pumaRes$PPLR_N1vsH1>=0.8,1] down_N2vsH2<-pumaRes[pumaRes$PPLR_N2vsH2<=0.2,1] up_N2vsH2<-pumaRes[pumaRes$PPLR_N2vsH2>=0.8,1] downDE<-data.frame(match(down_N1vsH1,down_N2vsH2)) downDE<-downDE[!is.na(downDE)] upDE<-data.frame(match(up_N1vsH1,up_N2vsH2)) upDE<-upDE[!is.na(upDE)]
DE<-data.frame(match(downDE,upDE)) DE<-DE[!is.na(DE)] length(DE)
928
head(pumaDE_FC)
XNormal.1_vs_Hypoxia.1Hypoxia.2_vs_Hypoxia.1Normal.2_vs_Normal.1Normal.2_vs_Hypoxia.2Condition_Hypoxia_vs_NormalSample_1_vs_2Int__Condition_Hypoxia.Normal_vs_Sample_1.2
1007_s_at 0.0008676691 0.0003108934-0.0008958805-0.0003391048-2.642822e-04 2.924935e-04-6.033869e-04
1053_at 0.0011140228-0.0007086750 0.0002749431 0.0020976409-1.605832e-03 2.168660e-04 4.918090e-04
117_at -0.0125714683-0.0032412178-0.0011199112-0.0104501617 1.151082e-02 2.180565e-03 1.060653e-03
121_at -0.0002734692-0.0001124929-0.0002032746-0.0003642509 3.188601e-04 1.578838e-04-4.539084e-05
1255_g_at -0.0001363636-0.0001479422 0.0001367911 0.0001483697-6.003047e-06 5.575531e-06 1.423666e-04
1294_at -0.0018486011 0.0019059397 0.0020474700-0.0017070709 1.777836e-03-1.976705e-03 7.076513e-05
geneProbes<-as.character(pumaDE_FC$X) annotated_list<-select(hgu133plus2.db,geneProbes,c("SYMBOL","GENENAME")) DEGenes=annotated_list[pumaRes[DE,1],] DEGenes dim(DEGenes)
'select()' returned 1:many mapping between keys and columns
PROBEIDSYMBOLGENENAME
11007_s_at DDR1 discoidin domain receptor tyrosine kinase 1
21007_s_at MIR4640 microRNA 4640
31053_at RFC2 replication factor C (activator 1) 2, 40kDa
5121_at PAX8 paired box 8
61255_g_at GUCA1A guanylate cyclase activator 1A (retina)
71294_at UBA7 ubiquitin-like modifier activating enzyme 7
81294_at MIR5193 microRNA 5193
91316_at THRA thyroid hormone receptor, alpha
101320_at PTPN21 protein tyrosine phosphatase, non-receptor type 21
111405_i_at CCL5 chemokine (C-C motif) ligand 5
121431_at CYP2E1 cytochrome P450, family 2, subfamily E, polypeptide 1
131438_at EPHB3 EPH receptor B3
141487_at ESRRA estrogen-related receptor alpha
151494_f_at CYP2A6 cytochrome P450, family 2, subfamily A, polypeptide 6
161552256_a_at SCARB1 scavenger receptor class B, member 1
181552258_at LINC00152 long intergenic non-protein coding RNA 152
191552261_at WFDC2 WAP four-disulfide core domain 2
201552263_at MAPK1 mitogen-activated protein kinase 1
211552264_a_at MAPK1 mitogen-activated protein kinase 1
221552266_at ADAM32 ADAM metallopeptidase domain 32
241552271_at PRR22 proline rich 22
251552272_a_at PRR22 proline rich 22
261552274_at PXK PX domain containing serine/threonine kinase
271552275_s_at PXK PX domain containing serine/threonine kinase
281552276_a_at VPS18 vacuolar protein sorting 18 homolog (S. cerevisiae)
291552277_a_at MSANTD3 Myb/SANT-like DNA-binding domain containing 3
301552278_a_at SLC46A1 solute carrier family 46 (folate transporter), member 1
311552279_a_at SLC46A1 solute carrier family 46 (folate transporter), member 1
321552280_at TIMD4 T-cell immunoglobulin and mucin domain containing 4
331552281_at SLC39A5 solute carrier family 39 (zinc transporter), member 5
8991553462_at NA NA
9001553464_at FLJ40288 uncharacterized FLJ40288
9011553465_a_at CES5A carboxylesterase 5A
9021553466_at CFAP47 cilia and flagella associated protein 47
9031553467_at DCAF8L2 DDB1 and CUL4 associated factor 8-like 2
9041553467_at FLJ32742 uncharacterized locus FLJ32742
9051553467_at LOC101928481 uncharacterized LOC101928481
9061553468_at HYDIN HYDIN, axonemal central pair apparatus protein
9071553468_at HYDIN2 HYDIN2, axonemal central pair apparatus protein (pseudogene)
9081553470_at DNAH17 dynein, axonemal, heavy chain 17
9091553471_at SLC35G3 solute carrier family 35, member G3
9101553472_at LOC150596 uncharacterized LOC150596
9111553474_at LOC100288966 POTE ankyrin domain family member D-like
9121553475_at NA NA
9131553478_at KIRREL3-AS3 KIRREL3 antisense RNA 3
9141553479_at TMEM145 transmembrane protein 145
9151553482_at C15orf32 chromosome 15 open reading frame 32
9161553483_at TSGA10IP testis specific, 10 interacting protein
9171553484_at LINC00477 long intergenic non-protein coding RNA 477
9181553485_at CCDC140 coiled-coil domain containing 140
9191553486_a_at C17orf78 chromosome 17 open reading frame 78
9201553488_at TEKT5 tektin 5
9211553489_a_at TEKT5 tektin 5
9221553491_at KSR2 kinase suppressor of ras 2
9231553492_a_at PAX1 paired box 1
9241553493_a_at TDH L-threonine dehydrogenase (pseudogene)
9251553494_at TDH L-threonine dehydrogenase (pseudogene)
9261553497_at LINC00615 long intergenic non-protein coding RNA 615
9271553498_at NA NA
9281553499_s_at SERPINA9 serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 9
  1. 928
  2. 3
group<-factor(c("Normal","Normal","Hypoxia","Hypoxia")) design<-model.matrix(~0+group) colnames(design)<-c("Normal","Hypoxia") contrast.matrix_rma<- makeContrasts(Normal,Hypoxia,levels=design) design contrast.matrix_rma fitrma<-lmFit(eset_rma,design) fit2rma<-contrasts.fit(fit,contrasts=contrast.matrix_rma) fit3rma<-eBayes(fit2rma) topDEGenes_rma<-topTable(fit3, coef=1, adjust="BH", n=100, lfc=1) topDEGenes_rma dim(topDEGenes_rma)
NormalHypoxia
101
201
310
410
NormalHypoxia
Normal10
Hypoxia01
logFCAveExprtP.Valueadj.P.ValB
200958_s_at13.16920 13.11567 124.3121 4.891234e-072.954137e-055.351285
204006_s_at12.87494 12.61830 123.9542 4.936824e-072.954137e-055.350019
200748_s_at13.33368 13.34257 122.0398 5.190456e-072.954137e-055.343069
211919_s_at12.99841 12.80683 121.8356 5.218509e-072.954137e-055.342310
211742_s_at12.62507 12.33153 121.0488 5.328459e-072.954137e-055.339351
1555745_a_at12.85586 12.53620 121.0256 5.331751e-072.954137e-055.339263
AFFX-HSAC07/X00351_3_at12.84239 12.95069 120.9210 5.346613e-072.954137e-055.338865
AFFX-hum_alu_at13.11747 13.12849 120.9017 5.349360e-072.954137e-055.338792
212560_at12.62891 12.38139 120.4412 5.415465e-072.954137e-055.337029
217028_at13.12695 12.97323 120.2231 5.447149e-072.954137e-055.336188
208980_s_at12.53417 12.55008 120.1028 5.464738e-072.954137e-055.335722
202727_s_at12.53732 12.55374 119.9873 5.481680e-072.954137e-055.335273
202917_s_at13.44072 13.46929 119.8412 5.503220e-072.954137e-055.334704
200801_x_at12.79446 12.86409 119.7931 5.510328e-072.954137e-055.334517
200668_s_at12.48924 12.47371 119.5855 5.541183e-072.954137e-055.333703
201368_at12.71576 12.42925 119.5836 5.541462e-072.954137e-055.333696
212587_s_at12.65005 12.59432 119.5826 5.541612e-072.954137e-055.333692
202388_at12.95929 12.60095 119.3217 5.580716e-072.954137e-055.332665
200704_at12.84004 12.89974 119.2973 5.584377e-072.954137e-055.332569
200794_x_at12.66494 12.71327 119.1220 5.610873e-072.954137e-055.331874
215952_s_at12.73163 12.62020 118.9827 5.632039e-072.954137e-055.331321
209201_x_at12.96794 12.76079 118.9804 5.632398e-072.954137e-055.331311
AFFX-HSAC07/X00351_5_at12.36732 12.55121 118.8585 5.651011e-072.954137e-055.330826
201721_s_at13.02248 13.02379 118.8531 5.651844e-072.954137e-055.330804
224765_at12.35501 11.72387 118.6302 5.686084e-072.954137e-055.329912
228754_at12.34079 12.06019 118.5799 5.693850e-072.954137e-055.329710
AFFX-HSAC07/X00351_M_at12.53094 12.68500 118.5455 5.699171e-072.954137e-055.329572
208679_s_at12.82208 12.76857 118.4661 5.711483e-072.954137e-055.329252
204122_at12.78061 12.81913 118.3847 5.724132e-072.954137e-055.328923
211296_x_at12.73171 12.73269 118.3529 5.729075e-072.954137e-055.328795
7112.16112 11.66354 115.4135 6.212097e-072.954137e-055.316500
7212.60173 12.61777 115.4121 6.212343e-072.954137e-055.316493
7312.82313 13.02037 115.2926 6.233099e-072.954137e-055.315975
7412.35619 11.88091 115.2806 6.235184e-072.954137e-055.315923
7512.01287 11.82985 115.2349 6.243133e-072.954137e-055.315725
7612.08655 11.76656 115.1775 6.253161e-072.954137e-055.315475
7712.08300 11.74270 115.1515 6.257710e-072.954137e-055.315361
7813.05271 13.04967 115.0145 6.281731e-072.954137e-055.314763
7912.08619 11.89104 114.9890 6.286219e-072.954137e-055.314652
8012.63099 12.73537 114.9662 6.290222e-072.954137e-055.314552
8112.04174 12.15283 114.9463 6.293735e-072.954137e-055.314465
8212.28123 12.27198 114.9459 6.293805e-072.954137e-055.314463
8312.36272 12.52576 114.9123 6.299725e-072.954137e-055.314316
8412.09863 11.61412 114.8484 6.311019e-072.954137e-055.314035
8512.11993 12.16425 114.7207 6.333660e-072.954137e-055.313474
8612.14712 11.62236 114.6764 6.341531e-072.954137e-055.313279
8712.16596 12.04887 114.6760 6.341598e-072.954137e-055.313277
8812.07946 12.06161 114.6299 6.349816e-072.954137e-055.313074
8911.99899 12.09922 114.6182 6.351897e-072.954137e-055.313022
9011.92110 11.82842 114.6063 6.354027e-072.954137e-055.312970
9112.19644 12.19601 114.5656 6.361297e-072.954137e-055.312790
9212.22030 12.22629 114.5647 6.361456e-072.954137e-055.312786
9312.28066 12.24067 114.5480 6.364439e-072.954137e-055.312712
9412.13352 12.16809 114.5474 6.364545e-072.954137e-055.312709
9512.04900 11.96460 114.5259 6.368395e-072.954137e-055.312614
9612.27975 12.26686 114.4355 6.384589e-072.954137e-055.312214
9712.19763 12.27570 114.4078 6.389574e-072.954137e-055.312091
9812.86782 13.04416 114.4035 6.390343e-072.954137e-055.312072
9912.04219 11.50384 114.3701 6.396356e-072.954137e-055.311924
10012.72957 12.72944 114.3418 6.401459e-072.954137e-055.311798
  1. 100
  2. 6
hist(fit3rma$p.value)
Image in a Jupyter notebook
results_rma<-decideTests(fit3rma, method="global",lfc=1) vennDiagram(results_rma)
Image in a Jupyter notebook

Step 6:

PCA is a mathematical algorithm that reduces the dimensionality of the data while retaining most of the variation in the data set. It does so by identifying directions, called principal components, along which the variation in the data is maximal. PCA plots check whether the overall variability of the samples reflect their groupings.

pca_hypoxia <- prcomp(t(e_rma)) plot(pca_hypoxia$x, xlab="Component 1", ylab="Component 2", pch=unclass(as.factor(pData(eset_rma)[,1])), col=unclass(as.factor(pData(eset_rma)[,2])), main="Standard PCA") groups<-paste(eset_rma$Sample, eset_rma$Condition, sep =" ") legend(0,0,groups,pch=unclass(as.factor(pData(eset_rma)[,1])) , col=unclass(as.factor(pData(eset_rma)[,2])))
Image in a Jupyter notebook
pumapca_hypoxia=pumaPCA(eset_puma_normd) plot(pumapca_hypoxia)
Iteration number: 1 Iteration number: 2 Iteration number: 3 Iteration number: 4 Iteration number: 5
Image in a Jupyter notebook

The two PCA plots show that the gene expressions of the neutrophils under the two conditions do vary. This is clear on the plots as the points for the two hypoxia samples are on right side of the plot, whereas the two normal samples are found on the left of the plot. It is unclear whether there is a clear difference between the gene expressions of the two sample groups themselves; in order to observe a clear difference between samples, more conditions are required, such as different levels of hypoxia.

Step 7:

Heat maps and clustering are often used in gene expression analysis studies to visualise the data and for quality control. It is a graphical representation of the data where the individual values in the matrix are represented as colours. They compares the level of gene expression of a number of samples, allowing for immediate visualisation of the data by assigning different colours to each gene, and it is possible to see clusters of genes with similar or hugely different expression values.

library(gplots) tID<-rownames(topDEGenes) ind<-1 j<-1 for (i in 1: length(tID)) { ind[j]<-which(rownames(eset_rma)==tID[i],arr.ind=TRUE) j<-j+1 } topExpr<-e_rma[ind,] heatmap.2(topExpr, col=redgreen(75), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.8)
Attaching package: ‘gplots’ The following object is masked from ‘package:IRanges’: space The following object is masked from ‘package:stats’: lowess
Image in a Jupyter notebook
library(gplots) tID<-rownames(topDEGenes) ind<-1 j<-1 for (i in 1: length(tID)) { ind[j]<-which(rownames(eset_puma)==tID[i],arr.ind=TRUE) j<-j+1 } topExpr<-e_puma[ind,] heatmap.2(topExpr, col=redgreen(75), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.8)
Image in a Jupyter notebook

The heatmaps above are generated from eset_rma and eset_puma data, both showing similar patterns of gene expression, as indicated by the colours. From both methods, it can be deduced that a lot of genes that are expressed in samples order normal conditions are not expressed in samples under hypoxia, confirming that hypoxia has an effect on neutrophil gene expression.

Step 8:

This step involves the functional/ pathway analysis of differentially expressed targets using PANTHER or DAVID. DAVID is the online Database for Annotation, Visualization and Integrated Discovery, which can be used to convert a list of gene IDs. PANTHER (Protein ANalysis THrough Evolutionary Relationships) can be used to classify proteins and identify the key pathways involved in the difference in gene expression observed. PANTHER is used in this project to identify the key pathways in regulating gene expression in neutrophils under hypoxia and normal conditions.

setwd("~/Autumn2016/ProjectC/data_projectC")
GeneList<-read.table("pantherGeneList.txt", fill=TRUE) GeneList
WARNING: Some output was deleted.
pantherChart<-read.table("pantherChart.txt", fill=TRUE) pantherChart
V1V2V3V4V5V6V7V8V9V10
1 Axon guidance mediated by Slit/Robo (P00008) 1 1.1% 1.6%
2 JAK/STAT signaling pathway (P00038) 1 1.1% 1.6%
3 Axon guidance mediated by semaphorins (P00007) 1 1.1% 1.6%
4 p38 MAPK pathway (P05918) 1 1.1% 1.6%
5 Interleukin signaling pathway (P00036) 4 4.5% 6.3%
6 Angiogenesis (P00005) 1 1.1% 1.6%
7 Interferon-gamma signaling pathway (P00035) 1 1.1% 1.6%
8 Alzheimer disease-presenilin pathway (P00004) 2 2.3% 3.1%
9 Integrin signalling pathway (P00034) 5 5.7% 7.8%
10 Inflammation mediated by chemokine and cytokine signaling pathway (P00031)
9 10.2% 14.1%
11 Ubiquitin proteasome pathway (P00060) 1 1.1% 1.6%
12 Angiotensin II-stimulated signaling through G proteins and beta-arrestin (P05911)
1 1.1% 1.6%
13 Endothelin signaling pathway (P00019) 1 1.1% 1.6%
14 EGF receptor signaling pathway (P00018) 1 1.1% 1.6%
15 Gonadotropin-releasinghormone receptor pathway (P06664) 2 2.3% 3.1%
16 DNA replication (P00017) 2 2.3% 3.1%
17 PDGF signaling pathway (P00047) 3 3.4% 4.7%
18 Cytoskeletal regulation by Rho GTPase (P00016) 3 3.4% 4.7%
19 Oxidative stress response (P00046) 1 1.1% 1.6%
20 Ras Pathway (P04393) 1 1.1% 1.6%
21 Nicotinic acetylcholine receptor signaling pathway (P00044) 1 1.1% 1.6%
22 Cadherin signaling pathway (P00012) 2 2.3% 3.1%
23 Blood coagulation (P00011) 1 1.1% 1.6%
24 B cell activation (P00010) 2 2.3% 3.1%
25 CCKR signaling map (P06959) 4 4.5% 6.3%
26 Huntington disease (P00029) 3 3.4% 4.7%
27 Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated
pathway (P00027) 2 2.3% 3.1%
28 Wnt signaling pathway (P00057) 1 1.1% 1.6%
29 Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated
pathway (P00026) 1 1.1% 1.6%
30 Glycolysis (P00024) 1 1.1% 1.6%
31 Toll receptor signaling pathway (P00054) 1 1.1% 1.6%
32 T cell activation (P00053) 2 2.3% 3.1%
33 FGF signaling pathway (P00021) 1 1.1% 1.6%

Pathway analysis using PANTHER

From PANTHER, the gene list and the pathways they work in have been identified, with the piechart showing the percentage of genes that are present in each pathway. The most prominent pathway in the effects of hypoxia on human neutrophils is identified as the Inflammation mediated by chemokine and cytokine signaling pathway.

Discussion

In this project, the aim was to stimate gene expression levels, and analyse the results to identify the genes that are changing between the two conditions of normal and hypoxia, defining the potential pathways that hypoxia may have altered in neutrophils. The methods of RMA and MAS5 were used, and first diagnostics performed in order to identify the suitable method to continue with. RMA was chosen as no further normalisation was required. The PUMA package was also used, and the data combined using a Bayesian Hierarchical model, further analysis was done in order to obtain the fold change in gene expression. Limma was used for Differential Expression Analysis, and the p-value calculated. The data was visualised using PCA, indicating that there is a clear difference between the gene expression of neutrophils under normal, or hypoxia conditions. This was further supported by the heatmaps generated.

Through the use of PANTHER, it was possible to identify the key pathways that are regulating the effects of hypoxia on human neutrophils - The Inflammation mediated by chemikine and cytokine signaling pathway.

References

https://www.bioconductor.org/packages/devel/bioc/vignettes/puma/inst/doc/puma.pdf https://www.bioconductor.org/packages/devel/bioc/manuals/puma/man/puma.pdf https://www.bioconductor.org/packages/release/bioc/vignettes/affy/inst/doc/affy.pdf http://svitsrv25.epfl.ch/R-doc/library/Biobase/html/00Index.html#P http://bioinfo.cipf.es/babelomicstutorial/maplot http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/Help/3 Visualisation/3.2 Figures and Graphs/3.2.13 The MA Plot.html https://www.biostars.org/p/101727/ http://www.nature.com/ng/journal/v32/n4s/pdf/ng1032.pdf http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-103 http://www.nature.com/nbt/journal/v26/n3/pdf/nbt0308-303.pdf http://wiki.bits.vib.be/index.php/Analyze_your_own_microarray_data_in_R/Bioconductor#MA_plots http://arrayanalysis.org/main.html