Project: Jingyi Xie - Autumn2016/BMS353

Path: Autumn2016 / ProjectC / mda14jx Final Project - Effects of Hypoxia on human Neutrophils.ipynb

Views: ⁴⁰³¹

Kernel: SageMath (stable)

In [1]:

options(jupyter.plot_mimetypes ='image/png')

THE EFFECT OF HYPOXIA ON HUMAN NEUTROPHILS

In this study, the effects of Hypoxia on human Neutrophils were investigated in order to identify the possible involvement of inflammatory response in adverse prognosis of hypoxia-related disease, such as pulmonary hypertension and myocardial infarction. Primary cultures of human neutrophils were studied in both normal and hypoxia conditions. A gene expression profile of the neutrophils in both conditions were done after centain amounts of time in culture, and quantified using Affymetrix GeneChip HGU133 PLUS 2. The study was conducted on two separate samples.

This report aims to estimate gene expression levels, and analyse the results to identify the genes that are changing between the two conditions, defining the potential pathways that hypoxia may have altered in neutrophils.

Data Analysis

Workflow

Step 1: Load packages with data from Bioconductor, library(affy) - mas5, rma, library(puma)

Step 2: Load and read data, create affybatch. Annotate with pData.

Step 3: Analysis of gene expression data with different methods and normalisation techniques.

Create eset
Extract gene expression
First diagnostic using density() and boxplot()
Normalisation by log2 if required

Step 4: Diagnostics of the data with plotting techniques

MAPlot
boxplot

Step 5: Differential Expression Analysis

For puma, combine the data using an bayesian Hierarchical model
Check the dimension and the pData() for the eset of the combined values. Calculate the FC and plot the data with a MA plot using the command ma.plot()
MAPlot
use of limma for DE analysis. Remember the three core steps of limma

Step 1: build the design contrast matrix
Step 2: fit the linear model
Step 3: calculate the p-values and FDRs with a empirical Bayes test

Step 6: Visualisation of Data with PCA

perform PCA in R using the command prcomp()
It needs the traspose command t() since the input for the prcomp() wants the genes in the columns
For probabilistic PCA you can use pumaPCA()

Step 7: Hierarchical clustering of DE (Differentially Expressed) genes

To perform this we need to activate a library called gplots. We will use the command heatmap.2().
We do clustering a the selected genes from our DE analysis this is to search for patterns in of differentially regulatend pathways.

Step 8: Functional/Pathway analysis of DE targets using PANTHER or DAVID

Step 1:

In [2]:

library(affy)

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unlist, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

This step loads the affy package, which is part of the BioConductor project, allowing for data analysis and exploration of Affymetrix oligonucleotide array probe level data. It summarises the probe set intensities, forming one expression measure (data available for analysis) for each gene. The package includes plotting functions for the probe level data useful for quality control, making it useful in the initial analysis of the data, it includes plotting functions for the data that can be useful for quality control of data, RNA degradation assessments, normaliasation and background correction procedures. It also allows for probe level data to be converted to expression measures. In this project, MAS 5.0 and RMA are used for perform the analysis.

Step 2:

Set working directory

In [3]:

setwd("~/Autumn2016/ProjectC/data_projectC")

In [4]:

getwd()

'/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC'

In order to load the data that is required, a working directory must be set, leading to where the data is saved.

In [5]:

hypoxia_filenames <- c("LPGMa.CEL","LPGMb.CEL","LPHa.CEL","LPHb.CEL")
affybatch.hypoxia <- ReadAffy(filenames=hypoxia_filenames)

The files that contain the data are saved in the .CEL format, indicating the files contain measured intensities and locations for an array that has been hybridised.

In [6]:

show(affybatch.hypoxia)

Warning message:
“replacing previous import ‘AnnotationDbi::tail’ by ‘utils::tail’ when loading ‘hgu133plus2cdf’”Warning message:
“replacing previous import ‘AnnotationDbi::head’ by ‘utils::head’ when loading ‘hgu133plus2cdf’”

AffyBatch object
size of arrays=1164x1164 features (18 kb)
cdf=HG-U133_Plus_2 (54675 affyids)
number of samples=4
number of genes=54675
annotation=hgu133plus2
notes=

The data shows the size of the array is 1154x1164 (18kb), cdf maps each gene that is in the array (54675 genes), and there are 4 samples.

In [7]:

phenoData(affybatch.hypoxia)
pData(affybatch.hypoxia)

An object of class 'AnnotatedDataFrame'
  sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL
  varLabels: sample
  varMetadata: labelDescription

	sample
LPGMa.CEL	1
LPGMb.CEL	2
LPHa.CEL	3
LPHb.CEL	4

pData retrieves information on experimental phenotypes that are recorded.

In [8]:

pData(affybatch.hypoxia)<- data.frame(
    "Condition"=c("Normal", "Normal", "Hypoxia", "Hypoxia"), 
    "Sample"=c("1", "2", "1", "2"), 
    row.names=rownames(pData(affybatch.hypoxia)))
pData(affybatch.hypoxia)

	Condition	Sample
LPGMa.CEL	Normal	1
LPGMb.CEL	Normal	2
LPHa.CEL	Hypoxia	1
LPHb.CEL	Hypoxia	2

Step 3:

This step involves the analysis of gene expression data with different methods and normalisation techniques. The methods convert the probe level data to expression values, which is achieved through:

Reading in probe level data
Background correction
Normalization
Probe specific background correction
Summarising the probe set values into one expression measure

RMA and MAS 5.0 creates two different types of ExpressionSets, from which the gene expression values will be extracted.

In [9]:

eset_rma<-rma(affybatch.hypoxia)
show(eset_rma)

Background correcting
Normalizing
Calculating Expression
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 4 samples 
  element names: exprs 
protocolData
  sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL
  varLabels: Condition Sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133plus2 

In [10]:

eset_mas5<-mas5(affybatch.hypoxia)
show(eset_mas5)

background correction: mas 
PM/MM correction : mas 
expression values: mas 
background correcting...done.
54675 ids to be processed
|                    |
|####################|
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 4 samples 
  element names: exprs, se.exprs 
protocolData
  sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL
  varLabels: Condition Sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133plus2 

In [11]:

e_rma<-exprs(eset_rma)
head(e_rma)

	LPGMa.CEL	LPGMb.CEL	LPHa.CEL	LPHb.CEL
1007_s_at	6.359706	6.378816	6.399439	6.320410
1053_at	7.488801	7.490398	7.339287	7.136260
117_at	10.880260	10.738160	11.348179	11.316371
121_at	7.359978	7.344536	7.452264	7.372993
1255_g_at	2.471620	2.503735	2.670402	2.575980
1294_at	6.208576	6.346277	6.331029	6.551821

In [12]:

e_mas5<-exprs(eset_mas5)
head(e_mas5)

	LPGMa.CEL	LPGMb.CEL	LPHa.CEL	LPHb.CEL
1007_s_at	428.46053	362.13667	437.20195	404.803198
1053_at	604.36568	540.84643	549.49801	592.917234
117_at	7722.59667	7774.66079	11815.21462	11637.983040
121_at	461.12158	379.06718	470.19313	480.864624
1255_g_at	41.02802	48.25334	72.33834	4.947423
1294_at	246.59488	251.62389	293.92585	322.333187

In [13]:

density(e_rma)
density(e_mas5)

Call:
	density.default(x = e_rma)

Data: e_rma (218700 obs.);	Bandwidth 'bw' = 0.1711

       x                y            
 Min.   : 1.276   Min.   :8.300e-07  
 1st Qu.: 4.654   1st Qu.:1.038e-02  
 Median : 8.031   Median :4.186e-02  
 Mean   : 8.031   Mean   :7.396e-02  
 3rd Qu.:11.408   3rd Qu.:1.444e-01  
 Max.   :14.785   Max.   :2.408e-01  

Call:
	density.default(x = e_mas5)

Data: e_mas5 (218700 obs.);	Bandwidth 'bw' = 14.33

       x                  y            
 Min.   :  -42.87   Min.   :0.000e+00  
 1st Qu.:16716.38   1st Qu.:2.380e-07  
 Median :33475.62   Median :7.530e-07  
 Mean   :33475.62   Mean   :5.338e-05  
 3rd Qu.:50234.86   3rd Qu.:3.687e-06  
 Max.   :66994.10   Max.   :1.014e-02  

In [14]:

par(mfrow=c(1,1))
plot(density(e_rma[,1]),col="red", main="RMA Estimation")
lines(density(e_rma[,2]),col="blue")
lines(density(e_rma[,3]),col="green")
lines(density(e_rma[,4]),col="purple")
plot(density(e_mas5[,1]),col="red",main="Mas5 Estimation")
lines(density(e_mas5[,2]),col="blue")
lines(density(e_mas5[,3]),col="green")
lines(density(e_mas5[,4]),col="purple")

In [15]:

par(mfrow=c(1,1))
boxplot((e_rma), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using rma")
boxplot((e_mas5), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using mas5")

In [16]:

log2e_mas5<-log2(e_mas5)
head(log2e_mas5)

	LPGMa.CEL	LPGMb.CEL	LPHa.CEL	LPHb.CEL
1007_s_at	8.743019	8.500390	8.772156	8.661077
1053_at	9.239278	9.079075	9.101970	9.211687
117_at	12.914870	12.924564	13.528358	13.506553
121_at	8.849003	8.566310	8.877110	8.909487
1255_g_at	5.358538	5.592557	6.176689	2.306677
1294_at	7.945999	7.975125	8.199308	8.332409

In [17]:

density(log2e_mas5)

Call:
	density.default(x = log2e_mas5)

Data: log2e_mas5 (218700 obs.);	Bandwidth 'bw' = 0.2168

       x                y            
 Min.   :-3.749   Min.   :1.800e-07  
 1st Qu.: 1.358   1st Qu.:4.223e-03  
 Median : 6.466   Median :3.180e-02  
 Mean   : 6.466   Mean   :4.890e-02  
 3rd Qu.:11.574   3rd Qu.:8.519e-02  
 Max.   :16.681   Max.   :1.705e-01  

In [18]:

par(mfrow=c(1,1))
plot(density(log2e_mas5[,1]),col="red",main="Mas5 Estimation - normalised")
lines(density(log2e_mas5[,2]),col="blue")
lines(density(log2e_mas5[,3]),col="green")
lines(density(log2e_mas5[,4]),col="purple")
boxplot((log2e_mas5), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using mas5 - normalised")

Expression values for mas5 and rma are extracted and the first diagnostics performed on the data, using density() and boxplot(). Initial mas5 estimation showed the data was difficult to read due to the large values of the outliers, therefore a log2 transformation was performed, to change the scale and make the plots more readable. The transformation also eliminated much of the negative values. No transformation or normalisation was required however, as the medians are aligned, with no negative outliers. Therefore, further analysis is continued with the use of rma extracted expressions.

In [19]:

require(puma)

Loading required package: puma
Loading required package: oligo
Loading required package: oligoClasses
Welcome to oligoClasses version 1.32.0

Attaching package: ‘oligoClasses’

The following object is masked from ‘package:affy’:

    list.celfiles

Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: IRanges
Loading required package: XVector
================================================================================
Welcome to oligo version 1.34.2
================================================================================

Attaching package: ‘oligo’

The following objects are masked from ‘package:affy’:

    intensity, MAplot, mm, mm<-, mmindex, pm, pm<-, pmindex,
    probeNames, rma

Loading required package: mclust
Package 'mclust' version 5.2
Type 'citation("mclust")' for citing this R package in publications.

puma (Propagating Uncertainty in Microarray Analysis) is another bioconductor package. Microarrays measure the expression level of thousands of genes simultaneously, therefore there are many significant soutces of uncertainties associated with it; these uncertainties must be considered to accurately infer from the data. Earlier methods used (mas5 and rma) only provide single point estimates that summarises the target concentration. By using probabilistic models such as puma for probe-level analysis, it is possible to associate gene expression levels with credibility intervals that quantify the measurement uncentainty associated with the estimate of target concentration with a sample. puma performs analysis through:

Calculation of expression levels and confidence measures for those levels from raw .CEL data
Combine uncertainty information from replicated arrays
Determine differential expression between conditions, or between more complex contrasts such as interaction terms
Cluster data taking the expression level uncertainty into account
Perform a noise-propagation version of principal compinent analysis (PCA)

In [20]:

eset_puma<-mmgmos(affybatch.hypoxia)
show(eset_puma)

Model optimising ..............................................................................................................
Expression values calculating ..............................................................................................................
Done.
Expression Set (exprReslt) with 
	54675 genes
	4 samples
	An object of class 'AnnotatedDataFrame'
  sampleNames: LPGMa.CEL LPGMb.CEL LPHa.CEL LPHb.CEL
  varLabels: Condition Sample
  varMetadata: labelDescription

In [21]:

eset_puma_normd <-pumaNormalize(eset_puma)

In [22]:

e_puma<-exprs(eset_puma)
head(e_puma)

	LPGMa.CEL	LPGMb.CEL	LPHa.CEL	LPHb.CEL
1007_s_at	5.902316	5.6103537	5.6174054	5.737430
1053_at	6.626212	6.6790712	6.3627049	6.180460
117_at	10.170737	10.1217316	10.6852717	10.553831
121_at	5.211771	4.8356632	5.5315978	5.400501
1255_g_at	-1.389828	0.2041968	0.2107774	-1.592713
1294_at	5.458750	5.9264062	5.8934533	6.153923

In [23]:

density(e_puma)

Call:
	density.default(x = e_puma)

Data: e_puma (218700 obs.);	Bandwidth 'bw' = 0.2729

       x                 y            
 Min.   :-35.061   Min.   :0.000e+00  
 1st Qu.:-22.699   1st Qu.:2.170e-06  
 Median :-10.336   Median :2.474e-05  
 Mean   :-10.336   Mean   :2.020e-02  
 3rd Qu.:  2.026   3rd Qu.:2.640e-02  
 Max.   : 14.388   Max.   :1.145e-01  

In [24]:

e_puma_normd<-exprs(eset_puma_normd) 
head(e_puma_normd)

	LPGMa.CEL	LPGMb.CEL	LPHa.CEL	LPHb.CEL
1007_s_at	5.902316	5.6103537	5.6174054	5.737430
1053_at	6.626212	6.6790712	6.3627049	6.180460
117_at	10.170737	10.1217316	10.6852717	10.553831
121_at	5.211771	4.8356632	5.5315978	5.400501
1255_g_at	-1.389828	0.2041968	0.2107774	-1.592713
1294_at	5.458750	5.9264062	5.8934533	6.153923

In [25]:

density(e_puma_normd)

Call:
	density.default(x = e_puma_normd)

Data: e_puma_normd (218700 obs.);	Bandwidth 'bw' = 0.2729

       x                 y            
 Min.   :-35.061   Min.   :0.000e+00  
 1st Qu.:-22.699   1st Qu.:2.170e-06  
 Median :-10.336   Median :2.474e-05  
 Mean   :-10.336   Mean   :2.020e-02  
 3rd Qu.:  2.026   3rd Qu.:2.640e-02  
 Max.   : 14.388   Max.   :1.145e-01  

After performing pumaNormalize() on the data, the first diagnostic tests showed that there is no difference to the data prior to normalisation, therefore indicating that the pumadata is already normalised.

In [26]:

plot(density(e_puma[,1]),col="red", main="PUMA Estimation")
lines(density(e_puma[,2]),col="blue")
lines(density(e_puma[,3]),col="green")
lines(density(e_puma[,4]),col="purple")

In [27]:

boxplot((e_puma), xlab="Neutrophil samples", ylab="Gene Expression", main="Boxplot of gene expression extracted using puma")

Although the data is shown to be normalized, and the medians are aligned, it can also be seem from the boxplot that there is a large number of negative outliers, therefore the negative gene expression values are set to zero, to further normalise the data.

In [28]:

for (i in 1:4) {
    y<-e_puma[,i]
    y[y<0] <-0
    e_puma[,i] <- y
}
head(e_puma)

	LPGMa.CEL	LPGMb.CEL	LPHa.CEL	LPHb.CEL
1007_s_at	5.902316	5.6103537	5.6174054	5.737430
1053_at	6.626212	6.6790712	6.3627049	6.180460
117_at	10.170737	10.1217316	10.6852717	10.553831
121_at	5.211771	4.8356632	5.5315978	5.400501
1255_g_at	0.000000	0.2041968	0.2107774	0.000000
1294_at	5.458750	5.9264062	5.8934533	6.153923

In [29]:

density(e_puma)

Call:
	density.default(x = e_puma)

Data: e_puma (218700 obs.);	Bandwidth 'bw' = 0.2384

       x                 y            
 Min.   :-0.7153   Min.   :0.0000002  
 1st Qu.: 3.0348   1st Qu.:0.0172600  
 Median : 6.7849   Median :0.0533883  
 Mean   : 6.7849   Mean   :0.0665816  
 3rd Qu.:10.5351   3rd Qu.:0.0965064  
 Max.   :14.2852   Max.   :0.4313578  

In [30]:

plot(density(e_puma[,1]), col="red", main="PUMA Estimation")
lines(density(e_puma[,2]),col="green")
lines(density(e_puma[,3]),col="blue")
lines(density(e_puma[,4]),col="purple")

In [31]:

boxplot(e_puma,main="Boxplot of gene expression extracted using puma - normalised", xlab="Neutrophil Samples", ylab="gene expression")

Boxplots show the differences in probe intensity behaviour between arrays. Boxplots are useful in the visualisation of data for first diagnostics, ensuring all the samples are comparable. Box plots show are able to illustrate:

Median
Upper Quartile
Lower Quartile
Range
Individual extreme values (Outliers)

The boxplots above show that gene expression extracted using rma does not need to be normalised as the medians are aligned, and no negative outliers. The mas5 boxplot showed the data must be log2 transformed in order for comparison to be possible. For puma, the results needed to be normalised due to the high number of negative outliers present, although the medians are aligned.

All three analysis techniques showed a similar range of values following normalisation.

In [32]:

par(mfrow=c(2,2))
MAplot(e_rma)

In [33]:

par(mfrow=c(2,2))
MAplot(e_puma)

In MA plots, each Affymetrix marray is compared to a pseudo-array, which consist of the median intensity of each probe over all arrays, the plot shows to what extent the variability in expression depends on the expression level. M is the difference between the intensity of a probe on the array and the median intensity of that probe over all arrays A is the average intensity of a probe on that array and the median intensity of that probe over all arrays.

The cloud of data points in the MA plot is centered around M=0, based on the assumption that the majority of the genes are not differentially expressed, an the number of upregulated genes is similar to the number of downregulated genes.

From the MA plots above, it can be deduced that there appears to be a greater number of downregulated genes in neutrophils under hypoxia conditions than in normal conditions.

Step 5:

In [36]:

eset_puma_comb<- pumaCombImproved(eset_puma_normd)

pumaComb expected completion time is 3 hours 
.......20%.......40%.......60%.......80%......100%
..................................................

In [65]:

save(eset_puma_comb, file="eset_pumacomb.RDA")

In [34]:

load("eset_pumacomb.RDA")
ls()

'affybatch.hypoxia'
'e_mas5'
'e_puma'
'e_puma_normd'
'e_rma'
'eset_mas5'
'eset_puma'
'eset_puma_comb'
'eset_puma_normd'
'eset_rma'
'hypoxia_filenames'
'i'
'log2e_mas5'
'y'

In [35]:


show(eset_puma_comb)

ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 4 samples 
  element names: exprs, se.exprs 
protocolData: none
phenoData
  sampleNames: Hypoxia.1 Normal.1 Hypoxia.2 Normal.2
  varLabels: Condition Sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:  

In [36]:

pData(eset_puma_comb)

	Condition	Sample
Hypoxia.1	Hypoxia	1
Normal.1	Normal	1
Hypoxia.2	Hypoxia	2
Normal.2	Normal	2

In [37]:

dim(eset_puma_comb)

Features: 54675
Samples: 4

In [38]:

hypoxia_comb_puma<-exprs(eset_puma_comb)
for(i in 1:4) {
    temp<-hypoxia_comb_puma[,i]
    temp[temp<0] <-0
    hypoxia_comb_puma[,i]<- temp
}

In [39]:

FC_puma<- hypoxia_comb_puma[,1:2] - hypoxia_comb_puma[,3:4]
colnames(FC_puma) <- c("Hypoxia-Normal 1","Hypoxia-Normal 2")
head(FC_puma)

	Hypoxia-Normal 1	Hypoxia-Normal 2
1007_s_at	-0.0003167666	0.0009128018
1053_at	0.0007166274	-0.0002780249
117_at	0.0032436276	0.0011207505
121_at	0.0001154092	0.0002085455
1255_g_at	0.0000000000	0.0000000000
1294_at	-0.0019379157	-0.0020818635

In [40]:

MAplot(FC_puma)

Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
“Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
“Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”

In [41]:

FC_rma<- e_rma[,1:2] - e_rma[,3:4]
colnames(FC_rma) <- c("Hypoxia-Normal 1","Hypoxia-Normal 2")
head(FC_rma)

	Hypoxia-Normal 1	Hypoxia-Normal 2
1007_s_at	-0.03973322	0.05840595
1053_at	0.14951411	0.35413809
117_at	-0.46791889	-0.57821119
121_at	-0.09228612	-0.02845722
1255_g_at	-0.19878260	-0.07224494
1294_at	-0.12245352	-0.20554369

In [42]:

MAplot(FC_rma)

In [43]:

groups<-c("H1","N1","H2","N2")
hypoxia_table<-data.frame(sampleNames(eset_puma_comb),groups)
group1<-factor(groups[1:2])
group2<-factor(groups[3:4])

group1
group2

In [44]:

hypoxia_table

sampleNames.eset_puma_comb.	groups
Hypoxia.1	H1
Normal.1	N1
Hypoxia.2	H2
Normal.2	N2

In [45]:

par(mfrow=c(2,2))
MAplot(eset_puma_comb)

Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
“Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
“Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
“Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”Warning message in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
“Binning grid too coarse for current (small) bandwidth: consider increasing 'gridsize'”

In [46]:

library(limma)
group<-factor(c("Normal","Normal","Hypoxia","Hypoxia"))
design<-model.matrix(~0+group)
colnames(design)<-c("Normal","Hypoxia")
contrast.matrix_puma<- makeContrasts(Normal,Hypoxia,levels=design)
design
contrast.matrix_puma

fit<-lmFit(eset_puma,design)
fit2<-contrasts.fit(fit,contrast.matrix_puma)
fit3<-eBayes(fit2)

topDEGenes<-topTable(fit3, coef=1, adjust="BH", n=100, lfc=1)
topDEGenes

Attaching package: ‘limma’

The following object is masked from ‘package:oligo’:

    backgroundCorrect

The following object is masked from ‘package:BiocGenerics’:

    plotMA

	Normal	Hypoxia
1	0	1
2	0	1
3	1	0
4	1	0

	Normal	Hypoxia
Normal	1	0
Hypoxia	0	1

	logFC	AveExpr	t	P.Value	adj.P.Val	B
200958_s_at	13.16920	13.11567	124.3121	4.891234e-07	2.954137e-05	5.351285
204006_s_at	12.87494	12.61830	123.9542	4.936824e-07	2.954137e-05	5.350019
200748_s_at	13.33368	13.34257	122.0398	5.190456e-07	2.954137e-05	5.343069
211919_s_at	12.99841	12.80683	121.8356	5.218509e-07	2.954137e-05	5.342310
211742_s_at	12.62507	12.33153	121.0488	5.328459e-07	2.954137e-05	5.339351
1555745_a_at	12.85586	12.53620	121.0256	5.331751e-07	2.954137e-05	5.339263
AFFX-HSAC07/X00351_3_at	12.84239	12.95069	120.9210	5.346613e-07	2.954137e-05	5.338865
AFFX-hum_alu_at	13.11747	13.12849	120.9017	5.349360e-07	2.954137e-05	5.338792
212560_at	12.62891	12.38139	120.4412	5.415465e-07	2.954137e-05	5.337029
217028_at	13.12695	12.97323	120.2231	5.447149e-07	2.954137e-05	5.336188
208980_s_at	12.53417	12.55008	120.1028	5.464738e-07	2.954137e-05	5.335722
202727_s_at	12.53732	12.55374	119.9873	5.481680e-07	2.954137e-05	5.335273
202917_s_at	13.44072	13.46929	119.8412	5.503220e-07	2.954137e-05	5.334704
200801_x_at	12.79446	12.86409	119.7931	5.510328e-07	2.954137e-05	5.334517
200668_s_at	12.48924	12.47371	119.5855	5.541183e-07	2.954137e-05	5.333703
201368_at	12.71576	12.42925	119.5836	5.541462e-07	2.954137e-05	5.333696
212587_s_at	12.65005	12.59432	119.5826	5.541612e-07	2.954137e-05	5.333692
202388_at	12.95929	12.60095	119.3217	5.580716e-07	2.954137e-05	5.332665
200704_at	12.84004	12.89974	119.2973	5.584377e-07	2.954137e-05	5.332569
200794_x_at	12.66494	12.71327	119.1220	5.610873e-07	2.954137e-05	5.331874
215952_s_at	12.73163	12.62020	118.9827	5.632039e-07	2.954137e-05	5.331321
209201_x_at	12.96794	12.76079	118.9804	5.632398e-07	2.954137e-05	5.331311
AFFX-HSAC07/X00351_5_at	12.36732	12.55121	118.8585	5.651011e-07	2.954137e-05	5.330826
201721_s_at	13.02248	13.02379	118.8531	5.651844e-07	2.954137e-05	5.330804
224765_at	12.35501	11.72387	118.6302	5.686084e-07	2.954137e-05	5.329912
228754_at	12.34079	12.06019	118.5799	5.693850e-07	2.954137e-05	5.329710
AFFX-HSAC07/X00351_M_at	12.53094	12.68500	118.5455	5.699171e-07	2.954137e-05	5.329572
208679_s_at	12.82208	12.76857	118.4661	5.711483e-07	2.954137e-05	5.329252
204122_at	12.78061	12.81913	118.3847	5.724132e-07	2.954137e-05	5.328923
211296_x_at	12.73171	12.73269	118.3529	5.729075e-07	2.954137e-05	5.328795
⋮	⋮	⋮	⋮	⋮	⋮	⋮
71	12.16112	11.66354	115.4135	6.212097e-07	2.954137e-05	5.316500
72	12.60173	12.61777	115.4121	6.212343e-07	2.954137e-05	5.316493
73	12.82313	13.02037	115.2926	6.233099e-07	2.954137e-05	5.315975
74	12.35619	11.88091	115.2806	6.235184e-07	2.954137e-05	5.315923
75	12.01287	11.82985	115.2349	6.243133e-07	2.954137e-05	5.315725
76	12.08655	11.76656	115.1775	6.253161e-07	2.954137e-05	5.315475
77	12.08300	11.74270	115.1515	6.257710e-07	2.954137e-05	5.315361
78	13.05271	13.04967	115.0145	6.281731e-07	2.954137e-05	5.314763
79	12.08619	11.89104	114.9890	6.286219e-07	2.954137e-05	5.314652
80	12.63099	12.73537	114.9662	6.290222e-07	2.954137e-05	5.314552
81	12.04174	12.15283	114.9463	6.293735e-07	2.954137e-05	5.314465
82	12.28123	12.27198	114.9459	6.293805e-07	2.954137e-05	5.314463
83	12.36272	12.52576	114.9123	6.299725e-07	2.954137e-05	5.314316
84	12.09863	11.61412	114.8484	6.311019e-07	2.954137e-05	5.314035
85	12.11993	12.16425	114.7207	6.333660e-07	2.954137e-05	5.313474
86	12.14712	11.62236	114.6764	6.341531e-07	2.954137e-05	5.313279
87	12.16596	12.04887	114.6760	6.341598e-07	2.954137e-05	5.313277
88	12.07946	12.06161	114.6299	6.349816e-07	2.954137e-05	5.313074
89	11.99899	12.09922	114.6182	6.351897e-07	2.954137e-05	5.313022
90	11.92110	11.82842	114.6063	6.354027e-07	2.954137e-05	5.312970
91	12.19644	12.19601	114.5656	6.361297e-07	2.954137e-05	5.312790
92	12.22030	12.22629	114.5647	6.361456e-07	2.954137e-05	5.312786
93	12.28066	12.24067	114.5480	6.364439e-07	2.954137e-05	5.312712
94	12.13352	12.16809	114.5474	6.364545e-07	2.954137e-05	5.312709
95	12.04900	11.96460	114.5259	6.368395e-07	2.954137e-05	5.312614
96	12.27975	12.26686	114.4355	6.384589e-07	2.954137e-05	5.312214
97	12.19763	12.27570	114.4078	6.389574e-07	2.954137e-05	5.312091
98	12.86782	13.04416	114.4035	6.390343e-07	2.954137e-05	5.312072
99	12.04219	11.50384	114.3701	6.396356e-07	2.954137e-05	5.311924
100	12.72957	12.72944	114.3418	6.401459e-07	2.954137e-05	5.311798

Limma is a package for differential expression analysis of data arising from microarray experiments. A linear model is fit to the expression data for each gene. Empirical Beyes (a shrinkage method) is used to borrow information across genes making the analyses stable. Linear models are used to analyse designed microarray experiments, allowing for very general experiments to be analysed easily. Two matrices need to be specified. The design matrix provides a representation of the different RNA targets which have been hybridized to the arrays. The contrast matrix allows the coefficients designed by the design matrix to be combined into contrasts of interest. Each contrast corresponds to a comparison of interest between the RNA targets.

In [47]:

results_puma<-decideTests(fit3, method="global",lfc=1) 
vennDiagram(results_puma)

The Venn Diagram shows that 3761 genes and 3390 genes were expressed in only normal and only hypoxia conditions, respectively. 35170 genes were expressed in both normal and hypoxia conditions.

In [48]:

hist(fit3$p.value)

In [49]:

dim(topDEGenes)

In [50]:

rownames(topDEGenes)

'200958_s_at'
'204006_s_at'
'200748_s_at'
'211919_s_at'
'211742_s_at'
'1555745_a_at'
'AFFX-HSAC07/X00351_3_at'
'AFFX-hum_alu_at'
'212560_at'
'217028_at'
'208980_s_at'
'202727_s_at'
'202917_s_at'
'200801_x_at'
'200668_s_at'
'201368_at'
'212587_s_at'
'202388_at'
'200704_at'
'200794_x_at'
'215952_s_at'
'209201_x_at'
'AFFX-HSAC07/X00351_5_at'
'201721_s_at'
'224765_at'
'228754_at'
'AFFX-HSAC07/X00351_M_at'
'208679_s_at'
'204122_at'
'211296_x_at'
'202391_at'
'207238_s_at'
'211940_x_at'
'208763_s_at'
'225414_at'
'203535_at'
'207008_at'
'211911_x_at'
'1555756_a_at'
'201858_s_at'
'204774_at'
'1553588_at'
'213828_x_at'
'228846_at'
'209732_at'
'210774_s_at'
'211997_x_at'
'224373_s_at'
'216231_s_at'
'208616_s_at'
'204959_at'
'213702_x_at'
'202902_s_at'
'208718_at'
'220990_s_at'
'225364_at'
'216438_s_at'
'201210_at'
'200774_at'
'218614_at'
'224761_at'
'226810_at'
'208783_s_at'
'200059_s_at'
'208788_at'
'224583_at'
'205922_at'
'AFFX-HUMGAPDH/M33197_3_at'
'232617_at'
'218205_s_at'
'226979_at'
'211676_s_at'
'211956_s_at'
'200921_s_at'
'209083_at'
'203509_at'
'221059_s_at'
'212788_x_at'
'207988_s_at'
'200706_s_at'
'202833_s_at'
'204563_at'
'201779_s_at'
'200920_s_at'
'200729_s_at'
'209112_at'
'217983_s_at'
'224372_at'
'208736_at'
'224451_x_at'
'1553570_x_at'
'217967_s_at'
'204351_at'
'209933_s_at'
'201862_s_at'
'200904_at'
'205568_at'
'211506_s_at'
'213241_at'
'209949_at'

In [51]:

write.table(rownames(topDEGenes),"/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC.txt")

In [52]:

pumaDERes<-pumaDE(eset_puma_comb)
pumaDERes

DEResult object:
  DEMethod = pumaDE
  statisticDescription = Probability of Positive Log Ratio (PPLR)
  statistic = 54675 probesets x 7 contrasts

In [53]:

getwd()

'/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC'

In [54]:

write.reslts(pumaDERes, file="pumaDERes")

In [55]:

library(hgu133plus2.db)
library(annotate)

geneProbes<-as.character(rownames(topDEGenes))
annotated_list<-select(hgu133plus2.db, geneProbes,c("SYMBOL","GENENAME"))
annotated_list

Loading required package: AnnotationDbi
Loading required package: org.Hs.eg.db
Loading required package: DBI


Loading required package: XML
'select()' returned 1:many mapping between keys and columns

PROBEID	SYMBOL	GENENAME
200958_s_at	SDCBP	syndecan binding protein (syntenin)
204006_s_at	FCGR3B	Fc fragment of IgG, low affinity IIIb, receptor (CD16b)
204006_s_at	FCGR3A	Fc fragment of IgG, low affinity IIIa, receptor (CD16a)
200748_s_at	FTH1	ferritin, heavy polypeptide 1
211919_s_at	CXCR4	chemokine (C-X-C motif) receptor 4
211742_s_at	EVI2B	ecotropic viral integration site 2B
1555745_a_at	LYZ	lysozyme
AFFX-HSAC07/X00351_3_at	ACTB	actin, beta
AFFX-hum_alu_at	NA	NA
212560_at	SORL1	sortilin-related receptor, L(DLR class) A repeats containing
217028_at	CXCR4	chemokine (C-X-C motif) receptor 4
208980_s_at	UBC	ubiquitin C
202727_s_at	IFNGR1	interferon gamma receptor 1
202917_s_at	S100A8	S100 calcium binding protein A8
200801_x_at	ACTB	actin, beta
200668_s_at	UBE2D3	ubiquitin-conjugating enzyme E2D 3
201368_at	ZFP36L2	ZFP36 ring finger protein-like 2
212587_s_at	PTPRC	protein tyrosine phosphatase, receptor type, C
202388_at	RGS2	regulator of G-protein signaling 2
200704_at	LITAF	lipopolysaccharide-induced TNF factor
200794_x_at	DAZAP2	DAZ associated protein 2
215952_s_at	OAZ1	ornithine decarboxylase antizyme 1
209201_x_at	CXCR4	chemokine (C-X-C motif) receptor 4
AFFX-HSAC07/X00351_5_at	ACTB	actin, beta
201721_s_at	LAPTM5	lysosomal protein transmembrane 5
224765_at	MSL1	male-specific lethal 1 homolog (Drosophila)
228754_at	SLC6A6	solute carrier family 6 (neurotransmitter transporter), member 6
AFFX-HSAC07/X00351_M_at	ACTB	actin, beta
208679_s_at	ARPC2	actin related protein 2/3 complex, subunit 2, 34kDa
204122_at	TYROBP	TYRO protein tyrosine kinase binding protein
⋮	⋮	⋮
211676_s_at	IFNGR1	interferon gamma receptor 1
211956_s_at	EIF1	eukaryotic translation initiation factor 1
200921_s_at	BTG1	B-cell translocation gene 1, anti-proliferative
209083_at	CORO1A	coronin, actin binding protein, 1A
203509_at	SORL1	sortilin-related receptor, L(DLR class) A repeats containing
221059_s_at	CHST6	carbohydrate (N-acetylglucosamine 6-O) sulfotransferase 6
221059_s_at	COTL1	coactosin-like F-actin binding protein 1
212788_x_at	FTL	ferritin, light polypeptide
207988_s_at	ARPC2	actin related protein 2/3 complex, subunit 2, 34kDa
200706_s_at	LITAF	lipopolysaccharide-induced TNF factor
202833_s_at	SERPINA1	serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1
204563_at	SELL	selectin L
201779_s_at	RNF13	ring finger protein 13
200920_s_at	BTG1	B-cell translocation gene 1, anti-proliferative
200729_s_at	ACTR2	ARP2 actin-related protein 2 homolog (yeast)
209112_at	CDKN1B	cyclin-dependent kinase inhibitor 1B (p27, Kip1)
217983_s_at	RNASET2	ribonuclease T2
224372_at	ND4	NADH dehydrogenase, subunit 4 (complex I)
208736_at	ARPC3	actin related protein 2/3 complex, subunit 3, 21kDa
224451_x_at	ARHGAP9	Rho GTPase activating protein 9
1553570_x_at	COX2	cytochrome c oxidase subunit II
217967_s_at	FAM129A	family with sequence similarity 129, member A
204351_at	S100P	S100 calcium binding protein P
209933_s_at	CD300A	CD300a molecule
201862_s_at	LRRFIP1	leucine rich repeat (in FLII) interacting protein 1
200904_at	HLA-E	major histocompatibility complex, class I, E
205568_at	AQP9	aquaporin 9
211506_s_at	CXCL8	chemokine (C-X-C motif) ligand 8
213241_at	PLXNC1	plexin C1
209949_at	NCF2	neutrophil cytosolic factor 2

In [56]:

annotated_list[,2]

'SDCBP'
'FCGR3B'
'FCGR3A'
'FTH1'
'CXCR4'
'EVI2B'
'LYZ'
'ACTB'
'NA'
'SORL1'
'CXCR4'
'UBC'
'IFNGR1'
'S100A8'
'ACTB'
'UBE2D3'
'ZFP36L2'
'PTPRC'
'RGS2'
'LITAF'
'DAZAP2'
'OAZ1'
'CXCR4'
'ACTB'
'LAPTM5'
'MSL1'
'SLC6A6'
'ACTB'
'ARPC2'
'TYROBP'
'UBC'
'BASP1'
'PTPRC'
'H3F3A'
'H3F3B'
'H3F3AP4'
'TSC22D3'
'RNF149'
'S100A9'
'CXCR2'
'HLA-B'
'CLEC7A'
'SRGN'
'EVI2A'
'ND3'
'SH3KBP1'
'H3F3A'
'H3F3B'
'H3F3AP4'
'MXD1'
'CLEC2B'
'NCOA4'
'H3F3B'
'H3F3A'
'MIR4738'
'ND4'
'B2M'
'PTP4A2'
'MNDA'
'ASAH1'
'CTSS'
'DDX17'
'VMP1'
'MIR21'
'STK4'
'TMSB4X'
'DDX3X'
'FAM120A'
'KIAA1551'
'GNA13'
'OGFRL1'
'CD46'
'RHOA'
'ELOVL5'
'COTL1'
'VNN2'
'GAPDH'
'CTSS'
'MKNK2'
'MAP3K2'
'IFNGR1'
'EIF1'
'BTG1'
'CORO1A'
'SORL1'
'CHST6'
'COTL1'
'FTL'
'ARPC2'
'LITAF'
'SERPINA1'
'SELL'
'RNF13'
'BTG1'
'ACTR2'
'CDKN1B'
'RNASET2'
'ND4'
'ARPC3'
'ARHGAP9'
'COX2'
'FAM129A'
'S100P'
'CD300A'
'LRRFIP1'
'HLA-E'
'AQP9'
'CXCL8'
'PLXNC1'
'NCF2'

In [57]:

write.table(annotated_list[,2],"/projects/ddda6a8e-2bca-47f5-b1d6-79b2c48d0e30/Autumn2016/ProjectC/data_projectC/SYMBOL.txt")

In [58]:

dir()

'DEGenesSYMBOL.txt'
'eset_puma.RDA'
'eset_pumacomb.RDA'
'LPGMa.CEL'
'LPGMb.CEL'
'LPHa.CEL'
'LPHb.CEL'
'PANTHER_Pathway.png'
'pantherChart.txt'
'pantherGeneList.txt'
'pumaDERes_FCs.csv'
'pumaDERes_statistics.csv'
'SYMBOL.txt'

In [59]:

pumaDE_stat<-read.csv("pumaDERes_statistics.csv")
pumaDE_FC<-read.csv("pumaDERes_FCs.csv")

In [60]:

head(pumaDE_stat)

X	Normal.1_vs_Hypoxia.1	Hypoxia.2_vs_Hypoxia.1	Normal.2_vs_Normal.1	Normal.2_vs_Hypoxia.2	Condition_Hypoxia_vs_Normal	Sample_1_vs_2	Int__Condition_Hypoxia.Normal_vs_Sample_1.2
1007_s_at	0.5113750	0.5040763	0.4882552	0.4955539	0.4975498	0.5027118	0.4944060
1053_at	0.5164213	0.4895519	0.5040539	0.5308981	0.4832624	0.5022611	0.5051275
117_at	0.4082059	0.4761349	0.4917497	0.4234843	0.5597413	0.5113582	0.5055253
121_at	0.4965747	0.4985910	0.4974539	0.4954377	0.5028241	0.5013983	0.4995980
1255_g_at	0.4965998	0.4963111	0.5034108	0.5036996	0.4998941	0.5000983	0.5025101
1294_at	0.4807838	0.5198114	0.5212816	0.4822544	0.5130704	0.4854682	0.5005205

In [61]:

probeid<-pumaDE_stat[,1]
PPLR_N1vsH1<-pumaDE_stat[,2]
PPLR_N2vsH2<-pumaDE_stat[,5]


pumaRes<-data.frame(probeid,PPLR_N1vsH1,PPLR_N2vsH2)
pumaRes

probeid	PPLR_N1vsH1	PPLR_N2vsH2
1007_s_at	0.5113750	0.4955539
1053_at	0.5164213	0.5308981
117_at	0.4082059	0.4234843
121_at	0.4965747	0.4954377
1255_g_at	0.4965998	0.5036996
1294_at	0.4807838	0.4822544
1316_at	0.5027632	0.4971939
1320_at	0.4974770	0.5011232
1405_i_at	0.4949343	0.4911106
1431_at	0.4815598	0.5103977
1438_at	0.5009881	0.5005768
1487_at	0.5521465	0.5394558
1494_f_at	0.5036250	0.4874812
1552256_a_at	0.4859761	0.5094893
1552257_a_at	0.5269255	0.5426862
1552258_at	0.5296864	0.5360022
1552261_at	0.5024123	0.4974071
1552263_at	0.5002512	0.5120329
1552264_a_at	0.5087401	0.5142300
1552266_at	0.4986212	0.5178322
1552269_at	0.5007328	0.5008457
1552271_at	0.5004264	0.5022765
1552272_a_at	0.4978715	0.4888625
1552274_at	0.9742525	0.9869562
1552275_s_at	0.9382355	0.9633200
1552276_a_at	0.5002337	0.5004873
1552277_a_at	0.5154551	0.5125756
1552278_a_at	0.4979116	0.4996785
1552279_a_at	0.4968687	0.4988454
1552280_at	0.5000570	0.4988962
⋮	⋮	⋮
AFFX-PheX-3_at	6.383827e-20	2.273321e-19
AFFX-PheX-5_at	4.565683e-10	4.361088e-10
AFFX-PheX-M_at	1.048477e-09	1.116457e-09
AFFX-r2-Bs-dap-3_at	9.091292e-10	4.329772e-10
AFFX-r2-Bs-dap-5_at	9.638324e-14	1.339950e-13
AFFX-r2-Bs-dap-M_at	8.537546e-08	1.068900e-07
AFFX-r2-Bs-lys-3_at	1.101316e-17	2.225752e-17
AFFX-r2-Bs-lys-5_at	3.508264e-27	1.746063e-27
AFFX-r2-Bs-lys-M_at	1.291064e-21	1.064751e-21
AFFX-r2-Bs-phe-3_at	5.069334e-23	1.557718e-22
AFFX-r2-Bs-phe-5_at	2.727829e-19	8.052990e-19
AFFX-r2-Bs-phe-M_at	6.153183e-12	5.725746e-12
AFFX-r2-Bs-thr-3_s_at	2.039883e-20	5.979694e-20
AFFX-r2-Bs-thr-5_s_at	9.929500e-17	1.425241e-16
AFFX-r2-Bs-thr-M_s_at	2.017511e-20	4.427426e-19
AFFX-r2-Ec-bioB-3_at	5.074861e-01	5.188246e-01
AFFX-r2-Ec-bioB-5_at	4.782735e-01	4.845650e-01
AFFX-r2-Ec-bioB-M_at	4.750756e-01	5.299473e-01
AFFX-r2-Ec-bioC-3_at	4.848563e-01	5.102748e-01
AFFX-r2-Ec-bioC-5_at	4.846223e-01	5.212988e-01
AFFX-r2-Ec-bioD-3_at	4.914816e-01	5.154334e-01
AFFX-r2-Ec-bioD-5_at	4.877219e-01	5.165479e-01
AFFX-r2-P1-cre-3_at	4.936170e-01	5.089681e-01
AFFX-r2-P1-cre-5_at	4.963582e-01	5.091534e-01
AFFX-ThrX-3_at	1.478307e-11	1.845208e-11
AFFX-ThrX-5_at	3.374282e-01	3.317903e-01
AFFX-ThrX-M_at	2.961146e-21	1.959100e-21
AFFX-TrpnX-3_at	4.753222e-01	5.152153e-01
AFFX-TrpnX-5_at	4.894762e-01	4.862144e-01
AFFX-TrpnX-M_at	4.999410e-01	5.024540e-01

In [62]:


down_N1vsH1<-pumaRes[pumaRes$PPLR_N1vsH1<=0.2,1]
up_N1vsH1<-pumaRes[pumaRes$PPLR_N1vsH1>=0.8,1]

down_N2vsH2<-pumaRes[pumaRes$PPLR_N2vsH2<=0.2,1]
up_N2vsH2<-pumaRes[pumaRes$PPLR_N2vsH2>=0.8,1]



downDE<-data.frame(match(down_N1vsH1,down_N2vsH2))
downDE<-downDE[!is.na(downDE)]
upDE<-data.frame(match(up_N1vsH1,up_N2vsH2))
upDE<-upDE[!is.na(upDE)]

In [63]:

DE<-data.frame(match(downDE,upDE))
DE<-DE[!is.na(DE)]
length(DE)

928

In [64]:

head(pumaDE_FC)

X	Normal.1_vs_Hypoxia.1	Hypoxia.2_vs_Hypoxia.1	Normal.2_vs_Normal.1	Normal.2_vs_Hypoxia.2	Condition_Hypoxia_vs_Normal	Sample_1_vs_2	Int__Condition_Hypoxia.Normal_vs_Sample_1.2
1007_s_at	0.0008676691	0.0003108934	-0.0008958805	-0.0003391048	-2.642822e-04	2.924935e-04	-6.033869e-04
1053_at	0.0011140228	-0.0007086750	0.0002749431	0.0020976409	-1.605832e-03	2.168660e-04	4.918090e-04
117_at	-0.0125714683	-0.0032412178	-0.0011199112	-0.0104501617	1.151082e-02	2.180565e-03	1.060653e-03
121_at	-0.0002734692	-0.0001124929	-0.0002032746	-0.0003642509	3.188601e-04	1.578838e-04	-4.539084e-05
1255_g_at	-0.0001363636	-0.0001479422	0.0001367911	0.0001483697	-6.003047e-06	5.575531e-06	1.423666e-04
1294_at	-0.0018486011	0.0019059397	0.0020474700	-0.0017070709	1.777836e-03	-1.976705e-03	7.076513e-05

In [65]:

geneProbes<-as.character(pumaDE_FC$X)
annotated_list<-select(hgu133plus2.db,geneProbes,c("SYMBOL","GENENAME"))
DEGenes=annotated_list[pumaRes[DE,1],]
DEGenes
dim(DEGenes)

'select()' returned 1:many mapping between keys and columns

	PROBEID	SYMBOL	GENENAME
1	1007_s_at	DDR1	discoidin domain receptor tyrosine kinase 1
2	1007_s_at	MIR4640	microRNA 4640
3	1053_at	RFC2	replication factor C (activator 1) 2, 40kDa
5	121_at	PAX8	paired box 8
6	1255_g_at	GUCA1A	guanylate cyclase activator 1A (retina)
7	1294_at	UBA7	ubiquitin-like modifier activating enzyme 7
8	1294_at	MIR5193	microRNA 5193
9	1316_at	THRA	thyroid hormone receptor, alpha
10	1320_at	PTPN21	protein tyrosine phosphatase, non-receptor type 21
11	1405_i_at	CCL5	chemokine (C-C motif) ligand 5
12	1431_at	CYP2E1	cytochrome P450, family 2, subfamily E, polypeptide 1
13	1438_at	EPHB3	EPH receptor B3
14	1487_at	ESRRA	estrogen-related receptor alpha
15	1494_f_at	CYP2A6	cytochrome P450, family 2, subfamily A, polypeptide 6
16	1552256_a_at	SCARB1	scavenger receptor class B, member 1
18	1552258_at	LINC00152	long intergenic non-protein coding RNA 152
19	1552261_at	WFDC2	WAP four-disulfide core domain 2
20	1552263_at	MAPK1	mitogen-activated protein kinase 1
21	1552264_a_at	MAPK1	mitogen-activated protein kinase 1
22	1552266_at	ADAM32	ADAM metallopeptidase domain 32
24	1552271_at	PRR22	proline rich 22
25	1552272_a_at	PRR22	proline rich 22
26	1552274_at	PXK	PX domain containing serine/threonine kinase
27	1552275_s_at	PXK	PX domain containing serine/threonine kinase
28	1552276_a_at	VPS18	vacuolar protein sorting 18 homolog (S. cerevisiae)
29	1552277_a_at	MSANTD3	Myb/SANT-like DNA-binding domain containing 3
30	1552278_a_at	SLC46A1	solute carrier family 46 (folate transporter), member 1
31	1552279_a_at	SLC46A1	solute carrier family 46 (folate transporter), member 1
32	1552280_at	TIMD4	T-cell immunoglobulin and mucin domain containing 4
33	1552281_at	SLC39A5	solute carrier family 39 (zinc transporter), member 5
⋮	⋮	⋮	⋮
899	1553462_at	NA	NA
900	1553464_at	FLJ40288	uncharacterized FLJ40288
901	1553465_a_at	CES5A	carboxylesterase 5A
902	1553466_at	CFAP47	cilia and flagella associated protein 47
903	1553467_at	DCAF8L2	DDB1 and CUL4 associated factor 8-like 2
904	1553467_at	FLJ32742	uncharacterized locus FLJ32742
905	1553467_at	LOC101928481	uncharacterized LOC101928481
906	1553468_at	HYDIN	HYDIN, axonemal central pair apparatus protein
907	1553468_at	HYDIN2	HYDIN2, axonemal central pair apparatus protein (pseudogene)
908	1553470_at	DNAH17	dynein, axonemal, heavy chain 17
909	1553471_at	SLC35G3	solute carrier family 35, member G3
910	1553472_at	LOC150596	uncharacterized LOC150596
911	1553474_at	LOC100288966	POTE ankyrin domain family member D-like
912	1553475_at	NA	NA
913	1553478_at	KIRREL3-AS3	KIRREL3 antisense RNA 3
914	1553479_at	TMEM145	transmembrane protein 145
915	1553482_at	C15orf32	chromosome 15 open reading frame 32
916	1553483_at	TSGA10IP	testis specific, 10 interacting protein
917	1553484_at	LINC00477	long intergenic non-protein coding RNA 477
918	1553485_at	CCDC140	coiled-coil domain containing 140
919	1553486_a_at	C17orf78	chromosome 17 open reading frame 78
920	1553488_at	TEKT5	tektin 5
921	1553489_a_at	TEKT5	tektin 5
922	1553491_at	KSR2	kinase suppressor of ras 2
923	1553492_a_at	PAX1	paired box 1
924	1553493_a_at	TDH	L-threonine dehydrogenase (pseudogene)
925	1553494_at	TDH	L-threonine dehydrogenase (pseudogene)
926	1553497_at	LINC00615	long intergenic non-protein coding RNA 615
927	1553498_at	NA	NA
928	1553499_s_at	SERPINA9	serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 9

In [66]:


group<-factor(c("Normal","Normal","Hypoxia","Hypoxia"))
design<-model.matrix(~0+group)
colnames(design)<-c("Normal","Hypoxia")
contrast.matrix_rma<- makeContrasts(Normal,Hypoxia,levels=design)
design
contrast.matrix_rma

fitrma<-lmFit(eset_rma,design)
fit2rma<-contrasts.fit(fit,contrasts=contrast.matrix_rma)
fit3rma<-eBayes(fit2rma)

topDEGenes_rma<-topTable(fit3, coef=1, adjust="BH", n=100, lfc=1)
topDEGenes_rma

dim(topDEGenes_rma)

	Normal	Hypoxia
1	0	1
2	0	1
3	1	0
4	1	0

	Normal	Hypoxia
Normal	1	0
Hypoxia	0	1

	logFC	AveExpr	t	P.Value	adj.P.Val	B
200958_s_at	13.16920	13.11567	124.3121	4.891234e-07	2.954137e-05	5.351285
204006_s_at	12.87494	12.61830	123.9542	4.936824e-07	2.954137e-05	5.350019
200748_s_at	13.33368	13.34257	122.0398	5.190456e-07	2.954137e-05	5.343069
211919_s_at	12.99841	12.80683	121.8356	5.218509e-07	2.954137e-05	5.342310
211742_s_at	12.62507	12.33153	121.0488	5.328459e-07	2.954137e-05	5.339351
1555745_a_at	12.85586	12.53620	121.0256	5.331751e-07	2.954137e-05	5.339263
AFFX-HSAC07/X00351_3_at	12.84239	12.95069	120.9210	5.346613e-07	2.954137e-05	5.338865
AFFX-hum_alu_at	13.11747	13.12849	120.9017	5.349360e-07	2.954137e-05	5.338792
212560_at	12.62891	12.38139	120.4412	5.415465e-07	2.954137e-05	5.337029
217028_at	13.12695	12.97323	120.2231	5.447149e-07	2.954137e-05	5.336188
208980_s_at	12.53417	12.55008	120.1028	5.464738e-07	2.954137e-05	5.335722
202727_s_at	12.53732	12.55374	119.9873	5.481680e-07	2.954137e-05	5.335273
202917_s_at	13.44072	13.46929	119.8412	5.503220e-07	2.954137e-05	5.334704
200801_x_at	12.79446	12.86409	119.7931	5.510328e-07	2.954137e-05	5.334517
200668_s_at	12.48924	12.47371	119.5855	5.541183e-07	2.954137e-05	5.333703
201368_at	12.71576	12.42925	119.5836	5.541462e-07	2.954137e-05	5.333696
212587_s_at	12.65005	12.59432	119.5826	5.541612e-07	2.954137e-05	5.333692
202388_at	12.95929	12.60095	119.3217	5.580716e-07	2.954137e-05	5.332665
200704_at	12.84004	12.89974	119.2973	5.584377e-07	2.954137e-05	5.332569
200794_x_at	12.66494	12.71327	119.1220	5.610873e-07	2.954137e-05	5.331874
215952_s_at	12.73163	12.62020	118.9827	5.632039e-07	2.954137e-05	5.331321
209201_x_at	12.96794	12.76079	118.9804	5.632398e-07	2.954137e-05	5.331311
AFFX-HSAC07/X00351_5_at	12.36732	12.55121	118.8585	5.651011e-07	2.954137e-05	5.330826
201721_s_at	13.02248	13.02379	118.8531	5.651844e-07	2.954137e-05	5.330804
224765_at	12.35501	11.72387	118.6302	5.686084e-07	2.954137e-05	5.329912
228754_at	12.34079	12.06019	118.5799	5.693850e-07	2.954137e-05	5.329710
AFFX-HSAC07/X00351_M_at	12.53094	12.68500	118.5455	5.699171e-07	2.954137e-05	5.329572
208679_s_at	12.82208	12.76857	118.4661	5.711483e-07	2.954137e-05	5.329252
204122_at	12.78061	12.81913	118.3847	5.724132e-07	2.954137e-05	5.328923
211296_x_at	12.73171	12.73269	118.3529	5.729075e-07	2.954137e-05	5.328795
⋮	⋮	⋮	⋮	⋮	⋮	⋮
71	12.16112	11.66354	115.4135	6.212097e-07	2.954137e-05	5.316500
72	12.60173	12.61777	115.4121	6.212343e-07	2.954137e-05	5.316493
73	12.82313	13.02037	115.2926	6.233099e-07	2.954137e-05	5.315975
74	12.35619	11.88091	115.2806	6.235184e-07	2.954137e-05	5.315923
75	12.01287	11.82985	115.2349	6.243133e-07	2.954137e-05	5.315725
76	12.08655	11.76656	115.1775	6.253161e-07	2.954137e-05	5.315475
77	12.08300	11.74270	115.1515	6.257710e-07	2.954137e-05	5.315361
78	13.05271	13.04967	115.0145	6.281731e-07	2.954137e-05	5.314763
79	12.08619	11.89104	114.9890	6.286219e-07	2.954137e-05	5.314652
80	12.63099	12.73537	114.9662	6.290222e-07	2.954137e-05	5.314552
81	12.04174	12.15283	114.9463	6.293735e-07	2.954137e-05	5.314465
82	12.28123	12.27198	114.9459	6.293805e-07	2.954137e-05	5.314463
83	12.36272	12.52576	114.9123	6.299725e-07	2.954137e-05	5.314316
84	12.09863	11.61412	114.8484	6.311019e-07	2.954137e-05	5.314035
85	12.11993	12.16425	114.7207	6.333660e-07	2.954137e-05	5.313474
86	12.14712	11.62236	114.6764	6.341531e-07	2.954137e-05	5.313279
87	12.16596	12.04887	114.6760	6.341598e-07	2.954137e-05	5.313277
88	12.07946	12.06161	114.6299	6.349816e-07	2.954137e-05	5.313074
89	11.99899	12.09922	114.6182	6.351897e-07	2.954137e-05	5.313022
90	11.92110	11.82842	114.6063	6.354027e-07	2.954137e-05	5.312970
91	12.19644	12.19601	114.5656	6.361297e-07	2.954137e-05	5.312790
92	12.22030	12.22629	114.5647	6.361456e-07	2.954137e-05	5.312786
93	12.28066	12.24067	114.5480	6.364439e-07	2.954137e-05	5.312712
94	12.13352	12.16809	114.5474	6.364545e-07	2.954137e-05	5.312709
95	12.04900	11.96460	114.5259	6.368395e-07	2.954137e-05	5.312614
96	12.27975	12.26686	114.4355	6.384589e-07	2.954137e-05	5.312214
97	12.19763	12.27570	114.4078	6.389574e-07	2.954137e-05	5.312091
98	12.86782	13.04416	114.4035	6.390343e-07	2.954137e-05	5.312072
99	12.04219	11.50384	114.3701	6.396356e-07	2.954137e-05	5.311924
100	12.72957	12.72944	114.3418	6.401459e-07	2.954137e-05	5.311798

In [67]:

hist(fit3rma$p.value)

In [68]:

results_rma<-decideTests(fit3rma, method="global",lfc=1) 
vennDiagram(results_rma)

Step 6:

PCA is a mathematical algorithm that reduces the dimensionality of the data while retaining most of the variation in the data set. It does so by identifying directions, called principal components, along which the variation in the data is maximal. PCA plots check whether the overall variability of the samples reflect their groupings.

In [69]:

pca_hypoxia <- prcomp(t(e_rma)) 


plot(pca_hypoxia$x, xlab="Component 1", ylab="Component 2", 
     pch=unclass(as.factor(pData(eset_rma)[,1])), 
     col=unclass(as.factor(pData(eset_rma)[,2])), main="Standard PCA")

groups<-paste(eset_rma$Sample, eset_rma$Condition, sep =" ")

legend(0,0,groups,pch=unclass(as.factor(pData(eset_rma)[,1]))
, col=unclass(as.factor(pData(eset_rma)[,2])))

In [70]:

pumapca_hypoxia=pumaPCA(eset_puma_normd)
plot(pumapca_hypoxia)

Iteration number: 1
Iteration number: 2
Iteration number: 3
Iteration number: 4
Iteration number: 5

The two PCA plots show that the gene expressions of the neutrophils under the two conditions do vary. This is clear on the plots as the points for the two hypoxia samples are on right side of the plot, whereas the two normal samples are found on the left of the plot. It is unclear whether there is a clear difference between the gene expressions of the two sample groups themselves; in order to observe a clear difference between samples, more conditions are required, such as different levels of hypoxia.

Step 7:

Heat maps and clustering are often used in gene expression analysis studies to visualise the data and for quality control. It is a graphical representation of the data where the individual values in the matrix are represented as colours. They compares the level of gene expression of a number of samples, allowing for immediate visualisation of the data by assigning different colours to each gene, and it is possible to see clusters of genes with similar or hugely different expression values.

In [71]:

library(gplots)

tID<-rownames(topDEGenes)
ind<-1
j<-1
for (i in 1: length(tID)) {
	ind[j]<-which(rownames(eset_rma)==tID[i],arr.ind=TRUE)
	j<-j+1
}


topExpr<-e_rma[ind,]
heatmap.2(topExpr, col=redgreen(75), scale="row",
key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.8)

Attaching package: ‘gplots’

The following object is masked from ‘package:IRanges’:

    space

The following object is masked from ‘package:stats’:

    lowess

In [72]:

library(gplots)

tID<-rownames(topDEGenes)
ind<-1
j<-1
for (i in 1: length(tID)) {
	ind[j]<-which(rownames(eset_puma)==tID[i],arr.ind=TRUE)
	j<-j+1
}


topExpr<-e_puma[ind,]
heatmap.2(topExpr, col=redgreen(75), scale="row",
key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.8)

The heatmaps above are generated from eset_rma and eset_puma data, both showing similar patterns of gene expression, as indicated by the colours. From both methods, it can be deduced that a lot of genes that are expressed in samples order normal conditions are not expressed in samples under hypoxia, confirming that hypoxia has an effect on neutrophil gene expression.

Step 8:

This step involves the functional/ pathway analysis of differentially expressed targets using PANTHER or DAVID. DAVID is the online Database for Annotation, Visualization and Integrated Discovery, which can be used to convert a list of gene IDs. PANTHER (Protein ANalysis THrough Evolutionary Relationships) can be used to classify proteins and identify the key pathways involved in the difference in gene expression observed. PANTHER is used in this project to identify the key pathways in regulating gene expression in neutrophils under hypoxia and normal conditions.

In [73]:

setwd("~/Autumn2016/ProjectC/data_projectC")

In [82]:

GeneList<-read.table("pantherGeneList.txt", fill=TRUE)
GeneList

WARNING: Some output was deleted.

In [83]:

pantherChart<-read.table("pantherChart.txt", fill=TRUE)
pantherChart

V1	V2	V3	V4	V5	V6	V7	V8	V9	V10
1	Axon	guidance	mediated	by	Slit/Robo	(P00008)	1	1.1%	1.6%
2	JAK/STAT	signaling	pathway	(P00038)	1	1.1%	1.6%
3	Axon	guidance	mediated	by	semaphorins	(P00007)	1	1.1%	1.6%
4	p38	MAPK	pathway	(P05918)	1	1.1%	1.6%
5	Interleukin	signaling	pathway	(P00036)	4	4.5%	6.3%
6	Angiogenesis	(P00005)	1	1.1%	1.6%
7	Interferon-gamma	signaling	pathway	(P00035)	1	1.1%	1.6%
8	Alzheimer	disease-presenilin	pathway	(P00004)	2	2.3%	3.1%
9	Integrin	signalling	pathway	(P00034)	5	5.7%	7.8%
10	Inflammation	mediated	by	chemokine	and	cytokine	signaling	pathway	(P00031)
9	10.2%	14.1%
11	Ubiquitin	proteasome	pathway	(P00060)	1	1.1%	1.6%
12	Angiotensin	II-stimulated	signaling	through	G	proteins	and	beta-arrestin	(P05911)
1	1.1%	1.6%
13	Endothelin	signaling	pathway	(P00019)	1	1.1%	1.6%
14	EGF	receptor	signaling	pathway	(P00018)	1	1.1%	1.6%
15	Gonadotropin-releasing	hormone	receptor	pathway	(P06664)	2	2.3%	3.1%
16	DNA	replication	(P00017)	2	2.3%	3.1%
17	PDGF	signaling	pathway	(P00047)	3	3.4%	4.7%
18	Cytoskeletal	regulation	by	Rho	GTPase	(P00016)	3	3.4%	4.7%
19	Oxidative	stress	response	(P00046)	1	1.1%	1.6%
20	Ras	Pathway	(P04393)	1	1.1%	1.6%
21	Nicotinic	acetylcholine	receptor	signaling	pathway	(P00044)	1	1.1%	1.6%
22	Cadherin	signaling	pathway	(P00012)	2	2.3%	3.1%
23	Blood	coagulation	(P00011)	1	1.1%	1.6%
24	B	cell	activation	(P00010)	2	2.3%	3.1%
25	CCKR	signaling	map	(P06959)	4	4.5%	6.3%
26	Huntington	disease	(P00029)	3	3.4%	4.7%
27	Heterotrimeric	G-protein	signaling	pathway-Gq	alpha	and	Go	alpha	mediated
pathway	(P00027)	2	2.3%	3.1%
28	Wnt	signaling	pathway	(P00057)	1	1.1%	1.6%
29	Heterotrimeric	G-protein	signaling	pathway-Gi	alpha	and	Gs	alpha	mediated
pathway	(P00026)	1	1.1%	1.6%
30	Glycolysis	(P00024)	1	1.1%	1.6%
31	Toll	receptor	signaling	pathway	(P00054)	1	1.1%	1.6%
32	T	cell	activation	(P00053)	2	2.3%	3.1%
33	FGF	signaling	pathway	(P00021)	1	1.1%	1.6%

From PANTHER, the gene list and the pathways they work in have been identified, with the piechart showing the percentage of genes that are present in each pathway. The most prominent pathway in the effects of hypoxia on human neutrophils is identified as the Inflammation mediated by chemokine and cytokine signaling pathway.

Discussion

In this project, the aim was to stimate gene expression levels, and analyse the results to identify the genes that are changing between the two conditions of normal and hypoxia, defining the potential pathways that hypoxia may have altered in neutrophils. The methods of RMA and MAS5 were used, and first diagnostics performed in order to identify the suitable method to continue with. RMA was chosen as no further normalisation was required. The PUMA package was also used, and the data combined using a Bayesian Hierarchical model, further analysis was done in order to obtain the fold change in gene expression. Limma was used for Differential Expression Analysis, and the p-value calculated. The data was visualised using PCA, indicating that there is a clear difference between the gene expression of neutrophils under normal, or hypoxia conditions. This was further supported by the heatmaps generated.

Through the use of PANTHER, it was possible to identify the key pathways that are regulating the effects of hypoxia on human neutrophils - The Inflammation mediated by chemikine and cytokine signaling pathway.

References

https://www.bioconductor.org/packages/devel/bioc/vignettes/puma/inst/doc/puma.pdf https://www.bioconductor.org/packages/devel/bioc/manuals/puma/man/puma.pdf https://www.bioconductor.org/packages/release/bioc/vignettes/affy/inst/doc/affy.pdf http://svitsrv25.epfl.ch/R-doc/library/Biobase/html/00Index.html#P http://bioinfo.cipf.es/babelomicstutorial/maplot http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/Help/3 Visualisation/3.2 Figures and Graphs/3.2.13 The MA Plot.html https://www.biostars.org/p/101727/ http://www.nature.com/ng/journal/v32/n4s/pdf/ng1032.pdf http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-103 http://www.nature.com/nbt/journal/v26/n3/pdf/nbt0308-303.pdf http://wiki.bits.vib.be/index.php/Analyze_your_own_microarray_data_in_R/Bioconductor#MA_plots http://arrayanalysis.org/main.html

In [0]:

** THE EFFECT OF HYPOXIA ON HUMAN NEUTROPHILS **

** Data Analysis **