Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

Jupyter notebook Autumn2015/Week12_cAt/mda13gsl_Week12.ipynb

Views: 62
Kernel: R

BMS 353 Bioinformatics Project

Gi Shuen Kirsty Lee

This project investigates the role of a transcription factor, SP1, in colon cells. The data being analysed will be from cell cultures 48 hours after SP1 was silenced, and control cells where it was not silenced. There are two samples in each condition, and the gene expression profiles were quantified using Affymetrix GeneChip HGU133 PLUS 2.

The pipeline for the analysis of this project is as follows:

  1. Load data and convert to affymetrix

  2. Pre-process data with RMA to normalize data

    • Visualise data with MA plots and box plot

  3. Analyse differential gene expression with Puma

    • Visualise data with MA plots and box plots

    • Extract the top differentially expressed genes

    • Annotate probeset with gene names and symbols

  4. Principal component analysis

  5. Hierarchical clustering with a heatmap

  6. Functional pathway analysis with PANTHER

1. Loading data

options(jupyter.plot_mimetypes ='image/png')

This line of code was executed so as to make sure output plots are executed as PNG files, which are small in size - to keep the file size of this notebook small.

The affy package is then loaded, and the working directory set, in order to be able to load the data and convert it into an Affybatch Object for pre-processing.

library(affy)
Loading required package: BiocGenerics Loading required package: parallel Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB The following objects are masked from ‘package:stats’: IQR, mad, xtabs The following objects are masked from ‘package:base’: anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist, unsplit Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
setwd("~/Autumn2015/Week12_cAt/data_groupA/") getwd()
[1] "/projects/20138cfe-1ee5-4c4b-ad6d-60908cc27d7e/Autumn2015/Week12_cAt/data_groupA"

Data is now loaded into an AffyBatch object, sp1.

sp1_filenames <- c("M48-1.CEL", "M48-2.CEL", "S48-1.CEL", "S48-2.CEL") sp1 <- ReadAffy(filenames=sp1_filenames)
sp1
Warning message: : replacing previous import by ‘utils::tail’ when loading ‘hgu133plus2cdf’Warning message: : replacing previous import by ‘utils::head’ when loading ‘hgu133plus2cdf’
AffyBatch object size of arrays=1164x1164 features (18 kb) cdf=HG-U133_Plus_2 (54675 affyids) number of samples=4 number of genes=54675 annotation=hgu133plus2 notes=

Now that the data is loaded in an Affybatch Object, sp1, the data will be pre-processed using rma to normalize the data. The results will be extracted from an expression set into an object, eset_rma.

2. Pre-processing data

library(affy) sp1_rma <- affy::rma(sp1) sp1_rma
Background correcting Normalizing Calculating Expression
ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 4 samples element names: exprs protocolData sampleNames: M48-1.CEL M48-2.CEL S48-1.CEL S48-2.CEL varLabels: ScanDate varMetadata: labelDescription phenoData sampleNames: M48-1.CEL M48-2.CEL S48-1.CEL S48-2.CEL varLabels: sample varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: hgu133plus2
eset_rma <- exprs(sp1_rma)

This pre-processed data will now be visualized with a boxplot, and MA plot to make sure that the data is normalized.

library(puma) library(affy) library(limma)
Loading required package: oligo Loading required package: oligoClasses Welcome to oligoClasses version 1.32.0 Attaching package: ‘oligoClasses’ The following object is masked from ‘package:affy’: list.celfiles Loading required package: Biostrings Loading required package: S4Vectors Loading required package: stats4 Loading required package: IRanges Loading required package: XVector ================================================================================ Welcome to oligo version 1.34.0 ================================================================================ Attaching package: ‘oligo’ The following objects are masked from ‘package:affy’: intensity, MAplot, mm, mm<-, mmindex, pm, pm<-, pmindex, probeNames, rma Loading required package: mclust Package 'mclust' version 5.1 Type 'citation("mclust")' for citing this R package in publications. Attaching package: ‘limma’ The following object is masked from ‘package:oligo’: backgroundCorrect The following object is masked from ‘package:BiocGenerics’: plotMA
groups<-c("absent","present") table_rma<-data.frame(sampleNames(sp1_rma),groups) group10hr <- factor(groups[1:4]) MAplot(sp1_rma[,1:4], pairs=TRUE, groups=group10hr)
Image in a Jupyter notebook

This MA plot allows us to visualize that the data has been normalized by RMA pre-processing, as a flat relationship can be observed between the ratio of signal between two arrays and signal level.

par(mfrow=c(2,2)) boxplot(sp1, main="Boxplot of original data") boxplot(eset_rma, main="Boxplot of pre-processed data")
Image in a Jupyter notebook

The comparison of the boxplots before and after RMA pre-processing further confirms that the data has been normalized - as it can be seen that the mean of the distributions are the same and are centered.

3. Differential expression analysis with Puma

The puma package will now be loaded so as to be able to perform differential expression analysis of the genes.

library(puma)

Phenodata for the dataframe was annotated so that the data can be combined with pumaCombImproved after expression levels of genes are estimated with the mmgmos package in puma.

pData (sp1) <- data.frame("SP1.Silencing"=c("absent", "absent", "present", "present"), row.names=rownames(pData(sp1))) pData(sp1)
SP1.Silencing M48-1.CEL absent M48-2.CEL absent S48-1.CEL present S48-2.CEL present

The data was then processed with the mmgmos package in puma to estimate the expression levels of the genes.

sp1_puma <- mmgmos(sp1) sp1_puma
Model optimising .............................................................................................................. Expression values calculating .............................................................................................................. Done.
Expression Set (exprReslt) with 54675 genes 4 samples An object of class 'AnnotatedDataFrame' sampleNames: M48-1.CEL M48-2.CEL S48-1.CEL S48-2.CEL varLabels: SP1.Silencing varMetadata: labelDescription
eset_puma <- exprs(sp1_puma)

This analysis by puma's mmgmos function was then visualized as an MA plot and a boxplot, seen below.

groups<-c("absent","present") table_puma<-data.frame(sampleNames(sp1_puma),groups) group10hr <- factor(groups[1:2]) MAplot(sp1_puma[,1:2], pairs=TRUE, groups=group10hr)
Image in a Jupyter notebook

This MA plot allows us to see that the outliers of the data have been removed, and that the difference between the two conditions is symmetrical, and the spread is mostly centered around the mean.

boxplot(sp1_puma, main="Boxplot of puma analysed data")
Image in a Jupyter notebook

From the boxplot, it can be observed that the data is already normalized and does not need any further normalisation. However it can be seen that there is variation outside of the upper and lower quartiles of the data. As the outliers have been removed, it can be seen that the means have decreased in value slightly, when comparing with the boxplot of the original data shown earlier.

The data was then combined with pumaCombImproved so as to be able to perform differential expression analysis. This combining was done using a bayesian Hierarchical model in puma. Due to the long processing time that this took, the code was executed and saved to a file, sp1_comb.RDA, which is seen to be loaded in the following cell.

The code that was used to combine the data is as follows:

save(sp1_comb, file="sp1_comb.RDA")```
load(file="sp1_comb.RDA")
sp1_comb
ExpressionSet (storageMode: lockedEnvironment) assayData: 54675 features, 2 samples element names: exprs, se.exprs protocolData: none phenoData sampleNames: absent present varLabels: SP1.Silencing varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation:
eset_comb <- exprs(sp1_comb)

Differential expression analysis was then carried out using the puma package, and results written into a .csv file. This allows us to compare relative gene expression between the control and silenced cell conditions, in order to see which genes are affected the most by the silencing of SP1.

pumaDEres <- pumaDE(sp1_comb) write.reslts(pumaDEres, file="pumaDEres")
pumaDEresults <- read.csv(file="pumaDEres_statistics.csv")
pumaFC <- read.csv(file="pumaDEres_FCs.csv")

The P-like value of the results from pumaDE was also calculated, and visualized as a histogram.

puma_plv <- pLikeValues(pumaDEres, contrast=1, direction="either")
hist(puma_plv, main="Histogram of P-like values calculated from pumaDE")
Image in a Jupyter notebook

P-like values range between 0 and 1, where 0 has the highest probability of being differentially expressed, and 1 the lowest. From this histogram, it can be seen that a large majority of the genes were likely not differentially expressed, and there is a small set, of frequency roughly 100-200 that have a high probability of being differentially expressed.

The top differentially expressed genes were then extracted using >1 and <-1 for cut off values in fold change, and PPLR cut off values of >0.99 and <0.01 to compare different methods of extracting the top differentially expressed genes.

topFC1 <- pumaFC[which(pumaFC$present_vs_absent< -1),]
topFC2 <- pumaFC[which(pumaFC$present_vs_absent>1),]
topFC <- merge(topFC1, topFC2, all=TRUE)
length(topFC$present_vs_absent)
[1] 261

It can be seen that 261 genes were found to be the genes most likely to be affected by the the silencing of SP1.

pumaDEresultsX <- as.character(pumaDEresults$X)

The top differentially expressed genes are now to be extracted using the PPLR values, and merged into a dataset called topgenes.

topgenes1 <- pumaDEresults[which(pumaDEresults$present_vs_absent>0.99),]
topgenes2 <- pumaDEresults[which(pumaDEresults$present_vs_absent<0.01),]
topgenes <- merge(topgenes1, topgenes2, all=TRUE)
length(topgenes$X)
[1] 990

It can be seen that the two ways of extracting the genes most affected by the silencing of SP1 differ in their stringency, and using the fold change is a much stricter method. To be on the conservative side, the genes extracted by their PPLR value will be used.

Topgenes, containing the probesets, was then annotated with the symbol, gene name, and probe ID - annotated_list, and the NAs present omitted. The final list of genes that display the most change between conditions is loaded under the name 'annotated_list_na'.

library(annotate) library(hgu133plus2.db)
Loading required package: AnnotationDbi Loading required package: XML Loading required package: org.Hs.eg.db Loading required package: DBI
geneProbes <- (as.character(topgenes$X)) annotated_list <- select(hgu133plus2.db, geneProbes, c("SYMBOL", "GENENAME"), "PROBEID")
'select()' returned 1:many mapping between keys and columns
annotated_list
PROBEID SYMBOL 1 1552580_at TRIML2 2 1552658_a_at NAV3 3 1553142_at LACC1 4 1553292_s_at SGK494 5 1553402_a_at HFE 6 1553546_at <NA> 7 1553679_s_at VKORC1L1 8 1553685_s_at SP1 9 1554352_s_at DENND4A 10 1554471_a_at ANKRD13C 11 1554883_a_at ERCC8 12 1555673_at KRTAP2-3 13 1555673_at KRTAP2-4 14 1555888_at UBR5-AS1 15 1555889_a_at CRTAP 16 1556285_s_at PPA2 17 1556499_s_at COL1A1 18 1556696_s_at NR2F1-AS1 19 1556769_a_at LOC105371967 20 1557238_s_at <NA> 21 1557411_s_at SLC25A43 22 1557487_at <NA> 23 1557805_at <NA> 24 1558111_at MBNL1 25 1558522_at MPP6 26 1558605_at <NA> 27 1558606_s_at TTC5 28 1558871_at LOC105377276 29 1558906_a_at OSER1-AS1 30 1558930_at LINC00460 â‹® â‹® â‹® 1009 242317_at HIGD1A 1010 242473_at TRAF4 1011 242474_s_at VMA21 1012 242509_at <NA> 1013 242577_at LOC389834 1014 242584_at FAM161A 1015 242629_at RAB3B 1016 242794_at MAML3 1017 242909_at <NA> 1018 243010_at MSI2 1019 243031_at <NA> 1020 243110_x_at NPW 1021 243582_at SH3RF2 1022 243610_at C9orf135 1023 244163_at SEMA3A 1024 244187_at <NA> 1025 244563_at QSER1 1026 244764_at HIVEP3 1027 244829_at LINC00518 1028 32088_at BLZF1 1029 34408_at RTN2 1030 34764_at LARS2 1031 38892_at GLTSCR1L 1032 50221_at TFEB 1033 53720_at C19orf66 1034 58916_at KCTD14 1035 58916_at NDUFC2-KCTD14 1036 59705_at SCLY 1037 65472_at C2orf68 1038 91826_at EPS8L1 GENENAME 1 tripartite motif family-like 2 2 neuron navigator 3 3 laccase (multicopper oxidoreductase) domain containing 1 4 uncharacterized serine/threonine-protein kinase SgK494 5 hemochromatosis 6 <NA> 7 vitamin K epoxide reductase complex, subunit 1-like 1 8 Sp1 transcription factor 9 DENN/MADD domain containing 4A 10 ankyrin repeat domain 13C 11 excision repair cross-complementation group 8 12 keratin associated protein 2-3 13 keratin associated protein 2-4 14 UBR5 antisense RNA 1 15 cartilage associated protein 16 pyrophosphatase (inorganic) 2 17 collagen, type I, alpha 1 18 NR2F1 antisense RNA 1 19 uncharacterized LOC105371967 20 <NA> 21 solute carrier family 25, member 43 22 <NA> 23 <NA> 24 muscleblind-like splicing regulator 1 25 membrane protein, palmitoylated 6 (MAGUK p55 subfamily member 6) 26 <NA> 27 tetratricopeptide repeat domain 5 28 uncharacterized LOC105377276 29 OSER1 antisense RNA 1 (head to head) 30 long intergenic non-protein coding RNA 460 â‹® â‹® 1009 HIG1 hypoxia inducible domain family, member 1A 1010 TNF receptor-associated factor 4 1011 VMA21 vacuolar H+-ATPase homolog (S. cerevisiae) 1012 <NA> 1013 ankyrin repeat domain 57 pseudogene 1014 family with sequence similarity 161, member A 1015 RAB3B, member RAS oncogene family 1016 mastermind-like transcriptional coactivator 3 1017 <NA> 1018 musashi RNA-binding protein 2 1019 <NA> 1020 neuropeptide W 1021 SH3 domain containing ring finger 2 1022 chromosome 9 open reading frame 135 1023 sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3A 1024 <NA> 1025 glutamine and serine rich 1 1026 human immunodeficiency virus type I enhancer binding protein 3 1027 long intergenic non-protein coding RNA 518 1028 basic leucine zipper nuclear factor 1 1029 reticulon 2 1030 leucyl-tRNA synthetase 2, mitochondrial 1031 GLTSCR1-like 1032 transcription factor EB 1033 chromosome 19 open reading frame 66 1034 potassium channel tetramerization domain containing 14 1035 NDUFC2-KCTD14 readthrough 1036 selenocysteine lyase 1037 chromosome 2 open reading frame 68 1038 EPS8-like 1
length(annotated_list$GENENAME)
[1] 1038
annotated_list_na<- na.omit(annotated_list) length(annotated_list_na$GENENAME) annotated_list_na
[1] 998
PROBEID SYMBOL 1 1552580_at TRIML2 2 1552658_a_at NAV3 3 1553142_at LACC1 4 1553292_s_at SGK494 5 1553402_a_at HFE 7 1553679_s_at VKORC1L1 8 1553685_s_at SP1 9 1554352_s_at DENND4A 10 1554471_a_at ANKRD13C 11 1554883_a_at ERCC8 12 1555673_at KRTAP2-3 13 1555673_at KRTAP2-4 14 1555888_at UBR5-AS1 15 1555889_a_at CRTAP 16 1556285_s_at PPA2 17 1556499_s_at COL1A1 18 1556696_s_at NR2F1-AS1 19 1556769_a_at LOC105371967 21 1557411_s_at SLC25A43 24 1558111_at MBNL1 25 1558522_at MPP6 27 1558606_s_at TTC5 28 1558871_at LOC105377276 29 1558906_a_at OSER1-AS1 30 1558930_at LINC00460 31 1559739_at CHPT1 32 1559822_s_at MTDH 34 1561347_a_at LOC101927166 35 1562019_at NT5DC4 36 1562738_a_at USP3-AS1 â‹® â‹® â‹® 1003 241959_at ANAPC10 1004 242005_at LINC00973 1006 242146_at SNRPA1 1007 242149_at FAM210A 1009 242317_at HIGD1A 1010 242473_at TRAF4 1011 242474_s_at VMA21 1013 242577_at LOC389834 1014 242584_at FAM161A 1015 242629_at RAB3B 1016 242794_at MAML3 1018 243010_at MSI2 1020 243110_x_at NPW 1021 243582_at SH3RF2 1022 243610_at C9orf135 1023 244163_at SEMA3A 1025 244563_at QSER1 1026 244764_at HIVEP3 1027 244829_at LINC00518 1028 32088_at BLZF1 1029 34408_at RTN2 1030 34764_at LARS2 1031 38892_at GLTSCR1L 1032 50221_at TFEB 1033 53720_at C19orf66 1034 58916_at KCTD14 1035 58916_at NDUFC2-KCTD14 1036 59705_at SCLY 1037 65472_at C2orf68 1038 91826_at EPS8L1 GENENAME 1 tripartite motif family-like 2 2 neuron navigator 3 3 laccase (multicopper oxidoreductase) domain containing 1 4 uncharacterized serine/threonine-protein kinase SgK494 5 hemochromatosis 7 vitamin K epoxide reductase complex, subunit 1-like 1 8 Sp1 transcription factor 9 DENN/MADD domain containing 4A 10 ankyrin repeat domain 13C 11 excision repair cross-complementation group 8 12 keratin associated protein 2-3 13 keratin associated protein 2-4 14 UBR5 antisense RNA 1 15 cartilage associated protein 16 pyrophosphatase (inorganic) 2 17 collagen, type I, alpha 1 18 NR2F1 antisense RNA 1 19 uncharacterized LOC105371967 21 solute carrier family 25, member 43 24 muscleblind-like splicing regulator 1 25 membrane protein, palmitoylated 6 (MAGUK p55 subfamily member 6) 27 tetratricopeptide repeat domain 5 28 uncharacterized LOC105377276 29 OSER1 antisense RNA 1 (head to head) 30 long intergenic non-protein coding RNA 460 31 choline phosphotransferase 1 32 metadherin 34 uncharacterized LOC101927166 35 5'-nucleotidase domain containing 4 36 USP3 antisense RNA 1 â‹® â‹® 1003 anaphase promoting complex subunit 10 1004 long intergenic non-protein coding RNA 973 1006 small nuclear ribonucleoprotein polypeptide A' 1007 family with sequence similarity 210, member A 1009 HIG1 hypoxia inducible domain family, member 1A 1010 TNF receptor-associated factor 4 1011 VMA21 vacuolar H+-ATPase homolog (S. cerevisiae) 1013 ankyrin repeat domain 57 pseudogene 1014 family with sequence similarity 161, member A 1015 RAB3B, member RAS oncogene family 1016 mastermind-like transcriptional coactivator 3 1018 musashi RNA-binding protein 2 1020 neuropeptide W 1021 SH3 domain containing ring finger 2 1022 chromosome 9 open reading frame 135 1023 sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3A 1025 glutamine and serine rich 1 1026 human immunodeficiency virus type I enhancer binding protein 3 1027 long intergenic non-protein coding RNA 518 1028 basic leucine zipper nuclear factor 1 1029 reticulon 2 1030 leucyl-tRNA synthetase 2, mitochondrial 1031 GLTSCR1-like 1032 transcription factor EB 1033 chromosome 19 open reading frame 66 1034 potassium channel tetramerization domain containing 14 1035 NDUFC2-KCTD14 readthrough 1036 selenocysteine lyase 1037 chromosome 2 open reading frame 68 1038 EPS8-like 1
symbol<- annotated_list_na$SYMBOL

Principle Components Analysis was then carried out on the puma analysed data to visualise the variation between the conditions.

4. Principal Component Analysis

pumapca_sp1<-pumaPCA(sp1_puma) plot(pumapca_sp1, main="Principal Component Analysis of puma analysed data")
Iteration number: 1 Iteration number: 2 Iteration number: 3 Iteration number: 4 Iteration number: 5 Iteration number: 6 Iteration number: 7 Iteration number: 8 Iteration number: 9 Iteration number: 10 Iteration number: 11 Iteration number: 12 Iteration number: 13 Iteration number: 14
Image in a Jupyter notebook

PCA allows for the data to be separated into unrelated dimensions, allowing for the data to be visualized without being affected by the other factors. From this PCA plot it can be seen that the two conditions where SP1 was silenced (red triangles) have a large variance in relation to the control conditions - silencing of SP1 affected gene transcription of a large number of genes.

** 5. Hierarchical Clustering**

library(gplots)
Attaching package: ‘gplots’ The following object is masked from ‘package:IRanges’: space The following object is masked from ‘package:stats’: lowess

Hierarchical clustering was then performed, with the output being a heatmap - seen below. Heatmaps allow for gene expression levels to be visualized across a number of samples or conditions, allowing for comparison across conditions.

# find the IDs that belong to the DE genes. tID<-topgenes$X ind<-1 j<-1 for (i in 1: length(tID)) { ind[j]<-which(rownames(sp1_puma)==tID[i],arr.ind=TRUE) j<-j+1 } # ind is the vector with all the indexes topExpr<-exprs(sp1_puma)[ind,] heatmap.2(topExpr, col=redgreen(75), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.8)
Image in a Jupyter notebook

It can be clearly observed from the heatmap generated that there is a clear difference in gene expression between the silenced cells and control cells. Genes that are seen to be downregulated in the control conditions are seen to be upregulated, and vice versa. A comparison of M48.2 and S48.2 (Sample 2 of Control and Silenced cells) reveals that the colours on the heatmap are completely opposite of each other (red becomes green, and vice versa) - allowing the inference that after silencing of SP1, the effects on gene transcription were reversed.

6. Functional pathway analysis with PANTHER

write(symbol, file="symbols.tsv")

The data in the Symbols column of the annotated list of top differentially expressed genes was saved, and inputted to the Panther database in order to investigate which pathways were involved. The list of pathways generated by the Panther database was saved and has been loaded below, under the dataframe 'genepathways'.

panther <- read.table("pantherGeneList.txt", sep="\t")
genepathways <- as.data.frame(panther$V3)
print(genepathways)
panther$V3 1 p53 pathway feedback loops 2 2 Metabotropic glutamate receptor group III pathway 3 Opioid proenkephalin pathway 4 Arginine biosynthesis 5 Axon guidance mediated by Slit/Robo 6 Vitamin B6 metabolism 7 Opioid proopiomelanocortin pathway 8 Apoptosis signaling pathway 9 5HT1 type receptor mediated signaling pathway 10 VEGF signaling pathway 11 FGF signaling pathway 12 p53 pathway 13 Hypoxia response via HIF activation 14 Cell cycle 15 General transcription regulation 16 Dopamine receptor mediated signaling pathway 17 Insulin/IGF pathway-protein kinase B signaling cascade 18 Axon guidance mediated by netrin 19 Thyrotropin-releasing hormone receptor signaling pathway 20 Threonine biosynthesis 21 Cortocotropin releasing factor receptor signaling pathway 22 T cell activation 23 Endothelin signaling pathway 24 FAS signaling pathway 25 Angiotensin II-stimulated signaling through G proteins and beta-arrestin 26 Salvage pyrimidine ribonucleotides 27 Fructose galactose metabolism 28 Pentose phosphate pathway 29 Glycolysis 30 Alpha adrenergic receptor signaling pathway 31 p38 MAPK pathway 32 Pyridoxal-5-phosphate biosynthesis 33 Interleukin signaling pathway 34 Lysine biosynthesis 35 Parkinson disease 36 Heme biosynthesis 37 Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase cascade 38 Opioid prodynorphin pathway 39 Gamma-aminobutyric acid synthesis 40 5HT4 type receptor mediated signaling pathway 41 Oxidative stress response 42 Pyrimidine Metabolism 43 General transcription by RNA polymerase I 44 Histamine H1 receptor mediated signaling pathway 45 Notch signaling pathway 46 Cadherin signaling pathway 47 5HT3 type receptor mediated signaling pathway 48 Methylmalonyl pathway 49 Alzheimer disease-amyloid secretase pathway 50 Integrin signalling pathway 51 Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated pathway 52 Histamine H2 receptor mediated signaling pathway 53 Synaptic vesicle trafficking 54 Metabotropic glutamate receptor group II pathway 55 Muscarinic acetylcholine receptor 2 and 4 signaling pathway 56 PDGF signaling pathway 57 Inflammation mediated by chemokine and cytokine signaling pathway 58 TGF-beta signaling pathway 59 Muscarinic acetylcholine receptor 1 and 3 signaling pathway 60 De novo purine biosynthesis 61 PI3 kinase pathway 62 Huntington disease 63 De novo pyrimidine ribonucleotides biosythesis 64 Angiogenesis 65 EGF receptor signaling pathway 66 DNA replication 67 Glutamine glutamate conversion 68 B cell activation 69 De novo pyrimidine deoxyribonucleotide biosynthesis 70 Methionine biosynthesis 71 Vitamin D metabolism and pathway 72 Gonadotropin-releasing hormone receptor pathway 73 Metabotropic glutamate receptor group I pathway 74 Serine glycine biosynthesis 75 Blood coagulation 76 5HT2 type receptor mediated signaling pathway 77 Heterotrimeric G-protein signaling pathway-rod outer segment phototransduction 78 Beta2 adrenergic receptor signaling pathway 79 Aminobutyrate degradation 80 Enkephalin release 81 Purine metabolism 82 Axon guidance mediated by semaphorins 83 Tetrahydrofolate biosynthesis 84 Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated pathway 85 Ionotropic glutamate receptor pathway 86 Alzheimer disease-presenilin pathway 87 Ubiquitin proteasome pathway 88 Cytoskeletal regulation by Rho GTPase 89 Ras Pathway 90 Coenzyme A biosynthesis 91 GABA-B receptor II signaling 92 Transcription regulation by bZIP transcription factor 93 Vasopressin synthesis 94 Wnt signaling pathway 95 Beta1 adrenergic receptor signaling pathway 96 Oxytocin receptor mediated signaling pathway 97 CCKR signaling map 98 Nicotinic acetylcholine receptor signaling pathway 99 Beta3 adrenergic receptor signaling pathway

The symbols inputted into Panther were also available for visualisation as a pie chart depending on the ontology. The pie charts for the biological process ontology and molecular function ontology can be seen below.

Pie chart showing the biological process ontology of the top differentially expressed genes 48 hours after SP1 silencing

Pie chart showing the biological process ontology of the top differentially expressed genes

It can be seen from this pie chart that majority of the genes that have been altered by the silencing of SP1 are largely related to metabolic or cellular processes in the cell, and to a lesser extent, biological regulation, developmental processes and localisation. As SP1 is a transcription factor that binds DNA, this is unsurprising - Tan and Khachigian (2009) have reported that it regulates the expression of a multitude of genes that control cellular processes.

Pie chart showing the molecular function ontology of the top differentially expressed genes 48 hours after SP1 silencing

Pie chart showing the molecular function ontology of the top differentially expressed genes

From this pie chart, it is observed that majority of the genes are associated with binding and catalytic activity in the cell, which is unsurprising as SP1 is a transcription factor, and would therefore be associated with a number of binding factors and enzymatic reactions in the cell.

Conclusion

From the analysis of the results obtained from the AffyBatch GeneChip, it can be concluded that SP1 has a large effect on gene transcription in the colon cell. Up to 99 pathways are affected - seen in the 'genepathways' dataframe; majority of which are common signalling pathways seen in most cells, such as the p53 pathway, Wnt signalling pathway, and TGF beta signalling pathway. The genes changing the most between the two conditions can be observed through the dataset 'topgenes'. Through the analysis of the top differentially expressed genes, it was found that majority of the genes affected were related to metabolic and cellular processes in the cell, and their molecular function were associated to binding and catalytic activity. This is expected as SP1 being a transcription factor would likely be associated with cellular and metabolic activity, which would require catalytic reactions. Silencing of SP1 resulted in the genes being either upregulated or downregulated, opposite to what they would have been in a control condition.