Sharedsupport / knitr_example.RnwOpen in CoCalc
Author: William A. Stein
License: GNU General Public License v3.0
Description: Examples for support purposes.
1
\documentclass[12pt]{article}
2
3
\usepackage{times}
4
\usepackage{hyperref}
5
\hypersetup{pdfpagemode=UseNone} % don't show bookmarks on initial view
6
\hypersetup{colorlinks, urlcolor={blue}}
7
8
% revise margins
9
\setlength{\headheight}{0.0in}
10
\setlength{\topmargin}{0.0in}
11
\setlength{\headsep}{0.0in}
12
\setlength{\textheight}{8.65in}
13
\setlength{\footskip}{0.35in}
14
\setlength{\oddsidemargin}{0.0in}
15
\setlength{\evensidemargin}{0.0in}
16
\setlength{\textwidth}{6.5in}
17
18
\setlength{\parskip}{6pt}
19
\setlength{\parindent}{0pt}
20
21
\begin{document}
22
23
{\sffamily \textbf{An example Knitr/R Markdown document}}
24
25
\href{http://kbroman.org}{Karl W Broman}
26
27
This is a portion of the ``\href{http://www.rqtl.org/rqtltour2.pdf}{A shorter tour of R/qtl}''
28
tutorial, developed here in multiple formats to illustrate the use of knitr.
29
This particular document is written with \href{http://www.latex-project.org}{LaTeX}.
30
31
<<knitr_options, include=FALSE>>=
32
opts_chunk$set(fig.width=12, fig.height=4, fig.path='RnwFigs/',
33
warning=FALSE, message=FALSE, tidy=FALSE)
34
options(width=60)
35
set.seed(53079239)
36
# install R/qtl package if necessary:
37
if(!require("qtl")) install.packages("qtl", repos="http://cran.us.r-project.org")
38
@
39
40
\bigskip
41
{\sffamily \textbf{Preliminaries}}
42
\nopagebreak
43
44
To install R/qtl, you need to first install the package.
45
Type (within R) {\tt install.packages("qtl")}
46
(This needs to be done just once.)
47
48
You then load the R/qtl package using the {\tt library} function:
49
50
<<load_qtl>>=
51
library(qtl)
52
@
53
54
This needs to be done every time you start R. (There is a way to
55
have the package loaded automatically every time, but we won't discuss
56
that here.)
57
58
To get help on the functions and data sets in R
59
(and in R/qtl), use {\tt help()} or {\tt ?}. For example, to view the help
60
file for the {\tt read.cross} function, type one of the following:
61
62
<<help, eval=FALSE>>=
63
help(read.cross)
64
?read.cross
65
@
66
67
\bigskip
68
{\sffamily \textbf{Data import}}
69
\nopagebreak
70
71
We will consider data from
72
\href{http://www.ncbi.nlm.nih.gov/pubmed/12118100}{Sugiyama et al.,
73
Physiol Genomics 10:5--12, 2002}. Load the data into R/qtl as
74
follows.
75
76
<<load_cross>>=
77
sug <- read.cross("csv", "http://www.rqtl.org", "sug.csv",
78
genotypes=c("CC", "CB", "BB"),
79
alleles=c("C", "B"))
80
@
81
82
83
The function {\tt read.cross} is for importing data into R/qtl.
84
{\tt "sug.csv"} is the name of the file, which we import directly
85
from the R/qtl website. {\tt genotypes} indicates the codes used for
86
the genotypes; {\tt alleles} indicates single-character codes to be
87
used in plots and such.
88
89
{\tt read.cross} loads the data from the file and formats it into
90
a special cross object, which is then assigned to {\tt sug} via the
91
assignment operator {\tt <-}.
92
93
The data are from an intercross between BALB/cJ and CBA/CaJ; only male
94
offspring were considered. There are four phenotypes: blood pressure,
95
heart rate, body weight, and heart weight. We will focus on the blood
96
pressure phenotype, will consider just the \Sexpr{nind(sug)} individuals with
97
genotype data and, for simplicity, will focus on the autosomes.
98
99
100
\bigskip
101
{\sffamily \textbf{Summaries}}
102
\nopagebreak
103
104
The data object {\tt sug} is complex; it contains the genotype
105
data, phenotype data and genetic map. R has a certain amount of
106
``object oriented'' facilities, so that calls to functions like
107
{\tt summary} and {\tt plot} are interpreted appropriately for the object
108
considered.
109
110
The object {\tt sug} has ``class'' {\tt "cross"}, and so calls to
111
{\tt summary} and {\tt plot} are actually sent to the functions
112
{\tt summary.cross} and {\tt plot.cross}.
113
114
Use {\tt summary()} to get a quick summary of the data. (This also
115
performs a variety of checks of the integrity of the data.)
116
117
<<summary_cross>>=
118
summary(sug)
119
@
120
121
We see that this is an intercross with \Sexpr{nind(sug)} individuals.
122
There are \Sexpr{nphe(sug)} phenotypes, and genotype data at
123
\Sexpr{totmar(sug)} markers across the \Sexpr{nchr(sug)} autosomes. The genotype
124
data is quite complete.
125
126
Use {\tt plot()} to get a summary plot of the data.
127
128
<<summary_plot, fig.height=8>>=
129
plot(sug)
130
@
131
132
The plot in the upper-left shows the pattern of missing genotype data, with
133
black pixels corresponding to missing genotypes. The next plot shows
134
the genetic map of the typed markers. The following plots are
135
histograms or bar plots for the six phenotypes. The last two
136
``phenotypes'' are sex (with 1 corresponding to males) and mouse ID.
137
138
139
\bigskip
140
{\sffamily \textbf{Single-QTL analysis}}
141
\nopagebreak
142
143
Let's now proceed to QTL mapping via a single-QTL model.
144
145
We first calculate the QTL genotype probabilities, given the
146
observed marker data, via the function {\tt calc.genoprob}. This is
147
done at the markers and at a grid along the chromosomes. The argument
148
{\tt step} is the density of the grid (in cM), and defines the
149
density of later QTL analyses.
150
151
<<calc_genoprob>>=
152
sug <- calc.genoprob(sug, step=1)
153
@
154
155
The output of {\tt calc.genoprob} is the same cross object as input,
156
with additional information (the QTL genotype probabilities) inserted. We
157
assign this back to the original object (writing over the previous
158
data), though it could have also been assigned to a new object.
159
160
To perform a single-QTL genome scan, we use the function {\tt scanone}.
161
By default, it performs standard interval mapping (that is, maximum
162
likelihood via the EM algorithm). Also, by default, it considers the
163
first phenotype in the input cross object (in this case, blood
164
pressure).
165
166
<<scanone>>=
167
out.em <- scanone(sug)
168
@
169
170
The output has ``class'' {\tt "scanone"}. The {\tt summary}
171
function is passed to the function {\tt summary.scanone}, and gives
172
the maximum LOD score on each chromosome.
173
174
<<summary_scanone>>=
175
summary(out.em)
176
@
177
178
Alternatively, we can give a threshold, e.g., to only see those
179
chromosomes with LOD $>$ 3.
180
181
<<summary_w_threshold>>=
182
summary(out.em, threshold=3)
183
@
184
185
We can plot the results as follows.
186
187
<<plot_scanone>>=
188
plot(out.em)
189
@
190
191
We can do the genome scan via Haley-Knott regression by calling
192
{\tt scanone} with the argument {\tt method="hk"}.
193
194
<<scanone_hk>>=
195
out.hk <- scanone(sug, method="hk")
196
@
197
198
We may plot the two sets of LOD curves together in a single call
199
to {\tt plot}.
200
201
<<plot_em_and_hk>>=
202
plot(out.em, out.hk, col=c("blue", "red"))
203
@
204
205
Alternatively, we could do the following (figure not included, for brevity):
206
207
<<plot_em_and_hk_alt, eval=FALSE>>=
208
plot(out.em, col="blue")
209
plot(out.hk, col="red", add=TRUE)
210
@
211
212
It's perhaps more informative to plot the differences:
213
214
<<plot_diff>>=
215
plot(out.hk - out.em, ylim=c(-0.3, 0.3),
216
ylab="LOD(HK)-LOD(EM)")
217
@
218
219
\bigskip
220
{\sffamily \textbf{Permutation tests}}
221
\nopagebreak
222
223
To perform a permutation test, to get a genome-wide significance
224
threshold or genome-scan-adjusted p-values, we use {\tt scanone} just as
225
before, but with an additional argument, {\tt n.perm}, indicating the
226
number of permutation replicates. It's quickest to use Haley-Knott
227
regression.
228
229
<<scanone_perm>>=
230
operm <- scanone(sug, method="hk", n.perm=1000)
231
@
232
233
A histogram of the results (the 1000 genome-wide maximum LOD
234
scores) is obtained as follows:
235
236
<<plot_perm>>=
237
plot(operm)
238
@
239
240
Significance thresholds may be obtained via the {\tt summary}
241
function:
242
243
<<summary_perm>>=
244
summary(operm)
245
summary(operm, alpha=c(0.05, 0.2))
246
@
247
248
The permutation results may be used along with
249
the {\tt scanone} results to have significance thresholds and
250
p-values calculated automatically:
251
252
<<summary_scanone_w_perm>>=
253
summary(out.hk, perms=operm, alpha=0.2, pvalues=TRUE)
254
@
255
256
\bigskip
257
{\sffamily \textbf{Interval estimates of QTL location}}
258
\nopagebreak
259
260
For the blood pressure phenotype, we've seen good evidence for QTL on
261
chromosomes 7 and 15. Interval estimates of the location of QTL are
262
commonly obtained via 1.5-LOD support intervals, which may be
263
calculated via the function {\tt lodint}. Alternatively, an
264
approximate Bayes credible interval may be obtained with
265
{\tt bayesint}.
266
267
To obtain the 1.5-LOD support interval and 95\% Bayes interval
268
for the QTL on chromosome 7, type the following.
269
The first and last rows define the ends of the intervals; the middle
270
row is the estimated QTL location.
271
272
<<lodint_bayesint>>=
273
lodint(out.hk, chr=7)
274
bayesint(out.hk, chr=7)
275
@
276
277
It is sometimes useful to identify the closest flanking markers;
278
use {\tt expandtomarkers=TRUE}:
279
280
<<expandtomarkers>>=
281
lodint(out.hk, chr=7, expandtomarkers=TRUE)
282
bayesint(out.hk, chr=7, expandtomarkers=TRUE)
283
@
284
285
We can calculate the 2-LOD support interval and the 99\% Bayes
286
interval as follows.
287
288
<<lodint_2>>=
289
lodint(out.hk, chr=7, drop=2)
290
bayesint(out.hk, chr=7, prob=0.99)
291
@
292
293
The intervals for the chr 15 locus may be calculated as follows.
294
295
<<lodint_chr15>>=
296
lodint(out.hk, chr=15)
297
bayesint(out.hk, chr=15)
298
@
299
300
301
\bigskip
302
{\sffamily \textbf{R and package versions used}}
303
\nopagebreak
304
305
<<sessionInfo, include=TRUE, echo=TRUE, results='markup'>>=
306
sessionInfo()
307
@
308
309
\end{document}
310