CoCalc -- knitr_example.Rnw

| Hosted by CoCalc | Download
1
\documentclass[12pt]{article}
2

3
\usepackage{times}
4
\usepackage{hyperref}
5
\hypersetup{pdfpagemode=UseNone} % don't show bookmarks on initial view
6
\hypersetup{colorlinks, urlcolor={blue}}
7

8
% revise margins
9
\setlength{\headheight}{0.0in}
10
\setlength{\topmargin}{0.0in}
11
\setlength{\headsep}{0.0in}
12
\setlength{\textheight}{8.65in}
13
\setlength{\footskip}{0.35in}
14
\setlength{\oddsidemargin}{0.0in}
15
\setlength{\evensidemargin}{0.0in}
16
\setlength{\textwidth}{6.5in}
17

18
\setlength{\parskip}{6pt}
19
\setlength{\parindent}{0pt}
20

21
\begin{document}
22

23
{\sffamily \textbf{An example Knitr/R Markdown document}}
24

25
\href{http://kbroman.org}{Karl W Broman}
26

27
This is a portion of the ``\href{http://www.rqtl.org/rqtltour2.pdf}{A shorter tour of R/qtl}''
28
tutorial, developed here in multiple formats to illustrate the use of knitr.
29
This particular document is written with \href{http://www.latex-project.org}{LaTeX}.
30

31
<<knitr_options, include=FALSE>>=
32
opts_chunk$set(fig.width=12, fig.height=4, fig.path='RnwFigs/',
33
               warning=FALSE, message=FALSE, tidy=FALSE)
34
options(width=60)
35
set.seed(53079239)
36
# install R/qtl package if necessary:
37
if(!require("qtl")) install.packages("qtl", repos="http://cran.us.r-project.org")
38
@
39

40
\bigskip
41
{\sffamily \textbf{Preliminaries}}
42
\nopagebreak
43

44
To install R/qtl, you need to first install the package.
45
Type (within R) {\tt install.packages("qtl")}
46
(This needs to be done just once.)
47

48
You then load the R/qtl package using the {\tt library} function:
49

50
<<load_qtl>>=
51
library(qtl)
52
@
53

54
This needs to be done every time you start R. (There is a way to
55
have the package loaded automatically every time, but we won't discuss
56
that here.)
57

58
To get help on the functions and data sets in R
59
(and in R/qtl), use {\tt help()} or {\tt ?}. For example, to view the help
60
file for the {\tt read.cross} function, type one of the following:
61

62
<<help, eval=FALSE>>=
63
help(read.cross)
64
?read.cross
65
@
66

67
\bigskip
68
{\sffamily \textbf{Data import}}
69
\nopagebreak
70

71
We will consider data from
72
\href{http://www.ncbi.nlm.nih.gov/pubmed/12118100}{Sugiyama et al.,
73
Physiol Genomics 10:5--12, 2002}. Load the data into R/qtl as
74
follows.
75

76
<<load_cross>>=
77
sug <- read.cross("csv", "http://www.rqtl.org", "sug.csv",
78
                  genotypes=c("CC", "CB", "BB"),
79
                  alleles=c("C", "B"))
80
@
81

82

83
The function {\tt read.cross} is for importing data into R/qtl.
84
{\tt "sug.csv"} is the name of the file, which we import directly
85
from the R/qtl website.  {\tt genotypes} indicates the codes used for
86
the genotypes; {\tt alleles} indicates single-character codes to be
87
used in plots and such.
88

89
{\tt read.cross} loads the data from the file and formats it into
90
a special cross object, which is then assigned to {\tt sug} via the
91
assignment operator {\tt <-}.
92

93
The data are from an intercross between BALB/cJ and CBA/CaJ; only male
94
offspring were considered.  There are four phenotypes: blood pressure,
95
heart rate, body weight, and heart weight.  We will focus on the blood
96
pressure phenotype, will consider just the \Sexpr{nind(sug)} individuals with
97
genotype data and, for simplicity, will focus on the autosomes.
98

99

100
\bigskip
101
{\sffamily \textbf{Summaries}}
102
\nopagebreak
103

104
The data object {\tt sug} is complex; it contains the genotype
105
data, phenotype data and genetic map.  R has a certain amount of
106
``object oriented'' facilities, so that calls to functions like
107
{\tt summary} and {\tt plot} are interpreted appropriately for the object
108
considered.
109

110
The object {\tt sug} has ``class'' {\tt "cross"}, and so calls to
111
{\tt summary} and {\tt plot} are actually sent to the functions
112
{\tt summary.cross} and {\tt plot.cross}.
113

114
Use {\tt summary()} to get a quick summary of the data.  (This also
115
performs a variety of checks of the integrity of the data.)
116

117
<<summary_cross>>=
118
summary(sug)
119
@
120

121
We see that this is an intercross with \Sexpr{nind(sug)} individuals.
122
There are \Sexpr{nphe(sug)} phenotypes, and genotype data at
123
\Sexpr{totmar(sug)} markers across the \Sexpr{nchr(sug)} autosomes.  The genotype
124
data is quite complete.
125

126
Use {\tt plot()} to get a summary plot of the data.
127

128
<<summary_plot, fig.height=8>>=
129
plot(sug)
130
@
131

132
The plot in the upper-left shows the pattern of missing genotype data, with
133
black pixels corresponding to missing genotypes.  The next plot shows
134
the genetic map of the typed markers.  The following plots are
135
histograms or bar plots for the six phenotypes.  The last two
136
``phenotypes'' are sex (with 1 corresponding to males) and mouse ID.
137

138

139
\bigskip
140
{\sffamily \textbf{Single-QTL analysis}}
141
\nopagebreak
142

143
Let's now proceed to QTL mapping via a single-QTL model.
144

145
We first calculate the QTL genotype probabilities, given the
146
observed marker data, via the function {\tt calc.genoprob}.  This is
147
done at the markers and at a grid along the chromosomes.  The argument
148
{\tt step} is the density of the grid (in cM), and defines the
149
density of later QTL analyses.
150

151
<<calc_genoprob>>=
152
sug <- calc.genoprob(sug, step=1)
153
@
154

155
The output of {\tt calc.genoprob} is the same cross object as input,
156
with additional information (the QTL genotype probabilities) inserted.  We
157
assign this back to the original object (writing over the previous
158
data), though it could have also been assigned to a new object.
159

160
To perform a single-QTL genome scan, we use the function {\tt scanone}.
161
By default, it performs standard interval mapping (that is, maximum
162
likelihood via the EM algorithm).  Also, by default, it considers the
163
first phenotype in the input cross object (in this case, blood
164
pressure).
165

166
<<scanone>>=
167
out.em <- scanone(sug)
168
@
169

170
The output has ``class'' {\tt "scanone"}.  The {\tt summary}
171
function is passed to the function {\tt summary.scanone}, and gives
172
the maximum LOD score on each chromosome.
173

174
<<summary_scanone>>=
175
summary(out.em)
176
@
177

178
Alternatively, we can give a threshold, e.g., to only see those
179
chromosomes with LOD $>$ 3.
180

181
<<summary_w_threshold>>=
182
summary(out.em, threshold=3)
183
@
184

185
We can plot the results as follows.
186

187
<<plot_scanone>>=
188
plot(out.em)
189
@
190

191
We can do the genome scan via Haley-Knott regression by calling
192
{\tt scanone} with the argument {\tt method="hk"}.
193

194
<<scanone_hk>>=
195
out.hk <- scanone(sug, method="hk")
196
@
197

198
We may plot the two sets of LOD curves together in a single call
199
to {\tt plot}.
200

201
<<plot_em_and_hk>>=
202
plot(out.em, out.hk, col=c("blue", "red"))
203
@
204

205
Alternatively, we could do the following (figure not included, for brevity):
206

207
<<plot_em_and_hk_alt, eval=FALSE>>=
208
plot(out.em, col="blue")
209
plot(out.hk, col="red", add=TRUE)
210
@
211

212
It's perhaps more informative to plot the differences:
213

214
<<plot_diff>>=
215
plot(out.hk - out.em, ylim=c(-0.3, 0.3),
216
     ylab="LOD(HK)-LOD(EM)")
217
@
218

219
\bigskip
220
{\sffamily \textbf{Permutation tests}}
221
\nopagebreak
222

223
To perform a permutation test, to get a genome-wide significance
224
threshold or genome-scan-adjusted p-values, we use {\tt scanone} just as
225
before, but with an additional argument, {\tt n.perm}, indicating the
226
number of permutation replicates.  It's quickest to use Haley-Knott
227
regression.
228

229
<<scanone_perm>>=
230
operm <- scanone(sug, method="hk", n.perm=1000)
231
@
232

233
A histogram of the results (the 1000 genome-wide maximum LOD
234
scores) is obtained as follows:
235

236
<<plot_perm>>=
237
plot(operm)
238
@
239

240
Significance thresholds may be obtained via the {\tt summary}
241
function:
242

243
<<summary_perm>>=
244
summary(operm)
245
summary(operm, alpha=c(0.05, 0.2))
246
@
247

248
The permutation results may be used along with
249
the {\tt scanone} results to have significance thresholds and
250
p-values calculated automatically:
251

252
<<summary_scanone_w_perm>>=
253
summary(out.hk, perms=operm, alpha=0.2, pvalues=TRUE)
254
@
255

256
\bigskip
257
{\sffamily \textbf{Interval estimates of QTL location}}
258
\nopagebreak
259

260
For the blood pressure phenotype, we've seen good evidence for QTL on
261
chromosomes 7 and 15.  Interval estimates of the location of QTL are
262
commonly obtained via 1.5-LOD support intervals, which may be
263
calculated via the function {\tt lodint}.  Alternatively, an
264
approximate Bayes credible interval may be obtained with
265
{\tt bayesint}.
266

267
To obtain the 1.5-LOD support interval and 95\% Bayes interval
268
for the QTL on chromosome 7, type the following.
269
The first and last rows define the ends of the intervals; the middle
270
row is the estimated QTL location.
271

272
<<lodint_bayesint>>=
273
lodint(out.hk, chr=7)
274
bayesint(out.hk, chr=7)
275
@
276

277
It is sometimes useful to identify the closest flanking markers;
278
use {\tt expandtomarkers=TRUE}:
279

280
<<expandtomarkers>>=
281
lodint(out.hk, chr=7, expandtomarkers=TRUE)
282
bayesint(out.hk, chr=7, expandtomarkers=TRUE)
283
@
284

285
We can calculate the 2-LOD support interval and the 99\% Bayes
286
interval as follows.
287

288
<<lodint_2>>=
289
lodint(out.hk, chr=7, drop=2)
290
bayesint(out.hk, chr=7, prob=0.99)
291
@
292

293
The intervals for the chr 15 locus may be calculated as follows.
294

295
<<lodint_chr15>>=
296
lodint(out.hk, chr=15)
297
bayesint(out.hk, chr=15)
298
@
299

300

301
\bigskip
302
{\sffamily \textbf{R and package versions used}}
303
\nopagebreak
304

305
<<sessionInfo, include=TRUE, echo=TRUE, results='markup'>>=
306
sessionInfo()
307
@
308

309
\end{document}
310

311