

\hypersetup{pdfpagemode=UseNone} % don't show bookmarks on initial view
\hypersetup{colorlinks, urlcolor={blue}}

% revise margins



{\sffamily \textbf{An example knitr/\LaTeX{} document}}

\href{}{Karl W Broman}

This is a portion of the ``\href{}{A shorter tour of R/qtl}''
tutorial, developed here in multiple formats to illustrate the use of knitr.
This particular document is written with \href{}{LaTeX}.

<<knitr_options, include=FALSE>>=
opts_chunk$set(fig.width=12, fig.height=4, fig.path='RnwFigs/',
               warning=FALSE, message=FALSE, tidy=FALSE)
# install R/qtl package if necessary:
if(!require("qtl")) install.packages("qtl", repos="")
@

{\sffamily \textbf{Preliminaries}}

To use R/qtl, you first need to install the package.
Type (within R) {\tt install.packages("qtl")}.
(This needs to be done just once.)

You then load the R/qtl package using the {\tt library} function:

<<load_qtl>>=
library(qtl)
@

This needs to be done every time you start R. (There is a way to
have the package loaded automatically every time, but we won't discuss
that here.)

To get help on the functions and data sets in R
(and in R/qtl), use {\tt help()} or {\tt ?}. For example, to view the help
file for the {\tt read.cross} function, type one of the following:

<<help, eval=FALSE>>=
help(read.cross)
?read.cross
@

{\sffamily \textbf{Data import}}

We will consider data from
\href{}{Sugiyama et al.,
Physiol Genomics 10:5--12, 2002}. Load the data into R/qtl as

<<load_cross>>=
sug <- read.cross("csv", "", "sug.csv",
                  genotypes=c("CC", "CB", "BB"),
                  alleles=c("C", "B"))
@

The function {\tt read.cross} is for importing data into R/qtl.
{\tt "sug.csv"} is the name of the file, which we import directly
from the R/qtl website.  {\tt genotypes} indicates the codes used for
the genotypes; {\tt alleles} indicates single-character codes to be
used in plots and such.

{\tt read.cross} loads the data from the file and formats it into
a special cross object, which is then assigned to {\tt sug} via the
assignment operator {\tt <-}.

The data are from an intercross between BALB/cJ and CBA/CaJ; only male
offspring were considered.  There are four phenotypes: blood pressure,
heart rate, body weight, and heart weight.  We will focus on the blood
pressure phenotype, will consider just the \Sexpr{nind(sug)} individuals with
genotype data and, for simplicity, will focus on the autosomes.

{\sffamily \textbf{Summaries}}

The data object {\tt sug} is complex; it contains the genotype
data, phenotype data and genetic map.  R has some
``object-oriented'' facilities, so that calls to functions like
{\tt summary} and {\tt plot} are interpreted appropriately for the object
they are given.

The object {\tt sug} has ``class'' {\tt "cross"}, and so calls to
{\tt summary} and {\tt plot} are actually sent to the functions
{\tt summary.cross} and {\tt plot.cross}.
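
The dispatch mechanism can be illustrated outside R/qtl with a minimal
sketch in base R.  The class name {\tt "toycross"} and the object
{\tt toy} below are hypothetical and exist only for this illustration;
they are not part of R/qtl.

```r
# A toy object with class "toycross" (hypothetical; not an R/qtl class)
toy <- structure(list(n.ind=5, n.mar=3), class="toycross")

# A summary method for that class; summary(toy) is dispatched to it
summary.toycross <- function(object, ...) {
  cat("Toy cross with", object$n.ind, "individuals and",
      object$n.mar, "markers\n")
  invisible(object)
}

class(toy)
summary(toy)   # equivalent to calling summary.toycross(toy)
```

The same lookup by class is what routes {\tt summary(sug)} to
{\tt summary.cross}.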

Use {\tt summary()} to get a quick summary of the data.  (This also
performs a variety of checks of the integrity of the data.)

<<summary_cross>>=
summary(sug)
@


We see that this is an intercross with \Sexpr{nind(sug)} individuals.
There are \Sexpr{nphe(sug)} phenotypes, and genotype data at
\Sexpr{totmar(sug)} markers across the \Sexpr{nchr(sug)} autosomes.  The genotype
data are quite complete.

Use {\tt plot()} to get a summary plot of the data.

<<summary_plot, fig.height=8>>=
plot(sug)
@

The plot in the upper-left shows the pattern of missing genotype data, with
black pixels corresponding to missing genotypes.  The next plot shows
the genetic map of the typed markers.  The following plots are
histograms or bar plots for the six phenotypes.  The last two
``phenotypes'' are sex (with 1 corresponding to males) and mouse ID.

{\sffamily \textbf{Single-QTL analysis}}

Let's now proceed to QTL mapping via a single-QTL model.

We first calculate the QTL genotype probabilities, given the
observed marker data, via the function {\tt calc.genoprob}.  This is
done at the markers and at a grid along the chromosomes.  The argument
{\tt step} is the spacing of the grid (in cM), and determines the
resolution of later QTL analyses.

<<calc_genoprob>>=
sug <- calc.genoprob(sug, step=1)
@

The output of {\tt calc.genoprob} is the same cross object as input,
with additional information (the QTL genotype probabilities) inserted.  We
assign this back to the original object (writing over the previous
data), though it could have also been assigned to a new object.

To perform a single-QTL genome scan, we use the function {\tt scanone}.
By default, it performs standard interval mapping (that is, maximum
likelihood via the EM algorithm).  Also, by default, it considers the
first phenotype in the input cross object (in this case, blood
pressure).

<<scanone_em>>=
out.em <- scanone(sug)
@

The output has ``class'' {\tt "scanone"}.  A call to {\tt summary}
is dispatched to the function {\tt summary.scanone}, which gives
the maximum LOD score on each chromosome.

<<summary_em>>=
summary(out.em)
@


Alternatively, we can give a threshold, e.g., to only see those
chromosomes with LOD $>$ 3.

<<summary_em_threshold>>=
summary(out.em, threshold=3)
@

We can plot the results as follows.

<<plot_em>>=
plot(out.em)
@


We can do the genome scan via Haley-Knott regression by calling
{\tt scanone} with the argument {\tt method="hk"}.

<<scanone_hk>>=
# out.hk: the result name is assumed here, by analogy with out.em
out.hk <- scanone(sug, method="hk")
@

We may plot the two sets of LOD curves together in a single call
to {\tt plot}.

<<plot_em_and_hk>>=
plot(out.em, out.hk, col=c("blue", "red"))
@

Alternatively, we could do the following (figure not included, for brevity):

<<plot_em_and_hk_alt, eval=FALSE>>=
plot(out.em, col="blue")
plot(out.hk, col="red", add=TRUE)
@

It's perhaps more informative to plot the differences:

<<plot_diff>>=
plot(out.hk - out.em, ylim=c(-0.3, 0.3),
     ylab="LOD(HK) - LOD(EM)")
@

{\sffamily \textbf{Permutation tests}}

To perform a permutation test, to get a genome-wide significance
threshold or genome-scan-adjusted p-values, we use {\tt scanone} just as
before, but with an additional argument, {\tt n.perm}, indicating the
number of permutation replicates.  It's quickest to use Haley-Knott
regression.

<<scanone_perm>>=
operm <- scanone(sug, method="hk", n.perm=1000)
@

A histogram of the results (the 1000 genome-wide maximum LOD
scores) is obtained as follows:

<<plot_perm>>=
plot(operm)
@


Significance thresholds may be obtained via the {\tt summary}
function:

<<summary_perm>>=
summary(operm, alpha=c(0.05, 0.2))
@
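
Conceptually, the thresholds are upper quantiles of the genome-wide
maximum LOD scores obtained after shuffling the phenotypes relative to
the genotypes.  The following is a minimal sketch in base R under
simplifying assumptions: the genotypes and phenotype are simulated, the
LOD score comes from a plain one-way ANOVA at each marker (marker
regression, not Haley-Knott regression on a grid), and the helper
{\tt lod.scan} is hypothetical.  It is not how {\tt scanone} is
implemented.

```r
set.seed(1)                        # make the sketch reproducible
n.ind <- 100; n.mar <- 10; n.perm <- 200

# Simulated intercross-style genotypes (coded 1/2/3) and a phenotype
# with no QTL effect at all (both are made-up data)
geno  <- matrix(sample(1:3, n.ind * n.mar, replace=TRUE,
                       prob=c(0.25, 0.5, 0.25)), nrow=n.ind)
pheno <- rnorm(n.ind)

# LOD at each marker: (n/2) * log10(RSS0 / RSS1), where RSS1 is the
# residual sum of squares around the genotype-group means
lod.scan <- function(y, g) {
  rss0 <- sum((y - mean(y))^2)
  apply(g, 2, function(m) {
    rss1 <- sum((y - ave(y, m))^2)
    (length(y) / 2) * log10(rss0 / rss1)
  })
}

# Shuffle phenotypes relative to genotypes; keep the genome-wide maximum
max.lod <- replicate(n.perm, max(lod.scan(sample(pheno), geno)))

# 5% and 20% thresholds: the 95th and 80th percentiles of the maxima
thr <- quantile(max.lod, probs=1 - c(0.05, 0.2))
thr
```

Taking the maximum over all markers within each replicate is what
makes the resulting thresholds genome-wide rather than pointwise.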

The permutation results may be used along with
the {\tt scanone} results to have significance thresholds and
p-values calculated automatically:

<<summary_hk_perms>>=
summary(out.hk, perms=operm, alpha=0.2, pvalues=TRUE)
@

{\sffamily \textbf{Interval estimates of QTL location}}

For the blood pressure phenotype, we've seen good evidence for QTL on
chromosomes 7 and 15.  Interval estimates of the location of QTL are
commonly obtained via 1.5-LOD support intervals, which may be
calculated via the function {\tt lodint}.  Alternatively, an
approximate Bayes credible interval may be obtained with
{\tt bayesint}.

To obtain the 1.5-LOD support interval and 95\% Bayes interval
for the QTL on chromosome 7, type the following.
The first and last rows define the ends of the intervals; the middle
row is the estimated QTL location.

<<lodint_chr7>>=
lodint(out.hk, chr=7)
bayesint(out.hk, chr=7)
@
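
To make the idea of a LOD support interval concrete, here is a
simplified sketch in base R.  The LOD curve is made up, the helper
{\tt lod.interval} is hypothetical, and widening the interval by one
grid point on each side is an assumption of this sketch;
{\tt lodint} may differ in its details.

```r
# Made-up LOD curve on a 1-cM grid along one chromosome
pos <- 0:20
lod <- c(0.2, 0.5, 0.9, 1.4, 2.0, 2.7, 3.5, 4.2, 4.8, 5.1, 5.0,
         4.6, 3.9, 3.4, 2.8, 2.1, 1.5, 1.0, 0.6, 0.3, 0.1)

lod.interval <- function(pos, lod, drop=1.5) {
  peak <- which.max(lod)
  keep <- which(lod > max(lod) - drop)   # points within `drop` of the peak
  lo <- max(min(keep) - 1, 1)            # step out one grid point on the left
  hi <- min(max(keep) + 1, length(lod))  # and one on the right
  data.frame(pos=pos[c(lo, peak, hi)], lod=lod[c(lo, peak, hi)],
             row.names=c("lower", "peak", "upper"))
}

lod.interval(pos, lod)           # 1.5-LOD support interval
lod.interval(pos, lod, drop=2)   # wider 2-LOD support interval
```

As in the output described above, the first and last rows give the
ends of the interval and the middle row gives the position of the peak.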

It is sometimes useful to identify the closest flanking markers;
use {\tt expandtomarkers=TRUE}:

<<lodint_expand>>=
lodint(out.hk, chr=7, expandtomarkers=TRUE)
bayesint(out.hk, chr=7, expandtomarkers=TRUE)
@

We can calculate the 2-LOD support interval and the 99\% Bayes
interval as follows.

<<lodint_2lod>>=
lodint(out.hk, chr=7, drop=2)
bayesint(out.hk, chr=7, prob=0.99)
@

The intervals for the chr 15 locus may be calculated as follows.

<<lodint_chr15>>=
lodint(out.hk, chr=15)
bayesint(out.hk, chr=15)
@

{\sffamily \textbf{R and package versions used}}

<<sessionInfo, include=TRUE, echo=TRUE, results='markup'>>=
sessionInfo()
@