CoCalc Public Filestmp / 2014-10-03-170445.sagews
Authors: Harald Schilly, ℏal Snyder, William A. Stein

## Introduction to Computational Statistics

### ** Common Discrete Probability Distributions **

#### The binomial, geometric, hypergeometric, poisson, and negative binomial distributions.

For each of the five above distributions:

• Choose some parameters (not trivial like 0 or 1).
• Use the *** d.... ( ) *** function to generate the probability distribution function and plot it.
• Use the *** r.... ( ) *** function to generate 1000 data points.
• Use *** table ( ) *** and *** lines ( ) *** or *** points ( ) *** and appropriate scaling to plot it on the same graph as the distribution function, so that they approximately match.
• Use *** mean ( ) *** and *** var ( ) *** to check the mean and variance of the data; compare to the theoretical values.

#### Binomial Distribution

##### Parameters

$\quad \displaystyle n \in \mathbb{N_0} \, - \, number \, of \, trials.$ $\quad \displaystyle p \in \lbrack 0,1 \rbrack\, - \, success \, probability \, in \, each \, trial.$ $\quad \displaystyle y \in \left\{0, ... , n \right\}\, - \, number \, of \, successes.$

##### Probability mass function, expected value, variance, skewness and excess kurtosis

$\quad \displaystyle p(y) = \binom{n}{y} p^y (1 - p)^{n-y}, \quad y \in \{0, 1, \ldots, n\}.$ $\quad \displaystyle E(X) = np.$ $\quad \displaystyle Var(X) = np(1-p).$

$\quad Skewness = \displaystyle \frac{1-2p}{\sqrt(np(1-p))}.$ $\quad Ex. kurtosis = \displaystyle \frac{1-6p(1-p)}{np(1-p)}.$

##### ** $\mathcal B(n = 1,000, \: p = 0.25).$ **
%r
# sample size 1,000
n <- 1e3

# random binomial data
x.binom <- rbinom(n=n, size=15, prob=.25)

##### Probability Mass Function (pmf) Plot: Binomial
%r
plot(0:15,dbinom(x=0:15, size=15, prob=.25), xlab="x", ylab="P(X=x)", ylim=c(0, .3),
col = "tomato", type = "h", lwd = 6)

title(expression("Density Plot of Random Binomial Data " * phantom(" vs. B(15, 0.25)")),
col.main = "dodgerblue")
title(expression(phantom("Density Plot of Random Binomial Data ") * "vs." * phantom("B(15, 0.25)"),
col.main = "black"))
title(expression(phantom("Density Plot of Random Binomial Data vs. ") * "B(15, 0.25)"),
col.main = "tomato")

points((table(x.binom) / n), col = adjustcolor("dodgerblue", alpha.f = 0.3), lwd = 20)

text(10, .29, paste("mean = ", (mean(x.binom)), "vs. E(X) = 3.7500"), cex = 1.2)
text(10, .27, paste("variance = ", format(var(x.binom), digits = 4), "vs. Var(X) = 2.8125"), cex = 1.2)

##### ** Cumulative Distribution Function (cdf) Plot: Binomial Distribution **
%r
# Empirical Cumulative Distribution Function
plot(ecdf(x.binom), xlab="x", ylab="P(X <= x)",
col = "maroon", cex = 2, lwd = 6, pch = 21,
main= "Cumulative Distribution Function \n Bionomial(n=1000, p=0.25)")

# gridlines
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted", lwd = par("lwd"), equilogs = TRUE)


#### ** Geometric Distribution **

##### Parameters

$\quad \displaystyle p \in ( 0,1 \rbrack\, - \, success \, probability.$ $\quad \displaystyle y \in \left\{0, ... , n \right\}\, - \, number \, of \, successes.$

Probability mass function, expected value and variance

$\quad \displaystyle p(y) = p(1 - p)^{y-1} , \quad y \in \left\{1, 2, 3,...\right\}.$ $\quad \displaystyle E(X) = \frac{1}{p}.$ $\quad \displaystyle Var(X) = \frac{1-p}{p^{2}}.$

$\quad Skewness = \displaystyle \frac{2-p}{\sqrt(1-p)}.$ $\quad Ex. kurtosis = \displaystyle \frac{2-p}{\sqrt(1-p)}.$

##### ** $\mathcal G(n = 1,000,\: p = 0.6).$ **
%r
# sample size 1,000
n <- 1e3

# random geometric data
x.geom <- rgeom(n=n, prob=.6)

# density
d.geom <- dgeom(0:10, prob = .6, log = FALSE)

##### ** Probability Mass Function (pmf): Geometric Distribution **
%r
plot(0:10, d.geom, col = "tomato", type = "h", lwd = 6,
main = "Empirical (blue) vs. Theoretical (red) \n Geometric Distribution",
xlab = "Number of Failures Before Success", ylab = "P(X=x)")

points(table(x.geom) / n, col = adjustcolor("dodgerblue", alpha.f = 0.3), lwd = 20)

text(6, .59, paste("mean = ", format(mean(x.geom), digits = 4), ", E(X) = 0.600"), cex = 1.2)
text(6, .55, paste("variance = ", format(var(x.geom), digits = 4), ", Var(X) = 1.000"), cex = 1.2)

##### ** Cumulative Distribution Function (cdf) Plot: Geometric Distribution **
%r
# Cumulative Distribution Function
plot(ecdf(x.geom), xlab="x", ylab="P(X <= x)",
col = "maroon", cex = 2, lwd = 6, pch = 21,
main= "Cumulative Distribution Function \n Geometric(n=1000, p=0.25)")

# gridlines
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted", lwd = par("lwd"), equilogs = TRUE)


#### ** Hypergeometric Distribution **

##### Parameters

$\quad \displaystyle N \in \left\{0, 1, 2, 3 ... \right\}\, - \, population \, size.$ $\quad \displaystyle r \in \left\{0, 1, 2, 3 ... , N \right\}\, - \, number \, of \, successes \, in \, N.$ $\quad \displaystyle n \in \left\{0, 1, 2, 3 ... , N \right\}\, - \, number \, of \, draws \, without \, replacement.$ $\quad \displaystyle y \in \left\{max(0, n+r-N, ... , min(r, n) \right\}\, - \, number \, of \, successes \, in \, n.$

Probability mass function, expected value and variance

$\quad \displaystyle p(y) = \frac{\binom{r}{y} \binom{N - r}{n-y}}{\binom{N}{n}}.$ $\quad \displaystyle E(X) = n \frac{nr}{N} .$ $\quad \displaystyle Var(X) = n \frac{r}{N} \frac{N-r}{N} \frac{N-n}{N-1}.$

$\quad \displaystyle Skewness = \frac{(N-2K)(N-1)^{\frac{1}{2}}(N-2n)}{[nK(N-K)(N-n)]^{\frac{1}{2}}(N-2)}.$

$\quad Ex. kurtosis = \frac{1}{nK(N-K)(N-n)(N-2)(N-3)[(N-1)N^{2}(N(N+1)-6K(N-K)-6n(N-n))+6nK(N-K)(N-n)(5N-6)]}.$

$Hypergeometric(m = 10 , \: n = 8 , \: k = 10).$

%r
# random hypergeometric data
x.hyper <- rhyper(1e3, m=10, n=8, k=10)

# density
d.hyper <- dhyper(0:10, m=10, n=8, k=10)

##### ** Probability Mass Function (pmf): Hypergeometric Distribution **
%r
plot(0:10, d.hyper, xlab = "x", ylab = "P(X=x)", ylim=c(0, .45), col = "tomato", type = "h", lwd = 6)
title(main = expression("Hypergeometric Distribution \n"), col.main = "black")
title(main = expression("Empirical (blue)" * phantom("vs. Theoretical (orange)")), col.main = "dodger blue")
title(main = expression(phantom("Empirical (blue)") * "vs. " * phantom("Theoretical (orange)")), col.main = "black")
title(main = expression(phantom("Empirical (blue) vs. ") * "Theoretical (orange)"), col.main = "tomato")

points(table(x.hyper) / n, col = adjustcolor("dodgerblue", alpha.f = 0.3), lwd = 20)

text(6, .44, paste("mean = ", format(mean(x.hyper), digits = 4), " vs. E(X) = *"), cex = 1.2)
text(6, .41, paste("variance = ", format(var(x.hyper), digits = 4), " vs. Var(X) = *"), cex = 1.2)

##### ** Cumulative Distribution Function (cdf) Plot: Hypergeometric Distribution **
%r
# Empirical Cumulative Distribution Function
plot(ecdf(x.hyper), xlab="x", ylab="P(X <= x)",
col = "maroon", cex = 2, lwd = 6, pch = 21,
main= "Cumulative Distribution Function \n Hypereometric(nn=1000, m=10, n=8, k=10)")

# gridlines
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted", lwd = par("lwd"), equilogs = TRUE)


#### ** Poisson Distribution **

##### Parameters

$\quad \displaystyle \lambda \gt 0.$ $\quad \displaystyle y \in \left\{0, 1, 2, 3,..., \lambda \right\}\, - \, number \, of \, successes.$

Probability mass function, mean and variance

$\quad \displaystyle p(y) = e^{-\lambda} \frac{\lambda^y}{y!} \,for \ n \in \mathbb{N}, \, where \,y \gt 0.$ $\quad \displaystyle E(X) = Var(X) = \lambda.$

$\quad \displaystyle Skewness = \lambda^{-\frac{1}{2}} .$ $\quad \displaystyle Ex. kurtosis = \lambda^{-1} .$

##### ** $Pois(\lambda=5)$ **