| Download
Think Stats by Allen B. Downey Think Stats is an introduction to Probability and Statistics for Python programmers.
This is the accompanying code for this book.
Project: Support and Testing
Views: 7138License: GPL3
"""This file contains code used in "Think Stats",1by Allen B. Downey, available from greenteapress.com23Copyright 2014 Allen B. Downey4License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html5"""67from __future__ import print_function, division89import thinkstats210import thinkplot1112import math13import random14import numpy as np1516from scipy import stats17from estimation import RMSE, MeanError181920"""This file contains a solution to exercises in Think Stats:2122Exercise 8.12324In this chapter we used $\xbar$ and median to estimate $\mu$, and25found that $\xbar$ yields lower MSE.26Also, we used $S^2$ and $S_{n-1}^2$ to estimate $\sigma$, and found that27$S^2$ is biased and $S_{n-1}^2$ unbiased.2829Run similar experiments to see if $\xbar$ and median are biased estimates30of $\mu$.31Also check whether $S^2$ or $S_{n-1}^2$ yields a lower MSE.3233My conclusions:34351) xbar and median yield lower mean error as m increases, so neither36one is obviously biased, as far as we can tell from the experiment.37382) The biased estimator of variance yields lower RMSE than the unbiased39estimator, by about 10%. And the difference holds up as m increases.404142Exercise 8.24344Suppose you draw a sample with size $n=10$ from a population45with an exponential disrtribution with $\lambda=2$. Simulate46this experiment 1000 times and plot the sampling distribution of47the estimate $\lamhat$. Compute the standard error of the estimate48and the 90\% confidence interval.4950Repeat the experiment with a few different values of $n$ and make51a plot of standard error versus $n$.52531) With sample size 10:5455standard error 0.89671791154556confidence interval (1.2901330772324622, 3.8692334892427911)57582) As sample size increases, standard error and the width of59the CI decrease:606110 0.90 (1.3, 3.9)62100 0.21 (1.7, 2.4)631000 0.06 (1.9, 2.1)6465All three confidence intervals contain the actual value, 2.666768Exercise 8.36970In games like hockey and soccer, the time between goals is71roughly exponential. So you could estimate a team's goal-scoring rate72by observing the number of goals they score in a game. This73estimation process is a little different from sampling the time74between goals, so let's see how it works.7576Write a function that takes a goal-scoring rate, {\tt lam}, in goals77per game, and simulates a game by generating the time between goals78until the total time exceeds 1 game, then returns the number of goals79scored.8081Write another function that simulates many games, stores the82estimates of {\tt lam}, then computes their mean error and RMSE.8384Is this way of making an estimate biased? Plot the sampling85distribution of the estimates and the 90\% confidence interval. What86is the standard error? What happens to sampling error for increasing87values of {\tt lam}?8889My conclusions:90911) RMSE for this way of estimating lambda is 1.492932) The mean error is small and decreases with m, so this estimator94appears to be unbiased.9596One note: If the time between goals is exponential, the distribution97of goals scored in a game is Poisson.9899See https://en.wikipedia.org/wiki/Poisson_distribution100101"""102103def Estimate1(n=7, m=100000):104"""Mean error for xbar and median as estimators of population mean.105106n: sample size107m: number of iterations108"""109mu = 0110sigma = 1111112means = []113medians = []114for _ in range(m):115xs = [random.gauss(mu, sigma) for i in range(n)]116xbar = np.mean(xs)117median = np.median(xs)118means.append(xbar)119medians.append(median)120121print('Experiment 1')122print('mean error xbar', MeanError(means, mu))123print('mean error median', MeanError(medians, mu))124125126def Estimate2(n=7, m=100000):127"""RMSE for biased and unbiased estimators of population variance.128129n: sample size130m: number of iterations131"""132mu = 0133sigma = 1134135estimates1 = []136estimates2 = []137for _ in range(m):138xs = [random.gauss(mu, sigma) for i in range(n)]139biased = np.var(xs)140unbiased = np.var(xs, ddof=1)141estimates1.append(biased)142estimates2.append(unbiased)143144print('Experiment 2')145print('RMSE biased', RMSE(estimates1, sigma**2))146print('RMSE unbiased', RMSE(estimates2, sigma**2))147148149def SimulateSample(lam=2, n=10, m=1000):150"""Sampling distribution of L as an estimator of exponential parameter.151152lam: parameter of an exponential distribution153n: sample size154m: number of iterations155"""156def VertLine(x, y=1):157thinkplot.Plot([x, x], [0, y], color='0.8', linewidth=3)158159estimates = []160for j in range(m):161xs = np.random.exponential(1/lam, n)162lamhat = 1/np.mean(xs)163estimates.append(lamhat)164165stderr = RMSE(estimates, lam)166print('standard error', stderr)167168cdf = thinkstats2.Cdf(estimates)169ci = cdf.Percentile(5), cdf.Percentile(95)170print('confidence interval', ci)171VertLine(ci[0])172VertLine(ci[1])173174# plot the CDF175thinkplot.Cdf(cdf)176thinkplot.Save(root='estimation2',177xlabel='estimate',178ylabel='CDF',179title='Sampling distribution')180181return stderr182183184def SimulateGame(lam):185"""Simulates a game and returns the estimated goal-scoring rate.186187lam: actual goal scoring rate in goals per game188"""189goals = 0190t = 0191while True:192time_between_goals = random.expovariate(lam)193t += time_between_goals194if t > 1:195break196goals += 1197198# estimated goal-scoring rate is the actual number of goals scored199L = goals200return L201202203def Estimate4(lam=2, m=1000000):204205estimates = []206for i in range(m):207L = SimulateGame(lam)208estimates.append(L)209210print('Experiment 4')211print('rmse L', RMSE(estimates, lam))212print('mean error L', MeanError(estimates, lam))213214pmf = thinkstats2.Pmf(estimates)215216thinkplot.Hist(pmf)217thinkplot.Show()218219220def main():221thinkstats2.RandomSeed(17)222223Estimate1()224Estimate2()225226print('Experiment 3')227for n in [10, 100, 1000]:228stderr = SimulateSample(n=n)229print(n, stderr)230231Estimate4()232233234if __name__ == '__main__':235main()236237238