Lab 3 - Quantifying Uncertainty Lec 2

Name:

I collaborated with:

1

In the winter of 2014-2015, there was an outbreak of measles among Disneyland visitors. Among the 110 California cases, 49 were unvaccinated and 61 were vaccinated. Thus, somewhat more vaccinated than unvaccinated people got sick. This seems counterintuitive -- if the measles vaccine works, shouldn't most sick people be unvaccinated? The key fact is that 91% of the California population is vaccinated, so only 9% is unvaccinated. Among the people who got sick, 44.5% were unvaccinated, which is much higher than the 9% in the general population.
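As a quick check of the arithmetic above, the proportion of sick people who were unvaccinated can be computed directly from the case counts. This is only a minimal sketch; the variable names are illustrative and not part of the lab.

```python
# Case counts from the outbreak description
unvaccinated_cases = 49
vaccinated_cases = 61
total_cases = unvaccinated_cases + vaccinated_cases

# Proportion of sick people who were unvaccinated
prop_unvaccinated = unvaccinated_cases / total_cases

# Share of the general population that is unvaccinated (100% - 91%)
population_unvaccinated = 1 - 0.91

print(f"Unvaccinated among cases: {prop_unvaccinated:.1%}")
print(f"Unvaccinated in population: {population_unvaccinated:.0%}")
```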

Now, consider a similar measles outbreak in which another 110 people get sick. We would not expect exactly 44.5% to be unvaccinated -- the numbers will vary. We would like to know how much we can expect them to vary. In this lab, we will use resampling to quantify this and create a confidence interval representing the expected results.

2

In [1]:

# Import libraries
import numpy as np
import matplotlib as mat
import seaborn as sns
%matplotlib inline

3

4

In [ ]:

#TODO

5

The simplest way to find out how much the proportion of unvaccinated individuals would vary in another outbreak would be to actually have another outbreak. Obviously, this is both impractical and unethical. Therefore, we do the next best thing. In a move of dazzling intellectual chutzpah, we will treat the sample as the population. We can do this because, just as a sample proportion is an estimate of the population proportion, the sample variability is an estimate of the population variability.

The procedure is fairly simple. Just as when simulating null hypotheses, we will draw random samples of data points. But instead of sampling from a box representing a null hypothesis, we will sample (with replacement, just as before) from our *actual data*. We will pull out samples of the same size as the original and compute whatever we computed for the original. This resampling procedure is commonly called bootstrapping, and the resampled values are what we use to construct a confidence interval.
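One possible way to carry out this resampling is sketched below. It is not the lab's official solution. The 0/1 encoding of the data, the variable names, the fixed random seed, and the choice of 10,000 resamples are all assumptions made for illustration (the resample count is inferred from the later use of the 50th and 9950th elements).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable (assumption)

# Encode the observed outbreak: 1 = unvaccinated, 0 = vaccinated
cases = np.array([1] * 49 + [0] * 61)

n_resamples = 10_000  # assumed number of resamples
resampled_props = np.empty(n_resamples)

for i in range(n_resamples):
    # Draw a sample of the same size as the original, with replacement
    resample = rng.choice(cases, size=len(cases), replace=True)
    # Compute the same statistic as for the original data:
    # the proportion of unvaccinated individuals
    resampled_props[i] = resample.mean()
```

Each entry of `resampled_props` is the unvaccinated proportion from one simulated outbreak of 110 cases.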

6

7

In [10]:

#TODO

8

9

In [11]:

#TODO

10

We will now find the range in which unvaccinated proportions from other outbreaks would fall 99% of the time. This is called the 99% confidence interval.

Finding the confidence interval involves two steps.

- Find cutoff values by sorting the list of re-sampled values from smallest to largest and finding the 50th and 9950th elements.
- Perform a calculation with the cutoffs to get the actual 99% confidence interval.

11

To sort a list or 1-D array, use the command `listname.sort()`. This sorts the original list in place (rather than making a copy), so be careful with it if you think you might need the original. For re-sampling, this behavior is exactly what we want.
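The difference between sorting in place and sorting a copy can be seen in a small example. The numbers here are made up purely for illustration.

```python
values = [0.47, 0.41, 0.52, 0.44]  # made-up values for illustration
original_order = list(values)      # keep a copy of the unsorted list

# .sort() rearranges the list itself and returns None
result = values.sort()

# sorted() leaves its argument untouched and returns a new sorted list
sorted_copy = sorted(original_order)

print(values)          # now in ascending order
print(result)          # None
print(original_order)  # unchanged
```

If the original order matters, `sorted(listname)` is the safer choice. NumPy arrays also have an in-place `.sort()` method.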

12

We find the confidence interval by sorting the list of re-sampled values from smallest to largest and finding the 50th and 9950th elements. If these are 0.33 and 0.9, respectively, our 99% percentile confidence interval is (0.33, 0.9). Note: We will focus on percentile confidence intervals for this lab.
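A percentile confidence interval of this kind might be computed as in the sketch below, assuming 10,000 resampled proportions, so that the 50th and 9950th sorted values cut off 0.5% in each tail. The data encoding, seed, and variable names are illustrative assumptions rather than the lab's required approach.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for repeatability (assumption)

# 1 = unvaccinated, 0 = vaccinated, matching the 49 / 61 split
cases = np.array([1] * 49 + [0] * 61)

# Draw all 10,000 resamples at once: one row per resample
resamples = rng.choice(cases, size=(10_000, cases.size), replace=True)
resampled_props = resamples.mean(axis=1)

# Sort in place, smallest to largest
resampled_props.sort()

# Python indexing starts at 0, so the 50th element has index 49
# and the 9950th element has index 9949
lower = resampled_props[49]
upper = resampled_props[9949]
ci_99 = (lower, upper)
print(ci_99)
```

The exact endpoints depend on the random draws, so no specific output is shown here.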

13

In [ ]:

#TODO

14

To find the 50th and 9950th elements of a list or array, you can use indexing, which identifies each element by its position, called its *index*. In Python, indexing starts with 0, so the first element of a list with *k* elements has index 0 and the last element has index *k−1*.
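Zero-based indexing is easiest to see on a short list. The list below is a made-up example with *k* = 5 elements.

```python
letters = ["a", "b", "c", "d", "e"]  # k = 5 elements

first = letters[0]                  # 1st element has index 0
third = letters[2]                  # 3rd element has index 2
last = letters[len(letters) - 1]    # last element has index k - 1 = 4

print(first, third, last)
```

In general, the *n*th element of a list sits at index *n* − 1.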

15

*HINT: Think about some smaller numbers.*

16

In [12]:

#TODO

17

18

In [6]:

#TODO

19

20

In [7]:

#TODO

21

22

In [8]:

#TODO

23

24

In [9]:

#TODO

25