In the winter of 2014-2015, there was an outbreak of measles among Disneyland visitors. Among the 110 California cases, 49 were unvaccinated and 61 were vaccinated. Thus, somewhat more vaccinated than unvaccinated people got sick. This seems counterintuitive: if the measles vaccine works, shouldn't most sick people be unvaccinated? The key point is that 91% of the California population is vaccinated, so only 9% is unvaccinated. Yet 49 of the 110 people who got sick (44.5%) were unvaccinated, a far larger share than the 9% in the general population.
Now, consider a similar measles outbreak in which another 110 people get sick. We would not expect exactly 44.5% to be unvaccinated -- the numbers will vary. We would like to know how much we can expect them to vary. In this lab, we will use resampling to quantify this and create a confidence interval representing the expected results.
1. In this example, our measure is the proportion of sick people who are unvaccinated. Calculate the observed measure for the Disneyland measles outbreak, which we'll call Mobs, and save it in a variable to use later.
In [ ]:
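As a minimal sketch of this calculation, using the counts given in the introduction. The variable names `n_unvaccinated`, `n_vaccinated`, `n_sick`, and `M_obs` are illustrative choices, not names the lab requires.

```python
# Counts from the Disneyland outbreak described above
n_unvaccinated = 49
n_vaccinated = 61
n_sick = n_unvaccinated + n_vaccinated  # 110 California cases

# Observed measure: proportion of sick people who were unvaccinated
M_obs = n_unvaccinated / n_sick
print(round(M_obs, 3))  # 0.445
```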
The simplest way to find out how much the proportion of unvaccinated individuals would vary in another outbreak would be to actually have another outbreak. Obviously, this is both impractical and unethical. Therefore, we do the next best thing. In a move of dazzling intellectual chutzpah, we will treat the sample as the population. We can do this because, just as a sample proportion is an estimate of the population proportion, the sample variability is an estimate of the population variability.
The procedure is fairly simple. Just as when simulating null hypotheses, we will make random samples of data points. But instead of sampling from a box representing a null hypothesis, we will sample (with replacement, just as before) from our actual data. We will pull out samples of the same size as the original and compute whatever we computed for the original. This procedure is called resampling, and we will use its results to construct a confidence interval.
2. Make a box model representing the sick population and resample it 10 times, saving the unvaccinated proportion in a list. Look at the list to make sure your results make sense.
3. Now, resample the data 10,000 times, saving the unvaccinated proportion as before. You may want to plot a histogram to make sure these numbers make sense.
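One possible way to set up steps 2 and 3 is sketched below. It assumes the box holds one ticket per sick person (1 for unvaccinated, 0 for vaccinated) and uses only Python's standard `random` module. The helper name `resample_proportion` and the fixed seed are assumptions made for this illustration; the lab may expect different tools.

```python
import random

random.seed(0)  # assumption: fixed seed only so the sketch is repeatable

# Box model: one ticket per sick person (1 = unvaccinated, 0 = vaccinated)
box = [1] * 49 + [0] * 61

def resample_proportion(box):
    """Draw len(box) tickets with replacement; return the unvaccinated proportion."""
    sample = random.choices(box, k=len(box))
    return sum(sample) / len(sample)

# Step 2: ten resamples, few enough to inspect by eye
small_run = [resample_proportion(box) for _ in range(10)]
print(small_run)

# Step 3: 10,000 resamples
resampled = [resample_proportion(box) for _ in range(10_000)]
print(min(resampled), max(resampled))
```

The resampled proportions should cluster around the observed 44.5%. A histogram of `resampled` is a quick way to check this; it is omitted here to keep the sketch free of plotting dependencies.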
We will now find the range in which unvaccinated proportions from other outbreaks would fall 99% of the time. This is called the 99% confidence interval.
Finding the confidence interval involves two steps.
Find cutoff values by sorting the list of re-sampled values from smallest to largest and finding the 50th and 9950th elements.
Perform a calculation with the cutoffs to get the actual 99% confidence interval.
To sort a list or 1-D array, use the command listname.sort(). This sorts the original list (rather than making a copy), so be careful with it if you think you might need the original. For re-sampling, this behavior is exactly what we want.
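The difference between sorting in place and sorting a copy can be seen in a small example. The list `values` below is made-up illustrative data, not output from the lab.

```python
values = [0.47, 0.41, 0.45, 0.39]  # made-up numbers for illustration

# sorted() returns a new sorted list and leaves the original untouched
ordered_copy = sorted(values)
print(values)        # still in the original order

# list.sort() sorts the list in place and returns None
result = values.sort()
print(values)        # now sorted from smallest to largest
print(result)        # None
```

If you need to keep the unsorted data, use `sorted()` and store its result instead of calling `.sort()`.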
We find the confidence interval by sorting the list of re-sampled values from smallest to largest and finding the 50th and 9950th elements. If these are 0.33 and 0.9, respectively, our 99% percentile confidence interval is (0.33, 0.9). Note: We will focus on percentile confidence intervals for this lab.
4. Why do we want the 50th and 9950th elements in particular?
In [ ]:
To find the 50th and 9950th elements of a list or array, you can use what’s called indexing. This identifies each element in a list by its position, often called its index. In Python, indexing starts with 0, so the first element of a list with k elements has index 0 and the last element has index k−1.
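A short illustration of zero-based indexing on a small list may help. The list `letters` is a made-up example with k = 5 elements.

```python
letters = ["a", "b", "c", "d", "e"]  # k = 5 elements

first = letters[0]                 # 1st element has index 0
third = letters[2]                 # 3rd element has index 2
last = letters[len(letters) - 1]   # last element has index k - 1
print(first, third, last)          # a c e
```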
5. What are the list indices of the 50th and 9950th elements of a list?
HINT: Think about some smaller numbers.
6. Find Mlower, which is the 50th element of the list, and Mupper, which is the 9950th element of the list.
7. Find the lower bound of the confidence interval by computing 2 × Mobs − Mupper, where Mobs is the actual observed value defined above.
8. Find the upper bound of the confidence interval by computing 2 × Mobs − Mlower.
9. Write a sentence reporting and interpreting your confidence interval.
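Steps 6 through 8 can be sketched end to end as follows. This is one possible approach, not the only valid one. It rebuilds the resampled list so the block runs on its own, and it assumes the same 1/0 ticket box as a model of the 110 cases. The names `M_obs`, `M_lower`, `M_upper`, `ci_lower`, and `ci_upper`, along with the fixed seed, are illustrative assumptions.

```python
import random

random.seed(1)  # assumption: fixed seed only so the sketch is repeatable

# Box of 110 tickets: 1 = unvaccinated, 0 = vaccinated
box = [1] * 49 + [0] * 61
M_obs = sum(box) / len(box)  # observed unvaccinated proportion

# 10,000 resamples, each the same size as the original data
resampled = []
for _ in range(10_000):
    sample = random.choices(box, k=len(box))
    resampled.append(sum(sample) / len(sample))

resampled.sort()  # in-place sort, smallest to largest

# Counting from 1, the 50th and 9950th elements sit at indices 49 and 9949
M_lower = resampled[49]
M_upper = resampled[9949]

# Bounds as defined in steps 7 and 8
ci_lower = 2 * M_obs - M_upper
ci_upper = 2 * M_obs - M_lower
print(ci_lower, ci_upper)
```

The exact bounds depend on the random draws, so your numbers will differ slightly from run to run unless you fix the seed.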