CDS-102: Lab 9 Workbook
Statistical distribution of temperatures in Washington, DC
Helena Gray
March 30, 2017
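The first six rows of the daily temperature dataset, where t.avg is the average daily temperature in °F:

month | day | year | t.avg |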
---|---|---|---|
1 | 1 | 1995 | 40.6 |
1 | 2 | 1995 | 39.8 |
1 | 3 | 1995 | 29.3 |
1 | 4 | 1995 | 33.0 |
1 | 5 | 1995 | 20.9 |
1 | 6 | 1995 | 27.0 |
## Lab Task 1
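The code below uses the summarise() function to generate a report of summary statistics (mean(), median(), min(), max(), and sd()) for the average daily temperature, grouped by month. Several months report a minimum of -99.0, which is not a plausible daily average for Washington, DC. It is most likely a missing-value code that was not filtered out, so it pulls those months' means down and inflates their standard deviations (compare June's sd of 14.7 with July's 4.8).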
month | mean | max | min | med | sd |
---|---|---|---|---|---|
1 | 36.19589 | 63.1 | -99.0 | 36.10 | 11.903005 |
2 | 38.54678 | 62.8 | 11.9 | 38.55 | 8.666396 |
3 | 46.83534 | 76.4 | -99.0 | 46.60 | 10.634093 |
4 | 56.95727 | 79.8 | -99.0 | 56.90 | 10.009018 |
5 | 66.27434 | 85.5 | 50.2 | 65.95 | 7.144027 |
6 | 74.23167 | 89.6 | -99.0 | 75.55 | 14.716578 |
7 | 79.70176 | 92.8 | 65.5 | 79.80 | 4.813713 |
8 | 77.95806 | 91.0 | -99.0 | 78.20 | 10.525838 |
9 | 71.11848 | 86.1 | -99.0 | 72.10 | 11.103055 |
10 | 59.88578 | 79.5 | 40.4 | 59.95 | 7.446588 |
11 | 49.30621 | 70.7 | 28.2 | 49.30 | 7.769721 |
12 | 40.26364 | 65.9 | -99.0 | 40.75 | 13.539830 |
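Because the -99.0 readings distort the statistics above, one option is to drop them before summarising. The sketch below does this in base R on a hypothetical mini data set: the six rows shown at the top of this workbook plus one made-up row carrying the -99 code. Treating -99 as a missing-value marker is an assumption, and the names `temps`, `clean`, and `stats_for_month` are illustrative.

```r
# Hypothetical mini data set: the six rows shown at the top of this workbook
# plus one made-up row carrying the -99 code, to show how it is handled.
temps <- data.frame(
  month = rep(1, 7),
  day   = 1:7,
  year  = rep(1995, 7),
  t.avg = c(40.6, 39.8, 29.3, 33.0, 20.9, 27.0, -99.0)
)

# Assumption: -99 marks a missing reading, so drop those rows before summarising.
clean <- temps[temps$t.avg != -99, ]

# Same statistics as Task 1, one row per month.
stats_for_month <- function(m) {
  x <- clean$t.avg[clean$month == m]
  data.frame(month = m, mean = mean(x), max = max(x), min = min(x),
             med = median(x), sd = sd(x))
}
summary_by_month <- do.call(rbind, lapply(sort(unique(clean$month)), stats_for_month))
print(summary_by_month)
```

With dplyr, the equivalent step would be adding filter(t.avg != -99) before group_by(month) and summarise().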
## Lab Task 2
The code below uses the ggplot() and geom_histogram() functions to plot a probability mass function (PMF) histogram of the average daily temperatures in the full dataset for each month of the year. It uses the facet_wrap() function to arrange the twelve months as a 12-panel plot.
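A PMF histogram divides each bin's count by the total number of days, so the bar heights sum to 1. A minimal base-R sketch of that normalisation, using the six January 1995 values from the first table and a hypothetical bin width of 5°F (the function name `pmf_hist` is illustrative, not part of the lab):

```r
# Stand-in data: the six January 1995 averages from the first table.
t_avg <- c(40.6, 39.8, 29.3, 33.0, 20.9, 27.0)

# Empirical PMF over fixed-width bins: count per bin divided by the number of days.
pmf_hist <- function(x, binwidth) {
  lo <- floor(min(x) / binwidth) * binwidth
  hi <- ceiling(max(x) / binwidth) * binwidth
  h  <- hist(x, breaks = seq(lo, hi, by = binwidth), plot = FALSE)
  data.frame(bin_mid = h$mids, prob = h$counts / length(x))
}

pmf <- pmf_hist(t_avg, binwidth = 5)
print(pmf)
```

For one PMF per month, the same function can be applied to each month's values, for example lapply(split(df$t.avg, df$month), pmf_hist, binwidth = 1), where df stands for the full data set.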
## Lab Task 3
The code below builds a normal distribution model for the month of June (all years) by evaluating the probability density function (PDF) with the mean and standard deviation computed in Task 1. It then stores the computed values of the model in a new two-column tibble named jun.model.
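A minimal sketch of the model in base R, assuming the June mean and standard deviation from the Task 1 table (which still include the -99.0 readings) and an evaluation grid spanning four standard deviations either side of the mean. It builds a two-column data frame named jun.model; tibble::tibble() with the same arguments would give a tibble instead.

```r
# June summary statistics copied from the Task 1 table
# (these still include the -99 readings).
jun_mean <- 74.23167
jun_sd   <- 14.716578

# Temperatures at which to evaluate the model: mean +/- 4 sd, 200 points.
temp <- seq(jun_mean - 4 * jun_sd, jun_mean + 4 * jun_sd, length.out = 200)

# Two-column model: temperature and the normal probability density at that temperature.
jun.model <- data.frame(temp = temp,
                        density = dnorm(temp, mean = jun_mean, sd = jun_sd))
head(jun.model)
```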
## Lab Task 4
The code below overlays the normal distribution model on the PMF histogram of average daily temperatures for June (all years). Note whether or not the model visually agrees with the histogram.
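One detail matters when overlaying the two: a PMF bar is the probability of a whole bin, while the PDF is a density per °F, so they share a scale only if the PDF is multiplied by the bin width (with 1°F bins the factor is 1). The sketch below checks this, assuming the June statistics from Task 1 and a hypothetical 2°F bin width; the rescaled PDF at each bin midpoint closely matches the exact bin probability from pnorm().

```r
# June statistics from the Task 1 table.
jun_mean <- 74.23167
jun_sd   <- 14.716578
binwidth <- 2  # hypothetical histogram bin width, in degrees F

# Bin edges and midpoints over the bulk of the model.
edges <- seq(40, 110, by = binwidth)
mids  <- head(edges, -1) + binwidth / 2

# Exact probability the model assigns to each bin.
bin_prob <- diff(pnorm(edges, mean = jun_mean, sd = jun_sd))

# PDF rescaled by the bin width: this is the curve that lines up with PMF bar heights.
pdf_scaled <- dnorm(mids, mean = jun_mean, sd = jun_sd) * binwidth

max(abs(bin_prob - pdf_scaled))  # small: the two agree closely
```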
## Lab Task 5
The code below creates a quantile-quantile (Q-Q) plot of June's average daily temperatures against the normal distribution, with a theoretical reference line included for comparison.
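A base-R sketch of the same computation without drawing, using the six January 1995 values from the first table as stand-in data. qqnorm(..., plot.it = FALSE) returns the theoretical and sample quantiles, and the reference line is computed through the first and third quartiles, which is the default rule of qqline() (ggplot2's geom_qq_line() also defaults to the quartiles).

```r
# Stand-in data: the six January 1995 averages from the first table.
t_avg <- c(40.6, 39.8, 29.3, 33.0, 20.9, 27.0)

# Theoretical normal quantiles (qq$x) paired with the sample values (qq$y), no plot drawn.
qq <- qqnorm(t_avg, plot.it = FALSE)

# Reference line through the first and third quartiles, as qqline() does by default.
probs     <- c(0.25, 0.75)
y_q       <- quantile(t_avg, probs, names = FALSE)
x_q       <- qnorm(probs)
slope     <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

c(intercept = intercept, slope = slope)
```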
## Lab Task 6
The code below uses facet_wrap() to create a 12-panel series of Q-Q plots, one for each month (all years), without theoretical lines. Note whether the trend seen for June also applies to the other months.
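As a numeric complement to eyeballing twelve panels, the correlation between each month's sample values and their theoretical normal quantiles summarises how straight each Q-Q plot is (values near 1 mean the points lie close to a straight line). This sketch uses synthetic data, not the lab data: 30 normal draws per month, with one -99 code injected into June to mimic the placeholder seen in Task 1. A single such reading shows up as an isolated point far below the others in that month's Q-Q plot and lowers its correlation.

```r
set.seed(1)
# Synthetic stand-in data (NOT the lab data): 30 normal "days" per month.
month <- rep(1:12, each = 30)
temp  <- rnorm(length(month), mean = 60, sd = 10)
# Inject one -99 code into June to mimic the placeholder seen in Task 1.
temp[month == 6][1] <- -99

# Per month: correlation between theoretical normal quantiles and sample values.
qq_cor <- sapply(split(temp, month), function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)
})
print(round(qq_cor, 3))
```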
## Lab Task 7
The normal distribution model is used with the qnorm() function to compute the temperature at the 10th percentile (the 0.10 quantile) for the month of June (all years).
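A one-line sketch with base R's qnorm(), assuming the June mean and standard deviation from the Task 1 table (still affected by the -99.0 readings). With those values the 10th percentile comes out at about 55.4°F.

```r
jun_mean <- 74.23167  # June mean from the Task 1 table
jun_sd   <- 14.716578 # June sd from the Task 1 table

# Temperature below which the model places 10% of June days.
jun_q10 <- qnorm(0.10, mean = jun_mean, sd = jun_sd)
jun_q10
```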
## Lab Task 8
The normal distribution model is used with the pnorm() function to compute the percentile of 83°F for the month of June (all years), that is, the probability that a June day's average temperature is at or below 83°F.
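The matching sketch with pnorm(), under the same assumption about the June statistics from Task 1. With those values, 83°F sits at roughly the 72nd percentile.

```r
jun_mean <- 74.23167  # June mean from the Task 1 table
jun_sd   <- 14.716578 # June sd from the Task 1 table

# Fraction of June days the model places at or below 83 degrees F.
jun_pct_83 <- pnorm(83, mean = jun_mean, sd = jun_sd)
jun_pct_83
```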
## Key Questions
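For the month of June, what is the probability that any given day will have a temperature of 83°F or higher? The code below uses the pnorm() function to find this probability. Because pnorm() returns the lower-tail probability P(T ≤ 83), the answer is the complement, 1 - pnorm(83, mean, sd), which can also be written as pnorm(83, mean, sd, lower.tail = FALSE).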
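A sketch of the upper-tail calculation, assuming the June statistics from Task 1. Both forms give the same value, roughly 0.28.

```r
jun_mean <- 74.23167  # June mean from the Task 1 table
jun_sd   <- 14.716578 # June sd from the Task 1 table

# Upper tail directly ...
p_83_or_higher <- pnorm(83, mean = jun_mean, sd = jun_sd, lower.tail = FALSE)
# ... or as the complement of the lower tail; the two agree.
p_check <- 1 - pnorm(83, mean = jun_mean, sd = jun_sd)
p_83_or_higher
```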
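How cold are the coldest 10% of days? The qnorm() function is used to find the 10th-percentile temperature, the threshold at or below which the coldest 10% of days fall. This is the cutoff for the coldest days, not their average temperature.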
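If the average temperature of the coldest 10% of days is what is wanted, it differs from the qnorm() threshold. Under a normal model with mean μ and standard deviation σ, the mean of the values below the α-quantile is μ - σ φ(z_α)/α, where z_α = qnorm(α) and φ is the standard normal density. A sketch assuming June (as in Task 7) and the Task 1 statistics: the threshold is about 55.4°F, while the average of the coldest 10% is about 48.4°F.

```r
jun_mean <- 74.23167  # June mean from the Task 1 table
jun_sd   <- 14.716578 # June sd from the Task 1 table
alpha    <- 0.10

# Threshold: 10% of June days are at or below this temperature under the model.
cold_threshold <- qnorm(alpha, mean = jun_mean, sd = jun_sd)

# Average of the coldest 10% (mean of the normal truncated above at the threshold):
# mu - sigma * dnorm(z) / alpha, with z the standard-normal alpha-quantile.
z <- qnorm(alpha)
cold_mean <- jun_mean - jun_sd * dnorm(z) / alpha

c(threshold = cold_threshold, mean_of_coldest = cold_mean)
```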
Report the mean daily temperature for March with 68% and 95% confidence intervals.
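A sketch of both readings of this question, using the March mean and standard deviation from the Task 1 table (also affected by the -99.0 readings). Under the 68-95 rule, mean ± 1 sd (about 36.2 to 57.5°F) and mean ± 2 sd (about 25.6 to 68.1°F) bound roughly 68% and 95% of individual March days. A confidence interval for the mean itself uses the standard error sd/√n instead; since the number of March days n is not stated here, it is left as an argument of the illustrative function `ci_for_mean()`.

```r
mar_mean <- 46.83534  # March mean from the Task 1 table
mar_sd   <- 10.634093 # March sd from the Task 1 table

# 68-95 rule: about 68% of values within 1 sd of the mean, about 95% within 2 sd.
interval <- function(k) c(lower = mar_mean - k * mar_sd, upper = mar_mean + k * mar_sd)
int_68 <- interval(1)
int_95 <- interval(2)

# Confidence interval for the mean itself: the standard error sd / sqrt(n) replaces sd.
# n (the number of March days in the data) is not given here, so it is a parameter.
ci_for_mean <- function(n, level) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = mar_mean - z * mar_sd / sqrt(n), upper = mar_mean + z * mar_sd / sqrt(n))
}

rbind(int_68, int_95)
```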