Description: Jupyter notebook CDS-102/Lab Week 09 - Statistical distribution of temperatures in Washington, DC/CDS-102 Lab Week 09 Workbook.ipynb
CDS-102: Lab 9 Workbook
March 30, 2017
# Run this code block to load the Tidyverse package
.libPaths(new = "~/Rlibs")
library(tidyverse)

# The dataset is in the file "MDWASHDC_JAN1995_DEC2016.csv"
dc.temps <- read.csv("MDWASHDC_JAN1995_DEC2016.csv")
head(dc.temps)
The code below plots the Probability Mass Function (PMF) histogram of the average daily temperatures in
the full dataset for each month of the year using the geom_histogram() and ggplot() functions. It uses the
facet_wrap() function to arrange the histograms as a 12-panel plot.
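The code cell for this step is not present here, so the following is only a minimal sketch of how such a faceted PMF histogram could be built. The helper name plot_monthly_pmf is hypothetical, and the month column name `month` is an assumption (only `t.avg` appears elsewhere in this workbook). It uses after_stat(), which requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical helper: `month` is an assumed column name; `t.avg` is the
# average daily temperature column used later in this workbook.
plot_monthly_pmf <- function(temps) {
  ggplot(temps, aes(x = t.avg)) +
    # With binwidth = 1 the bar heights (density) sum to 1 within each
    # panel, so each bar is the fraction of days in that 1-degree bin.
    geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                   fill = "cyan3", color = "black") +
    # One panel per month, arranged as a 3 x 4 grid
    facet_wrap(~ month, ncol = 4) +
    labs(x = "Average daily temperature", y = "Probability")
}

# Usage, once dc.temps has been loaded:
# plot_monthly_pmf(dc.temps)
```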
The code below creates the normal distribution model for the month of June (all years) using the summary
statistics computed in task 1 by generating the Probability Density Function (PDF). It then stores
the computed values of the model in a new two-column tibble named jun.model.
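As a rough illustration of this step, the sketch below evaluates the normal PDF on a grid of temperatures and stores the result in a two-column tibble. The helper name make_normal_model, the column names `t.avg` and `density`, and the grid of 200 points spanning four standard deviations either side of the mean are all assumptions made for the example, not details taken from the lab.

```r
library(tibble)

# Hypothetical helper: builds a normal model from a vector of temperatures.
make_normal_model <- function(temps, n = 200) {
  mu    <- mean(temps, na.rm = TRUE)   # sample mean
  sigma <- sd(temps, na.rm = TRUE)     # sample standard deviation
  # Grid of temperatures covering mu +/- 4 sigma (assumed range)
  x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = n)
  # Column 1: temperature; column 2: normal PDF evaluated at that temperature
  tibble(t.avg = x, density = dnorm(x, mean = mu, sd = sigma))
}

# Usage, assuming dc.temps.june holds the June rows of dc.temps:
# jun.model <- make_normal_model(dc.temps.june$t.avg)
```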
The code below creates a new plot containing the average daily temperature PMF histogram and the normal
distribution model for June (all years). Note whether or not the model visually agrees with the histogram.
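One possible way to draw this overlay is sketched below. It assumes the model is a data frame or tibble with columns named `t.avg` and `density` (assumed names), and the helper name plot_pmf_with_model is hypothetical. Because the histogram uses a bin width of 1, its bar heights are on the same scale as the PDF values, so the two layers can share a y-axis.

```r
library(ggplot2)

# Hypothetical helper: `temps` needs a t.avg column; `model` is assumed to
# have columns t.avg (temperature) and density (normal PDF value).
plot_pmf_with_model <- function(temps, model) {
  ggplot(temps, aes(x = t.avg)) +
    # Observed distribution, normalized so the bars sum to 1
    geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                   fill = "cyan3", color = "black") +
    # Normal model drawn on top of the histogram
    geom_line(data = model, aes(x = t.avg, y = density),
              color = "black", linewidth = 1) +
    labs(x = "Average daily temperature", y = "Probability")
}

# Usage, with the June subset and model from this workbook:
# plot_pmf_with_model(dc.temps.june, jun.model)
```

The `linewidth` argument requires ggplot2 3.4.0 or later; older versions use `size` instead.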
The code below creates a qqplot for the average temperature distribution in June. A theoretical line is
computed and included for comparison.
# Find the 1st and 3rd quartiles (0.25 and 0.75 quantiles)
qq_y <- quantile(dc.temps.june$t.avg, c(0.25, 0.75))
# Find the matching normal values on the x-axis
qq_x <- qnorm(c(0.25, 0.75))
# Compute line slope
qq_slope <- diff(qq_y) / diff(qq_x)
# Compute line intercept
qq_int <- qq_y - qq_slope * qq_x

qqplot.june <- ggplot(dc.temps.june) +
  geom_qq(aes(sample = t.avg), color = "cyan3") +
  geom_abline(intercept = qq_int, slope = qq_slope, color = "black")

ggsave("qqplot.june.png", plot = qqplot.june, device = "png",
       scale = 1, width = 5, height = 4)
qqplot.june
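As an aside, ggplot2 3.0.0 and later provide geom_qq_line(), which by default draws the line through the first and third quartiles, the same construction as the manual calculation above. The sketch below shows that shorter alternative; the helper name plot_qq_with_line is hypothetical and the lab itself uses the manual version.

```r
library(ggplot2)

# Hypothetical helper: Q-Q plot with the reference line added by ggplot2.
plot_qq_with_line <- function(temps) {
  ggplot(temps, aes(sample = t.avg)) +
    # Sample quantiles against theoretical normal quantiles
    geom_qq(color = "cyan3") +
    # Reference line through the 0.25 and 0.75 quantiles (the default)
    geom_qq_line(color = "black")
}

# Usage:
# plot_qq_with_line(dc.temps.june)
```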
Lab Task 6
The code below creates a 12-panel series of qqplots (without theoretical lines) for each month (all years)
using facet_wrap(). Note whether the trend for June applies to the other months.
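The code cell for this task is not shown here; a minimal sketch of a faceted Q-Q plot might look like the following. As before, the column name `month` is an assumption and plot_monthly_qq is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: `month` is an assumed column name in the dataset.
plot_monthly_qq <- function(temps) {
  ggplot(temps, aes(sample = t.avg)) +
    # Quantiles are computed separately within each panel
    geom_qq(color = "cyan3") +
    # One Q-Q plot per month, arranged as a 3 x 4 grid
    facet_wrap(~ month, ncol = 4)
}

# Usage, once dc.temps has been loaded:
# plot_monthly_qq(dc.temps)
```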
Report the mean for the month of March with a 68% and a 95% confidence interval.
ci.95 <- 2 * june.sd
cat("The 95% confidence interval for the unfiltered dataset is ", june.mean, "+-", ci.95, "\n")
cat("The 68% confidence interval for the unfiltered dataset is ", june.mean, "+-", june.sd)
june.mean + ci.95
june.mean - ci.95
june.mean + june.sd
june.mean - june.sd
The 95% confidence interval for the unfiltered dataset is 74.23167 +- 29.43316
The 68% confidence interval for the unfiltered dataset is 74.23167 +- 14.71658
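The intervals above are one and two standard deviations either side of the mean. Under a normal model, those ranges cover roughly 68.3% and 95.4% of values, which is where the 68% and 95% labels come from. The short check below is a sketch using only base R; the variable names cover.1sd and cover.2sd are introduced here for illustration.

```r
# Fraction of a normal distribution within 1 and 2 standard deviations
# of the mean, computed from the standard normal CDF.
cover.1sd <- pnorm(1) - pnorm(-1)
cover.2sd <- pnorm(2) - pnorm(-2)

cat("Within 1 sd:", round(cover.1sd, 4), "\n")
cat("Within 2 sd:", round(cover.2sd, 4), "\n")
```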