Description: Jupyter notebook CDS-102/Lab Week 13 - Temperature data revisited, part 2: the sine function/CDS-102 Lab Week 13 Workbook.ipynb
CDS-102: Lab 13 Workbook
Name: Helena Gray
April 26, 2017
# Run this code block to load the Tidyverse package
.libPaths(new="~/Rlibs")
library(tidyverse)
library(modelr)
# Load the save file that preloads the dataset and
# the model.sin function
load("lab13.RData")
# To change the size of any plots, copy the code snippet
# below, uncomment it, and set the size of the width
# and height.
# Note: All subsequent figures will use the same size,
# unless you change the options() snippet and run it
# again.
# options(repr.plot.width=6, repr.plot.height=4)
Lab Task 1
The code below creates the number_days column and removes any missing values from the dataset.
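The step described above can be sketched as follows. This is a minimal base R illustration, not the notebook's actual cell: the data frame `temps` and its columns `date` and `avg_temp` are hypothetical stand-ins for the lab dataset, and counting January 1, 1995 as day 0 is an assumption.

```r
# Hypothetical toy data standing in for the lab dataset
# (the names temps, date, and avg_temp are assumptions).
temps <- data.frame(
  date = as.Date(c("1995-01-01", "1995-01-02", "1995-01-03", "1995-01-04")),
  avg_temp = c(30.1, NA, 28.4, 33.0)
)

# Days elapsed since January 1, 1995 (assumed here to be day 0).
start_date <- as.Date("1995-01-01")
temps$number_days <- as.numeric(difftime(temps$date, start_date, units = "days"))

# Drop every row that contains a missing value.
temps_clean <- na.omit(temps)
```

With the Tidyverse loaded, the same idea could presumably be written with `mutate()` and `drop_na()` instead.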
In the northern hemisphere we tend to be hottest in July and coldest in January.
The difference between January 1st and July 1st is roughly 180 days.
The time it takes the Earth to complete one revolution around the Sun is roughly 365 days, or one year, so we assign this number of days to a variable named model.T. In the context of this dataset, T has a specific meaning: it is the number of days that pass before the pattern in the daily average temperatures begins to repeat itself. This is what the variable model.T represents.
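To make the meaning of the period concrete, here is a small sketch showing that a sine curve with period T returns to the same values T days later. The function `sketch_sin()` is a hypothetical stand-in for the preloaded model.sin(), whose real definition lives in lab13.RData and is not shown in this workbook. Converting the phase shift from months to days as n * T / 12 is an assumption.

```r
# Period in days, as assigned in the text.
model.T <- 365

# Hypothetical stand-in for model.sin(); the month-to-day conversion
# n * T / 12 is an assumption, not the lab's confirmed definition.
sketch_sin <- function(T, n, x) {
  sin(2 * pi * (x - n * T / 12) / T)
}

# Evaluate the curve on a few days, then again exactly one period later.
x <- c(0, 100, 200)
today_values <- sketch_sin(model.T, 0, x)
one_year_later <- sketch_sin(model.T, 0, x + model.T)

# TRUE up to floating-point tolerance: the pattern repeats after T days.
repeats <- isTRUE(all.equal(one_year_later, today_values))
```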
Lab Task 4
The code below fits models for n=1, n=2, n=3, n=4, and n=5, where n is the phase shift, and assigns them to variables with names like mod_n_1 and mod_n_2. It uses a custom function, model.sin(), which takes three inputs: the period T (given in units of days), the phase shift n (given in units of months), and the explanatory variable x, in this case the number of days since January 1, 1995.
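The fitting step might look something like the sketch below. It runs on synthetic data rather than the lab dataset, and `sketch_sin()`, `toy`, and `avg_temp` are hypothetical names. The synthetic temperatures are deliberately generated with a phase shift of 3.5 months so that the comparison across n has a known answer.

```r
set.seed(1)
model.T <- 365

# Hypothetical stand-in for model.sin(); the phase conversion is an assumption.
sketch_sin <- function(T, n, x) {
  sin(2 * pi * (x - n * T / 12) / T)
}

# Synthetic two-year series (730 days) with noise; not the real lab data.
toy <- data.frame(number_days = 0:729)
toy$avg_temp <- 55 + 20 * sketch_sin(model.T, 3.5, toy$number_days) +
  rnorm(nrow(toy), sd = 3)

# Fit one linear model per candidate phase shift n = 1, ..., 5.
fits <- lapply(1:5, function(n) {
  lm(avg_temp ~ sketch_sin(model.T, n, number_days), data = toy)
})
names(fits) <- paste0("mod_n_", 1:5)

# Collect R^2 for each model to compare the candidates.
r2 <- sapply(fits, function(m) summary(m)$r.squared)
```

Because the synthetic phase sits midway between 3 and 4, the two highest values in `r2` belong to `mod_n_3` and `mod_n_4`.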
The following code plots the models overlaid on top of the temperature data for years 1995 and 1996. It creates a tibble of model predictions using the data_grid() and gather_predictions() functions and then plots the values in this tibble by model.
The code below improves the choice of n by repeating the procedure from Tasks 5 and 6 for an n value that lies between the two "best" values from the previous task, namely 3.5. The code follows the model fitting steps again, calculates R2 and the standard deviation of the residuals, and assigns this value of n to a variable named model.n.
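A minimal sketch of this refinement, again on synthetic data, is shown below. The names `sketch_sin()`, `toy`, and `avg_temp` are hypothetical, and the synthetic series is built with a 3.5-month phase shift, so the good fit here is by construction and says nothing about the real temperature data.

```r
set.seed(1)
model.T <- 365
model.n <- 3.5  # midpoint between the two best integer values, 3 and 4

# Hypothetical stand-in for model.sin(); the phase conversion is an assumption.
sketch_sin <- function(T, n, x) {
  sin(2 * pi * (x - n * T / 12) / T)
}

# Synthetic two-year series with noise; not the real lab data.
toy <- data.frame(number_days = 0:729)
toy$avg_temp <- 55 + 20 * sketch_sin(model.T, 3.5, toy$number_days) +
  rnorm(nrow(toy), sd = 3)

# Refit with the refined phase shift.
mod <- lm(avg_temp ~ sketch_sin(model.T, model.n, number_days), data = toy)

# Goodness of fit: R^2 and the standard deviation of the residuals.
r_squared <- summary(mod)$r.squared
resid_sd <- sd(residuals(mod))
```

A higher R2 and a smaller residual standard deviation than the integer-n fits would support keeping the refined value in model.n.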