Jupyter notebook CDS-102/Lab Week 13 - Temperature data revisited, part 2: the sine function/CDS-102 Lab Week 13 Workbook.ipynb
CDS-102: Lab 13 Workbook
Name: Helena Gray
April 26, 2017
Lab Task 1
The code below creates the number_days column and removes any missing values from the dataset.
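The notebook's code cell is not reproduced here, so the following is only a minimal base-R sketch of this step. The toy table and its date column are assumptions made for illustration, and assigning the cleaned result to t.data.filtered is a guess based on the name used later in this workbook.

```r
# Hypothetical stand-in for the lab's temperature table; the `date` column
# and the values are invented for illustration.
t.data <- data.frame(
  date  = as.Date(c("1995-01-01", "1995-01-02", "1995-01-03", "1996-01-01")),
  t.avg = c(30.1, NA, 28.4, 33.0)
)

# Days elapsed since January 1, 1995, so that date itself maps to 0.
t.data$number_days <- as.numeric(t.data$date - as.Date("1995-01-01"))

# Drop every row that contains a missing value.
t.data.filtered <- na.omit(t.data)
```

With tidyverse loaded, the same two steps could be written with mutate() and drop_na() instead.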
Lab Task 2
The code below displays a plot of t.avg as a function of number_days.
Lab Task 3
The code below filters the dataset to only include temperatures from the years 1995 and 1996, assigns this to a variable named t.data.y9596, and then plots t.avg as a function of number_days.
In the northern hemisphere, temperatures tend to be highest in July and lowest in January. January 1st and July 1st are roughly 180 days apart, which is about half of the seasonal cycle. The Earth takes roughly 365 days, or one year, to complete one revolution around the Sun, so we assign 365 to a variable named model.T. In the context of this dataset, T has a specific meaning: it is the number of days that pass before the pattern of daily average temperatures begins to repeat itself.
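As a quick check of the arithmetic above, the sketch below computes the gap between January 1 and July 1 for 1995 and stores the period. Only model.T comes from the lab; half.period is a hypothetical name used for illustration.

```r
# Days between January 1 and July 1, 1995 (about half a seasonal cycle).
half.period <- as.numeric(as.Date("1995-07-01") - as.Date("1995-01-01"))

# Period of the temperature pattern, in days (one trip around the Sun).
model.T <- 365
```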
Lab Task 4
The code below creates models for n = 1, 2, 3, 4, and 5, which are candidate values of the phase shift, and assigns them to variables with names like mod_n_1 and mod_n_2. Each model uses a custom function, model.sin(), which takes three inputs: the period T (in days), the phase shift n (in months), and the explanatory variable x, in this case the number of days since January 1, 1995.
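The definition of model.sin() is not shown in this workbook, so the sketch below assumes one plausible form: a sine wave with period T whose argument is shifted by n twelfths of a period. The argument order, the function body, and the synthetic stand-in for t.data.y9596 are all assumptions; only the names model.sin, model.T, and mod_n_1 through mod_n_5 come from the text.

```r
set.seed(13)
model.T <- 365

# Assumed form: sine of period T (days), shifted by n months of length T / 12.
model.sin <- function(T, n, x) sin(2 * pi * (x - n * T / 12) / T)

# Synthetic stand-in for the 1995-1996 data (not the lab's real temperatures).
t.data.y9596 <- data.frame(number_days = 0:730)
t.data.y9596$t.avg <- 55 + 22 * model.sin(model.T, 3.5, t.data.y9596$number_days) +
  rnorm(nrow(t.data.y9596), sd = 3)

# One linear model per candidate phase shift: t.avg = A + B * model.sin(...).
mod_n_1 <- lm(t.avg ~ model.sin(model.T, 1, number_days), data = t.data.y9596)
mod_n_2 <- lm(t.avg ~ model.sin(model.T, 2, number_days), data = t.data.y9596)
mod_n_3 <- lm(t.avg ~ model.sin(model.T, 3, number_days), data = t.data.y9596)
mod_n_4 <- lm(t.avg ~ model.sin(model.T, 4, number_days), data = t.data.y9596)
mod_n_5 <- lm(t.avg ~ model.sin(model.T, 5, number_days), data = t.data.y9596)
```

Because n is fixed inside each formula, lm() only has to estimate an intercept and a single slope, which keeps every fit linear.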
Lab Task 5
The following code plots the models overlaid on top of the temperature data for years 1995 and 1996. It creates a tibble of model predictions using the data_grid() and gather_predictions() functions and then plots the values in this tibble by model.
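To show the shape of the prediction table, here is a base-R stand-in for the data_grid() and gather_predictions() pipeline. It is not the notebook's code: the data are synthetic, the form of model.sin() is assumed, and only two models are included to keep the sketch short. The result has one row per model and grid point, with the columns model, number_days, and pred.

```r
set.seed(13)
model.T <- 365
model.sin <- function(T, n, x) sin(2 * pi * (x - n * T / 12) / T)  # assumed form

# Synthetic stand-in for the 1995-1996 data.
t.data.y9596 <- data.frame(number_days = 0:730)
t.data.y9596$t.avg <- 55 + 22 * model.sin(model.T, 3.5, t.data.y9596$number_days) +
  rnorm(nrow(t.data.y9596), sd = 3)

models <- list(
  mod_n_3 = lm(t.avg ~ model.sin(model.T, 3, number_days), data = t.data.y9596),
  mod_n_4 = lm(t.avg ~ model.sin(model.T, 4, number_days), data = t.data.y9596)
)

# Grid of unique explanatory values (the role data_grid() plays).
grid <- data.frame(number_days = sort(unique(t.data.y9596$number_days)))

# Stack each model's predictions in long form (the role gather_predictions() plays).
predictions <- do.call(rbind, lapply(names(models), function(name) {
  data.frame(
    model = name,
    number_days = grid$number_days,
    pred = predict(models[[name]], newdata = grid)
  )
}))
```

A long table like this can be plotted with one line per model by mapping the model column to colour.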
Lab Task 6
The code below quantifies the quality of the different models using the R² statistic, which is extracted directly from the model variables.
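One way to pull this value out of a fitted lm object is summary(model)$r.squared. The sketch below applies that to five candidate phase shifts on synthetic data; the form of model.sin() and the helper fit.n() are assumptions made for illustration, and the numbers it produces say nothing about the lab's real dataset.

```r
set.seed(13)
model.T <- 365
model.sin <- function(T, n, x) sin(2 * pi * (x - n * T / 12) / T)  # assumed form

# Synthetic stand-in for the 1995-1996 data, built with a true shift of 3.5.
t.data.y9596 <- data.frame(number_days = 0:730)
t.data.y9596$t.avg <- 55 + 22 * model.sin(model.T, 3.5, t.data.y9596$number_days) +
  rnorm(nrow(t.data.y9596), sd = 3)

# Hypothetical helper: fit the model for one phase shift n.
fit.n <- function(n) {
  lm(t.avg ~ model.sin(model.T, n, number_days), data = t.data.y9596)
}

# R-squared for n = 1..5, read from each model's summary.
r.squared <- sapply(1:5, function(n) summary(fit.n(n))$r.squared)
names(r.squared) <- paste0("mod_n_", 1:5)

# Models ordered from best to worst fit.
ranking <- names(sort(r.squared, decreasing = TRUE))
```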
Lab Task 7
The code below refines the choice of n by repeating the procedure from Tasks 5 and 6 for a value of n that lies between the two “best” values from the previous task, which is 3.5. It repeats the model fitting steps, calculates R² and the standard deviation of the residuals, and assigns the fitted model to a variable named model.n.
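A minimal sketch of the refit follows, assuming model.n holds the fitted model (which is consistent with summary() being run on it in the next task). The data are synthetic, the form of model.sin() is assumed, and model.n.r2 and model.n.sd are hypothetical names.

```r
set.seed(13)
model.T <- 365
model.sin <- function(T, n, x) sin(2 * pi * (x - n * T / 12) / T)  # assumed form

# Synthetic stand-in for the 1995-1996 data (noise standard deviation of 3).
t.data.y9596 <- data.frame(number_days = 0:730)
t.data.y9596$t.avg <- 55 + 22 * model.sin(model.T, 3.5, t.data.y9596$number_days) +
  rnorm(nrow(t.data.y9596), sd = 3)

# Refit with the intermediate phase shift n = 3.5.
model.n <- lm(t.avg ~ model.sin(model.T, 3.5, number_days), data = t.data.y9596)

# Fit quality: R-squared and the spread of the residuals.
model.n.r2 <- summary(model.n)$r.squared
model.n.sd <- sd(residuals(model.n))
```

The standard deviation of the residuals is in the same units as t.avg, so it reads directly as a typical prediction error.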
Lab Task 8
The code below gets the values of the parameters A and B by running summary() on model.n.
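Assuming the model has the form t.avg = A + B * model.sin(...), A is the intercept and B is the slope on the sine term, and both appear in the coefficient table printed by summary(). The sketch below reads them programmatically; the data are synthetic (generated with A = 55 and B = 22), and the form of model.sin() is an assumption.

```r
set.seed(13)
model.T <- 365
model.sin <- function(T, n, x) sin(2 * pi * (x - n * T / 12) / T)  # assumed form

# Synthetic stand-in for the 1995-1996 data, generated with A = 55 and B = 22.
t.data.y9596 <- data.frame(number_days = 0:730)
t.data.y9596$t.avg <- 55 + 22 * model.sin(model.T, 3.5, t.data.y9596$number_days) +
  rnorm(nrow(t.data.y9596), sd = 3)

model.n <- lm(t.avg ~ model.sin(model.T, 3.5, number_days), data = t.data.y9596)

# Coefficient table: estimate, standard error, t value, and p value per term.
coef.table <- coef(summary(model.n))

# Row 1 is the intercept (A); row 2 is the sine term (B).
A <- coef.table[1, "Estimate"]
B <- coef.table[2, "Estimate"]
```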
Lab Task 9
The code below creates a final plot of t.data.filtered, the model and the residuals.