We will read the data, print it out and see what it looks like, and plot the mean January temperature vs. latitude. The commands below show how to do this.

Note that all the information following any "#" sign is just to explain what is going on. R ignores anything that follows a "#" sign.

City | Mean_Jan_Temp_F | Latitude | Jan_degreesC |
---|---|---|---|

Akron, OH | 27 | 41.05 | -2.78 |

Albany-Schenectady-Troy, NY | 23 | 42.40 | -5.00 |

Allentown, Bethlehem, PA-NJ | 29 | 40.35 | -1.67 |

Atlanta, GA | 45 | 33.45 | 7.22 |

Baltimore, MD | 35 | 39.20 | 1.67 |

Birmingham, AL | 45 | 33.31 | 7.22 |

Boston, MA | 30 | 42.15 | -1.11 |

Bridgeport-Milford, CT | 30 | 41.12 | -1.11 |

Buffalo, NY | 24 | 42.54 | -4.44 |

Canton, OH | 27 | 40.50 | -2.78 |

Chattanooga, TN-GA | 42 | 35.01 | 5.56 |

Chicago, IL | 26 | 41.49 | -3.33 |

Cincinnati, OH-KY-IN | 34 | 39.08 | 1.11 |

Cleveland, OH | 28 | 41.30 | -2.22 |

Columbus, OH | 31 | 40.00 | -0.56 |

Dallas, TX | 46 | 32.45 | 7.78 |

Dayton-Springfield, OH | 30 | 39.54 | -1.11 |

Denver, CO | 30 | 39.44 | -1.11 |

Detroit, MI | 27 | 42.06 | -2.78 |

Flint, MI | 24 | 43.00 | -4.44 |

Grand Rapids, MI | 24 | 43.00 | -4.44 |

Greensboro-Winston-Salem-High Point, NC | 40 | 36.04 | 4.44 |

Hartford, CT | 27 | 41.45 | -2.78 |

Houston, TX | 55 | 29.46 | 12.78 |

Indianapolis, IN | 29 | 39.45 | -1.67 |

Kansas City, MO | 31 | 39.05 | -0.56 |

Lancaster, PA | 32 | 40.05 | 0.00 |

Los Angeles, Long Beach, CA | 53 | 34.00 | 11.67 |

Louisville, KY-IN | 35 | 38.15 | 1.67 |

Memphis, TN-AR-MS | 42 | 35.07 | 5.56 |

Miami-Hialeah, FL | 67 | 25.45 | 19.44 |

Milwaukee, WI | 20 | 43.03 | -6.67 |

Minneapolis-St. Paul, MN-WI | 12 | 44.58 | -11.11 |

Nashville, TN | 40 | 36.10 | 4.44 |

New Haven-Meriden, CT | 30 | 41.20 | -1.11 |

New Orleans, LA | 54 | 30.00 | 12.22 |

New York, NY | 33 | 40.40 | 0.56 |

Philadelphia, PA-NJ | 32 | 40.00 | 0.00 |

Pittsburgh, PA | 29 | 40.26 | -1.67 |

Portland, OR | 38 | 45.31 | 3.33 |

Providence, RI | 29 | 41.50 | -1.67 |

Reading, PA | 33 | 40.20 | 0.56 |

Richmond-Petersburg, VA | 39 | 37.35 | 3.89 |

Rochester, NY | 25 | 43.15 | -3.89 |

St. Louis, MO-IL | 32 | 38.39 | 0.00 |

San Diego, CA | 55 | 32.43 | 12.78 |

San Francisco, CA | 48 | 37.45 | 8.89 |

San Jose, CA | 49 | 37.20 | 9.44 |

Seattle, WA | 40 | 47.36 | 4.44 |

Springfield, MA | 28 | 42.05 | -2.22 |

Syracuse, NY | 24 | 43.05 | -4.44 |

Toledo, OH | 26 | 41.40 | -3.33 |

Utica-Rome, NY | 23 | 43.05 | -5.00 |

Washington, DC-MD-VA | 37 | 38.50 | 2.78 |

Wichita, KS | 32 | 37.42 | 0.00 |

Wilmington, DE-NJ-MD | 33 | 39.45 | 0.56 |

Worcester, MA | 24 | 42.16 | -4.44 |

York, PA | 33 | 40.00 | 0.56 |

Youngstown-Warren, OH | 28 | 41.05 | -2.22 |

The above "read.csv" command produces a dataframe named
"winterdat". We can now compute summary stats, plot histograms,
boxplots, etc. for the above variables.
However, given the focus of the present tutorial,
let's scatter plot the mean January temperature vs. latitude

correlation= -0.8573135

Nest, we will standardize the x, y variables and convert
them to z-scores.
Here is how:

correlation after standardizing= -0.8573135

Next, we will let R compute a linear regression model for us.

Note that there are several, slightly different, variations on how to do this. Some examples are shown below.

Note that there are several, slightly different, variations on how to do this. Some examples are shown below.

Call:
lm(formula = yvar ~ xvar)
Coefficients:
(Intercept) xvar
118.14 -2.15
Call:
lm(formula = Mean_Jan_Temp_F ~ Latitude, data = winterdat)
Coefficients:
(Intercept) Latitude
118.14 -2.15
Call:
lm(formula = Mean_Jan_Temp_F ~ Latitude, data = winterdat)
Residuals:
Min 1Q Median 3Q Max
-10.2978 -2.6353 -0.8719 0.3965 23.6789
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 118.139 6.743 17.52 <2e-16 ***
Latitude -2.150 0.171 -12.57 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.272 on 57 degrees of freedom
Multiple R-squared: 0.735, Adjusted R-squared: 0.7303
F-statistic: 158.1 on 1 and 57 DF, p-value: < 2.2e-16

- Read the file into R
- Make a sctterplot of "deaths_per_100k" vs "gun_ownership_rate". Be sure to label your axes.
- Compute the correlation between those two variables
- Plot the same two variables in standardized form
- Construct a linear regression model to predict firearm deaths from gun ownership rate. Plot the model together with the original data.
- Plot the residuals.
- Find the $R^2$ value.
- Write a short paragraph discussing the quality and appropriateness of the linear model, based on the scatter plot, correlation, $R^2$, etc. Are there any conclusions you can draw from the model?