Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

Jupyter notebook CDS-101/Midterm Project/Energy Consumption.ipynb

Views: 67
Kernel: R (SageMath)

Midterm Project

Name : Francisca Idio

CDS 101

3/28/2017

# Set the location for R packages .libPaths(new = "~/Rlibs") # Load the Tidyverse packages library(tidyverse)
Loading tidyverse: ggplot2 Loading tidyverse: tibble Loading tidyverse: tidyr Loading tidyverse: readr Loading tidyverse: purrr Conflicts with tidy packages --------------------------------------------------- filter(): dplyr, stats lag(): dplyr, stats
# Load the Public Passenger Vehicle Lisence dataset energy <- read_csv("Energy Census and Economic Data US 2010-2014.csv")
Parsed with column specification: cols( .default = col_double(), StateCodes = col_character(), State = col_character(), Region = col_integer(), Division = col_integer(), Coast = col_integer(), `Great Lakes` = col_integer(), TotalC2010 = col_integer(), TotalC2011 = col_integer(), TotalC2012 = col_integer(), TotalC2013 = col_integer(), TotalC2014 = col_integer(), TotalP2010 = col_integer(), TotalP2011 = col_integer(), TotalP2012 = col_integer(), TotalP2013 = col_integer(), TotalP2014 = col_integer(), BiomassC2010 = col_integer(), BiomassC2011 = col_integer(), BiomassC2012 = col_integer(), BiomassC2013 = col_integer() # ... with 77 more columns ) See spec(...) for full column specifications.
print(energy)
# A tibble: 52 × 192 StateCodes State Region Division Coast `Great Lakes` TotalC2010 <chr> <chr> <int> <int> <int> <int> <int> 1 AL Alabama 3 6 1 0 1931522 2 AK Alaska 4 9 1 0 653221 3 AZ Arizona 4 8 0 0 1383531 4 AR Arkansas 3 7 0 0 1120632 5 CA California 4 9 1 0 7760629 6 CO Colorado 4 8 0 0 1513547 7 CT Connecticut 1 1 1 0 764970 8 DE Delaware 3 5 1 0 250212 9 FL Florida 3 5 1 0 4282673 10 GA Georgia 3 5 1 0 3100144 # ... with 42 more rows, and 185 more variables: TotalC2011 <int>, # TotalC2012 <int>, TotalC2013 <int>, TotalC2014 <int>, TotalP2010 <int>, # TotalP2011 <int>, TotalP2012 <int>, TotalP2013 <int>, TotalP2014 <int>, # TotalE2010 <dbl>, TotalE2011 <dbl>, TotalE2012 <dbl>, TotalE2013 <dbl>, # TotalE2014 <dbl>, TotalPrice2010 <dbl>, TotalPrice2011 <dbl>, # TotalPrice2012 <dbl>, TotalPrice2013 <dbl>, TotalPrice2014 <dbl>, # `TotalC10-11` <dbl>, `TotalC11-12` <dbl>, `TotalC12-13` <dbl>, # `TotalC13-14` <dbl>, `TotalP10-11` <dbl>, `TotalP11-12` <dbl>, # `TotalP12-13` <dbl>, `TotalP13-14` <dbl>, `TotalE10-11` <dbl>, # `TotalE11-12` <dbl>, `TotalE12-13` <dbl>, `TotalE13-14` <dbl>, # `TotalPrice10-11` <dbl>, `TotalPrice11-12` <dbl>, `TotalPrice12-13` <dbl>, # `TotalPrice13-14` <dbl>, BiomassC2010 <int>, BiomassC2011 <int>, # BiomassC2012 <int>, BiomassC2013 <int>, BiomassC2014 <int>, # CoalC2010 <int>, CoalC2011 <int>, CoalC2012 <int>, CoalC2013 <int>, # CoalC2014 <int>, CoalP2010 <int>, CoalP2011 <int>, CoalP2012 <int>, # CoalP2013 <int>, CoalP2014 <int>, CoalE2010 <dbl>, CoalE2011 <dbl>, # CoalE2012 <dbl>, CoalE2013 <dbl>, CoalE2014 <dbl>, CoalPrice2010 <dbl>, # CoalPrice2011 <dbl>, CoalPrice2012 <dbl>, CoalPrice2013 <dbl>, # CoalPrice2014 <dbl>, ElecC2010 <int>, ElecC2011 <int>, ElecC2012 <int>, # ElecC2013 <int>, ElecC2014 <int>, ElecE2010 <dbl>, ElecE2011 <dbl>, # ElecE2012 <dbl>, ElecE2013 <dbl>, ElecE2014 <dbl>, ElecPrice2010 <dbl>, # ElecPrice2011 <dbl>, ElecPrice2012 <dbl>, ElecPrice2013 <dbl>, # ElecPrice2014 <dbl>, FossFuelC2010 <int>, FossFuelC2011 <int>, # FossFuelC2012 <int>, FossFuelC2013 <int>, FossFuelC2014 <int>, # GeoC2010 <int>, GeoC2011 <int>, GeoC2012 <int>, GeoC2013 <int>, # GeoC2014 <int>, GeoP2010 <int>, GeoP2011 <int>, GeoP2012 <int>, # GeoP2013 <int>, GeoP2014 <int>, HydroC2010 <int>, HydroC2011 <int>, # HydroC2012 <int>, HydroC2013 <int>, HydroC2014 <int>, HydroP2010 <int>, # HydroP2011 <int>, HydroP2012 <int>, HydroP2013 <int>, HydroP2014 <int>, ...

Dataset Cleaning

To clean this dataset, I used the gather function to take all the other columns that were not in the original shown dataset and made a column to show if there was a blank columns or NA in there accidentally in a column that is was not supposed to be in. Also since the values in the variable columns are mostly numerical except for the column of the state names, there were no mispelling of words in the columns. Since I will be mainly using the column of total consumption of energy throughout this poroject, I made them to be all in one column using the gather function to consolidate the data and bake it look easier to read and a bit cleaner. It didn't make a lot of sence to have the Total consumption spread out and for graphing and comparison purposes, having them it one column helped me to create more organized graphs. This dataset with the consolidated data is labeled "EN1"

energy2 <- gather(energy,TotalC2011: HydroP2014, key = "key", value = "numbers", na.rm = TRUE)
print(energy2)
# A tibble: 5,200 × 94 StateCodes State Region Division Coast `Great Lakes` TotalC2010 * <chr> <chr> <int> <int> <int> <int> <int> 1 AL Alabama 3 6 1 0 1931522 2 AK Alaska 4 9 1 0 653221 3 AZ Arizona 4 8 0 0 1383531 4 AR Arkansas 3 7 0 0 1120632 5 CA California 4 9 1 0 7760629 6 CO Colorado 4 8 0 0 1513547 7 CT Connecticut 1 1 1 0 764970 8 DE Delaware 3 5 1 0 250212 9 FL Florida 3 5 1 0 4282673 10 GA Georgia 3 5 1 0 3100144 # ... with 5,190 more rows, and 87 more variables: NatGasC2010 <int>, # NatGasC2011 <int>, NatGasC2012 <int>, NatGasC2013 <int>, NatGasC2014 <int>, # NatGasE2010 <dbl>, NatGasE2011 <dbl>, NatGasE2012 <dbl>, NatGasE2013 <dbl>, # NatGasE2014 <dbl>, NatGasPrice2010 <dbl>, NatGasPrice2011 <dbl>, # NatGasPrice2012 <dbl>, NatGasPrice2013 <dbl>, NatGasPrice2014 <dbl>, # LPGC2010 <int>, LPGC2011 <int>, LPGC2012 <int>, LPGC2013 <int>, # LPGC2014 <int>, LPGE2010 <dbl>, LPGE2011 <dbl>, LPGE2012 <dbl>, # LPGE2013 <dbl>, LPGE2014 <dbl>, LPGPrice2010 <dbl>, LPGPrice2011 <dbl>, # LPGPrice2012 <dbl>, LPGPrice2013 <dbl>, LPGPrice2014 <dbl>, # GDP2010Q1 <int>, GDP2010Q2 <int>, GDP2010Q3 <int>, GDP2010Q4 <int>, # GDP2010 <dbl>, GDP2011Q1 <int>, GDP2011Q2 <int>, GDP2011Q3 <int>, # GDP2011Q4 <int>, GDP2011 <dbl>, GDP2012Q1 <int>, GDP2012Q2 <int>, # GDP2012Q3 <int>, GDP2012Q4 <int>, GDP2012 <dbl>, GDP2013Q1 <int>, # GDP2013Q2 <int>, GDP2013Q3 <int>, GDP2013Q4 <int>, GDP2013 <dbl>, # GDP2014Q1 <int>, GDP2014Q2 <int>, GDP2014Q3 <int>, GDP2014Q4 <int>, # GDP2014 <dbl>, CENSUS2010POP <int>, POPESTIMATE2010 <int>, # POPESTIMATE2011 <int>, POPESTIMATE2012 <int>, POPESTIMATE2013 <int>, # POPESTIMATE2014 <int>, RBIRTH2011 <dbl>, RBIRTH2012 <dbl>, # RBIRTH2013 <dbl>, RBIRTH2014 <dbl>, RDEATH2011 <dbl>, RDEATH2012 <dbl>, # RDEATH2013 <dbl>, RDEATH2014 <dbl>, RNATURALINC2011 <dbl>, # RNATURALINC2012 <dbl>, RNATURALINC2013 <dbl>, RNATURALINC2014 <dbl>, # RINTERNATIONALMIG2011 <dbl>, RINTERNATIONALMIG2012 <dbl>, # RINTERNATIONALMIG2013 <dbl>, RINTERNATIONALMIG2014 <dbl>, # RDOMESTICMIG2011 <dbl>, RDOMESTICMIG2012 <dbl>, RDOMESTICMIG2013 <dbl>, # RDOMESTICMIG2014 <dbl>, RNETMIG2011 <dbl>, RNETMIG2012 <dbl>, # RNETMIG2013 <dbl>, RNETMIG2014 <dbl>, key <chr>, numbers <dbl>
EN1 <- gather(energy, TotalC2010:TotalC2014, key = "TotalCYear", value = "count")
EN1
StateCodes State Region Division Coast Great Lakes 1 AL Alabama 3 6 1 0 2 AK Alaska 4 9 1 0 3 AZ Arizona 4 8 0 0 4 AR Arkansas 3 7 0 0 5 CA California 4 9 1 0 6 CO Colorado 4 8 0 0 7 CT Connecticut 1 1 1 0 8 DE Delaware 3 5 1 0 9 FL Florida 3 5 1 0 10 GA Georgia 3 5 1 0 11 HI Hawaii 4 9 1 0 12 ID Idaho 4 8 0 0 13 IL Illinois 2 3 0 1 14 IN Indiana 2 3 0 1 15 IA Iowa 2 4 0 0 16 KS Kansas 2 4 0 0 17 KY Kentucky 3 6 0 0 18 LA Louisiana 3 7 1 0 19 ME Maine 1 1 1 0 20 MD Maryland 3 5 1 0 21 MA Massachusetts 1 1 1 0 22 MI Michigan 2 3 0 1 23 MN Minnesota 2 4 0 1 24 MS Mississippi 3 6 1 0 25 MO Missouri 2 4 0 0 26 MT Montana 4 8 0 0 27 NE Nebraska 2 4 0 0 28 NV Nevada 4 8 0 0 29 NH New Hampshire 1 1 1 0 30 NJ New Jersey 1 2 1 0 ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 231 MN Minnesota 2 4 0 1 232 MS Mississippi 3 6 1 0 233 MO Missouri 2 4 0 0 234 MT Montana 4 8 0 0 235 NE Nebraska 2 4 0 0 236 NV Nevada 4 8 0 0 237 NH New Hampshire 1 1 1 0 238 NJ New Jersey 1 2 1 0 239 NM New Mexico 4 8 0 0 240 NY New York 1 2 1 1 241 NC North Carolina 3 5 1 0 242 ND North Dakota 2 4 0 0 243 OH Ohio 2 3 0 1 244 OK Oklahoma 3 7 0 0 245 OR Oregon 4 9 1 0 246 PA Pennsylvania 1 2 0 1 247 RI Rhode Island 1 1 1 0 248 SC South Carolina 3 5 1 0 249 SD South Dakota 2 4 0 0 250 TN Tennessee 3 6 0 0 251 TX Texas 3 7 1 0 252 UT Utah 4 8 0 0 253 VT Vermont 1 1 0 0 254 VA Virginia 3 5 1 0 255 WA Washington 4 9 1 0 256 WV West Virginia 3 5 0 0 257 WI Wisconsin 2 3 0 1 258 WY Wyoming 4 8 0 0 259 DC District of Columbia 3 5 0 0 260 US United States NA NA NA NA TotalP2010 TotalP2011 TotalP2012 TotalP2013 ⋯ RDOMESTICMIG2011 1 1419613 1400108 1433370 1463647 ⋯ -0.02044325 2 1738207 1641980 1563102 1513859 ⋯ -1.17513721 3 580948 617956 598039 594994 ⋯ 1.34147180 4 1247709 1391190 1472778 1432074 ⋯ 1.36951366 5 2532205 2634789 2334863 2390424 ⋯ -1.16207924 6 2561459 2750097 2921385 2838193 ⋯ 5.18339661 7 203188 195792 205073 207118 ⋯ -3.38443506 8 3575 3976 3530 3818 ⋯ 2.86684813 9 510201 500907 442188 542570 ⋯ 5.54039345 10 561796 549483 555238 581082 ⋯ 1.10581577 11 16245 18378 20485 25465 ⋯ -0.72861180 12 130652 176717 154829 138799 ⋯ 0.05833087 13 2090939 2189423 2436702 2519993 ⋯ -5.42476227 14 987568 1062319 1046014 1106441 ⋯ -1.30254263 15 673102 689146 677875 730473 ⋯ 0.09157511 16 812830 781704 793080 830704 ⋯ -3.15696832 17 2781466 2840858 2391118 2138426 ⋯ 0.59702684 18 3170742 3972994 3789540 3208044 ⋯ 0.45474712 19 151940 154764 152331 155380 ⋯ 0.06176347 20 250812 272882 245197 248554 ⋯ 0.06293021 21 108868 103058 110384 102521 ⋯ -0.52366579 22 626639 680857 626107 654767 ⋯ -4.37165740 23 429062 425422 417674 405967 ⋯ -0.62804606 24 448713 441341 390246 410271 ⋯ -1.96091120 25 198297 197714 203118 191759 ⋯ -2.24542626 26 1148154 1104165 1008736 1104932 ⋯ 3.43419996 27 399822 390808 357156 370523 ⋯ -0.64486314 28 51846 53807 60252 70208 ⋯ -2.82732878 29 156829 131919 129929 166328 ⋯ -1.63666494 30 376845 390931 393682 403225 ⋯ -5.11033711 ⋮ ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ 231 429062 425422 417674 405967 ⋯ -0.62804606 232 448713 441341 390246 410271 ⋯ -1.96091120 233 198297 197714 203118 191759 ⋯ -2.24542626 234 1148154 1104165 1008736 1104932 ⋯ 3.43419996 235 399822 390808 357156 370523 ⋯ -0.64486314 236 51846 53807 60252 70208 ⋯ -2.82732878 237 156829 131919 129929 166328 ⋯ -1.63666494 238 376845 390931 393682 403225 ⋯ -5.11033711 239 2255073 2260681 2315892 2367423 ⋯ 0.03089282 240 836703 879395 817908 872500 ⋯ -4.32545483 241 576162 572796 560887 606427 ⋯ 3.17214261 242 1253749 1518406 2135154 2634178 ⋯ 9.14542431 243 1032440 1067943 1067944 1149882 ⋯ -3.18858790 244 2576077 2704301 2950525 3137443 ⋯ 1.67426274 245 393763 513597 496254 458848 ⋯ 2.95766482 246 3064745 3870671 4730336 5873742 ⋯ -0.54676726 247 3399 3419 2822 2627 ⋯ -5.83915808 248 649978 661898 646738 690644 ⋯ 3.19120230 249 225387 248370 248616 236679 ⋯ 2.44945783 250 505731 506701 468757 544916 ⋯ 2.45252515 251 11412134 12571450 14180982 15688922 ⋯ 4.57794456 252 1073063 1128821 1118599 1122868 ⋯ -0.32882084 253 80143 80677 77979 84188 ⋯ -0.75225076 254 1096973 1085234 1045095 1023009 ⋯ 1.33171670 255 907957 1101186 1108998 1003330 ⋯ 3.44647012 256 3699729 3820203 3720223 3809750 ⋯ 0.58611685 257 329441 312552 333635 310413 ⋯ -1.09492384 258 10536570 10353148 9611304 9233869 ⋯ -0.36219433 259 95 271 277 323 ⋯ 11.33288241 260 74593106 77778786 79032062 81604540 ⋯ NA RDOMESTICMIG2012 RDOMESTICMIG2013 RDOMESTICMIG2014 RNETMIG2011 RNETMIG2012 1 -0.1684135 0.3964159 0.4201015 1.01194084 1.0013329 2 -1.9495712 -3.7893131 -13.7544937 0.94818536 1.8353761 3 -0.4208753 -0.5805623 -1.3130505 2.31780119 0.6219715 4 5.1312819 3.9104760 6.2806359 3.33662764 7.1552124 5 -1.1739507 -1.3412263 -0.8309823 2.76137748 2.7727695 6 5.5536752 6.9775832 7.5871626 6.93315930 7.6608637 7 -5.6114915 -4.7316382 -7.2862519 1.11689425 -1.0591659 8 3.5983800 3.3971710 5.1481739 5.30328177 6.2212626 9 5.1253203 4.9187834 7.0161227 11.35960582 10.7225728 10 1.8522009 -0.5768876 2.2004666 3.37500669 4.4262675 11 -2.5672885 -0.6631562 -3.6350806 4.25497620 4.3161049 12 -0.1635544 2.9865046 4.7386958 0.95294886 0.8485958 13 -5.6904719 -5.2380977 -7.3691757 -3.18271398 -3.3823391 14 -1.9541615 -0.2311505 -1.1921716 0.15591763 -0.5077296 15 -1.3988317 0.1501230 -0.2613128 1.68465491 0.1895506 16 -1.7898060 -4.3695984 -4.7601461 -1.40654930 0.4746409 17 -1.2712625 -0.5421808 -0.8589545 1.77800694 0.1999200 18 -0.1679608 -0.4968579 -1.3115694 1.94023978 1.5255891 19 -0.4675286 -1.1026255 0.3994297 0.95130816 0.5345335 20 -1.4320466 -1.5138764 -2.5673210 4.60404951 3.2782158 21 -1.6315826 -0.3254842 -2.4310476 4.78903744 3.6469429 22 -3.3942432 -2.9868108 -2.8956885 -2.54604112 -1.5226322 23 -1.6574311 -0.4143418 -1.2309691 1.73345966 0.7482767 24 -1.8894810 -1.6054616 -3.1344983 -1.33764198 -0.9667034 25 -2.2406441 -1.3487765 -1.3336080 -0.98795424 -0.8004437 26 3.5509860 5.2979490 4.4641915 4.04076780 4.4167635 27 -0.4916545 -0.4784591 -1.3603621 1.16935909 1.5144472 28 5.2844160 4.7552264 8.3909457 -0.08078082 8.1453008 29 -0.3985745 -1.9864496 0.8432006 -0.30288929 1.0456898 30 -5.5890376 -5.1205616 -6.2151265 0.45426226 -0.1989998 ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 231 -1.65743112 -0.4143418 -1.2309691 1.73345966 0.7482767 232 -1.88948096 -1.6054616 -3.1344983 -1.33764198 -0.9667034 233 -2.24064408 -1.3487765 -1.3336080 -0.98795424 -0.8004437 234 3.55098601 5.2979490 4.4641915 4.04076780 4.4167635 235 -0.49165445 -0.4784591 -1.3603621 1.16935909 1.5144472 236 5.28441598 4.7552264 8.3909457 -0.08078082 8.1453008 237 -0.39857453 -1.9864496 0.8432006 -0.30288929 1.0456898 238 -5.58903761 -5.1205616 -6.2151265 0.45426226 -0.1989998 239 -3.42012889 -5.0629404 -6.7844755 1.09814337 -2.0783084 240 -5.94680886 -5.4820494 -7.8049472 1.49434986 -0.3189971 241 3.29953845 3.8869020 3.6636405 5.04794391 5.7729809 242 15.58963681 23.1894509 12.2651006 10.67971377 17.8045736 243 -3.20406546 -1.9123029 -1.5749692 -1.81998332 -1.7622360 244 2.35073293 3.5996562 1.1322997 3.00715292 3.9555020 245 3.37611288 2.4896662 5.7404707 4.52890803 4.9325756 246 -1.35870300 -2.4839403 -2.4599014 1.57878163 0.7970514 247 -5.02599711 -4.8746647 -3.2126693 -2.05881151 -1.0319971 248 5.61300811 5.9543633 8.0408887 4.29289110 7.2294966 249 5.24394471 5.1975757 0.6616883 3.99058013 7.0598520 250 4.28628133 2.0897983 3.7574480 3.69627104 5.7299274 251 5.63098177 4.4282840 5.7785074 7.67527476 8.7241630 252 -0.03068503 1.9208122 -0.4225336 1.41976181 1.7599803 253 -2.60420825 -1.0981705 -2.4716435 0.25554166 -1.5168595 254 0.60391533 0.3041831 -2.4583298 4.97522119 5.0037998 255 1.96217689 2.2507462 3.9989250 6.41638546 5.3702911 256 0.56206796 -1.2528613 -1.4843729 1.14365578 1.1715587 257 -1.65388673 -1.3697434 -1.7270528 -0.05843103 -0.6192236 258 9.65117376 4.5478211 -4.5777881 0.32332470 10.6105245 259 10.00583847 9.7776663 1.7935725 16.80595478 15.5957903 260 NA NA NA 2.94196766 3.0519321 RNETMIG2013 RNETMIG2014 TotalCYear count 1 1.56224745 1.57796256 TotalC2010 1931522 2 -0.58569541 -10.88473403 TotalC2010 653221 3 0.50947307 -0.22176714 TotalC2010 1383531 4 6.05235310 8.41044101 TotalC2010 1120632 5 2.86612681 3.34640648 TotalC2010 7760629 6 9.05178284 9.59789807 TotalC2010 1513547 7 0.02196365 -2.55530211 TotalC2010 764970 8 6.00612012 7.71366265 TotalC2010 250212 9 10.70250086 12.70342279 TotalC2010 4282673 10 1.93363794 4.67089008 TotalC2010 3100144 11 5.76353447 2.43941415 TotalC2010 278046 12 4.03935504 5.78263774 TotalC2010 516120 13 -2.73308256 -4.85062123 TotalC2010 3955091 14 1.35562499 0.39840311 TotalC2010 2863396 15 1.88253574 1.47044899 TotalC2010 1499729 16 -2.26263009 -2.70939350 TotalC2010 1117631 17 0.85255141 0.49789857 TotalC2010 1978527 18 1.15507555 0.30908636 TotalC2010 4385758 19 -0.05720105 1.43523880 TotalC2010 415065 20 3.40085453 2.29271254 TotalC2010 1464503 21 5.26431452 3.11142579 TotalC2010 1416119 22 -0.95779330 -0.86681842 TotalC2010 2753536 23 2.16927752 1.34310939 TotalC2010 1857095 24 -0.82129781 -2.38277997 TotalC2010 1177620 25 0.10538352 0.09365317 TotalC2010 1910500 26 6.09001761 5.21574555 TotalC2010 400855 27 1.57768007 0.66871583 TotalC2010 860741 28 7.78763883 11.39453695 TotalC2010 645604 29 -0.45009045 2.37258670 TotalC2010 294473 30 0.67867877 -0.43059603 TotalC2010 2395713 ⋮ ⋮ ⋮ ⋮ ⋮ 231 2.16927752 1.34310939 TotalC2014 1912065 232 -0.82129781 -2.38277997 TotalC2014 1155549 233 0.10538352 0.09365317 TotalC2014 1903839 234 6.09001761 5.21574555 TotalC2014 403446 235 1.57768007 0.66871583 TotalC2014 864347 236 7.78763883 11.39453695 TotalC2014 660256 237 -0.45009045 2.37258670 TotalC2014 310142 238 0.67867877 -0.43059603 TotalC2014 2340188 239 -3.77898635 -5.50369841 TotalC2014 679136 240 0.56988277 -1.78094837 TotalC2014 3742892 241 6.14754287 5.80461228 TotalC2014 2554776 242 25.10448511 14.02819169 TotalC2014 640095 243 -0.39372214 -0.06103728 TotalC2014 3809648 244 5.10835669 2.60452203 TotalC2014 1679856 245 4.21784158 7.47805827 TotalC2014 987145 246 -0.21118267 -0.18679231 TotalC2014 3902434 247 -0.76828438 0.85652211 TotalC2014 204459 248 7.33368090 9.35153650 TotalC2014 1632085 249 6.94875162 2.36771385 TotalC2014 391857 250 3.50729121 5.16164300 TotalC2014 2194512 251 7.62949715 8.94472107 TotalC2014 12899498 252 3.80966870 1.44722034 TotalC2014 797995 253 0.05267388 -1.32437968 TotalC2014 139897 254 4.49933481 1.64009160 TotalC2014 2430205 255 5.70595658 7.39508290 TotalC2014 2011941 256 -0.62157876 -0.85584979 TotalC2014 752942 257 -0.24032423 -0.58797357 TotalC2014 1868867 258 5.40635592 -3.74686476 TotalC2014 535612 259 15.64924997 7.54279039 TotalC2014 178929 260 3.16821052 3.13508085 TotalC2014 98385210

Data Citation And Summary

Citation for Energy Consumption Dataset

File Name: United States Energy, Census, and GDP

Website of dataset: Kaggle.com

Link to dataset: https://www.kaggle.com/lislejoem/us_energy_census_gdp_10-14

I did not use any filters while downloading this data. The provider/ creator of this dataset got the information they put into ther own dataset from external sources such as the Bureau of Economic Analysis for the data on GDP/Census and the U.S. Department of Commerce and the U.S. Energy Information Administration for information on energy consumption

Summary

This dataset is about the U.S. Census, Gross Domestc Product, and Energy consumption of both renewable and non renewable resources. Some of the columns are labeled State/ StateCodes which is the state abbreviation, Region in which a number 1-4 is allocated to a state in a sspecific region ( i.e. 1 = Northeast, 2 = Midwest, 3 = South, 4 = West), TotalC(year) which means the total energy consumption in billion British Thermal Units (BTU) in a given year, the type of renewable/nonrenewable energy source with a "C" (consumption) at the end ("CoalC2012") to show the total consumption of that product. The rest of the columns either list the resources or tell about the human population of each state and the birth/death rate, etc. Also please take into account that the row with a lot of NA for the values is from the total of the entire United states (US). A lot of the column variables do not apply to this row, this is why there is NA present.

Data Analysis and Plots

Question 1

In the dataset, is there any region that has an extreme outlier when it comes to energy consumption in the year 2010?

To create this scatter plot I downloaded ggplot2 in the R package. I made the data used in the plot to be the energy2 dataset. After i made it into a scatterplot by using geom point and made the x-axis labeled regions and the y-axis labeled TotalC2010 and for the grapg to be color coded by the "StateCodes"

ggplot(data = energy2) + geom_point(mapping = aes(x = Region, y = TotalC2010, color = StateCodes))
Warning message: “Removed 100 rows containing missing values (geom_point).”
MIME type unknown not supported
Image in a Jupyter notebook

We see that the South and Midwestern Regions have outliers, which are Texas and California. These states are 2 of the biggest states in the U.S. and have some of the largest opulations, so it makes sence that their numbers would be outside of the frames of the other states. In other states in the 4 regions resented, we see that most states have an total of .03 billion British Themal Units

Question 2

In 2010 and 2014, which region paid the highest amount for energy consumption, were there any decreases in the total spending?

In this problem I wanted to make a graph comparing the prives from 2 years in the different divisions. Note: the prices are in billions and USD. I created a name for this new table which is energy3, and used the select function to choose the columns from the original plot I wanted to see in the graph.

energy3 <- select(EN1, Division, TotalPrice2010, TotalPrice2014)
energy3
Division TotalPrice2010 TotalPrice2014 1 6 17.82 18.64 2 9 20.13 24.43 3 8 22.25 25.94 4 7 16.90 18.87 5 9 20.97 25.31 6 8 17.40 21.25 7 1 25.62 27.84 8 5 23.50 23.77 9 5 22.05 24.91 10 5 18.25 20.95 11 9 30.43 37.38 12 8 17.14 20.43 13 3 17.39 19.02 14 3 14.89 17.94 15 4 15.68 18.24 16 4 17.70 21.46 17 6 17.16 20.75 18 7 15.06 15.49 19 1 18.77 22.62 20 5 22.83 25.25 21 1 23.33 26.36 22 3 18.47 20.01 23 4 17.04 19.65 24 6 17.90 20.78 25 4 18.84 21.94 26 8 18.26 21.72 27 4 16.68 19.26 28 8 21.22 23.52 29 1 24.21 27.86 30 2 21.28 22.78 ⋮ ⋮ ⋮ ⋮ 231 4 17.04 19.65 232 6 17.90 20.78 233 4 18.84 21.94 234 8 18.26 21.72 235 4 16.68 19.26 236 8 21.22 23.52 237 1 24.21 27.86 238 2 21.28 22.78 239 8 20.02 23.37 240 2 23.09 24.57 241 5 20.27 23.02 242 4 13.79 18.51 243 3 18.18 20.01 244 7 17.16 19.97 245 9 19.20 22.29 246 2 19.74 21.44 247 1 23.63 26.34 248 5 18.52 21.74 249 4 17.21 20.32 250 6 18.67 21.47 251 7 17.36 19.05 252 8 16.89 20.71 253 1 23.75 27.60 254 5 19.12 21.66 255 9 18.33 21.11 256 5 17.26 19.97 257 3 18.48 20.74 258 8 15.16 18.70 259 5 25.76 24.96 260 NA 18.92 21.33
ggplot(energy3) + geom_col(mapping = aes(x = Division, y =TotalPrice2010))
Warning message: “Removed 5 rows containing missing values (position_stack).”
MIME type unknown not supported
Image in a Jupyter notebook
ggplot(energy3) + geom_col(mapping = aes(x = Division, y = TotalPrice2014))
Warning message: “Removed 5 rows containing missing values (position_stack).”
MIME type unknown not supported
Image in a Jupyter notebook

In both 2010 and 2014, Division 5 has spent the most amount of money in both years, but in 2010, Division 5 actually spent a lower amount of money on energy in comparison to 2014

Question 3

For each state, what is the average consumption of energy from 2010 to 2014?

Mean_energy <- mutate(energy, mean_energy = (TotalC2010 + TotalC2011 + TotalC2012 + TotalC2013 + TotalC2014) / 5)

To add the mean column to the dataset, I assigned a name for the new column. Then I took the mutate function and added TotalC for all 5 years (2010 - 2014) and divided it by 5 since there were 5 columns being added together.

Mean_energy
StateCodes State Region Division Coast Great Lakes TotalC2010 1 AL Alabama 3 6 1 0 1931522 2 AK Alaska 4 9 1 0 653221 3 AZ Arizona 4 8 0 0 1383531 4 AR Arkansas 3 7 0 0 1120632 5 CA California 4 9 1 0 7760629 6 CO Colorado 4 8 0 0 1513547 7 CT Connecticut 1 1 1 0 764970 8 DE Delaware 3 5 1 0 250212 9 FL Florida 3 5 1 0 4282673 10 GA Georgia 3 5 1 0 3100144 11 HI Hawaii 4 9 1 0 278046 12 ID Idaho 4 8 0 0 516120 13 IL Illinois 2 3 0 1 3955091 14 IN Indiana 2 3 0 1 2863396 15 IA Iowa 2 4 0 0 1499729 16 KS Kansas 2 4 0 0 1117631 17 KY Kentucky 3 6 0 0 1978527 18 LA Louisiana 3 7 1 0 4385758 19 ME Maine 1 1 1 0 415065 20 MD Maryland 3 5 1 0 1464503 21 MA Massachusetts 1 1 1 0 1416119 22 MI Michigan 2 3 0 1 2753536 23 MN Minnesota 2 4 0 1 1857095 24 MS Mississippi 3 6 1 0 1177620 25 MO Missouri 2 4 0 0 1910500 26 MT Montana 4 8 0 0 400855 27 NE Nebraska 2 4 0 0 860741 28 NV Nevada 4 8 0 0 645604 29 NH New Hampshire 1 1 1 0 294473 30 NJ New Jersey 1 2 1 0 2395713 31 NM New Mexico 4 8 0 0 649962 32 NY New York 1 2 1 1 3723729 33 NC North Carolina 3 5 1 0 2685333 34 ND North Dakota 2 4 0 0 476072 35 OH Ohio 2 3 0 1 3824933 36 OK Oklahoma 3 7 0 0 1579910 37 OR Oregon 4 9 1 0 975067 38 PA Pennsylvania 1 2 0 1 3752280 39 RI Rhode Island 1 1 1 0 195314 40 SC South Carolina 3 5 1 0 1643912 41 SD South Dakota 2 4 0 0 378514 42 TN Tennessee 3 6 0 0 2247273 43 TX Texas 3 7 1 0 11687521 44 UT Utah 4 8 0 0 756012 45 VT Vermont 1 1 0 0 153697 46 VA Virginia 3 5 1 0 2483360 47 WA Washington 4 9 1 0 2031428 48 WV West Virginia 3 5 0 0 738821 49 WI Wisconsin 2 3 0 1 1791199 50 WY Wyoming 4 8 0 0 540122 51 DC District of Columbia 3 5 0 0 190529 52 US United States NA NA NA NA 97446021 TotalC2011 TotalC2012 TotalC2013 ⋯ RINTERNATIONALMIG2014 RDOMESTICMIG2011 1 1905207 1879716 1919365 ⋯ 1.1578610 -0.02044325 2 653637 649341 621107 ⋯ 2.8697597 -1.17513721 3 1424944 1395839 1414383 ⋯ 1.0912833 1.34147180 4 1122544 1067642 1096438 ⋯ 2.1298051 1.36951366 5 7777115 7564063 7665241 ⋯ 4.1773888 -1.16207924 6 1470445 1440781 1470844 ⋯ 2.0107355 5.18339661 7 739130 725019 754901 ⋯ 4.7309498 -3.38443506 8 272568 273728 273716 ⋯ 2.5654887 2.86684813 9 4141711 4029903 4076406 ⋯ 5.6873001 5.54039345 10 2982837 2767491 2782782 ⋯ 2.4704235 1.10581577 11 287113 280171 281329 ⋯ 6.0744948 -0.72861180 12 516978 510869 526613 ⋯ 1.0439419 0.05833087 13 3937616 3820547 3991089 ⋯ 2.5185545 -5.42476227 14 2847188 2770158 2894764 ⋯ 1.5905747 -1.30254263 15 1498973 1440053 1518870 ⋯ 1.7317618 0.09157511 16 1104843 1075435 1105160 ⋯ 2.0507526 -3.15696832 17 1903208 1868483 1838898 ⋯ 1.3568530 0.59702684 18 4388867 4255161 4196622 ⋯ 1.6206558 0.45474712 19 413893 399670 409785 ⋯ 1.0358091 0.06176347 20 1410012 1368600 1395961 ⋯ 4.8600336 0.06293021 21 1397164 1363282 1428433 ⋯ 5.5424734 -0.52366579 22 2785212 2687926 2832425 ⋯ 2.0288701 -4.37165740 23 1850749 1816866 1858816 ⋯ 2.5740785 -0.62804606 24 1155456 1143099 1138181 ⋯ 0.7517183 -1.96091120 25 1856590 1781978 1847506 ⋯ 1.4272611 -2.24542626 26 402355 395724 405934 ⋯ 0.7515540 3.43419996 27 862675 852984 871815 ⋯ 2.0290779 -0.64486314 28 632655 639190 659568 ⋯ 3.0035913 -2.82732878 29 292979 284490 304538 ⋯ 1.5293861 -1.63666494 30 2411816 2241207 2311685 ⋯ 5.7845304 -5.11033711 31 668675 666540 670257 ⋯ 1.2807771 0.03089282 32 3611091 3503309 3626150 ⋯ 6.0239988 -4.32545483 33 2558792 2481060 2530649 ⋯ 2.1409718 3.17214261 34 528508 552326 594439 ⋯ 1.7630911 9.14542431 35 3792585 3655849 3739974 ⋯ 1.5139319 -3.18858790 36 1585212 1561913 1620837 ⋯ 1.4722224 1.67426274 37 1002476 975044 991867 ⋯ 1.7375876 2.95766482 38 3725014 3623997 3826959 ⋯ 2.2731090 -0.54676726 39 185731 183879 199165 ⋯ 4.0691914 -5.83915808 40 1601881 1558766 1590456 ⋯ 1.3106478 3.19120230 41 378470 375950 389619 ⋯ 1.7060255 2.44945783 42 2195401 2080953 2132508 ⋯ 1.4041950 2.45252515 43 11906249 11931169 12660976 ⋯ 3.1662137 4.57794456 44 794058 790154 831668 ⋯ 1.8697539 -0.32882084 45 150475 130412 137527 ⋯ 1.1472638 -0.75225076 46 2380922 2343908 2414477 ⋯ 4.0984214 1.33171670 47 2059630 2037127 2036309 ⋯ 3.3961579 3.44647012 48 726341 720985 743612 ⋯ 0.6285231 0.58611685 49 1778018 1721543 1813458 ⋯ 1.1390792 -1.09492384 50 556548 550182 539146 ⋯ 0.8309234 -0.36219433 51 183806 172963 175560 ⋯ 5.7492179 11.33288241 52 96827465 94411432 97141368 ⋯ 3.1350809 NA RDOMESTICMIG2012 RDOMESTICMIG2013 RDOMESTICMIG2014 RNETMIG2011 RNETMIG2012 1 -0.16841354 0.3964159 0.4201015 1.01194084 1.0013329 2 -1.94957118 -3.7893131 -13.7544937 0.94818536 1.8353761 3 -0.42087528 -0.5805623 -1.3130505 2.31780119 0.6219715 4 5.13128187 3.9104760 6.2806359 3.33662764 7.1552124 5 -1.17395070 -1.3412263 -0.8309823 2.76137748 2.7727695 6 5.55367522 6.9775832 7.5871626 6.93315930 7.6608637 7 -5.61149155 -4.7316382 -7.2862519 1.11689425 -1.0591659 8 3.59838002 3.3971710 5.1481739 5.30328177 6.2212626 9 5.12532032 4.9187834 7.0161227 11.35960582 10.7225728 10 1.85220088 -0.5768876 2.2004666 3.37500669 4.4262675 11 -2.56728847 -0.6631562 -3.6350806 4.25497620 4.3161049 12 -0.16355442 2.9865046 4.7386958 0.95294886 0.8485958 13 -5.69047190 -5.2380977 -7.3691757 -3.18271398 -3.3823391 14 -1.95416154 -0.2311505 -1.1921716 0.15591763 -0.5077296 15 -1.39883166 0.1501230 -0.2613128 1.68465491 0.1895506 16 -1.78980603 -4.3695984 -4.7601461 -1.40654930 0.4746409 17 -1.27126249 -0.5421808 -0.8589545 1.77800694 0.1999200 18 -0.16796076 -0.4968579 -1.3115694 1.94023978 1.5255891 19 -0.46752860 -1.1026255 0.3994297 0.95130816 0.5345335 20 -1.43204655 -1.5138764 -2.5673210 4.60404951 3.2782158 21 -1.63158264 -0.3254842 -2.4310476 4.78903744 3.6469429 22 -3.39424318 -2.9868108 -2.8956885 -2.54604112 -1.5226322 23 -1.65743112 -0.4143418 -1.2309691 1.73345966 0.7482767 24 -1.88948096 -1.6054616 -3.1344983 -1.33764198 -0.9667034 25 -2.24064408 -1.3487765 -1.3336080 -0.98795424 -0.8004437 26 3.55098601 5.2979490 4.4641915 4.04076780 4.4167635 27 -0.49165445 -0.4784591 -1.3603621 1.16935909 1.5144472 28 5.28441598 4.7552264 8.3909457 -0.08078082 8.1453008 29 -0.39857453 -1.9864496 0.8432006 -0.30288929 1.0456898 30 -5.58903761 -5.1205616 -6.2151265 0.45426226 -0.1989998 31 -3.42012889 -5.0629404 -6.7844755 1.09814337 -2.0783084 32 -5.94680886 -5.4820494 -7.8049472 1.49434986 -0.3189971 33 3.29953845 3.8869020 3.6636405 5.04794391 5.7729809 34 15.58963681 23.1894509 12.2651006 10.67971377 17.8045736 35 -3.20406546 -1.9123029 -1.5749692 -1.81998332 -1.7622360 36 2.35073293 3.5996562 1.1322997 3.00715292 3.9555020 37 3.37611288 2.4896662 5.7404707 4.52890803 4.9325756 38 -1.35870300 -2.4839403 -2.4599014 1.57878163 0.7970514 39 -5.02599711 -4.8746647 -3.2126693 -2.05881151 -1.0319971 40 5.61300811 5.9543633 8.0408887 4.29289110 7.2294966 41 5.24394471 5.1975757 0.6616883 3.99058013 7.0598520 42 4.28628133 2.0897983 3.7574480 3.69627104 5.7299274 43 5.63098177 4.4282840 5.7785074 7.67527476 8.7241630 44 -0.03068503 1.9208122 -0.4225336 1.41976181 1.7599803 45 -2.60420825 -1.0981705 -2.4716435 0.25554166 -1.5168595 46 0.60391533 0.3041831 -2.4583298 4.97522119 5.0037998 47 1.96217689 2.2507462 3.9989250 6.41638546 5.3702911 48 0.56206796 -1.2528613 -1.4843729 1.14365578 1.1715587 49 -1.65388673 -1.3697434 -1.7270528 -0.05843103 -0.6192236 50 9.65117376 4.5478211 -4.5777881 0.32332470 10.6105245 51 10.00583847 9.7776663 1.7935725 16.80595478 15.5957903 52 NA NA NA 2.94196766 3.0519321 RNETMIG2013 RNETMIG2014 mean_energy 1 1.56224745 1.57796256 1918806.2 2 -0.58569541 -10.88473403 636085.0 3 0.50947307 -0.22176714 1408257.4 4 6.05235310 8.41044101 1104333.0 5 2.86612681 3.34640648 7677426.0 6 9.05178284 9.59789807 1474558.8 7 0.02196365 -2.55530211 746807.8 8 6.00612012 7.71366265 268847.4 9 10.70250086 12.70342279 4130474.6 10 1.93363794 4.67089008 2896848.8 11 5.76353447 2.43941415 281573.0 12 4.03935504 5.78263774 518103.0 13 -2.73308256 -4.85062123 3949331.2 14 1.35562499 0.39840311 2861427.2 15 1.88253574 1.47044899 1499905.0 16 -2.26263009 -2.70939350 1107084.6 17 0.85255141 0.49789857 1871971.4 18 1.15507555 0.30908636 4301170.2 19 -0.05720105 1.43523880 409788.6 20 3.40085453 2.29271254 1407938.0 21 5.26431452 3.11142579 1408527.6 22 -0.95779330 -0.86681842 2788129.8 23 2.16927752 1.34310939 1859118.2 24 -0.82129781 -2.38277997 1153981.0 25 0.10538352 0.09365317 1860082.6 26 6.09001761 5.21574555 401662.8 27 1.57768007 0.66871583 862512.4 28 7.78763883 11.39453695 647454.6 29 -0.45009045 2.37258670 297324.4 30 0.67867877 -0.43059603 2340121.8 31 -3.77898635 -5.50369841 666914.0 32 0.56988277 -1.78094837 3641434.2 33 6.14754287 5.80461228 2562122.0 34 25.10448511 14.02819169 558288.0 35 -0.39372214 -0.06103728 3764597.8 36 5.10835669 2.60452203 1605545.6 37 4.21784158 7.47805827 986319.8 38 -0.21118267 -0.18679231 3766136.8 39 -0.76828438 0.85652211 193709.6 40 7.33368090 9.35153650 1605420.0 41 6.94875162 2.36771385 382882.0 42 3.50729121 5.16164300 2170129.4 43 7.62949715 8.94472107 12217082.6 44 3.80966870 1.44722034 793977.4 45 0.05267388 -1.32437968 142401.6 46 4.49933481 1.64009160 2410574.4 47 5.70595658 7.39508290 2035287.0 48 -0.62157876 -0.85584979 736540.2 49 -0.24032423 -0.58797357 1794617.0 50 5.40635592 -3.74686476 544322.0 51 15.64924997 7.54279039 180357.4 52 3.16821052 3.13508085 96842299.2

Question 4

Compare 2 states, 1 that has a coast and 1 that has does not have a coast, do landlocked states use more or less energy than states that have a body of water next to them?

First I will assign a name to the subset of the table EN1 by naming it "Nonrenewable". Then I selected the groups of the columns from the original table to add to this subset table.

Nonrenewable <- group_by(EN1, StateCodes , Coast, TotalCYear, count )

by putting summerize in fromt of the new table name, the subset data can be printed to be viewd and analyzed

summarize(Nonrenewable)
StateCodes Coast TotalCYear count 1 AK 1 TotalC2010 653221 2 AK 1 TotalC2011 653637 3 AK 1 TotalC2012 649341 4 AK 1 TotalC2013 621107 5 AK 1 TotalC2014 603119 6 AL 1 TotalC2010 1931522 7 AL 1 TotalC2011 1905207 8 AL 1 TotalC2012 1879716 9 AL 1 TotalC2013 1919365 10 AL 1 TotalC2014 1958221 11 AR 0 TotalC2010 1120632 12 AR 0 TotalC2011 1122544 13 AR 0 TotalC2012 1067642 14 AR 0 TotalC2013 1096438 15 AR 0 TotalC2014 1114409 16 AZ 0 TotalC2010 1383531 17 AZ 0 TotalC2011 1424944 18 AZ 0 TotalC2012 1395839 19 AZ 0 TotalC2013 1414383 20 AZ 0 TotalC2014 1422590 21 CA 1 TotalC2010 7760629 22 CA 1 TotalC2011 7777115 23 CA 1 TotalC2012 7564063 24 CA 1 TotalC2013 7665241 25 CA 1 TotalC2014 7620082 26 CO 0 TotalC2010 1513547 27 CO 0 TotalC2011 1470445 28 CO 0 TotalC2012 1440781 29 CO 0 TotalC2013 1470844 30 CO 0 TotalC2014 1477177 ⋮ ⋮ ⋮ ⋮ ⋮ 231 VA 1 TotalC2010 2483360 232 VA 1 TotalC2011 2380922 233 VA 1 TotalC2012 2343908 234 VA 1 TotalC2013 2414477 235 VA 1 TotalC2014 2430205 236 VT 0 TotalC2010 153697 237 VT 0 TotalC2011 150475 238 VT 0 TotalC2012 130412 239 VT 0 TotalC2013 137527 240 VT 0 TotalC2014 139897 241 WA 1 TotalC2010 2031428 242 WA 1 TotalC2011 2059630 243 WA 1 TotalC2012 2037127 244 WA 1 TotalC2013 2036309 245 WA 1 TotalC2014 2011941 246 WI 0 TotalC2010 1791199 247 WI 0 TotalC2011 1778018 248 WI 0 TotalC2012 1721543 249 WI 0 TotalC2013 1813458 250 WI 0 TotalC2014 1868867 251 WV 0 TotalC2010 738821 252 WV 0 TotalC2011 726341 253 WV 0 TotalC2012 720985 254 WV 0 TotalC2013 743612 255 WV 0 TotalC2014 752942 256 WY 0 TotalC2010 540122 257 WY 0 TotalC2011 556548 258 WY 0 TotalC2012 550182 259 WY 0 TotalC2013 539146 260 WY 0 TotalC2014 535612

Looking at both the land locked states and the states with with coastal fronts, it is fair to say that most of the time that states that have coasts use more energy than states that are land locked. For example Virginia, a coastal stste, in 2011 used a total of 2,380,922 btu of energy while Arizona used 1,114,409 btu of energy. But there are a few coastal states that use less energy than land locked states.

Question 5

Based on the Summarized report in the previous problem, what was New Yorks total consumption each year from 2010 to 2014? What line of best fit would show the trend of the graph accurately?

For this function, we are filtering out any row that has "NY" as a state in the Nonrenewable table and making it into its own table. I labeled this new table New_York.

library(dplyr) New_York <- filter(Nonrenewable, StateCodes == "NY" )

Afterwards I printed out the table for the filter by recalling the name "New_York"

New_York
StateCodes State Region Division Coast Great Lakes TotalP2010 TotalP2011 1 NY New York 1 2 1 1 836703 879395 2 NY New York 1 2 1 1 836703 879395 3 NY New York 1 2 1 1 836703 879395 4 NY New York 1 2 1 1 836703 879395 5 NY New York 1 2 1 1 836703 879395 TotalP2012 TotalP2013 ⋯ RDOMESTICMIG2011 RDOMESTICMIG2012 RDOMESTICMIG2013 1 817908 872500 ⋯ -4.325455 -5.946809 -5.482049 2 817908 872500 ⋯ -4.325455 -5.946809 -5.482049 3 817908 872500 ⋯ -4.325455 -5.946809 -5.482049 4 817908 872500 ⋯ -4.325455 -5.946809 -5.482049 5 817908 872500 ⋯ -4.325455 -5.946809 -5.482049 RDOMESTICMIG2014 RNETMIG2011 RNETMIG2012 RNETMIG2013 RNETMIG2014 TotalCYear 1 -7.804947 1.49435 -0.3189971 0.5698828 -1.780948 TotalC2010 2 -7.804947 1.49435 -0.3189971 0.5698828 -1.780948 TotalC2011 3 -7.804947 1.49435 -0.3189971 0.5698828 -1.780948 TotalC2012 4 -7.804947 1.49435 -0.3189971 0.5698828 -1.780948 TotalC2013 5 -7.804947 1.49435 -0.3189971 0.5698828 -1.780948 TotalC2014 count 1 3723729 2 3611091 3 3503309 4 3626150 5 3742892

Afterwards, I created a line plot by calling the package ggplot to print the graphing background. Then i set the data to equal the "New_York" table. I then call the plot to be a scatter plot by putting geom_point which is apart of the ggplot package. Lastly for the aesthetics of the graph, I made the x axis equal to the "TotalCYear" and the y axs equal to the "count" of each year.

ggplot(data = New_York) + geom_point(mapping = aes(x = TotalCYear, y = count))
MIME type unknown not supported
Image in a Jupyter notebook

The total consumption in 2010 was 3,723,729 BTUs, the total consumption for 2011 was 3,611,091 BTUs, for 2012 it was 3,503,309 BTUs, 2013 3,626,150 BTUs and 2014 it was 3,742,892 BTUs. I think the line of best fit for ths data would be a quadratic line/ parabola since there is a decline until 2012 and then an inclineor even an absolute value line since there is a sharper dip towards the center of the graph.