SharedCDS-101 / Midterm Project / Energy Consumption.ipynbOpen in CoCalc
Author: Francisca Idio
Views : 2
Description: Jupyter notebook CDS-101/Midterm Project/Energy Consumption.ipynb

Midterm Project

Name : Francisca Idio

CDS 101

3/28/2017

In [3]:
# Set the location for R packages .libPaths(new = "~/Rlibs") # Load the Tidyverse packages library(tidyverse)
Loading tidyverse: ggplot2 Loading tidyverse: tibble Loading tidyverse: tidyr Loading tidyverse: readr Loading tidyverse: purrr Conflicts with tidy packages --------------------------------------------------- filter(): dplyr, stats lag(): dplyr, stats
In [4]:
# Load the Public Passenger Vehicle Lisence dataset energy <- read_csv("Energy Census and Economic Data US 2010-2014.csv")
Parsed with column specification: cols( .default = col_double(), StateCodes = col_character(), State = col_character(), Region = col_integer(), Division = col_integer(), Coast = col_integer(), `Great Lakes` = col_integer(), TotalC2010 = col_integer(), TotalC2011 = col_integer(), TotalC2012 = col_integer(), TotalC2013 = col_integer(), TotalC2014 = col_integer(), TotalP2010 = col_integer(), TotalP2011 = col_integer(), TotalP2012 = col_integer(), TotalP2013 = col_integer(), TotalP2014 = col_integer(), BiomassC2010 = col_integer(), BiomassC2011 = col_integer(), BiomassC2012 = col_integer(), BiomassC2013 = col_integer() # ... with 77 more columns ) See spec(...) for full column specifications.
In [5]:
print(energy)
# A tibble: 52 × 192 StateCodes State Region Division Coast `Great Lakes` TotalC2010 <chr> <chr> <int> <int> <int> <int> <int> 1 AL Alabama 3 6 1 0 1931522 2 AK Alaska 4 9 1 0 653221 3 AZ Arizona 4 8 0 0 1383531 4 AR Arkansas 3 7 0 0 1120632 5 CA California 4 9 1 0 7760629 6 CO Colorado 4 8 0 0 1513547 7 CT Connecticut 1 1 1 0 764970 8 DE Delaware 3 5 1 0 250212 9 FL Florida 3 5 1 0 4282673 10 GA Georgia 3 5 1 0 3100144 # ... with 42 more rows, and 185 more variables: TotalC2011 <int>, # TotalC2012 <int>, TotalC2013 <int>, TotalC2014 <int>, TotalP2010 <int>, # TotalP2011 <int>, TotalP2012 <int>, TotalP2013 <int>, TotalP2014 <int>, # TotalE2010 <dbl>, TotalE2011 <dbl>, TotalE2012 <dbl>, TotalE2013 <dbl>, # TotalE2014 <dbl>, TotalPrice2010 <dbl>, TotalPrice2011 <dbl>, # TotalPrice2012 <dbl>, TotalPrice2013 <dbl>, TotalPrice2014 <dbl>, # `TotalC10-11` <dbl>, `TotalC11-12` <dbl>, `TotalC12-13` <dbl>, # `TotalC13-14` <dbl>, `TotalP10-11` <dbl>, `TotalP11-12` <dbl>, # `TotalP12-13` <dbl>, `TotalP13-14` <dbl>, `TotalE10-11` <dbl>, # `TotalE11-12` <dbl>, `TotalE12-13` <dbl>, `TotalE13-14` <dbl>, # `TotalPrice10-11` <dbl>, `TotalPrice11-12` <dbl>, `TotalPrice12-13` <dbl>, # `TotalPrice13-14` <dbl>, BiomassC2010 <int>, BiomassC2011 <int>, # BiomassC2012 <int>, BiomassC2013 <int>, BiomassC2014 <int>, # CoalC2010 <int>, CoalC2011 <int>, CoalC2012 <int>, CoalC2013 <int>, # CoalC2014 <int>, CoalP2010 <int>, CoalP2011 <int>, CoalP2012 <int>, # CoalP2013 <int>, CoalP2014 <int>, CoalE2010 <dbl>, CoalE2011 <dbl>, # CoalE2012 <dbl>, CoalE2013 <dbl>, CoalE2014 <dbl>, CoalPrice2010 <dbl>, # CoalPrice2011 <dbl>, CoalPrice2012 <dbl>, CoalPrice2013 <dbl>, # CoalPrice2014 <dbl>, ElecC2010 <int>, ElecC2011 <int>, ElecC2012 <int>, # ElecC2013 <int>, ElecC2014 <int>, ElecE2010 <dbl>, ElecE2011 <dbl>, # ElecE2012 <dbl>, ElecE2013 <dbl>, ElecE2014 <dbl>, ElecPrice2010 <dbl>, # ElecPrice2011 <dbl>, ElecPrice2012 <dbl>, ElecPrice2013 <dbl>, # ElecPrice2014 <dbl>, FossFuelC2010 <int>, FossFuelC2011 <int>, # FossFuelC2012 <int>, FossFuelC2013 <int>, FossFuelC2014 <int>, # GeoC2010 <int>, GeoC2011 <int>, GeoC2012 <int>, GeoC2013 <int>, # GeoC2014 <int>, GeoP2010 <int>, GeoP2011 <int>, GeoP2012 <int>, # GeoP2013 <int>, GeoP2014 <int>, HydroC2010 <int>, HydroC2011 <int>, # HydroC2012 <int>, HydroC2013 <int>, HydroC2014 <int>, HydroP2010 <int>, # HydroP2011 <int>, HydroP2012 <int>, HydroP2013 <int>, HydroP2014 <int>, ...

Dataset Cleaning

To clean this dataset, I used the gather function to take all the other columns that were not in the original shown dataset and made a column to show if there was a blank columns or NA in there accidentally in a column that is was not supposed to be in. Also since the values in the variable columns are mostly numerical except for the column of the state names, there were no mispelling of words in the columns. Since I will be mainly using the column of total consumption of energy throughout this poroject, I made them to be all in one column using the gather function to consolidate the data and bake it look easier to read and a bit cleaner. It didn't make a lot of sence to have the Total consumption spread out and for graphing and comparison purposes, having them it one column helped me to create more organized graphs. This dataset with the consolidated data is labeled "EN1"

In [6]:
energy2 <- gather(energy,TotalC2011: HydroP2014, key = "key", value = "numbers", na.rm = TRUE)
In [7]:
print(energy2)
# A tibble: 5,200 × 94 StateCodes State Region Division Coast `Great Lakes` TotalC2010 * <chr> <chr> <int> <int> <int> <int> <int> 1 AL Alabama 3 6 1 0 1931522 2 AK Alaska 4 9 1 0 653221 3 AZ Arizona 4 8 0 0 1383531 4 AR Arkansas 3 7 0 0 1120632 5 CA California 4 9 1 0 7760629 6 CO Colorado 4 8 0 0 1513547 7 CT Connecticut 1 1 1 0 764970 8 DE Delaware 3 5 1 0 250212 9 FL Florida 3 5 1 0 4282673 10 GA Georgia 3 5 1 0 3100144 # ... with 5,190 more rows, and 87 more variables: NatGasC2010 <int>, # NatGasC2011 <int>, NatGasC2012 <int>, NatGasC2013 <int>, NatGasC2014 <int>, # NatGasE2010 <dbl>, NatGasE2011 <dbl>, NatGasE2012 <dbl>, NatGasE2013 <dbl>, # NatGasE2014 <dbl>, NatGasPrice2010 <dbl>, NatGasPrice2011 <dbl>, # NatGasPrice2012 <dbl>, NatGasPrice2013 <dbl>, NatGasPrice2014 <dbl>, # LPGC2010 <int>, LPGC2011 <int>, LPGC2012 <int>, LPGC2013 <int>, # LPGC2014 <int>, LPGE2010 <dbl>, LPGE2011 <dbl>, LPGE2012 <dbl>, # LPGE2013 <dbl>, LPGE2014 <dbl>, LPGPrice2010 <dbl>, LPGPrice2011 <dbl>, # LPGPrice2012 <dbl>, LPGPrice2013 <dbl>, LPGPrice2014 <dbl>, # GDP2010Q1 <int>, GDP2010Q2 <int>, GDP2010Q3 <int>, GDP2010Q4 <int>, # GDP2010 <dbl>, GDP2011Q1 <int>, GDP2011Q2 <int>, GDP2011Q3 <int>, # GDP2011Q4 <int>, GDP2011 <dbl>, GDP2012Q1 <int>, GDP2012Q2 <int>, # GDP2012Q3 <int>, GDP2012Q4 <int>, GDP2012 <dbl>, GDP2013Q1 <int>, # GDP2013Q2 <int>, GDP2013Q3 <int>, GDP2013Q4 <int>, GDP2013 <dbl>, # GDP2014Q1 <int>, GDP2014Q2 <int>, GDP2014Q3 <int>, GDP2014Q4 <int>, # GDP2014 <dbl>, CENSUS2010POP <int>, POPESTIMATE2010 <int>, # POPESTIMATE2011 <int>, POPESTIMATE2012 <int>, POPESTIMATE2013 <int>, # POPESTIMATE2014 <int>, RBIRTH2011 <dbl>, RBIRTH2012 <dbl>, # RBIRTH2013 <dbl>, RBIRTH2014 <dbl>, RDEATH2011 <dbl>, RDEATH2012 <dbl>, # RDEATH2013 <dbl>, RDEATH2014 <dbl>, RNATURALINC2011 <dbl>, # RNATURALINC2012 <dbl>, RNATURALINC2013 <dbl>, RNATURALINC2014 <dbl>, # RINTERNATIONALMIG2011 <dbl>, RINTERNATIONALMIG2012 <dbl>, # RINTERNATIONALMIG2013 <dbl>, RINTERNATIONALMIG2014 <dbl>, # RDOMESTICMIG2011 <dbl>, RDOMESTICMIG2012 <dbl>, RDOMESTICMIG2013 <dbl>, # RDOMESTICMIG2014 <dbl>, RNETMIG2011 <dbl>, RNETMIG2012 <dbl>, # RNETMIG2013 <dbl>, RNETMIG2014 <dbl>, key <chr>, numbers <dbl>
In [29]:
EN1 <- gather(energy, TotalC2010:TotalC2014, key = "TotalCYear", value = "count")
In [30]:
EN1
StateCodesStateRegionDivisionCoastGreat LakesTotalP2010TotalP2011TotalP2012TotalP2013RDOMESTICMIG2011RDOMESTICMIG2012RDOMESTICMIG2013RDOMESTICMIG2014RNETMIG2011RNETMIG2012RNETMIG2013RNETMIG2014TotalCYearcount
AL Alabama 3 6 1 0 1419613 1400108 1433370 1463647 -0.02044325 -0.1684135 0.3964159 0.4201015 1.01194084 1.0013329 1.56224745 1.57796256 TotalC2010 1931522
AK Alaska 4 9 1 0 1738207 1641980 1563102 1513859 -1.17513721 -1.9495712 -3.7893131 -13.7544937 0.94818536 1.8353761 -0.58569541 -10.88473403 TotalC2010 653221
AZ Arizona 4 8 0 0 580948 617956 598039 594994 1.34147180 -0.4208753 -0.5805623 -1.3130505 2.31780119 0.6219715 0.50947307 -0.22176714 TotalC2010 1383531
AR Arkansas 3 7 0 0 1247709 1391190 1472778 1432074 1.36951366 5.1312819 3.9104760 6.2806359 3.33662764 7.1552124 6.05235310 8.41044101 TotalC2010 1120632
CA California 4 9 1 0 2532205 2634789 2334863 2390424 -1.16207924 -1.1739507 -1.3412263 -0.8309823 2.76137748 2.7727695 2.86612681 3.34640648 TotalC2010 7760629
CO Colorado 4 8 0 0 2561459 2750097 2921385 2838193 5.18339661 5.5536752 6.9775832 7.5871626 6.93315930 7.6608637 9.05178284 9.59789807 TotalC2010 1513547
CT Connecticut 1 1 1 0 203188 195792 205073 207118 -3.38443506 -5.6114915 -4.7316382 -7.2862519 1.11689425 -1.0591659 0.02196365 -2.55530211 TotalC2010 764970
DE Delaware 3 5 1 0 3575 3976 3530 3818 2.86684813 3.5983800 3.3971710 5.1481739 5.30328177 6.2212626 6.00612012 7.71366265 TotalC2010 250212
FL Florida 3 5 1 0 510201 500907 442188 542570 5.54039345 5.1253203 4.9187834 7.0161227 11.35960582 10.7225728 10.70250086 12.70342279 TotalC2010 4282673
GA Georgia 3 5 1 0 561796 549483 555238 581082 1.10581577 1.8522009 -0.5768876 2.2004666 3.37500669 4.4262675 1.93363794 4.67089008 TotalC2010 3100144
HI Hawaii 4 9 1 0 16245 18378 20485 25465 -0.72861180 -2.5672885 -0.6631562 -3.6350806 4.25497620 4.3161049 5.76353447 2.43941415 TotalC2010 278046
ID Idaho 4 8 0 0 130652 176717 154829 138799 0.05833087 -0.1635544 2.9865046 4.7386958 0.95294886 0.8485958 4.03935504 5.78263774 TotalC2010 516120
IL Illinois 2 3 0 1 2090939 2189423 2436702 2519993 -5.42476227 -5.6904719 -5.2380977 -7.3691757 -3.18271398 -3.3823391 -2.73308256 -4.85062123 TotalC2010 3955091
IN Indiana 2 3 0 1 987568 1062319 1046014 1106441 -1.30254263 -1.9541615 -0.2311505 -1.1921716 0.15591763 -0.5077296 1.35562499 0.39840311 TotalC2010 2863396
IA Iowa 2 4 0 0 673102 689146 677875 730473 0.09157511 -1.3988317 0.1501230 -0.2613128 1.68465491 0.1895506 1.88253574 1.47044899 TotalC2010 1499729
KS Kansas 2 4 0 0 812830 781704 793080 830704 -3.15696832 -1.7898060 -4.3695984 -4.7601461 -1.40654930 0.4746409 -2.26263009 -2.70939350 TotalC2010 1117631
KY Kentucky 3 6 0 0 2781466 2840858 2391118 2138426 0.59702684 -1.2712625 -0.5421808 -0.8589545 1.77800694 0.1999200 0.85255141 0.49789857 TotalC2010 1978527
LA Louisiana 3 7 1 0 3170742 3972994 3789540 3208044 0.45474712 -0.1679608 -0.4968579 -1.3115694 1.94023978 1.5255891 1.15507555 0.30908636 TotalC2010 4385758
ME Maine 1 1 1 0 151940 154764 152331 155380 0.06176347 -0.4675286 -1.1026255 0.3994297 0.95130816 0.5345335 -0.05720105 1.43523880 TotalC2010 415065
MD Maryland 3 5 1 0 250812 272882 245197 248554 0.06293021 -1.4320466 -1.5138764 -2.5673210 4.60404951 3.2782158 3.40085453 2.29271254 TotalC2010 1464503
MA Massachusetts1 1 1 0 108868 103058 110384 102521 -0.52366579 -1.6315826 -0.3254842 -2.4310476 4.78903744 3.6469429 5.26431452 3.11142579 TotalC2010 1416119
MI Michigan 2 3 0 1 626639 680857 626107 654767 -4.37165740 -3.3942432 -2.9868108 -2.8956885 -2.54604112 -1.5226322 -0.95779330 -0.86681842 TotalC2010 2753536
MN Minnesota 2 4 0 1 429062 425422 417674 405967 -0.62804606 -1.6574311 -0.4143418 -1.2309691 1.73345966 0.7482767 2.16927752 1.34310939 TotalC2010 1857095
MS Mississippi 3 6 1 0 448713 441341 390246 410271 -1.96091120 -1.8894810 -1.6054616 -3.1344983 -1.33764198 -0.9667034 -0.82129781 -2.38277997 TotalC2010 1177620
MO Missouri 2 4 0 0 198297 197714 203118 191759 -2.24542626 -2.2406441 -1.3487765 -1.3336080 -0.98795424 -0.8004437 0.10538352 0.09365317 TotalC2010 1910500
MT Montana 4 8 0 0 1148154 1104165 1008736 1104932 3.43419996 3.5509860 5.2979490 4.4641915 4.04076780 4.4167635 6.09001761 5.21574555 TotalC2010 400855
NE Nebraska 2 4 0 0 399822 390808 357156 370523 -0.64486314 -0.4916545 -0.4784591 -1.3603621 1.16935909 1.5144472 1.57768007 0.66871583 TotalC2010 860741
NV Nevada 4 8 0 0 51846 53807 60252 70208 -2.82732878 5.2844160 4.7552264 8.3909457 -0.08078082 8.1453008 7.78763883 11.39453695 TotalC2010 645604
NH New Hampshire1 1 1 0 156829 131919 129929 166328 -1.63666494 -0.3985745 -1.9864496 0.8432006 -0.30288929 1.0456898 -0.45009045 2.37258670 TotalC2010 294473
NJ New Jersey 1 2 1 0 376845 390931 393682 403225 -5.11033711 -5.5890376 -5.1205616 -6.2151265 0.45426226 -0.1989998 0.67867877 -0.43059603 TotalC2010 2395713
MN Minnesota 2 4 0 1 429062 425422 417674 405967 -0.62804606 -1.65743112 -0.4143418 -1.2309691 1.73345966 0.7482767 2.16927752 1.34310939 TotalC2014 1912065
MS Mississippi 3 6 1 0 448713 441341 390246 410271 -1.96091120 -1.88948096 -1.6054616 -3.1344983 -1.33764198 -0.9667034 -0.82129781 -2.38277997 TotalC2014 1155549
MO Missouri 2 4 0 0 198297 197714 203118 191759 -2.24542626 -2.24064408 -1.3487765 -1.3336080 -0.98795424 -0.8004437 0.10538352 0.09365317 TotalC2014 1903839
MT Montana 4 8 0 0 1148154 1104165 1008736 1104932 3.43419996 3.55098601 5.2979490 4.4641915 4.04076780 4.4167635 6.09001761 5.21574555 TotalC2014 403446
NE Nebraska 2 4 0 0 399822 390808 357156 370523 -0.64486314 -0.49165445 -0.4784591 -1.3603621 1.16935909 1.5144472 1.57768007 0.66871583 TotalC2014 864347
NV Nevada 4 8 0 0 51846 53807 60252 70208 -2.82732878 5.28441598 4.7552264 8.3909457 -0.08078082 8.1453008 7.78763883 11.39453695 TotalC2014 660256
NH New Hampshire 1 1 1 0 156829 131919 129929 166328 -1.63666494 -0.39857453 -1.9864496 0.8432006 -0.30288929 1.0456898 -0.45009045 2.37258670 TotalC2014 310142
NJ New Jersey 1 2 1 0 376845 390931 393682 403225 -5.11033711 -5.58903761 -5.1205616 -6.2151265 0.45426226 -0.1989998 0.67867877 -0.43059603 TotalC2014 2340188
NM New Mexico 4 8 0 0 2255073 2260681 2315892 2367423 0.03089282 -3.42012889 -5.0629404 -6.7844755 1.09814337 -2.0783084 -3.77898635 -5.50369841 TotalC2014 679136
NY New York 1 2 1 1 836703 879395 817908 872500 -4.32545483 -5.94680886 -5.4820494 -7.8049472 1.49434986 -0.3189971 0.56988277 -1.78094837 TotalC2014 3742892
NC North Carolina 3 5 1 0 576162 572796 560887 606427 3.17214261 3.29953845 3.8869020 3.6636405 5.04794391 5.7729809 6.14754287 5.80461228 TotalC2014 2554776
ND North Dakota 2 4 0 0 1253749 1518406 2135154 2634178 9.14542431 15.58963681 23.1894509 12.2651006 10.67971377 17.8045736 25.10448511 14.02819169 TotalC2014 640095
OH Ohio 2 3 0 1 1032440 1067943 1067944 1149882 -3.18858790 -3.20406546 -1.9123029 -1.5749692 -1.81998332 -1.7622360 -0.39372214 -0.06103728 TotalC2014 3809648
OK Oklahoma 3 7 0 0 2576077 2704301 2950525 3137443 1.67426274 2.35073293 3.5996562 1.1322997 3.00715292 3.9555020 5.10835669 2.60452203 TotalC2014 1679856
OR Oregon 4 9 1 0 393763 513597 496254 458848 2.95766482 3.37611288 2.4896662 5.7404707 4.52890803 4.9325756 4.21784158 7.47805827 TotalC2014 987145
PA Pennsylvania 1 2 0 1 3064745 3870671 4730336 5873742 -0.54676726 -1.35870300 -2.4839403 -2.4599014 1.57878163 0.7970514 -0.21118267 -0.18679231 TotalC2014 3902434
RI Rhode Island 1 1 1 0 3399 3419 2822 2627 -5.83915808 -5.02599711 -4.8746647 -3.2126693 -2.05881151 -1.0319971 -0.76828438 0.85652211 TotalC2014 204459
SC South Carolina 3 5 1 0 649978 661898 646738 690644 3.19120230 5.61300811 5.9543633 8.0408887 4.29289110 7.2294966 7.33368090 9.35153650 TotalC2014 1632085
SD South Dakota 2 4 0 0 225387 248370 248616 236679 2.44945783 5.24394471 5.1975757 0.6616883 3.99058013 7.0598520 6.94875162 2.36771385 TotalC2014 391857
TN Tennessee 3 6 0 0 505731 506701 468757 544916 2.45252515 4.28628133 2.0897983 3.7574480 3.69627104 5.7299274 3.50729121 5.16164300 TotalC2014 2194512
TX Texas 3 7 1 0 11412134 12571450 14180982 15688922 4.57794456 5.63098177 4.4282840 5.7785074 7.67527476 8.7241630 7.62949715 8.94472107 TotalC2014 12899498
UT Utah 4 8 0 0 1073063 1128821 1118599 1122868 -0.32882084 -0.03068503 1.9208122 -0.4225336 1.41976181 1.7599803 3.80966870 1.44722034 TotalC2014 797995
VT Vermont 1 1 0 0 80143 80677 77979 84188 -0.75225076 -2.60420825 -1.0981705 -2.4716435 0.25554166 -1.5168595 0.05267388 -1.32437968 TotalC2014 139897
VA Virginia 3 5 1 0 1096973 1085234 1045095 1023009 1.33171670 0.60391533 0.3041831 -2.4583298 4.97522119 5.0037998 4.49933481 1.64009160 TotalC2014 2430205
WA Washington 4 9 1 0 907957 1101186 1108998 1003330 3.44647012 1.96217689 2.2507462 3.9989250 6.41638546 5.3702911 5.70595658 7.39508290 TotalC2014 2011941
WV West Virginia 3 5 0 0 3699729 3820203 3720223 3809750 0.58611685 0.56206796 -1.2528613 -1.4843729 1.14365578 1.1715587 -0.62157876 -0.85584979 TotalC2014 752942
WI Wisconsin 2 3 0 1 329441 312552 333635 310413 -1.09492384 -1.65388673 -1.3697434 -1.7270528 -0.05843103 -0.6192236 -0.24032423 -0.58797357 TotalC2014 1868867
WY Wyoming 4 8 0 0 10536570 10353148 9611304 9233869 -0.36219433 9.65117376 4.5478211 -4.5777881 0.32332470 10.6105245 5.40635592 -3.74686476 TotalC2014 535612
DC District of Columbia 3 5 0 0 95 271 277 323 11.33288241 10.00583847 9.7776663 1.7935725 16.80595478 15.5957903 15.64924997 7.54279039 TotalC2014 178929
US United States NA NA NA NA 74593106 77778786 79032062 81604540 NA NA NA NA 2.94196766 3.0519321 3.16821052 3.13508085 TotalC2014 98385210

Data Citation And Summary

Citation for Energy Consumption Dataset

File Name: United States Energy, Census, and GDP

Website of dataset: Kaggle.com

Link to dataset: https://www.kaggle.com/lislejoem/us_energy_census_gdp_10-14

I did not use any filters while downloading this data. The provider/ creator of this dataset got the information they put into ther own dataset from external sources such as the Bureau of Economic Analysis for the data on GDP/Census and the U.S. Department of Commerce and the U.S. Energy Information Administration for information on energy consumption

Summary

This dataset is about the U.S. Census, Gross Domestc Product, and Energy consumption of both renewable and non renewable resources. Some of the columns are labeled State/ StateCodes which is the state abbreviation, Region in which a number 1-4 is allocated to a state in a sspecific region ( i.e. 1 = Northeast, 2 = Midwest, 3 = South, 4 = West), TotalC(year) which means the total energy consumption in billion British Thermal Units (BTU) in a given year, the type of renewable/nonrenewable energy source with a "C" (consumption) at the end ("CoalC2012") to show the total consumption of that product. The rest of the columns either list the resources or tell about the human population of each state and the birth/death rate, etc. Also please take into account that the row with a lot of NA for the values is from the total of the entire United states (US). A lot of the column variables do not apply to this row, this is why there is NA present.

Data Analysis and Plots

Question 1

In the dataset, is there any region that has an extreme outlier when it comes to energy consumption in the year 2010?

To create this scatter plot I downloaded ggplot2 in the R package. I made the data used in the plot to be the energy2 dataset. After i made it into a scatterplot by using geom point and made the x-axis labeled regions and the y-axis labeled TotalC2010 and for the grapg to be color coded by the "StateCodes"

In [47]:
ggplot(data = energy2) + geom_point(mapping = aes(x = Region, y = TotalC2010, color = StateCodes))
Warning message: “Removed 100 rows containing missing values (geom_point).”

We see that the South and Midwestern Regions have outliers, which are Texas and California. These states are 2 of the biggest states in the U.S. and have some of the largest opulations, so it makes sence that their numbers would be outside of the frames of the other states. In other states in the 4 regions resented, we see that most states have an total of .03 billion British Themal Units

Question 2

In 2010 and 2014, which region paid the highest amount for energy consumption, were there any decreases in the total spending?

In this problem I wanted to make a graph comparing the prives from 2 years in the different divisions. Note: the prices are in billions and USD. I created a name for this new table which is energy3, and used the select function to choose the columns from the original plot I wanted to see in the graph.

In [11]:
energy3 <- select(EN1, Division, TotalPrice2010, TotalPrice2014)
In [12]:
energy3
DivisionTotalPrice2010TotalPrice2014
6 17.8218.64
9 20.1324.43
8 22.2525.94
7 16.9018.87
9 20.9725.31
8 17.4021.25
1 25.6227.84
5 23.5023.77
5 22.0524.91
5 18.2520.95
9 30.4337.38
8 17.1420.43
3 17.3919.02
3 14.8917.94
4 15.6818.24
4 17.7021.46
6 17.1620.75
7 15.0615.49
1 18.7722.62
5 22.8325.25
1 23.3326.36
3 18.4720.01
4 17.0419.65
6 17.9020.78
4 18.8421.94
8 18.2621.72
4 16.6819.26
8 21.2223.52
1 24.2127.86
2 21.2822.78
4 17.0419.65
6 17.9020.78
4 18.8421.94
8 18.2621.72
4 16.6819.26
8 21.2223.52
1 24.2127.86
2 21.2822.78
8 20.0223.37
2 23.0924.57
5 20.2723.02
4 13.7918.51
3 18.1820.01
7 17.1619.97
9 19.2022.29
2 19.7421.44
1 23.6326.34
5 18.5221.74
4 17.2120.32
6 18.6721.47
7 17.3619.05
8 16.8920.71
1 23.7527.60
5 19.1221.66
9 18.3321.11
5 17.2619.97
3 18.4820.74
8 15.1618.70
5 25.7624.96
NA 18.9221.33
In [26]:
ggplot(energy3) + geom_col(mapping = aes(x = Division, y =TotalPrice2010))
Warning message: “Removed 5 rows containing missing values (position_stack).”
In [14]:
ggplot(energy3) + geom_col(mapping = aes(x = Division, y = TotalPrice2014))
Warning message: “Removed 5 rows containing missing values (position_stack).”

In both 2010 and 2014, Division 5 has spent the most amount of money in both years, but in 2010, Division 5 actually spent a lower amount of money on energy in comparison to 2014

Question 3

For each state, what is the average consumption of energy from 2010 to 2014?

In [15]:
Mean_energy <- mutate(energy, mean_energy = (TotalC2010 + TotalC2011 + TotalC2012 + TotalC2013 + TotalC2014) / 5)

To add the mean column to the dataset, I assigned a name for the new column. Then I took the mutate function and added TotalC for all 5 years (2010 - 2014) and divided it by 5 since there were 5 columns being added together.

In [16]:
Mean_energy
StateCodesStateRegionDivisionCoastGreat LakesTotalC2010TotalC2011TotalC2012TotalC2013RINTERNATIONALMIG2014RDOMESTICMIG2011RDOMESTICMIG2012RDOMESTICMIG2013RDOMESTICMIG2014RNETMIG2011RNETMIG2012RNETMIG2013RNETMIG2014mean_energy
AL Alabama 3 6 1 0 1931522 1905207 1879716 1919365 1.1578610 -0.02044325 -0.16841354 0.3964159 0.4201015 1.01194084 1.0013329 1.56224745 1.57796256 1918806.2
AK Alaska 4 9 1 0 653221 653637 649341 621107 2.8697597 -1.17513721 -1.94957118 -3.7893131 -13.7544937 0.94818536 1.8353761 -0.58569541 -10.88473403 636085.0
AZ Arizona 4 8 0 0 1383531 1424944 1395839 1414383 1.0912833 1.34147180 -0.42087528 -0.5805623 -1.3130505 2.31780119 0.6219715 0.50947307 -0.22176714 1408257.4
AR Arkansas 3 7 0 0 1120632 1122544 1067642 1096438 2.1298051 1.36951366 5.13128187 3.9104760 6.2806359 3.33662764 7.1552124 6.05235310 8.41044101 1104333.0
CA California 4 9 1 0 7760629 7777115 7564063 7665241 4.1773888 -1.16207924 -1.17395070 -1.3412263 -0.8309823 2.76137748 2.7727695 2.86612681 3.34640648 7677426.0
CO Colorado 4 8 0 0 1513547 1470445 1440781 1470844 2.0107355 5.18339661 5.55367522 6.9775832 7.5871626 6.93315930 7.6608637 9.05178284 9.59789807 1474558.8
CT Connecticut 1 1 1 0 764970 739130 725019 754901 4.7309498 -3.38443506 -5.61149155 -4.7316382 -7.2862519 1.11689425 -1.0591659 0.02196365 -2.55530211 746807.8
DE Delaware 3 5 1 0 250212 272568 273728 273716 2.5654887 2.86684813 3.59838002 3.3971710 5.1481739 5.30328177 6.2212626 6.00612012 7.71366265 268847.4
FL Florida 3 5 1 0 4282673 4141711 4029903 4076406 5.6873001 5.54039345 5.12532032 4.9187834 7.0161227 11.35960582 10.7225728 10.70250086 12.70342279 4130474.6
GA Georgia 3 5 1 0 3100144 2982837 2767491 2782782 2.4704235 1.10581577 1.85220088 -0.5768876 2.2004666 3.37500669 4.4262675 1.93363794 4.67089008 2896848.8
HI Hawaii 4 9 1 0 278046 287113 280171 281329 6.0744948 -0.72861180 -2.56728847 -0.6631562 -3.6350806 4.25497620 4.3161049 5.76353447 2.43941415 281573.0
ID Idaho 4 8 0 0 516120 516978 510869 526613 1.0439419 0.05833087 -0.16355442 2.9865046 4.7386958 0.95294886 0.8485958 4.03935504 5.78263774 518103.0
IL Illinois 2 3 0 1 3955091 3937616 3820547 3991089 2.5185545 -5.42476227 -5.69047190 -5.2380977 -7.3691757 -3.18271398 -3.3823391 -2.73308256 -4.85062123 3949331.2
IN Indiana 2 3 0 1 2863396 2847188 2770158 2894764 1.5905747 -1.30254263 -1.95416154 -0.2311505 -1.1921716 0.15591763 -0.5077296 1.35562499 0.39840311 2861427.2
IA Iowa 2 4 0 0 1499729 1498973 1440053 1518870 1.7317618 0.09157511 -1.39883166 0.1501230 -0.2613128 1.68465491 0.1895506 1.88253574 1.47044899 1499905.0
KS Kansas 2 4 0 0 1117631 1104843 1075435 1105160 2.0507526 -3.15696832 -1.78980603 -4.3695984 -4.7601461 -1.40654930 0.4746409 -2.26263009 -2.70939350 1107084.6
KY Kentucky 3 6 0 0 1978527 1903208 1868483 1838898 1.3568530 0.59702684 -1.27126249 -0.5421808 -0.8589545 1.77800694 0.1999200 0.85255141 0.49789857 1871971.4
LA Louisiana 3 7 1 0 4385758 4388867 4255161 4196622 1.6206558 0.45474712 -0.16796076 -0.4968579 -1.3115694 1.94023978 1.5255891 1.15507555 0.30908636 4301170.2
ME Maine 1 1 1 0 415065 413893 399670 409785 1.0358091 0.06176347 -0.46752860 -1.1026255 0.3994297 0.95130816 0.5345335 -0.05720105 1.43523880 409788.6
MD Maryland 3 5 1 0 1464503 1410012 1368600 1395961 4.8600336 0.06293021 -1.43204655 -1.5138764 -2.5673210 4.60404951 3.2782158 3.40085453 2.29271254 1407938.0
MA Massachusetts 1 1 1 0 1416119 1397164 1363282 1428433 5.5424734 -0.52366579 -1.63158264 -0.3254842 -2.4310476 4.78903744 3.6469429 5.26431452 3.11142579 1408527.6
MI Michigan 2 3 0 1 2753536 2785212 2687926 2832425 2.0288701 -4.37165740 -3.39424318 -2.9868108 -2.8956885 -2.54604112 -1.5226322 -0.95779330 -0.86681842 2788129.8
MN Minnesota 2 4 0 1 1857095 1850749 1816866 1858816 2.5740785 -0.62804606 -1.65743112 -0.4143418 -1.2309691 1.73345966 0.7482767 2.16927752 1.34310939 1859118.2
MS Mississippi 3 6 1 0 1177620 1155456 1143099 1138181 0.7517183 -1.96091120 -1.88948096 -1.6054616 -3.1344983 -1.33764198 -0.9667034 -0.82129781 -2.38277997 1153981.0
MO Missouri 2 4 0 0 1910500 1856590 1781978 1847506 1.4272611 -2.24542626 -2.24064408 -1.3487765 -1.3336080 -0.98795424 -0.8004437 0.10538352 0.09365317 1860082.6
MT Montana 4 8 0 0 400855 402355 395724 405934 0.7515540 3.43419996 3.55098601 5.2979490 4.4641915 4.04076780 4.4167635 6.09001761 5.21574555 401662.8
NE Nebraska 2 4 0 0 860741 862675 852984 871815 2.0290779 -0.64486314 -0.49165445 -0.4784591 -1.3603621 1.16935909 1.5144472 1.57768007 0.66871583 862512.4
NV Nevada 4 8 0 0 645604 632655 639190 659568 3.0035913 -2.82732878 5.28441598 4.7552264 8.3909457 -0.08078082 8.1453008 7.78763883 11.39453695 647454.6
NH New Hampshire 1 1 1 0 294473 292979 284490 304538 1.5293861 -1.63666494 -0.39857453 -1.9864496 0.8432006 -0.30288929 1.0456898 -0.45009045 2.37258670 297324.4
NJ New Jersey 1 2 1 0 2395713 2411816 2241207 2311685 5.7845304 -5.11033711 -5.58903761 -5.1205616 -6.2151265 0.45426226 -0.1989998 0.67867877 -0.43059603 2340121.8
NM New Mexico 4 8 0 0 649962 668675 666540 670257 1.2807771 0.03089282 -3.42012889 -5.0629404 -6.7844755 1.09814337 -2.0783084 -3.77898635 -5.50369841 666914.0
NY New York 1 2 1 1 3723729 3611091 3503309 3626150 6.0239988 -4.32545483 -5.94680886 -5.4820494 -7.8049472 1.49434986 -0.3189971 0.56988277 -1.78094837 3641434.2
NC North Carolina 3 5 1 0 2685333 2558792 2481060 2530649 2.1409718 3.17214261 3.29953845 3.8869020 3.6636405 5.04794391 5.7729809 6.14754287 5.80461228 2562122.0
ND North Dakota 2 4 0 0 476072 528508 552326 594439 1.7630911 9.14542431 15.58963681 23.1894509 12.2651006 10.67971377 17.8045736 25.10448511 14.02819169 558288.0
OH Ohio 2 3 0 1 3824933 3792585 3655849 3739974 1.5139319 -3.18858790 -3.20406546 -1.9123029 -1.5749692 -1.81998332 -1.7622360 -0.39372214 -0.06103728 3764597.8
OK Oklahoma 3 7 0 0 1579910 1585212 1561913 1620837 1.4722224 1.67426274 2.35073293 3.5996562 1.1322997 3.00715292 3.9555020 5.10835669 2.60452203 1605545.6
OR Oregon 4 9 1 0 975067 1002476 975044 991867 1.7375876 2.95766482 3.37611288 2.4896662 5.7404707 4.52890803 4.9325756 4.21784158 7.47805827 986319.8
PA Pennsylvania 1 2 0 1 3752280 3725014 3623997 3826959 2.2731090 -0.54676726 -1.35870300 -2.4839403 -2.4599014 1.57878163 0.7970514 -0.21118267 -0.18679231 3766136.8
RI Rhode Island 1 1 1 0 195314 185731 183879 199165 4.0691914 -5.83915808 -5.02599711 -4.8746647 -3.2126693 -2.05881151 -1.0319971 -0.76828438 0.85652211 193709.6
SC South Carolina 3 5 1 0 1643912 1601881 1558766 1590456 1.3106478 3.19120230 5.61300811 5.9543633 8.0408887 4.29289110 7.2294966 7.33368090 9.35153650 1605420.0
SD South Dakota 2 4 0 0 378514 378470 375950 389619 1.7060255 2.44945783 5.24394471 5.1975757 0.6616883 3.99058013 7.0598520 6.94875162 2.36771385 382882.0
TN Tennessee 3 6 0 0 2247273 2195401 2080953 2132508 1.4041950 2.45252515 4.28628133 2.0897983 3.7574480 3.69627104 5.7299274 3.50729121 5.16164300 2170129.4
TX Texas 3 7 1 0 11687521 11906249 11931169 12660976 3.1662137 4.57794456 5.63098177 4.4282840 5.7785074 7.67527476 8.7241630 7.62949715 8.94472107 12217082.6
UT Utah 4 8 0 0 756012 794058 790154 831668 1.8697539 -0.32882084 -0.03068503 1.9208122 -0.4225336 1.41976181 1.7599803 3.80966870 1.44722034 793977.4
VT Vermont 1 1 0 0 153697 150475 130412 137527 1.1472638 -0.75225076 -2.60420825 -1.0981705 -2.4716435 0.25554166 -1.5168595 0.05267388 -1.32437968 142401.6
VA Virginia 3 5 1 0 2483360 2380922 2343908 2414477 4.0984214 1.33171670 0.60391533 0.3041831 -2.4583298 4.97522119 5.0037998 4.49933481 1.64009160 2410574.4
WA Washington 4 9 1 0 2031428 2059630 2037127 2036309 3.3961579 3.44647012 1.96217689 2.2507462 3.9989250 6.41638546 5.3702911 5.70595658 7.39508290 2035287.0
WV West Virginia 3 5 0 0 738821 726341 720985 743612 0.6285231 0.58611685 0.56206796 -1.2528613 -1.4843729 1.14365578 1.1715587 -0.62157876 -0.85584979 736540.2
WI Wisconsin 2 3 0 1 1791199 1778018 1721543 1813458 1.1390792 -1.09492384 -1.65388673 -1.3697434 -1.7270528 -0.05843103 -0.6192236 -0.24032423 -0.58797357 1794617.0
WY Wyoming 4 8 0 0 540122 556548 550182 539146 0.8309234 -0.36219433 9.65117376 4.5478211 -4.5777881 0.32332470 10.6105245 5.40635592 -3.74686476 544322.0
DC District of Columbia 3 5 0 0 190529 183806 172963 175560 5.7492179 11.33288241 10.00583847 9.7776663 1.7935725 16.80595478 15.5957903 15.64924997 7.54279039 180357.4
US United States NA NA NA NA 97446021 96827465 94411432 97141368 3.1350809 NA NA NA NA 2.94196766 3.0519321 3.16821052 3.13508085 96842299.2

Question 4

Compare 2 states, 1 that has a coast and 1 that has does not have a coast, do landlocked states use more or less energy than states that have a body of water next to them?

First I will assign a name to the subset of the table EN1 by naming it "Nonrenewable". Then I selected the groups of the columns from the original table to add to this subset table.

In [17]:
Nonrenewable <- group_by(EN1, StateCodes , Coast, TotalCYear, count )

by putting summerize in fromt of the new table name, the subset data can be printed to be viewd and analyzed

In [18]:
summarize(Nonrenewable)
StateCodesCoastTotalCYearcount
AK 1 TotalC2010 653221
AK 1 TotalC2011 653637
AK 1 TotalC2012 649341
AK 1 TotalC2013 621107
AK 1 TotalC2014 603119
AL 1 TotalC20101931522
AL 1 TotalC20111905207
AL 1 TotalC20121879716
AL 1 TotalC20131919365
AL 1 TotalC20141958221
AR 0 TotalC20101120632
AR 0 TotalC20111122544
AR 0 TotalC20121067642
AR 0 TotalC20131096438
AR 0 TotalC20141114409
AZ 0 TotalC20101383531
AZ 0 TotalC20111424944
AZ 0 TotalC20121395839
AZ 0 TotalC20131414383
AZ 0 TotalC20141422590
CA 1 TotalC20107760629
CA 1 TotalC20117777115
CA 1 TotalC20127564063
CA 1 TotalC20137665241
CA 1 TotalC20147620082
CO 0 TotalC20101513547
CO 0 TotalC20111470445
CO 0 TotalC20121440781
CO 0 TotalC20131470844
CO 0 TotalC20141477177
VA 1 TotalC20102483360
VA 1 TotalC20112380922
VA 1 TotalC20122343908
VA 1 TotalC20132414477
VA 1 TotalC20142430205
VT 0 TotalC2010 153697
VT 0 TotalC2011 150475
VT 0 TotalC2012 130412
VT 0 TotalC2013 137527
VT 0 TotalC2014 139897
WA 1 TotalC20102031428
WA 1 TotalC20112059630
WA 1 TotalC20122037127
WA 1 TotalC20132036309
WA 1 TotalC20142011941
WI 0 TotalC20101791199
WI 0 TotalC20111778018
WI 0 TotalC20121721543
WI 0 TotalC20131813458
WI 0 TotalC20141868867
WV 0 TotalC2010 738821
WV 0 TotalC2011 726341
WV 0 TotalC2012 720985
WV 0 TotalC2013 743612
WV 0 TotalC2014 752942
WY 0 TotalC2010 540122
WY 0 TotalC2011 556548
WY 0 TotalC2012 550182
WY 0 TotalC2013 539146
WY 0 TotalC2014 535612

Looking at both the land locked states and the states with with coastal fronts, it is fair to say that most of the time that states that have coasts use more energy than states that are land locked. For example Virginia, a coastal stste, in 2011 used a total of 2,380,922 btu of energy while Arizona used 1,114,409 btu of energy. But there are a few coastal states that use less energy than land locked states.

Question 5

Based on the Summarized report in the previous problem, what was New Yorks total consumption each year from 2010 to 2014? What line of best fit would show the trend of the graph accurately?

For this function, we are filtering out any row that has "NY" as a state in the Nonrenewable table and making it into its own table. I labeled this new table New_York.

In [19]:
library(dplyr) New_York <- filter(Nonrenewable, StateCodes == "NY" )

Afterwards I printed out the table for the filter by recalling the name "New_York"

In [20]:
New_York
StateCodesStateRegionDivisionCoastGreat LakesTotalP2010TotalP2011TotalP2012TotalP2013RDOMESTICMIG2011RDOMESTICMIG2012RDOMESTICMIG2013RDOMESTICMIG2014RNETMIG2011RNETMIG2012RNETMIG2013RNETMIG2014TotalCYearcount
NY New York 1 2 1 1 836703 879395 817908 872500 -4.325455 -5.946809 -5.482049 -7.804947 1.49435 -0.31899710.5698828 -1.780948 TotalC20103723729
NY New York 1 2 1 1 836703 879395 817908 872500 -4.325455 -5.946809 -5.482049 -7.804947 1.49435 -0.31899710.5698828 -1.780948 TotalC20113611091
NY New York 1 2 1 1 836703 879395 817908 872500 -4.325455 -5.946809 -5.482049 -7.804947 1.49435 -0.31899710.5698828 -1.780948 TotalC20123503309
NY New York 1 2 1 1 836703 879395 817908 872500 -4.325455 -5.946809 -5.482049 -7.804947 1.49435 -0.31899710.5698828 -1.780948 TotalC20133626150
NY New York 1 2 1 1 836703 879395 817908 872500 -4.325455 -5.946809 -5.482049 -7.804947 1.49435 -0.31899710.5698828 -1.780948 TotalC20143742892

Afterwards, I created a line plot by calling the package ggplot to print the graphing background. Then i set the data to equal the "New_York" table. I then call the plot to be a scatter plot by putting geom_point which is apart of the ggplot package. Lastly for the aesthetics of the graph, I made the x axis equal to the "TotalCYear" and the y axs equal to the "count" of each year.

In [21]:
ggplot(data = New_York) + geom_point(mapping = aes(x = TotalCYear, y = count))

The total consumption in 2010 was 3,723,729 BTUs, the total consumption for 2011 was 3,611,091 BTUs, for 2012 it was 3,503,309 BTUs, 2013 3,626,150 BTUs and 2014 it was 3,742,892 BTUs. I think the line of best fit for ths data would be a quadratic line/ parabola since there is a decline until 2012 and then an inclineor even an absolute value line since there is a sharper dip towards the center of the graph.