Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

Jupyter notebook Learn to code OU/week3/World Bank Project/WB_2_indicators.ipynb

Project: 161004 test
Views: 75
Kernel: Anaconda (Python 3)

Week 3: New Project Task by Chris Pyves 1st November 2016

To create a completely new project by choosing another two of the World Bank indicators and see if there is any correlation between them. If there is a choice of similar indicators, choose one that leads to meaningful comparisons between countries.

introduction

Although we were directed to look at the World Bank list of indicators I found the range of data sets presented somewhat overwhelming and initially I had some difficulty searching through the website to try and find some suitable data that I could work with. When I followed up a particular subject that interested me I found that the data belonged to a third party or was almost non existant or partial. There were lots of avenues to persue and it was easy to get lost in all the profusion of data. However eventually I stumbled across the World Bank page listing all their indicators (which is what I should have looked at from the outsett) and this was where I made my selection from http://data.worldbank.org/indicator I chose to download the following datasets for 2014 for all countries:

  • 'SP.POP.TOTL' - Population, total

  • 'NY.GDP.PCAP.CD' - GDP per capita (current USD)

  • 'SL.TLF.TOTL.IN' - Labor force, total

  • 'SL.UEM.TOTL.ZS' - Unemployment, total (% of total labor force) (modeled ILO estimate)

  • 'EG.USE.PCAP.KG.OE' - Energy use (kg of oil equivalent per capita) (2013 Data as 2014 was not available)

I wanted to look at whether there was any statistical correlation between GDP per Capita and a counties labour force its rate of unemployment or energy use. I suspected that energy use went 'hand-in-hand' with GDP per Capita. But wanted to see if there was any relation to the size of the Labour force or unemployment. In particular I wanted to plot this information on a chart to see what it looked like.

The project will be set out in the following manner:

Analysis

  • Step 1: downloading the data

  • Step 2: cleaning the data

  • Step 3: transforming the data

  • Step 4: merging the data

  • Step 5: calculating

  • Step 6: visualising the correlation

Conclusions

Analysis

Analysis: Step 1: downloading

The Datasets were all downloaded as csv files without problems and then uploaded to SageMathCloud again without any problems.

import warnings warnings.simplefilter('ignore', FutureWarning) from pandas import * POP_INDICATOR = 'SP.POP.TOTL' # - Population, total GDP_INDICATOR = 'NY.GDP.PCAP.CD' # - GDP per capita (current US$) LAB_INDICATOR = 'SL.TLF.TOTL.IN' # - Labor force, total UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS' # - Unemployment, total (% of total labor force) (modeled ILO estimate) ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE' # - Energy use (kg of oil equivalent per capita) (2013 Data) YEAR = 2014 ########## Download command #################################################### # popWB = download(indicator=POP_INDICATOR, country='all', start=YEAR, end=YEAR) # download(indicator=string, country='all', start=number, end=number) is a function in the pandas.io.wb module # downloads World Bank data for the given indicator and all countries and country groups for a given range of years ############################################################################### popReset = read_csv('SP.POP.TOTL.csv', skip_blank_lines=True ) gdpReset = read_csv('NY.GDP.PCAP.CD.csv') labReset = read_csv('SL.TLF.TOTL.IN.csv') unempReset = read_csv('SL.UEM.TOTL.ZS.csv') energyReset = read_csv('EG.USE.PCAP.KG.OE.csv')
# The following functions may be useful during the data cleaning process and so are loaded here def roundToMillions (value): return round(value / 1000000) def roundToThousands (value): return round(value / 1000) # The following function will not be required: #def usdToGBP (usd): # return usd / 1.564768# note if this is used check the correct rate for 2014 is used (this contains the 2013 rate)

Analysis: Step 2: cleaning

Each Reset file was then cleaned in the following order and saved as a Country file:

  • 'SP.POP.TOTL' - Population, total

  • 'NY.GDP.PCAP.CD' - GDP per capita (current USD)

  • 'SL.TLF.TOTL.IN' - Labor force, total

  • 'SL.UEM.TOTL.ZS' - Unemployment, total (% of total labor force) (modeled ILO estimate)

  • 'EG.USE.PCAP.KG.OE' - Energy use (kg of oil equivalent per capita) (2013 Data)

POP_INDICATOR = 'SP.POP.TOTL'

Whilst the data appeared to be in good order I had difficulty trying to change the data type of the data column with Population values from object to type int64. After some testing this was found to be cause by a single cell containing two dots which presumably is used to signify that no data is currently available for that country. Whilst this was quite frustrating to have everything stopped by just one cell it turned out to a useful learning experience.

A simple solution was used and this cell was assigned the value 0.00

After that the asType method was applied to the whole column and the column type was changed to int64.

"asType(aType) when applied to a dataframe column, the method changes the data type of each value in that column to the type given by the string aType." Glossary Week 2

#Identifying the cell causing the problem that prevent asType('int64') from being applied POP_INDICATOR = '2014 [YR2014]' popReset[(popReset[POP_INDICATOR]=='..')]
Series Name Series Code Country Name Country Code 2014 [YR2014]
61 Population, total SP.POP.TOTL Eritrea ERI ..
#POP_INDICATOR = 'SP.POP.TOTL' Population, total #popReset.head() #popReset.columns# column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object') popCountries = popReset[:217].dropna() POP_INDICATOR = '2014 [YR2014]' # 'SP.POP.TOTL' replace with header name #popCountries.dtypes# check before: all objects ###################################################################################### popCountries.loc[(popCountries[POP_INDICATOR]=='..'),POP_INDICATOR]=0.0 # ###################################################################################### popCountries[POP_INDICATOR] = popCountries[POP_INDICATOR].astype('int64')# #popCountries.dtypes

GDP_INDICATOR = 'NY.GDP.PCAP.CD'

#GDP_INDICATOR = 'NY.GDP.PCAP.CD' GDP per capita (current USD) #gdpReset.head() #popReset.columns#column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object') gdpCountries = gdpReset[:217].dropna() GDP_INDICATOR = '2014 [YR2014]' # 'NY.GDP.PCAP.CD' replace with header name gdpCountries.dtypes# check before: all objects #############Clean: Try using .loc[row_indexer,col_indexer] = value instead########### gdpCountries.loc[(gdpCountries[GDP_INDICATOR]=='..'),GDP_INDICATOR]=0.0 # ###################################################################################### gdpCountries[GDP_INDICATOR] = gdpCountries[GDP_INDICATOR].astype('float64')# #gdpCountries.dtypes

LAB_INDICATOR = 'SL.TLF.TOTL.IN'

#LAB_INDICATOR = 'SL.TLF.TOTL.IN' - Labor force, total #labReset.head() #popReset.columns# column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object') labCountries = labReset[:217].dropna() # rename(columns={'2014 [YR2014]' : POP}) renames the column oldName to newName. # london = london.rename(columns={'WindDirDegrees<br />':'WindDirDegrees'}) lab_INDICATOR = '2014 [YR2014]' # 'SL.TLF.TOTL.IN' replace with header name labCountries.dtypes# check before: all objects #############Clean: Try using .loc[row_indexer,col_indexer] = value instead########### labCountries.loc[(labCountries[lab_INDICATOR]=='..'),lab_INDICATOR]=0.0 # ###################################################################################### labCountries[lab_INDICATOR] = labCountries[lab_INDICATOR].astype('float64')# #labCountries.dtypes

UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS'

#UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS' # - Unemployment, total (% of total labor force) (modeled ILO estimate) #unempReset.head() #popReset.columns# column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object') unempCountries = unempReset[:217].dropna() unemp_INDICATOR = '2014 [YR2014]' # 'SL.UEM.TOTL.ZS' replace with header name unempCountries.dtypes# check before: all objects #############Clean: Try using .loc[row_indexer,col_indexer] = value instead########### unempCountries.loc[(unempCountries[unemp_INDICATOR]=='..'),unemp_INDICATOR]=0.0 # ###################################################################################### unempCountries[unemp_INDICATOR] = unempCountries[unemp_INDICATOR].astype('float64')# #unempCountries.dtypes

ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE'

This file was cleaned just like all the others before. But later on it was felt that replacing the cells with '..' with zeros might not be the best course of action and a decision was made to remove all these rows from this particular data set. This can be seen by the addition of a drop command into the code. When the resultat file was checked later on it returned a better correlation coefficient and made the graph look a lot clearer.

#ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE' # - Energy use (kg of oil equivalent per capita) (2013 Data) #energyReset.head() #popReset.columns#column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object') energyCountries = energyReset[:217].dropna() energy_INDICATOR = '2013 [YR2013]' # 'SL.UEM.TOTL.ZS' replace with header name energyCountries.dtypes# check before: all objects #############Clean: Try using .loc[row_indexer,col_indexer] = value instead########### #energyCountries.loc[(energyCountries[energy_INDICATOR]=='..'),energy_INDICATOR]=0.0 # try deleting these rows. # df.drop(df.index[[1,3]]) energyColumn = energyCountries.drop(energyCountries.index[energyCountries[energy_INDICATOR]=='..']) # deletes all these rows. ###################################################################################### energyCountries = energyColumn #energyCountries.dtypes #energyCountries['2013 [YR2013]'] energyCountries[energy_INDICATOR] = energyCountries[energy_INDICATOR].astype('float64')# #energyCountries.dtypes

Analysis: Step 3: transforming

Each individual Country file was reduced to just essential columns and saved as a Clean file. I kept the Country Codes column as I felt that this would prove to be a better 'Common Column' join all the tables with.

"A join is the merging of two tables on a common column. The resulting table has all columns of both tables (the common column isn’t duplicated), and the rows are determined by the type of join. Rows in the two tables that have the same value in the common column become a joined row in the resulting table." Glossary Week 3

#POP_INDICATOR = 'SP.POP.TOTL' Population, total # Round Population to 1000s COUNTRY = 'Country Name' CODE = 'Country Code' POP = 'Population (1000s)' popCountries[POP]= popCountries[POP_INDICATOR].apply(roundToThousands)# cant run this till data cleaned # Consolidates data & gets rid of year & Source data [checked] headings = [COUNTRY, CODE, POP] popClean = popCountries[headings] #popClean.head()
#GDP_INDICATOR = 'NY.GDP.PCAP.CD' GDP per capita (current USD) # Round GDP to? leave as is for time being COUNTRY = 'Country Name' CODE = 'Country Code' GDP = 'Per Capita GDP $' gdpCountries[GDP]= gdpCountries[GDP_INDICATOR]#.apply(roundToThousands)# cant run this till data cleaned # Consolidates data & gets rid of year & Source data [checked] headings = [COUNTRY, CODE, GDP] gdpClean = gdpCountries[headings] #gdpClean.head()
#LAB_INDICATOR = 'SL.TLF.TOTL.IN' - Labor force, total # Round LAB to ?? leave as is for the time being COUNTRY = 'Country Name' CODE = 'Country Code' LAB = 'Labour force total' labCountries[LAB]= labCountries[lab_INDICATOR]#.apply(roundToThousands)# cant run this till data cleaned # Consolidates data & gets rid of year & Source data [checked] headings = [COUNTRY, CODE, LAB] labClean = labCountries[headings] #labClean.head()
#UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS' # - Unemployment, total (% of total labor force) (modeled ILO estimate) # Round Unemp to % COUNTRY = 'Country Name' CODE = 'Country Code' UNEMP = 'Unemployment (% total force)' #Unemployment, total (% of total labor force) (modeled ILO estimate) unempCountries[UNEMP]= unempCountries[unemp_INDICATOR]# # Consolidates data & gets rid of year & Source data [checked] headings = [COUNTRY, CODE, UNEMP] unempClean = unempCountries[headings] #unempClean.head()
#ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE' # - Energy use (kg of oil equivalent per capita) (2013 Data) # Round Energy to ?? COUNTRY = 'Country Name' CODE = 'Country Code' ENERGY = 'kg of oil / capita' # Energy use (kg of oil equivalent per capita) (2013 Data) energyCountries[ENERGY]= energyCountries[energy_INDICATOR]#.apply(roundToThousands)# cant run this till data cleaned # Consolidates data & gets rid of year & Source data [checked] headings = [COUNTRY, CODE, ENERGY] energyClean = energyCountries[headings] #energyClean.head()

Analysis: Step 4: merging the data

All the Clean files were then merged together into one data file and I was able to test out Method Chaining using the Merge function to merge five files all at the same time. I was half expecting this process to blow up in my face but it actually worked! The data file once checked was then reduced down to a finaldata file with just the columns required for analysis.

# Perform Method Chain Merge: data = popClean.merge(gdpClean, on=CODE).merge(labClean,on=CODE).merge(unempClean,on=CODE).merge(energyClean,on=CODE) data.head() # Could make use of drop axis command inplace to remove specific commands or use method in next cell #df.drop('Column name',axis=1,inplace=True)
Country Name_x Country Code Population (1000s) Country Name_y Per Capita GDP $ Country Name_x Labour force total Country Name_y Unemployment (% total force) Country Name kg of oil / capita
0 Albania ALB 2894 Albania 4588.649440 Albania 1292576.0 Albania 16.1 Albania 800.568726
1 Algeria DZA 38934 Algeria 5484.066806 Algeria 12355028.0 Algeria 9.5 Algeria 1245.992191
2 Angola AGO 24228 Angola 5232.690501 Angola 8844204.0 Angola 6.8 Angola 654.905481
3 Argentina ARG 42980 Argentina 12324.938786 Argentina 19540451.0 Argentina 8.2 Argentina 1894.617872
4 Armenia ARM 3006 Armenia 3873.533566 Armenia 1559971.0 Armenia 17.1 Armenia 969.309790
# Now Consolidate all data into one big table [checked] headings = [COUNTRY, CODE, POP, GDP, LAB, UNEMP, ENERGY] finalData = data[headings] finalData.head() #finalData.tail()
Country Name Country Code Population (1000s) Per Capita GDP $ Labour force total Unemployment (% total force) kg of oil / capita
0 Albania ALB 2894 4588.649440 1292576.0 16.1 800.568726
1 Algeria DZA 38934 5484.066806 12355028.0 9.5 1245.992191
2 Angola AGO 24228 5232.690501 8844204.0 6.8 654.905481
3 Argentina ARG 42980 12324.938786 19540451.0 8.2 1894.617872
4 Armenia ARM 3006 3873.533566 1559971.0 17.1 969.309790

Analysis: Step 5: calculating

A number of tests were then performed on this data set to determine the strength of any correlation between the data sets.

  1. Per Capita GDP & Labour Force Surprisingly there was onlt a small correlation of 0.193 between Per Capita GDP and the size of the Labour force. Yet current economic theory suggest that where more people are gathered incomes tend to be higher. This does not appear to hold true when it comes to looking at Countries

  2. Per Capita GDP & Unemployment This returned a slightly higher correlation of 0.274. Although as unemployment is measured as a % of the total labout force the relationship here between Per Capitas GDP and unemployment will be an inverse one. The more Kg of oil a country consumes the less its unemployment percentage is likely to be. Similar curves were in fact found with the next data set.

  3. Kg Oil per Capita & Unemployment This returned a marginal increase in correlation of 0.332. The graph of this data set is shown below and seems to describe an inverse realtionship in the form 1/x. Although it is quite difficult to make much sense of this data an increase in Kg of oil per Capital would suggests lower unemployment. But there is a large group of countries just sitting in the bottom left corner.

  4. Kg Oil per Capita & Labour Force This has a higher correlation coefficient of 0.44668 and would seem to make sense. The bigger the size of the Labour force the higher the KgOil consumption per Capita.

  5. GDP per Capita & Kg Oil per Capita This returned the highest Correlation Coefficient of all the data sets tested of 0.666 and is in line with common sense. However the dataset when plotted did have a number of results on the zero line for Kg Oil per Capita. I wondered whether this might be distoring the data somewhat so I took the decision to eliminat all the rows from my dataset that I had replaced with a zero. The code was amended and rerun to return a correlation factor of 0.799 the highest correlation coefficient so far.

The next step was to graph and plot this data.

from scipy.stats import spearmanr # A number of test were performed to identify which data set had the most significant correlation # However imported Energy data had had null values set to zero and it was felt this may be distorting results so they were stripped gdpColumn = finalData[GDP]# energyColumn = finalData[ENERGY]# (correlation, pValue) = spearmanr(energyColumn, gdpColumn) print('The correlation is', correlation) if pValue < 0.05: print('It is statistically significant.') else: print('It is not statistically significant.')
The correlation is 0.799198722722 It is statistically significant.

Analysis: Step 6: visualising the correlation

Mapping Per Capital Energy with Per Capita GDP:

%matplotlib inline finalData.plot(y=ENERGY, x=GDP, kind='scatter', grid=True, figsize=(10, 4))# Graph changed logx=True, # I tried both layouts; this graph seemed to be the easiest to read
<matplotlib.axes._subplots.AxesSubplot at 0x7fa4f9853d68>
Image in a Jupyter notebook

The GDP per capita is obtained by dividing the country’s gross domestic product, adjusted by inflation, by the total population.

If you look at the Per Capital GDP there is a distinct group of four countries all to the right of the vertical line measuring $80,000. These countries were investigated using the boolean search below as their per capita GDP appears to be the highest in the world. Swizerland Luxembourg & Norway all have Per Capital Kg of Oil ranging from $3.3k - $7.3k whilst Qatar is in a league of its own just short of $19k.

finalData[(finalData[GDP] > 80000) & (finalData[ENERGY] > 0)].sort(GDP,ascending=0)# These are the outliers for GDP
Country Name Country Code Population (1000s) Per Capita GDP $ Labour force total Unemployment (% total force) kg of oil / capita
76 Luxembourg LUX 556 116612.884152 266117.0 6.1 7310.309923
95 Norway NOR 5137 97429.708476 2727176.0 3.4 6438.755199
104 Qatar QAT 2172 96732.529026 1593886.0 0.3 19120.344284
119 Switzerland CHE 8189 85610.842029 4753196.0 4.5 3304.032860

I found this dataset difficult to understand. But researching these countries provided explanations which seem to support the facts

  1. Luxembourg has a total population equivalent to a small city. It is easy to find other areas the same size that have higher GDP per capita, they just aren't countries so don't appear in the rankings. Luxembourg's GDP also includes a lot of businesses that are set up there to take advantage of low taxes. Most of the revenue is actually generated elsewhere, but directed through Luxembourg by clever accounting tricks - that boosts the GDP figures. Whilst Luxembourg’s high energy use per capita can not be explained by access to abundant and cheap energy sources. The high ratio of energy use in Luxembourg can be partly explained by the low sales taxes on petroleum products, which encourage motorists and other consumers from neighbouring countries (Belgium, France and Germany) to buy their supplies in Luxembourg.

  2. Norway has enormous oil and gas resources and is a major producer of hydropower. It is Europe's largest oil producer and the third-largest natural gas exporter in the entire world behind Russia and Qatar.

  3. Qatar has a well-developed oil exploration industry where the petroleum industry accounts for 70% of its government revenue, 60% of its GDP and 85% of its export earnings. Qatar is also the most conterversial country on the list for its use of forced labour from nations throughout Asia and parts of Africa. Qatar belches out more CO2 per capita than anywhere else.

  4. Swiss banking and financial institutions keep Switzerland and its economy afloat. It is important to note that some of the wealthiest people and companies in the world own Swiss bank accounts and therefore Switzerland has excess capital to use for investment purposes. Zurich and Geneva, Switzerland's most well-known cities, have consistently ranked among the top ten highest living standard cities in the world.

Energy use (kg of oil equivalent per capita)

Energy use refers to use of primary energy before transformation to other end-use fuels, which is equal to indigenous production plus imports and stock changes, minus exports and fuels supplied to ships and aircraft engaged in international transport.

The next dataset to look at are those countries with per Capita GDP less than $80000 but with Kg of Oil / Capita greater than $9,500 which seem to contain about 5 countries. They are shown below sorted in descending order of Kg of Oil / Capita.

finalData[(finalData[GDP] < 80000) & (finalData[ENERGY] > 9500)].sort(ENERGY,ascending=0)# These are the outliers for GDP
Country Name Country Code Population (1000s) Per Capita GDP $ Labour force total Unemployment (% total force) kg of oil / capita
54 Iceland ISL 327 52036.731814 192535.0 5.0 18177.252567
125 Trinidad and Tobago TTO 1354 21317.449266 686710.0 4.0 14537.570462
31 Curacao CUW 156 0.000000 0.0 0.0 11800.977754
8 Bahrain BHR 1362 24855.215635 750065.0 3.9 10171.681017
70 Kuwait KWT 3753 43593.702018 1992954.0 3.0 9757.448683

This data extract threw up another curious collection of countries which apart from Kuwait & Bahrain I would not normally associate as being amongst the highest per Capita users of energy in the World.

  1. With a population of just 327 thousand Iceland is the world’s largest hydropower country per capita, the world’s largest geothermal energy producer per capita, and the world’s largest electricity producer per capita. It's competitively priced electricity (from hydro- and geothermal power) has attracted numerous industries and services. Currently, the aluminum industry in Iceland consumes close to 75% of all the electricity produced in Iceland. A very high share of Iceland’s energy comes from renewable sources, making Iceland one of the greenest country in the world with regard to energy consumption. In total, approximately 86% of Iceland’s consumption of primary energy comes from renewable sources. And what is especially interesting, is the fact that Iceland still has access to numerous competitive renewable energy sources yet to be harnessed.

  2. Trinidad and Tobago is the wealthiest country in the Caribbean as well as the third richest country by GDP (PPP) per capita in the Americas after the United States and Canada. It is the region's leading producer of oil and gas; and houses one of the largest natural gas processing facilities in the Western Hemisphere. Trinidad and Tobago is the leading Caribbean producer of oil and gas, and its economy is heavily dependent upon these resources but it also supplies manufactured goods, notably food and beverages, as well as cement to the Caribbean region. Oil and gas account for about 40% of GDP and 80% of exports, but only 5% of employment. https://en.wikipedia.org/wiki/Economy_of_Trinidad_and_Tobago

  3. Curaçao. With a population of 156 thousand Curaçao is an island country in the southern Caribbean Sea, approximately 65 kilometres north of the Venezuelan coast. It has one of the highest standards of living in the Caribbean. The island has a well-developed infrastructure with strong tourism and financial services sectors. Shipping, international trade, oil refining, and other activities related to the port of Willemstad (like the Free Trade Zone) also make a significant contribution to the economy. The discovery of oil in the Maracaibo Basin of Venezuela in the early 20th century forced the Venezuelan government to search for ideal locations for large scale refining. Curaçao's proximity to the country, naturally deep harbors, and stable government led Royal Dutch Shell to construct the Isla Refinery, the largest refinery in the world at the time. The refinery is located in Willemstad harbour and began operating in 1918. In 2013, the unemployment rate was 13%. Though Curaçao is considered a tax haven, it adheres to the EU Code of Conduct against harmful tax practices. https://en.wikipedia.org/wiki/Economy_of_Curaçao

  4. Bahrain. Bahrain has the freest economy in the Middle East and North Africa region and is the tenth freest economy in the world. Bahrain was recognised by the World Bank as a high income economy. Petroleum production is Bahrain's most exported product, accounting for 60% of export receipts, 70% of government revenues, and 11% of GDP. Aluminium production is the second most exported product, followed by finance and construction materials. https://en.wikipedia.org/wiki/Economy_of_Bahrain

  5. Kuwait is a constitutional monarchy with a high income economy backed by the world's sixth largest oil reserves. According to the World Bank, the country has the fourth highest per capita income in the world. 70% of the population are expatriates, while only 30% of the population are Kuwaiti citizens. https://en.wikipedia.org/wiki/Kuwait

About 'EG.USE.PCAP.KG.OE' - Energy use (kg of oil equivalent per capita)

Development Relevance: In developing economies growth in energy use is closely related to growth in the modern sectors - industry, motorized transport, and urban areas - but energy use also reflects climatic, geographic, and economic factors (such as the relative price of energy). Energy use has been growing rapidly in low- and middle-income economies, but high-income economies still use almost five times as much energy on a per capita basis. Governments in many countries are increasingly aware of the urgent need to make better use of the world's energy resources. Improved energy efficiency is often the most economic and readily available means of improving energy security and reducing greenhouse gas emissions.

Limitations and Exceptions: The IEA makes these estimates in consultation with national statistical offices, oil companies, electric utilities, and national energy experts. The IEA occasionally revises its time series to reflect political changes, and energy statistics undergo continual changes in coverage or methodology as more detailed energy accounts become available. Breaks in series are therefore unavoidable.

Mapping Per Capital Energy to Unemployment % of Labour force:

%matplotlib inline finalData.plot(y=UNEMP, x=ENERGY, kind='scatter', grid=True, figsize=(10, 4))# Graph changed logx=True,
<matplotlib.axes._subplots.AxesSubplot at 0x7fa4ffb55358>
Image in a Jupyter notebook

Whilst a significant correlation was also found between energy and Unemployment logic would suggest that this is an inverse relationship and plotting the county values suggest a curve comming down the y axis and turning to run along the x axis. This a clasical inverse relationship with most of the data being bunched in the bottom left corner. As we have already considered the high per Capita energy & GNP countries the following extract focuses on those countries with low energy per Capita & high unemployment. The list has been sorted in descending order of unemployment as % of the Labour force.

finalData[(finalData[UNEMP] > 15) & (finalData[ENERGY] <5000)].sort(UNEMP,ascending=0)#
Country Name Country Code Population (1000s) Per Capita GDP $ Labour force total Unemployment (% total force) kg of oil / capita
14 Bosnia and Herzegovina BIH 3818 4851.660528 1496651.0 27.900000 1687.968693
77 Macedonia, FYR MKD 2076 5453.281275 952721.0 27.900000 1349.475017
48 Greece GRC 10892 21627.354288 4937216.0 26.299999 2134.104396
113 South Africa ZAF 54059 6472.101027 20001038.0 25.100000 2655.850165
115 Spain ESP 46481 29718.500216 23345028.0 24.700001 2503.794194
86 Mozambique MOZ 27216 622.638064 12471084.0 22.600000 407.372414
109 Serbia SRB 7131 6200.173221 3132558.0 22.200001 2078.490179
43 Gabon GAB 1688 10772.061754 645573.0 19.700001 1434.913543
74 Libya LBY 6259 6573.386737 2324217.0 19.200001 2711.300391
84 Montenegro MNE 622 7378.044404 251741.0 19.100000 1651.299808
88 Namibia NAM 2403 5342.944460 901859.0 18.600000 742.347200
15 Botswana BWA 2220 7153.444325 1155963.0 18.200001 1098.358841
137 Yemen, Rep. YEM 26184 0.000000 7628379.0 17.400000 323.718668
4 Armenia ARM 3006 3873.533566 1559971.0 17.100000 969.309790
29 Croatia HRV 4238 13480.650754 1845114.0 16.700001 1813.931187
58 Iraq IRQ 35273 6336.468066 8801559.0 16.400000 1466.608239
0 Albania ALB 2894 4588.649440 1292576.0 16.100000 800.568726
32 Cyprus CYP 1154 27245.744625 613733.0 15.600000 1691.102893

The top 10 Countries in this list have unemployment rates around or above 20%: Greece South Africa Spain & Mozambique all have large populations & struggling economies. What is noticeable is the inclusion of Iraq & Libya which are two of the major oil producing economies yet due War their economies have suffered major setbacks.

Conclusions

The data acquisition was not difficult and the data cleaning although initially laborious got quicker with each file that was repeated.

However the analysis of the final data proved to be more difficult than anticipated to understand. The decision to reanalyse the data after removing all countries with no per Capita energy results turned out to be the right one as the correlation coefficients were improved and it was easier to focus on remaining data.

What surprised me most about this study was the list of countries from the two data extracts that were so diverse and out of the ordinary but which on further investigation turned out to be substantialy correct.