Jupyter notebook Learn to code OU/week3/World Bank Project/WB_2_indicators.ipynb

Path: Learn to code OU/week3/World Bank Project/WB_2_indicators.ipynb

Views: ⁷⁵

Kernel: Anaconda (Python 3)

Week 3: New Project Task by Chris Pyves 1st November 2016

To create a completely new project by choosing another two of the World Bank indicators and see if there is any correlation between them. If there is a choice of similar indicators, choose one that leads to meaningful comparisons between countries.

introduction

Although we were directed to look at the World Bank list of indicators I found the range of data sets presented somewhat overwhelming and initially I had some difficulty searching through the website to try and find some suitable data that I could work with. When I followed up a particular subject that interested me I found that the data belonged to a third party or was almost non existant or partial. There were lots of avenues to persue and it was easy to get lost in all the profusion of data. However eventually I stumbled across the World Bank page listing all their indicators (which is what I should have looked at from the outsett) and this was where I made my selection from http://data.worldbank.org/indicator I chose to download the following datasets for 2014 for all countries:

'SP.POP.TOTL' - Population, total
'NY.GDP.PCAP.CD' - GDP per capita (current USD)
'SL.TLF.TOTL.IN' - Labor force, total
'SL.UEM.TOTL.ZS' - Unemployment, total (% of total labor force) (modeled ILO estimate)
'EG.USE.PCAP.KG.OE' - Energy use (kg of oil equivalent per capita) (2013 Data as 2014 was not available)

I wanted to look at whether there was any statistical correlation between GDP per Capita and a counties labour force its rate of unemployment or energy use. I suspected that energy use went 'hand-in-hand' with GDP per Capita. But wanted to see if there was any relation to the size of the Labour force or unemployment. In particular I wanted to plot this information on a chart to see what it looked like.

The project will be set out in the following manner:

Analysis

Step 1: downloading the data
Step 2: cleaning the data
Step 3: transforming the data
Step 4: merging the data
Step 5: calculating
Step 6: visualising the correlation

Conclusions

Analysis

Analysis: Step 1: downloading

The Datasets were all downloaded as csv files without problems and then uploaded to SageMathCloud again without any problems.

In [48]:

import warnings
warnings.simplefilter('ignore', FutureWarning)

from pandas import *

POP_INDICATOR = 'SP.POP.TOTL' # - Population, total 
GDP_INDICATOR = 'NY.GDP.PCAP.CD' # - GDP per capita (current US$)
LAB_INDICATOR = 'SL.TLF.TOTL.IN'  # - Labor force, total
UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS' # - Unemployment, total (% of total labor force) (modeled ILO estimate)
ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE'  # - Energy use (kg of oil equivalent per capita) (2013 Data)

YEAR = 2014
########## Download command ####################################################
# popWB = download(indicator=POP_INDICATOR, country='all', start=YEAR, end=YEAR)
# download(indicator=string, country='all', start=number, end=number) is a function in the pandas.io.wb module 
# downloads World Bank data for the given indicator and all countries and country groups for a given range of years
###############################################################################
popReset = read_csv('SP.POP.TOTL.csv', skip_blank_lines=True )
gdpReset = read_csv('NY.GDP.PCAP.CD.csv')
labReset = read_csv('SL.TLF.TOTL.IN.csv')
unempReset = read_csv('SL.UEM.TOTL.ZS.csv')
energyReset = read_csv('EG.USE.PCAP.KG.OE.csv')

In [49]:

# The following functions may be useful during the data cleaning process and so are loaded here

def roundToMillions (value):
    return round(value / 1000000)

def roundToThousands (value):
    return round(value / 1000)

# The following function will not be required:
#def usdToGBP (usd):
#    return usd / 1.564768# note if this is used check the correct rate for 2014 is used (this contains the 2013 rate)

Analysis: Step 2: cleaning

Each Reset file was then cleaned in the following order and saved as a Country file:

'SP.POP.TOTL' - Population, total
'NY.GDP.PCAP.CD' - GDP per capita (current USD)
'SL.TLF.TOTL.IN' - Labor force, total
'SL.UEM.TOTL.ZS' - Unemployment, total (% of total labor force) (modeled ILO estimate)
'EG.USE.PCAP.KG.OE' - Energy use (kg of oil equivalent per capita) (2013 Data)

POP_INDICATOR = 'SP.POP.TOTL'

Whilst the data appeared to be in good order I had difficulty trying to change the data type of the data column with Population values from object to type int64. After some testing this was found to be cause by a single cell containing two dots which presumably is used to signify that no data is currently available for that country. Whilst this was quite frustrating to have everything stopped by just one cell it turned out to a useful learning experience.

A simple solution was used and this cell was assigned the value 0.00

After that the asType method was applied to the whole column and the column type was changed to int64.

"asType(aType) when applied to a dataframe column, the method changes the data type of each value in that column to the type given by the string aType." Glossary Week 2

In [50]:

#Identifying the cell causing the problem that prevent asType('int64') from being applied
POP_INDICATOR = '2014 [YR2014]'
popReset[(popReset[POP_INDICATOR]=='..')]

	Series Name	Series Code	Country Name	Country Code	2014 [YR2014]
61	Population, total	SP.POP.TOTL	Eritrea	ERI	..

In [51]:

#POP_INDICATOR = 'SP.POP.TOTL' Population, total
#popReset.head()
#popReset.columns# column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object')
popCountries = popReset[:217].dropna()
POP_INDICATOR = '2014 [YR2014]' # 'SP.POP.TOTL' replace with header name
#popCountries.dtypes# check before: all objects
######################################################################################
popCountries.loc[(popCountries[POP_INDICATOR]=='..'),POP_INDICATOR]=0.0              #
######################################################################################
popCountries[POP_INDICATOR] = popCountries[POP_INDICATOR].astype('int64')# 
#popCountries.dtypes

GDP_INDICATOR = 'NY.GDP.PCAP.CD'

In [52]:

#GDP_INDICATOR = 'NY.GDP.PCAP.CD' GDP per capita (current USD)
#gdpReset.head()
#popReset.columns#column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object')
gdpCountries = gdpReset[:217].dropna()
GDP_INDICATOR = '2014 [YR2014]' # 'NY.GDP.PCAP.CD' replace with header name
gdpCountries.dtypes# check before: all objects
#############Clean: Try using .loc[row_indexer,col_indexer] = value instead###########
gdpCountries.loc[(gdpCountries[GDP_INDICATOR]=='..'),GDP_INDICATOR]=0.0              #
######################################################################################
gdpCountries[GDP_INDICATOR] = gdpCountries[GDP_INDICATOR].astype('float64')# 
#gdpCountries.dtypes

LAB_INDICATOR = 'SL.TLF.TOTL.IN'

In [53]:

#LAB_INDICATOR = 'SL.TLF.TOTL.IN' - Labor force, total
#labReset.head()
#popReset.columns# column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object')
labCountries = labReset[:217].dropna()
#                rename(columns={'2014 [YR2014]' : POP}) renames the column oldName to newName.
#                london = london.rename(columns={'WindDirDegrees<br />':'WindDirDegrees'})
lab_INDICATOR = '2014 [YR2014]' # 'SL.TLF.TOTL.IN' replace with header name
labCountries.dtypes# check before: all objects
#############Clean: Try using .loc[row_indexer,col_indexer] = value instead###########
labCountries.loc[(labCountries[lab_INDICATOR]=='..'),lab_INDICATOR]=0.0              #
######################################################################################
labCountries[lab_INDICATOR] = labCountries[lab_INDICATOR].astype('float64')# 
#labCountries.dtypes

UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS'

In [54]:

#UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS' # - Unemployment, total (% of total labor force) (modeled ILO estimate)
#unempReset.head()
#popReset.columns# column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object')
unempCountries = unempReset[:217].dropna()
unemp_INDICATOR = '2014 [YR2014]' # 'SL.UEM.TOTL.ZS' replace with header name
unempCountries.dtypes# check before: all objects
#############Clean: Try using .loc[row_indexer,col_indexer] = value instead###########
unempCountries.loc[(unempCountries[unemp_INDICATOR]=='..'),unemp_INDICATOR]=0.0      #
######################################################################################
unempCountries[unemp_INDICATOR] = unempCountries[unemp_INDICATOR].astype('float64')# 
#unempCountries.dtypes

ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE'

This file was cleaned just like all the others before. But later on it was felt that replacing the cells with '..' with zeros might not be the best course of action and a decision was made to remove all these rows from this particular data set. This can be seen by the addition of a drop command into the code. When the resultat file was checked later on it returned a better correlation coefficient and made the graph look a lot clearer.

In [55]:

#ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE'  # - Energy use (kg of oil equivalent per capita) (2013 Data)
#energyReset.head()
#popReset.columns#column names Index([u'Series Name', u'Series Code', u'Country Name', u'Country Code', u'2014 [YR2014]'],dtype='object')
energyCountries = energyReset[:217].dropna()
energy_INDICATOR = '2013 [YR2013]' # 'SL.UEM.TOTL.ZS' replace with header name
energyCountries.dtypes# check before: all objects
#############Clean: Try using .loc[row_indexer,col_indexer] = value instead###########
#energyCountries.loc[(energyCountries[energy_INDICATOR]=='..'),energy_INDICATOR]=0.0  # try deleting these rows.
# df.drop(df.index[[1,3]])
energyColumn = energyCountries.drop(energyCountries.index[energyCountries[energy_INDICATOR]=='..'])  # deletes all these rows.
######################################################################################
energyCountries = energyColumn
#energyCountries.dtypes
#energyCountries['2013 [YR2013]']
energyCountries[energy_INDICATOR] = energyCountries[energy_INDICATOR].astype('float64')#
#energyCountries.dtypes

Analysis: Step 3: transforming

Each individual Country file was reduced to just essential columns and saved as a Clean file. I kept the Country Codes column as I felt that this would prove to be a better 'Common Column' join all the tables with.

"A join is the merging of two tables on a common column. The resulting table has all columns of both tables (the common column isn’t duplicated), and the rows are determined by the type of join. Rows in the two tables that have the same value in the common column become a joined row in the resulting table." Glossary Week 3

In [56]:

#POP_INDICATOR = 'SP.POP.TOTL' Population, total
# Round Population to 1000s
COUNTRY = 'Country Name'
CODE = 'Country Code'
POP = 'Population (1000s)'
popCountries[POP]= popCountries[POP_INDICATOR].apply(roundToThousands)# cant run this till data cleaned

# Consolidates data & gets rid of year & Source data [checked]
headings = [COUNTRY, CODE, POP]
popClean = popCountries[headings]
#popClean.head()

In [57]:

#GDP_INDICATOR = 'NY.GDP.PCAP.CD' GDP per capita (current USD)
# Round GDP to? leave as is for time being
COUNTRY = 'Country Name'
CODE = 'Country Code'
GDP = 'Per Capita GDP $'
gdpCountries[GDP]= gdpCountries[GDP_INDICATOR]#.apply(roundToThousands)# cant run this till data cleaned

# Consolidates data & gets rid of year & Source data [checked]
headings = [COUNTRY, CODE, GDP]
gdpClean = gdpCountries[headings]
#gdpClean.head()

In [58]:

#LAB_INDICATOR = 'SL.TLF.TOTL.IN' - Labor force, total
# Round LAB to ?? leave as is for the time being
COUNTRY = 'Country Name'
CODE = 'Country Code'
LAB = 'Labour force total'
labCountries[LAB]= labCountries[lab_INDICATOR]#.apply(roundToThousands)# cant run this till data cleaned

# Consolidates data & gets rid of year & Source data [checked]
headings = [COUNTRY, CODE, LAB]
labClean = labCountries[headings]
#labClean.head()

In [59]:

#UNEMP_INDICATOR = 'SL.UEM.TOTL.ZS' # - Unemployment, total (% of total labor force) (modeled ILO estimate)
# Round Unemp to %
COUNTRY = 'Country Name'
CODE = 'Country Code'
UNEMP = 'Unemployment (% total force)' #Unemployment, total (% of total labor force) (modeled ILO estimate)
unempCountries[UNEMP]= unempCountries[unemp_INDICATOR]#

# Consolidates data & gets rid of year & Source data [checked]
headings = [COUNTRY, CODE, UNEMP]
unempClean = unempCountries[headings]
#unempClean.head()

In [60]:

#ENERGY_INDICATOR = 'EG.USE.PCAP.KG.OE'  # - Energy use (kg of oil equivalent per capita) (2013 Data)
# Round Energy to ??
COUNTRY = 'Country Name'
CODE = 'Country Code'
ENERGY = 'kg of oil / capita' # Energy use (kg of oil equivalent per capita) (2013 Data)
energyCountries[ENERGY]= energyCountries[energy_INDICATOR]#.apply(roundToThousands)# cant run this till data cleaned

# Consolidates data & gets rid of year & Source data [checked]
headings = [COUNTRY, CODE, ENERGY]
energyClean = energyCountries[headings]
#energyClean.head()

Analysis: Step 4: merging the data

All the Clean files were then merged together into one data file and I was able to test out Method Chaining using the Merge function to merge five files all at the same time. I was half expecting this process to blow up in my face but it actually worked! The data file once checked was then reduced down to a finaldata file with just the columns required for analysis.

In [61]:

# Perform Method Chain Merge:
data = popClean.merge(gdpClean, on=CODE).merge(labClean,on=CODE).merge(unempClean,on=CODE).merge(energyClean,on=CODE)
data.head()
# Could make use of drop axis command inplace to remove specific commands or use method in next cell
#df.drop('Column name',axis=1,inplace=True)

	Country Name_x	Country Code	Population (1000s)	Country Name_y	Per Capita GDP $	Country Name_x	Labour force total	Country Name_y	Unemployment (% total force)	Country Name	kg of oil / capita
0	Albania	ALB	2894	Albania	4588.649440	Albania	1292576.0	Albania	16.1	Albania	800.568726
1	Algeria	DZA	38934	Algeria	5484.066806	Algeria	12355028.0	Algeria	9.5	Algeria	1245.992191
2	Angola	AGO	24228	Angola	5232.690501	Angola	8844204.0	Angola	6.8	Angola	654.905481
3	Argentina	ARG	42980	Argentina	12324.938786	Argentina	19540451.0	Argentina	8.2	Argentina	1894.617872
4	Armenia	ARM	3006	Armenia	3873.533566	Armenia	1559971.0	Armenia	17.1	Armenia	969.309790

In [62]:

# Now Consolidate all data into one big table [checked]
headings = [COUNTRY, CODE, POP, GDP, LAB, UNEMP, ENERGY]
finalData = data[headings]
finalData.head()
#finalData.tail()

	Country Name	Country Code	Population (1000s)	Per Capita GDP $	Labour force total	Unemployment (% total force)	kg of oil / capita
0	Albania	ALB	2894	4588.649440	1292576.0	16.1	800.568726
1	Algeria	DZA	38934	5484.066806	12355028.0	9.5	1245.992191
2	Angola	AGO	24228	5232.690501	8844204.0	6.8	654.905481
3	Argentina	ARG	42980	12324.938786	19540451.0	8.2	1894.617872
4	Armenia	ARM	3006	3873.533566	1559971.0	17.1	969.309790

Analysis: Step 5: calculating

A number of tests were then performed on this data set to determine the strength of any correlation between the data sets.

Per Capita GDP & Labour Force Surprisingly there was onlt a small correlation of 0.193 between Per Capita GDP and the size of the Labour force. Yet current economic theory suggest that where more people are gathered incomes tend to be higher. This does not appear to hold true when it comes to looking at Countries
Per Capita GDP & Unemployment This returned a slightly higher correlation of 0.274. Although as unemployment is measured as a % of the total labout force the relationship here between Per Capitas GDP and unemployment will be an inverse one. The more Kg of oil a country consumes the less its unemployment percentage is likely to be. Similar curves were in fact found with the next data set.
Kg Oil per Capita & Unemployment This returned a marginal increase in correlation of 0.332. The graph of this data set is shown below and seems to describe an inverse realtionship in the form 1/x. Although it is quite difficult to make much sense of this data an increase in Kg of oil per Capital would suggests lower unemployment. But there is a large group of countries just sitting in the bottom left corner.
Kg Oil per Capita & Labour Force This has a higher correlation coefficient of 0.44668 and would seem to make sense. The bigger the size of the Labour force the higher the KgOil consumption per Capita.
GDP per Capita & Kg Oil per Capita This returned the highest Correlation Coefficient of all the data sets tested of 0.666 and is in line with common sense. However the dataset when plotted did have a number of results on the zero line for Kg Oil per Capita. I wondered whether this might be distoring the data somewhat so I took the decision to eliminat all the rows from my dataset that I had replaced with a zero. The code was amended and rerun to return a correlation factor of 0.799 the highest correlation coefficient so far.

The next step was to graph and plot this data.

In [63]:

from scipy.stats import spearmanr
# A number of test were performed to identify which data set had the most significant correlation
# However imported Energy data had had null values set to zero and it was felt this may be distorting results so they were stripped

gdpColumn = finalData[GDP]# 
energyColumn = finalData[ENERGY]# 
(correlation, pValue) = spearmanr(energyColumn, gdpColumn)
print('The correlation is', correlation)
if pValue < 0.05:
    print('It is statistically significant.')
else:
    print('It is not statistically significant.')

The correlation is 0.799198722722
It is statistically significant.

Analysis: Step 6: visualising the correlation

Mapping Per Capital Energy with Per Capita GDP:

In [64]:

%matplotlib inline
finalData.plot(y=ENERGY, x=GDP, kind='scatter', grid=True, figsize=(10, 4))# Graph changed logx=True, 
# I tried both layouts; this graph seemed to be the easiest to read

<matplotlib.axes._subplots.AxesSubplot at 0x7fa4f9853d68>

The GDP per capita is obtained by dividing the country’s gross domestic product, adjusted by inflation, by the total population.

If you look at the Per Capital GDP there is a distinct group of four countries all to the right of the vertical line measuring $80,000. These countries were investigated using the boolean search below as their per capita GDP appears to be the highest in the world. Swizerland Luxembourg & Norway all have Per Capital Kg of Oil ranging from $3.3k - $7.3k whilst Qatar is in a league of its own just short of $19k.

In [65]:

finalData[(finalData[GDP] > 80000) & (finalData[ENERGY] > 0)].sort(GDP,ascending=0)# These are the outliers for GDP

	Country Name	Country Code	Population (1000s)	Per Capita GDP $	Labour force total	Unemployment (% total force)	kg of oil / capita
76	Luxembourg	LUX	556	116612.884152	266117.0	6.1	7310.309923
95	Norway	NOR	5137	97429.708476	2727176.0	3.4	6438.755199
104	Qatar	QAT	2172	96732.529026	1593886.0	0.3	19120.344284
119	Switzerland	CHE	8189	85610.842029	4753196.0	4.5	3304.032860

I found this dataset difficult to understand. But researching these countries provided explanations which seem to support the facts

Luxembourg has a total population equivalent to a small city. It is easy to find other areas the same size that have higher GDP per capita, they just aren't countries so don't appear in the rankings. Luxembourg's GDP also includes a lot of businesses that are set up there to take advantage of low taxes. Most of the revenue is actually generated elsewhere, but directed through Luxembourg by clever accounting tricks - that boosts the GDP figures. Whilst Luxembourg’s high energy use per capita can not be explained by access to abundant and cheap energy sources. The high ratio of energy use in Luxembourg can be partly explained by the low sales taxes on petroleum products, which encourage motorists and other consumers from neighbouring countries (Belgium, France and Germany) to buy their supplies in Luxembourg.
Norway has enormous oil and gas resources and is a major producer of hydropower. It is Europe's largest oil producer and the third-largest natural gas exporter in the entire world behind Russia and Qatar.
Qatar has a well-developed oil exploration industry where the petroleum industry accounts for 70% of its government revenue, 60% of its GDP and 85% of its export earnings. Qatar is also the most conterversial country on the list for its use of forced labour from nations throughout Asia and parts of Africa. Qatar belches out more CO2 per capita than anywhere else.
Swiss banking and financial institutions keep Switzerland and its economy afloat. It is important to note that some of the wealthiest people and companies in the world own Swiss bank accounts and therefore Switzerland has excess capital to use for investment purposes. Zurich and Geneva, Switzerland's most well-known cities, have consistently ranked among the top ten highest living standard cities in the world.

Energy use (kg of oil equivalent per capita)

Energy use refers to use of primary energy before transformation to other end-use fuels, which is equal to indigenous production plus imports and stock changes, minus exports and fuels supplied to ships and aircraft engaged in international transport.

The next dataset to look at are those countries with per Capita GDP less than $80000 but with Kg of Oil / Capita greater than $9,500 which seem to contain about 5 countries. They are shown below sorted in descending order of Kg of Oil / Capita.

In [66]:

finalData[(finalData[GDP] < 80000) & (finalData[ENERGY] > 9500)].sort(ENERGY,ascending=0)# These are the outliers for GDP

	Country Name	Country Code	Population (1000s)	Per Capita GDP $	Labour force total	Unemployment (% total force)	kg of oil / capita
54	Iceland	ISL	327	52036.731814	192535.0	5.0	18177.252567
125	Trinidad and Tobago	TTO	1354	21317.449266	686710.0	4.0	14537.570462
31	Curacao	CUW	156	0.000000	0.0	0.0	11800.977754
8	Bahrain	BHR	1362	24855.215635	750065.0	3.9	10171.681017
70	Kuwait	KWT	3753	43593.702018	1992954.0	3.0	9757.448683

This data extract threw up another curious collection of countries which apart from Kuwait & Bahrain I would not normally associate as being amongst the highest per Capita users of energy in the World.

With a population of just 327 thousand Iceland is the world’s largest hydropower country per capita, the world’s largest geothermal energy producer per capita, and the world’s largest electricity producer per capita. It's competitively priced electricity (from hydro- and geothermal power) has attracted numerous industries and services. Currently, the aluminum industry in Iceland consumes close to 75% of all the electricity produced in Iceland. A very high share of Iceland’s energy comes from renewable sources, making Iceland one of the greenest country in the world with regard to energy consumption. In total, approximately 86% of Iceland’s consumption of primary energy comes from renewable sources. And what is especially interesting, is the fact that Iceland still has access to numerous competitive renewable energy sources yet to be harnessed.
Trinidad and Tobago is the wealthiest country in the Caribbean as well as the third richest country by GDP (PPP) per capita in the Americas after the United States and Canada. It is the region's leading producer of oil and gas; and houses one of the largest natural gas processing facilities in the Western Hemisphere. Trinidad and Tobago is the leading Caribbean producer of oil and gas, and its economy is heavily dependent upon these resources but it also supplies manufactured goods, notably food and beverages, as well as cement to the Caribbean region. Oil and gas account for about 40% of GDP and 80% of exports, but only 5% of employment. https://en.wikipedia.org/wiki/Economy_of_Trinidad_and_Tobago
Curaçao. With a population of 156 thousand Curaçao is an island country in the southern Caribbean Sea, approximately 65 kilometres north of the Venezuelan coast. It has one of the highest standards of living in the Caribbean. The island has a well-developed infrastructure with strong tourism and financial services sectors. Shipping, international trade, oil refining, and other activities related to the port of Willemstad (like the Free Trade Zone) also make a significant contribution to the economy. The discovery of oil in the Maracaibo Basin of Venezuela in the early 20th century forced the Venezuelan government to search for ideal locations for large scale refining. Curaçao's proximity to the country, naturally deep harbors, and stable government led Royal Dutch Shell to construct the Isla Refinery, the largest refinery in the world at the time. The refinery is located in Willemstad harbour and began operating in 1918. In 2013, the unemployment rate was 13%. Though Curaçao is considered a tax haven, it adheres to the EU Code of Conduct against harmful tax practices. https://en.wikipedia.org/wiki/Economy_of_Curaçao
Bahrain. Bahrain has the freest economy in the Middle East and North Africa region and is the tenth freest economy in the world. Bahrain was recognised by the World Bank as a high income economy. Petroleum production is Bahrain's most exported product, accounting for 60% of export receipts, 70% of government revenues, and 11% of GDP. Aluminium production is the second most exported product, followed by finance and construction materials. https://en.wikipedia.org/wiki/Economy_of_Bahrain
Kuwait is a constitutional monarchy with a high income economy backed by the world's sixth largest oil reserves. According to the World Bank, the country has the fourth highest per capita income in the world. 70% of the population are expatriates, while only 30% of the population are Kuwaiti citizens. https://en.wikipedia.org/wiki/Kuwait

About 'EG.USE.PCAP.KG.OE' - Energy use (kg of oil equivalent per capita)

Development Relevance: In developing economies growth in energy use is closely related to growth in the modern sectors - industry, motorized transport, and urban areas - but energy use also reflects climatic, geographic, and economic factors (such as the relative price of energy). Energy use has been growing rapidly in low- and middle-income economies, but high-income economies still use almost five times as much energy on a per capita basis. Governments in many countries are increasingly aware of the urgent need to make better use of the world's energy resources. Improved energy efficiency is often the most economic and readily available means of improving energy security and reducing greenhouse gas emissions.

Limitations and Exceptions: The IEA makes these estimates in consultation with national statistical offices, oil companies, electric utilities, and national energy experts. The IEA occasionally revises its time series to reflect political changes, and energy statistics undergo continual changes in coverage or methodology as more detailed energy accounts become available. Breaks in series are therefore unavoidable.

Mapping Per Capital Energy to Unemployment % of Labour force:

In [67]:

%matplotlib inline
finalData.plot(y=UNEMP, x=ENERGY, kind='scatter', grid=True, figsize=(10, 4))# Graph changed logx=True,

<matplotlib.axes._subplots.AxesSubplot at 0x7fa4ffb55358>

Whilst a significant correlation was also found between energy and Unemployment logic would suggest that this is an inverse relationship and plotting the county values suggest a curve comming down the y axis and turning to run along the x axis. This a clasical inverse relationship with most of the data being bunched in the bottom left corner. As we have already considered the high per Capita energy & GNP countries the following extract focuses on those countries with low energy per Capita & high unemployment. The list has been sorted in descending order of unemployment as % of the Labour force.

In [68]:

finalData[(finalData[UNEMP] > 15) & (finalData[ENERGY] <5000)].sort(UNEMP,ascending=0)#

	Country Name	Country Code	Population (1000s)	Per Capita GDP $	Labour force total	Unemployment (% total force)	kg of oil / capita
14	Bosnia and Herzegovina	BIH	3818	4851.660528	1496651.0	27.900000	1687.968693
77	Macedonia, FYR	MKD	2076	5453.281275	952721.0	27.900000	1349.475017
48	Greece	GRC	10892	21627.354288	4937216.0	26.299999	2134.104396
113	South Africa	ZAF	54059	6472.101027	20001038.0	25.100000	2655.850165
115	Spain	ESP	46481	29718.500216	23345028.0	24.700001	2503.794194
86	Mozambique	MOZ	27216	622.638064	12471084.0	22.600000	407.372414
109	Serbia	SRB	7131	6200.173221	3132558.0	22.200001	2078.490179
43	Gabon	GAB	1688	10772.061754	645573.0	19.700001	1434.913543
74	Libya	LBY	6259	6573.386737	2324217.0	19.200001	2711.300391
84	Montenegro	MNE	622	7378.044404	251741.0	19.100000	1651.299808
88	Namibia	NAM	2403	5342.944460	901859.0	18.600000	742.347200
15	Botswana	BWA	2220	7153.444325	1155963.0	18.200001	1098.358841
137	Yemen, Rep.	YEM	26184	0.000000	7628379.0	17.400000	323.718668
4	Armenia	ARM	3006	3873.533566	1559971.0	17.100000	969.309790
29	Croatia	HRV	4238	13480.650754	1845114.0	16.700001	1813.931187
58	Iraq	IRQ	35273	6336.468066	8801559.0	16.400000	1466.608239
0	Albania	ALB	2894	4588.649440	1292576.0	16.100000	800.568726
32	Cyprus	CYP	1154	27245.744625	613733.0	15.600000	1691.102893

The top 10 Countries in this list have unemployment rates around or above 20%: Greece South Africa Spain & Mozambique all have large populations & struggling economies. What is noticeable is the inclusion of Iraq & Libya which are two of the major oil producing economies yet due War their economies have suffered major setbacks.

Conclusions

The data acquisition was not difficult and the data cleaning although initially laborious got quicker with each file that was repeated.

However the analysis of the final data proved to be more difficult than anticipated to understand. The decision to reanalyse the data after removing all countries with no per Capita energy results turned out to be the right one as the correlation coefficients were improved and it was easier to focus on remaining data.

What surprised me most about this study was the list of countries from the two data extracts that were so diverse and out of the ordinary but which on further investigation turned out to be substantialy correct.