This is my project notebook for Week 4 of The Open University's Learn to Code for Data Analysis course.
A country's economy depends, sometimes heavily, on its exports and imports. The United Nations Comtrade database provides data on global trade. It will be used to analyse the UK's imports and exports of three hop-based products (see commodity descriptors below) in 2014:
from IPython.display import display, HTML
HTML('''Note: <p style="display:inline;color:darkred;">Option to toggle code visibility on/off is at <a href="#Bottom">bottom</a> of page.</p>''')
import warnings
warnings.simplefilter('ignore', FutureWarning)
from pandas import *
%matplotlib inline
The data is obtained from the United Nations Comtrade website, by selecting the following configuration:
Clicking on 'Preview' results in a message that the data exceeds 500 rows. Data was downloaded using the Download CSV button and the download file renamed appropriately.
LOCATION='comtrade_hops_uk_2014.csv'
The data can also be downloaded directly from Comtrade using the "View API Call" URL, modified in two ways:
max=500
is increased to max=5000
to make sure all data is loaded,&fmt=csv
is added at the end to obtain the data in CSV format.# LOCATION = 'http://comtrade.un.org/api/get?max=5000&type=C&freq=M&px=HS&ps=2014&r=826&p=all&rg=1%2C2&cc=1210%2C121010%2C130213&fmt=csv'
On reading in the data, the commodity code has to be read as a string, to not lose the leading zero.
hops = read_csv(LOCATION, dtype={'Commodity Code':str})
hops
hops.tail(2)
Most columns are irrelevant for this analysis, or contain always the same value, like the year and reporter columns. The commodity code is transformed into a short but descriptive text and only the relevant columns are selected.
def hopsType(code):
if code == '1210': # Hop cones, fresh or dried, lupulin
return 'cones_1'
if code == '121010': # Hop cones, not ground, powdered or pelleted
return 'cones_2'
if code == '130213': # Hop extract
return 'extract'
return 'unknown'
COMMODITY = 'Hop-based products'
hops[COMMODITY] = hops['Commodity Code'].apply(hopsType)
MONTH = 'Period'
PARTNER = 'Partner'
FLOW = 'Trade Flow'
VALUE = 'Trade Value (US$)'
headings = [MONTH, PARTNER, FLOW, COMMODITY, VALUE]
hops = hops[headings]
hops.head()
The data contains the total imports and exports per month, under the 'World' partner. Those rows are removed to keep only the per-country data.
hops = hops[hops[PARTNER] != 'World']
hops.head()
print("\nNote: The UK trade partners consist of " + str(len(hops['Partner'].unique())) + " different countries.")
To answer the first question, 'How much does the UK export and import and is the balance positive (more exports than imports)?', the dataframe is split into two groups: exports from the UK and imports into the UK. The trade values within each group are summed up to get the total trading.
grouped = hops.groupby([FLOW])
hops_flow_totals = grouped[VALUE].aggregate(sum)
hops_flow_totals
def roundToMillions (value):
return round((value / 1000000),1)
# Let's get the trade surplus:
surplus = hops_flow_totals['Imports'] - hops_flow_totals['Exports']
print("This shows a trade surplus of about " + str(roundToMillions(surplus)) + " million US$.")
To address the second question, 'Which are the main trading partners, i.e. from/to which countries does the UK import/export the most?', the dataframe is split by country instead, and then each group aggregated for the total trade value. This is done separately for imports and exports. The result is sorted in descending order so that the main partners are at the top.
imports = hops[hops[FLOW] == 'Imports']
grouped = imports.groupby([PARTNER])
print('The UK imports from', len(grouped), 'countries.')
print('The 5 biggest exporters to the UK are:')
totalImports = grouped[VALUE].aggregate(sum).sort(inplace=False,ascending=False)
totalImports.head()
The export values can be plotted as a bar chart, making differences between countries easier to see.
totalImports.head(10).plot(kind='barh')
exports = hops[hops[FLOW] == 'Exports']
grouped = exports.groupby([PARTNER])
print('The UK exports to', len(grouped), 'countries.')
print('The 5 biggest importers from the UK are:')
totalExports = grouped[VALUE].aggregate(sum).order(ascending=False).head()
totalExports.head()
i = totalImports.head(1)[0]
i = str(roundToMillions(i))
e = totalExports.head(1)[0]
e = str(roundToMillions(e))
print("The above clearly shows that the U.S. is our largest partner.")
print("However, our imports from the U.S. are about " + i + " million US$ and our exports are about " + e + " million US$.")
balance_of_trade = (totalExports.head(1)[0]) - (totalImports.head(1)[0]) # Country's imports exceed its exports.
print("Consequently, the balance of trade is about " + str(roundToMillions(balance_of_trade)) + " million US$.")
Given that there are three commodities, the third question, 'Which are the regular customers, i.e. which countries buy hop-based exports from the UK every month?', is meant in the sense that a regular customer buys (imports) all three commodities every month. This means that if the exports dataframe is grouped by country, each group has exactly 36 rows (three commodities bought each of the 12 months). To see the countries, only the first month of one commodity has to be listed, as by definition it's the same countries every month and for the other commodity.
def buysEveryMonth(group):
return len(group) == 36
grouped = exports.groupby([PARTNER])
regular = grouped.filter(buysEveryMonth)
regular = regular[(regular[MONTH] == 201401) & (regular[COMMODITY] == 'extract')]
regular
pc=((regular[VALUE].sum() / exports[VALUE].sum())*100)
regulars = len(regular)
print("Approximately " + str(round(pc,2)) + "% of the total UK exports are due to the", regulars ,"regular customer(s) identified above.")
To address the fourth question, 'Which countries does the UK both import from and export to?', a pivot table is used to list the total export and import value for each country.
countries = pivot_table(hops, index=[PARTNER], columns=[FLOW],
values=VALUE, aggfunc=sum)
countries.head()
Removing the rows with a missing value will result in only those countries with bi-directional trade flow with the UK.
print("\nThe UK has bi-directional trade flow with the following " + str(len(countries.dropna())) + " countries.\n")
display(countries.dropna())
The UK hops-based product trade in 2014 was analysed in terms of which countries the UK mostly depends on for income (exports) and goods (imports). Over the period, the UK had a trade surplus of over 32 million US dollars with 74 different parther countries.
The U.S. is the main partner, but the UK imported almost over three times the value of hop-based goods from the U.S. than it exported to them (i.e. a substantial trade deficit of about 31 million US dollars).
The UK exported to 72 countries during the period, but only imported from 19 countries. Germany and France are the next two main customers for both hop imports into, and exports out of, the UK.
Interestingly, the UK is not very heavily dependent on its regular customers, since there are only two countries out of 74 that buy all three types of hops-based commodities every month (France and Ireland). This accounts for less than 1% of exports. However, the UK does rely significantly on France for imported goods.
The UK has bi-directional trade (i.e. both exports and imports) with 17 countries, although for some (i.e. Peru) the trade value is suprisingly low.
print("\nPandas version", pandas.__version__, end="")
from IPython.display import HTML
HTML('''<script>
function toggler() {
$('div.input').toggle();
location.href="#Bottom";
}
</script>
<p style="display:inline;"><center>Click <a href="javascript:toggler();">here</a> to toggle code visibility on/off</center></p>
<script>
$('div.input').show();
location.href="#Top";</script></div>''')