{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Project 1: Deaths by tuberculosis\n", "\n", "by Michel Wermelinger and Alexandre Campos, 14 July 2015, edited 5 April 2016, updated 18 October, 20 December 2017 and 20 November 2018.\n", "\n", "This is the project notebook for the first part of The Open University's _Learn to code for Data Analysis_ course.\n", "\n", "In 2000, the United Nations set eight Millenium Development Goals (MDGs) to reduce poverty and diseases, improve gender equality and environmental sustainability, etc. Each goal is quantified and time-bound, to be achieved by the end of 2015. Goal 6 is to have halted and started reversing the spread of HIV, malaria and tuberculosis (TB).\n", "TB doesn't make headlines like Ebola, SARS (severe acute respiratory syndrome) and other epidemics, but is far deadlier. For more information, see the World Health Organisation (WHO) page .\n", "\n", "Given the population and number of deaths due to TB in some countries during one year, the following questions will be answered: \n", "\n", "- What is the total, maximum, minimum and average number of deaths in that year?\n", "- Which countries have the most and the least deaths?\n", "- What is the death rate (deaths per 100,000 inhabitants) for each country?\n", "- Which countries have the lowest and highest death rate?\n", "\n", "The death rate allows for a better comparison of countries with widely different population sizes." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## The data\n", "\n", "The data consists of total population and total number of deaths due to TB (excluding HIV) in 2013 in each of the South America (Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela) countries. \n", "\n", "The data was taken in July 2015 from (population) and (deaths). 
The uncertainty bounds of the number of deaths were ignored.\n", "\n", "The data was collected into an Excel file which should be in the same folder as this notebook." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "  <thead>\n", "    <tr><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Argentina</td><td>41446</td><td>570.0</td></tr>\n",
"    <tr><th>1</th><td>Bolivia (Plurinational State of)</td><td>10671</td><td>430.0</td></tr>\n",
"    <tr><th>2</th><td>Brazil</td><td>200362</td><td>4400.0</td></tr>\n",
"    <tr><th>3</th><td>Chile</td><td>17620</td><td>220.0</td></tr>\n",
"    <tr><th>4</th><td>Colombia</td><td>48321</td><td>770.0</td></tr>\n",
"    <tr><th>5</th><td>Ecuador</td><td>15738</td><td>320.0</td></tr>\n",
"    <tr><th>6</th><td>Guyana</td><td>800</td><td>130.0</td></tr>\n",
"    <tr><th>7</th><td>Paraguay</td><td>6802</td><td>200.0</td></tr>\n",
"    <tr><th>8</th><td>Peru</td><td>30376</td><td>2300.0</td></tr>\n",
"    <tr><th>9</th><td>Suriname</td><td>539</td><td>12.0</td></tr>\n",
"    <tr><th>10</th><td>Uruguay</td><td>3407</td><td>40.0</td></tr>\n",
"    <tr><th>11</th><td>Venezuela (Bolivarian Republic of)</td><td>30405</td><td>480.0</td></tr>\n",
"  </tbody>\n", "</table>
" ] }, "execution_count": 1, "metadata": { }, "output_type": "execute_result" } ], "source": [ "import warnings\n", "warnings.simplefilter('ignore', FutureWarning)\n", "\n", "from pandas import *\n", "all_data = read_excel('WHO POP TB all.xls')\n", "\n", "# subselecting only South America countries, not including:\n", "# - French Guiana (France) [https://en.wikipedia.org/wiki/French_Guiana]\n", "# - Falkland Islands (UK) [https://en.wikipedia.org/wiki/Falkland_Islands]\n", "\n", "# reference: \n", "# https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/\n", "# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html\n", "# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html\n", "countries = [\n", " 'Argentina',\n", " 'Bolivia (Plurinational State of)', # Bolivia\n", " 'Brazil',\n", " 'Chile',\n", " 'Colombia',\n", " 'Ecuador',\n", " 'Guyana',\n", " 'Paraguay',\n", " 'Peru',\n", " 'Suriname',\n", " 'Uruguay',\n", " 'Venezuela (Bolivarian Republic of)' # Venezuela\n", "]\n", "data = all_data[all_data['Country'].isin(countries)].reset_index(drop=True).copy()\n", "data" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## The range of the problem\n", "\n", "The column of interest is the last one." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "tbColumn = data['TB deaths']" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "The total number of deaths in 2013 is:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "9872.0" ] }, "execution_count": 3, "metadata": { }, "output_type": "execute_result" } ], "source": [ "tbColumn.sum()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "The largest and smallest number of deaths in a single country are:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4400.0" ] }, "execution_count": 4, "metadata": { }, "output_type": "execute_result" } ], "source": [ "tbColumn.max()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "12.0" ] }, "execution_count": 5, "metadata": { }, "output_type": "execute_result" } ], "source": [ "tbColumn.min()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "From 12 to 4400 deaths is a large range. The average number of deaths, over all countries in the selected data, can give a better idea of the seriousness of the problem in each country.\n", "The average can be computed as the mean or the median. Given the wide range of deaths, the median is probably a more sensible average measure." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "822.6666666666666" ] }, "execution_count": 6, "metadata": { }, "output_type": "execute_result" } ], "source": [ "tbColumn.mean()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "375.0" ] }, "execution_count": 7, "metadata": { }, "output_type": "execute_result" } ], "source": [ "tbColumn.median()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "The median is lower than the mean. This indicates that some of the countries had a high number of TB deaths in 2013, pushing the value of the mean up." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## The most affected\n", "\n", "To see the most affected countries, the table is sorted in ascending order by the last column, which puts those countries in the last rows." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "  <thead>\n", "    <tr><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>9</th><td>Suriname</td><td>539</td><td>12.0</td></tr>\n",
"    <tr><th>10</th><td>Uruguay</td><td>3407</td><td>40.0</td></tr>\n",
"    <tr><th>6</th><td>Guyana</td><td>800</td><td>130.0</td></tr>\n",
"    <tr><th>7</th><td>Paraguay</td><td>6802</td><td>200.0</td></tr>\n",
"    <tr><th>3</th><td>Chile</td><td>17620</td><td>220.0</td></tr>\n",
"    <tr><th>5</th><td>Ecuador</td><td>15738</td><td>320.0</td></tr>\n",
"    <tr><th>1</th><td>Bolivia (Plurinational State of)</td><td>10671</td><td>430.0</td></tr>\n",
"    <tr><th>11</th><td>Venezuela (Bolivarian Republic of)</td><td>30405</td><td>480.0</td></tr>\n",
"    <tr><th>0</th><td>Argentina</td><td>41446</td><td>570.0</td></tr>\n",
"    <tr><th>4</th><td>Colombia</td><td>48321</td><td>770.0</td></tr>\n",
"    <tr><th>8</th><td>Peru</td><td>30376</td><td>2300.0</td></tr>\n",
"    <tr><th>2</th><td>Brazil</td><td>200362</td><td>4400.0</td></tr>\n",
"  </tbody>\n", "</table>
" ] }, "execution_count": 8, "metadata": { }, "output_type": "execute_result" } ], "source": [ "data.sort_values('TB deaths')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "The table raises the possibility that a large number of deaths may be partly due to a large population. To compare the countries on an equal footing, the death rate per 100,000 inhabitants is computed." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "  <thead>\n", "    <tr><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th><th>TB deaths (per 100,000)</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>10</th><td>Uruguay</td><td>3407</td><td>40.0</td><td>1.174053</td></tr>\n",
"    <tr><th>3</th><td>Chile</td><td>17620</td><td>220.0</td><td>1.248581</td></tr>\n",
"    <tr><th>0</th><td>Argentina</td><td>41446</td><td>570.0</td><td>1.375284</td></tr>\n",
"    <tr><th>11</th><td>Venezuela (Bolivarian Republic of)</td><td>30405</td><td>480.0</td><td>1.578688</td></tr>\n",
"    <tr><th>4</th><td>Colombia</td><td>48321</td><td>770.0</td><td>1.593510</td></tr>\n",
"    <tr><th>5</th><td>Ecuador</td><td>15738</td><td>320.0</td><td>2.033295</td></tr>\n",
"    <tr><th>2</th><td>Brazil</td><td>200362</td><td>4400.0</td><td>2.196025</td></tr>\n",
"    <tr><th>9</th><td>Suriname</td><td>539</td><td>12.0</td><td>2.226345</td></tr>\n",
"    <tr><th>7</th><td>Paraguay</td><td>6802</td><td>200.0</td><td>2.940312</td></tr>\n",
"    <tr><th>1</th><td>Bolivia (Plurinational State of)</td><td>10671</td><td>430.0</td><td>4.029613</td></tr>\n",
"    <tr><th>8</th><td>Peru</td><td>30376</td><td>2300.0</td><td>7.571767</td></tr>\n",
"    <tr><th>6</th><td>Guyana</td><td>800</td><td>130.0</td><td>16.250000</td></tr>\n",
"  </tbody>\n", "</table>
" ] }, "execution_count": 10, "metadata": { }, "output_type": "execute_result" } ], "source": [ "populationColumn = data['Population (1000s)']\n", "data['TB deaths (per 100,000)'] = tbColumn * 100 / populationColumn\n", "data.sort_values('TB deaths (per 100,000)')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Conclusions\n", "\n", "The South America countries had a total of 9872 deaths due to TB in 2013. The median shows that half of these coutries had fewer than 375 deaths. The mean (over 822) indicates that some countries had a high number. The least affected were Suriname and Uruguay, with 12 and 40 deaths respectively, and the most affected were Peru and Brazil with 2300 and 4400 deaths in a single year. However, taking the population size into account, the least affected were Uruguay and Chile with less than 1.3 deaths per 100 thousand inhabitants, and the most affected were Peru and Guyana with over 7.5 and 16 deaths respectively per 100,000 inhabitants.\n", "\n", "One should not forget that most values are estimates, and that the chosen countries are a small sample of all the world's countries. Nevertheless, they convey the message that TB is still a major cause of fatalities, and that there is a huge disparity between countries, with several ones being highly affected." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (Ubuntu Linux)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 0 }