{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Project 2: Holiday weather - South Africa\n", "\n", "by Rob Griffiths, 11 September 2015, updated 11 April 2017, 18 October and 20 December 2017\n", "and Rita N, 20 November 2018\n", "\n", "This is the project notebook for the second part of The Open University's _Learn to code for Data Analysis_ course.\n", "\n", "There is nothing I like better than taking a holiday. In the winter I like to have a two-week break in a country where I can be guaranteed sunny, dry days. In the summer I like to have two weeks off relaxing in my garden in London. However, I'm often disappointed because I pick a fortnight when the weather is dull and it rains. So in this project I am going to use the historical weather data from the Weather Underground for London to try to predict two good-weather weeks to take off as holiday next summer. Of course, the weather in the summer of 2016 may be very different from 2014, but it should give me some indication of when would be a good time to take a summer break.\n", "\n", "In the 2018 update of this project, I have amended the analysis to determine a promising two-week period for a vacation in Cape Town, the legislative capital of South Africa.\n", "\n", "## Getting the data\n", "\n", "Weather Underground keeps historical weather data collected at many airports around the world. Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser):\n", "\n", "http://www.wunderground.com/history\n", "\n", "When the new page opens, start typing 'LHR' in the 'Location' input box and, when the pop-up menu comes up with the option 'LHR, United Kingdom', select it and then click on 'Submit'.\n", "\n", "When the next page opens with London Heathrow data, click on the 'Custom' tab, select the time period From: 1 January 2014 to: 31 December 2014 and then click on 'Get History'. The data for that year should then be displayed further down the page. 
\n", "\n", "You can copy each month's data directly from the browser to a text editor like Notepad or TextEdit, to obtain a single file with as many months as you wish.\n", "\n", "Weather Underground has changed the way it provides data in the past and may do so again in the future.\n", "I have therefore collated the data for the whole of 2014 in the provided 'London_2014.csv' file.\n", "\n", "The data for Cape Town is provided with this project in the file 'CapeTown_CPT_2014.csv', and the code below has been updated to load that file.\n", "\n", "Now load the CSV file into a dataframe, making sure that any extra spaces are skipped:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "import warnings\n", "warnings.simplefilter('ignore', FutureWarning)\n", "\n", "from pandas import *\n", "# skipinitialspace=True skips the spaces that follow each comma in the CSV file\n", "capeTown = read_csv('CapeTown_CPT_2014.csv', skipinitialspace=True)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now I want to see the structure of the data before starting to clean it." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
|  | Date | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees<br /> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-1-1 | 28 | 23 | 18 | 19 | 17 | 15 | 88 | 71 | 45 | ... | 19.0 | 14.0 | 10.0 | 35 | 14 | NaN | 0.0 | 2.0 | NaN | 213<br /> |
| 1 | 2014-1-2 | 28 | 23 | 18 | 19 | 18 | 16 | 88 | 74 | 46 | ... | 26.0 | 13.0 | 10.0 | 32 | 21 | NaN | 0.0 | 2.0 | NaN | 204<br /> |
| 2 | 2014-1-3 | 27 | 23 | 19 | 19 | 18 | 18 | 94 | 75 | 48 | ... | 31.0 | 12.0 | 3.0 | 32 | 26 | NaN | 0.0 | 4.0 | NaN | 193<br /> |
| 3 | 2014-1-4 | 27 | 22 | 18 | 19 | 18 | 14 | 88 | 74 | 46 | ... | 26.0 | 13.0 | 9.0 | 32 | 18 | NaN | 0.0 | 3.0 | NaN | 314<br /> |
| 4 | 2014-1-5 | 26 | 22 | 18 | 17 | 16 | 14 | 83 | 70 | 46 | ... | 26.0 | 13.0 | 10.0 | 45 | 21 | NaN | 0.0 | 4.0 | Rain | 25<br /> |

5 rows × 23 columns
| Date | Date | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-01-01 | 2014-01-01 | 28 | 23 | 18 | 19 | 17 | 15 | 88 | 71 | 45 | ... | 19.0 | 14.0 | 10.0 | 35 | 14 | NaN | 0.0 | 2.0 | NaN | 213.0 |
| 2014-01-02 | 2014-01-02 | 28 | 23 | 18 | 19 | 18 | 16 | 88 | 74 | 46 | ... | 26.0 | 13.0 | 10.0 | 32 | 21 | NaN | 0.0 | 2.0 | NaN | 204.0 |
| 2014-01-03 | 2014-01-03 | 27 | 23 | 19 | 19 | 18 | 18 | 94 | 75 | 48 | ... | 31.0 | 12.0 | 3.0 | 32 | 26 | NaN | 0.0 | 4.0 | NaN | 193.0 |
| 2014-01-04 | 2014-01-04 | 27 | 22 | 18 | 19 | 18 | 14 | 88 | 74 | 46 | ... | 26.0 | 13.0 | 9.0 | 32 | 18 | NaN | 0.0 | 3.0 | NaN | 314.0 |
| 2014-01-05 | 2014-01-05 | 26 | 22 | 18 | 17 | 16 | 14 | 83 | 70 | 46 | ... | 26.0 | 13.0 | 10.0 | 45 | 21 | NaN | 0.0 | 4.0 | Rain | 25.0 |

5 rows × 23 columns
\n", "