{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Project 2: Holiday weather\n", "\n", "by Jake Stokes, 2nd of July 2016.\n", "\n", "This is the project for Week 2 of The Open University's [_Learn to code for Data Analysis_](http://futurelearn.com/courses/learn-to-code) course.\n", "\n", "The purpose of the project is to examine historic weather data from the Weather Underground for London to try to predict the best dates this year to take a nice warm staycation. My aim will be to:\n", "- obtain weather data for the year of 2015\n", "- clean the obtained data\n", "- run some basic data analysis techniques on the data set to:\n", "    - find two weeks with the highest mean temperature; and,\n", "    - avoid precipitation where possible.\n", "\n", "The weather may of course be very different this year to the weather of 2015, but it should give me some indication of when would be a good time to take a break.\n", "\n", "## Getting the data\n", "\n", "The weather data was obtained from the [Weather Underground](https://www.wunderground.com/history) website, using the dates 1st Jan 2015 to 31st Dec 2015, and saved as 'London_2015.csv'.\n", "\n", "To obtain the data you must first enter London, United Kingdom as the location, and hit submit. On the following page there are some tabs - select 'custom', and from here you can enter the dates. The option to see the data in a CSV format is at the very bottom of the page underneath the data. This can be right-click-saved, and renamed from a .html to a .csv ready for use.\n", "\n", "If you don't have the 'London_2015.csv' file, you can obtain the data as follows. 
Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser):\n", "\n", "http://www.wunderground.com/history\n", "\n", "When the new page opens, start typing 'London' in the 'Location' input box; when the pop-up menu offers the option 'London, United Kingdom', select it and then click on 'Submit'.\n", "\n", "Once the file is ready, I load it into a dataframe as shown below, ensuring that any extra spaces at the start of values are removed. I also import the whole pandas module for the data analysis." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [ ], "source": [ "import warnings\n", "# Hide FutureWarning messages so they do not clutter the output\n", "warnings.simplefilter('ignore', FutureWarning)\n", "\n", "from pandas import *\n", "# skipinitialspace=True skips the spaces that follow each comma in the file\n", "london = read_csv('London_2015.csv', skipinitialspace=True)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Cleaning the data\n", "\n", "First I will display some of the data to see if there are any obvious issues.\n", "\n", "The column names need cleaning up before any analysis. I'm not going to make use of `'WindDirDegrees'` in my analysis, but you might in yours, so we'll rename `'WindDirDegrees<br />'` to `'WindDirDegrees'`." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
|   | GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees<br /> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-1-1 | 12 | 8 | 4 | 11 | 7 | 3 | 94 | 88 | 78 | ... | 18 | 9 | 5 | 39 | 21 | 60 | 0.51 | 7 | Rain | 209<br /> |
| 1 | 2015-1-2 | 11 | 7 | 4 | 12 | 4 | 0 | 94 | 70 | 41 | ... | 31 | 16 | 3 | 35 | 24 | 50 | 0.00 | 2 | Rain | 258<br /> |
| 2 | 2015-1-3 | 6 | 4 | 2 | 6 | 3 | 1 | 100 | 91 | 70 | ... | 31 | 10 | 2 | 19 | 10 | NaN | 7.11 | 5 | Rain | 19<br /> |
| 3 | 2015-1-4 | 3 | 1 | -2 | 3 | 1 | -2 | 100 | 97 | 90 | ... | 13 | 4 | 0 | 13 | 6 | 27 | 0.00 | 6 | Fog | 225<br /> |
| 4 | 2015-1-5 | 10 | 6 | 2 | 8 | 5 | 2 | 100 | 86 | 67 | ... | 31 | 10 | 3 | 19 | 10 | NaN | 0.25 | 6 | NaN | 199<br /> |

5 rows × 23 columns
|   | GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilityKm | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GMT |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| 2015-07-03 | 2015-07-03 | 27 | 20 | 13 | 15 | 11 | 8 | 83 | 56 | 23 | ... | 31 | 21 | 9 | 27 | 10 | NaN | 8.89 | 2 | Rain | 83 |
| 2015-07-04 | 2015-07-04 | 27 | 22 | 17 | 18 | 14 | 10 | 100 | 67 | 33 | ... | 31 | 8 | 2 | 27 | 14 | 42 | 0.00 | 2 | Rain-Thunderstorm | 220 |
| 2015-07-10 | 2015-07-10 | 27 | 20 | 13 | 11 | 7 | 2 | 82 | 45 | 11 | ... | 31 | 23 | 10 | 23 | 11 | 39 | 0.00 | NaN | NaN | 182 |
| 2015-07-11 | 2015-07-11 | 26 | 20 | 14 | 14 | 10 | 8 | 77 | 49 | 24 | ... | 31 | 19 | 10 | 27 | 13 | 42 | 0.00 | 2 | Rain | 274 |
| 2015-07-14 | 2015-07-14 | 23 | 20 | 17 | 18 | 15 | 14 | 100 | 78 | 51 | ... | 31 | 13 | 3 | 24 | 18 | NaN | 2.03 | 6 | Rain | 252 |
| 2015-07-16 | 2015-07-16 | 25 | 20 | 14 | 15 | 13 | 9 | 88 | 65 | 44 | ... | 26 | 14 | 6 | 24 | 13 | NaN | 7.11 | 4 | Rain-Thunderstorm | 90 |
| 2015-07-17 | 2015-07-17 | 25 | 20 | 14 | 17 | 13 | 9 | 94 | 67 | 35 | ... | 23 | 11 | 6 | 35 | 16 | NaN | 0.25 | 3 | Rain | 242 |
| 2015-08-03 | 2015-08-03 | 25 | 20 | 16 | 16 | 13 | 8 | 83 | 64 | 40 | ... | 31 | 14 | 10 | 34 | 18 | 47 | 0.00 | 3 | Rain | 200 |
| 2015-08-08 | 2015-08-08 | 26 | 20 | 14 | 15 | 13 | 10 | 94 | 62 | 28 | ... | 31 | 15 | 10 | 23 | 10 | NaN | 0.00 | 2 | NaN | 148 |
| 2015-08-21 | 2015-08-21 | 26 | 22 | 17 | 17 | 16 | 13 | 88 | 70 | 39 | ... | 31 | 13 | 10 | 26 | 18 | 37 | 0.00 | 4 | NaN | 199 |
| 2015-08-22 | 2015-08-22 | 31 | 23 | 15 | 17 | 14 | 12 | 94 | 63 | 27 | ... | 31 | 16 | 7 | 26 | 10 | NaN | 0.00 | 4 | NaN | 114 |

11 rows × 23 columns
\n", "