Project 2: My Holiday weather - Manchester 2015

by Chris Pyves, 22 October 2016

This is my project notebook for Week 2 of The Open University's Learn to code for Data Analysis course.

Section 1: Introduction:

There is nothing I like better than growing enormous pumpkins. This is why I decided to relocate from the sweltering heat of sunny Andalucia in Spain to the cooler, damper climate of Manchester, after watching with great envy the tremendous amounts of rain that Manchester received during the staging of the Commonwealth Games in 2002. You see, I am a keen amateur gardener, and whilst everyone knows that to grow world-class enormous pumpkins you need lots of 'rich home grown manure', they also need vast amounts of water. But like everything in life, there are times when you can have too much of a good thing. As my former gardening mentor and Regional Champion Pumpkin Grower used to say to me every day until he passed away: "You can add the water, but you can't take it away". So, to increase my prospects of winning the Chorlton Allotments Prize Pumpkin Growers Award next year, I have decided to mount a 24-hour wind & rain watch operation during the worst of the storm season so that I can protect and watch over my little darling pumpkins. For this project I am going to have to find historic weather data from local weather collectors to help me try to predict when the wind and the rain are going to be at their worst. Of course the weather in 2016/17 may be very different to 2015, but hey, I am not Michael Fish the weatherman either.

Sub heading A: What is the task

First I need to get hold of some weather data. But sourcing accurate, reliable & consistent meteorological data without having to pay a fee for the service is going to be a challenge. Fortunately I have heard that meteorological data can be downloaded from a site called Weather Underground completely free of charge!

Sub heading B: About my location

My location should not be a problem. I live in Manchester, so I ought to be able to easily lay my hands on some quality weather data; after all, it is the only topic that everyone talks about here.

Section 2: Data Sourcing & Acquisition:

In [399]:
import warnings
warnings.simplefilter('ignore', FutureWarning)
from pandas import *

Sub heading A: Weather Underground:

About Weather Underground:

Weather Underground challenges the convention of how weather information is shared with the public. Since 1993 their community and meteorologists have been providing an internet weather service with unique access to free meaningful weather data from around the globe. They are pioneers within this field and are constantly seeking new data sets and the next technologies that will help them share more data with more people.

There are around seven times the number of amateur stations feeding into the site compared to the number of observational stations used by the Met Office. Forecasters are asking people to invest in small meteorological stations, which can be bought from electronic shops for about £60.

Downloading Data from Weather Underground - using Save As HTML

Getting the data

If you do not have the 'Manchester_2015.csv' file, you can obtain the data as follows. Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser): http://www.wunderground.com/history

When the new page opens, start typing 'Manchester' in the 'Location' input box; when the pop-up menu offers the option 'Manchester, United Kingdom', select it and then click on 'Submit'. When the next page opens, click on the 'Custom' tab, select the time period From: 1 January 2015 To: 31 December 2015, and then click on 'Get History'. The data for that year should then be displayed. Scroll to the end of the data and then right-click on the blue link labelled 'Comma Delimited File':

  • If you are using the Chrome browser, choose 'Save Link As ...' then, in the file dialogue that appears, save the file with its default name of 'CustomHistory' to the folder you created for this course and where this notebook is located. Once the file has been downloaded, rename it from 'CustomHistory.html' to 'Manchester_2015.csv'.

Now load the CSV file into a dataframe, making sure that any extra spaces are skipped:
In [400]:
#manchester = read_csv('My_loc_2015.csv',skipinitialspace=True)
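As a rough illustration of why the extra spaces matter, the sketch below reads the same text twice, with and without skipinitialspace. The two-row extract and its column names are made up for the example and are not taken from the downloaded file; pandas is imported as pd here so the snippet stands on its own.

```python
import io
import pandas as pd

# Hypothetical extract: note the space after every comma
raw = "Date, Max TemperatureC, Events\n2015-1-1, 9, Rain\n2015-1-2, 7, Fog\n"

default = pd.read_csv(io.StringIO(raw))
skipped = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(list(default.columns))  # names after the first keep their leading space
print(list(skipped.columns))  # leading spaces are dropped
```

Without the option, a lookup such as default['Events'] fails with a KeyError because the column is really called ' Events'.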

This data was downloaded and then uploaded. Unfortunately, although the data had a column for precipitation readings, it seems that no recordings are being made, or are available, from this source for Manchester. Whilst I was able to import the data, clean it and plot various graphs, I decided to see what I could do to resolve this problem by looking for another source of data.

Sub heading B: WOW: Private Weather Stations (PWS): The Matlock Observatory (12 month rainfall data)

About the UK Met Office Weather Observations Website WOW

The Met Office is helping to co-ordinate the growth of the weather observing community in the UK, by asking anyone to submit the observations they are taking. This can be done using all levels of equipment, so there are no cost restrictions.

http://www.weatherstations.co.uk/wow.htm
http://wow.metoffice.gov.uk/home

The Royal Meteorological Society (RMetS) has kindly provided some simple guidelines for WOW users to download: the RMetS Guide (PDF) to setting up a weather station.

More detailed information about observing the weather is available in "The Observer's Handbook", available online from the Met Office Publications Archive.

http://www.computerweekly.com/news/4500272936/Met-Office-extends-application-program-interfaces-with-CA-Technologies
http://www.itv.com/news/anglia/2015-11-11/wow-the-met-office-with-your-schools-weather-observations/

About The Matlock Observatory:

After searching the Met Office website for local weather stations I found one that was recording monthly rainfall figures. This was the Matlock Observatory, whose daily readings can be viewed on the WOW website: http://wow.metoffice.gov.uk/observations/details?site_id=52446464

Downloading Data from the Matlock Observatory - direct off their webpage using Google importHTML

The weather station is located on the west-facing side of the Derwent Valley in Matlock, Derbyshire. Submitting observations since: May 2013. Submitted over 350 days in 2015.

http://www.matlockobservatory.uk/

Google Sheets formula to import table number 6 from the website:

=IMPORTHTML("http://www.matlockobservatory.uk/","table",6)

How to scrape a table from an HTML web page into a Google spreadsheet (see https://blog.ouseful.info/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/): the URL of the target web page and the target element type ("table") both need to be in double quotes. The number N identifies which table in the page is the target for data scraping. The blog post says counting starts at 0, whereas Google's current documentation says the index starts at 1, so check that the imported table is the one you expect.
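For comparison, the same "pick the Nth table on a page" idea can be sketched in Python using only the standard library. This is not how the Matlock data was obtained for this project; the TableGrabber class and the tiny HTML page are invented for illustration, and the sketch does not handle tables nested inside other tables. pandas also offers read_html, which is the more usual route but needs an extra parser library installed.

```python
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect the cell text of every <table> on a page, in document order."""
    def __init__(self):
        super().__init__()
        self.tables = []        # one list of rows per table
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self.tables[-1].append([])
        elif tag in ('td', 'th') and self.tables and self.tables[-1]:
            self.tables[-1][-1].append('')
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data.strip()

# Hypothetical page with two tables; the second one holds the readings
page = """<table><tr><td>menu</td></tr></table>
<table><tr><th>Month</th><th>Rainfall</th></tr>
<tr><td>2015-1-1</td><td>70.2 mm</td></tr></table>"""

grabber = TableGrabber()
grabber.feed(page)
second_table = grabber.tables[1]  # Python lists count from 0
print(second_table)
```

Whatever tool is used, it is worth printing the first row of the chosen table to confirm that the index points at the table you meant.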

In [401]:
matlock = read_csv('Matlock.csv',skipinitialspace=True)

Section 3: Cleaning the data :

Sub heading A: Weather Underground (see Appendix)

Whilst the Weather Underground data was cleaned, a decision was made not to publish the data here as it would make this project too large. However, a similar process was followed in cleaning the Matlock data, and although there was less data the process was identical.

Sub heading B: The Matlock Observatory Data

Stage 1 Initial overview of the data & quality - focusing on Column Names

In [402]:
#This code will list the column names that have been imported - check for any spaces in front of names
matlock.columns# no problems with column names
Out[402]:
Index(['Month', 'Warmest', 'Coolest', 'Average', 'Gust', 'Rainfall',
       'Wet Days'],
      dtype='object')
In [403]:
# data cleaning - based on table overview
# This code will help you inspect the data & look for problems that will require data cleaning
matlock # As this is a small file you can look at whole file: data appears to have ported well top & bottom
# Date appears to be in the right format but you could change the name
# The following columns could have endings stripped
#'Warmest', 
#'Coolest', 
#'Average', 
#'Gust', 
#'Rainfall',
#'Wet Days'
Out[403]:
Month Warmest Coolest Average Gust Rainfall Wet Days
0 2015-1-1 13.4 °C -4.5 °C 4 mph 40 mph 70.2 mm 25
1 2015-2-1 11.7 °C -4.2 °C 2 mph 24 mph 44.1 mm 17
2 2015-3-1 17.2 °C -2.9 °C 3 mph 30 mph 72.3 mm 17
3 2015-4-1 24.0 °C -1.7 °C 2 mph 30 mph 24.3 mm 13
4 2015-5-1 21.7 °C 1.4 °C 3 mph 30 mph 96.9 mm 18
5 2015-6-1 35.6 °C 2.2 °C 2 mph 31 mph 45.9 mm 10
6 2015-7-1 36.9 °C 4.0 °C 2 mph 22 mph 17.7 mm 13
7 2015-8-1 29.2 °C 4.0 °C 2 mph 20 mph 43.2 mm 14
8 2015-9-1 23.4 °C 2.2 °C 1 mph 17 mph 11.1 mm 9
9 2015-10-1 20.9 °C 1.0 °C 1 mph 23 mph 62.1 mm 13
10 2015-11-1 22.3 °C -3.8 °C 4 mph 30 mph 97.5 mm 26
11 2015-12-1 15.0 °C -0.5 °C 6 mph 39 mph 100.2 mm 27
Now look at the column data types - how has the data been defined by pandas on import - do they need changing?
In [404]:
# To look at data dtypes before cleaning
matlock.dtypes
Out[404]:
Month       object
Warmest     object
Coolest     object
Average     object
Gust        object
Rainfall    object
Wet Days     int64
dtype: object

Stage 2: Changing any Column Names that require tidying up

Change 'Month' column name to 'Date':
In [405]:
matlock = matlock.rename(columns={'Month' : 'Date'})# This is done to avoid confusion later as we will be dealing with months

Stage 3: Working through the data stripping off any rogue characters column by column

The following columns require endings stripped off their data:
In [406]:
#'Warmest' ' °C'
#'Coolest' ' °C'
#'Average' ' mph'
#'Gust' ' mph'
#'Rainfall' ' mm'
#'Wet Days'
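# Note: rstrip() removes any trailing run of the listed characters, not the exact suffix;
# that is safe here because every value starts with digits
matlock['Warmest'] = matlock['Warmest'].str.rstrip(' °C')# strips ' °C'
matlock['Coolest'] = matlock['Coolest'].str.rstrip(' °C')# strips ' °C'
matlock['Average'] = matlock['Average'].str.rstrip(' mph')# strips ' mph'
matlock['Gust'] = matlock['Gust'].str.rstrip(' mph')# strips ' mph'
matlock['Rainfall'] = matlock['Rainfall'].str.rstrip(' mm')# strips ' mm'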
In [407]:
#Run this check to make sure that all the data has been stripped and that the data is ready for the next stage
#matlock
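One thing to watch with the str.rstrip calls above is that the argument is treated as a set of characters rather than a fixed ending. The sketch below shows the difference on a few made-up values and offers one possible alternative, a regular expression that pulls out the leading number; it is only an illustration and the notebook itself keeps the rstrip approach.

```python
import pandas as pd

readings = pd.Series(['4 mph', '40 mph', '100.2 mm'])

# rstrip(' mph') removes any trailing run of the characters ' ', 'm', 'p', 'h'
print(readings.str.rstrip(' mph').tolist())  # ['4', '40', '100.2']

# The same rule can eat more than intended when the value ends in those letters
print('lamp mph'.rstrip(' mph'))  # la

# Alternative: capture the number itself and ignore whatever unit follows
numbers = readings.str.extract(r'(-?\d+(?:\.\d+)?)', expand=False).astype(float)
print(numbers.tolist())  # [4.0, 40.0, 100.2]
```

For numeric readings followed by a unit both approaches give the same result; the regular expression is simply less sensitive to what the unit text looks like.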

Stage 4: Looking for NaNs in the dataset

Getting an overview of the situation then drill down by column & row
In [408]:
# This will check all columns and all rows and return any NaN's detected
matlock.isnull().sum().sum()# zero means data is all clear and nothing further needs to be done
Out[408]:
0
Code to drill down column & row
In [409]:
# Quantifying & Resolving the NaNs problem (Weather Underground data)
# How many NaNs are in the dataframe?
#manchester.isnull().sum().sum()# returned 308
#manchester.isnull().sum() # returned breakdown by column of the 308:
# Max Gust SpeedKm/h            203
# CloudCover                      1
# Events                        104
#sum(manchester.isnull().sum(axis=1)>=2) # returned 73: the number of rows containing 2 or more NaNs
#manchester
Code that uses fillna() to clean out NaNs
In [410]:
# Replacing Events NaNs using fillna('')
#manchester['Events'] = manchester['Events'].fillna('')# all NaNs removed
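Because the Weather Underground cells above are commented out, here is a small self-contained sketch of the same checks. The three-row dataframe is invented for the example (only the column names echo the real file), so the counts are not those of the Manchester data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Weather Underground table
weather = pd.DataFrame({
    'Max Gust SpeedKm/h': [50.0, np.nan, np.nan],
    'CloudCover': [6.0, 7.0, np.nan],
    'Events': ['Rain', np.nan, np.nan],
})

total_missing = weather.isnull().sum().sum()       # all NaNs in the table
missing_per_column = weather.isnull().sum()        # one count per column
rows_with_two_or_more = (weather.isnull().sum(axis=1) >= 2).sum()

print(total_missing)          # 5
print(rows_with_two_or_more)  # 2

# An empty string is a sensible filler for 'no weather event recorded'
weather['Events'] = weather['Events'].fillna('')
print(weather.isnull().sum().sum())  # 3
```

Filling only the Events column leaves the numeric NaNs in place, which is usually what you want: an empty string would turn a numeric column into text.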

Stage 5: Changing the value types of columns

First check your data types to see what needs to be changed before commencing
In [411]:
matlock.dtypes# Before
Out[411]:
Date        object
Warmest     object
Coolest     object
Average     object
Gust        object
Rainfall    object
Wet Days     int64
dtype: object
Identify those columns that need changing to int64:

"int64 data type is how pandas represents integers (whole numbers)."

In [412]:
# Integer data
#Average     object
#Gust        object
#Wet Days     int64
matlock['Average'] = matlock['Average'].astype('int64') 
matlock['Gust'] = matlock['Gust'].astype('int64') 
#matlock['Wet Days'] = matlock['Wet Days'].astype('int64') # already in correct format int64
Identify those columns that need changing to float64

"float64 data type is how pandas represents floating point numbers (decimals)."

In [413]:
# Float data
#Warmest     object
#Coolest     object
#Rainfall    object
matlock['Warmest'] = matlock['Warmest'].astype('float64')
matlock['Coolest'] = matlock['Coolest'].astype('float64')
matlock['Rainfall'] = matlock['Rainfall'].astype('float64')
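The astype conversions above work because every value was stripped down to a clean number first. If a stray non-numeric entry were left behind, astype would stop with an error. The sketch below, using an invented 'n/a' entry, shows that failure and one possible fallback, pandas' to_numeric with errors='coerce', which turns unparseable values into NaN.

```python
import pandas as pd

# Hypothetical column where one reading was not recorded as a number
rainfall = pd.Series(['70.2', '44.1', 'n/a'])

try:
    rainfall.astype('float64')
    astype_failed = False
except ValueError:
    astype_failed = True  # 'n/a' cannot be converted to a float

# Fallback: values that cannot be parsed become NaN instead of raising
converted = pd.to_numeric(rainfall, errors='coerce')
print(converted.dtype)           # float64
print(converted.isnull().sum())  # 1
```

Coercing hides the bad values rather than fixing them, so it is worth counting the resulting NaNs, as done in Stage 4, before moving on.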
Identify those columns that need changing to datetime64

"datetime64 data type is how pandas represents dates."

In [414]:
matlock['Date'] = to_datetime(matlock['Date'])# convert the date strings to datetime64
In [415]:
matlock.index = matlock['Date']

Stage 6 Now check that the data has all been cleaned correctly

In [416]:
matlock.dtypes# After
Out[416]:
Date        datetime64[ns]
Warmest            float64
Coolest            float64
Average              int64
Gust                 int64
Rainfall           float64
Wet Days             int64
dtype: object

Section 4: Analysing the Data:

The 2015 Matlock Weather Data

Stage 1: Looking at the data

Select the data to analyse: Rainfall by month

In [417]:
#This returns final data file after cleaning - there should be no errors
matlock # Clean data
Out[417]:
Date Warmest Coolest Average Gust Rainfall Wet Days
Date
2015-01-01 2015-01-01 13.4 -4.5 4 40 70.2 25
2015-02-01 2015-02-01 11.7 -4.2 2 24 44.1 17
2015-03-01 2015-03-01 17.2 -2.9 3 30 72.3 17
2015-04-01 2015-04-01 24.0 -1.7 2 30 24.3 13
2015-05-01 2015-05-01 21.7 1.4 3 30 96.9 18
2015-06-01 2015-06-01 35.6 2.2 2 31 45.9 10
2015-07-01 2015-07-01 36.9 4.0 2 22 17.7 13
2015-08-01 2015-08-01 29.2 4.0 2 20 43.2 14
2015-09-01 2015-09-01 23.4 2.2 1 17 11.1 9
2015-10-01 2015-10-01 20.9 1.0 1 23 62.1 13
2015-11-01 2015-11-01 22.3 -3.8 4 30 97.5 26
2015-12-01 2015-12-01 15.0 -0.5 6 39 100.2 27
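Before charting, the wettest and gustiest months can also be read straight from the table. The sketch below rebuilds just the Gust and Rainfall columns from the cleaned table above (the name summary is only for this example) and uses nlargest to rank them.

```python
import pandas as pd

# Gust (mph) and Rainfall (mm) copied from the cleaned Matlock table
months = pd.date_range('2015-01-01', periods=12, freq='MS')
summary = pd.DataFrame({
    'Gust': [40, 24, 30, 30, 30, 31, 22, 20, 17, 23, 30, 39],
    'Rainfall': [70.2, 44.1, 72.3, 24.3, 96.9, 45.9,
                 17.7, 43.2, 11.1, 62.1, 97.5, 100.2],
}, index=months)

wettest = summary['Rainfall'].nlargest(3)
gustiest = summary['Gust'].nlargest(3)

print(wettest.index.strftime('%b').tolist())   # ['Dec', 'Nov', 'May']
print(gustiest.index.strftime('%b').tolist())  # ['Jan', 'Dec', 'Jun']
```

With only twelve rows the same answer can be found by eye, but the ranking scales to the daily data without any changes.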

Stage 2: Turning the data into a chart

1. Producing a Dot Plot Graph: Monthly Rainfall data
In [418]:
import matplotlib.pyplot as plt
import numpy as np
In [419]:
#Start at the beginning
#import matplotlib.pyplot as plt
#plt.plot([1,2,3,4,5])
plt.plot(matlock['Rainfall'],'ro')# colour 'ro' red circles
#plt.axis([a+1 for a in range(12)])# does not work with above
plt.ylabel('Rainfall in mm')
plt.show()
#http://matplotlib.org/users/pyplot_tutorial.html
2. Producing a Line Plot Graph: Wet Days
In [420]:
lines=plt.plot(matlock['Wet Days'])# works with one argument but not two
#lines=plt.plot(np.array xdata, np.array ydata)
#lines = plt.plot(x1, y1, x2, y2)
# use keyword args
#plt.setp(lines, color='r', linewidth=2.0)
#plt.axis([a+1 for a in range(12)])# does not work
#plt.axis([datetime(2015,1),datetime(2015,12),0,100])# date value out of range
# or MATLAB style string value pairs
plt.setp(lines, 'color', 'r', 'linewidth', 2.0)
# note: data: (np.array xdata, np.array ydata)
Out[420]:
[None, None]
3. Adding a header & some x & y axis labels (and making the x axis a bit more legible with month numbers)
In [421]:
x = [a+1 for a in range(12)]
y = matlock['Rainfall']
plt.plot(x, y)

plt.xlabel('months')
plt.ylabel('rainfall')
plt.title('Manchester Rainfall 2015')
plt.grid(True)
#plt.savefig("test.png")
plt.show()
4. Trying to express this data as a Bar Chart: Rainfall
In [422]:
# Bar Chart
x = [a+1 for a in range(12)]# list
y = matlock['Rainfall'] #data

plt.bar(x, y, label = 'mm', align='center')# the label only appears once plt.legend() is called

plt.xlabel('Months')
plt.ylabel('Rainfall')
plt.title(r'Monthly Rainfall Manchester 2015')

#plt.subplots_adjust(left=0.125)
plt.show()
#http://matplotlib.org/users/screenshots.html#simple-plot
5. Adding to the Bar Chart of Rainfall an overlay line graph showing Gust Speeds (and getting the legend to display in the right place)
In [423]:
matlock.columns
Out[423]:
Index(['Date', 'Warmest', 'Coolest', 'Average', 'Gust', 'Rainfall',
       'Wet Days'],
      dtype='object')
In [424]:
# Bar Chart with line plot overlay
import matplotlib.patches as mpatches# required to post legends below

x = [x+1 for x in range(len(matlock['Rainfall']))]# x list index +1
y = matlock['Rainfall'] #data1

x2 = [x+1 for x in range(len(matlock['Gust']))]# x2 list index +1
y2 = matlock['Gust']#data2  

plt.bar(x, y, label = 'Rainfall mm', align='center')# label works with legend # to align bars align='center'
plt.plot(x2, y2, label = 'Gust mph', color='r',linewidth=3.0)#  

#To make the legend appear for Rain & Gust?
blue_patch = mpatches.Patch(color='blue', label='Rainfall')
red_patch = mpatches.Patch(color='red', label='Gust')

plt.legend(handles=[blue_patch,red_patch])

plt.xlabel('Months')
plt.ylabel('Rainfall')
plt.title(r'2015 Manchester: Weather by Month')

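#To move legend box outside square to right
#Note: this second call replaces the legend created above and uses the label= values given to plt.bar and plt.plot
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)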

plt.show()
#http://matplotlib.org/users/screenshots.html#simple-plot
6. Rainfall with Gust Chart: (adding Gust Speeds to the right hand Axis & Month names along the bottom axis )
In [425]:
# This code tackled: How to show two y axes [Left hand y axis for Rainfall & right hand y axis for Gust speed] 
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import datetime# 1

###########################################################
# How to return months along x axis instead of numbers?
###########################################################
#To generate a list of 12 short month names in order
listMonths = []
for m in range(12):# Set at 12 months
    listMonths.append((datetime.date(2000, m+1, 1).strftime('%b'))) # 
#print(listMonths)# ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] - use to check listMonths is in the right format
my_xticks = listMonths
#plt.xticks(x, my_xticks)# statement used below
#plt.plot(x, y)# statement used below
##########################################################

fig, ax1 = plt.subplots()
x = [x+1 for x in range(len(matlock['Rainfall']))]# x list index +1
y1 = matlock['Rainfall'] #data1
# plt.bar(x, y1)
plt.xticks(x, my_xticks)#2
ax1.bar(x, y1,label='Rainfall mm',align='center')# ax1.plot(x, y1,'b+',label='Rainfall mm')
ax1.set_xlabel('months')# This is the label to describe x axis (not the data for each item) 
# Make the y-axis label and tick labels match the line color.
ax1.set_ylabel('rainfall mm', color='b')
for xl in ax1.get_yticklabels():
    xl.set_color('b')
##########################################################
ax2 = ax1.twinx()
y2 = matlock['Gust']# y2 data 
# plt.plot(x, y2)
ax2.plot(x, y2,'r', label='Gust mph',linewidth=2.0)# red line, made thicker so it stands out against the bars
ax2.set_ylabel('gusts mph', color='r')#
for xl in ax2.get_yticklabels():
    xl.set_color('r')
##########################################################
blue_patch = mpatches.Patch(color='blue', label='Rainfall')
red_patch = mpatches.Patch(color='red', label='Gust')
plt.legend(handles=[blue_patch,red_patch],loc=9)# Note loc codes: 0 - 10  0-'best' 9-'upper center' (see next link below)
#loc : int or string or pair of floats, default: ‘upper right’ 4- 'lower right'
#http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.legend
#Alternatively can be a 2-tuple giving x, y of the lower-left corner of the legend in axes coordinates.
#plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)# plots legend outside box but only 1 value - do not use
plt.title(r'2015 Manchester: Weather by Month')

plt.show()
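An alternative to building the legend from hand-made patches is to ask each axis for the artists it has already labelled and combine them. The sketch below is one possible way to do that, with the Rainfall and Gust figures typed in from the cleaned table so that it runs on its own; it is a variation on the chart above, not a replacement for it.

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
rainfall = [70.2, 44.1, 72.3, 24.3, 96.9, 45.9,
            17.7, 43.2, 11.1, 62.1, 97.5, 100.2]
gust = [40, 24, 30, 30, 30, 31, 22, 20, 17, 23, 30, 39]
x = list(range(1, 13))

fig, ax1 = plt.subplots()
ax1.bar(x, rainfall, color='b', align='center', label='Rainfall mm')
ax1.set_xticks(x)
ax1.set_xticklabels(months)
ax1.set_ylabel('rainfall mm', color='b')

ax2 = ax1.twinx()  # second y axis sharing the same x axis
ax2.plot(x, gust, color='r', linewidth=2.0, label='Gust mph')
ax2.set_ylabel('gusts mph', color='r')

# Each axis only knows about its own artists, so gather the handles from both
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper center')
ax1.set_title('2015 Manchester: Weather by Month')
```

Calling plt.show() afterwards displays the figure. The advantage of this route is that the legend colours and labels always match what was actually plotted.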

Stage 3: Conclusion

The graphs have shown that there are two main seasons for rain in Manchester: late October into November, December and January. February is quiet, with a brief return in March which falls back in April before the May showers kick in ahead of summer. If we match this to the gust speed, the peak times are November, December & January. For enthusiastic amateur pumpkin growers like myself, these three winter months are the ones that can cause the most damage, whether it is caused by too much rain or by stormy weather damaging the crops. Of course this is no guarantee that the weather pattern will repeat itself in future years. To make a sensible prediction we would need to analyse the summers for many more years. I have a feeling that before this course has ended we will be doing just that.

Section 5: Reference:

Reference: Markdown

For help in learning how to use Markdown:

either select Help > Markdown or [click here][GitHub] to learn more about writing and formatting on GitHub

[GitHub]: https://help.github.com/articles/getting-started-with-writing-and-formatting-on-github/
[wikipedia]: http://en.wikipedia.org/wiki/Markdown#Example
[dingus]: http://spec.commonmark.org/dingus/
[babelmark]: http://johnmacfarlane.net/babelmark2/faq.html
[workflow]: http://idratherbewriting.com/2013/06/04/exploring-markdown-in-collaborative-authoring-to-publishing-workflows/

Reference: Coding

In [426]:
# This is your Basic Plot template: for building upon

#import matplotlib.pyplot as plt

#x=[]# index: # [x+1 for x in range(len(df['Column name']))]# x list index +1
#y=[]#data to be plotted # df['Column name']

#plt.plot(x,y)# select type & add in any required arguments

#plt.xlabel('x description')
#plt.ylabel('y description')
#plt.title('A title for your graph')
#plt.legend()

#plt.show()# draws the graph (use plt.savefig('name.png') to save it)
In [427]:
# How to generate short month list:
import datetime# 1 - needed by the loop below
#mydate = datetime.datetime.now()# 2
#mydate.strftime("%b")# '%b' gives the abbreviated month name, e.g. 'Dec' # 3
###################################
#matlock['Date'].iloc[0]# returns Timestamp('2015-01-01 00:00:00') (.ix has been removed from recent pandas)
####################################
listMonths = []
for m in range(12):# Set at 12 months
    listMonths.append((datetime.date(2000, m+1, 1).strftime('%b'))) # 
#print(listMonths)# ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
listMonths = ' '.join(listMonths)# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#print(listMonths)
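A shorter route to the same list of short month names is the standard library's calendar module. The sketch below assumes the default (English) locale; the variable names are only for illustration.

```python
import calendar

# month_abbr[0] is an empty string, so slice from 1 to get the 12 names
short_months = calendar.month_abbr[1:]
print(short_months)

# Joined into a single string, as in the cell above
joined = ' '.join(short_months)
print(joined)
```

The datetime loop and this version should give the same names; calendar simply saves building a dummy date for each month.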