Project 2: My Holiday weather - Manchester 2015
by Chris Pyves, 22 October 2016
This is my project notebook for Week 2 of The Open University's Learn to code for Data Analysis course.
Section 1: Introduction:
There is nothing I like better than growing enormous pumpkins. This is why I decided to relocate from the sweltering, sticky heat of sunny Andalucia in Spain to the damp and cooler climate of Manchester. I made the decision after watching, with great envy, the tremendous amounts of rain that Manchester received while staging the Commonwealth Games in 2002. You see, I am a keen amateur gardener, and whilst everyone knows that world-class enormous pumpkins need lots of 'rich home grown manure', they also need vast amounts of water. But as in life, there are times when you can have too much of a good thing. As my former gardening mentor and Regional Champion Pumpkin Grower used to tell me every day, until he passed away: "You can add the water, but you can't take it away". So, to increase my prospects of winning the Chorlton Allotments Prize Pumpkin Growers Award next year, I have decided to mount a 24-hour Wind & Rain Watch operation during the worst of the storm season so that I can protect and watch over my little darling pumpkins. For this project I am going to have to find historic weather data from local weather collectors to help me try to predict when the wind and the rain are going to be at their worst. Of course the weather in 2016/17 may be very different to 2015, but hey, I am not Michael Fish the weatherman either.
Sub heading A: What is the task
First I need to get hold of some weather data. But sourcing accurate, reliable and consistent meteorological data without having to pay a fee for the service is going to be a challenge. Fortunately I have heard that meteorological data can be downloaded from a site called Weather Underground completely free of charge!
Sub heading B: About my location
My location should not be a problem. I live in Manchester, so I ought to be able to lay my hands on some quality weather data easily. After all, it is the only topic that everyone talks about here.
Section 2: Data Sourcing & Acquisition:
Sub heading A: Weather Underground:
About Weather Underground:
Weather Underground challenges the convention of how weather information is shared with the public. Since 1993 their community and meteorologists have been providing an internet weather service with unique access to free meaningful weather data from around the globe. They are pioneers within this field and are constantly seeking new data sets and the next technologies that will help them share more data with more people.
There are around seven times the number of amateur stations feeding into the site compared to the number of observational stations used by the Met Office. Forecasters are asking people to invest in small meteorological stations, which can be bought from electronic shops for about £60.
Downloading Data from Weather Underground - using Save As HTML
Getting the data: if you don't have the 'Manchester_2015.csv' file, you can obtain the data as follows. Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser): http://www.wunderground.com/history When the new page opens, start typing 'Manchester' in the 'Location' input box. When the pop-up menu offers the option 'Manchester, United Kingdom', select it and then click on 'Submit'. When the next page opens, click on the 'Custom' tab, select the time period From: 1 January 2015 To: 31 December 2015, and then click on 'Get History'. The data for that year should then be displayed. Scroll to the end of the data and right-click on the blue link labelled 'Comma Delimited File'.
If you are using the Chrome browser, choose 'Save Link As ...'. In the file dialogue that appears, save the file with its default name of 'CustomHistory' to the folder you created for this course, where this notebook is located. Once the file has been downloaded, rename it from 'CustomHistory.html' to 'My_loc_2015.csv'. Now load the CSV file into a dataframe, making sure that any extra spaces are skipped:
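The original code cell is not shown here, so the following is only a minimal sketch of that loading step. It assumes pandas is used, and it reads from an in-memory stand-in whose column names and values are purely illustrative, not taken from the downloaded file.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; these columns and values are illustrative only.
sample_csv = io.StringIO(
    "Date, Max TemperatureC, Events\n"
    "2015-1-1, 9, Rain\n"
    "2015-1-2, 7, Fog\n"
)

# skipinitialspace=True drops the blanks that follow each comma,
# so column names and text values do not start with a space.
weather = pd.read_csv(sample_csv, skipinitialspace=True)
print(weather.columns.tolist())
```

With the real file, the same call would take the file name in place of `sample_csv`, for example `pd.read_csv('My_loc_2015.csv', skipinitialspace=True)`.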
This data was downloaded and then uploaded. Unfortunately, although the data had a column for precipitation readings, it seems that no readings are being recorded, or none are available, from this source for Manchester. Whilst I was able to import the data, clean it and plot various graphs, I decided to try to resolve this problem by looking for another source of data.
Sub heading B: WOW: Private Weather Stations (PWS): The Matlock Observatory (12 month rainfall data)
About the UK Met Office Weather Observations Website WOW
The Met Office is helping to co-ordinate the growth of the weather observing community in the UK, by asking anyone to submit the observations they are taking. This can be done using all levels of equipment, so there are no cost restrictions.
The Royal Meteorological Society (RMetS) provide some simple guidelines for setting up a weather station, which they have kindly made available for WOW users to download: the RMetS Guide (PDF) to setting up a weather station.
More detailed information about observing the weather is available within "The Observer's Handbook", available online from the Met Office Publications Archive.
Links:
http://www.weatherstations.co.uk/wow.htm
http://wow.metoffice.gov.uk/home
http://www.computerweekly.com/news/4500272936/Met-Office-extends-application-program-interfaces-with-CA-Technologies
http://www.itv.com/news/anglia/2015-11-11/wow-the-met-office-with-your-schools-weather-observations/
About The Matlock Observatory:
After searching the Met Office website for local weather stations, I found one that was recording monthly rainfall figures. This was the Matlock Observatory, whose daily readings can be viewed on its WOW page: http://wow.metoffice.gov.uk/observations/details?site_id=52446464
Downloading Data from the Matlock Observatory - direct off their webpage using Google importHTML
The weather station is located on the west-facing side of the Derwent Valley in Matlock, Derbyshire. Submitting observations since: May 2013. Submitted on over 350 days in 2015.
http://www.matlockobservatory.uk/
Google Sheets command to download table number 6 from the website:
=IMPORTHTML("http://www.matlockobservatory.uk/","table",6)
Source: https://blog.ouseful.info/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/ (how to scrape a table from an HTML web page into a Google spreadsheet). The URL of the target web page and the target table element both need to be in double quotes. The number N identifies the N'th table in the page as the target table for data scraping. The blog post says counting starts at 0, whereas Google's current documentation for IMPORTHTML states that the index starts at 1, so check which table is actually returned.
Section 3: Cleaning the data :
Sub heading A: Weather Underground
Whilst the Weather Underground data was cleaned, I decided not to publish it here as it would make this project too large. However, the same process was followed in cleaning the Matlock data: there was less data, but the process was identical.
Sub heading B: The Matlock Observatory Data
Stage 1 Initial overview of the data & quality - focusing on Column Names
| | Month | Warmest | Coolest | Average | Gust | Rainfall | Wet Days |
---|---|---|---|---|---|---|---|
0 | 2015-1-1 | 13.4 °C | -4.5 °C | 4 mph | 40 mph | 70.2 mm | 25 |
1 | 2015-2-1 | 11.7 °C | -4.2 °C | 2 mph | 24 mph | 44.1 mm | 17 |
2 | 2015-3-1 | 17.2 °C | -2.9 °C | 3 mph | 30 mph | 72.3 mm | 17 |
3 | 2015-4-1 | 24.0 °C | -1.7 °C | 2 mph | 30 mph | 24.3 mm | 13 |
4 | 2015-5-1 | 21.7 °C | 1.4 °C | 3 mph | 30 mph | 96.9 mm | 18 |
5 | 2015-6-1 | 35.6 °C | 2.2 °C | 2 mph | 31 mph | 45.9 mm | 10 |
6 | 2015-7-1 | 36.9 °C | 4.0 °C | 2 mph | 22 mph | 17.7 mm | 13 |
7 | 2015-8-1 | 29.2 °C | 4.0 °C | 2 mph | 20 mph | 43.2 mm | 14 |
8 | 2015-9-1 | 23.4 °C | 2.2 °C | 1 mph | 17 mph | 11.1 mm | 9 |
9 | 2015-10-1 | 20.9 °C | 1.0 °C | 1 mph | 23 mph | 62.1 mm | 13 |
10 | 2015-11-1 | 22.3 °C | -3.8 °C | 4 mph | 30 mph | 97.5 mm | 26 |
11 | 2015-12-1 | 15.0 °C | -0.5 °C | 6 mph | 39 mph | 100.2 mm | 27 |
Now look at the column data types. How has each column been typed by pandas on import, and do any of them need changing?
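One way to run that check is sketched below. It assumes pandas and rebuilds only the first three rows and four of the columns from the table above; the variable name `matlock` is a placeholder of mine, not a name from the original notebook.

```python
import pandas as pd

# First three rows of the Matlock table above, as imported (units still attached).
matlock = pd.DataFrame({
    "Month": ["2015-1-1", "2015-2-1", "2015-3-1"],
    "Warmest": ["13.4 °C", "11.7 °C", "17.2 °C"],
    "Rainfall": ["70.2 mm", "44.1 mm", "72.3 mm"],
    "Wet Days": [25, 17, 17],
})

# Columns that still contain text such as '70.2 mm' are reported as
# 'object' (or a string dtype on newer pandas) rather than as numbers.
print(matlock.dtypes)
```

Any column that holds numbers or dates but is reported as text is a candidate for cleaning and conversion in the later stages.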
Stage 2: Changing any Column Names that require tidying up
Change 'Month' column name to 'Date':
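A minimal sketch of the rename, assuming a pandas dataframe held in a variable I have called `matlock` (a placeholder name) and using two rows from the table above:

```python
import pandas as pd

matlock = pd.DataFrame({
    "Month": ["2015-1-1", "2015-2-1"],
    "Rainfall": ["70.2 mm", "44.1 mm"],
})

# rename() returns a new dataframe; columns not listed keep their names.
matlock = matlock.rename(columns={"Month": "Date"})
print(matlock.columns.tolist())
```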
Stage 3: Working through the data, stripping off any rogue characters column by column
The following columns require endings stripped off their data:
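Judging by the table above, the unit endings are ' °C', ' mph' and ' mm'. The sketch below shows one way they could be stripped with pandas string methods. It covers three of the affected columns and two rows only, and the `suffixes` mapping is my own helper rather than anything from the original notebook.

```python
import pandas as pd

matlock = pd.DataFrame({
    "Warmest": ["13.4 °C", "11.7 °C"],
    "Gust": ["40 mph", "24 mph"],
    "Rainfall": ["70.2 mm", "44.1 mm"],
})

# Unit ending to remove from each column, as seen in the table above.
suffixes = {"Warmest": " °C", "Gust": " mph", "Rainfall": " mm"}

for column, suffix in suffixes.items():
    # regex=False treats the ending as literal text rather than a pattern;
    # strip() then removes any stray whitespace that is left behind.
    matlock[column] = (
        matlock[column].str.replace(suffix, "", regex=False).str.strip()
    )

print(matlock)
```

The values are still text at this point; converting them to numbers is a separate step.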
Stage 4: Looking for NaNs in the dataset
Getting an overview of the situation then drill down by column & row
Code to drill down column & row
Code that uses fillna() to clean out NaNs
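The three steps above might look something like the following sketch. The Matlock table shown earlier has no gaps, so one Gust value has been blanked deliberately for illustration, and filling with 0 is only one possible choice.

```python
import numpy as np
import pandas as pd

# Three rows from the table, with one Gust value blanked on purpose.
matlock = pd.DataFrame({
    "Date": ["2015-1-1", "2015-2-1", "2015-3-1"],
    "Gust": [40.0, np.nan, 30.0],
    "Rainfall": [70.2, 44.1, 72.3],
})

# Overview: number of missing values in each column.
missing_per_column = matlock.isnull().sum()
print(missing_per_column)

# Drill down: the rows that contain at least one NaN.
rows_with_nan = matlock[matlock.isnull().any(axis=1)]
print(rows_with_nan)

# One possible fix: replace every NaN with 0 (a choice, not a rule).
cleaned = matlock.fillna(0)
```

Whether 0 is a sensible replacement depends on the column; for a measurement such as gust speed, dropping the row or leaving the gap may be more honest.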
Stage 5: Changing the value types of columns
Before starting, check the data types to see what needs to be changed
Identify those columns that need changing to int64:
"int64 data type is how pandas represents integers (whole numbers)."
Identify those columns that need changing to float64
"float64 data type is how pandas represents floating point numbers (decimals)."
Identify those columns that need changing to datetime64
"datetime64 data type is how pandas represents dates."
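As a rough sketch of the three conversions, assuming the unit endings have already been stripped so that each column holds plain text such as '70.2':

```python
import pandas as pd

matlock = pd.DataFrame({
    "Date": ["2015-1-1", "2015-2-1"],
    "Rainfall": ["70.2", "44.1"],
    "Gust": ["40", "24"],
})

# Whole numbers -> int64
matlock["Gust"] = matlock["Gust"].astype("int64")

# Decimals -> float64
matlock["Rainfall"] = matlock["Rainfall"].astype("float64")

# Dates -> datetime64
matlock["Date"] = pd.to_datetime(matlock["Date"])

print(matlock.dtypes)
```

The assignment of columns to types here follows the table (Gust is a whole number of mph, Rainfall has one decimal place); the original notebook's exact choices are not shown.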
Stage 6 Now check that the data has all been cleaned correctly
Section 4: Analysing the Data:
The 2015 Matlock Weather Data
Stage 1: Looking at the data
Select the data to analyse: Rainfall by month
| | Date | Warmest | Coolest | Average | Gust | Rainfall | Wet Days |
---|---|---|---|---|---|---|---|
Date | |||||||
2015-01-01 | 2015-01-01 | 13.4 | -4.5 | 4 | 40 | 70.2 | 25 |
2015-02-01 | 2015-02-01 | 11.7 | -4.2 | 2 | 24 | 44.1 | 17 |
2015-03-01 | 2015-03-01 | 17.2 | -2.9 | 3 | 30 | 72.3 | 17 |
2015-04-01 | 2015-04-01 | 24.0 | -1.7 | 2 | 30 | 24.3 | 13 |
2015-05-01 | 2015-05-01 | 21.7 | 1.4 | 3 | 30 | 96.9 | 18 |
2015-06-01 | 2015-06-01 | 35.6 | 2.2 | 2 | 31 | 45.9 | 10 |
2015-07-01 | 2015-07-01 | 36.9 | 4.0 | 2 | 22 | 17.7 | 13 |
2015-08-01 | 2015-08-01 | 29.2 | 4.0 | 2 | 20 | 43.2 | 14 |
2015-09-01 | 2015-09-01 | 23.4 | 2.2 | 1 | 17 | 11.1 | 9 |
2015-10-01 | 2015-10-01 | 20.9 | 1.0 | 1 | 23 | 62.1 | 13 |
2015-11-01 | 2015-11-01 | 22.3 | -3.8 | 4 | 30 | 97.5 | 26 |
2015-12-01 | 2015-12-01 | 15.0 | -0.5 | 6 | 39 | 100.2 | 27 |
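The table above has the dates both as the row index and as an ordinary column. A minimal sketch of how that layout and the rainfall selection could be produced, using three rows only and placeholder variable names:

```python
import pandas as pd

matlock = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-01", "2015-02-01", "2015-03-01"]),
    "Rainfall": [70.2, 44.1, 72.3],
    "Wet Days": [25, 17, 17],
})

# Use the dates as the row index while keeping the Date column,
# which matches the layout of the table above.
matlock.index = matlock["Date"]

# Select the monthly rainfall, then look up a single month by its date.
rainfall = matlock["Rainfall"]
march = rainfall.loc[pd.Timestamp("2015-03-01")]
print(march)
```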
Stage 2: Turning the data into a chart
1. Producing a Dot Plot Graph: Monthly Rainfall data
2. Producing a Line Plot Graph: Rainfall
3. Adding a header & some x & y axis labels (and making the x axis a bit more legible with month numbers)
4. Trying to express this data as a Bar Chart: Rainfall
5. Adding to the Bar Chart of Rainfall an overlay line graph showing Gust Speeds (and getting the legend to display in the right place)
6. Rainfall with Gust Chart: (adding Gust Speeds to the right hand Axis & Month names along the bottom axis )
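The charts themselves are not reproduced here. As an illustration of the final step only, the sketch below draws rainfall as bars with gust speed as a line on a right-hand axis, using the 2015 figures from the table. It assumes matplotlib, and the colours, figure size and legend position are my own choices rather than the original notebook's.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
matlock = pd.DataFrame({
    "Rainfall": [70.2, 44.1, 72.3, 24.3, 96.9, 45.9,
                 17.7, 43.2, 11.1, 62.1, 97.5, 100.2],
    "Gust": [40, 24, 30, 30, 30, 31, 22, 20, 17, 23, 30, 39],
}, index=months)

# Rainfall as bars on the left-hand axis.
fig, ax_rain = plt.subplots(figsize=(10, 5))
ax_rain.bar(matlock.index, matlock["Rainfall"],
            color="steelblue", label="Rainfall (mm)")
ax_rain.set_title("Matlock 2015: monthly rainfall and gust speed")
ax_rain.set_xlabel("Month")
ax_rain.set_ylabel("Rainfall (mm)")

# Gust speed as a line on a second y axis that shares the same x axis.
ax_gust = ax_rain.twinx()
ax_gust.plot(matlock.index, matlock["Gust"],
             color="darkorange", marker="o", label="Gust (mph)")
ax_gust.set_ylabel("Gust (mph)")

# Merge both sets of legend entries so that a single legend is drawn.
handles_r, labels_r = ax_rain.get_legend_handles_labels()
handles_g, labels_g = ax_gust.get_legend_handles_labels()
ax_rain.legend(handles_r + handles_g, labels_r + labels_g, loc="upper center")

fig.tight_layout()
```

Putting the two series on separate axes keeps the gust line readable even though its values (mph) are on a different scale from the rainfall bars (mm).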
Stage 3: Conclusion
The graphs have shown that there are two main seasons for rain in Manchester. The first runs from late October through November and December into January. February is quiet, with a brief return of rain in March that falls back in April, before the May showers kick in ahead of summer. If we match this to the gust speed, the peak times are November, December and January. For enthusiastic amateur pumpkin growers like myself, these three winter months are the ones that can cause the most damage, whether from too much rain or from stormy weather damaging the crops. Of course this is no guarantee that the weather pattern will repeat itself in future years. To make a sensible prediction we would need to analyse the winters for many more years. I have a feeling that before this course has ended we will be doing just that.
Section 5: Reference:
Reference: Markdown
For help in learning how to use Markdown:
either select Help > Markdown or [click here][GitHub] to learn more about writing and formatting on GitHub.
[GitHub]: https://help.github.com/articles/getting-started-with-writing-and-formatting-on-github/
[wikipedia]: http://en.wikipedia.org/wiki/Markdown#Example
[dingus]: http://spec.commonmark.org/dingus/
[babelmark]: http://johnmacfarlane.net/babelmark2/faq.html
[workflow]: http://idratherbewriting.com/2013/06/04/exploring-markdown-in-collaborative-authoring-to-publishing-workflows/