The purpose of the project is to examine historic weather data from the Weather Underground for London to try to predict the best dates this year to take a nice warm staycation. My aim will be to:
obtain weather data for the year of 2015
clean the obtained data
run some basic data analysis techniques on the data set to:
find two weeks with the highest mean temperature; and,
avoid precipitation where possible.
The weather may of course be very different this year from the weather of 2015, but the analysis should give me some indication of when would be a good time to take a break.
Getting the data
The weather data was obtained from the Weather Underground website, using the dates 1st Jan 2015 to 31st Dec 2015, and saved as 'London_2015.csv'.
To obtain the data you must first enter London, United Kingdom as the location, and hit submit. On the following page there are some tabs - select 'custom', and from here you can enter the dates. The option to see the data in a CSV format is at the very bottom of the page underneath the data. This can be right-click-saved, and renamed from a .html to a .csv ready for use.
If you don't have the 'London_2015.csv' file, you can obtain the data as follows. Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser):
First I will display some of the data to see if there are any obvious issues.
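A minimal sketch of loading and previewing the data follows. It assumes pandas is used (the later code works on a dataframe called `london`). To keep the example runnable without the file, it reads a hypothetical three-row excerpt with made-up values. With the real file, the `read_csv` call would take 'London_2015.csv' instead.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for 'London_2015.csv';
# the values are made up for illustration only.
sample_csv = io.StringIO(
    "GMT,Mean TemperatureC,Precipitationmm,WindDirDegrees<br />\n"
    "2015-01-01,9,0.25,204<br />\n"
    "2015-01-02,8,0.00,251<br />\n"
    "2015-01-03,5,11.94,11<br />\n"
)

# With the real file: pd.read_csv('London_2015.csv', skipinitialspace=True)
# skipinitialspace drops any blank that follows a comma in the header row
london = pd.read_csv(sample_csv, skipinitialspace=True)

# Show the first rows and the column types to look for obvious problems
print(london.head())
print(london.dtypes)
```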
First we need to clean up the data. I'm not going to make use of 'WindDirDegrees' in my analysis, but you might in yours, so we'll rename 'WindDirDegrees<br />' to 'WindDirDegrees'.
[Table output: 5 rows × 23 columns; the column headers that survive in this extract are 'Max Wind SpeedKm/h', 'Mean Wind SpeedKm/h' and 'Max Gust SpeedKm/h']
There are some immediately obvious issues with the data:
the final column, 'WindDirDegrees<br />', and its contents have retained the HTML line break at the end of each data line
this means that the final column will be an object dtype as opposed to int64 as intended (as shown below)
the GMT column has the dtype 'object' as opposed to 'datetime' (as shown below)
there are various NaN values in the results
Max TemperatureC int64
Mean TemperatureC int64
Min TemperatureC int64
Dew PointC int64
MeanDew PointC int64
Min DewpointC int64
Max Humidity int64
Mean Humidity int64
Min Humidity int64
Max Sea Level PressurehPa int64
Mean Sea Level PressurehPa int64
Min Sea Level PressurehPa int64
Max VisibilityKm int64
Mean VisibilityKm int64
Min VisibilitykM int64
Max Wind SpeedKm/h int64
Mean Wind SpeedKm/h int64
Max Gust SpeedKm/h float64
WindDirDegrees<br /> object
First I will rename the column to remove the HTML line break:
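The rename can be sketched as below, on a hypothetical two-row frame with made-up values. The second step, stripping '<br />' from the values and converting them to integers, is my assumption about what follows, based on the earlier note that the column should be int64.

```python
import pandas as pd

# Hypothetical two-row frame; the values are made up for illustration
london = pd.DataFrame({'WindDirDegrees<br />': ['204<br />', '251<br />']})

# Rename the column so it no longer carries the HTML line break
london = london.rename(columns={'WindDirDegrees<br />': 'WindDirDegrees'})

# Assumed follow-up step: remove the literal '<br />' from every value,
# then convert the remaining digits to integers
london['WindDirDegrees'] = (
    london['WindDirDegrees']
    .str.replace('<br />', '', regex=False)
    .astype('int64')
)

print(london.dtypes)
```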
Next, I change the values in the 'GMT' column to the datetime64 dtype:
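A minimal sketch of the conversion, assuming `pd.to_datetime` is used and that the dates are stored as text. The two sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample: dates read from the CSV arrive as plain strings
london = pd.DataFrame({'GMT': ['2015-01-01', '2015-01-02']})

# Parse the strings into proper datetime values
london['GMT'] = pd.to_datetime(london['GMT'])

print(london['GMT'].dtype)
```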
I also need to change the index from the default to the datetime64 values in the 'GMT' column so that it is easier to pull out rows between particular dates and display more meaningful graphs:
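The sketch below shows one way to do this and why it helps: once the index holds dates, `.loc` can slice rows between two dates directly. The three-row frame and its values are hypothetical, and `earlyJuly` is a name introduced only for this example.

```python
import pandas as pd

# Hypothetical three-row frame with made-up temperatures
london = pd.DataFrame({
    'GMT': pd.to_datetime(['2015-06-30', '2015-07-01', '2015-07-02']),
    'Mean TemperatureC': [20, 23, 19],
})

# Use the dates as the index instead of the default 0, 1, 2, ...
london.index = london['GMT']

# Rows between two dates (both ends included) can now be selected by label
earlyJuly = london.loc['2015-07-01':'2015-07-02']
print(earlyJuly)
```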
Now I need to find the 'NaN' values in the data and decide what to do with them. This project uses the 'Mean TemperatureC' and 'Precipitationmm' columns to establish the best dates for the staycation, so first I will check whether there are any NaN values in these columns:
meanTempNaN = len(london[london['Mean TemperatureC'].isnull()])
precipitationmm = len(london[london['Precipitationmm'].isnull()])
print("The number of NaN values in the mean temperature column and the precipitation column "
      "are %d and %d respectively." % (meanTempNaN, precipitationmm))
The number of NaN values in the mean temperature column and the precipitation column are 0 and 0 respectively.
As there are no NaN values in the two columns I will actually be using, I can safely ignore the NaN values elsewhere in the dataframe for this project.
Finding a summer break
According to meteorologists, summer extends for the whole months of June, July, and August in the northern hemisphere and the whole months of December, January, and February in the southern hemisphere. I'm in the northern hemisphere, so I'm going to create a dataframe that holds just those months, starting from tomorrow's date in the 2015 data (today is July 2nd 2016):
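A sketch of how `remainingSummer` could be built follows, assuming the dataframe is indexed by date and that "tomorrow" maps to 3rd July in the 2015 data. The stand-in `london` frame holds placeholder values, not real measurements. The plotting call is left as a comment because the columns that were plotted are my assumption.

```python
from datetime import datetime
import pandas as pd

# Hypothetical stand-in for the cleaned 'london' dataframe: one row per day
# of 2015, indexed by date, with placeholder values (not real measurements)
days = pd.date_range('2015-01-01', '2015-12-31', freq='D')
london = pd.DataFrame({'Mean TemperatureC': 0, 'Precipitationmm': 0.0}, index=days)

# Keep 3rd July to 31st August, the part of summer that is still to come
remainingSummer = london.loc[datetime(2015, 7, 3):datetime(2015, 8, 31)]

# Assumed plotting step (requires matplotlib):
# remainingSummer[['Mean TemperatureC', 'Precipitationmm']].plot(grid=True, figsize=(10, 5))
```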
It seems that there were days of high precipitation in the last week of both July and August.
Both months seem very similar at face value, so for the best chance of an enjoyable staycation, I will look at the mean of the 'Mean TemperatureC' column for each month to see if one is the better option:
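from datetime import datetime  # needed if not already imported earlier in the notebook

# .ix has been removed from recent versions of pandas; .loc does the same label-based slicing here
july = remainingSummer.loc[datetime(2015, 7, 1):datetime(2015, 7, 31)]
august = remainingSummer.loc[datetime(2015, 8, 1):datetime(2015, 8, 31)]
julyTempMean = float(july[['Mean TemperatureC']].values.mean())
augTempMean = float(august[['Mean TemperatureC']].values.mean())
MeanTemperatures = "Mean temperature for July : %0.1fºC\nMean temperature for August: %0.1fºC" % (julyTempMean, augTempMean)
print(MeanTemperatures)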
Mean temperature for July : 18.1ºC
Mean temperature for August: 17.8ºC
The mean temperatures for the two months differ by only 0.3ºC, which is not a noticeable difference, so I will also examine the mean precipitation for both months to see whether it makes one month the clearer choice:
julyPrecipMean = float(july[['Precipitationmm']].values.mean())
augPrecipMean = float(august[['Precipitationmm']].values.mean())
MeanPrecipitation = "Mean precipitation for July : %0.1fmm\nMean precipitation for August: %0.1fmm" % (julyPrecipMean, augPrecipMean)
print(MeanPrecipitation)
Mean precipitation for July : 2.1mm
Mean precipitation for August: 3.2mm
Based on the rounded figures above, July had about 1.1mm less precipitation on average per day than August.
The graphs showed that July and August had very similar weather throughout. Ultimately, July had both a higher mean temperature across the month and a lower precipitation level, so I will take my two-week staycation this month, starting immediately! (That way I can get more practice with Python.)
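The month-level comparison could be narrowed to a specific fortnight, which was one of the original aims. The sketch below is not part of the analysis above. It uses a rolling 14-day mean on hypothetical temperatures with a made-up warm spell, and `temps`, `fortnightMean`, `bestStart` and `bestEnd` are names introduced only for this example.

```python
import pandas as pd

# Hypothetical daily mean temperatures (not real measurements): 15ºC throughout,
# apart from a made-up 14-day warm spell of 20ºC from 13th to 26th July
days = pd.date_range('2015-07-03', '2015-08-31', freq='D')
temps = pd.Series(15.0, index=days)
temps.loc['2015-07-13':'2015-07-26'] = 20.0

# Mean temperature over each 14-day window; each value is labelled with the
# window's last day, and the first 13 entries are NaN (incomplete windows)
fortnightMean = temps.rolling(window=14).mean()

# idxmax skips the NaN entries and returns the last day of the warmest window
bestEnd = fortnightMean.idxmax()
bestStart = bestEnd - pd.Timedelta(days=13)
print("Warmest fortnight: %s to %s" % (bestStart.date(), bestEnd.date()))
```

With the real data, `temps` would be `remainingSummer['Mean TemperatureC']`. The same rolling window applied to 'Precipitationmm' would give the total or mean rainfall for each candidate fortnight.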
Of course these results are no guarantee that the weather pattern will repeat itself in future years. To make a sensible prediction I would need to analyse the summers for many more years. I am currently studying to expand my skills to be able to achieve this in future projects.