Let's say we want to average the monthly time series data from the previous exercise into annually averaged data.
There are a number of ways to do this. Probably the easiest in this case, since we know the data is monthly, would be to chunk through every 12 data points. Looking at the dates, we see that the data starts in January. So, if we loop every 12 points, and take the average of those, we should be able to get an annual average for each year.
Dimensions: (lat: 1, lon: 1, nv: 2, time: 1628)
* lat (lat) float32 41.0
* lon (lon) float32 -89.0
* time (time) datetime64[ns] 1880-01-15 1880-02-15 1880-03-15 ...
* nv (nv) int64 0 1
time_bnds (time, nv) int32 29219 29250 29250 29279 29279 29310 29310 ...
tempanomaly (time, lat, lon) float64 7.94 3.23 0.04 0.07 2.96 0.62 ...
title: GISTEMP Surface Temperature Analysis
institution: NASA Goddard Institute for Space Studies
history: Created 2015-09-11 09:23:38 by SBBX_to_nc 2.0 - ILAND=1200, IOCEAN=NCDC/ER4, Base: 1951-1980
Let's calculate how many years of data we have.
number of years = (number of months) / 12
# How many years do we have?
# np.shape returns a tuple, so take its first element before dividing
ntimes = np.shape(nc_cmi['time'])[0]
print(ntimes / 12.)
OK, the number of months is not a whole multiple of 12. The data starts in January 1880, so the final year must be incomplete. Let's drop that last partial year and average from January 1880 through the last complete year. We can get the number of complete years with numpy's floor command (rounding down to the nearest integer after dividing by 12). We will then create numpy arrays, nyears long, that will store the averaged data and the dates.
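As a minimal sketch of that step, the snippet below counts the complete years and allocates the output arrays. It uses a synthetic monthly time axis as a stand-in for nc_cmi['time'], and the names ann_avg and ann_time are hypothetical, chosen here for illustration.

```python
import numpy as np

# Stand-in for nc_cmi['time']: 1628 monthly timestamps starting January 1880.
# (Assumption: the real time axis comes from the dataset opened earlier.)
time = np.arange('1880-01', '2015-09', dtype='datetime64[M]')
ntimes = time.shape[0]

# Number of complete years: divide by 12 and round down.
nyears = int(np.floor(ntimes / 12.))

# One slot per complete year for the averages and for the dates.
ann_avg = np.zeros(nyears)
ann_time = np.empty(nyears, dtype='datetime64[M]')

print(ntimes, nyears)
```

Converting the result of np.floor to int matters here, because array sizes and loop ranges need integers rather than floats.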
Now, we need to devise an algorithm to average the monthly data by year. This can be accomplished by slicing the array 12 months at a time and averaging each slice. It can be helpful to think about this as a number line problem. In the diagram below, each - is one month in the dataset (12 per year), and we want to average over each complete year in the dataset, from 1880 through 2014.
|------------|------------|------------| ... |------------|------------|--------
1880         1881         1882         1883  2013         2014         2015       year
0            12           24           36    1596         1608         1620       index of January of each year
0            1            2            3     133          134          135        index of year
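The January indices in the diagram can be checked with a few lines of arithmetic. This is only a sketch: the helper january_index is hypothetical, and it assumes that index 0 is January 1880 and that no months are missing.

```python
START_YEAR = 1880  # assumption: index 0 of the time axis is January 1880

def january_index(year):
    """Index of January of `year` in a gap-free monthly series."""
    return 12 * (year - START_YEAR)

for year in (1880, 1881, 1882, 1883, 2013, 2014, 2015):
    # Each year occupies the half-open index range [start, start + 12).
    start = january_index(year)
    print(year, start, start + 12)
```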
To get the 1880 data, we need to pick the first 12 months of the dataset. This can be accomplished by slicing both the data and the time. Let's try it for the first year, keeping in mind that the first element (January 1880) is index 0, and that the end index of a slice is not included (Python's slicing convention).
data_1880 = nc_cmi['tempanomaly'][0:12]
time_1880 = nc_cmi['time'][0:12]
# let's check the time to see if it worked
print(time_1880)
Now we need to automate this, since we don't want to repeat it by hand for every year in the file! Time for a for loop. We can loop over each year by using nyears that we calculated above. We then need to map the counter, which runs from 0 to nyears - 1, onto month indices. From the above, each slice should start at the January of a year and end with its December, keeping in mind how Python slicing works. For counter i, the slice starts at i*12 and stops at (i+1)*12; because the stop index is excluded, this selects exactly the 12 months of year i.
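The loop described above might look like the following sketch. It runs on synthetic data in which every month's value equals its year index, so the result is easy to check. The dict nc_cmi_ann and the choice to label each year by its January timestamp are assumptions made here for illustration.

```python
import numpy as np

# Synthetic stand-in for the dataset (assumption: the real nc_cmi comes from
# the file opened earlier). Each month's value equals its year index, so the
# annual mean of year i should be exactly i.
ntimes = 1628
nc_cmi = {
    'time': np.arange('1880-01', '2015-09', dtype='datetime64[M]'),
    'tempanomaly': (np.arange(ntimes) // 12).astype(float).reshape(ntimes, 1, 1),
}

nyears = int(np.floor(ntimes / 12.))
ann_avg = np.zeros(nyears)
ann_time = np.empty(nyears, dtype='datetime64[M]')

for i in range(nyears):
    start = i * 12        # January of year i
    stop = (i + 1) * 12   # January of year i+1, excluded by the slice
    ann_avg[i] = np.mean(nc_cmi['tempanomaly'][start:stop])
    ann_time[i] = nc_cmi['time'][start]  # label the year by its January

# Hypothetical container holding the annual series under the same keys.
nc_cmi_ann = {'time': ann_time, 'tempanomaly': ann_avg}
```

Note that np.mean returns NaN for any year containing a missing value; np.nanmean could be swapped in if incomplete years should still be averaged.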
Plot the original and annually-averaged time series.
plt.figure(figsize=(11, 8.5))  # create a new figure
plt.plot(nc_cmi['time'], np.squeeze(nc_cmi['tempanomaly']), 'b', alpha=0.5)
plt.plot(nc_cmi_ann['time'], np.squeeze(nc_cmi_ann['tempanomaly']), 'r', linewidth=2.0)
plt.legend(['Monthly averages', 'Annual Averages'])
plt.xlabel('Year')
plt.ylabel('Temperature Anomaly (degrees C)')
plt.title('GISTEMP Temperature Anomalies near Champaign, IL')
plt.show()