Authors: Anna Nesbitt, Steve Nesbitt
Description: Jupyter html version of ATMS-305-Content/Week-8/Week 8 Handouts/Mar 8 in-class exercise.ipynb

# ATMS 305 : March 8, 2017¶

## Data visualization using xarray and matplotlib¶

In the last exercise, you learned how to use matplotlib to visualize data. In this exercise, we will put it all together:

• subset files using fancy indexing
• plot the results

We're going to use a new Python package called xarray. It is a high-level interface for reading and writing netCDF files that will make your life easier. Let's go!

In [1]:
%pylab inline
import xarray as xr

Populating the interactive namespace from numpy and matplotlib


Did you get this error?

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-c53dcc25b060> in <module>()
----> 1 import xarray

ImportError: No module named xarray

Phooey! What do we do now?

Well, we can install a new package using pip, the python package manager. Many packages can be installed in this way.

Open a terminal and enter

pip install xarray --user --upgrade

When it is done, restart the kernel and try again.

In [2]:
import xarray as xr


Ahhhhhhhh....all better. Let's get to work!

## Using xarray to read a netCDF file¶

It can't get much easier than this...xarray handles a lot of the dirty work for you. We can load local files as well as datasets served over the internet: just give it a local file path or a URL!

In [3]:
nc=xr.open_dataset('https://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/gistemp/combined/250km/air.2x2.250.mon.anom.comb.nc')
nc

Out[3]:
<xarray.Dataset>
Dimensions:  (lat: 90, lon: 180, time: 1645)
Coordinates:
* lat      (lat) float32 89.0 87.0 85.0 83.0 81.0 79.0 77.0 75.0 73.0 71.0 ...
* lon      (lon) float32 1.0 3.0 5.0 7.0 9.0 11.0 13.0 15.0 17.0 19.0 21.0 ...
* time     (time) datetime64[ns] 1880-01-01 1880-02-01 1880-03-01 ...
Data variables:
air      (time, lat, lon) float64 ...
Attributes:
title: GISS Surface Temperature Analysis (GISTEMP): 250km smoothing combined land/ocean
comments: This is the 250km smoothed combined land,ocean version of the dataset on a 2x2 grid. SST dataset used at NASA is now ERSST
platform: Analysis
Source: http://data.giss.nasa.gov/gistemp/ source and http://data.giss.nasa.gov/pub/gistemp/ data
Documentation: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
references: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
Conventions: COARDS
history: created at PSD Jun 2009 by CAS. Converted to chunked, deflated non-packed NetCDF4 2014/06
dataset_title: GISS Surface Temperature Analysis
DODS_EXTRA.Unlimited_Dimension: time
EXTRA_DIMENSION.nbnds: 2
In [4]:
ncvar = nc['air']
ncvar

Out[4]:
<xarray.DataArray 'air' (time: 1645, lat: 90, lon: 180)>
[26649000 values with dtype=float64]
Coordinates:
* lat      (lat) float32 89.0 87.0 85.0 83.0 81.0 79.0 77.0 75.0 73.0 71.0 ...
* lon      (lon) float32 1.0 3.0 5.0 7.0 9.0 11.0 13.0 15.0 17.0 19.0 21.0 ...
* time     (time) datetime64[ns] 1880-01-01 1880-02-01 1880-03-01 ...
Attributes:
long_name: Monthly Average Temperature Anomalies
valid_range: [-25.  25.]
units: degC
precision: 2
var_desc: Air Temperature: 250km smoothing combined
dataset: GISS Surface Temperature Analysis (GISTEMP)
level_desc: Surface
statistic: Anomaly
parent_stat: Individual obs
cell_methods: time: anomaly (monthly from values)
standard_name: air_temperature_anomaly
actual_range: [  -20.74464989  9999.        ]
_ChunkSize: [  1  90 180]

This data is a gridded time series at 2 degree resolution of monthly surface temperature anomalies starting in 1880. Let's average over all space dimensions (lat - axis 1, and lon - axis 2). We can use np.mean and its axis keyword (very handy) for this purpose.

This will yield a time series of globally averaged temperature!
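The idea of collapsing dimensions with the `axis` keyword is easiest to see on a tiny made-up array with the same (time, lat, lon) layout:

```python
import numpy as np

# toy array shaped (time, lat, lon): 2 times, 3 lats, 4 lons
a = np.ones((2, 3, 4))
a[1] = 2.0  # make the second time step uniformly warmer

# averaging over axes 1 and 2 collapses the spatial dimensions,
# leaving one value per time step
print(a.mean(axis=(1, 2)).tolist())  # → [1.0, 2.0]
```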

In [5]:
%pylab inline

plt.plot(ncvar.time,np.mean(ncvar,axis=(1,2)))
plt.xlabel('Year')
plt.ylabel('Temperature anomaly deg C')
plt.title("It's getting hot up in here!")

Populating the interactive namespace from numpy and matplotlib

Out[5]:
<matplotlib.text.Text at 0x7f99d8a2d748>

We can use xarray's `sel` method to find the lon and lat index values closest to a point of interest, and subset the data to grab the grid point nearest Champaign-Urbana. We can give it a list of points; here we will give it one. (Note that the longitude is shifted by 360 degrees to match the dataset's 0-360 convention.)
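Under the hood, `method='nearest'` just picks the coordinate value with the smallest absolute distance to the target. A minimal NumPy sketch of that lookup, using a made-up 2-degree latitude axis:

```python
import numpy as np

lats = np.arange(-89.0, 90.0, 2.0)  # toy 2-degree latitude coordinate
target = 40.2                        # latitude we want the nearest point to

# index of the grid latitude with the smallest distance to the target
idx = np.abs(lats - target).argmin()
print(lats[idx])  # → 41.0
```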

In [5]:
nc_cmi=nc.sel(lon=-88.9+360.,lat=40., method='nearest')
nc_cmi

Out[5]:
<xarray.Dataset>
Dimensions:  (time: 1645)
Coordinates:
lat      float32 41.0
lon      float32 271.0
* time     (time) datetime64[ns] 1880-01-01 1880-02-01 1880-03-01 ...
Data variables:
air      (time) float64 ...
Attributes:
title: GISS Surface Temperature Analysis (GISTEMP): 250km smoothing combined land/ocean
comments: This is the 250km smoothed combined land,ocean version of the dataset on a 2x2 grid. SST dataset used at NASA is now ERSST
platform: Analysis
Source: http://data.giss.nasa.gov/gistemp/ source and http://data.giss.nasa.gov/pub/gistemp/ data
Documentation: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
references: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
Conventions: COARDS
history: created at PSD Jun 2009 by CAS. Converted to chunked, deflated non-packed NetCDF4 2014/06
dataset_title: GISS Surface Temperature Analysis
DODS_EXTRA.Unlimited_Dimension: time
EXTRA_DIMENSION.nbnds: 2
In [6]:
import numpy as np
import matplotlib.pyplot as plt

var='air'

plt.plot(nc_cmi.time,nc_cmi[var])
plt.xlabel('Year')
plt.ylabel('Temperature anomaly deg C')
plt.title("It's not getting as hot in Champaign")

Out[6]:
<matplotlib.text.Text at 0x7f0f804ee7b8>

## Saving to file - easy as np.pi()¶

Want to save the file as a netCDF file? No problem!

In [ ]:
nc_cmi.to_netcdf('nc_cmi.nc')


## Calculating time averages¶

Let's say we want to average the monthly time series data into annually averaged data.

There are a number of ways to do this. xarray offers time sampling capabilities, similar to pandas. First though, let's do it the hard way.

In [7]:
nc_cmi

Out[7]:
<xarray.Dataset>
Dimensions:  (time: 1645)
Coordinates:
lat      float32 41.0
lon      float32 271.0
* time     (time) datetime64[ns] 1880-01-01 1880-02-01 1880-03-01 ...
Data variables:
air      (time) float64 10.02 4.389 0.8811 0.5576 3.356 1.193 0.2533 ...
Attributes:
title: GISS Surface Temperature Analysis (GISTEMP): 250km smoothing combined land/ocean
comments: This is the 250km smoothed combined land,ocean version of the dataset on a 2x2 grid. SST dataset used at NASA is now ERSST
platform: Analysis
Source: http://data.giss.nasa.gov/gistemp/ source and http://data.giss.nasa.gov/pub/gistemp/ data
Documentation: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
references: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
Conventions: COARDS
history: created at PSD Jun 2009 by CAS. Converted to chunked, deflated non-packed NetCDF4 2014/06
dataset_title: GISS Surface Temperature Analysis
DODS_EXTRA.Unlimited_Dimension: time
EXTRA_DIMENSION.nbnds: 2
In [8]:
# How many years do we have?

ntimes=np.shape(nc_cmi['time'])
print(ntimes[0]/12.)

137.08333333333334


It looks like we don't have an evenly divisible number of months, so we don't have complete years. Let's start at the beginning and loop by 12.

In [9]:
nyears = int(ntimes[0] // 12)   # integer division: number of complete years
averages = np.zeros(nyears)

for i in range(nyears):
    # the slice end is exclusive, so this averages months i*12 .. i*12+11
    averages[i] = np.mean(nc_cmi[var][i*12:(i+1)*12])

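The same trim-and-average logic can also be written without a loop by reshaping. A sketch on a made-up monthly series:

```python
import numpy as np

monthly = np.arange(30, dtype=float)  # stand-in for a monthly time series
nyears = monthly.size // 12           # integer division: complete years only

# drop the leftover months, reshape to (years, 12), and average each row
annual = monthly[:nyears * 12].reshape(nyears, 12).mean(axis=1)
print(annual.tolist())  # → [5.5, 17.5]
```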


Save the Champaign time series to a new file.

In [10]:
site='Champaign_annualavgs'

nc_cmi.to_netcdf(site+'_data.nc')


Read it back in to check!

In [12]:
nc_cmi2 = xr.open_dataset('Champaign_annualavgs_data.nc')
nc_cmi2

Out[12]:
<xarray.Dataset>
Dimensions:  (time: 1645)
Coordinates:
lat      float32 41.0
lon      float32 271.0
* time     (time) datetime64[ns] 1880-01-01 1880-02-01 1880-03-01 ...
Data variables:
air      (time) float64 10.02 4.391 0.8828 0.5547 3.359 1.195 0.25 1.242 ...
Attributes:
title: GISS Surface Temperature Analysis (GISTEMP): 250km smoothing combined land/ocean
comments: This is the 250km smoothed combined land,ocean version of the dataset on a 2x2 grid. SST dataset used at NASA is now ERSST
platform: Analysis
Source: http://data.giss.nasa.gov/gistemp/ source and http://data.giss.nasa.gov/pub/gistemp/ data
Documentation: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
references: http://www.esrl.noaa.gov/psd/data/gridded/data.gistemp.html
Conventions: COARDS
history: created at PSD Jun 2009 by CAS. Converted to chunked, deflated non-packed NetCDF4 2014/06
dataset_title: GISS Surface Temperature Analysis
DODS_EXTRA.Unlimited_Dimension: time
EXTRA_DIMENSION.nbnds: 2

Now the easy way. xarray and pandas share the same interface to resample and group time series conveniently. The documentation is available at: http://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations. The codes for resampling are the same as in pandas. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

In [13]:
nc_cmi3 = nc_cmi.resample(time='AS').mean()  # 'AS' = annual, labelled at year start
nc_cmi3

Out[13]:
<xarray.Dataset>
Dimensions:  (time: 138)
Coordinates:
* time     (time) datetime64[ns] 1880-01-01 1881-01-01 1882-01-01 ...
Data variables:
air      (time) float64 0.8747 0.503 0.7313 -1.04 -0.06189 -1.009 ...

How easy is that? As easy as np.pi()? You can also resample at other time frequencies, or change how the calculation is done (e.g., calculate the median instead of the mean).
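The annual averaging above can be reproduced in plain pandas, whose time-series machinery xarray borrows. Here is a sketch on a made-up two-year monthly series, using a group-by-year in place of `resample`:

```python
import numpy as np
import pandas as pd

# made-up monthly series: 24 month-start timestamps from 1880
idx = pd.date_range('1880-01-01', periods=24, freq='MS')
s = pd.Series(np.arange(24, dtype=float), index=idx)

# group by calendar year and average each group -
# equivalent to an annual-mean resample
annual = s.groupby(s.index.year).mean()
print(annual.tolist())  # → [5.5, 17.5]
```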

Now save to a file:

In [ ]:
site='Champaign_annualavgs'

nc_cmi3.to_netcdf(site+'_data.nc')


Make a (nice) plot!

In [15]:
plt.figure(figsize=(11,8.5)) #create a new figure

plt.plot(nc_cmi['time'],np.squeeze(nc_cmi['air']),'b',alpha=0.5)
plt.plot(nc_cmi3['time'],np.squeeze(nc_cmi3['air']),'r',linewidth=2.0)
plt.legend(['Monthly averages','Annual averages'])
plt.xlabel('Year')
plt.ylabel('Temperature Anomaly (degrees C)')
plt.title('GISTEMP Temperature Anomalies near Champaign, IL')
plt.show()


## Introduction to mapping with basemap.¶

Python's basemap is a package designed to let matplotlib handle geographic calculations and draw maps with geographic boundaries. All we need is a 2-D variable with its associated latitudes and longitudes, and off we go.

In [19]:
import pandas as pd
from mpl_toolkits.basemap import Basemap

# Select January 2017; xarray has already decoded missing values to NaN,
# so pcolormesh will simply leave those cells showing the background color
jan2017 = nc.sel(time=pd.datetime(2017, 1, 1))
data = jan2017['air'].values

# make lats and lons 2-D so basemap knows where to plot each point
lons, lats = np.meshgrid(jan2017['lon'].values, jan2017['lat'].values)

# create figure
fig = plt.figure()
# create Basemap instance (Kavrayskiy VII global projection)
m = Basemap(projection='kav7', lon_0=0, resolution='l')
# color the background of the map projection region;
# missing values will show up in this color
m.drawmapboundary(fill_color='0.3')
# plot temperature anomalies with pcolormesh
im1 = m.pcolormesh(lons, lats, data, latlon=True, vmin=-10, vmax=10)
# draw parallels and meridians, but don't bother labelling them
m.drawparallels(np.arange(-90., 99., 30.))
m.drawmeridians(np.arange(-180., 180., 60.))
m.drawcoastlines()
# add a labelled colorbar
cb = m.colorbar(im1)
cb.set_label('Temperature anomaly (deg C)')
plt.title('GISS Temperature Anomalies for Jan 2017')
plt.show()


Zoom in, use a different map projection.

In [20]:
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt

# reuse lons, lats, and data from the previous cell

# create new figure
fig = plt.figure()
# set up a Mercator map projection over North America
m = Basemap(llcrnrlon=-130., llcrnrlat=20., urcrnrlon=-55., urcrnrlat=60.,
            rsphere=(6378137.00, 6356752.3142),
            resolution='l', projection='merc',
            lat_0=40., lon_0=-20., lat_ts=20.)
im1 = m.pcolormesh(lons, lats, data, latlon=True, vmin=-10, vmax=10)
m.drawcoastlines()
m.drawcountries(color='k', linewidth=2, linestyle='--')
# draw parallels
m.drawparallels(np.arange(10, 90, 20), labels=[1, 1, 0, 1])
# draw meridians
m.drawmeridians(np.arange(-180, 180, 30), labels=[1, 1, 0, 1])
plt.show()