Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In
| Download

This repository contains the course materials from Math 157: Intro to Mathematical Software.

Creative Commons BY-SA 4.0 license.

Views: 3037
License: OTHER
Kernel: SageMath 8.1

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

Homework 7: due March 2, 2018

Please enter all answers within this notebook unless otherwise specified. As usual, don't forget to cite sources and collaborators.

Through this problem set, use the SageMath 8.1 kernel except as specified. You may find the following declarations useful:

preparser(False) # Turn off Sage preparser import pandas as pd # Load pandas import numpy as np # Load numpy import matplotlib # Load matplotlib import matplotlib.pyplot as plt # Load pyplot matplotlib.style.use('ggplot') # Use R-style plotting in matplotlib import seaborn as sns # Load seaborn

This homework consists of 5 problems, each of equal value.

preparser(False) # Turn off Sage preparser import pandas as pd # Load pandas import numpy as np # Load numpy import matplotlib # Load matplotlib import matplotlib.pyplot as plt # Load pyplot matplotlib.style.use('ggplot') # Use R-style plotting in matplotlib import seaborn as sns # Load seaborn import statsmodels.api as sm

Problem 1: Emulation of R in Python

Grading criteria: correctness of code.

Demonstrate Python analogues of the following R code blocks from the previous homework. Hints:

  • The Python statsmodels module includes the submodule datasets which simulates the corresponding R package.

  • The R pairs function can be simulated using the pandas function scatter_matrix.

  • The seaborn function FacetGrid allows you to set up a grid in which each entry corresponds to a particular value of a conditioning variable. Use this, and the matplotlib scatter plot functionality, to simulate the R function coplot.

  • The statsmodels module mosaicplot can simulate the R function mosaicplot.

from statsmodels import datasets %load_ext rpy2.ipython
%%R pairs(mtcars)
Image in a Jupyter notebook
mtcars=sm.datasets.get_rdataset('mtcars').data
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/statsmodels/datasets/utils.py:243: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing See the documentation here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated dataset_meta = index.ix[idx]
pd.plotting.scatter_matrix(mtcars)
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7f237e890>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7f237e5d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efdda2d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efd5c390>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efd3c950>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efcbcad0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efd72590>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efba47d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efb29750>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efa99810>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7efa16d90>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef9fec10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef97ce90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef9b4fd0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef874b50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef7f5c10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef772150>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef6f12d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef89aad0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef5d7410>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef5d7d90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef4d1110>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef4501d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef4342d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef3b6390>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef434c50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef293f50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef22f050>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef18d890>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef10da10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef33ea10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef074b50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef039550>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eef61050>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eeee3110>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ef240d90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eedc8150>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eedf5b10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eed23e10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eeca5ed0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eec09f50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eeba7050>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eec36d90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eeb02c10>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eea04cd0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee9fe410>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee980590>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eeab8a90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee8646d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee8280d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee75c490>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee6dc550>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee6bacd0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee5d4310>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee6679d0>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee537510>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee4eaed0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee416f10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee398fd0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee360610>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee28a190>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee338950>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee1ed4d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee16d6d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee0cea90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee04fc10>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ee26fc50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edfb5d50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edf7d750>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edeab9d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ede31910>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edda8310>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edd24990>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eddbe950>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edc0aa50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edbd0450>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7edb03490>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eda873d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eda61d90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed9fa450>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eda10c90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed8e0510>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed8e0e90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed83df10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed7c1e50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed739850>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed6b5ed0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed6e11d0>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed59cf90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed565990>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed4949d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed419910>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed3f2310>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed370810>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed3a53d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed26b110>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed1ec050>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed14b6d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed0d0510>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed046150>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ed041850>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecfdcb90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecf3d150>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecebd2d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ece1e790>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecd9e910>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecf5e1d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecd019d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecc493d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecbfb6d0>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecb81610>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ecadbfd0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eca74690>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7eca89e10>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ec959750>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ec91c150>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ec852190>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ec7d50d0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ec7b0a90>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ec6c9150>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa7ec75c850>]], dtype=object)
Image in a Jupyter notebook
%%R require(stats); require(graphics) coplot(circumference ~ age | Tree, data = Orange, show.given = FALSE)
Image in a Jupyter notebook
import rpy2.robjects as robjects from rpy2.robjects import r, pandas2ri pandas2ri.activate()
Orange = pandas2ri.ri2py(r['Orange']) # Convert into a pandas dataframe
g = sns.FacetGrid(Orange, col='Tree') # Make a different graph for each tree g.map(plt.scatter, "age", "circumference") # Put a scatter plot in each graph
<seaborn.axisgrid.FacetGrid object at 0x7fa7e16fb6d0>
Image in a Jupyter notebook
%%R x <- apply(HairEyeColor, c(1, 2), sum) x mosaicplot(x)
Image in a Jupyter notebook
HairEyeColor=sm.datasets.get_rdataset('HairEyeColor').data
HEC2=HairEyeColor.groupby(["Hair", "Eye"]).sum()
from statsmodels.graphics.mosaicplot import mosaic mosaic(HEC2['Freq']) # See here for some discussion on why this is needed https://github.com/statsmodels/statsmodels/issues/2190
(<matplotlib.figure.Figure object at 0x7fa7e1030a90>, OrderedDict([(('Black', 'Blue'), (0.0, 0.0, 0.17973638663293834, 0.18335166850018333)), (('Black', 'Brown'), (0.0, 0.18665199853318665, 0.17973638663293834, 0.62339567290062325)), (('Black', 'Green'), (0.0, 0.8133480014668133, 0.17973638663293834, 0.045837917125045931)), (('Black', 'Hazel'), (0.0, 0.86248624862486256, 0.17973638663293834, 0.13751375137513747)), (('Blond', 'Blue'), (0.18466249500732257, 0.0, 0.21135667687391826, 0.73282918843065414)), (('Blond', 'Brown'), (0.18466249500732257, 0.73612951846365737, 0.21135667687391826, 0.054572386372495528)), (('Blond', 'Green'), (0.18466249500732257, 0.79400223486915622, 0.21135667687391826, 0.1247368831371326)), (('Blond', 'Hazel'), (0.18466249500732257, 0.92203944803929216, 0.21135667687391826, 0.07796055196070785)), (('Brown', 'Blue'), (0.4009452802556251, 0.0, 0.47596857941685516, 0.29079831060029077)), (('Brown', 'Brown'), (0.4009452802556251, 0.29409864063329411, 0.47596857941685516, 0.41196427335041197)), (('Brown', 'Green'), (0.4009452802556251, 0.70936324401670936, 0.47596857941685516, 0.10039465485010035)), (('Brown', 'Hazel'), (0.4009452802556251, 0.81305822889981305, 0.47596857941685516, 0.18694177110018698)), (('Red', 'Blue'), (0.8818399680468645, 0.0, 0.11816003195313542, 0.23706596011713846)), (('Red', 'Brown'), (0.8818399680468645, 0.24036629015014174, 0.11816003195313542, 0.36257146841444704)), (('Red', 'Green'), (0.8818399680468645, 0.60623808859759221, 0.11816003195313542, 0.19523079068470234)), (('Red', 'Hazel'), (0.8818399680468645, 0.80476920931529772, 0.11816003195313542, 0.19523079068470223))]))
Image in a Jupyter notebook

Problem 2: Sunspots revisited

Grading criteria: correctness of code and results.

Let sunspots be the sunactivity dataframe (defined below for you).

from statsmodels import datasets sunspots = datasets.sunspots.load_pandas().data.set_index("YEAR")

2a. For how many years was the activity ≥100\geq 100?

sunspots[sunspots.SUNACTIVITY>100].size
43

2b. Make a histogram plot of all activity from 1900 to the end of the dataset.

sunspots[sunspots.index>=1900].hist()
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7e0e52b90>]], dtype=object)
Image in a Jupyter notebook

2c. Which year(s) had the highest activity?

sunspots[sunspots.SUNACTIVITY>190]
SUNACTIVITY
YEAR
1957.0 190.2

Problem 3: Pivot tables

Grading criteria: correctness of code and explanations.

3a. Load the "mpg" R dataset from the Python ggplot library into the variable mpg.

from ggplot import mpg mpg.head()
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/ggplot/utils.py:81: FutureWarning: pandas.tslib is deprecated and will be removed in a future version. You can access Timestamp as pandas.Timestamp pd.tslib.Timestamp, /ext/sage/sage-8.1/local/lib/python2.7/site-packages/ggplot/stats/smoothers.py:4: FutureWarning: The pandas.lib module is deprecated and will be removed in a future version. These are private functions and can be accessed from pandas._libs.lib instead from pandas.lib import Timestamp
manufacturer model displ year cyl trans drv cty hwy fl class
0 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
1 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
2 audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
3 audi a4 2.0 2008 4 auto(av) f 21 30 p compact
4 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact

3b. Using the pandas pivot_table command, cretae a pandas DataFrame that tells you the average "cty" and "hwy" (city and highway miles per gallon) for each manufacturer.

pivoted=pd.pivot_table(mpg, values=['cty', 'hwy'], index='manufacturer') pivoted
cty hwy
manufacturer
audi 17.611111 26.444444
chevrolet 15.000000 21.894737
dodge 13.135135 17.945946
ford 14.000000 19.360000
honda 24.444444 32.555556
hyundai 18.642857 26.857143
jeep 13.500000 17.625000
land rover 11.500000 16.500000
lincoln 11.333333 17.000000
mercury 13.250000 18.000000
nissan 18.076923 24.615385
pontiac 17.000000 26.400000
subaru 19.285714 25.571429
toyota 18.529412 24.911765
volkswagen 20.925926 29.222222

3c. Has the average city mileage improved from 1999 to 2008? Has the average highway mileage improved from 1999 to 2008?

pivoted2=pd.pivot_table(mpg, values=['cty', 'hwy'], index='year') pivoted2
cty hwy
year
1999 17.017094 23.427350
2008 16.700855 23.452991

3d. Create a scatterplot of pairs (displ, hwy) for all cars in 1999, and another for all cars in 2008.

mpg[mpg.year==1999].plot.scatter('displ', 'hwy') mpg[mpg.year==2008].plot.scatter('displ', 'hwy')
<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7e0b1bed0>
Image in a Jupyter notebookImage in a Jupyter notebook

3e. What effect does increasing displacement have on highway gas mileage?

Greater displacement tends to make highway gas mileage worse.

Problem 4: Irises

Grading criteria: correctness of code and explanations.

The iris dataset is a famous example used in statistics education.

4a. Load the iris dataset into a pandas DataFrame and use the describe command to see some basic statistics.

iris = sns.load_dataset('iris') iris.describe()
sepal_length sepal_width petal_length petal_width
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000 1.199333
std 0.828066 0.435866 1.765298 0.762238
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000

4b. Plot all of the sepal (length, width) pairs in a scatterplot, and the petal (length, width) pairs in another scatterplot.

iris.plot.scatter('sepal_length', 'sepal_width') iris.plot.scatter('petal_length', 'petal_width')
<matplotlib.axes._subplots.AxesSubplot object at 0x7fa7d9003990>
Image in a Jupyter notebookImage in a Jupyter notebook

4c. Compute the average petal width for each of the "species"-categories.

pd.pivot_table(iris, values=['petal_width'], index='species')
petal_width
species
setosa 0.246
versicolor 1.326
virginica 2.026

Problem 5: Machine learning with irises

Grading criteria: correctness and relevance of code.

5a. The Wikipedia article on the iris dataset asserts:

The use of this data set in cluster analysis however is not common, since the data set only contains two clusters with rather obvious separation.

Demonstrate this by performing a clustering computation and showing that it fails to separate the three species.

%%R library(cluster) set.seed(1) isGoodCol <- function(col){ sum(is.na(col)) == 0 && is.numeric(col) } goodCols <- sapply(iris, isGoodCol) clusters <- kmeans(iris[,goodCols], centers=3) labels <- clusters$cluster
%%R iris2d <- prcomp(iris[,goodCols], center=TRUE) twoColumns <- iris2d$x[,1:2] clusplot(twoColumns, labels)
Image in a Jupyter notebook

5b. Use the scikit-learn SVM classifier to classify species. Use a random sample of 80% of the initial data for training and the other 20% for testing, and report the accuracy rate of your predictions.

from sklearn import svm clf = svm.SVC(gamma=0.001, C=100.)
iris_array=iris.as_matrix(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']) # Converting to numpy array np.random.shuffle(iris_array) # Randomizing the rows iris_data=iris_array[:,:4] # Data is the first 4 columns iris_target=iris_array[:, [4]] # Trying to guess the last column
clf.fit(iris_data[:120], iris_target[:120]) # Use the first 80% of the randomized data to train pre = clf.predict(iris_data[120:]) # Use the model to predict the last 20% ans = iris_target[120:]
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/sklearn/utils/validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
len([i for i in range(30) if pre[i] == ans[i]]) # Compare the predictions to the answers, all correct!
30