Code 4.1
Code 4.2 and 4.3
Code 4.4
Code 4.5
Code 4.6
Code 4.7 and 4.8
|   | height | weight | age | male |
|---|---|---|---|---|
| 0 | 151.765 | 47.825606 | 63.0 | 1 |
| 1 | 139.700 | 36.485807 | 63.0 | 0 |
| 2 | 136.525 | 31.864838 | 65.0 | 0 |
| 3 | 156.845 | 53.041915 | 41.0 | 1 |
| 4 | 145.415 | 41.276872 | 51.0 | 0 |
Code 4.9
Code 4.10
Code 4.11
Code 4.12
Code 4.13
Code 4.14
Code 4.15 and 4.16
Code 4.17 and 4.18
Code 4.19
Code 4.20
Code 4.21 and 4.22
Code 4.23
Code 4.24
We are repeating code 4.7, 4.8 and 4.10
Code 4.25
Code 4.26
We could use a quadratic approximation, as McElreath does in his book and as we did in code 2.6. But with PyMC3 it is really simple to just sample from the model using a sampler method. The most common sampler methods are members of the Markov Chain Monte Carlo (MCMC) family (for details read Section 2.4.3 and Chapter 8 of Statistical Rethinking).

PyMC3 comes with various samplers. Some samplers are better suited than others for certain types of variables (and/or problems). For now we are going to let PyMC3 choose the sampler for us. PyMC3 also tries to provide a reasonable starting point for the simulation. By default PyMC3 uses the same adaptive procedure as Stan, 'jitter+adapt_diag', which starts with an identity mass matrix and then adapts a diagonal mass matrix based on the variance of the tuning samples.

You can read more details about PyMC3 here
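To build intuition for what an MCMC sampler does, here is a minimal random-walk Metropolis sampler written from scratch with numpy. This is only an illustration of the MCMC idea on synthetic data, not the adaptive NUTS sampler that PyMC3 actually selects for continuous variables; the data, step size, and starting point are all assumptions made for the sketch.

```python
import numpy as np

# A minimal random-walk Metropolis sampler for the mean of a normal model
# (known sigma, flat prior). Synthetic "heights" stand in for the real data.
rng = np.random.default_rng(42)
data = rng.normal(loc=154.6, scale=7.7, size=100)

def log_post(mu, sigma=7.7):
    # Log-likelihood of the data under Normal(mu, sigma); a flat prior adds
    # only a constant, so it drops out of the acceptance ratio.
    return -0.5 * np.sum((data - mu) ** 2) / sigma ** 2

samples = np.empty(5000)
mu_curr = 150.0  # hand-picked starting point (PyMC3 tunes this for us)
for i in range(samples.size):
    mu_prop = mu_curr + rng.normal(scale=0.5)  # random-walk proposal
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_post(mu_prop) - log_post(mu_curr):
        mu_curr = mu_prop
    samples[i] = mu_curr

print(samples[1000:].mean())  # close to the sample mean of the data
```

Discarding the first part of the chain (the "tuning" or warm-up phase) before summarizing is the same idea behind the `tune` argument of `pm.sample`.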
Code 4.27
Notice that compared to the table in the book we have an extra column, "mc_error". Since we are sampling from the posterior, there is an error introduced by the sampling process. This error can be reduced by taking more samples.
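The claim that more samples reduce the Monte Carlo error can be checked directly: for (roughly independent) draws the error of a posterior mean scales like sd / sqrt(n). The draws below are synthetic, standing in for a real trace.

```python
import numpy as np

# Monte Carlo error of an estimated posterior mean shrinks like sd / sqrt(n):
# each 10x increase in samples cuts the error by about sqrt(10).
rng = np.random.default_rng(0)
draws = rng.normal(loc=154.61, scale=0.41, size=50_000)  # stand-in posterior

errors = [draws[:n].std() / np.sqrt(n) for n in (500, 5_000, 50_000)]
print([round(e, 4) for e in errors])  # strictly decreasing
```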
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 154.61 | 0.41 | 0.01 | 154.03 | 155.31 | 2000.0 | 1.0 |
| sigma | 7.77 | 0.29 | 0.01 | 7.31 | 8.24 | 2000.0 | 1.0 |
Code 4.28
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 154.61 | 0.41 | 0.01 | 153.97 | 155.28 | 1837.0 | 1.0 |
| sigma | 7.75 | 0.30 | 0.01 | 7.24 | 8.19 | 1634.0 | 1.0 |
Code 4.29
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 177.86 | 0.10 | 0.00 | 177.70 | 178.02 | 1906.0 | 1.0 |
| sigma | 24.58 | 0.96 | 0.03 | 23.17 | 26.16 | 1230.0 | 1.0 |
Code 4.30
For some computations it can be convenient to have the trace turned into a DataFrame; this can be done using the "trace_to_dataframe" function
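Once the trace is a DataFrame, the variance-covariance and correlation matrices shown below are each one method call away. The samples here are synthetic stand-ins for the real trace; only the pandas calls are the point.

```python
import numpy as np
import pandas as pd

# A DataFrame of posterior samples (one column per parameter) makes
# covariance and correlation summaries trivial.
rng = np.random.default_rng(1)
trace_df = pd.DataFrame({
    "mu": rng.normal(154.6, 0.41, size=2000),
    "sigma": rng.normal(7.77, 0.29, size=2000),
})
print(trace_df.cov())   # variance-covariance matrix (code 4.30)
print(trace_df.corr())  # correlation matrix (code 4.31)
```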
|   | mu | sigma |
|---|---|---|
| mu | 0.171531 | -0.006672 |
| sigma | -0.006672 | 0.087821 |
Code 4.31
|   | mu | sigma |
|---|---|---|
| mu | 1.000000 | -0.054363 |
| sigma | -0.054363 | 1.000000 |
Code 4.32
We did not use the quadratic approximation; instead we used an MCMC method to sample from the posterior. Thus, we already have samples. We can do something like
|   | mu | sigma |
|---|---|---|
| 0 | 154.983655 | 7.448307 |
| 1 | 154.463080 | 8.185044 |
| 2 | 154.659249 | 7.354942 |
| 3 | 154.659249 | 7.354942 |
| 4 | 153.987518 | 8.050903 |
Or directly from the trace (we are getting the first ten samples of sigma)
Code 4.33
In our case, this is the same as what we did in code 4.27
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 154.61 | 0.41 | 0.01 | 153.97 | 155.28 | 1837.0 | 1.0 |
| sigma | 7.75 | 0.30 | 0.01 | 7.24 | 8.19 | 1634.0 | 1.0 |
Code 4.34
Code 4.35 and 4.36
Instead of sampling from a normal and then exponentiating to ensure sigma is positive, we can use the lognormal distribution for the same result. The lognormal distribution is parametrized in terms of the precision (tau), not the standard deviation, where tau = 1/sigma².

The normal distribution can also be parametrized in terms of the precision (tau). As long as the conversion between the two parametrizations is done correctly, which one to use is only a matter of convenience.
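Both facts (the precision conversion, and that exponentiating a normal gives a lognormal) can be verified numerically with scipy. The values of mu and sigma below are arbitrary.

```python
import numpy as np
from scipy import stats

# If X ~ Normal(mu, sigma), then exp(X) ~ Lognormal, and the precision
# parametrization is just tau = 1 / sigma**2.
mu, sigma = 0.5, 1.3
tau = 1.0 / sigma ** 2                 # precision <-> standard deviation
assert np.isclose(sigma, 1.0 / np.sqrt(tau))

# pdf of exp(X) by the change-of-variables formula...
y = np.linspace(0.1, 10, 50)
manual = stats.norm.pdf(np.log(y), mu, sigma) / y
# ...matches scipy's lognormal, with s = sigma and scale = exp(mu)
scipy_pdf = stats.lognorm.pdf(y, s=sigma, scale=np.exp(mu))
assert np.allclose(manual, scipy_pdf)
```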
Code 4.37
Code 4.38 and 4.39
Notice that the variable mu is defined as alpha + beta * d2.weight in a single line. If we want the trace to contain mu, we can write it as a deterministic variable. The computation will be exactly the same; the only difference is that mu will be accessible in the trace.

Another alternative is to write mu inside the likelihood instead of on a separate line.

Using PyMC3 there is not much reason to do this. I personally think that defining mu on a separate line improves readability.
Code 4.40
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| alpha | 113.79 | 1.85 | 0.06 | 110.86 | 116.74 | 811.0 | 1.0 |
| beta | 0.91 | 0.04 | 0.00 | 0.84 | 0.97 | 823.0 | 1.0 |
| sigma | 5.11 | 0.20 | 0.01 | 4.81 | 5.43 | 1065.0 | 1.0 |
Code 4.41
|   | alpha | beta | sigma |
|---|---|---|---|
| alpha | 1.00 | -0.99 | -0.02 |
| beta | -0.99 | 1.00 | 0.02 |
| sigma | -0.02 | 0.02 | 1.00 |
Code 4.42
Code 4.43
Code 4.44
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| alpha | 154.59 | 0.28 | 0.01 | 154.15 | 155.03 | 2000.0 | 1.0 |
| beta | 0.90 | 0.04 | 0.00 | 0.84 | 0.97 | 2000.0 | 1.0 |
| sigma | 5.11 | 0.19 | 0.00 | 4.80 | 5.40 | 2000.0 | 1.0 |
Code 4.45
Instead of using the MAP, we are going to use the mean of the posterior
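Using the posterior mean amounts to plugging the means of the alpha and beta samples into the linear model. The sketch below uses synthetic samples (with means matching the summary table above, and centered weights) in place of the real trace.

```python
import numpy as np

# The "average" regression line: evaluate mu at the posterior means of
# alpha and beta. Synthetic samples stand in for the real trace.
rng = np.random.default_rng(2)
alpha_samples = rng.normal(154.59, 0.28, size=2000)
beta_samples = rng.normal(0.90, 0.04, size=2000)

weight_c = np.linspace(-15, 15, 50)  # weights centered on their mean
mu_at_mean = alpha_samples.mean() + beta_samples.mean() * weight_c
print(mu_at_mean[:3])
```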
Code 4.46 and 4.47
|   | alpha | beta | sigma |
|---|---|---|---|
| 0 | 154.472997 | 0.863022 | 4.978642 |
| 1 | 154.498745 | 0.850804 | 4.899789 |
| 2 | 154.938774 | 0.962782 | 5.110805 |
| 3 | 154.280574 | 0.825338 | 5.127054 |
| 4 | 154.361697 | 0.920776 | 5.190605 |
Code 4.48
Code 4.49
Alternatively, we can directly use the deterministic variable mu
Code 4.50 and 4.51
Code 4.52
Code 4.53
Using PyMC3, we do not need to compute anything else. By defining a deterministic variable mu in the model, we add that variable to the trace. Thus we get a matrix whose rows are samples from the posterior and whose columns are the values of the weights. We can access this matrix directly from the trace or turn it into a DataFrame; it all depends on what we need.
|   | mu__0 | mu__1 | mu__2 | mu__3 | mu__4 | mu__5 | mu__6 | mu__7 | mu__8 | mu__9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 155.209354 | 144.034259 | 139.480408 | 160.349898 | 148.755737 | 170.156044 | 145.766399 | 162.752544 | 142.441808 | 161.774723 |
| 1 | 154.334418 | 143.469248 | 139.041692 | 159.332396 | 148.059782 | 168.866582 | 145.153350 | 161.668407 | 141.920962 | 160.717705 |
| 2 | 154.844252 | 142.188852 | 137.031776 | 160.665736 | 147.535758 | 171.770849 | 144.150439 | 163.386647 | 140.385457 | 162.279299 |
| 3 | 155.023345 | 140.512205 | 134.598915 | 161.698470 | 146.643162 | 174.431996 | 142.761432 | 164.818365 | 138.444367 | 163.548640 |
| 4 | 153.442479 | 139.729380 | 134.141293 | 159.750504 | 145.523164 | 171.783748 | 141.854911 | 162.698820 | 137.775264 | 161.498924 |
Code 4.54 and 4.58
We are doing manually what in the book is done using the `link` function. In the book, code 4.58 performs these operations manually.
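The manual equivalent of `link` is a single broadcast: each posterior sample of (alpha, beta) is evaluated over a grid of weights, giving one row of mu values per sample and one column per weight. The samples and the weight grid below are assumptions made for illustration.

```python
import numpy as np

# link, done by hand: broadcast posterior samples over a grid of weights.
rng = np.random.default_rng(3)
alpha = rng.normal(113.79, 1.85, size=1000)  # synthetic posterior samples
beta = rng.normal(0.90, 0.04, size=1000)
weight_grid = np.arange(25, 71)              # hypothetical grid of weights

# Result: rows = posterior samples, columns = weight values
mu_pred = alpha[:, None] + beta[:, None] * weight_grid
print(mu_pred.shape)  # (1000, 46)
```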
Code 4.55
Code 4.56
Code 4.57
Code 4.59
Now we are going to use sample_ppc()
from PyMC3. This function gives us posterior predictive samples, that is, for each value of the input variable we get a sample (from the posterior) of the output variable. Thus, in the following example the shape of height_pred['height_hat'] is (200, 352)
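What sample_ppc does can be sketched manually: for every posterior sample, draw heights from Normal(mu, sigma) at each input weight, so the result has one row per posterior sample and one column per input. The samples and weights below are synthetic placeholders.

```python
import numpy as np

# Manual posterior predictive draws: shape (n_posterior_samples, n_inputs).
rng = np.random.default_rng(4)
n_samples = 200
weights = rng.uniform(32, 63, size=352)      # stand-in for the observed weights
alpha = rng.normal(113.79, 1.85, size=n_samples)
beta = rng.normal(0.90, 0.04, size=n_samples)
sigma = rng.normal(5.11, 0.20, size=n_samples)

mu = alpha[:, None] + beta[:, None] * weights[None, :]
height_pred = rng.normal(mu, sigma[:, None])  # add the observation noise
print(height_pred.shape)  # (200, 352)
```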
Code 4.60
Code 4.61
sample_ppc
returns values corresponding to the input values (weights in this example). Because the weights are not ordered, using them with the fill_between
function would produce a mess. For that reason, in the following cell we sort the weights and the predicted heights
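The sorting step amounts to one `argsort` applied to both arrays, so the weight/prediction pairs stay matched:

```python
import numpy as np

# fill_between connects points in the order given, so sort by x first.
rng = np.random.default_rng(5)
weights = rng.uniform(32, 63, size=10)   # unordered inputs
preds = 113.79 + 0.90 * weights          # matching predictions

order = np.argsort(weights)              # one ordering, applied to both
weights_ord, preds_ord = weights[order], preds[order]
print(np.all(np.diff(weights_ord) > 0))  # True: now monotonically increasing
```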
Code 4.62
Try changing the number of samples used in 4.59 (200) to other values. Because we are getting samples at the input values, the jaggedness of this plot is larger than in the book's figure.
Code 4.63
Now we are going to generate heights from the posterior manually. Instead of restricting ourselves to the input values, we are going to pass an array of equally spaced weight values, weight_seg.
Code 4.64
We have already loaded this dataset, check code 4.7 and 4.8.
|   | height | weight | age | male |
|---|---|---|---|---|
| 0 | 151.765 | 47.825606 | 63.0 | 1 |
| 1 | 139.700 | 36.485807 | 63.0 | 0 |
| 2 | 136.525 | 31.864838 | 65.0 | 0 |
| 3 | 156.845 | 53.041915 | 41.0 | 1 |
| 4 | 145.415 | 41.276872 | 51.0 | 0 |
Code 4.65
Code 4.66
Code 4.67
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| alpha | 146.66 | 0.37 | 0.01 | 146.05 | 147.24 | 1428.0 | 1.0 |
| beta__0 | 21.40 | 0.30 | 0.01 | 20.91 | 21.84 | 1497.0 | 1.0 |
| beta__1 | -8.42 | 0.28 | 0.01 | -8.89 | -8.01 | 1291.0 | 1.0 |
| sigma | 5.78 | 0.17 | 0.00 | 5.48 | 6.02 | 1501.0 | 1.0 |
Code 4.68
Code 4.69
Code 4.70
We will stack the weights to get a 2D array; this simplifies writing the model. Now we can compute the dot product between beta and the 2D array
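The stacking and the dot product can be sketched with plain numpy: the standardized weight and its square become the two rows of a 2D array, and a length-2 beta vector multiplies it in one step. The weights are synthetic; the coefficient values echo the polynomial-model summary above.

```python
import numpy as np

# Stack weight and weight**2 into one 2D array, then write the polynomial
# model as a single dot product with a length-2 beta vector.
rng = np.random.default_rng(6)
weight_s = rng.normal(size=100)          # standardized weights (synthetic)
X = np.vstack((weight_s, weight_s ** 2)) # shape (2, 100)

beta = np.array([21.40, -8.42])          # e.g. the posterior means above
mu = 146.66 + np.dot(beta, X)            # alpha + beta . X, one mu per weight
print(X.shape, mu.shape)  # (2, 100) (100,)
```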