Code 4.1
Code 4.2 and 4.3
Code 4.4
Code 4.5
Code 4.6
Code 4.7 and 4.8
|   | height | weight | age | male |
|---|---|---|---|---|
| 0 | 151.765 | 47.825606 | 63.0 | 1 |
| 1 | 139.700 | 36.485807 | 63.0 | 0 |
| 2 | 136.525 | 31.864838 | 65.0 | 0 |
| 3 | 156.845 | 53.041915 | 41.0 | 1 |
| 4 | 145.415 | 41.276872 | 51.0 | 0 |
Code 4.9
Code 4.10
Code 4.11
Code 4.12
Code 4.13
Code 4.14
Code 4.15 and 4.16
Code 4.17 and 4.18
Code 4.19
Code 4.20
Code 4.21 and 4.22
Code 4.23
Code 4.24
We are repeating code 4.7, 4.8 and 4.10
Code 4.25
Code 4.26
We could use a quadratic approximation, as McElreath does in his book and as we did in code 2.6. But with PyMC3 it is really simple to just sample from the model using a sampler method. The most common sampler methods are members of the Markov Chain Monte Carlo (MCMC) family (for details read Section 2.4.3 and Chapter 8 of Statistical Rethinking).

PyMC3 comes with various samplers. Some samplers are better suited than others for certain types of variables (and/or problems). For now we are going to let PyMC3 choose the sampler for us. PyMC3 also tries to provide a reasonable starting point for the simulation. By default PyMC3 uses the same adaptive procedure as Stan, 'jitter+adapt_diag', which starts with an identity mass matrix and then adapts a diagonal mass matrix based on the variance of the tuning samples.

You can read more details about PyMC3 here
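To build intuition for what an MCMC sampler does, here is a minimal random-walk Metropolis sampler written from scratch with numpy. This is only an illustration of the MCMC idea on synthetic data, not the adaptive NUTS sampler that PyMC3 actually selects for continuous variables; the data, step size, and starting point are all assumptions made for the sketch.

```python
import numpy as np

# A minimal random-walk Metropolis sampler for the mean of a normal model
# (known sigma, flat prior). Synthetic "heights" stand in for the real data.
rng = np.random.default_rng(42)
data = rng.normal(loc=154.6, scale=7.7, size=100)

def log_post(mu, sigma=7.7):
    # Log-likelihood of the data under Normal(mu, sigma); a flat prior adds
    # only a constant, so it drops out of the acceptance ratio.
    return -0.5 * np.sum((data - mu) ** 2) / sigma ** 2

samples = np.empty(5000)
mu_curr = 150.0  # hand-picked starting point (PyMC3 tunes this for us)
for i in range(samples.size):
    mu_prop = mu_curr + rng.normal(scale=0.5)  # random-walk proposal
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_post(mu_prop) - log_post(mu_curr):
        mu_curr = mu_prop
    samples[i] = mu_curr

print(samples[1000:].mean())  # close to the sample mean of the data
```

Discarding the first part of the chain (the "tuning" or warm-up phase) before summarizing is the same idea behind the `tune` argument of `pm.sample`.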
Code 4.27
Notice that compared to the table in the book we have an extra column, "mc_error". Since we are sampling from the posterior, there is an error introduced by the sampling process. This error can be reduced by taking more samples.
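The claim that more samples reduce the Monte Carlo error can be checked directly: for (roughly independent) draws the error of a posterior mean scales like sd / sqrt(n). The draws below are synthetic, standing in for a real trace.

```python
import numpy as np

# Monte Carlo error of an estimated posterior mean shrinks like sd / sqrt(n):
# each 10x increase in samples cuts the error by about sqrt(10).
rng = np.random.default_rng(0)
draws = rng.normal(loc=154.61, scale=0.41, size=50_000)  # stand-in posterior

errors = [draws[:n].std() / np.sqrt(n) for n in (500, 5_000, 50_000)]
print([round(e, 4) for e in errors])  # strictly decreasing
```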
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 154.61 | 0.41 | 0.01 | 154.03 | 155.31 | 2000.0 | 1.0 |
| sigma | 7.77 | 0.29 | 0.01 | 7.31 | 8.24 | 2000.0 | 1.0 |
Code 4.28
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 154.61 | 0.41 | 0.01 | 153.97 | 155.28 | 1837.0 | 1.0 |
| sigma | 7.75 | 0.30 | 0.01 | 7.24 | 8.19 | 1634.0 | 1.0 |
Code 4.29
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 177.86 | 0.10 | 0.00 | 177.70 | 178.02 | 1906.0 | 1.0 |
| sigma | 24.58 | 0.96 | 0.03 | 23.17 | 26.16 | 1230.0 | 1.0 |
Code 4.30
For some computations it can be convenient to have the trace turned into a DataFrame; this can be done using the "trace_to_dataframe" function
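Once the trace is a DataFrame, the variance-covariance and correlation matrices shown below are each one method call away. The samples here are synthetic stand-ins for the real trace; only the pandas calls are the point.

```python
import numpy as np
import pandas as pd

# A DataFrame of posterior samples (one column per parameter) makes
# covariance and correlation summaries trivial.
rng = np.random.default_rng(1)
trace_df = pd.DataFrame({
    "mu": rng.normal(154.6, 0.41, size=2000),
    "sigma": rng.normal(7.77, 0.29, size=2000),
})
print(trace_df.cov())   # variance-covariance matrix (code 4.30)
print(trace_df.corr())  # correlation matrix (code 4.31)
```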
|   | mu | sigma |
|---|---|---|
| mu | 0.171531 | -0.006672 |
| sigma | -0.006672 | 0.087821 |
Code 4.31
|   | mu | sigma |
|---|---|---|
| mu | 1.000000 | -0.054363 |
| sigma | -0.054363 | 1.000000 |
Code 4.32
We did not use the quadratic approximation; instead we used an MCMC method to sample from the posterior. Thus, we already have samples. We can do something like
|   | mu | sigma |
|---|---|---|
| 0 | 154.983655 | 7.448307 |
| 1 | 154.463080 | 8.185044 |
| 2 | 154.659249 | 7.354942 |
| 3 | 154.659249 | 7.354942 |
| 4 | 153.987518 | 8.050903 |
Or directly from the trace (we are getting the first ten samples of sigma)
Code 4.33
In our case, this is the same as what we did in code 4.27
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| mu | 154.61 | 0.41 | 0.01 | 153.97 | 155.28 | 1837.0 | 1.0 |
| sigma | 7.75 | 0.30 | 0.01 | 7.24 | 8.19 | 1634.0 | 1.0 |
Code 4.34
Code 4.35 and 4.36
Instead of sampling from a normal and then exponentiating to ensure sigma is positive, we can use the lognormal distribution for the same result. The lognormal distribution is parametrized in terms of the precision (tau), not the standard deviation, where tau = 1/sigma².

The normal distribution can also be parametrized in terms of the precision (tau). As long as the conversion between the two parametrizations is done correctly, which one to use is only a matter of convenience.
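Both facts (the precision conversion, and that exponentiating a normal gives a lognormal) can be verified numerically with scipy. The values of mu and sigma below are arbitrary.

```python
import numpy as np
from scipy import stats

# If X ~ Normal(mu, sigma), then exp(X) ~ Lognormal, and the precision
# parametrization is just tau = 1 / sigma**2.
mu, sigma = 0.5, 1.3
tau = 1.0 / sigma ** 2                 # precision <-> standard deviation
assert np.isclose(sigma, 1.0 / np.sqrt(tau))

# pdf of exp(X) by the change-of-variables formula...
y = np.linspace(0.1, 10, 50)
manual = stats.norm.pdf(np.log(y), mu, sigma) / y
# ...matches scipy's lognormal, with s = sigma and scale = exp(mu)
scipy_pdf = stats.lognorm.pdf(y, s=sigma, scale=np.exp(mu))
assert np.allclose(manual, scipy_pdf)
```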
Code 4.37
Code 4.38 and 4.39
Notice that the variable mu is defined as alpha + beta * d2.weight in a single line. If we want the trace to contain mu, we can write it as a deterministic variable. The computation will be exactly the same; the only difference is that mu will be accessible in the trace.

Another alternative is to write mu inside the likelihood instead of on a separate line.

Using PyMC3 there is not much reason to do this. I personally think that defining mu on a separate line improves readability.
Code 4.40
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| alpha | 113.79 | 1.85 | 0.06 | 110.86 | 116.74 | 811.0 | 1.0 |
| beta | 0.91 | 0.04 | 0.00 | 0.84 | 0.97 | 823.0 | 1.0 |
| sigma | 5.11 | 0.20 | 0.01 | 4.81 | 5.43 | 1065.0 | 1.0 |
Code 4.41
|   | alpha | beta | sigma |
|---|---|---|---|
| alpha | 1.00 | -0.99 | -0.02 |
| beta | -0.99 | 1.00 | 0.02 |
| sigma | -0.02 | 0.02 | 1.00 |
Code 4.42
Code 4.43
Code 4.44
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| alpha | 154.59 | 0.28 | 0.01 | 154.15 | 155.03 | 2000.0 | 1.0 |
| beta | 0.90 | 0.04 | 0.00 | 0.84 | 0.97 | 2000.0 | 1.0 |
| sigma | 5.11 | 0.19 | 0.00 | 4.80 | 5.40 | 2000.0 | 1.0 |
Code 4.45
Instead of using the MAP, we are going to use the mean of the posterior
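Using the posterior mean amounts to plugging the means of the alpha and beta samples into the linear model. The sketch below uses synthetic samples (with means matching the summary table above, and centered weights) in place of the real trace.

```python
import numpy as np

# The "average" regression line: evaluate mu at the posterior means of
# alpha and beta. Synthetic samples stand in for the real trace.
rng = np.random.default_rng(2)
alpha_samples = rng.normal(154.59, 0.28, size=2000)
beta_samples = rng.normal(0.90, 0.04, size=2000)

weight_c = np.linspace(-15, 15, 50)  # weights centered on their mean
mu_at_mean = alpha_samples.mean() + beta_samples.mean() * weight_c
print(mu_at_mean[:3])
```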
Code 4.46 and 4.47
|   | alpha | beta | sigma |
|---|---|---|---|
| 0 | 154.472997 | 0.863022 | 4.978642 |
| 1 | 154.498745 | 0.850804 | 4.899789 |
| 2 | 154.938774 | 0.962782 | 5.110805 |
| 3 | 154.280574 | 0.825338 | 5.127054 |
| 4 | 154.361697 | 0.920776 | 5.190605 |
Code 4.48
Code 4.49
Alternatively, we can directly use the deterministic variable mu
Code 4.50 and 4.51
Code 4.52
Code 4.53
Using PyMC3, we do not need to compute anything else. By defining a deterministic variable mu in the model, we add that variable to the trace. Thus we get a matrix whose rows are samples from the posterior and whose columns are the values of the weights. We can access this matrix directly from the trace or turn it into a DataFrame; it all depends on what we need.
|   | mu__0 | mu__1 | mu__2 | mu__3 | mu__4 | mu__5 | mu__6 | mu__7 | mu__8 | mu__9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 155.209354 | 144.034259 | 139.480408 | 160.349898 | 148.755737 | 170.156044 | 145.766399 | 162.752544 | 142.441808 | 161.774723 |
| 1 | 154.334418 | 143.469248 | 139.041692 | 159.332396 | 148.059782 | 168.866582 | 145.153350 | 161.668407 | 141.920962 | 160.717705 |
| 2 | 154.844252 | 142.188852 | 137.031776 | 160.665736 | 147.535758 | 171.770849 | 144.150439 | 163.386647 | 140.385457 | 162.279299 |
| 3 | 155.023345 | 140.512205 | 134.598915 | 161.698470 | 146.643162 | 174.431996 | 142.761432 | 164.818365 | 138.444367 | 163.548640 |
| 4 | 153.442479 | 139.729380 | 134.141293 | 159.750504 | 145.523164 | 171.783748 | 141.854911 | 162.698820 | 137.775264 | 161.498924 |
Code 4.54 and 4.58
We are doing manually what in the book is done using the `link` function. In the book, code 4.58 performs these operations manually.
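The manual equivalent of `link` is a single broadcast: each posterior sample of (alpha, beta) is evaluated over a grid of weights, giving one row of mu values per sample and one column per weight. The samples and the weight grid below are assumptions made for illustration.

```python
import numpy as np

# link, done by hand: broadcast posterior samples over a grid of weights.
rng = np.random.default_rng(3)
alpha = rng.normal(113.79, 1.85, size=1000)  # synthetic posterior samples
beta = rng.normal(0.90, 0.04, size=1000)
weight_grid = np.arange(25, 71)              # hypothetical grid of weights

# Result: rows = posterior samples, columns = weight values
mu_pred = alpha[:, None] + beta[:, None] * weight_grid
print(mu_pred.shape)  # (1000, 46)
```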
Code 4.55
Code 4.56
Code 4.57
Code 4.59
Now we are going to use sample_ppc()
from PyMC3. This function gives us posterior predictive samples, that is, for each value of the input variable we get a sample (from the posterior) of the output variable. Thus, in the following example the shape of height_pred['height_hat'] is (200, 352)
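What sample_ppc does can be sketched manually: for every posterior sample, draw heights from Normal(mu, sigma) at each input weight, so the result has one row per posterior sample and one column per input. The samples and weights below are synthetic placeholders.

```python
import numpy as np

# Manual posterior predictive draws: shape (n_posterior_samples, n_inputs).
rng = np.random.default_rng(4)
n_samples = 200
weights = rng.uniform(32, 63, size=352)      # stand-in for the observed weights
alpha = rng.normal(113.79, 1.85, size=n_samples)
beta = rng.normal(0.90, 0.04, size=n_samples)
sigma = rng.normal(5.11, 0.20, size=n_samples)

mu = alpha[:, None] + beta[:, None] * weights[None, :]
height_pred = rng.normal(mu, sigma[:, None])  # add the observation noise
print(height_pred.shape)  # (200, 352)
```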
Code 4.60
Code 4.61
sample_ppc
returns values corresponding to the input values (weights in this example). Because the weights are not ordered, using them with the fill_between
function would produce a mess. For that reason, in the following cell we sort the weights and the predicted heights
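The sorting step amounts to one `argsort` applied to both arrays, so the weight/prediction pairs stay matched:

```python
import numpy as np

# fill_between connects points in the order given, so sort by x first.
rng = np.random.default_rng(5)
weights = rng.uniform(32, 63, size=10)   # unordered inputs
preds = 113.79 + 0.90 * weights          # matching predictions

order = np.argsort(weights)              # one ordering, applied to both
weights_ord, preds_ord = weights[order], preds[order]
print(np.all(np.diff(weights_ord) > 0))  # True: now monotonically increasing
```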
Code 4.62
Try changing the number of samples used in 4.59 (200) to other values. Because we are getting samples at the input values, the jaggedness of this plot is larger than in the book's figure.
Code 4.63
Now we are going to generate heights from the posterior manually. Instead of restricting ourselves to the input values, we are going to pass an array of equally spaced weight values, weight_seg.
Code 4.64
We have already loaded this dataset, check code 4.7 and 4.8.
|   | height | weight | age | male |
|---|---|---|---|---|
| 0 | 151.765 | 47.825606 | 63.0 | 1 |
| 1 | 139.700 | 36.485807 | 63.0 | 0 |
| 2 | 136.525 | 31.864838 | 65.0 | 0 |
| 3 | 156.845 | 53.041915 | 41.0 | 1 |
| 4 | 145.415 | 41.276872 | 51.0 | 0 |
Code 4.65
Code 4.66
Code 4.67
|   | mean | sd | mc_error | hpd_5.5 | hpd_94.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| alpha | 146.66 | 0.37 | 0.01 | 146.05 | 147.24 | 1428.0 | 1.0 |
| beta__0 | 21.40 | 0.30 | 0.01 | 20.91 | 21.84 | 1497.0 | 1.0 |
| beta__1 | -8.42 | 0.28 | 0.01 | -8.89 | -8.01 | 1291.0 | 1.0 |
| sigma | 5.78 | 0.17 | 0.00 | 5.48 | 6.02 | 1501.0 | 1.0 |
Code 4.68
Code 4.69
Code 4.70
We will stack the weights to get a 2D array; this simplifies writing the model. Now we can compute the dot product between beta and the 2D array
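The stacking and the dot product can be sketched with plain numpy: the standardized weight and its square become the two rows of a 2D array, and a length-2 beta vector multiplies it in one step. The weights are synthetic; the coefficient values echo the polynomial-model summary above.

```python
import numpy as np

# Stack weight and weight**2 into one 2D array, then write the polynomial
# model as a single dot product with a length-2 beta vector.
rng = np.random.default_rng(6)
weight_s = rng.normal(size=100)          # standardized weights (synthetic)
X = np.vstack((weight_s, weight_s ** 2)) # shape (2, 100)

beta = np.array([21.40, -8.42])          # e.g. the posterior means above
mu = 146.66 + np.dot(beta, X)            # alpha + beta . X, one mu per weight
print(X.shape, mu.shape)  # (2, 100) (100,)
```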