
Section 2: Model checking

In this section, we will look at two techniques that aim to answer:

  1. Are the model and parameters estimated a good fit for the underlying data?

  2. Given two separate models, which is a better fit for the underlying data?


Model Check I: Posterior predictive check

One method of checking model fit is called the posterior predictive check. I find this to be a very intuitive technique. You'll recall in the previous section we estimated the parameter $\mu$ of a Poisson distribution by collecting 200,000 samples from the posterior distribution of $\mu$. Each of these samples was considered to be a credible parameter value.

The posterior predictive check requires one to generate new data from the predicted model. What does that mean? Well, we have estimated 200,000 credible values of $\mu$ for the Poisson distribution. That means we can construct 200,000 Poisson distributions with these values and then randomly sample from these distributions. This is formally represented as:

$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta) \, f(\theta \mid y) \, d\theta$
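
Concretely, the integral is approximated by simulation: for each credible value of $\mu$ drawn from the posterior, generate new data from a Poisson distribution with that value. A minimal NumPy sketch of the idea (mu_samples here is a handful of made-up posterior draws, standing in for the trace we collect below):

import numpy as np

# mu_samples stands in for posterior draws of mu (in practice, taken from the MCMC trace).
mu_samples = np.array([17.8, 18.2, 18.0, 17.5, 18.4])   # toy values for illustration

# One simulated observation per credible value of mu; stacking these draws
# approximates the posterior predictive distribution p(y_tilde | y).
y_tilde = np.random.poisson(lam=mu_samples)
print(y_tilde)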

Conceptually, if the model is a good fit for the underlying data, then data generated from that model should resemble the observed data. PyMC provides a convenient way to sample from the fitted model. You may have noticed a new line in the model specification below:

y_pred = pm.Poisson('y_pred', mu=mu)

This is almost identical to y_est except we do not specify the observed data. PyMC considers this to be a stochastic node (as opposed to an observed node) and, as the MCMC sampler runs, it also draws samples for y_pred from the fitted model.

We then plot y_pred below and compare it to the observed data, y_est.

import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import scipy
import scipy.stats as stats
import statsmodels.api as sm
import theano.tensor as tt
from IPython.display import Image

%matplotlib inline
plt.style.use('bmh')
colors = ['#348ABD', '#A60628', '#7A68A6', '#467821', '#D55E00',
          '#CC79A7', '#56B4E9', '#009E73', '#F0E442', '#0072B2']

messages = pd.read_csv('data/hangout_chat_data.csv')
with pm.Model() as model:
    mu = pm.Uniform('mu', lower=0, upper=100)
    y_est = pm.Poisson('y_est', mu=mu, observed=messages['time_delay_seconds'].values)
    y_pred = pm.Poisson('y_pred', mu=mu)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(200000, step, start=start, progressbar=True)
Applied interval-transform to mu and added transformed mu_interval_ to model. 100%|██████████| 200000/200000 [00:30<00:00, 6546.68it/s]
x_lim = 60
burnin = 50000

y_pred = trace[burnin:].get_values('y_pred')
mu_mean = trace[burnin:].get_values('mu').mean()

fig = plt.figure(figsize=(10,6))
fig.add_subplot(211)
_ = plt.hist(y_pred, range=[0, x_lim], bins=x_lim, histtype='stepfilled', color=colors[1])
_ = plt.xlim(1, x_lim)
_ = plt.ylabel('Frequency')
_ = plt.title('Posterior predictive distribution')

fig.add_subplot(212)
_ = plt.hist(messages['time_delay_seconds'].values, range=[0, x_lim], bins=x_lim, histtype='stepfilled')
_ = plt.xlabel('Response time in seconds')
_ = plt.ylabel('Frequency')
_ = plt.title('Distribution of observed data')

plt.tight_layout()
[Figure: posterior predictive distribution (top) vs. distribution of observed response times (bottom)]

Choosing the right distribution

I'm not particularly happy with the above plot. Ideally, I'd like the posterior predictive distribution to somewhat resemble the distribution of the observed data. Intuitively, if we have correctly estimated the parameters of the model, then we should be able to sample similar data from that model. Clearly this is not the case.

Perhaps the Poisson distribution is not suitable for this data. One alternative option we have is the Negative Binomial distribution. This has very similar characteristics to the Poisson distribution except that it has two parameters ($\mu$ and $\alpha$), which enables it to vary its variance independently of its mean. Recall that the Poisson distribution has one parameter ($\mu$) that represents both its mean and its variance.
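
To make the mean/variance point concrete, here is a small scipy.stats sketch with illustrative values (mu = 18, alpha = 2 are chosen for the example, not estimated from the data), using the same mu/alpha to n/p conversion as the plot below:

import scipy.stats as stats

mu, alpha = 18., 2.

# Poisson: the variance is forced to equal the mean.
print(stats.poisson.mean(mu), stats.poisson.var(mu))       # 18.0 18.0

# Negative Binomial with the same mean: the variance is free to be larger.
n = 1. / alpha * mu
p = n / (n + mu)
print(stats.nbinom.mean(n, p), stats.nbinom.var(n, p))     # 18.0 54.0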

fig = plt.figure(figsize=(10,5))

fig.add_subplot(211)
x_lim = 70
mu = [15, 40]
for i in np.arange(x_lim):
    plt.bar(i, stats.poisson.pmf(i, mu[0]), color=colors[3])
    plt.bar(i, stats.poisson.pmf(i, mu[1]), color=colors[4])

_ = plt.xlim(1, x_lim)
_ = plt.xlabel('Response time in seconds')
_ = plt.ylabel('Probability mass')
_ = plt.title('Poisson distribution')
_ = plt.legend(['$\lambda$ = %s' % mu[0], '$\lambda$ = %s' % mu[1]])

# Scipy takes parameters n & p, not mu & alpha
def get_n(mu, alpha):
    return 1. / alpha * mu

def get_p(mu, alpha):
    return get_n(mu, alpha) / (get_n(mu, alpha) + mu)

fig.add_subplot(212)
a = [2, 4]
for i in np.arange(x_lim):
    plt.bar(i, stats.nbinom.pmf(i, n=get_n(mu[0], a[0]), p=get_p(mu[0], a[0])), color=colors[3])
    plt.bar(i, stats.nbinom.pmf(i, n=get_n(mu[1], a[1]), p=get_p(mu[1], a[1])), color=colors[4])

_ = plt.xlabel('Response time in seconds')
_ = plt.ylabel('Probability mass')
_ = plt.title('Negative Binomial distribution')
_ = plt.legend(['$\\mu = %s, \/ \\beta = %s$' % (mu[0], a[0]),
                '$\\mu = %s, \/ \\beta = %s$' % (mu[1], a[1])])

plt.tight_layout()
[Figure: example Poisson (top) and Negative Binomial (bottom) probability mass functions]

Let's go ahead and estimate the parameters of a Negative Binomial distribution given the same dataset used before. We will use a Uniform prior for $\mu$ and an Exponential prior for $\alpha$. The model can be represented as:

$y_{j} \sim NegBinomial(\mu, \alpha)$

$\alpha \sim Exponential(0.2)$

$\mu \sim Uniform(0, 100)$
Image('graphics/Neg Binomial Dag.png', width=400)
[Figure: DAG of the Negative Binomial model]
with pm.Model() as model:
    alpha = pm.Exponential('alpha', lam=.2)
    mu = pm.Uniform('mu', lower=0, upper=100)

    y_pred = pm.NegativeBinomial('y_pred', mu=mu, alpha=alpha)
    y_est = pm.NegativeBinomial('y_est', mu=mu, alpha=alpha, observed=messages['time_delay_seconds'].values)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(200000, step, start=start, progressbar=True)
Applied log-transform to alpha and added transformed alpha_log_ to model. Applied interval-transform to mu and added transformed mu_interval_ to model. 100%|██████████| 200000/200000 [01:04<00:00, 3102.68it/s]
_ = pm.traceplot(trace[burnin:], varnames=['alpha', 'mu'])
[Figure: trace plots and posterior distributions for alpha and mu]

We see the above model has greater uncertainty around the estimation of the mean response time ($\mu$) for chat messages (these intervals can be read straight off the trace, as sketched after the list):

  • Poisson: 17.5 to 18.5

  • Negative Binomial: 16 to 21
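
These credible intervals can be pulled straight from the posterior samples. A rough sketch (assuming the trace and burnin variables from the Negative Binomial model above, and using simple percentiles rather than an HPD interval):

# Rough 95% credible intervals straight from the posterior samples
# (assumes `trace` and `burnin` from the Negative Binomial model fitted above).
mu_samples = trace[burnin:].get_values('mu')
alpha_samples = trace[burnin:].get_values('alpha')

print('mu:    %.1f to %.1f' % tuple(np.percentile(mu_samples, [2.5, 97.5])))
print('alpha: %.1f to %.1f' % tuple(np.percentile(alpha_samples, [2.5, 97.5])))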

Additionally, the Negative Binomial model estimates an $\alpha$ parameter of 1.2 to 2.2, which further increases the variance beyond the estimated mean $\mu$. Let's have a look at the posterior predictive distribution and see if it more closely resembles the distribution of the observed data.

x_lim = 60
y_pred = trace[burnin:].get_values('y_pred')

fig = plt.figure(figsize=(10,6))
fig.add_subplot(211)
_ = plt.hist(y_pred, range=[0, x_lim], bins=x_lim, histtype='stepfilled', color=colors[1])
_ = plt.xlim(1, x_lim)
_ = plt.ylabel('Frequency')
_ = plt.title('Posterior predictive distribution')

fig.add_subplot(212)
_ = plt.hist(messages['time_delay_seconds'].values, range=[0, x_lim], bins=x_lim, histtype='stepfilled')
_ = plt.xlabel('Response time in seconds')
_ = plt.ylabel('Frequency')
_ = plt.title('Distribution of observed data')

plt.tight_layout()
[Figure: posterior predictive distribution (top) vs. distribution of observed response times (bottom)]

Yes, these two distributions look much more similar to one another. According to the posterior predictive check, this suggests that the Negative Binomial model is a more appropriate fit for the underlying data.

If you find yourself doubting the rigor of this model checking approach, Bayesians have other, more analytical methods.

Model Check II: Bayes Factor

Another model-comparison technique is to compute the Bayes factor. This is an analytical method that aims to compare two models with each other.

The Bayes factor has traditionally been a difficult metric to compute because it requires integrating over the full joint probability distribution. In a low-dimensional space this integration is feasible, but once you begin to model in even modest dimensionality, integrating over the full joint posterior distribution becomes computationally costly and time-consuming.

There is an alternative and analogous technique for calculating the Bayes factor. It involves taking your two models for comparison and combining them into a hierarchical model with a model parameter index ($\tau$). This index will switch between the two models throughout the MCMC process depending on which model it finds more credible. As such, the trace of the model index tells us a lot about the credibility of model M1 over model M2.

Image('graphics/Bayes Factor DAG.png', width=540)
[Figure: DAG of the combined model with model index tau]
with pm.Model() as model:
    # Index to true model
    prior_model_prob = 0.5
    #tau = pm.DiscreteUniform('tau', lower=0, upper=1)
    tau = pm.Bernoulli('tau', prior_model_prob)

    # Poisson parameters
    mu_p = pm.Uniform('mu_p', 0, 60)

    # Negative Binomial parameters
    alpha = pm.Exponential('alpha', lam=0.2)
    mu_nb = pm.Uniform('mu_nb', lower=0, upper=60)

    y_like = pm.DensityDist('y_like',
             lambda value: pm.math.switch(tau,
                 pm.Poisson.dist(mu_p).logp(value),
                 pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
             ),
             observed=messages['time_delay_seconds'].values)

    start = pm.find_MAP()
    step1 = pm.Metropolis([mu_p, alpha, mu_nb])
    step2 = pm.ElemwiseCategorical(vars=[tau], values=[0,1])
    trace = pm.sample(200000, step=[step1, step2], start=start)

_ = pm.traceplot(trace[burnin:], varnames=['tau'])
Applied interval-transform to mu_p and added transformed mu_p_interval_ to model. Applied log-transform to alpha and added transformed alpha_log_ to model. Applied interval-transform to mu_nb and added transformed mu_nb_interval_ to model. 100%|██████████| 200000/200000 [02:34<00:00, 1292.72it/s]
[Figure: trace plot and posterior distribution for the model index tau]

We can calculate the Bayes Factor for the above two models using the below formulation:

$Posterior\ Odds = Bayes\ Factor \times Prior\ Odds$

$\frac{P(M_{1} \ | \ Data)}{P(M_{2} \ | \ Data)} = B.F. \times \frac{P(M_{1})}{P(M_{2})}$

In the above example, we gave both models equal prior probability (0.5), so the prior odds are 1 and the Bayes Factor reduces to the posterior odds (the ratio of the posterior model probabilities estimated from the tau trace). If you find that your MCMC sampler is not traversing between the two models, you can introduce prior probabilities that favour the less-visited model, which helps you get sufficient exposure to both models; those prior odds are then divided back out when computing the Bayes Factor, as shown below.
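
As a worked illustration of that prior-odds correction (with made-up numbers, not output from this notebook): suppose you set the prior probability of the Poisson model to 0.9 so the sampler visits it often enough, and the posterior mean of tau comes out at 0.75. The Bayes Factor then divides those prior odds back out:

# Illustrative numbers only -- not results from the model above.
prior_prob_pois = 0.9         # hypothetical prior P(tau = 1), i.e. prior probability of the Poisson model
posterior_prob_pois = 0.75    # hypothetical posterior mean of tau
posterior_prob_nb = 1 - posterior_prob_pois

# Bayes Factor (Negative Binomial vs Poisson) = posterior odds / prior odds
BF = (posterior_prob_nb / posterior_prob_pois) * (prior_prob_pois / (1 - prior_prob_pois))
print(round(BF, 2))           # 3.0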

# Compute the Bayes factor
prob_pois = trace[burnin:]['tau'].mean()
prob_nb = 1 - prob_pois

BF = (prob_nb / prob_pois) * (prior_model_prob / (1 - prior_model_prob))
print("Bayes Factor: %s" % BF)
Bayes Factor: 1.63638920135

A Bayes Factor > 1 suggests that $M_1$ (Negative Binomial) is more strongly supported by the data than $M_2$ (Poisson). Jeffreys' scale of evidence for Bayes factors interprets a BF of roughly 1.6 as weak evidence for $M_1$ over $M_2$ given the data. Combining the posterior predictive check and the Bayes factor, I conclude that the Negative Binomial is the better model for the given data.

| Bayes Factor | Interpretation |
| --- | --- |
| BF($M_1, M_2$) < 1/10 | Strong evidence for $M_2$ |
| 1/10 < BF($M_1, M_2$) < 1/3 | Moderate evidence for $M_2$ |
| 1/3 < BF($M_1, M_2$) < 1 | Weak evidence for $M_2$ |
| 1 < BF($M_1, M_2$) < 3 | Weak evidence for $M_1$ |
| 3 < BF($M_1, M_2$) < 10 | Moderate evidence for $M_1$ |
| BF($M_1, M_2$) > 10 | Strong evidence for $M_1$ |
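
The cut-offs in the table above are easy to encode as a small helper (a sketch; interpret_bayes_factor is a hypothetical convenience function, not part of PyMC):

def interpret_bayes_factor(bf):
    """Map a Bayes Factor BF(M1, M2) onto the evidence categories from the table above."""
    if bf < 1/10.:
        return 'Strong evidence for M2'
    elif bf < 1/3.:
        return 'Moderate evidence for M2'
    elif bf < 1:
        return 'Weak evidence for M2'
    elif bf < 3:
        return 'Weak evidence for M1'
    elif bf < 10:
        return 'Moderate evidence for M1'
    else:
        return 'Strong evidence for M1'

print(interpret_bayes_factor(1.64))   # Weak evidence for M1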

>> Go to the Next Section

References

  1. Jeffreys' scale of evidence, Humboldt University of Berlin. Link

  2. Model checking and diagnostics, PyMC2 Documentation. Link

  3. Compute Bayes factor using PyMC3, Chris Fonnesbeck. GitHub Issue

  4. Doing Bayesian Data Analysis by John Kruschke

# Apply pretty styles
from IPython.core.display import HTML

def css_styling():
    styles = open("styles/custom.css", "r").read()
    return HTML(styles)

css_styling()