
We consider the eczema medical trial data set again. This time we will compare which of two models explains the observed data best.

  • Model 1: All studies have the same probability of success.

  • Model 2: A hierarchical model where the probability of success in each study is drawn from a beta prior distribution with unknown $\alpha$ and $\beta$ parameters.

Study            Treatment group   Control group
Di Rienzo 2014   20 / 23           9 / 15
Galli 1994       10 / 16           11 / 18
Kaufman 1974     13 / 16           4 / 10
Qin 2014         35 / 45           21 / 39
Sanchez 2012     22 / 31           12 / 29
Silny 2006       7 / 10            0 / 10
Totals           107 / 141         57 / 121

(Each entry is successes / total patients in that group.)

Model 1:

  • For each group (treatment and control), all 6 studies have the same fixed, but unknown, probability of success, $\theta_t, \theta_c \in [0,1]$.

  • The data follow a binomial distribution in each study, conditioned on the probability of success ($\theta_t$ for treatment or $\theta_c$ for control).

  • The priors over $\theta_t$ and $\theta_c$ are uniform.

These assumptions lead to the following model.

  • Likelihood: $\prod_{i=1}^6 \text{Binomial}(s_i\,|\,\theta, n_i)$, where $s_i$ is the number of successful recoveries, $f_i$ is the number of failures (did not recover), and $n_i = s_i + f_i$ is the number of patients.

  • Prior: $\text{Beta}(\theta\,|\,1,1)$ for both $\theta_t$ and $\theta_c$.

  • Posterior for treatment group: $\text{Beta}(\theta_t\,|\,108, 35)$.

  • Posterior for control group: $\text{Beta}(\theta_c\,|\,58, 65)$.
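
These posterior parameters follow from the conjugate Beta-Binomial update, adding the total successes to $\alpha_0$ and the total failures to $\beta_0$. A quick check in R using the totals from the table:

# Posterior parameters from the uniform prior and the column totals.
c(1 + 107, 1 + (141 - 107))  # treatment: 108, 35
c(1 + 57, 1 + (121 - 57))    # control: 58, 65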

Since we have closed-form solutions for the posteriors, we can calculate the marginal likelihood by rearranging Bayes' equation: (marginal likelihood) = (likelihood) × (prior) / (posterior).

$$P(\text{data}) = \left[\prod_{i=1}^6 \text{Binomial}(s_i\,|\,\theta, n_i) \right] \text{Beta}(\theta\,|\,\alpha_0,\beta_0) \,/\, \text{Beta}(\theta\,|\,\alpha_1,\beta_1)$$

where $\alpha_0 = 1$ and $\beta_0 = 1$ are the parameters of the prior, and $\alpha_1$ and $\beta_1$ are the parameters of the posterior beta distribution.

Since all factors involving $\theta$ cancel out, we are just left with the normalization constants of the likelihood, the prior, and the posterior:

$$\begin{aligned}
P(\text{data}) &= \left[ \prod_{i=1}^6 \binom{s_i+f_i}{s_i} \right] \frac{\text{B}(\alpha_1,\beta_1)}{\text{B}(\alpha_0,\beta_0)} \\
&= \left[\prod_{i=1}^6 \frac{1}{(s_i+f_i+1)\,\text{B}(s_i+1,f_i+1)}\right] \frac{\text{B}(\alpha_1,\beta_1)}{\text{B}(\alpha_0,\beta_0)}
\end{aligned}$$

We usually compute the log of the marginal likelihood since the results can vary over many orders of magnitude.
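
As a quick sanity check of the closed form, we can compare it against direct numerical integration of the likelihood for a single study. A minimal sketch, using the Di Rienzo 2014 treatment counts (s = 20, f = 3), where both quantities should equal 1/(n+1):

# Sanity check: marginal likelihood of one study under the uniform Beta(1, 1) prior.
s <- 20; f <- 3; n <- s + f
direct <- integrate(function(theta) dbinom(s, n, theta), 0, 1)$value
closed <- exp(-log(s + f + 1) - lbeta(s + 1, f + 1)
              + lbeta(1 + s, 1 + f) - lbeta(1, 1))
c(direct = direct, closed = closed)  # both 1/(n + 1) = 1/24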

Task 1:

  • Take the log of the marginal likelihood above.

  • Complete the R function below to calculate the log marginal likelihood.

  • You can use the built-in function lbeta(a, b) to compute $\log \text{B}(a,b)$.

log_beta_binomial_marginal_likelihood <- function(alpha0, beta0, data) {
    # Compute the log marginal likelihood of a beta-binomial model.
    # alpha0, beta0: prior beta distribution parameters.
    # data: 2 x n matrix, where n is the number of samples from the
    #       binomial distribution. The first row contains the success
    #       counts and the second row the failure counts of the samples.
    alpha1 = alpha0 + sum(data[1,])  # Posterior parameters.
    beta1 = beta0 + sum(data[2,])
    log_evidence = (
        lbeta(alpha1, beta1) - lbeta(alpha0, beta0)
        - sum(lbeta(data[1,] + 1, data[2,] + 1))
        - sum(log(data[1,] + data[2,] + 1)))
    return(log_evidence)
}

# Vy's version: the same computation written as an explicit loop over the studies.
vys_log_beta_binomial_marginal_likelihood <- function(alpha0, beta0, data) {
    alpha1 <- alpha0
    beta1 <- beta0
    total <- 0
    for (i in 1:ncol(data)) {
        si <- data[1, i]
        fi <- data[2, i]
        total <- total - log(si + fi + 1) - lbeta(si + 1, fi + 1)
        alpha1 <- alpha1 + si
        beta1 <- beta1 + fi
    }
    total <- total + lbeta(alpha1, beta1) - lbeta(alpha0, beta0)
    return(total)
}
# Log marginal likelihood of Model 1 for the eczema data set.
# The table lists successes / patients, so failures = patients - successes.
si_treat <- c(20, 10, 13, 35, 22, 7)
ni_treat <- c(23, 16, 16, 45, 31, 10)
si_control <- c(9, 11, 4, 21, 12, 0)
ni_control <- c(15, 18, 10, 39, 29, 10)

data_treat <- rbind(si_treat, ni_treat - si_treat)
data_control <- rbind(si_control, ni_control - si_control)

# alpha0, beta0: parameters of the uniform Beta(1, 1) prior.
alpha0 <- 1
beta0 <- 1

log_beta_binomial_marginal_likelihood(alpha0, beta0, data_treat)
log_beta_binomial_marginal_likelihood(alpha0, beta0, data_control)
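
Once the corresponding Model 2 evidence is available (Task 2 below), the two models can be compared on the same group via the log Bayes factor. A minimal sketch, with hypothetical argument names for the two log marginal likelihoods:

log_bayes_factor <- function(log_ev_model2, log_ev_model1) {
    # Positive values favor Model 2; exp() of the result is the Bayes factor.
    log_ev_model2 - log_ev_model1
}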

Model 2:

  • For each group (treatment and control), each of the 6 studies has a different probability of success.

  • Each probability of success is drawn from a beta prior with unknown parameters $\alpha$ and $\beta$.

  • Since $\alpha$ and $\beta$ are unknown, we put a broad hyperprior on them; we choose the Gamma(2, 0.5) distribution, which is plotted below.

curve(dgamma(x, 2, 0.5), from=0, to=15)
[Plot: density of the Gamma(2, 0.5) hyperprior on (0, 15)]

These assumptions lead to the following model:

  • Likelihood: $\prod_{i=1}^6 \text{Binomial}(s_i\,|\,\theta_i, n_i)$, where $s_i$ is the number of successful recoveries, $f_i$ is the number of failures (did not recover), and $n_i = s_i + f_i$ is the number of patients. Note that each study has its own $\theta_i$, whereas Model 1 had the same $\theta$ for all 6 studies.

  • Prior: $\prod_{i=1}^6 \text{Beta}(\theta_i\,|\,\alpha,\beta)$.

  • Hyperprior: $P(\alpha,\beta) = \text{Gamma}(\alpha\,|\,2,0.5)\,\text{Gamma}(\beta\,|\,2,0.5)$.

This model has 8 parameters (for each of the treatment and control groups), namely $\theta_1, \ldots, \theta_6$, $\alpha$, and $\beta$.

Since the posterior does not have a closed-form analytical solution, we have to calculate the marginal likelihood by integrating out all of the parameters in the model.

$$P(\text{data}) = \int_0^{\infty} \int_0^{\infty} \int_0^1\cdots\int_0^1 \left[\prod_{i=1}^6 \text{Binomial}(s_i\,|\,\theta_i,n_i)\,\text{Beta}(\theta_i\,|\,\alpha,\beta) \right] P(\alpha,\beta)\ \text{d}\theta_6\cdots\text{d}\theta_1\,\text{d}\beta\,\text{d}\alpha$$

This looks like a crazy 8-dimensional integral, but we can actually integrate out the $\theta_i$ analytically, leaving a 2-dimensional integral over $\alpha$ and $\beta$.

First, note that $P(\alpha,\beta)$ does not contain $\theta_i$, so we can move it outside of the $\theta_i$ integrals:

$$= \int_0^{\infty} \int_0^{\infty} P(\alpha,\beta) \int_0^1\cdots\int_0^1 \prod_{i=1}^6 \text{Binomial}(s_i\,|\,\theta_i,n_i)\,\text{Beta}(\theta_i\,|\,\alpha,\beta)\ \text{d}\theta_6\cdots\text{d}\theta_1\,\text{d}\beta\,\text{d}\alpha$$

Next, since there are no factors containing two different $\theta_i$ variables, we can rearrange the integrals and the product like this:

$$= \int_0^{\infty} \int_0^{\infty} P(\alpha,\beta) \prod_{i=1}^6 \int_0^1 \text{Binomial}(s_i\,|\,\theta_i,n_i)\,\text{Beta}(\theta_i\,|\,\alpha,\beta)\ \text{d}\theta_i\,\text{d}\beta\,\text{d}\alpha$$

Note that we cannot always swap products and integrals.
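
To see why the swap is valid here, consider the two-study case: when the integrand factorizes, the iterated integral splits into a product of one-dimensional integrals.

$$\int_0^1\int_0^1 g_1(\theta_1)\,g_2(\theta_2)\ \text{d}\theta_1\,\text{d}\theta_2 = \left(\int_0^1 g_1(\theta_1)\ \text{d}\theta_1\right)\left(\int_0^1 g_2(\theta_2)\ \text{d}\theta_2\right)$$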

Since the beta distribution is conjugate to the binomial, the inner integrals over the $\theta_i$ can be evaluated analytically (much like we did for Model 1), to get

$$= \int_0^{\infty}\int_0^{\infty} P(\alpha,\beta) \left[\prod_{i=1}^6 \frac{1}{(s_i+f_i+1)\,\text{B}(s_i+1,f_i+1)}\,\frac{\text{B}(\alpha+s_i, \beta+f_i)}{\text{B}(\alpha,\beta)}\right]\,\text{d}\beta\,\text{d}\alpha$$

Finally, move all the factors that do not depend on $\alpha$ or $\beta$ out of the integrals:

$$= \left[\prod_{i=1}^6 (s_i+f_i+1)\,\text{B}(s_i+1,f_i+1) \right]^{-1} \int_0^{\infty}\int_0^{\infty} P(\alpha,\beta)\, \text{B}(\alpha,\beta)^{-6} \prod_{i=1}^6 \text{B}(\alpha+s_i, \beta+f_i)\ \text{d}\beta\,\text{d}\alpha$$

Unfortunately we cannot evaluate the remaining integrals analytically, so we resort to a numerical calculation.

Task 2:

  • Read up about the adaptIntegrate(f, lowerLimit, upperLimit) function in the cubature package — see the function documentation below.

  • How would you define a function $f$ so that adaptIntegrate(f, c(0, 0), c(Inf, Inf)) evaluates the 2-dimensional integral over $\alpha$ and $\beta$ above?
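
One possible answer, as a sketch rather than a definitive solution: define the integrand over x = c(alpha, beta), work in log space for numerical stability, and pass the success and failure counts through adaptIntegrate's ... argument. The names model2_integrand, s, and f are our own; the constant factor in front of the integral must still be added on the log scale afterwards.

# Sketch of the Model 2 integrand, evaluated at x = c(alpha, beta).
# s, f: vectors of success and failure counts for the 6 studies.
model2_integrand <- function(x, s, f) {
    alpha <- x[1]
    beta <- x[2]
    log_val <- (dgamma(alpha, 2, 0.5, log=TRUE) + dgamma(beta, 2, 0.5, log=TRUE)
                - length(s) * lbeta(alpha, beta)
                + sum(lbeta(alpha + s, beta + f)))
    exp(log_val)
}

# Example call for the treatment group (failures = patients - successes):
# adaptIntegrate(model2_integrand, c(0, 0), c(Inf, Inf),
#                s=c(20,10,13,35,22,7), f=c(3,6,3,10,9,3))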

library(cubature)
?adaptIntegrate
adaptIntegrate {cubature}    R Documentation

Adaptive multivariate integration over hypercubes

Description

The function performs adaptive multidimensional integration (cubature) of (possibly) vector-valued integrands over hypercubes.

Usage


adaptIntegrate(f, lowerLimit, upperLimit, ..., tol = 1e-05, fDim = 1,
               maxEval = 0, absError=0, doChecking=FALSE)

Arguments

f

The function (integrand) to be integrated

lowerLimit

The lower limit of integration, a vector for hypercubes

upperLimit

The upper limit of integration, a vector for hypercubes

...

All other arguments passed to the function f

tol

The maximum tolerance, default 1e-5.

fDim

The dimension of the integrand, default 1, bears no relation to the dimension of the hypercube

maxEval

The maximum number of function evaluations needed, default 0 implying no limit

absError

The maximum absolute error tolerated

doChecking

A flag to be a bit anal about checking inputs to C routines. A FALSE value results in approximately 9 percent speed gain in our experiments. Your mileage will of course vary. Default value is FALSE.

Details

The function merely calls Johnson's C code and returns the results. The original C code by Johnson was modified for use with R memory allocation functions and a helper function does the callback.

One can specify a maximum number of function evaluations (default is 0 for no limit). Otherwise, the integration stops when the estimated error is less than the absolute error requested, or when the estimated error is less than tol times the integral, in absolute value.

Value

The returned value is a list of three items:

integral

the value of the integral

error

the estimated relative error

functionEvaluations

the number of times the function was evaluated

returnCode

the actual integer return code of the C routine

Author(s)

Balasubramanian Narasimhan

References

See http://ab-initio.mit.edu/wiki/index.php/Cubature.

Examples


## Test function 0
## Compare with original cubature result of
## ./cubature_test 2 1e-4 0 0
## 2-dim integral, tolerance = 0.0001
## integrand 0: integral = 0.708073, est err = 1.70943e-05, true err = 7.69005e-09
## #evals = 17

testFn0 <- function(x) {
  prod(cos(x))
}

adaptIntegrate(testFn0, rep(0,2), rep(1,2), tol=1e-4)

M_2_SQRTPI <- 2/sqrt(pi)

## Test function 1
## Compare with original cubature result of
## ./cubature_test 3 1e-4 1 0
## 3-dim integral, tolerance = 0.0001
## integrand 1: integral = 1.00001, est err = 9.67798e-05, true err = 9.76919e-06
## #evals = 5115

testFn1 <- function(x) {
  scale = 1.0
  val = 0
  dim = length(x)
  val = sum (((1-x) / x)^2)
  scale = prod(M_2_SQRTPI/x^2)
  exp(-val) * scale
}

adaptIntegrate(testFn1, rep(0, 3), rep(1, 3), tol=1e-4)

##
## Test function 2
## Compare with original cubature result of
## ./cubature_test 2 1e-4 2 0
## 2-dim integral, tolerance = 0.0001
## integrand 2: integral = 0.19728, est err = 1.97261e-05, true err = 4.58316e-05
## #evals = 166141

testFn2 <- function(x) {
  ## discontinuous objective: volume of hypersphere
  radius = as.double(0.50124145262344534123412)
  ifelse(sum(x*x) < radius*radius, 1, 0)
}

adaptIntegrate(testFn2, rep(0, 2), rep(1, 2), tol=1e-4)

##
## Test function 3
## Compare with original cubature result of
## ./cubature_test 3 1e-4 3 0
## 3-dim integral, tolerance = 0.0001
## integrand 3: integral = 1, est err = 0, true err = 2.22045e-16
## #evals = 33

testFn3 <- function(x) {
  prod(2*x)
}

adaptIntegrate(testFn3, rep(0,3), rep(1,3), tol=1e-4)

##
## Test function 4 (Gaussian centered at 1/2)
## Compare with original cubature result of
## ./cubature_test 2 1e-4 4 0
## 2-dim integral, tolerance = 0.0001
## integrand 4: integral = 1, est err = 9.84399e-05, true err = 2.78894e-06
## #evals = 1853

testFn4 <- function(x) {
  a = 0.1
  s = sum((x-0.5)^2)
  (M_2_SQRTPI / (2. * a))^length(x) * exp (-s / (a * a))
}

adaptIntegrate(testFn4, rep(0,2), rep(1,2), tol=1e-4)


##
## Test function 5 (double Gaussian)
## Compare with original cubature result of
## ./cubature_test 3 1e-4 5 0
## 3-dim integral, tolerance = 0.0001
## integrand 5: integral = 0.999994, est err = 9.98015e-05, true err = 6.33407e-06
## #evals = 59631

testFn5 <- function(x) {
  a = 0.1
  s1 = sum((x-1/3)^2)
  s2 = sum((x-2/3)^2)
  0.5 * (M_2_SQRTPI / (2. * a))^length(x) * (exp(-s1 / (a * a)) + exp(-s2 / (a * a)))
}

adaptIntegrate(testFn5, rep(0,3), rep(1,3), tol=1e-4)

##
## Test function 6 (Tsuda's example)
## Compare with original cubature result of
## ./cubature_test 4 1e-4 6 0
## 4-dim integral, tolerance = 0.0001
## integrand 6: integral = 0.999998, est err = 9.99685e-05, true err = 1.5717e-06
## #evals = 18753

testFn6 <- function(x) {
  a = (1+sqrt(10.0))/9.0
  prod(a/(a+1)*((a+1)/(a+x))^2)
}

adaptIntegrate(testFn6, rep(0,4), rep(1,4), tol=1e-4)


##
## Test function 7
##   test integrand from W. J. Morokoff and R. E. Caflisch, "Quasi=
##   Monte Carlo integration," J. Comput. Phys 122, 218-230 (1995).
##   Designed for integration on [0,1]^dim, integral = 1. */
## Compare with original cubature result of
## ./cubature_test 3 1e-4 7 0
## 3-dim integral, tolerance = 0.0001
## integrand 7: integral = 1.00001, est err = 9.96657e-05, true err = 1.15994e-05
## #evals = 7887

testFn7 <- function(x) {
  n <- length(x)
  p <- 1/n
  (1+p)^n * prod(x^p)
}

adaptIntegrate(testFn7, rep(0,3), rep(1,3), tol=1e-4)


## Example from web page
## http://ab-initio.mit.edu/wiki/index.php/Cubature
##
## f(x) = exp(-0.5(euclidean_norm(x)^2)) over the three-dimensional
## hyperbcube [-2, 2]^3
## Compare with original cubature result
testFnWeb <-  function(x) {
  exp(-0.5*sum(x^2))
}

adaptIntegrate(testFnWeb, rep(-2,3), rep(2,3), tol=1e-4)

## Test function I.1d from
## Numerical integration using Wang-Landau sampling
## Y. W. Li, T. Wust, D. P. Landau, H. Q. Lin
## Computer Physics Communications, 2007, 524-529
## Compare with exact answer: 1.63564436296
##
I.1d <- function(x) {
  sin(4*x) *
    x * ((x * ( x * (x*x-4) + 1) - 1))
}

adaptIntegrate(I.1d, -2, 2, tol=1e-7)

## Test function I.2d from
## Numerical integration using Wang-Landau sampling
## Y. W. Li, T. Wust, D. P. Landau, H. Q. Lin
## Computer Physics Communications, 2007, 524-529
## Compare with exact answer: -0.01797992646
##
I.2d <- function(x) {
  x1 = x[1]
  x2 = x[2]
  sin(4*x1+1) * cos(4*x2) * x1 * (x1*(x1*x1)^2 - x2*(x2*x2 - x1) +2)
}

adaptIntegrate(I.2d, rep(-1, 2), rep(1, 2), maxEval=10000)

##
## Example of multivariate normal integration borrowed from
## package mvtnorm (on CRAN) to check ... argument
## Compare with output of
## pmvnorm(lower=rep(-0.5, m), upper=c(1,4,2), mean=rep(0, m), corr=sigma, alg=Miwa())
##     0.3341125.  Blazing quick as well!  Ours is, not unexpectedly, much slower.
##
dmvnorm <- function (x, mean, sigma, log = FALSE) {
    if (is.vector(x)) {
        x <- matrix(x, ncol = length(x))
    }
    if (missing(mean)) {
        mean <- rep(0, length = ncol(x))
    }
    if (missing(sigma)) {
        sigma <- diag(ncol(x))
    }
    if (NCOL(x) != NCOL(sigma)) {
        stop("x and sigma have non-conforming size")
    }
    if (!isSymmetric(sigma, tol = sqrt(.Machine$double.eps),
        check.attributes = FALSE)) {
        stop("sigma must be a symmetric matrix")
    }
    if (length(mean) != NROW(sigma)) {
        stop("mean and sigma have non-conforming size")
    }
    distval <- mahalanobis(x, center = mean, cov = sigma)
    logdet <- sum(log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values))
    logretval <- -(ncol(x) * log(2 * pi) + logdet + distval)/2
    if (log)
        return(logretval)
    exp(logretval)
}

m <- 3
sigma <- diag(3)
sigma[2,1] <- sigma[1, 2] <- 3/5 ; sigma[3,1] <- sigma[1, 3] <- 1/3
sigma[3,2] <- sigma[2, 3] <- 11/15
adaptIntegrate(dmvnorm, lower=rep(-0.5, m), upper=c(1,4,2),
                        mean=rep(0, m), sigma=sigma, log=FALSE,
               maxEval=10000)


[Package cubature version 1.1-2]