Setup

Load packages

library(rstanarm)
library(brms)
library(cmdstanr)
options(mc.cores = 4)
library(loo)
library(ggplot2)
library(ggdist)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
library(posterior)
options(posterior.num_args=list(digits=2))
library(priorsense)
library(dplyr)
library(tibble)

1 Introduction

This notebook demonstrates cross-validation model comparison and cross-validation predictive checking of models. Furthermore, the notebook demonstrates how to use integrated PSIS-LOO with varying intercept (“random effect”) models.

The roaches data example comes from Chapter 8.3 of Gelman and Hill (2007) and the introduction text for the data is from Estimating Generalized Linear Models for Count Data with rstanarm by Jonah Gabry and Ben Goodrich.

We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. Here is how Gelman and Hill describe the experiment (pg. 161):

the treatment and control were applied to 160 and 104 apartments, respectively, and the outcome measurement \(y_i\) in each apartment \(i\) was the number of roaches caught in a set of traps. Different apartments had traps for different numbers of days

In addition to an intercept, the regression predictors for the model are the pre-treatment number of roaches roach1, the treatment indicator treatment, and a variable indicating whether the apartment is in a building restricted to elderly residents senior. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include the exposure2 variable as an exposure by adding \(\ln(u_i)\) to the linear predictor \(\eta_i\); it can be specified using the offset argument to stan_glm.
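As a small sketch (with hypothetical numbers, not from the data), adding \(\ln(u_i)\) as an offset makes the expected count proportional to the exposure:

```r
# Poisson mean with offset log(u): E[y] = exp(eta + log(u)) = u * exp(eta)
u <- c(1, 2, 4)     # hypothetical numbers of trap-days
eta <- 0.5          # linear predictor without the offset
exp(eta + log(u))   # expected count doubles as the exposure doubles
```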

2 Poisson model

Load data

data(roaches)
# Roach1 is very skewed and we take a square root
roaches$sqrt_roach1 <- sqrt(roaches$roach1)

Make a Poisson regression model with rstanarm

stan_glmp <- stan_glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
                      data = roaches, family = poisson, 
                      prior = normal(0,2.5), prior_intercept = normal(0,5),
                      chains = 4, cores = 1, seed = 1704009, refresh=0)

2.1 Analyse posterior

Plot posterior

mcmc_areas(as.matrix(stan_glmp), prob_outer = .999,
           pars = c("sqrt_roach1","treatment","senior"))

All marginal posteriors are clearly away from zero.

2.2 Cross-validation checking

We can use Pareto-smoothed importance sampling leave-one-out (PSIS-LOO) cross-validation as a model checking tool (Vehtari, Gelman and Gabry, 2017).

(loop <- loo(stan_glmp))

Computed from 4000 by 262 log-likelihood matrix

         Estimate     SE
elpd_loo  -5461.6  695.7
p_loo       258.3   55.7
looic     10923.2 1391.5
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     237   90.5%   378       
 (0.5, 0.7]   (ok)         9    3.4%   61        
   (0.7, 1]   (bad)        9    3.4%   13        
   (1, Inf)   (very bad)   7    2.7%   1         
See help('pareto-k-diagnostic') for details.

p_loo is about 260, which is much higher than the number of parameters p=4. This indicates bad model misspecification, which we are also likely to see in posterior predictive checking (see https://mc-stan.org/loo/reference/loo-glossary.html#pareto-k-estimates-1).

We can plot the observation-specific Pareto-\(\hat{k}\) values.

plot(loop)

There are several observations which are highly influential, which indicates potential model misspecification (Vehtari, Gelman and Gabry, 2017).

Before looking in more detail at where the problem is or fixing it, let’s check what cross-validation says about the relevance of the covariates.

We form 3 models by dropping each of the covariates out. We later compare these to results from other models.

stan_glmm1p <- update(stan_glmp, formula = y ~ treatment + senior)
stan_glmm2p <- update(stan_glmp, formula = y ~ sqrt_roach1 + senior)
stan_glmm3p <- update(stan_glmp, formula = y ~ sqrt_roach1 + treatment)

Although the Pareto \(k\) values were very large, we can make a quick test with PSIS-LOO (if the comparison were to show a clear difference, then PSIS-LOO couldn’t be trusted, and refitting the problematic folds or \(K\)-fold-CV would be needed; see more in Vehtari, Gelman and Gabry, 2017).
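One way to refit the problematic folds in rstanarm is the k_threshold argument of loo(), which re-fits the model once for each observation whose Pareto \(\hat{k}\) exceeds the threshold and uses the exact LOO predictive density for those observations (a sketch; this can be slow when many observations are flagged):

```r
# refit the model for each observation with Pareto k > 0.7 and use
# exact LOO for those observations (one refit per flagged observation)
loo_k <- loo(stan_glmp, k_threshold = 0.7)
```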

loo_compare(loo(stan_glmm1p), loop)
            elpd_diff se_diff
stan_glmp       0.0       0.0
stan_glmm1p -3001.7     687.4
loo_compare(loo(stan_glmm2p), loop)
            elpd_diff se_diff
stan_glmp      0.0       0.0 
stan_glmm2p -232.8     200.4 
loo_compare(loo(stan_glmm3p), loop)
            elpd_diff se_diff
stan_glmp    0.0       0.0   
stan_glmm3p -9.3      91.1   

Based on this, the roach1 covariate would be relevant. Although dropping the treatment or senior covariate makes a large change to elpd, the uncertainty is also large, so cross-validation says these covariates are not necessarily relevant. The posterior marginals are conditional on the model, but cross-validation is more cautious as it does not assume any model for the future data distribution.
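The comparison uncertainty can be turned into a rough probability with a normal approximation using the elpd_diff and se_diff values printed above for the model without treatment (only a sketch here, since the bad Pareto \(\hat{k}\) diagnostics mean the approximation should not be trusted):

```r
# normal-approximation probability that the full model has higher elpd
# than the model without treatment (values from the loo_compare output)
elpd_diff <- -232.8
se_diff <- 200.4
pnorm(-elpd_diff / se_diff)  # ≈ 0.88, far from certain
```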

2.3 Posterior predictive checking

In addition to cross-validation, posterior predictive checks can often detect problems and also provide more information about their cause. As the range of counts is large, we use a kernel density estimate plot.

pp_check(stan_glmp, plotfun = "dens_overlay", nreps=20) +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))

We see that the marginal distribution of model replicated data is clearly different from the observed data.

We additionally test the proportion of zeros predicted by the model and compare it to the observed proportion of zeros.

prop_zero <- function(y) mean(y == 0)
(prop_zero_test1 <- pp_check(stan_glmp, plotfun = "stat", stat = "prop_zero"))

3 Negative binomial model

We change the Poisson model to a more robust negative binomial model.

stan_glmnb <- update(stan_glmp, family = neg_binomial_2)

3.1 Analyse posterior

Plot posterior

mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
    pars = c("sqrt_roach1","treatment","senior"))

The treatment effect is much closer to zero, and the senior effect has a lot of probability mass on both sides of 0. So it matters which model we use.

We discuss posterior dependencies in more detail in the collinear demo notebook, but for reference we also plot the paired marginals here.

mcmc_pairs(as.matrix(stan_glmnb),
           pars = c("sqrt_roach1","treatment","senior"))

There are some posterior correlations, but nothing that would change our conclusions.

3.2 Cross-validation checking

Let’s check PSIS-LOO and Pareto \(k\) diagnostics

(loonb <- loo(stan_glmnb, save_psis=TRUE))

Computed from 4000 by 262 log-likelihood matrix

         Estimate   SE
elpd_loo   -881.6 38.2
p_loo         8.1  3.4
looic      1763.2 76.4
------
Monte Carlo SE of elpd_loo is 0.1.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     261   99.6%   279       
 (0.5, 0.7]   (ok)         1    0.4%   82        
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

All \(\hat{k}\) values are ok, which indicates that the negative-binomial model is better specified. We can also compare the Poisson and negative-binomial models directly.

loo_compare(loop, loonb)
           elpd_diff se_diff
stan_glmnb     0.0       0.0
stan_glmp  -4580.0     675.9

The negative-binomial model is clearly better than the Poisson model.

As Poisson is a special case of the negative binomial, we could also have seen that the Poisson model is unlikely to be adequate by looking at the posterior of the over-dispersion parameter, as reciprocal_dispersion gets very small values, implying strong overdispersion.
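To see why small values of the over-dispersion parameter argue against the Poisson model, recall that the neg_binomial_2 variance is \(\mu + \mu^2/\phi\), so Poisson is the limit \(\phi \to \infty\) (a sketch with hypothetical values):

```r
# neg_binomial_2 variance mu + mu^2/phi for a hypothetical mean mu
mu <- 10
phi <- c(0.3, 10, 1e6)   # small phi = strong overdispersion
mu + mu^2 / phi          # approaches the Poisson variance mu as phi grows
```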

mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
    pars = c("reciprocal_dispersion"))

3.3 Posterior predictive checking

We use posterior predictive checking to compare marginal distributions.

pp_check(stan_glmnb, plotfun = "dens_overlay", nreps=20) +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))

We see that the negative-binomial model is much better, although not perfect, as the model predictive distribution has more mass at small counts than the real data.

#at the moment pit computation is not correct for discrete (PR in progress)
#pp_check(stan_glmnb, plotfun = "pit_ecdf")
pit <- function(y, yrep) {
  n_draws <- nrow(yrep)
  pit <- sapply(seq_along(y),
         \(n) {
           # randomized PIT for discrete y (Czado, C., Gneiting, T.,
           # Held, L.: Predictive model assessment for count
           # data. Biometrics 65(4), 1254–1261 (2009).)
           n_eq <- sum(y[n] == yrep[, n])
           mean(y[n] > yrep[, n]) +
             # guard: sample(0, 1) would error when no draws equal y[n]
             (if (n_eq > 0) sample(n_eq, 1) / n_draws else 0)
         })
  pmax(pmin(pit, 1), 0)
}

Overall, the PIT distribution looks fine.

ppc_pit_ecdf(pit=pit(y = roaches$y, yrep = posterior_predict(stan_glmnb)))

Previously this notebook used only the probability of zeros as the test statistic, which misses the miscalibration seen in the density and PIT plots.

(prop_zero_test2 <- pp_check(stan_glmnb, plotfun = "stat", stat = "prop_zero"))

The posterior predictive check compares observations to the posterior predictive distribution, which is conditioned on those same observations. We can avoid this double use of data by using LOO predictive distributions. A natural statistic is the PIT, which should have an approximately uniform distribution if the model is well calibrated.

#pp_check(stan_glmnb, plotfun = "loo_pit_qq") +
loo_pit <- function(y, yrep, lw) {
  pit <- vapply(seq_len(ncol(yrep)), function(j) {
    sel_min <- yrep[, j] < y[j]
    pit_min <- exp_log_sum_exp(lw[sel_min,j])
    sel_sup <- yrep[, j] == y[j]
    pit_sup <- pit_min + exp_log_sum_exp(lw[sel_sup,j])
    runif(1, pit_min, pit_sup)
  }, FUN.VALUE = 1)
  pmax(pmin(pit, 1), 0)
}
exp_log_sum_exp <- function(x) {
  m <- suppressWarnings(max(x))
  exp(m + log(sum(exp(x - m))))
}

The bayesplot package includes ppc_loo_pit_qq for checking how uniform the LOO-PIT distribution is.

ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
                           lw=weights(loonb$psis_object))) +
  geom_abline() +
  ylim(c(0,1))

We can also use the PIT-ECDF plot to make the comparison to the posterior predictive PIT values easier.

ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
                         lw = weights(loonb$psis_object)))

We can compare posterior-PIT and LOO-PIT directly.

yrep <- posterior_predict(stan_glmnb)
data.frame(pit=sort(pit(y = roaches$y, yrep = yrep)),
           loopit=sort(loo_pit(y = roaches$y, yrep = yrep,
                               lw = weights(loonb$psis_object)))) |>
  ggplot(aes(x=pit,y=loopit)) +
  geom_point() +
  geom_abline() +
  labs(x='Posterior-PIT',y='LOO-PIT')

There is not much difference as the number of observations is much larger than the number of parameters, and the posterior predictive distributions and LOO-predictive distributions are similar.

## A=seq(0.01,0.99,by=0.01)
## loopit <- loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
##                          lw = weights(loonb$psis_object))
## data.frame(A=A,Aloo=sapply(A,\(a) {mean(loopit>=0.5-a/2 & loopit<=0.5+a/2)}))|>
##   ggplot(aes(x=A,y=Aloo))+
##   geom_point() +
##   labs(x='Nominal central coverage', y='Observed LOO coverage')+
##   geom_abline()

3.4 Predictive relevance of covariates

Finally, let’s check that cross-validation model comparison agrees on the relevance of the covariates.

stan_glmm1nb <- update(stan_glmm1p, family = neg_binomial_2)
stan_glmm2nb <- update(stan_glmm2p, family = neg_binomial_2)
stan_glmm3nb <- update(stan_glmm3p, family = neg_binomial_2)
loo_compare(loo(stan_glmm1nb), loonb)
             elpd_diff se_diff
stan_glmnb     0.0       0.0  
stan_glmm1nb -35.4       9.0  
loo_compare(loo(stan_glmm2nb), loonb)
             elpd_diff se_diff
stan_glmnb    0.0       0.0   
stan_glmm2nb -8.5       5.6   
loo_compare(loo(stan_glmm3nb), loonb)
             elpd_diff se_diff
stan_glmm3nb  0.0       0.0   
stan_glmnb   -2.1       1.7   

The covariate roach1 has a clear effect. The treatment effect is visible in the marginal posterior, but as discussed in the betablockers demo, cross-validation is not good for detecting weak effects. Based on cross-validation, the senior covariate also has a negligible effect on the predictive performance.

The conclusion from the analysis would then be that treatment is likely to help, but it’s difficult to predict the number of roaches whether an apartment is treated or not.

4 Poisson model with varying intercepts

Sometimes overdispersion is modelled by adding varying intercepts (“random effects”) for each individual. The following example illustrates computational problems in this approach.

Fit with stan_glmer

roaches$id <- 1:dim(roaches)[1]
stan_glmpr <- stan_glmer(y ~ sqrt_roach1 + treatment + senior + (1 | id), 
                         offset = log(exposure2),
                         data = roaches, family = poisson, 
                         prior = normal(0,2.5), prior_intercept = normal(0,5),
                         chains = 4, cores = 4, iter = 4000, 
                         seed = 1704009, refresh=0)

4.1 Analyse posterior

Plot posterior

mcmc_areas(as.matrix(stan_glmpr), prob_outer = .999,
    pars = c("sqrt_roach1","treatment","senior"))

The marginals are similar to those from the negative-binomial model, except that the marginal for senior is clearly away from zero.

4.2 Cross-validation checking

Let’s check PSIS-LOO.

(loopr <- loo(stan_glmpr, save_psis=TRUE))

Computed from 8000 by 262 log-likelihood matrix

         Estimate   SE
elpd_loo   -634.1 24.2
p_loo       169.4  4.6
looic      1268.2 48.3
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)       5    1.9%   797       
 (0.5, 0.7]   (ok)        48   18.3%   154       
   (0.7, 1]   (bad)      185   70.6%   14        
   (1, Inf)   (very bad)  24    9.2%   9         
See help('pareto-k-diagnostic') for details.

p_loo is about 170, which is less than the number of parameters (267), but it is relatively large compared to the number of observations (p_loo >> N/5), which indicates a very flexible model. In this case this is due to having an intercept parameter for each observation. Removing one observation changes the posterior for that intercept so much that importance sampling fails (even with Pareto smoothing). Note that WAIC would fail for the same reason.
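The flexibility heuristic can be checked directly from the printed loo output:

```r
# p_loo from the output above vs. the N/5 heuristic for a very flexible model
p_loo <- 169.4
N <- 262
p_loo > N / 5   # TRUE: p_loo is far larger than N/5 = 52.4
```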

We can also plot Pareto \(k\) values.

plot(loopr)

We have a very large number of high \(k\) values which indicates very flexible model.

While importance sampling in PSIS-LOO can fail for varying parameter models, we can use \(K\)-fold-CV instead, re-fitting the model 10 times, each time leaving out 10% of the observations. This shows that cross-validation itself is not infeasible for varying parameter models.

(kcvpr <- kfold(stan_glmpr, K=10))

Based on 10-fold cross-validation

           Estimate   SE
elpd_kfold   -878.6 38.1
p_kfold       413.9 18.8
kfoldic      1757.2 76.2

The loo package allows comparing PSIS-LOO and \(K\)-fold-CV results.

loo_compare(loonb, kcvpr)
           elpd_diff se_diff
stan_glmpr  0.0       0.0   
stan_glmnb -3.0       7.6   

There is not much difference, and this difference could also be explained by \(K\)-fold-CV using only 90% of the observations for each posterior, while PSIS-LOO uses 99.6% of them. We can check this by running \(K\)-fold-CV also for the negative-binomial model.

(kcvnb <- kfold(stan_glmnb, K=10))

Based on 10-fold cross-validation

           Estimate   SE
elpd_kfold   -879.2 38.0
p_kfold         5.7  2.8
kfoldic      1758.4 76.0
loo_compare(kcvnb, kcvpr)
           elpd_diff se_diff
stan_glmpr  0.0       0.0   
stan_glmnb -0.6       7.3   

When both models are assessed using \(K\)-fold-CV, the difference in predictive performance is very small. The models can still have different predictive distributions.

Now that we’ve seen that, based on the robust \(K\)-fold-CV, there is not much difference between the negative-binomial and varying-intercept Poisson models, we can also check how badly the comparison would have gone with PSIS-LOO.

loo_compare(loonb, loopr)
           elpd_diff se_diff
stan_glmpr    0.0       0.0 
stan_glmnb -247.5      17.3 

If we had ignored the Pareto-\(\hat{k}\) warnings, we would have mistakenly concluded that the varying intercept model is much better. Note that WAIC is (as usual) even worse (see also Vehtari, Gelman and Gabry (2017)).

loo_compare(waic(stan_glmnb), waic(stan_glmpr))
           elpd_diff se_diff
stan_glmpr    0.0       0.0 
stan_glmnb -309.1      18.2 

4.3 Posterior predictive checking

We do posterior predictive checking for varying intercept Poisson model.

pp_check(stan_glmpr, plotfun = "dens_overlay", nreps=20) +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))

The match looks perfect, but that can be explained by having one parameter for each observation, with the kernel density estimate hiding the details. The PIT-ECDF plot shows the problems.

ppc_pit_ecdf(pit=pit(y = roaches$y, yrep = posterior_predict(stan_glmpr)))

There are too many PIT values near 0.5. If we look at the predictive intervals and observations, we see that many of the observations are in the middle of their posterior predictive interval, which can be explained by the very flexible model with one parameter for each observation.

ppc_intervals(y = roaches$y, yrep = posterior_predict(stan_glmpr))

LOO-PITs are not much better, because PSIS-LOO fails (remember the high Pareto-\(\hat{k}\) values).

## pp_check(stan_glmpr, plotfun = "loo_pit_qq") +
##   geom_abline() +
##   ylim(c(0,1))
## ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmpr),
##                            lw=weights(loopr$psis_object))) +
##   geom_abline() +
##   ylim(c(0,1))
ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmpr),
                         lw=weights(loopr$psis_object)))

5 Poisson model with varying intercept and integrated LOO

Removing one observation changes the posterior for that intercept so much that importance sampling fails (even with Pareto smoothing). We can get improved stability by integrating that intercept out with something more accurate than the importance sampling (Vehtari et al., 2016). If there is only one group or individual-specific parameter, then we can integrate it out easily with adaptive quadrature (2 parameters with 2D quadrature would work, too, which could be implemented with nested 1D quadratures).

As we can easily integrate out only 1 (or 2) parameters (per group / individual), it’s not easy to make a generic approach for rstanarm and brms, and thus I illustrate the approach with direct Stan model code and cmdstanr.
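Before looking at the Stan code, the integration idea can be sketched in R with base integrate() and hypothetical values for sigmaz, mu_i, and y_i:

```r
# integrate the varying intercept z out of the Poisson likelihood for one
# observation: p(y_i | sigmaz, mu_i) = Int N(z | 0, sigmaz) Pois(y_i | exp(z + mu_i)) dz
sigmaz <- 1; mu_i <- 2; y_i <- 5
lik <- integrate(\(z) dnorm(z, 0, sigmaz) * dpois(y_i, exp(z + mu_i)),
                 lower = -Inf, upper = Inf)
log(lik$value)   # integrated log-likelihood for this observation
```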

The following Stan model uses an individual-specific intercept term z[i] (with a common prior with scale sigmaz). In the generated quantities block, the usual log_lik computation is replaced with the integrated approach. We also generate y_loorep, which is the LOO predictive distribution given the parameters other than z. This is needed to get the correct LOO predictive distributions when combined with integrated PSIS-LOO.

poisson_re_int <- "poisson_re_integrate.stan"
writeLines(readLines(poisson_re_int))
// Poisson regression with hierarchical intercept ("random effect")
functions {
  real integrand(real z, real notused, array[] real theta,
               array[] real X_i, array[] int y_i) {
    real sigmaz = theta[1];
    real mu_i = theta[2];
    real p = exp(normal_lpdf(z | 0, sigmaz) + poisson_log_lpmf(y_i | z + mu_i));
    return (is_inf(p) || is_nan(p)) ? 0 : p;
  }
}
data {
  int<lower=0> N;           // number of data points
  int<lower=0> P;           // number of covariates
  matrix[N,P] X;            // covariates
  array[N] int<lower=0> y;  // target
  vector[N] offsett;        // offset (offset variable name is reserved)
  real integrate_1d_reltol;
}
parameters {
  real alpha;               // intercept
  vector[P] beta;           // slope
  vector[N] z;              // individual intercept ("random effect")
  real<lower=0> sigmaz;     // prior scale for z
}
model {
  // priors
  alpha ~ normal(0, 3);  
  beta ~ normal(0, 3);    
  z ~ normal(0, sigmaz);  
  sigmaz ~ normal(0, 1);
  // observation model
  y ~ poisson_log_glm(X, z+offsett+alpha, beta); 
}
generated quantities {
  // log_lik for PSIS-LOO
  vector[N] log_lik;
  vector[N] y_rep;
  vector[N] y_loorep;
  for (i in 1:N) {
    // z as posterior draws, this would be challenging for PSIS-LOO (and WAIC)
    // log_lik[i] = poisson_log_glm_lpmf({y[i]} | X[i,], z[i]+offsett[i]+alpha, beta);
    // posterior predictive replicates conditional on p(z[i] | y[i])
    // in theory these could be used with above (non-integrated) log_lik and PSIS-LOO
    // to get draws from LOO predictive distribution, but the distribution of the
    // ratios from the above log_lik is bad
    real mu_i = offsett[i] + alpha + X[i,]*beta;
    y_rep[i] = poisson_log_rng(z[i] + mu_i);
    // we can integrate each z[i] out with 1D adaptive quadrature to get more
    // stable log_lik and corresponding importance ratios
    log_lik[i] = log(integrate_1d(integrand,
                              negative_infinity(),
                              positive_infinity(),
                              append_array({sigmaz}, {mu_i}),
                  {0}, // not used, but an empty array not allowed
                  {y[i]},
                  integrate_1d_reltol));
    // conditional LOO predictive replicates conditional on p(z[i] | sigmaz, mu_i)
    // these combined with integrated log_lik and PSIS-LOO provide
    // more stable LOO predictive distributions
    y_loorep[i] = poisson_log_rng(normal_rng(0, sigmaz) + mu_i);
  }
}

Compile the model, prepare the data, and sample. integrate_1d_reltol sets the relative tolerance for the adaptive 1D quadrature function integrate_1d (if, with another model and data, you see messages about the error estimate of the integral exceeding the given relative tolerance times the norm of the integral, you will get NaNs and need to increase the relative tolerance).

modpri <- cmdstan_model(stan_file = poisson_re_int)
datap <- list(N = dim(roaches)[1],
              P = 3,
              offsett = log(roaches$exposure2),
              X = roaches[,c('sqrt_roach1','treatment','senior')],
              y = roaches$y,
              integrate_1d_reltol = 1e-6)
fitpri <- modpri$sample(data = datap, refresh = 0, chains = 4, parallel_chains = 4)

The posterior is similar to the one above for the varying intercept Poisson model (as it should be, since the generated quantities block does not affect the posterior).

mcmc_areas(as_draws_matrix(fitpri$draws(variables=c('beta','sigmaz'))),
           prob_outer = .999)

Now PSIS-LOO doesn’t give warnings, and the result is close to the \(K\)-fold-CV result.

(loopri <- fitpri$loo(save_psis=TRUE))

Computed from 4000 by 262 log-likelihood matrix

         Estimate   SE
elpd_loo   -878.8 38.3
p_loo         5.2  0.5
looic      1757.6 76.6
------
Monte Carlo SE of elpd_loo is 0.2.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Comparing to the negative binomial model shows only a slight improvement in predictive performance.

loo_compare(fitpri=loopri, loonb)
           elpd_diff se_diff
fitpri      0.0       0.0   
stan_glmnb -2.8       7.4   

Posterior predictive checking with the KDE plot looks very good, but that can be explained by having one parameter for each observation, so the posterior predictive check cannot be trusted.

drm <- fitpri$draws(format='matrix')
y_rep <- subset_draws(drm,variable='y_rep')
pp_check(roaches$y, y_rep[sample(1:4000,20),], fun='dens_overlay') +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))

The posterior predictive PIT-ECDFs are the same as for the varying intercept Poisson model, but the LOO-PITs look good, as integrated LOO helps.

y_loorep <- subset_draws(drm,variable='y_loorep')
## pp_check(roaches$y, yrep=y_loorep, fun='loo_pit_qq', lw=weights(loopri$psis_object)) +
##   geom_abline() +
##   ylim(c(0,1))
## ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = y_loorep,
##                            lw=weights(loopri$psis_object))) +
##   geom_abline() +
##   ylim(c(0,1))
ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = y_loorep,
                         lw=weights(loopri$psis_object)))

6 Zero-inflated negative-binomial model

As the proportion of zeros is quite high in the data, it is worthwhile to test also a zero-inflated negative-binomial model, which is a mixture of two models:

  • logistic regression to model the proportion of extra zero counts
  • negative-binomial model
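The mixture can be written as a probability mass function (a sketch with hypothetical parameter values; mu and shape parameterised as in neg_binomial_2):

```r
# zero-inflated negative binomial pmf: with probability zi an extra zero,
# otherwise a negative binomial count
dzinb <- function(y, zi, mu, shape) {
  (y == 0) * zi + (1 - zi) * dnbinom(y, mu = mu, size = shape)
}
dzinb(0, zi = 0.3, mu = 50, shape = 0.5)  # probability of observing a zero
```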

We switch to brms as rstanarm doesn’t support zero-inflated negative-binomial model.

brm_glmzinb <-
  brm(bf(y ~ sqrt_roach1 + treatment + senior + offset(log(exposure2)),
         zi ~ sqrt_roach1 + treatment + senior + offset(log(exposure2))),
      family=zero_inflated_negbinomial(), data=roaches,
      prior=c(prior(normal(0,3), class='b'),
              prior(normal(0,3), class='b', dpar='zi'),
              prior(normal(0,3), class='Intercept', dpar='zi')),
      seed=1704009, refresh=1000)

Based on PSIS-LOO, the zero-inflated negative-binomial model is clearly better.

brm_glmzinb <- add_criterion(brm_glmzinb, criterion='loo', save_psis=TRUE)
(loozinb <- loo(brm_glmzinb))

Computed from 4000 by 262 log-likelihood matrix

         Estimate   SE
elpd_loo   -859.1 37.7
p_loo        11.4  2.8
looic      1718.2 75.4
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     260   99.2%   1607      
 (0.5, 0.7]   (ok)         1    0.4%   117       
   (0.7, 1]   (bad)        1    0.4%   71        
   (1, Inf)   (very bad)   0    0.0%   <NA>      
See help('pareto-k-diagnostic') for details.
loo_compare(loonb, loozinb)
            elpd_diff se_diff
brm_glmzinb   0.0       0.0  
stan_glmnb  -22.5       6.7  

Posterior predictive checking looks good, but in most checks there is no clear difference from the negative-binomial model.

pp_check(brm_glmzinb, type='dens_overlay') +
    scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))

ppc_pit_ecdf(pit=pit(y = roaches$y, yrep = posterior_predict(brm_glmzinb)))

LOO-PIT-ECDF

ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = posterior_predict(brm_glmzinb),
                         lw = weights(loozinb$psis_object)))

The proportion of zeros is similar, but has more variation than in the negative-binomial model.

pp_check(brm_glmzinb, type='stat', stat=\(y) mean(y==0))

There is a much bigger difference in the max count test statistic. We first check the max count PPC for the negative-binomial model.

pp_check(stan_glmnb, plotfun = "stat", stat = "max") +
  scale_x_sqrt(breaks=c(100,300,1000,3000,10000,30000))

which shows that the negative-binomial model often predicts 10–100 times larger roach counts than seen in the data (30,000 roaches in one trap is a lot).

The max count PPC for zero-inflated negative-binomial model

pp_check(brm_glmzinb, type = "stat", stat = "max") +
  scale_x_sqrt(breaks=c(100,300,1000,3000,10000))

is much better, although the max counts can still be 10 times bigger than the max count in the data.

The difference in the max counts can be explained by the plain negative-binomial model using a smaller shape parameter to model the overdispersion caused by the zeros, which leads to a long tail without observations. We can check the overdispersion parameters.

# negative-binomial model 
mean(as_draws_rvars(stan_glmnb)$reciprocal_dispersion)
[1] 0.310424
# zero-inflated negative-binomial model 
mean(as_draws_rvars(brm_glmzinb)$shape)
[1] 0.4932308

Although the models are different, with finite data and wide LOO predictive distributions there is a limit to which differences can be seen in the LOO-PIT values. Both the negative-binomial and zero-inflated negative-binomial models are close enough that LOO-PIT can’t detect a discrepancy from the data, but elpd_loo is still able to show that the zero-inflation component improves the predictive accuracy.

6.1 Analyse posterior

Plot posterior

mcmc_areas(as.matrix(brm_glmzinb)[,3:8], prob_outer = .999)

The posterior marginals for the negative-binomial part are similar to the marginals in the plain negative-binomial model. The marginal effects for the logistic part have the opposite sign, as the logistic part models the extra zeros.

The treatment effect is now divided between the negative-binomial and logistic parts. We can use the model to make predictions for the expected number of roaches with and without treatment.

Expectations of posterior predictive distributions given treatment=0 and treatment=1

pred <- posterior_epred(brm_glmzinb, newdata=rbind(mutate(roaches, treatment=0),
                                                   mutate(roaches, treatment=1)))

Ratio of expected number of roaches with vs without treatment

ratio <- array(rowMeans(pred[,263:524]/pred[,1:262]), c(1000, 4, 1)) |>
  as_draws_df() |>
  set_variables(variables='ratio')
ratio |>
  ggplot(aes(x=ratio)) +
  stat_slab() +
  labs(x='Ratio of roaches with vs without treatment', y=NULL) +
  scale_y_continuous(breaks=NULL) +
  theme(axis.line.y=element_blank(),
        strip.text.y=element_blank()) +
  xlim(c(0,1)) +
  geom_vline(xintercept=1, linetype='dotted')

The treatment clearly reduces the expected number of roaches.

Make a prior sensitivity analysis by power-scaling both the prior and the likelihood, focusing on the ratio of the expected number of roaches with vs without treatment.

powerscale_sensitivity(brm_glmzinb, prediction = \(x, ...) ratio)$sensitivity |>
                         filter(variable=='ratio') |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
# A tibble: 1 × 4
  variable     prior likelihood diagnosis
  <chr>    <num:.2!>  <num:.2!> <chr>    
1 ratio         0.01       0.11 -        

There is no prior sensitivity.

6.2 Predictive relevance of covariates

Finally, let’s check the cross-validation model comparison to see whether the improved model changes the conclusions about the predictive relevance of the covariates.

brm_glmm1zinb <-
  update(brm_glmzinb,
         formula=bf(y ~ treatment + senior + offset(log(exposure2)),
                    zi ~ treatment + senior + offset(log(exposure2))))
brm_glmm2zinb <-
  update(brm_glmzinb,
         formula=bf(y ~ sqrt_roach1 + senior + offset(log(exposure2)),
                    zi ~ sqrt_roach1 + senior + offset(log(exposure2))))
brm_glmm3zinb <-
  update(brm_glmzinb,
         formula=bf(y ~ sqrt_roach1 + treatment + offset(log(exposure2)),
                  zi ~ sqrt_roach1 + treatment + offset(log(exposure2))))
loo_compare(loo(brm_glmm1zinb),loozinb)
              elpd_diff se_diff
brm_glmzinb     0.0       0.0  
brm_glmm1zinb -57.6      10.2  
loo_compare(loo(brm_glmm2zinb),loozinb)
              elpd_diff se_diff
brm_glmzinb    0.0       0.0   
brm_glmm2zinb -8.2       5.1   
loo_compare(loo(brm_glmm3zinb),loozinb)
              elpd_diff se_diff
brm_glmm3zinb  0.0       0.0   
brm_glmzinb   -1.3       2.4   

The covariate roach1 has a clear effect. The treatment effect improves the predictive performance with 95% probability (the normal approximation can be trusted as 1) the number of observations is larger than 100, 2) the elpd difference is bigger than 4, 3) there are no clear outliers, and 4) the distribution of the pointwise differences has finite variance as tested with the Pareto-\(\hat{k}\) diagnostic (Sivula, Magnusson and Vehtari, 2020; Vehtari et al., 2022).
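The 95% figure follows from applying the normal approximation to the comparison output above (elpd_diff = -8.2, se_diff = 5.1 for the model without treatment):

```r
# normal-approximation probability that including treatment improves elpd
pnorm(8.2 / 5.1)  # ≈ 0.95
```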


References

Sivula, T., Magnusson, M. and Vehtari, A. (2020) ‘Uncertainty in Bayesian leave-one-out cross-validation based model comparison’, arXiv:2008.10296.

Vehtari, A., Gelman, A. and Gabry, J. (2017) ‘Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC’, Statistics and Computing, 27(5), pp. 1413–1432. doi: 10.1007/s11222-016-9696-4.

Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T. and Winther, O. (2016) ‘Bayesian leave-one-out cross-validation approximations for gaussian latent variable models’, The Journal of Machine Learning Research, 17(1), pp. 3581–3618.

Vehtari, A., Simpson, D., Gelman, A., Yao, Y. and Gabry, J. (2022) ‘Pareto smoothed importance sampling’, arXiv preprint arXiv:1507.02646. Available at: https://arxiv.org/abs/1507.02646v6.

Licenses

  • Code © 2017-2024, Aki Vehtari, licensed under BSD-3.
  • Text © 2017-2024, Aki Vehtari, licensed under CC-BY-NC 4.0.
  • Parts of text and code © 2017, Jonah Gabry and Ben Goodrich from rstanarm vignette for count data, licensed under GPL 3.

Original Computing Environment

sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fi_FI.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tibble_3.2.1          dplyr_1.1.4           priorsense_0.0.0.9000
 [4] posterior_1.5.0.9000  bayesplot_1.10.0      ggdist_3.3.1         
 [7] ggplot2_3.4.4         loo_2.6.0             cmdstanr_0.7.1       
[10] brms_2.20.4           rstanarm_2.26.1       Rcpp_1.0.12          

loaded via a namespace (and not attached):
  [1] minqa_1.2.6          colorspace_2.1-0     ellipsis_0.3.2      
  [4] ggridges_0.5.5       markdown_1.12        QuickJSR_1.0.9      
  [7] base64enc_0.1-3      farver_2.1.1         rstan_2.32.3        
 [10] DT_0.31              fansi_1.0.6          mvtnorm_1.2-4       
 [13] bridgesampling_1.1-2 codetools_0.2-19     splines_4.2.2       
 [16] cachem_1.0.8         knitr_1.45           shinythemes_1.2.0   
 [19] jsonlite_1.8.8       nloptr_2.0.3         shiny_1.8.0         
 [22] compiler_4.2.2       backports_1.4.1      Matrix_1.6-4        
 [25] fastmap_1.1.1        cli_3.6.2            later_1.3.2         
 [28] htmltools_0.5.7      tools_4.2.2          igraph_1.6.0        
 [31] coda_0.19-4          gtable_0.3.4         glue_1.7.0          
 [34] reshape2_1.4.4       V8_4.4.1             jquerylib_0.1.4     
 [37] vctrs_0.6.5          nlme_3.1-162         crosstalk_1.2.1     
 [40] tensorA_0.36.2.1     xfun_0.41            stringr_1.5.1       
 [43] ps_1.7.5             lme4_1.1-35.1        mime_0.12           
 [46] miniUI_0.1.1.1       lifecycle_1.0.4      gtools_3.9.5        
 [49] MASS_7.3-58.2        zoo_1.8-12           scales_1.3.0        
 [52] colourpicker_1.3.0   promises_1.2.1       Brobdingnag_1.2-9   
 [55] parallel_4.2.2       inline_0.3.19        shinystan_2.6.0     
 [58] yaml_2.3.8           curl_5.2.0           gridExtra_2.3       
 [61] StanHeaders_2.32.5   sass_0.4.8           stringi_1.8.3       
 [64] highr_0.10           dygraphs_1.1.1.6     checkmate_2.3.1     
 [67] boot_1.3-28          pkgbuild_1.4.3       rlang_1.1.3         
 [70] pkgconfig_2.0.3      matrixStats_1.2.0    distributional_0.3.2
 [73] evaluate_0.23        lattice_0.20-45      rstantools_2.3.1.1  
 [76] htmlwidgets_1.6.4    labeling_0.4.3       tidyselect_1.2.0    
 [79] processx_3.8.3       plyr_1.8.9           magrittr_2.0.3      
 [82] R6_2.5.1             generics_0.1.3       pillar_1.9.0        
 [85] withr_3.0.0          xts_0.13.1           survival_3.4-0      
 [88] abind_1.4-5          utf8_1.2.4           rmarkdown_2.25      
 [91] grid_4.2.2           data.table_1.14.10   callr_3.7.3         
 [94] threejs_0.3.3        digest_0.6.34        xtable_1.8-4        
 [97] httpuv_1.6.13        RcppParallel_5.1.7   stats4_4.2.2        
[100] munsell_0.5.0        bslib_0.6.1          shinyjs_2.1.0       


---
title: "Roaches cross-validation demo"
author: "[Aki Vehtari](https://users.aalto.fi/~ave/)"
date: "First version 2017-01-10. Last modified `r format(Sys.Date())`."
output:
  html_document:
    fig_caption: yes
    toc: TRUE
    toc_depth: 2
    number_sections: TRUE
    toc_float:
      smooth_scroll: FALSE
    code_download: true
bibliography: modelsel.bib
csl: harvard-cite-them-right.csl
link-citations: yes
---

# Setup  {.unnumbered}


```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=FALSE, message=FALSE, error=FALSE, warning=FALSE, comment=NA, out.width='95%')
```


**Load packages**

```{r}
library(rstanarm)
library(brms)
library(cmdstanr)
options(mc.cores = 4)
library(loo)
library(ggplot2)
library(ggdist)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
library(posterior)
options(posterior.num_args=list(digits=2))
library(priorsense)
library(dplyr)
library(tibble)
```


# Introduction

This notebook demonstrates cross-validation model comparison and
cross-validation predictive checking of models. Furthermore the
notebook demonstrates how to use integrated PSIS-LOO with
varying intercept ("random effect") models.

The roaches data example comes from Chapter 8.3 of [Gelman and Hill
(2007)](http://www.stat.columbia.edu/~gelman/arm/) and the
introduction text for the data is from [Estimating Generalized
Linear Models for Count Data with
rstanarm](https://cran.r-project.org/web/packages/rstanarm/vignettes/count.html)
by Jonah Gabry and Ben Goodrich.

We want to make inferences about the efficacy of a certain pest
management system at reducing the number of roaches in urban
apartments. Here is how Gelman and Hill describe the experiment
(pg. 161):

> the treatment and control were applied to 160 and 104 apartments,
respectively, and the outcome measurement $y_i$ in each apartment
$i$ was the number of roaches caught in a set of traps. Different
apartments had traps for different numbers of days

In addition to an intercept, the regression predictors for the
model are the pre-treatment number of roaches `roach1`, the
treatment indicator `treatment`, and a variable indicating whether
the apartment is in a building restricted to elderly residents
`senior`. Because the number of days for which the roach traps were
used is not the same for all apartments in the sample, we include
the variable `exposure2` as an exposure by adding $\ln(u_i)$ to the
linear predictor $\eta_i$; it can be specified using the `offset`
argument to `stan_glm`.
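The role of the offset can be sketched in a couple of lines of R
(the numbers here are made up for illustration):

```{r}
# With a log link, adding log(u) as an offset makes the Poisson mean
# proportional to the exposure u (here u would be trap-days, exposure2)
eta <- 0.5                   # linear predictor without the offset
u <- c(1, 2, 7)              # hypothetical exposures
lambda <- exp(eta + log(u))  # mean with offset: u * exp(eta)
stopifnot(all.equal(lambda, u * exp(eta)))
```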

# Poisson model

Load data

```{r}
data(roaches)
# Roach1 is very skewed and we take a square root
roaches$sqrt_roach1 <- sqrt(roaches$roach1)
```


Make a Poisson regression model with `rstanarm`

```{r results='hide'}
stan_glmp <- stan_glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
                      data = roaches, family = poisson, 
                      prior = normal(0,2.5), prior_intercept = normal(0,5),
                      chains = 4, cores = 1, seed = 1704009, refresh=0)
```


## Analyse posterior

Plot posterior

```{r}
mcmc_areas(as.matrix(stan_glmp), prob_outer = .999,
           pars = c("sqrt_roach1","treatment","senior"))
```


All marginal posteriors are clearly away from zero.

## Cross-validation checking

We can use Pareto-smoothed importance sampling leave-one-out (PSIS-LOO)
cross-validation as model checking tool
[@Vehtari+etal:PSIS-LOO:2017].

```{r}
(loop <- loo(stan_glmp))
```

`p_loo` is about 260, which is much higher than the number of
parameters $p=4$. This indicates severe model misspecification,
which we are likely to see also in posterior predictive checking
(see
https://mc-stan.org/loo/reference/loo-glossary.html#pareto-k-estimates-1).
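As a reminder of what `p_loo` measures, here is a small simulated
sketch (made-up data, not the roaches model): `p_loo` is the
difference between the in-sample log predictive density and
`elpd_loo`, and for a well-specified model it is close to the number
of parameters.

```{r}
library(loo)
set.seed(1)
N <- 100; S <- 4000
y_sim <- rnorm(N)
# posterior draws for the mean of a normal model with known sd = 1
mu_draws <- rnorm(S, mean(y_sim), 1 / sqrt(N))
log_lik_sim <- sapply(y_sim, function(yn) dnorm(yn, mu_draws, 1, log = TRUE))
loo_sim <- loo(log_lik_sim, r_eff = rep(1, N))
loo_sim$estimates["p_loo", "Estimate"]  # close to 1, the one free parameter
```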


We can plot the observation-specific Pareto-$\hat{k}$ values.

```{r}
plot(loop)
```


There are several observations which are highly influential, which
indicates potential model misspecification [@Vehtari+etal:PSIS-LOO:2017].

Before looking in more detail where the problem is or fixing it,
let's check what would cross-validation say about relevance of
covariates.

We form 3 models by dropping each of the covariates out. We later
compare these to results from other models.

```{r results='hide'}
stan_glmm1p <- update(stan_glmp, formula = y ~ treatment + senior)
stan_glmm2p <- update(stan_glmp, formula = y ~ sqrt_roach1 + senior)
stan_glmm3p <- update(stan_glmp, formula = y ~ sqrt_roach1 + treatment)
```


Although the Pareto $k$ values were very large, we can make a quick
test with PSIS-LOO (if the comparison were to show a clear
difference, PSIS-LOO couldn't be trusted and refitting the
problematic folds or $K$-fold-CV would be needed [see more in
@Vehtari+etal:PSIS-LOO:2017]).

```{r}
loo_compare(loo(stan_glmm1p), loop)
loo_compare(loo(stan_glmm2p), loop)
loo_compare(loo(stan_glmm3p), loop)
```

Based on this, the roaches covariate would be relevant. Although
dropping the treatment or senior covariate makes a large change to
elpd, the uncertainty is also large, so cross-validation states
that these covariates are not necessarily relevant. The posterior
marginals are conditional on the model, while cross-validation is
more cautious by not using any model for the future data
distribution.

## Posterior predictive checking

In addition to cross-validation, posterior predictive checks can
often detect problems and also provide more information about their
cause. As the range of counts is large, we use a kernel density
estimate plot.

```{r}
pp_check(stan_glmp, plotfun = "dens_overlay", nreps=20) +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))
```

We see that the marginal distribution of model replicated data is
clearly different from the observed data.

We test additionally the proportion of zeros predicted by the model
and compare them to the observed number of zeros.

```{r}
prop_zero <- function(y) mean(y == 0)
(prop_zero_test1 <- pp_check(stan_glmp, plotfun = "stat", stat = "prop_zero"))
```


# Negative binomial model

We change the Poisson model to a more robust negative binomial model.

```{r results='hide'}
stan_glmnb <- update(stan_glmp, family = neg_binomial_2)
```


## Analyse posterior

Plot posterior

```{r}
mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
    pars = c("sqrt_roach1","treatment","senior"))
```


The treatment effect is much closer to zero, and the senior effect
has a lot of probability mass on both sides of 0. So it matters
which model we use.

We discuss posterior dependencies in more detail in the `collinear`
notebook, but for reference we also plot the paired marginals here.

```{r}
mcmc_pairs(as.matrix(stan_glmnb),
           pars = c("sqrt_roach1","treatment","senior"))
```


There are some posterior correlations, but not something which
would change our conclusions.

## Cross-validation checking

Let's check PSIS-LOO and Pareto $k$ diagnostics

```{r}
(loonb <- loo(stan_glmnb, save_psis=TRUE))
```

All Pareto-$\hat{k}$ values are ok, which indicates that the
negative-binomial model is better specified. We can also compare
Poisson and negative-binomial directly.

```{r}
loo_compare(loop, loonb)
```

Negative-binomial model is clearly better than Poisson.

As Poisson is a special case of the negative-binomial, we could
have also seen that Poisson is unlikely by looking at the posterior
of the overdispersion parameter `reciprocal_dispersion`, which
concentrates on small values far from the Poisson limit (Poisson is
obtained as `reciprocal_dispersion` goes to infinity).
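This limit can be checked numerically with base R densities (the
`mu`/`size` parameterization of `dnbinom` matches `neg_binomial_2`,
with `size` playing the role of `reciprocal_dispersion`):

```{r}
# The negative binomial tends to the Poisson as the dispersion (size)
# parameter grows; small size values mean strong overdispersion
counts <- 0:15
stopifnot(max(abs(dnbinom(counts, mu = 5, size = 1e6) - dpois(counts, 5))) < 1e-4)
max(abs(dnbinom(counts, mu = 5, size = 10) - dpois(counts, 5)))  # clearly larger
```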

```{r}
mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
    pars = c("reciprocal_dispersion"))
```


## Posterior predictive checking

We use posterior predictive checking to compare marginal
distributions.

```{r}
pp_check(stan_glmnb, plotfun = "dens_overlay", nreps=20) +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))
```

We see that the negative-binomial model is much better, although
not perfect, as the model's predictive distribution has more mass
at small counts than the real data.

```{r}
#at the moment pit computation is not correct for discrete (PR in progress)
#pp_check(stan_glmnb, plotfun = "pit_ecdf")
pit <- function(y, yrep) {
  n_draws <- nrow(yrep)
  pit <- sapply(1:length(y),
         \(n) {
           mean(y[n] > yrep[, n]) +
             # randomized PIT for discrete y (Czado, C., Gneiting, T.,
             # Held, L.: Predictive model assessment for count
             # data. Biometrics 65(4), 1254–1261 (2009).)
             sample(sum(y[n] == yrep[, n]), 1) / n_draws
         })
  pmax(pmin(pit, 1), 0)
}
```
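As a sanity check of the randomized PIT idea, we can apply it to
simulated, well-specified data (made up for illustration): when
`yrep` is drawn from the true model, the PIT values should be
roughly uniform on [0, 1].

```{r}
set.seed(1)
n <- 500; S <- 1000
y_sim <- rpois(n, 5)
yrep_sim <- matrix(rpois(S * n, 5), nrow = S)
pit_sim <- sapply(seq_len(n), function(i) {
  # randomized PIT for discrete outcomes: Pr(yrep < y) + U * Pr(yrep = y)
  mean(yrep_sim[, i] < y_sim[i]) +
    runif(1) * mean(yrep_sim[, i] == y_sim[i])
})
stopifnot(all(pit_sim >= 0 & pit_sim <= 1))
round(mean(pit_sim), 2)  # should be close to 0.5 for a calibrated model
```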

Overall the PIT distribution looks fine

```{r}
ppc_pit_ecdf(pit=pit(y = roaches$y, yrep = posterior_predict(stan_glmnb)))
```

Previously this notebook used only the probability of zeros as the
test statistic, which misses the miscalibration seen in the density
and PIT plots.

```{r}
(prop_zero_test2 <- pp_check(stan_glmnb, plotfun = "stat", stat = "prop_zero"))
```

The posterior predictive check compares observations to the
posterior predictive distribution, which is conditioned on the same
observations. We can avoid this double use of the data by using LOO
predictive distributions. A natural statistic is the PIT, which
should have an approximately uniform distribution if the model is
well calibrated.

```{r}
#pp_check(stan_glmnb, plotfun = "loo_pit_qq") +
loo_pit <- function(y, yrep, lw) {
  pit <- vapply(seq_len(ncol(yrep)), function(j) {
    sel_min <- yrep[, j] < y[j]
    pit_min <- exp_log_sum_exp(lw[sel_min,j])
    sel_sup <- yrep[, j] == y[j]
    pit_sup <- pit_min + exp_log_sum_exp(lw[sel_sup,j])
    runif(1, pit_min, pit_sup)
  }, FUN.VALUE = 1)
  pmax(pmin(pit, 1), 0)
}
exp_log_sum_exp <- function(x) {
  m <- suppressWarnings(max(x))
  exp(m + log(sum(exp(x - m))))
}
```
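A small sketch of the weight mechanics in `loo_pit()` (made-up log
weights for illustration, not from an actual `psis` object): the
PSIS log weights returned by `weights()` are normalized, so all
importance weights sum to 1 and the partial sums over selected draws
are probabilities in [0, 1].

```{r}
# helper from above, repeated so that this sketch runs standalone
exp_log_sum_exp <- function(x) {
  m <- suppressWarnings(max(x))
  exp(m + log(sum(exp(x - m))))
}
set.seed(1)
lw_raw <- rnorm(4000)                 # hypothetical unnormalized log weights
lw <- lw_raw - log(sum(exp(lw_raw)))  # normalize on the log scale
stopifnot(isTRUE(all.equal(exp_log_sum_exp(lw), 1)))
p <- exp_log_sum_exp(lw[lw_raw < 0])  # partial sum over a subset of draws
stopifnot(p >= 0, p <= 1)
```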

Bayesplot includes `ppc_loo_pit_qq` for checking how uniform the LOO-PIT distribution is.

```{r}
ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
                           lw=weights(loonb$psis_object))) +
  geom_abline() +
  ylim(c(0,1))
```

We can use also the `pit_ecdf` plot to make the comparison to
posterior predictive PIT values easier.

```{r}
ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
                         lw = weights(loonb$psis_object)))
```

We can compare posterior-PIT and LOO-PIT directly

```{r}
yrep <- posterior_predict(stan_glmnb)
data.frame(pit=sort(pit(y = roaches$y, yrep = yrep)),
           loopit=sort(loo_pit(y = roaches$y, yrep = yrep,
                               lw = weights(loonb$psis_object)))) |>
  ggplot(aes(x=pit,y=loopit)) +
  geom_point() +
  geom_abline() +
  labs(x='Posterior-PIT',y='LOO-PIT')
```

There is not much difference as the number of observations is much
larger than the number of parameters, and the posterior predictive
distributions and LOO-predictive distributions are similar.

```{r}
## A=seq(0.01,0.99,by=0.01)
## loopit <- loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
##                          lw = weights(loonb$psis_object))
## data.frame(A=A,Aloo=sapply(A,\(a) {mean(loopit>=0.5-a/2 & loopit<=0.5+a/2)}))|>
##   ggplot(aes(x=A,y=Aloo))+
##   geom_point() +
##   labs(x='Nominal central coverage', y='Observed LOO coverage')+
##   geom_abline()
```


## Predictive relevance of covariates

Let's finally check that cross-validation model comparison agrees
on the relevance of the covariates.

```{r results='hide'}
stan_glmm1nb <- update(stan_glmm1p, family = neg_binomial_2)
stan_glmm2nb <- update(stan_glmm2p, family = neg_binomial_2)
stan_glmm3nb <- update(stan_glmm3p, family = neg_binomial_2)
```



```{r}
loo_compare(loo(stan_glmm1nb), loonb)
loo_compare(loo(stan_glmm2nb), loonb)
loo_compare(loo(stan_glmm3nb), loonb)
```

Covariate `sqrt_roach1` has a clear effect. The `treatment` effect
is visible in the marginal posterior, but as discussed in the
`betablockers` demo, cross-validation is not good at detecting weak
effects. Based on cross-validation, the `senior` covariate also has
a negligible effect on the predictive performance.

The conclusion from the analysis would then be that `treatment` is
likely to help, but it is difficult to predict the number of
roaches with or without `treatment`.

# Poisson model with varying intercepts

Sometimes overdispersion is modelled by adding varying intercepts
("random effects") for each individual. The following example
illustrates computational problems in this approach.
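Why a varying intercept produces overdispersion can be seen in a
quick simulation (made-up parameter values): a normal intercept
inside the log link makes the marginal count distribution much more
variable than a Poisson.

```{r}
set.seed(1)
z <- rnorm(1e5, 0, 1)            # varying intercepts
y_sim <- rpois(1e5, exp(1 + z))  # Poisson counts given the intercepts
c(mean = mean(y_sim), var = var(y_sim))  # variance much larger than mean
```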

Fit with `stan_glmer`

```{r results='hide'}
roaches$id <- 1:dim(roaches)[1]
stan_glmpr <- stan_glmer(y ~ sqrt_roach1 + treatment + senior + (1 | id), 
                         offset = log(exposure2),
                         data = roaches, family = poisson, 
                         prior = normal(0,2.5), prior_intercept = normal(0,5),
                         chains = 4, cores = 4, iter = 4000, 
                         seed = 1704009, refresh=0)
```


## Analyse posterior

Plot posterior

```{r}
mcmc_areas(as.matrix(stan_glmpr), prob_outer = .999,
    pars = c("sqrt_roach1","treatment","senior"))
```


The marginals are similar as with negative-binomial model except the
marginal for `senior` is clearly away from zero.

## Cross-validation checking

Let's check PSIS-LOO.

```{r}
(loopr <- loo(stan_glmpr, save_psis=TRUE))
```

`p_loo` is about 175, which is less than the number of parameters
(267), but it is relatively large compared to the number of
observations (`p_loo` > N/5), which indicates a very flexible
model. In this case this is due to having an intercept parameter
for each observation. Removing one observation changes the
posterior for that intercept so much that importance sampling fails
(even with Pareto smoothing). Note that WAIC would fail for the
same reason.

We can also plot Pareto $k$ values.

```{r}
plot(loopr)
```

We have a very large number of high $k$ values, which indicates a very flexible model.

While importance sampling in PSIS-LOO can fail for a
varying-parameter model, we can use $K$-fold-CV instead, re-fitting
the model 10 times, each time leaving out 10% of the
observations. This shows that cross-validation itself is not
infeasible for varying-parameter models.
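The fold structure itself is simple; `kfold()` handles both the
split and the refitting internally, but the assignment of
observations to folds can be sketched as:

```{r}
# each observation is assigned to one of K = 10 folds,
# and each fold is left out once
set.seed(1)
N <- 262  # number of apartments in the roaches data
K <- 10
folds <- sample(rep(1:K, length.out = N))
table(folds)  # roughly N/K observations per fold
stopifnot(length(folds) == N, all(sort(unique(folds)) == 1:K))
```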

```{r}
(kcvpr <- kfold(stan_glmpr, K=10))
```


The `loo` package allows comparing PSIS-LOO and $K$-fold-CV results

```{r}
loo_compare(loonb, kcvpr)
```


There is not much difference, and this difference could also be
explained by $K$-fold-CV using only 90% of observations for the
posteriors, while PSIS-LOO is using 99.6% of observations for the
posteriors. We can check this by running $K$-fold-CV also for
negative-binomial model.

```{r}
(kcvnb <- kfold(stan_glmnb, K=10))
loo_compare(kcvnb, kcvpr)
```


When both models are assessed using $K$-fold-CV, the difference in
predictive performance is very small. The models can still have
different predictive distributions.

Now that we've seen that based on robust $K$-fold-CV there is not
much difference between negative-binomial and varying-intercept-Poisson
models, we can also check how bad the comparison would have been
with PSIS-LOO.

```{r}
loo_compare(loonb, loopr)
```


If we had ignored the Pareto-$k$ warnings, we would have mistakenly
concluded that the varying intercept model is much better. Note
that WAIC is (as usual) even worse (see also @Vehtari+etal:PSIS-LOO:2017)

```{r}
loo_compare(waic(stan_glmnb), waic(stan_glmpr))
```


## Posterior predictive checking

We do posterior predictive checking for the varying intercept Poisson model.

```{r}
pp_check(stan_glmpr, plotfun = "dens_overlay", nreps=20) +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))
```

The match looks perfect, but that can be explained by the model
having one parameter for each observation, with the kernel density
estimate hiding the remaining problems. The PIT-ECDF plot shows
them.

```{r}
ppc_pit_ecdf(pit=pit(y = roaches$y, yrep = posterior_predict(stan_glmpr)))
```

There are too many PIT values near 0.5. If we look at the
predictive intervals and observations, we see that many of the
observations are in the middle of their posterior predictive
interval, which can be explained by the very flexible model with
one parameter for each observation.

```{r}
ppc_intervals(y = roaches$y, yrep = posterior_predict(stan_glmpr))
```

LOO-PIT's are not much better, because PSIS-LOO fails (remember the
high Pareto-$\hat{k}$'s)

```{r}
## pp_check(stan_glmpr, plotfun = "loo_pit_qq") +
##   geom_abline() +
##   ylim(c(0,1))
## ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmpr),
##                            lw=weights(loopr$psis_object))) +
##   geom_abline() +
##   ylim(c(0,1))
ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmpr),
                         lw=weights(loopr$psis_object)))
```

# Poisson model with varying intercept and integrated LOO

Removing one observation changes the posterior for that intercept
so much that importance sampling fails (even with Pareto
smoothing). We can get improved stability by integrating that 
intercept out with something more accurate than the importance
sampling [@Vehtari+etal:2016:LOO_for_GLVM].  If there is only one
group or individual specific parameter then we can integrate that
out easily with adaptive quadrature (2 parameters with 2D
quadrature would work, too, which could be implemented with nested
1D quadratures).
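The integrated likelihood for a single observation can be sketched
in R with numerical quadrature (a hypothetical observation and
made-up parameter values; the Stan model below uses `integrate_1d`
for the same computation):

```{r}
# p(y | eta, sigmaz) = integral of Poisson(y | exp(eta + z)) * N(z | 0, sigmaz) dz,
# with the varying intercept z integrated out
integrated_log_lik <- function(y, eta, sigmaz) {
  f <- function(z) dpois(y, exp(eta + z)) * dnorm(z, 0, sigmaz)
  log(integrate(f, lower = -Inf, upper = Inf)$value)
}
integrated_log_lik(y = 3, eta = 1, sigmaz = 1)
```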

As we can easily integrate out only 1 (or 2) parameters (per group
/ individual), it's not easy to make a generic approach for `rstanarm`
and brms, and thus I illustrate the approach with direct Stan model
code and `cmdstanr`.

The following Stan model uses individual specific intercept term
`z[i]` (with common prior with scale `sigmaz`). In the generated
quantities, the usual `log_lik` computation is replaced with
integrated approach. We also generate `y_loorep` which is the LOO
predictive distribution given other parameters than `z`. This is
needed to get the correct LOO predictive distributions when
combined with integrated PSIS-LOO.

```{r}
poisson_re_int <- "poisson_re_integrate.stan"
writeLines(readLines(poisson_re_int))
```


Compile the model, prepare the data, and sample. `integrate_1d_reltol`
sets the relative tolerance for the adaptive 1D quadrature function
`integrate_1d` (if with other model and data you see messages about
error estimate of integral exceeding the given relative tolerance
times norm of integral, you will get NaNs and need to increase the
relative tolerance).

```{r results='hide'}
modpri <- cmdstan_model(stan_file = poisson_re_int)
datap <- list(N = dim(roaches)[1],
              P = 3,
              offsett = log(roaches$exposure2),
              X = roaches[,c('sqrt_roach1','treatment','senior')],
              y = roaches$y,
              integrate_1d_reltol = 1e-6)
fitpri <- modpri$sample(data = datap, refresh = 0, chains = 4, parallel_chains = 4)
```


The posterior is similar to the varying intercept Poisson model
above (as it should be, since the generated quantities do not
affect the posterior).

```{r}
mcmc_areas(as_draws_matrix(fitpri$draws(variables=c('beta','sigmaz'))),
           prob_outer = .999)
```

Now the PSIS-LOO doesn't give warnings and the result is close to K-fold-CV.

```{r}
(loopri <- fitpri$loo(save_psis=TRUE))
```

Comparison to the negative-binomial model shows only a slight
improvement in predictive performance.

```{r}
loo_compare(fitpri=loopri, loonb)
```

Posterior predictive checking with a KDE looks very good, but that
can be explained by having one parameter for each observation, so
the posterior predictive check cannot be trusted.

```{r}
drm <- fitpri$draws(format='matrix')
y_rep <- subset_draws(drm,variable='y_rep')
pp_check(roaches$y, y_rep[sample(1:4000,20),], fun='dens_overlay') +
  scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))
```

The posterior predictive PIT-ECDFs are the same as for the varying
intercept Poisson model, but the LOO-PITs look good, as integrated
LOO helps.

```{r}
y_loorep <- subset_draws(drm,variable='y_loorep')
## pp_check(roaches$y, yrep=y_loorep, fun='loo_pit_qq', lw=weights(loopri$psis_object)) +
##   geom_abline() +
##   ylim(c(0,1))
## ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = y_loorep,
##                            lw=weights(loopri$psis_object))) +
##   geom_abline() +
##   ylim(c(0,1))
ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = y_loorep,
                         lw=weights(loopri$psis_object)))
```

# Zero-inflated negative-binomial model

As the proportion of zeros is quite high in the data, it is
worthwhile to test also a zero-inflated negative-binomial model,
which is a mixture of two models:

 - logistic regression to model the proportion of extra zero counts
 - negative-binomial model
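The mixture can be sketched as a probability mass function (a
hypothetical `dzinb` helper with made-up parameter values; `brms`
implements this internally):

```{r}
# a zero can come from the zero-inflation part (probability zi)
# or from the negative-binomial part
dzinb <- function(y, mu, shape, zi) {
  ifelse(y == 0,
         zi + (1 - zi) * dnbinom(0, mu = mu, size = shape),
         (1 - zi) * dnbinom(y, mu = mu, size = shape))
}
# the pmf sums to 1 and zero inflation adds mass at zero
stopifnot(isTRUE(all.equal(sum(dzinb(0:1000, mu = 5, shape = 1, zi = 0.3)), 1)))
stopifnot(dzinb(0, 5, 1, 0.3) > dnbinom(0, mu = 5, size = 1))
```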

We switch to `brms` as `rstanarm` doesn't support zero-inflated
negative-binomial model.

```{r results='hide'}
brm_glmzinb <-
  brm(bf(y ~ sqrt_roach1 + treatment + senior + offset(log(exposure2)),
         zi ~ sqrt_roach1 + treatment + senior + offset(log(exposure2))),
      family=zero_inflated_negbinomial(), data=roaches,
      prior=c(prior(normal(0,3), class='b'),
              prior(normal(0,3), class='b', dpar='zi'),
              prior(normal(0,3), class='Intercept', dpar='zi')),
      seed=1704009, refresh=1000)
```

Based on PSIS-LOO, zero-inflated negative-binomial is clearly better.

```{r}
brm_glmzinb <- add_criterion(brm_glmzinb, criterion='loo', save_psis=TRUE)
(loozinb <- loo(brm_glmzinb))
loo_compare(loonb, loozinb)
```

Posterior predictive checking looks good, but in most checks there
is no clear difference to the negative-binomial model.

```{r}
pp_check(brm_glmzinb, type='dens_overlay') +
    scale_x_sqrt(breaks=c(0,1,3,10,30,100,300), lim=c(0,400))

ppc_pit_ecdf(pit=pit(y = roaches$y, yrep = posterior_predict(brm_glmzinb)))
```

LOO-PIT-ECDF

```{r}
ppc_pit_ecdf(pit=loo_pit(y = roaches$y, yrep = posterior_predict(brm_glmzinb),
                         lw = weights(loozinb$psis_object)))
```

The proportion of zeros is similar, but has more variation than in
the negative-binomial model.

```{r}
pp_check(brm_glmzinb, type='stat', stat=\(y) mean(y==0))
```

There is a much bigger difference in the max count test
statistic. We first check the max count PPC for the
negative-binomial model.

```{r}
pp_check(stan_glmnb, plotfun = "stat", stat = "max") +
  scale_x_sqrt(breaks=c(100,300,1000,3000,10000,30000))
```


which shows that the negative-binomial model often predicts 10-100
times larger roach counts than seen in the data (30,000 roaches in
one trap is a lot).

The max count PPC for zero-inflated negative-binomial model

```{r}
pp_check(brm_glmzinb, type = "stat", stat = "max") +
  scale_x_sqrt(breaks=c(100,300,1000,3000,10000))
```


is much better, although the max counts can still be 10 times
bigger than the max count in the data.

The difference in the max counts can be explained by the plain
negative-binomial model needing a smaller shape parameter to model
the overdispersion caused by the zeros, which leads to a long tail
without observations. We can check the overdispersion parameters.

```{r}
# negative-binomial model 
mean(as_draws_rvars(stan_glmnb)$reciprocal_dispersion)
# zero-inflated negative-binomial model 
mean(as_draws_rvars(brm_glmzinb)$shape)
```

Although the models are different, with finite data and wide LOO
predictive distributions there is a limit to the differences that
can be seen in LOO-PIT values. Both the negative-binomial and the
zero-inflated negative-binomial are close enough that LOO-PIT can't
detect a discrepancy from the data, but elpd_loo is still able to
show that the zero-inflation component improves the predictive
accuracy.

## Analyse posterior

Plot posterior

```{r}
mcmc_areas(as.matrix(brm_glmzinb)[,3:8], prob_outer = .999)
```


The posterior marginals for the negative-binomial part are similar
to the marginals in the plain negative-binomial model. The marginal
effects for the logistic part have the opposite sign, as the
logistic part models the extra zeros.

The treatment effect is now divided between negative-binomial and
logistic part. We can use the model to make predictions for the
expected number of roaches given treatment and no-treatment.

Expectations of posterior predictive distributions given treatment=0 and treatment=1

```{r}
pred <- posterior_epred(brm_glmzinb, newdata=rbind(mutate(roaches, treatment=0),
                                                   mutate(roaches, treatment=1)))
```

Ratio of expected number of roaches with vs without treatment

```{r}
# columns 1:262 are predictions with treatment=0, columns 263:524 with treatment=1
ratio <- array(rowMeans(pred[,263:524]/pred[,1:262]), c(1000, 4, 1)) |>
  as_draws_df() |>
  set_variables(variables='ratio')
ratio |>
  ggplot(aes(x=ratio)) +
  stat_slab() +
  labs(x='Ratio of roaches with vs without treatment', y=NULL) +
  scale_y_continuous(breaks=NULL) +
  theme(axis.line.y=element_blank(),
        strip.text.y=element_blank()) +
  xlim(c(0,1)) +
  geom_vline(xintercept=1, linetype='dotted')
```

The treatment clearly reduces the expected number of roaches.

Make prior sensitivity analysis by power-scaling both prior and
likelihood.  Focus on the ratio of expected number of roaches
with vs without treatment.

```{r}
powerscale_sensitivity(brm_glmzinb, prediction = \(x, ...) ratio)$sensitivity |>
                         filter(variable=='ratio') |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
```


There is no prior sensitivity.

## Predictive relevance of covariates

Let's finally check cross-validation model comparison to see whether the improved model changes the conclusions about the predictive relevance of the covariates.

```{r results='hide'}
brm_glmm1zinb <-
  update(brm_glmzinb,
         formula=bf(y ~ treatment + senior + offset(log(exposure2)),
                    zi ~ treatment + senior + offset(log(exposure2))))
brm_glmm2zinb <-
  update(brm_glmzinb,
         formula=bf(y ~ sqrt_roach1 + senior + offset(log(exposure2)),
                    zi ~ sqrt_roach1 + senior + offset(log(exposure2))))
brm_glmm3zinb <-
  update(brm_glmzinb,
         formula=bf(y ~ sqrt_roach1 + treatment + offset(log(exposure2)),
                  zi ~ sqrt_roach1 + treatment + offset(log(exposure2))))
```
```{r}
loo_compare(loo(brm_glmm1zinb),loozinb)
loo_compare(loo(brm_glmm2zinb),loozinb)
loo_compare(loo(brm_glmm3zinb),loozinb)
```

Covariate `sqrt_roach1` has a clear effect. The `treatment` effect
improves the predictive performance with 95% probability. The
normal approximation can be trusted as 1) the number of
observations is larger than 100, 2) the elpd difference is bigger
than 4, 3) there are no clear outliers, and 4) the distribution of
the pointwise differences has finite variance, as tested with the
Pareto-$\hat{k}$ diagnostic
[@Sivula+etal:2020:loo_uncertainty;@Vehtari+etal:PSIS:2022].
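The 95% figure can be checked with the normal approximation, using
the elpd difference and standard error reported for the treatment
comparison (dropping `treatment` gave elpd_diff = -8.2 with se_diff
= 5.1 in the run shown above):

```{r}
# probability that including treatment improves elpd,
# under the normal approximation of the elpd difference
pnorm(8.2 / 5.1)  # about 0.95
```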

<br />

# References {.unnumbered}

<div id="refs"></div>

# Licenses {.unnumbered}

* Code &copy; 2017-2024, Aki Vehtari, licensed under BSD-3.
* Text &copy; 2017-2024, Aki Vehtari, licensed under CC-BY-NC 4.0.
* Parts of text and code &copy; 2017, Jonah Gabry and Ben Goodrich from [rstanarm vignette for count data](https://cran.r-project.org/web/packages/rstanarm/vignettes/count.html), licensed under GPL 3.

# Original Computing Environment {.unnumbered}


```{r}
sessionInfo()
```


<br />

