Nabiximols treatment efficiency

Author

Aki Vehtari

Published

2024-02-23

Modified

2024-05-08

1 Introduction

This notebook was inspired by a question by Llew Mills on the Stan Discourse forum about comparing continuous and discrete observation models. This question has been asked several times before, and an answer is included in the CV-FAQ, too. The CV-FAQ answer states

You can’t compare densities and probabilities directly. Thus you can’t compare models with continuous and discrete observation models, unless you compute probabilities in intervals from the continuous model (also known as discretising the continuous model).

The answer is correct in theory, but doesn’t tell how to do the discretization in practice. The first part of this notebook shows how easy that discretization can be in the special case of counts.
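As a minimal standalone preview of the idea (a made-up normal predictive density, not one of the models fitted below): when the outcomes are integers, the probability of the unit-width interval around an integer is approximately the density evaluated at that integer, because the bin width is 1.

y <- 12
mu <- 14; sigma <- 6   # made-up predictive mean and sd, for illustration only
p_interval <- pnorm(y + 0.5, mu, sigma) - pnorm(y - 0.5, mu, sigma)
d_midpoint <- dnorm(y, mu, sigma)   # density times bin width 1
c(p_interval = p_interval, d_midpoint = d_midpoint)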

Load packages

Code
library(tidyr)
library(dplyr)
library(tibble)
library(modelr)
library(loo)
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 4)
library(posterior)
options(posterior.num_args=list(digits=2))
library(pillar)
options(pillar.negative = FALSE)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans", base_size=14))
library(tidybayes)
library(ggdist)
library(tinytable)
options(tinytable_format_num_fmt = "significant_cell", tinytable_format_digits = 2, tinytable_tt_digits=2)
library(matrixStats)
library(reliabilitydiag)
library(priorsense)

2 Data

The data come from the study “Nabiximols for the Treatment of Cannabis Dependence: A Randomized Clinical Trial” by Lintzeris et al. (2019), and were posted in that Discourse thread by Mills (one of the co-authors). The data include 128 participants (id) in two groups (group = placebo or nabiximols).

Participants received 12-week treatment involving weekly clinical reviews, structured counseling, and flexible medication doses—up to 32 sprays daily (tetrahydrocannabinol, 86.4 mg, and cannabidiol, 80 mg), dispensed weekly.

Primary outcome was self-reported number of days using illicit cannabis during the 12-week period.

The number of cannabis use days (cu) during the previous 28 days (set) was asked after 0, 4, 8, and 12 weeks (week as a factor).

id <- factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 14, 15, 16, 16, 17, 18, 18, 18, 18, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 27, 27, 28, 28, 28, 28, 29, 30, 30, 30, 30, 31, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35, 35, 36, 36, 37, 37, 37, 37, 38, 39, 39, 39, 39, 40, 40, 40, 41, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 48, 48, 49, 49, 49, 50, 50, 50, 50, 51, 51, 51, 52, 52, 52, 52, 53, 53, 53, 53, 54, 54, 55, 55, 55, 55, 56, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 61, 61, 61, 62, 63, 63, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 68, 68, 68, 69, 69, 69, 69, 70, 70, 70, 70, 71, 71, 71, 71, 72, 73, 73, 73, 73, 74, 74, 74, 75, 76, 76, 76, 76, 77, 77, 77, 77, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 81, 81, 82, 82, 83, 83, 84, 84, 84, 85, 85, 85, 86, 86, 86, 86, 87, 87, 87, 87, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 91, 92, 92, 92, 92, 93, 93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 95, 96, 96, 96, 96, 97, 97, 97, 98, 98, 98, 98, 99, 99, 99, 99, 100, 101, 101, 101, 102, 102, 102, 102, 103, 103, 103, 103, 104, 104, 105, 105, 105, 105, 106, 106, 106, 106, 107, 107, 107, 107, 108, 108, 108, 108, 109, 109, 109, 109, 110, 110, 111, 111, 112, 112, 112, 112, 113, 113, 113, 113, 114, 115, 115, 115, 115, 116, 116, 116, 116, 117, 117, 117, 117, 118, 118, 119, 119, 119, 119, 120, 120, 120, 120, 121, 121, 121, 122, 123, 123, 123, 123, 124, 124, 124, 125, 125, 125, 125, 126, 126, 126, 126, 127, 127, 128))
group <- factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0),
                levels = 0:1,
                labels = c("placebo", "nabiximols"))
week <- factor(c(0, 4, 8, 12, 0, 4, 8, 0, 4, 8, 0, 4, 8, 12, 0, 4, 0, 4, 8, 12, 0, 4, 0, 4, 8, 12, 0, 4, 8, 0, 4, 8, 12, 0, 4, 8, 0, 4, 0, 0, 0, 0, 4, 0, 0, 4, 8, 12, 0, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 0, 4, 0, 4, 8, 12, 0, 0, 4, 8, 12, 0, 4, 0, 4, 8, 12, 0, 4, 8, 0, 4, 12, 0, 8, 0, 4, 0, 4, 8, 12, 0, 0, 4, 8, 12, 0, 4, 8, 0, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 12, 0, 4, 8, 12, 0, 4, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 8, 12, 0, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 12, 0, 0, 4, 0, 4, 8, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 0, 4, 8, 12, 0, 4, 8, 0, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 0, 4, 8, 0, 4, 8, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 0, 4, 8, 12, 0, 4, 8, 12, 0, 0, 4, 8, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 0, 4, 8, 12, 0, 4, 8, 12, 0, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 12, 0, 0, 4, 8, 12, 0, 4, 12, 0, 4, 8, 12, 0, 4, 8, 12, 0, 4, 0))
cu <- c(13, 12, 12, 12, 28, 0, NA, 16, 9, 2, 28, 28, 28, 28, 28, NA, 28, 28, 17, 28, 28, NA, 16, 0, 0, NA, 28, 28, 28, 28, 17, 0, NA, 28, 27, 28, 28, 26, 24, 28, 28, 28, 25, 28, 26, 28, 18, 16, 28, 28, 7, 0, 2, 28, 2, 4, 1, 28, 28, 16, 28, 28, 24, 26, 15, 28, 25, 17, 1, 8, 28, 24, 27, 28, 28, 28, 28, 28, 27, 28, 28, 28, 28, 20, 28, 28, 28, 28, 12, 28, NA, 17, 15, 14, 28, 0, 28, 28, 28, 0, 0, 0, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 21, 24, 28, 27, 28, 28, 26, NA, 28, NA, 20, 2, 3, 7, 28, 1, 19, 8, 21, 7, 28, 28, 20, 28, 28, 28, 24, 20, 17, 11, 25, 25, 28, 26, 28, 24, 17, 16, 27, 14, 28, 28, 28, 28, 28, 28, 14, 13, 4, 24, 28, 28, 28, 21, 28, 21, 26, 28, 28, 0, 0, 28, 23, 20, 28, 20, 16, 28, 28, 28, 10, 1, 1, 2, 28, 28, 28, 28, 18, 22, 9, 15, 28, 9, 1, 20, 18, 20, 24, 28, 28, 28, 19, 28, 28, 28, 28, 28, 28, 28, 28, 28, 4, 14, 20, 28, 28, 0, 0, 0, 28, 20, 9, 24, 28, 28, 28, 28, 28, 21, 28, 28, 14, 24, 28, 23, 0, 0, 0, 28, NA, 28, NA, 28, 15, NA, 12, 25, NA, 28, 2, 0, 0, 28, 10, 0, 0, 28, 0, 0, 0, 23, 0, 0, 0, 28, 0, 0, 0, 28, 0, 0, 0, 28, 2, 1, 0, 21, 14, 7, 8, 28, 28, 28, 0, 28, 28, 20, 18, 24, 0, 0, 0, 28, 15, NA, 28, 1, 1, 2, 28, 1, 0, 0, 28, 28, 14, 21, 25, 19, 16, 13, 28, 28, 28, 28, 28, 28, 28, 27, 19, 21, 18, 1, 0, 0, 28, 28, 28, 28, 28, 24, 27, 28, 18, 0, 3, 8, 28, 28, 28, 9, 20, 25, 20, 12, 19, 0, 0, 0, 27, 28, 0, 0, 0, 20, 17, 16, 14, 28, 7, 0, 1, 28, 24, 28, 25, 23, 20, 28, 14, 16, 7, 28, 28, 26, 28, 28, 26, 28, 28, 28, 24, 20, 28, 28, 28, 28, 28, 8, 6, 4, 28, 20, 28)
set <- rep(28, length(cu))
cu_df <- data.frame(id, group, week, cu, set)

I remove the rows with NAs as they are not used for the posteriors anyway.

cu_df <- cu_df |>
  drop_na(cu)

It’s good to visualize the data distribution. There is a clear change in the distribution going from week 0 to later weeks.

cu_df |> 
  ggplot(aes(x=cu)) +
  geom_histogram(breaks=seq(-.5,28.5,by=1)) +
  scale_x_continuous(breaks=c(0,10,20,28)) +
  facet_grid(group ~ week, switch="y", axes="all_x", labeller=label_both)+
  labs(y="")

3 Initial models

Mills provided two brms (Bürkner 2017) models with specified priors. The first one is a normal regression model with varying intercept for each participant (id).

fit_normal <- brm(formula = cu ~ group*week + (1 | id),
      data = cu_df,
      family = gaussian(),
          prior = c(prior(normal(14, 1.5), class = Intercept),
                    prior(normal(0, 11), class = b),
                    prior(cauchy(1,2), class = sd)),
          save_pars = save_pars(all=TRUE),
          seed = 1234,
          refresh = 0)
fit_normal <- add_criterion(fit_normal, criterion="loo", save_psis=TRUE)
Warning: Found 6 observations with a pareto_k > 0.7 in model 'fit_normal'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.

The second provided model is a binomial model with the number of trials being 28 for each outcome (cu).

fit_binomial <- brm(formula = cu | trials(set)  ~ group*week + (1 | id),
    data = cu_df,
    binomial(link = logit),
    prior = c(prior(normal(0, 1.5), class = Intercept),
                  prior(normal(0, 1), class = b),
                  prior(cauchy(0,2), class = sd)),
    save_pars = save_pars(all=TRUE),
        seed = 1234,
        refresh = 0)
fit_binomial <- add_criterion(fit_binomial, criterion="loo", save_psis=TRUE)
Warning: Found 110 observations with a pareto_k > 0.7 in model 'fit_binomial'.
It is recommended to set 'moment_match = TRUE' in order to perform moment
matching for problematic observations.

Mills compared the models using PSIS-LOO (Vehtari, Gelman, and Gabry 2017; Vehtari et al. 2024) and asked whether this comparison is valid:

loo_compare(fit_normal, fit_binomial)
Warning: Not all models have the same y variable. ('yhash' attributes do not
match)
             elpd_diff se_diff
fit_normal      0.0       0.0 
fit_binomial -307.6     118.6 

4 Comparison of discrete and continuous models

At this point I ignore the fact that there are many high Pareto-k values indicating unreliable PSIS-LOO estimates (Vehtari et al. 2022), and first discuss the comparison of predictive probabilities and predictive probability densities.

I pick id=2 and compute the posterior predictive probability of each possible outcome (0,1,2,\ldots,28). At this point we don’t need to consider LOO-CV issues, and can focus on the comparison of posterior predictive probabilities. We can use the log_lik() function to compute the log predictive probabilities. In the following plot the area of each bar is equal to the probability of the corresponding count outcome, and these probabilities sum to 1. The grey bar illustrates the posterior predictive probability of cu=12.

i<-2
cu_df_predi <- cu_df[i,] |> select(!cu) |> expand_grid(cu=0:28)
S<-4000
p <- exp(colLogSumExps(log_lik(fit_binomial,
                               newdata = cu_df_predi))-log(S))
predi <- data.frame(cu=0:28, p=p)
predi |>
  ggplot(aes(x=cu, y=p)) +
  geom_col(width=1,fill="white", color="blue") +
  geom_col(data=predi[cu[i]+1,],width=1,fill="gray", color="blue") +
  labs(y="Probability") +
  scale_x_continuous(breaks=c(0,10,20,28), minor_breaks = 0:28) +
  guides(x = guide_axis(minor.ticks = TRUE))

In case of the normal model, the observation model is continuous, and we compute posterior predictive densities. I again use log_lik(), now to compute log predictive densities. I compute the predictive density at cu=12 and the predictive density on a fine grid of other values (although the observation model is continuous, in practice we can evaluate the density only at a finite number of points). The following plot shows the predictive density and a grey line at cu=12, with the end of the line corresponding to the density at cu=12.

p1 <- exp(colLogSumExps(log_lik(fit_normal,
                               newdata = cu_df[i,]))-log(S))
cu_pred <- seq(-15,40,length.out=400)
cu_df_predi_2 <- cu_df[i,] |> select(!cu) |> expand_grid(cu=cu_pred)
p <- exp(colLogSumExps(log_lik(fit_normal,
                               newdata = cu_df_predi_2))-log(S))
predi <- data.frame(cu=cu_pred, p=p)
pn <- predi |>
  ggplot(aes(x=cu, y=p)) +
  geom_line(color="blue") +
  annotate(geom="segment", x=cu_df[i,'cu'], y=0, xend=cu_df[i,'cu'], yend=p1, color="gray") +
  labs(y="Density")+
  scale_x_continuous(breaks=c(0,10,20,28), minor_breaks = 0:28) +
  guides(x = guide_axis(minor.ticks = TRUE))
pn

We can’t compare probabilities and densities directly, but we can discretize the density to get probabilities. As the outcomes are integers (0,1,2,\ldots,28), we can compute probabilities for the intervals ((-0.5, 0.5), (0.5,1.5), (1.5,2.5),\ldots,(27.5,28.5)). The following plot shows vertical lines at the edges of these intervals.

p2 <- exp(colLogSumExps(log_lik(fit_normal,
                               newdata = cu_df[i,] |> select(!cu) |> expand_grid(cu=seq(-0.5,28.5,by=1))))-log(S))
pn +
  annotate(geom="segment",x=seq(-0.5,28.5,by=1),y=0,xend=seq(-0.5,28.5,by=1),yend=p2, color="blue") +
  scale_x_continuous(breaks=c(0,10,20,28), minor_breaks = 0:28, lim=c(-0.5,28.5))
Warning: Removed 190 rows containing missing values or values outside the scale range
(`geom_line()`).

We can then integrate the density over each interval. For example, integrating the predictive density from 11.5 to 12.5 gives the probability that 11.5 < cu < 12.5 (it doesn’t matter here whether the interval ends are open or closed). To simplify the computation, we approximate the density by the piecewise constant function shown in the next plot.

p3 <- exp(colLogSumExps(log_lik(fit_normal,
                               newdata = cu_df[i,] |> select(!cu) |> expand_grid(cu=seq(0,28,by=1))))-log(S))
pn <-
  predi |>
  ggplot(aes(x=cu, y=p)) +
  geom_line(color="blue", alpha=0.2) +
  annotate(geom="segment", x=cu_df[i,"cu"], y=0, xend=cu_df[i,"cu"], yend=p1, color="gray") +
  labs(y="Density")+
  scale_x_continuous(breaks=c(0,10,20,28), minor_breaks = 0:28, lim=c(-0.5,28.5)) +
  guides(x = guide_axis(minor.ticks = TRUE))
pn +
  annotate(geom="segment",x=rep(seq(-0.5,28.5,by=1),each=2)[2:59],y=0,xend=rep(seq(-0.5,28.5,by=1),each=2)[2:59],yend=rep(p3, each=2), color="blue") +
  annotate(geom="segment",x=rep(seq(-0.5,28.5,by=1),each=2)[1:58],y=rep(p3, each=2),xend=rep(seq(-0.5,28.5,by=1),each=2)[3:60],yend=rep(p3, each=2), color="blue")
Warning: Removed 190 rows containing missing values or values outside the scale range
(`geom_line()`).

Now the probability of each interval is approximated by the height times the width of the bar. The height is the density in the middle of the interval and the width of the bar is 1, and thus the probability value is the same as the density value! This is simple here because the counts are integers and the distance between counts is 1. The following plot shows the probability of cu=12.

pn +
  annotate(geom="segment",x=rep(seq(-0.5,28.5,by=1),each=2)[2:59],y=0,xend=rep(seq(-0.5,28.5,by=1),each=2)[2:59],yend=rep(p3, each=2), color="blue") +
  annotate(geom="segment",x=rep(seq(-0.5,28.5,by=1),each=2)[1:58],y=rep(p3, each=2),xend=rep(seq(-0.5,28.5,by=1),each=2)[3:60],yend=rep(p3, each=2), color="blue") +
  geom_col(data=data.frame(cu=12,p=p3[12+1]),width=1,fill="gray", color="blue")
Warning: Removed 190 rows containing missing values or values outside the scale range
(`geom_line()`).

Thus, in this case, the LOO-ELPD comparison is valid.

The normal model predictions are not constrained between 0 and 28 (or -0.5 and 28.5), and thus the probabilities for cu \in (0,1,2,\ldots,28) do not necessarily sum to 1.

sum(p3)
[1] 0.9549362

We could switch to a truncated normal, but for this data it would not work well, and as we have better options, I only show below how it could be specified and do not fit it.
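For reference only (not fitted or evaluated here, as noted above), brms supports truncation via the trunc() addition term, so the truncated-normal formula would look roughly like this:

formula_truncnormal <- bf(cu | trunc(lb = -0.5, ub = 28.5) ~ group*week + (1 | id))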

Before continuing with better models, I illustrate a case where the integration interval is not 1 and the comparison requires a bit more work. It is common with continuous targets to normalize the target to have zero mean and unit standard deviation.

cu_scaled_df <- cu_df |>
  mutate(cu_scaled = (cu - mean(cu)) / sd(cu))

We fit the normal model with scaled target.

fit_normal_scaled <- brm(formula = cu_scaled ~ group*week + (1 | id),
      data = cu_scaled_df,
      family = gaussian(),
          prior = c(prior(normal(0, 1.5), class = Intercept),
                    prior(normal(0, 11), class = b),
                    prior(cauchy(1,2), class = sd)),
          save_pars = save_pars(all=TRUE),
          seed = 1234,
          refresh = 0)
fit_normal_scaled <- add_criterion(fit_normal_scaled, criterion="loo")
Warning: Found 5 observations with a pareto_k > 0.7 in model
'fit_normal_scaled'. It is recommended to set 'moment_match = TRUE' in order to
perform moment matching for problematic observations.

Now the densities for the scaled target are much higher, and the scaled model seems to be much better.

loo_compare(fit_normal, fit_normal_scaled)
Warning: Not all models have the same y variable. ('yhash' attributes do not
match)
                  elpd_diff se_diff
fit_normal_scaled    0.0       0.0 
fit_normal        -915.1       0.8 

However, to make a fair comparison, we need to take into account the scaling we used. To get probabilities for the integer counts, the densities need to be multiplied by the discretization binwidth in the scaled units, which is 1/sd(cu) ≈ 0.093. Correspondingly, on the log scale we add \log(0.093) = -\log(sd(cu)) to each log-density, and the magnitude of the total adjustment is

nrow(cu_df)*log(sd(cu_df$cu))
[1] 914.2574

Subtracting this from the elpd_loo of the scaled-outcome model makes the two models have almost the same elpd_loo (the small difference comes from not scaling the priors).
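As a sketch of the adjusted comparison (using the loo objects computed above), we can subtract the total adjustment from the elpd_loo of the scaled model and compare it with the unscaled model; the remaining difference is small compared to the raw 915-point difference.

elpd_scaled   <- loo(fit_normal_scaled)$estimates["elpd_loo", "Estimate"]
elpd_unscaled <- loo(fit_normal)$estimates["elpd_loo", "Estimate"]
# difference after adjusting for the scaling
elpd_scaled - nrow(cu_df) * log(sd(cu_df$cu)) - elpd_unscaled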

5 Model checking

Back to the original models. The normal model was better than the binomial model in the LOO comparison, but that doesn’t mean it is a good model. Before trying to solve the high Pareto-k issues in LOO computation, we can use posterior predictive checking.

PIT-ECDF plot is useful for checking calibration (Säilynoja, Bürkner, and Vehtari 2022)

pp_check(fit_binomial, type="pit_ecdf")

We see that the binomial model has too many PIT values near 0 and 1, which indicates that the posterior predictive intervals are too narrow: the data are overdispersed compared to the binomial observation model, even with the varying intercept term (1 | id) included.

How about the normal model?

pp_check(fit_normal, type="pit_ecdf")

Not bad. There are fewer PIT values near 0 and 1 than expected, but this can be due to the double use of data in posterior predictive checking. We can get a more accurate PIT-ECDF plot by using LOO predictive distributions (even if we ignore the few high Pareto-k warnings).

At the moment, the fixed LOO-PIT for discrete outcomes is not yet in bayesplot, so I define the necessary functions here.

Code
loo_pit <- function(y, yrep, lw) {
  # Randomized LOO-PIT for discrete outcomes: for observation j, draw a value
  # uniformly between P(yrep < y) and P(yrep <= y), with the probabilities
  # computed from the PSIS log weights lw.
  pit <- vapply(seq_len(ncol(yrep)), function(j) {
    sel_min <- yrep[, j] < y[j]
    pit_min <- exp_log_sum_exp(lw[sel_min, j])
    sel_sup <- yrep[, j] == y[j]
    pit_sup <- pit_min + exp_log_sum_exp(lw[sel_sup, j])
    runif(1, pit_min, pit_sup)
  }, FUN.VALUE = 1)
  pmax(pmin(pit, 1), 0)
}
# Numerically stable sum of exponentiated (normalized) log weights
exp_log_sum_exp <- function(x) {
  m <- suppressWarnings(max(x))
  exp(m + log(sum(exp(x - m))))
}

LOO-PIT-ECDF for normal model.

ppc_pit_ecdf(pit=loo_pit(y = cu_df$cu,
                         yrep = posterior_predict(fit_normal),
                         lw = weights(loo(fit_normal)$psis_object))) +
  labs(x="LOO-PIT")

This looks quite good, which is a bit surprising considering the predictions are not constrained between 0 and 28.
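As a quick numerical check of this point (a sketch), we can compute the fraction of the normal model’s posterior predictive draws that fall outside the possible range from 0 to 28.

yrep_normal <- posterior_predict(fit_normal)
# proportion of predictive draws outside the possible range of counts
mean(yrep_normal < 0 | yrep_normal > 28)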

6 Model extension

The binomial model doesn’t have an overdispersion term, but the normal model does (the residual standard deviation can adapt). How about using the beta-binomial model, which is an overdispersed version of the binomial model?
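Before fitting it, here is a small base-R sketch (not part of the analysis) of what the extra dispersion looks like, assuming the mean-precision parameterization of the beta-binomial (alpha = mu*phi, beta = (1-mu)*phi, which is how I understand brms parameterizes it); the mu and phi values are made up for illustration.

dbetabinom <- function(k, n, mu, phi) {
  # beta-binomial pmf written via Beta functions
  a <- mu * phi
  b <- (1 - mu) * phi
  choose(n, k) * beta(k + a, n - k + b) / beta(a, b)
}
k <- 0:28
# same mean as a binomial with p = 0.5, but the probability mass is spread out
round(rbind(binomial = dbinom(k, 28, 0.5),
            betabinomial = dbetabinom(k, 28, 0.5, phi = 2)), 3)

We then fit the beta-binomial model with the same formula and priors as the binomial model.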

fit_betabinomial <- brm(formula = cu | trials(set)  ~ group*week + (1 | id),
    data = cu_df,
    beta_binomial(),
    prior = c(prior(normal(0, 1.5), class = Intercept),
                  prior(normal(0, 1), class = b),
                  prior(cauchy(0,2), class = sd)),
    save_pars = save_pars(all=TRUE),
        seed = 1234,
        refresh = 0)
fit_betabinomial <- add_criterion(fit_betabinomial, criterion="loo", save_psis=TRUE)
Warning: Found 18 observations with a pareto_k > 0.7 in model
'fit_betabinomial'. It is recommended to set 'moment_match = TRUE' in order to
perform moment matching for problematic observations.
loo_compare(fit_normal, fit_binomial, fit_betabinomial)
Warning: Not all models have the same y variable. ('yhash' attributes do not
match)

The beta-binomial model beats the normal model big time. The difference is so big that it is unlikely that fixing the Pareto-k issues would change the conclusion.

7 Improved LOO computation

Since in the case of high Pareto-k warnings the elpd estimate is usually optimistic, it is sufficient to use moment matching (Paananen et al. 2021) for the beta-binomial model: if its corrected (and likely lower) elpd_loo still clearly wins, the conclusion holds.

fit_betabinomial <- add_criterion(fit_betabinomial, criterion="loo",
                                  save_psis=TRUE,
                                  moment_match=TRUE,
                                  overwrite=TRUE)

Moment matching gets all Pareto-k values below the diagnostic threshold. There is negligible change in the comparison results.

loo_compare(fit_normal, fit_binomial, fit_betabinomial)
Warning: Not all models have the same y variable. ('yhash' attributes do not
match)
                 elpd_diff se_diff
fit_betabinomial    0.0       0.0 
fit_normal       -540.5      33.5 
fit_binomial     -848.1     112.0 

A truncated normal might get closer to the beta-binomial than the normal does, but as I said it had difficulties here, and since the beta-binomial is a proper discrete observation model, I continue with that.

8 Model checking

The PIT-ECDF calibration plot looks quite good, but there is a slight S-shape that might be explained by the double use of data in posterior predictive checking.

pp_check(fit_betabinomial, type="pit_ecdf")

The LOO-PIT-ECDF calibration looks very nice, and also better than for the normal model.

ppc_pit_ecdf(pit=loo_pit(y = cu_df$cu,
                         yrep = posterior_predict(fit_betabinomial),
                         lw = weights(loo(fit_betabinomial)$psis_object))) +
  labs(x="LOO-PIT")

The data include many counts of 0 and 28, and we can further check whether we might need a zero-inflation or a 28-inflation model component. As illustrated in the Roaches case study, PITs may sometimes be weak at detecting zero-inflation, but a reliability diagram (Dimitriadis, Gneiting, and Jordan 2021) can focus on that part of the predictive distribution in more detail.

Calibration check with reliability diagram for 0 vs others

th<-0
# LOO predictive probability of cu > th for each observation (E_loo of the indicator)
rd=reliabilitydiag(EMOS = pmin(E_loo(0+(posterior_predict(fit_betabinomial)>th),loo(fit_betabinomial)$psis_object)$value,1),
                   y = as.numeric(cu_df$cu>th))
autoplot(rd)+
  labs(x="Predicted probability of non-zero",
       y="Conditional event probabilities")+
  bayesplot::theme_default(base_family = "sans", base_size=14)

Calibration check with reliability diagram for 28 vs others

th<-27
rd=reliabilitydiag(EMOS = pmin(E_loo(0+(posterior_predict(fit_betabinomial)>th),loo(fit_betabinomial)$psis_object)$value,1),
                   y = as.numeric(cu_df$cu>th))
autoplot(rd)+
  labs(x="Predicted probability of 28",
       y="Conditional event probabilities")+
  bayesplot::theme_default(base_family = "sans", base_size=16)

Although the red line does not stay completely within the blue envelope, these look good.

9 Prior sensitivity analysis

The priors provided by Mills seem a bit narrow, so I do a prior-likelihood sensitivity analysis using the power-scaling approach (Kallioinen et al. 2023).

powerscale_sensitivity(fit_betabinomial,
                       variable=variables(as_draws(fit_betabinomial))[1:11]) |>
  tt() |>
  format_tt(num_fmt="decimal")
variable                  prior  likelihood  diagnosis
b_Intercept               0.19   0.59        prior-data conflict
b_groupnabiximols         0.07   0.05        prior-data conflict
b_week4                   0.18   0.37        prior-data conflict
b_week8                   0.18   0.43        prior-data conflict
b_week12                  0.17   0.36        prior-data conflict
b_groupnabiximols:week4   0.08   0.19        prior-data conflict
b_groupnabiximols:week8   0.09   0.13        prior-data conflict
b_groupnabiximols:week12  0.11   0.21        prior-data conflict
sd_id__Intercept          0.13   0.66        prior-data conflict
phi                       0.18   0.90        prior-data conflict
Intercept                 0.11   0.41        prior-data conflict

We have prior-data conflict for all global parameters. It is possible that the priors were chosen based on substantial prior information, in which case the posterior should be influenced by the prior, too, and the conflict is not necessarily a problem.
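As an optional visual check (a sketch only; I assume here that powerscale_plot_dens() accepts the brms fit directly, as in recent priorsense versions), we could look at how the posteriors of selected parameters move when the prior or the likelihood is power-scaled.

powerscale_plot_dens(fit_betabinomial,
                     variable = c("b_groupnabiximols", "sd_id__Intercept"))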

10 Alternative priors

However, I decided to try slightly wider priors, especially as the data seem to be quite informative.

fit_betabinomial2 <- brm(formula = cu | trials(set) ~ group*week + (1 | id),
    data = cu_df,
    beta_binomial(),
    prior = c(prior(normal(0, 3), class = Intercept),
                  prior(normal(0, 3), class = b),
                  prior(normal(0, 3), class = sd)),
    save_pars = save_pars(all=TRUE),
        seed = 1234,
        refresh = 0)

There are no prior-data conflicts.

powerscale_sensitivity(fit_betabinomial2,
                       variable=variables(as_draws(fit_betabinomial2))[1:11]) |>
  tt() |>
  format_tt(num_fmt="decimal")
variable                  prior  likelihood  diagnosis
b_Intercept               0.03   0.38        -
b_groupnabiximols         0.02   0.06        -
b_week4                   0.03   0.33        -
b_week8                   0.04   0.36        -
b_week12                  0.03   0.33        -
b_groupnabiximols:week4   0.02   0.10        -
b_groupnabiximols:week8   0.02   0.07        -
b_groupnabiximols:week12  0.02   0.11        -
sd_id__Intercept          0.03   0.47        -
phi                       0.05   0.73        -
Intercept                 0.03   0.32        -

We can also do LOO comparison.

fit_betabinomial2 <- add_criterion(fit_betabinomial2, criterion="loo", save_psis=TRUE, moment_match=TRUE, overwrite=TRUE)
loo_compare(fit_betabinomial, fit_betabinomial2)
                  elpd_diff se_diff
fit_betabinomial2  0.0       0.0   
fit_betabinomial  -1.6       2.6   

11 Model refinement

All the above models included the baseline week=0 in the interaction term with group, which does not make sense as the group should not affect the baseline. I modify the data and models by moving the baseline (week=0) cu values to a pre-treatment covariate.

cu_df_b <- cu_df |> filter(week != 0) |>
  mutate(week = droplevels(week))
cu_df_b <- left_join(cu_df_b,
                     cu_df |> filter(week == 0) |> select(id, cu),
                     by="id",
                     suffix=c("","_baseline"))
fit_betabinomial2b <- brm(formula = cu | trials(set) ~ group*week + cu_baseline + (1 | id),
    data = cu_df_b,
    beta_binomial(),
    prior = c(prior(normal(0, 3), class = Intercept),
                  prior(normal(0, 3), class = b)),
    save_pars = save_pars(all=TRUE),
        seed = 1234,
        refresh = 0)
fit_betabinomial2b <- add_criterion(fit_betabinomial2b, criterion="loo",
                                    save_psis=TRUE, moment_match=TRUE)
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Found 5 observations with a pareto_k > 0.7 in model
'fit_betabinomial2b'. We recommend to set 'reloo = TRUE' in order to calculate
the ELPD without the assumption that these observations are negligible. This
will refit the model 5 times to compute the ELPDs for the problematic
observations directly.

To compare the new model against the previous ones, we exclude the pointwise elpds for week 0.

loo2 <- loo(fit_betabinomial2)
loo2$pointwise <- loo2$pointwise[cu_df$week!="0",]
loo_compare(loo2, loo(fit_betabinomial2b))
Warning: Not all models have the same y variable. ('yhash' attributes do not
match)
                   elpd_diff se_diff
fit_betabinomial2b   0.0       0.0  
fit_betabinomial2  -34.3       9.9  

The quick fix of excluding week 0 does not change the yhash, so we get a warning that we can ignore. The new model is clearly better than the previous best model.

By comparing the posteriors of the phi parameter, we see that the overdispersion of the beta-binomial in the new model is smaller (smaller phi means bigger overdispersion).

as_draws_df(fit_betabinomial2) |>
  subset_draws(variable="phi") |>
  summarise_draws() |>
  tt()
variable  mean  median  sd    mad   q5   q95  rhat  ess_bulk  ess_tail
phi       1.7   1.7     0.24  0.24  1.3  2.1  1     3294      3433
as_draws_df(fit_betabinomial2b) |>
  subset_draws(variable="phi") |>
  summarise_draws() |>
  tt()
variable  mean  median  sd    mad   q5   q95  rhat  ess_bulk  ess_tail
phi       3.8   3.7     0.72  0.68  2.7  5.1  1     2038      2689
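To relate these phi values to the amount of overdispersion, a small sketch: assuming the mean-precision parameterization (alpha = mu*phi, beta = (1-mu)*phi), the beta-binomial variance is n*mu*(1-mu)*(phi+n)/(phi+1), so (phi+n)/(phi+1) is the variance inflation relative to a binomial with the same mean. Plugging in the posterior medians from above with n=28:

overdispersion_factor <- function(phi, n = 28) (phi + n) / (phi + 1)
# variance inflation relative to a binomial with the same mean
c(fit_betabinomial2 = overdispersion_factor(1.7),
  fit_betabinomial2b = overdispersion_factor(3.7))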

12 Model checking

LOO-PIT-ECDF plot looks good:

ppc_pit_ecdf(pit=loo_pit(y = cu_df_b$cu,
                         yrep = posterior_predict(fit_betabinomial2b),
                         lw = weights(loo(fit_betabinomial2b)$psis_object))) +
  labs(x="LOO-PIT")

Calibration checks with reliability diagrams for 0 vs others and 28 vs others look better than for the previous model.

th<-0
rd=reliabilitydiag(EMOS = pmin(E_loo(0+(posterior_predict(fit_betabinomial2b)>th),loo(fit_betabinomial2b)$psis_object)$value,1),
                   y = as.numeric(cu_df_b$cu>th))
autoplot(rd)+
  labs(x="Predicted probability of non-zero",
       y="Conditional event probabilities")+
  bayesplot::theme_default(base_family = "sans", base_size=16)

th<-27
rd=reliabilitydiag(EMOS = pmin(E_loo(0+(posterior_predict(fit_betabinomial2b)>th),loo(fit_betabinomial2b)$psis_object)$value,1),
                   y = as.numeric(cu_df_b$cu>th))
autoplot(rd)+
  labs(x="Predicted probability of 28",
       y="Conditional event probabilities")+
  bayesplot::theme_default(base_family = "sans", base_size=16)

13 Treatment effect

As the treatment is included in an interaction term, looking at a univariate marginal posterior is not sufficient for checking whether the treatment helps.

We can examine the effect of the treatment by looking at the posterior predictive distribution and the expectation of the posterior predictive distribution for a new person (new id). We use tidybayes (Kay 2023) and ggdist (Kay 2024).

We start by examining the posterior predictive distribution. We predict cu for a new id=129 given cu_baseline=28 (the median value). The distribution is wide as it includes the aleatoric uncertainty from the unknown id-specific intercept and the beta-binomial distribution.

cu_df_b |>
  data_grid(group, week, cu_baseline=28, id=129, set=28) |>
  add_predicted_draws(fit_betabinomial2b, allow_new_levels=TRUE) |>
  ggplot(aes(x = .prediction)) +
  facet_grid(group ~ week, switch = "y", axes="all_x", labeller = label_both) +
  stat_dotsinterval(quantiles=100)+
  coord_cartesian(expand = FALSE, clip="off") +
  theme(strip.background = element_blank(), strip.placement = "outside") +
  labs(x="", y="") +  
  scale_x_continuous(breaks=c(0,10,20,28), lim=c(-0.5,28.5)) +
  theme(axis.line.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line.x=element_blank())

The difference between the groups is biggest in week 12. The next plot shows the comparison of the predicted cu between the groups in different weeks.

cu_df_b |>
  data_grid(group, week, cu_baseline=28, id=129, set=28) |>
  add_predicted_draws(fit_betabinomial2b, allow_new_levels=TRUE) |>
  compare_levels(.prediction, by=group) |>
  ggplot(aes(x = .prediction)) +
  facet_grid(. ~ week, switch = "y", axes="all_x", labeller = label_both) +
  stat_dotsinterval(quantiles=100)+
  coord_cartesian(expand = FALSE, clip="off") +
  theme(strip.background = element_blank(), strip.placement = "outside") +
  labs(x="Difference in cu given placebo vs Nabiximols", y="") +
  geom_vline(xintercept=0, linetype="dashed") +
  theme(axis.line.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line.x=element_blank())

For a new individual, the posterior predictive distribution indicates a 90% probability of smaller cu with Nabiximols than with placebo.
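For reference, a sketch of how that probability can be computed from the predicted draws, mirroring the expectation-based computation further below; restricting to week=12 and counting only strictly negative differences are my choices here, so the exact number may differ slightly.

cu_df_b |>
  data_grid(group, week=12, cu_baseline=28, id=129, set=28) |>
  add_predicted_draws(fit_betabinomial2b, allow_new_levels=TRUE) |>
  compare_levels(.prediction, by=group) |>
  mutate(p=as.numeric(.prediction<0)) |>
  pull("p") |>
  mean()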

The following plot shows the expectation of the posterior predictive distribution of cu for a new id=129 given cu_baseline=28 (expectation as in an average over many new individuals). The distribution is much narrower as it includes only the epistemic uncertainty, not the aleatoric uncertainty (from the unknown id-specific intercept and the beta-binomial distribution).

cu_df_b |>
  data_grid(group, week, cu_baseline=28, id=129, set=28) |>
  add_epred_draws(fit_betabinomial2b, re_formula=NA, allow_new_levels=TRUE) |>
  ggplot(aes(x = .epred)) +
  facet_grid(group ~ week, switch = "y", axes="all_x", labeller = label_both) +
  stat_dotsinterval(quantiles=100)+
  coord_cartesian(expand = FALSE, clip="off") +
  theme(strip.background = element_blank(), strip.placement = "outside") +
  labs(x="", y="") +  
  scale_x_continuous(breaks=c(0,10,20,28), lim=c(-0.5,28.5)) +
  ylim(c(0, 1)) +
  theme(axis.line.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line.x=element_blank())

The next plot shows the comparison of expected cu between the groups in different weeks.

cu_df_b |>
  data_grid(group, week, cu_baseline=28, id=129, set=28) |>
  add_epred_draws(fit_betabinomial2b, re_formula=NA, allow_new_levels=TRUE) |>
  compare_levels(.epred, by=group) |>
  ggplot(aes(x = .epred)) +
  facet_grid(. ~ week, switch = "y", axes="all_x", labeller = label_both) +
  stat_dotsinterval(quantiles=100)+
  coord_cartesian(expand = FALSE, clip="off") +
  theme(strip.background = element_blank(), strip.placement = "outside") +
  labs(x="Difference in expected cu given placebo vs Nabiximols", y="") +  
  geom_vline(xintercept=0, linetype="dashed") +
  theme(axis.line.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line.x=element_blank())

For new individuals, the posterior indicates a 99% probability of smaller expected cu with Nabiximols than with placebo.

cu_df_b |>
  data_grid(group, week=12, cu_baseline=28, id=129, set=28) |>
  add_epred_draws(fit_betabinomial2b, re_formula=NA, allow_new_levels=TRUE) |>
  compare_levels(.epred, by=group) |>
  mutate(p=as.numeric(.epred<0)) |>
  pull("p") |>
  mean()
[1] 0.989

14 Sensitivity to model choice

Finally, we can check how much the conclusions would differ if we had used the continuous normal model or the underdispersed binomial model.

The normal model underestimates the magnitude of the change and is overconfident, with a much narrower posterior. In this case the conclusion that Nabiximols is beneficial is the same, but a too narrow posterior might in some other case lead to an overoptimistic conclusion.

cu_df |>
  data_grid(group, week, id=129, set=28) |>
  filter(week!=0) |>
  add_epred_draws(fit_normal, re_formula=NA, allow_new_levels=TRUE) |>
  compare_levels(.epred, by=group) |>
  ggplot(aes(x = .epred)) +
  facet_grid(. ~ week, switch = "y", axes="all_x", labeller = label_both) +
  stat_dotsinterval(quantiles=100)+
  coord_cartesian(expand = FALSE, clip="off") +
  theme(strip.background = element_blank(), strip.placement = "outside") +
  labs(x="Difference in expected cu given placebo vs Nabiximols", y="") +  
  geom_vline(xintercept=0, linetype="dashed") +
  theme(axis.line.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line.x=element_blank())

The binomial model is also overconfident, with a much narrower posterior. Again the conclusion that Nabiximols is beneficial is the same, but a too narrow posterior might in some other case lead to an overoptimistic conclusion.

cu_df |>
  data_grid(group, week, id=129, set=28) |>
  filter(week!=0) |>
  add_epred_draws(fit_binomial, re_formula=NA, allow_new_levels=TRUE) |>
  compare_levels(.epred, by=group) |>
  ggplot(aes(x = .epred)) +
  facet_grid(. ~ week, switch = "y", axes="all_x", labeller = label_both) +
  stat_dotsinterval(quantiles=100)+
  coord_cartesian(expand = FALSE, clip="off") +
  theme(strip.background = element_blank(), strip.placement = "outside") +
  labs(x="Difference in expected cu given placebo vs Nabiximols", y="") +  
  geom_vline(xintercept=0, linetype="dashed") +
  theme(axis.line.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line.x=element_blank())

15 Posterior vs posterior predictive distribution

Above, posterior predictive checking, LOO predictive checking, and LOO-ELPD comparisons were important for model checking and comparison, and only after I trusted that there was no significant discrepancy between the model predictions and the data did I look at the treatment effect. I used LOO model comparison to assess whether the more elaborate models provided better predictions, but I did not use it above for assessing the predictive relevance of the treatment effect itself.

Let’s now build the model without the group variable.

fit_betabinomial3b <- brm(cu | trials(set) ~ week + cu_baseline + (1 | id),
    data = cu_df_b,
    beta_binomial(),
    prior = c(prior(normal(0, 9), class = Intercept),
                  prior(normal(0, 3), class = b)),
    save_pars = save_pars(all=TRUE),
        seed = 1234,
        refresh = 0)
Running MCMC with 4 parallel chains...

Chain 4 finished in 2.8 seconds.
Chain 2 finished in 3.1 seconds.
Chain 3 finished in 4.6 seconds.
Chain 1 finished in 4.8 seconds.

All 4 chains finished successfully.
Mean chain execution time: 3.8 seconds.
Total execution time: 4.8 seconds.
fit_betabinomial3b <- add_criterion(fit_betabinomial3b, criterion="loo", save_psis=TRUE, moment_match=TRUE)
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Found 3 observations with a pareto_k > 0.7 in model
'fit_betabinomial3b'. We recommend to set 'reloo = TRUE' in order to calculate
the ELPD without the assumption that these observations are negligible. This
will refit the model 3 times to compute the ELPDs for the problematic
observations directly.

And compare the models with and without group variable:

loo_compare(fit_betabinomial2b, fit_betabinomial3b)
                   elpd_diff se_diff
fit_betabinomial3b  0.0       0.0   
fit_betabinomial2b -1.0       2.7   

There is no difference in predictive performance. This is probably due to

1. the actual effect not being very far from 0,
2. the aleatoric uncertainty (modelled by the beta-binomial) for a new
   4-week period being big, that is, the predictive distribution is very
   wide and, due to the constrained range, also has thick tails (actually
   a U shape), which makes the log score less sensitive to tail values
   than usual.

I don’t usually use Bayes factors for many reasons, but for completeness I show here that the Bayes factor estimated with bridgesampling (Gronau, Singmann, and Wagenmakers 2020) doesn’t see a difference between the models with and without treatment either.

(br2b <- bridgesampling::bridge_sampler(fit_betabinomial2b, silent=TRUE))
Bridge sampling estimate of the log marginal likelihood: -607.2811
Estimate obtained in 10 iteration(s) via method "normal".
(br3b <- bridgesampling::bridge_sampler(fit_betabinomial3b, silent=TRUE))
Bridge sampling estimate of the log marginal likelihood: -606.3994
Estimate obtained in 8 iteration(s) via method "normal".
bridgesampling::bf(br2b, br3b)
Estimated Bayes factor in favor of br2b over br3b: 0.41407

As the predictive distribution is wide with thick tails, we can also focus on comparing the absolute error when the means of the predictive distributions are used as point estimates. This approach has the benefit that we can improve the accuracy from the finite MCMC sample by dropping the sampling from the varying-intercept population distribution and from the beta-binomial distribution (using posterior_epred(..., re_formula=NA)). E_loo() is used to go from the posterior draws to the mean of the leave-one-out predictive distribution.

ae2 <- abs(cu_df_b$cu - E_loo(posterior_epred(fit_betabinomial2b, re_formula=NA),
                              loo(fit_betabinomial2b)$psis_object, type="mean")$value)
ae3 <- abs(cu_df_b$cu - E_loo(posterior_epred(fit_betabinomial3b, re_formula=NA),
                              loo(fit_betabinomial3b)$psis_object, type="mean")$value)

The probability that the leave-one-out predictive absolute error is smaller with model 2b (with the group variable) than with model 3b (without the group variable), using a normal approximation:

pnorm(0, mean(ae2-ae3), sd(ae2-ae3)/sqrt(257)) |> round(2)
[1] 0.96

and using the Bayesian bootstrap (as the distribution of the differences is far from normal):

mean(bayesboot::bayesboot(ae2-ae3, mean)$V1 < 0) |> round(2)
[1] 0.95


References

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
Dimitriadis, Timo, Tilmann Gneiting, and Alexander I. Jordan. 2021. “Stable Reliability Diagrams for Probabilistic Classifiers.” Proceedings of the National Academy of Sciences 118 (8): e2016191118.
Gronau, Quentin F., Henrik Singmann, and Eric-Jan Wagenmakers. 2020. “bridgesampling: An R Package for Estimating Normalizing Constants.” Journal of Statistical Software 92 (10): 1–29. https://doi.org/10.18637/jss.v092.i10.
Kallioinen, Noa, Topi Paananen, Paul-Christian Bürkner, and Aki Vehtari. 2023. “Detecting and Diagnosing Prior and Likelihood Sensitivity with Power-Scaling.” Statistics and Computing 34 (1): 57.
Kay, Matthew. 2023. tidybayes: Tidy Data and Geoms for Bayesian Models. https://doi.org/10.5281/zenodo.1308151.
———. 2024. ggdist: Visualizations of Distributions and Uncertainty. https://doi.org/10.5281/zenodo.3879620.
Lintzeris, Nicholas, Anjali Bhardwaj, Llewellyn Mills, Adrian Dunlop, Jan Copeland, Iain McGregor, Raimondo Bruno, et al. 2019. “Nabiximols for the Treatment of Cannabis Dependence: A Randomized Clinical Trial.” JAMA Internal Medicine 179 (9): 1242–53.
Paananen, Topi, Juho Piironen, Paul-Christian Bürkner, and Aki Vehtari. 2021. “Implicitly Adaptive Importance Sampling.” Statistics and Computing 31 (16).
Säilynoja, Teemu, Paul-Christian Bürkner, and Aki Vehtari. 2022. “Graphical Test for Discrete Uniformity and Its Applications in Goodness-of-Fit Evaluation and Multiple Sample Comparison.” Statistics and Computing 32: 1573–375.
Vehtari, Aki, Jonah Gabry, Mans Magnusson, Yuling Yao, Paul-Christian Bürkner, Topi Paananen, and Andrew Gelman. 2024. “Loo: Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models.” https://mc-stan.org/loo/.
Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32.
Vehtari, Aki, Daniel Simpson, Andrew Gelman, Yuling Yao, and Jonah Gabry. 2022. “Pareto Smoothed Importance Sampling.” arXiv Preprint arXiv:1507.02646. https://arxiv.org/abs/1507.02646v6.


Licenses

  • Code © 2024, Aki Vehtari, licensed under BSD-3.
  • Text © 2024, Aki Vehtari, licensed under CC-BY-NC 4.0.