Default prior for negative-binomial shape parameter

Author

Aki Vehtari

Published

2024-03-15

Modified

2024-04-22

library(tibble)
library(ggplot2)
theme_set(bayesplot::theme_default(base_family = "sans"))

1 Introduction

The current default prior in brms for negative-binomial shape parameter is gamma(0.01, 0.01). This strongly favoring smaller shape values. The purpose of this notebook is to present an alternative default prior.

2 Penalised complexity priors

Simpson et al. (2017) developed penalised complexity (PC) prior approach that is useful for constructing reasonable default priors for nested models. Negative-binomial family is over-dispersed compared to Poisson but includes Poisson as a special case. The prior is set on the distance of negative-binomial from the Poisson. I don’t review PC prior approach in this notebook, but I recommend starting by reading Dan Simpson’s blog post on PC priors.

3 PC prior for negative-binomial

Dan provides PC prior for negative-binomial shape parameter \alpha.

pc_alpha <- function(alpha, lambda) {
  lambda / alpha^2 * abs(trigamma(1/alpha) - alpha)/
    sqrt(2*log(1/alpha) - 2*digamma(1/alpha))*
    exp(-lambda*sqrt(2*log(1/alpha) - 2*digamma(1/alpha)))
}
dat <- tibble(alpha = seq(0.0001, 10, length.out = 10000),
              pc_alpha = pc_alpha(alpha, lambda=1))
dat |>
  ggplot(aes(x = alpha, y = pc_alpha)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  labs(y="density")

But, brms uses negative-binomial with shape parameter \phi=1/\alpha. The transformed prior is p_\phi(\phi) = |J|p_{\alpha(1/) = ^{-2} p_{}(1/)$

pc_phi <- function(phi, lambda) phi^(-2) * pc_alpha(1/phi, lambda)

The following plot shows the PC prior in red, and the current default gamma(0.01,0.01) in black.

dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       current = dgamma(phi,0.01,0.01))
dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=current), color = "black") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  scale_y_continuous(lim=c(0,0.55)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")
Warning: Removed 75 rows containing missing values or values outside the scale range
(`geom_line()`).

Above I used \lambda=1. The lambda parameter of PC prior affects the location of the prior, and looking at this plot \lambda=1 looks just fine with 80% central interval being about (0.3, 100).

Testing a few distributions available in Stan, inverse-gamma is quite a good approximation. Base R doesn’t have inverse-gamma distribution, but it is easy to define (using the Stan parameterization

dinv_gamma <- function(y, a, b) b^a / gamma(a) * y^(-a-1)*exp(-b/y)

We find good parameter values by minimizing KL divergence from the PC prior to the inverse-gamma.

klfun <- \(th) integrate(\(phi) pc_phi(phi,1)*(log(pc_phi(phi,1))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value
optim(c(0.4,.3),klfun)$par |> round(2)
[1] 0.41 0.29

For simplicity these are further rounded to (a=0.4, b=0.3) with only a small increase in KL divergence.

dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       invgamma = dinv_gamma(phi, .4, .3))
dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=invgamma), color = "red", linetype="dashed") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")

For comparison with \lambda=1/2, 80% central interval is about (0.1, 20), and the inverse-gamma distribution minimizing KL (approximately) has parameters (.4, .09)

klfun <- \(th) integrate(\(phi) pc_phi(phi,1/2)*(log(pc_phi(phi,1/2))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value
optim(c(0.4,.3),klfun)$par |> round(2)
[1] 0.41 0.09
dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       pc_phi_half =  pc_phi(phi, lambda=1/2),
       invgamma = dinv_gamma(phi, .4, .3),
       invgamma_half = dinv_gamma(phi, .4, .09))

dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=pc_phi_half), color = "purple") +
  geom_line(aes(y=invgamma), color = "red", linetype="dashed") +
  geom_line(aes(y=invgamma_half), color = "purple", linetype="dashed") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")

Based on these, I recommend inv_gamma(.4,.3) which approaximates PC(1) prior as the default prior for negative-binomial shape parameter \phi.

References

Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G. Martins, and Sigrunn H. Sørbye. 2017. Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors.” Statistical Science 32 (1): 1–28.