Default prior for negative-binomial shape parameter

Author

Aki Vehtari

Published

2024-03-15

Modified

2025-11-15

library(tibble)
library(ggplot2)
theme_set(bayesplot::theme_default(base_family = "sans"))

1 Introduction

The default prior in brms for negative-binomial shape parameter used to be gamma(0.01, 0.01), which strongly favors smaller shape values. The purpose of this notebook was to present an alternative default prior. Based on this notebook the default prior in brms is now inv_gamm(.4,.3).

2 Penalised complexity priors

Simpson et al. (2017) developed penalised complexity (PC) prior approach that is useful for constructing reasonable default priors for nested models. Negative-binomial family is over-dispersed compared to Poisson but includes Poisson as a special case. The prior is set on the distance of negative-binomial from the Poisson. I don’t review PC prior approach in this notebook, but I recommend starting by reading Dan Simpson’s blog post on PC priors.

3 PC prior for negative-binomial

Dan provides PC prior for negative-binomial shape parameter \alpha.

pc_alpha <- function(alpha, lambda) {
  lambda / alpha^2 * abs(trigamma(1/alpha) - alpha)/
    sqrt(2*log(1/alpha) - 2*digamma(1/alpha))*
    exp(-lambda*sqrt(2*log(1/alpha) - 2*digamma(1/alpha)))
}
dat <- tibble(alpha = seq(0.0001, 10, length.out = 10000),
              pc_alpha = pc_alpha(alpha, lambda=1))
dat |>
  ggplot(aes(x = alpha, y = pc_alpha)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  labs(y="density")

But, brms uses negative-binomial with shape parameter \phi=1/\alpha. The transformed prior is p_\phi(\phi) = |J|p_{\alpha(1/) = ^{-2} p_{}(1/)$

pc_phi <- function(phi, lambda) phi^(-2) * pc_alpha(1/phi, lambda)

The following plot shows the PC prior in red, and the current default gamma(0.01,0.01) in black.

dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       current = dgamma(phi,0.01,0.01))
dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=current), color = "black") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  scale_y_continuous(lim=c(0,0.55)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")
Warning: Removed 75 rows containing missing values or values outside the scale range
(`geom_line()`).

Above I used \lambda=1. The lambda parameter of PC prior affects the location of the prior, and looking at this plot \lambda=1 looks just fine with 80% central interval being about (0.3, 100).

Testing a few distributions available in Stan, inverse-gamma is quite a good approximation. Base R doesn’t have inverse-gamma distribution, but it is easy to define (using the Stan parameterization

dinv_gamma <- function(y, a, b) b^a / gamma(a) * y^(-a-1)*exp(-b/y)

We find good parameter values by minimizing KL divergence from the PC prior to the inverse-gamma.

klfun <- \(th) integrate(\(phi) pc_phi(phi,1)*(log(pc_phi(phi,1))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value
optim(c(0.4,.3),klfun)$par |> round(2)
[1] 0.41 0.29

For simplicity these are further rounded to (a=0.4, b=0.3) with only a small increase in KL divergence.

dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       invgamma = dinv_gamma(phi, .4, .3))
dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=invgamma), color = "red", linetype="dashed") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")

For comparison with \lambda=1/2, 80% central interval is about (0.1, 20), and the inverse-gamma distribution minimizing KL (approximately) has parameters (.4, .09)

klfun <- \(th) integrate(\(phi) pc_phi(phi,1/2)*(log(pc_phi(phi,1/2))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value
optim(c(0.4,.3),klfun)$par |> round(2)
[1] 0.41 0.09
dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       pc_phi_half =  pc_phi(phi, lambda=1/2),
       invgamma = dinv_gamma(phi, .4, .3),
       invgamma_half = dinv_gamma(phi, .4, .09))

dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=pc_phi_half), color = "purple") +
  geom_line(aes(y=invgamma), color = "red", linetype="dashed") +
  geom_line(aes(y=invgamma_half), color = "purple", linetype="dashed") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")

Based on these, I recommended inv_gamma(.4,.3) which approaximates PC(1) prior as the default prior for negative-binomial shape parameter \phi. brms is now using this prior.

References

Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G. Martins, and Sigrunn H. Sørbye. 2017. Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors.” Statistical Science 32 (1): 1–28.