Default prior for negative-binomial shape parameter

Author

Aki Vehtari

Published

2024-03-15

Modified

2024-04-22

library(tibble)
library(ggplot2)
theme_set(bayesplot::theme_default(base_family = "sans"))

1 Introduction

The current default prior in brms for negative-binomial shape parameter is gamma(0.01, 0.01). This strongly favoring smaller shape values. The purpose of this notebook is to present an alternative default prior.

2 Penalised complexity priors

Simpson et al. (2017) developed penalised complexity (PC) prior approach that is useful for constructing reasonable default priors for nested models. Negative-binomial family is over-dispersed compared to Poisson but includes Poisson as a special case. The prior is set on the distance of negative-binomial from the Poisson. I don’t review PC prior approach in this notebook, but I recommend starting by reading Dan Simpson’s blog post on PC priors.

3 PC prior for negative-binomial

Dan provides PC prior for negative-binomial shape parameter \alpha.

pc_alpha <- function(alpha, lambda) {
  lambda / alpha^2 * abs(trigamma(1/alpha) - alpha)/
    sqrt(2*log(1/alpha) - 2*digamma(1/alpha))*
    exp(-lambda*sqrt(2*log(1/alpha) - 2*digamma(1/alpha)))
}
dat <- tibble(alpha = seq(0.0001, 10, length.out = 10000),
              pc_alpha = pc_alpha(alpha, lambda=1))
dat |>
  ggplot(aes(x = alpha, y = pc_alpha)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  labs(y="density")

But, brms uses negative-binomial with shape parameter \phi=1/\alpha. The transformed prior is p_\phi(\phi) = |J|p_{\alpha(1/) = ^{-2} p_{}(1/)$

pc_phi <- function(phi, lambda) phi^(-2) * pc_alpha(1/phi, lambda)

The following plot shows the PC prior in red, and the current default gamma(0.01,0.01) in black.

dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       current = dgamma(phi,0.01,0.01))
dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=current), color = "black") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  scale_y_continuous(lim=c(0,0.55)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")

Warning: Removed 75 rows containing missing values or values outside the scale range
(`geom_line()`).

Above I used \lambda=1. The lambda parameter of PC prior affects the location of the prior, and looking at this plot \lambda=1 looks just fine with 80% central interval being about (0.3, 100).

Testing a few distributions available in Stan, inverse-gamma is quite a good approximation. Base R doesn’t have inverse-gamma distribution, but it is easy to define (using the Stan parameterization

dinv_gamma <- function(y, a, b) b^a / gamma(a) * y^(-a-1)*exp(-b/y)

We find good parameter values by minimizing KL divergence from the PC prior to the inverse-gamma.

klfun <- \(th) integrate(\(phi) pc_phi(phi,1)*(log(pc_phi(phi,1))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value
optim(c(0.4,.3),klfun)$par |> round(2)

[1] 0.41 0.29

For simplicity these are further rounded to (a=0.4, b=0.3) with only a small increase in KL divergence.

dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       invgamma = dinv_gamma(phi, .4, .3))
dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=invgamma), color = "red", linetype="dashed") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")

For comparison with \lambda=1/2, 80% central interval is about (0.1, 20), and the inverse-gamma distribution minimizing KL (approximately) has parameters (.4, .09)

klfun <- \(th) integrate(\(phi) pc_phi(phi,1/2)*(log(pc_phi(phi,1/2))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value
optim(c(0.4,.3),klfun)$par |> round(2)

[1] 0.41 0.09

dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)),
       pc_phi =  pc_phi(phi, lambda=1),
       pc_phi_half =  pc_phi(phi, lambda=1/2),
       invgamma = dinv_gamma(phi, .4, .3),
       invgamma_half = dinv_gamma(phi, .4, .09))

dat |>
  ggplot(aes(x = phi, y = pc_phi)) + 
  geom_hline(yintercept=0, alpha=0.3) +
  geom_vline(xintercept=0, alpha=0.3) +
  geom_line(color = "red") +
  geom_line(aes(y=pc_phi_half), color = "purple") +
  geom_line(aes(y=invgamma), color = "red", linetype="dashed") +
  geom_line(aes(y=invgamma_half), color = "purple", linetype="dashed") +
  scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) +
  guides(x = guide_axis(minor.ticks = TRUE)) +
  labs(y="density")

Based on these, I recommend inv_gamma(.4,.3) which approaximates PC(1) prior as the default prior for negative-binomial shape parameter \phi.

References

Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G. Martins, and Sigrunn H. Sørbye. 2017. “Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors.” Statistical Science 32 (1): 1–28.

#' --- #' title: "Default prior for negative-binomial shape parameter" #' author: "Aki Vehtari" #' date: 2024-03-15 #' date-modified: today #' date-format: iso #' format: #' html: #' toc: true #' toc-location: left #' toc-depth: 2 #' number-sections: true #' smooth-scroll: true #' theme: readable #' code-copy: true #' code-download: true #' code-tools: true #' embed-resources: true #' anchor-sections: true #' html-math-method: katex #' bibliography: ../casestudies.bib #' --- library(tibble) library(ggplot2) theme_set(bayesplot::theme_default(base_family = "sans")) #' # Introduction #' #' The current default prior in brms for negative-binomial shape #' parameter is gamma(0.01, 0.01). This strongly favoring smaller #' shape values. The purpose of this notebook is to present an #' alternative default prior. #' #' # Penalised complexity priors #' #' @Simpson+etal:2017:PC_prior developed penalised complexity (PC) prior #' approach that is useful for constructing reasonable default #' priors for nested models. Negative-binomial family is #' over-dispersed compared to Poisson but includes Poisson as a #' special case. The prior is set on the distance of #' negative-binomial from the Poisson. I don't review PC prior #' approach in this notebook, but I recommend starting by reading #' [Dan Simpson's blog post on PC #' priors](https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html). #' #' # PC prior for negative-binomial #' #' Dan provides PC prior for negative-binomial shape parameter #' $\alpha$. pc_alpha <- function(alpha, lambda) { lambda / alpha^2 * abs(trigamma(1/alpha) - alpha)/ sqrt(2*log(1/alpha) - 2*digamma(1/alpha))* exp(-lambda*sqrt(2*log(1/alpha) - 2*digamma(1/alpha))) } dat <- tibble(alpha = seq(0.0001, 10, length.out = 10000), pc_alpha = pc_alpha(alpha, lambda=1)) dat |> ggplot(aes(x = alpha, y = pc_alpha)) + geom_hline(yintercept=0, alpha=0.3) + geom_vline(xintercept=0, alpha=0.3) + geom_line(color = "red") + labs(y="density") #' But, brms uses negative-binomial with shape parameter $\phi=1/\alpha$. #' The transformed prior is #' $p_\phi(\phi) = |J|p_{\alpha$(1/\phi) = \phi^{-2} p_{\alpha}(1/\phi)$ pc_phi <- function(phi, lambda) phi^(-2) * pc_alpha(1/phi, lambda) #' The following plot shows the PC prior in red, and the current #' default gamma(0.01,0.01) in black. dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)), pc_phi = pc_phi(phi, lambda=1), current = dgamma(phi,0.01,0.01)) dat |> ggplot(aes(x = phi, y = pc_phi)) + geom_hline(yintercept=0, alpha=0.3) + geom_vline(xintercept=0, alpha=0.3) + geom_line(color = "red") + geom_line(aes(y=current), color = "black") + scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) + scale_y_continuous(lim=c(0,0.55)) + guides(x = guide_axis(minor.ticks = TRUE)) + labs(y="density") #' Above I used $\lambda=1$. The lambda parameter of PC prior affects the #' location of the prior, and looking at this plot $\lambda=1$ looks just #' fine with 80% central interval being about (0.3, 100). #' #' Testing a few distributions available in Stan, inverse-gamma is #' quite a good approximation. Base R doesn't have inverse-gamma #' distribution, but it is easy to define (using [the Stan parameterization](https://mc-stan.org/docs/functions-reference/positive_continuous_distributions.html#inverse-gamma-distribution) dinv_gamma <- function(y, a, b) b^a / gamma(a) * y^(-a-1)*exp(-b/y) #' We find good parameter values by minimizing KL divergence from the #' PC prior to the inverse-gamma. klfun <- \(th) integrate(\(phi) pc_phi(phi,1)*(log(pc_phi(phi,1))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value optim(c(0.4,.3),klfun)$par |> round(2) #' For simplicity these are further rounded to (a=0.4, b=0.3) with #' only a small increase in KL divergence. dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)), pc_phi = pc_phi(phi, lambda=1), invgamma = dinv_gamma(phi, .4, .3)) dat |> ggplot(aes(x = phi, y = pc_phi)) + geom_hline(yintercept=0, alpha=0.3) + geom_vline(xintercept=0, alpha=0.3) + geom_line(color = "red") + geom_line(aes(y=invgamma), color = "red", linetype="dashed") + scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) + guides(x = guide_axis(minor.ticks = TRUE)) + labs(y="density") #' For comparison with $\lambda=1/2$, 80% central interval is about (0.1, #' 20), and the inverse-gamma distribution minimizing KL #' (approximately) has parameters (.4, .09) klfun <- \(th) integrate(\(phi) pc_phi(phi,1/2)*(log(pc_phi(phi,1/2))-log(dinv_gamma(phi, th[1],th[2]))), lower=5e-3, upper=Inf)$value optim(c(0.4,.3),klfun)$par |> round(2) dat <- tibble(phi = exp(seq(log(0.01), log(10), length.out = 1000)), pc_phi = pc_phi(phi, lambda=1), pc_phi_half = pc_phi(phi, lambda=1/2), invgamma = dinv_gamma(phi, .4, .3), invgamma_half = dinv_gamma(phi, .4, .09)) dat |> ggplot(aes(x = phi, y = pc_phi)) + geom_hline(yintercept=0, alpha=0.3) + geom_vline(xintercept=0, alpha=0.3) + geom_line(color = "red") + geom_line(aes(y=pc_phi_half), color = "purple") + geom_line(aes(y=invgamma), color = "red", linetype="dashed") + geom_line(aes(y=invgamma_half), color = "purple", linetype="dashed") + scale_x_continuous(breaks=0:10, minor_breaks=seq(0,28,by=0.5)) + guides(x = guide_axis(minor.ticks = TRUE)) + labs(y="density") #' Based on these, I recommend inv_gamma(.4,.3) which approaximates #' PC(1) prior as the default prior for negative-binomial shape #' parameter $\phi$.