set.seed(2024)
lambda <- 2
nu <- 5

L <- chol(matrix(c(1, 0.5, 0.5, 1), nrow = 2))
z <- matrix(rnorm(1000), ncol = 2) %*% L
y <- pnorm(z)
y[,1] <- qexp(y[,1], rate = lambda)
y[,2] <- qchisq(y[,2], df = nu)
Copulas
An Introduction to Copula Modelling
A common goal is the modelling of dependence between outcomes; however, this is typically only straightforward to implement when the outcomes are Gaussian or Student-T distributed.
An alternative, more flexible approach is to model the dependence structure separately from the marginal distributions. This is the idea behind copula modelling.
Copula Functions
A copula function allows for the construction of a multivariate cumulative distribution function (CDF) from a set of univariate CDFs (for each marginal), with no requirement that the marginals are Gaussian or Student-T distributed - or that they all have the same distribution.
Given a set of outcomes \(x_1, ..., x_m\), with marginal CDFs \(F_1, ..., F_m\), and the results of applying these CDFs to the outcomes \(u_1 = F_1(x_1), ..., u_m = F_m(x_m)\), the copula function \(C\) is defined as:
\[ C(u | \Gamma) = G_m(G^{-1}(u_1), ..., G^{-1}(u_m)|\Gamma) \] Where \(G_m\) is an \(m\)-variate CDF for a given distribution, \(G^{-1}\) is the univariate quantile function (inverse-CDF) for that distribution, and \(\Gamma\) is a correlation matrix.
The most popular implementation of this is the Gaussian copula:
\[ C(u | \Gamma) = \Phi_m(\Phi^{-1}(u_1), ..., \Phi^{-1}(u_m)|\Gamma) \] Where \(\Phi_m\) is the \(m\)-variate CDF for the standard normal distribution, and \(\Phi^{-1}\) is the univariate quantile function for the standard normal distribution.
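For example, a minimal sketch of this definition in R (assuming the mvtnorm package is available; the values are purely illustrative) evaluates the bivariate Gaussian copula function at a pair of uniform values:
# Evaluate C(u | Gamma) = Phi_2(Phi^-1(u1), Phi^-1(u2) | Gamma)
gamma_ex <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
u_ex <- c(0.3, 0.8)
mvtnorm::pmvnorm(upper = qnorm(u_ex), corr = gamma_ex)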
Interpreting Dependence in Copula Functions
Note that the correlation matrix \(\Gamma\) can no longer be interpreted on the scale of the observed outcomes. Instead, these correlations are on the scale of the copula function. For example, when a Gaussian copula is used with continuous marginals, the estimated correlation is the linear correlation between the normal scores:
\[ \Gamma_{1,2} = \text{Corr}(\Phi^{-1}(F_1(x_1)), \Phi^{-1}(F_2(x_2))) \]
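As a quick illustration (assuming the simulated data y and the data-generating parameters lambda = 2 and nu = 5 from the code above), this normal-scores correlation can be computed directly:
# Correlation between normal scores, using the true marginal CDFs
cor(qnorm(pexp(y[,1], rate = 2)), qnorm(pchisq(y[,2], df = 5)))
# close to the 0.5 used to generate the data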
When a Gaussian copula is used with a discrete marginal, the correlation is with a normally-distributed latent variate. For example, with two ordinal outcomes, the correlation would be interpreted as a polychoric correlation coefficient. This can also be used as a motivating reason for using the Gaussian copula, as opposed to other forms, as the estimated correlations have existing analogues in the literature with known interpretations.
From Copula Functions to Density Functions
With the copula function defined, the joint density function with all continuous marginals is given as:
\[ f(x) = c(u | \Gamma) \prod_{i=1}^{m} f_i(x_i) \] Where \(c(\cdot)\) is the density function for the chosen copula function and \(f_i\) is the marginal density function for the \(i\)-th outcome.
Gaussian Copula Density
For a Gaussian copula, the density function is given as:
\[ c(u | \Gamma) = |\Gamma|^{-1/2} \exp\left(\frac{q^T\left(I_m - \Gamma^{-1}\right)q}{2}\right) \] \[ = |\Gamma|^{-1/2} \exp\left(\frac{q^Tq}{2} - \frac{q^T\Gamma^{-1}q}{2}\right) \]
Where \(q = \Phi^{-1}(u)\).
Note that when the outcomes are independent (i.e., \(\Gamma = I\)), or there is only a single outcome, the joint density function simplifies to the product of the marginal density functions.
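This is easy to verify numerically; a small sketch (assuming the mvtnorm package, with illustrative values) shows the copula density evaluating to 1 under independence:
# With Gamma = I the Gaussian copula density is 1, so the joint density
# reduces to the product of the marginal densities
q_ex <- qnorm(c(0.3, 0.8))
exp(mvtnorm::dmvnorm(q_ex, sigma = diag(2), log = TRUE) - sum(dnorm(q_ex, log = TRUE)))
# evaluates to 1 (up to floating-point error)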
Multivariate Standard Normal Density and Change-of-Variables
The form of the Gaussian copula density above is also now amenable to being expressed using a multivariate normal. Where the multivariate standard normal density is given as:
\[ \textrm{MVN}(q | 0, \Gamma) = \left(2\pi\right)^{-m/2} |\Gamma|^{-1/2} \exp\left(-\frac{q^T\Gamma^{-1}q}{2}\right) \]
The Gaussian copula density can be expressed as a multivariate standard-normal density multiplied by an adjustment term:
\[ c(u | \Gamma) = \textrm{MVN}(q | 0, \Gamma) \cdot \exp\left(\frac{q^Tq}{2}\right) \cdot \left(2\pi\right)^{m/2} \]
The adjustment term above is equal to the reciprocal of the standard-normal density evaluated at \(q\), giving the final form of the Gaussian copula density:
\[ c(u | \Gamma) = \textrm{MVN}(q | 0, \Gamma) \cdot \prod_{i=1}^m\textrm{N}(q_i | 0, 1)^{-1} \]
This form also makes intuitive sense. The Gaussian copula is specifying a multivariate normal density over a transformation of the inputs (\(q = \Phi^{-1}(u)\)), so we need to adjust the density by the absolute derivative of the transformation to account for this. The derivative of the standard normal quantile function \(\Phi^{-1}(\cdot)\) is the reciprocal of the standard normal density evaluated at \(q\), so we simply multiply the multivariate normal density by the product of these reciprocals.
This also provides other benefits, as we can simply delegate to existing implementations of these density functions which have already received significant optimisation and testing.
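As a quick check of the derivation (assuming the mvtnorm package; the values below are purely illustrative), the closed-form expression above and the multivariate-normal form give the same density:
gamma_ex <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
q_ex <- qnorm(c(0.3, 0.8))
# Closed-form Gaussian copula density
closed_form <- det(gamma_ex)^(-0.5) *
  exp(t(q_ex) %*% (diag(2) - solve(gamma_ex)) %*% q_ex / 2)
# Multivariate normal density multiplied by the reciprocal standard-normal densities
mvn_form <- mvtnorm::dmvnorm(q_ex, sigma = gamma_ex) / prod(dnorm(q_ex))
c(closed_form, mvn_form) # both give the same value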
Example: Fitting Bivariate Gaussian Copula
To illustrate, we can fit a bivariate Gaussian copula to the generated data from the previous example to see whether we can recover the data-generating parameters. We will first show how to estimate these using simple maximum likelihood estimation in R, and then using a Bayesian approach.
To refresh, we have an exponentially-distributed outcome \(y_1\) with rate parameter \(\lambda\), a chi-squared-distributed outcome \(y_2\) with degrees of freedom \(\nu\), and we are modelling the dependence between these outcomes using a Gaussian copula with correlation \(\rho\):
\[ y_1 \sim \textrm{Exp}(\lambda) \]
\[ y_2 \sim \chi^2(\nu) \]
\[ \begin{bmatrix}F_1(y_1) \\ F_2(y_2) \end{bmatrix} \sim \textrm{c}_{\Phi} \left( \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right) \] Where \(F_i(\cdot)\) is the CDF for the \(i\)-th marginal distribution, and \(\textrm{c}_{\Phi}(\cdot)\) is the Gaussian copula density.
Maximum Likelihood Estimation
To estimate these parameters via maximum-likelihood in R, we need to define a function that takes a vector of parameter values and returns the joint log-likelihood. It is also useful to define the function such that the input parameters can take on any value, and then transform these to the appropriate scale within the function (i.e., \(\rho \in \left(-1,1\right)\)). Given that we will then be using transformations of the parameters to define the log-likelihood, we need to add a correction factor to account for this. For more information and examples of different transformations & corrections, see this Stan Documentation Page.
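As a small sketch of the idea for the correlation parameter (illustrative value only), tanh() maps an unconstrained value to \((-1, 1)\), and the corresponding log-Jacobian correction is \(\log(1 - \rho^2)\); the adjustments term in the log-likelihood function below applies exactly these corrections:
x_unconstrained <- 0.7
rho_ex <- tanh(x_unconstrained)
log_jacobian <- log1p(-rho_ex^2)
c(rho = rho_ex, log_jacobian = log_jacobian)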
First, we will define a function to calculate the Gaussian copula density for a given set of uniform variates \(u\) and correlation matrix \(\Gamma\):
library(mvtnorm)

dgauss_copula <- function(u, gamma, log = TRUE) {
  # Map uniform variates to normal scores
  q <- qnorm(u)
  ll <- sum(dmvnorm(q, sigma = gamma, log = TRUE)) - sum(dnorm(q, log = TRUE))
  ifelse(isTRUE(log), ll, exp(ll))
}
# Define the log-likelihood function
ll_function <- function(parameters, data_y) {
  # Constrain input parameters to valid values for distributions
  lambda <- exp(parameters[1])
  nu <- exp(parameters[2])
  rho <- tanh(parameters[3])

  adjustments <- sum(parameters[1:2]) + log1p(-(rho*rho))

  marginal_log_lik <- sum(dexp(data_y[,1], rate = lambda, log = TRUE)) +
    sum(dchisq(data_y[,2], df = nu, log = TRUE))

  # Apply marginal CDFs to data
  u <- data_y
  u[,1] <- pexp(data_y[,1], rate = lambda)
  u[,2] <- pchisq(data_y[,2], df = nu)

  copula_log_lik <- dgauss_copula(u, matrix(c(1, rho, rho, 1), nrow = 2),
                                  log = TRUE)

  # Return joint log-likelihood with adjustments for change-of-variables
  marginal_log_lik + copula_log_lik + adjustments
}
However, rather than returning the log-likelihood itself, we need to return the negative log-likelihood. This is because the optimisation functions in R are minimisation functions. To do this, we’ll simply create a ‘wrapper’ function which takes an arbitrary function as input and returns a function which itself returns the negative of the input function’s output:
neg_ll_function <- function(f) {
  function(...) { -f(...) }
}
Next we can use the optim function to estimate the parameters:
result <- optim(
  par = c(0, 0, 0), # initial values for parameters
  fn = neg_ll_function(ll_function),
  data_y = y
)
And extract and transform them to the appropriate scale, showing that the recovery was successful:
c(
  "lambda" = exp(result$par[1]),
  "nu" = exp(result$par[2]),
  "rho" = tanh(result$par[3])
) |> round(2)
lambda nu rho
1.96 4.91 0.54
Bayesian Estimation
We can also estimate these parameters using a Bayesian approach. This can be done in two ways: by using our existing ll_function()
implementation for sampling, or by writing a Stan model to estimate the parameters.
Sampling from an R Function
Estimating the posterior distributions for parameters of an R function can be done quite simply using the StanEstimators
R package:
if (!requireNamespace("StanEstimators", quietly = TRUE)) {
  remotes::install_github("andrjohns/StanEstimators")
}
library(StanEstimators)
suppressPackageStartupMessages(library(posterior))
This will allow us to use algorithms implemented by Stan to estimate parameters for arbitrary R functions. To use full Bayesian sampling with the No-U-Turn Sampler (NUTS) algorithm, we can use the stan_sample()
function:
# Record execution time for later comparison
sampling_time <- system.time({
  results_stan <- stan_sample(fn = ll_function,
                              par_inits = c(0,0,0),
                              additional_args = list(data_y = y),
                              parallel_chains = 4,
                              quiet = TRUE)
})
Next, we can use the posterior
package to extract the posterior draws for the parameters and transform them to the appropriate scale before summarising:
results_stan@draws |>
  mutate_variables(lambda = exp(`pars[1]`), nu = exp(`pars[2]`), rho = tanh(`pars[3]`)) |>
  subset_draws(variable=c("lambda", "nu", "rho")) |>
  summarise_draws()
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 1.96 1.96 0.0862 0.0868 1.82 2.10 1.00 2786. 2937.
2 nu 4.92 4.92 0.121 0.120 4.72 5.12 1.00 2750. 2828.
3 rho 0.536 0.537 0.0281 0.0282 0.488 0.582 1.00 2736. 2600.
We can see that estimates are consistent with their maximum likelihood counterparts, and the narrow credible intervals and high effective sample sizes indicate that we can have confidence in the estimated values.
However, full Bayesian sampling from an R function can be computationally expensive and inefficient. This is especially true in this case where we have not also provided a function to calculate the gradients for each parameter. Without this, StanEstimators
has to fall back to using finite differences to estimate the gradients, which can be slow and inaccurate.
Approximate Bayesian Inference with an R Function
One alternative, before moving to using a Stan model directly, is to use the Pathfinder algorithm for approximate Bayesian inference. As the name implies, this allows us to approximate the results from full Bayesian sampling with dramatically reduced computational cost:
path_time <- system.time({
  results_path <- stan_pathfinder(fn = ll_function,
                                  par_inits = c(0,0,0),
                                  additional_args = list(data_y = y),
                                  quiet = TRUE,
                                  # Return same number of approximate draws as sampling
                                  num_psis_draws = 4000)
})
We can see that this provided posterior estimates consistent with sampling:
results_path@draws |>
  mutate_variables(lambda = exp(`pars[1]`), nu = exp(`pars[2]`), rho = tanh(`pars[3]`)) |>
  subset_draws(variable=c("lambda", "nu", "rho")) |>
  summarise_draws()
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 1.96 1.96 0.0884 0.0885 1.82 2.11 1.00 3678. 3910.
2 nu 4.92 4.92 0.128 0.128 4.71 5.13 1.00 4012. 3914.
3 rho 0.536 0.537 0.0293 0.0290 0.489 0.584 1.00 4224. 4126.
While the execution time was orders of magnitude faster:
c("Stan" = sampling_time[3], "Pathfinder" = path_time[3])
Stan.elapsed Pathfinder.elapsed
77.174 1.763
For more efficient Bayesian sampling, especially with larger or more complex models, it is recommended to use a Stan model directly.
Stan Model
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
}
data {
int<lower=0> N;
matrix[N, 2] Y;
}
parameters {
real<lower=0> lambda;
real<lower=0> nu;
cholesky_factor_corr[2] rho_chol;
}
model {
matrix[N, 2] u;
for (n in 1:N) {
  u[n, 1] = exponential_cdf(Y[n, 1] | lambda);
  u[n, 2] = chi_square_cdf(Y[n, 2] | nu);
}

Y[, 1] ~ exponential(lambda);
Y[, 2] ~ chi_square(nu);
u ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
real rho = multiply_lower_tri_self_transpose(rho_chol)[1, 2];
}
We will be using the cmdstanr
package for our model fitting:
if (!requireNamespace("cmdstanr", quietly = TRUE)) {
  remotes::install_github("stan-dev/cmdstanr")
  cmdstanr::check_cmdstan_toolchain(fix = TRUE)
}
library(cmdstanr)
First, we create a model object from the above Stan code, which has been saved to a file named copula.stan:
copula_mod <- cmdstan_model("copula.stan")
Then we can use the compiled model object to sample our parameters:
copula_fit <- copula_mod$sample(data = list(N = nrow(y), Y = y),
                                parallel_chains = 4)
We can see that the results are consistent with the estimates from our StanEstimators
implementation:
copula_fit$summary(variables = c("lambda", "nu", "rho"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 1.96 1.96 0.0853 0.0853 1.83 2.11 1.00 2618. 2637.
2 nu 4.91 4.91 0.121 0.122 4.72 5.12 1.00 2623. 2903.
3 rho 0.537 0.538 0.0284 0.0286 0.490 0.583 1.00 3086. 2756.
But the time taken to fit the model is significantly faster:
copula_fit$time()$total
[1] 5.402146
Discrete Marginals - Probability Mass via Copula Mass Functions
Direct Estimation: Marginalising Over Uniforms
While the use of copulas is attractive for continuous marginals, their application to discrete marginals is less straightforward. It was possible to analytically define the density function for continuous marginals because there was a unique, one-to-one mapping from the observed marginal scale (\(x_m\)) to the copula function scale (\(q = \Phi^{-1}(u)\)). However, this is not the case for discrete marginals, as there is now a range of \(u_m\) values that map to the same \(x_m\) value.
For discrete marginals, we are instead estimating a probability mass function (PMF): integrating the copula density over all values of \(u_m\) that map to the observed \(x_m\) value.
Note that the PMF for a given univariate discrete distribution can be expressed as:
\[ f(x) = F(x) - F(x^-) \] Where \(x^-\) denotes the previous value of \(x\) (with \(F(x^-) = 0\) when \(x\) is the smallest value in the support).
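This identity is easy to check for a univariate discrete distribution, for example a Poisson(5) evaluated at \(x = 3\):
dpois(3, lambda = 5)
ppois(3, lambda = 5) - ppois(2, lambda = 5) # identical to the PMF above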
This is extended to the multivariate case by summing, for each \(x_m\), the difference between the copula function evaluated at \(x_m\) and \(x_m^-\), holding all other \(x_i\) constant. In other words, we need to integrate the uniform \(u_m\) variable out of the likelihood. This is expressed as:
\[ f(x) = \sum_{j_1=1}^2 \cdot\cdot\cdot \sum_{j_m=1}^2 (-1)^{j_1 + \cdot\cdot\cdot + j_m} C(u_{1,j_1}, \cdot\cdot\cdot, u_{m,j_m} | \Gamma) \] Where \(u_{j,1}\) = \(F_j(x_j^-)\) and \(u_{j,2}\) = \(F_j(x_j)\).
Note that this process is markedly more computationally costly than the continuous case, particularly as the number of outcomes increases - as an additional dimension of integration is added for each outcome. Additionally, evaluating the density for copula with discrete marginals now requires the evaluation of the \(m\)-variate CDF, where the continuous case only required the evaluation of the \(m\)-variate density. While there are existing implementations for the multivariate Gaussian and Student-T CDFs, these require numerical integration and so are markedly more costly than the corresponding density functions.
Using the bivariate case as an example:
\[ f(x_1, x_2) = \sum_{j_1=1}^2 \sum_{j_2=1}^2 (-1)^{j_1 + j_2} C(u_{1,j_1}, u_{2,j_2} | \Gamma) \]
\[ = (-1)^{1+1} C(u_{1,1}, u_{2,1} | \Gamma) \]
\[ + (-1)^{1+2} C(u_{1,1}, u_{2,2} | \Gamma) \]
\[ + (-1)^{2+1} C(u_{1,2}, u_{2,1} | \Gamma) \]
\[ + (-1)^{2+2} C(u_{1,2}, u_{2,2} | \Gamma) \]
Which is more simply expressed as:
\[ f(x_1, x_2) = \] \[ C(F_1(x_1^-), F_2(x_2^-) | \Gamma) \\ \]
\[ - C(F_1(x_1), F_2(x_2^-) | \Gamma) \\ \]
\[ - C(F_1(x_1^-), F_2(x_2) | \Gamma) \\ \]
\[ + C(F_1(x_1), F_2(x_2) | \Gamma) \]
Where \(C(\cdot)\) is the copula function. Note that in the univariate case, this simplifies to the marginal PMF:
\[ f(x_1 | I_1) = \Phi_1(\Phi^{-1}(F_1(x_1)) | I_1) - \Phi_1(\Phi^{-1}(F_1(x_1^-)) | I_1) \\ \] \[ = \Phi(\Phi^{-1}(F_1(x_1))) - \Phi(\Phi^{-1}(F_1(x_1^-))) \\ \] \[ = F_1(x_1) - F_1(x_1^-) \]
As \(\Phi_1(\cdot | I_1) = \Phi(\cdot)\) (i.e., a multivariate CDF with one outcome and unit variance is equivalent to the univariate CDF).
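To make the bivariate formula concrete, a small sketch (assuming the mvtnorm package; the marginal parameters and observation below are illustrative) evaluates the four-term expression for a single pair of discrete observations:
# Gaussian copula CDF via the bivariate normal CDF
gauss_copula_cdf <- function(u, gamma) {
  mvtnorm::pmvnorm(upper = qnorm(u), corr = gamma)[1]
}
gamma_ex <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
x_obs <- c(9, 35); n_trials <- 50 # Poisson(10) and Binomial(50, 0.7) marginals
u_hi <- c(ppois(x_obs[1], 10), pbinom(x_obs[2], n_trials, 0.7))
u_lo <- c(ppois(x_obs[1] - 1, 10), pbinom(x_obs[2] - 1, n_trials, 0.7))
gauss_copula_cdf(u_hi, gamma_ex) -
  gauss_copula_cdf(c(u_lo[1], u_hi[2]), gamma_ex) -
  gauss_copula_cdf(c(u_hi[1], u_lo[2]), gamma_ex) +
  gauss_copula_cdf(u_lo, gamma_ex)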
Bayesian Estimation: Data Augmentation
Clearly, directly estimating the likelihood by marginalising over the uniform variables is computationally expensive. A more efficient alternative was proposed by Smith & Khaled (2011): using data augmentation to treat the \(u_m\) uniform variables as parameters to be estimated. In other words, for each observation we estimate a uniform parameter bounded by the marginal CDF evaluated at the observed value and at its preceding value:
\[ u_m \sim U\left(F_m(x_m^-), F_m(x_m)\right) \]
In Smith & Khaled’s (2011) original paper, this still required some computational complexity to implement, as they were using a Gibbs sampler - requiring the specification of conditional distributions. This is much simpler in Stan (and other HMC samplers), where we only need to specify the joint density function. This means that we simply specify the uniform parameters and use them in the copula density function that we defined above.
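The key property underlying this is that any uniform value between the lower and upper CDF bounds maps back to the same observed value through the quantile function. A small R sketch for a single Poisson observation (illustrative values only):
lambda_ex <- 10; x_obs <- 7
bounds <- c(ppois(x_obs - 1, lambda_ex), ppois(x_obs, lambda_ex))
u_aug <- runif(5, bounds[1], bounds[2]) # augmented uniform draws within the bounds
qpois(u_aug, lambda_ex) # all map back to the observed value, 7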
To illustrate, we will use a Gaussian copula to model the correlation \(\rho\) between a Poisson-distributed outcome \(y_1\) with rate parameter \(\lambda\) and a binomial outcome \(y_2\) with probability \(\theta\).
As with the example above, we will first use the copula approach to generate the correlated outcomes:
lambda_r <- 10
theta_r <- 0.7
rho_r <- 0.5
y_denom <- sample(1:100, 500, replace = TRUE)

z_discrete <- matrix(rnorm(1000), ncol = 2) %*%
  chol(matrix(c(1, rho_r, rho_r, 1), nrow = 2))
y_discrete <- pnorm(z_discrete)
y_discrete[,1] <- qpois(y_discrete[,1], lambda_r)
y_discrete[,2] <- qbinom(y_discrete[,2], size = y_denom, prob = theta_r)
Next, we update our Stan model with the data augmentation approach. As Stan allows us to specify parameter bounds directly, we simply need a function which evaluates the marginal CDFs at the upper and lower bounds for each observation and returns these as the bounds for the uniform parameters. This is implemented as the uvar_bounds
function in the Stan model below.
Then the u
parameters are used directly in the copula density function:
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
vector gauss_copula_cholesky_pointwise(matrix u, matrix L) {
int N = rows(u);
int J = cols(u);
matrix[J,J] Sigma_inv = chol2inv(L);
vector[J] inv_sigma_inv = inv(diagonal(Sigma_inv));
matrix[N, J] log_lik_mat;
matrix[N, J] G;
matrix[N, J] q;
for (n in 1:N) {
q[n] = inv_Phi(u[n]);
}
G = q * Sigma_inv;
for (n in 1:N) {
for (j in 1:J) {
log_lik_mat[n, j] = normal_lpdf(q[n, j] | q[n, j] - G[n, j] * inv_sigma_inv[j], sqrt(inv_sigma_inv[j]))
- std_normal_lpdf(q[n, j]);
}
}
return to_vector(log_lik_mat);
}
matrix uvar_bounds(array[] int pois_y, array[] int binom_y,
array[] int binom_N, real lambda, real theta,
int is_upper) {
int N = size(pois_y);
matrix[N, 2] u_bounds;
for (n in 1:N) {
if (is_upper == 0) {
u_bounds[n, 1] = pois_y[n] == 0.0
  ? 0.0 : poisson_cdf(pois_y[n] - 1 | lambda);
u_bounds[n, 2] = binom_y[n] == 0.0
  ? 0.0 : binomial_cdf(binom_y[n] - 1 | binom_N[n], theta);
} else {
u_bounds[n, 1] = poisson_cdf(pois_y[n] | lambda);
u_bounds[n, 2] = binomial_cdf(binom_y[n] | binom_N[n], theta);
}
}
return u_bounds;
}
}
data {
int<lower=0> N;
array[N] int pois_y;
array[N] int binom_y;
array[N] int binom_N;
}
parameters {
real<lower=0> lambda;
real<lower=0, upper=1> theta;
matrix<
lower=uvar_bounds(pois_y, binom_y, binom_N, lambda, theta, 0),
upper=uvar_bounds(pois_y, binom_y, binom_N, lambda, theta, 1)
>[N, 2] u;
cholesky_factor_corr[2] rho_chol;
}
model {
u ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
real rho = multiply_lower_tri_self_transpose(rho_chol)[1, 2];
vector[N*2] log_lik = gauss_copula_cholesky_pointwise(u, rho_chol);
}
Next we follow the same process as the continuous case to fit the model:
copula_discrete_mod <- cmdstan_model("copula_discrete.stan")
Then we can use the compiled model object to sample our parameters:
copula_discrete_fit <- copula_discrete_mod$sample(
  data = list(N = nrow(y_discrete),
              pois_y = y_discrete[,1],
              binom_y = y_discrete[,2],
              binom_N = y_denom),
  parallel_chains = 4)
We can see that we were able to successfully recover both the marginal and correlation data-generating parameters:
copula_discrete_fit$summary(variables = c("lambda", "theta", "rho"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 9.96 9.96 0.138 0.139 9.73 10.2 1.00 3943. 2451.
2 theta 0.699 0.699 0.00285 0.00286 0.694 0.703 1.00 3658. 2975.
3 rho 0.522 0.523 0.0283 0.0275 0.474 0.568 1.00 3627. 2944.
Additionally, the time taken to fit the model was quite reasonable for the sample size (500):
copula_discrete_fit$time()$total
[1] 93.82755
Mixed Continuous-Discrete Marginals
Combining continuous and discrete marginals is a straightforward combination of the two approaches: the copula function \(C(\cdot)\) now includes both the continuous and discrete components, while marginalising over the \(u\) for the discrete components.
To illustrate, consider if we extended the bivariate Poisson example to include two continuous outcomes \(x_3\) & \(x_4\) with arbitrary density functions \(f_3(\cdot)\) & \(f_4(\cdot)\) and marginal CDFs \(F_3(\cdot)\) & \(F_4(\cdot)\).
The joint density function would now be given as:
\[ f(x_1, x_2, x_3, x_4) = \\ \]
\[ f_3(x_3) \cdot f_4(x_4) \cdot \sum_{j_1=1}^2 \sum_{j_2=1}^2 (-1)^{j_1 + j_2} C(u_{1,j_1}, u_{2,j_2}, F_3(x_3), F_4(x_4) | \Gamma) \]
Or, given that we are using the data-augmentation approach for the discrete marginals:
\[ u_1 \sim U\left(F_1(x_1^-), F_1(x_1)\right) \]
\[ u_2 \sim U\left(F_2(x_2^-), F_2(x_2)\right) \]
\[ f(x_1, x_2, x_3, x_4) = f_3(x_3) \cdot f_4(x_4) \cdot c(u_1, u_2, F_3(x_3), F_4(x_4) | \Gamma) \]
Clearly, this allows for the trivial modelling of any combination of continuous and discrete marginals.
To illustrate, we will combine the previous two approaches, and estimate a Gaussian copula with marginals: \[ y_1 \sim \text{Exponential}(\lambda) \]
\[ y_2 \sim \text{Chi-Square}(\nu) \]
\[ y_3 \sim \text{Poisson}(\lambda_r) \]
\[ y_4 \sim \text{Binomial}(N, \theta_r) \]
As with the previous examples, we will first generate the correlated outcomes:
gamma <- randcorr::randcorr(4)
z <- matrix(rnorm(2000), ncol = 4) %*% chol(gamma)
y_mix <- pnorm(z)
y_mix[,1] <- qexp(y_mix[,1], rate = lambda)
y_mix[,2] <- qchisq(y_mix[,2], df = nu)

ymix_denom <- sample(1:100, 500, replace = TRUE)
y_mix[,3] <- qpois(y_mix[,3], lambda_r)
y_mix[,4] <- qbinom(y_mix[,4], size = ymix_denom, prob = theta_r)
And update our Stan model to simply append the bounded uniform parameters (for the discrete marginals) to the evaluations of the marginal CDFs for the continuous marginals, before passing these to the copula density function:
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
matrix uvar_bounds(array[] int pois_y, array[] int binom_y,
array[] int binom_N, real lambda, real theta,
int is_upper) {
int N = size(pois_y);
matrix[N, 2] u_bounds;
for (n in 1:N) {
if (is_upper == 0) {
u_bounds[n, 1] = pois_y[n] == 0.0
  ? 0.0 : poisson_cdf(pois_y[n] - 1 | lambda);
u_bounds[n, 2] = binom_y[n] == 0.0
  ? 0.0 : binomial_cdf(binom_y[n] - 1 | binom_N[n], theta);
} else {
u_bounds[n, 1] = poisson_cdf(pois_y[n] | lambda);
u_bounds[n, 2] = binomial_cdf(binom_y[n] | binom_N[n], theta);
}
}
return u_bounds;
}
}
data {
int<lower=0> N;
matrix[N, 2] Y;
array[N] int pois_y;
array[N] int binom_y;
array[N] int binom_N;
}
parameters {
real<lower=0> lambda_exp;
real<lower=0> nu;
real<lower=0> lambda_pois;
real<lower=0, upper=1> theta;
matrix<
lower=uvar_bounds(pois_y, binom_y, binom_N, lambda_pois, theta, 0),
upper=uvar_bounds(pois_y, binom_y, binom_N, lambda_pois, theta, 1)
>[N, 2] u;
cholesky_factor_corr[4] rho_chol;
}
model {
matrix[N, 4] u_mix;
for (n in 1:N) {
u_mix[n, 1] = exponential_cdf(Y[n,1] | lambda_exp);
u_mix[n, 2] = chi_square_cdf(Y[n,2] | nu);
u_mix[n, 3] = u[n, 1];
u_mix[n, 4] = u[n, 2];
}

Y[, 1] ~ exponential(lambda_exp);
Y[, 2] ~ chi_square(nu);
u_mix ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
corr_matrix[4] rho = multiply_lower_tri_self_transpose(rho_chol);
}
copula_mix_mod <- cmdstan_model("copula_mix.stan")
copula_mix_fit <- copula_mix_mod$sample(
  data = list(N = nrow(y_mix),
              Y = y_mix[,1:2],
              pois_y = y_mix[,3],
              binom_y = y_mix[,4],
              binom_N = ymix_denom),
  parallel_chains = 4)
We can see that the data-generating parameters (both for the marginals and the dependence between them) have all been successfully recovered:
copula_mix_fit$summary(variables = c("lambda_exp", "nu", "lambda_pois", "theta"))
# A tibble: 4 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda_exp 2.14 2.14 0.0900 0.0877 2.00 2.29 1.00 2044. 2343.
2 nu 4.88 4.88 0.113 0.111 4.70 5.07 1.00 2136. 2544.
3 lambda_pois 9.93 9.93 0.128 0.130 9.73 10.1 1.00 2168. 2725.
4 theta 0.700 0.700 0.00254 0.00259 0.696 0.705 1.00 4581. 2998.
gamma
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.6975456 0.78363383 -0.13837356
[2,] 0.6975456 1.0000000 0.74715991 0.49068561
[3,] 0.7836338 0.7471599 1.00000000 -0.01542118
[4,] -0.1383736 0.4906856 -0.01542118 1.00000000
copula_mix_fit$summary(variables = c("rho"))
# A tibble: 16 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 rho[1,1] 1 1 0 0 1 1 NA NA
2 rho[2,1] 0.706 0.706 0.0136 0.0132 0.682 0.727 1.00 3539.
3 rho[3,1] 0.792 0.793 0.0139 0.0138 0.768 0.814 1.00 3835.
4 rho[4,1] -0.144 -0.144 0.0304 0.0303 -0.193 -0.0940 1.00 3053.
5 rho[1,2] 0.706 0.706 0.0136 0.0132 0.682 0.727 1.00 3539.
6 rho[2,2] 1 1 0 0 1 1 NA NA
7 rho[3,2] 0.737 0.738 0.0144 0.0144 0.713 0.760 1.00 5418.
8 rho[4,2] 0.471 0.472 0.0236 0.0237 0.432 0.511 1.00 3234.
9 rho[1,3] 0.792 0.793 0.0139 0.0138 0.768 0.814 1.00 3835.
10 rho[2,3] 0.737 0.738 0.0144 0.0144 0.713 0.760 1.00 5418.
11 rho[3,3] 1 1 0 0 1 1 NA NA
12 rho[4,3] -0.0483 -0.0487 0.0322 0.0328 -0.101 0.00458 1.00 3690.
13 rho[1,4] -0.144 -0.144 0.0304 0.0303 -0.193 -0.0940 1.00 3053.
14 rho[2,4] 0.471 0.472 0.0236 0.0237 0.432 0.511 1.00 3234.
15 rho[3,4] -0.0483 -0.0487 0.0322 0.0328 -0.101 0.00458 1.00 3690.
16 rho[4,4] 1 1 0 0 1 1 NA NA
# ℹ 1 more variable: ess_tail <dbl>
copula_mix_fit$time()$total
[1] 170.4394
Mixed Continuous-Discrete Marginals with Missing Data
The use of data augmentation with discrete marginals also allows for the trivial handling of missing data on discrete outcomes. While missing continuous data in Stan models can easily be handled by estimating the observation as a parameter in the model, the same approach cannot be used for discrete data - as HMC-based methods like Stan cannot estimate discrete parameters. However, as the data-augmented copula model is already estimating a uniform parameter for each discrete observation, we can simply set the bounds for the uniform parameter to \((0,1)\) (as we do not have an observed CDF for the bounds).
As such, the uniform parameters are instead estimated as:
\[ u_m \sim \begin{cases} U\left(F_m(x_m^-), F_m(x_m)\right) & \text{if } x_m \text{ is observed} \\ U(0, 1) & \text{if } x_m \text{ is missing} \end{cases} \]
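As a small sketch of what these bounds look like in R (a hypothetical vector with a missing value; this is not part of the Stan model itself):
pois_obs <- c(7, NA, 12); lambda_ex <- 10
lower <- ifelse(is.na(pois_obs), 0, ppois(pois_obs - 1, lambda_ex))
upper <- ifelse(is.na(pois_obs), 1, ppois(pois_obs, lambda_ex))
cbind(lower, upper) # the missing observation gets bounds (0, 1)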
This is also trivial to integrate into our existing Stan model. We simply update our bounds function (uvar_bounds()
) to set the bounds to \((0,1)\) by default, and then set the bounds to \((F_m(x_m^-), F_m(x_m))\) for the observed outcomes:
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
matrix uvar_bounds(int N, array[] int pois_y, array[] int binom_y,
array[] int binom_N, array[] int pois_y_ind,
array[] int binom_y_ind, real lambda, real theta,
int is_upper) {
matrix[N, 2] u_bounds = rep_matrix(is_upper, N, 2);
for (n in 1:size(pois_y_ind)) {
if (is_upper == 0) {
u_bounds[pois_y_ind[n], 1] = pois_y[n] == 0.0
  ? 0.0 : poisson_cdf(pois_y[n] - 1 | lambda);
} else {
u_bounds[pois_y_ind[n], 1] = poisson_cdf(pois_y[n] | lambda);
}
}
for (n in 1:size(binom_y_ind)) {
if (is_upper == 0) {
u_bounds[binom_y_ind[n], 2] = binom_y[n] == 0.0 ? 0.0
  : binomial_cdf(binom_y[n] - 1 | binom_N[n], theta);
} else {
u_bounds[binom_y_ind[n], 2] = binomial_cdf(binom_y[n] | binom_N[n], theta);
}
}
return u_bounds;
}
}
data {
int<lower=0> N;
int<lower=0> Npois;
int<lower=0> Nbinom;
matrix[N, 2] Y;
array[Npois] int pois_y;
array[Npois] int pois_y_ind;
array[Nbinom] int binom_y;
array[Nbinom] int binom_y_ind;
array[Nbinom] int binom_N;
}
parameters {
real<lower=0> lambda_exp;
real<lower=0> nu;
real<lower=0> lambda_pois;
real<lower=0, upper=1> theta;
matrix<
lower=uvar_bounds(N, pois_y, binom_y, binom_N, pois_y_ind,
                  binom_y_ind, lambda_pois, theta, 0),
upper=uvar_bounds(N, pois_y, binom_y, binom_N, pois_y_ind,
                  binom_y_ind, lambda_pois, theta, 1)
>[N, 2] u;
cholesky_factor_corr[4] rho_chol;
}
model {
matrix[N, 4] u_mix;
for (n in 1:N) {
u_mix[n, 1] = exponential_cdf(Y[n,1] | lambda_exp);
u_mix[n, 2] = chi_square_cdf(Y[n,2] | nu);
u_mix[n, 3] = u[n, 1];
u_mix[n, 4] = u[n, 2];
}

Y[, 1] ~ exponential(lambda_exp);
Y[, 2] ~ chi_square(nu);
u_mix ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
corr_matrix[4] rho = multiply_lower_tri_self_transpose(rho_chol);
}
To test this, we will randomly remove 10% of the data for each of the Poisson and Binomial outcomes and fit the model:
y_mix_miss <- y_mix
y_mix_miss[sample(1:500, 50), 3] <- NA
y_mix_miss[sample(1:500, 50), 4] <- NA
pois_y_ind <- which(!is.na(y_mix_miss[, 3]))
binom_y_ind <- which(!is.na(y_mix_miss[, 4]))

standata <- list(N = nrow(y_mix_miss),
                 Npois = length(pois_y_ind),
                 Nbinom = length(binom_y_ind),
                 Y = y_mix_miss[, 1:2],
                 pois_y = y_mix_miss[pois_y_ind, 3],
                 pois_y_ind = pois_y_ind,
                 binom_y = y_mix_miss[binom_y_ind, 4],
                 binom_y_ind = binom_y_ind,
                 binom_N = ymix_denom[binom_y_ind])
copula_mix_missing_mod <- cmdstan_model("copula_mix_missing.stan")
copula_mix_missing_fit <- copula_mix_missing_mod$sample(data = standata, parallel_chains = 4)
We can see that even with missing discrete data, the data-generating parameters have been successfully recovered:
copula_mix_missing_fit$summary(variables = c("lambda_exp", "nu", "lambda_pois", "theta"))
# A tibble: 4 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda_exp 2.15 2.14 0.0851 0.0832 2.01 2.28 1.00 1775. 2849.
2 nu 4.89 4.89 0.111 0.109 4.71 5.08 1.00 1657. 2263.
3 lambda_pois 9.96 9.96 0.124 0.126 9.75 10.2 1.00 1870. 2639.
4 theta 0.701 0.701 0.00259 0.00259 0.697 0.705 1.00 3148. 2901.
gamma
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.6975456 0.78363383 -0.13837356
[2,] 0.6975456 1.0000000 0.74715991 0.49068561
[3,] 0.7836338 0.7471599 1.00000000 -0.01542118
[4,] -0.1383736 0.4906856 -0.01542118 1.00000000
copula_mix_missing_fit$summary(variables = c("rho"))
# A tibble: 16 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 rho[1,1] 1 1 0 0 1 1 NA NA
2 rho[2,1] 0.704 0.705 0.0142 0.0140 0.681 0.727 1.00 2786.
3 rho[3,1] 0.787 0.787 0.0147 0.0146 0.762 0.810 1.00 3356.
4 rho[4,1] -0.150 -0.151 0.0304 0.0299 -0.198 -0.0981 1.00 2214.
5 rho[1,2] 0.704 0.705 0.0142 0.0140 0.681 0.727 1.00 2786.
6 rho[2,2] 1 1 0 0 1 1 NA NA
7 rho[3,2] 0.750 0.751 0.0144 0.0141 0.725 0.773 1.00 5402.
8 rho[4,2] 0.467 0.467 0.0238 0.0239 0.428 0.506 1.00 2864.
9 rho[1,3] 0.787 0.787 0.0147 0.0146 0.762 0.810 1.00 3356.
10 rho[2,3] 0.750 0.751 0.0144 0.0141 0.725 0.773 1.00 5402.
11 rho[3,3] 1 1 0 0 1 1 NA NA
12 rho[4,3] -0.0386 -0.0382 0.0333 0.0342 -0.0929 0.0155 1.00 3163.
13 rho[1,4] -0.150 -0.151 0.0304 0.0299 -0.198 -0.0981 1.00 2214.
14 rho[2,4] 0.467 0.467 0.0238 0.0239 0.428 0.506 1.00 2864.
15 rho[3,4] -0.0386 -0.0382 0.0333 0.0342 -0.0929 0.0155 1.00 3163.
16 rho[4,4] 1 1 0 0 1 1 NA NA
# ℹ 1 more variable: ess_tail <dbl>
Approximate LOO-CV for Gaussian Copula Models
Leave-one-out cross-validation (LOO-CV) is a popular and powerful tool for model comparison and selection. However, it can be difficult to compute for likelihoods which cannot be factorised into a product of individual likelihoods. Thankfully, Buerkner, Gabry, and Vehtari (2020) have previously derived a method for calculating approximate LOO-CV for non-factorisable models with a multivariate-normal outcome distribution.
Pointwise Likelihood for a Multivariate Normal
Given a vector \(y\) which follows a multivariate-normal distribution with mean \(\mu\) and covariance \(\Sigma\):
\[ y \sim \mathcal{MVN}(\mu, \Sigma) \]
The pointwise likelihood is given by:
\[ p(y_i | y_{-i}, \theta) = \mathcal{N}(y_i | y_i - \frac{g_i}{\bar{\sigma}_{ii}}, \sqrt{\bar{\sigma}_{ii}^{-1}}) \]
Where:
\[ g_i = [\Sigma^{-1}(y - \mu)]_i \]
\[ \bar{\sigma}_{ii} = [\Sigma^{-1}]_{ii} \]
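As a sanity check of this result (purely illustrative values; only base R is needed), the pointwise formula matches the conditional normal density \(p(y_i | y_{-i})\) computed directly from the usual conditional mean and variance:
Sigma <- matrix(c(1, 0.6, 0.3,
                  0.6, 1, 0.4,
                  0.3, 0.4, 1), nrow = 3)
y_ex <- c(0.2, -1.1, 0.5); mu <- rep(0, 3); i <- 2
Lambda <- solve(Sigma)
g_i <- (Lambda %*% (y_ex - mu))[i]
sigma_bar_ii <- Lambda[i, i]
# Pointwise formula from above
dnorm(y_ex[i], mean = y_ex[i] - g_i / sigma_bar_ii, sd = sqrt(1 / sigma_bar_ii))
# Direct conditional density for comparison
cond_var <- Sigma[i, i] - Sigma[i, -i] %*% solve(Sigma[-i, -i]) %*% Sigma[-i, i]
cond_mean <- mu[i] + Sigma[i, -i] %*% solve(Sigma[-i, -i]) %*% (y_ex - mu)[-i]
dnorm(y_ex[i], mean = cond_mean, sd = sqrt(cond_var))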
Approximate LOO-CV for Gaussian Copula
As demonstrated earlier, the Gaussian copula density can be expressed as a multivariate normal density divided by the product of the univariate standard normal densities. As such, we can calculate the approximate LOO-CV for a Gaussian copula by applying Buerkner, Gabry, and Vehtari (2020)’s method to the multivariate normal density to obtain a pointwise likelihood, and then divide by the respective univariate standard normal density.
To summarise, given that a Gaussian copula with continuous marginals has density function:
\[ f(y) = c(u | \Gamma) \prod_{i=1}^{m} f_i(y_i) \]
\[ c(u | \Gamma) = \textrm{MVN}(q | 0, \Gamma) \cdot \prod_{i=1}^m\textrm{N}(q_i | 0, 1)^{-1} \] Where \(u_i = F_i(y_i)\) and \(q = \Phi^{-1}(u)\).
The pointwise conditional likelihood is given by:
\[ p(y_i | y_{-i}, \theta) = \mathcal{N}(q_i | q_i - \frac{g_i}{\bar{\sigma}_{ii}}, \sqrt{\bar{\sigma}_{ii}^{-1}}) \cdot \textrm{N}(q_i | 0, 1)^{-1} \cdot f_i(y_i) \]
We can then use the PSIS algorithm implemented in the loo
package to approximate \(p(y_i | y_{-i})\).
Exact LOO-CV for Gaussian Copula
In order to validate the approximation, we can also use the method proposed by Buerkner et al. (2020) to perform exact LOO-CV. To do so, the held-out observation \(y_i\) is estimated as a parameter in the model \(y_i^{miss}\) and substituted into the set of observations \(y\) before calculating the marginal CDFs:
\[ u_{miss(i)} = \left(F_1(y_1),...,F_{i-1}(y_{i-1}),F_i(y_i^{miss}),F_{i+1}(y_{i+1}),..., F_m(y_m)\right) \] Such that \(q_{miss(i)}\) is the result of applying the standard normal quantile function to each element of \(u_{miss(i)}\). This new \(q_{miss(i)}\) set is then used to define the parameters of the log-predictive density when evaluating it at the held-out observation \(q_i\):
\[ p(y_i | y_{-i}, \theta) = \mathcal{N}(q_i | q_i^{miss} - \frac{g_i^{miss}}{\bar{\sigma}_{ii}}, \sqrt{\bar{\sigma}_{ii}^{-1}}) \cdot \textrm{N}(q_i | 0, 1)^{-1} \cdot f_i(y_i) \] Where: \[ g_i^{miss} = [\Gamma^{-1}(q_{miss(i)})]_i \] And we can then estimate the leave-one-out density for the held-out observation using the posterior samples:
\[ p(y_i | y_{-i}) \approx \frac{1}{S} \sum_{s=1}^{S} p(y_i | y_{-i}, \theta_{-i}^{(s)}) \]
Stan Model
To implement these, we will update our Stan model to either calculate the exact LOO density for a held-out observation (and estimate that observation as a parameter), or to calculate the density for all observations for use in approximate LOO-CV estimation.
We add a data argument y_miss_i
to the model to specify the index of the missing observation. If y_miss_i
is 0, the model will calculate the approximate LOO-CV density for all observations. If y_miss_i
is greater than 0, the model will calculate the exact LOO-CV density for the observation at index y_miss_i
.
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
row_vector gauss_copula_cholesky_pointwise(row_vector y_obs, row_vector y_miss,
real lambda, real nu, matrix Sigma_inv) {
int J = cols(Sigma_inv);
vector[J] inv_sigma_inv = inv(diagonal(Sigma_inv));
row_vector[J] log_lik;
vector[2] u_miss = [ exponential_cdf(y_miss[1] | lambda),
                     chi_square_cdf(y_miss[2] | nu) ]';
vector[2] u = [ exponential_cdf(y_obs[1] | lambda),
                chi_square_cdf(y_obs[2] | nu) ]';
vector[2] q_miss = inv_Phi(u_miss);
vector[2] q = inv_Phi(u);
vector[2] g = Sigma_inv * q_miss;

log_lik = [ exponential_lpdf(y_obs[1] | lambda),
            chi_square_lpdf(y_obs[2] | nu) ];
for (j in 1:J) {
log_lik[j] += normal_lpdf(q[j] | q_miss[j] - g[j] * inv_sigma_inv[j],
sqrt(inv_sigma_inv[j]))
- std_normal_lpdf(q[j]);
}
return log_lik;
}
}
data {
int<lower=0> N;
matrix[N, 2] Y;
int y_miss_i;
}
transformed data {
int has_missing = y_miss_i > 0;
int y_miss_col = (y_miss_i - 1) %/% N + 1;
int y_miss_row = y_miss_i - (y_miss_col - 1) * N;
array[2] int y_miss_idx = { y_miss_row, y_miss_col };
}
parameters {
real<lower=0> lambda;
real<lower=0> nu;
cholesky_factor_corr[2] rho_chol;
array[has_missing] real<lower=0> y_miss;
}
model {
matrix[N, 2] u;
matrix[N, 2] Ymiss = Y;
if (has_missing) {
Ymiss[y_miss_idx[1], y_miss_idx[2]] = y_miss[1];
}
for (n in 1:N) {
u[n, 1] = exponential_cdf(Ymiss[n, 1] | lambda);
u[n, 2] = chi_square_cdf(Ymiss[n, 2] | nu);
}

Ymiss[, 1] ~ exponential(lambda);
Ymiss[, 2] ~ chi_square(nu);
u ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
real rho = multiply_lower_tri_self_transpose(rho_chol)[1, 2];
vector[has_missing ? 1 : N*2] log_lik;
{
int J = 2;
matrix[J, J] Sigma_inv = chol2inv(rho_chol);
matrix[has_missing ? 1 : N, J] log_lik_mat;
array[has_missing ? 1 : N] int iters = has_missing ? {y_miss_idx[1]}
                                                   : linspaced_int_array(N, 1, N);
for (n in iters) {
row_vector[J] Ymiss = Y[n];
if (has_missing) {
Ymiss[y_miss_idx[2]] = y_miss[1];
}
log_lik_mat[has_missing ? 1 : n]
  = gauss_copula_cholesky_pointwise(Y[n], Ymiss, lambda, nu, Sigma_inv);
}
if (has_missing) {
log_lik[1] = log_lik_mat[1, y_miss_idx[2]];
} else {
log_lik = to_vector(log_lik_mat);
}
}
}
copula_mod_loo <- cmdstan_model("copula_mod_loo.stan")
First we’ll estimate the approximate LOO-CV for all observations:
copula_loo_fit <- copula_mod_loo$sample(data = list(N = nrow(y), Y = y, y_miss_i = 0),
                                        parallel_chains = 4)
approx_loo <- copula_loo_fit$loo()
Next, we’ll estimate the exact LOO-CV for each observation. Given the number of observations to hold out (1000), we will do this in parallel using the furrr
and future
packages:
future::plan(future::multisession)

run_fun <- function(idx) {
  copula_mod <- cmdstanr::cmdstan_model(exe_file = "copula_mod_loo")
  copula_fit_iter <- copula_mod$sample(
    data = list(N = nrow(y), Y = y, y_miss_i = idx)
  )
  log_lik_draws <- copula_fit_iter$draws(variables = "log_lik", format = "draws_df")$`log_lik[1]`

  data.frame(res = as.numeric(log_lik_draws)) |>
    setNames(paste0("log_lik_", idx))
}

exact_lpd <- furrr::future_map_dfc(1:1000, run_fun, .progress = TRUE)

log_mean_exp <- function(x) {
  # more stable than log(mean(exp(x)))
  max_x <- max(x)
  max_x + log(sum(exp(x - max_x))) - log(length(x))
}
exact_elpds <- apply(exact_lpd, 2, log_mean_exp)
Then we can visually compare the approximate and exact LOO-CV estimates:
library(ggplot2)
suppressPackageStartupMessages(library(bayesplot))
color_scheme_set("brightblue")
theme_set(theme_default())
# Create a summary annotation for the plot
plot_lab <- paste0(paste0("Exact ELPD: ", round(sum(exact_elpds), 2)), "\n",
                   paste0("Approx ELPD: ", round(sum(approx_loo$pointwise[,1]), 2)))
data.frame(loo_elpd = approx_loo$pointwise[,1],
exact_lpd = exact_elpds) |>
ggplot(aes(x = loo_elpd, y = exact_lpd)) +
geom_abline(color = "gray30") +
geom_point(size = 2) +
xlab("Approximate elpds") +
ylab("Exact elpds") +
annotate("text", x = 0, y = -6, label = plot_lab)
We can see that the exact and approximate LOO-CV estimates are extremely consistent.