set.seed(2024)
lambda <- 2
nu <- 5

L <- chol(matrix(c(1, 0.5, 0.5, 1), nrow = 2))
z <- matrix(rnorm(1000), ncol = 2) %*% L
y <- pnorm(z)
y[,1] <- qexp(y[,1], rate = lambda)
y[,2] <- qchisq(y[,2], df = nu)
Copulas
An Introduction to Copula Modelling
A common goal is the modelling of dependence between outcomes; however, this is typically only straightforward to implement when the outcomes are Gaussian or Student-T distributed.
An alternative, more flexible approach is to model the dependence structure separately from the marginal distributions. This is the idea behind copula modelling.
Copula Functions
A copula function allows for the construction of a multivariate cumulative distribution function (CDF) from a set of univariate CDFs (for each marginal), with no requirement that the marginals are Gaussian or Student-T distributed - or that they all have the same distribution.
Given a set of outcomes \(x_1, ..., x_m\), with marginal CDFs \(F_1, ..., F_m\), and the results of applying these CDFs to the outcomes \(u_1 = F_1(x_1), ..., u_m = F_m(x_m)\), the copula function \(C\) is defined as:
\[ C(u | \Gamma) = G_m(G^{-1}(u_1), ..., G^{-1}(u_m)|\Gamma) \] Where \(G_m\) is an \(m\)-variate CDF for a given distribution, \(G^{-1}\) is the univariate quantile function (inverse-CDF) for that distribution, and \(\Gamma\) is a correlation matrix.
The most popular implementation of this is the Gaussian copula:
\[ C(u | \Gamma) = \Phi_m(\Phi^{-1}(u_1), ..., \Phi^{-1}(u_m)|\Gamma) \] Where \(\Phi_m\) is the \(m\)-variate CDF for the standard normal distribution, and \(\Phi^{-1}\) is the univariate quantile function for the standard normal distribution.
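For example, a minimal sketch of this definition in R (assuming the mvtnorm package is available; the values are purely illustrative) evaluates the bivariate Gaussian copula function at a pair of uniform values:
# Evaluate C(u | Gamma) = Phi_2(Phi^-1(u1), Phi^-1(u2) | Gamma)
gamma_ex <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
u_ex <- c(0.3, 0.8)
mvtnorm::pmvnorm(upper = qnorm(u_ex), corr = gamma_ex)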
Interpreting Dependence in Copula Functions
Note that the correlation matrix \(\Gamma\) can no longer be interpreted on the scale of the observed outcomes. Instead, these correlations are on the scale of the copula function. For example, when a Gaussian copula is used with continuous marginals, the estimated correlation is the linear correlation between the normal scores:
\[ \Gamma_{1,2} = \text{Corr}(\Phi^{-1}(F_1(x_1)), \Phi^{-1}(F_2(x_2))) \]
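As a quick illustration (assuming the simulated data y and the data-generating parameters lambda = 2 and nu = 5 from the code above), this normal-scores correlation can be computed directly:
# Correlation between normal scores, using the true marginal CDFs
cor(qnorm(pexp(y[,1], rate = 2)), qnorm(pchisq(y[,2], df = 5)))
# close to the 0.5 used to generate the data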
When a Gaussian copula is used with a discrete marginal, the correlation is with a normally-distributed latent variate. For example, with two ordinal outcomes, the correlation would be interpreted as a polychoric correlation coefficient. This can also be used as a motivating reason for using the Gaussian copula, as opposed to other forms, as the estimated correlations have existing analogues in the literature with known interpretations.
From Copula Functions to Density Functions
With the copula function defined, the joint density function with all continuous marginals is given as:
\[ f(x) = c(u | \Gamma) \prod_{i=1}^{m} f_i(x_i) \] Where \(c(\cdot)\) is the density function for the chosen copula function and \(f_i\) is the marginal density function for the \(i\)-th outcome.
Gaussian Copula Density
For a Gaussian copula, the density function is given as:
\[ c(u | \Gamma) = |\Gamma|^{-1/2} \exp\left(\frac{q^T\left(I_m - \Gamma^{-1}\right)q}{2}\right) \] \[ = |\Gamma|^{-1/2} \exp\left(\frac{q^Tq}{2} - \frac{q^T\Gamma^{-1}q}{2}\right) \]
Where \(q = \Phi^{-1}(u)\).
Note that when the outcomes are independent (i.e., \(\Gamma = I\)), or there is only a single outcome, the joint density function simplifies to the product of the marginal density functions.
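This is easy to verify numerically; a small sketch (assuming the mvtnorm package, with illustrative values) shows the copula density evaluating to 1 under independence:
# With Gamma = I the Gaussian copula density is 1, so the joint density
# reduces to the product of the marginal densities
q_ex <- qnorm(c(0.3, 0.8))
exp(mvtnorm::dmvnorm(q_ex, sigma = diag(2), log = TRUE) - sum(dnorm(q_ex, log = TRUE)))
# evaluates to 1 (up to floating-point error)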
Multivariate Standard Normal Density and Change-of-Variables
The form of the Gaussian copula density above is also now amenable to being expressed using a multivariate normal. Where the multivariate standard normal density is given as:
\[ \textrm{MVN}(q | 0, \Gamma) = \left(2\pi\right)^{-m/2} |\Gamma|^{-1/2} \exp\left(-\frac{q^T\Gamma^{-1}q}{2}\right) \]
The Gaussian copula density can be expressed as a multivariate standard-normal density multiplied by an adjustment term:
\[ c(u | \Gamma) = \textrm{MVN}(q | 0, \Gamma) \cdot \exp\left(\frac{q^Tq}{2}\right) \cdot \left(2\pi\right)^{m/2} \]
The adjustment term above is equal to the reciprocal of the standard-normal density evaluated at \(q\), giving the final form of the Gaussian copula density:
\[ c(u | \Gamma) = \textrm{MVN}(q | 0, \Gamma) \cdot \prod_{i=1}^m\textrm{N}(q_i | 0, 1)^{-1} \]
This form also makes intuitive sense. The Gaussian copula is specifying a multivariate normal density over a transformation of the inputs (\(q = \Phi^{-1}(u)\)), so we need to adjust the density by the absolute derivative of the transformation to account for this. The derivative of the standard normal quantile function \(\Phi^{-1}(\cdot)\) is the reciprocal of the standard normal density evaluated at \(q\), so we simply multiply the multivariate normal density by the product of these reciprocals.
This also provides other benefits, as we can simply delegate to existing implementations of these density functions which have already received significant optimisation and testing.
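As a quick check of the derivation (assuming the mvtnorm package; the values below are purely illustrative), the closed-form expression above and the multivariate-normal form give the same density:
gamma_ex <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
q_ex <- qnorm(c(0.3, 0.8))
# Closed-form Gaussian copula density
closed_form <- det(gamma_ex)^(-0.5) *
  exp(t(q_ex) %*% (diag(2) - solve(gamma_ex)) %*% q_ex / 2)
# Multivariate normal density multiplied by the reciprocal standard-normal densities
mvn_form <- mvtnorm::dmvnorm(q_ex, sigma = gamma_ex) / prod(dnorm(q_ex))
c(closed_form, mvn_form) # both give the same value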
Example: Fitting Bivariate Gaussian Copula
To illustrate, we can fit a bivariate Gaussian copula to the generated data from the previous example to see whether we can recover the data-generating parameters. We will first show how to estimate these using simple maximum likelihood estimation in R, and then using a Bayesian approach.
To refresh, we have an exponentially-distributed outcome \(y_1\) with rate parameter \(\lambda\), a chi-squared-distributed outcome \(y_2\) with degrees of freedom \(\nu\), and we are modelling the dependence between these outcomes using a Gaussian copula with correlation \(\rho\):
\[ y_1 \sim \textrm{Exp}(\lambda) \]
\[ y_2 \sim \chi^2(\nu) \]
\[ \begin{bmatrix}F_1(y_1) \\ F_2(y_2) \end{bmatrix} \sim \textrm{c}_{\Phi} \left( \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right) \] Where \(F_i(\cdot)\) is the CDF for the \(i\)-th marginal distribution, and \(\textrm{c}_{\Phi}(\cdot)\) is the Gaussian copula density.
Maximum Likelihood Estimation
To estimate these parameters via maximum-likelihood in R, we need to define a function that takes a vector of parameter values and returns the joint log-likelihood. It is also useful to define the function such that the input parameters can take on any value, and then transform these to the appropriate scale within the function (i.e., \(\rho \in \left(-1,1\right)\)). Given that we will then be using transformations of the parameters to define the log-likelihood, we need to add a correction factor to account for this. For more information and examples of different transformations & corrections, see this Stan Documentation Page.
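As a small sketch of the idea for the correlation parameter (illustrative value only), tanh() maps an unconstrained value to \((-1, 1)\), and the corresponding log-Jacobian correction is \(\log(1 - \rho^2)\); the adjustments term in the log-likelihood function below applies exactly these corrections:
x_unconstrained <- 0.7
rho_ex <- tanh(x_unconstrained)
log_jacobian <- log1p(-rho_ex^2)
c(rho = rho_ex, log_jacobian = log_jacobian)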
First, we will define a function to calculate the Gaussian copula density for a given set of uniform variates \(u\) and correlation matrix \(\Gamma\):
library(mvtnorm)

dgauss_copula <- function(u, gamma, log = TRUE) {
  # Map uniform variates to normal scores
  q <- qnorm(u)
  ll <- sum(dmvnorm(q, sigma = gamma, log = TRUE)) - sum(dnorm(q, log = TRUE))
  ifelse(isTRUE(log), ll, exp(ll))
}
# Define the log-likelihood function
ll_function <- function(parameters, data_y) {
  # Constrain input parameters to valid values for distributions
  lambda <- exp(parameters[1])
  nu <- exp(parameters[2])
  rho <- tanh(parameters[3])

  adjustments <- sum(parameters[1:2]) + log1p(-(rho*rho))

  marginal_log_lik <- sum(dexp(data_y[,1], rate = lambda, log = TRUE)) +
    sum(dchisq(data_y[,2], df = nu, log = TRUE))

  # Apply marginal CDFs to data
  u <- data_y
  u[,1] <- pexp(data_y[,1], rate = lambda)
  u[,2] <- pchisq(data_y[,2], df = nu)

  copula_log_lik <- dgauss_copula(u, matrix(c(1, rho, rho, 1), nrow = 2),
                                  log = TRUE)

  # Return joint log-likelihood with adjustments for change-of-variables
  marginal_log_lik + copula_log_lik + adjustments
}
However, rather than returning the log-likelihood itself, we need to return the negative log-likelihood. This is because the optimisation functions in R are minimisation functions. To do this, we’ll simply create a ‘wrapper’ function which takes an arbitrary function as input and returns a function which itself returns the negative of the input function’s output:
neg_ll_function <- function(f) {
  function(...) { -f(...) }
}
Next we can use the optim function to estimate the parameters:
result <- optim(
  par = c(0, 0, 0), # initial values for parameters
  fn = neg_ll_function(ll_function),
  data_y = y
)
And extract and transform them to the appropriate scale, showing that the recovery was successful:
c(
  "lambda" = exp(result$par[1]),
  "nu" = exp(result$par[2]),
  "rho" = tanh(result$par[3])
) |> round(2)
lambda nu rho
1.96 4.91 0.54
Bayesian Estimation
We can also estimate these parameters using a Bayesian approach. This can be done in two ways: by using our existing ll_function()
implementation for sampling, or by writing a Stan model to estimate the parameters.
Sampling from an R Function
Estimating the posterior distributions for parameters of an R function can be done quite simply using the StanEstimators
R package:
if (!requireNamespace("StanEstimators", quietly = TRUE)) {
  remotes::install_github("andrjohns/StanEstimators")
}
library(StanEstimators)
suppressPackageStartupMessages(library(posterior))
This will allow us to use algorithms implemented by Stan to estimate parameters for arbitrary R functions. To use full Bayesian sampling with the No-U-Turn Sampler (NUTS) algorithm, we can use the stan_sample()
function:
# Record execution time for later comparison
sampling_time <- system.time({
  results_stan <- stan_sample(fn = ll_function,
                              par_inits = c(0,0,0),
                              additional_args = list(data_y = y),
                              parallel_chains = 4,
                              quiet = TRUE)
})
Next, we can use the posterior
package to extract the posterior draws for the parameters and transform them to the appropriate scale before summarising:
results_stan@draws |>
  mutate_variables(lambda = exp(`pars[1]`), nu = exp(`pars[2]`), rho = tanh(`pars[3]`)) |>
  subset_draws(variable=c("lambda", "nu", "rho")) |>
  summarise_draws()
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 1.96 1.96 0.0862 0.0868 1.82 2.10 1.00 2786. 2937.
2 nu 4.92 4.92 0.121 0.120 4.72 5.12 1.00 2750. 2828.
3 rho 0.536 0.537 0.0281 0.0282 0.488 0.582 1.00 2736. 2600.
We can see that estimates are consistent with their maximum likelihood counterparts, and the narrow credible intervals and high effective sample sizes indicate that we can have confidence in the estimated values.
However, full Bayesian sampling from an R function can be computationally expensive and inefficient. This is especially true in this case where we have not also provided a function to calculate the gradients for each parameter. Without this, StanEstimators
has to fall back to using finite differences to estimate the gradients, which can be slow and inaccurate.
Approximate Bayesian Inference with an R Function
One alternative, before moving to using a Stan model directly, is to use the Pathfinder algorithm for approximate Bayesian inference. As the name implies, this allows us to approximate the results from full Bayesian sampling with dramatically reduced computational cost:
path_time <- system.time({
  results_path <- stan_pathfinder(fn = ll_function,
                                  par_inits = c(0,0,0),
                                  additional_args = list(data_y = y),
                                  quiet = TRUE,
                                  # Return same number of approximate draws as sampling
                                  num_psis_draws = 4000)
})
We can see that this provided posterior estimates consistent with sampling:
results_path@draws |>
  mutate_variables(lambda = exp(`pars[1]`), nu = exp(`pars[2]`), rho = tanh(`pars[3]`)) |>
  subset_draws(variable=c("lambda", "nu", "rho")) |>
  summarise_draws()
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 1.96 1.96 0.0884 0.0885 1.82 2.11 1.00 3678. 3910.
2 nu 4.92 4.92 0.128 0.128 4.71 5.13 1.00 4012. 3914.
3 rho 0.536 0.537 0.0293 0.0290 0.489 0.584 1.00 4224. 4126.
While the execution time was orders of magnitude faster:
c("Stan" = sampling_time[3], "Pathfinder" = path_time[3])
Stan.elapsed Pathfinder.elapsed
77.174 1.763
For more efficient Bayesian sampling, especially with larger or more complex models, it is recommended to use a Stan model directly.
Stan Model
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
}
data {
int<lower=0> N;
matrix[N, 2] Y;
}
parameters {
real<lower=0> lambda;
real<lower=0> nu;
cholesky_factor_corr[2] rho_chol;
}
model {
matrix[N, 2] u;
for (n in 1:N) {
  u[n, 1] = exponential_cdf(Y[n, 1] | lambda);
  u[n, 2] = chi_square_cdf(Y[n, 2] | nu);
}

Y[, 1] ~ exponential(lambda);
Y[, 2] ~ chi_square(nu);
u ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
real rho = multiply_lower_tri_self_transpose(rho_chol)[1, 2];
}
We will be using the cmdstanr
package for our model fitting:
if (!requireNamespace("cmdstanr", quietly = TRUE)) {
  remotes::install_github("stan-dev/cmdstanr")
  cmdstanr::check_cmdstan_toolchain(fix = TRUE)
}
library(cmdstanr)
First, we create a model object from the above Stan code, which has been saved to a file named copula.stan:
copula_mod <- cmdstan_model("copula.stan")
Then we can use the compiled model object to sample our parameters:
copula_fit <- copula_mod$sample(data = list(N = nrow(y), Y = y),
                                parallel_chains = 4)
We can see that the results are consistent with the estimates from our StanEstimators
implementation:
copula_fit$summary(variables = c("lambda", "nu", "rho"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 1.96 1.96 0.0853 0.0853 1.83 2.11 1.00 2618. 2637.
2 nu 4.91 4.91 0.121 0.122 4.72 5.12 1.00 2623. 2903.
3 rho 0.537 0.538 0.0284 0.0286 0.490 0.583 1.00 3086. 2756.
But the time taken to fit the model is significantly faster:
copula_fit$time()$total
[1] 5.402146
Discrete Marginals - Probability Mass via Copula Mass Functions
Direct Estimation: Marginalising Over Uniforms
While the use of copulas is attractive for continuous marginals, their application to discrete marginals is less straightforward. It was possible to analytically define the density function for continuous marginals because there was a unique, one-to-one mapping from the observed marginal scale (\(x_m\)) to the copula function scale (\(q = \Phi^{-1}(u)\)). However, this is not the case for discrete marginals, as there is now a range of \(u_m\) values that map to the same \(x_m\) value.
For discrete marginals, we are instead estimating a probability mass function (PMF): integrating the copula density over all values of \(u_m\) that map to the observed \(x_m\) value.
Note that the PMF for a given univariate discrete distribution can be expressed as:
\[ f(x) = F(x) - F(x^-) \] Where \(x^-\) denotes the previous value of \(x\) (with \(F(x^-) = 0\) when \(x\) is the smallest value in the support).
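This identity is easy to check for a univariate discrete distribution, for example a Poisson(5) evaluated at \(x = 3\):
dpois(3, lambda = 5)
ppois(3, lambda = 5) - ppois(2, lambda = 5) # identical to the PMF above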
This is extended to the multivariate case by summing, for each \(x_m\), the difference between the copula function evaluated at \(x_m\) and \(x_m^-\), holding all other \(x_i\) constant. In other words, we need to integrate the uniform \(u_m\) variable out of the likelihood. This is expressed as:
\[ f(x) = \sum_{j_1=1}^2 \cdot\cdot\cdot \sum_{j_m=1}^2 (-1)^{j_1 + \cdot\cdot\cdot + j_m} C(u_{1,j_1}, \cdot\cdot\cdot, u_{m,j_m} | \Gamma) \] Where \(u_{j,1}\) = \(F_j(x_j^-)\) and \(u_{j,2}\) = \(F_j(x_j)\).
Note that this process is markedly more computationally costly than the continuous case, particularly as the number of outcomes increases - as an additional dimension of integration is added for each outcome. Additionally, evaluating the density for copula with discrete marginals now requires the evaluation of the \(m\)-variate CDF, where the continuous case only required the evaluation of the \(m\)-variate density. While there are existing implementations for the multivariate Gaussian and Student-T CDFs, these require numerical integration and so are markedly more costly than the corresponding density functions.
Using the bivariate case as an example:
\[ f(x_1, x_2) = \sum_{j_1=1}^2 \sum_{j_2=1}^2 (-1)^{j_1 + j_2} C(u_{1,j_1}, u_{2,j_2} | \Gamma) \]
\[ = (-1)^{1+1} C(u_{1,1}, u_{2,1} | \Gamma) \]
\[ + (-1)^{1+2} C(u_{1,1}, u_{2,2} | \Gamma) \]
\[ + (-1)^{2+1} C(u_{1,2}, u_{2,1} | \Gamma) \]
\[ + (-1)^{2+2} C(u_{1,2}, u_{2,2} | \Gamma) \]
Which is more simply expressed as:
\[ f(x_1, x_2) = \] \[ C(F_1(x_1^-), F_2(x_2^-) | \Gamma) \\ \]
\[ - C(F_1(x_1), F_2(x_2^-) | \Gamma) \\ \]
\[ - C(F_1(x_1^-), F_2(x_2) | \Gamma) \\ \]
\[ + C(F_1(x_1), F_2(x_2) | \Gamma) \]
Where \(C(\cdot)\) is the copula function. Note that in the univariate case, this simplifies to the marginal PMF:
\[ f(x_1 | I_1) = \Phi_1(\Phi^{-1}(F_1(x_1)) | I_1) - \Phi_1(\Phi^{-1}(F_1(x_1^-)) | I_1) \\ \] \[ = \Phi(\Phi^{-1}(F_1(x_1))) - \Phi(\Phi^{-1}(F_1(x_1^-))) \\ \] \[ = F_1(x_1) - F_1(x_1^-) \]
As \(\Phi_1(\cdot | I_1) = \Phi(\cdot)\) (i.e., a multivariate CDF with one outcome and unit variance is equivalent to the univariate CDF).
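To make the bivariate formula concrete, a small sketch (assuming the mvtnorm package; the marginal parameters and observation below are illustrative) evaluates the four-term expression for a single pair of discrete observations:
# Gaussian copula CDF via the bivariate normal CDF
gauss_copula_cdf <- function(u, gamma) {
  mvtnorm::pmvnorm(upper = qnorm(u), corr = gamma)[1]
}
gamma_ex <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
x_obs <- c(9, 35); n_trials <- 50 # Poisson(10) and Binomial(50, 0.7) marginals
u_hi <- c(ppois(x_obs[1], 10), pbinom(x_obs[2], n_trials, 0.7))
u_lo <- c(ppois(x_obs[1] - 1, 10), pbinom(x_obs[2] - 1, n_trials, 0.7))
gauss_copula_cdf(u_hi, gamma_ex) -
  gauss_copula_cdf(c(u_lo[1], u_hi[2]), gamma_ex) -
  gauss_copula_cdf(c(u_hi[1], u_lo[2]), gamma_ex) +
  gauss_copula_cdf(u_lo, gamma_ex)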
Bayesian Estimation: Data Augmentation
Clearly, directly estimating the likelihood by marginalising over the uniform variables is computationally expensive. A more efficient alternative was proposed by Smith & Khaled (2011): using data augmentation to treat the \(u_m\) uniform variables as parameters to be estimated. In other words, for each observation we estimate a uniform parameter bounded by the marginal CDF evaluated at the observed value and at its preceding value:
\[ u_m \sim U\left(F_m(x_m^-), F_m(x_m)\right) \]
In Smith & Khaled’s (2011) original paper, this still required some computational complexity to implement, as they were using a Gibbs sampler - requiring the specification of conditional distributions. This is much simpler in Stan (and other HMC samplers), where we only need to specify the joint density function. This means that we simply specify the uniform parameters and use them in the copula density function that we defined above.
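The key property underlying this is that any uniform value between the lower and upper CDF bounds maps back to the same observed value through the quantile function. A small R sketch for a single Poisson observation (illustrative values only):
lambda_ex <- 10; x_obs <- 7
bounds <- c(ppois(x_obs - 1, lambda_ex), ppois(x_obs, lambda_ex))
u_aug <- runif(5, bounds[1], bounds[2]) # augmented uniform draws within the bounds
qpois(u_aug, lambda_ex) # all map back to the observed value, 7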
To illustrate, we will use a Gaussian copula to model the correlation \(\rho\) between a Poisson-distributed outcome \(y_1\) with rate parameter \(\lambda\) and a binomial outcome \(y_2\) with probability \(\theta\).
As with the example above, we will first use the copula approach to generate the correlated outcomes:
lambda_r <- 10
theta_r <- 0.7
rho_r <- 0.5
y_denom <- sample(1:100, 500, replace = TRUE)

z_discrete <- matrix(rnorm(1000), ncol = 2) %*%
  chol(matrix(c(1, rho_r, rho_r, 1), nrow = 2))
y_discrete <- pnorm(z_discrete)
y_discrete[,1] <- qpois(y_discrete[,1], lambda_r)
y_discrete[,2] <- qbinom(y_discrete[,2], size = y_denom, prob = theta_r)
Next, we update our Stan model with the data augmentation approach. As Stan allows us to specify parameter bounds directly, we simply need a function which evaluates the marginal CDFs at the upper and lower bounds for each observation and returns these as the bounds for the uniform parameters. This is implemented as the uvar_bounds
function in the Stan model below.
Then the u
parameters are used directly in the copula density function:
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
vector gauss_copula_cholesky_pointwise(matrix u, matrix L) {
int N = rows(u);
int J = cols(u);
matrix[J,J] Sigma_inv = chol2inv(L);
vector[J] inv_sigma_inv = inv(diagonal(Sigma_inv));
matrix[N, J] log_lik_mat;
matrix[N, J] G;
matrix[N, J] q;
for (n in 1:N) {
q[n] = inv_Phi(u[n]);
}
G = q * Sigma_inv;
for (n in 1:N) {
for (j in 1:J) {
log_lik_mat[n, j] = normal_lpdf(q[n, j] | q[n, j] - G[n, j] * inv_sigma_inv[j], sqrt(inv_sigma_inv[j]))
- std_normal_lpdf(q[n, j]);
}
}
return to_vector(log_lik_mat);
}
matrix uvar_bounds(array[] int pois_y, array[] int binom_y,
array[] int binom_N, real lambda, real theta,
int is_upper) {
int N = size(pois_y);
matrix[N, 2] u_bounds;
for (n in 1:N) {
if (is_upper == 0) {
u_bounds[n, 1] = pois_y[n] == 0.0
  ? 0.0 : poisson_cdf(pois_y[n] - 1 | lambda);
u_bounds[n, 2] = binom_y[n] == 0.0
  ? 0.0 : binomial_cdf(binom_y[n] - 1 | binom_N[n], theta);
} else {
u_bounds[n, 1] = poisson_cdf(pois_y[n] | lambda);
u_bounds[n, 2] = binomial_cdf(binom_y[n] | binom_N[n], theta);
}
}
return u_bounds;
}
}
data {
int<lower=0> N;
array[N] int pois_y;
array[N] int binom_y;
array[N] int binom_N;
}
parameters {
real<lower=0> lambda;
real<lower=0, upper=1> theta;
matrix<
lower=uvar_bounds(pois_y, binom_y, binom_N, lambda, theta, 0),
upper=uvar_bounds(pois_y, binom_y, binom_N, lambda, theta, 1)
>[N, 2] u;
cholesky_factor_corr[2] rho_chol;
}
model {
u ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
real rho = multiply_lower_tri_self_transpose(rho_chol)[1, 2];
vector[N*2] log_lik = gauss_copula_cholesky_pointwise(u, rho_chol);
}
Next we follow the same process as the continuous case to fit the model:
copula_discrete_mod <- cmdstan_model("copula_discrete.stan")
Then we can use the compiled model object to sample our parameters:
copula_discrete_fit <- copula_discrete_mod$sample(
  data = list(N = nrow(y_discrete),
              pois_y = y_discrete[,1],
              binom_y = y_discrete[,2],
              binom_N = y_denom),
  parallel_chains = 4)
We can see that we were able to successfully recover both the marginal and correlation data-generating parameters:
copula_discrete_fit$summary(variables = c("lambda", "theta", "rho"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda 9.96 9.96 0.138 0.139 9.73 10.2 1.00 3943. 2451.
2 theta 0.699 0.699 0.00285 0.00286 0.694 0.703 1.00 3658. 2975.
3 rho 0.522 0.523 0.0283 0.0275 0.474 0.568 1.00 3627. 2944.
Additionally, the time taken to fit the model was quite reasonable for the sample size (500):
copula_discrete_fit$time()$total
[1] 93.82755
Mixed Continuous-Discrete Marginals
Combining continuous and discrete marginals is a straightforward combination of the two approaches: the copula function \(C(\cdot)\) now includes both the continuous and discrete components, while marginalising over the \(u\) for the discrete components.
To illustrate, consider if we extended the bivariate Poisson example to include two continuous outcomes \(x_3\) & \(x_4\) with arbitrary density functions \(f_3(\cdot)\) & \(f_4(\cdot)\) and marginal CDFs \(F_3(\cdot)\) & \(F_4(\cdot)\).
The joint density function would now be given as:
\[ f(x_1, x_2, x_3, x_4) = \\ \]
\[ f_3(x_3) \cdot f_4(x_4) \cdot \sum_{j_1=1}^2 \sum_{j_2=1}^2 (-1)^{j_1 + j_2} C(u_{1,j_1}, u_{2,j_2}, F_3(x_3), F_4(x_4) | \Gamma) \]
Or, given that we are using the data-augmentation approach for the discrete marginals:
\[ u_1 \sim U\left(F_1(x_1^-), F_1(x_1)\right) \]
\[ u_2 \sim U\left(F_2(x_2^-), F_2(x_2)\right) \]
\[ f(x_1, x_2, x_3, x_4) = f_3(x_3) \cdot f_4(x_4) \cdot c(u_1, u_2, F_3(x_3), F_4(x_4) | \Gamma) \]
Clearly, this allows for the trivial modelling of any combination of continuous and discrete marginals.
To illustrate, we will combine the previous two approaches, and estimate a Gaussian copula with marginals: \[ y_1 \sim \text{Exponential}(\lambda) \]
\[ y_2 \sim \text{Chi-Square}(\nu) \]
\[ y_3 \sim \text{Poisson}(\lambda_r) \]
\[ y_4 \sim \text{Binomial}(N, \theta_r) \]
As with the previous examples, we will first generate the correlated outcomes:
gamma <- randcorr::randcorr(4)
z <- matrix(rnorm(2000), ncol = 4) %*% chol(gamma)
y_mix <- pnorm(z)
y_mix[,1] <- qexp(y_mix[,1], rate = lambda)
y_mix[,2] <- qchisq(y_mix[,2], df = nu)

ymix_denom <- sample(1:100, 500, replace = TRUE)
y_mix[,3] <- qpois(y_mix[,3], lambda_r)
y_mix[,4] <- qbinom(y_mix[,4], size = ymix_denom, prob = theta_r)
And update our Stan model to simply append the bounded uniform parameters (for the discrete marginals) to the evaluations of the marginal CDFs for the continuous marginals, before passing these to the copula density function:
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
matrix uvar_bounds(array[] int pois_y, array[] int binom_y,
array[] int binom_N, real lambda, real theta,
int is_upper) {
int N = size(pois_y);
matrix[N, 2] u_bounds;
for (n in 1:N) {
if (is_upper == 0) {
u_bounds[n, 1] = pois_y[n] == 0.0
  ? 0.0 : poisson_cdf(pois_y[n] - 1 | lambda);
u_bounds[n, 2] = binom_y[n] == 0.0
  ? 0.0 : binomial_cdf(binom_y[n] - 1 | binom_N[n], theta);
} else {
u_bounds[n, 1] = poisson_cdf(pois_y[n] | lambda);
u_bounds[n, 2] = binomial_cdf(binom_y[n] | binom_N[n], theta);
}
}
return u_bounds;
}
}
data {
int<lower=0> N;
matrix[N, 2] Y;
array[N] int pois_y;
array[N] int binom_y;
array[N] int binom_N;
}
parameters {
real<lower=0> lambda_exp;
real<lower=0> nu;
real<lower=0> lambda_pois;
real<lower=0, upper=1> theta;
matrix<
lower=uvar_bounds(pois_y, binom_y, binom_N, lambda_pois, theta, 0),
upper=uvar_bounds(pois_y, binom_y, binom_N, lambda_pois, theta, 1)
>[N, 2] u;
cholesky_factor_corr[4] rho_chol;
}
model {
matrix[N, 4] u_mix;
for (n in 1:N) {
u_mix[n, 1] = exponential_cdf(Y[n,1] | lambda_exp);
u_mix[n, 2] = chi_square_cdf(Y[n,2] | nu);
u_mix[n, 3] = u[n, 1];
u_mix[n, 4] = u[n, 2];
}

Y[, 1] ~ exponential(lambda_exp);
Y[, 2] ~ chi_square(nu);
u_mix ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
corr_matrix[4] rho = multiply_lower_tri_self_transpose(rho_chol);
}
copula_mix_mod <- cmdstan_model("copula_mix.stan")
copula_mix_fit <- copula_mix_mod$sample(
  data = list(N = nrow(y_mix),
              Y = y_mix[,1:2],
              pois_y = y_mix[,3],
              binom_y = y_mix[,4],
              binom_N = ymix_denom),
  parallel_chains = 4)
We can see that the data-generating parameters (both for the marginals and the dependence between them) have all been successfully recovered:
copula_mix_fit$summary(variables = c("lambda_exp", "nu", "lambda_pois", "theta"))
# A tibble: 4 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda_exp 2.14 2.14 0.0900 0.0877 2.00 2.29 1.00 2044. 2343.
2 nu 4.88 4.88 0.113 0.111 4.70 5.07 1.00 2136. 2544.
3 lambda_pois 9.93 9.93 0.128 0.130 9.73 10.1 1.00 2168. 2725.
4 theta 0.700 0.700 0.00254 0.00259 0.696 0.705 1.00 4581. 2998.
gamma
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.6975456 0.78363383 -0.13837356
[2,] 0.6975456 1.0000000 0.74715991 0.49068561
[3,] 0.7836338 0.7471599 1.00000000 -0.01542118
[4,] -0.1383736 0.4906856 -0.01542118 1.00000000
copula_mix_fit$summary(variables = c("rho"))
# A tibble: 16 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 rho[1,1] 1 1 0 0 1 1 NA NA
2 rho[2,1] 0.706 0.706 0.0136 0.0132 0.682 0.727 1.00 3539.
3 rho[3,1] 0.792 0.793 0.0139 0.0138 0.768 0.814 1.00 3835.
4 rho[4,1] -0.144 -0.144 0.0304 0.0303 -0.193 -0.0940 1.00 3053.
5 rho[1,2] 0.706 0.706 0.0136 0.0132 0.682 0.727 1.00 3539.
6 rho[2,2] 1 1 0 0 1 1 NA NA
7 rho[3,2] 0.737 0.738 0.0144 0.0144 0.713 0.760 1.00 5418.
8 rho[4,2] 0.471 0.472 0.0236 0.0237 0.432 0.511 1.00 3234.
9 rho[1,3] 0.792 0.793 0.0139 0.0138 0.768 0.814 1.00 3835.
10 rho[2,3] 0.737 0.738 0.0144 0.0144 0.713 0.760 1.00 5418.
11 rho[3,3] 1 1 0 0 1 1 NA NA
12 rho[4,3] -0.0483 -0.0487 0.0322 0.0328 -0.101 0.00458 1.00 3690.
13 rho[1,4] -0.144 -0.144 0.0304 0.0303 -0.193 -0.0940 1.00 3053.
14 rho[2,4] 0.471 0.472 0.0236 0.0237 0.432 0.511 1.00 3234.
15 rho[3,4] -0.0483 -0.0487 0.0322 0.0328 -0.101 0.00458 1.00 3690.
16 rho[4,4] 1 1 0 0 1 1 NA NA
# ℹ 1 more variable: ess_tail <dbl>
copula_mix_fit$time()$total
[1] 170.4394
Mixed Continuous-Discrete Marginals with Missing Data
The use of data augmentation with discrete marginals also allows for the trivial handling of missing data on discrete outcomes. While missing continuous data in Stan models can easily be handled by estimating the observation as a parameter in the model, the same approach cannot be used for discrete data - as HMC-based methods like Stan cannot estimate discrete parameters. However, as the data-augmented copula model is already estimating a uniform parameter for each discrete observation, we can simply set the bounds for the uniform parameter to \((0,1)\) (as we do not have an observed CDF for the bounds).
As such, the uniform parameters are instead estimated as:
\[ u_m \sim \begin{cases} U\left(F_m(x_m^-), F_m(x_m)\right) & \text{if } x_m \text{ is observed} \\ U(0, 1) & \text{if } x_m \text{ is missing} \end{cases} \]
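As a small sketch of what these bounds look like in R (a hypothetical vector with a missing value; this is not part of the Stan model itself):
pois_obs <- c(7, NA, 12); lambda_ex <- 10
lower <- ifelse(is.na(pois_obs), 0, ppois(pois_obs - 1, lambda_ex))
upper <- ifelse(is.na(pois_obs), 1, ppois(pois_obs, lambda_ex))
cbind(lower, upper) # the missing observation gets bounds (0, 1)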
This is also trivial to integrate into our existing Stan model. We simply update our bounds function (uvar_bounds()
) to set the bounds to \((0,1)\) by default, and then set the bounds to \((F_m(x_m^-), F_m(x_m))\) for the observed outcomes:
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
matrix uvar_bounds(int N, array[] int pois_y, array[] int binom_y,
array[] int binom_N, array[] int pois_y_ind,
array[] int binom_y_ind, real lambda, real theta,
int is_upper) {
matrix[N, 2] u_bounds = rep_matrix(is_upper, N, 2);
for (n in 1:size(pois_y_ind)) {
if (is_upper == 0) {
u_bounds[pois_y_ind[n], 1] = pois_y[n] == 0.0
  ? 0.0 : poisson_cdf(pois_y[n] - 1 | lambda);
} else {
u_bounds[pois_y_ind[n], 1] = poisson_cdf(pois_y[n] | lambda);
}
}
for (n in 1:size(binom_y_ind)) {
if (is_upper == 0) {
u_bounds[binom_y_ind[n], 2] = binom_y[n] == 0.0 ? 0.0
  : binomial_cdf(binom_y[n] - 1 | binom_N[n], theta);
} else {
u_bounds[binom_y_ind[n], 2] = binomial_cdf(binom_y[n] | binom_N[n], theta);
}
}
return u_bounds;
}
}
data {
int<lower=0> N;
int<lower=0> Npois;
int<lower=0> Nbinom;
matrix[N, 2] Y;
array[Npois] int pois_y;
array[Npois] int pois_y_ind;
array[Nbinom] int binom_y;
array[Nbinom] int binom_y_ind;
array[Nbinom] int binom_N;
}
parameters {
real<lower=0> lambda_exp;
real<lower=0> nu;
real<lower=0> lambda_pois;
real<lower=0, upper=1> theta;
matrix<
lower=uvar_bounds(N, pois_y, binom_y, binom_N, pois_y_ind,
                  binom_y_ind, lambda_pois, theta, 0),
upper=uvar_bounds(N, pois_y, binom_y, binom_N, pois_y_ind,
                  binom_y_ind, lambda_pois, theta, 1)
>[N, 2] u;
cholesky_factor_corr[4] rho_chol;
}
model {
matrix[N, 4] u_mix;
for (n in 1:N) {
u_mix[n, 1] = exponential_cdf(Y[n,1] | lambda_exp);
u_mix[n, 2] = chi_square_cdf(Y[n,2] | nu);
u_mix[n, 3] = u[n, 1];
u_mix[n, 4] = u[n, 2];
}

Y[, 1] ~ exponential(lambda_exp);
Y[, 2] ~ chi_square(nu);
u_mix ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
corr_matrix[4] rho = multiply_lower_tri_self_transpose(rho_chol);
}
To test this, we will randomly remove 10% of the data for each of the Poisson and Binomial outcomes and fit the model:
y_mix_miss <- y_mix
y_mix_miss[sample(1:500, 50), 3] <- NA
y_mix_miss[sample(1:500, 50), 4] <- NA
pois_y_ind <- which(!is.na(y_mix_miss[, 3]))
binom_y_ind <- which(!is.na(y_mix_miss[, 4]))

standata <- list(N = nrow(y_mix_miss),
                 Npois = length(pois_y_ind),
                 Nbinom = length(binom_y_ind),
                 Y = y_mix_miss[, 1:2],
                 pois_y = y_mix_miss[pois_y_ind, 3],
                 pois_y_ind = pois_y_ind,
                 binom_y = y_mix_miss[binom_y_ind, 4],
                 binom_y_ind = binom_y_ind,
                 binom_N = ymix_denom[binom_y_ind])
copula_mix_missing_mod <- cmdstan_model("copula_mix_missing.stan")
copula_mix_missing_fit <- copula_mix_missing_mod$sample(data = standata, parallel_chains = 4)
We can see that even with missing discrete data, the data-generating parameters have been successfully recovered:
copula_mix_missing_fit$summary(variables = c("lambda_exp", "nu", "lambda_pois", "theta"))
# A tibble: 4 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda_exp 2.15 2.14 0.0851 0.0832 2.01 2.28 1.00 1775. 2849.
2 nu 4.89 4.89 0.111 0.109 4.71 5.08 1.00 1657. 2263.
3 lambda_pois 9.96 9.96 0.124 0.126 9.75 10.2 1.00 1870. 2639.
4 theta 0.701 0.701 0.00259 0.00259 0.697 0.705 1.00 3148. 2901.
gamma
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.6975456 0.78363383 -0.13837356
[2,] 0.6975456 1.0000000 0.74715991 0.49068561
[3,] 0.7836338 0.7471599 1.00000000 -0.01542118
[4,] -0.1383736 0.4906856 -0.01542118 1.00000000
copula_mix_missing_fit$summary(variables = c("rho"))
# A tibble: 16 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 rho[1,1] 1 1 0 0 1 1 NA NA
2 rho[2,1] 0.704 0.705 0.0142 0.0140 0.681 0.727 1.00 2786.
3 rho[3,1] 0.787 0.787 0.0147 0.0146 0.762 0.810 1.00 3356.
4 rho[4,1] -0.150 -0.151 0.0304 0.0299 -0.198 -0.0981 1.00 2214.
5 rho[1,2] 0.704 0.705 0.0142 0.0140 0.681 0.727 1.00 2786.
6 rho[2,2] 1 1 0 0 1 1 NA NA
7 rho[3,2] 0.750 0.751 0.0144 0.0141 0.725 0.773 1.00 5402.
8 rho[4,2] 0.467 0.467 0.0238 0.0239 0.428 0.506 1.00 2864.
9 rho[1,3] 0.787 0.787 0.0147 0.0146 0.762 0.810 1.00 3356.
10 rho[2,3] 0.750 0.751 0.0144 0.0141 0.725 0.773 1.00 5402.
11 rho[3,3] 1 1 0 0 1 1 NA NA
12 rho[4,3] -0.0386 -0.0382 0.0333 0.0342 -0.0929 0.0155 1.00 3163.
13 rho[1,4] -0.150 -0.151 0.0304 0.0299 -0.198 -0.0981 1.00 2214.
14 rho[2,4] 0.467 0.467 0.0238 0.0239 0.428 0.506 1.00 2864.
15 rho[3,4] -0.0386 -0.0382 0.0333 0.0342 -0.0929 0.0155 1.00 3163.
16 rho[4,4] 1 1 0 0 1 1 NA NA
# ℹ 1 more variable: ess_tail <dbl>
Approximate LOO-CV for Gaussian Copula Models
Leave-one-out cross-validation (LOO-CV) is a popular and powerful tool for model comparison and selection. However, it can be difficult to compute for likelihoods which cannot be factorised into a product of individual likelihoods. Thankfully, Buerkner, Gabry, and Vehtari (2020) have previously derived a method for calculating approximate LOO-CV for non-factorisable models with a multivariate-normal outcome distribution.
Pointwise Likelihood for a Multivariate Normal
Given a vector \(y\) which follows a multivariate-normal distribution with mean \(\mu\) and covariance \(\Sigma\):
\[ y \sim \mathcal{MVN}(\mu, \Sigma) \]
The pointwise likelihood is given by:
\[ p(y_i | y_{-i}, \theta) = \mathcal{N}(y_i | y_i - \frac{g_i}{\bar{\sigma}_{ii}}, \sqrt{\bar{\sigma}_{ii}^{-1}}) \]
Where:
\[ g_i = [\Sigma^{-1}(y - \mu)]_i \]
\[ \bar{\sigma}_{ii} = [\Sigma^{-1}]_{ii} \]
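As a sanity check of this result (purely illustrative values; only base R is needed), the pointwise formula matches the conditional normal density \(p(y_i | y_{-i})\) computed directly from the usual conditional mean and variance:
Sigma <- matrix(c(1, 0.6, 0.3,
                  0.6, 1, 0.4,
                  0.3, 0.4, 1), nrow = 3)
y_ex <- c(0.2, -1.1, 0.5); mu <- rep(0, 3); i <- 2
Lambda <- solve(Sigma)
g_i <- (Lambda %*% (y_ex - mu))[i]
sigma_bar_ii <- Lambda[i, i]
# Pointwise formula from above
dnorm(y_ex[i], mean = y_ex[i] - g_i / sigma_bar_ii, sd = sqrt(1 / sigma_bar_ii))
# Direct conditional density for comparison
cond_var <- Sigma[i, i] - Sigma[i, -i] %*% solve(Sigma[-i, -i]) %*% Sigma[-i, i]
cond_mean <- mu[i] + Sigma[i, -i] %*% solve(Sigma[-i, -i]) %*% (y_ex - mu)[-i]
dnorm(y_ex[i], mean = cond_mean, sd = sqrt(cond_var))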
Approximate LOO-CV for Gaussian Copula
As demonstrated earlier, the Gaussian copula density can be expressed as a multivariate normal density divided by the product of the univariate standard normal densities. As such, we can calculate the approximate LOO-CV for a Gaussian copula by applying Buerkner, Gabry, and Vehtari (2020)’s method to the multivariate normal density to obtain a pointwise likelihood, and then divide by the respective univariate standard normal density.
To summarise, given that a Gaussian copula with continuous marginals has density function:
\[ f(y) = c(u | \Gamma) \prod_{i=1}^{m} f_i(y_i) \]
\[ c(u | \Gamma) = \textrm{MVN}(q | 0, \Gamma) \cdot \prod_{i=1}^m\textrm{N}(q_i | 0, 1)^{-1} \] Where \(u_i = F_i(y_i)\) and \(q = \Phi^{-1}(u)\).
The pointwise conditional likelihood is given by:
\[ p(y_i | y_{-i}, \theta) = \mathcal{N}(q_i | q_i - \frac{g_i}{\bar{\sigma}_{ii}}, \sqrt{\bar{\sigma}_{ii}^{-1}}) \cdot \textrm{N}(q_i | 0, 1)^{-1} \cdot f_i(y_i) \]
We can then use the PSIS algorithm implemented in the loo
package to approximate \(p(y_i | y_{-i})\).
Exact LOO-CV for Gaussian Copula
In order to validate the approximation, we can also use the method proposed by Buerkner et al. (2020) to perform exact LOO-CV. To do so, the held-out observation \(y_i\) is estimated as a parameter in the model \(y_i^{miss}\) and substituted into the set of observations \(y\) before calculating the marginal CDFs:
\[ u_{miss(i)} = \left(F_1(y_1),...,F_{i-1}(y_{i-1}),F_i(y_i^{miss}),F_{i+1}(y_{i+1}),..., F_m(y_m)\right) \] Such that \(q_{miss(i)}\) is the result of applying the standard normal quantile function to each element of \(u_{miss(i)}\). This new \(q_{miss(i)}\) set is then used to define the parameters of the log-predictive density when evaluating it at the held-out observation \(q_i\):
\[ p(y_i | y_{-i}, \theta) = \mathcal{N}(q_i | q_i^{miss} - \frac{g_i^{miss}}{\bar{\sigma}_{ii}}, \sqrt{\bar{\sigma}_{ii}^{-1}}) \cdot \textrm{N}(q_i | 0, 1)^{-1} \cdot f_i(y_i) \] Where: \[ g_i^{miss} = [\Gamma^{-1}(q_{miss(i)})]_i \] And we can then estimate the leave-one-out density for the held-out observation using the posterior samples:
\[ p(y_i | y_{-i}) \approx \frac{1}{S} \sum_{s=1}^{S} p(y_i | y_{-i}, \theta_{-i}^{(s)}) \]
Stan Model
To implement these, we will update our Stan model to either calculate the exact LOO density for a held-out observation (and estimate that observation as a parameter), or to calculate the density for all observations for use in approximate LOO-CV estimation.
We add a data argument y_miss_i
to the model to specify the index of the missing observation. If y_miss_i
is 0, the model will calculate the approximate LOO-CV density for all observations. If y_miss_i
is greater than 0, the model will calculate the exact LOO-CV density for the observation at index y_miss_i
.
functions {
real gauss_copula_cholesky_lpdf(matrix u, matrix L) {
array[rows(u)] row_vector[cols(u)] q;
for (n in 1:rows(u)) {
q[n] = inv_Phi(u[n]);
}
return multi_normal_cholesky_lpdf(q | rep_row_vector(0, cols(L)), L)
- std_normal_lpdf(to_vector(to_matrix(q)));
}
row_vector gauss_copula_cholesky_pointwise(row_vector y_obs, row_vector y_miss,
real lambda, real nu, matrix Sigma_inv) {
int J = cols(Sigma_inv);
vector[J] inv_sigma_inv = inv(diagonal(Sigma_inv));
row_vector[J] log_lik;
vector[2] u_miss = [ exponential_cdf(y_miss[1] | lambda),
                     chi_square_cdf(y_miss[2] | nu) ]';
vector[2] u = [ exponential_cdf(y_obs[1] | lambda),
                chi_square_cdf(y_obs[2] | nu) ]';
vector[2] q_miss = inv_Phi(u_miss);
vector[2] q = inv_Phi(u);
vector[2] g = Sigma_inv * q_miss;

log_lik = [ exponential_lpdf(y_obs[1] | lambda),
            chi_square_lpdf(y_obs[2] | nu) ];
for (j in 1:J) {
log_lik[j] += normal_lpdf(q[j] | q_miss[j] - g[j] * inv_sigma_inv[j],
sqrt(inv_sigma_inv[j]))
- std_normal_lpdf(q[j]);
}
return log_lik;
}
}
data {
int<lower=0> N;
matrix[N, 2] Y;
int y_miss_i;
}
transformed data {
int has_missing = y_miss_i > 0;
int y_miss_col = (y_miss_i - 1) %/% N + 1;
int y_miss_row = y_miss_i - (y_miss_col - 1) * N;
array[2] int y_miss_idx = { y_miss_row, y_miss_col };
}
parameters {
real<lower=0> lambda;
real<lower=0> nu;
cholesky_factor_corr[2] rho_chol;
array[has_missing] real<lower=0> y_miss;
}
model {
matrix[N, 2] u;
matrix[N, 2] Ymiss = Y;
if (has_missing) {
Ymiss[y_miss_idx[1], y_miss_idx[2]] = y_miss[1];
}
for (n in 1:N) {
u[n, 1] = exponential_cdf(Ymiss[n, 1] | lambda);
u[n, 2] = chi_square_cdf(Ymiss[n, 2] | nu);
}

Ymiss[, 1] ~ exponential(lambda);
Ymiss[, 2] ~ chi_square(nu);
u ~ gauss_copula_cholesky(rho_chol);
}
generated quantities {
real rho = multiply_lower_tri_self_transpose(rho_chol)[1, 2];
vector[has_missing ? 1 : N*2] log_lik;
{
int J = 2;
matrix[J, J] Sigma_inv = chol2inv(rho_chol);
matrix[has_missing ? 1 : N, J] log_lik_mat;
array[has_missing ? 1 : N] int iters = has_missing ? {y_miss_idx[1]}
                                                   : linspaced_int_array(N, 1, N);
for (n in iters) {
row_vector[J] Ymiss = Y[n];
if (has_missing) {
Ymiss[y_miss_idx[2]] = y_miss[1];
}
log_lik_mat[has_missing ? 1 : n]
  = gauss_copula_cholesky_pointwise(Y[n], Ymiss, lambda, nu, Sigma_inv);
}
if (has_missing) {
log_lik[1] = log_lik_mat[1, y_miss_idx[2]];
} else {
log_lik = to_vector(log_lik_mat);
}
}
}
copula_mod_loo <- cmdstan_model("copula_mod_loo.stan")
First we’ll estimate the approximate LOO-CV for all observations:
copula_loo_fit <- copula_mod_loo$sample(data = list(N = nrow(y), Y = y, y_miss_i = 0),
                                        parallel_chains = 4)
approx_loo <- copula_loo_fit$loo()
Next, we’ll estimate the exact LOO-CV for each observation. Given the number of observations to hold out (1000), we will do this in parallel using the furrr
and future
packages:
future::plan(future::multisession)

run_fun <- function(idx) {
  copula_mod <- cmdstanr::cmdstan_model(exe_file = "copula_mod_loo")
  copula_fit_iter <- copula_mod$sample(
    data = list(N = nrow(y), Y = y, y_miss_i = idx)
  )
  log_lik_draws <- copula_fit_iter$draws(variables = "log_lik", format = "draws_df")$`log_lik[1]`

  data.frame(res = as.numeric(log_lik_draws)) |>
    setNames(paste0("log_lik_", idx))
}

exact_lpd <- furrr::future_map_dfc(1:1000, run_fun, .progress = TRUE)

log_mean_exp <- function(x) {
  # more stable than log(mean(exp(x)))
  max_x <- max(x)
  max_x + log(sum(exp(x - max_x))) - log(length(x))
}
exact_elpds <- apply(exact_lpd, 2, log_mean_exp)
Then we can visually compare the approximate and exact LOO-CV estimates:
library(ggplot2)
suppressPackageStartupMessages(library(bayesplot))
color_scheme_set("brightblue")
theme_set(theme_default())
# Create a summary annotation for the plot
plot_lab <- paste0(paste0("Exact ELPD: ", round(sum(exact_elpds), 2)), "\n",
                   paste0("Approx ELPD: ", round(sum(approx_loo$pointwise[,1]), 2)))
data.frame(loo_elpd = approx_loo$pointwise[,1],
exact_lpd = exact_elpds) |>
ggplot(aes(x = loo_elpd, y = exact_lpd)) +
geom_abline(color = "gray30") +
geom_point(size = 2) +
xlab("Approximate elpds") +
ylab("Exact elpds") +
annotate("text", x = 0, y = -6, label = plot_lab)
We can see that the exact and approximate LOO-CV estimates are extremely consistent.