We use an item-response model such as described in the previous chapter to fit a model to estimate the abilities of the teams in the 2014 football World Cup. We use score differentials as data (ignoring the shoot-outs).
The forthcoming Bayesian workflow book has more information about the modeling task, and example of workflow of checking and fixing some models. At this time this case study focuses on illustrating the predictive performance comparison of continuous and discrete models, and the rest of the code will be included later.
We’ll start with continuous models and discretized continuous models to illustrate how those can be compared, and eventually build also models using bivariate Poisson and Poisson difference models (Karlis and Ntzoufras 2003), which are commonly used in analysis of sports data.
Data include 64 game results from Worldcup 2014 and Soccer Power Index that was available on the internet a month before the tournament (Silver, 2014). We took the rankings, with Brazil at the top (getting a score of 32) and Australia at the bottom (with a score of 1), and then for simplicity in interpretation of the parameters we rescaled these to have mean 0 and standard deviation 1/2, to get ``prior scores’’ that ranged from -0.83 to +0.83.
The first model uses normal distribution for score difference.
The model structure is as follows: if game i has teams j_1 and team j_2 playing, and they score s_1 and s_2 goals, respectively, then the data point for this game is y_i =
s_1-s_2, and the data model is:
y_i \sim \mathrm{normal}(a_{j_1[i]}-a_{j_2[i]}, \sigma_y),
where a_{j_1} and a_{j_2} are the ability parameters (to use psychometrics jargon) for the two teams and \sigma_y is a scale parameter estimated from the data.
To include the prior information the model for the team abilities is
a_j \sim \mathrm{normal}(\mu + \beta*\mathrm{prior\_score}_j, \sigma_a),
which we write as
a_j=\mu+b*\mathrm{prior\_score}_j+\sigma_a*\alpha_j,
with \alpha_j\sim\mathrm{normal}(0,1) \text{ for } j=1,\dots,J=32. Actually, though, all we care about are the relative, not the absolute, team abilities, so we can just set \mu=0, so that the model is,
a_j = b*\mathrm{prior\_score}_j+\sigma_a*\alpha_j, \text{ with } \alpha_j \sim \mathrm{normal}(0, 1), \text{ for } j=1,\dots,J=32.
Following our general practice, we add weakly informative priors for b, \sigma_a, and \sigma_y. The data and model are roughly on unit scale, so \mathrm{normal}(0,1) priors should give some computational stability while letting the data be dominant in estimating the parameters.
The data model in Stan is written as
dif ~ normal(a[team_1] - a[team_2], sigma_y);
We compute the log_lik for PSIS-LOO as most would do with continuous models, that is using the corresponding log density at the observed value.
4 Normal model with rounding and explicit latent values
As the score differences are integers, a more appropriate model would be discrete. We can discretize the normal model by adding rounding model: The new discrete model is
\begin{aligned}
\alpha & \sim \mathrm{normal}(0, 1) \\
b & \sim \mathrm{normal}(0, 1) \\
\sigma_a & \sim \mathrm{normal}(0, 1) \\
\sigma_z & \sim \mathrm{normal}(0, 1) \\
a_j & = b * \mathrm{prior\_score} + \sigma_a * \alpha, \quad \text{for } j=1,\dots,32\\
z_i & \sim \mathrm{normal}(a_{j_1[i]}-a_{j_2[i]}, \sigma_z), \quad \text{for } i=1,\dots,64\\
y_i & = \mathrm{round}(z_i, 1)
\end{aligned}
where the observed score differential y_i is equal to the latent variable z_i rounded to the nearest integer. The likelihood p(y_i|z_i) is then a box function,
\begin{aligned}
p(y_i|z_i) = \begin{cases}
\, 1 & \text{if } z_i\in(y_i-0.5, y_i+ 0.5) \\
\, 0 & \text{otherwise.}
\end{cases}
\end{aligned}
Multiplying the other terms of the model with this likelihood sets the posterior density to zero when z_i is outside the interval (y_i-0.5, y_i+0.5). To handle this discontinuous posterior in Stan, we code the constraint as bounds for the latent continuous outcome in the parameters block:
vector<lower=dif-0.5, upper=dif+0.5>[N_games] z;
The model block is then coded as
model { alpha ~ normal(0, 1); b ~ normal(0, 1); sigma_a ~ normal(0, 1); sigma_z ~ normal(0, 1); z ~ normal(a[team_1] - a[team_2], sigma_z);}
The model block does not include the data model for y_i at all, but the effect of the likelihood p(y_i|z_i) is coded through the constraints on z_i. This is another example, where due to the properties of Stan, we don’t code the model completely as a generative model, and part of the data model comes in the constraint.
The model summary is very similar to continuous model. We’ll soon see the explanation why the difference is small. At this point we skip the LOO comparison as PSIS-LOO computation does not work well with the explicit latent value model.
5 Normal model with rounding and integrated latent values
In our first discrete model we presented the latent value explicitly, but we can also integrate that out:
\begin{aligned}
p(\alpha,b,\sigma_a,\sigma_z|y) & \propto \int \!\!p(\alpha,b,\sigma_a,\sigma_z,z|y) dz \\
& = p(\alpha,b,\sigma_a,\sigma_z)\prod_{i=1}^{64} \int\!\! p(z_i|\alpha,b,\sigma_z)p(y_i|z_i) dz_i \\
& = p(\alpha,b,\sigma_a,\sigma_z)\prod_{i=1}^{64} \int_{y_i-1/2}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z) dz_i.
\end{aligned}
The integrals \int_{y_i-1/2}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z)dz_i are easy to compute as
\int_{y_i-1/2}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z)dz_i = \int_{-\infty}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z)dz_i \,- \int_{-\infty}^{y_i-1/2} p(z_i|\alpha,b,\sigma_z)dz_i,
and p(z_i|\alpha,b,\sigma_z) is normal distribution so that we can use a common function to compute the cumulative distribution function efficiently.
We can’t write this only with the distribution notation, and we increment the log density directly:
Here computes \log(\exp(a)-\exp(b)) while taking care of the floating point numerical accuracy, and computes the log cumulative distribution function of the normal distribution.
Now the corresponding log_lik is computed as in the model block, and this integrated log_lik works well with PSIS-LOO computation.
The model summary is up to Monte Carlo variability the same as from the model with explicit latent variables.
6 Continuous model approximating a discrete model
We can approximate integrals using the midpoint rule,
\int_{a}^{b}f(z)\,dz\approx (b-a)\,f((a+b)/2),
which in this case is
\int_{y_i-1/2}^{y_i+1/2}p(z_i|\alpha,b,\sigma_z) dz_i \approx (y_i+1/2-y_i-1/2)p(y_i|\alpha,b,\sigma_z) = p(y_i|\alpha,b,\sigma_z),
exactly the term in the continuous model after replacing \sigma_z with \sigma_y! If \sigma_y is relatively big compared to the integration interval (which in this case is 1, as the score differential is represented as a latent continuous variable rounded to the nearest integer), then the midpoint rule shou#’ ld be fine. In this case the posterior mean of \sigma_y is about 1.5. For this problem it would be fine for most purposes to perform inferences treating the discrete score differential as a continuous outcome and then include the rounding just when generating predictions.
7 Comparing discrete and continuous models
When estimating predictive performance we use log predictive densities for continuous data models and log predictive probabilities for discrete data models. We can’t directly compare densities and probabilities, but the above discussion shows that for integer outcomes, we can use the midpoint rule to approximate the predictive probabilities for a discretized (rounded) continuous model.
If we compare LOO results, there is minimal difference due to the Monte Carlo variation.
elpd_diff se_diff
Normal rounded model 0.0 0.0
Normal continuous model -0.2 0.2
The computation gets more complicated if a continuous data model is used for discrete outcomes that are not integers, or more specifically if level of discretization or rounding is not the same for all observations. Previously we had used continuous data model for the square root of the score differentials. If the outcome were continuous, we could adjust the log predictive densities by including the Jacobian of the transformation. Chapter 12 of {} has an example of modeling the weight of leaves from mesquite bushes with normal model for the yield y directly and for the logarithm of the yield, \log(y). To make the comparison valid, for the model using \log(y) as data, we can compute
p(y_i | \theta) = p(\log(y_i) | \theta)/y_i,
where 1/y_i is the Jacobian of the transformation \log(y_i). When using a continuous data model for discrete outcome, simply adding the Jacobian term and using midpoint rule may work sometimes, but for example, if we use normal model for signed square root score difference, then this does not work, as the score difference y_i can be 0 and then the Jacobian 1/(2\sqrt{y_i}) becomes infinite. Even if the Jacobian were finite for all outcome values, we would need to take into account that the integration interval length is not 1, and the Jacobian can affect the curvature. Thus, we need to perform the actual integration,
\int_{y_i-1/2}^{y_i+1/2} \frac{1}{2\sqrt{y_i}}\mathrm{normal}(y_i | a_{j_1[i]}-a_{j_2[i]}, \sigma_y) dy_i.
This integral is well defined also for y_i=0, but we just need to compute the integral in parts avoiding the problematic singularity at 0. Stan provides the function using the double exponential quadrature algorithm, which we can use in generated quantities to compute the variable used in PSIS-LOO computation.
We build a normal for signed square root score difference. The transformed data block has
for (i in1:N_games) { sqrt_dif[i] = 2 * (step(dif[i]) - 0.5) * sqrt(abs(dif[i])); }
If we compare the sqrt-normal model without taking into account the Jacobian to the normal model, it would falsely look like sqrt-normal is much better.
loo_compare(list("Normal model"=loo_normal,"Sqrt-normal model (no Jacobian)"=loo_sqrt_normal))
elpd_diff se_diff
Sqrt-normal model (no Jacobian) 0.0 0.0
Normal model -32.5 5.2
We then create a sqrt-normal model with log_lik computation using Jacobian of sqrt-transformation and integration to discretize the sqrt-normal. The integration is made using 1D quadrature integration.
log_lik[n] = (dif[n] == 0) ? log(integrate_1d(integrand, -0.5,0, append_array({sigma_y}, {a[team_1[n]] - a[team_2[n]]}), {0}, // not used, but an empty array not allowed {0}, // not used, but an empty array not allowed1e-6) + integrate_1d(integrand,0,0.5, append_array({sigma_y}, {a[team_1[n]] - a[team_2[n]]}), {0}, // not used, but an empty array not allowed {0}, // not used, but an empty array not allowed1e-6)) : log(integrate_1d(integrand, dif[n]-0.5, dif[n]+0.5, append_array({sigma_y}, {a[team_1[n]] - a[team_2[n]]}), {0}, // not used, but an empty array not allowed {0}, // not used, but an empty array not allowed1e-6));}
If we compare the sqrt-normal model to the normal model while taking into account the Jacobian and correct discretization, we see that the sqrt root model is worse.
elpd_diff se_diff
Normal model 0.0 0.0
Sqrt-normal model -7.2 5.5
8 Bivariate Poisson and Poisson difference models
To complete the comparison of continuous and discrete models we build bivariate Poisson and Poisson difference models (Karlis and Ntzoufras 2003), which are naturally discrete and commonly used in sport statistics. For the bivariate Poisson we compute log_lik in generated quantities using the Poisson difference log probability to make it comparable with other models targeting the score difference.
When we compare different models, there is not much difference, but it would make sense to use bivariate Poisson or Poisson difference as they are naturally discrete and justfied for count data.
elpd_diff se_diff
Poisson difference 0.0 0.0
Bivariate Poisson -1.0 1.6
Normal rounded (discrete) -1.7 1.6
Normal (continuous) -1.9 1.7
Although in this case the actual discrete count data models were better or as good as continuous, sometimes it can be useful to have more flexibility in the choice of distributions and some continuous distribution with flexible shape might work better than common count models. If some continuous model seems to better than usual count choices (and the log_lik computation has been made correctly), then it might be worthwhile to investigate if there is some less well known discrete distribution with the same flexibility or if using additional latent dispersion variable would help.
References
Karlis, Dimitris, and Ioannis Ntzoufras. 2003. “Analysis of Sports Data by Using Bivariate Poisson Models.”Journal of the Royal Statistical Society: Series D 52 (3): 381–93.
Source Code
#' ---#' title: "Bayesian workflow book - Worldcup continuous vs discrete"#' author: "Andrew Gelman and Aki Vehtari"#' date: 2021-01-12#' date-modified: today#' date-format: iso#' format:#' html:#' toc: true#' toc-location: left#' toc-depth: 2#' number-sections: true#' smooth-scroll: true#' theme: readable#' code-copy: true#' code-download: true#' code-tools: true#' embed-resources: true#' anchor-sections: true#' html-math-method: katex#' bibliography: ../casestudies.bib#' ---#' #' # Worldcup 2014 team performance analysis#'#' We use an item-response model such as described in the previous#' chapter to fit a model to estimate the abilities of the teams in#' the 2014 football World Cup. We use score differentials as data#' (ignoring the shoot-outs).#'#' The forthcoming Bayesian workflow book has more information about#' the modeling task, and example of workflow of checking and fixing#' some models. At this time this case study focuses on illustrating#' the predictive performance comparison of continuous and discrete#' models, and the rest of the code will be included later.#' #' We'll start with continuous models and discretized continuous#' models to illustrate how those can be compared, and eventually#' build also models using bivariate Poisson and Poisson difference#' models [@Karlis+Ntzoufras:2003:bivariate_Poisson],#' which are commonly used in analysis of sports data.#'#' -------------#' #+ setup, include=FALSEknitr::opts_chunk$set(message=FALSE, error=FALSE, warning=FALSE, comment=NA, cache=FALSE)# switch this to TRUE to save figures in separate filessavefigs <- FALSE#' **Load packages**#| code-fold: true#| cache: FALSElibrary(rprojroot)root<-has_file(".Workflow-Examples-root")$make_fix_file()library(cmdstanr)library(posterior)options(tibble.print_max=35, pillar.neg=FALSE, pillar.subtle=FALSE, pillar.sigfig=2)library(arm)library(loo)library(dplyr)library(readr)#' # Data#'#' Data include 64 game results from Worldcup 2014 and Soccer Power#' Index that was available on the internet a month before the#' tournament (Silver, 2014). We took the rankings, with Brazil at#' the top (getting a score of 32) and Australia at the bottom (with a#' score of 1), and then for simplicity in interpretation of the#' parameters we rescaled these to have mean 0 and standard deviation#' 1/2, to get ``prior scores'' that ranged from $-0.83$ to $+0.83$. powerindex <- read_csv(root("Worldcup/data","soccerpowerindex.csv")) %>% mutate(prior_score = as.vector(scale(rev(index))/2))teamnames <- powerindex$teamworldcup2014 <- read_csv(root("Worldcup/data","worldcup2014.csv")) %>% mutate(team_1 = match(team1, teamnames), team_2 = match(team2, teamnames))N_games <- nrow(worldcup2014)gamenames <- with(worldcup2014, rev(paste(teamnames[team_1], "vs.", teamnames[team_2])))#' ## Data for Stanstandata <- with( worldcup2014, list( N_teams = nrow(powerindex), N_games = N_games, team_1 = team_1, score_1 = score1, team_2 = team_2, score_2 = score2, prior_score = powerindex$prior_score ))#' # Normal continuous model#'#' The first model uses normal distribution for score difference.#'#' The model structure is as follows: if game $i$ has teams $j_1$ and#' team $j_2$ playing, and they score $s_1$ and $s_2$ goals,#' respectively, then the data point for this game is $y_i =#' s_1-s_2$, and the data model is:#' $$#' y_i \sim \mathrm{normal}(a_{j_1[i]}-a_{j_2[i]}, \sigma_y),#' $$#' where $a_{j_1}$ and $a_{j_2}$ are the ability parameters (to use#' psychometrics jargon) for the two teams and $\sigma_y$ is a scale#' parameter estimated from the data.#' #' To include the prior information the model for the team abilities is#' $$#' a_j \sim \mathrm{normal}(\mu + \beta*\mathrm{prior\_score}_j, \sigma_a),#' $$#' which we write as#' $$#' a_j=\mu+b*\mathrm{prior\_score}_j+\sigma_a*\alpha_j,#' $$#' with $\alpha_j\sim\mathrm{normal}(0,1) \text{ for } j=1,\dots,J=32$.#' Actually, though, all we care about are the relative, not the#' absolute, team abilities, so we can just set $\mu=0$, so that the#' model is,#' $$#' a_j = b*\mathrm{prior\_score}_j+\sigma_a*\alpha_j, \text{ with } \alpha_j \sim \mathrm{normal}(0, 1), \text{ for } j=1,\dots,J=32.#' $$#' Following our general practice, we add weakly informative priors#' for $b$, $\sigma_a$, and $\sigma_y$. The data and model are#' roughly on unit scale, so $\mathrm{normal}(0,1)$ priors should give#' some computational stability while letting the data be dominant in#' estimating the parameters.#'#' The data model in Stan is written as#' ```stan#' dif ~ normal(a[team_1] - a[team_2], sigma_y);#' ```#' We compute the `log_lik` for PSIS-LOO as most would do with#' continuous models, that is using the corresponding log density at#' the observed value.#' ```stan#' log_lik[n] = normal_lpdf(dif[n] | a[team_1[n]] - a[team_2[n]], sigma_y);#' ```#'#' Continuous normal modelmodel_normal <- cmdstan_model(stan_file = root("Worldcup", "worldcup_normal.stan"))fit_normal <- model_normal$sample(data = standata, refresh = 0)fit_normal$summary(c("a[1]","a[32]","b","sigma_a","sigma_y"))loo_normal <- fit_normal$loo()#' # Normal model with rounding and explicit latent values#'#' As the score differences are integers, a more appropriate model#' would be discrete. We can discretize the normal model by#' adding rounding model:#' The new discrete model is#' $$#' \begin{aligned}#' \alpha & \sim \mathrm{normal}(0, 1) \\#' b & \sim \mathrm{normal}(0, 1) \\#' \sigma_a & \sim \mathrm{normal}(0, 1) \\#' \sigma_z & \sim \mathrm{normal}(0, 1) \\#' a_j & = b * \mathrm{prior\_score} + \sigma_a * \alpha, \quad \text{for } j=1,\dots,32\\#' z_i & \sim \mathrm{normal}(a_{j_1[i]}-a_{j_2[i]}, \sigma_z), \quad \text{for } i=1,\dots,64\\#' y_i & = \mathrm{round}(z_i, 1)#' \end{aligned}#' $$#' where the observed score differential $y_i$ is equal to the latent#' variable $z_i$ rounded to the nearest integer.#' The likelihood $p(y_i|z_i)$ is then a box function,#' $$#' \begin{aligned}#' p(y_i|z_i) = \begin{cases} #' \, 1 & \text{if } z_i\in(y_i-0.5, y_i+ 0.5) \\#' \, 0 & \text{otherwise.}#' \end{cases}#' \end{aligned}#' $$#' Multiplying the other terms of the model with this likelihood sets#' the posterior density to zero when $z_i$ is outside the interval#' $(y_i-0.5, y_i+0.5)$. To handle this discontinuous posterior in#' Stan, we code the constraint as bounds for the latent continuous#' outcome in the parameters block:#' #' ```stan#' vector<lower=dif-0.5, upper=dif+0.5>[N_games] z;#' ```#' The model block is then coded as#' ```stan#' model {#' alpha ~ normal(0, 1);#' b ~ normal(0, 1);#' sigma_a ~ normal(0, 1);#' sigma_z ~ normal(0, 1);#' z ~ normal(a[team_1] - a[team_2], sigma_z);#' }#' ```#' The model block does not include the data model for $y_i$ at all,#' but the effect of the likelihood $p(y_i|z_i)$ is coded through the#' constraints on $z_i$. This is another example, where due to the#' properties of Stan, we don't code the model completely as a#' generative model, and part of the data model comes in the#' constraint.#' #' Discrete model with explicit latent z sampledmodel_normal_rounded_z <- cmdstan_model(stan_file = root("Worldcup", "worldcup_normal_rounded_z.stan"))fit_normal_rounded_z <- model_normal_rounded_z$sample(data = standata, refresh = 0)fit_normal_rounded_z$summary(c("b","sigma_a","sigma_z"))#' The model summary is very similar to continuous model. We'll soon#' see the explanation why the difference is small. At this point we#' skip the LOO comparison as PSIS-LOO computation does not work well#' with the explicit latent value model.#' #' # Normal model with rounding and integrated latent values#'#' In our first discrete model we presented the latent value#' explicitly, but we can also integrate that out:#' $$#' \begin{aligned}#' p(\alpha,b,\sigma_a,\sigma_z|y) & \propto \int \!\!p(\alpha,b,\sigma_a,\sigma_z,z|y) dz \\#' & = p(\alpha,b,\sigma_a,\sigma_z)\prod_{i=1}^{64} \int\!\! p(z_i|\alpha,b,\sigma_z)p(y_i|z_i) dz_i \\#' & = p(\alpha,b,\sigma_a,\sigma_z)\prod_{i=1}^{64} \int_{y_i-1/2}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z) dz_i.#' \end{aligned}#' $$#' The integrals $\int_{y_i-1/2}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z)dz_i$ are easy to compute as#' $$#' \int_{y_i-1/2}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z)dz_i = \int_{-\infty}^{y_i+1/2} p(z_i|\alpha,b,\sigma_z)dz_i \,- \int_{-\infty}^{y_i-1/2} p(z_i|\alpha,b,\sigma_z)dz_i,#' $$#' and $p(z_i|\alpha,b,\sigma_z)$ is normal distribution so that we can use a common function to compute the cumulative distribution function efficiently.#' #' We can't write this only with the distribution notation, and we increment the log density directly:#' \begin{verbatimR}#' model {#' alpha ~ normal(0, 1);#' b ~ normal(0, 1);#' sigma_a ~ normal(0, 1);#' sigma_z ~ normal(0, 1);#' for (n in 1:N_games) {#' target += log_diff_exp(normal_lcdf(dif[n]+0.5 | a[team_1[n]]-a[team_2[n]], sigma_z),#' normal_lcdf(dif[n]-0.5 | a[team_1[n]]-a[team_2[n]], sigma_z));#' }#' }#' \end{verbatimR}#'#' Here \texttt{log\_diff\_exp} computes $\log(\exp(a)-\exp(b))$ while#' taking care of the floating point numerical accuracy, and#' \texttt{normal\_lcdf} computes the log cumulative distribution#' function of the normal distribution.#'#' Now the corresponding `log_lik` is computed as in the model block, and this integrated `log_lik` works well with PSIS-LOO computation.#' ```stan#' log_lik[n] = log_diff_exp(normal_lcdf(dif[n]+0.5 | a[team_1[n]] - a[team_2[n]], sigma_z),#' normal_lcdf(dif[n]-0.5 | a[team_1[n]] - a[team_2[n]], sigma_z));#' ```model_normal_rounded <- cmdstan_model(stan_file = root("Worldcup", "worldcup_normal_rounded.stan"))fit_normal_rounded <- model_normal_rounded$sample(data = standata, refresh = 0)fit_normal_rounded$summary(c("b","sigma_a","sigma_z"))loo_normal_rounded <- fit_normal_rounded$loo()#' The model summary is up to Monte Carlo variability the same as from#' the model with explicit latent variables.#'#' # Continuous model approximating a discrete model#' #' We can approximate integrals using the midpoint rule,#' $$#' \int_{a}^{b}f(z)\,dz\approx (b-a)\,f((a+b)/2),#' $$#' which in this case is#' $$#' \int_{y_i-1/2}^{y_i+1/2}p(z_i|\alpha,b,\sigma_z) dz_i \approx (y_i+1/2-y_i-1/2)p(y_i|\alpha,b,\sigma_z) = p(y_i|\alpha,b,\sigma_z),#' $$#' exactly the term in the continuous model after replacing $\sigma_z$#' with $\sigma_y$! If $\sigma_y$ is relatively big compared to the#' integration interval (which in this case is 1, as the score#' differential is represented as a latent continuous variable rounded#' to the nearest integer), then the midpoint rule shou#' ld be#' fine. In this case the posterior mean of $\sigma_y$ is about 1.5.#' For this problem it would be fine for most purposes to perform#' inferences treating the discrete score differential as a continuous#' outcome and then include the rounding just when generating#' predictions.#'#' # Comparing discrete and continuous models#'#' When estimating predictive performance we use log predictive#' densities for continuous data models and log predictive#' probabilities for discrete data models. We can't directly compare#' densities and probabilities, but the above discussion shows that#' for integer outcomes, we can use the midpoint rule to approximate#' the predictive probabilities for a discretized (rounded) continuous#' model.#'#' If we compare LOO results, there is minimal difference due to the#' Monte Carlo variation.loo_compare(list("Normal continuous model"=loo_normal, "Normal rounded model"=loo_normal_rounded))#' The computation gets more complicated if a continuous data model is#' used for discrete outcomes that are not integers, or more#' specifically if level of discretization or rounding is not the same#' for all observations. Previously we had used continuous data model#' for the square root of the score differentials. If the outcome were#' continuous, we could adjust the log predictive densities by#' including the Jacobian of the transformation. Chapter 12 of {\em#' Regression and Other Stories} has an example of modeling the weight#' of leaves from mesquite bushes with normal model for the yield $y$#' directly and for the logarithm of the yield, $\log(y)$. To make the#' comparison valid, for the model using $\log(y)$ as data, we can#' compute#' $$#' p(y_i | \theta) = p(\log(y_i) | \theta)/y_i,#' $$#' where $1/y_i$ is the Jacobian of the transformation $\log(y_i)$.#' When using a continuous data model for discrete outcome, simply#' adding the Jacobian term and using midpoint rule may work#' sometimes, but for example, if we use normal model for signed#' square root score difference, then this does not work, as the score#' difference $y_i$ can be $0$ and then the Jacobian $1/(2\sqrt{y_i})$#' becomes infinite. Even if the Jacobian were finite for all outcome#' values, we would need to take into account that the integration#' interval length is not 1, and the Jacobian can affect the#' curvature. Thus, we need to perform the actual integration,#' $$#' \int_{y_i-1/2}^{y_i+1/2} \frac{1}{2\sqrt{y_i}}\mathrm{normal}(y_i | a_{j_1[i]}-a_{j_2[i]}, \sigma_y) dy_i.#' $$#' This integral is well defined also for $y_i=0$, but we just need to#' compute the integral in parts avoiding the problematic singularity#' at $0$. Stan provides the \texttt{integrate\_1d} function using the#' double exponential quadrature algorithm, which we can use in#' generated quantities to compute the \texttt{log\_lik} variable used#' in PSIS-LOO computation.#'#' We build a normal for signed square root score difference. The transformed data block has#' ```stan#' for (i in 1:N_games) {#' sqrt_dif[i] = 2 * (step(dif[i]) - 0.5) * sqrt(abs(dif[i]));#' }#' ```#' the model block has#' ```stan#' sqrt_dif ~ normal(a[team_1] - a[team_2], sigma_y);#' ```#' and the generated quantities block has#' ```stan#' log_lik[n] = normal_lpdf(sqrt_dif[n] | a[team_1[n]] - a[team_2[n]], sigma_y);#' ```#' We create a model using log densities for `log_lik`.model_sqrt_normal <- cmdstan_model(stan_file = root("Worldcup", "worldcup_sqrt_normal.stan"))fit_sqrt_normal <- model_sqrt_normal$sample(data = standata, refresh = 0)fit_sqrt_normal$summary(c("b","sigma_a","sigma_y"))loo_sqrt_normal <- fit_sqrt_normal$loo()#' If we compare the sqrt-normal model without taking into account the#' Jacobian to the normal model, it would falsely look like#' sqrt-normal is much better.loo_compare(list("Normal model"=loo_normal, "Sqrt-normal model (no Jacobian)"=loo_sqrt_normal))#' We then create a sqrt-normal model with `log_lik` computation using Jacobian of sqrt-transformation and integration to discretize the sqrt-normal. The integration is made using 1D quadrature integration.#' ```stan#' log_lik[n] = (dif[n] == 0)#' ? log(integrate_1d(integrand,#' -0.5,#' 0,#' append_array({sigma_y}, {a[team_1[n]] - a[team_2[n]]}),#' {0}, // not used, but an empty array not allowed#' {0}, // not used, but an empty array not allowed#' 1e-6) +#' integrate_1d(integrand,#' 0,#' 0.5,#' append_array({sigma_y}, {a[team_1[n]] - a[team_2[n]]}),#' {0}, // not used, but an empty array not allowed#' {0}, // not used, but an empty array not allowed#' 1e-6))#' : log(integrate_1d(integrand,#' dif[n]-0.5,#' dif[n]+0.5,#' append_array({sigma_y}, {a[team_1[n]] - a[team_2[n]]}),#' {0}, // not used, but an empty array not allowed#' {0}, // not used, but an empty array not allowed#' 1e-6));#' }#'```model_sqrt_normal_jacobian_discretized <- cmdstan_model(stan_file = root("Worldcup", "worldcup_sqrt_normal_jacobian_discretized.stan"))fit_sqrt_normal_jacobian_discretized <- model_sqrt_normal_jacobian_discretized$sample(data = standata, refresh = 0)fit_sqrt_normal_jacobian_discretized$summary(c("b","sigma_a","sigma_y"))loo_sqrt_normal_jacobian_discretized <- fit_sqrt_normal_jacobian_discretized$loo()#' If we compare the sqrt-normal model to the normal model while#' taking into account the Jacobian and correct discretization, we see#' that the sqrt root model is worse.loo_compare(list("Normal model"=loo_normal, "Sqrt-normal model"=loo_sqrt_normal_jacobian_discretized))#' # Bivariate Poisson and Poisson difference models#'#' To complete the comparison of continuous and discrete models we#' build bivariate Poisson and Poisson difference models#' [@Karlis+Ntzoufras:2003:bivariate_Poisson], which are naturally#' discrete and commonly used in sport statistics. For the bivariate#' Poisson we compute `log_lik` in generated quantities using the#' Poisson difference log probability to make it comparable with other#' models targeting the score difference.model_bipois <- cmdstan_model(stan_file = root("Worldcup", "worldcup_bivariate_poisson.stan"))fit_bipois <- model_bipois$sample(data = standata, refresh = 0)loo_bipois <- fit_bipois$loo()model_poisdif <- cmdstan_model(stan_file = root("Worldcup", "worldcup_poisson_difference.stan"))fit_poisdif <- model_poisdif$sample(data = standata, refresh = 0)fit_poisdif$summary(c("a[1]","a[32]","b","sigma_a"))loo_poisdif <- fit_poisdif$loo()#' When we compare different models, there is not much difference, but#' it would make sense to use bivariate Poisson or Poisson difference#' as they are naturally discrete and justfied for count data.loo_compare(list("Normal (continuous)"=loo_normal, "Normal rounded (discrete)"=loo_normal_rounded, "Bivariate Poisson"=loo_bipois, "Poisson difference"=loo_poisdif))#' Although in this case the actual discrete count data models were#' better or as good as continuous, sometimes it can be useful to have#' more flexibility in the choice of distributions and some continuous#' distribution with flexible shape might work better than common#' count models. If some continuous model seems to better than usual#' count choices (and the `log_lik` computation has been made#' correctly), then it might be worthwhile to investigate if there is#' some less well known discrete distribution with the same#' flexibility or if using additional latent dispersion variable would#' help.#'