Friends model checking case study

Author

Aki Vehtari

Published

2025-08-26

Modified

2025-08-27

Setup

Load packages

Code
library(dplyr)
library(tidyr)
library(reliabilitydiag)
library(loo)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans", base_size=16))
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 4)
library(marginaleffects)
library(patchwork)
library(ggdist)
library(khroma)

1 Introduction

This case study was inspired by a case study by Julia M. Rohrer and Vincent Arel-Bundock illustrating the use of marginaleffects R package.

From their case study:

It’s a common complaint that people who enter a relationship start to neglect their friends. Here, we are going to use this to motivate an associational research question: Do people in romantic relationships, on average, assign less importance to their friends? To address this question, we analyze data that were collected in the context of a diary study on satisfaction with various aspects of life (Rohrer et al., 2024). In this study, 482 people reported whether they were in a romantic relationship of any kind (partner) and also the extent to which they considered their friendships important (friendship_importance) on a scale from 1 (not important at all) to 5 (very important).

In this cases study, I build corresponding Bayesian models and illustrate model checking, comparison, and interpretation. I did check convergence diagnostics and sufficient effective sample sizes (Vehtari et al. 2021) for all MCMC inferences, but do not show them explicitly here.

2 Read and clean the data

friends <- read.csv("start.csv")
friends <- friends |>
  filter(sex != 3 &      # exclude 5 people who reported a gender distinct from male/female
           age < 60) |>  # exclude people over the age of 60
  select(IMP_friends_Start, age, partner_any, sex) |>
  rename(partner = partner_any,
         gender = sex,
         friendship_importance = IMP_friends_Start) |>
  drop_na() |> 
  mutate(gender = factor(gender, levels = c(1, 2), labels = c("female", "male")),
         partner = factor(partner, levels = c(0, 1), labels = c("no", "yes")),
         age_scaled = age/60) # normalize for easier prior definition
head(friends) |> print(digits = 2)
  friendship_importance age partner gender age_scaled
1                     5  25      no female       0.42
2                     4  23      no female       0.38
3                     5  20     yes female       0.33
4                     4  22     yes female       0.37
5                     5  24      no female       0.40
6                     5  20      no female       0.33

3 Normal data model

Normal model is commonly used even we know the target is not normally distributed or continuous. It can be computationally convenient especially in initial data exploration in the early part of modeling workflow.

3.1 Linear model with age

#

Using scaled age, makes the prior definition easier especially for the later models with more predictor terms. For this simplest model, we could use any wide prior or even flat prior for the only coefficient, but R2D2 prior (Zhang et al. 2022; Aguilar and Bürkner 2023) has a benefit that the prior is set on R^2 and when we later add more model components, by using the same R2D2 prior all the models have similar prior predictive distribution.

fit_age_lin <- brm(friendship_importance ~ age_scaled,
               data = friends,
               prior = prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3), class = b),
               control = list(adapt_delta = 0.99),
               refresh = 0) |>
  add_criterion(criterion="loo")

3.2 Each year of age as its own category

The original model had formula ~ as.factor(age), but for Bayesians it is more natural to use hierarchical model where the prior scale for the category specific intercepts is also inferred.

fit_age_cat <- brm(friendship_importance ~ 1 + (1 | age), data = friends,
               refresh = 0) |>
  add_criterion(criterion="loo")

3.3 Spline model with age

The original model used bs(age, df = 4), but s() is better as it uses regularized thin plate splines and we don’t need to choose degrees of freedoms. We can use R2D2 prior also for spline magnitude parameter.

fit_age_smooth <- brm(friendship_importance ~ s(age_scaled),
                      data = friends,
                      prior = prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3), class = sds),
                      control = list(adapt_delta = 0.99),
                      refresh = 0) |>
  add_criterion(criterion = "loo")

3.4 Spline and interactions with age, gender, and partner

The original case study included the following formula:

friendship_importance ~ bs(age, df = 4) + gender + partner +
           bs(age, df = 4):gender + partner:gender + bs(age, df = 4):partner

As splines didn’t seem to be helpful, we also build a corresponding model with linear terms and the same interactions.

fit_i <- brm(friendship_importance ~ age_scaled + age_scaled:gender + age_scaled:partner +
               gender*partner,
             prior = prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3), class = b),
             data = friends,
             control = list(adapt_delta = 0.99),
             refresh = 0) |>
  add_criterion(criterion = "loo")

As before for the spline version we use s() instead of bs().

fit_is <- brm(friendship_importance ~ s(age_scaled) + gender*partner +
                s(age_scaled, by = gender) + s(age_scaled, by = partner),
              prior = c(prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3, main = TRUE), class = b),
                        prior(R2D2(main = FALSE), class = sds)),
              data = friends,
              control = list(adapt_delta = 0.95),
              refresh = 0) |>
  add_criterion(criterion = "loo")

3.5 Comparison of predictive performance

We already have computed above the expected log predictive densities using fast Pareto smoothed importance sampling leave-one-out cross-validation (Vehtari, Gelman, and Gabry 2017; Vehtari et al. 2024), and use those for comparing the models.

loo_compare(fit_age_lin, fit_age_cat, fit_age_smooth,
            model_names = c("Normal linear",
                            "Normal categorical",
                            "Normal spline"))
                   elpd_diff se_diff
Normal spline        0.0       0.0  
Normal linear       -1.1       1.8  
Normal categorical -11.0       4.0  

Categorical is clearly the worst. Linear and spline model are practically equally good.

loo_compare(fit_age_smooth, fit_i, fit_is,
            model_names = c("Normal spline with age only",
                            "Normal linear with interactions",
                            "Normal spline with interactions"))
                                elpd_diff se_diff
Normal spline with age only      0.0       0.0   
Normal linear with interactions -2.2       2.5   
Normal spline with interactions -2.3       1.7   

There is no predictive performance benefit from the additional predictors.

3.6 Predictive model checking

We compare model predictions to the data to diagnose possible model misspecification.

The default posterior predictive checking plot uses kernel density estimate, which is not good for discrete data, but still shows that the data distribution is more skewed than normal.

pp_check(fit_age_smooth)

Histogram plot makes it clear that the normal distribution does not resemble the data.

pp_check(fit_age_smooth, type = "hist", ndraws=1) +
  scale_x_continuous(breaks = 1:7)

We can discretize the normal model by rounding the predictions to nearest integer and use bars plot. Bars plot is often useless for ordinal models as shown later also in this case study. In this case, the normal model is so rigid that even the bar plot can show the misspecification.

ppc_bars(y = friends$friendship_importance,
         yrep = round(posterior_predict(fit_age_smooth))) +
  scale_x_continuous(breaks = 1:7) +
  coord_cartesian(xlim=c(.5,7.5))

4 Ordinal model

At this point let’s switch to a better data model and use actual ordinal model. We build linear and spline models without and with interactions. In this case study, I don’t build any polynomial models as there is no domain specific theoretical justification to use them for this data, and as generic non-linear models they are inferior to regularized thin plate splines.

We can use R2D2 prior with ordinal model, too. The implementation in brms is not defining prior on R^2, but it is still taking care that the prior predictive variance in latent space is not increasing when we add more model compare. Yanchenko (2025) discusses how to specify prior on pseudo-R^2 for ordinal models.

4.1 Latent linear model without interactions

fit_ord <- brm(friendship_importance ~ age_scaled + gender + partner,
                 family = cumulative(),
                 prior = prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3), class = b),
               data = friends,
               control = list(adapt_delta = 0.99),
                 refresh = 0) |>
  add_criterion(criterion = "loo", save_psis = TRUE)

4.2 Latent spline model without interactions

fit_ord_s <- brm(friendship_importance ~ s(age_scaled) + gender + partner,
                 family = cumulative(),
                 prior = c(prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3, main = TRUE), class = b),
                           prior(R2D2(main = FALSE), class = sds)),
                 data = friends,
                 control = list(adapt_delta = 0.95),
                 refresh = 0) |>
  add_criterion(criterion = "loo", save_psis = TRUE)

4.3 Latent linear model with interactions

fit_ord_i <- brm(friendship_importance ~ age_scaled + age_scaled:gender + age_scaled:partner +
                   gender*partner,
                 family = cumulative(),
                 prior = prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3), class = b),
                 data = friends,
                 control = list(adapt_delta = 0.99),
                 refresh = 0) |>
  add_criterion(criterion = "loo", save_psis = TRUE)

4.4 Latent spline model with interactions

fit_ord_is <- brm(friendship_importance ~ s(age_scaled) + gender*partner +
                    s(age_scaled, by=gender) + s(age_scaled, by=partner),
                  family = cumulative(),
                  prior = c(prior(R2D2(mean_R2 = 1/3, prec_R2 = 3, cons_D2 = 1/3, main = TRUE), class = b),
                            prior(R2D2(main = FALSE), class = sds)),
                  data = friends,
                  control = list(adapt_delta = 0.95),
                  refresh = 0) |>
  add_criterion(criterion = "loo", save_psis = TRUE)

4.5 Comparison of predictive performance

Again we have already computed LOO-CV results and compare predictive performances.

loo_compare(fit_is, fit_ord, fit_ord_i, fit_ord_s, fit_ord_is,
            model_names = c("Normal spline with interactions",
                            "Ordinal linear",
                            "Ordinal linear with interactions",
                            "Ordinal spline",
                            "Ordinal spline with interactions"))
                                 elpd_diff se_diff
Ordinal linear                     0.0       0.0  
Ordinal spline                     0.0       1.6  
Ordinal linear with interactions  -0.4       0.5  
Ordinal spline with interactions  -1.5       1.6  
Normal spline with interactions  -56.8       9.5  

Ordinal models beat the normal model clearly (see Nabiximols case study for explanation why we can compare continuous and discrete model in this case). There is no practical difference in including splines or interactions. The data are noisy and there are not that many observations to be able to infer small non-linearities or interactions.

4.6 Predictive checking

There is no practical difference in histograms.

pp_check(fit_ord_is, type = "hist", ndraws=1)

Bars plot looks great, but it is actually useless, as cumulative ordinal model has category specific intercepts and bar plot would look perfect even without any predictors (Säilynoja et al. 2025).

pp_check(fit_ord_is, type = "bars")

For ordinal models it is useful to examine calibration of cumulative probabilities (Säilynoja et al. 2025). If the red calibration line stays inside the blue envelope, the predictive probabilities are well calibrated.

calibration_plot <- \(fit, i) {
  rd=reliabilitydiag(EMOS = E_loo((matrix(as.integer(posterior_predict(fit)),nrow=4000)<=i)+0,
                                  loo(fit)$psis_object)$value,
                     y = as.numeric(friends$friendship_importance<=i))
  autoplot(rd)+
    labs(x=paste0("Predicted probability of outcome <= ", i),
         y="Conditional event probabilities")+
    bayesplot::theme_default(base_family = "sans", base_size=16)
}
calibration_plot(fit_ord_is, 1)

calibration_plot(fit_ord_is, 2)

calibration_plot(fit_ord_is, 3)

calibration_plot(fit_ord_is, 4)

5 Interpreting models

We examine association between predictors and outcome. Usually we would only examine models which have passed the model checking. For demonstrations, we examine whether there is any difference between using normal or ordinal model.

We did several normal and ordinals models, and one option would be to illustrate the results with all these models in multiverse style as Rohrer and Arel-Bundock did. As Bayesians we can also average over the models to provide just one combined model which takes into account the uncertainty in the model choice. As the model with splines and interactions include other models as nested inside, we can use the most complex model. This is makes sense, especially when the most complex model does not have much worse predictive performance. The most complex model takes automatically into account the uncertainty in different model choices.

We use marginaleffects package as in the Rohrer and Arel-Bundock’s case study, but wrap it to produce data frames useful for making our own plots overlaying the normal and ordinal model results.

my_avg_comp <- \(fit, variables, modelname) {
  if (modelname == "ordinal") {
    comp <- avg_comparisons(fit, variables=variables, hypothesis = ~ I(sum(x * 1:5)))
  } else {
    comp <- avg_comparisons(fit, variables=variables)
  }
  comp <- comp |>
    get_draws() |>
    select("draw")
  comp$model <- modelname
  comp
}
plot_theme <- theme(axis.text.y=element_blank(),
                    axis.ticks.y=element_blank(),
                    axis.line.y=element_blank(),
                    axis.title.y=element_blank(),
                    plot.title = element_text(size=16),
                    legend.position="none")
clr <- colour("bright",names=FALSE)(7)

rbind(my_avg_comp(fit_is, "partner", "normal"),
      my_avg_comp(fit_ord_is, "partner", "ordinal")) |>
  ggplot(aes(x=draw, color=model)) +
  stat_slab(expand=TRUE, trim=FALSE, alpha=.6, fill=NA) +
  scale_color_bright() +
  plot_theme + 
  labs(x="Average difference in friendship importance",
       title="Partner no vs. yes") +
  annotate(geom="text", x=.03, y=0.7, label="Normal", hjust=0, color=clr[1], size=6) + 
  annotate(geom="text", x=-.14, y=0.7, label="Ordinal", hjust=1, color=clr[2], size=6)

rbind(my_avg_comp(fit_is, "gender", "normal"),
      my_avg_comp(fit_ord_is, "gender", "ordinal")) |>
  ggplot(aes(x=draw, color=model)) +
  stat_slab(expand=TRUE, trim=FALSE, alpha=.6, fill=NA) +
  scale_color_bright() +
  plot_theme + 
  labs(x="Average difference in friendship importance",
       title="Gender female vs. male") +
  annotate(geom="text", x=.02, y=0.7, label="Normal", hjust=0, color=clr[1], size=6) + 
  annotate(geom="text", x=-.14, y=0.7, label="Ordinal", hjust=1, color=clr[2], size=6)

rbind(my_avg_comp(fit_is, list("age_scaled"=5/60), "normal"),
      my_avg_comp(fit_ord_is, list("age_scaled"=5/60), "ordinal")) |>
  ggplot(aes(x=draw, color=model)) +
  stat_slab(expand=TRUE, trim=FALSE, alpha=.6, fill=NA) +
  scale_color_bright() +
  plot_theme + 
  labs(x="Average difference in friendship importance",
       title="Age +5 years") +
  annotate(geom="text", x=-.09, y=0.7, label="Normal", hjust=1, color=clr[1], size=6) + 
  annotate(geom="text", x=-.055, y=0.7, label="Ordinal", hjust=0, color=clr[2], size=6)

It seems partner status and gender do not have association to the friendship importance, while age does. For age, posterior of the ordinal model is sharper, but the overall conclusion would be the same. In this case, the normal model posterior happens to be similar to posterior of a more realistic model, as 1) the conditional uncertainty is not very skewed, 2) has not very non-normal like tails, and 3) the target is mean. Nabiximols case study demonstrates a case where using posterior of normal is quite different from the posterior of a more realistic model. With this data, there would be more difference in results if the target would be for example the probability of answers 5.


References

Aguilar, J. E., and P. C. Bürkner. 2023. “Intuitive Joint Priors for Bayesian Linear Multilevel Models: The R2D2M2 Prior.” Electronic Journal of Statistics 17: 1711–67.
Säilynoja, T., A. R. Johnson, O. A. Martin, and A. Vehtari. 2025. “Recommendations for Visual Predictive Checks in Bayesian Workflow.”
Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. “Rank-Normalization, Folding, and Localization: An Improved \widehat{R} for Assessing Convergence of MCMC (with Discussion).” Bayesian Analysis 16 (2): 667–718.
Vehtari, Aki, Daniel Simpson, Andrew Gelman, Yuling Yao, and Jonah Gabry. 2024. “Pareto Smoothed Importance Sampling.” Journal of Machine Learning Research 25 (72).
Yanchenko, Eric. 2025. “Pseudo-R2D2 Prior for High-Dimensional Ordinal Regression.” Statistics and Computing 35 (5): 129.
Zhang, Yan Dora, Brian P. Naughton, Howard D. Bondell, and Brian J. Reich. 2022. “Bayesian Regression Using a Prior on the Model Fit: The R2-D2 Shrinkage Prior.” Journal of the American Statistical Association 117: 862–74.

Licenses

  • Code © 2017-2025, Aki Vehtari, licensed under BSD-3.
  • Text © 2017-2025, Aki Vehtari, licensed under CC-BY-NC 4.0.

Original Computing Environment

sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Helsinki
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] khroma_1.16.0          ggdist_3.3.3           patchwork_1.3.1       
 [4] marginaleffects_0.28.0 brms_2.22.12           Rcpp_1.1.0            
 [7] bayesplot_1.13.0.9000  ggplot2_3.5.2          loo_2.8.0             
[10] reliabilitydiag_0.2.1  tidyr_1.3.1            dplyr_1.1.4           

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1        farver_2.1.2            fastmap_1.2.0          
 [4] TH.data_1.1-3           tensorA_0.36.2.1        digest_0.6.37          
 [7] estimability_1.5.1      lifecycle_1.0.4         StanHeaders_2.36.0.9000
[10] survival_3.8-3          processx_3.8.6          magrittr_2.0.3         
[13] posterior_1.6.1         compiler_4.5.1          rlang_1.1.6            
[16] tools_4.5.1             yaml_2.3.10             collapse_2.1.2         
[19] data.table_1.17.8       knitr_1.50              labeling_0.4.3         
[22] bridgesampling_1.1-2    htmlwidgets_1.6.4       pkgbuild_1.4.8         
[25] curl_6.4.0              plyr_1.8.9              RColorBrewer_1.1-3     
[28] cmdstanr_0.9.0.9000     multcomp_1.4-28         abind_1.4-8            
[31] withr_3.0.2             purrr_1.1.0             grid_4.5.1             
[34] stats4_4.5.1            xtable_1.8-4            inline_0.3.21          
[37] emmeans_1.11.2          scales_1.4.0            MASS_7.3-65            
[40] insight_1.3.1           cli_3.6.5               mvtnorm_1.3-3          
[43] rmarkdown_2.29          generics_0.1.4          RcppParallel_5.1.10    
[46] reshape2_1.4.4          rstan_2.36.0.9000       stringr_1.5.1          
[49] splines_4.5.1           parallel_4.5.1          matrixStats_1.5.0      
[52] vctrs_0.6.5             V8_6.0.5                Matrix_1.7-3           
[55] sandwich_3.1-1          jsonlite_2.0.0          Formula_1.2-5          
[58] glue_1.8.0              codetools_0.2-20        ps_1.9.1               
[61] distributional_0.5.0    stringi_1.8.7           gtable_0.3.6           
[64] QuickJSR_1.8.0          tibble_3.3.0            pillar_1.11.0          
[67] htmltools_0.5.8.1       Brobdingnag_1.2-9       R6_2.6.1               
[70] evaluate_1.0.4          lattice_0.22-7          backports_1.5.0        
[73] rstantools_2.4.0.9000   coda_0.19-4.1           gridExtra_2.3          
[76] nlme_3.1-168            checkmate_2.3.3         mgcv_1.9-3             
[79] xfun_0.52               zoo_1.8-14              pkgconfig_2.0.3