Cross-validation vs. DIC using stack loss data
Aki Vehtari
2003-01-15
Introduction
Here is the code and results for comparing cross-validation
(CV) vs. the deviance information criterion (DIC) using the stack
loss data. The stack loss data is used as an example in Classic
BUGS and WinBUGS (Spiegelhalter et al., 1996, pages 27-29) and
was specifically used to demonstrate the DIC by Spiegelhalter
et al. (2002). The data is available in the Classic BUGS and
WinBUGS distributions. All but one of the residual models are
also available in the BUGS distributions, but with slightly
different priors. To make the comparison as fair as possible,
Brad Carlin kindly provided the models and priors used by
Spiegelhalter et al. (2002).
Both DIC and cross-validation estimate the expected predictive
performance, that is, the expected utilities of the model
(Vehtari, 2002; Vehtari and Lampinen, 2002; Vehtari and
Lampinen, 2003). We presented some results comparing CV and DIC
at the 2002 International Conference of the Royal Statistical
Society (slides in PDF, abstract in PDF).
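As a reminder of the quantity estimated by CV (the notation here
is a sketch in the spirit of Vehtari and Lampinen (2002), not a
quotation from those papers), the leave-one-out CV estimate of
the expected predictive deviance is

  \bar{D}_{\mathrm{CV}} = -2 \sum_{i=1}^{n} \log p(y_i \mid x_i, D^{(\setminus i)}),

where D^{(\setminus i)} denotes the data with the i-th case left
out and p(y_i \mid x_i, D^{(\setminus i)}) is the full posterior
predictive density for the left-out case.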
Cross-validation was performed using Matlab to divide the data
into cross-validation folds and to call Classic BUGS or WinBUGS
to do the MCMC sampling. We used the DIC values reported by
Spiegelhalter et al. (2002).
Robust regression using stack loss data
The problem is to build a regression model for predicting the
amount of stack loss (ammonia escaping in an industrial
application). There are three predictor variables, and a linear
regression model is used. The model selection problem is to
choose the residual model. Five residual models were compared:
1) Normal, 2) Double-exponential (Laplace), 3) Logistic,
4) Student's t-distribution with 4 degrees of freedom (t_4), and
5) t_4 as a scale mixture model; a BUGS sketch of the shared
structure follows.
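The following minimal BUGS sketch shows the regression structure
with the alternative residual models indicated as comments. The
variable names, the assumption of standardized covariates z, and
the vague priors shown are illustrative assumptions only; they
are not the exact models and priors provided by Brad Carlin.

  model {
    for (i in 1:N) {
      mu[i] <- beta0 + beta[1] * z[i, 1] + beta[2] * z[i, 2] + beta[3] * z[i, 3]
      Y[i] ~ dnorm(mu[i], tau)         # 1) Normal residuals
      # Y[i] ~ ddexp(mu[i], tau)       # 2) Double-exponential (Laplace)
      # Y[i] ~ dlogis(mu[i], tau)      # 3) Logistic
      # Y[i] ~ dt(mu[i], tau, 4)       # 4) Student's t_4
      # 5) t_4 as a scale mixture of normals:
      # Y[i] ~ dnorm(mu[i], tau.i[i])
      # tau.i[i] <- tau * lambda[i]
      # lambda[i] ~ dgamma(2, 2)       # Gamma(nu/2, nu/2) with nu = 4
    }
    beta0 ~ dnorm(0.0, 1.0E-6)         # vague priors, illustrative only
    for (j in 1:3) { beta[j] ~ dnorm(0.0, 1.0E-6) }
    tau ~ dgamma(1.0E-3, 1.0E-3)
    sigma <- 1 / sqrt(tau)             # residual scale
  }

Switching between residual models 1-4 amounts to changing only
the sampling statement for Y[i]; the scale mixture version
instead introduces a latent weight lambda[i] for each
observation.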
Code
Code and explanation of files
Results
Figure 1 shows the expected predictive deviance estimated with
CV and DIC. They produce similar results, but DIC gives
consistently lower values. This is probably because DIC uses
plug-in predictive distributions instead of full predictive
distributions, and thus ignores the uncertainty in the
parameter values. The largest difference is in the scale mixture
model, which supports this argument. Figure 2 shows the
effective number of parameters. There is no need to compute
this in the CV approach, but it may be computed if it is thought
to provide additional insight into the models.
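For reference, the definitions from Spiegelhalter et al. (2002)
make the plug-in point explicit:

  \mathrm{DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\theta}),

where \bar{D} is the posterior mean of the deviance and
D(\bar{\theta}) is the deviance evaluated at the posterior mean
of the parameters, that is, a plug-in quantity that ignores the
posterior uncertainty in \theta.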
[Figure 1: expected predictive deviance estimated with CV and DIC]
[Figure 2: effective number of parameters]
In the case of DIC, estimation of the uncertainty in the
estimate is still under investigation, and usually only point
estimates with some heuristic are used to judge which
differences are significant. In the case of cross-validation it
is easy to estimate the associated uncertainty. Figure 3 shows
the pairwise comparison of the t_4 scale mixture model to every
other model. The comparison is presented by plotting the
distribution of the estimate of the difference between the
expected utilities of the two models. It is easy to see the
differences and the associated uncertainties. Note that the
amount of uncertainty in the comparison depends heavily on
which models are compared. From these results it is also
possible to compute the probability that one model is better
than another. For example, the probabilities that the t_4 scale
mixture model is better than models 1, 2, 3, and 4 are 0.85,
0.48, 0.96, and 0.98, respectively. Models 2 and 5 have better
predictive performance than models 1, 3, and 4. Models 2 and 5
are indistinguishable on grounds of predictive performance.
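As a sketch of how these comparisons are computed (following the
approach of Vehtari and Lampinen (2002); the notation is
illustrative), let u_{M,i} denote the CV utility term for case i
under model M. The difference in expected utilities between the
t_4 scale mixture model M_5 and another model M_k is estimated
by

  \bar{u}_{M_5 - M_k} = \frac{1}{n} \sum_{i=1}^{n} (u_{M_5,i} - u_{M_k,i}),

and the distribution of this paired estimate (obtained, for
example, with the Bayesian bootstrap) gives both the uncertainty
plotted in Figure 3 and the probability that M_5 is better than
M_k as \Pr(\bar{u}_{M_5 - M_k} > 0).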
[Figure 3: distributions of the pairwise differences in expected utility between the t_4 scale mixture model and each other model]
References
- Spiegelhalter, D. J., Thomas, A., Best, N. G., and Gilks, W. R.
  (1996). BUGS Examples Volume 1, Version 0.5 (version ii).
  Cambridge: Medical Research Council Biostatistics Unit. (PDF)
- Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der
  Linde, A. (2002). Bayesian measures of model complexity and fit
  (with discussion). Journal of the Royal Statistical Society,
  Series B (Statistical Methodology), 64(3):583-639. (PDF)
- Vehtari, A. (2002). Discussion of 'Bayesian measures of
  model complexity and fit' by Spiegelhalter, D. J., Best,
  N. G., Carlin, B. P., and van der Linde, A.
  Journal of the Royal Statistical Society, Series B
  (Statistical Methodology), 64(4):620.
  (PostScript)
  (PDF)
- Vehtari, A. and Lampinen, J. (2002).
Bayesian model assessment and comparison using
cross-validation predictive densities.
Neural Computation, 14(10):2439-2468.
(PostScript)
(PDF)
- Vehtari, A. and Lampinen, J. (2003).
Expected utility estimation via cross-validation.
In J. M. Bernardo, et al., editors,
Bayesian Statistics 7, in press. Oxford
University Press.
(PostScript)
(PDF)
Aki Vehtari
Last modified: 2003-06-10