Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression

Martin J. Green; Graham F. Medley; William J. Browne

doi:10.1051/vetres/2009013


Volume 40 / No 4 (July-August 2009)

Vet. Res., 40 4 (2009) 30


Open Access

Issue: Vet. Res. Volume 40, Number 4, July-August 2009
Number of page(s): 10
DOI: https://doi.org/10.1051/vetres/2009013
Published online: 28 March 2009
How to cite this article: Vet. Res. (2009) 40:30



Martin J. Green^{1, 2}, Graham F. Medley³ and William J. Browne⁴

¹ School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Sutton Bonington, LE12 5RD, United Kingdom
² School of Mathematical Sciences, University of Nottingham, Nottingham, NG7 2RD, United Kingdom
³ Department of Biological Sciences, University of Warwick, Coventry, CV4 7AL, United Kingdom
⁴ Department of Clinical Veterinary Science, University of Bristol, Langford House, Langford, Bristol, BS40 5DT, United Kingdom

Received 5 November 2008; accepted 24 March 2009; published online 28 March 2009

Abstract - Assessing the fit of a model is an important final step in any statistical analysis, but this is not straightforward when complex discrete response models are used. Cross validation and posterior predictions have been suggested as methods to aid model criticism. In this paper a comparison is made between four methods of model predictive assessment in the context of a three-level logistic regression model for clinical mastitis in dairy cattle: cross validation, a prediction using the full posterior predictive distribution, and two “mixed” predictive methods that incorporate higher-level random effects simulated from the underlying model distribution. Cross validation is considered a gold standard method but is computationally intensive, so the posterior predictive assessments were compared against it. The analyses revealed that the mixed prediction methods produced results close to cross validation, whilst the full posterior predictive assessment gave predictions that were over-optimistic (closer to the observed disease rates) compared with cross validation. A mixed prediction method that simulated random effects from both higher levels was best at identifying the outlying level-two (farm-year) units of interest. It is concluded that this mixed prediction method, simulating random effects from both higher levels, is straightforward and may be of value in model criticism of multilevel logistic regression, a technique commonly used for animal health data with a hierarchical structure.

Key words: model fit / posterior predictive assessment / mixed predictive assessment / cross validation / Bayesian multilevel model
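
The difference between the full posterior predictive assessment and a mixed predictive assessment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a random-intercept logistic model with farm-year units (level two) nested in farms (level three), binomial case counts per farm-year, and posterior draws already available from some MCMC fit. All names (`simulate_counts`, `upper_tail`, `draws`, `sigma_u`, `sigma_v`) and all numbers are hypothetical, and the "posterior draws" are synthetic stand-ins rather than results from the paper.

```python
import numpy as np


def inv_logit(x):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate_counts(draws, farm_of_unit, n_at_risk, rng,
                    resample_level2=False, resample_level3=False):
    """Simulate replicated case counts, one row per posterior draw.

    With both flags False the fitted random effects are reused
    (full posterior predictive). With a flag set to True, that level's
    random effects are redrawn from N(0, sigma^2) using the sampled
    standard deviation (mixed predictive, as assumed in this sketch).
    """
    u = draws["u"]  # farm-year effects, shape (S, n_units)
    v = draws["v"]  # farm effects, shape (S, n_farms)
    if resample_level2:
        u = rng.normal(0.0, draws["sigma_u"][:, None], size=u.shape)
    if resample_level3:
        v = rng.normal(0.0, draws["sigma_v"][:, None], size=v.shape)
    eta = draws["beta0"][:, None] + u + v[:, farm_of_unit]
    return rng.binomial(n_at_risk, inv_logit(eta))


def upper_tail(replicates, observed):
    """Share of replicated counts at or above the observed count, per unit."""
    return (replicates >= observed).mean(axis=0)


# Synthetic stand-in for MCMC output (assumption, not data from the paper).
rng = np.random.default_rng(0)
S = 2000                                       # number of posterior draws
farm_of_unit = np.array([0, 0, 1, 1, 2, 2])    # farm index of each farm-year
n_at_risk = np.full(6, 100)                    # cows at risk per farm-year
draws = {
    "beta0": rng.normal(-2.0, 0.05, size=S),
    "sigma_u": np.full(S, 0.5),
    "sigma_v": np.full(S, 0.3),
    # The last farm-year has a large fitted effect, i.e. it is an outlier.
    "u": rng.normal([0, 0, 0, 0, 0, 1.5], 0.1, size=(S, 6)),
    "v": rng.normal(0.0, 0.1, size=(S, 3)),
}
observed = np.array([12, 11, 13, 12, 10, 38])  # hypothetical observed cases

full = simulate_counts(draws, farm_of_unit, n_at_risk, rng)
mixed = simulate_counts(draws, farm_of_unit, n_at_risk, rng,
                        resample_level2=True, resample_level3=True)

p_full = upper_tail(full, observed)
p_mixed = upper_tail(mixed, observed)
print("full  :", np.round(p_full, 3))
print("mixed :", np.round(p_mixed, 3))
```

In this toy setting the full posterior predictive check reuses the fitted effect of the outlying farm-year, so its replicated counts sit near the observed count and the unit is not flagged. The mixed check redraws both higher-level effects, so the same unit falls in the tail of its predictive distribution. This mirrors the abstract's description of the full assessment as over-optimistic, under the assumptions stated above.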

Corresponding author: martin.green@nottingham

© INRA, EDP Sciences 2009