9.6 Modelling strategy for binary outcomes
A statistical model is a tool to understand the world. The better your model describes your data, the more useful it will be. Fitting a successful statistical model requires decisions about which variables to include. Our advice on variable selection follows the same lines as in the linear regression chapter.
- As few explanatory variables should be used as possible (parsimony);
- Explanatory variables associated with the outcome variable in previous studies should be accounted for;
- Demographic variables should be included in model exploration;
- Population stratification should be incorporated if available;
- Interactions should be checked and included if influential;
- Final model selection should be performed using a “criterion-based approach”:
  - minimise the Akaike information criterion (AIC);
  - maximise the c-statistic (area under the receiver operating characteristic curve).
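To make the two criteria above concrete, here is a minimal sketch in plain Python. It assumes we already have predicted probabilities from two candidate models. The outcome vector, the probabilities, and the parameter counts are all made up for illustration, and the function names are our own rather than any library's. AIC is computed as 2k minus twice the log-likelihood, and the c-statistic as the proportion of case/non-case pairs in which the case receives the higher predicted probability (ties count as half).

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of observed outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

def aic(y, p, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood (lower is better)."""
    return 2 * n_params - 2 * log_likelihood(y, p)

def c_statistic(y, p):
    """Proportion of (case, non-case) pairs ranked correctly; ties count as 0.5."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    non_cases = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))

# Hypothetical data: 8 observations, 4 events (illustrative only).
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical predicted probabilities from two candidate models.
p_small = [0.2, 0.3, 0.4, 0.6, 0.4, 0.6, 0.7, 0.8]  # assume 2 estimated parameters
p_large = [0.1, 0.3, 0.4, 0.6, 0.4, 0.6, 0.8, 0.9]  # assume 4 estimated parameters

for name, p, k in [("small", p_small, 2), ("large", p_large, 4)]:
    print(name, "AIC:", round(aic(y, p, k), 2), "c-statistic:", c_statistic(y, p))
```

In this invented example both models rank the observations identically, so their c-statistics are equal, but the larger model pays a penalty for its extra parameters and ends up with the higher AIC. Under the criterion-based approach the smaller model would be preferred, which is also what parsimony suggests.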
We will use these principles throughout the next section.