9.6 Modelling strategy for binary outcomes

A statistical model is a tool to understand the world. The better your model describes your data, the more useful it will be. Fitting a successful statistical model requires decisions about which variables to include in the model. Our advice regarding variable selection follows the same principles as in the linear regression chapter.

  1. As few explanatory variables should be used as possible (parsimony);
  2. Explanatory variables associated with the outcome variable in previous studies should be accounted for;
  3. Demographic variables should be included in model exploration;
  4. Population stratification should be incorporated if available;
  5. Interactions should be checked and included if influential;
  6. Final model selection should be performed using a “criterion-based approach”:
     • minimise the Akaike information criterion (AIC);
     • maximise the c-statistic (area under the receiver operating characteristic (ROC) curve).
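The two selection criteria can be computed directly from a model's fitted probabilities. The sketch below is written in plain Python and assumes the fitted probabilities from each candidate logistic regression model are already available (strictly between 0 and 1); the function names, the toy outcomes and both sets of probabilities are hypothetical and only illustrate the calculations. AIC is 2k − 2 log L, where k is the number of estimated coefficients including the intercept, so extra variables must improve the likelihood enough to pay for themselves (lower is better). The c-statistic is computed here as the proportion of (event, non-event) pairs in which the event has the higher predicted probability, with ties counted as one half; this equals the area under the ROC curve (higher is better, 0.5 is no better than chance).

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of binary outcomes y (0/1) under fitted probabilities p.

    Assumes every p is strictly between 0 and 1, as fitted logistic probabilities are.
    """
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def aic(y: Sequence[int], p: Sequence[float], k: int) -> float:
    """Akaike information criterion, 2k - 2 log L; k counts coefficients incl. the intercept."""
    return 2 * k - 2 * log_likelihood(y, p)


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Area under the ROC curve as the proportion of concordant (event, non-event) pairs.

    A pair is concordant when the event has the higher predicted probability; ties count 0.5.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical example: the same six outcomes scored by two candidate models.
y = [0, 0, 1, 0, 1, 1]
p_model_a = [0.2, 0.5, 0.6, 0.4, 0.7, 0.3]  # e.g. intercept + 1 variable (k = 2)
p_model_b = [0.1, 0.3, 0.8, 0.4, 0.7, 0.6]  # e.g. intercept + 3 variables (k = 4)

for name, p, k in [("A", p_model_a, 2), ("B", p_model_b, 4)]:
    print(f"model {name}: AIC = {aic(y, p, k):.2f}, c-statistic = {c_statistic(y, p):.3f}")
```

In practice the two criteria can disagree: a model with more variables may discriminate better (higher c-statistic) while its AIC is worse because of the penalty on k, which is why both are inspected alongside the other principles rather than applied mechanically.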

We will use these principles throughout the next section.