14.1 The problem of missing data

As journal editors, we often receive studies in which the investigators fail to describe, analyse, or even acknowledge missing data. This is frustrating, as how missing data are handled is often of the utmost importance. Conclusions may (and do) change when missing data are accounted for. Some folk do not even seem to appreciate that in a conventional regression, only rows with complete data for every variable in the model are included. By reading this, you will not be one of them!
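
To make the point about complete rows concrete, here is a minimal sketch in plain Python. The records, the variable names (`age`, `smoker`, `sbp`), and the helper `complete_cases` are all hypothetical and purely illustrative; `None` stands in for a missing value. It mimics the row filtering that most regression routines apply by default before fitting a model.

```python
# Hypothetical toy dataset: None marks a missing value.
rows = [
    {"age": 54, "smoker": "yes", "sbp": 142},
    {"age": 61, "smoker": None, "sbp": 150},
    {"age": 47, "smoker": "no", "sbp": None},
    {"age": 70, "smoker": "no", "sbp": 138},
    {"age": None, "smoker": "yes", "sbp": 129},
]


def complete_cases(records, variables):
    """Keep only the records with no missing value in any listed variable."""
    return [r for r in records if all(r[v] is not None for v in variables)]


# Variables we imagine entering a regression model (an assumption for this sketch).
model_vars = ["age", "smoker", "sbp"]

used = complete_cases(rows, model_vars)
n_total = len(rows)
n_used = len(used)

print(f"{n_used} of {n_total} rows would enter the model")
```

In this toy example each variable is missing in only one of five rows, yet three of the five rows are dropped, because a row is excluded if any model variable is missing.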

We suggest five steps for ensuring that missing data are correctly identified and appropriately dealt with:

  1. Ensure your data are coded correctly.
  2. Identify missing values within each variable.
  3. Look for patterns of missingness.
  4. Check for associations between missing and observed data.
  5. Decide how to handle missing data.
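
The first three steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a recommended workflow: the records are invented, and the placeholder codes `999` and `""` are assumed to mean "not recorded" in this imaginary dataset. The helper `recode` is hypothetical.

```python
from collections import Counter

# Hypothetical raw records; 999 and "" are assumed placeholder codes for "not recorded".
raw = [
    {"age": 54, "smoker": "yes"},
    {"age": 999, "smoker": "no"},
    {"age": 61, "smoker": ""},
    {"age": 999, "smoker": ""},
]
MISSING_CODES = {999, ""}


def recode(record):
    """Step 1: replace placeholder codes with None so they are treated as missing."""
    return {k: (None if v in MISSING_CODES else v) for k, v in record.items()}


clean = [recode(r) for r in raw]

# Step 2: count missing values within each variable.
missing_per_variable = {
    var: sum(r[var] is None for r in clean) for var in clean[0]
}

# Step 3: tabulate which combinations of variables are missing together.
patterns = Counter(
    tuple(var for var in r if r[var] is None) for r in clean
)

print(missing_per_variable)
print(patterns)
```

Skipping the recoding step would leave `999` in place as a real age, which is why correct coding comes first in the list.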

We will work through a number of functions that will help with each of these.