11.7 Handling missing data: MCAR

Prior to a standard regression analysis, we can do one of the following:

Delete the variable with the missing data
Delete the cases with the missing data
Impute (fill in) the missing data
Model the missing data

From the earlier examples, we identified that the smoking variable labelled Smoking (MCAR) is missing completely at random.

We know nothing about the missing values themselves, but we know of no plausible reason why the missing values for, say, people who died should differ from the missing values for those who survived. The pattern of missingness is therefore not felt to be MNAR.

11.7.1 Common solution: row-wise deletion

Depending on the number of data points that are missing, we may have sufficient power with complete cases to examine the relationships of interest.

We therefore elect to omit the patients in whom smoking is missing. This is known as list-wise deletion (also called row-wise deletion or complete case analysis). Most standard regression functions perform it by default, usually silently.

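library(finalfit)
library(dplyr)

# smoking_mcar is assumed to have been added to colon_s in the earlier examples
explanatory <- c("age", "sex.factor",
                 "nodes", "obstruct.factor",
                 "smoking_mcar")
dependent <- "mort_5yr"
# Rows with a missing value in any model variable are dropped (list-wise deletion)
fit <- colon_s %>%
  finalfit(dependent, explanatory)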

TABLE 8.2: Regression analysis with missing data: List-wise deletion.
Dependent: Mortality 5 year		Alive	Died	OR (univariable)	OR (multivariable)
Age (years)	Mean (SD)	59.8 (11.4)	59.9 (12.5)	1.00 (0.99-1.01, p=0.986)	1.01 (1.00-1.02, p=0.200)
Sex	Female	243 (55.6)	194 (44.4)
	Male	268 (56.1)	210 (43.9)	0.98 (0.76-1.27, p=0.889)	1.02 (0.76-1.38, p=0.872)
nodes	Mean (SD)	2.7 (2.4)	4.9 (4.4)	1.24 (1.18-1.30, p<0.001)	1.25 (1.18-1.33, p<0.001)
Obstruction	No	408 (56.7)	312 (43.3)
	Yes	89 (51.1)	85 (48.9)	1.25 (0.90-1.74, p=0.189)	1.53 (1.05-2.22, p=0.027)
Smoking (MCAR)	Non-smoker	358 (56.4)	277 (43.6)
	Smoker	90 (49.7)	91 (50.3)	1.31 (0.94-1.82, p=0.113)	1.37 (0.96-1.96, p=0.083)
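
Because list-wise deletion is usually silent, it can be worth checking how many rows a model actually used. The following is a minimal sketch in base R on a small simulated data set. The data frame `toy` and its columns `age`, `smoking` and `died` are made up for illustration and are not the `colon_s` data. `glm()` stands in here for a standard logistic regression function.

```r
# Simulated illustration only: none of these values come from colon_s
set.seed(42)
n <- 200
toy <- data.frame(
  age     = rnorm(n, mean = 60, sd = 12),
  smoking = factor(sample(c("Non-smoker", "Smoker"), n,
                          replace = TRUE, prob = c(0.75, 0.25)))
)
toy$died <- rbinom(n, size = 1, prob = 0.45)

# Set 20 smoking values to missing, completely at random
toy$smoking[sample(n, 20)] <- NA

# With R's default na.action option (na.omit), incomplete rows are dropped
fit <- glm(died ~ age + smoking, data = toy, family = binomial)

n_complete <- sum(complete.cases(toy[, c("died", "age", "smoking")]))
n_used     <- nobs(fit)              # rows used in the fit
n_dropped  <- length(fit$na.action)  # rows removed by list-wise deletion

c(total = n, complete = n_complete, used = n_used, dropped = n_dropped)
```

Comparing the number of rows used with the number of rows in the data is a quick way to confirm that the loss of cases, and therefore of power, is acceptable.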

11.7.2 Other considerations

Sensitivity analysis
Omit the variable
Imputation
Model the missing data

If the variable in question is thought to be particularly important, you may wish to perform a sensitivity analysis. A sensitivity analysis in this context aims to capture the effect of uncertainty on the conclusions drawn from the model. Thus, you may choose to re-label all missing smoking values as “smoker” and see whether that changes the conclusions of your analysis. The same procedure can then be repeated with all missing values labelled “non-smoker”.
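
The re-labelling described above can be sketched as follows. This is an illustrative example on simulated data in base R, not the analysis of `colon_s`. The data frame `toy` and the helpers `relabel_missing()` and `smoking_or()` are hypothetical names introduced only for this sketch.

```r
# Simulated illustration only: none of these values come from colon_s
set.seed(1)
n <- 500
toy <- data.frame(
  smoking = factor(sample(c("Non-smoker", "Smoker"), n,
                          replace = TRUE, prob = c(0.75, 0.25)))
)
toy$died <- rbinom(n, size = 1,
                   prob = ifelse(toy$smoking == "Smoker", 0.55, 0.40))
toy$smoking[sample(n, 50)] <- NA   # 10% missing completely at random

# Hypothetical helper: replace every NA in a factor with one existing level
relabel_missing <- function(x, level) {
  stopifnot(is.factor(x), level %in% levels(x))
  x[is.na(x)] <- level
  x
}

# Hypothetical helper: odds ratio for smoker vs non-smoker
smoking_or <- function(data) {
  fit <- glm(died ~ smoking, data = data, family = binomial)
  unname(exp(coef(fit)["smokingSmoker"]))
}

# Complete cases, then the two extreme re-labelling scenarios
scenarios <- list(
  complete_case  = toy,
  all_smoker     = transform(toy, smoking = relabel_missing(smoking, "Smoker")),
  all_non_smoker = transform(toy, smoking = relabel_missing(smoking, "Non-smoker"))
)

sensitivity <- sapply(scenarios, smoking_or)
round(sensitivity, 2)
```

If the odds ratios from the two extreme scenarios lead to the same conclusion as the complete case analysis, the result can be regarded as robust to the missing smoking values.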

A confounder must be associated with both the explanatory variable of interest and the outcome. If smoking is not associated with one of these, it may be considered not to be a confounder and so could be omitted. That deals with the missing data issue, but of course may not always be appropriate.
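
As a rough illustration of omitting the variable, the sketch below fits a model with and without smoking on simulated data and compares the sample size and the estimate for the variable of interest. The data frame `toy` is made up, and by construction the outcome depends only on `nodes`, so smoking is not a confounder here. Real data would need that assumption to be justified.

```r
# Simulated illustration only: none of these values come from colon_s
set.seed(7)
n <- 300
toy <- data.frame(
  nodes   = rpois(n, lambda = 3),
  smoking = factor(sample(c("Non-smoker", "Smoker"), n,
                          replace = TRUE, prob = c(0.75, 0.25)))
)
# The outcome depends on nodes only, so smoking is not a confounder
toy$died <- rbinom(n, size = 1, prob = plogis(-1 + 0.25 * toy$nodes))
toy$smoking[sample(n, 30)] <- NA   # 10% missing completely at random

fit_with    <- glm(died ~ nodes + smoking, data = toy, family = binomial)
fit_without <- glm(died ~ nodes, data = toy, family = binomial)

# Omitting smoking restores the full sample
c(with_smoking = nobs(fit_with), without_smoking = nobs(fit_without))

# Compare the odds ratio for the variable of interest in both models
or_nodes <- c(with_smoking    = unname(exp(coef(fit_with)["nodes"])),
              without_smoking = unname(exp(coef(fit_without)["nodes"])))
round(or_nodes, 2)
```

A similar odds ratio in both models is consistent with smoking not acting as a confounder, although it does not prove it.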

Imputation and modelling are considered below.