11.5 Including missing data in demographics tables

“Table 1” in a healthcare study is often a demographics table comparing an “explanatory variable of interest” against the other explanatory variables and confounders. Missing values should never be silently dropped from this table. Including them is straightforward with summary_factorlist(), which summarises a dependent variable against a set of explanatory variables; for a demographics table, the variable of interest takes the place of the dependent. Despite the function's name, continuous variables are handled as well as factors.

na_include = TRUE ensures that missing data from the explanatory variables (but not the dependent variable) are included. To include missing values from the dependent variable as well, add na_include_dependent = TRUE. It is also useful to add a total column summarising all patients (total_col = TRUE) and a row of column totals (add_col_totals = TRUE).
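The arguments above can be combined in a single call. The following is a minimal sketch rather than the exact code behind the table in this section: it uses the colon_s data shipped with finalfit, takes obstruct.factor as the variable of interest (an assumption), and simulates a stand-in smoking_mcar variable with roughly 10% of values missing completely at random, so the counts will not match the published table.

```r
library(finalfit)

# Assumption: a simulated smoking variable with about 10% of values
# missing completely at random, standing in for smoking_mcar.
set.seed(1)
colon_demo <- colon_s
colon_demo$smoking_mcar <- factor(
  sample(c("Smoker", "Non-smoker", NA), nrow(colon_demo),
         replace = TRUE, prob = c(0.2, 0.7, 0.1))
)
colon_demo$smoking_mcar <- ff_label(colon_demo$smoking_mcar, "Smoking (MCAR)")

# Explanatory or confounding variables
explanatory <- c("age", "sex.factor", "nodes", "smoking_mcar")

# Explanatory variable of interest, passed as the dependent (assumed here)
dependent <- "obstruct.factor"

table1 <- summary_factorlist(
  colon_demo, dependent, explanatory,
  na_include = TRUE,            # show (Missing) level for explanatory factors
  na_include_dependent = TRUE,  # show (Missing) column for the dependent
  total_col = TRUE,             # add a Total column
  add_col_totals = TRUE,        # add a row of column totals
  p = TRUE                      # add hypothesis test p-values
)
table1
```

Printing table1 should give a data frame with one row per factor level or continuous summary, including a (Missing) row for the simulated smoking variable.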

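Missing values in continuous explanatory variables do not appear as a separate level in the table. If you are using a lot of continuous explanatory variables, the number of missing values in each can be seen easily by setting add_row_totals = TRUE.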
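As a brief illustration of row totals, the sketch below summarises two continuous variables from colon_s against obstruct.factor (the choice of variables is an assumption made for this example). With the default settings, add_row_totals = TRUE should add a "Total N" and a "Missing N" column for each explanatory variable.

```r
library(finalfit)

# Continuous explanatory variables only; their missing values would
# otherwise be invisible in the table.
table_rows <- summary_factorlist(
  colon_s,
  dependent = "obstruct.factor",
  explanatory = c("age", "nodes"),
  add_row_totals = TRUE  # adds per-variable total and missing counts
)
table_rows
```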

Note that missing data is not included when p-values are generated. If you wish missing data to be passed to statistical tests, then include na_to_p = TRUE.
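To see what na_to_p changes, the same table can be generated twice and the p columns compared. This is a sketch under the assumption that differ.factor in colon_s contains some missing values; if it does, the second call treats (Missing) as an additional category in the test, so its p-value may differ from the first.

```r
library(finalfit)

dependent <- "obstruct.factor"
explanatory <- c("sex.factor", "differ.factor")

# Default: missing values are displayed but excluded from the tests
p_default <- summary_factorlist(
  colon_s, dependent, explanatory,
  na_include = TRUE, p = TRUE
)

# na_to_p = TRUE: missing values are passed to the tests
p_with_na <- summary_factorlist(
  colon_s, dependent, explanatory,
  na_include = TRUE, p = TRUE, na_to_p = TRUE
)

p_default
p_with_na
```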

TABLE 11.1: Simulated missing completely at random (MCAR) and missing at random (MAR) dataset.
| label          | levels     | No          | Yes         | (Missing)   | Total       | p     |
|----------------|------------|-------------|-------------|-------------|-------------|-------|
| Total N (%)    |            | 732 (78.8)  | 176 (18.9)  | 21 (2.3)    | 929         |       |
| Age (years)    | Mean (SD)  | 60.2 (11.5) | 57.3 (13.3) | 63.9 (11.9) | 59.8 (11.9) | 0.004 |
| Sex            | Female     | 346 (47.3)  | 91 (51.7)   | 8 (38.1)    | 445 (47.9)  | 0.330 |
|                | Male       | 386 (52.7)  | 85 (48.3)   | 13 (61.9)   | 484 (52.1)  |       |
| nodes          | Mean (SD)  | 3.7 (3.7)   | 3.5 (3.2)   | 3.3 (3.1)   | 3.7 (3.6)   | 0.435 |
| Smoking (MCAR) | Non-smoker | 500 (68.3)  | 130 (73.9)  | 15 (71.4)   | 645 (69.4)  | 0.080 |
|                | Smoker     | 154 (21.0)  | 26 (14.8)   | 3 (14.3)    | 183 (19.7)  |       |
|                | (Missing)  | 78 (10.7)   | 20 (11.4)   | 3 (14.3)    | 101 (10.9)  |       |
| Smoking (MAR)  | Non-smoker | 456 (62.3)  | 115 (65.3)  | 14 (66.7)   | 585 (63.0)  | 0.822 |
|                | Smoker     | 112 (15.3)  | 26 (14.8)   | 3 (14.3)    | 141 (15.2)  |       |
|                | (Missing)  | 164 (22.4)  | 35 (19.9)   | 4 (19.0)    | 203 (21.9)  |       |