8.4 Check the data

As always, check any new dataset carefully before you start analysis.

library(tidyverse)
library(finalfit)
theme_set(theme_bw())
meldata %>% glimpse()

## Rows: 205
## Columns: 7
## $ time      <dbl> 10, 30, 35, 99, 185, 204, 210, 232, 232, 279, 295, 355, 386…
## $ status    <dbl> 3, 3, 2, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1,…
## $ sex       <dbl> 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,…
## $ age       <dbl> 76, 56, 41, 71, 52, 28, 77, 60, 49, 68, 53, 64, 68, 63, 14,…
## $ year      <dbl> 1972, 1968, 1977, 1968, 1965, 1971, 1972, 1974, 1968, 1971,…
## $ thickness <dbl> 6.76, 0.65, 1.34, 2.90, 12.08, 4.84, 5.16, 3.22, 12.88, 7.4…
## $ ulcer     <dbl> 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

meldata %>% ff_glimpse()

## $Continuous
##               label var_type   n missing_n missing_percent   mean     sd    min
## time           time    <dbl> 205         0             0.0 2152.8 1122.1   10.0
## status       status    <dbl> 205         0             0.0    1.8    0.6    1.0
## sex             sex    <dbl> 205         0             0.0    0.4    0.5    0.0
## age             age    <dbl> 205         0             0.0   52.5   16.7    4.0
## year           year    <dbl> 205         0             0.0 1969.9    2.6 1962.0
## thickness thickness    <dbl> 205         0             0.0    2.9    3.0    0.1
## ulcer         ulcer    <dbl> 205         0             0.0    0.4    0.5    0.0
##           quartile_25 median quartile_75    max
## time           1525.0 2005.0      3042.0 5565.0
## status            1.0    2.0         2.0    3.0
## sex               0.0    0.0         1.0    1.0
## age              42.0   54.0        65.0   95.0
## year           1968.0 1970.0      1972.0 1977.0
## thickness         1.0    1.9         3.6   17.4
## ulcer             0.0    0.0         1.0    1.0
## 
## $Categorical
## data frame with 0 columns and 205 rows

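As can be seen, all of the variables are currently coded as continuous/numeric, including status, sex, and ulcer, which are really categorical codes. This is why ff_glimpse() lists every variable under $Continuous and reports no categorical columns. The <dbl> label stands for ‘double’, short for ‘double-precision floating point’, the computer science term for the way R stores most numbers.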
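To see where <dbl> comes from, the minimal sketch below builds a small data frame (toy, a hypothetical name) from the first five values of time, status, sex, and ulcer printed by glimpse() above, and checks each column with base R's typeof(). It also shows that whole numbers written without an L suffix are still doubles, and uses a count of distinct values as a rough hint that a numeric column may really be a coded category. The distinct-value heuristic is an illustrative assumption, not a finalfit feature.

```r
# Hypothetical toy subset: first five values of four columns shown by glimpse().
toy <- data.frame(
  time   = c(10, 30, 35, 99, 185),
  status = c(3, 3, 2, 3, 1),
  sex    = c(1, 1, 1, 0, 1),
  ulcer  = c(1, 0, 0, 0, 1)
)

# Every column is stored as a double-precision floating point number ("double").
col_types <- sapply(toy, typeof)
print(col_types)

# Whole numbers typed without an L suffix are still doubles in R;
# an integer needs an explicit L (or as.integer()).
print(typeof(1))
print(typeof(1L))

# On IEEE 754 platforms a double has a 53-bit significand,
# which is what "double precision" refers to.
print(.Machine$double.digits)

# Columns with only a handful of distinct values are often coded categories
# (a rough heuristic, assumed here for illustration).
n_distinct_vals <- sapply(toy, function(v) length(unique(v)))
print(n_distinct_vals)
```

In the full dataset, status (three codes) and sex and ulcer (two codes each) would still be stored as <dbl> until they are explicitly converted to factors.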