8.4 Check the data
As always, check any new dataset carefully before you start analysis.
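The command that produced the output below is not shown in this section. A minimal sketch of how such output could be generated follows. It assumes the data are the `melanoma` dataset from the boot package (its 205 rows and 7 columns match), and that the two summaries come from `glimpse()` in dplyr and `ff_glimpse()` in finalfit.

```r
# Assumption: dplyr, finalfit and boot are installed.
library(dplyr)     # provides glimpse()
library(finalfit)  # provides ff_glimpse()

# Assumption: the dataset is boot::melanoma (205 rows, 7 columns).
melanoma <- boot::melanoma

# Compact overview: one line per column, with its type and first values.
glimpse(melanoma)

# Summary split into continuous and categorical variables,
# including missing-value counts for each column.
ff_glimpse(melanoma)
```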
## Rows: 205
## Columns: 7
## $ time <dbl> 10, 30, 35, 99, 185, 204, 210, 232, 232, 279, 295, 355, 386…
## $ status <dbl> 3, 3, 2, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1,…
## $ sex <dbl> 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,…
## $ age <dbl> 76, 56, 41, 71, 52, 28, 77, 60, 49, 68, 53, 64, 68, 63, 14,…
## $ year <dbl> 1972, 1968, 1977, 1968, 1965, 1971, 1972, 1974, 1968, 1971,…
## $ thickness <dbl> 6.76, 0.65, 1.34, 2.90, 12.08, 4.84, 5.16, 3.22, 12.88, 7.4…
## $ ulcer <dbl> 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $Continuous
##               label var_type   n missing_n missing_percent   mean     sd    min
## time           time    <dbl> 205         0             0.0 2152.8 1122.1   10.0
## status       status    <dbl> 205         0             0.0    1.8    0.6    1.0
## sex             sex    <dbl> 205         0             0.0    0.4    0.5    0.0
## age             age    <dbl> 205         0             0.0   52.5   16.7    4.0
## year           year    <dbl> 205         0             0.0 1969.9    2.6 1962.0
## thickness thickness    <dbl> 205         0             0.0    2.9    3.0    0.1
## ulcer         ulcer    <dbl> 205         0             0.0    0.4    0.5    0.0
##           quartile_25 median quartile_75    max
## time           1525.0 2005.0      3042.0 5565.0
## status            1.0    2.0         2.0    3.0
## sex               0.0    0.0         1.0    1.0
## age              42.0   54.0        65.0   95.0
## year           1968.0 1970.0      1972.0 1977.0
## thickness         1.0    1.9         3.6   17.4
## ulcer             0.0    0.0         1.0    1.0
##
## $Categorical
## data frame with 0 columns and 205 rows
As the output shows, all of the variables are currently coded as continuous (numeric), and the categorical summary is empty.
The <dbl> tag stands for ‘double’, which means numeric. The name comes from ‘double-precision floating point’, an awkward computer science term for the way ordinary numbers are stored.
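The same check can be done in base R without extra packages. The sketch below is illustrative only: `mel` is a hypothetical name for a small data frame built from the first five values of each column in the output above, not the full dataset.

```r
# Hypothetical five-row sample, values copied from the glimpse output above.
mel <- data.frame(
  time      = c(10, 30, 35, 99, 185),
  status    = c(3, 3, 2, 3, 1),
  sex       = c(1, 1, 1, 0, 1),
  age       = c(76, 56, 41, 71, 52),
  year      = c(1972, 1968, 1977, 1968, 1965),
  thickness = c(6.76, 0.65, 1.34, 2.90, 12.08),
  ulcer     = c(1, 0, 0, 0, 1)
)

# Storage type of every column; each one is reported as "double".
col_types <- sapply(mel, typeof)
print(col_types)

# TRUE when every column is stored as a double.
all_double <- all(col_types == "double")
print(all_double)

# A plain number is a double; the L suffix is needed to get an integer.
typeof(1)   # "double"
typeof(1L)  # "integer"
```

Columns such as sex and ulcer contain only 0 and 1 in the output above, so being stored as numbers does not necessarily mean a variable is continuous in meaning.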