8.5 Recode the data

It is really important that variables are correctly coded for all plotting and analysis functions. Using the data dictionary, we will convert the categorical variables to factors.

In the code below, we convert the numerically coded categorical variables to factors (e.g., sex %>% factor() %>%), then use the forcats package to recode the factor levels. Modern databases (such as REDCap) can give you an R script to recode your specific dataset, so you don’t always have to recode your factors from numbers to names manually. You will still need to recode variables during the exploration and analysis stages, however, so it is important to follow what is happening here.
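Before applying this to the full dataset, it can help to see the two steps on a small vector. The sketch below uses a made-up vector of 0/1 codes (the values are hypothetical and not taken from meldata); factor() turns the numbers into a factor with levels "0" and "1", and fct_recode() then renames those levels.

```r
library(forcats)

# Hypothetical toy values, coded the same way as the sex variable
sex <- c(1, 0, 0, 1)

sex_factor <- sex %>%
  factor() %>%                 # levels are now "0" and "1"
  fct_recode("Female" = "0",   # new name on left, old level on right
             "Male"   = "1")

levels(sex_factor)
# "Female" "Male"
```

Note that fct_recode() matches on the existing level names, which are character strings after factor(), so the old values are written in quotes ("0", not 0).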

library(tidyverse)  # dplyr (mutate, %>%) and forcats (fct_recode)
library(finalfit)   # ff_label()

meldata <- meldata %>% 
  mutate(sex.factor =             # Make new variable  
           sex %>%                # from existing variable
           factor() %>%           # convert to factor
           fct_recode(            # forcats function
             "Female" = "0",      # new on left, old on right
             "Male"   = "1") %>% 
           ff_label("Sex"),       # Optional label for finalfit
         
         # same thing but more condensed code:
         ulcer.factor = factor(ulcer) %>% 
           fct_recode("Present" = "1",
                      "Absent"  = "0") %>% 
           ff_label("Ulcerated tumour"),
         
         status.factor = factor(status) %>% 
           fct_recode("Died melanoma"       = "1",
                      "Alive"               = "2",
                      "Died - other causes" = "3") %>% 
           ff_label("Status"))

We have spread the recode of the sex variable over multiple lines to make the individual steps easier to see. The other recodes are condensed (e.g., ulcer.factor = factor(ulcer) %>%), but they do exactly the same thing as the first one.
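One way to convince yourself that a recode has worked, and that the multi-line and condensed styles are equivalent, is to count the original codes against the new labels. This is a minimal sketch, assuming the data are the melanoma dataset from the boot package; the object name mel_check and the column sex.factor2 are introduced here for illustration only.

```r
library(dplyr)
library(forcats)

# Assumption: the data are the melanoma dataset shipped with the boot package
mel_check <- boot::melanoma %>%
  mutate(
    # multi-line style
    sex.factor = sex %>%
      factor() %>%
      fct_recode("Female" = "0",
                 "Male"   = "1"),
    # condensed style (hypothetical second column, for comparison only)
    sex.factor2 = factor(sex) %>%
      fct_recode("Female" = "0", "Male" = "1")
  )

# Each original code should map to exactly one label
mel_check %>% count(sex, sex.factor)

# Both styles produce the same factor
identical(mel_check$sex.factor, mel_check$sex.factor2)
# TRUE
```

If a code appears next to the wrong label in the count() output, or a level shows up that was not recoded, the mapping in fct_recode() needs another look against the data dictionary.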