3.9 arrange() rows
The arrange() function sorts rows by the column(s) you specify. By default, it sorts in ascending order:
## # A tibble: 3 x 4
## cause year sex deaths_millions
## <chr> <dbl> <chr> <dbl>
## 1 Injuries 1990 Female 1.41
## 2 Injuries 2017 Female 1.42
## 3 Injuries 1990 Male 2.84
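The call that produced this output is not shown here. A minimal sketch of what it could look like, assuming a long-format tibble named gbd_long (the name is an assumption). To keep it self-contained, the tibble holds only a few rows copied from the outputs printed in this section, not the full dataset:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the full data: a few rows copied from
# the outputs printed in this section, deliberately out of order
gbd_long <- tribble(
  ~cause,                      ~year, ~sex,     ~deaths_millions,
  "Communicable diseases",      1990, "Male",    8.06,
  "Injuries",                   1990, "Male",    2.84,
  "Non-communicable diseases",  2017, "Male",   21.74,
  "Injuries",                   2017, "Female",  1.42,
  "Injuries",                   1990, "Female",  1.41
)

# Ascending order is the default; keep the first three rows for printing
lowest <- gbd_long %>%
  arrange(deaths_millions) %>%
  slice(1:3)
lowest
```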
For numeric variables, we can just put a - in front of the column name to sort in descending order:
## # A tibble: 3 x 4
## cause year sex deaths_millions
## <chr> <dbl> <chr> <dbl>
## 1 Non-communicable diseases 2017 Male 21.74
## 2 Non-communicable diseases 2017 Female 19.15
## 3 Non-communicable diseases 1990 Male 13.91
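A sketch of the descending sort, again on a small hypothetical tibble (assumed name gbd_long) built from rows printed in this section. It also uses dplyr's desc(), which gives the same order for a numeric column:

```r
library(dplyr)
library(tibble)

# Hypothetical subset of the data, rows copied from the printed outputs
gbd_long <- tribble(
  ~cause,                      ~year, ~sex,     ~deaths_millions,
  "Non-communicable diseases",  1990, "Male",   13.91,
  "Injuries",                   1990, "Male",    2.84,
  "Non-communicable diseases",  2017, "Male",   21.74,
  "Non-communicable diseases",  1990, "Female", 12.8,
  "Non-communicable diseases",  2017, "Female", 19.15
)

# `-` negates the numbers, so the largest values come first
highest <- gbd_long %>%
  arrange(-deaths_millions) %>%
  slice(1:3)

# desc() produces the same ordering for numeric columns
highest_desc <- gbd_long %>%
  arrange(desc(deaths_millions)) %>%
  slice(1:3)
highest
```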
The - does not work for character (categorical) variables; to sort these in descending order, wrap the column name in desc() instead:
## # A tibble: 4 x 4
## cause year sex deaths_millions
## <chr> <dbl> <chr> <dbl>
## 1 Communicable diseases 1990 Male 8.06
## 2 Communicable diseases 2017 Male 5.47
## 3 Non-communicable diseases 1990 Female 12.8
## 4 Non-communicable diseases 2017 Female 19.15
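The exact call is not shown; a descending sort on sex, such as arrange(desc(sex)), is consistent with this output, where the Male rows come before the Female rows. A minimal sketch on a hypothetical four-row tibble (assumed name gbd_long) holding the rows printed above, which also shows that the - fails on a character column:

```r
library(dplyr)
library(tibble)

# Hypothetical tibble: the four rows printed above, shuffled
gbd_long <- tribble(
  ~cause,                      ~year, ~sex,     ~deaths_millions,
  "Non-communicable diseases",  2017, "Female", 19.15,
  "Communicable diseases",      1990, "Male",    8.06,
  "Non-communicable diseases",  1990, "Female", 12.8,
  "Communicable diseases",      2017, "Male",    5.47
)

# Unary minus is not defined for character vectors, so this errors;
# tryCatch() turns the error into the string "error"
minus_result <- tryCatch(
  gbd_long %>% arrange(-sex),
  error = function(e) "error"
)

# desc() works on characters: "Male" now sorts before "Female"
by_sex <- gbd_long %>%
  arrange(desc(sex))
by_sex
```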
3.9.1 Factor levels
arrange() sorts characters alphabetically, whereas factors are sorted by the order of their levels.
Let’s make the cause column into a factor:
When we first create a factor, its levels will be ordered alphabetically:
## [1] "Communicable diseases" "Injuries"
## [3] "Non-communicable diseases"
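The code for these two steps is not shown here. A sketch of what it could look like, assuming a small hypothetical tibble (named gbd_long here) that contains only the cause column with the three causes used in this section:

```r
library(dplyr)
library(tibble)

# Hypothetical minimal data: just the three causes, in no particular order
gbd_long <- tibble(
  cause = c("Non-communicable diseases", "Injuries", "Communicable diseases")
)

# Convert the character column into a factor
gbd_factored <- gbd_long %>%
  mutate(cause = factor(cause))

# factor() sorts the unique values to create the levels
gbd_factored$cause %>% levels()
```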
But we can now use fct_relevel() inside mutate() to change the order of these levels:
gbd_factored <- gbd_factored %>%
  # move "Injuries" to the front of the levels
  mutate(cause = cause %>%
           fct_relevel("Injuries"))
# check the new order of the levels
gbd_factored$cause %>% levels()
## [1] "Injuries" "Communicable diseases"
## [3] "Non-communicable diseases"
fct_relevel() moves the level(s) listed in it to the front, in the order given; the remaining levels keep their existing order after them.
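To see this behaviour with more than one listed level, here is a small sketch using the three cause names from this section (the vector cause is a hypothetical stand-in for the column). The last call uses fct_relevel()'s after argument, which places the listed level somewhere other than the front:

```r
library(forcats)

# Hypothetical stand-in for the cause column; levels start alphabetical
cause <- factor(c("Communicable diseases", "Injuries", "Non-communicable diseases"))

# Listed levels go to the front in the order given;
# "Communicable diseases" keeps its place behind them
front <- fct_relevel(cause, "Non-communicable diseases", "Injuries")
levels(front)

# after = Inf moves the listed level to the end instead
at_end <- fct_relevel(cause, "Injuries", after = Inf)
levels(at_end)
```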
So if we use arrange() on gbd_factored, the cause column will be sorted by the order of its levels (Injuries first), not alphabetically.
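A side-by-side sketch of the two behaviours, assuming a hypothetical three-row tibble (the values are the 1990 Male rows from the outputs printed earlier in this section):

```r
library(dplyr)
library(tibble)
library(forcats)

# Hypothetical data: 1990 Male rows taken from the printed outputs
gbd_long <- tibble(
  cause = c("Communicable diseases", "Non-communicable diseases", "Injuries"),
  deaths_millions = c(8.06, 13.91, 2.84)
)

# Same data, with cause as a factor whose first level is "Injuries"
gbd_factored <- gbd_long %>%
  mutate(cause = cause %>% factor() %>% fct_relevel("Injuries"))

# Character column: rows sorted alphabetically by cause
alphabetical <- gbd_long %>% arrange(cause) %>% pull(cause)

# Factor column: rows sorted by level order, so Injuries comes first
by_level <- gbd_factored %>% arrange(cause) %>% pull(cause)

alphabetical
by_level
```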
This is especially useful in two places:
- plotting: character variables are ordered alphabetically (think of the bars in a barplot), regardless of how the rows are arranged;
- statistical tests: for a character variable, the reference level (e.g., the group an odds ratio is relative to) is the one that comes first alphabetically.
Converting a character column into a factor, however, lets us give its levels a non-alphabetical order, and so control the plotting order or choose the reference level used in statistical tests.
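The reference-level point can be checked without fitting a model. Base R's model.matrix() builds the dummy (indicator) columns that lm() and glm() use with the default treatment contrasts, and the reference level is the one that gets no column of its own. A sketch on a hypothetical three-row data frame:

```r
library(forcats)

# Hypothetical data: one row per cause, kept as a character column
df_chr <- data.frame(
  cause = c("Communicable diseases", "Injuries", "Non-communicable diseases"),
  stringsAsFactors = FALSE
)

# model.matrix() converts the character column to a factor with
# alphabetical levels, so "Communicable diseases" is the reference
cols_chr <- colnames(model.matrix(~ cause, data = df_chr))

# As a factor with "Injuries" moved to the front, Injuries becomes the reference
df_fct <- df_chr
df_fct$cause <- fct_relevel(df_fct$cause, "Injuries")
cols_fct <- colnames(model.matrix(~ cause, data = df_fct))

cols_chr
cols_fct
```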