3.9 arrange() rows

The arrange() function sorts the rows of a tibble by one or more columns. By default, it sorts in ascending order:

gbd_long %>% 
  arrange(deaths_millions) %>% 
  # first 3 rows just for printing:
  slice(1:3)
## # A tibble: 3 x 4
##   cause     year sex    deaths_millions
##   <chr>    <dbl> <chr>            <dbl>
## 1 Injuries  1990 Female            1.41
## 2 Injuries  2017 Female            1.42
## 3 Injuries  1990 Male              2.84

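When arrange() is given several columns, it sorts rows by the first column and uses the next column to break ties. The sketch below uses a small made-up tibble (`toy`, with hypothetical numbers rather than GBD estimates) and assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data, not real GBD figures
toy <- tibble(
  cause  = c("A", "B", "A", "B"),
  year   = c(2017, 1990, 1990, 2017),
  deaths = c(3, 5, 1, 4)
)

# Sort by year first; within each year, sort by deaths (both ascending)
toy_sorted <- toy %>%
  arrange(year, deaths)

toy_sorted$deaths
# returns 1 5 3 4
```

Sorting by deaths alone would give 1, 3, 4, 5. Here year takes priority, and deaths only decides the order within each year.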
For numeric variables, we can put a minus sign (-) in front of the column name to sort in descending order:

gbd_long %>% 
  arrange(-deaths_millions) %>% 
  slice(1:3)
## # A tibble: 3 x 4
##   cause                      year sex    deaths_millions
##   <chr>                     <dbl> <chr>            <dbl>
## 1 Non-communicable diseases  2017 Male             21.74
## 2 Non-communicable diseases  2017 Female           19.15
## 3 Non-communicable diseases  1990 Male             13.91

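For a numeric column, the minus sign and desc() produce the same order. A minimal check on a hypothetical one-column tibble, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical values for illustration only
toy <- tibble(deaths = c(2.5, 7.1, 0.4))

by_minus <- toy %>% arrange(-deaths)
by_desc  <- toy %>% arrange(desc(deaths))

by_minus$deaths
# returns 7.1 2.5 0.4

# Both approaches give the same row order for numeric columns
identical(by_minus$deaths, by_desc$deaths)
# returns TRUE
```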
The minus sign doesn’t work for categorical (character) variables; to sort these in descending order, wrap the column name in desc():

gbd_long %>% 
  arrange(desc(sex)) %>% 
  # print rows 1, 2, 11, and 12 only
  slice(1, 2, 11, 12)
## # A tibble: 4 x 4
##   cause                      year sex    deaths_millions
##   <chr>                     <dbl> <chr>            <dbl>
## 1 Communicable diseases      1990 Male              8.06
## 2 Communicable diseases      2017 Male              5.47
## 3 Non-communicable diseases  1990 Female           12.8 
## 4 Non-communicable diseases  2017 Female           19.15

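The sketch below, on a hypothetical toy tibble and assuming dplyr is installed, confirms that -sex raises an error for a character column. It also shows that desc() can be mixed across column types: sex is sorted in descending order, and deaths is sorted in descending order within each sex.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  sex    = c("Female", "Male", "Female", "Male"),
  deaths = c(1.2, 3.4, 5.6, 0.7)
)

# Unary minus is not defined for character vectors, so -sex raises an error
minus_fails <- inherits(
  tryCatch(arrange(toy, -sex), error = function(e) e),
  "error"
)

# desc() works for both types: "Male" rows first, then deaths from high to low within each sex
toy_sorted <- toy %>%
  arrange(desc(sex), desc(deaths))

toy_sorted
```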
3.9.1 Factor levels

arrange() sorts character columns alphabetically, whereas factor columns are sorted by the order of their levels. Let’s make the cause column into a factor:

gbd_factored <- gbd_long %>% 
  mutate(cause = factor(cause))

When we first create a factor, its levels will be ordered alphabetically:

gbd_factored$cause %>% levels()
## [1] "Communicable diseases"     "Injuries"                 
## [3] "Non-communicable diseases"
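By default, factor() builds its levels from the sorted unique values, so the order in which values appear in the data makes no difference. A minimal base-R check on a hypothetical vector of cause labels:

```r
# Values deliberately listed in non-alphabetical order
causes <- c("Non-communicable diseases", "Injuries",
            "Communicable diseases", "Injuries")

cause_factor <- factor(causes)

levels(cause_factor)
# returns "Communicable diseases" "Injuries" "Non-communicable diseases"

# The default levels match the sorted unique values
identical(levels(cause_factor), sort(unique(causes)))
# returns TRUE
```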

But we can now use fct_relevel() inside mutate() to change the order of these levels:

gbd_factored <- gbd_factored %>% 
  mutate(cause = cause %>% 
           fct_relevel("Injuries"))

gbd_factored$cause %>% levels()
## [1] "Injuries"                  "Communicable diseases"    
## [3] "Non-communicable diseases"

fct_relevel() moves the level(s) listed in it to the front and leaves the remaining levels in their existing order.
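fct_relevel() can take several levels at once, and it moves them to the front in the order they are listed. Its `after` argument places them elsewhere instead, for example at the end with `after = Inf`. A sketch assuming forcats is installed, using a factor built from the three cause labels:

```r
library(forcats)

causes <- factor(c("Communicable diseases", "Injuries", "Non-communicable diseases"))

# Several levels: moved to the front in the order they are listed
front_levels <- levels(fct_relevel(causes, "Non-communicable diseases", "Injuries"))
front_levels
# returns "Non-communicable diseases" "Injuries" "Communicable diseases"

# after = Inf moves the listed level to the end instead of the front
end_levels <- levels(fct_relevel(causes, "Communicable diseases", after = Inf))
end_levels
# returns "Injuries" "Non-communicable diseases" "Communicable diseases"
```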

So if we use arrange() on gbd_factored, the cause column will be sorted by the order of its levels (Injuries first), not alphabetically. Being able to set this order is especially useful in two places, because character variables are always ordered alphabetically:

  • plotting: categorical variables stored as characters are ordered alphabetically (think of the bars in a barplot), regardless of whether the rows have been arranged;
  • statistical tests: the reference level of a categorical variable stored as characters is the alphabetically first one (e.g., the level that an odds ratio is relative to).

However, converting a character column into a factor lets us give its levels a non-alphabetical order. This gives us control over the plotting order and lets us define the reference level used in statistical tests.
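Both effects can be checked on a small hypothetical tibble (`toy`, with made-up values). The sketch assumes dplyr and forcats are installed. It uses lm() as a simple stand-in for a statistical model; glm(), which gives odds ratios, picks the reference level by the same rule, namely the first level of the factor.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: an outcome y and a three-group character column
toy <- tibble(
  grp = c("b", "a", "c", "b", "a", "c"),
  y   = c(2.0, 1.0, 3.5, 2.2, 0.8, 3.1)
)

# Character column: arrange() sorts alphabetically
arrange(toy, grp)$grp
# returns "a" "a" "b" "b" "c" "c"

# Factor with "c" moved to the front: arrange() follows the level order
toy_factored <- toy %>%
  mutate(grp = fct_relevel(factor(grp), "c"))
as.character(arrange(toy_factored, grp)$grp)
# returns "c" "c" "a" "a" "b" "b"

# Reference level: the first level is the baseline the other coefficients are compared to
names(coef(lm(y ~ grp, data = toy)))
# returns "(Intercept)" "grpb" "grpc"   (baseline "a", the alphabetically first)
names(coef(lm(y ~ grp, data = toy_factored)))
# returns "(Intercept)" "grpa" "grpb"   (baseline "c", chosen with fct_relevel())
```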