## 3.9 arrange() rows

The arrange() function sorts rows by the values of one or more columns. By default, it arranges the tibble in ascending order:

gbd_long %>%
  arrange(deaths_millions) %>%
  # first 3 rows just for printing:
  slice(1:3)
## # A tibble: 3 x 4
##   cause     year sex    deaths_millions
##   <chr>    <dbl> <chr>            <dbl>
## 1 Injuries  1990 Female            1.41
## 2 Injuries  2017 Female            1.42
## 3 Injuries  1990 Male              2.84
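
Since arrange() accepts more than one column, later columns break ties within earlier ones. The following is a minimal sketch, assuming dplyr is installed; `gbd_subset` is a hypothetical name for a four-row tibble built from rows printed in this section, not the full gbd_long dataset:

```r
library(dplyr)

# Hypothetical four-row subset, values copied from outputs shown in this section
gbd_subset <- tibble(
  cause = c("Injuries", "Injuries", "Injuries", "Communicable diseases"),
  year = c(2017, 1990, 1990, 1990),
  sex = c("Female", "Male", "Female", "Male"),
  deaths_millions = c(1.42, 2.84, 1.41, 8.06)
)

# Sort by year first, then by deaths_millions within each year
sorted <- gbd_subset %>%
  arrange(year, deaths_millions)

sorted
```

The three 1990 rows come first, ordered from fewest to most deaths, followed by the single 2017 row.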

For numeric variables, we can prefix the column name with a - to sort in descending order:

gbd_long %>%
  arrange(-deaths_millions) %>%
  slice(1:3)
## # A tibble: 3 x 4
##   cause                      year sex    deaths_millions
##   <chr>                     <dbl> <chr>            <dbl>
## 1 Non-communicable diseases  2017 Male             21.74
## 2 Non-communicable diseases  2017 Female           19.15
## 3 Non-communicable diseases  1990 Male             13.91

The - doesn’t work for categorical variables; they need to be wrapped in desc() to be arranged in descending order:

gbd_long %>%
  arrange(desc(sex)) %>%
  # printing rows 1, 2, 11, and 12
  slice(1, 2, 11, 12)
## # A tibble: 4 x 4
##   cause                      year sex    deaths_millions
##   <chr>                     <dbl> <chr>            <dbl>
## 1 Communicable diseases      1990 Male              8.06
## 2 Communicable diseases      2017 Male              5.47
## 3 Non-communicable diseases  1990 Female           12.8
## 4 Non-communicable diseases  2017 Female           19.15
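
To make the difference between - and desc() concrete, here is a small sketch, assuming dplyr is installed. The tibble `toy` and the result names are hypothetical; the values are copied from rows printed above. It also shows that desc() is not limited to categorical columns:

```r
library(dplyr)

# Hypothetical three-row tibble for illustration
toy <- tibble(
  sex = c("Female", "Male", "Female"),
  deaths_millions = c(1.41, 2.84, 1.42)
)

# desc() works on a character column...
by_sex <- toy %>% arrange(desc(sex))

# ...and on a numeric column, where it is equivalent to using -
by_deaths <- toy %>% arrange(desc(deaths_millions))

# Unary minus on a character column raises an error, caught here with try()
minus_fails <- inherits(try(arrange(toy, -sex), silent = TRUE), "try-error")
```

Using desc() for every column type avoids having to remember which columns are numeric.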

### 3.9.1 Factor levels

arrange() sorts characters alphabetically, whereas factors will be sorted by the order of their levels. Let’s make the cause column into a factor:

gbd_factored <- gbd_long %>%
  mutate(cause = factor(cause))

When we first create a factor, its levels will be ordered alphabetically:

gbd_factored$cause %>% levels()
## [1] "Communicable diseases"     "Injuries"
## [3] "Non-communicable diseases"

But we can now use fct_relevel() inside mutate() to change the order of these levels:

gbd_factored <- gbd_factored %>%
  mutate(cause = cause %>% fct_relevel("Injuries"))

gbd_factored$cause %>% levels()
## [1] "Injuries"                  "Communicable diseases"
## [3] "Non-communicable diseases"

fct_relevel() brings the level(s) listed in it to the front.
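
When more than one level is listed, they are moved to the front in the order given. A minimal sketch, assuming the forcats package is installed; the standalone vector `cause` and the name `moved` are hypothetical and stand in for the column in gbd_factored:

```r
library(forcats)

# Standalone factor with the three cause levels (alphabetical by default)
cause <- factor(c("Communicable diseases", "Injuries", "Non-communicable diseases"))
levels(cause)

# Move two levels to the front, in the order listed
moved <- fct_relevel(cause, "Injuries", "Non-communicable diseases")
levels(moved)
```

Any levels not mentioned keep their existing relative order after the ones that were moved.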

So if we use arrange() on gbd_factored, the cause column will be sorted based on the order of its levels, not alphabetically. This is especially useful in two places:

• plotting - categorical variables that are characters will be ordered alphabetically (e.g., think barplots), regardless of whether the rows are arranged or not;
• statistical tests - the reference level of categorical variables that are characters is the alphabetically first value (e.g., the level an odds ratio is expressed relative to).

However, converting a character column into a factor lets us give its levels a non-alphabetical order, which gives us control over plotting order and over the reference level used in statistical tests.
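
The effect on arrange() can be sketched as follows, assuming dplyr and forcats are installed. The tibble `toy_causes` and the result names are hypothetical, with one row per cause:

```r
library(dplyr)
library(forcats)

# Hypothetical tibble with one row per cause, deliberately unsorted
toy_causes <- tibble(
  cause = c("Non-communicable diseases", "Injuries", "Communicable diseases")
)

# Character column: arrange() sorts alphabetically
alphabetical <- toy_causes %>%
  arrange(cause)

# Factor column with "Injuries" moved to the front: arrange() follows level order
releveled <- toy_causes %>%
  mutate(cause = cause %>% factor() %>% fct_relevel("Injuries")) %>%
  arrange(cause)

alphabetical$cause
as.character(releveled$cause)
```

In the first result "Communicable diseases" comes first; in the second, "Injuries" does, because it is now the first factor level.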