3.4 Add new columns: mutate()
We met mutate() in the last chapter.
Let’s first give the summarised column a better name, e.g., deaths_per_group.
We can remove groupings by using ungroup().
This is important to remember if you want to manipulate the dataset in its original format.
We can combine ungroup() with mutate() to add a total deaths column, which will be used below to calculate a percentage:
gbd2017 %>%
  group_by(cause, sex) %>%
  summarise(deaths_per_group = sum(deaths_millions)) %>%
  ungroup() %>%
  mutate(deaths_total = sum(deaths_per_group))
## `summarise()` regrouping output by 'cause' (override with `.groups` argument)
## # A tibble: 6 x 4
## cause sex deaths_per_group deaths_total
## <chr> <chr> <dbl> <dbl>
## 1 Communicable diseases Female 4.91 55.74
## 2 Communicable diseases Male 5.47 55.74
## 3 Injuries Female 1.42 55.74
## 4 Injuries Male 3.05 55.74
## 5 Non-communicable diseases Female 19.15 55.74
## 6 Non-communicable diseases Male 21.74 55.74
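To see why ungroup() matters here, the following minimal sketch rebuilds the six summarised rows by hand (values copied from the output above) and re-applies the grouping by cause that summarise() leaves behind. The object names deaths, still_grouped and ungrouped are made up for this illustration and are not part of the gbd2017 dataset.

```r
library(dplyr)

# Per-group totals copied from the output above (deaths in millions).
deaths <- tribble(
  ~cause,                      ~sex,     ~deaths_per_group,
  "Communicable diseases",     "Female",  4.91,
  "Communicable diseases",     "Male",    5.47,
  "Injuries",                  "Female",  1.42,
  "Injuries",                  "Male",    3.05,
  "Non-communicable diseases", "Female", 19.15,
  "Non-communicable diseases", "Male",   21.74
) %>%
  group_by(cause)  # mimic the grouping that summarise() leaves behind

# Still grouped: sum() is evaluated within each cause,
# so deaths_total takes three different values.
still_grouped <- deaths %>%
  mutate(deaths_total = sum(deaths_per_group))

# Ungrouped: sum() is evaluated over all six rows,
# so every row gets the same overall total.
ungrouped <- deaths %>%
  ungroup() %>%
  mutate(deaths_total = sum(deaths_per_group))
```

Without ungroup(), the "total" would silently become a per-cause total, and any percentage calculated from it would be a within-cause share rather than a share of all deaths.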
3.4.1 Percentages formatting: percent()
So summarise() condenses a tibble, whereas mutate() retains its current size and adds columns.
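As a small illustration of that difference, the sketch below applies the same sum() once inside summarise() and once inside mutate(). It uses a hand-built tibble holding the six per-group totals from the earlier output; the names deaths, condensed, extended and deaths_per_cause are assumptions made for this example.

```r
library(dplyr)

# Six per-group totals (deaths in millions), typed in by hand.
deaths <- tibble(
  cause = rep(c("Communicable diseases", "Injuries",
                "Non-communicable diseases"), each = 2),
  sex = rep(c("Female", "Male"), times = 3),
  deaths_per_group = c(4.91, 5.47, 1.42, 3.05, 19.15, 21.74)
)

# summarise() returns one row per group: 3 causes, 3 rows.
condensed <- deaths %>%
  group_by(cause) %>%
  summarise(deaths_per_cause = sum(deaths_per_group))

# mutate() keeps all 6 rows and adds the per-cause sum as a new column.
extended <- deaths %>%
  group_by(cause) %>%
  mutate(deaths_per_cause = sum(deaths_per_group)) %>%
  ungroup()

nrow(condensed)  # 3
nrow(extended)   # 6
```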
We can also add further lines to mutate() to calculate the percentage of each group:
# The percent() function for formatting percentages comes from library(scales)
library(scales)
gbd2017_summarised <- gbd2017 %>%
  group_by(cause, sex) %>%
  summarise(deaths_per_group = sum(deaths_millions)) %>%
  ungroup() %>%
  mutate(deaths_total = sum(deaths_per_group),
         deaths_relative = percent(deaths_per_group/deaths_total))
gbd2017_summarised
## # A tibble: 6 x 5
## cause sex deaths_per_group deaths_total deaths_relative
## <chr> <chr> <dbl> <dbl> <chr>
## 1 Communicable diseases Female 4.91 55.74 8.8%
## 2 Communicable diseases Male 5.47 55.74 9.8%
## 3 Injuries Female 1.42 55.74 2.5%
## 4 Injuries Male 3.05 55.74 5.5%
## 5 Non-communicable diseases Female 19.15 55.74 34.4%
## 6 Non-communicable diseases Male 21.74 55.74 39.0%
The percent() function comes from library(scales) and is a handy way of formatting percentages.
Keep in mind that it changes the column from a number (denoted <dbl>) to a character (<chr>).
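For the first row, the percent() function is equivalent to multiplying the fraction 4.91/55.74 by 100, rounding to one decimal place, and appending a percent sign: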
## [1] "8.8%"
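The result above can be reproduced in base R. This is only a sketch of the idea, using the values from the first row of the table; the variable names fraction and formatted are made up for the example, and it is not a description of how percent() is implemented internally.

```r
# Values from the first row: 4.91 million deaths out of 55.74 million in total.
fraction <- 4.91 / 55.74

# Scale to a percentage, round to one decimal place, and append "%".
# The result is a character string, not a number.
formatted <- paste0(round(100 * fraction, 1), "%")
formatted
```

Note that this simple version drops trailing zeros: applied to the last row (21.74/55.74), it would give "39%" rather than the "39.0%" shown in the table.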
This is convenient for the final presentation of numbers, but if you intend to do further calculations, plotting, or sorting with the percentages, calculate them as fractions first, which keeps the column numeric (<dbl>):
## # A tibble: 6 x 5
## cause sex deaths_per_group deaths_total deaths_relative
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Communicable diseases Female 4.91 55.74 0.08809
## 2 Communicable diseases Male 5.47 55.74 0.09813
## 3 Injuries Female 1.42 55.74 0.02548
## 4 Injuries Male 3.05 55.74 0.05472
## 5 Non-communicable diseases Female 19.15 55.74 0.3436
## 6 Non-communicable diseases Male 21.74 55.74 0.3900
and convert to nicely formatted percentages later with mutate(deaths_percentage = percent(deaths_relative)).
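The workflow described above might look like the following sketch: keep the fractions numeric, sort on them, and only format at the end. Because the gbd2017 dataset is not reproduced here, the example starts from a hand-built tibble of the six per-group totals; the names deaths, gbd2017_fractions, largest_first and presented are assumptions for this illustration.

```r
library(dplyr)
library(scales)

# Six per-group totals (deaths in millions), typed in by hand.
deaths <- tibble(
  cause = rep(c("Communicable diseases", "Injuries",
                "Non-communicable diseases"), each = 2),
  sex = rep(c("Female", "Male"), times = 3),
  deaths_per_group = c(4.91, 5.47, 1.42, 3.05, 19.15, 21.74)
)

# Step 1: calculate the share of each group as a numeric fraction.
gbd2017_fractions <- deaths %>%
  mutate(deaths_total = sum(deaths_per_group),
         deaths_relative = deaths_per_group / deaths_total)

# Step 2: numeric fractions sort as expected, largest share first.
# Sorting the formatted strings instead would compare them as text.
largest_first <- gbd2017_fractions %>%
  arrange(desc(deaths_relative))

# Step 3: format for presentation only at the very end.
presented <- largest_first %>%
  mutate(deaths_percentage = percent(deaths_relative))
```

Keeping deaths_relative alongside deaths_percentage means the numeric column stays available for plotting or further calculations, while the character column is used only for display.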