3.13 Exercise - mutate(), summarise()

Instead of creating the two summarised tibbles and combining them with full_join(), achieve the same result as in the previous Exercise with a single pipeline using summarise() and then mutate().

Hint: you have to do it the other way round, so group_by(year, cause) %>% summarise(...) first, then group_by(year) %>% mutate().
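The order described in the hint can be sketched on a small made-up tibble. This is only an illustration under stated assumptions: the tibble `toy`, its `deaths` column, and the cause labels "A" and "B" are hypothetical and are not the exercise dataset, and the percentage is formatted with paste0() and round() rather than any particular formatting helper.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration (not the exercise dataset)
toy <- tibble(
  year   = c(2000, 2000, 2000, 2000, 2010, 2010, 2010, 2010),
  cause  = c("A", "A", "B", "B", "A", "A", "B", "B"),
  deaths = c(1, 2, 3, 4, 2, 2, 3, 3)
)

result <- toy %>%
  # step 1: summarise() collapses the data to one row per year and cause
  group_by(year, cause) %>%
  summarise(deaths_per_group = sum(deaths)) %>%
  # step 2: regroup by year only; mutate() keeps every row and
  # adds the yearly total alongside the per-cause value
  group_by(year) %>%
  mutate(
    deaths_total = sum(deaths_per_group),
    percentage   = paste0(round(100 * deaths_per_group / deaths_total), "%")
  ) %>%
  ungroup()

result
```

Because mutate() does not collapse rows, the yearly total is repeated on each cause row within a year, which is what makes the division possible without a join.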

Bonus: select() columns year, cause, percentage, then spread() the cause variable using percentage as values.
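A minimal sketch of the bonus step, again on hypothetical data: `toy_long` and its values are invented for illustration and stand in for the long-format result of the pipeline.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format result, invented for illustration
toy_long <- tibble(
  year         = c(2000, 2000, 2010, 2010),
  cause        = c("A", "B", "A", "B"),
  deaths_total = c(10, 10, 10, 10),
  percentage   = c("30%", "70%", "40%", "60%")
)

toy_wide <- toy_long %>%
  # keep only the columns needed for the wide table
  select(year, cause, percentage) %>%
  # each distinct value of cause becomes its own column,
  # filled with the matching percentage
  spread(key = cause, value = percentage)

toy_wide
```

In recent versions of tidyr, spread() is superseded by pivot_wider(); the equivalent call here would be pivot_wider(names_from = cause, values_from = percentage).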

Solution

## # A tibble: 7 x 4
## # Groups:   year [7]
##    year `Communicable diseases` Injuries `Non-communicable diseases`
##   <dbl> <chr>                   <chr>    <chr>                      
## 1  1990 33%                     9%       58%                        
## 2  1995 31%                     9%       60%                        
## 3  2000 29%                     9%       62%                        
## 4  2005 27%                     9%       64%                        
## 5  2010 24%                     9%       67%                        
## 6  2015 20%                     8%       72%                        
## 7  2017 19%                     8%       73%

Note that your pipelines shouldn’t be much longer than this, and we often save interim results into separate tibbles for checking (as we did with summary_data1 and summary_data2), making sure the number of rows is what we expect and spot checking that the calculation worked as expected.
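One way such checks might look is sketched below. The tibble `toy`, the name `interim`, and the two checks are hypothetical examples chosen for illustration, and the row-count check assumes every cause appears in every year.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration (not the exercise dataset)
toy <- tibble(
  year   = c(2000, 2000, 2000, 2000, 2010, 2010, 2010, 2010),
  cause  = c("A", "A", "B", "B", "A", "A", "B", "B"),
  deaths = c(1, 2, 3, 4, 2, 2, 3, 3)
)

# Save the interim result instead of piping straight on
interim <- toy %>%
  group_by(year, cause) %>%
  summarise(deaths_per_group = sum(deaths))

# Check 1: one row per year and cause combination
expected_rows <- n_distinct(toy$year) * n_distinct(toy$cause)
stopifnot(nrow(interim) == expected_rows)

# Check 2: recompute a single value by a different route and compare
check_value <- toy %>%
  filter(year == 2000, cause == "A") %>%
  pull(deaths) %>%
  sum()
stopifnot(
  interim$deaths_per_group[interim$year == 2000 & interim$cause == "A"] == check_value
)
```

stopifnot() stays silent when a check passes and raises an error when it fails, so checks like these can be left in a script while the pipeline is being developed.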

R doesn’t do what you want it to do, it does what you ask it to do. Testing and spot checking are essential as you will make mistakes. We sure do.

Do not feel like you should be able to just bash out these clever pipelines without a lot of trial and error first.