2.4 Pipe - %>% | R for Health Data Science

2.4 Pipe - `%>%`

The pipe - denoted %>% - is probably the oddest looking thing you’ll see in this book. But please bear with us; it is not as scary as it looks! Furthermore, it is super useful. We use the pipe to send objects into functions.

In the above examples, we calculated the mean of column var1 from mydata by mean(mydata$var1). With the pipe, we can rewrite this as:

library(tidyverse)
mydata$var1 %>% mean()

## [1] 2.5

This reads: “Working with mydata, we select a single column called var1 (with the $) and then calculate the mean().” The pipe becomes especially useful once the analysis includes multiple steps applied one after another. A good way to read and think of the pipe is “and then”.
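To see why “and then” helps once there are several steps, here is a minimal sketch comparing nested function calls with the piped version. The vector var1_values is made up for illustration (chosen so that its mean matches the 2.5 shown above), and only magrittr is loaded here, although library(tidyverse) would provide the same pipe.

```r
library(magrittr)  # provides %>%; library(tidyverse) also works

# Hypothetical values, standing in for mydata$var1
var1_values <- c(1, 2, 3, 4)

# Without the pipe: the steps have to be read from the inside out
nested_result <- round(sqrt(mean(var1_values)), 2)

# With the pipe: the steps read top to bottom as "and then"
piped_result <- var1_values %>%
  mean() %>%   # take the mean, and then
  sqrt() %>%   # take the square root, and then
  round(2)     # round to 2 decimal places

# Check that both versions give the same number
identical(nested_result, piped_result)
```

Both versions do exactly the same calculation; the piped one simply lists the steps in the order they happen.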

The %>% pipe is not part of base R, so before using it in a script you need to load a package that provides it. The pipe comes from the magrittr package (Figure 2.5), but loading the tidyverse also loads the pipe, so library(tidyverse) initialises everything you need.

To insert a pipe %>% in RStudio, use the keyboard shortcut Ctrl+Shift+M (Cmd+Shift+M on a Mac).

With or without the pipe, the general rule “if the result gets printed it doesn’t get saved” still applies. To save the result of the function into a new object (so it shows up in the Environment), you need to add the name of the new object with the assignment arrow (<-):

mean_result <- mydata$var1 %>% mean()

FIGURE 2.5: This is not a pipe. René Magritte inspired artwork, by Stefan Milton Bache.

2.4.1 Using . to direct the pipe

By default, the pipe sends data to the beginning of the function brackets (as most of the functions we use expect data as the first argument). So mydata %>% lm(dependent~explanatory) is equivalent to lm(mydata, dependent~explanatory). lm() - linear model - will be introduced in detail in Chapter 7.
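The default behaviour can be checked with a function that does take its data first. The sketch below uses round(), whose first argument is the numbers to round; example_values is a made-up vector used only for illustration.

```r
library(magrittr)  # provides %>%; library(tidyverse) also works

# Hypothetical values, for illustration only
example_values <- c(1.234, 5.678)

# The pipe places example_values in the first argument slot of round()
piped <- example_values %>% round(1)

# The same call written out without the pipe
direct <- round(example_values, 1)

# Check that the two calls give the same result
identical(piped, direct)
```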

However, the lm() function does not expect data as its first argument. lm() wants us to specify the variables first (dependent~explanatory), and then wants the tibble these columns are in. So we have to use the . to tell the pipe to send the data to the second argument of lm(), not the first, e.g.,

mydata %>% 
  lm(var1~var2, data = .)
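
As a quick check that the . really does stand in for the piped data, the sketch below fits the same model with and without the pipe and compares the coefficients. The contents of mydata here are invented for illustration and are not the values used elsewhere in this chapter; a plain data frame is used to keep the example self-contained, but a tibble would work the same way.

```r
library(magrittr)  # provides %>%; library(tidyverse) also works

# Hypothetical data, for illustration only
mydata <- data.frame(
  var1 = c(1, 2, 3, 4),
  var2 = c(2, 1, 4, 3)
)

# With the pipe: the . marks where mydata should go
fit_piped <- mydata %>%
  lm(var1 ~ var2, data = .)

# Without the pipe: mydata is passed to the data argument directly
fit_direct <- lm(var1 ~ var2, data = mydata)

# Check that the estimated coefficients agree
all.equal(coef(fit_piped), coef(fit_direct))
```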