Why should we use the pipe?
The pipe has a big advantage over other ways of processing data in R: it makes code easy to read. If we read %>% as “then”, the code from the previous section is very easy to digest as a set of instructions in plain English:
Load tidyverse packages
To get our result, take the mtcars dataframe, THEN
Group its entries by number of cylinders, THEN
Compute the mean miles-per-gallon of each group
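For reference, the steps above map onto piped code roughly as follows. This is a sketch that assumes dplyr (part of the tidyverse) is installed; the column name meanMPG is taken from the options shown later in this section, and the exact code from the previous section may differ.

```r
# Load dplyr (part of the tidyverse) for %>%, group_by(), and summarise()
library(dplyr)

result <- mtcars %>%              # take the mtcars dataframe, THEN
  group_by(cyl) %>%               # group its entries by number of cylinders, THEN
  summarise(meanMPG = mean(mpg))  # compute the mean miles-per-gallon of each group
```

Each line corresponds to one plain-English instruction, in the order the steps actually run.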
This is far more readable than if we were to express this process in another way. The two options below are different ways of expressing the previous code, but both are worse for a few reasons.
# Option 1: Store each step in the process sequentially
result <- group_by(mtcars, cyl)
result <- summarise(result, meanMPG = mean(mpg))

# Option 2: chain the functions together by nesting them
result <- summarise(
  group_by(mtcars, cyl),
  meanMPG = mean(mpg))
Option 1 gets the job done, but overwriting our output dataframe result on every line is problematic. For one, doing this for a procedure with lots of steps isn’t efficient and creates unnecessary repetition in the code. In some cases, this repetition also makes it harder to identify exactly what is changing on each line.
Option 2 is even less practical. Nesting each function we want to use gets ugly fast, especially for long procedures. It’s hard to read, and harder to debug. This approach also makes it tough to see the order of steps in the analysis, which is bad news if you want to add new functionality later.
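To see how quickly nesting gets harder to follow, here is a sketch that extends the same analysis with two extra steps. The filter on hp > 100 and the final sort are hypothetical additions made up for illustration; they are not part of the original example.

```r
library(dplyr)

# Nested form: the first step to run (filter) is buried in the middle,
# so the code has to be read from the inside out
nested <- arrange(
  summarise(
    group_by(
      filter(mtcars, hp > 100),  # hypothetical step: keep cars with more than 100 hp
      cyl),
    meanMPG = mean(mpg)),
  desc(meanMPG))                 # hypothetical step: sort by mean MPG, highest first

# Piped form: the same steps, written in the order they run
piped <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(meanMPG = mean(mpg)) %>%
  arrange(desc(meanMPG))
```

Both versions produce the same dataframe. In the piped form, adding or removing a step means adding or removing a single line; in the nested form, it means rewrapping the whole expression.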
It’s easy to see how using the pipe can substantially improve most R scripts. It makes analyses more readable, removes repetition, and simplifies the process of adding and modifying code. Is there anything it can’t do?