Tech news 10: Pipes
What for?
Pipes are present is many languages and they allow passing objects from one function to another without having to create an intermediary object and keeping a logical and readable flow. In the command line, you can read all files names in a folder, pass them to grep to select all “.R” files and pass them all to Rscript to execute them. Instead of creating intermediate files, the result of each function is passed to the next by a pipe, represented by “|” in bash.
In R, there were two classical ways of going through a workflow. Creating intermediate objects even though not useful to keep:
# Abundances in three sites
<- data.frame(abundance = 20:50,
dt site = rep(
c("A", "B", "C"),
each = 10))
<- subset(dt, site %in% c("A", "B"))
sitesAB <- sample(sitesAB$abundance, size = 5)
subSample <- sort(subSample,
sortedAbundance descending = FALSE)
Or nesting functions with the disadvantage that you have to read the workflow from the inside to the outside and arguments of the same function are sometimes far from each other, a hard-to-digest-sandwich…
<- sort(
sortedAbundance sample(
subset(dt,
%in% c("A", "B"))),
site size = 5),
= FALSE) descending
Using a pipe, it looks like this:
<- dt %>%
sortedAbundance subset(site %in% c("A", "B")) %>%
sample(size = 5) %>%
sort(descending = FALSE)
Easier to read, intention is clear, arguments stay closer to each other and input and output objects are also close to each other: dt
is transformed into sortedAbundance
.
The pipe takes the object passed to it and passes it as the first argument of the next function. If we need to pass the piped object to the second argument, we can name arguments or use a place holder:
c("A","B") %>% sub(pattern = “A”,
replacement = “”)
c("A","B") %>% sub(pattern = “A”,
replacement = “”,
x = .)
The magrittr
pipe
The magrittr
package with its iconic %>%
pipe, was first published early 2014, apparently it caught traction pretty quick and Rstudio developers contacted the creator: “We also worked on a pipe %.%
but it’s not as functional, practical and rich as yours, could we collaborate?”
This pipe, together with the tidyverse grammar then revolutionised the R ecosystem…
The base pipe
And eventually, R developed its own native pipe |>
. In appearance, its usage is very similar with one difference being that the placeholder is “_” instead of “.”. You can read more in this Hadley Wickam article or in the pipe section of his book in which he recommends base
pipe over magrittr
pipe:
For simple cases,
|>
and%>%
behave identically. So why do we recommend the base pipe? Firstly, because it’s part of base R, it’s always available for you to use, even when you’re not using the tidyverse. Secondly,|>
is quite a bit simpler than%>%
: in the time between the invention of%>%
in 2014 and the inclusion of|>
in R 4.1.0 in 2021, we gained a better understanding of the pipe. This allowed the base implementation to jettison infrequently used and less important features.
Partly because the base
pipe is simpler, it has no overhead and is much faster than the magrittr
pipe:
> system.time({for(i in 1:1e5) identity(x)})
R
user system elapsed 0.015 0.000 0.015
> system.time({for(i in 1:1e5) x |> identity()})
R
user system elapsed 0.015 0.000 0.015
> system.time({for(i in 1:1e5) x %>% identity()})
R
user system elapsed 0.105 0.001 0.106
The other pipes
Less frequent as the well-known forward pipe %>%
, the magrittr
package offers other pipes!
- The assignment pipe
%<>%
: Pipe an object forward into a function or call expression and update the left-hand-side object with the resulting value.
%<>% mean() # equivalent to dt <- dt %>% mean() dt
- The exposition pipe
%$%
: Expose the names in left-hand-side to the right-hand-side expression. This is useful when functions do not have a built-in data argument.
%>%
iris subset(Sepal.Length > mean(Sepal.Length)) %$%
cor(Sepal.Length, Sepal.Width)
#> [1] 0.3361992
- The tee pipe
%T%
: Pipe a value forward into a function- or call expression and return the original value instead of the result. This is useful when an expression is used for its side-effect, say plotting or printing.
rnorm(200) %>%
matrix(ncol = 2) %T>%
plot() %>%
colSums()
Resources
The pipe article by Hadley Wickam: https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/
The pipe article in the R for Data Science book (Hadley Wickam, second edition): https://r4ds.hadley.nz/data-transform.html#sec-the-pipe