Tips on non-standard evaluation in R
原文链接 https://renkun-ken.github.io/blog/2014/12/03/tips-on-non-standard-evaluation-in-r.html
注:以下为加速网络访问所做的原文缓存,经过重新格式化,可能存在格式方面的问题,或偶有遗漏信息,请以原文为准。
One of my favorite features of R is its meta-programming facilities. It can be simply demonstrated by the following examples.
An ordinary use of R is to do statistical computing. We can evaluate something like
sin(0)
[1] 0
Meta-programming in R allows users to manipulate the expression to evaluate. We can use quote
to create an object that represents a function call.
quote(sin(0))
sin(0)
In this way, sin(0)
is not evaluated but parsed as a call
object which basically can be represented as a list of function name and the arguments.
as.list(quote(sin(0)))
[[1]]
sin
[[2]]
[1] 0
Now we can use some functions to manipulate the expression so that we can alter the expression to evaluate.
expr <- quote(sin(0))
expr[[1L]] <- quote(cos)
expr
cos(0)
Now we can see the expression is modified. This feature, as stated in the official documentation, is computing on language, that is, R not only is able to compute on literal values, but also on language itself. Then what can we do with the modified expression? We can evaluate it using eval()
as if we do in the console.
eval(expr)
[1] 1
The meta-programming feature requires the definition of language objects and a meta-function to evaluate such a language object. In R, a call
object represents a function call like sin(x)
, a name
or symbol
represents a variable/symbol like x
, a numeric
, character
, etc. represents literal values like 1
, "a"
, and finally eval()
evaluates such a language object in a specific context.
The evaluation context matters when we evaluate an expression containing symbols that are not self-contained. Consider the following example.
expr <- quote(sin(x))
eval(expr)
Error in eval(expr, envir, enclos): object 'x' not found
x
is not found because in the evaluation environment, there is no value assigned to symbol x
. If we assign some value to x
now,
x <- 0
eval(expr)
[1] 0
and evaluate the expression again, x
can be found and the value can be successfully calculated. More specifically, expr
is evaluated in the global environment. Once x
is given a value in this environment, the expression can be evaluated.
We can create our own environment for an expression to evaluate using new.env()
.
env <- new.env()
env$y <- 0
expr <- quote(cos(y))
expr
cos(y)
If we evaluate cos(y)
directly in the global environment, it should produce an error.
eval(expr)
Error in eval(expr, envir, enclos): object 'y' not found
If we evaluate it in env
where y
is properly defined, then it should produce the right result.
eval(expr, env)
[1] 1
We can also create two environments, one being the parent of the other, and evaluate the expression.
env1 <- new.env()
env1$x <- 1
env2 <- new.env(parent = env1)
env2$y <- 2
Now we have two environments, env1
and env2
. env2
's parent is env1
, and env1
's parent is global environment. This means that when eval()
evaluates an expression in the context of env2
and encounters a symbol that is not defined in it, it will go to env1
, and then global environment, package environments, and base environment. Now we evaluate a simple arithmetic expression containing symbols in these two environments.
expr <- quote(x + y)
eval(expr, env1)
Error in eval(expr, envir, enclos): object 'y' not found
eval(expr, env2)
[1] 3
In fact, eval(expr, envir, enclos)
basically follows the following logic to evaluate a quoted expression:
- If
envir
is anenvironment
, then evaluateexpr
inenvir
by looking for symbols all the way alongenvir
and its parent environments until found. - If
envir
is alist
, then evaluateexpr
given the symbols defined in the list; Whenever a symbol is not found in the list, the function will go toenclos
environment to find along the chain until found. - If a symbol is not found until the empty environment (the only environment having no parent) is reached, an error occurs.
This logic has some notable "strange" behaviors. For example,
env3 <- new.env()
eval(quote(x <- 1), env3)
ls.str(env3)
x : num 1
The assignment works as expected. However, if we supply a named list of values to serve the evaluation, and specify env3
as the enclosing environment, then the assignment does not work as some might expect.
eval(quote(y <- p), list(p = 1), env3)
ls.str(env3)
x : num 1
It is understandable because list(p = 1)
provides a set of symbols that are given values. If the symbols in expr
are not defined in the list, then it should go to the enclosing environment and its parents to see if the symbol exists. Therefore the assignment does not happen in env3
at all. Only symbol lookup happens there.
Meta-programming allows a function to interpret its arguments in its own way. For example, we can write a slice
function that perform easy subsetting with a vector using non-standard evaluation.
slice <- function(x, s) {
s <- substitute(s)
x[eval(s, list(. = length(x)))]
}
substitute(s)
prevents s
from being evaluated but substitute the input value by its expression. Then we can get a call
or a name
so that we can manipulate it.
slice()
does nothing special but evaluates argument s
in a non-standard way: s
is evaluated with a specially defined symbol whose value is the length of x
. Therefore we can use it to easily slice a vector like
slice(1:10, 1:(.-3))
[1] 1 2 3 4 5 6 7
slice(1:10, c(1, .))
[1] 1 10
However, slice()
does not work correctly in the following example:
local({
p <- 3
slice(1:10, c(1,.,p))
})
Error in eval(expr, envir, enclos): object 'p' not found
In this case, p
is not found because c(1,.,p)
is not evaluated in the calling environment but the function environment whose parent is the environment where the function is defined (i.e. global environment). We need to modify slice()
to always evaluate the expression in the environment where it is created.
slice <- function(x, s) {
s <- substitute(s)
x[eval(s, list(. = length(x)), parent.frame())]
}
The enclosing environment is set to parent.frame()
to refer to the calling environment which, in this case, is exactly the context where the expression is fully meaningful.
local({
p <- 3
slice(1:10, c(1,.,p))
})
[1] 1 10 3
Using non-standard evaluation, you have to be careful. The above shows the first point: Evaluate the expression in a context where the expression is fully meaningful. In R, you need to take care of the environments to ensure the symbol search path is correct. To do that, you need to be aware of the scope of the evaluation context.
Another point I want to stress is that a danger of non-standard evaluation is potential clash of symbol interpretation. If two functions both use non-standard evaluation to facilitate some kind of tasks, they might clash on interpreting certain symbols.
For example, functions in rlist package and magrittr package use some non-standard evaluation to make things easy. In the following cases, they two might clash.
library("rlist")
library("magrittr")
data <- list("a","b","b","c","b","a")
list.table(data, .)
.
a b c
2 3 1
data %>% list.table(.)
Error in table(useNA = "ifany"): nothing to tabulate
In this case, .
is interpreted by list.table()
as the current element in iteration in data
, that is, it tries to create a table from data
directly by its element value. However, %>%
interprets .
differently: it understands .
as the demand that the user wants to pipe data
to .
as an argument of list.table
. Therefore, %>%
basically evaluates list.table(data)
rather than list.table(data, .)
. I call this behavior an interpretation clash, which might result in unexpected error.
Another example can be reproduced using dplyr and the latest release of magrittr. In the latest release of %>%
, it creates a chaining function easily by starting from .
. For example,
sapply(1:3, . %>% seq_len %>% sum)
[1] 1 3 6
. %>% seq_len %>% sum
actually creates a function. It works by giving .
a special behavior: if .
appears as the start of the pipeline, then a functional sequence should be created. This largely facilitates creating such functions in many cases. However, its risk is interpretation clash when such magic is used with other functions giving the same symbol different behaviors. For example,
library("dplyr")
mtcars %>%
group_by(vs) %>%
do(. %>%
arrange(desc(mpg)) %>%
head(3))
Error: Results are not data frames at positions: 1, 2
In this case do()
works with .
representing each group data frame. User might want to arrange each group by mpg
in descending order and take the top 3 records and finally get a combined data frame. However, .
encounters an interpretation clash: do()
gives .
a special meaning but %>%
understands .
differently and creates a functional sequence which is not expected by do()
.
mtcars %>%
group_by(vs) %>%
do(head(arrange(., desc(mpg)), 3))
Source: local data frame [6 x 11]
Groups: vs
mpg cyl disp hp drat wt qsec vs am gear carb
1 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
2 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
3 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
4 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
5 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
6 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
My conclusion is simple: non-standard evaluation can be magic, but be careful when you use it. It may produce unexpected errors because the evaluation context is wrong, or the interpretation of a symbol is inconsistent.