Our main benchmark for comparison is based on Normal-order full reduction.
This function beta-reduces a lambda term everywhere, including under binders. In this variant, arguments are substituted into the body of a binder before they are reduced (i.e. the leftmost, outermost reduction happens first) If an expression has a normal form, then normal order evaluation will always find it.
For example: this term
(λx. y) ((λx. x x) (λx. x x))
reduces to
y
because the infinite loop is discarded before reduction.
NOTE: We aren't trying to make reduction fast, just create a lot of calls to substitution to compare our implementations. "Optimal reduction" is a separate problem.
Here is how we might implement full reduction using a named
representation. i.e. where variables are represented by strings and bindings include variable names. This function recurses through terms, underneath binders looking for reduction. It defers to the whnf
to find the leftmost reduction in an application, deferring the reduction of arguments until after they have been substituted (or after it has been determined that they won't participate in a beta-reduction.)
nf :: Exp -> Exp
nf (Var x) = Var x
nf (Lam (Bind x e)) = Lam (Bind x (nf e))
-- recursive call under the binder
nf (App f a) =
-- look for left-most reductions in f
-- i.e. "weak-head normal form"
case whnf f of
-- Found a beta-reduction!
Lam (Bind x b) -> nf (subst x a b)
-- substitution needs to be careful of capture
-- make sure that free vars of a are not captured
-- by bound variables in b
f' -> App (nf f') (nf a)
-- compute weak-head normal form
whnf :: Exp -> Exp
whnf (Var _) = Var x
whnf (Lam b) = Lam b
whnf (App f a) =
case whnf f of
Lam (Bind x b) -> whnf (subst x a b)
f' -> App f' a
Running example of full-reduction:
λw. λx. (λy. w y (λx. w y (λx. w y))) (x x x)
^------------------------------------^
Note that we need to rename two inner x variables to avoid capture.
Using a simple, named representation, this term reduces to:
renamed to avoid capture
v v
λw. λx. w (x x x) (λx1.w (x x x) (λx2. w (x x x)))
------ ------ -------
three instances of term substituted in
With de Bruijn indices, we would represent the same term as
λ. λ. (λ. 2 0 (λ. 3 1 (λ. 4 2))) (0 0 0)
^ ^ ^ ^ ^ ^
indices for w & y depend on nesting level
and reduce it to
shift free vars in substitutee
------- -------
λ. λ. 1 (0 0 0) (λ. 2 (1 1 1) (λ. 3 (2 2 2)))
^ ^ ^
decrement free vars from body
With a locally-nameless representation, we would start with the same closed term (no free vars)
λ. λ. (λ. 2 0 (λ. 3 1 (λ. 4 2))) (0 0 0)
then we would name the outermost variables before finding the reduct
λw. λx. (λ. w 0 (λ. w 1 (λ. w 2))) (x x x)
^--------------------------------^
Now, replace substitutee with bound variable "0"
λw. λx.
w (x x x) (λ.w (x x x) (λ. w (x x x)))
And then close up the term, replacing free variables with bound variables:
λ. λ. (λ. 2 0 (λ. 3 1 (λ. 4 2))) (0 0 0)
Although we get the same result at the end, these three reduction strategies need to do extra work during what kinda looks to be a "linear" traversal of the term.
-
With simple names, the substitution
subst x a b
needs to avoid capture. To do so, it needs to calculate the free variables of the substituteea
and rename any bound variables inb
that conflict. Renaming the bound variables means that we need to find a new name that isn't free ina
, but also isn't free inb
. Furthermore, the set of names free at a certain point insideb
can change as we rename bound variables higher up. -
With de Bruin indices, the substitution needs to not just find the right place for
a
, but it also needs to shift the free variables ina
depending on the binding depth of each new position. Furthermore, each of the free variables inb
need to be decremented because we have now lost a binder.nf :: Exp -> Exp nf (Var x) = Var x nf (Lam (Bind e)) = Lam (Bind (nf e)) -- just go right under the binder nf (App f a) = case whnf f of Lam b -> nf (instantiate b a) f' -> App (nf f') (nf a) -- substitute variable bound at b with a instantiate :: Bind Exp -> Exp -> Exp instantiate = ...
-
With a locally nameless representation, each time the normalization function traverses under a lambda-expression it needs to replace that bound variable (index) with a fresh free variable. This "opening" of the lambda terms means that when we substitute, there are no bound indices that need to be shifted.
nf' :: Exp -> N Exp -- run in a state monad for fresh variable gen nf' (Var x) = return (Var x) nf' (Abs b) = do x <- newVar -- generate a fresh variable from the monad b' <- nf' (instantiate b (Var x)) -- bound variable substitution return $ Abs (close x b') nf' (App f a) = do f' <- whnf f case f' of Abs b -> do nf' (instantiate b a) -- bound variable substitution _ -> App <$> nf' f' <*> nf' a
All three of these implementation strategies, if interpreted naively, are rather slow.
We can benchmark them using Lennart's term from "Lambda Calculus Cooked Four Ways".
This term is the church encoding of:
6! == sum [1 .. 37] + 17
i.e. 720 == 719
i.e. False
Statistics about this normalization
- Number of calls to substitution required: 119,697
- AST depth: 53
- Binding depth: 25
- Average # of variable occurrences in each substitution: 1.15
In the repository, the most vanilla implementations are:
-
Lennart.Simple Augustsson's implementation of the most straightforward named, capture-avoiding substitution. With a small bugfix because no one can implement this algorithm correctly the first time.
-
Lennart.DeBruijn Augustsson's implementation of DeBruijn indices.
-
LocallyNameless.Ott translation of the output of the locally nameless backend of the Ott tool.
(This part is mostly replicating that work, with the addition of the Locally Nameless implementation for comparison. The "fourth" way from the original paper is based on a named representation that makes use of the Barendregt variable conventions. It maintains the invariant that all bound variables are unique.)
All results are available in the repository, but you'll need to look at the talk slides to see the graphs.
results/sixteen.local/ifl_talk/nf_bench.csv
These results were obtained by executing the command make normalize
when Suite.hs
has been directed to benchmark the set ifl_talk
.