Implementing dynamic scope for Fennel and Lua

I’m continuing my work on fennel-cljlib, my port of clojure.core and some other core libraries, focusing on porting missing functions and features to it. One such feature, which I sometimes miss in Lua and Fennel, is dynamic binding.

The Lua VM doesn’t provide dynamic scoping as a language feature, and Fennel itself doesn’t introduce any concepts like Clojure’s Var. However, we can still implement dynamic scoping that works similarly to Clojure and other Lisps using the debug library. Most of the ideas are based on information from the guides at leafo.net. There’s even a “Dynamic scoping in Lua” guide that implements a slightly different version of this feature, requiring variables to be referenced by name via a dynamic("var_name") call. While this approach is feasible, I wanted something more in line with how other Lisps work, so let’s explore taking it further. Luckily for us, Leafo already has all the necessary guides!

But first things first. I wanted to delay working on dynamic scoping as much as possible because it is a feature that’s hard to get right. I already have some experience with implementing dynamic scoping for one of my older libraries that implemented a condition system from Common Lisp in Fennel. This library, however, required special syntax to access all of the dynamically bound symbols and thus did not actually require anything fancy for it to work.

So what does dynamic binding/scoping mean in a language? If you know about lexical and dynamic scoping and wish to skip this tangent, feel free to do so.

In short, a lexically scoped variable exists only within the region of code where it is defined. For example, a variable defined in a block of code will only exist in that block, because that block is its lexical scope.

When working with languages that have higher-order functions, I often find myself in a situation where I want to refactor some code that uses anonymous functions by moving them out and giving them a name. Sometimes it’s possible; sometimes it’s not.

For example, imagine I wanted to move out this function from map:

(fn some-func [messages]
  (let [extra-data (other-func)]
    (map
     (fn [message]
       (do-stuff message extra-data))
     messages)))

If I were to do so, we would have a problem:

(fn process-message [message]
  ;; oops, extra-data is now an unknown variable
  (do-stuff message extra-data))

(fn some-func [messages]
  (let [extra-data (other-func)]
    (map process-message messages)))

So extra-data is bound lexically, and thus if we look at the lexical scope of the process-message function, we’ll see that it tries to use extra-data while it’s not defined there. If extra-data were a global variable, it wouldn’t be problematic, but it is a local variable with a lexical scope. We could move the entire let that binds extra-data to the result of calling other-func into process-message, but let’s say we don’t want to call other-func on each iteration of map because it’s slow and will do the same work repeatedly. So, what are our options here?

Well, we can make it a closure!

(fn make-message-processor [extra-data]
  (fn process-message [message]
    (do-stuff message extra-data)))

(fn some-func [messages]
  (let [extra-data (other-func)]
    (map (make-message-processor extra-data) messages)))

Now, we pass extra-data only once to the make-message-processor function, and it returns a function that has this variable stored in a closure. However, this is still a lexical scope because, as you can see, extra-data is present there.

In a language that uses dynamic scoping, this could be a whole different story. Let’s look at Clojure. I wouldn’t recommend writing it like this, but it is possible¹:

(defn process-message [message]
  (do-stuff message extra-data))

(defn some-func [messages]
  (binding [extra-data (other-func)]
    (mapv process-message messages)))

Here, I assume that extra-data is a dynamic variable that obeys the rules of dynamic scoping. The binding call introduces a dynamic scope within which extra-data is set to the value of (other-func). It acts more like a scoped global variable, or at least you can think of it that way.

To introduce a dynamic scope, Clojure uses binding. Let’s look at it briefly:

(defmacro binding
  "binding => var-symbol init-expr

  Creates new bindings for the (already-existing) vars, with the
  supplied initial values, executes the exprs in an implicit do, then
  re-establishes the bindings that existed before.  The new bindings
  are made in parallel (unlike let); all init-exprs are evaluated
  before the vars are bound to their new values."
  {:added "1.0"}
  [bindings & body]
  (assert-args
    (vector? bindings) "a vector for its binding"
    (even? (count bindings)) "an even number of forms in binding vector")
  (let [var-ize (fn [var-vals]
                  (loop [ret [] vvs (seq var-vals)]
                    (if vvs
                      (recur  (conj (conj ret `(var ~(first vvs))) (second vvs))
                             (next (next vvs)))
                      (seq ret))))]
    `(let []
       (push-thread-bindings (hash-map ~@(var-ize bindings)))
       (try
         ~@body
         (finally
           (pop-thread-bindings))))))

It’s a simple idea: a try block with no catch clauses, only a finally. Before we enter the try block, we push the new values for all of the mentioned vars, and once the body is done, whether it returned or threw, finally pops them, restoring the previous values.

In Clojure, dynamic bindings work really well, but this is due to a combination of factors. First, the try support in the JVM is excellent, ensuring that finally will perform its intended function. Additionally, the JVM supports thread-local bindings, so even in a multithreaded context, binding still works. Finally, heh, Clojure has Vars, which makes it all possible.

Now, let’s try to implement the same concept in Fennel!

Implementing dynamic scope in Fennel

Before I descend into madness, I would say that we could do the same thing as in Clojure: set some variables, run code in a protected call, and reset the variables afterward. While this approach would indeed work, I wanted to tidy up my understanding of function environments in Lua. It’s a neat concept that Lua and a few other languages have, but Lua is one of the few languages that actually allows users to manipulate function environments. So, let’s explore this idea.

First, we need a way to forcefully set a function’s environment. In Lua 5.1 this could be done via setfenv; however, it was removed in Lua 5.2. For newer versions, it can be reimplemented with the debug library like this:

(local setfenv
  (or _G.setfenv
      (fn setfenv [f env i]
        (let [i (or i 1)]
          (case (debug.getupvalue f i)
            :_ENV (doto f (debug.upvaluejoin i (fn [] env) 1))
            nil f
            _ (setfenv f env (+ i 1)))))))

Now, we can set an environment for any function.

But what exactly is this function environment? I’ve realized that I never explained that, so here we go.

In Lua, the environment is a table² that stores global variables, keyed by name. Starting from Lua 5.2, the environment is represented by a variable called _ENV, which is what we’re testing for in setfenv above, and every access to a global name is translated to a lookup in _ENV. By default, _ENV has the same value as _G, a table that contains all global variables. However, we can change the function’s environment by changing the value of _ENV.

For instance, in Lua, we can set _ENV to a table, and all global definitions would end up in that table:

a = 0

local function f (t)
    local _ENV = t -- local, so the chunk's own _ENV is left untouched
    a = 42
    b = 322
    return t
end

local t = {}

f(t)

print(a, b) -- 0, nil
print(t.a, t.b) -- 42, 322

This is a cool feature, and we can actually use environments for sandboxing code, but that’s a story for another time. Let’s return to dynamic scoping.

Looking at this, you might get the idea that if we change the function’s environment and set our dynamic variables in it specifically, once we leave the lexical scope of that _ENV, all changes revert to normal because they never happened in a global environment!

Unfortunately, it’s not that simple. Yes, we can change the function’s environment, but it will only affect that specific function. Moreover, this change is permanent, meaning that we’ll have to reset the function back to its original environment. So, it’s not as straightforward as just changing _ENV around the code we want to run. Of course, we could write getfenv, then wrap the entire thing in a pcall, and safely restore the environment once the work is done.

However, setting the environment of just the function we’re calling isn’t enough. _ENV is stored as an upvalue of each closure, so we’d also need to change the environment of every function called by the function we wish to invoke with a custom environment. This makes undoing changes trickier to implement.
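To make the last two paragraphs more concrete, here is a minimal sketch for Lua 5.2 and later of temporarily swapping a function’s environment and restoring it after a protected call. The helpers find_env and with_env, as well as the greeting global, are hypothetical names made up for this illustration. It also shows the limitation described above: the function we call sees the new environment, but the function it calls does not.

```lua
-- Sketch for Lua 5.2+; `find_env` and `with_env` are hypothetical helpers.

-- Returns the index of the _ENV upvalue of `f`, or nil if `f` has none.
local function find_env(f)
  local i = 1
  while true do
    local name = debug.getupvalue(f, i)
    if name == nil then return nil end
    if name == "_ENV" then return i end
    i = i + 1
  end
end

-- Calls `f` with `env` as its environment, then restores the original one.
local function with_env(f, env, ...)
  local i = find_env(f)
  if not i then return f(...) end     -- `f` doesn't touch globals at all
  local slot                          -- gives `keeper` exactly one upvalue
  local keeper = function() return slot end
  debug.upvaluejoin(keeper, 1, f, i)  -- remember the original _ENV upvalue
  debug.upvaluejoin(f, i, function() return env end, 1)
  local results = table.pack(pcall(f, ...))
  debug.upvaluejoin(f, i, keeper, 1)  -- restore, even if `f` failed
  if not results[1] then error(results[2], 0) end
  return table.unpack(results, 2, results.n)
end

greeting = "hello"

local function inner() return greeting end
local function outer() return greeting, inner() end

local env = setmetatable({greeting = "bonjour"}, {__index = _G})
print(with_env(outer, env)) -- outer sees "bonjour", inner still sees "hello"
print(outer())              -- both see "hello" again
```

The keeper closure exists only to hold on to the original upvalue, so that restoring rejoins the very same upvalue instead of copying its value.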

Luckily for us, we can bypass the need to roll back the changes to the function’s environment completely! Instead, we can simply clone the function and set its environment as we wish! Here’s an implementation:

(fn clone-function-with-env [f env]
  "Recursively clones the function `f`, and any subsequent functions that it
might call via upvalues.  Sets `env` as environment for the cloned function."
  (let [dumped (string.dump f)
        cloned (load dumped)]
    (var (done? i) (values false 1))
    (while (not done?)
      (case (debug.getupvalue f i)
        (where (name val) (= :function (type val)))
        (let [subf (clone-function-with-env val env)]
          (debug.setupvalue cloned i subf))
        name
        (debug.upvaluejoin cloned i f i)
        nil (set done? true))
      (set i (+ i 1)))
    (setfenv cloned env)))

Finally, we write the function that will call a given function f in a context where the given bindings are dynamically bound:

(fn dynamic-call [bindings f ...]
  "Calls `f` with `bindings` as its root environment."
  (let [new-env (setmetatable bindings {:__index _ENV})
        f* (clone-function-with-env f new-env)]
    (f* ...)))

We can also add a convenience macro for using it like a let, but with dynamic binding:

(macro binding [bindings ...]
  (assert-compile (sequence? bindings) "expected a sequence of bindings" bindings)
  (assert-compile
   (= 0 (% (length bindings) 2))
   "expected an even number of forms in binding sequence"
   bindings)
  `(dynamic-call
    ,(faccumulate [res {} i 1 (length bindings) 2]
       (doto res
         (tset (tostring (. bindings i)) (. bindings (+ i 1)))))
    (fn [] ,...)))

Now we can use dynamic binding in Fennel!
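For readers following along in plain Lua, the two Fennel functions above can be transliterated roughly as follows. This is a sketch for Lua 5.2 and later, not a drop-in library: set_env, clone_with_env, and dynamic_call are names I picked for the illustration, and set_env is just a loop-based stand-in for the setfenv shim shown earlier.

```lua
-- Sketch for Lua 5.2+; all three function names are made up for this example.

-- Points the _ENV upvalue of `f` (if it has one) at `env`.
local function set_env(f, env)
  local i = 1
  while true do
    local name = debug.getupvalue(f, i)
    if name == nil then return f end
    if name == "_ENV" then
      debug.upvaluejoin(f, i, function() return env end, 1)
      return f
    end
    i = i + 1
  end
end

-- Clones `f` and every Lua function reachable through its upvalues,
-- giving each clone `env` as its environment.
local function clone_with_env(f, env)
  local cloned = load(string.dump(f))
  local i = 1
  while true do
    local name, val = debug.getupvalue(f, i)
    if name == nil then break end
    if type(val) == "function" then
      debug.setupvalue(cloned, i, clone_with_env(val, env))
    else
      debug.upvaluejoin(cloned, i, f, i) -- share non-function upvalues
    end
    i = i + 1
  end
  return set_env(cloned, env)
end

-- Calls `f` with `bindings` shadowing the globals of the same name.
local function dynamic_call(bindings, f, ...)
  local env = setmetatable(bindings, {__index = _ENV})
  return clone_with_env(f, env)(...)
end

foo = 21

local function f() return foo end
local function g() return f() end

print(dynamic_call({foo = 42}, g)) -- the cloned f sees foo as 42
print(g())                         -- the original f still sees 21
```

Like the Fennel version, this sketch makes no attempt to detect a function that refers to itself through an upvalue, so a recursive local function would send clone_with_env into endless recursion.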

Usage example

To illustrate, let’s create some variables that we wish to treat dynamically:

(global foo 21)
(global bar 73)

(print foo bar) ;; 21	73

With globals in place, we can try our binding macro:

(binding [foo 42]
  (print foo bar))
;; 42 73

As can be seen, foo no longer refers to 21, but now it is 42. However, if we try to print foo outside of binding’s scope, we will again get 21:

(binding [foo 42]
  (print foo))
;; 42
(print foo)
;; 21

A keen reader would mention that this example is not so different from using a plain let:

(let [foo 42]
  (print foo))
;; 42
(print foo)
;; 21

And you’d be right! However, where it’s going to be different is when we put functions into the mix:

(fn f []
  (print "f:" foo bar))

Now, if we were to call it inside of a let, the bindings introduced by the let wouldn’t affect the function, because neither foo nor bar is lexically present in its body:

(let [foo 42
      bar 1337]
  (f))
;; still prints:
;; f: 21 73

This is where binding comes in. Instead of following the lexical binding rules that are natural for most languages, we now introduce dynamic binding of foo and bar:

(binding [foo 42
          bar 1337]
  (f))
;; prints:
;; f: 42 1337
(f)
;; prints:
;; f: 21 73

And, as can be seen above, inside binding’s scope, f sees foo as 42 and bar as 1337, while outside of it, the values are still 21 and 73. So, we never actually changed the values of foo and bar; they’re still 21 and 73, respectively. Instead, in the scope of binding, we changed how f accesses these variables.

This also works with functions that call other functions:

(fn f []
  (print "f:" foo))

(fn g []
  (f)
  (print "g:" bar))

(fn h []
  ((fn [] (print "h:" foo bar))))

(binding [foo 42
          bar 322]
  (f)
  ;; prints:
  ;; f: 42
  (g)
  ;; prints:
  ;; f: 42
  ;; g: 322
  (h)
  ;; prints:
  ;; h: 42 322
  )

That’s pretty much it! This approach has a lot of flaws though.

First, it only works with ordinary Lua functions: callable tables that rely on the __call metamethod and native (C) functions are not supported.

Second, it won’t work with coroutines either. You can’t use string.dump on something like coroutine.resume directly, so we won’t be able to do (dynamic-call {:foo 42} coroutine.resume some-coroutine). It won’t even work if we wrap coroutine.resume into an anonymous function like (dynamic-call {:foo 42} (fn [] (coroutine.resume coro))), because coro here, while being an upvalue, is not a function, so we can’t clone it.

Finally, it relies on the debug library, and recursive function dumping, which itself is already pretty crazy.

There are probably more things that can go wrong with this.
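The first two limitations come down to what string.dump accepts. A quick check in plain Lua, offered as an illustration rather than an exhaustive test:

```lua
-- string.dump only accepts Lua functions, so cloning stops at C functions.
local ok_native, err = pcall(string.dump, coroutine.resume)
print(ok_native, err) -- ok_native is false, err describes the failure

-- A callable table is not a function either, even with a __call metamethod.
local callable = setmetatable({}, {__call = function() return 1 end})
local ok_table = pcall(string.dump, callable)
print(ok_table)       -- false as well

-- An ordinary Lua function dumps without complaints.
local ok_lua = pcall(string.dump, function() return 1 end)
print(ok_lua)         -- true
```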

So why not just set the globals temporarily?

First of all, yes, we could just set the globals temporarily, and restore their values later:

(fn set-globals [globals]
  (collect [name new-val (pairs globals)]
    (let [old-val (. _G name)]
      (set (. _G name) new-val)
      (values name old-val))))

(fn close-handler [old-vals ok? ...]
  (each [name val (pairs old-vals)]
    (set (. _G name) val))
  (if ok?
      ...
      (error ... 0)))

(fn call-with-temp-globals [globals f ...]
  (-> globals
      set-globals
      (close-handler (pcall f ...))))

(call-with-temp-globals {:foo 123 :bar 456} g)
;; prints:
;; f: 123
;; g: 456
(g)
;; prints:
;; f: 21
;; g: 73

While this works, I don’t like the idea that we’re actually changing the values instead of shadowing them in the environment, though this is more akin to the original Clojure implementation. Since Lua is single-threaded, it should not be problematic. However, I think it can still mess things up if we were to introduce some kind of an asynchronous scheduler, like in my async.fnl library. This also messes up stack traces in cases where an error occurs as a result of calling f, because you can’t re-throw errors in Lua like in other languages. There’s also the option of using the <close> marker that came in Lua 5.4 to avoid pcall altogether:

local function set_globals(globals)
    local old_values = {}
    for name, new_val in pairs(globals) do
        old_values[name] = _G[name]
        _G[name] = new_val
    end
    return old_values
end

local function close_handler(old_values)
    for name, val in pairs(old_values) do
        _G[name] = val
    end
end

local function dynamic_call_close(globals, f, ...)
    local old_values <close> =
        setmetatable(set_globals(globals), {__close = close_handler})
    return f(...)
end

This would keep stack traces intact, and the values would be restored as soon as we exit dynamic_call_close, but it only works in Lua 5.4.
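The scheduler concern mentioned earlier can be illustrated with a bare coroutine. This is a sketch for Lua 5.2 and later (yielding across pcall isn’t possible in stock Lua 5.1); with_globals is a hypothetical Lua counterpart of call-with-temp-globals, not code from async.fnl.

```lua
-- Sketch for Lua 5.2+; `with_globals` is a made-up name for this example.
local function with_globals(globals, f, ...)
  local old = {}
  for name, new_val in pairs(globals) do
    old[name] = {_G[name]} -- boxed, so a previously nil global is restored too
    _G[name] = new_val
  end
  local results = table.pack(pcall(f, ...))
  for name, box in pairs(old) do _G[name] = box[1] end
  if not results[1] then error(results[2], 0) end
  return table.unpack(results, 2, results.n)
end

foo = 21

local task = coroutine.create(function()
  with_globals({foo = 42}, function()
    coroutine.yield() -- hand control back while foo is still rebound
  end)
end)

coroutine.resume(task)
local during = foo -- code outside the task observes the temporary value
coroutine.resume(task)
local after = foo  -- restored only once the task has finished

print(during, after)
```

Any code that runs between the two resume calls sees foo as 42, even though it never entered the dynamic scope itself.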

Thus, while doing it with pcall is more generic, I wanted to explore the environment approach first since Lua already provides a mechanism for working with function environments. But for now, I think I’ll leave dynamic scoping out of cljlib as I’m not really sold on any of the ways of doing it that I’ve come up with so far.

¹ I really don’t like this way of refactoring this code, but sometimes it is exactly what you need. Not this time, though. The approach with closures is what I’d use in this case in Clojure too.
² Everything in Lua is a table, because of course it is.
