Andrey Listopadov

Iterator-based transducers in Lua

@programming lua ~19 minutes read

Early on in my career, when I saw a function that returns an anonymous function, I felt a weird mix of emotions. It was just a weird thing to do, especially when you’re coming from C, where there are no anonymous functions. Such concepts as lambdas in C++ were hard to grasp, and I’m thankful that instead of continuing to hit my head against the C++ wall I picked Scheme, and it helped me understand many core concepts, such as higher-order functions that we’ll be mainly talking about in this post. Today, when I see functions returning functions I understand what happens and feel comfortable. But not so long ago, functions returning functions returning functions were still alienating to me, at least a bit.

After I tackled transducers and tried to explain them in a post, I became much more comfortable with all these stacked-up anonymous functions. Clojure, actually spices things up with arity overloading, so understanding transducers is even harder there, as different arities serve as different stages.

In this post, I would like to share with you some tricks with Lua iterators. I already have a post on differences between iterators and lazy sequences, so to keep things brief, here’s a short explanation of how iterators work in Lua.

  • Iterators in Lua mostly are stateless functions that accept an object, let’s call it iterable, and a key, and somehow produce the next key and a value associated with it, if any.
  • Stateful iterators usually skip the key part and work with the iterable object alone, keeping the state somewhere in a closure, or inside the object itself.

So an iterator in Lua is actually a set of values - the function to produce the next key, an object to iterate on, and some initial value. But you don’t have to write this manually, although you can, Lua has functions that produce iterators for a given object. For example, pairs is such a function, it returns the function next (the actual iterator), the iterable object, and initial key to start the iteration.

Working with such iterators is done with the for loop, by providing it an in keyword. When you say for k,v in pairs(obj) do ... end in Lua, pairs returns the next function, the obj and the first key in obj to the for loop, and subsequently calls next(obj, k) for each k. And, you can actually say for k,v in next,obj,nil do ... end and it will work the same way because that’s what pairs will return.

Now that we’ve covered iterators, let’s play with them!

What I like about Clojure’s transducers, is that it is possible to create a data pipeline, abstract from how the data comes to you and where you should put it to. A transducer is first given the instructions, on how to handle incoming data, then it is given the means how to put it somewhere. We’ll be skipping the how part here because in Lua there’s no standard interface for how to put things into an object. Depending on the data, you’ll either use table.insert, set the value with the obj[key]=value syntax, or use a method provided by the object. So instead, we’ll reuse the for loop here and will rely on it’s main interface, as it is the unambiguous way to iterate in Lua.

First, let’s create a mapping transducer:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
function mapping(f)
    local f1 = function (...)
        if nil ~= ... then
            return f(...)
        end
    end
    return function (iterator)
        return function (...)
            local wrap = function (next1, ...)
                local step = function (iterable, ...)
                    return f1(next1(iterable, ...))
                end
                return step, ...
            end
            return wrap(iterator(...))
        end
    end
end

Let’s ignore the f1 definition for now, and move to the first return statement straight ahead. After calling mapping with some function, the result of this code will itself be a function that accepts an iterator:

 6
 7
 8
 9
10
11
12
13
14
15
16
    return function (iterator)
        return function (...)
            local wrap = function (next1, ...)
                local step = function (iterable, ...)
                    return f1(next1(iterable, ...))
                end
                return step, ...
            end
            return wrap(iterator(...))
        end
    end

This function itself returns an anonymous function that will wrap the iterator and sneakily replace its next function with our own step function. Our step function isn’t magic though, it still relies on the next function from the iterator that we pass into wrap as next1 to avoid a possible name clash. You’ll never know when it hits you. The rest of the values stored as ... from the iterator are returned as is.

As you can see, we’re down three anonymous functions already. Well, wrap isn’t really anonymous, but it’s still manipulating other functions. Let’s try this transducer quickly:

inc = function (k, x) return k, x + 1 end
incrementing = mapping(inc)
incrementing_pairs = incrementing(pairs)

for _,x in incrementing_pairs {0,1,2} do
    print(x)
end
-- 1
-- 2
-- 3

Let’s break this down again.

Here, I’ve created the inc function, which accepts a key, and a value, returns the key as is, and increments the value by 1. Returning the key is necessary because this function will basically become an iterator later on. We pass it to mapping to create a wrapper function for an iterator-producing function, and we call it incrementing, because it’s what this function really does - it is incrementing every value it receives, but it doesn’t yet know how to get them.

This is decided when we pass pairs to incrementing, deciding that our function will get all key-value pairs from an object, and map inc over them. Alternatively, we could have passed ipairs and only map inc over the sequential part of the table.

Finally, we call incrementing_pairs with a table, in a for in loop, and it prints the values from the table incremented by 1. Alternatively, this can be written as:

for _,x in mapping(function (k,x) return k,x+1 end)(pairs)({0,1,2}) do
    print(x)
end

A bit more cramped, but readable.

And, basically, that’s it - now we can write more transducers. But before we do, let’s go back a bit, and understand why I had to wrap f in f1 right at the beginning of the mapping function:

2
3
4
5
6
    local f1 = function (...)
        if nil ~= ... then
            return f(...)
        end
    end

This function is called on each iteration, and iterators terminate the iteration by returning nil as a key without a value. So if we got nil as the first value of the ..., it means that the iteration is ended, and there’s no need to call f anymore. We can still call f, but it would mean that f will have to check if it is the end of the iteration, which isn’t great.

By the way, this works as a general interface, so we can use mapping with other iterators, like string.gmatch:

words = function (s) return s:gmatch("%w+") end
upcasing = mapping(string.upper)
upcased_words = upcasing(words)

-- one-liner: mapping(string.upper)(function (s) return s:gmatch("%w+") end)("so loud")
for word in upcased_words "so loud" do
    print(word)
end
-- SO
-- LOUD

Another interesting property of this approach is that we can compose different functions into a complete data pipeline, abstracting away how they interact:

upcasing = mapping(string.upper)
reversing = mapping(string.reverse)

for s in upcasing(reversing(words))("foo bar") do
    print(s)
end
-- OOF
-- RAB

We can write a helper function, that will compose several functions into a single one, like this f = compose(a, b, c); f(42) produces the same result as a(b(c(42))):

function compose(f, ...)
    local comp = f
    for i=1,select("#", ...) do
        local f,g = comp,select(i, ...)
        comp = function (...) return f(g(...)) end
    end
    return comp
end

It can be helpful when we want to create a data pipeline quickly and won’t be subject to too much chaining:

for s in compose(mapping(string.upper),mapping(string.reverse))(words)("foo bar") do
    print(s)
end

Now, let’s write a filtering transducer:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
function filtering(f)
    local f1 = function (recur, iterable, ...)
        if nil ~= ... then
            if f(...) then
                return ...
            else
                return recur(iterable, ...)
    end end end
    return function (iterator)
        return function (...)
            return (function (next1, ...)
                local function step (iterable, ...)
                    return f1(step, iterable, next1(iterable, ...))
                end
                return step, ...
            end)(iterator(...))
end end end

I hope you’ll forgive my a bit unconventional formatting here, it’s just much easier to read this way for me. I’m putting all end on the same line and vertically aligning them with all indented blocks, so it’s more like Python - when the indentation changes, the scope usually does too. Mainly, it saves so much vertical space, and this article is already long. Anyway, This function is even more complicated, so let’s break it down too.

First, you’ll notice that the f1 function is different. Again, we’ll ignore it for the time being, and move straight to the function that wraps the iterator:

 9
10
11
12
13
14
15
16
17
    return function (iterator)
        return function (...)
            return (function (next1, ...)
                    local function step (iterable, ...)
                        return f1(step, iterable, next1(iterable, ...))
                    end
                    return step, ...
            end)(iterator(...))
    end end

Same as before, it returns a function, wrapping a call to iterator to swap its next function with our step. Although, this time around I instead use an IIFE to avoid introducing an explicit wrapper. The step function, however, must be a named one, e.g. not local step = function () ... but local function step () ..., as it passes itself to f1 along with the object we iterate on, as well as everything from next1. We’ll get to that shortly.

Same as with mapping, we can use filtering with a function, but now we don’t actually need to return a key. This is because this time f1 check if f returns a truthy value, and if we were to return a key, it would count as truth:

is_even = function (_,v) return v%2==0 end
for _,v in filtering(is_even)(pairs)({0,1,2,3,4}) do
    print(v)
end
-- 0
-- 2
-- 4

As before, we can use compose to create a data pipeline that will use other transducers, like mapping:

for _,v in compose(mapping(inc),filtering(is_even))(pairs)({0,1,2,3,4}) do
  print (v)
end
-- 1
-- 3
-- 5

But wait, what’s the big idea? Right now it looks much more complicated than just writing map and filter, which is like half the amount of code, even without packing end on the same line, and is_even and inc don’t have to deal with keys:

function map(f, tbl)
    res = {}
    for k,v in pairs(tbl) do
        res[k]=f(v)
    end
    return res
end

function filter(pred, tbl)
    res = {}
    for k,v in pairs(tbl) do
        if pred(v) then
            res[k] = v
    end end
    return res
end

function inc (x) return x+1 end
function is_even (x) return x%2==0 end

for _,v in pairs(map(inc, filter(is_even, {0,1,2,3,4}))) do
    print(v)
end
-- 1
-- 3
-- 5

But there’s a big difference with this approach. First, this code isn’t as general, as both map and filter don’t know how to iterate things other than tables. They also hard-code pairs as part of their process, so we will always iterate through all keys no matter what our goal is. Finally, this style produces intermediate tables on each step: first, filter produces a table with even numbers, then map consumes this table and produces an entirely new table with altered values. Which wastes memory depending on the overall size of the data and gives GC extra work.

Transducers fix all these issues - both mapping and filtering are abstracted from how we do the iteration itself. And instead of producing intermediate tables, everything is done on a per-element level. Here’s how we can visualize:

function inc (k,x) print("inc "..x) return k,x+1 end
function is_even (_,x) print(x.." even?", x%2==0) return x%2==0 end

for _,v in compose(mapping(inc),filtering(is_even))(pairs)({0,1,2,3,4}) do
  print("result", v)
end
-- 0 even?  true
-- inc      0
-- result   1
-- 1 even?  false
-- 2 even?  true
-- inc      2
-- result   3
-- 3 even?  false
-- 4 even?  true
-- inc      4
-- result   5

As can be seen, if the x is even, we pass it to inc, otherwise, we get another one and check it. The “Transducers” article by Fasih Khatib has a nice visualization for this process:

Figure 1: Chaining

Figure 1: Chaining

Figure 2: Transducing

Figure 2: Transducing

Except, in our case, some values are simply dropped during the filtering step. In order to understand how this works, let’s look closer at f1 from filtering:

2
3
4
5
6
7
8
    local f1 = function (recur, iterable, ...)
        if nil ~= ... then
            if f(...) then
                return ...
            else
                return recur(iterable, ...)
    end end end

As I mentioned, step passes itself to f1, along with the iterable, and values from next1. This function mostly ignores the first two arguments and mainly works with .... But if f returned false or nil, we need to somehow continue the iteration until f returns a truthful value, or until the iterable is exhausted. To achieve that, we recursively call back the step function, called recur here, and it produces a new value and calls f1 again. Since both are tail calls, everything is optimized into a loop.

I can prove that by creating an iterator that never exhausts, and wrapping it with filtering:

function ones () return function () return 1,1 end end
for _ in filtering(is_even)(ones)() do
    break
end
-- even? 1 false
-- even? 1 false
-- ...

This will never finish, but also won’t consume stack or memory. ones simply returns 1,1 every time it is invoked, but filtering looks for even values, so we’ll never reach the body of the loop. And that’s another benefit of transducers - same as iterators, transducers remain lazy, and can work on infinite sets of data.

Finally, let’s write a stateful transducer, such as taking. It’s going to terminate the iteration earlier, after n elements were read:

function taking(n)
    local count = 0
    return function (iterator)
        return function (...)
            return (function (next1, ...)
                return function (iterable, ...)
                    return (function (...)
                        if nil ~= ... and count < n then
                            count = count + 1
                            return ...
                    end end)(next1(iterable, ...))
                end, ...
            end)(iterator(...))
end end end

It may seem easy - all we do is that instead of accepting a filtering function, we accept a number, and check against it. Otherwise, it’s exactly the same principle as filtering, but we don’t need the recursion:

for k,v in taking(4)(pairs)({a=1,b=2,c=3}) do print(k,v) end
-- all values are printed as the table has < 4 keys
-- b 2
-- a 1
-- c 2
for _,v in taking(4)(pairs)({1,2,3,4,5,6,7}) do print(v) end
-- only first 4 values are printed
-- 1
-- 2
-- 3
-- 4

So, in the following example, we create a transducer that should take at most four elements from a collection:

four_pairs = taking(4)(pairs)

What will be printed if we use it here?

for k,v in four_pairs({a=1,b=2,c=3}) do print(k,v) end
for _,v in four_pairs({1,2,3,4,5,6,7}) do print(v) end
Results
-- for k,v in four_pairs({a=1,b=2,c=3}) do print(k,v) end
-- b 2
-- a 1
-- c 2
-- for _,v in four_pairs({1,2,3,4,5,6,7}) do print(v) end
-- 1

People who are familiar with Rich Hickey’s talk Inside Transducers, already know what’s the issue here, and why the result is different from the ones where we created transducers in each for loop. All stateful transducers must create their state anew every time they’re finalized. In our case, the finalization happens when we’re passing a collection to the resulting iterator. So in order to fix this we need to move where we’re initializing the count variable:

--- orig taking
+++ fixd taking
@@ -1,9 +1,9 @@
 function taking(n)
-    local count = 0
     return function (iterator)
         return function (...)
             return (function (next1, ...)
+                    local count = 0
                     return function (iterable, ...)
                         return (function (...)
                                 if nil ~= ... and count < n then
                                     count = count + 1

It should be done as late as possible, right before we’re returning our next function. And because this iterator is stateful, we can’t share the next function it returns with others, unlike the one we get from pairs. Now it will work as we’d expect:

four_pairs = taking(4)(pairs)
for k,v in four_pairs({a=1,b=2,c=3}) do print(k,v) end
-- b 2
-- a 1
-- c 2
for _,v in four_pairs({1,2,3,4,5,6,7}) do print(v) end
-- 1
-- 2
-- 3
-- 4

As our final step in our descendant to functional madness, let’s write a paritioning transducer, that will group individual elements into groups of n elements. It’s important to look at it, as it has a problem we haven’t encountered yet. Also, remember when I said to you that the step function must be a named one? Well, I lied:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
function partitioning (n)
    return function (iterator)
        return function (...)
            return (
                function (next1, ...)
                    local count, arr = 0, {}
                    return (function (f)
                        return (function (i) return i(i) end)(
                            function (i)
                                return f(function (x) return i(i)(x) end)
                            end)
                    end)(function (step)
                        return function(iterable, ...)
                            return (function (recur, iterable, ...)
                                if nil ~= ... then
                                    if n == count then
                                        local arr1 = arr
                                        arr1.n = count
                                        arr, count = {(...)}, 1
                                        return arr1
                                    else
                                        count = count + 1
                                        arr[count] = ...
                                        return recur(iterable, ...)
                                    end
                                elseif count ~= 0 then
                                    arr.n,count = count,0
                                    return arr
                            end end)(step, iterable, next1(iterable, ...))
                    end end), ...
                end)(iterator(...))
end end end

That’s a lot to take in. First of all, there’s no real reason to make this as complicated as it is besides proving the point that I’m proficient with functions returning functions. That thing in line 7 is a Y combinator - I used it here to move the explicit creation of the named step function into its argument and hence made it recursive without really defining it as a named function. So, you can do that in Lua, but it’s here only for fun. We can see that it works by writing another transducer that will use table.concat to pretty-print the tables:

function table_formatter (t)
    return "{"..table.concat(t, ", ").."}"
end

for s in compose(mapping(table_formatter),
                 partitioning(3))(function (s) return s:gmatch(".") end)("foobar") do
  print(s)
end
-- {f, o, o}
-- {b, a, r}

However, as I mentioned, this transducer has a problem that no other transducers had up to this point. Let’s try using it with ordinary pairs:

for k,v in partitioning(3)(pairs)({"a","b","c","d","e"}) do
    print(k,table.concat(v, " "))
end
-- table: 0x555ff86717a0        nil
-- invalid key to 'next'
-- stack traceback:
--      [C]: in function 'next'

After we get the first partition, we immediately get an error. This happens because next, the function we internally use to get the next item from the iterable object has kind of a feedback loop. It uses the result of its last call to continue iteration. Observe:

next({"a","b","c"})
-- 1, "a"
next({"a","b","c"}, 1)
-- 2, "b"
next({"a","b","c"}, "x")
-- invalid key to 'next'
-- stack traceback:
--         [C]: in function 'next'

So, when our iterator returns the table, it is passed to next as the key, and because next can’t find this key in the table, it throws an error. This is also the reason, why we have to return keys unaltered from any function we pass to mapping. This is one major difference from Clojure because there we’re saying how to deal with that by finalizing the transducer with the reducing function. But here we don’t have any reducing function.

We could try to work around this, but there’s another glaring issue here. Look at how we’re adding values to the partition:

21
22
23
24
25
                            else
                              count = count + 1
                              arr[count] = ...
                              return recur(iterable, ...)
                            end

Even if we were to somehow store the last key, and pass it to the next call we still store values in the array in a completely wrong way. This is where Lua’s open iterator interface falls short. Because there’s no guarantee that the first argument is a key and not actual da4a, we can’t just return the first argument, and put everything else into the array. Moreover, if the iterator returns not a key-value pair, but more values, we don’t know how to put these either.

The solution?

Is of course more functions! We’re going to tell partitioning transducer upfront how it should handle the results. This, of course, limits the generality of our system, but I think this is a testament to how well Clojure is designed. Transducers were added to it basically without any internal change.

Here’s the fixed version:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
function partitioning (n, collect)
    return function (iterator)
        return function (...)
            return (
                function (next1, ...)
                    local count, arr, last = 0, {}
                    return (function (f, ...)
                        return (function (i, ...) return i(i, ...) end)(
                            function (i, ...)
                                return f(function (x, ...)
                                            return i(i, ...)(x, ...)
                                         end, ...)
                            end)
                    end)(function (step)
                        return function(iterable, ...)
                            return (function (recur, iterable, ...)
                                if nil ~= ... then
                                    if n-1 == count then
                                        local arr1, count1 = arr, count
                                        arr, count = {}, 0
                                        return (function (...)
                                                   last = ...
                                                   return ...
                                                end)(collect(arr1, count1+1, ...))
                                    else
                                        count = count + 1
                                        return recur(iterable,
                                                     (function (...)
                                                          last = ...
                                                          return ...
                                                     end)(collect(arr, count, ...)))
                                    end
                                elseif count ~= 0 then
                                    count = 0
                                    return last, arr
                            end end)(step, iterable, next1(iterable, ...))
                    end end), ...
                end)(iterator(...))
end end end

I’ve “fixed” the Y combinator to propagate all of its arguments into all of its functions, instead of just one, and added the last key storage, so we could properly terminate the iteration, and return the array. If you count just the keyword function in this piece of code, it appears a whopping 13 times! It’s crazy, yes even by my standards, I would never give such code a pass, but it’s nice to fool around and see how far can we push function manipulation style. Can I remove the storage, and put it into the function’s arguments during anonymous recursion done with the Y combinator? Eh, probably. Will I do it? Nah.

What matters is that this code now works with pairs:

function collector(arr, n, k, v)
    arr[n]=v
    return k, arr
end

partition_three_pairs=partitioning(3,collector)(pairs)

for _,v in partition_three_pairs({"a","b","c","d","e"}) do
    print(table_formatter(v))
end
-- {a, b, c}
-- {d, e}

for _,v in partition_three_pairs({a=1,b=2,c=3,d=4,e=5}) do
    print(table_formatter(v))
end
-- {1, 2, 5}
-- {3, 4}

It also correctly preserves the state, preventing it from leaking when we save the transducer finalizing it with paris. The same transducer works with the single-argument stateful iterator, provided a different collector:

function string_collector (arr, n, s)
    arr[n]=s
    return arr
end

function chars (s) return s:gmatch(".") end

for t in partitioning(4,string_collector)(chars)("foobar") do
    print(table_formatter(t))
end
-- {f, o, o, b}
-- {a, r}

With this, I feel that my functional desires are fulfilled. For now, at least.

I know, this article was a wall of text and code, but it is mostly because I was really passionate about the topic, and wanted to get it out of my system, so I could continue working on the game3. If you’ve made it til here, and you have questions about the code, either because my explanations were unclear, or simply because there’s not enough of them, feel free to contact me, and I will try to answer, and update the post.

If instead you looked at the code and thought something like “well that escalated quickly”, or that I’m just a crazy person, doing over-complicated things - well that’s a valid point of view as well! I wouldn’t say that these transducers are practical, and their runtime characteristics and impact on the Lua runtime performance is unclear to me. I mean, when I talked about how map and filter produce intermediate collections that are thrown away once we’re done, and how it is bad, having a function wrapping another 12 nested function wrappers with a Y combinator in the middle isn’t much better. But still, I feel great just because I managed to make it work, and it isn’t completely insane, as all of the principles are pretty simple. Functional decorators existed for a long time, and these transducers are basically it.

Anyhow, I hope you’ve found this interesting. Thanks for reading!