Iterator-based transducers in Lua
Early on in my career, when I saw a function that returns an anonymous function, I felt a weird mix of emotions. It was just a weird thing to do, especially when you’re coming from C, where there are no anonymous functions. Such concepts as lambdas in C++ were hard to grasp, and I’m thankful that instead of continuing to hit my head against the C++ wall I picked Scheme, and it helped me understand many core concepts, such as higher-order functions that we’ll be mainly talking about in this post. Today, when I see functions returning functions I understand what happens and feel comfortable. But not so long ago, functions returning functions returning functions were still alienating to me, at least a bit.
After I tackled transducers and tried to explain them in a post, I became much more comfortable with all these stacked-up anonymous functions. Clojure, actually spices things up with arity overloading, so understanding transducers is even harder there, as different arities serve as different stages.
In this post, I would like to share with you some tricks with Lua iterators. I already have a post on differences between iterators and lazy sequences, so to keep things brief, here’s a short explanation of how iterators work in Lua.
- Iterators in Lua mostly are stateless functions that accept an object, let’s call it
iterable
, and akey
, and somehow produce the next key and a value associated with it, if any. - Stateful iterators usually skip the
key
part and work with theiterable
object alone, keeping the state somewhere in a closure, or inside the object itself.
So an iterator in Lua is actually a set of values - the function to produce the next key
, an object to iterate on, and some initial value.
But you don’t have to write this manually, although you can, Lua has functions that produce iterators for a given object.
For example, pairs
is such a function, it returns the function next
(the actual iterator), the iterable
object, and initial key
to start the iteration.
Working with such iterators is done with the for
loop, by providing it an in
keyword.
When you say for k,v in pairs(obj) do ... end
in Lua, pairs
returns the next
function, the obj
and the first key
in obj
to the for
loop, and subsequently calls next(obj, k)
for each k
.
And, you can actually say for k,v in next,obj,nil do ... end
and it will work the same way because that’s what pairs
will return.
Now that we’ve covered iterators, let’s play with them!
What I like about Clojure’s transducers, is that it is possible to create a data pipeline, abstract from how the data comes to you and where you should put it to.
A transducer is first given the instructions, on how to handle incoming data, then it is given the means how to put it somewhere.
We’ll be skipping the how part here because in Lua there’s no standard interface for how to put things into an object.
Depending on the data, you’ll either use table.insert
, set the value with the obj[key]=value
syntax, or use a method provided by the object.
So instead, we’ll reuse the for
loop here and will rely on it’s main interface, as it is the unambiguous way to iterate in Lua.
First, let’s create a mapping
transducer:
|
|
Let’s ignore the f1
definition for now, and move to the first return
statement straight ahead.
After calling mapping
with some function, the result of this code will itself be a function that accepts an iterator:
|
|
This function itself returns an anonymous function that will wrap
the iterator
and sneakily replace its next
function with our own step
function.
Our step
function isn’t magic though, it still relies on the next
function from the iterator that we pass into wrap
as next1
to avoid a possible name clash.
You’ll never know when it hits you.
The rest of the values stored as ...
from the iterator are returned as is.
As you can see, we’re down three anonymous functions already.
Well, wrap
isn’t really anonymous, but it’s still manipulating other functions.
Let’s try this transducer quickly:
inc = function (k, x) return k, x + 1 end
incrementing = mapping(inc)
incrementing_pairs = incrementing(pairs)
for _,x in incrementing_pairs {0,1,2} do
print(x)
end
-- 1
-- 2
-- 3
Let’s break this down again.
Here, I’ve created the inc
function, which accepts a key, and a value, returns the key as is, and increments the value by 1
.
Returning the key is necessary because this function will basically become an iterator later on.
We pass it to mapping
to create a wrapper function for an iterator-producing function, and we call it incrementing
, because it’s what this function really does - it is incrementing every value it receives, but it doesn’t yet know how to get them.
This is decided when we pass pairs
to incrementing
, deciding that our function will get all key-value pairs from an object, and map inc
over them.
Alternatively, we could have passed ipairs
and only map inc
over the sequential part of the table.
Finally, we call incrementing_pairs
with a table, in a for in
loop, and it prints the values from the table incremented by 1.
Alternatively, this can be written as:
for _,x in mapping(function (k,x) return k,x+1 end)(pairs)({0,1,2}) do
print(x)
end
A bit more cramped, but readable.
And, basically, that’s it - now we can write more transducers.
But before we do, let’s go back a bit, and understand why I had to wrap f
in f1
right at the beginning of the mapping
function:
|
|
This function is called on each iteration, and iterators terminate the iteration by returning nil
as a key without a value.
So if we got nil
as the first value of the ...
, it means that the iteration is ended, and there’s no need to call f
anymore.
We can still call f
, but it would mean that f
will have to check if it is the end of the iteration, which isn’t great.
By the way, this works as a general interface, so we can use mapping
with other iterators, like string.gmatch
:
words = function (s) return s:gmatch("%w+") end
upcasing = mapping(string.upper)
upcased_words = upcasing(words)
-- one-liner: mapping(string.upper)(function (s) return s:gmatch("%w+") end)("so loud")
for word in upcased_words "so loud" do
print(word)
end
-- SO
-- LOUD
Another interesting property of this approach is that we can compose different functions into a complete data pipeline, abstracting away how they interact:
upcasing = mapping(string.upper)
reversing = mapping(string.reverse)
for s in upcasing(reversing(words))("foo bar") do
print(s)
end
-- OOF
-- RAB
We can write a helper function, that will compose several functions into a single one, like this f = compose(a, b, c); f(42)
produces the same result as a(b(c(42)))
:
function compose(f, ...)
local comp = f
for i=1,select("#", ...) do
local f,g = comp,select(i, ...)
comp = function (...) return f(g(...)) end
end
return comp
end
It can be helpful when we want to create a data pipeline quickly and won’t be subject to too much chaining:
for s in compose(mapping(string.upper),mapping(string.reverse))(words)("foo bar") do
print(s)
end
Now, let’s write a filtering
transducer:
|
|
I hope you’ll forgive my a bit unconventional formatting here, it’s just much easier to read this way for me.
I’m putting all end
on the same line and vertically aligning them with all indented blocks, so it’s more like Python - when the indentation changes, the scope usually does too.
Mainly, it saves so much vertical space, and this article is already long.
Anyway, This function is even more complicated, so let’s break it down too.
First, you’ll notice that the f1
function is different.
Again, we’ll ignore it for the time being, and move straight to the function that wraps the iterator
:
|
|
Same as before, it returns a function, wrapping a call to iterator
to swap its next
function with our step
.
Although, this time around I instead use an IIFE to avoid introducing an explicit wrapper.
The step
function, however, must be a named one, e.g. not local step = function () ...
but local function step () ...
, as it passes itself to f1
along with the object we iterate on, as well as everything from next1
.
We’ll get to that shortly.
Same as with mapping
, we can use filtering
with a function, but now we don’t actually need to return a key.
This is because this time f1
check if f
returns a truthy value, and if we were to return a key, it would count as truth:
is_even = function (_,v) return v%2==0 end
for _,v in filtering(is_even)(pairs)({0,1,2,3,4}) do
print(v)
end
-- 0
-- 2
-- 4
As before, we can use compose
to create a data pipeline that will use other transducers, like mapping
:
for _,v in compose(mapping(inc),filtering(is_even))(pairs)({0,1,2,3,4}) do
print (v)
end
-- 1
-- 3
-- 5
But wait, what’s the big idea?
Right now it looks much more complicated than just writing map
and filter
, which is like half the amount of code, even without packing end
on the same line, and is_even
and inc
don’t have to deal with keys:
function map(f, tbl)
res = {}
for k,v in pairs(tbl) do
res[k]=f(v)
end
return res
end
function filter(pred, tbl)
res = {}
for k,v in pairs(tbl) do
if pred(v) then
res[k] = v
end end
return res
end
function inc (x) return x+1 end
function is_even (x) return x%2==0 end
for _,v in pairs(map(inc, filter(is_even, {0,1,2,3,4}))) do
print(v)
end
-- 1
-- 3
-- 5
But there’s a big difference with this approach.
First, this code isn’t as general, as both map
and filter
don’t know how to iterate things other than tables.
They also hard-code pairs
as part of their process, so we will always iterate through all keys no matter what our goal is.
Finally, this style produces intermediate tables on each step: first, filter
produces a table with even numbers, then map
consumes this table and produces an entirely new table with altered values.
Which wastes memory depending on the overall size of the data and gives GC extra work.
Transducers fix all these issues - both mapping
and filtering
are abstracted from how we do the iteration itself.
And instead of producing intermediate tables, everything is done on a per-element level.
Here’s how we can visualize:
function inc (k,x) print("inc "..x) return k,x+1 end
function is_even (_,x) print(x.." even?", x%2==0) return x%2==0 end
for _,v in compose(mapping(inc),filtering(is_even))(pairs)({0,1,2,3,4}) do
print("result", v)
end
-- 0 even? true
-- inc 0
-- result 1
-- 1 even? false
-- 2 even? true
-- inc 2
-- result 3
-- 3 even? false
-- 4 even? true
-- inc 4
-- result 5
As can be seen, if the x
is even, we pass it to inc
, otherwise, we get another one and check it.
The “Transducers” article by Fasih Khatib has a nice visualization for this process:
Except, in our case, some values are simply dropped during the filtering step.
In order to understand how this works, let’s look closer at f1
from filtering
:
|
|
As I mentioned, step
passes itself to f1
, along with the iterable
, and values from next1
.
This function mostly ignores the first two arguments and mainly works with ...
.
But if f
returned false
or nil
, we need to somehow continue the iteration until f
returns a truthful value, or until the iterable
is exhausted.
To achieve that, we recursively call back the step
function, called recur
here, and it produces a new value and calls f1
again.
Since both are tail calls, everything is optimized into a loop.
I can prove that by creating an iterator that never exhausts, and wrapping it with filtering
:
function ones () return function () return 1,1 end end
for _ in filtering(is_even)(ones)() do
break
end
-- even? 1 false
-- even? 1 false
-- ...
This will never finish, but also won’t consume stack or memory.
ones
simply returns 1,1
every time it is invoked, but filtering
looks for even values, so we’ll never reach the body of the loop.
And that’s another benefit of transducers - same as iterators, transducers remain lazy, and can work on infinite sets of data.
Finally, let’s write a stateful transducer, such as taking
.
It’s going to terminate the iteration earlier, after n
elements were read:
function taking(n)
local count = 0
return function (iterator)
return function (...)
return (function (next1, ...)
return function (iterable, ...)
return (function (...)
if nil ~= ... and count < n then
count = count + 1
return ...
end end)(next1(iterable, ...))
end, ...
end)(iterator(...))
end end end
It may seem easy - all we do is that instead of accepting a filtering function, we accept a number, and check against it.
Otherwise, it’s exactly the same principle as filtering
, but we don’t need the recursion:
for k,v in taking(4)(pairs)({a=1,b=2,c=3}) do print(k,v) end
-- all values are printed as the table has < 4 keys
-- b 2
-- a 1
-- c 2
for _,v in taking(4)(pairs)({1,2,3,4,5,6,7}) do print(v) end
-- only first 4 values are printed
-- 1
-- 2
-- 3
-- 4
So, in the following example, we create a transducer that should take at most four elements from a collection:
four_pairs = taking(4)(pairs)
What will be printed if we use it here?
for k,v in four_pairs({a=1,b=2,c=3}) do print(k,v) end
for _,v in four_pairs({1,2,3,4,5,6,7}) do print(v) end
Results
-- for k,v in four_pairs({a=1,b=2,c=3}) do print(k,v) end
-- b 2
-- a 1
-- c 2
-- for _,v in four_pairs({1,2,3,4,5,6,7}) do print(v) end
-- 1
People who are familiar with Rich Hickey’s talk Inside Transducers, already know what’s the issue here, and why the result is different from the ones where we created transducers in each for
loop.
All stateful transducers must create their state anew every time they’re finalized.
In our case, the finalization happens when we’re passing a collection to the resulting iterator.
So in order to fix this we need to move where we’re initializing the count
variable:
--- orig taking
+++ fixd taking
@@ -1,9 +1,9 @@
function taking(n)
- local count = 0
return function (iterator)
return function (...)
return (function (next1, ...)
+ local count = 0
return function (iterable, ...)
return (function (...)
if nil ~= ... and count < n then
count = count + 1
It should be done as late as possible, right before we’re returning our next
function.
And because this iterator is stateful, we can’t share the next
function it returns with others, unlike the one we get from pairs
.
Now it will work as we’d expect:
four_pairs = taking(4)(pairs)
for k,v in four_pairs({a=1,b=2,c=3}) do print(k,v) end
-- b 2
-- a 1
-- c 2
for _,v in four_pairs({1,2,3,4,5,6,7}) do print(v) end
-- 1
-- 2
-- 3
-- 4
As our final step in our descendant to functional madness, let’s write a paritioning
transducer, that will group individual elements into groups of n
elements.
It’s important to look at it, as it has a problem we haven’t encountered yet.
Also, remember when I said to you that the step
function must be a named one?
Well, I lied:
|
|
That’s a lot to take in.
First of all, there’s no real reason to make this as complicated as it is besides proving the point that I’m proficient with functions returning functions.
That thing in line 7 is a Y combinator - I used it here to move the explicit creation of the named step
function into its argument and hence made it recursive without really defining it as a named function.
So, you can do that in Lua, but it’s here only for fun.
We can see that it works by writing another transducer that will use table.concat
to pretty-print the tables:
function table_formatter (t)
return "{"..table.concat(t, ", ").."}"
end
for s in compose(mapping(table_formatter),
partitioning(3))(function (s) return s:gmatch(".") end)("foobar") do
print(s)
end
-- {f, o, o}
-- {b, a, r}
However, as I mentioned, this transducer has a problem that no other transducers had up to this point.
Let’s try using it with ordinary pairs
:
for k,v in partitioning(3)(pairs)({"a","b","c","d","e"}) do
print(k,table.concat(v, " "))
end
-- table: 0x555ff86717a0 nil
-- invalid key to 'next'
-- stack traceback:
-- [C]: in function 'next'
After we get the first partition, we immediately get an error.
This happens because next
, the function we internally use to get the next item from the iterable
object has kind of a feedback loop.
It uses the result of its last call to continue iteration.
Observe:
next({"a","b","c"})
-- 1, "a"
next({"a","b","c"}, 1)
-- 2, "b"
next({"a","b","c"}, "x")
-- invalid key to 'next'
-- stack traceback:
-- [C]: in function 'next'
So, when our iterator returns the table, it is passed to next
as the key, and because next
can’t find this key in the table, it throws an error.
This is also the reason, why we have to return keys unaltered from any function we pass to mapping
.
This is one major difference from Clojure because there we’re saying how to deal with that by finalizing the transducer with the reducing function.
But here we don’t have any reducing function.
We could try to work around this, but there’s another glaring issue here. Look at how we’re adding values to the partition:
|
|
Even if we were to somehow store the last key, and pass it to the next
call we still store values in the array in a completely wrong way.
This is where Lua’s open iterator interface falls short.
Because there’s no guarantee that the first argument is a key and not actual da4a, we can’t just return the first argument, and put everything else into the array.
Moreover, if the iterator returns not a key-value pair, but more values, we don’t know how to put these either.
The solution?
Is of course more functions!
We’re going to tell partitioning
transducer upfront how it should handle the results.
This, of course, limits the generality of our system, but I think this is a testament to how well Clojure is designed.
Transducers were added to it basically without any internal change.
Here’s the fixed version:
|
|
I’ve “fixed” the Y combinator to propagate all of its arguments into all of its functions, instead of just one, and added the last
key storage, so we could properly terminate the iteration, and return the array.
If you count just the keyword function
in this piece of code, it appears a whopping 13 times!
It’s crazy, yes even by my standards, I would never give such code a pass, but it’s nice to fool around and see how far can we push function manipulation style.
Can I remove the storage, and put it into the function’s arguments during anonymous recursion done with the Y combinator?
Eh, probably.
Will I do it?
Nah.
What matters is that this code now works with pairs
:
function collector(arr, n, k, v)
arr[n]=v
return k, arr
end
partition_three_pairs=partitioning(3,collector)(pairs)
for _,v in partition_three_pairs({"a","b","c","d","e"}) do
print(table_formatter(v))
end
-- {a, b, c}
-- {d, e}
for _,v in partition_three_pairs({a=1,b=2,c=3,d=4,e=5}) do
print(table_formatter(v))
end
-- {1, 2, 5}
-- {3, 4}
It also correctly preserves the state, preventing it from leaking when we save the transducer finalizing it with paris
.
The same transducer works with the single-argument stateful iterator, provided a different collector
:
function string_collector (arr, n, s)
arr[n]=s
return arr
end
function chars (s) return s:gmatch(".") end
for t in partitioning(4,string_collector)(chars)("foobar") do
print(table_formatter(t))
end
-- {f, o, o, b}
-- {a, r}
With this, I feel that my functional desires are fulfilled. For now, at least.
I know, this article was a wall of text and code, but it is mostly because I was really passionate about the topic, and wanted to get it out of my system, so I could continue working on the game3. If you’ve made it til here, and you have questions about the code, either because my explanations were unclear, or simply because there’s not enough of them, feel free to contact me, and I will try to answer, and update the post.
If instead you looked at the code and thought something like “well that escalated quickly”, or that I’m just a crazy person, doing over-complicated things - well that’s a valid point of view as well!
I wouldn’t say that these transducers are practical, and their runtime characteristics and impact on the Lua runtime performance is unclear to me.
I mean, when I talked about how map
and filter
produce intermediate collections that are thrown away once we’re done, and how it is bad, having a function wrapping another 12 nested function wrappers with a Y combinator in the middle isn’t much better.
But still, I feel great just because I managed to make it work, and it isn’t completely insane, as all of the principles are pretty simple.
Functional decorators existed for a long time, and these transducers are basically it.
Anyhow, I hope you’ve found this interesting. Thanks for reading!