Understanding transducers
Some time ago I ported most of Clojure’s core
namespace to Fennel and made it into a library called fennel-cljlib.
This was my first library for Fennel, so it wasn’t really great in terms of how it was implemented.
While it was making Fennel more like Clojure syntax-wise, which I like, it wasn’t following Clojure’s semantics that well.
The main thing that was missing is a proper seq
abstraction, which Clojure relies on for providing both lazy sequences and generic functions that can work on any data type that implements ISeq
.
Such functions are map
, filter
, take
and so on.
Since then, I’ve made a few more libraries for Fennel, which were somewhat more narrowly focused, and one of such libraries was lazy-seq - an implementation of Clojure’s lazy sequences.
It doesn’t feature chunked sequences, (yet, maybe), but it implements almost all sequence-related functions from Clojure.
And you can throw pretty much any Lua data structure that implements pairs
or ipairs
into it, and it will work out how to lazily transform it into a sequence.
This was one of the missing pieces for the fennel-cljlib, as its implementation of seq
simply made a shallow copy of a given object in a linear time, making sure that the result is sequential.
With the implementation of seq
from the lazy-seq library, I could rewrite fennel-cljlib, also making all sequence-related functions lazy.
And while this will make the library more Clojure-like one piece is still missing from both fennel-cljlib and lazy-seq libraries.
Transducers.
I’m quite familiar with transducers, well, I use them regularly at work, and I read about their implementation a few years ago.
However, I’ve never implemented a transduceable context, i.e. a function that accepts a transducer, and applies it to elements of a given collection.
So, as a part of the rewrite of the fennel-cljlib library, I needed not only to port transducers themselves, but I also had to implement such functions as into
, transduce
, and sequence
, which are transduceable contexts.
Thankfully, into
and transduce
are written in Clojure, and are very straightforward to understand, but the sequence
function is not.
Here’s its source code:
(defn sequence
([coll]
(if (seq? coll) coll
(or (seq coll) ())))
([xform coll]
(or (clojure.lang.RT/chunkIteratorSeq
(clojure.lang.TransformerIterator/create xform (clojure.lang.RT/iter coll)))
()))
([xform coll & colls]
(or (clojure.lang.RT/chunkIteratorSeq
(clojure.lang.TransformerIterator/createMulti
xform
(map #(clojure.lang.RT/iter %) (cons coll colls))))
())))
It is written mostly via Java interop, and I can’t use this in my port of the clojure.core
namespace to Fennel, because Fennel runs on Lua, and Lua can’t really interop with Java.
So I had to reimplement this function in Fennel, and this is what motivated me to write this post.
Also, I don’t really know Java, so understanding how sequence
works was a challenge on its own1.
The interesting thing is, after I’ve tried to implement this function several times, I understood that I, actually, don’t understand transducers as well as I thought.
So I had to go back a bit and learn how transducers actually work, and why are they written the way they are.
It was really fascinating, and after a bit of trial and error, I managed to implement sequence
in Clojure first, and then port it to Fennel, using my implementation of lazy sequences.
I will show the resulting code later in this post, but first, let’s understand what transducers are, and how they work.
Transducers
First, a bit of theory. A transducer is a function that describes the process of transformation, without knowing how exactly the thing it transforms is organized. It is not the same as generic functions, because transducers are generic in a bit different way.
For example, we all know how map
works in Clojure - you pass it a function and a collection, and it applies the function to each element of the collection:
(map inc [1 2 3]) ;; => (2 3 4)
Map
walks through the given collection, applies the given function to each element, and puts results into a sequence.
Seems nice, and Clojure actually makes map
generic by transforming the given collection to a sequence before actually mapping over it.
However, this approach isn’t generic enough, because there are things that can’t be transformed into a sequence in an efficient or even meaningful way.
One such thing is an asynchronous channel, and when Clojure developers were working on the core.async
library, they’ve realized that sequence manipulation functions are usable in the context of a channel, but reimplementing all these functions is a no-go.
So how did Clojure developers solve this problem? By decoupling the transformation process from the collection it transforms. Here’s a helpful analogy.
Imagine an airport worker, whose job is to weigh and sort the luggage before it goes into a plane. Their basic instructions are:
- Take a bag;
- Measure its weight and put a sticker on the bag;
- If the bag weight is bigger than X, don’t put the bag on the plane;
- Hand the bag over.
Note that this process, while can be applied to a single bag at a time, doesn’t at all specify how bags are coming to you and how they leave you. One day bags can come to you in containers brought by a vehicle, the other day they can come on a conveyor, it doesn’t matter to you - you just take a bag, weigh it, put a sticker and hand it over to another car or another conveyor. These details should not matter, because your job remains the same, even if you take bags from different sources every day.
This is, of course, an analogy, but it applies to programming pretty well.
Bags are items in the collection you’re going to map
a function over and then filter
them out.
The function is what you do with the bag, in the case of our fellow airport worker weighing bags and putting stickers.
In addition, notice that we first weigh the bag, and then filter it out immediately, in oppose to weighing all bags and then filtering them one by one.
However, in a programming language, the way how items are coming to us completely depends on the collection.
And how we collect results into another collection depends on the implementation of the map
function.
These are two main things stopping us from describing an arbitrary transformation process, without tying ourselves to a particular collection implementation or a conversion protocol.
Looking at other languages, which provide different classes for different data structures, most of the time map
is a method and not a function.
This way map
can be implemented in terms of the collection you’re mapping through, usually producing the same collection as a result, because this method knows how to map a function over this particular collection implementation.
Methods do not fly in functional languages, so another approach, some languages take is to provide different implementations of map
functions via namespaces.
For example, Elixir has a map
function implemented for lists and another implementation for streams:
# Enum versions of map and filter
iex(1)> (0..3
|> Enum.map(fn(x) -> IO.puts("map #{x}"); x + 1 end)
|> Enum.filter(fn(x) -> IO.puts("filter #{x}"); rem(x, 2) != 0 end)
|> List.first)
map 0
map 1
map 2
map 3
filter 1
filter 2
filter 3
filter 4
1
# Stream versions of map and filter
iex(2)> (0..3
|> Stream.map(fn(x) -> IO.puts("map #{x}"); x + 1 end)
|> Stream.filter(fn(x) -> IO.puts("filter #{x}"); rem(x, 2) != 0 end)
|> Enum.to_list
|> List.first)
map 0
filter 1
map 1
filter 2
map 2
filter 3
map 3
filter 4
1
The difference is quite substantial, as streams apply function composition to each element one by one, (similarly to how our airport worker does) whereas enumerations are fully traversed by map
first and only then the result is being passed to filter
.
This distinction is possible thanks to different implementations of map
, but be it a method of a specific class, or a namespaced function that’s exactly what Clojure developers wanted to avoid.
So transducers were created.
Understanding transducers
To understand transducers, we first need to understand what map
essentially does and how we can abstract it away from both the input collection and the result it produces.
It may seem obvious: map
applies a function to each element of a collection and puts the result into a new collection.
I’ve marked important things in bold because if we think about these a bit, we’ll see that there are some concrete actions that the map function performs, which should be abstracted away.
First, let’s implement map
in terms of reduce
.
This function is actually very similar to how mapv
is implemented in Clojure, except we’ve left out some optimizations:
(defn mapr
"A version of `map` that uses `reduce` to traverse the collection and
build the result."
[f coll]
(reduce (fn reducer [result val] (conj result (f val))) [] coll))
;; ^ ^ ^ ^ ^ ^
;; | | | | | |
;; | | | | | `Collection to iterate
;; | | | | `Collection to put results
;; | | | `Function that produces the result
;; | `A reducing function that`knows how to put elements to result
;; `This is how we get a collection element one by one
Here, reduce
takes care of how to traverse the collection, and the reducer
function takes care of how to put elements to the result.
It may seem that we’ve decoupled these steps from map
but we’ve just moved them into another place.
More than that, if we were to implement filter
this way we would have to put the logic, that decides what elements are going to be left out, into an anonymous function:
(defn filterr
"A version of `filter` that uses `reduce` to traverse the collection
and build the result."
[f coll]
(reduce (fn [res x] (if (f x) (conj res x) res)) [] coll))
;; ^ ^
;; | |
;; | `Logic that decides whether the value will be left out
;; `generic way to iterate through a collection
Notice, that both mapr
and filterr
share a lot of structure, the only difference here is how to put the resulting value into the collection.
This should give us a hint on how we can abstract this away.
And given that Clojure is functional, we can write functions that accept other functions and return new functions, which will provide a generic way of how to put the result into a collection:
(defn map-transducer [f]
(fn [reducer]
(fn [result value]
(reducer result (f value)))))
(defn filter-transducer [pred]
(fn [reducer]
(fn [result value]
(if (pred value)
(reducer result value)
result))))
This isn’t what a transducer really is like, but a first real step towards them.
The key point here is that now, we can describe a reducing process without knowing how to put the modified item into the resulting collection (or channel, or socket, or whatever).
The only thing we need to implement for collection now is reduce
or some other function that has the same interface as reduce
.
This is how we can use these prototype transducers:
user> (def incrementer (map-transducer inc)) ; a function that knows how to increment
#'user/incrementer
user> (incrementer conj) ; teaching `incrementer` how to put elements to the result
#function[user/map-transducer/fn--5698/fn--5699]
user> (reduce (incrementer conj) [] [1 2 3]) ; using this transducer in `reduce`
[2 3 4]
user> (reduce ((filter-transducer odd?) conj) [] [1 2 3]) ; same for `filter-transducer`
[1 3]
So what happens here is that when we call map-transducer
and pass it a function inc
it returns a function, that accepts the reducer
also known as the reducing function of just rf
for short.
We then call this function, passing it the reducing function conj
and get another function, that accepts the results so far, and the element to process.
This function then calls inc
, which we’ve supplied in the first step, on the element, and uses conj
to put the resulting value to result
.
In other words, by passing inc
and conj
to the transducer we’ve basically constructed (fn [result value] (conj res (inc value)))
function, that is then used by reduce
.
Here’s a demonstration:
user> (reduce ((map-transducer inc) conj) [] [1 2 3])
[2 3 4]
user> (reduce (fn [result value] (conj result (inc value))) [] [1 2 3])
[2 3 4]
And that’s basically what transducers are all about! They’re just a composition of functions, that produces the final transformation function, that acts as a single step over the given collection. And the amazing part of such design is that transducers can be composed with other transducers:
user> (reduce ((comp (map-transducer inc)
(filter-transducer odd?))
conj)
[] [1 2 3 4 5 6])
[3 5 7]
user> (reduce (fn [result value] ; above is essentially the same as this
((fn [result value] ; function composition
(if (odd? value)
(conj result value)
result))
result (inc value)))
[] [1 2 3 4 5 6])
[3 5 7]
It may be a little hard to process, but don’t worry, I will go into details after we complete the implementation of transducers, as we’re not yet finished.
Completing transducers
Most transducers don’t have any intermediate state, but some do.
For example, the partition-all
function takes a collection and returns a list of partitions of elements from this collection.
The transducer, which is returned by this function, needs to store elements inside an array, and only after the current partition is filled it will append it to the result.
Seems logical, however, if the number of elements can’t be equally partitioned, some will be left over:
user> (partition-all 3 (range 8)) ; a regular partition-all call
((0 1 2) (3 4 5) (6 7))
user> (defn partition-all-transducer
"Our naive implementation of `partition-all` as a transducer."
[n]
(fn [reducing-function]
(let [p (volatile! [])]
(fn [result value]
(vswap! p conj value) ; building the partition
(if (= (count @p) n)
(let [p* @p]
(vreset! p []) ; clearing the partition storage
(reducing-function result p*)) ; adding the partition to the result
result ; returning result as is, if the partition is not yet complete
)))))
#'user/partition-all-transducer
user> (reduce ((partition-all-transducer 3) conj) [] (range 8))
[[0 1 2] [3 4 5]]
We can see that in the case of our implementation, only complete partitions were added to the result, yet there should be an additional incomplete partition, as shown by the direct partition-all
call.
This is because our transducer is missing a so-called completion step:
(defn partition-all-transducer [n]
(fn [reducing-function]
(let [p (volatile! [])]
(fn
([result] ; completion arity
(if (pos? (count @p))
(reducing-function result @p)
(reducing-function result)))
([result value] ; reduction arity
(vswap! p conj value)
(if (= (count @p) n)
(let [p* @p]
(vreset! p [])
(reducing-function result p*))
result))))))
Here I’ve added another arity, that must be called after the reduction process is complete.
This arity checks if the array we’ve used to store the incomplete partition is not empty.
If it’s not, it means that there are some leftovers, that we need to add to the result.
So it calls reducing-function
with this array, otherwise it will call reducing-function
with result
only, propagating completion step down the line.
Invoking this arity after reduce
completed, we can see that all partitions are present:
user> (let [f ((partition-all-transducer 3) conj)
res (reduce f [] (range 8))]
(f res)) ;; complete step
[[0 1 2] [3 4 5] [6 7]]
Our reduce
example has become way too verbose, and there’s also a potential for error if our transducer leaked from this scope and someone else used it after it was completed.
Notice that I’ve forgotten to clear the volatile p
in the completion step, and if someone else calls this particular function again, these leftover elements will be added to the result again.
(Try to achieve that.)
Therefore, Clojure abstracts the process of finalizing a transducer into a function, conventionally called transduce
:
(defn transduce
([xform f coll]
(transduce xform f (f) coll))
([xform f init coll]
(let [f (xform f)
ret (reduce f init coll)]
(f ret))))
There’s one additional arity that can be added to our transducer implementation, which is used for initialization, done by calling the reducing function without arguments.
This arity takes zero arguments, and it is optional, as not all reducing functions can come up with a meaningful initialization process, but it’s better to supply it than not.
So a complete implementation of the map-transducer
function is:
(defn map-transducer [f]
(fn [reducing-function]
(fn
([] ; init
(reducing-function))
([result] ; complete
(reducing-function result))
([result input] ; step
(reducing-function result (f input)))
([result input & inputs] ; step with multiple inputs
(reducing-function result (apply f input inputs))))))
And that’s it!
This is a complete implementation of a transducer.
Clojure provides this as an additional arity of map
where you only supply a function, without the collection, thus we don’t even need a separate function for this, it was merely for demonstration purposes.
If you look back at the Elixir example, you can see that when Stream
implementations of map
and filter
are used, each function is applied in quick succession, opposed to Enum
version, where map
is applied first, and then the result is filter
‘ed as a whole.
With transducers, we just implemented, or with ones available to us in Clojure we can do exactly the same:
user> (->> (range 4)
(map (fn [x] (println "map" x) (inc x)))
(filter (fn [x] (println "filter" x) (odd? x)))
first)
;; map 0
;; map 1
;; map 2
;; map 3
;; filter 1
;; filter 2
;; filter 3
;; filter 4
;; => 1
user> (->> (range 4)
(transduce
(comp (map (fn [x] (println "map" x) (inc x)))
(filter (fn [x] (println "filter" x) (odd? x))))
conj)
first)
;; map 0
;; filter 1
;; map 1
;; filter 2
;; map 2
;; filter 3
;; map 3
;; filter 4
;; => 1
Now, I should mention that it’s a bit stretched example.
The lazy composition of sequences in the first snippet should behave similarly to the second because when we’re mapping a function in a lazy way we don’t consume the whole sequence, we only produce a new one, that knows how to construct itself based on the previous one.
And if we then filter this mapped sequence, we again construct a new sequence, that knows how to build itself lazily.
So, in an ideal world, the first example should also give us map,filter,map,filter
kind of composition, but unfortunately, it is somewhat slow, so Clojure uses chunking, and lazy sequences can compute up to 32 elements upfront.
This isn’t the case for Elixir example, Enum.map
is eager (the Erlang module is also eager), and will produce fully realized sequences.
Streams in Elixir are lazy, and perhaps their composition is what we would have for lazy sequences in Clojure if there was no chunking, but in our case transducers show the difference in composition a bit clearer.
Now let’s really understand how transducers work.
Understanding the inverse order in comp
and how transducers are composed
The transduce
call may have seemed a bit complicated because of comp
, but here’s a nice trick I’ve found.
Simply remember, that the order of transducers in comp
is exactly the same as the order of calls in the ->>
macro.
However, it may seem counter-intuitive, because usually functions in comp
are applied in the reverse order, e.g.:
((comp a b c d) x)
; expressions are aligned for clarity
(a (b (c (d x))))
In other words, even though functions are provided in order a
, b
, c
, d
, the call order will be d
, c
, b
, a
.
And in the case of transducers the composition works the same way, it’s just we have one extra step after we’ve passed the reducing function.
As I’ve demonstrated in this example, the composition basically is engineered in such a way that it kinda inverses its order twice. The first inversion happens after we compose functions, and the second inversion happens when we pass the reducing function. Let’s use the substitution model, sometimes referred to as normal order evaluation to see why this happens.
Substitution here basically means that before we do any reduction, we expand all forms until they only contain primitives. This is done by substituting names with expressions they refer to. For example, given these three functions:
(defn add [a b] (+ a b))
(defn square [x] (* x x))
(defn sum-squares [a b] (+ (square a) (square b)))
We can walk through the (sum-squares (add 1 1) (add 1 2))
expression, and see how it will be evaluated in normal order, and how it differs from applicative order:
Normal order | Applicative order |
---|---|
(sum-squares (add 1 1) (add 1 2)) |
(sum-squares (add 1 1) (add 1 2)) |
(sum-squares (+ 1 1) (+ 1 2)) |
(sum-squares (+ 1 1) (+ 1 2)) |
(+ (square (+ 1 1)) (square (+ 1 2))) |
(sum-squares 2 3) |
(+ (* (+ 1 1) (+ 1 1)) (* (+ 1 2) (+ 1 2))) |
(+ (square 2) (square 3)) |
(+ (* 2 2) (* 3 3)) |
(+ (* 2 2) (* 3 3)) |
(+ 4 9) |
(+ 4 9) |
13 |
13 |
Notice, that in normal order (+ 1 1)
and (+ 1 2)
are executed twice, because all substituting happens before any reduction.
In applicative order, reduction happens before substituting, so every expression is computed only once.
Applicative order is more in line with real evaluation rules, but it’s harder to see what’s happening when we compose things, so I’ll use the normal order to show how transducers are composed.
With this in mind, let’s try to walk through the following expression:
((comp (map-transducer inc)
(filter-transducer odd?))
conj)
First, we need to substitute map-transducer
and filter-transducer
with their (simplified) implementations:
((comp ((fn [f]
(fn [rf]
(fn [result value]
(rf result (f value))))) inc)
((fn [f]
(fn [rf]
(fn [result value]
(if (f value)
(rf result value)
result)))) odd?))
conj)
Next, let’s substitute f
with inc
and odd?
and get rid of function calls, substituting them with anonymous functions that accept rf
:
((comp (fn [rf]
(fn [result value]
(rf result (inc value))))
(fn [rf]
(fn [result value]
(if (odd? value)
(rf result value)
result))))
conj)
Now, knowing that composition of functions f
and g
is (fn [x] (f (g x)))
we can substitute comp
with this expression, and then substitute f
and g
in this expression with our functions from the previous step:
((fn [x]
((fn [rf]
(fn [result value]
(rf result (inc value))))
((fn [rf]
(fn [result value]
(if (odd? value)
(rf result value)
result)))
x)))
conj)
Note that according to composition rules, odd?
should be executed first, and inc
would follow it, but we’re not done composing yet.
Let’s substitute x
for conj
and remove the function call:
((fn [rf]
(fn [result value]
(rf result (inc value))))
((fn [rf]
(fn [result value]
(if (odd? value)
(rf result value)
result)))
conj))
We can now substitute rf
for conj
in the innermost function call, and remove the innermost function that accepts rf
:
((fn [rf]
(fn [result value]
(rf result (inc value))))
(fn [result value]
(if (odd? value)
(conj result value)
result)))
Finally, we can substitute outer rf
with the entire inner function body, and remove another call:
(fn [result value]
((fn [result value]
(if (odd? value)
(conj result value)
result))
result (inc value)))
As the last step, we’re substituting the inner function’s value
to (inc value)
and result
to result
to eventually get:
(fn [result value]
(if (odd? (inc value))
(conj result (inc value))
result))
The final function that will be executed by reduce
, which is just an ordinary two-argument function!
And as you can see inc
happens before odd?
.
So yes, the order in comp
may seem inverse, as inc
and odd
appear in a logical order, but thanks to how the whole composition process evolves, this logical order can be preserved in the resulting function.
I hope that with this substitution model you can now understand the whole composition process of transducers, which is not a trivial process by any means.
But I’m actually amazed by how this simple idea achieves such a complete abstraction that can be used in all kinds of transformation contexts.
Speaking of transduceable contexts, let’s implement one!
Implementing sequence
transduceable context
Now we’re ready to implement sequence
in Clojure, without direct Java interop.
First, let’s remember what reduce
does:
- Accepts a function, initial value, and a collection;
- Gets an element of a collection and passes the initial value and the element to the reducing function;
- The reducing function returns the current result;
- The result is then passed to the reducing function alongside the next element from the collection;
- Once the collection is exhausted, the result is returned.
Thus, reduce
can be written as an ordinary loop.
However, sequence
is lazy, therefore we can’t just loop through the collection, but thankfully, lazy sequences can be recursive.
And, there’s another problem, we must append each element to the sequence we’re producing, but we also need to check if we’ve actually finished, or if we need to skip the element because it is filtered out.
And we need to call the completion step somewhere.
Though, if you think about it, we don’t need for our transducer to actually append elements to a collection, it can merely do the transformation, and since we know what kind of collection we’re building, we can build it later. With this approach, we can check the result of a transducer on each step and act accordingly.
First, let’s finalize the transducer with a reducing function:
(defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))]
;; lazy loop?
))
It may seem that we’re using cons
here because we’re building a sequence, but it’s not.
We could actually use anything, like conj
or even a function that returns something that we can later distinguish from other values.
In cons
call, we have to reverse arguments though, because cons
adds the second argument to a list, provided as a first argument.
And completing
simply adds a completion step that just returns the value it’s been given, in other words, we could write it as (fn ([a] a) ([a b] (cons b a)))
.
Now let’s figure out how to loop through the collection. We can start with an ordinary loop and then convert it to recursion, adding laziness as the last step.
(defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))
res (loop [s (seq coll)
res ()]
(if s
(let [x (f nil (first s))]
(if (seq? x)
(recur (next s) (concat res x))
(recur (next s) res)))
res))]
res))
Let’s try it:
user> (sequence (map inc) [1 2 3])
(2 3 4)
user> (sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4])
user> (clojure.core/sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4] [5])
Seems to work, but we’re missing the completion step (hence no incomplete partition in the result), so let’s add it:
user> (defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))
res (loop [s (seq coll)
res ()]
(if s
(let [x (f nil (first s))]
(if (seq? x)
(recur (next s) (concat res x))
(recur (next s) res)))
(f res)))] ; complete
res))
user> (sequence (partition-all 2) [1 2 3 4 5])
([5] [1 2] [3 4])
Oops, remember, that we’re using cons
and we can’t really call f
with res
as a completion step, because our reducing function doesn’t know how to build the whole collection, only how to transform a single element.
Instead, we have to call it with nil
as before, and concat
it with the result:
user> (defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))
res (loop [s (seq coll)
res ()]
(if s
(let [x (f nil (first s))]
(if (seq? x)
(recur (next s) (concat res x))
(recur (next s) res)))
(concat res (f nil))))] ; proper completion
res))
user> (sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4] [5])
Now let’s make it recursive, by replacing loop
with an anonymous function, and recur
with actual recursion:
user> (defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))
step (fn step [coll res]
(if-some [s (seq coll)]
(let [x (f nil (first s))]
(if (seq? x)
(step (next s) (concat res x))
(step (next s) res)))
(concat res (f nil))))]
(step coll ())))
#'user/sequence
user> (sequence (partition-all 2) [1 2 3 4 5])
([1 2] [3 4] [5])
user> (dorun (sequence (map inc) (range 100000)))
Execution error (StackOverflowError) at user/sequence$step (REPL:25).
null
It still seems to work, but overflows with enough elements.
Luckily, we can use lazy-seq
to eliminate this problem, and actually make our implementation lazy:
user> (defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))
step (fn step [coll res]
(if-some [s (seq coll)]
(let [x (f nil (first s))]
(if (seq? x)
(lazy-seq (step (rest s) (concat res x)))
(lazy-seq (step (rest s) res))))
(concat res (f nil))))]
(step coll ())))
user> (dorun (sequence (map inc) (range 100000)))
Execution error (StackOverflowError) at user/sequence$step (REPL:1).
null
And it still throws the StackOverflowError
.
Why?
Well, because we’re not really lazy yet.
Instead of passing the result to the next iteration of step
, as we did in loop
we should use the result of step
and concatenate with it.
Let’s reorganize our function:
user> (defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))]
((fn step [coll]
(if-some [s (seq coll)]
(let [res (f nil (first s))]
(if (seq? res)
(concat res (lazy-seq (step (rest s))))
(step (rest s))))
(f nil)))
coll)))
user> (dorun (sequence (map inc) (range 100000)))
nil
Now it doesn’t overflow.
However, it is not yet ready to be used, because reduce
, as you may know, can be terminated with reduced
, and some transducers, like take
leverage that to terminate the process.
So we need to check for reduced?
in our implementation.
Not only that, but we have to call the completion step on the value, returned by dereferencing the reduced
object, otherwise it will not be added to the resulting sequence properly.:
user> (defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))]
((fn step [coll]
(if-some [s (seq coll)]
(let [res (f nil (first s))]
(cond (reduced? res) (f (deref res)) ; checking for early termination
(seq? res) (concat res (lazy-seq (step (rest s))))
:else (step (rest s))))
(f nil)))
coll)))
#'user/sequence
user> (sequence (comp (partition-all 2) (take 5)) (range))
([0 1] [2 3] [4 5] [6 7] [8 9])
user> (sequence (comp (take 5) (partition-all 2)) (range))
([0 1] [2 3] [4])
And we’re done!
Well, almost, sequence
should coerce the result to an empty sequence, and our current version will return nil
if the transducer never returned anything.
It’s easy enough to fix:
(defn sequence [xform coll]
(let [f (xform (completing #(cons %2 %1)))]
(or ((fn step [coll]
(if-some [s (seq coll)]
(let [res (f nil (first s))]
(cond (reduced? res) (f (deref res))
(seq? res) (concat res (lazy-seq (step (rest s))))
:else (step (rest s))))
(f nil)))
coll)
())))
We can see that it is lazy, by using side-effecting transducers, like ones with println
in those:
user> (sequence
(comp (map (fn [x] (println "map" x) (inc x)))
(filter (fn [x] (println "filter" x) (odd? x)))
(take 3))
(range))
map 0
filter 1
(1map 1
filter 2
map 2
filter 3
3map 3
filter 4
map 4
filter 5
5)
And as you can see, our sequence
behaves in the same way as the clojure.core/sequence
, regarding laziness:
user> (do (sequence
(comp (map (fn [x] (println "map" x) (inc x)))
(filter (fn [x] (println "filter" x) (odd? x))))
(range)) ;; note, infinite range
nil)
map 0
filter 1
nil
user> (do (clojure.core/sequence
(comp (map (fn [x] (println "map" x) (inc x)))
(filter (fn [x] (println "filter" x) (odd? x))))
(range))
nil)
map 0
filter 1
nil
I think now sequence
is completed, though I’ll need to test it very extensively in the future.
I’ve already tested it a lot when I was porting it to Fennel, and I think it should work correctly, looking at the code at least I don’t see anything that could go wrong.
What have I learned
Implementing transduceable context, in this case, the sequence
function, was a nice puzzle.
I’m sure it’s not as efficient, as the clojure.core
version, mainly due to the use of concat
but I wasn’t able to come up with a better way of building the result which can be done lazily.
And after actually implementing sequence
, porting it, and a lot of transducers to Fennel, I’ve finally figured out how they actually work, and I hope that now you understand it too!
This is why I love Clojure - the developers are putting a lot of thought into such features, and not taking shortcuts, like reimplementing all functions for data sources that can’t be transformed into sequences.
For me, it’s a really practical language, with a lot of tools that can enhance the programming experience, and make it fun.
Of course, if you have any questions, feel free to email me, and I’ll try to answer them, and maybe update the post for future readers. This topic is a bit complicated, so I hope it was not a boring post. Thanks for reading!
-
I actually like this approach to learning. Usually, I’m not reading how a thing is implemented, and instead trying to figure out everything myself. This unfortunately doesn’t produce the greatest results, as a lot of stuff I’m trying to learn this way has a lot of research put into it, and I basically try to reimplement a thing only based on assumptions and observations. Nevertheless, I’m satisfied with the process, and in the end, if I got something working, I feel happy, and even more so, when I’ve got some concepts exactly right. I’m not suggesting that this is a superior way to learn, but it is at least very enjoyable for me personally. ↩︎