Andrey Listopadov

Clojure on Fennel part one: Persistent Data Structures

Somewhere in 2019 I started a project that aimed to bring some of Clojure's features to the Lua runtime - fennel-cljlib. It was a library for Fennel that implemented a basic subset of the clojure.core namespace functions and macros. My goal was simple - I enjoy working with Clojure, but I don't use it for hobby projects, so I wanted Fennel to feel more Clojure-like, beyond what it already provides in that regard.

This library grew over the years: I implemented lazy sequences, added immutability, made a testing library inspired by clojure.test and kaocha, and even made a port of clojure.core.async. It was a passion project - I almost never used it to write actual software. One notable exception is fenneldoc - a documentation generation tool for Fennel libraries. And I haven't seen anyone else use the library for a serious project.

The reason for that is simple - it was an experiment. Corners were cut, and Fennel, being a Clojure-inspired lisp, is not associated with functional programming the same way Clojure is. As a matter of fact, I wouldn't recommend using this library for anything serious… yet.

Recently, however, I started a new project: ClojureFnl. This is a Clojure-to-Fennel compiler that uses fennel-cljlib as a foundation. It's still in the early days of development - I had been working on it in private for a few months until I found a suitable way to make things work in March. As of this moment, it is capable of compiling most of the .cljc files I've thrown at it, but running the compiled code is a different matter. I mean, it works to some degree, but support for the standard library is far from done.

;; Welcome to ClojureFnl REPL
;; ClojureFnl v0.0.1
;; Fennel 1.6.1 on PUC Lua 5.5
user=> (defn prime? [n]
         (not (some zero? (map #(rem n %) (range 2 n)))))
#<function: 0x89ba7c550>
user=> (for [x (range 3 33 2)
             :when (prime? x)]
         x)
(3 5 7 11 13 17 19 23 29 31)
user=>

However, there was a problem.

My initial implementation of immutable data structures in the itable library had a serious flaw. The whole library was a simple hack based on the copy-on-write approach and a bunch of Lua metatables to enforce immutability. As a result, all operations were extremely slow. That was fine for an experiment, but if I wanted to go further with ClojureFnl, I had to replace it. The same problem plagued immutableredblacktree.lua, an implementation of a copy-on-write red-black tree I made for sorted maps. It did a full copy of the tree each time it was modified.

For associative tables it wasn't that big of a deal - maps usually contain a small number of keys, and itable only copied the levels that needed to change. So, if you had a map with, say, ten keys, and each of those keys contained another map with ten keys, adding, removing, or updating a key in the outer map meant copying only those ten keys - not the whole nested map. I could do that reliably because the inner maps were immutable too.

But for arrays the story is usually quite different. Arrays often store a lot of indices and are rarely nested (or at least not as often as maps). And copying an array on every change quickly becomes expensive. I mitigated some of the performance problems by implementing my own version of transients; however, the beauty of Clojure's data structures is that they're quite fast even without this optimization.

Proper persistent data structures

Clojure uses a persistent HAMT as the base for its hash maps and sets, and a bit-partitioned trie for vectors. For sorted maps and sets, Clojure uses an immutable red-black tree implementation, but as far as I know it doesn't do a full copy of the tree either - it too has structural sharing.

I started looking into existing implementations of HAMT for Lua:

  1. hamt.lua (based on mattbierner/hamt)
    • seemed incomplete
  2. ltrie
    • no transients
    • no hashset
    • no ordered map (expected - different algorithm)
    • no compound vector/hash
  3. Michael-Keith Bernard’s gist
    • no custom hashing

I could have used one of those - notably, ltrie seemed the most appropriate - but given that I'm working on a Fennel library that I later want to embed into my Clojure compiler, I needed a library implemented in Fennel.

So I made my own library: immutable.fnl. This library features HAMT-based hash maps, hash sets, and vectors, as well as a better implementation of a persistent red-black tree, and lazy linked lists.

Persistent Hash Map

I started the implementation with a persistent HAMT using native Lua hashing. The data structure itself is a Hash Array Mapped Trie (HAMT) with a branching factor of 16. Thus all operations are O(log16 N), which is effectively O(1) for any practical number of keys.

As far as I know, Clojure uses a branching factor of 32, but on a Lua runtime that would make popcount more expensive, and, despite a shallower tree, each mutation would need to copy a larger sparse array. With a branching factor of 16, a map with 50K entries is ~4 levels deep, versus ~3 with a factor of 32. So my logic was that it's a reasonable compromise, especially since Lua is no JVM when it comes to performance.
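The index math behind a 16-way trie can be sketched language-agnostically. This Python snippet (all names are mine, not the library's) shows how a hash is consumed 4 bits per level, and how popcount turns a node's 16-slot bitmap into an index into its dense child array:

```python
# Sketch of HAMT index math with a branching factor of 16 (4 bits per
# trie level). Illustrative names, not the library's API.

def chunk(hash32: int, level: int) -> int:
    """Extract the 4-bit chunk of the hash used at a given trie level."""
    return (hash32 >> (level * 4)) & 0xF

def popcount(x: int) -> int:
    """Count set bits in x."""
    return bin(x).count("1")

def slot(bitmap: int, chunk_bits: int) -> int:
    """Map a sparse 16-slot bitmap position to an index in the node's
    dense child array: count the occupied slots below this chunk."""
    return popcount(bitmap & ((1 << chunk_bits) - 1))

# A hash of 0x2A5C1 descends through chunks 1, 12, 5, 10, 2:
path = [chunk(0x2A5C1, lvl) for lvl in range(5)]
print(path)  # [1, 12, 5, 10, 2]
```

With a branching factor of 32 the chunks would be 5 bits wide and the bitmap 32 slots, which is exactly where popcount gets more expensive without native bitwise operators.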

Of course, it’s not as fast as a pure Lua table, which is to be expected. Lua tables are implemented in C, use efficient hashing, and dynamically re-allocated based on key count. So for my implementation most operations are a lot slower, but the total time for an operation is still usable.

Here are some benchmarks:

Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on PUC Lua 5.5

Regular operations are notably slower compared to plain Lua tables:

Operation                  Lua table   Persistent HashMap   Ratio          Per op
insert 50000 random keys   2.05 ms     164.80 ms            80.3x slower   3.3 us
lookup 50000 random keys   0.83 ms     92.51 ms             110.8x slower  1.9 us
delete all                 0.78 ms     170.78 ms            219.8x slower  3.4 us
delete 10%                 0.14 ms     19.50 ms             136.4x slower  3.9 us
iterate 50000 entries      1.74 ms     6.64 ms              3.8x slower    0.133 us

For transients the situation is a bit better, but not by much:

Operation                  Lua table   Transient HashMap   Ratio          Per op
insert 50000 random keys   2.05 ms     89.17 ms            43.5x slower   1.8 us
delete all                 0.76 ms     104.31 ms           138.0x slower  2.1 us
delete 10%                 0.16 ms     12.71 ms            82.0x slower   2.5 us

On LuaJIT the ratios may seem worse, but the per-operation cost is much lower - it's just that native table operations are that much faster:

Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on LuaJIT 2.1.1774896198 macOS/arm64

Operation                  Lua table   Persistent HashMap   Ratio          Per op
insert 50000 random keys   0.86 ms     49.05 ms             56.8x slower   0.981 us
lookup 50000 random keys   0.27 ms     14.21 ms             53.4x slower   0.284 us
delete all                 0.13 ms     48.63 ms             374.1x slower  0.973 us
delete 10%                 0.05 ms     6.49 ms              138.1x slower  1.3 us
iterate 50000 entries      0.07 ms     1.80 ms              27.7x slower   0.036 us

Operation                  Lua table   Transient HashMap   Ratio          Per op
insert 50000 random keys   0.76 ms     22.43 ms            29.6x slower   0.449 us
delete all                 0.15 ms     34.16 ms            232.4x slower  0.683 us
delete 10%                 0.04 ms     5.02 ms             132.1x slower  1.0 us

With a branching factor of 32 the situation gets worse on PUC Lua, but slightly better on LuaJIT. So there's still room for fine-tuning.

For hashing strings and objects I decided to use the djb2 algorithm. I am almost as old as this hash function, so it seemed like a good fit. JK. The main reason to use it is that it can be implemented without any bitwise operators, which not all Lua versions have. It only uses the +, *, and % arithmetic operators, so it works on any Lua version. It's prone to collisions, which I try to mitigate by randomizing it when the library is loaded.
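A minimal Python sketch of the approach, assuming djb2's classic form: only +, * and % are used, so it ports to Lua versions without bitwise operators. The load-time random seed mimics the library's hash randomization - the exact seeding scheme here is my guess, not the actual code:

```python
import random

SEED = random.randrange(1, 2**32)  # picked once, when the module loads

def djb2(s: str, seed: int = 5381) -> int:
    # Classic djb2: h = h * 33 + byte, folded into 32 bits with %,
    # so no shifts and no XOR variant are needed.
    h = seed
    for ch in s:
        h = (h * 33 + ord(ch)) % 2**32
    return h

def randomized_djb2(s: str) -> int:
    """A randomized variant, seeded once at load time."""
    return djb2(s, SEED)

print(djb2("hello"))  # 261238937
```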

Still, collisions do happen, but the HAMT core ensures that they still resolve correctly, since it implements a deep equality function for most objects.

However, while first working on this, I noticed the following:

>> (local hash-map (require :io.gitlab.andreyorst.immutable.PersistentHashMap))
nil
>> (local {: hash} (require :io.gitlab.andreyorst.immutable.impl.hash))
nil
>> (hash (hash-map :foo 1 :bar 2))
161272824
>> (hash {:foo 1 :bar 2})
161272824
>> (hash-map (hash-map :foo 1 :bar 2) 1 {:foo 1 :bar 2} 2)
{{:foo 1 :bar 2} 2}

This is an interesting loophole. Which object ended up in our hash map as a key - our persistent map or the plain Lua table? Well, that depends on insertion order:

>> (each [_ k (pairs (hash-map (hash-map :foo 1 :bar 2) 1 {:foo 1 :bar 2} 2))]
     (print (getmetatable k)))
IPersistentHashMap: 0x824d9b570
nil
>> (each [_ k (pairs (hash-map {:foo 1 :bar 2} 2 (hash-map :foo 1 :bar 2) 1))]
     (print (getmetatable k)))
nil
nil

To reiterate: I'm creating a hash map with a key set to another persistent hash map, and then inserting a plain Lua table with the same contents. The Lua table hashes to exactly the same value and goes into the same bucket, but there's no collision entry, because the objects are equal by value. But equality of mutable collections is very loosely defined - they may be equal right now, but the next time you look, they're different. So persistent collections needed different hashing to avoid this kind of collision. I ended up salting persistent collections' hashes with their prototype's address in memory.
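The salting idea can be illustrated with a hypothetical sketch - all names here are made up. Each persistent collection type mixes a unique, nonzero 32-bit salt (standing in for the prototype's address in memory) into its content hash, so a persistent map and a plain table with equal contents no longer produce the same hash:

```python
# Nonzero 32-bit salt derived from an address-like value (assumption:
# the real library uses the prototype table's memory address).
PERSISTENT_MAP_SALT = (id(dict) % (2**32 - 1)) + 1

def content_hash(pairs) -> int:
    # Order-independent combination - key order must not matter.
    return sum(hash(k) ^ hash(v) for k, v in pairs) % 2**32

def persistent_map_hash(pairs) -> int:
    # XOR with a nonzero salt always changes a 32-bit hash.
    return content_hash(pairs) ^ PERSISTENT_MAP_SALT

plain = [("foo", 1), ("bar", 2)]
assert persistent_map_hash(plain) != content_hash(plain)
```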

Other than that, the HAMT implementation is by the book, and the rest is the interface for interacting with maps.

Main operations:

  • new - construct a new map from key-value pairs
  • assoc - associate a key with a value
  • dissoc - remove a key from the map
  • conj - universal method for association, much like in Clojure
  • contains - check if a key is in the map
  • count - map size, in constant time
  • get - get a key's value from the map
  • keys - get a lazy list of keys
  • vals - get a lazy list of values
  • transient - convert the map to a transient

Coercion/conversion:

  • from - create a map from another object
  • to-table - convert a map to a Lua table
  • iterator - get an iterator to use in Lua loops

Transient operations:

  • assoc! - mutable assoc
  • dissoc! - mutable dissoc
  • persistent - convert back to the persistent variant and mark the transient as completed

This covers most of the needs of my fennel-cljlib library - anything beyond it I can implement myself or adapt from existing implementations.

A Persistent Hash Set is also available as a thin wrapper around PersistentHashMap with a few method changes.

A note on PersistentArrayMap

In Clojure there is a second kind of map that is ordered, not sorted, called a Persistent Array Map. It is used by default when defining a map with eight keys or fewer, like {:foo 1 :bar 2}. The idea is simple - for such a small map, a linear search through all the keys is faster than a HAMT lookup.

However, in my testing on the Lua runtime, there's no benefit to this kind of data structure, apart from it being ordered. Lookup is slower because of the custom equality function, which does a deep comparison.
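The array-map idea itself fits in a few lines. Here is a Python sketch, not the actual implementation: entries live in a flat list and every operation is a linear scan. For a handful of keys this beats trie traversal in Clojure, but on Lua the custom deep-equality check eats up that advantage:

```python
def amap_get(entries, key, default=None):
    # Linear scan; in Lua each comparison is a deep equality call.
    for k, v in entries:
        if k == key:
            return v
    return default

def amap_assoc(entries, key, value):
    # Copy-on-write: build a fresh entry list, replacing the key if present.
    out = [(k, v) for k, v in entries if k != key]
    out.append((key, value))
    return out

m = amap_assoc(amap_assoc([], "foo", 1), "bar", 2)
print(amap_get(m, "foo"))  # 1
```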

Persistent Vector

Persistent Vectors came next. While the trie structure is similar to that of hash maps, vectors use direct index-based navigation instead of hashing, with a branching factor of 32. Unlike in maps, the arrays in a vector's trie are densely packed, so a higher branching factor performs better. Lookup, update, and pop are O(log32 N), and append can be considered amortized O(1).

Still, compared to plain Lua sequential tables the performance is not as good:

Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on PUC Lua 5.5

Operation                    Lua table   Persistent Vector   Ratio          Per op
insert 50000 elements        0.19 ms     21.07 ms            109.7x slower  0.421 us
lookup 50000 random indices  0.47 ms     14.05 ms            29.7x slower   0.281 us
update 50000 random indices  0.32 ms     70.04 ms            221.6x slower  1.4 us
pop all 50000 elements       0.25 ms     24.34 ms            96.2x slower   0.487 us
iterate 50000 elements       0.63 ms     10.16 ms            16.2x slower   0.203 us

Operation                    Lua table   Transient Vector   Ratio         Per op
insert 50000 elements        0.19 ms     7.81 ms            40.3x slower  0.156 us
update 50000 random indices  0.33 ms     20.76 ms           62.4x slower  0.415 us
pop all 50000 elements       0.25 ms     11.14 ms           44.4x slower  0.223 us

On LuaJIT:

Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on LuaJIT 2.1.1774896198 macOS/arm64

Operation                    Lua table   Persistent Vector   Ratio          Per op
insert 50000 elements        0.10 ms     7.62 ms             74.0x slower   0.152 us
lookup 50000 random indices  0.06 ms     0.67 ms             11.8x slower   0.013 us
update 50000 random indices  0.04 ms     29.13 ms            710.4x slower  0.583 us
pop all 50000 elements       0.02 ms     8.62 ms             410.4x slower  0.172 us
iterate 50000 elements       0.02 ms     0.57 ms             28.7x slower   0.011 us

Operation                    Lua table   Transient Vector   Ratio         Per op
insert 50000 elements        0.05 ms     0.59 ms            11.6x slower  0.012 us
update 50000 random indices  0.04 ms     2.06 ms            51.6x slower  0.041 us
pop all 50000 elements       0.02 ms     0.84 ms            46.7x slower  0.017 us

I think this is still OK performance. Vectors don't use hashing - element access is a direct index traversal via bit-shifting, so it's just index math.

Operations on vectors include:

  • new - constructor
  • conj - append to the tail
  • assoc - change the value at a given index
  • count - element count (constant time)
  • get - get the value at a given index
  • pop - remove the last element
  • transient - convert to a transient
  • subvec - create a slice of the vector in constant time

Transient operations:

  • assoc! - mutable assoc
  • conj! - mutable conj
  • pop! - mutable pop
  • persistent - convert back to persistent and finalize

Interop:

  • from - creates a vector from any other collection
  • iterator - returns an iterator for use in Lua loops
  • to-table - converts to a sequential Lua table

One notable difference in both vectors and hash maps is that they allow nil to be used as a value (and as a key, in the case of hash maps). Vectors don't have the problem Lua sequential tables have, where length is not well-defined if the table has holes in it.

It’s a debate for another time, whether allowing nil as a value (and especially as a key) is a good decision to make, but Clojure already made it for me. So for this project I decided to support it.

Persistent Red-Black Tree

For sorted maps and sorted sets I chose Okasaki's insertion and Germane & Might's deletion algorithms. Most of what I know about them comes from this amazing blog post by Matt Might.
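For a taste of how the insertion side works, here is a compact Python sketch of Okasaki's algorithm (deletion, per Germane & Might, is considerably more involved and omitted here). The tuple-based node layout and all names are mine:

```python
# A node is a (color, left, key, right) tuple; empty trees are None.
R, B = "R", "B"

def balance(c, l, k, r):
    # All four red-red violation cases rotate into the same shape.
    if c == B:
        if l and l[0] == R and l[1] and l[1][0] == R:
            _, (_, a, x, b), y, c2 = l
            return (R, (B, a, x, b), y, (B, c2, k, r))
        if l and l[0] == R and l[3] and l[3][0] == R:
            _, a, x, (_, b, y, c2) = l
            return (R, (B, a, x, b), y, (B, c2, k, r))
        if r and r[0] == R and r[1] and r[1][0] == R:
            _, (_, b, y, c2), z, d = r
            return (R, (B, l, k, b), y, (B, c2, z, d))
        if r and r[0] == R and r[3] and r[3][0] == R:
            _, b, y, (_, c2, z, d) = r
            return (R, (B, l, k, b), y, (B, c2, z, d))
    return (c, l, k, r)

def insert(t, k):
    def ins(n):
        if n is None:
            return (R, None, k, None)     # new nodes start out red
        c, l, key, r = n
        if k < key:
            return balance(c, ins(l), key, r)
        if k > key:
            return balance(c, l, key, ins(r))
        return n
    _, l, key, r = ins(t)
    return (B, l, key, r)                 # root is always repainted black

def to_list(n):
    """In-order traversal, yielding keys in sorted order."""
    return [] if n is None else to_list(n[1]) + [n[2]] + to_list(n[3])

t = None
for x in [5, 1, 9, 3, 7, 2, 8]:
    t = insert(t, x)
print(to_list(t))  # [1, 2, 3, 5, 7, 8, 9]
```

Note how each `balance` call builds fresh nodes along the search path while reusing untouched subtrees - that is the structural sharing that avoids a full copy.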

I believe the operations are O(log N), as for any balanced binary tree, but given that the deletion algorithm is tricky, I'm not entirely sure:

Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on PUC Lua 5.5

Operation                  Lua table   PersistentTreeMap   Ratio          Per op
insert 50000 random keys   2.10 ms     209.23 ms           99.8x slower   4.2 us
lookup 50000 random keys   0.88 ms     82.97 ms            94.2x slower   1.7 us
delete all                 0.74 ms     173.76 ms           234.8x slower  3.5 us

On LuaJIT:

Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on LuaJIT 2.1.1774896198 macOS/arm64

Operation                  Lua table   PersistentTreeMap   Ratio          Per op
insert 50000 random keys   0.72 ms     101.08 ms           140.4x slower  2.0 us
lookup 50000 random keys   0.25 ms     12.67 ms            49.9x slower   0.253 us
delete all                 0.14 ms     56.14 ms            403.9x slower  1.1 us

The API for sorted maps and sets is the same as that of their hash counterparts, with one small difference - no transients. Clojure doesn't provide them, and neither do I.

That’s all for benchmarks. I know that there are many problems with this kind of benchmarking, so take it with a grain of salt. Still, the results are far, far better than what I had with itable.

But there are two more data structures to talk about.

Persistent List

As I mentioned, I made a lazy persistent list implementation a while ago, but it had its problems, and I couldn't integrate that library with the current one well enough.

The main problem was that this library uses a single shared metatable per data structure, while the old implementation of lazy lists didn't. That difference made it hard to check whether an object is a table, hash map, list, vector, set, etc. So I reimplemented them.

The reason the old implementation used different metatables was that I had decided to try the approach described in the Reversing the technical interview post by Kyle Kingsbury (Aphyr). I know that post is more of a fun joke, but it actually makes sense to define linked lists like that in Lua.

See, tables are mutable, and you can't do much about it. Closures, on the other hand, are much harder to mutate - you can still do it via the debug module, but it's hard, and the module is not always available. So storing head and tail in function closures was a deliberate choice.

However, it meant that I needed to somehow attach metadata to the function to make it act like a data structure, and you can't just call setmetatable on a function. Again, you can use debug.setmetatable, but then all function objects share the same metatable. So, while you can do fancy things like this:

>> (fn comp [f g] (fn [...] (f (g ...))))
#<function: 0x7bdb320a0>
>> (debug.setmetatable (fn []) {:__add comp})
#<function: 0x7bd17f040>
>> ((+ string.reverse string.upper) "foo")
"OOF"

You can also notice that our + overload applied to functions from the string module as well.

So instead, the old implementation used a table wrapped in a metatable with a __call metamethod, essentially making the table act like a function. This, in turn, meant creating two tables per list node - one to give to the user, the other to hold __call and serve as the metatable.

Convoluted, I know. It's all in the past now - the current implementation is a simple {:head 42 :tail {...}} table. Not sure which is worse.

But that meant reworking how lazy lists work, because previously laziness was just a metatable swap. Now a lazy list stores a “thunk” that, when called, replaces itself in the node with the :head and :tail keys. Unless it's an empty list, of course - in that case we swap the metatable for the empty-list one.

So Lists have three metatables now:

  • IPersistentList
  • IPersistentList$Empty
  • IPersistentList$Lazy

Instead of god knows how many in the old implementation.
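The thunk mechanics can be sketched in a few lines of Python (illustrative names, not the library's): forcing a node runs the thunk once and splices head and tail into the node in place, so subsequent accesses pay nothing:

```python
def lazy_node(thunk):
    return {"thunk": thunk}

def force(node):
    if "thunk" in node:
        realized = node.pop("thunk")()   # compute head/tail exactly once
        node.update(realized)            # replace the thunk in place
    return node

def naturals(n=0):
    # An infinite lazy list of natural numbers starting at n.
    return lazy_node(lambda: {"head": n, "tail": naturals(n + 1)})

ns = naturals()
print(force(ns)["head"])                 # 0
print(force(force(ns)["tail"])["head"])  # 1
```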

The list interface is also better now. Previously, how to construct a list from a data structure was hardcoded. The current implementation also hardcodes it, but additionally allows building a list lazily from an iterator.

This is better because now, even for a custom data structure with a weird iteration scheme (like the maps and sets in this library), we can still convert it to a list. The general case is just:

(PersistentList.from-iterator #(pairs data) (fn [_ v] v))

Meaning we pass a function that produces the iterator, and a function that captures values from that iterator. Reminds me of Clojure transducers in some way.
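A hypothetical Python analog of the same producer/extractor split - one function makes a fresh iterator, another picks the interesting value out of each iteration step (names are illustrative, not the library's API):

```python
def from_iterator(make_iter, extract):
    # Build the "list" lazily, one extracted value per iteration step.
    for step in make_iter():
        yield extract(*step)

data = {"foo": 1, "bar": 2}
vals = from_iterator(lambda: iter(data.items()), lambda k, v: v)
print(sorted(vals))  # [1, 2]
```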

Persistent Queue

And the final data structure - a persistent queue: fast append at the back, and fast removal from the front.

It’s done by holding two collections - a linked list at the front, and a persistent vector for the rear. So removing from the list is O(1), and appending to the vector is also pretty much O(1).

Interesting things start to happen when we exhaust the list part - we need to move the vector's contents into the list. This is done by calling PersistentList.from on the rear. And building a list out of a persistent vector is an O(1) operation as well! Well, because nothing actually happens - we simply create an iterator and build the list lazily. And since indexing into the vector is essentially O(1), we can say we retain this property.

Or at least that’s how I reasoned about this - I’m not that good with time-complexity stuff.

ClojureFnl

That concludes part one about ClojureFnl.

I know that this post was not about ClojureFnl at all, but I had to fix the underlying implementation first. Now that I have better data structures to build on, I can get back to working on the compiler. So the next post will hopefully be about the compiler itself.

Unless I get distracted again.