Andrey Listopadov

Extending JSON encoder with custom types

@programming http fennel ~8 minutes read

One thing I like about Lua is that it has a limited number of built-in types and data structures. There are no objects, just tables. And if you need custom behavior, a metatable can be attached to any table to extend it with custom methods that integrate with the rest of the Lua runtime.

However, this is both a blessing and a curse. And unfortunately, Lua developers made a huge mistake by breaking this API1 once already, in a minor release even.

As you may have noticed, fnl-http has a lot of stuff. It has asynchronous Channels, and various Readers, which act like objects, but are still ordinary tables underneath. And this is true for a lot of Lua projects: tables are flexible enough to implement many cool things, so people do.

However, this poses an interesting problem: because objects are not a thing in Lua, and the set of default metamethods is quite limited, there’s no default way of extending objects with custom behavior. In Java, you would specify what interfaces an object is required to implement, and that often means that if something doesn’t implement them, a wrapper object is needed. In Clojure, we can get away with protocols and multimethods, but Lua has neither of those. In Lua, and subsequently Fennel, we have to be creative, because even though we can add stuff through metatables, not every value can have one.
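
For instance, values other than tables can’t be given a metatable through setmetatable at all; here’s a quick check (the metatable passed in is arbitrary):

;; setmetatable only accepts tables, so for a number this call fails
;; and pcall returns false plus an error message
(pcall setmetatable 42 {:__index {}})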

While working on fnl-http, I added a basic JSON parser and encoder. The parser part is quite simple, as JSON has a specification and a fixed amount of data types to worry about. It’s not like EDN or XML, for example, which are both extensible data notations that can represent pretty much anything. In JSON you get only these data types:

  • Number
  • String
  • Array
  • Object
  • Boolean
  • Null

The only differences from Lua here are that null is nil and that Lua doesn’t have arrays as a separate type. So implementing a parser was easy, and everything pretty much maps to Lua data structures one-to-one.
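
Just for illustration (this shows the shape of the mapping, not output from the library, and the name is arbitrary), a JSON value like {"nums": [1, 2, 3], "ok": true, "note": null} corresponds to this Fennel table:

;; the Fennel equivalent of {"nums": [1, 2, 3], "ok": true, "note": null}
(local parsed-equivalent
  {:nums [1 2 3]
   :ok true
   :note nil}) ; null becomes nil, so the key is effectively absent in Lua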

Implementing the JSON encoder, however, was not as easy. For basic data types it is easy enough, but once we touch custom objects created with metatables, things get messy.

For example, let’s say we need a set data structure. A set is similar to a table, except the values are the same as the keys. But we may want to distinguish sets from ordinary tables if, say, we’re making a library with a set of functions for working with mathematical sets, like intersection, difference, etc. Here’s a set constructor:

(fn Set [...]
  (let [data (faccumulate [res {}
                           i 1 (select :# ...)]
               (let [k (select i ...)]
                 (doto res (tset k k))))]
    (setmetatable
     {}
     {:__index data
      :__newindex (fn [_ k _] (tset data k k))})))

As you can see, we create an empty table and set its metatable to a bunch of functions that close over the local data. This technique is quite common in Lua and is called proxy tables. By using the empty table as a base, we ensure that any table access triggers the __index or __newindex metamethod, depending on the action.

However, there’s a problem. If we try to print such an object in the REPL, we don’t get any information about it, even though it has values:

>> (local s (Set :a :b :c))
nil
>> (. s :a)
"a"
>> (set (. s :d) 42)
nil
>> (. s :d)
"d"
>> s
{}

In addition to that, we have no means of iterating over this data structure, as the default next function will just return nil:

>> (next s)
nil

We can fix this by adding a __pairs metamethod:

(fn Set [...]
  (let [data (faccumulate [res {}
                           i 1 (select :# ...)]
               (let [k (select i ...)]
                 (doto res (tset k k))))]
    (setmetatable
     {}
     {:__index data
      :__newindex (fn [_ k _] (tset data k k))
      :__pairs (fn [_] #(pick-values 1 (next data $2)))})))

Now printing works, albeit in a bit strange way, because our iterator returns only the keys, so all the values show up as nil:

>> (Set :a :b :c)
{:a nil :b nil :c nil}

A few years ago, I remade the pretty-printer for Fennel, and support for custom objects was a huge part of its design. In short, we can fix our object textual representation by using the __fennelview metamethod.
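
For example, here’s a sketch (not the original definition above) of the Set constructor with a __fennelview metamethod added, so that the REPL prints a set as (Set "a" "b" "c"):

(fn Set [...]
  (let [data (faccumulate [res {}
                           i 1 (select :# ...)]
               (let [k (select i ...)]
                 (doto res (tset k k))))]
    (setmetatable
     {}
     {:__index data
      :__newindex (fn [_ k _] (tset data k k))
      :__pairs (fn [_] #(pick-values 1 (next data $2)))
      ;; render the set as a readable (Set ...) call; 5 accounts for
      ;; the "(Set " prefix when indenting nested values
      :__fennelview (fn [_ view options indent]
                      (.. "(Set "
                          (table.concat
                           (icollect [k (pairs data)]
                             (view k options (+ indent 5)))
                           " ")
                          ")"))})))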

Now, __fennelview is fine and all, but it is not exactly meant for extending existing objects. Of course, given a Lua library that implements these proxy tables, you can add support for Fennel, but it’s more of an ad-hoc solution. It’s meant for library authors to make the Fennel REPL experience more user-friendly and to make sure that the printed representation can be read back by Fennel.

And it’s not always possible to add __fennelview to existing objects. For example, one common complaint in Lua is that arrays start from 1 and not 0. We can “fix” that by implementing our own Array type:

(fn Array [...]
  (let [vals [...]]
    (setmetatable
     []
     {:__index (fn [_ i]
                 (. vals (+ i 1)))
      :__newindex (fn [_ i val]
                    (tset vals (+ i 1) val))
      :__len #(length vals)
      :__pairs (fn [_] #(next vals $2))})))

Right now, the object above prints like this:

>> (Array 1 2 3)
[1 2 3]

Let’s assume this came from a library, and we wish to print it differently, e.g. as (Array 1 2 3) instead of [1 2 3], to make sure that such arrays are read back as 0-based arrays and not as ordinary Lua tables. We can do that, but it is tricky because we don’t have any means of accessing the closed-over vals table:

(fn wrap [arr]
  (tset (getmetatable arr)
      :__fennelview
    (fn pp-doc-example [t view options indent]
      (let [lines (icollect [i v (pairs t)]
                    (let [v (view v options (+ 7 indent))]
                      (if (= i 1) v
                          (.. "       " v))))]
        (doto lines
          (tset 1 (.. "(Array " (or (. lines 1) "")))
          (tset (length lines) (.. (. lines (length lines)) ")"))))))
  arr)

However, as you can see, we have to use wrap on every Array, because each instance comes with its own metatable, so setting it once wouldn’t work. And if we read back the printed representation, we get an object that still prints the old way:

>> (wrap (Array 1 2 3))
(Array 1 2 3)
>> (Array 1 2 3)
[1 2 3]

So we can make another wrapper:

(local Array* Array)

(fn Array [...]
  (wrap (Array* ...)))

>> (Array 1 2 3)
(Array 1 2 3)

So you can see how this is a bit problematic.

Custom JSON encoders

Now, this was a bit of a tangent, so let’s get back to the main topic: JSON encoding. Because I can’t ask everyone to do this kind of wrapping, I decided to implement a different way of extending the json.encode function. Thus, the library now features two more functions: register-encoder and unregister-encoder. The second one is for fixing mistakes and probably won’t be used much.

So, let’s see how we can add support for custom types. First, we need a function that will tell if the given object is an instance of something we want to encode:

(fn Array? [x]
  (and (= :table (type x))
       (not= nil (. x 0))
       Array))

A library may provide a custom predicate to check if the given object is of a particular type, like chan? for asynchronous channels. However, it can’t be used as is, because to register an encoder, such a function needs to return a singleton that can be compared with =. So we return Array from our example function.

Next, we can implement the encoder:

(fn encode-array [arr encode]
  (.. "["
      (-> (fcollect [i 0 (- (length arr) 1)]
            (encode (. arr i)))
          (table.concat ", "))
      "]"))

Finally, we can register it:

(json.register-encoder object object? encode-object)

The first argument is the object itself, and the second one is a function to obtain the object’s type. Internally, register-encoder calls the object? function on the first argument to provide a bit more flexibility, in case object? returns different values for different objects or based on the object’s properties. The third argument is a function that performs the encoding; it receives a callback for encoding nested values as its second argument.
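
This is not the library’s actual implementation, but the registration and dispatch idea can be sketched roughly like this (the names and storage layout are assumptions):

;; rough sketch only: encoders are stored under the singleton returned by the predicate
(local encoders {})

(fn register-encoder [obj obj-type encoder]
  (tset encoders (obj-type obj) {:pred obj-type :encode encoder}))

(fn find-encoder [x]
  ;; an encoder applies when its predicate returns the same singleton it was registered under
  (var found nil)
  (each [key {:pred pred :encode encode} (pairs encoders) &until found]
    (when (= key (pred x))
      (set found encode)))
  found)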

With that, we can register the encoder and encode our array as JSON by calling (json (Array 1 2 3)):

>> (json.register-encoder (Array) Array? encode-array)
nil
>> (json (Array 1 2 3))
"[1, 2, 3]"

This should provide enough flexibility to support any proxy-based objects.
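
And since encode-array calls the supplied encode callback on each element, nested arrays go through the same dispatch. A hypothetical check (the exact output formatting may differ):

>> (json (Array (Array 1 2) 3))
"[[1, 2], 3]"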

Custom decoders

I decided not to add custom decoder support to this library yet. JSON doesn’t have any custom types, so it doesn’t make sense to parse its values into anything but the default Lua types.

One could argue that it would be nice to have a way to parse arrays into zero-based ones, given that we can convert zero-based arrays to JSON. While true, I think this would only add unnecessary complexity to the parser.

One reason registering an encoder requires a singleton as the type designator is to ensure that objects can’t be mixed up. If it were allowed to register encoders via string type names, it would be possible to register two "Array" types, making it ambiguous which encoder would be used, or requiring a rename in case of a conflict.

JSON has a fixed set of types, so registering a decoder for, say, Array would mean that all arrays would be parsed into this custom type in every part of the project. That means that if you use a different library that also uses this JSON parser and internally expects one-based arrays, it will break.

I’m open to changing my mind on this, but for now, I feel that custom encoders are sufficient to make this library more usable. As a part of this change, I’m moving this library out of the fnl-http repo into its own project, making fnl-http depend on it instead, so that other projects can use this parser more easily. You can find the code here.

Thanks for reading, and I hope this library will be useful!

Footnotes


  1. The deprecation of the __ipairs metamethod in Lua 5.3 resulted in an inability to implement custom iteration over custom sequential data structures. For example, the zero-indexed array example would benefit from __ipairs because we could specify that the iteration starts from 0. With the default ipairs we would always miss the first value. It’s not hard to fix manually by overriding ipairs with a custom function that checks for the __ipairs metamethod:

           (local ipairs* ipairs)
           (fn _G.ipairs [tbl ...]
             (case (getmetatable tbl)
               {:__ipairs ip} (ip tbl ...)
               _ (ipairs* tbl ...)))
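
    For the zero-based Array from earlier, a hypothetical __ipairs metamethod could then start the iteration at 0 (a sketch attached to a single instance):

           (local arr (Array :a :b :c))
           ;; sketch: iterate from index 0, usable once ipairs is overridden as above
           (tset (getmetatable arr) :__ipairs
                 (fn [t]
                   (values (fn [t i]
                             (let [i (+ i 1)
                                   v (. t i)]
                               (when (not= nil v)
                                 (values i v))))
                           t -1)))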
    

    However, it’s not practical to intrude into Lua’s standard library in such a way. ↩︎