Andrey Listopadov

fnl-http - testing with fennel-test

@testing @programming fennel async ~14 minutes read

fnl-http is my current passion project - I spend a lot of free time tinkering with it, and the last week was spent on testing and fixing bugs.

As you may know, I made a testing framework called fennel-test, which has a dedicated test runner and a set of macros for writing tests. I use it in all of my projects, and oftentimes a project requires more features than are currently available in the framework, so I extend it as needed. I think it’s a great way of designing something - you use it, and enhance it as you encounter new needs.

I want programming to be exciting, so I spent a lot of time making my test report nice to look at. Here’s how the report looks by default:

Test run at Tue Aug  6 23:20:41 2024, seed: 1722975641242
(....)(......)(.)(.....)(.)(........)(...)(...)

The default reporter is called dots. Each test namespace is encoded with parentheses (), each successful test is encoded as a dot ., and failed tests are encoded as F. The runner randomizes the order of tests each time, so if we run it again, the result would be different:

Test run at Tue Aug  6 23:24:10 2024, seed: 1722975850235
(......)(........)(.)(...)(...)(....)(.....)(.)

However, this is all you get with a reporter like this. If a test fails, information about the error is printed only after all tests have finished, so that’s when you learn which test failed.

There’s another inbuilt reporter, called namespaces:

Test run at Tue Aug  6 23:27:33 2024, seed: 1722976053226
test.http-parser-test: PASS
test.client-test: PASS
test.json-test: PASS
test.httpbin-test: PASS
test.body-test: PASS
test.headers-test: PASS
test.readers-test: PASS
test.url-test: PASS

As you can see, it prints each namespace that is currently being tested, followed by the namespace-wide status. If a test fails, you again see its error message, but only after all tests have finished, which isn’t ideal because tests in the fnl-http project take about two minutes to complete.

Fortunately enough, you can create reporters in the .fennel-test configuration file!

Reporters

A reporter is just a table with a set of functions, each called when the corresponding stage of the test run is reached. The methods are:

  • ns-start - called when entering a new test namespace.

    Used to print the information about the namespace, such as its name.

  • ns-report - called when exiting a test namespace.

    Used to print the overall status of the namespace.

  • test-start - called when running a single test from a namespace.

    Used to print the information about the test, such as the test’s name.

  • test-report - called after the test is finished.

    Used to print the status of the test.

  • stats-report - called after running all tests in all namespaces.

    Used to print overall statistics of tests - how many tests were executed, how many errors were encountered, were there any warnings, etc.
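For illustration, here’s what a minimal reporter built from these methods could look like. Note that the argument lists below are my assumptions for the sake of the example - check the fennel-test documentation for the actual signatures:

;; A minimal reporter sketch.  The method names come from the list
;; above; the arguments each method receives are assumed here.
(local minimal-reporter
  {:ns-start (fn [ns] (io.write ns ": "))
   :ns-report (fn [ns ok?] (print (if ok? "PASS" "FAIL")))
   :test-start (fn [test-name] nil)
   :test-report (fn [ok? test-name] (io.write (if ok? "." "F")))
   :stats-report (fn [stats] nil)}) ; print totals, warnings, etc. here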

This was enough for me to implement other kinds of reporters, like the one in the project’s readme. However, I wanted a bit more for this project: first, I wanted the report to be fun to look at; second, I wanted some statistics; and third, I noticed the need to skip tests in case something is not available on the machine.

After a bit of work, here’s what I’ve got:

With this reporter, if a specific test fails, I can see it immediately, without waiting for the whole run to finish. At the end, you can see some statistics on the longest tests, and the average test time per namespace. For example, the test.httpbin-test namespace took 90 seconds to complete, but on average each test only takes 11 seconds. I’m not entirely sure why this metric would be necessary, but I’ve seen it in another test runner, so I implemented it in my reporter.

Now, here’s a problem - one of the tests takes a whopping 90 seconds. I don’t want to run this test all the time, especially when I’m fixing a bug found by some other test. So I introduced a skip-test function as a first step.

Originally, this function had two modes - terminating and non-terminating, but I later removed the second mode. The reason for that is the with-open macro.

Here’s an example of one of the fixtures I use in the fnl-http project:

(use-fixtures
 :once
 (fn [t]
   (with-open [proc (io.popen (.. "podman run  -p " port ":80 kennethreitz/httpbin >/dev/null 2>&1 & echo $!"))]
     (let [pid (proc:read :*l)
           attempts 10]
       (if (wait-for-server attempts)
           (do (t)
               (kill pid))
           (do (kill pid)
                (skip-test (.. "couldn't connect to httpbin server after " attempts " attempts"))))))))

The with-open macro ensures that the resource is closed no matter what. Closing the process handle, however, doesn’t kill the program running in the background, so I had to get its PID with the sketchy echo $! part and kill it later. And killing the process doesn’t release the file descriptor created by popen, so with-open is only responsible for releasing the file handle here.

A sudden bug

What with-open also does is catch and re-throw errors. It’s something akin to try-with-resources in Java and other languages that support this kind of abstraction. You can think of it as try/finally without the catch part.

Lua, however, doesn’t really have well-defined exceptions. Instead, functions usually throw string messages about the error and that’s it. Sometimes, however, it’s useful to throw a table instead. Tables are like objects, so you can embed some additional information in those, making it easier to match a particular error. It’s better to convert this table to a string if it ever reaches the user uncaught though.
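As a quick illustration of why table errors are convenient, here’s how one might throw and match a structured error. The :kind field and all the names here are made up for this example, not something from fnl-http:

;; Throwing a table lets the caller match on structured error data
;; instead of parsing a message string.
(fn fail [reason]
  (error {:kind :my-error :reason reason}))

(case (pcall fail "oops")
  (false {:kind :my-error : reason}) (print "caught:" reason)
  (false msg) (error msg)) ; not our error - re-throw it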

What makes things harder is that in Lua it’s not possible to re-throw an error without losing the previous stack trace. It is a common practice to use pcall to catch any errors and re-throw the ones you can’t do anything about, but this is exactly what loses the precious stack trace. More precisely, even if the error message already carries a stack trace, re-throwing it with error doesn’t add your current stack trace to it. Sounds convoluted, yeah?

So Lua gives you another option: xpcall. The Lua manual doesn’t have a thorough explanation of it though:

xpcall (f, msgh [, arg1, ···])

This function is similar to pcall, except that it sets a new message handler msgh.

It works the same as pcall, except it accepts an additional argument - a function that can append a stack trace to the error. Usually it’s the debug.traceback function:

debug.traceback ([thread,] [message [, level]])

If message is present but is neither a string nor nil, this function returns message without further processing. Otherwise, it returns a string with a traceback of the call stack. The optional message string is appended at the beginning of the traceback. An optional level number tells at which level to start the traceback (default is 1, the function calling traceback).
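Putting the two together, the usual pattern looks something like this sketch: xpcall invokes the handler while the stack is still intact, so debug.traceback can record the frames that led to the error.

(fn risky []
  (error "something went wrong"))

;; The message handler runs before the stack unwinds, so the
;; resulting message carries a full traceback.
(case (xpcall risky debug.traceback)
  (false msg) (io.stderr:write msg "\n"))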

Knowing all that, let’s look at how I’ve implemented the skip-test function:

(local Skip
  (setmetatable
   {}
   {:__tostring #:Skip
    :__fennelview #:Skip}))

(fn skip-test [reason]
  "Calling this function inside a test or a fixture will stop the test
early and mark it as skipped. The optional `reason` argument is a
message to display in the log if the reporter is configured to do so."
  (error [Skip reason]))

Right - skip-test throws an array with a special table object, defined as a singleton, so that the test runner can distinguish ordinary test errors from skipped tests. In theory, everything should have worked, but for some reason some tests were skipped correctly while others weren’t.
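On the runner side, telling a skip apart from a genuine failure then boils down to an identity check against the Skip singleton. A rough sketch, not the actual fennel-test runner code:

;; Sketch: `Skip` is the singleton table defined earlier, so a plain
;; identity comparison is enough to recognize a skipped test.
(fn run-test [test]
  (case (pcall test)
    true :pass
    (false err) (if (and (= :table (type err)) (= Skip (. err 1)))
                    (values :skip (. err 2))
                    (values :fail err))))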

The culprit? It’s with-open, obviously, otherwise, why would I take such a tangent about it?

Internally, with-open uses the xpcall function to do its job. And theoretically, it should work just fine, however, it doesn’t use debug.traceback if Fennel’s own compiler.traceback is available.

Originally, I thought that’s just how it is, and implemented two modes in skip-test. After a bit of digging and consulting the manual, I found out that compiler.traceback mistakenly converted every message to a string. Fixing that made the skip-test function work in all cases.

Another sudden bug!

After the patch landed, and I updated Fennel to the newest commit, something else broke. Now, the JSON parsing test no longer succeeded, complaining that some number has no integer representation. The number in question is 23456789012000000000000000000000000000000000000000000000000000000000000000000 or, as it is written in the JSON test file that I use, 2.3456789012e+76.

Before the update, everything worked fine, so I started digging again. Turns out, recently, the way numbers are compiled by Fennel has changed:

@@ -534,8 +534,9 @@ (fn compile-sym
 ;; We do gsub transformation because some locales use , for
 ;; decimal separators, which will not be accepted by Lua.
 (fn serialize-number [n]
-  (pick-values 1 (-> (tostring n)
-                     (string.gsub "," "."))))
+  (if (= (math.floor n) n)
+      (string.format "%d" n)
+      (string.gsub (tostring n) "," ".")))

And now it is failing, because indeed, you can’t convert 2.3456789012e+76 to a string using %d. Except, you can if you’re using LuaJIT - it happily converts this number to "-9223372036854775808", which is even worse.

After a bit of searching, I found that the safest way to format integers as strings is by using the "%.f" format specifier. Strangely enough, the Lua manual doesn’t have any information on most number formatting patterns. A tiny bit of information was found on the lua-users.org page:

To make sure integer numbers are properly converted into string (without scientific notation), rather than using tostring(x), use string.format("%.0f",x)

I don’t really get why %.f would format a number as an integer, given that %f formats floats. And even though it works the same way in C, I had never seen it before. But OK, we could just replace %d with %.f and be done with it. However, the number above would then appear as "23456789012000000697746671432670411048021854502345623879537911769302235611136". I get it, it’s a big number, and Lua has limited floating-point precision, but the main problem is that this number is much harder to read than its e-notation brother 2.3456789012e+76.
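You can compare the two representations in the REPL yourself; %.f forces the plain all-digits form, while tostring keeps the e-notation:

(print (string.format "%.f" 2.3456789012e+76)) ; 77 digits, hard to read
(print (tostring 2.3456789012e+76))            ; "2.3456789012e+76"
;; (string.format "%d" 2.3456789012e+76) raises "number has no integer
;; representation" on PUC Lua 5.3+; LuaJIT silently wraps around instead.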

So I settled on finding a way to fix this bug so that both PUC Lua and LuaJIT would format this number in the same way. The result is a patch that changes the compiler and pretty-printer parts of the language. The function that does that looks horrifying:

(fn serialize-number [n]
  (let [val (if (= (math.floor n) n)
                (let [s1 (string.format "%.f" n)]
                  (if (= s1 (tostring n)) s1
                      (or (faccumulate [s nil i 0 99 :until s]
                            (let [s (string.format (.. "%." i "e") n)
                                  n* (tonumber s)]
                              (when (= n n*)
                                (let [exp (s:match "e%+?(%d+)$")]
                                  ;; Lua stops transforming numbers from
                                  ;; e-notation to integers at e+14
                                  (if (and exp (> (tonumber exp) 14))
                                      s
                                      s1)))))
                          s1)))
                (tostring n))]
    (pick-values 1 (string.gsub val "," "."))))

We start by checking if (tostring n) and (string.format "%.f" n) give the same string. If they do, we just return it. If not, we try the scientific notation with each precision from 0 to 99. I probably should’ve tested more precisions than that, but it’s OK for now.

We convert the number to a string using the e-notation and parse it back, comparing the result with the original number. If the precision is too low, we lose information, and the comparison fails. If no precision up to 99 round-trips, we bail out and return the %.f representation. Otherwise, once the numbers match, we check whether the exponent is greater than 14, because e+14 is where PUC Lua starts using the e-notation. If it isn’t, we again return the %.f representation.

Phew!

But I’m not done yet!

Let’s try to compile a really huge number:

>> 1e+99999
nil

What?

Let’s do it in an expression then, maybe that’s just the pretty printer freaking out:

>> (+ 1e+309 1)
attempt to perform arithmetic on a nil value (global 'inf')

What? I, of course, jest, but the reason this happens is pretty peculiar.

You see - when we do (string.format "%.f" n) on a huge number like 1e+309 it returns the string "inf". The compiler then happily includes it in the output Lua code, which is then run by the Lua VM. In Lua, however, inf is not a reserved identifier or a number, so it treats inf as a variable name. The inf variable is unbound, hence the nil in the first example, and an error in the second.

So we need to account for that:

(fn serialize-number [n]
  (let [val (if (= (math.floor n) n)
                (let [s1 (string.format "%.f" n)]
                  (if (= s1 "inf") "(1/0)" ; portable inf
                      (= s1 "-inf") "(-1/0)"
                      (= s1 (tostring n)) s1 ; no precision loss
                      (or (faccumulate [s nil
                                        i 0 308 ; beyond 308 every number turns to inf
                                        :until s]
                            (let [s (string.format (.. "%." i "e") n)]
                              (when (= n (tonumber s))
                                (let [exp (s:match "e%+?(%d+)$")]
                                  ;; Lua keeps numbers in standard notation up to e+14
                                  (if (and exp (> (tonumber exp) 14))
                                      s
                                      s1)))))
                          s1)))
                (tostring n))]
    (pick-values 1 (string.gsub val "," "."))))

I also raised the precision limit to 308 - the largest exponent representable in this notation before the number becomes inf.

Now we can go back to testing!

Counting skipped tests and assertions

The other thing I added to fennel-test was more counters. Counting skipped tests required some tinkering, like counting every test in a namespace as skipped when the skip happens in the :once fixture.

Counting assertions was even harder, because tests in fennel-test are implemented through macros, and assertions are macros too. I had to conditionally inject some runner state and modify it at runtime from each assertion. A bit shady, but it works.
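In spirit, the trick looks something like this simplified sketch - the names and the shape of the injected state are illustrative, not fennel-test’s actual internals:

;; The runner creates a shared state table; every assertion macro
;; expands to code that bumps the counter before asserting.
(local state {:assertions 0})

(macro assert-is [expr msg]
  `(do
     (set state.assertions (+ state.assertions 1))
     (assert ,expr ,msg)))

(assert-is (= (+ 1 2) 3) "math is broken")
(print state.assertions) ; 1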

What is much more shady is the coverage reports for this project.

Coverage report

I have a firm belief that 100% is the only acceptable coverage. However, and this is a big however, it’s important that such coverage comes from public API tests only. Achieving 100% coverage this way means that your program doesn’t have any dead code and that your public API tests touch all of your code at least once. It doesn’t speak to test quality, of course, and it doesn’t guarantee that the program is bug-free, but those points alone make a huge difference.

Lua has one and only one coverage tool - luacov - that you can use to measure coverage of your code. I’ve used it in some other Fennel projects in the past, but it comes with some nuances you have to deal with.

First, luacov itself is imprecise. In the post about 100% coverage I talk about it in more detail, but basically, I often see situations where the side effects happened, but the coverage report showed code responsible for the side effects as untouched. Other times, the coverage percentage is bigger than its actual value due to code formatting.

With the fnl-http project I hit a different problem. Two, actually.

The first one is related to the --correlate option of the Fennel compiler. I had an idea: if I used --correlate, I could substitute the Lua sources used to produce the coverage report with the Fennel sources. And since --correlate makes the line numbers match, the coverage report would be much easier to read.

Didn’t work out.

Unfortunately, --correlate is imprecise. Many parts of the Fennel AST don’t contain any line information. Here’s an example:

(local {: a
        : b}
  (require :c))

{: a
 : b}

And here’s the corresponding Lua code:

local _local_1_ = require("c") local a = _local_1_["a"]
local b = _local_1_["b"]
-- empty line
-- empty line
return {a = a, b = b}
-- empty line

As you can see, the overall amount of lines is the same, but many lines that are not empty in the Fennel example are empty in Lua. That’s not a problem as long as you’re measuring coverage and viewing the report against the Lua code, but if we replace the Lua code with its Fennel counterpart, the report becomes invalid. I guess when the report is constructed, luacov checks whether a line is empty, and if it isn’t and no hits were registered for it, it counts it as a miss. I’m using luacov-console to view the report - maybe it’s a bug there, I don’t know.

The only way to fix this is to write the code in a way that accounts for these nuances of --correlate:

(local {: a                ; local _local_1_ = require("c") local a = _local_1_["a"]
        : b} (require :c)) ; local b = _local_1_["b"]
                           ; <empty line>
{: a : b}                  ; return {a = a, b = b}

There are more examples like that in the project’s code.

The other problem is more peculiar. Luacov uses debug.sethook to build its report. And I use the async.fnl library in this project, which also uses debug.sethook to schedule tasks. So every time anything happens asynchronously, and trust me, there’s a lot of that in fnl-http, the Luacov report is no longer properly calculated.
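The clash is easy to picture: the debug hook is a single per-thread slot, so whichever library sets it last silently replaces the other’s hook. Stand-in functions here, for illustration:

(fn coverage-hook [] nil)  ; stand-in for luacov's line hook
(fn scheduler-hook [] nil) ; stand-in for async.fnl's hook

(debug.sethook coverage-hook :l)  ; luacov starts counting lines
(debug.sethook scheduler-hook :l) ; the coverage hook is now gone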

So, Luacov has a lot of problems, but I understand why it is like that. You can’t really instrument the Lua code in some other way.

Fennel, on the other hand, isn’t like that. Since it’s a language where everything is an expression, we could probably make a compiler plugin that injects side-effecting code to record coverage for every expression in the code. It would be difficult to do with things like tables and destructuring, but it should be doable. Maybe I’ll try to tackle this problem in the future, who knows?

Anyway, that’s all from me for now! As usual, you can check out the project, which contains all of the stuff mentioned above regarding fennel-test and luacov. Thanks for reading!