Andrey Listopadov

fnl-http Improvements

@programming async fennel ~14 minutes read

In the last post two weeks ago I described the process of making an asynchronous HTTP/1.1 client from scratch (minus the socket part). At the end, I mentioned that there’s a lot more to implement:

Now, of course, that’s not all that needs to be implemented. In its current state, this client doesn’t support responses that do not specify the Content-Length or produce a response body in chunked encoding. HTTPS is out of the question for now too. Most of the code just takes a “happy path” without much thought given to errors or more complicated scenarios. But it’s fine, I can always make it better down the line.

HTTPS is still out of the question - to be honest, I don’t want to deal with binary protocols that much, or get myself involved with implementing SSL in Lua. I could, of course, use a library, but I feel that Luasocket alone is already a big enough dependency.

So here are the improvements so far.

Chunked transfer-encoding

As previously mentioned, the client didn’t understand chunked responses, so I had to fix it with a chunked body reader:

(fn read-chunk-size [src]
  {:private true}
  ;; TODO: needs to process chunk extensions
  (case (src:read :*l)
    "" (read-chunk-size src)
    line
    (case (line:match "%s*([0-9a-fA-F]+)")
      size (tonumber (.. "0x" size))
      _ (error (format "line missing chunk size: %q" line)))))

(fn chunked-body-reader [src initial-chunk]
  "Reads body in chunks, buffering each fully, and requesting the next
chunk, once the buffer is empty."
  {:private true}
  (var chunk-size initial-chunk)
  (var buffer (or (src:read chunk-size) ""))
  (var more? true)
  (fn read-more []
    ;; TODO: needs to process entity headers after the last chunk.
    (when more?
      (set chunk-size (read-chunk-size src))
      (if (> chunk-size 0)
          (set buffer (.. buffer (or (src:read chunk-size) "")))
          (set more? false)))
    (values (> chunk-size 0) (string-reader buffer)))
  (make-reader
   src
   {:read-bytes (fn [_ pattern]
                  (let [rdr (string-reader buffer)]
                    (case pattern
                      (where n (= :number (type n)))
                      (let [buffer-content (rdr:read pattern)
                            len (if buffer-content (length buffer-content) 0)
                            read-more? (< len n)]
                        (set buffer (buffer:sub (+ len 1)))
                        (if read-more?
                            (let [(_ rdr) (read-more)]
                              (if buffer-content
                                  (.. buffer-content (or (rdr:read (- n len)) ""))
                                  (rdr:read (- n len))))
                            buffer-content))
                      (where (or :*l :l))
                      (let [buffer-content (rdr:read :*l)
                            read-more? (not (buffer:find "\n"))]
                        (when buffer-content
                          (set buffer (buffer:sub (+ (length buffer-content) 2))))
                        (if read-more?
                            (let [(_ rdr) (read-more)]
                              (if buffer-content
                                  (.. buffer-content (or (rdr:read :*l) ""))
                                  (rdr:read :*l)))
                            buffer-content))
                      (where (or :*a :a))
                      (let [buffer-content (rdr:read :*a)]
                        (set buffer "")
                        (while (read-more) nil)
                        (let [rdr (string-reader buffer)]
                          (set buffer "")
                          (case (rdr:read :*a)
                            nil (when buffer-content
                                  buffer-content)
                            data (.. (or buffer-content "") data))))
                      _ (error (tostring pattern)))))
    :read-line (fn [_]
                 ;; mirrors the :*l branch of read-bytes: when the buffer
                 ;; has no newline, the next chunk is requested with
                 ;; read-more, instead of reading the raw source, which
                 ;; would return a chunk-size line
                 (let [rdr (string-reader buffer)
                       buffer-content (rdr:read :*l)
                       read-more? (not (buffer:find "\n"))]
                   (when buffer-content
                     (set buffer (buffer:sub (+ (length buffer-content) 2))))
                   (if read-more?
                       (let [(_ rdr) (read-more)]
                         (if buffer-content
                             (.. buffer-content (or (rdr:read :*l) ""))
                             (rdr:read :*l)))
                       buffer-content)))
    :close (fn [src] (src:close))
    :peek (fn [_ bytes]
            (assert (= :number (type bytes)) "expected number of bytes to peek")
            (let [rdr (string-reader buffer)
                  content (or (rdr:read bytes) "")
                  len (length content)]
              (if (= bytes len)
                  content
                  ;; read-more refills the buffer and returns a fresh
                  ;; reader over it, so reading from that reader peeks
                  ;; without consuming the buffer
                  (let [(_ rdr) (read-more)]
                    (or (rdr:read bytes) content)))))}))

It’s a mess, but it works. Please refer to the previous post for an explanation of what a reader is.

The main idea here is that, when chunked transfer encoding is detected, we create a reader with chunked-body-reader instead of a regular body-reader, and set up an internal string buffer. We then serve all reading operations from this buffer until it is depleted, or until we request more data than is left in it. After that, we download the next chunk.
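To make that concrete, here’s a sketch of the chunked wire format fed through the reader. Note that both functions are private in the actual module, the string-reader from the previous post stands in for a socket, and I’m assuming that, as with luasocket, line reads strip the trailing carriage return:

;; chunk sizes are hexadecimal, each chunk is followed by CRLF,
;; and a zero-sized chunk terminates the body
(let [src (string-reader "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
      body (chunked-body-reader src (read-chunk-size src))]
  (body:read :*a))
;; => "Wikipedia"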

We still get stream-like behavior out of this, although it has a somewhat big issue - if the server responds with a chunk of enormous size, we can fill up all available memory. I’ve thought about reworking this function to fill the buffer with smaller chunks, but it complicates the code even further, so I decided not to. For now, at least.

And this now works both ways! We can send a chunked body to the server if we didn’t specify the Content-Length header and the body is not a string. This happens automatically for files, other readers, and channels. Yes, with this library, you can stream data to a server from a channel:

(let [ch (chan)]
  (go #(with-open [body (io.open "some-file" :r)]
         (each [line (body:lines)]
           (<! (timeout 300))
           (>! ch (.. line "\n")))
         (close! ch)))
  (http.post "http://some-server.org" {:body ch}))

This will stream the file line by line, sending each line every 300 milliseconds. The server will receive each line as a separate chunk.

I won’t go into much detail on how this is implemented, as I would need to go through a lot of code.
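The gist, though, is simple. Here’s a simplified sketch (not the library’s actual code) of chunk-encoding a channel body, using the <!? operation shown later in this post - each value taken from the channel is written out as one chunk, and a zero-sized chunk terminates the body once the channel is closed:

(fn stream-channel-body [dst ch]
  ;; each value taken from ch becomes one chunk on the wire
  (let [data (<!? ch)]
    (if data
        (do
          ;; hexadecimal size, CRLF, the data itself, CRLF
          (dst:write (string.format "%x\r\n%s\r\n" (length data) data))
          (stream-channel-body dst ch))
        ;; the channel was closed - terminate with a zero-sized chunk
        (dst:write "0\r\n\r\n"))))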

Multipart requests

After I finished with chunked requests, I turned my attention to multipart/form-data. It’s an interesting part of HTTP, and after a closer look at it, I realized that an email is just a multipart request with weird headers. I’ve been editing emails in Emacs for quite some time, but I never noticed that the head of a mail message is exactly like the head of an HTTP request. What a revelation!

That aside, implementing multipart came with a challenge. I don’t know if it’s just me, but I couldn’t find anywhere whether the total Content-Length header is required for a multipart request or not. Logically speaking, since every part is separated by a boundary, a server should be able to read a request’s body until the final boundary. So, theoretically, it doesn’t have to know the total content length of the request.

However, I’ve been testing my client against the httpbin.org service and, for some reason, it doesn’t parse a multipart request unless the Content-Length header is present. And it’s not just the sum of the part lengths - it also includes their headers, the empty lines that separate the headers from the content, and so forth.

So I had to implement a set of functions to format multipart parts and count their lengths:

(fn format-multipart-part [{: name : filename : filename*
                            : content :length content-length
                            : headers
                            : mime-type} boundary]
  "Format a single multipart entry.
The part starts with the `boundary`, followed by headers, created from
`name`, optional `filename` or `filename*` for files, `mime-type`, and
`content-length` which is either calculated from `content` or provided
explicitly.

Default headers include `content-disposition`, `content-length`,
`content-type`, and `content-transfer-encoding`. Provide `headers` to
add more headers or to override the default ones."
  {:private true}
  (let [content (wrap-body content)]
    (format
     "--%s\r\n%s\r\n"
     boundary
     (headers->string
      (collect [k v (pairs (or headers {}))
                :into {:content-disposition (format "form-data; name=%q%s%s" name
                                                    (if filename
                                                        (format "; filename=%q" filename)
                                                        "")
                                                    (if filename*
                                                        (format "; filename*=%s" (urlencode-string filename*))
                                                        ""))
                       :content-length (if (= :string (type content))
                                           (length content)
                                           (or content-length (content:length)))
                       :content-type (or mime-type (guess-content-type content))
                       :content-transfer-encoding (guess-transfer-encoding content)}]
        k v)))))

This function only formats the header section of a single multipart part, i.e. it doesn’t include the content. The content is streamed later, so we don’t need it here.
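For instance, a simple string part like the description part of the example request shown further down formats to the following header section (header order may vary, since the headers live in a table):

--a25537b6-51d6-491f-b18b-a33d197ae4c6\r
Content-Disposition: form-data; name="description"\r
Content-Length: 27\r
Content-Transfer-Encoding: 8bit\r
Content-Type: text/plain; charset=UTF-8\r
\r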

But with this function, we can now calculate the total length of our request:

(fn multipart-content-length [multipart boundary]
  "Calculate the total length of `multipart` body.
Needs to know the `boundary`."
  (+ (accumulate [total 0
                  _ {:length content-length
                     : name
                     : content
                     &as part}
                  (ipairs multipart)]
       (let [content (wrap-body content)]
         (+ total
            (length (format-multipart-part part boundary))
            (if (= :string (type content)) (+ (length content) 2)
                (reader? content)
                (+ 2 (or content-length
                         (content:length)
                         (error (format "can't determine length for multipart content %q" name) 2)))
                (not= nil content-length)
                (+ content-length 2)
                (error (format "missing length field on non-string multipart content %q" name) 2)))))
     (length (format "--%s--\r\n" boundary))))

This function doesn’t construct the whole body either - it only calculates the length based on the headers and the content fields whose length it can reasonably determine. There’s a small bug in the format-multipart-part function, though. We can override any header, including content-length, meaning we can break the request if we send a string and provide a different content-length for it, either by mistake or deliberately. This can be fixed, of course, either by not allowing the content length of strings to be overridden, or by wrapping string bodies in a string reader. I’ll have to think about which method is better here.
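For example, this hypothetical misuse would produce a malformed request - the part’s header claims 100 bytes, while only five are actually streamed, so a server reading by the declared length would run past the boundary:

;; hypothetical misuse: overriding the content-length header of a
;; string part so it disagrees with the five bytes actually streamed
(http.post "http://some-server.org"
           {:multipart [{:name "file"
                         :content "hello"
                         :headers {:content-length 100}}]})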

That aside, we can now stream the multipart request:

(fn stream-multipart [dst multipart boundary]
  "Write `multipart` entries to `dst` separated with the `boundary`."
  (each [_ {: name : filename
            : content :length content-length
            : mime-type
            &as part}
         (ipairs multipart)]
    (assert (not= nil content) "Multipart content cannot be nil")
    (assert name "Multipart body must contain at least content and name")
    (let [content (wrap-body content)]
      (->> (if (= :string (type content)) content "")
           (.. (format-multipart-part part boundary))
           (dst:write))
      (when (not= :string (type content))
        (stream-body dst content {:content-length (or content-length (content:length))})))
    (dst:write "\r\n"))
  (dst:write (format "--%s--\r\n" boundary)))

Again, I’m not going to dive too deep here. If you’re interested in the code, you can find and read it here.

Here’s an example of a multipart request body:

--a25537b6-51d6-491f-b18b-a33d197ae4c6\r
Content-Disposition: form-data; name="json-test"; filename="test.json"\r
Content-Length: 1438\r
Content-Transfer-Encoding: binary\r
Content-Type: application/octet-stream\r
\r
[
    "JSON Test Pattern pass1",
    {"object with 1 member":["array with 1 element"]},
    {},
    [],
    -42,
    true,
    false,
    null,
    {
        "integer": 1234567890,
        "real": -9876.543210,
        "e": 0.123456789e-12,
        "E": 1.234567890E+34,
        "":  23456789012E66,
        "zero": 0,
        "one": 1,
        "space": " ",
        "quote": "\"",
        "backslash": "\\",
        "controls": "\b\f\n\r\t",
        "slash": "/ & /",
        "alpha": "abcdefghijklmnopqrstuvwyz",
        "ALPHA": "ABCDEFGHIJKLMNOPQRSTUVWYZ",
        "digit": "0123456789",
        "0123456789": "digit",
        "special": "`1~!@#$%^&*()_+-={':[,]}|;.</>?",
        "hex": "\u0123\u4567\u89AB\uCDEF\uabcd\uef4A",
        "true": true,
        "false": false,
        "null": null,
        "array":[  ],
        "object":{  },
        "address": "50 St. James Street",
        "url": "http://www.JSON.org/",
        "comment": "// /* <!-- --",
        "# -- --> */": " ",
        " s p a c e d " :[1,2 , 3

,

4 , 5	   ,  		        6           ,7        ],"compact":[1,2,3,4,5,6,7],
        "jsontext": "{\"object with 1 member\":[\"array with 1 element\"]}",
        "quotes": "&#34; \u0022 %22 0x22 034 &#x22;",
        "/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?"
: "A key can be any string"
    },
    0.5 ,98.6
,
99.44
,

1066,
1e1,
0.1e1,
1e-1,
1e00,2e+00,2e-00
,"rosebud"]
\r
--a25537b6-51d6-491f-b18b-a33d197ae4c6\r
Content-Disposition: form-data; name="description"\r
Content-Length: 27\r
Content-Transfer-Encoding: 8bit\r
Content-Type: text/plain; charset=UTF-8\r
\r
a JSON file to test parsers\r
--a25537b6-51d6-491f-b18b-a33d197ae4c6--\r

And here’s this request issued with the client library:

>> (http.post "http://httpbin.org/post"
              {:multipart [{:name "json-test"
                            :content (io.open "test/data/valid.json")
                            :filename "test.json"}
                           {:name "description" :content "a JSON file to test parsers"}]
               :as :json})
{:body {:args {}
        :data ""
        :files {:json-test "..."} ;; contents were excluded to save some vertical space
        :form {:description "a JSON file to test parsers"}
        :headers {:Content-Length "1946"
                  :Content-Type "multipart/form-data; boundary=------------5af9b616-6e9d-4c19-8aa5-cc0c5f52ca11"
                  :Host "httpbin.org"
                  :X-Amzn-Trace-Id "Root=1-66aad04d-5155e53a4023b5726bdd134e"}
        :url "http://httpbin.org/post"}
 :headers {:Access-Control-Allow-Credentials "true"
           :Access-Control-Allow-Origin "*"
           :Connection "keep-alive"
           :Content-Length "2118"
           :Content-Type "application/json"
           :Date "Thu, 01 Aug 2024 00:01:18 GMT"
           :Server "gunicorn/19.9.0"}
 :http-client #<SocketChannel: 0x559596d0e200>
 :length 2118
 :protocol-version {:major 1 :minor 1 :name "HTTP"}
 :reason-phrase "OK"
 :request-time 388
 :status 200}

As you can see, the JSON got into the files field, and the description went into the form field.

Redirects and query-params

Honestly, implementing redirection was a pain in the butt. But I’m glad that I did!

First, I now finally understand how redirection works. Second, I had to do a major code refactoring for it, which made some parts more approachable and robust. That was a lot of changes, though.
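Conceptually, following redirects boils down to re-issuing the request against the Location header until a non-redirect response arrives or the limit runs out. Here’s a simplified sketch, not the actual refactored code, where send-request stands in for the function that performs a single HTTP exchange:

(fn follow-redirects [request response max-redirects]
  ;; a 3xx response with a Location header triggers another request
  (if (and (<= 300 response.status 399)
           (. response.headers :Location))
      (if (<= max-redirects 0)
          (error "too many redirects")
          (follow-redirects request
                            (send-request
                             (doto request
                               (tset :url (. response.headers :Location))))
                            (- max-redirects 1)))
      response))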

Alongside redirects, I improved the URL parser and implemented a query builder that can merge multiple queries into one. Here’s an example, again with httpbin.org:

>> (http.get "http://httpbin.org/redirect-to?status_code=302"
             {:query-params {:url "http://httpbin.org/get"}
              :as :json})
{:body {:args {}
        :headers {:Host "httpbin.org"
                  :X-Amzn-Trace-Id "Root=1-66aad21a-4c52d9316b88f5d862a4b535"}
        :url "http://httpbin.org/get"}
 :headers {:Access-Control-Allow-Credentials "true"
           :Access-Control-Allow-Origin "*"
           :Connection "keep-alive"
           :Content-Length "199"
           :Content-Type "application/json"
           :Date "Thu, 01 Aug 2024 00:08:58 GMT"
           :Server "gunicorn/19.9.0"}
 :http-client #<SocketChannel: 0x559596734da0>
 :length 199
 :protocol-version {:major 1 :minor 1 :name "HTTP"}
 :reason-phrase "OK"
 :request-time 394
 :status 200}

As can be seen, the query params from the URL string and the query-params field of the options table were merged. During the merge, all values are URL-encoded, so you can pass in URLs that are not entirely correct, and it should still work.
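In other words, something like the following sends a single query string combining both sources (a sketch; the exact percent-encoding may differ):

;; the ?a=1 from the URL and :b from query-params are merged into one
;; query string, with values URL-encoded along the way
(http.get "http://httpbin.org/get?a=1"
          {:query-params {:b "two words"}
           :as :json})
;; httpbin's response would then report {:a "1" :b "two words"}
;; in its :args table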

Of course, we can change how we deal with redirects, limiting the total number of redirects:

>> (http.get "http://httpbin.org/absolute-redirect/4"
             {:max-redirects 3
              :as :json})
too many redirects

Or, we can choose not to follow redirects altogether:

>> (http.get "http://httpbin.org/absolute-redirect/4"
             {:follow-redirects? false})
{:body "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href=\"http://httpbin.org/absolute-redirect/3\">http://httpbin.org/absolute-redirect/3</a>.  If not click the link."
 :headers {:Access-Control-Allow-Credentials "true"
           :Access-Control-Allow-Origin "*"
           :Connection "keep-alive"
           :Content-Length "283"
           :Content-Type "text/html; charset=utf-8"
           :Date "Thu, 01 Aug 2024 00:24:48 GMT"
           :Location "http://httpbin.org/absolute-redirect/3"
           :Server "gunicorn/19.9.0"}
 :http-client #<SocketChannel: 0x5589fcb69530>
 :length 283
 :protocol-version {:major 1 :minor 1 :name "HTTP"}
 :reason-phrase "FOUND"
 :request-time 228
 :status 302}

Changes in the asynchronous API

I’ve decided to make it easier for users of this library to work with the client’s asynchronous API, so you are no longer required to use async.fnl directly. Instead, every request function now accepts callbacks for processing responses and exceptions:

(http.get "https://lua-users.org"
          {:async? true}
          (fn on-response [resp] (print resp.body))
          (fn on-raise [err] (print err.status)))

This means that the only remaining need for async.fnl is when you have to wait for certain requests to complete. It also means that the http.METHOD functions will no longer return a promise channel, as I had promised in the previous post.
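For example, blocking until an asynchronous request completes could look like this - a sketch using only the channel operations already shown in this post:

(let [result (chan)]
  (http.get "http://httpbin.org/get"
            {:async? true}
            ;; the callbacks run inside a go context, so the parking >!
            ;; can be used here, just like in the benchmark further down
            #(>! result [:ok $])
            #(>! result [:error $]))
  ;; block the main thread until either callback delivers
  (case (<!! result)
    [:ok resp] resp.status
    [:error err] (error err)))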

In addition to that, it is now possible to read the response body, returned as a stream, on the main thread, even if the request was asynchronous. This is possible because I’ve cheated:

(fn <!? [port]
  "Takes a value from `port`.  Will return `nil` if closed.  Will block,
if nothing is available and used on the main thread.  Will park if
used in the `(go ...)` block."
  (if (main-thread?)
      (<!! port)
      (<! port)))

(fn >!? [port val]
  "Puts a `val` into `port`.  `nil` values are not allowed.  Will block,
if no buffer space or consumer are available and used on the main
thread.  Will park if used inside a `(go ...)` block.  Returns `true`
unless `port` is already closed."
  (if (main-thread?)
      (>!! port val)
      (>! port val)))

Now try to do something like that in Clojure! You can read about why this works in my post on async.fnl, or watch my talk at FennelConf 2023, where I explain it with some more examples.

Performance check

Since the client got bigger and more complicated, let’s check the performance! In the previous post, the client couldn’t accept a chunked response, but now it can, which means we can compare it with Luasocket’s HTTP client:

>> (local lhttp (require :socket.http))
nil
>> (= (lhttp.request "http://lua-users.org/wiki/FiltersSourcesAndSinks")
      (. (http.get "http://lua-users.org/wiki/FiltersSourcesAndSinks") :body))
true
>>

Let’s measure the time again:

>> (time (do (http.get "http://lua-users.org/wiki/FiltersSourcesAndSinks") nil))
Elapsed 501.31988525391 ms
nil
>> (time (do (lhttp.request "http://lua-users.org/wiki/FiltersSourcesAndSinks") nil))
Elapsed 630.47289848328 ms
nil

It’s about the same. However, we can be slightly faster if the user doesn’t need to process the body immediately:

>> (time (do (http.get "http://lua-users.org/wiki/FiltersSourcesAndSinks" {:as :stream}) nil))
Elapsed 317.51012802124 ms
nil

Of course, the time is saved by not reading the body of the response, which shows that reading the body alone takes about 200 ms. But Luasocket’s HTTP client can’t do that at all - it always processes the body, even if the generic form is used and a sink is supplied. That’s an advantage, I would say!
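Since :as :stream hands back a reader, the body can then be consumed later, in whatever increments are convenient (a sketch):

(let [resp (http.get "http://lua-users.org/wiki/FiltersSourcesAndSinks"
                     {:as :stream})]
  ;; the body is not consumed until we ask for it; here we pull
  ;; only the first kilobyte
  (print (resp.body:read 1024)))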

And, thanks to the asynchronous processing of requests, we can actually issue multiple requests at the same time:

>> (let [n-requests 10
         ch (chan)]
     (print ";;; luasocket")
     (time
      (let [start (gettime)]
        (for [i 1 n-requests]
          ;; issuing a synchronous request
          (lhttp.request "http://lua-users.org/wiki/FiltersSourcesAndSinks")
          (io.write i " response processed in " (- (gettime) start)
                    " ms since the first request\n"))))
     (print ";;; fnl-http")
     (time
      (let [start (gettime)]
        (for [i 1 n-requests]
          ;; issuing multiple requests. The on-success handler
          ;; will put the request number into the channel `ch`
          ;; on completion.
          (http.get "http://lua-users.org/wiki/FiltersSourcesAndSinks"
                    {:async? true} #(>! ch [i (- (gettime) start)]) #nil))
        (for [i 1 n-requests]
          ;; awaiting each response
          (let [[i ms] (<!! ch)]
            (io.write i " response processed in " ms
                      " ms since the first request\n"))))))
;;; luasocket
1 response processed in 0.55836296081543 s since the first request
2 response processed in 1.0487260818481 s since the first request
3 response processed in 1.4974920749664 s since the first request
4 response processed in 1.9714229106903 s since the first request
5 response processed in 2.4879760742188 s since the first request
6 response processed in 3.010272026062 s since the first request
7 response processed in 3.6146268844604 s since the first request
8 response processed in 4.1402359008789 s since the first request
9 response processed in 4.6472299098969 s since the first request
10 response processed in 5.1433019638062 s since the first request
Elapsed 5143.6059474945 ms
;;; fnl-http
5 response processed in 1.3388519287109 s since the first request
7 response processed in 1.4401869773865 s since the first request
9 response processed in 1.5485229492188 s since the first request
4 response processed in 1.6689200401306 s since the first request
8 response processed in 1.7789130210876 s since the first request
6 response processed in 1.9086010456085 s since the first request
1 response processed in 2.0185270309448 s since the first request
10 response processed in 2.1092948913574 s since the first request
2 response processed in 2.2185280323029 s since the first request
3 response processed in 2.3184778690338 s since the first request
Elapsed 2318.8180923462 ms

Still ‘bout twice as fast!

And, as you can see, the order of request completion in the asynchronous case is different, since multiple requests were processed in parallel. Well, it’s not really parallel execution, since the Lua runtime only has one thread and all response reading is synchronous, but the moment we’ve written a request into the socket, the operating system and networking hardware can act independently. Thus, the requests were sent out back to back, and the server on the other side received a series of them, probably processing them in parallel, which resulted in a faster overall response. While we wait for the data to come back to our socket, other tasks can be executed, such as making another request or processing a part of another response. And when we do process a response, we do it asynchronously as well, creating opportunities for other asynchronous threads to collaborate.

Neat stuff!

Even further development, I guess?

When I started this project, I just wanted to use async.fnl for something more real and see if it works. After working on it for about a month, and reading the source code of the clj-http client, the project gradually became something like a re-implementation of clj-http in Fennel. So I changed the project’s name to fnl-http. It’s not a one-to-one port like async.fnl, where I studied the source code - here, all the code is written from scratch, without looking at other clients - but I still tried to make it faithful.

There are still a lot of parts of HTTP/1.1 that I haven’t covered, and I still have some refactoring ideas and TODOs, so stay tuned!