String interpolation in Lua
If you know Python or do a lot of shell scripting (or even write in Perl), you’re probably familiar with the ability of these languages to reference variables or even expressions inside string literals.
In a shell, it is possible to do this:
echo "Hello, $user"
# prints: Hello, <your username here>
echo "2 + 2 = $(echo 2 + 2 | bc)"
# 2 + 2 = 4
A similar feature exists for Python with f-strings:
f"{2 + 2 = }"
# '2 + 2 = 4'
That’s a cool feature! I wish I could do this in Lua!
Unfortunately, Lua doesn’t have such a feature, and we’re stuck with plain ol’ string.format().
Which is fine, but it has some quirks.
For instance, when you need to repeat the same thing several times in the resulting string, you have to repeat it the same number of times in both the format string and the arguments:
function provoke_thought (what)
  print(string.format("to %s or not to %s", what, what))
end
provoke_thought("be") -- to be or not to be
provoke_thought("bee") -- to bee or not to bee
With f-strings in Python, it can be done like this:
def provoke_thought (what):
    print(f"to {what} or not to {what}")
provoke_thought("bee") # to bee or not to bee
It’s a bit nicer, as you don’t have to repeat both the format specifier and the variable twice. Another convenience shows up when you have a lot of arguments:
print(string.format("this string has %d amount of arguments, which are %d (a %s), %q (a %s), and %p (a pointer)",
n,
answer,
type(answer),
greeting,
type(greeting),
string.format))
Assuming these variables are defined somewhere, this prints the following message:
this string has 4 amount of arguments, which are 42 (a number), "hi mom!" (a string), and 0x7f0d5fb7f1b0 (a pointer)
With this many arguments, the call is not much fun to maintain, especially when you need to modify the message: reordering format specifiers, or adding and removing some.
Let’s see how this would look with f-strings in Lua:
print(f"this string has {n} amount of arguments, which are {answer} (a {type(answer)}), {greeting:%q} (a {type(greeting)}), and {string.format:%p} (a pointer)")
-- this string has 4 amount of arguments, which are 42 (a number), "hi mom!" (a string), and 0x7f0d5fb7f1b0 (a pointer)
Wait, didn’t I just say that Lua doesn’t have f-strings?
That’s where I present to you interpolation.lua.
interpolation.lua
A clever name, I know.
This is a small library I made that handles string interpolation in Lua in a way similar to Python’s f-strings. There are some differences, of course, but in general it should feel familiar if you have ever used some variant of this feature.
Let’s look at some of its capabilities.
You require this library like this:
local f = require "interpolation"
And now you have the function f, which accepts a string, parses it, and spits out the result, formatted according to the expansions in {}.
When there are no expansions, the string is returned as-is:
f"hi, mom"
hi, mom
However, when there are expansions, you can do various things. For example, you can define some variables, and reference them in the string:
local x, y = 4, 2
f"{x}{y}"
42
You can even evaluate expressions:
local x, y = 40, 2
f"The Answer is: {x + y}"
The Answer is: 42
One cool feature is the = modifier, which adds the original expression to the output string:
local x, y = 40, 2
f"{x + y = }"
x + y = 42
It’s like doing literate programming in Org-mode, where I can do a similar thing by writing src_lua[:exports both :post format-result(result=*this*) :wrap]{return 1 + 2}, and it will expand to return 1 + 2 = 3.
Except much less verbose.
As I showed earlier, the same format specifiers as in string.format are available:
local x, y = 40, 2
f"{x + y = :%X}"
x + y = 2A
Now, let’s look at how this is implemented, because there are some tricks involved.
The parser
First things first, this is not a mere string substitution, where I look for something like {([^}]*)} and replace the entire thing with some value.
Instead, there’s a parser, which reads the string byte-by-byte until it encounters the { character.
To read a string byte-by-byte, I had to port a string-reader from my reader.fnl project.
After passing a string to the string-reader function, we get back a reader object with the following methods: read, lines, peek, length, and close.
This library doesn’t really use anything but read, peek, and close, but it’s still good to have the rest.
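The reader.fnl port itself isn’t shown here. As an illustration, here is a minimal sketch of a reader with this interface. It rests on several assumptions. string_reader is written from scratch rather than ported. lines is left out. peek and read return nil at the end of input, which matches how parse checks peek(1) against nil.

```lua
-- Hypothetical minimal reader; not the actual port from reader.fnl.
local function string_reader(str)
  local pos, closed = 1, false
  local reader = {}

  local function check()
    assert(not closed, "attempt to use a closed reader")
  end

  -- Return up to n bytes without advancing; nil at end of input.
  function reader:peek(n)
    check()
    if pos > #str then return nil end
    return str:sub(pos, pos + n - 1)
  end

  -- Return up to n bytes and advance past them; nil at end of input.
  function reader:read(n)
    local chunk = self:peek(n)
    if chunk then pos = pos + #chunk end
    return chunk
  end

  -- Total length of the underlying string.
  function reader:length()
    return #str
  end

  -- Any further peek or read raises an error.
  function reader:close()
    closed = true
  end

  return reader
end
```

Keeping a single position index makes peek cheap. That matters because parse peeks at every character before consuming it.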
Next, the parser itself:
local function parse_fstring(str)
local rdr, chars, exprs = string_reader(str), {}, {n=0}
return parse(rdr, chars, exprs)
end
First, we prepare the reader and some storage tables, and then call the main parse function:
local function parse(rdr, chars, exprs)
local char = rdr:peek(1)
if char == "{" then
rdr:read(1)
parse_expansion(rdr, chars, exprs)
return parse(rdr, chars, exprs)
elseif char ~= nil then
chars[#chars+1] = rdr:read(1)
return parse(rdr, chars, exprs)
else
return table.concat(chars), exprs
end end
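Here is a runnable toy version of the same loop that makes the shape of the result concrete. It swaps the reader object for a plain string index. It uses a hypothetical parse_expansion stub that only understands {name}, with no =, :%, strings, or tables. The toy assumes that chars accumulates a format string with one %s placeholder per expansion, as the later description of = suggests, so it escapes literal % characters. Whether the real library escapes them the same way is an assumption.

```lua
-- Hypothetical stub: handles only "{name}" and records the expression text.
local function parse_expansion(str, pos, chars, exprs)
  local close = assert(str:find("}", pos, true), "unclosed expansion")
  exprs.n = exprs.n + 1
  exprs[exprs.n] = str:sub(pos, close - 1)
  chars[#chars + 1] = "%s"          -- placeholder for the expansion's value
  return close + 1                  -- position right after "}"
end

-- Same recursion as parse above, over a string index instead of a reader.
local function parse(str, pos, chars, exprs)
  local char = str:sub(pos, pos)
  if char == "{" then
    pos = parse_expansion(str, pos + 1, chars, exprs)
    return parse(str, pos, chars, exprs)
  elseif char ~= "" then
    -- chars becomes a format string, so a literal "%" must be doubled.
    chars[#chars + 1] = (char == "%") and "%%" or char
    return parse(str, pos + 1, chars, exprs)
  else
    return table.concat(chars), exprs
  end
end

local fmt, exprs = parse("{x} and {y}", 1, {}, {n = 0})
print(fmt, exprs[1], exprs[2])    -- the format string "%s and %s", then x and y
print(string.format(fmt, 4, 2))   --> 4 and 2
```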
It looks like a fairly straightforward parser, but the meat and bones are in the parse_expansion function.
We call it when we encounter a { in the string.
It’s pretty big because it has to handle several cases.
Once we’ve found {, we enter expansion parsing mode.
In this mode, we look for these cases:
- We’ve found another {
- We’ve found a " or '
- We’ve found an equal sign: =
- We’ve found a colon followed by a percent sign: :%
- Finally, we’ve found the closing }
If we find another opening curly brace, the expansion contains a table literal. In this case, we enter another parsing mode that parses the table until it finds the matching number of closing curly braces, so any level of table nesting is supported.
If we find a " or ', the expansion contains a string literal, so we enter a string parsing mode.
This matters because inside a string we must not treat either the special characters listed above or the table delimiters as special.
So the table parser also uses the string parser.
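Here is a sketch of how the string and table modes might cooperate. The helpers are hypothetical and work over a plain string and a 1-based position rather than the library’s reader. skip_table counts brace depth and hands quoted text to skip_string, so braces inside string literals don’t affect the count. Lua long strings ([[...]]) and comments are not handled.

```lua
-- pos points at an opening quote; returns the position just past the
-- matching closing quote, skipping backslash escapes.
local function skip_string(str, pos)
  local quote = str:sub(pos, pos)
  local i = pos + 1
  while i <= #str do
    local c = str:sub(i, i)
    if c == "\\" then
      i = i + 2                  -- skip the backslash and the escaped char
    elseif c == quote then
      return i + 1
    else
      i = i + 1
    end
  end
  error("unfinished string")
end

-- pos points at "{"; returns the position just past the matching "}",
-- tracking nesting depth and ignoring braces inside string literals.
local function skip_table(str, pos)
  local depth, i = 0, pos
  while i <= #str do
    local c = str:sub(i, i)
    if c == '"' or c == "'" then
      i = skip_string(str, i)
    else
      if c == "{" then
        depth = depth + 1
      elseif c == "}" then
        depth = depth - 1
        if depth == 0 then return i + 1 end
      end
      i = i + 1
    end
  end
  error("unbalanced braces")
end
```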
Otherwise, if we’re outside of a string or a table, we process the rest of the special characters.
If we find = somewhere before the format sequence, we inject the current expression into the resulting string and then append a format specifier: %s by default, unless changed via the :% sequence.
If we find :% in the expansion, we assume that it starts the expansion’s format sequence.
So we enter another parsing mode, where we parse and remember the format specifier.
After that, we scan for a closing }. If there’s anything other than spaces after the format sequence, we signal an error.
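Assume the text between { and } has already been isolated. The = and :% rules can then be sketched with a hypothetical split_expansion helper that returns the format-string fragment and the Lua expression to evaluate. For brevity it ignores strings and tables, so a :% or = inside a string literal would confuse it. It only treats a trailing = as the echo modifier, so comparisons like x >= 1 pass through unchanged.

```lua
-- Hypothetical helper; ignores string and table literals for brevity.
local function split_expansion(body)
  local expr, spec = body, "%s"            -- %s is the default specifier
  local colon = body:find(":%", 1, true)   -- plain (non-pattern) search
  if colon then
    expr = body:sub(1, colon - 1)
    -- Keep "%..." and allow nothing but trailing spaces after it.
    spec = body:sub(colon + 1):match("^(%%%S+)%s*$")
      or error("unexpected text after format specifier")
  end
  -- A trailing "=" echoes the expression text, spaces included.
  local code = expr:match("^(.-)=%s*$")
  if code then
    -- The echoed text goes into a format string, so escape "%".
    return expr:gsub("%%", "%%%%") .. spec, code
  end
  return spec, expr
end

local fmt, code = split_expansion("x + y = :%X")
print(fmt, code)                   -- "x + y = %X" and "x + y "
print(string.format(fmt, 40 + 2))  --> x + y = 2A
```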
I glossed over the details here, so if you’re interested in how this works, I invite you to read the code. Next, let’s see how variables and expressions are turned into values.
Expansion substitution
At the start of the f function, we create an environment:
env = setmetatable(locals(), { __index = setmetatable(upvalues(), { __index = _G })})
This environment mimics what the Lua VM can see when it needs to get the value of a variable onto the stack.
First, we obtain all locals from the caller’s scope, i.e. the scope where f is called:
local function locals()
local idx, variables = 1, {}
while true do
local ln, lv = debug.getlocal(3, idx)
if ln ~= nil then
variables[ln] = lv
else
break
end
idx = 1 + idx
end
return variables
end
It’s a rather simple function that relies on debug.getlocal to obtain all visible local variables (their names and values) and stores them in a table.
Next are upvalues. Upvalues are local variables of enclosing functions that a closure captures, so we need to get them separately:
local function upvalues()
local variables, idx, func = {}, 1, debug.getinfo(3, "f").func
while true do
local ln, lv = debug.getupvalue(func, idx)
if ln ~= nil then
variables[ln] = lv
else
break
end
idx = 1 + idx
end
return variables
end
It’s similar to locals, except it uses debug.getupvalue, and it stores the variable names and their values in a table.
We then compose these tables via the __index metamethod, maintaining the scope hierarchy.
If some variable x exists both as a local and as an upvalue, we will correctly see the local x value.
Otherwise, if x is not in the table returned by locals, the lookup automatically falls through, via the __index metamethod, to the table returned by upvalues.
And if it’s not there either, it might be in the _G table for globals.
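You can check the fall-through order without the debug library. In this sketch, plain tables stand in for what locals() and upvalues() would return, and their contents are made up for illustration:

```lua
-- Made-up stand-ins for the tables locals() and upvalues() would return.
local fake_locals   = {x = "local x"}
local fake_upvalues = {x = "upvalue x", y = "upvalue y"}

-- Same layering as the env above: locals, then upvalues, then _G.
local env = setmetatable(fake_locals, {
  __index = setmetatable(fake_upvalues, {__index = _G}),
})

print(env.x)               --> local x   (the local shadows the upvalue)
print(env.y)               --> upvalue y (falls through to the upvalues)
print(env.print == print)  --> true      (falls through to _G)
```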
Expression evaluation
However, not all things can be looked up the way described above.
For example, f"{math.pi}" would fail, even though math is in _G, because we would look for _G["math.pi"], and there’s no such thing there.
And we can’t really assume that any sequence of characters with a . in the middle is a table lookup, so we can’t translate {math.pi} to _G["math"]["pi"].
Even if we could, it wouldn’t be a good solution, because we can also have expressions like f"{1 + 2}".
Instead of implementing a small interpreted language for the expansions, I decided to use Lua’s load function (and loadstring for Lua 5.1):
local function eval(code, env)
local env = env or rawget(_G, "_ENV") or _G
local setfenv, loadstring, f = rawget(_G, "setfenv"), rawget(_G, "loadstring")
if (nil ~= setfenv) and (nil ~= loadstring) then
f = assert(loadstring(code))
setfenv(f, env)
elseif assert(nil ~= load, "neither load nor loadstring are available") then
f = assert(load(code, nil, "t", env))
end
return f()
end
So our lookup mechanism is as follows.
First, we try to look up the expansion text directly as a name in the env table.
Then, if nothing was found, we try to eval it as an expression.
However, you can’t simply do eval("1 + 2") in Lua, because a chunk needs an explicit return to produce a value, even at the top level.
So our code looks like this:
local value = env[name] or eval("return " .. name, env)
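Here’s a compact sketch of that lookup. It assumes Lua 5.2 or newer so that load can take the environment directly, and it drops the setfenv branch of eval above. It shows a plain name hitting the table directly. math.pi and x + y fall through to evaluation. "1 + 2" fails to compile without a return.

```lua
-- Assumes Lua 5.2+: load(chunk, chunkname, mode, env).
local function eval(code, env)
  -- Compile as text only ("t") with env as the chunk's _ENV, then run it.
  return assert(load(code, "=expansion", "t", env))()
end

-- Try the expansion text as a plain name first, then evaluate it.
local function lookup(name, env)
  return env[name] or eval("return " .. name, env)
end

-- x and y act like caller locals; globals come from _G via __index.
local env = setmetatable({x = 40, y = 2}, {__index = _G})

print(lookup("x", env))        --> 40
print(lookup("x + y", env))    --> 42
print(lookup("math.pi", env))  -- env["math.pi"] is nil, so this is evaluated

-- Without "return", the expression is not a valid chunk:
print(load("1 + 2"))           -- prints nil and a syntax error message
```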
Bypassing variable lookup
There’s one quirk in this implementation that I’m not sure how to fix. I noticed it by accident while writing tests.
What’s the output of this?
local function string_adder(x)
return function (y)
return f"{x + y = }"
end end
local g = string_adder(40)
g(2)
If your guess is the string "x + y = 42", then your understanding of this library is correct, but your understanding of the underlying Lua runtime is a bit incomplete.
That’s not a problem at all: you don’t need this knowledge to use Lua, and the issue is subtle.
The resulting string won’t even be returned, because the call raises an error: attempt to perform arithmetic on a nil value.
Why is that?
Before I explain, let’s fix this function:
local function string_adder(x)
return function (y)
return f"{x + y = }", x
end end
local g = string_adder(40)
(g(2))
Now the result is indeed "x + y = 42", so what gives?
Apparently, if the code never actually references the variable, it won’t be visible to debug.getupvalue.
By code, I mean the string of the actual code that is being compiled by the Lua VM into bytecode.
It makes sense, though: why would the compiler bother storing an upvalue that is never used?
And the compiler doesn’t know that we will in fact use this upvalue in our eval function, by extracting its name from the string in this special way.
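The effect is easy to observe with debug.getupvalue. In this sketch, make_plain, make_ref, and upvalue_names are hypothetical names. The only difference between the two closures is whether the body mentions x:

```lua
-- Hypothetical helper: collect the upvalue names of a Lua function.
local function upvalue_names(fn)
  local names, idx = {}, 1
  while true do
    local name = debug.getupvalue(fn, idx)
    if name == nil then break end
    names[#names + 1] = name
    idx = idx + 1
  end
  return names
end

local function make_plain(x)
  return function (y) return y end       -- body never mentions x
end

local function make_ref(x)
  return function (y) return x, y end    -- body mentions x, so it is captured
end

print(#upvalue_names(make_plain(40)))    --> 0
print(upvalue_names(make_ref(40))[1])    --> x
```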
So if this fails, and all you need is to return a formatted string, what do you do?
The f function has you covered: you don’t need to rely on local lookup; instead, you can provide your own env:
local function string_adder(x)
return function (y)
return f("{x + y = }", {x = x, y = y})
end end
local g = string_adder(40)
g(2)
This looks like string.format again, except it allows providing variables in any order and without duplicates.
I mean, f("Agent {x}{x}{y}", {x = 0, y = 6-1}) is still a bit more concise in my book than repeating x several times.
Passing your own env prevents any other kind of lookup, so no globals, upvalues, or locals will be found.
This also makes the library usable without the debug library, although it still requires load for evaluation.
Again, I don’t want to write a Lua interpreter as a fallback for that.
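Here is a quick look at what a custom env means for evaluation. It assumes Lua 5.2+ and uses a hypothetical eval_in helper built on load. Names from the table resolve, but globals such as math are simply absent:

```lua
-- Assumes Lua 5.2+; eval_in is a hypothetical stand-in for the library's eval.
local function eval_in(code, env)
  return assert(load("return " .. code, "=expansion", "t", env))()
end

-- The same values as f("Agent {x}{x}{y}", {x = 0, y = 6-1}).
local env = {x = 0, y = 6 - 1}
local agent = table.concat({eval_in("x", env), eval_in("x", env), eval_in("y", env)})
print(agent)  --> 005

-- env has no metatable, so the global math table is not reachable from it.
print(pcall(eval_in, "math.pi", env))  -- prints false and an error message
```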
A cool feature?
For me, definitely.
I like this to be in the language I work with, as it makes some specific cases less of a burden to write and maintain.
Of course, not all strings should become f-strings; there’s still a place for both string.format and plain string concatenation.
Now, why did I make this library in Lua, for Lua, and not for Fennel, which I mainly use for my hobby projects? First of all, I enjoy occasionally writing in pure Lua. I like the language; it has its charm. Fennel is objectively a better dialect of Lua, but sometimes I want to be reminded why it is.
Second, Lua allows calling a function with a single string literal argument without parentheses: print "foo".
The same applies to a single table constructor argument, meaning you can create small DSLs in your program.
For example, one of my libraries implements a JSON encoder.
In Lua, I can call json{x = 1, y = 2} and get back the string '{"x": 1, "y": 2}'.
This way I don’t need to write JSON in strings manually, and I can reuse Lua’s table syntax and my editor’s formatting features.
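This call syntax is also what makes f"..." read like a literal prefix. Here is a small sketch with two throwaway functions, shout and keys, which are hypothetical:

```lua
-- Throwaway functions, only here to show the call syntax.
local function shout(s)
  return s:upper() .. "!"
end

local function keys(t)
  local names = {}
  for k in pairs(t) do names[#names + 1] = k end
  table.sort(names)  -- pairs order is unspecified, so sort for stable output
  return table.concat(names, ",")
end

print(shout"hi")           --> HI!   (same as shout("hi"))
print(keys{y = 2, x = 1})  --> x,y   (same as keys({y = 2, x = 1}))
```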
This might be a silly reason to make this in Lua, but I wanted to make this library because it sounded fun and easy.
In the end, it wasn’t easy, but it was still pretty fun.
Still, it might be a better fit for Fennel, because all these f-strings could be expanded at compile time. That would also solve the upvalue problem, since the generated code would reference the variables, and there would be no need for a custom evaluation step.
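To illustrate the compile-time idea, here is hand-written Lua standing in for what such an expansion of f"{x + y = }" might emit. No actual Fennel macro is shown, and the generated form is an assumption. Because the body now mentions x, the compiler keeps it as an upvalue, and no runtime lookup is needed.

```lua
-- Hand-written stand-in for a compile-time expansion of f"{x + y = }".
local function string_adder(x)
  return function (y)
    -- The literal text and the expression were split ahead of time,
    -- so x is referenced directly and becomes a real upvalue.
    return string.format("x + y = %s", x + y)
  end
end

local g = string_adder(40)
print(g(2))  --> x + y = 42
```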
This is why I think this feature should be a part of the language and implemented in the compiler itself. I have a toy language that implements a bytecode VM, so perhaps I could try implementing it there to see how well it works. But I hope to see f-strings in Lua someday. Maybe in Lua 12.4, who knows?