GRScheme Design Part 3 - Quasiquoting
At the end of the Part 2 I mentioned that quasiquoting was problematic in GRScheme, due to some of the internal representations and algorithms, but now it is not an issue (I hope).
I’ve figured out the correct way to maintain a list invariant, splice one tree into another tree, and so on.
So this post will describe how quasiquote
, unquote
, and unquote-splicing
procedures are done in GRScheme.
Also, I should mention, that I’ve changed my mind, and now results are printed without the leading quote:
;; before
> (car '(a b c))
'a
> (car '('a 'b 'c))
''a
;; now
> (car '(a b c))
a
> (car '('a 'b 'c))
'a
I was using this format because Racket language was the first thing I saw when I touched the world of Scheme and Lisp, so it was natural to me, to see that results are quoted. Although after some interactions with Clojure, Common Lisp, and GNU Guile, I’ve changed my mind. This doesn’t affect internal representation though, only printing. But if you don’t know about it, it probably will not matter at all.
Quasiquoting
First things I’ve did were quasiquote
and unquote
.
Although those are directly related, since we can’t use unquote
outside of quasiquote
, let’s first look at how plain quasiquote
behaves.
In Racket, quasiquote
of any quote-able form yields quoted form:
Note: code examples below are representing Racket and GRScheme REPLs, so for better distinguishing purposes, each prompt will be marked with a language name, e.g.
racket>
orgrscheme>
.
racket> `a
'a
racket> `(a)
'(a)
This is also the case for GRScheme:
grscheme> `a
a
grscheme> `(a)
(a)
All nested quasiquote
calls are preserved in their original form, e.g. not changed to quote
:
racket> ``a
'`a
racket> `(`a)
'(`a)
And in GRScheme:
grscheme> ``a
`a
grscheme> `(`a)
(`a)
This means that the interpreter should be aware of what quasiquote it is expanding at the moment - is it nested or not?
We’re not going to explicitly check for this by analyzing expressions up the tree, but instead, we’ll add a counter to our eval
function:
pub fn eval(&mut self, expression: &NodePtr) -> Result<NodePtr, EvalError> {
let mut stack = vec![expression.clone()];
let mut res = None;
let mut quasiquote_level = 0;
while let Some(expr) = stack.last() {
res = match Self::expression_type(&expr) {
// …
}
}
}
So when our quasiquote_level
is zero, we’re going to interpret its result as quoted form, and as quasiquote
form otherwise.
I know that this is probably not the best way to manage this, but believe me, the worst is yet to come.
Quasiquoting in GRScheme is really ad hoc right now, but it works.
Now we need to actually interpret quasiquote
forms.
Let’s look at match
arms that are responsible for that:
pub fn eval(&mut self, expression: &NodePtr) -> Result<NodePtr, EvalError> {
let mut stack = vec![expression.clone()];
let mut res = None;
let mut quasiquote_level = 0;
let mut unquote_level = 0;
while let Some(expr) = stack.last() {
res = match Self::expression_type(&expr) {
// …
Type::Quasiquote if quasiquote_level == 0 || quasiquote_level == unquote_level => {
let args = Self::rest_expressions(&expr)?;
let unquotes = Self::pre_quasiquote(&args)?;
if !unquotes.is_empty() {
stack.extend_from_slice(&unquotes);
}
quasiquote_level += 1;
continue;
}
Type::Quasiquote => {
let (res, _) = Self::quote(&Self::rest_expressions(&expr)?)?;
Tree::replace_tree(&expr, res);
quasiquote_level -= 1;
stack.pop()
}
// …
}
}
}
Notice that I’ve added unquote_level
counter, and we use both to decide what match
arm to take.
If our quasiquote_level
is 0
or our quasiquote_level
is equal to unquote_level
we’re taking first branch, that extends our tree traversal stack, increments quasiquote_level
and goes to next iteration.
Otherwise, we quote
the result, which means that we’re out of current quasiquote
context.
Let’s look at the first arm in more detail.
Let’s imagine that we’re evaluating `(a b ,c d)
expression here.
First, we take out the vector of all arguments to quasiquote
, which in this case is [a, b, (unquote c), d]
.
Next we pass it to pre_quasiquote
procedure.
As was described in Part 2 all pre_something
procedures are used to decide what arguments should be evaluated before we go further.
Mostly what we did before, was just filtering of arguments based on their type.
But that’s not really the case for pre_quasiquote
, because unquote
has another property that must be maintained.
But first, let’s what unquote
does.
Let’s look at this expression:
racket> (define c 42)
racket> `(a b ,c d)
'(a b 42 d)
We can see that ,c
was replaced with the value of c
.
That’s what unquote
does in quasiquote
.
One can think of quasiquoting as of inversion of evaluation of list
arguments:
racket> (list 'a 'b c 'd)
'(a b 42 c)
You can see that everything was quoted except c
.
In quasiquote
form only c
was unquoted.
But there’s another property for unquote
and it is quasiquote
nesting level:
racket> `(a '(b ,c d))
'(a '(b 42 c))
Nothing changed, c
still evaluates to 42
, but if we change '
to `
:
racket> `(a `(b ,c d))
'(a `(b ,c d))
Nothing happened to c
.
This is because there were two nested quasiquote
calls and only one unquote
one.
If we add another unquote
we’ll see this:
racket> `(a `(b ,,c d))
'(a `(b ,42 d))
,c
was replaced by 42
, but outer ,
is still there, thus ,42
is the result of ,,c
.
Nested quasiquotes
must be escaped by the same amount of nested unquotes
.
This is also the case for unquote-splicing
.
Racket documentation features this weird expression:
racket> `(1 ```,,@,,@(list (+ 1 2)) 4)
'(1 ```,,@,3 4)
That is quite complex, but it works in GRScheme in the same way:
grscheme> (define list (lambda x x))
grscheme> `(1 ```,,@,,@(list (+ 1 2)) 4)
(1 ```,,@,3 4)
With this information, we can now look at how pre_quasiquote
is implemented.
Filtering arguments for quasiquote
The pre_quasiquote
procedure is quite big, so we’ll split it into parts as was done in previous posts.
The first thing we do is check for argument count, and create two stacks, one for tree traversal, and another for unquote
forms that are valid to evaluate:
fn pre_quasiquote(args: &[NodePtr]) -> Result<Vec<NodePtr>, EvalError> {
Self::check_argument_count("quasiquote", ArgAmount::NotEqual(1), args)?;
let mut stack = vec![args[0].clone()];
let mut unquotes = vec![];
// …
Ok(unquotes)
}
In the end, we will return a vector of unquote
forms, which may be empty, if no valid forms were found, either because of the fact that levels weren’t balanced, or because there simply were no unquote
calls.
Next, we iterate over our stack, as in eval
procedure, and check for
expression type on each iteration:
while let Some(expr) = stack.pop() {
match Self::expression_type(&expr) {
// …
}
}
Depending on what type we’ve got we going to, either put it to the stack, put it to quotes, or continue.
The first type, which is the simplest case, that we need to check is Procedure
:
Type::Procedure => stack.extend_from_slice(&expr.borrow().siblings),
If we see (something …)
we expand our stack with all inside expressions, and continue
.
Next arm checks for Quasiquote
, Symbol
, and List
.
We do one more check for arguments, because all these types take exactly one argument, and if everything is OK, we continue
:
Type::Quasiquote | Type::Symbol | Type::List => {
let args = Self::rest_expressions(&expr)?;
Self::check_argument_count("quasiquote", ArgAmount::NotEqual(1), &args)?;
stack.extend_from_slice(&args);
}
The second to last arm is for all kinds of unquote
expressions, and that’s where everything happens.
The last arm does nothing, as we don’t need to extend the stack with other types:
Type::Unquote | Type::UnquoteSplicing => {
let args = Self::rest_expressions(&expr)?;
Self::check_argument_count("unquote", ArgAmount::NotEqual(1), &args)?;
let (q, u) = Self::quasiquote_unquote_levels(&expr);
if q == 0 {
return Err(EvalError::UnquoteNotInQquote);
} else {
if q == u {
unquotes.push(expr.clone());
}
match Self::expression_type(&args[0]) {
Type::Procedure | Type::Quasiquote | Type::List | Type::Symbol => {
stack.extend_from_slice(&args[0].borrow().siblings)
}
Type::Unquote | Type::UnquoteSplicing if q > u => {
stack.push(args[0].clone())
}
Type::Unquote | Type::UnquoteSplicing => {
return Err(EvalError::UnquoteNotInQquote)
}
_ => (),
}
}
}
_ => (),
So what’s happening here is another check for arguments, and quasiquote_unquote_levels
call.
This function analyses the current expression, and all its parents to be either quasiquote
or unquote
and unqoute-splicing
calls, and sets counters for the nest levels of those.
This function returns a tuple, which we destructive binding to q
and u
variables, that stand for quasiquote
and unquote
.
If q
is equal to 0
this means that we’ve got unquote
outside of quasiquote
, which is not allowed.
Otherwise, if we got the same levels we push expression to unquotes
stack.
But this is not the end, we still have to analyze inner expressions, because unquote
may contain other calls for unquote
or quasiquote
and so on, so our algorithm repeats itself one more time and pushes expressions to stack depending on their types and levels.
This function produces vector of valid unquote
and unquote-splicing
calls that we pass to our eval
function and evaluate.
Unquote
procedure
To perform unquoting, we have to take into account several things.
But first let’s look at Unquote
arms in eval
procedure:
Type::Unquote if quasiquote_level > 0 => {
let rest = Self::rest_expressions(&expr)?;
let args_to_eval = Self::pre_unquote(&rest)?;
if !args_to_eval.is_empty() {
unquote_level += 1;
stack.extend_from_slice(&args_to_eval);
continue;
}
unquote_level -= 1;
let (res, pop) = Self::unquote(&rest)?;
Tree::replace_tree(&expr, res);
if pop {
stack.pop()
} else {
continue;
}
}
Type::Unquote => return Err(EvalError::UnquoteNotInQquote),
Similarly to qasiquote
arm, we check for quasiquote_level
, but now if it is greater than 0
.
If it’s 0
it’s an error that we have to catch at the top level.
You can see that there’s another pre_
procedure here.
That is because we’re going to evaluate unquote
argument, and then actually unquote
it.
pre_unquote
just checks for valid forms that we actually need to evaluate, similarly to other pre_
procedures, so I don’t think we need to look closely at it.
We’ve seen a lot of those in Part 2, and this one is no different.
After everything is evaluated, we call Self::unquote
:
fn unquote(args: &[NodePtr]) -> Result<(NodePtr, bool), EvalError> {
Self::check_argument_count("unquote", ArgAmount::NotEqual(1), args)?;
match Self::expression_type(&args[0]) {
Type::List | Type::Symbol | Type::Quasiquote => {
Ok((Self::rest_expressions(&args[0])?[0].clone(), true))
}
_ => Ok((args[0].clone(), false)),
}
}
This procedure is extremely similar to quote
, except it removes quote
from List
, Symbol
, and Quasiquote
forms, and returns all other forms as is.
It also indicates if we should pop our tree traversal stack or not.
Unquote-splicing
procedure
This procedure is kinda the same as unquote
and actually uses unquote
underneath, but the big thing is splicing
part here.
Because GRScheme uses trees of vectors, which are not linked lists, I thought that maybe there should not be such a thing as splicing, but in the end, I’ve decided that it’ll be better with it.
Splicing works kinda like you would expect, except that it modifies the vector:
impl Tree<T> {
// …
pub fn splice_in_childs(root: &NodePtr<T>, pos: usize, tree: NodePtr<T>) -> &NodePtr<T> {
let mut broot = root.borrow_mut();
let pos = if pos > broot.siblings.len() {
broot.siblings.len() - 1
} else {
pos
};
for child in tree.borrow().siblings.iter().rev() {
broot.siblings.insert(pos, child.clone());
}
root
}
// …
}
Not very idiomatic to do such a thing with vectors, but, eh, I can live with that.
This gives us the ability to splice in siblings of one tree into another one at a given place.
If a place is more than the size of the current tree’s siblings, we just put those to the end.
Now we can use it in UnquoteSplicing
arm:
Type::UnquoteSplicing if quasiquote_level > 0 => {
let parent = Tree::parent(&expr).unwrap();
if !Self::splicing_context_valid(&expr) {
return Err(EvalError::UnquoteSplicingInWrongContext)
}
let pos = Self::splicing_pos(&parent).unwrap();
let rest = Self::rest_expressions(&expr)?;
let args_to_eval = Self::pre_unquote(&rest)?;
if !args_to_eval.is_empty() {
unquote_level += 1;
stack.extend_from_slice(&args_to_eval);
continue;
}
unquote_level -= 1;
let (res, pop) = Self::unquote(&rest)?;
Tree::remove_child(&parent, pos);
Tree::splice_in_childs(&parent, pos, res);
if pop {
stack.pop()
} else {
continue;
}
}
Type::UnquoteSplicing => return Err(EvalError::UnquoteNotInQquote),
For the most part, it is the same as the unquote
procedure, except that at the end we splice results in, and also we check that we’re inside the valid context.
This check finds the quasiquote
procedure and checks if it has an argument of Procedure
:
`(,@'(1 2 3)) ;; valid
`,@'(1 2 3) ;; invalid
As a little showcase of how everything works so far, let’s implement the thread first macro from Clojure ->
:
grscheme> (define -> (lambda forms
(define loop (lambda (x forms)
(if (empty? forms)
x
(let (form (car forms))
(if (list? form)
(loop `(,(car form) ,x ,@(cdr form)) (cdr forms))
(loop `(,form ,x) (cdr forms)))))))
(eval (loop (car forms) (cdr forms)))))
grscheme> ->
#procedure:->
grscheme> (-> "Thread first!\n"
'(display))
Thread first!
grscheme> (-> ''(1 2 3)
'(cdr)
'(car)
'(cons '(3 4 5)))
(2 3 4 5)
Of course, this is not a real macro, and everything happens in run time every time we call ->
, but the algorithm is the same as in Clojure’s ->
.
GRScheme complete?
Certainly not. When I started working on it, I knew literally nothing about Scheme, Lisp, and the basic ideas behind its implementations. I was using different Lisps to some extent, like Emacs Lisp for writing small functions for my Emacs config, and going through SICP with Guile, but up until I wrote my own evaluator I had no complete understanding of what I’m doing. Since then I’ve read some papers on Lisp design, got into Clojure, and progressed through SICP, so I know a little more on how things should work internally for the most part. And with this knowledge…
I understand that my implementation of the whole thing is bad, and I mean really bad :D
There are still no macros, internal data representation is really ugly, strings are useless, there’s no file IO, REPL is really basic, there’s no debugging, the stack doesn’t exist, and so on. I don’t think I should really turn GRScheme into complete language, at least right now.
Even if I wanted to create a real practical language, that could compete with other Schemes, Clojure, or Common Lisp, I would have to redesign everything from the bottom up, so probably it will be easier to write a completely new thing. And that will still mean that I’ll have to re-implement every feature that those languages provide, which is an insane amount of work.
Although I would say that I think Rust is a great choice as the implementation language because it helps produce memory-safe code, and probably will make the implementation of multi-threading less worrying, compared to C. Initially, I had planned on supporting execution in separate threads, but I’ve dropped this idea because the current tree traversal algorithm can’t be easily converted to use multiple threads.
This mostly concludes my experiment with the development of Scheme-like language. I hope it was at least an interesting read. Thanks!