GRScheme Design Part 2 - Evaluation
Good things happened since my previous post! I’ve finished rewriting the interpreter from recursive to iterative, which means that there will be no stack overflows when traversing deep trees. I’ve also simplified name binding and lookup systems added support for arbitrary-sized integers, fractions, and 64-bit floating point numbers, but most importantly figured out tail call elimination and closures. This post will describe all of these themes!
Right now I still need to figure out unquoting and quasiquoting (which will be mentioned at the end), as well as the macro system, but I think that it will not be that difficult since I’ve already done most part of the language.
I’ve already briefly described how interpretation works in GRScheme, but to refresh it, here’s a general mechanism.
Given text input, reader produces an internal data structure that represents the input:
After that, the data structure is passed to the evaluator:
In previous post, I’ve described what happens in the reader module, and this post is going to be about evaluator.
Evaluator
structure
Since everything in Scheme is a list (in our case a vector, but this doesn’t matter right now), we have uniform rules for evaluation:
- If something is a list:
- the first element is a procedure name,
- rest are procedure arguments.
- Else if something is a non-primitive type:
- lookup in scope,
- evaluate
- If something is a primitive type, evaluate it to self.
This means that things such as quoted lists, symbols, numeric constants, and strings are primitive types:
'(list of items)
'symbol
1
,3.14
,22/7
"string of text"
I’ve marked quoted lists and symbols as cursive because actually, it’s not primitive types in GRScheme, but calls to quote
procedure:
(quote (list of items))
(quote symbol)
However I’ve said that primitive types are evaluated to themselves, and this is the case for quote
calls - those produce quote calls, e.g. themselves.
Well, most of the time.
I’ll explain why a bit later.
This leads us to procedure application syntax.
Everything that is wrapped with parentheses is a call to proceed with some arguments, where some may actually be none.
For example, procedure car
returns the first element of the list it was given:
> (car '(1 2 3)) ;; (car (quote (1 2 3)))
1
When we pass this expression to the reader it produces such a tree:
After that, evaluator calls eval
function, on that expression, which iterates over the tree and checks the type of each leaf.
The first leaf it finds, in this case, is the root of the tree, which is (
.
This means that it is a procedure call, so we check what procedure we should call.
In this case, it is car
.
We check if this name exists in the current scope by using the lookup
procedure, which in turn checks for each scope that is available for us at this moment - the scope of the caller, and the global scope.
Since caller scope is empty 1, we check global scope for name car
and find out that it holds #procedure:car
.
What is #procedure:car
?
It is a pattern - a special keyword that means something to the interpreter.
Patterns are reserved (for now) and can’t be created, but can be assigned to names, which is the case for car
.
This pattern describes how we should treat this name, e.g. as a #procedure
, which procedure it contains, car
, and what to expect of its arguments - should we evaluate those, or not.
Since we need to evaluate arguments of car
, our next step is the evaluation of the arguments.
In our case, there’s one argument, and it is a primitive type one - a list '(1 2 3)
, so we evaluate the call to quote
procedure.
The result of calling a quote with a list is a list itself, so quote
returns '(1 2 3)
.
Now, when all arguments are evaluated, we can pass those to #procedure:car
call.
car
should take the first element of the list that we’ve passed to it, so in the case of car
there should be only 1 parameter, and we need to check it.
If there are more parameters than there should be, we will throw an error and abort the execution of the whole expression.
In our case, there’s exactly one parameter of type list, which is what car
expects.
If the parameter has the wrong type car
will signal an error and abort.
If the list is empty, we will also signal an error.
In our case, we successfully take the first element from the list and quote
it.
So the result of (car '(1 2 3)
is (quote 1)
, but because 1
is a primitive type and can’t be quoted, the result of (quote 1)
expression is 1
, which is what interpreter prints after executing our expression.
With this information, you should understand the general idea behind the evaluation process. Now let’s look at the code.
Function eval
Here’s the whole code of eval
function at the moment of publication.
In recursive form it was much shorter, however, shared the common problem of all recursion algorithms - stack overflow.
Unfortunately, I could not figure out a way of making it tail-recursive to use trampolines, because we do some work after each iteration, so I’ve turned it to pure iterative form.
If you’re interested in how this function looks with the recursion algorithm check here.
This function is quite big to list here in its full form so let’s tackle it piece by piece. It also works better for explanation purposes.
Let’s begin with the function signature:
pub fn eval(&mut self, expression: &NodePtr) -> Result<NodePtr, EvalError> {
This function takes two parameters - &mut self
and expression: &NodePtr
.
NodePtr
is a type
that is defined in reader
module as pub type NodePtr = tree::NodePtr<GRData>;
and just a shorter way to manage tree leafs.
The first parameter is self
which means that this function is a method of an object it was called from.
This object is Evaluator
structure which is quite simple and has only one field:
pub struct Evaluator {
global_scope: HashMap<String, NodePtr>,
}
Thus if we create new Evaluator
and call eval
method it will be able to see global_scope
.
We’ll get to that later.
The first thing we do in eval
is stack creation, and variable to hold our end result:
let mut stack = vec![expression.clone()];
let mut res = None;
Now, when we have a place to store our result and a stack to iterate we can begin iteration itself:
while let Some(expr) = stack.last() {
res = match Self::expression_type(&expr) {
// …
}
}
Ok(res.unwrap())
This is the core of our evaluator. Well, not exactly the core, but an important part of it. We pop the last element from the stack and analyze its type. Depending on that type we will perform different actions. There are 9 types right now:
#[derive(Debug, Clone, PartialEq, PartialOrd)]
enum Type {
Integer,
Rational,
Float,
Name,
Str,
Symbol,
List,
Unquote,
Quasiquote,
Procedure,
Pattern,
}
We have types for different numbers, including Integer
for all integers, such as 1
, 123
, -123
, 2352975723524562345
, e.t.c., Rational
for fractions: 1/2
, 22/7
, 235297594562345/2359524562347
, and Float
for 64 bit floating numbers.
Next, we have Name
, names are everything that is simply a sequence of letters and numbers without white space and reserved characters, e.g. car
, quote
, hello1234
.
And Str
is for strings - arbitrary sequences of anything wrapped within pair of "
.
The next ones are more interesting.
We have Symbol
.
Symbols are quoted names and are primitive types.
For example car
is a name, and 'car
is a symbol.
List
is another primitive type, which we have already seen before: '(1 2 3)
.
A list can contain anything inside, including other lists.
And the big ones - Procedure
and Pattern
.
These types are special because the first one determines what we will do with our list, and the second one is how we will do it.
Rules are simple - everything that starts with (
is a Procedure
, and everything that starts with #
is a pattern.
Now there’s one problem.
We have lists and symbols, which are represented as '(a)
and 'a
but in fact are stored as (quote (a))
and (quote a)
.
This means that these are procedure calls.
So how does it work?
Let’s look at the function that determines the type of the expression:
fn expression_type(s: &NodePtr) -> Type {
let data = &s.borrow().data.data;
match data {
Data::String { data } => {
if data.starts_with('#') {
Type::Pattern
} else if !s.borrow().siblings.is_empty() {
match &s.borrow().siblings[0].borrow().data.data {
Data::String { data }
if data == "quote" && s.borrow().siblings.len() > 1 =>
{
match &s.borrow().siblings[1].borrow().data.data {
Data::String { data } if data == "(" => Type::List,
_ => Type::Symbol,
}
}
Data::String { data } if data == "unquote" => Type::Unquote,
Data::String { data } if data == "quasiquote" => Type::Quasiquote,
_ => Type::Procedure,
}
} else if data.starts_with('"') && data.ends_with('"') {
Type::Str
} else {
Type::Name
}
}
Data::Integer { .. } => Type::Integer,
Data::Float { .. } => Type::Float,
Data::Rational { .. } => Type::Rational,
}
}
It’s quite simple, actually.
We pass our tree to it and it analyses the top level.
If our root has String
data, there’s a possibility of it containing anything but numerical data.
So we check if the data
starts with #
, and if that’s the case, we got Pattern
.
If not, we check if it has sub-nodes, which actually means that this is a procedure call.
If the procedure we’re calling is quote
and there are other sub-nodes around, we check if those have either (
or whatever, thus producing List
and Symbol
.
If there are no extra sub-nodes, or the procedure is not quote
that’s Procedure
.
Same think for Unquote
and Quasiquote
.
If there are no sub-nodes at all, that’s either a Str
, if it starts and ends with "
, or a Name
.
In case there are numerical data, we take its type.
So how do we use this information?
In our while let
loop over the stack we get the expression type and match it against it.
But we do not actually need to match against all of those:
res = match Self::expression_type(&expr) {
Type::Procedure => { /* … */ }
Type::List | Type::Symbol => { /* … */}
Type::Quasiquote => { /* … */ }
Type::Unquote => { /* … */ }
Type::Name => { /* … */ }
_ => stack.pop(),
};
We obviously need to evaluate Procedure
, but because List
and Symbol
are a bit more fundamental (as building blocks) we evaluate those separately.
Unquote
and Quasiquote
are implemented as hack, so I won’t really discuss it here 2.
Names are also evaluated in a special way, and the rest types are evaluated to themselves so we do not need to do anything with those, so we simply pop those from the stack.
We do this at the end of each match
arm, except for the first one.
Let’s go from the bottom.
Name
evaluation
That’s how we evaluate names. It’s simple but there’s a lot of what actually happens:
Type::Name => {
Tree::replace_tree(&expr, self.lookup(&expr)?);
stack.pop()
}
First, we’re calling self.lookup
procedure, that looks through all available scopes for the value of the name.
There’s this ?
operator, which you may have seen before in some code above.
It is syntactic sugar for passing the correct value to the caller and the error value out of the caller.
Because lookup
returns Result
type and it is a compound error type we need to pass an error to the handler which is not this function.
We could write this call in a more explicit way:
match self.lookup(&expr) {
Ok(result) => result,
Err(error) => return Err(error),
}
But it is tedious to do, so ?
is great.
Let’s not look at lookup
for now, until we talk about scopes.
What you need to know is that lookup
will check every scope current expression car reach and the global scope.
If value can not be found it will return Err(EvalError::UnboundIdentifier { name })
which will be passed to error handler.
This error can’t be handled so we will see the message in stderr
saying that name
is not bound to anything.
If the value of a name was found we pass it to our next function Tree::replace_tree
.
The value of a Name
is always some tree, so we need to replace our current expression with a new tree.
Now we have finally touched the reduction process of the tree.
Our tree consists of reference-counted pointers to sub-trees.
What this means is that if we have a pointer to the tree if we have a pointer and we’re not violating any of the borrowing rules we can change the value of this pointer.
In the case of our eval
procedure, we have the stack
of such pointers, and when we take out each pointer we’re basically fine to do anything with it.
So when we’ve found the value of the Name
- new tree we can replace the old tree, holding that Name
with our new tree.
Let’s look on a simple example of (+ a 1)
given that a
holds 41
:
So we have this box labeled Global Scope.
Scopes use HashMap
to organize data and optimize search by hashing keys, but here we will use names directly.
When the evaluator reaches a
leaf, it calls lookup
, which checks if the global scope has key a
in the table, and gets out the copy of the tree that holds 42
:
Now that we got 42
we can replace a
with it:
We then drop our a
from the expression, and replace it with 42
, while dropping the pointer which originally held 42
.
The resulting expression is:
This is a general idea behind tree manipulation.
Tree::replace_tree
keeps the original pointer, but drops all of the node’s sub-nodes changes its value to the value of another node, and pushes new sub-nodes from another node.
Then the pointer to another node is dropped.
So for example if a
would have held this list: '(1 2 3)
we would get such expression:
Of course, it would not work, because you can’t add 1
to '(1 2 3)
, but this should illustrate the idea of what’s happening with the tree.
Now we can go to a bit more complex List
and Symbol
evaluation.
List
and Symbol
evaluation
On the last image we’ve seen list expression '(1 2 3)
and how it’s represented in GRScheme Data:
This is done in such a way because we always have a full tree of our program at any time of evaluation.
Therefore if we would apply the quote as the real Scheme does, we would get (1 2 3)
, store it in the tree like that, and when we will try to do something with that list, we would assume that it is a procedure.
Which is not.
So the quote
is stored within the list permanently.
The same goes for Symbols
, those are stored exactly like lists, except the second subtree is just a Name
, e.g. 'a
:
That’s how we evaluate Lists
and Symbols
:
Type::List | Type::Symbol => {
let (res, change) = Self::quote(&Self::rest_expressions(&expr)?)?;
if change {
Tree::replace_tree(&expr, res);
}
stack.pop()
}
It is similar to the evaluation of the Name
but a bit different.
First, instead of calling lookup
we call Self::quote
over the result of Self::rest_expressions
.
What rest_expressions
does, is:
- takes the reference to expression,
- checks whether it has sub-expressions,
- if there are sub expressions it checks if it has more than one of those,
- if There’s more than one, it creates a vector with pointers to those expressions, except the first one, and returns it,
- otherwise, it returns an empty vector,
- if no subexpressions were found it signals an error.
So quote
receives a reference to that vector of subexpressions.
This is what quote
does:
fn quote(args: &[NodePtr]) -> Result<(NodePtr, bool), EvalError> {
Self::check_argument_count("quote", ArgAmount::NotEqual(1), args)?;
let res = args[0].clone();
match Self::expression_type(&res) {
Type::Procedure | Type::Name | Type::Unquote | Type::Quasiquote => {
let root = Tree::new(GRData::from_str("("));
Tree::push_child(&root, GRData::from_str("#procedure:quote"));
Tree::push_tree(&root, res);
Ok((root, false))
}
_ => Ok((res, true)),
}
}
This function returns a compound type of Result<T, E>
, where T
is the success type, and E
is the error type.
In our case success type itself is compound, and it is a tuple that consists of a pointer to the subexpression and a Boolean.
First, we check if there’s a correct amount of arguments passed, which in the case of quote
is 1
.
Then we check its type.
If it is Procedure
, Name
, Unquote
, or Quasiquote
we wrap those into quote
and return with field argument set to false
.
If it is anything else, means that it cannot be quoted, so we return what was passed to quote
with the second field set to true
.
This second Boolean will indicate whether should we use replace_tree
over our result.
Because replacing a tree is somewhat expensive for long lists we should avoid it when possible.
Why do we do extra work if we don’t use it, though?
We’re using this function for quasiquoting without checking its second field.
But later about that.
After we’ve replaced our expression with the result we pop it out of the stack.
If it was the last expression we will use it as a result.
If not it will be used by another expression up the tree.
That pretty much concludes the evaluation process of these forms.
Now we can look at the evaluation of Procedure
.
Evaluation of a Procedure
This is the most complex part of the evaluator because it is the main part. I will split this into a lot of small parts so you can follow up and not get lost in all the stuff that happens because it spreads across the whole interpreter.
First things first, let’s remember how procedures are constructed.
It’s a list, which has at least one item inside it.
The first item is going to be the name of the procedure, and the rest, if any, will be parameters.
Thus (+ 1 2)
is a call to the procedure +
with arguments of 1
and 2
.
That’s not always the case, though.
The first argument can be another procedure call, e.g. ((get-plus) 1 2)
, where get-plus
is a procedure that returns a +
procedure, so we need to check for this type of thing.
That’s what we do when we’ve reached Type::Procedure
arm in our eval
:
Type::Procedure => {
let expr = expr.clone();
let proc = Self::first_expression(&expr)?;
let proc = match Self::expression_type(&proc) {
Type::Procedure | Type::Name => {
stack.push(proc);
continue;
}
_ => proc,
};
// …
}
First, we create a copy of a pointer to the current expression due to some ownership rules.
Then we use Self::first_expression
on a reference to this expression to get the first sub-expression which will either be a procedure, name, or something else.
Next, we check its type, and if it is either a Procedure
or Name
we push it to the stack and go to the next iteration.
This will evaluate Name
and return what it holds to us, so we can use it, and in the case of Procedure
, we will return to this arm and repeat the first steps.
What we are looking for is Pattern
.
And a very specific one.
All functions are described through #procedure:name
pattern.
For example in this expression (+ 1 2)
, +
actually holds a pattern #procedure:+
, and +
itself is a name, so we push it to the stack, evaluate it to #procedure:+
and return to our expression which now holds (#procedure:+ 1 1)
.
Since it is not Procedure
or Name
we set our proc
to it.
Patterns are used to describe what we should do with some particular thing, and also to describe what this particular thing should do with its arguments. So patterns is a special data type, that holds extra information about the environment it expects around it. Each pattern follows some kind of behavior pattern thus the name.
So once we’ve got our procedure pattern, we can get its arguments by calling Self::rest_expressions
on our pointer to the expression:
let args = Self::rest_expressions(&expr)?;
This creates a vector of arguments and we’re not going to do much with it right now. Our next step is the name of the procedure, so we request it:
let proc_name = proc.borrow().data.data.to_string();
Now once we have it, we check if it is more than 11 chars long, and substitute the first 11 chars of it. If it’s not 11 chars long, we’ve probably got the wrong pattern, but we’ll catch that later:
if proc_name.len() > 11 {
let name = &proc_name[11..];
let user_defined = proc.borrow().data.user_defined_procedure;
let args_to_eval: Vec<NodePtr> = match name {
"if" if !user_defined => Self::pre_if(&args)?,
"let" if !user_defined => Self::pre_let(&args)?,
"define" if !user_defined => Self::pre_define(&args)?,
"progn" if !user_defined => Self::pre_progn(&args)?,
&_ if user_defined
|| !BUILTINS_NOEVAL.contains(&name)
|| Self::expression_type(&proc) == Type::List =>
{
Self::pre_eval(&args)
}
&_ => Vec::with_capacity(0),
};
if !args_to_eval.is_empty() {
stack.extend_from_slice(&args_to_eval);
continue;
}
}
Now we have the name, in our case, it’s +
again because we’ve stripped the #procedure:
part of it away.
We also get the information from our node if this procedure is user_defined
or not, but about that later.
Now the interesting part happens.
You can see that match
with arms, such as "if"
, "let"
, "define"
, progn
all calling these pre_
procedures.
These are special forms.
We check if those are user-defined because if those are, they are not special anymore and we’ll call those regular functions.
But if they are not defined by the user, we call these pre_
functions and pass a reference to our array of arguments to those.
Let’s look at what happens in pre_if
:
fn pre_if(args: &[NodePtr]) -> Result<Vec<NodePtr>, EvalError> {
Self::check_argument_count("if", ArgAmount::LessThan(2), args)?;
match Self::expression_type(&args[0]) {
Type::Name | Type::Procedure | Type::Quasiquote | Type::Unquote => {
Ok(vec![args[0].clone()])
}
_ => Ok(vec![]),
}
}
This function checks if our amount of arguments for if
is valid, and then checks the type of the first one.
If it is either Name
, Procedure
, Quasiquote
, or Unquote
it wraps it to the vector and returns, otherwise it returns an empty vector.
The result of that function will be put into args_to_eval
variable, which we later check for emptiness, and if it’s not empty we push everything from it to the stack and go to the next iteration.
So in the case of if
we should only push the first argument, which is the condition, and evaluate it.
But we will decide later what branch to take, based on this condition.
So to illustrate this:
stack | current expression |
---|---|
EMPTY | (if (eq 1 2) "what" "ok") |
(if (eq 1 2) "what" "ok") |
if |
(if (eq 1 2) "what" "ok"), if |
NAME EVALUATION |
(#procedure:if (eq 1 2) "what" "ok") |
(#procedure:if (eq 1 2) "what" "ok") |
(#procedure:if (eq 1 2) "what" "ok"), (eq 1 2) |
CONTINUE |
(#procedure:if (eq 1 2) "what" "ok") |
(eq 1 2) |
(#procedure:if (eq 1 2) "what" "ok"), eq |
NAME EVALUATION |
(#procedure:if (#procedure:eq 1 2) "what" "ok") |
(#procedure:eq 1 2) |
(#procedure:if #f "what" "ok") |
CONTINUE |
This is how our expression transforms to this point.
These CONTINUE, and NAME EVALUATION are used to omit some things that would make this table somewhat longer.
Now we have this expression, and we again call pre_if
on its arguments, but the first argument no longer belongs to Name
, Procedure
, Quasiquote
, or Unquote
, so we return the empty vector, and thus go further.
stack | current expression |
---|---|
EMPTY | (#procedure:if #f "what" "ok") |
"ok" |
CONTINUE |
EMPTY | END |
When the stack is empty, means that we’ve traversed and reduced our tree fully, and the last popped item is the result, which in this case is "ok"
.
This happens with all pre_
procedures, those filter arguments are based on what special form we have as our procedure.
E.g. for define
we only need to evaluate the second one.
For let
we only need to evaluate each second one, because arguments are going in pairs.
pre_progn
is a bit different in a way that it reverses our arguments, because the order in progn
is a thing that we must follow, and because we put expressions to the stack, we have to reverse those to keep the original order:
fn pre_progn(args: &[NodePtr]) -> Result<Vec<NodePtr>, EvalError> {
Self::check_argument_count("progn", ArgAmount::LessThan(1), args)?;
Ok(args
.iter()
.rev()
.cloned()
.filter(|arg| match Self::expression_type(arg) {
Type::Name | Type::Procedure | Type::Quasiquote | Type::Unquote => true,
_ => false,
})
.collect())
}
(progn a b c)
without reverse would form this stack: [progn, a, b, c]
which means that c
will evaluate before a
, but if c
depends on a
, we’ll get error.
So we need to reverse arguments before we put those in the stack, thus getting this stack: [progn, c, b, a]
.
This way we will evaluate a
, b
, c
, and then came back to progn
, which will return it’s last expression c
.
If we got neither of the special forms, and our procedure is not in the list of procedures that do not evaluate arguments at all, we call pre_eval
which is identical to pre_progn
but doesn’t reverse argument order.
So once we’ve dealt with our arguments it’s time to apply the procedure to those.
For now, I will omit some things until we really need those, so in we call apply
our proc
to args
:
Tree::replace_tree(&expr, self.apply(&proc, &args)?);
continue;
Apply
procedure to arguments
Now we’ve got to procedure application, and yet again, the first thing we do is check proc
type:
fn apply(&mut self, proc: &NodePtr, args: &[NodePtr]) -> Result<NodePtr, EvalError> {
match Self::expression_type(&proc) {
// …
}
}
As you can remember, I’ve said that if we’ve got the wrong pattern, then we will probably catch it.
Here’s the place where this happens.
And the first type that is correct for us is…
List
?
Type::List => Self::list_get(&proc, &args),
Wait, what?
Yes, lists are functions.
Because the underlying data structure is a vector, and not a linked list, we actually can get elements from our list pretty fast!
So if we call list as a function and provide an argument of a certain number, we’ll get an element located at that index, pretty much as if we were accessing an array in Rust with []
:
> ('(1 2 3) 0)
1
> ('(a b c d) 1)
'b
Furthermore, if we provide two arguments, we can get sub list of a list, without using car
, cdr
, cons
and recursion at all:
> (define names '("Adam" "Josh" "Peter" "Steven" "Genry"))
> (names 1 3)
'("Josh" "Peter" "Steven")
If we try to access an element beyond the boundaries of a list we’ll get an error:
> ('(1 2 3 4 5 6) 6) ; lists are indexed from 0
;; evaluation error: index 6 is out of bounds of a list of len 6
Because of that length
is also written in terms of accessing internal data representation of a vector and checking its length.
Our next type is indeed a Pattern
.
I’ve mentioned that we’re looking for a #procedure:
type pattern so our mathc
arm has a check for it:
Type::Pattern
if proc
.borrow()
.data
.data
.to_string()
.starts_with("#procedure:") => { /* … */ }
_ => Err(EvalError::WrongApply {
expr: Self::tree_to_string(&proc),
}),
And if the pattern is not starting with a #procedure:
we drop to the _
(any) pattern, which is an error.
So what’s happening in case we’ve got the correct pattern? Let’s see:
let name = proc.borrow().data.data.to_string()[11..].to_owned();
match name.as_ref() {
"newline" => Self::newline(&args),
"read" => Self::read(&args),
"define" => self.define(&args),
"lambda" => Self::lambda(&args),
"if" => Self::if_proc(&args),
"cond" => Self::cond(&args),
"progn" => Self::progn(&args),
"let" => Self::let_proc(&args),
"car" => Self::car(&args),
"cdr" => Self::cdr(&args),
"cons" => Self::cons(&args),
"display" => Self::display(&args),
"eval" => Self::eval_proc(&args),
"empty?" => Self::is_empty(&args),
"length" => Self::length(&args),
"+" | "-" | "*" | "/" | "%" | "remainder" => Self::math(&name, &args),
"<" | ">" | "<=" | ">=" | "=" => Self::compare(&name, &args),
_ => Err(EvalError::UnknownProc { name }),
}
Ah, another match
, that matches the name of the procedure.
These are built-in procedures, that are known to our interpreter and implemented internally.
If we got progn
it calls Self::progn
and passes args
to it.
Then progn
returns an expression.
The same goes for every procedure here.
Each of those returns a compound type of Result<NodePtr, EvalError>
, which either signals about the error, or has a resulting expression.
And if we got a procedure that is not on this list, we get an unknown procedure error.
I will not go into details of each built-in procedure, but generally speaking, each of those takes an array of arguments, checks those to be the correct amount and type, and does something with those.
Now you’ve might think where are user-defined procedures handled?
If this apply
function supports only built-in procedures, what’s the point of it?
That’s why I said earlier, at the end of the [Evaluation of the Procedure
](#code-snippet–simplified procedure application) that we will skip some code, regarding user-defined procedures, because user-defined procedures work kinda differently.
And you may have noticed that the list of built-ins above contains that one procedure, that we can use for defining procedures - lambda
.
Let’s look at it!
Builtin lambda
procedure
So what lambda
does?
How does it work?
To understand this, we need to look at the syntax for the lambda:
(lambda (x)
(* x x x))
Here’s a lambda
that takes one argument x
and multiplies it by itself by calling *
procedure over three arguments of that x
.
This already gives us a lot of information.
For example, not only does lambda
create a procedure, but it also defines a binding for its argument, so it could then be used inside that lambda
by other procedures, such as *
in our case.
So how do we apply this procedure? We wrap with parentheses:
((lambda (x)
(* x x x)) 2)
Here’s we apply our lambda
to argument of 2
, which binds x
to be 2
, so when *
is evaluated we lookup
each x
, evaluate it to 2
and then apply *
to the list of arguments [2, 2, 2]
thus producing the result of 8
.
Lambda also can take a variadic amount of arguments, and it is achieved by providing a single argument but without any parentheses:
(lambda x x)
This lambda will receive any amount of arguments and return those as a list:
> ((lambda x x) 1 2 'a "5")
'(1 2 a "5")
Let’s look at a bit more complex lambda
expression:
(lambda (x)
((lambda (y) (+ x y)) 5))
This example creates a lambda
that takes one argument x
and in its body, it creates another lambda, that takes one argument y
, and sums x
and y
in the body, and applies that lambda
to the argument of 5
, thus binding y
to 5
.
So whenever we call original lambda
and apply it to some number it will add 5
to that number.
Not a very practical procedure, but this gives us the idea, that, perhaps, instead of evaluating our second lambda we could simply return it:
(lambda (x)
(lambda (y) (+ x y)))
So, that’s a lambda
that takes one argument x
and returns a lambda
that takes one argument y
that sums up x
and y
.
Wait…
If we evaluate this procedure and apply it to some number like 5
, we should receive (lambda (y) (+ x y))
as a result, and there’s no x
in that definition.
This means that lambda
can grab bindings from the outside world and store those within itself!
This is called closure.
And what it allows us here is to store outside world bindings within our lambda
and reach those even if the original lambda
no longer exists:
> (((lambda (x)
(lambda (y)
(+ x y))) 5) 6)
11
> (((lambda (x)
(lambda (y)
(+ x y))) 5) 10)
15
However, that’s not really good way to use such procedures. What we can do instead is bind the lambda to a name, thus creating a user-defined procedure:
(define gen-adder (lambda (x)
(lambda (y) (+ x y))))
This gives us gen-adder
procedure (gen
stands for generate
).
Once we call it and pass a parameter, it will return the procedure that adds this parameter to some other parameter:
> (gen-adder 5)
#procedure:anonymous
> ((gen-adder 5) 5)
10
This in turn means that we can bind this resulting procedure to the name:
> (define add2 (gen-adder 2))
> (add2 40)
42
OK.
Now when we’ve hopefully worked this out let’s look at the code of the built-in lambda
procedure.
In parts, as usual:
fn lambda(args: &[NodePtr]) -> Result<NodePtr, EvalError> {
Self::check_argument_count("lambda", ArgAmount::LessThan(2), args)?;
// …
}
As I’ve already said, we check the number of arguments, and if there’s less than 2
that’s not a valid lambda
call.
E.g. (lambda (x))
is not a valid lambda definition.
let arg_list = args[0].clone();
let body = args[1..].to_vec();
Next, we take a pointer to the first argument of the lambda
which will either hold the list of arguments to the function we will produce or one argument that will be represented as a list.
And we take all pointers for body expressions that were passed to the lambda
.
This makes it possible to create procedures with any amount of expressions in the body: (lambda (x) (display x) (newline) (+ x 1))
, this lambda
takes one argument x
and displays
it, prints a newline
then returns the result of x + 1
.
Next, we check the type of our first argument.
It can only be a Name
or a Procedure
3, thus x
or (x)
.
Everything else is not valid type:
match Self::expression_type(&arg_list) {
Type::Procedure | Type::Name => (),
_ => {
return Err(EvalError::GeneralError {
message: "lambda: wrong argument type".to_owned(),
})
}
}
If we got the correct argument we can create a new tree that will hold our lambda.
We create a new node that will be our root, and it will be holding a Pattern
of #procedure:anonymous
created with GRData::from_proc_str
function, and w immediately push our argument list to it as the first sub node:
let res = Tree::new(GRData::from_proc_str("#procedure:anonymous"));
Tree::push_tree(&res, arg_list);
Now the body.
Because of the nature of the language, e.g. everything is stored in the tree, we actually need some root node for our body, and we also need to return only the result of the last expression, so progn
looks like the perfect candidate:
let progn = Tree::new(GRData::from_str("("));
Tree::push_child(&progn, GRData::from_str("#procedure:progn"));
We will then put each sub-expression from the body to that progn
, then push the progn
itself to our #procedure:anonymous
as its second argument, and return it as the result:
for child in body.iter() {
Tree::push_tree(&progn, child.clone());
}
Tree::push_tree(&res, progn);
Ok(res)
So our tree for an anonymous function that takes one argument, prints it, prints a newline, and returns the square of it should look like so:
But wait!
There’s more.
I’ve talked about closures and bindings before, but where do actually those bindings go?
Where do we store those?
How lookup
manage to find out these bindings?
Let’s look at this missing part of our lambda
function:
let mut current = Tree::parent(&body[0]);
while let Some(p) = current {
if !p.borrow().data.scope.is_empty() {
for (key, val) in p.borrow().data.scope.iter() {
progn
.borrow_mut()
.data
.scope
.insert(key.clone(), val.clone());
}
break;
}
current = Tree::parent(&p);
}
So what this piece of code does is the creation of closure for our lambda.
It goes to the expression parent, and if it has some bindings, it copies those into the current scope.
Into the progn
.
Into it’s (
.
So all bindings are being stored in parentheses. And whenever we go to the parent we immediately can see if it has bindings associated with it. This piece goes to the first parent, checks, and if there is something, then we’re interested in it, and only it. There’s no point to go all the way up.
This closure is created upon lambda
creation, but that’s another closure that is being created upon lambda
application, as well as some other things that happen in there, such as tail call elimination.
So let’s look at how we apply our lambda
functions
Application of lambda
procedure
Let’s first look at the apply_lambda
function signature:
fn apply_lambda(
lambda: &NodePtr,
args: &[NodePtr],
tail_call: bool,
) -> Result<(bool, NodePtr), EvalError> {
// …
}
Unlike apply
, this function takes one extra parameter and returns an extra Boolean in the success variant.
To understand what tail_call
does we have to go back to eval
function, and see a proper invocation of apply
and apply_lambda
:
if proc.borrow().data.user_defined_procedure {
if let Some(p) = Tree::parent(&expr) {
if p.borrow().siblings.last().unwrap() == &expr
&& p.borrow()
.siblings
.first()
.unwrap()
.borrow()
.data
.data
.to_string()
== "#procedure:progn"
{
// possible tail call
let (optimize, res) = Self::apply_lambda(&proc, &args, true)?;
if optimize {
// valid tail call
Tree::replace_tree(&p, res);
stack.pop();
} else {
Tree::replace_tree(&expr, res);
}
continue;
}
}
Tree::replace_tree(&expr, Self::apply_lambda(&proc, &args, false)?.1);
} else {
Tree::replace_tree(&expr, self.apply(&proc, &args)?);
}
It’s big. Let’s dig into it.
First, we check if our procedure is user_defined_procedure
, as it will allow us to early go for apply
, and still be able to override, say +
to user procedure with the same name, and call apply_lambda
instead.
If it is user_defined_procedure
we go in and check if it has a parent.
If it is, this is possibly a tail call, but we also have to check that the last expression in our parent is exactly the expression we’re going to evaluate and that the first expression in the parent is progn
because this indicates that it is really the last expression.
If this is true, this is a possible tail call, but we can’t be sure yet.
We call apply_lambda
with third argument set to true
because the check in apply_lambda
will further tell us if this was a tail call or not.
So if apply_lambda
returns true
we optimize
our call by replacing the parent with result of the lambda
application and pop our tree traversal stack, then we continue
.
If it was not a tail call, we replace the current expression with the result and continue
.
If there was a parent, but the expression was not last, or there was no progn
as the first one we pass false
as the third argument to avoid unnecessary checks for a tail call, and replace the current expression with the result.
So what exactly apply_lambda
does?
Let’s see:
let lambda_args = Self::first_expression(&lambda)?;
let lambda_body = Self::rest_expressions(&lambda)?[0].clone();
let mut can_optimize = tail_call;
This deconstructs the #procedure:anonymous
pattern in the same way how we’ve constructed it in the lambda
procedure.
We also create can_optimize
and initialize it with our tail_call
value.
Next, we have to create a closure, and this is going to be messy. Here’s the code:
if tail_call {
if let Some(p) = Tree::parent(&Tree::parent(&lambda).unwrap()) {
let args: Vec<String> = lambda_args
.borrow()
.siblings
.iter()
.map(|a| a.borrow().data.data.to_string())
.collect();
let mut body = lambda_body.borrow_mut();
for (key, val) in p.borrow().data.scope.iter() {
if !args.is_empty() && !args.contains(key) {
can_optimize = false;
} else {
body.data.scope.insert(key.clone(), val.clone());
}
}
}
}
We only do this if this is a tail call because if it is not, we do not have to create any closures besides those that were created in the lambda
procedure, because lookup
will be able to find bindings in parents.
However, we can’t simply create closure and always optimize our call.
That’s how we check:
First, we check if our parent has a parent.
Because at this point we only have sub expressions, we need this extra up the tree.
Then, we create a vector of the string representation of our arguments and take mutable reference to the lambda body.
Then, for each value in the scope of the parent’s parent, we do this: if the argument list is not empty, and the argument list contains the key from the parent’s scope we can’t optimize the tail call.
Because we can lose the original value of that binding, which isn’t gonna work when we, say, pass continuations.
If it is not, we put this binding to our lambda
scope.
So if we can successfully put every binding from the parent expression to our current expression, means we no longer need the parent expression and can optimize it!
After that, we can deal with the usual lambda application stuff. First, we have to bind all argument names to respective values, but there’s a problem. Arguments can be in form of a list, or the list itself, so we have to deal with provided arguments differently:
let procedure_name = match &lambda.borrow().data.data {
Data::String { data } if data.len() > 11 => data[11..].to_string(),
_ => "anonymous".to_string(),
};
match Self::expression_type(&lambda_args) {
// …
}
We store the name of our procedure4 so we could signal meaningful errors in case there’s a wrong argument count or other noise.
Then we check the type of lambda_args
.
It either can be Procedure
3 or Name
.
In the case of Procedure
it is simple, because all we have to do is compare array sizes, and if those are the same, go through each array and bind data from one array to data from another, and put it into the scope:
Type::Procedure => {
Self::check_argument_count(
&procedure_name,
ArgAmount::NotEqual(lambda_args.borrow().siblings.len()),
&args,
)?;
let mut lambda_body = lambda_body.borrow_mut();
for (key, val) in lambda_args.borrow().siblings.iter().zip(args) {
lambda_body
.data
.scope
.insert(key.borrow().data.to_string(), val.clone());
}
}
In case it is Name
we need to create a quoted list with all values from arguments that were passed to our function and bind it to the name:
Type::Name => {
let quoted_args = Tree::new(GRData::from_str("("));
Tree::push_child("ed_args, GRData::from_str("#procedure:quote"));
let list = Tree::push_child("ed_args, GRData::from_str("("));
for c in args.iter() {
let res = match Self::expression_type(&c) {
Type::List | Type::Symbol => {
let res = Self::rest_expressions(&c)?;
res[0].clone()
}
_ => c.clone(),
};
Tree::push_tree(&list, res);
}
lambda_body
.borrow_mut()
.data
.scope
.insert(lambda_args.borrow().data.to_string(), quoted_args);
}
E.g. if we call expression ((lambda x x) 1 2 3 4 5)
, this code above will create '(1 2 3 4 5)
and bind it to x
.
Then our procedure will return this x
.
In case it was not a Name
or Procedure
we throw error:
_ => {
return Err(EvalError::GeneralError {
message: "wrong binding type".to_owned(),
})
}
This is pretty much it! After we’ve done all of this our body is ready! So we return it, as well as information about the possibility of tail call optimization:
Ok((can_optimize, lambda_body))
You remember that our body is just progn
which contains all expressions, so our eval
receives it and does those ones by one.
So we simply convert all user-defined procedures to primitive procedures that are supported by an interpreter and evaluated.
Tail call elimination example
So above we’ve discussed how tail call elimination works in GRScheme. Let’s illustrate that process using trees for this simple procedure:
> (define f (lambda (x)
(display x)
(newline)
(f (+ x 1))))
> (f 0)
0
1
2
3
…
9578583832728723847384
9578583832728723847385
9578583832728723847386
…
This procedure displays the number we give it, prints a newline, and calls itself with the number increased by 1. In most languages that have no tail call elimination the biggest number, we can see represents the size of the stack that we can possibly have. However in Scheme, there’s no such limitation, and we can evaluate this procedure in constant space, e.g. the amount of RAM will stay the same. Let’s see how this procedure is represented in our interpreter.
First, we evaluate lambda
and produce such tree:
Then, define changes #procedure:anonymous
to #procedure:f
and binds f
to the value of #procedure:f
in global scope.
Then we apply f
to 0
we have this tree:
lookup
procedure checks the value of f
and replaces f
with our #procedure:f
:
Now we call apply_lambda
where proc
will hold #procedure:f
and args
will hold [0]
.
This binds x
to 0
at the scope of the progn
, and gives the evaluator this tree:
To clarify, the Scope
here is not a tree node, but data within the (
node.
Now evaluator starts the process of evaluating this expression.
It sees progn
and evaluates all its arguments in order.
First, we evaluate display
and its one and only argument x
.
We lookup
it, and find the value in progn's
parent (
:
Then display
prints 0
on screen, and evaluates to #void
:
After that, we evaluate newline
.
It also prints \n
on the screen, and evaluates to #void
:
After that we evaluate f
, but first we need to evaluate its argument (+ x 1)
.
We lookup x
as we did for newline
, and evaluate +
with 0
and 1
to 1
:
Now we are ready to call f
.
Let’s immediately apply it to see how it looks:
And without tail call optimization each iteration will expand like this.
However, notice that we’re evaluating the last expression, and the first one is progn
.
This means, that to this point we’ve evaluated everything in this progn
and we can throw it away.
So instead of expanding the last expression, we replace our old progn
with a new one:
We again apply all expressions in progn
, then call (f 2)
at the end and optimize the tail call.
This process repeats every time and we always use the same amount of space.
This pretty much sums up how the interpreter works. All these trees can be viewed as a direct representation of what’s going on underneath.
Quasiquoting, Unquoting, and Unquote Splicing
Currently, GRScheme has poor support for these features. It supports quasiquoting but one thing is not done correctly, and this is due to internal data representation. I’m not sure where to fix this, because it can be fixed in two places.
To understand this problem let’s have two examples of quasiquoting and quoting in the list evaluated in Racket:
> (let ((name1 'a)
(name2 'b))
`(a ,name1 b ,name2))
'(a a b b)
> (let ((name1 'a)
(name2 'b))
(list 'a name1 'b name2))
'(a a b b)
If we evaluate the same expressions in GRScheme, we’ll get this:
> (let (name1 'a
name2 'b)
`(a ,name1 b ,name2))
'(a 'a b 'b)
> (let (name1 'a
name2 'b)
((lambda x x) 'a name1 'b name2))
'(a a b b)
This happens because unquote
evaluates name1
, and name1
holds 'a
.
So it puts 'a
to the list.
However when we construct lists with (lambda x x)
or with cons
GRScheme imitates a regular Scheme and removes unnecessary quotes.
I thought that this will work when I started developing GRScheme, but it seems that quasiquoting shatters my dreams here.
I can fix this in quasiquoting as a hack, but I think that instead, I should remove hacks related to list construction and all car
-cdr
related stuff.
This way it will directly represent the internal data structure and will be much more reliable and consistent.
The other way I can fix this is to make it work correct, e.g. (quote (a))
will yield (a)
, but this will introduce some other additional checks in the tree, where we have to check if we should evaluate our expression or not.
Maybe I’m missing something though, and this can be done without additional checks.
Anyway, I’ll leave it for later.
Unquote splicing is not supported at all.
-
Top level scope, in this case, is empty because we imitate the use of the REPL. When a file is being parsed its contents are put to
progn
procedure, and only the last expression result is returned. In the case of the REPL if you enter multiple expressions without line breaks it will also be put into theprogn
call, however, REPL removes it and evaluates expressions in global scope, thus keeping bindings. Actually, every parsed expression sequence is wrapped toprogn
. ↩︎ -
Quasiquote implemented in terms of top-down gathering all valid unquotes and evaluating those bottom-up. This is not really suitable technique for proper quasiquoting, because splicing is really hard to do this way. Also because we store symbols in quoted form unquoting a name that holds a symbol will put a symbol in the quoted form to our list. ↩︎
-
Procedure
, in this case, means not a real procedure that we will apply, but a form of data that we expect. E.g. out of context, this code(x)
is a procedure application, but in the context of alambda
, it is a list of arguments. Reusing the same patterns can be misleading, but I think it is better than adding new data structures, as some other languages do. E.g. in Fennel lambdas are created by using square brackets:((lambda [x] (* x x)) 2)
will produce4
, and((lambda (x) (* x x)) 2)
will produce an error:Compile error in 'fn' unknown:?: expected vector arg list [a b …]
. This also applies to some other Lisp derivatives, like in Clojure:(fn [x] (* x x))
. ↩︎ ↩︎ -
define
can detect if we’re binding a value or a procedure. And if we’re binding anonymous procedure,define
changes its name to the string representation of the first argument passed to define. E.g. if we do(define square (lambda (x) (* x x)))
, define will change#procedure:anonymous
to#procedure:square
. ↩︎