We all know what lexical scoping is, but have you ever had to implement it? I recently found myself in this situation and it was a lot easier than I thought it would be!
I am currently in the process of writing a version “2.0” of a transpiler / translator project I started a long time ago. The official site is here. For the record, that site is 99.742% FELT code and a tiny amount of PHP bootstrap code. But that’s another article.
Again, because I am lazy, here’s the Wikipedia page on it, which says, briefly:
In languages with lexical scope (also called static scope), name resolution depends on the location in the source code and the lexical context, which is defined by where the named variable or function is defined.
Why did I do it?
Because I had to! My hand was forced!! Truth be told, I knew I’d have to do it, but it wasn’t on the cards this far in. It is not without purpose that I chose “C” to be the first back-end; it was always going to be the hardest of them all.
Basically, there are some instructions in my language that write things to “standard out”, whatever that means for the chosen target. The keyword for this is EMIT, as you are emitting characters to an output stream. It’s also uncommon enough not to be confused with notions of print/write etc. On the C back-end, printf is the vehicle for EMIT.
Anybody who’s used printf knows that each argument in the parameter list has a corresponding entry in the format string. Here are some simple examples to show how my journey with implementing lexical scoping began:
In FELT, when you want to print stuff out you can do this; emit is styled after echo in PHP, as PHP was the original driver for a lot of the language as it currently stands, at least in the beginning:
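Something like the following, in the spirit of FELT — I am approximating the surface syntax from the keywords used in this article (defvar, emit), so treat it as pseudocode rather than verbatim FELT:

```
(defvar total 42)            ;; no /<type> suffix, so it defaults
(emit "total is " total)     ;; emit writes to "standard out"
```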
The definition of total carries no type (“/<type>”) on the end, so in C it will default to the type “int”. This means that the EMIT statement needs to know what the type was when the variable was defined. For this trivial example that’s not a problem, as long as all the defined variables are held in a dictionary.
Show me some code
It gets a bit lairy when you do stuff like this:
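The troublesome shape is a gv defined at the top level, used by one function and redefined by another — again, an approximation of FELT syntax rather than the real thing:

```
(defvar gv 42)

(defun first-case ()
  (emit gv))                  ;; resolves to the global gv

(defun second-case ()
  (defvar gv/string "hello")  ;; a brand new local gv, different type entirely
  (emit gv))
```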
In the second case, gv is a new variable in no way connected with the globally declared one, and we have to take this into account. Initially I had a dictionary for variables and functions as I encountered them, and a reasonably good set of predicates to deal with them. Implementing lexical scoping then became almost trivial once I had figured out how best to do it in my code base.
When beginning the translation run I call md_reset; this globally sets some variables I can access from anywhere. Yes, globals. But they are used in a very strict and controlled manner, and I see no harm in them. They do NOT cause bad code; they encourage bad people to do bad things, that’s all. I am neither.
On entry into the DEFUN handler, all we need to do is establish a new scope and we are done! Any subsequent DEFVAR instructions in the function, and any function parameters, will be added to the new dictionary. Here’s the current implementation (it will no doubt change), but it shows how easy it is:
defun(_Pos, Name, Sig, Body, Options, Out) :-
    env_push,
    produce_return_scope(Options, Name, FName, Scope, ReturnType),
    produce_arglist(Options, Sig, SigStr),
    progn_body(Options, Body, BodyStr),
    env_pop,
    % c specific
    template(c, defun, Tpl),
    Out = [ Tpl - [Scope, ReturnType, FName, SigStr],
            tab(BodyStr),
            "}\n"
          ],
    debug(coder, 'defun: out: ~s', [Out]).
env_push and its brethren live in a separate file and they are very, very simple: they push and pop a list of dictionaries, and the lookup procedure walks up the stack until it finds a match or fails.
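Assuming the same global envstack of SWI-Prolog dicts that env_lookup uses, a minimal sketch of the push/pop pair could look like this (my reconstruction, not the actual source — the dict tag in particular is a guess):

```prolog
% push a fresh, empty scope onto the environment stack.
env_push :-
    nb_getval(envstack, Envs),
    nb_setval(envstack, [env{}|Envs]).

% pop the innermost scope off the environment stack.
env_pop :-
    nb_getval(envstack, [_|Envs]),
    nb_setval(envstack, Envs).
```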
Adding a new variable is done when translating the arguments to a function and when defining a new variable within the current scope. We fail if we find one already in place; otherwise we update the environment stack with the updated dictionary containing the new (var, type) entry.
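The unit tests call env_add("foo", "char*"), so under the same envstack-of-dicts assumption a sketch of that predicate might be (again my reconstruction, not the real code):

```prolog
% add Name/Type to the innermost scope; fail if Name is already there.
env_add(Name, Type) :-
    nb_getval(envstack, [Env|Envs]),
    atom_string(AName, Name),
    \+ get_dict(AName, Env, _),        % a duplicate in this scope is an error
    put_dict(AName, Env, Type, NewEnv),
    nb_setval(envstack, [NewEnv|Envs]).
```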
When the time comes to resolve the type of a variable it’s simple: I just call env_lookup, which fails if the variable is not found in any of the scopes:
% look up a variable in all scopes until found.
env_lookup(Name, Type) :-
    nb_getval(envstack, Envs),
    atom_string(AName, Name),
    env_lookup_(Envs, AName, Type).

% end of the line, nothing found, fail!
env_lookup_([], Name, _) :-
    !,
    debug(scoper, "didn't find: ~s", [Name]),
    fail.

% found name/type in current scope.
env_lookup_([Env|_], Name, Type) :-
    get_dict(Name, Env, Type),
    debug(scoper, 'env_lookup: ~p = ~p', [Name, Type]),
    !.

% nothing so far, go up one.
env_lookup_([_|Envs], Name, Type) :-
    debug(coder, 'env_lookup_: going up for ~p', [Name]),
    env_lookup_(Envs, Name, Type).
This means I can now pick up and raise exceptions for:
- naming two parameters or more the same
- naming a local variable the same as a parameter (shadowing)
- naming two local variables the same
Some languages only warn but I do not tolerate such behaviour.
Unit test output
You can see what it gets up to when the unit tests run, provided you have enabled the debug output for the scoper topic:
% PL-Unit: lexical_scope .
% env_lookup: status = "const char*"
% env_lookup: foo = "char*"
% env_lookup: foo = "my-type"
% didn't find: nowhere-man
That is the output from these tests. Writing unit tests in Prolog is incredibly succinct and terribly readable!
test(walks_up_to_find_nearest_definition, [true(Value == "my-type")]) :-
    md_reset,
    env_add("foo", "char*"),
    env_push,
    env_push,
    env_add("foo", "my-type"),
    env_push,
    env_push,
    env_lookup("foo", Value).

test(fails_when_no_resolution, [fail]) :-
    md_reset,
    env_push,
    env_push,
    env_push,
    env_lookup("nowhere-man", _).
Do hard things, it’s fun.
I still have some fun to come, as I recently became aware that GCC (as a GNU extension) and some other compilers allow local functions to be defined in C, which I must confess I was pleasantly surprised by. It means that the lexical scoping dictionary will have to hold a second type of entry, but apart from that it just became trivial.