Finding ̶N̶e̶m̶o̶ Names - Implementing Lexical Scoping for FELT

λ June 29, 2020
Tags: swi, prolog, felt

We all know what lexical scoping is, but have you ever had to implement it? I recently found myself in this situation and it was a lot easier that I thought it would be!

I am currently in the process of writing a version “2.0” of a transpiler / translator project I started a long time ago. The official site is here. For the record, that site is 99.742% FELT code and a tiny amount of PHP bootstrap code. But that’s another article.

Lexical Scoping

Again, because I am lazy, here’s the Wikipedia page on it and it says, briefly:

In languages with lexical scope (also called static scope), name resolution depends on the location in the source code and the lexical context, which is defined by where the named variable or function is defined.

Why did I do it ?

Because I had to! My hand was forced!! Truth be told I knew I’d have to do it but it wasn’t on the cards this far in. It is not without purpose that I chose “C” to be the first back-end; it was always going to be the hardest of them all.

Basically, there are some instructions in my language that offer to write things to “standard out”, whatever that means for the chosen target. The keyword for this is EMIT as you are emitting characters to an output stream. It’s also uncommon enough not to be confused with notions of print/write etc.

I originally wrote v1 to work with PHP and JavaScript (which it does) and those languages do not have formally declared variable types; you name the thing and crack on using it. Duck Typing. However, “C” is statically typed and it means that variables have types and almost everywhere you have the variable, you will need that type information. Case in point. printf is the vehicle for EMIT on the C back-end.

printf

Anybody that’s used printf knows that each argument in the parameter list has a corresponding entry in the format string, here are some simple examples to show how my journey with implementing lexical scoping began:

printf("%i", 42);      // => "42"
printf("%c", 65);      // => "A"
printf("%p", pBuffer); // => "0x800b3491"
etc

In FELT, when you want to print stuff out you can do this, emit is styled after echo in PHP as that was the original driver for a lot of the language as it currently stands, at least in the beginning:

(defvar total (+ 10 32))
(emit "hello" 'world' total)

The definition of total carries no type “/<type” on the end so in C it will default to having the type “int”. This means that the EMIT statement needs to know what the type was when it was defined. For this trivial example that’s not a problem if all the defined variables are held in a dictionary.

Show me some code

It gets a bit lairy when you do stuff like this:

(defvar gv 42)         ;  a global value
(defun test (gv) ...)  ; this is a NEW local GV

or

(defun test (a b)
  (defvar gv 0)
   ...
)

In the second case, gv is a new variable in no way connected with the globally declared one and we have to take this into account. Initially I had a dictionary for variables and functions as I encountered them, and I had a reasonably good set of predicates to deal with them. Implementing lexical scoping then became almost trivial once I had figured out how best to do it in my code base.

Initial environment

When beginning the translation run I call md_reset, this globally sets some variables I can access from anywhere. Yes, globals. But they are used in a very strict and controlled manner. i see no harm in the. They do NOT cause bad code, they encourage bad people to do bad things that’s all. I am neither.

On entry into the DEFUN handler, all we need to do is establish a new scope and we are done! Any subsequent DEFVAR instructions in the function, and any function parameters will be added to the new dictionary. Here’s the current implementation (will change no doubt) but it shows how easy it is:

defun(_Pos, Name, Sig, Body, Options, Out) :-
    env_push,
    produce_return_scope(Options, Name, FName, Scope, ReturnType),
    produce_arglist(Options, Sig, SigStr),
    progn_body(Options, Body, BodyStr),
    env_pop,
    % c specific
    template(c, defun, Tpl),
    Out = [
        Tpl - [ Scope, ReturnType, FName, SigStr ],
        tab(BodyStr),
        "}\n"
    ],
    debug(coder, 'defun: out: ~s', [Out, indent("BODY"), "}"]).

env_push and it’s brethren live in a separate file and they are very very simple, basically pushing and popping a list of dictionaries, and the lookup procedure walks up until it finds a match or not:

    % enter a new lexical scope
env_push :-
    nb_getval(envstack, Env),
    nb_setval(envstack, [_{}| Env]).

    % return to previous lexical scope
env_pop :- % discard current top of stack
    (   nb_getval(envstack, [_|Env])
    ->  nb_setval(envstack, Env)
    ; true
    ).

Adding a new variable is done when translating the arguments to a function and when defining a new variable withing the current scope… we fail if we find one already in place, otherwise we update the environment stack with the updated dictionary with the new (var,type)-KV entry:

    % add var+type to current lexical scope
    % fails if already exists in current scope
env_add(Name, Type) :-
    nb_getval(envstack, [Env|Envs]),
    atom_string(AName, Name),
    (   get_dict(AName, Env, _)
    ->  fail
    ;   NewEnv = Env.put([AName=Type]),
        nb_setval(envstack, [NewEnv|Envs])
    ).

When the time comes to resolve the type of a variable it’s simple, I just call env_lookup which fails if the variable is not found in any of the list of scopes:

    % look up a variable in all scopes until found.
env_lookup(Name, Type) :-
    nb_getval(envstack, Envs),
    atom_string(AName, Name),
    env_lookup_(Envs, AName, Type).
    % end of the line, nothing found, fail!
env_lookup_([], Name, _) :-
    !,
    debug(scoper, "didn't find: ~s", [Name]),
    fail.
    % found name/type in current scope.
env_lookup_([Env| _], Name, Type) :-
    get_dict(Name, Env, Type),
    debug(scoper, 'env_lookup: ~p = ~p', [Name, Type]),
    !.
    % nothing so far, go up one.
env_lookup_([_|Envs], Name, Type) :-
    debug(coder, 'env_lookup_: going up for ~p', [Name]),
    env_lookup_(Envs, Name, Type).

This means I can now pick up and raise exceptions for:

  • naming two parameters or more the same
  • naming a local variable the same as a parameter (shadowing)
  • naming two local variables the same

Some languages only warn but I do not tolerate such behaviour.

Unit test output

You can see what it gets up to when the unit tests run and you have enabled the debug output for the scoper tag:

% PL-Unit: lexical_scope .
% env_lookup: status = "const char*"
% env_lookup: foo = "char*"
% env_lookup: foo = "my-type"
% didn't find: nowhere-man

That is the output from these tests, writing unit tests in Prolog is incredibly succinct and terribly readable!

test(walks_up_to_find_nearest_definition,
     [true(Value == "my-type")]) :-
    md_reset,
    env_add("foo", "char*"),
    env_push,
    env_push,
    env_add("foo", "my-type"),
    env_push,
    env_push,
    env_lookup("foo", Value).

test(fails_when_no_resolution, [fail]) :-
    md_reset,
    env_push,
    env_push,
    env_push,
    env_lookup("nowhere-man", _).

Summary

Do hard things, it’s fun.

I still have some fun to come as I recently became aware of the fact that GNU Compilers and others now allow local functions to be defined in C, which I must confess to being pleasantly surprised at. It means that lexical scoping will have to have a second type in the dictionary but apart from that it just became trivial.

Happy hacking.

Comments