The Type Inference Torture Trail -- all or nothing!

λ July 3, 2020
Tags: prolog, felt, type-inference

In the on-going saga of developing FELT (a mammoth transpiler, v0.1 is
viewable at http://felt-lang.com, I had need to implement simple type
inference...once introduced it touches everything; there are no half
measures so for now I have had to make a compromise to avoid getting
side-tracked right now.

What is type inference anyway ?

Putting it simply, it’s the ability of a compiler or an interpreter to work out what the data type of an expression or variable is from the context in which it is used rather than having it explicitly declared by the developer.

For example, in the following code, FELT would already know that the type of argc is int from its internal knowledge base, if you decided to call that something else just to be awkward, well, that would be your own silly fault.

(defvar argc/float 0)

(defun main (argc argv)
  (defvar n argc)
  (emit n)
)

From the above, we (note, -we- not my language yet!) can see that defining the variable n to have the initial value of argc implies that they will have the same data type, int to be precise. Then when the generic EMIT instruction sees its only argument, all it has to do is ask for the type. Note that it is not fooled by the global scoped definition of argc as a float either, I have already implemented lexical scoping and that is working well.

Where it breaks down, for now.

I broke it with a simple test case that had this in it:

(emit 'Hello World' (+ foo bar (* 3 baz)))

And the nub of the problem is this: in order to know what printf format specifier is needed for the second argument, I would have to know the types of foo, bar, baz and if they were different types, then I would also now be forced into knowing the type coercion / promotion rules for C as well. As you can see, things mushroom like mad! :D

Don’t need it right now

The reason why I have spent a lot of time on it but not enough time to absolutely smash it is that I have a plan of execution regarding the order of instructions and how to go about squaring the circle to provide complete translation of FELT into C and this is basically, an irksome diversion from that plan.

The Workaround

At the moment the render phase will throw an exception stating that you need to provide the type as it has failed if it can’t work it out. This feels bad and I will fix it eventually, but not today. Not this week. Maybe this month. It will have a good go by examining the lexical scoping of the variable, and also by checking to see if the string is a hex number, a floating point number or decimal number, using “%x”, “%f” and “%i” accordingly but this may not work in all cases.

Call printf directly for C

The language contains the #F:WHEN instruction which lets you have control over what code makes it through to the output stream, so assuming C needed special treatment we could do this instead.

(#f:when CCoder
  (printf "correct formats" ...args...))

Sadly the current implementation of this instruction does not have an “else” so you would have to explicitly provide an implementation. It was initially designed to allow different code rendering inside a function not to provide complete different functions, but fear not it has been fixed in the new version and will do the right thing as if by magic.

To see what I mean, here is a short snippet of the code that renders to both PHP and JavaScript from the live site, it is the function <> which renders an HTML tag with attributes and body.

(defun <> (name (atts (array)))
  (#f:when PHPCoder
    (:= body (func-get-args)))
  (#f:when JSCoder
    (:= body (func-get-args arguments)))
    
  (ashift body)
  (ashift body)
  (return 
    (concat "<" name (atts->html atts) ">"
	    (->string body) "</" name ">"
	    HTML_SPACER)))

The function is DEFUN-ed into scope, it has the name <> which is fine by me, and it says it has one argument we call atts and the default parameter is the empty array. The very first thing inside the function is a compile time instruction that chooses either a PHP version or a JavaScript version such that the variable body contains the functions arguments. We then array shift the first two arguments away (the tag name and attributes) leaving us with the body content.

Possible enhancements

Currently you assign a type to a variable like this, “varname/type” (without the quotes) and type is stored and used when needed. I could change the syntax so that the emit argument uses the “starred form”, something the parser already allows for but is not currently used. The starred form? Yes! All instructions can be entered as foo or foo*… this goes back the first implementation. I might drop this feature but not until I see how much I used it. Basically it lets you choose an alternative internal handling of the instruction, for example, DEFUN defines a function, DEFUN* defines an anonymous function, allowing closures and local functions to be used.

Going forward

Not a problem. I already have a solution in mind but I really need to make the code generation process my top priority. For the most part, the type inference and lexical scoping is “good enough” and I always knew that the “C” target would be the hardest to do which is why I chose it first…once I have it done the rest of the targets will be a walk in the park.

Happy hacking!

Comments