Adding Generics To My Programming Language
Posted October 27th, 2021
Introduction
In a previous article, I introduced the programming language I was working on. One of the features I intend to work on is generics. At the time of writing this, I haven't started yet. In this post I'll describe how I add it.
Why Generics?
Broadly speaking, the main goal of Minerva is to make control flow more natural. One of the most common kinds of control flow is iteration. One way of looking at iteration is creating and consuming lists. Since Minerva is statically typed, I need to support iterating over lists of different types. I also want to eventually write the Minerva compiler in Minerva, and that will require lists of different types - e.g lists of tokens and lists of expressions.
Furthermore, in any decent langauge and standard library, type polymorphism is essential.
The Goal
I'm not planning on dealing with covariance vs contravariance any time soon. Right now I just want something simple. I want code like this to work:
class Container<T>(item: T) {
function get() => item;
}
var container = Container("value");
print (value.get() + " type infers string");
The Plan
I think I will do things in this order:
- Parse Generic declarations.
- Store the generic variables in function declarations
- Use them for type checking
- Allow parsing of type arguments in call site
- Use for type checking
- Make sure type inference works with generics
Parsing Generic Declarations
I'll start with functions.
private fun lambdaExpression(): Expr.Function {
if (match(TokenType.LESS)) {
genericDeclaration()
}
consume(TokenType.LEFT_PAREN, "Expect ')' after function name.")
private fun genericDeclaration() {
if (!check(TokenType.GREATER)) {
do {
consume(TokenType.IDENTIFIER, "Expect generic parameter name")
} while (match(TokenType.COMMA))
}
consume(TokenType.GREATER, "Expect closing >")
}
Right now I'm not storing any of the values it returns.
Now code like this can compile:
function double<T>(value: Int) => value + value;
I can't do this:
function double<T>(value: T) => value + value;
But one step at a time.
Storing generic type parameters
FIrst I'll make it so that the type parameters are returned, as a list of identifiers:
private fun genericDeclaration(): MutableList<Token> {
val typeParameters = mutableListOf<Token>()
if (!check(TokenType.GREATER)) {
do {
typeParameters.add(consume(TokenType.IDENTIFIER, "Expect generic parameter name"))
} while (match(TokenType.COMMA))
}
consume(TokenType.GREATER, "Expect closing >")
return typeParameters
}
val typeParameters = if (match(TokenType.LESS)) {
genericDeclaration()
} else emptyList()
And now I'll store them in function expressions:
class Function(
val parameters: List<Token>,
val typeParameters: List<Token>,
val body: Expr)
: Expr(Type.AnyType())
Use generics for type checking
I need to bind the type parameters:
is Expr.Function -> {
val block = if (expr.body is Expr.Block)
expr.body
else
Expr.Block(listOf(Stmt.Expression(expr.body)))
expr.typeParameters.forEach {
environment.define(it.lexeme, Type.InferrableType())
}
val closure = Environment(environment)
I'm not sure if this is right or if I need to create new kind of type - GenericType. Either way, onto the next step:
Parsing type arguments in call site
private fun call(): Expr {
var expr = primary()
while (true) {
if (match(TokenType.LEFT_PAREN)) {
This is a problem, by the time I know I'm in a call, it's too late to check if there was a generic first.
private fun call(): Expr {
var expr = primary()
while (true) {
var typeArguments = if (match(TokenType.LESS)) {
genericCall()
} else emptyList()
if (match(TokenType.LEFT_PAREN)) {
I think this will cause bugs by allowing things like this:
double<T>.name
// or
double<T>[2]
But I can fix that later
private fun genericCall(): MutableList<Type> {
val typeArguments = mutableListOf<Type>()
if (!check(TokenType.GREATER)) {
do {
typeArguments.add(typeExpression())
} while (check(TokenType.COMMA))
}
consume(TokenType.GREATER, "Expect >")
return typeArguments
}
Use for type checking
I've hit a roadblock. I assume that any identifier that isn't Boolean, Int, etc, is an instance type. Which results in errors when I typecheck generic types.
I think I will need to introduce a placeholder type, and lookup type variables during typechecking rather than assuming that they are instances.
I've went down a whole rabbit hole without updating things here. Here's what I did: I created a placeholder type. I made generics and classes use this type. I added generic types to call expressions. Now this works:
function double<T>(value: T)=> value;
print (3+double<Int>(3)+3);
And this is an error:
function double<T>(value: T)=> value;
print (3+double<String>(3)+3);
There's still more to do, and a lot of cleanup, but I will make sure the rest of the use cases work.
Ok, I just uncommented some other code, and it turns out that I broke unrelated stuff. The linked list Minerva test code I had earlier is resulting in a stack overflow.
My guess is that type matching does not work anymore, because instance types are now treated like unresolved types.
Ok that's been fixed. There's a good chance there are other things with similar issues, but I'll sort that out after.
Handle scoping and classes
At the moment, I only check the return value of the function and compare it to the type arguments of that same function. I need a way to do it for a class, so that all methods inside that class can use the type parameters, and allow nested functions to use it.
Like before, I'll start with the parser:
class Container<T>(var value: Int) {
}
I'll need to store this somewhere, either in the constructor statement or the class statement.
Actually, I've changed my mind. Rather than start with classes, I'll make it so that scoping works with functions.
I have an idea that should have been obvious.
Imagine you're executing this function:
function add(a: Int, b: Int) : Int => a + b;
Where does a
come from? Or b
? A naive way would be to get the function from the call, and ask it. One problem with that is this would not work for a function like this:
function getAdder(num: Int) => {
function (foo: Int) => num + foo;
}
A better way would be to have a scoped environment where the parameters are defined, and in the call expression it asks for the variables from the current scope. This is how variable values and types are handled, in the resolver and typechecker, respectively. But for some reason I made generic type resolution work the naive way. In order to make it work the sensible way, when I am type checking a call that passes in types, I update the values of the type parameters in the current scope, so that when I look them up I can find them.
Ok now that works. Before I make it work for classes, I'll refactor it to not be as messy.
Another issue
I've written a linked list for testing:
class Node<T>(var value: T, var next: Node | null) {}
class List<T>() {
var head: Node | null = null;
var size = 0;
function add(value: T) => {
this.head = Node<T>(value, this.head);
};
function traverse(current: Node, func: (T)=>Any): null => {
var next = current.next;
func(current.value);
typematch(next) {
Node => this.traverse(next, func);
null => null;
};
null;
};
function iterate(func: (T)=>Any) => {
var ptr = this.head;
typematch(ptr) {
Node =>this.traverse(ptr, func);
else =>null;
};
};
};
Instantiating it works fine. I can add things, and typechecking ensures I add the right things:
var list = List<Int>();
list.add(4);
list.add(3);
list.add(5);
list.add(10);
list.add(12);
list.add(8);
list.iterate(function (num: T) => {print num;});
list.add("hello"); // I get a type error here, as I should
So what's the issue?
The problem lies in creating a new linked list that has a different type argument:
list.iterate(function (num: T) => {print num;});
var newList = List<String>();
newList.add("Hello");
newList.add("World!");
newList.iterate(function (value: T) => {print value;});
list.add(12); // type error here, there shouldn't be
list.add("hmm"); // no error here, when there should be
The problem is scoping, I actually need to set the type argument in a nested scope. Instead of assuming that the type argument of any List must be Int because the first usage is Int, I need to scope it by variable name. This will require me to change the architecture of the type checker.
I've had a long break from this, but I finally got it to work after removing a lot of the code I added trying to make it work.
Support Inference
Here is the desired behaviour:
class Container<T>(var value: T) {}
var container = Container(3);
class Container<T, V>(var value: T) {
var lst: V[] = Array<V>();
}
var container = Container<Int, String>(3);
// either infer everything, or infer nothing
I've added a function that takes in the provided type arguments and checks if it has the same size as the type parameters. If not, then it checks if it is empty. If it's empty it iterates through the parameters and arguments, looking for parameters that refer to generic types, and the corresponding type of the argument. It then returns a new array of types.
Adding generics to type declarations
List<Car>
should probably not be assignable to List<Person>
.
I'll start with the parser:
if (match(TokenType.LESS)) {
do {
typeExpression()
}
while (match(TokenType.COMMA))
consume(TokenType.GREATER, "Expect closing '>'")
}
Now here is a challenge - in the parser I am treating all instance types as unresolved types. At the moment they only reference the name of the class. I will a field that points to other unresolved types.
When I lookup a type and find out that it is an instance type, I then look for the type arguments and apply them.
Modifying class assignment
Most of this code is concerned with checking superclasses, I need to modify this to ensure that type arguments are also checked.
You might think that all I have to do is check that both instances are of the same class, and that they have the same type arguments, or at least that the type arguments are assignable to eachother, but remember, Minerva supports inheritance. If Cat extends Animal, than Cat can be assigned to Animal. What if they both take in type parameters, but Cat takes in even more? How do I check if they are assignable to each other? This is also where covariance bites me. In order to make this work, I first need to ensure that type arguments are passed to superclasses the same way constructor parameters are. But before that, I have another issue I need to fix.
Parsing Problems
After supporting generic type declarations, using < as a comparison operator results in errors. Now, whenever there is a < operator, the parser assumes that it's a generic type declaration, and waits for a >.
The parser needs to be able to tell the difference between:
a < b
a<b>()
I think instead of just checking for '<', I also need to check if the next token is an identifier, and then the token after that is either a comma or a >. That means the parsers needs extra levels of lookahead.
That worked like a charm. The code was clunky, but it worked.
I'm going to take a break now before attempting the next stage. I first need to decide what the desired behaviour is for combining generics with inheritance, so I will have to spend time thinking about it and looking at what languages like Kotlin and Typescript do.
Inheritance
Here is how I want my language to work with generics and inheritance:
class Parent<T> {}
class Child<G, T> extends Parent<T>{}
var john = Child<String, Int>();
Also:
class Parent<T> {}
class Child extends Parent<String> {}
First I will change how inheritance is handling without taking generics into account. At the moment, instance types have a superclass name field, which is used to look up parent classes when assigning or getting members.
Instead of storing the name, I will make it store the superclass instance type. That way, I can have type arguments inside the superlclass.
It seems like generic type inheritance works now.
More parsing problems
Remember the previous issue with < being used as the less than operator and in generic calls? I fixed it by checking for an identifier then either a comma or a >.
The problem is that String, Int, etc, are not identifiers, and since type declarations are parsed, in theory it should take union type parameters and function types, not just identifiers. For some reason this worked when parsing with one generic type parameter, but not two.
I can't think of a good way to parse this, since I need a way to do variable length pattern matches.
I want to rewrite the parser at some point, but not know. Unfortunately, it seems like I will need to make non-trivial changes to the parser to make this work.
I've decided on a compromise. I won't currently check for function types in generic calls, that way I can keep the parser as is.
That worked. It looks like generic inheritance still has problems, it's order dependent. To fix that instead of going off the generic parameter name, I also track the order it's passed in to the superclass.
Finishing class assignment
The way that I currently check if a class type can be assigned to the other is by walking up the inheritance tree until I get to the "same" class. To make sure this works for generics, I'll need to apply the generic parameters at each level, and then see if they have the "same" type arguments.
Tidying up
The code surrounding substitution is very messy. It took me more time and code to support generics than it took me to add type checking in the first place. Now the typechecker has twice the amount of code as the interpreter. Before I push this to Github or start on another feature I will need to do a lot of refactoring.
What's Next
As mentioned earlier, one of the main reasons I've added support for generics is to pave the way for creating collections of different types, so that they can be iterated over. To make these collections iterable, I'd like them to conform to an iterator interface, and so I would first have to add support for interfaces. Before I do that, however, there are a few housekeeping things I'd like to do. I'd like to add multi-line comments to make testing easier. And I'd like to create multiple test files for all the language features so that I can make sure that all previous features work when adding a new feature.