Thoughts on types
Posted August 24th, 2021
I've used multiple programming languages, each with its own ideas of how types should be handled.
Each of these approaches created some issues and solved other issues, and my goal in this article is to describe these pros and cons. Rather than describe them in a list format, I will try to examine what we do as programmers and how various approaches to handling types can help or hinder this.
I'm writing my own programming language. I'll announce it here later. One of my goals with it is to analyse what I do in practice, and add in what I feel is missing. The thoughts I've expressed in this article have gone into my own programming language.
If you pause a program whilst it's evaluating an expression, that expression will have a type, whether or not you wrote the type down. There will be valid things you can do to it and things you can't. If an expression evaluates to an integer, you can use it for arithmetic operations you could not perform on strings.
You might think this makes static types redundant, but it actually makes them all the more valuable. Whether or not you're specifying types, you need to think about them to ensure that your program does not crash.
If your compiler does not enforce this, then you will need to be extra vigilant to avoid run-time errors.
Whenever you deal with a variable, you have to read the implementation and usage to make sure you are treating it in a valid way.
By having static types, your IDE or editor can notify you if you are writing invalid code. Static types are also a good form of documentation, because the compiler makes sure it is consistent with your program.
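As a minimal sketch of this, here is a small TypeScript example. The function names `double` and `runtimeTypeOf` are made up for illustration. The annotation on `double` acts as documentation that the compiler keeps honest, while `runtimeTypeOf` shows that values carry a type at run time regardless of what was written down.

```typescript
// Hypothetical helper: the annotation documents, and the compiler enforces,
// that only numbers may be passed in.
function double(n: number): number {
  return n * 2;
}

// Every value has a type at run time, whether or not it was declared.
function runtimeTypeOf(value: unknown): string {
  return typeof value;
}

console.log(double(21));           // 42
console.log(runtimeTypeOf("foo")); // string

// double("foo");
// Uncommenting the line above is rejected at compile time, because a string
// is not assignable to a parameter of type number.
```

Without the annotation, the mistake in the commented-out call would only show up when that line actually ran.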
If you have an array you will probably have a loop. If you have a tree you will probably have a recursive function. For any data structure, there are more and less natural ways for processing that data structure.
The first implication of this is that it's important for a programming language to make it easy to define different kinds of data structure.
The second implication is that clearly communicating the data structure makes it easier to write the code to process it.
If you can clearly see that a variable is an array, then you have a good idea of what kind of code you will write next.
If you don't, and instead have to read implementation code from multiple files, then you will have a harder time figuring out what you should do with the variable.
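To make the array and tree cases concrete, here is a short TypeScript sketch. The `NumberTree` type and both function names are hypothetical, chosen only for this example. Once the shape of the data is visible in the signature, the shape of the code more or less follows.

```typescript
// A binary tree of numbers; the type itself is recursive.
interface NumberTree {
  value: number;
  left?: NumberTree;
  right?: NumberTree;
}

// An array suggests a loop.
function sumArray(items: number[]): number {
  let total = 0;
  for (const item of items) {
    total += item;
  }
  return total;
}

// A tree suggests a recursive function.
function sumTree(node: NumberTree | undefined): number {
  if (node === undefined) {
    return 0;
  }
  return node.value + sumTree(node.left) + sumTree(node.right);
}

const tree: NumberTree = {
  value: 1,
  left: { value: 2 },
  right: { value: 3, left: { value: 4 } },
};

console.log(sumArray([1, 2, 3])); // 6
console.log(sumTree(tree));       // 10
```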
Java is, or was, a good example of this. A lot of Java code looks like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
The compiler and IDE can easily tell that the variable refers to a BufferedReader instance from the fact that it instantiates a BufferedReader. There's no need to repeat that information.
Some statically typed languages have type inference. This allows you to write something like this:
val reader = BufferedReader(InputStreamReader(System.`in`))
(This is Kotlin).
If you want to, you can still provide type annotations. The compiler still makes sure every variable and expression has a type and that only valid operations are called on the associated types.
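The same idea can be sketched in TypeScript, which also infers types from initialisers. The variable names here are purely illustrative. Nothing is annotated except the last declaration, yet every expression is still checked.

```typescript
// No annotations: the compiler infers the types from the initialisers.
const names = ["Ada", "Grace"];                        // inferred as string[]
const nameLengths = names.map((name) => name.length);  // inferred as number[]

// An annotation is still allowed where it helps the reader.
const totalLength: number = nameLengths.reduce((a, b) => a + b, 0);

console.log(totalLength); // 8
```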
Will this program crash?
var item = "foo"
print(item)
item = 4
print(item * 2) // assume print converts automatically to string
No, it won't crash. By the time you multiply the variable by 2, the value contained in it is an integer so it is a valid operation. But many statically typed languages won't let this compile.
Programmers who are used to dynamic types feel that valid useful options for writing a program are closed off for no good reason in statically typed languages.
The way some statically typed languages have gotten around this is by having more flexible type systems, capable of representing more valid programs.
For example, this can be dealt with by using union types:
var item: String | Int = "foo"
// now the compiler will tell you to check it is an integer before multiplying it.
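Here is roughly how that looks in TypeScript, as a sketch rather than the only way to do it. The function name `show` is hypothetical. The `typeof` check is what convinces the compiler that the multiplication is safe.

```typescript
// Accepts either type; the compiler insists on a check before arithmetic.
function show(item: string | number): string {
  if (typeof item === "number") {
    // Inside this branch the compiler knows item is a number.
    return String(item * 2);
  }
  // Here it can only be a string.
  return item;
}

let item: string | number = "foo";
console.log(show(item)); // foo
item = 4;
console.log(show(item)); // 8
```

The program from the dynamic example now compiles, at the cost of one explicit check.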
In TypeScript, for example, you can use the type system to make sure that a number is prime.
This is cool, although it has problems. A type system expressive enough to do this is Turing complete, which means that you cannot, in general, tell ahead of time if type checking will complete (the halting problem).
In my opinion, this just pushes the problem developers deal with back a step.
Instead of being able to glance at the type declaration and understand how you should write your program, you now need to act like a computer and follow along.
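A primality check is too long to show here, so as a much smaller stand-in, here is a sketch of addition done entirely in TypeScript's type system. The type names `BuildTuple`, `Length` and `Add` are my own for this example, and it assumes a reasonably recent compiler (recursive conditional types and variadic tuples). To understand what `Add<2, 3>` is, you have to step through the recursion in your head.

```typescript
// Recursively grow a tuple until its length is N.
type BuildTuple<N extends number, T extends unknown[] = []> =
  T extends { length: N } ? T : BuildTuple<N, [...T, unknown]>;

// Read the length of a tuple back out as a number literal type.
type Length<T extends unknown[]> = T extends { length: infer L } ? L : never;

// Add two numbers by concatenating two tuples and measuring the result.
type Add<A extends number, B extends number> =
  Length<[...BuildTuple<A>, ...BuildTuple<B>]>;

// This only compiles because the type checker computed that 2 + 3 is 5.
const sum: Add<2, 3> = 5;

console.log(sum); // 5
```

Changing the `5` to any other number should make the declaration fail to type check, which is the kind of guarantee this style buys, and the kind of reading effort it costs.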
What if you could tell your compiler that in this particular instance, you know better?
You could stick to using simple type system features, and force the language to do what you think will work in certain circumstances. For example, in Kotlin and TypeScript you can assert that a value is not null, or force a cast.
This is generally considered bad practice and should be used sparingly, but it can be useful.
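As a sketch of what these escape hatches look like in TypeScript (the variable names are made up for illustration), compare the checked version with the two ways of overruling the compiler:

```typescript
const ports = new Map<string, number>([["http", 80]]);

// Map.get returns number | undefined, so the compiler wants a check first.
const checked = ports.get("http");
const viaCheck = checked !== undefined ? checked + 1 : -1;

// The postfix ! asserts "this is not undefined"; the compiler takes our word.
const viaAssertion = ports.get("http")! + 1;

// A cast with `as` similarly overrides what the compiler knows about a value.
const raw: unknown = "hello";
const viaCast = (raw as string).toUpperCase();

console.log(viaCheck, viaAssertion, viaCast); // 81 81 HELLO
```

Neither the assertion nor the cast adds a run-time check. If the key were missing, `viaAssertion` would quietly become NaN, which is why these are best kept rare.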
What if invalid types generated warnings rather than errors, so you can still deal with them but you don't have to make your programs awkward in areas that aren't amenable to static types?
I think this is helpful for transitioning from dynamic types, and still provides some of the benefits of static typing.
The Java type system is lying to you. Every variable of a reference type could be referring to null. Tony Hoare, who introduced null references, referred to them as his billion dollar mistake. As a result of this mistake, variables in Java often need null checks before they're used. Null pointer exceptions are among the most common runtime exceptions in Java.
As a result of this, people have all the extra work associated with static types, but they don't really get all of the benefits, since they have to check manually whether every variable really is the type it says it is.
One of the ways to get round this is non-nullable types. This way null cannot be assigned to most variables by default, unless null is specified as a valid option. If it is, then the compiler requires null checks. This can also be achieved with union types.
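Here is a small TypeScript sketch of the union-type flavour of this. It assumes the compiler's `strictNullChecks` option is switched on, and the function name `greet` is hypothetical. Null has to be opted into in the type, and once it is, the compiler wants it handled.

```typescript
// Assumes strictNullChecks is enabled (it is part of the "strict" option).
function greet(name: string | null): string {
  if (name === null) {
    return "Hello, stranger";
  }
  // After the check, the compiler has narrowed name to string.
  return "Hello, " + name.toUpperCase();
}

// A plain string cannot hold null under strictNullChecks.
const fixedName: string = "Ada";

console.log(greet(fixedName)); // Hello, ADA
console.log(greet(null));      // Hello, stranger
```

Removing the null check makes the call to `toUpperCase` a compile-time error under that option, rather than an exception at run time.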
So it looks like each avenue presents some issues. Some avenues try to get the best of both worlds, but nothing seems to be capable of avoiding all issues.
So then the question is, what to pick and when. I think it depends on the kind of project.
In a large project where you are working with multiple people, I think a slightly flexible statically typed language is the best option. This makes sure that everyone is on the same page regarding what every function expects, and it reduces both runtime errors and cognitive load for developers.
Whilst it might be tempting to go for something less stringent, I think it's ultimately not a good idea for a large project that's expected to last and work.