Existential types in Gypsum
Near the end of my work on pattern matching, I added existential types to Gypsum. I've never had good intuition for existential types, so I dreaded having to add them, but unfortunately they are necessary for any kind of casting or type testing, which is essential for pattern matching.
Existential types allow you to express that you have an object with a known class, but you don't know what's inside it. For example, instead of having a list of strings, you have a list of "something". In technical terms, you have an instance of some paraterized class, but you don't know the type arguments. Existential types are similar to wildcard types in Java.
List[String] // normal class type forsome [X] List[X] // existential type List[_] // syntactic sugar for existential type
Existential types are required in systems with erased type arguments when you're performing down-casts with type checks. That was a mouthful, so let's break it down. Type arguments in Gypsum (like the String
in List[String]
) are erased at compile time. The compiler uses these type arguments to check the type safety of the program, but they are not present at run-time. If you have some random object, you can check its class to determine that it's a List
, but no part of the object tells you it should only contain String
s. You could look at the elements of the List
, but it might be empty, and you might come up with the wrong answer anyway: all the elements of a List[Object]
might happen to be String
s. In the future, Gypsum will support dynamic type arguments (not erased) that will allow this kind of check, but there will be a performance penalty for using them (they take up space and have to be loaded and stored), so erased type arguments (marked with the static
keyword) will probably always be more common.
Because type arguments are erased, when we down-cast a value to a subtype, we cannot know the type arguments of the new type. Existential types express this lack of knowledge, which is why we need them.
There are many situations when we don't actually need to know the type arguments when we're dealing with an object. For example, if we want to know the length of a list, it doesn't really matter what's inside it.
def get-length-if-arg-is-list(obj: Object): i64 = match (obj) case list: forsome [X] List[X] => list.length case _ => 0
Existential syntax
You've already seen a couple examples of existential type syntax, but here it is in detail.
An existential type begins with the forsome
keyword, followed by a list of type parameters in brackets, followed by an inner type. The type parameters may have upper and lower bounds. If the bounds are omitted, the upper bound is Object
, and the lower bound is Nothing
.
ty ::= forsome [tp, ...] ty | ... tp ::= T <: tyupper >: tylower
The type parameters may only be referenced by the inner type and by each other. They are invalid outside of the existential type.
There is some syntactic sugar to make existential types less clunky and more familiar to programmers who have used Java wildcard types. The _
symbol (blank) can be used as a type argument. If a class type has blank type arguments, the whole type is wrapped in an existential, and the blanks are replaced with fresh type parameters.
// These types are equivalent. Foo[_, _, _] forsome [X, Y, Z] Foo[X, Y, Z]
The bounds of these type arguments are automatically set to the bounds of the class's type parameters.
The subtype relation
The hardest part about implementing existential types was understanding how they interact with the rest of the type system. But this is the most important part. The key function in the Gypsum compiler is Type.lub
(in ir_types.py). This function computes the least upper bound of two types in the type lattice. The least upper bound is the most specific type which is a supertype of both types. Type.isSubtypeOf
is implemented using this.
Here are the rules for computing the least upper bound of two types, at least one of which is existential, in plain English.
- Let
left
andright
be the types being combined. - Let
leftInner
andrightInner
be their inner types. If one of them is not existential, then the inner type is the type itself. - Let
innerLub
be the least upper bound ofleftInner
andrightInner
. - If
innerLub
references any type parameters declared inleft
orright
, then the least upper bound is an existential type wrapped aroundinnerLub
declaring the variables that were referenced. - If no parameters are referenced, the least upper bound is
innerLub
.
There are two important details to be aware of.
First, any type is equivalent to itself wrapped in an existential. This works in both directions, and it doesn't throw any information away.
// These types are equivalent. Foo[Bar, Baz] forsome [X] Foo[Bar, Baz]
This rule is applied at the beginning, when we get the inner types, and at the end, when we decide whether to wrap innerLub
in an existential.
Second, the least upper bound of the inner types of two existential types determines the least upper bound of the existential types themselves. This is the middle part.
// Suppose we have a class with covariant type parameters. class Foo[static +A, static +B] // This is the least upper bound of two types. // Recall: `Nothing` is a subtype of all class types. // `â‹` denotes least upper bound. forsome [X] Foo[X, Nothing] â‹ forsome [Y] Foo[Nothing, Y] = forsome [X, Y] Foo[X, Y]
Example: matching Option
In a previous article on pattern matching, I had an example like this:
def get-or-else[static T](opt: Option[T], default: T): T = match (opt) case Some[T](value) => value case _ => default
The first match case has a destructuring pattern. The type of value
is determined by the return type of the Option.try-match
method. That method returns Option[T]
, so the type of value
is inferred to be T
.
How does that work though? How can try-match
return Option[T]
when it's supposed to test and match objects in general? The answer is that try-match
is overloaded. There are actually two implementations.
One implementation handles options specifically. Since we know statically that the argument is an option, we don't need to do any type testing and can just return it.
class Option[static +T] ... static def try-match(opt: Option[T]) = opt
The other implementation handles objects in general. It performs a type test and a down-cast as part of a variable pattern.
class Option[static +T] static def try-match(obj: Object): Option[Object] = match (obj) case opt: Option[_] => opt case _ => None
Note that the type of opt
is Option[_]
, an existential type. We return Option[Object]
though, which is equivalent.
Accessing properties
Frequently, we have an object with an existential type, and we want to call a method or load a property. This is always allowed when the result has nothing to do with the existential type (like the length
property in the get-length-if-arg-is-list
example above). It's even allowed most of the time even when existential variables are involved, but the result is up-cast to a more generic, non-existential.
Here's the general logic for accessing a property (either a method or field):
- Peel off any existential layers from the receiver (object) type and set the existential type parameters aside. The effective receiver type will be a regular class type.
- Determine what property is being accessed, based on the name and the types of the arguments (if there are arguments).
- Determine the type of the result. If type arguments were applied, this may involve type substitution.
- If the result type doesn't reference any of the existential variables peeled off earlier, then that is the final result type.
- If a property is being assigned, report an error. We cannot assign to an existential type safely, because we don't really know what it is.
- Otherwise, up-cast the result type to its base class type. Repeat until no existential variables are referenced.
Let's walk through these steps with this call:
let opt: forsome [X] Some[X] let value = opt.get
- The receiver type is
forsome [X] Some[X]
. - We peel off existential types. The effective receiver type is
Some[X]
. - We look up the
get
method. Its return type isT
(whereT
is the type parameter inOption
). - We perform type argument substitution, replacing
T
withX
. So the result type isX
. - The result type does reference an existential variable, and we are not assigning it. We up-cast to its upper bound,
Object
. - The result type is
Object
.
Conclusion
I'm glad existential types are finally implemented. I still feel like my grasp on them is tenuous, and I'm anxious that my implementation of the least upper bound rules might not be completely right. Most theoretical treatments of existential types have special open
and close
expressions for dealing with them explicitly. In Gypsum, I wanted most operations involving existential types to be implicit, so there's not a direct mapping to theory.
If you're interested in reading more about existential types, I found the paper Towards an Existential Types Model for Java Wildcards [pdf] to be a useful comparison.
The book Types and Programming Languages by Pierce is always a good reference for type systems.