Existential types in Gypsum

Published on 2016-02-04
Tagged: gypsum

Near the end of my work on pattern matching, I added existential types to Gypsum. I've never had good intuition for existential types, so I dreaded having to add them, but unfortunately they are necessary for any kind of casting or type testing, which is essential for pattern matching.

Existential types allow you to express that you have an object with a known class, but you don't know what's inside it. For example, instead of having a list of strings, you have a list of "something". In technical terms, you have an instance of some paraterized class, but you don't know the type arguments. Existential types are similar to wildcard types in Java.

List[String]          // normal class type
forsome [X] List[X]   // existential type
List[_]               // syntactic sugar for existential type

Existential types are required in systems with erased type arguments when you're performing down-casts with type checks. That was a mouthful, so let's break it down. Type arguments in Gypsum (like the String in List[String]) are erased at compile time. The compiler uses these type arguments to check the type safety of the program, but they are not present at run-time. If you have some random object, you can check its class to determine that it's a List, but no part of the object tells you it should only contain Strings. You could look at the elements of the List, but it might be empty, and you might come up with the wrong answer anyway: all the elements of a List[Object] might happen to be Strings. In the future, Gypsum will support dynamic type arguments (not erased) that will allow this kind of check, but there will be a performance penalty for using them (they take up space and have to be loaded and stored), so erased type arguments (marked with the static keyword) will probably always be more common.

Because type arguments are erased, when we down-cast a value to a subtype, we cannot know the type arguments of the new type. Existential types express this lack of knowledge, which is why we need them.

There are many situations when we don't actually need to know the type arguments when we're dealing with an object. For example, if we want to know the length of a list, it doesn't really matter what's inside it.

def get-length-if-arg-is-list(obj: Object): i64 =
  match (obj)
    case list: forsome [X] List[X] => list.length
    case _ => 0

Existential syntax

You've already seen a couple examples of existential type syntax, but here it is in detail.

An existential type begins with the forsome keyword, followed by a list of type parameters in brackets, followed by an inner type. The type parameters may have upper and lower bounds. If the bounds are omitted, the upper bound is Object, and the lower bound is Nothing.

ty ::= forsome [tp, ...] ty
   | ...

tp ::= T <: ty_upper >: ty_lower

The type parameters may only be referenced by the inner type and by each other. They are invalid outside of the existential type.

There is some syntactic sugar to make existential types less clunky and more familiar to programmers who have used Java wildcard types. The _ symbol (blank) can be used as a type argument. If a class type has blank type arguments, the whole type is wrapped in an existential, and the blanks are replaced with fresh type parameters.

// These types are equivalent.
Foo[_, _, _]
forsome [X, Y, Z] Foo[X, Y, Z]

The bounds of these type arguments are automatically set to the bounds of the class's type parameters.

The subtype relation

The hardest part about implementing existential types was understanding how they interact with the rest of the type system. But this is the most important part. The key function in the Gypsum compiler is Type.lub (in ir_types.py). This function computes the least upper bound of two types in the type lattice. The least upper bound is the most specific type which is a supertype of both types. Type.isSubtypeOf is implemented using this.

Here are the rules for computing the least upper bound of two types, at least one of which is existential, in plain English.

Let left and right be the types being combined.
Let leftInner and rightInner be their inner types. If one of them is not existential, then the inner type is the type itself.
Let innerLub be the least upper bound of leftInner and rightInner.
If innerLub references any type parameters declared in left or right, then the least upper bound is an existential type wrapped around innerLub declaring the variables that were referenced.
If no parameters are referenced, the least upper bound is innerLub.

There are two important details to be aware of.

First, any type is equivalent to itself wrapped in an existential. This works in both directions, and it doesn't throw any information away.

// These types are equivalent.
Foo[Bar, Baz]
forsome [X] Foo[Bar, Baz]

This rule is applied at the beginning, when we get the inner types, and at the end, when we decide whether to wrap innerLub in an existential.

Second, the least upper bound of the inner types of two existential types determines the least upper bound of the existential types themselves. This is the middle part.

// Suppose we have a class with covariant type parameters.
class Foo[static +A, static +B]

// This is the least upper bound of two types.
// Recall: `Nothing` is a subtype of all class types.
// `â‹` denotes least upper bound.
forsome [X] Foo[X, Nothing] â‹ forsome [Y] Foo[Nothing, Y]
= forsome [X, Y] Foo[X, Y]

Example: matching `Option`

In a previous article on pattern matching, I had an example like this:

def get-or-else[static T](opt: Option[T], default: T): T =
  match (opt)
    case Some[T](value) => value
    case _ => default

The first match case has a destructuring pattern. The type of value is determined by the return type of the Option.try-match method. That method returns Option[T], so the type of value is inferred to be T.

How does that work though? How can try-match return Option[T] when it's supposed to test and match objects in general? The answer is that try-match is overloaded. There are actually two implementations.

One implementation handles options specifically. Since we know statically that the argument is an option, we don't need to do any type testing and can just return it.

class Option[static +T]
  ...
  static def try-match(opt: Option[T]) = opt

The other implementation handles objects in general. It performs a type test and a down-cast as part of a variable pattern.

class Option[static +T]
  static def try-match(obj: Object): Option[Object] =
    match (obj)
      case opt: Option[_] => opt
      case _ => None

Note that the type of opt is Option[_], an existential type. We return Option[Object] though, which is equivalent.

Accessing properties

Frequently, we have an object with an existential type, and we want to call a method or load a property. This is always allowed when the result has nothing to do with the existential type (like the length property in the get-length-if-arg-is-list example above). It's even allowed most of the time even when existential variables are involved, but the result is up-cast to a more generic, non-existential.

Here's the general logic for accessing a property (either a method or field):

Peel off any existential layers from the receiver (object) type and set the existential type parameters aside. The effective receiver type will be a regular class type.
Determine what property is being accessed, based on the name and the types of the arguments (if there are arguments).
Determine the type of the result. If type arguments were applied, this may involve type substitution.
If the result type doesn't reference any of the existential variables peeled off earlier, then that is the final result type.
If a property is being assigned, report an error. We cannot assign to an existential type safely, because we don't really know what it is.
Otherwise, up-cast the result type to its base class type. Repeat until no existential variables are referenced.

Let's walk through these steps with this call:

let opt: forsome [X] Some[X]
let value = opt.get

The receiver type is forsome [X] Some[X].
We peel off existential types. The effective receiver type is Some[X].
We look up the get method. Its return type is T (where T is the type parameter in Option).
We perform type argument substitution, replacing T with X. So the result type is X.
The result type does reference an existential variable, and we are not assigning it. We up-cast to its upper bound, Object.
The result type is Object.

Conclusion

I'm glad existential types are finally implemented. I still feel like my grasp on them is tenuous, and I'm anxious that my implementation of the least upper bound rules might not be completely right. Most theoretical treatments of existential types have special open and close expressions for dealing with them explicitly. In Gypsum, I wanted most operations involving existential types to be implicit, so there's not a direct mapping to theory.

If you're interested in reading more about existential types, I found the paper Towards an Existential Types Model for Java Wildcards [pdf] to be a useful comparison.

The book Types and Programming Languages by Pierce is always a good reference for type systems.