Static and dynamic types in pattern matching

Published on 2016-12-28
Tagged: compilers gypsum

View All Posts

Pattern matching is a control flow structure in some functional programming languages. It allows developers to examine the structure of an object (or some other data), bind variables to parts of the structure, and execute different cases depending on the structure. I wrote about pattern matching earlier when I added it to Gypsum.

Gypsum now supports pattern matching using both static and dynamic type information.

Let's break this down a bit, since this is a complicated statement. In general, pattern matching involves checking whether a value has a given type at run-time using dynamic type information. For example, we may want to check whether an Object is a String. Not all types can be checked at run-time though. For example, we can't check whether something is a List[String] using only dynamic type information because of type erasure. The Gypsum compiler discards type arguments like [String] at compile-time.

By incorporating static type information about the value being matched, we can perform some checks that wouldn't normally be safe to perform at run-time.

Example: Result

The Gypsum standard library has an abstract class called Result with two concrete subclasses, Ok and Err. Functions that can fail may return some kind of Result as an alternative to throwing exceptions.

public abstract class Result[static +V, static +E]
  ...

public final class Ok[static +V](public value: V) <: Result[V, Nothing]
  ...

public final class Err[static +E](public error: E) <: Result[Nothing, E]
  ...

Recall that type parameters are defined in square brackets. Type parameters with a + prefix are covariant. In the example above, V is the type of a successful result; E is the type of an error.

Suppose we want to define a function that helps us recover from an error: it takes a result and a default value. If the result is Ok, it returns the value from the result. If the result is Err, it returns the default value. We can implement this function using pattern matching.

def recover(result: Result[String, _], default-value: String): String =
  match (result)
    case Ok[String](value) => value
    case _ => default-value

This function needs to be able to determine that its argument is an instance of Ok[String]. This is impossible using only run-time type information because of type erasure. We could only determine whether the object is an instance of the Ok class. If the compiler allowed the check, we might match an Ok[I32] instead of an Ok[String].

Using static type information

Instead of relying solely on run-time information to perform the type check, we can also use the static type information available to the compiler. The compiler knows that the value being matched is of type Result[String, _]. This means that if the value is an instance of Ok, it must be an instance of Ok[String]. Therefore, it's sufficient to only check that the value is an instance of Ok at run-time.

The compiler function that determines whether these run-time type checks are safe is typeCanBeTested. Most of the heavy lifting is done by extractTypeArgsForSubtype though. This function takes a type (in this example, Result[String, _]), down-casts it to a given class (Ok), then figures out what the type arguments would be in the resulting type (String). typeCanBeTested compares the extracted type arguments against the type arguments of the type being matched (Ok[String]). If all type arguments are equivalent or if type arguments are covered by wildcards or existential variables, then the match is deemed safe.

What happens if the supertype is too vague? For example, suppose we were trying to test whether an Object is an Ok[String]. When extractTypeArgsForSubtype down-casts Object to Ok it won't be able to tell that the type argument should be String so it just returns None. typeCanBeTested will judge the match to be unsafe, and the compiler will report an error.

Conclusion

Pattern matching is a really powerful construct, and I wish more mainstream languages had it. Scala does an amazing job with pattern matching. It combines static and dynamic information the same way that Gypsum does; Scala is an inspiration for Gypsum in many ways.

Languages that use variants, like OCaml and Haskell, also do this. They lack a "top" type (like Object in Gypsum), so it's difficult to get into a situation where a pattern can't be matched.