A weird problem in the Scala type system

Published on 2015-01-19
Tagged: compilers gypsum scala

I've been trying to formalize the type system in Gypsum. There are two operations in particular that I want to put on a sound theoretical foundation.

The subtype relation is denoted a <: b. If type a is a subtype of type b, then a value of a can be used in any situation that b can.

The least upper bound is denoted a âˆª b. An upper bound of types a and b is a type c such that a <: c and b <: c. c is a least upper bound if there is no other upper bound d such that d â‰ c and d <: c.

Least upper bound is used to determine the type of if-expressions. It's also useful for inferring the return type of a function with multiple return statements.

class A
class B <: A
class C <: A

// The return type of this function is `A`.
def f(b: B, c: C) = if (true) b else c

// This one is also `A`.
def f(b: B, c: C) =
  if (true)
    return b
  c

The subtype relation can actually be defined in terms of least upper bound.

a <: b  iff  a âˆª b = b

So right now, I'm trying to prove that the compiler can always find the least upper bound for any two types without falling into an infinite loop. Soundness and completeness are great and all, but right now, I'm just trying to prove termination.

I came across an interesting case where it breaks down though. Suppose we have the three classes below.

class A[static +T]
class X <: A[X]
class Y <: A[Y]

What is X âˆª Y in this situation? The compiler goes through these steps to find out:

X âˆª Y
  // First, find the closest class in the inheritance chain.
= A[X] âˆª A[Y]
  // T is covariant, so we need to find the least upper bound
  // of the type arguments.
= A[X âˆª Y]
  // Now we need to find X âˆª Y. So we just find
  // the closest class in the inheritance chain...
= A[A[X] âˆª A[Y]]
  // and then the type arguments...
= A[A[X âˆª Y]]

  // a little later...

= A[A[A[A[A[X âˆª Y]]]]]

I'm not really sure how to reconcile this, so I went to see what the Scala compiler does. Scala is a big inspiration for me, and I want Gypsum's type system to be pretty similar. So I punched this into the Scala REPL.

scala> class A[+T]

defined class A

scala> class X extends A[X]

defined class X

scala> class Y extends A[Y]

defined class Y

scala> var x = new X()

x: X = X@149c9cb

scala> var y = new Y()

y: Y = Y@8c4c14

scala> if (true) x else y

res3: A[A[A[Any]]] = X@149c9cb

Wait... what?

So Scala has this problem, too. There's no way for the compiler to return a least upper bound because there's no way to formulate that type in Scala. It would be infinite. Here's what the Scala language specification actually says about this:

The least upper bound or greatest lower bound of a set of types does not always exist. For instance, consider the class definitions
class A[+T] {}
class B extends A[B]
class C extends A[C]
Then the types A[Any], A[A[Any]], A[A[A[Any]]], ... form a descending sequence of upper bounds for B and C. The least upper bound would be the infinite limit of that sequence, which does not exist as a Scala type. Since cases like this are in general impossible to detect, a Scala compiler is free to reject a term which has a type specified as a least upper or greatest lower bound, and that bound would be more complex than some compiler-set limit.

So there you go. I'm not really sure I agree with what Scala does here. It's impossible to find the least upper bound, and it makes sense to return some upper bound which is close. But it seems arbitrary and kind of inelegant that the compiler goes down three levels and then bails out.

I'm not sure yet what Gypsum will do in this situation. In any case, I need a way to prevent the infinite recursion. That's probably not the right approach.