A brief explanation of Scala 2.8 collections
The new collections in Scala 2.8.0 are awesome, but their design is kind of confusing. Check out the type for the simple function List.map
:
def map[B, That](f: (A) ⇒ B) (implicit bf: CanBuildFrom[List[A], B, That]): That
In another language like OCaml, we would have a much simpler type:
('a -> 'b) -> 'a list -> 'b list
So why the extra complication? What does that CanBuildFrom
do? Much smarter people than me have written excellent explanation already. Martin Odersky has written some very thorough documentation on collections. Daniel C. Sobral also have some amazing Stack Overflow answers like this one. I will attempt a very concise explanation though.
Motivation
Unlike OCaml, Scala has a lot of collections. You have List
s, Set
s, Map
s, each with pretty much every implementation you can think of. And there are mutable and immutable versions of each. These collections all support dozens of standard methods like map
, flatMap
, and filter
. Writing an implementation of every standard method for every collection would be a pain, and it seems unnecessary since the logic in each standard method is pretty much the same.
There are two major obstacles to having a single collection agnostic implementation of each standard method. First, we lack an abstract way to construct a result. List.map
needs to construct a List
, Set.map
needs to construct a Set
, and so on. Second, we don't know what the return type should be. This is how we might naïvely define map
:
trait Traversable[+A] { def map[B](f: A => B): Traversable[B] }
If we called this on a List
, we would get a Traversable
result, so we would need to cast it back to List
.
Overcoming obstacles
Let's look at the Traversable.map
type closely to see how both problems are solved in the collection library. We have the following type parameters:
A
- type of the elements in the input listB
- type of the elements in the result listThat
- type of the resulting list
You'll notice that there is an implicit parameter of type CanBuildFrom[List[A], B, That]
. Here's what CanBuildFrom
looks like:
trait CanBuildFrom[-From, -Elem, +To] extends AnyRef { def apply(): Builder[Elem, To] def apply(from: From): Builder[Elem, To] }
As you can see, CanBuildFrom
is a factory for Builder
:
trait Builder[-Elem, +To] extends Growable[Elem] { def +=(elem: Elem): Builder[Elem, To] def result(): To ... }
A Builder
is used to create a new collection. We just create a new Builder
using CanBuildFrom.apply
, add our elements using +=
, and then call result
. Each collection defines its own Builder
class and a CanBuildFrom
factory. Scala's implicit argument handling finds these for us, so we almost never have to pass them explicitly.
Since Builder
handles the details of constructing the result, map
can delegate all of its collection-specific work to the Builder
. This allows a collection agnostic map
to be implemented, which solves our first problem. The actual implementation is in TraversableLike
.
Our second problem is solved by type parameterization. By simply declaring an extra type parameter for the result type (That
), we can always return the correct type without actually knowing what it is. And since Scala has type inference, we rarely need to specify the type parameters explicitly.
Here's one of my favorite examples of this system in action:
scala> val s = BitSet(1, 2, 3) s: scala.collection.BitSet = BitSet(1, 2, 3) scala> s.map(_ + 2) res0: scala.collection.BitSet = BitSet(3, 4, 5) scala> s.map(_.toString) res1: scala.collection.Set[java.lang.String] = Set(1, 2, 3)
A BitSet
is a set of positive integers represented by a bit field, i.e., bit 5 is set if the integer 5 is in the set. Because of its implementation, BitSet
cannot contain anything except integers, so when we map its elements to strings, the result cannot be a BitSet
.
How does Scala figure out what to do here? Since we pass map
a function of type Int => String
, the Scala compiler knows to look for a CanBuildFrom
where Elem
is String
. BitSet
provides two instances of CanBuildFrom
: one where Elem
is Int
for building new BitSet
s, and one where Elem
is unspecified for building Set
s of any other type. When searching for an implicit argument, the compiler will look for the value with the most specific type. So when we pass an Int => Int
function, we get the first builder, but when we pass an Int => String
function, we get the second.