Arrays in Gypsum
I recently added arrays to Gypsum, but they are quite different than traditional arrays. In most languages (like C or Java), arrays are a primitive that stand on their own. You can build other data structures like array lists and hash maps out of them. In Gypsum, array elements can be integrated into any class. The normal class members come first in memory, then the array elements follow immediately. This is mainly desirable from an efficiency perspective: it provides better cache behavior by improving spatial locality.
Compact memory layout
Consider the built-in String
class. A String
object contains a pointer to a meta (all blocks in CodeSwitch start with a meta), a length, and a contiguous sequence of code points. Here's the memory layout for the string "hello":
Offset | Value | Description |
---|---|---|
0: | String | (meta) |
8: | 5 | (length) |
16: | h | elements |
20: | e | |
24: | l | |
28: | l | |
32: | o |
This is more efficient than storing the string header and characters in separate blocks, as we would be forced to in Java, if we were implementing a string class from scratch with arrays. With this layout, we can retrieve a character from a string using a single load instruction.
String
is a built-in class of course, so it's easy to implement it this way with some special handling. The new thing here is that you can use this compact array layout for any class.
New syntax
In order to add array elements to a class, some new syntax is required: an arrayelements
declaration.
final class Array[static T] arrayelements T, get, set, length
This declaration can only be used in final
classes because the first array element needs to be located at a predictable offset from the beginning of the object, after all the normal fields. If a non-final
class had array elements, a subclass could declare additional fields that would overlap with the elements.
The arrayelements
declaration has four parts.
T
- the type of the elements. All elements have the same type. It can be any type.get
- the name of a method that will return an element. The method will take one argument: the index of the element to return.set
- the name of a method that will change the value of an element. The method will take two arguments: the index of the element, and the value to store.length
- the name of a method that will return the number of array elements.
The get
, set
, and length
methods are generated by the compiler. You can name these anything. You can also use attributes like public
, private
, and override
when declaring them.
There's also a bit of new syntax to allocate objects with array elements, since you need to specify the length.
let array = new(12i32) Array[String]
The new
keyword is followed by the length in parenthesis. Once the object is allocated, its length cannot be changed.
Example: MutableArray
This is all well and good, but what if you just need a regular array? Just a collection of elements? You can implement this easily using arrayelements
. Here's the MutableArray
class from the standard library.
public final class MutableArray[static T] arrayelements T, public get, public set, public length
That's it! You can use this like an array in any other language.
import std.MutableArray def fill[static T](value: T, count: i32) = let array = new(count) MutableArray[T] var i = 0i32 while (i < count) array.set(i, value) i += 1i32 array
Example: covariant immutable Array
You might recall that Gypsum supports covariant and contravariant type parameters. It's actually possible for the array element type to be covariant, as long as the array is immutable. This means, for example, that Array[String]
would be a subtype of Array[Object]
. This is not safe with mutable arrays, since you could create an Array[String]
, cast it to Array[Object]
, then store something that's not a String
in there. Java and C# let you get away with this at the cost of some inefficient run-time type checking.
In Gypsum, all we need to do to make an array immutable is to add the final
keyword to the arrayelements
declaration. Here's the definition of the Array
class from the standard library.
public final class Array[static +T] private def this(a: MutableArray[T]) = var i = 0i32 while (i < length) set(i, a.get(i)) i += 1i32 static def create(a: MutableArray[T]) = new(a.length) Array[T](a) final arrayelements T, public get, private set, public length
The +
on the type parameter indicates it's covariant.
The final
keyword changes two rules. First, the element type will be considered covariant instead of invariant. Second, the set
method can only be called in the initializer and in constructors. In these functions, the actual type of the object is known, and we don't need to worry about variance. Outside of these functions, it's unsafe to make changes to elements because we don't really know the element type. In this class, the set
method is marked private
to avoid exposing it to clients, since they can never call it.
Example: BitSet
Let's look at something that's not just an array. Here's an implementation of a bit set, a set of non-negative 32-bit integers represented by a bitmap. Each BitSet
object contains a sequence of bytes which stores the bitmap.
import std.IllegalArgumentException final class BitSet // The constructor is private. It doesn't do anything since there // are no normal fields. private def this = {} // Clients use this method to create a new `BitSet`. This calculates // the required capacity, then allocates a new `BitSet`. Allocated // objects are zero-initialized automatically. static def create(max: i32) = let bytes-needed = (max + 7i32) / 8i32 new(bytes-needed) BitSet def contains(n: i32) = if (in-range(n)) let byte = get-byte(byte-index(n)) (byte & bit-mask(bit-index(n))) != 0i8 else false def add(n: i32) = if (!in-range(n)) throw IllegalArgumentException() var byte = get-byte(byte-index(n)) byte |= bit-mask(bit-index(n)) set-byte(byte-index(n), byte) def remove(n: i32) = if (!in-range(n)) throw IllegalArgumentException() var byte = get-byte(byte-index(n)) byte &= ~bit-mask(bit-index(n)) set-byte(byte-index(n), byte) private def in-range(n: i32) = n >= 0i32 && n / 8i32 < capacity private def byte-index(n: i32) = n / 8i32 private def bit-index(n: i32) = (n % 8i32).to-i8 private def bit-mask(ix: i8) = 1i8 << ix // This is the important part. Each element is a signed byte (`i8`). // The accessor methods are all declared `private` to avoid exposing // implementation details. arrayelements i8, private get-byte, private set-byte, private capacity
You can see the full example on GitHub.