Method resolution
This entry was taken from my old Fenris development blog. It was originally posted on March 6, 2008. Although the Fenris project is now defunct, I think these old entries have some general appeal, so I've added them to this blog.
I've decided on at least a tenative approach to calling methods polymorphically. It seems the best compromise is to use a closed hash table instead of a vtable and interface table. This allows very simple method lookup and average O(1) for all operations.
Method hash table format | |
---|---|
R | n - 1 |
m1 key | m1 pointer |
m2 key | m2 pointer |
... | ... |
mn key | mn pointer |
The first two words of the hash table tell its size and hash function. R is a random number, and n is the size. The size is guaranteed to be a power of two. Hash codes are computed as H(k) = (k ^ R) & (n - 1). We store n - 1 in the table so the subtraction isn't necessary.
After the header are key-value pairs for methods. Every method defined in a class or interface has a unique key. Keys are actually addresses of bytes in an empty segment; the linker guarantees that they are unique. The values in this case are the addresses of the actual methods. Note that the same method may have multiple entries in the hash table for a class, since it may be defined in multiple interfaces and/or superclasses.
Of course every hash table has the potential for collisions. The run-of-the-mill chained hash table is extensible, but is hard on caches. Since we will never need to add methods to the table after it is initially populated, it is better to use a closed implementation. If a collision occurs, the program probes entries in the table in a circular fashion after the initial slot. Tentative C code for method lookup is as follows:
void* method_lookup( int* hash_table, int key ) { int R = hash_table[0]; int n_minus_1 = hash_table[1]; int index = ( R ^ key ) & n_minus_1; int method = hash_table[index + 2]; while ( method != key ) { if ( method == 0 ) return method_rehash( hash_table, index, key ); index += 2; index &= n_minus_1 << 1; method = hash_table[index + 2]; } return (void*) method; }
The keys aren't finalized until the program is linked, so we
cannot construct the hash tables ahead of time. The
method_rehash
function will look up the appropriate
pointers for methods as they are needed, populating the hash table
lazily. It may be based either on a separate, binary searched table
(which can be constructed at compile time) or dynamic lookup.
This approach is very similar to method lookup in Objective C. It is advantageous over other techniques because interface coercion is a no-op and method lookup is average O(1) for both interface and class method calls. It does not incur the overhead of storing interface table pointers in memory (as tuples with the object pointer), and it is relatively fast and cache-friendly. It is also fairly to implement. The only disadvantage is that it is not quite as fast as normal method calls in C++.
At least initially, the method_lookup
function will
be part of the runtime (rather than inlined). This means that the
implementation can be tweaked with minimal modification to the
compiler. The first implementation may even use a simple binary search
tree rather than a hash table.