CodeSwitch API improvements

Published on 2016-03-18
Tagged: codeswitch gypsum virtual-machines

I've been working on improving the interface between Gypsum code and native C++ code. This work has been in two main areas: the CodeSwitch API, and the native interface. I'll discuss the API in this article, and I'll talk about the native interface next time.

CodeSwitch is designed to be a library that can be embedded easily. It should be possible to use it in any application: web backends, games, word processing apps, whatever. This flexibility is especially important for CodeSwitch, since one of my goals for the project is cross-language compatibility. If CodeSwitch were a single program that could only load its own package files, it would be difficult to use languages it supports in a project that's predominantly C++ or some other language.

As with any library, a good API is crucial. While I can't say that CodeSwitch's API is completely stable yet, I think it's gotten to a point where it's pretty usable. You can see the full API in codeswitch.h.

In this article, I'll take you on a quick tour of the API. First, we'll look at how the simple program driver.cpp loads packages and executes code inside them. Then, we'll look at functionality that is useful for more sophisticated programs.

Starting the VM

Before you can do anything in CodeSwitch, you need to create a VM, which is an instance of the virtual machine with its own set of packages and its own garbage collected heap. Multiple VMs may exist, but nothing is shared between them. When initializing a VM, it's useful to provide a VMOptions object, which contains some settings. In this example, we'll specify directories that CodeSwitch will search when it loads packages by name.

#include <codeswitch.h>


using codeswitch::VM;
using codeswitch::VMOptions;
...

VMOptions vmOptions;
for (std::string& path : packagePaths) {
  vmOptions.packageSearchPaths.push_back(path);
}
VM vm(vmOptions);

The vm object manages the lifetime of the VM. When it is destroyed, the VM is shut down.

Loading packages

A VM won't do you much good until you load packages containing code you want to execute. You can load packages by file name or by package name. When you load a package by file name, that actual file is loaded. When you load a package by package name, the package directories are searched for a file with a name like "foo.bar.baz-1.2.csp" (replace foo.bar.baz and 1.2 with any package name and version). If multiple matching packages are found in the same directory, then the one with the highest version will be loaded. If multiple matches are found in different directories, the package in the first directory in the search order will be loaded.

using codeswitch::Name;
using codeswitch::Package;
...

// Loading a package by package name.
Name packageName = Name::fromStringForPackage(&vm, "foo.bar.baz");
Package package = vm.loadPackage(packageName);

// Loading a package by file name.
Package package = vm.loadPackageFromFile("foo.bar.baz-1.2.csp");
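
To make the search rules above concrete, here is a minimal, self-contained sketch of how a loader could pick a file for a package name. It is not CodeSwitch's actual implementation: Directory, matchVersion, and resolvePackage are hypothetical names, directories are modeled as in-memory lists of file names, and comparing versions component by component as numbers (so 1.10 beats 1.2) is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory: just the file names it contains.
using Directory = std::vector<std::string>;

// Parses "1.2" into {1, 2}. Returns nullopt for anything that is not
// dot-separated decimal numbers.
std::optional<std::vector<int>> parseVersion(const std::string& text) {
  std::vector<int> parts;
  int current = 0;
  bool haveDigit = false;
  for (char c : text) {
    if (c >= '0' && c <= '9') {
      current = current * 10 + (c - '0');
      haveDigit = true;
    } else if (c == '.' && haveDigit) {
      parts.push_back(current);
      current = 0;
      haveDigit = false;
    } else {
      return std::nullopt;
    }
  }
  if (!haveDigit) return std::nullopt;
  parts.push_back(current);
  return parts;
}

// If fileName looks like "<packageName>-<version>.csp", returns the version.
std::optional<std::vector<int>> matchVersion(const std::string& fileName,
                                             const std::string& packageName) {
  const std::string prefix = packageName + "-";
  const std::string suffix = ".csp";
  if (fileName.size() <= prefix.size() + suffix.size()) return std::nullopt;
  if (fileName.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
  if (fileName.compare(fileName.size() - suffix.size(), suffix.size(),
                       suffix) != 0) {
    return std::nullopt;
  }
  return parseVersion(fileName.substr(
      prefix.size(), fileName.size() - prefix.size() - suffix.size()));
}

// The first directory with any match wins; within that directory, the
// highest version wins.
std::optional<std::string> resolvePackage(
    const std::vector<Directory>& searchPath, const std::string& packageName) {
  assert(!packageName.empty());
  for (const Directory& dir : searchPath) {
    std::optional<std::string> best;
    std::vector<int> bestVersion;
    for (const std::string& fileName : dir) {
      auto version = matchVersion(fileName, packageName);
      if (version && (!best || *version > bestVersion)) {
        best = fileName;
        bestVersion = *version;
      }
    }
    if (best) return best;
  }
  return std::nullopt;
}
```

Note that a later directory is never consulted once an earlier one contains a match, even if the later one holds a newer version.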

Once a package is loaded, you can call its entryFunction method to get its entry function (kind of like "main"), if it exists. Note that each package also has an initialization function, which is executed automatically when the package is loaded. The initialization function is responsible for setting the initial values of global variables.

Here's the main loop from driver.cpp. packageLoaders is a vector of lambdas that call either loadPackage or loadPackageFromFile, depending on the command line arguments.

bool executedEntry = false;
for (auto& loader : packageLoaders) {
  auto package = loader(&vm);
  auto entryFunction = package.entryFunction();
  if (entryFunction) {
    executedEntry = true;
    entryFunction.call();
  }
}
if (!executedEntry) {
  cerr << "no entry function found in any package" << endl;
}

Aside: References

Package objects are references. A reference is a registered pointer to an object on a CodeSwitch VM's garbage collected heap. References are actually indirect pointers. Instead of pointing directly to an object on the heap, a reference points to a slot in a handle table, which contains a pointer to the heap object.

It's important that we access objects through references for two reasons. First, the garbage collector needs to know whether objects are still referenced or not. Unreferenced objects may be freed. All of the pointers from outside the heap are stored in the handle table, which is very easy to scan. Second, the garbage collector is allowed to move objects around to reduce fragmentation. If we used raw pointers, it would be difficult to track down and update every pointer to a moved object, but it's easy to update pointers in the handle table.

Reference objects are defined by the Reference base class. When a new Reference is created, a slot is allocated in the handle table. When a Reference is destroyed, the slot is freed by the destructor. A Reference is invalid if it is created with the default constructor or after it has been used in a move constructor or move assignment. Some API functions return invalid references to indicate something went wrong. For most API functions, it's an error to pass an invalid reference as an argument. It's always an error to pass a reference from a different VM as an argument.
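
The indirection described in this aside can be sketched in a few dozen lines. This is only an illustration of the idea, not the real Reference class: HeapObject, HandleTable, and Handle are hypothetical names, the "heap" is just two ordinary objects, and copying is disabled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object on the garbage collected heap.
struct HeapObject {
  int payload;
};

// Every pointer from outside the heap lives in one of these slots.
class HandleTable {
 public:
  std::size_t allocate(HeapObject* object) {
    if (!freeSlots_.empty()) {
      std::size_t index = freeSlots_.back();
      freeSlots_.pop_back();
      slots_[index] = object;
      return index;
    }
    slots_.push_back(object);
    return slots_.size() - 1;
  }
  void release(std::size_t index) {
    slots_[index] = nullptr;
    freeSlots_.push_back(index);
  }
  HeapObject* get(std::size_t index) const { return slots_[index]; }
  // What a moving collector would do: patch the slots, not the handles.
  void relocate(HeapObject* from, HeapObject* to) {
    for (HeapObject*& slot : slots_) {
      if (slot == from) slot = to;
    }
  }
  std::size_t liveCount() const { return slots_.size() - freeSlots_.size(); }

 private:
  std::vector<HeapObject*> slots_;
  std::vector<std::size_t> freeSlots_;
};

// RAII handle: takes a slot on construction, frees it on destruction.
class Handle {
 public:
  Handle() = default;  // Default-constructed handles are invalid.
  Handle(HandleTable* table, HeapObject* object)
      : table_(table), index_(table->allocate(object)) {}
  Handle(Handle&& other) noexcept
      : table_(other.table_), index_(other.index_) {
    other.table_ = nullptr;  // The moved-from handle becomes invalid.
  }
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;
  ~Handle() {
    if (table_ != nullptr) table_->release(index_);
  }

  explicit operator bool() const { return table_ != nullptr; }
  HeapObject* operator->() const {
    assert(table_ != nullptr);  // Using an invalid handle is an error.
    return table_->get(index_);
  }

 private:
  HandleTable* table_ = nullptr;
  std::size_t index_ = 0;
};
```

Because a Handle stores only a slot index, relocate never has to find or touch the handles themselves, which is the property that lets a collector move objects freely.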

Accessing globals, classes, and functions

Top-level definitions can be looked up through Package objects. For example, to load a global:

using codeswitch::Global;

...

Global foo = package.findGlobal("foo");
if (!foo) {
  cerr << "foo not found" << endl;
} else {
  cerr << "foo: " << foo.value().asI64() << endl;
}

Classes can be looked up the same way. In general, definitions can be accessed either using their full name (which requires constructing a Name object) or using a short name from source code (which can be a String or a std::string). Full names contain the names of all the scopes a definition is declared in. For example, the full name of a method bar declared in a class Foo would be Foo.bar. Full names are generally preferred for lookups, since there is less ambiguity.

Looking up a function is more complicated than looking up other kinds of definitions. Since overloaded functions may have the same full name, a type signature string is required to disambiguate them when looking them up. The type signature string is kind of human readable, but not really. It's documented in typesignatures.md.

You can also look up methods from a class. The Class class has findConstructor and findMethod methods that work similarly to those in Package. findConstructor doesn't require any name; only a type signature string is needed.

Calling functions

Functions can be called by invoking their call method, which can take any number of arguments. Integers, floating point numbers, Boolean values, Objects, and Strings may be passed as arguments. The number of arguments and their types must match the function type signature (an Error will be thrown if they don't). There is no casting or type promotion within CodeSwitch, so when you pass integer arguments, you have to pass values of the correct size.

using codeswitch::Function;
using codeswitch::String;
using codeswitch::Value;
...

Function foo = package.findFunction("foo");
String bar = String(&vm, "bar");
Value result = foo.call(bar, static_cast<int64_t>(12));
cout << result.asI64() << endl;

The call method returns a Value object. Values are used to hold function arguments and return values. The arguments of call are actually Values; all the constructors are implicit, so you can pass anything that can be used to create a Value. Internally, Value records the type of whatever was used to create it. This is used to check the arguments of function calls. There are several methods to retrieve the raw value inside, such as asI64, asString, and asObject. These are also type checked.
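
The following sketch shows one way a type like this can be built, to illustrate why an int32_t is not accepted where an int64_t is expected. It is a simplified stand-in rather than the real codeswitch::Value: it holds a std::string instead of heap references, the accessor names other than asI64 and asString are assumptions, and it throws std::logic_error where the real API would throw an Error.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

class Value {
 public:
  // Implicit constructors: each one records the exact type it was given.
  Value(bool b) : data_(b) {}
  Value(int32_t i) : data_(i) {}
  Value(int64_t i) : data_(i) {}
  Value(double d) : data_(d) {}
  Value(std::string s) : data_(std::move(s)) {}
  // Without this overload, a string literal would convert to bool.
  Value(const char* s) {
    assert(s != nullptr);
    data_ = std::string(s);
  }

  // Type-checked accessors: no casting or promotion is attempted.
  bool asBoolean() const { return checked<bool>("boolean"); }
  int32_t asI32() const { return checked<int32_t>("i32"); }
  int64_t asI64() const { return checked<int64_t>("i64"); }
  double asF64() const { return checked<double>("f64"); }
  const std::string& asString() const {
    return checked<std::string>("string");
  }

 private:
  template <class T>
  const T& checked(const char* expected) const {
    if (const T* p = std::get_if<T>(&data_)) return *p;
    throw std::logic_error(std::string("value is not ") + expected);
  }

  std::variant<bool, int32_t, int64_t, double, std::string> data_;
};

// A "function" expecting two i64 arguments. Callers can pass anything
// that converts to Value, but the stored type must be exactly int64_t.
int64_t addI64(const Value& a, const Value& b) {
  return a.asI64() + b.asI64();
}
```

With this design, addI64(static_cast<int64_t>(40), static_cast<int64_t>(2)) succeeds, while addI64(40, 2) throws, because the literals are stored as 32-bit integers.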

As a shortcut, it's also possible to call a function directly from a package.

auto result = package.callFunction(
    "foo", String(&vm, "bar"), static_cast<int64_t>(12)).asI64();

Because the lookup and the call are performed in one step, you don't need to specify the type signature string; it is derived from the arguments. Note that the lookup can be a little expensive (it involves a hash table lookup and string comparisons), so if you need to call a function in a loop, don't use this shortcut. Do the lookup once, outside the loop, and call the resulting Function inside it.
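
The cost difference is easy to see with a toy model. The sketch below does not use the CodeSwitch API: FakePackage is a hypothetical stand-in that maps names to std::function objects and counts lookups, so the two loops can be compared directly.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a package that counts how often it is searched.
class FakePackage {
 public:
  using Fn = std::function<int64_t(int64_t)>;

  void define(const std::string& name, Fn fn) {
    assert(fn != nullptr);
    functions_[name] = std::move(fn);
  }

  // Analogous to looking a function up once and keeping the result.
  const Fn& findFunction(const std::string& name) {
    ++lookups_;
    return functions_.at(name);
  }

  // Analogous to the one-step shortcut: look up, then call.
  int64_t callFunction(const std::string& name, int64_t arg) {
    return findFunction(name)(arg);
  }

  int lookups() const { return lookups_; }

 private:
  std::unordered_map<std::string, Fn> functions_;
  int lookups_ = 0;
};

// One lookup per iteration.
int64_t sumWithShortcut(FakePackage& package, int64_t n) {
  int64_t total = 0;
  for (int64_t i = 0; i < n; ++i) {
    total += package.callFunction("square", i);
  }
  return total;
}

// One lookup in total, hoisted out of the loop.
int64_t sumWithHoistedLookup(FakePackage& package, int64_t n) {
  const FakePackage::Fn& square = package.findFunction("square");
  int64_t total = 0;
  for (int64_t i = 0; i < n; ++i) {
    total += square(i);
  }
  return total;
}
```

Both functions compute the same sum, but the first performs n lookups and the second performs exactly one.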

If the function is a constructor, you can create new objects by calling the newInstance method instead of call. newInstance works pretty much the same way as call, except it returns an Object instead of a Value.

using codeswitch::Class;
using codeswitch::Function;
using codeswitch::Object;
using codeswitch::String;
...

Class personClass = package.findClass("Person");
Function personCtor = personClass.findConstructor("(C::String)");
String name(&vm, "Joebob");
Object person = personCtor.newInstance(name);

Interacting with objects

There are a couple of different ways to call methods on objects. The first is to look up the method separately (using a package or class), then call the method as a normal function, passing the object as the first argument. You can also call a method directly on the object. With this approach, a lookup is performed internally (the type signature string is derived from the arguments), and the object is passed as the first argument automatically.

using codeswitch::Function;
using codeswitch::String;

Function getNameFunction =
    personClass.findMethod("get-name", "(C:Person)");
String name = getNameFunction.call(person).asString();
name += " Jr.";
person.callMethod("set-name", name);

In addition to calling methods, you can load and store fields on objects. Fields can be looked up from the object's class. You can also load and store fields with a single call that performs the lookup internally.

using codeswitch::Field;
using codeswitch::String;

Field nameField = person.clas().findField("name");
String name = person.getField(nameField).asString();
name += " Jr.";
person.setField(nameField, name);
name += " III";
person.setField("name", name);

If an object has array elements, you can access those, too. getElement and setElement access individual elements. Most of the time, you probably want to use copyElementsFrom and copyElementsTo though.

int32_t readIntoBuffer(
    int32_t fd,
    Object buffer,
    int32_t count,
    int32_t offset) {
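  vector<char> nativeBuffer(count);
  // read() returns -1 on failure, so only copy when bytes were actually read.
  int32_t actualCount = read(fd, nativeBuffer.data(), count);
  if (actualCount > 0) {
    buffer.copyElementsFrom(offset, nativeBuffer.data(), actualCount);
  }
  return actualCount;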
}

Handling errors

Errors in CodeSwitch are easy to handle: simply use a regular C++ try..catch block. There are two classes of catchable object.

Exception is a wrapper for unhandled exceptions thrown from Gypsum code. If a Gypsum function throws an exception, and nothing catches it, the exception is wrapped in an Exception object, and the C++ API client has an opportunity to catch it. A reference to the thrown object can be retrieved with the get method.

using codeswitch::Exception;
using codeswitch::String;

try {
  function.call();
} catch (Exception& e) {
  String message = e.get().callMethod("to-string").asString();
  cout << message.toStdString() << endl;
}

Error is a class for programmer errors. This is thrown if the API is used improperly, for example, if an invalid reference is used or if a function is called with the wrong types of arguments. It may also be thrown due to internal errors. Generally, Error should not be caught, except for reporting to the user before terminating.

Conclusion

While I don't think the API is stable yet, it's pretty close to where I want it to be. It's well documented, and fairly simple to get up and running. This is the main way that people interact with CodeSwitch (other than by writing Gypsum code), so it's important to get it right.

You can read through the whole API at codeswitch.h. If you have Doxygen installed, you can generate documentation with the command make doc. Gypsum and CodeSwitch are available on GitHub.