CodeSwitch API: native functions
As part of the API update I discussed in the previous article, I've added the capability for CodeSwitch to call native functions written in C++. This means that when you write a package, part of it can be written in Gypsum, and part of it in C++. This is useful for implementing new low-level primitives, such as files and sockets. It's also necessary for interacting with libraries written in native code, like Qt.
Declaring and registering native functions
In order for CodeSwitch to call a native function, the function must be declared in Gypsum code, and the native implementation must be registered with CodeSwitch.
Declaring a native function in Gypsum is easy. Just add the native attribute to the function definition, and don't include a body. Top-level functions may be native. Methods (both static and non-static) may also be native. Constructors and overloaded functions cannot be native.
    native def top-level-function: i64

    class Foo
      native def normal-method: i64
      static native def static-method: i64
Every function declared with the native attribute must have a C++ implementation that CodeSwitch can find. There are three ways to provide this implementation:
- Explicit registration: functions may be registered explicitly through the CodeSwitch API. When you create a new VM, you may provide a VMOptions object. This object contains (among other things) a list of tuples called nativeFunctions. Each tuple contains the name of a package, the name of a native function declared within that package, and a pointer to a C++ implementation of that function.
- Native libraries: you can compile native functions for a package into a shared library (.so or .dylib file) and install that in the same directory as the CodeSwitch package. These libraries and the native functions within them will be loaded automatically when needed. Native functions loaded this way must follow a specific naming convention (details below) so the VM can find them.
- Static linking: native functions may also be compiled into the same binary file as CodeSwitch. This may simplify distribution in some situations. The VM will dynamically load these functions when needed. As with native libraries, these functions must follow a specific naming convention so the VM can find them. They must also be visible in the dynamic symbol table, which may require some special attributes and compiler flags.
When CodeSwitch searches for the implementation of a function, by default, it searches the VMOptions.nativeFunctions list first, then searches native libraries. Statically linked functions are not searched by default. This search order can be changed globally by setting VMOptions.nativeFunctionSearchOrder. The search order can also be set on a per-package basis when calling VM.loadPackage.
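To make the search order concrete, here is a minimal sketch of how such a lookup could be organized. It is not the actual CodeSwitch implementation: NativeResolver, NativeSource, NativeEntry, SymbolLookup, and the two placeholder functions are all names I made up for illustration, and the two lookup callbacks stand in for whatever the VM really does to find symbols in a shared library or in its own binary.

```cpp
#include <functional>
#include <string>
#include <tuple>
#include <vector>

// Every name in this sketch is a stand-in for illustration, not the real CodeSwitch API.
using NativePtr = void (*)();
// One explicitly registered entry: package name, function name, implementation.
using NativeEntry = std::tuple<std::string, std::string, NativePtr>;
// Stand-in for a symbol lookup in a shared library or in the running binary.
using SymbolLookup = std::function<NativePtr(const std::string&, const std::string&)>;

enum class NativeSource { Registered, Library, Static };

struct NativeResolver {
  std::vector<NativeEntry> registered;
  SymbolLookup findInLibrary;
  SymbolLookup findInBinary;
  // Default order from the text: explicit list first, then native libraries.
  // Statically linked functions are only searched if the order is changed.
  std::vector<NativeSource> searchOrder{NativeSource::Registered, NativeSource::Library};

  // Tries each source in order and returns the first implementation found.
  NativePtr find(const std::string& package, const std::string& function) const {
    for (NativeSource source : searchOrder) {
      NativePtr found = lookIn(source, package, function);
      if (found != nullptr) return found;
    }
    return nullptr;
  }

 private:
  NativePtr lookIn(NativeSource source, const std::string& package,
                   const std::string& function) const {
    switch (source) {
      case NativeSource::Registered:
        for (const NativeEntry& entry : registered) {
          if (std::get<0>(entry) == package && std::get<1>(entry) == function) {
            return std::get<2>(entry);
          }
        }
        return nullptr;
      case NativeSource::Library:
        return findInLibrary ? findInLibrary(package, function) : nullptr;
      case NativeSource::Static:
        return findInBinary ? findInBinary(package, function) : nullptr;
    }
    return nullptr;
  }
};

// Two placeholder implementations, used only to tell the sources apart.
void registeredImpl() {}
void libraryImpl() {}
```

With the default order, an explicitly registered function shadows a library function of the same name. Reordering searchOrder reverses that.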
Function naming convention
Native functions compiled into libraries or linked into the VM must follow a specific naming convention so the VM can find them. The compiled symbol name must be the package name (with dots replaced by two underscores) followed by the full declared function name (again, with dots between declaring scopes replaced by two underscores). The package name and declared name are separated by three underscores. Any characters which are not valid C identifier characters are replaced by single underscores.
This is best explained by example, so let's consider the File.exists method in the std.io package. The full name would be std__io___File__exists.
    std.io → std__io            # package name
    File.exists → File__exists  # function name
    std__io___File__exists
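The convention is mechanical enough to express as a small function. The sketch below follows the rules as stated above. The helper names encodeName and nativeSymbolName are hypothetical and are not part of the CodeSwitch API.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: encodes one dotted name. Dots become two underscores;
// any other character that is not valid in a C identifier becomes a single
// underscore.
static std::string encodeName(const std::string& name) {
  std::string out;
  for (unsigned char c : name) {
    if (c == '.') {
      out += "__";
    } else if (std::isalnum(c) || c == '_') {
      out += static_cast<char>(c);
    } else {
      out += '_';
    }
  }
  return out;
}

// Hypothetical helper: joins the encoded package name and the encoded
// function name with three underscores.
std::string nativeSymbolName(const std::string& packageName,
                             const std::string& functionName) {
  return encodeName(packageName) + "___" + encodeName(functionName);
}
```

For example, nativeSymbolName("std.io", "File.exists") yields std__io___File__exists, and a function named left-pad in a package named leftpad becomes leftpad___left_pad.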
Native implementations must be declared with extern "C". C++ compilers usually encode some type information into compiled symbol names. This is called mangling, and it is how overloading is implemented in C++. Unfortunately, each compiler and operating system does this differently, so CodeSwitch can't reliably search for mangled symbols. Declaring with extern "C" turns off mangling.
One other detail: normally, when compiling a library, it's a good idea to exclude functions from the dynamic symbol table by default. The -fvisibility=hidden flag does this on g++ and clang. This prevents other libraries from linking to internal functions that may change. However, native functions must be visible in order for CodeSwitch to find them, so you may need to explicitly make them visible in the dynamic symbol table with __attribute__((visibility("default"))).
Let's tie all this together. Here's a full declaration of the File.exists method.
    extern "C" __attribute__((visibility("default")))
    bool std__io___File__exists(VM* vm, Object self) {
      auto path = getPath(vm, self);
      struct stat st;
      int ret = stat(path.c_str(), &st);
      return ret == 0;
    }
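Since every native function needs the same two decorations, you might want to wrap them in a macro. This is only a sketch: NATIVE_EXPORT is a name I made up, not something CodeSwitch provides, and the VM struct here is a forward-declared stand-in for codeswitch::VM so that the example compiles by itself. The demo package and its add function are also hypothetical.

```cpp
#include <cstdint>

// Stand-in for codeswitch::VM so the sketch compiles on its own.
struct VM;

// Hypothetical convenience macro: C linkage (no mangling) plus default
// visibility, so the symbol stays in the dynamic symbol table even when the
// library is built with -fvisibility=hidden.
#if defined(__GNUC__) || defined(__clang__)
#define NATIVE_EXPORT extern "C" __attribute__((visibility("default")))
#else
#define NATIVE_EXPORT extern "C"
#endif

// Implements a hypothetical Gypsum declaration in a package named "demo":
//   native def add(a: i64, b: i64): i64
NATIVE_EXPORT int64_t demo___add(VM* vm, int64_t a, int64_t b) {
  (void)vm;  // the VM pointer is required by the convention but unused here
  return a + b;
}
```

On g++ and clang you can check the result with nm -D on the compiled library: demo___add should be listed under exactly that name.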
Native function calling convention
Interoperability between languages is a major goal of CodeSwitch, so I wanted to make native functions feel natural. Parameters are passed in as regular parameters. Return values are just returned. Exceptions can be thrown and caught like regular C++ exceptions. There is a fairly unsurprising mapping between C++ types and Gypsum types.
Parameters
The first parameter of every native implementation must be a pointer to codeswitch::VM. This provides access to the rest of the VM, and is needed because CodeSwitch has no global state. Native functions may use this to load packages, look up functions, create new objects, or do anything else that native code can do through the API.
Native functions that implement non-static methods take a codeswitch::Object as their second parameter. This is a reference to the receiver (the object the method was called on). This is just like the this pointer in C++.
After those required parameters, native function parameters correspond directly to the parameters in the Gypsum declaration. For example, if this function is declared in Gypsum:
    native def left-pad(s: String, width: i64): String
Then the C++ function's parameters will look like this:
    using codeswitch::String;
    using codeswitch::VM;

    extern "C" __attribute__((visibility("default")))
    String leftpad___left_pad(VM* vm, String s, int64_t width) {
      ...
    }
Return values
To return something from a native function, just return it. The return type must correspond with a Gypsum type, according to the table below. Note that unit functions in Gypsum are void in C++.
    using codeswitch::VM;

    extern "C" __attribute__((visibility("default")))
    int64_t math___abs(VM* vm, int64_t x) {
      return x >= 0 ? x : -x;
    }
Throwing and catching exceptions
Exceptions are wrapped using the codeswitch::Exception class. You can throw and catch them as you normally would in C++. A reference to the actual exception object being thrown can be retrieved with the get method.
    using codeswitch::Exception;
    using codeswitch::Object;
    using codeswitch::VM;

    extern "C" __attribute__((visibility("default")))
    void utils___frob(VM* vm, double x) {
      try {
        frob();
      } catch (Exception& e) {
        // Log the Gypsum exception object, then let it keep propagating.
        Object obj = e.get();
        log(obj);
        throw;  // rethrows the original exception without copying it
      }
    }
Types in Gypsum and C++
CodeSwitch converts between Gypsum and C++ types according to the table below. Since type information is lost after C++ code is compiled, CodeSwitch has no way of checking types in native code. Correctness is therefore up to you. Bad things will happen if the types are wrong.
Gypsum type | C++ type |
---|---|
unit | void |
boolean | bool |
i8 | int8_t |
i16 | int16_t |
i32 | int32_t |
i64 | int64_t |
f32 | float |
f64 | double |
String | codeswitch::String |
any other object | codeswitch::Object |
Conclusion
I'm really pleased with how native functions interact with interpreted code. I think CodeSwitch has a much more intuitive system than JNI or V8.
CodeSwitch can be more intuitive because native code that uses the API must be written in C++. This lets us use objects, destructors, and exceptions in a way that makes sense.
This is a strength, but it's also a weakness: there are many languages that compile to native code that I'd like CodeSwitch to interoperate with. C, Rust, and Haskell all come to mind. Most other languages provide a foreign function interface for C because C is the lowest common denominator of native languages: everything is compatible with it. I may provide a separate interface for C in the future, but I think C++ will always have the primary native API.