CodeSwitch assembly glue for native functions

Published on 2016-07-21
Tagged: codeswitch

Last time, I discussed native functions, but I didn't really talk about how CodeSwitch makes the transition from interpreted code to native code.

The interpreter uses a stack to pass arguments between functions. Any instruction that produces a value will push it onto the interpreter's stack. When a function with n parameters is called, the top n values on the stack are passed as arguments to the function. When the function returns, the arguments are popped, and the return value is pushed.
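To make that convention concrete, here's a minimal sketch of a stack-based call. None of this is CodeSwitch's actual code: InterpStack, Body, call, and subtract are hypothetical names, and I'm assuming the first argument sits deepest in the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the interpreter's value stack.
struct InterpStack {
  std::vector<int64_t> slots;

  void push(int64_t v) { slots.push_back(v); }
};

// A callee sees its arguments as a contiguous run of stack slots.
// Assumption: the first argument is at the lowest index.
using Body = int64_t (*)(const int64_t* args, size_t n);

// Calls a function with n parameters: the top n values are the arguments.
// Afterwards the arguments are popped and the return value is pushed.
void call(InterpStack& stack, Body body, size_t n) {
  assert(stack.slots.size() >= n);
  const int64_t* args = stack.slots.data() + (stack.slots.size() - n);
  int64_t result = body(args, n);
  stack.slots.resize(stack.slots.size() - n);  // pop the arguments
  stack.push(result);                          // push the return value
}

// Example callee: subtracts the second argument from the first.
int64_t subtract(const int64_t* args, size_t n) {
  assert(n == 2);
  return args[0] - args[1];
}
```

Pushing 10 and 3 and then calling subtract with n = 2 leaves a single 7 where the two arguments used to be. Anything below the arguments is untouched.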

This doesn't translate well to native functions, which are expected to follow the normal C++ ABI for the architecture and operating system. So we need some assembly glue code to copy arguments from the interpreter's stack into the right registers and stack slots.

AMD64 calling convention

Before we get into the glue code, let's talk about the calling convention. CodeSwitch currently supports the System V AMD64 ABI [pdf], which is used on 64-bit Linux and macOS, so we'll just focus on that. I plan to add support for more platforms in the future.

Registers are assigned according to the following rules (see the spec linked above for full details):

Integer arguments are assigned to %rdi, %rsi, %rdx, %rcx, %r8, and %r9, in that order.
Floating point arguments are assigned to %xmm0 through %xmm7.
Arguments that don't fit into the registers above are passed on the stack. Earlier arguments are stored at lower addresses.
The function's return value is stored in %rax or %xmm0 (depending on whether an integer or floating point number is returned).
%r10, %r11, and %xmm8 through %xmm15 are temporary (caller-save) registers.
%rbx, %rbp, and %r12 through %r15 are callee-save registers.

For the purpose of register assignment, pointer arguments are treated like integers. There is some special processing needed for pointer arguments, which I'll cover a little later.

As an example, suppose we have the function declared below. Each parameter will have the register assignment shown in the comment next to it.

int fn(
  int a,    // %rdi
  float b,  // %xmm0
  int c,    // %rsi
  int d,    // %rdx
  int e,    // %rcx
  float f,  // %xmm1
  float g,  // %xmm2
  float h,  // %xmm3
  float i,  // %xmm4
  int j,    // %r8
  int k,    // %r9
  int l,    // (%rsp)
  float m,  // %xmm5
  float n,  // %xmm6
  float o,  // %xmm7
  float p,  // 8(%rsp)
  int q     // 16(%rsp)
);
// result returned through %rax
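The assignment above can be reproduced mechanically. Below is a rough sketch that only handles scalar integer and floating point arguments (no structs, long double, or varargs, which the real ABI also covers). ArgClass and assignLocations are names I made up for illustration. Stack slots are written as AT&T-style offsets from %rsp at the point of the call.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical argument classes; pointers count as Int.
enum class ArgClass { Int, Float };

// Returns the location of each argument, in declaration order.
std::vector<std::string> assignLocations(const std::vector<ArgClass>& args) {
  static const char* const kIntRegs[] = {"%rdi", "%rsi", "%rdx",
                                         "%rcx", "%r8",  "%r9"};
  const size_t kIntRegCount = 6;
  const size_t kFloatRegCount = 8;  // %xmm0 through %xmm7

  std::vector<std::string> locations;
  size_t nextInt = 0, nextFloat = 0, stackOffset = 0;
  for (ArgClass c : args) {
    if (c == ArgClass::Int && nextInt < kIntRegCount) {
      locations.push_back(kIntRegs[nextInt++]);
    } else if (c == ArgClass::Float && nextFloat < kFloatRegCount) {
      locations.push_back("%xmm" + std::to_string(nextFloat++));
    } else {
      // Out of registers for this class: take the next 8-byte stack slot,
      // so earlier arguments end up at lower addresses.
      locations.push_back(stackOffset == 0
                              ? std::string("(%rsp)")
                              : std::to_string(stackOffset) + "(%rsp)");
      stackOffset += 8;
    }
  }
  return locations;
}
```

Note that the two register classes are counted independently. That's why m, n, and o still get %xmm registers even though l has already spilled to the stack.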

Moving arguments into registers

Unfortunately, we can't copy arguments from the interpreter's stack into native registers and stack slots in pure C++, since the language provides no mechanism for this. We need a bit of assembly "glue" code to do the job. As much of the work as possible is still done in C++, though.

When a native function is called, callNativeFunction is invoked. This function is platform-independent. It takes the Function object being called (which contains type information and a pointer to the native implementation) and a pointer to the top of the interpreter's stack. callNativeFunction has three jobs: it copies arguments from the interpreter's stack into a temporary buffer, it classifies arguments as either integer or non-integer (in a bool array), and it allocates and later destroys handles for any Object arguments (more on this in a bit). Once this work is done, callNativeFunctionRaw is called.

callNativeFunctionRaw does as much of the amd64-specific work as can be done in C++. It moves arguments according to their type from the temporary buffer into an integer buffer of 6 elements, a float buffer of 8 elements, and a stack buffer. Once the argument segregation is done, it passes control to callNativeFunctionForInt or callNativeFunctionForFloat to complete the call, depending on the return type.
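Here's a simplified sketch of what that segregation step might look like. The buffer sizes (6 and 8) and the bool array come from the description above; everything else (SegregatedArgs, segregate, slotFromDouble, one 64-bit slot per argument) is an assumption for illustration, not the real callNativeFunctionRaw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result of segregating the temporary argument buffer.
struct SegregatedArgs {
  uint64_t intArgs[6] = {};    // destined for %rdi ... %r9
  size_t intCount = 0;
  uint64_t floatArgs[8] = {};  // destined for %xmm0 ... %xmm7
  size_t floatCount = 0;
  std::vector<uint64_t> stackArgs;  // whatever didn't fit in registers
};

// Stores a double's bit pattern in a raw 64-bit slot.
uint64_t slotFromDouble(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// rawArgs holds one slot per argument; isInt[i] says whether argument i
// is integer-class (pointers included).
SegregatedArgs segregate(const std::vector<uint64_t>& rawArgs,
                         const std::vector<bool>& isInt) {
  assert(rawArgs.size() == isInt.size());
  SegregatedArgs out;
  for (size_t i = 0; i < rawArgs.size(); ++i) {
    if (isInt[i] && out.intCount < 6) {
      out.intArgs[out.intCount++] = rawArgs[i];
    } else if (!isInt[i] && out.floatCount < 8) {
      out.floatArgs[out.floatCount++] = rawArgs[i];
    } else {
      out.stackArgs.push_back(rawArgs[i]);  // keeps argument order
    }
  }
  return out;
}
```

After this step, the only thing left to do is load fixed registers from fixed buffer offsets.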

callNativeFunctionForInt and callNativeFunctionForFloat are actually two names for the same stub function, implemented in assembly. This code is responsible for copying the arguments from the integer, float, and stack buffers into registers and onto the stack. Here's the section that handles integer registers.

  # Move integer array pointer and size
  # into scratch registers, since we'll be replacing them.
  movq %rsi, %r10  # int arg count
  movq %rdx, %r11  # int arg ptr

  # Load integer arguments
  cmpq $0, %r10
  je .LloadFloatArgs
  movq (%r11), %rdi
  cmpq $1, %r10
  je .LloadFloatArgs
  movq 8(%r11), %rsi
  cmpq $2, %r10
  je .LloadFloatArgs
  movq 16(%r11), %rdx
  cmpq $3, %r10
  je .LloadFloatArgs
  movq 24(%r11), %rcx
  cmpq $4, %r10
  je .LloadFloatArgs
  movq 32(%r11), %r8
  cmpq $5, %r10
  je .LloadFloatArgs
  movq 40(%r11), %r9

Note that we can't loop over the array and load each argument into the "next" register, since there's no instruction to load a value into a dynamically named register. Each argument has to be loaded with a different instruction.

Once all the arguments have been loaded, we are ready to branch into the native function. The glue code does not set up its own stack frame, and it doesn't need to do anything with the return value, so it tail-calls the native function. When the native function is done, it returns directly to callNativeFunctionRaw.

  # Restore the stack and make the call.
.Lcall:
  addq $32, %rsp
  popq %rax  # native function
  jmp *%rax

The assembly stub doesn't have to do anything special to deal with returned values or exceptions; the native function handles all that automatically. The assembly stub just sets up the registers and the stack. Once its job is done, no trace of it is left behind on the stack. This is why we can use the same stub to call functions returning integers and floating point numbers, even though they are returned through different registers.

Pointers

I glossed over this earlier, but some special handling is needed for pointers to objects on the garbage collected heap. The interpreter can pass pointers directly on the interpreted stack, since the garbage collector knows how to find pointers there. With native code, we need to register pointers with the garbage collector, and we need to add some indirection so that native code doesn't need to worry about objects being moved by the garbage collector. I've covered this before in Memory management in CodeSwitch. The short version is that we use handles, which are basically pointers into a table that contains pointers to objects on the heap. The garbage collector can scan and update the pointers in the table all at once.
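As a quick refresher, here's a toy version of the idea. This is not CodeSwitch's real handle table: Object, HandleTable, create, and relocate are hypothetical, and relocate stands in for whatever a moving collector does after it copies an object.

```cpp
#include <cassert>
#include <deque>

struct Object {
  int payload;
};

// A handle is a pointer to a table slot that holds the object's address.
using Handle = Object**;

class HandleTable {
 public:
  // Registers an object pointer and returns a handle to it.
  Handle create(Object* obj) {
    assert(obj != nullptr);
    slots_.push_back(obj);
    return &slots_.back();
  }

  // Stand-in for the collector: after moving an object, patch every
  // slot that pointed at the old location.
  void relocate(Object* from, Object* to) {
    for (Object*& slot : slots_) {
      if (slot == from) slot = to;
    }
  }

 private:
  // std::deque keeps element addresses stable across push_back,
  // so existing handles stay valid as the table grows.
  std::deque<Object*> slots_;
};
```

Native code holding a Handle never sees the object move. It dereferences the handle each time and gets the current address.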

When calling a native function, we need to wrap any pointer arguments in handles. This is done in callNativeFunction. As far as the assembly code is concerned, a handle is represented as Object**. Pointers are treated like integers, according to the ABI, so we can pass them in the same registers. Interestingly, it doesn't matter whether the native function expects its parameters to be passed by value or by reference. When an object with a non-trivial destructor is passed by value, the caller makes a temporary copy and passes its address, so in both cases a pointer ends up in the register.

After the call returns, handles are destroyed automatically. If the native function returns a handle, the handle is unwrapped, and its pointer is pushed onto the interpreter's stack.
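The wrap, call, unwrap, destroy sequence might look something like the sketch below. It's only an illustration: HandleScope, callWithHandles, and pickLarger are hypothetical names, the handle table is a plain std::deque, and the native function is called directly through a function pointer rather than through the assembly stub.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Object {
  int payload;
};
using Handle = Object**;

// Hypothetical scope owning the handles created for one native call.
// Its destructor releases them, so cleanup also happens on exceptions.
class HandleScope {
 public:
  explicit HandleScope(std::deque<Object*>& table)
      : table_(table), base_(table.size()) {}
  ~HandleScope() {
    while (table_.size() > base_) table_.pop_back();
  }
  HandleScope(const HandleScope&) = delete;
  HandleScope& operator=(const HandleScope&) = delete;

  Handle wrap(Object* obj) {
    assert(obj != nullptr);
    table_.push_back(obj);
    return &table_.back();  // deque keeps this address stable
  }

 private:
  std::deque<Object*>& table_;
  size_t base_;
};

// Example "native" function: takes and returns handles.
Handle pickLarger(Handle a, Handle b) {
  return (*a)->payload >= (*b)->payload ? a : b;
}

// Wraps both arguments, makes the call, and unwraps the result.
Object* callWithHandles(std::deque<Object*>& table,
                        Handle (*native)(Handle, Handle),
                        Object* x, Object* y) {
  HandleScope scope(table);
  Handle hx = scope.wrap(x);
  Handle hy = scope.wrap(y);
  Handle result = native(hx, hy);
  // The return value is read before scope's destructor runs.
  return result ? *result : nullptr;
}
```

The important detail is the ordering at the end: the returned handle has to be unwrapped while it's still alive, and only then can the handles for this call be released.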

Conclusion

It's regrettable but necessary to drop down into platform-specific code to handle calls to native functions. A cross-platform alternative is to pass an object that lets the callee access arguments on the interpreter's stack and set a return value. This ends up being unfriendly to developers though. It's much nicer to use the regular C++ mechanisms.
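To show the difference, here's roughly what the context-object approach could look like next to an ordinary function. CallContext and its methods are hypothetical; CodeSwitch doesn't have this class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical context object giving a native function access to its
// arguments on the interpreter's stack and a place to store a result.
class CallContext {
 public:
  CallContext(const int64_t* args, size_t count)
      : args_(args), count_(count) {}

  int64_t intArg(size_t i) const {
    assert(i < count_);
    return args_[i];
  }
  void setReturn(int64_t v) { result_ = v; }
  int64_t result() const { return result_; }

 private:
  const int64_t* args_;
  size_t count_;
  int64_t result_ = 0;
};

// Context style: every native function has the same signature and has to
// unpack its own arguments. Portable, but the types are invisible.
void addViaContext(CallContext& ctx) {
  ctx.setReturn(ctx.intArg(0) + ctx.intArg(1));
}

// Glue style: an ordinary C++ function. The compiler checks the types,
// and the function can be called from other C++ code as-is.
int64_t addDirect(int64_t a, int64_t b) { return a + b; }
```

The first version needs no assembly at all, since the interpreter can call any native function through a single C++ signature. The cost is that argument counts and types are only checked at run time, inside each function.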

One thing I'm concerned about is that a lot of instructions must be executed to make a simple call. This is not going to be good for performance. In the future, CodeSwitch will have a JIT compiler, and the assembly glue can be generated for each call site. That should be quite a bit faster, especially if arguments are allocated in registers already.