
Yet Another Attempt at FFI for Node.js

25 May 2020

Note: This article first appeared on dev.to

Earlier this year, I was working on optimizing a data path inside a Node.js library that creates a bunch of data, encodes it to MessagePack, then sends it off to an HTTP server. I thought that maybe we could do some interesting things in native code that would be harder to do in JavaScript, like an optimized MessagePack encoder, and less-costly multithreading. Naturally, calling into native code from Node.js incurs some overhead on its own, so I was exploring some alternatives.

At the same time, I had been reading about io_uring, a new feature in the Linux kernel that allows for certain system calls to be made by passing the arguments through a ring buffer in memory that’s shared by the process and the kernel, for extra speed. This reminded me about how some features of Node.js are implemented by sharing a Buffer between the native and JavaScript code, through which data can be passed. This technique is much simpler than what io_uring does, mostly because it’s done for a single purpose on a single thread. The clearest example I can think of in the Node.js API that uses this is fs.stat(), in which the results of the uv_fs_stat() call are stored in a Buffer which is then read from the JavaScript side.

The thought progression here was that this technique could be used to call native functions from JavaScript in userland. For example, we could have a C function like:

uint32_t add(uint32_t a, uint32_t b) {
  return a + b;
}

And then to call it, we could have a shared buffer which would effectively have the following struct inside it:

struct shared_buffer {
  uint32_t returnValue;
  uint32_t a;
  uint32_t b;
};

To call the function from JS, we first assign the values to a and b in our shared buffer. Then we call the function and read the return value from the struct:

function jsAdd(a, b) {
  const uint32buf = new Uint32Array(3);
  uint32buf[1] = a;
  uint32buf[2] = b;
  // This next bit is hand-wavey. I'll get to that in a bit!
  callNativeFunction(add, uint32buf.buffer);
  return uint32buf[0];
}

In this example, callNativeFunction would retrieve the native function, then give it the arguments from the shared buffer, and put the return value back into the shared buffer.
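
To make that data flow concrete, here is a minimal sketch in which the native side is simulated in plain JavaScript. The names fakeNativeAdd and this particular callNativeFunction are hypothetical stand-ins: a real version would live in a native addon and read the same memory through a pointer.

```javascript
// Stand-in for the C `add` function. A real implementation would be native
// code reading these same bytes through a pointer to `struct shared_buffer`.
function fakeNativeAdd(sharedArrayBuffer) {
  const view = new Uint32Array(sharedArrayBuffer);
  // Slot 0 is returnValue, slots 1 and 2 are a and b.
  // Storing into a Uint32Array wraps modulo 2^32, like uint32_t in C.
  view[0] = view[1] + view[2];
}

// Hypothetical dispatcher: hands the shared memory to the "native" function.
function callNativeFunction(fn, sharedArrayBuffer) {
  fn(sharedArrayBuffer);
}

function jsAdd(a, b) {
  const uint32buf = new Uint32Array(3);
  uint32buf[1] = a;
  uint32buf[2] = b;
  callNativeFunction(fakeNativeAdd, uint32buf.buffer);
  return uint32buf[0];
}
```

The only things that cross the boundary are the function reference and the buffer. The arguments and the return value travel through memory that both sides can see.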

At this point, great! We’ve got a way of calling native functions that bypasses a lot of the marshalling that happens between JS and native code by just putting data directly into memory from JS, and then reading the return value right out of it.

The catch is that callNativeFunction is not trivial to implement. You need to have a function pointer for the function you’re going to call, and know its signature. Fortunately, we can handle all this because we’re only creating this native addon for one function. Case closed.

But what about FFI?

FFI (Foreign Function Interface) refers to the ability to call functions in native code (that is, code written in a lower-level language like C or C++) from a higher-level language, like JS, Ruby or Python. These languages all support some way of calling functions dynamically, without knowing function signatures at compile time, because there is no compile time. (Okay, that’s not technically true with JIT compilers and all, but for these purposes we can consider them non-compiled.)

C/C++ does not have a built-in way of dynamically determining how to call a function, and with what arguments, like JavaScript does. Instead, the complexities of dealing with calling functions, passing them arguments, grabbing their return values, and handling the stack accordingly are all dealt with by the compiler, using techniques specific to the platform. We call these techniques “calling conventions” and it turns out there are tons of them.

In Node.js the typical thing to do is ignore all this and just write a custom wrapper in C or C++ that calls the exact functions we want. While dealing with these things at compile time is the norm, there are ways of handling them at run time. Libraries like libffi and dyncall exist to fill this void. Each of these libraries provides an interface to deliver arguments to functions and extract their return values. They handle the differences between calling conventions on many platforms. These calls can be built up dynamically, even from a higher-level language, as long as you create reasonable interfaces between libffi or dyncall and the higher-level language.

Enter sbffi

The shared buffer technique didn’t actually pan out for the code I was working on, because it turned out that converting the data into something readable by native code and then into MessagePack was particularly costly. Moving operations to separate threads didn’t really help.

That being said, I still think the approach has value, and I’d like more folks to try it and see if it makes sense for their workloads, so I put together an FFI library for Node.js that uses the shared buffer technique to pass arguments and return values, and dyncall to call the native functions dynamically. It’s called sbffi and you can use it today as a simple way to call your already-compiled native libraries.

Take our add example from above:

// add.c
#include <stdint.h>

uint32_t add(uint32_t a, uint32_t b) {
  return a + b;
}

Now assume we’ve compiled this into a shared library called libadd.so. We can make the add function available to JavaScript with the following:

// add.js
const assert = require('assert');
const { getNativeFunction } = require('sbffi');
const add = getNativeFunction(
  '/path/to/libadd.so', // Full path to the shared library.
  'add', // The function provided by the library.
  'uint32_t', // The return value type.
  ['uint32_t', 'uint32_t'] // The argument types.
);

assert.strictEqual(add(23, 32), 55);
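
To give a feel for how type strings like the ones above could map onto a shared buffer, here is a rough sketch that computes byte offsets for the return value and each argument. This is not sbffi’s actual implementation. TYPE_SIZES, alignUp and computeLayout are hypothetical names, and the alignment rule (each field aligned to its own size) is an assumption that holds on common 64-bit platforms but is not guaranteed everywhere.

```javascript
// Byte sizes for a few fixed-width C types (illustrative subset only).
const TYPE_SIZES = {
  uint8_t: 1,
  uint16_t: 2,
  uint32_t: 4,
  uint64_t: 8,
  double: 8,
};

// Round `offset` up to the next multiple of `alignment`.
function alignUp(offset, alignment) {
  return Math.ceil(offset / alignment) * alignment;
}

// Compute where the return value and each argument would sit in a shared
// buffer, aligning every field to its own size (assumed, see above).
function computeLayout(returnType, argTypes) {
  let offset = 0;
  const place = (type) => {
    const size = TYPE_SIZES[type];
    if (size === undefined) throw new TypeError(`Unsupported type: ${type}`);
    offset = alignUp(offset, size);
    const field = { type, offset, size };
    offset += size;
    return field;
  };
  const ret = place(returnType);
  const args = argTypes.map((type) => place(type));
  // Pad the total to a multiple of 8 bytes for simplicity.
  return { ret, args, byteLength: alignUp(offset, 8) };
}
```

For the add signature, computeLayout('uint32_t', ['uint32_t', 'uint32_t']) places the return value at offset 0 and the arguments at offsets 4 and 8, which matches the struct shown earlier.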

It turns out that while dynamically building up the function calls incurs some noticeable overhead, this approach is relatively quick. Of course, this test is for a very small function that does very little. Your mileage may vary, but it may be worth trying the shared buffer approach, either manually or with sbffi, the next time you need to call into native code from Node.js.
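
If you want to check this against your own workload, a rough timing harness could look like the sketch below. The benchmark helper and jsBaseline are names made up for illustration, and micro-benchmarks like this are noisy, so treat the numbers as a rough guide only.

```javascript
// Time `iterations` calls of `fn(...args)` and return the average
// number of nanoseconds per call.
function benchmark(fn, args, iterations = 1e6) {
  // Warm up so the JIT has a chance to optimize before timing starts.
  for (let i = 0; i < 1000; i++) fn(...args);

  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn(...args);
  const elapsed = process.hrtime.bigint() - start;

  return Number(elapsed) / iterations;
}

// Pure-JS baseline with uint32_t-style wraparound. Swap in a native
// binding (for example, the `add` from add.js) to compare.
const jsBaseline = (a, b) => (a + b) >>> 0;
console.log(`js add: ${benchmark(jsBaseline, [23, 32]).toFixed(1)} ns/call`);
```

Running the same harness over a pure-JS version, a hand-written addon, and an FFI binding of the same function gives a like-for-like comparison of the call overhead.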