I have an even simpler idea. We can have some kind of protocol header that specifies the size and shape of the structures, plus the signatures of the functions that can be called through a binary-level interface. Then we can just provide the implementation as machine code, with the header mapping each callable function to its code block. To "return" values, the callables can just push their results onto a "stack" at a known location and return control to the caller.
The bonus is this should be really fast and it doesn't even require any codec or message copying.
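Here is a rough sketch of what I mean, just to make the shape of the idea concrete. Everything in it is made up for illustration: the record layout in `ENTRY`, the names `build_header`, `parse_header`, `call`, `STACK` and `CODE_BLOCKS`, and the two example functions. Plain Python functions stand in for the machine-code blocks, so this only models the dispatch, not real native code.

```python
import struct

# Hypothetical header record, one per callable:
# name (8 bytes, NUL-padded), slot index (uint16), argument count (uint16).
ENTRY = struct.Struct("<8sHH")


def build_header(entries):
    """Pack (name, slot, argc) tuples into a binary header."""
    return b"".join(
        ENTRY.pack(name.encode(), slot, argc) for name, slot, argc in entries
    )


def parse_header(blob):
    """Read the packed header back into {name: (slot, argc)}."""
    table = {}
    for offset in range(0, len(blob), ENTRY.size):
        raw_name, slot, argc = ENTRY.unpack_from(blob, offset)
        table[raw_name.rstrip(b"\0").decode()] = (slot, argc)
    return table


# The "stack" at a known location, shared by caller and callee.
STACK = []


# Stand-ins for the machine-code blocks: each pushes its result onto
# STACK and then returns control without returning a value.
def _add(a, b):
    STACK.append(a + b)


def _neg(a):
    STACK.append(-a)


# The implementation: slot index -> code block.
CODE_BLOCKS = [_add, _neg]


def call(table, name, *args):
    """Look the callable up in the header, run it, pop its result."""
    slot, argc = table[name]
    if len(args) != argc:
        raise TypeError(f"{name} expects {argc} argument(s)")
    CODE_BLOCKS[slot](*args)
    return STACK.pop()


header = build_header([("add", 0, 2), ("neg", 1, 1)])
table = parse_header(header)
print(call(table, "add", 2, 3))
print(call(table, "neg", 4))
```

The caller only ever needs the header to find and invoke a function, and nothing gets serialized along the way: arguments and results are handed over directly.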