The main problem with fibers/goroutines and FFI stems from one of their benefits: each fiber starts with a very small stack (usually just a few KB), unlike native threads, which usually start with a much larger one (typically measured in MB). The catch is that the code must be prepared to grow the stack when necessary, which is not compatible with C code called through FFI, since C assumes a fixed stack with enough headroom. That's one of the reasons why Go's FFI to C, for example, is slower than Rust's.
Sure, if you are using split stacks, goroutine code is inherently slower. But for FFI you would switch to the thread's main stack, which is contiguous, so you won't pay any split-stack cost there.
Go abandoned split stacks years ago due to the "hot split problem", and now uses contiguous stacks that are grown when necessary via stack copying. Go switches to the system stack when calling C code. That switch adds some overhead (a few tens of ns) compared to languages like Rust or Zig, which don't need to switch stacks.