Segfaults in scripting languages are remarkably common, especially if arbitrary bytecode can be loaded into the VM.
One I ran into in the wild recently is that in older versions of Lua, exceptions in GC finalizers (the `__gc` metamethod) can trigger a segfault. In those same versions of Lua, the bytecode format is notoriously dangerous to load.
I wonder whether hardening against this will become a large component of newer scripting language implementations. Do these safety issues warrant the use of memory-safe languages like Rust, or of existing sandboxed VM implementations like WebAssembly?
> In those same versions of Lua, the bytecode format is notoriously dangerous to load.
Could you expand on this? Are the dangers such that they could be avoided with a bytecode verifier somewhat like Java's? Things like checking that the stack can never underflow, and that at the merge points of branches the stack always has the same depth.
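For a sense of what such a verifier does, here is a toy sketch (the opcode set is made up for illustration; this is nothing like Java's actual verifier): track the stack depth along every path, and reject code that can underflow or that reaches a merge point at two different depths.

```python
# Net stack effect of each toy opcode.
EFFECT = {"push": 1, "pop": -1, "add": -1, "jmp": 0}

def verify(code):
    """code: list of (op, jump_target_or_None). Returns True iff no path
    underflows the stack and every merge point is entered at one depth."""
    depths = {0: 0}                      # depth on entry to each offset
    for i, (op, target) in enumerate(code):
        d = depths.get(i)
        if d is None:
            continue                     # unreachable in this toy model
        nd = d + EFFECT[op]
        if nd < 0:
            return False                 # stack underflow
        nxt = target if op == "jmp" else i + 1
        if depths.setdefault(nxt, nd) != nd:
            return False                 # inconsistent depth at merge point
    return True

print(verify([("push", None), ("push", None), ("add", None)]))  # True
print(verify([("pop", None)]))                                  # False
```

Real verifiers additionally track operand types, but depth consistency alone already rules out a whole class of interpreter crashes.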
Can confirm. I did some fuzz-testing of random scripting languages a while back, and reported bugs in things like GNU Awk.
Here's a trivial example:
    /usr/bin/gawk 'for (i = ) in steve kemp rocks'
I found fuzz-testing like this very useful when writing my own BASIC interpreter, and when playing with scripting languages generally. It's almost magical how quickly problems are found!
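The crudest form of this needs almost no tooling. A sketch of a one-shot fuzz probe (the interpreter under test is whatever binary you point it at; `cat` is just a harmless stand-in here):

```python
import random
import string
import subprocess

def fuzz_once(target):
    """Feed one blob of random printable junk to `target` on stdin.
    A negative return code means the child died on a signal
    (e.g. -11 for SIGSEGV), which is what we're hunting for."""
    junk = "".join(random.choices(string.printable, k=random.randint(1, 200)))
    proc = subprocess.run([target], input=junk, text=True,
                          capture_output=True, timeout=5)
    if proc.returncode < 0:
        print("crash! input was:", repr(junk))
    return proc.returncode

# "cat" stands in for the interpreter under test; it survives anything.
print(fuzz_once("cat"))
```

Wrap that in a loop and you have the kind of dumb fuzzer that shook bugs out of gawk; modern coverage-guided fuzzers (AFL, libFuzzer) find far more, far faster.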
It has always been the position of the CPython developers that using Python for sandboxing is unsupported. With that in mind, it doesn't really matter if you can "exploit" Python with weird bytecode, because you are supposed to be on the other side of the airtight hatchway[0] anyway.
I don't know what the stance of other Python runtimes is, but you should probably just use a sandbox at the OS level, which is likely to be tested far more thoroughly.
It's worth mentioning the interesting failure of pysandbox:
> I now think that putting a sandbox directly in Python cannot be secure. To build a secure sandbox, the whole Python process must be put in an external sandbox.
CPython bytecode will segfault Python if it’s slightly incorrect. And that’s fine, it’s not a security risk and it’s not worth the performance overhead of validating wonky bytecode.
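It's easy to see how unvalidated that path is: CPython will happily build a code object from arbitrary bytes. A sketch (this only constructs the function; actually calling it with garbage opcodes is what can crash the interpreter, so we deliberately never do):

```python
import types

def f():
    return 1

# Swap in garbage bytecode. CPython performs no verification here.
# Executing g() could segfault the interpreter, so we only build it.
bogus = f.__code__.replace(co_code=b"\x00" * 8)
g = types.FunctionType(bogus, {})
print(isinstance(g, types.FunctionType))  # True
```

`CodeType.replace` exists on Python 3.8+; the docs themselves warn that handcrafted code objects can crash the interpreter.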
I think people have, but it's not clear that searching for bugs by perverse input is a great use of time. Maybe better to focus on solving currently known issues.
Node dev here: none of the PRs here deal with segfaults from JS code, because that doesn't really happen (at least, I've never seen it happen). You might have more success searching V8's bug tracker.
Seg faults are "safe" in the programming language sense:
> A program fragment is safe if it does not cause untrapped errors to occur. Languages where all program fragments are safe are called safe languages. Therefore, safe languages rule out the most insidious form of execution errors: the ones that may go unnoticed.
> ...
> It is useful to distinguish between two kinds of execution errors: the ones that cause the computation to stop immediately, and the ones that go unnoticed (for a while) and later cause arbitrary behavior. The former are called trapped errors, whereas the latter are untrapped errors.
The problem is that segfaults are frequently the latter type of problem. You do some unsafe thing and corrupt your state, but keep on going without obvious issue. Then at some point in the future, your corrupted state causes an otherwise bug-free portion of your program to segfault.
The problem isn't the segfault -- it's the earlier unsafe behavior. That is, the untrapped error that eventually caused the segfault.
If the language only has trapped errors, which are indicated by seg faults, it's a safe language.
Every language has such errors. What does divide by zero do?
What does blowing the stack / infinite recursion do in Rust? It seg faults.
The seg fault is the safe behavior. If the stack overflow overwrote heap data structures or global data structures and the program kept running, that would be unsafe.
Returns a value, of course! (And that said, JavaScript does have some trapped errors, such as (1/0).foo.foo (and yes you need the second .foo…))
IMO, the execution "error" here (in this thread) is accessing memory illegally. Sometimes the runtime traps it, but sometimes it does not, and "sometimes" isn't always, so it's effectively untrapped: we cannot depend on the trap. (Especially in adversarial circumstances.)
And further, the original quote talks about languages — the behavior of a language like C is that memory access is not necessarily trapped; the behavior is not well defined. Given the lack of a requirement in the C language for a trap, I think it is fair to call C "unsafe" given the above definition of safe/unsafe.
> What does blowing the stack / infinite recursion do in Rust? It seg faults.
Somewhat interestingly, it detects it and SIGABRTs, which technically isn't a segfault. And that's now some black magic that I'm curious about as I really thought it would have segfaulted.
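For comparison, CPython sidesteps the whole question by counting frames, turning runaway recursion into a trapped error long before the machine stack is in danger:

```python
import sys

def f():
    return f()

try:
    f()
except RecursionError:
    # CPython raises once the frame count hits sys.getrecursionlimit()
    # (1000 by default), well before the real stack would overflow.
    print("trapped")
```

The trade-off is that the limit is conservative: legitimate deep recursion needs `sys.setrecursionlimit`, at which point you can blow the real stack after all.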
> I think it is fair to call C "unsafe" given the above definition of safe/unsafe.
I don't think anyone would disagree with this. What they're saying is that a segfault is safe, because a segfault is essentially your OS's version of an out-of-bounds error. One of the reasons C is not safe is that an out-of-bounds access will not necessarily cause a segfault.
> such as (1/0).foo.foo (and yes you need the second .foo…)
I don't understand the nuance here. In my Firefox developer tools, I can do the following:
> (1/0).foo.foo
---> TypeError: (intermediate value).foo is undefined
> (1/0).foo
---> undefined
> (1/1).foo.foo
---> TypeError: 1.foo is undefined
> Infinity.foo.foo
---> TypeError: Infinity.foo is undefined
> undefined.foo
---> TypeError: undefined has no properties
While these are all different errors, I don't really understand why the '(1/0).foo.foo' case is an example of a _trapped_ error, but the others are not.
In Python, Java, OCaml, and most other languages, 1 / 0 raises an immediate error (and in C it's outright undefined behavior). That is, division by zero is a trapped error.
In JavaScript, it's not. It keeps going and lets you do stuff like access nonexistent properties .foo on the result, which are also untrapped errors.
So JavaScript is unsafe in Cardelli's terminology. It gives you untrapped errors rather than trapped ones. The program keeps chugging along until you find out later and have to trace backwards to the bug.
This is false. Integer division by zero is undefined (in C, at least), but floating-point division is perfectly well defined. Many languages have different operators to distinguish floating-point and integer division, like Python's / for true division and // for floor division.
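Python itself is a useful data point here: it traps a zero divisor even for floats, but once an infinity exists, IEEE arithmetic just keeps computing and errors propagate untrapped:

```python
inf = float("inf")
print(1.0 / inf)   # 0.0: IEEE arithmetic keeps computing with infinities
print(inf - inf)   # nan: still no trap; the error just propagates
try:
    1.0 / 0.0
except ZeroDivisionError:
    print("trapped")  # CPython checks the divisor, even for floats
```

So whether float division by zero is "fine" is a language choice: IEEE 754's default is to produce an infinity, and CPython opts out of that default for the divisor-is-zero case.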
I’m sort of surprised by that since my memory is that infinity isn’t a real number, rational, etc. That is, does infinity in Haskell obey some algebraic laws?
Yeah it's an interesting case. It appears that Inf is in floating point to AVOID a trapped error.
This answer has an interesting way of looking at it. If you go on the theory that floating-point numbers are supposed to represent reals, then in floating point you can't tell if a value is actually zero or just indistinguishably close to zero.
In the case of "indistinguishably close to zero", you're getting the wrong answer, and the program doesn't halt. It keeps on chugging doing bad math. So that's an untrapped error, and it's UNSAFE by Cardelli's definition.
Floats aren’t reals, at best they are an approximation for certain calculations. +0 and -0 are defined and distinct floating point numbers. Floating point math is known to be problematic and does not evenly distribute numbers on the real line either. There are numerical methods used for reducing error on floating point operations when it is acceptable. Otherwise fixed point or intervals may be used.
I think they're trying to say that an out of bounds memory access within the emulated WebAssembly machine can be caught by the WebAssembly runtime. (I don't know whether this is true; I hardly know anything about WebAssembly.)
The way they said it, though, makes it sound like WebAssembly is implemented with full process sandboxing or something, which is patently false. It works that way in neither Chrome nor Firefox, and there are no other browsers right now.
WebAssembly follows the unfortunate C paradigm that no checks are done at runtime, only those that the programmer requests explicitly (and only if no undefined behavior is involved), to improve speed.
The WASM sandbox can call only a specified set of host functions, but I expect so much functionality to snowball inside sandboxes that we'll have to allow everything, including the unsafe ones, anyway.
WebAssembly uses 32 bit indexes to access memory. The default way of implementing memory safety for it is to put a few gigabytes of dead address space before and after the memory a WASM program uses. This makes it impossible for the memory access instructions to escape, despite having no runtime checks.
So it can segfault all the way up to the machine level. But more importantly, it's safe for it to 'segfault' out of its allocated memory, because it can't reach any other memory with 32 bit numbers.
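A sketch of the reservation trick (assumptions: a 64-bit Unix with lazy page allocation; real runtimes map the guard regions PROT_NONE so any stray access faults instantly, but Python's mmap module doesn't expose PROT_NONE, so read-only stands in here):

```python
import mmap

# Reserve 4 GiB of address space in one go, the way a wasm runtime might
# back a 32-bit linear memory. Pages are only faulted in on first use,
# so the reservation itself is cheap.
FOUR_GIB = 1 << 32
region = mmap.mmap(-1, FOUR_GIB, flags=mmap.MAP_PRIVATE,
                   prot=mmap.PROT_READ)
# Any 32-bit index i satisfies 0 <= i < FOUR_GIB, so base + i can never
# address memory outside this reservation: no per-access bounds check.
print(len(region) == FOUR_GIB)
region.close()
```

The guard pages on either side exist to catch accesses like `base + i + offset` where the static offset pushes slightly past the 4 GiB line.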
This is nothing new, memory space of a process in all modern OSes is protected so that segfaults are safe too. Combined with original UNIX idea of small tools that communicate by pipes it was fine.
But now that processes are behemoths with gigabytes of dynamically linked libraries that are too hard to secure or to restrict system access for, we just enable everything. This will happen to wasm. I'm sure there are WASM blobs configured with unfettered access to the DOM in the wild already.
> This is nothing new, memory space of a process in all modern OSes is protected so that segfaults are safe too.
Right, but look at what I'm replying to.
"The way they said it, though, makes it sound like WebAssembly is implemented with full process sandboxing or something, which is patently false."
Unlike with JavaScript, the program hosting a WASM script is immune to corrupt pointers inside the script. That's equivalent to OS-level isolation, which is pretty good!
> But now that processes are behemoths with gigabytes of dynamically linked libraries that are too hard to secure or to restrict system access for, we just enable everything. This will happen to wasm. I'm sure there are WASM blobs configured with unfettered access to the DOM in the wild already.
There will inevitably be bugs in the code handling the html/css/dom/rendering. But WASM greatly reduces the attack surface of the scripting VM.
The types of bugs most likely to still exist with WASM are the types where isolating it in a separate process that communicates by pipes wouldn't help.
    import sys, threading, time

    t = threading.Thread(target=sys.stdin.read, args=(1,))
    t.start()
    time.sleep(1)
    sys.stdin.close()
Run it, then after a few seconds press enter. It doesn't segfault in Python 3, but it still doesn't behave how I'd like: I would like the close() to unblock the read(), but it doesn't, and the read() still hangs until it gets some input.
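For contrast, a pipe does give the wished-for behavior: closing the write end unblocks a blocked read, which sees EOF. (A sketch of those semantics; it doesn't make stdin itself behave this way.)

```python
import os
import threading

r, w = os.pipe()
result = []
t = threading.Thread(target=lambda: result.append(os.read(r, 1)))
t.start()
os.close(w)       # last writer gone: the blocked read returns EOF (b"")
t.join(timeout=5)
print(result)     # [b'']
os.close(r)
```

The difference is that stdin's file object buffers in user space and `close()` doesn't interrupt a read already blocked in the OS, whereas closing a pipe's write end is visible to the kernel-level read itself.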
The whole threading library in Python is a mess. Python was designed around single threaded programs with shared-nothing state and the cracks show as you move beyond that. The whole idea of replacing the GIL with... multiple same-process distinct-state Python interpreters with cheap-ish message passing sort of highlights how ugly it gets.
Back around Python 1.5, there was almost a fork of Python where every object had locks, there were memory arenas, and multiprocessing was almost thoughtlessly easy. That and Stackless would've been great.
More recently than that, too, unless I'm misremembering. As sibling commenter points out, such a model is horrifically slow. Lock-for-every-object is too granular to perform well.
Not necessarily, but I just found it interesting that it wasn't really anything all that crazy - it could in theory be possible to encounter this in a big enough code base that makes heavy use of coroutines and custom exception types.
It is crazy to have a class constructor return something that isn't an instance of the class. That's nonsense code and is unlikely to occur in a codebase of any size, regardless of how often they define custom exception types.
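For reference, this is the kind of nonsense in question: overriding `__new__` lets a "constructor" hand back something that isn't an instance at all (a minimal sketch using a plain string):

```python
class Weird(Exception):
    def __new__(cls, *args):
        # Returning a non-instance means __init__ is skipped entirely,
        # and callers of Weird(...) just get back this string.
        return "not an exception"

obj = Weird("boom")
print(type(obj).__name__)  # str
```

Legitimate Python, but exactly the sort of thing no real codebase writes, and the sort of corner where interpreter bugs hide.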
If I used my Snap-On(tm) wrench as a prybar (incorrect usage) and broke it, Snap-On would still replace it, in exchange for the broken tool and knowledge of the situation that broke it.
To pretend that a language bug isn't worth reporting because you and your codebases will never encounter it seems short-sighted. Down the line, years from now, who knows what you'll have to do to get something to work. Maybe it'll be something this silly, and you'll be happy that the folks before you encountered it and remediated it.
All that said, from a practical standpoint I agree with you.
If you're doing something wacky, and it turns out as wacky as you thought it would, you're probably attacking the problem from the wrong angle, anyway. I just want to remind everyone that 'wacky' things are required and implemented daily in codebases around the world -- regardless of how bad they smell.
Even some "safe" python libraries can be segfaulted using similar techniques. See for example https://github.com/pydata/numexpr/issues/323
I think this particular bug is even marked 'wont fix', but I cannot find the bpo issue at the moment.
At a guess, you want something with dependent types?
Like Idris, or Haskell. You already have a Haskell example. This [0] release of Idris fixed a segfault when concatenating strings.
Maybe you meant a language that is proven from the ground up. Like CakeML. You can find a segfault example here [1].
Maybe you meant a language with an algebraic type system like Ada. You can find a segfault example here [2].
Maybe you meant something like Dotty (Research for the next version of Scala). You can find a segfault example here [3].
In short: You'll need to describe what you believe to be a "proper" type system, and name the languages you think fit that description, or no one can have a conversation with you.
You can trivially prove bottom in Haskell. Something like Agda or Idris would indeed fit the bill better.
Anyway, I said that this specific issue (incorrect code causing the implementation to crash) would not occur in a language with dependent types. Not that it is impossible to have a buggy compiler that in certain cases produces segfaults.