
If you aren't making a safe abstraction around your unsafe code, you're supposed to mark the surrounding function "unsafe" as well, and document its expected constraints in a 'Safety' comment block. In fact, you're supposed to do the same for any piece of code that cannot provide safety guarantees about its use of 'unsafe' features. 'Plain' unsafe{ } blocks should only be used as part of building a safe abstraction.
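
A minimal sketch of the two patterns (the function names here are made up for illustration):

    /// A safe abstraction: the `unsafe` block is justified by a local check.
    pub fn first_byte(bytes: &[u8]) -> Option<u8> {
        if bytes.is_empty() {
            None
        } else {
            // SAFETY: we just checked that `bytes` is non-empty, so index 0
            // is in bounds.
            Some(unsafe { *bytes.get_unchecked(0) })
        }
    }

    /// Not a safe abstraction: the constraint is pushed onto the caller, so
    /// the whole function is marked `unsafe` and documents the contract.
    ///
    /// # Safety
    ///
    /// `index` must be strictly less than `bytes.len()`.
    pub unsafe fn byte_at(bytes: &[u8], index: usize) -> u8 {
        // SAFETY: upheld by the caller per this function's Safety contract.
        unsafe { *bytes.get_unchecked(index) }
    }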


> If you aren't making a safe abstraction around your unsafe code, you're supposed to mark the surrounding function "unsafe" as well, and document its expected constraints in a 'Safety' comment block.

Sure, but now we're back to the issue that it's unclear what constraints `unsafe` code actually has to uphold, meaning that ensuring your abstraction is safe is just about impossible to do with certainty (and OS kernel code is definitely going to hit a lot of those corner cases). You may think you have a safe interface, only to find out later you're relying on internal details that aren't guaranteed to stay the same between compiler versions or during optimizations.

With that said, while I agree with what you're proposing about marking surrounding code "unsafe", it leads to lots of strange cases, and most people get it wrong or will even disagree with the proposal completely. For example, it can lead to cases where you mark a function `unsafe` even though it contains nothing but "safe" code (see the sketch below). And at that point, it's up to you to determine whether something is actually "safe" or "unsafe", meaning the markings of "safe" and "unsafe" become arbitrary choices based on what you think "unsafe" means, rather than marking code that actually performs one of the "unsafe" operations.
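
The standard library's Vec::set_len is the canonical case. A simplified sketch (MyVec is a made-up stand-in): the body is a plain field assignment, with no "unsafe" operation in sight, yet the function must be marked `unsafe`:

    pub struct MyVec<T> {
        ptr: *mut T,
        cap: usize,
        len: usize,
    }

    impl<T> MyVec<T> {
        /// No "unsafe" operation happens in this body -- it's just a field
        /// assignment. It's marked `unsafe` anyway, because a bogus `len`
        /// lets *safe* code index into uninitialized or out-of-bounds
        /// memory afterwards.
        ///
        /// # Safety
        ///
        /// `new_len` must be <= `self.cap`, and the first `new_len`
        /// elements must be initialized.
        pub unsafe fn set_len(&mut self, new_len: usize) {
            self.len = new_len;
        }
    }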

One of the best examples is pointer arithmetic. It's explicitly a safe operation (wrapping_add and friends are safe; it's dereferencing that is the "unsafe" operation), yet it's the first thing most people would identify as "unsafe". For example, you could easily write slice::from_raw_parts without using any `unsafe` code at all: it just puts a pointer and a length together into a structure (it doesn't even need to do arithmetic!). It's only marked `unsafe` because it can break other pieces of "safe" code that it doesn't use, but it itself is 100% safe. You could just as easily argue the "other code" should be the `unsafe` code, since it's what will actually break if you use it incorrectly.
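
To make that concrete, here's a sketch (RawParts is a made-up type mirroring the data a slice carries internally). Building the pair is entirely safe; the real std::slice::from_raw_parts is only `unsafe` because its &[T] return type promises the rest of the program that the pointer and length are valid:

    /// A pointer/length pair, structurally the same data a slice holds.
    pub struct RawParts<T> {
        pub ptr: *const T,
        pub len: usize,
    }

    /// 100% safe: nothing is dereferenced, no arithmetic is performed.
    pub fn make_raw_parts<T>(ptr: *const T, len: usize) -> RawParts<T> {
        RawParts { ptr, len }
    }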

Perhaps the biggest annoyance I have is that the official Rust documentation pretty much goes against the idea you presented, saying

> People are fallible, and mistakes will happen, but by requiring these four unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs.

Which is just incorrect - memory safety issues are likely to be due to the "safe" code surrounding your unsafe code. When you're writing C code, the bug isn't that you dereferenced NULL or an out-of-bounds pointer; the bug is in the code that gave you that pointer in the first place, and in Rust that code is likely to all be "safe". Point being, most of the advice on keeping your unsafe blocks small just leads to people making silly APIs that can be broken via the safe API they wrap (even in weird ways, like calling a method containing only safe code - see the sketch below), and unfortunately there are lots of subtle ways you can unintentionally break your Rust code that most people aren't going to have any idea about.
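
A minimal sketch of that failure mode (Buffer and its invariant are made up for illustration). The memory bug is introduced by a method containing only safe code, while the tiny unsafe block looks locally justified:

    pub struct Buffer {
        data: Vec<u8>,
        // Invariant relied on by `read_first`: `valid` implies `data` is
        // non-empty.
        valid: bool,
    }

    impl Buffer {
        pub fn new(data: Vec<u8>) -> Buffer {
            let valid = !data.is_empty();
            Buffer { data, valid }
        }

        /// Entirely safe code -- and the actual bug: it breaks the
        /// invariant by forgetting `self.valid = false;`.
        pub fn clear(&mut self) {
            self.data.clear();
        }

        pub fn read_first(&self) -> Option<u8> {
            if self.valid {
                // SAFETY (intended): `valid` is supposed to guarantee
                // `data` is non-empty. After `clear()` that reasoning is
                // wrong, and this is an out-of-bounds read caused by a
                // method containing no `unsafe` at all.
                Some(unsafe { *self.data.get_unchecked(0) })
            } else {
                None
            }
        }
    }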


> Sure, but now we're back to the issue that it's unclear what constraints `unsafe` code actually has to uphold

The Rust Nomicon actually documents these constraints for each 'unsafe' operation. Code that can ensure those constraints hold can be regarded as 'safe' and use a plain unsafe{ } block. Code that can't should be marked unsafe and document its own constraints in turn.

> You may think you have a safe interface, only to find out later you're relying on internal details

If you're relying on internal details, you're most likely doing it wrong, even by the standards of 'unsafe'. There are ways to nail down these details where required, at which point they're not even "internal" details anymore, but this has to be opted-into explicitly, for sensible reasons.
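
For example (a minimal sketch): struct layout under the default repr is exactly such an internal detail, and `#[repr(C)]` is the explicit opt-in that turns it into a guarantee unsafe code may rely on.

    // Default repr: field order and padding are unspecified, so any unsafe
    // code that assumes a particular layout is leaning on an internal
    // detail that may change between compiler versions.
    pub struct Loose {
        pub a: u8,
        pub b: u32,
    }

    // repr(C): the layout is nailed down explicitly, so it's no longer an
    // "internal" detail and unsafe code may legitimately depend on it.
    #[repr(C)]
    pub struct Nailed {
        pub a: u8,
        pub b: u32,
    }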

> It's only marked `unsafe` because it can break other pieces of "safe" code that it doesn't use, but it itself is 100% safe. You could just as easily argue the "other code" should be the `unsafe` code, since it's what will actually break if you use it incorrectly.

No, "other code" should not be marked unsafe because this would mean that relying on slices is inherently unsafe. Which is silly; the whole point of a "slice" type, contrasted with its "raw parts", is in the guarantees it provides. This is why slice::from_raw_parts is unsafe: not because if what it does, but what it states about the result.

> Which is just incorrect - memory safety issues are likely to be due to the "safe" code surrounding your unsafe code

This is a matter of how you assign "blame" for a memory safety error. It's definitely true, however, that the memory-unsafe operations play a key role, and I assume that's the point these docs are making. I agree that people shouldn't be creating faulty abstractions around "unsafe" blocks, but the reason we can even identify this as an issue is that we have "unsafe" blocks in the first place!


> The Rust Nomicon actually documents these constraints for each 'unsafe' operation. Code that can ensure those constraints hold can be regarded as 'safe' and use a plain unsafe{ } block. Code that can't should be marked unsafe and document its own constraints in turn.

> If you're relying on internal details, you're most likely doing it wrong, even by the standards of 'unsafe'. There are ways to nail down these details where required, at which point they're not even "internal" details anymore, but this has to be opted-into explicitly, for sensible reasons.

It's not that simple. What I'm getting at is that 'unsafe' code is still required to meet all the constraints that 'safe' code has to meet, even though those constraints are not fully documented (because in 'safe' code lots of things are simply impossible to express, so they were never explicitly called out as UB), and they have been changed in significant ways in the past. You can see an incomplete list of them here[0]. There has been work to nail these details down for years, the most recent being this effort[1], but in general I would argue there hasn't been much progress since I first looked into it years ago. Fun fact: the first Rust issue I remember reading that touched on this (and evolved into basically everything going on today) is this[4] one, which was filed exactly four years ago today (and I mean exactly!).

The 'Safety' sections for 'unsafe' operations are actually just extra requirements for those particular APIs that you have to uphold in addition to whatever safe-Rust constraints apply. And I would argue that in general they're not actually exhaustive; they're mostly just a "best guess" - it's not a breaking change to introduce new constraints or to realize there were more all along. For example, a little more than a year ago they added the note that a slice can't be larger than `isize::MAX` bytes[2], and a few months ago there was this[3] one, where they added that the pointed-to range must come from a single allocation.
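
To make the moving target concrete, here's a sketch of a checked wrapper over the requirements as documented today. Only some of them are checkable at runtime; validity for reads and the single-allocation rule from [3] cannot be verified here, which is part of the problem:

    use std::mem;
    use std::slice;

    /// Build a slice from raw parts after checking the requirements that
    /// *can* be verified at runtime. Returns None if a checkable
    /// requirement fails. Still `unsafe`, because the rest stay with the
    /// caller.
    ///
    /// # Safety
    ///
    /// `ptr` must be valid for reads of `len` elements, must not be
    /// mutated for the duration of `'a`, and the whole range must lie
    /// within a single allocation -- none of which can be checked here.
    pub unsafe fn from_raw_parts_checked<'a, T>(
        ptr: *const T,
        len: usize,
    ) -> Option<&'a [T]> {
        // Null and misaligned pointers are checkable.
        if ptr.is_null() || (ptr as usize) % mem::align_of::<T>() != 0 {
            return None;
        }
        // So is the isize::MAX size requirement added in [2].
        if len.checked_mul(mem::size_of::<T>())? > isize::MAX as usize {
            return None;
        }
        // SAFETY: everything that couldn't be checked above is deferred
        // to the caller per this function's contract.
        Some(unsafe { slice::from_raw_parts(ptr, len) })
    }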

My point is that when you're doing something like writing an OS kernel, you end up running into these types of issues and corner cases, because they determine, for example, what your allocator is allowed to produce, or whether some particular pointer can validly be put into a slice.

> No, "other code" should not be marked unsafe because this would mean that relying on slices is inherently unsafe. Which is silly; the whole point of a "slice" type, contrasted with its "raw parts", is in the guarantees it provides. This is why slice::from_raw_parts is unsafe: not because if what it does, but what it states about the result.

Sure, my point is that this is at best unclear, and I would argue widely misunderstood. We've basically made two definitions of 'unsafe': one for code that has to perform one of the five 'unsafe' operations (which is a clearly defined yes or no, and also the one described by the Rust docs), and one for code that can potentially break some invariant that will potentially cause UB somewhere else in the program (even if the code in question is completely safe and can't on its own cause any problems).

The problem is that Rust only guarantees the type of safety that it does if you use the second, yet lots of people (I would almost argue most...) assume the first is sufficient and that 'unsafe' by itself ensures all your memory bugs are in `unsafe` code. The Rust docs even describe safe vs. unsafe as two completely different languages (which they are), which raises the question of why you would write `unsafe` code when you don't even need any of the `unsafe` operations. You arguably want a separate keyword for this: one that marks a function `unsafe` but doesn't actually allow you to perform any of the `unsafe` operations in it (see the sketch below).
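
For what it's worth, the `unsafe_op_in_unsafe_fn` lint from RFC 2585 gets close to this separation without a new keyword. A sketch of how it behaves:

    #![deny(unsafe_op_in_unsafe_fn)]

    /// Marked `unsafe` purely as a contract with the caller.
    ///
    /// # Safety
    ///
    /// `index` must be less than `data.len()`.
    pub unsafe fn byte_at(data: &[u8], index: usize) -> u8 {
        // With the lint enabled, being inside an `unsafe fn` does NOT
        // grant permission to perform unsafe operations; each one still
        // needs its own explicit block. Removing the `unsafe { }` below
        // would be a compile error.
        unsafe { *data.get_unchecked(index) }
    }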

[0] https://doc.rust-lang.org/nomicon/what-unsafe-does.html

[1] https://github.com/rust-lang/unsafe-code-guidelines

[2] https://github.com/rust-lang/rust/commit/1975b8d21b21d5ede54...

[3] https://github.com/rust-lang/rust/commit/1a254e4f434e0cbb9ff...

[4] https://github.com/rust-lang/rfcs/issues/1447


> it's not a breaking change to introduce new constraints or to realize there were more all along.

These breaking changes are allowed precisely because they're meant to address soundness concerns. Even if at any given time they're just a "best guess" of what's actually needed, that's still wildly better than what e.g. C/C++ do, which is to not guess at all. Even wrt. formal memory models, perhaps the most complex issue among the ones you mention here, C/C++ only got an attempt at a workable memory model with the 2011 standards (C11 and C++11). It's not unreasonable to expect that Rust may ultimately do better than that.


I strongly disagree with that assessment. With regards to C, you're comparing apples to oranges IMO, because C is very lax compared to what Rust enforces now and could enforce in the future, just by virtue of having so many fewer features. On top of that, if you commit to being `gcc`- (and `clang`-) specific, you get a very large number of guarantees and a lot of flexibility, even more depending on which extra feature flags you pass.

> It's not unreasonable to expect that Rust may ultimately do better than that.

But when? Rust is almost 10 years old, and questions about this stuff have been posed for years now; I was having these same conversations over two years ago. Last year's roadmap included working on the 'unsafe guidelines' I posted, and as an outsider looking in, it's unclear to me how much progress has actually been made. Don't get me wrong, they've made a fair amount of progress, but there's still a lot to be done.

My big concern (which I don't consider unfounded) is that because there aren't that many big users of Rust actually developing really low-level stuff, the work on defining all this isn't getting done. To me it feels like a self-fulfilling prophecy: a project like the Linux kernel isn't going to end up using Rust if there are big issues connecting what they do now to Rust idioms (ignoring the LLVM thing...), even though when Rust was first maturing it was billed as the replacement for C (and still is).


Well, it took 40 years for C to get any kind of memory model, so Rust still has some time available.


That's a bit hand-wavy. The Linux kernel has been doing atomics in C for, I believe, over 20 years now, well before C11 came out, and even after it came out they don't use C11 atomics. `gcc` gives them enough guarantees on its own to implement correct atomics and define their own memory model, in some ways better than the C11 one (and in some ways worse, but mostly just from an API standpoint).

With that said, my concerns are more with the state of `unsafe` overall; the memory model is only one part of that (though it somewhat spurred the conversations about the other issues).


The Linux kernel is just one kernel among many.

I bet that the pre-C11 memory semantics on Linux kernel aren't the same as e.g. on HP-UX kernel, across all supported hardware platforms.

It is also ironic that Java and .NET did it before C and C++, with their models being adopted as a starting point.


> The Linux kernel is just one kernel among many.

> I bet that the pre-C11 memory semantics on Linux kernel aren't the same as e.g. on HP-UX kernel, across all supported hardware platforms.

Yeah, but if they both work, why does it matter? There exists more than one valid memory model. The point I was making is that with C, it has been possible to define valid semantics well before C11, because the compilers and the language already give enough guarantees.

To that point, while I focused on atomics (since atomics and threading are largely what C11 added), the bulk of the memory model that made that possible was defined well before then. Strict aliasing was a thing in C89 (though I'm not sure compilers enforced it at that point) and is probably the only real notable "gotcha", and `gcc` and `clang` let you turn it off outright (and if you're not doing weird kernel stuff, it's generally easy to obey).

`gcc` and `clang` also have lots of documentation on the extra guarantees they give, such as very well documented inline assembly, type-punning via `union`, lots of attributes for giving the compiler extra information, etc.

Compared to what Rust offers right now, there is no contest. With Rust, the aliasing model (which `unsafe` code is not allowed to break) is still unknown, and a lot of the nitty-gritty details, like the stuff I listed for `from_raw_parts`, are simply leaky implementation details inherited unintentionally from LLVM (and effectively from C - C is where restrictions like "a pointer may not go more than one past the end of an allocated block" come from, along with a host of other things).


While I do agree that Rust still needs to improve in this area, what happens when you port pre-C11 code from gcc on Linux (x86) to aC on HP-UX (Itanium) and xlc on AIX (PowerPC), to keep up with my example?


Well, I would clarify: the memory models I was talking about apply only to code written as part of those kernels; userspace code does not "inherit" them. And I don't think portability of kernel code is much of a concern; there, the memory model is only one of a large variety of problems.

That said, for userspace, pthreads already gives enough ordering guarantees via mutexes that there aren't really any problems unless you're trying to do atomics in userspace, which before C11 would have been hard to do portably. And the rest of the things I was talking about, like the aliasing model, are defined as part of the C standard (C89 and C99), so they're the same regardless of the compiler (ignoring possible bugs).



