
This seems like an overly academic exercise. Can the compiler, or even the operating system, guarantee that the threads `a`, `b`, and `c` are started in that order? I don't think so. The OS might start executing thread `a` on one CPU core and then be interrupted by a high-priority interrupt before it can do anything useful. By that time, threads `b` and `c` might already be running on other cores and have finished executing before thread `a`.


Sure, that’s an entirely valid execution. But this is about what exact pairs of values (x, y) are observable by each thread. Some are allowed and others are not, by the semantics of atomic loads and stores guaranteed by the CPU. The starting or joining order of the threads doesn’t matter, except insofar as thread start and join both synchronize-with the parent thread.

In general, in the presence of hardware parallelism (i.e. essentially always since 2007 or so) the very real corner cases are much more involved than "what if there’s an interrupt", and thinking in terms of interleaved single-core execution is not very fruitful in the presence of memory orderings less strict than seq_cst. It’s not about what order things can happen in (because there isn’t an order), it’s principally about how writes are propagated from the cache of one core to that of another.

x86 processors have sort of lulled many programmers of concurrent code into a false sense of safety because almost everything is either unordered or sequentially consistent. But the other now-common architectures aren’t as forgiving.


Thanks! So now the article actually makes sense to me. It would be nice to have this important clarification in the article itself. I'm not saying that a careful reader can't infer this point from the article even now, but I'm not such a careful reader.

Edit: Since the parent commenter added two more paragraphs after I posted my answer: I wasn't wondering about the pitfalls of sequentially consistent multi-threaded execution on various CPU architectures. It is a well-known fact that x86 adheres to a stronger Total Store Order (TSO) model, whereas POWER and ARM have weaker memory models and actually require memory barriers at the instruction level, not just barriers against compiler reordering.
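For anyone following along, this is the distinction in code (my own sketch): a seq_cst fence compiles down to a real instruction-level barrier (mfence on x86, sync on POWER, dmb ish on AArch64), whereas a compiler barrier alone would only stop the compiler from reordering.

```cpp
#include <atomic>

std::atomic<int> x{0}, y{0};

// Even under x86 TSO, a store followed by a load of a *different*
// variable can be reordered by the store buffer. The seq_cst fence
// below emits an actual hardware barrier between them, not just a
// compiler-reordering barrier.
int writer_then_reader() {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return y.load(std::memory_order_relaxed);
}
```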


It doesn't matter for this article whether there exist possible executions other than the one the author inquires about.

The point of weak memory models is to formally define the set of all possible legal executions of a concurrent program. This gets very complicated in a hurry because of the need to accommodate 1) hardware properties such as cache coherence and 2) desired compiler optimization opportunities that will want to reason over what's possible / what's guaranteed.

In this case, there was a conflict between a behavior of Power processors and an older C++ standard that meant that a correct compiler would have to introduce extra synchronization to prevent the forbidden behavior, thus impacting performance. The solution was to weaken the memory model of the standard in a very small way.

The article walks us through how exactly the newer standard permits the funny unintuitive execution of the example program.

The exercise is academic, sure. A lot of hard academic research has gone into this field to get it right. But it has to be that precise because the problems it solves are that subtle.


Yes, see: https://news.ycombinator.com/item?id=45091610

Originally, I was commenting that the purpose of the article was initially unclear to me, since the order of thread execution cannot be determined anyway.

I now understand that there was a corner case in the POWER and ARM architectures when mixing seq-cst and acquire-release operations on the same atomic variable. Thus, C++26 will be updated to allow more relaxed behavior in order to maintain performance.

https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p06...



