The data-race-free memory model came out of an observation in the early '90s: a correctly synchronized program with no data races will, even on a multiprocessor with a weak memory model, behave indistinguishably from one running under full sequential consistency. This was adapted into the Java 5 memory model, with the happens-before relation becoming the definition of correctly synchronized. C++11 then explicitly borrowed that model and extended it to include weaker atomics, and pretty much everybody else borrows directly or indirectly from that C++ memory model. However, C++ had to go back and patch the definition because the original one didn't work, and it took C++ standardizing a model to get the academic community to a state where we could finally formalize weak memory models.
In Java, happens-before is essentially the union of two relations: program order (i.e., the order imposed within a single thread by the imperative programming model) and synchronizes-with (i.e., the cross-thread synchronization constructs). C++ started out doing the same, and this is why it broke: in the presence of weak atomics, you can construct a chain of atomic accesses and program-order edges across multiple threads which suggests that two operations are related by happens-before when the hardware memory model doesn't actually guarantee that ordering. To describe the necessary relations, you need to add several more kinds of dependencies, and I'm not off-hand sure which dependencies ended up with which labels.
Note that, for a user, all of this stuff generally doesn't matter. You can continue to think of happens-before as basic program order unioned with a cross-thread synchronizes-with and your code will all work; you just end up with a more conservative (fewer things allowed) version of synchronizes-with. The basic motto I use is: to have a value written on thread A and read on thread B, A needs to write the value and then do a release-store on some atomic, and B needs to do a load-acquire on the same atomic and see A's store. Only then can B read the value.
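For what it's worth, here's a minimal sketch of that motto in C++11. The names (`run_message_passing`, `payload`, `ready`) are made up for illustration, and the spin loop is just the simplest way to wait until B actually observes A's store; real code would more likely use a condition variable or a queue.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration only: hands one int from thread A
// to thread B using a release-store / load-acquire pair on one atomic.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // the atomic both threads agree on
    int seen = -1;

    std::thread a([&] {
        payload = 42;                                  // 1. write the value
        ready.store(true, std::memory_order_release);  // 2. release-store
    });

    std::thread b([&] {
        // 3. load-acquire on the same atomic, repeated until A's store is seen
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // 4. only now is reading payload free of a data race
        seen = payload;
        assert(seen == 42);
    });

    a.join();
    b.join();
    return seen;  // safe to read here: join() orders B's write before this read
}
```

If both memory orders were changed to `memory_order_relaxed`, the flag itself would still be race-free, but nothing would order the write to `payload` before B's read of it, so the program would have a data race.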
> To describe the necessary relations, you need to add several more kinds of dependencies, and I'm not off-hand sure which dependencies ended up with which labels.
No, that's not the thing I'm talking about. Those are the different ordering modes you can specify on atomic operations.
Rather, there's a panoply of definitions like "inter-thread happens-before" and "synchronizes-with" and "happens-before", and those are the ones I don't follow closely. It gets even more confusing when you're reading academic papers on weak memory models.