It's possible to provide stronger guarantees than the spec requires.

It wouldn't surprise me if server-focused ARM chips ended up providing x86-style ordering to ensure compatibility with ported software.



It would very much surprise me, since the weaker ordering requirements are a significant part of how ARM achieves lower power consumption.


I agree that there would be a large power penalty, but the alternative may end up being a long tail of software suffering subtle concurrency correctness issues that just aren't there on x86. It may sacrifice some of ARM's advantage, but it avoids becoming known as the CPU equivalent of a Ford Pinto.

There's still plenty of room to undercut Intel's historical 60% gross margins even if you're shipping largely interchangeable products.

The other way it might play out is a race to very high core counts as the main differentiator: single-socket performance good enough to be worth the hassle of not being able to run everything on it rock-solid. Postgres will work great. That Redis fork that adds threads, maybe not.

Or Intel may start to allow developers to relax memory ordering guarantees at per-process granularity in their own push to high core counts. It's hard to imagine their current approach scaling to 1k cores. But if you'd asked me ten years ago, I'd have said they wouldn't make it to 56 cores, either.


> Or Intel may start to allow developers to relax memory ordering guarantees at per-process granularity

If Intel can allow developers/OSes to relax guarantees, why can't ARM allow OSes to strengthen them as necessary, while still keeping a relaxed memory model most of the time?


Sure, although you'd probably be better off with strong by default and opt-in weakness.

The amount of testing and verification involved in making sure something like Postgres runs well under a different memory model is substantial. Adding a flag at the end is trivial.

This way around, software is correct by default unless it explicitly asks to be unsafe.
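Incidentally, this is the model C++ atomics already use at the language level: std::atomic defaults to sequentially consistent ordering, and weaker orderings have to be requested explicitly at each call site. A minimal sketch of that opt-in-weakness pattern (a language-level analogy only, not the hypothetical per-process hardware flag discussed above):

    // Strong-by-default atomics with explicit opt-in weakness.
    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> counter{0};

    void safe_increment() {
        counter.fetch_add(1);  // defaults to memory_order_seq_cst
    }

    void fast_increment() {
        // Explicit opt-in to the weakest ordering: fine for a plain
        // counter, unsafe if other data is published through it.
        counter.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        std::thread a(safe_increment), b(fast_increment);
        a.join();
        b.join();
        std::printf("%d\n", counter.load());  // always prints 2
    }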


I would be surprised if that's the case. Why would memory ordering have any effect on power consumption?


Because ensuring release/acquire for every operation (at least in the x86 way) requires supporting total store order, which requires much stronger cache coherence in the normal (no barrier) case and hence results in a lot more bus traffic. ARM didn't just make the memory model weaker for no reason.
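To make the per-operation cost concrete, here's the classic message-passing litmus test, sketched in C++ (the names and harness are mine, for illustration). x86's TSO means even relaxed stores become visible in program order, so the assert below can never fire there; on ARM the two stores may be observed out of order unless you pay for an explicit release/acquire pair:

    // Message-passing litmus test (illustrative sketch).
    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> data{0};
    std::atomic<int> flag{0};

    void writer() {
        data.store(42, std::memory_order_relaxed);
        flag.store(1, std::memory_order_relaxed);  // no release barrier
    }

    void reader() {
        while (flag.load(std::memory_order_relaxed) == 0) {}
        // On x86 (TSO) this assert cannot fire even with relaxed
        // ordering: stores become visible in program order. On ARM
        // the two stores may be observed out of order, so data can
        // still read as 0 here.
        assert(data.load(std::memory_order_relaxed) == 42);
    }

    int main() {
        std::thread w(writer), r(reader);
        w.join();
        r.join();
    }

The ARM fix is flag.store(1, std::memory_order_release) paired with an acquire load of flag, which is exactly the per-operation guarantee x86 provides implicitly everywhere.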


I am pretty sure this is not correct. Every reordering effect I'm aware of is a core-local effect. That is, it happens in the core before (or at the moment) the data hits the L1D. It does not occur due to a weaker cache coherence system.

A cache coherence system that was itself weaker, allowing reorderings consistent with the memory model, would make barriers and "implicit barriers" like address dependencies very expensive, and there is little evidence that this is the case.

Even in a hypothetical core whose cache coherency system was coupled to the memory model, you aren't really avoiding any coherence traffic, just allowing certain reorderings, such as satisfying requests out of order.
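The store-buffering litmus test illustrates the core-local point; here's a hedged C++ sketch (outcomes are probabilistic, and real litmus harnesses run this in a tight loop to observe them). Both x86 and ARM permit r1 == r2 == 0, because each core's store can sit in its private store buffer while the subsequent load executes, before anything reaches the coherent L1D:

    // Store-buffering litmus test (illustrative sketch).
    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1 = 1, r2 = 1;

    int main() {
        std::thread t1([] {
            x.store(1, std::memory_order_relaxed);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([] {
            y.store(1, std::memory_order_relaxed);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0)
            std::printf("store->load reordering observed\n");
    }

x86 permits exactly this one reordering (store->load) despite TSO, which is why it is usually attributed to the store buffer rather than to the coherence protocol.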


Seems plausible to me. I guess it would be pretty hard to maintain a TDP of about 200W for 80 3GHz ARM cores [1]. The bus traffic to synchronize 80 CPU caches can probably be significant.

[1] https://www.anandtech.com/show/15871/amperes-product-list-80...


Do you have a concrete example where a weaker memory model would allow you to avoid cache coherence traffic?



