It's possible to provide stronger guarantees than the spec requires.

It wouldn't surprise me if server-focused ARM chips ended up providing x86-style ordering to ensure compatibility with ported software.



It would very much surprise me, since the weaker ordering requirements are a significant part of how ARM achieves lower power consumption.


I agree that there would be a large power penalty, but the alternative may end up being a long tail of software suffering subtle concurrency correctness issues that just aren't there on x86. It may sacrifice some of ARM's advantage, but it avoids becoming known as the CPU equivalent of a Ford Pinto.

There's still plenty of room to undercut Intel's historical 60% gross margins even if you're shipping largely interchangeable products.

The other way it might play out is a race to very high core counts as the main differentiator: single-socket performance good enough to be worth the hassle of not being able to run everything on it rock-solid. Postgres will work great. That Redis fork that adds threads, maybe not.

Or Intel may start to allow developers to relax memory ordering guarantees at per-process granularity in their own push to high core counts. It's hard to imagine their current approach scaling to 1k cores. But if you'd asked me ten years ago, I'd have said they wouldn't make it to 56 cores, either.


> Or Intel may start to allow developers to relax memory ordering guarantees at per-process granularity

If Intel can allow developers/OSes to relax guarantees, why can't ARM allow OSes to strengthen them as necessary, while still keeping a relaxed memory model most of the time?


Sure, although you'd probably be better off with strong by default and opt-in weakness.

The amount of testing and verification involved in making sure something like Postgres runs well under a different memory model is substantial. Adding a flag at the end is trivial.

This way around, software is correct by default unless it explicitly asks to be unsafe.
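Incidentally, this is the model C++ atomics already use at the language level: std::atomic defaults to sequentially consistent ordering, and weaker orderings have to be requested explicitly at each call site. A minimal sketch of that opt-in-weakness pattern (a language-level analogy only, not the hypothetical per-process hardware flag discussed above):

    // Strong-by-default atomics with explicit opt-in weakness.
    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> counter{0};

    void safe_increment() {
        counter.fetch_add(1);  // defaults to memory_order_seq_cst
    }

    void fast_increment() {
        // Explicit opt-in to the weakest ordering: fine for a plain
        // counter, unsafe if other data is published through it.
        counter.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        std::thread a(safe_increment), b(fast_increment);
        a.join();
        b.join();
        std::printf("%d\n", counter.load());  // always prints 2
    }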


I would be surprised if that's the case. Why would memory ordering have any effect on power consumption?


Because ensuring release/acquire for every operation (at least in the x86 way) requires supporting total store order, which requires much stronger cache coherence in the normal (no barrier) case and hence results in a lot more bus traffic. ARM didn't just make the memory model weaker for no reason.
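To make the per-operation cost concrete, here's the classic message-passing litmus test, sketched in C++ (the names and harness are mine, for illustration). x86's TSO means even relaxed stores become visible in program order, so the assert below can never fire there; on ARM the two stores may be observed out of order unless you pay for an explicit release/acquire pair:

    // Message-passing litmus test (illustrative sketch).
    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> data{0};
    std::atomic<int> flag{0};

    void writer() {
        data.store(42, std::memory_order_relaxed);
        flag.store(1, std::memory_order_relaxed);  // no release barrier
    }

    void reader() {
        while (flag.load(std::memory_order_relaxed) == 0) {}
        // On x86 (TSO) this assert cannot fire even with relaxed
        // ordering: stores become visible in program order. On ARM
        // the two stores may be observed out of order, so data can
        // still read as 0 here.
        assert(data.load(std::memory_order_relaxed) == 42);
    }

    int main() {
        std::thread w(writer), r(reader);
        w.join();
        r.join();
    }

The ARM fix is flag.store(1, std::memory_order_release) paired with an acquire load of flag, which is exactly the per-operation guarantee x86 provides implicitly everywhere.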


I am pretty sure this is not correct. Every reordering effect I'm aware of is a core-local effect. That is, it happens in the core before (or at the moment) the data hits the L1D. It does not occur due to a weaker cache coherence system.

A cache coherence system that was itself weaker, allowing reorderings consistent with the memory model, would make barriers and "implicit barriers" like address dependencies very expensive, and there is little evidence that this is the case.

Even in a hypothetical core whose cache coherency system was coupled to the memory model, you aren't really avoiding any coherence traffic, just allowing certain reorderings, such as satisfying requests out of order.
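The store-buffering litmus test illustrates the core-local point; here's a hedged C++ sketch (outcomes are probabilistic, and real litmus harnesses run this in a tight loop to observe them). Both x86 and ARM permit r1 == r2 == 0, because each core's store can sit in its private store buffer while the subsequent load executes, before anything reaches the coherent L1D:

    // Store-buffering litmus test (illustrative sketch).
    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1 = 1, r2 = 1;

    int main() {
        std::thread t1([] {
            x.store(1, std::memory_order_relaxed);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([] {
            y.store(1, std::memory_order_relaxed);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0)
            std::printf("store->load reordering observed\n");
    }

x86 permits exactly this one reordering (store->load) despite TSO, which is why it is usually attributed to the store buffer rather than to the coherence protocol.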


Seems plausible to me. I guess it would be pretty hard to maintain a TDP of about 200W for 80 3GHz ARM cores [1]. The bus traffic to synchronize 80 CPU caches can probably be significant.

[1] https://www.anandtech.com/show/15871/amperes-product-list-80...


Do you have a concrete example where a weaker memory model would allow you to avoid cache coherence traffic?



