
The RAM is on the package for more than portability. It’s necessary for fast enough transfer speeds for the iGPU.


Then again, the rest of the industry has figured out a way to make slottable RAM almost as fast and compact as soldered RAM with the new CAMM2/LPCAMM2 standards. The M4 has LPDDR5X-7500 120GB/sec memory and there are already LPCAMM2-7500 120GB/sec modules, with even faster ones on the way: https://www.anandtech.com/show/21390/micron-ships-crucialbra...

Two of those modules working in parallel would hit "M Pro" speeds as well. I doubt Apple will be adopting them though, for the same reason they don't offer standard M.2 SSD slots even on systems that could obviously support them with minimal design compromises.
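
Back-of-the-envelope, in Python (the 128-bit width per LPCAMM2 module is from the published spec; everything else follows from the rates above):

    # Peak DRAM bandwidth = transfer rate (MT/s) x bus width (bits) / 8 bits per byte.
    def peak_bandwidth_gbs(rate_mts: int, bus_bits: int) -> float:
        """Peak bandwidth in GB/s for a parallel DRAM interface."""
        return rate_mts * bus_bits / 8 / 1000

    one_module = peak_bandwidth_gbs(7500, 128)  # one 128-bit LPCAMM2-7500 module
    print(f"one module:  {one_module:.0f} GB/s")      # ~120 GB/s, same as the M4
    print(f"two modules: {2 * one_module:.0f} GB/s")  # ~240 GB/s, 'M Pro' territory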


These are still well below what Apple offers at the high end, and you cannot buy systems like that right now. If you want high memory bandwidth on the CPU today, you will be charged a big markup on Epyc/Xeon/Threadripper Pro CPUs and motherboards, rather than on the DRAM.


> Then again, the rest of the industry has figured out a way to make slottable RAM almost as fast and compact as soldered RAM…

Just be patient, the EU will take a large stick and force Apple to allow users to replace their RAM soon too.


Very unlikely. Apple can argue that less than 1% of computer users ever upgrade their memory (which is true), and after all, did the EU intervene when GPUs dropped their slotted memory?


> did the EU intervene when GPUs dropped their slotted memory?

The difference there is that slotted GPU memory is demonstrably impractical, but the memory on the M4 isn't demonstrably better than the LPCAMM2 module above. It's literally the exact same spec. Not that I expect the EU to do anything either, when they didn't act on Apple's soldered-in SSDs, which definitely aren't any better than standardized M.2 drives.


Actually, incorrect. In some scenarios you'd need up to 4 CAMM2 slots to do what Apple does, because CAMM2 maxes out at 128-bit buses while M3 Max chips are currently at 512 bits (quick arithmetic below the link). Needless to say, battery life would be the most affected.

https://news.ycombinator.com/item?id=40287592
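
The slot-count arithmetic is just the two bus widths; a trivial sketch:

    # Bus widths from the comment above: each CAMM2 module is 128 bits wide,
    # while an M3 Max exposes a 512-bit memory interface.
    M3_MAX_BUS_BITS = 512
    CAMM2_MODULE_BITS = 128

    print(M3_MAX_BUS_BITS // CAMM2_MODULE_BITS)  # 4 modules just to match the bus width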


Yes, the higher-end Max and Ultra chips would still need soldered memory for sure. Two CAMM modules flanking opposite sides of the SoC is probably doable though, so I think the M Pros could practically have socketed memory.


GPU memory is 20,000+ MT/s, Apple is ~6,400 MT/s, LPCAMM supports ~7,500 MT/s.

Easy heuristic: if your memory transfer rate is more than 1.5x the standard, you can solder RAM. If not, you must use the standard.
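
Encoded as a function (a sketch; the 1.5x threshold and rates come from this comment, nothing official):

    LPCAMM_RATE_MTS = 7500  # the current slottable baseline

    def soldering_justified(rate_mts: float, threshold: float = 1.5) -> bool:
        # Soldered RAM is "allowed" only when the part clearly outruns the
        # slottable standard, per the rule of thumb above.
        return rate_mts > threshold * LPCAMM_RATE_MTS

    print(soldering_justified(20_000))  # True:  GDDR-class GPU memory
    print(soldering_justified(6_400))   # False: Apple's LPDDR5 is under the bar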


For SSD speeds, that claim was already debunked by iBoff's new adapter, which makes an M1 MacBook Air upgradable and faster. I wouldn't be surprised if the same were true for RAM using the CAMM standard positioned near the CPU. Or maybe even better, slotted memory chips like in the old days, with a memory controller ready to accept multiple chip sizes.


> necessary for fast enough transfer speeds

Source?


When was the last time you saw a GPU with slottable memory?

For transfer speeds, look at the data sheets for the M series. Much faster than DDR4 or DDR5 RAM. In the ballpark of GPU memory.


Would the people who were buying the baseline 8GB model (presumably just for general computing/office work) care about the GPU being slightly slower, though?

I bet that the extreme lag when you run out of memory (an Electron app or two, several browser tabs, and something like Excel open) is way more noticeable.

Hardly anyone is using Macs for gaming these days, and almost anybody who does something GPU-intensive would need more than 16GB anyway.


This has been the approach since the M1s.

See: https://www.theregister.com/2020/11/19/apple_m1_high_bandwid...

> The SoC has access to 16GB of unified memory. This uses 4266 MT/s LPDDR4X SDRAM (synchronous DRAM) and is mounted with the SoC using a system-in-package (SiP) design. A SoC is built from a single semiconductor die whereas a SiP connects two or more semiconductor dies.



Source for what? Parallel RAM interfaces have strict timing and electrical requirements. Classic DDR sockets are modular at the cost of peak bandwidth and bus width. The wider your bus, the more traces you have to run in parallel from the socket to the compute complex, which becomes harder and harder. You don't see sockets for HBM or GDDR for a good reason. The proof is there.

LPCAMM solutions mentioned upthread resolve some of this by making the problem more "three dimensional" from what I can tell. They reduce the length of the traces by making the pinout more "square" (as opposed to thin and rectangular) and stacking the modules closer to the actual dies they connect to. This allows you to cram swappable memory into the same form factor, while retaining the same clock speeds/size/bus width, and without as many of the design complexities that come from complex socket traces.

In Apple's case they connect their GPU to the same pool of memory that their CPU uses. This is a key piece of the puzzle for their design, because even if the CPU doesn't need 200GB/s of bandwidth, GPUs are a very different story. If you want them to do work, you have to feed them with something, so you need lots of memory bandwidth. Note that Samsung's LPCAMM solutions are only 128 bits wide, reported around 120GB/s. Apple has gone as high as 1024-bit buses with hundreds of GB/s of bandwidth; the M1 Max was released years ago and does 400GB/s. LPCAMM is still useful and a good improvement over the status quo, of course, but I don't think you're going to see 256-bit or 512-bit versions anytime soon.

And if your problem can be parallelized, then the wider your bus, the lower your clock speeds can go, so you get lower power while retaining the same level of performance. This same dynamic is how an A100 (5120-bit bus, across five 1024-bit HBM2 stacks) can smoke a 3090 (384-bit) despite a far lower per-pin clock speed and power usage.
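
Rough numbers, for illustration (the per-pin rates and the A100 40GB's 5120-bit total are my figures, not from this thread):

    def peak_bandwidth_gbs(rate_gts: float, bus_bits: int) -> float:
        # Peak bandwidth = per-pin rate (GT/s) x bus width / 8 bits per byte.
        return rate_gts * bus_bits / 8

    a100 = peak_bandwidth_gbs(2.4, 5120)     # HBM2: slow pins, very wide bus
    rtx3090 = peak_bandwidth_gbs(19.5, 384)  # GDDR6X: fast pins, narrow bus
    print(f"A100: {a100:.0f} GB/s")     # ~1536 GB/s (spec sheet says 1555)
    print(f"3090: {rtx3090:.0f} GB/s")  # 936 GB/s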

There is no magic secret or trick. You will always get better performance and less noise at lower power by directly integrating these components. It's a matter of whether it makes sense given the rest of your design decisions, like whether your GPU shares the memory pool or not.

There are alternative memory solutions like IBM using serial interfaces for disaggregating RAM and driving the clock speeds higher in the Power10 series, allowing you to kind of "socket-ify" GDDR. But these are mostly unobtainium and nobody is doing them in consumer stuff.



