Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.
Is that the only factor, though? I wonder if PyTorch is lacking optimization for the MPS backend.
It's more that NVIDIA GPUs are (relatively) bad value at *single-user* LLM inference, which makes Apple look not so bad by comparison.
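The reason single-user inference narrows the gap: batch-1 decoding has to stream every weight from memory once per token, so the ceiling is set by memory bandwidth, not FLOPS. A back-of-the-envelope sketch (the bandwidth figures and 4 GB model size are illustrative assumptions, not benchmarks):

```python
# Batch-1 LLM decoding reads all weights once per generated token,
# so an upper bound on speed is bandwidth / model size.

def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-user decode speed: one full weight pass per token."""
    return bandwidth_gb_s / model_gb

# Hypothetical 7B model quantized down to ~4 GB of weights:
model_gb = 4.0

for name, bw in [("M2 Max (~400 GB/s)", 400.0),
                 ("RTX 4090 (~1008 GB/s)", 1008.0)]:
    print(f"{name}: ~{max_tokens_per_s(bw, model_gb):.0f} tok/s ceiling")
```

By this rough ceiling the GPU is only ~2.5x ahead despite a far larger compute advantage, which is why Apple Silicon feels competitive in this one scenario; batch serving, where compute dominates, is a different story.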