The diffusion process is usually compute-bound, while autoregressive transformer inference (decoding one token at a time) is memory-bandwidth-bound.
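
Back-of-envelope version, using arithmetic intensity (FLOPs per byte of memory traffic): single-token decode is a matrix-vector product with almost no weight reuse, while a diffusion denoising step is large batched matmuls with lots of reuse. Sizes below are illustrative, not from any particular model:

    def matmul_intensity(m, k, n, bytes_per_elem=2):
        flops = 2 * m * k * n                                    # multiply-adds
        bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A, B; write C
        return flops / bytes_moved

    # Decode step: one token (m=1) against a 4096x4096 weight matrix.
    print(matmul_intensity(1, 4096, 4096))       # ~1 FLOP/byte -> memory-bound

    # Diffusion-style batched matmul: 16384 tokens/pixels per step.
    print(matmul_intensity(16384, 4096, 4096))   # ~1800 FLOPs/byte -> compute-bound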

Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.
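
To put rough numbers on it (approximate public specs for one Apple chip and one NVIDIA card; ballpark only):

    # Approximate public specs; treat as illustrative.
    chips = {
        "M2 Ultra": {"bw_gbs": 800,  "fp16_tflops": 27},   # GPU, roughly
        "RTX 4090": {"bw_gbs": 1008, "fp16_tflops": 165},  # dense tensor-core rate
    }
    for name, c in chips.items():
        balance = c["fp16_tflops"] * 1e12 / (c["bw_gbs"] * 1e9)  # FLOPs per byte
        print(f"{name}: ~{balance:.0f} FLOPs/byte machine balance")
    # Work at intensity ~1 (decode) is bandwidth-limited on both chips and runs
    # at similar speed; work at intensity in the hundreds (diffusion) strongly
    # favors the NVIDIA card.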



> but it’s light years behind on compute.

Is that the only factor though? I wonder if PyTorch is lacking optimization for the MPS backend.
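
Easy enough to test: time a bare matmul on MPS and compare against the chip's advertised peak. If the raw kernel is close to peak, missing backend optimization isn't the main story. A minimal sketch (assumes PyTorch 2.x, where torch.mps.synchronize() exists):

    import time
    import torch

    assert torch.backends.mps.is_available()
    dev = torch.device("mps")
    n = 4096
    a = torch.randn(n, n, device=dev, dtype=torch.float16)
    b = torch.randn(n, n, device=dev, dtype=torch.float16)

    for _ in range(3):               # warm-up: shader compilation, allocator
        _ = a @ b
    torch.mps.synchronize()

    iters = 50
    t0 = time.perf_counter()
    for _ in range(iters):
        _ = a @ b
    torch.mps.synchronize()
    dt = (time.perf_counter() - t0) / iters
    print(f"{2 * n**3 / dt / 1e12:.1f} TFLOPS")  # compare to the spec sheet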


This is the only factor. People sometimes perceive Apple's NPU as "fast" and "amazing", which is simply false.

It's just that NVIDIA GPU sucks (relatively) at *single-user* LLM inference and it makes people feel like Apple not so bad.



