Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.
Is that the only factor, though? I wonder if PyTorch is lacking optimization for the MPS backend.
It's more that NVIDIA GPUs are (relatively) bad value at *single-user* LLM inference, which makes Apple look not so bad by comparison.
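The reason single-user inference narrows the gap: batch-1 decoding has to stream every weight from memory once per token, so the ceiling is set by memory bandwidth, not FLOPS. A back-of-the-envelope sketch (the bandwidth figures and 4 GB model size are illustrative assumptions, not benchmarks):

```python
# Batch-1 LLM decoding reads all weights once per generated token,
# so an upper bound on speed is bandwidth / model size.

def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-user decode speed: one full weight pass per token."""
    return bandwidth_gb_s / model_gb

# Hypothetical 7B model quantized down to ~4 GB of weights:
model_gb = 4.0

for name, bw in [("M2 Max (~400 GB/s)", 400.0),
                 ("RTX 4090 (~1008 GB/s)", 1008.0)]:
    print(f"{name}: ~{max_tokens_per_s(bw, model_gb):.0f} tok/s ceiling")
```

By this rough ceiling the GPU is only ~2.5x ahead despite a far larger compute advantage, which is why Apple Silicon feels competitive in this one scenario; batch serving, where compute dominates, is a different story.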