Hacker News
deniz_tekalp on Feb 19, 2024 | on: Groq runs Mixtral 8x7B-32k with 500 T/s
GPUs are notoriously bad at exploiting sparsity. I wonder if this architecture can do a better job. To the Groq engineers in this thread: if a neural network had, say, 60% of its weights set to 0, what would that do to cost and speed on your hardware?
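(Illustrative sketch of what the question means by "exploiting sparsity": a dense engine still executes every multiply-by-zero, while a sparse format such as CSR touches only the ~40% of weights that are nonzero, so an ideal sparse engine would do roughly 40% of the FLOPs and memory traffic. Generic scipy example, nothing Groq-specific; the matrix size, sparsity level, and seed are made up.)

    import numpy as np
    from scipy import sparse

    rng = np.random.default_rng(0)

    # Hypothetical 4096x4096 weight matrix with ~60% of entries zeroed out
    # (unstructured sparsity, as in the question above).
    W = rng.standard_normal((4096, 4096))
    W[rng.random(W.shape) < 0.6] = 0.0
    x = rng.standard_normal(4096)

    # Dense path: all 4096*4096 multiply-adds are performed,
    # including the ~60% that multiply by zero.
    y_dense = W @ x

    # Sparse path: CSR stores only the nonzeros, so an ideal sparse
    # engine performs roughly 40% of the work.
    W_csr = sparse.csr_matrix(W)
    y_sparse = W_csr @ x

    assert np.allclose(y_dense, y_sparse)
    print(f"fraction of weights kept: {W_csr.nnz / W.size:.0%}")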