They’re decreasing demand for the expensive GPUs that would be required to train a model from scratch. Fine-tuning and inference are far less compute-intensive, so overall demand for top-end GPU performance drops even as demand for inference compute rises.
Basically: why train an LLM from scratch and spend millions on GPUs when you can fine-tune LLaMA and spend hundreds instead?
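For a concrete sense of how cheap that path is, here’s a minimal sketch using Hugging Face’s transformers and peft libraries with LoRA adapters. The checkpoint name, dataset file, and hyperparameters are illustrative assumptions on my part, not anything from the comment above:

```python
# Hedged sketch: fine-tune a LLaMA checkpoint with LoRA instead of pretraining.
# Checkpoint, data file, and hyperparameters below are assumptions for illustration.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
    Trainer, TrainingArguments,
)

base = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; gated, requires license acceptance
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains only small low-rank adapter matrices (typically <1% of parameters),
# which is why one consumer/cloud GPU suffices instead of a pretraining cluster.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
model.print_trainable_parameters()

# "my_corpus.txt" is a hypothetical plain-text training file.
data = load_dataset("text", data_files="my_corpus.txt")["train"]
data = data.map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-ft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

A run like this finishes in hours on a single rented GPU, which is the hundreds-vs-millions gap the comment is pointing at.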