Because without those, the LLM needs far more parameters and ends up with a much smaller context window.
In theory it would be better, but maybe not by much.