
What's more likely to be a problem is the request to be concise.

For some reason, this still seems not to be widely known even among technical users: token generation is where the computation/"thinking" in LLMs happens! By forcing the model to keep its answers short, you're starving it of compute, making each token do more work. There's a small, fixed amount of "thinking" an LLM can do per token, so the more you squeeze it, the less reliable it gets, until eventually it can't "spend" enough tokens to produce a reliable answer at all.

In other words: all those instructions to "be terse", "be concise", "don't be verbose", "just give the answer, no explanation" - or even asking for the answer first, then explanations - are all just different ways to dumb down the model.
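A quick way to see this for yourself - rough sketch, assuming the OpenAI Python client; the model name, prompts, and question are just placeholders, nothing special about them:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def ask(system_prompt, question):
        # Same question, different instructions about verbosity.
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any chat model works
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

    question = "Is 2^31 - 1 prime?"

    # Starved: the model has to commit to an answer almost immediately.
    terse = ask("Answer in one word. No explanation.", question)

    # Fed: the model gets to spend tokens working before it commits.
    verbose = ask("Work through the problem step by step, then give your conclusion.", question)

The terse variant has to commit to an answer within a token or two of computation; the verbose one gets to spend hundreds of tokens working before it ever has to commit.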

I wonder if this can explain, at least in part, why there are so many conflicting experiences with LLMs - in every other LLM thread, you'll see someone claim they're getting great results at some task, and then someone else saying they're getting disastrously bad results with the same model on the same task. Perhaps the latter person is instructing the model to be concise and skip explanations, not realizing this degrades performance?

(It's less of a problem with the newer "reasoning" models, which have their own space for working things out, separate from the final answer.)



If that's correct, then it's a significant problem with LLMs that needs to be addressed. Would it work to have the agent keep the talky, verbose answer to itself and only return a final summary to the user?


That's effectively what the "reasoning" models do. Some LLM services hide or summarize that part for you, others return it verbatim, and of course you get the full thing if you're running a local reasoning model.
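You can also approximate it yourself with a non-reasoning model: let it ramble in a first call, summarize in a second, and only show the user the summary. Rough sketch of the DIY version, assuming the OpenAI Python client (model name and prompts are placeholders):

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"  # placeholder

    def answer_with_hidden_reasoning(question):
        # Pass 1: let the model be as verbose as it needs to be.
        reasoning = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": "Work through the problem in as much detail as you need."},
                {"role": "user", "content": question},
            ],
        ).choices[0].message.content

        # Pass 2: compress the finished reasoning into a short answer.
        # Summarizing existing text is easy; the hard thinking already happened in pass 1.
        summary = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": "Summarize the following into a 2-3 sentence answer for the user."},
                {"role": "user", "content": reasoning},
            ],
        ).choices[0].message.content

        # The verbose pass never reaches the user.
        return summary

Which is roughly what the reasoning models bake in, except the "pass 1" behavior is trained into the model instead of bolted on with a second call.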



