
Off the top of my head: the user wants an LLM to help them solve a word puzzle. Think something a bit like Wordle, but less represented in its training data.

For that, the LLM needs to compare words character by character, reliably. And to do that, it needs at least one of: the ability to fully resolve tokens into characters internally within one pass, the knowledge to emit the candidate words in a "1 character = 1 token" fashion and compare those, or the knowledge that it should defer to a tool call and let that do the comparison.
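
As a rough illustration of what the tokenizer hides (a sketch assuming the tiktoken library and its cl100k_base encoding; exact token counts vary by model):

    import tiktoken  # third-party tokenizer library: pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # A whole word is typically one or two tokens, so the model never
    # "sees" its individual letters:
    print(len(enc.encode("crane")))            # typically 1-2, not 5

    # Spelled out with spaces, each letter tends to get its own token,
    # which is what the "1 character = 1 token" trick relies on:
    print(len(enc.encode(" ".join("crane"))))  # close to one per letter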

An LLM trained for better tokenization awareness would be able to do that. One that wasn't could fall into weird, non-humanlike failure modes.



Surely there are algorithms that solve Wordles, and many other word puzzles, more effectively than LLMs? The LLM could still be in the loop for generating words: the LLM proposes words, a deterministic algorithm scores them according to the rules of the puzzle (or even augments the list by searching the adjacent word space), and at some point the LLM submits the guess.

Given that Wordle words are real words, I think this kind of loop could fare pretty well.
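
The deterministic half of that loop is small. A minimal sketch of a Wordle-style scorer (standard rules assumed: G = right letter, right spot; Y = right letter, wrong spot, with duplicate letters only credited while unmatched copies remain):

    from collections import Counter

    def score_guess(guess: str, answer: str) -> str:
        """Wordle-style feedback: G = green, Y = yellow, . = gray."""
        feedback = ["."] * len(guess)
        remaining = Counter()
        # First pass: mark greens and count the unmatched answer letters.
        for i, (g, a) in enumerate(zip(guess, answer)):
            if g == a:
                feedback[i] = "G"
            else:
                remaining[a] += 1
        # Second pass: mark yellows, consuming each leftover letter once.
        for i, g in enumerate(guess):
            if feedback[i] == "." and remaining[g] > 0:
                feedback[i] = "Y"
                remaining[g] -= 1
        return "".join(feedback)

    print(score_guess("crane", "caper"))  # GYY.Y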


Your mistake is thinking that the user wants an algorithm that solves Wordles efficiently. Or that making and invoking a tool is always a more efficient solution.

As opposed to: the user is a 9-year-old girl, and she has this puzzle in a smartphone game, and she can't figure out the answer, and her mom is busy, so she asks the AI, because the AI is never busy.

Now, for a single vaguely Wordle-like puzzle, how many tokens would it take to write and invoke a solver, and how many to just solve it - working around the tokenizer if necessary?

If you had a batch of 9000 puzzle questions, I can easily believe that writing and running a purpose-specific solver would be more compute-efficient. But if we're dealing with 1 puzzle question, and we're already invoking an LLM to interpret the natural-language instructions for it? Nah.


> Your mistake is thinking that the user wants an algorithm that solves Wordles efficiently. Or that making and invoking a tool is always a more efficient solution.

Weird how you say the user isn't worried about solving the problem efficiently, so we might as well use the LLM for it directly, and then go on to say that creating a tool might not be efficient either.

And as we know, LLMs are not very good at character-level problems, but are relatively good at writing programs, in particular for problems we already know of. LLMs might be able to solve Wordles today by straight-up guessing, just adding spaces between the letters and leaning on their very wide vocabulary, but can LLMs solve e.g. word search puzzles at all?
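
For comparison, the deterministic version of a word-search solver is short (a sketch, assuming the grid is a list of equal-length strings and words can run in any of the 8 directions):

    def find_word(grid: list[str], word: str):
        """Return (row, col, (dr, dc)) where word starts, or None."""
        rows, cols = len(grid), len(grid[0])
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
        for r in range(rows):
            for c in range(cols):
                for dr, dc in steps:
                    end_r = r + (len(word) - 1) * dr
                    end_c = c + (len(word) - 1) * dc
                    if not (0 <= end_r < rows and 0 <= end_c < cols):
                        continue
                    if all(grid[r + k * dr][c + k * dc] == ch
                           for k, ch in enumerate(word)):
                        return r, c, (dr, dc)
        return None

    grid = ["CAT",
            "RBX",
            "QZS"]
    print(find_word(grid, "CAT"))  # (0, 0, (0, 1))
    print(find_word(grid, "CBS"))  # (0, 0, (1, 1)) - a diagonal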

As you say, if there are 9000 puzzle questions, then a solver is a natural choice due to compute efficiency. But it will also answer the question, and do it without errors (though here I'm overstating the LLM's abilities a bit; this would certainly not hold true for novel problems). No "Oh what sharp eyes you have! I'll address the error immediately!" responses from the solver are to be expected, and actually unsolvable puzzles will be identified, not "lied" about. So why not use the solver even for a single instance of the problem?

I think the (training) effort would be much better spent on teaching LLMs when they should use an algorithm and when they should just use the model. Many use cases are less complicated than word puzzles and even more easily solved algorithmically; e.g. sorting a list by a certain criterion (where the list may first be augmented with LLM-created data), and for that task too I'd rather use a deterministic algorithm than one driven by neural networks and randomness.
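
A toy sketch of that division of labor (the items and scores here are hypothetical; the LLM contributes only the fuzzy per-item judgment, and a plain sort does the ordering):

    # The LLM is asked once per item for an urgency score (the fuzzy part);
    # the ordering itself is then a plain deterministic sort.
    tickets = [
        {"title": "Refund policy question", "llm_urgency": 0.3},
        {"title": "Login completely broken", "llm_urgency": 0.9},
        {"title": "Feature request: dark mode", "llm_urgency": 0.4},
    ]
    for t in sorted(tickets, key=lambda t: t["llm_urgency"], reverse=True):
        print(t["title"])
    # Login completely broken
    # Feature request: dark mode
    # Refund policy question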

E.g. Gemini, Mistral and ChatGPT can already do this in some cases: if I ask them to "Calculate sum of primes between 0 and one million.", it looks like all of them wrote a piece of code to calculate it, which is exactly what they should do. (The result was correct.)
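
Roughly the kind of code you'd expect them to emit for that prompt (a sketch using a sieve of Eratosthenes, not any particular model's actual output):

    def sum_primes_below(n: int) -> int:
        """Sum of all primes below n, via a sieve of Eratosthenes."""
        sieve = bytearray([1]) * n
        sieve[0] = sieve[1] = 0
        for i in range(2, int(n ** 0.5) + 1):
            if sieve[i]:
                # Cross off every multiple of i, starting at i*i.
                sieve[i * i :: i] = bytearray(len(range(i * i, n, i)))
        return sum(i for i, is_prime in enumerate(sieve) if is_prime)

    print(sum_primes_below(1_000_000))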


What LLMs are "good at" is kind of up to us. No fundamental reason why they can't be trained for better character manipulation capabilities, among many other things.

There are always tasks that are best solved through direct character manipulation - as there are tasks that are best solved with Python code, constraint solvers or web search. So add one more teachable skill to the pile.

Helps that we're getting better at teaching LLMs skills.



