
For me, the problem with LLMs is their infinite capacity to ad-lib and improvise; it feels like trying to solve real problems during a taping of "Whose Line Is It Anyway?"

Yeah, of course it's not a realistic scenario for humans, but the LLM is not a human; it's a tool, and I expect it to have some sort of utility as a tool (repeatability, predictability, fitness for purpose). If it can't be used as a tool, and it can't replace human-level inference, then it's worthless at best and antagonistic at worst.

I started testing with the goat/boat prompt because it was obvious, given the framing, that the LLM was trying to pattern-match against the classic river-crossing puzzle involving a wolf. Really takes the magic out of it. Most people who hadn't heard the puzzle before would answer with straightforward logic, and those who had heard of it might be confused by the framing, but they wouldn't hallucinate that an invisible wolf was part of the solution, as so many LLMs do.
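If you want to run the test yourself, here's a minimal sketch, assuming the OpenAI Python SDK; the prompt wording and model name are my own illustrative choices, not anything canonical, and any chat-completion API would work the same way:

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  # Trivial variant of the river-crossing puzzle: no wolf, no cabbage,
  # and the boat holds both passengers, so the answer is one trip.
  prompt = (
      "A farmer and a goat are on one side of a river. "
      "They have a boat that can carry both of them at once. "
      "How do they get across?"
  )

  response = client.chat.completions.create(
      model="gpt-4o",  # illustrative; use whichever model you're probing
      messages=[{"role": "user", "content": prompt}],
  )

  # Failure mode to watch for: a hallucinated wolf or cabbage, or a
  # needless multi-trip plan pattern-matched from the classic puzzle.
  print(response.choices[0].message.content)

The point of stripping the puzzle down is that any mention of a wolf, a cabbage, or leaving the goat unattended is unambiguous evidence of pattern matching rather than reasoning.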

To me this just highlights that I have to be an expert in the domain I'm prompting about, because otherwise I can't be sure the LLM won't suggest I drown a ferret.


