
Conversations are always "reprocessed from scratch" on every message you send. LLMs are practically stateless and the conversation is the state, as in nothing is kept in memory between two turns.
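
A minimal sketch of what that looks like in practice, assuming the OpenAI Python SDK (model name is illustrative): the client keeps the conversation in a local list and resends the whole list on every turn, because the server holds nothing between requests.

    # Sketch: the client holds the conversation; the full history is
    # resent on every turn (assumes the OpenAI Python SDK is installed).
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def send(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=history,     # the entire conversation so far
        )
        reply = response.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        return reply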


> LLMs are practically stateless

This isn't true of any practical implementation: for a particular conversation, the KV cache is the state. (Indeed, there's no state across conversations, but that's irrelevant to the discussion.)

You can drop it after each response, but doing so greatly increases the number of tokens you need to reprocess in multi-turn conversations.

And my point was that storing the KV cache for the duration of the conversation isn't possible if you switch between multiple providers in a single conversation.
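
To make the "KV cache is the state" point concrete, here is a minimal sketch using Hugging Face transformers (the model choice is just a small illustrative example): keeping past_key_values between turns means only the new tokens need a forward pass; dropping it means re-encoding the entire prefix from scratch.

    # Sketch: reusing the KV cache across turns with Hugging Face
    # transformers; discarding `past` would force re-encoding the prefix.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # illustrative small causal LM
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # Turn 1: encode the prefix once and keep the cache.
    prefix = tok("User: hello\nAssistant:", return_tensors="pt")
    out = model(**prefix, use_cache=True)
    past = out.past_key_values  # this is the per-conversation state

    # Turn 2: with `past` kept, only the new tokens are processed.
    new = tok(" How are you?", return_tensors="pt")
    out2 = model(new.input_ids, past_key_values=past, use_cache=True)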


Not exactly true: KV and prompt caching are a thing.


Assuming you include the same prompts in the new request that were cached in the previous ones.


As far as I understand, the entire chat is the prompt. So at each round, the previous chat up to that point could already be cached. If I'm not wrong, the Claude API requires an explicit request to cache the prompt, while OpenAI's handles this automatically.
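
A hedged sketch of that difference, assuming the anthropic Python SDK (model name and prompt are illustrative): Anthropic's API caches a prefix only when it is explicitly marked with a cache_control block, whereas OpenAI's API applies prompt caching to long stable prefixes automatically, with no extra flag in the request.

    # Sketch of Anthropic's explicit prompt caching (assumes the
    # anthropic Python SDK); the cache_control block marks the stable
    # prefix to cache across turns.
    import anthropic

    LONG_SYSTEM_PROMPT = "...a long, stable system prompt..."  # placeholder

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Continue our chat."}],
    )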


I don't understand how you are downvoted…



