The only reason an LLM server responds with partial results instead of waiting and returning everything at once is UX: generation is just too slow. But slow bulk responses aren't unique to LLMs and can be handled well enough within HTTP/1.1. It doesn't even have to be the same server; a caching proxy in front of it would do. Any privacy concerns can be addressed by letting the user tell the server whether to cache the response (which could be as simple as submitting with PUT vs. POST).
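
For illustration, a minimal sketch of that idea in Python using only the standard library: a proxy where PUT means "you may cache this" and POST means "don't". The UPSTREAM_URL, the in-memory dict used as the cache, and the JSON pass-through are assumptions made up for the sketch, not anything a real deployment would settle for.

    import http.server
    import urllib.request

    UPSTREAM_URL = "http://localhost:8000/v1/completions"  # hypothetical LLM backend
    cache = {}  # request body -> response bytes; naive in-memory cache for the sketch

    class CachingProxy(http.server.BaseHTTPRequestHandler):
        def _forward(self, body):
            # Always talk to the backend with POST; the client's method only
            # signals the caching preference to this proxy.
            req = urllib.request.Request(
                UPSTREAM_URL, data=body,
                headers={"Content-Type": "application/json"}, method="POST")
            with urllib.request.urlopen(req) as resp:
                return resp.read()

        def _read_body(self):
            return self.rfile.read(int(self.headers.get("Content-Length", 0)))

        def do_PUT(self):
            # PUT = user permits caching: repeated prompts are served from cache
            body = self._read_body()
            if body not in cache:
                cache[body] = self._forward(body)
            self._reply(cache[body])

        def do_POST(self):
            # POST = user forbids caching: always hit the backend, store nothing
            self._reply(self._forward(self._read_body()))

        def _reply(self, payload):
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    if __name__ == "__main__":
        http.server.HTTPServer(("", 8080), CachingProxy).serve_forever()

Note this inverts the usual HTTP semantics a little (PUT is normally the non-cacheable method); the point is only that an existing request attribute can carry the cache/no-cache signal without any new protocol machinery.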



