
> 30b+ parameter model doing RAG as part of a conversation with voice responses in less than a second, running on Nvidia.

I believe this is doable - my pipeline is generally closer to 400ms without RAG, using Mixtral and a lot of non-ML hacks to get there. It would also definitely be doable with a joint speech-language model that removes the separate transcription step.

For these use cases, time to first byte is the most important metric, not total throughput.
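
To make that concrete, here's a minimal sketch of measuring time-to-first-token against an OpenAI-compatible streaming endpoint (the model name and prompt are placeholders, not anything from this thread):

    import time
    from openai import OpenAI

    client = OpenAI()  # assumes an OpenAI-compatible endpoint; any streaming API works

    start = time.monotonic()
    first_token_at = None

    # stream=True lets us measure time to first token, not just total latency
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Say hello."}],
        stream=True,
    )
    for chunk in stream:
        if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
            first_token_at = time.monotonic()
    total = time.monotonic() - start

    print(f"first token: {first_token_at - start:.3f}s, total: {total:.3f}s")

In a voice pipeline you'd hand that first chunk to TTS immediately, so the first number, not total throughput, bounds how fast the response feels.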



It’s important…if you’re building a chatbot.

The most interesting applications of LLMs are not chatbots.


> The most interesting applications of LLMs are not chatbots.

What are they then? Every use case I've seen is either a chatbot or something like a copy editor, which is just a long-form chatbot.


Obviously not OP, but these days LLMs can act as fuzzy functions with reliably structured output, and they're multi-modal.

Think about the implications of that. I bet you can come up with some pretty cool use cases that don't involve you talking to something over chat.

One example:

I think we'll be seeing a lot of "general detectors" soon. Without training or predefined categories, get pinged when (whatever you specify) happens, whether it's a security camera feed, a web search, event data, etc.
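
To sketch what that might look like (nothing here is from the parent comment; the model name, prompt, and event format are illustrative), a "general detector" is just a fuzzy boolean function:

    import json
    from openai import OpenAI

    client = OpenAI()

    def detect(event_description: str, condition: str) -> bool:
        """Fuzzy function: True if the free-form event matches the
        user-specified condition. No training, no predefined categories."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[{
                "role": "user",
                "content": (
                    f"Condition: {condition}\n"
                    f"Event: {event_description}\n"
                    'Does the event match the condition? '
                    'Reply with JSON only: {"match": true|false}'
                ),
            }],
            response_format={"type": "json_object"},  # reliably structured output
        )
        return json.loads(resp.choices[0].message.content)["match"]

    if detect("person climbing over the back fence at 2am", "possible intruder"):
        print("alert")

Swap the event source for camera captions, search results, or log lines; the condition is plain language, so there's nothing to retrain.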


Complex data tagging/enrichment tasks.
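
For example, a hypothetical enrichment pass over support tickets; the schema, model name, and field names are all made up for illustration:

    import json
    from openai import OpenAI

    client = OpenAI()

    def enrich(record: dict) -> dict:
        """Tag a free-form record with the structured fields the prompt asks for."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[{
                "role": "user",
                "content": (
                    "Tag this support ticket. Reply with JSON only, keys: "
                    '"topic", "sentiment", "urgency" (low/medium/high).\n'
                    + json.dumps(record)
                ),
            }],
            response_format={"type": "json_object"},
        )
        return {**record, **json.loads(resp.choices[0].message.content)}

    print(enrich({"text": "App crashes every time I open settings, fix today please"}))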


> The most interesting applications of LLMs are not chatbots.

In your opinion, what are the most interesting?



