Hacker News

Yes, I was planning to do this back then, but other stuff came up. There are many different ways in which this simple example can be improved:

- better detection of when speech ends (currently a basic adaptive threshold)

- use a small LLM to reply quickly with something generic while the big LLM computes

- stream TTS in chunks or sentences
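To make the first item concrete, here's a minimal sketch of an energy-based end-of-speech detector with an adaptive noise floor. This is an illustration of the general "basic adaptive threshold" idea, not code from the project; the class name, constants, and API are all made up for the example.

```python
# Sketch: energy-based end-of-speech detection with an adaptive threshold.
# All names and tuning constants here are illustrative, not from the project.
class EndOfSpeechDetector:
    def __init__(self, silence_frames=30, alpha=0.05, margin=2.0):
        self.noise_floor = None               # running estimate of background energy
        self.alpha = alpha                    # smoothing factor for the noise estimate
        self.margin = margin                  # speech must exceed noise_floor * margin
        self.silence_frames = silence_frames  # this many silent frames => utterance done
        self.silent_run = 0
        self.heard_speech = False

    def feed(self, frame):
        """frame: list of audio samples; returns True once speech seems finished."""
        energy = sum(s * s for s in frame) / len(frame)
        if self.noise_floor is None:
            self.noise_floor = energy         # bootstrap from the first frame
        if energy > self.noise_floor * self.margin:
            self.heard_speech = True
            self.silent_run = 0
        else:
            # adapt the noise floor only during silence, so speech doesn't raise it
            self.noise_floor = (1 - self.alpha) * self.noise_floor + self.alpha * energy
            self.silent_run += 1
        return self.heard_speech and self.silent_run >= self.silence_frames
```

You would call `feed()` once per audio frame and cut over to transcription when it first returns True; a real version would also want a minimum-speech-length check so a single noise spike doesn't count as an utterance.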

One of the better OSS versions of such a chatbot, I think, is https://github.com/yacineMTB/talk. Though many other similar projects probably exist by now.



I keep wondering if a small LLM can also be used to help detect when the speaker has finished speaking their thought, not just when they've paused speaking.


Maybe a voice activity detector (VAD) would be a lighter-weight (fewer resources required) option.


That works when you know what you’re going to say. A human knows when you’re pausing to think but still have a thought you’re in the middle of expressing. A VAD doesn’t know this and would interrupt when it hears a silence of N seconds; a lightweight LLM would know to keep waiting despite the silence.
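To make the distinction concrete, here's a toy version of the decision such a lightweight model would make. The `looks_finished` function below is a hypothetical rule-based stand-in for the actual LLM call, using a couple of obvious textual cues (trailing fillers/conjunctions vs. sentence-final punctuation); a real system would prompt a small model with the transcript instead.

```python
# Toy stand-in for the "is the thought finished?" judgment. In practice a small
# LLM would classify the transcript; these rules just illustrate the decision.
TRAILING_HINTS = {"and", "but", "so", "because", "um", "uh", "like"}

def looks_finished(transcript: str) -> bool:
    text = transcript.strip()
    if not text:
        return False
    words = text.rstrip(".?!,").split()
    if words and words[-1].lower() in TRAILING_HINTS:
        return False          # mid-thought: keep waiting despite the silence
    return text[-1] in ".?!"  # sentence-final punctuation: probably done
```

So "What do you think?" would be treated as a finished turn, while "So I was thinking, um" would keep the bot waiting even through a long pause.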


And the inverse: the VAD would wait longer than necessary after a person says e.g. "What do you think?", in case they were still in the middle of talking.


> use a small LLM to reply quickly with something generic while the big LLM computes

Can't wait for poorly implemented chat apps to always start a response with "That's a great question!"
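For what it's worth, the pattern itself is simple: kick off the slow model in the background and speak a cheap filler immediately. A minimal sketch, where `fast_filler` and `big_llm` are hypothetical stubs standing in for real model calls:

```python
# Sketch of "small model answers fast while the big model thinks".
# fast_filler and big_llm are hypothetical stubs, not a real API.
import queue
import threading
import time

def fast_filler(prompt: str) -> str:
    return "Let me think about that..."   # what a small model would say instantly

def big_llm(prompt: str) -> str:
    time.sleep(0.1)                       # simulate slow generation
    return f"Considered answer to: {prompt}"

def respond(prompt: str):
    """Yield an immediate filler, then the big model's answer once it's ready."""
    out = queue.Queue()
    threading.Thread(target=lambda: out.put(big_llm(prompt)), daemon=True).start()
    yield fast_filler(prompt)             # send this to TTS right away
    yield out.get()                       # block until the real answer arrives
```

The chat-app failure mode above is exactly what happens when the filler is a fixed string instead of something the small model conditions on the question.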


“Uhm, I mean, like, you know” would indeed be a little more human.


Just like poorly implemented human brains tend to do :P



