Ok what's turn detection?

kwindla · 2025-03-07T03:25:39 1741317939

Turn detection is deciding when a person has finished talking and expects the other party in a conversation to respond. In this case, the other party in the conversation is an LLM!
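
For readers new to the term, here is a minimal sketch of the naive baseline: treat a turn as finished once a fixed stretch of silence follows some speech. This is not the method of the project under discussion, just an illustration. The function name `find_end_of_turn`, its parameters, and the 600 ms timeout are all hypothetical, and the per-frame speech flags are assumed to come from some voice activity detector.

```python
from typing import Iterable, Optional


def find_end_of_turn(
    frames_are_speech: Iterable[bool],
    frame_ms: int = 20,      # assumed duration of one audio frame
    silence_ms: int = 600,   # assumed silence timeout; purely illustrative
) -> Optional[int]:
    """Return the index of the frame at which the turn is judged finished.

    Returns None if no end of turn is found in the given frames.
    """
    needed = silence_ms // frame_ms  # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames_are_speech):
        if is_speech:
            # Any speech resets the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence that comes after the person started talking.
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# A short mid-sentence pause (3 frames) does not end the turn;
# the long trailing silence does.
frames = [False] * 5 + [True] * 10 + [False] * 3 + [True] * 5 + [False] * 40
end_index = find_end_of_turn(frames)
```

A fixed timeout like this either cuts people off during pauses or makes the reply feel slow, which is roughly why the problem is harder than it looks.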

remram · 2025-03-07T03:31:17 1741318277

Oh I see. Not like segmenting a conversation where people speak in turn. Thanks.

password4321 · 2025-03-07T21:28:08 1741382888

Speaker diarization is also still a tough problem for free models.

whiddershins · 2025-03-07T17:04:42 1741367082

huh. how is analyzing conversations in the manner you described NOT the way to train such a model?

remram · 2025-03-07T18:13:18 1741371198

Did you reply to the wrong comment? No one is talking about training here.

ry167 · 2025-03-07T03:25:23 1741317923

Detecting when one participant in a conversation has finished talking.

It’s a big deal for handling human speech when interacting with LLM systems.

woodson · 2025-03-07T06:33:40 1741329220

It’s often called endpoint detection (in ASR).

lelag · 2025-03-07T16:25:52 1741364752

Yes, weird that they didn't use that term for this project.

kwindla · 2025-03-07T17:48:16 1741369696

I've talked about this a lot with friends.

Endpoint detection (and phrase endpointing, and end of utterance) are terms from the academic literature for this and related problems.

Very few people who are doing "AI Engineering" or even "Machine Learning" today know these terms. In the past, I argued that we should use the existing academic language rather than invent new terms.

But then OpenAI released the Realtime API and called this "turn detection" in their docs. And that was that. It no longer made sense to use any other verbiage.

mncharity · 2025-03-07T23:43:37 1741391017

Re SEO, I note "utterance" only occurs once, in a perhaps-ephemeral "Things to do" description.

To help with "what is?" and SEO, perhaps something like "Turn detection (aka [...], end of utterance)"... ?

lelag · 2025-03-07T18:33:04 1741372384

Thanks for the explanation. I guess it makes some sense, considering many people with no NLP background are using those models now…