
There are providers out there offering $0 per million tokens; that doesn't mean it's sustainable and won't disappear as soon as the VC well runs dry. I'm not saying this is the case for Groq, but in general you probably should care if you want to build something serious on top of anything.


(Groq Employee) Agreed, one should care, especially since this particular service is strongly differentiated by its speed and has no competitors.

That being said, until there's another option anywhere near that speed, the point is moot, isn't it? :)

For now, Groq is the only option that lets you build a UX with near-instant response times. Or live agents that help with a human-to-human interaction. I could go on and on about the product categories this opens up.


Why go so fast? Aren't Nvidia's products fast enough from a TPS perspective?


OpenAI has a voice-powered chat mode in their app, and there's a noticeable delay of a few seconds between finishing your sentence and the bot starting to speak.

I think the problem is that for realistic TTS you need quite a few tokens, because the prosody can be affected by tokens that come a fair bit further down the sentence. Consider the difference in pitch between:

"The war will be long and bloody"

vs

"The war will be long and bloody?"

So to begin TTS you need quite a lot of tokens, which in turn means you have to digest the prompt and run a whole bunch of forward passes before you can start rendering. And of course you have to keep up with the speed of regular speech, which OpenAI sometimes struggles with.
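To make the buffering idea concrete, here's a minimal sketch (not OpenAI's or Groq's actual pipeline): hold streamed LLM tokens until terminal punctuation arrives, since a trailing "?" can change the prosody of the whole sentence, then hand each complete sentence to TTS. The `synthesize` function is a hypothetical stand-in for a real TTS call.

```python
def sentences_from_tokens(tokens):
    """Group a token stream into complete sentences for TTS."""
    buffer = []
    for tok in tokens:
        buffer.append(tok)
        # Only flush once terminal punctuation arrives -- before that,
        # we can't know whether the sentence is a statement or a question,
        # so we can't pick the right pitch contour.
        if tok.rstrip().endswith((".", "!", "?")):
            yield "".join(buffer).strip()
            buffer = []
    if buffer:  # trailing partial sentence
        yield "".join(buffer).strip()

def synthesize(sentence):
    # Hypothetical TTS call; here we just report what would be spoken.
    return f"[speaking] {sentence}"

# Simulated token stream from a model:
stream = ["The ", "war ", "will ", "be ", "long ", "and ", "bloody?"]
for s in sentences_from_tokens(stream):
    print(synthesize(s))
```

The faster the model emits tokens, the sooner that buffer fills and speech can begin, which is where inference speed translates directly into perceived latency.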

That said, the gap isn't huge. Many apps won't need it. Some use cases where low latency might matter:

- Phone support.

- Trading. Think digesting a press release into an action a few seconds faster than your competitors.

- Agents that listen in to conversations and "butt in" when they have something useful to say.

- RPGs where you can talk to NPCs in realtime.

- Real-time analysis of whatever's on screen on your computing device.

- Auto-completion.

- Using AI as a general command prompt. Think AI bash.

Undoubtedly there will be a lot more, though. When you give people performance, they find ways to use it.


You've got good ideas. What I personally like to say is that Groq makes the "Copilot" metaphor real. A copilot is supposed to be fast enough to keep up with reality and react live :)


Hi foundval, can we connect on LinkedIn please?



