1. How many GroqCards* are you using to run the demo?
2. Is there a newer version you're using which has more SRAM? The one I see online only has 230 MB, and that seems to be the number that would drive down your cost (by letting you take advantage of batch processing, CMIIW!).
3. Can TTS pipelines be integrated with your stack? If so, we could truly have very low-latency calls!

*Assuming you're using this: https://www.bittware.com/products/groq/
1. I think our GroqChat demo is using 568 GroqChips. I'm not sure of the exact number, but it's about that (there's a rough back-of-envelope below on why it takes that many).
2. We're working on our second-generation chip. I don't know exactly how much SRAM it has, but we don't need to increase the SRAM to get efficient scaling. Our system is deterministic, which means there's no waiting or queuing anywhere, and we can have a very low-latency interconnect between cards.
3. Yeah absolutely, see this video of a live demo on CNN!
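Rough back-of-envelope on points 1 and 2, for anyone curious how the chip count and the 230 MB of SRAM per chip relate: if you assume a ~70B-parameter model with FP16 weights held entirely in on-chip SRAM (both assumptions for illustration only, not a statement of what the demo actually runs), the arithmetic lands in the same ballpark as the ~568 chips mentioned above.

```python
# Back-of-envelope only: every number here is an assumption for illustration,
# not Groq's actual configuration.
SRAM_PER_CHIP_GB = 0.230   # 230 MB of SRAM per GroqChip (per the BittWare card page)
PARAMS = 70e9              # assumed ~70B-parameter model
BYTES_PER_PARAM = 2        # assumed FP16/BF16 weights; INT8 would halve this

weight_gb = PARAMS * BYTES_PER_PARAM / 1e9
chips_for_weights = weight_gb / SRAM_PER_CHIP_GB

print(f"weights ~ {weight_gb:.0f} GB -> ~ {chips_for_weights:.0f} chips just to hold them")
# ~140 GB of weights -> ~609 chips, before counting KV cache or activations,
# which is the same order of magnitude as the ~568 chips mentioned above.
```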
Follow-up (noob) question: Are you using a KV cache? That would significantly increase your memory requirements. Or are you forwarding the whole prompt on each auto-regressive pass?
You're welcome! Yes, we have a KV cache. Being able to implement it efficiently, in terms of both hardware requirements and compute time, is one of the benefits of our deterministic chip architecture (and deterministic system architecture).
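For anyone who wants to see concretely what the KV cache changes: with a cache, each autoregressive step computes Q, K, V only for the newest token and reuses the stored K/V of everything before it, instead of re-forwarding the whole prompt. A minimal single-head sketch in plain NumPy, generic and not Groq-specific (the weights here are just random placeholders):

```python
import numpy as np

# Minimal single-head attention decode loop with a KV cache (toy sketch).
d = 8                                  # head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

k_cache, v_cache = [], []              # grows by one entry per generated token

def decode_step(x):                    # x: embedding of the newest token, shape (d,)
    q = x @ Wq
    k_cache.append(x @ Wk)             # only the new token's K/V are computed...
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)              # ...past K/V are reused from the cache
    V = np.stack(v_cache)
    attn = softmax(q @ K.T / np.sqrt(d))
    return attn @ V                    # attention output for this step

for t in range(5):                     # stand-in for an autoregressive loop
    out = decode_step(rng.standard_normal(d))
print(out.shape, "cache length:", len(k_cache))
```

The memory requirement the question mentions is also visible here: the cache grows by one K and one V vector per generated token, per head, per layer.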
I think it's currently batch size 1. Unlike graphics processors, which really need data parallelism to get good throughput, our LPU architecture lets us deliver good throughput even at batch size 1.
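A quick way to see why GPUs lean on batching for decode: each autoregressive step has to stream essentially all of the weights from HBM, so on a memory-bandwidth-bound decoder the tokens/s ceiling grows with batch size. Toy roofline-style arithmetic with assumed numbers (model size, aggregate bandwidth), not a benchmark of any real system:

```python
# Why batch-1 decode is hard on HBM-backed hardware: every decode step reads
# all the weights, so throughput is roughly bandwidth-bound until the batch
# gets large. All numbers are assumptions for illustration.
WEIGHT_GB = 140          # assumed FP16 weights of a ~70B model
HBM_GB_PER_S = 2000      # assumed aggregate HBM bandwidth across the GPUs holding the weights

for batch in (1, 8, 64):
    steps_per_s = HBM_GB_PER_S / WEIGHT_GB     # one full weight read per decode step
    tokens_per_s = batch * steps_per_s         # every sequence in the batch yields a token
    print(f"batch {batch:>2}: ~ {tokens_per_s:,.0f} tokens/s (bandwidth bound)")
# Per-user speed stays roughly the same, but aggregate throughput only grows
# with batch size -- which is why GPUs batch. Keeping weights in SRAM sidesteps
# the repeated HBM read, which is the claim being made for batch-size-1 here.
```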
Yeah. And it's a real shame, because even before LLMs got big I was thinking that a couple of generations down the line, Coral would be great for some home automation/edge AI stuff.
Fortunately, LLMs and the hard work of clever peeps running 'em on commodity hardware are starting to make this possible anyway.
Because Google Home/Assistant just seems to keep getting dumber and dumber...
You can find out about the chip-to-chip interconnect from our paper below, section 2.3. I don't think that's custom.
We achieve low latency basically by being a software-defined architecture. Our functional units operate completely orthogonally to each other, we don't have to batch in order to achieve parallelism, and the system behaviour is completely deterministic, so we can schedule all operations precisely.
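To make "deterministic, so we can schedule all operations precisely" a little more concrete, here's a toy sketch of what compile-time scheduling looks like in general: when every op's latency is known up front, a compiler can assign exact start cycles and execution just replays that schedule, with no runtime queues or arbitration. The functional-unit names and latencies below are made up and are not the real GroqChip pipeline:

```python
# Toy illustration of software-defined, deterministic scheduling. Op latencies
# are known at compile time, so exact start cycles are fixed ahead of time.
OPS = [
    # (name, functional_unit, latency_cycles, depends_on)
    ("load_w",  "mem",   4, []),
    ("load_x",  "mem",   4, []),
    ("matmul",  "mxm",  10, ["load_w", "load_x"]),
    ("act",     "valu",  2, ["matmul"]),
    ("store",   "mem",   4, ["act"]),
]

finish = {}        # op -> cycle it completes
unit_free = {}     # functional unit -> first free cycle
schedule = []

for name, unit, lat, deps in OPS:                      # deps appear earlier in OPS
    ready = max((finish[d] for d in deps), default=0)  # all inputs produced
    start = max(ready, unit_free.get(unit, 0))         # unit is free, no arbitration needed
    finish[name] = start + lat
    unit_free[unit] = start + lat
    schedule.append((start, start + lat, unit, name))

for start, end, unit, name in schedule:
    print(f"cycle {start:>2}-{end:<2} {unit:<5} {name}")
# Because every start cycle is fixed at compile time, end-to-end latency is
# identical on every run -- the property that removes the need for
# batching-driven queueing.
```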