
Thank you for doing this AMA

1. How many GroqCards are you using to run the Demo?

2. Is there a newer version you're using with more SRAM? The one I see online only has 230MB, and that seems to be the number that will drive down your cost by letting you take advantage of batch processing (correct me if I'm wrong!).

3. Can TTS pipelines be integrated with your stack? If so, we can truly have very low latency calls!

*Assuming you're using this: https://www.bittware.com/products/groq/



1. I think our GroqChat demo is using 568 GroqChips. I'm not sure exactly, but it's about that number.

2. We're working on our second-generation chip. I don't know exactly how much SRAM it has, but we don't need to increase the SRAM to get efficient scaling. Our system is deterministic, which means there's no waiting or queuing anywhere, and we can have a very low-latency interconnect between cards.

3. Yeah absolutely, see this video of a live demo on CNN!

https://www.youtube.com/watch?t=235&v=pRUddK6sxDg
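
(A reader's back-of-envelope on those numbers, taking the ~230MB-per-chip figure from the GroqCard spec above and assuming a 70B-class model; these are my assumptions, not Groq's stated figures:)

    # Back-of-envelope: aggregate on-chip SRAM across the deployment.
    # Assumed: 230 MB SRAM per GroqChip, 568 chips, a 70B-class model.
    chips = 568
    sram_per_chip_gb = 0.230
    total_sram_gb = chips * sram_per_chip_gb
    print(f"aggregate SRAM: {total_sram_gb:.1f} GB")   # ~130.6 GB

    params = 70e9
    for bytes_per_param in (2, 1):                     # fp16 vs. int8
        weights_gb = params * bytes_per_param / 1e9
        print(f"{bytes_per_param} B/param -> {weights_gb:.0f} GB of weights,"
              f" fits in SRAM: {weights_gb <= total_sram_gb}")

So the chip count is roughly the scale needed just to hold a large model's weights entirely in SRAM, before counting activations or the KV cache.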


Thank you, that demo was insane!

Follow-up (noob) question: are you using a KV cache? That would significantly increase your memory requirements. Or are you forwarding the whole prompt on each auto-regressive pass?


You're welcome! Yes, we use a KV cache. Being able to implement it efficiently, in terms of both hardware requirements and compute time, is one of the benefits of our deterministic chip architecture (and deterministic system architecture).
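
(For anyone following along, here's a toy sketch of the two strategies being contrasted; an illustration of the general technique, not Groq's implementation. Without a cache, every decode step re-projects keys/values for the whole sequence; with a cache, each step projects only the newest token and appends:)

    import numpy as np

    d = 8                                  # toy head dimension
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

    def attend(q, K, V):
        # Scaled dot-product attention for a single query vector.
        s = K @ q / np.sqrt(d)
        w = np.exp(s - s.max())
        w /= w.sum()
        return w @ V

    def step_no_cache(embs):
        # No KV cache: re-project K/V for every position, every step.
        K, V = embs @ Wk, embs @ Wv
        return attend(embs[-1] @ Wq, K, V)

    def step_with_cache(Ks, Vs, emb):
        # KV cache: project only the newest token, then append.
        Ks.append(emb @ Wk)
        Vs.append(emb @ Wv)
        return attend(emb @ Wq, np.stack(Ks), np.stack(Vs))

    embs = rng.standard_normal((5, d))
    Ks, Vs = [], []
    for i in range(5):
        out = step_with_cache(Ks, Vs, embs[i])
    assert np.allclose(out, step_no_cache(embs))   # same output, less recompute

The cache trades memory (K/V stored for every token generated so far) for compute, which is why the question about memory requirements is the right one to ask.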


Thanks again! I hope I'm not overwhelming you, but one more question: are you decoding with batch size = 1, or is it more?


That's OK, feel free to keep asking!

I think currently 1. Unlike graphics processors, which really need data parallelism to get good throughput, our LPU architecture lets us deliver good throughput even at batch size 1.
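
(Some context on why batch size matters so much on GPUs, with illustrative round numbers of my own, not Groq's: batch-1 decode on a GPU is typically bounded by how fast the weights stream from HBM, so batching is the usual way to buy throughput:)

    # Rough roofline for GPU decode under assumed round numbers:
    # a 70B-parameter model at 2 bytes/param streams ~140 GB of weights
    # per decode step; an A100 80GB has roughly 2 TB/s of HBM bandwidth.
    weights_gb = 70e9 * 2 / 1e9          # ~140 GB read per token
    hbm_gb_per_s = 2000                  # ~2 TB/s (approximate)
    t_batch1 = hbm_gb_per_s / weights_gb
    print(f"batch=1 ceiling: ~{t_batch1:.0f} tokens/s per model replica")

    # Batching amortizes the same weight traffic across B sequences, so
    # aggregate throughput grows ~linearly until compute becomes the limit.
    for B in (1, 8, 32):
        print(f"batch={B}: ~{B * t_batch1:.0f} tokens/s aggregate")

If the weights instead live in on-chip SRAM, that particular bottleneck presumably goes away, which would explain good throughput at batch size 1.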


How much do 568 chips cost? And what's the cost ratio compared to a setup with roughly the same throughput built on A100s?


They're for sale on Mouser for $20,625 each: https://www.mouser.com/ProductDetail/BittWare/RS-GQ-GC1-0109...

At that price, 568 chips would be about $11.7M (568 × $20,625 = $11,715,000).


Yeah. I don't know what it costs us to build out our own hardware, but it's significantly less expensive than retail.


I presume that's because it's a custom ASIC not yet in mass production?

If they can get costs down and put more dies onto each card, then it'll be business/consumer friendly.

Let's see if they can scale production.

Also, where tf is the next Coral chip? Alphabet has been slacking hard.


I think Coral has been taken out to the woodshed. Nothing new from them for years, sadly.


Yeah, and it's a real shame, because even before LLMs got big I was thinking that a couple of generations down the line, Coral would be great for home automation/edge AI stuff.

Fortunately, LLMs and the hard work of clever peeps running them on commodity hardware are starting to make this possible anyway.

Because Google Home/Assistant just seems to keep getting dumber and dumber...


That seems to be the price per card rather than per chip. I would expect multiple chips on a single card.


From the description, that doesn't seem to be the case, though I don't know this product well:

> Accelerator Cards GroqCard low latency AI/ML Inference PCIe accelerator card with single GroqChip


Missed that! Thanks for pointing it out!


Can you talk about the interconnect? Is it fully custom as well? How do you achieve low latency?


You can read about the chip-to-chip interconnect in our paper below, section 2.3. I don't think it's custom.

We achieve low latency by being a software-defined architecture. Our functional units operate completely orthogonally to each other. We don't have to batch to achieve parallelism, and the system behaviour is completely deterministic, so we can schedule all operations precisely.

https://wow.groq.com/wp-content/uploads/2023/05/GroqISCAPape...
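
(One way to picture "schedule all operations precisely": a toy sketch under my own assumptions, not Groq's compiler. When every functional unit's latency is fixed and known, a compiler can assign exact start cycles up front, with no runtime queues or arbitration:)

    # Toy static scheduler: with deterministic functional-unit latencies,
    # every op's start/end cycle is computed entirely at compile time.
    # The latencies below are made up for illustration.
    LATENCY = {"load": 3, "matmul": 10, "add": 1, "store": 2}

    def schedule(ops, deps):
        """ops: (name, kind) pairs in topological order;
        deps: name -> prerequisite names. Returns name -> (start, end)."""
        timing = {}
        for name, kind in ops:
            start = max((timing[d][1] for d in deps.get(name, [])), default=0)
            timing[name] = (start, start + LATENCY[kind])
        return timing

    ops = [("w", "load"), ("x", "load"), ("mm", "matmul"),
           ("y", "add"), ("out", "store")]
    deps = {"mm": ["w", "x"], "y": ["mm"], "out": ["y"]}
    for name, (s, e) in schedule(ops, deps).items():
        print(f"{name}: cycles {s}..{e}")

Nothing in such a schedule depends on runtime conditions, so there's no queuing delay to hide; that's the property the deterministic-architecture argument rests on.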



