I don't think that quite does it. What I'd want -- if you want me to support you -- is access to the chip, libraries, and API documentation.
Best-case would be something I buy for <$2k (if out-of-pocket) or under $5k (if employer). Next best case would be a cloud service with a limited free tier. It's okay if it has barely enough quota that I can develop to it, but the quota should never expire.
(The mistake a lot of services make is to limit the free tier to e.g. 30 days or 1 year, rather than hours/month; if I don't get around to evaluating, switch employers, switch projects, etc., the free tier is gone.)
I did sign up for your API service. I won't be able to use it in prod before your (very nice) privacy guarantees are turned into lawyer-compliant regulatory language. But it's an almost ideal fit for my application.
The issue with their approach is that the whole LLM must fit in the chips to run at all: you need hundreds of cards to run a 7B LLM.
This approach is very good if you want to spend several millions building a large inference server to achieve the lowest latency possible. But it doesn't make sense for a lone customer buying a single card, since you wouldn't really be able to run anything on it.
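For a rough sense of the scale, here is a back-of-the-envelope sketch of the "whole model must fit on-chip" constraint. It only counts the weights, so it is a lower bound: activations, KV cache, and whatever layout the compiler needs come on top. The ~230 MB of on-chip memory per card is my assumption, not something stated in this thread, and `cards_needed` is just a hypothetical helper name.

```python
import math


def cards_needed(params_billion: float, bytes_per_param: float,
                 sram_mb_per_card: float) -> int:
    """Lower bound on cards needed to hold the model weights alone on-chip."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    card_bytes = sram_mb_per_card * 1e6
    return math.ceil(weight_bytes / card_bytes)


# ASSUMPTION: ~230 MB of on-chip SRAM per card (not from this thread).
SRAM_MB_PER_CARD = 230

for label, bytes_per_param in [("fp16", 2), ("int8", 1)]:
    n = cards_needed(7, bytes_per_param, SRAM_MB_PER_CARD)
    print(f"7B model, {label} weights only: at least {n} cards")
```

Either way the point stands: a single card holds only a small slice of even a 7B model.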
I don’t really understand this. If you are happy to buy a <$2k card, then what does it matter if the service is paid or not? Clearly you have enough disposable income to not care about a ‘free’ tier.
2) Low-level access and doing things the manufacturer did not intend, rather than just running inference on Mixtral.
3) Knowing it will be there tomorrow, and I'm not tied to you. I'm more than happy to pay for hosted services, so long as I know after your next pivot, I'm not left hanging.
Why free tier?
I'm only willing to subsidize my employer on rare occasions.
Paying $12 for a prototype means approvals and paperwork if my employer does it. I won't do it out-of-pocket unless I'm very sure I'll use it. About a decade ago, I had a free tier translate into millions of dollars of income for one cloud vendor. Ironically, it never happened again, since when I switched jobs, my free tier was gone.
Don't blame you. Been at plenty of startups, resources are finite, and focus is important.
My only point was to, well, perhaps bump this up from #100 on your personal priority list to #87, to the limited extent that influences your business.
Groq engineer here as well; we actually built our compiler to compile PyTorch, TensorFlow, and ONNX natively, so a lot of the amazing work being done by y'all isn't building much of a moat. We got Llama 2 working on our hardware in just a couple of days!