How would this work in practice if it were litigated? Wouldn't you need proof that this was expressly communicated to the specific individual who violated it, and that they did so knowingly? Seems like it probably isn't enforceable...
Just to be sure, same with Samsung, Harvard, UCLA? It means that someone once signed up with an email address from the organization? You can just do that?
In general, we can't see what users are doing. But we can see some things, like whether they upgrade to new releases. We only cite logos of users who are using it on a consistent basis.
No doubt, it's technically great that Pinecone trained their own embedding model, but from a business/customer standpoint I can't help but ask: _why?_ This is one of those "build or buy" cases where teams must decide to either integrate with an existing solution or build their own. I'm not sure I see the advantage (from an end-user perspective) of using Pinecone's home-grown embedding model over, say, OpenAI's, especially given the cost factor: OpenAI embeddings cost very little.
> Astra DB seems to just be a tutorial showing how to generate embeddings using another service.
The link I shared showed how a single request to Astra DB's data API has Astra DB automatically create embeddings behind the scenes, using an embedding service the user chooses when setting up their database. The embeddings are indeed generated by another service rather than in-house, but from an end-user perspective, users no longer have to generate embeddings themselves and coordinate requests across these steps, as was previously necessary:
- get text
- generate embeddings
- take embeddings and send to DB
As of May, when they announced Vectorize, one request did all of that. From an end-user perspective, I believe this is really analogous to what Weaviate and Pinecone offer, unless I'm missing something.
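The difference between the two workflows can be sketched in Python. This is a toy, in-memory illustration under stated assumptions, not Astra DB's, Weaviate's, or Pinecone's actual API: `toy_embed`, `VectorStore`, and `VectorizingStore` are hypothetical names, and `toy_embed` is a character-count hash standing in for a real embedding service.

```python
import math
from collections import Counter


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for an external embedding service.

    Buckets character counts into a fixed-size vector and L2-normalizes it.
    Not a real model; it only keeps the example self-contained.
    """
    vec = [0.0] * dims
    for ch, n in Counter(text.lower()).items():
        vec[ord(ch) % dims] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors from toy_embed are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


class VectorStore:
    """Hypothetical client-side pattern: the caller embeds text, then sends vectors."""

    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str]] = []

    def insert(self, vector: list[float], text: str) -> None:
        self.rows.append((vector, text))

    def query(self, vector: list[float], k: int = 1) -> list[str]:
        ranked = sorted(self.rows, key=lambda r: cosine(r[0], vector), reverse=True)
        return [text for _, text in ranked[:k]]


class VectorizingStore(VectorStore):
    """Hypothetical server-side pattern: the store calls its configured embedder itself."""

    def __init__(self, embedder) -> None:
        super().__init__()
        self.embedder = embedder  # chosen once, at "database setup" time

    def insert_text(self, text: str) -> None:
        self.insert(self.embedder(text), text)

    def query_text(self, text: str, k: int = 1) -> list[str]:
        return self.query(self.embedder(text), k)


docs = ["vector databases", "cooking pasta"]

# Before: three coordinated steps on the client.
manual = VectorStore()
for doc in docs:              # 1. get text
    vec = toy_embed(doc)      # 2. generate embeddings
    manual.insert(vec, doc)   # 3. send embeddings to the DB

# After: one call per document; embedding happens behind the scenes.
auto = VectorizingStore(toy_embed)
for doc in docs:
    auto.insert_text(doc)
```

Both stores end up holding the same rows; the only change is which side owns the call to the embedding service, which is the end-user convenience being compared here.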
We trained a sparse embedding model because there is a lack of commercially licensed weights or APIs. For any proprietary model, particularly embeddings, there is a degree of lock-in.
Makes a lot of sense to me to combine embedding, retrieval, and reranking. I can imagine this being a way for them to differentiate themselves from the popular databases that have added support for vector search.
I think it does actually do automated compliance checks.
"The Qodo Merge code review agent addresses these challenges by establishing an automated connection between ticket management systems and code reviews. The tool fetches ticket context from Jira or GitHub Issues when referenced in pull requests, then evaluates how closely the code changes align with the ticket’s requirements. It assigns compliance levels of “Fully compliant,” “Partially compliant,” or “Not compliant,” while maintaining a detailed audit trail of all reviews and changes."
Curious why you decided to open source your entire product. Are you moving to an open core model? I’d expect that in that case much of 2, 3 & 4 would have stayed closed. I'd be grateful if you could share your reasoning.
This is often the move when the team's spent the money developing something and now the end's in sight, so they want the chance to leave and take the code with them.
Don't know if this is that at all, but it's always worth considering.
That is almost certainly what’s happening here. They raised $3M three years ago, at the peak of valuations, and don’t have the metrics to raise a Series A in the current climate. They're running out of money and want to leave some artifact behind. A very difficult and emotional transition.
You can look at the history of Erlang for a similar example. A very crude summary could be that Ericsson developed the language, took it to production, and then they decided to replace it with Java. The people that did the language design convinced management to release Erlang and the VM as FOSS, and then they promptly went and started a company that could use the tooling they'd developed.
I'm aware I'm leaving out a lot of detail, but it's not clear to me what has become public knowledge and what has not and I happen to know some people that were involved.
Would open sourcing the core IP of a company “typically” require board approval?
If a company goes under, the investors will want to sell off the IP, and open sourcing everything would make that IP less valuable. There must be some blanket clause in the term sheet to cover that, right? i.e., founders won’t do anything that would materially hurt the company without board approval (or something; I am nowhere close to a lawyer, this is all conjecture).
If it mattered it would have become part of VC contracts years ago.
Early stage VCs make money on the big winners, not on the tail end of companies that don’t exit for 100x. For the most part, except for patents the IP is worth less than the Aeron chairs at the end.
Enterprises are spending lots of time and money on this. The biggest issue slowing down the sales cycle at this stage has been data governance. Most folks think it’s about accuracy or latency (which are of course issues), but data governance can make this whole thing a non-starter.
Interesting. So by open sourcing, you think companies can self-host and that negates some of these issues? Or is your goal to increase future contributions to keep the project alive and developing?
What % of the NL -> SQL problem is solved in the current version? i.e., is this ready for some kind of production work now, or is it “in 2-3 years we’ll be there”?
Not OP, but there was an EHR SaaS company on HN a day or two back with a similar proposition: it’s open source, so it can be independently verified from a security perspective. It was interesting to me because the code was unusable to normal folks, and even to other companies. One of the founders described their moat as the difficulty of actually integrating with the ecosystem and said they weren’t worried about competitors using it. It really hammered home to me how open source is increasingly a marketing lever these days.
For companies that are willing to put in some effort, the self hosting option is a great one. There are certain use cases where this works now, and is already in production. These tend to be use cases with some constraints and don’t deal with very sensitive data.
I think you’re spot on that currently it can only test what the code actually does.
That’s probably why they specify that it’s for regression tests, which are meant to do exactly that: test that future changes to the code don't unintentionally change the current behavior.
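A minimal sketch of such a regression (characterization) test, assuming a hypothetical `format_price` function: the expected values are recorded from the code's current output rather than derived from a spec, so the snapshot even pins the questionable result for a negative amount. That illustrates the limitation mentioned above: the test guards against change, not against behavior that was wrong to begin with.

```python
def format_price(cents: int) -> str:
    """Hypothetical function under test (illustration only)."""
    return f"${cents // 100}.{cents % 100:02d}"


# Expected values captured by running the *current* code, not taken from a spec.
# Note -150 -> "$-2.50": floor division makes this arguably a bug, but the
# regression test locks it in because that is what the code does today.
SNAPSHOT = {
    0: "$0.00",
    5: "$0.05",
    1999: "$19.99",
    100000: "$1000.00",
    -150: "$-2.50",
}


def test_format_price_regression() -> None:
    for cents, expected in SNAPSHOT.items():
        actual = format_price(cents)
        assert actual == expected, f"behavior changed for {cents}: {actual!r}"
```

If someone later "fixes" negative formatting, this test fails, which is exactly the signal a regression test is designed to give; deciding whether the old behavior was correct still needs a human.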