Hacker News | bobismyuncle's comments

Tagging will be redundant pretty soon with facial recognition...


How would this work in practice if it were litigated? Wouldn't you need proof that this was expressly communicated to the specific individual who violated it, and that they did so knowingly? Seems like it probably isn't enforceable...


In this case I think it might be, because if the event was recorded presumably so was the request to not share.


Just to be sure, same with Samsung, Harvard, UCLA? It means that someone once signed up with an email address from the organization? You can just do that?


In general, we can't see what users are doing. But we can see some things, like that they upgrade to new releases. We only cite users' logos when they are using it on a consistent basis.



Astra DB seems to just be a tutorial showing how to generate embeddings using another service.

Weaviate seems to have added a similar capability — kind of wild that they announced on the same day.

Looks like Pinecone also includes reranking as part of the same process — did Weaviate add that as well?


No doubt, it's technically great that Pinecone trained their own embeddings model, but from a business/customer standpoint I can't help but ask _why?_ This is one of those "build it or buy it" cases where teams must decide to either integrate with an existing solution or build their own. I'm not sure I see the advantage (from an end-user perspective) of using Pinecone's home-rolled embeddings model rather than, say, OpenAI's, especially given the cost factor: OpenAI embeddings really don't cost much.

> Astra DB seems to just be a tutorial showing how to generate embeddings using another service.

The link I shared showed how a single request to Astra DB's data API has Astra DB automatically create embeddings behind the scenes, integrating with an embedding service the user chooses when they set their database up. Indeed, embeddings are generated by another service and not in-house, but from an end-user perspective, they no longer need to generate embeddings themselves, as was previously the norm, and coordinate requests between these steps:

- get text
- generate embeddings
- take embeddings and send to DB

As of May when they announced Vectorize, one request did all that. I believe from an end-user experience, this is really analogous to what Weaviate and Pinecone are offering unless I'm missing something.
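To make the difference concrete, here is a rough Python sketch of the two flows. Everything in it is a hypothetical stand-in (`toy_embed`, `PlainVectorStore`, `IntegratedVectorStore`, `insert_text`); none of it is the actual API of Astra DB, Weaviate, or Pinecone, and the "embedding" is a toy function rather than a real model.

```python
from typing import Callable, Dict, List

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an external embedding service."""
    # Deterministic toy features: character count, vowel count, word count.
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels), float(len(text.split()))]


class PlainVectorStore:
    """Hypothetical store that only accepts ready-made vectors."""

    def __init__(self) -> None:
        self.rows: Dict[str, Vector] = {}

    def insert(self, doc_id: str, vector: Vector) -> None:
        self.rows[doc_id] = vector


class IntegratedVectorStore(PlainVectorStore):
    """Hypothetical store configured with an embedding function at setup."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        super().__init__()
        self._embed = embed

    def insert_text(self, doc_id: str, text: str) -> None:
        # The store calls the embedding service itself,
        # so the client only ever sends raw text.
        self.insert(doc_id, self._embed(text))


text = "hello vector world"

# Older flow: the client gets text, embeds it, then sends the vector.
plain = PlainVectorStore()
vector = toy_embed(text)
plain.insert("doc-1", vector)

# Integrated flow: one request carrying the raw text.
integrated = IntegratedVectorStore(embed=toy_embed)
integrated.insert_text("doc-1", text)
```

Both stores end up holding the same vector; the only thing that changes is who coordinates the embedding call.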


The only reason I can see for this is to create lock-in. I'd be pretty surprised if any more than 5% of their customers would want a model by Pinecone.


We trained a sparse embedding model because there is a lack of commercially licensed weights or APIs. For any proprietary model, particularly embeddings, there is a degree of lock-in.


This post has some more technical info: https://www.pinecone.io/blog/integrated-inference/

Makes a lot of sense to me to combine embedding, retrieval and reranking — I can imagine this being a way that they can differentiate themselves from the popular databases that have added support for vector search


I think it does actually do automated compliance checks.

"The Qodo Merge code review agent addresses these challenges by establishing an automated connection between ticket management systems and code reviews. The tool fetches ticket context from Jira or GitHub Issues when referenced in pull requests, then evaluates how closely the code changes align with the ticket’s requirements. It assigns compliance levels of “Fully compliant,” “Partially compliant,” or “Not compliant,” while maintaining a detailed audit trail of all reviews and changes."

From here: https://sdtimes.com/qodo-launches-automated-compliance-check...


Thanks, didn't see that in the text.


Noa’s mom is terminally ill with cancer and has pleaded to be able to hug her daughter again in the few months she has left. Such happy news!!


And the Israeli regime killed over 30,000 innocent kids and women, and destroyed so many families.


Curious why you decided to open source your entire product. Are you moving to an open core model? I’d expect in that case that much of 2, 3 & 4 would have stayed closed. Would be grateful if you can share your reasoning


This is often the move when the team's spent the money developing something and now the end's in sight, so they want the chance to leave and take the code with them.

Don't know if this is that at all, but it's always worth considering.


That is almost certainly what’s happening here. They raised $3M three years ago, at the peak of valuations, and don’t have the metrics to raise a Series A in the current climate. They are running out of money and want to leave some artifact behind. A very difficult and emotional transition.


I don't understand the "leave the code with them" part


You can look at the history of Erlang for a similar example. A very crude summary could be that Ericsson developed the language, took it to production, and then they decided to replace it with Java. The people that did the language design convinced management to release Erlang and the VM as FOSS, and then they promptly went and started a company that could use the tooling they'd developed.

I'm aware I'm leaving out a lot of detail, but it's not clear to me what has become public knowledge and what has not and I happen to know some people that were involved.


I think they mean by open sourcing, they can take the code to a new startup without having IP legality issues.


Would open sourcing the core IP of a company “typically” require board approval?

If a company goes under, the investors will want to sell off the IP, and open sourcing everything would make that IP less valuable. There must be some blanket clause in the term sheet to cover that, right? I.e., founders won’t do anything which will materially hurt the company without board approval (or something; I am nowhere close to being a lawyer, this is all conjecture).


If it mattered it would have become part of VC contracts years ago.

Early-stage VCs make money on the big winners, not on the tail end of companies that don’t exit for 100x. For the most part, except for patents, the IP is worth less than the Aeron chairs at the end.


> Would open sourcing the core IP of a company “typically” require board approval?

At this stage, if they can't raise a series A I'd assume they still have a majority of the "board" to themselves.


getting into enterprise is hard, so probably trying open source to help with that.


Enterprises are spending lots of time and money on this. The biggest issue that has slowed down the sales cycle at this stage has been data governance. Most folks think it’s about accuracy or latency (which of course are issues), but data governance can make this whole thing a non-starter.


Can you explain more about why governance is the issue with a service like this? Companies not wanting their data to go off prem?


yes. some want BYOC solutions. others don't want to even be perceived as being used to train an LLM. not to mention CCPA, GDPR, etc etc etc.

lots of questions around whether actual data is being sent to the LLM, or just the schema.


Interesting. So by open sourcing you think companies can self host and it negates some of these issues? Or is your goal to increase future contributions to keep the project alive and developing?

What % of the NL -> SQL problem is solved in the current version? I.e., is this something ready for some type of prod work now, or is it “in 2-3 years we’ll be there”?


Not OP, but there was an EHR SaaS company on HN a day or two back with a similar proposition: it’s open source, so it can be independently verified from a security perspective. It was interesting to me because the code was unusable to normal folks, and even other companies - one of the founders described their moat being the trouble of actually integrating with the ecosystem, and weren’t worried about competitors using it. It really hammered home to me how open source is more and more a marketing lever lately.


There are organizations using Dataherald in production right now.

The latency is ~20-30s and it takes some setup, so as long as those are not blockers it can be used in prod.


For companies that are willing to put in some effort, the self hosting option is a great one. There are certain use cases where this works now, and is already in production. These tend to be use cases with some constraints and don’t deal with very sensitive data.


I think you’re spot on that currently it can only test what the code actually does.

That’s probably why they specify that it’s for regression tests, which are meant to do exactly that: test that future changes to the code do not unintentionally change the current behavior.
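As a rough illustration of that idea (not how the tool under discussion actually generates tests), here is a minimal Python sketch of a regression check that pins down current behavior. The function `price_with_discount`, the `RECORDED` snapshot, and `find_regressions` are all hypothetical names made up for the example.

```python
from typing import Callable, Dict, Tuple


def price_with_discount(total: float) -> float:
    """Hypothetical function under test; whatever it does today is the spec."""
    if total >= 100:
        return round(total * 0.9, 2)
    return total


# Snapshot of today's outputs, recorded once and then kept fixed.
RECORDED: Dict[float, float] = {
    0.0: 0.0,
    99.99: 99.99,
    100.0: 90.0,
    250.0: 225.0,
}


def find_regressions(
    func: Callable[[float], float], snapshot: Dict[float, float]
) -> Dict[float, Tuple[float, float]]:
    """Return {input: (recorded, current)} for every output that changed."""
    changed = {}
    for value, recorded in snapshot.items():
        current = func(value)
        if current != recorded:
            changed[value] = (recorded, current)
    return changed


# An empty result means behavior is unchanged, not that it is correct.
regressions = find_regressions(price_with_discount, RECORDED)
```

Note that a check like this says nothing about whether a 10% discount was the intended behavior in the first place; it only flags when a later change alters it.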

