barefeg's comments

For some time, authentication was not part of MCP. Now it's there (https://modelcontextprotocol.io/specification/2025-03-26/bas...), so I'm wondering what is being addressed in Klavis. Is it something that the reference implementation of MCP lacks? If so, will it eventually make it into MCP?

I think it’s important to release SDKs that are secure by default, so not providing this in the reference MCP would be a big issue.

In my view, MCP servers should be maintained by the vendors themselves. It's too complicated to use them in the enterprise if everything comes from the community with questionable security. So I applaud initiatives that try to solve this. I think smithery.ai provides something similar while also being a repository of servers (I'm not associated with them), but again the problem is needing to trust an extra middleman vendor.

Does anyone else share this view? For example, will AWS (or insert any other hyperscaler) end up providing the “Bedrock” of MCP where security is native to the platform? Or will individual companies (Box, Google, MS, etc.) start rolling them out as part of their standard developer APIs?


Yes, thank you! The newest MCP spec added the authentication part, but it seems people think it is still not perfect and are making further modifications to the auth part, e.g. https://github.com/modelcontextprotocol/modelcontextprotocol.... We will also keep an eye on the spec's development.


> For example, will AWS (or insert any other hyperscaler) end up providing the “Bedrock” of MCP where security is native to the platform?

Cloudflare already provides something along those lines with MCP on Workers plus authentication (via their Zero Trust product, AFAIK): https://blog.cloudflare.com/remote-model-context-protocol-se...

Sounds like they were one of the partners with Anthropic in their recent “Integrations” announcement.


Thanks for sharing your blog post. We had a similar journey: I installed and tried both Langfuse and Phoenix and ended up choosing Langfuse due to some versioning conflicts with the Python dependency. I'm curious whether your thoughts have changed after V3? I also liked that it only depended on Postgres, but the scalable version requires other dependencies.

The thing I liked about Phoenix is that it uses OpenTelemetry. In the end, we're building our Agents SDK so that the observability platform can be swapped (https://github.com/zetaalphavector/platform/tree/master/agen...), and the abstraction is OpenTelemetry-inspired.
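
To illustrate what I mean by a swappable, OpenTelemetry-inspired abstraction, here is a minimal sketch with made-up names (not the actual SDK interface): agent code calls a backend-agnostic tracer, and the concrete exporter can be swapped without touching the agents.

    # Minimal sketch of an OTel-inspired, swappable tracer abstraction.
    # All names are illustrative; the real interface lives in the repo above.
    from abc import ABC, abstractmethod
    from contextlib import contextmanager

    class Tracer(ABC):
        @abstractmethod
        def start_span(self, name: str, **attributes):
            """Return a context manager that opens and closes a span."""

    class ConsoleTracer(Tracer):
        """Trivial backend; a Langfuse- or Phoenix-backed tracer would
        implement the same interface and export spans to its platform."""

        @contextmanager
        def start_span(self, name: str, **attributes):
            print(f"start span {name} {attributes}")
            try:
                yield
            finally:
                print(f"end span {name}")

    tracer: Tracer = ConsoleTracer()  # swap for a platform-backed tracer here

    with tracer.start_span("agent.retrieve", query="what is MCP?"):
        pass  # the retrieval call would go here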


As you mentioned, this was a significant trade-off. We faced two choices:

(1) Stick with a single Docker container and Postgres. This option is simple to self-host, operate, and iterate on, but it suffers from poor performance at scale, especially for the analytical queries that become crucial as a project grows. Additionally, as more features emerged, we needed a queue and benefited from caching and asynchronous processing, which required splitting into a second container and adding Redis. These features would have been blocked had we stayed with this setup.

(2) Switch to a scalable setup with a robust infrastructure that enables us to develop features that interest the majority of our community. We have chosen this path and prioritized templates and Helm charts to simplify self-hosting. Please let us know if you have any questions or feedback as we transition to v3. We aim to make this process as easy as possible.

Regarding OTel, we are considering adding a collector to Langfuse, as the OTel semantic conventions are now developing well. The needs of the Langfuse community are evolving rapidly, and starting with our own instrumentation allowed us to move quickly while the semantic conventions were still immature. We are tracking this here and would greatly appreciate your feedback, upvotes, or any comments you have on this thread: https://github.com/orgs/langfuse/discussions/2509


So we are still on V2.7; it works pretty well for us. Haven't tried V3 yet, and we're not looking to upgrade. I think the next big feature set we are looking for is a prompt evaluation system.

But we are coming around to the view that it is a big enough problem to warrant a dedicated SaaS, rather than piggybacking on an observability SaaS. At NonBioS, we have very complex requirements, so we might just end up building it from the ground up.


He hinted at wanting to be convinced to work at the company. It seems as if he wants people to just know he's good and offer him the job, instead of him applying for it. It's a bit contradictory, since he applied for the job in the first place.


From this interview https://youtu.be/Nlkk3glap_U?feature=shared it seemed Anthropic was focusing more on those topics than on winning the race. He hinted at being "forced" by the competition to release their models.


I was confused for a bit but there is no relation to https://haystack.deepset.ai/


I shared the confusion; it's a bit of an unfortunate name considering the pretty mature and fleshed-out LLM framework of the same name.


Yeah I was excited at first since the framework is pretty solid and a vector DB from the same folks would have been interesting.


This technique had a very recent resurgence via https://txt.cohere.com/int8-binary-embeddings/. Hugging Face also covered the technique here: https://huggingface.co/blog/embedding-quantization. It seems like a very good trade-off compared to shorter embeddings, which require fine-tuning via the Matryoshka technique. On the other hand, Nils Reimers suggests that trivial quantization of the full-precision embeddings is not as good as using "compression-friendly" embeddings like Cohere's Embed V3. Does anyone know what the difference in precision is between trivial quantization and optimized embeddings?
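
For reference, "trivial quantization" here means something like the following sketch (the thresholds and calibration are illustrative, not Cohere's or Hugging Face's exact recipe):

    import numpy as np

    # Toy full-precision embeddings: 1000 vectors of dimension 384.
    emb = np.random.randn(1000, 384).astype(np.float32)

    # Binary quantization: keep only the sign of each dimension, packed
    # 8 dimensions per byte -> 1536 bytes become 48 bytes per vector.
    binary = np.packbits((emb > 0).astype(np.uint8), axis=1)

    # int8 quantization: map each dimension into [-127, 127] using per-dimension
    # calibration ranges (min/max here; quantiles are a common alternative).
    lo, hi = emb.min(axis=0), emb.max(axis=0)
    int8 = np.clip(np.round((emb - lo) / (hi - lo) * 254 - 127), -127, 127).astype(np.int8)

    # At search time, binary vectors are compared with Hamming distance
    # (XOR + popcount), optionally rescoring top candidates with int8/float32.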


Could you give more details/resources on the distributed in-cluster BuildKit cache?


We use Dagger’s implementation. The basic approach is to run a BuildKit engine as a DaemonSet on the cluster, with clients pointing at the same Docker socket that BuildKit uses. The magic is in cache synchronization, e.g. only lazily pulling layers as the client requests them. This is scalable, but since caching is hard, there are some complexities around efficiently synchronizing cache layers and cache volumes. This is currently a long-lived service that runs as a deployment alongside a bunch of ephemeral runners to manage the cache synchronization.

There are several other architectures, ranging from simpler to more complex. The architecture I recommend people start with is a single long-lived, beefy BuildKit instance that a bunch of runners share, since that is much, much simpler to implement. It of course has the downside that you have to refresh/rebuild the cache if the instance ever goes down. For runs that need read/write locks on volumes (e.g. a Gradle build cache), my recommendation after trial and error is to rsync those to the runners and then rsync them back after the run completes, so you don’t have a bunch of locks fighting each other for the same folder.
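
Roughly, that rsync pattern looks like this (the paths, flags, and Gradle invocation are illustrative, not the exact setup):

    import subprocess

    SHARED = "cache-host:/caches/gradle/"   # hypothetical shared cache location
    LOCAL = "/tmp/gradle-cache/"            # runner-local working copy

    def run(cmd):
        subprocess.run(cmd, check=True)

    # 1. Pull the shared cache onto this runner so the build only takes local locks.
    run(["rsync", "-a", SHARED, LOCAL])

    # 2. Run the build against the runner-local copy of the cache.
    run(["./gradlew", "build", "--gradle-user-home=" + LOCAL])

    # 3. Push updated entries back; --update avoids clobbering files that other
    #    runners wrote more recently.
    run(["rsync", "-a", "--update", LOCAL, SHARED])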


I would also like to know more


Have you tried the Arc browser? It embraces having lots of open tabs and organizing them as you wish (or not): https://arc.net/


I updated their slogan just now:

"Firefox is the Chrome replacement I've already been using."


Do you need to have the same number of positives and negatives? Is there any meaning to pairing a positive and a negative in the triplet?


It's because of the model's loss. I ask the model to produce a higher similarity between the query and the positive document than between the query and the negative document. I'll add more losses soon so there are more choices.
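
That objective is essentially a triplet/margin loss over similarities, which is also why positives and negatives come in matched pairs. A minimal sketch (the margin, cosine similarity, and shapes are illustrative, not necessarily what the library actually uses):

    import torch
    import torch.nn.functional as F

    def triplet_similarity_loss(query, positive, negative, margin=0.2):
        """Push sim(query, positive) above sim(query, negative) by at least `margin`."""
        pos_sim = F.cosine_similarity(query, positive, dim=-1)
        neg_sim = F.cosine_similarity(query, negative, dim=-1)
        return torch.relu(margin - (pos_sim - neg_sim)).mean()

    # Each triplet pairs one query with one positive and one negative document,
    # which is why the counts need to line up.
    q, p, n = (torch.randn(8, 256) for _ in range(3))
    loss = triplet_similarity_loss(q, p, n)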


Is the loss the usual LambdaRank?


Out of curiosity, how does it compare to YouTube’s own generated transcripts?


I would say that overall they are much, much better than the auto-generated ones from YouTube. If the speaker speaks incredibly clearly and slowly, without slang, etc., then the built-in ones are good enough. But in tougher situations, the biggest Whisper model achieves near-superhuman accuracy, which is way better.


I found the YouTube one to be really bad for my voice and the way I speak; Whisper does it perfectly (using the large model).

English is my second language, and I mumble.


While it seems YouTube's auto-generated transcripts are hit or miss, I wonder if feeding them through an LLM could fix the mistakes and still get the video's idea out of them.


I've found that to be the case. I typically don't want a full transcript -- I want the materials list, or a summary, or a counterargument. I've found it is totally sufficient to just plop the transcript into an LLM and ask for my desired output. No need to clean up the transcript ahead of time.
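
That workflow is basically this sketch (the model name, prompt, and file path are just examples; any chat-style LLM API works):

    from openai import OpenAI

    client = OpenAI()
    transcript = open("transcript.txt").read()  # raw auto-generated transcript, uncleaned

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; use whatever you have access to
        messages=[
            {"role": "system", "content": "You extract information from noisy video transcripts."},
            {"role": "user", "content": "List the materials mentioned in this transcript:\n\n" + transcript},
        ],
    )
    print(response.choices[0].message.content)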


Whisper is generally better than the one built into YouTube.

