Love to see people leveraging static analysis for AI agents. Similar to what we're doing in Brokk but we're more tightly coupled to our own harness. (https://brokk.ai/) Would love to compare notes; if you're interested, hmu at [username]@brokk.ai.
Quick comparison: Auditor does framework-specific stuff that Brokk does not, but Brokk is significantly faster (~1M loc per minute).
Would be really cool to compare notes :D Sent from a "non tech" company email so it doesn't get filtered lol.
My speed really depends on language and what needs indexing. On pure Python projects I get around 220k loc/min, but for deeper data flow in Node apps (TypeScript compiler overhead + framework extraction) it's roughly 50k loc/min.
Curious what your stack is and what depth you're extracting to reach 1M/min - those are seriously impressive numbers! :D
two great points here:
(1) quantization is how you speed up vector indexes, and
(2) how you build your graph matters much, much less*
These are the insights behind DiskANN, which has replaced HNSW in most production systems.
past that, well, you should really go read the DiskANN paper instead of this article; product quantization is way, way more effective than simple int8 or binary quant.
and if you want to skip forward several years to the cutting edge, check out https://arxiv.org/abs/2509.18471 and the references list for further reading
* but it still matters more than a lot of people thought circa 2020
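If you haven't seen product quantization before, here's a toy numpy/scikit-learn sketch of the core idea (my own illustration, nothing like a production DiskANN/faiss implementation): split each vector into subvectors, learn a small codebook per subspace, and store one byte per subvector.

    # Toy PQ: 128-dim vectors, 8 subspaces, 256 centroids per subspace
    # -> 8 bytes per vector instead of 512 bytes of fp32.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    train = rng.normal(size=(5_000, 128)).astype(np.float32)

    M, K = 8, 256
    subspaces = np.split(train, M, axis=1)          # eight 5_000 x 16 chunks
    codebooks = [KMeans(n_clusters=K, n_init=1, random_state=0).fit(s).cluster_centers_
                 for s in subspaces]                # M codebooks of shape (K, 16)

    def encode(x):
        # one uint8 code per subspace: index of the nearest centroid
        return np.array([np.argmin(((cb - c) ** 2).sum(axis=1))
                         for cb, c in zip(codebooks, np.split(x, M))], dtype=np.uint8)

    def adc_distance(query, codes):
        # asymmetric distance: exact query chunks vs reconstructed db chunks
        return sum(((cb[code] - qc) ** 2).sum()
                   for cb, code, qc in zip(codebooks, codes, np.split(query, M)))

    db_codes = np.array([encode(v) for v in train[:1000]])
    q = rng.normal(size=128).astype(np.float32)
    nearest = min(range(len(db_codes)), key=lambda i: adc_distance(q, db_codes[i]))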
Hi! I worked with product quantization in the past in the context of a library I released to read LLMs stored in llama.cpp format (GGUF). However, in the context of in-memory HNSWs, I found it to make a small difference: the recall is already almost perfect with int8. Of course it is very different if you are quantizing an actual neural network with, for instance, 4-bit quants; there it makes a huge difference. But in my use case I picked whichever was fastest, given that both performed equally well. What could potentially be done with PQ in the case of Redis Vector Sets is to make 4-bit quants work decently (though not as well as int8 anyway), but given how fat the data structure nodes are per se, I don't think this is a great tradeoff.
All this to say: the blog post mostly gives the conclusions, but to reach that design many things were tried, including things that looked cooler but in practice were not the best fit. It's not by chance that Redis HNSWs can easily do 50k full queries/sec on decent hardware.
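For anyone wondering what per-vector int8 quantization looks like in practice, here's a minimal sketch (my own illustration, not the actual Redis Vector Sets code): each vector carries its own scale, and similarity is computed on the int8 values and rescaled once.

    # Per-vector int8: each vector gets its own scale, no global statistics.
    import numpy as np

    def quantize_int8(v: np.ndarray):
        m = float(np.abs(v).max())
        scale = m / 127.0 if m > 0 else 1.0
        q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
        return q, scale

    def dot_int8(qa, sa, qb, sb):
        # integer multiply-accumulate, then a single rescale at the end
        return int(qa.astype(np.int64) @ qb.astype(np.int64)) * sa * sb

    rng = np.random.default_rng(1)
    a = rng.normal(size=256).astype(np.float32)
    b = rng.normal(size=256).astype(np.float32)
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    print(float(a @ b), dot_int8(qa, sa, qb, sb))   # the two values should be close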
if you're getting near-perfect recall with int8 and no reranking then you're either testing an unusual dataset or a tiny one, but if it works for you then great!
Near-perfect recall vs fp32, not in absolute terms: TL;DR, it's not int8 that ruins it, at least if the int8 quants are computed per-vector and not with global centroids. Also, recall is a rather illusory metric, but that's an argument for another blog post. In short, what really matters is that the best candidates are collected: the long tail is full of elements that are far enough away, or practically equivalent, anyway. And all of this happens under the illusion that the embedding model already captures the similarity our application demands, which is itself an illusion, so if the 60th result shows up 72nd, it normally does not matter. The reranking that really matters (if there is the ability to do it) is the LLM picking / reranking: that, yes, makes all the difference.
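For reference, "recall vs fp32" is usually measured something like this toy sketch (my own illustration): the ground truth is the exact fp32 top-k, and recall@k counts how much of it the quantized/ANN search also returned.

    # recall@k: fraction of the exact fp32 top-k that the approximate
    # search also returned, averaged over queries
    import numpy as np

    def recall_at_k(gt_ids: np.ndarray, ann_ids: np.ndarray, k: int) -> float:
        """gt_ids, ann_ids: (num_queries, >=k) arrays of result ids."""
        hits = [len(set(gt[:k]) & set(ann[:k])) for gt, ann in zip(gt_ids, ann_ids)]
        return float(np.mean(hits)) / k

    # e.g. if the exact 3rd result is missing from the approximate top-5,
    # recall@5 drops to 0.8, even though the swapped-in item may be
    # practically equivalent for the application
    gt  = np.array([[1, 2, 3, 4, 5]])
    ann = np.array([[1, 2, 9, 4, 5]])
    print(recall_at_k(gt, ann, k=5))   # 0.8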
I have an unpopular opinion: I simply do not read anything on Medium anymore. In fact, I have a uBlock rule that blocks the site so I don't accidentally go there or give them traffic anymore.
I saw go in the title so I just checked the HN comments first.
DeepSeek pioneered automatic prefix caching and caches on SSD. SSD reads are so fast compared to LLM inference that I can't think of a reason to waste RAM on it.
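The general idea is easy to sketch; this is my own toy version (hypothetical paths and block size, not DeepSeek's or any engine's actual implementation): hash each token-prefix block, persist the KV blocks to disk, and on a new request reuse whatever prefix is already cached.

    # Toy prefix cache: key KV blocks by a hash of the token prefix and keep
    # them on disk instead of RAM. Purely illustrative; a real serving engine
    # does this per fixed-size KV block with careful eviction and memory management.
    import hashlib, pickle
    from pathlib import Path

    import numpy as np

    CACHE_DIR = Path("/tmp/kv_prefix_cache")    # hypothetical location
    CACHE_DIR.mkdir(exist_ok=True)
    BLOCK = 64                                  # tokens per cached block

    def block_key(tokens: list[int]) -> str:
        return hashlib.sha256(np.array(tokens, dtype=np.int32).tobytes()).hexdigest()

    def save_block(tokens, kv: np.ndarray):
        (CACHE_DIR / block_key(tokens)).write_bytes(pickle.dumps(kv))

    def load_prefix(tokens):
        """Return (cached_kv_blocks, number_of_tokens_covered)."""
        blocks, covered = [], 0
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            path = CACHE_DIR / block_key(tokens[:end])
            if not path.exists():
                break
            blocks.append(pickle.loads(path.read_bytes()))
            covered = end
        return blocks, covered

    # usage: populate the cache for the first 128 tokens, then reuse it,
    # so only the uncached suffix needs a prefill pass
    tokens = list(range(200))
    fake_kv = np.zeros((2, BLOCK, 8, 64), dtype=np.float16)   # stand-in for real KV tensors
    save_block(tokens[:64], fake_kv)
    save_block(tokens[:128], fake_kv)
    blocks, covered = load_prefix(tokens)
    print(f"reuse {covered} tokens from SSD, prefill the remaining {len(tokens) - covered}")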