Love to see people leveraging static analysis for AI agents. Similar to what we're doing in Brokk but we're more tightly coupled to our own harness. (https://brokk.ai/) Would love to compare notes; if you're interested, hmu at [username]@brokk.ai.
Quick comparison: Auditor does framework-specific stuff that Brokk does not, but Brokk is significantly faster (~1M loc per minute).
Would be really cool to compare notes :D Sent from a "non tech" company email so it doesn't get filtered lol.
My speed really depends on language and what needs indexing. On pure Python projects I get around 220k loc/min, but for deeper data flow in Node apps (TypeScript compiler overhead + framework extraction) it's roughly 50k loc/min.
Curious what your stack is and what depth you're extracting to reach 1M/min - those are seriously impressive numbers! :D
two great points here:
(1) quantization is how you speed up vector indexes, and
(2) how you build your graph matters much, much less*
These are the insights behind DiskANN, which has replaced HNSW in most production systems.
past that, well, you should really go read the DiskANN paper instead of this article; product quantization is way, way more effective than simple int8 or binary quant.
and if you want to skip forward several years to the cutting edge, check out https://arxiv.org/abs/2509.18471 and the references list for further reading
* but it still matters more than a lot of people thought circa 2020
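If you haven't seen product quantization before, here's a toy numpy/scikit-learn sketch of the core idea (my own illustration, nothing like a production DiskANN/faiss implementation): split each vector into subvectors, learn a small codebook per subspace, and store one byte per subvector.

    # Toy PQ: 128-dim vectors, 8 subspaces, 256 centroids per subspace
    # -> 8 bytes per vector instead of 512 bytes of fp32.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    train = rng.normal(size=(5_000, 128)).astype(np.float32)

    M, K = 8, 256
    subspaces = np.split(train, M, axis=1)          # eight 5_000 x 16 chunks
    codebooks = [KMeans(n_clusters=K, n_init=1, random_state=0).fit(s).cluster_centers_
                 for s in subspaces]                # M codebooks of shape (K, 16)

    def encode(x):
        # one uint8 code per subspace: index of the nearest centroid
        return np.array([np.argmin(((cb - c) ** 2).sum(axis=1))
                         for cb, c in zip(codebooks, np.split(x, M))], dtype=np.uint8)

    def adc_distance(query, codes):
        # asymmetric distance: exact query chunks vs reconstructed db chunks
        return sum(((cb[code] - qc) ** 2).sum()
                   for cb, code, qc in zip(codebooks, codes, np.split(query, M)))

    db_codes = np.array([encode(v) for v in train[:1000]])
    q = rng.normal(size=128).astype(np.float32)
    nearest = min(range(len(db_codes)), key=lambda i: adc_distance(q, db_codes[i]))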
Hi! I worked with product quantization in the past in the context of a library I released to read LLMs stored in llama.cpp format (GGUF). However, in the context of in-memory HNSWs, I found it to make a small difference: the recall is already almost perfect with int8. Of course it is very different if you are quantizing an actual neural network with, for instance, 4-bit quants; there it makes a huge difference. But in my use case I picked whichever was fastest, given that both performed equally well. What could potentially be done with PQ in the case of Redis Vector Sets is to make 4-bit quants work decently (though not as well as int8 anyway), but given how fat the data structure nodes are per se, I don't think this is a great tradeoff.
All this to say: the blog post mostly gives the conclusions, but to reach that design many things were tried, including things that looked cooler but in practice were not the best fit. It's not by chance that Redis HNSWs can easily do 50k full queries/sec on decent hardware.
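For anyone wondering what per-vector int8 quantization looks like in practice, here's a minimal sketch (my own illustration, not the actual Redis Vector Sets code): each vector carries its own scale, and similarity is computed on the int8 values and rescaled once.

    # Per-vector int8: each vector gets its own scale, no global statistics.
    import numpy as np

    def quantize_int8(v: np.ndarray):
        m = float(np.abs(v).max())
        scale = m / 127.0 if m > 0 else 1.0
        q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
        return q, scale

    def dot_int8(qa, sa, qb, sb):
        # integer multiply-accumulate, then a single rescale at the end
        return int(qa.astype(np.int64) @ qb.astype(np.int64)) * sa * sb

    rng = np.random.default_rng(1)
    a = rng.normal(size=256).astype(np.float32)
    b = rng.normal(size=256).astype(np.float32)
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    print(float(a @ b), dot_int8(qa, sa, qb, sb))   # the two values should be close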
if you're getting near-perfect recall with int8 and no reranking then you're either testing an unusual dataset or a tiny one, but if it works for you then great!
Near-perfect recall vs fp32, not in absolute terms: TL;DR, it's not int8 that ruins it, at least if the int8 quants are computed per-vector and not with global centroids. Also, recall is a rather illusory metric, but that's an argument for another blog post. In short, what really matters is that the best candidates are collected: the long tail is full of elements that are far enough away, or practically equivalent, anyway. And all of this happens under the illusion that the embedding model already captures the similarity our application demands, which is itself an illusion, so if the 60th result shows up 72nd, it normally does not matter. The reranking that really matters (if there is the ability to do it) is the LLM picking / reranking: that, yes, makes all the difference.
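For reference, "recall vs fp32" is usually measured something like this toy sketch (my own illustration): the ground truth is the exact fp32 top-k, and recall@k counts how much of it the quantized/ANN search also returned.

    # recall@k: fraction of the exact fp32 top-k that the approximate
    # search also returned, averaged over queries
    import numpy as np

    def recall_at_k(gt_ids: np.ndarray, ann_ids: np.ndarray, k: int) -> float:
        """gt_ids, ann_ids: (num_queries, >=k) arrays of result ids."""
        hits = [len(set(gt[:k]) & set(ann[:k])) for gt, ann in zip(gt_ids, ann_ids)]
        return float(np.mean(hits)) / k

    # e.g. if the exact 3rd result is missing from the approximate top-5,
    # recall@5 drops to 0.8, even though the swapped-in item may be
    # practically equivalent for the application
    gt  = np.array([[1, 2, 3, 4, 5]])
    ann = np.array([[1, 2, 9, 4, 5]])
    print(recall_at_k(gt, ann, k=5))   # 0.8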
I have an unpopular opinion: I simply do not read anything on Medium anymore. In fact, I have a uBlock rule that blocks the site so I don't accidentally go there or give them traffic anymore.
I saw go in the title so I just checked the HN comments first.
DeepSeek pioneered automatic prefix caching and caches on SSD. SSD reads are so fast compared to LLM inference that I can't think of a reason to waste RAM on it.
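The general idea is easy to sketch; this is my own toy version (hypothetical paths and block size, not DeepSeek's or any engine's actual implementation): hash each token-prefix block, persist the KV blocks to disk, and on a new request reuse whatever prefix is already cached.

    # Toy prefix cache: key KV blocks by a hash of the token prefix and keep
    # them on disk instead of RAM. Purely illustrative; a real serving engine
    # does this per fixed-size KV block with careful eviction and memory management.
    import hashlib, pickle
    from pathlib import Path

    import numpy as np

    CACHE_DIR = Path("/tmp/kv_prefix_cache")    # hypothetical location
    CACHE_DIR.mkdir(exist_ok=True)
    BLOCK = 64                                  # tokens per cached block

    def block_key(tokens: list[int]) -> str:
        return hashlib.sha256(np.array(tokens, dtype=np.int32).tobytes()).hexdigest()

    def save_block(tokens, kv: np.ndarray):
        (CACHE_DIR / block_key(tokens)).write_bytes(pickle.dumps(kv))

    def load_prefix(tokens):
        """Return (cached_kv_blocks, number_of_tokens_covered)."""
        blocks, covered = [], 0
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            path = CACHE_DIR / block_key(tokens[:end])
            if not path.exists():
                break
            blocks.append(pickle.loads(path.read_bytes()))
            covered = end
        return blocks, covered

    # usage: populate the cache for the first 128 tokens, then reuse it,
    # so only the uncached suffix needs a prefill pass
    tokens = list(range(200))
    fake_kv = np.zeros((2, BLOCK, 8, 64), dtype=np.float16)   # stand-in for real KV tensors
    save_block(tokens[:64], fake_kv)
    save_block(tokens[:128], fake_kv)
    blocks, covered = load_prefix(tokens)
    print(f"reuse {covered} tokens from SSD, prefill the remaining {len(tokens) - covered}")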