Near perfect recall VS fp32, not in absolute terms: TLDR, it's not int8 to ruin ...

Near perfect recall VS fp32, not in absolute terms: TLDR, it's not int8 to ruin it, at least if the int8 quants are computed per-vector and not with global centroids. And also, recall is a very illusionary metric, but this is an argument for another blog post (In short, what really matters is that the best candidates are collected: the long tail is full of elements that are anyway far enough or practically equivalent, since this happens under the illusion that the embedding model already captures the similarity our application demands. This is, indeed, already an illusion, so if the 60th result is 72th, it normally does not matter. The reranking that really matters (if there is the ability to do that) is the LLM picking / reranking: that, yes, makes all the difference.