> It’s hard to find any accounts of code-search using FTS
I'm actually going to be doing this soon. I've thought about code search for close to a decade, but I walked away from it because there really isn't a business for it. However, now with AI, I'm more interested in using it to help find relevant context, and I have no reason to believe FTS won't work. In the past I used Lucene, but I'm planning on going all in with Postgres.
The magic of fast code search (and search in general) is keeping things small. As long as your search solution is context-aware, you can easily leverage Postgres sharding to keep index sizes down. I'm a strong believer in "disk space is cheap, time isn't", which means I'm not afraid to create as many indexes as required to shave hundreds of milliseconds off searches.
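For a sense of what I mean, a minimal sketch (every name here is made up, not a real schema): hash-shard by repo so each partition's FTS index stays small.

    -- Hash-partition so each GIN index covers only a slice of the corpus.
    CREATE TABLE code_files (
        repo_id  int  NOT NULL,
        path     text NOT NULL,
        body     text NOT NULL,
        body_tsv tsvector GENERATED ALWAYS AS
                 (to_tsvector('simple', body)) STORED
    ) PARTITION BY HASH (repo_id);

    CREATE TABLE code_files_p0 PARTITION OF code_files
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ...repeat for remainders 1 through 3.

    -- Creating the index on the parent cascades to every partition.
    CREATE INDEX ON code_files USING gin (body_tsv);

    -- 'simple' skips English stemming, which would mangle identifiers;
    -- note the default parser still splits foo_bar into two lexemes.
    SELECT path
    FROM   code_files
    WHERE  repo_id = 42
      AND  body_tsv @@ to_tsquery('simple', 'config');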
Mmm, it’s not that straightforward: indexes can vastly slow down large-scale ingest, so it’s really about when to index as well.
I work with a lot of multi-billion-row datasets, and a lot of my recent focus has been on developing strategies to avoid the slowdown on ingest while still enjoying the indexed speed-up on search.
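One common pattern (just one option, and all the names below are hypothetical): keep the bulk load unindexed, build the index once at the end, then swap the table in.

    BEGIN;

    CREATE TABLE events_staging (LIKE events INCLUDING DEFAULTS);

    -- Server-side COPY; use \copy from psql for a client-side file.
    COPY events_staging FROM '/data/events.csv' WITH (FORMAT csv);

    -- One bulk index build instead of millions of incremental updates.
    CREATE INDEX ON events_staging (created_at);

    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_staging RENAME TO events;

    COMMIT;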
I’ve also gotten some mind-boggling speed increases by summarizing key searchable data in smaller tables, some with JSONB columns that are abstractions of other data, indexing those, and using pg_prewarm to serve those tables purely from memory. I literally went from queries taking actual days to < 1 sec.
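Rough shape of that, with made-up names:

    -- Small, denormalized summary of much larger fact tables.
    CREATE TABLE account_summaries (
        account_id bigint PRIMARY KEY,
        facets     jsonb  -- e.g. {"status": "active", "regions": ["eu"]}
    );

    CREATE INDEX account_summaries_facets_idx
        ON account_summaries USING gin (facets);

    -- pg_prewarm loads the table and index into shared buffers up
    -- front, so lookups don't have to touch disk.
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('account_summaries');
    SELECT pg_prewarm('account_summaries_facets_idx');

    -- Containment queries (@>) are served by the GIN index.
    SELECT account_id
    FROM   account_summaries
    WHERE  facets @> '{"status": "active"}';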
Yeah, I agree. I've had a lot of practice so far coordinating across hundreds of thousands of tables to ensure ingestion/lookup stays fast. Everything boils down to optimizing for your query patterns.
I also believe in using what I call "compass tables" (like your summarization tables), which I guess are indexes of indexes.
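For anyone following along, a "compass table" in my usage is just a tiny lookup that tells you where to search before you search. A guess at the shape of it (names invented):

    -- One indexed lookup tells you which shard (physical table) holds
    -- a tenant's data, so you never scan the other shards at all.
    CREATE TABLE shard_compass (
        tenant_id  bigint PRIMARY KEY,
        shard_name text   NOT NULL  -- e.g. 'events_s17'
    );

    -- Application side: resolve the shard, then query only that table.
    SELECT shard_name FROM shard_compass WHERE tenant_id = 9001;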
Scaling databases is both oddly frustrating and rewarding. Getting that first query that executes at 10x the speed of the old one feels great. The week of agony that makes it possible…less so.
Fully agree. I do have to give hardware a lot of credit, though. With SSDs and now NVMe, fast random read/write speed is what makes a lot of this possible.