> It’s hard to find any accounts of code-search using FTS
I'm actually going to be doing this soon. I've thought about code search for close to a decade, but I walked away from it because there really isn't a business for it. However, now with AI, I'm more interested in using it to help find relevant context, and I have no reason to believe FTS won't work. In the past I used Lucene, but I'm planning on going all in with Postgres.
The magic of fast code search (and search in general) is keeping things small. As long as your search solution is context-aware, you can easily leverage Postgres sharding to keep index sizes down. I'm a strong believer in "disk space is cheap, time isn't", which means I'm not afraid to create as many indexes as required to shave hundreds of milliseconds off searches.
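For a sense of what I mean, a minimal sketch (every name here is made up, not a real schema): hash-shard by repo so each partition's FTS index stays small.

    -- Hash-partition so each GIN index covers only a slice of the corpus.
    CREATE TABLE code_files (
        repo_id  int  NOT NULL,
        path     text NOT NULL,
        body     text NOT NULL,
        body_tsv tsvector GENERATED ALWAYS AS
                 (to_tsvector('simple', body)) STORED
    ) PARTITION BY HASH (repo_id);

    CREATE TABLE code_files_p0 PARTITION OF code_files
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ...repeat for remainders 1 through 3.

    -- Creating the index on the parent cascades to every partition.
    CREATE INDEX ON code_files USING gin (body_tsv);

    -- 'simple' skips English stemming, which would mangle identifiers;
    -- note the default parser still splits foo_bar into two lexemes.
    SELECT path
    FROM   code_files
    WHERE  repo_id = 42
      AND  body_tsv @@ to_tsquery('simple', 'config');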
Mmm, it’s not that straightforward: indexes can vastly slow down large-scale ingest, so it’s really about when to index as well.
I work with a lot of multi-billion-row datasets, and a lot of my recent focus has been on developing strategies to avoid the slowdown on ingest while still enjoying the indexed speed-up on search.
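One common pattern (just one option, and all the names below are hypothetical): keep the bulk load unindexed, build the index once at the end, then swap the table in.

    BEGIN;

    CREATE TABLE events_staging (LIKE events INCLUDING DEFAULTS);

    -- Server-side COPY; use \copy from psql for a client-side file.
    COPY events_staging FROM '/data/events.csv' WITH (FORMAT csv);

    -- One bulk index build instead of millions of incremental updates.
    CREATE INDEX ON events_staging (created_at);

    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_staging RENAME TO events;

    COMMIT;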
I’ve also gotten some mind-boggling speed increases by summarizing key searchable data in smaller tables, some with JSONB columns that are abstractions of other data, indexing those, and using pg_prewarm to serve those tables purely from memory. I literally went from queries taking actual days to < 1 sec.
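Rough shape of that, with made-up names:

    -- Small, denormalized summary of much larger fact tables.
    CREATE TABLE account_summaries (
        account_id bigint PRIMARY KEY,
        facets     jsonb  -- e.g. {"status": "active", "regions": ["eu"]}
    );

    CREATE INDEX account_summaries_facets_idx
        ON account_summaries USING gin (facets);

    -- pg_prewarm loads the table and index into shared buffers up
    -- front, so lookups don't have to touch disk.
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('account_summaries');
    SELECT pg_prewarm('account_summaries_facets_idx');

    -- Containment queries (@>) are served by the GIN index.
    SELECT account_id
    FROM   account_summaries
    WHERE  facets @> '{"status": "active"}';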
Yeah, I agree. I've had a lot of practice so far coordinating across hundreds of thousands of tables to ensure ingestion/lookup stays fast. Everything boils down to optimizing for your query patterns.
I also believe in using what I call "compass tables" (like your summarization tables), which I guess are indexes of indexes.
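For anyone following along, a "compass table" in my usage is just a tiny lookup that tells you where to search before you search. A guess at the shape of it (names invented):

    -- One indexed lookup tells you which shard (physical table) holds
    -- a tenant's data, so you never scan the other shards at all.
    CREATE TABLE shard_compass (
        tenant_id  bigint PRIMARY KEY,
        shard_name text   NOT NULL  -- e.g. 'events_s17'
    );

    -- Application side: resolve the shard, then query only that table.
    SELECT shard_name FROM shard_compass WHERE tenant_id = 9001;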
Scaling databases is both oddly frustrating and rewarding. Getting that first query that executes at 10x the speed of the old one feels great. The week of agony that makes it possible…less so.
Fully agree. I do have to give hardware a lot of credit, though. With SSDs and now NVMe, fast random read/write speed is what makes a lot of this possible.