Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If a database is expected to be good enough to optimize query plans by default, wouldn't you expect it to be also good enough to optimize layout/indexing?


The problem is that indexes aren't free; an index can easily take up gigabytes of disk space, if not more. There are also performance considerations with respect to insert performance.

What a "good" index is depends on the use case; for example for a webapp a 0.25 second query that's run on every pageload is extremely slow, and creating an index for that is almost certainly a good trade-off. On the hand, it's usually fine if some analytical query that only your finance people need once a month runs for a minute; in that case, an index is probably a poor trade-off. Or: in my app one query takes about ~5 seconds for a rarely used feature; I could add an index to make it faster, but it would also eat up ~25G of disk space (last I checked, probably more now). The very small number of people using it can wait an extra few seconds.

It's not too hard to automatically create indexes; I'd say it's almost easy. But these kind of judgements based on use case isn't something the database can do, and you're likely to end up with dozens of semi-useless indexes eating up your disk space and insert performance.

This is probably one of those cases where automating things will lead to problems and confusion; it's probably better to spend time on tools to gain insight in what the database is doing, so a human can come along and do something that makes sense.


> But these kind of judgements based on use case isn't something the database can do, and you're likely to end up with dozens of semi-useless indexes eating up your disk space and insert performance

That is an empirical question and another comment points to this working just fine: https://news.ycombinator.com/item?id=31991469

Even ignoring that, creating indexes is a matter of tradeoffs - it's not the index creation that's difficult but the decision on whether the tradeoff is worth it. Seeing this, possible issues arising from automatic index creation can be mitigated by allowing the admin to set parameters that dictate where the tradeoff should be made (as a naive example: "I'm okay with using 10Gb of storage if it results in a 20% improvement for P95 queries on this table").

This is not an unreasonable idea and it sounds like it would greatly improve UX for devs working on median-sized DBs (people needing FAANG scale can manually tweak their DBs). Worst case, you can have a whitelist approach to automatic indexes, where the admin is shown index suggestions which require manual approval.


>for example for a webapp a 0.25 second query that's run on every pageload is extremely slow, and creating an index for that is almost certainly a good trade-off. On the hand, it's usually fine if some analytical query that only your finance people need once a month runs for a minute; in that case, an index is probably a poor trade-off.

But that should be automatable. If the DB sees frequent queries, it would create indexes, if those queries cease, it would drop them. If queries are infrequent, it would not create an index.


Would not want to be responsible for a database where some AI was vigorously creating and dropping indexes behind the scenes, just as I was trying to debug performance issues.


Traditionally query optimizers are focused on producing good and not disastrous plans. Not necessarily the best plan. This is the role of a query optimizer: do decently well and don't royally screw up. What makes optimizers to screw up is cardinality misestimations which leads to the wrong join order or wrong join types (nested loop join thinking that a subresult in a query plan is 3 rows where in reality it's 1M).

Optimizing physical layout presents new challenges:

- it takes time to build those layouts and consumes resources. nothing like adding a load to a database that's already struggling to keep up with the load :)

- you don't know what you are going to break

With all that said that's the future and the industry has to figure it out.


They're often terrible at optimizing query plans, especially when there's no index, since the index provides data about the data (cardinality, for example) into the query optimizer.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: