
Depends on the database. Microsoft SQL Server typically uses the PK as the clustered index, which dictates the order in which the table data is stored on disk. If your row PK is random, you're going to have write latency and a fragmented index.
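A toy illustration of why random clustered keys hurt: in a table stored in key order, every new row has to go onto the page whose key range covers it. The sketch below is a simplified model, not how SQL Server actually manages pages. It assumes rows packed `rows_per_page` to a page, ignores page splits, and uses a hypothetical helper `pages_touched` to count how many distinct pages a batch of inserts lands on.

```python
import bisect
import random


def pages_touched(existing_keys, new_keys, rows_per_page):
    """Count distinct pages hit by a batch of inserts into a key-ordered table.

    Simplified model (an assumption, not any engine's real layout): the
    existing rows are stored sorted by key, packed rows_per_page per page,
    and each new row is routed to the page whose key range covers it.
    Page splits and new-page allocation are ignored.
    """
    ordered = sorted(existing_keys)
    # The first key on each page acts as that page's lower bound.
    page_starts = ordered[::rows_per_page]
    touched = set()
    for key in new_keys:
        # Last page whose first key is <= key (keys below everything go to page 0).
        page = max(bisect.bisect_right(page_starts, key) - 1, 0)
        touched.add(page)
    return len(touched)


# Sequential keys: every new row lands on the last page.
seq = pages_touched(range(100_000), range(100_000, 101_000), rows_per_page=100)

# Random 64-bit keys: the same number of inserts scatters across the table.
rng = random.Random(42)
existing = [rng.getrandbits(64) for _ in range(100_000)]
new = [rng.getrandbits(64) for _ in range(1_000)]
rnd = pages_touched(existing, new, rows_per_page=100)
```

With sequential keys the 1,000 inserts all hit one "hot" page that is likely already cached; with random keys they spread over hundreds of pages, each of which may need to be read and dirtied.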


Right, but most databases that I've seen don't do it that way. If yours does, it's going to be a really, really bad time.


Even in non-clustered indexes (relevant to pretty much all SQL implementations, not just MS SQL Server), random keys can cause the index to balloon over time due to many page splits, requiring extra space or extra maintenance rounds (occasionally rearranging the indexes to optimise them, if the relevant data grows rapidly or changes often).
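The page-split bloat can be sketched with a toy model of an index's leaf level. This is an assumption-laden simplification, not any engine's actual algorithm: only leaf pages are modelled, a full page splits 50/50, and inserts past the end of the rightmost page start a fresh page instead (an append optimisation many engines apply to ever-increasing keys). The function name `leaf_pages_after_inserts` is hypothetical.

```python
import bisect
import random


def leaf_pages_after_inserts(keys, page_capacity):
    """Insert keys into a toy leaf level and return the resulting page count.

    Assumed rules (simplified, not a real engine):
    - each page holds up to page_capacity sorted keys;
    - a key past the end of a full rightmost page starts a new page;
    - any other insert into a full page splits it 50/50.
    """
    pages = [[]]   # each page is a sorted list of keys
    firsts = []    # firsts[j] is the first key of pages[j + 1], used for routing
    for key in keys:
        i = bisect.bisect_right(firsts, key)  # index of the page covering key
        page = pages[i]
        if len(page) < page_capacity:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Append case: ever-increasing keys just open a new rightmost page.
            pages.append([key])
            firsts.append(key)
            continue
        # 50/50 page split: move the upper half to a new page after this one.
        mid = len(page) // 2
        right = page[mid:]
        del page[mid:]
        pages.insert(i + 1, right)
        firsts.insert(i, right[0])
        bisect.insort(page if key < right[0] else right, key)
    return len(pages)


n, cap = 10_000, 100
seq_pages = leaf_pages_after_inserts(range(n), cap)

rng = random.Random(7)
rnd_pages = leaf_pages_after_inserts([rng.getrandbits(64) for _ in range(n)], cap)

seq_fill = n / (seq_pages * cap)  # fraction of page slots actually used
rnd_fill = n / (rnd_pages * cap)
```

Under these assumptions sequential keys pack every page full, while random keys leave pages noticeably emptier, so the same rows occupy more pages. That wasted space is what periodic index rebuilds or reorganisations reclaim.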

The effect is worsened by clustering, which SQL Server uses because it gives significant (or at least measurable) benefits for many data patterns: the wasted space from excess page splits makes your base data bigger, and your supplementary indexes bigger too, so those need reorganising more often as well.

A common answer is to use an int (or maybe bigint) for your internal keys and a UUID for anything external. That way you get the benefits of a UUID for external use (practically zero chance of collision in distributed systems, if appropriately random, and not potentially leaking information in some security contexts) but the efficiency of the integer otherwise. Or use a partially ordered UUID format, which balances the trade-offs slightly differently (benefit: a single value that avoids some of the UUID issues while keeping some of the benefits; drawback: it retains some of the disadvantages of UUIDs and potentially reduces some of the benefits).
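One example of a partially ordered format is UUIDv7 (RFC 9562), which puts a Unix millisecond timestamp in the most significant bits and fills the rest with random bits. The sketch below builds a UUIDv7-style value by hand with only the standard library; the helper name `uuid7_like` is hypothetical, and a production system would more likely use a library or database function for this. Because the timestamp leads, values generated later sort later, so inserts mostly land near the end of a clustered index, while the random tail keeps collisions unlikely.

```python
import os
import time
import uuid


def uuid7_like(unix_ms=None):
    """Build a time-ordered UUID using the UUIDv7 bit layout (RFC 9562).

    Layout, most significant bit first:
    48-bit Unix timestamp in ms | 4-bit version (7) | 12 random bits |
    2-bit variant (0b10) | 62 random bits.
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                  # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                 # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


earlier = uuid7_like(1_700_000_000_000)
later = uuid7_like(1_700_000_000_001)
```

Values from different milliseconds always compare in time order; values within the same millisecond are only randomly ordered in this sketch, which is one of the compromises mentioned above.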


InnoDB, the default storage engine of MySQL, uses a clustered index. Last I checked, MySQL is the most popular RDBMS in the world (by number of installations).

Clustered indexes make a lot of sense for many workloads; you just have to design your data model with them in mind.


The most deployed database these days is probably SQLite if you count semi-embedded contexts (i.e. mobile apps). Excluding that, it does look like MySQL is still on top, but Postgres is catching up fast. I completely forgot InnoDB uses clustered indexes by default. Regardless, I was referring to most implementations, not deployment numbers.



