
> Never ‘sharded’ before, no idea how that works.

Sharding sucks, but if your database can't fit on a single machine anymore, you do what you've got to do. The basic idea is that instead of keeping everything in one database on one machine (or, well, one redundant group of machines), you have some method to decide, for a given key, which database machine will have the data. Managing the split of data across different machines is, of course, tricky in practice, especially if you need to change the distribution in the future.
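
As a sketch, here's hash-based routing, one common way to do the "some method" part (server names are made up for illustration):

    import hashlib

    # Hypothetical shard list; in practice these would be connection
    # strings or client handles, one per database machine.
    SHARDS = ["db-0.internal", "db-1.internal", "db-2.internal", "db-3.internal"]

    def shard_for_key(key: str) -> str:
        """Map a key (e.g. a user id) to the machine that owns its data."""
        # Hash the key so keys spread evenly, then pick one of the N shards.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # Every read and write for "user:12345" goes to the same machine.
    print(shard_for_key("user:12345"))

The tricky part mentioned above is baked into that modulo: add or remove a shard and nearly every key maps somewhere new, which is why resharding hurts and why schemes like consistent hashing exist.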

OTOH, Supermicro sells dual processor servers that go up to 8 TB of ram now; you can fit a lot of database in 8 TB of ram, and if you don't keep the whole thing in ram, you can index a ton of data with 8 TB of ram, which means sharding can wait. In contrast, eBay had to shard because a Sun e10k, where they ran Oracle, could only go to 64 GB of ram, and they had no choice but to break up into multiple databases.



> you have some method to decide for a given key what database machine will have the data

Super simple example: splitting the phone book into two volumes, A-K and L-Z. (Hmmmm, is a "phonebook" a thing that typical HN readers remember?)
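
In code, the same idea as a hypothetical range-based lookup (made-up server names, just a sketch):

    # Phone-book-style split: names starting A-K on one server, L-Z on another.
    RANGES = [
        ("A", "K", "volume-1.internal"),
        ("L", "Z", "volume-2.internal"),
    ]

    def shard_for_name(last_name: str) -> str:
        first_letter = last_name[0].upper()
        for low, high, server in RANGES:
            if low <= first_letter <= high:
                return server
        raise ValueError(f"no shard covers {last_name!r}")

    print(shard_for_name("Nakamoto"))  # -> volume-2.internal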

> you can fit a lot of database in 8 TB of ram, and if you don't keep the whole thing in ram, you can index a ton of data with 8 TB of ram, which means sharding can wait.

For almost everyone, sharding can wait until after the business doesn't need it any more. FAANG need to shard. Maybe a few thousand other companies need to shard. I suspect way, way more businesses start sharding when, realistically, spending more on suitable hardware would easily cover the next two orders of magnitude of growth.

One of these boxes maxed out will give you a few TB of ram, 24 cpu cores, and 24x16TB NVMe drives, which works out to 380-ish TB of fairly fast database, for around $135k - and you'd want two for redundancy. So maybe 12 months' worth of a senior engineer's time.

https://www.broadberry.com/performance-storage-servers/cyber...


> So maybe 12 months worth of a senior engineer's time.

In America. Where salaries are 2-3 times lower, people spend more time to use less hardware.


Sharding does take more time, but it doesn't save that much in hardware costs. Maybe you can save money with two 4TB-ram servers vs one 8TB-ram server, because the highest-density ram tends to cost more per byte, but you also have to buy a whole second system. And that second system has follow-on costs: now you're using more power, twice the switch ports, etc.

There's also a price breakpoint for single socket vs dual socket. Or four vs two, if you really want to spend money. My feeling is that currently single-socket Epyc looks nice if you don't use a ton of ram, but dual socket is still decently affordable if you need more cores or more ram, and probably for Intel servers too; quad socket adds a lot of expense and probably isn't worth it.

Of course, if time is cheap and hardware isn't, you can spend more time on reducing data size, profiling to find optimizations, etc.


Fair points. I'm just trying to push back a bit against "optimizing anything is useless since the main cost is engineering and not hardware", since this depends on local salaries, and in low-income countries the opposite can be true.



