
> Never ‘sharded’ before, no idea how that works.

Sharding sucks, but if your database can't fit on a single machine anymore, you do what you've got to do. The basic idea is that instead of keeping everything in one database on one machine (or, well, one redundant group of machines), you have some method to decide, for a given key, which database machine will have the data. Managing the split of data across different machines is, of course, tricky in practice, especially if you need to change the distribution in the future.
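
As a sketch, here's hash-based routing, one common way to do the "some method" part (server names are made up for illustration):

    import hashlib

    # Hypothetical shard list; in practice these would be connection
    # strings or client handles, one per database machine.
    SHARDS = ["db-0.internal", "db-1.internal", "db-2.internal", "db-3.internal"]

    def shard_for_key(key: str) -> str:
        """Map a key (e.g. a user id) to the machine that owns its data."""
        # Hash the key so keys spread evenly, then pick one of the N shards.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # Every read and write for "user:12345" goes to the same machine.
    print(shard_for_key("user:12345"))

The tricky part mentioned above is baked into that modulo: add or remove a shard and nearly every key maps somewhere new, which is why resharding hurts and why schemes like consistent hashing exist.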

OTOH, Supermicro sells dual processor servers that go up to 8 TB of ram now; you can fit a lot of database in 8 TB of ram, and if you don't keep the whole thing in ram, you can index a ton of data with 8 TB of ram, which means sharding can wait. In contrast, eBay had to shard because a Sun e10k, where they ran Oracle, could only go to 64 GB of ram, and they had no choice but to break up into multiple databases.



> you have some method to decide for a given key what database machine will have the data

Super simple example: splitting the phone book into two volumes, A-K and L-Z. (Hmmmm, is a "phonebook" a thing that typical HN readers remember?)
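
In code, the same idea as a hypothetical range-based lookup (made-up server names, just a sketch):

    # Phone-book-style split: names starting A-K on one server, L-Z on another.
    RANGES = [
        ("A", "K", "volume-1.internal"),
        ("L", "Z", "volume-2.internal"),
    ]

    def shard_for_name(last_name: str) -> str:
        first_letter = last_name[0].upper()
        for low, high, server in RANGES:
            if low <= first_letter <= high:
                return server
        raise ValueError(f"no shard covers {last_name!r}")

    print(shard_for_name("Nakamoto"))  # -> volume-2.internal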

> you can fit a lot of database in 8 TB of ram, and if you don't keep the whole thing in ram, you can index a ton of data with 8 TB of ram, which means sharding can wait.

For almost everyone, sharding can wait until after the business doesn't need it any more. FAANG need to shard. Maybe a few thousand other companies need to shard. I suspect way, way more businesses start sharding when, realistically, spending more on suitable hardware would easily cover the next two orders of magnitude of growth.

One of these boxes maxed out will give you a few TB of ram, 24 cpu cores, and 24x16TB NVMe drives, which works out to 380-ish TB of fairly fast database, for around $135k - and you'd want two for redundancy. So maybe 12 months' worth of a senior engineer's time.

https://www.broadberry.com/performance-storage-servers/cyber...


> So maybe 12 months worth of a senior engineer's time.

In America. Where salaries are 2-3 times lower, people spend more time to use less hardware.


Sharding does take more time, but it doesn't save that much in hardware costs. Maybe you can save money with two 4TB-ram servers vs one 8TB-ram server, because the highest-density ram tends to cost more per byte, but you also have to buy a whole second system. And that second system has follow-on costs: now you're using more power, twice the switch ports, etc.

There's also a price breakpoint for single socket vs dual socket. Or four vs two, if you really want to spend money. My feeling is that currently single-socket Epyc looks nice if you don't use a ton of ram, but dual socket is still decently affordable if you need more cores or more ram, and probably for Intel servers too; quad socket adds a lot of expense and probably isn't worth it.

Of course, if time is cheap and hardware isn't, you can spend more time on reducing data size, profiling to find optimizations, etc.


Fair points. I'm just trying to push back a bit against "optimizing anything is useless since the main cost is engineering and not hardware", since this depends on local salaries, and in low-income countries the opposite can be true.



