I disagree that it's bad, it's a choice. You can't protect against everything. T...

Dylan16807 · 2025-12-21T17:30:23 1766338223

Power outages across big areas are common enough.

It's downright stupid if you build a system that loses all existing data when all nodes go down uncleanly, not even simultaneously but just overlapping. What if you just happen to input a shutdown command the wrong way?

I really hope they meant to just say the write buffer gets lost.

rakoo · 2025-12-23T09:00:55 1766480455

That's why you need to go to other regions, not remain in the same area. Putting all your eggs in one basket (single area) _is_ stupid. Having a single shutdown command for the whole cluster _is_ stupid. Still accepting writes when the system is in a degraded state _is_ stupid. Don't make it sound worse than it actually is just to prove your point.

Dylan16807 · 2025-12-23T10:58:28 1766487508

> Still accepting writes when the system is in a degraded state _is_ stupid.

Again, I'm not concerned about new writes, I'm concerned about all the existing data from the previous months and years.

And getting into this situation takes only one of two things: a wide outage, or a bad push that takes down the cluster. Even if that's stupid, it's a common enough stupid that you should never risk your data on the certainty you won't make that mistake.

You can't protect against everything, but you should definitely protect against unclean shutdown.

rakoo · 2025-12-23T17:15:01 1766510101

If it's a common enough occurrence to have _all_ your nodes down at the same time, maybe you should reevaluate your deployment choices. The whole point of multi-node clustering is that _some_ of the nodes will always be up and running; otherwise what you're doing is useless.

Also, Garage lets you automatically snapshot the metadata, and gives advice on how to do the snapshotting at the filesystem level and how to restore from it.

Dylan16807 · 2025-12-24T03:59:21 1766548761

All nodes going down doesn't have to be common to make that much data loss a terrible design. It just has to be reasonably possible. And it is. Thinking your nodes will never go down together is hubris. Admitting the risk is being realistic, not something that makes the system useless.

How do filesystem level snapshots work if nodes might get corrupted by power loss? Booting from a snapshot looks exactly the same to a node as booting from a power loss event. Are you implying that it does always recover from power loss and you're defending a flaw it doesn't even have?

rakoo · 2025-12-24T20:11:36 1766607096

No, the snapshotting and restore are manual.