If an industry leader with "the brightest software engineers in the world" is unable to write its software to a higher-than-average standard, then I think I am in a very reasonable position to point that out and start a discussion about it. Don't place me in an ivory tower for that.
Why should we just sweep this under the rug and call it a day? There is something very important that can be learned from this. Apparently, FB themselves have not learned, because this is not the first time this has happened. Why was this allowed to happen more than once? Did Apple, Google, Spotify, Facebook, etc. not take any action?
"Shit happens" is one way to look at it. Another way is to question whether we as software engineers are really doing our job properly.
Why are you so angry about Facebook, Apple, Google, and Spotify having a tiny outage? They are not life-saving services, and they only cause a mild inconvenience when they fail, which practically never happens for most people, especially compared to "average software".
> FB themselves have not learned, because this is not the first time this has happened.
They have tens of thousands of engineers, likely split into hundreds of different teams/microservices focussing on different parts of their software stack. A ton of them are new to it, and nobody knows every part of the stack, so shit can happen.
What is the biggest engineering organization size you have worked for and what was your uptime?
> Why are you so angry about Facebook, Apple, Google, and Spotify having a tiny outage?
The linked issue makes it very clear that the "tiny outage" affects literally every app that merely links the Facebook SDK, for a frankly dumb reason (static initialization) that they should have learned from the last time it happened.
It's not a SaaS. "Uptime" for a linked library is stupid.
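To make the static initialization point concrete, here is a minimal C++17 sketch. It is not the SDK's actual code; `parse_config`, `load_config`, and `fetch_payload` are hypothetical names I made up for illustration. The idea is that fallible work done in a static initializer runs as soon as the library is loaded, before the host app can guard against it, whereas deferring that work and containing the failure lets the library disable itself instead of taking the app down.

```cpp
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for parsing a remote configuration payload.
// Throws when the payload does not have the expected shape.
int parse_config(const std::string& payload) {
    if (payload.empty() || payload[0] != '{') {
        throw std::runtime_error("malformed config");
    }
    return static_cast<int>(payload.size());
}

// Risky pattern (left commented out; fetch_payload is hypothetical):
// a namespace-scope object whose initializer does fallible work.
// It runs before main(), so an exception escaping from it calls
// std::terminate and the host app never gets a chance to catch it.
// static const int g_config = parse_config(fetch_payload());

// Safer pattern: run the fallible work only when asked, and contain
// the failure so the caller can carry on without the feature.
std::optional<int> load_config(const std::string& payload) {
    try {
        return parse_config(payload);
    } catch (const std::exception& e) {
        std::cerr << "SDK disabled: " << e.what() << '\n';
        return std::nullopt;
    }
}
```

With the second pattern, a bad payload leaves the caller with an empty optional and a log line rather than a crash at launch.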