
As quickly as possible? This is the same Xen issue that made Amazon reboot their EC2 instances, and Amazon announced it significantly before Rackspace did. Both RS and Amazon have the info from the same source -- the Xen vulnerability pre-disclosure list, which keeps this issue (XSA-108) under wraps until Oct 1st so that major cloud providers can apply the patch first.


Rackspace may have thought that they could mitigate the issue without taking this drastic step. They certainly didn't want to do global reboots.

The engineering team probably spent some time running tests and scribbling on whiteboards, trying to prove that the boat wasn't going to sink. In hindsight, they should have just sounded the klaxon and started handing out life jackets, but you know what they say about hindsight. And there are lots of reasons why the typical engineering organization struggles to accept the inevitable and call for an evacuation. Nobody likes Cassandra. Everybody wants to be a hero. Didn't you say this boat was unsinkable? It's hard to get all the decision-makers into one room. The show must go on. It isn't obvious that this complicated problem leads to our certain doom. Et cetera.

The key to making these things go smoothly is the Chaos Monkey, a.k.a. "conduct constant drills of your emergency responses". If you don't rehearse the response, you shy away from trying it. AWS halts or reboots EC2 instances all the time, and lo and behold, when it comes time to reboot all EC2 instances they don't flinch. Or they flinch less visibly, anyway.



