Slashdot Mirror


Amazon Explains Why S3 Went Down

Angostura writes "Amazon has provided a decent write-up of the problems that caused its S3 storage service to fail for around 8 hours last Sunday. It provides a timeline of events, the immediate action taken to fix it (they pulled the big red switch), and what the company is doing to prevent a recurrence. In summary: a random bit got flipped in one of the server state messages that the S3 machines continuously pass back and forth. There was no checksum on these messages, and the erroneous information was propagated across the cloud, causing so much inter-server chatter that no customer work got done."
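
To illustrate the failure mode in the summary, here is a minimal sketch of how a per-message checksum lets a receiver reject a state message with a flipped bit instead of passing it along. This is not Amazon's code or wire format: the framing (payload followed by a 4-byte CRC32 trailer) and the names `seal`, `unseal`, and `flip_bit` are assumptions made up for the example.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC32 of the payload (hypothetical framing)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unseal(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the CRC32 trailer does not match."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a checksum")
    payload, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != expected:
        # A corrupted message is dropped here rather than gossiped onward.
        raise ValueError("checksum mismatch; dropping message")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit corruption at the given bit position."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    frame = seal(b"node-17 state=healthy")  # made-up state message
    try:
        unseal(flip_bit(frame, 5))
    except ValueError as err:
        print("rejected:", err)
```

CRC32 is chosen here only because it is in the Python standard library and catches any single-bit error; it guards against accidental corruption, not deliberate tampering.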

2 of 114 comments

  1. Re:for want of a nail ... by Daimanta · Score: 5, Funny

    It was the evil bit...

    --
    Knowledge is power. Knowledge shared is power lost.
  2. Re:Other companies could learn from this... by Anonymous Coward · Score: 5, Funny

    Other companies could learn something from this, unfortunately they won't be able to do anything similar as Amazon has patented the process of explaining technological problems to customers.