Slashdot Mirror


Amazon EC2 Crash Caused Data Loss

Relayman writes "Henry Blodget is reporting that the recent EC2 crash caused permanent data loss. Apparently, the backups that were being made were not sufficient to recover the lost data. Although a small percentage of the total data was lost, any data loss can be bad to a Website operator."

19 of 112 comments

  1. I am not rightly able to comprehend... by Man+On+Pink+Corner · · Score: 5, Insightful

    ... the confusion of ideas that would lead someone to treat their live web server as their primary/master data repository.

    I guess I'm still stuck in Commodore 64 World, or something..

    1. Re:I am not rightly able to comprehend... by obarthelemy · · Score: 2

      I'm not so sure about rigorous...
      1- I personally have never lost a single byte of meaningful data
      2- do amazon detail their exact procedures and commitments ?
      3- do amazon backup those "commitments" with hard cash ? How much will the people whose data they lost be compensated ?

      read the sig....

      --
      The Cloud - because you don't care if your apps and data are up in the air.
    2. Re:I am not rightly able to comprehend... by MichaelSmith · · Score: 4, Informative

      It took something pretty catastrophic to bring it down and cause data loss

      Catastrophic would be an earthquake, tsunami and meltdown, in that order. From my reading of the situation Amazon stuffed up their own replication mechanism and it recursively replicated the system to fill up the available hardware. That's just bad design. It's obvious they did no testing under realistic conditions.

    3. Re:I am not rightly able to comprehend... by greenbird · · Score: 2

      well, a data center run by Amazon certainly has more rigorous backup and maintenance schedules than anything I could personally come up with

      It's funny. Not a single place I've worked at has had backups as good as the ones I have for my personal stuff. And I didn't even spend 6 figures for some useless enterprise backup solution. Some scripting, cp -al, rsync, dmcrypt, ssh and a remote PC at my girlfriend's house and you have an incremental backup solution more secure and more robust than any enterprise solution I've ever seen, and it only cost a couple hundred for the drives.
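
      The hard-link trick behind "cp -al" plus rsync can be sketched roughly as below. This is only an illustration of the idea, not the poster's actual scripts (which are not shown): the function name snapshot is hypothetical, and the encryption and ssh transfer steps are left out. Files that have not changed since the previous snapshot are hard-linked, so each snapshot looks complete but only changed files take new disk space.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, previous: Optional[Path], target: Path) -> None:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    Hypothetical helper for illustration; `previous` and `target` must be on
    the same filesystem for hard links to work.
    """
    target.mkdir(parents=True, exist_ok=True)
    for src_file in source.rglob("*"):
        rel = src_file.relative_to(source)
        dst = target / rel
        if src_file.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            # Unchanged since the last snapshot: share the same data on disk.
            os.link(old, dst)
        else:
            # New or modified: store a fresh copy (keeps timestamps).
            shutil.copy2(src_file, dst)
```

      Calling snapshot(src, None, first) creates a full copy; later calls such as snapshot(src, first, second) only store what changed, and deleting an old snapshot never breaks a newer one.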

      --
      Who is John Galt?
    4. Re:I am not rightly able to comprehend... by jpapon · · Score: 2
      Until she dumps you and throws your backup drives out her window that is. Tying the security of your backup to the security of your relationship is an interesting gamble. One day you might find yourself lonely AND data-less.

      Unless of course you're one of those people who refers to female friends as "girlfriends", in which case, I hate you.

      --
      -- Let us endeavor so to live that when we pass even the undertaker shall be sorry. -- M. Twain
    5. Re:I am not rightly able to comprehend... by Yvanhoe · · Score: 4, Funny

      That is still better than Amazon's plan actually.

      --
      The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    6. Re:I am not rightly able to comprehend... by wvmarle · · Score: 4, Informative

      From a look at the linked article, it seems that one of the issues is data generated by these web sites, such as user statistics or user-uploaded content. That naturally lives primarily on the live web server and is also data that you don't want to lose. Also, as other commenters mentioned, the EC2 service is not a cloud-storage service; it's a web hosting service, and web hosts do tend to generate their own data.

      This data of course needs to be actively backed up, and one would expect a web host to include that in its service. That's one of the reasons to pay for such a service, instead of doing it yourself.

      Besides relying on their backups it's of course a good idea to regularly take backups yourself. But even if you do this daily, it means you may lose up to a day's worth of data. And that's (partly) what happened here. It's similar to someone who takes a photo on a digital camera, and subsequently loses that camera and the photo with it. You don't say "they shouldn't use a camera as primary data repository". It isn't. It's a temporary repository, and when the data is generated it's the one and only repository, simply pending copying to backup media.

    7. Re:I am not rightly able to comprehend... by jc2brown · · Score: 2

      You might want to read this.

      They're crediting all accounts that had any activity in the USA-East region for 10 days of usage, regardless of whether they were affected.

      Remember that it was EC2 that was affected, which is just a virtual machine with volatile storage. Had it been S3 data that was lost one should expect restitution, but in this case downtime and data loss is ultimately the fault of the user.

    8. Re:I am not rightly able to comprehend... by teh+kurisu · · Score: 3, Interesting

      That depends. Only a couple of our servers in that availability zone were actually affected, but we're apparently being compensated as though all of them were. Bonus for us.

    9. Re:I am not rightly able to comprehend... by darkpixel2k · · Score: 2

      1- I personally have never lost a single byte of meaningful data

      Yep--the moment I accidentally 'rm -rf /', I simply re-classify the drive as 'not containing meaningful data' and my stats are saved.

      --
      There's no place like ::1 (I've completed my transition to IPv6)
  2. Lost data? by DWMorse · · Score: 2

    Was the lost data... all the stuff the PSN network lost? I think I see a connection!

    --
    There's a spot in User Info for World of Warcraft account names? Really?
  3. What is S3? by badran · · Score: 5, Informative

    EC2 is not meant to be used for data storage, that is what S3 is designed for. You store data and backups on S3, and use EC2 to serve high bandwidth websites to the masses.

  4. Re:Clouds are ephemeral by mini+me · · Score: 5, Informative

    Cloud applications hosted on Amazon survived this incident without issue, as expected. Only the regular old hosted applications had problems with the outage. They were never "the cloud" to begin with, so I'm not sure why the term even comes up in this discussion.

    The cloud represents a black box that hides the underlying network topology so that there are no single points of failure. Cloud applications are fault-tolerant because they are spread through different datacenters at multiple points in the world. A catastrophe at one or more datacenters will have no noticeable effect on the availability of a cloud application because it continues to run in many more.

    Amazon offers a few cloud applications: S3 comes to mind. But Amazon's EC2/EBS hosting service is a plain old hosting service like any other. The EC2 topology is not hidden away from you. You have to make active decisions about where you want your EC2 instance to live. That goes against the idea of the cloud. What Amazon does offer in EC2 are the tools necessary for you to build a cloud application, but not everything hosted on EC2 is a cloud application by default.

  5. Did this save Wikileaks? by kulnor · · Score: 2, Funny

    Guess Wikileaks feels good about not being hosted there anymore.... their critical information could have been "lost" as well....

  6. Re:The Cloud Is Dead by inputdev · · Score: 2

    I think people miss the point of the cloud - saying the cloud is worthless because it "brings people that would otherwise have nothing against you trying to take down your server" is like saying that the internet is worthless because it opens up security risks.
    I for one am glad to be connected, and obviously so are many others. Don't use services that aren't good for you - there are some cloud based services that are great, and some that aren't. It's pretty clear that in the future, things will be more connected, not less - adapt and take advantage of the good parts, the rest will fade anyway.

  7. Re:Availability zones by nereid666 · · Score: 2

    From: http://aws.amazon.com/es/ec2/
    Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location.

    Rather than using a different region, I think it is better to have multiple cloud providers...
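
    As a rough illustration of why spreading instances across locations helps, here is a minimal sketch. It does not call any Amazon API; the zone labels and both function names are made up for the example, and the same reasoning applies if the "zones" are separate regions or separate providers.

```python
from itertools import cycle


def spread_across_zones(instance_names, zones):
    """Assign instances to zones round-robin; returns {instance: zone}."""
    if not zones:
        raise ValueError("at least one zone is required")
    return dict(zip(instance_names, cycle(zones)))


def survivors(placement, failed_zone):
    """Return the instances still running after one zone fails."""
    return sorted(name for name, zone in placement.items() if zone != failed_zone)


# Hypothetical zone labels; four web servers split over two locations.
placement = spread_across_zones(["web1", "web2", "web3", "web4"], ["zone-a", "zone-b"])
print(survivors(placement, "zone-a"))  # ['web2', 'web4']
```

    With everything in a single zone, the same failure would leave no survivors at all.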

    --
    Damia
  8. Post-mortem Amazon explanation by nereid666 · · Score: 5, Informative

    Post-mortem Amazon explanation:
    http://aws.amazon.com/message/65648/

    --
    Damia
  9. Store a backup yourself by olau · · Score: 2

    This is not the first time I've heard about a big hosting centre losing data even though it never happens, and they are keeping backups, etc.

    If it's at all manageable, keep one copy safe at your own place in addition to the replication at the hosting centre. You can set up a cheap box at the office with a couple of terabytes of disk space and suck down the data periodically with something like rsync and rdiff-backup. It's not a whole lot of work and can make the difference between having a big problem and total disaster.

    It would help if hosting centres actually told you how exactly they store and back up your data and what they do in case of emergency instead of throwing meaningless phrases like "99.999% uptime!" and "fully redundant storage backbone!" at you. Fully redundant storage backbone is nothing if it means it's built with some big arse proprietary SAN stuff where the whole array goes down if the main controller goes down. Which it of course does because it's a flaky embedded thing with 2k memory that has to be programmed in assembler and C with dangling memory pointers all over the place.
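
    The periodic pull described above could look something like this minimal sketch, meant to be run from cron or a similar scheduler on the office box. The host name and paths are placeholders, the function names are hypothetical, and it assumes rsync and key-based ssh access are already set up. It covers only the rsync half; keeping older versions (the rdiff-backup part) is not shown.

```python
import subprocess


def build_pull_command(remote, local_dir):
    """Build an rsync command that pulls `remote` into `local_dir` over ssh."""
    # -a: archive mode (recursive, keeps permissions and timestamps)
    # -z: compress data in transit
    # -e ssh: run the transfer over ssh
    # A trailing slash on `remote` copies the directory's contents.
    return ["rsync", "-az", "-e", "ssh", remote, local_dir]


def pull(remote, local_dir):
    """Run the pull; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(build_pull_command(remote, local_dir), check=True)


if __name__ == "__main__":
    # Placeholder host and paths; replace with your own before running.
    print(build_pull_command("backup@example.com:/var/www/", "/srv/backup/www/"))
```

    The command deliberately leaves out --delete, so a file wiped on the server is not also wiped from the local copy on the next run.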

  10. Clarification by Mascot · · Score: 2

    The durability you quote for S3 (99.99%) is for the reduced redundancy option. The standard storage lists 99.999999999% durability.
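
    To put the two figures side by side, here is a quick back-of-the-envelope calculation. It assumes the percentages are per-object durability over one year, and the function name is just for illustration.

```python
from decimal import Decimal


def expected_annual_losses(object_count, durability_percent):
    """Expected number of objects lost per year at a given durability.

    `durability_percent` is passed as a string (e.g. "99.99") so the
    arithmetic stays exact instead of picking up floating-point error.
    """
    loss_probability = Decimal(1) - Decimal(durability_percent) / Decimal(100)
    return Decimal(object_count) * loss_probability


# Storing 10,000 objects:
reduced = expected_annual_losses(10_000, "99.99")
standard = expected_annual_losses(10_000, "99.999999999")
print(reduced)   # about 1 object per year
print(standard)  # about 0.0000001 objects per year
```

    Under that assumption, 10,000 objects at 99.99% means losing about one object a year on average, while at 99.999999999% it means about one object every ten million years.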