Slashdot Mirror


The Sidekick Failure and Cloud Culpability

miller60 writes "There's a vigorous debate among cloud pundits about whether the apparent loss of all Sidekick users' data is a reflection on the trustworthiness of cloud computing or simply another cautionary tale about poor backup practices. InformationWeek calls the incident 'a code red cloud disaster.' But some cloud technologists insist data center failures are not cloud failures. Is this distinction meaningful? Or does the cloud movement bear the burden of fuzzy definitions in assessing its shortcomings as well as its promise?"

30 of 246 comments

  1. Management by FredFredrickson · · Score: 3, Interesting

    It's usually a decision on management's side to not use best practices, despite warnings from the tech dept.

    tldr; There's nothing wrong with the technology, just the greedy bastards using it.

    --
    Belief? Hope? Preference? The Existential Vortex
    1. Re:Management by sopssa · · Score: 5, Insightful

      As always, cloud computing/hosting/whatever is a vague term used like any other buzz term. I just see it as a platform where the resources should be allocated automatically and the underlying system takes care of having those available.

      The same failure points are there. You're just putting the trust and management to someone else. Even if they do have backup plans and certain levels of redundancy, it can always fail. Cloud computing isn't something magical.

      “Similarly datacenters fail, get disconnected, overheat, flood, burn to the ground and so on, but these events should not cause any more than a minor interruption for end users. Otherwise how are they different from ‘legacy’ web applications?”

      That's because they aren't. The system is just managed by someone else, and it's managed for thousands of people at the same time so it's cheaper. Kind of like what Akamai has been doing for a long time with their content delivery network - it's cheaper for the providers because they don't have to build the infrastructure themselves, and it's cheaper for Akamai because they do it for so many clients.

    2. Re:Management by Splab · · Score: 4, Interesting

      Well there is one difference. Cloud computing and virtual servers are to computers what keychains are to keys, it enables you to lose everything at once.

      Yes it is highly convenient and more effective to have everything in one place, but so much more fun when you drop your "chain" in the sewer.

    3. Re:Management by dkf · · Score: 5, Insightful

      Well there is one difference. Cloud computing and virtual servers are to computers what keychains are to keys, it enables you to lose everything at once.

      It's not really a difference. With home-grown datacenters you still have that risk unless you do something like building multiple redundant buildings in different locales and managing some kind of replication and backup strategy. But then all of that stuff is the same with going to a Cloud provider, except you're not having to futz around with the physical facilities yourself.

      There's no magic. All we're seeing is stupid people getting burned because they didn't use basic due diligence.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    4. Re:Management by dFaust · · Score: 3, Insightful

      But if Akamai loses a server, I don't have to repopulate the gigs of data they're hosting for me - it's not lost, it's just no longer on that particular server that died. That's exactly why I consider Akamai to be "the cloud" and why it doesn't sound like Danger was. Especially with an infrastructure like Akamai or Google where things are geographically distributed, you just don't hear about servers dying, and you might not even hear about data centers dying (unless it places an unusually high burden somewhere and causes performance issues - but you don't hear about data loss as a result).

    5. Re:Management by QuantumRiff · · Score: 3, Insightful

      True, but how much more money and brain power does Google have to invest in datacenter design and disaster recovery than your local college?

      Seriously.. I worked at one.. All our stuff was on "next day parts" from Dell.. We had a single internet connection to the campus, a single Linux-based sendmail email server, etc.

      Granted, I had tapes up the wazoo, and could retrieve any file for the past X years, but downtime is still downtime.

      Then you have Google, with multiple sites, multiple connections, replication, load balancers, etc.

      Not only do they have more to invest, but when they call up a vendor and say "we are Google, we have an outage, and we need some things from you" I bet those vendors jump a little faster than when a local school IT guy calls them up..

      --

      What are we going to do tonight Brain?
    6. Re:Management by wickerprints · · Score: 5, Insightful

      To be fair, Sidekick users didn't have a viable means to back up their personal data that was being pulled from Microsoft/Danger servers. I don't think it's reasonable to expect the users to find some hack or unofficial method to copy all their data from their devices. The only blame they could be assigned is that they bought the service being sold. Your criticism would be valid for, say, iPhone users, since the user has a backup stored on their computer. But no such functionality exists for the Sidekick, as far as I am aware.

      And as to who is really being burned here.... Obviously not Microsoft/Danger. Microsoft doesn't give two shits about this, since their acquisition of Danger in 2008 was really about cannibalizing their talent for Windows Mobile 7, as the Pink project has shown. Danger is just a shell of its former self--the damage was done long before this latest failure, which I think was an inevitable consequence of the acquisition. The ones who got burned are T-Mobile (for trusting Microsoft to manage Danger, and Danger to maintain a proper backup solution), and of course, the consumers.

      The real issue, of course, is that data is always at risk of being lost no matter how, where, or in what amount it is stored. The passage of time guarantees it. But people want to believe in the existence of certainties, in the notion that if something has a 99.9999% reliability, then we can effectively ignore the minuscule probability of failure. But failures happen all the time and there is no such guarantee. We need to rid ourselves of this delusion that data can somehow be made "safe," that risk can be ignored when made small. Cloud computing is just the flavor of the day.
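
The point about small probabilities can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes every operation (or stored item) fails independently with the same probability, which real systems do not guarantee, and the function name `prob_any_failure` is made up for illustration.

```python
def prob_any_failure(reliability: float, trials: int) -> float:
    """Probability that at least one of `trials` independent events fails.

    Assumes each event succeeds with probability `reliability` and that
    failures are independent (a simplifying assumption).
    """
    return 1.0 - reliability ** trials


if __name__ == "__main__":
    r = 0.999999  # the "99.9999%" figure from the comment above
    for n in (1, 1_000, 1_000_000, 10_000_000):
        p = prob_any_failure(r, n)
        print(f"{n:>10} trials: P(at least one failure) = {p:.4f}")
```

Under those assumptions, a one-in-a-million failure rate repeated a million times gives roughly a 63% chance of at least one failure, so the "minuscule" risk stops being ignorable once the scale is large enough.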

      I knew someone who worked at Danger years ago when the company was still fairly new. It was, at the time, an amazing technology. There was nothing like it. They had so much going for them, and there was a lot of good talent working there. One thing that impressed me was how they solved the problem of mobile web browsing. At the time, mobile web browsing seriously sucked ass. It was not only slow, but many sites simply would not load. Danger solved that by re-parsing the sites on their servers so that pages would look good and function properly on your mobile device. It was the best solution until mobile OSes and hardware became powerful and complex enough to support full browsing; and even then, the UI needed to be tightly integrated before browsing became efficient instead of tedious. It's sad to see such a pioneering company wither on the vine.

    7. Re:Management by BrokenHalo · · Score: 5, Insightful

      This all comes back to the thrust of the OP: whether the apparent loss of all Sidekick users' data is a reflection on the trustworthiness of cloud computing or simply another cautionary tale about poor backup practices.

      The simple truth, of course, is that it is both. And the only solution here is the old one: if you want something done properly, you will have to do it yourself. If your data, documents or whatever are in any way important to you, you should not be relying on anyone else to keep them safe. Simple as that, and no excuses.

    8. Re:Management by pz · · Score: 3, Informative

      There's no magic. All we're seeing is stupid people getting burned because they didn't use basic due diligence.

      Yes, and, no. The people getting burned here are customers, by the many thousands. You can't expect the end-user to know what the DRP / BCP is for a subcontractor of the provider of their wireless communicator data plan. I wouldn't call the end-users stupid, and they are the ones most significantly affected in this case.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  2. For the love of God the company is called "Danger" by syntap · · Score: 4, Funny

    Didn't that throw up any red flags for ANYONE?

  3. AGPL by Koohoolinn · · Score: 5, Informative

    This is an unforeseen hole in the bulletproof Gandhi mechanism, so I foresee a quick "GPL V3.1" to close this.

    It already exists. It is called AGPL: http://en.wikipedia.org/wiki/AGPL/

    --
    Deze sig is in 't Nederlands geschreven.
  4. Has there never been a non-cloud data loss? by iamacat · · Score: 4, Insightful

    Just like people lose their stuff on personal hard drives when not backed up, they will lose cloud data when not backed up. Both kinds of computing have merits, and long term persistence of data is not automatic with either. Most people do not place THAT high a value on backups of their cell phones. They typically sync with a PC anyway. But any business that doesn't have weekly reliable offsite backups of their fundamental assets should be sued by shareholders/customers for irresponsibility whether they use cloud or not.

  5. Re:For the love of God the company is called "Dang by garcia · · Score: 4, Interesting

    Didn't that throw up any red flags for ANYONE?

    I was a Sidekick user from 4/2004 until 10/2008. There had been only one 'catastrophic' failure in that time that left Sidekick users without data service for an extended period. Danger produced one of the best mobile devices, which in many ways is still better than anything out there even though the OS and devices that utilize it (the various Sidekick models that exist these days) are quite a bit outdated compared to devices like the iPhone.

    I miss my Sidekick immensely. I loved true multitasking, a fully capable QWERTY keyboard, and incredible battery life. Unfortunately it didn't sync well with calendaring software, didn't keep up with music playing, and is now partially controlled by Microsoft. There have been immense trade offs with moving to the iPhone but based on my main reason for owning an iPhone (I ride the bus and enjoy the music/video player and screen size) it was the right choice for me.

    That said, "cloud computing" is something which usually works (and did, in the case of the Sidekick since 2002). I don't think that this is a proven warning sign that "cloud computing" isn't as reliable as everyone believes, I just think it's proof that companies need to do a much better job of ensuring data integrity than they could have ever imagined before.

    Will I stop using Flickr, Google products, and other future "cloud" devices/software because of this? No. I am smart enough, as a computer savvy end-user, to keep my own backups of my data but I do believe people need to become better educated in what can and will happen as we move to the model we have slowly done in the last 10 years.

  6. Your data is your responsibility. by zerofoo · · Score: 5, Insightful

    As a wise auditor once told me:

    You can outsource the work, but you can not outsource the responsibility.

    If your data is important to you - you must back it up, and you must test your backups.

    The end.

    -ted
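
As a minimal sketch of "back it up, and test your backups", the snippet below copies a file and then checks that the copy's checksum matches the original. The file name and directory layout are hypothetical, and comparing checksums is only one modest form of testing; a real restore drill would go further.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy `source` into `backup_dir`, then confirm the copy matches."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / source.name
    shutil.copy2(source, copy)
    return sha256_of(source) == sha256_of(copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "contacts.txt"  # hypothetical data file
        original.write_text("alice,555-0100\n")
        ok = backup_and_verify(original, Path(tmp) / "backup")
        print("backup verified" if ok else "backup MISMATCH")
```

The design choice here is simply to never treat the copy step as success by itself: the backup only counts once something has read it back.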

  7. Assumptions by eagl · · Score: 4, Insightful

    Just because you're paying someone to store your data doesn't mean they care about that data as much as you do... That's one of the two big problems with cloud computing that can't be solved by technology. First, nobody cares about your data as much as you do. Second, nobody will protect your data (i.e. control its distribution and prevent unauthorized changes) to the level you find appropriate.

    It's usually a good idea to avoid using broad generalities (like I just did), but it seems like in general it would be a bad idea to let someone else be the sole keeper of anything even remotely important or sensitive. There are exceptions, but those seem to be internal to a company (i.e. the company runs its own cloud and has all employees use it). Or military/government applications where centralized security and backup can keep user errors from becoming a real danger to the organization beyond "help I lost my email!".

  8. Not a cloud disaster, not a "data center" disaster by TheLoneGundam · · Score: 3, Interesting

    Leaving aside the fact that a "data center" could consist of two servers under Mabel's desk, this is not a "data center" disaster, nor is it a cloud catastrophe.

    This is a contract and contract management failure: the contract with the outsourcer was probably written without specifying that they must do the backups, AND no one established any sort of audit (formal or informal) test to ensure that there _were_ backups being taken and that the outsourcer was performing according to the contract.

    Too often, the MBA doing the contract thinks "there, that's handled" once they've gotten all the signatures on the dotted line. "There, backups are handled now" he thinks, because many business folk (not ALL, I don't think it's fair to generalize that far) see these kinds of things as milestones, rather than ongoing processes to be managed.
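
Treating backups as an ongoing process rather than a milestone could look something like the recurring check sketched below. It is only an illustration: the seven-day threshold and the function names are assumptions, and the list of timestamps stands in for whatever report an outsourcer would actually provide.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def newest_backup(timestamps: Iterable[datetime]) -> Optional[datetime]:
    """Return the most recent backup time, or None if there are no backups."""
    return max(timestamps, default=None)


def backups_are_current(
    timestamps: Iterable[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed audit threshold
) -> bool:
    """True only if at least one backup exists and it is recent enough."""
    latest = newest_backup(timestamps)
    return latest is not None and now - latest <= max_age


if __name__ == "__main__":
    audit_time = datetime(2009, 10, 15)  # fixed date so the run is repeatable
    reported = [datetime(2009, 9, 1), datetime(2009, 10, 10)]
    print("OK" if backups_are_current(reported, audit_time) else "ESCALATE")
```

Run on a schedule, a check like this turns "there, backups are handled" into something that has to keep being true.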

  9. The Cloud is Just a Big Mainframe by tjstork · · Score: 4, Insightful

    When you cut through the "cloud", if you look into the center of things, you see that the so-called modern "cloud" computing environment is a giant computer(s), surrounded by high powered priestly geeks, doling out resources to everyone, completely centralized. The priests have some new tricks to entertain the masses with, but there's nothing fundamentally different between cloud computing and IBM's vision of computing in the 1960s.

    --
    This is my sig.
  10. No true scotsman by vadim_t · · Score: 4, Insightful

    This is awfully convenient. Something that at least to my eyes looks a lot like a cloud crashes. Cloud pundits announce:

    "if it loses your data - it's not a cloud".

    So if Amazon's S3 ever fails horribly and loses everybody's data, then it wasn't a cloud either.

  11. Re:Meta-cloud, anyone? by FlyingBishop · · Score: 3, Interesting

    If someone tells you that they can cheaply prevent catastrophic failure, expect a catastrophic failure. Nothing can correct something like this, which involved an error propagating to the backups.

  12. Re:A reason why cloud computing might be hated on by Anonymous Coward · · Score: 3, Insightful

    I don't think that has anything to do with it; at least not for me. My main concern with cloud computing is trust. Do I trust someone other than myself to not fuck up and lose all my data? For critical data, the answer is no. If somebody is going to fuck up and lose all my data, it's going to be me. I don't know if all the data on a Sidekick would qualify as critical, but it would certainly be annoying as fuck to lose it all.

  13. An epic fail, and missed lessons (so far) by rickb928 · · Score: 4, Insightful

    I'm a TMO subscriber, and I love them, so this is painful. And my sister-in-law is a longtime Sidekick user, so she's in a special agony.

    But T-Mobile is in a potentially no-win situation. They obviously have to believe Danger/Microsoft that they have good processes to avoid and recover from such failures. They didn't, and now TMO is probably going to take the hit. On one hand, they should - if the service is important, take responsibility and ensure management. On the other hand, they have good assurances, so hey, how much is enough?

    BlackBerry users, you should take note. RIM differs only in scale. And, you hope, depth of resilience. Not that RIM hasn't had outages, though not total failure yet.

    TMO may have to tell their Sidekick users to be prepared for the inevitable restore, and of course, work with Danger/Microsoft to re-establish service (even though they don't provide service, D/M does), and of course some money compensation no matter how inadequate.

    And maybe offer them shiny new myTouch3Gs to give the disillusioned Sidekick users an option with a marginally better track record.

    No, wait, that isn't right. I've had to wipe my G1 every update, and some apps don't have a way to save data. They just don't.

    I'm glad I never got on the Sidekick train, but I have no hope that this won't some day hit me. Do you suppose the next major Sidekick update will include data backup? :)

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  14. predictably doomed by jipn4 · · Score: 3, Insightful

    Danger held your data hostage from the start and didn't provide backup. Then, when Microsoft took them over, it was clear that they were going to mess with the service and servers. No backup + Microsoft mucking with the servers = kiss your data goodbye.

    But that's no more an indictment of hosted services or "cloud computing" than a Windows BSOD is an indictment of desktop computing. Microsoft screwed up, and quite predictably, too.

  15. "if it loses your data - it's not a cloud". by John+Hasler · · Score: 4, Insightful

    Just define away your problems. ROFL.

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  16. Re:Cloud Failure by commodore64_love · · Score: 3, Funny

    No it's cold. Besides how am I going to watch these latest episodes of Stargate and Eureka if I'm outside playing with the squirrels and birds?

    --
    "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
  17. Re:For the love of God the company is called "Dang by sopssa · · Score: 4, Insightful

    Tip: If you want to link to a specific part of a YouTube video, you can add #t=1m3s etc. to it, e.g. http://www.youtube.com/watch?v=kcFUDvTFokg#t=1m40s

    Also adding &hd=1 gives hq/hd version.
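
The two URL tweaks in this tip can be sketched as a small helper. The function name `youtube_link` is made up for illustration, and the code only assembles the string the way the tip describes; whether YouTube honors `#t=` or `hd=1` is not something it checks.

```python
from urllib.parse import urlencode


def youtube_link(video_id: str, minutes: int = 0, seconds: int = 0,
                 hd: bool = False) -> str:
    """Build a watch URL with an optional start time and hd flag."""
    params = {"v": video_id}
    if hd:
        params["hd"] = "1"  # the "&hd=1" parameter mentioned in the tip
    url = "http://www.youtube.com/watch?" + urlencode(params)
    if minutes or seconds:
        url += f"#t={minutes}m{seconds}s"  # the "#t=1m40s" fragment
    return url


if __name__ == "__main__":
    print(youtube_link("kcFUDvTFokg", minutes=1, seconds=40))
    print(youtube_link("kcFUDvTFokg", minutes=1, seconds=40, hd=True))
```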

  18. Re:What IS cloud computing? by rwa2 · · Score: 4, Interesting

    Well, I was hoping to just google cloud vs. grid vs. distributed vs. cluster vs. etc. computing, but there doesn't seem to be much official-sounding distinction out there. Which means if we start our own thread here it might become definitive!

    "cloud" computing: fluffy term used by people who really don't know anything other than that they run their applications from a web page and their data appears to be stored on the web because they can access it from more than one web browser.

    "hosted" / "server farm" computing: buying server resources from someone who has a real datacenter who tries to take care of your hardware. You access all of your data over the network "cloud". Redundancy & support varies based on pricing & services.

    "grid" / "utility" computing: computing infrastructure where you should be able to simply scale up CPU, data, etc. resources for your operation simply by throwing money at turning on more boxes. You don't necessarily need to share it with others, though.

    "cluster" computing: a computing system made up of more or less independent, generally homogeneous nodes, where problems can be partitioned out. Generally has some form of redundancy so you don't lose work when a single node dies, but probably won't survive a data center failure.

    "distributed" computing: special applications that can be farmed out to the net to break parts of computing or storage across a heterogeneous network of computers distributed over many locations. Ideally it's written to be highly redundant and tolerate faults such as nodes joining / leaving the cluster.

    As far as reliability goes, the TIA data center tiers seem to be the only common way of talking about maintaining "business continuity". I've read through it briefly, and can somewhat paraphrase the intent (mildly inaccurately, mostly because the standard itself is kinda loose and not defined in too much detail with regards to servers) as:

    Tier 1 "basic" : You have a room for servers with a door to keep random people from tripping over the plugs. Maybe you have a UPS on your server so it can do a graceful shutdown without data loss when the power or AC goes out.

    Tier 2 : You have your stuff in racks with a raised floor for air conditioning and some wire racks hanging from the ceiling for cable management.

    Tier 3 : You have redundant UPSes and RAIDs, CRACs, network links, and stuff, so you can make repairs when common things break without turning off the system (typically anything with moving parts or high currents, like power supplies, fans, disks, and batteries, needs to be hot-swappable). Which means you should also have some sort of monitoring and alert system so you know when that stuff actually fails so you can replace it before the redundant components also fail. This is intended to reach 24x7 availability with high uptimes, maybe 3-5 nines.

    Tier 4 : Like Tier 3, but certified for mission-critical / life-critical use, like in hospitals and maybe for airplanes and stuff. It should survive prolonged power outages (so you have a diesel generator with a day or two worth of fuel.)

    Unfortunately, it just covers build specs for individual data centers, so it doesn't really cover other business continuity things like maintaining offsite backups so you can somewhat easily rebuild from scratch if a natural disaster takes out one of your data centers or something. But it's kind of different worlds of IT between designing facilities and architecting "cloud" services, which unfortunately don't seem to communicate or collaborate as much as they should to reach the kinds of "distributed grid of redundant load-sharing data centers" configurations we'd expect.
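
For a sense of what the "3-5 nines" mentioned under Tier 3 means in practice, the arithmetic can be sketched as below. This is plain unit conversion rather than anything taken from the TIA standard, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    For example, 3 nines means 99.9% availability, i.e. an
    unavailability of 10**-3.
    """
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

Three nines allows a bit under nine hours of downtime a year, while five nines allows only a little over five minutes, which is why each extra nine tends to cost so much more to achieve.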

  19. Cloud Computing by snspdaarf · · Score: 4, Funny

    The best part of TFA is the comment below from their version of an A/C:

    Cloud architecture shards data

    In this case it certainly did.

    --
    Why, without your clothes, you're naked, Miss Dudley!
  20. TOS by ei4anb · · Score: 3, Funny

    The TOS probably made the users aware that "your data is in Danger" so they can't complain now :-)

  21. Re:A reason why cloud computing might be hated on by Jezza · · Score: 3, Interesting

    Mod the parent up!

    There are two sides to this (at least). If you're moving your data "to the cloud" you'd expect that "the cloud" is one hell of a lot more reliable than you are. Let's face it, they should be - the economies of scale mean it's a lot cheaper for them to host your data and lots of others' data, than it is for you alone.

    But that isn't what's happened in this case, here Microsoft (!) haven't even covered the basics. This is stunning.

    So does this call into question "cloud computing" or just Microsoft's "cloud computing"? This is a difficult question to answer: without being able to see your cloud partner's infrastructure and procedures for yourself, you can't really be sure... But would anyone make such a foolish mistake? Microsoft have proven that the answer is "yes, if it's Microsoft"; the real question is whether that should be just "yes".

    I think most of us now want a more hybrid approach, "in the cloud" is nice, but I also want a "local copy".

    Then you have to think about the other kind of "loss" where others gain access to data they shouldn't see...

  22. causes of the meltdown by viralMeme · · Score: 4, Informative

    "According to some reports, the failure was due to a SAN (Storage Area Network) gone wrong at Microsoft's end. It is claimed that Microsoft does not have a working backup of some of the data that has gone missing from customers devices. The SAN upgrade is rumoured to have been outsourced to Hitachi to complete"

    "Microsoft, possibly trying to compensate for lost and / or laid-off Danger employees, outsources an upgrade of its Sidekick SAN to Hitachi, which -- for reasons unknown -- fails to make a backup before starting"