The Sidekick Failure and Cloud Culpability
miller60 writes "There's a vigorous debate among cloud pundits about whether the apparent loss of all Sidekick users' data is a reflection on the trustworthiness of cloud computing or simply another cautionary tale about poor backup practices. InformationWeek calls the incident 'a code red cloud disaster.' But some cloud technologists insist data center failures are not cloud failures. Is this distinction meaningful? Or does the cloud movement bear the burden of fuzzy definitions in assessing its shortcomings as well as its promise?"
Well, there is one difference. Cloud computing and virtual servers are to computers what keychains are to keys: they enable you to lose everything at once.
Yes, it is highly convenient and more effective to have everything in one place, but it's so much more fun when you drop your "chain" in the sewer.
Didn't that throw up any red flags for ANYONE?
I was a Sidekick user from 4/2004 until 10/2008. There was only one 'catastrophic' failure in that time that left Sidekick users without data service for an extended period. Danger produced one of the best mobile devices, which in many ways is still better than anything out there, even though the OS and the devices that use it (the various Sidekick models that exist these days) are quite a bit outdated compared to devices like the iPhone.
I miss my Sidekick immensely. I loved true multitasking, a fully capable QWERTY keyboard, and incredible battery life. Unfortunately it didn't sync well with calendaring software, didn't keep up with music playing, and is now partially controlled by Microsoft. There have been immense trade-offs in moving to the iPhone, but given my main reason for owning an iPhone (I ride the bus and enjoy the music/video player and screen size), it was the right choice for me.
That said, "cloud computing" is something which usually works (and did, in the case of the Sidekick, since 2002). I don't think this is a proven warning sign that "cloud computing" isn't as reliable as everyone believes. I just think it's proof that companies need to do a much better job of ensuring data integrity than they could ever have imagined before.
Will I stop using Flickr, Google products, and other future "cloud" devices/software because of this? No. I am smart enough, as a computer-savvy end user, to keep my own backups of my data, but I do believe people need to become better educated about what can and will happen as we keep moving to the model we have been slowly adopting over the last 10 years.
Well, I was hoping to just google cloud vs. grid vs. distributed vs. cluster vs. etc. computing, but there doesn't seem to be much official-sounding distinction out there. Which means if we start our own thread here it might become definitive!
"cloud" computing: fluffy term used by people who really don't know anything other than that they run their applications from a web page and their data appears to be stored on the web because they can access it from more than one web browser.
"hosted" / "server farm" computing: buying server resources from someone who has a real datacenter who tries to take care of your hardware. You access all of your data over the network "cloud". Redundancy & support varies based on pricing & services.
"grid" / "utility" computing: computing infrastructure where you should be able to simply scale up CPU, data, etc. resources for your operation simply by throwing money at turning on more boxes. You don't necessarily need to share it with others, though.
"cluster" computing: a computing system made up of more or less independent, generally homogeneous nodes, where problems can be partitioned out. Generally has some form of redundancy so you don't lose work when a single node dies, but probably won't survive a data center failure.
"distributed" computing: special applications that can be farmed out to the net to break parts of computing or storage across a heterogeneous network of computers distributed over many locations. Ideally it's written to be highly redundant and tolerate faults such as nodes joining / leaving the cluster.
As far as reliability goes, the TIA data center tiers seem to be the only common way of talking about maintaining "business continuity". I've read through the standard briefly, and can somewhat paraphrase the intent (mildly inaccurately, mostly because the standard itself is kinda loose and not defined in too much detail with regards to servers) as:
Tier 1 "basic" : You have a room for servers with a door to keep random people from tripping over the plugs. Maybe you have a UPS on your server so it can do a graceful shutdown without data loss when the power or AC goes out.
Tier 2 : You have your stuff in racks with a raised floor for air conditioning and some wire racks hanging from the ceiling for cable management.
Tier 3 : You have redundant UPSes, RAIDs, CRACs, network links, and stuff, so you can make repairs when common things break without turning off the system (typically anything with moving parts or high currents, like power supplies, fans, disks, and batteries, needs to be hot-swappable). Which means you should also have some sort of monitoring and alert system so you know when that stuff actually fails and can replace it before the redundant components also fail. This is intended to reach 24x7 availability with high uptimes, maybe 3-5 nines.
Tier 4 : Like Tier 3, but certified for mission-critical / life-critical use, like in hospitals and maybe for airplanes and stuff. It should survive prolonged power outages (so you have a diesel generator with a day or two worth of fuel.)
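To put the "3-5 nines" guess above in perspective, here's a rough sketch of what each number of nines allows in downtime per year. The function name and the 365-day year are my own assumptions for illustration, and the nines themselves are just the paraphrase above, not figures quoted from the TIA standard.

```python
# Rough sketch: convert "N nines" of availability into allowed downtime per year.
# Assumes a 365-day year; the names here are illustrative, not from any standard.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime budget for an availability of N nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines = 99.9% up = 0.001 down
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(3, 6):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

So three nines is roughly a working day of downtime per year, and five nines is only a few minutes. Note this is about uptime only; it says nothing about whether your data is still there when the service comes back.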
Unfortunately, it just covers build specs for individual data centers, so it doesn't really cover other business continuity things like maintaining offsite backups so you can somewhat easily rebuild from scratch if a natural disaster takes out one of your data centers or something. But it's kind of different worlds of IT between designing facilities and architecting "cloud" services, which unfortunately don't seem to communicate or collaborate as much as they should to reach the kinds of "distributed grid of redundant load-sharing data centers" configurations we'd expect.
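As a back-of-the-envelope illustration of why people expect multiple redundant data centers, here's a minimal sketch of the combined availability of several sites. It assumes site failures are completely independent and that failover between sites always works, which is a generous assumption (a shared bad upgrade or a shared SAN breaks it immediately). The function name and the sample numbers are made up for the example.

```python
# Minimal sketch: availability of a service that stays up as long as at least
# one of several data centers is up. Assumes independent failures and perfect
# failover, both of which are optimistic in practice.


def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` identical sites is up."""
    all_sites_down = (1 - site_availability) ** sites
    return 1 - all_sites_down


if __name__ == "__main__":
    for count in (1, 2, 3):
        availability = combined_availability(0.99, count)
        print(f"{count} site(s) at 99% each: {availability:.6f} combined")
```

Under those assumptions, two mediocre 99% sites beat one very expensive site on paper. It still only covers availability, though. Redundant sites that all replicate the same deletion or corruption don't replace an offsite backup.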