Server Failure Destroys Sidekick Users' Backup Data

As if millions... by Anonymous Coward · 2009-10-10 21:33 · Score: 5, Funny

homemade cell phone porn videos cried out and then were silenced.

Re:As if millions... by Z00L00K · 2009-10-11 03:28 · Score: 1

And what does Borland say about this problem with Sidekick?
At least when I first saw the Sidekick reference I was thinking about that old Borland TSR software.

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
Re:As if millions... by davester666 · 2009-10-11 06:31 · Score: 2, Insightful

Is it really 'backup' data?
From the sounds of it, each Danger phone loads its data from the 'cloud' whenever it's powered on, and syncs the data as it changes. To me, this makes the 'cloud' the live data store, and the phone just the local cache...

--
Sleep your way to a whiter smile...date a dentist!

"they should have used ZFS or btrfs" by Manip · 2009-10-10 21:34 · Score: 5, Insightful

This seems a rather silly point to make. I know this is Slashdot and we have to suggest Open Source alternatives but throwing out random file systems as a suggestion to fix poor management and HARDWARE issues is some place between ignorant and silly.

Perhaps they should have had at least mirrored or stripped raid, with an off-site backup every week or so?

Re:"they should have used ZFS or btrfs" by timmarhy · 2009-10-10 21:46 · Score: 4, Insightful

retarded comments like that are the reason these zealots aren't taken seriously in the enterprise.
i'd hazard a guess that the offsite backups were corrupted as well somehow or were silently failing.

--
If you mod me down, I will become more powerful than you can imagine....
Re:"they should have used ZFS or btrfs" by rastilin · 2009-10-10 21:50 · Score: 4, Informative

This seems a rather silly point to make. I know this is Slashdot and we have to suggest Open Source alternatives but throwing out random file systems as a suggestion to fix poor management and HARDWARE issues is some place between ignorant and silly.
Not as silly as it might appear. One of ZFS's main functions is that it can compensate for some degree of hardware failure.

--
How do you kill that which has no life?
Re:"they should have used ZFS or btrfs" by gravos · 2009-10-10 22:01 · Score: 4, Informative

The current major cloud providers (Google and Amazon) both replicate your permanent data to multiple hard disks (Google: 3, not sure about Amazon) in multiple areas of the datacenter, and I know Google is looking at providing replication to different datacenters (which is more complex than replication in the same datacenter because of the time delay).

--
This game will waste your life. Don't clicky!
Re:"they should have used ZFS or btrfs" by sopssa · 2009-10-10 22:12 · Score: 5, Insightful

Exactly, this can be a software bug too and that could possibly easily destroy or corrupt backup data too. I really doubt this service was ran without backups.
The type of filesystem has nothing to do with this.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-10 22:12 · Score: 0

Perhaps they should have had at least mirrored or stripped raid
First of all, it's spelled "striped" and second, that was probably what they had (http://en.wikipedia.org/wiki/RAID#Standard_levels)
- Peder
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-10 22:16 · Score: 1, Interesting

Using one ZFS would just create a different single point of failure, one which is also relatively complex and therefore does not provide satisfactory disaster recovery options. Redundancy should be provided by independent systems. That means that they're ideally implemented differently even though they serve the same function. For example, it's pretty useless to have two fibers coming into a facility if these fibers are taking the same route or are even in the same bundle: A backhoe will get both of them. An implementation bug in a filesystem will very likely affect both redundant stores. Even using two separate filesystems has that flaw. Storage systems should keep redundant data on separate systems with different filesystems. Then the single point of failure is the splitter which sends the data to both storage systems. A failure at that point does not destroy the data. It only affects your ability to access it. Due to its low complexity, it's also a component which can easily be replaced.
Re:"they should have used ZFS or btrfs" by Threni · 2009-10-10 22:17 · Score: 1

> Not as silly as it might appear. One of ZFS's main functions is that it can compensate for some degree of hardware failure.
The problem is not how to compensate for "some degree of hardware failure", but how to avoid any data loss. I believe the answer is `keep full backups` and you can do this perfectly well on FAT32.
Re:"they should have used ZFS or btrfs" by rastilin · 2009-10-10 22:47 · Score: 1

The problem is not how to compensate for "some degree of hardware failure", but how to avoid any data loss. I believe the answer is `keep full backups` and you can do this perfectly well on FAT32.
Even with full backups, you'll still lose the data you had between the last backup and the failure event

--
How do you kill that which has no life?
Re:"they should have used ZFS or btrfs" by Znork · 2009-10-10 22:47 · Score: 5, Insightful

I really doubt this service was ran without backups.
Knowing 'enterprise' backups I'd bet there was at least a backup client installed and running. However, I'm equally sure that the backups were, at best, tested once in a disaster recovery exercise and were otherwise never verified.
Further, responsibility would probably be shared between a storage department, a server operations department and an application management department, neatly ensuring that no single person or function is in the position to even know what data is supposed to be backed up, what limitations there are to ensure consistency (cold/hot/inc/etc), to monitor that that's actually what does happen and that it keeps happening as the application and server configuration evolves.
Backups of dubious value do not seem to be a rarity in enterprise settings.
Re:"they should have used ZFS or btrfs" by malchus842 · 2009-10-10 22:49 · Score: 5, Interesting

One reason why our corporate policy is that we actually have to validate backups for every system on a regular basis (this means doing a full restore of a tape called from off-site), where the regularity is directly proportional to the criticality of the system. The more critical, the more often we test. On our iSeries, they restore the weekly backup tape EVERY week on the QA server - both for the purposes of refreshing it, AND to validate the backups. We also have a quarterly 'random' test where a system is chosen randomly and it must be recovered from bare metal using only our standard procedures + the backup tape.
We've discovered all kinds of strangeness with backup tapes through the years. Our Tier 1 systems have completely separate instances in geographically diverse areas, with data-replication.
Granted, this isn't cheap, but our data isn't either.
Re:"they should have used ZFS or btrfs" by WarlockD · 2009-10-10 22:55 · Score: 4, Interesting

Ever try to restore from a ZFS corruption? It IS easy and it can be done. However...

What if the data was on an EMC storage array and the tech told them it's all lost? What if you're dealing with a Tier 1 vendor (I am looking at you, Dell EqualLogic) that swears UP and DOWN that there is no way to recover the system after a second drive out of a RAID 5 has been pulled? Hell, try just a standard RAID 5 card from a Tier 1 vendor. (Not talking about calling like 3ware support directly, they are honestly good and recovered a few arrays with them)

I "suspect" that they are running it off a storage array that failed big time, or lost the LUN, or just someone decided to die and take the server with it. There is just too much we don't know. Was Danger installed on multi-servers? Was it clustered? Is it a cloud system? Does it run its own storage system or requires additional hardware?

But you know what? ZFS, EMC, even Windows 2008, all moot. Why? WHERE ARE THE TAPE BACKUPS?!?! SERIOUSLY. The ONLY way they could have lost ALL that data was that they didn't have a backup solution. Otherwise their "press release" would say "...however we will be restoring the data from last week/months tapes..."

I do like how they keep saying "Microsoft/Danger" as if they are at fault. A good admin would expect a new car would catch fire and run into a bus full of nuns.
Re:"they should have used ZFS or btrfs" by mike260 · 2009-10-10 22:55 · Score: 4, Funny

There are plausible reports as to how this happened here.
tl;dr - They tried upgrading their SAN without making a backup first, and the upgrade somehow hosed the entire SAN.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-10 22:58 · Score: 3, Insightful

Repeat after me, you haven't got backups unless you've tested RESTORES.
Re:"they should have used ZFS or btrfs" by Rakshasa Taisab · 2009-10-10 23:05 · Score: 3, Funny

A bug that sneaks into the two or three offsite locations, destroying the tapes which are randomly checked before being shipped to ensure they contain valid data? Really nasty those bugs.

--
- These characters were randomly selected.
Re:"they should have used ZFS or btrfs" by asaul · 2009-10-10 23:19 · Score: 5, Interesting

Dubious backups? Depends. We had a system which was a 6TB cluster that was notoriously difficult to back up. This went on for years, it took too long, failures caused issues downstream etc. Then someone took a moment to realise that the application was not capable of re-using that 6TB of data if it was restored - once the data came in it was processed and archived. To recover the application all they had to do was back up a few gig of config and binaries, and restart slurping data from upstream again. Voila - backup stripped down to nothing, 6TB a day of data less to back up, and next to no failures as it was now so quick to back up.
Then there is the case of an application which the vendor and application developer signed off on using a backup solution using a daily BCV snapshot. What they failed to tell us was that the application not only held data in a database, but in a 6G binary blob file buried deep in the application filesystem. If the database and the binary were out of sync in any way, it could mean missed or replayed transactions or generally that the application was inconsistent. As this was an order management platform, that was bad. You can guess the day we found out about this dependency.... yup, data corruption, bad vendor advice screwed the binary file and all we had to go on was a backup some 23 hours old where the database was backed up an hour after the application. Because of a corresponding database SNAFU, the recovery point was actually another day before that, with the database having to be rolled forward. It was at this point we found out that despite the signed off backup solution, the vendor's documented recommendation (which was not supplied to us) was that the only good backup was a cold application one - not possible on a core order platform. Thankfully after some 56 hours of solid work the application vendor managed to help sort the issue out and the restore from backup was not actually needed. The backups were never really tested as the DR solution worked on SRDF - the DR consideration for data corruption was never really part of the design (from a very high level, not just this platform).
So there you have it. Two dubious Enterprise backups - one not needed, the other not usable.

--
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
Re:"they should have used ZFS or btrfs" by asaul · 2009-10-10 23:28 · Score: 0

Which begs the question, where are THOSE backups then?

--
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-10 23:33 · Score: 0

Is it a cloud system?
It's safe to say it wasn't made of water vapor.
Re:"they should have used ZFS or btrfs" by petes_PoV · 2009-10-10 23:35 · Score: 5, Insightful

It's not a backup unless you can prove it will restore. Until then it's just a waste of tape, or disk, and time
The point about backups is not to tick the box saying "taken backup?" but to provide your business / customers / whatever with a reliable last resort for restoring almost all their data. If you don't have 100% certainty that it will work, you don't have a backup.

--
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Re:"they should have used ZFS or btrfs" by bertok · 2009-10-10 23:57 · Score: 4, Interesting

There are plausible reports as to how this happened here.
tl;dr - They tried upgrading their SAN without making a backup first, and the upgrade somehow hosed the entire SAN.
That's the thing that has always worried me most about SANs: you have all your eggs in one basket. No matter how redundant or reliable the hardware is, one bad update or trigger-happy admin can cause the instant loss of all your data. That's only slightly better than having your data center burn down. You still have your hardware, but a total restore like that can be a nightmare. I've heard somewhere that 80% of corporations couldn't recover from a scenario like that.
Here's some fun numbers: a typical tape restore runs at something like 70MB/sec, if you're lucky, per tape drive. Some small low-end SANs that I see people buying these days are 10TB or bigger. At those speeds, it takes 40 hours to restore the complete system. What's worse is that it doesn't scale all that well either, you can get more drives, but the storage controllers and back-end FC loops become a limit. If you have some big cloud provider scenario, a complete restore could take days, or even weeks.
What's scary is that mirroring or off-site replicas don't help. If your array starts writing bad blocks, those will get mirrored also.
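For anyone who wants to check the restore arithmetic above, here is a quick Python sketch. The function name and the perfect-scaling assumption for multiple drives are mine, not anything a vendor publishes, and it uses decimal units (1 TB = 1,000,000 MB):

```python
def restore_hours(capacity_tb: float, mb_per_sec: float, drives: int = 1) -> float:
    """Hours needed to stream capacity_tb back from tape.

    Assumes decimal units and that throughput scales linearly with the
    number of drives, which is optimistic: controllers and back-end
    loops usually become the limit first.
    """
    total_mb = capacity_tb * 1_000_000
    seconds = total_mb / (mb_per_sec * drives)
    return seconds / 3600


print(round(restore_hours(10, 70), 1))     # one drive: 39.7
print(round(restore_hours(10, 70, 4), 1))  # four drives, ideal scaling: 9.9
```

So the 40-hour figure holds for a single drive at 70MB/sec, and even perfect scaling across four drives still leaves you with most of a working day of downtime.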
Re:"they should have used ZFS or btrfs" by CODiNE · 2009-10-11 00:09 · Score: 1

What's really retarded is that using zfs would prevent bitrot and warn you of impending or intermittent hardware failures but is seen as OSS zealotry by people who haven't thought out the problem.

--
Cwm, fjord-bank glyphs vext quiz
Re:"they should have used ZFS or btrfs" by Nerdfest · 2009-10-11 00:35 · Score: 1

I've always been amazed that tape is trusted as much as it is. It seems (anecdotally at least) to have a disproportionately high failure rate.
Re:"they should have used ZFS or btrfs" by vk2 · 2009-10-11 00:36 · Score: 5, Interesting

That's why you have logical redundancies. I work for a fortune 10 company and this is a standard practice for all mission critical applications. The application has to be geographically redundant with install base at least at 3 data centers (ATL,SEA and DLS). Different SAN technology at each DC. All Oracle databases have 2 physical Data Guard configurations with 4 hours and 8 hours latency (to guard against user errors) and all J2EE apps hard configured to switch connections from one db to the other almost on the fly or with a reboot. Some really really critical databases have all this and transaction duplication via GoldenGate to remote databases to offload reporting queries. We have had issues where SAs screwed up allocating LUNs and ended up f*cking up the file systems but we recovered in every scenario, even a 30 TB DB restore over 2 days.

It's amazing a consumer serving company like T-Mobile risked itself by hosting their application on Microsoft platform. Furthermore where is the DR in all this? Who the F*ck in their right mind fiddles with something on a SAN without confirming a full backup of all applications/databases? It appears that Hitachi and Microsoft are at fault here (if SAN maintenance is the root cause of this failure) but T-Mobile is the fool for allowing these companies to ruin their data. Not only will there be no consequences for MS or Hitachi because of this issue - T-Mobile will be pouring in more money to fly in the MS and Hitachi consultants.

--
No Sig for you.!
Re:"they should have used ZFS or btrfs" by JasonBee · 2009-10-11 00:39 · Score: 2, Interesting

In our environment, a large government shop, our data volumes are capped at around 1 TB of storage for that very reason. Between the SAN, and the tape backups...they just simply have to create a physical cutoff point for data storage due to those onerous recovery periods.
There is nothing wrong in our shop with having TWO 1 TB volumes, but you will never get approved to have one single 2TB. Problem solved...at least for file storage. Database backups are managed via other mechanisms like replication.
Re:"they should have used ZFS or btrfs" by Jezza · 2009-10-11 00:58 · Score: 1

Err... This is cloud computing, right? Why do you have off-site backups every week or so?! The data should be stored in multiple geographic locations ALL THE TIME. The ZFS suggestion isn't as dumb as you might think, you tell ZFS not to prune old data, then if stuff gets "deleted" it's still on the disk (I won't bore you with an explanation here). You're right ZFS won't help you against something that destroys (physically) the disks (so multiple locations are required) but it will help you against hacking or software errors.
Of course, ZFS isn't the only way to do this; the reader might have their own ideas too (I'm not suggesting they are any less right than the one detailed here).
Re:"they should have used ZFS or btrfs" by IamTheRealMike · 2009-10-11 01:02 · Score: 4, Informative

I'm not sure what you mean by "cloud provider" as such but Google App Engine has always been replicated across datacenters.
Re:"they should have used ZFS or btrfs" by Tweezer · 2009-10-11 01:03 · Score: 2, Informative

Even with a SAN you need to limit volume sizes to whatever size you can restore within the acceptable restoration window. There are also those times where you just want to run a chkdsk and if the volume is too big, it takes too long.
That being said, I can't believe they didn't have any backup. Even if they skipped the pre-upgrade backup, they should have had one from last night/week/month. Any of those options would be better than nothing. I have to assume they were doing backup to disk on the same SAN they were upgrading, which is pretty dumb. I still can't understand why they didn't have a backup at another site somewhere else in the world. We do that sort of thing all the time where I work.
Re:"they should have used ZFS or btrfs" by jimicus · 2009-10-11 01:08 · Score: 5, Informative
I've always been amazed that tape is trusted as much as it is. It seems (anecdotally at least) to have a disproportionately high failure rate.
I'm not sure that's the problem so much - after all, LTO has a read head positioned directly after the write head and automatically verifies as it goes along. A tape error is dead easy to spot.
There are a number of places where things can fall apart, and tapes don't even need to come into the matter:
- Nobody checking the logs
- Failure to understand the processes necessary to get a good backup. (You can't just dump the files that comprise a database to disk - you must either quiesce the database or use the DBMS' inbuilt backup routine - or you will wind up with inconsistent files and hence an inconsistent database. You'd be amazed how many people don't understand this.)
- Failure to maintain backup processes. (When you moved the database to another disk because you were running out of space, you did update your backup process? Right?)
- Not doing any test restores.
- Not doing enough test restores, or doing them carefully enough. (If you're unlucky, your database will come back up OK even though you didn't quiesce it before carrying out the backup. Why do I say unlucky? Well, if it had not come up OK, you'd know immediately that there was a problem with your process. Then once the database is back up, make sure you check the restored data to ensure that recent transactions which should be on the backup actually are).
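The last two points in the list above can be partly automated. This is only a rough Python sketch of a file-level check after a test restore: the function names are made up for illustration, it reads each file fully into memory, and it only makes sense against a source that was quiesced or snapshotted at backup time. It does not replace checking that recent transactions are actually in a restored database.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(source_root, restored_root):
    """Return (missing, mismatched) relative paths.

    Both lists empty means every source file came back byte-identical.
    """
    src = tree_digests(source_root)
    dst = tree_digests(restored_root)
    missing = sorted(p for p in src if p not in dst)
    mismatched = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, mismatched
```

Calling `compare_restore("/data/app", "/restore-test/app")` (both paths hypothetical) after a restore and alerting on any non-empty result at least covers the "nobody checked" failure mode.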
Re:"they should have used ZFS or btrfs" by jimicus · 2009-10-11 01:09 · Score: 1

I wouldn't say that, but ZFS is still a little young for my liking. There are plenty of horror stories concerning data loss, and more to the point plenty of recent horror stories.
Re:"they should have used ZFS or btrfs" by Jezza · 2009-10-11 01:10 · Score: 3, Interesting

The kind of filesystem can help - I'm familiar with ZFS concepts so I'll stick to those:
In ZFS when you write to a file you don't write over the pre-existing data, you write elsewhere then that gets mapped in upon success, the old data is still there and you can see the aged mapping (you know what was there). Now you can at this point recycle this space. However, you can switch this pruning off, now you have a complete record of everything that was ever done on the disk. To stop it ever running out of space I can either: Add disks to the disk-pool to stop that, or prune very old data (older than a given age - maybe 6 months?).
So it helps.
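To make the idea above concrete, here is a toy Python model of copy-on-write with retained history. It is emphatically not ZFS's on-disk format or API; the class and method names are invented for illustration only.

```python
class CowStore:
    """Toy copy-on-write store: writes append new blocks, never overwrite old ones."""

    def __init__(self):
        self.blocks = []     # append-only block storage
        self.history = [{}]  # history[i] maps file name -> block index at version i

    def write(self, name, data):
        self.blocks.append(data)              # new data is written elsewhere
        mapping = dict(self.history[-1])      # copy the current mapping
        mapping[name] = len(self.blocks) - 1  # remap only once the block is stored
        self.history.append(mapping)
        return len(self.history) - 1          # version number of this write

    def read(self, name, version=-1):
        return self.blocks[self.history[version][name]]

    def prune(self, keep_last):
        """Keep only the newest keep_last mappings (versions are renumbered;
        reclaiming the unreferenced blocks is left out of this sketch)."""
        if keep_last < 1:
            raise ValueError("keep_last must be at least 1")
        self.history = self.history[-keep_last:]


store = CowStore()
v1 = store.write("contacts", b"alice,bob")
v2 = store.write("contacts", b"")   # a bad update wipes the file
print(store.read("contacts"))       # b''
print(store.read("contacts", v1))   # b'alice,bob'
```

As long as pruning is off (or generous), the bad write does not destroy the earlier contents. It does nothing for you if the disks themselves are gone.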
Re:"they should have used ZFS or btrfs" by cupantae · 2009-10-11 01:11 · Score: 2, Insightful

When I read that you had quoted "I really doubt this service was ran without backups," I twitched and the thought
I know it's bad grammar, but let's just ignore it, please
was loud in my ears. I was so relieved when I saw that you weren't mentioning it. I don't know what this makes me, but it happens all the time. I'm definitely bothered by poor grammar and spelling, but I want no one to ever point it out.

--
--
Re:"they should have used ZFS or btrfs" by Ant P. · 2009-10-11 01:22 · Score: 1

If I was using a service like this and found out they were moronic ricers running btrfs, an unfinished filesystem where the disk format hasn't even been finalised, I'd pull my data out immediately.
Re:"they should have used ZFS or btrfs" by webmistressrachel · 2009-10-11 01:39 · Score: 0, Troll

it must be recovered from bare metal using only our standard procedures + the backup tape
Wow, it must be really difficult re-installing all that proprietary firmware back to the BIOS, NIC, RAID etc. then getting it to bootstrap the *Insert OS Here* bootloader all on your own, I bet you don't get any help from the OEM there...
Troll or funny, take your pick...

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:"they should have used ZFS or btrfs" by Antique Geekmeister · 2009-10-11 01:46 · Score: 2, Insightful

I've had something like that happen. The recovery system for a partner had never been tested with a _full_ recovery, only with recovering a few selected files. But because someone decided to get cute with the backup system to pick and choose which targets got backed up, individual directories each got their own backup target. Thousands and thousands of them. And the backup system had a single tape drive, not a changer.
The result was that to restore the filesystem, the tapes had to be swapped in and out to get the last full dump, then the incremental dump, of _each_ of the thousands of targets. Fortunately for them, I managed to liberate an under-used tape library, but the incredible amount of time having the tape drive grind back and forth to find the different targets on each tape was also incredibly nasty. We helped them find other solutions for that issue, but it was nasty to clean up. And unfortunately for them, they didn't _have_ a large enough repository to have tested the full restoration procedure.
The point is that "random checks" are not enough. You have to actually do a full test, once a year. This is also why I despise people who sell monolithic, "high availability" storage systems that are not partitioned enough to create a mirror of your active data anywhere.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 02:04 · Score: 2, Funny

-1 "Thinks he's funny"
Re:"they should have used ZFS or btrfs" by Cylix · 2009-10-11 02:09 · Score: 1

Kinda why your backup solution has to scale with your data.
There are monstrous libraries available and when restoring from them you simply dedicate multiple channels to the restoration process.
Depending on the arrangement of systems, archives and tape equipment it may not be ideal to restore directly from tape to host.
I don't know if I find that particular story plausible though. We had much the same issue because despite expensive contracts these companies routinely dole out work to contract technical staff... ie warm bodies.
It is entirely possible to recover data. Even if they do something silly like blast all of the dcb data from the system and format the drives.
Unless everyone is horribly clueless I'm going to guess the issue is a bit more complex than a simple SAN explosion.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Re:"they should have used ZFS or btrfs" by RichardJenkins · 2009-10-11 02:11 · Score: 1

Here's some fun numbers: a typical tape restore runs at something like 70MB/sec, if you're lucky, per tape drive. Some small low-end SANs that I see people buying these days are 10TB or bigger. At those speeds, it takes 40 hours to restore the complete system
Why not just buy two of those 10TB SANs, keep one in your office (call it primary), keep one with your hosting provider (call it secondary). You keep the secondary synced up to a day or so behind your primary. As a part of your regular backup routine you record each day's changes. In the case of a catastrophic failure, you swap round the hardware, and sync it up with the latest set of changes.
The real problem in the scenario you describe is that the 10TB SAN is a single point of failure with unacceptably long recovery times. Just because you have a huge SAN powering a bunch of servers doesn't mean it has to be a case of "all your eggs in one basket"
Re:"they should have used ZFS or btrfs" by hedwards · 2009-10-11 02:13 · Score: 1

Precisely, and that's why I store the local version of my backups on a 1gb ZMIRROR, sure it's more than I need in terms of space, but it tells me when things are going bad without having to go through and check everything constantly myself. And as you suggest, it does not get you out of the trouble of backing up offsite, as a flood, fire or theft, not to mention operator error, could cause complete data loss, but it does solve the data corruption problem nicely.

It also happens to be wonderfully easy to backup using snapshots, which if you start immediately often times allows for nice chunks as the data goes in and you can later on consolidate them into sane start points at your leisure. Believe me I wouldn't have put up with this convoluted Open Solaris in a Virtualbox set up if it wasn't totally worth it.

I wonder if this means that MS is going to put in ZFS like pretty much everybody else for a future release. I know that Apple pulled it out of their most recent release, but I doubt very much that it's a permanent removal, more likely they didn't have time to perfect it and were afraid of being caught like that other major data loss bug from several years back.
Re:"they should have used ZFS or btrfs" by Cylix · 2009-10-11 02:18 · Score: 2, Interesting

Well the first problem was the EMC storage array.
The second problem is believing the tech when he says the data cannot be reclaimed.
The third problem is using a simple raid 5 volume on a great deal of data. Multiple drives fail all the time! Hell, racks of servers fail in unison.
Even if the DCB data is corrupted this can be corrected even on a large SAN.
All or part of the data is generally recoverable.
Either this was an impossibly horribly managed install or something very complex has happened. Generally, the more severe instances are because of multi-faceted failures and not something so simple as lost array data.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Re:"they should have used ZFS or btrfs" by Alpha830RulZ · 2009-10-11 02:36 · Score: 2, Funny

Something tells me you have grey hair and wrinkles. And I say that in a good way.

--
I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
Re:"they should have used ZFS or btrfs" by uncleFester · 2009-10-11 02:40 · Score: 2, Informative

"Who the F*ck in their right mind fiddles with something on a SAN without confirming a full backup of all applications/databases?"
people who drink the kool-aid whenever vendors of said products repeatedly swear up and down all their tasks/patching/operations are 'totally no-impact and no-visibility changes.' combine that with people unwilling to take downtime or spend $$$ to properly protect the contents ahead of time and you have just cooked a recipe for disaster.
-r (not speaking from personal experience.. of course.. :/ )

--
-'fester
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 02:44 · Score: 0

Google already provides replication to multiple datacenters for app engine:
http://googleappengine.blogspot.com/2009/09/migration-to-better-datastore.html
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 03:11 · Score: 0

This one's probably not Hitachi's fault.
They were upgrading their SAN, and they outsourced it to a Hitachi consulting firm
...assuming of course that the Hitachi consulting firm was somebody other than Hitachi.
If you understand the concept of storage abstraction, and take on face-value the reports that this is a botched SAN upgrade, then you understand that the culprit is not the server OS. No SAN-connected OS instance, "nix" or otherwise, can make data appear that isn't there.
Re:"they should have used ZFS or btrfs" by drjzzz · 2009-10-11 03:14 · Score: 3, Interesting

It's not a backup unless you can prove it will restore. Until then it's just a waste of tape, or disk, and time
True. There's a similar problem in biological research, where people think they have secured frozen samples but they haven't tested whether the samples are valuable after thawing. For example, frozen cells might not be viable, or RNA might be degraded. Too often the samples are just wasting freezer space. Anybody can freeze (or backup), the question is whether what you thaw (restore) is valuable.

--
to err is human, to forgive is divine, to forget is... umm...
Re:"they should have used ZFS or btrfs" by Antique Geekmeister · 2009-10-11 03:41 · Score: 4, Funny

It's not the gray hair (or what is left of it!), and those aren't wrinkles. They're laugh lines from the terrific amusement when some youngster ignores the hard-won lessons of the last millennium, especially when they have to call me or someone like me to clean up the mess. The laugh lines are especially deep from when I collected a paper trail to show where their supervisor ignored my written warnings about the danger: those are used with caution, but can be very, very handy.
Re:"they should have used ZFS or btrfs" by runningduck · 2009-10-11 04:04 · Score: 2, Interesting

At the very least they should have been segmenting customer data. How could a single failure outside of a ten mile wide asteroid hit wipe out all customer data? Was everything stored in a single giant registry? I see this as one of the single greatest failings in current system design. Top professionals trust tools more than data design and management processes. I would say the same thing if they were using ZFS or btrfs. Technology is NOT a solution. Technology is at most a tool that contributes to an overall solution. Without proper automated control systems and at least some form of manual verification, reliance on pure technology solutions is little more than blind faith.

--
-rd
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 04:27 · Score: 0

Or better yet, something that allows network replication: OpenAFS. http://www.openafs.org
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 04:33 · Score: 0

dirty old ike, is that you?
Re:"they should have used ZFS or btrfs" by Kjella · 2009-10-11 05:14 · Score: 1

Not saying I disagree but is the point about anything to "tick the box"? A feature that's on the application checklist but isn't actually useful or usable won't do anyone any good, no matter what we're talking about. It's the same issue of imperfect information, imperfect distribution of responsibility and ultimately about cost incentives. Try telling your superior that we need to use this quarter's profit on getting a backup system upgrade for something that might or might not happen in the next few years. Or just that the money is on the wrong budget, I've billed clients extra hours for working on ridiculously ancient hardware and over many months I'm sure I've billed $5000 for not buying a $500 machine.
I've been getting more and more respect for the challenges of leading a large company well. It's like herding a whole pyramid of cats. Perhaps this was a CEO making insane cost cut demands to the CIO. Maybe the CIO was pulling shit because the other C?Os don't understand IT. Maybe the head of storage and backup is incompetent. Maybe the guy who mainly wrote the spec discovered his wife was sleeping around and had his thoughts elsewhere. At really any point in the chain between "corporate goals" and actual implementation there could be a failure to give responsibility, failure to provide the money, incompetence, greed, lack of follow-up or really any other kind of FAIL that means the need that's clear on the grass root level never gets properly communicated up and down the chain of command.
Not sure where I wanted to go with this but I just recognize it all over the place.

--
Live today, because you never know what tomorrow brings
Re:"they should have used ZFS or btrfs" by osu-neko · 2009-10-11 06:35 · Score: 1

Failure to understand the processes necessary to get a good backup. (You can't just dump the files that comprise a database to disk - you must either quiesce the database or use the DBMS' inbuilt backup routine - or you will wind up with inconsistent files and hence an inconsistent database. You'd be amazed how many people don't understand this.)
Ha! Yup. I was working at a place once where we were discussing how backups were done, and this included backing up the database files while it was in-use. The senior engineer said this was okay since, yes there was a risk of problems, but at worst it would be like a power disconnect during db usage. Run the recovery program to fix any inconsistency like after an unscheduled power-cycle and all is good. Being junior and young, I just nodded and didn't give it another thought.
You ever wake up in the morning, not with that usual gradual pleasant return to consciousness, but a very sudden bolt-upright painful snap to consciousness, with certain understanding that something you did yesterday was horribly, horribly wrong, and you now understand completely what messed-up decision you made and just what kind of time-bomb you've left in your wake? XD
Luckily nothing happened before we started doing it right...

--
"Convictions are more dangerous enemies of truth than lies."
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 07:18 · Score: 0

I am working for a client whose IT department has promised management they will NEVER loose any email. The problem is that there are so many places in the email chain where data could be before being stored in the mailboxes. It could be in a queue on an MTA server or in a delivery queue on an antispam server etc. Our DR test showed that even when the snapshots were copied to our DR site, the data there would corrupt because of an issue with the mail servers in that environment that didn't show up until under load. Scary stuff.
Re:"they should have used ZFS or btrfs" by ximenes · 2009-10-11 08:02 · Score: 1

That is actually the accepted practice for backing up InnoDB databases with MySQL. Of course, you take a filesystem snapshot to get a point-in-time in-use copy of the database files; backing up the real filesystem wouldn't work due to the time shift of the data while you're backing up.
You do have to recover the files upon restoration (I would usually run the recover after the backup finished, since time is of the essence when you need to do a restore), but by not having to take a read lock or halt MySQL, you avoid a service interruption or having to replicate out your data just to backup (which could be problematic in itself, what if the slave server is not in sync?).
Of course, this works because of the way that storage engine functions on disk. You can't do this with MyISAM tables or you'll be in a world of hurt. There are also online backup tools readily available, which are the superior solution now in my opinion.
Re:"they should have used ZFS or btrfs" by AK+Marc · 2009-10-11 08:44 · Score: 2, Informative

Ever have a tape drive with misaligned heads? That one drive and only that one drive will be able to read those tapes, and sometimes even it can't read them after the tape is ejected, but will show OK on a verify done before the tape is ejected. You either have a verified backup that can't be used, or a pile of tapes that are completely useless if that drive ever fails.

I found one of these when doing a backup/restore to upgrade a server (backup the data from ServerA and restore the data on ServerB). It took a while to figure out why the tapes worked perfectly in ServerA and not at all in Server B (internal tape drives, fixed by swapping the drive from ServerA into ServerB for the restore, then discarding ServerA and the drive from it after).

For a server-loss scenario (fire, theft), this means there is no backup, yet something that wouldn't be discovered without restoring on a separate system. No idea how common this is, but in dealing with not many situations where it could pop up, I've seen it all of once.

--
Learn to love Alaska
Re:"they should have used ZFS or btrfs" by AK+Marc · 2009-10-11 08:54 · Score: 1

When someone claims it is a fix when the cause isn't even clearly known, it sounds a lot like zealotry, rather than a constructive suggestion.

--
Learn to love Alaska
Re:"they should have used ZFS or btrfs" by Myrimos · 2009-10-11 09:12 · Score: 1

Who the F*ck in the right mind fiddle something on SAN without confirming a full backup of all applications/databases?
Hopefully, somebody who'll be looking for a new job tomorrow. I wonder if T-Mobile will be hiring?

--
Internet scofflaw
Re:"they should have used ZFS or btrfs" by petermgreen · 2009-10-11 10:42 · Score: 1

then discarding ServerA and the drive from it after
Wouldn't the more logical thing to do have been to clearly label it and store it with the offsite backups until such time as you were sure all backups made with it were no longer needed?

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:"they should have used ZFS or btrfs" by myz24 · 2009-10-11 10:50 · Score: 1

I actually prefer to backup a MySQL slave because you can very easily tell it to stop replication and then do a snapshot or mysqldump while it is not replicating. You get a simple point in time snapshot of your database system. Once the backup is done tell the slave to start replication again. This assumes that your master database isn't over loaded trying to replicate data to slaves as the now out of date slave is going to need a lot of data to get caught up.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 10:52 · Score: 1, Interesting

In my experience, tape errors are very low, especially compared with other backup media (DVDs, CDs, hard disks.)
The problem is that with all the data that goes onto tape, the relatively small chance of errors ends up getting magnified. However, backup policies, tape rotations, RAIT (at the high end), and different backup methods minimize the damage errors can do.
Re:"they should have used ZFS or btrfs" by msi · 2009-10-11 11:04 · Score: 1

How do you do a bare metal recovery without unreasonable downtime? I have recovered from major failures a couple of times, luckily other people's mistakes, but it has always taken far too long to do in a scheduled downtime window. I have recovered to other systems, which seems to work in simulation, but with the production system still up and over a day or so to get everything off the tapes.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 12:14 · Score: 0

Rumour in Sunnydale is that they had been mandated to change from a Linux distro to a WindowZ distro. If so the implication is that this was user error by a user fully skilled in one but not both OS.
I do know that European mandates prohibit having users confidential data live in two places for privacy reasons. This European mandate complicates a number of cloud redundancy/ backup issues.
Since I do not know I feel safe in venturing a guess that a WindowZ system was connected to a large SAN/ RAID and WindowZ clobbered volume headers on 'other partitions'. I know that years back this was a very real issue and since I do not know which version of which OS my guess is as good as any.
Conversions from OS A to OS B, no matter what values you have for A and B, are Trouble.
Re:"they should have used ZFS or btrfs" by CastrTroy · 2009-10-11 12:20 · Score: 1

Well, the real accepted way is to do a "FLUSH TABLES WITH READ LOCK", which will flush all data out to the disk and lock all tables against writes. You then do a filesystem snapshot, and then you release the lock. You get a short interruption of a few seconds when you cannot write to the tables, but not long enough for anything to time out, with sane timeouts.
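For what it's worth, the lock, snapshot, unlock ordering can be sketched like this in Python. This is only an illustration under assumptions: `snapshot_under_read_lock` and `RecordingCursor` are hypothetical names, the cursor is anything with a DB-API style `execute()`, and the snapshot step is left as a callable because it depends on your storage (LVM, SAN, etc.). The two SQL statements are the real MySQL ones.

```python
from typing import Callable


def snapshot_under_read_lock(cursor, take_snapshot: Callable[[], None]) -> None:
    """Hold the global read lock only for as long as the snapshot takes."""
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        take_snapshot()  # e.g. trigger an LVM or SAN snapshot here
    finally:
        # Always release the lock, even if the snapshot step fails.
        cursor.execute("UNLOCK TABLES")


class RecordingCursor:
    """Stand-in cursor that records statements instead of talking to MySQL."""

    def __init__(self) -> None:
        self.log = []

    def execute(self, statement: str) -> None:
        self.log.append(statement)


cursor = RecordingCursor()
snapshot_under_read_lock(cursor, lambda: cursor.log.append("<snapshot taken>"))
print(cursor.log)
```

One thing to watch: the lock belongs to the session that took it, so the snapshot has to be triggered while that same connection is still open. If the connection drops first, the lock is gone and the snapshot is just a crash-consistent copy.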

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:"they should have used ZFS or btrfs" by sglewis100 · 2009-10-11 13:30 · Score: 1

Further, responsibility would probably be shared between a storage department, a server operations department and an application management department, neatly ensuring that no single person or function is in the position to even know what data is supposed to be backed up, what limitations there are to ensure consistency (cold/hot/inc/etc), to monitor that that's actually what does happen and that it keeps happening as the application and server configuration evolves.
I'm in charge of storage (and many other things) for a MUCH smaller shop, our app is used by 300 locations comprising 55 companies and has a mere 4,000 users. Even if we didn't backup reliably (we do) and do twice yearly disaster recovery tests (we do), before a SAN upgrade, we also halt snapshots, upgrade our DR SAN, resume snapshots assuming the two versions are compatible (they tend to be), catchup, then halt snapshots, upgrade the production SAN, test, go back online.

Seriously, Microsoft ran one data center for the whole operation and had no other available data online? Forget tapes for a minute - that's nuts!
Re:"they should have used ZFS or btrfs" by sglewis100 · 2009-10-11 13:40 · Score: 1

That's the thing that has always worried me most about SANs: you have all your eggs in one basket. No matter how redundant or reliable the hardware is, one bad update or trigger-happy admin can cause the instant loss of all your data. That's only slightly better than having your data center burn down. You still have your hardware, but a total restore like that can be a nightmare. I've heard somewhere that 80% of corporations couldn't recover from a scenario like that.
If you have one admin with unfettered access to everything, then yes, you could have a problem. But SANs don't put "all your eggs in one basket". First of all, that same admin would have that same access if you used DAS. Which of our three SANs would you say we have all of our eggs in, by the way?

Here's some fun numbers: a typical tape restore runs at something like 70MB/sec, if you're lucky, per tape drive. Some small low-end SANs that I see people buying these days are 10TB or bigger. At those speeds, it takes 40 hours to restore the complete system. What's worse is that it doesn't scale all that well either, you can get more drives, but the storage controllers and back-end FC loops become a limit. If you have some big cloud provider scenario, a complete restore could take days, or even weeks.
I don't want to argue your point, since you are correct - small shops are buying setups like that. The smart shops, of course aren't. They are using a combination of nearline storage for recent backups, tape for long term storage, deduplication for efficiency, snapshots for quick recoveries, mirroring for offsite recovery, and naturally they have more than one LTO-4 drive driving their backups. Even those cheap Dell TL's support multiple drives. But you are right, a lot of people are buying hardware they have no ability to manage effectively.
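The 40-hour figure in the quoted post checks out, by the way. A quick sketch of the arithmetic in Python, assuming decimal units (1 TB = 1,000,000 MB) and, for the multi-drive case, perfectly linear scaling, which the quoted post already points out you won't really get:

```python
def restore_hours(capacity_tb: float, mb_per_sec: float, drives: int = 1) -> float:
    """Hours to stream capacity_tb off tape at mb_per_sec per drive."""
    total_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (mb_per_sec * drives)
    return seconds / 3600


print(round(restore_hours(10, 70), 1))            # 39.7 (one drive)
print(round(restore_hours(10, 70, drives=4), 1))  # 9.9 (four drives, ideal scaling)
```

So even four drives running flat out leave you with most of a working day of downtime for a 10TB array, before counting tape changes or controller bottlenecks.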

What's scary is that mirroring or off-site replicas don't help. If your array starts writing bad blocks, those will get mirrored also.
I'm not really sure that's something to worry about, granted, I have less experience with the lower end of the SAN market.
Re:"they should have used ZFS or btrfs" by AK+Marc · 2009-10-11 13:41 · Score: 1

Yes. However, in this case, archival copies are made in a different manner (network push once every month, not used for regular backups because it takes too long). So trashing the old server and backups didn't lose anything of value.

--
Learn to love Alaska
Re:"they should have used ZFS or btrfs" by ximenes · 2009-10-11 13:46 · Score: 1

Yes, but as I mentioned, with InnoDB specifically this is not necessary. Because it writes to disk atomically, you will get a valid point-in-time copy of the database simply by taking a filesystem snapshot; no read lock required, which means the application can continue operating from the user's perspective.
The problem with a read lock is that, if done on a master DB, you will impact the production service that uses the database. Depending on the workload, this could take a minute or even longer, which is usually not acceptable.
However, there's another problem: MySQL performance degrades significantly on LVM when a snapshot is active. So even though the database continues operating as usual, performance will not be the same (and perhaps not at all adequate) during the backup period -- especially considering that you're doing extra disk I/O to get the data copied off.
So, I prefer to use xtrabackup these days. This presumes that you have no MyISAM tables though; otherwise you're back to mysqldump or taking a read lock or some other less desirable method.
One other point: if you backup with filesystem snapshots (of the raw DB files), then you have to restore the entire database during a restore. Maybe this is fine and maybe it's a huge headache.
There are a million ways to backup MySQL (and other DB's), and it really comes down to what kind of downtime you can tolerate during your backup. I generally want to back up very frequently, without impacting the service, and avoiding replication (and all of the headaches involved in that -- see the existence of tools like mk-table-sync for an idea of what can go wrong) if possible. If you don't have those requirements, then mysqldump or mylvmbackup or something else are totally valid options.
Re:"they should have used ZFS or btrfs" by ximenes · 2009-10-11 13:52 · Score: 1

That's a completely valid option, but I'm leery of MySQL replication due to prior experiences. When it works it's fine, it just has a few issues that I've had crop up. Keep in mind, if your official backup copy is coming from the slave, you have to make absolutely sure that the data is really in sync and up-to-date.
That means using tools like mk-table-checksum and mk-heartbeat from Maatkit. If you're not using them (or comparable things), then your data could be silently corrupted or out of date which would invalidate your backup. Note that seconds_behind_master from MySQL is kind of a joke for verifying that your slave is up-to-date.
My other beef with MySQL slaves is that they, by design, can only write in a single thread whereas the master can use all of its cores to do this. So even with two identical systems, the master may be fine at load and the slave may totally choke. People also have a habit of purchasing underpowered slaves, because "they don't do anything", forgetting that they still do 100% of the write load from the master, even if nothing else uses the host. Buying larger hardware just to keep up with the master for a once-an-hour backup feels dirty to me, but it is what it is.
Re:"they should have used ZFS or btrfs" by ndege · 2009-10-11 14:28 · Score: 1

And, to summarize, who pays for all of this?
The T-Mobile user.

--
Sig Return: 204 No Content
Re:"they should have used ZFS or btrfs" by saleenS281 · 2009-10-11 15:47 · Score: 1

Synchronous replication is great right up until it replicates the corruption. Replication is not a replacement for backup.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 15:50 · Score: 2, Insightful

Nice background, but all useless when the problem they had was morons upgrading the SAN firmware without a proper backup...
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 17:11 · Score: 0

I find it interesting that folk are saying that ZFS or btrfs would have solved it given how hard they were apparently recruiting for Oracle RAC folk a year or more ago.
What if this was an Oracle DB on a raw SAN partition that got clobbered.... because the SAN was connected to yet one more system box that failed to be hands off of the regions that Oracle-RAC had critical bits on.
Re:"they should have used ZFS or btrfs" by kiwi-backup · 2009-10-11 18:29 · Score: 2, Insightful

Backup is expensive. A disaster recovery exercise is very expensive and brings no extra value to the customer. Managers want more value for the customer to get more money, not extra expense. It's very hard for the security team to get time for this kind of thing.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-11 20:16 · Score: 1, Informative

Just as well they didn't promise to never lose any email. In fact, I don't even know what it means to "loose" email.
Re:"they should have used ZFS or btrfs" by tgd · 2009-10-12 00:21 · Score: 1

Other than being a tech bigot, what does it being on a MS platform have to do with anything?
In the last 20 years I've seen vastly more incompetently managed enterprise Unix systems.
This sort of a failure is a process failure, not a technology failure.
Re:"they should have used ZFS or btrfs" by jabuzz · 2009-10-12 00:22 · Score: 1

That's bollocks. I routinely back up a 70TB file system, where routinely is every damn day. You just need to get better tools. For the record I use IBM's TSM for the backup and GPFS for the file system.
Re:"they should have used ZFS or btrfs" by jabuzz · 2009-10-12 00:30 · Score: 1

Did they upgrade their SAN or their storage arrays?
It is hard to see how upgrading the firmware on a bunch of fibre channel switches could hose the data on the disk.
On the other hand I can think of plenty of ways a firmware upgrade to a storage array can hose all your data. The first one is *NEVER* upgrade more than one shelf at a time. The second is *NEVER* upgrade the firmware on more than one hard disk in a RAID array at a time. Don't believe *ANY* vendor that tells you it is fine to select all the shelves etc. in your array and upgrade them in a single go. If your storage array does not allow one shelf etc. at a time, when you replace it pick a vendor that does allow this.
Re:"they should have used ZFS or btrfs" by godefroi · 2009-10-12 02:18 · Score: 1

Happened to us. Our (previous) SAN vendor decided to replace a battery in a controller while the system was online. *POOF* no data.
This particular SAN is now referred to as the "Fisher-Price SAN" internally here.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)
Re:"they should have used ZFS or btrfs" by asaul · 2009-10-12 02:42 · Score: 1

I am talking about 6Tb in the time of DLT7000 drives and 9G disks. As I recall a "failure" was mostly window overruns caused by jammed tape drives or crap performance. I think it also used AdvFS clones which also had some issues.
The moral of the story anyway was backing up what was needed, not what was there.

--
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
Re:"they should have used ZFS or btrfs" by atari2600 · 2009-10-12 02:43 · Score: 1

Your comment is fucking retarded. Do you even know what a striped array is? (that's what I assume you mean by "stripped"). How is it +5 Insightful?
Re:"they should have used ZFS or btrfs" by Spazmania · 2009-10-12 02:55 · Score: 1

i'd hazard a guess that the offsite backups were corrupted as well somehow or were silently failing.
That seems unreasonably common with backup systems.

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-12 03:02 · Score: 0

ZFS is not only a file system it also has mirroring and RAID built in it.
Re:"they should have used ZFS or btrfs" by Cytotoxic · 2009-10-12 03:18 · Score: 2, Informative

We had a similar failure here. Had to replace a battery in a redundant SAN controller... it was under support with the vendor so they sent out a rep to do the fix - everything went just fine. Then poof - one whole shelf went dark. No problem, we designed the system to handle that - all arrays striped vertically with no two drives on any one shelf. Then the vendor took the backup card offline to repair the problem. Poof - another shelf down. Uh, oh! A little more work got the shelves back on line - but the drives had been totally corrupted by the glitchy controller. Luckily, not being idiots our engineers had full backups. Unluckily it took days to fully recover everything. Lesson learned - there is no such thing as a safe fix. We moved critical systems off of our "Fisher-Price SAN" over the next several months and it has not caused any additional catastrophes, but we learned a lot about redundancy - a single hardware failure can cut through a lot of layers of redundancy and bring you down hard when the failure mode is less than "off".
Re:"they should have used ZFS or btrfs" by kriston · 2009-10-12 03:22 · Score: 1

The backups may be intact but from the old system.
I would not hesitate to guess that Microsoft was migrating accounts to a new system that they developed, like they did with Hotmail. This time they may have really old backups on the old system and incomplete/unusable backups on the new system and not an easy way to restore everyone's accounts without restoring them to something from several months ago.
I really don't think there is a cautionary tale here for technology. It is one for user behavior and the need for users to safeguard their own data.

--
Kriston
Re:"they should have used ZFS or btrfs" by cbreaker · 2009-10-12 03:43 · Score: 3, Informative

The technology is available to get good, solid backups for anything. They just didn't use it, test it, verify it, etc. And in the case of this, users cannot back up their own data. And what they lost isn't backups.

I used to have one of these things.

The phone is (like someone above pointed out) a local cache of what's on the server side. The live database/back end is what crashed. When you make a change on the phone, it immediately sends that change to the server. You can login to the sidekick web site and make changes there, which appear quickly on your phone. If you reboot your phone, it will retrieve anything it needs from the server side. Apparently, the phone doesn't even keep a permanent local copy on some sort of non-volatile storage (hence "Don't turn off your phone.")

It's like someone that uses Google apps and stores all their documents on their system. If that system should go down, you'd be screwed, except that you COULD back up your documents locally. With this case, you can not.

I don't really like the term "cloud computing." All it means is server storage somewhere on the Internet. Under this term you could call any web site a "Cloud." It's ambiguous at best.

--
- It's not the Macs I hate. It's Digg users. -
Re:"they should have used ZFS or btrfs" by StuartHankins · 2009-10-12 04:10 · Score: 1

70 TB a day? <sniff> You have neater toys than me.
Re:"they should have used ZFS or btrfs" by amicusNYCL · 2009-10-12 04:16 · Score: 2, Funny

I don't know where you hang out at night, but where I hang out people who call themselves things like "webmistressrachel" are not men.
Like I said, your mileage may vary..

--
"Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
Re:"they should have used ZFS or btrfs" by cbreaker · 2009-10-12 05:04 · Score: 1

Woah, Millennium?

Here I was thinking that computers have only been around for 50 years..

--
- It's not the Macs I hate. It's Digg users. -
Re:"they should have used ZFS or btrfs" by Panaflex · 2009-10-12 05:05 · Score: 1

The MySQL slave makes you feel dirty... good one...

--
I said no... but I missed and it came out yes.
Re:"they should have used ZFS or btrfs" by cbreaker · 2009-10-12 07:14 · Score: 1

That's not actually wrong, depending on the database.

Filesystem snapshot backups work the same way. It's not the same as powering off/on a machine; you don't have to contend with filesystem corruption which is the leading cause of DB corruption - not sudden crashes.

They call this "Crash-consistent" backups.

A modern database engine is a transactional one. Transactions (any change to the database) are first written to a transaction log. The log is periodically written to the database. The log is then checkpointed to indicate that the transaction has been successfully written to the database.

If the database engine should crash or if the machine should crash and restart, the database engine will run a recovery because the database was not shut down cleanly. The recovery checks the transaction log to make sure all of the transactions have been written to the database. If there was a transaction being written to the database when the crash occurred, that transaction will be scrubbed and replayed from the log. If a transaction was being written to the log, that transaction is cancelled.

So, the worst case scenario is that you lose a couple transactions that were JUST being written to the log.
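Here is a toy version of that recovery step in Python, just to show the shape of it. It is a sketch under heavy assumptions, not how any particular engine does it: the "database" is a dict, the log is a list of tuples, a complete record ends in a "COMMIT" marker, and `recover` is a made-up name.

```python
def recover(database: dict, log: list, checkpoint: int) -> dict:
    """Replay complete log records after the checkpoint; drop a torn tail.

    Each complete record is a (key, value, "COMMIT") tuple. Anything else is
    treated as a record that was still being written when the crash hit.
    """
    recovered = dict(database)
    for record in log[checkpoint:]:
        if len(record) != 3 or record[2] != "COMMIT":
            break  # incomplete record: this transaction is cancelled
        key, value, _ = record
        recovered[key] = value  # replaying twice gives the same result
    return recovered


# "a" reached the data files before the checkpoint, "b" only reached the log,
# and "c" was half-written to the log when the machine went down.
data_files = {"a": 1}
log = [("a", 1, "COMMIT"), ("b", 2, "COMMIT"), ("c", 3)]
print(recover(data_files, log, checkpoint=1))  # {'a': 1, 'b': 2}
```

The snapshot case is the same thing: whatever state the data files and log were frozen in, recovery rolls the log forward over them on startup, which is why only the half-written "c" is lost.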

The likelihood of a database being corrupted due to this process is very, very low. So, for most normal databases, this method is actually acceptable. For something very critical (such as order processing databases or something) you'll want to make sure you initiate some sort of database-aware quiescence of the database file before taking the snapshot. This will tell the database engine to write all current transactions to disk and stop writing new ones for a moment so we can get our snapshot. This still leaves us with a crash-consistent copy of the database, but it's nearly guaranteed to be consistent when we start up.

So there you have it. Your engineer wasn't wrong.

--
- It's not the Macs I hate. It's Digg users. -
Re:"they should have used ZFS or btrfs" by cbreaker · 2009-10-12 07:22 · Score: 1

Yea, I agree that tapes are fairly reliable. I mean, at least as reliable as any other type of backup medium. And, the newer LTO and SDLT drives are less susceptible to broken leaders and such.

However, they can be fragile, and it's necessary to test backups at least once a year, and remove old tapes from rotation every so often depending on what kind of work load they are under. One of the biggest mistakes small companies make here is using the same tapes for years and never replacing the tape drive itself. I've seen drives in use that are 15 years old..

--
- It's not the Macs I hate. It's Digg users. -
Re:"they should have used ZFS or btrfs" by Anonymous+McCartneyf · 2009-10-12 07:26 · Score: 2, Interesting

The current millennium has only been around for nine years and ten months. (Eight years and ten months if you are a traditionalist and think the Nineties ended in 2001.)
Then again, good back-up policy predates computers. If Microsoft/Danger had the same dedication to backups of valuable documents as monasteries did back in the 1000s, this sort of mess wouldn't have happened.

--
There is a fine line between recklessness and courage... -- Paul McCartney
Re:"they should have used ZFS or btrfs" by benedictaddis · 2009-10-12 11:15 · Score: 1

You lost me at 'Viola'
Re:"they should have used ZFS or btrfs" by Anonymous Coward · 2009-10-12 12:10 · Score: 1, Funny

>To recover the application all they had to do was backup a few gig of config and binaries, and restart slurping data from upstream again. Viola - backup stripped down to nothing
I'm interested in learning more about how a string instrument factored into your solution.
Re:"they should have used ZFS or btrfs" by Meski · 2009-10-14 15:22 · Score: 1

No, but the mirror drive in my attic (which I've named Dorian) does. :^)

A server failure? by corsec67 · 2009-10-10 21:36 · Score: 3, Informative

A server failure caused all of the data to be lost?

No backups? Not even a spare server with a mirror of the data? No servers in different places? No off-site backup strategy?

As an aside, why would that data be stored in volatile non-battery backed up ram? All of my graphing calculators have a special battery to keep the ram, and they aren't even supposed to store important stuff. Flash is cheap enough these days, why should simply removing the battery cause important data to be lost?

--
If I have nothing to hide, don't search me

Re:A server failure? by Hadlock · 2009-10-10 21:58 · Score: 3, Insightful

Reportedly sidekicks are thin clients, other than making phone calls, everything on the phone is saved on the server side. Which is a special kind of retarded, in today's world where a blackberry performs all the same functions, and provides a local backup feature. But yeah as for the backups, all your backups are worthless if your data backup code is flawed, and nobody ever checks the backup tapes. When MS bought the service, they probably changed the location the servers were in, plugged everything back in, and kept going. I imagine a project like that would be on a short timetable, and "checking to see that the backup tapes are really being backed up to" is low on the priority list when the service is already live.

--
moox. for a new generation.
Re:A server failure? by PolygamousRanchKid+ · 2009-10-10 22:15 · Score: 4, Funny

A server failure caused all of the data to be lost?
Maybe it was the server failure . . . maybe they only had one . . . ?

--
Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
Re:A server failure? by Anonymous Coward · 2009-10-10 22:36 · Score: 0

Sounds familiar. Having a backup protocol and a monkey to change the tapes (that's me!) is a whole world away from verifying that the data is actually backed up and that it's restorable.
Even so, a failure to have those backups is a shocking failure of service. Go go gadget lawsuits!
Re:A server failure? by Ma8thew · 2009-10-10 22:42 · Score: 0

That's a good point. I would guess Microsoft only bought Sidekick for the IP and staff, so the Sidekick itself would probably be more of an annoyance than an asset. Maybe this is Microsoft's way of getting rid of customers! Probably not, as the saying goes: "Never attribute to malice that which can be adequately explained by stupidity".
Re:A server failure? by Serious Callers Only · 2009-10-10 22:44 · Score: 4, Informative

There's some interesting background leaks on the takeover of Danger in this article which seem to imply they cut a lot of staff, and gutted the company, which is now running on a skeleton staff. So I guess it's not too surprising when this sort of mistake is made. Not the most reliable source, but they did definitely cut a lot of danger staff after the acquisition.
Re:A server failure? by Anonymous Coward · 2009-10-10 22:48 · Score: 0

It takes a special kind of retarded to make Blackberries look good!
Haha - captcha was 'cremated'!!!
Re:A server failure? by Anonymous Coward · 2009-10-11 00:03 · Score: 0

I imagine [...] "checking to see that the backup tapes are really being backed up to" is low on the priority list when the service is already live.
Sorry to say, I hope you're not a network or system administrator. If you are, please let us know which company we should avoid like the plague.
Re:A server failure? by Locutus · 2009-10-11 02:41 · Score: 3, Funny

in hindsight, firing the person(s) doing backups was probably not a good move. ;-)

LoB

--
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
Re:A server failure? by mikael · 2009-10-11 03:13 · Score: 1

Reportedly sidekicks are thin clients, other than making phone calls, everything on the phone is saved on the server side. Which is a special kind of retarded,
What if you are at a conference and lose your phone, drop it into the punch bowl, or have it fall out of your pocket and get run over by a taxi? Having some sort of remote backup seems a good idea.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:A server failure? by adf92343414 · 2009-10-11 04:28 · Score: 1

You're modded funny, but having only 1 server might have been best given their apparent system design. If failure of only 1 server is enough to cause all of your customers to lose their data, you're better off having only 1 server. For example, having 3 servers would triple the risk of system failure.

Of course, it is rather insane to have a system where one server failure loses all your customers' data, but everybody (including, now, the Danger people) knows that.
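The "triple the risk" figure above can be checked with a quick sketch. It assumes each server fails independently with the same probability, which is an illustrative assumption and not anything known about Danger's setup; the 1% figure is made up as well.

```python
def p_any_failure(p_single: float, n_servers: int) -> float:
    """Probability that at least one of n independent servers fails."""
    return 1.0 - (1.0 - p_single) ** n_servers


# Hypothetical per-server failure probability, for illustration only.
p = 0.01
for n in (1, 3):
    print(f"{n} server(s): {p_any_failure(p, n):.6f}")
```

For small p the result is close to n * p, so three servers come out at slightly under three times the risk when any single failure is fatal.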
Re:A server failure? by AmberBlackCat · 2009-10-11 06:58 · Score: 1

I don't think that's true. My nephew's Sidekick has a MicroSD memory card slot.
Re:A server failure? by Lennie · 2009-10-11 09:09 · Score: 1

Actually, I think I read somewhere that it was maintenance on a SAN which was having problems. But the stupidity is, they didn't have a good backup.

--
New things are always on the horizon
Re:A server failure? by Lennie · 2009-10-11 09:12 · Score: 1

I read somewhere it was a SAN which had problems and got maintenance and failed after that. But not having a backup is just stupid. They probably thought, it's fancy pancy RAID-whatever it'll never fail. Right.

--
New things are always on the horizon
Re:A server failure? by Tisha_AH · 2009-10-12 01:16 · Score: 1

I found it interesting in one of the news articles that Microsoft stated it was not a problem with their systems.
Guess what, buddy. When you buy a company for $500 million (a year ago), it becomes your system.
Microsoft should have had people standing at the front door the day after the sale was completed to go in and document/standardize the operating procedures of Danger. Since they purchased the company they assume all responsibility for its operation. To now claim that they are blameless is disingenuous at the least.
Of course, since the name was not changed to "Microsoft" they are hiding behind the illusion of a shell corporation where they can always proclaim their innocence.
As other posters have pointed out, backups are one of the key tenets of IT. I would assume that Microsoft would have a grip on essential business procedures in the IT world.

--
Tisha Hayes
Re:A server failure? by MadKeithV · 2009-10-12 03:50 · Score: 1

Oh, we didn't fire the person doing the backups.
We fired the person doing the restores.

Why not store the data on phone permanent memory? by maxwell demon · 2009-10-10 21:37 · Score: 1

I mean, having the data backed up in the net may be nice, but not having it stored on permanent memory in the phone IMHO is silly. Even without a server failure: What if you don't have net access, and your phone's battery gets empty? I also expect to be able to switch off my phone any time I want without data loss, no matter whether I currently have net access, and no matter whether I have actually changed any data since the last time I had net access.

--
The Tao of math: The numbers you can count are not the real numbers.

What about the backups? by christwohig · 2009-10-10 21:38 · Score: 4, Interesting

So are we saying Microsoft didn't have a backup? What about an offsite backup? Who wants to bet they were using their own backup solution? If they had a decent storage array they could have had snapshots and offsite replicas to restore from.

Sidekick by nadaou · 2009-10-10 21:38 · Score: 4, Funny

shit, is that TSR still hanging around? goodness!

If the above means anything to you, "apt-get install joe mc" will make you smile as well.

--
~.~
I'm a peripheral visionary.

Re:Sidekick by RenHoek · 2009-10-10 21:52 · Score: 1

Hehe I was an avid user of Sidekick. And yes, 'joe' happens to be my unix editor of choice.
Re:Sidekick by Tetra · 2009-10-10 23:00 · Score: 1

apt-get install mc is the FIRST thing I do after debootstrap.

--
Regards, tEtra
Re:Sidekick by miffo.swe · 2009-10-10 23:50 · Score: 1

When i first read the headline i also thought, "wtf, is that still around?".
I do miss Norton Commander but Midnight Commander is really nice as a replacement.

--
HTTP/1.1 400
Re:Sidekick by tangent3 · 2009-10-11 03:06 · Score: 2, Informative

Ohh yes.. Need an ASCII table? It's just a Ctrl-Alt away
Re:Sidekick by AliasMarlowe · 2009-10-11 04:12 · Score: 1

shit, is that TSR still hanging around? goodness!
Ah, the ASCII table and an actual clipboard were essential. I'm sure there's a 360k floppy with it somewhere in the barn, along with other delights of the early 1980s, like WordStar.

If the above means anything to you, "apt-get install joe mc" will make you smile as well.
In a few years, it'll be "apt-get pension"...

--
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Re:Sidekick by osu-neko · 2009-10-11 07:07 · Score: 1

Smile, hell, I can't get any work done until joe is installed...

--
"Convictions are more dangerous enemies of truth than lies."

Backups? by ipsi · 2009-10-10 21:41 · Score: 3, Interesting

Either this is a really, really serious meltdown which completely killed not only the server but all their backups as well (and what're the chances of that?), or their IT guys have been really, really slack and just didn't make any backups...

Guess they should have used a better smartphone, like *anything* else on the market... Even the cloud-centric Pre will still work if you don't have access to the Cloud - even if Google and/or Palm dies, you'll still have all your information on your phone! Jesus... Doesn't inspire confidence...

Re:Backups? by TheSunborn · 2009-10-10 21:47 · Score: 5, Insightful

Or this was really a software error, and the backup servers in another datacenter just copied the faulty data/delete command.
They should really be far too big to have all their data stored in a single datacenter with no offsite backup. (Or they should have an entry on thedailywtf.com)
Re:Backups? by ipsi · 2009-10-10 22:10 · Score: 1

Yeah, that's a fair point. They don't actually say whether it was hardware or software. Just 'server failure'.
Re:Backups? by asaul · 2009-10-10 23:32 · Score: 1

An article linked to above suggested the cause was a firmware upgrade failure on a HDS array - sounds like maybe it lost the config or did something nasty during the upgrade. At any rate the core question is where is the backup tape?

--
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
Re:Backups? by 1s44c · 2009-10-11 03:47 · Score: 1

Either this is a really, really serious meltdown which completely killed not only the server but all their backups as well (and what're the chances of that?), or their IT guys have been really, really slack and just didn't make any backups...
.. or they decided to save money by firing the wrong smart and slightly unstable guy and he took revenge ..
Re:Backups? by sincewhen · 2009-10-11 14:19 · Score: 1

Under that configuration the "backup" data centre isn't one, it is more like a RAID-1 duplicate for high availability.
Remember, of course, that RAID is not backup!
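A toy illustration of the difference, with plain dictionaries standing in for the data stores. The names and the `replicate` helper are hypothetical; this is a sketch of the general idea, not of how Danger's systems actually worked.

```python
import copy


def replicate(primary: dict, mirror: dict) -> None:
    """Make the mirror match the primary exactly, deletions included."""
    mirror.clear()
    mirror.update(copy.deepcopy(primary))


primary = {"alice": ["contacts", "photos"], "bob": ["calendar"]}
mirror: dict = {}
replicate(primary, mirror)

# Point-in-time backup taken before the fault and kept offline.
backup = copy.deepcopy(primary)

primary.clear()              # a buggy delete wipes the live store
replicate(primary, mirror)   # replication faithfully copies the damage

print("mirror:", mirror)
print("backup:", backup)
```

The mirror ends up as empty as the primary, while the offline copy still holds the data from before the fault.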

--
-- Braden's law of data: All data spends some of its lifetime in an excel spreadsheet.

Microsoft/Danger by delta98 · 2009-10-10 21:52 · Score: 3, Funny

'nuff said.

Priceless ... by foobsr · 2009-10-10 21:54 · Score: 0

Microsoft/Danger has stated that they cannot recover the data but are still trying.

CC.

--
TaijiQuan (Huang, 5 loosenings)

Re:Priceless ... by thePowerOfGrayskull · 2009-10-11 10:26 · Score: 1

Microsoft/Danger has stated that they cannot recover the data but are still trying. CC.
There will be closed captioning on the recovery attempt? What channel?!

It's The Backups Stooped by tres · 2009-10-10 21:57 · Score: 4, Insightful

This is an issue of irresponsibility. Plain and Simple. The company responsible for maintaining the data should -- at the very least -- have had some full system backup from last month. If they had some old backup somewhere at least you could chalk it up to systems failure or bad backup tape or bad admin or something.

But the fact that there is no backup anywhere indicates brazen negligence on the part of everyone responsible for the data. Everyone who had a part in designing the system and managing the system is culpable. The most ridiculous part of this is the over-reliance on server-side data storage by the sidekick designers.

--
Notes From Under *nix: blas.phemo.us

Re:It's The Backups Stooped by 1s44c · 2009-10-11 03:54 · Score: 4, Insightful

But the fact that there is no backup anywhere indicates brazen negligence on the part of everyone responsible for the data. Everyone who had a part in designing the system and managing the system is culpable. The most ridiculous part of this is the over-reliance on server-side data storage by the sidekick designers.
I will bet you there were good people -SCREAMING- to fix the backups, implement and test failover and all sorts of other good things. In my experience things like this are due to management refusing to spend money fixing problems that have not lost customers yet.
Re:It's The Backups Stooped by tres · 2009-10-11 05:13 · Score: 1

So it sounds like someone decided to upgrade the hardware storing the data without making a backup.
Any 'good people' involved should have had a plan. The plan should have included some back-out procedure (which implies step 1 is that data replication takes place and is verified). If management didn't want to pay for the physical requirements of the plan, then I hope that the systems engineers got it in writing (via submission and rejection of a written plan). For something this critical, doing the paperwork not only makes it much easier to respond to problems when they occur, it means there's a nice big paper CYA there when management decides to do things on the cheap.
So I agree, management is just as culpable as the guy(s) who decided to do this without having a proper plan in place.
I still can't get over the fact that so much relied upon one single SAN operating. It's just insane.

--
Notes From Under *nix: blas.phemo.us
Re:It's The Backups Stooped by foxylad · 2009-10-11 13:26 · Score: 1

Mod parent up. At a mid-sized company who shouldn't remain nameless but will, I built a web service for corporate users. I got redundant servers and mirrored disks past the management, but when it came to backups, they knocked me back and told me to dump everything to the development server each night. I left soon after, and the development server got repurposed as a desktop for the next new salesperson.
I kept the relevant emails from management, and practised a cynical laugh for when they phoned asking me to fix the system when it crashed. But despite three years of complete neglect, the system keeps running - I built the thing too darn well. The sad thing is the management probably would see this as vindicating their decision.

--
Do as you would be done to.
Re:It's The Backups Stooped by metaforest · 2009-10-11 16:52 · Score: 1

Ok,
What is known?
Here's a quick summary.
1. Microsoft buys Danger as they complete the Sidekick LX2009... Sidekick is based on a BSD variant.
2. Microsoft tries to kill off the Sidekick Project shortly after getting control of Danger. T-Mobile makes it clear that they will hold MS/Danger to the agreements for the LX 2009. http://www.hiptop3.com/archives/microsoft-lays-off-danger-employees
3. After the LX2009 is released to manufacturing, The Sidekick team is laid off.
4. Further resource trimming and outsourcing results in a dangerous lapse in system management...
5. Danger's SAN service collapses during surgery by Hitachi Technicians.... Danger's platform fails. For some unknown reason there were no backups of the SAN... http://www.hiptop3.com/archives/what-caused-the-sidekick-fail
conclusion:
If I were a betting man I'd say that Microsoft was well into the 'extinguish' portion of their 'Embrace, Extend, Extinguish.' business cycle with the Danger acquisition when this happened. I have no doubt that this was not the way that MS intended to apply the final deathblow.... but hey... mission accomplished. Sidekick is dead. Danger is all but dead. T-Mobile is in a world of hurt....

Re:Why not store the data on phone permanent memor by Anonymous Coward · 2009-10-10 21:58 · Score: 4, Informative

Because the entire Sidekick architecture is very client-serverish, not transparent as with ordinary phones (GPRS/EDGE/UMTS/etc. through a NAT to internet at large); the server is supposed to be responsible for all that data, and the phone is just caching it. Given that architecture, asking why the local copy is on volatile RAM is analogous to asking why your CPU doesn't have a battery backup for system RAM, or even L2 cache.

That's one of the big reasons I didn't go with a sidekick, even though they have (or had, last I was shopping around) basically the cheapest internet plans available; they push all sorts of stuff that's handled by the phone in any other system off to the Danger servers. While that does expose you to other people losing your data, as seen here, I didn't even consider that. I just like having a direct internet pipe, so I can run whatever software I want locally.

That said, there are plain benefits to the Sidekick model, for some people. Basically, if you don't want to do funny stuff on your phone, and if you're no less incompetent than the MS/Danger sysadmins, it's better. After all, if you drop your sidekick in a toilet, run over it with a truck, and vaporise it with a plasgun, you can just get a new one and have all your data back -- which is good, since if you're 95% of people, you've _never_ backed up your phone's data. But it's not for me, and given your desire to have your phone work as a PDA even if you power-cycle it in a wilderness/cave/other net-less place, it's not for you either.

Microsoft was testing the US gov edition by AHuxley · 2009-10-10 21:58 · Score: 5, Funny

Right feature, wrong server? MS understands the need for a "Rose Mary Stretch" default setting.
The congress critters have learned a lot from the "terrible mistake" of email backups.
From cute page boys to Iran contra, MS can market this as a feature.

--
Domestic spying is now "Benign Information Gathering"

Re:Microsoft was testing the US gov edition by Anonymous Coward · 2009-10-10 22:02 · Score: 0

LOL

This by Anonymous Coward · 2009-10-10 22:00 · Score: 0

is one of the reasons I don't like cloud computing.

The other is that you need internet to get to your stuff. I've had very negative experiences with my previous ISP (sometimes two weeks without more than 5 minutes of internet) so now I don't trust anything that requires me to be online.

DIY phone backups by golfnomad · 2009-10-10 22:03 · Score: 4, Informative

There are 3rd party apps out there that will let you "backup" your phone data yourself. I personally use a program called bitpim www.bitpim.org (make sure you d/l latest version). It works with many different phone models and I have used it several times to "restore" my phone data (had 2 phones with hardware issues). It restored my calendar, notes, phone book and ring tones (that last one can save you d/l $$$). It is easy enough to install and use, you do not have to be a total geek to make it functional (but having one available to help you set up backups would probably help). Been working in the IT industry too long to rely on someone else backing up my data for me, and I will not encourage Murphy to have a party in my honor!

Re:DIY phone backups by Anonymous Coward · 2009-10-10 22:47 · Score: 0

I will not encourage Murphy to have a party in my honor!
*throws down balloons* But I've been working on it for weeks!
Re:DIY phone backups by RyuuzakiTetsuya · 2009-10-10 23:08 · Score: 1

Really? Mine's grape, and i use itunes.
Seriously, even ActiveSync looks good now.

--
Non impediti ratione cogitationus.
Re:DIY phone backups by Hadlock · 2009-10-10 23:31 · Score: 1

The Sidekick saves everything server side. Other than making phonecalls, it's a thinclient.

--
moox. for a new generation.
Re:DIY phone backups by Anonymous Coward · 2009-10-11 02:10 · Score: 0

Why are "backup" and "restore" in quotes? These are not terms without meaning. Are you implying that these are really code words for, "gave me a happy ending" or "took some blow off a hooker's ass?"
Re:DIY phone backups by HaloZero · 2009-10-11 03:20 · Score: 1

Unfortunately, BitPim would not have been an option for this situation, as the T-Mobile Sidekick does not store any data locally - it simply pulls down an image from the giant server in the sky on a reboot.
And that's the 'server' that's gone, now.
This is an epic fail; many bards will send their sons to school on the song that will be sung from this gross misadventure.

--
Informatus Technologicus
Re:DIY phone backups by Anonymous Coward · 2009-10-11 07:34 · Score: 0

I have an app that backs ALL the data on my phone locally, works great and its free. It's called iTunes.
I'll back up my own phone data thanks.

WTF by ShooterNeo · 2009-10-10 22:08 · Score: 4, Insightful

This is unbelievably bad. The real problem is : why aren't there incremental off site backups to another server farm? A weekly binary difference snapshot would have made this failure less catastrophic.

Ultimately, with a complex application like this, you can't guarantee 100% that the code doesn't have a bug in it that could result in loss of user data. You can be ALMOST sure it won't, but 100% is not possible with current analysis techniques. (even a mathematical proof of correctness wouldn't protect you from a hacker)

But a properly done set of OFFLINE backups, stored on racks of tapes or hard disks in a separate physical facility : you can be pretty sure that data isn't going anywhere.
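A minimal sketch of the incremental idea, under stated assumptions: it works per record rather than as a true binary diff, the store is just an in-memory dictionary, and `digests` and `make_increment` are hypothetical helper names.

```python
import hashlib


def digests(store: dict[str, bytes]) -> dict[str, str]:
    """SHA-256 digest of every record, keyed by record name."""
    return {key: hashlib.sha256(value).hexdigest() for key, value in store.items()}


def make_increment(prev_digests: dict[str, str], store: dict[str, bytes]):
    """Return (new or changed records, names deleted since the last backup)."""
    now = digests(store)
    changed = {k: store[k] for k, d in now.items() if prev_digests.get(k) != d}
    deleted = sorted(k for k in prev_digests if k not in now)
    return changed, deleted


week1 = {"alice": b"contacts v1", "bob": b"calendar v1"}
baseline = digests(week1)          # kept alongside the full backup

week2 = {"alice": b"contacts v2", "carol": b"photos v1"}
changed, deleted = make_increment(baseline, week2)

print("ship offsite:", sorted(changed))
print("deleted since last week:", deleted)
```

Only the changed and new records need to travel to the other site each week. Deletions are recorded instead of being applied blindly, so a faulty delete can still be rolled back from the previous snapshot.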

Re:WTF by Anonymous Coward · 2009-10-10 23:57 · Score: 0

So how long had the cloud been functioning without critical failures prior to this one?
Re:WTF by Hercules Peanut · 2009-10-11 01:59 · Score: 1

Ultimately, with a complex application like this, you can't guarantee 100% that the code doesn't have a bug in it that could result in loss of user data. You can be ALMOST sure it won't, but 100% is not possible with current analysis techniques. (even a mathematical proof of correctness wouldn't protect you from a hacker)
I admit that this is beyond the scale of anything I have managed but why can't we be 100% sure? I could take my backup and restore it to a test server where I could then check the data. That's what I used to do with my backup system. Who would perform backups without testing them to ensure they were useable in case of an emergency like this? O.K. who else?

I remember my tech mentor often saying to me "It should be tattooed on every systems administrator's forehead. It can only get so bad if you are properly backed up."
Re:WTF by Locutus · 2009-10-11 03:02 · Score: 5, Interesting

From the sounds of it, Microsoft couldn't turn Danger into a WinMo platform so they gutted it of employees instead of spinning it back off since they'd rather have it dead than spreading more Java but not dead before they had Pink out the door. So when you fire everyone from the top downward, you end up with people whose job is to turn the lights off when the doors get locked for good. They're not motivated much nor are they skilled in all of what used to be required to run the shop. Auto-pilot mode comes to mind.

So maybe the backup system needed to be checked or a CRON job verified or maybe the computer in Joe Fired's office was part of the backup process in some little way but important enough that the whole job was failing every night.

As I said, Microsoft tried to replace the Danger stack with Microsoft software but it wasn't going to work or got too much backtalk( thinking of Softimage ) and threats of everyone leaving if they had to port to the WinMo pile/stack. They moved anyone who'd go, over to Pink and left the rest to keep life support systems running. Oops, they failed.

With Ballmer publicly saying that WinMo has been a failure, he's hearing the press say WinMo 6.5 is a yawn and expectations are that the Sony PS3 will eclipse MS XBox, and recently reading about how he's telling people that IBM doesn't know what they are doing....There's probably a new monkey-boy dance going on inside his office we'd probably love to see. It might be too dangerous being so close as to record it.

Will Microsoft ever make any profits from anything outside of MS Windows and MS Office? Ballmer's 8-Ball still seems to be telling him something very different from what everyone else is seeing.

LoB

--
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
Re:WTF by ShooterNeo · 2009-10-11 04:59 · Score: 1

I'm not referring to backups. If the backups are done off of the main database using known good code, and they are stored on media that is offline (whether that is hard drives that are physically off or tapes does not matter), then you can be reasonably certain that you have the data backed up, so long as nothing happens to the data storage facility and you included plenty of checksum data with each backup.
I'm saying that the main application code that runs the main database can't be guaranteed not to have a bug or flaw that could destroy data. Even if you could mathematically prove correctness for this large codebase, a hacker could possibly crack it and insert malicious code to destroy your data.
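The "checksum data with each backup" part, combined with an actual test restore, might look something like the sketch below. File names, directory layout and helper names are all made up for illustration, and `shutil.copytree` stands in for whatever the real restore step would be.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored: Path) -> list[str]:
    """Return the relative paths that are missing or differ after a restore."""
    bad = []
    for rel, digest in manifest.items():
        target = restored / rel
        if not target.is_file() or sha256_of(target) != digest:
            bad.append(rel)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live"
    live.mkdir()
    (live / "contacts.csv").write_text("alice,555-0100\n")
    (live / "notes.txt").write_text("buy milk\n")

    manifest = build_manifest(live)      # stored alongside the backup
    restored = Path(tmp) / "restored"
    shutil.copytree(live, restored)      # stand-in for a real restore
    clean_result = verify_restore(manifest, restored)

    (restored / "notes.txt").write_text("corrupted\n")
    corrupt_result = verify_restore(manifest, restored)

print("after clean restore:", clean_result)
print("after corruption:", corrupt_result)
```

An empty list after a restore to a scratch location is the only evidence that the backup is actually usable; a manifest that nobody ever checks against a restore proves very little.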
Re:WTF by agnosticnixie · 2009-10-11 05:19 · Score: 1

I'm not sure this counts as a failure, but when Rogers bought Fido, they had Fido cancel their sidekick licensing (replacing it with the blackberry pearl basically), and basically turned off the service without warning.
Re:WTF by Anonymous Coward · 2009-10-11 06:38 · Score: 0

As I said, Microsoft tried to replace the Danger stack with Microsoft software but it wasn't going to work or got too much backtalk( thinking of Softimage )

Is there a funny story about Softimage?
I remember lots of hype when MS bought them but heard almost nothing since.

If you want something done... by Anonymous Coward · 2009-10-10 22:09 · Score: 0

you gotta do it yourself. You're the only one who knows how valuable your data is and you're the one who will be affected by its loss. Backup your own damn data.

See it as an opportunity by miataninja · 2009-10-10 22:17 · Score: 1

Now is the opportunity for opensource to show what it's good for. Someone whip together a small app to extract all info from the Sidekick, put it up on sourceforge for FREE and you have tons of goodwill for OSS. Of course, the app should be Linux-only, thus forcing all Sidekick users to install Ubuntu...

Re:See it as an opportunity by AnotherUsername · 2009-10-10 22:52 · Score: 3, Insightful

Now is the opportunity for opensource to show what it's good for. Someone whip together a small app to extract all info from the Sidekick, put it up on sourceforge for FREE and you have tons of goodwill for OSS. Of course, the app should be Linux-only, thus forcing all Sidekick users to install Ubuntu...

Thus eliminating any goodwill that would have been gained...

Really, if you think that open source is a viable option for the masses, you shouldn't care which operating system a powerful application like the one you describe is on. If you really care about using open source for goodwill, releasing it simultaneously on all operating systems should be your goal. How is forcing people to use Ubuntu via software applications any different from Microsoft forcing people to use Windows via software applications?

--
I don't like Linux. This doesn't make me a troll.
Re:See it as an opportunity by Anonymous Coward · 2009-10-10 22:59 · Score: 0

Here comes the sarcasm train... (you can tell it's coming because of the ellipsis)
Re:See it as an opportunity by Bazman · 2009-10-10 23:36 · Score: 1

Or forcing them to the effort of sticking a live boot disk in, and maybe also making their system boot from CD.
Or forcing them to get the source and port it to Windows.
Re:See it as an opportunity by miataninja · 2009-10-11 01:26 · Score: 1

Yeah, the forcing part was a joke, change that to *encourage* and everyone will be happy :)
Re:See it as an opportunity by Xyde · 2009-10-11 20:05 · Score: 1

>Thus eliminating any goodwill that would have been gained...
And ironically, knowing most Sidekick users, destroying all the data on their (not backed up) Windows partition in the process!

Everybody needs a DRP by Anonymous Coward · 2009-10-10 22:17 · Score: 1, Funny

http://i.zdnet.com/blogs/dilbert_disaster_recovery_plan.jpg

Some reading by Linker3000 · 2009-10-10 22:59 · Score: 1

Forget all the speculation and semi-random after-the-fact suggestions, I am waiting for the write-up to discover how this monumental cock-up occurred. I hope I don't just learn that 'backups would have been a good idea'.

--
AT&ROFLMAO

Bad brand by MM-tng · 2009-10-10 23:00 · Score: 2, Funny

It's like being kicked in the side.

T-Mobile Press Release by mr_lizard13 · 2009-10-10 23:02 · Score: 2, Funny

All your data are lost by us.

--
"We live in a global world" - Harvey Pitt, former Securities and Exchange Commission Chairman

Re:T-Mobile Press Release by thisisaccount2 · 2009-10-11 03:00 · Score: 1

Somebody set up us the backup?

The clue is in the name of the software by Barsteward · 2009-10-10 23:10 · Score: 2, Funny

Microsoft/Danger

--
"The hands that help are better far than lips that pray." - Robert Ingersoll (1833-1899)

undelete (not de-corrupt) by buchner.johannes · 2009-10-10 23:10 · Score: 1

Have ZFS/btrfs developed tools to undelete or rescue files? It is pretty hopeless for ext[234] in my experience.

--
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.

Re:undelete (not de-corrupt) by myxiplx · 2009-10-10 23:16 · Score: 2, Informative

Yes, it's called a snapshot. Take a snapshot and you can either roll the entire system back to that point in time, or just browse its contents and extract the files you want.
Re:undelete (not de-corrupt) by Cylix · 2009-10-11 02:20 · Score: 1

At the level of data they were working at I highly doubt they are using file systems in the manner you or I would normally use.
It does scale well.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra

Thin client: Android, too? by KlaymenDK · 2009-10-10 23:12 · Score: 2, Insightful

Reportedly sidekicks are thin clients, other than making phone calls, everything on the phone is saved on the server side. Which is a special kind of retarded

Isn't that also how Android works?

I mean sure, the apps and such are on internal flash, but it's a different story for your "important" data such as email or contacts list. Heck, as I've learned, one can't even read one's existing ("synced") email without a working web connection. How they can call that "syncing", and what it's doing besides simple header indexing, is beyond me.

This is another reason I am loath to trust "the cloud" -- if I know I can be self-sufficient (in a data accessibility context), that's going to be much better than storing things on a corporate server and hope that said corporation is not going to, um, fall from the sky.

--
"Good news, everyone!"

Re:Thin client: Android, too? by Troed · 2009-10-11 00:01 · Score: 1

Isn't that also how Android works?
No.

--
it's in my head
Re:Thin client: Android, too? by RedK · 2009-10-11 01:33 · Score: 4, Informative

No, it's not how Android works, or how the iPhone works either. You can have cloud enabled applications, but you can also have local storage based ones without any problems. There is nothing in the SDKs that force you to use the cloud for storage at all.

--
"Not to mention all the idiots who use words like boxen."
Anonymous Coward on Monday August 04, @06:49PM
Re:Thin client: Android, too? by hedwards · 2009-10-11 02:17 · Score: 2, Informative

It's not as much of an issue. You might be using a product that the Data Liberation Front hasn't gotten to, but Google does have people working on any of those applications to make it possible to make one's own backup. I'm not sure what specifically triggered that, but I keep a backup of any important information on my computer which is backed up to my local backup mirror and remotely.
Re:Thin client: Android, too? by Varka · 2009-10-11 03:26 · Score: 1

No, that's not how Android works. You're also wrong on how the e-mail works. The e-mail applications sync data locally to the phone; no data connection is necessary to read e-mail once it's been downloaded to the phone. All my apps worked on a recent camping trip where I had zero data connectivity for over a week. Only thing that didn't work was my web browser, even google maps was functional albeit with no zooming. I even retrieved e-mail once or twice by surprise when a cell signal leaked through the mountains every now and then.
Re:Thin client: Android, too? by KlaymenDK · 2009-10-11 04:12 · Score: 1

That's odd. My Galaxy won't let me view anything except my inbox without fetching data, and Google Maps is right out.
I guess it must be a country/telco thing...

--
"Good news, everyone!"
Re:Thin client: Android, too? by Varka · 2009-10-11 04:59 · Score: 1

Quite possibly.

I could read all my existing google mail while out of service area

I pre-zoomed Google Maps to a level that included basically everywhere I wanted; the data was cached locally on the device. I couldn't scroll very much without encountering blank areas, and I couldn't zoom in and out because it could not load data.
Re:Thin client: Android, too? by Moridin42 · 2009-10-11 07:19 · Score: 1

WTF are you talking about? Android stores contacts locally. The email client does, too. The specialized gmail client may not, but you can always pull your gmail account via the generic email client.

--
I don't expect morality, equality, consistency, or justice from the law. I expect only legality.
Re:Thin client: Android, too? by webreaper · 2009-10-11 19:55 · Score: 1

To elaborate, Android does both. All of the contacts, emails, etc are stored client-side, but the server-side infrastructure (I will never ever ever use the phrase 'cloud' without inverted commas) synchronises with the device to provide a backup and enhanced storage. So your contacts are sync'ed to Google's servers, and your Gmail emails are stored on the server with the most recently-recieved ones stored locally on the device.
By all accounts that's how Sidekick works too, since phone users were told to "not turn off their devices to avoid further data loss".
It's a fine model, but only if your server-side infrastructure is resilient. I have reasonable confidence in Google's backup/replication strategy, but even so I still have offline backups of the data I absolutely need (such as a .csv file with all my contacts in it).

Do they run FreeBSD+Apache? by philcolbourn · 2009-10-10 23:13 · Score: 1

Netcraft says www.danger.com uses FreeBSD+Apache+PHP: http://uptime.netcraft.com/up/graph?site=www.danger.com

Re:Do they run FreeBSD+Apache? by argent · 2009-10-11 00:16 · Score: 1

A better question is... do they run AMANDA?
Based on this story, probably not.

RIP Sidekick by drinkypoo · 2009-10-10 23:13 · Score: 4, Insightful

With all the competition in the smartphone market today, this is probably an unrecoverable error. If they manage to recover the data then they will come off as heroes for having the courage to tell their customers promptly. Otherwise they just look like what they are: incompetent. No great loss, though.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Danger? by Anonymous Coward · 2009-10-10 23:14 · Score: 1, Funny

Why trust something that's named "Danger" to begin with?

Irresponsibility to EPIC proportions. by MrCrassic · 2009-10-10 23:25 · Score: 2, Insightful

HOW THE HELL DO THEY NOT HAVE OFF-SITE TAPE BACKUPS????

So essentially, everybody's Sidekick backup data, which is apparently critical should they ever lose power, was all concentrated on A SINGLE SERVER? I hope they at least say their tape backups caught fire and their replicated server died on the same day too...

Their retention lines are going to be hot this Columbus Day weekend! The iPhone is getting cheaper...

Re:Irresponsibility to EPIC proportions. by MrCrassic · 2009-10-10 23:28 · Score: 1

Forgot to mention that a supporting reason for why T-Mobile will deal with cancellations left and right for a little while is because tons of people hate the Sidekick anyway, and this EPIC FAIL is an EPIC excuse to jump ship right now.
Re:Irresponsibility to EPIC proportions. by AHuxley · 2009-10-10 23:48 · Score: 2, Insightful

"Back him up, boys!"
T-Mobile says, "but I thought you were going to back us up!"
Robbie says, "We didn't get rich buying a lot of servers, you know!"

--
Domestic spying is now "Benign Information Gathering"
Re:Irresponsibility to EPIC proportions. by Anonymous Coward · 2009-10-11 01:15 · Score: 0

The iPhone is getting cheaper...
You had some interesting issues there, except when I saw that line. The joke is really old and not worth it.

This may have to do with the "Pink" project fiasco by HonestButCurious · 2009-10-10 23:32 · Score: 5, Interesting

According to a very long article on AppleInsider:
http://www.appleinsider.com/articles/09/10/09/exclusive_pink_danger_leaks_from_microsofts_windows_phone.html&page=3

MS was misleading T-Mobile about the state of Sidekick support, and apparently charging hundreds of millions every year for, and I quote "a handful of people in Palo Alto managing some contractors in Romania, Ukraine, etc". This is apparently because most of the Sidekick devs had either moved to Pink or quit out of disgust.

Huh? by msauve · 2009-10-10 23:36 · Score: 2, Insightful

"incremental..."weekly binary difference"

Uh, those would do nothing in this case, where it appears the entire DB has been lost. You need a regular full backup, or diffs and incrementals are just cruft. It appears they don't even have that, since there's no talk of restoring to month (or ?) old data.

--
"National Security is the chief cause of national insecurity." - Celine's First Law

Re:Huh? by Anonymous Coward · 2009-10-11 00:32 · Score: 0

Weekly binary differences would allow you to apply those diffs that are good to the base backup, and obtain a snapshot at most 1 week before the DB failure at a fraction of the cost.
Gee, if you blast any solution that requires an IQ > 110 to understand, I hope you're working for my competitor.
Re:Huh? by kobaz · 2009-10-11 03:20 · Score: 2, Interesting

"incremental..."weekly binary difference"
Uh, those would do nothing in this case,
I agree. Weekly? WEEKLY?!!! What is this... 1980? Hell, even in 1980 people with critical data on their Apple II spreadsheet kept more than one copy of their data on a daily basis.
I'm not sure why, but one of our customers had a backup daemon running with just incrementals being done. There was one full backup done two years ago and an incremental every night. Well.. they had a computer fry one weekend. It was a crappy Windows backup program with only a point and click interface. No way in hell am I going to sit there for days and click restore on 600+ individual backups. I wrote a pretty cool little Windows script using AutoIt3. It was a real PITA to write though since every button clicked had to have a "wait-for-next-window" sequence. After five days of the restore script running, they were back in business.
Since then I've gone through every customer's system and made sure they have full backups done weekly and incrementals done daily. And we also do routine backup testing.
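For anyone who wants the weekly-full-plus-daily-incremental scheme spelled out, here is a minimal Python sketch of working out which backup sets a restore needs. The `restore_chain` function and the (date, kind) tuples are hypothetical names made up for illustration; they are not taken from any real backup product.

```python
from datetime import date, timedelta

def restore_chain(backups, target):
    """Return the backup sets needed to restore the state as of `target`.

    `backups` is a list of (date, kind) tuples, where kind is "full" or "incr".
    The chain is the most recent full backup on or before `target`,
    followed by every incremental taken after it, in date order.
    """
    usable = sorted(b for b in backups if b[0] <= target)
    full_positions = [i for i, b in enumerate(usable) if b[1] == "full"]
    if not full_positions:
        raise ValueError("no full backup on or before the target date")
    return usable[full_positions[-1]:]

# The situation described above: one old full backup, then nightly incrementals.
start = date(2007, 10, 1)
nightly_only = [(start, "full")] + [
    (start + timedelta(days=n), "incr") for n in range(1, 601)
]
print(len(restore_chain(nightly_only, start + timedelta(days=600))))  # 601 sets to restore
```

With a full backup every week, the same function never returns more than seven sets, which is roughly the difference between a five-day restore and an afternoon.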
A good quote:
"A backup is not a backup, until you try and restore from it"

--

The goal of computer science is to build something that will last at least until we've finished building it.

Your boss by SmallFurryCreature · 2009-10-10 23:42 · Score: 1

He also hopes that you are not going to learn only now, that backups would have been a good idea.

You SHOULD have said, I hope THEY don't just learn, 'backups would have been a good idea.'

Your boss again, this is what you meant, right?

--

MMO Quests are like orgasms:

You may solo them, I prefer them in a group.

Re:Your boss by Linker3000 · 2009-10-11 00:46 · Score: 1

Erm, not quite.
I was stating that if I read the report I hope I don't just learn that MS/Danger concluded that backups would have been a good idea.
FWIW: Our corporate backup strategy (for which I am responsible) comprises a mesh of servers across some of our sites (we have 35) that run daily backups, syncing data sets between sites and providing a three-tier level of daily, weekly and monthly snapshots. I can restore any single file back to its state within the last 90 days (more if needed) at the click of a button OR dump entire backups to tape/DVD/USB stick to take them back to site in the event of a major outage.
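A three-tier daily/weekly/monthly scheme like that usually boils down to a retention rule. The following is only a rough Python sketch of one possible rule; the `keep_snapshot` function, the 7-day and 35-day windows, and the choice of Sundays and the first of the month are all assumptions for illustration, not the actual policy described above. Only the 90-day outer limit comes from the post.

```python
from datetime import date

def keep_snapshot(snapshot, today, daily_days=7, weekly_days=35, monthly_days=90):
    """Decide whether a snapshot taken on `snapshot` should still be kept.

    Assumed tiers (illustrative only):
      - daily:   every snapshot newer than `daily_days`
      - weekly:  Sunday snapshots newer than `weekly_days`
      - monthly: first-of-month snapshots up to `monthly_days` old
    """
    age = (today - snapshot).days
    if age < 0:
        return False  # a snapshot dated in the future is not trusted
    if age < daily_days:
        return True
    if age < weekly_days and snapshot.weekday() == 6:  # Monday is 0, Sunday is 6
        return True
    if age <= monthly_days and snapshot.day == 1:
        return True
    return False
```

The point of the tiers is that old snapshots get thinned out rather than deleted wholesale, so a file can still be pulled back from weeks or months ago without storing every daily copy.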

--
AT&ROFLMAO

What do you expect with a name like by Masterofpsi · 2009-10-10 23:42 · Score: 2, Funny

Danger?

GNU/Linux, Apple/Overrated, Slashdot/GoingDownhill by emmagsachs · 2009-10-10 23:44 · Score: 1

MS's branding is actually a pleasant twist in advertising. Instead of spitting in our faces and telling us it's raining, MS now has the ballmers to tell it like it is.

Interesting article about Pink/Danger/Sidekick by Richard+W.M.+Jones · 2009-10-10 23:46 · Score: 4, Interesting

Interesting article about the Microsoft/Pink/Danger/Sidekick relationship and leaks indicating that Microsoft are trying to kill Sidekick without telling the partners. Microsoft would never do such a thing of course ...

Rich.

--
libguestfs - tools for accessing and modifying virtual machine disk images

Re:Interesting article about Pink/Danger/Sidekick by malevolentjelly · 2009-10-11 03:52 · Score: 1

Wow, this seems surprisingly factual for an article on Roughly Drafted, but it basically loses it halfway through when he starts predicting the doom for all Microsoft products and brands as he tends to do in every article he writes.
It sounds to me like this is built off an angry and possibly drunken rant from a former Danger dev who felt somehow lost or screwed in the Microsoft enterprise atmosphere.
It sounds like Microsoft wants to leverage their software technologies more heavily (actually the Zune HD is a good example of this) in order to have mobile devices that have better interfaces and media capabilities and are more application-centric like the iphone. Chances are the Danger devs are frustrated with the loss of their cloud architecture and not receiving any proper vision from their management as to where their work is going. Microsoft is a power technology firm, after all, so they're going to want their various teams to utilize their core technologies to produce more powerful interfaces and usability paradigms. There's obviously a strong break in management. It's also possible that the Danger devs aren't taking the sort of leadership they should in this or not open-minded enough to learn new technologies.
If Danger really did build a phone on NetBSD, then chances are Danger's servers are BSD servers. If Microsoft wasn't continuing the Sidekick line or architecture, chances are these servers were still running BSD. There would be no reason to migrate them to Windows. That's likely why they were maintained by external contractors and not simply virtualized into the Microsoft cloud.
Re:Interesting article about Pink/Danger/Sidekick by Anonymous+McCartneyf · 2009-10-11 12:50 · Score: 1

Well, if Microsoft was trying to kill Sidekick, then this is a lucky break for them -- or would have been if T-Mobile weren't so open about who was (not) managing the server now that it's down.
No one, but no one, is going to want to use that particular model after this. T-Mobile no longer sells nor leases them -- that's official as of today.

--
There is a fine line between recklessness and courage... -- Paul McCartney

It is an ancient story, endlessly repeated by SmallFurryCreature · 2009-10-10 23:52 · Score: 4, Informative

It is development dome.

Two companies enter, MS comes out, slightly fatter.

If you do business with MS, you are riding a tiger with the brains to realize that lunch is only a roll on the ground away.

MS really should be renamed to BubbaSoft. Get into the shower with BubbaSoft and you know what is going to happen.

--

MMO Quests are like orgasms:

You may solo them, I prefer them in a group.

Re:It is an ancient story, endlessly repeated by harmonise · 2009-10-11 05:32 · Score: 4, Funny

Get into the shower with BubbaSoft and you know what is going to happen.
Just don't drop the SOAP.

--
Cory Doctorow talking about cloud computing makes as much sense as George W Bush talking about electrical engineering.
Re:It is an ancient story, endlessly repeated by Anonymous Coward · 2009-10-11 06:12 · Score: 0

RAPE!
Re:It is an ancient story, endlessly repeated by 10101001+10101001 · 2009-10-11 11:52 · Score: 1

MS really should be renamed to BubbaSoft. Get into the shower with BubbaSoft and you know what is going to happen.
Now I know why GNU/RMS is safe.

--
Eurohacker European paranoia, gun rights, and h
Re:It is an ancient story, endlessly repeated by sowth · 2009-10-11 14:07 · Score: 1

HEY! Bubba not soft, Bubba be hard! Bubba mad. Bubba eat small furry creature now!

May I be the first to say: by lul_wat · 2009-10-10 23:53 · Score: 0

Haha!

--
Divide a cake by zero. Is it still a cake?

Microsoft/Danger by Anonymous Coward · 2009-10-10 23:55 · Score: 0

There's a reason why they call it "Danger"...

Re:Why not store the data on phone permanent memor by Anonymous Coward · 2009-10-10 23:56 · Score: 0

After all, if you drop your sidekick in a toilet, run over it with a truck, and vaporise it with a plasgun, you can just get a new one and have all your data back -- which is good, since if you're 95% of people, you've _never_ backed up your phone's data.

With services like MobileMe for the iPhone, that's no longer a unique advantage.

Re:Irresponsibility to EPIC proportions. -- yes by cortana · 2009-10-10 23:59 · Score: 1

HOW THE HELL DO THE CUSTOMERS NOT TAKE BACKUPS THEMSELVES

If the data is important, then the only way to make sure it doesn't get lost is to take some responsibility for your actions and do back-ups yourself. If your service provider does not let you then don't use the service, it's that simple.

Sidekick started out vulnerable by Anonymous Coward · 2009-10-11 00:09 · Score: 0

I never even heard of the Sidekick until a story in the news about one belonging to Paris Hilton being hacked, and all her personal information being spread on the net. Damn, I thought, I'm never buying one of those pieces of shit. Everyone else said WOW, check out Paris Hilton's fabulous new inexpensive texting device targeted at the 12-14 teen market, let's all get one of those! My data is insecure? My data is unsafe? My data is stored outside my phone? Who cares, Paris Hilton's got one, she knows everything about information technology and mobile telecommunications after all.

Re:Irresponsibility to EPIC proportions. -- yes by MrCrassic · 2009-10-11 00:09 · Score: 3, Interesting

A) The Sidekick apparently doesn't store anything, so customers can't make backups that easily, even if they wanted to, and

B) Danger designed this phone to store everything server-side. It is incomprehensibly foolish to not include a SUPER SOLID backup strategy as well. This problem has been ongoing for several days now; I don't know if the data was fine on the onset of this problem, but the infuriated customers have all the right to demand everything AND the kitchen sink for losing practically everything they had.

btrfs by Anonymous Coward · 2009-10-11 00:10 · Score: 0

.. might be a fantastic file system, but it is not ready for production use just yet. Please don't deploy it on your servers.

Re:btrfs by Lennie · 2009-10-11 09:26 · Score: 1

You do make backups, right ? I guess if your company or product is called Danger, maybe not.

--
New things are always on the horizon

Yesterday... all those backups seemed a waste... by argent · 2009-10-11 00:18 · Score: 5, Funny

Yesterday,
All those backups seemed a waste of pay.
Now my database has gone away.
Oh I believe in yesterday.

Suddenly,
There's not half the files there used to be,
And there's a milestone hanging over me
The system crashed so suddenly.

I pushed something wrong
What it was I could not say.
Now all my data's gone and I long for yesterday-ay-ay-ay.

Yesterday,
Need for backup seemed so far away.
Seemed my data were all here to stay,
Now I believe in yesterday.

Anonymous

Re:Irresponsibility to EPIC proportions. -- yes by cortana · 2009-10-11 00:19 · Score: 0, Flamebait

This is exactly my point. The irresponsibility of epic proportions is on the side of the users who didn't realise that the design of the system invites them to be fucked over by a data loss incident. They have no one to blame but themselves.

That could Nehehever happen to Meeeeeeee!! by Anonymous Coward · 2009-10-11 00:20 · Score: 0

...as everyone neurotically signs into their servers to verify the status of their backup jobs and MD5 checks... only a few will go the distance to try uncompressing/unencrypting their archive files and browse around inside the restore area randomly checking files to ensure they look good.. what? you're doing backups on a machine without ECC RAM and writing the image files across a USB to a consumer-grade hard-drive or burner without checking the MD5s --and on that type of hardware you're doing compression/encryption without turning around and automatically decrypting/decompressing and byte-verifying the files against the backup source BEFORE calculating the MD5 on the image file? (And you can trust that MD5 calculation because of why?) Oh yeah, see how much you are "saving" by repurposing old non-ECC equipped hardware to run your backups or buying consumer-grade SANs from Fry's which use whoknowswhat? All those backups they ever created may have some serious problems just waiting for you to discover at the worst possible time...
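The order of operations described there (compress, read the archive back, byte-compare against the source, and only then record a checksum) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it works on in-memory bytes rather than real files, and `make_verified_backup` and `check_archive` are hypothetical helper names. MD5 is used only because the post mentions it; it catches accidental corruption, not tampering.

```python
import gzip
import hashlib

def md5_of(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

def make_verified_backup(source):
    """Compress `source`, prove the archive restores, then checksum it.

    Returns (archive_bytes, md5_hex). The checksum is computed only after
    the decompressed archive has been compared byte for byte with the source,
    so the digest never describes an archive that cannot be restored.
    """
    archive = gzip.compress(source)
    restored = gzip.decompress(archive)
    if restored != source:
        raise IOError("archive does not restore to the original bytes")
    return archive, md5_of(archive)

def check_archive(archive, expected_md5):
    """Later check: has the stored archive changed since it was written?"""
    return md5_of(archive) == expected_md5
```

A checksum taken without the read-back step only tells you the archive is the same garbage it was when you wrote it.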

Foggy idea? by Porchroof · 2009-10-11 00:22 · Score: 2

Cloud computing?

That ain't no cloud. That's the fog obscuring the view of sanity.

IT has been trying this crap ever since the emergence of personal computers.

--
Fata viam invenient.

Re:Kicking it oldskool by Anonymous Coward · 2009-10-11 00:33 · Score: 0

The title is a reference to this being an old fucking troll from the start of this year.

Apparently, it's new to you. Don't feed the trolls, dumbass.

It's all in the name by dfdashh · 2009-10-11 00:37 · Score: 1

Clerk: Danger Powers personal effects [shows box of off-site tapes and such]
MS: Actually my name is Microsoft Powers...
Clerk: It says here - name: Danger Powers
MS: No no no no no... Danger is my middle name
Clerk: Okay, Microsoft Danger Powers...

--
df -h /my/head

DIY surgery. by Anonymous Coward · 2009-10-11 00:39 · Score: 0

That we gather. However, the weakness isn't the model but who's doing the serving. Why can't the Sidekick, like the BlackBerry, use an independent server?

The smile by m0s3m8n · 2009-10-11 00:40 · Score: 1

So is Ballmer tossing chairs about?? I think not. Probably sitting back with a smile on his face.

--
Conservative, mod down for violating /. political norms.

Re:The smile by Linker3000 · 2009-10-11 06:08 · Score: 1

Ballmer thinks he had a scheduled chair toss 'sometime next week', but his phone seems to have lost its calendar entries and so he's not so sure any more.

--
AT&ROFLMAO

Re:Kicking it oldskool by 10Ghz · 2009-10-11 00:40 · Score: 0, Offtopic

I believe that the WTC towers were collapsed by detonation, for example

Ah, ignorance at its finest...

http://www.debunking911.com/index.html

--
Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.

I work in telecom - Sr Tech Arch by Anonymous Coward · 2009-10-11 00:46 · Score: 3, Interesting

I work in telecom at a different provider. SAN upgrades are performed by the SAN vendor and, IME, they always demand a complete backup prior to starting any work unless the customer demands otherwise. If the customer doesn't want the backup, we always had to get a Sr VP to sign off. There were about 10 Sr VPs in the company - not like at a bank where everyone is a VP.

Usually, we would perform firmware upgrades only when migrating from old SAN equipment into new. The old equipment would be upgraded and used to upgrade either lower performing SAN or directly attached disk arrays that had been neglected for 5+ years. Being out of warranty was avoided. Most data is too important to risk that.

BTW, we measured storage in petabytes and our storage team was **never** on the cutting edge. We were always 2+ years behind other BIG companies. Our labs may have this quarter's latest and greatest, but it would take years to get from the lab into production service. That drove some vendors nuts, but not the "names you know."

I saw where someone above said they randomly verified recovery quarterly. What a joke. On my systems (Sr Tech Arch), we deployed with redundant systems at least 500 miles apart. Many systems did have instant failover, but if instant failover was not possible due to the amount of data, **never** would we lose more than 24 hours' worth of data. Between RAID-10, near-disk backups, tape backups, remote replication and backups at the alternate location, we had the data. Further, to verify the alternate system worked, we swapped primary production locations every week. I and my internal customer slept very well, thank you.

I have a good friend who works at T-Mobile in their architecture design team. It will be interesting to see whether this subcontractor had anything to do with the issues. I called T-Mobile for an unrelated personal item on Tuesday, they were already swamped with calls and said that a sub to Microsoft was working the issue. I'm thinking MS outsourced/bought the provider and the garage shop team was still running things - but I don't know. I do know that Microsoft has excellent engineers for systems like this and they are more cautious than google with their upgrades and deployed systems. Over the years, I've had to deploy a few Windows-Server-based solutions - usually for voice response systems. I was never really happy doing it. I don't trust backup systems much unless it is really a mirror that I can get to 1 file from 3 weeks ago easily.

Ok, back to upgrading the company email servers. A system version upgrade will impact users for less than 10 minutes - probably under 3 minutes, but we like to under promise and over deliver.

The Tao of Backup by ei4anb · 2009-10-11 01:05 · Score: 4, Interesting

Sadly it comes to pass that every generation the Tao of Backup is forgotten and must be relearned through such trial by fire. http://www.taobackup.com/

XFS and ODIRECT=1 by Anonymous Coward · 2009-10-11 01:17 · Score: 0

With mysql.

Weird file I/O problems and file system corruption. Went to backup & tried to restore.
Restore failed - more filesystem corruption.
previous backup - restore failed and corrupted the filesystem yet again...
repeat many times etc etc
eventually a very old backup restored successfully, but there was data loss.

The bug was in the odirect handling within the XFS filesystem under several Linux kernel versions. The DB server would trigger it under high I/O, and the restore would trigger it as well.

These will be big systems and I'll bet they are hitting a software bug on their primary and secondary (running the same software) systems.

Now I run an automated dump & restore. The secondary servers are always running yesterday's backup.
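The idea of an automated dump and restore can be shown in miniature with Python's built-in `sqlite3` module. To be clear, SQLite is only a stand-in here so the example is self-contained; the setup described above is MySQL, and the `dump`, `restore`, and `verify_restore` helpers are hypothetical names for illustration.

```python
import sqlite3

def dump(conn):
    """Serialize the whole database to SQL text."""
    return "\n".join(conn.iterdump())

def restore(sql_text):
    """Load a dump into a fresh in-memory database and return the connection."""
    fresh = sqlite3.connect(":memory:")
    fresh.executescript(sql_text)
    return fresh

def verify_restore(live, table):
    """Dump `live`, restore it elsewhere, and compare one table row by row.

    `table` is interpolated into the query, so pass only trusted names.
    """
    restored = restore(dump(live))
    query = f"SELECT * FROM {table} ORDER BY 1"
    return live.execute(query).fetchall() == restored.execute(query).fetchall()
```

Running something like this every night means a broken dump shows up the next morning as a failed job, not months later in the middle of an outage.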

Re:XFS and ODIRECT=1 by cbreaker · 2009-10-12 05:08 · Score: 1

Do you actually think the Sidekick server will be that big?

For one, it's on Windows. Danger is owned by Microsoft, and these servers are housed at a Microsoft data center. For two, you really don't need huge hardware anymore.. just redundant hardware.

A single dual-socket quad-core box with enough RAM could probably handle a million Sidekicks, if not more. There's such little data and it's requested over a cell phone data network. Not to mention they don't synchronize everything all the time. Just little changes.

--
- It's not the Macs I hate. It's Digg users. -

When Paranoia Pays by the+eric+conspiracy · 2009-10-11 02:01 · Score: 1

Ahh. Last week I picked out a new phone for my T-Mobile service. Sidekicks were offered. When I looked up the details and saw it ran an MS OS I moved on.

It is amazing how often not choosing Microsoft pays off.

Re:When Paranoia Pays by larry+bagina · 2009-10-11 04:53 · Score: 2, Informative

it runs NetBSD and Java.

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:When Paranoia Pays by Ilgaz · 2009-10-12 00:32 · Score: 1

But it was purchased by Microsoft. One wonders if an MS "Convert a working thing to Windows Server" project went wrong? Conspiracy? This is the same company that tried to convert a perfectly working FreeBSD Hotmail to Windows Server 3 times, failing horribly 2 times and at last managing (!) to do it. Now read each "Hotmail performance" or "down" news story that way, and you will know the real reason. They can't scale, and their engineers and management are incompetent to run UNIX.
And people wonder why I didn't want Yahoo to be acquired by them.

Claimed information from the inside by cshbell · 2009-10-11 02:08 · Score: 5, Interesting

According to this comment post on Engadget, it was a contractor working for Danger/Microsoft who screwed up a SAN upgrade and caused the data loss. Obviously, take this with a grain of salt until it's substantiated:

"I've been getting the straight dope from the inside on this. Let me assure you, your data IS gone. Currently MS is trying to get the devices to sync the data they have back to the service as a form of recovery.

It's not a server failure. They were upgrading their SAN, and they outsourced it to a Hitachi consulting firm. There was room for a backup of the data on the SAN, but they didn't do it (some say they started it but didn't wait for it to complete). They upgraded the SAN, screwed it up and lost all the data.

All the apps in the developer store are gone too.

This is surely the end of Danger. I only hope it's the end of those involved who screwed this up and the MS folks who laid off and drove out anyone at Danger who knew what they were doing.

"Epic fail" doesn't begin to describe this one."

Re:Claimed information from the inside by Anonymous Coward · 2009-10-11 04:14 · Score: 2, Insightful

This doesn't mean the data is "gone"; it means that most likely a bunch of disks with user data have had their metadata changed and perhaps a bit of new data has overwritten them. Reformatting drives or changing the RAID configuration doesn't delete data, it just makes it inconvenient to access it. Unless their SAN is designed to magically write zeros over every disk within a few minutes of a configuration change, at least some data is still there. How hard it is to access it depends on how much support they can get from the people who designed the storage system (file system, database, or a raw object store of some kind).
Re:Claimed information from the inside by sincewhen · 2009-10-11 14:28 · Score: 1

There was room for a backup of the data on the SAN
So, the plan was to put the backup on the same device as the primary?
Ah, so they were going to use the "What could possibly go wrong?" strategy!

--
-- Braden's law of data: All data spends some of its lifetime in an excel spreadsheet.
Re:Claimed information from the inside by Anonymous Coward · 2009-10-13 04:39 · Score: 0

"Epic fail" doesn't begin to describe this one.
Too Big to Epic Fail?
Re:Claimed information from the inside by Anonymous Coward · 2009-10-14 05:28 · Score: 0

I think the post on Engadget is partly a troll, or at least poorly informed. Yes, Hitachi was doing the SAN upgrade, but Hitachi always does Danger's SAN upgrades. Nothing unusual there. It's not usual practice to upgrade SAN firmware by yourself, you let the vendor do it.
I work with a former Danger staffer who was asked (but declined) to go in as a contractor and help them try to sort it out, so I probably have a better line on this stuff than Engadget does.

Re:Why not store the data on phone permanent memor by AchiIIe · 2009-10-11 02:21 · Score: 1

Just wanted to let you know that the patent you mention in your signature has been cancelled. (warning, your toddlers might violate a patent)

See link below, scroll to the end:
http://www.google.com/patents?id=T2QKAAAAEBAJ&printsec=abstract&zoom=4&source=gbs_overview_r&cad=0#v=onepage&q=&f=false

--
Nature journal lied in Britannica vs Wikipedia Ask to retrac

Will the number of drug-related calls go down? by Lord_Dweomer · 2009-10-11 02:46 · Score: 1

Not a troll, I swear. Just someone interested in the works of Steven Levitt...

I've read many racially slanted comments elsewhere that seem to indicate a general opinion that the majority of Sidekick users are African American or Hispanic. Someone responded at one point that there would be a huge drop in the number of drug deals that occur because everybody will lose their dealer's number.

Now, the nerd in me started wondering...what if there actually was a connection!? Or not even something drug-related...could there be other connections that occur as a result of this data failure? Just food for thought...

--
Buy Steampunk Clothing Online!

Re:Will the number of drug-related calls go down? by Slashcrap · 2009-10-11 20:57 · Score: 1

Now, the nerd in me started wondering...what if there actually was a connection!?
Yo, there's a word for racist nerds - it's "libertarian".
Re:Will the number of drug-related calls go down? by Anonymous Coward · 2009-10-13 20:08 · Score: 0

Yes, pointing out statistical fact is indeed racist!

You assume Danger used a MSFT platform by xswl0931 · 2009-10-11 03:05 · Score: 3, Insightful

Looking at the timeframe that Danger was acquired by MSFT and that the Danger OS was likely based on NetBSD (http://en.wikipedia.org/wiki/Danger_Hiptop), it's more likely that Danger was still using NetBSD as their Server Software and this was merely a process issue. Blaming it on the "Microsoft Platform" without any real data is just spreading FUD.

Re:You assume Danger used a MSFT platform by Anonymous Coward · 2009-10-11 06:44 · Score: 2, Informative

You know nothing of which you speak. I assure you it was running on Microsoft software. Unfortunately, I should know.
Re:You assume Danger used a MSFT platform by xswl0931 · 2009-10-11 10:46 · Score: 2, Insightful

You assure us anonymously without any proof? Of course.
Re:You assume Danger used a MSFT platform by Ecuador · 2009-10-12 02:58 · Score: 2, Funny

He's modded +1 Informative. I guess that's proof enough! :D

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS

synonym by Anonymous Coward · 2009-10-11 03:11 · Score: 0

Microsoft/Danger

The second part is kind of redundant / a synonym, isn't it?

Amazing by HangingChad · 2009-10-11 03:19 · Score: 0, Troll

It's amazing how many times the name "Microsoft" and the words "catastrophic failure" end up in the same headline.

--
That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage

Re:Amazing by 1s44c · 2009-10-11 04:07 · Score: 1

It's amazing how many times the name "Microsoft" and the words "catastrophic failure" end up in the same headline.
Amazing yes, but it's not the full story. Microsoft put enormous pressure on multinationals to keep stories of IT failures in. Most of the catastrophic failures will never make the headlines.

Re:Why not store the data on phone permanent memor by Oshawapilot · 2009-10-11 03:21 · Score: 2, Informative

I'll admit to having one of the original (and second version) of the Sidekick (They were called the Hiptop everywhere else except the USA) and the idea of storing everything on the cloud seemed great at the time - through several device upgrades, warranty replacements, and other hardware changes everything just automagically restored to the new phone within 10-15 minutes of switching the SIM.

One should add that the devices themselves are designed to "play dead" when the battery gets low and shut down while still maintaining enough power to ensure the volatile RAM holding the device's local cache of data remains intact. It's only if the battery is fully exhausted to the point of not being able to accomplish this, or a critical error/OS crash (the dreaded "red X of death") is encountered, that the volatile RAM is actually in danger of being erased.

Therefore all the warnings about not letting the phones go "dead" or turning them off are a bit misleading since, excluding one of the two above situations everything is actually safe, but it's not without warrant since I'm sure MS/Danger are going to try to "backwards restore" whatever is salvageable.

Furthermore, since the OS is locked down extremely tight there's no method (to my understanding, admittedly a few years old now) of locally backing up a Sidekick's data. Contacts stored on the device can be backed up to the SIM card one at a time (with only the basic name/phone data; all other extraneous data such as profile pics, etc. will not be included) but it was tedious to accomplish and the average Sidekick user (read as teen/clueless) probably has no idea how to do it anyway.

Re:Why not store the data on phone permanent memor by sribe · 2009-10-11 03:55 · Score: 1

...since if you're 95% of people, you've _never_ backed up your phone's data.

Unless of course, you're an iPhone user, in which case your data is backed up every time you plug in your phone into your computer to recharge ;-)

That's not much of a cloud. by Nekomusume · 2009-10-11 04:02 · Score: 1

All that data was apparently in one data center... Not nearly diffuse enough to be called "cloud computing". Google's hundreds of distributed, redundant data centers... that's a cloud.

Don't panic by Anonymous Coward · 2009-10-11 04:07 · Score: 0

As MS is involved there is a good chance that Sidekick users can either recover their data from Google cache or buy a backup copy from an underground/crackers/warez forum of their choice.

my guess, crypto key pass phrase by Anonymous Coward · 2009-10-11 04:07 · Score: 0

my guess, the crypto key pass phrase was lost

as in for PCI/HIPAA compliance something was encrypting data. When this server crashed, no one knew the pass phrase to start accessing the data again.

my $0.02

Cloud of stupidity by JustNiz · 2009-10-11 04:12 · Score: 0, Troll

Microsoft's products are repeatedly proven to be terrible, their services are repeatedly proven to be even worse. Yet still people pay inflated prices and put them in mission-critical places. Even in preference to better and more robust alternatives, including many that are free, they still choose Microsoft for some crazy reason.

When is this cloud of stupidity going to lift from Microsoft users' eyes?

Half of a good idea by SuperKendall · 2009-10-11 04:47 · Score: 1

What if you are at a conference, lose your phone, drop it into the punch bowl, have it fall out of your pocket and run over by a taxi. Having some sort of remote backup seems a good idea.

Yes of course which is why most modern systems backup to your computer or the cloud or what have you - but ALSO have local permanent storage.

A sign of the wrongness of the approach is that if your Sidekick battery dies the data is gone. That would not be true of any modern smartphone at this point; your data would be held in flash as well as on whatever servers or other computers it might have backed up to.

Furthermore if Danger didn't offer a viable means of letting people back up data to their own computers, that's another failure too...

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

Wish I had mod points by Nick+Driver · 2009-10-11 05:05 · Score: 1

Cloud computing?

That ain't no cloud. That's the fog obscuring the view of sanity.

+1 Insightful.

First Big Clue by jriding · 2009-10-11 05:10 · Score: 1

Microsoft named one of its products "Danger" and no one realized this might actually be a warning?

--
love the taste, hate the texture

Re:First Big Clue by Lennie · 2009-10-11 09:07 · Score: 1

Actually the company they bought some time ago is called Danger.

--
New things are always on the horizon

I find it hard to believe... by hackus · 2009-10-11 05:11 · Score: 1

that backups were not done, and no process existed to test the backups. What I would like to know is what Microsoft or T-Mobile gains by not having to do infrastructure improvements or upgrades if all of the data is suddenly lost.

I think it would prove interesting to know, if someone made the decision to destroy the data, and save potentially millions in infrastructure upgrades.

I find the whole thing rather improbable as many have pointed out here, that backups suddenly failed, or even the off site backups are bad.

I think it is more likely, they (T-Mobile) decided not to do infrastructure improvements and to dump the data due to the fact the management cannot compete in the mobile space, in order to show a profit at the end of the year.

-Hack

--
Got Geometrodynamics? Awe, too hard to figure out? Too bad.

Re:I find it hard to believe... by marciot · 2009-10-11 05:53 · Score: 1

I bet they had a backup server, only that it was running on Hyper-V on the same box that contained the data itself.
Re:I find it hard to believe... by SuiteSisterMary · 2009-10-12 00:34 · Score: 1

Nah. If the article had been 'Catastrophic SAN failure destroys online backups, but Microsoft/Danger say the whole thing will be restored from offline backups within two days, and only data less than X hours old, and therefore not part of the backup is lost' then /. would be SCREAMING about 'Microsoft keeps your data! Why, I erased some stuff, but it would STILL BE ON THAT BACKUP TAPE! WHAT IS MICRO$OFT PLANNING ON DOING WITH ALL THAT DATA?!?@?@?! SUE THEM FOR BREACH OF PRIVACY!!!11!!!ELEVENTYONE!!111!!'

--
Vintage computer games and RPG books available. Email me if you're interested.

Autorestore - multiple birds one stone. by Colin+Smith · 2009-10-11 05:17 · Score: 2, Insightful

To the standby or testing system. Our staging/testing systems all run yesterday's production data, restored from the most recent backup.

If your backups don't work then neither will your test/staging server... Which will be noticed.

What do you get?
* Backups tested every day.
* A test/staging/standby system identical to the production.
* Something the business can run all the crappy queries they like against without affecting the production system.
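
A minimal sketch of the restore-to-staging idea above, assuming the "production data" is just a directory tree and the backup is a tar archive. The function names (`backup`, `restore_to_staging`, `nightly_cycle`) and the checksum comparison are illustrative assumptions, not anything the poster described; a real setup would restore a database dump or a SAN snapshot instead.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(production: Path, archive: Path) -> None:
    """Write the production tree into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(production, arcname=".")


def restore_to_staging(archive: Path, staging: Path) -> None:
    """Unpack the most recent archive into the staging directory."""
    staging.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(staging)


def nightly_cycle(production: Path, archive: Path, staging: Path) -> bool:
    """Back up, restore into staging, and report whether staging matches."""
    expected = tree_digest(production)
    backup(production, archive)
    restore_to_staging(archive, staging)
    return tree_digest(staging) == expected
```

If `nightly_cycle` ever returns False (or raises), the staging system is broken that morning, which is exactly the "will be noticed" property: the restore path is exercised every day rather than only after a disaster.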

--
Deleted

Re:Autorestore - multiple birds one stone. by Anonymous Coward · 2009-10-11 06:05 · Score: 0

A test/staging/standby system identical to the production.
A great idea until you realize the only thing you can test is whatever just went into production yesterday.
Re:Autorestore - multiple birds one stone. by quanticle · 2009-10-11 11:24 · Score: 1

That only works if the data you've got isn't protected by a law like HIPAA or somesuch. In that case, you could find yourself in big trouble for letting private data off your secure production server.

--
We all know what to do, but we don't know how to get re-elected once we have done it
Re:Autorestore - multiple birds one stone. by Anonymous+McCartneyf · 2009-10-11 12:19 · Score: 1

But HIPAA doesn't cover T-Mobile cell phones, does it?

--
There is a fine line between recklessness and courage... -- Paul McCartney
Re:Autorestore - multiple birds one stone. by Colin+Smith · 2009-10-13 05:48 · Score: 1

In which case your restore system becomes your "secure production standby" server...

--
Deleted

Should we continue ... ? by Anonymous Coward · 2009-10-11 05:23 · Score: 0

Hell no! You never should have in the first place.

Means what it said by SuperKendall · 2009-10-11 05:42 · Score: 2, Informative

shit, is that TSR still hanging around? goodness!

Dude, what part of "Stay Resident" did you not understand. It's not like selling your computer rids you of it.

That's why I never ran them, nor consorted with Daemons.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

A new dawn in Microsoft procedure by SuperKendall · 2009-10-11 05:45 · Score: 1

At least Microsoft has finally decided to make the end merciful and swift for companies they acquire, instead of slowly suffocating them over thousands of years like a Sarlacc. We should rejoice!

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

Sidekick? More like... by SuperKendall · 2009-10-11 05:48 · Score: 1

Sidekick to the balls, that is!

This one is Chuck Norris level.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

The value of data by symbolset · 2009-10-11 05:49 · Score: 4, Insightful

Granted, this isn't cheap, but our data isn't either.

Microsoft bought Danger for half a billion dollars. Current estimates of the value of this data are roughly... half a billion dollars, plus a little. There's little doubt that in addition to destroying the entire value of the acquisition they've created a connection between "Microsoft", "Danger" and "data loss". In their release T-Mobile isn't being shy about tying those things together. Not good. That's going to have impacts even for some completely unrelated cloud-based products like Azure.

Somebody's about to get a really awkward performance review.

--
Help stamp out iliturcy.

Re:The value of data by cbreaker · 2009-10-12 07:27 · Score: 1

Well, the data isn't the only value. The service itself will be working at full capacity soon, albeit light on people's stuff.

The entire system, and the fact that you need that system to use the Sidekick, has value even after this mess. Many people will simply add their stuff back to their phones, begrudgingly. It's like if you lost your phone if you had almost ANY other type of cell phone.

In fact, now that this happened, you can probably be secure now knowing that Microsoft will never let this happen again. It will be a topic in every single weekly meeting, staff meeting, and director meeting in the company for a long time. "Are the backups working now? Is the off-site system working?"

--
- It's not the Macs I hate. It's Digg users. -
Re:The value of data by symbolset · 2009-10-13 05:39 · Score: 1

In fact, now that this happened, you can probably be secure now knowing that Microsoft will never let this happen again.
Somehow, I doubt that matters. There are some things in technology that a brand just doesn't recover from. The inability to store significant amounts of data on the device, the inability to download from the device to a PC if the service fails can now be pointed to as inherent disadvantages of the thin client smartphone device that won't change no matter how they attempt to rebuild the brand.
A wiser choice might be to isolate the damage to "just" Danger, before it takes out Azure and Windows Mobile as well.

--
Help stamp out iliturcy.
Re:The value of data by cbreaker · 2009-10-14 02:23 · Score: 1

Sure, and I never really liked the Sidekick because of the reasons you cite. However, it was also pretty cool that when I took a picture on the thing, it would automatically show up on the T-Mobile web site in moments and I could save it to my PC or e-mail it or whatever.

The technology isn't inherently bad. Yes, there was a major f'ckup, but the cool thing about the Sidekick is that you can lose it, get a new one, and bam, everything is synced back to your phone like nothing happened. The bad part is that the phone is so locked down and controlled that you can't do anything not blessed by Danger. You can't even put music on it (at least I couldn't at the time. You could pay $3 for a 10 second low quality ring tone though.)

--
- It's not the Macs I hate. It's Digg users. -

Conspiracy time by log0n · 2009-10-11 05:50 · Score: 1

Anyone else see this as a way for MS to get WinMo 6.5 phones (which by all accounts are tanking) into more hands?

"We're sorry we hosed your data, but to attempt to make amends we'll give you/discount you this brand new fancy UI Windows Phone with our apologies."

Re:Conspiracy time by DJ+Particle · 2009-10-11 06:15 · Score: 1

**DJ Particle puts on her tinfoil hat**

That's exactly what I was thinking...

...what's worse is, despite the tinfoil hat comment, I'm not joking.
Re:Conspiracy time by Anonymous+McCartneyf · 2009-10-11 13:04 · Score: 1

If no one had linked Microsoft to the Sidekick server, then that might have worked. It still might, to some extent.
But, since T-Mobile named Microsoft as the people maintaining the server, they likely aren't going to give away WinMo phones to make up -- not when they have Blackberries.

--
There is a fine line between recklessness and courage... -- Paul McCartney

Has nothing to do with tech or IT by edelbrp · 2009-10-11 07:07 · Score: 1

This has everything to do with business planning: the funding and timely rollout of redundancy and backup systems (including staff). The tech, tools, good IT staff hires, procedures, and strategies are out there and are pretty well known. I don't have to know about their infrastructure or staffing to tell you that they didn't invest in the infrastructure and/or training for staff to prevent exactly this sort of disaster.

I can think of three possible reasons why they didn't have such an infrastructure in place:

1) When M$ bought Danger, they scaled back to maximize profits

2) The risk/cost analysis (odds of failure, cost, and cost to prevent) of such a disaster was out of date, incorrect, or simply accepted

3) Funding was available but rollout of redundancy/backup was taking longer than expected

I doubt #3 since Danger has been around for a while and their customer base probably isn't growing exceedingly quickly. #2 is mildly possible, IMHO, if they simply accepted the risk. But a class action after a disaster like this has to be really expensive unless they think they can dodge the legal obligation of backing up user data. #1 is fairly typical of takeovers, especially if profit is more important than safeguarding user data.

New T-Mobile ad... by Anonymous Coward · 2009-10-11 07:24 · Score: 0

We see a house's front door from the inside of the house. Doorbell rings. Angry-looking guy holding a useless Sidekick walks up, opens door and sees Catherine Zeta-Jones standing there.

"Hi," she says, "I'm here to apologize for what happened with your Sidekick," then promptly drops out of frame and you hear the sound of a zipper being unzipped...

Cut to close-up of the guy's face transforming from anger to a big, relaxed smile.

Fade to T-Mobile logo.

Re:New T-Mobile ad... by fireylord · 2009-10-12 08:39 · Score: 1

Catherine will be rather busy then. . .

What about the backup (crazy talk)? by argent · 2009-10-11 07:52 · Score: 1

Surely they didn't upgrade the SAN and the offsite backup at the same time?

Surely they had an offsite backup?

Right?

Re:What about the backup (crazy talk)? by morven2 · 2009-10-11 10:02 · Score: 1

I bet you they didn't. Wouldn't be the first time a company thought RAID meant you didn't have to take backups.
It would be very ironic if the reason they were upgrading was partly that backups weren't working anymore.

Re:Kicking it oldskool by Anonymous Coward · 2009-10-11 08:19 · Score: 0

tl;dr

Re:Irresponsibility to EPIC proportions. -- yes by Anonymous Coward · 2009-10-11 08:50 · Score: 0

Sidekicks have very rudimentary backup features. You can save phone numbers and names to a SIM card and put contacts, bookmarks, notes and pictures into emails and send them off to yourself. That's about it. The latest version of the sidekick does not have a file manager to aid in backing anything up.

The system is most definitely a thin client architecture...and I believe would have been properly managed..and continued to work..for years had M$ kept their greedy, incompetent fingers out of it.

Perhaps they should have used ZFS or btrfs for the by PhunkySchtuff · 2009-10-11 09:04 · Score: 1

Perhaps they should have used ZFS or btrfs for their servers.

I don't see how a filesystem would do anything when, from all reports, they fucked up a SAN upgrade and didn't have backups.

Oh, my hard drive just died, and I can't access any data on it any more. Maybe I should have used ZFS? I don't think so...

--
Specialist Mac support for creative pros, Melbourne

Re:Irresponsibility to EPIC proportions. -- yes by AK+Marc · 2009-10-11 09:24 · Score: 0, Flamebait

It's a feature. Lose your phone? The guy with the phone gets nothing, and you get everything back in seconds. And it's protected on our super-servers, so you never have to worry about loss. They weren't stupid, just trusting. Though from your tone, you apparently think nice trusting people deserve to be screwed.

--
Learn to love Alaska

You insensitive clod! by Anonymous Coward · 2009-10-11 09:43 · Score: 1, Informative

My 9 year old daughter has a sidekick.
Microsoft has made her cry.

Re:Why not store the data on phone permanent memor by sjames · 2009-10-11 09:52 · Score: 1

I suppose the question should be asked at the architectural level. Why in the world doesn't the phone store the data locally and simply replicate it to the server? That gives 100% of the benefits should the phone be damaged but also makes sure it's all there if it has no connectivity or if, for example, a sysadmin disaster wipes the server out.

If that was done, recovering from this problem would be just a matter of pushing everything on the phone back to the server. It might have even been possible to do it without anyone having to know the server died.
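
The architecture described above can be sketched in a few lines. This is a toy model under stated assumptions: `Server` and `Phone` are hypothetical in-memory classes, the dict `local` stands in for flash storage, and nothing here reflects how Danger's actual service worked.

```python
class Server:
    """Stand-in for the remote service; holds one dict of records per device."""

    def __init__(self):
        self.accounts = {}

    def push(self, device_id, records):
        # Store a copy so later local edits don't silently alter server state.
        self.accounts[device_id] = dict(records)

    def pull(self, device_id):
        return dict(self.accounts.get(device_id, {}))


class Phone:
    """Keeps the authoritative copy locally and replicates every change."""

    def __init__(self, device_id, server):
        self.device_id = device_id
        self.server = server
        self.local = {}  # stands in for persistent flash storage

    def save(self, key, value):
        self.local[key] = value                        # write locally first
        self.server.push(self.device_id, self.local)   # then replicate

    def restore_from_server(self):
        """Phone lost or replaced: pull everything back down."""
        self.local = self.server.pull(self.device_id)

    def reseed_server(self):
        """Server wiped: push the local copy back up."""
        self.server.push(self.device_id, self.local)
```

Either side can rebuild the other: a replacement phone calls `restore_from_server()`, and after a server-side wipe each phone calls `reseed_server()`, which is the "pushing everything on the phone back to the server" recovery path.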

We deserve an NTSB-style post-mortem by jfaughnan · 2009-10-11 10:52 · Score: 1

I've heard and read of enough problems with restoring complex transactional data structures that I can imagine this situation is far more complex than many believe.

What I'd love to see is a full post-mortem, a lesser version of what the NTSB does for airplane crashes.

Google's been doing some of this for their (too frequent) outages, but they're very high level -- typically something about a system reconfig overloading a router. The Cloud user base needs a far greater level of error exposure.

It took openness and in depth analysis to make air travel safe.

The Cloud won't be safe until we learn the same kind of lessons, and apply what we learn in new and improved systems.

--
John Faughnan
jfaughnan@spamcop.net

Mega fail by Anonymous Coward · 2009-10-11 11:16 · Score: 0

Teeth will be gnashing, and the lawsuits will be flying.

Its a shame... by Tangential · 2009-10-11 11:38 · Score: 1

It's a shame (but typical) that in wanting to kill off one particular piece of equipment that they don't like (the non-MS-OS Sidekick), Microsoft gets to give a whole swath of tech like cloud computing a bad name. OTOH, maybe this will serve as a heads up to most people that cloud computing is years away from really dependable viability.

--
Suppose you were an idiot. And suppose you were a member of congress. But then I repeat myself. -- Mark Twain

Re:Its a shame... by metaforest · 2009-10-11 18:00 · Score: 1

This is a much bigger problem than just taking a careful look at cloud resources and their implementation.
When a company as big as Microsoft 'accidentally' kills off 1 million users like this.... what does it really tell us about the viability of cloud computing?
It tells me that if I put my business data... my family jewels... in a nut cracker.... I can expect someone, someday to give me a none too gentle squeeze. Trusting a third party to manage external computing resources like this is just asking to get fuxed. As the Microsoft/Danger example shows, the real danger is not from a small well meaning company, like Danger. The real risk is from the large corporate shark that snaps said company up and turns the cloud resource you were counting on into a 'TOOL.'
The folks using Google, or Amazon, or what have you, may not have to worry today. But what about 5 years from now, or 10? I have already seen a lot of disasters related to the Cloud Computing v0.9x era... There are a lot of small businesses out there that like the notion of SaaS, but this Microsoft/Danger snafu shows just how risky it is....

Backup the device by Hamsterdan · 2009-10-11 12:41 · Score: 1

And using the cable that came with the phone (or aftermarket) to back up the phone's data??? I did it on my V360, and I still do it on my current phone (sue me, it's an iPhone, but at least it's jailbroken). It's so simple to back up the device on one's computer... If it's using WinMo, Active Sync will be able to back up, right?

--
I've got better things to do tonight than die.

Re:Irresponsibility to EPIC proportions. -- yes by sowth · 2009-10-11 14:23 · Score: 1

The problem is the cell phone cartel rigs all the phones to be either difficult or impossible for customers to copy their own data. The royals make sure everyone is dependent upon them, and if they screw up or just decide to throw your stuff out, you cannot do much about it. Classic type of problem when doing business with psychopaths...

Followup by symbolset · 2009-10-11 15:10 · Score: 1

Somebody's about to get a really awkward performance review.

Apparently the reviewee's name is Roz Ho. We should all send her flowers. Thanks, Roz. Write if you find work.

--
Help stamp out iliturcy.

Anonymous Coward by Anonymous Coward · 2009-10-11 21:50 · Score: 0

Backups and Disaster Recovery:
Two problems exist: Did the backup happen and is it limited to just one big tape ??
Has anyone restored from backups ? Application, Data, Personality ???
Backups need to be restored someplace else - no point otherwise. What if one site
has been hit ?
One tape backups are better than epoch and incrementals in general.
Restore issues are really things you need to address.

For example: I left an organization once and got called back by a friend when they had
trouble. I wanted to change things so I checked the backups PRIOR to making changes.
Oracle DB backup would have been impossible after the new system admin changed
backups I had written. I rang my old boss who was not pleased.

Backups taking a long time? Have you tried to recover any files ?? At one place
the backups create an index but it takes so long for backups that the restore window
aborts - just looking through the index. SIGH!!!!! The only time you can recover a file
would be after a disaster or if the file you want is found early in the index file.

Backups suddenly taking a lot less time or using up fewer tapes ?? Check the backup
procedure. I discovered a backup once where 6 sets of 5 tapes were in play.
However the logs indicated that for the last 5 sets the tapes appeared to fit onto only
3 tapes. I identified the problem - a new operator doing the wrong thing and the backup
software allowing it to happen. I ported a similar bit of backup code and fixed the speed
problem as well: 6 hours down to 2 hours 10 minutes.
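
The "suddenly fewer tapes" check above lends itself to automation. A minimal sketch, assuming you log one size figure per backup run (tapes, bytes, or minutes); the function name `suspicious_runs` and the `window` and `min_ratio` thresholds are illustrative assumptions, not values from the post.

```python
from statistics import median


def suspicious_runs(sizes, window=5, min_ratio=0.7):
    """Return indexes of runs much smaller than the recent healthy median.

    sizes: backup sizes in chronological order.
    A run is flagged when it falls below min_ratio * median of the last
    `window` runs that were not themselves flagged, so a long streak of
    shrunken backups cannot quietly become the new baseline.
    """
    flagged = []
    healthy = []
    for i, size in enumerate(sizes):
        recent = healthy[-window:]
        # Require a few healthy runs before judging anything.
        if len(recent) >= 3 and size < min_ratio * median(recent):
            flagged.append(i)
        else:
            healthy.append(size)
    return flagged
```

For instance, a history of sets that took 5 tapes followed by sets that fit on 3 would have every 3-tape set flagged, prompting someone to read the logs before the good tapes rotate out.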

SAN backups work well but need testing: have you backed up every machine you need ?
I wrote about 2700 lines of code once: I started backing up just one machine a WEB
system. It needed a DNS server and a Certificate server so I had to backup two other
machines. It can be a problem ? This WEB server had several WEB pages - I checked
them - Some of the applications actually were off-system references - so they needed
to be up. I created a BASELINE after I asked for unique WEB page titles - just to sort
this out.
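
The web-server story above is a dependency-closure problem: backing up one machine is only useful if everything it needs is backed up too. A rough sketch, assuming you have already written down which machine depends on which; the `depends_on` mapping and the host names in the usage note are made up for illustration.

```python
from collections import deque


def machines_to_back_up(target, depends_on):
    """Return the target plus every machine it transitively depends on.

    depends_on maps a machine name to the machines it needs at runtime.
    """
    seen = {target}
    queue = deque([target])
    while queue:
        machine = queue.popleft()
        for dep in depends_on.get(machine, ()):
            if dep not in seen:  # also guards against dependency cycles
                seen.add(dep)
                queue.append(dep)
    return seen
```

With `{"web": ["dns", "certs", "app-host"], "app-host": ["dns"]}`, asking for `"web"` yields all four machines, which matches the experience of starting with one server and ending up backing up several.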

Backup limitations need to be known: some programs that claim to backup can have
bugs or features that prevent a decent backup. I discovered a backup of a filesystem
that should have backed up every file in say "/home" but because MVFS was running
- it failed. You need to stop or quieten filesystem operations for some things and use
a different method of backup as well. It's a tough decision.

Filesystems: Sparse filesystems, Large files ?, Access control lists, inode info ??
All of these need to be considered !!

Activesync does not backup by argent · 2009-10-11 23:24 · Score: 1

If it's using WinMo, Active Sync will be able to backup, right?

I went back to PalmOS after some bad experiences with Pocket PC, including the fact that ActiveSync does NOT make a useful backup of a PDA or phone.

* It doesn't back up applications. It does keep a copy of applications when you install them through ActiveSync, but not if you install them via a CAB file on the device. And that copy is only usable until you replace or reinstall your PC because it depends on a maze of twisty little registry keys.

* It doesn't back up application data that it didn't install. If your applications create data, they have to have software on your PC to back it up.

* It doesn't back up files you install yourself. Depending on how you install them, it may keep a copy of the file you installed, but don't expect it to keep any changes made on the device.

That's why by 2002 every Windows CE "partner" was including a little program to backup to flash card, because ActiveSync doesn't do it for you.

Even VSAM wouldn't save idiocy by Ilgaz · 2009-10-12 00:26 · Score: 1

I think even if sidekick data were housed on geographically away sysplex IBM Z/OS Z10 mainframes, it would be lost somehow with such management. Host OS and filesystem is irrelevant.

Run the most basic backup application, e.g. the one MS gives for free with Windows, and you will see it backs up and VALIDATES the entire data set. I haven't seen a single application or command designed for backup that doesn't do it.

Re:Kicking it oldskool by V!NCENT · 2009-10-12 05:06 · Score: 1

How's that ignorance? Did I say who did it? Did I get onto conspiracy mode?

Look kid. Welcome to the real world. Taped, archived and accessible interviews with eyewitnesses (firemen and the like) from cable tv networks all around the world said that they heard explosions from underground when they were inside the building. That's court grade evidence piece number 1. Taped, archived and accessible interviews with a head of a German university said in an interview that they researched the 'soil' (sorry I am not a native English speaker) from the WTC and found an alarming amount of phosphor. Evidence number 2. NASA posted infrared satellite images from days after the collapse showing alarming heat still coming from the WTC site. Evidence number 3. Pictures of the days after showing 21 meter long steel bars from the core structure cut in a precise way show evidence from a demolition. Evidence piece number 4. Then we've got confirmed, on public-, cable television by the head of the WTC 7 tower that tower 7 was collapsed by a demolition. Evidence piece number 5. Want me to go on?

Notice that this is not theory, but actual court grade evidence? I don't know who is responsible for this. I don't know what the hell was going on. All I know is that this is not speculation and it's coming from highly credible sources and not John Doe dragon fighter xXx p0rn l33t 0wN3r's, AOL homepage. Kindergarten is someplace else.

--
Here be signatures

MS/Danger snafu by Anonymous Coward · 2009-10-12 09:24 · Score: 0

Continue to trust?

I'd blame middle management by jerralb · 2009-10-12 14:28 · Score: 1

After learning this was likely caused by a failed single SAN upgrade by Hitachi, I have to think that the architecture built to support the Sidekick didn't have an adequate budget to be built right.
Budgets ultimately decide what we techs/admins get to work with. We can always ask for what we want. But someone else (procurement, finance, project management, architect) can shoot it down, resulting in Plan B. And in most cases the person(s) signing/approving the final purchase order hasn't got a clue. By the time a failure occurs, the parties responsible for the system in place have long gone to their next position to screw up.

Re:Kicking it oldskool by 10Ghz · 2009-10-14 18:53 · Score: 1

How's that ignorance?

It's ignorance since you obviously have no clue how things happened in reality.

Did I say who did it? Did I get onto conspiracy mode?

Yes you did. Since the official record differs from your assumption, it would mean that you think that there is a conspiracy at work.

Look kid. Welcome to the real world. Taped, archived and accessible interviews with eyewitnesses (firemen and the like) from cable tv networks all around the world said that they heard explosions from underground when they were inside the building.

Well grandpa, many of those interviews were taken out of context or they were made by people who had no idea what was going on. Just because some panicking guy says "OMG, I heard explosions!" does not mean that there were actual explosions.

phosphor

Is this the "there were traces of thermite!"-argument? That's debunked here:

http://www.911myths.com/html/traces_of_thermate_at_the_wtc.html

and here:

http://www.debunking911.com/thermite.htm

Evidence number 2. NASA posted infrared satellite images from days after the collapse showing alarming heat still coming from the WTC site.

And what does that prove? That there were fires among the rubble? Surely not!

Evidence number 3. Pictures of the days after showing 21 meter long steel bars from the core structure cut in a precise way show evidence from a demolition.

Debunked here:

http://www.debunking911.com/thermite.htm

The cuts were done by the rescue-workers AFTER the collapse...

Evidence piece number 4. Then we've got confirmed, on public-, cable television by the head of the WTC 7 tower that tower 7 was collapsed by a demolition.

Debunked here:

http://www.debunking911.com/pull.htm

Evidence piece number 5. Want me to go on?

Please do, I love laughing at your retarded arguments.

Notice that this is not theory, but actual court grade evidence?

None of the things you listed would pass as "evidence".

All I know is that this is not speculation and it's coming from highly credible sources and not John Doe dragon fighter xXx p0rn l33t 0wN3r's, AOL homepage. Kindergarten is someplace else.

"Highly credible sources" indeed.

--
Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.

Slashdot Mirror

304 comments