Server Failure Destroys Sidekick Users' Backup Data

← Back to Stories (view on slashdot.org)

Server Failure Destroys Sidekick Users' Backup Data

Posted by timothy on Saturday October 10, 2009 @09:29PM from the oh-well-enough-said dept.

Expanding on the T-Mobile data loss mentioned in an update to an earlier story, reader stigmato writes "T-Mobile's popular Sidekick brand of devices and their users are facing a data loss crisis. According to the T-Mobile community forums, Microsoft/Danger has suffered a catastrophic server failure that has resulted in the loss of all personal data not stored on the phones. They are advising users not to turn off their phones, reset them or let the batteries die in them for fear of losing what data remains on the devices. Microsoft/Danger has stated that they cannot recover the data but are still trying. Already people are clamoring for a lawsuit. Should we continue to trust cloud computing content providers with our personal information? Perhaps they should have used ZFS or btrfs for their servers."

1 of 304 comments (clear)

Min score:

Reason:

Sort:

Re:"they should have used ZFS or btrfs" by jimicus · 2009-10-11 01:08 · Score: 5, Informative
I've always been amazed that tape is trusted as much as it is. It seem (anecdotally at least) to have a disproportionately high failure rate.
I'm not sure that's the problem so much - after all, LTO has a read head positioned directly after the write head and automatically verifies as it goes along. A tape error is dead easy to spot.
There are a number of places where things can fall apart, and tapes don't even need to come into the matter:
- Nobody checking the logs
- Failure to understand the processes necessary to get a good backup. (You can't just dump the files that comprise a database to disk - you must either quiesce the database or use the DBMS' inbuilt backup routine - or you will wind up with inconsistent files and hence an inconsistent database. You'd be amazed how many people don't understand this.)
- Failure to maintain backup processes. (When you moved the database to another disk because you were running out of space, you did update your backup process? Right?)
- Not doing any test restores.
- Not doing enough test restores, or doing them carefully enough. (If you're unlucky, your database will come back up OK even though you didn't quiesce it before carrying out the backup. Why do I say unlucky? Well, if it had not come up OK, you'd know immediately that there was a problem with your process. Then once the database is back up, make sure you check the restored data to ensure that recent transactions which should be on the backup actually are).