Tech Magazine Loses June Issue, No Backup
Gareth writes "Business 2.0, a magazine published by Time, has been warning their readers against the hazards of not taking backups of computer files. So much so that in an article published by them in 2003, they 'likened backups to flossing — everyone knows it's important, but few devote enough thought or energy to it.' Last week, Business 2.0 got caught forgetting to floss as the magazine's editorial system crashed, wiping out all the work that had been done for its June issue. The backup server failed to back up."
The data "should" be recoverable, and the network server "should" have a RAID. I bet you anything that someone in IT asked for money for the RAID and it was denied, since lots of people with budget control think RAID is bug spray.
Backups and fault-tolerant hardware cost money. You can talk about potential losses and risks until you're blue in the face, but until it *actually* costs the company money, nobody will listen. What's going to happen here more than likely is the person who asked for the RAID will get fired, as they're probably the same person in charge of the backups. This will also provide a scapegoat for that person's manager, since obviously if they got fired for it there need be no further repercussions or changes in behavior.
The only way they deserve to get fired is if they didn't advocate as hard as possible for enough backup hardware/software to allow for verification of backed up data and recovery in case of a mechanical hard drive failure. If they did, and were denied, then they did everything they could. (Which doesn't mean they won't get fired, it's just less deserved at that point. However, the thought there is that if they didn't want to get fired for incompetence, they should have tried to become a manager...)
Never underestimate the power of stupid people in large groups.
hell with that. ever heard of competent IT staff? why has their CTO not been fired yet?
honestly though, talking management into backup solutions is like pulling teeth, then they blame you for not having it in place when the failure does happen.
Last place I worked at we were using 4 year old DLT tapes because management was too stupid and cheap to buy new ones.
"we will buy new when those fail" is what we were told.
Do not look at laser with remaining good eye.
/grabs hammer...
*bang* *bang* *bang*
Oops, it looks like a couple of those DLT drives are running into problems. We need replacements. Did you see what happened to Business 2.0?
http://www.nytimes.com/2007/05/01/business/media/01mag.html?_r=1&oref=slogin
Doesn't it?
I'm in the hole of the broadband donut.
Huh, I guess I wasn't paying attention to when Slashdot turned into Digg, even though I read both. Here's a link to the original article, rather than what might be a splog. Especially since the article text was copied verbatim.
If they deleted the files in error, then the RAID would faithfully mirror that deletion across all physical disks...
Nice story, though. Reminds me of the sysadmin in my first company who automatically backed up our server every day. Only problem was: the process put a copy of the backup on a drive that was itself being backed up. You can imagine what happened after a few weeks (it failed, disk full). He only noticed a few months later when we asked him to restore some files.
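That failure mode (the backup landing inside the thing being backed up) is cheap to check for before a job runs. A minimal sketch, assuming the source and destination are plain directory paths; the function name and the example paths are made up for illustration, and the original story involved whole drives rather than directories:

```python
from pathlib import Path


def destination_is_inside_source(source: str, destination: str) -> bool:
    """Return True if the backup destination lies within the tree being backed up."""
    src = Path(source).resolve()
    dst = Path(destination).resolve()
    # A destination equal to, or nested under, the source means every run
    # would also back up all the previous backups, so the job grows until
    # the disk fills.
    return dst == src or src in dst.parents
```

A backup script could refuse to start when this returns True, instead of failing silently weeks later.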
I know that where I work we run a Disaster Recovery (DR) exercise every time we do either a hardware or software upgrade to prove that everything still works. If nothing has changed, one is done every second year. We actually pull our off-site production tapes and restore to a new machine that is not in the same city where the current production machine resides. It may be overkill for them, but a test like that for them (or any other business) would be a fruitful exercise, in that they prove that the backups are good and that they can restore from a given point and carry on with a minimal loss of work.
For one of our server apps we actually have two laptops configured with all of the required software, and we restore production data from backups on a regular basis because we use it for system testing on projects. This happens several times a year, so we know that the backup and restore procedures truly work. It is also very cool walking into the client site, plugging in the laptop, and showing them that in an emergency they have a working machine very quickly. Not as fast as a server, but it gets them a working machine until the replacement server arrives.
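A restore test like the ones described above only proves something if the restored files are compared against the originals. A rough sketch of that comparison, assuming both trees can be read from the same machine; `tree_digests` and `compare_digests` are hypothetical helper names, not part of any backup product:

```python
import hashlib
from pathlib import Path


def tree_digests(root: str) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            key = path.relative_to(base).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_digests(before: dict, after: dict) -> dict:
    """Report files that went missing, changed, or appeared during a restore."""
    shared = before.keys() & after.keys()
    return {
        "missing": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in shared if before[name] != after[name]),
        "unexpected": sorted(after.keys() - before.keys()),
    }
```

Usage would be along the lines of `compare_digests(tree_digests(production_dir), tree_digests(restored_dir))`, with the restore counted as good only when all three lists are empty.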
Panic now, beat the rush!
Wait for OS X 10.5 and "Time Machine".
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
And if the data is that important then a suitable RAIDed disk array will sort things out.
The topic here is backups, not RAID.
Say it again with me everyone "RAID IS NOT A BACKUP"
RAID increases uptime by decreasing or eliminating the downtime needed to do restores when an individual drive bites it. It is *NOT* a backup.
RAID does not save you if someone accidentally deletes a needed file.
RAID does not save you if your machine gets nailed by a virus or an unpatched exploit.
RAID does not save you if the drive power supply fries taking out attached hardware.
RAID does not save you if a burglar steals your machine.
RAID IS NOT A BACKUP.
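For anyone still unconvinced, here is a toy model of the difference. It is only an illustration: the `Mirror` class is a made-up stand-in for a two-disk RAID 1 set, with dictionaries playing the part of disks, and `snapshot` plays the part of a backup kept somewhere else.

```python
class Mirror:
    """Toy RAID 1: every write and every delete is applied to all live disks."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, data):
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name):
        # The mirror faithfully replicates the deletion too.
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail_disk(self, index):
        self.disks[index] = None  # simulate a dead drive

    def read(self, name):
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        return None

    def snapshot(self):
        """Point-in-time copy stored outside the array (the actual backup)."""
        for disk in self.disks:
            if disk is not None:
                return dict(disk)
        return {}
```

Losing one disk leaves the data readable, which is the uptime benefit. Deleting a file removes it from every disk at once, and only the earlier snapshot still has it.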
The problem is that tech magazines are in the advertising business, not the tech business. I write content for the Web site of a tech radio show, and it's just a bunch of us in cubicles looking stuff up on Google. No tech people involved.
Always someone has power over you. The thing to consider is this: Is the power good, or bad?
smarmy: adjective ( smarmier , smarmiest ) informal - ingratiating and wheedling in a way that is perceived as insincere or excessive : a smarmy, unctuous reply.
// This is not a sig.
This is why I love HP, RAID drives can be dropped into almost any other HP machine and the RAID read. As HP puts it:
Same reason that some of us prefer Software RAID - giving us even more flexibility in what we use to rebuild after a disaster. I could use an HP controller, a 3Ware controller, or some other controller.
That's the usual fly in the ointment with backups. Backups are difficult and expensive. Management always thinks they can make them easy and cheap by buying some automated solution. But the main cost of the backup is regularly testing the backups... which can only be done properly by doing a full restore... which requires available disk space equal to the size of the backup, and hours of operator time.
Story #1. Fortune 500 company. Lost some source. Big brouhaha. Edict went out: all files are to be backed up to diskettes and the diskettes sent to offsite storage which the management had contracted for with an outside firm. It took a lot of extra time, but people did it. After about two years, an important server with source code for a major product crashed. Developers tried to get the source back from offsite storage. It turns out that nobody at any point had taken any responsibility for cataloging, identifying, or indexing the diskettes. The diskettes might as well have not been labelled: the developers couldn't identify what diskettes were needed, and the offsite storage firm couldn't have retrieved them if they had.
Story #2. Medium-size scientific research organization with a Digital 11/70 running RSTS. Enlightened manager pays operator overtime pay to stay late three nights a week and do backups. Backups are performed with the "verify" option enabled. Tapes are placed in a fire-resistant tape vault every night. But no actual restores are performed. Database (Oracle, in the days when Oracle Corporation's name was still Relational Software, Inc.) is corrupted. A restore is attempted. It transpires that this version of Oracle uses the maximum record length for its files, which happens to be 65,536 bytes, and the Digital-supplied backup-restore utility... you guessed it... has a bug with records of that length. Yep. Writes 0 bytes, verifies 0 bytes.
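"Writes 0 bytes, verifies 0 bytes" is what happens when a verify pass only checks the backup against itself. The sketch below is a toy reconstruction, not the actual utility: `faulty_backup` is a hypothetical stand-in that drops maximum-length records (the real bug's mechanics are not described in the story), and the other function names are likewise invented.

```python
import hashlib

MAX_RECORD = 65536  # record length quoted in the story


def faulty_backup(records):
    """Stand-in for the buggy utility: silently drops maximum-length records."""
    return [r for r in records if len(r) < MAX_RECORD]


def self_consistent_verify(written):
    """Re-read what was written and compare it with itself: this always passes."""
    reread = list(written)
    return reread == written


def digest(records):
    """Hash the records with length prefixes so record boundaries matter."""
    h = hashlib.sha256()
    for r in records:
        h.update(len(r).to_bytes(8, "big"))
        h.update(r)
    return h.hexdigest()


def verify_against_source(source, written):
    """Compare record count and content digest with the original data."""
    return len(source) == len(written) and digest(source) == digest(written)
```

With two maximum-length records as input, `faulty_backup` writes nothing, `self_consistent_verify` happily passes, and only `verify_against_source` notices that the backup is empty. A real restore test catches the same thing the hard way.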
Story #3. I worked at a place that recommended that individual developers perform individual backups using a cartridge tape system and some standard PC software. I set it up. There were two "verify" options. One used the cartridge system's read-after-write feature to read every block as it was written. The second performed the entire backup, then verified the entire backup in a second pass. Took twice as long, of course. I opted for the second method. The problem was: more than half the time, the verify would report one or two errors. And for some reason, probably efficiency of use of the tape, it didn't write file by file, it munged them into blocks. And it didn't even report the names of the files affected. Just "2 errors were encountered" or something like that. So, when that happened, I didn't see that a rational person had any alternative except to perform the whole backup again. And more than half the time, it would report a couple of errors the second time, and...
When I asked colleagues about this, it turned out that I was the only one ever to have picked the second verify option. Everyone else had picked the read-after-write-verify option, "because it was faster."
And they told me not to fuss because "if it was only a couple of errors, the chance they were on files you needed to recover was too small to worry about."
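The missing piece in Story #3 was an index from tape blocks back to file names, so that "2 errors were encountered" could be turned into a list of affected files. A small sketch of the idea, assuming files are simply concatenated into fixed-size blocks; the function names and the block layout are assumptions, since the story does not say how that software actually packed its data:

```python
def pack_into_blocks(files, block_size):
    """Concatenate files into fixed-size blocks and index which files touch each block.

    files: dict mapping file name to bytes.
    Returns (blocks, index), where index[i] lists the files stored in block i.
    """
    stream = bytearray()
    spans = []  # (name, start offset, end offset) within the stream
    for name, data in files.items():
        spans.append((name, len(stream), len(stream) + len(data)))
        stream.extend(data)
    blocks = [bytes(stream[i:i + block_size]) for i in range(0, len(stream), block_size)]
    index = []
    for i in range(len(blocks)):
        lo, hi = i * block_size, (i + 1) * block_size
        # A file belongs to this block if its byte range overlaps [lo, hi).
        index.append([name for name, start, end in spans if start < hi and end > lo])
    return blocks, index


def files_hit_by_errors(index, bad_blocks):
    """Translate a list of bad block numbers into the names of affected files."""
    names = set()
    for i in bad_blocks:
        names.update(index[i])
    return sorted(names)
```

With that index, a verify error on one block means re-copying only the files it names, rather than rerunning the whole backup and hoping.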
"How to Do Nothing," kids activities, back in print!