Why Mirroring Is Not a Backup Solution
Craig writes "Journalspace.com has fallen and can't get up. The post on their site describes how their entire database was overwritten through either some inconceivable OS or application bug, or more likely a malicious act. Regardless of how the data was lost, their undoing appears to have been that they treated drive mirroring as a backup and have now paid the ultimate price for not having point-in-time backups of the data that was their business." The site had been in business since 2002 and had an Alexa page rank of 106,881. Quantcast said they had 14,000 monthly visitors recently. No word on how many thousands of bloggers' entire output has evaporated.
DUH!
And that's why your IT department actually needs funding. Sleep tight.
That's all I can say about this. I'm really surprised that, with all the users they had, they were so quick to say "everything is gone and we're giving up" instead of just starting over and maybe implementing a protocol that would make sure this doesn't happen again.
Ave Molech Setting
Incremental backups to tape every night, full backup at the weekend. Tapes must be stored off-site at a proper storage location. Got lots of data and a small backup window? Get a faster tape drive and a tape robot. It costs money, but your data costs more.
This is the bare minimum, people. Come on!
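For anyone who hasn't run a rotation like that, here is a minimal sketch of what "incremental nightly, full at the weekend" implies at restore time. The function names are made up for illustration, and treating Sunday as the full-backup day is an assumption, not something the parent specified.

```python
import datetime

def backup_kind(day, full_weekday=6):
    """Return 'full' on the chosen weekday (6 = Sunday, an assumption), else 'incremental'."""
    return "full" if day.weekday() == full_weekday else "incremental"

def restore_chain(day, full_weekday=6):
    """List the backups needed to restore to `day`:
    the most recent full backup plus every incremental taken since."""
    chain = []
    current = day
    # Walk backwards, collecting incrementals, until we hit a full backup.
    while backup_kind(current, full_weekday) != "full":
        chain.append((current, "incremental"))
        current -= datetime.timedelta(days=1)
    chain.append((current, "full"))
    return list(reversed(chain))

# Restoring to a Wednesday needs Sunday's full plus three incrementals.
for when, kind in restore_chain(datetime.date(2009, 1, 7)):
    print(when.isoformat(), kind)
```

The point of the sketch: every tape in that chain has to be readable, which is one more reason the off-site storage has to be a proper one.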
Mirroring: High availability
Backups: High reliability
Maybe I could understand that there might be issues with backing up live databases, and they didn't want to deal with it. Still not an excuse.
BUT, according to the site "the server which held the journalspace data had two large drives in a RAID configuration". Only TWO drives.
All they had to do was pull one of the drives, replace it, and lock up the original off site. In a couple of hours the drives would have been mirrored again.
Or even one, stale, backup.
The cost of that cleanup, of course, will be borne by taxpayers, not industry.
The only problem with that idea is that it may not have been the IT guy's decision to save money by not having a true backup system. I have seen companies skimp on backup systems because they thought their RAID system was enough.
There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
No doubt this incident is the admin's fault. He confused mirroring with backup and kept making that mistake until it was too late, as pointed out in other comments.
Now what about a user's angle? The moral is that you can never assume your data is safer when it's "in the cloud". If you value your blog and your readers, you *should* save a copy of your work as well as the readers' info, *locally*, somewhere you have control over.
There's no place like $HOME.
Colorless green Cthulhu waits dreaming furiously.
In today's world, where primary storage and protection storage are well-defined, and where an entire industry grew around them (examples: NetApp, Data Domain), one is hard-pressed to understand the reason for such a debacle. Reading the note referred to in the article leads me to believe, unfortunately, that Journalspace's IT department did not understand the difference.
It is sometimes considered a bad form to say something bad about fellow techies. We prefer to look for 'outside' causes. Still, to learn and avoid the same problems in the future, one has to admit his mistakes first. This paragraph from the Journalspace's page:
The value of such a setup is that if one drive fails, the server keeps running, using the remaining drive. Since the remaining drive has a copy of the data on the other drive, the data is intact. The administrator simply replaces the drive that's gone bad, and the server is back to operating with two redundant drives.
makes me believe there is a denial going on.
End anonymous moderation and posting on
A better backup solution needn't cost much, or even anything. Simply FTPing to your own home machine on occasion would have been a millionfold improvement. (Given the popularity metrics, I don't think this was a staffed operation or anything. Just a guy or two.)
My guess (and this is a guess, I'd never heard of the site before yesterday) is that this is some guy who started his own little site and it got bigger and bigger. Basically he never designed the backup; the system was just slowly pieced together, bigger and bigger, until it got to its current state.
The comments in the messages from the site's operator about the cost of the drive recovery, and his thinking that both drives just died at once, indicate to me that this site was basically a hobby for him and he isn't experienced as an admin.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
This story put the fear of god into me. The first thing I did since reading it is to back up the website I admin (for my dad) locally. I'd always assumed our host would have good backup, but that seems naïve now.
All intents and purposes. Not intensive purposes.
See mirroring is like...well a mirror. If you stand before one and stick a fork in your eye your mirror-image does the same. In real time. Analogies are there for a reason.
You don't just need backups. You need to TEST them. Having a backup run every night is nice and all; but if the tapes are unreadable and no error was reported, or if you're doing it wrong and the backup is corrupted and you only find out when you come to restore ....
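A rough sketch of what a restore test can look like: restore to a scratch directory, then compare checksums against the live tree. The function names and the directory layout are hypothetical, and this assumes the data is quiescent while you compare (a live database needs a dump or snapshot first).

```python
import hashlib
from pathlib import Path

def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_restore(original, restored):
    """Return the sorted relative paths that are missing, extra, or different
    in the restored tree. An empty list means the restore matched."""
    a, b = tree_digests(original), tree_digests(restored)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

Usage would be something like `verify_restore("/srv/site", "/tmp/restore-test")` after pulling last night's backup into the scratch directory (both paths are placeholders). Anything in the returned list is a file your backup did not faithfully preserve.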
NAS devices are cheaper and faster now. Lower end removable drives are not much more expensive than tapes, and they are a lot faster and easier to manage.
-- -- Warning. Do not stare directly at the sun.
This is why users should be able to easily back up their own data for any online service. If a service entrusted with your data provides no straightforward way to drop a copy of it onto your own hard drive, don't trust it. I'd go as far as to say that any service that doesn't strongly recommend you keep your own backups shouldn't be trusted.
Do the big kahunas of the "Web 2.0" world give users that option? Gmail, Myspace, Facebook, Twitter etcetera ad nauseam?
Prisencolinensinainciusol. Ol Rait!
NAS devices are cheaper and faster now. Lower end removable drives are not much more expensive than tapes, and they are a lot faster and easier to manage.
Having 21 days of off-site backups stored on NAS is kinda difficult.
"I don't know, therefore Aliens" Wafflebox1
Yes.
But you wouldn't think everyone would catch on...
Or attach a 4 TB Drobo to it and then use Time Machine.
Then make a backup and test the restore.
Their admin is criminally incompetent.
- Zav - Imagine a Beowulf cluster of insensitive clods...
Even accepting your price that's a cost of about 12.7 cents per gigabyte and you can get 800GB native LTO-4 tapes for about $50, which comes out to about 6.3 cents per gigabyte.
But quoting costs for desktop grade SATA drives severely understates the true cost. For any non-trivial site installation you're talking near-line rated drives, drive caddies, storage shelves and additional SAN fabric. Then price out the additional power, cooling and rack space. Then price offsite shipping and storage for the bulkier, heavier and more delicate disk option.
Mirroring has its place. Snapshotting has its place. And backups to stable media still has its place too.
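For anyone checking the media-cost arithmetic above, it works out as below. The $50 / 800GB tape figure and the 12.7 cents per gigabyte disk figure are the ones quoted in the comment; the helper name is made up, and this counts media only (no drives, libraries, shelves or power).

```python
def cents_per_gb(price_dollars, capacity_gb):
    """Media cost in cents per gigabyte."""
    return 100.0 * price_dollars / capacity_gb

tape_cents = cents_per_gb(50, 800)   # LTO-4, 800GB native at about $50
disk_cents = 12.7                    # the disk figure quoted in the comment

print(f"tape: {tape_cents:.2f} cents/GB")
print(f"disk: {disk_cents:.2f} cents/GB")
print(f"disk media costs about {disk_cents / tape_cents:.1f}x as much per GB")
```

So on media alone the disk figure is roughly double the tape figure, before any of the infrastructure costs mentioned above are counted.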
Fine. Get the cartridges, but what about the capital cost minus depreciation of the drive? What about random access?
Now weigh those against an inexpensive jbod frame with a 2gb FC backplane. What's the write speed of LT vs a tasty little GB SAS drive? Rackspace? You can put a dozen into about 4U. Cooling? Although I'll grant you green cost, the random accessibility out-classes the seek time and tape insertion by a human cost dramatically. Stable media? Tape? Sometimes. Shelf space?
SAN fabric is dirt these days. You can get a nice Silkworm and a cheap-but-reliable SAS backplane for dirt as well. Perhaps a couple of GBICs.... or some handy-dandy fiber cables (also dirt these days) and you're in business. Or, put up a 10dot network off your public-face grid, and just use iSCSI. No need to use tape anymore. Get out of the reality distortion field, but do the right thing by testing what you have and doing drills to ensure that whatever you have, works and is a procedure understood by all.
---- Teach Peace. It's Cheaper Than War.
1. Backup all your data
2. Test your backups
3. Backup frequently
4. Test your backups
5. Take some backups off-site
6. Test your backups
7. Keep some old backups
8. Test your backups
9. Secure your backups
10. Test your backups
11. Perform integrity checking
12. Test your backups
Every company I've worked at has had a backup plan. Exactly zero have had a recovery plan.
OS X is not BSD in the same way that Ubuntu is not Debian (and to a much greater extent, might I add). Mull over that one for a while.
I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
So let me get this straight... Journalspace.com was smart enough to have someone there setup a robots.txt file, but nobody there asked if anything was being backed up to a tape/external drive/DVD/CD/Floppy Disk/Cocktail Napkin?
I'm just glad I've never used Journalspace.
"There is no rational justification for tape anymore, what with the cost per TB stored on hard disks..."
Pardon? Once you buy the tape library (which admittedly can be pricey), the cost per TB is a hell of a lot less with tape. Check out LTO4 for an example of a high capacity, high performance tape system. (Also, keep in mind that your high-end disk systems also drag along a big chunk of change in infrastructure before you can plug in disks, so the initial cost of a tape library is not such a straight win for disk systems either.)
The monthly cost of spinning disk is also considerable in terms of power, real estate and cooling. And with the costs of disk x 2 for HA, you can get better value for your money with other stuff, *depending* on your requirements.
And before someone else brings it up, data deduplication can work just as well for tape as it does for disk. The next version of IBM's Tivoli Storage Manager (TSM) will have data de-duping built in, and it's due out in Q1 2009. (Let the "brand X de-duping is better than brand Y de-duping" wars begin!)
The moral of the story about disk vs tape vs software is that one solution does NOT fit all situations. Simple to implement doesn't mean cost-effective or even rational. Unfortunately, a good DR plan still requires people to think about disaster scenarios end-to-end and be focused on the business requirements rather than on one narrow definition of "good."
"There is no silver bullet."
So let's get this straight: rather than use "old fashioned" tape to produce offsite backups, you use...consumer grade hard drives to produce an onsite backup?
Do you and the admin at Journalspace.com share tips, by any chance?
That's not my company's policy, that's *my* policy. I can take a 3-month hit to my personal data. AND YET MY LAX PERSONAL POLICY WOULD HAVE SAVED JOURNALSPACE.
My *company's* policy is daily offsiting. Expensive, but very many of our locations could become a smoking hole in the ground and we'd still be able to restore and operate.
Fine. Get the cartridges, but what about the capital cost minus depreciation of the drive? What about random access?
Random access is why snapshots also have their place. :) Archival backups and nearline backups solve different sets of problems.
Now weigh those against an inexpensive jbod frame with a 2gb FC backplane.
What kind of capacity are we talking? For a small site you can pick up a little 2U unit that'll store 6.4TB uncompressed for under $5k. Or if you're running a larger site you can snag a 4U unit with two drives for about $15k that'll handle 30.4TB with optional expansion to 60.8TB native.
What's the write speed of LT vs a tasty little GB SAS drive?
120MB/sec per drive without compression. And now that you're talking about SAS drives, your per-TB cost is hopelessly optimistic. Even OEM packaged terabyte SAS drives are going to run you about a quarter a gigabyte, which is now four times the media cost of an LTO-4 solution.
Rackspace? You can put a dozen into about 4U.
So about 12TB in 4U compared to the 30TB unit I mention above.
Cooling? Although I'll grant you green cost, the random accessibility out-classes the seek time and tape insertion by a human cost dramatically.
Have you never heard of a tape library?
Stable media? Tape? Sometimes.
Properly handled tape is incredibly stable.
Shelf space?
If you're doing off-site storage, that's going to be an issue regardless of what media you're using. And as I pointed out, tape is far more compact and far lighter than disks.
No need to use tape anymore. Get out of the reality distortion field, but do the right thing by testing what you have and doing drills to ensure that whatever you have, works and is a procedure understood by all.
I'm not the one dismissing an entire class of technology while demonstrating ignorance of its costs and benefits.
I'm not sure what planet you're on, but I wish the rest of us were there with you.
Backup media should be and must be transported offsite every freakin day. You'd do that with a hard disk? Or more correctly, you'd do that with a STACK of hard disks? Or is your building fire, flood (including broken sprinkler pipes), gas leak, and drunken-truck-driver proof?
help me i've cloned myself and can't remember which one I am
Can you restore a RAID with different hardware? With LTO3 tape I have several drive choices. Notably, I can buy a NEW drive and know the tape will work even 3-4 years out. What happens when the maker of your RAID solution moves on and wants to send you next year's model? Will the encryption and striping still line up on different hardware made by a different company?
You'll even be able to archive that LTO3 tape off-site for years and know that if you ever need to, you'll be able to read it in your new LTO5 drive.
With a SCSI or SAS hard drive you'll be lucky to even have the correct adaptors to be able to plug the sodding thing into your controller and power it up after two or three years...
Anybody who uses disk based backup for a while finds out that it needs to be augmented by tape sooner or later. A disk based subsystem gets full pretty soon once you get used to the convenience. If you continue to buy disk drives it gets really expensive so people find that the best of both worlds is D2D2T. This reduces the size and speed of the tape subsystem you need, but doesn't make it obsolete. You need offsite storage anyway.
thegodmovie.com - watch it
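To make the D2D2T idea concrete, here is a toy retention sketch: every backup lands on disk first, gets copied to tape for off-site storage, and the disk copy is pruned long before the tape copy expires. The 7-day disk window is an assumption picked for illustration; the 21-day tape window just borrows the figure mentioned earlier in the thread. The function name is hypothetical.

```python
import datetime

def d2d2t_locations(backup_date, today, disk_days=7, tape_days=21):
    """Return where a backup taken on `backup_date` should still exist on `today`.

    Assumed policy: disk copies are kept `disk_days` for fast restores,
    tape copies are kept `tape_days` off-site.
    """
    age = (today - backup_date).days
    if age < 0:
        raise ValueError("backup_date is in the future")
    where = []
    if age < disk_days:
        where.append("disk")   # recent: quick random-access restore
    if age < tape_days:
        where.append("tape")   # off-site copy, slower to get back
    return where

today = datetime.date(2009, 1, 10)
for taken in (datetime.date(2009, 1, 8), datetime.date(2009, 1, 1), datetime.date(2008, 12, 1)):
    print(taken.isoformat(), d2d2t_locations(taken, today))
```

The disk tier only ever holds about a week of data in this sketch, which is why it stays affordable, and the tape drive can stream from disk at its own pace instead of being sized for the backup window.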
BSD is no longer BSD either. You need to pick your flavour, whichever one suits your poison.
Skot Nelson music is my saviour / i was maimed by rock and roll
And itsy-bitsy data sets apparently. But in a world where one's data is measured in tens of terabytes rather than hundreds of gigabytes, tape is still king.
Again with the power requirements! To keep even a month or a year of history would require a massive amount of extra hardware and power, not to mention people to tend it. I'd agree it's super good at recovery, but how much more value are you getting versus tapes in a safety deposit box?
Not too bad of an idea; at least there is duplicate data at a different location, which is better than what these guys are proposing.
The ideal thing is to have the duplicated data be MILES from the main DC site, but it's not always practical when you have a large volume of data to replicate and back up. Yes, I know all about rsync and smart data replication, but there's still nothing like good old-fashioned complete full backups at the expense of time. That always seems to work well for most people.
Your setup sounds *completely* vulnerable to a single malicious employee (with the right passwords). Typical engineer: protect against multiple failure modes, and disregard malice.
You don't have a backup until you separate the data from the ability to destroy that data. Of course, there is WORM disk storage that meets that need, but that's far more expensive than tape (though handy for meeting auditing requirements).
Socialism: a lie told by totalitarians and believed by fools.
For everything to be just gone and I mean LONG gone, then something besides a truncation or un-linking of the file had to occur.
Now I don't know all that much about the apple file system, but I would imagine it is like most file systems in that it links clusters and sectors of data together using some sort of allocation table, hash, b-tree or something.
Now unless they had file scrubbing turned on and the OS purposefully went out and overwrote every segment of the file with 01010101 and 10101010 then the vast majority of the data should still be there, at least I would think it would be. I mean even the nastiest revenge oriented guy, would have to be able to invoke some kind of program to do that.
I am assuming that it was an SQL database of some flavor. I don't know much about MySQL internals but I am pretty sure a
delete from table
simply goes through the index and marks pages deleted and does not physically go out and scrub every page that has data on it. I know that is how Oracle works.
So this leaves me wondering about the data recovery house.... If they were doing a sector-by-sector read on the entire drive (either of them) they should still see all sorts of data on the disk. Now I don't know if the database compresses data on the fly (some do, some don't) and I don't know if drive compression is an option on OS-X. If so, I can see where they would see mostly large amounts of compressed data (making things VERY difficult if not impossible to recover). But barring that, most OSes have the hooks built in to simply do a sector-by-sector read of the storage device, and although your binary data (images and the like) might be unrecoverable, you could probably get most if not all of the text.
Just a thought, but hey I might be crazy, it is just the hacker in me that brings these things to mind...
Hey KID! Yeah you, get the fuck off my lawn!
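The "read every sector and pull out the text" idea above is basically what the Unix `strings` tool does. A minimal sketch, assuming you already have the raw bytes of a disk image in hand and that the text is plain uncompressed ASCII (the function name and the 8-character threshold are arbitrary choices for illustration):

```python
import re

def extract_text_runs(raw, min_len=8):
    """Return runs of at least `min_len` printable ASCII bytes found in `raw`.

    This ignores the filesystem and database structure entirely, so it only
    helps if the data was not compressed, encrypted, or overwritten.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(raw)]

# Fake "sectors": binary junk with one readable fragment in the middle.
image = b"\x00\x01" + b"Dear diary, today I lost my blog." + b"\xff\xfe" + b"abc" + b"\x00"
print(extract_text_runs(image))
```

On a real drive you would read the device or an image file in chunks rather than load it whole, and you would get fragments in disk order, not tidy posts. It says nothing about whether it would have worked in this case.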
I see your point, but something about this does not pass the smell test.
To have nothing on the HD(s), someone had to very, very carefully wipe the entire disk by overwriting every block and sector that the data occupied. That would have made whatever DB system shit its pants as it started seeing data disappear, so it would have been really obvious, really fast, that something was amiss and, as you relate, would more than likely have caused a kernel panic or at least a core dump of the DB system.
Hey KID! Yeah you, get the fuck off my lawn!
<sigh...>
Sometimes boldness is in fashion. Sometimes only the brave will be bold.
Agreed. Tape makes a lot of sense for high-volume applications.
It's just being crowded out of the low-end market by ever larger and ever cheaper hard drive sizes. Tape costs would have to drop by about a factor of 4 (or more) to compete in the lower end of the market where 100 tapes is a lot.
(If I could back up 800GB for $10, that would be much more of a no-brainer decision. The cost advantage would be high enough to pay for the expensive tape drive. And $50 LTO-4 tapes are a lot better than back when a lot of large-capacity tapes cost $100 each.)
Wolde you bothe eate your cake, and have your cake?
"Either can do the job so one is a backup..."
Which one is the backup?
The whole point of a backup is that it is *stable*. Neither copy is stable, so there is no "backup on the hardware level". There are two active systems.
If you cannot restore an accidentally-deleted file from it, it's not a backup.
It is a serious mistake to use the term "backup" in relation to a RAID 1 array. There is only one correct way you can do that: "either disk can serve as a backup for the other, should its media fail".
Either disk can serve as a backup for the other *drive*. However, there is no backup copy of the data. It is *not* a backup solution. There is no backup.
There is, however, fault-tolerance. A media fault can be tolerated. But if the active copy of the data is corrupted, there is no backup.
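A toy model makes the distinction above hard to miss. This is not how any real RAID controller works; it is a made-up `Mirror` class where each "disk" is just a dictionary, enough to show that a media fault is tolerated while a delete is faithfully mirrored.

```python
import copy

class Mirror:
    """Toy two-disk mirror: each disk is a dict of filename -> contents."""

    def __init__(self):
        self.disks = [{}, {}]

    def _live(self):
        # Disks that have not suffered a media failure.
        return [d for d in self.disks if d is not None]

    def write(self, name, data):
        for disk in self._live():
            disk[name] = data

    def delete(self, name):
        # A delete (accidental or malicious) is mirrored just like a write.
        for disk in self._live():
            disk.pop(name, None)

    def fail_disk(self, index):
        self.disks[index] = None

    def read(self, name):
        live = self._live()
        if not live:
            raise IOError("both disks have failed")
        return live[0].get(name)

    def snapshot(self):
        # A point-in-time copy that later writes and deletes cannot touch.
        return copy.deepcopy(self._live()[0])

mirror = Mirror()
mirror.write("posts.db", "all the blog posts")
backup = mirror.snapshot()

mirror.fail_disk(0)
after_media_fault = mirror.read("posts.db")  # still readable: fault-tolerance

mirror.delete("posts.db")
after_delete = mirror.read("posts.db")       # None: the mirror copied the delete
from_backup = backup.get("posts.db")         # the point-in-time copy still has it
```

Only the snapshot passes the "restore an accidentally-deleted file" test, which is the whole argument in one line.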
SAN fabric is dirt these days. You can get a nice Silkworm and a cheap-but-reliable SAS backplane for dirt as well. Perhaps a couple of GBICs.... or some handy-dandy fiber cables (also dirt these days) and you're in business.
Out of curiosity, do you have any idea what you're talking about? Fiber or GBICs? You're gonna need both...
Also, SAN and LTO are hardly mutually exclusive. Plenty of tape libraries are SAN-attached.