Why Mirroring Is Not a Backup Solution
Craig writes "Journalspace.com has fallen and can't get up. The post on their site describes how their entire database was overwritten through either some inconceivable OS or application bug, or more likely a malicious act. Regardless of how the data was lost, their undoing appears to have been that they treated drive mirroring as a backup and have now paid the ultimate price for not having point-in-time backups of the data that was their business." The site had been in business since 2002 and had an Alexa page rank of 106,881. Quantcast said they had 14,000 monthly visitors recently. No word on how many thousands of bloggers' entire output has evaporated.
That is one reason why mirroring isn't a backup, and why backups should ideally be off-line.
If I have nothing to hide, don't search me
We do data hosting, and I can't imagine how catastrophic that would be. Jebus. Let this be an ultimate example of why numerous backups are needed. Always. Without question.
Mirroring is inexpensive protection against total hard disk failure, and it is effective at that. It can't help when software goes rogue or a user deletes the wrong files.
The rules of backups:
1. Back up all your data
2. Back up frequently
3. Take some backups off-site
4. Keep some old backups
5. Test your backups
6. Secure your backups
7. Perform integrity checking
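Rules 5 and 7 are the ones most often skipped, so here is a minimal sketch of integrity checking in Python. It assumes a backup is just a directory of files; the function names (`file_sha256`, `build_manifest`, `verify_manifest`) are made up for illustration and don't come from any particular backup tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 at backup time."""
    return {
        str(p.relative_to(backup_dir)): file_sha256(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = backup_dir / rel_path
        if not target.is_file() or file_sha256(target) != expected:
            problems.append(rel_path)
    return problems
```

Store the manifest somewhere other than the backup itself and re-run `verify_manifest` on a schedule. A non-empty result means the backup can no longer be trusted as-is. This only detects damage; it does not replace an actual test restore.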
It's more an issue that some people think HA == DR, which, as this story reminds us, is not the same thing.
Mirroring / RAID == HA. If one of your HDDs lets the smoke out, you still don't incur downtime. If you have a hot spare, you're even better off. All it does is give you a little time to correct the issue (i.e. "It can wait until morning").
Also, one other very important thing: mirroring doesn't prevent or repair data corruption. If you're mirroring your rm -rf (as pointed out by Corsec67 below), your RAID will happily do what it does and apply your command to all your disks. Congrats, you just successfully gave yourself HA for your disk erasing! :]
Backups are DR. If your RAID croaks, you're SOL if you don't have off-machine backups. If you accidentally nuke your disks with an rm or something, you can still go back and restore data. Sure, you'll likely lose -some- data, but -some- is better than all in this case.
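For anyone who wants the HA vs. DR difference spelled out, here is a toy Python model. It is purely illustrative: `MirroredVolume` is a made-up class where each "disk" is just a dict, and it is not how a real RAID controller works.

```python
from typing import Optional


class MirroredVolume:
    """Toy RAID-1 model: every operation is applied to all working 'disks'."""

    def __init__(self) -> None:
        # Two mirrored disks; a failed disk is represented by None.
        self.disks: list[Optional[dict[str, str]]] = [{}, {}]

    def _live_disks(self) -> list[dict[str, str]]:
        return [disk for disk in self.disks if disk is not None]

    def write(self, name: str, data: str) -> None:
        for disk in self._live_disks():
            disk[name] = data

    def delete_all(self) -> None:
        # The mirror faithfully replicates destructive commands too.
        for disk in self._live_disks():
            disk.clear()

    def fail_disk(self, index: int) -> None:
        # Simulate one drive dying; the surviving copy keeps serving reads.
        self.disks[index] = None

    def read(self, name: str) -> Optional[str]:
        for disk in self._live_disks():
            if name in disk:
                return disk[name]
        return None

    def snapshot(self) -> dict[str, str]:
        """Point-in-time copy, kept somewhere the volume cannot touch."""
        live = self._live_disks()
        return dict(live[0]) if live else {}
```

Write a file, take a `snapshot()`, then call `fail_disk(0)`: `read()` still works, which is the HA part. Call `delete_all()` and `read()` returns nothing from any disk. Only the earlier snapshot still has the data, which is the DR part.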
----- The internet has given everyone the ability to have their voice heard equally as loud.. even if they shouldn't be
Even the greenest IT employee knows that mirroring is to protect against hard drive failure and not software corruption.
I only wish that were true. I've given up arguing with friends about this, who insist that their mirrors are good enough backups. I just stare at colleagues who think such, especially those who SHOULD know better. And I *know* coworkers are doing this @ work, too, and I'm just waiting for about 50TB of data to suddenly go missing...
"The urge to save humanity is almost always a false front for the urge to rule." --H.L. Mencken
They also purposely blocked archive.org via a robots.txt exclusion, so the bloggers can't use that to try and recover some of their blogs.
Looks like at least some content is still in Google's cache; those looking to salvage their journals should act quickly.
You can limit Google's search results to a particular site by using the "site:domainname.com" search term, then click the "Cached" link of each result to see Google's copy.
There's also a Greasemonkey script for Firefox that can automatically add Google Cache links next to page links, so you can navigate from one cached page to another more easily.
Backups must be:
1) Automated - if you need human intervention, it will fail
2) Point-in-time - the system must be able to provide restores for a set of times, appropriate to the turnaround on your data. A good default is: daily backups for a week, weekly for a month, and monthly for a year
3) TESTED: You must fully test the restoration process (if this can be automated, even better). Backups that you can't restore from a bare machine are worthless.
For better disaster recovery, backups should be:
4) offsite - if a fire or tornado hits, is the backup somewhere else?
5) easily accessible - how long will it take to get the restore going?
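The "daily for a week, weekly for a month, monthly for a year" default in point 2 can be sketched as a pruning rule. This is a rough Python version under my own assumptions: the windows are 7, 31 and 365 days, and the backup kept for each ISO week or calendar month is the newest one in it. `backups_to_keep` is a hypothetical helper, not part of any backup product.

```python
from datetime import date


def backups_to_keep(backup_dates: list[date], today: date) -> set[date]:
    """Pick which dated backups survive pruning.

    Assumed policy: every backup from the last 7 days, the newest backup of
    each ISO week within 31 days, and the newest backup of each calendar
    month within 365 days.
    """
    keep: set[date] = set()
    newest_per_week: dict[tuple[int, int], date] = {}
    newest_per_month: dict[tuple[int, int], date] = {}

    # Iterating oldest to newest lets later dates overwrite earlier ones.
    for day in sorted(backup_dates):
        age = (today - day).days
        if age < 0:
            continue  # ignore backups dated in the future
        if age < 7:
            keep.add(day)
        if age < 31:
            iso = day.isocalendar()
            newest_per_week[(iso[0], iso[1])] = day
        if age < 365:
            newest_per_month[(day.year, day.month)] = day

    keep.update(newest_per_week.values())
    keep.update(newest_per_month.values())
    return keep
```

Anything not in the returned set is a candidate for deletion. Running this from the same automated job that takes the backups keeps point 1 intact, since nobody has to remember to prune by hand.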
Not quite. Backing up a live database can be a bit tricky. By the time you finish copying part of the database, the first bit can change again. So you have to create a snapshot of some kind. And that has to be supported in the database setup (at the application or server level) in order for the backup to be in a consistent state. And you don't want your backup process to degrade site performance, either. So a simple file copy is totally inadequate.
A common solution is replication. Backup is then performed by creating a replication point on the slave database machine, then taking a snapshot and copying that while the master database machine continues serving at full speed. Replication can then catch up when the backup is complete. Another advantage of replication is duplication at the machine level -- if the master fails, go live on the slave with minimal to no downtime. Set both machines up in a master-master configuration and you can swap back and forth as needed, allowing live maintenance and backup with no performance degradation.
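To make the "consistent snapshot instead of a plain file copy" point concrete on a small scale, here is a sketch using SQLite from Python's standard library. SQLite is only a stand-in here: this is not the master/slave setup described above, and `snapshot_database` is a made-up helper name. `Connection.backup` itself is a real method (Python 3.7+).

```python
import sqlite3


def snapshot_database(live: sqlite3.Connection, backup_path: str) -> None:
    """Copy a point-in-time image of a live SQLite database to a file."""
    target = sqlite3.connect(backup_path)
    try:
        # Connection.backup goes through SQLite's online backup API rather
        # than copying the database file byte by byte behind its back.
        live.backup(target)
    finally:
        target.close()
```

Rows committed after `snapshot_database` returns do not show up in the copy, which is the property you want from a point-in-time backup. With a bigger database server the same idea usually means a dump tool or a paused replica rather than this call.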
Be relentless!
*BZZZZT*
The GP was 100% correct. If you had kept reading, you'd see that the suggestion was to use replication so you can lock the DB into a consistent state while backing up. When the backup is done, the box starts replicating again. If you didn't have the backup box, you'd have to lock the production database while your backup was going on.
He was suggesting replication purely as a way to avoid having to pause the application during backup, not as the backup itself.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
ACID-compliant databases use a log, much like a filesystem journal, that contains all the changes made to the database before those changes are actually written out to the main database storage. When you back up the raw database, you also back up all the logs from at least the time you started backing up the raw files until the time the backup finished, and when you need to restore the database you put the raw data back and then let the database replay the logs.
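A toy Python version of that restore procedure, just to show the mechanics. `LoggedStore` and `restore` are invented for this sketch, and the "log" is a plain list; a real database's log format and recovery code are far more involved.

```python
class LoggedStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.log: list[tuple[int, str, str]] = []  # (sequence, key, value)
        self.next_seq = 0

    def put(self, key: str, value: str) -> None:
        # Write-ahead: record the change in the log first, then apply it.
        self.log.append((self.next_seq, key, value))
        self.next_seq += 1
        self.data[key] = value

    def base_backup(self) -> tuple[int, dict[str, str]]:
        """Return the log position where the copy started, plus the raw data."""
        return self.next_seq, dict(self.data)


def restore(base: dict[str, str], start_seq: int,
            log: list[tuple[int, str, str]]) -> dict[str, str]:
    """Put the raw data back, then replay every log entry from start_seq on."""
    restored = dict(base)
    for seq, key, value in log:
        if seq >= start_seq:
            restored[key] = value
    return restored
```

Take a `base_backup()`, keep archiving the log, and `restore()` brings the copy forward to the state of the last archived entry. Replaying a `put` that the base copy already contains is harmless in this toy, which is roughly why a raw copy taken while writes continue can still be recovered.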