Why Mirroring Is Not a Backup Solution
Craig writes "Journalspace.com has fallen and can't get up. The post on their site describes how their entire database was overwritten through either some inconceivable OS or application bug, or more likely a malicious act. Regardless of how the data was lost, their undoing appears to have been that they treated drive mirroring as a backup and have now paid the ultimate price for not having point-in-time backups of the data that was their business." The site had been in business since 2002 and had an Alexa page rank of 106,881. Quantcast said they had 14,000 monthly visitors recently. No word on how many thousands of bloggers' entire output has evaporated.
DUH!
While this mirrors previous comments, it's not really a backup solution.
Who was the IT person there? It's been obvious for years and years that mirroring is a crap solution for backup! I.D.I.O.T.S!
Mirroring, RAID, grid, whatever. At some point, you want your data safe and secure on something not physically attached to any power source.
Slashdot is stunned into silence.
And that's why your IT department actually needs funding. Sleep tight.
That is one reason why mirroring isn't a backup, and why backups should ideally be off-line.
If I have nothing to hide, don't search me
We do data hosting, and I can't imagine how catastrophic that would be. Jebus. Let this be an ultimate example of why numerous backups are needed. Always. Without question.
Excellent! We can use their demise as yet another cautionary tale.
It is inexpensive protection against total hard disk failure, and it is effective at that. It can't help when software goes rogue or a user deletes the wrong files.
It's really unfortunate that this happened. If they had simply had a backup snapshot of the DB they could have restored it. RAID only saves you from disk failures. It doesn't work on OS/user failures.
Unfortunately this is the kind of thing you tend to learn from experience (either yours or someone else's). It's very easy to think "RAID 1 = disks are safe".
Just like a database cluster wouldn't have saved them. A clustered database can help you with load, or let you swap servers if a disk goes bad. But when someone issues "DELETE FROM ..." the other cluster nodes happily run the same statement, and now you have 2 (or 3 or 10 or...) empty database boxes.
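To make the cluster point concrete, here is a minimal sketch. It uses in-memory SQLite databases as stand-ins for cluster nodes, which is purely an assumption for illustration (nothing in the thread says what database journalspace ran). The `replicate` helper and the `posts` table are hypothetical names. Every node runs the same destructive statement, while a snapshot taken beforehand is untouched.

```python
import sqlite3


def make_node() -> sqlite3.Connection:
    """Create one in-memory database node with a small posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO posts (body) VALUES (?)",
                     [("first",), ("second",), ("third",)])
    conn.commit()
    return conn


def count_posts(conn: sqlite3.Connection) -> int:
    """Return the number of rows in the posts table."""
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


def replicate(nodes, statement: str) -> None:
    """Toy replication: every node faithfully runs the same statement."""
    for node in nodes:
        node.execute(statement)
        node.commit()


# Three "cluster" nodes holding identical data.
cluster = [make_node() for _ in range(3)]

# A point-in-time snapshot, taken before anything goes wrong.
snapshot = sqlite3.connect(":memory:")
cluster[0].backup(snapshot)

# The destructive statement reaches every node, exactly as designed.
replicate(cluster, "DELETE FROM posts")

node_counts = [count_posts(node) for node in cluster]
snapshot_count = count_posts(snapshot)
print("rows on each cluster node:", node_counts)
print("rows in the snapshot:", snapshot_count)
```

The replication did its job perfectly, and that is the problem: only the copy that was deliberately left out of the replication path still has the data.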
I hope those bloggers had a backup of some sort of their own.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Sad, very sad.
Even the greenest IT employee knows that mirroring is to protect against hard drive failure and not software corruption. Obviously someone felt they knew better than people that actually know better, or someone didn't consult the right people. This is the end result. Tape, USB keys, disc backup... there are so many debatable methods of backing up that there's no excuse for this one.
That's all I can say about this. I'm really surprised that, with all the users they had, they are so quick to say "everything is gone and we're giving up" instead of just starting over and maybe implementing a protocol that would make sure this doesn't happen again.
Ave Molech Setting
It's bizarre that anyone would ever, under any circumstances, consider a "mirror" to be a backup. Mirrors automatically replicate errors, including the human variety.
Point in time snapshots might be a sort of lazy man's backup, but even then, consider the possibility of fire or disaster, and not having some sort of second location is just plain foolhardy.
C//
Mirroring is more of a safeguard for hardware failure. It does not replace backup. Both serve entirely different purposes.
If they had at least some online rsync'ed backups from at least a day/week/month/etc., that would be a backup.
The site *didn't* have any backups, and they pay the price.
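The day/week/month idea above amounts to a retention policy: keep recent copies densely and older ones sparsely. Below is a rough sketch of one way to choose which dated backups to keep. The function name `backups_to_keep` and the default counts are assumptions for illustration, not something any poster described.

```python
from datetime import date, timedelta


def backups_to_keep(dates, daily=7, weekly=4, monthly=6):
    """Return the subset of backup dates to retain.

    Keeps the newest `daily` backups, plus the newest backup in each of
    the most recent `weekly` ISO weeks and `monthly` calendar months.
    """
    ordered = sorted(set(dates), reverse=True)  # newest first
    keep = set(ordered[:daily])

    def newest_per_bucket(key, limit):
        seen = []
        for d in ordered:
            bucket = key(d)
            if bucket not in seen:
                if len(seen) == limit:
                    break  # older buckets fall outside the window
                seen.append(bucket)
                keep.add(d)  # first hit in a bucket is its newest backup

    newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)
    newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return sorted(keep)


# Example: 120 nightly backups ending on 2 January 2009.
last_night = date(2009, 1, 2)
nightly = [last_night - timedelta(days=i) for i in range(120)]
kept = backups_to_keep(nightly)
print(len(nightly), "backups on disk,", len(kept), "retained")
```

Anything not in the returned list is a candidate for deletion. The point is that an overwrite discovered a month late can still be rolled back, which a mirror can never offer.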
S
This is fascinating and altogether newsworthy. I had never before thought of this. I am very pleased, indeed, that kdawson engaged his most finely-honed editorial faculties to post this article to the front page, as it is not only stunning and fascinating in substance but also rather eloquently written.
With the proliferation of snapshot technology (for instance, Parallels on the Mac can snapshot the VM and restore any snapshot), plus the ability to back the VM up to a USB2 disk that you can just plug in, then pull and put in a safe place, having point-in-time backups of servers has become so trivial and easy that it is unbelievable that anybody would run a business without it.
A leading SAN/NAS vendor (no commercials here) has a solution for SMB that includes a disk shelf and controller integrated together with 4TB of storage for about $14k. Maybe closer to $20k if you license the Linux and Windows backup agents to the SAN. Really - ANYBODY can do this.
If you're really THAT hard up for operating cash, get a 2U server with an integrated tape drive at the very very very least.
I do not think it means what you think it means.
Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
Mirroring: High availability
Backups: High reliability
The rules of backups:
1. Back up all your data
2. Back up frequently
3. Take some backups off-site
4. Keep some old backups
5. Test your backups
6. Secure your backups
7. Perform integrity checking
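Rule 7 is the one most often skipped, so here is a small sketch of what integrity checking can look like: record a SHA-256 checksum for every file when the backup is written, then recompute and compare later. The helper names (`build_manifest`, `verify_manifest`) and the demo file are hypothetical; real backup tools usually have their own verification step.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict:
    """Map each file's relative path to its checksum at backup time."""
    return {str(p.relative_to(backup_dir)): sha256_of(p)
            for p in sorted(backup_dir.rglob("*")) if p.is_file()}


def verify_manifest(backup_dir: Path, manifest: dict) -> list:
    """Return the names that are missing or whose contents changed."""
    problems = []
    for name, expected in manifest.items():
        target = backup_dir / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    backup_dir = Path(tmp)
    (backup_dir / "db.sql").write_text("INSERT INTO posts VALUES (1);\n")
    manifest = build_manifest(backup_dir)
    clean = verify_manifest(backup_dir, manifest)

    (backup_dir / "db.sql").write_text("garbage\n")  # simulate corruption
    damaged = verify_manifest(backup_dir, manifest)

print("problems right after backup:", clean)
print("problems after corruption:", damaged)
```

The manifest should live somewhere other than the backup media itself, otherwise whatever damages the backup can damage the checksums too.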
It's more an issue that some people think that HA == DR, and this story reminds us that they are not the same thing.
Mirroring / RAID == HA. If one of your HDDs lets the smoke out, you still don't incur downtime. If you have a hot spare, you're even better off. All it does is give you a little time to correct the issue (i.e., "It can wait until morning").
Also, one other very important thing: mirroring doesn't prevent or repair data corruption. If you're mirroring your rm -rf (as pointed out by Corsec67 below), your RAID will happily do what it does and propagate your command to all your disks. Congrats, you just successfully gave yourself HA for your disk erasing! :]
Backups are DR. If your RAID croaks, you're SOL if you don't have off-machine backups. If you accidentally nuke your disks with an rm or something, you can still go back and restore data. Sure, you'll likely lose -some- data, but -some- is better than all in this case.
----- The internet has given everyone the ability to have their voice heard equally as loud.. even if they shouldn't be
Maybe I could understand that there might be issues with backing up live databases, and they didn't want to deal with it. Still not an excuse.
BUT, according to the site "the server which held the journalspace data had two large drives in a RAID configuration". Only TWO drives.
All they had to do was pull one of the drives, replace it, and lock up the original off site. In a couple of hours the drives would have been mirrored again.
Important note: don't hire the IT dude with Journalspace.com on his resume.
This morning I felt a great disturbance in the Blogosphere, as if tens of voices suddenly cried out in terror.
One thing that I've been switching to recently has been backing up not just the disk or data but creating a full virtual machine backup of the server. Space wise this can be a big hit so incremental data backups are done daily, with a full VM hit once a month alongside the full data dump. Now I'm shifting to doing a daily VM in addition which gives me the last close of play.
The reason for this is restore time. If it takes a few days to restore then it's a right pain (or for some companies fatal), but a VM restore I can fire up on temporary kit in an hour or less and give a degraded service while we patch up the full servers.
An Eye for an Eye will make the whole world blind - Gandhi
No doubt this incident is the admin's fault. He confused mirroring with backup and carried on with the mistake until it was too late, as pointed out in other comments.
Now what about the user's angle? The moral is that you can never assume your data is safer when it's "in the cloud". If you value your blog and your readers, you *should* save a copy of your work as well as the readers' info, *locally*, somewhere you have control over.
There's no place like $HOME.
Colorless green Cthulhu waits dreaming furiously.
Really hard to imagine something like this, especially when database backups are SO EASY. mysqldump, gzip, scp to a different machine or upload to S3. gzip'ed database dumps take up so little space!
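The dump-and-compress step described above can be sketched in a few lines. This is an illustration under assumptions, not a drop-in tool: `dump_to_gzip` is a hypothetical helper, and the demo uses a stand-in command that prints one SQL line so it runs without a database server. On a real MySQL host the command would be something like `["mysqldump", "--single-transaction", "--all-databases"]` plus credentials. The copy to another machine (scp, S3) is left out.

```python
import gzip
import subprocess
import sys
import tempfile
from datetime import datetime
from pathlib import Path


def dump_to_gzip(dump_cmd, dest_dir: Path, now: datetime) -> Path:
    """Run dump_cmd, gzip its stdout into a timestamped file, return the path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"db-{now:%Y%m%d-%H%M%S}.sql.gz"
    # check=True raises if the dump command fails, so a broken dump
    # never silently becomes "last night's backup".
    result = subprocess.run(dump_cmd, check=True, capture_output=True)
    with gzip.open(target, "wb") as out:
        out.write(result.stdout)
    return target


# Stand-in for a real dump command (assumption for the demo only).
stand_in = [sys.executable, "-c", "print('CREATE TABLE posts (id INT);')"]

with tempfile.TemporaryDirectory() as tmp:
    path = dump_to_gzip(stand_in, Path(tmp), datetime(2009, 1, 2, 3, 4, 5))
    archive_name = path.name
    with gzip.open(path, "rt") as fh:
        restored_text = fh.read()

print(archive_name)
```

Capturing the whole dump in memory keeps the sketch short; for a large database you would stream the command's output into the gzip file instead. The timestamp in the file name is what turns this from a single overwritten copy into a series of point-in-time backups.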
This is so ignorant that it must be a hoax.
Perhaps the people at Google can help out?
From TFL:
The data server had only one purpose: maintaining the journalspace database. There were no other web sites or processes running on the server, and it would be impossible for a software bug in journalspace to overwrite the drives, sector by sector.
The list of potential causes for this disaster is a short one. It includes a catastrophic failure by the operating system (OS X Server, in case you're interested), or a deliberate effort. A disgruntled member of the Lagomorphics team sabotaged some key servers several months ago after he was caught stealing from the company; as awful as the thought is, we can't rule out the possibility of additional sabotage.
First, it's somewhat lame/unprofessional to list "sabotage" as a possibility. Even if it's the strongest possibility. Adding the OSX comment and that a bug in their code is impossible is even lamer.
More importantly, if the key servers were sabotaged months ago, the first thing that I'd want to do is make a full image stored in multiple offsite locations. Ignorance of the RAID/backup issue is one thing, but knowing that the saboteur could have sprinkled the db with crap is even scarier.
Smells like there's more to the story than this. Or not.
They also purposely blocked archive.org via a robots.txt exclusion, so the bloggers can't use that to try and recover some of their blogs.
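For anyone unsure how a robots.txt exclusion shuts out one crawler while leaving others alone, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content below is hypothetical (the real file is not quoted anywhere in this thread), and it assumes the archive's crawler identifies itself as `ia_archiver`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the site's actual file is not quoted in this thread.
hypothetical_robots = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(hypothetical_robots.splitlines())

url = "http://journalspace.com/"
archive_allowed = parser.can_fetch("ia_archiver", url)
others_allowed = parser.can_fetch("Googlebot", url)

print("archive crawler may fetch:", archive_allowed)
print("other crawlers may fetch:", others_allowed)
```

A rule like this would explain why Google's cache still holds pages while the archive shows only a "blocked by the site owner" notice.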
In today's world, where primary storage and protection storage are well-defined, and where an entire industry grew around them (examples: NetApp, Data Domain), one is hard-pressed to understand the reason for such a debacle. The note referred to in the article leads me to believe, unfortunately, that Journalspace's IT department did not understand the difference.
It is sometimes considered a bad form to say something bad about fellow techies. We prefer to look for 'outside' causes. Still, to learn and avoid the same problems in the future, one has to admit his mistakes first. This paragraph from the Journalspace's page:
The value of such a setup is that if one drive fails, the server keeps running, using the remaining drive. Since the remaining drive has a copy of the data on the other drive, the data is intact. The administrator simply replaces the drive that's gone bad, and the server is back to operating with two redundant drives.
makes me believe there is a denial going on.
End anonymous moderation and posting on
You pay your infrastructure people to maintain business continuity. I mean, the title of this post made me go, "Really, no shit." That's like systems admin 101! If the admin was aware, then the manager that didn't listen needs to be fired. If the manager listened and they are just run by idiots, then they got what they deserve. You'd think 17,000 visitors a month would be worth enough to do it right, in ad revenue alone.
The cost of a consumer machine running Linux with a few TB of SATA space - $1200
How much the company paid to have a systems admin play video games all day - $50,000
The cost of a 17,000 visitor a month site going down because they had no database backups - Priceless.
I don't know how you can run a business without protecting your bread and butter.
You have insurance on your car. Hopefully some health insurance.
How about some Database Insurance? Hell, $150 for a simple 1TB USB HD?
Can you be that incompetent? Why yes, I guess you can be.
Slashdot must be populated mostly by engineers and programmers that work on the software side of things, because nobody's considered mentioning a data recovery service, instead giving the "You're doooooooomed! You should have paid your IT people more" line. How very kind everyone here is of other people's technical mistakes. Because none of us have ever seen a bunch of dot files in a directory and typed "rm -rf .*" and then cried after or screwed up some production server with a "minor change"...
#fuckbeta #iamslashdot #dicemustdie
The sky is blue, the earth is round, and U2 sucks.
Thanks.
We're considering releasing the journalspace source code to the open source community
I can't wai
I'm a rabbit startled by the headlights of life
See mirroring is like...well a mirror. If you stand before one and stick a fork in your eye your mirror-image does the same. In real time. Analogies are there for a reason.
Blimey. When I saw "mirroring" in the title, I thought it was going to talk about rsync. But raid mirroring being your entire backup strategy? That's special.
The first real backup they do is going to run really, really fast.
Is this a joke? Did I read that correctly? The server was running on Mac OS X server? No wonder they are dead now! Only a Macretard could make such decisions.
They should still be able to send the drives to a data recovery service and recover a lot, if not all, of the data that was overwritten, right?
If it's only liberal bloggers content...
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
"If you can't be a good example, you'll just have to be a horrible warning."
Catherine Aird
Quoted in the Book Practical UNIX & Internet Security
RAID is used for three purposes: to speed up access to data, to create a volume larger than any one disk, and to mitigate disk failure. It has never been, nor ever will be, a backup solution on its own. A backup solution involves making a copy of the data to independent storage, be it tape, disk, etc. Ideally a copy will also be sent offsite in case of fire, etc. In addition, the restore process needs to be tested on a separate restoration box to make sure that the backup process and the media are working correctly.
Personally, I am not sure why this is a story on Slashdot as everyone here should have at least this basic understanding of how to protect their data.
David
All these posts and not ONE OS X / Mac-fanboy joke!! Not even a "They did have a backup solution: they prayed to their God Jobs every night to protect the data" or "Apple servers are too cool to corrupt!" Come on guys! I know it's Friday night but you can't still be hungover from New Year's... :D:D
Laters Sol "Have you found the secrets of the universe? Asked Zebade "I'm sure I left them here somewhere"
Many years ago, I had a minor disk failure. However, it resulted in the loss of the root directory inode. Everything within the directory (I think it was mounted as /var/) vanished... and mirroring would never have saved it.
sjmurdoch very kindly wrote python and debugfs magic that recovered about 95% of the structure and files, but it was a lesson learned against using mirroring as a form of backups...
Anyone have an estimate of how much equity in this business just vanished?
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
A disgruntled member of the Lagomorphics team sabotaged some key servers several months ago after he was caught stealing from the company; as awful as the thought is, we can't rule out the possibility of additional sabotage.
Seems to be quite possible that a previous admin did this. In this case, the only real backup would be something disconnected (and tested). Risk factors are otherwise still high:
a) RAID: Data overwrite, controller failure, PC failure, multiple disk failure, data corruption/RAM/drive issues, deliberate erasure, out-of-space related errors
b) RAID+backup disk: Controller failure, major hardware malfunction (power surge, explosion).
c) On-site sync: Site-wide catastrophe (explosion, flood, electrical surge, etc.), network issues (usually temporary)
d) Off-site sync: Deliberate sabotage (privileged user), which could be somewhat offset by using a pull-based backup rather than a push (the backup server having very limited access, and none from the server being backed up). Can also be redundant servers in different locations
e) Permanent storage media: Even this could go bad if the tapes/disks/etc. get damaged or demagnetized. Tape backups can also be slowish (incremental/differential backups are a bit faster to do)
There is no silver bullet, but relying on just RAID ignores a huge number of potential failures, ESPECIALLY after existing issues with attempted sabotage. Now, as it was the database that was nuked and not necessarily the webserver/etc., I'd be interested in knowing what logs were in place and whether there were traces of sabotage. A login at an odd time, by a weird account, or by a script set to execute on, say, "01-01-2009 00:00:01" would be a good start.
Wonder if a data-recovery firm might be able to get something back from a DB if it's just been erased. Might be easier than if it were actually overwritten with bad data...
Gentoo-wiki.com is an awesome site. However, the guy who runs it either has no backup plan or really a lot of bad luck. I think it is at least the third time the main site says "Gentoo-Wiki recently had it's database lost; this is the rewrite of the site.".
Disk mirroring is a useful component of a total disk backup solution. It's not itself a total solution, as it only protects against a subset of possible data loss scenarios.
Okay, that wasn't too hard to understand. Now let's move on.
Parity: What to do when the weekend comes.
Looks like at least some content is still in Google's cache, those looking to salvage their journals should act quickly.
You can limit Google's search results to a particular site by using the "site:domainname.com" search term and then clicking the "Cached" link of each result to see Google's copy.
There's also a Greasemonkey script for Firefox that can automatically add Google Cache links next to page links, so you can navigate from one cached page to another easier.
This is just compound foolishness. I gather they did it in an attempt to control bandwidth costs since it's hard to imagine any other reason.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
You don't just need backups. You need to TEST them. Having a backup run every night is nice and all; but if the tapes are unreadable and no error was reported, or if you're doing it wrong and the backup is corrupted and you only find out when you come to restore ....
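A restore test does not have to be elaborate. The sketch below "restores" a backup onto scratch storage, runs the database's own integrity check, and compares a row count against what is expected. SQLite is used only so the example is self-contained; `restore_check` and the `posts` table are hypothetical, and the same idea applies to restoring a tape or a SQL dump onto a spare box.

```python
import sqlite3
import tempfile
from pathlib import Path


def restore_check(backup_file: Path, expected_rows: int) -> bool:
    """Open the backup as if restoring it and confirm it is usable."""
    restored = sqlite3.connect(":memory:")
    source = sqlite3.connect(str(backup_file))
    try:
        source.backup(restored)  # "restore" onto scratch storage
        ok = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        rows = restored.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
        return ok and rows == expected_rows
    except sqlite3.DatabaseError:
        return False  # unreadable backup: exactly what the test must catch
    finally:
        source.close()
        restored.close()


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.db"
    bad = Path(tmp) / "bad.db"

    # A small "live" database and a genuine backup of it.
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    live.executemany("INSERT INTO posts (body) VALUES (?)", [("a",), ("b",)])
    live.commit()
    dest = sqlite3.connect(str(good))
    live.backup(dest)
    dest.close()
    live.close()

    # A backup job that "succeeded" but wrote junk.
    bad.write_bytes(b"not a database" * 100)

    good_ok = restore_check(good, expected_rows=2)
    bad_ok = restore_check(bad, expected_rows=2)

print("good backup restores:", good_ok)
print("bad backup restores:", bad_ok)
```

Both files exist and both backup jobs "completed"; only the restore attempt tells them apart.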
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)
#!/bin/sh /*
rm -rf
Can't they recover most of the posts by some reasonably complex Google Cache data mining? Shouldn't be that hard.
Jebus Punting Mary Jane Rice Almighty!!
What fucking good is disk mirroring, RAID 5, 6, or RAID 5 gazillion when the fucking building burns down?! Hey, the RAID controller failed and wrote Calvin and Hobbes quotes all across the array! Once again the apple of common sense has fallen so far from the tree that we need to explain this kind of stuff. We are in a shit load of trouble.
"The apple of common sense has fallen so far from the tree that we now have to, so to speak, explain what an apple is..."
-=[ Who Is John Galt? ]=-
Sorry to say this, but it is. Having competent engineers (and thus expensive engineers) is a requirement for professional IT operation. Cheap, "just working", operation comes, among others, with this type of risk.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Motivated by recent events at journalspace.com, we seek you who up until now have had computer related duties at the blog host, particularly with a responsibility for backups, to let you know that we are not hiring you.
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
Since they apparently used OSX Server this is particularly bad. All they needed was a large enough USB attached disk and then to turn on Time Machine. Might not be the best solution for their needs but it is hard to imagine one which requires less effort.
They could recover much of the data from the Google cache. I could even see Google providing a recovery tool for use in situations like this, possibly charging some fee for it. There also ought to be something that they could put in the robots.txt file to tell robots to use the previous scan instead of what's there now until they recover it.
1. Do not talk about restores.
2. DO NOT TALK ABOUT RESTORES.
To think I was actually so paranoid that I missed our Datasafe offsite pickup Thursday that I drove in, swapped the tapes out and took them home.
Earthquakes, building fires, meteor strikes, Super Volcano blowing. If you don't have offsite backup and redundant offsite servers Murphy's and Finagle's laws are going to spank you. Hard.
It's tough to not be cynical after reading about something like this happening to an established company. To many of us, this is being careless about data in the 1st degree.
But as one other poster pointed out, let this be a cautionary tale - but not only to those who fund and lead IT departments (aka, the "Suits") but for system administrators as well.
I don't claim at all to have any deep knowledge about this particular Journalspace setup, but I can get a decent idea just by gleaning tidbits from TFA and elsewhere. In the end, I have to conclude that Journalspace is/was a company that had a great idea or product, the product being an application, but the people whose full-time job was to maintain the app had no idea how to design the underlying infrastructure to properly run it.
This is a situation I find more and more these days since the advent of Web 2.0. The scenario goes something like this: A company is formed around an application, an assemblage of Java/PHP/Ruby/whatever code that runs a neat service or online tool. The company brings in people who know the idea/code/language well and improve it. *Running* the application, however, on the systems infrastructure, is not their strong point. They're coders. They know Java classes and whatnot inside and out, but there's a reason why these people are full-time app coders and not systems/storage administrators.
One of two scenarios then happens. Sometimes both occur. First, one of the app coder employees who knows just enough about running an OS or designing a systems infrastructure is co-opted by the group to do this, to run their app. This person can get a lot of things right, but the devil is in the details, and misconceptions or misunderstandings about certain things lead to stuff like RAID mirroring being considered a backup mechanism, or choosing to run your company on OSX Server because its GUI is familiar, or whatever the reason may be.
The other scenario is that the group of app coders tries to hire people with the right kinds of experience to set up an infrastructure, but because they're interviewing people for a job they don't know how to properly quantify, they end up hiring under-experienced admins who are good at feeding them BS.
In the end, you get a turd of an infrastructure that works most of the time, but always has at least one hidden domino threatening to topple. When that domino eventually does, Journalspace happens. Single database servers? No real backups? That's some basic stuff right there. It makes me wonder about the more nuanced things that help keep an infrastructure straight, such as security policies (network, host, physical), change control, funding, and so on.
This is why users should be able to easily back up their own data for any online service. If a service entrusted with your data provides no straightforward way to drop a copy of it onto your own hard drive, don't trust it. I'd go as far as to say that any service that doesn't strongly recommend you keep your own backups shouldn't be trusted.
Do the big kahunas of the "Web 2.0" world give users that option? Gmail, Myspace, Facebook, Twitter etcetera ad nauseam?
Prisencolinensinainciusol. Ol Rait!
New Logo
I have no idea about the dollars, but the traffic (and presumably advertising revenues) is about 1/40th of a slashdot. The demise of linux.com at 1/5th of a slashdot is a much bigger deal.
Help stamp out iliturcy.
I haven't had to do backups in a long time (it is someone else's job now). I was wondering, is Amanda still the best thing you can get for nothing these days?
"Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
"It could be that the purpose of your life is only to serve as a warning to others." http://despair.com/mis24x30prin.html
I don't make much money at all and can't live on it, but I guess they made more money by having really crappy content so that people are more likely to click the ads.
Are there any better ways to make money from a free website than by forcing people to click on links? It is great advertising to be present on my site, but my visitors will simply not click on ads and advertisers don't pay for branding.
We're sorry, access to http://journalspace.com has been blocked by the site owner via robots.txt.
You may want to:
* Read more about robots.txt
* See the site's robots.txt file.
* Try the page on the live web: http://journalspace.com
* Search for all pages on the site journalspace.com/
* Try a different page address, at top
See the FAQs for more info and help, or contact us.
http://www.taobackup.com/
I use VPS servers and run my own site using wordpress for blogging, as well as some other software, and I encountered a situation similar to this with VPSLand (which, by the way, is the worst VPS hosting service I have ever used). I have since changed providers, obviously.
After having already been through several periods of unexplained downtime, and an instance where the VPS was unexpectedly rolled back due to some unexplained problems (you can see where this is going), my VPS seemed to be permanently down, and customer service would not, or could not, assist me.
I had been using their backup service, but I could not restore the VPS to a running state with the tools I was provided. However, I figured out that I could put the VPS in maintenance mode and I could at least SSH to it, so I used WinSCP across SSH to download my files and databases.
Bottom line. While I didn't lose anything, I learned the hard way (which was a failure on my part since I am a systems administrator) that you can't become complacent if you care about your data when you don't control the environment. I trusted VPSLand's administrators to safeguard my data and they failed. If I had not discovered the maintenance mode option, I would have lost everything. The only way I can ensure my data is safe is to back it up myself. I eventually wrote a VBscript that connects and downloads all of the web and MySQL directories to my machine at home on a regular basis.
I know all situations are different, but if you can backup your data yourself in any way... even if it's just additional copies on some form of removable media, it's best to do it. Do not trust that whatever service you are paying for will ensure that it's never lost. The company might have a great looking website and have well written policies, but if you don't know what's going on behind the scenes, you don't really know anything about the company hosting your data. Go the extra mile and safeguard your data yourself.
From their Twitter feed (http://twitter.com/jsupgrades)
new server is powered up. We are copying the database. 9:23 PM Oct 7th, 2007 from web
So unless they nuked the drives from the old server, they should have a backup from Oct 7, 2007. That's better than a sharp poke in the eye. Sort of.
Well.. maybe. Or Maybe not. But Definitely not sort of.
Every day I see these self-proclaimed IT professionals, who shouldn't be working, do stupid things and make ignorant decisions. This is one of them. It's amazing that someone would actually think that mirroring is an actual solution for backup. Rule one, you can never have enough backups. Rule two, stupid should hurt. This has gotta hurt! Mirroring and RAID fail, plain and simple. You should always have backups, plural! Anyone who hires staff should be on the lookout for anyone from this company, unless you need someone to work the grill!
This is just poor IT admins, or maybe none at all.
And to those of you who want to blame the bean counters, there are cheap ways of making backups, even if it means manually doing a sqldump to some other server, disk or even a PC with a big hard drive lying around.
There is absolutely no excuse for any server admin not to have tested backups.
Since I'm currently unemployed, if any of you admins need help with setting up adequate backup for your servers, please feel free to hire me!
... if that's your best, your best won't do... - Twisted Sister
Online journals DESTROYED?!??!?
Oh happy day...
So, an established company managed to not make any backups of their data and thought it was safe because it was on a RAID 1? (I'm assuming RAID 1 since there were only 2 drives.) Offsite backup is designed to protect against 2 things: hardware failure and accidental deletion (like this scenario). RAID 1 only greatly reduces the likelihood of the former. Heck, I'm 15 and I know that! I take regular backups of the MySQL database on my server. Please explain to me how an established company is incapable of that.
It's happened before, http://www.laobserved.com/archive/2006/05/journalspace_we_shall_ret.php
Evolution even applies to the 'Net:
The stupid die...
How many times does the story get repeated? The DATA is what's most important, not the hardware... They valued HA over DR. The fact that they never made a backup in 7 years (since 2002) is unbelievable.
I wonder if they considered asking all of their customers to reseed their site with their entries? After all, the bloggers kept a local copy right??? RIGHT???? I know I do... Unreal.
Why Mirroring Is Not a Backup Solution. ...is like saying: Why a two-door coupe isn't a long-haul transport vehicle.
While the two-door coupe will do the job in some respects, you're not going to keep your job using it. Just like using RAID as a backup won't allow you to keep your job.
I just don't understand how after 6 years no one ever needed something restored. Really? I have plenty of power users that occasionally delete something. Mistakes happen. Restore from tape, everything is back to normal. I would love to know what they thought they were telling people/users: 'Oh, restore your file/data, no that's impossible. No one can do that after you delete it.'
I mean, I'm freaked out that my church (where I'm the network admin) doesn't have a proper backup solution yet (cost being the issue; any suggestions welcome). Let alone a site with thousands of users a month.
I will shred my adversaries. Pull their eyes out just enough to turn them towards their mewing, mutilated faces. Illyria
I'd like to see a nice post-mortem in a mainstream IT magazine once the company actually goes broke.
A clipping from Infoworld or CIOwhatever magazine might allow even beancounters to understand the risks of Scrooging on backup and DR infrastructure.
is NetApp snapshots. In a nutshell, it mirrors AND backs up
Something like blogger.com has 23 million. 14k a month is really not a large user base. This is something one single guy does as a hobby, not a large organization of volunteers. Starting over would probably make the user base stabilize at less than half the previous number, even that only with months of hard work.
Not only were they stupid enough to never even make a single copy of their site, they blocked www.archive.org from their robots.txt file so you can't even recover anything from that avenue.
Some of the pages seem to still be in Google's cache, so if you had a journal there and you were too stupid to back it up as well, there might be some hope for your content.
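For anyone wondering what that kind of robots.txt block looks like in practice, here is a minimal Python sketch. The robots.txt contents below are hypothetical (the thread doesn't quote the real file), `ia_archiver` is assumed to be the user agent the archive.org crawler honored at the time, and `is_allowed` is just an illustrative helper built on the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the archive crawler, allows everyone else.
# This is NOT the actual journalspace.com file, only an illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.journalspace.com/"
    print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

With rules like these the archive crawler is shut out while search engines can still crawl, which would explain why a search cache might have pages the Wayback Machine refuses to show.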
Thanks, journalspace.com for providing me with the best sales tool ever for convincing my web clients it's worth the extra few bucks to do a maintenance contract including off site backups....
Can't they just ask Google - for instance - to give the cache records of the site? I mean, we all know that what is released on the web is mirrored over and over again? No?
...through wake-on-LAN + rsync + shutdown to the box next to the room. I can't imagine a site that famous isn't taking at least the same measures to prevent such a thing from happening.
I have seen and experienced too many accidents, which taught me one thing: if you are tight on budget, set up an offline backup solution first, or at least a point-in-time, incremental, non-realtime one; only after that go for RAID/mirroring, not the other way around.
Most of the "Ouch" moments are due to user faults, and mirroring doesn't protect against any silly user action.
Even if you can't invest the time to figure out FS snapshots, flushing the database before starting the backup, or rotating whatever database update log you have, a plain backup is still much better than mirroring.
BTW, for mirroring, two hard drives from the same lot have a much higher chance of dying at or around the same time.
Last but not least, buy a new harddrive every year or two, so your harddrive is always new and healthy. Bundle the old one to the backup array in JBOD so it's once again big enough to backup the new disk.
During those couple of hours the site is running from a degraded RAID array and the only backup is on the way to a remote storage area. Table locking may be an issue with very large data sets, but there are still solutions: using a redundant server with block level replication (DRBD) the replication can be paused and a backup taken while there is no disk or database I/O. A "re-sync" of even very large disks is much faster to restore than a RAID rebuild due to block level meta data stored on the disks by DRBD.
C:\>rm -rf /
'rm' is not recognized as an internal or external command,
operable program or batch file.
Everything's still running here...
Double-click on the attachment I just e-mailed you. That should fix the problem.
http://www.gnu.org/savannah-checkouts/non-gnu/rdiff-backup/
Great solution for off-site incremental backups...
1. The file system must be ZFS. If your FS' acronym is more than 4 characters long, your data is fucked from the beginning.
2. Backup often: if it's not done every five minutes, your data is stale and totally not fresh.
3. Off-site backups: you need to Fedex your shit to five different corners of the world at a minimum. If a fire breaks out, you can call your offices in Uganda for the restore disks/tapes.
4. Secure your backups: keep guard dogs, armed mercenaries and Jason Statham nearby. In the likely event that your hard drives get jacked, you can always rely on Statham to kick down a door or two to save the day.
5. ???
6. Profit.
(Anthony is at his PC playing Call of Duty, the sound of gunfire filling the office. The VP of Sales walks in.)
Vice President of Sales (nervously): Morning Anthony! It's a real good day we're havin' here in Peaksville! A real good day!
Sys Admin (rushing in): Good Lord! Both of the RAID drives are blank! The entire site is gone!
VP: Anthony, do you know anything about this?
Anthony: A message kept popping up on my screen saying the drives were nearing capacity. I don't like it when popups come up on my screen! So I deleted all the files. (waves his hands like a magician) Now all our drives are at 100% capacity!
SA: Why you little..
VP: Why it's real good that you done that Anthony! Real good! I hate when popups come up on my screen too, don't you Bill?
SA: Yes, I really hate that.
VP: Bill, why don't you go and restore the files from the backups?
SA (barely containing his anger): I would but when I went to the vault for the tapes it was filled with Pokemon cards.
VP: Anthony, do you know where the tapes are?
Anthony: I wished them into the cornfield. Silly stupid tapes! I needed the vault for my cards! What if there was a fire? Or a dragon attacked?
SA: Well, we can still restore from our offsite storage..
Anthony: Oh I shut that down, when did I do that? My Birthday! I needed money for my party. The accountants said there wasn't any and I got real sore! So I shut down nonessential services to fund my party. We had a petting zoo and everything!
SA (losing it): Nonessential! Do you realize what you've done!? The site's dead! Nearly six years of data, thousands of users gone! This means disgrace and bankruptcy!
(Anthony pauses the game and points his fingers at the SA, turning him into a jack-in-a-box. The VP stifles a scream.)
Anthony (unpausing the game): I didn't like that dumb old site anyway. We'll make a site so that people can play Call of Duty online for free!
VP: But Anthony we don't have the rights to..(Anthony shoots her a mean look) Why that's a fine idea, Anthony, real fine! We'll get started on it right away! Right away...
Prisencolinensinainciusol. Ol Rait!
First, they blame Mac OS X. Then they blame someone who sabotaged them months ago. At least they didn't blame DriveSavers.
At no time do they ever accept blame as a company. All they had to do was make one fricking full backup per week. That was too much for them. They decided to do something fun instead of mind the tape drive for a little while. Then something bad happened, and now the data, and the company, are toast.
One fricking offline backup per week. That would have saved their company.
And you know who said "no" to spending money on tapes or DVDs. It was some suit.
The geeks need to become the leaders of this society, or we are in big trouble. And no, using a Blackberry, iPhone, or a Mac does not qualify you as a geek.
The site was run on OS X Server... I think this may be indicative of the level of IT effort with the company. Look, *I* run an OS X Server... but *I* am a Biology major that knows approximately dick about the UNIX command line, and use it to run a server that I probably wouldn't be able to run any other way. I also have it backup nightly to a cheap NAS, archiving old backups, and I've tested a restore to make sure it works.
This is probably just a couple guys who ran a website in their spare time... not a huge IT effort that failed.
Comment removed based on user account deletion
I've been in IT for a long time - and my company manages a whole lot of backup systems. Our customers usually have mirroring/RAID, and volume shadow copy, and disk backups, and tape backups, and...
As networks grow in complexity I am starting to think it would make sense to require IT pros to get licenses. Even though it would cost me a ton of time & money to license my staff, I think it would prevent idiotic stuff like this from happening--or at least make the licensing board directly responsible for a screwup like this.
Kinda like an electrician, but more like the BAR than anything else - screwing something up big-time (through ignorance or incompetence) could get an IT Pro "disbarred."
There's *no* good excuse for this, period. If your database can't be snapshotted so you can get a clean backup of it--use a different database technology which can.
Not even the Wayback Machine can save them...
"We're sorry, access to http://www.journalspace.com/ has been blocked by the site owner via robots.txt."
-- Dave
up 12 days, 22:30, 2 users, load averages: 993.20, 994.21, 994.56
*makes note to limit user processes...
So true, many IT shops are located in or near the Twilight Zone.
I should have been a Geologist.
you're telling me google cache is not a valid backup mechanism??
Over-aggressive attempt at vendor lock-in. Many content hosting businesses are perfectly content (hm) to let the hurdle of scraping your data out be barrier enough.
I don't believe many customers think about the importance of being able to File -> Export.
I hope eventually consumers come around to understanding, and that this feature becomes a primary criterion in selecting services.
Break the mirror, pull out one drive, relocate it to a secure location, replace the drive with another, and rebuild the array.
There. A mirror is a backup. QED.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
I'm just sayin'
While I don't doubt that somebody here is trucking around racks of disks as a data transport, I think the majority of those advocating disk backup assume you can make your daily data transports via Internet or other WAN links, e.g. running rsync or another incremental transfer protocol between two disk servers. By having disk arrays at both sites, the incremental algorithms can effectively scan a local and remote snapshot and send only the deltas.
Many businesses do not have that much new data per day that they could not transmit the deltas over their existing WAN links. For these businesses, it is nothing but habit that would demand sending tapes out versus sending packets. For my own small software consulting business, I even did this via a GPRS radio link for over a year... piggybacking on the same GPRS link (think SLOW modem) that was also carrying my job-related inbound and outbound email.
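To make the "send only the deltas" idea concrete, here is a rough Python sketch. It is a simplification and not rsync's actual algorithm: it compares fixed-size blocks by hash and assumes blocks stay aligned, whereas rsync uses rolling checksums to cope with shifted data. The function names and the tiny block size are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, just to keep the example readable


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block; this is what the remote side would report."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def compute_delta(local: bytes, remote_hashes: list[str],
                  block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return only the local blocks whose hash differs from the remote copy."""
    delta = {}
    for index, start in enumerate(range(0, len(local), block_size)):
        block = local[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(remote_hashes) or remote_hashes[index] != digest:
            delta[index] = block
    return delta


def apply_delta(remote: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Patch the remote copy with the changed blocks, then trim to new_length."""
    blocks = [remote[i:i + block_size] for i in range(0, len(remote), block_size)]
    for index, block in delta.items():  # indices arrive in ascending order
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]
```

Only the blocks in the delta have to cross the WAN link, which is why a slow connection can be enough when the daily change rate is small.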
hahahahahaha
"server crashed sometime between 18:59:40 EST (GMT -5:00) and 19:00:00 EST (GMT -5:00) on Dec 31, 2008 which remarkably corresponds to within at most 20 seconds of the New Year in GMT. I have been running this same hardware non-stop for more than six years..."
are you sure this isn't the same people?
You can use Google Cache to restore most, if not all of your posts --
http://digg.com/software/Recover_most_all_of_your_JournalSpace_posts_images_w_Google
Maybe google can help people recover some things
There is no rational justification for tape anymore, what with the cost per TB stored on hard disks now under $130, total $$.
Tape media is cheaper and has a higher density than any hard drives on the market. Only laptop drives are starting to come close in density, but they're several times more expensive per GB once you go beyond needing to store more than a couple of TB of data.
Also, tapes are designed to be reused thousands of times. SATA connectors are only spec'd for something like 500 connect/disconnect cycles at most.
Please help metamoderate.
I've always treated drive mirroring as fail-over redundancy. If the primary drive fails, use the mirrored copy until the drive can be replaced. There have been times that we've used the mirror drive as a back up if the primary drive failed. But every night, the entire site database and file structure is backed up on-site via tape and a FTP copy of critical data (database dump) is sent to a dedicated server at another hosting company in another country as a second copy of this data is backed up to a server in the office. And that server gets backed up to tape weekly.
Does this all cost money? Yes. Has it completely saved our ass before? Yes. The worst we've had is a disc failure that took the database server down for 30 minutes until I could reboot it and use the mirrored drive. Drive was swapped out later that day, took the site off line for an hour that night (when it doesn't get a lot of traffic anyway), copied the HDD, and rebooted. That was 98 days ago. *knocking on hard brownish furniture*
I load random copies of the back ups on a local development server twice a week, just to make sure that nothing weird is going on. Some of the other people in the department think their boss (me) is crazy about this stuff, but they're programmers not systems people.
Granted we run e-commerce for others. Every minute it is down is costing people money. Now we run only about 600 transactions a day at this point, but that averages to $15k a day.
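The "load a backup somewhere and check it" habit can be partly automated. Below is a minimal Python sketch, under the assumption that a SHA-256 checksum was recorded when the backup was taken; `sha256_of` and `verify_restore` are hypothetical helper names, and the plain file copy is only a stand-in for whatever the real restore procedure is (loading a dump into a development database, for example).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large dumps don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(backup: Path, expected_digest: str, scratch_dir: Path) -> bool:
    """Restore the backup into scratch_dir and compare it to the recorded checksum.

    The copy below is a placeholder for the real restore step.
    """
    restored = scratch_dir / backup.name
    shutil.copy2(backup, restored)
    return sha256_of(restored) == expected_digest
```

A checksum match only proves the bytes survived; actually loading the dump, as described above, is what proves the backup is usable.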
"The problem with socialism is eventually you run out of other people's money" - Thatcher.
Are there Darwin awards for websites?
I haven't seen a comment yet about small business fatigue.
Was the disgruntled employee a founding member? Was he a stake-holder on any other level? Had all back salaries been paid to cover any ... uh ... dry spells in the startup plan? Were they confident they would be cash flow positive entering a difficult business year? Do they really not have any back media stashed somewhere? Maybe they just looked at the recovery cost from their dated (and possibly tampered) spares, the cost to their business credibility, and decided the prudent business decision was to close the doors and move on.
Maybe the disgruntled party was thrown out the door but the parties responsible for creating the dysfunctional environment hung around. They usually do. Does the closure unlock any business assets that one or more of the existing principals can roll forward into another opportunity? Is the "failed drive" story just a lot more sanitary for public consumption than the sordid story about disgruntlement and personality conflict?
There's a troll out there who is suggesting maybe the same thing about Madoff's too easy confession.
http://www.globalresearch.ca/index.php?context=va&aid=11488
Not very good and puts far too much faith in "board level oversight" and never once mentions the Enron factor: even at the top (maybe especially at the top) people refuse to question black boxes if the profit stream appears reliable. How did Enron get away with not supplying detailed balance sheet? The usual refuge of "trade secret": if we tell you how we do it, the chicken recipe will cross the road.
Isn't that the bonus of being at the top? Everyone lets you into their secret tree forts? There are a lot of empty suits out there stalking the putting greens who aren't much motivated to puncture the veil of a secret handshake. They have other agendas.
http://www.edge.org/q2009/q09_10.html#taleb
The problem with the Madoff analysis is that it presumes his operation was legitimate for some (long) period of time, then he's wiped out by the big melt-down LTCM style, after which he concocts this bogus pyramid excuse. How then did he really achieve these implausibly consistent results over that long period of time? Does he really have a system that works as portrayed (until it blows up) LTCM style? Is there a secret he's still trying to keep?
I'd put my own bet on the square that the fund return levels were massaged since way back, and that the empty suit oversight boards were as gullible as you'd have to imagine, despite the glass and granite whitewash of financial controls and oversight.
As far as Journalspace is concerned, if it turns out that this "backup oversight" was the only bad seed at the core of this apple, it would be a case of truth stranger than fiction.
Hard Drives are great backup devices. Unlike tapes they don't stretch, and can be easily verified. Picking out specific files is far easier than with tape.
However, Mirroring is NOT a backup. In the commercial backup system I wrote I was responsible for a postgres database. Every night a cron job would do a local SQL dump to a file. This file would then be zipped and archived to several other computers, some of which were offsite.
We also had many large files, which were not stored in the database. For these files we used rsync configured in such a way that files could never be deleted. Generally files didn't change once generated. The database zip file was named based on the date, and a complete copy kept for each day.
It was a very brute force way of doing the backup, but had massive redundancy. Several computers at various locations would need to be compromised or destroyed to destroy the data. The scripts and permissions were such that an attacker could not get universal access from one point.
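A rough Python sketch of the date-named, never-overwritten archive step might look like the following. It is not the poster's actual script: the helper name `archive_dump` and the file naming pattern are assumptions, and it takes an already written dump file as input rather than invoking the database's dump tool.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path


def archive_dump(dump_file: Path, archive_dir: Path, day: date) -> Path:
    """Compress dump_file into a date-named archive, refusing to overwrite.

    Opening with mode "xb" raises FileExistsError if an archive for that day
    already exists, so an old copy can never be silently replaced.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{dump_file.stem}-{day.isoformat()}.sql.gz"
    with dump_file.open("rb") as source, gzip.open(target, "xb") as dest:
        shutil.copyfileobj(source, dest)
    return target
```

A cron job would call something like this once a night and then copy the resulting file to the other machines; keeping one complete file per day is what gives you point-in-time restores.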
I do worry about systems like GMail - and how they store and back up your data. I have invented various decentralized backup systems, so I assume that larger organisations use a similar approach - that is redundant data on many hard drives.
ACID compliant databases use a log, much like a filesystem journal, that contains all the changes made to the database before those changes were actually written out to the main database storage. When you back up the raw database, you back up all the logs since at least the time you started backing up the raw files until the time the backup was finished, and when you need to restore the database you put the raw data back and then let the database replay the logs.
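As a toy illustration of "restore the raw data, then replay the log", here is a small Python sketch. `TinyStore` and `restore` are invented for this example and gloss over everything a real database has to handle (durable log writes, checkpoints, partially copied data files); the only point is that a snapshot plus the log entries written after it reproduces the current state.

```python
import copy


class TinyStore:
    """In-memory key/value store that logs every change before applying it."""

    def __init__(self):
        self.data = {}
        self.log = []  # append-only change log

    def put(self, key, value):
        self.log.append(("put", key, value))  # log first...
        self.data[key] = value                # ...then apply

    def delete(self, key):
        self.log.append(("delete", key, None))
        self.data.pop(key, None)

    def snapshot(self):
        """Return a copy of the data and the log position it corresponds to."""
        return copy.deepcopy(self.data), len(self.log)


def restore(snapshot_data, log_entries):
    """Rebuild state from a snapshot by replaying the log entries taken after it."""
    data = copy.deepcopy(snapshot_data)
    for op, key, value in log_entries:
        if op == "put":
            data[key] = value
        elif op == "delete":
            data.pop(key, None)
    return data
```

For example, take `snap, pos = store.snapshot()`, keep writing, and later `restore(snap, store.log[pos:])` gives back the same contents as the live store.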
This is just compound foolishness. I gather they did it in an attempt to control bandwidth costs since it's hard to imagine any other reason.
Or, they did it to avoid database load. Mirroring an entire website with wget can generate a lot of database queries.
I'm sorry they lost their data, but what kind of idiotic plan is that? As for backing up the database, just quiesce the freakin thing at 4AM.. split the mirrors and start the database up again. That whole part should take 15mins top? Then backup the mirror disk and when done reattach the mirror. Jeezz..... What a half-ass setup!!!!
NOTE to SELF:
If I ever see a resume with journalspace as an employer... never hire them.
...leave your cork on your fork.
This is just compound foolishness. I gather they did it in an attempt to control bandwidth costs since it's hard to imagine any other reason.
Umm, utter incompetence?
The same admin cocky enough to declare that their data is safe because it's stored in a RAID-1 is likely cocky enough to believe that archive.org and other scrapers are "stealing" the content of his website.
I thought Google backed up the Internet
= Unlimited, cheap snapshots.
Want to know more?
you had me at #!
I know the difference. A mirror is simply to protect against hardware failure in most cases.
Higher RAID levels offer faster read/write throughput while encapsulating some form of error correction to contend with hardware failures.
But you must, must, must do regular backups of data. At my last job we were so paranoid we replicated databases to other servers and did daily backup dumps of all databases to tape.
There were some databases that got backed up every 15 minutes; those tended to be the more transaction-based databases. We used an open source product called Rsnapshot to do the backups.
And nothing of value was lost......
My approach to a business-critical system is to employ a strategy of defense in depth.
First level: RAID-10 array in NFS/DB boxes.
Second level: DRBD said NFS/DB boxes such that one server failure results in seconds of downtime max.
Third level: Automated dump of DB and make a tarball of website content and code.
Fourth level: RSYNC the tarballs offsite.
Fifth level: Occasionally backup the offsite backups to some form of removable storage.
It's conceivable that this strategy could still have a failure whereby the data was all lost, but such a disaster would need to be on the scale of a global thermonuclear war, asteroid impact, etc. In such an eventuality, nobody is going to care at all about the backed-up data anyway.
To get their blog entries back people might be able to use Google Reader. If journalspace provided RSS feeds and if the entire blog posts were part of the RSS feed and if somebody subscribed to the RSS feed in Google Reader one should be able to obtain all blog posts from when the first user followed the blog with Google Reader. Here are some details: http://www.niallkennedy.com/blog/2005/12/google-reader-api.html
Back in the nineties, a friend of mine was backing his mac system up weekly with a tape drive. The thing is, he was using the same tape to repeatedly back up onto. One day he calls to tell me he needed some help recovering files on his hard drive after a crash. I asked, "What about the tape backups?" He said, "That thing backs up perfectly. The problem is, it doesn't restore at all."
Seth
$5 / month hosted VPS on linux = awesome!
My condolences go out to the Journalspace.com operator. It was probably a huge part of that person's life and now it's all gone because of a mistake. I hope it doesn't leave him disillusioned about embarking on other ambitious efforts.
Seth
$5 / month hosted VPS on linux = awesome!
Maybe they had the same problem we had with our former Tampa/Atlanta based hosting company. When a drive in the RAID 1 failed they jumped in to replace the drive. But they replaced the good drive, not the bad drive. (And don't ask about their SAN where we had a backup and it went offline every few days. Excuses-of-the-minute included that they had to replace all the hard disks, or it cost $200,000 so it must be working, or they "replaced all the wires" because they might have cracked when they moved it from Tampa to Atlanta, even though it was still in Atlanta. Then it wasn't their problem any more because they couldn't do anything about it (except keep billing us $100 per month). Absolutely unbelievable what excuses they could pull out of the air. Thanks idrive.com for clean, reliable off-site backups. ) There is some computer usability study out there where they instructed sys admins in RAID 5 then put them in front of a test server and created various fault scenarios. All the data loss was caused by people pulling out the wrong drive or doing the wrong thing.
I'm the guy you reach, assuming you have a valid support contract, in situations like this. In real life, I work as a backline support engineer for midrange disk arrays, which range in price from 70K bux to over 500K bux. I've also taken operating system, network and security cases in the past with previous employers. Our team takes these Severity-Cluster F*ck-data loss cases daily. We can salvage the data, sometimes. Frequently, there is nothing that can be done. After that, I write the Root Cause Analysis document and assist with the presentation to the customer's management.
I can understand where this guy is coming from. This is a potential career limiting move, and no one ever admits they screwed up. When you're truly scared, you lose the capacity for rational analysis, and grasp at straws. Unfortunately, this is when you are most likely to make mistakes.
A Microsoft filesystem support engineer once gave a really good analogy. Assume the big expensive disk array is a brand new Ferrari sports car. Just because it's really expensive with all the bells and whistles doesn't make it immune to flat tires or rear end collisions, does it?
Malicious actions *are not* the most likely cause of data loss or corruption. The specific situations below never all occurred in the same data loss event, but I've personally seen each.
1) The array firmware is over two years out of date, because uptime was so important that maintenance was never scheduled. Same for the host OS and HBA drivers.
2) Failure to heed a published service alert requiring an upgrade or workaround.
3) Failure to save the current array configuration information.
4) The site does not have tape backup, instead the data is remote replicated to a similar array at the DR site.
4a) Someone convinced management that a disaster recovery site situated below mean sea level in New Orleans is a good idea. Oh, the date is early September 2005, a week after Katrina hit.
4b) Alternatively, the DR site is in Florida, just after another hurricane. Someone forgot to buy a diesel fuel contract to top off the emergency generators every three days. After a week, the site goes dark.
4c) The telco routed data center primary and alternate fibre lines through the same physical conduit under the street. The utility crew with a ditch witch severs both. The ditch witch *always* wins! Anyways, your DR site is now out of the picture.
5) If tape backups are available, the cassettes are stored on top of a cabinet marked "Danger High Voltage".
6) A minor failure triggers the chain of events. For example, a drive fails and reconstruction to a hotspare drive begins. This is ignored.
7) The array's "call home for help" feature was never configured.
8) The array's data scrubbing feature was not active.
The array continues to operate in a degraded state, but since data availability is maintained, no one notices.
9) A reconstruction read failure on another spindle (second fault) in the Raid 5 volume occurs, taking the entire volume group offline. This read failure also kills off all other hotspares. This situation would have been prevented with data scrubbing.
Up to here, support can almost always recover the existing data on the array drives without too much work or restoring from backup.
10) Someone runs to the array, hears the array alarming and sees the flashing lights. End users are complaining. Management wants something done "right now". The replacement drives are a few hours away. It's time to make a command decision- call for help, sit tight and wait for the cavalry or go into kamikaze sysadmin mode and save the day?
11) The sysadmin recalls there is another identical array, which is running less important applications. He decides to yank parts from this "installed spare".
12) The stolen drives have not fixed the problem. A replacement controller or two from the other array didn't either. End users report additional problems with previously unaffected hosts.
13) Someone finally decided to call support, b
What JournalSpace didn't say is that their mirrored drives are actually 30GB Zunes. Just let them run down overnight, charge them back up and everything will be fine tomorrow.
Anyway, that should be enough to answer the "There is no rational justification for tape anymore" comment especially since tapes have also been improving over the years.
At least something good came of it.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
On the bright side, if the site gets clogged up with blogs again, they have a nice mirror of it being free of them, for quick restoration.
The company that runs Journalspace (or used to, anyway) is Lagomorphics. They will host your site for you...
http://www.lagomorphics.com/hosting/
At Lagomorphics, we're OS X hosting experts. We've been using the Mac mini and Xserve platforms for years, and we're proud to offer you the opportunity to use our colocation facility. Just send us your Mac mini, or let us provide the hardware.
I'm a big tall mofo.
For everything to be just gone and I mean LONG gone, then something besides a truncation or un-linking of the file had to occur.
Now I don't know all that much about the apple file system, but I would imagine it is like most file systems in that it links clusters and sectors of data together using some sort of allocation table, hash, b-tree or something.
Now unless they had file scrubbing turned on and the OS purposefully went out and overwrote every segment of the file with 01010101 and 10101010 then the vast majority of the data should still be there, at least I would think it would be. I mean even the nastiest revenge oriented guy, would have to be able to invoke some kind of program to do that.
I am assuming that it was an SQL database of some flavor. I don't know much about MySQL internals but I am pretty sure a
delete from table
simply goes through the index and marks pages deleted and does not physically go out and scrub every page that has data on it. I know that is how Oracle works.
So this leaves me wondering about the data recovery house.... If they were doing a sector-by-sector read on the entire drive (either of them) they should still see all sorts of data on the disk. Now I don't know if the database compresses data on the fly (some do, some don't) and I don't know if drive compression is an option on OS X. If so, I can see where they would see mostly large amounts of compressed data (making things VERY difficult if not impossible to recover), but barring that, most OSes have the hooks built in to simply do a sector-by-sector read of the storage device, and although your binary data (images and the like) might be unrecoverable, you could probably get most if not all of the text.
Just a thought, but hey I might be crazy, it is just the hacker in me that brings these things to mind...
Hey KID! Yeah you, get the fuck off my lawn!
I work for a Disk->Disk->Tape vendor. There is definitely space for tape, for the following reasons.
Most brilliant IT folks by any other definition don't get that the DBMS vendors including Oracle, Sybase, Ingres (CA of old), Informix, etal. use every trick in the book as artifices to get decent performance. Hey, I use the commercial vendors in major applications, so I'm not being a bigot in any sense. I'm just bemoaning the idea that DBMS's tend to be massively overrated as secure reliable technological mechanisms for storing data. Yeah they can be any one of a variety of "ilities" (you know reliability, availability, etc.) but at a typically huge cost, and not always one which has any bearing on rational $ per increment of functionality.
Most specifically at issue here (in a design which appears to have attempted to avoid the vendor proprietary solutions) seems to be the fact that much of the performance for transaction based database applications depends upon huge (multi gig) physical memory caches of which normally would (in a ISAM or other file based sense) be written to the disk regularly, and in the best cases asynchronously. Sure the file approach has huge downsides, but my comment is strictly about the technology solution which has evolved in the market, and is being bought by notionally smart folks. In this case the IT pilot fish seems to have ignored that the DBMS caches would have a tight affinity in terms of node, and cpu (in a multinode or multi-processing environment), and that the large caches would have to be regularly flushed (also costly in terms of performance) to avoid potential corruption so...
Despite depending upon great SATA and/or RAID or even fiber channel disks to write everything twice, the timing of any failure would make it nearly absolute that even a well timed flush of a large memory (dbms) cache and the subsequent write to disk command of a large multigigabyte cache would not be complete (== corrupt transaction state, corrupt logs, with no time to rollback) on both disks to meet the demands of a simultaneous kernel panic or hardware exception. No disk is that fast at this point, even SSD.
I guess I would chalk this one up to naivete or ignorance on the part of the application designers or architects. You get what ya pay for. Kind of hard to have sympathy. I cleaned up a couple of messes like this when I worked for DEC.
An account with rsync.net and a couple of cron jobs would have saved their butts.
Greg Raven
As long as there's any left, I'll take mine first.
It's just like the way that -- despite the fact that you're far more likely to be killed by a drunk driver than a terrorist, (this may even hold true for members of the US military), it's the terrorist deaths that make the news....
Think about it for a minute -- If, at the top of every hour, CNN spent 30 seconds on each DWI associated death during the last day,
This wouldn't be that much of a problem, however, if it wasn't for the fact that many members of the public -- and especially decision-makers -- tend to choose their response to 'reality' based on what's in the news... It's that kind of confusion between what's in the news and what's commonly occurring that often results in tragically incorrect focuses for large chunks of society.
Sometimes boldness is in fashion. Sometimes only the brave will be bold.
Nothing is ever deleted from the internet... that's more true than false.
Between Google Cache, Archive.org and local temp files, I would expect a gross majority of the information is recoverable if the people know what they are doing.
I had a professor who had accidentally corrupted both his website and the backups. It took all of five minutes to recover good copies.
"Dictator Flakes. They WILL be delicious."
They just upload their shit to an FTP site and have others back it up for them.
Running Windows without Cygwin is even more hardcore than running an internet company without backups.
Assuming that you have spare drives, you can use mirroring as a backup solution.
I had a huge database that I was responsible for, and we'd lock the database, split the mirror, and take the drive offsite.
If the system died, we had a spare drive available for immediate recovery.
It's all in how you do it.
seriously, I can see this disgruntled admin thinking, "let me set the whole thing to disappear on Jan 1, which is far enough away from my departure so no one will directly connect this to me"
1) This: http://journalspace.com/this_is_the_way_the_world_ends/not_with_a_bang_but_a_whimper.html is amusing. "The guy who I fired for stealing and who told people how smart he was did not have backups. After he left, I should have checked on stuff." -- This is wrong. If you don't _know_ proper off-site backups exist at any time, you are making a huge mistake. Every single day. Your responsibility does not start when you fire some guy. And in a shop small enough that one guy can handle all IT, the boss of a blogging _website_ _must_ know that.
Not much he can do now and no use crying over spilt milk. But to imply that his (shared) responsibility is less than 100% is a joke.
2) To save affected bloggers the trouble of posting: http://i154.photobucket.com/albums/s257/MyDoom111/btarded/outrage4yvdj3bf67oq.jpg ;)
...when you add -f on the command line. There's a warning about why trying to make commands safe automatically is not a good idea in the Unix Haters Handbook (Changing rm's Behavior Is Not an Option section). Better to have an entirely different command written by yourself that won't vary its behaviour across platforms (and at worst will be missing)...
My own cautionary tale (unrelated to the GP) is don't delete directories you think are completely empty by using rm -rf. Use rmdir - it's safer.
Oh and don't use -r if you aren't actually deleting directories. And watch out for GNUisms on GNU userlands - that * might mean more than you think it does on Linux...
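The same rmdir-versus-rm -rf distinction exists in Python, which makes it easy to show. `os.rmdir` refuses to remove a directory that still has anything in it, while `shutil.rmtree` takes everything. A minimal sketch follows; the helper name is made up for this example.

```python
import os


def remove_if_empty(path: str) -> bool:
    """Remove `path` only if it is an empty directory, like rmdir.

    Returns False instead of deleting when the directory still has
    contents (os.rmdir raises OSError in that case, and also when the
    path is missing or not removable).
    """
    try:
        os.rmdir(path)
        return True
    except OSError:
        return False
```

A directory that "looks empty" but holds a dotfile survives `remove_if_empty`, whereas `shutil.rmtree` on the same path would have silently destroyed it.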
I've been looking for more stories like this - i.e. people who failed to back up and whose businesses closed. We know there are lots more out there but this is a real coup. Another guy had to re-do three months' worth of accounts, but that doesn't even register on the scale of this one. Just goes to show yet again that offsite data backup is the only solution for anyone planning to stay in business.
Peter Graves Channel Computing
I'm not sure what you would get back (aside from remapped sectors). Years ago it was definitely possible to get significant data back from zeroed (as in dd if=/dev/zero rather than a "format") drives. However in recent years there is apparently so little space and densities are so high very few bits can be recovered. The reasoning for this is that there have been cases where drives have been blanked and vast sums of money have been at stake but the data was not recovered (old Slashdot thread about data recovery questioning if it's possible after one pass).
If you have a reliable link saying that this is to the contrary I would really like to see it though. Solid state disks are a whole 'nother kettle of fish though.
Bodhi is no tree, nor is the mind a standing mirror bright. Since all is originally empty, where does the dust alight?
Say hello to my little sig.
That's less than 500 a day! Christ, my personal blog gets more than that. Double that, in fact. And it's not like I put any effort into it whatsoever. It's roughly half talking about bands I like and half ranting about Ruby. No professional ambitions or concessions whatsoever. Not even updated regularly.
So isn't that pathetically low by any modern standard? Who could possibly make any money at all on 14k uniques a month?
I was under the impression that numbers like this were really low and basically meant nothing. "Enough traffic to call yourself popular" starts at about a quarter million uniques a month in my mind, and that would be the absolute minimum.
Don't mean to add insult to injury here, but if you've been soldiering on for more than 6 years and have less traffic than some random guy's zero-effort personal blog, then maybe you should just give up.
Let my new 7-digit UID be a lesson to all - write down your passwords.
coward.. u can easily show and explain the difference + in a mail to the boss if needed.. It *is* an unthankable job in most cases, but it *is* part of your earning a salary.
there are already comments here, and this story itself should provide enough material even for the most staunch believers of mirroring..
Now... ask yourself how *valuable* are those 50TBs?
Oh wait, who cares but the other bloggers that think the world revolves around them? No one.
Democrats and Republicans are like AIDS and Cancer, I want neither!
There are actually several products like that now.
One is this one, which has USB and eSATA -- it's probably available on other places as well, but who doesn't like ThinkGeek? They don't say the manufacturer, but if I'm reading the photo correctly I think it's "NewWave". It goes for $40. In the ad copy there's a mention of a 1TB limit but I don't know if they really mean that or not.
Another, nicer option (IMO), is this one from NewerTech; they call it the "Voyager Q" if that link dies. It speaks USB, eSATA, and both 400 and 800 Mb/s FireWire. MSRP is $100. It specifically mentions that it's compatible with 2TB and larger drives.
I've seen several other models floating around from various manufacturers (some I think are just re-brandings of the same product ThinkGeek is selling) that are all substantially similar.
I'd be a bit concerned about putting a drive through too many insert/remove cycles -- the internal SATA connector isn't really made for repeated connection and disconnection -- but for backup purposes it's a pretty darn slick idea. I'd been thinking for a while about getting my SCSI tape drive set up and working again, but I think I'm just going to do 2.5" disks instead. Yes, the tapes admittedly last longer, but this assumes you can get the equipment to read them (and DAT was always a bit finicky in this regard). Plus, there's no comparing the sheer volume of media produced; it's a lot easier to take care of a small stack of hard drives (for under $100 you can get an air/water-tight Pelican case and a small media-rated fire safe and keep a few drives secure against just about anything except getting nuked) than it is to try and protect a big stack of tapes.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Multiple. Offsite. Backup.
I question anyone who claims to be "Professional".
The cost of that cleanup, of course, will be borne by taxpayers, not industry.
The JournalSpace blog
http://journalspace.com/blog/
Here in my company, which is in the earth moving business, we have daily backups to tape drives and weekly offsite storage. This is a company that invests very little in IT (our department is just me and one more person for some 2500 total employees) and even we can get that right...
LUSER: "hello, support? our db server disk is full" ...
BOFH: "clickity click click" .... "nope - looks empty to me"