Why Do Companies Back Up So Infrequently?
Orome1 writes "Businesses are on average backing up to tape once a month, with one alarming statistic showing 10 percent were only backing up to tape once per year, according to a survey by Vanson Bourne. Although cloud backup solutions are becoming more common, still the majority of companies will do their backups in-house. Sometimes they will have dedicated IT staff to run them, but usually it's done in-house because they have always done it like that, and they have confidence in their own security and safekeeping of data."
Portable hard drives are cheaper and faster than tape, even when you need stacks of them. Small businesses may be using a bunch of these in place of tapes.
I run backups at least once per week, and if I make major changes to my system (I self web host), I back up several times per week. This week I lost a 500GB drive. I used ddrescue to recover all but 19MB, although after the filesystem was restored, I lost more than 19MB. Backups saved me huge amounts of time and trouble. The system was down for nearly three days. Data recovery took time, as did getting a new drive, installing an OS on it, and then rebuilding the system. It's been back for two days. I don't know how companies back up only once per year. It's like they are asking for disaster.
After one failure, the cost of not backing up becomes obvious. I have personally never worked anywhere that did not run a daily backup. Hell, I knew someone who was fired for messing up the daily backup two days running. No data was lost, but if a loss *had* happened in that window it would have been catastrophic.
The only place I knew of that did it weekly was a small computer shop.
Are companies really this cheap today because of excessive cost cutting and right-sizing? I find that hard to believe.
It's expensive, so management does not really want to pay for new tapes, a disk-based system or cloud backup. It requires personnel, which management does not want to pay for either. It's boring for the persons involved (who likes testing their backup?).
Maybe the reason why companies rarely back up "to tape" is that hardly anyone uses tape any more. Extra disks and RAID really make tape more-or-less pointless. Tape is also much less reliable than disks. Disks come with a warranty; tape does not.
Or at least don't answer them three sentences later.
Why Do Companies Backup So Infrequently?
...because they have always done it like that, and they have confidence in their own security and safekeeping of data.
I live in constant fear of the Coming of the Red Spiders.
Hah, I remember when a coworker of mine accidentally a couple gigabytes of data.
We put our main data on tape every workday, plus full server backups every 4 weeks.... :-)
Most companies have no risk management, and no clear picture of the risks their business faces.
The result of "intuitive" risk-non-management is that the usual human flaws have full impact. Basically, aside from a narrow middle ground, all risks are wrongly estimated.
The result is that people just expect them to work flawlessly -- not to fail. They also ignore other risks. I put in a machine for a customer a couple of years ago. They did not want to pay extra for backups -- ''yes, we will do that later''. They knew that I configured mirrored (RAID 1) disks. I set up a backup from one part of the disk to another and reminded them every couple of months that they would lose everything if it was destroyed or stolen.
Then a few weeks ago another unit on the industrial estate was torched -- arson. I have finally persuaded them ... I am putting in another machine on the far side of their factory that will take a daily rsync, and USB plug-in disks that they will back up to weekly & take off site.
These guys are not stupid in what they do professionally; they have an annual turnover of more than £1 million. Why does it take a fire at a neighbour to make them see sense?
Seriously. Most people aren't willing to put in the effort to determine just how bad things would be if they had to resort to their (often inadequate) backups, and therefore they aren't willing to pay the time and capital to get adequate backups.
If you want your company to get better backups, run a simulation of what would happen if something failed. What's the best recovery you could do? What business would you lose? Then calculate the probability of that failure occurring, and be generous.
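That calculation fits in a few lines. Here is a minimal Python sketch of it, assuming the simplest possible model (expected yearly loss = cost of one incident times incidents per year). The function names and every figure are made up for illustration, not taken from any real business.

```python
# Back-of-the-envelope risk estimate. All names and numbers are hypothetical.


def annualized_loss(cost_per_incident, incidents_per_year):
    """Expected yearly loss = cost of one incident x expected incidents per year."""
    return cost_per_incident * incidents_per_year


def backup_pays_off(loss_without_backup, loss_with_backup, backup_cost_per_year):
    """True if the yearly loss avoided is larger than the yearly cost of the backups."""
    return (loss_without_backup - loss_with_backup) > backup_cost_per_year


if __name__ == "__main__":
    # Assumed: a total data loss costs 200,000 and happens about once in 20 years.
    without = annualized_loss(200_000, 1 / 20)
    # Assumed: with tested backups the same failure costs 5,000 in downtime.
    with_backup = annualized_loss(5_000, 1 / 20)
    # Assumed: the backup system costs 3,000 per year to run.
    print(without, with_backup, backup_pays_off(without, with_backup, 3_000))
```

Being generous with the probability, as suggested above, just means plugging in a higher incidents_per_year and seeing whether the answer changes.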
especially when combined with 'find' and 'xargs', in what is supposed to be a simple task.
If you don't, you'll do something like what I just did ("worst typo in a decade"): you see, I was trying to update emacs and wanted to purge all the .elc files from ~/.emacs.d
Unfortunately, through a bad typo, some misapplied keyboard shortcuts, and rushing through without mounting a scratch monkey... what actually ran was effectively "find ~/.emacs.d | xargs rm".
I accidentally deleted the 'grep'. Oops. 15+ years of elisp etc. destroyed.
Was it backed up? Nope! Been meaning to check it all into git, but always put it off as a "minor, unimportant" task I'd get to later. Of course, we all think that way up until the disaster hits...
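One habit that guards against this kind of slip is a dry run: list what would be deleted, look at the list, and only then delete. A minimal Python sketch follows. purge_by_suffix is a hypothetical helper name, not part of find, xargs, or emacs, and it assumes you only want regular files whose names end in the given suffix.

```python
from pathlib import Path


def purge_by_suffix(root, suffix=".elc", dry_run=True):
    """Find regular files under root ending in suffix; delete them only if dry_run is False.

    Returns the list of matching paths either way, so a dry run can be inspected first.
    """
    root = Path(root).expanduser()
    matches = sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: prints what would go and deletes nothing.
    for path in purge_by_suffix("~/.emacs.d"):
        print("would delete:", path)
```

The filter lives inside the function, so there is no separate 'grep' stage to lose when editing the command line.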
*sigh*
This is not an automatic signature.
I think my record for getting things restored off tape is about 10%. It mostly seems to be a placebo.
'Do we have backups of FOO?' 'Oh sure, we backup everything nightly.' 'Thanks, could you get me... FOO/BAR?' 'Sure. Justasec.'
Two hours later I get the call. 'Uh, we're having some issues here, it'll be a bit longer...' Two hours after that 'How badly did you need FOO/BAR?'
Either the automatic backup system had been failing early in each run for months and nobody had noticed the error condition, or it was backing up but for some reason wouldn't restore, or they were doing incremental backups but someone forgot to change tapes overnight, or whatever else is on the long, long litany of excuses.
Which is why I always spin my own nightly work backups of all my machines to a 1 TB USB drive with rsync. Nightly and weekly. 100% success so far.
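A cheap way to find out that a backup is a placebo before you need it is to compare it against the source by checksum. This is a rough Python sketch, not what the poster above actually runs. verify_backup and file_digest are hypothetical names, and it assumes the backup is a plain file tree (as an rsync copy would be) that has not legitimately diverged from the source since it was taken.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return (relative_path, problem) pairs for files missing from or different in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

An empty list means every source file has an identical copy in the backup. It does not replace an actual test restore onto scratch hardware, but it catches the "failing for months and nobody noticed" case.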
"Although cloud backup solutions are becoming more common, still the majority of companies will do their backups in-house."
The cloud solutions actually open up a whole new set of issues. Let's assume for the moment that the technology is flawless, and dirt cheap. (I don't know if either is true, but those aren't my issue.) My company's biggest assets are digital. If I wanted to back up to the cloud, I would have to first go through a gauntlet of lawyers and senior executives and explain to them why I think it's safe to put our livelihood in someone else's hands.
Now, there are arguments for, and against, moving our assets to someone else's servers. But, the biggest obstacle to moving to the cloud is that I really don't want to talk to the lawyers.
Yeah, induction, folks.
It's been 6 months since I began my startup. That's 3600x24x30.5x6 seconds, or 15,811,200. 15,811,200 seconds of no computer problems, and 0 with.
Extrapolating out, that means infinite problem-free computin ??#@ NO CARRIER
And the restores often fuck up too.
Name me a good backup system that doesn't need to be nursed along most of the time.
all everyone needs is *restore*.
Backups don't have a real benefit for a company. Except the one you('d have) need(ed) for the urgent restore. So if you never had a full restore...
Where I work, the admins decided one daily snapshot should be enough. Fortunately, after I told them, the business doesn't seem to mind enough to budget for backups earlier than 6 months from now at the soonest, so everything is hunky-dory, right?
:(
Sometimes I hope the plans for this whole "Business Continuity Project" will be lost due to a storage failure...
Posted anonymously, because I'd rather keep my job
Mod me down for trolling all you want, he deserved the pun after not making backups for 15 years consecutively.
I was promised a flying car. Where is my flying car?
Offsite backups are what a lot of companies don't do. They might back up to tape, but the tapes are stored in a pile next to the server. And they never test them.
I'm not sure why everyone gets so ecstatic about those cloud backups. If we had to send all our data over the internet connection, the backup would take an unworkable amount of time to complete.
Even to the local LTO-4 drive, which runs at over a gigabit per second, the backup takes an appreciable amount of time.
Cloud backup may be good for a 3-man company doing document editing, but with the amounts of data that are common these days, and the speeds of internet connection that you normally have, I don't see it as a realistic possibility.
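The bandwidth argument is easy to put numbers on. Here is a small Python sketch of the arithmetic. transfer_hours is a made-up name, and the 80% link efficiency and decimal gigabytes are assumptions, so treat the results as rough.

```python
def transfer_hours(data_gb, link_mbit_per_s, efficiency=0.8):
    """Hours needed to push data_gb (decimal gigabytes) over a link of the given speed.

    efficiency is the assumed fraction of the nominal link speed actually achieved.
    """
    bits = data_gb * 8e9                      # 1 GB = 8 * 10^9 bits
    usable_bits_per_s = link_mbit_per_s * 1e6 * efficiency
    return bits / usable_bits_per_s / 3600


if __name__ == "__main__":
    # Hypothetical example: 2 TB over a 10 Mbit/s uplink vs. a local gigabit path.
    print(round(transfer_hours(2000, 10), 1), "hours over the internet link")
    print(round(transfer_hours(2000, 1000, efficiency=1.0), 1), "hours at a full gigabit")
```

With those assumed figures, the 10 Mbit/s uplink needs about 556 hours (over three weeks) for a full 2 TB backup, while the same data at a full gigabit per second takes under five hours.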
Just a note:
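rsync -a --delete videos archive
rsync -a --delete videos/ archive
The first one copies the videos directory itself into archive (as archive/videos) and only deletes extra files inside archive/videos. The second one copies the contents of videos/ directly into archive and deletes everything else in archive. (Without -a or -r, rsync skips directories altogether.) :-(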
We do incremental daily backups of main data to LTO4, they go offsite the next day, and a full backup every week, per set. We have 3 sets - there is always one off-site. We also locally sync startup disks, Exchange data goes over the wire every night, finance database is backed up every night via FTP over the wire. I think we worked it out that we would only lose a week's worth of data if the office burned down AND the van carrying one set of backup tapes crashed and exploded.
I've always wondered if /. bothers to back up the stories and posts...
Anyone here know?
How about for other 'trivial' sites, like reddit, youtube, etc?
You need to think about what data you are protecting, why, how you RESTORE it, and whether it needs an offsite solution at all.
If you do that properly you knock a zero or two off your technical volumes, every time.
Conversely, it's seldom worth doing that at $SMALLCO because the entire enterprise fits into a Terabyte or less and storage is cheap. The danger is that throwing it onto something offsite - whether it's disk, tape, cloud, or ferromagnetic core memory - can lead to lazy thinking about what happens when you need the data in an emergency. Even if you do business by emailing spreadsheets around you might end up with inconsistencies in your backups, and if it's the payroll spreadsheet that Daphne left open overnight for a week...
"... and more and more now there are all kinds of electronic goodies available" -- Pink Floyd 1972
... because hard drive costs have come down so much it's just cheaper to buy a bunch of hard drives and mirror like crazy. Another factor is speed. Backing up costs time and time is money. So it makes sense that more and more organizations have moved to mirroring/RAID solutions.
RAID is pretty damn robust these days too, and with drives as cheap as they are you can create many mirrors at once. I've run RAID 5 for the past 7 years and I've never lost data: drives have always died one at a time in an array. As hard drives got cheaper, I could back up the entire array to a single hard drive before starting to replace the drives in the array. My first array was 4x160GB drives; as soon as 500GB drives became cheap enough, I could copy the contents of all 4 disks onto 1. I imagine this pattern has happened for many organizations as well. Drives just keep getting bigger and cheaper fast enough that you can affordably back up what used to be a 'big drive' in a year or two's time as the prices come down.
The hotel where I work has a system that does an incremental backup each night Monday to Saturday, then a full backup on Sunday. I've tried to explain to my bosses that the only way we can use this to its full advantage is to have a different tape for each night. They nod their heads, yet in the year and a half that I've been there we've been using the exact same tape every night.
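To see why one tape per night matters, here is a small Python sketch of that schedule (incrementals Monday to Saturday, full on Sunday). The tape labels and function names are hypothetical. The point is that restoring any given day needs the last Sunday full plus every incremental since, and a single reused tape only ever holds the most recent night.

```python
import datetime

# Hypothetical labels, indexed by date.weekday() (Monday is 0, Sunday is 6).
TAPE_LABELS = [
    "MON-INCR", "TUE-INCR", "WED-INCR", "THU-INCR",
    "FRI-INCR", "SAT-INCR", "SUN-FULL",
]


def tape_for(day):
    """Label of the tape that should be in the drive on the given date."""
    return TAPE_LABELS[day.weekday()]


def restore_chain(day):
    """Tapes needed, in order, to restore the state as of the given date."""
    days_since_full = (day.weekday() + 1) % 7      # 0 on a Sunday
    last_full = day - datetime.timedelta(days=days_since_full)
    return [
        tape_for(last_full + datetime.timedelta(days=offset))
        for offset in range(days_since_full + 1)
    ]


if __name__ == "__main__":
    print(restore_chain(datetime.date(2024, 1, 3)))   # a Wednesday
```

With seven tapes, a Wednesday restore reads four of them. With one tape reused every night, Wednesday's incremental has overwritten the full backup it depends on.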
To most PHBs a computer is a toaster. That's right, an appliance. Nothing more. Most have no idea of the nursemaiding a computer needs, or how vulnerable company data is when it's reduced to 0s and 1s. Until the "unthinkable" happens and the toaster breaks.
Now. I used to be in the biz of disaster recovery. It was lucrative when consumers who valued their potentially lost data found me and asked me "Can you do it?" and I said "Let's have a look". My success rate was something like 40%. Surprisingly (for me anyway), my better corner was recovering data from physically damaged flash drives. Out of several dozen of those I only had one that I couldn't recover anything from. Hard disks are a different animal, and the whole thing can be frustrating when you sit back for a minute or five and consider how much easier your work would be (and how much hardware you can sell) if your client had the benefit of hindsight and someone around who knew the shit of which he spoke - so a week-long recovery project that may or more than likely may not work turns into a two hour exercise in restoring on new hardware and sending the happy client on his way. Now, that is an exercise in getting repeat business through recommendation.
As far as "accidental" deletion of data: there is only so much you can do to protect the stupid from themselves - on a network share, for instance, you can deny users the right to delete files. Job done. In an environment that is supposed to be secure, that's a good start. VSC is another handy tool but you don't need to tell the stupid that their fuckups are (sort of) covered - it breeds complacency and does little to nothing to train responsibility.
Operation Guillotine is in effect.
...Stand back, look at your management, and think of them screaming "Profit" in the most retarded voice you can think of while imagining them in a mentally handicapped pose.
Honestly, it's unfair to the Mentally Handicapped to be compared to Business Management, the mentally handicapped can never be so petty and single minded like today's management skill-set.
It's probably due to the "Satisfaction Now" syndrome that the American lifestyle has perpetuated: "Spending money on a backup solution just in case our servers/data center/area has a fire/theft/natural disaster would cut into our profit margin and your bonus (...which I will use metrics management to trick you out of...). We can worry about that later, we need to make profit now!!!"
There is no "Slow Growth to the Top" any more; "Faster, Cheaper, Better" has even stepped aside for just "Faster-Cheaper-Faster".
Remember, only you can stand up to the Mental Retardation of your company.
Computers may be too reassuring in their reliability, but the problem with backups is that they're NOT reliable. Very few backup systems are a setup-and-forget kind of tool. They can:
* Fail to run at all
* Fail to run regularly
* Fail to inform when they don't run
* Inform too often
* Back up all the wrong things
* Run out of space
* Back up all the wrong things and run out of space
* Fail to maintain their own backup integrity
* Fail to restore
* Fail to be even theoretically restorable
* and so on...
I've found simple, reliable tools like rsync and faubackup to be a godsend. Without them, stuck with Windows-style tools or crap like Bacula, I'd probably just use RAID with Git. That would NOT be ideal for some data sets, but it would be far better than most backup tools.
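Several of the failure modes in that list (not running, not running regularly, not telling anyone) can be caught by a dumb independent check on the backup destination. A minimal Python sketch, under the assumption that backups land as files in one directory. check_backup, the 26-hour threshold, and the message texts are all made up for illustration and not part of rsync, faubackup, or Bacula.

```python
import time
from pathlib import Path


def check_backup(backup_dir, max_age_hours=26, min_bytes=1, now=None):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    now = time.time() if now is None else now
    files = [p for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not files:
        return ["no backup files found"]

    problems = []
    newest = max(p.stat().st_mtime for p in files)
    age_hours = (now - newest) / 3600
    if age_hours > max_age_hours:
        problems.append(f"newest backup file is {age_hours:.0f} hours old")

    total = sum(p.stat().st_size for p in files)
    if total < min_bytes:
        problems.append(f"backup is only {total} bytes, expected at least {min_bytes}")
    return problems
```

Run from cron on a different machine than the one doing the backups, and have it mail you whenever the list is non-empty. It says nothing about whether a restore works, only that something recent and plausibly sized exists.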
Perhaps this will help
https://github.com/movitto/snap
Years ago, our office burnt down. The business owner/my boss happens to be the most prepared guy in IT in the world, keeping frequent backups all over the place - i.e. in-house backups, local data warehousing as well as interstate data warehousing.
It would take a lot to destroy our business.
I back up to a backup server every hour and then spool to tape every night.
Once a week that week's tapes get sent offsite and three-month-old tapes are returned for re-use.
If the company's IT department is NOT doing daily backups then its IT department is run by an idiot.
If management will not pay for it, then they need to be told, "so everything you do today does not matter and has no value? because that is what you are saying when you say backups are not important. Hard drives are not reliable, if they crash tonight and we lose everything, how much money do we lose?"
You have to talk to them in management speak... AKA money.
Do not look at laser with remaining good eye.
So a memory hierarchy emerged: local use on solid-state or disk, first-level backups on disk, and second-level backups on tape. That sounds reasonable for organizations big enough to need tape.
You'd be surprised how many businesses have clauses in contracts that essentially demand services be on site. For this reason and many others, cloud backup, mail etc may not be getting adopted as quickly.
Back up to the biggest media you can afford and reliably restore from. That way you reduce the risk of failed media destroying your backup.
Store backups in a remote location. If a fire destroys your backups then you're screwed. It doesn't matter how sound your media is otherwise. This should be a no-brainer.
Hard disks are not very reliable for backups even though they are convenient. They just have too many moving parts, and the number of those parts goes up as the disks get bigger. Higher costs and higher failure rate. You need to back up to as few media units as possible, so just throwing big disks at your storage solutions is not a safe or cheap backup plan. It's too risky transporting a bunch of high capacity hard disks around.
Mirroring large drives is a much better solution, but you'd have to have at least two of everything for that to work, so it's costly and you'll be at the mercy of your network in the case of a downtime.
Hence tape. There is nothing as reliable and cheap for the same capacity. Even though the hardware platform may be costly you'll make it up on drive failures. They can also be transported safely as long as they don't go through a strong magnetic field.
I work in SMB consulting and we get a lot of penny pinchers who come to us and express an interest in "cloud backups" or "offsite backups".
None have the bandwidth for it, but I always ask them how they would handle a large restore. Some cloud backup services claim to supply you with a USB disk, but what's that involve? If you're lucky, next day air delivery? And what do you get, just a USB disk with all your files? No NTFS permissions?
What about data from applications like Exchange? SQL? Tons of places run apps that use SQL Express DBs that won't get backed up with cloud software.
It's probably a decent service for an office with six guys and a file share, but it's hard to see the value of it unless you get into cloud backup systems that are more sophisticated and you have serious bandwidth to throw at it.
Businesses are on average backing up to tape once a month.
Was this survey done for tape companies or what?
I'm CIO at a small company. We do not own a tape drive. The amount of data that our company has is tiny. Less than 500GB total. There's no need for a tape drive.
We are religious about backups and retain 60 days of daily backups for every system. We have about 15 systems - email, CRM, DMS, VPN, development ... you get the idea. Since we don't run MS-Windows, we don't need 50-100GB to back them up. Many are just 2-4GB in size and that includes the 60 days of reverse incremental backups. A few are 20GB.
Further, I've tested the restore more than a few times - oops. 20 minutes later we have the OS + Apps + Data fully restored from the 2am version.
Cloud Backup? Are you serious? Not a chance. Our data contains proprietary client information and I'm not interested in wasting bandwidth pushing that onto someone else's storage. We have 5 portable HDDs and encrypt the data written to them. One night a week, a different VP in the company takes home an encrypted portable HDD with all the company data on it. This keeps them tied to it. It means there are 4 copies offsite.
I've worked at enterprises with large tape silos and probably purchased 30 drives at around $30K ea to put into those over the years. I've also purchased $10M in EMC storage over the years at that same job. Tape makes sense there, but not at a business where there simply isn't that much data.
Like we aren't doing backups ... get real. Ask a better question.
I expect a lot of companies are just like my mom. Having never experienced data loss, she doesn't see the point of backup.
Two things at stake. One of them we can lay squarely at the feet of the customer; the other (those of us who provide IT services) we have largely ourselves to blame for.
1. There's an old adage - there are two types of people in this world; those who take regular backups and those who have never lost data. I'm astonished there's anyone left on the planet who's never lost data, but it appears there's actually quite a few. This is really something to lay squarely at the feet of the people who lose data.
2. Backup is the wrong thing to do. We shouldn't be advising businesses to backup and we shouldn't have been doing so for at least 10 years.
Why do I say this? Well, how often do you find the client's been taking backups religiously - onto tape media that's so obsolete it could take weeks of hunting to find a suitable drive? Or that they've been backing up their data all right - but that data depends on software that isn't part of the backup, developed by a company that went out of business two years ago? Or that a former IT provider set them up with backups some time ago using a piece of proprietary software that hasn't been supported since Windows 2000 and won't even fire up on anything more recent? Or that they can recover their systems just fine - but their systems aren't much use without some extra doohickey plugged into the back, and there's only about 100 of them on the entire planet because they're custom-made?
What we should be advising is "Okay, forget about backup. Think about your business processes. How quickly and easily could you start doing them again in the event of the whole lot burning to the ground? What would you need in place? How easily could you get hold of it? That might be backups, but it might also be physical equipment or knowledge that only one person holds. Write down a plan - it doesn't have to be complicated, for a very small business it might be as simple as 'I'll drive down to the store, buy X, Y and Z and recover data' - and go over it, make sure it's still relevant and test it once a year."
This has two enormous advantages:
1. It's much less likely that any of the gotchas mentioned above - or any of the hundreds of others I'm sure anyone commenting can think of - will trip you up.
2. It's much easier for a business to think in terms of their own business, which means you're not asking them to concentrate on something they don't understand and are less likely to see the benefit of.
Back in the bad old days businesses just kept two sets of books. One was a fantasy either for tax purposes or to make the business look good to a potential buyer. Now some small and slightly larger businesses just create their books from whole cloth. Customer lists or other vital items might be kept elsewhere. But a lost computer means little as the entire financial history of the business is fabricated every year. I doubt that even ten percent of beauty salons have books that report the truth. Methods include not reporting cash purchasers at all. Usually they are smart enough to know how to get around paying taxes and feel justified in doing so. After all waitresses don't declare all of their tips and the rich have artful constructs to avoid taxation. Other people feel the same privilege should be theirs as well. So who cares if the computer crashes a lot?
Runs nightly, full backup, archives yearly, monthly, and weekly.
Recovery scripts and directions located locally, offsite with the backups, and on my laptop (as well as in our company Redmine which runs on a separate server which is backed up similarly).
I test recovery once a season (to make sure it still works and to remind myself how to do it).
It wasn't rocket science to set up, and took a few days to stand up and fine tune.
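For anyone wanting to copy the nightly-plus-weekly/monthly/yearly archive idea, the pruning rule is the only fiddly part. Below is a rough Python sketch of one way to do it, not the poster's actual scripts. It assumes one backup per calendar date and keeps the newest backup in each of the most recent N day, ISO-week, month, and year buckets. The function names and default counts are made up.

```python
import datetime


def newest_per_bucket(dates, bucket_key, limit):
    """Newest date in each of the `limit` most recent buckets, where bucket_key groups dates."""
    kept = {}
    for day in sorted(dates, reverse=True):
        key = bucket_key(day)
        if key in kept:
            continue                # a newer backup already represents this bucket
        if len(kept) == limit:
            break                   # enough buckets; everything older is not kept by this rule
        kept[key] = day
    return set(kept.values())


def backups_to_keep(dates, daily=7, weekly=4, monthly=12, yearly=5):
    """Union of the daily, weekly, monthly, and yearly keepers; anything else can be pruned."""
    keep = set()
    keep |= newest_per_bucket(dates, lambda d: d, daily)
    keep |= newest_per_bucket(dates, lambda d: tuple(d.isocalendar())[:2], weekly)
    keep |= newest_per_bucket(dates, lambda d: (d.year, d.month), monthly)
    keep |= newest_per_bucket(dates, lambda d: d.year, yearly)
    return keep


if __name__ == "__main__":
    end = datetime.date(2024, 3, 31)
    nightly = [end - datetime.timedelta(days=i) for i in range(400)]
    for day in sorted(backups_to_keep(nightly)):
        print(day)
```

A prune job would delete every backup whose date is not in the returned set. Testing it on a list of dates, as in the example, is a lot cheaper than finding out on real archives.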
I am very small, utmostly microscopic.
We do nightly backups to hard drives (we have 4 sets that we rotate.) We also do an off-site backup to a server in another location in our city, another off-site backup to a server in a different city, and a dump of critical data (our source code and customer databases) to a USB disk in yet another location in our city.
Granted, we're a small company, but those are the companies that are supposed to have bad backup practices. It's really not that hard to set up a few cron jobs to automate nightly backups. There's no excuse not to do it.
The last company I worked at had a good system. Incrementals throughout the week, and full backups Saturday night. Incremental backups averaged 75GB or so because of the media designers. Full backups got to be large at close to 2 TB.
We used removable hard drives with a dock. The tapes were unreliable. Backups and restores failed constantly (probably 20% of the time). I had a 100% success rate with restoring off the drive. I had one drive fail on me in a little over a year's time. Once a drive was used X number of times it got pulled from rotation and saved for use in somebody's desktop, or saved as a spare for backups.
One of the best backup strategies we had was utilizing a second backup server. 95% of the restores we had to do involved people accidentally deleting files, and recognizing what they did right away. All we had to do was go to the second backup server with day old files on it and copy the file back to the server. No messing with switching around tapes or drives, and no messing around with backup software. It was a huge time saver.
We back up to multi-TB dedupe disk units, about 25TB a day to disk storage, which is instantly cloned to another site on the other side of the city. We keep backups on disk for 6 months. Once a month we snag one full backup (about 150TB) and dump that to tape for offsite bunker storage in another city.
I can think of several reasons why backups are not as prevalent. Hardware is more reliable and cheaper, so you can duplicate everything very easily. Try backing up multi-TB databases to tape and see how far you get. Company storage is enormous, and some companies are running multi-petabyte databases. You ain't gonna dump that to tape very easily: it takes a day or two streaming to 20+ tape drives in big 500+ tape libraries. That all costs money to run and move about. Simply buy shedloads of cheap SATA arrays, RAID 'em, then duplicate that on the end of a fibre line and use that to store your backups.
Each business must individually evaluate the risks and costs of data loss, what risks they need to mitigate, and how much the business can spend on that mitigation in both fixed and ongoing costs. It's not "alarming" that a business only runs a backup to tape every 6 months if losing 6 months' worth of data is acceptable to the business. There seems to be the assumption in the article that *every* business has the exact same risks and risk tolerance.
As the cost per GB of disk drops and the advantages of deduplication become clear, more and more businesses are finding that they can cut way back on or completely eliminate tape backups.
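For anyone unsure why deduplication changes the disk-versus-tape math, here is a toy Python sketch of the idea: split data into chunks, store each distinct chunk once under its hash, and keep a per-backup "recipe" of hashes. DedupStore is a made-up class. Fixed-size chunks and an in-memory dict are simplifying assumptions, and commercial dedupe appliances are considerably more sophisticated.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}            # sha256 hex digest -> chunk bytes

    def put(self, data):
        """Store data; return the recipe (list of chunk digests) needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # no-op if the chunk is already stored
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        """Bytes actually held, after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Nightly full backups of mostly unchanged data produce mostly repeated chunks, so the store grows by roughly the amount of changed data rather than the full size each night, which is what makes keeping months of backups on disk affordable.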
I see it a bit differently. In big companies the solution is 'delivered' by engineering while maintenance is done by operations. Engineering typically thinks it only has to deliver a semi-working proof of concept that does something, and puts the burden of stabilizing, debugging, etc. on ops. Basically, the backup question is always answered with: you have a bunch of servers, you have some other servers? Do rsync/tar/etc. yourself, it's doable! It's not engineering's freaking job to deliver a polished solution. I have seen that attitude in every company with a clean engineering/operations segregation, but it doesn't exist in small companies or companies with a devops approach.
Permissions.
Between the human permissions (e.g. management) and file permissions, backups are often incomplete unless the root user runs them to a physically attached drive. Even then you have things like distributed nodes that have to all be backed up simultaneously or not at all, depending on the replication mechanism.
It's a quagmire.
One of the less-than-ideal methods of backup is to have a fail-over hotcopy duplicate of every production machine that just continuously keeps the machines in sync, but what if the machines are file servers? That takes hours. Like holy crap, at two different very large American companies I worked for, they've had the entire system down for 8 hours every night just to run maintenance at one.
It seems that distributed file systems and cloud systems are basically the only remaining solution, but you can't backup any of it. If a node blows up, other nodes take over, but should large numbers of nodes blow up, f*cked. Say a datacenter catches fire. Plus there's no way of rolling back the clock should you overwrite your files. Then there is data deduplication. The direct opposite of keeping backups.
So the end result is that you get half-assed backup solutions that only restore the most valuable files, usually in insecure means (USB drives and Tapes) that have a human failure potential.
My client has the married-pair backup system. If the datacenter catches fire, they go out of business.
Disaster (building exploded), and dumbaster (Judy deleted her Excel doc).
In companies that have the resources to recover from the former, replication is taking the place of tape. The cost of tape versus disk is about a dead heat now. Disk is also 4.2 bajillion times easier to manage than tape, specifically when you decide to change hardware. The other thing is snapshots: very easy to right-click, previous version, done.
The company I work for backs up 10+TB of customer data each night to two different off-site storage locations that are over 100 miles away from our main site (and from each other) and hold data going back 3 months. One of the sites is a skeletal second datacenter with just enough hardware to get back up in case our main site gets toasted, and it has some ability to expand. Our connection to these sites is over a non-internet VLAN through our ISP. The secondary datacenter site can easily have its connection upgraded to an actual internet link if something bad actually happened.
Our company's own data is backed up somewhere between nightly and weekly, depending on what data you're talking about.
For the same reason that
If we didn't have fire safety regulations, most businesses would have poor fire safety
If we didn't have OSHA regulations for jobs with physical work, we'd have more workplace injuries
If we didn't have building codes, the buildings most businesses operated in would be unsafe
If we allowed the importation of unsafe cars, businesses would buy them
Doing things right costs money in the short term. Most businesses and business people think and act short term.
>showing 10 percent were only backing up to tape once per year
You mean showing 10 percent were only backing up to tape once per year to a paying third party vendor....
instead of in house which is cheaper when you know what you are doing...
I love when they twist these headlines to make them more attractive..
Did anyone see a link to the actual study in the linked article? I didn't and I read the whole thing. I call BS on this made-up "study" to sell puffy cloud ideas.
We do Veeam backups of our virtual infrastructure nightly. Once a week, a copy of that is taken offsite. Also, every night our EqualLogic SANs replicate with each other. They are in three separate offices in North America. In the event one building burns down, is blown up by a nuke, or similar, we can fire up our entire virtual infrastructure in the failback location within a couple of hours (minutes really). Since we only have 3-4 non-virtualized servers, and none of them store important data, we're pretty well protected I think.
When I was an IT guy we backed up once per week, doing a full computer image of desktop + server every Wednesday. We kept 4 weeks of backups on hand, and we had auto backup software that ran whenever files were changed, did almost what you could call SCM, and backed up every new edit of a file with stored revisions. This was all done to tape.
Backing up is expensive and slow. Newer hardware is crazy expensive, and hard drives are cheaper per byte than magnetic tape. You can back up faster to tape, and most backup software companies use virtual tape that takes advantage of cheap hard drives.
But most of the time, the test cases, use cases, user documentation etc. get the short end of the stick. Especially code documentation that does not sit well inside the source code. [In grad school I used a bizarre thing called CWEB that had a single source for both code and documentation. You run cweave to get the PostScript document via TeX and LaTeX. And you run ctangle to get C sources. There was a Fortran version too. But it never seemed to gain commercial success]. Scientific papers that form the basis of the scientific codes, profiling data, important lessons learned about the data structures and scaling, painful gotcha bugs fixed etc. are rarely documented, forget about comprehensive, indexed, searchable documentation of the lessons learned in developing the code.
Outside software development and maybe accounting, there are areas where the need for backups conflicts with the desire for secrecy and control and, frankly, paranoia and delusions too. Salesmen guard their rolodex and its electronic clone zealously. Business plans should not be leaked, strategy session presentations should be kept in strict confidence etc., and most executives don't realize that most sysadmins and IT staff already have full access to the un-backed-up data. They are not losing much by allowing a clean backup solution, but they don't seem to realize that.
Then there is Legal. They are so worried about electronic discovery. Now there are AI-equipped scanners that are sixth or seventh generation progeny of humble awk and grep. They see patterns, they find keywords, they even find the euphemisms and missing links, and they can tell when a sensitive thread of communication goes off electronic to avoid leaving behind an e-trail or paper trail. So they too want to control the backup process.
In the end, corporate back up is not something simple like you have in your home, where the data preservation is the only criterion.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
"Snapshots/CDP"
"It is common now especially with SANs to use Snapshots and CDP" - Is it!? Snapshots yes, CDP no way. How many use that other than banks? And banks will be making damn sure they have good backup processes in place.
"but any block level changes would mean an increase in the replication traffic" - Block level changes are occurring all the time, snapshots won't make a difference, and the snapshot size only increases if there is more changed data since the last snapshot.
"we still need to defragment as fragmented volumes take considerable time to backup" - this does not apply to all SAN technology, certainly not NetApp.
If the section on Snapshots/CDP is anything to go by, then not only is the article rubbish, but the original research behind it is probably suspect.
Yeah, I have a client whose medical records are truly paperless (no paper copies squirreled away), so losing the data would be disastrous. Guess how often we back up to tape? Never.
But how often do we backup? Uhm, continuously actually. Versioning and extensive logging combined with near-real-time replication means that if the server storage goes up in flames, we lose at most a few minutes of data, more likely only a couple of seconds. And yes, the versioning allows us to get back to prior versions, so it counts as backup, not just a clone.
Now 240GB may be too small for a backup, and the cost can add up fast.
At first I thought this would be a good article on getting clients to back up more, but after reading it a couple of times and double-checking my thinking by reading the comments, it's pretty obvious that the author knows nothing beyond a statistic he/she read about backups. Poking holes in his logic:
1. Tape backup currently does not have the capacity and speed to keep up with the size of modern filesystems. Solution? Create an offsite backup scheme where data is deduplicated at the source and only the deltas (changes) are transferred to the backup site. That way the backup site can chug merrily away backing things up without causing issue with the workload.
2. 99% of data recovery occurs at the file level. A user accidentally deletes a file, overwrites it, or the file is corrupted. Windows Volume Shadow Copy service was created for this specific purpose so a user can recover without bothering the admin. If you don't have Windows, every major SAN/NAS vendor uses snapshots to do the same thing. Next is disk level recovery using RAID.
Actual total, catastrophic failure is very rare. I like to tell clients to prepare for being hit by a meteorite, but PEBCAK errors are far more likely and more dangerous.
3. WTF is a "Windows Write Driver"? At first I thought this was some wondrous new feature of Windows 7 that defragmented on the fly, but no, the author is talking about Data Consistency Points. According to the article, when an OS (only Windows exists to him it seems) writes to a SAN it just blasts the data straight to disk and bypasses the cache. What he doesn't realize is that the write goes to memory cache (usually two), where it is checked against itself for consistency (is everything here?) and THEN it's written to disk. Writing straight to disk NEVER occurs, even on a desktop. There is always cache and consistency checking somewhere along the way.
Data consistency checking came about in the sixties and is used by every single storage vendor today. EMC, NetApp, whoever; they all do it.
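To make point 1 above a bit more concrete, here is a rough Python sketch of source-side deduplication: the client hashes fixed-size chunks and only ships chunks the backup site has not already stored. The function name, the tiny chunk size, and the idea of keeping a plain set of known hashes are all assumptions for illustration, not how any particular vendor does it.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo, real systems use far larger chunks


def chunks_to_send(data: bytes, known_hashes: set, chunk_size: int = CHUNK_SIZE) -> dict:
    """Return {sha256_hex: chunk} for chunks the backup site does not have yet.

    `known_hashes` stands in for whatever index the backup site keeps
    (a hypothetical in-memory set here).
    """
    new_chunks = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in known_hashes:
            # Identical chunks collapse onto one key, so each is sent once.
            new_chunks[digest] = chunk
    return new_chunks


# First backup: nothing is known, so every distinct chunk is sent.
first = chunks_to_send(b"AAAABBBBCCCC", set())
known = set(first)

# Second backup: only the changed chunk crosses the wire.
second = chunks_to_send(b"AAAABBBBDDDD", known)
```

On the second run only the chunk holding `DDDD` is transferred, which is the "only the deltas" part of the argument.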
I am Homer of Borg, resistance is - Ooo Donuts!
Pretend a guy prevents the 9/11 attacks by requiring all airliners to have a bulletproof, locked cockpit door. Hundreds of millions of dollars are spent, and prices go up. Everyone complains about government regulation. The attacks never happen. The instinct was that we just wasted a bunch of money. We spent hundreds of millions on H1N1 vaccine for an outbreak that never happened. But if it did, the vaccine would have saved many lives and prevented great financial loss. But it never happens, so it's seen as lost money.
You push for backups, and you spend tens of thousands on it for years. There is a disaster eventually that you recover from quickly, but perhaps you aren't around to take credit for it. Or you restore and everyone thinks that it was supposed to because we spent so much on it. No one will think, "Well, we just saved millions of dollars of downtime because of the backup." The thinking is just, "Oh, well it was supposed to. We spent so much on it."
This kind of myopia is commonplace in the world. We can invest money to treat everyone for chronic illnesses such as diabetes, high blood pressure, and high cholesterol with really cheap drugs. We can vaccinate everyone from childhood diseases. "Oh, but vaccines cause autism." Yes, but your child isn't paralyzed from polio. The avoided disaster is never quantified; only the cost spent shows up on your calculations.
A NYC lawyer blogs. http://www.chuangblog.com/
That is just a copy and prone to change. Disconnected media is a real backup that you can be sure is the state of the files at the time it was written.
A former web hosting company near me had copies instead of backups, and both systems had their data compromised, resulting in a complete loss of all their clients' data. While the above has more redundancy than that failed company, it still has the problem of live systems being subject to change.
LTO5 is cheaper, larger and faster than many people think and it's not the only solution. Since every year or two I get some tape reels from the 1980s transcribed without problems I think I'll be able to trust those far better LTO5 tapes for at least a decade.
Scene: two IT guys, covered in ash, visibly shaken, a few minutes after the South tower collapsed.
IT guy #1: You had an offsite backup, right?
IT guy #2: Of course I did. Do you think I'm a complete idiot?
IT guy #1: Thank god. Where is it?
IT guy #2: Over there (points)... in the North tower... (scene fades to black, URL of some online backup company appears)
Of the roughly 200 customers that my IT firm manages, I'd say at least 90% of them back up nightly, and on Fridays do a full weekly backup.
If BUE misses something, we go in and fix it that day.
Serious companies that value their business always find a way to back up their data daily, and for some critical applications ongoing backups are used.
http://www.montuori.net/
It's 1:37am and I'm at work writing crap on slashdot now because 3 drives failed in a RAID10 array and I'm waiting for a rebuild. Maybe it's the card, but something is definitely wrong and I may have to restore to somewhere from backup.
Even having another system preferably offsite as a mirror doesn't save you from files being overwritten and even snapshots can get screwed up. A tape in a box on a shelf saves you from that.
WHICH COMPANIES?
Show me a Fortune 500 not backing up, if you find one I would be interested in what they are not backing up.
For small data sets with limited retention needs, maybe portable drives are an okay solution. However, I doubt I want to find version x of file y under that system.
I back up, daily, nearly 10TB of data and it is always to tape. Why tape? Because it is portable, easily managed, and well known. That, and I need so many of the things that I have tape libraries to manage what is on site and off site.
Yeah, I bet the problem exists in SMALL businesses, but the article is implying all and it certainly isn't true. We are bound by law to keep specific data for set amounts of time, usually measured in years. Believe me, it's far cheaper to keep anything than to be caught without it.
* Winners compare their achievements to their goals, losers compare theirs to that of others.
NTFS permissions are a biiiitch >_<
The only things you can back up to that preserve them are:
1. An NTFS disk plugged directly into the computer
2. An NTFS filesystem container
3. A network share on an NTFS disk that supports NTFS permissions.
Either that or you take a raw binary image of the disk (DUMB) or use a proprietary backup system which is just using a proprietary container format anyways (DUMBER).
"When information is power, privacy is freedom" - Jah-Wren Ryel
At the end of the day I still think tape is the longest lasting and most reliable. Not the cheapest, to be sure, but it is very much a tried and tested technology. I simply do not have enough faith in hard drives and flash drives. I do use a hard drive for a weekly secondary backup, but I still feel a little edgy about it.
The world's burning. Moped Jesus spotted on I50. Details at 11.
Seriously, "the cloud" is a fad and anyone who is putting their sensitive data there should be given a sec-idiot badge. It's no different than putting your backup server in the external dmz!
How does that price compare to the computer with hotswap bays you need for backing up to HDDs?
Not to mention the price difference between jukeboxes, which really go in favour of tapes?
To be fair, most people going to hard drive backup are re-considering the whole backup picture.
I've walked clients through this a few times, and typically we wind up with a full ZFS-based server, with data compression, de-duplication, and months of on-line backups, with pairs of drives going offsite daily (ideally in two directions).
Based on average compression and de-duplication ratios, you're looking at about $9000 for a system that can manage about 200TB of data. Perhaps a bit more if you want to keep lots of offsite copies.
One client was so happy about having realtime access to 6 months worth of revisions, he decided it would be best to buy two systems set up with realtime replication, just in case one went down or needed maintenance.
So, yeah, 9 cents per gigabyte is high these days, but it's really a higher level of computing.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
I work at an accredited university. To maintain that accreditation we must adhere to strict network guidelines, we backup to tape EVERY DAY.
They will have confidence until their RAID goes south, or there's a fire the day before their yearly backup.
A snapshot clone to SAN is most certainly a backup solution, provided you retain iterations of those snapshots and don't overwrite. It inherently provides redundancy that tape cannot (striping over multiple spindles, failover shelf copies, multi-pathing, etc.). Tape writes a block of backup data to one tape, one time, and those tapes are exponentially more susceptible to static during handling than drives. Unless you configured auxiliary copies to run tape to tape, you have no redundancy in your backup, and if you do, you are using double the media per backup.
That leads to discussions on how cost effective your backup actually is.
LTO5 tapes cost $70-$100 per, depending on your distributor.
The devices to read/write LTO5 cost $1500-$2500 each.
The robotic libraries to hold those drives cost $10k+, easily reaching above the $100k range for marginal capacity needs. A $mil solution for tapes is not remotely out of the question.
Maintenance contracts on those devices can easily be $10k+/year.
Now start measuring the costs for offsite storage of those tapes. How long do you keep them? Where do you store them? How do you return them for use in the case of a restore? The contracts for these (companies like Iron Mountain) are pretty pricey.
Now consider that if you retain backup copies for long periods of time, you must also maintain dated technology that can read them, and in most corporations that also requires maintaining support on devices you never use and hope to never use. If your data retention reaches back 7+ years (required by law for many financial/federal institutions) then you're talking about having DLT tapes and libraries sitting around still today, unless you have a staff, processes and an inventory of LTO onto which you can copy the old data.
Tape streams data to linear storage. Your choices become one client server writing to one tape or many servers writing to many tapes. If you choose the first then you must either have one writer direct-attached to every client server, or you have very poor-performing backups because of network bottlenecks, forcing increased backup windows or additional drives to stream the backups across. Most corporate backup solutions are forced to stream many client backups over pools of tapes. You can create multiple pools so that you have matching retention periods. You can then retrieve tapes from storage as they expire and return them to libraries where they can be re-used. Obviously the media handling process requires hands-on administration (increasing your personnel costs). Every tape can be re-used a limited number of times before you start getting read/write errors and it must be retired and replaced (increasing your run costs). You cannot reuse a segment of a tape. You must keep the entire tape if it has even one block of protected data. This can mostly be avoided with carefully planned media pools.
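The "one protected block pins the whole tape" point is easy to show with a few lines of Python. This is only a sketch: the tape labels, day numbers and retention periods are made up, and a real backup catalog tracks far more than this.

```python
from collections import defaultdict


def tape_release_days(jobs):
    """Map each tape to the first day it can be reused.

    `jobs` is an iterable of (tape_label, written_day, retention_days).
    A tape is only reusable once the longest-lived backup on it expires.
    """
    release = defaultdict(int)
    for tape, written_day, retention_days in jobs:
        release[tape] = max(release[tape], written_day + retention_days)
    return dict(release)


# Mixed retention on one tape: a single 7-year job pins T1 for 2555 days.
mixed = [("T1", 0, 30), ("T1", 0, 2555), ("T2", 0, 30)]

# Pools matched to retention: the short-retention tape frees up after a month.
pooled = [("SHORT", 0, 30), ("SHORT", 1, 30), ("LONG", 0, 2555)]

print(tape_release_days(mixed))
print(tape_release_days(pooled))
```

With mixed retention, the 30-day data on T1 rides along for seven years; with matching pools the short-retention tape comes back into rotation after about a month.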
Comparatively, online disk storage is much more easily managed, more cost-effective, more redundant, and more survivable over greater retention periods.
You can get a SAN shelf with 18TB storage for $20k, support for which is rolled directly under the same support for all your data storage.
There are very limited hands-on management requirements, and no daily or weekly hands-on administration needs.
You can reuse any expired block on the tray instantly.
You have striping and snapshots for redundancy.
You have easily performed shelf-to-shelf data migrations to address technology upgrades.
You have no reliance on maintaining dated hardware or a staff/process to deal with it.
Your data storage and your data protection can reside on dedicated fibre already in place for SAN connectivity, and your backups will never impact your LAN performance. Attachment to that local fibre gives you highly increased throughput over LAN backups even if you were to employ a dedicated backup VLAN, and you can stream any number of client machines to the same devices without concern at all for reu
"But we have to pass the bill so that you can find out what is in it,..." - Nancy Pelosi
(posting anon just in case someone from my work sees this)
On paper, we have a great backup system - we have two mainframes located in separate cities. #1 is the "main" server, #2 is the "backup". Anything you do on the main is mirrored to the backup, delayed around 15 minutes (depending on load, etc). The backup is also used as the report query server for load balancing. Everything else about these two machines is identical. Since we're only slightly less than a 24/7 operation (we're "down" for about 16 hours a week), it's very very important that the servers are running. Might even call it "critical".
Couple years back, disaster strikes - backhoe takes out the power line to the main server, UPS fails (never did hear why), and the main server dies. Power is restored, but the server won't reboot. (We're about two hours into the outage at this point).
Plan seems clear at this point - switch to the backup. This is when IT tells us that while the backup server has all the data, they haven't bothered to keep it up-to-date on the software end, so it won't function as the live system. And they're far enough behind on patches on that server that it will be faster to rebuild the main server and restart it than to do the patches on the backup.
Eight hours later, they finally get the system back up.
There were quite a few "unexpected departures" from that team in the weeks following. Oddly, I don't miss any of them.
I dunno what companies you've seen with their lax policies, but every single day I go into work, I do a backup. One day a week I do two. It's a 6-day rotating backup.
Then again it's a newspaper......Take that as you will.
That's what I call living dangerously and very bad advice. Even my snapshots on spinning storage are in a different building.
I've been up all night nursing an array back to health so at this point I'm very dubious about relying on onsite spinning storage for absolutely everything.
Portable backup is better than nothing. But seriously, offsite backup is the best because it prevents you from losing data in the case of fire, lightning, theft, etc. With companies like Mozy, MyOtherDrive, Carbonite, etc. offering online backup at such great rates, there really is no excuse not to backup online anymore.
Backup and recovery are subjects that PHBs don't want to address because they cost money. Remember, cutting costs is what empowers the PHB. Few of these wankers understand the concept behind incremental backups and full backups. Because of this they don't understand that some backup solutions are cheap because they are a perpetual incremental solution. This incremental approach sounds good in theory: back up just the stuff that's changed, save money with fewer tape drives, fewer tapes, and shorter downtime windows. That is, until a couple of years later when the oldest tapes are unreadable or the backup software has lost track of the oldest files. My personal experience with Palindrome told me that without periodic full backups you'll eventually be screwed. Second, the PHBs can't seem to understand that you don't have a backup solution unless you have tested the recovery process. In order to test the recovery process you need a test server and test tape drives. That means more money. The next problem is time. Full backups require time to shovel the data onto the tape. The amount of time depends on the volume of data, the speed of the recording device, and the number of recording devices. A power company that I won't name approached this problem with critical reason. They asked themselves how long they could afford to be down and came up with a number of hours. They then found a backup and recovery solution that could perform a full restore in that amount of time. Then they worried about how much it cost.
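The power company's approach boils down to simple arithmetic: fix the restore window first, then work out how many drives you need. A rough Python sketch follows. The function names are made up, it assumes drives stream in parallel at their full rated speed (real restores rarely do), and the numbers in the example are placeholders, not anyone's real figures.

```python
import math


def restore_hours(data_gb: float, drive_mb_per_s: float, drives: int = 1) -> float:
    """Hours to stream `data_gb` (decimal GB) through `drives` parallel drives."""
    seconds = (data_gb * 1000) / (drive_mb_per_s * drives)
    return seconds / 3600


def drives_needed(data_gb: float, drive_mb_per_s: float, max_hours: float) -> int:
    """Smallest number of drives that fits the restore into `max_hours`."""
    single_drive_hours = restore_hours(data_gb, drive_mb_per_s, 1)
    return max(1, math.ceil(single_drive_hours / max_hours))


# Placeholder example: 3600 GB at 100 MB/s per drive, business can tolerate 4 hours down.
print(restore_hours(3600, 100))     # hours with a single drive
print(drives_needed(3600, 100, 4))  # drives required to hit the window
```

Run it with your own data volume and measured (not brochure) throughput; the point is that the downtime budget drives the hardware budget, not the other way around.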
Because the beeping sound is *really* annoying.
Rimshot
I'd love to tell you the details of my backup solution (let me get these earplugs out give me a second) but I still have four hundred thousand 40 MB hot swap MFM drives to service, you can see how it works, just listen to the hum, okay then so I am back to the forklift for a bit on the back side of row 3 right now, we have great health care, I just got a hearing test, they said I need ear plugs. Oh look it's Lunch. Got to go.
Good point. I gave up on it back then. Is it still sequential though?
External drive bays with hard disk cartridges like the Dell PowerVault line work alright for this. I use this at a few places. For example, a local police station. The bay is connected to their server, which serves as a backup manager for itself and the client computers. The drives are labeled (numbered) and rotated daily. One is in the drive bay, the other is locked in a fireproof safe, and the third goes home with the Chief or Sergeant at the end of each day.
I create bash scripts in Cygwin with Cron/Scheduler integration to bypass whatever backup software Dell is pushing with the PowerVaults now. It was Yosemite.
I find this to be cheap, fast, and effective for very small businesses or organizations. You have an on-site backup, an on-site backup detached and secured, and an off-site backup.
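For anyone who wants to print a rotation sheet for the three-drive scheme above, a small Python sketch is below. The exact order the drives move in (bay, then safe, then offsite) is my assumption, as are the location names; adjust to whatever the site actually does.

```python
def rotation(day: int, drives=("1", "2", "3")) -> dict:
    """Where each numbered drive sits on a given day of a three-drive rotation.

    Assumed order: today's drive is in the bay, yesterday's is in the safe,
    and the one from two days ago is offsite.
    """
    if len(drives) != 3:
        raise ValueError("this sketch only models a three-drive rotation")
    return {
        "bay": drives[day % 3],
        "safe": drives[(day - 1) % 3],
        "offsite": drives[(day - 2) % 3],
    }


# Print one week of assignments.
for day in range(7):
    print(day, rotation(day))
```

Every day each location holds a different drive, so there is always one copy attached, one detached on site, and one off site.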
Last place I worked, they backed up over a VPN to a remote site continuously, to backup hard drives nightly, and to tape weekly. There were at least 12 tapes in the rotation unless one broke or otherwise failed, in which case the rotation would be temporarily shortened until some more tapes could be ordered. The most recent tape lived at the office. The next most recent was at the owner's house. The third most recent was in the home of the Senior IT guy, and the fourth most recent in the home of the Junior IT guy. All three of these people had tape drives at home so they could restore remotely if necessary. This was at a company with 80 people and a full server room, so it's not like it was an insignificant amount of data. In the five years I worked there, they had to load files from tape ONCE, and that was only because the missing files weren't noticed for about two weeks.
Granted, this was in an industry where such record-keeping is required by law, but even so, the backup system was considerable overkill for the need. We never lost more than one day's worth of data, and that happened twice.
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
Nowhere within I'd say an order of magnitude or two of what you are suggesting once you start going beyond trivial amounts of data. Also it's not done within a vacuum. If your drive dies there are others that can read it so long as you don't wait a few decades between transcription - and even if you stupidly do wait decades without your own drive there are still many places that can do transcription for you.
Electricity, air conditioning etc costs money as well and there is a crossover point where a box of tapes in a shed is going to cost a hell of a lot less than spinning storage and those advantages you write of are not necessary with long term rarely accessed storage.
You don't get a call along the lines of "we need all the emails from 1997 in five minutes" in a normal business - outside of a niche in Intelligence or something everyone is going to have the hours needed to get it off tape. In my case it's "X is starting reprocessing work on a survey from 1969 in two weeks and the client is sending us the tapes" - so I forgot to add in the other big advantage - TRANSPORT. Also can you imagine what sort of costs would be incurred keeping hundreds of gigabytes of data on spinning storage since 1969 with nobody looking at it since then? There would be an astronomical difference between that and the cost of whatever transcription steps happened over the years before it finally ended up on a couple of LTO2 tapes.
It all comes down to a philosophy of "one size fits all" versus "the right tool for the job" I suppose. Tape (or removable disk in the short term) is the right tool for the job IMHO but sometimes you can get away with something else.
Also did I mention that former web hosting company near me that had nothing but spinning storage? It appears they actually did have proper snapshots on a second offsite system, but whatever happened to the main system happened to it too and they lost everything. The rumours were hacking or a disgruntled employee, but all that is certain is they lost all of their clients' data - web pages, DNS, virtual machines, the lot.
A drive dying is not your worry. A TAPE dying is. Tape backups are not striped. There's no parity. You will get bad tapes, there's absolutely no getting around it, and you're fucked restoring from those unless you make copies of your backup tapes at or shortly after the backup. Your only solace is that out of the thousands of tapes you have, statistically you're unlikely to need the bad one.
Again, from my post:
And before you bring up the costs for power/cooling of the additional disks, there's MAID storage. Massive Arrays of Idle Disks. These shelves have front-side controllers that power down disks that have not been accessed in X period of time, and automatically spin them back up if an access is requested. Power/cooling is minimal unless you're constantly hitting the shelves, and if you are then you would have a nightmare time managing a large tape inventory for all those requests.
Storage density of the MAID is at or better than tape, so volume of space is comparable. A cheap SATA 2TB drive is almost identical in cost to an LTO5 tape at around $80, but if you have a maintenance contract on the MAID shelves replacement drives are free. You can recoup the space consumed by any block on the backups instantly, unlike tapes, so you never have media sitting around with tiny amounts of protected data on them preventing the use of the media entirely. You would ordinarily set up a copy from high performance SAN to the MAID after a few weeks when restores are less likely, and then you can let the MAID data just sit there for as long as needed. Get a new SAN tech? Start a snap from one SAN to another and you're done. Let it copy for as long as it takes, unlike with tape where you'd probably have to hire some temps to load/copy/unload/load/copy/unload for however many days, or weeks, or months it takes to convert your inventory of the old tech.
It's actually quite common in financial, federal and military applications for the entity that owns the data to be required to respond to lawsuits from many years before. They are required by law to retain certain kinds of data specifically for this reason. When you get a litigation response like this with tape you have some poor guy loading tape after tape for restores for potentially several weeks to get the data covered in the suit. If it's all on MAID it's a matter of click, click, click and come back to check on it tomorrow. Paying someone for the weeks of time it takes with tapes costs a ton of money.
Transport is another matter altogether. First off, if you had many GB of data from 1969, then you probably have all the data at the time from NASA. But that aside... Restore to alternate location from the MAID... If you're talking even several GB, I bet I can run a data copy over a cheapass T1 before you can ship tapes even if you use overnight. If your company is worth a shit you probably have T3 or OC12, and then you're talking TBs of data copied before overnight tape shipping. If you're going to ship several TB or your lines suck... Want it burned to Blu-ray? Fine. Want it put on tape? Fine. Want the proprietary backup format copied to another media? Fine. Want it encrypted during the copy? Fine. You tell me what format you want it sent to and it's easy enough to accommodate. Do you really plan on sending them your one and only backup tape that contains the data? Hope not. You're going to make a copy from tape anyway unless you're an idiot. The cost of any of this is negligible. Copying a TB of data over the wire costs pennies. Shipping 50 Blu-rays costs less than $50, and that's about the worst format you could be shipping. It's meaningless.
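For anyone who wants to sanity-check the wire-versus-courier argument, here is a quick Python sketch. The line rates are the standard nominal figures for T1, T3 and OC-12. The function name is made up, and it assumes the link is fully used for the whole period with no protocol overhead, which no real link manages, so treat the results as upper bounds.

```python
# Nominal line rates in megabits per second.
LINE_RATES_MBPS = {"T1": 1.544, "T3": 44.736, "OC-12": 622.08}


def gb_per_day(rate_mbps: float, hours: float = 24.0) -> float:
    """Decimal gigabytes moved in `hours` over a fully saturated link."""
    bits = rate_mbps * 1_000_000 * hours * 3600
    return bits / 8 / 1_000_000_000


for name, rate in LINE_RATES_MBPS.items():
    print(f"{name}: about {gb_per_day(rate):,.0f} GB in 24 hours")
```

Under those assumptions a T1 tops out at roughly 16.7 GB in a day, a T3 at a bit under half a terabyte, and an OC-12 at several terabytes, so whether the wire beats the overnight box depends heavily on which of those lines you actually have.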
For your last few points:
Again, if you present a scenario where a hacker can compromise both your primary and secondary copies of the backups, you're an idiot. I can see someone getting to your primary or secondary storage. But the copies should be happening through a SAN device that's on an unroutable VLAN. The only way to get to that data would
For enterprise level (where I sysadmin) it comes down to CYA. They will do the absolute minimum to meet the service agreements. While this does make sense from a cost perspective it doesn't usually go on to cover a DR or how long to rebuild a system etc.
It appears that I am wasting time discussing something with an inexperienced condescending little shit that has decided to call me a liar. Large seismic surveys produced hundreds of gigabytes of source data per survey even back then.
Maybe where you are - once again, inexperience.
Why are you trying to tell me all this stuff I already know about storage but just do not agree is the best solution for every situation?
I'm an idiot to refer to an example which had recent press coverage? Do you speak like that to people's faces?
Because we need it (plain and simple).
We also make it easy (for the most part) for users to recover their own as 90 percent of the time users screw up.
Production is done for various legal reasons. Whether its be legal SEC (or the like)