Why Do Companies Backup So Infrequently?
Orome1 writes "Businesses are on average backing up to tape once a month, with one alarming statistic showing 10 percent were only backing up to tape once per year, according to a survey by Vanson Bourne. Although cloud backup solutions are becoming more common, still the majority of companies will do their backups in-house. Sometimes they will have dedicated IT staff to run them, but usually it's done in-house because they have always done it like that, and they have confidence in their own security and safekeeping of data."
Portable HD is cheaper and faster, even for stacks of them. Small businesses may be using a bunch of these in place of tapes.
I run backups at least once per week, and if I make major changes to my system (I self web host), I back up several times per week. This week I lost a 500GB drive. I used ddrescue to recover all but 19MB, although after the filesystem was restored, I lost more than 19MB. Backups saved me huge amounts of time and trouble. The system was down for nearly three days. Data recovery took time, as did getting a new drive, installing an OS on it, and then rebuilding the system. It's been back for two days. I don't know how companies back up only once per year. It's like they are asking for disaster.
After one failure the costs alone will be realized. I personally never worked in a place that did not have a daily backup. Hell, I knew someone fired because he messed up a daily backup for 2 days but with no data loss just because if it *did* happen it would be catastrophic.
The only place I knew who did it weekly was a small computer shop.
Are companies today run this cheaply because of excessive cost cutting and right-sizing? I find this too hard to believe.
It's expensive, so management does not really want to pay for new tapes, a disk-based system or cloud backup. It requires personnel, which management does not want to pay for either. It's boring for the persons involved (who likes testing their backup?).
Or at least don't answer them three sentences later.
Why Do Companies Backup So Infrequently?
...because they have always done it like that, and they have confidence in their own security and safekeeping of data.
I live in constant fear of the Coming of the Red Spiders.
Hah, I remember when a coworker of mine accidentally a couple gigabytes of data.
Most companies have no risk management, and no clear picture of the risks their business faces.
The result of "intuitive" risk-non-management is that the usual human flaws have full impact. Basically, aside from a narrow middle ground, all risks are wrongly estimated.
The result is that people just expect them to work flawlessly -- not to fail. They also ignore other risks. I put in a machine at some customers a couple of years ago. They did not want to pay extra for backups -- "yes, we will do that later". They knew that I configured mirrored (RAID 1) disks. I set up a backup from one part of the disk to another and reminded them every couple of months that they would lose everything if it was destroyed or stolen.
Then a few weeks ago another unit on the industrial estate was torched -- arson. I have finally persuaded them ... I am putting in another machine on the far side of their factory that will take a daily rsync, and USB plug-in disks that they will back up to weekly & take off site.
These guys are not stupid in what they do professionally; they have an annual turnover of more than £1 million. Why does it take a fire at a neighbour to make them see sense?
Seriously. Most people aren't willing to put in the effort to determine just how bad things would be if they had to resort to their (often inadequate) backups, and therefore they aren't willing to pay the time and capital to get adequate backups.
If you want your company to get better backups, run a simulation of what would happen if something failed. What's the best recovery you could do? What business would you lose? Then calculate the probability of that failure occurring, and be generous.
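The simulation described above boils down to a probability-weighted cost. Here is a minimal Python sketch of that arithmetic; the scenario names, probabilities and costs are made-up placeholders, not figures from anyone in this thread, so plug in your own.

```python
from dataclasses import dataclass


@dataclass
class FailureScenario:
    name: str
    annual_probability: float  # chance of this failure in a given year (0..1)
    downtime_days: float       # days until the business is running again
    cost_per_day: float        # revenue and labour lost per day of downtime
    data_loss_cost: float      # one-off cost of data that cannot be restored


def expected_annual_loss(s: FailureScenario) -> float:
    """Probability-weighted yearly cost of one failure scenario."""
    return s.annual_probability * (s.downtime_days * s.cost_per_day + s.data_loss_cost)


# Hypothetical numbers, purely for illustration.
scenarios = [
    FailureScenario("single disk failure", 0.05, 3, 2_000, 10_000),
    FailureScenario("fire or theft", 0.01, 30, 2_000, 250_000),
]
total = sum(expected_annual_loss(s) for s in scenarios)
print(f"expected loss per year: {total:,.0f}")
```

Comparing that total against the yearly cost of a proper backup setup gives management a number to argue with instead of a gut feeling.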
Insert self-referential sig here.
especially when combined with 'find' and 'xargs', in what is supposed to be a simple task.
If you don't, you'll do something like what I just did ("worst typo in a decade"): you see, I was trying to update emacs and wanted to purge all the .elc files from ~/.emacs.d
Unfortunately, through a bad typo, some misapplied keyboard shortcuts, and rushing through without mounting a scratch monkey... what actually ran was effectively "find ~/.emacs.d | xargs rm".
I accidentally deleted the 'grep'. Oops. 15+ years of elisp/etc destroyed.
Was it backed up? Nope! Been meaning to check it all into git, but always put it off as a "minor, unimportant" task I'd get to later. Of course, we all think that way up until the disaster hits...
*sigh*
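For what it's worth, one way to make this kind of cleanup harder to fumble is to have the filter built into the delete step and to default to a dry run. A rough Python sketch follows; the function name `purge_compiled` is hypothetical and this is not what the poster ran.

```python
from pathlib import Path
from typing import List


def purge_compiled(root: Path, suffix: str = ".elc", dry_run: bool = True) -> List[Path]:
    """Delete only regular files ending in `suffix` under `root`.

    With dry_run=True (the default) nothing is deleted; the list of files
    that would be removed is returned so it can be inspected first.
    """
    victims = sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims


if __name__ == "__main__":
    emacs_dir = Path.home() / ".emacs.d"
    if emacs_dir.is_dir():
        for path in purge_compiled(emacs_dir):  # dry run: prints, deletes nothing
            print("would delete", path)
```

Because the suffix match lives inside the function, there is no separate filter stage to lose to a typo, and an actual delete needs an explicit `dry_run=False`.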
This is not an automatic signature.
I think my record for getting things restored off tape is about 10%. It mostly seems to be a placebo.
'Do we have backups of FOO?' 'Oh sure, we backup everything nightly.' 'Thanks, could you get me... FOO/BAR?' 'Sure. Justasec.'
Two hours later I get the call. 'Uh, we're having some issues here, it'll be a bit longer...' Two hours after that 'How badly did you need FOO/BAR?'
Either the automatic backup system was failing early on in the backup and had been for months and nobody had noticed the error condition, or it was backing up but for some reason wouldn't restore or they were doing incremental backups but someone forgot to change tapes overnight or whatever the long, long litany of excuses has been.
Which is why I always spin my own nightly work backups of all my machines to a 1 TB USB drive with rsync. Nightly and weekly. 100% success so far.
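A backup that has been failing silently for months is usually detectable just by looking at how old the newest file on the backup target is. A small Python sketch of that check, assuming the backups land in a plain directory; the 26-hour threshold and the function names are arbitrary choices for illustration.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: Path, now: Optional[float] = None) -> Optional[float]:
    """Age in hours of the most recently modified file, or None if there are no files."""
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    now = time.time() if now is None else now
    return (now - max(mtimes)) / 3600


def backup_is_stale(backup_dir: Path, max_age_hours: float = 26,
                    now: Optional[float] = None) -> bool:
    """True when the directory is empty or nothing has been written recently."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours
```

Run something like this from a scheduler and alert when it returns True. It only proves that something was written recently, not that the backup can be restored.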
Yeah, induction, folks.
It's been 6 months since I began my startup. That's 3600x24x30.5x6 seconds, or 15,811,200. 15,811,200 seconds of no computer problems, and 0 with.
Extrapolating out, that means infinite problem-free computin ??#@ NO CARRIER
RAID is not a replacement for a backup.
RAID will safeguard you against the failure of a single disk (if and only if you monitor the system and replace disks as they fail), but backup will give you back your data as it was before your application destroyed it or your user deleted it. That is something completely different.
Mod me down for trolling all you want, he deserved the pun after not making backups for 15 years consecutively.
I was promised a flying car. Where is my flying car?
Offsite backups are what a lot of companies don't do. They might back up to tape, but the tapes are stored in a pile next to the server. And they never test them.
I'm not sure how everyone gets so ecstatic about those cloud backups. If we had to send all our data over the internet connection, it would take an unworkable amount of time to complete the backup.
Even to the local LTO-4 drive, which runs at over a gigabit per second, the backup takes an appreciable amount of time.
Cloud backup may be good for a 3-man company doing document editing, but with the amounts of data that are common these days, and the speeds of internet connection that you normally have, I don't see it as a realistic possibility.
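The bandwidth argument is easy to put numbers on. A quick Python sketch; the 80% link efficiency and the example data size and uplink speed are assumptions for illustration, not measurements.

```python
def transfer_hours(data_gb: float, link_mbit_per_s: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_gb (decimal gigabytes) over a link of link_mbit_per_s."""
    bits = data_gb * 8 * 1000**3                       # gigabytes -> bits
    usable_bits_per_s = link_mbit_per_s * 1_000_000 * efficiency
    return bits / usable_bits_per_s / 3600


# Hypothetical office: 2 TB of data, 10 Mbit/s uplink versus a local gigabit-class drive.
print(f"10 Mbit/s uplink: {transfer_hours(2000, 10):.0f} hours")
print(f"1 Gbit/s local:   {transfer_hours(2000, 1000):.1f} hours")
```

The same function works for estimating a full restore, which is the number that matters on a bad day.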
We do incremental daily backups of main data to LTO4, they go offsite the next day, and a full backup every week, per set. We have 3 sets - there is always one off-site. We also locally sync startup disks, Exchange data goes over the wire every night, finance database is backed up every night via FTP over the wire. I think we worked it out that we would only lose a week's worth of data if the office burned down AND the van carrying one set of backup tapes crashed and exploded.
I've always wondered if /. bothers to back up the stories and posts...
Anyone here know?
How about for other 'trivial' sites, like reddit, youtube, etc?
You need to think about what data you are protecting, why, how you RESTORE it, and whether it needs an offsite solution at all.
If you do that properly you knock a zero or two off your technical volumes, every time.
Conversely, it's seldom worth doing that at $SMALLCO because the entire enterprise fits into a Terabyte or less and storage is cheap. The danger is that throwing it onto something offsite - whether it's disk, tape, cloud, or ferromagnetic core memory - can lead to lazy thinking about what happens when you need the data in an emergency. Even if you do business by emailing spreadsheets around you might end up with inconsistencies in your backups, and if it's the payroll spreadsheet that Daphne left open overnight for a week...
"... and more and more now there are all kinds of electronic goodies available" -- Pink Floyd 1972
... because hard drive costs have come down so much it's just cheaper to buy a bunch of hard drives and mirror like crazy. Another factor is speed. Backing up costs time and time is money. So it makes sense that more and more organizations have moved to mirroring/RAID solutions.
RAID is pretty damn robust these days too, and with drives as cheap as they are you can create many mirrors at once. I've run RAID 5 for the past 7 years and I've never lost data; drives have always died one at a time in an array. As hard drives got cheaper, I could back up the entire array to one hard drive before starting to replace the drives in the array. My first array was 4x160GB drives; as soon as 500GB drives became cheap enough, you could copy the contents of 4 disks onto 1. I imagine this pattern has happened for many organizations as well. Drives just keep getting bigger and cheaper fast enough that you can affordably back up what used to be a 'big drive' in a year or two's time as the prices come down.
To most PHBs a computer is a toaster. That's right, an appliance. Nothing more. Most have no idea of the nursemaiding a computer needs, or how vulnerable company data is when it's reduced to 0s and 1s. Until the "unthinkable" happens and the toaster breaks.
Now. I used to be in the biz of disaster recovery. It was lucrative when consumers who valued their potentially lost data found me and asked me "Can you do it?" and I said "Let's have a look". My success rate was something like 40%. Surprisingly (for me anyway), my better corner was recovering data from physically damaged flash drives. Out of several dozen of those I only had one that I couldn't recover anything from. Hard disks are a different animal, and the whole thing can be frustrating when you sit back for a minute or five and consider how much easier your work would be (and how much hardware you can sell) if your client had the benefit of hindsight and someone around who knew the shit of which he spoke - so a week-long recovery project that may or more than likely may not work turns into a two hour exercise in restoring on new hardware and sending the happy client on his way. Now, that is an exercise in getting repeat business through recommendation.
As far as "accidental" deletion of data: there is only so much you can do to protect the stupid from themselves - on a network share, for instance, you can deny users the right to delete files. Job done. In an environment that is supposed to be secure, that's a good start. VSC is another handy tool but you don't need to tell the stupid that their fuckups are (sort of) covered - it breeds complacency and does little to nothing to train responsibility.
Operation Guillotine is in effect.
Basically any system that you answer the question "so, waddaya wana backup of?" to truthfully, and then don't change your data layout around and put important things somewhere else.
Which means people need to think about a backup BEFORE they start setting up / installing a new system. Then it's pretty easy. Anything that gets added as an afterthought is much more hassle and a lot more prone to breaking, because of "ooops, I forgot to include that config file over there on that other share that I use." things.
Also, set up an automated backup, and when possible an automated restore into a test environment. Once you have done that you can basically forget about your backup system, until you find that your test system doesn't work.
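The automated restore test can be as simple as restoring into a scratch directory and comparing checksums against the live data. A bare-bones Python sketch of the comparison step; the function names are hypothetical, and reading whole files into memory is a simplification that won't suit huge files.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def restore_problems(source: Path, restored: Path) -> Dict[str, List[str]]:
    """List files that are missing from the restore or whose content differs."""
    src, dst = tree_digests(source), tree_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Files that changed on the live system after the backup ran will show up as "changed", so a check like this makes most sense against a snapshot or a quiet data set. A non-empty "missing" list is exactly the forgotten config file on the other share.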
You're so wrong ...
Tape is the ONLY way to make serious backups and do archiving once you start having serious data volumes. 50TB of data is a LOT of disks... now do daily increments, weekly full backups with 2 months retention... also single disks are always slower than streaming to LTO4 or LTO5 tapes so your backup window becomes too big to handle.
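To see how quickly that adds up, here is a back-of-the-envelope Python calculation. The 2% daily change rate and six incrementals per week are assumptions picked for illustration; the comment above gives only the 50TB and the retention period.

```python
def retained_tb(full_tb: float, daily_change: float, weeks_kept: int,
                incrementals_per_week: int = 6) -> float:
    """Total TB held when each week has one full backup plus daily incrementals."""
    weekly_set = full_tb + incrementals_per_week * full_tb * daily_change
    return weeks_kept * weekly_set


# 50 TB of data, assumed 2% daily change, roughly 2 months (8 weeks) of retention.
print(f"{retained_tb(50, 0.02, 8):.0f} TB retained")
```

Under those assumptions the retained volume is several times the live data set, which is the point being made about the number of disks involved.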
Encryption?
Search RapidShare and MegaUpload!
The worst though (and the case I saw once) was the backup job that ran locally every night quite reliably. This was a personal backup solution for a laptop but theoretically the same can happen anywhere else.
Anyway, the hard drive crashed (head crash). Restored from backup to a new hard drive. The one file that was absolutely 100% critical had not been backed up. All the moderately important stuff had been, but this one file (I believe it was a masters thesis in progress or something) had not been.
Disk gets shipped off for a few thousand dollars of disassembly-level recovery.....
I backup to backup server every hour and then spool to tape every night.
Once a week that week's tapes get sent offsite and 3-month-old tapes are returned for re-use.
If the company's IT department is NOT doing daily backups then its IT department is run by an idiot.
If management will not pay for it, then they need to be told, "so everything you do today does not matter and has no value? because that is what you are saying when you say backups are not important. Hard drives are not reliable, if they crash tonight and we lose everything, how much money do we lose?"
You have to talk to them in management speak... AKA money.
Do not look at laser with remaining good eye.
So a memory hierarchy emerged: local use on solid-state or disk, first-level backups on disk, and second-level backups on tape. That sounds reasonable for organizations big enough to need tape.
I work in SMB consulting and we get a lot of penny pinchers who come to us and express an interest in "cloud backups" or "offsite backups".
None have the bandwidth for it, but I always ask them how they would handle a large restore. Some cloud backup services claim to supply you with a USB disk, but what's that involve? If you're lucky, next day air delivery? And what do you get, just a USB disk with all your files? No NTFS permissions?
What about data from applications like Exchange? SQL? Tons of places run apps that use SQL Express DBs that won't get backed up with cloud software.
It's probably a decent service for an office with six guys and a share file, but it's hard to see the value of it unless you get into cloud backup systems that are more sophisticated and you have serious bandwidth to throw at it.
I expect a lot of companies are just like my mom. Having never experienced data loss, she doesn't see the point of backup.
Two things at stake. One of them we can lay squarely at the feet of the customer; for the other, we (those of us who provide IT services) largely have ourselves to blame.
1. There's an old adage - there are two types of people in this world; those who take regular backups and those who have never lost data. I'm astonished there's anyone left on the planet who's never lost data, but it appears there's actually quite a few. This is really something to lay squarely at the feet of the people who lose data.
2. Backup is the wrong thing to do. We shouldn't be advising businesses to backup and we shouldn't have been doing so for at least 10 years.
Why do I say this? Well, how often do you find the client's been taking backups religiously - onto tape media that's so obsolete it could take weeks of hunting to find a suitable drive? Or that they've been backing up their data all right - but that data depends on software that isn't part of the backup, developed by a company that went out of business two years ago? Or that a former IT provider set them up with backups some time ago using a piece of proprietary software that hasn't been supported since Windows 2000 and won't even fire up on anything more recent? Or that they can recover their systems just fine - but their systems aren't much use without some extra doohickey plugged into the back, and there's only about 100 of them on the entire planet because they're custom-made?
What we should be advising is "Okay, forget about backup. Think about your business processes. How quickly and easily could you start doing them again in the event of the whole lot burning to the ground? What would you need in place? How easily could you get hold of it? That might be backups, but it might also be physical equipment or knowledge that only one person holds. Write down a plan - it doesn't have to be complicated, for a very small business it might be as simple as "I'll drive down to the store, buy X, Y and Z and recover data" - and go over it, make sure it's still relevant and test it once a year."
This has two enormous advantages:
1. It's much less likely that any of the gotchas mentioned above - or any of the hundreds of others I'm sure anyone commenting can think of - will trip you up.
2. It's much easier for a business to think in terms of their own business, which means you're not asking them to concentrate on something they don't understand and are less likely to see the benefit of.
Runs nightly, full backup, archives yearly, monthly, and weekly.
Recovery scripts and directions located locally, offsite with the backups, and on my laptop (as well as in our company Redmine which runs on a separate server which is backed up similarly).
I test recovery once a season (to make sure it still works and to remind myself how to do it).
It wasn't rocket science to set up, and took a few days to stand up and fine tune.
I am very small, utmostly microscopic.
We do nightly backups to hard drives (we have 4 sets that we rotate.) We also do an off-site backup to a server in another location in our city, another off-site backup to a server in a different city, and a dump of critical data (our source code and customer databases) to a USB disk in yet another location in our city.
Granted, we're a small company, but those are the companies that are supposed to have bad backup practices. It's really not that hard to set up a few cron jobs to automate nightly backups. There's no excuse not to do it.
The company I work for backs up 10+TB of customer data each night to two different off-site storage sites that are over 100 miles away from our main site (and each other) and hold data going 3 months back. One of the sites is a skeletal second datacenter with just enough hardware to get back up in case our main site gets toasted, and it has some ability to expand. Our connection to these sites is over a non-internet VLAN through our ISP. The secondary datacenter site can easily have its connection upgraded to an actual internet link if something bad actually happened.
Our company's own data is backed up somewhere between nightly and weekly, depending on what data you're talking about.
For the same reason that:
If we didn't have fire safety regulations, most businesses would have poor fire safety
If we didn't have OSHA regulations for jobs with physical work, we'd have more workplace injuries
If we didn't have building codes, the buildings most businesses operated in would be unsafe
If we allowed the importation of unsafe cars, businesses would buy them
Doing things right costs money in the short term. Most businesses and business people think and act short term.
>showing 10 percent were only backing up to tape once per year
You mean showing 10 percent were only backing up to tape once per year to a paying third party vendor....
instead of in house which is cheaper when you know what you are doing...
I love when they twist these headlines to make them more attractive..
We do Veeam backups of our virtual infrastructure nightly. Once a week, a copy of that is taken offsite. Also, every night our EqualLogic SANs replicate with each other. They are in three separate offices in North America. In the event one building burns down, is blown up by a nuke, or similar, we can fire up our entire virtual infrastructure in the failback location within a couple of hours (minutes really). Since we only have 3-4 non-virtualized servers, and none of them store important data, we're pretty well protected I think.
When I was an IT guy we backed up once per week, doing a full computer image of desktop + server every Wednesday. We kept 4 weeks of backups on hand, and we had auto backup software that ran whenever files were changed; it did almost what you could call SCM and backed up every new edit of a file with stored revisions. This was all done to tape.
But most of the time, the test cases, use cases, user documentation etc get the short end of the stick. Especially code documentation that does not sit well inside the source code. [In grad school I used a bizarre thing called Cweb that had a single source for both code and documentation. You run cweave to get the PostScript document via TeX and LaTeX. And you run ctangle to get C sources. There was a Fortran version too. But it never seemed to gain commercial success]. Scientific papers that form the basis of the scientific codes, profiling data, important lessons learned about the data structures and scaling, painful gotcha bugs fixed etc are rarely documented, forget about comprehensive indexed searchable documentation of the lessons learned in developing the code.
Outside software development and maybe accounting, there are areas where the need for back-ups conflicts with the desire for secrecy and control and frankly paranoia and delusions too. Salesmen guard their rolodex and its electronic clone zealously. Business plans should not be leaked, strategy session presentations should be kept in strict confidence etc, and most executives don't realize most sys-admins and IT already have full access to the un-backed up data. They are not losing much by allowing a clean back-up solution, but they don't seem to realize that.
Then there is Legal. They are so worried about electronic discovery. Now there are AI equipped scanners that are sixth or seventh generation progeny of humble awk and grep. They see patterns, they find keywords, they even find the euphemisms and missing links, they can tell when a sensitive thread of communication goes off electronic to avoid leaving behind an e-trail or paper trail. So they too want to control the back up process.
In the end, corporate back up is not something simple like you have in your home, where the data preservation is the only criterion.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
"Snapshots/CDP"
"It is common now especially with SANs to use Snapshots and CDP" - Is it!? Snapshots yes, CDP no way, how many use that other than banks, and banks will be making damn sure they have good backup processes in place.
"but any block level changes would mean an increase in the replication traffic" - Block level changes are occurring all the time, snapshots won't make a difference and the snapshot size only increases if there is more changed data since the last snapshot.
"we still need to defragment as fragmented volumes take considerable time to backup" - this does not apply to all SAN technology, certainly not NetApp.
If the section on Snapshots/CDP is anything to go by, then not only is the article rubbish, but the original research behind it is probably suspect.
Yeah, I have a client whose medical records are truly paperless (no paper copies squirreled away), so losing the data would be disastrous. Guess how often we back up to tape? Never.
But how often do we backup? Uhm, continuously actually. Versioning and extensive logging combined with near-real-time replication means that if the server storage goes up in flames, we lose at most a few minutes of data, more likely only a couple of seconds. And yes, the versioning allows us to get back to prior versions, so it counts as backup, not just a clone.
Look at rdiff-backup.
rdiff-backup videos archive
rdiff-backup --remove-older-than 3M archive #Or something like that, man is your friend.
Then you'll have 3 months to correct your first line. I guess you'll do it after the first test, or never, but you'll lose all rights to complain.
Now 240GB may be too small for a backup and the cost can add up fast.
I think it's partly speed. Costs have decreased, but even under the best conditions, backing up a few terabytes of data takes a relative eternity. It's kind of OK via USB3, but USB2? Hours and hours and hours per terabyte.
There's also a nasty bug with an entire generation of "green" Seagate drives where you can create a gigantic tarball, but when you try to read it back, the drive's firmware doesn't count "read time" as "activity", and will shut down the drive after a few hours (before it finishes copying). Last I checked, the official fix required reformatting the drive after applying the firmware patch, which won't do you much good if your first discovery of the power-save bug is when you're trying to restore the backup. You CAN get the tarball back, but you'll have to buy ANOTHER drive, then use something like dd_rescue to rip the raw sectors off in two chunks and recombine them on the new drive so you can read the tarball and get the files out of it. I spent almost a week recovering a ~1.7-terabyte tarball from a Seagate USB2 drive for this exact reason.
At first I thought this would be a good article on getting clients to back up more, but after reading it a couple of times and double-checking my thinking by reading the comments, it's pretty obvious that the author knows nothing beyond a statistic he/she read about backups. Poking holes in his logic:
1. Tape backup currently does not have the capacity and speed to keep up with the size of modern filesystems. Solution? Create an offsite backup scheme where data is deduplicated at the source and only the deltas (changes) are transferred to the backup site. That way the backup site can chug merrily away backing things up without causing issue with the workload.
2. 99% of data recovery occurs at the file level. A user accidentally deletes a file, overwrites it, or the file is corrupted. Windows Volume Shadow Copy service was created for this specific purpose so a user can recover without bothering the admin. If you don't have Windows, every major SAN/NAS vendor uses snapshots to do the same thing. Next is disk level recovery using RAID.
Actual total, catastrophic failure is very rare. I like to tell clients to prepare for being hit by a meteorite, but PEBCAK errors are far more likely and more dangerous.
3. WTF is a "Windows Write Driver"? At first I thought this was some wondrous new feature of Windows 7 that defragmented on the fly, but no, the author is talking about Data Consistency Points. According to the article, when an OS (only Windows exists to him it seems) writes to a SAN it just blasts the data straight to disk and bypasses the cache. What he doesn't realize is that the write goes to memory cache (usually two), where it is checked against itself for consistency (is everything here?) and THEN it's written to disk. Writing straight to disk NEVER occurs, even on a desktop. There is always cache and consistency checking somewhere along the way.
Data consistency checking came about in the sixties and is used by every single storage vendor today. EMC, NetApp, whoever; they all do it.
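The source-side deduplication from point 1 can be sketched in a few lines of Python: hash each chunk locally and ship only the chunks the backup site has not seen. This is a toy version with fixed-size chunks and an in-memory hash set, both assumptions made for brevity; it is not a description of any vendor's product, and real systems often use variable-size chunking.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def chunks_to_send(data: bytes, known_hashes: Set[str],
                   chunk_size: int = 4096) -> Tuple[List[str], Dict[str, bytes]]:
    """Split data into fixed-size chunks and return (manifest, new_chunks).

    manifest   -- ordered chunk hashes that describe the whole file
    new_chunks -- only the chunks the backup site does not already hold
    """
    manifest: List[str] = []
    new_chunks: Dict[str, bytes] = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        if digest not in known_hashes and digest not in new_chunks:
            new_chunks[digest] = chunk
    return manifest, new_chunks
```

The backup site stores chunks keyed by hash and rebuilds a file by walking its manifest, so an unchanged chunk costs only the size of its hash on the wire.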
I am Homer of Borg, resistance is - Ooo Donuts!
Pretend a guy prevents the 9/11 attacks by requiring all airliners to have a bulletproof, locked cockpit door. Hundreds of millions of dollars are spent, and prices go up. Everyone complains about government regulation. The attacks never happen. The instinct was that we just wasted a bunch of money. We spent hundreds of millions on H1N1 vaccine for an outbreak that never happened. But if it did, the vaccine would have saved many lives and prevented great financial loss. But it never happens, so it's seen as lost money.
You push for backups, and you spend tens of thousands on it for years. There is a disaster eventually that you recover from quickly, but perhaps you aren't around to take credit for it. Or you restore and everyone thinks that it was supposed to because we spent so much on it. No one will think, "Well, we just saved millions of dollars of downtime because of the backup." The thinking is just, "Oh, well it was supposed to. We spent so much on it."
This kind of myopia is commonplace in the world. We can invest money to treat everyone for chronic illnesses such as diabetes, high blood pressure, and high cholesterol with really cheap drugs. We can vaccinate everyone from childhood diseases. "Oh, but vaccines cause autism." Yes, but your child isn't paralyzed from polio. The avoided disaster is never quantified; only the cost spent shows up on your calculations.
That is just a copy and prone to change. Disconnected media is a real backup that you can be sure is the state of the files at the time it was written.
A former web hosting company near me had copies instead of backups and both systems had their data compromised, resulting in a complete loss of all their clients' data. While the above has more redundancy than that failed company, it still has the problem of live systems being subject to change.
LTO5 is cheaper, larger and faster than many people think and it's not the only solution. Since every year or two I get some tape reels from the 1980s transcribed without problems I think I'll be able to trust those far better LTO5 tapes for at least a decade.
It's 1:37am and I'm at work writing crap on slashdot now because 3 drives failed in a RAID10 array and I'm waiting for a rebuild. Maybe it's the card, but something is definitely wrong and I may have to restore to somewhere from backup.
Even having another system preferably offsite as a mirror doesn't save you from files being overwritten and even snapshots can get screwed up. A tape in a box on a shelf saves you from that.
WHICH COMPANIES?
Show me a Fortune 500 not backing up, if you find one I would be interested in what they are not backing up.
For small data sets with limited retention needs, maybe portable drives are an okay solution. However, I doubt I'd want to find version x of file y under that system.
I back up nearly 10TB of data daily, and it is always to tape. Why tape? Because it is portable, easily managed, and well known. That and I need so many of the things I have tape libraries to manage what is on site and off site.
Yeah, I bet the problem exists in SMALL businesses, but the article is implying all and it certainly isn't true. We are bound by law to keep specific data for set amounts of time, usually measured in years. Believe me, it's far cheaper to keep anything than to be caught without it.
* Winners compare their achievements to their goals, losers compare theirs to that of others.
I use robocopy on Windows, it's a lot like rsync. With vshadow+robocopy you can back up a live Windows computer just like you can on a Linux box with an rsync script:
http://ithelp.cveg.uark.edu/IT_Help/Documents_files/backup.pdf
"When information is power, privacy is freedom" - Jah-Wren Ryel
NTFS permissions are a biiiitch >_<
The only things you can back up to that preserve them are:
1. An NTFS disk plugged directly into the computer
2. An NTFS filesystem container
3. A network share on an NTFS disk that supports NTFS permissions.
Either that or you take a raw binary image of the disk (DUMB) or use a proprietary backup system which is just using a proprietary container format anyways (DUMBER).
"When information is power, privacy is freedom" - Jah-Wren Ryel
At the end of the day I still think tape is the longest lasting and most reliable. Not the cheapest, to be sure, but it is very much a tried and tested technology. I simply do not have enough faith in hard drives and flash drives. I do use a hard drive for a weekly secondary backup, but I still feel a little edgy about it.
The world's burning. Moped Jesus spotted on I50. Details at 11.
Seriously, "the cloud" is a fad and anyone who is putting their sensitive data there should be given a sec-idiot badge. It's no different than putting your backup server in the external dmz!
How does that price compare to the computer with hotswap bays you need for backing up to HDDs?
Not to mention the price difference between jukeboxes, which really go in favour of tapes?
To be fair, most people going to hard drive backup are re-considering the whole backup picture.
I've walked clients through this a few times, and typically we wind up with a full ZFS-based server, with data compression, de-duplication, and months of on-line backups, with pairs of drives going offsite daily (ideally in two directions).
Based on average compression and de-duplication ratios, you're looking at about $9000 for a system that can manage about 200TB of data. Perhaps a bit more if you want to keep lots of offsite copies.
One client was so happy about having realtime access to 6 months worth of revisions, he decided it would be best to buy two systems set up with realtime replication, just in case one went down or needed maintenance.
So, yeah, 9 cents per gigabyte is high these days, but it's really a higher level of computing.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Find the tapes, find out how much they cost and where to get them. Whip up something invoice-like that tells them how much they need. Be prepared to give an assessment of the labor and cost that would be required to restore if disaster did hit. Then offer to go fetch the tapes yourself.
If they still say no then they are intentionally avoiding the expenditure. It has nothing to do with lack of clue.
They will have confidence until their raid goes south, or there's a fire the day before their yearly backup.
A snapshot clone to SAN is most certainly a backup solution, provided you retain iterations of those snapshots and don't overwrite them. It inherently provides redundancy that tape cannot (striping over multiple spindles, failover shelf copies, multi-pathing, etc.). Tape writes a block of backup data to one tape, one time, and those tapes are far more susceptible to static during handling than drives. Unless you configure auxiliary tape-to-tape copies, you have no redundancy in your backup, and if you do, you are using double the media per backup.
That leads to discussions on how cost effective your backup actually is.
LTO5 tapes cost $70-$100 per, depending on your distributor.
The devices to read/write LTO5 cost $1500-$2500 each.
The robotic libraries to hold those drives cost $10k+, easily reaching above the $100k range for marginal capacity needs. A $mil solution for tapes is not remotely out of the question.
Maintenance contracts on those devices can easily be $10k+/year.
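To put the figures above together, here is a rough Python sketch of a first-year tape cost estimate. The prices passed in the usage note come from the ranges quoted above; the data size, the number of drives, and the function names are hypothetical, and the 1.5 TB figure is the native (uncompressed) capacity of an LTO-5 cartridge. Offsite storage and staff time are deliberately left out.

```python
import math

LTO5_NATIVE_TB = 1.5  # native (uncompressed) capacity of one LTO-5 cartridge


def tapes_needed(data_tb, copies=1):
    """Number of LTO-5 tapes for one full backup, ignoring compression."""
    return math.ceil(data_tb / LTO5_NATIVE_TB) * copies


def tape_first_year_cost(data_tb, tape_price, drives, drive_price,
                         library_price, maintenance_per_year, copies=1):
    """Media + drives + library + one year of maintenance.

    Assumption: a single full backup set; real rotations need many sets.
    """
    media = tapes_needed(data_tb, copies) * tape_price
    hardware = drives * drive_price + library_price
    return media + hardware + maintenance_per_year
```

For example, `tape_first_year_cost(18, 100, 2, 2500, 10000, 10000)` prices a single 18 TB backup set with two drives at the upper tape and drive prices and the lowest library and maintenance figures.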
Now start measuring the costs for offsite storage of those tapes. How long do you keep them? Where do you store them? How do you return them for use in the case of a restore? The contracts for these (companies like Iron Mountain) are pretty pricey.
Now consider that if you retain backup copies for long periods of time, you must also maintain dated technology that can read them, and in most corporations that also requires maintaining support on devices you never use and hope to never use. If your data retention reaches back 7+ years (required by law for many financial/federal institutions) then you're talking about having DLT tapes and libraries sitting around still today, unless you have a staff, processes and inventory of LTO onto which you can copy the old data.
Tape streams data to a linear storage. Your choices become one client server writing to one tape or many servers writing to many tapes. If you choose the first then you must either have one writer direct attached to every client server, or you have very poor-performing backups because of network bottlenecks forcing increased backup windows or additional drives to stream the backups across. Most corporate backup solutions are forced to stream many client backups over pools of tapes. You can create multiple pools so that you have matching retention periods. You can then retrieve tapes from storage as they expire and return them to libraries where they can be re-used. Obviously the media handling process requires hands-on administration (increasing your personnel costs). Every tape can be re-used a limited number of times before you start getting read/write errors and they must be retired and replaced (increasing your run costs). You cannot reuse a segment of a tape. You must keep the entire tape if it has even one block of protected data. This can mostly be avoided if you plan your media pools carefully.
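The "one protected block pins the whole tape" rule and the pooling workaround can be sketched in a few lines of Python. This is only an illustration: the function names, job names and retention values are hypothetical, and real backup software tracks this per media pool for you.

```python
from collections import defaultdict
from datetime import date


def tape_reusable_on(backup_expiries):
    """A tape can be recycled only once every backup on it has expired,
    so the latest expiry date on the tape decides."""
    return max(backup_expiries)


def assign_pools(jobs):
    """Group backup jobs by retention period (in days).

    jobs is a list of (job_name, retention_days) pairs. Jobs sharing a
    pool expire together, so no short-lived backup is pinned on a tape
    by a long-retention neighbour.
    """
    pools = defaultdict(list)
    for name, retention_days in jobs:
        pools[retention_days].append(name)
    return dict(pools)
```

Mixing a 30-day mail backup with a 7-year finance backup on one tape means `tape_reusable_on` returns the 7-year date; putting them in separate pools with `assign_pools` avoids that.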
Comparatively, online disk storage is much more easily managed, more cost-effective, more redundant, and more survivable over greater retention periods.
You can get a SAN shelf with 18 TB of storage for $20k, support for which is rolled directly under the same support for all your data storage.
There are very limited hands-on management requirements, and no daily or weekly hands-on administration needs.
You can reuse any expired block on the tray instantly.
You have striping and snapshots for redundancy.
You have easily performed shelf-to-shelf data migrations to address technology upgrades.
You have no reliance on maintaining dated hardware or a staff/process to deal with it.
Your data storage and your data protection can reside on dedicated fibre already in place for SAN connectivity, and your backups will never impact your LAN performance. Attachment to that local fibre gives you much higher throughput than LAN backups, even if you were to employ a dedicated backup VLAN, and you can stream any number of client machines to the same devices without concern at all for reu
"But we have to pass the bill so that you can find out what is in it,..." - Nancy Pelosi
It is possible to do cloud backups while keeping them secure. You use an encrypted container on the remote filesystem that you mount on a local machine and then push the backups through that. The key never leaves your office. Even if someone can get access to the files and RAM contents of the remote system it wouldn't help them, they'd have to crack the encryption on the container.
Not that I think cloud backups are so useful anyways.
"When information is power, privacy is freedom" - Jah-Wren Ryel
I dunno what companies you've seen with their lax policies, but every single day I go into work, I do a backup. One day a week I do two. It's a 6-day rotating backup.
Then again it's a newspaper......Take that as you will.
That's what I call living dangerously and very bad advice. Even my snapshots on spinning storage are in a different building.
I've been up all night nursing an array back to health so at this point I'm very dubious about relying on onsite spinning storage for absolutely everything.
Backup and recovery are subjects that PHBs don't want to address because they cost money. Remember, cutting costs is what empowers the PHB. Few of these wankers understand the concept behind incremental backups and full backups. Because of this they don't understand that some backup solutions are cheap because they are a perpetual incremental solution. This incremental approach sounds good in theory: back up just the stuff that's changed, save money with fewer tape drives, fewer tapes, and shorter downtime windows. That is, until a couple of years later when the oldest tapes are unreadable or the backup software has lost track of the oldest files. My personal experience with Palindrome told me that without periodic full backups you'll eventually be screwed. Second, the PHBs can't seem to understand that you don't have a backup solution unless you have tested the recovery process. In order to test the recovery process you need a test server and test tape drives. That means more money. The next problem is time. Full backups require time to shovel the data onto the tape. The amount of time depends on the volume of data, the speed of the recording device, and the number of recording devices. A power company that I won't name approached this problem with critical reason. They asked themselves how long they could afford to be down and came up with a number of hours. They then found a backup and recovery solution that could perform a full restore in that amount of time. Then they worried about how much it cost.
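The sizing exercise described here (pick the acceptable downtime first, then size the hardware) is simple arithmetic. Below is a minimal Python sketch of it. The function names and all input numbers are hypothetical, and it assumes the drives stream in parallel at their full sustained rate, which real restores rarely manage.

```python
import math


def restore_hours(data_tb, drive_mb_per_s, drives):
    """Hours to stream data_tb terabytes across `drives` parallel drives.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes every drive
    sustains drive_mb_per_s for the whole restore.
    """
    total_mb = data_tb * 1_000_000
    seconds = total_mb / (drive_mb_per_s * drives)
    return seconds / 3600


def drives_needed(data_tb, drive_mb_per_s, max_hours):
    """Smallest number of parallel drives that meets the downtime limit."""
    single_drive_hours = restore_hours(data_tb, drive_mb_per_s, 1)
    return math.ceil(single_drive_hours / max_hours)
```

Calling `drives_needed(10, 100, 8)` asks how many drives running at an assumed 100 MB/s are needed to restore 10 TB within an 8-hour outage. Only after that number is known does the cost question come up.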
Because the beeping sound is *really* annoying.
Rimshot
50TB is only 25 disks.
Single disks may be slower, but with HDDs each disk comes with its own drive and IO interface. Each tape doesn't come with its own drive. You need an expensive drive.
When HDD technology improves, you just start buying the newer bigger HDDs and use the same old IO interface - the interface doesn't change that often.
When tape technology improves, to use them you have to buy more expensive tape drives, not just the tapes.
What machine can handle 25 disks at once?
A tape robot easily holds 500 tapes or more, and manages them all by itself. Or would you rather have expensive people come in during weekends to keep swapping disks? Have you ever worked in a real enterprise environment and tried to set something up like you propose? Thought so...
we put our main data on tape every workday, plus full server backups every 4 weeks.... :-)
How often do you restore from tape to test your tapes? You might have 730 tapes full of garbage data. Or if you're reusing tapes, keep in mind that one tape can handle only 100 or so complete writes before degrading. I can dd all our data to /dev/null every day too, but being able to get the data back is what makes a backup useful.
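One way to test a restore is to restore to a scratch directory and compare checksums against the live data. The following is a minimal Python sketch of that check, assuming both trees are reachable as ordinary directories. `sha256_of` and `verify_restore` are hypothetical names, and files that changed since the backup ran will show up as mismatches.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original_root, restored_root):
    """Return relative paths that are missing or differ in the restored tree."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(original_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, original_root)
            dst = os.path.join(restored_root, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                bad.append(rel)
    return sorted(bad)
```

An empty list from `verify_restore` means every original file came back byte for byte; anything else is a tape you should not be trusting.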
Good point. I gave up on it back then. Is it still sequential though?
External drive bays with hard disk cartridges like the Dell Powervault line work alright for this. I use this at a few places. For example, a local police station. The bay is connected to their server, which serves as a backup manager for itself and the client computers. The drives are labeled (numbered) and rotated daily. One is in the drive bay, the other is locked in a fire proof safe, and the third goes home with the Chief or Sergeant at the end of each day.
I create bash scripts in Cygwin with Cron/Scheduler integration to bypass whatever backup software Dell is pushing with the Powervaults now. It was Yosemite.
I find this to be cheap, fast, and effective for very small businesses or organizations. You have an on-site backup, an on-site backup detached and secured, and an off-site backup.
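The three-drive rotation can be written down as a small schedule. This Python sketch is only an illustration: the `rotation` function and the drive labels are hypothetical, and it assumes the drive that just came out of the bay is the one that goes home, while the older one sits in the safe.

```python
def rotation(day_index, drives=("1", "2", "3")):
    """Where each numbered drive is on a given day (day 0, 1, 2, ...).

    Assumption: the drive that just came out of the bay goes off-site,
    and the one before it stays in the fireproof safe.
    """
    n = len(drives)
    return {
        "bay": drives[day_index % n],
        "offsite": drives[(day_index - 1) % n],
        "safe": drives[(day_index - 2) % n],
    }
```

Every drive is in exactly one place on any given day, and each drive returns to the bay every third day.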
Last place I worked, they backed up over a VPN to a remote site continuously, to backup hard drives nightly, and to tape weekly. There were at least 12 tapes in the rotation unless one broke or otherwise failed, in which case the rotation would be temporarily shortened until some more tapes could be ordered. The most recent tape lived at the office. The next most recent was at the owner's house. The third most recent was in the home of the Senior IT guy, and the fourth most recent in the home of the Junior IT guy. All three of these people had tape drives at home so they could restore remotely if necessary. This was at a company with 80 people and a full server room, so it's not like it was an insignificant amount of data. In the five years I worked there, they had to load files from tape ONCE, and that was only because the missing files weren't noticed for about two weeks.
Granted, this was in an industry where such record-keeping is required by law, but even so, the backup system was considerable overkill for the need. We never lost more than one day's worth of data, and that happened twice.
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
Nowhere within I'd say an order of magnitude or two of what you are suggesting once you start going beyond trivial amounts of data. Also it's not done within a vacuum. If your drive dies there are others that can read it so long as you don't wait a few decades between transcription - and even if you stupidly do wait decades without your own drive there are still many places that can do transcription for you.
Electricity, air conditioning etc costs money as well and there is a crossover point where a box of tapes in a shed is going to cost a hell of a lot less than spinning storage and those advantages you write of are not necessary with long term rarely accessed storage.
You don't get a call along the lines of "we need all the emails from 1997 in five minutes" in a normal business - outside of a niche in Intelligence or something, everyone is going to have the hours needed to get it off tape. In my case it's "X is starting reprocessing work on a survey from 1969 in two weeks and the client is sending us the tapes" - so I forgot to add in the other big advantage - TRANSPORT. Also, can you imagine what sort of costs would be incurred keeping hundreds of gigabytes of data on spinning storage since 1969 with nobody looking at it since then? There would be an astronomical difference between that and the cost of whatever transcription steps happened over the years before it finally ended up on a couple of LTO2 tapes.
It all comes down to a philosophy of "one size fits all" versus "the right tool for the job" I suppose. Tape (or removable disk in the short term) is the right tool for the job IMHO but sometimes you can get away with something else.
Also did I mention that former web hosting company near me that had nothing but spinning storage? It appears they actually did have proper snapshots on a second offsite system but whatever happened to the main system happened to it and they lost everything. The rumours were hacking or a disgruntled employee but all that is certain is they lost all of their clients data - web pages, DNS, virtual machines, the lot.
There's no need for a robot when all the "tapes" (aka HDDs) are all accessible while stored: http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/
(IIRC this was on Slashdot before).
But if an org only needs 25 drives or a similar magnitude, you can go Dell (e.g. stuff like the MD1000 or MD1220) for many times the price/capacity. Most orgs with those needs can afford it.
If I were running my own company I'd go with Dell/etc and HDDs first, but if my required storage capacity curve goes up steeply, I'd do the backblaze thing.
Not tape. I don't trust tape, in my experience tapes fail way more than hard drives. Both the tapes and tape drives wear out quite fast - there is physical contact between the heads and the tape, and the drives make many passes per backup cycle. So you'll actually need a higher number of tapes compared to HDDs which are mostly fine spinning at thousands of rpms for years.
Maybe I've been unfortunate? So far the HDDs seem to do better for: total number of HDDs, number of failures per year vs total number of tapes (or tape drives) and failures per year.
It appears that I am wasting time discussing something with an inexperienced, condescending little shit that has decided to call me a liar. Large seismic surveys produced hundreds of gigabytes of source data per survey even back then.
Maybe where you are - once again, inexperience.
Why are you trying to tell me all this stuff I already know about storage but just do not agree is the best solution for every situation?
I'm an idiot to refer to an example which had recent press coverage? Do you speak like that to people's faces?
Yes, but "of" and "if" are different options, whereas it usually doesn't matter whether you have a / or not in most commands.