Hard Drives Instead of Tapes?
An anonymous reader writes "Tom's Hardware News' weekly newsletter has a very interesting article about Dr. Koch of Computertechnik AG, who won the contract to build a RAID backup system for the University of Tübingen. Dr. Koch took several standard entry-level servers, such as the dual-Athlon MP, added modern components and three large-caliber IDE-RAID controllers per computer, for a total of 576 x 160GB drives."
This is a much better solution than tape, really. The industry will probably move in this direction, now that the hardware is cheap enough and has enough capacity to serve this function.
Imagine: instant recovery. Your backup could be a usable image of your live server.
So a BIG RAID is somehow safer than many small RAIDs? Backups aren't just for the heck of it... some of them are required for compliance, e.g. in the financial industry.
What about being able to transport and store the information offsite?
I mean, sure, tape isn't great, but it's a lot more transportable than hard drives.
---
Programming is like sex... Make one mistake and support it the rest of your life.
There has to be a better way than relying on anything stored in magnetic format. Optical, I think, would be preferable, and it's resistant to EMP.
That would take a while to fill with MP3s!
But as large as hard drives are getting, the demand for backup will still be larger. I don't see this taking over from tape any time soon. People have been talking about how big hard drives are getting, and about the demise of tape, for a long time.
Just remember, if you can build something like this for backup, you can also build something like this for regular storage... and then what will you do if you need to back it up? Especially if you need to have a 6-month rotating backup...
I'm afraid it will be back to tape then...
I know of a lot of people (myself included) who use multiple external hard drives in rotation for their backups, especially now that servers' hard drive capacities are growing so fast. I just spec'd out a fileserver for a department at a cash-strapped public institution, and a tape drive big enough to back up the system's disk would have been more than 50% of the cost of the computer, not to mention the cost of tapes. Instead I set them up with two FireWire hard drives. For their needs, the reliability/longevity/cost equation made hard drives the best solution.
One thing about tape systems that I didn't see mentioned was the portability of the media. Data recovery is still impossible if your backup burns up along with your server. I don't see anyone rolling one of these out to the offsite storage.
Maybe you could do it with a big pipe between your backup location and your servers. But I bet that would cost a bundle in bandwidth.
Also, did anyone notice the typo on UPS (written as USP; maybe they were on drugs)? It took me a good minute to catch it.
"I'm just here to regulate funkyness." - James Gandolfini, as Winston in The Mexican
sPh
The unfortunate thing is that tape technology just hasn't kept pace with disk technology. At my first job, we were backing up $1,000 20MB drives onto $40 200MB tapes. If that ratio held true, today we would have $4 tapes that hold around a terabyte of data...
But, we now have $100 tapes that hold as much data as a $100 hard drive.
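The "$4 tape holding a terabyte" figure is a straight ratio extrapolation. The sketch below redoes that arithmetic; the assumption that a $100 drive today holds roughly 100 GB is not stated in the post and is labeled as such in the code.

```python
# Back-of-the-envelope check of the "$4 tape holding ~1 TB" extrapolation.
# ASSUMPTION (not from the post): a $100 drive today holds about 100 GB.

def scaled_tape(old_drive_price, old_drive_mb, old_tape_price, old_tape_mb,
                new_drive_price, new_drive_mb):
    """Scale tape price and capacity by the old tape-to-drive ratios."""
    price_ratio = old_tape_price / old_drive_price   # 40 / 1000 = 0.04
    capacity_ratio = old_tape_mb / old_drive_mb      # 200 / 20 = 10
    return new_drive_price * price_ratio, new_drive_mb * capacity_ratio

price, capacity_mb = scaled_tape(1000, 20, 40, 200, 100, 100_000)
print(f"${price:.2f} tape holding {capacity_mb / 1_000_000:.1f} TB")
```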
We switched over to hard drives for backups at our (modest) server facility. Late last year we spent $2000 on a system with 600GB of RAID-5 protected storage. That holds current and historic backups for around 6 months at our current load. Each week we then dump the current data set to a removable 120GB hard drive, which we take off-site.
Tapes are SO dead...
It works great.
Sean
Right now, Sony is shipping Super-AIT tapes. The cartridges are about 3/8 of an inch thick, and each holds 500GB, before compression (which is integrated in the drive hardware). The drive can read or write at 30MB/s, before compression. With typical IT compression of 2:1, you get just under 60MB/s. The cartridge goes for about $150. Just try and get a terabyte of disk for that much. No, the drives aren't cheap, but they get paid off quickly.
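A rough sketch of what those Super-AIT figures imply for fill time and media cost per gigabyte. It uses only the numbers in the post (500 GB native, 30 MB/s native, 2:1 compression, $150 per cartridge) and assumes decimal units throughout.

```python
# Super-AIT figures from the post; decimal units assumed (1 GB = 1000 MB).
NATIVE_GB = 500
NATIVE_MB_PER_S = 30
COMPRESSION = 2.0
CARTRIDGE_USD = 150

def hours_to_fill(native_gb, mb_per_s):
    """Hours to stream one cartridge at a sustained native rate.
    Compression scales effective capacity and effective rate equally,
    so the fill time is the same either way."""
    return native_gb * 1000 / mb_per_s / 3600

fill_h = hours_to_fill(NATIVE_GB, NATIVE_MB_PER_S)
usd_per_gb_native = CARTRIDGE_USD / NATIVE_GB
usd_per_gb_compressed = CARTRIDGE_USD / (NATIVE_GB * COMPRESSION)

print(f"fill time: {fill_h:.1f} h")
print(f"${usd_per_gb_native:.2f}/GB native, ${usd_per_gb_compressed:.2f}/GB at 2:1")
```

This counts media only; the drive cost the post mentions is left out.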
Yes, disk is good if you need instant access to your backup, and for small installations of under a couple of TB, using disk backups makes sense, but for larger data pools, tape is far more economical.
Also, as mentioned in the article, disk is terrible if you need off-site backups. In addition, a tape library consumes far less power, takes up less space, and produces less heat than a drive array of the same capacity.
Basically, the death of tape has been predicted for years, but it hasn't happened yet.
What about reading the article?
[snip]
The Real World: Hard Times for Tape Backups
There's one aspect in which Dr. Koch's backup system can't keep up with tape solutions: storing the backup medium in another location after the backup has been completed.
[/snip]
I looked at doing something similar (but on a smaller scale) for my home, but the amount of power that a hard-drive-based storage system takes is amazing. In addition, IDE hard drives aren't known for their reliability :P (I've had numerous IDE RAIDs fail so spectacularly that I won't do that again...)
I ended up going on eBay and getting a StorageTek 9714 "Media Library" with 2 DLT 4000 drives in it. It draws a maximum of 2A (I've measured it much lower than that when the tape drives aren't in use). This sucker will store up to 2.4 TB (1.2 TB uncompressed) in the 60 available tape slots.
The electricity savings more than make up for the cost of the tapes. (Also, I expect the tapes to last approx. 5-10 years; I wouldn't expect that from the hard drives.)
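The library's quoted capacity follows from slot count times cartridge size. This sketch assumes DLT IV cartridges at 20 GB native each (the DLT 4000's native format) and the usual 2:1 compression ratio; neither number is spelled out in the post.

```python
# StorageTek 9714 capacity check.
# ASSUMPTIONS: 20 GB native per DLT IV cartridge in a DLT 4000, 2:1 compression.
SLOTS = 60
NATIVE_GB_PER_TAPE = 20
COMPRESSION = 2

native_tb = SLOTS * NATIVE_GB_PER_TAPE / 1000   # decimal TB
compressed_tb = native_tb * COMPRESSION
print(f"{native_tb:.1f} TB native, {compressed_tb:.1f} TB compressed")
```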
--Mark
Instead of building a giant kluge, why didn't they buy a few Quantum DX-30s? Each one only takes up 4U, holds 20 drives, and the internal software emulates a tape library so it easily integrates with enterprise backup software from Legato or Veritas. If your environment requires off-site storage, you could attach a tape library to clone the backups and then store the tapes off-site.
I heard that the latest FireWire-attached drives are high-capacity as well as portable. Sorry, no link; me too lazy.
Disk backups have been around for a while; companies such as Veritas, StorageTek, EMC, etc. offer point-in-time backups to disk as an intermediary to tape. As others have stated, you still need off-site storage for DR.
Your mind moves quicker than a nun's first curry. - A. Rimmer
Isn't this a dupe?
That's fine for MP3s, but where am I supposed to store my DivX "backups"?
Haven't you ever put a CDR in a microwave? Pretty lights! (I take no responsibility for any damage to your microwave...)
Dupe posts are
The idea of making backups on hard disks is not new: storage area networks (SANs) do this, and they avoid the issues with off-site backups by connecting the storage media via a wide area network. This does create potential problems with network bottlenecks, BUT it does allow quick transport (at network speed) of the data off-site. Additionally, it allows quicker disaster recovery: reconnect your network and download the data, voila! In fact, it really sounds like the good doctor in the article just built a local storage area network.
"I hate quotations. Tell me what you know." -Ralph Waldo Emerson
To all those people that are asking how to move data offsite. This is why employers ask puzzle questions...so they can find people that can think through the problem and don't need to be shown how to do every damn thing.
Just because tapes have always been driven to their destination doesn't mean that's the only way data can move.
As others have pointed out, you can build the hard drive backup offsite and move the data there periodically over this new invention, the internet.
t
There is no way I would want to support that monster. I didn't see any mention of what happens when a drive fails. It's cake with almost any SCSI RAID controller: look for the orange light, change the disk. Even Promise makes IDE enclosures that do the same. With this system, do you have to take down the node when a drive fails? Sure, it's a ton of space, but I'd give up some of the space for easier administration. It only costs $70 per Promise enclosure. That'll add about $12,000. So what? When you've spent $450,000, what's the big deal?
I've been looking into backup to disk lately. We do about 600 Gig a week onto LTO tapes, 500 Gig of full and 100 of incrementals across all systems.
My preference would be two sets of 4x160 in a RAID 5, using two Adaptec 2400 ATA RAID cards. That'd give me a formatted capacity of 2x 409 gigs. I'd want two of those systems available so I could have two fulls and two sets of incrementals on hand at any one time.
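For reference, RAID 5 gives up one drive's worth of space per array to parity, so n drives of size s yield (n - 1) × s before any formatting. The sketch below computes that raw figure and converts the drive makers' decimal GB to the binary GiB that operating systems often report. It does not model filesystem or controller overhead, so it will not reproduce the 409-gig formatted figure quoted above.

```python
# Raw usable RAID 5 capacity: one drive's worth of space holds the
# (distributed) parity, so n drives of size s give (n - 1) * s.

def raid5_usable_gb(drives: int, drive_gb: float) -> float:
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drives - 1) * drive_gb

def gb_to_gib(gb: float) -> float:
    """Convert decimal GB (drive labels) to binary GiB (what many OSes show)."""
    return gb * 1000**3 / 1024**3

per_array = raid5_usable_gb(4, 160)               # one 4x160 array
print(per_array, round(gb_to_gib(per_array), 1))  # 480 447.0
```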
The only stumbling blocks I've found are: finding a 2 or 3U box that will accept two of the 2400 cards and that will also provide space, cooling and power for 8 or 9 ATA160 drives. Some of the systems designed to be RAID cabinets or the bigger 4U systems might work, but short ATA cables are tough to work with and some of the front bay mounts may not provide the cabling length.
I'd probably back up the backed-up store onto tape to meet offsite backup requirements, although I'm not entirely sure how well that would work.
Suicide Booth: You are now dead! Thank you for using Stop and Drop, America's favorite since 2008.
The point is, what is the worst-case disaster that can happen on site? If the site itself is sufficiently secure, there may be little point in having off-site storage for relatively short-lived data. It's pretty difficult to steal one of these systems, and if it is in a tornado-proof basement suitably protected from flooding, anything that takes out both it and all the surrounding systems with the live data is also likely to take out the need for the data. It's clear that this is transient stuff, not like financial information that has to be kept for years and years.
Panurge has posted for the last time. Thanks for the positive moderations.
If I wanted to do something like this I would use a Network Appliance Filer, which "speaks" both NFS and CIFS natively and uses snapshot technology. There is nothing like "drag and drop restores" from a read-only copy of the data (snapshot reserve), plus the ability to back up the snapshot without worrying about open files! And yes, I would use tape to back up the Filer! Obviously the software here is custom-written for this particular use, and considering there are any number of commercial alternatives, I just don't see the point other than to say "we built it ourselves". It might be faster, but you had better hope nothing fails!
portability of the media
Oh No! All of my backups are gone! Guy walked out of here with a backpack!
A hard drive is sensitive to vibrations and has too many moving parts. The only reliable backup media is punch cards. Just don't store them near liquids.
Karma: The shiznight, mostly because I am the Drizzle.
I'm not sure I would have chosen that motherboard for this task. Quite a few [forums.2cpu.com] people have actually had the ATX connector melt and burn out on them. It seems to be related to the fact that the TigerMP (2460) does not have the separate 5/12V aux header that most other dual-AMD boards have, so it ends up pulling all the power it needs through the ATX connector and melts it.
High-powered CPUs and PCI cards drawing off the 5V rail cause this most often, and these boards have 2 x 1500 MPs and 3 x 3Ware RAID cards. (The RAID cards should be running at 3.3V, though.)
I personally have 3 of the TigerMP's and plan on replacing them very soon for this reason.
I've been running four 120GB drives in a RAID5 configuration at home for about six months now. It makes sense, since it's online storage, not offline, and it's relatively fault-tolerant. If I do lose a drive, I'll just shut the server down until I have replaced it, to ensure that I don't lose more drives. I know that it has a higher chance of catastrophic failure later, but it's not going to be any worse than tape's 20-30% fault rate, so I'll gladly live with it.
Besides, I'm the only one of my group of friends with 360GB fault tolerant storage online. It's good for geek-factor.
Do not look into laser with remaining eye.
You think you've heard all the good quotes on Slashdot, then you discover that someone else said them.
"A beowulf of tape drives?"
No really, this is actually a good idea.
"Replace all Tape drives with Hard drives?"
No really, this is actually a good idea.
"Steve, do YOU think people will need more than 640KB?"
No really, this is actually a good idea.
"WHY DOES SLASHDOT ALWAYS POST THE _OBVIOUS_ QUESTIONS?"
Exactly...
So it's not a dupe.
Somewhere I read "The sum of human knowledge is stored on magnetic media with a one year limited warranty"
SCO to Hell
Data probably is corrupted much more frequently by mistakes and systems problems, and with the sort of live redundancy favored by DR architectures the bad data is already duplicated on any redundant system before the problem is discovered. Journalling filesystems with snapshots could be helpful here, but what if the problem is in the FS code?
The bottom line is that there is no substitute for complete data snapshots on external media, and even then you better be sure you test everything periodically for end-to-end validation of the processes and procedures as implemented.
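One way to do the end-to-end validation described above is to restore to a scratch location and compare file contents against the live tree. The sketch below hashes both trees with SHA-256 and reports missing, unexpected, and differing files. The function names are hypothetical, and it only checks file contents, not permissions, ownership, or timestamps.

```python
# Compare a live directory tree with a restored copy by content hash.
# Hypothetical helper names; checks file contents only, not metadata.
import hashlib
from pathlib import Path

def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests

def compare_trees(source: Path, restored: Path) -> list[str]:
    """Return a list of problems; an empty list means the trees match."""
    src, dst = tree_digests(source), tree_digests(restored)
    problems = [f"missing: {p}" for p in sorted(src.keys() - dst.keys())]
    problems += [f"unexpected: {p}" for p in sorted(dst.keys() - src.keys())]
    problems += [f"differs: {p}" for p in sorted(src.keys() & dst.keys())
                 if src[p] != dst[p]]
    return problems
```

Typical use: restore last night's backup into a scratch directory, then run `compare_trees(Path("/data"), Path("/scratch/restore"))` (paths hypothetical) and treat any non-empty result as a failed test of the backup process.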
One of our campus sysadmins broke the $100,000 backup robot and didn't tell anyone. A few weeks later the Digital UNIX RAID server (filer) bit the dust and we lost 90% of the user space. The bits and pieces of data we could recover (from month-old tapes) were mostly e-mail and websites that people had replaced/deleted by then. To make things worse, the data was stored in a proprietary format that only DEC could read, so it took three months to get the data manually extracted off a tape. So how is this better than a RAID backup system? The vast majority of the time, it isn't a natural disaster that takes out your file server, so keeping the tapes off-site is by and large useless. Methinks this RAID solution is pretty dang nifty. The durability of a hard drive is much, much better than that of a DLT tape. How often do you think your sysadmins test their backups???? Chances are, not often enough to catch a bad tape before it is too late.
This isn't some sort of revelation. There are multitudes of stories on this site alone about doing this. Major backup software vendors (Veritas, Legato) all support writing their archives to disk/file locations. The homegrown solutions using tar/cpio/gzip are too numerous to mention.
Many, many companies do their daily/incremental backups to a disk location, only sending their weekly/monthly archives to tape and offsite vaulting. Another method is to have backup software writing almost continuous backups to a disk or near-line storage medium (think HSM) only to have a tape backup solution come along during off-peak hours. Yet another method is to use filesystem snapshots to create a temporary backup copy of your dataset, allowing the backup software to work against that, removing the load that a backup places on the primary dataset (such as a high volume database).
Smart IT managers have figured this out long ago.
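The disk-then-tape staging described above can be expressed as a simple schedule. This is a hypothetical policy for illustration only: daily incrementals go to a disk pool, a weekly full goes to disk and is cloned to tape for offsite vaulting, and the first of each month flags a tape as a monthly archive. The day choices and step names are assumptions, not anything a particular backup product uses.

```python
# Hypothetical disk-to-disk-to-tape schedule; day choices are assumptions.
from datetime import date

def backup_plan(day: date) -> list[str]:
    """Return the backup steps to run on the given day."""
    if day.weekday() == 6:  # Sunday: weekly full, staged on disk then cloned to tape
        steps = ["full backup -> disk pool",
                 "clone disk full -> tape",
                 "send tape offsite"]
    else:                   # other days: incremental to the disk pool only
        steps = ["incremental backup -> disk pool"]
    if day.day == 1:        # first of the month: keep a monthly archive tape
        steps.append("mark latest tape as monthly archive")
    return steps

for d in (date(2003, 6, 1), date(2003, 6, 2)):
    print(d.isoformat(), backup_plan(d))
```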
First, nothing begins if not opening
The real problem is that tapes lag behind hard disks in storage capacity per dollar. There isn't even much of a weight/size advantage with large tapes (DLT IV, AIT-3). 80GB drives can be had for less than $100, and backing up the same 80GB takes 2 DLT tapes at $45 each. You save a few bucks on the tape, but you have more tapes that take up more room. Plus, hard disks have the extra benefit of being tons faster and seekable.
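The per-gigabyte comparison above, worked out. "Less than $100" is rounded up to $100 here, and 40 GB native per DLT IV tape is an assumption consistent with "2 tapes per 80GB". Only media is counted; the tape drive itself is not.

```python
# Media cost per GB: one 80 GB drive vs. enough DLT tapes to hold 80 GB.
# ASSUMPTIONS: drive priced at $100, 40 GB native per DLT IV tape.
drive_usd, drive_gb = 100, 80
tape_usd, tape_gb = 45, 40

tapes_needed = -(-drive_gb // tape_gb)   # ceiling division
tape_total = tapes_needed * tape_usd
print(f"disk: ${drive_usd / drive_gb:.3f}/GB, tape: ${tape_total / drive_gb:.3f}/GB")
```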
Now your argument about hard disk capacities increasing doesn't hold water. You will need more tapes to back up bigger data storage arrays anyway, so by that logic, you could buy still more cheap hard drives. The sweet spot in cost between tapes and the hard disks won't change.
So you just size up your "real" storage, then buy extra, cheap hard drives in sufficient quantity to mirror it at intervals.
I think the real reason no one does this is that it seems counterintuitive; it's not a common practice. Every time I want to configure some massive but cheap volume to store project data, I always get stopped by my boss with "well, how are we going to back this up?". The tape technology isn't there (for the right price). But if we spent the money we spend on tape drives and tapes to fund a hard-disk-based solution, I wouldn't worry about how we would back those massive volumes up. I could probably buy a whole pallet of hard disks for the entire project, and allocate X for the actual storage, Y for incrementals, and Z for archives and hot spares. Plus we could move hard disks in and out of the data volume and into the backup pool (or vice versa) as our needs dictate.
What's nice about the hard disks is that they will be in storage, in parked mode most of the time so you shouldn't have to worry about them failing even if the warranty is shoddy. And it's got built-in electronics, so you don't have to worry about a tape drive/robot going on the fritz.
Fuck Beta. Fuck Dice
Hard drives are far too sensitive to vibrations and have too many moving parts, magnetic tape is too prone to magnetic interference and damage, and punch cards don't like liquids!
Can anybody think of a completely foolproof, viable large-scale memory storage system?
Murphy's Law of Research: Enough research will tend to support your theory.
Now, if only you could learn to spell w00t, life would be perfect (like when you land at the top of the flag pole and get to see the fireworks).
StorageTek offers an IDE-based SAN storage device (BladeStore) that exceeds this and would be a whole lot easier to manage.
"Remember, any tool can be the right tool." -- Red Green
There is no hard drive brand I would like less to have in a hopefully reliable array than Maxtor.
I lost 100% of a batch of 10 Maxtors at work. 75% of hard drives retired due to failure (not old age/low capacity) have been Maxtors.
We have a unique backup method that is solid-state and faster than tapes.
What we do is plug a digital camera into the server, and copy everything to its flash media card inside. When we go on vacation we just take the backup "off-site" to the Bahamas.
And in the event of failure we also have a 256MB backup of the first bit of stuff on the hard drive, and a picture of the server room so we know what to order after it melts in a fire.
Offsite backups, whether tape or disk, present some pros and cons.
Pro: offsite is safer from local disaster effects.
Con: data restoration takes longer from further away.
Pro: high bandwidth connection makes moving data quick enough.
Con: high bandwidth connections are expensive
Con: high bandwidth connections are susceptible to disaster induced interruption
Overall, though, I like the random access provided by disk drives over linear searches of tapes. In case the network connection is broken to the backup site, you can easily load a couple of terabytes on cheap IDE drives into the back of your station wagon and bring them to any site you like and the effective BW will still be pretty darn good.
If you drive your station wagon across the continental U.S. loaded with 3 TB of IDE drives in 3 days, you will be running faster than a T1.
Safer away from local disaster; access time is high when locals need restoration. A big net pipe to far away, but what about a disaster that kills the network pipe? Maybe hard drives can be couriered back.
"Provided by the management for your protection."
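The station-wagon claim checks out with simple arithmetic: effective bandwidth is bits delivered divided by trip time. The 3 TB and 3 days come from the post; 1.544 Mbit/s is the standard T1 line rate; decimal units are assumed.

```python
# Effective "sneakernet" bandwidth: bits delivered / trip time.

def effective_mbps(terabytes: float, days: float) -> float:
    bits = terabytes * 1e12 * 8
    seconds = days * 24 * 3600
    return bits / seconds / 1e6

T1_MBPS = 1.544
wagon = effective_mbps(3, 3)
print(f"{wagon:.1f} Mbit/s, about {wagon / T1_MBPS:.0f}x a T1")
```

Latency is another matter, of course: the first byte arrives three days after you leave.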
Never underestimate the bandwidth of a station wagon full of tapes traveling down the interstate.
If you think education is expensive, you should try ignorance -- Derek Bok, president of Harvard
Wasn't there a recent Ask Slashdot thread about this, where the idea was thoroughly shot down?
.:|Jon|:.
This space for rent, inquire within.
576 Hard drives.
Assume 5 years MTBF.
That ends up being roughly 100 hard drive failures per year (576 / 5 ≈ 115), or about $10,000/yr, not counting labor.
Or 2 per week ($200/wk); even if replacement is done efficiently, add another $100/wk for ordering, shipping, storage, replacement and disposal.
That's assuming good cooling and low usage (equivalent to an intermittent home user, which is the kind of use I'd expect a good backup system to get).
So, ignoring the cost of the initial investment, they'll be paying up to $15,000 per year to maintain this backup solution.
This is more expensive than many traditional backup methods, such as tape.
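The failure arithmetic above, without the rounding. This treats a 5-year MTBF as a constant failure rate of 1/5 per drive-year, a simplifying assumption that ignores infant mortality and wear-out. The $100 per drive and $100/week handling figures are the post's. Using 576 / 5 instead of the rounded 100 gives a somewhat higher total than $15,000.

```python
# Expected replacements per year under a constant failure rate.
# ASSUMPTION: MTBF of 5 years -> each drive fails with rate 1/5 per year.
DRIVES = 576
MTBF_YEARS = 5
DRIVE_USD = 100
HANDLING_USD_PER_WEEK = 100

failures_per_year = DRIVES / MTBF_YEARS
failures_per_week = failures_per_year / 52
annual_cost = failures_per_year * DRIVE_USD + 52 * HANDLING_USD_PER_WEEK

print(f"{failures_per_year:.0f} failures/yr, {failures_per_week:.1f}/wk, "
      f"${annual_cost:,.0f}/yr")
```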
However there were a few 'gimmes'. Firstly, the array only has to last 5 years. Secondly they are using 5400rpm hard drives - much cooler. Thirdly, these hard drives have a 3 year warranty, which is better than most places will give you now.
So it's likely that the maintenance cost, in this case, is going to be low compared to the initial investment.
The real problem, then, is the tendency to keep an old system long past its prime and original intent. Someone in the future will say, "Instead of junking the system and upgrading to new technology, let's just throw larger hard drives in there each time one fails and up the capacity." Eventually it will cost $10k or more per year, and they won't know it.
-Adam
A large RAID is better: if you're going to bother making a large RAID array, you might as well use the best RAID level. RAID 5 stripes data across the drives with distributed parity.
If one drive dies, you can hot-swap it with a fresh one and the controller rebuilds its data from the parity and data on the other disks, with no downtime.
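A minimal illustration of how that rebuild works for a single stripe: the parity block is the XOR of the data blocks, so any one missing block equals the XOR of all the surviving blocks. Real controllers rotate parity across drives and operate on much larger stripes; this sketch only shows the principle.

```python
# RAID 5 parity for one stripe: parity = XOR of data blocks, and any single
# lost block is recovered as the XOR of all remaining blocks.
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data drives in one stripe
parity = xor_blocks(data)            # stored on the fourth drive

# Drive 1 fails: rebuild its block from the surviving data and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'BBBB'
```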
The big boys in storage (i.e. EMC, IBM, Persist, et al.) are already doing this with self-healing, cheap-disk-based WORM technology to archive data locally and mirror it geographically. This guy is just using the idea for the hardware design without the smarts of the self-healing software.
The company I work for has been doing this for a few years already... ATTO Technologies, Inc.
There are many off the shelf products to do this. Among them:
o Tier 3 storage platforms from vendors like NetApp and others. You can use a NetApp R100 as near-line storage for backup.
o Techniques such as BCV or snapshot for backup. You can leave the 3rd mirror broken all day and use it for fast restore that day if necessary (or for remounting as a reporting database or for copying production data to test, or, or, or)
I certainly wouldn't roll my own. Tapes aren't going away. There will always be a need for archival and secure off-site backups. But doing short-term backups to disk (or staging backups to disk) is becoming a fairly common solution to dealing with today's larger data volumes and smaller backup windows.
Perhaps this should be an Ask Slashdot:
I manage the systems for several small businesses. At every one, I've spent endless hours dealing with unreliable DAT tape drives. The labor is very expensive for my clients.
It could be me, of course ;-), but I manage to build reliable, low-maintenance systems for other functions. It's not the hardware either; the problems occur regardless of manufacturer or vendor.
Can anyone suggest an affordable, low-maintenance alternative to DAT? After exhaustive searching, DLT is the next best option I could find, but it's very expensive (for a small business) and I don't have enough personal experience to say it's any more reliable than DAT.
I set up one office with removable hard drives, but they have no archiving needs. Most of my clients need to archive, and as the article and others have noted, hard drives aren't up to the job.
- You can't move your data offsite on HDs, first of all. Ask those shops in the WTC. Some learned it the hard way. "Offsite" really means "some km away".
- Then, even if you moved IDE-drives out of the building, they are far too easy to break. Just remember the Jaz-drives from Iomega. Crap.
- And, as other people have already pointed out, if your controller breaks and ruins the disks, you're hosed, too.
Tapes may be expensive compared to IDE HDs, but you get what you pay for. I can drop a tape on the floor dozens of times and still be able to read it. Do that with an HD and compare. And don't say "I'm not going to drop it ever", because that's BS.
Libraries are expensive, but either you need to have the data backed up reliably (for whatever reasons: laws, business continuity, etc.) and restorable within a certain timeframe and with a certain confidence, in which case you have to shell out the bucks for the tape robot; or you decide that the data you "own" is just not worth the effort and go for a pseudo-backup with IDE RAID etc. that might get destroyed when the PSU in your backup server decides to explode and take the drives and all with it.
It's your job, your life, your company.
Windows 2000 - from the guys who brought us edlin
Robotic libraries full of hot-swappable hard drives. It sounds bizarre, but it could happen.
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
Just a disclaimer to start things off - I am in the tape library business, so take what I say with a grain of salt. OTOH, I am a technical person, so it isn't going to be a polished marketing twist either.
A .53 failure rate is good (I'm not sure what the published rates for new tape drive technology are), but the rate 5 years down the line is going to be much higher, in my opinion.
The article mentions one major drawback, the inability to do offsite storage. You could work something out with offsite mirroring, but bandwidth costs at 70TB would get excessive. Not to mention needing the same hardware setup on the other end.
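To put a number on the offsite-mirroring problem: the sketch below estimates how long an initial full copy of roughly 70 TB would take over a dedicated link, ignoring protocol overhead and assuming the link runs flat out. The link speeds are illustrative assumptions (44.736 Mbit/s is the standard T3 rate), not figures from the article.

```python
# Days to push an initial full copy over a dedicated link (no overhead).
# Link speeds below are illustrative assumptions.

def transfer_days(terabytes: float, link_mbps: float) -> float:
    bits = terabytes * 1e12 * 8
    return bits / (link_mbps * 1e6) / 86400

for label, mbps in [("T3 (44.736 Mbit/s)", 44.736),
                    ("100 Mbit/s", 100),
                    ("1 Gbit/s", 1000)]:
    print(f"{label}: {transfer_days(70, mbps):.1f} days")
```

Incremental changes after the initial sync would need far less bandwidth, but the first copy alone shows why the other end usually gets seeded by truck.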
The other major advantage that tape has over disk is archivability. Once you write a tape, that data is static; it can sit in a slot in the library for a long time. Since this system is only designed for 5 years, archiving is not a big deal here, but in a lot of industries it is huge. Data on a disk drive can be altered seamlessly far more easily than data on a tape.
To the person who mentioned the shock/vibration values for a disk drive vs. a tape cartridge: #1, I have dropped PLENTY of cartridges, and have only had one chip a corner. That chip did not affect my ability to use the tape further. Additionally, if the housing is destroyed, the process of spooling off the tape and splicing it onto a different tape is not that difficult. I would not lose the data permanently. If there is a major mechanical failure inside a disk drive, getting the data off the platters is a lot harder.
I would be interested in seeing numbers for throughput of the system, power consumption, backup window lengths, average restore time. Some of these might stack up favorably to tape, others might not.
The comment on moving to optical as a backup medium: maybe someday, but for now the space needed/time to back up to optical does not compare well with tape. A DVD holds 4.5 GB vs. 100GB for a tape (currently available; yes, I know blue lasers will improve that).
As for a robot failure, worst-case scenario, you put the tape in the drive manually. Realistically, at least at our company, we have solved this problem for our customers by providing the ability to easily replace components. This can happen either with a field engineer, or even the customer themselves. Generally all you need is a Phillips screwdriver, 20 minutes max, and the ability to follow instructions.
Again, I'm not in the sales department, so I can't quote costs, but a 435K total cost for 70TB is not that cheap. With tape systems, a lot of the cost depends on how fast the backups need to occur. I could build out a 70 TB system with 1 drive, a SCSI connection and a huge wall of tapes relatively cheaply. As you add more drives, use fibre or gigabit Ethernet interfaces, etc., costs go up, but access times go down. Cost can also be brought down by not going with the 500 lb gorilla of the field, StorageTek.
Yes disk is growing, but generally it does not replace tape, it only pushes it back a layer. This won't change for a while.
For some years there have been rumours of optical tapes with capacities in the several hundreds of GB or even several TB per cartridge, but no products that I am aware of so far.
Still, I think that this imbalance between tape prices and HDD prices cannot last.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted and ignored otherwise.
When it comes to banks, driving tapes around town is easier than ordering a pizza. The major banks in downtown NYC that house their operations in more than one building, often separated by other buildings, have multiple minivans that spend all day and all night driving from one building to the next (one I'm familiar with does this even between buildings next door to each other), transferring transaction documents. They do this all day long, from one loading dock to the other and back, with trips spaced every few minutes apart. All day. All night. Adding tapes or other media to the mix would be incredibly simple, and it would be handled by the same department, since the documents and the media are both handled with security in mind.
This is for buildings that are next door to each other, down the block from each other, and a few blocks from each other. From what I've seen. I'm sure they do the same thing, on a less frequent basis (maybe hours instead of minutes) with buildings in different geographic areas.
After the first bombing of the Trade Center, I helped move computers, files, and related materials to alternate sites due to the immediacy of the situation, and due to the extensive smoke damage in the towers. Back then, some backup sites took days to weeks to get fully functional. I worked long hours for over a month moving everything to the outer boroughs, Westchester, Jersey, and other areas. One moving company that had the Port Authority contract, and had contracts with many financial firms in the trade center had to lease trucks and drivers from other companies, and had over 100 moving vans and drivers (and workers in the buildings) running for over a month 24/7, at union rates.
After the first bombing, all of the companies we moved, from what I was told, all of them, implemented backup sites that could be up and running in hours rather than days or weeks. Many moved out, or moved a portion of their operations out, to other geographic regions, with Atlanta getting a huge share, and Jersey City, closer to NYC, getting quite a bit due to their tax free business zone, and being far enough away to survive another direct attack (which we all know happened).
In NYC, there are direct fiber runs to Brooklyn and Jersey. Very expensive. But very necessary. And they are used for backup. Among other things. But there is only so much that you can fit through a pipe over a given amount of time. And when talking about financial and security firms, by their nature, they perform operations that necessitate the physical transfer of documents/securities/whatever. Adding media for important backup is easy. And after the second bombing, you can all bet your asses that any company in the financial/securities/other types of industries that relies on computer data for their existence has a survival plan, and is implementing a more detailed one, that includes physical, not just fiber, backup of their data. Either that, or they wouldn't be able to get insurance to continue their operations.
It's as simple as that. No backup, no survival plan, no physical media backup? No insurance.
Why can't Linux break into the Fortune 1000 further than it already has? Who's going to indemnify them? Red Hat? What's Red Hat's market cap? And the Fortune 1000's?
It's as simple as insurance and indemnity folks.
You either follow what your insurer tells you to do, or you are dropped. Including those firms that self insure. Because if they self insure, they are also dealing with a re-insurer, like AIG. And anyone who knows AIG would know that you either do what their highly paid experts tell you to do, or they drop you. And don't pay. Period.
Let me start by saying I also work in the financial industry, more specifically as a storage vendor to the financial industry. This guy doesn't even fathom SEC retention requirements. Some things need to be kept in 3-, 5-, and 10-year retention cycles.
So, one size doesn't fit all. Especially not tape. Guess what: ATA is mature. How do I know? Because more than one company is shipping technology based on it. I'm not talking about Maxtor or Western Digital; I'm talking about NetApp, HPaq, and EMC.
Additionally, one of our clients, one of the larger insurance companies worldwide, has asked me to architect a "tapeless" data center. We are looking to replace all of the DLT and LTO drives with ATA-based Fibre Channel SAN storage, at less than a penny per MB.
As far as offsite storage goes, we will be replicating every bit of data written to the arrays over ATM to another site, so we have location redundancy. Yes, this is expensive, but it's also bleeding edge.
Likely this is coming to a data center near you. Data is becoming dramatically more flexible than you would believe; nowhere is this more true than in the storage industry.
I've been in this business a long time now. I can usually get a disk drive from 10 years ago to work with something. Reading a tape from 10 years ago has been more problematic. Even finding a device to read a 10-year-old tape has been problematic.
That assumes the bits are even still readable which is often not the case for a tape.
In our situation (modest backups, modest hold times), IDE (not even RAID, but all in removable jackets for offsite storage) came out ahead. I do have to shut down the machine to swap the drives out once a week. The drives are used as 120GB tapes.
IDE was about double the cost of tape (per byte) but no pair of tape drives was needed, just an extra IDE card.
Since tapes and tape drives tend to change a lot over time, buying "just one" is rarely an option.
There are many scenarios where tape beats disk for backup; ours just wasn't one of them.
It is a misconception that tapes are more expensive than disk hardware. Sure, the initial investment for disk is lower (if you exclude the requirement for a dedicated gigabit LAN)... But when you factor in mission-critical (i.e. Platinum) support, the cost of replacement hardware, and the administration of huge amounts of data bundled together, it really isn't that much cheaper, when you can fit 100GB of compressed data on one $100 tape and put together a couple of robots that can handle 20TB each! Then factor in the high risk of data loss or corruption in disaster situations, and this comes off looking like a Slashdot geek's Beowulf hard-on... Get a SAN!! -zer
---Up Up Down Down Left Right Left Right B A START
However, for personal use, a $16,000 DVD-RAM jukebox is overkill (grin). I'm considering buying a FireWire hard drive to back up my new laptop, with a program that uses a MySQL database to track the files on my laptop and update the ones that have changed, using a typical rotation strategy (sort of like the one I wrote for the NAS box, except without all that futzy code for deciding which platter to put data onto, etc.).
For extra points, I could even buy *two* FireWire drives and rotate one of them off-site every day, for far less money than a new DDS-4 tape changer.
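A minimal sketch of the change-tracking idea described above, in Python. It swaps MySQL for the standard-library sqlite3 module so it runs without a database server, and it treats a file as changed when its size or modification time differs from the catalog entry. The table layout and the function names are hypothetical, not the poster's program.

```python
import os
import shutil
import sqlite3


def open_catalog(path):
    """Open (or create) the file catalog; sqlite3 stands in for MySQL here."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, mtime REAL NOT NULL, size INTEGER NOT NULL)"
    )
    return db


def backup_changed(db, source_root, dest_root):
    """Copy new or changed files from source_root to dest_root.

    A file counts as changed when its size or mtime differs from the
    catalog entry. Returns the relative paths that were copied.
    """
    copied = []
    for dirpath, _dirs, names in os.walk(source_root):
        for name in sorted(names):
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            st = os.stat(src)
            row = db.execute(
                "SELECT mtime, size FROM files WHERE path = ?", (rel,)
            ).fetchone()
            if row == (st.st_mtime, st.st_size):
                continue  # unchanged since the last run
            dst = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            db.execute(
                "INSERT OR REPLACE INTO files (path, mtime, size) VALUES (?, ?, ?)",
                (rel, st.st_mtime, st.st_size),
            )
            copied.append(rel)
    db.commit()
    return copied
```

For a two-drive rotation, each drive holds a different set of copies, so it would make sense to keep one catalog per drive, for example a catalog file stored on the drive itself.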
For big stuff, however, there's no substitute for an enterprise storage system. EMC and others nowadays work with "snapshot" technology: the SAN storage device maintains snapshots of the data as of various points in time. You back up a snapshot from some point in the recent past, rather than live data, so that the data backed up is internally consistent. It works very well, and it will back up terabytes of data to LTO without any of the backup-window problems that afflict traditional online backup. Of course, we're talking about terabyte-sized disk arrays and closet-sized tape changers.
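To illustrate why backing up a snapshot yields a consistent copy, here is a toy copy-on-write snapshot in Python. It is not how EMC or any particular SAN implements snapshots; it only shows the core idea that a block's old contents are preserved the first time the block is overwritten after a snapshot, so the snapshot view stays frozen while live writes continue. The class and method names are hypothetical.

```python
class SnapshotStore:
    """Toy block store with point-in-time snapshots (copy-on-write style)."""

    def __init__(self):
        self.blocks = {}     # live data: block id -> bytes
        self.snapshots = []  # per snapshot: {block id: contents at snapshot time}

    def take_snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block, data):
        for saved in self.snapshots:
            if block not in saved:
                # First overwrite since this snapshot: preserve the old contents
                # (None means the block did not exist when the snapshot was taken).
                saved[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, snap_id):
        """Reconstruct the volume exactly as it was when snap_id was taken."""
        view = dict(self.blocks)
        for block, old in self.snapshots[snap_id].items():
            if old is None:
                view.pop(block, None)
            else:
                view[block] = old
        return view
```

A backup job would read from `read_snapshot(...)` while applications keep writing to the live blocks.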
In the enterprise setups, nobody uses tapes for their portability, BTW. The tapes never leave the jukebox, except as packs occasionally removed and placed in a vault to place new blank tapes into the jukebox. A fat pipe is used to duplicate transactions between the local data center and a remote data center. For example, Wal-Mart has dedicated fiber optics running from their main data center in Bentonville to their backup data center in southern Missouri (which is designed to survive everything short of an atomic bomb). Every enterprise transaction applied to storage in their local data center is also applied to storage in their remote data center. There's still a lot of local data that is not replicated, but for the important data, redundancy via backup tapes is the least of what they do.
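The synchronous replication described above (every transaction applied locally is also applied remotely) can be sketched as a toy in Python. The class name is hypothetical, both "sites" are plain Python lists standing in for storage, and real replication systems handle link failures far more carefully than this simple rollback does.

```python
class MirroredLog:
    """Toy synchronous replication between a local and a remote site.

    A record is acknowledged only after both sites have applied it; if the
    remote write fails, the local write is undone so the sites never diverge.
    """

    def __init__(self, local, remote):
        self.local = local    # list-like storage at the primary data center
        self.remote = remote  # list-like storage at the backup data center

    def commit(self, record):
        self.local.append(record)
        try:
            self.remote.append(record)
        except Exception:
            self.local.pop()  # roll back the local copy, then report the failure
            raise
        return True
```

The cost of this design is latency: every commit waits on the remote site, which is why it needs the kind of dedicated fat pipe the comment mentions.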
In light of this article, here's a script I use for my backups. It copies and compresses each file to a separate volume (in this case /storage). I have a Samba share pointing to /storage that the users have read-only access to. They can restore at their leisure, using WinZip to decompress the files. I was looking for something like this for so long that I thought it only fair to share:
please be gentle with my server
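A minimal Python sketch of the approach described above (not the poster's script), assuming gzip compression, which WinZip can open, and a layout under /storage that mirrors the source tree. The function name and layout are assumptions for illustration.

```python
import gzip
import os
import shutil


def compress_tree(source_root, storage_root="/storage"):
    """Mirror source_root under storage_root, gzip-compressing each file separately.

    Returns the list of .gz files written.
    """
    written = []
    for dirpath, _dirs, names in os.walk(source_root):
        rel_dir = os.path.relpath(dirpath, source_root)
        out_dir = os.path.normpath(os.path.join(storage_root, rel_dir))
        os.makedirs(out_dir, exist_ok=True)
        for name in sorted(names):
            src = os.path.join(dirpath, name)
            dst = os.path.join(out_dir, name + ".gz")
            # Stream the file through gzip so large files are not read into memory.
            with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            written.append(dst)
    return written
```

Compressing each file separately, rather than into one big archive, is what lets a user restore a single file by opening a single .gz from the read-only share.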
I prefer to get the best of both worlds. I have a sizeable server that just holds all of the data. At our current level, it can hold about 10 days of backups, sorted by date in their own directories. Each morning, I have a shell script that sends the latest backup to a tape, which is taken off-site for insurance purposes.
:)
The nice thing about this setup is that to restore a backed up file takes about as much time as it does to copy across the network instead of going through the tape.
Of course, I'm sure I'm not the only one who's done this. I sure thought it was original when I did it, though.
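The date-sorted retention scheme above can be sketched in Python, assuming each night's backup lands in a directory named YYYY-MM-DD under one root; that naming convention and the function are assumptions for illustration. It deletes dated directories outside the retention window and returns the newest one, which is the candidate for the morning tape run.

```python
import datetime
import os
import shutil


def rotate_dated_backups(backup_root, keep_days=10, today=None):
    """Keep only the last keep_days dated backup directories (named YYYY-MM-DD).

    Older dated directories are deleted; anything not named like a date is
    left alone. Returns the path of the newest remaining backup (the one to
    send to tape), or None if there is none.
    """
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=keep_days)
    kept = []
    for name in os.listdir(backup_root):
        path = os.path.join(backup_root, name)
        if not os.path.isdir(path):
            continue
        try:
            stamp = datetime.date.fromisoformat(name)
        except ValueError:
            continue  # not a dated backup directory
        if stamp <= cutoff:
            shutil.rmtree(path)
        else:
            kept.append((stamp, path))
    return max(kept)[1] if kept else None
```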
It's true that Red Hat does not have the size to convince the insurance industry to support its products, but other major players in Linux do; IBM and HP come to mind immediately.
Is this the KDE version of Tuxracer?
Eloi are stupid, throw morlocks at them!
I've never lost a single Maxtor in all the years I've been computing. That includes my trusty old 540MB hard drive that I've had for over 6 years... and I got it used.
Dr. Koch! hehe
When I started with my present employer about 18 months ago, I was able to upgrade and reallocate six ProLiant servers in two racks that were originally set up by someone who needed a few good whacks with the clue bat. I ended up with a spare 6000 and some leftover parts from the initial install two years prior. I loaded the two unused bays in the 6000 with cages (still in the box), installed the RAID controller (also still in the box), and bought a bunch of second-hand 18GB SCSI drives. I ended up with two 84.7GB RAID 5 arrays that alternate for nightly backups. During the day, the nightly backup is copied to tape for off-site and archival purposes. The best benefit is that, except for testing, I have not had to restore from tape since I implemented this, and about 90% of the restores have been from the previous night's backup. Restoring from and copying to the SCSI drives is much faster than dealing with tapes. We only back up about 40GB a night, so we have room to grow. I have lost one 18GB drive, but I still have four spares.
I can appreciate the appeal of building a massive system from commodity hardware, but the article states that the entire system cost $435,000. By my rough calculations, a similar system using Apple Xserve RAIDs would run around $300,000, or $135k less before host computer costs, and would most likely be much easier to maintain. Plus, five racks of Xserves would look pretty bitchin' :P
- much
shorter). The principal reason not to use magnetic tape is the cost of the drives, not the shelf life of an alternative medium. Reasonable-capacity tape drives still easily cost $4000+, with yearly maintenance costs of close to a sixth of their purchase price. However, the tapes themselves are dirt cheap (though less so relative to hard drives every year). Basically, if you have enough data to actually use the tapes on a weekly basis (as opposed to little enough that you could back up to a CD-ROM or DVD-ROM), it is worth having one; if you don't, it isn't.
However, when the reliability of optical, tape, and RAID backups is compared, they all come out really high. The biggest threat to any of these backup systems is the destruction of the physical location holding the data, as in a fire.
Gallium arsenide is the material of the future, and always will be.
Of course, for a more permanent solution I suppose you could etch a minor planetoid like the moon with your financial records and then enclose it in a durable 1 km deep sandy, impact resistant cap. Then you'd create a race of monstrous reptiles called dinosaurs to protect the planetoid, because everyone knows those things are unkillable.
Tape is on its way out, for the following reasons: (1) tape capacity is not keeping up with hard drives; (2) tapes are not cheap, and if you end up reusing one more than 3x, then a drive is cheaper; (3) tape backup software is expensive, clunky, unreliable, and slow; (4) recovering from tape has always been a VERY painful experience; (5) tape units have more moving parts than a hard drive and, in my experience with an ATL unit, fail more frequently; (6) when was the last time you heard of a home user backing their stuff up to tape?
In space no one can hear your tape stream.
EMC has been selling similar backup arrays for quite a while now.
hmmm...
.... ...
*in his basement, leaning back in his chair, staring at the ceiling*
so 576 * 160 GB = 92,160 GB
averaging 2 hours for each GB
92,160 GB * 2 = 184,320 Hours
averaging 2 hours/day wack session
184,320 hours / 2 hours/day = 92,160 days
92,160 days / 365 days/year = 252.49 years
hmmm, 252 years of wack material. I think I'll need 2 or 3 of these things.
Why not create a material that self replicates and includes the data in all of its "progeny". You could even include adaptive behaviors into the self replicating medium so it will change form to adapt to different environmental stimuli. Now all you have to do is seed it on a Class M planet and come back whenever you need to access your backup.
[cue twilight zone music now...]
'the Internet is right.'
The hardware mentioned in this article is pretty slick, and it looks like something I expected would raise slashdot's eyebrow. However, there's a big element of backups that article almost entirely dismissed:
Software.
What you really have are 20+ machines, with independent IPs, system configs, and lots of DAS (direct-attached storage), with no mention of how to seamlessly make these appear to be the 70GB data store. Or where to find the machine / drive / volume that has the data you need. Or how to tell your backup clients to communicate with the hive of machines.
Good backups and restores depend more on the software that drives them than on the hardware they run on. That article, albeit cool for its look at their home-grown environment, sailed right past that point.
It mentioned something about clients connecting to one of the nodes, which acts as the server, but then what? Does the server NFS-mount all of those remote drives? Is all of the client traffic then throttled by the server's GigE card (coming in from the client, and back out again to the backup slave)? Or does the server delegate the backup to one of the slaves? But then how does the server know what data is where?
If they've really designed that great of a software backup package, that can make that system slickly manage that many backup slaves, they should market the software. That's more a challenge than the hardware!
Dave
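One common answer to "how does the server know what data is where?" is a central catalog that records, for every backed-up file, which node and volume hold each copy. A toy in-memory version in Python follows; the node names, the round-robin placement policy, and the classes themselves are hypothetical and say nothing about how the Tübingen system actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str          # which backup slave holds the copy (hypothetical names)
    volume: str        # RAID volume on that node
    backed_up_at: str  # ISO timestamp of the copy


class BackupCatalog:
    """Toy central index answering 'which node holds this file?'."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.entries = {}  # (client, path) -> list of Location, oldest first
        self._next = 0

    def place(self, client, path, volume, timestamp):
        """Choose a node (round-robin) for a new copy and record it."""
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        loc = Location(node, volume, timestamp)
        self.entries.setdefault((client, path), []).append(loc)
        return loc

    def locate(self, client, path):
        """Return the newest copy's Location, or None if never backed up."""
        copies = self.entries.get((client, path))
        return copies[-1] if copies else None
```

With a catalog like this, the front-end server only has to answer lookups and hand out placements; the bulk data could then flow directly between client and slave instead of through the server's single GigE card.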
There is a client-server model here, with an administrator who decides what to back up, when, and for how long. The next step would be to have the administrator set it up to copy the data from the backup storage to tape; since that data is "fixed", you can do it whenever it is convenient (think office hours).
IBM has a product called Tivoli Storage Manager (TSM). Great product: it even provides you with a list telling you what is on the tapes, which tapes are going to off-site storage, which tapes are already at off-site storage, and which tapes to bring back from off-site storage next time.
The problem with back-ups is not that it cannot be done; the problem is that many organisations do not do it properly or do not do it at all. Properly means that you figure out what to back-up, and how to use the back-up when required. Back-up can be done with as little as a 128 MB memory stick or a CD. Testing a back-up can be as simple as using a computer at home or a laptop that can double as the emergency system.
Hell, if I could spend a day for each organisation that does not have a back-up procedure in place, I would be employed for the rest of my years.
Thanks,
Gerard
a beowulf ...
Yeah, it isn't a good solution for a backup of that size. But for the rest of us, a modest SCSI or IDE RAID in house, mirrored over the Internet to an identically sized IDE system using rsync or some other incremental solution, could be viable.
The initial sync could be done in house; once the data has "filled up" and plateaued, the backup system can move to another site, perhaps a hosted or colocated one.
Obviously if your information is at all sensitive, rsync would need to use ssh or something.
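A minimal Python sketch of the rsync-over-ssh mirror suggested above. It only builds and runs a standard rsync command line (`-a` archive mode, `-z` compression, `--delete`, `-e ssh`); the function names are hypothetical, and the host name and paths in the example are placeholders.

```python
import subprocess


def build_rsync_command(src, dest, ssh_user=None, ssh_host=None, delete=True):
    """Build an rsync command line that mirrors the contents of src to dest.

    If ssh_host is given, the transfer is tunnelled over ssh so the data is
    encrypted in transit.
    """
    cmd = ["rsync", "-az"]  # -a: archive mode, -z: compress during transfer
    if delete:
        cmd.append("--delete")  # remove mirror files that no longer exist in src
    if ssh_host:
        cmd += ["-e", "ssh"]
        prefix = f"{ssh_user}@{ssh_host}" if ssh_user else ssh_host
        target = f"{prefix}:{dest}"
    else:
        target = dest
    # A trailing slash on the source copies its contents, not the directory itself.
    return cmd + [src.rstrip("/") + "/", target]


def run_mirror(cmd):
    """Run the command, raising CalledProcessError if rsync fails."""
    subprocess.run(cmd, check=True)
```

For example, `build_rsync_command("/data", "/backup", "backup", "mirror.example.org")` yields `rsync -az --delete -e ssh /data/ backup@mirror.example.org:/backup`. Note that `--delete` also propagates accidental deletions to the mirror; dropping it, or keeping dated copies on the mirror side, avoids that.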
Not to mention that I might have enough space left for a movie or two.....
I don't see off-site as a problem if you can afford a second unit hooked in with a high-speed network. The problem I do see is hackers and, well, administrator screw-ups. With all copies of your data online, if it gets accidentally or purposefully deleted, you're totally screwed. The only other issue is that at this size, you run into scaling limits on power and weight.
i wish the conversations here (or anywhere) would differentiate between archiving and backup. they're very different (at least for our purposes).
we back up nightly. just in case any one computer fails, we restore, presto... problem solved. any working files and the OS are saved. we're happy.
our archiving needs are tremendous and require constant management. at the end of every week we offload 70-80 gigabytes of information to tape. we then dupe the tapes for redundancy. these tapes are then stored forever (or at least for the 10 years the company has been around). we probably have an average of 5-7 tapes fail per year. luckily we have never had a tape fail and also had the dupe of that tape fail. we've gone through several iterations of DDS and will now probably settle on some form of AIT.
we looked at the feasibility of IDE hard drives. a 120 GB IDE drive can be had for $80 today. cheaper than a 100 GB AIT. but what is the shelf life of a hard drive? how do you connect it and have it be hot-swappable (firewire)? how do we span multiple hard drives?
an IDE RAID doesn't work for archiving. after all, you can't physically store an IDE RAID after you fill its capacity. swapping out the drive and storing just the drives themselves seems only slightly more feasible. even so, this doesn't address the issue of spanning, a RAID would just give more space. a SAN solution doesn't address archiving issues either.
what i want are IDE drives with lotsa shelf life (10 plus years) and an IDE auto-loader/duper that will automate the back up process the way a tape autoloader does. anyone sell such a thing?
That makes the backup volume vulnerable to the human element. It could be stupidity (rm -rf / ...OOPS, I logged in as root!). It could be malicious (had any layoff rumors lately?). It could be a code bug. But whatever it is, all the data volumes are there, where the server can access them! You really need to use either removable or write-once media, preferably both. Then move that media off-site, to a fire resistant, theft resistant, flood resistant safe. If you use an automated media handling system, be sure it has a decent "check-out" system so you can get those off-site volumes out of the robot's hands and into a safe repository.
On-line backup volumes are one step away from oblivion. Due diligence demands we do something better!
Disclaimer: In a former job, I was on automated tape library development teams. Also on RAID development teams. RAID is great, but you must back it up to provide for disaster recovery.
And for only 3 payments of $29.99, I'll explain how to take that interview by storm! Order now!
This is my digital signature. 10011011001
We use 120GB Iomega removable USB hard drives for backup. The customers know they can't throw them around, so we haven't really had any problems yet. I think it's a better way to go; it seems less likely to wear out or break, not to mention you don't have to worry about your drive getting dirty and not working. Iomega has some software for backing up, although I just schedule a batch file. You get large-capacity backups that can be restored without having to install a SCSI controller, drivers, a tape drive, more drivers, etc., and you still have the ability to take backups off-site (get two for a rotation).
Sign me up for 3 of these!
I got a major stiffy on this!
Best. Name. Ever.
In the Shannara series, the AI Antrax did something similar to back up all the histories of the old world that was wiped out by nuclear war. Antrax, the only surviving entity from the tech world, kept copying the archives from one storage array to another, so that the data would always be redundant.
I want 92TB... Please?
www.masterslate.org
Why does it always have to be the station wagon full of tapes? Why is it always across the continental U.S.?
What is it about the credit card that makes it the ideal object with which to compare the size of a new product?
Why are unimaginably large storage devices measured in Libraries of Congress?
Let's change it up a bit: a Segway rider traveling 15 miles across town in three hours with 1 TB in his backpack will be running faster than OC-12.
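Checking the Segway claim, assuming a decimal terabyte ($10^{12}$ bytes) and the OC-12 line rate of 622.08 Mbit/s:

```latex
\frac{1\ \text{TB}}{3\ \text{h}}
  = \frac{8 \times 10^{12}\ \text{bit}}{10\,800\ \text{s}}
  \approx 7.41 \times 10^{8}\ \text{bit/s}
  \approx 741\ \text{Mbit/s}
  > 622.08\ \text{Mbit/s}\ (\text{OC-12})
```

With a binary terabyte ($2^{40}$ bytes) the figure rises to about 814 Mbit/s, so the backpack wins either way.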
The cure for cancer is coming: Reovirus
Why hasn't anyone mentioned the idea of using a hotplug HDD rotation?
It wouldn't be that big of an issue: buy the biggest hotplug drive that works with your system, back up your data, remove the drive, and treat it like any other tape.
Granted this would only cover a part of some of the larger backup requirements, but on the other hand, you could always have HDD1/HDD2/HDD3 for sequential media.
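A minimal sketch of the HDD1/HDD2/HDD3 rotation as a Python function that picks the drive for a given week. The weekly period and the fixed Monday epoch are assumptions; any schedule that cycles through the drives in order works the same way.

```python
import datetime

# Assumed anchor for counting weeks: Monday, 3 January 2000.
EPOCH = datetime.date(2000, 1, 3)


def drive_for_week(drives, day, epoch=EPOCH):
    """Return the drive that should receive the backup for the week containing day.

    Drives are used in strict rotation: HDD1, HDD2, HDD3, HDD1, ...
    """
    weeks = (day - epoch).days // 7
    return drives[weeks % len(drives)]
```

With three drives in rotation, the two drives not currently in the machine can be kept off-site, so losing the building costs at most about a week of changes.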
One thing I have been thinking about is building a win2k server box as a domain controller (no master roles) and setting it up as a "backup" server, meaning that it runs all the backup jobs and stores them to files on the HDD, then backs those up to tape.
It would be just as easily done to a removable array.
With the tools that are available, and it being a non influential server on the network, taking it down, yanking the drives (for backup) out and replacing them would be a minor issue for all the speed you would gain.
I've had to do 2 restores from tape and BOTH were fubared, DVD-R is probably far more stable and there's even the chance that DVD drives will still be available in 10 years or so.
Tech Public Policy stuff
>> and a total of 576 x 160GB Drives.
and to think I had my dot matrix printer just print the binary data every night... I should back it up onto Magnetic media, huh?
WTPOUAWYHTTOTWPA
What's the point of using acronyms when you have to type out the whole phrase anyways?
We implemented a disk backup system using Storix software for our Linux systems, and it works perfectly. We have two servers that we store the workstation backups on, one on site and one off. We do daily backups to the onsite disk array and weeklies to the offsite disk array at another office. The offsite office does the same as us; we hold on to their weekly backups. This covers all of our bases. The Storix backups are great because we can clone systems over the network from the backups we have stored on the disk arrays. Tape drives are out - DISKS ARE IN!!!
Flexible bare-metal recovery for Linux/UNIX
...(you know, those nifty little removable enclosures) and you rotate 'em periodically. It's easy; you just turn the key and pull it out. Oh, and you keep your backup array in a machine other than the server it is backing up. That way either machine can burn to the ground and you haven't lost anything other than the changes made since the last backup. Not to mention this allows you to power down your backup box at will to swap disks, etc. I've been doing this very thing on a smaller scale for a couple of clients for a couple of years now. I figure you can set up a backup drive array for about a third of the cost of a tape system of comparable capacity.
You're using her as bait, Master!
Horrible choice of card for RAID 5. Go read any review and you'll discover its (re)build times are insane. I've been running 4 disks in RAID 1+0 because I don't have the hours to rebuild the sucka as RAID 5. All the card has going for it is performance, which is/was slightly higher than its competition's.
Wouldn't it be better to make an application that uses all the extra space on the corporate desktops for backup? Most people use only a fraction of the 80 gigs that comes on most desktops. Anybody heard of any products, or can prove to me that it's infeasible?
Stop the brainwash
The cheapest one is, of course, Linux. But you can also install Windows 2000 Server or Windows Server 2003 - it makes no difference in terms of performance. 3Ware provides drivers for any of the three OSes.
for a project like this, i would use linux. nothing bad against windows, but let's use the right tool for the right job.
and using linux for this type of project is the right tool.
...if you live somewhere you actually have to pay for electricity.
Cost of ownership on drives is much higher. People see a lower initial investment and get fooled into thinking they're saving money. Then they get the power bill for 30 terabytes of disk and say WTF?!?!?!
Minimise single points of failure, reread and rewrite your data periodically if needed, etc. etc.
Certainly for pharmaceutical companies there is a requirement to keep data for the patentable lifetime of a drug: this could be 25 years.
Not only do you need to archive it, you need to find an archive solution that you can read in 25 years - this might mean mothballing a current system, or exporting to XML etc etc....
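The advice to reread archived data periodically can be sketched as a checksum manifest in Python: record a SHA-256 digest for every file when it is archived, then reread everything later and report files that are missing or no longer match. The function names are hypothetical; the hashlib calls are standard library.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file (relative path) under root to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def verify(root, manifest):
    """Reread every archived file; return sorted paths that are missing or corrupted."""
    bad = []
    for rel, digest in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path) or sha256_of(path) != digest:
            bad.append(rel)
    return sorted(bad)
```

Storing the manifest alongside the archive (and in a plain format such as text) keeps it readable decades later, which matters for the 25-year retention case.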
It's big, and it's clever.
I have; it's hard. To test this, get yerself a standard floppy disk. Write data to it. Find a big hi-fi speaker and rub the disk all over the magnet. Reread. It's actually very difficult to wipe magnetic storage using a non-specialised electromagnet. Ric
Just change Proliant to Presario and away you go! Of course you'd need to make them in pretty colours, ship them without SmartStart CDs and charge for them to be shipped to you when you needed it, and put them in a case that you *can't open without a hammer or far more patience than i have*
Their backup policy ("we don't need to store backups in a different location") is simply bullshit.
Given that current hard disks, and especially RAID arrays (which are used in most servers anyway), are very reliable these days, backups are needed mostly for recovery in the case of a physical system breakdown (e.g. fire) or the total failure of the whole system, e.g. through a hacker break-in. For these cases (as mentioned by other posters), storing backups in a physically different location, in a passive way (that is, with no physical connection to the original data), is the whole point of having backups at all.
I use the Amanda software to back up to a hard drive. Amanda can treat a directory as a virtual tape drive. You can even set up a virtual tape changer to change your virtual tapes automatically.
I have ten virtual tapes set up on a separate backup server at work. It backs up about 5 other machines. This system has been working very well for about a year.
If you're really paranoid, you can use a tape and disk backup scheme. Use the disk for instant backup and restore and use the tapes for off-site storage and archiving.
Tape needs to be stored somewhere too, and there's no requirement that the HDD array be at the same location as the backed-up systems.
Except that keeping the drive array offsite adds exponentially to the cost. First off, you need a site with all the features of your primary data center (i.e. power, cooling, security, etc.). In addition, you need some means to transfer the data... 1 TB per night over a T-1? I think not.
Tapes on the other hand just need to be thrown into a van and driven somewhere. Whether you choose to use a data storage facility like Iron Mountain, or whether you store them in your basement at home can depend on the value of your data, but in either case is vastly less expensive than trying to back up a huge array to another array off site.
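The T-1 objection is easy to quantify. A short Python calculation, assuming a decimal terabyte and the nominal 1.544 Mbit/s T-1 line rate with no protocol overhead (so the real figure is worse); the names are illustrative.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T-1 line rate
ONE_TB = 10**12                 # decimal terabyte, in bytes
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes, link_bits_per_second):
    """Days to push num_bytes through a link running flat out at its nominal rate."""
    return num_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


print(f"{transfer_days(ONE_TB, T1_BITS_PER_SECOND):.1f} days")
```

That works out to roughly 60 days to move a single night's 1 TB.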
If privacy had a tombstone it would read "We did it for your own good" . -- John Twelve Hawks
We currently use this type of setup except on a much smaller scale.
We back up our servers to tape, but we back up a lot of our workstations with Norton Ghost. We ghost the workstations to a 1.3 gig Windows 98 machine with two 120 gig EIDE drives in a RAID 1 configuration. We don't require off-site storage for the workstation backups, as they are only used to restore if a drive fails. All work-related data is stored in home or other directories on the servers, which are backed up to tape and stored off-site. So far it's been in place for 7 months and has worked flawlessly. I have the RAID server rebooted each Sunday morning (since it's Win98 and not a server-quality OS).
I would recommend it to anyone providing they are aware of when to, and when not to use this type of backup.
We do this, although on a slightly smaller scale, at the ISP where I work. We have a weekly backup of around 1TB (incrementals), and the tape system was giving us problems. So we switched to a disk-based system, which, apart from disk failures, has been quite good.
I wrote some perl software to handle the backups, as a client and a server, as well as reporting, archiving old backups to tapes (may as well make use of the money invested in the tape robot), and cleaning up space.
It's called dbackup, and it's at http://www.dparrish.com/dbackup.html
Anything is possible, except skiing through revolving doors.