Large IDE Drives as Long-Term Archival Media?
"Backups are of no use without offsite archival copies so I plan to take one set of disks out of the pool, and archive them offsite on a quarterly basis.
However, I've heard horror stories about the data retention and usability of older disks which have been shelved for archival, for example disk stiction - where people try to restore data off of a 4 to 5 year old drive only to find that the disk won't spin up due to solidification of lubricants, or that they've experienced data degradation.
I'd be interested in the Slashdot crowd's opinion on using large IDE drives as archival media. Clearly one possible problem is being able to get hold of a machine in the future with a suitable IDE interface to plug them into for restoration, but I can't see IDE disappearing within 5 years (maybe 10 though). I'm more interested in experiences and opinions on the suitability of the disks themselves for long-term archival.
- Is stiction still likely to occur on newer makes of IDE drives or have manufacturers beaten the problems which caused this in the past?
- Likewise how likely is bit drop-out and general data degradation over say a 5 year and 10 year period, and what do people think would be the likely maximum feasible time that a shelved drive would be usable for?
- Any suggestions as to how I would need to store drives in order to minimize these types of problems and maximise their feasible life as archival media?"
I haven't really had any problems with stuck spindles since the early 90's with the old Quantum drives they used to stick in Macs. I have a number of Seagate Barracudas that had been sitting idle since approximately 1996/7 that I just fired up last week. All of them (about 40) worked and still had their data, which actually happened to be usenet archives that I'd been saving.
I'm certain manufacturers have gotten even better with lubrication issues over the last 7 years and I don't think I'd waste too many cycles worrying about it. With the price of large capacity DLT/AIT tape these days, it sounds like backing up to cheap IDE disk is a viable option.
Cheers,
Just Another Anonymous Coward
I've had customers bring me DLT tape backups of their databases, and 4 out of 5 times I can't get the tape to read the catalog.
Tape works great when you restore on the same system that wrote it, but it quickly becomes an arcane science beyond that.
I know this parent was modded up as +Funny, but it's actually +Informative. "Rock and chisel" are the best thing we have, and there's a real trend toward using it more. Take a look at Norsam's HD-Rosetta. It's an etched nickel plate designed to last for thousands of years. Vive la Rock & Chisel!
In that case, you could always just buy a new, cheap system for the purpose of reading the IDE disks, and keep that in the vault with the drives "just in case".
I'm not saying this idea with backing up to IDE is a good idea, though. Drop a tape on the floor while you're running to the tape drives for a critical restore, no biggie. Drop a drive on the floor in the same situation, you'd better hope your resume wasn't one of the files needing a restore.
Trolls lurk everywhere. Mod them down.
At the least, toss the media into freezer-weight ziplock bags. Better yet is double-bagging it - put the media in a smaller bag, and then in a larger bag with smaller bag's opening on the 'far' side.
Paper-rated "fire safes" work by using a medium that undergoes a phase change at high temperatures, releasing steam in the process. (Think of the latent heat involved in freezing and melting ice, same theory is used to keep the interior of the safe at a reasonable temperature.)
The only problem is that paper tolerates steam fairly well. Ditto the smoke that can make its way into the safe. The paper may be damaged, but it is still readable. Computer media will be destroyed. Fortunately freezer-weight plastic is more than adequate to block the steam, leaving only small openings in the seal. Even this is modest, and the second bag is mostly to allow you to avoid smearing soot onto the media as you remove it from the bag.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
On the other hand, I could spend (as I have) US$40 on a basic (a.k.a. el-cheapo) FireWire-IDE case, US$30 for 3 removeable IDE enclosures, and (eventually) about US$70 each for 3 60GB IDE drives. Total cost: US$280.
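For anyone who wants to plug in their own prices, here is a quick sketch of the sums above. The prices are the ones quoted in the post; the variable names and the cost-per-GB figure are just added for illustration.

```python
# Prices as quoted in the post; names are made up for this sketch.
firewire_case = 40         # basic FireWire-IDE case
removable_enclosures = 30  # total for 3 removable IDE enclosures
drive_price = 70           # per 60GB IDE drive
drive_count = 3
drive_size_gb = 60

total = firewire_case + removable_enclosures + drive_price * drive_count
cost_per_gb = total / (drive_count * drive_size_gb)

print(f"Total: US${total}")
print(f"Cost per GB of rotating backup capacity: US${cost_per_gb:.2f}")
```

Swapping in a larger replacement drive only changes `drive_price` and `drive_size_gb`, which is more or less the point of the scheme.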
What do I sacrifice? Not much ... one of the drives might fail. At that point I'd just replace it with another US$70 capacity drive (which would probably be larger.) If I needed to restore something from backup, I'm already looking at up-to 24-hour old data, and if that drive happened to die, possibly 48-hour ... it's unlikely that all the drives would fail at once.
The advantages? I can use the US$780 I save for something else and I don't have to worry about shelling out another US$1000 every four years just to scale to "current" requirements. I don't know what the upper limit of an IDE drive is these days (i.e. what can the ATAPI bus handle) but even 200GB is pretty big for me right now.
Anyway, just a few thoughts. The basic thing is lower cost for nearly the same risk ... tapes fail too, you know. Remember, too, that this story would be very different if I had to handle 50 machines instead of 2.
--- Jason Olshefsky
Karma: Poser (mostly affected by adding this line long after everyone else did)
It's not that they're showing a lack of commitment in their products. The warranties never covered data loss anyway, only the cost of a replacement drive. With the cost of drives going steadily down to the level they're at now it's frankly not even worth having a warranty. If it's not worth having a warranty then it's not worth paying warranty charges to the manufacturer to set up the infrastructure to manage the warranties. That infrastructure is where the costs go, not the drives -- if they can cut back on that infrastructure then they can lower prices. The consumer is no worse off really because the warranties are practically worthless anyway.. at least this way they're not paying for a worthless warranty. It's not a sign that the drives are of worse quality, just that after one year the manufacturer doesn't want to have to track $hitloads of obsolete hardware. They'd probably rather just give you a new drive than endure the costs of having you phone up, have a service woman check you against a database and have the thing shipped off. They just keep a minimal warranty around to placate irate customers who happen to get a DOA drive and are really livid and, even though it's not worth their time to go through the warranty procedure, will do so anyway just because they're angry.
Note that in both cases I said "live mirror". The remote backup part is expensive, but it's the only reliable way. You seed it by tape (full backup to tape, and mail them to the vendor) and then use dedicated lines to keep a regular incremental update going.
I agree wholeheartedly. Though, I would note, that IDE is the perfect solution for your redundancy. All you need is space. It doesn't have to be the fastest, or the highest quality mirror. Buying 20 IDE drives and having half of them fail is still cheaper than high capacity SCSI. Do a RAID 50 (IIRC, two RAID 5's - mirrored) offsite, and use rsync to mirror your data over your Inet line. Or string your mirror. Have your 'backup' offsite RAID rsync off the primary offsite RAID. I'd bet the only people who would have problems with that are the ones doing heavy graphics.
Check out Rackspace for your offsite needs, I didn't think they were that expensive, at least compared to an actual archival facility. Pick your favorite encryption method to secure it. Hell of a lot cheaper than a point to point.
Those people yelling 'insecure' apparently don't have an issue with their data being driven all around town. You want banking info? Just steal the grey box out of the '80 Ford Escort. OTOH, A 'man-in-the-middle' attack requires just that. So, if possible, host at your own ISP.
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)
> It's likely that SCSI drives are identical to
> IDE drives apart from the interface.
Uh huh. Well, why don't you send me a couple of those 15K rpm IDE drives and we'll see how they compare. Oh there aren't any? OK, I'll take some 10K drives then? None of those either? You do have 7200 rpm ATA drives? Darn, there hasn't been a 7200 rpm SCSI drive for a couple of years now...
Yes, they must be exactly the same...
SCSI drives cost more because these days most of them end up in server or enterprise level applications and are optimized for that world. So there's higher rotational speed, faster transfer and cache, higher head seek speed, and probably beefier construction.
I have a 20 MB (yes, you read that right) hard drive from 1989 that I can still read just fine. I've hooked it up once or twice over the years just for the nostalgia.
Cogito ergo sum in Slashdot.
When tapes become unusable, it is often because they were stored in an unhealthy environment.
:-/ :-)
If you take the trouble to contact the manufacturer of your specific brand of tape, they can usually advise you about what temperature, dust levels and air moisture levels they should be stored in.
I've had no trouble recovering data from 4 - 5 year old backups that have been stored correctly.
The correct way of making sure about long term backups is to build/rent a storage area with a controlled climate within those recommendations.
Another thing to keep in mind is magnetic fields.
It might, for example, be a bad idea to store them on a shelf made out of metal, in a room close to the local transformer station (if you've got one inside your building, that is) or the electrical feed for your building.
You also have to make sure that the fire extinguishing system you use won't damage the tapes if something should happen.
And, of course, keep at least a few of your tapes in a remote location from your regular tapes.
You never know. Even if that place is separate from your servers, you might still have a fire, break-in, earthquake, crazy terrorist pilot or water leak at that location.
Some of this would also apply to storing your ancient backup equipment, CDs, DVDs and hard drives.
But I don't understand why some people say that it would be safer to backup to SCSI disks...
I thought the biggest difference was in the electronics, not in the mechanical parts of the drive.
Hmm... Though, you *would* expect to get higher quality mechanism when buying a 10x as expensive disk.
This post is getting long... I'll be quiet now.
/.Mattsson - My native language is not English, so please don't whine over linguistic errors. (That's lame anyway...)
For current drives though, I'd say "No way." The advances in drive storage size come from pressing more and more data into smaller spaces, meaning magnetic drift in time will affect them much more adversely than even older drives. Smaller and more compact also means the internal mechanisms need to be more precise, narrower tolerances for more points of failure. Older drives were more robust in many ways. 350M SCSI Seagate, read head came off one arm, wires shorting out on the platter. Took it apart, removed the platter, and the damn drive served without flaw for 3 more years in the home server until the box was retired. Try -that- with a drive nowadays...
Rotating backups on tape (with a tape cleaning & replacement schedule), off-location backup rotation, and 'hard medium' backup (CD-R, DVD-R, -not- R/W) of crit. files on a monthly/quarterly basis, and you can be covered for just about anything...
There's no wrong way, to eat a Rhesus...
Obviously you haven't purchased any DLT tapes recently...
Let's just say you go with 40GB DLT tapes...
220/40 = 5.5 DLT tapes to back up your data.
DLT tapes cost 50 bucks a piece. 6 tapes * 50 bucks = 300 bucks just for the tapes.
Oh yeah, now you've gotta buy a DLT drive as well... and if you plan on doing any real backups you're not going to sit there and load 6 tapes in succession into the drive so you're going to need a library of some kind. So, tack on 5000 bucks for a library... I'll make the assumption that you're using some free archival software, otherwise you'd have to tack on some big money for that as well...
So... 5300 dollar tape solution vs. 500 harddrive solution...
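The comparison above, as a small sketch you can rerun with current prices. All figures are the ones quoted in this post (the 500 dollar hard drive figure is taken as given), and the variable names are invented for illustration. Note it only covers a single set of tapes, with no rotation.

```python
import math

data_gb = 220           # amount of data to back up
tape_capacity_gb = 40   # 40GB DLT tapes
tape_price = 50         # dollars per tape
library_price = 5000    # rough price of a tape library
disk_solution = 500     # hard drive solution, as quoted

# You cannot buy half a tape, so round up.
tapes_needed = math.ceil(data_gb / tape_capacity_gb)
tape_solution = tapes_needed * tape_price + library_price

print(f"Tapes needed: {tapes_needed}")
print(f"Tape solution: ${tape_solution}")
print(f"Tape costs {tape_solution / disk_solution:.1f}x the disk solution")
```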
You choose...
Yes Francis, the world has gone crazy.
Actually, using RAID5 on tapes is not unusual. It has the same benefits that RAID5 disk arrays have. It allows for the loss of one tape, as well as increased throughput. This technique can actually be extended to any media.
Tapes are designed for backups. If you seriously need to backup 200GB, then you are looking at DLT or better, and it ain't cheap.
Who the fuck has 220GB of personal data? Seriously, for the cost of backing up that much porn, you can just go down to the store and buy the legit DVDs. While you're at it, you can stop off at the record store and buy some albums so you can re-rip your MP3s.
Just because you have 220GBs of hard drives in your machines doesn't mean you need to back up every byte.
Pop quiz - if I wanted to back up this machine, do I
My documents (resume, web pages, GNU Cash files, email etc.) live on a server, where they are in fact backed up nightly to a second hard drive.
Every couple of months I burn a CD of the latest backup tarfiles. Cheap CDRs are a half-assed long-term archival solution, but the price is right.
Some things (Mozilla installer, service packs) are so ephemeral that they aren't worth backing up, i.e. when you need them there will probably be a new version available anyway.
What about my MP3s and pr0n? When I've got enough new stuff I burn a CD full. Every year or so it's worth re-burning the MP3s so that I've got the same genre on a given CD. When you've got Sarah McLachlan, Mozart, Dead Kennedys, Suicidal Tendencies, Reverend Horton Heat and Johnny Cash on the same CD, there isn't a person in the world who won't make fun of you.
I did not recommend that no backup be performed. I said that I do not trust IDE drives for long-term archival use.
If you are determined to archive to IDE, fill your boots - it ain't my data.
If you were actually going to produce some kind of machine-readable dead-tree backup, it's more likely that you'd produce a type of 2D barcode that could be scanned back in and read. Assuming an 8x10" grid at 200 dpi (the remaining area can be used for alignment and checksumming), you could get about 390K per page (single-sided...you could also double that by making it a "flippy," and you wouldn't need a notch-cutter :-) ). You're still looking at a little over 5 tons for 1 TB, but it's an improvement. 200 dpi should be well within the abilities of currently-available laser printers and scanners. If you wanted to try 300 dpi, you'd more than double your capacity and get about 879K per page (single-sided).
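A rough sketch of the per-page numbers above, assuming one printed dot carries exactly one bit and ignoring any space spent on alignment marks or checksums. The function name and defaults are made up for illustration.

```python
def page_capacity_bytes(dpi, width_in=8, height_in=10):
    """Bytes on one side of a page if every dot in the grid stores one bit."""
    dots = (dpi * width_in) * (dpi * height_in)
    return dots // 8

for dpi in (200, 300):
    capacity = page_capacity_bytes(dpi)
    print(f"{dpi} dpi: {capacity} bytes, about {capacity / 1024:.1f}K per side")
```

Because the capacity grows with the square of the resolution, going from 200 to 300 dpi multiplies it by 2.25, which is where "more than double" comes from.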
20 January 2017: the End of an Error.
But how about a 600dpi laser printer, 8"x10"?
For good readability, we can use a 3x3 block of dots to print each bit (1 or 0), which gives us 3 dots per bit in each direction, or 200 bits per inch. A square inch would then give us 40,000 bits, or 5,000 bytes. A sheet of 8x10 then gives us 400,000 bytes. Or if you tweak the margins, 400k per page. So that's already 20 times your density. Increase the resolution to 1200dpi, and you can increase the data density to 1600k per page.
We can also use different encodings: Right now we use 9 bits to encode 1 bit of information (really, really, redundant). We can probably safely use a denser encoding to double our data density. This further gives us 2 bits of information in the same 3x3 square, which increases our data density another 2fold: 800k or 3200k per page. At 1200dpi, that's 3MB per page, so that 1GB == 333 pages, and 1TB == 333k pages. 67 boxes, or 134 pounds per terabyte.
There are more variations of course. We can increase density to 4 bits per 3x3 square. With a bit of thought, we can also increase the density up to the theoretical limit of 2^9 values in a 3x3 square, but we want to include some leeway for data redundancy...
So by doubling to 4 bits per square, we require only 70 pounds per terabyte. By doubling again to 8 bits per square, That's down to 35 pounds.
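The page counts behind this can be sketched as follows. This is only an illustration under stated assumptions: each cell is a 3x3 block of printer dots, a terabyte is taken as 10^12 bytes, and the function name is invented. It counts pages only and makes no claim about paper weight.

```python
def cell_page_bytes(dpi, bits_per_cell, cell=3, width_in=8, height_in=10):
    """Bytes per page side when each cell-by-cell block of dots stores bits_per_cell bits."""
    cells = (dpi // cell * width_in) * (dpi // cell * height_in)
    return cells * bits_per_cell // 8

TB = 10**12  # decimal terabyte (assumption)

for bits in (1, 2, 4, 8):
    per_page = cell_page_bytes(1200, bits)
    pages = (TB + per_page - 1) // per_page  # round up to whole pages
    print(f"{bits} bit(s) per cell: {per_page} bytes/page, {pages} pages per TB")
```

At 2 bits per cell this gives 3,200,000 bytes per page; the post rounds that down to 3MB, which is why it arrives at roughly 333k pages rather than the slightly lower count printed here.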
That much (little) paper... is actually lighter than a terabyte of digital storage!
GPL Deconstructed
2000 sheets of 8-1/2 x 11, 20# laserwriter paper weighs 20 lbs.
First of all, this changes your estimate of weight from 100 tons to 250 tons.
Typical yield of paper: 125 lbs per tree
250 tons (500000 lbs) divided by 125 lbs per tree gives us 4000 trees.
440 trees per acre :)
This, after division, gives us 9 acres of trees destroyed for backing up 1 TB of data. Seem worth it?
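The chain of divisions above, written out as a sketch. Every input is a figure quoted in this post, and a ton is assumed to be a 2000 lb short ton; the variable names are illustrative only.

```python
paper_tons = 250
lbs_per_ton = 2000            # short ton (assumption)
sheets_per_lb = 2000 // 20    # 2000 sheets weigh 20 lbs
paper_lbs_per_tree = 125      # typical yield quoted above
trees_per_acre = 440

paper_lbs = paper_tons * lbs_per_ton
sheets = paper_lbs * sheets_per_lb
trees = paper_lbs / paper_lbs_per_tree
acres = trees / trees_per_acre

print(f"{sheets:,} sheets, {trees:,.0f} trees, {acres:.1f} acres")
```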
Has anyone had luck using a CD-RW disk for automated backups?
I've tried to keep one in my laptop (has a combo DVD-reader/CD-writer drive) for doing automated backups but so far I've had 3 CD-RW disks become corrupt after doing backups once a week for about a month or two. Is this a pervasive problem or just something particular to my drive/disks?
However, for long-term archival backups, IDE might not be the best idea. Drives do tend to get corrupted, and if you're not careful about letting them spin down completely each time before you remove them, or they get a little too much shock in transit, you could lose all your data. If you're looking for 1-2 year archival, IDE should probably work, but not much longer than that.
The next option would be tape backups. If you have the money, I would spring for one -- an autoloader if you can -- mainly because it will be more reliable than IDE. Recent experience suggests that tape media (at least the old TRAVAN kind, and some older 8mm DAT) has a shelf life of about 4 years after you write it. I recently tried to recover, for a client, some monthly non-incremental backups for the period 1995-1998 (they are the subject of an IRS audit). The tapes were a mix of older tape (TRAVAN and compat. earlier standards) and DAT media, depending on which of two servers they came from (and how old). Only one (a DAT) of the 24 tapes from 1995 (12 DAT, 12 TRAVAN) gave us 100% of the backed up data. 20 gave us partial, and 3 were completely unusable. The '96 tapes were a little better (12 complete data, 1 unreadable, 11 partial). '97 saw 20 complete, 4 partial; in 1998 all the data was fine. After 1998, the company switched completely to DAT; all the tapes later than '98 worked fine. So DAT isn't the best long-term storage medium. It is also worth noting that the tapes weren't kept in climate-controlled conditions, but instead in a shielded box in a cabinet in a manager's office. The office was air-conditioned, and the temp rarely got above 74, but the company is in NYC and it can get pretty humid in the summertime. From talking with some colleagues, don't expect more than 5 years from tape unless you've got it in a climate-controlled environment.
One solution, however, is to backup to tape and then restore and backup to new tape once a year. After their debacle with the IRS, the aforementioned company is going to start doing that in the future.
Another good question is, what are you backing up? If it's documents, and you're looking for long-term storage, the best solution is to print the documents out on acid-free paper, put them all in a box, and archive them at a storage facility. Although with 220 gigs of data it sounds like you've got quite a bit more than just documents...
Statistically speaking, there's a 99.998% chance that my IQ is higher than yours. Get over it.
Who the fuck has 220GB of personal data?
And what's so weird about it?
A scan of a single frame of a 35mm film, on a high-end consumer film scanner will create a file... let's see:
The scanner is 4000dpi, so the resulting image is about 4000x6000 pixels. We are working in 16-bit-per-color-channel mode, so that's 6 bytes per single pixel. A bit of multiplication gets you 144MB. As a practical matter, the film frame is slightly smaller so your output TIFF file is about 120MB in size. That is for a single 35mm film frame.
So raw scans of slightly under 2000 film frames will already hit the 220GB figure.
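Here is that multiplication as a short sketch, using the figures from this post and assuming three color channels and decimal units (1GB = 1000MB). The variable names are made up.

```python
width_px, height_px = 4000, 6000  # approximate scan size at 4000dpi
channels = 3                      # RGB (assumption)
bytes_per_channel = 2             # 16 bits per color channel

nominal_bytes = width_px * height_px * channels * bytes_per_channel
practical_mb = 120                # TIFF size quoted for a real, slightly smaller frame
budget_mb = 220 * 1000            # 220GB, assuming 1GB = 1000MB

frames = budget_mb // practical_mb
print(f"Nominal frame size: {nominal_bytes / 1e6:.0f}MB")
print(f"Frames that fit in 220GB at {practical_mb}MB each: {frames}")
```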
Still think it's a ridiculous number?
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
I think this is a bit of a ridiculous suggestion.
Why not just save up and buy a DVD burner? Spend a little time to sort out and 'archive' some of your data so that you can back most of it up once, then only back up the parts of it that you 'take out' of the archive.
Yeah, I realise at 4G or so (?) it is going to take a few DVDs and a bit of time organising that 220G, but probably less than the time you would waste getting your hard disk solution going.
As for stiction, in the 5 or 10 years you mention, there will be larger optical formats for data storage & backup, ie like DVD burners. There was an article on one such format that can store 120G or so on /. last week. (Though the capacity for a burnable media will probably be less). In a few years or so, buy one of them and transfer your backups to that format.
Alternatively, if you can afford now to buy 160G drives to use as backups, then in 5 or 10 years when you might need to worry about 'stiction', similar capacities will be much cheaper and you will be able to buy shitloads of extra disks to transfer your backups to.
Still, if you have 120GB now, in 5 or 10 years you could have a few TB...
Several years ago I heard about some FMD technology that supposedly held quite a lot of promise for huge capacities... no actual product seems to be materialising though..
here's a few lines quoted from some article written in feb 2000:
Hello FMD-ROM -- Bye-Bye DVD?
By Andy Patrizio, Byte.com
Feb 21, 2000
"Constellation 3D is in the final development stages of its FMD-ROM drive. "
"The capacity potential for the first-generation of FMD-ROM is up to 140 GBs of storage, almost 15 times the capacity of a dual-layer DVD-ROM disc."
Solution 1: RAID5 IDE cards are available. Buy large disks, and budget for replacing the RAID unit every two years, and for spares. Additionally, if you have the bandwidth, backup the system to an identical setup at another office location or a 3rd party. After an initial large backup, incremental backups should be manageable overnight for small to medium-sized companies.
Solution 2: Contract a 3rd party to do the job. If they lose your data, sue, win, retire, and stop worrying about it.
I wouldn't use hard drives for back up in any scenario. They are really unreliable. I'm sure we all have a few drives that *just died* lying around. That's why you need something that has the storage mechanism separate from the storage medium.
So, use tape drives for short term. Don't worry about massive data as it is likely to not be changed the next backup around. There are many apps out there that will backup only what actually changed. After the first backup, you can just leave the tape in the drive and automate it.
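To illustrate what "backup only what actually changed" means, here is a minimal sketch that picks files by modification time. This is just one simple approach and an assumption on my part; real backup apps may rely on archive flags, checksums or their own catalogs instead. The function name is hypothetical.

```python
import os

def changed_since(root, last_backup_time):
    """Return paths under root modified after last_backup_time (seconds since epoch)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only files touched after the previous run need to go to tape.
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

A nightly job would record the time of each run and pass it in as `last_backup_time` the next night, so only that day's changes get written.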
Then every year copy your tapes to DVD's, as at least these will last few years and put them in a place outside your house, like a safe box or something. Burning is a hassle, especially with lots of data, but once a year is not that bad.
This will only last you a little longer though. So you will need to copy those DVD's onto other media/formats.
In 10 years, chances are that it will be a challenge finding a device capable of reading your backup format/media. Even DVD's will not last that long. They will be replaced by drives 1/10th the size and 1000 times the capacity. Not to mention the file systems will likely go through a big change as well.
www.paperdisk.com claims that they can get either 660K or 1MB depending on resolution on a sheet of paper. How long a piece of paper will last when encoded with this density is unknown, but with good paper I'd bet it's a hell of a lot longer than any disk. Furthermore, even at that density, there's a huge amount of physical redundancy in the data storage. If the paper gets to be fifty years old or so, I would imagine that the technology would be available to cheaply scan at ultra-high resolution to compensate for any degradation.
A little off topic, but I have an old 40MB Miniscribe hard drive which was removed from an IBM XT around 1987 and has been in storage ever since. A few weeks ago I got curious and decided to plug it in to a spare box I had lying around to see what would happen. Fairly easy since the original MFM controller and cables had been kept with the drive.
I was amazed to find that not only was all the data on the drive intact but the thing booted up straight into MSDOS 3.1 with no problems.
MSDOS 3.1 and Word for DOS 5.0 really scream on a PIII 800 !
Anyway, curiosity satisfied I put the drive back in storage. I figure I'll try it again in another ten years or so... if I can still find a motherboard with ISA slots
I've never had a hard drive fail that was in storage, not counting the one that rattled around on the dashboard of my car for 6 months. I still don't think I'd recommend using them for long-term archival media though.