Large IDE Drives as Long-Term Archival Media?
"Backups are of no use without offsite archival copies so I plan to take one set of disks out of the pool, and archive them offsite on a quarterly basis.
However, I've heard horror stories about the data retention and usability of older disks which have been shelved for archival, for example disk stiction - where people try to restore data off of a 4 to 5 year old drive only to find that the disk won't spin up due to solidification of lubricants, or that they've experienced data degradation.
I'd be interested in the Slashdot crowd's opinion on using large IDE drives as archival media. Clearly one possible problem is being able to get hold of a machine in the future with a suitable IDE interface to plug them into for restoration, but I can't see IDE disappearing within 5 years (maybe 10 though). I'm more interested in experiences and opinions on the suitability of the disks themselves for long-term archival.
- Is stiction still likely to occur on newer makes of IDE drives or have manufacturers beaten the problems which caused this in the past?
- Likewise how likely is bit drop-out and general data degradation over say a 5 year and 10 year period, and what do people think would be the likely maximum feasible time that a shelved drive would be usable for?
- Any suggestions as to how I would need to store drives in order to minimize these types of problems and maximise their feasible life as archival media?"
Speaking from experience I can give this bit of advice for archiving critical information. Use a solid state device, don't even consider a magnetic solution, unless losing some or all of the data won't cost you your job.
Everyone is entitled to their own opinion. It's just that yours is stupid.
Hard drives are not non-volatile storage.
I back up close to 300GB on a nightly basis using GraniteDigital's FIRE Vue(TM) FireWire 1394 IDE Ultra ATA Systems
I have 6 120GB Maxtors and rotate them nightly, storing them in a fireproof safe, rated for paper storage. Granted, if a fire occurs, I'm not sure if the data storage would survive, but I think that would be the least of my worries at that point. The Firewire works great and is very fast.
Using magnetic media to back up magnetic media isn't the greatest idea in the world, but it can work. Hard drives fail, and when they do, you want to have the data available so that you can get to it. The IDEAL way to do this is to contract an outside company or manage for yourself a backup server which does incremental backups as often as you need and periodically burns them to a more permanent medium like DVD. If you can't afford this or don't like the idea, then you can burn DVDs on your own. A good program will track files for incremental backup and 220 gigs can fit on something like 50 DVDs, with maybe 1 more per session (assuming that not all files are constantly changed). Obviously a lot depends on what you have, how much money you are spending, and what you need.
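To put a rough number on the "something like 50 DVDs" estimate, here is a minimal sketch. It assumes single-layer discs holding 4.7 GB each and ignores filesystem overhead; the function name is just illustrative.

```python
import math

def discs_needed(total_gb, disc_capacity_gb=4.7):
    """Return how many discs are needed to hold total_gb of data.

    Assumes single-layer 4.7 GB DVDs by default and ignores any
    filesystem or backup-format overhead.
    """
    return math.ceil(total_gb / disc_capacity_gb)

# A full backup of the 220 GB mentioned in the comment above.
print(discs_needed(220))
```

Under those assumptions the full set comes out slightly under 50 discs, which is in line with the estimate in the comment.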
People who think they know everything really piss off those of us that actually do.
What you're proposing will cost no less than a high-quality AIT drive, which, though you may need to span tapes in the most extreme of situations, will give you quite a bit of capacity. You can pick up 90GB native-capacity AIT drives now for around $500 or so on eBay. The media is affordable, too.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
With tape, the failure of a tape drive doesn't separate you from your data (unless it catches on fire with the tape in it or something). You can just get a new tape drive and you are good to go again.
Thus, tapes are very good because the storage medium and the read/write hardware are separated and not interdependent.
Their answer? A huge RAID array starting at 180TB and growing steadily over time.
Your answer? Probably figure out which of the data is fixed and which of it changes and attempt to back up accordingly. Does all 220GB change on a weekly basis? That seems unlikely...
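One simple way to find out which data actually changes is to list the files modified since the last backup run. The following is only a sketch: it relies on file modification times (which some programs do not update reliably), and the function name and the idea of storing the last backup time as an epoch timestamp are assumptions for illustration.

```python
import os

def changed_since(root, since_epoch):
    """Return a sorted list of files under root modified after since_epoch.

    since_epoch is a Unix timestamp, e.g. the time of the last backup.
    Files that vanish or cannot be read while walking are skipped.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since_epoch:
                    changed.append(path)
            except OSError:
                continue
    return sorted(changed)
```

Summing the sizes of the returned files over a few weeks would show how much of the 220GB really needs frequent backup.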
The "right" way to make your data reliable is with mirroring of various sorts. On-site backups are kinda silly except when you're using them operationally because you don't have the disk capacity to do otherwise for infrequently used data. Backing up to removable media should be exclusively for offsite storage.
So get two drives and mirror your data, and you're covered in the case of drive failures. If you're worried about a whole machine going up in smoke, maybe do a nightly or hourly rsync to another machine across the room.
If your home data is important enough to need offsiting (usually a home user's "important" data amounts to what could fit on a CDROM, not 220 gigs - the rest is probably multimedia fluff that you can stand to re-encode or download in the case of a tornado or fire), then consider rsyncing with a friend at night over your DSL or cable modems in a mutual arrangement. Encrypt the data before syncing it over if it's sensitive.
If you're a business with large volumes of data that need to be offsite in case of disaster, then the best practice is still tape drives of some sort, and an offsite storage service like Iron Mountain.
11*43+456^2
About all tape has going for it over disk are physical robustness issues (the lack of the "stiction" problem that he mentioned, the fact that dropping a tape onto the floor is less scary than dropping a disk, etc).
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Well, don't know about LucasFilm, but Pixar use massive tape libraries (we are talking robots with 100+ drives and tens of thousands of slots.)
Incremental backups every HOUR, tape drives spinning all the time. They are a customer of the company I work for. (Veritas)
You speak of not having tape failures, but you omit one important fact; how many times have you successfully retrieved data from tape?
IDE disks will fail from continual use, and that failure will generally be obvious, but what way do you have of knowing that you genuinely don't have any tape failures, if all you are doing is rewriting over the same tapes?
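The only way to answer that question is to restore something and compare it against the original. A minimal sketch of such a check, assuming you can restore a file to a scratch location and still have the original available (the function names are made up for this example):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def restore_matches(original_path, restored_path):
    """Return True if the restored copy is byte-for-byte identical."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

Storing the hashes alongside the backup catalogue would let you run the same check years later, when the original is no longer around to compare against.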
On a smaller scale (personal), this is essentially what I do.
First, only some personal data is critical, not the GBs of operating systems and programs I can redownload/recompile if necessary. Things like documents, saved games (you'd think it's unimportant until you play the first 2/3s of Fallout 2 five times and can't stomach getting far enough to see how it all turns out, because you'd have to play that 2/3s again...), email maybe, whatever, but some limited amount. 10MB can go a long way... that's a lot of programming, for instance. (Been working on a project for about half a year now and I'm just ready to break 300KB of code...)
Then, set up a live backup amongst all the disks you have on various machines. I use unison so that I can change files in the repository on any machine and have the changes propagate correctly, instead of the unidirectional updates rsync does.
Use symlinks to put everything you need into one directory, and tell Unison to follow the symlinks, not archive them directly. Then just run that every so often on the machines, and you're set.
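The "one directory full of symlinks" step can be scripted. This is only a sketch of that step; it does not run Unison itself. The staging directory and the list of sources are hypothetical, and it assumes a platform where ordinary users can create symlinks.

```python
import os

def link_into_staging(staging_dir, sources):
    """Create one symlink per source path inside staging_dir.

    Each link is named after the last component of its source.
    Existing links or files with the same name are left alone.
    Returns the list of link paths in the staging directory.
    """
    os.makedirs(staging_dir, exist_ok=True)
    links = []
    for source in sources:
        source = os.path.abspath(source)
        link_path = os.path.join(staging_dir, os.path.basename(source))
        if not os.path.lexists(link_path):
            os.symlink(source, link_path)
        links.append(link_path)
    return links
```

The sync tool is then pointed at the staging directory and told to follow the links rather than copy them.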
Once more of my family gets set up with always-on connections, I intend to set up a family-level repository of backed up files with Unison, so that "off-site backups" are a weekly script run without intervention by the family, making off-site backups across the state (or country, or world) easy. This will protect the scanned pictures and other things in the family heritage easily and effectively.
Which reminds me, the first always-on connection just came online and I really ought to talk to that member about a reciprocating backup setup...
True. The tape drive solution is oriented towards businesses who have the money for a backup device :)
From the poster's requirement of needing offsite backup, I was assuming that it was for a business.
For home users, you can probably afford one generation behind. A DLT 8000 (40GB/80GB Compressed) drive on eBay runs for about $500. A DLT 7000 (35/70GB) runs for $300-500, so it is possible to do tape backup on a budget. It's the usual tradeoff between time and money, so you'll need to spend more time changing tapes.
On the other hand, for home use, I only archive my data onto CD since most of the data I have does not change and does not need incremental backups.
Err, no. SCSI drives are much more durable. These drives are not identical. How do you spin the same hardware twice as fast without failure? Fact is, you can't. Remember, SCSI drives run at 10000 rpm or 15000 rpm, not 7200.
think before you post.
We're like rats, in some experiment! -- George Costanza
You have to treat harddrives as unreliable pieces of crap that will eventually fail. Once you have accepted this fact, then yes it is possible to do backups onto IDE drives. It just requires that you keep making copies of your data such that you don't ever have one single point of failure at a given time.
Optimally, you'd have a pool of different computers networked at different sites, and you'd just have them replicate all of their important data all of the time. If one goes down, you fix it asap and continue.
It would be nice if there was a distributed filesystem that did guaranteed replication of data. Maybe one of the P2P applications could be set up this way such that you could backup your harddrive and guarantee that none of the files went away even though N different nodes failed? Anyway, good project for the future.
And for keeping tabs on what is on which disk... I've been using a freeware program called "Cathy" (I don't have any links)...Although I don't know whether it'll do DVD's, I haven't tried.
Cathy is available for download here. According to these sites it will handle many disk formats ("CD-ROMs, LS120, Iomega Zip and Jaz disks, or even diskettes"). The link to the home page is broken.
While funny, this guy has hit the nail on the head. Without constant, vigilant backups, plastic and magnetic media don't mean dick in the long run.
If you're serious about keeping data for ever and ever, but also want convenience, you have to back up both ways.
1. Go ahead and keep data on that harddrive, but you're stuck buying another one to replace it, at least every year or so, just to make sure. This gives you the highest convenience for reinstating that data when (not if) it is corrupted.
2. Print it out. Print out all of it on non-acid paper with archival ink with the most expensive commercial printer that money can buy. Images, text, what have you. If you don't have a hard copy, you don't have the data for the long term. Once it's all printed out, put it in air and water-tight containers and then put it in a temperature controlled vault somewhere, preferably underground so that it remains temperature controlled, even if power is lost for a long time.
Unfortunately, the so-called "archival" papers, while "rated" for 100 years, won't last anywhere near that long without some degradation. Then, if you're going to store it that densely, you've got to make allowance for putting the data into "tracks", so you have to leave spaces between each row. Cuts your 300 dpi down to, say, 100. Add check-summing data, so that you can recover from dirt, toner falling in the cracks, etc. And now, let's make the dashes twice the size of the dots. Cuts your storage by another 50%. Now, let's put spaces between the dots and dashes - otherwise, you get one LOOOONG dash. Your 11kb per square inch is now less than 0.5kb per square inch. Oh, and don't do duplex printing, you'll have transfer of toner onto the drum from the previously-printed side. Net result == about 30kb to 50kb per page... Oh well, maybe we should try microfiche ... or bit-encode the data into fake avi files and record them on VCR tape - cheap media for sure.
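The density figures above can be checked with a little arithmetic. This sketch assumes one bit per printed dot and a full US Letter sheet (8.5 x 11 inches) with no margins; the page size is an assumption, since the comment does not state one.

```python
def raw_bytes_per_square_inch(dpi):
    """One bit per printed dot, eight bits per byte."""
    return dpi * dpi / 8

# Assumed page: a full US Letter sheet, ignoring margins.
LETTER_AREA_SQ_IN = 8.5 * 11

def page_capacity_bytes(bytes_per_sq_in, area_sq_in=LETTER_AREA_SQ_IN):
    """Total bytes on one side of a page at the given density."""
    return bytes_per_sq_in * area_sq_in

print(raw_bytes_per_square_inch(300))  # 11250.0, the "11kb per square inch"
print(raw_bytes_per_square_inch(100))  # 1250.0, after leaving room for tracks
print(page_capacity_bytes(500))        # 46750.0, at 0.5 KB per square inch
```

At 0.5 KB per square inch a whole sheet holds a bit under 50 KB, so once margins and the "less than" are taken into account, the 30 to 50 KB per page estimate looks about right.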
I just downloaded Cathy from http://rvas.webzdarma.cz/, the developer's page. The latest version is only a few days old. Looks like a nice simple program (Windows only). It's a 53 KB exe file, no installer, no frills. I've been looking intermittently for this type of program for a while.
So use a reliable tape format and store it properly. When stored properly, DLT has a shelf life rated in decades.
So use a tape format that is backward compatible. Today's SDLT drives can still read all the old DLT formats.
Check the shelf life of CD-{R,RW} and DVD[+-]{R,RW}. Most of the CD/DVD media is only rated for a five year life at most. Mastered CDs and DVDs will be readable for decades, but burned CDs and DVDs won't be.
The bigger problem with really long term backups is with the data format used by the backup software. If you use a backup program that only runs under Windows, what are you going to do when you need to recover that data in 10 years, and you only have Linux (or the other way around, the point still stands)? This is where Open Source software is good, because (assuming you can still find the source) you can always decode the data stream.
"Who the fuck has 220GB of personal data? "
I'm getting there, in audio data.
My own music, that I write and record, so, going down to the store to replace it isn't exactly an option.
It's also on DAT, and on CD audio, so you could say I have a backup, but that's not really true -- the DAT is the source material, and a CD would represent one view of some of the data.
Am I going to buy a $65,000 SAN tape library machine, just because I'm getting into volume? (No.) Would I like an inexpensive solution that is less cumbersome than CDR? (Yes.)
-fb Everything not expressly forbidden is now mandatory.
Burnt CD's (like you'd use at home) have a shelf-life of about 10 years. Then the medium starts to oxidize (the metallic film, not the plastic itself), and flakes..
So, you have a 10 year backup.. It all depends on how important your information is. If it's that important, I'd put it on a RAID5 where it can be monitored. As drives fail, replace them. Continue migrating to newer arrays in the future.. Expensive, but I know perfectly well any drive will fail. I've had several hard drives that would fail to spin up properly after sitting for a few days.. Some of them, the only way they'd start is if I hit the side of the drive with a screwdriver..
You have to expect failure of your medium. If he wants to be very sure, use multiple backup methods.. RAID5s in multiple locations, and CDs. Someone will need to monitor all of it occasionally. Make sure the RAIDs (and their associated machines) are running. Make sure the CDs aren't oxidizing...
Even floppy disks die of old age. I found a few boxes with Novell Unix. They're years old, and most of the floppies couldn't be read. They were brand new, still in the sealed boxes and envelopes. I finally found a boot disk that would work, but it would bomb out trying to install under VMWare (I was curious).
Is that data really going to be useful to you in 10 years? That's the important question. People are all paranoid of losing email and the like now, but in 1 year they don't care about it any more. In 2 years, it's just wasted space. In 10 years, they won't even know who or what they were talking about..
Serious? Seriousness is well above my pay grade.
Well, according to this, their parent company Singapore Technologies filed bankruptcy in late '97, and rather than try reorganizing under Chap. 11, they just liquidated the company..
There's no wrong way, to eat a Rhesus...
Point 1.
Make sure you select a very well-made drive, don't cut costs there. Example: I have a 20-year old Mountain HardCard that still works fine. However, I have had cheap 3-year old drives fail.
Bringing up point 2:
If you try it, make sure to use an "exercise" schedule for all the drives in your backup set. For example, once a week for each drive, plug it into a spare box and ensure that it spins up, spins down, and the read/write arm travels its full sweep. Maybe do some read/writes at various places on the platter surfaces, just to be sure.
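The "read at various places on the platter" part of that exercise routine could look roughly like the sketch below. It reads evenly spaced blocks across a file or device path. Treat it as an illustration only: opening a raw device usually needs administrator rights, finding a device's size by seeking to the end is an assumption that holds on Linux but not everywhere, and the operating system's cache may satisfy repeat reads without touching the disk. The function names and defaults are made up.

```python
import os

def sample_offsets(total_size, samples, block_size):
    """Return evenly spaced byte offsets from the start to the last full block."""
    if total_size < block_size or samples < 1:
        return []
    if samples == 1:
        return [0]
    last_start = total_size - block_size
    return [(last_start * i) // (samples - 1) for i in range(samples)]

def read_samples(path, samples=16, block_size=4096):
    """Read one block at each sample offset; return how many reads were complete."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        total_size = handle.tell()
        read_ok = 0
        for offset in sample_offsets(total_size, samples, block_size):
            handle.seek(offset)
            if len(handle.read(block_size)) == block_size:
                read_ok += 1
    return read_ok
```

A result lower than the number of samples requested, or an I/O error raised while reading, would be the cue to retire that drive from the backup set.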
It works for me, so I hope this helps.
C|N>K
I still think somebody will make glass MO archive media, with gold as the reflective surface, but if you're going to use paper, use 2d barcodes... about 1.1K/in^2, for around 9.5K/side.
Oh, and to be sort of on-topic for the actual story, my friends at Seagate say that modern drives should start up fine after many years of proper storage. I still don't trust them (the drives, not the friends).
Laser printers do gray scale by dithering, you lose resolution. Good idea though. Better storage medium would be black/white photographic film like microfiche.
-Yarn - Rio Karma: Excellent
So after a brief look at hardware RAID I realized that the software RAID support in Linux was all I really needed. Since this is my own machine, I didn't really need the hot-swap capability of a hardware RAID controller.
I bought two 100GB Western Digital drives and set them up in a RAID-1 configuration. A month later, I bought another drive, replaced one of the drives in the machine with it, and put the removed drive in the safe. A month after that, I bought another drive and repeated the process, this time moving the drive in the safe to an off-site location.
Every month or so I repeat the process, rotating the second drive of the array through my various offline storage locations. The real beauty of this (especially vs tape) is that I only need enough downtime to swap the drives and reboot the system; the mirror reconstruction runs in the background as I use the system normally.
The use of RAID-1 gives me complete protection against data loss in the event one of the online drives fails (though I've had no failures yet with the WD drives). If both drives are somehow ruined (e.g., by a fire within the computer), or if I accidentally delete something important, I have my first offline backup, less than a month old. If that's also ruined (e.g., my whole house burns down and the fire-rated safe fails to protect the drives it contains) I have my off-site drive, which is less than 2 months old. Obviously I could easily extend this process with more drives and more offsite storage locations.
Because the backup drives are regularly rotated into online service, bearing stiction should be less likely to occur. And if an offline drive were to fail when I bring it back into service, so what? It was about to get overwritten anyway.
Naturally, I also continually back up especially important files (e.g., email, work projects, documents, etc) to various machines over the network, as that's the easiest and most effective way to protect small amounts of data. But when it comes to periodic full backups of big disks, nowadays I just don't see any practical alternative to disk-to-disk copying. And RAID-1 is the easiest way to do that copying.
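The rotation described above can be modelled in a few lines to see how old each offline copy gets. This is a simplified sketch, assuming exactly one swap per month and that every drive moves one slot further away at each swap; the slot names and function names are illustrative.

```python
def simulate_rotation(months, offline_slots=("safe", "off-site")):
    """Return which month's snapshot sits in each offline slot after `months` swaps.

    At every swap the freshly pulled mirror half goes into the first slot
    and each older drive moves one slot further out. The drive leaving the
    last slot goes back into the array and is overwritten by the rebuild.
    """
    slots = [None] * len(offline_slots)
    for month in range(1, months + 1):
        slots = [month] + slots[:-1]
    return dict(zip(offline_slots, slots))

def worst_case_age(snapshot_month, current_month):
    """Age in months of a snapshot just before the next swap happens."""
    return current_month + 1 - snapshot_month

state = simulate_rotation(3)
for slot, pulled in state.items():
    print(slot, "holds month", pulled, "snapshot, at most",
          worst_case_age(pulled, 3), "month(s) old")
```

Passing a longer tuple of slot names shows how the worst-case age grows by one month for each extra offline location.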
According to this page, the expected realistic life of "high-quality" CD-R media can range from 50 to 100+ years.
Apparently, since DVD+R and DVD-R use higher quality versions of the same materials and read/write process, the expected shelf life for them is also from 50 to 100+ years.
(A quick search on Google will show you all sorts of estimates, but the 50 to 100+ year life expectancy numbers are quotes from TDK and Kodak. The question is, do you believe them? I guess I do...)
P.