Ask Slashdot: Keeping Digital Media After Imaging?
New submitter rogue_archivist writes "I'm an archivist at a mid-sized university archives, trying to develop a policy for archiving computer files ('born-digital records' in archival parlance). Currently old floppy disks, CDs, and the occasional hard drive are added to our network storage. Then the physical media is separated from archival paper documents and placed into storage. My question for all you slashdotters out there is: should these disks be imaged and then the physical copies discarded? Is there any benefit for keeping around physical copies of storage media long since rendered obsolete?"
Your interest is in the contents, not the container. Therefore, once you have a known-good copy of the data, you're all set.
Remember to keep a few of the old tapes/drives/whatever for the museum display, of course.
If you're a zombie and you know it, bite your friend!
I work on a team which does archiving. We have multiple layers of data storage. First, we keep all copies of media in a library. The media is imaged and stored on a SAN. The SAN is backed up to an off-site NAS. And once a year, we copy the data to hard drives and ship the drives to another site across the country. If you have the capability, put the originals in an archival storage area. I have never known a single archivist to get rid of anything, so you must be new to this community.
As an FYI, there is no such thing as obsolete media, as evidenced by this project. And trust me, you can usually find a way to image most old media formats.
sudo make me a sandwich
..or can check all of the content to be perfectly read, then yeah, sure, no loss in destroying the originals.
however.. if you have the space, why destroy? another issue is software, where you in theory might have to prove ownership of a legit copy, or the originals might have some other curiosity value. another thing with paper records is that if you destroy the old ones, what's stopping you from introducing new data, like a record of your uncle's graduation from said university? with the paper records destroyed there's no way to go check them.
so my question is, is it really that expensive to store them, just for posterity's sake? even then you could just destroy them via sloppy storage rather than intentionally burning energy for destroying them..
world was created 5 seconds before this post as it is.
If you had a 1979 copy of Wizardry on an Apple ][ floppy disk, you could image the contents. But if you wrote them back to a disk and tried to run it, it would fail.
This is because as a means of copy protection, Wizardry used track arcing. Part of a track was written on a track. Another partial track was written half a head-step away. The timing of the writes was synchronized so the partial tracks didn't overwrite. Anyone doing a naive read and write, or even a not-so-naive scan of the half tracks, would fail, because they would not reproduce the timing of the writes necessary to prevent collision and to meet the consistency checks in the program.
Obviously people reverse engineered this and wrote adaptive copy programs that you could direct to do the right thing, but how is an archivist going to know that?
If you can get this level of deviousness on a primitive floppy disk, I imagine that there is plenty of deviousness to go around on other formats.
Keep the media.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
OK, that's all well and good, but what scenario do you propose that would make the 5 1/4" floppy disk a useful tool again?
Wobbly tables.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
You're thinking of burned DVDs. Most professional video DVDs are stamped.
You never know ... when you will discover errors in your digital copy. DVDs are not born analog. In fact they only have a shelf life of around 7 years. You need to get everything off DVDs and make several digital copies of it. You should keep the DVDs as long as possible but eventually you will not be able to read them anymore. Make sure your digital copies of the DVDs are error-free because there will come a time when you cannot go back to the DVDs.
Hmm... DVDs only live for 7 years, eh? In an archive?
I was just watching a DVD last night that I bought in 2000; it still works fine, with no scratches or degradation. I was also pulling data off a DVD-R the other day that I recorded in 2003. This DID have a slight bit of degradation, so maybe there's an issue here. Never had a problem with properly stored pressed DVDs though.
For that matter, I've still got 5 1/4" floppy disks that have readable data on them from 198, and Audio CDs from 1990. Got rid of all my cassette tapes though; both the digital and analog ones degraded really quickly with use.
What would be ideal is a file format that stores data with some error correction, so if a block got corrupted on older media, the corruption wouldn't just be detectable, but possibly correctable.
It isn't really "archival grade", but I've used the WinRAR utility for this. Archives made in 1999-2000 with error correction are still readable, check-able, and repairable, and can be moved from old CD-R to DVD to Blu-Ray, possibly to whatever the next generation of optical media will be. In fact, multi-volume archives that might have one CD or DVD go bad in a set are recoverable because I usually had one recovery volume for every four others, which might add 20% more disks to a set, but it seemed to be a fair compromise for restoring.
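To make the "one recovery volume for every four others" idea concrete, here is a minimal sketch of how a recovery volume can rebuild one lost volume. It uses plain XOR parity, which is an assumption chosen for brevity and is not a description of WinRAR's actual recovery format; the function names `make_parity` and `recover_missing` are hypothetical. XOR parity can only rebuild a single missing volume per set, and only if you know which one is missing.

```python
from functools import reduce


def make_parity(volumes: list[bytes]) -> bytes:
    """XOR equal-length volumes together, byte by byte, into one recovery volume."""
    if len({len(v) for v in volumes}) != 1:
        raise ValueError("volumes must all be the same length (pad them first)")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*volumes))


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing volume from the survivors plus the parity volume."""
    # XOR-ing the parity with every surviving volume cancels them out,
    # leaving only the bytes of the volume that was lost.
    return make_parity(surviving + [parity])


# Four data volumes plus one recovery volume, as in the post above.
volumes = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(volumes)

# Pretend the disc holding the third volume went bad.
rebuilt = recover_missing(volumes[:2] + volumes[3:], parity)
assert rebuilt == volumes[2]
```

Real recovery records typically use stronger codes that can repair damage inside a volume as well, so treat this only as an illustration of why the extra 20% of discs buys you something.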
Analog media like photos? Keep. Who knows if there might be a better scanning technique to find more information from a photograph, similar to how one finds info about paintings.
Digital media? At least make a hash file that goes with the stored data at the minimum so corruption can be detected as the items pass to different storage media over time.
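A hash file like that could look roughly like this. This is a sketch, not a standard: the choice of SHA-256, the JSON manifest layout, and the names `build_manifest`, `save_manifest`, and `find_problems` are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], manifest_path: Path) -> None:
    """Write the manifest as JSON; store it outside the tree it describes."""
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def find_problems(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents no longer match."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

Build the manifest once when the data is first ingested, then run `find_problems` against the stored manifest each time the files move to new storage. A hash only detects corruption; it cannot repair it, so it complements the recovery data rather than replacing it.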
Not this tired argument again. I have burned CDs from 1995 that still work perfectly fine. Sure, they "estimated" that they would only last 7 years. Guess they were wrong, since unless I can see physical scratches or other damage, 99% of my discs from my life still work perfectly. The only ones that didn't last and had no physical damage were a cheap brand I got where the dye turned cloudy, but that happened within the first 2 years.
Peter predicted that you would "deliberately forget" creation 2000 years ago...
I have no mod points at the moment. But that's a VERY important point: A straight copy may not be good enough, due to outside-the-standards copy protection schemes.
Other floppy-based commercial games used a number of other techniques.
(One, for instance, had track 3 deliberately corrupted, by scratching the medium with a pin. No error on reading it - or writing and re-reading it - and the game would load, erase the disk, and play. This let the person who made the copy think he had a good copy - when in fact he had a blank disk. Let's see you make a good archival copy of THAT. B-b )
You get the same thing on other media as well - even analog. (Example: Macrovision, which plays with the sync and saturation levels, so that analog TVs intended for over-the-air reception (usually) correct the distortion as if it were a fading signal, while videotape machines copy the "fading" picture and regenerate a non-fading sync, so the copy isn't corrected when viewed.)
One of the several copy protection schemes for DVDs includes hidden modulation in sync information, decoded by the drive's hardware and detected by its firmware, so you can make a perfect copy of the bits and it still won't play.
Wikipedia has a long list of such copy-protection schemes, any of which would make archival copying difficult to impossible (without special equipment that would expose you to arrest and federal prosecution if you possessed it).
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
image->backup->check image and backup->discard
factor 966971: 966971
I disagree, keeping the original only makes sense when the original is in a stable format and you have plenty of room.
I've been dealing with this problem on a much smaller scale, and if you aren't extremely careful it can be hard to keep track of which disks you're keeping because you can, and which ones you're keeping because you have to.
Dump it to disk, verify the contents, back it up and chuck the original media. In the long term, 1 CDROM is going to last better than 400 or so floppies will.
Now, if you're dealing with paper, that tends to be incredibly durable provided decent paper and ink were used; you're generally best keeping it if you're archiving and have space.
As an archivist, I would think you might want to:
1) keep multiple copies of each type of media, preferably from different manufacturers, all written with identical data
2) Separately, a copy of the data contained on the media
Occasionally check the media to see at what rate their integrity is decaying. As readers for the media become increasingly difficult to find, develop alternative methods to read the data, checking it against your reference copy. Eventually someone is going to appear on your doorstep with something like the Pioneer spacecraft data tapes or the Nixon Oval Office recordings, and if you can pull the data off it you'll be the hero of the day.
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
> I have burned CDs from 1995 that still work perfectly fine.
If they're redbook audio CDs, and your definition of "work perfectly fine" is "I can stick the disk in, hit play, it spins up at 1X, music comes out, and the player doesn't totally gag", you might be right. Now try ripping the disc using software that can monitor the realtime bit error rate. You'll probably be *horrified* to see how high it is.
Redbook audio CDs are very robust, even when their bits are rotting all over the place. They were designed in an era when hardware couldn't do much in realtime, so they bent over backwards to make sure they had a "plan B" to make sure the show would go on after the disc got scratched, dirty, or whatever else happened to it. They were designed so the audio data is interleaved in a way that when a read error occurs, the left and right channels get merged for 1 sample. A redbook audio CD has to be nearly *destroyed* (cracked, melted, fried, whatever) before it literally won't play, as long as the player is able to find the lead-in and sync up to the spiral track.
It'll start to sound "rough" and lose channel separation, but things have to be pretty bad before it will LITERALLY stop playing. At least, as long as the player itself is faithfully following the original redbook audio specs, and isn't trying to realtime-rip the audio to a ram buffer and play it back from there (which is what some, if not most, new optical-disc media players do TODAY). I have plenty of CDs that new players choke on and refuse to even try playing, but yet my 25 year old antique CD player that cost something outrageous like $600 or $800 when new, can play just fine. Apparently, it's because first-generation CD players were precision hardware that could blindly track a CD spiral as long as the disc itself was 100% within spec, whereas new players depend upon realtime error-analysis to stumble and wobble around, and make up for the fact that discs no longer spin precisely, and worm-gear optical assemblies no longer track with precision measured in microns.
That said, my experience has ALSO been that CD-R discs manufactured in THIS century are less likely to rot and become unplayable in new drives, but are more likely to have major problems with old players. The old players were precision hardware, and assumed the discs themselves were manufactured to precision specs. The first-gen CD-R media had dye that deteriorated over time, but their spiral tracks were spot-on, just like pressed discs. As drives got better at handling sloppy tracking, the discs themselves became sloppier.
Net effect: first-gen redbook audio CD-R media is likely to play with acceptable audio quality on an old CD player from the 80s or early 90s, but be unplayable on many modern drives & be un-rippable on most drives (some will allow you to spin down to 1X & emulate the playback mode of a legacy player if you're running a sophisticated ripping app). Newer discs that are still old will probably skip and have problems playing on an old player, but might still be equally bad on a new one. When today's bargain-bin CD-R media is 10 years old, it will probably be unplayable on anything, the same way my old VHS tapes from the 80s still play fine, but VHS tapes recorded after ~1998 are largely unplayable on anything I can find.
TLDR point: the storage life of "last-gen" CD-R media is likely to be better than first-gen CD-R media was at the same age, but enormously WORSE than that of the best "turn of the century" CD-R media (the golden era when quality standards were still high, and the worst faults of the first-gen media were addressed). Any box of CD-R media you buy TODAY is probably shit of the worst kind. The best media you can buy TODAY for long storage life? Non-LTH BD-R single-layer discs. But MAKE SURE they aren't LTH... most manufacturers don't go out of their way to scream, "These discs are LTH garbage!"
Just to reply to myself . . .
Target headquarters in Minneapolis gets VCR tapes from security systems all over the country, and they work with the FBI to read them and export the video. Security equipment manufacturers are notorious for using proprietary equipment or file formats to limit interoperability with the competition's systems, and they apparently have a lab that specializes in decoding them to extract the usable data.
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
You should keep the DVDs as long as possible
Bottom line, if you have N digital copies then what is the benefit of keeping the original DVD over having N+1 digital copies of the DVD?
Near as I can tell, zero benefit. And massively increased storage requirements. So make one extra digital archive and discard them. Better still, donate them! To public libraries? Independent / private archivists? You don't have to "destroy" them -- which is surely about as counter-instinctual as it gets for an archivist. :)
eventually you will not be able to read them anymore
Odds are that if there were errors reading from it today, you won't get a better copy from that disc 50 years from now. Better to make copies from 2 different discs or exchange back ups from another center. 2 different rips of the same disc is better than 2 copies of a rip from the same disc in terms of ever being able to restore missing information from a rip.
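One way to use two independent rips is to compare them sector by sector and see where they disagree. The sketch below assumes raw images read from two different discs (or two drives) and a 2048-byte sector size, which fits data-mode optical discs; the function name `differing_sectors` is hypothetical.

```python
from pathlib import Path

SECTOR_SIZE = 2048  # assumption: data-mode optical sectors; floppies commonly use 512


def differing_sectors(rip_a: Path, rip_b: Path, sector_size: int = SECTOR_SIZE) -> list[int]:
    """Return the indices of sectors where the two images disagree.

    If one image is shorter, the extra sectors in the longer one count as mismatches.
    """
    mismatches = []
    with rip_a.open("rb") as a, rip_b.open("rb") as b:
        index = 0
        while True:
            block_a = a.read(sector_size)
            block_b = b.read(sector_size)
            if not block_a and not block_b:
                break  # both images exhausted
            if block_a != block_b:
                mismatches.append(index)
            index += 1
    return mismatches
```

An empty result means the rips agree everywhere. A non-empty result tells you which sectors to re-read or to fill in from a third source; two copies of the same rip would always agree and so tell you nothing.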
Re:Is there any benefit?
No. Image and discard.
Untrue. Backups get lost, go bad, or otherwise screwed up. At a previous employer old directories no longer accessed got backed up to tape to free up server space. We used a lot of storage at this company. On multiple occasions some years old files were needed. About half the time we would be told that these files were no longer recoverable by IT. After one of these backup failures I recalled that I had made a backup DVD of one old project we were trying to get a copy of. I went to our archivist and she found the DVD, it was readable.
Save the media. Buy a USB based floppy drive. It makes sense to copy the files on these legacy formats to a server for future use but keeping the originals around as a backup is a good idea.
Do not image the media. These image formats may fall out of favor and not be recognizable in the future. Look at the various problems NASA has had with some of its old tapes using formats no longer supported. Create a folder on a server for a particular piece of media and just copy and verify the files from the legacy media to the folder on the server.
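The copy-and-verify step could be sketched like this, assuming the legacy medium is mounted as an ordinary directory. The function name `copy_and_verify` and the layout (one destination folder per piece of media) are assumptions for illustration; `dirs_exist_ok` needs Python 3.8 or later.

```python
import filecmp
import shutil
from pathlib import Path


def copy_and_verify(source: Path, destination: Path) -> list[str]:
    """Copy every file under source into destination, then compare byte for byte.

    Returns the relative paths that are missing or differ after the copy.
    """
    shutil.copytree(source, destination, dirs_exist_ok=True)
    failures = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        # shallow=False forces a content comparison instead of trusting size and mtime.
        if not dst_file.is_file() or not filecmp.cmp(src_file, dst_file, shallow=False):
            failures.append(rel.as_posix())
    return failures
```

Note that the verification pass may be served from the operating system's cache rather than re-read from the medium, so it mainly proves the server copy matches what was read, not that the read itself was error-free. A file-level copy also captures only what the filesystem exposes.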
I'm an archivist at a mid-sized university archives, trying to develop a policy for archiving computer files ('born-digital records' in archival parlance).
Get Your Bits Off (Old Storage Media)
Demystifying Born Digital Reports
Working Draft of the Levels of Digital Preservation Chart
The actuality of bit-rot in media is uncertain. Many documents 500 years old are readable-ish if you have the skills and accept that some parts may have decayed. That tells us a lot about the exact media people used way back then.
The trouble with digital records is this:-
Searchability is a requirement (even though we don't expect that with written records). The reason is that there is so much of it when compared with the sparse records of times past. So you need a 'good' copy for data analysis and some original media to inform historians of the future how we looked upon the information, or what 'ordinary people' or 'ordinary businesses' had at their disposal.
image->backup->check image and backup->discard->sign in triplicate->send in->send back->query->lose->find->subject to public inquiry->lose again->bury in soft peat for three months and recycle as firelighters
systemd is Roko's Basilisk.