Ask Slashdot: Keeping Digital Media After Imaging?
New submitter rogue_archivist writes "I'm an archivist at a mid-sized university archives, trying to develop a policy for archiving computer files ('born-digital records' in archival parlance). Currently old floppy disks, CDs, and the occasional hard drive are added to our network storage. Then the physical media is separated from archival paper documents and placed into storage. My question for all you slashdotters out there is: should these disks be imaged and then the physical copies discarded? Is there any benefit for keeping around physical copies of storage media long since rendered obsolete?"
No. Image and discard.
For born-analog content, always keep the original physical copy. You never know when you will need to rescan at a higher quality or when you will discover errors in your digital copy. DVDs are not born analog. In fact, they only have a shelf life of around 7 years. You need to get everything off DVDs and make several digital copies of it. You should keep the DVDs as long as possible, but eventually you will not be able to read them anymore. Make sure your digital copies of the DVDs are error-free, because there will come a time when you cannot go back to the DVDs.
Your interest is in the contents, not the container. Therefore, once you have a known-good copy of the data, you're all set.
Remember to keep a few of the old tapes/drives/whatever for the museum display, of course.
I work on a team which does archiving. We have multiple layers of data storage. First, we keep all copies of media in a library. The media is imaged and stored on a SAN. The SAN is backed up to an off-site NAS. And once a year, we copy the data to hard drives and ship the drives to another site across the country. If you have the capability, put the originals in an archival storage area. I have never known a single archivist to get rid of anything, so you must be new to this community.
As an FYI, there is no such thing as obsolete media, as evidenced by this project. And trust me, you can usually find a way to image most old media formats.
Old media will become obsolete and degrade over time. It is best to copy to modern media. The files should be stored by their SHA hash, so that duplicates need not be stored. You can't have too many copies.
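A minimal sketch of storing files under their hash, assuming SHA-256 and a hypothetical `store_root/<first two hex chars>/<digest>` layout. The function names are illustrative, and a real system would also need an index mapping original filenames and source media to each digest:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store(src: Path, store_root: Path) -> tuple[str, bool]:
    """Copy src to store_root/<aa>/<digest> unless that content is already stored.

    Returns (digest, newly_stored). Identical files from different media
    map to the same digest, so they are only stored once.
    """
    digest = sha256_of(src)
    dest = store_root / digest[:2] / digest
    if dest.exists():
        return digest, False  # same content already in the store
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # a production version would copy to a temp name, then rename
    return digest, True
```

Storing the same file found on two different floppies copies it once; the second call just returns the existing digest.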
OK, keep in mind that I'm being rather abstract here:
What makes a thing obsolete? That it isn't a commonly used item anymore, or that its usefulness has become non-existent?
Take, for example, the carrier pigeon - once considered 'obsolete' due to the invention of telecommunications equipment, I can see the medium coming back into vogue in the wake of the new knowledge that governments the world over are monitoring our every word over the aforementioned modern channels. Today, you can't send a message along electronic media without it being intercepted, somewhere, by someone other than the intended recipient; however, you can tie a coded message to a bird's leg and be reasonably confident in the message reaching its intended recipient without interception and decoding (international and relay flights notwithstanding).
Thus, that which was obsolete becomes useful again, bringing us back to the initial philosophical quandary: What makes a thing obsolete, anyway?
Scan the container for labels and nostalgia. Keep a few samples. Hope you have solid backup policies, and you test your backups. Otherwise, well....
Another major problem is reading the original file format later, or even that some media (Forth floppies, for instance) come without an actual file system. Archivists have been working on that too. So instead of (or in addition to) asking a bunch of nerds, see what your fellow professionals have been able to come up with.
Also, "media" is plural, thanks.
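For media with no file system at all, like the Forth floppies mentioned above, the traditional Forth convention was to store source in numbered 1,024-byte blocks ("screens"), each shown as 16 lines of 64 characters. A minimal Python sketch that renders a raw disk image that way, assuming the blocks start at offset 0 with no boot area, interleave or encoding tricks (real disks may well have all three):

```python
BLOCK_SIZE = 1024  # classic Forth block: 1 KiB
LINE_WIDTH = 64    # displayed as 16 lines of 64 characters


def forth_screens(image: bytes) -> list[list[str]]:
    """Split a raw disk image into Forth-style screens of 16 x 64 characters.

    Assumes blocks start at offset 0 with no interleave; a trailing
    partial block is padded with spaces.
    """
    screens = []
    for start in range(0, len(image), BLOCK_SIZE):
        block = image[start:start + BLOCK_SIZE].ljust(BLOCK_SIZE, b" ")
        text = block.decode("latin-1")  # latin-1 maps every byte to exactly one character
        lines = [text[i:i + LINE_WIDTH].rstrip() for i in range(0, BLOCK_SIZE, LINE_WIDTH)]
        screens.append(lines)
    return screens
```

Screen numbers are simply list indices here; on a real disk you would still have to work out which blocks hold source and which hold data.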
...or can verify that all of the content reads perfectly, then yeah, sure, there's no loss in destroying the originals.
However... if you have the space, why destroy? Another issue is software, where you might in theory have to prove ownership of a legit copy, or the originals might have some other curiosity value. Another thing with paper records: if you destroy the old ones, what's stopping someone from introducing new data, like a record of your uncle's graduation from said university? With the paper records destroyed, there is no way to go check them.
So my question is: is it really that expensive to store them, just for posterity's sake? Even then, you could just let them be destroyed by sloppy storage rather than intentionally burning energy destroying them.
Then the physical media is separated from archival paper documents and placed into storage
What does that mean?
Those DVDs you burn are stored and the paper is ....what?
If you really MUST archive stuff, then store it in multiple media - paper, DVD, original, etc ...
For example, if I were archiving Da Vinci's paintings, I'd keep the original, photograph it with the highest-definition digital camera I can get, photograph it on film (preferably slide film, because then you don't have to worry about the second layer; analog sucks for archiving, btw), and have the most talented copy artist ever dupe it.
Unless you are paying Manhattan real estate prices, why not keep the originals? They serve as another backup. They will likely not be too much of a burden. Most "obsolete" media is still perfectly usable and may be so for quite some time.
There is simply no need to rush into destroying something you already have and that can serve as an alternate form of backup.
Originals always have some value in being the definitive version of something.
It's always interesting to see the files and what they were kept on. Floppy disks, whether the 3.5-, 5.25- or 8-inch variety. Old tape reels, large disk platters... "This file took up 3 of these..." or..
An entire windowing system (Mac OS) PLUS MS-Word fit on two floppy disks.
My phone currently has more storage than the enterprise datacenter that I used to work at in the 80s. And it was a LARGE datacenter...
If you had a 1981 copy of Wizardry on an Apple ][ floppy disk, you could image the contents. But if you wrote them back to a disk and tried to run it, it would fail.
This is because, as a means of copy protection, Wizardry used track arcing. Part of a track was written on one track, and another partial track was written half a head-step away. The timing of the writes was synchronized so the partial tracks didn't overwrite each other. Anyone doing a naive read and write, or even a not-so-naive scan of the half tracks, would fail, because they would not reproduce the write timing needed to prevent collisions and to pass the program's consistency checks.
Obviously people reverse engineered this and wrote adaptive copy programs that you could direct to do the right thing, but how is an archivist going to know that?
If you can get this level of deviousness on a primitive floppy disk, I imagine that there is plenty of deviousness to go around on other formats.
Keep the media.
I implemented that archive system and got great support, knowledge and experience from their community: https://wiki.duraspace.org/display/DSPACE/Discussion
In the same way a Gutenberg Bible has something a modern reprint can never match...
loading an 8" floppy into a drive and waiting several minutes to access a text file has something a file on a NAS can never match.
Not all old documents should be preserved in their original format if they are duplicated elsewhere, but a representative sample of each generation should be kept for posterity. Of course, idiots will damage those 8 inch floppies over the years. So, when in doubt, save more than you will ever need.
Readable disks are far more useful than museum-relics that can be displayed but not used.
I am constantly amazed at how well (very) old computers work. My Grid Pad 1 (early laptop) boots just fine.
Image->Discard->and back-up the image
Don't forget the last step, make back-ups of your digital copies.
My family pics reside on >6 hard drives including one in a safety deposit box.
Get rid of the darn things. Make sure you have the proper emulation and other tools you need, be sure to reformat, but absolutely get rid of the disks. They will fail (magnetic impulses cannot be captured forever) and you will be left with a goodly-sized stock of unreadable media (in fact, IIRC, latest NDSA suggestions are to remove all files from optical media ASAP). Save yourself the trouble and the expense and dump them from the start.
I don't know what your goals and requirements are, but I wouldn't bet on old floppies, CDs, or even hard drives lasting for very long. There's an essential problem with old physical media in that the readers are becoming more scarce. You may have a lot of floppies, but how easy is it to find a floppy drive? It's not always easy to find adapters for old IDE or SCSI formats as newer interfaces have been developed. Personally, I don't expect CD/DVD drives to be around in 10 years.
But beyond that, there's an even bigger issue: media goes bad. Of course, how quickly it goes bad depends on quite a few things, including how it was manufactured, and how it's stored. Even if you store a bunch of CDs and floppies under good conditions, I'd expect at least 10% to go bad within 6 years. I'm completely pulling that number out of my ass and I have no science to back me up, but my point is, this stuff is not reliable. I think my 10% number is too low, even, but I'm trying to make sure I don't exaggerate.
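To put the 10% guess above to work: assuming (a big assumption) that copies fail independently, the chance that every copy of an item is lost is p to the power of the number of copies, so at p = 0.1 each extra copy cuts expected losses by another factor of ten. A small Python sketch; the 10% figure is the commenter's self-admitted guess, not data, and the collection size of 1,000 is made up:

```python
def prob_all_copies_lost(p_fail: float, copies: int) -> float:
    """Chance that all `copies` copies of one item fail.

    Assumes each copy fails independently with probability p_fail.
    Correlated failures (same batch of discs, same shelf, same flood)
    make the real-world figure worse than this.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_fail ** copies


def expected_items_lost(n_items: int, p_fail: float, copies: int) -> float:
    """Expected number of items in a collection left with no readable copy."""
    return n_items * prob_all_copies_lost(p_fail, copies)


# Illustrative figures only: 1,000 items at the guessed 10% failure rate.
for k in (1, 2, 3):
    label = "copy" if k == 1 else "copies"
    print(f"{k} {label}: ~{expected_items_lost(1000, 0.10, k):.1f} items lost")
```

This prints roughly 100, 10 and 1 items lost. The independence assumption is exactly why the usual advice is different media in different places, not just more copies on the same shelf.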
I recently did the same thing. I had about 2 old OnStream 30 GB tapes and a handful of old QIC-80s, not to mention the CD-R pile in my room.
Over the years I never had the space to just extract everything and sort through it all. Not to mention, I would move backup data from tape to CD and back to tape, so I have copies of the same things all over the spectrum. I have recently started consolidating it all: finding an old OnStream tape drive and old floppy-interface QIC drives to restore everything to a single drive, getting rid of all the duplicates, and saving the important stuff on archival DVD media and/or "the cloud". It was a nightmare, but now I don't have to worry about trying to get hold of a bankrupt tape drive company's hardware in another 10 years.
I will then wipe the tapes and burn them.
If it was hand-labeled by a professor he liked, or by someone famous like Bill Gates, I could see that. But there is no other reason to keep it around once its contents have been properly indexed and stored. The only exception is when you need the obsolete media to be used in another obsolete computer, e.g. making a disk for a C64.
Let me put it this way. What do you think will happen in 10 years, when someone else finds that box of media. Even if he was told that it was all indexed and stored, he might question it and do it all over again "just to be sure" :P
"loading an 8" floppy into a drive and waiting several minutes to access a text file has something a file on a NAS can never match."
"when in doubt, save more than you will ever need."
"Readable disks are far more useful than museum-relics that can be displayed but not used."
Fuck Off
I've dealt with imaging thousands of floppies and they are all shit and deserve to be in the landfill. The same goes for mag tape.
The sooner the shelves are cleared of these, the better.
The DATA is what you care about, not the carrier.
...as the contents.
Depending on what the contents are, and your reasons for keeping it, the medium may be just as valuable. Or, said another way, the contents may lose their value when divorced from the medium.
For example, I'm thinking specifically of old copies of MacOS. The primary reason for keeping old versions of MacOS around would be to boot old Macs. If you discard the medium, you'll never boot that hardware again.
And before you jump and say "I'll just write another copy", it's nowhere near that simple. Original Mac drives spun the discs at variable speeds, while PC drives spun at a fixed rate. You cannot write a Mac floppy disk with a "modern" commodity floppy drive. If you can get your hands on an image (the contents), it will still require multiple generations of Mac hardware and software to backtrack far enough to write a usable physical copy.
Of course, a fair rebuttal here is "Why would you want to boot an original Mac?", and for that I have no other answer besides "Nostalgia".
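The variable-speed scheme can be made concrete with Apple's 400K/800K GCR format as it is commonly documented: 80 tracks per side split into five zones of 16 tracks, with 12 sectors per track in the outermost zone down to 8 in the innermost, and 512 data bytes per sector (each sector also carried 12 tag bytes, left out here). The drive slowed down on the longer outer tracks so more sectors fit at a fixed data rate; a PC drive turning at a constant speed cannot follow that. A short Python sketch of the arithmetic:

```python
# Apple 400K/800K GCR layout as commonly documented:
# 80 tracks per side, grouped into 5 speed zones of 16 tracks each.
TRACKS_PER_ZONE = 16
SECTORS_PER_TRACK_BY_ZONE = [12, 11, 10, 9, 8]  # outermost zone first
BYTES_PER_SECTOR = 512  # user data only; the 12 tag bytes per sector are excluded
TOTAL_TRACKS = TRACKS_PER_ZONE * len(SECTORS_PER_TRACK_BY_ZONE)


def sectors_per_track(track: int) -> int:
    """Number of sectors on a track (0 = outermost)."""
    if not 0 <= track < TOTAL_TRACKS:
        raise ValueError("track out of range")
    return SECTORS_PER_TRACK_BY_ZONE[track // TRACKS_PER_ZONE]


def capacity_bytes(sides: int) -> int:
    """User-data capacity of a disk with the given number of sides."""
    per_side = sum(sectors_per_track(t) for t in range(TOTAL_TRACKS))
    return per_side * BYTES_PER_SECTOR * sides


print(capacity_bytes(1), capacity_bytes(2))  # 409600 819200
```

One side gives 800 sectors, i.e. the "400K" disk; two sides give the "800K" disk.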
see title
I have no mod points at the moment. But that's a VERY important point: A straight copy may not be good enough, due to outside-the-standards copy protection schemes.
Other floppy-based commercial games used a number of other techniques.
(One, for instance, had track 3 deliberately corrupted, by scratching the medium with a pin. No error on reading it - or writing and re-reading it - and the game would load, erase the disk, and play. This let the person who made the copy think he had a good copy - when in fact he had a blank disk. Let's see you make a good archival copy of THAT. B-b )
You get the same thing on other media as well - even analog. (Example: Macrovision, which plays with the sync and saturation levels, so that analog TVs intended for over-the-air reception (usually) correct the distortion as if it were a fading signal, while videotape machines copy the "fading" picture and regenerate a non-fading sync, so the copy isn't corrected when viewed.)
One of the several copy protection schemes for DVDs includes hidden modulation in sync information, decoded by the drive's hardware and detected by its firmware, so you can make a perfect copy of the bits and it still won't play.
Wikipedia has a long list of such copy-protection schemes, any of which would make archival copying difficult to impossible (without special equipment that would expose you to arrest and federal prosecution if you possessed it).
loading an 8" floppy into a drive and waiting several minutes to access a text file has something a file on a NAS can never match.
You obviously don't know how shitty my NAS is... :-(
Re:Is there any benefit?
No. Image and discard.
Untrue. Backups get lost, go bad, or otherwise get screwed up. At a previous employer, old directories no longer accessed got backed up to tape to free up server space. We used a lot of storage at this company. On multiple occasions, files several years old were needed. About half the time we would be told by IT that these files were no longer recoverable. After one of these backup failures, I recalled that I had made a backup DVD of one old project we were trying to get a copy of. I went to our archivist, she found the DVD, and it was readable.
Save the media. Buy a USB based floppy drive. It makes sense to copy the files on these legacy format to a server for future use but keeping the originals around as a backup is a good idea.
Do not image the media. These image formats may fall out of favor and not be recognizable in the future. Look at the various problems NASA has had with some of its old tapes using formats no longer supported. Create a folder on a server for a particular piece of media and just copy and verify the files from the legacy media to the folder on the server.
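A minimal Python sketch of the copy-and-verify approach: copy every file from the mounted media into a folder on the server, re-hash each copy with SHA-256 and compare it with the source. It assumes the media is already mounted as a readable file system; a file-level copy like this does not capture boot sectors, hidden tracks or copy protection, which is exactly what an image is for. The function names are illustrative:

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(media_root: Path, dest_root: Path) -> dict[str, str]:
    """Copy every file under media_root into dest_root, keeping relative paths.

    Each copy is re-hashed and compared with its source; a mismatch raises.
    Returns {relative_path: sha256}, suitable for saving as a manifest.
    """
    manifest = {}
    for src in sorted(p for p in media_root.rglob("*") if p.is_file()):
        rel = src.relative_to(media_root)
        dst = dest_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # also copies timestamps where the OS allows
        # Note: re-reading right after writing may be served from the OS cache,
        # so a later, independent fixity check is still worthwhile.
        src_hash, dst_hash = _sha256(src), _sha256(dst)
        if src_hash != dst_hash:
            raise OSError(f"verification failed for {rel}")
        manifest[rel.as_posix()] = src_hash
    return manifest
```

Saving the returned manifest next to the folder gives you something to re-check the copies against years later.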
The App Store model and Apple's lack of ports are bad for archiving.
We may get to the point where App Store sandboxing makes it so that an archiving app can't put files in a place where other apps can see them, and we may have a hard time reading an outside data source as well.
Though it is not my primary business, I offer my services when people have difficulties accessing archives. I am often surprised when people come to me to rescue data off floppies, both 3 1/2 and 5 1/4 inch. These are mostly legal documents or contracts. Why people keep floppies but not the drives to read them is beyond me. I have a 3 1/2 inch USB drive I keep for routine work, and I saved a Pentium 90 with a 5 1/4 inch drive at home for those rare occasions.
The other problem is reading the data. I have Office 97 on the Pentium to convert and read old formats, and another machine with Windows and Office 2000 to bring documents to somewhat modern formats.
I helped one company update their union contract. They had it printed as a small book and had been giving it out for 20 years. When they finally ran out of copies, they wanted to incorporate the changes over the years and reprint it. They handed me a pile of floppies. Each chapter was a separate document, which I finally figured out was in WordStar for DOS format. Luckily, there is an Office 97 converter for that.
The lesson is: without software to read it, your archives are useless to keep. Save old versions of your software and, if necessary, the hardware to run it.
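Working out whether a pile of files is WordStar, Word 97 or something else is a format-identification problem. A rough Python sketch of the usual first step, checking a file's leading "magic" bytes; the five-entry table is illustrative only, whereas registries such as The National Archives' PRONOM (used by its DROID tool) hold far more signatures:

```python
from pathlib import Path

# Illustrative signature table only: (leading bytes, description).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"{\\rtf", "Rich Text Format"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. Word 97-2003 .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt, .zip)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def identify_bytes(header: bytes) -> str:
    """Return the description of the first matching signature, else 'unknown'."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path: Path, n: int = 16) -> str:
    """Read the first n bytes of a file and identify them."""
    with path.open("rb") as f:
        return identify_bytes(f.read(n))
```

Some older word-processor formats have no fixed signature at all, so a check like this falls through to "unknown" and a human (or a converter that happens to accept the file) has to take over.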
I'm an archivist at a mid-sized university archives, trying to develop a policy for archiving computer files ('born-digital records' in archival parlance).
Get Your Bits Off (Old Storage Media)
Demystifying Born Digital Reports
Working Draft of the Levels of Digital Preservation Chart
How much bit rot actually occurs in media is uncertain. Many documents 500 years old are readable-ish if you have the skills and accept that some parts may have decayed. That tells us a lot about the exact media people used back then.
The trouble with digital records is this:
Searchability is a requirement (even though we don't expect that with written records). The reason is that there is so much of it compared with the sparse records of times past. So you need a 'good' copy for data analysis, and some original media to show historians of the future how we looked upon the information, or what 'ordinary people' or 'ordinary businesses' had at their disposal.
DVDs, even commercially stamped, can suffer from bit rot. Optical disk technology is inherently flaky. Use multiple HD backups and make sure you have offsite storage.
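One way to catch bit rot when you hold several copies is to hash each copy and compare. A Python sketch assuming at least three replicas and SHA-256; recording a checksum at ingest and re-checking against it is the more robust practice, with majority voting as a fallback when that checksum is missing. The replica names are hypothetical:

```python
import hashlib
from collections import Counter


def check_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Hash each replica of one item and flag the ones that disagree.

    Returns (majority_digest, sorted names of replicas that differ from it).
    Raises ValueError when no digest has a strict majority, because then
    there is no trustworthy reference among the copies themselves.
    """
    if not replicas:
        raise ValueError("no replicas given")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no majority; check against a checksum recorded at ingest")
    damaged = sorted(name for name, digest in digests.items() if digest != winner)
    return winner, damaged
```

In practice you would hash files in chunks rather than hold them in memory; bytes are used here only to keep the sketch self-contained.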
The real question is how long do you need this data to be archived.
If it's more than a few years, print it out and make a microfiche of it.
From where I stand, digital data only has a life of a few years.
First, you may have legal issues. You may not be allowed to copy the data, and you may also someday need to prove that you did buy the 'data' and show proof of the original. This is a big fear for anyone who has had the Internet Nazis, err, the Media Industry come after them.
As a photographer myself, I would recommend printing out the best prints and storing them physically, as photographic prints still have the best record for preservation when compared to any and all types of digital media. By all means keep running copies of all your data, with on- and offsite backups. A physical copy of the best prints, though, is likely to be preserved longer.
This. I logged in to say this.
An archivist should keep the original as much as possible. Otherwise, what's the point? Would it be acceptable to photocopy a letter and then discard the original because it's too old? No. You photocopy (or, these days, non-destructively scan) it, and then you keep the original in a climate-controlled environment. People can work off the copy, but sooner or later someone will want to look at the original.
At a minimum you take high-resolution photos of the media and record those photos (and the type of media, etc.) along with the copy that you are putting into your fancy database.
What's the point of archiving anything? It's to keep it for posterity.
Geeze, what do they teach archivists these days...
Disclaimer: I'm not an archivist, but I have done a lot of study and work in the digital sector (including archiving).
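A sketch of recording the photos and media type alongside the image, as suggested above, written as a JSON sidecar file. The `MediaRecord` fields and the accession-number format are hypothetical, not any archival metadata standard:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MediaRecord:
    """Descriptive record for one physical carrier (all field names are illustrative)."""
    accession_id: str
    media_type: str           # e.g. '5.25" floppy', 'CD-R'
    label_transcription: str  # what is handwritten or printed on the label
    photo_files: list[str] = field(default_factory=list)  # high-resolution photos of the media
    image_file: str = ""      # the disk image or copied folder
    image_sha256: str = ""    # checksum recorded at ingest


def to_sidecar_json(record: MediaRecord) -> str:
    """Serialize the record as pretty-printed JSON to store next to the image."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Keeping the record as plain JSON text means it stays readable even if the database that indexes it does not survive.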
Should I keep my old vinyl records now that the CD has taken over?
It really is down to you, only you know what you really want to achieve.
- If you're only interested in the content, just copy the files.
- If you're only interested in the source container, keep it.
Actually, Macrovision played havoc with the sync pulses - it would produce a deliberately too weak one (but enough that most VCRs could lock on) and then produce a strong one, etc..
What happens is the TV is looking for a sync pulse and sees it, even though it's non-standard (it's a case of basically seeing what you expect - the TV is looking for a pulse, it sees the beginnings and locks on).
A VCR though wants to ensure that what gets recorded on tape (which has a poor dynamic range) is as strong as possible, so the weak pulse causes the VCR's AGC circuits to kick up the gain for the frame. The strong pulse causes the AGC to kick it down. The fact that the video signal is otherwise normal means the AGC has now made it clip in the first case, and suppressed it in the second.
And in fact, it was technology (and pressure from Macrovision) that really resulted in it working - older VCRs had slower AGC circuits and could play it just fine as the AGC never messed with the signal before the pulses reset themselves. But as technology and Macrovision pressure increased, the AGC circuits got better and produced this artifacting.
It's why the simplest cure was a signal regenerator - it normalized the pulses back to standard.
You can still do enough to mess with the disc - a modern CD-ROM or DVD-ROM drive supports standard CD ripping, and on data CDs, the first track can be reset back into playing as audio for ripping. Or you could embed a fake audio track on a CD that actually has data on it. There's enough flexibility in the CD-ROM spec because it was designed for audio first - data was a hack.
DVDs are a bit trickier, but I'm sure all the copy-protection tricks movies use can be employed for regular data as well. Blu-ray likewise, though it has some tricks of its own embedded in the firmware.
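The "data was a hack" point shows up in the sector layout itself. As specified for CD-ROM Mode 1 (ECMA-130, the "Yellow Book"), a data sector reuses the 2,352-byte sector that CD-DA fills entirely with audio samples, and gives 304 of those bytes to sync, header and extra error correction. A Python sketch of the arithmetic, assuming a 74-minute disc at 75 sectors per second (1x):

```python
# Layout of one 2,352-byte CD-ROM Mode 1 sector (ECMA-130).
# A CD-DA (audio) sector uses all 2,352 bytes for samples instead.
MODE1_LAYOUT = {
    "sync": 12,
    "header": 4,
    "user_data": 2048,
    "edc": 4,        # error detection code
    "reserved": 8,
    "ecc": 276,      # extra error correction that audio sectors do not carry
}
RAW_SECTOR = 2352
SECTORS_PER_SECOND = 75  # at 1x speed

# Sanity check: the Mode 1 fields fill the raw sector exactly.
assert sum(MODE1_LAYOUT.values()) == RAW_SECTOR


def data_capacity(minutes: int) -> int:
    """User-data bytes on a Mode 1 disc of the given playing time."""
    return minutes * 60 * SECTORS_PER_SECOND * MODE1_LAYOUT["user_data"]


def audio_capacity(minutes: int) -> int:
    """Raw audio bytes on a CD-DA disc of the given playing time."""
    return minutes * 60 * SECTORS_PER_SECOND * RAW_SECTOR
```

For 74 minutes this gives 681,984,000 bytes of Mode 1 data (the familiar ~650 MiB) against 783,216,000 bytes of raw audio, which is part of why a raw, audio-style rip of a data track looks bigger than the files on it.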
3 is 2
2 is 1
1 is none.
Best way to make sure you have it:
3 copies
2 different media, at least (HD, SSD, thumb drive, CD, DVD, tape, cloud)
1 offsite
How many times must this be drilled into your heads?
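The 3-2-1 rule above is easy to check mechanically. A Python sketch; the `Copy` record and the medium labels are made up for illustration, and counting copies says nothing about whether they still read, so it pairs with regular fixity checks:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a collection (fields are illustrative)."""
    location: str  # e.g. "server room", "safety deposit box"
    medium: str    # e.g. "hd", "tape", "cloud"
    offsite: bool


def rule_3_2_1_violations(copies: list[Copy]) -> list[str]:
    """Return the parts of the 3-2-1 rule that `copies` breaks; [] means it passes."""
    problems = []
    if len(copies) < 3:
        problems.append(f"need at least 3 copies, have {len(copies)}")
    if len({c.medium for c in copies}) < 2:
        problems.append("need at least 2 different media")
    if not any(c.offsite for c in copies):
        problems.append("need at least 1 offsite copy")
    return problems


plan = [
    Copy("server room", "hd", offsite=False),
    Copy("vault", "tape", offsite=True),
    Copy("provider", "cloud", offsite=True),
]
print(rule_3_2_1_violations(plan))  # []
```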
Born Digital Records are born, live, and die in RAM. Anything else is a copy. But we must distinguish among the various types of copies. Most programs will allow the "saving" of data in RAM when it ends. This is the Working Copy (w-copy) of the Born Digital Record. It is common practice to create, maybe several, Backup Copies (b-copy) of the w-copy. It is not guaranteed that any b-copy is the same as the w-copy, however, we rely on a b-copy to recreate a w-copy when needed.
After I have finished the Great American Novel, my w-copy gets published. It now needs to be archived (a-copy). This is a read-only copy and is highly protected. In fact, I may have several a-copies in several locations. It is never changed! It must always be available! Hence another copy, a Library copy (l-copy), where it can be read, analyzed and written about. If this is not desirable, why save it at all?
This means that all copies are, in a sense, live. They all are easily available, usually online. There is no concern about the creating programs being abandoned, since the software to process all copies is current level. This may mean reprocessing a format X a-copy to a format Y a-copy in the future.
About the game with the hardware error: that is not a born-digital record. You need a specific computer to play the game, and they both must be archived.
As for my Great American Novel, why, I print it out on acid-free paper, or maybe papyrus, or etch it in copper.