Ask Slashdot: Keeping Digital Media After Imaging?
New submitter rogue_archivist writes "I'm an archivist at a mid-sized university archives, trying to develop a policy for archiving computer files ('born-digital records' in archival parlance). Currently old floppy disks, CDs, and the occasional hard drive are added to our network storage. Then the physical media is separated from archival paper documents and placed into storage. My question for all you slashdotters out there is: should these disks be imaged and then the physical copies discarded? Is there any benefit for keeping around physical copies of storage media long since rendered obsolete?"
For born-analog content, always keep the original physical copy. You never know when you will need to rescan at a higher quality or when you will discover errors in your digital copy. DVDs are not born analog. In fact they only have a shelf life of around 7 years. You need to get everything off DVDs and make several digital copies of it. You should keep the DVDs as long as possible but eventually you will not be able to read them anymore. Make sure your digital copies of the DVDs are error-free because there will come a time when you cannot go back to the DVDs.
Your interest is in the contents, not the container. Therefore, once you have a known-good copy of the data, you're all set.
Remember to keep a few of the old tapes/drives/whatever for the museum display, of course.
If you're a zombie and you know it, bite your friend!
I work on a team which does archiving. We have multiple layers of data storage. First, we keep all copies of media in a library. The media is imaged and stored on a SAN. The SAN is backed up to an off-site NAS. And once a year, we copy the data to hard drives and ship the drives to another site across the country. If you have the capability, put the originals in an archival storage area. I have never known a single archivist to get rid of anything, so you must be new to this community.
As an FYI, there is no such thing as obsolete media, as evidenced by this project. And trust me, you can usually find a way to image most old media formats.
sudo make me a sandwich
Old media will become obsolete and degrade over time. It is best to copy to modern media. The files should be stored based on their SHA hash code, so that duplicates need not be stored. You can't have too many copies.
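A minimal sketch of the "store by SHA hash" idea, assuming SHA-256 and a plain directory as the store. The names `sha256_of` and `store`, and the two-character subdirectory layout, are made up for illustration and not taken from any particular archive system.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store(source: Path, store_root: Path) -> Path:
    """Copy `source` into the store under its digest; skip the copy if it is already there."""
    digest = sha256_of(source)
    # Hypothetical layout: the first two hex characters become a subdirectory.
    target = store_root / digest[:2] / digest
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
    return target
```

Two files with identical bytes land on the same path, so the second one costs no extra space. The original file names are not kept by this scheme, so a separate index from name to digest would be needed.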
OK, keep in mind that I'm being rather abstract here:
What makes a thing obsolete? That it isn't a commonly used item anymore, or that its usefulness has become non-existent?
Take, for example, the carrier pigeon - once considered 'obsolete' due to the invention of telecommunications equipment, I can see the medium coming back into vogue in the wake of the new knowledge that governments the world over are monitoring our every word over the aforementioned modern channels. Today, you can't send a message along electronic media without it being intercepted, somewhere, by someone other than the intended recipient; however, you can tie a coded message to a bird's leg and be reasonably confident in the message reaching its intended recipient without interception and decoding (international and relay flights notwithstanding).
Thus, that which was obsolete becomes useful again, bringing us back to the initial philosophical quandary: What makes a thing obsolete, anyway?
An enigma, wrapped in a riddle, shrouded in bacon and cheese
..or can check all of the content to be perfectly read, then yeah, sure, no loss in destroying the originals.
however.. if you have the space, why destroy? another issue is software, where you in theory might have to prove ownership of a legit copy, or the originals might have some other curiosity value. another thing with paper records is that if you destroy the old ones, what is stopping you from introducing new data, like a record of your uncle's graduation from said university? with the paper records destroyed there is no way to go check.
so my question is, is it really that expensive to store them, just for posterity's sake? even then you could just destroy them via sloppy storage rather than intentionally burning energy for destroying them..
world was created 5 seconds before this post as it is.
Unless you are paying Manhattan real estate prices, why not keep the originals? They serve as another backup. They will likely not be too much of a burden. Most "obsolete" media is still perfectly usable and may be so for quite some time.
There is simply no need to rush into destroying something you already have and can serve as an alternate form of backup.
Originals always have some value in being the definitive version of something.
A Pirate and a Puritan look the same on a balance sheet.
It's always interesting to see the files and what they were kept on. Floppy disks, whether 3, 5 or 9in variety. Old tape reels, large disk platters... "This file took up 3 of these..." or..
An entire windowing system (macos) PLUS MS-Word fit on two floppy disks.
My phone currently has more storage than the enterprise datacenter that I used to work at in the 80s. And it was a LARGE datacenter...
If you had a 1979 copy of Wizardry on an Apple ][ floppy disk, you could image the contents. But if you wrote them back to a disk and tried to run it, it would fail.
This is because as a means of copy protection, Wizardry used track arcing. Part of a track was written on a track. Another partial track was written half a head-step away. The timing of the writes was synchronized so the partial tracks didn't overwrite. Anyone doing a naive read and write, or even a not-so-naive scan of the half tracks, would fail, because they would not reproduce the timing of the writes necessary to prevent collision and to meet the consistency checks in the program.
Obviously people reverse engineered this and wrote adaptive copy programs that you could direct to do the right thing, but how is an archivist going to know that?
If you can get this level of deviousness on a primitive floppy disk, I imagine that there is plenty of deviousness to go around on other formats.
Keep the media.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
I implemented that archive system and I got great support, knowledge and experience from their community: https://wiki.duraspace.org/display/DSPACE/Discussion
Get rid of the darn things. Make sure you have the proper emulation and other tools you need, be sure to reformat, but absolutely get rid of the disks. They will fail (magnetic impulses cannot be captured forever) and you will be left with a goodly-sized stock of unreadable media (in fact, IIRC, latest NDSA suggestions are to remove all files from optical media ASAP). Save yourself the trouble and the expense and dump them from the start.
I don't know what your goals and requirements are, but I wouldn't bet on old floppies, CDs, or even hard drives lasting for very long. There's an essential problem with old physical media in that the readers are becoming more scarce. You may have a lot of floppies, but how easy is it to find a floppy drive? It's not always easy to find adapters for old IDE or SCSI formats as newer interfaces have been developed. Personally, I don't expect CD/DVD drives to be around in 10 years.
But beyond that, there's an even bigger issue: media goes bad. Of course, how quickly it goes bad depends on quite a few things, including how it was manufactured, and how it's stored. Even if you store a bunch of CDs and floppies under good conditions, I'd expect at least 10% to go bad within 6 years. I'm completely pulling that number out of my ass and I have no science to back me up, but my point is, this stuff is not reliable. I think my 10% number is too low, even, but I'm trying to make sure I don't exaggerate.
I recently did the same thing. I had about 2 old OnStream 30 GB tapes and a handful of old QIC-80s. Not to mention the CD-R pile in my room.
Over the years I never had the space to just extract everything and sort through it all. Not to mention I would move backup data from tape, to CD, back to tape, so I have copies of the same things all over the spectrum. I have recently started consolidating it all, finding an old OnStream tape drive and old QIC floppy drives to restore everything to a single drive, get rid of all the duplicates and save the important stuff on archived DVD media and/or "the cloud". It was a nightmare but now I don't have to worry about trying to get hold of a bankrupt tape drive company's hardware in another 10 years.
I will then delete the tapes and burn them.
If it was hand labeled by a professor he liked or someone famous like Bill Gates I could see that. But there is no other reason to keep it around once its contents have been properly indexed and stored. The only exception is when you need the obsolete media to be used in another obsolete computer, e.g. making a disk for a C64.
Let me put it this way. What do you think will happen in 10 years, when someone else finds that box of media. Even if he was told that it was all indexed and stored, he might question it and do it all over again "just to be sure" :P
see title
I have no mod points at the moment. But that's a VERY important point: A straight copy may not be good enough, due to outside-the-standards copy protection schemes.
Other floppy-based commercial games used a number of other techniques.
(One, for instance, had track 3 deliberately corrupted, by scratching the medium with a pin. No error on reading it - or writing and re-reading it - and the game would load, erase the disk, and play. This let the person who made the copy think he had a good copy - when in fact he had a blank disk. Let's see you make a good archival copy of THAT. B-b )
You get the same thing on other media as well - even analog. (Example: Macrovision, which plays with the sync and saturation levels, so that analog TVs intended for over-the-air reception (usually) correct the distortion as if it were a fading signal, while videotape machines copy the "fading" picture and regenerate a non-fading sync, so the copy isn't corrected when viewed.)
One of the several copy protection schemes for DVDs includes hidden modulation in sync information, decoded by the drive's hardware and detected by its firmware, so you can make a perfect copy of the bits and it still won't play.
Wikipedia has a long list of such copy-protection schemes, any of which would make archival copying difficult to impossible (without special equipment that would expose you to arrest and federal prosecution if you possessed it).
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
image->backup->check image and backup->discard
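A rough sketch of the "check image and backup" gate before the discard step, assuming a SHA-256 digest was written down when the disk was first imaged. The function names are hypothetical, and real images would be streamed from disk rather than held in memory as bytes.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an in-memory image."""
    return hashlib.sha256(data).hexdigest()


def safe_to_discard(recorded_digest: str, image: bytes, backup: bytes) -> bool:
    """True only if both the image and its backup still match the digest recorded at imaging time."""
    return digest(image) == recorded_digest and digest(backup) == recorded_digest
```

This only shows that the two copies agree with what was read at imaging time. It says nothing about whether that first read captured everything on the disk.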
factor 966971: 966971
Why does analog suck for archiving? Sure, you can't just get a hash of the data and tell at a moment's notice whether it is exactly as it was; however, you also can't store a hard disk in a vault for 70 years and have a high expectation of it working.
As an archivist, I would think you might want to:
1) keep multiple copies of each type of media, preferably from different manufacturers, all written with identical data
2) Separately, a copy of the data contained on the media
Occasionally check the media to see at what rate their integrity is decaying. As readers for the media become increasingly difficult to encounter, develop alternative methods to read the data, checking it against your reference copy. Eventually someone is going to appear on your doorstep with something like the Pioneer spacecraft data tapes or the Nixon Oval Office recordings, and if you can pull the data off it you'll be the hero of the day.
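One possible way to put a number on "at what rate their integrity is decaying": re-read the media and compare it block by block against the reference copy. This is only a sketch; `mismatch_rate` and the 512-byte block size are assumptions for illustration.

```python
def mismatch_rate(reference: bytes, reread: bytes, block_size: int = 512) -> float:
    """Fraction of fixed-size blocks in a fresh read that differ from the reference copy."""
    total = max(len(reference), len(reread))
    if total == 0:
        return 0.0
    blocks = (total + block_size - 1) // block_size
    bad = 0
    for i in range(blocks):
        start = i * block_size
        # A short or truncated read shows up as differing blocks too.
        if reference[start:start + block_size] != reread[start:start + block_size]:
            bad += 1
    return bad / blocks
```

Logging this figure for each copy at every check gives a crude decay curve per medium and manufacturer.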
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
That's good in theory, but it's unlikely that your version of MacOS from the '80s is still going to work. Floppies are terrible in terms of reliability over long periods of time.
Just to reply to myself . . .
Target headquarters in Minneapolis gets VCR tapes from security systems all over the country, and they work with the FBI to read them and export the video. Security equipment manufacturers are notorious for using proprietary equipment or file formats to limit interoperability with the competition's systems, and they apparently have a lab that specializes in decoding them to extract the usable data.
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
Re:Is there any benefit?
No. Image and discard.
Untrue. Backups get lost, go bad, or otherwise screwed up. At a previous employer old directories no longer accessed got backed up to tape to free up server space. We used a lot of storage at this company. On multiple occasions some years old files were needed. About half the time we would be told that these files were no longer recoverable by IT. After one of these backup failures I recalled that I had made a backup DVD of one old project we were trying to get a copy of. I went to our archivist and she found the DVD, it was readable.
Save the media. Buy a USB based floppy drive. It makes sense to copy the files on these legacy formats to a server for future use but keeping the originals around as a backup is a good idea.
Do not image the media. These image formats may fall out of favor and not be recognizable in the future. Look at the various problems NASA has had with some of its old tapes using formats no longer supported. Create a folder on a server for a particular piece of media and just copy and verify the files from the legacy media to the folder on the server.
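The copy-and-verify step described above might look roughly like this, assuming the legacy media is mounted as an ordinary directory. `copy_and_verify` and `file_digest` are hypothetical names, and whole files are read into memory, which is fine at floppy sizes but not for large drives.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a whole file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_and_verify(media_root: Path, server_folder: Path) -> list:
    """Copy every file under `media_root`; return the relative paths that failed verification."""
    failures = []
    for source in sorted(p for p in media_root.rglob("*") if p.is_file()):
        relative = source.relative_to(media_root)
        target = server_folder / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        # Re-read both sides and compare digests after the copy.
        if file_digest(source) != file_digest(target):
            failures.append(relative.as_posix())
    return failures
```

An empty list back means every file matched on re-read. A file-level copy like this does not capture boot sectors, deleted files, or anything else outside the file system.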
The app store idea and Apple's lack of ports are bad for archiving.
We may get to the point where app store sandboxing makes it so that an archiving app can't put the files in a place where other apps can see them, and we may have a hard time reading an outside data source as well.
Though it is not my primary business, I offer my services when people have difficulties accessing archives. I am often surprised when people come to me to rescue data off floppies, both 3 1/2 and 5 1/4. These are mostly legal documents or contracts. Why people keep floppies but not drives to read them is beyond me. I have a 3 1/2 USB drive I keep for routine work. I saved a Pentium 90 with a 5 1/4 inch at home I use for those rare occasions. The other problem is reading the data. I have Office 97 on the Pentium to convert and read old formats. I have another machine with Windows and Office 2000 to bring documents to somewhat modern formats. I helped one company update their union contract. They had it printed as a small book, and had been giving it out for 20 years. When they finally ran out of copies, they wanted to incorporate the changes over the years and reprint it. They handed me a pile of floppies. Each chapter was a separate document, which I finally figured out were in Wordstar for DOS format. Luckily, there is an Office 97 converter for that. The lesson is, without software to read it, your archives are useless to keep. Save old versions of your software, and if necessary, hardware to run it.
I'm an archivist at a mid-sized university archives, trying to develop a policy for archiving computer files ('born-digital records' in archival parlance).
Get Your Bits Off (Old Storage Media)
Demystifying Born Digital Reports
Working Draft of the Levels of Digital Preservation Chart
The actuality of bit-rot in media is uncertain. Many documents 500 years old are readable-ish if you have the skills and accept that some parts may have decayed. That tells us a lot about the exact media people used way back then.
The trouble with digital records is this:-
Searchability is a requirement (even though we don't expect that with written records). The reason is that there is so much of it when compared with the sparse records of times past. So you need a 'good' copy for data analysis and some original media to inform historians of the future how we looked upon the information, or what 'ordinary people' or 'ordinary businesses' had at their disposal.
DVDs, even commercially stamped, can suffer from bit rot. Optical disk technology is inherently flaky. Use multiple HD backups and make sure you have offsite storage.
image->backup->check image and backup->discard->sign in triplicate->sent in->send back->query->lose->find->subject to public inquiry->lose again->bury in soft peat for three months and recycle as firelighters
systemd is Roko's Basilisk.
As a photographer myself I would recommend to print out the best choice prints and store them physically, as photographic prints still have the best record for preservation when compared to any / all types of digital media. By all means take running copies of all your data, on and offsite backup. A physical copy of the best prints though is likely to be preserved longer.
This. I logged in to say this.
An archivist should keep the original as much as possible. Otherwise, what's the point? Would it be acceptable to photocopy a letter and then discard the original because it's too old? No. You photocopy (or actually non-destructively scan these days), and then you keep the original in a climate controlled environment. People can work off the copy, but sooner or later someone will want to look at the original.
At a minimum you take high-resolution photos of the media and record those photos (and the type of media, etc.) along with the copy that you are putting into your fancy database.
What's the point of archiving anything? It's to keep it for posterity.
Geeze, what do they teach archivists these days...
Disclaimer: I'm not an archivist, but I have done a lot of study and work in the digital sector (including archiving).
HELP MY ACCOUNT HAS BEEN HACKED BY AN ILLIBERAL ART STUDENT SET TO DESTROY THE INTERWEBZ!
i wouldn't be so sure about that - my floppy copies of MacOS 6 and 7 worked well on my classics and such. even my powerbook 140, and this was only 5-6 years ago. worked perfectly, original factory disk sets.
http://www.hardwaresecrets.com/printpage/How-to-Generate-Floppy-Disks-for-Old-Macintosh-Computers/1713
anyway, for the curious, a rundown on writing old mac floppies.
(note: if you do them as 1.44MB floppies, they read/write at the same speed, so no special hardware needed as long as you have macs that can read 1.44s. easy way to avoid the hassle if you have the option.)
Seconded regarding "check image and backup". Only after you have successfully tested the restore process, you know that you actually have viable backups.
Also, think about the nature of your images. Are they easily migrated to another format, if the original hardware is no longer available?
For instance, I have encountered one or two floppy "imaging" programs that simply store the contents of all sectors on a 3.5" floppy into a 1.44MB binary file. Good for getting hidden information in seemingly unused sectors too, but if you need to access the information 20 years from now, there may be no more floppies to restore to.
Depending on your archiving goals, it may be better to copy the contents of the media to a directory on a larger medium.
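For reference, a sketch of what such a raw 1.44MB image looks like from the software side, assuming the standard geometry (80 cylinders, 2 heads, 18 sectors per track, 512 bytes per sector) and sectors stored in plain cylinder, head, sector order. The function names are made up for illustration.

```python
CYLINDERS, HEADS, SECTORS_PER_TRACK, SECTOR_SIZE = 80, 2, 18, 512
IMAGE_SIZE = CYLINDERS * HEADS * SECTORS_PER_TRACK * SECTOR_SIZE  # 1,474,560 bytes


def sector_offset(cylinder: int, head: int, sector: int) -> int:
    """Byte offset of a sector inside a raw image; sectors are numbered from 1."""
    if not (0 <= cylinder < CYLINDERS and 0 <= head < HEADS
            and 1 <= sector <= SECTORS_PER_TRACK):
        raise ValueError("address outside the 1.44MB geometry")
    logical_block = (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)
    return logical_block * SECTOR_SIZE


def read_sector(image: bytes, cylinder: int, head: int, sector: int) -> bytes:
    """Slice one 512-byte sector out of an in-memory raw image."""
    start = sector_offset(cylinder, head, sector)
    return image[start:start + SECTOR_SIZE]
```

Getting the files back out still needs something that understands the file system inside the image, which is the migration question raised above.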
C - the footgun of programming languages
Actually, Macrovision played havoc with the sync pulses - it would produce a deliberately too weak one (but enough that most VCRs could lock on) and then produce a strong one, etc..
What happens is the TV is looking for a sync pulse and sees it, even though it's non-standard (it's a case of basically seeing what you expect - the TV is looking for a pulse, it sees the beginnings and locks on).
A VCR though wants to ensure that what gets recorded on tape (which has a poor dynamic range) is as strong as possible, so the weak pulse causes the VCR's AGC circuits to kick up the gain for the frame. The strong pulse causes the AGC to kick it down. The fact that the video signal is otherwise normal means the AGC has now made it clip in the first case, and suppressed it in the second.
And in fact, it was technology (and pressure from Macrovision) that really resulted in it working - older VCRs had slower AGC circuits and could play it just fine as the AGC never messed with the signal before the pulses reset themselves. But as technology and Macrovision pressure increased, the AGC circuits got better and produced this artifacting.
It's why the simplest cure was a signal regenerator - it normalized the pulses back to standard.
You can still do enough to mess with the disc - a modern CD-ROM or DVD-ROM supports standard cd-ripping, and on data CDs, the first track can be reset back into playing as audio for ripping. Or you could embed a fake audio track on a CD that actually has data on it. There's enough flexibility in the CD-ROM spec because it was designed for audio first - data was a hack.
DVDs are a bit trickier, but I'm sure all the copyprotection tricks movies use can be employed for regular data as well. Blu-Ray likewise, though it has some tricks of its own embedded in the firmware.
3 is 2
2 is 1
1 is none.
best way to make sure you have it
3 copies
2 different media, at least (hd, ssd, thumb, cd, dvd, tape, cloud)
1 is offsite.
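The 3 / 2 / 1 rule above is easy to turn into a check over an inventory of copies. A minimal sketch, with the `Copy` record and `meets_3_2_1` invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    medium: str    # e.g. "hd", "tape", "cloud"
    offsite: bool  # True if stored away from the main site


def meets_3_2_1(copies) -> bool:
    """At least 3 copies, on at least 2 kinds of media, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```

For example, two hard drives on site plus one tape offsite passes, while three hard drives in the same room does not.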
Security equipment manufacturers are notorious for using proprietary equipment or file formats to limit interoperability with the competition's systems, ...
You should have written "All manufacturers are notorious for using proprietary equipment or file formats to limit interoperability with the competition's systems." This problem has been with us from the start of the industrial age, and possibly earlier.
This was one of the primary reasons why, back in the 1960s, the US military's ARPA (Advanced Research Projects Agency, now DARPA) created the R&D project that led to the ARPAnet, which evolved into the Internet. The military folks were using more and more electronic gadgetry, and had learned that it was impossible to write military specs so precisely that manufacturers couldn't find subtle ambiguities and create data formats that were incompatible with their competitors. The manufacturers' reps always used the same argument: "If you'd bought only our stuff, you wouldn't have problems of incompatibility."
So they faced the fact that such incompatibilities are a permanent fact of life, and developed a solution: They'd plug electronic gadgets into those newfangled "computer" thingies, where they could write software that would decode a gadget's signals and data formats, translate them to a standard format, and transmit them to another computer with attached proprietary gadgets, where the software would translate the standard formats into their manufacturers' formats. This allowed remote military sites to use whatever equipment they had on hand (which hadn't been destroyed by some enemy ;-), and it could communicate with other equipment anywhere else that they could send the standard-format data.
This has always applied to "obsolete" equipment, too. Manufacturers want customers to continually upgrade to the latest stuff, and encourage this by making the current stuff unable to communicate with stuff more than N releases old. But if you still have the software that talked to the old models, you can use it and the ARPA/Internet system to communicate with the newer models, using the same translation-to/from-standard scheme.
It can be amusing to read comments implying that this sort of incompatibility is something new. It's not only not new; it was one of the prime motives for the development that led to the Internet and its protocols half a century ago. Without it, hardly any electronic gadgets would be able to communicate with anything not from the same manufacturer (and "upgraded" fairly recently).
If you investigate, you may be surprised to learn how much of the Internet's infrastructure is running on ancient PCs that will no longer run MS Windows (or DOS ;-). I've helped build a number of "server centers" that were made up mostly of free PCs whose previous owners just wanted to dump them. It's now 10 or 15 (or 20) years later, and the server software is still sitting there running just fine, talking to any equipment that it can physically exchange bits with. It's all open-source software, so the software is easy to upgrade indefinitely. The vendors aren't very happy with us, though. ;-)
I've also worked on a number of software projects that can be summarized as "cracking" a company's data, typically from old backups, and sending it to modern computers in formats that they understand. This hasn't been for law-enforcement or military agencies; it has been for the companies themselves who find that their old data is either unreadable or misinterpreted by current IBM/Microsoft software. It's an old story ...
And to steer back to the original topic, the same comments apply to whatever digital media you may have. If you want your family photos or videos usable 20 or 50 or 100 years from now, you should be translating them to (preferably several of) the current standard formats. Keep multiple copies of each.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Personal conversation with a former boss who had been an intern at the inception of ARPANet indicated an origin slightly different than the 'official' version. She said that her boss was tired of having four different terminals on his desk to communicate with the four different project groups that he worked with. He and some of the other sysadmins slapped together a method that would allow the local mainframe (some DEC monster) to talk to the HP mainframe in (IIRC) Colorado. The guys in Colorado adapted that work to let the HP talk to an IBM in Palo Alto(?). Now her boss could communicate with the IBM and the HP simultaneously, and get rid of two of the terminals.
Salescritters saw what they had done in, essentially, their spare time and said, "We should be getting paid for this!" The official history starts there.
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
Actually, I've read a number of similar stories, dealing with both the original ARPAnet and the origins of unix at Bell Labs. What they all have in common is the problem of dealing with incompatible gadgetry that uses different data/message formats, with a solution that involved connecting the incompatible stuff to a computer running software that acted as a translator. Acting as a remote terminal was part of about half the stories, and translating file formats was often involved.
I get the impression that the general incompatibility of electronic stuff, especially computers, was a common complaint in the years around 1970. That was a time when computers were spreading rapidly, and smaller computers were coming out that were cheap enough that they could be the center of a small lab rather than in an organization-wide computer center. So it was feasible for people to consider uses that wouldn't be permitted in the batch-oriented computer centers of the time. Using a computer as a "middle-man" between stuff that couldn't quite communicate was likely a widespread application of such smaller computers.
It's too bad that a lot of this was so poorly documented, so that often all we have is after-the-fact anecdotes from a lot of different sources. But the people involved probably just thought they were working on personal annoyances with their equipment, not inventing important new things or fomenting some sort of revolution. ;-)
Those who do study history are doomed to stand helplessly by while everyone else repeats it.