Most Digital Content Not Stable
brunes69 writes "The CBC is running an article profiling the problems with archiving digital data in New Brunswick's provincial archives. Quote from the story: 'I've had audio tape come into the archives, for example, that had been submerged in water in floods and the tape was so swollen it went off the reel, and yet we were able to recover that. We were able to take that off and dry it out and play it back. If a CD had one-tenth of one per cent of the damage on one of those reels, it wouldn't play, period. The whole thing would be corrupted'. Given the difficulties with preserving digital data, is it really the medium we should be using for archival purposes?"
That content cannot be preserved at all. We'll be a civilization without written history, like American Indians.
Isn't that the point of digital? Lossless copies are possible (depending on format obviously). Why have one plastic cylinder that can be lost when you can have it in 5 or 10 locations?
An operating system should be like a light switch... simple, effective, easy to use, and designed for everyone.
Stone tablets. Just drill a hole for a zero and you're away and laughing :)
Now we just need a large enough area to store them
Help! help!, the termites are eating my DRAM!!!
Let's play it all by memory. Seriously, do we really have a choice? The more densely we pack the information, the more of a chance it has for corruption. The "CD" mentioned by the article has effectively 700 minutes of music of the same quality as the 60 minute tape.
Any guest worker system is indistinguishable from indentured servitude.
At the enterprise level we use 3.5" 1.44MB Floppy drives in an elaborate redundant array. It consists of roughly 70,000 Disks, each changed nightly. We haven't had any problems yet. Hopefully the rest of the world will play catch up soon.
In a world of acronyms, the words are the real victims.
Ridiculous. It's not the fact that content is digital, it's the fact that the media being used to store the information (CDs etc) is fragile. If these mythical audio tapes had been digital tapes, recovering the signal from them would have been just as easy.
... wasn't *exactly* what you put on. You have the appearance of stability, that you can retrieve something off a damaged tape, but the truth is something different. That's the beauty of analogue. The same simplicity and fault-tolerance of the format also means the format will naturally degrade over time. The contents may be retrievable, but they've degraded, and as such are not the same contents as when first written. Digital fails, but when it doesn't fail, you have exactly the same content as you did when you started. Archivists will not run from digital - their techniques will improve instead. or something.
we need to realise that nothing lasts forever.
Then, we can figure out the most cost-effective medium to record stuff on, with determined re-archival cycles.
Shouldn't it be possible to take all the media and just crush it? You know, like throw it into a Mega Power 3000 Digital Garbage Collector (TM) and crush it into a diamond or something? Let future generations figure out how to decompress it.
Just because it's harder to recover the data doesn't mean it's impossible.
Of course, anyone using CDs or DVDs for large data backup must have a lot of interns to do the disc swapping.
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
Some analog technologies, like old color films, have also degraded and need image enhancement to recover the original content.
Yes, analog tape is durable. But let's take it and that "CD" and put them in front of a large electromagnet and see how each fares.
SJW: Someone who has run out of real oppression, and has to fake it.
Have people already forgotten the advantage of digital? If you have an analog tape, every time you make a copy of it, the quality will be degraded. But with digital, you can make a million copies and the final copy will be the byte-by-byte equivalent of the original. So what if CDs only last 10 years before becoming unusable? You can make another copy! So what if this guy wouldn't have been able to recover after physical damage to his media... if it was important, he should have had digital offsite backups! And those backups would have been 100% equivalent to the originals.
Qxe4
If losing 1% of the data on a CD means the data is a total loss, doesn't that say to you that you should be using a file system and data formats with more redundancy and parity?
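To make the redundancy-and-parity point concrete, here is a minimal sketch of single-block XOR parity, the simplest scheme of this kind. The function names and sample blocks are made up for illustration, and it assumes equal-length blocks and exactly one lost block; archival tools such as PAR2 use Reed-Solomon codes, which tolerate more damage than this.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def add_parity(blocks):
    """Return the data blocks followed by one parity block."""
    return list(blocks) + [xor_blocks(blocks)]

def recover(stored, lost_index):
    """Rebuild the block at lost_index from all the surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)

# Hypothetical data: three 4-byte blocks plus one parity block.
data = [b"tape", b"disc", b"card"]
stored = add_parity(data)

# Pretend block 1 became unreadable and rebuild it from the rest.
rebuilt = recover(stored, 1)
print(rebuilt)
```

The cost is one extra block of storage per group, and the scheme only helps when you know which block went bad.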
Of course for the ultimate in durable electronically readable storage you should be burning everything to PROMs.
"Prefiero morir de pie que vivir siempre arrodillado!"
Emphasizing the “I” in RAID.
Why bother.
I don't know what it is with /. but it seems this kind of infopocalypse story comes up at least once every 6 months in regards to digital data. I can only think one thing in each case: This is fucking retarded.
As you said, the great thing about digital data is that it can be replaced cheaply, perfectly, and spread around. Its resilience isn't in the one copy lasting 1000 years, it is in having copies everywhere, so no event short of nuclear war can eliminate them all, and maybe not even then.
This also is the response to the other big cry-wolf thing, "What happens when the data is in a format that's too old???!!11one" The answer is we just keep copying it to new formats. I have digital copies of papers that I wrote in high school. They were written on an old copy of Works for Windows 3.1 and usually saved to floppy. I don't have a floppy any more but it isn't a problem. I long ago transferred them to a hard drive and I just keep transferring them to new drives when I get them. I also periodically load the old documents into whatever my current word processor is, convert them, and re-save them as a new format.
So the parent is completely correct. Because of digital's ability to be perfectly copied, and especially with the Internet's ability to distribute those copies to anywhere in the world, it can have a permanence far above and beyond analogue. The individual copies might be fragile, but get a few thousand, or million of them and you'll be hard pressed to get rid of them all.
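A rough back-of-the-envelope sketch of the "many copies" argument, under the loudly stated assumption that each copy fails independently with the same probability (not true if the copies share a building, a media batch, or a file format). The function names and the 20% figure are illustrative, not measured.

```python
def chance_all_copies_lost(p_single, copies):
    """Probability that every copy is lost, assuming independent failures."""
    return p_single ** copies

def copies_needed(p_single, target):
    """Smallest number of copies whose combined loss risk is at or below target."""
    copies = 1
    while chance_all_copies_lost(p_single, copies) > target:
        copies += 1
    return copies

# Illustrative assumption: each copy has a 20% chance of dying before the next check.
for n in (1, 3, 5, 10):
    print(n, "copies ->", chance_all_copies_lost(0.2, n))

print("copies for a one-in-a-million risk:", copies_needed(0.2, 1e-6))
```

The risk shrinks geometrically with each extra copy, which is why even fairly fragile individual copies can add up to something durable.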
Hrmm,
DLT
reel-to-reel
Mini8mm
SAN
CD/DVD
etc...
Depends on how deep your pockets go and your calculation for the value of the data if lost. You are doing the math on loss of data, riggghhhhttt?
When the only tool you have is a hammer, every problem looks like a nail
The tape had analog data on it. Analog, as we all know from years of television and radio, is very forgiving of damage. CDs are digital data. There is error correction, but for normal playback/reading devices there is a limit beyond which they simply give up trying. The data is perfect or it's gone for those machines.
Sad to say, tape dies too.
What is more interesting is the use of compression (and rights management, though if your originals are encrypted you deserve to get screwed - physical security comes first). With analog and simple stream encoding of time domain data (such as audio recordings) much data can be recovered using an external benchmark for the time code. Compress that data and lose your parity and you're totally hosed.
I've never been a proponent of compressed or encoded backups. Sure they save space and add a layer of "security", but that comes at the cost of flexibility should damage occur.
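A small sketch of the trade-off, using Python's standard zlib module as a stand-in for whatever compressor a backup tool might use (the sample data and helper names are made up). It damages one byte in a raw copy and one byte in a compressed copy of the same data.

```python
import zlib

def flip_byte(data, index):
    """Return a copy of data with the byte at index inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)

def surviving_bytes(original, damaged):
    """Count positions where the damaged copy still matches the original."""
    return sum(a == b for a, b in zip(original, damaged))

# Illustrative payload: 2000 short text records.
raw = b"".join(b"sample %05d\n" % i for i in range(2000))
packed = zlib.compress(raw)

# One damaged byte in the raw copy costs exactly one byte.
damaged_raw = flip_byte(raw, len(raw) // 2)
print(surviving_bytes(raw, damaged_raw), "of", len(raw), "raw bytes intact")

# One damaged byte in the compressed copy makes a plain decompress call fail.
try:
    zlib.decompress(flip_byte(packed, len(packed) // 2))
    compressed_readable = True
except zlib.error as exc:
    compressed_readable = False
    print("compressed copy unreadable:", exc)
```

A careful recovery tool could still salvage the part of the stream before the damage, but the ordinary read path simply gives up.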
Of course, as has certainly already been mentioned, with digital data you have the luxury of making multiple perfect copies as well as the ability to perform automated checks of that data, mostly without any user interaction.
Otherwise, stone tablets have the best track record so far, though the storage density is a bit on the light (or should I say heavy?) side.
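One way such an automated check could look, as a sketch only: record a SHA-256 digest for every file once, then re-run an audit on a schedule and flag anything missing or changed. The manifest layout, function names, and sample file are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def audit(root, manifest):
    """Return (name, problem) pairs for files that are missing or changed."""
    root = Path(root)
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems.append((name, "missing"))
        elif sha256_of(target) != expected:
            problems.append((name, "changed"))
    return problems

# Demonstration on a throwaway directory with one hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "essay.txt").write_bytes(b"high school paper")
    manifest = build_manifest(tmp)
    clean = audit(tmp, manifest)
    Path(tmp, "essay.txt").write_bytes(b"high school pap3r")  # simulate bit rot
    damaged = audit(tmp, manifest)

print(clean, damaged)
```

A check like this only detects damage; repairing it still needs a second good copy or parity data.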
Is it just my observation, or are there way too many stupid people in the world?
...the solution is simple. We need a way to take a quantum snapshot of the whole of the Earth at least once every 24 hours and then to send that data out into space as a broadcast in all directions. To retrieve the quantum structure, we'd simply pop out of a wormhole near where the data is passing and retrieve it, then retransmit it back to here and reconstruct the Earth as it was before catastrophe struck. The nice thing about this is that if we can find another M class star like Usolia (our sun), we don't even have to beam the data through the wormhole. We could just intercept it near the star and start the assembly process there. Point-in-time restores for the whole of the planet. Imagine that. You're welcome.
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
In the 1980's they digitized the Domesday Book. Trouble was the format they used is now obsolete. The good news (apart from still having the original) is that they have re-invented the wheel. See http://news.bbc.co.uk/2/hi/technology/2534391.stm for details.
Semper ubi sub ubi
If a CD had been submerged in water, it would've been fine. There's no point in making the comparison if it wouldn't have been damaged in the first place. They need to find a better example.
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
There is much that has already been documented and guidelines exist to guarantee somehow the short to medium-term preservation of digital assets; this particular link is for audio-related digital assets, but data is all the same...!
A combination of multiple sets of magneto-optical and tape backups maintained in separate locations, all temperature and humidity-controlled environments should easily yield 25~30 years shelf life, which guarantees that by then we'll hopefully have found better long-term options to transfer these to.
I am transferring most of my 15 to 20-year old audio DAT tapes digitally with no problems. Good brand-name CD-Rs (like Taiyo Yuden) kept out of the light and at a steady temperature seem fairly resilient so far, but there have been batches which over time have developed 'rot' or layer oxidation, which sometimes renders them partially or wholly unusable.
DLT tapes are so far the most trouble-free type of media I have encountered, but with only 10 years to go back on, not sure that is accurate.
Z.
This also is the response to the other big cry-wolf thing, "What happens when the data is in a format that's too old???!!11one" The answer is we just keep copying it to new formats. I have digital copies of papers that I wrote in high school. They were written on an old copy of Works for Windows 3.1 and usually saved to floppy. I don't have a floppy any more but it isn't a problem. I long ago transferred them to a hard drive and I just keep transferring them to new drives when I get them. I also periodically load the old documents into whatever my current word processor is, convert them, and re-save them as a new format.
I think you're missing an important element here. As you move along in time, the volume of data that must be converted to the format du jour only gets bigger and bigger.
For a single person, it's probably not too bad. I, too, have pretty much everything I ever wrote since I first got a computer, and every few years I've committed to rolling the whole thing onto new media. So I've gone from offline backups on floppies, to Zip disks (in retrospect a mistake), to CDs, to DVD-R, and now to DVD+R (the -R discs were crappy and I've since heard that +R is a superior format anyway). This isn't much trouble, because the amount of data I have to back up hasn't really grown that much faster than the data density of available media. I'm probably up to a couple of DVDs for the stuff I really, really care about, maybe a binder if I include all the photos and video.
But what's a basic Saturday-afternoon copy-and-burn job for an individual is a Sisyphean task for a large government agency or library, particularly one that is constantly generating new content. I've seen places that could barely keep up with archiving the stuff they were producing, much less roll their vast archives forward onto new media. So they'd have vaults of hard drives, sitting next to DLT cassettes, next to IBM 3480, next to racks of old half-inch open-reel tapes. Probably back in some dark corner there were piles of punched cards; it really wouldn't surprise me. The problem of data loss due to unreadable formats isn't some abstract 'maybe,' it's already happened in a lot of places (but nobody really wants to talk about it, so it mostly gets buried and whatever's on the tapes gets written off).
The reason why there's so much interest in preservable formats is because while it may not be strictly impossible to constantly roll old backups and archives forward, it's very hard, and requires vast amounts of effort and expense. If you have a backup that's being written into a format that you know is going to be readable for a long time, even if it's more expensive to write initially, you can save a lot of money and time down the road by not having to copy it forward as often.
People may get a little shrill when they're talking about these issues, but they're quite real.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I know I'm offtopic, injecting facts into this debate, but I thought it might be interesting to bring up the VXA tape format. It allegedly survives all kinds of abuse like freezing, see Freezing Test
I have never tried these drives, and would love to hear from someone independent who has.
Chappies in New Brunswick:
From an earlier /. article:
Quick someone tell the author of: 'So You've Lost a $38 Billion File' that everything is alright! New Brunswick had data that was submerged in water, tape so swollen it was off the reel; they still managed to recover it.
And don't come out with that: 'Polar Bear ate the backup tape' excuse again!
I'm going to transform myself into a mighty hawk. Either that or I'll just go and work at Dixons, haven't decided yet.
This is a dual problem:
1) Digital data needs to be moved about once every 5 years onto a new physical store, disk, whatever. Think of the amount of data sitting around on floppy disks that is being lost as we speak.
2) Data has to be recorded in a way that presumes whatever software you use to create it will not exist in the future. Anyone who saved their life's work in some ancient binary word processor file will know what I mean. For most computer-based data storage that requires data be stored somewhere in plain text, and using as open a format of 'markup' as possible, if any.
In effect, from a historical/archival point of view, data does not exist unless it is kept in at least two places at all times, and unless whatever bit of software you use to create it can also save it in a non-binary format of some sort for access for future generations who don't have a copy of your software.
Ok, that does not pertain to sound recordings or images, but even then some sort of 'permanent' standard is essential for all data.
I used to work with medieval documents written on vellum - sheep skin. The original Domesday book was written on vellum, and is as readable today as it was in 1150. (It also doesn't need a power supply to work!) Meanwhile the digital 'Domesday' Laser Disk made in the early 80s in the UK had to be saved from oblivion a few years ago (with a great deal of work) because the computers and hardware that it was created to work with were utterly obsolete. Fortunately, and unusually, someone realised the problem before it was too late.
The tragedy of what was done to the Native Americans isn't that Europeans came in and conquered them. It's the way they were treated afterwards. I don't think anyone can read about the Trail of Tears and not feel something. You can't confuse war with murder. There is a difference.
That being said. What's done is done. It should be remembered so we learn from those horrible mistakes. It shouldn't be a constant source of guilt to be used against people that had no part in it. The same goes for slavery, genocide and all the other ignorant suffering we've inflicted on each other.
About 6 years ago our home burned down. It was a complete loss. Once we were able to pick through what remained I came across some jewel cases containing some backup data. These cases were next to some cassette tapes. The jewel cases had warped considerably but many of the CDs inside were still flat and usable. The cassette tapes were warped as well but the tape inside looked like it had shriveled from the heat. Whatever type of plastic the CDs were made from withstood heat that the cassettes could not.
:)
Just tossing this out there. The topic made me remember the pleasure of finding some stuff intact.
The European invasion of North America hardly constitutes a genocide. The sole purpose was not to eradicate a race, but to destroy the fabric of the culture and remove them from the land. I do believe I have friends that have some native american ancestry... The only difference is that it happened in a modern era, and the conquered people were allowed to retain some continuity. People act as if the inhuman treatment that befell the natives was in some way out of the ordinary for human nature. You can not compare the destruction of the Native Americans to Rome conquering Greece. Greece was a well developed empire that fell to another and was absorbed. There was technology and racial similarities that promoted integration. By comparison, the native people of North America had no such technology, literature, and had no relationship with the Europeans. In the beginning people negotiated, but the problem is that negotiations are a farce, and they only matter if neither side has an advantage. In the case of the Native Americans, they never really had a choice, and some of them knew it. They had absolutely no chance against European powers simply because of the lack of technology and cultural cohesion. One thing that people forget is that the idea of a superior people has been around forever and still continues. It is part of the human psyche and almost every major religion in the world. Don't think of it so much as a racial superiority, but rather religious. This is very much what is going on in the middle east and why they can't have peace. The religions of the region believe they are chosen to possess the holy land, and they can't let the sub humans have it. This has happened throughout all of history to every race in the world (even among the same peoples)... just this one was more well documented.
That's the first real mainstream digital format and it has not yet reached the point where we have to move data off of it for fear of it being unreadable.
... say again?
Erm
There are terabytes (quite literally tons) of data sitting around on everything from old 7- and 9-track 1/2" open reel tape, to old 8" and 5-1/4" floppies, and other formats that are basically dead. [I'm not familiar with anything older than that, but I'm sure there are some real greybeards around that could enlighten you as to what came after punchcards but before the vac-column tape drives.] The only saving grace of those formats is that if you can find a reader, there's a chance it might either still work, or could be made to work, if you could find a compatible computer to interface it to (because the machines themselves were built pretty well; they were still viewed as industrial equipment of a sort, rather than consumer electronics). But the expense of doing that would be enormous -- the people who know how to maintain, and increasingly to operate, those things are retiring and becoming hard to find.
And analog formats aren't exactly immune, either. Where I used to work, we had several boxes of old video recordings on 2" quad that we were storing for preservation purposes, but couldn't afford to have transferred to another medium (despite the obvious: that the longer you wait, the more expensive it's going to get if you ever do really want it). That format was used for over 20 years; there's got to be thousands of hours of it sitting around.
Even if you define 'mainstream' to be something that an average person could afford, CDs certainly weren't first; lots of people had PCs with various types of digital storage.
But to only focus on 'mainstream' formats misses the point entirely. Stuff that's been distributed out to millions of people isn't what's at risk of disappearing; it's the original source material (think NASA's Apollo videos), or information that's naturally stored in big 'silos' (think public records) that's really at risk, and those have been stored in a plethora of formats, digital and analog, over the past 50-75 years, which are difficult to work with today.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
The European invasion of North America hardly constitutes a genocide.
That's a pretty fine hair to split between genocide and ethnic cleansing. What is the real difference between successful ethnic cleansing and unsuccessful genocide?
I do believe I have friends that have some native american ancestry...
All this means is that the genocide was not complete.
The Nazis attempted a genocide against the Jews but did not complete the job. If they had started out simply with a mission of ethnic cleansing and achieved the same result would it have been a better thing?
As much as I like the convenience of digital recording (random access especially), I can see where they're coming from. Especially from a consumer electronics standpoint.
Our one, and so far only, experience with our DVD recorder (the TV/Video kind) illustrates why we haven't gotten rid of our VHS tapes yet.
Least steps to record onto a new VHS:
1) pop tape in
2) press record
Least steps to record onto new DVD (-RW in our case):
1) pop DVD in
2) wait 10 seconds before format options come up
3) wait 1 min for format to finish
4) select recording option (quality setting, etc)
5) begin recording
At the end of an hour-long show, I finally hit "stop" on the DVD recorder. In earlier, shorter tests it took about 30 seconds to write out the information for that hour. This time, it failed for some reason.
End result: the whole hour of recording was lost.
All the other nice features that would've come with recording to DVD were flushed right down the drain, for the simple reason the damn thing can't even guarantee that what I recorded would, in the end, actually be available to play back!
I think what the original poster meant to say was, "With digital you either get a perfect copy, or a corrupt copy. With analog you always get a corrupt copy."
Digital content isn't unstable, it's just more sensitive to corruption because in general software expects to be able to extract a perfect copy every time, rather than a near-perfect copy. Whether you can recover partially corrupted digital data depends on several things:
A) Choice of filesystem (journaling, error correction, built-in redundancy)
B) Choice of media (CD/DVD bad unless multiple copies you have, hmm?)
C) Choice of physical storage method and location (store CD/DVD out of sunlight, vertical in jewel case)
D) Choice of archival file formats (PAR2, anyone?)
E) Choice of hardware (some hardware is more robust)
F) Choice of software used to read the media (most software gives up too easily)
The cure:
1. Use the right media (with physical redundancy measures to counter physical damage).
2. Use a robust filesystem (preferably with error correction and redundancy measures also to counter minor physical damage).
3. Use a robust file format specifically designed for archiving data (again with built-in redundancy measures and compartmentalized structure that can work around partial corruption).
4. Use hardware that has a high tolerance for physical or digital media corruption.
5. Use software specifically designed to keep trying to extract data even after encountering partial corruption (like Unstoppable Copier).
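Point 5 can be sketched roughly as follows. This is not how Unstoppable Copier itself is implemented; it is a minimal illustration of the "keep going past bad spots" idea, with a made-up `flaky_read` function standing in for a damaged disc. Unreadable blocks are retried a few times, then zero-filled and logged so the rest of the data still comes off.

```python
def salvage(read_block, block_count, block_size, retries=3):
    """Read every block, zero-filling and recording any that stay unreadable."""
    recovered = bytearray()
    bad_blocks = []
    for index in range(block_count):
        for _ in range(retries):
            try:
                block = read_block(index)
                break
            except OSError:
                continue  # try the same block again
        else:
            # Every retry failed: keep the layout intact with a placeholder.
            block = b"\x00" * block_size
            bad_blocks.append(index)
        recovered += block
    return bytes(recovered), bad_blocks

# Hypothetical damaged medium: four 4-byte blocks, block 2 never reads.
BLOCK = 4
image = b"AAAABBBBCCCCDDDD"

def flaky_read(index):
    if index == 2:
        raise OSError("unreadable sector")
    return image[index * BLOCK:(index + 1) * BLOCK]

data, bad = salvage(flaky_read, 4, BLOCK)
print(data, bad)
```

Combined with a file format that can work around holes, the list of bad blocks tells you exactly what still needs rebuilding from parity or another copy.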
All that being said, if you were to say that digital media, file formats, filesystems, hardware and software are too fragile, I would have to agree. There is far too little fault tolerance and redundancy built into digital storage media, hardware, software, filesystems and file formats. A lot could definitely be improved for the future. But calling most digital content unstable because a CD got scratched is disingenuous at best.