Most Digital Content Not Stable
brunes69 writes "The CBC is running an article profiling the problems with archiving digital data in New Brunswick's provincial archives. Quote from the story: 'I've had audio tape come into the archives, for example, that had been submerged in water in floods and the tape was so swollen it went off the reel, and yet we were able to recover that. We were able to take that off and dry it out and play it back. If a CD had one-tenth of one per cent of the damage on one of those reels, it wouldn't play, period. The whole thing would be corrupted'. Given the difficulties with preserving digital data, is it really the medium we should be using for archival purposes?"
That content cannot be preserved at all. We'll be a civilization without written history, like American Indians.
Isn't that the point of digital? Lossless copies are possible (depending on format obviously). Why have one plastic cylinder that can be lost when you can have it in 5 or 10 locations?
An operating system should be like a light switch... simple, effective, easy to use, and designed for everyone.
Stone tablets. Just drill a hole for a zero and you're away and laughing :)
Now we just need a large enough area to store them.
Help! Help! The termites are eating my DRAM!!!
If you are using CDs, or for that matter DVDs, for archival, you deserve it.
PS: Not flamebait. Anybody who has worked seriously on backups/archiving knows for sure that magnetic is still the way to go.
At least for now.
Let's play it all from memory. Seriously, do we really have a choice? The more densely we pack the information, the more chance it has of corruption. The "CD" mentioned in the article effectively holds 700 minutes of music of the same quality as the 60-minute tape.
Any guest worker system is indistinguishable from indentured servitude.
I like that digital content is fluid and can be easily changed.
The real problem is more that the media is not stable. Optical disks are certainly not a long term archival strategy.
I wonder if there's a good way to convert digital video into black-and-white film (maybe with one frame per color channel), since film has a proven archival track record.
At the enterprise level we use 3.5" 1.44MB floppy drives in an elaborate redundant array. It consists of roughly 70,000 disks, each changed nightly. We haven't had any problems yet. Hopefully the rest of the world will catch up soon.
In a world of acronyms, the words are the real victims.
Ridiculous. It's not the fact that content is digital, it's the fact that the media being used to store the information (CDs etc) is fragile. If these mythical audio tapes had been digital tapes, recovering the signal from them would have been just as easy.
... wasn't *exactly* what you put on. You have the appearance of stability, in that you can retrieve something off a damaged tape, but the truth is something different. That's the beauty of analogue: the same simplicity and fault tolerance of the format also mean the format will naturally degrade over time. The contents may be retrievable, but they've degraded, and as such are not the same contents as when first written. Digital fails, but when it doesn't fail, you have exactly the same content as you did when you started. Archivists will not run from digital; their techniques will improve instead. Or something.
We need to realise that nothing lasts forever.
Then we can figure out the most cost-effective medium to record stuff on, with planned re-archival cycles.
Shouldn't it be possible to take all the media and just crush it? You know, like throw it into a Mega Power 3000 Digital Garbage Collector (TM) and crush it into a diamond or something? Let future generations figure out how to decompress it.
Just because it's harder to recover the data doesn't mean it's impossible.
Of course, anyone using CDs or DVDs for large data backup must have a lot of interns to do the disc swapping.
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
Some analog technologies, like old color films, have also degraded and need image enhancement to recover the original content.
The point is: Like what?
Yes, analog tape is durable. But let's take it and that "CD" and put them in front of a large electromagnet and see how each fares.
SJW: Someone who has run out of real oppression, and has to fake it.
Your audio tape has surely lost some bits during its water accident. Though you may be able to play it, it's as if you had lowered the bitrate of an MP3. Oh, _that said_, even if my (S)VCD is borked in one spot, I can (usually) still play it, with a little video glitch of course, just like your audio tape. The problem is that most CD/DVD drives retry too long on a single bad sector, even at the hardware level.
Have people already forgotten the advantage of digital? If you have an analog tape, every time you make a copy of it, the quality will be degraded. But with digital, you can make a million copies and the final copy will be the byte-for-byte equivalent of the original. So what if CDs only last 10 years before becoming unusable? You can make another copy! So what if this guy wouldn't have been able to recover after physical damage to his media... if it was important, he should have had digital offsite backups! And those backups would have been 100% equivalent to the originals.
Qxe4
...so you can store 600MB on an audio tape now? ...can you recover a backup tape that can store any significant amount of data using the same methods?
If not, what's the point of comparison?
But in fairness, how does the write-once flash memory that Sandisk announced stand in comparison to the audio tape? Or a normal HDD?
If losing 1% of the data on a CD means the data is a total loss, doesn't that say to you that you should be using a file system and data formats with more redundancy and parity?
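As a rough illustration of the parity idea, here is a minimal RAID 5 style sketch in Python: one parity block is the XOR of all the data blocks, so any single lost block can be rebuilt from the survivors. The function names are hypothetical, and it assumes equal-length blocks and at most one lost block; real file systems and RAID layers are considerably more involved.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(blocks: list[bytes]) -> bytes:
    """Single parity block: the XOR of every data block (RAID 5 style)."""
    return reduce(xor_bytes, blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing block: parity XOR all surviving blocks."""
    return reduce(xor_bytes, surviving, parity)


data = [b"ARCH", b"IVES", b"2007"]
parity = make_parity(data)
lost = data.pop(1)  # simulate one unreadable block
print(rebuild_missing(data, parity))  # → b'IVES'
```

With one parity block you survive exactly one loss; surviving more needs more parity, which is what Reed-Solomon style codes provide.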
Of course for the ultimate in durable electronically readable storage you should be burning everything to PROMs.
"I'd rather die on my feet than live forever on my knees!"
Emphasizing the “I” in RAID.
Why bother.
I don't know what it is with /. but it seems this kind of infopocalypse story comes up at least once every 6 months in regards to digital data. I can only think one thing in each case: This is fucking retarded.
As you said, the great thing about digital data is that it can be copied cheaply and perfectly, and spread around. Its resilience isn't in one copy lasting 1000 years, it's in having copies everywhere, so that no event short of nuclear war can eliminate them all, and maybe not even then.
This is also the response to the other big cry-wolf thing, "What happens when the data is in a format that's too old???!!11one" The answer is we just keep copying it to new formats. I have digital copies of papers that I wrote in high school. They were written on an old copy of Works for Windows 3.1 and usually saved to floppy. I don't have a floppy drive any more, but it isn't a problem. I long ago transferred them to a hard drive and I just keep transferring them to new drives when I get them. I also periodically load the old documents into whatever my current word processor is, convert them, and re-save them in a new format.
So the parent is completely correct. Because of digital's ability to be perfectly copied, and especially with the Internet's ability to distribute those copies anywhere in the world, it can have a permanence far above and beyond analogue. The individual copies might be fragile, but get a few thousand, or a million, of them and you'll be hard pressed to get rid of them all.
The tape had analog data on it. Analog, as we all know from years of television and radio, is very forgiving of damage. CDs are digital data. There is error correction, but for normal playback/reading devices there is a limit beyond which they simply give up trying. The data is perfect or it's gone for those machines.
Sad to say, tape dies too.
What is more interesting is the use of compression (and rights management, though if your originals are encrypted you deserve to get screwed - physical security comes first). With analog and simple stream encoding of time domain data (such as audio recordings) much data can be recovered using an external benchmark for the time code. Compress that data and lose your parity and you're totally hosed.
I've never been a proponent of compressed or encoded backups. Sure they save space and add a layer of "security", but that comes at the cost of flexibility should damage occur.
Of course, as has certainly already been mentioned, with digital data you have the luxury of making multiple perfect copies, as well as the ability to run automated checks on that data, mostly without any user interaction.
Otherwise, stone tablets have the best track record so far, though the storage density is a bit on the light (or should I say heavy?) side.
Is it just my observation, or are there way too many stupid people in the world?
In their quest to cram more bits onto the media, they've sacrificed longevity.
We could have very robust backup media, but it wouldn't have the density of a DVD or DLT.
They still use mylar paper tape for sequencing machine tools, it's super low density, but name another media that will survive exposure to metal shavings and coolant.
...the solution is simple. We need a way to take a quantum snapshot of the whole of the Earth at least once every 24 hours and then to send that data out into space as a broadcast in all directions. To retrieve the quantum structure, we'd simply pop out of a wormhole near where the data is passing and retrieve it, then retransmit it back to here and reconstruct the Earth as it was before catastrophe struck. The nice thing about this is that if we can find another M class star like Usolia (our sun), we don't even have to beam the data through the wormhole. We could just intercept it near the star and start the assembly process there. Point-in-time restores for the whole of the planet. Imagine that. You're welcome.
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
With digital data, we have a vastly better chance of protecting data from disasters, and we never use it. The data on a CD is not recoverable if 50% of the disc is unreadable, because there's not that much redundancy. Audio tape is recoverable under extreme circumstances because there's much more redundancy: the storage density is lower, because each individual bit of information is stored in a much bigger (and thus harder to destroy completely) area. You can do the same with digital data. Just spread the information so that it becomes very unlikely that a disaster destroys every place a given bit is stored. Error correction codes can make this much more effective and efficient than keeping plain duplicates or simply lowering the storage density, which are the only two options you have with analog media. We're just so stuck in the belief that digital makes everything smaller that we tend not to "waste" storage space on redundancy. When was the last time you made an off-site backup?
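To put rough numbers on "error correction codes beat plain duplicates": with r full copies you pay r - 1 extra copies' worth of space and survive losing r - 1 of them, while a maximum-distance-separable code such as Reed-Solomon splits the data into k fragments, adds n - k parity fragments, and can rebuild from any k of the n. The figures below are illustrative and assume each fragment sits on separate media that fail independently.

```latex
\begin{aligned}
\text{replication, } r \text{ copies:} &\quad \text{overhead} = r - 1, \quad \text{losses tolerated} = r - 1\\
\text{MDS code } (n, k): &\quad \text{overhead} = \frac{n - k}{k}, \quad \text{losses tolerated} = n - k\\
r = 3: &\quad \text{overhead} = 2\ (200\%), \quad \text{tolerates } 2 \text{ losses}\\
(n, k) = (14, 10): &\quad \text{overhead} = \tfrac{4}{10} = 0.4\ (40\%), \quad \text{tolerates } 4 \text{ losses}
\end{aligned}
```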
A Google search for archiving+digital+files turns up some promising links on how to do digital archives well.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
In the 1980s they digitized the Domesday Book. The trouble was that the format they used is now obsolete. The good news (apart from still having the original) is that they have reinvented the wheel. See http://news.bbc.co.uk/2/hi/technology/2534391.stm for details.
Always where under where.
Stable media can hang around for a long time. Media like film, paper, and stone also have the annoying property that the information can usually be recovered without having to rebuild the machines that read the media. That means almost anyone could view the information!
Look at all the trouble the Constitution and Bill of Rights have caused! If they were stored instead on a medium requiring proprietary devices (now lost to time) to read, the masses would never know exactly what they said. All that time wasted by Congress and the courts could have been saved!
Who needs a cultural history anyway, it's not like controlling the past has any sort of advantage; we've always been at war with Eurasia, that's all we need to know.
If a CD had been submerged in water, it would've been fine. There's no point in making the comparison if it wouldn't have been damaged in the first place. They need to find a better example.
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
Much has already been documented, and guidelines exist to help ensure the short- to medium-term preservation of digital assets; this particular link is for audio-related digital assets, but data is all the same...!
A combination of multiple sets of magneto-optical and tape backups, kept in separate locations with temperature and humidity control, should easily yield 25-30 years of shelf life, by which time we'll hopefully have found better long-term options to transfer them to.
I am transferring most of my 15- to 20-year-old audio DAT tapes digitally with no problems. Good brand-name CD-Rs (like Taiyo Yuden) kept out of the light and at a steady temperature seem fairly resilient so far, but there have been batches that over time developed 'rot' or layer oxidation, which sometimes renders them partially or wholly unusable.
DLT tapes are so far the most trouble-free type of media I have encountered, but with only 10 years of experience to go on, I'm not sure how much that says.
Z.
I currently have two 320GB external USB drives with everything I consider important on them. Once a week I update each copy, and they are both stored in completely different physical locations. Every couple of years or when the technology changes enough, I buy a bigger/newer drive and copy everything over. I intend to do this until the singularity comes and it all becomes moot.
That's funny, neither Google nor Yahoo has ever lost my entire email account due to 'Digital Content Not Stable'.
RAID 6 eh? (not to be confused with raid 6a).
In order to make digital backups that are more durable than their analog counterparts:
1) Make a digital copy
2) Repeat step 1 until your digital copy takes up as much physical space as an analog copy would
3) For no reason, lay out all your digital copies in such a way that the whole of them create an analog copy
4) For fun, pretend what you've done is "holographic storage"
-- 'The' Lord and Master Bitman On High, Master Of All
You don't have to store everything in the native format (like Red Book). You can take advantage of error correction codes and put a lot of redundant information on there, to the point that you'd have to damage a large percentage of the disc to really screw it up.
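A toy example of adding your own redundancy: the classic Hamming(7,4) code stores 3 parity bits alongside every 4 data bits and can correct any single flipped bit per 7-bit codeword. This is far weaker than the Reed-Solomon coding CDs actually use, and it costs 75% overhead; it is only a sketch of the mechanism, and the function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> list[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)  # work on a copy; c[i] holds codeword position i + 1
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means no error, else the bad position
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # one bit flipped by media damage
print(hamming74_decode(word))  # → [1, 0, 1, 1]
```

Two flipped bits in the same codeword will be "corrected" to the wrong value, which is why real media combine stronger codes with interleaving.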
This is also the response to the other big cry-wolf thing, "What happens when the data is in a format that's too old???!!11one" The answer is we just keep copying it to new formats. I have digital copies of papers that I wrote in high school. They were written on an old copy of Works for Windows 3.1 and usually saved to floppy. I don't have a floppy drive any more, but it isn't a problem. I long ago transferred them to a hard drive and I just keep transferring them to new drives when I get them. I also periodically load the old documents into whatever my current word processor is, convert them, and re-save them in a new format.
I think you're missing an important element here. As you move along in time, the volume of data that must be converted to the format du jour only gets bigger and bigger.
For a single person, it's probably not too bad. I, too, have pretty much everything I ever wrote since I first got a computer, and every few years I've committed to rolling the whole thing onto new media. So I've gone from offline backups on floppies, to Zip disks (in retrospect a mistake), to CDs, to DVD-R, and now to DVD+R (the -R discs were crappy and I've since heard that +R is a superior format anyway). This isn't much trouble, because the amount of data I have to backup hasn't really grown that much faster than the data density of available media. I'm probably up to a couple of DVDs for the stuff I really, really care about, maybe a binder if I include all the photos and video.
But what's a basic Saturday-afternoon copy-and-burn job for an individual is a Sisyphean task for a large government agency or library, particularly one that is constantly generating new content. I've seen places that could barely keep up with archiving the stuff they were producing, much less roll their vast archives forward onto new media. So they'd have vaults of hard drives, sitting next to DLT cassettes, next to IBM 3480, next to racks of old half-inch open-reel tapes. Probably back in some dark corner there were piles of punched cards; it really wouldn't surprise me. The problem of data loss due to unreadable formats isn't some abstract 'maybe'; it's already happened in a lot of places (but nobody really wants to talk about it, so it mostly gets buried and whatever's on the tapes gets written off).
The reason why there's so much interest in preservable formats is because while it may not be strictly impossible to constantly roll old backups and archives forward, it's very hard, and requires vast amounts of effort and expense. If you have a backup that's being written into a format that you know is going to be readable for a long time, even if it's more expensive to write initially, you can save a lot of money and time down the road by not having to copy it forward as often.
People may get a little shrill when they're talking about these issues, but they're quite real.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I know I'm off-topic, injecting facts into this debate, but I thought it might be interesting to bring up the VXA tape format. It allegedly survives all kinds of abuse, such as freezing; see Freezing Test.
I have never tried these drives, and would love to hear from someone independent who has.
Chappies in New Brunswick:
From an earlier /. article:
Quick someone tell the author of: 'So You've Lost a $38 Billion File' that everything is alright! New Brunswick had data that was submerged in water, tape so swollen it was off the reel; they still managed to recover it.
And don't come out with that: 'Polar Bear ate the backup tape' excuse again!
I'm going to transform myself into a mighty hawk. Either that or I'll just go and work at Dixons, haven't decided yet.
Just because a bit, or a million bits, of a CD or DVD are unreadable does not necessarily make the entire contents unreadable. CDs broken in half can be taped or even glued back together, and with a little patience most of the data can be recovered. Avoid this situation :-|.
Sometimes I've not been able to recover disks that have been damaged beyond a certain point. But I've never lost a CD because it got wet, or had one become unplayable because it warped. I keep backup tapes in a water-resistant container (or in a bank vault).
And with digital media, as others have noted, I'm not limited to one archival copy.
DRM is a red herring, as encrypting archive copies of sensitive material is a feature of digital media, not a flaw. DRM is only tangentially related to media stability, since any encryption you would use to protect archives would have high fault tolerance and recovery. And if you've done your job and made duplicates of anything you'd get fired for losing, having an encrypted backup damaged is no different from having an unencrypted one damaged. Either way, DRM on backups doesn't matter for recoverability.
sigs, as if you care.
Two words:
Magneto Optical.
'nuff said.
I have not lost my mind... it's backed up on disk somewhere!
Only cuneiform tablets have truly stood the test of time. Even printed paper can't match the five millennia of a solid piece of dried clay.
Given one hour to live, the student replied: "I'd spend it with professor FP who can make an hour seem like a lifetime."
Reed-Solomon is designed to correct error bursts, e.g. scratches. That's why it is ideal for CDs and DVDs.
But it cannot compensate well at all for even moderate amounts of random bit errors. These are exactly the kinds of errors that occur on CD and DVD media over time as they degrade. That is what is being referred to here.
If you have a piece of analogue data, and it degrades, you can still get enough meaning from the original to make it worth archiving. A piece of digital data with even a relatively small amount of random bits transposed could be totally corrupt, especially if it is in a compressed format.
Audio recordings, especially voice recordings, are so full of redundancies that you could lose up to half of the recording, and still have recoverable audio. If you had that redundancy spread over several CDs (something RAIDlike) you could recover your data.
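On the burst-versus-random point: CDs already pair Reed-Solomon with interleaving (the scheme is called CIRC, Cross-Interleaved Reed-Solomon Code), which is a big part of why scratches are handled well. The sketch below is a simplified block interleaver, not CIRC itself: codewords are written as rows and read out column by column, so a burst of consecutive damaged symbols, as long as it is no longer than the interleaving depth, lands as at most one error per codeword. It does nothing for uniformly scattered random errors, which is exactly the weakness described above. All names here are illustrative.

```python
def interleave(codewords: list[list[int]]) -> list[int]:
    """Write codewords as rows, read the grid out column by column."""
    depth, length = len(codewords), len(codewords[0])
    return [codewords[r][c] for c in range(length) for r in range(depth)]


def deinterleave(stream: list[int], depth: int) -> list[list[int]]:
    """Undo interleave(): stream symbol i belongs to codeword i % depth."""
    rows: list[list[int]] = [[] for _ in range(depth)]
    for i, symbol in enumerate(stream):
        rows[i % depth].append(symbol)
    return rows


ERASED = -1  # marker for a symbol destroyed on the medium
codewords = [[10 * r + c for c in range(6)] for r in range(4)]  # 4 codewords, 6 symbols each
stream = interleave(codewords)
for i in range(8, 12):  # a scratch wipes out 4 consecutive symbols
    stream[i] = ERASED
damaged = deinterleave(stream, depth=4)
errors_per_codeword = [row.count(ERASED) for row in damaged]
print(errors_per_codeword)  # → [1, 1, 1, 1]
```

Each codeword now only has to fix one bad symbol instead of one codeword having to fix four.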
When our name is on the back of your car, we're behind you all the way!
It seems to me that one answer is to increase the reliability of the way we store information on digital media so that it is better able to handle corruption and loss.
For instance Reed-Solomon codes or Tornado codes can be used to break data up so that you can use a subset of the pieces to reconstruct the original signal. After chunking things up into small enough pieces that these codes are practical to apply, you can scatter the chunks across the disk or across multiple disks. This general sort of thing almost made it into Blu-Ray, but I guess in the rush to cram DRM down our throats the reliability of information was low on the list of priorities. http://www.truedisc.com/ http://ask.slashdot.org/article.pl?sid=07/03/08/0
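A minimal sketch of the "use a subset of the pieces to reconstruct the original" idea, using the evaluation form of Reed-Solomon as an erasure code: k data symbols define a polynomial of degree below k, extra points on that polynomial serve as parity shares, and any k surviving shares rebuild the data by Lagrange interpolation. Assumptions: arithmetic is done modulo the prime 257 for simplicity (practical codecs work in GF(2^8)), so a parity symbol may equal 256 and won't fit in a byte; n must not exceed 257; and this is not a Tornado code (those are a different, faster family that needs slightly more than k pieces). The function names are hypothetical.

```python
P = 257  # prime modulus; practical Reed-Solomon codecs use GF(2^8) instead


def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Value at x of the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int], n: int) -> list[tuple[int, int]]:
    """Systematic: shares x = 0..k-1 are the data itself, x = k..n-1 are parity."""
    base = list(enumerate(data))
    parity = [(x, lagrange_eval(base, x)) for x in range(len(data), n)]
    return base + parity


def decode(shares: list[tuple[int, int]], k: int) -> list[int]:
    """Rebuild the k data symbols from any k surviving (x, y) shares."""
    if len(shares) < k:
        raise ValueError("not enough surviving shares")
    return [lagrange_eval(shares[:k], x) for x in range(k)]


data = [72, 105, 33, 200]  # k = 4 data symbols
shares = encode(data, n=7)  # 3 parity shares: any 3 shares may be lost
survivors = [shares[1], shares[4], shares[5], shares[6]]  # shares 0, 2, 3 destroyed
print(decode(survivors, k=4))  # → [72, 105, 33, 200]
```

Scattering the seven shares across different discs or drives is what turns this into protection against losing whole media.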
Sanity is a sandbox. I prefer the swings.
Sure, stick a tape in water and it might still retain enough data for recovery. But how about running it past a magnet? Oops, there goes your data, and nowadays a lot of things generate magnetic fields which, while they might not be strong enough to completely fry all your data, do a good job of scrambling magnetic media.
Now I'm not going to suggest that a box of DVDs bought at Walmart in a 50/$25 pack is a good replacement for tapes, but with proper handling and storage, optical media can in fact be quite reliable. There are still some steps that should be followed for any media, however:
a) Checksums. Write 'em if you can.
b) As with (a), verify them every now and then, and also do a full "test restore" once in a while for good measure
c) Transportation: Use care. If you have magnetic media, beware of anything that might give off a magnetic field strong enough to kill your data. If you have optical media, store it carefully and prevent scratching or abrasion
d) Storage: Use the same as above. Keep archives stored in a safe place - preferably OFF SITE, with friendly environmental conditions. An on-site storage facility will not help you if your building burns down. Storing it under the bed by the radiator in the computer tech's house is a good way to kill backups (such things happen more than one might think). A low-light, proper-humidity, sealed, element-safe storage location should be a good part of any backup plan. If it is on-site... well then get one that's fireproof, waterproof, and make sure that the door is actually closed on the thing at the end of the day otherwise all bets are off (off-site is still better).
e) Safety: Make sure your data storage location is secure. If you're using off-site storage, make sure that access to the off-site location is limited. Same thing with on-site (though it may be easier to control this). Remember, those backups have all your data, so while it may be safe from disc/tape rot, fire, and flood... it also has to be safe from the guy who's looking to steal the multi-million-dollar account numbers that have been saved in the backup.
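For steps (a) and (b), a small Python sketch of writing and later checking a checksum manifest with the standard hashlib module. The line format loosely follows GNU sha256sum ("digest, two spaces, path"); the function names and file layout are assumptions for illustration. Keep the manifest outside the archive directory (and ideally keep a copy off site too) so it is never listed among its own contents.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so big archive files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """(a) Write one 'digest  relative/path' line per file under root."""
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}"
             for p in sorted(root.rglob("*")) if p.is_file()]
    manifest.write_text("".join(line + "\n" for line in lines))


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """(b) Return relative paths that are missing or no longer match their digest."""
    bad = []
    for line in manifest.read_text().splitlines():
        digest, rel = line.split("  ", 1)
        path = root / rel
        if not path.is_file() or sha256_of(path) != digest:
            bad.append(rel)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "archive"
    root.mkdir()
    (root / "report.txt").write_text("quarterly numbers")
    manifest = Path(tmp) / "MANIFEST.sha256"  # kept outside the archive directory
    write_manifest(root, manifest)
    print(verify_manifest(root, manifest))  # → []
    (root / "report.txt").write_text("quarterly numbrs")  # simulate silent corruption
    print(verify_manifest(root, manifest))  # → ['report.txt']
```

A mismatch only tells you which copy went bad; you still need the other copies from (d) to restore it.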
Who backs up all their data on CD/DVDs? I don't know of any enterprise who puts their long term backups on CD/DVDs. Everyone still uses tape. It is just in digital format vs. analog format.
And like other posters have pointed out, there are more serious concerns such as DRM and equipment resources.
And again, like other posters have pointed out, you can make perfect digital copies. You cannot do that with analog.
My mom always said, "Jim, you're 1 in a million." Given the current population, there are 7000 of me. God help us all!
We all know our modern plastic society has feeble digital means that won't even last as long as an Egyptian mummy. But... do we want anyone reading this garbage? Marketing reports, Super Bowl commercials and gummy pop music is all we produce. Let's archive it on double-thin DVDs made from recycled trash bags and stored in damp basements.
technical writing / development
Parent post needs another redundant mod and about 4 underrated mods to get (+5, Redundant). Anyone willing to help out?
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)
(source)
http://ascending.wordpress.com/
This is a dual problem:
1) Digital data needs to be moved about once every 5 years onto a new physical store, disk, whatever. Think of the amount of data sitting around on floppy disks that is being lost as we speak.
2) Data has to be recorded in a way that presumes whatever software you use to create it will not exist in the future. Anyone who saved their life's work in some ancient binary word processor file will know what I mean. For most computer-based data, that means storing it somewhere in plain text, using as open a markup format as possible, if any.
In effect, from a historical/archival point of view, data does not exist unless it is kept in at least two places at all times, and unless whatever bit of software you use to create it can also save it in a non-binary format of some sort for access for future generations who don't have a copy of your software.
OK, that does not apply to sound recordings or images, but even then some sort of 'permanent' standard is essential for all data.
I used to work with medieval documents written on vellum (sheepskin). The original Domesday Book was written on vellum, and is as readable today as it was in 1150. (It also doesn't need a power supply to work!) Meanwhile, the digital 'Domesday' laserdisc made in the mid-1980s in the UK had to be saved from oblivion a few years ago (with a great deal of work) because the computers and hardware it was created to work with were utterly obsolete. Fortunately, and unusually, someone realised the problem before it was too late.
The solution to this may be as simple as saying "Don't store data on offline mediums at all".
By offline mediums I mean things like tape, optical disk, or even a solitary hard disk.
Instead, I see the future of reliable data storage as vast networks/clusters of shared storage with built-in redundancy. Look at how Google tackles their data storage needs. They have tens of thousands of highly unreliable machines, but they use them in such a way that they can store a large amount of data in a highly reliable manner.
If all "important" data were kept in such large distributed (geographically as well) or global/universal networks, there would be no need for offline storage. There'd be no need to back data up, because all data would be stored reliably enough anyhow. If you break into a Google datacenter with a shotgun and start shooting hard-drives, you're not going to cause any data loss.
By the same token, any offline backup can be physically destroyed. If you've backed up your data in an offline digital (or analog) form, those objects can be destroyed. With redundant storage on a cluster-level, the loss of any individual object isn't important.
I'm probably not conveying the idea very well, but my point is that if all information is stored in some sort of global storage network with inherent reliability (even if the actual storage is unreliable), then you're better off than offline backups and have the added advantage of all data being accessible all the time.
This must be why we can't find any record or trace of the aliens that visited ancient civilizations (Egyptians, Mayans, etc.) here on earth all those many thousands of years ago.
Write your data down on paper, fools!
The power of digital backup isn't creating one indestructible copy... The power is creating massive numbers of redundant copies. For the cost of one high-quality tape, I could create 50 copies of a CD and mail them to 50 different locations.
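The arithmetic behind this, under the strong assumption that copies fail independently: if each copy is lost with probability q, the chance that all n copies are lost is q^n. Even with a pessimistic coin-flip q = 0.5 for each of 50 mailed discs:

```latex
P(\text{all } n \text{ copies lost}) = q^{n}, \qquad 0.5^{50} = 2^{-50} \approx 8.9 \times 10^{-16}
```

Correlated failures (a bad batch of blanks, or identical poor storage conditions at every site) break the independence assumption and can make the real figure far worse.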
The tragedy of what was done to the Native Americans isn't that Europeans came in and conquered them. It's the way they were treated afterwards. I don't think anyone can read about the Trail of Tears and not feel something. You can't confuse war with murder. There is a difference.
That being said, what's done is done. It should be remembered so we learn from those horrible mistakes. It shouldn't be a constant source of guilt to be used against people that had no part in it. The same goes for slavery, genocide and all the other ignorant suffering we've inflicted on each other.
Now, let's face it: thanks to digital media we've actually made a huge step forward in many respects. In fact, in our times (the possibilities of) digital data are on the same level as the first writing... well... at least as important as the printing press (http://en.wikipedia.org/wiki/Johannes_Gutenberg#Printing_press). There is an enormous amount of data and knowledge available from all sorts of sources that would otherwise perhaps disappear or gather dust in some closed room of a university (http://www.verbumvanum.org/); new forms of collaborative knowledge-gathering are possible, as seen with the Wikipedia project; dissemination (copies) can happen in a lot of places; and people all over the world can access it almost instantly, something not seen before on this scale.
However, it also has some drawbacks:
- It's not intrinsically readable. That is, you cannot, as a human, just understand what it means without the help of additional tools. To a certain degree this is always true, but unlike books, which can be read if you have the book itself and enough light to read by, digital media is in a sense much more delicate. You need a lot of additional tools: electricity, a computer, the right application to run it, etc. And even then you can't be sure: the data may be stored in some former format that has long since become obsolete. (http://en.wikipedia.org/wiki/Pioneer_anomaly#Research_avenues) This has already been a huge problem, especially for organisations that want to archive data and knowledge in the long term. It's not the first time data has been lost because nobody knows how to convert it properly anymore.
- The inherently weak physical carriers of digital data. This has only become worse, since research has by now shown that half of ordinary consumer recordable (blank) CDs/DVDs have lost 80% of their data within 5 years. In this respect, the new media are a disaster for long-term storage, and libraries or organisations that wish to hold on to their data are obliged to constantly upgrade and move their data onto newer hardware. But even then, it's a losing battle.
The only way to stop this is to create a format (an open standard) that will not change (or at least remains backward compatible) and to use storage hardware that endures over time. (In that respect, I remember an optical storage technique discussed in an FA on Slashdot that was based on rubies or diamonds, with lasers 'burning' transparent corridors into them. It had a very high data density and a minimum lifetime of 20,000 years. That would help for long-term storage!)
--- "To pee or not to pee, that is the question." ---
Every new technology for storing data seems to be worse than what was used before. Carving/etching stone or clay tablets is the oldest data storage technology. We can still read them today. Later, people started using biodegradable material as a base for pigment based data. Those are still around, but are mostly in bad shape. Eventually, this material stabilized after a few thousand years. People began writing with natural inks. These are still readable. Graphite and lead pencils came along later. They have a tendency to rub off, but left alone, they are still readable after 100+ years. Today's chemically manufactured inks are of poor quality compared to pencils. They tend to fade after little more than a decade, even if stored in a dark place.
Modern electronic data storage is even worse. The most notable thing about it is that the data requires electricity to read. What happens if there is a huge war that destroys our power generation capabilities? Once the batteries run out, the data is lost. What if the data that is lost is the information telling how to build a power plant or charge a battery?
Usually any mention of stone-based storage is a joke, but someone needs to take it seriously. We need to develop a technology with the longevity of stone tablets. Only then can we be sure that people 2000 years from now will still be able to access our data. We also need to get rid of the DRM. If the data has to check with a server to make sure it is authorized to play, I doubt it will work in 2000 years. The servers can't possibly last that long.
What's better: backing things up on expensive tapes, or on fairly cheap removable USB2/FireWire hard drives? The only difference I see is that it's harder to put an external hard drive in your pocket to take home. Isn't it easier for a normal person to recover data from a hard drive than from a tape?
Ad eundum quo nemo ante iit! (To go where no one has gone before!)
It's vilified because 80 or 90 million natives died in this tale of conquest (out of 120 million or so). It dwarfs the Holocaust by a factor of 10. The civilian AND military casualties (for all countries!) of WWII were 63 million combined.
Print it out in binary, laminate the pages, and lock them bitches in a fireproof safe.
Sure, restores are a bitch, but we're talking mega-backups. The last line of recovery... so, you know, we don't forget about human history, because there is only one copy of that...
Years ago I had 1 GB of MP3s. That disk is long dead, but it was backed up, and when that disk died (very shortly after) I knew I could recover from the backup. Now that 1 GB sits among the 90 GB, a few generations of disks down the line, in a RAID array; when a disk dies, I just buy another one.
What they are in fact saying is that if you don't give a XXXX about your data and leave it there, forgotten (something that is common in media companies, e.g. the BBC, that tape), you might not be able to rediscover it at a later date. The payoff is that you can carry all that old data along, and in essence it becomes cheaper and cheaper to store as time goes on, as disks die and you buy new, bigger disks.
AVENGERS ASSEMBLE! We must stop the High Evolutionary from changing the records!
We have always been at war with Eurasia!
It's the most insightful post I've read on slashdot in ages!
Wow -- well, that's the coolest thing I've seen today. And what's more, it's been going on for close to 10 years now (with only five mentions on Slashdot, including one article back in 2000).
My first question is whether the P2P scheme it uses for replicating and repairing data is centralized (relying on a server somewhere to track all the nodes and make them aware of each other) or decentralized, because a central server seems like a potential single point of failure (SPOF). But even if it is centralized, it's still a great project. It's obnoxious that it has to jump through so many hoops (getting permission from the journal publishers) before it can start archiving journals, since libraries don't need such permission to archive things in non-digital formats, but a hobbled system now is better than a great system never.
I think the same idea could easily be extended and combined with other concepts, like darknets, to make them more robust and survivable; I'm thinking specifically about repositories of information in places where the authorities may be hostile to uncontrolled press. I could imagine a group of people setting up a network of self-replicating servers in the same way that previous resistance organizations might have set up underground newspapers; with storage so cheap, each node could contain all the information (suitably encrypted and obfuscated) in the network to maximize survivability. If you had any idea that you'd been compromised, you'd just hose your node and not worry about actually destroying anything valuable.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Short of running special recovery algorithms, the solution is to make the digital record require a much higher degree of change to affect its meaning than just flipping a 0 to a 1. RAID 5 and above are a good example of this principle in hard-disk applications. Since data on CDs is most likely not going to be rewritten, having multiple copies is the obvious solution.
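The RAID 5 principle mentioned above can be sketched in a few lines: one extra block holds the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. This is a minimal illustration assuming equal-length blocks and at most one lost block; the function names are hypothetical and this is not how any particular RAID controller lays out its stripes.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """RAID 5-style parity block: the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


blocks = [b"ABCD", b"EFGH", b"IJKL"]
parity = make_parity(blocks)
# Pretend the middle block was lost (e.g. an unreadable region of a disc).
recovered = rebuild_missing([blocks[0], blocks[2]], parity)
print(recovered)  # b'EFGH'
```

The cost is one extra block per stripe; a second loss in the same stripe is not recoverable with single parity.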
IMHO, another way to make digital storage as resilient as analog could be to store each bit not as one dot burned on the CD, but as a group of, for example, 11 dots, each having the same value (0 or 1). In case of damage, you'd just take the value of the bit to be the value held by the majority of the recovered dots (at least 6 out of 11).
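That 11-dot idea is a repetition code decoded by majority vote. A minimal sketch, assuming an unreadable dot can be told apart from a misread one (marked `None` here); `encode`/`decode` are hypothetical names. The catch is the cost: it multiplies storage by 11, which is why real discs use far more efficient Reed-Solomon-based coding instead.

```python
from __future__ import annotations

REPEAT = 11  # copies of each bit, following the comment's example


def encode(bits: list[int], repeat: int = REPEAT) -> list[int]:
    """Store every bit `repeat` times in a row (a simple repetition code)."""
    return [b for b in bits for _ in range(repeat)]


def decode(dots: list[int | None], repeat: int = REPEAT) -> list[int]:
    """Majority vote over each group of dots; None marks an unreadable dot."""
    bits = []
    for start in range(0, len(dots), repeat):
        readable = [d for d in dots[start:start + repeat] if d is not None]
        ones = sum(readable)
        # A tie (or a group with nothing readable) falls back to 0 in this sketch.
        bits.append(1 if ones > len(readable) - ones else 0)
    return bits


message = [1, 0, 1]
dots = encode(message)
dots[0:3] = [0, 0, 0]           # three dots in the first group misread
dots[3:6] = [None, None, None]  # three more unreadable (scratched)
print(decode(dots))  # [1, 0, 1]
```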
About 6 years ago our home burned down. It was a complete loss. Once we were able to pick through what remained, I came across some jewel cases containing backup data. These cases were next to some cassette tapes. The jewel cases had warped considerably, but many of the CDs inside were still flat and usable. The cassettes were warped as well, but the tape inside looked like it had shriveled from the heat. Whatever type of plastic the CDs were made from withstood heat that the cassettes could not.
:)
Just tossing this out there. The topic made me remember the pleasure of finding some stuff intact.
Diamond. We can make it much more cheaply now, and if it's scaled up to meet demand, then it'll be cheaper still. Etch your digital data into diamond, and then tell me you've got a problem with the longevity of the medium.
The European invasion of North America hardly constitutes a genocide. The sole purpose was not to eradicate a race, but to destroy the fabric of the culture and remove them from the land. I do believe I have friends that have some native american ancestry... The only difference is that it happened in a modern era, and the conquered people were allowed to retain some continuity. People act as if the inhuman treatment that befell the natives was in some way out of the ordinary for human nature. You can not compare the destruction of the Native Americans to Rome conquering Greece. Greece was a well developed empire that fell to another and was absorbed. There was technology and racial similarities that promoted integration. By comparison, the native people of North America had no such technology or literature, and had no relationship with the Europeans. In the beginning people negotiated, but the problem is that negotiations are a farce, and they only matter if neither side has an advantage. In the case of the Native Americans, they never really had a choice, and some of them knew it. They had absolutely no chance against the European powers simply because of their lack of technology and cultural cohesion. One thing that people forget is that the idea of a superior people has been around forever and still continues. It is part of the human psyche and of almost every major religion in the world. Don't think of it so much as racial superiority, but rather religious superiority. This is very much what is going on in the Middle East and why they can't have peace. The religions of the region believe they are chosen to possess the Holy Land, and they can't let the subhumans have it. This has happened throughout all of history to every race in the world (even among the same peoples)... just this one was better documented.
This is a HUGE problem that seemingly no one cares about.
.pdf, oh-wait, it was on that DVD that .... nevermind.
A generation of pictures, information and general "stuff" becomes unrecoverable, worthless.
I still shoot film for important subjects just for this reason (I'm so smart/broke - huh?)
I believe Hollywood had this problem with "nitrate" film (most of that era's film is now dust); that's how we got "safety" film.
I read something a while back about storage on crystals, I archived the info, a
- maybe it was this.
Holographic Storage
http://physicsweb.org/articles/world/13/7/7
http://www.mobilemag.com/content/100/102/C5313/
Internet Archive
http://www.archive.org/index.php
~hylas
This is pointless nostalgia. To store the same amount of info on tape as on a CD-ROM/DVD, you'd need close to the capacity of a warehouse. A CD/DVD is easily copied; imagine the logistics involved in doing the same operation with your tape reels. Unstable? Sure, digital media suffers from corruption (very, very occasionally), but there exist protocols for encoding the data which make it resistant to corruption, and even if there weren't, simply copying the data and storing it on multiple servers around the world would solve the problem.
prepare the survey weasels.
That's the first really mainstream digital format, and it has not yet reached the point where we have to move data off of it for fear of it becoming unreadable.
... say again?
Erm
There are terabytes (quite literally tons) of data sitting around on everything from old 7- and 9-track 1/2" open reel tape, to old 8" and 5-1/4" floppies, and other formats that are basically dead. [I'm not familiar with anything older than that, but I'm sure there are some real greybeards around that could enlighten you as to what came after punchcards but before the vac-column tape drives.] The only saving grace of those formats is that if you can find a reader, there's a chance it might either still work, or could be made to work, if you could find a compatible computer to interface it to (because the machines themselves were built pretty well; they were still viewed as industrial equipment of a sort, rather than consumer electronics). But the expense of doing that would be enormous -- the people who know how to maintain, and increasingly to operate, those things are retiring and becoming hard to find.
And analog formats aren't exactly immune, either. Where I used to work, we had several boxes of old video recordings on 2" quad that we were storing for preservation purposes, but couldn't afford to have transferred to another medium (despite the obvious: that the longer you wait, the more expensive it's going to get if you ever do really want it). That format was used for over 20 years; there's got to be thousands of hours of it sitting around.
Even if you define 'mainstream' to be something that an average person could afford, CDs certainly weren't first; lots of people had PCs with various types of digital storage.
But to only focus on 'mainstream' formats misses the point entirely. Stuff that's been distributed out to millions of people isn't what's at risk of disappearing; it's the original source material (think NASA's Apollo videos), or information that's naturally stored in big 'silos' (think public records) that's really at risk, and those have been stored in a plethora of formats, digital and analog, over the past 50-75 years, which are difficult to work with today.
You can not compare the destruction of the Native Americans to Rome conquering Greece. Greece was a well developed empire that fell to another and was absorbed. There was technology and racial similarities that promoted integration.
That argument falls down as soon as you realise that the Romans didn't massacre the technically and culturally inferior Celtic tribes either, and the Roman religion at the time wasn't a love-thy-neighbour, turn-the-other-cheek religion, unlike the religion of the first US settlers.
This is BS. You said that the CD has a higher undistorted bandwidth than tape, but then went on to say tape is better because if one encoded something above 22 kHz it would alias on a CD. The point is that you don't encode anything at a higher frequency than 22 kHz, because people cannot hear it (which is probably why tapes were designed to support an undistorted response up to 18 kHz). You seem to suggest that aliasing is a phenomenon that gradually gets worse, saying that at 18 kHz a CD is horribly aliased, but that isn't the case. A signal at 18 kHz will not be aliased when sampled at 44 kHz. Furthermore, I think recordings are often sampled at 88 kHz and later downsampled to 44 kHz so that the anti-aliasing filters can have a very sharp response at 22 kHz, meaning you can accurately reproduce pretty much anything up to the Nyquist frequency. If I digitally sampled my music with a 24-bit A-to-D at 88 kHz, would that make you happier? Or would you still consider the tape to be better? After all, with that I could only encode signals up to 44 kHz, which I'm sure is way too low a frequency for the audiophile's ear.
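The point about 18 kHz is easy to check numerically: a pure tone only aliases if it lies above half the sample rate, and then it folds back into the 0 to fs/2 band. A minimal sketch, assuming ideal sampling with no anti-aliasing filter in front (a real recorder filters that content out first); `apparent_frequency` is a hypothetical helper. Red Book audio CDs actually sample at 44.1 kHz, so the Nyquist frequency is 22.05 kHz.

```python
CD_RATE_HZ = 44_100  # Red Book audio CD sample rate


def apparent_frequency(f_hz: float, fs_hz: float = CD_RATE_HZ) -> float:
    """Frequency at which a pure tone of f_hz appears after sampling at fs_hz.

    Tones below the Nyquist frequency (fs/2) come through unchanged; tones
    above it fold back ("alias") into the 0..fs/2 band.
    """
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


for tone in (18_000, 22_000, 30_000):
    print(tone, "->", apparent_frequency(tone))
```

At 44.1 kHz an 18 kHz or 22 kHz tone comes out unchanged, while a 30 kHz tone would show up at 14.1 kHz; sampled at 88.2 kHz, the same 30 kHz tone is captured as-is.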
The European invasion of North America hardly constitutes a genocide.
That's a pretty fine hair to split between genocide and ethnic cleansing. What is the real difference between successful ethnic cleansing and unsuccessful genocide?
I do believe I have friends that have some native american ancestry...
All this means is that the genocide was not complete.
The Nazis attempted a genocide against the Jews but did not complete the job. If they had started out simply with a mission of ethnic cleansing and achieved the same result would it have been a better thing?
As much as I like the convenience of digital recording (random access especially), I can see where they're coming from. Especially from a consumer electronics standpoint.
Our one, and so far only, experience with our DVD recorder (the TV/Video kind) illustrates why we haven't gotten rid of our VHS tapes yet.
Fewest steps to record onto a new VHS tape:
1) pop tape in
2) press record
Fewest steps to record onto a new DVD (-RW in our case):
1) pop DVD in
2) wait 10 seconds before format options come up
3) wait 1 min for format to finish
4) select recording option (quality setting, etc)
5) begin recording
At the end of an hour-long show, I finally hit "stop" on the DVD recorder. In earlier, shorter tests it took about 30 seconds to write out the information for that hour. This time, it failed for some reason.
End result: the whole hour of recording was lost.
All the other nice features that would've come with recording to DVD were flushed right down the drain, for the simple reason the damn thing can't even guarantee that what I recorded would, in the end, actually be available to play back!
Insightful?? Please ..
As a matter of fact, most of the native American population was killed off by disease, not warfare. So blaming them for not fighting harder seems more than a bit harsh.
Etch your digital data into diamond, and then tell me you've got a problem with the longevity of the medium.
regardless of quotes, diamonds are not forever.
upon the advice of my lawyer, i have no sig at this time
Wasn't the Roman way of conquest more like "control and tax"? If you kill someone, he can't pay you tribute every year.
Note, he said CDs. Red Book audio uses less ECC, and uses pop-smoothing recovery that averages. I always find problems with audio CDs. Then there's Green Book for data/multimedia, which is less reliable too, and I think most software uses Green Book.
However, Yellow Book CD-ROM is the most reliable. Maybe his CD burning software is using Green Book for multimedia, with less ECC. Make sure your software is burning Mode 1, Yellow Book CD-ROM. I remember CDRWIN shows the tracks as yellow or green, Mode 1 or Mode 2.
Unlike CD-Rs with their thin exposed top layer, DVD+Rs are much better, as the data layer is sandwiched on both sides by thick plastic, and they probably use better ECC, but I've not researched DVD ECC.
I'd conclude DVD+Rs are the way to go right now, with a Plextor drive to read through scratches better.
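The Mode 1 vs. Mode 2 point above is easiest to see as a byte budget per sector. The sketch below just adds up the standard CD-ROM sector layouts (Yellow Book / ECMA-130) as I understand them: Mode 1 spends 276 of every 2,352 raw bytes on an extra Reed-Solomon ECC layer, while Mode 2 Form 2 (used for XA video/audio) drops that layer to carry 2,324 user bytes. Both still sit on top of the disc-wide CIRC coding, which is not shown. Whether a given burning program defaults to Mode 2 is the commenter's recollection, not something this settles; `describe` is a hypothetical helper.

```python
from __future__ import annotations

SECTOR_BYTES = 2352  # raw bytes per CD sector

# Byte budgets per sector for the two CD-ROM sector types discussed above.
MODE1 = {"sync": 12, "header": 4, "user_data": 2048, "edc": 4,
         "reserved": 8, "ecc_p": 172, "ecc_q": 104}
MODE2_FORM2 = {"sync": 12, "header": 4, "subheader": 8,
               "user_data": 2324, "edc": 4}  # EDC is optional in Form 2


def describe(name: str, layout: dict[str, int]) -> str:
    """Summarize user data vs. sector-level ECC for one layout."""
    assert sum(layout.values()) == SECTOR_BYTES, f"{name} does not add up"
    ecc = layout.get("ecc_p", 0) + layout.get("ecc_q", 0)
    return f"{name}: {layout['user_data']} user bytes, {ecc} bytes of sector-level ECC"


print(describe("Mode 1", MODE1))
print(describe("Mode 2 Form 2", MODE2_FORM2))
```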
regardless of quotes, diamonds are not forever.
Is there another material we can make that lasts longer and is ridiculously strong? Use that, then.
"If a CD had one-tenth of one per cent of the damage on one of those reels, it wouldn't play, period."
Wrong. What's going on here is that restoration techniques for audio tape are very low-tech and easy, but the author does not have the skill, knowledge, and resources to recover a defective CD. Not that it can't be done. Likely all that would be required is to copy the readable parts back to a hard disk, patch up the file system by hand, and then burn a new disc. People say the same about new cars, that "no one" can fix them like they could back in the 1950s or '60s, when they were much simpler.
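The "copy the readable parts back to a hard disk" step is what tools such as GNU ddrescue do (far more carefully). A minimal sketch of the idea, assuming a reader callback that raises `OSError` on an unreadable region and otherwise returns full blocks; `salvage`, `read_block`, and `flaky_reader` are hypothetical names. Unreadable blocks are zero-filled so every later file keeps its offset, which is what makes the hand-patching of the file system possible afterwards.

```python
from __future__ import annotations

from typing import Callable

BLOCK = 2048  # assumed block size (one CD-ROM Mode 1 sector)


def salvage(read_block: Callable[[int, int], bytes], total_size: int,
            retries: int = 3) -> tuple[bytes, list[int]]:
    """Copy everything readable; zero-fill blocks that keep failing.

    Returns the image plus the offsets of blocks that had to be zero-filled.
    """
    image = bytearray()
    bad_offsets = []
    for offset in range(0, total_size, BLOCK):
        size = min(BLOCK, total_size - offset)
        for _ in range(retries):
            try:
                image += read_block(offset, size)
                break
            except OSError:
                continue  # damaged discs sometimes succeed on a re-read
        else:
            # All retries failed: keep the layout intact with a zero-filled hole.
            image += bytes(size)
            bad_offsets.append(offset)
    return bytes(image), bad_offsets


disc = bytes(range(256)) * 40  # 10,240 bytes of fake disc contents


def flaky_reader(offset: int, size: int) -> bytes:
    if offset == 4096:  # pretend this region is scratched
        raise OSError("unreadable sector")
    return disc[offset:offset + size]


image, bad = salvage(flaky_reader, len(disc))
print(bad)  # [4096]
```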
There is a difference between being "inherently unfixable" and simply not knowing how to fix it. But I agree, the effect is the same.
One thing in digital's favor is that it is cheap and easy to copy, so there are likely to be backup copies. How many people backed up audio tapes or film negatives? These were frequently destroyed by fire, but with digital there is a possibility that a backup copy was kept off site.
So what one should look at is the probability of survival of the data over some span of years, not the probability of survival of one copy of the media.
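That distinction is simple arithmetic if you assume copies fail independently: the data is lost only if every copy is lost, so P(data survives) = 1 - (1 - p)^n for n copies that each survive with probability p. A minimal sketch; the 0.8 per-copy figure is an arbitrary illustration, not a measured lifetime for any medium, and the independence assumption is exactly what breaks when all copies sit in the same house (hence the off-site copy).

```python
def survival_probability(per_copy: float, copies: int) -> float:
    """Chance that at least one of `copies` independent copies survives.

    Assumes every copy survives with the same probability `per_copy` and that
    copies fail independently (i.e. they are not all in the same building).
    """
    return 1 - (1 - per_copy) ** copies


# 0.8 is an arbitrary illustrative figure, not a measured lifetime.
for n in (1, 2, 3, 5):
    print(n, "copies:", round(survival_probability(0.8, n), 4))
```

Even mediocre media gets respectable odds in aggregate: two such copies give 0.96, three give 0.992.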
Everything is analog when you get right down to it. It's the bit-packing that makes it tough.
Storing digital data at many megabytes per area is really just high-frequency information. That probably was not recovered well from the soaked audio tape mentioned either.
Air and temperature are big problems. Put your media in a low-oxygen environment (read: slight vacuum) and keep it cool and dry (but not _too_ dry) and you'll have few problems. Keeping tapes (digital backup or otherwise) or CDs in a shoe box in your damp basement isn't the best way to archive this sensitive media.
Hire a public servant to read the binary content of the digital archives to cassette tapes. "One, zero, zero, one, one, ..., please turn over cassette # 100393836737 and proceed".
Proven to be flood proof.
Diamond is both unstable (it spontaneously decays to graphite) and flammable.
Precisely. The Romans ruled a very pluralistic empire and had no desire to kill off anyone (well, leaving aside Carthage, of course). They didn't even really want your land, either. They wanted political control to a degree, and revenue -- if you did as they asked and gave them a piece of the action, they were happy.
Incidentally, Rome came to the defense of the Greeks several times before simply annexing Greece. They didn't consider themselves conquerors, they considered themselves the protectors of Greece. A euphemism? Perhaps. But it's how they saw themselves, even if it wasn't entirely true.
"Convictions are more dangerous enemies of truth than lies."
Diamond is both unstable (it spontaneously decays to graphite) and flammable.
Really? Wow - I'd never heard either of those things.
I found a reference to the melting point of diamond as 3820 kelvin. I think that should do.
And spontaneous decay to graphite? Under what conditions?
I think what the original poster meant to say was, "With digital you either get a perfect copy, or a corrupt copy. With analog you always get a corrupt copy."
Digital content isn't unstable, it's just more sensitive to corruption because in general software expects to be able to extract a perfect copy every time, rather than a near-perfect copy. Whether you can recover partially corrupted digital data depends on several things:
A) Choice of filesystem (journaling, error correction, built-in redundancy)
B) Choice of media (CD/DVD bad unless multiple copies you have, hmm?)
C) Choice of physical storage method and location (store CD/DVD out of sunlight, vertical in jewel case)
D) Choice of archival file formats (PAR2, anyone?)
E) Choice of hardware (some hardware is more robust)
F) Choice of software used to read the media (most software gives up too easily)
The cure:
1. Use the right media (with physical redundancy measures to counter physical damage).
2. Use a robust filesystem (preferably with error correction and redundancy measures also to counter minor physical damage).
3. Use a robust file format specifically designed for archiving data (again with built-in redundancy measures and compartmentalized structure that can work around partial corruption).
4. Use hardware that has a high tolerance for physical or digital media corruption.
5. Use software specifically designed to keep trying to extract data even after encountering partial corruption (like Unstoppable Copier).
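Points 3 and 5 boil down to compartmentalizing: if each piece of an archive carries its own checksum, a reader can keep going past a bad spot instead of giving up on the whole file. A minimal sketch, assuming a SHA-256 per fixed-size chunk; `pack`/`unpack` are hypothetical names, and this only detects and isolates damage. Actually repairing it needs parity data on top, which is what PAR2 adds.

```python
from __future__ import annotations

import hashlib

CHUNK = 4096  # assumed chunk size in bytes


def pack(data: bytes, chunk: int = CHUNK) -> list[tuple[bytes, bytes]]:
    """Split data into independent chunks, each stored with its own SHA-256."""
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    return [(hashlib.sha256(piece).digest(), piece) for piece in pieces]


def unpack(records: list[tuple[bytes, bytes]]) -> tuple[bytes, list[int]]:
    """Reassemble the data, keeping every chunk whose checksum still matches.

    Damaged chunks are reported by index and zero-filled instead of aborting
    the whole restore, so one bad spot only costs one chunk.
    """
    parts, damaged = [], []
    for index, (digest, piece) in enumerate(records):
        if hashlib.sha256(piece).digest() == digest:
            parts.append(piece)
        else:
            damaged.append(index)
            parts.append(bytes(len(piece)))  # placeholder keeps offsets stable
    return b"".join(parts), damaged


original = b"x" * 10_000
records = pack(original)
digest, piece = records[1]
records[1] = (digest, b"?" + piece[1:])  # simulate one corrupted byte in chunk 1
restored, damaged = unpack(records)
print(damaged)  # [1]
```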
All that being said, if you were to say that digital media, file formats, filesystems, hardware and software are too fragile, I would have to agree. There is far too little fault tolerance and redundancy built into digital storage media, hardware, software, filesystems and file formats. A lot could definitely be improved for the future. But calling most digital content unstable because a CD got scratched is disingenuous at best.
How about aggregated diamond nanorods or if you want something that won't burn in a fire (if you can afford it, take a torch to a diamond and watch the light show), try Borazon.
Lose: misplace or fail || Loose: not bound together
Now you've got something to read your old format for another 8-10 years.
Next upgrade you will deal with the VM, not with your data. So it is constant time for any volume of old documents.
How about scribing everything an inch deep into titanium slabs?
I drank what? -- Socrates
The solution is simple for non-confidential data: if the data were of no interest to anyone, we wouldn't need to archive them. Since they are archived, there must be at least one person on this planet who is interested in them (the archiver), but such uniqueness is rare in humans (we enjoy mimicking each other, including each other's interests), and therefore if one person is interested in something, there must be others who share the same interest. Why should the archiver spend so much effort archiving their data if others are interested in them, too? Let's share the effort among all interested persons by using a peer-to-peer system. With multiple copies of every bit of the archived information spread across thousands of hard disks, the information will be much safer than a bunch of tapes at a library.
Why not store digital data on microfiche at a high resolution, but not so high that it can't be scanned in later? Or why not even on paper as little black dots? From what I understand, you can get multiple megabytes of data on an 8.5x11" sheet of paper. Either of these would be exceptionally stable, and probably exist in some commercial form already.
Microfilm can last 100-500 years. How about this: convert data to UUE/YEnc, print that out, and put that on microfilm. To retrieve it, print it out, have a computer scan that, and use some sort of OCR software to rebuild the file. Would that work?
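The print-and-OCR round trip would work much better if every printed line can be checked on its own, so a misread character only means re-scanning one line. A minimal sketch, with two swapped-in assumptions: base32 instead of UUE/yEnc (yEnc output is mostly raw 8-bit bytes, so it doesn't print well, and base32's single-case A-Z/2-7 alphabet avoids 0/O and 1/l confusions), and a CRC32 per line for error detection. The line format and function names are hypothetical.

```python
from __future__ import annotations

import base64
import zlib

LINE_BYTES = 40  # raw bytes per printed line (assumed)


def to_printable(data: bytes) -> list[str]:
    """Encode data as base32 text lines tagged with line number and CRC32."""
    lines = []
    for n, start in enumerate(range(0, len(data), LINE_BYTES)):
        chunk = data[start:start + LINE_BYTES]
        text = base64.b32encode(chunk).decode("ascii")
        lines.append(f"{n:06d} {text} {zlib.crc32(chunk):08X}")
    return lines


def from_scanned(lines: list[str]) -> tuple[bytes, list[int]]:
    """Decode OCR'd lines; also return the numbers of lines that fail their CRC.

    Flagged lines should be re-scanned or corrected by hand before trusting
    the output.
    """
    out, bad = [], []
    for line in lines:
        number, text, crc = line.split()
        try:
            chunk = base64.b32decode(text)
        except ValueError:  # binascii.Error (bad character/padding) is a ValueError
            chunk = b""
        if zlib.crc32(chunk) != int(crc, 16):
            bad.append(int(number))
        out.append(chunk)
    return b"".join(out), bad


payload = bytes(range(100))
page = to_printable(payload)
# Simulate an OCR misread of the first character on line 1.
number, text, crc = page[1].split()
wrong = "B" if text[0] != "B" else "C"
page[1] = f"{number} {wrong}{text[1:]} {crc}"
data, bad = from_scanned(page)
print(bad)  # [1]
```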
with preserving digital data, is it really the medium we should be using for archival purposes?"
Of course not. Vinyl is still the only way to go, baby. It has proven to be extremely stable.
What?
Only on Slashdot will a question about digital media robustness turn into a discussion about the plight of 19th Century Native Americans. I wanted to read people's views on technology! I guess I came to the wrong place.
Part of the hardcore faithful who believed in Apple long before it was cool again to do so
Diamond reverts to graphite quickly only at high temperatures and low pressures.
From examination of graphite nodules, one can clearly discern facets of the sort that hexagonal carbon (graphite) does not possess; the nodules are clearly the result of cubic carbon (diamond) having changed state quickly, probably as a diamond mass was ejected in a lava stream from deep in the Earth to the surface.
From the size of the graphite nodules, it is clear that the Earth's mantle houses diamond crystals in the 1 meter diameter range. Only the most violent eruptions result in cooling rapid enough to retain the cubic crystal structure. The violent eruption does reduce the particle size somewhat...
See Robert Hazen's book _The Diamond Makers_ for more info.
Criminy, could the moderation on this site get any worse?
Could somebody PLEASE explain to me how the genocide of native americans relates to the problems of preserving digital data? Somebody?
Comment of the year
It's about immediate-term access, as Alaska found out: having a paper copy sitting in storage can be quite a boon when the digital copy is trashed by a typo.
Free Software: Like love, it grows best when given away.
Agreed, but while Reed Solomon is well suited for small scratches on a CD or small losses of streamed data, it is rather inefficient in case of massive losses that can destroy a whole (contiguous) block of the data, as it operates on small blocks, and so its range of effect is limited. Fortunately there exist very robust FEC codes that can protect larger blocks. With Raptor or LDPC codes, one can protect very large blocks of data with redundant codes that yield a high probability of recovery given an overall loss rate.
In the case of critical data storage, I'd advocate in favor of large-block codes such as Raptor or LDPC with a redundancy of at least 100% (meaning that one can recover the whole data if half of it is destroyed). Note that this need not be done at the physical layer; rather, one could FEC-encode files with varying levels of redundancy.
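Raptor and LDPC decoders are too involved for a comment-sized example, so the sketch below swaps in a plain Reed-Solomon-style erasure code (polynomial evaluation over the prime field GF(257)) just to make "100% redundancy" concrete: k data symbols become 2k stored symbols, and any k survivors recover everything, even if the entire original half is gone. Unlike the large-block codes recommended above, this toy costs O(k²) work per symbol and only scales to 257 symbols; parity symbols can also take the value 256, so they need more than one byte each. All names are hypothetical.

```python
from __future__ import annotations

P = 257  # prime field; every byte value 0..255 is a field element


def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int], redundancy: float = 1.0) -> list[tuple[int, int]]:
    """Systematic encoding: data symbols sit at x = 0..k-1, parity at x = k..n-1."""
    k = len(data)
    n = k + round(k * redundancy)
    if n > P:
        raise ValueError("this toy field only supports up to 257 symbols")
    points = list(enumerate(data))
    return points + [(x, _interpolate(points, x)) for x in range(k, n)]


def decode(survivors: list[tuple[int, int]], k: int) -> list[int]:
    """Recover the k data symbols from ANY k surviving (x, y) symbols."""
    if len(survivors) < k:
        raise ValueError("too many symbols lost")
    basis = survivors[:k]
    return [_interpolate(basis, x) for x in range(k)]


data = [72, 101, 108, 108, 111]  # b"Hello"
stored = encode(data)            # 100% redundancy: 10 symbols
survivors = stored[5:]           # lose the whole first half, i.e. all the original data
print(bytes(decode(survivors, k=len(data))))  # b'Hello'
```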
Note that data recovery on damaged physical storage is no different than on unreliable transmission channels such as Wifi, 3G or IP multicast. For information, Raptor codes have been chosen for the DVB-H standard as the preferred FEC scheme for IP datacasting (based on the FLUTE protocol), and LDPC is used in DVB-S2.
For those interested in LDPC, there is an excellent LGPL'd library by the French INRIA here (info) and here (download), and best of all it's a patent-free technology. As for Raptor, it is unfortunately proprietary and patented by Digital Fountain, but you can find a lot of enlightening info on their Web site (worth a read if you're interested in FEC technologies).
In the first place, CD-R isn't suitable for archiving purposes, the organic lacquer on the back which covers the dye layer is subject to all sorts of environmental attack. DVD+R (preferred to DVD-R) has layers of plastic on top and bottom. And Netflix's business model is based on the relative invulnerability of DVDs to environmental attack. I'd expect a decent quality burned DVD to handle a submerging just fine. I am far more concerned with dye breakdown over time.
In the second place, I don't think this guy understands CD/DVD formatting. While it might take special software to get a severely damaged disc to load, any track on which the bits storing the digitized data haven't been physically damaged should be just fine. Compressed file volumes might be a problem; I use dar for archiving, in which each file is sufficiently separated that if the data of an adjacent file is FUBAR, the non-corrupted files should be just fine.
Tech Public Policy stuff
Your data is probably in a lot more danger from physical damage than from breakdown of the dyes on your DVDs.
If your house gets destroyed through a disaster, it won't matter how stable your copies are.
I keep my backup copies on the other side of the continent. . . any disaster that takes out both sets of copies will probably be massive enough that it probably took me out with it.
I've had two occasions where I've needed to recover a tape backup. The first time, a software glitch produced a premature EOF... customer service told me that this was a known problem with the backup software and since the company didn't bother licensing the bug-fixed version, I should go to the software vendor site and download a time-limited demo version. The second time, it just plain failed... and I was several thousand miles from the originals.
Just then, DVD recorders finally dropped to an affordable price, so I switched and never looked back.
Why is a CD/DVD so vulnerable?
Is this a built-in limitation of the filesystem used?
Is there a filesystem which writes redundant data and checksums to CD/DVD?
If I write a file of one tenth the media capacity, the room is there for multiple copies of the file, as well as checksum info to ensure that a "known good" copy is read back, even if the media is damaged quite severely.
If this was done at the FS level, what would the penalties be?
Further, what if this idea was combined with compression?
If the media capacity is known, an image could be constructed to provide as many extra copies as could fit.
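The image-building idea can be sketched directly: prefix the payload with its SHA-256, repeat that record as many times as fit, and on read-back return the first copy whose checksum still matches. A minimal sketch under stated assumptions; `build_image`/`read_back` are hypothetical and not a real filesystem feature. The penalties are visible in the code: each extra copy costs a full payload of space and write time, checksums only detect damage rather than repair it, and contiguous copies may all sit under the same scratch, so a real layout would spread them across the disc.

```python
import hashlib

HEADER = 32  # SHA-256 digest stored in front of every copy


def build_image(payload: bytes, capacity: int) -> bytes:
    """Fill a medium of `capacity` bytes with as many checksummed copies as fit."""
    record = hashlib.sha256(payload).digest() + payload
    copies = capacity // len(record)
    if copies == 0:
        raise ValueError("payload does not fit on the medium")
    return record * copies


def read_back(image: bytes, payload_size: int) -> bytes:
    """Return the first copy whose stored checksum still matches its contents."""
    record_size = HEADER + payload_size
    for start in range(0, len(image) - record_size + 1, record_size):
        digest = image[start:start + HEADER]
        body = image[start + HEADER:start + record_size]
        if hashlib.sha256(body).digest() == digest:
            return body
    raise OSError("every copy is damaged")


payload = b"family photos" * 100  # 1,300 bytes
image = bytearray(build_image(payload, capacity=13_000))
image[0:3000] = bytes(3000)  # wipe out the first few copies
print(read_back(bytes(image), len(payload)) == payload)  # True
```

Combining it with compression works the same way (compress first, then replicate), with the caveat that a single bad byte ruins a whole compressed copy, which is exactly what the per-copy checksum catches.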
Lack of oxygen makes you die! They needed a study to find this out? Why are so many research articles about things that are patently obvious to even the least trained in the applicable field? That's it, I quit. I'm going into the study business. My next research is on how the sun is causing plants to grow. This is breakthrough research, folks!
It just leaves the question. How many gigabytes of data can you fit on a 500 sheet ream of paper?
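A rough answer to the ream question, as a back-of-the-envelope sketch: assume US letter sheets and exactly one bit per printed dot at the printer's resolution. That is an optimistic upper bound (toner spread, scanner alignment, margins, and error correction all cut it down a lot in practice), and `usable_fraction` is a hypothetical knob for that overhead. At 600 dpi this works out to about 4.2 MB per sheet, consistent with the "multiple megabytes" figure mentioned earlier as a ceiling.

```python
LETTER_SHEET_IN = (8.5, 11.0)  # US letter paper, inches


def bytes_per_ream(dpi: int, usable_fraction: float = 1.0, sheets: int = 500) -> int:
    """Raw capacity of a ream if every printed dot carries exactly one bit.

    usable_fraction is a hypothetical knob for margins, alignment marks and
    error-correction overhead; 1.0 gives an optimistic upper bound.
    """
    width, height = LETTER_SHEET_IN
    dots_per_sheet = width * height * dpi * dpi
    return int(dots_per_sheet * usable_fraction / 8 * sheets)


for dpi in (300, 600):
    print(f"{dpi} dpi: {bytes_per_ream(dpi) / 1e9:.2f} GB per ream (raw upper bound)")
```

So even the raw ceiling is roughly 0.5 GB per ream at 300 dpi and about 2.1 GB at 600 dpi.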
If you want to store data permanently, this is just the ticket...
http://www.norsam.com/hdrosetta.htm
This is asking the wrong question. "Digital" is not a medium. A CD is a medium, as is a hard disk, a tape, a memory card, etc. The important thing about digital data -- the way in which it is least like analog data -- is precisely that it can be transferred from medium to medium without degradation. Thus it doesn't matter that your hard drive will fail within ten years, because by then the data on it will be replicated in lots of places.
These days, it makes more sense to think of digital data's medium as "The Cloud". The Cloud is all those servers over at Flickr and Google and YouTube and Yahoo and Archive.org and wherever else your bits go, plus your hard drive, your USB memory stick, your camera's flash card, and a zillion other locations. Once data enters The Cloud, it never leaves. (Yes, this is an idealization, but it is becoming more true every day; it is pretty clearly where we're headed.) It doesn't matter if one individual component of The Cloud goes down; think of it like RAID-Infinity storage. The Cloud may not be more than the sum of its parts, but it has a *lot* of parts.
http://www.red-bean.com/kfogel