Copyrights and CD-Rs Endanger Audio History
SEWilco writes "A study by the Library of Congress has found that many audio recordings are being lost due to copyright restrictions and temporary media. Old audio recordings are protected by various US state copyrights, so it's hard for preservationists to get and copy material. Recent data is threatened by being put on writable CDs, because CD-Rs begin to lose data after a few years, so recordings from as recently as 9/11 and the 2008 elections are already at risk."
Actually, it's 11 months in the future, September, 2011.
The Library of Congress used to have a goal of including complete hard copies, at least for items of US origin and 'good grade' (that is, they aimed to have copies of things that were intended to last, such as hardback books, rather than ephemera such as the pulp magazines). However, that goal has become an obvious impossibility due to sheer volume. After about 1960, the library began being more selective.
That's bad enough in some senses, but unfortunately, there's also a secondary effect. Pick a subject you know well, go to the library, and examine the LOC page at the front of the book for a few dozen volumes of varying ages. That information will tell you if the book has been archived in the LOC, but it will also include other details, such as what topics it is indexed under. For example, a biography of Supreme Court Justice Thurgood Marshall might be indexed more specifically under 'Biographies of Prominent Americans' and not just 'Biography', and it might also be indexed under 'Non-fiction', 'Legal Commentary', and '20th Century History'. Many of these index terms were developed as a standard system, but that system seems to have more and more glitches with time. In general, you'll see more and more errors, both of accuracy and of simple omission, in the newer books. I don't know if there's any real explanation of why the indexing seems to have become worse after the LOC gave up trying to have physical copies of all significant works, but many people think they have noticed a certain 'sloppiness'.
For works such as audio or video recordings, it could be very hard to get any useful information if the same pattern holds. Imagine, for example, researching video when 30% of all the westerns aren't indexed as westerns, while some documentary footage about life in the old west has been misclassified as 'fiction' and 'western'. Then add that there was once a rule that anything shorter than 8 commercial reels was considered a short, but somebody forgot that rule around 1976 and started treating anything under 30 minutes of running time as a short. Whatever the subject, problems such as these are likely to crop up.
Who is John Cabal?
Many of these index terms were developed as a standard system, but that system seems to have more and more glitches with time
It's called ontology drift. It was a big problem for the Cyc project. They started entering all human knowledge, and after 20 years found that they were entering the same stuff again because the index terms had changed over time. A large amount of semantic web and AI research is devoted to combatting this problem.
I am TheRaven on Soylent News
We will be a mystery to archaeologists of the future.
No we won't, and I'm tired of hearing this trite assertion repeated as a truism. This is one of those things that has become a meme because it sounds plausible, but under analysis it's flawed because it (a) disregards the massive proliferation of digital data and (b) misapplies digital fragility.
To start off with, most artifacts and information from previous cultures have likely perished too. On top of this we're producing a staggering amount of information, or at least data, in general compared to previous generations.
It's true that any given piece of data stored on a given digital medium is arguably at higher risk of being lost. But this disregards the fact that there may easily be multiple copies of that information stored elsewhere.
However, the primary flaw is that it focuses on the fragility of any *specific* piece of digital information, e.g. that photo of your dog in a funny hat you have stored on a mouldering old CD-R is at serious risk of being lost forever. While that's true, it doesn't apply to this situation, because our future archaeologists or historians probably won't require specific pieces of information to have a decent idea of our culture; they'll merely require an adequately large arbitrary selection of such data to get a decent picture of who we were.
And because there's so much data out there, we could probably lose 99.999% of the stuff at random and it'd still probably be far easier to reconstruct our culture than those that have gone before.
So yeah, if one is worried about a particular hilarious photo of their dog, or any given film, or whatever... digital fragility is an issue. But using it to assert that our culture is going to become a digital "black hole" to future generations is fundamentally flawed.
We will not disappear from history, at least not for those reasons.
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Short of carved writing on stone tablets (eg, the Behistun monument), the longest-lasting medium I can think of is printed paper. Libraries know how to archive it: it's called a book.
There are ways to take digital files and convert them to bitmaps (eg www.ollydbg.de/paperbak). You can print the bitmaps, and read them back reliably with a scanner. About 500K can fit on one page of paper, so a one-hour MP3 recording (about 60MB) would take up about 120 pages, or 60 sheets printed on both sides. If printed on acid-free stock, this should last for centuries. The pages could be bound in a book, whose introduction would describe the encoding, and provide an algorithm to extract the data.
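For anyone who wants to check the page count for their own files, here is a rough back-of-the-envelope sketch. It assumes the round figures quoted above (about 500,000 bytes per printed page, about 60,000,000 bytes for an hour of MP3) and takes "K" and "MB" as decimal units; the function names are made up for illustration and have nothing to do with the PaperBack program itself.

```python
def pages_needed(file_bytes: int, bytes_per_page: int = 500_000) -> int:
    """Whole pages needed to hold file_bytes, rounding any partial page up."""
    if file_bytes < 0:
        raise ValueError("file_bytes must not be negative")
    if bytes_per_page <= 0:
        raise ValueError("bytes_per_page must be positive")
    # Integer ceiling division, so large files are not subject to float rounding.
    return -(-file_bytes // bytes_per_page)


def sheets_needed(pages: int, double_sided: bool = False) -> int:
    """Physical sheets of paper: half as many (rounded up) when printing both sides."""
    return -(-pages // 2) if double_sided else pages


# Assumed size of a one-hour MP3, using the round 60MB figure from the post.
one_hour_mp3_bytes = 60_000_000

pages = pages_needed(one_hour_mp3_bytes)
print("pages:", pages)
print("sheets, single-sided:", sheets_needed(pages))
print("sheets, double-sided:", sheets_needed(pages, double_sided=True))
```

Real capacity per page will depend on printer and scanner resolution and on how much error-correction redundancy you ask for, so treat bytes_per_page as a knob to adjust rather than a fixed number.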
Why rely on currently-fashionable media like the chemical dyes in a CD-R when good old reliable natural-fiber materials like paper are known to last centuries?
Alejo Hausner