Afterlife Will Be Costly For Digital Films
Andy Updegrove writes "For a few years now we've been reading about the urgency of adopting open document formats to preserve written records. Now, a 74-page report from the Academy of Motion Picture Arts and Sciences warns that digital films are as vulnerable to loss as digitized documents, but vastly more expensive to preserve — as much as $208,569 per year. The reasons are the same for video as for documents: magnetic media degrade quickly, and formats continue to be created and abandoned. If this sounds familiar and worrisome, it should. We are rushing pell-mell into a future where we only focus on the exciting benefits of new technologies without considering the qualities of older technologies that are equally important — such as ease of preservation — that may be lost or fatally compromised when we migrate to a new whiz-bang technology." Here's a registration-free link for the NYTimes article cited in Andy's post.
Because ultimately digital storage is just a bunch of brittle plastic (DVD) and non-permanent ferrous spots on metallic plates. Really it's all the same thing; it's just that now you also have to contend with faster obsolescence of your medium, thanks to technology's lack of long-term memory and its lack of respect for yesterday's history.
---- Booth was a patriot ----
If they want to permanently archive digital media, why not just keep the DVD glass masters around? They shouldn't degrade like plastic, and if carefully packaged it seems that they could last for millennia. If a special reader were developed that could optically scan the glass surface without the need for a rot-prone metal layer, then the information could be retrieved without having to risk damaging the master by making a new pressing.
Analog also decays. The difference is that it is easier to pull SOMETHING out of it as it decays. The downfall of analog is that it is MUCH more expensive to protect.
Back in 90/91, I worked for a company that burned CDs and Laserdiscs (compressed data for the DOD). The CDs cost something like 5 or 10 dollars each, and the Laserdiscs were a couple of hundred each. IIRC, these were based on gold, and would last something like 50 or 100 years without losing a single pixel. I would guess that Hollywood could easily afford these.
I prefer the "u" in honour as it seems to be missing these days.
How about printing a few copies of a binary bar-code record in big books of archival quality paper for terms of a few centuries? Or how about blowing the bit pattern into any other format with some longevity on some nice passive substrate like a non-flowing glass if you'd like to keep them for a few millennia? Two hundred plus grand a year per film to maintain, my aching ass. Give me two million bucks - the supposed cost to archive just ten films - and I *guarantee or your money back* that I can design (and build a prototype of) an archive system that will reliably maintain digital films such that they can be recovered many centuries from now with no more "yearly archival cost per film" than a roof over its digital head. Error correction and all. All this story demonstrates is that someone isn't taking proper advantage of the technical community.
I've fallen off your lawn, and I can't get up.
The problem with this type of storage and distribution is that it strongly favors only what is popular. This is exactly what happens with BitTorrent sites like isoHunt or The Pirate Bay, and with Usenet as well. You'll have no problems finding the latest and greatest blockbuster in HD (until the excitement wanes, of course), but try finding some obscure independent film, or a foreign film, and you'll be lucky to get a low quality version.
Technicolor dye transfer (imbibition) prints were much less fugitive. Color separations onto black and white film stock (often termed YCM for yellow, cyan, magenta) are much more robust. Production of these separations (and imbibition relief "matrix" films) was intrinsic to the Technicolor printing process (even if the film was shot in conventional tripack negative, then transferred to Technicolor for printing), and films where these intermediates were saved (or where someone presciently thought to have a set of YCMs made) are much safer for the future than anything kept only on color stock.
In the 70s there were some photo places (especially in Los Angeles) that marketed Eastman Color Negative 5247 movie film (short-end remnants from the movie industry) as a cheaper alternative for 35mm color negative still photography, and printed this onto 5283 color print film (same as movie prints) for 35mm slides.
I recently found a few boxes of these that I had shot back then (and stored under entirely careless, or Arrhenius/Murphy if you prefer, conditions). I am not good at evaluating color negatives by eye, but the positives were faded either to mutated colors or to almost nothing.
Even simple technologies can have amazingly short shelf lives under conditions of disuse. I recently turned on my stereo system after close to 3 years of not being used. The amplifier, CD player, and LP turntable all failed to operate. Part of this might have been due to de-formed electrolytic capacitors; these appear to have more-or-less repaired themselves after a couple of hours with the power turned on. Both the CD player and the turntable suffered additional electromechanical problems that required a combination of manual exercise and cleaning to rectify.
None of these devices have anywhere near the scary sophistication of a modern hard disk drive.
Seeing as I cannot remember what I last set my external firewall password to, imagine the additional challenge of future Hollywood being bitten deeply in the butt by present Hollywood's favored time-bombed destined-to-be-lost-art proprietary DRM technologies, with the keys long since dissipated in Hollywood's perennial miasma of mergers, acquisitions, lawsuits, cocaine, and personal vendettas.
The DPX format commonly used for digital post production uses about 35 megabytes *per frame*.
My calculator says a 2 hour movie at 24 frames/sec will have about 175,000 frames.
A few more button presses tell me that's a bit north of 6 terabytes of data.
Let's quadruple that to include all the cut scenes and unused footage, to 25 terabytes.
1 TB drives are available now for $400 or so each. They use under 10 watts idle.
Building a 30 drive RAID would thus cost $12,000, and require perhaps 500 watts if run constantly, including cooling. Let's bump that to $15,000 to pay for controllers and chassis.
Three such arrays (in case of earthquakes, etc... keep 'em at opposite ends of the continent) would cost an initial $45,000, take up perhaps 7u of rack space, and need 50 kWh per day for all three. At 30 cents per kWh, that's 15 bucks a day, or $5500 per year. Let's double that, assuming those 7u cost you $5500 a year.
So... my numbers, triply redundant, come to an initial investment of $60,000 (profit, hey!), and a yearly cost of $20,000 (more profit!).
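For anyone who wants to redo the arithmetic above, here is a rough sketch in Python. The inputs are the figures quoted in the comment; the variable names are mine, and decimal units (1 TB = 1,000,000 MB) are an assumption. Note that the raw numbers come out somewhat lower than the rounded-up ones in the comment (172,800 frames rather than 175,000, and 36 kWh per day rather than 50), so the estimate above already carries some padding.

```python
# Back-of-the-envelope check of the storage and power figures quoted above.
# All inputs come from the comment; decimal units are an assumption.

MB_PER_FRAME = 35        # quoted DPX frame size
FPS = 24                 # frames per second
HOURS = 2                # running time of the film

frames = HOURS * 3600 * FPS                     # exact frame count
film_tb = frames * MB_PER_FRAME / 1_000_000     # finished film, in TB
archive_tb = 4 * film_tb                        # quadrupled for unused footage

DRIVES_PER_ARRAY = 30
DRIVE_COST = 400         # dollars per drive
EXTRAS_PER_ARRAY = 3000  # controllers and chassis, per the comment
ARRAYS = 3               # triple redundancy
WATTS_PER_ARRAY = 500    # running constantly, including cooling
PRICE_PER_KWH = 0.30     # dollars

hardware = ARRAYS * (DRIVES_PER_ARRAY * DRIVE_COST + EXTRAS_PER_ARRAY)
kwh_per_day = ARRAYS * WATTS_PER_ARRAY * 24 / 1000
power_per_year = kwh_per_day * PRICE_PER_KWH * 365

print(f"frames:          {frames:,}")
print(f"film size:       {film_tb:.2f} TB")
print(f"with extras:     {archive_tb:.1f} TB")
print(f"hardware:        ${hardware:,}")
print(f"power per day:   {kwh_per_day:.0f} kWh")
print(f"power per year:  ${power_per_year:,.0f}")
```

Rack space, staffing, and drive replacement are not modeled here, so treat this as a floor rather than a quote.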
How the hell they came up with $208k is beyond me. I'm thinking I should start a company that does this for the studios, it's looking quite lucrative.
Apart from the idea that you would not use tapes, I am in complete agreement. I would add that they are stuck in a 1985 mindset where the internet does not exist.
It is a pretty simple problem to solve. You set up a smallish data centre on each of three continents. You install some LTO-4 tape libraries and start replicating the data to each over the internet. With LTO-4 you are looking at ~600TB per 19" rack, and when you are not accessing the data (most of the time) you are not consuming power. Add in some checksumming and patrol checking of the tapes and problem sorted. In 5 to 10 years' time you migrate to some new tape tech. That involves sticking some more frames in, hooking them up and telling the software to copy the data to the new tapes.
Remember as well this is a high assurance system not a high availability system, so some of the expense of a datacentre can be saved. No need for that diesel generator for example because it does not really matter if you cannot access the data today because of a power cut. What matters is that it is preserved and when the power returns you can access it.
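The "checksumming and patrol checking" part could look roughly like the sketch below. It is only an illustration under my own assumptions: the archive is treated as an ordinary directory tree, SHA-256 is my choice of hash, and `build_manifest` and `patrol_check` are made-up names, not functions of any tape library software.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte reels never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root at ingest time."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def patrol_check(root, manifest):
    """Re-read every file and list those that are missing or have changed."""
    root = Path(root)
    damaged = []
    for rel_path, expected in manifest.items():
        candidate = root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(rel_path)
    return damaged
```

The idea would be to run `build_manifest` once when the film is ingested, keep the manifest alongside each copy, and run `patrol_check` on a schedule at every site. Anything it reports gets restored from one of the other two sites.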
I'm not convinced we need to keep 90+% of youtube or Friends and similar crap for people to watch 100 years from now.
Engineering is the art of compromise.
Your second sentence deserves a +5 Informative; it's the short answer.
;)
[rant ON]
Also, grandparent isn't seeing the big picture, but I'll assume it was a genuine question, as most people Just Don't Need To Know this stuff. How much does a piece of paper cost? Barring external damage and extremes of pH (like newsprint), that piece of paper and the information stored on it (like say, oh, a Constitution) is good to go for a few hundred years, maybe shy of a thousand if it was hand made without chemicals at all.
I need three layers of technology just to spin up and read data from a 4 gigabyte IDE platter drive I bought 8 years ago, and that's just to access 8 year old pr0n!
Back to the topic, seeing as games like Doom3 involved terabytes of data for development, a digital motion picture like Star Wars with 4-5 hours of raw footage and god knows how many terabytes of ILM effects... well I can't really count that high, but RTFA for an idea of the level of complexity "born digital" masters involve. Do you really think they're going to "throw away the source code" and just keep the neat and tidy digital master? How will he make Han shoot first again, huh? Costs triple when you have to deconstruct it first!
Okay, here's an example of how it was in 2000. Now, extrapolate data size and storage size and content creation for eight years... hmm, do you think there's a ceiling? What about the next eight years? Is that an exponential curve jumping off my page?
Photography was invented 150 years ago, and we still have the first physical photographs ever taken. They may be boring, but I will personally kick the ass of anyone that says they're not important.
[rant OFF]
Bottom line is, technology and content are changing and growing so fast, we no longer have TIME to decide or realize what of it is actually important, or the BUDGET to actually save it all indefinitely.
No. The trick here is only half archival; the other half - and it's not complex, just apparently not obvious - is that it should take any half-competent tech no more than a day or so to rig up a reader using discrete components of current technology, the task having intentionally been made simple. An optical diode, resistors, a transistor, maybe a lens system and an XY table. Not "drives" and metaconstructs like them. This way, the components can be emulated if required (doubtful, but possible) by higher technology. The format needs to be blind-dumb-simple, as does the error correction; row-column EC will allow recovery of single lost datums and is trivial to implement. If it is easy to do today, it will be easy to do tomorrow. Once that is done, you can construct as sophisticated a reader as you like, all the while knowing that if worst comes to worst, some half-smart high schooler can recover the data given enough time and $100 in parts.
You misunderstood my guarantee, too; I was guaranteeing that I could get the job done and archive, and recover, a movie in this fashion, making a maintenance-free storage method that did not suffer from unrecoverability. I was not guaranteeing the data; they have to provide physical security for it, and I have no control over that, so I couldn't possibly make any promises in that area. I *could* sell them some land in Montana; I just bought two city lots and the 5000 sq ft building on them for 25 grand. Taxes are low, too. ;-) There's plenty more where that came from - hundreds and hundreds of square miles. Thousands, even. Storage space isn't a problem unless they insist it be in LA, which - of course - would be stupid. It should be in a geologically stable area with a high speed pipe and reliable power, that's all.
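To show how little machinery row-column EC needs, here is a minimal sketch in Python. It assumes the simplest variant, one even-parity bit per row and per column of a block of bits, and it assumes the parity bits themselves are read back intact. The function names are made up for the illustration.

```python
def add_parity(rows):
    """Compute one even-parity bit per row and per column of a bit block.

    rows is a list of equal-length lists of 0/1 values.
    """
    row_parity = [sum(row) % 2 for row in rows]
    col_parity = [sum(col) % 2 for col in zip(*rows)]
    return row_parity, col_parity


def correct_single_error(rows, row_parity, col_parity):
    """Locate and flip one damaged data bit, in place.

    Returns the (row, column) that was repaired, or None if the block
    already matches its parity. Raises ValueError for any other pattern,
    since simple row-column parity cannot repair those.
    """
    bad_rows = [i for i, row in enumerate(rows)
                if sum(row) % 2 != row_parity[i]]
    bad_cols = [j for j, col in enumerate(zip(*rows))
                if sum(col) % 2 != col_parity[j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        rows[i][j] ^= 1          # the bad bit sits where the two failures cross
        return (i, j)
    raise ValueError("damage is not a single data-bit error")
```

A single flipped bit breaks exactly one row check and one column check, and the intersection is the culprit. That is the whole algorithm, which is the point: it can be reimplemented from a one-paragraph description.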
I've fallen off your lawn, and I can't get up.
How long until the market stops demanding more from its computers? I know people are just going to say that I'm being short sighted, but I think that in about 10-20 years, the computer will be fast enough that there won't be any demand from most people for it to be any faster. Sure, there will still be industrial uses for ever increasing speed and storage sizes, but as far as the home computer goes, I think it is coming close to hitting a plateau. Once you can edit HD video without the computer taking a hit, and have enough storage space for that, I can't imagine most people could find much else that would consume more resources. People aren't going to be running physics simulations to find the origins of the galaxy in their basement.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Like Barbara Cartland? Or Penny Dreadfuls? Or the RFC Archive? Or YouTube?
Huge amounts of fundamental culture simply disappear because they are so transparent or ordinary to those they affect. The next generation comes along and forgets about it because of that apparent mediocrity. For example, breast feeding was normal, ordinary, and public in America up through the 1950s. Movie and later television rule-makers didn't allow showing it unless it was part of some National Geographic type presentation. Today, breast feeding is being re-discovered in a storm of controversy because an entire generation has not only forgotten, but confused the topic with beer commercials.
Then again, how many people want to remember Philippine Midget Snuff films? And why?
Pacifist paratroopers yell, "Gandhi!" when they jump.
I'd imagine the big G would fall over themselves to do it. And it would cost the movie industry zilch.
Flash memory is not really any more reliable than a hard drive for long term storage. At least if you're talking about the cheap high capacity stuff that you would need to store a Tbyte or two of raw data.
A stack of archival CD-R or DVD-R, or actually pressing a master, would let you hold the digital data for a few hundred years quite reliably. Just put a FORMAT.TXT on there to describe the encoding format(s) you used, just in case anyone forgets. And yes, a text file can be 1000 pages long, if it must be.
And the C programming language has been thriving for 30+ years; it might not be too much of an assumption to think someone could dig up a C compiler in 50 years and compile a straight ANSI C program. A program that converts My Weirdo Format(tm) to raw binary frames and audio, with comments in the source code, might be all that is necessary for transferring lost media. I suspect the source code for that could fit on your archival media and would take a tiny fraction of a percent of the space.
I suspect that since CDs and DVDs are so prevalent and such an open format, even a thousand years from now someone will be able to figure out how to read one and copy it to another medium. And the CD format is simple enough that it would be trivial to reverse engineer. If someone dug up our civilization in 10,000 years, they could likely find thousands of the various dictionary and language CDs out there as a sort of Rosetta Stone.
Obviously there would be data loss on 10,000 year old CDs, but theoretically you could pull something off the regular non CD-R kind.
“Common sense is not so common.” — Voltaire
Hear, hear, this is why I have 3.7TB (soon to be 6TB) of RAID 5 storage for my private "backups" :)
On a more serious sidenote, some older shows are already ridiculously hard to get, legally or not... A good reason to archive them.
An example is Chicago Hope (yes, I know the show sucks). I could only find one season on bitmetv; the rest is gone... Even though it sucked, what if someone wants to watch it again in X years?