MXF+JPEG-2000+HDD = Future of Video Preservation?
Anonymous Archivist writes "Media Matters, a technical consultancy specializing in archival audio and video material, recently completed a Mellon Foundation funded Digital Video Reformatting Preservation Project for the Dance Heritage Coalition. They conclude that MXF is the recommended container format, JPEG-2000 is the recommended encoding format and HDD is the recommended storage media. It's a very valuable series of experiments and offers a strong indication of where the archival preservation of analogue video is heading."
Isn't JPEG lossy? Why would it be recommended as the preferred storage mechanism, instead of TIFF or the like?
Enlighten me. What's MXF?
quidquid latine dictum sit altum videtur.
...I just threw up in my mouth a little bit. That is a Word document!
Why would they go with a compression format that doesn't do inter-frame compression?
It might be nice for editing, but you could get more quality in the same space with something like h264, or even h263 if they have to do this right now (i.e. before h264 is quite ready for prime time).
Recommended Storage Media: Peer to Peer network.
for all those lousy patent lawsuits you'll get now.
The HDD recommendation doesn't seem to make much sense. The article talks about cost-per-gigabyte, but obviously it is much cheaper to use CDRs or DVDRs. This is video preservation, after all, not storing indefinitely for video /editing/, which would require a more malleable storage medium.
And before someone points out that there are studies showing that the longevity of CDR/DVDR discs is questionable, surely proper storage of discs (and not buying the Best Buy free-after-rebate special) would be sufficient. HDD, after all, is susceptible to head crashes, and being a magnetic medium can be more easily overwritten.
SELECT quote.text AS sig FROM quote NATURAL JOIN attribute WHERE attribute.description = 'witty';
0 rows returned
No, they mean Hard Disk Drive. RTFA.
Enlighten me. How can I avoid a stupid FP?
Make their report available in a format other than a '.doc' file. That format is known to change a lot and is therefore not suitable for long-term storage.
OK, let's talk archiveability. Let's talk about a medium that you can leave in a shoebox for a hundred years and read just by shining a light through it. I'm not talking hypothetical here - this technology is proven by the fact that people used it a hundred years ago and it worked. And the technology is even better now, even more stable.
I am of course talking about film. It is very very easy now to write digital images onto film, not very much more difficult than it is to scan film. There's no need to worry about whether the file format will be supported in the future, as I've already said. You don't need to shovel money into vendor's pockets every few years just to copy it to the latest trendiest type of disc. You can build a machine to project film out of junk if you need to, or you can scan it if you want a digital image and when you have a better scanner (e.g. a higher DMax), you can just scan it again.
The dude who wrote this report is just blowing smoke. He's trying to sell snake oil.
http://www.openoffice.org
;-)
Now stop being such a pansy
Ripping an new rectum in the fabric of spacetime.
Can I use MXF to recompress my Tivo recordings of MXC?
Okay, so JPEG 2000 uses wavelets and is therefore quite advanced, but as I have understood, it's still geared for still images (ok, there is probably some form of motion jpeg 2000?).
I would think that the best method would be to use something like DIRAC instead (or Ogg Theora). DIRAC uses wavelets and adaptive arithmetic coding, so it should be "on par" with JPEG 2000 - and should also be free of patent encumbrance.
JPEG 2000 has one feature that might make it better in "archival" purposes - there is a lossless mode which still achieves higher compression ratios than PNG.
https://bugzilla.mozilla.org/show_bug.cgi?id=36351 (no link for obvious reasons) is the bug report, which has been around since April 2000 but has not progressed much due to licensing issues (copyright ones fixed, patent ones not?).
Ummm what about the sound?!
Were that I say, pancakes?
It's a really poor cut-and-paste troll. Woo fucking hoo.
the program being Mozilla's image library
Avoiding inter-frame compression means that, if you have some small amount of data corruption, you only get one, maybe two corrupted frames of video.
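A toy model of that point, as a rough sketch only. The function name `frames_lost` and the keyframe interval of 15 are made up for illustration and do not come from the report. It assumes that in an inter-frame codec a damaged frame spoils every later frame until the next keyframe, while in an intra-only codec the damage stays in the frame that was hit.

```python
def frames_lost(corrupt_index, gop_length):
    """Count frames affected when the frame at corrupt_index is damaged.

    gop_length == 1 models an intra-only codec (every frame is a keyframe).
    Larger values model an inter-frame codec in which damage propagates
    forward until the next keyframe. This is a simplification: codecs with
    bidirectional prediction can also damage frames before the bad one.
    """
    position_in_gop = corrupt_index % gop_length
    return gop_length - position_in_gop


if __name__ == "__main__":
    print("intra-only:", frames_lost(100, 1))
    print("keyframe every 15 frames:", frames_lost(100, 15))
```

The worst case for the inter-frame model is a hit on the keyframe itself, which takes out the whole group.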
I hope the digital archive guys have their trademark ducks pretty firmly in a row -- if they're playing in the same namespace as a destructive scumbag like Brock, just being in the right isn't going to help. And Soros or not, he has plenty of money, lawyers and press contacts.
What I'm listening to now on Pandora...
Storing digital information on paper is feasible and lots of research efforts have been put into it.
Storing data on anything magnetic or optical is a bit worrisome. But then, it's not critical data so I guess it doesn't really matter.
I could have told people this: hard drives have replaced video tape and audio tape for me for the past decade. I find them much more convenient, portable and cross-platform. I have SCSI drives from 1994 that will still work in a PC (Linux or Windows) or Mac today. They are easy to back up to and restore from. The HD is about as close to perfection as you can get in a storage medium, at least until you get flash drives that can store 1 terabyte at minimum, allow an infinite number of writes, and have at least a 100-year lifespan.
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
I can't wait until the neighbourhood drugstore starts selling HDDs instead of MiniDV cassettes.
...had a guide on capturing analog video, said to be part of a 3-part series covering capturing, cleaning, and compressing. Only part I ever came out - Ars, do you read slashdot? - and I am waiting on the last guides for some advice on how to preserve these rotting home VHS tapes.
Meanwhile, does anyone else have advice on capturing and cleaning video since we are already talking about compression? What settings are good for capturing and what sort of software exists to clean up VHS and give it the appearance of more clarity? I am using a WinTV card as Ars recommended it.
But survivability isn't the only consideration. Cost is always an issue. (So much for my platinum plates, though your approach isn't exactly cheap either.) You also want to be able to access the data in the short term. I worked my way through college operating film projectors. It is not a convenient medium!
One thing I'd like to know is why archival-quality optical discs weren't considered. (Presumably there's something in the document about this, but it's a poorly structured Word file, and finding key facts is more work than I care to expend.) They cost 5 times as much as standard CD-Rs and recordable DVDs, but their manufacturers claim the data is good for 300 years. Of course, you need some fairly complicated technology to play them back, but CD and DVD drives are pervasive consumer devices -- they should be around for a very long time.
There will always be multiple backup solutions, but the biggest trend continues to be towards using hard disks for backup. When your data files are enormous (such as with audio/visual data), HDD backup is even more attractive.
There is more to jpeg2000 than a compression scheme offering scaleable quality and resolution within a single losslessly compressed file. There is also the interactive delivery mechanism offered by the JPIP protocol. Now there is something really useful...
In theory, theory and practice are the same. In practice, they're not.
Hell yes, it's amusing.
I haven't seen that posted in years now.
People who dislike Slashdot trolls have no sense of humor.
The nice thing about digital media is that you can leave it in an unrefrigerated shoe box for a decade or two, then come back and make a perfect copy of it with absolutely no degradation.
You can also make a perfect copy and stick it in numerous locations, making them harder to lose in a fire/terrorist attack/rampaging llama incident. They don't require refrigeration, and they take up a lot less room.
But it's not perfect: there's an analog-to-digital step, and you lose information there. Even a print of the film is an analog-to-analog copy, which is even blurrier. The only way to preserve that perfectly is to preserve the original medium.
This plan would be best done alongside preservation of the original media. You preserve media for things you have reason to believe are worth the expense of elaborate storage for a century. For the rest of it, you keep a digital copy which is as accurate as possible, and you revisit it very, very rarely to ensure that it is still viable.
There is expense associated with refreshing the media every decade or so (at which time you also compress 10 CDs onto 1 DVD, then 10 DVDs onto a single holo-tera-whatever). But while I haven't seen numbers, I suspect that preserving the film stock is at least as expensive, and probably more so, and still prone to single points of failure.
There's an interesting prospect. RAW actually takes up less space than TIFF even though it holds potentially more information. But decoding it takes a lot of CPU time.
Transcend Humanity. Please.
The main reason for not choosing other storage formats is that film has a bit depth greater than 10 bits per color channel. MPEG, MPEG-4, WMV, Quicktime, etc. do not satisfy this requirement. Also, the film people do not want their frames interpolated; they want absolutely frame-accurate reproduction. It's no wonder that Media Matters has chosen JPEG2000: it's the format that has been ratified by the D-Cinema folks (meaning all future digital projection is jpeg2K), and there are already real-time jpeg2000 solutions available: http://www.digital-rapids.com/Brochures/CarbonHD.pdf (pdf file)
All this is is a method to line some guy's pockets. I'm sure the tape guys are gonna say, use XYZ type of tape. The disk guys are gonna say disk.
What makes this guy think that the interface to the HDD is going to be around in X years?
PCs have only had two dead (non-(e)IDE/ATA) interfaces: ESDI and ST-506/ST-412.
But what if you were trying to find a computer with an IPI (1960s mainframe) interface?
The Fed gov't has this problem with trying to find parts for their old 8/9-track tape drives.
Here's a good list of all the HDD interfaces over the years: http://www.i-t-s.com/corporate/terms.html
Stick with microfiche, film, that way we don't have to pay some vendor $$$/yr to keep alive a dead technology or pay some other vendor $$$/media to move them from old to new media.
Honestly... just re-copying a digital linear tape every 10 years is a safer bet than any other technology or solution.
DLT tapes are nearly indestructible with a price/GB which is close to HDDs. And now we have LTO, which is even more dense in a similar form factor.
I've recovered data from tapes that were left outside in the rain, kicked around in storage boxes, and stained with smoke (after cleaning of course). All you need to do is keep magnets away (and by this I mean keeping the field strength under a few gauss... )
Any media relying on a chemical process to retain information is probably not a good idea... and I think it would cost more in the long run.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
I'd be happy to be corrected, but my understanding is that RAID 5 is no more capable of telling you which drive is "right" about a block than RAID 1, at least its most common default configuration.
RAID 5 with one parity disk essentially stores the XOR of all the other disks' data on the parity disk. (It's not that simple because it stripes it across disks, but in terms of the logic that doesn't matter).
If I have three values, a, b, and p where:
p = a XOR b
and I find out that *one* of them is wrong because at a later stage I check and find out that now:
p != a XOR b
all I know is that one of a, b, or p is wrong. Not which one. As I understand it, the purpose of RAID 5 is that if I lose any one of a, b, or p I can reconstruct it from the two remaining values, and I can extend this scheme up to any number of values ("disks").
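To make the a/b/p argument concrete, here is a minimal sketch in Python. The helper name `parity` and the sample byte strings are invented for the example, and it ignores striping entirely. It shows both halves of the claim: a block that is known to be missing can be rebuilt from the other two, but a silently corrupted block produces the same parity mismatch no matter which of the three was hit.

```python
def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


a = b"dance video "
b = b"frame data.."
p = parity([a, b])  # p = a XOR b

# Case 1: disk "a" is known to be gone. XOR of the survivors rebuilds it.
recovered_a = parity([b, p])

# Case 2: silent corruption. Flip the same bit in a, then in b instead.
bad_a = bytes([a[0] ^ 0x01]) + a[1:]
bad_b = bytes([b[0] ^ 0x01]) + b[1:]
mismatch_if_a_bad = parity([bad_a, b, p])
mismatch_if_b_bad = parity([a, bad_b, p])

if __name__ == "__main__":
    print("rebuilt a correctly:", recovered_a == a)
    print("mismatches identical:", mismatch_if_a_bad == mismatch_if_b_bad)
```

The mismatch says which byte position disagrees, not which disk holds the wrong byte.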
For this reason, I'm pretty certain most decent RAID controllers reserve a small percentage of your disk's capacity to store checksum (probably CRC) information. This lets them checksum blocks and determine *which* block is wrong, so they can go from "one of these values is wrong" to "this value is wrong, let's reconstruct it from the others".
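A sketch of how a per-block checksum turns "one of these is wrong" into "this one is wrong". This illustrates the idea only and makes no claim about what any real RAID controller does. The function names and the use of CRC-32 from Python's `zlib` module are assumptions made for the example.

```python
import zlib


def xor_blocks(x, y):
    """XOR two equal-length byte strings."""
    return bytes(i ^ j for i, j in zip(x, y))


def find_and_repair(blocks, stored_crcs):
    """Locate a single bad block by checksum and rebuild it from the rest.

    blocks holds the data blocks plus one XOR parity block, in any order.
    stored_crcs holds the CRC-32 recorded for each block when it was written.
    Returns (index_of_bad_block_or_None, repaired_blocks).
    """
    bad = [i for i, blk in enumerate(blocks)
           if zlib.crc32(blk) != stored_crcs[i]]
    if not bad:
        return None, list(blocks)
    if len(bad) > 1:
        raise ValueError("more than one bad block; single parity cannot repair")
    idx = bad[0]
    rebuilt = bytes(len(blocks[idx]))
    for i, blk in enumerate(blocks):
        if i != idx:
            rebuilt = xor_blocks(rebuilt, blk)
    repaired = list(blocks)
    repaired[idx] = rebuilt
    return idx, repaired


if __name__ == "__main__":
    a, b = b"dance video ", b"frame data.."
    p = xor_blocks(a, b)
    crcs = [zlib.crc32(blk) for blk in (a, b, p)]
    damaged_b = b"frame dat\x00.."
    print(find_and_repair([a, damaged_b, p], crcs))
```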
My understanding is that RAID 1 frequently doesn't use this, because it's not doing any logical disk remapping like RAID 5 does anyway, and also typically wants to retain the option of using any one disk stand-alone in a non-RAID system.
It should also be noted that IIRC high-end RAID systems permit you to dedicate two disks (or even more?) to parity. Much like the difference between a CRC value and an ECC value, this lets you recover one wrong value (without extra CRCs etc) and detect two wrong values.
Thanks for pointing that out - it's something I should've realised.
A friend of mine has a friend who was running lots of host machines (web servers, irc, file sharing, etc.) in a kind of ghetto set-up in Calgary. One day, everyone sitting in irc saw the connection drop, all the stuff being served was gone. No web site, no streaming audio, no irc.
:)
The servers were all sitting on a shelf... you know, the kind you use those brackets that screw into the wall, and put some board on top? I will leave it to your imagination to figure out what the technical problem was that day.
Talk about servers crashing!
I've got a bad attitude and karma to burn. Go ahead. Mod me down.
TSIA. It would probably be a bad idea to start publishing material with it until the patent expires.
LRC, the best-read libertarian site on the web
Should Microsoft's Office business unit collapse, the OpenOffice.org maintainers will probably freeze the .doc format as whatever OO.o Writer's .doc import filter accepts.
Besides, if you have Windows 98 or later, you can open simple Word 97 .doc files in WordPad.
And if you can spare the space, a directory with a wav file and a stack of uncompressed TIFF images is even better. Compression formats are complicated to reverse engineer.
Store .mng + .flac + source code for libmng and libflac, and you don't need to worry about any sort of complicated gnireenigne.
TIFF would be a much better choice for archiving, because it's a much simpler format and is much easier to decode.
Does it really matter how simple something is to decode if you're including source code for the decoder libraries on each HDD?
I noted several shortcomings of this study.
1) He states that Jpeg2000 can be lossless or that it can "scale down" and be compressed. He does not subject the "scaled down" Jpeg 2000 codecs to the same rigor as he subjects other compressed formats, e.g. Mpeg4, Mpeg2, Sorenson, WM9, and Real Video. I think if anything like the hardware he suggests is put into place there will be extreme temptations to scale down the capture. He suggests that people searching these archives would be happy to have "scaled down" transcodings of the uncompressed file. How many MIPS does that take? How long does that take? Why won't he tell us how this lossy transcode looks compared to the other lossy transcodes? Of the people who go to the library to check out dance videos, how many of them are going to be pleased with, or even able to view, a lossy transcode in MXF format?
2) 640*480 captures??? Who uses this but amateurs??? It has the advantage of being square pixels, and many of the compressed formats he uses are also square pixels, but even his "uncompressed" AVI is going to be -DOWNSAMPLED- from the DVCAM source he lists. That's not something you want to do before upsampling to HD.
3) The Mpeg 2 test lists 20Megabit Mpeg 2 but the source is 640*480 video. Mpeg 2 like the DVCam tape can do better, I don't understand not using "main profile" and not using a normal data rate. CBR like he did is not the only way to produce Mpeg2. Though I am a fan of CBR I wonder about things like 2 pass VBR and how that would affect the quality measurement.
4) What software or software packages were used to do the respective compressions? It's not listed. Different packages can have wildly different quality results.
5) I feel the author was truly misunderstanding Mpeg-2 when sentences like this are in the report: "While this form of encoding looks more or less attractive on a standard television screen, whole frames of video are thrown lost, thrown away in the digitizing process to get the file small enough to fit onto the DVD media." Not only does he seem not to quite get the techniques, he radically underestimates the advantages of Mpeg-2 and its omnipresence. You can get a wide variety of encoding and decoding packages and devices. You're not waiting around for a few "real time jpeg2000" capture boards to hit the market. You don't need a container format with Mpeg2, and it is HIGHLY optimized to reproduce what is on a video tape, so much so that the sampling is RARELY like his lame 640*480 captures.
I know there are problems with Mpeg2 and I respect the instinct to go for high quality, but this study is just flawed. While he devalues the archivists for "hoping for the best" in maintaining their analog libraries, he is out there "hoping" for some jpeg2000 capture boards and some market adoption of the MXF format. Goodness, he's making an expensive proposition based on Hope. I see no cost analysis besides a "projection" of dropping hard disk storage costs; perhaps there are some extreme system costs to this sort of storage besides just bulk media???
For whatever reason (I'm not a video expert) many people prefer intraframe codecs for archival. As you probably guessed, Motion JPEG 2000 just treats each video frame as a still image and compresses it with JPEG 2000.
Dirac will give much better compression than JPEG 2000, but it also introduces the possibility of interframe artifacts.
You are 100% right that paper and film have a proven track record, which is more than you can say for anything digital (proven = > 100 years). Analog is also inherently superior to digital for archiving in that partial loss is only partial loss.
The required equipment could be built in volume at a reasonable price (even custom glass lenses are not that expensive, as you can see from, say, Photonics magazine, and as you see every day in cameras, you can mold pretty good lenses out of polycarbonate or COC cheaply). The cost of 35 or 64 mm movie film is cheap, etc., IN VOLUME.
Which raises the question: if this is real, why is Kodak or someone like that not doing this?
I don't know the size of the data archive market, but it has to be at least a few hundred MM a year, small but I think just large enough to interest a company like Kodak or Fuji or Agfa or even a zombie like Polaroid.
But you still have analog-to-digital conversion to get the data back, you have the horrendous problem of cataloging analog data for a digital world, and you have the problem of the declining cost of storage, which is probably the largest argument, from an ROI standpoint, against film.
IMHO this whole archive thing is silly. You do what every /. reader does: you transfer to new, much larger media every 5 years.
An archive format??
1) Jpeg 2000 is not widely used, and is not much better than good old jpeg. It's easier to implement a jpeg decoder, or find info on how to make one.
2) But really, the most flexible would be to use NO container format.(folder of jpegs, a wav file, and a text file telling you the fps)
3) raw wav audio is high quality and small (compared to video) and can be read by anything
4) storage is silly; 10-20 years a new format will be pushed, and you'll have to migrate everything anyway. The media might last, but the players will not. just pick what is cost effective now.
Because when you're archiving digital data, recoverability is paramount.
No, Viacom is paramount.
"What if all I had was a piece of this data, say, a hundred gigabytes from the middle of the disk? Could I turn that data into useful information?"
As long as your codec is seekable, this works. Motion JPEG is trivially seekable, consisting entirely of keyframes. Toss a redundant copy of the codec on the volume after every GB or so of video data, and recoverability is preserved.
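A rough sketch of what "trivially seekable" buys you when all you have is a chunk from the middle of the disk. It assumes the frames are plain baseline JPEG images stored back to back, and the function name `find_frames` is invented for the example. It simply looks for the JPEG start-of-image and end-of-image markers. JPEG 2000 codestreams use their own markers, so the constants would differ there, and a frame with an embedded thumbnail would confuse a scan this naive.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def find_frames(fragment):
    """Return (start, end) byte offsets of complete JPEG frames in a fragment.

    Anything before the first SOI (the tail of a frame cut off by the start
    of the fragment) and any frame without an EOI (cut off by the end of the
    fragment) is skipped.
    """
    frames = []
    pos = 0
    while True:
        start = fragment.find(SOI, pos)
        if start == -1:
            break
        end = fragment.find(EOI, start + 2)
        if end == -1:
            break
        frames.append((start, end + 2))
        pos = end + 2
    return frames


if __name__ == "__main__":
    fragment = (b"\x12\x34" + EOI          # tail of a frame we only half have
                + SOI + b"AAAA" + EOI      # complete frame
                + SOI + b"BB" + EOI        # complete frame
                + SOI + b"CC")             # head of a frame that is cut off
    for start, end in find_frames(fragment):
        print(start, end, fragment[start:end])
```

With an inter-frame codec the same scan would also have to find a keyframe before anything could be decoded.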
For people concerned with the preservation of "data", they've sure picked an interesting format to write about it in.
--
"Open source is good." - Steve Jobs
"Open source is evil." - Microsoft
indeed, lossless for archival preservation is the
only way, as it fits the basic rule of art restoration
technology -- never apply "improvements" which
cannot be reversibly undone to take advantage
of future science.
ironically then, the lossless format doesn't matter.
however, at least for the instant case of dance video,
the likely input (a myriad of digital tape formats)
is hopelessly neanderthal -- anything having to do with DV,
or MPEG, or even ATSC HDTV already tosses away much
color information. (4:1:1, 4:2:0, and 4:2:2 colorspace is embarrassing
to preserve "losslessly".) ditto for temporal
info, with interlacing being the culprit. even film at
24fps just will not cut it for motion such as dance.
so here's to better camera technology, whether it's
10- or 12-bit 4:4:4 RGB, or something like
carver mead's foveon made swift.
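For a sense of how much those subsampling schemes throw away, here is a small back-of-the-envelope calculation. The function name is invented for the example, and it uses the usual reading of J:a:b notation: a reference block J pixels wide and 2 rows tall, with a chroma samples in the first row and b additional chroma samples in the second.

```python
from fractions import Fraction


def retained_fraction(j, a, b):
    """Fraction of full 4:4:4 sample count kept by J:a:b chroma subsampling."""
    luma_samples = 2 * j            # luma is never subsampled
    chroma_samples = 2 * (a + b)    # two chroma planes
    full_samples = 3 * 2 * j        # three full planes over the same block
    return Fraction(luma_samples + chroma_samples, full_samples)


if __name__ == "__main__":
    for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)]:
        print("%d:%d:%d keeps %s of the samples"
              % (scheme + (retained_fraction(*scheme),)))
```

On those assumptions 4:2:2 keeps two thirds of the samples, and 4:2:0 and 4:1:1 each keep half, all before any actual compression is applied.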
The heck with compression. We have an infinite digital storage in space. Beam it out as tightly focused megawatt microwave beams, and just remember the exact heading and the date.
Our future FTL capability will ensure that we can recover every last digital bit, as often as we need.
Downside? Those pesky aliens will not be paying their share of the royalties!
I'm so very, very sorry. Please ignore the parent post.
I don't understand why they don't use LTO digital tape. LTO-3 currently holds 400GB (with no on-tape compression). It is $0.30 US/GB:
http://www.cdw.com/shop/search/results.aspx?grp=T
It is very reliable and will last for a very long time. It is great for archiving and is what TV stations use. Also, if they are serious about archiving, why are they not considering higher bit rates? If they are going to do this then they should be considering 50 Mbit instead of 20 Mbit. Sure, it takes up a lot of space, but that is why you use LTO. Also, every ~two years they come out with the next gen that has double the capacity of the previous version. And the new gens can read the older tapes.
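To put the 20 Mbit versus 50 Mbit question in terms of tapes, a quick calculation. It is a sketch under simple assumptions: decimal gigabytes and megabits, a constant bit rate, video only, and no allowance for audio, container, or filesystem overhead. The function name is invented for the example.

```python
def hours_per_tape(capacity_gb, bitrate_mbit_s):
    """Hours of constant-bit-rate video that fit in capacity_gb gigabytes."""
    capacity_bits = capacity_gb * 1e9 * 8
    seconds = capacity_bits / (bitrate_mbit_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (20, 50):
        print("%d Mbit/s: %.1f hours on a 400GB tape"
              % (rate, hours_per_tape(400, rate)))
```

On those assumptions a 400GB tape holds roughly 44 hours at 20 Mbit/s and roughly 18 hours at 50 Mbit/s.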
Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
Ooops. Didn't find it. That part of the data was lost.
In the 100 GB extant fragment you mention, there should have been at least a hundred copies.
You do realize that "encode" also means "to obscure," right?
The "EFM" part of the physical layer of CD or DVD storage and the "RLL" part of the physical layer of hard disk storage are also encodings or, in your definition, obscurings. You have a shiny platter and a drive motor that hasn't worked for centuries; how would you retrieve even a byte stream from the platter?
If you found a shoebox in your attic filled with letters and postcards written in Middle English, translating them to Modern English would be a massive effort.
By the Church-Turing thesis, every language of computation can be perfectly interpreted within another language of computation. You mention that only scholars can translate Old English to Modern English and Classical Latin to Modern Italian, but the difference between spoken languages such as Middle English and mathematical languages such as C is that a small team of scholars can create a perfect automated translator for a programming language.
Expecting the people who want to access that information to (1) understand your language, (2) understand your programming language, and (3) understand all your baroque encoding algorithms is just fundamentally wrong.
People who have the data and want to convert it to the format du jour only have to have access to a program that performs (2). Programming language archaeologists can create this translator program once and be done with it.
And who's to say that automatic computers will even exist in the distant future? What if thinking machines are banned between now and then? Then how would you pull byte arrays off a platter and turn them into visually perceptible motion?
What, stuck in the middle of a piece of video? No.
Why not interleave the video with copies of the decoder? Haven't you seen the movie Contact, where the rules for interpreting the alien diagrams were placed around the edges of the blueprints?
That's why electronic storage is a controversial topic for archivists. Its use is a compromise.
Everything is a compromise. It depends on which scale of time you're trying to archive for. I see HDD as useful for archiving video over the course of a century or two. Beyond that, it has become increasingly likely that Armageddon could destroy everything.
But every aspect of the story that involves the use of the phrase "a small team of scholars" reduces the likelihood that the information will ever be recovered.
The difference is that a small team of scholars would only have to write a C compiler once in a modern programming language, and then any program written in ancient C would become executable.
My personal observations are that MXF is primarily used as a container for DV, the Jpeg 2000 codecs are too slow to do much good, and MXF is hardly in use anywhere except trade shows.
[With interleaved decoder and coded content] it's even harder to tell what's content and what's not!
As I've said, source code in any programming language that uses US-ASCII encoding will have bit 7 clear throughout. This is a strong correlation in bit 7, a method of framing which I think any future scholar would have little or no trouble discovering.
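A minimal sketch of that framing test, assuming the interleaved decoder source is pure US-ASCII and the compressed video around it looks close to random. The function name and the 64-byte threshold are arbitrary choices for the example. In random data each byte has bit 7 clear about half the time, so a long unbroken run of such bytes is very unlikely to occur by chance.

```python
def ascii_runs(data, min_length=64):
    """Return (start, end) offsets of runs in which every byte has bit 7 clear."""
    runs = []
    start = None
    for i, byte in enumerate(data):
        if byte < 0x80:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # A run may extend to the very end of the data.
    if start is not None and len(data) - start >= min_length:
        runs.append((start, len(data)))
    return runs


if __name__ == "__main__":
    source = b"int main(void) { return 0; }" * 3
    stream = b"\xff" * 10 + source + b"\x80\x90ab"
    print(ascii_runs(stream))
```

This only finds where the text is. Working out that the text is C, and what C means, is the part that still needs the small team of scholars.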
I'm going to pretend your source of insight for this conversation isn't a bad science fiction movie.
I'm going to pretend that a lot of engineers working on breakthrough technology didn't receive inspiration from speculative fiction. Without the Dick Tracy comics, for instance, do you think anybody would have bothered trying to squeeze a PDA into a wristwatch form factor?
If you don't know what the fuck you're talking about, why don't you stop talking?
Then please explain why you feel that I don't know what the intercourse I'm talking about. I am willing to learn.
50 x long-life DVDs @AUD$1 each including cover: $50
Labour @$10 an hour to feed these to a burner: $30
Controlled storage for 50 DVDs for 20 years: $200?
Drive to read and copy the suckers after 20 years: $X?
TOTAL: $280+X
vs
1 x 200GB IDE HDD @AUD$160: $160
1 x tray @AUD$10: $10
Labour @$10 an hour to plug it in and walk away: $1
Controlled storage for 20 years: $50?
Functioning IDE bus to read and copy the suckers after 20 years: $Z?
TOTAL: $221+Z
If you mail it across Australia to the storage, the hard disk just won by a bit more. It fits in a 750g satchel for about $5, the DVDs won't fit in a 3kg satchel so they'll get "cubed" and probably be about $20-$25.
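The two totals above can be checked with a few lines of Python. The dictionary names are invented for the example, and the unknown drive cost X and IDE bus cost Z are left out, so this only compares the known line items in AUD.

```python
# Known line items from the two lists above, in AUD.
dvd_items = {
    "50 long-life DVDs": 50,
    "labour to burn": 30,
    "controlled storage, 20 years": 200,
}
hdd_items = {
    "200GB IDE HDD": 160,
    "tray": 10,
    "labour to plug in": 1,
    "controlled storage, 20 years": 50,
}

dvd_known = sum(dvd_items.values())
hdd_known = sum(hdd_items.values())
# The hard disk stays cheaper as long as Z exceeds X by less than this.
break_even_gap = dvd_known - hdd_known

if __name__ == "__main__":
    print("DVD: $%d + X" % dvd_known)
    print("HDD: $%d + Z" % hdd_known)
    print("HDD wins while Z - X < $%d" % break_even_gap)
```

The known items come to $280 and $221, so the hard disk stays ahead unless the IDE hardware costs at least $59 more than the DVD drive. Adding the postage figures ($5 against $20-$25) widens that margin by $15 to $20.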
Got time? Spend some of it coding or testing
Thanks. I noted in my post that I was oversimplifying the description of RAID 5 as if the error correction data was all on one disk. I realise that in reality it's striped across all disks. In terms of the data recovery logic that doesn't matter, since for any one set of sectors across all disks there is still an error correction set.
I was, however, unaware that RAID 5 used ECC not simple parity checks. How does it do this when only a single disk is dedicated to error detection? I would expect ECC data to take up more space than that (in fact, twice the space). If you have details I'd be interested to find out.