Technical Objections To the Ogg Container Format
E1ven writes "The Ogg container format is being promoted by the Xiph Foundation for use with its Vorbis and Theora codecs. Unfortunately, a number of technical shortcomings in the format render it ill-suited to most, if not all, use cases. This article examines the most severe of these flaws."
I don't see any comments and the website is already down. gg /.
No matter how bad it is, it's still better than AVI. I personally use Matroska; it has all of the ideological benefits (free, unencumbered, open source) over stuff like MP4.
Who cares about container formats anyway? I could write ten before I get up in the morning. The codecs are the hard part.
"I would have done it diffferently" does not mean that the format is bad. None of these "flaws" render the format unusable. Maybe it doesn't perform as well as another format, maybe it isn't designed the way you would like, but it's implemented, it's available, and it's in use.
Just use MKV and be done with it already :/
It's not a selling point, it's a starting point. It's a sine qua non. For an application like video on the Web, nothing non-free can even enter the conversation.
Is that the ogg container format doesn't play itself.
The article complains about the container format, not about the codec actually used in the container.
Therefore, please do not confuse Ogg, the container format, with Theora, the codec used to encode the video.
It would be rather easy to replace or reimplement the container format with something else (easy compared to replacing the codec with a newly designed codec, anyway).
There's at least one obvious flaw in his reasoning. He talks about removing the 8-bit version field in the header and replacing that with a 1-bit portion of the flags field to distinguish it from a hypothetical future version. That only works if one assumes there will only *ever* be two versions (v1 and v2). Such a basic failing of analysis is a pretty good indicator that he hasn't thought it all through as completely as he thinks he has.
I'm not an expert in video or audio production; I just dabble in it as a hobby. But one thing I often wonder is: what is the point of these container formats?
I've got a miniDV camera, and a Canon point-and-shoot that, thanks to CHDK, can record good-enough video. Both give me ".AVI" files, even though one is miniDV while the other is MJPEG. MJPEG files don't work in my editor, while miniDV does. But I didn't know this at first; all I knew was that I had a bunch of .AVI files sitting on my hard drive, and some worked while others didn't. I don't care about file extensions, I care about having files that work. I care about codecs. If they were named "filename.minidv" and "filename.mjpg", that information would be useful to me. What good is a container format when only half of the files within that container will play on my system?
I'm not trying to knock the idea of container formats; if they exist, there must be some beneficial reason for them. Could someone please enlighten me on what that reason is?
[flamebait] Or, we could go back to .rm [/flamebait]
Here's the coral mirror:
http://hardwarebug.org.nyud.net/2010/03/03/ogg-objections/
I can't say anything about video, but for audio, my whole CD collection is converted to Ogg instead of MP3; you can't even spot the difference in quality, though the file size is smaller. BTW, my MP3 player supports Ogg playback as well.
"In the long run, all file formats become programming languages."
From this I draw a number of conclusions, the first being that when designing a format you need to bring a "language sensibility" to it. If you don't, it's only a question of *when*, not if, your format will become a poorly designed language. OK, "language" may not be the right word. I'd also accept, "byte code" or "executable file", but it's the same idea. JMHO.
His website says "Everything is broken". If that were true wouldn't that, you know, mean that Ogg is no different than its competitors?
... of what format *should* be used in its place.
It is all very well claiming one format is not particularly good, but overall rather pointless if you don't argue an alternative.
So the question any .ogg user will have (since they probably chose this slightly obscure format over the more 'normal' .mp3 alternative due to the reputation of being better to listen to from an audiophile POV) is what to use instead? FLAC is fine if you have the space, but sometimes you want to compromise in order to save storage space...
His complaints:
On top of this we have the 27-byte page header which, although paling in comparison to the packet size encoding, is still much larger than necessary.
OK, it's a container format; nobody cares about an extra 27 bytes when you can buy TBs of storage for virtually nothing. And if you're complaining because it needs to go through the intertubes, gzip compression on the server does a very good job of compressing plain, non-random text like page headers, but again, Mbits are cheap and, unless you're living in the US, plentiful.
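To make the 27-byte figure concrete, here is a minimal Python sketch that unpacks the fixed part of an Ogg page header, following the field order in RFC 3533 (capture pattern, version, flags, 64-bit granule position, serial number, page sequence number, CRC, segment count). The function and key names are made up for illustration, and it does not check the CRC.

```python
import struct

# Fixed 27-byte Ogg page header (little-endian, no padding):
# "OggS", version, header_type flags, granule_position (int64),
# bitstream serial number, page sequence number, CRC, page_segments.
OGG_HEADER = struct.Struct("<4sBBqIIIB")
assert OGG_HEADER.size == 27


def parse_page_header(buf: bytes) -> dict:
    """Parse the fixed page header plus the segment (lacing) table that follows it."""
    (capture, version, header_type, granule, serial,
     seq, crc, n_segments) = OGG_HEADER.unpack_from(buf, 0)
    if capture != b"OggS":
        raise ValueError("not at an Ogg page boundary")
    start = OGG_HEADER.size
    lacing = list(buf[start:start + n_segments])
    return {
        "version": version,
        "continued": bool(header_type & 0x01),  # page starts mid-packet
        "bos": bool(header_type & 0x02),        # first page of a logical stream
        "eos": bool(header_type & 0x04),        # last page of a logical stream
        "granule_position": granule,
        "serial": serial,
        "sequence": seq,
        "crc": crc,
        "lacing": lacing,
        "header_size": start + n_segments,      # 27 bytes + one byte per segment
        "body_size": sum(lacing),
    }
```

Note that the total header size is 27 bytes plus the segment table, so the real per-page cost depends on how many lacing values the page carries.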
The version field could be disposed of, a single-bit marker being adequate to separate this first version from hypothetical future versions. One of the unused positions in the flags field could be used for this purpose.
It's kind of important to keep track of versions. If your player can't play the next version or an older version it should be able to detect that so it doesn't try-and-fail. It might also want to suggest what version of the player is required.
A 64-bit granule_position is completely overkill. 32 bits would be more than enough for the vast majority of use cases. In extreme cases, a one-bit flag could be used to signal an extended timestamp field.
That's what they said about our memory too, back in the early 90s: 32-bit addressing is enough, nobody will ever have more than 4 GB of RAM. Again, these open formats tend to be scalable across time because they need to fulfill a certain mission. Look at ZFS: it has 128-bit addressing, but nobody (currently) needs that amount of storage.
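For a rough sense of scale, the sketch below works out how long a sample counter lasts before it wraps, assuming the granule position counts PCM samples (as it does for Vorbis). The helper name is hypothetical and it only does arithmetic.

```python
SECONDS_PER_HOUR = 3600


def hours_until_wrap(bits: int, sample_rate: int, signed: bool = True) -> float:
    """Hours of audio before a sample counter of the given width overflows.

    A signed counter loses one bit to the sign, halving its range.
    """
    usable_bits = bits - 1 if signed else bits
    return (2 ** usable_bits) / sample_rate / SECONDS_PER_HOUR
```

A 32-bit unsigned counter at 48 kHz wraps after roughly 24.9 hours, which a round-the-clock radio stream would hit every day; the article's answer to that is its optional extended-timestamp flag. A signed 64-bit counter at 192 kHz lasts for over ten billion hours.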
32-bit elementary stream number? Are they anticipating files with four billion elementary streams? An eight-bit field, if not smaller, would seem more appropriate here.
Why not, how many languages are there around the world? If you need to bring out a media file with subtitles and audio-tracks for each language, braille instructions and who knows what else for open access to certain media, you might want to use more than 256 streams.
The 32-bit page_sequence_number is inexplicable. The intent is to allow detection of page loss due to transmission errors. ISO MPEG-TS uses a 4-bit counter per 188-byte packet for this purpose, and that format is used where packet loss actually happens, unlike any use of Ogg to date.
Well, maybe the makers intended Ogg to be used eventually to replace MPEG (c)(patented) and used across links with much higher transmission errors. Sometimes my MPEG-encoded stream I get from my DTV provider has enough errors to stall and cause artifacts. When NASA wants to use Ogg for a non-repeatable stream from outer space, they should be able to. Again, overhead is a small cost to pay these days.
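The difference between a 4-bit and a 32-bit counter is easiest to see in code. This is a simplified, hypothetical helper that ignores MPEG-TS details such as per-PID counters and packets without payload; it only shows that a wrapping counter cannot tell a gap of exactly 2^bits packets from no gap at all.

```python
def lost_between(prev: int, current: int, bits: int) -> int:
    """Number of packets that appear to be missing between two counter values.

    The counter wraps modulo 2**bits, so losing an exact multiple of
    2**bits packets is invisible: the gap looks like zero.
    """
    modulus = 1 << bits
    return (current - prev - 1) % modulus
```

With a 4-bit counter, losing 16 packets in a row looks identical to losing none; a 32-bit counter would need four billion consecutive losses to be fooled the same way.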
A mandatory 32-bit checksum is nothing but a waste of space when using a reliable storage/transmission medium. Again, a flag could be used to signal the presence of an optional checksum field.
Ah, well, what is reliable these days? Ever used a large array of hard drives? Ever used a freakin' dial-up connection? As the makers of ZFS, Google and a few others recently have shown hard drive and memory reliability is not as good as we take for granted. Silent data corruption is a major cause of data loss these days.
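For reference, the Ogg page checksum is a plain table-driven CRC-32 (polynomial 0x04C11DB7, initial value 0, no bit reflection, no final XOR), computed over the whole page with the 4-byte checksum field at offset 22 set to zero. The sketch below is my own Python rendering of that, not libogg's code; note that it is a different CRC variant from the zlib/PNG one, which uses bit reflection and a final XOR.

```python
def _make_table(poly: int = 0x04C11DB7) -> list:
    """Precompute the MSB-first CRC-32 remainder for every byte value."""
    table = []
    for byte in range(256):
        crc = byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        table.append(crc)
    return table


_TABLE = _make_table()


def ogg_crc32(data: bytes) -> int:
    """CRC-32 with poly 0x04C11DB7, init 0, no reflection, no final XOR."""
    crc = 0
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[((crc >> 24) ^ b) & 0xFF]
    return crc


def page_crc(page: bytes) -> int:
    """Checksum of a full Ogg page, with its CRC field (bytes 22..25) zeroed."""
    return ogg_crc32(page[:22] + b"\x00\x00\x00\x00" + page[26:])
```

It is cheap to compute, and a 32-bit CRC is guaranteed to detect any corruption confined to a burst of 32 bits or fewer.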
With the changes suggested above, the page header would shrink from 27 bytes to 12 bytes in size.
Whoop-dee-doo, you made it half the size but you sacrificed reliability, error correction and future-proofness.
Latency
You show that the overhead is anywhere from 1% to 7%. That might not meet the requirements of latency-sensitive applications, but fixing it would again sacrifice other features. There is always a balance between speed and reliability, and for most applications it doesn't really matter if the movie needs to be buffered 5 ms longer.
Random access
You've got somewhat of a point there; maybe somebody will find a solution for that. The issue around indexing, however, is that seeking within a stream is possible. HTTP servers
The reason the page header data is repeated is that Ogg is a streaming file format. You can write it out live in real time, and decoders can decode it as you write it... you can arbitrarily chop it in the middle and/or truncate it at the end and still get something demuxable. You don't have to seek back and rewrite headers when you're done writing, and you don't have to seek to the end to read footers when you're done reading. Other formats (e.g. MKV) don't handle truncation, incremental writing, and streaming as well (or not without putting them into a mode that makes their behaviour and overhead more similar to Ogg's).
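The "chop it anywhere" property mostly comes down to the capture pattern: a reader that starts in the middle of a file or stream scans forward for "OggS" and picks up at the next page. A minimal sketch (the function name is hypothetical):

```python
CAPTURE = b"OggS"


def page_offsets(data: bytes) -> list:
    """Offsets of candidate Ogg pages, found by scanning for the capture pattern.

    A real demuxer would also verify each candidate's CRC, because the bytes
    "OggS" can occur by chance inside compressed payload data.
    """
    offsets = []
    pos = data.find(CAPTURE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(CAPTURE, pos + 1)
    return offsets
```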
This means that some header data must be repeated frequently. The version number is 8 bits so that highly simplistic code can implement a version check with simple character-level logic and without implementing a fancy format-specific bit unpacker. Yes, it costs a little efficiency, but if you really can't tolerate 7 BITS of additional overhead per more than 500k bits (0.001%), then you have no business using _any_ multimedia transport format.
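The overhead numbers being thrown around are easy to check. Assuming the standard 27-byte header plus one segment-table byte per lacing value, these hypothetical helpers give the fraction of a page spent on framing, and the share a few extra header bits would add:

```python
HEADER_BYTES = 27  # fixed part of an Ogg page header


def page_overhead(payload_bytes: int, lacing_values: int) -> float:
    """Fraction of a page taken up by the fixed header and the segment table."""
    overhead = HEADER_BYTES + lacing_values
    return overhead / (overhead + payload_bytes)


def extra_bits_fraction(extra_bits: int, payload_bytes: int, lacing_values: int) -> float:
    """Share of the whole page represented by a few extra header bits."""
    page_bytes = HEADER_BYTES + lacing_values + payload_bytes
    return extra_bits / (page_bytes * 8)
```

A maximal page (255 lacing values of 255 bytes each, 65,025 bytes of payload) spends about 0.43% on framing, and the 7 bits of the version byte amount to about 0.0013% of it, in line with the figure above. A tiny 100-byte page with one lacing value spends about 22% on framing, which is the low-bitrate, low-latency case the article worries about.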
Besides its being EBML (a binary and efficient kind of XML), I’ve yet to see a feature that it can’t handle. Even a complete 3D TV series with multiple perspectives, languages, subtitles, additional content, cover art... streamed over the net in one file? No problem.
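For anyone wondering what EBML looks like on the wire, its basic building block is a variable-length integer whose length is signalled by the leading zero bits of the first byte. A minimal decoder sketch (the function name is mine) for element data sizes, where the length-marker bit is stripped from the value:

```python
def read_vint(buf: bytes, pos: int = 0) -> tuple:
    """Decode an EBML variable-length integer starting at buf[pos].

    Returns (value, length_in_bytes). The number of leading zero bits in the
    first byte, plus one, gives the total length (1 to 8 bytes). Reserved
    "unknown size" values (all value bits set) are not treated specially here.
    """
    first = buf[pos]
    if first == 0:
        raise ValueError("invalid or unsupported vint (length > 8 bytes)")
    length = 1
    mask = 0x80
    while not first & mask:  # find the length-marker bit
        length += 1
        mask >>= 1
    value = first & (mask - 1)  # drop the marker bit, keep the rest
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, length
```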
Also, it’s already the format of choice for HD video and multichannel audio rips on the net.
A competitor would be nice. Unfortunately, OGG can’t hold a candle to it. But if they manage to catch up, they will be very welcome.
Quite a bit of the analysis seems to be reasonable on the surface, but something about the way it was presented set me off in a geek-rant that I put in the comments. Since I'm having trouble posting that comment on the site, here it is.
Many of the points sound reasonable, but the argument is strongly undermined by the fact that it offers not a single apples-to-apples comparison between ogg and any other container format in the article. On a section-by-section basis:
Generalities/codec mapping:
Article complains about how there is no global mapping, but does not assert that other containers have one.
Overhead:
The breakdown of where space is wasted is informative and mostly reasonable, but some of the points seem to be a reach, such as the claim that the checksum is unneeded. The suggestion of implementing the functionality in optional fields also seems like a bad idea to me in general, since it would make the header variable-length, which is something to strongly avoid in my experience. Finally, when the article does "compare" Ogg to MP4, it compares some rather hand-wavy numbers for Ogg to a different scenario for MP4.
Latency:
The article fires off a bunch of numbers here, but then offers no comparison to the alternatives. In fact, it doesn't even explain how other formats avoid this latency in theory, much less in practice, and instead of showing how bad the latency is, it uses it as a platform to show that a naive reaction to the issue will cause bad header overhead.
Random Access:
In this section it lists quite a few worst-case numbers for disk accesses (why isn't it being pre-cached by the filesystem?) and then ends with no comparison to alternatives at all.
Complexity:
Once again it lists a bunch of problems the author has with the format, but makes no comparisons to "good" formats. In addition, this section is particularly weak, with statements like "implementation is annoying" and "ambiguity is bad".
Final Words:
"We have shown" is a rather specific claim to make, which the article has not remotely achieved. This pretty much sums up the whole article, which is titled "Ogg objections", but then tries in the text to bill itself as a rigorous analysis of ogg, which it is not.
If the author had matched the tone of the article to the title, this would be reasonable, but he only hurts his position when he throws around phrases like, "True generality is evidently not to be found with the Ogg format.", "The Ogg format is clearly not a good choice for a low-latency application.", and "We have shown the Ogg format to be a dubious choice in just about every situation.". He has demonstrated NONE of the above claims, and by making them the article has rendered me skeptical of the rest of its claims.
I'd imagine container formats have far fewer patent issues to worry about compared to compression algorithms.
Indeed. Technically there is nothing stopping you from having MPEG4/Theora beyond the playback applications.
Dirac was one codec that seemed to show some promise, from the BBC, but I don't know how much of a decent candidate it is and how much push it is being given?
This whole thing is really about bad blood between Xiph and the mplayer folks. Once, long ago, I made disparaging remarks about a particular mplayer developer's extensive collection of ass hats, and they declared war. This stopped being about facts or reason years ago. Here's the last blog thread that got completely hijacked by the anti-Ogg container wingnuts. It's a hell of a read:
http://blog.gingertech.net/2010/02/20/googles-challenges-of-freeing-vp8/
So, rehashing this yet again: The Anti-Ogg bullet points [Not going to bother with complete sentences, because I've wasted too much typing on this recently]:
1) A few of the mplayer/x264 hackers are right pissed that Ogg and Theora are getting all this attention when x264 is so obviously superior. That simply cannot stand. Since only America has patents and there are no computers there anyway, nobody should have to worry about them. Stick it to The Man! (How very ironic, Xiph being considered 'The Man' by folks contributing to an h264 encoder).
2) Xiph should immediately drop Ogg for [insert container here], breaking millions of hardware decoders and hundreds of millions of software decoders:
a) the [patented] mp4/MOV container is one suggestion they actually make seriously. Never mind adding 'willful infringement' to breaking the entire installed software/hardware base, this choice would totally redeem Xiph in their eyes. The benefit: by their own figures, it would reduce container overhead from .7% to .3%.
(Except that number is wrong. I found later that DonDiego screwed up his mp4 overhead figures at the link above; I had simply assumed he got his container numbers right. The mp4 file in his example has almost identical container overhead to the Ogg, a shade under 1%. His demultiplexed mpeg audio and video had framing in them, so it made it appear the mp4 container overhead was much smaller when he subtracted their file sizes.)
b) OK, mp4 is patented and no better, fine, Xiph should have just used Matroska from the beginning. Despite the fact that Ogg and Vorbis predated it by about five years (also mkv's not been able to interleave until just recently, which == no streaming). This is not to say you can't put Theora and Vorbis in Matroska. It's even a good idea! I've come to like MKV. But for streaming, Ogg is still much easier to deal with. Ogg was designed to stream, mkv was not.
c) OK, so, mp4 and Matroska are right out for streaming, Xiph should use Nut, which is the system they designed. Nut came ten years after Ogg was already widespread. And looks almost exactly like Ogg. Which is not to say there aren't things about it I like that improved on the Ogg approach. Eg, the packet length encoding is better. It has a conditional checksum coverage feature I had never considered, etc. At some point we'll make those changes when that wouldn't mean completely abandoning any chance we have at adoption just to save a fraction of a percent and add... no new features.
d) but.. but.. even FLV is better! OK, at this point I can't even entertain the argument.
3) OK, maybe not adopt another container, but Xiph should immediately improve/change Ogg, breaking millions of hardware decoders and hundreds of millions of software decoders for a 'better' implementation that won't actually give users any features they don't have now. FOSS needs _tools_, not us wasting time over-optimizing something they couldn't care less about.
4) 64-bit timestamp! OMG, waste! Wait, mov/mp4 uses 64-bit stamps... Also, plenty of things in Ogg use a full byte instead of one bit because the container assumes octet alignment. Alignment makes it much faster/easier to deal with (you don't need a bitpacker to read pages, and you don't have to repack packet data to embed it into the page). Remember, all the completely unacc
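Since packet length encoding keeps coming up: Ogg stores each packet's length in the page's segment table as a run of byte-sized lacing values, each 255 meaning "keep going" and the first value below 255 ending the packet. A minimal sketch of both directions (function names are hypothetical):

```python
def lace(packet_len: int) -> list:
    """Ogg lacing values for one packet: runs of 255, then a final value < 255.

    A packet whose length is an exact multiple of 255 ends with a 0 value.
    """
    return [255] * (packet_len // 255) + [packet_len % 255]


def unlace(values: list) -> list:
    """Recover packet lengths from a sequence of lacing values.

    A trailing run of 255s means the packet continues on the next page;
    that incomplete packet is left out here.
    """
    lengths, current = [], 0
    for v in values:
        current += v
        if v < 255:
            lengths.append(current)
            current = 0
    return lengths
```

This costs roughly one byte per 255 bytes of packet data (about 0.4%), which is the "packet size encoding" overhead the article singles out, but reading it never needs a bit unpacker.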
Ogg is silly. Matroska stomps it in every way possible except for a *slightly* higher amount of overhead--which, when compared to the total size of any audio-visual file, is negligible. There's no reason to use Ogg now that MKV is reasonably stable.
As you have seen, those properties of OGM used by people who propagate it are no real advantage, and at least one major drawback could easily be fixed to some extent without breaking compatibility with anything. But no one seems to care to fix it. It seems that people who could fix it don’t care enough about OGM to really do it.
Finally, here is an excerpt of a ’discussion’ between one of the most famous OGM zealots and other people:
Zealot: "Who would put ac3 or dts into ogm btw.? OGM is meant to be a container for ogg vorbis soundtracks and video as well as subtitles."
Someone else: "So you acknowledge the fact that OGM is just a technology demonstration to advertise Vorbis? Not a general purpose container?"
Zealot: "You know very well that OGM is a general purpose container..."
As you can see, not even OGM zealots trying to defend OGM at all costs (again, OGM, I don’t speak about OGG here!) can write reasonable sentences not contradicting each other.
http://www.alexander-noe.com/video/documentation/containers.pdf
I sadly have to agree, and I've voiced the same objections for a long time. It really is like he tells it: it's just bad at everything it was intended to achieve. It's a source of bugs, it's horrendously complicated to support, and it's horrendously inefficient at anything but audio (and even then, not so good).
It seems to me most of what went wrong was trying to support concatenation of Ogg streams. This is a nice idea, but actually quite a rare case. It's also incredibly naive for the specification document to request that Ogg implementations detect this. What, I'm supposed to scan the entire file in case that happens? No. I'll just not be compliant with that, thank you very much.
I even wrote my own Ogg/Vorbis decoder from scratch a while back (and dabble every now and then), and found Ogg to be a never-cooling, never-extinguishing steaming pile of hippo crap left over from consuming a dog. It just made everything so difficult to do. Seeking a stream involves divide-and-conquer; not necessarily a bad thing, but when you have huge streams the number of seeks can be bad. Not to mention if your stream has an endpoint on the other side of the Atlantic Ocean. Why oh why did they pick timestamps being at the END of a page and indicating the output byte count produced by the END of that page? That little detail alone probably cost me days of debugging.
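The end-of-page timestamp issue shows up clearly in a bisection seek. This simplified sketch bisects over a list of per-page granule positions rather than byte offsets in a real file, an assumption that skips the resync step a real demuxer needs after each jump; the function name is made up.

```python
def seek_page(granules: list, target: int) -> tuple:
    """Bisect over pages (in file order) whose granule position is the sample
    count at the END of each page.

    Returns (index of the first page whose end position is >= target,
    number of probes). Each probe stands in for a seek plus a page read;
    over a network, each one is a round trip.
    """
    lo, hi, reads = 0, len(granules), 0
    while lo < hi:
        mid = (lo + hi) // 2
        reads += 1
        if granules[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo, reads
```

With 1,000 pages this takes at most 10 probes. Because each position marks where a page ends, the target sample lies somewhere inside the returned page, so decoding starts at the beginning of that page (in practice often a little earlier, since codecs like Vorbis need the previous packet for overlap).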
I almost gave up at one point and went to a container format of my own which would have worked much better. Header: 'CONTAINER v1'. Packet: 'MAGIC', 4 byte Length, 4 byte Output pos. Job done. The sad fact is, that's easier than Ogg, smaller than Ogg (unless you're talking really low bit rate), and does entirely the job of Ogg without the complexity.
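The commenter's home-grown format is simple enough to sketch in a few lines. Byte order (big-endian here), the exact magic strings, and the function names are my assumptions; as described it carries a single stream, so multiplexing several streams would need something like a stream number in each packet header.

```python
import io
import struct

FILE_HEADER = b"CONTAINER v1"
PACKET_MAGIC = b"MAGIC"
# Per-packet header: magic, 4-byte payload length, 4-byte output position.
PACKET_HEADER = struct.Struct(">5sII")


def write_container(packets: list) -> bytes:
    """packets: list of (output_pos, payload_bytes) tuples."""
    out = io.BytesIO()
    out.write(FILE_HEADER)
    for pos, payload in packets:
        out.write(PACKET_HEADER.pack(PACKET_MAGIC, len(payload), pos))
        out.write(payload)
    return out.getvalue()


def read_container(data: bytes) -> list:
    """Inverse of write_container; no recovery from truncation or corruption."""
    if not data.startswith(FILE_HEADER):
        raise ValueError("missing file header")
    offset = len(FILE_HEADER)
    packets = []
    while offset < len(data):
        magic, length, pos = PACKET_HEADER.unpack_from(data, offset)
        if magic != PACKET_MAGIC:
            raise ValueError(f"bad packet magic at offset {offset}")
        offset += PACKET_HEADER.size
        packets.append((pos, data[offset:offset + length]))
        offset += length
    return packets
```

That is 13 bytes per packet, with no checksum and no lacing, which is where the size advantage comes from.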
I'm probably going to add a Matroska container to my codec just to see how easy they are to produce. The spec looks fantastic, but the devil's always in the details - although seeing the praise on various (engineer) forums, it looks like the way to go.
So, Ogg, please die. We need you to get out of the way.
NUT is another alternative, which is open, simple, and well designed. Like Matroska, it is also capable of containing Theora and Vorbis streams, so there really is no good reason to use the Ogg container anymore. The author of the article is correct: the Ogg container is an awful format.
The main complaints about Matroska are twofold. First, the EBML encoding is overly complicated: it requires a considerable amount of code to parse, and also imposes an unnecessary degree of overhead. The second is a much more serious problem: a Matroska file can only contain one timebase. Thus, in order to mux streams with different timebases, approximation is required. To accurately represent the converted timebases, it is necessary to use a much finer granularity, and then you also lose the exact timestamps.
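The rounding problem with a single timebase can be shown with exact fractions. This sketch assumes a 1 ms timebase (Matroska's default timestamp scale) and a 44.1 kHz audio stream; the `rescale` helper is hypothetical.

```python
from fractions import Fraction


def rescale(ts: int, src_tb: Fraction, dst_tb: Fraction) -> int:
    """Convert a timestamp between timebases, rounding to the nearest tick."""
    return round(ts * src_tb / dst_tb)


AUDIO_TB = Fraction(1, 44100)           # one tick per sample at 44.1 kHz
MS_TB = Fraction(1, 1000)               # 1 ms ticks
NS_TB = Fraction(1, 1_000_000_000)      # 1 ns ticks
```

A 1,024-sample packet boundary becomes `rescale(1024, AUDIO_TB, MS_TB)`, i.e. 23 ms, and converting 23 ms back gives sample 1,014, not 1,024. With a nanosecond scale the sample index survives the round trip in this case, but the stored value is still only an approximation of 1024/44100 s.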
The NUT specification and code is available from svn://svn.mplayerhq.hu/nut, and the (de)muxers are included in MPlayer/FFmpeg, VLC, and probably elsewhere.
Overall, most experts would agree that Theora is still a good codec but it seems like the latest talk is all about Dirac: http://en.wikipedia.org/wiki/Dirac_%28codec%29 and http://diracvideo.org which is a very strong contender. It has suddenly gained backing from a number of the major corporations who were previously in favor of H.264... This is good news since Dirac offers much better quality than either of the other codecs, is royalty-free, and released under either GPL2, LGPL2, or MPL.
The great thing about Matroska is that it supports (or at least can support) absolutely everything.
The main drawback of Matroska is that it supports (or at least can support) absolutely everything.
Thanks! I had forgotten that Fight Club is on tonight on TCM.
SMPTE (in coordination with the European Broadcasting Union and other groups) developed the Material eXchange Format (MXF) container.
MXF is exceedingly flexible. Many MXF-wrapped files play back in VLC today.
This article looks at the view from the player, but there are authoring issues also.
A key feature of ISO MPEG-4 is it is based on the QuickTime file format that authoring tools all speak. So adding ISO MPEG-4 support to an authoring tool was a minimal job, and it happened quickly and broadly. The fact that ISO MPEG-4 was a standardization of the QuickTime we were already using was very practical. Very much like how many HTML5 features are things the browsers or authors were already doing.
This all happened years ago, of course. The funny thing about the patent debate is that the ISO MPEG-4 patents will expire before we could ever move everybody over to Ogg. And until then, it is a patent pool with cheap licensing that can't be denied to anyone (like GSM in phones), with protection from liability and no content tax. Hardly the terrible evil it is made out to be.
But even so, the author is right: you have to bring the tech first. It has to be practical.
why is a completely general video container format actually useful? [...] rather than a file that is decoder specific?
A container format is needed because media files contain many different things using different codecs: video stream(s) using some video codec, audio streams using audio codecs, subtitles in various forms, timecode tracks, metadata, etc. Without a container format to bring all these together, your videos would be without sound unless you also have the audio file and manage to start both at the same time so they start in sync. If you also need subtitles, that would be 3 files to find and start at the same time.
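To sketch what "bringing them together" means in practice: a muxer takes per-stream packets that each carry a timestamp and interleaves them, so everything needed around a given moment sits close together in the file. A minimal sketch, assuming each stream is already sorted by timestamp (the names and tuple layout are hypothetical):

```python
import heapq


def interleave(*streams):
    """Merge per-stream packet lists into one sequence ordered by timestamp.

    Each stream is a list of (timestamp_seconds, stream_name, payload) tuples,
    already sorted by timestamp. heapq.merge is stable, so on equal timestamps
    packets from earlier-listed streams come first.
    """
    return list(heapq.merge(*streams, key=lambda pkt: pkt[0]))
```

For example, video frames every 40 ms, audio packets every 20 ms and one subtitle at 30 ms come out as video, audio, audio, subtitle, video, audio, which is what lets a player read the file front to back and keep everything in sync.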
Alright, so despite doing a bit of reading on wikipedia I'm still pretty puzzled about exactly how container formats can work. Setting aside fancy features like user menus and things like chaptering it seems to me the primary purpose of a container format is to do two things.
1) Define a format to multiplex many different data streams, e.g., allow packets in the audio stream, video stream, subtitle data to be interleaved so the right data can be available when needed (putting all the video stream first then the audio stream would be a bad idea).
2) Provide synchronization information to let arbitrary video, audio, and subtitle formats coordinate their display.
----
Now 1 seems relatively straightforward. What confuses me is part 2. I mean if we were encoding video as a simple list of pictures and audio in pcm this would seem straightforward. Each packet encoding a video frame gets tagged with the frame number and each audio packet gets tagged with the frame number it should be played with.
However, how does this work given the fact that to display frame 10034 the video codec may need to use information from frame 10000 and similarly with the audio codec. So if I want to jump to frame 10034 the player needs to know to look back at the info for frame 10000. I mean I can think of various ways this might be done but they all would seem to require particular knowledge about how the individual streams work.
Could someone explain how these work or give me a pointer to a good explanation?
Thanks
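One concrete answer, at least for Ogg: the stream's own mapping can encode where the last keyframe is. Theora's Ogg mapping, as I understand it, puts the frame number of the most recent keyframe in the upper bits of the granule position and the count of frames since that keyframe in the lower bits, with the shift given in the stream header. The sketch below uses a shift of 6 as an example value and hypothetical function names; other containers usually flag keyframe packets or keep a separate index instead (MP4's sync sample table, for example).

```python
def make_granulepos(keyframe: int, offset: int, shift: int) -> int:
    """Pack (keyframe frame number, frames since keyframe) into one integer."""
    if not 0 <= offset < (1 << shift):
        raise ValueError("too many frames since the last keyframe for this shift")
    return (keyframe << shift) | offset


def split_granulepos(granulepos: int, shift: int) -> tuple:
    """Recover (keyframe frame number, frames since keyframe)."""
    keyframe = granulepos >> shift
    offset = granulepos & ((1 << shift) - 1)
    return keyframe, offset
```

So a page stamped with frame 10034 also tells the player that decoding must begin at keyframe 10000. This confirms the intuition in the question: the container alone can't know which frames are keyframes unless the muxer, which does understand the codec, records it somewhere.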
Of course, a true geek uses FFmpeg (the basis for much of HandBrake). Or MEncoder, for the slightly less geeky.
Now mouse over
Ogg isn't bad, it's just different.
and it will turn into:
Ogg isn't dead, it's just resting.
Even more fun:
Put your mouse cursor right at the end of the t in different.
This is far enough to the right that when the mouseover activates, it shrinks different to resting and your mouse is now outside the mouseover, so it goes into an infinite loop.
Well, Risen, aren't you a little confused?
Ogg is the container format, which can contain sound (Vorbis) or video (Theora).
I'm guessing your Sansa Fuze does not play Theora, even if it claims to understand Ogg. Which was exactly the point of HairyFeet.
That shitload of audio players you mentioned play *VORBIS*, not *THEORA*. There is a difference, one being audio, the other being video. Both are Ogg files. Neat, eh?
YouTube/Flash used H.263 (Sorenson Spark) earlier. You know, back when there were internet videos on YouTube but Flash didn't support H.264 yet.
There are no technical shortcomings in the format!
"You didn't understand his point. The latency is inherent in Ogg due to the large pages (not packets) required to reduce its size overhead, and in the position of the CRC (at the front of the page rather than the end). Reducing the page size makes the page headers start taking significant percentages of size if it's a low bit rate stream, e.g internet audio. "
Can you please suggest another format that does single-packet-latency operation with variable packet sizes and variable temporal duration?
MOV (/MP4) can't, AVI can't, MKV can't. RTP/UDP (which isn't normally written to files) does, but its overhead is similar to Ogg's in this use case.
"ou end up just making copies of packets out of the stream, which is inefficient. In fact, that's exactly what the official Xiph codecs do: they make ugly copies."
The tremor vorbis decoder library (the one for fixed point / low memory devices) includes an alternative version of libogg which is zero copy.
Try again.
And the whole VIDEO tag question will get solved, in the long run. Just not with Theora.
But with some next-gen video format which will:
- offer better quality than H.264 at the same bitrate
- get OpenCL acceleration (hardware acceleration on anything with a GPU)
- be patent-free
I'm personally looking toward the Wavelet crowd (Dirac/Schroedinger, Tarkin, and the like).
But Google's acquisition of On2 and its latest codec, VP8, could be another pointer toward that future.
People would still keep their old collection ripped with x264.
But if a new codec emerges with a better compression ratio than anything currently used on Blu-ray discs, and thus enables better quality and smaller rips, it will be an overnight success on The Pirate Bay and thus get mass adoption.
And if TFA is right, Matroska could perhaps become the preferred container for those too (although I'm currently noticing better real-world performance with Ogg containers than with MKV ones).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
The whole debate is stupid, firefox needs to just use the operating system's built in codecs to play h264. Problem solved.
Yup. Let's swap one binary blob (Flash, Silverlight) for another one, a system codec, which might not have been designed with the Web in mind, or could even be absent from the system (most versions of Windows have no H.264 decoder by default).
The whole idea of a standard is to specify something which could then be implemented by anyone wishing it.
The patent trolls hanging around h264 just prevent that.
But the victory won't be decided between H.264 and Theora.
The biggest standards victory will be won by whoever manages to be the next MP3, MPEG-4 Part 2/DivX, or MPEG-4 Part 10/H.264: the next format to be vastly superior to what is currently available and thus be an instant success as a sharing format.
So let's better get the big brains of the programming world working on that one.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]