Ogg Format Accusations Refuted
SergeyKurdakov sends in a followup to our discussion a couple of months ago on purported shortcomings to the Ogg format. The inventor of the format, Monty "xiphmont" Montgomery of the Xiph Foundation, now refutes those objections in detail, with the introduction: "Earnest falsehoods left unchallenged risk being accepted as fact." The refutation has another advantage besides authoritativeness: it's far better written than the attack.
http://xkcd.com/386/
Summary so far:
Many of the complaints levied against Ogg were not about its technical merits, but about its inadequate documentation -- a feature Matroska shares. Other complaints were about features of Ogg (such as mappings) which nearly every other container format has as well. ... I've only gotten about a quarter of the way through, so far.
... the last time we discussed this, didn't the consensus eventually become that ogg isn't a fun container to work with, despite the fact that the guy who wrote the rant about it was a moron for wanting to trim headers that contribute fractions of percents to the overall size of files? I know I personally have worked with ogg, and it was a pain in the ass, mostly because (as the author of the format admits) the documentation blows.
"Whenever you want information on the 'net, don't ask a question; just post a wrong answer."
-- Cancer Omega
The article is itself basically a summary of the format. If you don't care enough to read the article, probably all you should be worried about is what your iPod will and won't play.
Nearly every other container format+codec has exactly two bits that are codec dependent: an identifier (e.g. 'XVID' or "V_MPEG4/ISO/AVC" or a number) and binary private data/codec-specific init data/whatever you want to call it. Some codecs in some containers additionally define one bitstream, if the codec has multiple possible (h.264).
Timestamps, dimensions, aspect ratio, framerate, samplerate, etc. are stored in codec-independent ways in the container.
Ogg is not like that at all. The only thing it stores in a codec-independent manner is framing. Every other piece of information you might expect a container to have is stored in a codec-dependent manner. Even metadata!
I have no fucking clue why the creator does not see this as the problem that it is for everyone that tries to work with ogg.
Oh, it's far worse than that: one of his competitors thinks it's not a good format.
"Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
What possible use could you have for obtaining time stamps within a video stream that you cannot decode? As far as I'm concerned, a container format should provide enough information to determine two things:
Although there might be advantages of having other data encoded in a consistent fashion for people writing debug tools, when it comes to general software, as long as the CODEC software provides a standard set of accessor functions that return the data in a consistent way across all CODECs, it is by no means a requirement that they be stored in the same way, and in terms of the format's long-term flexibility, it is advantageous to allow the data to be stored in a codec-specific fashion.
Check out my sci-fi/humor trilogy at PatriotsBooks.
Funny, I thought the goal was to get away from a patent-encumbered format. Does Ogg work? Is it reasonably close to MP3/4? I believe the answer is yes to both. Whether Ogg is as efficient as MP3/4 I cannot really say, because I am not that technically versed. If a standard for HTML5 video is adopted, it should and must be patent-unencumbered. Rather than this nitpicking, I would love to see that same energy poured into improving Ogg. Like any design, Ogg can be improved upon to reach the same robustness as MP3/4.
A better question is: why should the demuxer care about whether or not you can decode a given codec?
There has been absolutely nothing new with regard to codec timestamps since MPEG-1 introduced the concept of out-of-order coding and B frames. Ogg was developed nearly a decade after that. Thus, there is and was no reason whatsoever to make timestamps codec-dependent.
And you're ignoring the problem that with ogg you have to hunt down and read the spec of every single codec that you want to implement demuxing support for, and that it is impossible to have, say, a generic lightweight file analyzer that tells you duration, codecs used, metadata, samplerate, framerate, etc.
And more importantly, they're wrong, in the eyes of its developer.
It's a cogent flame of his critics, but it also exposes what are plainly design differences, and his critic's un-nuanced eye. You have to appreciate someone who can split hairs so finely when taking a set of arguments apart. I like thinkers.
---- Teach Peace. It's Cheaper Than War.
The best way to get documentation out of a project is trash talk it until a developer gets into such a frothy rage he explains it in a manner "even an idiot could understand." Used to do this all the time in the early years of Linux, worked like a charm :-)
As far as I'm concerned, a container format should provide enough information to determine two things:
Basically, what you just wrote is "there shouldn't be containers."
Is that really your position? I certainly can understand it. It has that quality to it that any hack can go ahead and start coding to handle it immediately, which is great. But checking with reality, we seem to have so many container formats because ID/LEN is just not enough for purposes.
"His name was James Damore."
I should preface this by admitting my ignorance in this area. However, one thing that occurs to me about your complaint: isn't this really how the OSI model works? The higher level (container) has the info it needs to pass its payload along to the next level, like HTTP being a payload in the data of IP, and so on. Now, I cannot speak to whether this makes sense in the context of media storage, but parsing deep into the media itself would seem to be out of scope for a container, and would end up being a crutch that could break later for as-yet-unimagined content.
From the article:
When MPEG-1 started it closely followed H.261. H.261 was very well written. Back in 1994 when Xiph started, MPEG-1 had already been going 6 years.
Ogg is full of strange fields and difficult to read structures. The author of the criticism is right to question it, especially when Ogg used similar fields but changed the names. There was never any need to change terminologies. H.261 and MPEG-1 were well written standards but not freely available and included patented technologies. The "not freely available" means that you have to buy it, not that it's secret.
If Xiph wanted to produce a free standard for video coding they could easily have adopted the same terminologies and similar structures, defining their own versions of them and recommending unpatented technologies. Instead they chose their weird terminology and rushed to come out with something different without spending the time to work out how difficult it would be for users to implement and what quality it would give. H.261 and MPEG were backed up by masses of research by companies and universities of which much was freely available in journals and conference proceedings.
The idea that "MPEG was hardly dominant" is the thought of someone who either didn't do his homework at the time or is a revisionist. VCD (created 1993) was massively popular in the second half of the nineties, or doesn't that count?
From the summary:
I wish it had been. If you want to refute a rant, pick some illustrative points and clearly answer them. Don't pick apart the text, all of it, sentence by sentence. Fancy colouring and highlighting don't make it better written.
My rant with Ogg is not so much the minute details of the format itself but that it works badly in a few common real world cases:
I know it's all been said before, but these are pretty common cases and Ogg isn't great when you have to deal with them. Everything else is nit-picking. I'm not a fan of the minute details of the format either, to be honest, but the above are real world examples of where it falls a little short. I should add that none of these issues make it unusable in any of those situations: just annoying.
He didn't say it was good, he explained why it is good.
Dilbert RSS feed
And you're ignoring the problem that with ogg you have to hunt down and read the spec of every single codec that you want to implement demuxing support for, and that it is impossible to have, say, a generic lightweight file analyzer that tells you duration, codecs used, metadata, samplerate, framerate, etc.
From the article:
"This is commonly asserted by detractors, but a combination of false and missing the point.
Ogg transport is based entirely on the page structure primitive, described accurately above. There are no other structures in the container transport itself. Higher level structures are built out of pages, not built into them. All Ogg streams conform to this page structure and all Ogg streams are parseable and demuxable without knowing anything about the codec. "Drop the needle" anywhere in an Ogg stream and start demuxing; you get the codec data out without knowing anything about the codec. You possibly won't know what exactly to do with that data without the codec mapping and the data is possibly useless without the codec anyway, but that's true of every container.
To avoid being accused of sidestepping the issue, I posit that the actual [if unstated] objection is that the Ogg container does not fully specify the granule position in the transport specification. Beyond a few requirements, a codec mapping defines the granule position spec for that codec's streams, not the Ogg spec. In theory, this would mean that without codec knowledge or some other place to find the granule position definition, a decoder missing the codec for a given stream would not be able to determine the timestamp on the stream that it is not capable of decoding anyway. In practice, the granule position mapping does in fact exist in the stream metadata within the Skeleton header[7] (as it would be in Matroska or NUT). Additionally, the Ogg design allows implementations to ignore the pretty design theory and just do things the way other containers do by building granule position calculation into the mux implementation.
There's specific considered reasons for the granulepos design which take some space to explain accurately. Because Mr. Rullgard also wrote a lengthy diatribe against Ogg timestamping[8], I'll leave the explanation for there and link to it here when my response to the other article is live."
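To make the "drop the needle" claim concrete, here is a minimal Python sketch of codec-agnostic page parsing. The field layout follows the Ogg page header as described in RFC 3533; the names `OggPage` and `parse_page` are made up for illustration, and the sketch neither verifies the CRC nor reassembles packets that span pages.

```python
import struct
from typing import NamedTuple, Tuple

HEADER_LEN = 27  # fixed part of an Ogg page header, before the segment table


class OggPage(NamedTuple):
    version: int
    header_type: int        # bit flags: continued packet, first page, last page
    granule_position: int   # meaning is defined by the codec mapping
    serial: int             # identifies the logical stream
    sequence: int
    checksum: int           # read but NOT verified in this sketch
    body: bytes


def parse_page(buf: bytes, offset: int = 0) -> Tuple[OggPage, int]:
    """Parse one page at `offset`; return it plus the offset of the next page."""
    if len(buf) < offset + HEADER_LEN:
        raise ValueError("truncated page header")
    if buf[offset:offset + 4] != b"OggS":
        raise ValueError("no OggS capture pattern at this offset")
    # All multi-byte header fields are little-endian.
    version, header_type, granule, serial, sequence, checksum, nsegs = \
        struct.unpack_from("<BBqIIIB", buf, offset + 4)
    table_start = offset + HEADER_LEN
    segment_table = buf[table_start:table_start + nsegs]
    if len(segment_table) != nsegs:
        raise ValueError("truncated segment table")
    body_start = table_start + nsegs
    body_len = sum(segment_table)  # each entry is one lacing value, 0..255
    body = buf[body_start:body_start + body_len]
    if len(body) != body_len:
        raise ValueError("truncated page body")
    page = OggPage(version, header_type, granule, serial, sequence, checksum, body)
    return page, body_start + body_len
```

Nothing here looks at the codec: the serial number is enough to route the body to the right logical stream. What the sketch cannot tell you is what `granule_position` means in seconds, which is the point being argued in this thread.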
Just how much money is MPEG-LA making on their patent pool? How much are they spending on bad mouthing OGG to preserve/increase their income?
Treat any criticism of proprietary product competitors with a very large grain of salt.
Particularly against free competitors since it's legally safer as they often don't have the legal resources to fight half-truths and innuendo.
Good to see Monty's refutation.
---
Anonymous company communication is unethical and can and should be highly illegal. Company legal structures require accountability.
I wouldn't assume that, just because the OSI model works that way, it's the right model for a video container format.
And, given the plethora of systems out there that have had to add functionality to introspect higher layers while routing lower layers, I wouldn't even assume the OSI model is actually the right one for networking, either.
>Ogg transport is based entirely on the page structure primitive, described accurately above. There are no other structures in the container transport itself. Higher level structures are built out of pages, not built into them.
And my argument is that a container should provide more than just framing. Hell, many codecs provide framing themselves and don't need container framing.
>All Ogg streams conform to this page structure and all Ogg streams are parseable and demuxable without knowing anything about the codec.
Only in the sense that you can find frame boundaries, not in the sense that you can do anything useful with them whatsoever. And indeed, the only thing you can do is drop and ignore pages for streams with codecs you don't recognize.
>and the data is possibly useless without the codec anyway, but that's true of every container.
Dead wrong. As a trivial example, remuxing.
And didn't you just say that you can parse streams without knowing about the codec?
>In practice, the granule position mapping does in fact exist in the stream metadata within the Skeleton header
Too bad that in practice, I've seen a skeleton header maybe once. And anything optional is guaranteed to be missing in many cases. Thus to demux a new codec you still have to find the codec spec, find the ogg mapping, write the granule demangler, write a parser for the codec headers, etc. instead of adding a single entry to a table like you would for sane containers.
>Additionally, the Ogg design allows implementations to ignore the pretty design theory and just do things the way other containers do by building granule position calculation into the mux implementation.
I'm not really sure what he's talking about here, but ogg certainly doesn't allow you to do like other containers do and store the unmangled timestamp.
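For anyone wondering what a "granule demangler" amounts to, here is a rough Python sketch for the two codecs that come up in this thread. It assumes the usual mappings as I understand them: a Vorbis granule position is a running PCM sample count, and a Theora granule position packs the keyframe's frame number in the upper bits and the count of frames since that keyframe in the lower `granule_shift` bits. The function names are invented, the shift and frame rate would have to be read from the codec's own headers, and the version-dependent off-by-one in Theora frame numbering is ignored.

```python
from fractions import Fraction


def vorbis_time(granulepos: int, sample_rate: int) -> Fraction:
    """Seconds of audio up to the given granule position (a sample count)."""
    return Fraction(granulepos, sample_rate)


def theora_time(granulepos: int, granule_shift: int,
                fps_num: int, fps_den: int) -> Fraction:
    """Approximate seconds for a Theora granule position.

    Upper bits: frame number of the last keyframe.
    Lower `granule_shift` bits: frames since that keyframe.
    """
    keyframe = granulepos >> granule_shift
    since_keyframe = granulepos & ((1 << granule_shift) - 1)
    frame_index = keyframe + since_keyframe
    return Fraction(frame_index * fps_den, fps_num)
```

The two functions share nothing but their return type, which is the complaint in a nutshell: each new codec needs its own function like these, where other containers store the result directly.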
Mans Rullgard:
"Ogg considered harmful"
Monty Montgomery:
""Ogg considered harmful" considered harmful"
entropy happens
It looks like it's depending on the information from each codec though. What happens if you don't have one of the codecs installed?
You're wrong, but instead of pointing out why, I'll just note that my teeth and hairstyle are better than yours, and that my opinion is the opinion of a proven winner. Only disagreeable people would disagree with me!
(This comment is known to cause cancer in the state of California.)
Too bad that in practice, I've seen a skeleton header maybe once. And anything optional is guaranteed to be missing in many cases. Thus to demux a new codec you still have to find the codec spec, find the ogg mapping, write the granule demangler, write a parser for the codec headers, etc. instead of adding a single entry to a table like you would for sane containers.
I think this speaks to your own inexperience more than anything else. Here's an ogg video with a Skeleton stream:
http://videos.videoonwikipedia.org/video/275/cell-phone-engineerguyogv
You can find many more with Skeleton streams at http://videos.videoonwikipedia.org or http://openvideo.dailymotion.com or http://www.archive.org or many other sites. I can only conclude that you are not very knowledgeable about ogg usage in practice.
Okay, I'll uninstall libtheora from my Fedora 12 desktop:
# rpm --nodeps -e libtheora-1.1.0-1.fc12.x86_64 libtheora-1.1.0-1.fc12.i686
Totem no longer plays Elephants_Dream.ogg because libtheora is not installed. It outputs a warning:
$ totem Elephants_Dream.ogg
(totem:8219): GStreamer-WARNING **: Failed to load plugin '/usr/lib64/gstreamer-0.10/libgsttheora.so': libtheoraenc.so.1: cannot open shared object file: No such file or directory
Let's try ogginfo now:
$ ogginfo Elephants_Dream.ogg
Processing file "Elephants_Dream.ogg"...
New logical stream (#1, serial: 1e05b679): type skeleton
New logical stream (#2, serial: 72ba3177): type theora
New logical stream (#3, serial: 30fa15ff): type vorbis
Theora headers parsed for stream 2, information follows...
Version: 3.2.1
Vendor: Xiph.Org libThusnelda I 20081201
Width: 426
Height: 240
Total image: 432 by 240, crop offset (0, 0)
Framerate 24/1 (24.00 fps)
Aspect ratio undefined
Colourspace: Rec. ITU-R BT.470-6 Systems B and G (PAL)
Pixel format 4:2:0
Target bitrate: 0 kbps
Nominal quality setting (0-63): 32
User comments section follows...
ENCODER=ffmpeg2theora-0.24
Vorbis headers parsed for stream 3, information follows...
Version: 0
Vendor: Xiph.Org libVorbis I 20080501
Channels: 2
Rate: 48000
Nominal bitrate: 80.000000 kb/s
Upper bitrate not set
Lower bitrate not set
User comments section follows...
ENCODER=ffmpeg2theora-0.24
Logical stream 1 ended
Vorbis stream 3:
Total data length: 4986375 bytes
Playback length: 10m:53.696s
Average bitrate: 61.023779 kb/s
Logical stream 3 ended
Theora stream 2:
Total data length: 30621851 bytes
Playback length: 10m:53.791s
Average bitrate: 374.698578 kb/s
Logical stream 2 ended
Information output is the same. Reinstall libtheora:
# yum install libtheora.x86_64 libtheora.i686
And totem plays Elephants_Dream.ogg once again.
That's because MPEG Transport Streams have an easily-accessible Presentation Time Stamp (PTS) in each GOP header, and it's reasonably easy to calculate the increment between PTSs (which will vary with framerate). The simplistic explanation is that the GOP header has the bit rate* & framerate; you can calculate the PTS increment either from the framerate or examining adjacent blocks, you then check the current PTS, calculate the desired PTS from that, and can then jump to the appropriate part of the file to find the PTS you're after.
(That's assuming you're working with a TS file, where the player can examine the first & last block to determine file length. With streaming, you're restricted to working with what's in the buffer (& hopefully your app knows how long the buffer is, since it allocated it!))
Ogg, AFAIK, doesn't have that info in the block header - IIRC it relies on the bitstream having presentation timing stored in it (i.e. none, in the case of most audio formats), which means you have to decode the block to find it. It was done that way to allow for variable framerates to be stored without having to build a huge index. MKV is a bit better in this respect, but it's a remarkably fragile container.
* It falls down a bit sometimes, particularly where the bitrate in the block header is set to max (15Mbps), or where you're using VBR. With the latter the calculation will usually get you in the ballpark; with both cases, some splitters/decoders calculate the bitrate themselves while playing, store it, and use that for seeking.
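The arithmetic behind that kind of seek can be sketched in a few lines of Python. The only hard fact used is that MPEG PTS values tick on a 90 kHz clock; the function names are hypothetical, the bitrate is whatever estimate the player has, and wraparound of the 33-bit PTS counter is ignored.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # MPEG presentation timestamps tick at 90 kHz


def pts_increment(fps_num: int, fps_den: int = 1) -> Fraction:
    """PTS ticks between consecutive frames at a constant frame rate."""
    return Fraction(PTS_CLOCK_HZ * fps_den, fps_num)


def estimate_offset(current_offset: int, current_pts: int,
                    target_pts: int, bitrate_bps: int) -> int:
    """Guess the byte offset of `target_pts` from a known (offset, PTS) pair.

    Assumes a roughly constant bitrate; with VBR this only lands in the
    ballpark and the player has to read the PTS there and correct.
    """
    delta_ticks = target_pts - current_pts
    delta_bytes = delta_ticks * bitrate_bps // (PTS_CLOCK_HZ * 8)
    return max(0, current_offset + delta_bytes)
```

For example, `pts_increment(25)` is 3600 ticks per frame, and jumping ten seconds ahead in an 8 Mbps stream moves the estimate by ten million bytes.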
What part of "a well regulated militia" do you not understand?
That's great. What container are they going to put it in again?
(Less snarky version: VP8 is a codec. Ogg (& MKV, & AVI, & etc) are container formats that hold data encoded with various codecs. The situation is muddied somewhat with MPEG, as the various versions encompass both a codec and a container. DivX too, but the DivX container is nothing more than a bastardised .AVI container containing video encoded with the DivX codec.)
What part of "a well regulated militia" do you not understand?
No, that's not at all what I wrote. The fundamental purpose of a container is to consolidate multiple pieces of information into a single file. Without containers, you would need one file for your audio data and a separate one for your video data. You should not need to understand the video data to play the audio data; it is sufficient to know its length. And *that* is what I wrote.
Check out my sci-fi/humor trilogy at PatriotsBooks.
* Mandatory per-page CRC forces low-latency streaming to use single packet per page. Demux cannot continue before an entire page is received, which increases latency by the number of packets in a page (minus 1). Per-packet or even no CRC would be more appropriate.
I don't know anything about Ogg, but you're forced to a single "bigger unit per codec packet" for very low latency with almost all containers, CRC or not. What forces you is either length coding (either coding of the whole bigger unit size or the individual packet sizes), or the encoding of the "bigger unit" timing information. At least Ogg does appear to support moderately low latency for that single packet case.
Can you suggest another format which does zero (container) latency better while still being low overhead? I can think of some things which are about half the overhead compared to ogg in the single packet case, but they retain the high (several percent) overhead even when you don't care about very low overhead.
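To show why the per-page CRC ties latency to page size, here is a small Python sketch of the check. It assumes the parameters as I understand them from libogg: polynomial 0x04c11db7, zero initial value, no bit reflection, no final XOR, stored little-endian in header bytes 22 to 25 and computed with that field zeroed. The helper names are made up, and the bitwise loop is written for clarity rather than speed.

```python
OGG_CRC_POLY = 0x04C11DB7
CRC_FIELD = slice(22, 26)  # position of the checksum inside the page header


def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32, MSB first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ OGG_CRC_POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def with_crc(page: bytes) -> bytes:
    """Return a copy of `page` with its checksum field filled in."""
    zeroed = page[:22] + bytes(4) + page[26:]
    return zeroed[:22] + ogg_crc(zeroed).to_bytes(4, "little") + zeroed[26:]


def page_crc_ok(page: bytes) -> bool:
    """Verify a complete page; every byte of it must already be available."""
    stored = int.from_bytes(page[CRC_FIELD], "little")
    zeroed = page[:22] + bytes(4) + page[26:]
    return ogg_crc(zeroed) == stored
```

Because `page_crc_ok` needs the last byte of the page before it can say anything, a demuxer that honours the checksum cannot hand out the first packet of a page until the last one has arrived.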
ID and LENGTH is not a "container" by any definition that I have ever heard of or used in practice.
What you are describing is a common ordinary linked list.
None of the containers that I am aware of require you to understand the video data in order to play the audio data, so what the heck are you actually getting on about? That "containers" should be ordinary linked lists?
In reality, that's not fit for purpose. That media file contains at least two streams, and while each stream can be treated as independent, they can also be treated as semi-dependent. There exists information that is shared between streams. For example, metadata.
If I am not required to decode the video stream, then you can't put the shared metadata in the video stream. If I am not required to decode the audio stream, then you can't put the shared metadata in the audio stream. So what then?
And thus, the media container is born. Linked lists just don't cut it. These formats are more than linked lists for a real (and I gave only one of them) reason.
"His name was James Damore."
From TFA:
An index is only marginally useful in Ogg for the complexity added; it adds no new functionality and seldom improves performance noticeably. Why add extra complexity if it gets you nothing?
You can do seeking without an index:
A binary search is discussed in the spec for ease of comprehension; implementation documents suggest an interpolated bisection search. So far, this is the same as Matroska and NUT.
The only difference being, Matroska implementers tend to be lazy about implementing the indexless seeking properly, and people tend to use indexes, thus propagating this myth even more.
The Vorbis source distribution includes an example program called 'seeking_example' that does a stress-test of 5000 seeks of different kinds within an Ogg file. Testing here with SVN r17178, 5000 seeks within a 10GB Ogg file constructed by concatenating 22 short Ogg videos of varying bitrates together results in 17459 actual seek system calls. This yields a result of just under 3.5 real seeks per Ogg seek request when doing exact positioning within an Ogg file. Most actual seeking within an Ogg file would be more appropriately implemented by scrubbing with a single physical seek.
And there you go. I don't know WTF is wrong with your players, but really, how can a total of four seeks bring your system to a crawl?
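For the curious, indexless seeking can be sketched in Python against a simulated file. This is a plain bisection, not the interpolated variant the article mentions, so it needs more probes than the figure quoted above. `FakeOggFile` and `seek_granule` are invented names; `probe` stands in for "seek to a byte offset and scan forward to the next page header", and granule positions are assumed to increase with file offset.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Page = Tuple[int, int]  # (byte offset of page start, granule position)


class FakeOggFile:
    """Stand-in for a single-stream Ogg file; counts simulated seeks."""

    def __init__(self, pages: List[Page], size: int) -> None:
        self.pages = pages
        self.size = size
        self.seeks = 0
        self._offsets = [offset for offset, _ in pages]

    def probe(self, offset: int) -> Optional[Page]:
        """Return the first page starting at or after `offset`, if any."""
        self.seeks += 1
        i = bisect_left(self._offsets, offset)
        return self.pages[i] if i < len(self.pages) else None


def seek_granule(f: FakeOggFile, target: int) -> int:
    """Byte offset of the last page whose granule position is <= target.

    Falls back to the first page if every page is past the target.
    """
    lo, hi = 0, f.size
    best = f.pages[0][0]
    while lo < hi:
        mid = (lo + hi) // 2
        page = f.probe(mid)
        if page is None or page[1] > target:
            hi = mid              # everything from mid onward is too late
        else:
            best = page[0]        # a valid candidate; look for a later one
            lo = page[0] + 1
    return best
```

Each loop iteration at least halves the byte range, so the probe count grows with the logarithm of the file size. Interpolating the guess from the granule positions already seen, instead of always taking the midpoint, is what brings the count down further in real implementations.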
Don't thank God, thank a doctor!
Ah, the gotcha is in the source:
http://svn.xiph.org/trunk/vorbis-tools/ogginfo/ogginfo2.c
Ogginfo's source includes information on how to process the metadata for various codecs.
So, the grandparent's complaint is still valid. Ogginfo appears to require recompilation for every stream type it is meant to support inside an Ogg container.
A DVD is MPEG-PS, not MPEG-TS. Your cable system and satellite feed are TS. Both are built on top of the PES layer.
MPEG-2 is the reason I have no hair left on my head.
The PTS isn't stored in the GOP header. The GOP header is defined in part 2 of the spec, the PTS is in part 1. So the PTS and the DTS are in the PES header. MPEG frames are typically sent out of order. You'll need to do a lot of decoding to figure out the frame rate from the PTS. The bitrate is just as tricky to determine if you're just looking at one layer. A transport multiplexer needs to know a lot about the video it's multiplexing to be able to maintain the proper bitrate and order the frames correctly, etc. etc.
So basically, without understanding your elementary stream to some degree, you can't do much with it in the system layer. Even something seemingly simple, like remuxing, isn't easy to do without knowing something about the structure of the elementary stream. I'm guessing that this is why Ogg doesn't even try to pretend like you can abstract the codec enough to do something meaningful with data you know nothing about - other than skip it, that is.
I've dealt with proprietary data that was multiplexed in MPEG-2 TS before, and there's not a whole lot you can do with it without knowing what it is.
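Since the PTS lives in the PES header, reading it is at least codec-independent. Here is a short Python sketch of the 5-byte PTS field as laid out in MPEG-2 Systems: 33 bits split 3/15/15 with a marker bit after each group. The function names are mine, and locating the field inside a PES packet (start code, flags, header length) is left out.

```python
def decode_pts(field: bytes) -> int:
    """Decode the 33-bit timestamp from a 5-byte PES PTS/DTS field."""
    if len(field) != 5:
        raise ValueError("PTS field must be exactly 5 bytes")
    b0, b1, b2, b3, b4 = field
    if not (b0 & 1 and b2 & 1 and b4 & 1):
        raise ValueError("marker bits missing; not a valid PTS field")
    return (
        (((b0 >> 1) & 0x07) << 30)  # PTS[32..30]
        | (b1 << 22)                # PTS[29..22]
        | ((b2 >> 1) << 15)         # PTS[21..15]
        | (b3 << 7)                 # PTS[14..7]
        | (b4 >> 1)                 # PTS[6..0]
    )


def encode_pts(pts: int, prefix: int = 0b0010) -> bytes:
    """Inverse of decode_pts; `prefix` is the 4-bit tag ('0010' = PTS only)."""
    if not 0 <= pts < (1 << 33):
        raise ValueError("PTS must fit in 33 bits")
    return bytes([
        (prefix << 4) | (((pts >> 30) & 0x07) << 1) | 1,
        (pts >> 22) & 0xFF,
        (((pts >> 15) & 0x7F) << 1) | 1,
        (pts >> 7) & 0xFF,
        ((pts & 0x7F) << 1) | 1,
    ])
```

This gets you a timestamp without touching the elementary stream, but as noted above it does not by itself give you frame order, frame rate, or bitrate.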
What possible use could you have for obtaining time stamps within a video stream that you cannot decode?
Right, so much for Ogg.
This kind of answer, which amounts to "You shouldn't want to do that", is an absolutely certain indicator of a product that doesn't solve the problem that people actually have and never will, because when the inadequacies of the solution are pointed out, users are told they should have a different problem.
Every time I have ever been told by anyone anything like that it has been a sure indication that they have simply failed to understand the domain I am working in.
Blasphemy is a human right. Blasphemophobia kills.