Here's the XHTML5 version of the page you quoted (basically no need for a DOCTYPE, and the type="" attribute on <script> is optional for JavaScript):
    <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
      <head>
        <title>js</title>
        <script>
          <![CDATA[
          function rewrite_noscript() {
            var e = document.getElementById('noscript');
            // Repopulate with innerHTML
          }
          window.onload = rewrite_noscript;
          ]]>
        </script>
      </head>
      <body>
        <div id="noscript">
          No attempt has been made to ensure this page functions without javascript.
        </div>
      </body>
    </html>
Ogg is not the only freely available codec, and it didn't fulfill the requirements of all the Web browser vendors. We are still looking into a way to resolve this, though. Everyone agrees that the solution must be royalty free.
That's an interesting idea. I'll file it away for consideration. You can also send feedback to the lists (see e.g. http://www.whatwg.org/mailing-list#specs) or to me directly (ian@hixie.ch).
HTML5 keeps the XML syntax variant alive. The HTML5 spec actually has two syntaxes: one for text/html, called HTML, and one for XML, called XHTML. In fact, the HTML5 spec is intended to be a replacement for the XHTML 1.x specs.
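A minimal sketch of how that split plays out, assuming the syntax is chosen purely from the Content-Type header (the function name `syntaxForContentType` is made up for illustration). The XML MIME type test follows the WHATWG definition: text/xml, application/xml, or any subtype ending in "+xml". Declaring the XHTML namespace in the markup has no bearing on which parser is used.

```javascript
// Hypothetical helper: which HTML5 syntax a response is parsed with,
// decided from its Content-Type header value alone.
function syntaxForContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const essence = contentType.split(";")[0].trim().toLowerCase();
  if (essence === "text/html") {
    return "html"; // the HTML syntax, handled by the HTML parser
  }
  // XML MIME types: text/xml, application/xml, or any subtype ending in "+xml".
  const subtype = essence.split("/")[1] || "";
  if (essence === "text/xml" || essence === "application/xml" || subtype.endsWith("+xml")) {
    return "xml"; // the XML syntax (XHTML), handled by an XML parser
  }
  return "other"; // neither syntax applies
}

// A page with xmlns="http://www.w3.org/1999/xhtml" sent as text/html is
// still parsed as HTML: only the MIME type is consulted here.
console.log(syntaxForContentType("text/html; charset=utf-8")); // html
console.log(syntaxForContentType("application/xhtml+xml"));    // xml
```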
With HTML5 we are doing a few things to address the fact that authors write invalid content. One is that we are relaxing a lot of the content model requirements. Another is that we are allowing the "/>" style on elements that have no end tag (so, for example, <br> can be written <br/>). We're also simplifying some things, like making the type="" attribute optional on the <style> and <script> elements.
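To make the "/>" rule concrete, here is a rough checker for a single start tag, assuming the void-element list from the current WHATWG HTML spec (that list has changed over the years) and a hypothetical function name `checkSelfClosing`. In the HTML syntax the slash is allowed on void elements, where it does nothing; on any other HTML element it is a parse error and is ignored, so `<div/>` still opens a div that needs a `</div>`.

```javascript
// Void elements per the current WHATWG HTML spec (assumption: older drafts
// also listed elements such as "param" and "keygen").
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical checker for one HTML-syntax start tag, e.g. "<br/>".
// Crude: attributes are not tokenized, and SVG/MathML elements (which may
// legitimately self-close) are not distinguished.
function checkSelfClosing(tag) {
  const match = /^<([A-Za-z][A-Za-z0-9-]*)[^>]*?(\/?)>$/.exec(tag.trim());
  if (!match) return "not a start tag";
  const name = match[1].toLowerCase();
  if (match[2] !== "/") return "ok";
  return VOID_ELEMENTS.has(name)
    ? "ok (slash allowed, has no effect)"
    : "parse error (slash ignored, element stays open)";
}

console.log(checkSelfClosing("<br/>"));  // ok (slash allowed, has no effect)
console.log(checkSelfClosing("<div/>")); // parse error (slash ignored, element stays open)
```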
There's also work to make validators for HTML5 that are far more detailed and friendly than the HTML4 validators ever have been.
But to be honest, this hasn't been the main focus of HTML5. We've been concentrating more on making the behaviour well defined for browsers, and on adding new features for authors to reduce the need for proprietary technologies like Flash.
Why is it better than the tag?
The namespace for the XML version of HTML5 is the same as XHTML 1.0 and 1.1: http://www.w3.org/1999/xhtml
I don't think the spec ignores that the user has the final say. I think it is entirely true that if you have scripting disabled, you MIGHT be unable to fully convey the author's intent. What should that sentence say instead?
Wait, I misunderstood what you wrote. I thought you were asking why HTML5 kept it, not why it was removing it. We're not removing it. We're keeping it, for precisely the reason you gave. The only change is we have made "_blank" be an invalid value, to encourage people to use named windows or iframes instead of annoying popup windows for all their links (as a user, I'd rather say when I want a link to open in a new tab).
HTML5 is the furthest thing from committee-driven development in the W3C. Basically, I'm a dictator and every piece of feedback goes through me. (This is a point of contention with a lot of people who disagree with my approach, which has basically been to focus on use cases, pragmatic arguments, and research, and to eschew "expert opinions" as the sole guide to what the spec should say.)
:-)
Also, spec writers aren't in charge of anything. This is actually a common fallacy, which leads to people writing specs without paying attention to their users and implementers -- just look at most specs coming out of the W3C. No, spec writers are in fact at the very bottom of the food chain. We can only specify things which the implementers want to implement, otherwise they'll ignore us, and we are only able to control what users do insofar as we tell them to do things that they want to do, otherwise they'll ignore us too. Just look at browser vendors ignoring specs they disagree with. Just look at how many pages have some sort of syntax error (over 93% according to a study of several billion documents I did last year).
With HTML5 we're specifically trying to avoid torpedoing what implementers and users are doing today. A huge part of the effort is to make the spec relevant, specify what users are doing, specify things that other specs left vague, add features where users are working around holes in the spec, etc.
As to whether my job is a "real job" or not... I can't speak to that. It's a lot of work, at least.
Most of HTML5 was actually done outside of the W3C.
However, to address your earlier point, one of the big things we're doing with HTML5 is specifying the bits that all the other specs avoided, like 'window', 'setTimeout', how to parse HTML in the face of errors, and so on, and saying exactly how they should work, based on how browsers do them now, so that we can get the browsers to converge on one interoperable set of behaviours.
I'm also working on the Acid tests, e.g. Acid2 and Acid3, to foster interoperability on the older specs. It's working pretty well so far.
http://ln.hixie.ch/
http://www.webstandards.org/action/acid3
So... HTML5 should actually help bring the browsers closer on the bits that weren't specified before, and the Acid tests are directly intended to do that with the bits that _were_ specified before. If you want to help out, please do -- see the links above for how to help with Acid3, and the links below for how to help with HTML5:
http://blog.whatwg.org/w3c-restarts-html-effort
The browser vendors are part of the working group, so they would be part of any discussions as to what to change. In practice, it's actually the browser vendors who request the changes -- typically, it's because the spec requires things that are contradictory or that don't really work in the real world for one reason or another, and the browser vendors thus would rather implement something else. The requirement to wait until we have two complete implementations is there so that we know, when we say the spec is done, that it really can be implemented and that such implementations really can be interoperable.
XHTML failed: hardly anyone uses it. According to studies I did at Google, using a sample of several billion pages, about 0.0044% of pages use XHTML with the XML MIME type, and about 15% of pages try to use XHTML, by giving the XHTML namespace, but actually use HTML, by being sent with the text/html MIME type.
I'd rather work on a spec that is considered drivel but that everyone ends up using, than work on a spec that is theoretically perfect but which makes zero impact on the world at large.
Actually, originally we wanted to remove the DOCTYPE altogether, and since the <html> start tag is optional, that would have made the boilerplate "" (the empty string), or "<html>" if you want to include the start tag. Unfortunately, in non-HTML5 browsers, if there's no DOCTYPE, you'll get quirks mode, which we wanted to avoid. That's why we went with the shortest string we could find that triggered standards mode, namely "<!DOCTYPE html>".
I agree that it's not ideal, but I couldn't really see a way around it.
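A simplified sketch of how the DOCTYPE selects the rendering mode, assuming only a DOCTYPE at the very start of the document is considered. The real algorithm in the HTML5 parser also handles leading comments, iframe srcdoc documents and long lists of legacy public identifiers, none of which are modelled; `documentMode` is a hypothetical name.

```javascript
// Hypothetical, simplified rendering-mode check. Assumptions: only a DOCTYPE at
// the very start (after an optional BOM and whitespace) is considered, and the
// legacy public/system identifier tables of the real algorithm are not modelled.
function documentMode(source) {
  const text = source.replace(/^\uFEFF/, "").trimStart();
  const match = /^<!doctype(?:\s*([^\s>]+))?([^>]*)>/i.exec(text);
  if (!match) {
    return "quirks"; // no DOCTYPE at all
  }
  const name = (match[1] || "").toLowerCase();
  if (name !== "html") {
    return "quirks"; // missing name, or a name other than "html"
  }
  if (match[2].trim() === "") {
    return "no-quirks"; // "<!DOCTYPE html>": standards mode
  }
  // PUBLIC/SYSTEM identifiers present: the outcome depends on lists of
  // legacy identifiers that this sketch leaves out.
  return "depends on identifiers";
}

console.log(documentMode("<!DOCTYPE html>\n<title>x</title>")); // no-quirks
console.log(documentMode("<title>x</title>"));                  // quirks
```

The DOCTYPE keyword and name are matched case-insensitively in the HTML syntax, so `<!doctype html>` works just as well as `<!DOCTYPE html>`.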
target="" is mostly useful for targeting iframes.
Oh no, the errors on Web pages are all kinds of things. One of the most common errors was bogus content inside tables, for example (27% of pages had this error). (Like, ...)
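HTML5 handles misplaced table content with a rule usually called "foster parenting": non-whitespace text and most elements that appear directly inside a <table>, outside any cell, are moved to just before the table in the DOM. A toy sketch of that rule over a flat list of child tokens, with a made-up token shape and a hypothetical `fosterParent` function; it ignores the special cases for script, style, template, form and hidden inputs, and the implied <tbody>/<tr> wrappers.

```javascript
// Children the "in table" insertion mode accepts directly (simplified:
// td, th and col are really wrapped in implied tbody/tr or colgroup elements).
const TABLE_CONTENT = new Set([
  "caption", "colgroup", "col", "thead", "tbody", "tfoot", "tr", "td", "th",
]);

// Hypothetical token shape: { type: "element", name } or { type: "text", data }.
// Returns which children stay in the table and which get foster-parented.
function fosterParent(tableChildren) {
  const before = []; // ends up in front of the <table> in the DOM
  const inside = []; // stays inside the table
  for (const token of tableChildren) {
    if (token.type === "text" && token.data.trim() === "") {
      inside.push(token); // whitespace-only text is allowed in a table
    } else if (token.type === "element" && TABLE_CONTENT.has(token.name)) {
      inside.push(token);
    } else {
      before.push(token); // parse error: moved before the table
    }
  }
  return { before, inside };
}

const demo = fosterParent([
  { type: "element", name: "p" },
  { type: "text", data: "oops" },
  { type: "element", name: "tr" },
]);
console.log(demo.before.length, demo.inside.length); // 2 1
```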
The authors of the Acid2 test (primarily me) didn't actually include any HTML4 parsing error handling tests. There were some CSS ones, but a far cry from all the ones I could think of. (Acid3 has even fewer.) The CSS and HTML5 standards define how you handle errors, by the way -- that's why these things are in the Acid Tests at all, it's all part of testing browser compatibility and conformance with the specs. It's not about wiggle room.
There are different Acid Tests; you can always pass any given Acid Test by supporting the specs it tests. It's true that we want to keep providing new tests to encourage browser vendors to keep doing better, but it's wrong to say that you can't ever pass an Acid Test. (It's not even true that you can't help doing badly at an Acid Test -- there's a reason Safari and Firefox did better at Acid2 than IE when the test came out, and that's simply that they overall had a better implementation of the specs.)
As opposed to...?
We write those tests too, they're called test suites and if you look at my site you'll find literally hundreds if not thousands of them:
http://hixie.ch/tests/adhoc/
The Acid tests are easier for the less technically inclined to get a hold of. In practice, the browser vendors take Acid tests and turn them into small tests of the kind you describe before fixing them. For Acid2, I was the one who did a number of those small tests for Opera (I worked for Opera at the time) -- you can see them here:
http://www.hixie.ch/tests/evil/acid/002/opera001.html
http://www.hixie.ch/tests/evil/acid/002/opera002.html
http://www.hixie.ch/tests/evil/acid/002/opera003.html
http://www.hixie.ch/tests/evil/acid/002/opera004.html
http://www.hixie.ch/tests/evil/acid/002/opera005.html
http://www.hixie.ch/tests/evil/acid/002/opera006.html
http://www.hixie.ch/tests/evil/acid/002/opera007.html
http://www.hixie.ch/tests/evil/acid/002/opera008.html
http://www.hixie.ch/tests/evil/acid/002/opera009.html
http://www.hixie.ch/tests/evil/acid/002/opera010.html
http://www.hixie.ch/tests/evil/acid/002/opera011.html
They're not as exciting as the smiley face, so they don't get the media's attention in the same way.
According to my studies, about 93% of all pages out there are syntactically invalid in some way. So to render 99.44% of all pages out there correctly, a browser has to be able to handle syntactically invalid pages. That's why it's important to test handling of correct markup as well as incorrect markup.
(Based on a study I did at Google using several billion pages.)
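The arithmetic behind this argument, as a trivial sketch (the function name is hypothetical; 0.93 is the figure from the study quoted above): a browser that could only cope with valid markup would get at most the remaining share of pages right, far below a 99.44% target.

```javascript
// Hypothetical helper: if a fraction `invalidShare` of pages has syntax errors,
// a browser that could only handle valid markup gets at most the rest right.
function maxShareForValidOnlyBrowser(invalidShare) {
  return 1 - invalidShare;
}

const share = maxShareForValidOnlyBrowser(0.93); // 93% invalid, per the study
console.log(`${(share * 100).toFixed(0)}% of pages at most`); // 7% of pages at most
```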
I'm not sure how Todd did it for Acid1 -- I think he may have worked it out by hand and drawn it in Photoshop.
For Acid2, I made a second version of the test that worked around all the bugs in Firefox, and then took a screenshot of Firefox.
For Acid3, I actually made the background of the reference rendering first as a simple HTML file, took a screenshot of that, made that the background of the reference.html file, and then added some text to the reference file and used absolute positioning to get the text where I wanted it. Then, I made the actual test page have the same theoretical rendering. If you look at the source of the test you'll see some of my notes where I work out the exact pixel alignment of some of the bits to make sure they match the reference rendering.
Aw but if you remove the "Open" in "Office Open" how are they supposed to confuse people who ask about "Open Office"?
I updated the FAQ answer with more details. Let me know if it makes more sense now.
HTML became ISO/IEC 15445:2000.
Already ordered...
How does it not? The HTML5 parser spec (linked to above) lists exactly how to handle invalid content, explicitly specifying required error handling behaviour as well as specifically stating what is an error and what isn't, indicating its importance. It gives exact rules for parsing literally any sequence of Unicode characters, valid or not. (The spec even says how to handle the content after it's parsed, even if it's not valid, or e.g. if invalid content is created in the DOM using script, though that's in parts of the spec other than the one cited above.)
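To illustrate what "exact rules for parsing literally any sequence of Unicode characters" means in practice, here is a toy subset of the tokenizer, assuming only a handful of states are modelled (the comments spell out what is simplified). Every input produces tokens; malformed input just adds an entry to `errors` instead of being rejected. The error names are borrowed from the current WHATWG spec, and `tokenize` is a hypothetical name.

```javascript
// Toy subset of the HTML5 tokenizer. Assumptions: only the data, tag open,
// end tag open and tag name states are modelled; attributes are skipped
// crudely up to the next ">"; every "<!" or "<?" construct is dropped as a
// bogus comment ending at the next ">" (wrong for real comments and
// DOCTYPEs); character references are not decoded.
function tokenize(input) {
  const tokens = [];
  const errors = [];
  let text = "";
  const flush = () => {
    if (text !== "") tokens.push({ type: "text", data: text });
    text = "";
  };
  const isAlpha = (c) => /^[A-Za-z]$/.test(c);
  // Index just past the next ">" (or the end of the input).
  const skipPast = (from) => {
    const close = input.indexOf(">", from);
    return close === -1 ? input.length : close + 1;
  };

  let i = 0;
  while (i < input.length) {
    const c = input[i];
    if (c !== "<") { // data state: ordinary character
      text += c;
      i++;
      continue;
    }
    const next = input[i + 1]; // tag open state
    if (next === undefined) { // "<" at end of input is emitted as text
      errors.push("eof-before-tag-name");
      text += "<";
      i++;
      continue;
    }
    if (next === "!" || next === "?") {
      if (next === "?") errors.push("unexpected-question-mark-instead-of-tag-name");
      flush();
      i = skipPast(i);
      continue;
    }
    const isEnd = next === "/";
    const j = isEnd ? i + 2 : i + 1; // index of the first tag-name character
    if (isEnd && j >= input.length) { // "</" at end of input is emitted as text
      errors.push("eof-before-tag-name");
      text += "</";
      i = j;
      continue;
    }
    if (isEnd && input[j] === ">") { // "</>" is dropped entirely
      errors.push("missing-end-tag-name");
      i = j + 1;
      continue;
    }
    if (!isAlpha(input[j])) {
      errors.push("invalid-first-character-of-tag-name");
      if (isEnd) { // "</" + junk becomes a bogus comment
        flush();
        i = skipPast(j);
      } else { // "<" + junk is just text
        text += "<";
        i++;
      }
      continue;
    }
    // Tag name state: read the name, lowercasing ASCII capitals only.
    let k = j;
    let name = "";
    while (k < input.length && !/[\s\/>]/.test(input[k])) {
      const ch = input[k];
      name += ch >= "A" && ch <= "Z" ? ch.toLowerCase() : ch;
      k++;
    }
    const gt = input.indexOf(">", k);
    if (gt === -1) { // unterminated tag is dropped
      errors.push("eof-in-tag");
      break;
    }
    flush();
    tokens.push({ type: isEnd ? "endTag" : "startTag", name });
    i = gt + 1;
  }
  flush();
  return { tokens, errors };
}
```

For example, `tokenize("1 < 2 <B>bold</b>")` keeps the stray "<" as part of a text token, records one `invalid-first-character-of-tag-name` error, and still produces lowercase `b` start and end tags.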
As far as I can tell it's exactly what was being discussed -- literally, I mean, exactly, the great grandfather even linked to the same page. How is it not what was being discussed? I don't understand what you want, if that isn't it.
http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#parsing