HTML5 has a defined parsing model and is not actually any harder or slower to parse than XML. In fact, I have heard implementors from several browser vendors say that HTML5's parser spec is easier to implement than XML, and that there should be no performance difference.
There are also a growing number of HTML5 parsers out there, including some for Python, Ruby, and Java, with more being written. The spec actually makes it really brain-dead easy to implement an HTML5 parser that is compatible with Web content, and a big test suite has developed around it, which makes tracking down bugs even easier.
Regarding <br/> vs <br>, HTML5 allows both in text/html (though the / has no effect, it's just ignored).
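A quick way to see that the slash makes no difference is a tokenizer sketch using Python's standard-library `html.parser`. That module is not an implementation of the HTML5 parsing algorithm (html5lib is the Python one), so treat this only as an illustration; the helper `start_tags` is hypothetical.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records the name of every start tag the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    # Hypothetical helper: feed the markup and return start-tag names in order.
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# "<br/>" is routed through handle_startendtag, whose default implementation
# calls handle_starttag, so both spellings are recorded identically.
print(start_tags("<p>a<br>b<br/>c</p>"))  # ['p', 'br', 'br']
```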
I've put XHTML tests into Acid3, so hopefully that will convince Microsoft to get with the programme and support it. We'll see.
I agree that CSS is better than <font>. My question was why is style="" (HTML that happens to include a CSS declaration) better than <font> (also HTML, which happens to include CSS values)?
HTML5 actually introduces a bunch of stuff to make it into more than a layout language, and more of a document and application description language, with things like <article>, <section>, <footer>, <dialog>, <datagrid>, etc. Hopefully that will encourage separation of layout/formatting/style and semantics, but we'll see.
For interactive tables, we have <datagrid> in HTML5. That supports sorting.
<datagrid> will actually also resolve your third problem, as you can provide a dynamic data source for <datagrid> which populates dynamically as the user scrolls. It's not pagination, though.
Alternating colours can be done today in browsers that support :nth-child (part of the Selectors spec), but that's a CSS issue, not HTML.
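The `:nth-child(an+b)` rule in the Selectors spec boils down to arithmetic: an element matches if its 1-based position among its siblings equals an+b for some integer n of zero or more. A minimal Python sketch of that test, with hypothetical helper names:

```python
def matches_nth_child(index, a, b):
    """True if the element at 1-based sibling position `index` matches
    :nth-child(an+b), i.e. index == a*n + b for some integer n >= 0."""
    if a == 0:
        return index == b
    diff = index - b
    # n = diff / a must be a whole number and must not be negative.
    return diff % a == 0 and diff // a >= 0


def stripe(rows):
    # Mirrors "tr:nth-child(2n)" (even rows) versus the remaining (odd) rows.
    return [(row, "even" if matches_nth_child(i, 2, 0) else "odd")
            for i, row in enumerate(rows, start=1)]


print([i for i in range(1, 8) if matches_nth_child(i, 2, 0)])   # [2, 4, 6]
print([i for i in range(1, 8) if matches_nth_child(i, -1, 3)])  # [1, 2, 3]
```

In a stylesheet the same striping is just `tr:nth-child(2n) { background: #eee; }`; the sketch only shows which rows that selector picks.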
alt="" (with the empty string) means that the image is decorative, and should be ignored altogether. However, if you're on a page where the image is the main item of interest, it would be silly to just ignore the image altogether.
Certainly screen readers could behave better than just reading out the filename when the alt attribute is missing.
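As a rough sketch of the three cases discussed here (alt="" present, alt missing, alt with text), the following Python uses `html.parser` to classify each image. The category names are made up for illustration, and what a screen reader should actually do with a "missing" image is exactly the open question above.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Classifies each <img> by how its alt attribute is set."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if "alt" not in attrs:
            self.results.append("missing")      # no alt attribute at all
        elif not attrs["alt"]:
            # alt="" (or a bare "alt", which html.parser reports as None)
            self.results.append("decorative")
        else:
            self.results.append("described")


def audit(markup):
    parser = AltAudit()
    parser.feed(markup)
    parser.close()
    return parser.results


print(audit('<img src="a.png" alt=""><img src="b.png"><img src="c.png" alt="Sunset">'))
# ['decorative', 'missing', 'described']
```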
The HTML5 spec includes an XHTML variant, just like it includes a text/html variant. The spec itself is agnostic about which you should use; you can use whichever one you want.
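Which of the two variants applies is decided by the MIME type the document is served with, not by anything in the markup. A minimal sketch of that dispatch, assuming only the two media types named below (other XML media types exist and are not covered):

```python
def syntax_for(content_type):
    """Map a Content-Type header value to the HTML5 syntax it selects.

    Returns "html" for text/html, "xhtml" for application/xhtml+xml,
    and None for anything else.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"   # parsed with the HTML parser; errors are recovered from
    if media_type == "application/xhtml+xml":
        return "xhtml"  # parsed with an XML parser; errors are fatal
    return None


print(syntax_for("text/html; charset=utf-8"))  # html
print(syntax_for("application/xhtml+xml"))     # xhtml
```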
I think there's a lot more to HTML5 than just audio and video, though. Some new features, like <canvas>, are already widely implemented. The HTML5 parsing spec is already revolutionising how people handle HTML on the server side (see e.g. html5lib). The DOM Level 0 stuff that we're specifying is going to make it a lot easier to get the browsers to align on their weird heretofore-undocumented APIs.
As far as staying on HTML4 -- good! HTML5 isn't anywhere near ready yet.
Disclaimer: I've been using MathML+XHTML since the late 90s.
Math is a hard issue for various reasons. MathML in particular has two variants, one for encoding the semantics of the maths (equivalent to HTML's elements like <p>, <em>, etc), and one for encoding the presentation (similar to, though not quite as bad as, <font> tags and style="" attributes). Unfortunately, there's no really good way to go from semantic MathML to a rendering, and the browser that supports MathML (Firefox) only does presentational MathML.
Another problem is that there is very little desire in the browser space for implementing MathML. I don't know of any vendor other than Mozilla that has any desire to implement MathML, and even in Mozilla, MathML support has always been a second-class citizen that runs the risk of being cut out at a moment's notice.
MathML also has the problem that it is very verbose. Writing it is painful, and even if you write it with an equation editor, maintaining it later is annoying.
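To make the verbosity point concrete, here is presentational MathML for x² + 1 embedded in an XHTML document, parsed with Python's `xml.etree.ElementTree`. The expression is an arbitrary example; it shows both the namespace-based mixing that XHTML allows and how many elements even a tiny expression needs.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Presentational MathML for "x^2 + 1", embedded in an XHTML document.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>The polynomial
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow>
          <msup><mi>x</mi><mn>2</mn></msup>
          <mo>+</mo>
          <mn>1</mn>
        </mrow>
      </math>
    </p>
  </body>
</html>"""

root = ET.fromstring(DOC)
math_el = root.find(f".//{{{MATHML_NS}}}math")

# iter() yields the <math> element itself followed by all its descendants.
mathml_elements = list(math_el.iter())
# Token elements (identifier, number, operator) carry the visible characters.
tokens = [el.text for el in mathml_elements
          if el.tag.split("}")[1] in ("mi", "mn", "mo")]

print(len(mathml_elements))  # 7
print("".join(tokens))       # x2+1
```

Seven elements for a five-character expression is the maintenance cost being described.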
As far as HTML5 goes, we've been looking into how to address mathematics. It's not clear how to proceed. One option is to define how MathML can be written in text/html, but then do we define content MathML or presentational MathML? Another option is to define a new vocabulary that maps to MathML using certain defined rules, but again, which variant do we use? We could just have a generic namespacing mechanism for text/html, but that introduces all kinds of really hard problems and is not yet a solved problem. None of these suggestions solve the problem of browser vendors not wanting to support MathML, either.
If you want to take part in these discussions, please feel free to do so. See http://whatwg.org/mailing-list#specs for the link to join the WHATWG list, and http://blog.whatwg.org/w3c-restarts-html-effort for the link to join the W3C list. We also have IRC channels, see http://wiki.whatwg.org/wiki/IRC for details.
There's no difference between XHTML1 and XHTML5. They have identical processing requirements. You don't need to distinguish them.
I'll see about adding a sentence to remind authors not to rely on script if at all possible.
HTML5 adoption hasn't really started yet, but that's a good thing; we're nowhere near ready for adoption. Even basic things in the spec are still changing in big ways at the moment. There have always been plans to write shims for adding HTML5 support to IE, e.g. http://excanvas.sourceforge.net/ provides <canvas> support in IE today, so that you can use <canvas> in all browsers. It's early days still as far as that goes. We can add support in this way for many features, in ways far easier than for XHTML.
We're removing a lot of the presentational stuff from HTML5, but there's nothing we can do in the spec that I can think of which would stop people from using the old stuff or using layout tables or putting <font> elements or style="" attributes everywhere.
As far as other things go, I'm working on the Acid tests (http://www.acidtests.org/) and others are working on new HTML5 validators, both of which might help make Web developers' lives easier.
Beyond that, I don't know what we can do. Suggestions welcome.
Well, we need tables to be able to represent tabular data. How else would you, for example, represent an invoice? Or a timetable? There are certain things for which tables make sense.
Naturally, semantic tables should never be used for layout purposes, and that has never been allowed by any of the HTML specs.
I'd rather have a spec that is perfect and correctly implemented than one that is perverted and correctly implemented, but I don't see how to get there from here.
Could you suggest some things you think are perverted, and how we should change them to be more perfect?
Evidence I've seen suggests that actually hand-authoring is still very common.
Regarding editors, I've yet to see a WYSIWYG editor that creates conforming markup that doesn't abuse the semantics of elements. I'd love to see someone find a way to do this, but until someone does, we can't design our spec on the assumption that it will happen.
Regarding your last point: I don't control the media. :-) I would imagine that the media would find their audience less receptive to being told to use XHTML than they would to being told they can keep using HTML and that they are now being given even more toys.
In HTML5 there is no ambiguity even in the face of very invalid content, since the HTML5 spec very strictly defines how you are to parse any random bytestream.
But even in HTML4, which didn't define error handling, omitting optional end tags didn't make the document ambiguous. HTML4 defined (through SGML) how optional end tags were to be processed.
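As a toy illustration of why omitted end tags are still unambiguous, the sketch below applies two simplified rules while building a tree: a `<p>` start tag closes an open `p`, and an `<li>` start tag closes an open `li`. This is a hypothetical, heavily reduced rule table, not the HTML4 DTD or HTML5's tree-construction algorithm, both of which cover far more cases.

```python
from html.parser import HTMLParser

# Hypothetical, heavily reduced rule table: a start tag for the key
# implicitly closes a currently open element whose name is in the value set.
IMPLIED_CLOSE = {"p": {"p"}, "li": {"li"}}
VOID = {"br", "hr", "img", "input", "link", "meta"}  # never have children


class Outline(HTMLParser):
    """Builds a (tag, children) tree, inferring some omitted end tags."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # An omitted end tag is implied when the new tag can't nest inside.
        if self.stack[-1][0] in IMPLIED_CLOSE.get(tag, set()):
            self.stack.pop()
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, plus anything inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())


def outline(markup):
    parser = Outline()
    parser.feed(markup)
    parser.close()
    return parser.root[1]


print(outline("<ul><li>one<li>two</ul>"))
# [('ul', [('li', ['one']), ('li', ['two'])])]
```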
Anyway, HTML5 doesn't take a position on this XML vs HTML issue -- it defines both an XML syntax and a text/html syntax, and lets the author pick which he prefers.
I don't understand what you are talking about with the last three paragraphs of your document. HTML5 goes to quite extreme lengths to separate semantics and style.
What you describe seems more like a stylistic thing than a semantic thing (e.g. it wouldn't really make much sense in a speech browser, as far as I can tell). I recommend suggesting it to the CSS working group.
Actually there was very little pressure to remove the text from the spec (and absolutely no pressure to do so during the last HTML5 meeting), I did it purely because the text in the spec was basically a lie. It promised that browser vendors would implement Ogg, but not all vendors are willing to implement Ogg.
Everyone involved in the HTML5 effort basically agrees with your sentiment ("The Net needs real open source and royalty free codec standards NOW!"). But we're not sure how to get there. We're trying.
Yeah, that might well be why most authors don't care about XHTML. But it would also be a reason for the spec not to drop text/html yet.
Personally, though, I wouldn't be surprised if most Web authors would prefer to keep using text/html even when faced with the realistic choice of using XML. XML is far more verbose, far more brittle (it requires showing error messages in the face of errors, and more things are considered errors in the first place with XML), and now that we have defined parsing for text/html, only really has one advantage, namely mixing in other vocabularies. We might even introduce that to text/html, if someone can work out a good way to do it.
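The "brittle" part is easy to demonstrate with any XML parser: a single well-formedness error is fatal, where a text/html parser would recover and carry on. A small sketch with Python's `xml.etree.ElementTree` (the helper name `is_well_formed` is hypothetical):

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False on a fatal error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Fish &amp; chips<br/></p>"))  # True
print(is_well_formed("<p>Fish & chips<br/></p>"))      # False: bare "&"
print(is_well_formed("<p>Fish &amp; chips<br></p>"))   # False: <br> never closed
```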
Actually most specs at the W3C don't use this model, which explains a lot. :-)
But yeah, like with software development, you have to fix bugs when you find them, and you rarely find the bugs before actually trying to use the software (or in this case, the spec).
alt="" is required in almost all cases, but there are indeed some specific cases where it can be omitted (basically for sites like Flickr that have no idea what the images are).
I guess one could make the argument that given <font> and style="", the latter is more powerful and thus better, true.
We'll find a codec for <video> and <audio> in due course.
HTML5 allows authors to use both XHTML and HTML.
Let's try that again, without Slashdot eating my tags:
In HTML you can just say <option selected> with no attribute value and it'll work fine. It's even valid.
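A minimal sketch of the difference, assuming Python's standard library: `html.parser` reports a bare `selected` with the value `None`, and presence alone is what makes a boolean attribute true, while an XML parser rejects the minimized form outright, which is why the XHTML spelling is `selected="selected"`. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Records, for each <option>, whether the boolean `selected` attribute is present."""

    def __init__(self):
        super().__init__()
        self.selected = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # Boolean attributes are true by presence; the value is irrelevant.
            self.selected.append(any(name == "selected" for name, _ in attrs))


def selected_flags(markup):
    collector = OptionCollector()
    collector.feed(markup)
    collector.close()
    return collector.selected


print(selected_flags("<select><option>a<option selected>b</select>"))  # [False, True]

# The XML syntax has no minimized attributes, so XHTML needs an explicit value.
ET.fromstring('<option selected="selected">b</option>')  # parses fine
# ET.fromstring('<option selected>b</option>')           # raises ET.ParseError
```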
(I'm the HTML5 spec's editor.)
They don't (IE doesn't do XHTML), and I'm also not convinced that XHTML is necessarily the superior one.
Er. Wow. Sorry about that. I clearly need to read the comments here more carefully. :-)
Anyway. Yeah. The codec issue isn't resolved yet. The spec lists our requirements, and people are indeed working on addressing this.
HTML5 doesn't say that... where did you get that quote from?
See the <img> part of the spec for more detail:
http://www.whatwg.org/specs/web-apps/current-work/multipage/section-embedded0.html#the-img
What's wrong with that attribute?