Creative Commons and every open-source license in existence are _needed_ only because copyright exists. I'd be much happier if copyright law actually gave people the right to copy and just protected attribution rights; then we wouldn't need those licenses at all.
Anyway. I stand by my original statement, which is that copyright, as it stands today, is not something that society, as it stands today, agreed to. It was lobbied for by special interest groups. The original US copyright law, and the original British copyright law and other copyright doctrines it was based on, may well have been acceptable to the societies of the time, when copying and publishing were expensive, but those times are long gone.
When a law is almost uniformly ignored the way that today's copyright is, it's hard to argue that society agrees with it.
HTML5 doesn't actually have the problem of some parts being "delayed" because other parts are immature -- the spec has annotations all the way down showing how stable each section is, and browser vendors (including Microsoft!) are implementing it. The HTML5 spec has been progressing much faster, with much more input being taken into account, than other specs at the W3C. In fact, splitting the spec would likely make things go significantly slower, since it would mean much more cross-group and cross-spec coordination.
As far as splitting out the spec goes, I don't think anyone especially disagrees that it should happen. The problem is that we don't have anyone who is volunteering to do the work.
Until I started working on HTML5, there was no spec that defined "window" (as in window.location, window.document, etc.), there was no spec that defined XMLHttpRequest, there was no spec that defined the details of how to talk between iframes, and so on. Does this mean nobody cares about those either?
Actually, the spec has an annotation system where you can see how stable each section is, so we've somewhat side-stepped the problem of the unfinished whole being a blocker for the smaller parts.
In practice, implementors (including Microsoft!) are happily implementing HTML5 already.
Making the one spec be a bazillion smaller specs wouldn't stop us from having to make sure that each bit is compatible with implementations of that bit. Also, a smaller spec doesn't necessarily go much faster through the system than a big spec. Just look at XMLHttpRequest, which used to be part of HTML5 -- it's been split off for years, but it's still far from being a REC, and that's for a spec that's actually just describing existing browsers! This isn't anyone's fault, it's just that specs take a long time to get right. Anne's doing a great job on that spec, and I'm really glad he took it out of HTML5.
Hopefully other editors will come forward and volunteer to take other things out of HTML5. Several people have tried, but we have a very poor success rate with these specs. Generally, things that get taken out just languish and die a slow death until I fold them back into HTML5.
I'm the editor of HTML5, and I agree entirely with Microsoft here (and they're far from the only people saying this). The problem is that we have very few competent specification editors, and even if we had more, there are literally dozens of other specifications that are really important to the Web and need editors. Splitting the spec wouldn't make the Web platform grow any faster; it would just mean big parts of the spec would languish even longer.
I don't see how omitting optional tags is "tag soup" or "poorly coded crap". Omitting optional tags doesn't in any way affect the precision of the meaning of the markup.
HTML5 defines the processing and rendering of all HTML content, including HTML4 content, including invalid content. Why does it matter what the author claims he is writing?
The meaning and implementation requirements of all features in the spec are well-defined (at least, that's the intent). Making them resilient to changing trends, making them abstract enough to be useful over long periods of time, making them concrete enough to be usable -- that's part of what makes writing specs an art. Hopefully we're doing a good job, but only time will tell. Certainly it's possible to get it wrong.
Regarding movie and audio content, accessibility concerns should really be addressed in the formats themselves, otherwise the accessibility augmentations get lost when the media resource is used in other, non-HTML, contexts (e.g. saved to disk). Most modern codecs support extensive accessibility features, certainly including things like subtitles.
Most document formats on the Web don't have version numbers, and even those that do (e.g. all earlier versions of HTML) have never actually used them to trigger rendering differences. (Really, the reason is the same as the reason why IE's version switch idea is a bad one.)
One problem: what happens when you change the behaviour and older UAs visit a new page?
Also, consider people claiming to use the new version before the new version is supported (e.g. the way so many people claim to use XHTML) -- new browsers would "break" those pages when they came out, since they expect the old behaviour despite claiming the new version. (With XHTML, the example would be using an XML parser on XHTML files sent as text/html -- it's not possible, you'd be reporting XML parse errors all over the place, and your market share would drop like a stone.)
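The XHTML point is easy to reproduce. Below is a minimal Python sketch (standard library only) that feeds the same made-up text/html-style fragment to a draconian XML parser and to a lenient HTML tokenizer. Assumptions: the sample markup is invented, and Python's `html.parser` is only a tokenizer, not the full HTML5 parsing algorithm that browsers implement.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical "XHTML" served as text/html: fine for a browser, but not
# well-formed XML (<br> is never closed, &nbsp; is not an XML entity).
TAG_SOUP = "<p>Hello<br>world&nbsp;again</p>"


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer: records start tags and never rejects input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(markup):
    """Return None if the markup is well-formed XML, else the error message."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as err:
        return str(err)


def parse_as_html(markup):
    """Return the start tags seen by the lenient tokenizer."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("XML parser:", parse_as_xml(TAG_SOUP))
    print("HTML parser:", parse_as_html(TAG_SOUP))
```

The XML parser stops with an error (either the undefined entity or the unclosed `<br>` is fatal in XML), while the HTML tokenizer simply keeps going, which is what a browser that wants to keep its users has to do.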
On removing features: browsers don't actually ever remove features unless they have near-zero use. I don't expect browsers ever to drop support for <font>, for instance, and I don't see that as a problem. Indeed, HTML5 will eventually have a section defining how to support these old features. That doesn't make them part of the language.
Maybe eventually the spec will get "too" complex, but it's also likely that eventually we'll have something so radically better than HTML that it is actually worth migrating to a new format altogether. People are always trying to do this, and they succeed occasionally.
Anyway. It's hard work to make a spec backwards- and forwards-compatible when the installed base is as large as HTML's. But it's not impossible, if you're careful and patient. I think it's worth it.
Never breaking previous formats in any way is a good thing.
Validation should be against the latest version, otherwise you'll be telling authors not to use new features (which is dumb) and not telling them about the mistakes that earlier versions didn't know about (which is also dumb). Thus validators also don't need a version switch.
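A minimal sketch, in Python, of what a validator without a version switch might look like. `OBSOLETE_ELEMENTS` and `check_document` are hypothetical names, the rule set is a tiny illustrative subset of the elements HTML5 obsoletes, and a real conformance checker does far more than look at element names. The point is structural: the DOCTYPE the author wrote is never consulted when deciding which rules apply.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for "the latest spec": a small subset of the
# presentational elements HTML5 treats as obsolete. There is deliberately
# no version parameter anywhere; every document gets the same rules.
OBSOLETE_ELEMENTS = {"font", "center", "strike", "acronym"}


class ConformanceChecker(HTMLParser):
    """Tokenizes the whole document and records errors as it goes."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # Obsolete elements are still parsed (browsers keep supporting them);
        # the checker merely reports that they are not part of the language.
        if tag in OBSOLETE_ELEMENTS:
            line, _col = self.getpos()
            self.errors.append(f"line {line}: <{tag}> is obsolete")


def check_document(markup):
    """Return a list of error strings; an empty list means no errors found."""
    checker = ConformanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


if __name__ == "__main__":
    legacy = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
        '<p><font color="red">Hi</font></p>'
    )
    # The HTML 4.01 DOCTYPE is ignored; <font> is still reported.
    print(check_document(legacy))
```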
I don't understand what you are saying or asking in your first paragraph. Browsers aren't going to ignore the new HTML5 elements when they find them in XHTML1 documents, they'll just apply their HTML5 meaning. All the elements that are in both versions have the same processing rules. What's deceptive about this? What do DTDs have to do with what browsers do?
Mostly, the decision about which new semantic elements to include was based on seeing what authors were missing the most. For instance, we looked at the most common class="" attribute values.
See e.g. http://code.google.com/webstats/2005-12/classes.html
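A survey like that can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling behind the Google study; `most_common_classes` is a hypothetical helper and the sample pages are invented.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tallies each whitespace-separated token found in class="" attributes."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # class="footer copyright" counts as two separate class names
                self.counts.update(value.split())


def most_common_classes(pages, n=10):
    """Return the n most common class names across an iterable of HTML strings."""
    totals = Counter()
    for page in pages:
        # A fresh parser per page, so a truncated page cannot corrupt the next.
        parser = ClassCounter()
        parser.feed(page)
        parser.close()
        totals.update(parser.counts)
    return totals.most_common(n)


if __name__ == "__main__":
    sample_pages = [  # invented sample input
        '<div class="header"><ul class="nav"></ul></div><div class="footer"></div>',
        '<div class="header"></div><p class="footer copyright">(c)</p>',
    ]
    print(most_common_classes(sample_pages, n=3))
```

Run over a large crawl instead of two toy pages, the top of that list is the kind of signal that suggests which semantic elements authors are reinventing with class names.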
Versioning is basically meaningless on the Web. Browser vendors (other than Microsoft, at least) have repeatedly said that they don't want to have multiple code paths for features, which means that they want each version of HTML to work the same as each previous version. Same for CSS, same for the DOM APIs.
If every version of HTML is going to be identical from the browser's point of view, why bother including any versioning information in there at all?
As far as validators go: the point of validators is to report errors. When HTML6 comes out, if there are things in HTML6 that are errors that aren't errors in HTML5, that presumably means we found bugs in the HTML5 spec, and so it is more helpful to authors if we report them than if we don't. Therefore validators should always validate against the latest spec (unless manually configured otherwise, of course), and the validators don't need a version number in the format.
Having version numbers in formats makes people do stupid things, like make behaviour depend on the version flag. Not having a version number in the format makes people notice that kind of mistake more (since then explicit flags have to be invented to make the mistake, instead of just using the version number in the format).
We don't want to invalidate documents for no good reason, and we don't really want to require the <html> start tag, since HTML has been happily going along with it being optional for so long.
I don't see how "<html version=5>" is any better than "<!DOCTYPE HTML>", to be honest. Both are just magic strings, at the end of the day. I'd rather have my magic strings look magical (with exclamation marks and capital letters) than look like any other part of the markup. :-)
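What the magic string buys you in practice is a rendering mode. Here is a deliberately simplified Python sketch of DOCTYPE sniffing; `rendering_mode` is a hypothetical helper, and the real HTML5 rules also match public and system identifiers against long lists of legacy DOCTYPEs, which this sketch does not attempt. Note that nothing on the `<html>` start tag is consulted.

```python
import re

# "<!DOCTYPE html>" in any letter case, with no public or system identifier.
BARE_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)
ANY_DOCTYPE = re.compile(r"\s*<!DOCTYPE\b", re.IGNORECASE)


def rendering_mode(markup):
    """Classify a document by its leading DOCTYPE (simplified sketch)."""
    if BARE_DOCTYPE.match(markup):
        return "no-quirks"  # standards mode
    if not ANY_DOCTYPE.match(markup):
        return "quirks"  # no DOCTYPE at all, whatever the <html> tag says
    # DOCTYPEs with identifiers need the spec's legacy lists (not modelled here).
    return "unclassified"


if __name__ == "__main__":
    for doc in ("<!DOCTYPE HTML><title>a</title>",
                "<!doctype html><title>b</title>",
                "<html version=5><title>c</title>"):
        print(doc, "->", rendering_mode(doc))
```

Both spellings of the DOCTYPE land in standards mode, while `<html version=5>` on its own does not, which fits the concern about hanging anything on an optional tag.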
...then you should configure your browser to do that. I, on the other hand, don't want that -- and I shouldn't have to fight the author to get what I want either.
The problem is that we don't want to disallow that kind of heuristic -- maybe there are certain well-known images or filename styles that could have specific behaviour. It's a tough area to give good rules for.
It's aimed at Web Applications that, e.g., might want to hide a bunch of content until you have logged in, because that content is irrelevant until you have logged in.
Actually, technically, we started from a blank slate, and only added the features we thought we should add. I'm not sure what you mean by "tags that really do need to be deprecated but can't be". We haven't just deprecated things like , we've obsoleted them altogether. They don't appear in the HTML5 language, deprecated or otherwise.
XHTML1 Strict and HTML4 Strict are exactly the same language, by the way; they just use a different syntax. Neither is "more minimal" than the other.
The spec says "non-visual user agents should apply image analysis heuristics to help the user make sense of the image". But yeah, maybe we could do more.
In a word, no.
It's not clear to me that society agreed to be bound by copyright.
Yeah, I'll be doing a round of adding intro sections, examples and such at some point before the spec is done.
I'd love to be able to make Web browser developers implement nothing but what the spec says. However, they don't obey us. :-)
Better to have a spec for them to follow than to say "no, implement the rest first!" and have them make up their own thing.
The test isn't screwed up, it's a bug in IE8.
Actually, _I_ put me in charge of HTML5.
I didn't understand the rest of the message.
Yeah, this is one of the things on my list of things to look at.
http://www.whatwg.org/issues/#graphics-iframe
(see in particular the e-mails with the subject "sandboxing ideas")
We don't really want a version attribute at all.
We can't put anything on the <html> start tag, because that tag is optional, and we don't want to require it to be present to trigger standards mode.
The "<!DOCTYPE HTML>" thing isn't that big a deal, IMHO.