I don't really care whether it's served as XML or not; the point is that if it's not well-formed XML, it becomes a massive ballache to deal with, because XML tools and libraries are so prevalent.
It's only syntax; it shouldn't be a big deal. There are plenty of XML-based tools that are useful, and HTML5 goes to some lengths to define the text/html (i.e. non-XML) syntax so you can still use those tools and just translate the syntax at the edges.
The text/html and XML syntaxes are based on exactly the same underlying conceptual model (the DOM tree), so you can switch without any radical changes. E.g. the validator.nu HTML5 parser implements the same APIs as standard XML parsers - drop it in front of your existing XML tools and libraries, stick an HTML serialiser on the other end, and your system can work pretty much the same as before (with the bonus of working for any arbitrary page on the web, not just the tiny fraction that are well-formed XML).
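To make the "same tree, different syntax" point concrete, here's a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than the validator.nu parser (which is Java): one tree is built once and then written out with both the XML and the HTML serialiser. The element names are just an example, and ElementTree's HTML output mode is a simple serialiser, not a full HTML5 one.

```python
import xml.etree.ElementTree as ET

# One tree: the shared conceptual model (element names are just an example).
html = ET.Element("html")
body = ET.SubElement(html, "body")
para = ET.SubElement(body, "p")
para.text = "Line one"
br = ET.SubElement(para, "br")
br.tail = "line two"

# The same tree written out in two syntaxes.
as_xml = ET.tostring(html, encoding="unicode")                  # XML: <br /> is self-closed
as_html = ET.tostring(html, encoding="unicode", method="html")  # HTML: void <br>, no end tag

# Round trip: the XML output parses back into an equivalent tree.
reparsed = ET.fromstring(as_xml)
assert reparsed.find("body/p/br").tail == "line two"

print(as_xml)   # <html><body><p>Line one<br />line two</p></body></html>
print(as_html)  # <html><body><p>Line one<br>line two</p></body></html>
```

Only the surface syntax differs between the two outputs; the tree your tools operate on is the same either way.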
The ethos surrounding HTML5 is that, well, lots of old sites didn't follow newer standards, so let's make those web sites standard by taking everything they did shit and making that standard.
Who is helped by a standard that almost everybody ignores? If you, say, want to write code to parse HTML pages, and you try to implement what HTML4 specifies (based on SGML), your code will be pretty useless because HTML4 is incompatible with reality and you'll get incorrect output most of the time (stray characters, incorrectly nested elements, half the page text disappearing inside a misparsed script element, etc). Similarly if you implement what XHTML specifies, you'll fail since most pages aren't well-formed XML. You can declare that those pages are broken and non-standard but that doesn't stop them from existing and being a serious problem for anybody writing software that interacts with the web.
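A small illustration of the difference in error handling, using only the Python standard library. The sample markup is made up but typical: an unquoted attribute, a bare ampersand, and an unclosed <p>. An XML parser stops at the first well-formedness error; Python's html.parser keeps going. Note that html.parser is only a tolerant tokenizer, not an HTML5-conformant tree builder (html5lib or validator.nu would be needed for that), so this shows the draconian-versus-recovering distinction and nothing more.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical real-world markup: unquoted attribute, bare ampersand, unclosed <p>.
SAMPLE = "<p class=intro>Fish & chips<p>Second paragraph"


def parse_as_xml(markup: str) -> str:
    """Draconian handling: the first well-formedness error is fatal."""
    try:
        ET.fromstring(markup)
        return "parsed"
    except ET.ParseError as err:
        return f"fatal error: {err}"


class StartTagCollector(HTMLParser):
    """Records start tags; html.parser recovers from the errors above instead of stopping."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def tokenize_as_html(markup: str) -> list:
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


print(parse_as_xml(SAMPLE))      # a "fatal error: ..." line from the XML parser
print(tokenize_as_html(SAMPLE))  # [('p', [('class', 'intro')]), ('p', [])]
```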
Nowadays you can just implement what HTML5 specifies (or find a library that already does it), and your parser will work identically to the current or near-future versions of all major browsers - it's defined in enough detail that there's no ambiguity in how to process any stream of bytes. That's never been possible before, when the standards were focused on some vision of a simple coherent syntax and refused to deal with the messy details that are critical in real life.
If you want to document a set of best practices for writing HTML, with rules for lowercase names and closing tags and quoting attributes and for indentation etc, that's fine and would be nice (especially if you could find a way to motivate people to follow the best practices - a decade of promoting XHTML doesn't seem to have stopped people writing terrible code so we need a better way). Meanwhile, HTML5 is solving the harder problem of how to cope with people who ignore those rules.
You can store a 63-bit integer or a 32-bit floating point value in a JavaScript pointer and only promote them to real objects wrapping 64-bit values when an operation would lose precision. This reduces memory required for JavaScript.
SpiderMonkey uses 64-bit value types on all architectures (x86, x86-64, ARM, etc), storing either a 64-bit float or a 32-bit int or a pointer (31 bits on 32-bit, 47 bits on 64-bit), so it shouldn't make any difference to their memory usage. (The non-float values get packed into the range of unused NaN float representations, to avoid ambiguity). I think other modern JS engines do pretty much the same thing. JS semantics are that numbers are 64-bit floats, so implementations couldn't really use 63-bit ints (too precise) or 32-bit floats (too imprecise) anyway, though 32-bit ints are a safe optimisation.
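A rough sketch of the NaN-boxing idea, assuming a simplified layout invented for illustration (SpiderMonkey's real tag values and bit positions differ): ordinary doubles keep their IEEE 754 bits, while int32s and 47-bit pointers are tucked into the unused quiet-NaN space with a small type tag. Real NaNs are canonicalised so they can never be confused with a boxed value.

```python
import math
import struct

# Simplified NaN-boxing layout (an assumption for illustration, not SpiderMonkey's real tags):
#   ordinary doubles are stored as their raw IEEE 754 bits;
#   other values set the top 13 bits (sign, all-ones exponent, quiet bit),
#   put a 3-bit type tag in bits 48-50 and the payload in the low 48 bits.
BOX_PREFIX = 0xFFF8_0000_0000_0000
TAG_SHIFT = 48
TAG_INT32 = 1
TAG_POINTER = 2
PAYLOAD_MASK = (1 << 48) - 1
CANONICAL_NAN = 0x7FF8_0000_0000_0000


def box_double(x: float) -> int:
    if math.isnan(x):
        # Canonicalise NaNs so a computed NaN can never be mistaken for a boxed value.
        return CANONICAL_NAN
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_int32(i: int) -> int:
    if not -(2**31) <= i < 2**31:
        raise ValueError("does not fit in 32 bits; store it as a double instead")
    return BOX_PREFIX | (TAG_INT32 << TAG_SHIFT) | (i & 0xFFFF_FFFF)


def box_pointer(addr: int) -> int:
    if not 0 <= addr < 2**47:
        raise ValueError("pointer does not fit in 47 bits")
    return BOX_PREFIX | (TAG_POINTER << TAG_SHIFT) | addr


def unbox(word: int):
    tag = (word >> TAG_SHIFT) & 0b111
    if (word & BOX_PREFIX) == BOX_PREFIX and tag != 0:
        payload = word & PAYLOAD_MASK
        if tag == TAG_INT32:
            low = payload & 0xFFFF_FFFF
            return ("int32", low - 2**32 if low >= 2**31 else low)  # sign-extend
        if tag == TAG_POINTER:
            return ("pointer", payload)
        raise ValueError(f"unknown tag {tag}")
    return ("double", struct.unpack("<d", struct.pack("<Q", word))[0])
```

Every value still occupies exactly 64 bits, which is why this scheme shouldn't change memory usage. The int32 fast path is safe because every 32-bit integer is exactly representable as a double, so it never changes JS-visible results.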
Have you seen the OpenSUSE Build Service? That can automatically build native packages for several distros (OpenSUSE, Fedora, Mandriva, Debian, Ubuntu (if you don't depend on anything in Universe)), and already has plenty of games, and isn't too hard to set up when you can copy from existing examples. (I've been trying to use it for 0 A.D. and it seems okay so far.)
Reminds me of Glasshouse. Hop into a nanoassembler gate, get your brain backed up, switch to a healthy new physical body if you fancy. Murder is a minor crime but identity theft is extremely serious. Works great until someone releases a worm that uses humans as transmission vectors, infecting the assembler gates and deleting certain memories from anyone who uses them. You have to put a lot of trust into whoever runs the technology, and they're bound to make mistakes.
Subsetting is not EOT functionality - EOT is basically just a wrapper around a TTF file, and subsetting just involves modifying the TTF, so you can do exactly the same in browsers that read raw TTF files. I've written a font optimizer tool (open source) that does that. (Windows has an API to generate embedded fonts with subsetting, which the WEFT tool uses; I'm not currently aware of any other subsetting implementations.)
Gazelle is from Microsoft Research, and their paper discusses the details of the security model - it's not just a marketing claim.
The idea is that every 'origin' (basically a domain name, which is used as the basis for access control in all modern browsers) is separated into its own sandboxed process. If a page on your domain embeds an iframe from an advertiser's domain, the iframe is rendered in a separate process, and all communication is handled through a Browser Kernel which enforces the security constraints (e.g. preventing the advert from touching or rendering anything outside its iframe box, even if an attacker can find a way to execute arbitrary code in it). Plugins are handled in the same way.
Chrome's security model doesn't handle that kind of separation of multiple sites within a single page. But Gazelle sacrifices some backward compatibility (e.g. it removes the document.domain attribute, and it requires all plugins to be rewritten to use the Browser Kernel instead of directly accessing the network or filesystem), which is unlikely to be acceptable in practice.
And Gazelle is certainly not a replacement for the IE engine - it's built on the existing IE7 components for parsing, rendering, scripting, etc. It's research, and the value is its ideas, some of which could perhaps be integrated into current browser engines to improve security. It's not meant to be a real browser engine, but it seems successful as a research experiment.
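The per-origin separation described above can be sketched as a toy model. The origin function uses the usual (scheme, host, port) triple that browsers compare for same-origin checks. The ToyBrowserKernel class, frame ids and box coordinates are hypothetical and are not Gazelle's actual API; they only show the shape of the idea, where every cross-principal request is decided by the kernel rather than by the rendering code.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """The usual (scheme, host, port) triple that browsers use for access control."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, (parts.hostname or "").lower(), parts.port or DEFAULT_PORTS.get(scheme))


class ToyBrowserKernel:
    """Hypothetical model of the Gazelle idea: every request goes through the kernel."""

    def __init__(self) -> None:
        self.process_for_origin = {}  # origin -> id of a (pretend) sandboxed process
        self.frames = {}              # frame id -> (origin, (x, y, width, height))

    def open_frame(self, frame_id: str, url: str, box: tuple) -> int:
        o = origin(url)
        # Distinct origins get distinct processes; same-origin frames share one.
        pid = self.process_for_origin.setdefault(o, len(self.process_for_origin))
        self.frames[frame_id] = (o, box)
        return pid

    def may_draw(self, frame_id: str, x: int, y: int) -> bool:
        # A frame may only paint inside the box its embedder gave it.
        _, (bx, by, width, height) = self.frames[frame_id]
        return bx <= x < bx + width and by <= y < by + height

    def may_access(self, from_frame: str, to_frame: str) -> bool:
        # Cross-frame access is only allowed between frames of the same origin.
        return self.frames[from_frame][0] == self.frames[to_frame][0]


kernel = ToyBrowserKernel()
page = kernel.open_frame("page", "https://example.com/", (0, 0, 800, 600))
ad = kernel.open_frame("ad", "https://ads.example.net/banner", (10, 10, 300, 250))
print(page != ad, kernel.may_draw("ad", 500, 500), kernel.may_access("ad", "page"))
```

Even if the advert's process is fully compromised, it can only ask the kernel for things, and the kernel's answers depend on origin and box, not on anything the advert's code claims about itself.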
The PADDINGXXPADDING is just a standard artifact of the Visual C++ build process - there's a manifest XML string that's added to the .exe (for 'side-by-side' DLL dependency handling), and padding is added for some internal alignment requirements. (This article says the UpdateResource API is what adds that string). So it's nothing unusual or suspicious.
Charles Darwin and the Tree of Life?
There's also an interesting quote from David Attenborough in response to people asking "why he did not give 'credit' to God" for the subjects of his nature documentaries:
They always mean beautiful things like hummingbirds. I always reply by saying that I think of a little child in east Africa with a worm burrowing through his eyeball. The worm cannot live in any other way, except by burrowing through eyeballs. I find that hard to reconcile with the notion of a divine and benevolent creator.
My father wouldn't let me read this because it's somewhat anti-feminist.
"Somewhat"? In Flatland, the social status of men is proportional to their number of sides (triangles are the lowest class, and priests are nearly circles); women are even lower, being straight lines. Women are not allowed to walk in public spaces without swaying and emitting noises, so that men do not accidentally get impaled on them. They have to enter their houses by the back door. They are considered "wholly devoid of brain-power", driven by emotion and instinct and lacking memory, and they receive no education.
But it's social satire, not a reflection of the author's views. He was "a firm believer in equality of educational opportunity, across social classes and in particular for women", and the book is attempting to highlight a Victorian mindset that was still prevalent at that time. The women in the book act in far more complex ways than their men give them credit for. The author even says "To my readers in Spaceland the condition of our Women may seem truly deplorable, and indeed it is" - he's not happy with how they're treated, and readers in Spaceland will hopefully see that it's caused by the absurd class system holding them back, though the narrator can't avoid falling back into the prejudices of his society.
The book makes more sense when you understand the context. The Annotated Flatland is quite interesting, providing some background on the author and mathematics and the society of the time.
("more sense" doesn't mean it actually does make sense - it all still seems a bit muddled to me, with a random mixture of physical differences and social differences between people, and strange science (like Lamarckian evolution where the actions of a parent affect the number of sides (hence social status) not of themselves but of their offspring), and sections that I don't understand the point of (like the whole thing about colour being discovered and then banned - it makes sense within Flatland but is it meant to be satirising anything in real life?). Much of it is probably because the world has changed so drastically in 125 years that I just can't understand where the author was coming from. But it's an interesting book despite (or perhaps because of) that.)
Yep, but I was responding to the "note that their chip uses doubles, not floats" statement, and my point was that it does doubles and floats (presumably somewhat like SSE).
That's not true. From their paper:
Larrabee gains its computational density from the 16-wide vector processing unit (VPU), which executes integer, single-precision float, and double-precision float instructions.
And it's definitely aimed largely at games: the paper gives performance studies of DirectX 9 rendering from Half Life 2, FEAR and Gears of War.
HTML 5 recently added inline MathML support, and temporarily defined inline SVG too (but then removed that since the SVG Working Group didn't like it). So you can write an HTML document like
<!DOCTYPE html>
<title>MathML test</title>
<p>Here's an equation:
<math><msup><mi>e</mi><mrow><mi>i</mi><mi>π</mi></mrow></msup><mo>+</mo><mn>1</mn><mo>=</mo><mn>0</mn></math>
and send it as normal text/html, and (if this were implemented anywhere yet, which it isn't) it would work properly. HTML 5 doesn't allow arbitrary extension languages, but MathML and SVG were considered part of the 'web platform', so it was worth extending the HTML parser specifically to handle them.
Last I checked, HTML 5's working doc says that forms aren't going to change over html4.
They are going to change. It's not yet decided exactly how they will change – the HTML WG has Web Forms 2 (an extension of HTML4's forms), and the Forms WG is working on some rough ideas for trying to fit XForms into HTML5, and there is a joint Task Force that is meant to be working things out between the groups but hasn't actually managed to achieve anything yet. (None of the major browser developers has indicated much interest in implementing XForms, whereas Opera has already released an implementation of WF2 and there is some ongoing work to implement parts in Firefox and Safari, so the momentum is currently in that direction.)
allow forms to validate without having to have [div]s that do nothing but hold hidden fields because [input] is a presentation tag and therefore must be within a text-carrying tag
Web Forms 2 says "input elements of type hidden may be placed anywhere (both in inline contexts and block contexts)", which sounds like it satisfies your concern (and has the advantage of working in all existing web browsers, unlike a new <state> element).
can we PLEASE have them back so that we can use them for tabular data (like item names, prices, descriptions, etc)?
<table> has never been deprecated, and HTML5 still permits it. (Tables used for layout are not allowed, although that's impossible for an automatic validator to detect). There are already CSS properties that can replace cellpadding ('padding') and cellspacing ('border-spacing').
would it really kill the documentation writers to say what something has been deprecated BY?
It seems spec writers usually think that kind of thing should be described in tutorials or other documents, not in the specification. The HTML5 spec is far harder to read than HTML4 (because it's far more detailed, to fix the differences between implementations caused by HTML4's vagueness), so it really needs that kind of user-oriented documentation. The differences document gives a brief mention of what should be used instead of some obsolete features, but it would be nice to have more detail and examples for people who want to move to HTML5.
An alternative would be to have the compiler perform or insert the checks that, in current systems, are performed by the kernel and the hardware at run-time. This way, processes don't have to run in restricted mode and go through the kernel anymore, because they aren't going to do any of the things the kernel would prevent them from doing anyway. Of course, this requires a rather safer type system than C's, and it shifts trust from the kernel to the compiler - which raises issues about how you can know that the code you want to run was indeed compiled by a trustworthy compiler.
You can do this without a trusted compiler – when an untrusted compiler compiles a program, it can work out how to prove that the program is 'correct' (e.g. follows the restrictions on memory accesses, perhaps by first having the compiler insert "if memory access is out of range: abort" commands before every access it's unsure about), and then include the proof in the compiled code. The kernel just has to verify that the proof is valid, which is much easier than working out the proof in the first place. If the compiler lies about code being correct, the verifier will detect that and refuse to run it.
The compiler can accept an unsafe language like C, and just emit an error message (and hopefully tell the user where the problem is) if it's unable to prove the program is correct. It wouldn't be possible to run any arbitrary C program, but it wouldn't be necessary to teach people a whole new language.
Of course this is all active research, i.e. it doesn't actually work well enough in practice (and maybe it never will), but it's still an alternative that could work and avoids any reliance on trusted compilers.
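A toy sketch of the split between an untrusted compiler and a trusted verifier. The instruction set is invented for this example, and the scheme is closer to software fault isolation than to full proof-carrying code: the "proof" is just the inserted range checks, which the verifier confirms with a single linear pass instead of checking a formal proof. It does show the key asymmetry: the compiler does the work, the verifier only checks it, and a compiler that drops a check gets caught.

```python
from typing import List, Tuple

# Hypothetical toy instruction set, invented for this sketch:
#   ("set", var, value)  ("check", var)  ("load", var)  ("store", var)
Instr = Tuple


def insert_checks(program: List[Instr]) -> List[Instr]:
    """Untrusted 'compiler' pass: guard every access it cannot already vouch for."""
    out, checked = [], set()
    for ins in program:
        op = ins[0]
        if op == "set":
            checked.discard(ins[1])  # a new value invalidates any earlier check
        elif op == "check":
            checked.add(ins[1])
        elif op in ("load", "store") and ins[1] not in checked:
            out.append(("check", ins[1]))
            checked.add(ins[1])
        out.append(ins)
    return out


def verify(program: List[Instr]) -> bool:
    """Trusted verifier: one linear pass, no need to reason about actual values."""
    checked = set()
    for ins in program:
        op = ins[0]
        if op == "set":
            checked.discard(ins[1])
        elif op == "check":
            checked.add(ins[1])
        elif op in ("load", "store"):
            if ins[1] not in checked:
                return False  # unguarded access: refuse to run
        else:
            return False      # unknown instruction: refuse to run
    return True


def run(program: List[Instr], memory: list) -> int:
    """Runs a program only if it verifies; 'check' aborts on an out-of-range index."""
    if not verify(program):
        raise PermissionError("verification failed")
    env, acc = {}, 0
    for ins in program:
        op = ins[0]
        if op == "set":
            env[ins[1]] = ins[2]
        elif op == "check":
            if not 0 <= env.get(ins[1], -1) < len(memory):
                raise IndexError("memory access out of range: abort")
        elif op == "load":
            acc = memory[env[ins[1]]]
        elif op == "store":
            memory[env[ins[1]]] = acc
    return acc


program = [("set", "i", 3), ("load", "i"), ("set", "i", 0), ("store", "i")]
guarded = insert_checks(program)
print(verify(program), verify(guarded))  # False True
```

Stripping the checks back out of the guarded program (a "lying" compiler) makes verification fail again, so the kernel never has to trust whoever produced the code.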
The only way we will evolve on the web is with another bloody tag war.
I agree in general – though fortunately we've learned from last time, and there is more negotiation and less bloody war. One example is CSS animation: the WebKit developers designed and implemented a first draft, and provided it in their nightly builds, and sent a description to the CSS group to get feedback from developers of other browsers and from other people with relevant expertise.
Similarly, Opera proposed a <video> element earlier this year, and released an experimental alpha build with the feature. The HTML5 group developed a specification for it, and significantly extended the functionality based on feedback from relevant people (Apple, Google (YouTube), etc). Now Apple and Mozilla have experimental implementations of the same feature. There has been very little blood (except over the issue of codecs), and it seems a much better model than the old idea of simply releasing features in a new browser version and expecting your competitors to reverse-engineer your implementation. So there is some hope for the future.
It didn't affect normal images - it broke the drawImage function from the HTML 5 <canvas> element API, which is a fairly new feature and relatively rarely used, but still used widely enough that there were ~10 independent bug reports in a couple of days.
Still, I agree it's an unacceptable failure of testing, and I should have said that more strongly. Even the most trivial automated testing of that feature would have caught the problem immediately. Looking at the new tests in Firefox 3, there's still only one which incidentally relies on drawImage. (I have several hundred browser-independent canvas test cases, so I guess I should see if they could be incorporated into Mozilla somehow, to avoid a repeat of this problem in this particular area...)
The interesting thing is that it was the fastest ever release of a browser update. John Resig gives most of the details: A security patch in Firefox 2.0.0.10 was incorrectly checked in, and introduced a bug which was not caught by the testing process. That was only discovered after the release, so the code was fixed and the whole release process had to start up again. Three days later, the 2.0.0.11 update was available for forty languages and three platforms.
So, it reflects badly on Mozilla's testing efforts, though that is an area where Firefox 3 has made significant improvements with automated testing. It reflects well on their release process, which can push out a critical update in just a few days.
Opera 9.x is one of the only stable browsers with tentative support for HTML 5.
That's not really true - the four major desktop browsers all support different features that are newly specified in HTML 5. For the ones other than Opera: Firefox 2 supports client-side storage; Firefox and Safari 2 support the <canvas> element; IE7 and Safari support the contenteditable attribute and drag-and-drop; Safari supports <input type=range>; IE and FF and Safari support <input autocomplete>.
(But Opera does have the only native implementation of the new forms stuff, and Audio, and cross-document messaging, and server-sent DOM events, so I think it's still fair to say it supports HTML 5 more than any other browser.)
I would hope it'll go through some standardisation process before they try adding it to an official release; and that process hasn't happened at all yet, so it would be quite a while. (Opera experimented with a <video> feature some months ago, then submitted a proposal to the WHATWG, and it changed quite substantially when put into the specification - hopefully they'll use the same process for other new features.)
At least they've improved their 2d canvas support for Opera 9.5 (adding getImageData, setTransform, etc, and fixing a few bugs) - but there's still a long way to go before any browser does 2d correctly or matches any other browser, so I guess it'll be years before 3d is well supported.
A big advantage of XHTML was that the conversion to a parse tree was unambiguous. Why give up that at this late date?
The conversion of HTML5 to a parse tree is unambiguous too – the spec defines exactly what happens to any input document, including ones full of syntax errors. (Or at least it's unambiguous to machines – humans may have a harder time working out how a badly broken document gets parsed). There's currently a parse tree viewer from html5lib (Python and Ruby), and an independently-developed Java one, and some other private or incomplete implementations, and they should all give the same output for whatever input you try.
HTML5 is much more strict than HTML4, from the perspective of browsers – e.g. it defines precisely how any sequence of bytes is parsed and the DOM tree that is produced, and HTML5-compliant browsers must follow those rules. HTML4 left most of the parser undefined, so browser developers had to make it up themselves, which caused the "you couldn't get anything to display consistently across browsers". A fundamental goal of HTML5 is that HTML documents (including the significant majority that are invalid and full of syntax errors) must work consistently across browsers, by being strict in how browsers handle any content.
It's much less strict than XHTML from an author's perspective, but that is orthogonal to the issue of browser inconsistency – any garbage can get parsed by an HTML5 parser without raising fatal errors, but it will be parsed the same by all HTML5 parsers.
a certain browser vendor that has 90% of the market is notably absent from this venture.
That's not really true, since Microsoft is present in the HTML WG which is now working on HTML 5, and Chris Wilson is a co-chair of the group. They have been extremely quiet in the group so far, though.
User agents may support any audio codecs and container formats. User agents must support the WAVE container format with audio encoded using the PCM format.
(Bit-rate, bit-depth and number of channels (and maybe other aspects?) are undefined - I assume the specification may end up adding some restrictions on what support is required, depending on what implementors suggest.)
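For anyone wanting to test that baseline, here's a minimal sketch that produces a WAVE container holding uncompressed PCM using Python's standard wave module. The parameters are exactly the ones the draft leaves undefined; the choices here (16-bit samples, 8 kHz, mono, a 440 Hz tone) are arbitrary assumptions.

```python
import io
import math
import struct
import wave


def make_pcm_wave(freq_hz: float = 440.0, seconds: float = 0.1,
                  rate: int = 8000, channels: int = 1) -> bytes:
    """Encode a sine tone as 16-bit little-endian PCM inside a WAVE (RIFF) container."""
    n_frames = round(rate * seconds)
    peak = 0.5 * (2**15 - 1)  # half of full scale for 16-bit samples
    frames = bytearray()
    for n in range(n_frames):
        sample = int(peak * math.sin(2 * math.pi * freq_hz * n / rate))
        frames += struct.pack("<h", sample) * channels  # same sample on every channel
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # bytes per sample, i.e. 16-bit depth
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()


data = make_pcm_wave()
with wave.open(io.BytesIO(data), "rb") as r:
    # 'NONE' is how the wave module reports uncompressed PCM.
    print(r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes(), r.getcomptype())
```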
I've experimented with a simple FPS engine using the HTML5 <canvas> element (which provides a 2D graphics API for JavaScript). It's not exactly Quake (it's much more like Duke Nukem 3D minus the gameplay), but it works in most browsers and it's not too terribly slow. It can even do multiplayer AJAX deathmatch, though that's not available online right now...
Indeed, but web sites using deprecated features do not get updated and do not go away, so web browsers continue supporting those features forever, and almost nothing has ever been "phased out" in practice. (Browsers also have to continue supporting features that were never specified or documented at all – e.g. the front page of IMDB accidentally uses <image>, which still gets treated like <img> because nobody wants to write a browser that doesn't work with such sites). Since all the browsers implement those things, and any other HTML consumer ought to work the same if it's going to work as well as possible on the web, HTML5 does specify how things like plaintext are handled, and so it should continue to be supported correctly in the future.
It's still true that deprecated features are usually bad ideas and it's strongly suggested to not use them, and plaintext seems like a particularly bad idea, but the danger of them being phased out and no longer implemented is quite small.
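As an example of the kind of legacy rule that ends up specified, HTML5's tree-construction rules treat a start tag named "image" as if it were "img". Python's html.parser only tokenizes and doesn't apply tree-construction rules, so this sketch reimplements that one rule by hand; the markup is a made-up example, not IMDB's actual page.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collects src attributes of images, applying one HTML5 parser rule by hand:
    a start tag named "image" is handled as if it were "img"."""

    def __init__(self) -> None:
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "image":  # legacy misspelling that browsers (and HTML5) accept
            tag = "img"
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


collector = ImageSourceCollector()
collector.feed('<p><image src="logo.gif"> and <img src="photo.jpg"></p>')
collector.close()
print(collector.sources)  # ['logo.gif', 'photo.jpg']
```

Any HTML consumer that skips this rule would silently miss images that every browser displays, which is the practical argument for specifying it.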
It's only syntax, it shouldn't be a big deal. There's plenty of XML-based tools that are useful, and HTML5 goes to some lengths to define the text/html (i.e. non-XML) syntax so you can still use those tools and just translate the syntax at the edges.
The text/html and XML syntaxes are based on exactly the same underlying conceptual model (the DOM tree), so you can switch without any radical changes. E.g. the validator.nu HTML5 parser implements the same APIs as standard XML parsers - drop it in front of your existing XML tools and libraries, stick an HTML serialiser on the other end, and your system can work pretty much the same as before (with the bonus of working for any arbitrary page on the web, not just the tiny fraction that are well-formed XML).
Who is helped by a standard that almost everybody ignores? If you, say, want to write code to parse HTML pages, and you try to implement what HTML4 specifies (based on SGML), your code will be pretty useless because HTML4 is incompatible with reality and you'll get incorrect output most of the time (stray characters, incorrectly nested elements, half the page text disappearing inside a misparsed script element, etc). Similarly if you implement what XHTML specifies, you'll fail since most pages aren't well-formed XML. You can declare that those pages are broken and non-standard but that doesn't stop them from existing and being a serious problem for anybody writing software that interacts with the web.
Nowadays you can just implement what HTML5 specifies (or find a library that already does it), and your parser will work identically to the current or near-future versions of all major browsers - it's defined in enough detail that there's no ambiguity in how to process any stream of bytes. That's never been possible before, when the standards were focused on some vision of a simple coherent syntax and refused to deal with the messy details that are critical in real life.
If you want to document a set of best practices for writing HTML, with rules for lowercase names and closing tags and quoting attributes and for indentation etc, that's fine and would be nice (especially if you could find a way to motivate people to follow the best practices - a decade of promoting XHTML doesn't seem to have stopped people writing terrible code so we need a better way). Meanwhile, HTML5 is solving the harder problem of how to cope with people who ignore those rules.
SpiderMonkey uses 64-bit value types on all architectures (x86, x86-64, ARM, etc), storing either a 64-bit float or a 32-bit int or a pointer (31 bits on 32-bit, 47 bits on 64-bit), so it shouldn't make any difference to their memory usage. (The non-float values get packed into the range of unused NaN float representations, to avoid ambiguity). I think other modern JS engines do pretty much the same thing. JS semantics are that numbers are 64-bit floats, so implementations couldn't really use 63-bit ints (too precise) or 32-bit floats (too imprecise) anyway, though 32-bit ints are a safe optimisation.
Have you seen the OpenSUSE Build Service? That can automatically build native packages for several distros (OpenSUSE, Fedora, Mandriva, Debian, Ubuntu (if you don't depend on anything in Universe)), and already has plenty of games, and isn't too hard to set up when you can copy from existing examples. (I've been trying to use it for 0 A.D. and it seems okay so far.)
Reminds me of Glasshouse. Hop into a nanoassembler gate, get your brain backed up, switch to a healthy new physical body if you fancy. Murder is a minor crime but identity theft is extremely serious. Works great until someone releases a worm that uses humans as transmission vectors, infecting the assembler gates and deleting certain memories from anyone who uses them. You have to put a lot of trust into whoever runs the technology, and they're bound to make mistakes.
Subsetting is not EOT functionality - EOT is basically just a wrapper around a TTF file, and subsetting just involves modifying the TTF, so you can do exactly the same in browsers that read raw TTF files. I've written a font optimizer tool (open source) that does that. (Windows has an API to generate embedded fonts with subsetting, which the WEFT tool uses; I'm not currently aware of any other subsetting implementations.)
Gazelle is from Microsoft Research, and their paper discusses the details of the security model - it's not just a marketing claim.
The idea is that every 'origin' (basically a domain name, which is used as the basis for access control in all modern browsers) is separated into its own sandboxed process. If a page on your domain embeds an iframe from an advertiser's domain, the iframe is rendered in a separate process, and all communication is handled through a Browser Kernel which enforces the security constraints (e.g. preventing the advert from touching or rendering anything outside its iframe box, even if an attacker can find a way to execute arbitrary code in it). Plugins are handled in the same way.
Chrome's security model doesn't handle that kind of separation of multiple sites within a single page. But Gazelle sacrifices some backward compatibility (e.g. it removes the document.domain attribute, and it requires all plugins to be rewritten to use the Browser Kernel instead of directly accessing the network or filesystem), which is unlikely to be acceptable in practice.
And Gazelle is certainly not a replacement for the IE engine - it's built on the existing IE7 components for parsing, rendering, scripting, etc. It's research, and the value is its ideas, some of which could perhaps be integrated into current browser engines to improve security. It's not meant to be a real browser engine, but it seems successful as a research experiment.
The PADDINGXXPADDING is just a standard artifact of the Visual C++ build process - there's a manifest XML string that's added to the .exe (for 'side-by-side' DLL dependency handling), and padding is added for some internal alignment requirements. (This article says the UpdateResource API is what adds that string). So it's nothing unusual or suspicious.
Charles Darwin and the Tree of Life?
There's also an interesting quote from David Attenborough in response to people asking "why he did not give "credit" to God" for the subjects of his nature documentaries:
"Somewhat"? In Flatland, the social status of men is proportional to their number of sides (triangles are the lowest class, and priests are nearly circles); women are even lower, being straight lines. Women are not allowed to walk in public spaces without swaying and emitting noises, so that men do not accidentally get impaled on them. They have to enter their houses by the back door. They are considered "wholly devoid of brain-power", driven by emotion and instinct and lacking memory, and they receive no education.
But it's social satire, not a reflection of the author's views. He was "a firm believer in equality of educational opportunity, across social classes and in particular for women", and the book is attempting to highlight a Victorian mindset that was still prevalent at that time. The women in the book act in far more complex ways than their men give them credit for. The author even says "To my readers in Spaceland the condition of our Women may seem truly deplorable, and indeed it is" - he's not happy with how they're treated, and readers in Spaceland will hopefully see that it's caused by the absurd class system holding them back, though the narrator can't avoid falling back into the prejudices of his society.
The book makes more sense when you understand the context. The Annotated Flatland is quite interesting, providing some background on the author and mathematics and the society of the time.
("more sense" doesn't mean it actually does make sense - it all still seems a bit muddled to me, with a random mixture of physical differences and social differences between people, and strange science (like Lamarckian evolution where the actions of a parent affect the number of sides (hence social status) not of themselves but of their offspring), and sections that I don't understand the point of (like the whole thing about colour being discovered and then banned - it makes sense within Flatland but is it meant to be satirising anything in real life?). Much of it is probably because the world has changed so drastically in 125 years that I just can't understand where the author was coming from. But it's an interesting book despite (or perhaps because of) that.)
Yep, but I was responding to the "note that their chip uses doubles, not floats" statement, and my point was that it does doubles and floats (presumably somewhat like SSE).
That's not true. From their paper:
And it's definitely aimed largely at games: the paper gives performance studies of DirectX 9 rendering from Half Life 2, FEAR and Gears of War.
They are going to change. It's not yet decided exactly how they will change – the HTML WG has Web Forms 2 (an extension of HTML4's forms), and the Forms WG is working on some rough ideas for trying to fit XForms into HTML5, and there is a joint Task Force that is meant to be working things out between the groups but hasn't actually managed to achieve anything yet. (None of the major browser developers has indicated much interest in implementing XForms, whereas Opera has already released an implementation of WF2 and there is some ongoing work to implement parts in Firefox and Safari, so the momentum is currently in that direction.)
Web Forms 2 says "input elements of type hidden may be placed anywhere (both in inline contexts and block contexts)", which sounds like it satisfies your concern (and has the advantage of working in all existing web browsers, unlike a new <state> element).
<table> has never been deprecated, and HTML5 still permits it. (Tables used for layout are not allowed, although that's impossible for an automatic validator to detect). There are already CSS properties that can replace cellpadding ('padding') and cellspacing ('border-spacing').
It seems spec writers usually think that kind of thing should be described in tutorials or other documents, not in the specification. The HTML5 spec is far harder to read than HTML4 (because it's far more detailed, to fix the differences between implementations caused by HTML4's vagueness), so it really needs that kind of user-oriented documentation. The differences document gives a brief mention of what should be used instead of some obsolete features, but it would be nice to have more detail and examples for people who want to move to HTML5.
You can do this without a trusted compiler: when an untrusted compiler compiles a program, it can work out how to prove that the program is 'correct' (e.g. follows the restrictions on memory accesses, perhaps by first having the compiler insert "if memory access is out of range: abort" commands before every access it's unsure about), and then include the proof in the compiled code. The kernel just has to verify that the proof is valid, which is much easier than working out the proof in the first place. If the compiler lies about code being correct, the verifier will detect that and refuse to run it.
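As a very rough illustration of the asymmetry between finding a safety argument and checking one, here is a toy Python sketch. The instruction format, the `MEMORY_SIZE` limit and the functions `transfer`, `insert_checks` and `verify` are all hypothetical, and it only handles straight-line code with no jumps. Real proof-carrying code ships a separate formal proof; here the "proof" is just the placement of run-time checks, which is closer to how software-fault-isolation validators work, but the kernel-side verifier is still a single cheap pass that never trusts the compiler.

```python
from typing import List, Set, Tuple

# Hypothetical toy instruction format (straight-line code, no jumps):
#   ("set", reg, const)   reg = const
#   ("add", reg, const)   reg += const
#   ("check", reg)        abort at run time unless 0 <= reg < MEMORY_SIZE
#   ("load", reg) / ("store", reg)   access memory at address reg
Instr = Tuple
MEMORY_SIZE = 1024  # hypothetical size of the program's memory region
KNOWN_OPS = {"set", "add", "check", "load", "store"}


def transfer(safe: Set[str], instr: Instr) -> None:
    """Update the set of registers known to hold an in-range address."""
    op, reg = instr[0], instr[1]
    if op == "check":
        safe.add(reg)  # execution only continues past a check if reg is in range
    elif op == "set":
        if 0 <= instr[2] < MEMORY_SIZE:
            safe.add(reg)
        else:
            safe.discard(reg)
    elif op == "add":
        safe.discard(reg)  # arithmetic might move the address out of range


def insert_checks(program: List[Instr]) -> List[Instr]:
    """Untrusted compiler: add a check before every access it can't prove safe."""
    safe: Set[str] = set()
    out: List[Instr] = []
    for instr in program:
        if instr[0] in ("load", "store") and instr[1] not in safe:
            check = ("check", instr[1])
            out.append(check)
            transfer(safe, check)
        out.append(instr)
        transfer(safe, instr)
    return out


def verify(program: List[Instr]) -> bool:
    """Trusted kernel: accept only if every memory access is provably in range."""
    safe: Set[str] = set()
    for instr in program:
        if instr[0] not in KNOWN_OPS:
            return False  # reject anything the verifier doesn't understand
        if instr[0] in ("load", "store") and instr[1] not in safe:
            return False
        transfer(safe, instr)
    return True


unsafe = [("set", "r1", 5), ("add", "r1", 2000), ("load", "r1")]
print(verify(unsafe))                 # False
print(verify(insert_checks(unsafe)))  # True
```

If a lying compiler deleted one of the inserted checks, `verify` would simply reject the program, so nothing about the compiler needs to be trusted.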
The compiler can accept an unsafe language like C, and just emit an error message (and hopefully tell the user where the problem is) if it's unable to prove the program is correct. It wouldn't be possible to run any arbitrary C program, but it wouldn't be necessary to teach people a whole new language.
Of course this is all active research, i.e. it doesn't actually work well enough in practice (and maybe it never will), but it's still an alternative that could work and avoids any reliance on trusted compilers.
I agree in general – though fortunately we've learned from last time, and there is more negotiation and less bloody war. One example is CSS animation: the WebKit developers designed and implemented a first draft, and provided it in their nightly builds, and sent a description to the CSS group to get feedback from developers of other browsers and from other people with relevant expertise.
Similarly, Opera proposed a <video> element earlier this year, and released an experimental alpha build with the feature. The HTML5 group developed a specification for it, and significantly extended the functionality based on feedback from relevant people (Apple, Google (YouTube), etc). Now Apple and Mozilla have experimental implementations of the same feature. There has been very little blood (except over the issue of codecs), and it seems a much better model than the old idea of simply releasing features in a new browser version and expecting your competitors to reverse-engineer your implementation. So there is some hope for the future.
It didn't affect normal images: it broke the drawImage function from the HTML 5 <canvas> element API, which is a fairly new feature and is used on relatively few sites, but still widely enough that there were ~10 independent bug reports in a couple of days.
Still, I agree it's an unacceptable failure of testing, and I should have said that more strongly. Even the most trivial automated testing of that feature would have caught the problem immediately. Looking at the new tests in Firefox 3, there's still only one which incidentally relies on drawImage. (I have several hundred browser-independent canvas test cases, so I guess I should see if they could be incorporated into Mozilla somehow, to avoid a repeat of this problem in this particular area...)
The interesting thing is that it was the fastest ever release of a browser update. John Resig gives most of the details: A security patch in Firefox 2.0.0.10 was incorrectly checked in, and introduced a bug which was not caught by the testing process. That was only discovered after the release, so the code was fixed and the whole release process had to start up again. Three days later, the 2.0.0.11 update was available for forty languages and three platforms.
So, it reflects badly on Mozilla's testing efforts, though that is an area where Firefox 3 has made significant improvements with automated testing. It reflects well on their release process, which can push out a critical update in just a few days.
That's not really true: the four major desktop browsers all support different features that are newly specified in HTML 5. For the ones other than Opera: Firefox 2 supports client-side storage; Firefox and Safari 2 support the <canvas> element; IE7 and Safari support the contenteditable attribute and drag-and-drop; Safari supports <input type=range>; IE, Firefox and Safari support <input autocomplete>.
(But Opera does have the only native implementation of the new forms stuff, and Audio, and cross-document messaging, and server-sent DOM events, so I think it's still fair to say it supports HTML 5 more than any other browser.)
I would hope it'll go through some standardisation process before they try adding it to an official release; and that process hasn't happened at all yet, so it would be quite a while. (Opera experimented with a <video> feature some months ago, then submitted a proposal to the WHATWG, and it changed quite substantially when put into the specification - hopefully they'll use the same process for other new features.)
At least they've improved their 2d canvas support for Opera 9.5 (adding getImageData, setTransform, etc, and fixing a few bugs) - but there's still a long way to go before any browser does 2d correctly or matches any other browser, so I guess it'll be years before 3d is well supported.
The conversion of HTML5 to a parse tree is unambiguous too: the spec defines exactly what happens to any input document, including ones full of syntax errors. (Or at least it's unambiguous to machines; humans may have a harder time working out how a badly broken document gets parsed.) There's currently a parse tree viewer from html5lib (Python and Ruby), an independently developed Java one, and some other private or incomplete implementations, and they should all give the same output for whatever input you try.
HTML5 is much stricter than HTML4 from the perspective of browsers: e.g. it defines precisely how any sequence of bytes is parsed and what DOM tree is produced, and HTML5-compliant browsers must follow those rules. HTML4 left most of the parser undefined, so browser developers had to make it up themselves, which is what caused the problem that "you couldn't get anything to display consistently across browsers". A fundamental goal of HTML5 is that HTML documents (including the significant majority that are invalid and full of syntax errors) must work consistently across browsers, which it achieves by being strict about how browsers handle any content.
It's much less strict than XHTML from an author's perspective, but that is orthogonal to the issue of browser inconsistency – any garbage can get parsed by an HTML5 parser without raising fatal errors, but it will be parsed the same by all HTML5 parsers.
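The difference in author-facing strictness can be seen with two parsers from Python's standard library. This is only an illustration: `xml.etree.ElementTree` is a real XML parser and fails fatally on malformed markup, whereas `html.parser.HTMLParser` merely tokenises and keeps going. It does not implement HTML5's tree-construction rules (html5lib does), so it shows the tolerance but not the exact HTML5 output. The sample document and the helper names are made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Made-up example: <b> and <i> are never closed.
BROKEN = "<p>Some <b>bold <i>text</p>"


def parse_as_xml(source: str) -> Optional[ET.Element]:
    """XML's draconian rule: a well-formedness error is fatal, so return None."""
    try:
        return ET.fromstring(source)
    except ET.ParseError:
        return None


class EventRecorder(HTMLParser):
    """Records tokenizer events; it carries on through malformed markup."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def parse_as_html_tokens(source: str) -> List[Tuple[str, str]]:
    recorder = EventRecorder()
    recorder.feed(source)
    recorder.close()
    return recorder.events


print(parse_as_xml(BROKEN))  # None
print(parse_as_html_tokens(BROKEN))
```

A full HTML5 parser goes further than this tokenizer: it also decides where the unclosed <b> and <i> end up in the DOM, and every conforming implementation must make the same decision.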
That's not really true, since Microsoft is present in the HTML WG which is now working on HTML 5, and Chris Wilson is a co-chair of the group. They have been extremely quiet in the group so far, though.
HTML 5 currently says
(Bit-rate, bit-depth and number of channels (and maybe other aspects?) are undefined - I assume the specification may end up adding some restrictions on what support is required, depending on what implementors suggest.)
I've experimented with a simple FPS engine using the HTML5 <canvas> element (which provides a 2D graphics API for JavaScript). It's not exactly Quake (it's much more like Duke Nukem 3D minus the gameplay), but it works in most browsers and it's not too terribly slow. It can even do multiplayer AJAX deathmatch, though that's not available online right now...
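A common way to draw a first-person view with only a 2D API is to cast one ray per screen column and draw a vertical strip whose height is inversely proportional to the distance to the wall (on a canvas, that's one fillRect per column). The Python sketch below shows that Wolfenstein-style grid raycasting technique. It is not necessarily how the engine described here works, and the map, field of view and function names are hypothetical. The map must be enclosed by walls, otherwise a ray could run off the grid.

```python
import math
from typing import List

# Hypothetical map: '#' is a wall, '.' is empty floor. It must be fully enclosed.
GRID = [
    "#####",
    "#...#",
    "#...#",
    "#...#",
    "#####",
]


def cast_ray(grid: List[str], px: float, py: float, angle: float) -> float:
    """Distance from (px, py) to the first wall along `angle`, via DDA grid stepping."""
    dx, dy = math.cos(angle), math.sin(angle)
    map_x, map_y = int(px), int(py)
    # Ray length needed to cross one whole cell horizontally / vertically.
    delta_x = abs(1 / dx) if dx != 0 else math.inf
    delta_y = abs(1 / dy) if dy != 0 else math.inf
    if dx < 0:
        step_x, side_x = -1, (px - map_x) * delta_x
    else:
        step_x, side_x = 1, (map_x + 1 - px) * delta_x
    if dy < 0:
        step_y, side_y = -1, (py - map_y) * delta_y
    else:
        step_y, side_y = 1, (map_y + 1 - py) * delta_y
    # Step across whichever grid line the ray reaches first, until we enter a wall cell.
    while True:
        if side_x < side_y:
            dist, side_x, map_x = side_x, side_x + delta_x, map_x + step_x
        else:
            dist, side_y, map_y = side_y, side_y + delta_y, map_y + step_y
        if grid[map_y][map_x] == "#":
            return dist


def column_heights(grid: List[str], px: float, py: float, heading: float,
                   fov: float, screen_w: int, screen_h: int) -> List[int]:
    """Wall-strip height for each screen column (what a canvas renderer would fill)."""
    heights = []
    for col in range(screen_w):
        # Spread the rays evenly across the field of view, centred on the heading.
        ray_angle = heading + ((col + 0.5) / screen_w - 0.5) * fov
        dist = cast_ray(grid, px, py, ray_angle)
        # Use the distance along the view direction to avoid a fisheye effect.
        perp = dist * math.cos(ray_angle - heading)
        heights.append(min(screen_h, int(screen_h / perp)))
    return heights


print(column_heights(GRID, 2.5, 2.5, 0.0, math.pi / 3, 1, 120))  # [80]
```

Standing in the middle of the room and facing east, the wall is 1.5 units away, so on a 120-pixel-high screen the centre strip is 120 / 1.5 = 80 pixels tall.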
I expect id would do something more sane, though.
Indeed, but web sites using deprecated features do not get updated and do not go away, so web browsers continue supporting those features forever, and almost nothing has ever been "phased out" in practice. (Browsers also have to continue supporting features that were never specified or documented at all: e.g. the front page of IMDB accidentally uses <image>, which still gets treated like <img> because nobody wants to write a browser that doesn't work with such sites.) Since all the browsers implement those things, and any other HTML consumer ought to work the same if it's going to work as well as possible on the web, HTML5 does specify how things like <plaintext> are handled, and so they should continue to be supported correctly in the future.
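HTML5's tree-construction rules include a special case for exactly this: a start tag named `image` is a parse error, and the token is renamed to `img` and reprocessed. Below is a minimal Python sketch of just that one rule, layered on the standard library's `html.parser` tokenizer (which does not implement HTML5 tree construction itself). The class name, the `parse_errors` list and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class LegacyTagParser(HTMLParser):
    """Collects start tags, applying HTML5's rule that <image> is treated as <img>."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[Tuple[str, Dict[str, Optional[str]]]] = []
        self.parse_errors: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "image":  # html.parser has already lower-cased the tag name
            self.parse_errors.append("unexpected <image> start tag; treating as <img>")
            tag = "img"
        self.tags.append((tag, dict(attrs)))


parser = LegacyTagParser()
parser.feed('<p><image src="logo.gif" alt="Logo"></p>')
parser.close()
print(parser.tags)  # [('p', {}), ('img', {'src': 'logo.gif', 'alt': 'Logo'})]
```

The point is that the error is recorded but never fatal: the page still gets an ordinary image, just as it does in every browser.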
It's still true that deprecated features are usually bad ideas and it's strongly suggested to not use them, and <plaintext> seems like a particularly bad idea, but the danger of them being phased out and no longer implemented is quite small.