Slashdot Mirror


HTML V5 and XHTML V2

An anonymous reader writes "While the intention of both HTML V5 and XHTML V2 is to improve on the existing versions, the approaches chosen by the developers to make those improvements are very different. With differing philosophies come distinct results. For the first time in many years, the direction of upcoming browser versions is uncertain. This article uncovers the bigger picture behind the details of these two standards."

5 of 344 comments (clear)

  1. Bet there still isn't a decent "Stop!" button by TheLink · · Score: 5, Interesting

    You have to hand it to the W3C, they keep supplying web designers with rope.

    I've been trying to get them (and browser people) to include a security oriented tag to disable unwanted features.

    Why such tags are needed:

    Say you run a site (webmail, myspace (remember the worm?), bbs etc) that is displaying content from 3rd parties (adverts, spammers, attackers) to unknown browsers (with different parsing bugs/behaviour).

    With such tags you can give hints to the browsers to disable unwanted stuff between the tags, so that even if your site's filtering is insufficient (doesn't account for a problem in a new tag, or the browser interprets things differently/incorrectly), a browser that supports the tag will know that stuff is disabled, and thus the exploit fails.

    I'm suggesting something like:

    <restricton lock="Random_hard_to_guess_string" except="java,safe-html" />
    browser ignores features except for java and safe-html.
    unsafe content here, but rendered safely by browser
    <restrictoff lock="wrong_string" />
    more unsafe content here but still rendered safely by browser
    <restrictoff lock="Random_hard_to_guess_string" />
    all features re-enabled

    safe-html = a subset of html that we can be confident that popular browsers can render without being exploited e.g. <em>, <p>).

    It doesn't have to be exactly as I suggest - my main point is HTML needs more "stop/brake" tags, and not just "turn/go faster" tags.

    Before anyone brings it up, YES we must still attempt to filter stuff out (use libraries etc), the proposed tags are to be a safety net. Defense in depth.

    With this sort of tag a site can allow javascript etc for content directly produced by the site, whilst being more certain of disabling undesirable stuff on 3rd party content that's displayed together (webmail, comments, malware from exploited advert/partner sites).

    --
  2. Re:Web Applications? by gsnedders · · Score: 5, Informative

    HTML 5 is aiming to support various things needed for web applications (in fact, the current draft is formed of two documents: Web Applications 1.0 and Web Forms 2.0). Also, see http://www.w3.org/2006/appformats/admin/charter.html.

  3. Different directions -- Need Both by alexhmit01 · · Score: 5, Insightful

    Most of the web is non well-formed, so it's variations of HTML 4 with non-standard components. An HTML 5, that remains a non-XML language, presents a reasonable way forward for "web sites." Without the need to be well-formed, the tools to create are easier and can be sloppy, particularly for moderately admined sites. Creating a new HTML 5 might succeed in migrating those sites. If you avoid most breaks with HTML 4, beyond the worst offenders, Browsers could target an HTML 5, and webmasters would only need to change 5%-10% of the content to keep up. That would mean a less degrading "legacy" mode than the HTML 4 renderers we have now.

    So while the HTML 4 renderers floating around wouldn't be trashed, they could be ignored, left as is, and focus on an HTML 5 one. Migrating to XHTML is non-trivial for people with out-dated tools and lack of knowledge. You can't ignore those sites as a browser maker, but HTML 5 might give a reasonable path to modernizing the "non-professional" WWW.

    XHTML has some great features, by being well-formed XML, you can use XML libraries for parsing the pages. This makes it much easier to "scrape" data off pages and handle inter-system communication, which HTML is not equipped for.

    It's interesting in that HTML and XHTML look almost identical (for good reasons, XHTML was a port of HTML to XML) but are technically very different, HTML being an SGML language, and XHTML an XML language. Both programs have their uses, HTML is "easier" for people to hack together because if you do it wrong, the HTML renderer makes a best guess. XHTML is easier to use professionally, because if there is a problem, you can catch it as being an invalid XML document. Professionals worry about cross-browser issues, amateurs worry about getting it out there.

    XHTML "failed" to replace HTML because it satisfies the needs of professionals to have a standardized approach to minimize cross-browser issues, but lacks the simplicity needed for amateurs and lousy professionals.

    Rev'ing both specs would be a forward move that might simplify browser writing in the long term while giving a migration path. XHTML needs a less confusing and forward looking path, and HTML needs to be Rev'd after being left for dead to drop the really problematic entries and give people a path forward.

  4. Re:Why not ditch HTML? by GrouchoMarx · · Score: 5, Interesting

    As a professional web developer and standards nazi, I'd agree with you if it weren't for one thing: User-supplied content.

    For content generated by the site author or a CMS, I would agree. Sending out code that is not XHTML compliant is unprofessional. Even if you don't want to make the additional coding changes to your site to make it true XHTML rather than XHTML-as-HTML, All of the XHTML strictness rules make your code better, where "better" means easier to maintain, faster, less prone to browser "interpretation", etc. Even just for your own sake you should be writing XHTML-as-HTML at the very least. (True XHTML requires changes to the mime type and to the way you reference stylesheets, and breaks some Javascript code like document.write(), which are properly left in the dust bin along with the font tag.)

    But then along comes Web 2.0 and user-supplied content and all that jazz. If you allow someone to post a comment on a forum, like, say, Slashdot, and allow any HTML code whatsoever, you are guaranteed to have parse errors. Someone, somewhere, is going to (maliciously or not) forget a closing tag, make at typo, forget a quotation mark, overlap a b and an i tag, nest something improperly, forgets a / in a self-closing tag like hr or br, etc. According to strict XHTML parsing rules, that is, XML parsing rules, the browser is then supposed to gag and refuse to show the page at all. I don't think Slashdot breaking every time an AC forgets to close his i tag is a good thing. :-)

    While one could write a tidy program (and people have) that tries to clean up badly formatted code, they are no more perfect than the "guess what you mean" algorithms in the browser itself. It just moves the "guess what the user means" algorithm to the server instead of the browser. That's not much of an improvement.

    Until we can get away with checking user-submitted content on submission and rejecting it then, and telling the user "No, you can't post on Slashdot or on the Dell forum unless you validate your code", browsers will still have to have logic to handle user-supplied vomit. (And user, in this case, includes a non-programmer site admin.)

    The only alternative I see is nesting "don't expect this to be valid" tags in a page, so the browser knows that the page should validate except for the contents of some specific div. I cannot imagine that making the browser engine any cleaner, though, and would probably make it even nastier. Unless you just used iframes for that, but that has a whole host of other problems such as uneven browser support, inability to size dynamically, a second round-trip to the server, forcing the server/CMS to generate two partial pages according to god knows what logic...

    As long as non-programmers are able to write markup, some level of malformed-markup acceptance is necessary. Nowhere near the vomit that IE encourages, to be sure, but "validate or die" just won't cut it for most sites.

    --

    --GrouchoMarx
    Card-carrying member of the EFF, FSF, and ACLU. Are you?

  5. Re:The current situation is awful. by shutdown+-p+now · · Score: 5, Insightful
    Drag'n'drop is simply not a working approach to design proper UI (i.e. the one that automatically scales and reflows to any DPI / window size / whatever).

    As for "defining named things" - the concept of HTML is all about semantic markup. That's why using tables for layout is frowned upon, not because they are bad as such.