Slashdot Mirror


HTML V5 and XHTML V2

An anonymous reader writes "While the intention of both HTML V5 and XHTML V2 is to improve on the existing versions, the approaches chosen by the developers to make those improvements are very different. With differing philosophies come distinct results. For the first time in many years, the direction of upcoming browser versions is uncertain. This article uncovers the bigger picture behind the details of these two standards."

17 of 344 comments

  1. Re:Bet there still isn't a decent "Stop!" button by LiquidCoooled · · Score: 3, Insightful

    Why not just simplify your entire comment:

    Content from a 3rd party runs in a more restrictive context than the primary site (this includes frames etc).
    You are then not held at the whim of a web admin to ensure these tags are included.

    Or you could just use the NoScript add-on right now and choose which sites you trust at your discretion.

    --
    liqbase :: faster than paper
  2. Browser vendors choice by gsnedders · · Score: 4, Insightful

    All the browser vendors have already said they will support HTML 5 (yes, that includes MS) and all but MS have said they won't support XHTML 2 (MS hasn't made much of an effort to suggest they will support it either).

    As it stands, with both XHTML 5 and XHTML 2 using the same namespace, it is only possible to support one of the two.

  3. Re:I bet my ass.. by coryking · · Score: 3, Insightful

    I urge every web developer to stop treating MSIE as a special case, since it does not follow standards

    No offense to you, but I love how every single person who smugly suggests this usually has a link to a website that looks like shit when viewed on any browser.

    This also seems to be the case whenever somebody bitches about web designers changing fonts, using JavaScript, or doing something to make their page look nice. You visit the websites created by the "changing the font at all, even in the stylesheet, is evil" or the classic "why are you trying to use two columns? two columns are evil" religious zealots and all their pages look really dull and boring. Long streams of Times New Roman. I guess this is our future, eh?
  4. reboot the web! by wwmedia · · Score: 4, Insightful

    Am I the only developer that's sick of this HTML / CSS / JavaScript mess??

    People and companies are trying to develop rich applications using a decade-old markup language that's improperly supported by different browsers (even Firefox doesn't fully support CSS yet), and it's a very ugly mix right now. It's like squeezing a rectangular plasticine object through round, triangular and star-shaped holes at the same time.

    The web needs a reboot.

    We need a programming language that:
    * works on the server and the client
    * makes building UIs as easy as drag and drop
    * does not forgive idiot HTML "programmers" who write bad code
    * doesn't suffer from XSS
    * can be extended easily
    * can be "compiled" for faster execution
    * is implemented the same way in all browsers (or even better, doesn't require a browser and works on a range of platforms)

  5. Different directions -- Need Both by alexhmit01 · · Score: 5, Insightful

    Most of the web is not well-formed, so it's variations of HTML 4 with non-standard components. An HTML 5 that remains a non-XML language presents a reasonable way forward for "web sites." Without the need to be well-formed, the tools to create pages are easier to build and can be sloppy, particularly for moderately administered sites. Creating a new HTML 5 might succeed in migrating those sites. If you avoid most breaks with HTML 4, beyond the worst offenders, browsers could target an HTML 5, and webmasters would only need to change 5%-10% of the content to keep up. That would mean a less degrading "legacy" mode than the HTML 4 renderers we have now.

    So while the HTML 4 renderers floating around wouldn't be trashed, they could be left as is, and browser makers could focus on an HTML 5 one. Migrating to XHTML is non-trivial for people with outdated tools and a lack of knowledge. You can't ignore those sites as a browser maker, but HTML 5 might give a reasonable path to modernizing the "non-professional" WWW.

    XHTML has some great features. Because it is well-formed XML, you can use XML libraries for parsing the pages. This makes it much easier to "scrape" data off pages and handle inter-system communication, which HTML is not equipped for.
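
    As a rough illustration of that point, here is a minimal sketch that scrapes list items out of a well-formed XHTML page using only Python's standard XML library. The page content, the "item" class and the scrape_items helper are made up for the example; only the XHTML namespace URI is real.

```python
import xml.etree.ElementTree as ET

# The XHTML 1.x namespace; elements in a conforming page live in it.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page, invented for this example.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Price list</title></head>
  <body>
    <ul>
      <li class="item">Widget</li>
      <li class="item">Gadget</li>
      <li class="note">Prices exclude tax</li>
    </ul>
  </body>
</html>"""


def scrape_items(xhtml_text):
    """Return the text of every <li class="item"> in a well-formed XHTML page."""
    root = ET.fromstring(xhtml_text)  # raises ET.ParseError if not well-formed
    ns = {"x": XHTML_NS}
    return [li.text for li in root.findall(".//x:li[@class='item']", ns)]


print(scrape_items(PAGE))
```

    No HTML-specific parser or error recovery is involved; a generic XML toolkit is enough, which is the advantage being described.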

    It's interesting that HTML and XHTML look almost identical (for good reason: XHTML was a port of HTML to XML) but are technically very different, HTML being an SGML language and XHTML an XML language. Both languages have their uses. HTML is "easier" for people to hack together because if you do it wrong, the HTML renderer makes a best guess. XHTML is easier to use professionally because if there is a problem, you can catch it as an invalid XML document. Professionals worry about cross-browser issues; amateurs worry about getting it out there.
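
    The difference in error handling can be sketched with Python's standard library. This is only an approximation: html.parser is a lenient tokenizer rather than a browser renderer, and the BROKEN fragment and helper names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical tag soup: <b> is never closed and </i> was never opened.
BROKEN = "<p>An unclosed <b>bold tag and a stray </i></p>"


class TagLogger(HTMLParser):
    """Records start and end tags exactly as the lenient HTML tokenizer sees them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(text):
    """True if a strict XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_events(text):
    """Tag events reported by the lenient HTML tokenizer, with no error raised."""
    logger = TagLogger()
    logger.feed(text)
    logger.close()
    return logger.events


print(xml_accepts(BROKEN))   # the strict parser rejects the fragment
print(html_events(BROKEN))   # the lenient one carries on regardless
```

    The XML side gives you a hard failure you can catch before publishing; the HTML side leaves it to whatever consumes the markup to guess what was meant.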

    XHTML "failed" to replace HTML because it satisfies the needs of professionals to have a standardized approach to minimize cross-browser issues, but lacks the simplicity needed for amateurs and lousy professionals.

    Rev'ing both specs would be a forward move that might simplify browser writing in the long term while giving a migration path. XHTML needs a less confusing, forward-looking path, and HTML needs to be rev'd after being left for dead, to drop the really problematic entries and give people a path forward.

  6. Re:Bet there still isn't a decent "Stop!" button by throup · · Score: 3, Insightful

    Could you not get around that by injecting code like:

    </restriction> <!-- closes the existing restriction zone. Might not pass as valid XML, but HTML browsers work with tag soup. -->
    Something evil!!!
    <restriction lock="I don't really care here" except="everything"> <!-- This bit is purely optional -->

    Obviously I need to work on something more destructive than "Something evil!!!" before I attempt to conquer the planet...

  7. Re:Bet there still isn't a decent "Stop!" button by Bogtha · · Score: 4, Insightful

    even if your site's filtering is insufficient (doesn't account for a problem in a new tag

    Why would your site let through new tags that it doesn't recognise? Use a whitelist.
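
    A minimal sketch of the whitelist idea, assuming Python's standard html.parser: only tags on the list are re-emitted, all attributes are dropped, and text is re-escaped. The ALLOWED_TAGS set and the sanitize helper are illustrative choices, and this is nowhere near a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative whitelist; anything not listed here is dropped.
ALLOWED_TAGS = {"em", "p"}


class WhitelistFilter(HTMLParser):
    """Re-emits whitelisted tags without attributes and escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are deliberately discarded
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything opened since the matching start tag.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(fragment):
    """Return the fragment reduced to whitelisted, properly nested tags."""
    f = WhitelistFilter()
    f.feed(fragment)
    f.close()
    while f.open_tags:  # close anything the input left open
        f.out.append(f"</{f.open_tags.pop()}>")
    return "".join(f.out)


print(sanitize('<p onclick="evil()">Hi <script>alert(1)</script><em>there</p>'))
```

    Because the output is rebuilt from scratch rather than patched, a tag the filter has never heard of simply never reaches the page.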

    the browser interprets things differently/incorrectly

    This only usually occurs if you let through malformed HTML. Use tidy or similar to ensure you only emit valid HTML. Not to mention the fact that the whole problem is caused by lax parsing — something the W3C has been trying to get people to give up on with the parsing requirements for XML.

    safe-html = a subset of html that we can be confident that popular browsers can render without being exploited (e.g. <em>, <p>).

    You could define such a subset using the modularised XHTML 1.1 or your own DTD.

    Before anyone brings it up, YES we must still attempt to filter stuff out (use libraries etc), the proposed tags are to be a safety net. Defense in depth.

    Yes, but it won't be actually used that way. If browsers went to the trouble of actually implementing this extra layer of redundancy, all the people with lax security measures would simply use that as an alternative and all the people who take security seriously will use it, despite it not being necessary. I think the cumulative effect would be to make the web less secure.

    --
    Bogtha Bogtha Bogtha
  8. No standard without reference implementation by ikekrull · · Score: 4, Insightful

    The worst thing about W3C standards is the lack of a reference implementation. If you can't produce a computer program that implements 100% of the specification you are writing in a reasonable timeframe, your standard is too complex.

    It doesn't matter if the reference implementation is slow as molasses or requires vast quantities of memory; at least you have proven the standard is actually realistically implementable. On the other hand, if your reference implementation was easy to build and is really good, then that will foster code re-use and massively jump-start the availability of standardised implementations from multiple vendors. It might also show that you have a really good standard there.

    If you don't do this, you get stuff like SVG - I don't think there is even one single 100% compliant SVG implementation anywhere, and there may never be.

    There aren't any fully compliant CSS, or HTML implementations either, to my knowledge.

    The same goes for XHTML and HTML5. If you, as a standards organisation, are not in a position to directly provide, or sponsor the development of an open reference implementation, then personally, I think you should be restricting your standard to a smaller chunk of functionality that you are actually able to do this with.

    There is no reason a composite standard, with a bunch of smaller, well defined components, each with reference implementations, can't be used to specify 'umbrella' standards.

    Now, I am also aware that building a reference implementation tends to make the standard as written overly influenced by shortcomings in that implementation, but I really can't believe this would be worse than the debacle surrounding WWW standards we've had for the last 10+ years. Without a conformant reference implementation, HTML support in browsers is dictated by the way Internet Explorer and Netscape did things anyway.

    I'm also aware that smaller standards tend to promote a rather piecemeal evolution of those standards, when what is often desired is an 'across the board' update of technology.

    But this 'let's define monster standards that will not be fully implemented for years, if at all, and hope for the best' approach seems to be obviously bad, allowing larger vendors to first play a large role in authoring a 'standard' that is practically impossible to fully implement, and then to push their own hopelessly deficient versions of these 'standards' on the world and sit back and laugh because there is no way to 'do better' by producing a 100% compliant version.

    --
    I gots ta ding a ding dang my dang a long ling long
  9. Re:Where is Microsoft? by Planesdragon · · Score: 3, Insightful

    Is Microsoft involved in this at all? If it is, then I am worried.

    If Microsoft isn't involved at all, then it will fail. That's what "monopoly" means.

  10. The current situation is awful. by Animats · · Score: 4, Insightful

    The current situation is awful.

    • Major tools, like Dreamweaver, generate broken HTML/XHTML. Try creating a page in Dreamweaver in XHTML or HTML 4.01 Strict. It won't validate in Dreamweaver's own validator, let alone the W3C validator. The number of valid web pages out there is quite low. I'm not talking about subtle errors. There are major sites on the web which lack even proper HTML/HEAD/BODY tags.
    • The "div/float/clear" approach to layout was a terrible mistake. It's less powerful than tables, because it isn't a true 2D layout system. Absolute positioning made things even worse. And it got to be a religious issue. This dumb but heavily promoted article was largely responsible for the problem.
    • CSS layout is incompatible with WYSIWYG tools. The fundamental problem with CSS is that it's all about defining named things and then using them. That's a programmer's concept. It's antithetical to graphic design. Click-and-drag layout and CSS do not play well together. Attempts to bash the two together usually result in many CSS definitions with arbitrary names. Tables mapped well to WYSIWYG tools. CSS didn't. (Does anybody use Amaya? That was the W3C's attempt at a WYSIWYG editor for XHTML 1.0.)
    • The Linux/open source community gave up on web design tools. There used to be Netscape Composer and Nvu, but they're dead.
    1. Re:The current situation is awful. by shutdown -p now · · Score: 5, Insightful

      Drag'n'drop is simply not a working approach to design proper UI (i.e. the one that automatically scales and reflows to any DPI / window size / whatever).

      As for "defining named things" - the concept of HTML is all about semantic markup. That's why using tables for layout is frowned upon, not because they are bad as such.

    2. Re:The current situation is awful. by ceoyoyo · · Score: 3, Insightful

      HTML isn't supposed to be WYSIWYG. If you want traditional graphic design, make a PDF.

      HTML is supposed to be a document format that can be flexibly rendered. Pretty much the opposite of WYSIWYG actually.

    3. Re:The current situation is awful. by bar-agent · · Score: 3, Insightful

      Drag'n'drop is simply not a working approach to design proper UI (i.e. the one that automatically scales and reflows to any DPI / window size / whatever).

      Drag'n'drop works fine if it is manipulating a proper UI API. OS X's Interface Builder, with its springs and struts system, comes to mind.

      --
      i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
  11. Re:Bet there still isn't a decent "Stop!" button by coryking · · Score: 4, Insightful

    On the contrary, it's very easy. There's plenty of tools out there to do this for you.

    Cow Crap!

    You want easy? SQL injections are easy to handle. Just use a parameterized query so you don't have to mix tainted data with your trusted SQL.
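
    For anyone who hasn't seen one, here is a minimal sketch of a parameterized query. It uses Python's sqlite3 module rather than the PHP/MySQL setup discussed here, and the users table, the find_user helper and the hostile string are all made up for the example.

```python
import sqlite3


def find_user(conn, name):
    """Look up users by name; the value is bound as data, never spliced into the SQL."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Throwaway in-memory database with an invented schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection attempt; with a bound parameter it is just an odd name.
hostile = "alice' OR '1'='1"

print(find_user(conn, "alice"))   # [(1, 'alice')]
print(find_user(conn, hostile))   # []
```

    The trusted SQL text and the tainted value travel separately, so there is nothing to escape and nothing to get wrong in an escaping function.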

    Back in the stone age, before PHP thought parameterized queries were more than enterprise fluffery, you were forced to mix your user data with your SQL. And oh, were the results hilarious! It took three tries (and three fucking functions) for PHP/MySQL to get their escape code right, and I'm sure you can still inject SQL with "mysql_real_escape_string()" in some new unthought-of way.

    There is no "parameterized query" with HTML. You are *forced* to mix hostile user data with your trusted HTML. If it was that hard to sanitize an "easy" language like SQL, how hard is it to sanitize a very expressive language like HTML?

    You are telling me all those CPAN modules handle the hundreds of ways you can inject HTML into the dozens of different browsers? How many ways can you make an angle bracket and have it interpreted as a legit browser tag? How many ways can you inject something at the end of a URL to close the double quote and inject your JavaScript? How many ways, including Unicode, can you make a double quote? Don't forget, your implementation cannot strip out the Unicode like I've seen some filters do - I need the thing to handle every language! I would guess there are thousands of known ways to inject junk into your trusted HTML.

    I promise you that even the best CPAN module is still exploitable in some way not considered by the author. And I'd be insane to roll my own, as I'm not as smart as she is.

    Don't kid yourself into thinking that filtering user-generated content is easy. It is very, *very* hard.
  12. I prefer XHTML 2, thanks by wikinerd · · Score: 4, Insightful

    I thank the HTML 5 guys for their attempts, but I prefer XHTML v2

    From TFA:

    XHTML V2 isn't aimed at average HTML authors

    XHTML is for intelligent human beings, you know, people who can actually understand what separation of concerns is.

    [HTML v5] propose features that might simplify the lives of average Web developers

    So HTML v5 is for people who don't understand separation of concerns.

    Unfortunately that's 99% of the web kiddies out there.

    The standards will appeal to different audiences.

    One standard for smart people who know programming and actually work with an engineering mindset, another for those who see the web as a big graffiti and work with an "anything goes" mindset. No thanks, I prefer ONE standard for smart people, XHTML v2, and just to kick out everyone who isn't qualified.

    1. Re:I prefer XHTML 2, thanks by Dracos · · Score: 4, Insightful

      Agreed, this article is HTML5 apologist rhetoric. I thought it was rather well-balanced until the author got to HTML5, where his preference is subtly revealed.

      XHTML2's universal src attribute is mentioned (confusingly called a tag), but the universal href attribute, which allows any element to be transformed into a link, is not. Nor is the role attribute mentioned, which allows a tag to be assigned a semantic meaning (like menu or header) without expanding the tag set.

      TFA even admits in a roundabout way that HTML5 exists because the majority of so called "web developers" are ignorant of the current standards and incapable of effectively using them. If you need to be "clever" to use XHTML2, then perhaps no one will have to reach for the eye-bleach every time they wander into places like MySpace (where page skins are based on an exploit where browsers interpret <style> tags outside the document head, which is illegal).

      I tell people "Writing web pages is easy. Writing them well is hard." This is proven by the amount of junk documents on the web that don't validate as anything but pretty, even if beauty is in the eye of the beholder.

      The author wisely avoided any discussion of the silly new tags (some of which are presentational, not semantic) HTML5 includes. He does mention XHTML5, which is "optional"... why should we take that step backwards?

      The anti-XML-compliance people like to complain that XML is too verbose. If they don't like it, they can use something else, like RTF. Cars have gotten verbose too over the years. Those people can put their money where their mouths are by buying an antique that doesn't have a radio, GPS, seat belts, padded dashboards, windows, crumple zones, suspension, electric engine starters, or any number of improvements that could be argued to be bloat.

      XHTML2 is the way we should go.

  13. Re:Bet there still isn't a decent "Stop!" button by curunir · · Score: 3, Insightful

    Why go through all the hassle of "random hard to guess string" which, if implemented improperly could be guessed? Plus, as others have pointed out, HTML is not a dynamic language. Your random, hard to guess string could be observed and used by an attacker.

    Wouldn't something like:

    <sandbox src="restrictedContent.html" allow="html,css" deny="javascript,cookies"/>

    ...be a whole lot simpler? Just instruct the browser to make an additional request, but one in which it's expected to fully sandbox the content according to rules that you give it. This makes it much harder for application developers to screw up and a lot harder for malicious code to bypass the sandboxing mechanism.

    --
    "Don't blame me, I voted for Kodos!"