Slashdot Mirror


XML 1.1 Spec Hits Some Snags

oever writes "News.com reports that the new XML 1.1 specification defines a new newline character, making it incompatible with the 1.0 specification. Apparently, IBM has been pushing the new character to avoid having to modify their software, thereby invalidating everybody else's XML software."

10 of 257 comments

  1. It's only a candidate specification. by tomhudson · · Score: 5, Insightful
    This specification is being put forth as a W3C Candidate Recommendation of XML 1.1.

    If you don't like it, keep in mind that you CAN bitch about it and help change this.

  2. Considering ... by DigitalDreg · · Score: 5, Insightful

    That IBM gave the world SGML and XML by derivative ....

    That a lot of useful data exists on IBM mainframes ....

    That EBCDIC doesn't "cleanly" map into Unicode by design like ASCII/UTF-8 does ...

    That this benefits IBM users and customers, not IBM because there is no strategic market position related to new-line characters ...

    That this was a recommendation reached by a group ...

    Let it live and get a life.
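
A rough illustration of the EBCDIC point above. It assumes the mainframe data is in code page 037 and uses Python's built-in `cp037` codec; real systems use several EBCDIC code pages, so treat this as a sketch. The helper name `ebcdic_to_codepoints` is made up for the example. In that code page the EBCDIC "new line" control (byte 0x15) comes out as U+0085, while the separate EBCDIC "line feed" control (byte 0x25) comes out as U+000A:

```python
# Sketch only. Assumption: the input is EBCDIC code page 037,
# decoded with Python's built-in "cp037" codec.

EBCDIC_NL = b"\x15"  # EBCDIC "new line" control
EBCDIC_LF = b"\x25"  # EBCDIC "line feed" control


def ebcdic_to_codepoints(data: bytes) -> list[str]:
    """Decode cp037 bytes and label each resulting character as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in data.decode("cp037")]


if __name__ == "__main__":
    print("NL ->", ebcdic_to_codepoints(EBCDIC_NL))
    print("LF ->", ebcdic_to_codepoints(EBCDIC_LF))
```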

  3. Re:One tiny little update ??? by PainKilleR-CE · · Score: 5, Insightful

    IBM has contributed so much, it's only natural that some changes might be characterized in the news as benefitting them more than other parties. Is anyone that worried about adding a new EOL character in 1.1 that XML 1.0 "chokes" on?

    And, as an IBM rep pointed out in the article, XML documents are supposed to specify what version they're using at the top of the document. Any proper XML parser should read that it's 1.0 and interpret the newline character as 1.0 would.

    --
    -PainKilleR-[CE]
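
A minimal sketch of the "read the version, then apply that version's newline rules" idea from the comment above. The function names are hypothetical and the regex is a simplification (no byte-order mark or encoding detection); a real parser does much more. The translation rules follow the end-of-line handling sections of the two specs: 1.0 folds CR LF and lone CR into LF, and the 1.1 candidate additionally folds U+0085 (NEL), CR followed by NEL, and U+2028.

```python
import re

# Simplified match for the version pseudo-attribute of an XML declaration
# at the very start of the document (assumption: no BOM, already decoded).
_DECL = re.compile(r"^<\?xml\s+version\s*=\s*(['\"])(.*?)\1")


def declared_version(doc: str) -> str:
    """Return the declared XML version; no declaration is treated as 1.0."""
    match = _DECL.match(doc)
    return match.group(2) if match else "1.0"


def normalize_line_ends(doc: str) -> str:
    """Translate line ends to U+000A using the declared version's rules."""
    if declared_version(doc) == "1.1":
        # XML 1.1: CR LF, CR NEL, NEL, U+2028 and lone CR all become LF.
        return re.sub("\r\n|\r\x85|\x85|\u2028|\r", "\n", doc)
    # XML 1.0: only CR LF and lone CR become LF; U+0085 is left untouched.
    return re.sub("\r\n|\r", "\n", doc)
```

With this, a document labelled 1.0 keeps any U+0085 exactly as it was, which is the behaviour the comment describes.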
  4. 2 line summary by Shagg · · Score: 5, Insightful

    1) XML 1.0 does not follow the Unicode spec
    2) XML 1.1 makes a change so that it does follow the spec

    What's the complaint again?

    --
    Unix is user friendly, it's just selective about who its friends are.
  5. What do they mean, "XML 1.0 chokes"? by st.+augustine · · Score: 5, Insightful

    Does anyone have a link to a page explaining what's really going on? Last I heard, XML doesn't even have a concept of newlines -- most of the time all white space gets normalized (collapsed). The only problem that I could see is if the character wasn't part of the spec for white space. Now, people may have written XML software that chokes, but I think that's a slightly different story. So is the problem that the new character shows up as bogus text content in elements? And is that true for all XML processing software, or does software that relies on a proper Unicode engine not have the problem? What's the deal?

    --

    -- Some things are to be believed, though not susceptible to rational proof.
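
One way to poke at the question above, as a sketch rather than an answer from the article: XML 1.0's white space production `S` covers only U+0020, U+0009, U+000D and U+000A, while its `Char` production allows everything from U+0020 up to U+D7FF. The helper below (names are made up for illustration) classifies a character against those two productions, so you can see where U+0085 lands under 1.0.

```python
# The four characters in XML 1.0's white space production S.
XML10_WHITESPACE = {"\x20", "\x09", "\x0d", "\x0a"}


def is_xml10_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def classify_xml10(ch: str) -> str:
    """Classify one character as XML 1.0 white space, character data, or illegal."""
    if ch in XML10_WHITESPACE:
        return "white space"
    if is_xml10_char(ch):
        return "character data"
    return "not allowed"


if __name__ == "__main__":
    for ch in ("\n", "\x85", "\x0c"):
        print(f"U+{ord(ch):04X}: {classify_xml10(ch)}")
```

Under these productions U+0085 is a legal character but not white space, so a conforming 1.0 parser would hand it to the application as ordinary text.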
  6. Re:Read the Unicode spec.... by Anonymous Coward · · Score: 5, Insightful

    Like the man says, read the Unicode specification! Unicode defines a far wider range of characters than simple 7- or 8-bit ASCII text can cover, and the à is simply mapped to a different Unicode code point. You won't lose the ability to use à in your XML documents, you just use Unicode.
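
To make that concrete, here is a small sketch. It assumes the parent was thinking of an IBM PC code page such as CP437, where byte 0x85 is à; that choice is a guess, not something stated in the thread. The function name `interpretations` is hypothetical. The same byte means different things depending on the code page, but once the text is Unicode, à is U+00E0 and has nothing to do with U+0085:

```python
def interpretations(byte_value: int) -> dict[str, str]:
    """Show what one byte means under two single-byte encodings, as U+XXXX."""
    raw = bytes([byte_value])
    return {
        # Assumed legacy PC code page (CP437).
        "cp437": f"U+{ord(raw.decode('cp437')):04X}",
        # ISO 8859-1 maps bytes straight onto U+0000..U+00FF.
        "latin-1": f"U+{ord(raw.decode('latin-1')):04X}",
    }


if __name__ == "__main__":
    print(interpretations(0x85))
    # à survives as its own code point and encodes fine as UTF-8.
    print("à".encode("utf-8"))
```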

  7. *Shrug* by Fweeky · · Score: 5, Insightful
    If you're using the XML prologue like you're supposed to, your XML 1.0 documents will have:
    <?xml version="1.0" ?>
    At the top. The parsers will then parse using the XML 1.0 specification and you won't notice a thing.

    If you don't use it, tough luck, you should have followed the original recommendation more closely. Lucky for you it's not exactly difficult to automatically process XML documents and add the prologue later.
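
Adding the prologue after the fact could look something like this. It is a sketch under stated assumptions: the document is already decoded to a string, has no byte-order mark, and does not need an `encoding` pseudo-attribute. `ensure_declaration` is a made-up helper name.

```python
import re

# An XML declaration is "<?xml" followed by white space; this deliberately
# does not match other processing instructions such as "<?xml-stylesheet".
_HAS_DECL = re.compile(r"^<\?xml\s")


def ensure_declaration(doc: str, version: str = "1.0") -> str:
    """Return doc unchanged if it has an XML declaration, else prepend one."""
    if _HAS_DECL.match(doc):
        return doc
    return f'<?xml version="{version}"?>\n{doc}'


if __name__ == "__main__":
    print(ensure_declaration("<note>hello</note>"))
```

Documents that already declare a version are left alone, so running this over a mixed pile of files should not relabel anything.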
  8. This is just not that big a deal. by BobGregg · · Score: 4, Insightful

    As the quote from IBM points out in the article, this issue is just a subset of the larger problem with Unicode compatibility in XML 1.0. And as someone else pointed out, if document creators are using the XML headers appropriately to begin with, then parsers would handle documents correctly anyway. I'm also willing to bet that the percentage of existing XML documents which contain this particular character (0x85), and which are not already on IBM mainframes, is *extremely* small.

    Face it: this just isn't that big a deal. It's good for industry acceptance and propagation of the standard, at very low cost. Move along, there's nothing to see here.

  9. Re:What about poor old Acorn users? by gorilla · · Score: 4, Insightful

    The Atom, which the BBC was based upon, came out in 1979. At that time there wasn't an IBM PC, and the world was very diverse. You could choose Apples (0D), CP/M (0D,0A), Unix (0A), Primos (8A), VMS - which is either records or 0D. Also, it was quite rare for files to be shared amongst different systems - a file created on an Apple would stay on an Apple forever. A decision which looks strange in 2002 looks as sensible as any other option in 1979.

  10. Mod Article -1 Troll by kalidasa · · Score: 4, Insightful

    This is just a way to spark a holy war "my newline character is better than yours" debate. The proposal makes perfect sense - it brings XML into line with Unicode and ISO-10646.