Slashdot Mirror


Does the World Need Binary XML?

sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."

481 comments

  1. For Starters by Nom+du+Keyboard · · Score: 2, Insightful
    what can be done to make XML better, faster and stronger.

    For starters, keep Microsoft out of it.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    1. Re:For Starters by jrm228 · · Score: 1

      Good point, because we're all better off if the world's biggest and most influential software vendor makes their own standards without any external input. Not too bright.

    2. Re:For Starters by Anonymous Coward · · Score: 0

      You know that Jean Paoli, a co-founder of XML, works for Microsoft, right?

    3. Re:For Starters by Omega1045 · · Score: 4, Interesting
      Why? Microsoft has done a fairly good job promoting XML and SOAP XML Web Services. As long as they stick to the standards (yes, I know) I see no reason to keep them out.

      IBM has actually tried to introduce some goofy stuff into the XML standards, like line breaks, etc., that should not be in a pure node-based system like XML. Why aren't you picking on them in your comment?

      As far as SOAP and XML Web Services (standardized protocols for XML RPC transactions) go, Microsoft was way ahead of the pack. And I rather enjoy using their rich set of .NET XML classes to talk to our Unix servers. It helps my company interoperate.

      --

      Great ideas often receive violent opposition from mediocre minds. - Albert Einstein

    4. Re:For Starters by Soko · · Score: 3, Insightful

      Agreed.

      However, let me re-phrase the grandparent:

      "For starters, make sure Microsoft can't extend it to lock out competitors in some way."

      Better?

      Soko

      --
      "Depression is merely anger without enthusiasm." - Anonymous
    5. Re:For Starters by leerpm · · Score: 2, Insightful

      Good idea. Without Microsoft's support from their tools division, this idea will be dead on arrival.

    6. Re:For Starters by Comatose51 · · Score: 1

      Their .Net XML components are pretty damn nice. They make parsing XML really easy. The ability to save Office documents as XML is really nice as well. So far, Microsoft has only helped spread the usage of XML.

      --
      EvilCON - Made Famous by /.
    7. Re:For Starters by jrm228 · · Score: 1

      Figured that's what you meant, but I'd rather not feed Slashdot's typical anti-company/anti-MS bias, especially in a technology thread.

    8. Re:For Starters by Anonymous Coward · · Score: 0

      IBM has actually tried to introduce some goofy stuff into the XML standards, like line breaks

      XML has one purpose: To be easily parsed. By machine or human. I like me my line breaks.

      If you don't want linebreaks or people reading your datafiles, start a new standard. Call it XBL (eXtensible Binary Language) or something.

    9. Re:For Starters by Anonymous Coward · · Score: 0

      Not exactly flamebait. MS has warped standards. I recently had a problem where I developed an XML document in VBA that included base64 binary blob data, which I was posting to a servlet to be parsed by JAXB. MS formatted this as "bin.base64" while JAXB used "xs:base64Binary".

      One can also look at the Active Directory implementation of LDAP to see name changes of standardized fields. Single Sign-On (SSO) in a mixed-platform environment can be challenging due to little changes like this.
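      For context on how binary blobs travel inside XML at all: the bytes are base64-encoded into element text, and some label tells the consumer how to read them. The sketch below uses only Python's standard library; the element name `blob` and the `encoding` attribute are hypothetical stand-ins for whatever type label a real schema uses. It shows the round trip, the roughly 4/3 size growth base64 adds, and how two sides that agree on the bytes but not on the label name still fail to interoperate.

```python
import base64
import xml.etree.ElementTree as ET

def wrap_blob(data: bytes, label: str = "base64") -> bytes:
    """Embed raw bytes as base64 text in a hypothetical <blob> element."""
    elem = ET.Element("blob", {"encoding": label})
    elem.text = base64.b64encode(data).decode("ascii")  # plain ASCII, safe inside XML
    return ET.tostring(elem)

def unwrap_blob(doc: bytes, expected_label: str = "base64") -> bytes:
    """Recover the bytes, refusing a document whose type label we do not recognize."""
    elem = ET.fromstring(doc)
    if elem.get("encoding") != expected_label:
        raise ValueError(f"unexpected encoding label: {elem.get('encoding')!r}")
    return base64.b64decode(elem.text or "")

payload = bytes(range(256)) * 4   # 1024 bytes of arbitrary binary data
doc = wrap_blob(payload)
assert unwrap_blob(doc) == payload
# base64 emits 4 characters per 3 input bytes (rounded up)
print(len(payload), len(base64.b64encode(payload)))  # prints: 1024 1368

# Same bytes, different label names on each side: the consumer rejects the document.
try:
    unwrap_blob(wrap_blob(payload, label="bin.base64"), expected_label="xs:base64Binary")
except ValueError as err:
    print(err)
```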

    10. Re:For Starters by short · · Score: 2, Insightful

      I had to be compatible with .NET SOAP (XML), and I had a hard time guessing how to behave to be compatible with the Microsoft SOAP server.

      I was using Perl SOAP::Lite; see its section on Microsoft compatibility issues. Microsoft still conforms to the specification, although only the Microsoft way of using the standards is recognized correctly. Clearly anti-competitive behavior while still being standards compliant - simply perfect.

    11. Re:For Starters by Anonymous Coward · · Score: 0

      FYI: XBL is already Mozilla's eXtensible Binding Language.

      Off-topic and so forth, but meh.

    12. Re:For Starters by aichpvee · · Score: 0

      Is there any reason NOT to be anti-microsoft?

      --
      The Farewell Tour II
    13. Re:For Starters by Anonymous Coward · · Score: 0

      Microsoft is not the world's biggest software vendor.

    14. Re:For Starters by danheskett · · Score: 1

      I looked over the compatibility issues you linked to, and was wondering which items in the list (which is brief, by the way) you believe to be anti-competitive. You called them that - "clearly anti-competitive".

      I've been developing with SOAP for a long time, and have several cross-platform products in daily heavy use that use SOAP and XML in general very heavily.

      I've found MS's reading of the standard and spec to be spot on in virtually every case. I have found that a lot of smaller SOAP servers/clients are sorely lacking and are incompatible with MS's SOAP routines through their own ignorance of the standard, bugs, or willful neglect.

      In almost any standard there is room for interpretation.

      What specifically do you fault MS for in their handling of SOAP?

    15. Re:For Starters by Anonymous Coward · · Score: 0

      Yes. To be anti-Microsoft is to automatically reject the fundamental belief system of 90% of the people you know. When you tell them "MS is bad", to some degree they hear that as you telling them that they got suckered by clever marketing. The fact that they realize in their hearts that this is at least partially true doesn't help. You've also got to be credible: for most people, it is easier to set up and patch an XP system than a Linux system. If you try to convince people otherwise, they're going to call BS on everything else you say. Windows Update, for example, is about as easy as you can make it. Unfortunately, we also know that it has huge security issues and occasionally fails in some undocumented fashion, but my mom doesn't care about or even comprehend why that might be a problem. Yes, I know what this says about MS and I agree 100%, but I also realize that I can't begin to explain this to my mom in a way that is not going to make me sound like a nitpicking nerd.

      All this contributes to my new approach to Windows vs Linux. Whenever I build a new system, I test boot it with Knoppix before doing anything else. I just use Linux day-to-day and don't really try to sell it. My friends and associates notice and, since I'm not trying to ram a bucket of bile down their throats, they listen. On at least three occasions in the past two weeks I've had conversations like this:
      Friend: "Hey, aren't you booted into Linux?"
      Me: "Yes, of course. Why do you ask?"
      Friend: "Your Windows share is still showing up on my machine" (exact feature varies)
      Me: "Of course."
      Friend: "You can do that!?"
      Me: "Of course."
      Friend leaves looking thoughtful.

      or...

      Friend: "Do you have any software to..."
      Me: "Yes, but I'll have to boot to Linux(*)"

      (*) This invariably happens when I'm playing BF1942.

      These friends are starting to simply accept that the software and features available under Linux ARE better than what they've got. They're not ready to make the jump yet but when they are, they know that I will be happy to help them. By being realistic about their computing knowledge and needs and not anti-MS for the sake of being anti-MS (I do that too but under different circumstances) I'm gaining ground a lot faster than I was.

      As an aside, this is almost exactly how I got all of my friends onto Firefox. Rather than panning IE, I just said "Hey, I found a new browser that is way faster than what you are using. Would you like to give it a whirl?" The utterly seamless installation, the fact that everything just worked, the obvious performance gain and the generally improved browsing experience made it so they just never bothered to go back.

      JM2C

    16. Re:For Starters by short · · Score: 1

      I must admit that in this case Microsoft behaved as well as it could; there are no known faults on its side. I had to go back to the mails: I really had remembered it exactly the way I wrote it in my original comment, and that memory was wrong.

      The mail regarding the compatibility: mbox.

      My apologies _for_this_specific_case_only_ to Microsoft.

    17. Re:For Starters by aichpvee · · Score: 0
      I don't see a problem with that. If 90% of the people I know are idiots, then I have no problem telling them so. Most of the people I know who run Windows are well aware that it is the idiot's choice in OS, but they happen to play games and not do much else on their computers, so they don't care.

      But saying that I should be pro-m$ because 90%+ of people use it is like saying I should be pro-Christian because the majority of people in America are. And that's not only ridiculous but extremely dangerous.

      --
      The Farewell Tour II
  2. Then what by chris_mahan · · Score: 2, Funny

    Then what happens, do you base64 the binary xml and wrap it in an ascii xml document?

    --

    "Piter, too, is dead."

  3. Binary = Proprietary by Anonymous Coward · · Score: 0

    This will kill XML

    1. Re:Binary = Proprietary by taybin · · Score: 1

      Just like binary killed JPEG? Or ELF? Please. Binary != proprietary.

    2. Re:Binary = Proprietary by Adhemar · · Score: 3, Insightful

      Of course binary doesn't equal proprietary. Those are two completely different concepts.

      PNG is a binary format. It isn't proprietary, though. And although I can't immediately find a text-based proprietary format, such formats are not impossible (although arguably easier to reverse-engineer than binary proprietary formats).

      But if the XML is really such a problem, I suggest the simple solution. Compressing XML with a simple and open algorithm like gzip or bzip2 is the way to go. XML usually compresses very easily.
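      As a rough illustration of why this works, here is a minimal Python sketch using the standard `gzip` module (the `bz2` module offers the same `compress`/`decompress` pair for bzip2). The document is hypothetical and deliberately repetitive, as many data feeds are; the point is only that the tag names repeat and DEFLATE collapses them, and that the receiver gets the exact original text back.

```python
import gzip

# A hypothetical, deliberately repetitive document of the kind data feeds produce.
rows = "".join(
    f"<row><id>{i}</id><name>item {i}</name><price>9.99</price></row>"
    for i in range(1000)
)
xml_doc = f'<?xml version="1.0"?><rows>{rows}</rows>'.encode("utf-8")

packed = gzip.compress(xml_doc)            # DEFLATE inside a standard gzip container
assert gzip.decompress(packed) == xml_doc  # lossless: the receiver gets the exact text back
print(f"{len(xml_doc)} bytes of XML -> {len(packed)} bytes gzipped")
```

      Note that this only shrinks what goes over the wire; the receiver still has to decompress and then parse the full text.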

    3. Re:Binary = Proprietary by Adhemar · · Score: 1
      But if the XML is really such a problem
      was meant to be
      But if the size of XML is really such a problem

      And

      XML usually compresses very easily.
      should rather be
      XML usually compresses very well.
      Yes, I did preview. I just did it blindly.
    4. Re:Binary = Proprietary by Austerity+Empowers · · Score: 2, Insightful

      That's the dumbest statement I've ever heard.

      As long as it's standardized, the standard is freely available to anyone who wants it, it does not depend on an external library, and it is unencumbered by any sort of patent, it isn't proprietary.

      I hate XML right now because of all the string processing and parsing. Text is a sloppy way of defining something, and it begets lots of big processing libraries. It's OK for big PC memory-hog apps, but I can't build a small enough one that is still robust enough to want to integrate it into the work I do (small, compact stuff). I find myself doing other, backwards things, or worse, fracturing XML into usable subsets. It somewhat defeats its utility.

      Binary XML sounds like a great idea to me, as long as we're clear on a few things. One, it has to be totally documented in a standard (see above for my definition). Two, the standard must define a tool that can read an XML file and say "Yes this is XML" or "No, this is some [microsoft] non-compliant crap". Three, keep it simple: no compression, no outside library dependencies, no cruft.

      If those things cannot be achieved then it will not reach maximum utility and something proprietary will swoop down and take over (*cough* microsoft *cough*).

    5. Re:Binary = Proprietary by Anonymous Coward · · Score: 0
      As many people have pointed out, Binary != Proprietary. Many of the examples given, however, are not formats or standards with the flexibility of XML.

      One format being developed, called Argot, goes further than XML. It is binary, but works using a dictionary approach to data definitions. This means you can mix different schemas easily. Look at http://www.einet.com.au/ for more info.

  4. The solution is clear... by LordOfYourPants · · Score: 3, Funny

    Use the Z-modem protocol between Information Superhighway routers to compress the plaintext.

    1. Re:The solution is clear... by doofusclam · · Score: 1

      Why is that modded funny? The OP is correct - plain text gets compressed to hell by most browsers for a start. It doesn't help with processing requirements (using XML in the first place is a big drain compared to a well-designed binary file), but it does render the bandwidth question moot.

    2. Re:The solution is clear... by DunbarTheInept · · Score: 2, Insightful

      The more compression that is done, the greater the CPU usage. Eventually it reaches a point of diminishing returns where there is no point in trying to compress a network stream any further, because you are merely turning it from an I/O-bound task into a CPU-bound task. Also, to get really good compression, you need to look ahead and see a lot of the bytes of the file to look for similarities. But in a stream application, you don't have the luxury of holding giant buffers for each stream of bytes - so you have to make do with finding what compression you can in the smaller buffered chunks of the data that you pass through a little at a time. Therefore, although compression is used, it's not going to be the really good kind of compression we're used to seeing with something like "gzip -9".
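      The chunk-at-a-time style of compression described above can be sketched with Python's `zlib.compressobj`/`zlib.decompressobj`, which accept input piece by piece and keep only a fixed-size state (including DEFLATE's 32 KiB history window) however long the stream runs. The generator names and the repeated `<event>` record are hypothetical stand-ins for a real network stream.

```python
import zlib
from typing import Iterable, Iterator

def compress_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Compress a byte stream incrementally; matches can only reach back 32 KiB."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # the compressor may buffer input and emit nothing yet
            yield out
    yield comp.flush()  # emit whatever remains and terminate the stream

def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Inverse of compress_stream, also one chunk at a time."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        yield decomp.decompress(chunk)
    yield decomp.flush()

record = b"<event><type>tick</type><value>42</value></event>"
source = (record for _ in range(10_000))  # stand-in for data arriving off the wire
compressed = list(compress_stream(source))
restored = b"".join(decompress_stream(compressed))
assert restored == record * 10_000
```

      Raising `level` trades more CPU per chunk for a smaller output, which is the I/O-bound versus CPU-bound trade-off the comment describes.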

      --

      Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

    3. Re:The solution is clear... by doofusclam · · Score: 1

      I know the cpu usage is the problem - I stated so, compression only alleviates the bandwidth, at the expense of a (minor) increase in CPU and memory bandwidth requirements. For some applications, it'd be fine.

      This said, my original point still stands - to mod the OP as 'funny' shows a lack of knowledge in the moderators.

      Remember too that HTTP supports compressed data, and believe me, XML is prime material for compression - it's repetitive and uses a limited ASCII set - and a compressed XML document will not be a lot different in size from one written in some binary XML format.

      I'm no fan of XML - most of the real-world implementations (Origo, for a start) that I've seen simply apply a relational model to the XML translation, which is not only missing the point but is sub-optimal too.

  5. Step 1 to getting binary XML by Anonymous Coward · · Score: 2, Insightful

    Binary XML = zip file.xml.zip file.xml
    That's all you need. XML compresses great.

    1. Re:Step 1 to getting binary XML by Anonymous Coward · · Score: 0

      Zip file? Why??? It sucks and is closed.

      bzip or Gzip. bzip is smaller and faster and 100% open and free. I can make a product with bzip compression and sell it for 20 trillion dollars a copy.

      fools use closed or patent encumbered compression.

      and Zip is exactly that

    2. Re:Step 1 to getting binary XML by Dasein · · Score: 3, Insightful

      The problem is that many systems that produce XML have a more compact internal storage (rows from a DB or whatever), then they go through an "expansion" to produce XML.

      So, to propose simply compressing it means that there's an expansion (which is expensive) followed by a compression (which is really expensive). That seems pretty silly. However, given upfront knowledge of which tags are going to be generated, it's pretty easy to implement a binary XML format that's fast and easy to decode.

      This is what I did for a company that I worked for. We did it because performance was a problem. Now, if we don't get something like this through the standards bodies, more companies are going to do what mine did and invent their own format. That's a problem -- back to the bad old days before we had XML for interoperability.

      Now, if we get something good through the standards body then, even though it won't be human readable, it should be simple to provide converters. To have something fast that is convertible to human-readable form and back seems like a really good idea.
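      To make the "upfront knowledge of which tags are going to be generated" idea concrete, here is a hypothetical Python sketch (not the commenter's actual format, and not any standard). Producer and consumer share a tag dictionary, so each element becomes a 4-byte header (tag id, child count, text length) followed by its text and children. It deliberately ignores attributes and mixed content and caps an element at 255 children and 65,535 bytes of text. The `decode` direction doubles as the converter back to readable XML.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical shared tag dictionary, agreed on by producer and consumer up front.
TAGS = ["orders", "order", "id", "item", "qty"]
TAG_ID = {name: i for i, name in enumerate(TAGS)}
HEADER = struct.Struct(">BBH")  # tag id, child count, text length in bytes

def encode(elem: ET.Element) -> bytes:
    """Serialize an element tree (no attributes, no mixed content) to the compact form."""
    text = (elem.text or "").strip().encode("utf-8")
    children = list(elem)
    head = HEADER.pack(TAG_ID[elem.tag], len(children), len(text))
    return head + text + b"".join(encode(child) for child in children)

def decode(buf: bytes, pos: int = 0) -> tuple[ET.Element, int]:
    """Rebuild the element starting at pos; also return the offset just past it."""
    tag_id, n_children, text_len = HEADER.unpack_from(buf, pos)
    pos += HEADER.size
    elem = ET.Element(TAGS[tag_id])
    if text_len:
        elem.text = buf[pos:pos + text_len].decode("utf-8")
    pos += text_len
    for _ in range(n_children):
        child, pos = decode(buf, pos)
        elem.append(child)
    return elem, pos

doc = ET.fromstring("<orders><order><id>17</id><item>widget</item><qty>3</qty></order></orders>")
blob = encode(doc)
back, _ = decode(blob)
assert ET.tostring(back) == ET.tostring(doc)  # converts back to the same readable XML
print(len(ET.tostring(doc)), len(blob))
```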

      --
      You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
    3. Re:Step 1 to getting binary XML by Tsiangkun · · Score: 2, Interesting

      In my hands, bzip compresses better, but is somewhere between somewhat slower and orders of magnitude slower on my system, depending on the options used to invoke the command and the size of the file being compressed. gzip is fast, works on streams instead of blocks, and is available on nearly every system.

    4. Re:Step 1 to getting binary XML by Anonymous Coward · · Score: 0

      # echo "you are wrong wrong wrong" > test.txt
      # zip test.zip test.txt
      adding: test.txt (deflated 35%)
      # rm test.txt
      # mv test.zip test.txt.gz
      # gunzip test.txt.gz
      # cat test.txt
      you are wrong wrong wrong
      #

    5. Re:Step 1 to getting binary XML by Waffle+Iron · · Score: 1
      Compression solves the data size problem, but not the random access problem. XML is a tree structure, so in theory it should often be an O(log n) operation to access a random piece of data in the file. However, since it's stored as a text stream that must be linearly parsed from the start of the document, it's usually an O(n) operation to get something out of XML. You can load the whole document into a DOM tree to amortize the one O(n) over many accesses, but then you typically consume even more memory than the original XML document.

      An efficient binary format could simultaneously provide compact storage and fast random access.
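      One common way a binary layout gets that random access is to store each element's subtree size in its header, so a reader can hop over siblings it does not care about instead of parsing every byte. The Python sketch below illustrates the idea under a hypothetical layout of my own (header of name length, text length and children length, then name, text and children); it is not any standardized binary XML format.

```python
import struct
from typing import List, Optional

# Hypothetical node layout: header, name, text, then the children's bytes.
HDR = struct.Struct(">BHI")  # name length, text length, byte length of all children

def node(name: str, text: str = "", *children: bytes) -> bytes:
    """Serialize one element; its header records how many bytes its subtree occupies."""
    n, t = name.encode("utf-8"), text.encode("utf-8")
    body = b"".join(children)
    return HDR.pack(len(n), len(t), len(body)) + n + t + body

def find(buf: bytes, path: List[str]) -> Optional[str]:
    """Return the text at `path` (root first), skipping unrelated subtrees unread."""
    start, end = 0, len(buf)
    for depth, name in enumerate(path):
        pos = start
        while pos < end:
            n_len, t_len, c_len = HDR.unpack_from(buf, pos)
            name_at = pos + HDR.size
            text_at = name_at + n_len
            kids_at = text_at + t_len
            if buf[name_at:text_at].decode("utf-8") == name:
                if depth == len(path) - 1:
                    return buf[text_at:kids_at].decode("utf-8")
                start, end = kids_at, kids_at + c_len  # descend into this node
                break
            pos = kids_at + c_len  # jump over the sibling's whole subtree
        else:
            return None  # no element with that name at this level
    return None

fiction = node("fiction", "", *[node("title", f"Book {i}") for i in range(1000)])
catalog = node("catalog", "", fiction, node("poetry", "", node("title", "Odes")))
assert find(catalog, ["catalog", "poetry", "title"]) == "Odes"
assert find(catalog, ["catalog", "drama"]) is None
```

      Finding the poetry title reads only two headers at the catalog level; the thousand titles inside `fiction` are skipped in one jump rather than scanned.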

    6. Re:Step 1 to getting binary XML by Anonymous Coward · · Score: 0
      I can make a product with bzip compression and sell it for 20 trillion dollars a copy.
      When will it be out? You can't believe how long I have waited for it.
    7. Re:Step 1 to getting binary XML by DunbarTheInept · · Score: 1

      XML is mostly about transmission over the wire - an inherently sequential operation.

      --

      Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

    8. Re:Step 1 to getting binary XML by Anonymous Coward · · Score: 0

      No it's not. It's also for configuration files, document markup, a generic storage format for applications like OpenOffice or dia, and some people even use it for databases. Oh, and the SVG image/animation format is also XML.

    9. Re:Step 1 to getting binary XML by Waffle+Iron · · Score: 2, Insightful
      XML is mostly about transmission over the wire - an inherently sequential operation.

      Maybe if non-sequential operations were made more efficient, it would open up more applications than just transmission.

      At any rate, if what you claim were totally true, then nobody would be complaining about performance in the first place since transmission is slower than CPUs and gzip is trivial to apply to the stream.

    10. Re:Step 1 to getting binary XML by arkanes · · Score: 2, Insightful
      Why are you using XML? If you're using it for buzzword compliance, then you're wrong. And nobody but your PHBs cares anyway, so it doesn't matter. If you're using it for interchange with other companies, then why are you worried about inefficiency, and why is compressing it too much of a barrier? There are lots of obstacles in the way of direct communication with other businesses; compressing your XML is pretty trivial. If you're using it internally as an exchange format, maybe you should consider using something else. There are quite a few alternatives to XML, depending on exactly what you're doing.

      Disclaimer: I detest XML with a passion. However, it has some good points:
      1: Widespread adoption. Anyone who can't parse XML is someone you don't want to be doing business with anyway.
      2: Flexible. Highly compressible plain text means it can go over almost any transport with minimal if any extra work.
      3: The ability to validate against a DTD is invaluable for interchange. It's not the magic interoperability the trade rags would have you believe, but if you've ever written parsing and validating code for a binary interchange standard, then you'll really value a standard format with a standard validation scheme.

      Now, XML certainly has its bad points. And it's made even worse by the hype. But it really does have advantages.

    11. Re:Step 1 to getting binary XML by Taladar · · Score: 1

      XML was hyped to be more than its original purpose. So mobile phone companies think they can add another well-known buzzword to their feature list and distract the user from their absurd pricing schemes a little longer.

    12. Re:Step 1 to getting binary XML by Dasein · · Score: 1

      We were using XML because it was an XQuery product. The documents returned were XML -- we just wanted the expansion to occur on the client side, which was more likely to be cheap for customers to scale up.

      --
      You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
    13. Re:Step 1 to getting binary XML by quantum+bit · · Score: 1

      And bzip2 is a memory hog.

    14. Re:Step 1 to getting binary XML by DunbarTheInept · · Score: 1

      Of the list you gave, ONLY "some people even use it for databases" would make it a random-access format. For everything else in that big list (open office, markup, dia, conf files), it's used serially. (OpenOffice does not access the file randomly - it writes it from start to finish, or reads it from start to finish, only when loading or saving. The rest of the time, it does not use the file.)

      Now, once it's been parsed into something in memory, THEN it might access it randomly, but the parsing or unparsing are sequential.

      --

      Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

  6. KISS by stratjakt · · Score: 5, Interesting

    On the face of it, compressing XML documents by using a different file format may seem like a reasonable way to address sluggish performance. But the very idea has many people -- including an XML pioneer within Sun -- worried that incompatible versions of XML will result.

    I agree with his point.

    What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:KISS by barryman_5000 · · Score: 1

      I am more concerned with the idea that "when you send something in XML, it is in text form and therefore easier to read." -- Well, how about you make a secure connection before you go off throwing your text around? Isn't this how 99% of the world buys stuff off the internet with their credit cards? Problem with bzip/gzip: not available on Windows.

    2. Re:KISS by Anonymous Coward · · Score: 0

      Because that doesn't fix the processing overhead in parsing text. String processing is the slowest operation to do on a CPU.

    3. Re:KISS by man_of_mr_e · · Score: 1

      The problem is transmission speed over the network. Even the slowest modern processor is orders of magnitude faster at parsing than the fastest network.

    4. Re:KISS by Anonymous Coward · · Score: 0

      what's wrong with that?
      Microsoft or another company can't make an intentionally incompatible version that will not work with other platforms.

      duh, what other reason is there to use non-open formats?

    5. Re:KISS by Anonymous Coward · · Score: 0

      yes but what about when that processor is driven by a battery and you want to maximize the battery life?

    6. Re:KISS by phasm42 · · Score: 1

      This is often done (large feeds to Amazon.com are compressed). However, you still have to decompress and parse the resulting stream, which is where a big penalty is incurred. I'm hoping that whatever compression they are considering, it will reduce the uncompressed size, as well as making parsing/searching faster.

      --
      "No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
    7. Re:KISS by Ewan · · Score: 2, Informative

      gzip decompression is built into Internet Explorer; it's used all the time for speeding up the transfer of HTML to clients.

      There's no reason why it couldn't be used for XML just as it is for HTML.

      Ewan

    8. Re:KISS by rootmonkey · · Score: 1

      My previous company used XML as a realtime protocol (I know, very lame), and it's not the size of the docs, it's the overhead in parsing, especially when you have several Mb a second and only one Intel CPU. ASCII --> binary --> ASCII really kills an app.

      --

      Yes but every time I try to see it your way, I get a headache.
    9. Re:KISS by Derek+Pomery · · Score: 1

      Huh? There are plenty of archive tools for windows that read gzip/bzip - and anyway, most users don't need to open XML archives. Programs for windows can load a lib just fine.

      --
      -- perl -e'print pack"H*","6e656d6f406d38792e6f7267"' /. ate my old sig. Bastards.
    10. Re:KISS by Anonymous Coward · · Score: 0

      Exactly right. Something like this is the solution.

    11. Re:KISS by barryman_5000 · · Score: 1

      I know it is in IE, but is it built into Windows servers?

    12. Re:KISS by Abcd1234 · · Score: 1

      Then don't use XML! Jeebus... at what point did the concept of "the right tool for the job" vanish?

    13. Re:KISS by iabervon · · Score: 1

      It's part of the HTTP 1.1 standard from 1997 ("Content-Encoding: gzip"), so it probably is.
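      The header is a per-response contract: the server names the coding it applied, and the client undoes it before handing the XML to its parser. A minimal client-side sketch using only Python's standard library follows; the function name `decode_body` is hypothetical, and it handles only a single coding (a response listing several, such as `gzip, br`, is rejected rather than decoded).

```python
import gzip
import zlib

def decode_body(body: bytes, headers: dict[str, str]) -> bytes:
    """Undo a single HTTP Content-Encoding before the body reaches an XML parser."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    coding = lowered.get("content-encoding", "identity").strip().lower()
    if coding in ("identity", ""):
        return body
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        # HTTP's "deflate" means zlib-wrapped DEFLATE, though some old servers sent raw DEFLATE.
        return zlib.decompress(body)
    raise ValueError(f"unsupported Content-Encoding: {coding}")

xml = b"<?xml version='1.0'?><feed><entry>hello</entry></feed>"
assert decode_body(gzip.compress(xml), {"Content-Encoding": "gzip"}) == xml
assert decode_body(xml, {}) == xml
```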

    14. Re:KISS by Anonymous Coward · · Score: 0

      Perhaps binary XML is "the right tool for the job". Oops, blew up your argument.

    15. Re:KISS by Anonymous Coward · · Score: 1, Insightful
      Because that doesn't fix the processing overhead in parsing text. String processing is the slowest operation to do on a CPU.

      Then don't use string operations! If you write a dumb parser that uses the language's string functions, breaking off and allocating little chunks of memory, of course it's going to be slow as molasses. The way to do it is to hold the entire XML file in a single string in memory (up to memory capacity), then tightly code a C loop to scan it. If done properly, the overhead can be barely more than the time it takes to iterate through that many characters, and that overhead may be swamped by the internal table building, binary lookups, and consistency checking that you have to do anyway, no matter what the format.
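      Python cannot match a hand-tuned C loop, but the shape of the approach carries over: one pass over a buffer held in memory, jumping from `<` to `<` with `bytes.find` instead of splitting the document into per-tag strings. The hypothetical `count_elements` below only counts elements; it skips comments, CDATA sections and processing instructions, does not count `<!DOCTYPE ...>`-style declarations, and makes no attempt at well-formedness checking.

```python
def count_elements(doc: bytes) -> int:
    """Count start tags and empty-element tags in a single scan of an in-memory buffer."""
    count, pos = 0, 0
    while True:
        pos = doc.find(b"<", pos)
        if pos == -1:
            return count
        if doc.startswith(b"<!--", pos):          # comment: may contain fake '<' tags
            pos = doc.index(b"-->", pos) + 3
        elif doc.startswith(b"<![CDATA[", pos):   # CDATA: raw text, not markup
            pos = doc.index(b"]]>", pos) + 3
        elif doc.startswith(b"<?", pos):          # processing instruction / XML declaration
            pos = doc.index(b"?>", pos) + 2
        else:
            if doc[pos + 1:pos + 2] not in (b"/", b"!"):
                count += 1  # a start tag like <a> or an empty-element tag like <b/>
            pos += 1

sample = b'<?xml version="1.0"?><a><!-- <fake> --><b x="1"/><c><![CDATA[<not-a-tag>]]></c></a>'
assert count_elements(sample) == 3
```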

    16. Re:KISS by ZakMcCracken · · Score: 2, Interesting

      What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?

      I'll tell you one thing that's wrong: these compression algorithms might run fine on your desktop or server, but on an embedded system with restricted memory and CPU power, that's another matter...

    17. Re:KISS by stratjakt · · Score: 1

      Sounds like the problem is a misapplication of the tech. They want to fit the square peg that is XML into a round hole, by making it skinnier.

      Binary XML also sounds like an oxymoron. It's not a meta-language anymore, it's a file container format.

      --
      I don't need no instructions to know how to rock!!!!
    18. Re:KISS by EnderWiggnz · · Score: 1

      not only very lame, but a huge waste of cycles in a realtime environment. what happened, did someone above make a decree that you would use xml in this solution?

      --
      ... hi bingo ...
    19. Re:KISS by Ramses0 · · Score: 5, Informative

      On the surface that works, but it only solves a portion of the problem.

      Data => XML.

      XML == large (lots of verbose tags)

      XML == slow (have to parse it all [dom], or build big stacks [sax] to get at data)

      Solution:

      XML => .xml.gz

      You've solved (kind of) the large problem, but you still keep the slow problem.

      What they're suggesting is nothing more than:

      XML => .xml.gzxml

      Basically, they'd use a specialized compression scheme that understands the ordered structure of XML (tags, attributes, etc.) and probably has some indexes saying "here are the locations of all the [blah] tags", so you can just fseek() instead of having to do DOM walking or stack-building. This is important for XML selectors (XQuery), and for "big iron" junk it makes a lot of sense and can save a lot of processing power. Consider that Zip already does something similar: its specification includes a central directory listing every file (wouldn't it suck to have to completely unzip a zip file when all you wanted was a list of the filenames / sizes?)
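      A rough Python sketch of the "index, then fseek()" idea: the writer notes the byte span of every element as it emits it, and a reader seeks straight to one element and parses only those bytes. The function names, the `<root>` wrapper and the record contents are hypothetical; for brevity the index is returned as a separate dict, whereas a real format would store it in the file itself (a zip keeps its central directory at the end of the archive).

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List, Tuple
from xml.sax.saxutils import escape

Index = Dict[str, List[Tuple[int, int]]]  # tag -> list of (start, end) byte offsets

def write_indexed(records: List[Tuple[str, str]]) -> Tuple[bytes, Index]:
    """Write (tag, text) records under one root and note each element's byte span."""
    out = io.BytesIO()
    index: Index = {}
    out.write(b"<root>")
    for tag, text in records:  # tags are assumed to be valid XML names
        start = out.tell()
        out.write(f"<{tag}>{escape(text)}</{tag}>".encode("utf-8"))
        index.setdefault(tag, []).append((start, out.tell()))
    out.write(b"</root>")
    return out.getvalue(), index

def read_nth(f: BinaryIO, index: Index, tag: str, n: int) -> str:
    """Seek straight to the n-th <tag> element and parse only its bytes."""
    start, end = index[tag][n]
    f.seek(start)
    return ET.fromstring(f.read(end - start)).text or ""

records = [("price", str(i)) for i in range(1000)] + [("note", "a < b & c")]
data, idx = write_indexed(records)
f = io.BytesIO(data)  # stands in for a file on disk
assert read_nth(f, idx, "price", 500) == "500"
assert read_nth(f, idx, "note", 0) == "a < b & c"
ET.fromstring(data)  # the document itself is still ordinary, well-formed XML
```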

      "Consumer"/Desktop applications already do compress XML (look at StarOffice as a great example; even a JAR is just zipped-up stuff which can include XML configs, etc.). It's the stream-based data processors that really benefit from a standardized binary-transmission format for XML with some convenient indexes built in.

      That is all.

      --Robert

    20. Re:KISS by ad0gg · · Score: 0, Troll

      Because with compression all you get is a size drop. If you implement a decent binary format, you could also improve performance in XML DOM parsing and even in validating against a schema. I know a lot of companies, like Microsoft and IBM, support compiling XSLTs for performance; now if only we could do the same with a standard format for XML.

      --

      Have you ever been to a turkish prison?

    21. Re:KISS by DunbarTheInept · · Score: 1

      Compressing will not add one iota of security if it uses a well-known open compression scheme. And if it doesn't use a well-known compression scheme, then it defeats the purpose of having an open standard of file exchange and thus doesn't belong in XML.

      If the data needs to be hidden for security, then encrypt the transmission via https instead of http. But don't go ruining the openness of the XML file format to do it.

      --

      Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

    22. Re:KISS by saider · · Score: 1

      I had that situation as well. XML was the 'way it should be'. The problem was that we really did not know exactly what the product should do. A simple data recording program became a data processing program. I have since left and I do not know what has become of it.

      In short, XML is fine for some applications. But for others it is a real beast. Knowing that there is no one solution to everything is the first step to enlightenment.

      --


      Remember, You are unique...just like everyone else.
    23. Re:KISS by Anonymous Coward · · Score: 0

      I'd rather be fishing then trolling on slashdot

      'Than'

    24. Re:KISS by e2d2 · · Score: 2, Interesting

      What you said is right on target. I've worked with XML in a few applications (specifically web services), and every time we saw a performance drop it was not because of a network bandwidth issue, but because the documents were so large that the parser became the bottleneck. And then when you throw in style sheets for manipulation... well, you get the point.

      So if the need is for compression over networks, well, that's only half of XML's performance problems. And if the end result is a binary format, then how is it related to XML, and why would it need to be in the first place? Data compression over networks is not a valid reason for another standard, IMO.

    25. Re:KISS by rootmonkey · · Score: 1

      "did someone above make a decree that you would use xml in this solution?"

      That's exactly what it was. Our CTO couldn't get it through his head that customers shouldn't be parsing our data; they should be using an API to access it. The CTO wanted to keep the data simple for humans, even though programs read the data 99% of the time.

      XML is a good format for static documents or small bandwidth apps but not the solution for high speed streaming apps.

      --

      Yes but every time I try to see it your way, I get a headache.
    26. Re:KISS by rootmonkey · · Score: 1

      XML was indeed misapplied in that situation. And I agree that binary XML is an oxymoron.

      --

      Yes but every time I try to see it your way, I get a headache.
    27. Re:KISS by Taladar · · Score: 1

      How can anyone even consider using security by obscurity for a file format where the specifications are (legally) available on the web?

    28. Re:KISS by Taladar · · Score: 1

      I would bet that, compared to parsing XML, decompressing a few KB of text costs next to nothing in CPU time.

    29. Re:KISS by dbacher · · Score: 1

      Quote> What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2? ...

      There is some repetitive data, but not a lot. A compact binary representation, for the vast majority of common documents, will be smaller and can also be gzipped/bzipped more efficiently than the XML can be.

      Just taking this and using a basic ASCII binary-esque encoding:
      AB12Test AccountC4test~C~B~A

      This will bzip/gzip "as well" as the XML does, it's as extensible as the XML is, it can represent hierarchical data as well as the XML can, it's more compact, it can be processed faster, etc.

      If you have to process tens of thousands of transactions per second, you don't want to be processing a 32 KB text file. Gzip/bzip2 help with transport-level size only.

      Note that all it takes is a dictionary and a tagged binary format to get all the functionality of XML in terms of extensibility, etc. You pick (and require) a byte order, etc.
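      A minimal sketch of the "dictionary plus tagged binary format" idea, in Python for illustration. The tag list, the header layout (1-byte tag id, 2-byte text length, 2-byte child count, all big-endian) and the function names are my assumptions, not any published format; attributes, mixed content and whitespace-only text are simply dropped. The decoder checks every length against the buffer, since a binary reader that trusts its input is an easy target.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical shared dictionary, agreed on out of band by writer and reader.
TAGS = ["account", "id", "name", "type"]
TAG_ID = {name: i for i, name in enumerate(TAGS)}

# Fixed big-endian byte order ('>'): tag id (1 byte), text length, child count.
HEADER = struct.Struct(">BHH")


def encode(elem):
    """Pre-order encoding; ignores attributes, tails and whitespace-only text."""
    text = (elem.text or "").strip().encode("utf-8")
    parts = [HEADER.pack(TAG_ID[elem.tag], len(text), len(elem)), text]
    parts.extend(encode(child) for child in elem)
    return b"".join(parts)


def decode(buf, pos=0):
    """Return (element, next_pos); raise ValueError on malformed input."""
    if pos + HEADER.size > len(buf):
        raise ValueError("truncated header")
    tag_id, text_len, n_children = HEADER.unpack_from(buf, pos)
    pos += HEADER.size
    if tag_id >= len(TAGS) or pos + text_len > len(buf):
        raise ValueError("unknown tag id or truncated text")
    elem = ET.Element(TAGS[tag_id])
    elem.text = buf[pos:pos + text_len].decode("utf-8") or None
    pos += text_len
    for _ in range(n_children):
        child, pos = decode(buf, pos)
        elem.append(child)
    return elem, pos


doc = ET.fromstring(
    "<account><id>12</id><name>Test Account</name><type>test</type></account>"
)
blob = encode(doc)
restored, _ = decode(blob)
```

      For this record the encoding comes to 38 bytes against 72 characters of XML, and it can still be gzipped on top.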

      XML has its uses -- but a standard binary form also has its uses. There's this assumption "there's always a bigger pipe," but it's not reasonable to be using mountains of bandwidth "just because you can."

      --
      If your code is acting bloated, and is running rather slow, it's likely and predicted that some loops you will unroll.
    30. Re:KISS by Anonymous Coward · · Score: 0

      Security through obscurity is done (validly) all the time on small scales to prevent "casual observance" of sensitive data. E.g. a POP mail client might store passwords in simple XOR encryption. The config file should be protected anyway, but a good-guy admin might be looking at it to change a POP server address, and he doesn't need to accidentally see your password.

      In meatspace, it's like when you put the faceplate of your car radio in the glovebox. It doesn't make it any harder to steal the radio, but it makes it less appealing and less clear that it is of value. Would you say this is useless and you should only use a key/lock system? Ridiculous.
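      The XOR trick amounts to something like the following. It is a hypothetical sketch (the key and function name are made up), and it is obfuscation only: anyone who has the client binary has the key.

```python
from itertools import cycle

# Hypothetical key compiled into the client. Anyone with the program can recover it,
# so this only stops a casual glance at the config file; it is not encryption.
KEY = b"not-a-secret"


def xor_obfuscate(data: bytes, key: bytes = KEY) -> bytes:
    """XOR each byte with the repeating key; applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


stored = xor_obfuscate(b"hunter2").hex()          # what lands in the config file
recovered = xor_obfuscate(bytes.fromhex(stored))  # what the client uses at login
```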

    31. Re:KISS by Anonymous Coward · · Score: 0

      Maybe you should get it through your head that your CTO talks to other CTOs who in turn talk to CEOs who influence purchasing decisions of your product and therefore revenue for your company and therefore your job itself.

      Guess what one of the biggest deal-breaker questions he hears all the time is: "Yes, but is the data in XML?"

    32. Re:KISS by Derek+Pomery · · Score: 1

      Really? Is it disabled by default? Because according to my webserver logs, all the IE clients don't seem to trigger mod_deflate (unlike Mozilla).

      And yes, this is on topic.
      gzip compression of semi-XML

      --
      -- perl -e'print pack"H*","6e656d6f406d38792e6f7267"' /. ate my old sig. Bastards.
    33. Re:KISS by EnderWiggnz · · Score: 1

      and why on earth can't he answer with, "It isn't, and here's why it doesn't matter:"

      --
      ... hi bingo ...
    34. Re:KISS by d1v1d3byz3r0 · · Score: 1

      I could see the advantage of integrating the decompression and parsing layers. Using simple Huffman coding, a tag like <THISISMYVERBOSETAG> would get assigned a binary symbol rather than an ascii substring.

      Then the XML parser can simply pre-search the *dictionary* for symbols that start with < and end with > and flag those symbols explicitly as tags. The parser can then hunt through the compressed XML body for those symbols directly. That makes a lot more sense than scanning through an ASCII file looking for < and >.
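      A rough sketch of the idea, assuming plain static Huffman coding over whole tokens and a code table the decoder already has. The crude tokenizer and the function names are hypothetical, and bits are kept as a string of "0"/"1" for readability where a real format would pack them into bytes. Because tag-ness is decided once per dictionary entry, the decoder never looks at a < character in the body.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(xml_text):
    """Crude split into whole tags and text runs (no comments, CDATA or PIs)."""
    return re.findall(r"<[^>]+>|[^<]+", xml_text)


def huffman_code(tokens):
    """Map each distinct token to a prefix-free bit string."""
    freq = Counter(tokens)
    if len(freq) == 1:
        return {tok: "0" for tok in freq}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {tok: ""}) for tok, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def decode_tags(bits, code):
    """Yield only the tag tokens, using flags precomputed from the dictionary."""
    lookup = {c: t for t, c in code.items()}
    is_tag = {t: t.startswith("<") for t in code}  # the "pre-search" step
    buf = ""
    for bit in bits:
        buf += bit
        if buf in lookup:
            if is_tag[lookup[buf]]:
                yield lookup[buf]
            buf = ""


doc = "<list><item>a</item><item>b</item><item>a</item></list>"
tokens = tokenize(doc)
code = huffman_code(tokens)
bits = "".join(code[t] for t in tokens)
tags = list(decode_tags(bits, code))
```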

    35. Re:KISS by frisket · · Score: 1
      What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?

      Nothing except it's too easy :-)

      Tim is quite right, anyway. If you believe your "documents" need compressed XML, you probably shouldn't be using XML in the first place, because they're probably not documents, just data, and we already have ways of transmitting compressed data.

      If I were world dictator, I'd keep the programmers well away from XML until they can demonstrate they have grokked the fullness of markup.

    36. Re:KISS by thogard · · Score: 1

      Text parsing isn't hard on a CPU unless you do it in a way that requires a huge amount of stack or memory. Sort of like XML (or TeX or Lisp). Did the people who came up with XML ever read Knuth's or Wirth's books, or even look at the reasons behind the way Lisp does things?

    37. Re:KISS by Anonymous Coward · · Score: 0

      Only on Win2003. Previous versions had a hack for static content, or you could buy a third-party add-in.

    38. Re:KISS by Anonymous Coward · · Score: 0

      If you believe your "documents" need compressed XML, you probably shouldn't be using XML in the first place, because they're probably not documents, just data, and we already have ways of transmitting compressed data.

      But not standard ones. (No, gzip does not count - that standardises the compression, not the format.)

      What "binary XML" would provide is a standard, extensible, language-independent data serialisation mechanism. If you can't see how that provides any advantage over the current situation, I'm forced to wonder exactly what networking-related experience you have...

    39. Re:KISS by Ed+Avis · · Score: 1

      I suggest people look at FleXML (http://flexml.sourceforge.net/), which is a faster way to parse ordinary XML files; you can get enough speedup from that without needing a binary format.

      If you choose a binary format, there are two kinds. One is motivated by saving space and would serialize the XML stream in a different way. To take a very noddy example, you could choose to encode <foo> as byte 00 and </foo> as byte 01. Parsing such a file would be faster than parsing textual XML, but essentially the same limitations apply: you need to scan sequentially to reach a particular element, and you need to either build up a big tree in memory or use some awkward token-based interface.
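      The byte-per-event encoding might look roughly like this, using the standard library's SAX parser to get events in document order. The opcode values, the two-byte text length and the 256-name limit are assumptions made for the sketch. Counting elements afterwards still means walking every opcode, which is exactly the sequential-scan limitation described above.

```python
import struct
import xml.sax

START, END, TEXT = 0x00, 0x01, 0x02  # hypothetical one-byte opcodes


class EventEncoder(xml.sax.ContentHandler):
    """Turn SAX events into opcodes; tag names go into a small string table."""

    def __init__(self):
        super().__init__()
        self.names = {}  # tag name -> index (at most 256 names in this sketch)
        self.out = bytearray()
        self._text = []

    def _flush_text(self):
        data = "".join(self._text).strip().encode("utf-8")
        self._text = []
        if data:
            self.out += bytes([TEXT]) + struct.pack(">H", len(data)) + data

    def startElement(self, name, attrs):  # attributes are ignored here
        self._flush_text()
        self.out += bytes([START, self.names.setdefault(name, len(self.names))])

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.out.append(END)


def count_elements(stream, name_idx):
    """Still a sequential scan: every opcode before the end must be visited."""
    n, pos = 0, 0
    while pos < len(stream):
        op = stream[pos]
        if op == START:
            if stream[pos + 1] == name_idx:
                n += 1
            pos += 2
        elif op == TEXT:
            (length,) = struct.unpack_from(">H", stream, pos + 1)
            pos += 3 + length
        else:  # END
            pos += 1
    return n


handler = EventEncoder()
xml.sax.parseString(b"<list><item>a</item><item>b</item></list>", handler)
stream = bytes(handler.out)
```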

      You suggest some indexes in the file 'so you can just fseek()'. This would be rather hairy to do for XML in general and I don't think any one file format would fit all. But if you have a particular XML DTD or schema in mind and you know your application's requirements (eg, 'I need to quickly count the elements and do constant-time lookup of the Nth element') then you could do something... though I don't know of any tool to help with generating and using this kind of indexed format, and hand-rolling code for it could be tedious.
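      For the indexed variant, here is a sketch with a layout I made up: length-prefixed records followed by a table of 4-byte offsets and a 4-byte count at the very end. Looking up record N is three seeks regardless of N, at a cost of 8 extra bytes per record, which is the size trade-off the comment goes on to mention.

```python
import io
import struct


def write_indexed(records, f):
    """Length-prefixed records, then a uint32 offset table, then a uint32 count."""
    offsets = []
    for rec in records:
        offsets.append(f.tell())
        f.write(struct.pack(">I", len(rec)) + rec)
    for off in offsets:
        f.write(struct.pack(">I", off))
    f.write(struct.pack(">I", len(offsets)))


def read_nth(f, n):
    """Constant-time lookup: three seeks, no scanning of earlier records."""
    f.seek(-4, io.SEEK_END)
    (total,) = struct.unpack(">I", f.read(4))
    if not 0 <= n < total:
        raise IndexError(n)
    f.seek(-4 - 4 * (total - n), io.SEEK_END)
    (off,) = struct.unpack(">I", f.read(4))
    f.seek(off)
    (length,) = struct.unpack(">I", f.read(4))
    return f.read(length)


buf = io.BytesIO()
write_indexed([f"<item>{i}</item>".encode() for i in range(1000)], buf)
```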

      Of course, a 'fast' binary format with the extra indexes would be bigger than a simple encoding of the data, and perhaps bigger than plain .xml.gz, so you might not want to use it for downloading. (Unless the format has special properties so that a partially downloaded file can be opened and processed without waiting for the rest...)

      Rambling a bit: I would like to see a file compressor / encoder based on BNF grammars (maybe even yacc input files). Every legal file of a given grammar can be produced by applying rules of that grammar. If at a certain stage there are four possible rules to apply, then this choice can be encoded in two bits. If one rule is more likely than the others, then Huffman or arithmetic coding could be used. I haven't thought through exactly how this could work... probably parse the file into a syntax tree and then output that from the top down.
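      A toy version of the grammar-driven encoder, assuming a hypothetical two-rule grammar for balanced parentheses instead of a real XML grammar. Each expansion of S has two alternatives, so each choice costs one fixed-width bit; the Huffman or arithmetic coding of likely rules is left out. Parsing records the choices in leftmost-derivation (top-down) order, and decoding replays them.

```python
# Hypothetical toy grammar for balanced parentheses:
#   S -> "(" S ")" S  |  (empty)
GRAMMAR = {"S": [["(", "S", ")", "S"], []]}


def width(nonterminal):
    """Fixed number of bits needed to say which alternative was chosen."""
    return (len(GRAMMAR[nonterminal]) - 1).bit_length()


def parse_choices(text):
    """Recursive descent: record rule choices in leftmost-derivation order."""
    choices, pos = [], 0

    def parse_s():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            choices.append(0)
            pos += 1
            parse_s()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced input")
            pos += 1
            parse_s()
        else:
            choices.append(1)

    parse_s()
    if pos != len(text):
        raise ValueError("unexpected trailing input")
    return choices


def encode(text):
    w = width("S")
    return "".join(format(c, f"0{w}b") for c in parse_choices(text))


def decode(bits, start="S"):
    """Replay the leftmost derivation, reading one choice per nonterminal."""
    out, stack, i = [], [start], 0
    while stack:
        sym = stack.pop()
        if sym in GRAMMAR:
            w = width(sym)
            choice = int(bits[i:i + w], 2) if w else 0
            i += w
            stack.extend(reversed(GRAMMAR[sym][choice]))
        else:
            out.append(sym)
    return "".join(out)
```

      A string with n pairs of parentheses needs 2n + 1 bits this way, against 16n bits as ASCII.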

      --
      -- Ed Avis ed@membled.com
    40. Re:KISS by Ed+Avis · · Score: 1

      Bloody Slashdot 'plain text' mode; why does it not understand that a less-than sign is an entirely legal character in plain text?

      I meant to say: http://flexml.sourceforge.net/, and you could encode <foo> as byte 0 and </foo> as byte 1.

      --
      -- Ed Avis ed@membled.com
    41. Re:KISS by Anonymous Coward · · Score: 0

      I do this in my applications. XML/XSL + apache mod_gzip. Works great.

    42. Re:KISS by sjames · · Score: 1

      Compressing will not add one iota of security if it uses a well-known open compression scheme. And if it doesn't use a well-known compression scheme, then it defeats the purpose of having an open standard of file exchange and thus doesn't belong in XML.

      Absolutely agreed. Part of the reason the web took off the way it did is that HTTP is a simply defined TEXT based protocol used to transfer a TEXT based markup language. In the web's early days when the servers were a LOT rougher than they are now, many bugs were squashed by telnet <server name> 80, and manually (or using cut/paste) taking part in a transaction.

      The same applies to mail. In fact, just last week I used that technique to easily debug a custom sendmail rule. If sendmail insisted on using binary XML instead, I would have wasted at least as much time writing a debugging client as it took me to do the actual debugging.

      The proponents of binary XML all forget the possibility (a very STRONG possibility) that the app generating the output might have a bug to fix. If that app produces text-based XML, it's not too hard to spot the broken output and maybe even deduce the nature of the bug using only grep and vi (or emacs).

      Another issue we may expect to see with binary XML is exploits galore. When people talk about binary XML taking less CPU power to parse, what they REALLY mean is "With binary XML, I can get rid of all that CPU-intensive error checking by loading the data into a C struct and trusting it to be correct".

      A more reasonable approach might be a compromise where binary helper data is included in the standard XML such that a standard XML parser will accept it. It is a hackish cheat, but it is better than binary XML.

    43. Re:KISS by Trejkaz · · Score: 1

      Isn't there already an ASN.1 transformation that can be applied to XML to save space and processing time? Sun were trumpeting it as "Fast Web Services" a year or more ago.

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
  7. Make a XML compiler... by Yaa+101 · · Score: 1

    But make it a open source one...

    I guess this is another itch to scratch by the community...

    1. Re:Make a XML compiler... by leerpm · · Score: 1

      Compilers are for code, not data. Xml is data.

    2. Re:Make a XML compiler... by Yaa+101 · · Score: 0

      Really?

    3. Re:Make a XML compiler... by Anonymous Coward · · Score: 0
      "Make a XML compiler...
      But make it a open source one...

      • Got something against the word "an"?
    4. Re:Make a XML compiler... by Yaa+101 · · Score: 1

      <rect class="red str" x="15" y="15" width="100" height="50" rx="12" ry="18" />

    5. Re:Make a XML compiler... by Yaa+101 · · Score: 0, Offtopic

      Dunno...
      Maybe I am not a english native speaking person and maybe you are arrogant elitist?...

    6. Re:Make a XML compiler... by Anonymous Coward · · Score: 0

      And maybe the AC doesn't know that the rule is to use "an" before words beginning with a vowel. For the AC's benefit, the set of vowels is {a, e, i, o, u} and x is not a member of that set, so the article preceding "XML" is "a", not "an".

      [And yes, I know that the punctuation grammatically belongs inside the quotation marks, but I'm protesting!]

    7. Re:Make a XML compiler... by Scott+Wood · · Score: 0, Offtopic

      The word "an" precedes vowel sounds, not letters. "X" in "XML" is pronounced "eks", and thus gets "an". Would you say "an unit"?

    8. Re:Make a XML compiler... by Anonymous Coward · · Score: 0

      I bet everyone involved in this thread bitches about brace placement in code too.

    9. Re:Make a XML compiler... by Anonymous Coward · · Score: 0

      I hate hate hate the K&R Cuddle. I can NEVER find my left brace. sigh.

    10. Re:Make a XML compiler... by Yaa+101 · · Score: 1

      Thanks for explaining... :-)

    11. Re:Make a XML compiler... by the_lesser_gatsby · · Score: 1

      XML compilers essentially already exist: XML describes a tree, and parsing it turns it into a DOM tree (a parse tree).

      Compilers also turn source code into trees. The code generation from the tree completes the task of compilation.

      Generate the binary (in some standard way) from the DOM tree, and you've got binary XML. I suppose serializing the tree (or a more compact version of it) would be a simple way to do it.

    12. Re:Make a XML compiler... by Anonymous Coward · · Score: 0
      Compilers are for code, not data. Xml is data.

      Code is data, and XML can be code. XML is a meta language. You could create a programming language that is written in XML. An example program could be like:
      <?xml version="1.0"?>
      <example_program>
        <function name="max">
          <param name="p1" type="int" />
          <param name="p2" type="int" />
          <return type="int" />
          <body>
            <if>
              <expr>
                <gt left="p1" right="p2" />
              </expr>
              <return val="p1" />
              <else>
                <return val="p2" />
              </else>
            </if>
          </body>
        </function>
        <function name="main">
          <body>
            <var name="result" type="int">
              <call function="max">
                <param value="1" />
                <param value="2" />
              </call>
            </var>
          </body>
        </function>
      </example_program>
      It's not pretty though.
    13. Re:Make a XML compiler... by Anonymous Coward · · Score: 0

      The word "an" precedes vowel sounds, not letters. "X" in "XML" is pronounced "eks", and thus gets "an". Would you say "an unit"?

      Nah, I pronounce XML "Zimmil".

    14. Re:Make a XML compiler... by Anonymous Coward · · Score: 0
      If you've ever played with languages like Lisp, you'll see the distance between code and data is not very far. Basically, your browser is a compiler for HTML code.

      There's an alternative to binary XML which treats the descriptions of each type in the document like any code. In effect it creates small code segments to read the data. Check out http://www.einet.com.au/

  8. Oooh, limelight! by csbruce · · Score: 1

    Check out CWXML/BXML. Especially significant, though perhaps unintuitive, is the savings in compression time from the source data being more compact.

    1. Re:Oooh, limelight! by Anonymous Coward · · Score: 1, Insightful

      Any programmer worth his salt can put together a really good/efficient binary representation of XML in a few days. That's not the issue. The issue here is standardization.

  9. a kabosh? by krisp · · Score: 1

    looks like the developer in question is a little too close to his prize development. speeding up xml by removing all the bloat, however that would be accomplished, be it compiling xml into some sort of byte code or whatnot, seems like a much better idea from the client and server point of view. why transfer 100kb of text data when you can send 10kb of binary data for the same message?

    1. Re:a kabosh? by SnapShot · · Score: 2, Funny
      Considering that for most purposes XML contains a lot of redundant formatting, it seems like you could get nearly 10:1 compression simply by using (as has already been mentioned) zip or some other compression algorithm.

      However, if you wanted to go to a binary encoding, you could try something relatively straightforward like:

      original:
      <tag name="value"/>
      patented XML encoding algorithm (hexadecimal):
      3c746167 206e616d 653d2276 616c7565 222f3e00
      --
      Waltz, nymph, for quick jigs vex Bud.
    2. Re:a kabosh? by Anonymous Coward · · Score: 0

      Why don't we just go back to EDI then?? BXML isn't just stupid, it's a step backwards...

  10. Awesome! by Anonymous Coward · · Score: 0

    Now we can have competing formats of Binary XML. Fuck that human readability bullshit, what we need is to make it so that Apple's Binary XML implementation differs from SUN's Implementation and nothing works with Microsoft's, not even their own files!

  11. Binary XML has been around a while... by PipianJ · · Score: 4, Informative

    Binary XML is nothing new, as I wager that many people here are already using it, albeit unknowingly.

    One of the earliest projects that has tried to make a binary XML (as far as I'm aware) was the EBML (Extensible Binary Meta-Language) which is used in the Matroska media container.
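    For a flavour of what EBML does differently from text, here is a sketch of its variable-length integer, which (as I understand the spec) it uses for element sizes: the count of leading zero bits in the first byte gives the length, and an all-ones payload is reserved to mean "unknown size". Treat it as an illustration and check the EBML specification before relying on the details.

```python
def encode_vint(value: int) -> bytes:
    """EBML-style variable-length integer, as used for element data sizes."""
    for length in range(1, 9):
        # All-ones data bits are reserved ("unknown size"), so the largest
        # value that fits in `length` bytes is 2**(7*length) - 2.
        if value <= 2 ** (7 * length) - 2:
            marker = 1 << (7 * length)  # the single 1 bit after the leading zeros
            return (marker | value).to_bytes(length, "big")
    raise ValueError("value too large for an 8-byte VINT")


def decode_vint(data: bytes):
    """Return (value, number_of_bytes_consumed)."""
    first = data[0]
    if first == 0:
        raise ValueError("VINT longer than 8 bytes")
    length = 9 - first.bit_length()  # leading zero bits + 1
    if len(data) < length:
        raise ValueError("truncated VINT")
    raw = int.from_bytes(data[:length], "big")
    return raw & ((1 << (7 * length)) - 1), length


sizes = [encode_vint(v) for v in (5, 126, 127, 300)]
```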

    1. Re:Binary XML has been around a while... by Anonymous Coward · · Score: 0

      That's not really binary XML, that's a Binary Meta Language similar to XML.

    2. Re:Binary XML has been around a while... by leerpm · · Score: 1

      Of course, there are a zillion ways to binary encode XML, but none are a W3C standard.

    3. Re:Binary XML has been around a while... by Zphbeeblbrox · · Score: 1

      Seems to me if you make binary xml it's not really xml either. It's xbml, or extensible binary meta-language. What these people want is not a binary form of xml; they want a standard for encoding data in binary. Fine with me, just don't confuse the issue by calling it binary "xml". 'Cause it's not xml.

      --
      If you see spelling or grammatical errors don't blame me. I tried to preview but IE here at work borked the CSS
    4. Re:Binary XML has been around a while... by Bert690 · · Score: 2, Insightful
      Exactly... the question shouldn't be "Does the world need Binary XML?" because the answer is "the world already has it, about 1000 different kinds in fact!" It's not like Tim Bray's whining is going to make it go away. ("Waaaahh... someone doesn't like my stuff!")

      The question should instead be "How can we best standardize binary XML?"

      My main fear is the typical "design by committee" style of standards bodies will lead to a super-bloated binary standard containing every pet feature of each participant. This could make it just as slow and painful as working with any textual encoding. I think Mike Conner's "CBXML" is probably the right mix of simplicity, compactness, and efficiency. Sun's Fast Infoset is a horrendous concoction that we can only hope never achieves any prominence. Leave it to the company who made Java the bloated mess it is today to come up with something like that!

      Hey guys, here's a clue...before including an ever so nifty new compression / performance feature into your proposals, how about actually quantifying the expected benefits? This includes both performance of parsing as well as generation. Yes we need a binary XML standard, but keep it simple PLEASE.

    5. Re:Binary XML has been around a while... by Bert690 · · Score: 1

      "binary xml" is simply a shorter way of saying something like "binary serialization of the XML infoset". The binary XML guys like to be concise and compact in their terminology in addition to their representations :-)

    6. Re:Binary XML has been around a while... by bay43270 · · Score: 1

      Hey guys, here's a clue...before including an ever so nifty new compression / performance feature into your proposals, how about actually quantifying the expected benefits? This includes both performance of parsing as well as generation. Yes we need a binary XML standard, but keep it simple PLEASE.

      I can understand why we need a binary standard format, but why should it have anything to do with XML? The entire point of XML (AFAIK) was to have a human-readable data-interchange format with a known set of rules and restrictions (no guessing about how to parse it). If it's binary, it's no longer human-readable.

      Surely we aren't using XML as a basis just because we like hierarchical data models?!?

  12. I admit I'm just a starting developer... by temporalillusion · · Score: 1

    ...but web servers and browsers can use gzip to reduce the size of the HTML going back and forth, why not have something similar where a web service gzips the XML and the consumer decompresses it?
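    Something along those lines is easy to try with Python's gzip module. The payload below is made up, and the round trip only shrinks the bytes on the wire: the consumer still parses the full text afterwards.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical payload: many small, similar records, typical of a web-service reply.
body = "".join(
    f"<record><id>{i}</id><name>Test Account</name><type>test</type></record>"
    for i in range(1000)
)
payload = f"<records>{body}</records>".encode("utf-8")

# Service side: compress before sending (over HTTP this is Content-Encoding: gzip).
compressed = gzip.compress(payload)

# Consumer side: decompress, then use any ordinary XML parser as before.
restored = gzip.decompress(compressed)
root = ET.fromstring(restored)
print(f"{len(payload)} bytes -> {len(compressed)} bytes")
```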

  13. Goals by realdpk · · Score: 1

    FTFA "The goal of the Fast Infoset project is to generate interest among developers and eventually create a standardized binary format."

    I'm not sure why they think that one has to come before the other.

    Frankly, make it a standard so I can write proper code to handle it, and you'll have me (joe random developer) interested.

    1. Re:Goals by WarPresident · · Score: 2, Insightful

      FTFA "The goal of the Fast Infoset project is to generate interest among developers and eventually create a standardized binary format." I'm not sure why they think that one has to come before the other.

      Because standards written in a vacuum tend to suck. Why wouldn't you want input from developers with different backgrounds and needs, then cherry-pick the best ideas (many of which you didn't think of), toss out universally reviled ones, and implement a broad, usable standard?

      --
      Here come da fudge!
    2. Re:Goals by realdpk · · Score: 1

      I agree in principle that standards written in a vacuum as you say tend to suck. However, they could release a "preliminary" spec, and I (and others interested) could write to that, give feedback, etc, and they could perhaps use it to develop a release-1.0 spec. Specifications can change as long as it's clear what specification a particular piece of software relies on.

      Basically, they could start with some structure, to ensure that structure may always be present. Hopefully. :)

    3. Re:Goals by GunFodder · · Score: 1

      Everything sucks in a vacuum.

  14. Makes no sense by Anonymous Coward · · Score: 1, Insightful

    Binary XML would destroy what makes XML powerful: being able to use vi or emacs to understand its content, no fuss, no Adobe Reader-like software, no nothing.

    1. Re:Makes no sense by Anonymous Coward · · Score: 0

      You are aware that both vi and emacs and any other text editor translate binary data into human readable format right? ASCII is a binary format with a standardized translation table. Try loading an ASCII file with an EBCDIC text editor if you don't believe me.

      A standardized translation table for XML components can and should be created for a binary representation of XML. Of course, there is a binary protocol already which could be used. Yes, I'm talking about ASN.1.
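      To make the ASN.1 suggestion concrete, here is a hand-rolled sketch of DER's tag-length-value layout for a made-up two-field record (not a real ASN.1 library). Note that the field names never appear in the bytes; the schema has to supply them, which is the main trade-off compared with XML's self-description.

```python
UTF8_STRING = 0x0C  # universal tag 12
SEQUENCE = 0x30     # universal tag 16 with the "constructed" bit set


def der_length(n: int) -> bytes:
    """Short form below 128; otherwise 0x80 | byte count, then the length big-endian."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + der_length(len(value)) + value


def person(name: str, email: str) -> bytes:
    """Hypothetical record: SEQUENCE { name UTF8String, email UTF8String }."""
    fields = der_tlv(UTF8_STRING, name.encode("utf-8"))
    fields += der_tlv(UTF8_STRING, email.encode("utf-8"))
    return der_tlv(SEQUENCE, fields)


record = person("Al", "a@b")
```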

    2. Re:Makes no sense by cnettel · · Score: 1

      If it's a direct enough mapping, I can't imagine why emacs couldn't be modded to handle it almost too easily. Run-length encoding of indentation whitespace, if there is any, back-referencing by number of previous name entities and you already have significant savings in a way that could be undone with a few regexps!
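      The indentation half of that is indeed a couple of regexps. A sketch, assuming indentation uses spaces only; it leans on the fact that XML 1.0 forbids U+0000, so NUL can serve as a marker. The name back-referencing idea is not covered.

```python
import re

# XML 1.0 never allows U+0000 in a document, so NUL is safe as a marker here.
_INDENT = re.compile(r"^( +)", re.MULTILINE)
_MARKER = re.compile(r"^\x00(\d+)\x00", re.MULTILINE)


def squeeze_indent(text):
    """Replace each line's leading run of spaces with NUL, count, NUL."""
    return _INDENT.sub(lambda m: f"\x00{len(m.group(1))}\x00", text)


def expand_indent(text):
    """Undo squeeze_indent with a second regular expression."""
    return _MARKER.sub(lambda m: " " * int(m.group(1)), text)


doc = "<a>\n        <b>\n                <c/>\n        </b>\n</a>\n"
packed = squeeze_indent(doc)
```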

    3. Re:Makes no sense by ClosedSource · · Score: 1

      I really don't understand this argument. We require users to have special-purpose software like browsers to view web pages, but we insist that using a text editor for page creation is somehow "powerful".

      Perhaps we should use only paper because we can prepare content without the "fuss" of a text editor and computer.

    4. Re:Makes no sense by ad0gg · · Score: 1
      Uhh... What makes XML powerful is the standardized way of accessing an XML document, like XMLDOM, and using XPath to get your information. Being able to view it in a text editor isn't what makes it powerful. Anyway, if you want to view it in a text editor, convert the binary back to text. Saving space and processing is a big advantage, but for web services or XML-RPC it won't save much, since most web servers already have built-in compression. There's also support for compression in the WSE specifications.

      I just hope it gets standardized with one format; I really hate how web services got fragmented into the RPC and Document formats.

      --

      Have you ever been to a turkish prison?

    5. Re:Makes no sense by spud603 · · Score: 1

      The trick would be to design the binary format such that it could be translated at arbitrary points in the file on the fly. This way it would be easy to extend text editors (vixml) to read and edit the files as if they were just text (i.e. quickly and transparently), but keep the file size skimpy. It's important, though, that the structure, or any individual node, be accessible and modifiable without translating the entire tree to ASCII. This is why gzip et al. would not work.

    6. Re:Makes no sense by Anonymous Coward · · Score: 0
      Binary XML would destroy what makes XML powerful:
      What? Nothing? Maybe they should just kill XML and replace it with a text based format which is not bloated...
    7. Re:Makes no sense by cicho · · Score: 1

      You know, when you start with XML, you read docs that give you those cute little examples like Doee ></person>, so it's human readable and sexy. Then in real life you deal with multi-meg documents filled with stuff like this (pasted from a file I'm working on right now, a few strings munged since it's all sooper-seekret stuff):

      <Body><Raw><ut Class="procinstr" DisplayText="Instruction">&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;</ut>
      <ut Type="start" Style="external" RightEdge="angle" DisplayText="topic">&lt;topic id=&quot;00000000-0000-0000-0000-000000000001&quot; revisionNumber=&quot;12&quot;&gt;</ut>
      <ut Type="start" Style="external" RightEdge="angle" DisplayText="developerConceptualDocument">&lt;developerConceptualDocument xmlns=&quot;http://some.domain.com/someplace/1&quot; xmlns:xlink=&quot;http://www.w3.org/1999/xlink&quot;&gt;</ut>

      Human-read me *this*.

      --
      "Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
  15. gzip ? by JonyEpsilon · · Score: 2, Interesting
    Am I missing something, or wouldn't just gzipping XML when it goes over the network solve the problem? And isn't this sort of solution already widely implemented for web content?

    Somebody fill me in ...

    1. Re:gzip ? by Anonymous Coward · · Score: 0

      cpu cost of gzipping...

    2. Re:gzip ? by Bigby · · Score: 1

      We use that in a backend application at our company. You just have to remember to distribute the gzip library with the application.

    3. Re:gzip ? by ahsile · · Score: 1

      You're missing the point. It's the parsing. Never very fun.

    4. Re:gzip ? by Anonymous Coward · · Score: 0

      There is gzip hardware so you can offload that role from the CPU.

  16. there are already standards for this... by ophix · · Score: 2, Interesting

    ... it's called zipping; most web servers have an option to zip the data up as it streams to the client browser

    I fail to see the need for a "binary XML" file format when there are already facilities in place to compress text streams

    1. Re:there are already standards for this... by rootmonkey · · Score: 5, Insightful

      I'll say it again: it's not the size of the document, it's the overhead in parsing.

      --

      Yes but every time I try to see it your way, I get a headache.
    2. Re:there are already standards for this... by grumbel · · Score: 1

      Binary XML wouldn't be just about getting the files smaller, but also about making the parsers simpler. Parsing an XML file today is quite complex and slow; sure, it doesn't matter much for a web page or two, but if you have larger amounts of data it's really no fun at all. A proper binary XML standard might speed that up by an order of magnitude or two.

    3. Re:there are already standards for this... by Shadowlore · · Score: 1

      It's not the size of the document, it's the overhead in parsing.

      Riiiight. Keep telling yourself that. ;)

      --
      My Suburban burns less gasoline than your Prius.
    4. Re:there are already standards for this... by Da+VinMan · · Score: 1

      Do you have proof? Because this is something I'm actually interested in, and no one I know has both a beef with XML's performance and proof to back up their claims.

      --
      Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
    5. Re:there are already standards for this... by interiot · · Score: 2, Insightful
      Okay, look, he's absolutely spot-on.

      Binary formats contain pointers all over the place... pointers that say "this many bytes to the next record", or if the binary format is designed to be very fast to read, will even contain pointers that say "record 22031 is at offset XXX, record 22032 is at offset YYY". It's very quick to get to record 22032 for these formats, you just jump there and don't even have to wait eons for a physical disk to read in every single byte in between.

      Now, compare to XML. EVEN IF every record was a single xml tag, the parser would have to look for "<", followed by "</", and would have to repeat that 22030 more times.

      That may seem like an extreme example, but 1) most XML "records" are much more complex to parse, and 2) this demonstrates THE MOST MAJOR DOWNSIDE that human-writable formats have... they can't have these "jump to byte XXXX" markers in them, because humans don't want to constantly be updating these references every time they add or subtract a byte.

      Machine-writable file formats recognize that inserting or deleting bytes in the middle of a file is a big no-no, so they use several tricks to make sure they don't have to do that. All of these tricks annoy the heck out of humans (they either require updating a lot of bytes, or require writing/reading the file in "pages", which bug humans because you can't "see" a whole section at the same time, or other tricks).

      Therefore, human-writable formats should NOT be used as the most basic storage/access format. Agreeing to put an extremely minimal storage layer below XML is simply accepting that machines are more optimized to read/write a different kind of format than humans are.
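      The "this many bytes to the next record" pointer can be illustrated with a small sketch: a hypothetical format in which each record carries a 4-byte length prefix, next to a naive text search for the Nth element in equivalent XML. The binary reader still makes N hops, but it never reads the record bodies it skips; the XML search has to look at every character before the target.

```python
import struct


def pack_records(records):
    """Each record gets a 4-byte big-endian length prefix."""
    return b"".join(struct.pack(">I", len(r)) + r for r in records)


def nth_by_skipping(buf, n):
    """Hop from prefix to prefix; the skipped record bodies are never read."""
    pos = 0
    for _ in range(n):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4 + length
    (length,) = struct.unpack_from(">I", buf, pos)
    return buf[pos + 4:pos + 4 + length]


def nth_by_scanning(xml_text, n, tag="record"):
    """Text XML: search the characters for the start tag of the zero-based n-th record."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    pos = -1
    for _ in range(n + 1):
        pos = xml_text.index(open_tag, pos + 1)
    end = xml_text.index(close_tag, pos) + len(close_tag)
    return xml_text[pos:end]


records = [f"<record>{i}</record>".encode() for i in range(100)]
packed = pack_records(records)
xml_text = "<records>" + "".join(r.decode() for r in records) + "</records>"
```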

    6. Re:there are already standards for this... by Excors · · Score: 1

      As shown here, XML is ten times slower than a binary format, at least for very simple data files that don't get processed much after being read (which are very common in what I've been using XML for). It's still pretty fast, but the speed does become a problem when there are hundreds of files loaded at once. Age of Mythology also uses XMB, so presumably they encountered the same performance problem.

    7. Re:there are already standards for this... by kafka47 · · Score: 1

      Leave him alone, the girls have been telling him that for years! :-)

    8. Re:there are already standards for this... by 14erCleaner · · Score: 1
      The first standard I thought of when I saw this article was ASN.1. Is that what Sun is basing their binary-XML work on?

      ASN.1 was a major pain to deal with, as I recall from my application-development days, and there was a dearth of freely-available tools for manipulating it (my projects always had to use expensive proprietary encoder/decoder libraries), despite being a supposedly open standard. In fact, you couldn't even download the standard itself; it cost fairly major bucks just to get the documents describing it.

      --
      Have you read my blog lately?
    9. Re:there are already standards for this... by Shadowlore · · Score: 1

      Dude, it was a joke. Granted, it wasn't a joke in binary format, it was in human-readable format, nor was it in a thread about getting girls, but it was a joke about him saying "it's not size, it's how you [use it]", and I'd figure even binary-format people could get it. Apparently not.

      --
      My Suburban burns less gasoline than your Prius.
    10. Re:there are already standards for this... by Bert690 · · Score: 1
      Do you have proof? Because this is something I'm actually interested in, and no one I know has a beef with XML's performance and a proof to back up their claims.

      try this

      Bottom line is that most XML parsers are crap. There are some good ones though (such as expat) but you can still do a lot better even with fairly simple binary encodings.

    11. Re:there are already standards for this... by Da+VinMan · · Score: 1

      I don't understand why you wanted to use XML for a game in the first place. Aside from easy file validation with a schema or DTD, you wouldn't benefit from it.

      To my way of thinking, XML is to ease data interchange and make data portable. It's mainly useful when you want to have data that's 1) human readable and 2) self-describing. There are inherent advantages in those two characteristics that lend themselves to open standards, easy modification, etc. But those aren't advantages that I would expect any game to need.

      Has your project experienced any advantages from using XML or binary XML? Just curious. Your project isn't using XML for interchange of data at all (is it?) and data portability wouldn't appear to be a concern, so if you're experiencing other benefits from XML, that would be interesting.

      I think part of the problem here is classic industry hype. XML really doesn't solve very many problems, but it sure got sold that way.

      On another note, when is 0 A.D. going to be playable? (I loved AOM!) :+)

      --
      Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
    12. Re:there are already standards for this... by Da+VinMan · · Score: 1

      The other side to this coin is that most of the time XML is being used, it doesn't need to be used. If the primary benefits of XML are 1) human readable format and 2) self-describing data and your project doesn't need either of those, then why use XML at all?

      XML's performance will always pale in comparison to flat data or binary data, but then it has advantages that come with the trade-off.

      --
      Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
    13. Re:there are already standards for this... by Excors · · Score: 1

      One of the game's aims is to make it easy for people to alter almost every aspect of it. That's partly done by providing all the tools (the same as are used to actually create the game), and partly by using human-readable file formats - rather than making e.g. a visual GUI editor, we just use XML and a text editor. (That probably makes it harder to do the layout, but easier to embed scripting code; WYSIWYG wouldn't really work when all the objects are repositioned at run-time, anyway. And it requires less effort from the programmers.)

      XML is mostly being used for graphical objects (specifying a combination of models, animations, textures, etc) and for units (setting strength, speed, scripted event handlers, etc). Those files tend to be generated automatically by the scenario editor, but end up being hand-edited: the data needs to be extended as features are added to the game, and it's far easier to just edit the text file than to add new fields to the graphical editor. (That's mostly laziness again; but there's nothing wrong with laziness, and it seems like a fairly good reason for using a text-based format.)

      It's quite nice having the XML parser do some validation of the data with DTDs, but we could easily live without that. The only interchange of data is between different parts of the program (if you don't count humans who want to mess with it), so there's little need to follow any standards. But there's still the consideration of laziness: XML can be fairly readable, and libraries exist for handling it, and it seems good enough. I don't think anyone actually knew of any alternatives (although the decision was made a long time before I joined the project), but there's nothing particularly bad about XML given that we really want a text-based structured format - except for its speed, which was mostly solved by caching it in a binary format. We don't use XML where it's completely inappropriate - terrain height-maps, configuration files, network communication, etc - but it seems to work fairly well for most other types of data.

      Self-describing is incidentally useful for people who don't want to put up with tools that we've made. I spent literally months trying to understand the giant data file used by Age of Empires 2, since there was no known way to edit it, and still could only read half of it. The game's developers probably never worried about that, but people always want to do things that the developers didn't think of, so a self-describing format is extremely helpful for those kinds of hackers. But a binary format with some documentation would be just as good - self-description isn't really an advantage when any other format can come with some form of description.

      As for when the game will be finished (at least to a beta-testing level), I would imagine 2006 is not an unreasonable guess. But we're all just working on it in our spare time, so progress is a bit erratic, and so it's more a case of "when it's done" ;-)

    14. Re:there are already standards for this... by rootmonkey · · Score: 1

      Well you know what they say, that there are 10 types of people in this world, those that know binary and those that don't. :)

      --

      Yes but every time I try to see it your way, I get a headache.
    15. Re:there are already standards for this... by Anonymous Coward · · Score: 0

      In the end binary XML is nothing like XML any longer. An alternative to binary XML is Argot. It has a binary language for describing binary data in the same way XML Schema describes XML. The resulting parser is also simpler. The description language is even able to describe itself. http://www.einet.com.au/

  17. What would Homer Simpson do? by Anonymous Coward · · Score: 1, Funny

    What would Homer Simpson do if he found out about this news in Springfield? Be creative! Best answer gets 2+ mod points. Good Luck!!!!

    1. Re:What would Homer Simpson do? by dknight · · Score: 1

      *puts on best Homer Simpson voice*
      mmmmmm... binary xml... *drools*

      Wait? THAT'S what binary XML is? What's the thing where the chicks wail on each other?? THAT is what I wanted!! Awwwwwww.

      *takes off Homer Simpson voice*
      thank you

  18. XML is not S-Expressions by Anonymous Coward · · Score: 0

    For all those going to say this? Read this.

  19. ASN.1 by XorA · · Score: 1

    Isn't binary XML just a stupid idea that clashes with ASN.1?

    ASN.1 is already a standard, used heavily in the smartcard/GSM SIM industry.

    1. Re:ASN.1 by ZakMcCracken · · Score: 1

      Yes, ASN.1 implements a lot of excellent ideas from the get go, such as

      * being an "abstract" format, i.e. considering data to be independent from its actual byte-wise representation
      * ability to define space-smart encodings
      * supporting canonicalization from the get-go
      * use of a well-defined ISO namespace
      * modularity of grammar definitions

      But take a close look at it and you will see that unfortunately, it is a standard that is very difficult to interpret, crippled with obsolete string formats, and in practice not very well implemented. As a result, useful implementations will even sometimes have to have the ability to break compliance to work with other broken implementations.

      For example, ASN.1 is the underlying language of the X.509 certificate standard, which is in turn used by IETF's PKIX, SSL and HTTPS standards. Canonicalization is supposed to allow the decision of "object equality" in a well-defined manner and known time. However, a widespread HTTPS browser (not IE) did not implement canonicalization in some parts of their implementation. As a result, interoperation with that browser required the implementer to actually violate canonicalization rules so that the object would be properly understood by the browser (!)

      Another sign that ASN.1 might be a little bit too complex is the fact that there are no fully compliant open-source implementations of ASN.1 parsers, parser generators, parsing libraries etc. Even the commercial offering itself is not that good and dearly priced.

      What would be nice is a simpler, leaner version of ASN.1 keeping the main structural features and getting rid of the problematic / obsolete features.

      For resources on ASN.1 and XML, including an XMLSchema-to-ASN.1 converter

  20. ASN.1 + BER? by Anonymous Coward · · Score: 0

    BER encoded ASN.1 data is just this - a tree structure of values w/ external definitions of data types and structures...

    http://www.insidiae.org/~mike/code/asn1dec1-00.00.01.zip
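    To show what "a tree structure of values" looks like on the wire, here is a minimal sketch of BER tag-length-value encoding for three universal types: INTEGER (0x02), OCTET STRING (0x04) and SEQUENCE (0x30). It assumes definite lengths only. It is not the linked decoder, and it ignores high tag numbers, indefinite lengths and every other type:

```python
def encode_length(n: int) -> bytes:
    """Definite-length BER: short form below 128, otherwise long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def tlv(tag: int, value: bytes) -> bytes:
    """One tag-length-value triple."""
    return bytes([tag]) + encode_length(len(value)) + value

def ber_integer(v: int) -> bytes:
    """INTEGER content is minimal big-endian two's complement."""
    magnitude = v if v >= 0 else ~v
    size = magnitude.bit_length() // 8 + 1
    return tlv(0x02, v.to_bytes(size, "big", signed=True))

def ber_octet_string(data: bytes) -> bytes:
    return tlv(0x04, data)

def ber_sequence(*children: bytes) -> bytes:
    """A SEQUENCE is a constructed TLV whose value is its encoded children."""
    return tlv(0x30, b"".join(children))

record = ber_sequence(ber_integer(5), ber_octet_string(b"hi"))
print(record.hex())  # 300702010504026869
```

    Decoding reverses the process. Read a tag, read a length, and then either take that many bytes as the value or, for a constructed type, recurse into them. The data types themselves come from the external ASN.1 definitions.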

  21. Maybe this is like comparing assembly to C by Stevyn · · Score: 5, Insightful

    Programs written in assembly can run faster than programs written in C, but it's easier for someone to open a .c file and figure out what's going on.

    I'm sure that when C came out, the argument was similarly that its readability and cross-platform compatibility didn't make up for the performance hit. But as computers and network connections became faster, C became a more viable alternative.

    1. Re:Maybe this is like comparing assembly to C by fizban · · Score: 2, Insightful

      Holy smokes, that's wrong. C code will run exactly the same speed as assembly code if they are both compiled to the same machine code. Computers don't read C or assembly. They read binary computer instructions, whether those instructions were originally written in assembly, C, Java, Perl, Python, etc... If a computer had to read C code every time it wanted to run, it would take so, so, so much longer to do anything. XML is great for humans, but sucks for computers. Not only are you sending gobs of string data that could very easily be represented in a more compact binary format, you are also doing string parsing on both ends. It just screams for optimization.

      What really needs to be done is to separate the presentation of the data from the actual storage. Create translators that can convert to a human-readable XML format when required, but otherwise store and communicate the data in a compact binary representation (still standard!). I don't mean just compressing the string XML, but actually removing all string representation completely.
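      Here is a minimal sketch of "removing all string representation" for element names. It assumes a made-up format: each distinct tag name is stored once in a table, and the tree is written as one-byte opcodes plus varint indexes. It handles only elements and their leading text (no attributes, tails, comments or namespaces), and it is not Fast Infoset or any other proposed standard:

```python
import xml.etree.ElementTree as ET

START, TEXT, END = 0x01, 0x02, 0x03  # hypothetical one-byte opcodes

def write_uvarint(n: int, out: bytearray) -> None:
    """Unsigned varint: 7 bits per byte, high bit set means more bytes follow."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)

def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7

def encode(root: ET.Element) -> bytes:
    names: list[str] = []
    index: dict[str, int] = {}
    body = bytearray()

    def walk(el: ET.Element) -> None:
        if el.tag not in index:  # first sighting: add to the name table
            index[el.tag] = len(names)
            names.append(el.tag)
        body.append(START)
        write_uvarint(index[el.tag], body)
        if el.text:
            data = el.text.encode("utf-8")
            body.append(TEXT)
            write_uvarint(len(data), body)
            body.extend(data)
        for child in el:
            walk(child)
        body.append(END)

    walk(root)
    out = bytearray()
    write_uvarint(len(names), out)  # header: the tag-name table
    for name in names:
        raw = name.encode("utf-8")
        write_uvarint(len(raw), out)
        out.extend(raw)
    return bytes(out + body)

def decode(buf: bytes) -> ET.Element:
    count, pos = read_uvarint(buf, 0)
    names = []
    for _ in range(count):
        length, pos = read_uvarint(buf, pos)
        names.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    stack: list[ET.Element] = []
    root = None
    while pos < len(buf):
        op = buf[pos]
        pos += 1
        if op == START:
            idx, pos = read_uvarint(buf, pos)
            el = ET.Element(names[idx])
            if stack:
                stack[-1].append(el)
            else:
                root = el
            stack.append(el)
        elif op == TEXT:
            length, pos = read_uvarint(buf, pos)
            stack[-1].text = buf[pos:pos + length].decode("utf-8")
            pos += length
        elif op == END:
            stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    if root is None:
        raise ValueError("empty document")
    return root

doc = ET.fromstring("<order><item>apple</item><item>pear</item></order>")
blob = encode(doc)
assert ET.tostring(decode(blob)) == ET.tostring(doc)
print(len(blob), len(ET.tostring(doc)))  # 34 50
```

      The decoder rebuilds the same tree, so the text form can be regenerated whenever a human needs to look at it. That is the "translator" role described above.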

      --

      +1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.

    2. Re:Maybe this is like comparing assembly to C by negativeview · · Score: 1

      What I am pretty sure he meant is that in the beginning, C compilers wrote relatively terrible assembly code. It was much much faster than, say, BASIC, but wouldn't come close to matching hand-written assembly.

      Over time, though, the compilers got better, and they often write better code than die-hard assembly fanatics can. Plus it's a lot easier to understand programs in C, and a lot easier to port them.

      I believe that the point he's trying to make is that by the time a Binary XML is anywhere close to being accepted, the problems with XML may have gone away. And then it will be easier to read XML than Binary XML. That's one advantage that Binary XML would be hard pressed to match.

    3. Re:Maybe this is like comparing assembly to C by delphi329 · · Score: 1

      You are right. When C was invented, it was estimated to be 30% less efficient than assembly code. However, we now know that the 30% deficiency in running speed was compensated for by a 300000% increase in coding productivity.

    4. Re:Maybe this is like comparing assembly to C by Anonymous Coward · · Score: 0

      If implemented properly, binary XML is in 1-to-1 correspondence with XML. Thus, you can convert between the two formats, independent of details, with a single command, and without losing anything. If you open a binary XML file with an editor, it could be transformed on the fly, similar to how compressed files are handled by vim and emacs.

      That is, I think the C comparison does not hold. If implemented 1-to-1, binary XML makes a lot of sense to me.

    5. Re:Maybe this is like comparing assembly to C by Stevyn · · Score: 1

      Alright, I clearly understand that a computer doesn't execute C source code or assembly code files. However, I don't think a C compiler is going to generate machine code as efficient as somebody writing assembly code. Since assembly code is specific opcodes and instructions for the CPU, compilers have to do their best to make the code that will run fastest, but I don't think it's going to compile to the same machine code often, if ever. The point I was trying to make is that while it's slower for the computer to run the code, the increased efficiency of being human readable will make up for this in the future as computers become faster.

    6. Re:Maybe this is like comparing assembly to C by king-manic · · Score: 1

      Assembly code is faster if you hand-optimize every single line and you thoroughly understand the machine. Otherwise, machine-optimized code will be closer to optimal. And it also happens to be portable.

      --
      "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."
    7. Re:Maybe this is like comparing assembly to C by fizban · · Score: 1

      Computers may become faster, but if we ask them to do more to achieve the same result in the same amount of time and space, they do not really become faster.

      No matter how fast a computer is, it will always read 1 bit (indicating true or false) faster than it reads 4 or 5 bytes representing the words true or false. Why waste computer cycles, no matter how fast they're going, reading and writing extra data that is unnecessary for the task at hand?
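      This small sketch puts numbers on the size half of that argument. It packs booleans eight to a byte and compares the result with spelling the same values out as text. It says nothing about read speed, which depends on memory access width and caching:

```python
def pack_bools(flags: list[bool]) -> bytes:
    """Store each flag as one bit, least significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_bools(data: bytes, count: int) -> list[bool]:
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]

flags = [True, False, True, True, False, False, True, False, True, True]
as_text = " ".join("true" if f else "false" for f in flags).encode("ascii")
packed = pack_bools(flags)
print(len(as_text), len(packed))  # 53 2
assert unpack_bools(packed, len(flags)) == flags
```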

      Like I said in my original post, you can have two versions of the same XML data, one that is human readable and writable and one that is more geared to the computer, which is exactly what is currently done with modern programming languages! So, consider binary XML as a "compiled" XML format. No human will ever read or write it, but the computers will use it when sending over the network or doing any data manipulations. If done correctly, the compilation will be handled behind the scenes by the applications so that humans never have to do anything but work with the XML "text" data.

      --

      +1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.

    8. Re:Maybe this is like comparing assembly to C by quantum+bit · · Score: 1

      No matter how fast a computer is, it will always read 1 bit (indicating true or false) faster than it reads 4 or 5 bytes representing the words true or false.

      Hrm, not necessarily. Most machines have a 32-bit memory bus and some architectures (RISC) even enforce memory alignment for normal access. So reading 1 bit is often exactly the same speed as reading 4 bytes -- both operations pull 4 bytes over the memory bus.

      5 on the other hand would be worse, since it would always take 2 memory accesses to get the whole thing (cache notwithstanding).

    9. Re:Maybe this is like comparing assembly to C by bgspence · · Score: 1

      I never run a C program in C, but do run binary compiled C programs.

      I might create or build a text-format XML file and use it to debug my XML application, but I might prefer to run the XML application using the corresponding binary XML for improved execution efficiency.

  22. You don't need to change XML itself by Nom+du+Keyboard · · Score: 2, Insightful
    XML's verbosity and lack of inherent compression...XML standard calls for information to be stored as text.

    Text compresses quite well, especially redundant text like the tags. So why not just leave XML alone and compress it at the transportation level with protocols like sending it as a zip, let v.92 modems do it automatically, or whatever. No need to touch XML itself at all.
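    Python's standard gzip module gives a quick way to check the "text compresses quite well" claim on your own data. The sample document below is deliberately repetitive, so real documents will usually compress less. Over HTTP, the same idea is normally applied with the Content-Encoding: gzip header rather than by hand:

```python
import gzip

record = (
    "<person><FirstName>Ada</FirstName>"
    "<CompanyName>Analytical Engines</CompanyName></person>"
)
doc = ("<people>" + record * 1000 + "</people>").encode("utf-8")

packed = gzip.compress(doc)
print(f"{len(doc)} bytes of XML -> {len(packed)} bytes gzipped")

# Compression is lossless: the receiver gets the exact original text back.
assert gzip.decompress(packed) == doc
```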

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    1. Re:You don't need to change XML itself by TheRaven64 · · Score: 2, Interesting
      Actually, you could compress XML by a significant amount by making one simple change to the language. Picture the following piece of XML:

      <SomeTagName>some character data</SomeTagName>

      According to the XML spec, the closing tag must close the nearest opening tag. So why does it have to include the opening tag's name? This is 100% redundant information, and is included in every XML tag with children or cdata. An obvious compression would be to replace this with:

      <SomeTagName>some character data</>

      I really don't know why this wasn't done from the outset (backwards compatibility with HTML, where tags often overlap - although they're not meant to - I suppose). Either allow tags to overlap (which allows some more interesting data structures to be easily encoded in XML) or make the name optional in the closing tag.
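      Here is a sketch of the proposed shorthand. It assumes only elements and text (no attributes, comments or CDATA), and its output is not valid XML 1.0. The point is that a simple stack recovers every closing tag name, which shows that the name really is redundant:

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_short_form(el: ET.Element) -> str:
    """Serialize with anonymous '</>' closers (hypothetical syntax)."""
    parts = [f"<{el.tag}>", escape(el.text or "")]
    for child in el:
        parts.append(to_short_form(child))
        parts.append(escape(child.tail or ""))
    parts.append("</>")
    return "".join(parts)

# An open tag, a close tag (named or anonymous), or a run of text.
TOKEN = re.compile(r"<(/?)([^<>]*)>|([^<]+)")

def expand_short_form(text: str) -> str:
    """Restore full closing tags: each '</>' closes the innermost open element."""
    stack: list[str] = []
    out: list[str] = []
    for match in TOKEN.finditer(text):
        slash, name, chars = match.groups()
        if chars is not None:
            out.append(chars)
        elif slash:
            out.append(f"</{stack.pop()}>")
        else:
            stack.append(name)
            out.append(f"<{name}>")
    return "".join(out)

original = "<SomeTagName>some character data</SomeTagName>"
short = to_short_form(ET.fromstring(original))
print(short)  # <SomeTagName>some character data</>
assert expand_short_form(short) == original
```

      The bytes saved per element equal the length of its tag name, so the gain grows with long, descriptive names.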

      --
      I am TheRaven on Soylent News
  23. Binary is need for more than just file size by Anonymous Coward · · Score: 0

    Would you want to store a .bmp as a series of words like pixel(253,8764) = Black? Some things are better left in binary form, and if XML is going to be used for data transportation between programs, then it needs to support binary data.

  24. I for one by Anonymous Coward · · Score: 0

    welcome our binary XML overlords.

  25. Binary XML is called ASN.1 by Saint+Stephen · · Score: 2, Insightful
    For starters, we already have binary XML, it's called ASN.1. Don't argue, I know it's not exactly the same.

    But secondly, no, you don't need Binary XML, all you need to do is Gzip it on the wire. It gets as small as Binary XML.

    One of the easiest ways to shrink your XML by about 90% is use tags like:
    <a><b><c>
    instead of
    <FirstName><CompanyName><Address>
    You can use a transformation to use the short names or long names on the wire.
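    Here is a sketch of that short-name/long-name transformation. It uses Python's xml.etree.ElementTree instead of XSLT. The mapping is hypothetical and must be agreed on by both ends. It also must not collide with tag names already in use:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping shared by sender and receiver.
LONG_TO_SHORT = {"FirstName": "a", "CompanyName": "b", "Address": "c"}
SHORT_TO_LONG = {short: long for long, short in LONG_TO_SHORT.items()}

def rename_tags(xml_text: str, mapping: dict[str, str]) -> str:
    """Rename every element whose tag appears in `mapping`; leave others alone."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = mapping.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")

doc = ("<Contact><FirstName>Ada</FirstName><CompanyName>Acme</CompanyName>"
       "<Address>1 Main St</Address></Contact>")
wire = rename_tags(doc, LONG_TO_SHORT)  # short names on the wire
print(wire)  # <Contact><a>Ada</a><b>Acme</b><c>1 Main St</c></Contact>
assert rename_tags(wire, SHORT_TO_LONG) == doc
```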
    1. Re:Binary XML is called ASN.1 by Keeper · · Score: 1

      But secondly, no, you don't need Binary XML, all you need to do is Gzip it on the wire. It gets as small as Binary XML.

      And it becomes even slower to parse as a result. Binary XML's advantage isn't its size, it is its parsing performance.

    2. Re:Binary XML is called ASN.1 by Anonymous Coward · · Score: 0

      With that advice, it's no wonder you're on board. The whole point of having a text-based, human-parsable file format is that you can make sense of it with a text editor. Your suggestion breaks things just as badly.

    3. Re:Binary XML is called ASN.1 by ine8181 · · Score: 1

      I think you're mostly correct. ASN.1 is very cool and efficient, but the problem is standardisation. The same goes for GZip. If everybody decides to use Gzip every time they send an XML document, we will have a solution.

      I have to object to the shorter tag names though -- this method does not get rid of the inherent redundancy in the open/close brackets, whitespaces and the ASCII data inside, which can be further compressed.

      I develop on Java and .NET, been using XML full time for last 3 years, and yes, we had the XML bloat problem and tried various things including shortening the tag names.. Which didn't help much.

      At the moment we're getting around the problem by the gzip method, which is non standard.

    4. Re:Binary XML is called ASN.1 by Saint+Stephen · · Score: 1

      Most server processes are not CPU bound. That's not the low-hanging fruit.

    5. Re:Binary XML is called ASN.1 by Anonymous Coward · · Score: 0

      A better example would be the HDF standard (currently HDF5, go google).

      Gzip is relatively slow, and you still have to store the tags, even with something like Huffman encoding, where the most commonly used characters get represented with fewer bits.

      Also, reading a large chunk of binary data, such as a 400 MB hyperspectral image, can be reasonably fast if you don't need to decompress ASCII, convert it to binary, and then put it into the final array. Instead, you can use system calls to read straight from the file stream into the array.
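      Here is a small illustration of the difference using Python's standard array module. The binary path copies raw IEEE 754 doubles straight into the array (array.fromfile does the same directly from an open file). The text path has to tokenize and convert every number. The sketch assumes that reader and writer share the same native byte order:

```python
from array import array

values = [0.5, 1.25, -3.0, 42.0]

# Binary path: the bytes already are doubles, so they are copied as-is.
raw = array("d", values).tobytes()
from_binary = array("d")
from_binary.frombytes(raw)

# Text path: every value is split out of the string and parsed from ASCII.
text = " ".join(repr(v) for v in values)
from_text = array("d", (float(token) for token in text.split()))

assert from_binary == from_text
print(len(raw), len(text))
```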

    6. Re:Binary XML is called ASN.1 by Anonymous Coward · · Score: 0

      Uh, yes they are. What else do you think accounts for the ever-increasing power and numbers of server CPUs?

    7. Re:Binary XML is called ASN.1 by Saint+Stephen · · Score: 1

      They are typically I/O or memory bound.
      Unless you're doing scientific computing, your CPU usually isn't pegged at 100%. So the processing time of parsing the XML isn't the problem.

    8. Re:Binary XML is called ASN.1 by aboyko · · Score: 1

      Actually, you're wrong, and right. This proposal for Binary XML is ASN.1, according to Tim Bray's blog posting.

    9. Re:Binary XML is called ASN.1 by eviltypeguy · · Score: 1

      And guess what larger amounts of data require? I/O and memory. So binary XML improves on both.

    10. Re:Binary XML is called ASN.1 by Keeper · · Score: 1

      I never said they were CPU bound. Most processes are limited by IO or memory bandwidth. Traditional XML parsing is bound by the latter. Compressing the text stream results in more hits to memory, not less. Binary XML does not incur this hit. It has the added benefit of reducing the processing power required to use it, which in general increases the scalability of your server.

    11. Re:Binary XML is called ASN.1 by snorklewacker · · Score: 1

      It is expressed as an ASN.1 schema, yes. ASN.1 can express almost any binary format, but ASN.1 itself is the language to express the format, not the format itself.

      And the ASN.1 specification itself is pretty freakin awful. It seems to have grown appendages and nobbies and warts to fit every last vendor's idea of a datatype, no matter how unorthogonal. Still not the disaster of XML Schema, but it's still a mess.

      --
      I am no longer wasting my time with slashdot
    12. Re:Binary XML is called ASN.1 by voodoo1man · · Score: 1

      There's a pretty interesting blog post by Dave Roberts on why using ASN.1 for binary XML is a bad idea and what Fast Infoset does right and wrong. An amusing thing I learned from one of the user comments is that apparently one of the ASN.1 encoding schemes is XML. You can have lots of fun with that - binary XML encoded in ASN.1 encoded in XML transformed into binary...

      --

      In the great CONS chain of life, you can either be the CAR or be in the CDR.

    13. Re:Binary XML is called ASN.1 by blofeld42 · · Score: 1

      It's A proposal for doing XML in a binary format. It's certainly not the W3C choice. As it stands now, the W3C has a binary XML characterization working group. Their charter is _not_ to provide a binary XML format, but rather characterize the problem, ie define the problem to be solved and what the tradeoffs for various solutions are.

  26. If I were world dictator by Anonymous Coward · · Score: 0
    ok, time for the obligatory what I would do if i was world dictator posts:

    ....aaaaaaand GO!

    1. Re:If I were world dictator by Anonymous Coward · · Score: 0

      If I were dictator, Rachel Ray would be on her knees under my desk right now.

      See, she has nice lips but I can't stand her voice. Hence my command. Solves two problems at once.

  27. Amen To That by American+AC+in+Paris · · Score: 5, Insightful
    XML, as originally designed, is deliciously straightforward. Data is encoded into discrete, easy-to-process chunks that any given XML parser can make sense of.

    XML, as implemented today, is often little more than a thin wrapper for huge gobs of proprietary-format data. Thus, any given XML parser can identify the contents as "a huge gob of proprietary data", but can't do a damned thing with it.

    Too many developers have "embraced" XML by simply dumping their data into a handful of CDATA blocks. Other programmers don't want to reveal their data structure, and abuse CDATA in the same way. Thus, a perfectly good data format has been bastardized by legions of lazy/overprotective coders.

    The slew of publications that exist for the sole purpose of "clarifying" XML serves as testament to the abuse of XML.

    --

    Obliteracy: Words with explosions

    1. Re:Amen To That by hdc · · Score: 1

      Uh, yeah. If I'm not mistaken, one of the original goals of XML was to make data simply interchangeable. Doesn't making it binary totally demolish that purpose? Silly me, there I go talking sense again....

    2. Re:Amen To That by jandrese · · Score: 1

      The problem with trying to solve the connector conspiracy (in this case obtuse, undocumented binary files) is that not everybody *wants* to solve the connector conspiracy. Some people would rather have their file format die off than have a competitor gain any advantage whatsoever over their product. They also don't want people buying cheap knockoffs of their products and think they can stop this by not giving away any details on how to interface with their product. If we find a way to change this perception, then the connector conspiracy will mostly go away on its own (save for those lazy guys who just implement it however they want and never document anything, regardless of whatever standards are available).

      --

      I read the internet for the articles.
    3. Re:Amen To That by Kingpin · · Score: 4, Insightful

      An XML document is an abstraction. The file with tags is a serialization of that document. A binary file would also just be a serialization. Then you deserialize it in your parser and get the DOM. It's the job of the parser to give you the object representation, no matter whether the input was human-readable text or a binary format.

      The data is interchangeable either way. The only difference is that a binary XML file is not immediately human readable.

      --
      Unable to read configuration file '/bigassraid/htdig//conf/14229.conf'
      Geocrawler error message.
    4. Re:Amen To That by Kingpin · · Score: 1

      Come on - how many real projects have you had to deal with "huge gobs of proprietary data" wrapped in XML? People AGREE on a data exchange format, everything else defeats the purpose.

      If the nails look bent - blame the hammer or the carpenter?

      --
      Unable to read configuration file '/bigassraid/htdig//conf/14229.conf'
      Geocrawler error message.
    5. Re:Amen To That by ClosedSource · · Score: 1

      ASCII is simply an abstraction of bits to characters. It works on a system only because that system has the software that supports it. Binary XML would be exactly the same.

    6. Re:Amen To That by Anonymous Coward · · Score: 0

      Thank you for that. The ubiquity of ASCII seems to have left people with the impression that it is not a binary format.

      Let's also not forget the performance penalty of processing entire strings rather than discrete integers within XML parsers.

    7. Re:Amen To That by Wordsmith · · Score: 1

      Vendors are the ones smacking the huge gobs of proprietary data* into XML. It's a lock-in practice. Like Microsoft's close hold on the DOC format, it prevents interoperation with competing, less prominent products - so the other products can be killed off or at least kept a minor nuisance.

      *I propose a new Acronym - HGoPD, for the now oft-mentioned Huge Gobs of Proprietary Data.

    8. Re:Amen To That by wildBoar · · Score: 1

      XML is a load of old cobblers IMO. It is overused and overhyped.

      You just need to compare Apache config files with the newer xml ones to see how useless it is.

      As for people who want to make databases out of it: IBM tried this in the 70s with hierarchical file structures, and quickly replaced them with relational databases.

      In fact, there is a whole slew of Java programmers out there with no idea what to do with a decent RDBMS, busy reinventing wheels.

      I always thought XML was meant as a simple data interchange protocol between systems. As such, it has huge overheads but is adequate; all the rest is sheer madness.

    9. Re:Amen To That by DunbarTheInept · · Score: 1

      The existence of the TCP/IP protocol proves that it is possible for a binary format to still be open. The header data in the packets is totally binary - but in a well-defined way. It is a standard, for example, that all integers exchanged on the network will have their bytes (oh, I'm sorry, I mean "octets") arranged in big-endian order. The 32-bit IP address scheme is a binary standard. The 32-bit timestamps are binary standards... etc.
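      Python's struct module makes the point concrete. The "!" prefix selects network (big-endian) byte order with standard field sizes. The first pack below mirrors the first three fields of a TCP header: the 16-bit source port, the 16-bit destination port and the 32-bit sequence number. The values are arbitrary:

```python
import socket
import struct

# Source port 80, destination port 443, sequence number 1,000,000.
header = struct.pack("!HHI", 80, 443, 1_000_000)
print(header.hex())  # 005001bb000f4240

# An IPv4 address is just a 32-bit unsigned integer in network byte order.
packed_ip = socket.inet_aton("192.168.1.10")
(as_int,) = struct.unpack("!I", packed_ip)
print(as_int)  # 3232235786
assert socket.inet_ntoa(struct.pack("!I", as_int)) == "192.168.1.10"
```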

      --

      Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

    10. Re:Amen To That by Jay+Carlson · · Score: 1

      Too many developers have "embraced" XML by simply dumping their data into a handful of CDATA blocks.

      And a lot of people think that CDATA sections are a structural element.

      They're not. CDATA is just another way of quoting text. For example, these two fragments are identical:

      <b>Here is a &lt; character.</b>

      <b><![CDATA[Here is a < character.]]></b>

      Many XML parsers will not distinguish between these two. And you still have to do quoting when writing a CDATA section. If you see the string "]]>" you have to emit:

      ]]]]>&gt;<![CDATA[

      This isn't really directed at you, but it's a pet peeve of mine.
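      Both points are easy to check with Python's xml.etree.ElementTree. The parser reports the same text for the escaped form and the CDATA form. A small helper (hypothetical, not a library function) applies the "]]>" splitting shown above:

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any ']]>' as described above."""
    return "<![CDATA[" + text.replace("]]>", "]]]]>&gt;<![CDATA[") + "]]>"

plain = ET.fromstring("<b>Here is a &lt; character.</b>")
quoted = ET.fromstring("<b><![CDATA[Here is a < character.]]></b>")
assert plain.text == quoted.text == "Here is a < character."

tricky = "end marker ]]> inside"
wrapped = ET.fromstring("<b>" + to_cdata(tricky) + "</b>")
assert wrapped.text == tricky
```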

    11. Re:Amen To That by Anonymous Coward · · Score: 0

      That's the problem with taste. It's so subjective. See you find XML delicious whereas I find it leaves a nasty residue in my mouth because I have to chew it for so long.

    12. Re:Amen To That by alan_dershowitz · · Score: 1

      Unless I am misunderstanding you, I think you are incorrect.

      In the case that XML is just wrapped around proprietary data that's not markup and/or not even string data (I guess like JPGs or something), then this discussion is not even relevant because you have control over your content. Just take your binary data, gzip it, uuencode it, and dump it into a CDATA in your XML file. if you don't have control over the content, it's still irrelevant because it has nothing to do with the XML format in the first place. That's like complaining that text files on your hard drive are bloaty because the filesystem doesn't compress it for you.

      This discussion is for people who DO use XML in the proper manner, and have found it bloaty, which it is. Just GZipping XML reduces its size massively, so clearly there is a benefit to doing so. Correct XML is in bad need of compression, but XML wrapped around proprietary formatting could be compressed more or less in the proprietary format itself, avoiding the problem entirely.
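      Here is a sketch of the "compress it, text-encode it, and drop it into an element" approach, with base64 standing in for uuencode. The attribute name is made up. Some encoding step is needed because XML 1.0 does not allow certain characters, such as NUL, anywhere in a document (CDATA sections included). Base64 output uses only characters that are safe in ordinary element text, so no CDATA section is required:

```python
import base64
import gzip
import xml.etree.ElementTree as ET

payload = bytes(range(256)) * 4  # stand-in binary blob, includes NUL bytes

el = ET.Element("attachment", encoding="gzip+base64")  # hypothetical attribute
el.text = base64.b64encode(gzip.compress(payload)).decode("ascii")
xml_text = ET.tostring(el, encoding="unicode")

restored = base64.b64decode(ET.fromstring(xml_text).text)
assert gzip.decompress(restored) == payload
```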

    13. Re:Amen To That by wfberg · · Score: 1

      XML, as originally designed, is deliciously straightforward. Data is encoded into discrete, easy-to-process chunks that any given XML parser can make sense of.

      No, it's a simplified version of SGML, but it's not as simple as, say, S-expressions as used in LISP.

      The original XML has all sorts of nastiness; attributes (which can be easily replaced with tags), no context for tagnames (if you have a NAME tag you can't have it be syntactically incorrect for a car to have a NAME/SURNAME), in the beginning there were no schemas (so, no, a parser couldn't make sense out of "1".. is it a string? a boolean? a real number?), namespace nastiness, a zillion encodings.. I could go on..

      XML, as implemented today, is often little more than a thin wrapper for huge gobs of proprietary-format data. Thus, any given XML parser can identify the contents as "a huge gob of proprietary data", but can't do a damned thing with it.

      This is squarely due to the bad design of XML. You see, there are a lot of characters that are verboten in XML. Like null (ASCII 0). Unless you use CDATA sections. In which case you might as well embed binary stuff.
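      One caveat, which you can check with expat (the parser behind Python's standard-library ElementTree): XML 1.0 excludes NUL from its character set entirely. A CDATA section therefore does not let raw NUL bytes through, and binary data still has to be encoded. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: bytes) -> bool:
    """Return True if expat (via ElementTree) accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(b"<a>text</a>"))              # True
print(is_well_formed(b"<a>\x00</a>"))              # False: NUL in character data
print(is_well_formed(b"<a><![CDATA[\x00]]></a>"))  # False: NUL is rejected in CDATA too
```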

      Also, in a lot of cases it makes perfect sense to embed something binary. Like a PNG image. Already a great format, no need to botch it. Even better would be a reference to an external file and storing everything in an archive file (zip...), but then again, XLink is just scary-looking.

      As useful and pervasive as XML is, it's not a panacea, and it's also not really simple. It's far from the complicated (but extremely full-featured) mess that ASN.1 is, but that's about it.

      --
      SCO employee? Check out the bounty
    14. Re:Amen To That by snorklewacker · · Score: 1

      Doing that sort of escaping gives you three entities instead of one. Big headache. It's not surprising that most people just punt and base64-encode anything that could possibly contain that ending string, such as an XML fragment that itself contains a CDATA. Kind of defeats the purpose, no? Just another twisted piece of the train wreck of XML. Great for config info when you have a decent editor, but just don't try to nest it.
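      For reference, the escaping being described ends the CDATA section partway through "]]>" and opens a new section, so the terminator never appears intact. A minimal Python sketch follows. The helper name cdata_wrap is hypothetical.

```python
import xml.etree.ElementTree as ET

def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two adjacent sections."""
    # ']]>' becomes ']]' + ']]><![CDATA[' + '>': the first section ends right
    # after ']]', and the next section begins with '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

inner = "<x><![CDATA[a]]></x>"  # an XML fragment that itself contains a CDATA
wrapped = cdata_wrap(inner)
print(wrapped)
# <![CDATA[<x><![CDATA[a]]]]><![CDATA[></x>]]>

# A conforming parser joins the pieces back into the original string.
assert ET.fromstring("<doc>" + wrapped + "</doc>").text == inner
```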

      --
      I am no longer wasting my time with slashdot
    15. Re:Amen To That by Doomdark · · Score: 1
      XML, as originally designed, is deliciously straightforward

      Hmmh... not really, XML 1.0 is surprisingly complicated (you know what I mean if you have ever tried to write a fully compliant parser... or even an XML writer). It would be straightforward if it weren't for things like entities (and indeed most everything defined in the DTD, like attribute typing and default values), and convoluted character validity rules (including automatic linefeed and white space normalization). And CDATA should not have been included in the first place -- not so much because of the potential for 'abuse' (after all, you could just encode the same thing explicitly, so CDATA adds no expressive power), but because it complicates many aspects of parsing (coalescing of adjacent text nodes, '<' not being encoded inside CDATA, which prevents ultra-fast seeking, etc. etc. etc.).
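      The coalescing point is easy to see with Python's standard-library ElementTree (backed by expat). Escaped text, a CDATA section, and plain text all arrive as one merged string. The "<" inside the CDATA comes back as a literal character, so a consumer cannot tell where the CDATA section was. The sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = "<a>x &lt; y <![CDATA[<b>not a tag</b>]]> z</a>"
root = ET.fromstring(doc)

# The escaped '&lt;', the CDATA section and the plain text are merged into a
# single text value; the CDATA boundaries are not preserved.
print(repr(root.text))  # 'x < y <b>not a tag</b> z'
print(len(list(root)))  # 0: '<b>' inside CDATA did not create a child element
```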

      XML as used in its simplest form, however, could be said to be nice, clean and simple. Too bad the spec didn't limit itself to "sane" use cases. That's mostly SGML legacy, of course; compared to SGML even XML is delightfully simple.

      --
      I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
    16. Re:Amen To That by Anonymous Coward · · Score: 0

      Other programmers don't want to reveal their data structure, and abuse CDATA in the same way

      To round out the trio, don't forget the legacy developers that have no idea what their data structure is actually supposed to be (MS' Office team springs to mind) who resort to the same techniques.

    17. Re:Amen To That by Anonymous Coward · · Score: 0

      Too many developers have "embraced" XML by simply dumping their data into a handful of CDATA blocks. Other programmers don't want to reveal their data structure, and abuse CDATA in the same way.

      I haven't seen this happen once, and I've worked with a fair few document types from different organisations. Care to give examples?

    18. Re:Amen To That by Anonymous Coward · · Score: 0

      It sounds like you might have studied real computer science. We don't want your logic here. XML is just the best thing ever!

    19. Re:Amen To That by nzkoz · · Score: 1

      only difference is that binary XML file is not immediately human readable.

      Which is of course a major benefit of using XML

      --
      Cheers Koz
  28. ZIP ?! by Bazouel · · Score: 0, Redundant

    Why not simply zip it?

    As far as I know, there are programs/libraries for that format on every platform...

    --
    Intelligence shared is intelligence squared.
    1. Re:ZIP ?! by gstoddart · · Score: 1
      Why not simply zip it?

      As far as I know, there are programs/libraries for that format on every platform...


      Because smaller file size is only one of the reasons for binary XML.

      Simply compressing it makes it smaller, but does nothing to simplify handling. Parsing XML is the big hairy deal in this case. XML includes a lot of ambiguities and complex features, so parsing and representing the trees can be a challenge. Think of namespace processing and all the myriad other things in XML.

      I suspect the purpose of binary XML is to have the data already parsed into a traversable structure that applications can use more easily. This would improve load times, as well as make it less necessary to have full parsers implemented as part of every program.

      The problem with binary XML is that you no longer have a human-readable form of the data, so editing and reading become difficult. At that point, you've got yet another obscure binary file which is harder to work with and fairly opaque to a user.

      This is what Tim Bray is complaining about.

      Cheers

      --
      Lost at C:>. Found at C.
    2. Re:ZIP ?! by BullfrogJones · · Score: 1
      Zipping an XML file would address one of the two problems being discussed (transport over the network) but not the other.

      The harder problem is getting applications to process XML data faster. Starting with a gzipped data file and then processing it doesn't speed up processing. Some sort of parseable minimal text format or binary format, on the other hand, would make faster processing possible, just at the risk of diverging standards for how to convert to binary and read it thereafter.

  29. Two words. by Dasein · · Score: 1

    DIME attachments.

    --
    You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
    1. Re:Two words. by CaptnMArk · · Score: 1

      Is this alive? I looked at it recently and it seems to be a dead standard proposal.

    2. Re:Two words. by Dasein · · Score: 1

      Yep -- support in Apache Axis and everything.

      --
      You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
  30. ASN.1? by Anonymous Coward · · Score: 0

    Don't we already have ASN.1?

  31. Compression and huffing around by tod_miller · · Score: 2, Insightful

    Huffman coding will give you compression within one bit per symbol of the entropy. Not suitable for larger data sets (dictionary-based compression is even better for this). 7z compression (or is it z7?) will give you a neat storage format.
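      To make the "entropy + 1" remark concrete: an optimal Huffman code has an average code length L with H <= L < H + 1 bits per symbol, where H is the Shannon entropy of the symbol frequencies. The Python sketch below builds Huffman code lengths and checks that bound. The sample string and the function name are arbitrary choices for illustration.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(freqs: dict) -> dict:
    """Return the Huffman code length (in bits) for each symbol."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}  # a lone symbol still needs one bit
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far})
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

text = "<veg>tomato</veg><veg>leek</veg><veg>potato</veg>"
freqs = Counter(text)
total = sum(freqs.values())
lengths = huffman_code_lengths(freqs)

entropy = -sum(n / total * math.log2(n / total) for n in freqs.values())
avg_len = sum(freqs[s] * lengths[s] for s in freqs) / total
print(f"entropy = {entropy:.3f} bits/symbol, Huffman average = {avg_len:.3f}")
assert entropy <= avg_len < entropy + 1
```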

    Let's talk about where this verbose talk of verbosity is stemming from:


    apple
    orange
    pineapple


    This is a data set. No one knows what it is.
    Here it is again with some pseudo-XML-style tags:
    I am listing vegetables here

    this is a list of vegetables
    vegetables are listed on their own without any children or parent tags, there can be one or more of them, this is version 1 of the document
    here now follows a vegetable
    tomato
    that was a vegetable
    here now follows a vegetable
    leek
    that was a vegetable
    here now follows a vegetable
    potato
    that was a vegetable
    here now follows a vegetable
    haddock
    that was a vegetable

    As you can see, this is an (albeit slightly weird-looking) list of items called 'vegetables'.

    The beauty of XML is twofold: the description of the document format (DTDs and schemas), and the ability to verify that a document is valid for any specified format.

    XML is a human readable file specification language, and file format, all in one, written in itself!

    A binary format of XML would be nice, you can make it yourself though.

    veg:http://slashdot.org/veg.xml
    v:tomato
    v:fruitcake
    v:lemongrass
    v:cat

    This is a minimal way to represent the same XML-like structure in a less verbose way.

    There is undeniable complexity here; a binary format is just a way of saying "introduce a standard lossless compression format for XML" without changing what XML is.

    I say anything that gets the W3C stamp of 'this is official' gets my vote. After all, 1 bad standard is better than 11 good proprietary solutions in a world of millions of interconnected systems.

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
    1. Re:Compression and huffing around by Anonymous Coward · · Score: 0
      v:cat
      this is a minimal way to represent

      Your version loses structure. For something with 100% of the capabilities, s-expressions - the structure lisp's syntax uses - would be a good replacement.

      (vegiesoup (v tomatoe) (v potatoe))

  32. Sure... by Further82 · · Score: 1

    Given XML's predictable syntax and well-formed requirement it should be relatively easy to create a compression scheme taking advantage of XML then combining that with something like gz or bz2, rather than just compressing XML with gz or bz2. It would be like the difference between compressing a wav file with ZIP and with FLAC. Though with XML the difference would likely only be significant with very large files.

    Of course anything like this should be endorsed by W3C before being put into wide use.

  33. Several points. by NoMoreNicksLeft · · Score: 1

    1) Isn't the greatest benefit of XML that it can be opened in a text editor, and made sense of?

    2) Can't webservers and browsers (well, maybe not IE, but then it's not a browser... it's an OS component, haha) transparently compress XML with gzip or some other?

    3) Making it binary won't compress it all that much, using a proper compression algo will.

    4) Doesn't something like XML, which uses Latin characters and a few punctuation marks, compress with insane ratios even with lame compression algos?

    5) In a world moving ever closer to ubiquitous broadband, is a difference between a 10kb html file and a 17kb XML file all that fatal? Surely bittorrent and spam does more to suck up all available bandwidth than XML does (what little is out there).

    1. Re:Several points. by michaelggreer · · Score: 2, Informative

      The problem is that XML is being used for web services which are unlike HTML: the requesting machine will not like waiting 2-3 seconds for the response to the method call. These are interoperating applications, not people downloading text to read, so the response time is much more critical.

      I agree that gzip compression is a simple solution to the network problem. It does not address the parsing time problem, and in fact exacerbates it, but in my opinion the network issue is the big one. Time works in favor of faster parsing (faster processors), but works against network issues (more congestion). I would go with compression, test the results, and only then look into a binary solution.
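      Taking the "test the results" advice literally, here is a quick Python check using only the standard library. The generated document and its record layout are hypothetical. The data is highly repetitive, so real-world ratios will likely be lower.

```python
import gzip

# Build a synthetic, repetitive XML document (hypothetical record layout).
records = "".join(
    f"<record><id>{i}</id><name>customer {i}</name><active>true</active></record>"
    for i in range(1000)
)
xml_bytes = f"<records>{records}</records>".encode("utf-8")

compressed = gzip.compress(xml_bytes)
ratio = len(xml_bytes) / len(compressed)
print(f"{len(xml_bytes)} bytes -> {len(compressed)} bytes ({ratio:.1f}x smaller)")

# The receiver still has to gunzip *and* parse, so CPU cost is not reduced.
assert gzip.decompress(compressed) == xml_bytes
```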

    2. Re:Several points. by psyon1 · · Score: 1

      1) Yes, but not all data needs to be editable in a text editor.
      2) Yes, but binary formats allow for easier parsing and less CPU load.
      3) The issue isn't really size, it's parsing.
      4) See #3.
      5) Having resources is no excuse for bloat. All servers and programs used to run on 286s... remember?

    3. Re:Several points. by ZakMcCracken · · Score: 1

      1) Isn't the greatest benefit of XML that it can be opened in a text editor, and made sense of?
      I think that confuses debugging with regular operation. Opening an XML file in a text editor and "making sense of it" is (hopefully) not a common operation in production; it's used solely in debugging.
      A binary format accompanied by good (open source!) authoring and viewing tools, which turn the binary format back into a readable one and allow easy editing, would be quite sufficient for the "text editor" case.
      2) Can't webservers and browsers [...] transparently compress XML with gzip or some other?

      Sure, but you are just trading bandwidth for CPU, i.e. just moving the problem somewhere else and not solving it. Also, not practical on embedded devices.

      3) Making it binary won't compress it all that much, using a proper compression algo will.
      4) Doesn't something like XML, that makes use of latin characters and a few punctuation marks, compress with insane ratios even in lame compression algo's?

      Same issue.
      5) In a world moving ever closer to ubiquitous broadband, is a difference between a 10kb html file and a 17kb XML file all that fatal? Surely bittorrent and spam does more to suck up all available bandwidth than XML does (what little is out there).


      2 things:
      * in large business systems treating huge loads of documents, there is always a bottleneck at some point somewhere. A small amount of supplementary bandwidth/CPU doesn't cost much to consumer users, who have about 100x more than needed on average; but for a production system, increasing file size by X% means increasing pipe size requirements by X%, and costs follow suit

      * wireless bitrates do not increase all that much especially if you fix battery consumption, spectrum size and equipment bulk (latest standards tend to consume more battery power, larger swaths of spectrum and require bulkier equipment than established standards, which makes the transition to "higher bitrates" sometimes not practical)

    4. Re:Several points. by blofeld42 · · Score: 1

      1. It's _a_ benefit of text XML that it is sorta kinda human readable. The far greater benefit is that it's interoperable and standardized.

      2. Yes, you can compress text XML. However, size is only part of the problem to be solved. Parse speed is another. Another thing people miss is data binding: you often need not only to parse the text XML document but also to create Java or other language data structures in binary formats. This can be much faster with a binary XML format, since you can ship numeric data across the wire in IEEE format. The binary format, in a direct translation from the text format, is usually somewhat smaller than the text, roughly 1/3 the size or so. This is bigger than a gzipped text XML file, but remember you can also gzip the binary. I've done some experimental implementations of binary XML (it turns out the technique I used is pretty much the same as what everyone else comes up with), and we found that gzipping the binary XML resulted in a somewhat smaller file than directly gzipping the text XML. Creating the binary format apparently adds some structure that the gzip algorithm can exploit.

      3. But you can gzip binary formats as well as (actually, better than) text formats.

      4. Again, size is not always the primary objective of binary XML.

      5. Servers with high XML traffic are getting hammered. There are some ways around this, such as using optimized, restricted server-side XML parsers that only work with, and are optimized for, SOAP, but this introduces its own problems. (Are we using your restricted set of XML or my restricted set of XML?)
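      To make "ship numeric data across the wire in IEEE format" concrete, here is a toy binary encoding in Python. It is hypothetical and not any standard binary-XML format, nor the commenter's implementation. Tag names go into a string table sent once. Each value is then written as a tag index plus an 8-byte IEEE 754 double (via the struct module), so the receiver never converts text to numbers.

```python
import struct

PAIR = struct.Struct("<Hd")  # little-endian: 2-byte tag index + 8-byte double

def encode(records: list[tuple[str, float]]) -> bytes:
    """Toy encoding: a tag table, then (tag index, IEEE 754 double) pairs."""
    tags = sorted({tag for tag, _ in records})
    index = {tag: i for i, tag in enumerate(tags)}
    out = bytearray(struct.pack("<H", len(tags)))  # number of tags
    for tag in tags:
        raw = tag.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw  # length-prefixed tag name
    out += struct.pack("<I", len(records))  # number of values
    for tag, value in records:
        out += PAIR.pack(index[tag], value)
    return bytes(out)

def decode(data: bytes) -> list[tuple[str, float]]:
    """Inverse of encode(); no text-to-number parsing is needed."""
    (ntags,) = struct.unpack_from("<H", data, 0)
    pos = 2
    tags = []
    for _ in range(ntags):
        (n,) = struct.unpack_from("<H", data, pos)
        pos += 2
        tags.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    (nrec,) = struct.unpack_from("<I", data, pos)
    pos += 4
    records = []
    for _ in range(nrec):
        i, value = PAIR.unpack_from(data, pos)
        pos += PAIR.size
        records.append((tags[i], value))
    return records

readings = [("temperature", 21.5), ("pressure", 1013.25)] * 3
blob = encode(readings)
text = "".join(f"<{t}>{v}</{t}>" for t, v in readings)
print(len(blob), "bytes binary vs", len(text.encode()), "bytes text")
assert decode(blob) == readings
```

      A real format would also need nesting, attributes, and mixed content. That extra machinery is where the standardization fight comes in.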

  34. Oh please god no by seldolivaw · · Score: 1

    I've had to work with binary XML for formatting WAP push messages and it is the ghastliest thing ever. Yes, I can see that it has low-bandwidth applications but my opinion is that I'd much rather have less bandwidth than have to deal with binary XML :-)

  35. SMPTE KLV by TheSync · · Score: 1

    I would suggest that people seeking fast, standard ways to deliver binary data look at SMPTE KLV (key, length, value) coding. It is SMPTE 336M, and is the standard for metadata coding in television, video, and digital cinema.
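      A minimal Python sketch of KLV coding along the lines of SMPTE 336M. Each item is a 16-byte key (a SMPTE Universal Label), a BER-encoded length, and then the value bytes. The all-zero key below is a hypothetical placeholder, not a registered label. The function names are also illustrative.

```python
def ber_length(n: int) -> bytes:
    """Encode a length in BER: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # 0x80 | count, then the bytes

def klv_encode(key: bytes, value: bytes) -> bytes:
    if len(key) != 16:
        raise ValueError("SMPTE 336M keys are 16-byte Universal Labels")
    return key + ber_length(len(value)) + value

def klv_decode(packet: bytes) -> tuple[bytes, bytes]:
    key, pos = packet[:16], 16
    first = packet[pos]
    pos += 1
    if first < 0x80:
        length = first  # short form: the byte is the length
    else:
        nbytes = first & 0x7F  # long form: low bits give the byte count
        length = int.from_bytes(packet[pos:pos + nbytes], "big")
        pos += nbytes
    return key, packet[pos:pos + length]

PLACEHOLDER_KEY = bytes(16)  # hypothetical; not a registered SMPTE label
packet = klv_encode(PLACEHOLDER_KEY, b"x" * 300)
print(packet[16:19].hex())  # 82012c: long form, 2 length bytes, 300 = 0x012C
assert klv_decode(packet) == (PLACEHOLDER_KEY, b"x" * 300)
```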

    1. Re:SMPTE KLV by Anonymous Coward · · Score: 0

      SMPTE looks like it has the same weakness as ASN.1 -- it requires a central registry of keys. Perhaps I'm wrong, but that's what a quick perusal of SMPTE.org's catalog seems to indicate.

  36. it's needed today, not tomorrow by alan_dershowitz · · Score: 1

    I totally drank the XML kool-aid, so don't interpret this as saying that I hate XML or anything. I really love it. However, you don't really get an appreciation of just how slow and bloaty XML is until you see it used in real life a few times. I sometimes wonder if these guys have ever built a system on something that wasn't a top-notch research bed.

    I'm not seeing in the article where he submits a solution to the problem, he just said as computers and networks get faster, the bloat won't be slow anymore. There's a very good chance I'll be using the same infrastructure in 3 years, so that is a non-solution for me, and I suspect many other people too.

    It's pretty clear to me he's out of touch. Everyone is clamoring for solutions to problems they have right now, and he wants everyone to wait for universal gigabit Ethernet and 10 GHz CPUs.

    1. Re:it's needed today, not tomorrow by Abcd1234 · · Score: 1

      However, you don't really get an appreciation of just how slow and bloaty XML is until you see it used in real life a few times.

      Good grief... what are you doing that you actually notice the time it takes to parse XML? On an average computer today, if XML parsing is consuming a noticeable amount of time, you're either 1) doing something wrong (e.g., using DOM parsing instead of SAX) or 2) you're doing something that requires such high performance on the data input side that you obviously shouldn't have chosen XML in the first place.
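      For readers unfamiliar with the distinction: a DOM parser builds the whole tree in memory, while a SAX parser streams callbacks as it reads. Here is a small Python sketch that uses the standard-library xml.sax module to count elements without building a tree. The class name and the sample document are made up.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags by name; nothing else is kept in memory."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

doc = b"<orders><order id='1'/><order id='2'/><note>rush</note></orders>"
handler = ElementCounter()
xml.sax.parseString(doc, handler)
print(handler.counts)  # {'orders': 1, 'order': 2, 'note': 1}
```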

    2. Re:it's needed today, not tomorrow by alan_dershowitz · · Score: 1

      Server to server. We're using Apache SOAP libraries and Oracle SOAP Transport (everything uses SAX), all in Java. You don't notice it on individual transactions unless load is high; you just get much lower system throughput than with a less wordy protocol.

  37. Sounds like CORBA or any other RPC. by Anonymous Coward · · Score: 2, Insightful

    The XML guys are funny. First make a text version of binary protocols to make it easy to sell XML to the mass of "31137 HTML Programmers" who feel comfortable "programming" in Dreamweaver; then make a binary version to make it work.

    1. Re:Sounds like CORBA or any other RPC. by Anonymous Coward · · Score: 0

      The classic problem with our industry - everyone always assumes that they're the first one to ever have considered a problem and developed a solution. We're never going to get ahead if everyone keeps starting from scratch instead of building on proven technologies!

  38. parsing. by Anonymous Coward · · Score: 1, Insightful

    When the XML is in text you still need to parse it. Sounds like an easy job if you're just doing it on your home computer. But a server handling thousands of simultaneous transactions can get bogged down parsing text down to binary when it can just get sent in binary to begin with.

    MUCH faster. And you don't have the overhead of compression. Sure, gzip/bzip2 will cut down on network overhead, but what about processor overhead?

    1. Re:parsing. by Taladar · · Score: 1

      If you have to worry about that you shouldn't use XML. It is a format for Data Exchange (read: can be interpreted without specification if necessary) not a universal technology applicable for everything.

      You wouldn't e.g. send your processor the instructions in XML because it is simply absurd.

  39. Fielding on binary Waka (HTTP replacement) by suso · · Score: 1

    Roy Fielding, who is developing the Waka protocol, which is binary, argued at ApacheCon 2000 that as long as the protocol is still understood, binary utilities could be made to decode things for debugging. But the other 99.9% of requests would be more important and would benefit more from being in binary.

  40. xtp:// by krygny · · Score: 1

    XML transfer protocol.

    Ok, we got a name. Now all we need is one fart smella to design it.

    --
    Research shows that 67% of those who use the term "research shows", are just making shit up.
    1. Re:xtp:// by *SECADM · · Score: 1

      Man... I really don't get these XML/W3C people. First they come to us with the wonder of "Text Format! It's the best thing since sliced bread!!!" and completely ignore the people who were bitching about how verbose and inefficient XML really is. Now, a couple of years later, people are all on board, bought into their technology. Oops, it's bloated. XML is slow, everyone admits. Now they come back with the same enthusiasm and tell us "We need a binary format for XML!!! Then all our problems will be solved!!!"

      And as for those mentioning of having another protocol to transfer XML files... Isn't that the whole *point* of SOAP, to use standard text and utilizing existing technology that is the HTTP web browser? What is the point of creating even more crud to "process" compressed/binary XML files, even *on top of* the bloat of parsing the damn thing??

      It's so hilarious (and sad) to see how the industry goes around in circles like a dog chasing its own tail. If we wanted thin data and fat clients, we could've stuck with good old ASN.1, vanilla RPC and XDR. But the industry didn't; they wanted fat data and all the magic XML promised them. So let's do it right and build enough bandwidth to support transferring all these angle brackets.

      --
      sure I'll have a sig.
  41. Then we wrap it again, that's what! by Tackhead · · Score: 4, Funny
    > Then what happens, do you base64 the binary xml and wrap it in an ascii xml document?

    Of course not! That's not XML!

    <file=xmlbinary> <baseencoding=64> <byte bits=8> <bit1>0 </bit><bit2>1 </bit><bit3>1 </bit><bit4>0 </bit><bit5>1 </bit><bit6>0 </bit><bit7>0 </bit><bit8>1 </bit> </byte>
    <boredcomment>(Umm, I'm gonna skip a bit if y'all don't mind)</boredcomment>
    </baseencoding> </file>

    Now it's XML!

    1. Re:Then we wrap it again, that's what! by Anonymous Coward · · Score: 0

      I realize that's a joke, but that's not XML syntax at all, by the way.

    2. Re:Then we wrap it again, that's what! by Anonymous Coward · · Score: 0

      What kind of programmer are you?
      Since when do we start counting bits at 1?
      It should be bit 0 to bit 7.

    3. Re:Then we wrap it again, that's what! by Anonymous Coward · · Score: 0

      You forgot the quotes around the attributes!

      <file="xmlbinary"> <baseencoding="64"> <byte bits="8"> <bit1>0 </bit><bit2>1 </bit><bit3>1 </bit><bit4>0 </bit><bit5>1 </bit><bit6>0 </bit><bit7>0 </bit><bit8>1 </bit> </byte>
      <boredcomment>(Umm, I'm gonna skip a bit if y'all don't mind)</boredcomment>
      </baseencoding> </file>

    4. Re:Then we wrap it again, that's what! by TheTomcat · · Score: 2, Funny

      Since others feel the need to correct you, I'll join in:

      <file type="xmlbinary">
      <baseencoding base="64">
      <byte bits="8">
      <bit seq="0">0</bit>
      <bit seq="1">1</bit>
      <bit seq="2">1</bit>
      <bit seq="3">0</bit>
      <bit seq="4">1</bit>
      <bit seq="5">0</bit>
      <bit seq="6">0</bit>
      <bit seq="7">1</bit>
      </byte>
      <!--
      (Umm, I'm gonna skip a bit if y'all don't mind)
      -->
      </baseencoding>
      </file>

      <!-- </retentive> -->

      S

    5. Re:Then we wrap it again, that's what! by lack1uster · · Score: 1, Funny

      You didn't write the xml declaration header, YOU BASTARD!

    6. Re:Then we wrap it again, that's what! by CrackHappy · · Score: 1

      You put a closing "retentive" element in an HTML comment section, with no opening tag! Your parser is going to curse you and asplode your hard drive.

      --
      1f u c4n r34d th1s u r34lly n33d t0 g37 l41d Capitalization really works: i helped my uncle jack off a horse
    7. Re:Then we wrap it again, that's what! by TheTomcat · · Score: 1

      well yeah, if the parser REALLY sucks.

      Otherwise, it'll completely ignore anything between

      S

    8. Re:Then we wrap it again, that's what! by TheTomcat · · Score: 1

      stupid slashdot: ... between <!-- and -->

    9. Re:Then we wrap it again, that's what! by Anonymous Coward · · Score: 0
      Well yeah, if the parser REALLY sucks. Otherwise, it'll completely ignore anything between <!-- and -->.

      On the contrary. If it permits
      <!-- This is not -- repeat not -- a well-formed XML comment. -->
      or even
      <!-- Nor is this. --->
      without even emitting a warning, then it's not doing a proper job of conforming to the behaviour mandated in the XML standard...
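      You can check this against expat, the parser behind Python's standard-library ElementTree. It enforces the XML 1.0 rule that "--" may not appear inside a comment, so both bad examples above are rejected. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if expat accepts the document as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses("<a><!-- A well-formed comment. --></a>"))                     # True
print(parses("<a><!-- This is not -- repeat not -- well-formed. --></a>"))  # False
print(parses("<a><!-- Nor is this. ---></a>"))                              # False
```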
    10. Re:Then we wrap it again, that's what! by CrackHappy · · Score: 1

      Have you used MSXML 2? :)

      Personally I have to use MSXML 4, and it's ok, it does what I need, the rest I make up in JS.

      --
      1f u c4n r34d th1s u r34lly n33d t0 g37 l41d Capitalization really works: i helped my uncle jack off a horse
  42. Doesn't work at all by revery · · Score: 1

    What the world needs now is binary XML?

    Nope, sorry, those lyrics suck. We're gonna stick with Mr. Bacharach's version.

  43. XML is nothing but verbose s-expressions by Anonymous Coward · · Score: 0

    Another improvement the lisp guys noticed decades ago is instead of redundantly putting the name of the tag in the closing tag, you don't need it.

    <Name><FirstName>John</FirstName><LastName>Doe</LastName></Name>

    vs

    <Name><FirstName>John</><LastName>Doe</></>

    or better

    (Name (FirstName John) (LastName Doe))
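    A short Python sketch of that mapping from XML to s-expressions. It assumes each element contains either text or child elements. Attributes, mixed content, and tail text are ignored to keep it minimal. The function name to_sexpr is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem: ET.Element) -> str:
    """Render an element as (tag child ...) or (tag text)."""
    children = list(elem)
    if children:
        inner = " ".join(to_sexpr(child) for child in children)
    else:
        inner = (elem.text or "").strip()
    return f"({elem.tag} {inner})" if inner else f"({elem.tag})"

xml_doc = "<Name><FirstName>John</FirstName><LastName>Doe</LastName></Name>"
sexpr = to_sexpr(ET.fromstring(xml_doc))
print(sexpr)                     # (Name (FirstName John) (LastName Doe))
print(len(xml_doc), len(sexpr))  # 64 38
```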

    1. Re:XML is nothing but verbose s-expressions by Anonymous Coward · · Score: 0

      The lisp guys all have really cool CARs and CDRs, too.

  44. CPU vs. Bandwidth by ancalagon · · Score: 1
    While <a><b><c> is indeed much smaller than <FirstName><CompanyName><Address>, it takes (more or less) the same number of CPU cycles to PARSE that string. If you have a really fast data stream (say 1 Gbit/s or more), you will have a problem on the receiver's end.

    If you gzip the stream, you save bandwidth, but gunzipping on the receiver makes the problem worse. However, bandwidth is usually not a concern within clusters. You want to do something with the data you received, right? That takes CPU cycles as well.

    What we need is a combination of XML and binary, fixed data streams.

  45. Images in XML? by jergh · · Score: 1
    ...mobile-phone companies such as Nokia, have argued for a binary XML format. Without it, large files such as images will take too long to download to devices such as mobile phones

    So instead of JPEGs they use something like this?
    <image width="800" height="600">
    <pixel type="rgb">#FF0011</pixel>
    <pixel type="rgb">#444444</pixel>
    <pixel type="rgb">#838300</pixel>
    <pixel type="rgb">#303030</pixel>
    ...
    </image>
    WTF!?
    1. Re:Images in XML? by Anonymous Coward · · Score: 0

      probably more like: 30A5FF0119A...

  46. Binary XML? by telstar · · Score: 1

    That's what you get when somebody forgets to choose "BIN" in their FTP client and dumps a bunch of XML to a directory, right?

  47. But ASCII is binary after all... by MarkWPiper · · Score: 2, Interesting
    The fact is, ASCII is a binary format. It just happens to be a format that has become universally accepted. As the article says, there are certainly benefits to having ASCII-based XML: "The fact that XML is ordinary plain text that you can pull into Notepad... has turned out to be a boon, in practice," he said. "Any time you depart from that straight-and-narrow path, you risk loss of interoperability."

    However, if anything, XML has shown us the power of well-structured information. XML has given the possibility of universal interoperability. Developments in XML-based technologies have led us to the point where we know enough now to create a standard for structured information that will last for several decades.

    It's time that we had a new ASCII. That standard should be binary XML.

    When I think of the time that has been wasted by every developer in the history of Computer Science, writing and rewriting basic parsing code, I shudder. Binary XML would produce a standard such that an efficient, universal data structure language would allow significant advances in what is technically possible with our data. For example: why is what we put on disk any different from what's in memory? Binary XML could erase this distinction.

    A binary XML standard needs to become ubiquitous, so that just as Notepad can open any ASCII file today, SuperNotepad could open any file in existence, or look at any portion of your computer's memory, in an informative, structured manner. What's more, we have the technology to do this now.

    1. Re:But ASCII is binary after all... by Jon+Pryor · · Score: 1
      It's time that we had a new ASCII. That standard should be binary XML.

      One minor question: How do we debug this? :-)

      The nice thing about plain text (ASCII) is that I can open it in an editor and read it, without worrying that my editor may be screwing things up. I can't do that with Binary XML.

      If we did as you propose, I couldn't easily examine the output of my program. I'd instead have to load the output of my program in Super Notepad to view the output. And if my Binary XML functions are buggy? Super Notepad is no longer a help. So what do I do? Start adding print statements to get plain text output back, allowing me to understand what's actually happening. (Sure, I could use a debugger too, but I need some context to find where the bug is, so I'll need ASCII output at some point...)

      The last thing I need is yet another layer of binary madness between me and the data I'm trying to interpret. That way lies madness.

    2. Re:But ASCII is binary after all... by Anonymous Coward · · Score: 0

      LMAO!! This is brilliant! Somebody mod this +5 funny please. Talk about well written -- I actually thought MarkWPiper was serious for a sec.

    3. Re:But ASCII is binary after all... by Anonymous Coward · · Score: 0

      For example: why is what we put on disk any different from what's in memory?
      Pointers. We don't always replicate data in memory because we store pointers to the data, but on disk, we have to replicate the data for long-term storage (because the pointers become invalid).

    4. Re:But ASCII is binary after all... by MarkWPiper · · Score: 2, Insightful
      The true problem is that, right now, we're stuck in a transition where there is not yet an accepted binary standard. So yes, right now there is a problem in debugging. But give it a few years, and (hopefully!) there won't be.

      However (as I tried to emphasize), ASCII is binary too. It's not that binary is inherently more difficult to debug. It's that we need a binary standard as universal as ASCII has become.

      Imagine debugging in the 1960s, before ASCII was standardized. We forget about those times now, because ASCII has been there for nearly 50 years. But go ahead, take a look.

      Believe it or not, there were over 60 binary text standards in use before ASCII. I think we should be thanking Bob Bemer (the father of ASCII) a whole lot more often.

    5. Re:But ASCII is binary after all... by MattRog · · Score: 2, Interesting

      Jesus Christ, no. The solution is simple:
      (1) Have every PC OS contain a DBMS (this is not as difficult as you would think)
      (2) Always keep your data in a DBMS
      (3) Have said DBMS transfer the data via whatever method it would like. Chances are this would be some sort of compact, efficient binary method.

      --

      Thanks,
      --
      Matt
    6. Re:But ASCII is binary after all... by fizban · · Score: 1

      I think we should be thanking Bob Bemer (the father of ASCII) a whole lot more often.

      Unless you live in China, in which case you curse him every waking morning.

      --

      +1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.

    7. Re:But ASCII is binary after all... by FangVT · · Score: 2, Interesting
      The fact is, ASCII is a binary format. It just happens to be a format that has become universally accepted. As the article says, there are certainly benefits to having ASCII-based XML: "The fact that XML is ordinary plain text that you can pull into Notepad... has turned out to be a boon, in practice," he said. "Any time you depart from that straight-and-narrow path, you risk loss of interoperability."
      Not that anybody will care but...

      XML is not ASCII. XML is Unicode. That's why Tim Bray said "plain text" not ASCII.

      Because it was such a long, hard road for ASCII to become the universal data format that it is for English text, the creators of Unicode wisely made sure there was backward compatibility: any valid ASCII text (one that does not include OS-specific, proprietary extensions in the range above 0x7F) is also valid Unicode text when the encoding is UTF-8.
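      A quick Python check of that compatibility claim, as a sketch; the helper name same_as_ascii_and_utf8 is made up for illustration. Pure 7-bit ASCII bytes decode to the same text whether you treat them as ASCII or as UTF-8, while a lone byte above 0x7F (here 0xE9, "é" in Latin-1) is neither.

```python
def same_as_ascii_and_utf8(data: bytes) -> bool:
    """True if `data` is pure 7-bit ASCII, and therefore decodes identically as UTF-8."""
    try:
        return data.decode("ascii") == data.decode("utf-8")
    except UnicodeDecodeError:
        return False

# Plain ASCII markup: the bytes mean the same thing under both encodings.
print(same_as_ascii_and_utf8(b'<?xml version="1.0"?><a>hi</a>'))  # True

# A Latin-1 style extension byte is not ASCII, and on its own it is not
# valid UTF-8 either.
print(same_as_ascii_and_utf8(b"caf\xe9"))  # False
```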

    8. Re:But ASCII is binary after all... by Anonymous Coward · · Score: 0

      You're a tool

    9. Re:But ASCII is binary after all... by mlippert · · Score: 1

      Thanks, I was hoping someone would point that out! Too many people still say ASCII when what they really mean is plain text.

  48. Modern C compilers write better assembly by chopper749 · · Score: 1

    than most assembly programmers can write.

  49. Re:10 types of people by Eric604 · · Score: 1
    (what is 10 in binary?)

    1010

  50. Actually... by tsanth · · Score: 1

    There are better ways to compress XML.

    A little understanding about what a particular XML file is supposed to represent can go a long way.

  51. Ummm zip is open by Anonymous Coward · · Score: 1, Informative

    While I do like Bzip2 and Gzip better, zip is open. There are numerous open source compression/decompression libraries for it.

    1. Re:Ummm zip is open by phats+garage · · Score: 1

      The folks at infozip would agree. They like to say that unzip is the third most portable program in the world, next to "hello world" and c-kermit.

    2. Re:Ummm zip is open by chrish · · Score: 1

      The compression used in gzip (which is the same "deflate" compression used in ZIP files) is also open; check out zlib's license.

      --
      - chrish
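      A small Python sketch of what "the same deflate" means in practice, using only the standard gzip and zlib modules (the sample XML is made up): zlib can decode a gzip file directly, and it can also produce and read the raw, unwrapped deflate stream that ZIP entries store.

```python
import gzip
import zlib

xml = b"<order><item sku='42'/></order>" * 50

# gzip output is deflate data wrapped in a gzip header and CRC trailer.
gz = gzip.compress(xml)
# zlib decodes that wrapper directly when wbits = 16 + MAX_WBITS.
assert zlib.decompress(gz, wbits=16 + zlib.MAX_WBITS) == xml

# ZIP entries using method 8 ("deflate") store a raw deflate stream with no
# wrapper; zlib produces and reads that form when wbits is negative.
co = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_deflate = co.compress(xml) + co.flush()
assert zlib.decompress(raw_deflate, wbits=-zlib.MAX_WBITS) == xml

print(len(xml), len(gz), len(raw_deflate))
```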
  52. This is really about making it proprietary. by khasim · · Score: 2, Insightful
    Zip functionality is so easy to implement in servers and clients that there really isn't any argument about "binary XML".

    This is all about different companies trying to get THEIR binary format to be the "standard" with XML.

    From the article
    Manufacturers of consumer devices such as Canon, as well as mobile-phone companies such as Nokia, have argued for a binary XML format. Without it, large files such as images will take too long to download to devices such as mobile phones, they argue.
    Images are already binary data. They really don't compress much more (if you've chosen the right format). That means that they will take the same amount of time to download, binary XML format or not.
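    A rough Python illustration of why already-compressed images don't shrink further. Assumption: seeded pseudo-random bytes stand in for JPEG/PNG payloads, which look statistically close to random; real image files will vary somewhat. Repetitive markup is included for contrast. (random.Random.randbytes needs Python 3.9 or later.)

```python
import random
import zlib

# Stand-in for already-compressed image data: seeded pseudo-random bytes.
image_like = random.Random(0).randbytes(64 * 1024)
# Highly repetitive markup, for contrast.
markup = b'<pixel r="255" g="255" b="255"/>' * 2048

for label, data in (("image-like", image_like), ("markup", markup)):
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```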
    1. Re:This is really about making it proprietary. by arkanes · · Score: 1

      Maybe they're talking about SVG images.

  53. XML images !? by morane · · Score: 2, Funny
    Without it, large files such as images will take too long to download!

    Yeah, right! XML binary images... So needed...

    <image>
    <pixel x="0" y="1">
    <r value="255" />
    <g value="255" />
    <b value="255" />
    </pixel>
    ...
    </image>
  54. Overwhelming feeling... by GOD_ALMIGHTY · · Score: 4, Insightful

    of "I told you so!" coming over. Between all the people who jumped on the web services bandwagon without any clue how to handle distributed systems efficiently and the "OMG! It's human readable!" crowd, the architecture du jour has become a bloated PITA. Why this wasn't built into the spec in the first place eludes me. If we can use tools like Ethereal to read those binary IP datagrams, why wouldn't the same concept be used for this standard? A standardized, compressed data format with a standardized API for outputting plaintext (XML) would have allowed this system to be much more efficient.

    Didn't anyone remember that text processing was bulky and expensive? Sometimes the tech community seems to share the same uncritical mind as people who order get-rich-quick schemes off late night infomercials. I doubt XML would have gotten out of the gate as is, had the community demanded these kinds of features from the get-go.

    --
    Arrogance is Confidence which lacks integrity. -- me
    1. Re:Overwhelming feeling... by michaelggreer · · Score: 1

      The obvious advantage of a text format, and the reason XML became popular, is that we had tons and tons of text processing tools already available. All you needed were parsers, and your dev tools already worked. Starting with a binary protocol would have been too steep. Same reason HTML succeeded.

    2. Re:Overwhelming feeling... by GOD_ALMIGHTY · · Score: 1

      Why wouldn't the parsers handle the text/binary conversion? If the primary presentation format was text XML, what would be the difference, other than more efficient technology? Your dev tools could still work just as well. Besides, how much XML do you hand edit? I really don't find it easy to deal with in raw format.

      XML and web services were sold as enterprise technologies from the beginning. How we've managed to max out our current generation of hardware without significantly increasing the number of transactions processed is beyond me. I've seen far too many projects bogged down by all the pitfalls that XML and web services let less experienced developers fall into.

      --
      Arrogance is Confidence which lacks integrity. -- me
    3. Re:Overwhelming feeling... by michaelggreer · · Score: 1

      You are right, and perhaps I was unclear. I was responding to your statement that starting XML as a text format was a poor choice, whereas I think it was an excellent choice. What we do now is another matter, but I still think ease of development is the most important issue: it is by far the most expensive part of a project.

    4. Re:Overwhelming feeling... by Anonymous Coward · · Score: 0

      I am tired of people pooh-poohing XML. They claim that XML was just a hot tech (which it was) that really had no lasting value (which it does!). They say that "parsing it takes too long", "why the hell do you want a human to read the data at that point", blah blah. Listen, XML is a logical leap forward in data integration. The data and the definition of those data are fused together in one document. That is of great, real importance. If I am going to fuse the data and its definition, some human has got to be able to read the definition. XML is a simple approach to that. You don't have to use XML to transfer data from one system to another, but then somebody has got to store the definition of that data somewhere (and separately). It sure makes things easier when logic is tightly coupled with its data. I would rue working on a project where the architect didn't see the value of XML data integration; it would suggest that the basic ideas of encapsulation and coupling are not properly understood.

    5. Re:Overwhelming feeling... by ttfkam · · Score: 1
      Didn't anyone remember that text processing was bulky and expensive?

      Absolutely right! Time to dump HTTP, FTP, SMTP, SNMP, NTP, NNTP, POP3, IMAP4, etc. Text-based stuff sucks! Why haven't we learned that yet?

      Now, those binary MS Word 6.0 documents and Exchange Server extensions are another story. Elegance at its finest!
      --

      - I don't need to go outside, my CRT tan'll do me just fine.
    6. Re:Overwhelming feeling... by thogard · · Score: 1

      It's not a logical step forward. Study Lisp and learn why.

    7. Re:Overwhelming feeling... by leighklotz · · Score: 1

      >Didn't anyone remember that text processing was bulky and expensive?
      Human understanding of binary formats is what's bulky and expensive.

      By comparison, getting a nearly-universal data description format widely adopted is hard. The W3C committee has now developed use cases for binarization and compression that show there are different problems to be solved, and they can be solved differently. These problems can be solved by using computers and algorithms, and by the nature of things, the solutions will get better, faster, and smaller as time goes on.

      On the other hand, trying to get legions of programmers to hand-code optimal binary formats, or use systems such as ASN.1, has been shown time and time again to be difficult and error-prone and (I would say) counter-productive, and the problem is not likely to get any better, as it's a human issue, not a computer issue.

      As another poster commented, MIME, FTP, SMTP, HTTP, POP, and IMAP are all wildly successful text-based protocols. For wildly popular binary protocols, all I can think of is the MS Word binary format (which they're dumping for XML and compressed XML), SSH, and the binary part of HTTPS (which encapsulates text-based HTTP).

      I think building on what Dave Raggett calls the "extraordinary success of HTML" (and by extension HTTP and MIME) and continuing to produce human-readable protocols is indeed the best path to success, and that designing a few standard ways to achieve the use cases the W3C has gathered is the way to solve the specific problems.

      Some of the most important use cases are
      - 1. Generic compression (gzip/deflate is pretty good here)

      - 2. Schema-specific optimal compression (a program processes the Schema or DTD and outputs a binary converter; standardizing this compression mechanism would then allow the DTD or Schema to be enough to specify the binary version to any parser, preprocessor, or generator in any language)

      - 3. Binary inclusions -- what if you want a JPEG file inside your document; short answer is to use MIME just like email does but there may be some other solutions

      - 4. All of the above, but retaining the ability to include content allowed but not specified by the schema (foreign namespaces, etc.). This is much harder, but just because this problem exists doesn't mean that we have to abandon those people whose applications are perfectly happy with 1, 2, and 3 and go back to everyone chiseling out their own binary protocols.
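      For use case 3, a minimal sketch of the "use MIME just like email does" approach with Python's standard email package. The XML document and the image bytes are placeholders (the bytes only start like a JPEG), and EmailMessage.add_attachment produces a multipart/mixed message; a real protocol might specify a different multipart subtype. The point is that the XML stays plain text while the binary part travels alongside it instead of being escaped into the markup.

```python
import email
import email.policy
from email.message import EmailMessage

xml_doc = '<report><photo ref="photo.jpg"/></report>'
jpeg_bytes = b"\xff\xd8\xff\xe0" + bytes(16)  # placeholder bytes, not a real JPEG

msg = EmailMessage()
msg.set_content(xml_doc, subtype="xml")            # text/xml part
msg.add_attachment(jpeg_bytes, maintype="image",   # base64-encoded image part
                   subtype="jpeg", filename="photo.jpg")
raw = msg.as_bytes()

# Receiver side: the XML comes back as text, the image as the original bytes.
parsed = email.message_from_bytes(raw, policy=email.policy.default)
xml_part, image_part = parsed.iter_parts()
print(parsed.get_content_type(), xml_part.get_content_type(),
      image_part.get_content_type())
```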

    8. Re:Overwhelming feeling... by prockcore · · Score: 1


      Didn't anyone remember that text processing was bulky and expensive?


      The tradeoffs don't outweigh the benefits. When you go with a binary format you immediately run into limitations. If you've ever looked at a binary format that has been around for 10 years, you see tons of hacks made over time.

      I'm talking about files running out of header room, and adding offsets to "extended headers". Strange numeric representations to represent data that is larger than previously anticipated.

      XML benefits from having none of these limitations. Numeric fields can be as large as you want them to be. There is no "header". You don't have the problem of "This field is reserved in Version 1.0 and must be 0, in Version 1.1 it will have an offset".

      Plus storing binary data in a database is the biggest pain in the ass ever.
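      A minimal Python sketch of the "running out of header room" problem. The 16-bit record-count header is hypothetical, invented for illustration: struct refuses values that don't fit the fixed width, while the same field as XML text is just digits and can grow as needed.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical "version 1.0" binary header: one unsigned 16-bit record count.
HEADER_V1 = struct.Struct(">H")

def pack_header_v1(record_count: int) -> bytes:
    """Pack the record count into the fixed-width v1.0 header."""
    return HEADER_V1.pack(record_count)

print(pack_header_v1(65_535))   # the largest value that fits
try:
    pack_header_v1(70_000)      # more records than 16 bits can count
except struct.error as exc:
    print("header overflow:", exc)

# The same field as XML text has no fixed width to outgrow.
header = ET.fromstring("<header><recordCount>70000000000</recordCount></header>")
print(int(header.findtext("recordCount")))
```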

    9. Re:Overwhelming feeling... by nikster · · Score: 1

      OK, so everybody keeps saying that parsing XML is slow. Do you have any proof of that?

      In my experience, that's just not true. I was working on an extreme case where we had plain data inside the XML file: logs with 1000s of x/y entries.

      I assume this is the very worst case, since the data complexity is very low; flushing that out to a binary array would be super-simple.

      It was slow. First there was a bug in the DOM parser that made it waste insane amounts of memory, so we used SAX. Then we reorganized the data from nm to n1, n2, n3... and so on, which sped things up by 10x.
      Eventually we stored the raw data as binary streams, which sped things up a little bit more (maybe 2x).

      But this is an ideal case. In other cases, binary was only marginally faster than SAX parsing.

      I think there are two reasons:

      1) SAX is pretty efficient. Keep in mind that a SAX parser doesn't keep much stack around. Are your XML tags stacked 20 levels deep? Big deal, that's exactly the maximum stack size of the SAX parser. 20, or 100, is nothing.

      2) Even if everything is binary, you _still_ have to do the parsing. You still have to take the binary values and put them in all the right places. That takes a while, XML or not.
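      A small Python sketch of point 1, using the standard xml.sax module. The handler keeps only a depth counter and a running count, so its state grows with nesting depth rather than with document size. The sample log (1,000 x/y points, three levels deep) is made up to resemble the case above, and this shows the handler's state, not the parser's internal buffers.

```python
import xml.sax

class DepthTracker(xml.sax.ContentHandler):
    """Tracks nesting depth and counts <p> elements while streaming."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current nesting level
        self.max_depth = 0  # deepest level seen so far
        self.points = 0     # number of <p> elements seen

    def startElement(self, name, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if name == "p":
            self.points += 1

    def endElement(self, name):
        self.depth -= 1

# 1,000 made-up x/y log entries, nested three levels deep.
doc = "<log><series>" + "".join(
    f'<p x="{i}" y="{2 * i}"/>' for i in range(1000)) + "</series></log>"

handler = DepthTracker()
xml.sax.parseString(doc.encode("utf-8"), handler)
print(handler.points, handler.max_depth)  # 1000 3
```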

      So, no, I don't think XML is significantly slower than binary. I would love to see some real-world comparisons.

      File size and network traffic, on the other hand, are much more clear-cut. If I zip an XML file, I can shrink it by 95%. Definitely an issue (hint: just zip them before sending...).
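      A quick Python sketch of "just zip them before sending", using the standard gzip module on made-up x/y log data. The actual savings depend heavily on the data; no specific percentage is claimed here.

```python
import gzip

# Made-up low-complexity log data: 1,000 x/y entries.
entries = "".join(f'<point x="{i}" y="{i % 7}"/>\n' for i in range(1000))
xml = f"<log>\n{entries}</log>\n".encode("utf-8")

packed = gzip.compress(xml)
print(f"{len(xml)} bytes -> {len(packed)} bytes "
      f"({1 - len(packed) / len(xml):.0%} smaller)")

# Decompressing gives back exactly the same document, byte for byte.
assert gzip.decompress(packed) == xml
```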

  55. what's wrong with GZip? by CaptainPinko · · Score: 1

    Just gzip, and proceed as before. It would require only minimal changes in the worst case and none at all in the best case. Isn't this how OpenOffice works?

    --
    Your CPU is not doing anything else, at least do something.
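    For reference, OpenOffice.org's file formats are ZIP archives holding XML parts such as content.xml and styles.xml, so it uses ZIP rather than gzip. Below is a minimal Python sketch of that idea with the standard zipfile module; pack_xml_parts and read_xml_part are hypothetical helpers, and the output is not a valid OpenOffice/ODF file (those also need a mimetype entry and a manifest).

```python
import io
import zipfile

def pack_xml_parts(parts: dict) -> bytes:
    """Store several XML documents in one deflate-compressed ZIP container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in parts.items():
            zf.writestr(name, text)  # str data is written as UTF-8
    return buf.getvalue()

def read_xml_part(container: bytes, name: str) -> str:
    """Pull one XML part back out of the container as text."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return zf.read(name).decode("utf-8")

doc = {
    "content.xml": "<body>" + "<p>Hello, world</p>" * 200 + "</body>",
    "styles.xml": "<styles/>",
}
container = pack_xml_parts(doc)
print(len(doc["content.xml"]), "chars of content.xml ->", len(container), "byte container")
```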
  56. Check out the analysis at: by Anonymous Coward · · Score: 1, Informative

    http://news.com.com/5208-7345-0.html?forumID=1&threadID=4163&messageID=23888&start=-1

    1. Re:Check out the analysis at: by Anonymous Coward · · Score: 0

      Correction:
      comment at cnet

  57. How would you grade XML? by Anonymous Coward · · Score: 1, Informative

    The design goals for XML are:

    1. XML shall be straightforwardly usable over the Internet.
    Grade: A

    2. XML shall support a wide variety of applications.
    Grade: B

    3. XML shall be compatible with SGML.
    Grade: don't know / don't care

    4. It shall be easy to write programs which process XML documents.
    Grade: F

    5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
    Grade: F

    6. XML documents should be human-legible and reasonably clear.
    Grade: F

    7. The XML design should be prepared quickly.
    Grade: F

    8. The design of XML shall be formal and concise.
    Grade: C

    9. XML documents shall be easy to create.
    Grade: C

    10. Terseness in XML markup is of minimal importance.
    Grade: A+

    1. Re:How would you grade XML? by bmalia · · Score: 1

      4. It shall be easy to write programs which process XML documents.
      Grade: F


      AMEN!

      --
      There's no place like ~/
    2. Re:How would you grade XML? by keytoe · · Score: 1

      While I agree with most of your assessment, I do take issue with a couple of the marks you assigned:

      4. It shall be easy to write programs which process XML documents.
      Grade: F


      It is true that writing a proper XML parser is not an easy task. However, we are lucky enough to have sixteen bazillion different implementations readily available for use.

      5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
      Grade: F


      Optional means you don't have to use them. For those who need them, there they are.

      6. XML documents should be human-legible and reasonably clear.
      Grade: F


      What you say?! An F?! Are you suggesting that there is a more human readable format for arbitrarily structured data? Maybe you have an innate ability to scan real binary on the fly, but I don't.

      9. XML documents shall be easy to create.
      Grade: C


      This one is more pedantic, but I'm on a roll. XML documents are easy to create. Tedious, yes - but hard?

      10. Terseness in XML markup is of minimal importance.
      Grade: A+


      If only there were a grade higher than A+. Perhaps Z-?

  58. Because it's freaking slow by grahamsz · · Score: 1

    SOAP is an excellent technology but it's SLOW. Servers get bogged down doing string processing, and when you are handling thousands of requests per minute it's a big problem. Adding a gzip/gunzip into the mix would make it slower still.

    As it happens, most SOAP requests are NOT human readable. Sure, I can sit and figure one out, but unless it's a trivial example, trying to decipher it isn't easy.

    A standard binary XML format would allow a standard binary SOAP variant. Debuggers could handle bsoap->soap translation, and everything would get quite a speed boost.

    My argument would be that if it's not standardized, then people will develop non-standards-compliant implementations, which is definitely a bad thing.

    1. Re:Because it's freaking slow by temporalillusion · · Score: 1

      That makes sense, thanks for taking the time to reply :)

  59. This just in...binary is faster by Anonymous Coward · · Score: 1, Insightful
    XML is verbose (especially when pretty-printed), and the signal:noise level is pretty low. The problem can be compounded by poorly-designed schemas. These things should be painfully obvious (if they're not, please stay away from the software industry). Only slightly less obvious is that XML is not a silver bullet. Some things simply should not be stored as XML.

    Clearly these fundamental tenets have escaped the People In Charge in many places, who are now discovering that their brilliant ideas to represent images, large databases, etc, as XML were, in fact, fucking stupid ideas.

    Enter binary XML. This lets the People In Charge save face by saying that the system still uses XML, so they must have been right when they designed/required it to use XML in the first place. And now it's 50% faster!

    The article has an excellent point, there will be compatibility problems and we'll "degrade" to different binary XML formats - each best suited to a particular niche. That's exactly where the world was before XML came along - data formats designed for (and reasonably appropriate for) particular applications. Those formats are invariably more efficient than XML, and are often simpler and easier to parse than XML. Binary XML attempts to combine those old-fashioned file formats with XML, resulting in a system that's more bloated (and slower) than the old way but not quite as bad as XML in its current form. So now we've come full circle, except that we've added an extra layer of bloat to something that worked well enough to begin with. Congratulations, Mr. Binary XML person! You fail it!

    While I'm at it: If network bandwidth is really the bottleneck, use zlib. XML's best feature is that it compresses really well.

  60. Arg damn HTML processing by Anonymous Coward · · Score: 0
    probably more like:

    <image width="800" height="600" type="JPEG">30A5FF0119A...</image>
    1. Re:Arg damn HTML processing by woah · · Score: 1

      It's called base64 encoding.
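      A short Python sketch comparing the hex text in the example above with base64, using the standard base64 and xml.etree.ElementTree modules. The 1,024-byte blob is stand-in data, not a real JPEG. Hex costs 2 characters per byte; base64 costs 4 characters per 3 bytes (about 33% overhead).

```python
import base64
import xml.etree.ElementTree as ET

blob = bytes(range(256)) * 4  # 1,024 bytes of stand-in binary data

hex_text = blob.hex().upper()                       # 2 characters per byte
b64_text = base64.b64encode(blob).decode("ascii")   # 4 characters per 3 bytes

# Embed the base64 text as element content, keeping the original attributes.
image = ET.Element("image", width="800", height="600", type="JPEG")
image.text = b64_text
wire = ET.tostring(image)

# Receiver side: parse the element and decode the payload back to bytes.
decoded = base64.b64decode(ET.fromstring(wire).text)
assert decoded == blob
print(len(blob), len(hex_text), len(b64_text))  # 1024 2048 1368
```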

  61. Binary XML by fnord_uk · · Score: 1

    One approach might be to provide an application layer PEP, to transcode the text into binary. Then the impoverished clients can have their binary and the rest of the world can have their text. It could be at the edge of the wireless network, or at the server.

    --
    In theory, theory and practice are the same. In practice, they're not.
  62. binary xml - a dreadful idea by drbart · · Score: 1

    Step one in dealing with the speed issue is to jettison the various slow parsers like SAX; you can get competitive with native serialization and retain the text advantages of XML. See, for example, http://sourceforge.net/projects/javadata/

    I like the comments about the binary->XML->binary full circle. It reminds me of how the original Ethernet evolved from a coax bus to a point-to-point switched network... ether in name only.

  63. Compression vs Efficiency by Codebender · · Score: 1

    There must be 50 posts already saying basically:
    "Just compress the XML, duh!"

    Compression is already in use on most servers, assuming the clients send the appropriate accept headers. The perceived sluggishness of XML is partly caused by the fact that the XML must be generated by the sender and then parsed by the receiver. Numeric values have to be converted from binary to ASCII-coded decimal and back, strings have to be embedded and extracted, etc. I think this is the type of inefficiency that these people are trying to prevent.
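    A minimal Python sketch of the numeric round trip described above. Text formatting and parsing stand in for what an XML sender and receiver have to do; the binary route is a fixed-size IEEE 754 double via struct. It shows the size difference and that both routes round-trip exactly (Python's repr gives the shortest string that does); it doesn't measure speed.

```python
import struct

value = 0.1 + 0.2  # 0.30000000000000004

# Text route: format on the sender, parse on the receiver.
as_text = repr(value)
assert float(as_text) == value

# Binary route: a fixed 8-byte little-endian double, copied rather than formatted.
as_binary = struct.pack("<d", value)
assert struct.unpack("<d", as_binary)[0] == value

print(len(as_text), len(as_binary))  # 19 8
```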

    That said, I think binary XML is a terrible idea. Keep using gzip on HTTP transfers and the technology will catch up shortly.

    <?xml version="1.0"?>
    <flamebait audience="geeks">
    The people complaining about XML being slow are probably using Java anyway. They should be used to their software being sluggish.
    </flamebait>

  64. YAML any 1? by Anonymous Coward · · Score: 0

    What about using YAML?

  65. This is a no brainer! by Anonymous Coward · · Score: 0

    Stop trying to use XML for inappropriate situations (where large data volumes, good performance are requirements).

    DUH!

  66. Why not re-examine http? by digitalgimpus · · Score: 2, Interesting

    I think that's where the true problem lies. HTTP.

    We need to look towards http 2.0. What I would want:

    - pipelining that works, so that it could be enabled for use on any server that supports http 2.0
    - gzip and 7zip support.
    - All data is compressed by default (a few excludes such as .gz files, .zip files etc. since that would be pointless).
    - Option to initiate a persistent connection (remove the stateless protocol concept) via an HTTP header on connect. This would allow for a whole new level of web applications via SOAP/XML.

    There are tons of other things that could be enhanced for today's uses.

    HTTP is the problem. Not XML

    1. Re:Why not re-examine http? by Codebender · · Score: 1

      > pipelining...

      Pipelining already works, most clients just don't use it.

      > ...gzip and 7zip support.

      Nearly all web servers already support gzip. HTTP 1.1 supports arbitrary compression protocols. Any server/client can add 7zip support right now; just list it in the "Accept-Encoding" header.

      > All data is compressed by default

      That's a terrible idea. All data is plain by default, and compressed any time the client says that it can understand compression (which is how HTTP/1.1 works). That should be almost all the time, of course, but I want to be able to use telnet to test a web server, or to make an extremely simple web server/client that doesn't need compression libraries. (I remember seeing a web server built into an RJ-45 jack at one point...)

      > Option to initiate persistant connection

      HTTP/1.1 already supports that.

      See a trend? HTTP/1.1 _is_ this wondrous "2.0" that you're asking for.

    2. Re:Why not re-examine http? by digitalgimpus · · Score: 1
      >Pipelining already works, most clients just don't use it.


      Actually, there are lots of compatibility issues due to how some servers handle it. See here

      My proposal is to create a new HTTP, that in order to claim support... you must support it.

      > Nearly all web servers already support gzip. HTTP 1.1 supports arbitrary compression protocols.
      > Any server/client can add 7zip support right now, just put it in the "accept" header.


      Only a handful implement it. It's very underused. CPU is cheaper than bandwidth at this point. 7-Zip tends to be faster, and it compresses better.

      This is why HTTP 1.1 sucks right now... because nobody takes advantage of the ability to get performance, because they are afraid to break things.

      We need a protocol that is designed for performance. Transferring plain data is archaic and unnecessary...

      we can also just go back to plain-email... aka pen and paper. But is that efficient?


      XML isn't the problem... it's the pipeline it uses. HTTP needs to branch a 2.0, with strict standards to adhere to. It needs to be geared towards performance TODAY, not for compatibility with 1995 webservers.
    3. Re:Why not re-examine http? by Anonymous Coward · · Score: 0
      HTTP responses must be sent in order and may not be interleaved, which fails to solve the head-of-line blocking and throughput problems "pipelining" implies.

      HTTP persistent connections are merely a TCP-level optimization. User agents and proxies are always permitted to drop a connection and send the next request over another, so a stateful server always needs some other means to associate the requests.

    4. Re:Why not re-examine http? by ad0gg · · Score: 1

      Uh, almost everything supports compression (gzip). IE (90% of the market) and Mozilla support it by default, and most big websites support it, including Google, not to mention there's an Apache module, and IIS has compression enabled by default. How can you break things with compression? The client announces that it supports compression (Accept-Encoding: gzip), and the server responds with compressed content. The only thing that isn't compressed is the client request, which is usually just under 1k of data, unless of course you're doing a POST.

      --

      Have you ever been to a turkish prison?
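      A minimal Python sketch of that negotiation on the server side. wants_gzip and build_response are hypothetical helpers, and the header parsing is deliberately rough (it only looks for an explicit gzip token and its q-value, and ignores "*"); a real server should follow the full HTTP rules.

```python
import gzip

def wants_gzip(accept_encoding: str) -> bool:
    """Rough check: does this Accept-Encoding value list gzip with q > 0?"""
    for item in accept_encoding.split(","):
        token, _, params = item.partition(";")
        if token.strip().lower() != "gzip":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False

def build_response(body: bytes, accept_encoding: str):
    """Return (headers, payload), gzip-compressing only if the client allows it."""
    if wants_gzip(accept_encoding):
        payload = gzip.compress(body)
        return {"Content-Encoding": "gzip", "Content-Length": str(len(payload))}, payload
    return {"Content-Length": str(len(body))}, body

headers, payload = build_response(b"<reply>ok</reply>" * 100, "gzip, deflate")
print(headers)
```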

    5. Re:Why not re-examine http? by MikeBabcock · · Score: 2, Insightful

      As others have pointed out, most of those features are here today.

      Please remember that not all XML data is transmitted by HTTP however (thank god).

      --
      - Michael T. Babcock (Yes, I blog)
    6. Re:Why not re-examine http? by lpontiac · · Score: 1

      There's this nifty protocol called TCP/IP that I think could do what you're looking for. You can send data in either direction at any point in time and compress it however you like.

    7. Re:Why not re-examine http? by Hallow · · Score: 1

      >> Option to initiate persistant connection
      >
      > HTTP/1.1 already supports that.

      Some HTTP/1.1 web servers, like Apache, support KeepAlive, which keeps a persistent connection open until a certain amount of time has passed. Right now that information is not available in the browser, and I've seen no way to exploit it to create applications (and a good persistent connection would also perhaps allow things like push, not just pull). It seems to exist solely for performance reasons, not as a way to eliminate or reduce the use of cookies.

  67. Makes no sense-UN Software. by Anonymous Coward · · Score: 0

    "Binary XML would destroy what makes xmal powerful: being able to use vi or emacs to understand its content, no fuss, no adobe reader like software, no nothing."

    Do you know of any format that doesn't require a piece of software as an intermediate between the user and the machine?

  68. XML-specific binary is for sure better than zip by aclidiere · · Score: 1


    As mentioned elsewhere in this thread, it is already possible to use zip compression during transport.
    But there are reasons why XML-specific encoding has chances to be far more efficient. Consider this:

    <hello></hello>

    For anyone familiar with XML, it translates into:

    <hello/>

    ..which takes less space.

    The '<', '>', and '/' represent the "empty element" aspect of the XML code, and that seems like overkill. Think of a way to represent the notion of "empty element". I'm sure that if all the notions of XML were listed, you wouldn't need many bits to uniquely code each of them.

    Already, without any statistical compression, we've saved many bytes in my example.

    Another advantage of being language-specific is that, knowing the weaknesses of the language, the binary format can make smart use of redundancy. (For example, I'd rather lose comments than useful code, if comments are coded in the binary XML at all.)
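    A toy Python sketch of that idea. The one-byte opcodes and the name table are invented for illustration and are not any published binary XML format: tag names go into a table once and are then referred to by index, an end tag is a single byte, and <hello></hello> and <hello/> both become the same two-byte "empty element" token. It ignores attributes, tail text, comments and namespaces, assumes fewer than 256 distinct names and text runs under 256 bytes, and the name table would also have to be shipped or derived from a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-byte opcodes, invented for this sketch.
START, END, EMPTY, TEXT = 0x01, 0x02, 0x03, 0x04

def encode(elem, names, out):
    """Append a compact token stream for `elem` to the bytearray `out`."""
    if elem.tag not in names:
        names.append(elem.tag)
    tag_id = names.index(elem.tag)            # assumes < 256 distinct names
    if len(elem) == 0 and not elem.text:
        # <hello></hello> and <hello/> both become the same two bytes.
        out += bytes([EMPTY, tag_id])
        return
    out += bytes([START, tag_id])
    if elem.text:
        data = elem.text.encode("utf-8")      # assumes text runs under 256 bytes
        out += bytes([TEXT, len(data)]) + data
    for child in elem:
        encode(child, names, out)
    out.append(END)                           # no tag name needed to close

source = "<greetings><hello></hello><hello/></greetings>"
names, out = [], bytearray()
encode(ET.fromstring(source), names, out)
print(len(source), len(out), bytes(out))  # 46 7 b'\x01\x00\x03\x01\x03\x01\x02'
```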

  69. It's a markup language by kahei · · Score: 1


    It's a markup language, it's not supposed to be ideal for general purpose data transfer.

    People should stop trying to optimize it for a task it wasn't designed for. Focus on making XML better for markup, and for pity's sake come up with something else that's concise and simple and efficient for general purpose use.

    --
    Whence? Hence. Whither? Thither.
  70. Great! by babyblink · · Score: 1

    So I suppose we call this 'FXML'?

    --
    [self dealloc];
  71. definitely by PureCreditor · · Score: 1

    Binary XML would provide a much better transport for binary data than Base64 or even something like quoted-printable. Using extra layers on top of XML just to transport binary data is a waste of resources. What we need are fewer but more powerful standards. Binary XML will do JUST that.

  72. Does the World Need Binary XML? by Reignking · · Score: 1

    Q: Does the World Need Binary XML?

    A: 0

    --
    One man's Funny is another man's Offtopic.
  73. Ask Erik Naggum! by notany · · Score: 2, Interesting
    Erik Naggum (SGML/XML guru), who first proposed the marker for empty elements

    <foo/>

    wrote this in "Re: Lisp syntax, what about resynchronization?":

    ... so it had to come up, and one of the least
    productive solutions, XML, won the day. I was there, at the conference
    table where the first thoughts that became XML surfaced. a few months
    earlier, I had proposed the need for a special marker for empty elements
    -- and then retracted that proposal because it led to new problems -- but
    guess what survived in XML!...

    Attributes in XML are inherited from SGML, whose designers were thinking of markup for textual documents. When you want to represent data, whether something is an attribute or not is completely irrelevant.

    Whether something is an attribute or element is _completely_ arbitrary.
    It is based on some arbitrary choices in the design process that reveal
    absolutely no inherent qualities. For purely pragmatic reasons, SGML
    folks will use attributes for some things and elements for others because
    their tools can deal with some things in attributes and some things in
    elements. The faulty idea that attributes say something "about" the
    element and sub-elements somehow constitute their contents is the same
    premature structuring that premature optimization of code suffers from.
    The whole language is incredibly misdesigned in making that distinction.
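    A small Python sketch of the arbitrariness described above, using the standard xml.etree.ElementTree module. The same made-up record is written once with attributes and once with child elements; the hypothetical to_record helper flattens either form into the same dictionary, so the choice between them carried no information of its own.

```python
import xml.etree.ElementTree as ET

# The same made-up record, once with attributes and once with child elements.
as_attributes = '<person name="Ada" born="1815"/>'
as_elements = "<person><name>Ada</name><born>1815</born></person>"

def to_record(xml_text: str) -> dict:
    """Hypothetical helper: flatten attributes and simple child elements into one dict."""
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)
    for child in root:
        record[child.tag] = child.text
    return record

print(to_record(as_attributes) == to_record(as_elements))  # True
```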

    Deep explanation, from "The horror that is XML":

    ... XML, being the single suckiest syntactic invention in the history of
    mankind, offers you several layers at which you can do exactly the same
    thing very differently, in fact so differently that it takes effort to
    see that they are even related.

    <foo type="bar">zot</foo> actually defines three different views on the
    same thing: Whether what you are really after is foo, bar, or zot,
    depends on your application. XML is only a overly complex and otherwise
    meaningless exercise in syntactic noise around the message you want to
    send. Its notion of "structure" must be regarded as the same kind of
    useless baggage that come with language that have been designed by people
    who have completely failed to understand what syntax is all about. It is
    therefore a mistake to try to shoe-horn things into the "structure" that
    XML allows you to define.

    In the above example, foo can be the application-level element, or it
    can be the syntax-level element and bar the application-level element.
    It is important to realize that SGML and XML offer a means to control
    only the generic identifier (foo) and their nesting, but that it is often
    important to use another attribute for the application. This was part of
    the reason for #FIXED in the attribute default specification and the
    purpose of omitting attributes from the actual tags. In my view, this is
    probably the only actually useful role that attributes can play, but
    there are other, much more elegant, ways to accomplish the same goal, but
    not within the SGML framework. Now, whether you use one of the parts of
    the markup, or use the contents of an element for your application is
    another design choice. The markup may only be useful for validation
    purposes, anyway.

    Let me illustrate:

    <if><condition>...</condition>
    <then>...</then>
    <else>...</else>
    </if>

    The XML now contains all the syntax information of the "host" language.
    Many people think this is the _only_ granularity at which XML should be
    used, and they try to enforce as much structure as possible, which

    --
    Dyslexics have more fnu.
  74. Just don't screw up XML by PepeGSay · · Score: 1

    Don't screw up XML because people architect their applications poorly. I've worked on a few applications that use web services only because they *can*, not because they should; then people complain about performance, even though we said "using web services will give you a 40% performance hit".

  75. Don't worry, we'll be at 10ghz soon... by dim5 · · Score: 1
    "Bray noted that there are methods for speeding up XML traffic other than creating a binary format. Advances in networking and processing power go a long way in addressing performance concerns, though perhaps not on battery-constrained mobile phones, he said."

    Didn't we just get done talking about the problem with assuming these things will clear up with faster tech? I was surprised to read this from Bray.

    --

    Is something burning?
    Oh, it's my karma.

  76. Vast omissions! by kahei · · Score: 4, Funny


    Aside from the mistakes pointed out by others, you also forgot to reference the xmlbinary namespace, the xmlbyte namespace, and the xmlboredcommentinparentheses namespace, and to qualify all attributes accordingly. You also didn't include anything in or any magic words like CDATA, and you didn't define any entities. You also failed to supply a DTD and an XSL schema.

    This is therefore still not _true_ XML. It simply doesn't have enough inefficiency. Please add crap to it :)

    --
    Whence? Hence. Whither? Thither.
    1. Re:Vast omissions! by Tribbin · · Score: 2, Funny

      Tsss, like your message is pure XML. It's not even proper XHTML!

      "<BR>Aside from the mistakes pointed out by o"

      Empty elements must either have an end tag or the start tag must end with />
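      A quick way to check the rule quoted above is to feed the different spellings to a strict XML parser. This Python sketch uses the standard library's ElementTree; the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>Aside<BR>from</p>"))       # False: <BR> is never closed
print(is_well_formed("<p>Aside<BR/>from</p>"))      # True: self-closing empty element
print(is_well_formed("<p>Aside<BR></BR>from</p>"))  # True: explicit end tag
```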

      --
      If you mod this up, your slashdot background will turn into a beautiful sunset!
    2. Re:Vast omissions! by Anonymous Coward · · Score: 0

      Tsss, like your message is pure XML. It's not even proper XHTML!

      You'll have to ask Taco about /.'s continual mangling of proper XHTML. As this article on A List Apart shows, it can be done pretty efficiently.

    3. Re:Vast omissions! by cicho · · Score: 1

      I thank you, sir. Not only is this hilarious, but right on the mark. Wish I had mod points today.

      --
      "Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
  77. What about encryption? by mjh · · Score: 1

    So, does an XML SOAP message encrypted using WSSEC constitute binary XML? If the answer is "Yes", then how would a world w/out binary XML enable encryption? If the answer is "no", then what constitutes binary XML? What about XML wrapped in SSL?

    Opaque doesn't always mean proprietary.

    --
    Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
    1. Re:What about encryption? by Anonymous Coward · · Score: 0

      No, when people say "binary" they don't mean variable-length everything with escaping, whitespace, comments, namespaces, and other assorted crap that has to be parsed away before you can use the actual data.

  78. Am I Missing the Point? by CyNRG · · Score: 1

    I thought one of the major points of XML was to keep it ASCII so that it is platform neutral? If you go to binary, then you have to perform byte reversal of the binary XML message if different types of CPUs are involved. We would be right back where we were.

    I know the other args: common format, blah, blah. However XML, as Microsoft has proved, can be made very proprietary in the blink of an eye.

    I thought about this when XML was first gathering steam. Like everything else it's all about marketing and not about thinking.

  79. Desperately needed by TFoo · · Score: 1

    XML has some things going for it -- as a markup language for primarily text data (eg web pages) it works fairly well.

    At a high level, XML is *CONCEPTUALLY* a great idea. I like the DOM programming model -- it is very expressive, and yet even complicated data tends to be understandable when represented as a DOM tree. Unfortunately, the basic text-XML representation that everybody uses is a terrible wire format from an efficiency and ease-of-programming perspective.

    The real problem with XML is the massive inefficiency at the lower levels. XML is easily 2x-5x less efficient than comparable wire formats. For example, I once worked on a project for an Instant Messaging server which used XML to communicate. I abstracted out the very lowest-level protocol layers so that they used simple XML token compression and attribute-name compression... the result was a full 400% increase in throughput through the server! This is primarily because the processor has so much less data to process (fewer string comparisons, string copies, etc.), and therefore the memory bandwidth requirements are significantly lower.

    Complexity of parsing is an issue as well. Writing a complete XML parser is full of subtleties and surprisingly difficult -- don't jump in and say otherwise unless you've actually done it. If you don't believe me, go look at how complicated something like the expat source or the dom4j source is.

    A primary XML design goal (go read the XML designer's notes) was for ease-of-human-reading: this comes at the expense of efficient machine reading. Because data is not length-prefixed, and of arbitrary length, there are massive inefficiencies in buffering which leads to a lot of copying as you parse. There are never any "hints" in the protocol about what is coming up: and so parsers are forced to buffer things for an arbitrary amount of time (looking for that closing /> for example) and end up using a lot of memory and doing a lot of buffer expanding, or complex buffer-chaining stuff.

    Additionally, a text-based representation like XML is extremely inefficient for binary data. Having to parse through all of your data and escape/unescape special characters is yet another big performance hit.
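    To put a number on the binary-data point: a common (though not the only) way to carry binary data inside XML is base64, which by construction inflates it by a third, and ordinary text still has to be escaped. A small Python sketch using only the standard library; the payload is random bytes standing in for real data.

```python
import base64
import os
from xml.sax.saxutils import escape

# Arbitrary binary payload (random bytes stand in for e.g. an image).
payload = os.urandom(3000)

# Binary data cannot go into XML text as-is, so it is commonly base64-encoded.
encoded = base64.b64encode(payload)
print(len(payload), len(encoded))  # 3000 4000: base64 adds a third

# Even plain text must have markup characters escaped before embedding.
print(escape("a < b && c > d"))  # a &lt; b &amp;&amp; c &gt; d
```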

    A standardized and fully-supported binary XML representation would have a huge impact on the performance of things we use every day -- and it could all happen at the low levels without even touching app-level code.

  80. The world does not need binary XML by sys49152 · · Score: 0, Redundant

    Here's why:

    1. As noted in the article, there are other ways of solving the problem:
    a. XML parsing by ASICs in dedicated XML processing hardware.
    b. Moore's Law.

    2. XML is successful specifically because it's text based and a standard. Just as compiled languages are slower than assembly, and managed code is slower than compiled code, the benefits of text based information is worth the cost.

    3. I'm not sure the problem even exists. I've spent the last 3 years specializing in SOAP Web Services, and you know what? None of my (very big) clients actually has a problem with too much XML on the network. They just anticipate having this problem in the future; see point 1.

    4. This one's a stretch, and I'm not sure I'm comfortable with it yet, but... If a system is self-contained, even if distributed, then I don't see the value in using XML for communicating between processes. You might as well use the native RPC mechanism, such as RMI for Java apps. If a system is not self-contained, then XML should be used for just the interfaces exposed to the outside world. Internal communication should remain native. In other words, a lot of XML on the network is completely unnecessary.

  81. Binary XML could be a good thing by quinnharris · · Score: 1

    If a binary XML file is semantically equivalent to its text counterpart and you have good tools to convert between the two, binary XML would be much like lossless (possibly minus beautification) compression for XML. Yet, if done right, it could reduce the time it takes to process XML files instead of increasing it, as something like gzip would.

  82. Binary not needed - better table format neeeded. by DunbarTheInept · · Score: 2, Insightful

    The real problem with XML is that it adds the extra verbosity of the metadata text tag for EACH INSTANCE of a piece of data, even in cases where that metadata is identical for row after row of data. In the case of table data, that is really stupid. There should be some sort of XML means to handle a table of values better. A way to say "Column 1 has the following XML properties: name, etc.", then "Column 2 has the following XML properties: name, etc."... and then, after that section, a way to syntactically list just the values up until the end of the loop.
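    The size difference between the two layouts is easy to measure. This Python sketch uses synthetic rows and hypothetical element names (`<table>`, `<column>`, `<values>` are not part of any standard); it only uses the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows of small integers, similar in spirit to the spectroscopy data.
rows = [(i % 7, i % 13, i % 100) for i in range(1000)]
columns = ["x", "y", "intensity"]

# Conventional XML: every value repeats its column name as a tag.
root = ET.Element("data")
for row in rows:
    r = ET.SubElement(root, "row")
    for name, value in zip(columns, row):
        ET.SubElement(r, name).text = str(value)
per_row_xml = ET.tostring(root)

# Table-style XML: column metadata once, then whitespace-separated values.
table = ET.Element("table")
for name in columns:
    ET.SubElement(table, "column", name=name)
ET.SubElement(table, "values").text = "\n".join(" ".join(map(str, r)) for r in rows)
table_xml = ET.tostring(table)

print(len(per_row_xml), len(table_xml))
```

    The trade-off is that a generic XML tool now sees one opaque text blob in `<values>`, so per-cell validation has to move into application code.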

    This is what made us balk at using XML for storing NMR spectroscopy data, even though it is already in a textual form to begin with. The current textual form is whitespace-separated, little short numbers less than 5 digits long, for hundreds of thousands of rows. That isn't really that big in ASCII form. But turn it into XML, and a 1 meg ASCII file turns into a 150 meg XML file because of the extra repetitive tag stuff.

    In another bit of irony, we can't find an in-memory representation of the data as a table which is more compact than the ascii file is. The original ascii file is even more compact than a 2-D array in RAM. (because it takes 4 bytes to store an int even when that int is typically just one digit and is only larger on rare occasions.)

    --

    Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

  83. An Option to Binary XML by Anonymous Coward · · Score: 1, Interesting

    I think that Binary XML misses the mark on bringing any real benefits other than transmission compression. XML can be a huge benefit from a human and coding perspective, but it also has drawbacks in transmission (due to size) and in processing (again due to size). A lot of XML data goes thru many different processing systems that never need descriptive tags and the overhead of document size can bog down some very large computers.

    I know, try to do an XSLT on a 60 meg file.

    One approach that could potentially benefit everyone is to have interchangeable namespaces. By that I mean having a human-readable namespace that also has a machine-friendly namespace.

    In the human version you could have those wonderful long tags like [FirstNameOfMyGrandmothersThirdCousin], and have a transform that would make that [ID1001] for machine processing.

    You can save a ton of space by swapping out all of the element and attribute names while keeping the structure, which lets machines process the data more efficiently; then, if a human or UI needs descriptive information, you can grab the friendly namespace and be back to your large XML file.
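    The swap can be sketched as a reversible rename over the tree. In this Python sketch the mapping table is hypothetical; in practice both sides would have to share it, much like a schema. It uses only the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping between human-friendly names and compact machine IDs.
HUMAN_TO_MACHINE = {
    "FirstNameOfMyGrandmothersThirdCousin": "ID1001",
    "Person": "ID1002",
}
MACHINE_TO_HUMAN = {v: k for k, v in HUMAN_TO_MACHINE.items()}

def rename(elem: ET.Element, mapping: dict) -> None:
    """Rewrite element tags and attribute names in place using the mapping."""
    for node in elem.iter():
        node.tag = mapping.get(node.tag, node.tag)
        items = list(node.attrib.items())
        node.attrib.clear()
        for key, value in items:
            node.set(mapping.get(key, key), value)

doc = ET.fromstring("<Person><FirstNameOfMyGrandmothersThirdCousin>Ada"
                    "</FirstNameOfMyGrandmothersThirdCousin></Person>")
original = ET.tostring(doc)

rename(doc, HUMAN_TO_MACHINE)
compact = ET.tostring(doc)
print(compact)  # b'<ID1002><ID1001>Ada</ID1001></ID1002>'

rename(doc, MACHINE_TO_HUMAN)
assert ET.tostring(doc) == original  # the round trip restores the friendly form
```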

  84. XML performance by BillAtHRST · · Score: 2, Informative

    The problem is that not everything in a typical XML message is text, so there can be a lot of translation going on between XML text and the binary format that an application needs (e.g., double). In our tests we've found XML to be 100x - 250x SLOWER than other approaches (e.g., JMS MapMessage). (FWIW, the 100x is using the MS parser, the 250x is with Xerces/Xalan). For high-volume, high-performance apps that's simply intolerable. Note that this has nothing to do with size on the wire, which is another consideration entirely.

  85. binary xml is problematic by iplayfast · · Score: 1

    Binary XML is problematic in that there will be competing standards, and in the end the end user/client will need multiple XML decompressors in order to read the various XML formats that come down the pipe.

    I think what needs to happen is the XML (or html or any data for that matter) needs to be compressed as part of the TCP standard. (I can't believe this isn't happening already). XML as viewed by the server and client is uncompressed (and can be edited by any text editor). XML as viewed by the Internet is tightly compressed.
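    Something close to this already exists one layer up, for example HTTP's Content-Encoding: gzip. The Python sketch below only illustrates why it pays off for XML: tag-heavy text compresses well with a general-purpose compressor (zlib here, standing in for whatever a real protocol would negotiate), and decompression gives back the exact original text. The message is made up.

```python
import zlib

# A repetitive XML message, typical of tag-heavy payloads.
xml = ("<items>" + "".join(f"<item><name>widget</name><qty>{i}</qty></item>"
                           for i in range(200)) + "</items>").encode("utf-8")

compressed = zlib.compress(xml, 9)
print(len(xml), len(compressed))

# The receiver gets back exactly the original text, still editable as plain XML.
assert zlib.decompress(compressed) == xml
```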

    1. Re:binary xml is problematic by the_greywolf · · Score: 1
      --
      grey wolf
      LET FORTRAN DIE!
    2. Re:binary xml is problematic by iplayfast · · Score: 1

      You are my new friend :)

      So has this been accepted at all? Has there been an RFC?

      I was thinking of something built into the TCP stack that would mark the data as compressed. But then the receiving stack would also need to know it was compressed, which implies a protocol (in order to see if compression is supported). So to release it incrementally (assuming it gets the stamp of approval), you start with the main servers. Add it as a protocol (compressed TCP). The first thing the protocol does is ask whether the receiving end supports compression. If not, it reverts to plain TCP. The problem is that we would need one layer of the stack talking to another at the other end. Usually that is transparent.

      You would think that TCP would have a bit in its protocol to say "compressed/uncompressed".

      Of course I'm talking with 101 knowledge here so I'm probably totally off base.

  86. Here's a more general question by fionbio · · Score: 1

    Does the world need XML when there are s-expressions?
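    For comparison, any XML element maps directly onto an s-expression. This Python sketch uses an `@` list for attributes, loosely borrowing the convention of SXML; it is an illustration only and does not escape quotes inside strings.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem: ET.Element) -> str:
    """Render an element as (tag (@ (attr "value") ...) "text" child ...)."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f'({k} "{v}")' for k, v in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():
        parts.append(f'"{elem.text.strip()}"')
    for child in elem:
        parts.append(to_sexpr(child))
        # Text that follows a child element (mixed content) is kept in order.
        if child.tail and child.tail.strip():
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"

print(to_sexpr(ET.fromstring('<foo type="bar">zot</foo>')))
# (foo (@ (type "bar")) "zot")
```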

  87. It's simple by Anonymous Coward · · Score: 0

    See www.json.org. I use this everywhere (although I use a slightly different version, 'ron', Ruby Object Notation).
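    A side-by-side of the same record in JSON and XML, sketched in Python with the standard library (the record and the XML element names are made up). One concrete difference: JSON keeps the number's type, while in XML "3" is just text until the application converts it.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both notations.
record = {"name": "widget", "qty": 3, "tags": ["a", "b"]}

as_json = json.dumps(record)

root = ET.Element("record")
ET.SubElement(root, "name").text = record["name"]
ET.SubElement(root, "qty").text = str(record["qty"])
tags = ET.SubElement(root, "tags")
for t in record["tags"]:
    ET.SubElement(tags, "tag").text = t
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)  # {"name": "widget", "qty": 3, "tags": ["a", "b"]}
print(as_xml)   # <record><name>widget</name><qty>3</qty><tags>...</tags></record>
```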

  88. The article doesn't go far enough... by Da+VinMan · · Score: 5, Insightful

    It doesn't tell us what the specific performance problems are with XML. Does it take too long to transmit? Does it take too long to validate? Does it take too long to parse? Does it take too long to format? What's the real problem here?

    From experience, I can state that using XML in any high performance situation is easy to screw up. But once you get past the basic mistakes at that level, what other inherent problems are there?

    Oh, and just stating "well, the format is obviously wasteful" just because it's human readable (one of its primary, most useful, features) is NOT an answer.

    I get the feeling that this perception of XML is being perpetuated by vendors who do not really want to open up their data formats. Allowing them to successfully propagate this impression would be a very real step backwards for all IT professionals.

    --
    Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
    1. Re:The article doesn't go far enough... by kellererik · · Score: 1

      I get the feeling that this perception of XML is being perpetuated by vendors who do not really want to open up their data formats. Allowing them to successfully propagate this impression would be a very real step backwards for all IT professionals.
      I agree 100%. The most important advantage IMHO is that no vendor is able to lock me out of the data I created. As soon as the XML is in binary format, the usual suspects (lawyers and Bill himself) will pop up and try to sue everyone decoding some patent-ridden file format.
      The beauty of XML IS the fact that I'm able to use vi or similar to view/change the contents and will be able to do so as long as there is a program capable of reading ASCII files.
      my 2 cents

  89. we need a binary standard by jilles · · Score: 1

    The most relevant thing about xml is that it is a standard for representing structured information. A problem with the current representation is that it requires a relatively large amount of cpu horse power and bandwidth to process and transport xml. This is the price we pay for something we do not really need in cases where xml is used for communication between two programs (i.e. most cases).

    Both problems are easy to solve. Ebxml, for example, is a binary XML industry standard that is used in the mobile domain. It's simply a more efficient way of storing the same information. It uses small binary tokens for tag and attribute names and doesn't include comments. This makes the parsing process simpler and faster. In addition, ebxml wastes fewer bytes, especially when combined with compression. Parsers simply parse the data into the usual internal data structures with the usual APIs (DOM, SAX, etc.), so at the application level there is no difference.
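    The general idea of token-based encoding fits in a few dozen lines of Python. This is a toy under stated assumptions, not the wire format of any binary XML standard: the code tables, the one-byte control tokens and the NUL-terminated strings are invented for illustration, and mixed content (text after a child element) is ignored. It does show why parsing gets simpler: tag and attribute names become single bytes, so there are no name comparisons.

```python
import xml.etree.ElementTree as ET

# Hypothetical code tables shared by sender and receiver (a stand-in for a DTD-derived table).
TAGS = {"msg": 0x10, "to": 0x11, "body": 0x12}
ATTRS = {"id": 0x20, "lang": 0x21}
TAGS_R = {v: k for k, v in TAGS.items()}
ATTRS_R = {v: k for k, v in ATTRS.items()}
END, STR = 0x01, 0x02  # control tokens; tag codes start at 0x10

def _str(s: str) -> bytes:
    return bytes([STR]) + s.encode("utf-8") + b"\x00"

def encode(elem: ET.Element) -> bytes:
    """Tag code, attribute count, (attr code, string)*, optional text, children, END."""
    out = bytearray([TAGS[elem.tag], len(elem.attrib)])
    for key, value in elem.attrib.items():
        out.append(ATTRS[key])
        out += _str(value)
    if elem.text:
        out += _str(elem.text)
    for child in elem:
        out += encode(child)
    out.append(END)
    return bytes(out)

def decode(data: bytes) -> ET.Element:
    pos = 0

    def read_str() -> str:
        nonlocal pos
        end = data.index(0, pos + 1)  # XML text never contains NUL
        s = data[pos + 1:end].decode("utf-8")
        pos = end + 1
        return s

    def read_elem() -> ET.Element:
        nonlocal pos
        elem = ET.Element(TAGS_R[data[pos]])
        n_attrs = data[pos + 1]
        pos += 2
        for _ in range(n_attrs):
            name = ATTRS_R[data[pos]]
            pos += 1
            elem.set(name, read_str())
        if data[pos] == STR:
            elem.text = read_str()
        while data[pos] != END:
            elem.append(read_elem())
        pos += 1  # consume END
        return elem

    return read_elem()

doc = ET.fromstring('<msg id="7"><to>bob</to><body lang="en">hi</body></msg>')
blob = encode(doc)
print(len(ET.tostring(doc)), len(blob))
```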

    The main problem with ebxml is non technical: it is not a widely adopted standard. What is needed is a w3c endorsed standard similar to ebxml with support in the form of dom & sax parsers and conversion tools for all major platforms and a mime type like binary/xml.
    Then stuff like soap, rss etc. can be served up in binary form to applications that can handle binary/xml and text/xml to other applications. The performance gains for soap heavy applications are probably considerable.

    The data itself does not change, it's just being transmitted more efficiently so there are no consequences for applications that use the dom/sax apis or higher level apis based on those. Also there's no reason for stuff like xsl processors, schema validators etc to stop working just because the xml data is handed to them as binary/xml instead of text/xml. Of course all applications that spit out hand coded xml need to feed their output through some conversion filter (if binary/xml is required).

    IMHO it would also be a great way to serve up XHTML to browsers.

    --

    Jilles
  90. Sometimes it's necessary by Excors · · Score: 1

    I've been working on a free RTS game (0 A.D.) that uses XML for storing most of its data, with Xerces to load those files. It seemed a little slow, so I made a simple binary XML format which eliminates the parsing step and just loads the data directly into memory.

    Loading a simple XML file (a couple of hundred bytes of data, plus another few hundred for a DTD), Xerces took about 10ms. The binary format took about 1ms. For a larger file, Xerces took 160ms while Xeromyces (the binary version) took 80ms (of which most was spent in other bits of code, handling the data that it read).

    When there are hundreds or thousands of such files that need to be loaded before the game can start, speed is critical, and XML by itself is just too slow. Our implementation of the binary format retains the advantages of XML (such as... erm... I'm sure there must be some), since you can always just edit the original XML, and it updates the cached binary version whenever it needs to; but it greatly reduces the performance problems that are inherent in parsing a text file. So, if you're loading lots of fairly static data files, binary XML is definitely a worthwhile thing to implement.
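    The "edit the XML, regenerate the binary cache when needed" pattern can be sketched in Python as below. This is not Xeromyces: the `.cache` file naming and the use of pickle as the binary format are assumptions for illustration, and pickle caches must never be loaded from untrusted sources.

```python
import os
import pickle
import xml.etree.ElementTree as ET

def to_tuple(elem: ET.Element):
    """Convert an element to plain nested tuples that pickle compactly."""
    return (elem.tag, dict(elem.attrib), elem.text, [to_tuple(c) for c in elem])

def load_cached(xml_path: str):
    """Load parsed data from a binary cache, re-parsing the XML only if it changed."""
    cache_path = xml_path + ".cache"  # hypothetical naming scheme
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = to_tuple(ET.parse(xml_path).getroot())
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data
```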

  91. WHO NEEDS FREAKING READABILITY ?! by apankrat · · Score: 2, Insightful

    Dude! Wake up! How often do you open an XML-RPC packet trace with your morning coffee and think 'Gosh, how cool it's in a readable bloated text format and I don't need to parse it with Ethereal!'

    Seriously, the only time readability is needed is when you edit an XML web page with Notepad. Otherwise it's a brain-dead technology that first got popular among scripting developers, who are notoriously afraid of anything binary, and then got pushed into areas where it didn't belong.

    Unfortunately, the majority of XML zealots are plain ignorant. Had they taken the time to learn what byte ordering and TLV encoding mean, we probably wouldn't have this XML craze now :-/
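    For readers who haven't met it, TLV (type-length-value) encoding with an explicit byte order looks roughly like this Python sketch. The 1-byte type and 4-byte big-endian length are choices made for the example, and the tag numbers (1 = method name, 2 = argument) are hypothetical. Because every value is length-prefixed, the reader never scans for delimiters.

```python
import struct

def tlv_encode(tag: int, value: bytes) -> bytes:
    """Type (1 byte), Length (4 bytes, big-endian "network order"), then Value."""
    return struct.pack("!BI", tag, len(value)) + value

def tlv_decode(buf: bytes):
    """Yield (tag, value) pairs; the length prefix says exactly how much to read."""
    pos = 0
    while pos < len(buf):
        tag, length = struct.unpack_from("!BI", buf, pos)
        pos += 5
        yield tag, buf[pos:pos + length]
        pos += length

msg = tlv_encode(1, b"getQuote") + tlv_encode(2, "IBM".encode("utf-8"))
print(list(tlv_decode(msg)))  # [(1, b'getQuote'), (2, b'IBM')]
```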

    Don't get me wrong, XML has its place. But it is next to HTML, and not next to RPC or databases!

    --
    3.243F6A8885A308D313
    1. Re:WHO NEEDS FREAKING READABILITY ?! by swimmar132 · · Score: 1

      Having a human readable format makes it a lot easier to a) parse the data and b) validate the data.

    2. Re:WHO NEEDS FREAKING READABILITY ?! by sporty · · Score: 1

      When something needs maintenance or what have you, this is INVALUABLE. If you use a cryptic protocol from the getgo, we go into a state of screwed.

      --

      -
      ping -f 255.255.255.255 # if only

    3. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      Having a human readable format makes it a lot easier to a) parse the data

      No, it doesn't.

      and b) validate the data.

      No, it doesn't.

    4. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      > > Having a human readable format makes it a lot easier to a) parse the data

      > No, it doesn't.


      Yes, it does.

      > > and b) validate the data.

      > No, it doesn't.


      Yes, it does.

      Do I get a prize?

    5. Re:WHO NEEDS FREAKING READABILITY ?! by Doctor+Faustus · · Score: 1

      Well, yes, but if calling it "XML" is going to mean anything, it will be freely convertible into the text format.

    6. Re:WHO NEEDS FREAKING READABILITY ?! by rikkus-x · · Score: 2, Informative

      What should we use instead of XML to encapsulate RPC calls? Something at least semi-human-readable, please. I don't need to be able to read a graphic image, but I'd like to see the name of the method I'm calling, and at least string and text parameters.

      And when someone sends me a bunch of data they want importing into a database, in what format should they send it? I'd like to be able to ensure that their data is correct before giving it to my import routine, and when my validator says there's an error, I'd like to be able to see what's wrong by eye.

      Suggestions?

    7. Re:WHO NEEDS FREAKING READABILITY ?! by Doctor+Faustus · · Score: 1

      Don't get me wrong, XML has its place. But it is next to HTML, and not next to RPC or databases!
      HTML is basically an image format, albeit a vague one. I'm a big fan of PostScript, and think it should've replaced HTML starting with Mosaic, but I wouldn't say you should keep data in it.

      I don't think you should build an entire database in XML, but I think XML does have some very good things to offer relational databases.

      First, binary XML would make it feasible to talk to the database server with XML, so that you could standardize the connections and not require different drivers for each server. This is a pretty simple application, but I haven't heard of anyone doing it.

      Second, and this is already being done (MS is working on it, and I think Oracle may already have it), sometimes there are just too many variations in your data to strictly follow the first normal form rule (no non-atomic fields) without making a mess of your database. If you have a specific XML field type, you can add an "ExtraInformation" field and your data is still structured and queriable (with an XPath test on that field in your SQL).
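      Inside a database with XML columns, this test would be written in the server's own SQL/XML dialect, which differs between vendors. To show the idea without committing to any vendor's syntax, this Python sketch applies the same kind of XPath test on the client side with ElementTree's limited XPath subset; the rows and the `ExtraInformation` contents are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: fixed columns plus a free-form "ExtraInformation" XML field.
rows = [
    {"id": 1, "ExtraInformation": "<extra><color>red</color></extra>"},
    {"id": 2, "ExtraInformation": '<extra><warranty years="3"/></extra>'},
    {"id": 3, "ExtraInformation": '<extra><warranty years="1"/><color>blue</color></extra>'},
]

def matches(xml_text: str, path: str) -> bool:
    """True if ElementTree's XPath subset finds at least one node for path."""
    return ET.fromstring(xml_text).find(path) is not None

# Rows whose extra data records a 3-year warranty.
hits = [r["id"] for r in rows if matches(r["ExtraInformation"], "warranty[@years='3']")]
print(hits)  # [2]
```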

    8. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      OK, maybe I'm responding to a troll....

      You can't break XML data. You can always puzzle out what the data means. Try that with a binary file when you've got to get work done.

    9. Re:WHO NEEDS FREAKING READABILITY ?! by Jhan · · Score: 1
      Having a human readable format makes it a lot easier to a) parse the data and b) validate the data.

      ...for a human. It makes it immensely harder for a computer to parse, or validate.

      Have you ever tried to program a validating XML parser? Yes, without any libraries. If you want to do it proper it's about a man year of programming...

      --

      I choose to remain celibate, like my father and his father before him.

    10. Re:WHO NEEDS FREAKING READABILITY ?! by dbacher · · Score: 3, Insightful

      I agree with your point, however there's one additional case where it is nice.

      The best use for XML is at system or domain boundaries, where you cannot control the software on both sides.

      For example, a support system might use file exchange to open support tickets in a vendors system for hardware failures. In this case, the vendor probably needs to deal with multiple different customers, and each of their customers might be dealing with several vendors.

      Being able to encapsulate to XML, in this case, is valuable so that all partners can understand the data.

      You could do this with a binary format, etc., but there is no binary format with the universal library support, and C doesn't guarantee byte orders and structure layout between platforms, so in that case XML is useful.

      That's the only time it's useful.

      I strongly dislike using it for comms protocols, because the extensibility and transformation capabilities are lost, and it cripples throughput in the best of situations.

      --
      If your code is acting bloated, and is running rather slow, it's likely and predicted that some loops you will unroll.
    11. Re:WHO NEEDS FREAKING READABILITY ?! by handslikesnakes · · Score: 1
      HTML is basically an image format, albeit a vague one.

      No it's not. HTML is a markup language for the structure of hypertext documents; it has nothing to do with the appearance of those documents.

      I'm a big fan of PostScript, and think it should've replaced HTML starting with Mosaic

      Postscript's a layout language. If the web were built on it, it would be impossible to use anything except a traditional visual browser. Which is antithetical to the web in a lot of ways. Even something like browsing the web on a cell phone would be difficult.

    12. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      Yes, I present to you, AC, the "Ignorant Fool 2005" award for not having any idea what you're posting about and posting for the sole purpose of being a dick.

    13. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      Only when said cryptic protocol isn't clearly laid out like a well-formed binary XML could be. That's the point of the article. It needs to have good structure information available, but without the lexical parsing.

    14. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      There are 10 types of people: Those who call people who don't like binary XML fools, and those who don't like binary XML.

    15. Re:WHO NEEDS FREAKING READABILITY ?! by novitk · · Score: 1

      How about IIOP/CORBA, which did everything SOAP does now in 1990 with about 1/10th of the bandwidth.

    16. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      Dude! Wake up! How often do you open an XML-RPC packet trace with your morning coffee and think 'Gosh, how cool it's in a readable bloated text format and I don't need to parse it with Ethereal!'

      Instant hackability is important. For example, look at Amazon's web services. Their REST services are far more popular than their SOAP services. If you'd ever looked at a SOAP packet, you'd understand why.

    17. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      Archives need human-readability. The OAIS (Open Archival Information System) mandates that metadata be stored in a human-readable form. Metadata standards are hierarchical and complicated, so XML is an ideal format.

      Of course, no archive wants to have its online access system tied to a bunch of text files, so seamless integration between the XML format and the database is necessary.

      XML-RPC is simply a convenient method to get around the nightmares of endian-ness when dealing with a highly heterogeneous computing environment. The conversion from binary to text works everywhere, so it's the LCD.

    18. Re:WHO NEEDS FREAKING READABILITY ?! by Anonymous Coward · · Score: 0

      Have you ever tried to program a validating XML parser? Yes, without any libraries. If you want to do it proper it's about a man year of programming...

      Your point being?

      Writing a fully conformant optimising C++ compiler is several orders of magnitude harder. So we should all be entering machine code in a hex editor, right? I mean, it's harder for a human, but it's immensely easier for a computer to parse...

    19. Re:WHO NEEDS FREAKING READABILITY ?! by lachlan76 · · Score: 1

      Using a human readable format also removes any endian problems. No problems with the data size (v1.0: well, we'll make the size an unsigned char. v2.0: guess we need more data...this time it'll be an unsigned short int, v3.0: ok, let's fix this problem forever: data length is an __int64. some time in the future: fuck this, make it a char* )
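      The v1.0 problem above is easy to reproduce with Python's struct module: a one-byte unsigned length field simply cannot describe a 300-byte payload, while a decimal text length grows as needed. A minimal sketch; the payload is arbitrary.

```python
import struct

payload = b"x" * 300

# v1.0: a one-byte unsigned length field cannot describe 300 bytes.
try:
    struct.pack("!B", len(payload))
except struct.error as exc:
    print("v1.0 header fails:", exc)

# v3.0: an 8-byte field works, but every message now pays 8 bytes for its length.
header = struct.pack("!Q", len(payload))
print(len(header), struct.unpack("!Q", header)[0])  # 8 300

# A text encoding has no fixed width: the length uses as many digits as it needs.
print(str(len(payload)).encode("ascii"))  # b'300'
```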

    20. Re:WHO NEEDS FREAKING READABILITY ?! by Harald+Paulsen · · Score: 1

      Hmm, yes a standarized human-readable way to interface with databases, that would actually be really helpful.

      I don't know, perhaps we could call it a Structured Query Language or something. :-)

      --
      Harald
  92. Re:Binary = Proprietary ... I disagree by theIG · · Score: 1

    Yes, obviously binary == proprietary, sort of. But that isn't the problem with binary XML. A binary data stream of any sort has one glaring usability problem: it is not transparent, but rather opaque. What if I have data lying around from one of your apps that I have to recover? I could use your app, a heavyweight tool, to find the data I want, or if I don't have it I might have to reverse engineer it. If you'd used a text format from the start, I could have used my favorite editor or grep.

    Why would you say that text is a sloppy way to "define something"? I find that statement ridiculous, especially when you say that it is only OK for memory hogs. If you ask me, the transparency of a text stream far outweighs any cost in performance. But the truth is, there isn't much cost anyway. If you can develop a good parser (not that hard), the cost difference is negligible, if any. Now, this isn't true for all cases. For example, it would probably be silly to use a text format for a large, high-traffic database like postgres or mysql. But for most anything else, there isn't any reason to use binary data formats, unless you want to keep something from your users, or at least most of them :)

  93. Sarcastically... by Anonymous Coward · · Score: 0

    Just get a faster computer.

  94. Re:Ummm bzip is also open by Thundersnatch · · Score: 1

    So is BZIP2:

    bzip2 is a freely available, patent free (see below), high-quality data compressor...
  95. Binary XML by Anonymous Coward · · Score: 0

    ASN.1

  96. Human readable by Chemisor · · Score: 1

    Remember that text is human readable only because you have a text editor. Any binary format can be human readable if you have a definition file and an editor that can show the data using it. If such a tool is made available on all platforms, all binary formats will become as easy to read and edit in raw form as text is.
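    A definition-driven viewer can be very small. In this Python sketch, the "definition file" is just a hypothetical list of field names and struct format codes (big-endian), and the record layout is invented for the example; a real tool would load the definition from disk.

```python
import struct

# A hypothetical "definition file": field names and struct codes, big-endian.
DEFINITION = [("magic", "4s"), ("version", "H"), ("width", "I"), ("height", "I")]

def dump(record: bytes, definition=DEFINITION) -> dict:
    """Decode a binary record into named fields using the definition."""
    fmt = ">" + "".join(code for _, code in definition)
    values = struct.unpack(fmt, record[:struct.calcsize(fmt)])
    return dict(zip((name for name, _ in definition), values))

raw = struct.pack(">4sHII", b"IMG1", 2, 640, 480)
for name, value in dump(raw).items():
    print(f"{name:>8}: {value}")
```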

  97. untrue by Anonymous Coward · · Score: 0

    They are not easy-to-process chunks.

    For any given XML file it isn't clear that you can process it easily in a small amount of memory. It is unclear it can be processed quickly (easily) at all.

    XML is a very poor format. You are forced to read large chunks of it character by character.

    Binary XML, if done correctly might help with that. But really, for any kind of speed, you need a file format with well-defined chunk lengths so you can read in entire areas of the file at once, not read a character at a time looking for CRs and close tags.

  98. The answer... by eomnimedia · · Score: 1


    <answer type="emphatic">No</answer>

  99. ASN: Bad Idea! Try IFF! by kbonin · · Score: 1
    ASN.1 is a truly poor specification. Yes, it sort of works in several spaces, I use them as part of my job reading and writing various credentials. But the spec is ambiguous, allows creation of documents that break parsers, and cannot be left-right parsed cleanly.

    I suggest IFF. I've been working on an open spec for 3D content based on XML as a hobby project since VRML dropped the ball, and at the same time working on a parallel spec based on an IFF chunk hierarchy.

    This lets me have text XML when I need human readable, and lets me have quickly parsed binary data when I need that - best of both worlds, and trivial to translate between. From experimenting with all the ways to do this, I've found XML/IFF chunking to be a clean map...

    An older version of the spec is at http://www.vscape.com/vml/index.html, if anyone wants an example of how this could work.
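A rough Python sketch of the XML-to-IFF mapping described above, under loudly stated assumptions: the four-character IDs in `TAG_TO_ID` are invented for this example, only element nesting and trimmed text are carried over (attributes and tails are dropped), and text goes into a hypothetical `TEXT` chunk. Chunks follow the classic IFF layout: a 4-byte ID, a big-endian 32-bit size that excludes the pad byte, the data, and a pad byte when the size is odd.

```python
import struct
from xml.etree import ElementTree as ET

# Hypothetical mapping between element names and four-character chunk IDs.
TAG_TO_ID = {"scene": b"SCEN", "mesh": b"MESH", "name": b"NAME"}
ID_TO_TAG = {ckid: tag for tag, ckid in TAG_TO_ID.items()}

def chunk(ckid, data):
    """One IFF-style chunk: ID, big-endian size (pad excluded), data, pad to even length."""
    if len(ckid) != 4:
        raise ValueError("chunk IDs are exactly four bytes")
    pad = b"\x00" if len(data) % 2 else b""
    return ckid + struct.pack(">I", len(data)) + data + pad

def element_to_iff(elem):
    """Nest child elements as child chunks; trimmed text goes into a TEXT chunk."""
    body = b""
    if elem.text and elem.text.strip():
        body += chunk(b"TEXT", elem.text.strip().encode("utf-8"))
    for child in elem:
        body += element_to_iff(child)
    return chunk(TAG_TO_ID[elem.tag], body)

def iff_to_element(data, pos=0):
    """Rebuild the element tree; returns (element, offset just past the chunk)."""
    ckid = data[pos:pos + 4]
    (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
    elem = ET.Element(ID_TO_TAG[ckid])
    inner, end = pos + 8, pos + 8 + size
    while inner < end:
        if data[inner:inner + 4] == b"TEXT":
            (n,) = struct.unpack(">I", data[inner + 4:inner + 8])
            elem.text = data[inner + 8:inner + 8 + n].decode("utf-8")
            inner += 8 + n + n % 2  # skip the pad byte, if any
        else:
            child, inner = iff_to_element(data, inner)
            elem.append(child)
    return elem, end + size % 2
```

Round-tripping `<scene><mesh><name>cube</name></mesh></scene>` through `element_to_iff` and `iff_to_element` gives the same tree back, which is the "trivial to translate between" property the post relies on.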

  100. XML doesn't need to be non-ascii to be small by iabervon · · Score: 3, Informative

    Three ideas, in order of increasing significance and increasing difficulty:

    Stop using bad DTDs. There seems to be a DTD style in which you avoid using attributes and instead add a whole lot of tags containing text. Any element with a content type of CDATA should be an attribute on its parent, which improves the readability of documents and lets you use ID/IDREF to automatically check stuff. Once you get rid of the complete cruft, it's not nearly so bad.

    Now that everything other than HTML is generally valid XML, it's possible to get rid of a lot of the verbosity of XML, too. A new XML could make all close tags "</", since the name of the element you're closing is predetermined and there's nothing permitted after a slash other than a >. The > could be dropped from empty tags, too. If you know that your DTD will be available and not change during the life of the document, you could use numeric references in open tags to refer to the indexed child element type of the type of the element you're in, and numeric references for the indexed attribute of the element it's on. If you then drop the spaces after close quotes, you've basically removed all of the superfluous size of XML without using a binary format, as well as making string comparisons unnecessary in the parser.

    Of course, you could document it as if it were binary. An open tag is indicated with an 0x3C, followed by the index of the element type plus 0x30 (for indices under 0xA). A close tag is (big-endian) 0x3C2F. A non-close tag is an open tag if it ends with an 0x3E and an empty tag if it ends with an 0x2F. Attribute indices are followed with an 0x3D. And so forth.
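Here is a Python sketch of a writer for the terse syntax proposed above. It covers only the anonymous `</` close tag, the `>`-less empty tag and the dropped space after closing quotes; the numeric element and attribute indices are left out. `compact` and `compact_string` are hypothetical helpers, and nothing that reads standard XML will accept their output, so a matching parser would also be needed.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def compact(elem):
    """Serialize an element in the terse syntax proposed above (a sketch, not a standard)."""
    parts = ["<", elem.tag]
    for i, (name, value) in enumerate(elem.attrib.items()):
        # A space is still needed after the tag name, but not after a closing quote.
        parts.append((" " if i == 0 else "") + name + "=" + quoteattr(value))
    if len(elem) == 0 and not elem.text:
        parts.append("/")  # empty tag: the trailing '>' is dropped
        return "".join(parts)
    parts.append(">")
    parts.append(escape(elem.text or ""))
    for child in elem:
        parts.append(compact(child))
        parts.append(escape(child.tail or ""))
    parts.append("</")  # anonymous close tag: always closes the innermost open element
    return "".join(parts)

def compact_string(xml_text):
    return compact(ET.fromstring(xml_text))
```

The byte values in the post are plain ASCII: `<` is 0x3C, `>` is 0x3E, `/` is 0x2F and `=` is 0x3D, so `</` is the big-endian pair 0x3C2F. That is why the format can be documented "as if it were binary" while remaining text.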

    1. Re:XML doesn't need to be non-ascii to be small by edp927 · · Score: 2, Insightful

      Stop using bad DTDs. There seems to be a DTD style in which you avoid using attributes and instead add a whole lot of tags containing text. Any element with a content type of CDATA should be an attribute on its parent, which improves the readability of documents and lets you use ID/IDREF to automatically check stuff. Once you get rid of the complete cruft, it's not nearly so bad.

      Not to nitpick, but attributes != elements. (Hint: one of them is ordered and repeatable.) As far as ID/IDREF goes, key/keyref in XML Schema replicates this for arbitrary markup. Use of attributes is, in some instances, rather crufty precisely because they need to be handled differently from elements.

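      A new XML could make all close tags "</", since the name of the element you're closing is predetermined and there's nothing permitted after a slash other than a >. The > could be dropped from empty tags, too.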

      Your design decision, not mine. Some might think that if you're going to have a verbose format like XML, you might as well throw in a few sanity checks as well, since they're almost free by comparison.

      Look, I've said it before, and I'll say it again. Like Hello Kitty, XML has one thing going for it -- ubiquity. If, like me, you're a proponent, you need to understand this and embrace it. Once you've repeated the words enough, you will come to a blissful realization. It doesn't matter how "bad" XML performance is; the only thing that matters is that it be useful for everyone. This means that it should be useful for config files for simple programs/scripts. It should be useful for people who want to build (by hand) a little web service to serve up their mp3 collection, and for multi-billion dollar companies that want to run online auctions. If XML can be this broad-based, then so can the tooling that is used to manipulate it. That's good news for big companies who want to save money, and for script hackers who just want to save time: good for us all. Anything that fractures XML's ubiquity undermines the technology itself, and should be avoided. Binary XML falls into this category.

      Now, as for performance, my personal opinion is that it's way too early to start running around creating binary standards. XML itself has been around a while, but the higher-level standards are still evolving (web services, XML Schema, etc.). Most of the current tooling around XML is written to demonstrate standards compliance. When we really start to see performance-oriented solutions, and they still suck, then everyone can start rioting.

    2. Re:XML doesn't need to be non-ascii to be small by Frater+219 · · Score: 1
      A new XML could make all close tags "</", since the name of the element you're closing is predetermined and there's nothing permitted after a slash other than a >. The > could be dropped from empty tags, too.
      You could go a little bit further. Attributes are syntactic sugar for nested tags. The difference between <A FOO="BAR" /> and <A><FOO>BAR</></> is primarily aesthetic rather than semantic. So (as many people have done in practice) you can replace attributes with nesting. This also lets you eliminate the needless = mark, and even eliminate "" when you're referring to a known symbol rather than an actually arbitrary string.

      Then, since you've earlier eliminated distinct close tags for different open tags, instead of making the close tag </ you make the open tag simply not have the final > ... and use that for the close tag instead. So what started as <A FOO="BAR" /> ends up as <A <FOO BAR>>.

      Now, angle brackets are kind of ugly. It would be more aesthetically pleasing to use something a little more rounded, something that has a history in mathematics of meaning "nested syntax for evaluation". Like, say, parentheses.

      So you end up with (A (FOO BAR)). Conveniently enough, this is also a syntax for which highly efficient parsers have existed since ... oh, 1960 or so. There are, today, two major common schemes for evaluating trees like this as code, and three or so different standards for expressing syntax-to-syntax transformations on trees like this -- kind of like XSLT, but actually efficient.

      Sounds like a win to me.
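For the curious, a minimal Python sketch of both halves of that argument: `to_sexpr` maps attributes and child elements onto nested lists (text content is ignored, and attribute values are assumed to contain no spaces or parentheses), and `read` is the classic recursive-descent S-expression reader. These are illustrative helpers, not any particular Lisp's reader.

```python
from collections import deque
from xml.etree import ElementTree as ET

def to_sexpr(elem):
    """<A FOO="BAR"/> -> (A (FOO BAR)); attributes become nested lists before children."""
    parts = [elem.tag]
    parts += [f"({name} {value})" for name, value in elem.attrib.items()]
    parts += [to_sexpr(child) for child in elem]
    return "(" + " ".join(parts) + ")"

def xml_to_sexpr(xml_text):
    return to_sexpr(ET.fromstring(xml_text))

def tokenize(text):
    return deque(text.replace("(", " ( ").replace(")", " ) ").split())

def read(tokens):
    """Recursive-descent reader: atoms become strings, lists become Python lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.popleft()
    if token == ")":
        raise SyntaxError("unexpected )")
    if token != "(":
        return token
    node = []
    while tokens and tokens[0] != ")":
        node.append(read(tokens))
    if not tokens:
        raise SyntaxError("missing )")
    tokens.popleft()  # discard the closing paren
    return node
```

`read(tokenize(xml_to_sexpr('<A FOO="BAR"/>')))` yields `['A', ['FOO', 'BAR']]`, a single left-to-right pass with no closing-tag names to compare.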

  101. zip by Anonymous Coward · · Score: 0

    How about compressing xml with zip?

  102. As long as it's convertable... by arendjr · · Score: 1

    I use XML quite extensively, and though I love it for my purposes, I have to admit it slows things down. Some time ago this became very apparent when I was writing an XSLT stylesheet that included about 8 other XSLT sheets; they were sent to the user's browser, which converted my custom XML schema to some decent XHTML and XUL code. It became _slow_, to say the least.

    So, if you ask me, I'm all for a binary XML *standard* which is then supported by browsers, life, the universe and everything. One thing I do ask is that they make it possible to easily convert text XML files to binary XML files and vice versa (thus not losing the original tag names and such). As long as they stick to that, I'm all for it!

  103. Re:Binary not needed - better table format neeeded by MikeBabcock · · Score: 1

    Let's try using intelligent compression... just a thought, but why not use a dictionary compression system that compresses tags as they occur in the output, so that for transmissions with fewer than 255 distinct tags there would be single-byte tagging in the document?

    Building such a decompression scheme into a SAX parser seems mind-numbingly simple as well, and even faster if the parser were run-time optimizable.

    Actual content (between tags) could be compressed using any system of course, with a proper marker at the beginning to specify which method was used (or none).
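A minimal Python sketch of per-document tag dictionary compression, assuming an invented byte layout: byte values 0x00 to 0xFC are start tags for already-seen names, 0xFD defines a new name inline (length byte plus UTF-8 name), 0xFE introduces text with a 4-byte big-endian length, and 0xFF closes the current element. Attributes are left out to keep it short. Because the dictionary is built while writing, decoding restores the original tag names, which also covers the lossless text-to-binary-and-back round trip asked for above.

```python
import struct
from xml.etree import ElementTree as ET

# Invented byte layout for this sketch (not a standard):
NEW, TEXT, END = 0xFD, 0xFE, 0xFF  # bytes 0x00..0xFC are start tags for known names
MAX_CODES = NEW                     # so at most 253 distinct tag names

def encode(root):
    codes, out = {}, bytearray()

    def put_text(s):
        if s:
            data = s.encode("utf-8")
            out.append(TEXT)
            out.extend(struct.pack(">I", len(data)))
            out.extend(data)

    def walk(elem):
        if elem.tag in codes:
            out.append(codes[elem.tag])  # repeated tag: a single byte
        else:
            if len(codes) == MAX_CODES:
                raise ValueError("too many distinct tag names")
            codes[elem.tag] = len(codes)  # first use: define the name inline
            name = elem.tag.encode("utf-8")
            out.extend((NEW, len(name)))
            out.extend(name)
        put_text(elem.text)
        for child in elem:
            walk(child)
            put_text(child.tail)
        out.append(END)

    walk(root)
    return bytes(out)

def decode(data):
    names, stack, root, pos = [], [], None, 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op == END:
            stack.pop()
            continue
        if op == TEXT:
            (n,) = struct.unpack_from(">I", data, pos)
            text = data[pos + 4:pos + 4 + n].decode("utf-8")
            pos += 4 + n
            current = stack[-1]
            if len(current):
                current[-1].tail = text  # text after a child is that child's tail
            else:
                current.text = text
            continue
        if op == NEW:
            n = data[pos]
            names.append(data[pos + 1:pos + 1 + n].decode("utf-8"))
            pos += 1 + n
            tag = names[-1]
        else:
            tag = names[op]  # a start tag for a name defined earlier
        elem = ET.Element(tag)
        if stack:
            stack[-1].append(elem)
        else:
            root = elem
        stack.append(elem)
    return root
```

For `<log>` with three `<entry>` children the encoding is 36 bytes against 59 for the text. A SAX-style decoder would emit start/text/end events at the same points instead of building a tree.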

    --
    - Michael T. Babcock (Yes, I blog)
  104. "Regular" compression is better by Lulu+of+the+Lotus-Ea · · Score: 1

    What really makes sense, IMO, is something I wrote a couple of articles about several years back. Namely, rather than define a custom binary format that every tool needs to understand, simply perform a reversible transformation on XML (i.e. compression) for the storage and transmission steps. So the XML writing application and the XML reading application have no need to know anything about the compression. That's all pipelined, in a way invisible to the ends.

    Of course, standard tools like 'gzip' can do exactly this. But it's also possible to take advantage of the inherent structure and redundancy of XML to get far better compression ratios. But the concept isn't really much different. See these (and also the longer articles for Intel that they link to, but the basics are in the ones below):

    http://www-106.ibm.com/developerworks/xml/library/x-matters13.html

    http://www-106.ibm.com/developerworks/xml/library/x-matters19/

    Actually, you can find better-formatted versions of the Intel articles at:

    http://gnosis.cx/publish/tech_index_ids.html

    In any case, the concept is the same... (losslessly) futzing with structure can help out standard compression like gzip. But there's no need to build binary or different semantics into the heart of XML.
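With Python's standard library the pipelined approach is just a pair of wrapper functions; the application code on either side still produces and consumes ordinary XML. This sketch only shows plain gzip, not the structure-aware transformations the linked articles discuss, and `to_wire`/`from_wire` are hypothetical names.

```python
import gzip
from xml.etree import ElementTree as ET

def to_wire(root):
    """Writer side: serialize as ordinary XML, then compress for storage or transport."""
    return gzip.compress(ET.tostring(root, encoding="utf-8"))

def from_wire(blob):
    """Reader side: decompress, then parse as ordinary XML."""
    return ET.fromstring(gzip.decompress(blob))
```

Neither end has to know about any binary dialect: the XML seen before compression and after decompression is byte-for-byte the same document.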

  105. Re:Ummm bzip is also open by Dingbat1066 · · Score: 1

    Well, yeah. My point was that the original poster was wrong to think that ZIP wasn't.

  106. put a freakin' *what*??? by myowntrueself · · Score: 1

    "If I were world dictator, I'd put a kibosh on binary XML"

    a 'kibosh'? Is that like a death sentence? A reward? a what?

    What language is he talking?

    --
    In the free world the media isn't government run; the government is media run.
    1. Re:put a freakin' *what*??? by yohan1701 · · Score: 1

      http://dictionary.reference.com/search?q=kibosh

    2. Re:put a freakin' *what*??? by myowntrueself · · Score: 1

      Ok, my guess is Yiddish?

      --
      In the free world the media isn't government run; the government is media run.
    3. Re:put a freakin' *what*??? by Anonymous Coward · · Score: 0

      Learn English.

      Try a dictionary.

    4. Re:put a freakin' *what*??? by myowntrueself · · Score: 1

      It may be in American English but not English English, OK? And it definitely doesn't look phonemically English to me. More like Arabic or some other Semitic language.

      You might be surprised at how many Americanisms are a total puzzle to the rest of the English speaking world!

      --
      In the free world the media isn't government run; the government is media run.
    5. Re:put a freakin' *what*??? by CrackerJack9 · · Score: 1

      It's definitely not American English. It seems to be an idiom of English English, but it is not used frequently any more. At least that's what I can tell from a Google search. Either way, it wasn't meant to make sense to 99% of us.

  107. yes by t_allardyce · · Score: 1

    In the end it's all about the application. If you're using XML to describe an entire website, for example, then you can compress it in whatever way you want (remember, no matter how long tag names are or how often they are repeated, a good compression system will see this redundancy), and if it's done right, you can even process it while it's compressed! (I'm looking at you, Huffman!) Yes, I did RTFA. The point is that XML isn't about this layer; it's about the overall way of storing data in its most natural form, which isn't going to be the smallest way. XML is supposed to be big and wasteful of memory; like maths, it doesn't care about the logistics.

    Obviously people using it do care about the logistics, and there are going to be cases where you don't know in advance whether something can handle a particular compression or binary format. Hence you need a way to tell the other system what you are trying to send or what you can send: e.g. an XML exchange format (in _raw_ XML) which basically says 'this stream of bytes is an XML file compressed in gz format', and a way for two machines to negotiate, i.e. the first says 'I understand these formats' and the second sends it in the best-understood format/compression scheme. That's almost certainly been done already; in fact I know SVG browsers are meant to be able to accept gzipped SVG, for example.
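This kind of negotiation already exists in HTTP as `Accept-Encoding` and `Content-Encoding`. Below is a simplified Python sketch of a server choosing between gzip and uncompressed XML; it handles q-values and `*` but skips other details of the HTTP rules, and the function names are made up for illustration.

```python
import gzip

def parse_accept_encoding(header):
    """Map each content-coding in an Accept-Encoding header to its q-value."""
    prefs = {}
    for item in header.split(","):
        coding, _, params = item.strip().partition(";")
        if not coding:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs[coding.strip().lower()] = q
    return prefs

def negotiate(header, supported=("gzip", "identity")):
    """Pick the supported coding with the highest q-value; earlier entries win ties."""
    prefs = parse_accept_encoding(header)
    best, best_q = None, 0.0
    for coding in supported:
        if coding in prefs:
            q = prefs[coding]
        elif "*" in prefs:
            q = prefs["*"]
        else:
            q = 1.0 if coding == "identity" else 0.0  # identity is fine unless refused
        if q > best_q:
            best, best_q = coding, q
    return best  # None means nothing acceptable (HTTP would answer 406)

def encode_body(xml_bytes, accept_encoding):
    coding = negotiate(accept_encoding)
    if coding is None:
        raise ValueError("no acceptable content-coding")
    body = gzip.compress(xml_bytes) if coding == "gzip" else xml_bytes
    return coding, body
```

A client that sends `Accept-Encoding: gzip` gets compressed XML; one that sends nothing still gets plain XML, so neither side has to guess.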

    Actually, technically any given XML stream is already in a binary format: you have to know how each character is stored before you can read it...

    --
    This comment does not represent the views or opinions of the user.
  108. Re:Binary = Proprietary ... I disagree by I_Love_Pocky! · · Score: 2, Insightful

    If you ask me, the transparency of a text stream far outweighs any cost in performance.

    It far outweighs it, huh? I guess you have never heard of a large segment of the computing world referred to as embedded systems.

    If you can develop a good parser (not that hard), the cost difference is negligible, if any.

    This is simply untrue. Developing a good parser is easy, but it's added bloat that isn't negligible for many computing devices outside the PC/server realm. Not to mention the added network traffic that uncompressed text yields (embedded devices don't always have the fastest I/O). Some say that the solution to reducing the network overhead of XML is compression. Compression takes CPU power, another thing lacking in many embedded devices.

    My point is that there are actually a lot of applications where XML is just not well suited.

  109. Unfounded Objections by e2d2 · · Score: 1

    ZapThink, a research firm specializing in XML and Web services, echoed concerns over binary XML, notably the possibility of proprietary implementations. ZapThink analysts also noted that an XML message can touch several different pieces of software and hardware, such as security systems, all of which would support any binary XML standard.

    I think this is totally unfounded for two reasons:

    1. Proprietary binary versions of XML will be created anyway if needed, you really can't get around that.

    2. The need for binary versions of XML comes from the need for faster transmission. On the receiving end you could translate to text format and then pass this text version to your other applications, so no, you would not need binary XML support in every application that supports XML.

    But this brings up a valid point: we already have compression formats that we can apply to pretty much any format, so do we need to incorporate binary transmission of data directly into web services? Or should those who need better performance simply wrap up their large datasets inside XML payloads and use the current format?

  110. This sounds awfully familiar.... by RoadWarriorX · · Score: 1

    Not to start a flame war or something, but when I was looking into SOAP and XML-RPC, I came across this newsgroup post by Michi Henning (co-author of Advanced CORBA Programming in C++) that makes me really, really think about using XML as an RPC mechanism.

    I like using XML and all, but reverting back to a "binary" XML format for RPC is like going back to CORBA and COM. It just does not make sense! XML has its uses and I really do not think RPC is one of them, IMHO.

  111. What the hell? by cardshark2001 · · Score: 1
    Manufacturers of consumer devices such as Canon, as well as mobile-phone companies such as Nokia, have argued for a binary XML format. Without it, large files such as images will take too long to download to devices such as mobile phones, they argue.

    Ehh...... are they encoding images in xml? Is the reporter just typically wrong in the techno babble translation, or are people really doing that?

    --
    WWJD? JWRTFA!
  112. The problem is verbosity & parsing by mveloso · · Score: 1

    The main problem with XML is:

    * verbosity. Those tags take up space, and for small amounts of data the tag volume is larger than the actual data. The verbosity also causes problems on smaller devices with less available memory and bandwidth.

    * parsing. String parsing is expensive compared to binary parsing. It's easier to parse through a TIFF file than it is to parse through a small XML document.

    The human-readable aspect is nice, but with a good editor you don't need human-readable tags. You need well-defined tags.

    For well-defined DTDs why use text at all? Substitute binary for the tags, and provide a binary->text mapping. Suddenly editors will appear that automatically display text tags, but save as binary tags.

    Human readability is nice, but as someone else has asked, how often do you really read XML? When I sniff packets, my sniffer decodes everything for me. I could decode the packet headers myself, but why*? That tedious stuff is what software is for.

    BinaryXML as an alternate representation of XML would be welcome. It'd complicate matters for existing parsers, though.

    You could also do it unofficially by sticking a textXML->binaryXML translator on both ends of your pipe. That would take care of the small device problem, sort of.

    * note: I tend to end up decoding the packet payload anyway, but that's because I'm too lazy to write a plugin to decode it for me.

    1. Re:The problem is verbosity & parsing by Da+VinMan · · Score: 1

      To my way of thinking, XML is to ease data interchange and make data portable. It's mainly useful when you want to have data that's 1) human readable and 2) self-describing. There are inherent advantages in those two characteristics that lend themselves to open standards, easy modification, etc.

      So, if you don't need those characteristics, why use XML at all? Do you need those characteristics on small devices? Aren't we just talking about all the ways XML falls apart when it's abused?

      --
      Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
  113. Re: well, actually... by scotty777 · · Score: 1
    The article gets right into bandwidth problems, and mentions the lack of compression.

    From a communications standpoint, this seems to me to be shortsighted. Bandwidth is getting cheaper even faster than storage, and storage is getting cheaper faster than processing. Compression is a solution to an old problem; one that is rapidly going away.

    The real problems (IMHO) are the lack of fine grained security, and the hierarchical (tree) structure that is usually imposed on relational (networked meanings) data. The value of data is often proportional to how many connections it has, and how well we can protect it.

  114. Speeding up communications of XML. by t-maxx+cowboy · · Score: 1

    Let's look at it in terms of old technology. There once were slow modems. Let's not go too slow; let's say 2400bps. These were dialup modems. When we used to connect to old-world BBSes, we would use download protocols to transfer files. One such protocol was Z-modem. The last time I checked, Z-modem could compress data over that 2400bps modem, often resulting in speeds double that of 2400bps. Move ahead to Ethernet: no compression at the network layer. Simple solution: add compression protocols to web servers and other servers that support XML. I know that web servers and clients already zip some stuff up for communications. Why not all of it? Add the compression at the client/server level and be done with it. Just as in the old days, if you wanted to use Z-modem to make things faster, it had to be set up on both ends. Same thing today, just with a different protocol.

    --
    Regards,

    Ryan Pritchard
    Fun Extends All Basic Life Expectancies
  115. Why not re-examine http?-Jabber. by Anonymous Coward · · Score: 0

    "Please remember that not all XML data is transmitted by HTTP however (thank god)."

    I believe that Jabber is an example of that.

  116. Re:ASN: Bad Idea! Try IFF! by pommiekiwifruit · · Score: 1
    Hey, I use IFF too... However I don't register my chunks with Electronic Arts so I suppose they are not really that valid.

    What is slightly annoying about IFF though is that it is based on the 68000 chip so you're supposed to align stuff to 16 bits and put the bytes around backwards. Naturally Microsoft ignored those parts of the spec when they wrote .wav files.

  117. For Starters: Nix the XML closing tag name by Anonymous Coward · · Score: 0

    At the very least, the XML closing tag name should be nixed - it serves no purpose whatsoever and only wastes bytes. "</>" is every bit as expressive as "</tag>". Instant 30% reduction in XML file size.

    1. Re:For Starters: Nix the XML closing tag name by T-Man78 · · Score: 1

      Obviously you have no idea how to use XML. Sometimes you need the notation if you have text in between the tags, like youHaveNoClue

    2. Re:For Starters: Nix the XML closing tag name by FLEB · · Score: 1

      Why exactly couldn't you use:

      youHaveNoClue

      ???

      --
      Information wants to be free.
      Entertainment wants to be paid.
      You just want to be cheap.
    3. Re:For Starters: Nix the XML closing tag name by Anonymous Coward · · Score: 0

      The op meant why not replace :
      <tag>thinkBeforeYouReply</tag>
      With :
      <tag>thinkBeforeYouReply</>

    4. Re:For Starters: Nix the XML closing tag name by truth_revealed · · Score: 1

      <get><a><clue>duh</></></>

    5. Re:For Starters: Nix the XML closing tag name by Mark+Pitman · · Score: 1

      It would be a lot less readable when you have complex nesting going on and the text isn't indented all nice for you.

    6. Re:For Starters: Nix the XML closing tag name by Anonymous Coward · · Score: 0

      Because if you use something like

      <B>These words bold, <I>These words bold and italic </B> these words Italic </I>

      which is:

      These words bold, These words bold and italic these words Italic

      And you use your method, you end up with:

      <B>These words bold, <I>These words bold and italic </> but these words are bold when I want them italic </>

      Obviously this can be avoided by using additional slashes and markup, but it adds complexity where none is needed.

    7. Re:For Starters: Nix the XML closing tag name by KevinKnSC · · Score: 1

      XML doesn't allow what you're trying to do, though. It has to be properly nested, so you'd have to do something like:

      <b>bold text</b><i><b>bold italic text</b> italic text</i>

      A result is that the closing tag is always the last opened tag, so allowing </> makes sense.
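Because XML requires proper nesting, a parser only needs a stack to know what `</>` closes. Here is a minimal Python sketch that rewrites anonymous close tags into ordinary ones. It assumes input without comments, CDATA sections, processing instructions, or `>` inside attribute values; a real implementation would live inside the tokenizer, and `expand_anonymous_closes` is a hypothetical name.

```python
import re

# Start, end or empty tag; the name is optional so that "</>" also matches.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)?([^<>]*)>")

def expand_anonymous_closes(doc):
    """Rewrite every "</>" with the name of the innermost open element."""
    stack, out, pos = [], [], 0
    for m in TAG.finditer(doc):
        out.append(doc[pos:m.start()])
        pos = m.end()
        is_close, name, rest = m.group(1), m.group(2), m.group(3)
        if is_close:
            if not stack:
                raise ValueError("close tag with no open element")
            opened = stack.pop()
            if name and name != opened:
                raise ValueError(f"</{name}> does not match <{opened}>")
            out.append(f"</{opened}>")
        elif name is None:
            raise ValueError("comments, PIs and DOCTYPEs are not handled")
        else:
            if not rest.rstrip().endswith("/"):
                stack.append(name)  # "<x/>" opens and closes at once, so nothing to push
            out.append(m.group(0))
    out.append(doc[pos:])
    if stack:
        raise ValueError("unclosed: " + ", ".join(stack))
    return "".join(out)
```

The overlapping `<B>...<I>...</B>...</I>` example above is rejected either way, since it is not well-formed XML in the first place.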

  118. Anecdotal example by plopez · · Score: 2, Interesting

    Had data to be delivered to a client, dumped from a database. As flat files they were ~20MB in size. That bloated to ~120MB after conversion to XML.

    The client attempted to open it in a DOM-based application which I suspect used recursion to parse the data (recursion is easy to code). Needless to say, it brought their server to its knees.

    We switched to flat files shortly thereafter.

    In my problem domain, where 20MB is a small data set, XML is useless. XML does not seem to scale well at all (though using a SAX parser helps at times).

    YMMV.
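For data sets like this, a streaming parser keeps memory flat regardless of file size. A minimal sketch with Python's `xml.etree.ElementTree.iterparse`, assuming a hypothetical layout of `<row>` records each containing an `<amount>` field; clearing the root after each record is what keeps already-processed rows from accumulating.

```python
import io
from xml.etree import ElementTree as ET

def sum_field(source, record_tag="row", field_tag="amount"):
    """Stream through `source`, keeping roughly one record in memory at a time."""
    count, total = 0, 0.0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the document element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            value = elem.findtext(field_tag)
            if value is not None:
                total += float(value)
            count += 1
            root.clear()  # drop finished records so the tree never grows
    return count, total

if __name__ == "__main__":
    rows = b"".join(b"<row><amount>%d</amount></row>" % i for i in range(1000))
    print(sum_field(io.BytesIO(b"<rows>" + rows + b"</rows>")))
```

`iterparse` also accepts a filename, so the same function works on a file on disk without loading it whole.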

    --
    putting the 'B' in LGBTQ+
    1. Re:Anecdotal example by Da+VinMan · · Score: 1

      Are you certain that application was implemented well? I find it hard to believe that XML can't handle a mere 20MB of data?! I would be shocked if a dataset 10x that large couldn't be processed as XML.

      Now, I totally understand how standard flat files are going to outperform XML every time, but they do serve a different need (i.e. large homogeneous datasets) than XML (i.e. many heterogeneous datasets). So, maybe XML really isn't appropriate for large datasets anyway. Besides, do you really need your large datasets to be self-describing?

      --
      Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
    2. Re:Anecdotal example by styxlord · · Score: 1

      What was the rationale of storing the data in XML?
      Were you planning on using XSLT to transform it? Were you planning on hand editing it and using an XSD schema to make sure it was still valid?

      There's nothing wrong with XML. XML is an extensible markup language that allows you to write schemas that define your own subset of the language and write transforms that convert it into something presentable. Databases support outputting XML so that you can have the results of a query transformed into something else (like XHTML).

      If you're dealing with bulk data, by all means use whatever method is fast and reliable.

  119. Random Access XML by McSmiley · · Score: 1

    I think the point of this is that you'd like to have a random-access version of XML. Right now there's no way to, say, seek to the next sibling node without reading all the intervening characters. DOM and other higher-level APIs hide that fact, but it's still there.

    --
    "I compare [open source vs. non-open source] to science vs. witchcraft." linus
  120. Re:Binary not needed - better table format neeeded by Anonymous Coward · · Score: 0

    And, of course, decompressing a 10 meg file into a 150 meg file and processing that first into an enormous tree with a bloated XML-parser, then accessing that through a complex object-oriented interface, will cause no performance hit over just processing a 1 meg file directly.

    XML sucks.

  121. Wrong Problem by slyckshoes · · Score: 2, Insightful

    It seems to me that the problem isn't with XML, it's with what people are using it for. I read some complaints here from people saying "I tried to use XML for BLAH and it was too slow." However, if they'd thought about it, BLAH would have been better served by some binary format in the first place. The article also discusses the fact that mobile devices need something less cumbersome for transferring pictures/media. Why are they using XML for that at all? One of the benefits of XML is that it's human readable, but in those applications you don't need that benefit, so don't use XML. Instead of coming up with a binary XML standard, come up with a generic binary standard that does exactly what you want. Too many people have been given the hammer of XML and now everything looks like a nail.

    1. Re:Wrong Problem by johnjaydk · · Score: 2, Insightful
      Dead on.

      Use XML in places where it makes sense: Interfaces between different companies/business partners/departments etc, interfaces between mutually hostile vendors, really long time data storage.

      Using XML as the data format between two tightly coupled Java programs, standing next to each other and exchanging massive amounts of data, is insane.

      This is of course a simplified example, BUT the point is ALWAYS beware of the trade-offs you make when you make a technology choice. The same goes for algorithms... think!!!

      --
      TCAP-Abort
  122. ASN.1 rules! Great Opensource Compiler! Free book! by gd23ka · · Score: 1
    Short answer: It is a stupid idea and it is clashing with ASN.1.

    First off: ASN.1 (X.680) is not a fringe technology and it is alive and kicking. ASN.1 is dead == BSD is dead. In fact ASN.1 and the binary wire presentations (CER/DER/PER/XER) are at the core of many important services we use daily including but not limited to:

    PKIX / X.509 / PKCS (Public Key Cryptography)

    Kerberos authentication

    SNMP / CMIP

    X.500 LDAP / DAP directory services

    X.400 messaging

    Voice over IP: H.323 T.38

    The 3GPP specifications (GSM / UMTS mobile phones)

    OSI layer 7 protocols (FTAM.. etc.)

    RFID

    In comparison to XML, ASN.1 is a huge bandwidth saver, in fact the PER (Packed Encoding Rules) were designed for saving bandwidth. There is even a way for encoding data in XML using the XER (XML Encoding Rules) specification.
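To give a feel for the byte counts, here is a Python sketch of the DER encoding (tag, length, value) of a single ASN.1 INTEGER. It covers only INTEGER and the length octets; PER, which omits tags entirely, is not shown. The function names are illustrative, not from any ASN.1 library.

```python
def der_length(n):
    """DER length octets: short form below 128, else 0x80 | byte count, then big-endian bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_integer(value):
    """ASN.1 INTEGER (tag 0x02) with minimal two's-complement content octets."""
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1  # always leaves room for the sign bit
    body = value.to_bytes(size, "big", signed=True)
    return b"\x02" + der_length(len(body)) + body
```

`der_integer(128)` is `02 02 00 80`: four bytes, where `<value>128</value>` takes eighteen.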

    Last but not least there is finally a worthwhile opensource ASN1 to C compiler out there: Get ASN1C here.

    New to ASN.1?? Visit this site and be sure to pick up the excellent free book on ASN.1!

  123. Re:10 types of people by jellomizer · · Score: 1

    Well it depends if you are using signed or unsigned.

    10 Unsigned is 2 Dec
    10 Signed is -2 Dec (Assuming you are using 2 bit numbers)

    1010 Unsigned is 10
    1010 Signed is -6 (assuming that you are using 4 bit numbers)

    Now to solve this problem you just give a leading 0

    so There are 010 Types of People.

    (What is 10 in binary?)

    01010

    This way solves any confusion.
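The arithmetic above, as a short Python check: `as_signed` reads a bit string as a two's-complement number of exactly that width, which is why the leading 0 matters.

```python
def as_unsigned(bits):
    return int(bits, 2)

def as_signed(bits):
    """Two's complement at the string's own width: a leading 1 means subtract 2**width."""
    value = int(bits, 2)
    if bits.startswith("1"):
        value -= 1 << len(bits)
    return value
```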

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  124. The fake grass is always greener... by Just+Some+Guy · · Score: 2, Insightful
    And I rather enjoy using their rich set of .NET XML classes to talk to our Unix servers. It helps my company interop.

    You had me until then; no self-respecting engineer would ever use those terms.

    --
    Dewey, what part of this looks like authorities should be involved?
    1. Re:The fake grass is always greener... by Omega1045 · · Score: 1
      What do you want me to say?

      And I rather enjoy using their big ol' set of .NET XML classes to talk to our Unix servers. It helps my company get our Windows computers to get along with our Unix computers.

      I did not use paradigm or think outside the box. But I am glad you feel superior all the same. It must be nice for your ego, you self-respecting engineer, you!

      --

      Great ideas often receive violent opposition from mediocre minds. - Albert Einstein

  125. XML not useful for xferring copious binary data by smcdow · · Score: 2, Insightful
    Our applications (real-time geographically distributed RF DSP) involve shipping around lots and lots and lots and lots of digitized RF data. We have our share of wonks who think we should be using XML for this kind of thing. We all agree that XML would solve many problems for us. Except there's no convenient way to represent the actual data payloads, which consist of scads of binary data.

    A good binary XML specification could be an extremely good fit for us.

    And don't suggest that we just compress XML and send that. Here's why: first we have to expand all that digitized data into some sort of ASCII encoding, which is then compressed. End result: no gain and a possible loss of precision in the data.

    A real, live, useful binary XML spec could help us immensely. I say BRING IT ON!!!!

    BTW, wasn't DIME supposed to address these problems? What happened to DIME, anyway?

    --
    In the course of every project, it will become necessary to shoot the scientists and begin production.
    1. Re:XML not useful for xferring copious binary data by dvdeug · · Score: 1

      Here's why: first we have to expand all that digitized data into some sort ASCII encoding, which is then compressed. End result: no gain and a possible loss of precision in the data.

      Compressed ASCII encoded binary will rarely be much larger than the original binary compressed. And how are you screwing up so badly that there's a possible loss of precision in the data? If all else fails, use Base64. If you're dealing with floating point numbers, express them exactly in hex.
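Both suggestions are a couple of lines in Python. This sketch assumes the payload is 16-bit big-endian integer samples (an assumption, since the real data format isn't given). Base64 costs a fixed 4/3 size overhead before compression, and `float.hex()` writes a double exactly, so reading it back with `float.fromhex()` loses no precision.

```python
import base64
import struct

def samples_to_base64(samples):
    """Pack 16-bit big-endian samples (assumed format), then Base64 them."""
    raw = struct.pack(f">{len(samples)}h", *samples)
    return base64.b64encode(raw).decode("ascii")

def base64_to_samples(text):
    raw = base64.b64decode(text)
    return list(struct.unpack(f">{len(raw) // 2}h", raw))

def floats_to_text(values):
    """Hex float notation is exact: no decimal rounding on the way out or back."""
    return " ".join(v.hex() for v in values)

def text_to_floats(text):
    return [float.fromhex(token) for token in text.split()]
```

Six samples (12 bytes) become 16 Base64 characters, and `0.1` is written as `0x1.999999999999ap-4`, which parses back to the identical double.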

  126. ?xml by Laaserboy · · Score: 1


    <noun val="point"/>
    <prep val="of"/>
    <noun val="XML"/>
    <verb val="is"/>
    <contraction val="it's"/>
    <adjective val="easy"/>
    <infinitive val="to"/>
    <verb val="read"/>
    <period val="."/>

  127. How about WBXML? by Anonymous Coward · · Score: 1, Interesting

    WBXML is very compact, easy to parse, and standardized too. Have a look at http://www.w3.org/TR/wbxml/ .
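    Part of what keeps WBXML small is that lengths and string-table offsets are written as variable-length "mb_u_int32" integers: seven bits per byte, most significant group first, with the high bit set on every byte except the last. A minimal C sketch of that encoding (function names are hypothetical; the spec above is the authoritative definition):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v as a WBXML-style multi-byte integer (at most 5 bytes).
   Returns the number of bytes written to out. */
size_t mb_u_int32_encode(uint32_t v, uint8_t out[5])
{
    uint8_t tmp[5];
    size_t n = 0;

    /* Collect 7-bit groups, least significant first. */
    do {
        tmp[n++] = (uint8_t)(v & 0x7F);
        v >>= 7;
    } while (v != 0);

    /* Emit them most significant first; every byte but the last
       carries the continuation bit 0x80. */
    for (size_t i = 0; i < n; i++) {
        uint8_t group = tmp[n - 1 - i];
        out[i] = (i + 1 < n) ? (uint8_t)(group | 0x80) : group;
    }
    return n;
}

/* Decode a multi-byte integer; *used receives the bytes consumed.
   Returns 0 with *used == 0 if the input is truncated or over 5 bytes. */
uint32_t mb_u_int32_decode(const uint8_t *in, size_t len, size_t *used)
{
    uint32_t v = 0;

    assert(used != NULL);
    for (size_t i = 0; i < len && i < 5; i++) {
        v = (v << 7) | (uint32_t)(in[i] & 0x7F);
        if ((in[i] & 0x80) == 0) {      /* last byte of this integer */
            *used = i + 1;
            return v;
        }
    }
    *used = 0;
    return 0;
}
```

    For example, 160 (0xA0) encodes as the two bytes 0x81 0x20, while anything below 128 takes a single byte.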

    1. Re:How about WBXML? by sakrank · · Score: 1

      I have used it and it works fine even on mobile phones.

  128. Microsoft XML by Spy+der+Mann · · Score: 2, Interesting

    Take Microsoft's XML formats as an example. Word, or the MSN message format... they're _NOT_ XML. They're proprietary formats DISGUISED as XML.

    If Microsoft doesn't respect text-only XML, what do you think will happen when^H^H^H^Hif binary XML is out?

    1. Re:Microsoft XML by Austerity+Empowers · · Score: 1

      Which in itself is why we need to standardize on a binary format. Ad hoc formats are springing up out of necessity; either one becomes a good standard or people will embrace some corporate version.

      Note in my original post I described a need for a validator, precisely to catch and detect MSisms.

      It's not a matter of respecting, it's a matter of noting a weakness, exploiting the weakness and eventually providing an alternative.

  129. Human readability makes it much easier by Baki · · Score: 2, Informative

    to make inaccurate interpretations of the data instead of using proper and accurate specifications.

    Many people claim that XML is so great because you can "just read and understand it" without having to use cumbersome and hard to understand specifications. This exactly is what makes XML, indeed, nice for typesetting purposes like HTML, maybe as an alternative for simple configuration files etc, but indeed NOT for RPC and databases as you write. I couldn't agree more.

    I have seen so much time and money lost due to intuitive but false interpretations of XML schemas. People think that because it's human readable with "meaningful" tag names, they don't need a proper spec anymore. Well, I guess it fits in nicely with today's "cut and paste" programmers who don't really know what they're doing :(.

  130. Clarification by Spy+der+Mann · · Score: 1

    When I said MS Word format, I meant "MS Word HTML output format".

  131. None of It's Really Necessary by Greyfox · · Score: 1
    You can solve the speed AND size problem by having well documented interfaces to your application that send only the necessary data without all the markup. For about 98% of the applications that currently are using XML, documented binary interfaces would be sufficient.

    If your boss insists on XML, write a documented binary interface and then a converter that reads the documented binary interface and outputs XML. Most of the time that would violate "you ain't gonna need it", but very few projects ever really spend much time doing good design.
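    The documented-binary-plus-converter approach might look like this minimal C sketch. The six-byte record layout, field names, and functions are all hypothetical, chosen only to show the shape: the layout is written down in a comment, the native format is fixed-size and big-endian, and XML is produced only at the edge.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical documented wire layout (6 bytes, big-endian):
     bytes 0-3  sensor id  (unsigned 32-bit)
     bytes 4-5  reading    (signed 16-bit, hundredths of a degree) */
#define RECORD_SIZE 6

struct record {
    uint32_t id;
    int16_t reading;
};

void record_encode(const struct record *r, uint8_t out[RECORD_SIZE])
{
    uint16_t raw = (uint16_t)r->reading;   /* two's complement bit pattern */
    out[0] = (uint8_t)(r->id >> 24);
    out[1] = (uint8_t)(r->id >> 16);
    out[2] = (uint8_t)(r->id >> 8);
    out[3] = (uint8_t)r->id;
    out[4] = (uint8_t)(raw >> 8);
    out[5] = (uint8_t)raw;
}

void record_decode(const uint8_t in[RECORD_SIZE], struct record *r)
{
    r->id = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
            ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    /* Assumes two's complement (implementation-defined before C23). */
    r->reading = (int16_t)(((uint16_t)in[4] << 8) | in[5]);
}

/* The converter: read the documented binary form and emit XML.
   Returns the XML length (snprintf semantics, excluding the terminator). */
int record_to_xml(const uint8_t in[RECORD_SIZE], char *out, size_t len)
{
    struct record r;

    assert(in != NULL && out != NULL);
    record_decode(in, &r);
    return snprintf(out, len, "<record><id>%lu</id><reading>%d</reading></record>",
                    (unsigned long)r.id, (int)r.reading);
}
```

    With id 42 and reading -1234, the 6-byte record becomes 52 bytes of XML, which also gives a feel for the size overhead discussed elsewhere in the thread.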

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  132. Finally the XML bubble bursts... by niktesla · · Score: 1
    "Not only is XML verbose, but it's extremely wasteful in how much space it needs to use for the amount of true data that it is sending,"

    I've been saying this since the beginning: why send ten times the bytes for the same amount of data? Sure, it's human readable and writeable, but how many times do humans actually read or write XML (I'm talking about web users here, not us /.ers ;))? It seems to me that if something is primarily machine read and written, a binary format makes much more sense: it's more compact and can be interpreted by the machine much faster.

    Advances in networking and processing power go a long way in addressing performance concerns, though perhaps not on battery-constrained mobile phones, he said.

    And that quote exemplifies the reason why we have much faster machines but still feel bogged down doing the same things. The speed advantage is largely negated by inefficient coding and data storage formats such as XML. You cannot always assume the next round of hardware will make things fast enough. I'll be glad when we reach the limits of silicon and Moore's Law is put to rest, because it will force people to stop thinking of fast hardware as an excuse for sloppy coding and bloat.

    --
    I've discovered a remarkable proof, but this margin is too small to contain it...
  133. Possible solution by Spy+der+Mann · · Score: 1

    a) Use an internal representation of the DOM tree.

    b) Publish the specs

    c) DON'T call it XML. Try "Extensible Tree Based Binary Format" or something. Just because XML is a standard, people want XML to devour everything as some kind of spec blob.

    In other words:
    Don't like XML? DON'T USE IT!

  134. Here is the asnwer by rocksh · · Score: 0

    The answer to your question posted on Slashdot on January 14 2005, written in plain English, containing 31 characters and no binary data or images, and entitled "Does the World Need Binary XML?" is given to you after long deliberations and consultations with higher authorities and spelled in plain English as "Yes"!

    --
    >
  135. Overwhelming feeling of deja vu by po8 · · Score: 1

    You seem to have a marked lack of appreciation for the intelligence, knowledge, and experience of the folks behind XML. May you be cursed to use ASN.1 until your appreciation improves.

    1. Re:Overwhelming feeling of deja vu by GOD_ALMIGHTY · · Score: 1

      No, it's not a refutation of the guys who came up with XML; it's more of a criticism of the community's use of it. XML in itself is useful, for certain things. I just don't think it should be used everywhere. It's a Rosetta Stone of sorts: while useful for translating data, writing everything in 4 languages simultaneously is inefficient. I have avoided mentioning ASN.1 purposely, because I don't think it solves the problems that XML was trying to solve. I just think that in order for XML to fill the shoes of the hype that's been around since its inception, it needs to be more efficient. A binary version should have been there from the get-go, with a text presentation format that would appear as we see text XML today.

      My lack of appreciation is over the way the community has jumped into the void without thinking about what might be inside.

      --
      Arrogance is Confidence which lacks integrity. -- me
    2. Re:Overwhelming feeling of deja vu by Unordained · · Score: 1

      I just can't help it: "what", not "how."

      XML is more like A4 or 8.5"x11" paper than it is like a Rosetta Stone. The Rosetta Stone allowed us to translate semantics, not just syntax. There are many more efficient file formats than XML; if you're sending table (relation) data, then .csv files are pretty much automatically more efficient, and libraries have existed "forever" to handle them. Sure, you can squeeze more efficiency out of them, but the point is that XML was never about efficiency; it was about having a single syntax and a single library for reading all your files and parsing them into ... something. But it was never about semantics either; it won't automatically translate between two different schemas, it can't tell you what a file means, it can just present you with a tree of stuff for you to parse. And that's still the hardest part of the job. I can't say parsing ever worried me ... I've been working on HL7-related stuff for a few months. Nasty syntax, but nastier semantics. The main problem's been a lack of documentation on what the fields "mean" and when they're appropriate, when they go together, etc. It doesn't matter that the file is human readable; I could read Greek all day and still not grok it.

  136. UTF-8? by bill_mcgonigle · · Score: 1

    I don't get it. If you really want to hide a binary blob in XML, why not just call it UTF-8 and decode it as you wish?

    You can put e.g. JPEG data bytes in and call it UTF-8 if your parser knows what to expect.

    XML does support 8 and 16-bit encodings, right?
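    One catch with that idea: arbitrary binary is usually not well-formed UTF-8, and XML 1.0 forbids some characters (NUL, for one) outright, so a conforming parser would reject the document. JPEG data, for instance, starts with the bytes 0xFF 0xD8, and 0xFF never appears in UTF-8. A simplified C sketch of a UTF-8 well-formedness check (the function name is hypothetical; it follows the RFC 3629 rules but does not check XML's extra character restrictions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if buf[0..len) is well-formed UTF-8: correct lead and
   continuation bytes, no overlong forms, no surrogates, nothing past U+10FFFF. */
bool is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        uint8_t b = buf[i];
        size_t extra;
        uint32_t cp;

        if (b < 0x80) { i++; continue; }                      /* ASCII */
        else if (b >= 0xC2 && b <= 0xDF) { extra = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { extra = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { extra = 3; cp = b & 0x07; }
        else return false;   /* 0x80-0xC1 and 0xF5-0xFF never start a character */

        if (len - i <= extra)
            return false;                                      /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;                                  /* not a continuation byte */
            cp = (cp << 6) | (uint32_t)(buf[i + k] & 0x3F);
        }
        /* Reject overlong 3/4-byte forms, UTF-16 surrogates, values past U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += extra + 1;
    }
    return true;
}
```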

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  137. My name is Britney by Anonymous Coward · · Score: 0
    What is your name?
    1. <name>Britney</name>
    2. name:Britney;
    3. Britney
    Nuff said
  138. Re:Binary = Proprietary ... I disagree by Austerity+Empowers · · Score: 2, Insightful

    What is text? It's a binary code that a computer translates into graphical glyphs. Is it proprietary? Not any more. Your computer is what turns that binary code into something that means something to you. It doesn't even mean something to everyone (in fact iirc the first line in an xml file identifies a code page for using the intended symbols). So firstly, opaqueness even on "text" is not quite black and white. Second, what is transparent to YOU may be completely opaque to software, I'll elaborate on this later.

    So what could binary XML be? It's a binary code that translates into XML syntax, except it's easier for software to deal with; there's no text parsing. Let me present this example, which I will endeavor to use over and over: <mytag is='simple'>. I could write this in binary as 0001010203; obviously to do that I'd have to store the strings "mytag", "is", "simple" in a string table elsewhere, but this is just a simple example. I made "0x00" mean "a tag", the first "0x01" mean 1 attribute, and the rest are string references. Reading this tag would be very simple: fread(buffer,10,1,file) (I picked the two middle numbers out of the air since we have not really defined this format).
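    The toy encoding described above (a 0x00 byte for "a tag", then an attribute count, then 1-based references into a string table) can be sketched in C. The function name and the fixed string table are hypothetical, and a real format would also need closing tags, text content and a way to ship the table itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOKEN_TAG 0x00

/* Hypothetical string table; the byte stream refers to entries by 1-based index. */
static const char *const string_table[] = { NULL, "mytag", "is", "simple" };
#define TABLE_SIZE (sizeof string_table / sizeof string_table[0])

/* Decode one start tag from buf and write its text form, e.g. <mytag is='simple'>.
   Returns the number of input bytes consumed, or 0 on malformed input or a
   too-small output buffer. */
size_t decode_tag(const uint8_t *buf, size_t len, char *out, size_t outlen)
{
    size_t need, pos;
    int n;

    assert(out != NULL && outlen > 0);
    if (len < 3 || buf[0] != TOKEN_TAG)
        return 0;
    need = 3 + 2 * (size_t)buf[1];    /* tag byte, count, name ref, 2 refs per attribute */
    if (len < need)
        return 0;
    for (size_t i = 2; i < need; i++) /* every string reference must be in range */
        if (buf[i] == 0 || buf[i] >= TABLE_SIZE)
            return 0;

    n = snprintf(out, outlen, "<%s", string_table[buf[2]]);
    if (n < 0 || (size_t)n >= outlen)
        return 0;
    pos = (size_t)n;
    for (size_t i = 3; i < need; i += 2) {
        n = snprintf(out + pos, outlen - pos, " %s='%s'",
                     string_table[buf[i]], string_table[buf[i + 1]]);
        if (n < 0 || (size_t)n >= outlen - pos)
            return 0;
        pos += (size_t)n;
    }
    if (pos + 1 >= outlen)            /* room for '>' and the terminator */
        return 0;
    out[pos++] = '>';
    out[pos] = '\0';
    return need;
}
```

    Note that no lexing happens: the reader jumps straight to table lookups. This is also the "textify" direction needed to view the binary form in an ordinary text editor.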

    Saying that binary is proprietary makes absolutely no sense. Proprietary means property of an owner (usually a business). A file can't be proprietary. Its contents, or the format of its contents, certainly can be. But a binary file is atomic, it's like the sky. It just is what it is. Binary XML COULD become proprietary, but it will not NECESSARILY happen. Nothing is inherently proprietary about a binary file. If the binary XML format satisfies the constraints of a standard in my first post, it will absolutely not be proprietary by construction, or so I think. I don't work in standards groups; people more experienced with their goings-on may point out additional refinements.

    Your next point, recovering data. What do you use to read XML files? A text editor usually. What does that do? It reads a binary file (uh a text file!), applies some understanding of what the ASCII code (as an example) means, and displays it to you. Most of it is usually character data, but not always, there's a bunch of special characters that text editors often respond to for formatting or other things. Unix and PCs can't agree even on how to terminate a line. The point being even right now you can't totally say plain text XML is transparent, magic happens for you to just see it. Nothing about its presentation is defined (nor should be, imho). So what could binary XML be viewed with? Only slightly more overhead. You could "textify" as a preprocessing step to be viewed in a text editor. bin2txt myfile.bin.xml > myfile.txt.xml as an example. Or you could write your xml in plain text, and do the opposite. It's one to one, no loss. XML is just a syntax.

    Now as for processing, I'll admit to waving my hands and skipping a few pieces. The XML syntax is defined clearly; there's no ambiguity (that I know of). However, the step of choosing text-like strings to declare the syntax elements is where it gets hairy. Your first step in writing a parser is to grab the syntax elements out of their native text string. This is disgusting, as compiler writers, language developers, etc. understand. You have to make lex/yacc scripts or workalikes to generate code, or worse, write your own (no one should do this, but that's purely my opinion and not defensible). There's a complicated state machine, some funny thing called LRM, and some other gotchas. All this just to take the text and break it into its constituent elements. Usually you then have a tree structure or some hierarchy that a computer can understand.

    Take a look at some common XML libraries: Xerces, libxml, a few others I can't remember. They're pretty damn big, mostly, I argue, due to the text nature of their data. A lot of work goes into making text files usable by a program. A lot (but not all) of cruft can be cut by adopting a format that is simpler for software to understand.

    Sure, people who write MS Word (i.e.

  139. Using Serialized Script Code Instead of XML by Camel+Pilot · · Score: 1

    I am currently writing a XUL client/server application, and I am using the XMLHttpRequest object. However, instead of processing XML data, which is slow, especially when you need to parse a data set several times a second, I started sending JavaScript code and data structures. In addition, the server code is written in Perl, so for storing status and configuration information I used serialized Perl data structures, and processing requirements fell dramatically. I still have the clear-text editing and inspection capabilities without the speed and space issues.

    It seems like serialized script code, such as Perl, Python, or Java, provides the benefits of XML without the headaches.

  140. Re:Binary = Proprietary ... I disagree by Austerity+Empowers · · Score: 1

    gah i got mungified

    (angle) mytag is='simple'(angle)

  141. Storing Images in XML? by zimage · · Score: 1
    Manufacturers of consumer devices such as Canon, as well as mobile-phone companies such as Nokia, have argued for a binary XML format. Without it, large files such as images will take too long to download to devices such as mobile phones, they argue
    Why are they storing images in XML?
  142. No. Period. by Anonymous Coward · · Score: 0

    Go find another standard to pollute, you asshats. We get to keep this one, and use it to reform HTML into something near its original goal: interop. DRM, compression, db, platform specific RPC, endian and everything else were all left alone so jackasses could fuck those standards up all they want. Throw us a fricken' bone and read the 10 guiding principles of XML again, douchebags.

  143. DNS is binary; does that make it proprietary? by Skapare · · Score: 2, Interesting

    DNS is binary; does that make it proprietary? Not at all. It is a published open standard in RFC 883 and later documents. Other examples include ASN.1/BER as used in SNMP. It's not whether it is binary or text that matters; it's whether it is openly documented and unencumbered by intellectual property claims (a separate issue some of XML has).
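    For a taste of what that published binary format looks like: DNS (RFC 1035, which superseded RFC 883) writes a domain name as a sequence of labels, each preceded by a length byte (at most 63), ending with a zero-length root label, with the whole name limited to 255 bytes. A minimal C sketch that ignores message compression; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a dotted name such as "www.example.com" into DNS wire format:
   3 'w' 'w' 'w' 7 'e' ... 3 'c' 'o' 'm' 0 (no compression pointers).
   Returns the encoded length, or 0 for an empty or over-long label,
   or if the result would exceed 255 bytes or the output buffer. */
size_t dns_encode_name(const char *name, uint8_t *out, size_t outlen)
{
    size_t pos = 0;
    const char *label = name;

    assert(name != NULL && out != NULL);
    while (*label != '\0') {
        const char *dot = strchr(label, '.');
        size_t n = dot ? (size_t)(dot - label) : strlen(label);

        /* Reserve the length byte, the label, and the final zero byte. */
        if (n == 0 || n > 63 || pos + n + 2 > outlen || pos + n + 2 > 255)
            return 0;
        out[pos++] = (uint8_t)n;          /* length prefix */
        memcpy(out + pos, label, n);      /* label bytes, no terminator */
        pos += n;
        label = dot ? dot + 1 : label + n;
    }
    if (pos + 1 > outlen)
        return 0;
    out[pos++] = 0;                       /* root label ends the name */
    return pos;
}
```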

    The decision of binary vs. text for a format should be the result of specific needs. XML is verbose. XML can be compressed for transmission purposes, but it still has to be uncompressed to its verbose form for parsing. If speed in parsing is necessary (it might be, as I have noticed quite a few XML-based programs are rather slow), a binary format can have things like length prefixes and continuation tags, instead of having to detect and verify a collection of characters whose position is unknown. In a binary format, a parser that does not recognize a given tag, or does not need to process it, can simply skip it by jumping the specified number of bytes. A binary format is very well suited to machine processing.

    The usual argument for a text format spans the range from permitting humans to create the content for most things directly in an editor like vi or emacs (no wars here, I listed my favorite last), to reading that content directly, such as to diagnose the real cause of misunderstood errors. XML is too utterly complex for human creation or interpretation to be effective on a direct basis. There may be some argument that it can still be effective for diagnostic purposes (I have in fact needed to do so many times). Given that it is the powerful tools of XML that are used as the basis for the benefit of XML and promoting it, what does it really matter what format is underneath, as long as it is open and unencumbered?
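    The skip-by-length idea can be sketched with a hypothetical type-length-value layout (1-byte type, 4-byte big-endian length, then the payload). The layout and names are illustrative only, not any particular standard; the point is that the reader never looks at the payload bytes of records it skips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout: [type:1][length:4, big-endian][payload:length] */
#define TLV_HEADER 5

/* Find the first record of the wanted type. On success set *payload and *plen
   and return 1; return 0 if absent or if a length runs past the buffer. */
int tlv_find(const uint8_t *buf, size_t len, uint8_t wanted,
             const uint8_t **payload, uint32_t *plen)
{
    size_t pos = 0;

    assert(payload != NULL && plen != NULL);
    while (len - pos >= TLV_HEADER) {
        uint8_t type = buf[pos];
        uint32_t n = ((uint32_t)buf[pos + 1] << 24) | ((uint32_t)buf[pos + 2] << 16) |
                     ((uint32_t)buf[pos + 3] << 8) | (uint32_t)buf[pos + 4];

        if (n > len - pos - TLV_HEADER)
            return 0;                     /* corrupt or truncated record */
        if (type == wanted) {
            *payload = buf + pos + TLV_HEADER;
            *plen = n;
            return 1;
        }
        pos += TLV_HEADER + n;            /* unknown or unwanted: jump over it */
    }
    return 0;
}
```

    The cost of skipping a record is constant regardless of its size, whereas a text parser has to scan every character to find where an unwanted element ends.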

    A binary format for XML will absolutely not kill XML. DNS is obviously not dead (and you'll love it even more when IPv6 rolls into your network). What a binary format might do is weed out some of the weaker programmers who are sticking their fingers a bit too deep into the inner workings of some applications and tools.

    --
    now we need to go OSS in diesel cars
  144. Nope, he's not spot on by ttfkam · · Score: 1

    What you're describing is not a data format. You are describing a prepended index. Very different animals.

    Nevermind that binary formats are soooooooo easy to algorithmically validate for correctness. Oh, look! A valid pointer! I sure hope it points to something useful... Oops! Unexpected NULL value. Better fix that... Okay! Now it works!

    user: could you add this feature?

    Hmmm... Gotta add it to the data structure... Okay, I've got to make sure the client and server protocols match by version. Damn. Gotta rework that validation code because my offsets have changed. (Etc. etc. etc.)

    Pop quiz: what is the binary representation of the string "my pretty little lamb"? How does it differ from the "text" representation of the same? How do you mark hierarchy? Do those markers use up less space than the one-byte '<' character? How do you allow for optional values as well as allow for modification and future expandability with a binary format? How much more efficient is binary parsing with validity checks for structure and data correctness when compared with text (XML) parsing?

    And finally, for fifty points, how expensive is your time as a developer as compared to hardware processing time as a dollar value?

    If your time writing, parsing, validating and debugging a binary format is cheaper over the course of a year than the same amount of money used to purchase server hardware, then you have made the right choice with a binary format.

    Oh! And don't forget to comment your code and document your binary format. Those really suck for future code maintainers to reverse engineer.

    Have a nice day! :)

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
    1. Re:Nope, he's not spot on by Anonymous Coward · · Score: 0

      And finally, for fifty points, how expensive is your time as a developer as compared to hardware processing time as a dollar value?

      If your time writing, parsing, validating and debugging a binary format is cheaper over the course of a year than the same amount of money used to purchase server hardware, then you have made the right choice with a binary format.

      Oh! And don't forget to comment your code and document your binary format. Those really suck for future code maintainers to reverse engineer.


      And this is EXACTLY why binary XML would be a Good Thing. Read your comments again; looks to me like your main objections would all be answered by a standardised format with good library support...

    2. Re:Nope, he's not spot on by Anonymous Coward · · Score: 0

      What you're describing is not a data format. You are describing a prepended index. Very different animals.

      And very difficult to do in a text-based format. Maybe you could do line and column, but that still takes a lot more time to seek through.

      Nevermind that binary formats are soooooooo easy to algorithmically validate for correctness.

      They are, assuming you design it properly. The simplest method is adding checksums. Since XML files are binary as well, anything that can be done with XML can be done with binary.

      Pop quiz: what is the binary representation of the string "my pretty little lamb"? How does it differ from the "text" representation of the same?

      I'm guessing this specific example doesn't come up too often... The data isn't what makes XML inefficient, the structure is.

      A typical, simple encoding would be a 32-bit length followed by the raw bytes, zero padded to the nearest 32-bit boundary. You need to decide on endianness for the length word. Typically you can dictate the character encoding when you design the standard. Loading the string is basically a malloc and an fread. Saving it is similarly simple.
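      That string encoding can be sketched in C as follows. Big-endian for the length word is an assumption (the post only says endianness has to be decided), and the function names are hypothetical. Because the length is known up front, reading is one allocation and one copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed for an n-byte string: 4-byte length + data padded to 4 bytes. */
size_t lp_encoded_size(size_t n)
{
    return 4 + ((n + 3) & ~(size_t)3);
}

/* Write a 32-bit big-endian length, the raw bytes, then zero padding.
   out must hold lp_encoded_size(n) bytes. Returns the bytes written. */
size_t lp_write(const char *s, size_t n, uint8_t *out)
{
    size_t total = lp_encoded_size(n);

    assert(n <= UINT32_MAX);
    out[0] = (uint8_t)(n >> 24);
    out[1] = (uint8_t)(n >> 16);
    out[2] = (uint8_t)(n >> 8);
    out[3] = (uint8_t)n;
    memcpy(out + 4, s, n);
    memset(out + 4 + n, 0, total - 4 - n);
    return total;
}

/* Read one string. Returns a NUL-terminated copy (caller frees) and sets
   *consumed, or returns NULL on truncated input or allocation failure. */
char *lp_read(const uint8_t *in, size_t len, size_t *consumed)
{
    uint32_t n;
    char *s;

    assert(consumed != NULL);
    if (len < 4)
        return NULL;
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (n > len - 4 || lp_encoded_size(n) > len)
        return NULL;                      /* truncated input */
    s = malloc((size_t)n + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, in + 4, n);
    s[n] = '\0';
    *consumed = lp_encoded_size(n);
    return s;
}
```

      The 21-byte string "my pretty little lamb" from the pop quiz encodes in 28 bytes: 4 for the length, 21 of text, 3 of padding.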

      For XML, you need to figure out what encoding the data is stored in (including endianness btw), which may be in the file, or you may just have to guess. You need to pry it from the surrounding string data and remove any interspersed tags that have no meaning to you (including comments), and then you need to unescape it (using a table of entity declarations, some of which may be in the file you're parsing). Nowhere during this process do you know how long the resulting string will be (or even intermediate ones), so you need to dynamically resize your buffers and copy over data multiple times.

      How do you mark hierarchy?

      Nested packets would be the most obvious method, I'd think.

      Do those markers use up less space than the one-byte '<' character?

      4 bytes, and maybe a 4 byte checksum versus four angle brackets, a slash, and the tag name repeated. I think that compares decently.
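      To put numbers on that comparison: for a tag name of n bytes, an element's start and end tags cost

```latex
\underbrace{(n + 2)}_{\text{start tag}} + \underbrace{(n + 3)}_{\text{end tag}} = 2n + 5 \ \text{bytes}
\qquad \text{vs.} \qquad 4 \ \text{bytes (length only) or } 8 \ \text{bytes (length + checksum)}
```

      so a five-letter name such as "mytag" costs 2(5) + 5 = 15 bytes of markup against 4 or 8.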

      How do you allow for optional values as well as allow for modification and future expandability with a binary format?

      Probably the same way that binary formats have been doing it for ages. PNG, I think, has the spiffiest approach: each chunk type gets a 4-character tag, and the case of the letters tells software that doesn't recognize the tag what to do with it.
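      The PNG convention works like this: each chunk type is four ASCII letters, and bit 5 (0x20, the lowercase bit) of each letter is a property flag. A lowercase first letter marks an ancillary chunk that a decoder may skip if it doesn't recognize it; uppercase marks a critical chunk it must not ignore. The second letter's case marks private chunks and the fourth marks chunks that are safe to copy. A small C sketch (function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Property bits carried by the case of a PNG chunk type's letters. */
bool png_is_ancillary(const uint8_t type[4])    { return (type[0] & 0x20) != 0; }
bool png_is_private(const uint8_t type[4])      { return (type[1] & 0x20) != 0; }
bool png_is_safe_to_copy(const uint8_t type[4]) { return (type[3] & 0x20) != 0; }

/* Policy for a chunk type the decoder does not recognize:
   skip it if ancillary, otherwise the file cannot be decoded. */
bool png_can_skip_unknown(const uint8_t type[4])
{
    assert(type != NULL);
    return png_is_ancillary(type);
}
```

      So an old decoder meeting a new "tEXt"-style chunk can safely skip it, while an unknown uppercase-first chunk tells it to give up rather than render garbage.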

      How much more efficient is binary parsing with validity checks for structure and data correctness when compared with text (XML) parsing?

      That depends on how careful your error detection and correction is. With XML you have to guess a lot for corrections.

      And finally, for fifty points, how expensive is your time as a developer as compared to hardware processing time as a dollar value?

      If you're developing a widely-used application, remember to multiply out that hardware cost by the number of users.

      Binary formats are pretty simple (if you're not an idiot) anyway. Depending on the language you're developing in, possibly even simpler than using SAX or parsing a DOM tree.

      And don't forget to comment your code and document your binary format. Those really suck for future code maintainers to reverse engineer.

      If you have the code, not as bad as you might think. Without code, XML does have a significant advantage here.

    3. Re:Nope, he's not spot on by starm_ · · Score: 1

      "Hmmm... Gotta add it to the data structure... Okay, I've got to make sure the client and server protocols match by version." Same problems arise with XML.

      "Damn. Gotta rework that validation code because my offsets have changed. (Etc. etc. etc.)"

      None of this needs to be dealt with directly. See below...

      "And finally, for fifty points, how expensive is your time as a developer as compared to hardware processing time as a dollar value?"

      Let's see, in Java all you need to do is add:

      "implements Serializable"

      to your class declaration. Is that more complicated than using XML?

  145. Binary XML no / Pointer XML maybe? by Anonymous Coward · · Score: 1, Interesting

    I don't know that I care about or for "binary XML". I don't terribly worry about the efficiency that might be gained by converting a textual integer like 3,000,000,000 into a 32 bit binary integer.

    However, I might be interested in a "Pointer XML" - in an XML that allows me to use lseek like operations to efficiently move around a document.

    XPaths conceptually require parsing lots of the document. It's hard to skip over pieces - you have to process all of the bytes from the start of the document to the first place where the XPath matches.

    Most of the "optimized XML" formats create a hash table from Xpath to file location or binary. But this is still at least O(length of Xpath string).

    If there was a way of providing the link as a textual integer, and then lseeking to this, it's O(lg NbytesInXmlDoc). That might be a saving.

    (Adage: don't worry about constants like 2X or 4X. Do worry about changing the O() efficiency.)

    There would be no reason that such a "Pointer XML" could not remain entirely textual. It might simply be an extra syntax or modifier to an Xpath:

    Instead of linking to xpath /a/b/c
    Link to /a/b/c || byte_position=5454786

    The lseek positions would have to be in bytes, not characters, and would get confused if the coding system were changed. But they would at least be useful and usable if the coding system were not changed.

    The hard part would be ensuring consistency. E.g. in the example above, you would want to ensure that the element at byte_position=5454786 really was the xpath /a/b/c. It would be bad if it was not. But I think that sort of thing could be checked in the same way that we check DTDs.
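    One way to treat the byte position purely as a hint is sketched below in C: check that the expected start tag really is at that offset, and if the hint is stale, fall back to a scan from the start. Everything here is hypothetical, and it matches a single element name rather than evaluating a full XPath like /a/b/c; the naive scan also ignores comments and CDATA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if doc[pos..] begins with "<name" followed by '>', '/' or whitespace. */
static int start_tag_at(const char *doc, size_t doclen, size_t pos, const char *name)
{
    size_t n = strlen(name);
    char next;

    if (pos >= doclen || doclen - pos < n + 2 || doc[pos] != '<')
        return 0;
    if (memcmp(doc + pos + 1, name, n) != 0)
        return 0;
    next = doc[pos + 1 + n];
    return next == '>' || next == '/' || next == ' ' || next == '\t' ||
           next == '\n' || next == '\r';
}

/* Use the byte-position hint if it checks out; otherwise scan linearly.
   Returns the offset of the start tag, or (size_t)-1 if it is absent. */
size_t find_element(const char *doc, size_t doclen, const char *name, size_t hint)
{
    assert(doc != NULL && name != NULL);
    if (start_tag_at(doc, doclen, hint, name))
        return hint;                              /* verified hint: no scan needed */
    for (size_t pos = 0; pos < doclen; pos++)     /* stale hint: linear fallback */
        if (start_tag_at(doc, doclen, pos, name))
            return pos;
    return (size_t)-1;
}
```

    Ignoring the hint entirely still gives the right answer, just more slowly, which matches the requirement that the byte positions be optional.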

    Also, some minor annotations, such as placing anchors at the lseek-ed to byte position, might help in maintaining such consistency.

    Moreover, I would never advocate abandoning XPaths - I would just be suggesting including the lseekable byte positions as a performance hint. It should also be correct to ignore the byte positions and just use the XPath links.

    By adding padding (blanks, whatever) you could avoid the need to change all of the lseekable byte position hints whenever you changed an element value.

    1. Re:Binary XML no / Pointer XML maybe? by pclminion · · Score: 1
      What you describe is surprisingly similar to PDF. A PDF file is a stream of objects with a cross reference table to allow random access. The format is general and can be used to store all kinds of stuff besides PDF documents.

      Adobe uses a language called FDF for submitting PDF form data to form servers. The format is identical to PDF from a structural standpoint and serves the same purpose as XML for that application.

      Another advantage to PDF/FDF is that it is human-readable (unless you use compressed object streams, which is another issue). However, generating the cross reference table is not something easily accomplished by hand.

  146. As if... by Anonymous Coward · · Score: 0

    ...the world doesn't need XML, let alone binary XML.

    USE='-xml -xml2'

    XML is ugly and totally not needed except by those 'dot-com all day long' fucks.

  147. Re:For Starters--Try looking at what I really said by Nom+du+Keyboard · · Score: 1
    Good idea. Without Microsoft's support from their tools division, this idea will be dead on arrival..

    I never said they couldn't use it afterwards. Only that they should be kept away from the design process so that they cannot warp it to their own ends.

    I would hope they would use it afterwards, along with everyone else, in the very same way. Their track record with standards is, however, dismal by anyone's definition.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  148. s-expressions as compact XML alternative by Anonymous Coward · · Score: 0

    Why not use s-expressions if you need a sleeker representation than what XML gives you? Admittedly, s-expressions are poorly suited to document representation (ie, openoffice documents), but for places like wire protocols, they seem ideal.

  149. XML is stupid.We need a data binary common format. by master_p · · Score: 1

    XML is plain stupid as an idea. Before you flame me, here are the reasons:

    1) XML is almost unreadable by humans, especially if the XML document is complex and long.

    2) XML is unreadable by computers. Computers need to parse XML, then convert it to binary data, then back to XML.

    3) XML is not object-oriented. We made such a fuss in the previous decade about object orientation, and now our data are not object-oriented! XML applications must know a priori what to do with an XML document.

    4) XML is easily parseable by computers, they say. So what? binary data are just as parseable, even more than text, and they have been so from the beginning of the computer era.

    5) XML is editable with a text editor, they say. Yet nobody uses a simple text editor to make XML files... we all use GUI apps.

    What we need is a binary data format, that is structured and treated just like XML. It would be best if the format was object-oriented, i.e. a computer could ask some form of code to accompany the data (maybe p-code). Nowadays all computers are at least 32-bit...from the smallest handheld or cellphone, to the mightiest mainframe, all computers can easily handle 32-bit data. There is no excuse for the lame XML format.

  150. Binary XML already exists today by Anonymous Coward · · Score: 0

    it is called CORBA

  151. Isn't this what ASN.1 was for? by Anonymous Coward · · Score: 0

    I thought this was what ASN.1 was for, but the XML folks didn't like it because it wasn't human readable.

    What can binary XML do that isn't already solved by ASN.1?

    1. Re:Isn't this what ASN.1 was for? by PengoNet · · Score: 2, Informative

      The "Fast Infoset Project" for creating Binary XML as mentioned in the article is using ASN.1. See this blog entry by Rick Jelliffe for details.

      Fast Infoset is to ASN.1 what XML is to SGML. At least if it becomes the standard anyway.

  152. I know by Pan+T.+Hose · · Score: 1

    I'll say it again.. Its not the size of the document its--

    --it's how you use it. I know, I hear it all the time...

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
  153. Re:Binary not needed - better table format neeeded by dutky · · Score: 1
    DunbarTheInept wrote:

    This is what made us balk at using XML for storing NMR spectroscopy data...The current textual form is whitespace-separated, little short numbers less than 5 digits long, for hundreds of thousands of rows...

    ...we can't find an in-memory representation of the data as a table which is more compact than the ascii file is. The original ascii file is even more compact than a 2-D array in RAM. (because it takes 4 bytes to store an int even when that int is typically just one digit and is only larger on rare occasions.)


    Huh? It should be trivial to devise a more compact memory structure for that data. Given only a few minutes of thought I came up with this:


    Every value gets a single byte in a big 2D array. All byte values but one are valid data values. The remaining byte value (the key value) is used for large data values (greater than can be encoded in a single byte). For these data values, we look in a second array which contains entries only for the large values. Assuming we need a 1024x1024 matrix, the C declarations are:
    unsigned char matrix[1024][1024];   /* unsigned, so a stored 0xff compares equal to KEY_VALUE */
    int *big_value[1024];
    #define KEY_VALUE 0xff
    Each row of big_value corresponds to one row in matrix and has one element for each element in matrix that contains the key value. In order to find the value of any data point (x,y), first look it up in matrix: if the value matrix[x][y] is not the key value, you have the actual data value. If the value matrix[x][y] is the key value: find out how many key values precede this data point in row x (call that n) and then get the value at big_value[x][n], which is the actual value.

    The C function to get a value from this data structure is:

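    int get_value(int x, int y)
    {
        int i, n;

        /* Escaped value: the number of escapes before column y in row x
           is its index in big_value[x]. */
        if (matrix[x][y] == KEY_VALUE)
        {
            for (i = n = 0; i < y; i++)
                if (matrix[x][i] == KEY_VALUE)
                    n++;

            return big_value[x][n];
        }

        /* Small value: stored directly in the byte array. */
        return matrix[x][y];
    }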
    The code to insert into the data structure is left as an exercise for the student (it will require some dynamic memory allocation and an initialization routine).
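    One possible answer to that exercise, as a C sketch: it fills a whole row at once from an array of ints rather than inserting single values (an assumption that keeps each big_value row in column order), treats negative values as "large" since they don't fit the byte scheme, and repeats the declarations with unsigned char so the block stands alone. The name set_row is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define ROWS 1024
#define COLS 1024
#define KEY_VALUE 0xff

static unsigned char matrix[ROWS][COLS];   /* one byte per value */
static int *big_value[ROWS];               /* per row: values that did not fit */

/* Store row x from values[0..COLS). Values 0..254 go straight into matrix;
   anything else becomes KEY_VALUE and is appended, in column order, to big_value[x].
   Returns 0 on success, or -1 (row left unchanged) if allocation fails. */
int set_row(int x, const int values[COLS])
{
    size_t nbig = 0, k = 0;
    int *fresh = NULL;

    assert(x >= 0 && x < ROWS);
    for (int y = 0; y < COLS; y++)
        if (values[y] < 0 || values[y] >= KEY_VALUE)
            nbig++;

    if (nbig > 0 && (fresh = malloc(nbig * sizeof *fresh)) == NULL)
        return -1;
    free(big_value[x]);                    /* replace any previous contents */
    big_value[x] = fresh;

    for (int y = 0; y < COLS; y++) {
        if (values[y] < 0 || values[y] >= KEY_VALUE) {
            matrix[x][y] = KEY_VALUE;
            big_value[x][k++] = values[y];
        } else {
            matrix[x][y] = (unsigned char)values[y];
        }
    }
    return 0;
}

/* Same lookup as above: count the escapes before column y to index big_value[x]. */
int get_value(int x, int y)
{
    int n = 0;

    if (matrix[x][y] != KEY_VALUE)
        return matrix[x][y];
    for (int i = 0; i < y; i++)
        if (matrix[x][i] == KEY_VALUE)
            n++;
    return big_value[x][n];
}
```

    For the 1024x1024 case, matrix takes 1 MiB against 4 MiB for a plain array of 32-bit ints, plus 4 bytes per escaped value.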

    This should be about 1/4 the memory requirement of a simple 2D array of integers (assuming 32-bit integers), so long as most of the values in the data set can be represented in 1 byte.

    Other, more exotic, data structures could be used to get even better memory efficiency: they are called sparse matrices and have been well understood for decades. Do a google search, go to the library or ask a computer scientist for advice.
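    For data where most entries are zero, one of the classic sparse layouts is compressed sparse row (CSR). This is an illustrative sketch only: the struct csr type and csr_get function are hypothetical names, and CSR saves space only when most cells are empty, which is a different situation from the mostly-small-values data discussed above.

```c
#include <stddef.h>

/* Compressed sparse row storage: only nonzero entries are kept.
   Row r's entries occupy positions row_ptr[r] .. row_ptr[r+1]-1 of
   col_idx/vals, with col_idx sorted ascending within each row. */
struct csr {
    size_t rows;
    const size_t *row_ptr;   /* rows + 1 entries */
    const size_t *col_idx;   /* one per stored entry */
    const int *vals;         /* one per stored entry */
};

/* Return the value at (x, y), or 0 if no entry is stored there. */
int csr_get(const struct csr *m, size_t x, size_t y)
{
    size_t lo = m->row_ptr[x], hi = m->row_ptr[x + 1];
    while (lo < hi) {                      /* binary search within row x */
        size_t mid = lo + (hi - lo) / 2;
        if (m->col_idx[mid] == y)
            return m->vals[mid];
        if (m->col_idx[mid] < y)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```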

    (Thank-you, thank-you. I'll be here all week. I do weddings and bar-mitzvahs and am available for hire)

  154. Here are your answers by Anonymous Coward · · Score: 1, Interesting
    It doesn't tell us what the specific performance problems are with XML. Does it take too long to transmit? Does it take too long to validate? Does it take too long to parse? Does it take too long to format? What's the real problem here?

    I cannot believe that your naive post was modded up to a 5. FWIW the answer to all of your above questions is a resounding "Yes!", although some deserve a stronger "Yes!" than others. Let me state for the record that, from your newbie questions, you are XML-ignorant. And you apparently did not take compiler theory, where you would have learned how computationally expensive parsing is. But you are hardly alone; the industry is full of dumbasses who don't understand what's happening. I, on the other hand, predicted these problems four years ago and have yet to receive my Nobel Prize.

    XML is a cluster fuck for the following reasons. Any message must be:

    1. encoded to XML on the server,
    2. transmitted over the network (but the XML message is longer and requires greater bandwidth),
    3. received by the client,
    4. parsed by the client into some structure (which may require fetching the DTD over the network),
    5. If an error occurs, the message must be retransmitted, otherwise
    6. the relevant fields must be selected from the parsed structure.

    Note that at every step XML requires more CPU, more memory and more bandwidth. This is true for every component of the network! There is no way around these problems other than sheer computing power and throughput. So, one might say, the problem will disappear if we merely wait a few years. Unfortunately other factors are loading the Internet even more than XML, sapping Moore's Law.
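    To put a rough number on step 2, here is a sketch comparing one hypothetical record (an id and a temperature; the struct reading type and both encoder functions are made up for illustration) encoded as XML text versus as two fixed 4-byte little-endian integers. For id 42 and temperature -5 the XML form is 45 bytes against 8, though the ratio depends entirely on element-name length and the values involved.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical record: an id and a temperature reading. */
struct reading { uint32_t id; int32_t temp; };

/* Text encoding: element names travel with every single message.
   Returns the snprintf length (bytes excluding the terminator). */
int encode_xml(const struct reading *r, char *buf, size_t cap)
{
    return snprintf(buf, cap, "<reading><id>%u</id><temp>%d</temp></reading>",
                    (unsigned)r->id, (int)r->temp);
}

/* Binary encoding: two 4-byte little-endian fields, no names at all.
   The layout must be agreed on out of band. */
size_t encode_binary(const struct reading *r, unsigned char *buf)
{
    uint32_t t = (uint32_t)r->temp;
    for (int i = 0; i < 4; i++) {
        buf[i]     = (unsigned char)(r->id >> (8 * i));
        buf[4 + i] = (unsigned char)(t >> (8 * i));
    }
    return 8;
}
```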

    And that's without considering the problems of the W3C's various XML committees! But don't get me started.

    1. Re:Here are your answers by Da+VinMan · · Score: 1

      Umm... yeah, well *I* predicted that XML would never get adopted because it was so expensive (CPU-wise) to use. But, gee, I guess I was wrong.

      So, apparently, the cost of XML parsing is acceptable. Which makes all of the so-called dumbasses the decision makers and you and I, the peons. So take your superior attitude and shove it.

      I happen to be painfully aware of the tradeoffs XML makes. But the industry chose to embrace it. Now that we've embraced XML, it's suddenly not efficient enough. I'm just trying to make sense of all this. Given that XML is meant to be human-readable and self-describing, there really is NO TECHNICAL REASON WHATSOEVER to use it. So, we're back to human factors, where compiler theory doesn't matter one whit.

      You know, it's probably good you posted as AC. That way no one would know that you chose to disengage the more important parts of your brain before firing a full salvo of completely correct and completely irrelevant trivia. Typical nerd...

      --
      Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
  155. This too shall pass by frisket · · Score: 1
    XML has its place. But it is next to HTML, and not next to RPC or databases!

    Yessssss! Finally someone who understands XML. (Well, nearly.)

    It's for storing text for publishing. Remember publishing? Books? Articles? Reports? Text documents? Sure, you can stuff it with spreadsheet data and send it over the wire. Sure, you can generate it from a database with element names 25 yards long and containing 1 byte of data. You can also try to open a sardine can with a banana.

    But if you edit XML with Notepad, you a) haven't understood XML and b) deserve everything you get. It's by no means brain-dead, and it had nothing to do with scripting developers (a fine red herring, that). It certainly has been pushed into areas where it didn't belong. It's a tribute to XML that it has actually performed well in those areas in circumstances where the document type has been carefully designed, but they are rare.

  156. Re:Binary not needed - better table format neeeded by edp927 · · Score: 1

    what about:

    <column name='column 1' etc='some other stuff'>123 456 789 135 458 432 <!-- ... --></column>
    <column name='column 2' etc='some other stuff'>789 135 458 432 <!-- ... --></column>
    <column name='column 3' etc='some other stuff'>123 135 458 432 <!-- ... --></column>

    XMLSchema has support for list types.
    see http://w3.org/TR/xmlschema-0/#ListDt

    Now, DOM/SAX/etc have no useful/performant way to separate this data, but that's your fault for using DOM/SAX as an API.
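    Since DOM and SAX hand back a list-typed element's content as one text string, splitting it is left to the application. A minimal sketch, assuming integer columns; parse_int_list is a hypothetical helper, not part of any XML library.

```c
#include <stdlib.h>
#include <errno.h>

/* Parse whitespace-separated integers from the text content of one
   <column> element. Stores at most max values in out and returns how
   many were read, or -1 on a bad token, overflow, or more than max values. */
int parse_int_list(const char *text, long *out, int max)
{
    int count = 0;
    const char *p = text;
    for (;;) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);   /* skips leading whitespace */
        if (end == p)                   /* no digits: end of list or garbage */
            break;
        if (errno == ERANGE || count == max)
            return -1;
        out[count++] = v;
        p = end;
    }
    /* Only trailing whitespace may remain. */
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;
    return *p == '\0' ? count : -1;
}
```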

  157. Re:Binary = Proprietary ... I disagree by theIG · · Score: 1

    "I guess you have never heard of a large segment of the computing world refered to as embeded systems."

    Since we are discussing XML, I never considered embedded systems. I was simply responding to his statement that text was a sloppy way to store data. I wasn't referring to XML, just that statement.

    I personally think that xml is well suited for very few applications. I've never found a good reason to use it over other, more conventional formats.

    I completely agree with you, but that wasn't my point. I guess I should be marked off topic.

  158. Re:Binary not needed - better table format neeeded by DunbarTheInept · · Score: 1

    You are correct that your scheme would work to fix the problem I described. I made the mistake of describing the problem as being a lot simpler than it really is, however.

    --

    Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

  159. Re:Binary not needed - better table format neeeded by DunbarTheInept · · Score: 1


    that's your fault for using DOM/SAX as an API

    The biggest reason for considering using XML is that it "bought" us access to some useful standard libraries and tools that others could use to look at our data. Get rid of that, and there's also no reason to bother with XML anymore and we might as well go with something we invent on our own.

  160. "Random access", not file size, is the key issue by PengoNet · · Score: 1

    Regular compression (like gzip) helps the file size issue, but it does not allow for random access of the XML.

    Wait a minute, XML is text and you cannot randomly access it anyway. Well, that's the point of binary XML. The focus on XML compression seems to be missing the key advantage of binary XML. That is, a binary XML format could allow indexes of elements and attributes for fast access of complex pointer-rich data structures.

    Random access of a text format simply cannot be done in a sensible way. gzipping XML doesn't help give XML random access.
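    As an illustration of the kind of index a binary format could carry, here is a hypothetical layout (not any existing binary XML specification): a table of byte offsets followed by length-prefixed records, so record i is reached directly instead of by parsing records 0 through i-1. The struct indexed_doc type and record_at function are made-up names.

```c
#include <stddef.h>

/* Hypothetical binary document: a record count, an offset table, and
   the records back to back. Each record here is a one-byte length
   followed by that many content bytes. */
struct indexed_doc {
    size_t count;
    const size_t *offsets;        /* offsets[i] = start of record i in data */
    const unsigned char *data;
};

/* Constant-time jump to record i; no scan of the records before it.
   Returns a pointer to the content and stores its length in *len,
   or returns NULL if i is out of range. */
const unsigned char *record_at(const struct indexed_doc *d, size_t i, size_t *len)
{
    if (i >= d->count)
        return NULL;
    const unsigned char *rec = d->data + d->offsets[i];
    *len = rec[0];                /* one-byte length prefix */
    return rec + 1;
}
```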

  161. Bah, you really don't know XML, do you? by Anonymous Coward · · Score: 0

    You left out so much there it's not funny. Let's try for some *real* XML, shall we?

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE binaryxml [
    <!ELEMENT binaryxml (encoding?, bytes)>
    <!ELEMENT encoding EMPTY>
    <!ATTLIST encoding base CDATA #REQUIRED>
    <!ELEMENT bytes (byte*)>
    <!ELEMENT byte (bit+)>
    <!ATTLIST byte bits CDATA "8">
    <!ELEMENT bit (#PCDATA)>
    <!ATTLIST bit seq CDATA #REQUIRED>
    ]>
    <binaryxml>
    <encoding base="64"/>
    <bytes>
    <byte bits="8">
    <bit seq="0">0</bit>
    <bit seq="1">1</bit>
    <bit seq="2">1</bit>
    <bit seq="3">0</bit>
    <bit seq="4">1</bit>
    <bit seq="5">0</bit>
    <bit seq="6">0</bit>
    <bit seq="7">1</bit>
    </byte>
    <!-- snip -->
    <byte bits="8">
    <bit seq="0">0</bit>
    <bit seq="1">0</bit>
    <bit seq="2">0</bit>
    <bit seq="3">0</bit>
    <bit seq="4">0</bit>
    <bit seq="5">0</bit>
    <bit seq="6">0</bit>
    <bit seq="7">0</bit>
    </byte>
    </bytes>
    </binaryxml>

  162. Re:Storing Images in XML? Here's why... by dbk25 · · Score: 1

    Your question should be "Why would they want to store images in XML?"

    And the answer is that, much more often than not, you need to store meta-information or additional information related to the image; and that is typically in an extensible, self-defining, hierarchical, tag-value format.

    Note, by the way, that the question applies equally well to audio.

    There are several completely different formats in which to store a picture or audio along with additional information. It's been often noted that this information is exactly the kind of information that XML holds quite nicely. The actual pixel or audio data, however, does not fit in XML well. These are typically stored in binary, because they tend to vary from large to gigantic; for this reason compression is quite common. Size is very much a concern for these data. To store them in XML, they need to be converted to characters (typically base64), adding an often-unacceptable increase in size. XML is not an option.
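    To quantify that increase: base64, the usual way to carry binary data in XML text (for example as xs:base64Binary), turns every 3 input bytes into 4 output characters, roughly a one-third increase before any line breaks. The encoder below is a minimal sketch using the standard alphabet with '=' padding; the function names are illustrative.

```c
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Padded base64 output length: 4 characters per 3 (or fewer) input bytes. */
size_t base64_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Encode n bytes from in into out, which needs base64_len(n) + 1 bytes. */
void base64_encode(const unsigned char *in, size_t n, char *out)
{
    size_t i = 0, o = 0;
    for (; i + 2 < n; i += 3) {           /* full 3-byte groups */
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = B64[(v >> 6) & 63];
        out[o++] = B64[v & 63];
    }
    if (i < n) {                          /* 1 or 2 leftover bytes */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? B64[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
}
```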

    There is a lot of attractiveness in having a single format that could be used for binary data with much of the benefits that XML has for text. Experience with existing applications points to the viability and benefits for future uses, even if existing applications remain with the current standards.

    Note: I've heard the suggestion that the additional or meta-content could be in XML, and have the XML reference the binary content. Although that works for the relationship of HTML and images on the web, for this purpose it's a non-starter. These formats are designed to contain the data, not just describe it; in this model, you would still need to define some format to hold the data. (And, then, why not just stick with the current?) Managing the two parts separately introduces possible mismatch risks not possible when they are a single unit; they could get separated, become different versions, etc. An absolute location reference mechanism can prevent copying the data from place-to-place, and even a relative reference mechanism complicates using an instance. Just not going to fly.

  163. Can't they just bzip2? by mixmasta · · Score: 1


    What's wrong with just compressing (bzip2/gzip) the files to speed downloads... and save space? I doubt all the string handling is that big a bottleneck on modern processors. It's not like they are using XML to render fur or anything.

    Can anyone here set me straight?

    --
    #6495ED - cornflower blue
  164. Why not make the closing tag optional? by Anonymous Coward · · Score: 0

    I can think of one simple thing that could help a bit: make the closing tag name optional. In other words, instead of <tag>data</tag> you could simply use <tag>data</>.

    I've taught basic XML to a number of people, and in almost every case, they have the same reaction: "Why does the field name have to be duplicated like that?". I think it's a question that's deserving of some serious consideration.
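    One point in favour of the proposal is that a parser already knows which element is open, so an anonymous </> is unambiguous. Here is a sketch of a nesting check over pre-split tag tokens, assuming the hypothetical </> syntax (which XML 1.0 does not accept); tags_balanced is an illustrative name. The repeated name in a normal end tag is what lets a parser catch a mismatched close, and the sketch still checks it whenever a name is given.

```c
#include <string.h>

#define MAX_DEPTH 64
#define MAX_NAME 32

/* Tokens are "<name>", "</name>" or the proposed anonymous "</>".
   Returns 1 if every element is closed in the right order, else 0. */
int tags_balanced(const char *const *tok, int n)
{
    char stack[MAX_DEPTH][MAX_NAME];
    int depth = 0;
    for (int i = 0; i < n; i++) {
        const char *t = tok[i];
        size_t len = strlen(t);
        if (len < 3 || t[0] != '<' || t[len - 1] != '>')
            return 0;
        if (t[1] != '/') {                          /* start tag: push name */
            if (depth == MAX_DEPTH || len - 2 >= MAX_NAME)
                return 0;
            memcpy(stack[depth], t + 1, len - 2);
            stack[depth][len - 2] = '\0';
            depth++;
        } else {                                    /* end tag */
            if (depth == 0)
                return 0;
            size_t nlen = len - 3;                  /* name between "</" and ">" */
            if (nlen > 0 && (strlen(stack[depth - 1]) != nlen ||
                             memcmp(stack[depth - 1], t + 2, nlen) != 0))
                return 0;                           /* named end tag must match */
            depth--;                                /* "</>" closes whatever is open */
        }
    }
    return depth == 0;
}
```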

  165. leading 0 by PengoNet · · Score: 1

    Except now people think you're talking in octal.

    010 is eight.
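    In C, and in languages that borrowed its literal syntax, a leading 0 does mark an octal number. The same rule bites when parsing text: strtol with base 0 applies the C literal rules, while base 10 reads the digits as decimal. The wrapper names below are illustrative.

```c
#include <stdlib.h>

/* Base 0: "0x" prefix means hex, a leading "0" means octal, else decimal. */
long parse_c_style(const char *s) { return strtol(s, NULL, 0); }

/* Base 10: leading zeros are just zeros. */
long parse_decimal(const char *s) { return strtol(s, NULL, 10); }
```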

  166. Why not create a standard XML-Stream? by gbrayut · · Score: 1

    It seems that a solution to both problems would be to create a standard XML stream or XML transport protocol that could be used when you need to extract information from a database yet still be able to render it in XML form on the receiving end for maximum flexibility.

    This would offload a large portion of the parsing and DOM work to the client side, which would be ideal for web services that are currently overburdened by having to generate the markup required by XML.

    The transport protocol could minimize the redundant information by first defining the document's structure and then transferring the data in a more compact form. The receiving side could then either rebuild it into standard XML or, if it already knew the final destination of the information (such as a database), bypass the extra parsing and directly access the required data.
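    A minimal sketch of the receiving side of such a stream, assuming the field names arrive once in a header and each row then carries only its values; struct stream_header and row_to_xml are hypothetical names, and a real protocol would also need typing, escaping and framing.

```c
#include <stdio.h>

/* Structure sent once at the start of the stream. */
struct stream_header {
    const char *record;          /* element name wrapping each row */
    int nfields;
    const char *const *fields;   /* child element names, in order */
};

/* Rebuild one compact row as ordinary XML text.
   Returns the length written, or -1 if buf is too small. */
int row_to_xml(const struct stream_header *h, const long *vals,
               char *buf, size_t cap)
{
    size_t used = 0;
    int n = snprintf(buf, cap, "<%s>", h->record);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used += (size_t)n;
    for (int i = 0; i < h->nfields; i++) {
        n = snprintf(buf + used, cap - used, "<%s>%ld</%s>",
                     h->fields[i], vals[i], h->fields[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, cap - used, "</%s>", h->record);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}
```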

  167. existing standards: ASN.1, WBXML by p00ya · · Score: 1

    I don't think so. Marked-up binary similar to what's in EBML has been around in the telecommunications industry for a while. There's ASN.1 (complete with standardised XML encoding), and also WBXML (oh, and this *is* a W3C standard). Still, their design is at odds with many of the principles behind XML, but they're extensible and contain tag-like metadata.
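    For a sense of what the tag-like metadata in these encodings looks like, here is a minimal tag-length-value writer loosely following ASN.1 BER's definite-length rules: short form for lengths below 128, and long form (0x81 or 0x82 followed by the length bytes) above that. It is a sketch only: it handles single-byte tags and lengths up to 65535, and tlv_write is an illustrative name.

```c
#include <stddef.h>

/* Write tag, length, then the len content bytes into out.
   Returns the number of bytes written, or 0 if the length is
   unsupported or out is too small. */
size_t tlv_write(unsigned char tag, const unsigned char *val, size_t len,
                 unsigned char *out, size_t cap)
{
    size_t hdr;
    if (len < 0x80)
        hdr = 2;                        /* short form: length in one byte */
    else if (len <= 0xFF)
        hdr = 3;
    else if (len <= 0xFFFF)
        hdr = 4;
    else
        return 0;
    if (cap < hdr + len)
        return 0;

    out[0] = tag;
    if (hdr == 2) {
        out[1] = (unsigned char)len;
    } else if (hdr == 3) {
        out[1] = 0x81;                  /* long form: 1 length byte follows */
        out[2] = (unsigned char)len;
    } else {
        out[1] = 0x82;                  /* long form: 2 length bytes follow */
        out[2] = (unsigned char)(len >> 8);
        out[3] = (unsigned char)len;
    }
    for (size_t i = 0; i < len; i++)
        out[hdr + i] = val[i];
    return hdr + len;
}
```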

    1. Re:existing standards: ASN.1, WBXML by axlrosen · · Score: 1

      WBXML (oh, and this *is* a W3C standard)

      Nope:

      This document is a NOTE made available by W3C for discussion only. This indicates no endorsement of its content, nor that W3C has had any editorial control in its preparation, nor that W3C has, is, or will be allocating any resources to the issues addressed by the NOTE.

  168. Re:ASN.1 rules! Great Opensource Compiler! Free bo by XorA · · Score: 1

    Eh, you stuck this on the wrong post, I pointed out ASN.1 is not dead. Or have you never heard of GSM?

  169. Possibly I'm a cynic by smittyoneeach · · Score: 2, Funny

    ...but I thought that the strategic goal of XML is to sell more hardware.
    We should rejoice, buy more CPUs, and move the problem from XML, to languages with poor concurrency support.

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  170. There's something better already developed. by voodoo1man · · Score: 1

    Universal Binary Format has been around for a few years now, and it includes everything binary XML would have, but in a cleaner, better thought-out form, in addition to having an extra higher-level protocol for inter-machine transport and security issues.

    --

    In the great CONS chain of life, you can either be the CAR or be in the CDR.

  171. Re:Binary not needed - better table format neeeded by marcosdumay · · Score: 1

    Gzip the XML document and it will be even smaller than the original notation. Gzip removes the extra verbosity of the tags and of your data, and you gain standard data representation for free (counting only storage space).

  172. Re:Binary = Proprietary ... I disagree by Mark+Pitman · · Score: 1
    What is text? It's a binary code that a computer translates into graphical glyphs. Is it proprietary? Not any more.

    To complicate matters further, there are ASCII text files, UTF-8 text files and UTF-16 (often loosely called "Unicode") text files (and of course EBCDIC on IBM mainframes). If you only have an ASCII text reader, you won't be able to read UTF-16 at all, and you may not be able to read a UTF-8 file if it has characters in it that take more than one byte. So "plain text" doesn't really mean one thing anymore.
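    The multi-byte point is easy to see in code: in UTF-8 every continuation byte has the bit pattern 10xxxxxx, so counting characters means skipping those bytes. A minimal sketch assuming valid UTF-8 input (utf8_codepoints is an illustrative name); for example, "café" is 5 bytes but 4 code points.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (10xxxxxx). Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```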

  173. My thoughts by mnordstr · · Score: 1
  174. 20 years ago by vlad_petric · · Score: 1
    That was 20 years ago. These days, compilers are likely to produce better machine code than a human being.

    The problem is that high-performance microprocessors are simply too complex. How many assembly hackers actually understand out-of-order execution (which pretty much all desktop processors do, with the notable exception of Transmeta)? How many are aware that branches can sometimes severely degrade performance, and so use tricks like predication with conditional moves or loop unrolling? How many perform loop pipelining to eliminate data stalls?

    And yeah, a good compiler does all this.
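    As an illustration of the predication trick mentioned above, here is a maximum function written both ways. This is a sketch; whether the branch-free form is actually faster depends on how predictable the comparison is, and a good compiler will often emit a conditional move for the plain version by itself.

```c
/* Plain version: a conditional the CPU may have to predict
   (compilers frequently turn this into a conditional move anyway). */
int max_branchy(int a, int b)
{
    return a > b ? a : b;
}

/* Hand-written branch-free version: -(a > b) is -1 (all bits set)
   when a > b and 0 otherwise, so the mask keeps a ^ b or nothing,
   and b ^ (a ^ b) == a. */
int max_branchless(int a, int b)
{
    return b ^ ((a ^ b) & -(a > b));
}
```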

    --

    The Raven

  175. Why modded Flamebait? by Anonymous Coward · · Score: 0

    I fail to understand why parent is modded down, he is absolutely right.