Slashdot Mirror


Tim Bray On The Origin Of XML

gManZboy writes "Queue just posted an interview with XML co-inventor Tim Bray (currently at Sun Microsystems). Interestingly enough the interviewer is none other than database pioneer Jim Gray (currently at Microsoft). Among other things, in their discussion Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo."

20 of 218 comments

  1. Oh boy... by Alwin+Henseler · · Score: 2, Insightful
    So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it?

    Thanks Tim, the world owes you one!

    But okay you're right, you gotta use those CPU cycles for something...

    --Don't give the world what it asks for, but what it needs.

    1. Re:Oh boy... by MrLint · · Score: 4, Insightful

      Umm, doesn't any kind of config file require specialized code to read it?

      You either need metadata to interpret the binary data, or you need to know the predetermined data layout to read it. That sounds kinda specialized to me.

      The other option is plain text with encoded binary data. This isn't bad: it's human readable, kinda, but it doesn't explain the encoded binary data, so metadata is still needed. I can think of xinitrc files and old ini files from win16. They have to be parsed as plain text, with no guarantee of best practice or anything.

      XML: well, human readable, some meta info, still encoded binary data. The bonus here is that the layout has at least some kinda standard to adhere to, and it's possible in theory for one XML parser to read any arbitrary XML file.

      So in any case you get a deal with Faust: either it's not human readable, or it's something that needs to be parsed.

    2. Re:Oh boy... by Alomex · · Score: 4, Insightful

      Try making sense of your "compact binary config files" when something goes wrong, or when you want to port the config to a different application.

      Yes, CPU cycles are cheap. CPUs sit idle over 90% of the time, even when there is a user in front of the machine. Spending the extra power processing 10K properly tagged files that are compatible across platforms, rather than incompatible binary files, is one of the best uses of raw CPU power we have.

    3. Re:Oh boy... by Laxitive · · Score: 4, Insightful

      Uhm, sorry, do you even know what the hell you're talking about?

      Let's dissect this piece by piece.

      >> "So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files"

      Who the hell said anything about config files?

      And we have tools to make things "compact" for us. It's called "compression".

      >> "with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it? "

      Yes. Human readable. I'm a human. I can read it. Thus: Human readable. I don't understand what the quotes were for. Or your misspelling of "readable".

      And "specialized libraries"? Oh, right.. I forgot. Binary formats don't NEED libraries to parse. Yep. Dunno why libjpeg62 even exists, when it's patently obvious you can just dump jpeg data straight to video memory. Oh yeah, who needs Microsoft Word. I just "cat resume.doc >/dev/lp" to print my documents. Cause it's binary you see. I don't need a library to parse it.

      >> "Thanks Tim, the world owes you one!

      But okay you're right, you gotta use those CPU cycles for something... "

      No shit sherlock. Using CPU cycles to strictly check the type-validity of self-describing documents seems pretty worthwhile to me.

      -Laxitive
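The "compression" point in the comment above can be tried directly. This is only a sketch: the sensor record is a made-up, deliberately repetitive payload, and Python's standard `zlib` module stands in for whatever compressor a real system would use.

```python
import zlib

# Hypothetical, deliberately repetitive XML payload (not from the thread).
record = "<reading><sensor>probe-7</sensor><value>21.5</value></reading>\n"
document = ("<readings>\n" + record * 500 + "</readings>\n").encode("utf-8")

# Compress at the highest level, then check that nothing was lost.
compressed = zlib.compress(document, 9)
restored = zlib.decompress(compressed)

print(len(document), "bytes as plain XML")
print(len(compressed), "bytes after zlib compression")
print("round trip intact:", restored == document)
```

Real documents repeat themselves far less than this one, so the ratio here is a best case, and the CPU cost of compressing and decompressing is exactly what other posters in the thread object to.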

    4. Re:Oh boy... by AaronGTurner · · Score: 2, Insightful

      There may be a lot of spare compute cycles about, but what is critical is the ability to process XML in a timely manner on the CPU power that happens to be available at that precise instant in time at the appropriate location. Looking at the average CPU cycles used is like sitting in a traffic jam at 8am and noting that, on average, the road you are on is only used at 10% capacity. It being free at 4am is not much good if you are trying to get to work for 9am.

  2. Re:This is article is amazingly honest by Evil+Grinn · · Score: 1, Insightful

    Some bright bunny came up with the idea of using perl stringified data structures instead using Data::Dumper.

    Uhh.. that's one of the things that Data::Dumper was designed to do.

  3. Re:This is article is amazingly honest by sicking · · Score: 2, Insightful

    Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...

    Yep. That didn't stop Microsoft from adding even more weight to it by creating SOAP though. Now there's a bulky format. It's like shipping a shirt button in a container on an oil tanker.

    --
    Failing to learn from history dooms you to repeat it.
  4. Re:Why, oh why, did they have to repeat the tag na by Alomex · · Score: 5, Insightful

    why the hell does the end tag name have to be repeated?

    Because that is the single biggest source of headaches in parsing SGML, the precursor of XML, in which such a construct is allowed.

    It also makes error recovery very difficult, something that we know is quite important from all that malformed HTML code out there. The XML creators knew that too.
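A small sketch of why named end tags help a parser report errors, using Python's standard `xml.etree.ElementTree`. The broken markup is a made-up example in which `<title>` is never closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <title> is opened but </head> arrives before </title>.
broken = (
    "<html><head><title>This is an example</head>"
    "<body><p>text</p></body></html>"
)

message = None
position = None
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    # The parser stops at the first end tag whose name does not match
    # the innermost open element, and reports where that happened.
    message = str(err)
    position = err.position  # (line, column) tuple

print(message)
print(position)
```

Because the end tag carries a name, the parser can flag the mismatch at the point where it occurs instead of discovering at end of input that something, somewhere, was left open.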

  5. What it should have looked like by Anonymous Coward · · Score: 5, Insightful

    I think XML should have looked more like this:

    (html
    (head
    (title "This is an example"))
    (body
    (h1 "A first level header")
    (p "There's no reason for all the extra characters.")
    (p "Although this looks like LISPy HTML it could have all the features of XML")))
    1. Re:What it should have looked like by ikkonoishi · · Score: 2, Insightful
      Sounds great... but then this happens

      (html
      (head
      (title "This is an example")
      (body
      (h1 "A first level header")
      (p "There's no reason for all the extra characters.")
      (p "Although this looks like LISPy HTML it could have all the features of XML")))

      Now your entire webpage is blank. What happened?
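To see what a parser can and cannot say about the example above, here is a rough sketch of a checker for the parenthesized syntax. `unclosed_forms` is a hypothetical helper written for this illustration, not part of any real S-expression library, and it ignores escapes inside strings.

```python
def unclosed_forms(source):
    """Return the names of forms that are still open at end of input."""
    stack = []
    in_string = False
    i = 0
    while i < len(source):
        ch = source[i]
        if in_string:
            # Inside a string literal only the closing quote matters.
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "(":
            # Read the form name that follows the opening parenthesis.
            j = i + 1
            while j < len(source) and not source[j].isspace() and source[j] not in '()"':
                j += 1
            stack.append(source[i + 1:j])
            i = j
            continue
        elif ch == ")":
            if not stack:
                raise ValueError("unexpected ')' at offset %d" % i)
            stack.pop()
        i += 1
    return stack


broken = """(html
(head
(title "This is an example")
(body
(h1 "A first level header")
(p "There's no reason for all the extra characters.")
(p "Although this looks like LISPy HTML it could have all the features of XML")))
"""

# Same document with the parenthesis that closes (head ...) restored.
fixed = broken.replace('example")', 'example"))')

print(unclosed_forms(broken))  # ['html']
print(unclosed_forms(fixed))   # []
```

The checker notices that one form is still open, but it blames `html`: the anonymous `)` characters silently closed `head` one level too late, so `body` ended up nested inside `head` and the real mistake is no longer visible to the parser.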
  6. Re:This is article is amazingly honest by filmmaker · · Score: 2, Insightful

    That depends on what you're transacting. Plus, there's a forest-for-the-trees issue here. We're already using a subset of XML for most HTTP transactions: that is, HTML. A move to XML standards simply opens up a huge array of opportunities for robotic transactions, as well as leaving the field relatively wide open for web developers of traditional varieties. It's a positive good, RSS being an obvious example of why.

  7. Re:Why, oh why, did they have to repeat the tag na by syukton · · Score: 2, Insightful

    Yeah that'd work great if you knew 100% of the time that you'd never get bad data. If you've got a multi-nested element hierarchy, however, and you lose one or two of your closing tags, how do you know where to put them back in? It's very easy to look for an opening tag followed by a closing tag of the same name, especially when building a parser that error-checks.

    You know what would cut down the datagram size more? Smaller tag names. Tag names don't have to be readable so much as uniquely identifiable; you can use an interface layer in the editor to make the tag names user friendly and then de-friendify them for transit. Then you've got:

    <a>
    <b>woo</b>
    </a>

    instead of:

    <element>
    <subelement>woo</subelement>
    </element>

    According to wc, switching to single-character element names instead of the multicharacter ones would give a 41% reduction in bulk, for the example above.
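A quick way to repeat this kind of measurement is sketched below. `sizes` is a hypothetical helper, and the result depends heavily on whitespace and on how much text sits between the tags, so it will not necessarily reproduce the 41% figure quoted above.

```python
def sizes(text):
    """Return (long_bytes, short_bytes) for the two tag-naming styles."""
    long_form = f"<element>\n<subelement>{text}</subelement>\n</element>\n"
    short_form = f"<a>\n<b>{text}</b>\n</a>\n"
    return len(long_form.encode("utf-8")), len(short_form.encode("utf-8"))


def reduction_percent(text):
    """Percentage of bytes saved by switching to one-letter tag names."""
    long_bytes, short_bytes = sizes(text)
    return 100 * (long_bytes - short_bytes) / long_bytes


# The tiny payload from the comment, then a more realistic 100-byte payload.
print(sizes("woo"), reduction_percent("woo"))
print(sizes("x" * 100), reduction_percent("x" * 100))
```

With the three-byte payload the two forms come to 50 and 20 bytes as laid out here; with a 100-byte payload they come to 147 and 117 bytes, so the relative saving shrinks quickly once the tags stop dominating the content.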

    --
    Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.
  8. Please explain by johannesg · · Score: 2, Insightful

    I've heard this quote in relation to XML before, and I don't get it. LISP is a programming language. XML is a method for storing data. About the only relation between the two that I can find is that both use nesting. So, why does this get brought up whenever XML is being discussed?

  9. Explicitness by samael · · Score: 2, Insightful

    Because it would make spotting your bug harder. Did you _mean_ to close that tag, or did you think you were closing a different tag? If all closing tags look the same it would make tracing certain bugs harder.

  10. Re:Semantic web snake oil... by bblfish · · Score: 2, Insightful

    I work with Tim Bray, but I seriously disagree with this position of his. If you had gone back to the days before XML was invented you could have made exactly the same argument against XML: "SGML was not a success, therefore XML can't be". I have blogged about this fallacious argument at length. You can work with the Semantic Web without having to take on the most difficult problems of AI. You can use it to work on some really simple problems very effectively. Speaking of "frauds", "ignorants" and "snake oil" when speaking of this project is really simplistic and (dare I turn the arrogance of the above poster against him?) stupid.

  11. [OT] bad summary by hankaholic · · Score: 4, Insightful
    Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo.
    If you believe that "OED" will be misunderstood by enough people to justify enclosing it with a link to a definition, why not just spell out "Oxford English Dictionary"?

    "Hmmm, OED might be unclear to tons of people reading this, I'll make them have to click on a link to know what I'm talking about."

    Obligatory relation to discussion content:

    Providing a link instead of writing a clear summary is choosing the wrong tool for the task at hand. Authors of some other comments in this thread have shown that XML also is the wrong tool for many of the tasks to which it is applied. Whether it's passing data internally within an application or summarizing an article for the homepage, choosing the right alternative can make a difference between efficient clarity and an inelegant kludge.

    Applying the right algorithmic tool to the right problem is actually a focus of CS. This is why sorting routines are often studied -- for instance, a routine which is more efficient at sorting millions of unordered pieces of data may be very wasteful when dealing with nearly presorted data.

    The distinction is not often understood and has more of an impact than an observer might think. For instance, when writing an application for a handheld in which data is kept sorted and is usually viewed between insertions, it makes sense to sort after every data element added to the database. However, this means adding a single item to a mostly-ordered set. Understanding that quicksort is a poor choice for this application means a difference in battery life.
    --
    Somebody get that guy an ambulance!
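The handheld scenario in the comment above (one new item added to an already ordered set) can be sketched as follows. The data is made up, and counting element shifts is only a rough stand-in for the work, and battery, actually spent.

```python
import bisect


def insertion_sort(items):
    """Sort the list in place and return how many element shifts it took."""
    shifts = 0
    for i in range(1, len(items)):
        value = items[i]
        j = i - 1
        # Move larger elements one slot right until value fits.
        while j >= 0 and items[j] > value:
            items[j + 1] = items[j]
            j -= 1
            shifts += 1
        items[j + 1] = value
    return shifts


# 1000 ordered records plus one new record appended at the end.
nearly_sorted = list(range(1000)) + [500]
nearly_shifts = insertion_sort(nearly_sorted)

# Worst case for comparison: the same number of records in reverse order.
reversed_data = list(range(1000, -1, -1))
reversed_shifts = insertion_sort(reversed_data)

print(nearly_shifts, reversed_shifts)

# For a single new item the standard library does this directly:
contacts = ["alice", "carol", "dave"]
bisect.insort(contacts, "bob")
print(contacts)
```

On the nearly sorted input the insertion pass does 499 shifts, against 500500 for the reversed input. A general-purpose sort run after every insertion does far more work than the single-item case needs, which is the commenter's point about choosing the tool for the data.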
  12. Re:Please explain-Chinese firewall. by SnowZero · · Score: 2, Insightful

    And they have infix notation...

    S-expressions are in prefix notation. Infix describes expressions such as "1+2". Lots of parentheses are hard to read, but twice that number of angle brackets is certainly not easier.

    Blurring the line between data and code is a useful technique...

    This only matters if you use the data in Lisp without being careful. Any non-interpreted language could use it just as safely as XML.

    P.S. I don't even like Lisp, being a person who likes type checking before I actually execute a snippet of code. On the other hand, they really do have a point regarding S-expressions and XML.

  13. Re:Semantic web snake oil... by bblfish · · Score: 2, Insightful
    Tim thinks so, and so do I.

    My suggestion to you: don't put too much weight on Tim Bray's bet. If you look carefully at his rdf.net challenge you will notice that the wording leaves him ample space to maneuver were things to turn out against him:

    • This has to happen before January 1, 2006, and
    • I am the sole judge and jury, but
    • I'll publicize anything that's submitted formally, and my comments on it, so I'm doing this in the open, except for
    • I'm busy, so I may exercise fairly brutal triage on incoming proposals and take a while to get to the ones really worth looking at, and
    • If there's serious money in it, the recipient of RDF.net is morally obligated to find a way to cut me in for a piece of the action.
    I like the last one: if someone has an idea that is going to make a lot of money they can have rdf.net for free if they cut him in on the action. Wow! Here is a man who really does not believe anything is going to happen :-)
  14. Re:Semantic web snake oil... by Jagasian · · Score: 4, Insightful

    If your post could be modded above a "5", I would mod your post as "insightful". I guess people have no memory, and that is why these Semantic Web frauds get grants, venture cap, etc. They have these big promises of seamlessly integrating web services... AUTOMATICALLY?!?!

    The easiest way to disprove their crap is this. Even in RDF or OWL, it is possible to have "semantic aliasing", i.e. multiple ways of representing the same concept. This is exactly the core problem that they claim they address and that they claim that XML does not address. Think about it, how can automated inferences be made, if two concepts have distinct _semantic_ (not just syntactic) representations? Furthermore, it can be shown that in general these different representations cannot be automatically determined to represent the same thing.

    So their entire project is a farce! It is a bunch of people who are ignorant of pertinent theoretical mathematical results on computability, completeness, and hell, the fact that even in axiomatic set theory there are multiple ways to represent... say... the real numbers... and they are also ignorant of practical computer/software engineering and sociological limitations.

    They have stop-gaps: ontologies. Oh, if only people could agree on one common unified ontology, the entire semantic aliasing problem would be solved... or so they seem to think. But even when people agree on a common vocabulary, the way it is used can still give rise to the semantic aliasing problem. So leaving aside the fact that agreeing on some complete or near-complete ontology is going to be IMPOSSIBLE, even if it were done, it still wouldn't fix the deep underlying problems with the Semantic Web: problems that have been struggled with for over 100 years in the field of formal mathematics.

  15. Re:Intra-vendor XML is (usually) stupid by mi · · Score: 3, Insightful
    Then you are not using XML right.

    Does anybody?.. I guess, not...

    clearly you guys are spending too much time coding and not enough thinking

    No disagreement here -- that was my point, in fact.

    two thousand floating points ain't a giant string, unless you are programming an 8086 in Elbonia.

    Just tested simply sprintf-ing the same double 2000 times into the same text buffer on a PII-Xeon @450MHz with 2Mb of L2-cache, the whole program and the puny buffer are entirely in cache (which is not the case in real-life). 5-16 milliseconds (of user time, ignoring the sys-time)... The PII is not much slower, than the Sparcs we are using. Even if the latest and greatest CPUs are 10 times faster (which they aren't), why waste their power on chewing XML tags?

    Converting two thousand numbers to text should take 50 microseconds at the most.

    Now add the time to parse it on the other end, and consider, that the whole point of passing it is to have some computations happen. And the computations themselves happen in about 200 milliseconds...

    Now realize that the size of the XML file is 3-4 times bigger than it needs to be -- but the network packets are still 1500 bytes and with XML we need 5 or 6 (at best) instead of 2. Bandwidth is cheap, but latency is not...

    Now throw in the loss of precision from the double-text-double conversion(s) and climb up the wall next to me...
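The double-text-double loss mentioned above depends on how many digits the sender prints. The comment does not say which format was used, so the sketch below simply assumes a C-style `%f` as the lossy case and compares it with a 17-significant-digit format and a raw 8-byte encoding.

```python
import struct

value = 0.1 + 0.2  # stored as 0.30000000000000004

# C-style default formatting keeps only six digits after the decimal point.
short_text = "%f" % value
# Seventeen significant digits are enough to round-trip an IEEE 754 double.
full_text = "%.17g" % value
# The binary form is always eight bytes (little-endian double here).
binary = struct.pack("<d", value)

print(short_text, float(short_text) == value)
print(full_text, float(full_text) == value)
print(len(binary), struct.unpack("<d", binary)[0] == value)
```

The short form comes back as a different number, while the long form and the binary form both survive the trip. So the precision loss is avoidable in text, at the cost of roughly 19 characters per value instead of 8 bytes, which feeds back into the size and packet-count complaint.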

    Using XML in such scenarios is like overnighting papers from one end of the office floor to the other. Defending this practice is like saying, that FedEx is really fast and efficient everywhere except in Elbonia...

    --
    In Soviet Washington the swamp drains you.