Slashdot Mirror


Tim Bray On The Origin Of XML

gManZboy writes "Queue just posted an interview with XML co-inventor Tim Bray (currently at Sun Microsystems). Interestingly enough the interviewer is none other than database pioneer Jim Gray (currently at Microsoft). Among other things, in their discussion Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo."

8 of 218 comments

  1. Re:Oh boy... by MrLint · · Score: 4, Insightful

    Umm, doesn't any kind of config file require specialized code to read it?

    You either need metadata to interpret the binary data, or need to know the predetermined data layout to read it; that sounds kinda specialized to me.

    The other option is plain text with encoded binary data. This isn't bad: it's human readable, kinda, but it doesn't explain the encoded binary data, so metadata is also needed. I can think of xinitrc files and old ini files from win16. It has to be parsed as plain text, with no guarantee of best practice or anything.

    XML: well, human readable, some meta info, still encoded binary data. The bonus here is that the layout has at least some kind of standard to adhere to, and it's possible in theory for one XML parser to read any arbitrary XML file.

    So in any case you get a deal with Faust: either not human readable, or something that needs to be parsed.
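
    A minimal sketch of that tradeoff in Python, assuming a made-up three-field config. The field names, the binary layout string, and both helper functions are illustrative only and do not come from any real application. The binary reader works only because the layout is hard-coded; the XML reader needs no per-file knowledge but hands back untyped strings.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout: little-endian uint16 port, uint8 debug flag, float32 timeout.
BINARY_LAYOUT = "<HBf"


def read_binary_config(blob: bytes) -> dict:
    # Without BINARY_LAYOUT the seven bytes are meaningless.
    port, debug, timeout = struct.unpack(BINARY_LAYOUT, blob)
    return {"port": port, "debug": bool(debug), "timeout": timeout}


def read_xml_config(text: str) -> dict:
    # A generic parser recovers the field names, but every value is still text.
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


blob = struct.pack(BINARY_LAYOUT, 8080, 1, 2.5)
xml_text = "<config><port>8080</port><debug>true</debug><timeout>2.5</timeout></config>"

binary_config = read_binary_config(blob)
xml_config = read_xml_config(xml_text)
```

    Either way some code has to interpret the data: the binary version is compact but opaque, and the XML version is larger but names its own fields.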

  2. Re:Oh boy... by Alomex · · Score: 4, Insightful

    Try making sense of your "compact binary config files" when something goes wrong, or when you want to port the config to a different application.

    Yes, CPU cycles are cheap. CPUs sit idle over 90% of the time, even when there is a user in front of them. Spending the extra power processing 10K properly tagged files that are compatible across platforms, rather than incompatible binary files, is one of the best uses of raw CPU power we have.

  3. Re:Oh boy... by Laxitive · · Score: 4, Insightful

    Uhm, sorry, do you even know what the hell you're talking about?

    Let's dissect this piece by piece.

    >> "So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files"

    Who the hell said anything about config files?

    And we have tools to make things "compact" for us. It's called "compression".

    >> "with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it? "

    Yes. Human readable. I'm a human. I can read it. Thus: Human readable. I don't understand what the quotes were for. Or your misspelling of "readable".

    And "specialized libraries"? Oh, right.. I forgot. Binary formats don't NEED libraries to parse. Yep. Dunno why libjpeg62 even exists, when it's patently obvious you can just dump jpeg data straight to video memory. Oh yeah, who needs Microsoft Word. I just "cat resume.doc >/dev/lp" to print my documents. Cause it's binary you see. I don't need a library to parse it.

    >> "Thanks Tim, the world owes you one!

    But okay you're right, you gotta use those CPU cycles for something... "

    No shit sherlock. Using CPU cycles to strictly check the type-validity of self-describing documents seems pretty worthwhile to me.

    -Laxitive

  4. Re:Why, oh why, did they have to repeat the tag na by Alomex · · Score: 5, Insightful

    why the hell does the end tag name have to be repeated?

    Because omitting it is the single biggest source of headaches in parsing SGML, the precursor of XML, in which such a construct is allowed.

    It also makes error recovery very difficult, something that we know is quite important from all that malformed HTML code out there. The XML creators knew that too.
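
    As a small illustration of the error-detection side of this, here is a sketch using Python's standard xml.etree.ElementTree parser. The helper name is_well_formed is made up for this example. Because each end tag repeats the element name, the parser can tell that two tags are closed in the wrong order instead of silently building the wrong tree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for, among other things, an end tag that does not
        # match the most recently opened element.
        return False


nested_ok = is_well_formed("<p><b>bold</b></p>")
nested_bad = is_well_formed("<p><b>bold</p></b>")
```

    With anonymous end tags the second document would look exactly like the first, and the mistake would go unnoticed.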

  5. What it should have looked like by Anonymous Coward · · Score: 5, Insightful

    I think XML should have looked more like this:

    (html
      (head
        (title "This is an example"))
      (body
        (h1 "A first level header")
        (p "There's no reason for all the extra characters.")
        (p "Although this looks like LISPy HTML it could have all the features of XML")))
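
    To show that the two notations can carry the same tree, here is a rough sketch of a converter from this LISPy form to XML elements. It is a toy under stated assumptions: the function name parse_sexpr is made up, every list starts with a tag name, every other child is either a nested list or a double-quoted string, and attributes, namespaces, and escape sequences are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Tokens: "(", ")", a double-quoted string, or a bare name.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse_sexpr(text: str) -> ET.Element:
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            elem = ET.Element(tokens[pos])  # first item in a list is the tag name
            pos += 1
            last_child = None
            while tokens[pos] != ")":
                child = read()
                if isinstance(child, str):
                    # Text goes before the first child or after the latest one.
                    if last_child is None:
                        elem.text = (elem.text or "") + child
                    else:
                        last_child.tail = (last_child.tail or "") + child
                else:
                    elem.append(child)
                    last_child = child
            pos += 1  # consume ")"
            return elem
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        raise ValueError("expected a list or a quoted string, got " + tok)

    return read()


doc = parse_sexpr('(html (head (title "This is an example")) (body (h1 "A first level header")))')
xml_text = ET.tostring(doc, encoding="unicode")
```

    The resulting xml_text is the familiar tagged form of the same document, which is the sense in which the shorter syntax loses nothing structurally.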

  6. [OT] bad summary by hankaholic · · Score: 4, Insightful

    Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo.

    If you believe that "OED" will be misunderstood by enough people to justify enclosing it with a link to a definition, why not just spell out "Oxford English Dictionary"?

    "Hmmm, OED might be unclear to tons of people reading this, I'll make them have to click on a link to know what I'm talking about."

    Obligatory relation to discussion content:

    Providing a link instead of writing a clear summary is choosing the wrong tool for the task at hand. Authors of some other comments in this thread have shown that XML also is the wrong tool for many of the tasks to which it is applied. Whether it's passing data internally within an application or summarizing an article for the homepage, choosing the right alternative can make a difference between efficient clarity and an inelegant kludge.

    Applying the right algorithmic tool to the right problem is actually a focus of CS. This is why sorting routines are often studied -- for instance, a routine which is more efficient at sorting millions of unordered pieces of data may be very wasteful when dealing with nearly presorted data.

    The distinction is not often understood and has more of an impact than the observer might think. For instance, when writing an application for a handheld in which data is kept sorted and is usually viewed between insertions, it makes sense to sort after every data element added to the database. However, this means adding a single item to a mostly-ordered set. Understanding that quicksort is a poor choice for this application means a difference in battery life.
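
    A minimal sketch of the mostly-ordered case, assuming a plain insertion sort instrumented to count comparisons. The function and the sample data are illustrative, not taken from any handheld application. With one new item appended to an already sorted list, the work stays roughly linear in the list size.

```python
def insertion_sort(items: list) -> int:
    """Sort items in place and return the number of comparisons made."""
    comparisons = 0
    for i in range(1, len(items)):
        value = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= value:
                break  # value is already in position relative to items[j]
            items[j + 1] = items[j]  # shift the larger element right
            j -= 1
        items[j + 1] = value
    return comparisons


# 1000 sorted records plus one newly added record at the end.
data = list(range(1000)) + [500]
comparisons = insertion_sort(data)
```

    For the single-insert case specifically, Python's standard bisect.insort places one item into an already sorted list without re-sorting it at all.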
    --
    Somebody get that guy an ambulance!
  7. Re:Semantic web snake oil... by Jagasian · · Score: 4, Insightful

    If your post could be modded above a "5", I would mod your post as "insightful". I guess people have no memory, and that is why these Semantic Web frauds get grants, venture cap, etc. They have these big promises of seamlessly integrating web services... AUTOMATICALLY?!?!

    The easiest way to disprove their crap is this. Even in RDF or OWL, it is possible to have "semantic aliasing", i.e. multiple ways of representing the same concept. This is exactly the core problem that they claim they address and that they claim that XML does not address. Think about it, how can automated inferences be made, if two concepts have distinct _semantic_ (not just syntactic) representations? Furthermore, it can be shown that in general these different representations cannot be automatically determined to represent the same thing.

    So their entire project is a farce! It is a bunch of people who are ignorant both of pertinent theoretical mathematical results on computability, completeness, and hell, the fact that even in axiomatic set theory there are multiple ways to represent... say... the real numbers... and of practical computer/software engineering and sociological limitations.

    They have stop-gaps: ontologies. Oh if only people could agree on one common unified ontology, the entire semantic aliasing problem would be solved... or so they seem to think. But just because people agree on a common vocabulary, the way it is used can still give rise to the semantic aliasing problem. So never mind that agreeing on some complete or near-complete ontology is going to be IMPOSSIBLE: even if it were done, it still wouldn't fix the deep underlying problems with the Semantic Web - problems that have been struggled with for over a hundred years in the field of formal mathematics.

  8. Re:Intra-vendor XML is (usually) stupid by mi · · Score: 3, Insightful

    Then you are not using XML right.

    Does anybody?.. I guess, not...

    clearly you guys are spending too much time coding and not enough thinking

    No disagreement here -- that was my point, in fact.

    two thousand floating points ain't a giant string, unless you are programming an 8086 in Elbonia.

    Just tested simply sprintf-ing the same double 2000 times into the same text buffer on a PII-Xeon @450MHz with 2MB of L2 cache; the whole program and the puny buffer are entirely in cache (which is not the case in real life). 5-16 milliseconds (of user time, ignoring the sys time)... The PII is not much slower than the Sparcs we are using. Even if the latest and greatest CPUs are 10 times faster (which they aren't), why waste their power on chewing XML tags?

    Converting two thousand numbers to text should take 50 microseconds at the most.

    Now add the time to parse it on the other end, and consider that the whole point of passing it is to have some computations happen. And the computations themselves happen in about 200 milliseconds...

    Now realize that the size of the XML file is 3-4 times bigger than it needs to be -- but the network packets are still 1500 bytes and with XML we need 5 or 6 (at best) instead of 2. Bandwidth is cheap, but latency is not...

    Now throw in the loss of precision from the double-text-double conversion(s) and climb up the wall next to me...
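
    A short sketch of where that loss can come from, assuming the sender formats with a default printf-style "%f" (six digits after the decimal point). Whether a real system loses precision depends entirely on how many digits its formatter emits; the variable names here are illustrative.

```python
value = 0.1 + 0.2  # not exactly 0.3 in binary floating point

short_text = "%f" % value        # default printf-style precision
wide_text = "%.17g" % value      # 17 significant digits
shortest_text = repr(value)      # shortest text that round-trips in Python

short_roundtrip = float(short_text)
wide_roundtrip = float(wide_text)
shortest_roundtrip = float(shortest_text)
```

    Only the short form fails to reproduce the original double, so the loss is avoidable, but at the cost of even more characters on the wire per number.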

    Using XML in such scenarios is like overnighting papers from one end of the office floor to the other. Defending this practice is like saying that FedEx is really fast and efficient everywhere except in Elbonia...

    --
    In Soviet Washington the swamp drains you.