Slashdot Mirror


Tim Bray On The Origin Of XML

gManZboy writes "Queue just posted an interview with XML co-inventor Tim Bray (currently at Sun Microsystems). Interestingly enough the interviewer is none other than database pioneer Jim Gray (currently at Microsoft). Among other things, in their discussion Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo."

61 of 218 comments

  1. OH come on.. by Anonymous Coward · · Score: 3, Funny

    We all know Microsoft invented XML; how else could they have filed a patent for it? :)

    1. Re:OH come on.. by Anonymous Coward · · Score: 2, Funny

      I thought it was Al Gore who invented XML.

    2. Re:OH come on.. by Mistlefoot · · Score: 2, Informative

      Microsoft is not applying for a patent on XML but rather, a patent

      " that cover word processing documents stored in the XML (Extensible Markup Language) format. The proposed patent would cover methods for an application other than the original word processor to access data in the document."

      <URL:http://news.com.com/2100-1013_3-5146581.html>

    3. Re:OH come on.. by ikkonoishi · · Score: 3, Funny

      I resent that.

      I never had a day of training in my life!

      OHH a banana!

  2. here's my question.. can you decrypt this? by peculiarmethod · · Score: 3, Funny

    < td padding="5px" > I'm < td >

    --
    ** "It's not my job to stand between the people talking to me, and the ones listening to me." -- Pego the Jerk
    1. Re:here's my question.. can you decrypt this? by holy_robot · · Score: 3, Funny

      Your cell is open.

      --
      Just cause you feel it doesn't mean it's there.
    2. Re:here's my question.. can you decrypt this? by Segway+Ninja · · Score: 5, Funny

      You should be in a padded cell, but someone forgot to close it.

    3. Re:here's my question.. can you decrypt this? by Anonymous Coward · · Score: 5, Funny

      More correctly, that should read, in a file called, say, riddle.html (notice the closing </td>):

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <title>Riddle</title>
      <link rel="stylesheet" href="/design/default.css" type="text/css" title="Default Stylesheet" />
      </head>
      <body>
      <table>
      <tr>
      <td class="example">I'm</td>
      </tr>
      </table>
      <p class="W3C">
      <a class="debug external" href="http://validator.w3.org/check?uri=referer"><img class="debug" src="http://www.w3.org/Icons/valid-xhtml11" alt="Valid XHTML 1.1!" /></a>
      <a class="debug external" href="http://jigsaw.w3.org/css-validator/check/referer"><img class="debug" src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!" /></a>
      </p>
      </body>
      </html>

      With a corresponding /design/default.css like:
      td.example { padding: 5px; }
      p.W3C { display: none; }

      Additionally you should take care that your .htaccess includes (to correct the application/xhtml+xml to text/html for IE & Co...):
      RewriteEngine on
      RewriteBase /
      RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
      RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
      RewriteCond %{REQUEST_FILENAME} \.html$
      RewriteCond %{THE_REQUEST} HTTP/1\.1
      RewriteRule .* - [T=application/xhtml+xml]

      Of course there's a serious lack of meta-data here. The padding should be given in cm (or any other absolute measure) or em, and it's not fulfilling the W3C Accessibility Guidelines... :-P

      And now I need to overcome the Lameness filter, oh dear... I assume it's the whitespace which I used for indentation. *shrugs* It doesn't help so far; sometimes I wonder how I'm supposed to write real comments including code examples here. Slashdot sure seems stupid sometimes.
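
      For readers who do not speak mod_rewrite, the decision those RewriteCond lines make can be sketched in Python. This is only an illustration of the logic as quoted above, not Apache's implementation; the function name pick_content_type and its parameters are made up for this sketch.

```python
import re


def pick_content_type(accept: str, filename: str, protocol: str) -> str:
    """Mirror the four quoted RewriteCond checks (hypothetical helper)."""
    # Condition 1: the client lists application/xhtml+xml in its Accept header.
    wants_xhtml = re.search(r"application/xhtml\+xml", accept) is not None
    # Condition 2: ...and has not marked it with a quality value starting "q=0".
    refused = re.search(r"application/xhtml\+xml\s*;\s*q=0", accept) is not None
    # Condition 3: only files ending in .html are affected.
    is_html_file = filename.endswith(".html")
    # Condition 4: only HTTP/1.1 requests are affected.
    is_http11 = protocol == "HTTP/1.1"
    if wants_xhtml and not refused and is_html_file and is_http11:
        return "application/xhtml+xml"
    return "text/html"


print(pick_content_type("text/html,application/xhtml+xml", "riddle.html", "HTTP/1.1"))
print(pick_content_type("text/html, */*", "riddle.html", "HTTP/1.1"))
```

      The first call prints application/xhtml+xml and the second prints text/html. Note that, like the quoted pattern, the sketch treats any quality value beginning with "q=0" (for example q=0.5) as a refusal, because the pattern only matches that prefix.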

  3. SGML by Anonymous Coward · · Score: 3, Interesting

    I think it's very funny that XML looks like it is based on SGML.

    But according to the interview, it seems that the similarities are merely coincidental.

    1. Re:SGML by smallpaul · · Score: 2, Informative

      XML is defined as a subset of SGML. From the specification:

      "The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document."

    2. Re:SGML by JohnQPublic · · Score: 2, Informative

      GML even had tags for doing Gantt charts, and I would dearly love to find a publishing system that could do printouts from such tags. ... ... Here it is 10 years later, and we still haven't gotten back to the level of ease of use and flexibility that GML had in the '80s

      You're looking for Gary Richtmeyer's B2H program, available from IBM's z/VM download site. It's written in Rexx and runs on every system you're likely to be using, comes in source form, and can process just about everything the BookMaster markup can dish out (even the syntax diagram tags).

  4. Lisp strikes again by Dancin_Santa · · Score: 5, Funny

    How's that old saying go?

    Those that do not understand Lisp are doomed to reinvent it, badly.

    Why can't someone reinvent C so that it sucks less?

    1. Re:Lisp strikes again by r2q2 · · Score: 2, Informative

      I believe you are referring to Greenspun's Tenth Rule: http://c2.com/cgi/wiki?GreenspunsTenthRuleOfProgramming

      --
      My UID is prime is yours?
  5. Can't Microsoft do *anything* original? by kelzer · · Score: 2, Interesting

    From the "Jim Gray" link:

    Jim Gray is a "Distinguished Engineer" in Microsoft's Scaleable Servers Research Group and manager of Microsoft's Bay Area Research Center (BARC).

    OK, Xerox has their famous Palo Alto Research Center (PARC), so Microsoft just has to have its own similarly named center in the same general vicinity. Sheesh!

    --

    ---------------------------------------------
    SERENITY NOW!!!!!!!!!!!!!!!!
    1. Re:Can't Microsoft do *anything* original? by Baricom · · Score: 2

      Bay Area Research Center (BARC).

      Woof!

  6. Oh boy... by Alwin+Henseler · · Score: 2, Insightful
    So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it?

    Thanks Tim, the world owes you one!

    But okay you're right, you gotta use those CPU cycles for something...

    --Don't give the world what it asks for, but what it needs.

    1. Re:Oh boy... by MrLint · · Score: 4, Insightful

      Umm, doesn't any kind of config file require specialized code to read it?

      As you either need metadata to interpret the binary data, or need to know the predetermined data layout to read it, that sounds kinda specialized to me.

      The other option is plain text with encoded binary data. This isn't bad; it's human readable, kinda, but it doesn't explain the encoded binary data, so metadata is also needed. I can think of xinitrc files and old ini files from win16. It has to be parsed as plain text, with no guarantee of best practice or anything.

      XML: well, human readable, some meta info, still encoded binary data. The bonus here is that the layout has at least some kind of standard to adhere to, and it's possible in theory for one XML parser to read any arbitrary XML file.

      So in any case you get a deal with Faust: not human readable, or something that needs to be parsed.
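
      To make the trade-off concrete, here is a small Python sketch that reads the same two settings from a packed binary blob and from XML. The layout string, the element names and both helper functions are invented for this example; the point is only that each reader needs format knowledge, whether baked into the code or carried in the tags.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical binary layout: little-endian uint16 port, then uint8 verbose flag.
# Nothing in the blob itself says so; the reader simply has to know.
BINARY_LAYOUT = "<HB"


def read_binary_config(blob: bytes) -> dict:
    """Decode the blob using the layout known in advance."""
    port, verbose = struct.unpack(BINARY_LAYOUT, blob)
    return {"port": port, "verbose": bool(verbose)}


def read_xml_config(text: str) -> dict:
    """Decode the XML form; the tag names label each value."""
    root = ET.fromstring(text)
    return {
        "port": int(root.findtext("port")),
        "verbose": root.findtext("verbose") == "true",
    }


blob = struct.pack(BINARY_LAYOUT, 8080, 1)
xml_text = "<config><port>8080</port><verbose>true</verbose></config>"

print(len(blob), read_binary_config(blob))
print(len(xml_text), read_xml_config(xml_text))
```

      Both readers return the same dictionary. The binary form is 3 bytes and opaque; the XML form is far longer but can be inspected in any text editor.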

    2. Re:Oh boy... by Alomex · · Score: 4, Insightful

      Try making sense of your "compact binary config files" when something goes wrong, or when you want to port the config to a different application.

      Yes, CPU cycles are cheap. CPUs sit idle over 90% of the time, even when there is a user in front of them. Spending the extra power processing 10K properly tagged files that are compatible across platforms rather than incompatible binary files is one of the best uses of raw CPU power we have.

    3. Re:Oh boy... by Laxitive · · Score: 4, Insightful

      Uhm, sorry, do you even know what the hell you're talking about?

      Let's dissect this piece by piece.

      >> "So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files"

      Who the hell said anything about config files?

      And we have tools to make things "compact" for us. It's called "compression".

      >> "with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it? "

      Yes. Human readable. I'm a human. I can read it. Thus: Human readable. I don't understand what the quotes were for. Or your misspelling of "readable".

      And "specialized libraries"? Oh, right.. I forgot. Binary formats don't NEED libraries to parse. Yep. Dunno why libjpeg62 even exists, when it's patently obvious you can just dump jpeg data straight to video memory. Oh yeah, who needs Microsoft Word. I just "cat resume.doc >/dev/lp" to print my documents. Cause it's binary you see. I don't need a library to parse it.

      >> "Thanks Tim, the world owes you one!

      But okay you're right, you gotta use those CPU cycles for something... "

      No shit sherlock. Using CPU cycles to strictly check the type-validity of self-describing documents seems pretty worthwhile to me.

      -Laxitive

    4. Re:Oh boy... by Evil+Grinn · · Score: 3, Interesting

      replacing compact, binary config files with 'human-readible', resource-intensive XML

      Like what, the Windows registry? Don't say shit like that or ESR will shoot you with one of those guns he collects.

      http://www.faqs.org/docs/artu/ch03s01.html#id2888298

    5. Re:Oh boy... by Short+Circuit · · Score: 2, Interesting

      Idle 90% of the time, but swamped for the 10% of the time you're waiting on results.

      We need to shift applications from an event-compute-display model to a predict-compute-event-display model.

      Caching data and intermediate data structures helps. Possibly even pre-computing them, when available memory permits.

      For example, let's say you've just entered a formula into a spreadsheet. The spreadsheet app can prepare the results of what would happen if you, for example, filled a row or column of cells with the formula.

    6. Re:Oh boy... by Faust · · Score: 5, Funny

      hi!

    7. Re:Oh boy... by LordHunter317 · · Score: 2, Interesting

      Except the XML file tells the parser where its own definition is. Each of the XML files inside of an OO.o package tell you how to figure out what they are.

      It's not quite that simple. XML files have two definitions: the DTD and the schema. The DTD is required for validation (i.e., checking that the document is valid, not merely well-formed), the schema for retrieving the layout of the elements (i.e., an integer goes in the foo attribute). Neither is required for an XML document (though you must have a DTD if you want to validate it). Schemas aren't required at all, and that's what you want if you really want to be able to programmatically manipulate XML without knowing anything about its form. Even then, they may not be very useful; they'll tell you what's legal content in an element, but they still tell you nothing about what's supposed to go into that element (i.e., what does the data stored in element 'foo' mean?). DTDs are useless for telling you anything about the content as well; they are a holdover from the SGML days.

      I should go further to point out that OO does define DOCTYPES, but doesn't define any XML Schema information. Even if it did, that still doesn't tell me what the tag 'font-attribute' means. You still have to structure your XML schema in such a manner that a human can interpret meaning. So 'human-readable' is still in the eye of the beholder. XML doesn't go any further to rectify this than any other format. Making your data XML doesn't automatically make it human-readable. It's just like naming variables in a programming language: the name is arbitrary, but a good name will tell me what the variable is supposed to be holding (e.g., 'tmp' vs. 'lookup_value').

      As an aside, were you referring to the xmlns declaration when you said, "A generic XML parser can at least find the URI to the file's type definition"? Those don't actually have any real-world meaning. They exist solely to let the XML parser know that the namespace I call 'foo' in one document and the namespace I call 'bar' in the second document are the same namespace. They don't have to have any real-world relevance (though they often do). They play no role in validation besides the namespace identification I mentioned. If you look up the namespaces even for 'official' XML groups you'll see they usually link to their documentation, not to a DTD or anything.
      Some parsers do smart things with some of the well-known namespace URIs, but there is no requirement for them to do so AFAIK.
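
      The point about namespace URIs being plain identifiers can be checked with Python's standard xml.etree.ElementTree parser. The URI and element names below are made up for the illustration, and the parser never fetches anything from that address; no DTD or schema is involved either.

```python
import xml.etree.ElementTree as ET

# A made-up namespace URI; nothing needs to exist at this address.
NS = "http://example.com/ns/demo"

# Two documents that bind different prefixes to the same URI.
doc_one = f'<foo:item xmlns:foo="{NS}">1</foo:item>'
doc_two = f'<bar:item xmlns:bar="{NS}">2</bar:item>'

# ElementTree expands each prefix to the URI, so the prefix itself disappears.
tag_one = ET.fromstring(doc_one).tag
tag_two = ET.fromstring(doc_two).tag

print(tag_one)             # {http://example.com/ns/demo}item
print(tag_one == tag_two)  # True
```

      Both documents parse as well-formed XML, and the two differently prefixed elements end up with the same expanded name, which is all the URI is used for here.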

    8. Re:Oh boy... by lahi · · Score: 2, Interesting

      I just want to state - again - that I think that Tim Berners-Lee ought to be fined heavily _and_ imprisoned for designing HTTP and HTML. Both contain uncountable design errors, which we have had to work around constantly ever since. He has done a tremendous disservice to the Internet Community. The HTTP protocol is simply a perverted form of the Gopher protocol (which itself was a trivial elaboration of the finger protocol, which is only good as protocol sample code.) And not having a proper SGML DTD from the start, but just a "loosely based on SGML" definition of HTML was outright criminal.

      Oh, and the definitions of URI and URLs also suck! Defining any constraints on the local part is the biggest mistake ever. URIs should have been like mail addresses and message IDs, which were the two prevalent object identifiers before the URL: both have a host part which defines the host to which they apply, and a local part which is just that: local - no meaning defined by the protocol. If that had been the case, there would be no need for stupid URL-encoding, which can be done wrong in so many ways, that I frankly doubt there is any way to actually do it right consistently.

      -Lasse

    9. Re:Oh boy... by AaronGTurner · · Score: 2, Insightful

      There may be a lot of spare compute cycles about, but what is critical is the ability to process XML in a timely manner on the CPU power that happens to be available at that precise instant in time at the appropriate location. Looking at the average CPU cycles used is like sitting in a traffic jam at 8am and noting that, on average, the road you are on is only used at 10% capacity. It being free at 4am is not much good if you are trying to get to work for 9am.

  7. well... by rune2 · · Score: 2, Interesting

    I was damned by [GNU Project founder] Richard Stallman in egregiously profane language for working on it.

    Why do I not find this hard to believe...

  8. pioneer ... currently at Microsoft by i.collect.spam · · Score: 3, Funny

    "database pioneer ... (currently at Microsoft)" translated for slashdot readers: "sellout"

  9. This is article is amazingly honest by tabkey12 · · Score: 4, Interesting
    JG I assume that the burning issue was keeping it simple.

    TB And we missed. XML is a lot more complex than it really needs to be. It's just unkludgy enough to make it over the goal line. The burning issues? People were already starting to talk about using the Web for various kinds of machine-to-machine transactions and for doing a lot of automated processing of the things that were going through the pipes.

    Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...

    1. Re:This is article is amazingly honest by Camel+Pilot · · Score: 3, Interesting

      I'm currently working on a project that is doing machine-to-machine transactions. We started off using XML to bundle and unbundle the data. However as the data rates went up performance went south.

      Some bright bunny came up with the idea of using perl stringified data structures instead using Data::Dumper.

      On the receiving end the data structure is Safe eval'ed and voilà, there is the data - orders of magnitude faster and there is still the ability to read or edit the data via text editor.

      XML is just a representation of hierarchy data via named parameters and list. Perl (or Python if you want) is very adept at parsing code strings.

      Also with code structures you can add dynamic functionality like

      'rsv_time' = localtime(time)

      which you can't with XML...
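
      A rough Python analogue of the Data::Dumper plus Safe approach described above is repr() on the sending side and ast.literal_eval() on the receiving side. This is a sketch of the general idea, not the poster's Perl code; the record contents and the helper name is_safe_literal are invented.

```python
import ast

# An invented record standing in for the data being exchanged.
record = {"host": "db1", "ports": [5432, 5433], "limits": {"cpu": 0.75}}

# Sender: a plain-text form that can still be read or edited by hand.
wire_text = repr(record)

# Receiver: literal_eval only accepts literals (dicts, lists, numbers,
# strings and so on) and never executes code found in the text.
restored = ast.literal_eval(wire_text)
print(restored == record)  # True


def is_safe_literal(text: str) -> bool:
    """Report whether text is a pure literal that literal_eval accepts."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return False
    return True


# A function call embedded in the data is rejected rather than run.
print(is_safe_literal("{'rsv_time': time.localtime()}"))  # False
```

      The last line shows the trade-off: this restricted form gives up the "dynamic functionality" mentioned above in exchange for never running code received from the other end.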

    2. Re:This is article is amazingly honest by sicking · · Score: 2, Insightful

      Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...

      Yep. That didn't stop Microsoft from adding even more weight to it by creating SOAP though. Now there's a bulky format. It's like shipping a shirt-button in a container on an oil tanker.

      --
      Failing to learn from history dooms you to repeat it.
    3. Re:This is article is amazingly honest by hqm · · Score: 2, Interesting

      People should use CommonLisp S-expressions instead of XML. S-expressions have the advantage that they have basic datatypes built into the format (string, list, ints, floats, symbols), and the namespace model is much more straightforward.

    4. Re:This is article is amazingly honest by filmmaker · · Score: 2, Insightful

      That depends on what you're transacting. Plus, there's a forest for the trees issue here. We're already using a sub-set of XML for most HTTP transactions - that is, HTML. A move to XML standards simply opens up a huge array of opportunities for robotic transactions, as well as leaving the field relatively wide open for web developers of traditional varieties. It's a positive good, RSS being an obvious example of why.

    5. Re:This is article is amazingly honest by Decaff · · Score: 2, Informative

      XML is just a representation of hierarchy data via named parameters and list.

      It is far more than that.

      It conforms to a standard. It allows its format to be extended in standard ways without breaking the original meaning. It has rules for allowing internationalisation. Also, there are a large number of efficient parsers and processors already written for it in almost every language.

      Also with code structures you can add dynamic functionality like

      'rsv_time' = localtime(time)

      which you can't with XML...

      The XML dialect known as XSLT allows for such dynamic functionality, and in a standard way.

    6. Re:This is article is amazingly honest by hankaholic · · Score: 2, Informative

      I think you may have misread. He said "blah blah blah instead using Data::Dumper", not "blah blah blah instead of using Data::Dumper".

      If you haven't misread, your post was a little unclear, but I thought I'd respond by posting instead of with a nondescript "Overrated" mod.

      --
      Somebody get that guy an ambulance!
  10. happy gilmore quote by wolfgang_spangler · · Score: 3, Funny

    Gray interviews Bray, should have done it in May. Over by the bay.

    Is that my karma burning? Oh what the hay.

  11. The Origin of XML by TimeTraveler1884 · · Score: 5, Funny

    That's hogwash. Everyone knows that the idea for XML came from the tablets of stone that Moses brought down from Mount Sinai. In these tablets were the beginnings of self-describing data. That alone was where the commandments of W3C were originally sent out to the world.

    But only in the last decade have scholars used transformation style sheets and super-computers to find more declarative complex types, hidden in the original Hebrew CDATA. It is thought there are tens if not hundreds of specifications in these texts that may never have a finalized draft.

    Progress has been slow. While the discovery of SOAP in the 1800's has made the hygiene of data possible, there is much that has yet to be standardized. Considering the aging DTD schemas left from the era of King James, it will be crucial to the data-exchange of humanity to uncover more secrets of XML.

  12. Why, oh why, did they have to repeat the tag name? by Anonymous Coward · · Score: 3, Interesting


    I work with XML every day. And every day I wonder the same thing: why the hell does the end tag name have to be repeated? Why can't it just be optional? In other words, why can't it just be abbreviated as: <tagname>data</> ?

    Oh MAN I wish they could have done just that one little thing for us. It would cut our datagram size down by at least 30%, maybe more.

  13. Jim Gray interviews Tim Bray by Saeed+al-Sahaf · · Score: 4, Funny
    Jim Gray interviews Tim Bray

    Right, sure.

    Have you ever seen these guys in the same room at the same time? No? I thought as much.

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  14. Right in front of you, Tim! by Anonymous Coward · · Score: 4, Interesting

    You know, the people who invented XML were a bunch of publishing technology geeks, and we really thought we were doing the smart document format for the future. Little did we know that it was going to be used for syndicated news feeds and purchase orders.

    The most amazing thing is that back then in 1995-1996 at Open Text we were already using SGML as a data exchange protocol. All of us there (including Tim) ought to have known that XML would also have a life as a computer-to-computer communication protocol. Problem was that at the time so much of the SGML discourse was wrapped around the content versus format debate that we missed the obvious: the main use of XML was not as a replacement for HTML as a text format for the web, but as a kind of uber ASCII to allow the ready exchange of data between dissimilar applications (just like ASCII in its time had eased the transfer of data between dissimilar hardware and/or software platforms).

  15. Semantic web snake oil... by Alomex · · Score: 5, Interesting

    TB: I spent two years sitting on the Web consortium's technical architecture group, on the phone every week and face-to-face several times a year with Tim Berners-Lee. To this day, I remain fairly unconvinced of the core Semantic Web proposition.

    Everyone who has actually done work on knowledge representation in the real world knows that this is a huge, difficult problem, unlikely to be solved anytime soon, as Tim Bray claims.

    The only people who claim otherwise are either frauds or ignorant. The Semantic Web initiative has both: Tim Berners-Lee is very smart, but not a computer scientist, so he's not aware of the size of the challenge, plus he's a genuinely nice person, so he tends to trust others too much.

    He has surrounded himself with the snake oil AI salesmen from the early 1980s who had promised us impending ubiquitous intelligent computers. Those fraudsters got found out back then, and spent the next fifteen years in academic limbo, only to be rescued by Tim Berners-Lee's naivete.

    1. Re:Semantic web snake oil... by bblfish · · Score: 2, Insightful

      I work with Tim Bray, but I seriously disagree with this position of his. If you had gone back to the days before XML was invented you could have made exactly the same argument against XML: "SGML was not a success, therefore XML can't be". I have blogged about this fallacious argument at length. You can work with the Semantic Web without having to take on the most difficult problems of AI. You can use it to work on some really simple problems very effectively. Speaking of "frauds", "ignorants" and "snake oil" when speaking of this project is really simplistic and (dare I turn the arrogance of the above poster against him?) stupid.

    2. Re:Semantic web snake oil... by bblfish · · Score: 2, Insightful
      Tim thinks so, and so do I.

      My suggestion to you: don't put too much weight on Tim Bray's bet. If you look carefully at his rdf.net challenge you will notice that the wording leaves him ample space to maneuver were things to turn out against him:

      • This has to happen before January 1, 2006, and
      • I am the sole judge and jury, but
      • I'll publicize anything that's submitted formally, and my comments on it, so I'm doing this in the open, except for
      • I'm busy, so I may exercise fairly brutal triage on incoming proposals and take a while to get to the ones really worth looking at, and
      • If there's serious money in it, the recipient of RDF.net is morally obligated to find a way to cut me in for a piece of the action.
      I like the last one: if someone has an idea that is going to make a lot of money they can have rdf.net for free if they cut him in on the action. Wow! Here is a man who really does not believe anything is going to happen :-)
    3. Re:Semantic web snake oil... by Jagasian · · Score: 4, Insightful

      If your post could be modded above a "5", I would mod your post as "insightful". I guess people have no memory, and that is why these Semantic Web frauds get grants, venture cap, etc. They have these big promises of seamlessly integrating web services... AUTOMATICALLY?!?!

      The easiest way to disprove their crap is this. Even in RDF or OWL, it is possible to have "semantic aliasing", i.e. multiple ways of representing the same concept. This is exactly the core problem that they claim they address and that they claim that XML does not address. Think about it, how can automated inferences be made, if two concepts have distinct _semantic_ (not just syntactic) representations? Furthermore, it can be shown that in general these different representations cannot be automatically determined to represent the same thing.

      So their entire project is a farce! It is a bunch of people that are both ignorant of pertinent theoretical mathematical results on computability, completeness, and hell, the fact that even in axiomatic set theory there are multiple ways to represent... say... the real numbers... and they are also ignorant of practical computer/software engineering and sociological limitations.

      They have stop-gaps: ontologies. Oh if only people could agree on one common unified ontology, the entire semantic aliasing problem would be solved... or so they seem to think. But even if people agree on a common vocabulary, the way it is used can still give rise to the semantic aliasing problem. So agreeing on some complete or near-complete ontology is going to be IMPOSSIBLE, and even if it was done, it still wouldn't fix the deep underlying problems with the Semantic Web - problems that have been struggled with for over 100 years in the field of formal mathematics.

  16. Re:Why, oh why, did they have to repeat the tag na by Alomex · · Score: 5, Insightful

    why the hell does the end tag name have to be repeated?

    Because that is the single biggest source of headaches in parsing SGML, the precursor of XML, in which such a construct is allowed.

    It also makes error recovery very difficult, something that we know is quite important from all that malformed HTML code out there. The XML creators knew that too.
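
    The error-reporting argument can be illustrated with a toy nesting checker in Python. This is a deliberately simplified sketch, not a real XML or SGML parser: it ignores attributes, comments and CDATA, and the function name first_error is invented. It accepts both named end tags and the anonymous </> form so the two can be compared.

```python
import re

# Matches <name>, </name> and the anonymous </> form; attributes are ignored.
TAG = re.compile(r"<(?:([A-Za-z][\w-]*)|/([A-Za-z][\w-]*)?)>")


def first_error(markup: str):
    """Return a message for the first detectable nesting error, or None."""
    stack = []
    for match in TAG.finditer(markup):
        open_name, close_name = match.group(1), match.group(2)
        if open_name is not None:
            stack.append(open_name)
        elif not stack:
            return f"offset {match.start()}: stray end tag"
        else:
            opened = stack.pop()
            # An anonymous </> closes whatever is open, so it can never mismatch.
            if close_name is not None and close_name != opened:
                return f"offset {match.start()}: </{close_name}> closes <{opened}>"
    if stack:
        return f"end of input: <{stack[-1]}> never closed"
    return None


# The first <p> is missing its end tag in both inputs.
print(first_error("<doc><p>one<p>two</p></doc>"))
print(first_error("<doc><p>one<p>two</></>"))
```

    With named end tags the checker stops at </doc> and names the unclosed <p>. With anonymous end tags it only notices something is wrong at the end of the input, and it blames <doc>, which is not the tag that was left open.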

  17. Intra-vendor XML is (usually) stupid by mi · · Score: 5, Interesting
    It drives me up the wall, that my employer is using XML to let parts of their own application communicate with other parts. DTDs are not used and all parts still need to be modified/recompiled whenever one of them changes. Same people maintain both ends of the communication.

    Theirs is, in reality, a proprietary format, but to stay buzz-word compliant they use XML, which hurts performance -- sometimes dearly...

    For example, to pass a couple of thousand floating-point numbers from front end to a computation engine, each is converted to text string with something like <Parameter> around it. The giant strings (memory is cheap, right?) are kept in memory until the whole collection is ready to be sent out... The engine then parses the arriving XML and fills out the array of doubles for processing.

    It really is disgusting, especially since freely available alternatives exist... For instance, PVM solved the problem of efficiently passing datasets between computers a decade ago, but nooo, we only studied XML in college -- and it is, like, really cool, dude...

    --
    In Soviet Washington the swamp drains you.
    1. Re:Intra-vendor XML is (usually) stupid by Alomex · · Score: 2, Interesting

      It drives me up the wall, that my employer is using XML to let parts of their own application communicate with other parts. DTDs are not used and all parts still need to be modified/recompiled whenever one of them changes.

      Then you are not using XML right. For one the format shouldn't be changing much, if it is clearly you guys are spending too much time coding and not enough thinking. Second any application that does not use the new attribute should be able to ignore it without any compilation change. Third, two thousand floating points ain't a giant string, unless you are programming an 8086 in Elbonia. Converting two thousand numbers to text should take 50 microseconds at the most.

    2. Re:Intra-vendor XML is (usually) stupid by mi · · Score: 3, Insightful
      Then you are not using XML right.

      Does anybody?.. I guess, not...

      clearly you guys are spending too much time coding and not enough thinking

      No disagreement here -- that was my point, in fact.

      two thousand floating points ain't a giant string, unless you are programming an 8086 in Elbonia.

      Just tested simply sprintf-ing the same double 2000 times into the same text buffer on a PII-Xeon @450MHz with 2Mb of L2-cache, the whole program and the puny buffer are entirely in cache (which is not the case in real-life). 5-16 milliseconds (of user time, ignoring the sys-time)... The PII is not much slower, than the Sparcs we are using. Even if the latest and greatest CPUs are 10 times faster (which they aren't), why waste their power on chewing XML tags?

      Converting two thousand numbers to text should take 50 microseconds at the most.

      Now add the time to parse it on the other end, and consider, that the whole point of passing it is to have some computations happen. And the computations themselves happen in about 200 milliseconds...

      Now realize that size of the XML-file is 3-4 times bigger than it needs to be -- but the network packets are still 1500 bytes and with XML we need 5 or 6 (at best) instead of 2. Bandwidth is cheap, but latency is not...

      Now throw in the loss of precision from the double-text-double conversion(s) and climb up the wall next to me...

      Using XML in such scenarios is like overnighting papers from one end of the office floor to the other. Defending this practice is like saying, that FedEx is really fast and efficient everywhere except in Elbonia...
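
      The size and precision points can be reproduced with a few lines of Python. The sketch assumes the numbers are written with a default fixed-point format (six decimal places), which the post does not actually specify; the sample values and the <Parameter> wrapper are only for illustration, and no timing is measured here.

```python
import struct

# Three sample doubles; the real data set is not shown in the thread.
values = [0.1 + 0.2, 1.0 / 3.0, 2.0 ** 0.5]

# Text route: each number printed with six decimals and wrapped in a tag.
as_xml = "".join(f"<Parameter>{v:f}</Parameter>" for v in values)
# Binary route: eight bytes per double, little-endian.
as_binary = struct.pack(f"<{len(values)}d", *values)

# Parse both forms back into lists of floats.
parsed_back = [
    float(chunk.split("<")[0]) for chunk in as_xml.split("<Parameter>")[1:]
]
binary_back = list(struct.unpack(f"<{len(values)}d", as_binary))

print(len(as_xml), len(as_binary))     # 93 24
print(parsed_back == values)           # False: digits were dropped
print(binary_back == values)           # True: bit-for-bit identical
print([float(repr(v)) for v in values] == values)  # True
```

      The last line is worth noting: the precision loss comes from the chosen number format, not from text as such. Printing enough significant digits (as repr does) round-trips exactly, at the cost of even longer strings.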

      --
      In Soviet Washington the swamp drains you.
  18. What it should have looked like by Anonymous Coward · · Score: 5, Insightful

    I think XML should have looked more like this:

    (html
    (head
    (title "This is an example"))
    (body
    (h1 "A first level header")
    (p "There's no reason for all the extra characters.")
    (p "Although this looks like LISPy HTML it could have all the features of XML")))
    1. Re:What it should have looked like by pkphilip · · Score: 2, Interesting

      Yes, I think this definitely looks more sensible. It would have reduced the size of documents considerably and it does look cleaner.

      Consider a XML snippet:

      <sampletag name="this" type="that">
      Some value
      </sampletag>

      This could be translated into
      (sampletag [name="this"] [type="that"]Some value)

      which is much smaller.

      I wonder if someone will consider this for real

    2. Re:What it should have looked like by ikkonoishi · · Score: 2, Insightful
      Sounds great... but then this happens

      (html
      (head
      (title "This is an example")
      (body
      (h1 "A first level header")
      (p "There's no reason for all the extra characters.")
      (p "Although this looks like LISPy HTML it could have all the features of XML")))

      Now your entire webpage is blank. What happened?
  19. Re:Why, oh why, did they have to repeat the tag na by syukton · · Score: 2, Insightful

    Yeah that'd work great if you knew 100% of the time that you'd never get bad data. If you've got a multi-nested element hierarchy, however, and you lose one or two of your </>, how do you know where to put them back in? It's very easy to look for an opening tag followed by a closing tag of the same name, especially when building a parser that error-checks.

    You know what would cut down the datagram size more? Smaller tag names. Tag names don't have to be readable so much as uniquely identifiable; you can use an interface layer in the editor to make the tag names user friendly and then de-friendify them for transit. Then you've got:

    <a>
    <b>woo</b>
    </a>

    instead of:

    <element>
    <subelement>woo</subelement>
    </element>

    According to wc, switching to single-character element names instead of the multicharacter ones would give a 41% reduction in bulk, for the example above.

    --
    Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.
  20. Not Very insightful! by stevens · · Score: 2, Informative
    I hadn't thought about that. Very insightful.

    Lots of people have thought about it. Not Very Insightful.

    The reason is that if the parser encounters unbalanced end-tags, and they're all just </>, the parser will go farther and get very confused before it dies.

    It will be very difficult to pinpoint *which* tag isn't closed, like C's optional {} after an if(), or SGML's optional closing tags.

    It's much easier to correct if your parser can say "You forgot to close <account> on line 115" rather than "Something or other is unbalanced somewhere before line 224."
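A toy checker makes the difference concrete. This is only a sketch: the regular expression is a stand-in for a real tokenizer, and the element names in the sample documents are made up.

```python
import re

# Matches <name>, </name>, and the anonymous closer </>.
TAG = re.compile(r"<(/?)([A-Za-z][\w-]*)?>")

def first_error(text):
    """Return a message for the first nesting error, or None if balanced."""
    stack = []  # (name, line number) for every element still open
    for lineno, line in enumerate(text.splitlines(), start=1):
        for closing, name in TAG.findall(line):
            if not closing:
                stack.append((name, lineno))
                continue
            if not stack:
                return f"line {lineno}: close tag with nothing open"
            open_name, open_line = stack.pop()
            # A named closer can be checked; an anonymous one matches anything.
            if name and name != open_name:
                return f"line {lineno}: <{open_name}> opened on line {open_line} was never closed"
    if stack:
        open_name, open_line = stack[-1]
        return f"end of input: <{open_name}> opened on line {open_line} was never closed"
    return None

named = "<bank>\n<account>\n<owner>x</owner>\n</bank>\n"
anonymous = "<bank>\n<account>\n<owner>x</>\n</>\n"

print(first_error(named))
print(first_error(anonymous))
```

In both documents the author forgot to close account. With named closers the checker says so as soon as it sees the mismatched closer; with anonymous closers it only notices at end of input and blames bank instead.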

  21. Please explain by johannesg · · Score: 2, Insightful

    I've heard this quote in relation to XML before, and I don't get it. LISP is a programming language. XML is a method for storing data. About the only relation between the two that I can find is that both use nesting. So, why does this get brought up whenever XML is being discussed?

    1. Re:Please explain by Anonymous Coward · · Score: 5, Informative

      johannesg writes: "I've heard this quote in relation to XML before, and I don't get it. LISP is a programming language. XML is a method for storing data. About the only relation between the two that I can find is that both use nesting. So, why does this get brought up whenever XML is being discussed?"

      Lisp source code is first parsed into S-expressions before being compiled. The programmer can manipulate these S-expressions to generate new programming constructs.

      S-expressions are nested lists of dynamically typed data. The compiler turns these nested lists into bytecode or assembly code. But before this happens you're able to manipulate a well defined, concise and platform independent data format. The format is so useful that it is also used to store and transport non-code.

      Here's a Lisp function call nested within another function call:

      (/ (+ 1 2 3) 6)

      [i.e. add 1, 2, and 3 together and then divide by 6] Let's first give different names to the function operators:

      (divide (plus 1 2 3) 6)

      Now introduce redundancy by duplicating the opening function names:

      (divide (plus 1 2 3 /plus) 6 /divide)

      Translate the dynamically typed integers to explicit type identifiers:

      (divide (plus (integer 1 /integer) (integer 2 /integer) (integer 3 /integer)) (integer 6 /integer) /divide)

      Now convert the parentheses and spaces to angle brackets to generate XML:

      <divide>
      <plus>
      <integer>1</integer>
      <integer>2</integer>
      <integer>3</integer>
      </plus>
      <integer>6</integer>
      </divide>

      Lisp S-expressions are a method for storing/expressing data AND code. They have less overhead than XML, solve more problems than XML (comfortably human readable programming languages can also be written in S-expressions, e.g. Scheme and Common Lisp) and they were invented decades earlier.

      Regards,
      Adam Warner
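The transformation walked through above can be mimicked in a few lines. This sketch models an S-expression as a nested Python tuple, which is an assumption made for illustration; the function names are hypothetical and only the plus and divide operators from the example are handled.

```python
def to_sexpr(expr):
    """Render a nested tuple as Lisp-style text."""
    if isinstance(expr, int):
        return str(expr)
    op, *args = expr
    return "(" + " ".join([op] + [to_sexpr(a) for a in args]) + ")"

def to_xml(expr):
    """Render the same tree with repeated tag names and explicit integer types."""
    if isinstance(expr, int):
        return f"<integer>{expr}</integer>"
    op, *args = expr
    return f"<{op}>" + "".join(to_xml(a) for a in args) + f"</{op}>"

def evaluate(expr):
    """Treat the same data as code."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    values = [evaluate(a) for a in args]
    if op == "plus":
        return sum(values)
    if op == "divide":
        result = values[0]
        for v in values[1:]:
            result /= v
        return result
    raise ValueError(f"unknown operator: {op}")

tree = ("divide", ("plus", 1, 2, 3), 6)
print(to_sexpr(tree))
print(to_xml(tree))
print(evaluate(tree))
```

One tree, three uses: compact text, verbose text, and something that can be run.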

  22. Explicitness by samael · · Score: 2, Insightful

    Because it would make spotting your bug harder. Did you _mean_ to close that tag, or did you think you were closing a different tag? If all closing tags look the same it would make tracing certain bugs harder.

  23. I COULD NOT AGREE MORE. gzip is our friend! by TheLittleJetson · · Score: 2, Informative

    when i work with XML in java, i generally just pass the XML through a GZIP stream. need to see the file contents? zcat. XML compresses well since it's repetitive text. Lately I've been doing a lot of XUL code with PHP/smarty as the back-end, and again, I transparently gzip this...

    So, this solves the problem of the size of the XML to be stored on disk or transmitted over network... The only difference is parsing. Again, when i'm in java, i use PICCOLO to parse the XML -- it uses a lexical analyzer (jflex?) to parse XML more like a compiler parses code, by tokenizing it. turns out, this is really fast.

    Disk space is cheap. CPU's are fast. Mainstream XML parsing technology can always be made faster. Why must we abandon our beloved, human-readable, standardized format for files and protocols alike in favor of binary files?
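The gzip point above is easy to try. The sketch below uses Python's standard gzip module rather than Java, and the document it compresses is synthetic (made-up element names and values), so the ratio it prints says nothing about any particular real workload.

```python
import gzip

# Build a synthetic, tag-heavy document: 2000 records with varying values.
records = "".join(
    f"<reading><sensor>probe-{i % 8}</sensor><value>{i * 0.37:.5f}</value></reading>\n"
    for i in range(2000)
)
xml = ("<readings>\n" + records + "</readings>\n").encode("utf-8")

packed = gzip.compress(xml)

# The round trip is lossless, so the readable form is always recoverable.
assert gzip.decompress(packed) == xml

print(len(xml), "bytes raw,", len(packed), "bytes gzipped")
```

Compression shrinks what goes on disk or over the wire, but the receiver still has to decompress and parse the full text, so it does not address the CPU-time complaint.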

  24. Re:Why, oh why, did they have to repeat the tag na by ikkonoishi · · Score: 4, Informative

    < ele1> < ele2> < ele3> < /> < /> < ele4> < ele5> < /> < />

    Which element did I forget to close?

    < ele1> < ele2> < ele3> < /ele3> < /ele1> < ele4> < ele5> < /ele5> < /ele4>

    Clearer now?

  25. Originally: Bay Area Research Facility by quarkscat · · Score: 2, Funny

    also known as: BARF. The name was changed, no
    doubt, in order to instill a greater sense among
    MSFT employees there that they actually might
    (someday) have a workable product. Hence, BARC.

    XML is more complicated than it should be, but
    it is NOT a MSFT "invention", and has no business
    being patented by MSFT. Let alone, encumbered
    with their viral and restrictive and expensive
    licensing scheme. What it IS is yet another
    example of the slimy "embrace/extend/extinguish"
    monopolistic business practices of MSFT. If the
    DoJ weren't more like a 90 year old grandmother
    that misplaced her full dentures (aka the Dubya
    regime), they would have MSFT back into court to
    exact "new & improved" punishment on the 800 lb.
    gorilla.

  26. The essence of XML... by CondeZer0 · · Score: 2, Funny

    "The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well." -- Phil Wadler

    --
    "When in doubt, use brute force." Ken Thompson
  27. [OT] bad summary by hankaholic · · Score: 4, Insightful
    Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo.
    If you believe that "OED" will be misunderstood by enough people to justify enclosing it with a link to a definition, why not just spell out "Oxford English Dictionary"?

    "Hmmm, OED might be unclear to tons of people reading this, I'll make them have to click on a link to know what I'm talking about."

    Obligatory relation to discussion content:

    Providing a link instead of writing a clear summary is choosing the wrong tool for the task at hand. Authors of some other comments in this thread have shown that XML also is the wrong tool for many of the tasks to which it is applied. Whether it's passing data internally within an application or summarizing an article for the homepage, choosing the right alternative can make a difference between efficient clarity and an inelegant kludge.

    Applying the right algorithmic tool to the right problem is actually a focus of CS. This is why sorting routines are often studied -- for instance, a routine which is more efficient at sorting millions of unordered pieces of data may be very wasteful when dealing with nearly presorted data.

    The distinction is not often understood and has more of an impact than the observer might think. For instance, when writing an application for a handheld in which data is kept sorted and is usually viewed between insertions it makes sense to sort after every data element added to the database. However, this means adding a single item to a mostly-ordered set. Understanding that quicksort is a poor choice for this application means a difference in battery life.
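As a sketch of the alternative, here is a single insertion into an already sorted list using Python's standard bisect module, with a comparison counter bolted on. The Counted class and the data are hypothetical, and a real handheld application would have its own storage layer.

```python
import bisect

class Counted:
    """Wraps a key and counts how many comparisons are made."""
    comparisons = 0

    def __init__(self, key):
        self.key = key

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.key < other.key

# 1000 records that are already in order (even keys 0, 2, ..., 1998).
items = [Counted(k) for k in range(0, 2000, 2)]

# Adding one record: binary search for the slot, then shift the tail.
bisect.insort(items, Counted(501))

print(len(items), "items after insert")
print(Counted.comparisons, "comparisons for the insert")
```

The insert needs roughly log2(n) comparisons plus a shift of the tail, whereas re-sorting from scratch has to touch every element at least once.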
    --
    Somebody get that guy an ambulance!
  28. Re:Please explain-Chinese firewall. by SnowZero · · Score: 2, Insightful

    And they have infix notation...

    S-expressions are in prefix notation. Infix describes expressions such as "1+2". Lots of parentheses are hard to read, but twice that number of angle brackets is certainly not easier.

    Blurring the line between data and code is a useful technique...

    This only matters if you use the data in Lisp without being careful. Any non-interpreted language could use it just as safely as XML.

    P.S. I don't even like Lisp, being a person who likes type checking before I actually execute a snippet of code. On the other hand, they really do have a point regarding S-expressions and XML.