Tim Bray On The Origin Of XML

SGML by Anonymous Coward · 2005-03-18 14:50 · Score: 3, Interesting

I think it's very funny that XML looks like it is based on SGML.

But according to the interview, it seems that the similarities are merely coincidental.

Can't Microsoft do *anything* original? by kelzer · 2005-03-18 14:50 · Score: 2, Interesting

From the "Jim Gray" link:

Jim Gray is a "Distinguished Engineer" in Microsoft's Scaleable Servers Research Group and manager of Microsoft's Bay Area Research Center (BARC).

OK, Xerox has their famous Palo Alto Research Center (PARC), so Microsoft just has to have its own similarly named center in the same general vicinity. Sheesh!

--

---------------------------------------------
SERENITY NOW!!!!!!!!!!!!!!!!

well... by rune2 · 2005-03-18 14:55 · Score: 2, Interesting

I was damned by [GNU Project founder] Richard Stallman in egregiously profane language for working on it.

Why do I not find this hard to believe...

This article is amazingly honest by tabkey12 · 2005-03-18 14:59 · Score: 4, Interesting

JG: I assume that the burning issue was keeping it simple.

TB: And we missed. XML is a lot more complex than it really needs to be. It's just unkludgy enough to make it over the goal line. The burning issues? People were already starting to talk about using the Web for various kinds of machine-to-machine transactions and for doing a lot of automated processing of the things that were going through the pipes.

Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...

--
Get a free iPod Nano 4GB!

Re:This article is amazingly honest by Camel+Pilot · 2005-03-18 15:18 · Score: 3, Interesting

I am currently working on a project that is doing machine-to-machine transactions. We started off using XML to bundle and unbundle the data. However, as the data rates went up, performance went south.

Some bright bunny came up with the idea of using Perl data structures stringified with Data::Dumper instead.

On the receiving end the data structure is eval'ed with Safe and voilà, there is the data: orders of magnitude faster, and there is still the ability to read or edit the data in a text editor.

XML is just a representation of hierarchical data via named parameters and lists. Perl (or Python, if you want) is very adept at parsing code strings.
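
A rough Python analog of that setup (the poster used Perl; Python is the alternative the post itself mentions), assuming `repr` stands in for Data::Dumper and `ast.literal_eval` for the Safe eval. The order record and the XML layout are made up for illustration:

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical record of the kind passed between the machines.
order = {"id": 42, "items": [{"sku": "A-1", "qty": 3}], "rush": True}

def dump(data):
    # Plays the role of Data::Dumper: readable, hand-editable text.
    return repr(data)

def load(text):
    # Plays the role of the Safe eval: literal_eval only accepts literals
    # (numbers, strings, lists, dicts, booleans, ...), not function calls.
    return ast.literal_eval(text)

def to_xml(data):
    # One possible XML layout of the same record, for comparison.
    root = ET.Element("order", id=str(data["id"]), rush=str(data["rush"]).lower())
    for item in data["items"]:
        ET.SubElement(root, "item", sku=item["sku"], qty=str(item["qty"]))
    return ET.tostring(root, encoding="unicode")

text = dump(order)
print(text)
print(to_xml(order))
```

Note that the types survive the literal round trip, while every XML attribute comes back as a string the receiver has to convert.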

Also, with code structures you can add dynamic functionality like

'rsv_time' => scalar localtime(time)

which you can't do with XML...
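
The catch with that trick is that a dynamic field only works if the receiver executes code rather than just reading data. A Python sketch, assuming a payload shaped like the example above (the `ctime` field and the name whitelist are hypothetical): a literal-only loader rejects it, while `eval` with a whitelist accepts it but is not a real sandbox. In Perl, the Safe module the poster mentions is what limits that eval.

```python
import ast
import time

# Hypothetical payload whose 'rsv_time' field is an expression, not a value.
payload = "{'user': 'alice', 'rsv_time': ctime(0)}"

def load_literal(text):
    # Data only: raises ValueError on the ctime(0) call.
    return ast.literal_eval(text)

def load_dynamic(text):
    # Runs the expression with only whitelisted names visible. This is
    # NOT a real sandbox for untrusted input; it just shows that the
    # receiver must execute code for the dynamic field to work.
    return eval(text, {"__builtins__": {}}, {"ctime": time.ctime})

try:
    load_literal(payload)
except ValueError as err:
    print("literal loader refused:", err)
print(load_dynamic(payload))
```
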
Re:This is article is amazingly honest by hqm · 2005-03-18 16:31 · Score: 2, Interesting

People should use Common Lisp S-expressions instead of XML. S-expressions have the advantage that they have basic datatypes built into the format (strings, lists, ints, floats, symbols), and the namespace model is much more straightforward.
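
To make the typed-atoms point concrete, here is a minimal S-expression reader in Python. It is a sketch, not Common Lisp's actual reader: numbers come back as int/float, quoted text as strings, bare words as a hypothetical `Symbol` type, and parentheses as lists.

```python
import re

# One token: "(", ")", a double-quoted string (with \" escapes), or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')

class Symbol(str):
    """A bare word, kept distinct from a quoted string."""

def parse_atom(word):
    if re.fullmatch(r"[-+]?\d+", word):
        return int(word)
    if re.fullmatch(r"[-+]?(\d+\.\d*|\.\d+)([eE][-+]?\d+)?|[-+]?\d+[eE][-+]?\d+", word):
        return float(word)
    return Symbol(word)

def read(text):
    """Parse text into a list of top-level forms."""
    stack, pos, text = [[]], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        pos = m.end()
        open_paren, close_paren, string, atom = m.groups()
        if open_paren:
            stack.append([])            # start a new list
        elif close_paren:
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)  # attach it to its parent list
        elif string is not None:
            stack[-1].append(re.sub(r"\\(.)", r"\1", string))
        else:
            stack[-1].append(parse_atom(atom))
    if len(stack) != 1:
        raise SyntaxError("unbalanced '('")
    return stack[0]

print(read('(order (id 42) (price 9.95) (note "two words") (status shipped))'))
```
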

Why, oh why, did they have to repeat the tag name? by Anonymous Coward · 2005-03-18 15:18 · Score: 3, Interesting

I work with XML every day. And every day I wonder the same thing: why the hell does the end tag name have to be repeated? Why can't it just be optional? In other words, why can't it just be abbreviated as: <tagname>data</> ?

Oh MAN I wish they could have done just that one little thing for us. It would cut our datagram size down by at least 30%, maybe more.
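
The end-tag name is redundant as long as elements nest properly, which a small sketch can check. This Python is a toy, assuming simple documents with no comments, CDATA, processing instructions, or `>` inside attribute values; `abbreviate` and `expand` are hypothetical helpers, not part of any XML library.

```python
import re

# Matches one tag: optional "/" for end tags, the tag name, then the rest.
# A bare "</>" matches with an empty name group.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)?([^>]*)>")

def abbreviate(xml: str) -> str:
    """Replace every named end tag </name> with the anonymous </>."""
    return re.sub(r"</[A-Za-z_][\w.:-]*\s*>", "</>", xml)

def expand(short: str) -> str:
    """Restore end-tag names using a stack of currently open elements."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(short):
        out.append(short[pos:m.start()])  # text between tags, unchanged
        closing, name, rest = m.groups()
        if closing:
            out.append(f"</{stack.pop()}>")  # close the innermost open element
        else:
            if not rest.rstrip().endswith("/"):  # <empty/> never gets an end tag
                stack.append(name)
            out.append(m.group(0))
        pos = m.end()
    out.append(short[pos:])
    return "".join(out)

doc = '<order><item id="1">widget</item><item id="2">gadget</item><note/></order>'
short = abbreviate(doc)
print(short)
print(f"saved {1 - len(short) / len(doc):.0%}")
```

How much this saves depends entirely on tag-name length versus payload size, so the percentage will vary from one feed to another.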

Right in front of you, Tim! by Anonymous Coward · 2005-03-18 15:36 · Score: 4, Interesting

You know, the people who invented XML were a bunch of publishing technology geeks, and we really thought we were doing the smart document format for the future. Little did we know that it was going to be used for syndicated news feeds and purchase orders.

The most amazing thing is that back then, in 1995-1996, at Open Text we were already using SGML as a data exchange protocol. All of us there (including Tim) ought to have known that XML would also have a life as a computer-to-computer communication protocol. The problem was that at the time so much of the SGML discourse was wrapped around the content-versus-format debate that we missed the obvious: the main use of XML was not as a replacement for HTML as a text format for the web, but as a kind of uber-ASCII allowing the ready exchange of data between dissimilar applications (just as ASCII in its time had eased the transfer of data between dissimilar hardware and/or software platforms).

Semantic web snake oil... by Alomex · 2005-03-18 15:45 · Score: 5, Interesting

TB: I spent two years sitting on the Web consortium's technical architecture group, on the phone every week and face-to-face several times a year with Tim Berners-Lee. To this day, I remain fairly unconvinced of the core Semantic Web proposition.

Everyone who has actually done work on knowledge representation in the real world knows that this is a huge, difficult problem, unlikely to be solved anytime soon, as Tim Bray claims.

The only people who claim otherwise are either frauds or ignorant. The Semantic Web initiative has both: Tim Berners-Lee is very smart, but not a computer scientist, so he's not aware of the size of the challenge, plus he's a genuinely nice person, so he tends to trust others too much.

He has surrounded himself with the snake-oil AI salesmen from the early 1980s who had promised us impending ubiquitous intelligent computers. Those fraudsters got found out back then and spent the next fifteen years in academic limbo, only to be rescued by Tim Berners-Lee's naivete.

Re:Oh boy... by Evil+Grinn · 2005-03-18 15:46 · Score: 3, Interesting

replacing compact, binary config files with 'human-readable', resource-intensive XML

Like what, the Windows registry? Don't say shit like that or ESR will shoot you with one of those guns he collects.

http://www.faqs.org/docs/artu/ch03s01.html#id2888298

--
where there's fish, there's cats

Intra-vendor XML is (usually) stupid by mi · 2005-03-18 16:18 · Score: 5, Interesting

It drives me up the wall that my employer is using XML to let parts of their own application communicate with other parts. DTDs are not used, and all parts still need to be modified/recompiled whenever one of them changes. The same people maintain both ends of the communication.

Theirs is, in reality, a proprietary format, but to stay buzzword-compliant they use XML, which hurts performance, sometimes dearly...

For example, to pass a couple of thousand floating-point numbers from the front end to a computation engine, each is converted to a text string with something like <Parameter> around it. The giant strings (memory is cheap, right?) are kept in memory until the whole collection is ready to be sent out... The engine then parses the arriving XML and fills out the array of doubles for processing.
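
For comparison, a Python sketch of both encodings for a couple of thousand doubles, assuming the `<Parameter>` wrapper described above. Here `struct` (packed little-endian IEEE 754 doubles) stands in for a binary transport; it is not PVM itself.

```python
import random
import struct
import xml.etree.ElementTree as ET

def to_xml(values):
    # One <Parameter> element per number, as described above.
    root = ET.Element("Parameters")
    for v in values:
        ET.SubElement(root, "Parameter").text = repr(v)  # repr round-trips floats exactly
    return ET.tostring(root)

def from_xml(data):
    return [float(e.text) for e in ET.fromstring(data).iter("Parameter")]

def to_binary(values):
    # 8 bytes per double, little-endian, no per-value markup.
    return struct.pack(f"<{len(values)}d", *values)

def from_binary(data):
    return list(struct.unpack(f"<{len(data) // 8}d", data))

random.seed(0)
values = [random.uniform(-1e6, 1e6) for _ in range(2000)]
xml_bytes, bin_bytes = to_xml(values), to_binary(values)
print(len(xml_bytes), len(bin_bytes))
```

Both round-trip exactly here; the XML version is several times larger on this data and has to be re-parsed on the receiving side, which is the overhead being complained about.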

It really is disgusting, especially since freely available alternatives exist... For instance, PVM solved the problem of efficiently passing datasets between computers a decade ago, but nooo, we only studied XML in college -- and it is, like, really cool, dude...

--
In Soviet Washington the swamp drains you.

Re:Intra-vendor XML is (usually) stupid by Alomex · 2005-03-18 16:39 · Score: 2, Interesting

It drives me up the wall, that my employer is using XML to let parts of their own application communicate with other parts. DTDs are not used and all parts still need to be modified/recompiled whenever one of them changes.

Then you are not using XML right. For one, the format shouldn't be changing much; if it is, clearly you guys are spending too much time coding and not enough thinking. Second, any application that does not use a new attribute should be able to ignore it without being modified or recompiled. Third, two thousand floating-point numbers ain't a giant string, unless you are programming an 8086 in Elbonia. Converting two thousand numbers to text should take 50 microseconds at most.
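
The "ignore what you don't use" point in code: a reader that only asks for the attributes it knows keeps working when the sender adds new ones. A Python sketch; the `point` element and its attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

def read_point(xml_text):
    elem = ET.fromstring(xml_text)
    # Only the attributes this reader knows about are looked up;
    # anything else the sender adds is simply never consulted.
    return float(elem.get("x")), float(elem.get("y"))

old_message = '<point x="1.5" y="2"/>'
new_message = '<point x="1.5" y="2" z="9" units="mm"/>'  # sender added fields
print(read_point(old_message), read_point(new_message))
```

This only buys tolerance for additions; renaming or removing `x` would still break the reader, which is why the format itself shouldn't churn.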

Re:Oh boy... by Short+Circuit · 2005-03-18 16:51 · Score: 2, Interesting

Idle 90% of the time, but swamped for the 10% of the time you're waiting on results.

We need to shift applications from an event-compute-display model to a predict-compute-event-display model.

Caching data and intermediate data structures helps. Possibly even pre-computing them, when available memory permits.

For example, let's say you've just entered a formula into a spreadsheet. The spreadsheet app can prepare the results of what would happen if you, for example, filled a row or column of cells with the formula.
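
A minimal Python sketch of the predict-compute-event-display idea, assuming a formula is a plain function of one cell value and that "fill down the column" is the predicted next action. `SpeculativeSheet` and its methods are hypothetical names, not any real spreadsheet's API.

```python
from concurrent.futures import ThreadPoolExecutor

class SpeculativeSheet:
    """Precomputes the predicted next action while the user is idle."""

    def __init__(self, column):
        self.column = column
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.pending = {}  # formula -> Future holding its filled column

    def _fill(self, formula):
        return [formula(value) for value in self.column]

    def on_formula_entered(self, formula):
        # Predict: the user will likely fill this formula down the column,
        # so start that work now instead of waiting for the event.
        self.pending[formula] = self.pool.submit(self._fill, formula)

    def on_fill_requested(self, formula):
        future = self.pending.pop(formula, None)
        if future is not None:
            return future.result()  # prediction hit: ready or already in progress
        return self._fill(formula)  # prediction miss: compute on demand

    def close(self):
        self.pool.shutdown(wait=True)

sheet = SpeculativeSheet([1, 2, 3])
double = lambda x: x * 2
sheet.on_formula_entered(double)
print(sheet.on_fill_requested(double))
sheet.close()
```

A wrong prediction only costs otherwise idle CPU; the real trade-off is the memory held by precomputed results that are never used.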

--
tasks(723) drafts(105) languages(484) examples(29106)

The almighty Q by Anonymous Coward · 2005-03-18 18:20 · Score: 1, Interesting

Q: How does an XML newbie go about learning what it is, including XSLT and DTDs, and how to structure XML, XSLT, and DTDs so that they don't break in 5 years and aren't ungodly complex?

My initial impression is that XML is essentially as good as the VSAM/ISAM/Network Database Model and for similar reasons may drop out of use after 10 years.

Re:Oh boy... by LordHunter317 · 2005-03-18 18:47 · Score: 2, Interesting

Except the XML file tells the parser where its own definition is. Each of the XML files inside of an OO.o package tells you how to figure out what it is.

It's not quite that simple. XML files can have two kinds of definition: the DTD and the schema. The DTD is required for validation (i.e., checking that the document is valid, which is stricter than merely well-formed), the schema for describing the layout of the elements (e.g., an integer goes in the foo attribute). Neither is required for an XML document (though you must have a DTD if you want to validate it). Schemas aren't required at all, yet a schema is what you want if you really want to be able to programmatically manipulate XML without knowing anything about its form. Even then, schemas may not be very useful; they'll tell you what's legal content in an element, but they still tell you nothing about what's supposed to go into that element (i.e., what does the data stored in element 'foo' mean?). DTDs are useless for telling you anything about the content as well; they are a holdover from the SGML days.
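
The well-formed versus valid distinction is easy to see with Python's standard `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD or schema (a validating parser such as lxml is needed for that). A sketch:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<doc><foo>1</foo></doc>"))  # no DTD, still fine
print(is_well_formed("<doc><foo>1</doc>"))        # mismatched end tag
```
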

I should go further to point out that OO does define DOCTYPES, but doesn't define any XML Schema information. Even if it did, that still doesn't tell me what the tag 'font-attribute' means. You still have to structure your XML schema in such a manner that a human can interpret meaning. So 'human-readable' is still in the eye of the beholder. XML doesn't go any further to rectify this than any other format. Making your data XML doesn't automatically make it human-readable. It's just like naming variables in a programming language: the name is arbitrary, but a good name will tell me what the variable is supposed to be holding (e.g., 'tmp' vs. 'lookup_value').

As an aside, were you referring to the xmlns declaration when you said, "A generic XML parser can at least find the URI to the file's type definition"? Those don't actually have any real-world meaning. They exist solely to let the XML parser know that the namespace I call 'foo' in one document and the namespace I call 'bar' in a second document are the same namespace. They don't have to have any real-world relevance (though they often do). They play no role in validation beyond the namespace identification I mentioned. If you look up the namespace URIs even for 'official' XML groups, you'll see they usually link to documentation, not to a DTD or anything.
Some parsers do smart things with some of the well-known namespace URIs, but there is no requirement for them to do so AFAIK.
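
A quick Python check of the namespace point: `xml.etree.ElementTree` replaces each prefix with the namespace URI it is bound to, so differently prefixed documents yield identical tags, and the URI is never fetched. The URI below is made up.

```python
import xml.etree.ElementTree as ET

URI = "http://example.com/ns/order"  # hypothetical namespace name, never fetched

doc_a = f'<foo:order xmlns:foo="{URI}"><foo:item/></foo:order>'
doc_b = f'<bar:order xmlns:bar="{URI}"><bar:item/></bar:order>'

a, b = ET.fromstring(doc_a), ET.fromstring(doc_b)
# Prefixes are gone; both use the {namespace-URI}localname form.
print(a.tag, b.tag)
```
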

Re:What it should have looked like by pkphilip · 2005-03-18 20:36 · Score: 2, Interesting

Yes, I think this definitely looks more sensible. It would have reduced the size of documents considerably and it does look cleaner.

Consider an XML snippet:

<sampletag name="this" type="that">
Some value
</sampletag>

This could be translated into
(sampletag [name="this"] [type="that"]Some value)

which is much smaller.

I wonder if someone will consider this for real.
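
A sketch of that translation in Python, assuming elements carry either text or child elements but not mixed content, and ignoring any escaping of `]` or `)` in values. The output spacing is one possible rendering of the notation above, and `to_sexpr` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    # (tag [attr="value"] ... text-or-children)
    parts = [elem.tag]
    parts += [f'[{k}="{v}"]' for k, v in elem.attrib.items()]
    text = (elem.text or "").strip()
    if text:
        parts.append(text)
    parts += [to_sexpr(child) for child in elem]
    return "(" + " ".join(parts) + ")"

xml_text = '<sampletag name="this" type="that">\nSome value\n</sampletag>'
sexpr = to_sexpr(ET.fromstring(xml_text))
print(sexpr, len(xml_text), len(sexpr))
```
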

Re:Oh boy... by lahi · 2005-03-19 00:33 · Score: 2, Interesting

I just want to state - again - that I think that Tim Berners-Lee ought to be fined heavily _and_ imprisoned for designing HTTP and HTML. Both contain uncountable design errors, which we have had to work around constantly ever since. He has done a tremendous disservice to the Internet Community. The HTTP protocol is simply a perverted form of the Gopher protocol (which itself was a trivial elaboration of the finger protocol, which is only good as protocol sample code.) And not having a proper SGML DTD from the start, but just a "loosely based on SGML" definition of HTML was outright criminal.

Oh, and the definitions of URIs and URLs also suck! Defining any constraints on the local part is the biggest mistake ever. URIs should have been like mail addresses and message IDs, which were the two prevalent object identifiers before the URL: both have a host part, which defines the host to which they apply, and a local part, which is just that: local, with no meaning defined by the protocol. If that had been the case, there would be no need for stupid URL-encoding, which can be done wrong in so many ways that I frankly doubt there is any way to actually do it right consistently.
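
The "done wrong in so many ways" complaint is easy to reproduce with Python's `urllib.parse`: encoding an already-encoded local part a second time silently changes it, and whether `/` gets escaped depends on a parameter. The path below is made up.

```python
from urllib.parse import quote, unquote

local = "reports/Q1 50%.txt"      # made-up local part with a space and a '%'
once = quote(local)               # '/' left alone by default (safe="/")
twice = quote(once)               # a second pass re-encodes every '%'
print(once)
print(twice)
print(quote("a/b"), quote("a/b", safe=""))  # is '/' data or a separator?
```

Decoding `twice` once gives back `once`, not the original name, so whichever side guesses wrong about how many times a string was encoded ends up with a different resource name.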

-Lasse

Slashdot Mirror


17 of 218 comments