Tim Bray On The Origin Of XML
gManZboy writes "Queue just posted an interview with XML co-inventor Tim Bray (currently at Sun Microsystems). Interestingly enough, the interviewer is none other than database pioneer Jim Gray (currently at Microsoft). Among other things, in their discussion Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo."
We all know Microsoft invented XML; how else could they have filed a patent for it? :)
< td padding="5px" > I'm < td >
** "It's not my job to stand between the people talking to me, and the ones listening to me." -- Pego the Jerk
I think it's very funny that XML looks like it is based on SGML.
But according to the interview, it seems that the similarities are merely coincidental.
How does that old saying go?
Those that do not understand Lisp are doomed to reinvent it, badly.
Why can't someone reinvent C so that it sucks less?
"database pioneer ... (currently at Microsoft)"
translated for Slashdot readers:
"sellout"
TB: And we missed. XML is a lot more complex than it really needs to be. It's just unkludgy enough to make it over the goal line. The burning issues? People were already starting to talk about using the Web for various kinds of machine-to-machine transactions and for doing a lot of automated processing of the things that were going through the pipes.
Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...
Gray interviews Bray, should have done it in May. Over by the bay.
Is my karma burning? Oh, what the hay.
That's hogwash. Everyone knows that the idea for XML came from the tablets of stone that Moses brought down from Mount Sinai. In these tablets were the beginnings of self-describing data. That alone was where the commandments of the W3C were originally sent out to the world.
But only in the last decade have scholars used transformation style sheets and super-computers to find more declarative complex types, hidden in the original Hebrew CDATA. It is thought there are tens if not hundreds of specifications in these texts that may never have a finalized draft.
Progress has been slow. While the discovery of SOAP in the 1800s has made data hygiene possible, much has yet to be standardized. Considering the aging DTD schemas left from the era of King James, uncovering more of XML's secrets will be crucial to the data exchange of humanity.
I work with XML every day. And every day I wonder the same thing: why the hell does the end tag name have to be repeated? Why can't it just be optional? In other words, why can't it just be abbreviated as: <tagname>data</> ?
Oh MAN I wish they could have done just that one little thing for us. It would cut our datagram size down by at least 30%, maybe more.
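A rough way to test the size claim on your own data is to rewrite every named end tag as `</>` and compare lengths. The sketch below uses a hypothetical sample datagram (the tag names are made up for illustration); the 30% figure above is the poster's estimate, and the real saving depends on how long the tag names are relative to the data. Note that the abbreviated output is SGML-style, not well-formed XML.

```python
import re

# Hypothetical sample datagram; substitute a real message to measure your own savings.
SAMPLE = (
    "<order><customerName>ACME</customerName>"
    "<itemNumber>42</itemNumber><quantity>7</quantity></order>"
)

# Matches a named end tag such as </customerName> (start tags are left alone).
END_TAG = re.compile(r"</[A-Za-z_][\w.\-]*\s*>")

def abbreviate_end_tags(xml: str) -> str:
    """Replace every named end tag with the anonymous form </> (not valid XML)."""
    return END_TAG.sub("</>", xml)

def saving_ratio(xml: str) -> float:
    """Fraction of characters removed by abbreviating end tags."""
    return 1 - len(abbreviate_end_tags(xml)) / len(xml)

if __name__ == "__main__":
    print(abbreviate_end_tags(SAMPLE))
    print(f"saved {saving_ratio(SAMPLE):.0%}")
```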
Umm, doesn't any kind of config file require specialized code to read it?
Either you need metadata to interpret the binary data, or you need to know the predetermined data layout to read it; that sounds kind of specialized to me.
The other option is plain text with encoded binary data. This isn't bad: it's human readable, kind of, but it doesn't explain the encoded binary data, so metadata is also needed. I'm thinking of xinitrc files and the old INI files from Win16. They have to be parsed as plain text, with no guarantee of best practice or anything.
XML is, well, human readable, with some meta info, but binary data still has to be encoded. The bonus here is that the layout has at least some kind of standard to adhere to, and it's possible, in theory, for one XML parser to read any arbitrary XML file.
So in any case you make a Faustian bargain: either it's not human readable, or it's something that needs to be parsed.
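To make the trade-off concrete, here is one hypothetical setting (`width=800`, `height=600`) stored three ways using only Python's standard library. The binary record is readable only if you already know its layout (the `struct` format `<II`), the INI text needs an INI-specific parser, and the XML is read by a general-purpose XML parser. The setting names and the binary layout are assumptions for illustration.

```python
import configparser
import struct
import xml.etree.ElementTree as ET

# 1. Compact binary: 8 bytes, but meaningless without knowing the layout "<II"
#    (two little-endian unsigned 32-bit ints: width, then height).
BINARY = struct.pack("<II", 800, 600)

def read_binary(blob: bytes) -> dict:
    width, height = struct.unpack("<II", blob)
    return {"width": width, "height": height}

# 2. INI-style text: human readable, but needs an INI parser and carries no types.
INI_TEXT = "[display]\nwidth = 800\nheight = 600\n"

def read_ini(text: str) -> dict:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["display"]
    return {"width": section.getint("width"), "height": section.getint("height")}

# 3. XML: human readable, and any conforming XML parser can build the tree.
XML_TEXT = "<display><width>800</width><height>600</height></display>"

def read_xml(text: str) -> dict:
    root = ET.fromstring(text)
    # The generic parser gives us the structure; converting text to int is still our job.
    return {child.tag: int(child.text) for child in root}

if __name__ == "__main__":
    print(len(BINARY), read_binary(BINARY))
    print(len(INI_TEXT), read_ini(INI_TEXT))
    print(len(XML_TEXT), read_xml(XML_TEXT))
```

In every case something has to know what the bytes mean; XML only moves the "structure" part of that knowledge into a shared, generic parser.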
Try making sense of your "compact binary config files" when something goes wrong, or when you want to port the config to a different application.
Yes, CPU cycles are cheap. CPUs sit idle over 90% of the time, even when there is a user in front of them. Spending the extra power processing 10K properly tagged files that are compatible across platforms, rather than incompatible binary files, is one of the best uses of raw CPU power we have.
Have you ever seen these guys in the same room at the same time? No? I thought as much.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Uhm, sorry, do you even know what the hell you're talking about?
Let's dissect this piece by piece.
>> "So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files"
Who the hell said anything about config files?
And we have tools to make things "compact" for us. It's called "compression".
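A quick way to see how much of XML's bulk is plain redundancy is to compress it. This sketch builds a hypothetical document of 1,000 similar records and compares the raw size with the `zlib` (DEFLATE) compressed size; the record layout is made up, and real savings depend on the data.

```python
import zlib

def make_records(n: int) -> bytes:
    """Build a hypothetical XML document with n similar records."""
    rows = "".join(
        f"<record><id>{i}</id><status>active</status></record>" for i in range(n)
    )
    return f"<records>{rows}</records>".encode("utf-8")

def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib (DEFLATE) compression at the maximum level."""
    return len(zlib.compress(data, 9))

if __name__ == "__main__":
    doc = make_records(1000)
    print("raw bytes:       ", len(doc))
    print("compressed bytes:", compressed_size(doc))
```

Repeated tag names are exactly the kind of redundancy a dictionary-based compressor removes, which is why compressed XML on the wire is often far smaller than its text form suggests.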
>> "with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it? "
Yes. Human readable. I'm a human. I can read it. Thus: Human readable. I don't understand what the quotes were for. Or your misspelling of "readable".
And "specialized libraries"? Oh, right.. I forgot. Binary formats don't NEED libraries to parse. Yep. Dunno why libjpeg62 even exists, when it's patently obvious you can just dump jpeg data straight to video memory. Oh yeah, who needs Microsoft Word. I just "cat resume.doc >/dev/lp" to print my documents. Cause it's binary you see. I don't need a library to parse it.
>> "Thanks Tim, the world owes you one!
But okay you're right, you gotta use those CPU cycles for something... "
No shit, Sherlock. Using CPU cycles to strictly check the type-validity of self-describing documents seems pretty worthwhile to me.
-Laxitive
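Python's standard library checks well-formedness but does not validate against a DTD or schema, so the type rule in this sketch is a hand-written stand-in for what a schema would declare. The element name follows the `<integer>` example used elsewhere in this thread; the rule that `<integer>` must hold a whole number is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def check_document(text: str) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(text)  # raises ParseError if not well-formed
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    # Hand-written type rule (an assumption standing in for a schema):
    # every <integer> element must contain a whole number.
    for elem in root.iter("integer"):
        try:
            int((elem.text or "").strip())
        except ValueError:
            problems.append(f"<integer> holds non-integer {elem.text!r}")
    return problems

if __name__ == "__main__":
    print(check_document("<plus><integer>1</integer><integer>2</integer></plus>"))
    print(check_document("<plus><integer>1</integer><integer>two</integer></plus>"))
    print(check_document("<plus><integer>1</plus>"))
```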
You know, the people who invented XML were a bunch of publishing technology geeks, and we really thought we were doing the smart document format for the future. Little did we know that it was going to be used for syndicated news feeds and purchase orders.
The most amazing thing is that back then, in 1995-1996 at Open Text, we were already using SGML as a data exchange protocol. All of us there (including Tim) ought to have known that XML would also have a life as a computer-to-computer communication protocol. The problem was that at the time so much of the SGML discourse was wrapped around the content-versus-format debate that we missed the obvious: the main use of XML was not as a replacement for HTML as a text format for the web, but as a kind of uber-ASCII to allow the ready exchange of data between dissimilar applications (just as ASCII in its time had eased the transfer of data between dissimilar hardware and/or software platforms).
TB: I spent two years sitting on the Web consortium's technical architecture group, on the phone every week and face-to-face several times a year with Tim Berners-Lee. To this day, I remain fairly unconvinced of the core Semantic Web proposition.
Everyone who has actually done work on knowledge representation in the real world knows that this is a huge, difficult problem that, as Tim Bray says, is unlikely to be solved anytime soon.
The only people who claim otherwise are either frauds or ignorant. The Semantic Web initiative has both: Tim Berners-Lee is very smart, but not a computer scientist, so he's not aware of the size of the challenge, plus he's a genuinely nice person, so he tends to trust others too much.
He has surrounded himself with the snake-oil AI salesmen from the early 1980s who had promised us impending ubiquitous intelligent computers. Those fraudsters got found out back then and spent the next fifteen years in academic limbo, only to be rescued by Tim Berners-Lee's naivete.
replacing compact, binary config files with 'human-readible', resource-intensive XML
Like what, the Windows registry? Don't say shit like that or ESR will shoot you with one of those guns he collects.
http://www.faqs.org/docs/artu/ch03s01.html#id28
where there's fish, there's cats
why the hell does the end tag name have to be repeated?
Because that is the single biggest source of headaches in parsing SGML, the precursor of XML, in which such a construct is allowed.
It also makes error recovery very difficult, something that we know is quite important from all that malformed HTML code out there. The XML creators knew that too.
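Because XML requires named end tags, a conforming parser can reject both the SGML-style abbreviated form and a mismatched close, and report where it went wrong. A small check with Python's `xml.etree.ElementTree` (the sample strings are illustrative):

```python
import xml.etree.ElementTree as ET

SAMPLES = {
    "named end tags": "<a><b>data</b></a>",
    "SGML-style empty end tag": "<a><b>data</></a>",
    "mismatched end tag": "<a><b>data</a>",
}

def parse_report(text: str) -> str:
    """Parse text and report either success or the parser's error position."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        line, column = err.position  # (line, column) where the parser gave up
        return f"error at line {line}, column {column}: {err}"

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label:28} {parse_report(text)}")
```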
Theirs is, in reality, a proprietary format, but to stay buzzword-compliant they use XML, which hurts performance, sometimes dearly...
For example, to pass a couple of thousand floating-point numbers from the front end to a computation engine, each is converted to a text string with something like <Parameter> around it. The giant strings (memory is cheap, right?) are kept in memory until the whole collection is ready to be sent out... The engine then parses the arriving XML and fills in the array of doubles for processing.
It really is disgusting, especially since freely available alternatives exist... For instance, PVM solved the problem of efficiently passing datasets between computers a decade ago, but nooo, we only studied XML in college -- and it is, like, really cool, dude...
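The overhead described above is easy to estimate. This sketch encodes a hypothetical array of 2,000 doubles both as `<Parameter>` elements (using `repr`, which round-trips a Python float exactly) and as packed IEEE 754 binary with `struct`, then decodes both. Only the `<Parameter>` tag name comes from the comment; the wrapper element, the data, and the binary layout are assumptions.

```python
import random
import struct
import xml.etree.ElementTree as ET

def to_xml(values: list[float]) -> str:
    # One <Parameter> element per number; <Parameters> is a hypothetical wrapper.
    body = "".join(f"<Parameter>{v!r}</Parameter>" for v in values)
    return f"<Parameters>{body}</Parameters>"

def from_xml(text: str) -> list[float]:
    return [float(p.text) for p in ET.fromstring(text).iter("Parameter")]

def to_binary(values: list[float]) -> bytes:
    # Little-endian 64-bit doubles: exactly 8 bytes per value, no delimiters.
    return struct.pack(f"<{len(values)}d", *values)

def from_binary(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for repeatable sizes
    values = [rng.uniform(-1e6, 1e6) for _ in range(2000)]
    xml_text, blob = to_xml(values), to_binary(values)
    assert from_xml(xml_text) == values and from_binary(blob) == values
    print("XML bytes:   ", len(xml_text.encode("utf-8")))
    print("binary bytes:", len(blob))
```

Both encodings are lossless here; the difference is the per-value text conversion, the tag overhead, and the parse on the receiving end.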
In Soviet Washington the swamp drains you.
I think XML should have looked more like this:
hi!
<ele1> <ele2> <ele3> </> </> <ele4> <ele5> </> </>
Which element did I forget to close?
<ele1> <ele2> <ele3> </ele3> </ele1> <ele4> <ele5> </ele5> </ele4>
Clearer now?
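The point of the example can be checked mechanically. This sketch uses a simplified tokenizer (bare start tags, `</>`, and `</name>` only; no attributes, text, or real XML rules) and a stack: with anonymous closes the checker can only report that the outermost element was left open, while named closes reveal that `ele2` is the one missing.

```python
import re

# Simplified tokenizer: <name>, </name>, and SGML-style anonymous </>.
# Attributes, text content, comments, etc. are deliberately not handled.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)?>")

ANONYMOUS = "<ele1><ele2><ele3></></><ele4><ele5></></>"
NAMED = "<ele1><ele2><ele3></ele3></ele1><ele4><ele5></ele5></ele4>"

def check_nesting(text: str) -> str:
    """Return "ok", or a description of the first nesting problem found."""
    stack: list[str] = []
    for match in TOKEN.finditer(text):
        is_close, name = match.group(1) == "/", match.group(2)
        if not is_close:
            if name is None:
                return "empty start tag <>"
            stack.append(name)
        elif not stack:
            return "close tag with nothing open"
        elif name is None:
            stack.pop()  # anonymous </> silently closes whatever is innermost
        elif name != stack[-1]:
            return f"expected </{stack[-1]}>, found </{name}>"
        else:
            stack.pop()
    return f"unclosed: {stack}" if stack else "ok"

if __name__ == "__main__":
    print(check_nesting(ANONYMOUS))
    print(check_nesting(NAMED))
```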
johannesg writes: "I've heard this quote in relation to XML before, and I don't get it. LISP is a programming language. XML is a method for storing data. About the only relation between the two that I can find is that both use nesting. So, why does this get brought up whenever XML is being discussed?"
Lisp source code is first parsed into S-expressions before being compiled. The programmer can manipulate these S-expressions to generate new programming constructs.
S-expressions are nested lists of dynamically typed data. The compiler turns these nested lists into bytecode or assembly code. But before this happens, you're able to manipulate a well-defined, concise and platform-independent data format. The format is so useful that it is also used to store and transport non-code.
Here's a Lisp function call nested within another function call:
(/ (+ 1 2 3) 6)
[i.e. add 1, 2, and 3 together and then divide by 6] Let's first give different names to the function operators:
(divide (plus 1 2 3) 6)
Now introduce redundancy by duplicating the opening function names:
(divide (plus 1 2 3 /plus) 6 /divide)
Translate the dynamically typed integers to explicit type identifiers:
(divide (plus (integer 1 /integer) (integer 2 /integer) (integer 3 /integer) /plus) (integer 6 /integer) /divide)
Now convert the parentheses and spaces to angle brackets to generate XML:
<divide>
<plus>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
</plus>
<integer>6</integer>
</divide>
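The translation steps above are mechanical enough to automate. This minimal sketch parses the nested-list form `(divide (plus 1 2 3) 6)` and emits the XML shown, wrapping each integer in an `<integer>` element. It handles only symbols, integers, and parentheses, and is an illustration rather than a real Lisp reader.

```python
import xml.etree.ElementTree as ET

def tokenize(src: str) -> list[str]:
    # Pad parentheses with spaces so split() yields "(", ")", and atoms.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str]):
    """Read one S-expression from the token list (consumed from the front)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    return int(token) if token.lstrip("-").isdigit() else token

def to_element(expr) -> ET.Element:
    """Map (op arg ...) to <op>...</op> and integers to <integer>n</integer>."""
    if isinstance(expr, int):
        elem = ET.Element("integer")
        elem.text = str(expr)
        return elem
    if isinstance(expr, str):  # a bare symbol becomes an empty element
        return ET.Element(expr)
    head, *args = expr
    elem = ET.Element(head)
    for arg in args:
        elem.append(to_element(arg))
    return elem

def sexp_to_xml(src: str) -> str:
    return ET.tostring(to_element(parse(tokenize(src))), encoding="unicode")

if __name__ == "__main__":
    print(sexp_to_xml("(divide (plus 1 2 3) 6)"))
```

The output is the same tree as above without the indentation; the nesting survives the translation unchanged, which is the thread's point about the two notations.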
Lisp S-expressions are a method for storing/expressing data AND code. They have less overhead than XML, solve more problems than XML (comfortably human readable programming languages can also be written in S-expressions, e.g. Scheme and Common Lisp) and they were invented decades earlier.
Regards,
Adam Warner
"Hmmm, OED might be unclear to tons of people reading this, I'll make them have to click on a link to know what I'm talking about."
Obligatory relation to discussion content:
Providing a link instead of writing a clear summary is choosing the wrong tool for the task at hand. Authors of some other comments in this thread have shown that XML also is the wrong tool for many of the tasks to which it is applied. Whether it's passing data internally within an application or summarizing an article for the homepage, choosing the right alternative can make a difference between efficient clarity and an inelegant kludge.
Applying the right algorithmic tool to the right problem is actually a focus of CS. This is why sorting routines are often studied -- for instance, a routine which is more efficient at sorting millions of unordered pieces of data may be very wasteful when dealing with nearly presorted data.
The distinction is not often understood and has more of an impact than the observer might think. For instance, when writing an application for a handheld in which data is kept sorted and is usually viewed between insertions, it makes sense to sort after every data element is added to the database. However, this means adding a single item to a mostly ordered set. Understanding that quicksort is a poor choice for this application (a naive quicksort degrades toward quadratic time on nearly sorted input, while insertion sort places a single new item in linear time) can mean a difference in battery life.
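The battery-life argument comes down to counting work. This sketch counts comparisons when one new item is appended to an already sorted list and the whole list is re-sorted, using insertion sort versus a textbook quicksort that takes the first element as its pivot. That is the naive variant; library sorts choose pivots more carefully. The list size is arbitrary.

```python
def insertion_sort(items: list[int]) -> int:
    """Sort items in place; return the number of comparisons made."""
    comparisons = 0
    for i in range(1, len(items)):
        key, j = items[i], i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= key:
                break
            items[j + 1] = items[j]  # shift larger element one slot right
            j -= 1
        items[j + 1] = key
    return comparisons

def naive_quicksort(items: list[int]) -> tuple[list[int], int]:
    """Return (sorted copy, comparisons) using the first element as pivot."""
    if len(items) <= 1:
        return list(items), 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    comparisons = len(rest)  # partitioning: one pivot comparison counted per element
    left, c_left = naive_quicksort(smaller)
    right, c_right = naive_quicksort(larger)
    return left + [pivot] + right, comparisons + c_left + c_right

if __name__ == "__main__":
    data = list(range(300)) + [150]  # sorted data plus one newly added item
    by_insertion = data.copy()
    c_ins = insertion_sort(by_insertion)
    by_quicksort, c_qs = naive_quicksort(data)
    assert by_insertion == by_quicksort == sorted(data)
    print("insertion sort comparisons:", c_ins)
    print("naive quicksort comparisons:", c_qs)
```

On sorted input the first-element pivot splits off nothing, so each level of recursion peels off one element, which is where the quadratic behaviour comes from.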
Somebody get that guy an ambulance!