Tim Bray On The Origin Of XML

OH come on.. by Anonymous Coward · 2005-03-18 14:38 · Score: 3, Funny

We all know Microsoft invented XML, how else could they have filed a patent for it :)

Re:OH come on.. by Anonymous Coward · 2005-03-18 14:41 · Score: 2, Funny

I thought it was Al Gore who invented XML.
Re:OH come on.. by Anonymous Coward · 2005-03-18 14:51 · Score: 0, Offtopic

No, Al Gore spilled boiling McDonalds coffee while using emacs.
Re:OH come on.. by LokieLizzy · 2005-03-18 15:15 · Score: 1

The fact that the parent was modded "4: Informative" is the clearest indicator yet that a pack of trained chimpanzees are responsible for moderating slashdot.

--
My digital rights don't need management.
Re:OH come on.. by Mistlefoot · 2005-03-18 15:22 · Score: 2, Informative

Microsoft is not applying for a patent on XML but rather, a patent

" that cover word processing documents stored in the XML (Extensible Markup Language) format. The proposed patent would cover methods for an application other than the original word processor to access data in the document."

<URL:http://news.com.com/2100-1013_3-5146581.html>
Re:OH come on.. by Anonymous Coward · 2005-03-18 15:23 · Score: 0

The fact that the parent was modded "4: Informative" is the clearest indicator yet that a pack of trained chimpanzees are responsible for moderating slashdot.

Come on! After all, chimpanzees have written the works of Shakespeare.
Re:OH come on.. by ikkonoishi · 2005-03-18 19:08 · Score: 3, Funny

I resent that.

I never had a day of training in my life!

OHH a banana!
Re:OH come on.. by SQLz · 2005-03-20 11:35 · Score: 1

Because the patent office is smoking pot. The good shit too.

here's my question.. can you decrypt this? by peculiarmethod · 2005-03-18 14:41 · Score: 3, Funny

< td padding="5px" > I'm < td >

--
** "It's not my job to stand between the people talking to me, and the ones listening to me." -- Pego the Jerk

Re:here's my question.. can you decrypt this? by Anonymous Coward · 2005-03-18 14:44 · Score: 0

that's easy..

"I'm in a padded cell" haha
Re:here's my question.. can you decrypt this? by holy_robot · 2005-03-18 14:57 · Score: 3, Funny

Your cell is open.

--
Just cause you feel it doesn't mean it's there.
Re:here's my question.. can you decrypt this? by Segway+Ninja · 2005-03-18 14:59 · Score: 5, Funny

You should be in a padded cell, but someone forgot to close it.
Re:here's my question.. can you decrypt this? by Anonymous Coward · 2005-03-18 15:14 · Score: 0

GODDAMN this was the funniest thing since Gary Niger showed the MacZealots what a cock is.
Re:here's my question.. can you decrypt this? by Anonymous Coward · 2005-03-18 15:15 · Score: 5, Funny

More correctly, that should read as follows in a, say, riddle.html (notice the closing </td>):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Riddle</title>
<link rel="stylesheet" href="/design/default.css" type="text/css" title="Default Stylesheet" />
</head>
<body>
<table>
<tr>
<td class="example">I'm</td>
</tr>
</table>
<p class="W3C">
<a class="debug external" href="http://validator.w3.org/check?uri=referer"><img class="debug" src="http://www.w3.org/Icons/valid-xhtml11" alt="Valid XHTML 1.1!" /></a>
<a class="debug external" href="http://jigsaw.w3.org/css-validator/check/referer"><img class="debug" src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!" /></a>
</p>
</body>
</html>

With a corresponding /design/default.css like:
td.example { padding: 5px; }
p.W3C { display: none; }

Additionally you should take care that your .htaccess includes (to correct the application/xhtml+xml to text/html for IE & Co...):
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_FILENAME} \.html$
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule .* - [T=application/xhtml+xml]

Of course there's a serious lack of meta-data here, the padding should be given in cm (or any other absolute measure) or em, and it's not fulfilling the W3C Accessibility Guidelines... :-P

And now I need to overcome the Lameness filter, oh dear... I assume it's the whitespace which I used for indentation. *shrugs* It doesn't help so far; sometimes I wonder how I'm supposed to write real comments including code examples here. Slashdot sure seems stupid sometimes.
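For readers who don't speak mod_rewrite, the decision those RewriteCond lines make can be sketched in Python. This is only an illustration of the logic, not a replacement for the .htaccess: the function name and its parameters are made up here, and the q=0 check is deliberately a bit stricter than the pattern above, which as written would also match q=0.9.

```python
import re

XHTML = "application/xhtml+xml"

# Matches an explicit refusal such as "application/xhtml+xml;q=0" or "; q=0.0",
# but not "q=0.9". (Assumption: stricter than the RewriteCond pattern above.)
REFUSED = re.compile(r"application/xhtml\+xml\s*;\s*q=0(?:\.0{0,3})?\s*(?:,|;|$)")


def pick_content_type(accept_header, path, protocol="HTTP/1.1", default="text/html"):
    """Return the MIME type to send, following the four conditions in the rules."""
    if (
        XHTML in accept_header                    # client advertises XHTML support
        and not REFUSED.search(accept_header)     # ...and did not rule it out with q=0
        and path.endswith(".html")                # only touch .html files
        and protocol == "HTTP/1.1"                # only for HTTP/1.1 requests
    ):
        return XHTML
    return default


if __name__ == "__main__":
    print(pick_content_type("text/html,application/xhtml+xml;q=0.9", "/riddle.html"))
    print(pick_content_type("text/html, */*", "/riddle.html"))
```

A browser that never mentions application/xhtml+xml in its Accept header falls through to the default, which is the whole point of the workaround.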
Re:here's my question.. can you decrypt this? by Anonymous Coward · 2005-03-18 22:21 · Score: 0

Additionally you should take care that your .htaccess includes (to correct the application/xhtml+xml to text/html for IE & Co...):

Except serving XHTML 1.1 (or XHTML 1.0 that doesn't follow Appendix C) as text/html is against spec.. Leave off the XML declaration and switch to XHTML 1.0 if you want to support Internet Explorer.
Re:here's my question.. can you decrypt this? by Anonymous Coward · 2005-03-19 00:38 · Score: 0

We're talking XML here, so the xml declaration is mandatory (it wouldn't be if we were just referring to XHTML...). So, if we want to serve XHTML+XML, then we have to check for compatibility. Given it's there, send application/xhtml+xml, otherwise remain at default for .html (which is a valid extension, even for xml), in most cases text/html, which is wrong, but then, only IE is still stupid enough not to understand application/xhtml+xml... It's a decision based upon your personal preference (OR used as coders would): Adhere to a standard in your document (xhtml OR xhtml+xml) OR adhere to the standard server side (no text/html under any circumstances). Personally, I prefer my documents to adhere to the standard and have the server work around it as long as it's necessary (until IE dies, which will take quite some time)...
Re:here's my question.. can you decrypt this? by Anonymous Coward · 2005-03-19 00:58 · Score: 0

We're talking XML here, so the xml declaration is mandatory

It's only necessary if you use something other than UTF-8 or UTF-16.

(it wouldn't be if we were just referring to XHTML...).

XHTML is XML. We are talking about both at once.

in most cases text/html, which is wrong

Well that was my point.

It's a decision based upon your personal preference

It's my personal preference to comply with the specifications instead of going out of spec., yes.

Adhere to a standard in your document (xhtml OR xhtml+xml) OR adhere to the standard server side (no text/html under any circumstances).

Which standard mandates no text/html under any circumstances?

Personally, I prefer my documents to adhere to the standard

Well then you won't serve XHTML 1.1 documents as text/html then.
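The point about the declaration only being needed outside UTF-8/UTF-16 is easy to check with a stock parser. A small sketch using Python's xml.etree.ElementTree; the sample snippets are made up for the demonstration:

```python
import xml.etree.ElementTree as ET

body = "<td class='example'>I'm</td>"

# UTF-8 bytes with no XML declaration at all: the parser assumes UTF-8.
bare = ET.fromstring(body.encode("utf-8"))

# The same content with the declaration spelled out parses identically.
declared = ET.fromstring(
    b'<?xml version="1.0" encoding="UTF-8"?>' + body.encode("utf-8")
)

# Latin-1 bytes are a different story: without a declaration the byte 0xE9
# is read as (broken) UTF-8 and the document is rejected.
latin1 = "<p>caf\u00e9</p>".encode("latin-1")
try:
    ET.fromstring(latin1)
    undeclared_latin1_parsed = True
except ET.ParseError:
    undeclared_latin1_parsed = False

# With the encoding declared, the same bytes are fine.
labelled = ET.fromstring(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1)

if __name__ == "__main__":
    print(bare.text, declared.text, undeclared_latin1_parsed, labelled.text)
```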
Re:here's my question.. can you decrypt this? by Jerf · 2005-03-19 04:33 · Score: 1

You know, it's not Slashdot's fault you can't read. Whack "reply", and you'll see:
Allowed HTML <B> <I> <P> <A> <LI> <OL> <UL> <EM> <BR> <TT> <STRONG> <BLOCKQUOTE> <DIV> <ECODE> <DL> <DT> <DD> <CITE> (Use "ECODE" instead of "PRE" or "CODE".)
And note that I used ECODE to show that.

Ecode isn't perfect,
but it does preserve indentation without histrionics. And it preserves brackets: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head>
It eats extra spaces in the text itself, though; in that example "indentation without" has four spaces between those two words. (They're probably still in the HTML, just not converted.) But the HTML was just a straight copy-paste.

SGML by Anonymous Coward · 2005-03-18 14:50 · Score: 3, Interesting

I think it's very funny that XML looks like it is based on SGML.

But according to the interview, it seems that the similarities are merely coincidental.

Re:SGML by p0rnking · 2005-03-18 19:22 · Score: 1

I do believe that xml is a "simplified", "sub-set" of SGML, same with HTML.
Re:SGML by Anonymous Coward · 2005-03-18 20:35 · Score: 0

But according to the interview, it seems that the similarities are merely coincidental.

Er, no. According to the interview XML is SGML. The 5% of SGML that anybody would want to use.
Re:SGML by Anonymous Coward · 2005-03-18 22:07 · Score: 0

HTML isn't a subset of SGML, it's an SGML application. The same way a computer program isn't a "subset" of C merely because it is written in it.
Re:SGML by iamvego · 2005-03-18 22:45 · Score: 0

I'm very surprised the interview didn't mention SGML as I thought XML was directly adapted from it with some properties stripped away for simplicity's sake.
Re:SGML by $1uck · 2005-03-19 02:41 · Score: 1

If only that were true. I have to deal with far too much sgml (particularly really poorly written DTD's). =/
Re:SGML by Max+Webster · 2005-03-19 06:59 · Score: 1

From the interview, the OED project was partly sponsored by IBM, which would have been the IBM Canada Lab where I worked. In fact, I think my department was involved in some way with the OED work.

I had always thought that SGML evolved into XML too, but from the article it sounds like maybe it's a separate branch descended straight from GML. Since both were IBM-influenced, makes sense that they would look similar.

(GML?)

Long before SGML or XML, IBM had "GML" which was a set of tags that you would find very familiar today. p for paragraph, ul for unordered list, h1 for heading level 1. Closing tags for paragraphs and list items were optional. Only thing was, they used different delimiters, so the markup looked like: :h1.First-Level Heading:eh1.

The DTD was actually better than any SGML or XML DTD I've used for publishing since, because it had a lot of good semantic tags for reference and programming content. You can see echoes of it in Docbook (things like the tags to generate syntax diagrams), but Docbook grew so big that everyone only uses a subset of it.

GML even had tags for doing Gantt charts, and I would dearly love to find a publishing system that could do printouts from such tags. I had a system for tracking schedules by manipulating date attributes within the schedule tags, e.g. "this activity is starting late but must still finish on time; this other activity is starting late and the end date will slip by the same amount".

The unstructured aspect was actually quite useful for publishing. You could define a "type" of table with column widths specified in a variety of relative and absolute ways, including "make the column wide enough to hold the text 'abc xyz'". Support for entities and imported content was also useful. Then you could create multiple instances of that table type. In practice, I found that was more intuitive than (say) defining via CSS. You could define a couple of entities one way and pull in a module that used them; then redefine them and pull in the same module again, to produce slightly different versions of the same content. Ditto the conditional text facility, all tag- and text-based, with enough flexibility to be useful.

IBM switched to various SGML and XML DTDs for publishing back in the mid-to-late '90s, but the restrictions and draconian error handling represented a step backwards IMHO. Here it is 10 years later, and we still haven't gotten back to the level of ease of use and flexibility that GML had in the '80s.
Re:SGML by smallpaul · 2005-03-19 18:57 · Score: 2, Informative

XML is defined as a subset of SGML. From the specification:
"The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document."
Re:SGML by JohnQPublic · 2005-03-21 03:11 · Score: 2, Informative

GML even had tags for doing Gantt charts, and I would dearly love to find a publishing system that could do printouts from such tags. ... ... Here it is 10 years later, and we still haven't gotten back to the level of ease of use and flexibility that GML had in the '80s

You're looking for Gary Richtmeyer's B2H program, available from IBM's z/VM download site. It's written in Rexx and runs on every system you're likely to be using, comes in source form, and can process just about everything the BookMaster markup can dish out (even the syntax diagram tags).

Lisp strikes again by Dancin_Santa · 2005-03-18 14:50 · Score: 5, Funny

How's that old saying go?

Those that do not understand Lisp are doomed to reinvent it, badly.

Why can't someone reinvent C so that it sucks less?

Re:Lisp strikes again by r2q2 · 2005-03-18 14:59 · Score: 2, Informative

I believe you are referring to Greenspun's Tenth Rule: http://c2.com/cgi/wiki?GreenspunsTenthRuleOfProgramming

--
My UID is prime is yours?
Re:Lisp strikes again by Anonymous Coward · 2005-03-18 15:04 · Score: 0

Anyone who doesn't understand lambda calculus should be shot and killed, or prevented from programming a computer. Either way is fine with me.
Re:Lisp strikes again by Anonymous Coward · 2005-03-18 15:25 · Score: 0

How about you eat my fuck?
Re:Lisp strikes again by Anonymous Coward · 2005-03-18 15:36 · Score: 0

Those that do not understand Lisp are doomed to reinvent it, badly.

I am sick and tired of hearing about Lisp as the Leet Language of the Gods. Phoooey!
Re:Lisp strikes again by Anonymous Coward · 2005-03-19 21:27 · Score: 0

> Why can't someone reinvent C so that it sucks less?

They did, it's called Pike.
Re:Lisp strikes again by Anonymous Coward · 2005-03-21 02:40 · Score: 0

that would be impossible, as C is devoid of suckiness.

Can't Microsoft do *anything* original? by kelzer · 2005-03-18 14:50 · Score: 2, Interesting

From the "Jim Gray" link:

Jim Gray is a "Distinguished Engineer" in Microsoft's Scaleable Servers Research Group and manager of Microsoft's Bay Area Research Center (BARC).

OK, Xerox has their famous Palo Alto Research Center (PARC), so Microsoft just has to have its own similarly named center in the same general vicinity. Sheesh!

--

---------------------------------------------
SERENITY NOW!!!!!!!!!!!!!!!!

Re:Can't Microsoft do *anything* original? by datastalker · 2005-03-18 14:54 · Score: 1

The other possibility is that they were trying to establish a subway system software division, and Bay Area Rapid Transit (BART) was already taken. ;)

--
Find out about the Lexus Rx400h Hybrid!
Re:Can't Microsoft do *anything* original? by Baricom · 2005-03-18 14:54 · Score: 2

Bay Area Research Center (BARC).

Woof!
Re:Can't Microsoft do *anything* original? by Ctrl-Z · 2005-03-18 15:35 · Score: 1

Moof?

--
www.timcoleman.com is a total waste of your time. Never go there.
Re:Can't Microsoft do *anything* original? by Anonymous Coward · 2005-03-18 16:25 · Score: 1, Funny

"...And we'll call it the Bay Area Research Facility. And then we can... er... just a moment..."

Third Post! by Anonymous Coward · 2005-03-18 14:51 · Score: 0

But seriously, XML is good technology, but how can Microsoft patent something they didn't even invent..... Oh, sorry, they filed in the US. Got it!

Oh boy... by Alwin+Henseler · 2005-03-18 14:52 · Score: 2, Insightful

So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it?

Thanks Tim, the world owes you one!

But okay you're right, you gotta use those CPU cycles for something...

--Don't give the world what it asks for, but what it needs.

Re:Oh boy... by MrLint · 2005-03-18 15:25 · Score: 4, Insightful

Umm, doesn't any kind of config file require specialized code to read it?

As you either need metadata to interpret the binary data, or need to know the predetermined data layout to read it, that sounds kinda specialized to me.

The other option is plain text with encoded binary data. This isn't bad: it's human readable, kinda, but it doesn't explain the encoded binary data, so metadata is still needed. I can think of xinitrc files and old ini files from win16. It has to be parsed as plain text, with no guarantee of best practice or anything.

XML: well, human readable, with some meta info, and still encoded binary data. The bonus here is that the layout has at least some kind of standard to adhere to, and it's possible in theory for one XML parser to read any arbitrary XML file.

So in any case you get a deal with faust: not human readable, or something that needs to be parsed.
Re:Oh boy... by Alomex · 2005-03-18 15:26 · Score: 4, Insightful

Try making sense of your "compact binary config files" when something goes wrong, or when you want to port the config to a different application.

Yes, CPU cycles are cheap. CPUs sit idle over 90% of the time, even when there is a user in front of them. Spending the extra power processing 10K properly tagged files that are compatible across platforms rather than incompatible binary files is one of the best uses of raw CPU power we have.
Re:Oh boy... by Laxitive · 2005-03-18 15:32 · Score: 4, Insightful

Uhm, sorry, do you even know what the hell you're talking about?

Let's dissect this piece by piece.

>> "So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files"

Who the hell said anything about config files?

And we have tools to make things "compact" for us. It's called "compression".

>> "with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it? "

Yes. Human readable. I'm a human. I can read it. Thus: Human readable. I don't understand what the quotes were for. Or your misspelling of "readable".

And "specialized libraries"? Oh, right.. I forgot. Binary formats don't NEED libraries to parse. Yep. Dunno why libjpeg62 even exists, when it's patently obvious you can just dump jpeg data straight to video memory. Oh yeah, who needs Microsoft Word. I just "cat resume.doc >/dev/lp" to print my documents. Cause it's binary you see. I don't need a library to parse it.

>> "Thanks Tim, the world owes you one!

But okay you're right, you gotta use those CPU cycles for something... "

No shit sherlock. Using CPU cycles to strictly check the type-validity of self-describing documents seems pretty worthwhile to me.

-Laxitive
Re:Oh boy... by Evil+Grinn · 2005-03-18 15:46 · Score: 3, Interesting

replacing compact, binary config files with 'human-readible', resource-intensive XML

Like what, the Windows registry? Don't say shit like that or ESR will shoot you with one of those guns he collects.

http://www.faqs.org/docs/artu/ch03s01.html#id2888298

--
where there's fish, there's cats
Re:Oh boy... by Anonymous Coward · 2005-03-18 15:47 · Score: 0

XML, that needs specialized libraries to make sense of it?

Not going to mention any names here, but you might be surprised how much corporate B2B code doesn't use any specialized libraries for manipulating XML.

Plain text manipulation all the way for them... ugh.
Re:Oh boy... by Geoffreyerffoeg · 2005-03-18 15:50 · Score: 1

So this guy Tim Bray is one of the people we have to thank for replacing compact, binary config files with 'human-readible', resource-intensive XML, that needs specialized libraries to make sense of it?

No.

Unless you're the kind of guy who likes to blame Henry Ford for the drive-by shooting.
Re:Oh boy... by TummyX · 2005-03-18 15:53 · Score: 1

Yes. I think we should also blame the other Tim who invented HTTP and HTML (both text based).

I mean, what's wrong with a WWW based on the superior binary DOC format?
Re:Oh boy... by Anonymous Coward · 2005-03-18 16:17 · Score: 0

Hasn't he said he's a terrible shot or something along those lines? If so it kind of defeats the purpose of being a gun enthusiast.
Re:Oh boy... by shutdown+-p+now · 2005-03-18 16:31 · Score: 1

Oh yeah, who needs Microsoft Word. I just "cat resume.doc >/dev/lp" to print my documents. Cause it's binary you see. I don't need a library to parse it.

Well, try to unpack an .sxw file and cat it. You won't get anything useful out of it either without knowing the way data is stored inside an XML, and by the time you get there, what does it matter whether you operate in terms of bytes or tags? The code will be specific to the file format nonetheless.
Re:Oh boy... by Short+Circuit · 2005-03-18 16:51 · Score: 2, Interesting

Idle 90% of the time, but swamped for the 10% of the time you're waiting on results.

We need to shift applications from an event-compute-display model to a predict-compute-event-display model.

Caching data and intermediate data structures helps. Possibly even pre-computing them, when available memory permits.

For example, let's say you've just entered a formula into a spreadsheet. The spreadsheet app can prepare the results of what would happen if you, for example, filled a row or column of cells with the formula.

--
tasks(723) drafts(105) languages(484) examples(29106)
Re:Oh boy... by Graymalkin · 2005-03-18 17:41 · Score: 1

Except the XML file tells the parser where its own definition is. Each of the XML files inside of an OO.o package tells you how to figure out what it is. A generic XML parser can at least find the URI to the file's type definition. Knowing what all of the elements represent makes it possible to figure out what to do with all of the data and to tell if the XML file itself is even made properly. A comma delimited file doesn't tell the parser what each of the fields means, neither does a straight binary data file. At best they announce their filetype in a header.

Joe,Slashdort,,123 Anyplace,CO: Mom,Los Angeles,CA,34,118

versus

<slashdork>
<firstname>Joe</firstname>
<lastname>Slashdort</lastname>
<nickname></nickname>
<address1>123 Anyplace</address1>
<address2>CO: Mom</address2>
<city>Los Angeles</city>
<state>CA</state>
<latitude>34</latitude>
<longitude>118</longitude>
</slashdork>

Twenty years from now the dude writing a parser for the XML can do so largely without a manual or any other documentation. The latitude and longitude and second address line aren't necessarily obvious in the CSV example. If the data were much more ambiguous, like the dump from a database, the XML would be even more self-explanatory. While XML might not be the absolute best or most space efficient answer to self describing data, it does a pretty good job, much better than a lot of the data encoding schemes it is starting to replace.
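To make the difference concrete, here is a rough Python sketch that reads the same record both ways, using only the standard library. The record is the one from this comment; the variable names are just for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_LINE = "Joe,Slashdort,,123 Anyplace,CO: Mom,Los Angeles,CA,34,118"

XML_RECORD = """<slashdork>
<firstname>Joe</firstname>
<lastname>Slashdort</lastname>
<nickname></nickname>
<address1>123 Anyplace</address1>
<address2>CO: Mom</address2>
<city>Los Angeles</city>
<state>CA</state>
<latitude>34</latitude>
<longitude>118</longitude>
</slashdork>"""

# CSV without a header: values are only reachable by position, so the reader
# has to know out of band that column 7 is the latitude.
row = next(csv.reader(io.StringIO(CSV_LINE)))
latitude_from_csv = row[7]

# XML: every value carries its own name, so a dict can be built blindly.
record = {child.tag: (child.text or "") for child in ET.fromstring(XML_RECORD)}
latitude_from_xml = record["latitude"]

if __name__ == "__main__":
    print(latitude_from_csv, latitude_from_xml)
    print(sorted(record))
```

Both routes give the same values; the XML route just doesn't need a side note explaining which position means what.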

--
I'm a loner Dottie, a Rebel.
Re:Oh boy... by Faust · 2005-03-18 17:59 · Score: 5, Funny

hi!
Re:Oh boy... by LordHunter317 · 2005-03-18 18:47 · Score: 2, Interesting

Except the XML file tells the parser where its own definition is. Each of the XML files inside of an OO.o package tell you how to figure out what they are.
It's not quite that simple. XML files can have two kinds of definition: the DTD and the schema. The DTD is required for validation (i.e., checking the document against a declared structure, which goes beyond well-formedness), the schema for retrieving the layout of the elements (i.e., an integer goes in the foo attribute). Neither is required for an XML document (though you must have a DTD if you want to validate it). Schemas aren't required at all, and that's what you want if you really want to be able to programmatically manipulate XML without knowing anything about its form. Even then, they may not be very useful; they'll tell you what's legal content in an element, but they still tell you nothing about what's supposed to go into that element (i.e., what does the data stored in element 'foo' mean?). DTDs are useless for telling you anything about the content as well; they are a holdover from the SGML days.

I should go further to point out that OO does define DOCTYPES, but doesn't define any XML Schema information. Even if it did, that still doesn't tell me what the tag 'font-attribute' means. You still have to structure your XML schema in such a manner that a human can interpret meaning. So 'human-readable' is still in the eye of the beholder. XML doesn't go any further to rectify this than any other format. Making your data XML doesn't automatically make it human-readable. It's just like naming variables in a programming language: the name is arbitrary, but a good name will tell me what the variable is supposed to be holding (e.g., 'tmp' vs. 'lookup_value').
As an aside, were you referring to the xmlns declaration when you said, "A generic XML parser can at least find the URI to the file's type definition"? Those don't actually have any real-world meaning. They exist solely to let the XML parser know that the namespace I call 'foo' in one document and the namespace I call 'bar' in the second document are the same namespace. They don't have to have any real-world relevance (though they often do). They play no role in validation beyond the namespace identification I mentioned. If you look up the namespaces even for 'official' XML groups you'll see they usually link to their documentation, not to a DTD or anything.
Some parsers do smart things with some of the well-known namespace URIs, but there is no requirement for them to do so AFAIK.
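The namespace point can be seen with any namespace-aware parser. A minimal Python sketch; the namespace URI and element names below are invented for the example, and nothing is ever fetched from that address:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: it is only compared as a string, never fetched.
NS = "http://example.org/ns/address"

doc_a = f'<foo:person xmlns:foo="{NS}"><foo:city>Los Angeles</foo:city></foo:person>'
doc_b = f'<bar:person xmlns:bar="{NS}"><bar:city>Los Angeles</bar:city></bar:person>'

root_a = ET.fromstring(doc_a)
root_b = ET.fromstring(doc_b)

# The prefixes 'foo' and 'bar' are gone after parsing; what remains is the
# URI plus the local name, so both documents use the very same element name.
same_name = root_a.tag == root_b.tag

# Lookups are done against the URI, whatever prefix the author picked.
city_a = root_a.find(f"{{{NS}}}city").text
city_b = root_b.find(f"{{{NS}}}city").text

if __name__ == "__main__":
    print(root_a.tag, same_name, city_a, city_b)
```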
Re:Oh boy... by Paua+Fritter · 2005-03-18 22:52 · Score: 1

Well, try to unpack an .sxw file and cat it. You won't get anything useful out of it either without knowing the way data is stored inside an XML

I reckon you probably could get a fairly useful text out of it, without OpenOffice, using standard XML tools to extract the textual content only.
XSLT example
<transform version="1.0" xmlns=""http://www.w3.org/1999/XSL/Transform"/>
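The same "just give me the text" idea, sketched in Python rather than XSLT. An .sxw file is a zip archive with a content.xml member inside; the archive built below is a tiny stand-in with made-up markup, not real OpenOffice output, and the function names are only for illustration.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def make_fake_package():
    """Build an in-memory zip with a content.xml member (stand-in for a .sxw)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "content.xml",
            "<doc><h>Resume</h><p>Worked on <b>XML</b> tools.</p></doc>",
        )
    return buf.getvalue()


def extract_text(package, member="content.xml"):
    """Return all character data in the member, ignoring every tag."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(member))
    # itertext() walks the tree and yields only the text between tags;
    # the split/join collapses runs of whitespace.
    return " ".join(" ".join(root.itertext()).split())


if __name__ == "__main__":
    print(extract_text(make_fake_package()))
```

No knowledge of the element vocabulary is needed for this, which is the limited sense in which the text is recoverable without the original application.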
Re:Oh boy... by Paua+Fritter · 2005-03-18 22:55 · Score: 1

spot the deliferate mistale
Re:Oh boy... by lahi · 2005-03-19 00:33 · Score: 2, Interesting

I just want to state - again - that I think that Tim Berners-Lee ought to be fined heavily _and_ imprisoned for designing HTTP and HTML. Both contain uncountable design errors, which we have had to work around constantly ever since. He has done a tremendous disservice to the Internet Community. The HTTP protocol is simply a perverted form of the Gopher protocol (which itself was a trivial elaboration of the finger protocol, which is only good as protocol sample code.) And not having a proper SGML DTD from the start, but just a "loosely based on SGML" definition of HTML was outright criminal.

Oh, and the definitions of URIs and URLs also suck! Defining any constraints on the local part is the biggest mistake ever. URIs should have been like mail addresses and message IDs, which were the two prevalent object identifiers before the URL: both have a host part which defines the host to which they apply, and a local part which is just that: local - no meaning defined by the protocol. If that had been the case, there would be no need for stupid URL-encoding, which can be done wrong in so many ways that I frankly doubt there is any way to actually do it right consistently.

-Lasse
Re:Oh boy... by shutdown+-p+now · 2005-03-19 01:51 · Score: 1

I'm not advocating CSV here. But you forget about such beautiful thing as S-expressions (of which XML is a bastard child, by the way). Validation? Sure, S-expressions again, only this time let it be Lisp code, not just plain data.
Re:Oh boy... by Munrobasher · 2005-03-19 02:39 · Score: 1

I really can't decide between human readable (just) XML et al versus binary formats. I love the ability to easily see what's going on but at the same time, the inefficiency (parsing & space) kind of offends the ex-games programmer in me.

Rob.
Re:Oh boy... by Munrobasher · 2005-03-19 02:48 · Score: 1

But at least you don't hear people complaining or worrying about how fast it is to access the registry.

I guess it comes down to whether you believe apps and systems have become bloated or not. This goes for both code and data size.

Rob.
Re:Oh boy... by AaronGTurner · 2005-03-19 04:09 · Score: 2, Insightful

There may be a lot of spare compute cycles about, but what is critical is the ability to process XML in a timely manner on the CPU power that happens to be available at that precise instant in time at the appropriate location. Looking at the average CPU cycles used is like sitting in a traffic jam at 8am and noting that, on average, the road you are on is only used at 10% capacity. It being free at 4am is not much good if you are trying to get to work for 9am.
Re:Oh boy... by Anonymous Coward · 2005-03-19 12:20 · Score: 0

You don't get a deal with Faust. You get a deal with the devil. Faust is an example of someone who made a deal with the devil. It's like confusing Frankenstein and his monster. Typical Slashdotter not knowing what he's on about, but still desperately trying to sound educated.
Re:Oh boy... by shashark · 2005-03-21 00:01 · Score: 1

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl {\f0\fswiss\fcharset0 Arial;}} {\*\generator Msftedit 5.41.15.1505;}\viewkind4\uc1\pard\f0\fs20 Why not use \i cat readble \b\i0 RTF\b0 for a few more \ul CPU\ulnone Cycles.\par }
Re:Oh boy... by Nevyn · 2005-03-21 11:34 · Score: 1

A comma delimited file doesn't tell the parser what each of the fields means, neither does a straight binary data file. At best they announce their filetype in a header.

Joe,Slashdort,,123 Anyplace,CO: Mom,Los Angeles,CA,34,118

Errr... sorry to break it to you but CSV files will often have a header that looks like:

firstname,lastname,nickname,address1,address2,city,state,latitude,longitude

...making it just as obvious what each column is for. Actually it's often much better, I can write about 3 lines of perl that will always correctly parse the CSV file and put the values into a hash ... God knows how much time you'd have to spend on the XML version. Because, of course, the XML file is really an "XML subset" or the parser needs to deal with the fact that the "record" has 10 extra child nodes (whitespace) and will often have extra nodes or changed nodes like having only a nickname, not having a nickname, having the nickname as a child of firstname or having the firstname, lastname and nickname inside another "person" node -- because, hey, it's XML so it's magic.

And if you went the XML subset route, you are screwed because people will assume otherwise.
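For what it's worth, both claims are easy to try out. The sketch below is Python rather than the three lines of perl, using only the standard library: a header row turns each CSV record into a dict, and a DOM parse of the equivalent XML record shows the whitespace text nodes sitting between the elements. The XML is generated from the CSV row here purely for the comparison.

```python
import csv
import io
from xml.dom import minidom

CSV_DATA = (
    "firstname,lastname,nickname,address1,address2,city,state,latitude,longitude\n"
    "Joe,Slashdort,,123 Anyplace,CO: Mom,Los Angeles,CA,34,118\n"
)

# With a header row, each record comes back keyed by column name.
rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
first = rows[0]

# Build the XML form of the same record, one element per line.
xml_record = (
    "<slashdork>\n"
    + "".join(f"<{key}>{value}</{key}>\n" for key, value in first.items())
    + "</slashdork>"
)

# A DOM parser keeps the newlines between elements as text nodes of their own.
children = minidom.parseString(xml_record).documentElement.childNodes
elements = [n for n in children if n.nodeType == n.ELEMENT_NODE]
whitespace = [
    n for n in children if n.nodeType == n.TEXT_NODE and not n.data.strip()
]

if __name__ == "__main__":
    print(first["latitude"], len(elements), len(whitespace))
```

Code that walks childNodes by index has to skip those whitespace nodes; code that asks for elements by name does not notice them.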

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:Oh boy... by Graymalkin · 2005-03-23 23:47 · Score: 1

While XML isn't the answer to every question, it does have its uses. Just transmitting straight database information like I used in my example might not be the best use of it. If I started to add metadata to each data element though, XML would be much better suited.

With your Perl code putting a CSV into a hash you might have to change that around for every program you write. So eventually you will build a generic module that handles CSV files so anyone can throw a CSV into a function's arguments and have a nice and neat hash of all of the data. Now you've put in as much work as an XML parser that does pretty much the same thing. Again it comes down to your uses; if you're writing apps to import Access databases into MySQL using CSV table dumps your CSV module would likely be quicker than XML::Parser. If your database app allowed for direct XML exporting which maintained all sorts of important metadata then the XML::Parser route would probably be more effective.

I use XML sparingly because it tends to be a little heavy on the processor doing the parsing. I wouldn't swear by it but I'm not going to disavow it because parsing can take a few extra cycles here and there. There are plenty of times where providing some XML support can really save some cross-platform/environment/language hassle, especially for lousy programmers.

--
I'm a loner Dottie, a Rebel.

well... by rune2 · 2005-03-18 14:55 · Score: 2, Interesting

I was damned by [GNU Project founder] Richard Stallman in egregiously profane language for working on it.

Why do I not find this hard to believe...

Re:well... by TimeTraveler1884 · 2005-03-18 15:13 · Score: 1

I was damned by [GNU Project founder] Richard Stallman in egregiously profane language for working on it. Why do I not find this hard to believe...
Tim Bray: I want to work on OED.

Richard Stallman: Fuck no! Those Oxford pricks disgust me. Fuck!... Shit!... Bu Balls!
Re:well... by Anonymous Coward · 2005-03-18 15:14 · Score: 0

Trust me, Taco would be Gates' personal massagist if he'd be offered half the money this guy gets.

pioneer ... currently at Microsoft by i.collect.spam · 2005-03-18 14:57 · Score: 3, Funny

"database pioneer ... (currently at Microsoft)" translated for slashdot readers: "sellout"

Re:pioneer ... currently at Microsoft by Anonymous Coward · 2005-03-18 17:57 · Score: 0

honestly, if you were offered a high paying position at Microsoft you wouldn't take it?

This is article is amazingly honest by tabkey12 · 2005-03-18 14:59 · Score: 4, Interesting

JG I assume that the burning issue was keeping it simple.

TB And we missed. XML is a lot more complex than it really needs to be. It's just unkludgy enough to make it over the goal line. The burning issues? People were already starting to talk about using the Web for various kinds of machine-to-machine transactions and for doing a lot of automated processing of the things that were going through the pipes.

Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...

--
Get a free iPod Nano 4GB!

Re:This is article is amazingly honest by Camel+Pilot · 2005-03-18 15:18 · Score: 3, Interesting

I'm currently working on a project that is doing machine-to-machine transactions. We started off using XML to bundle and unbundle the data. However, as the data rates went up, performance went south.

Some bright bunny came up with the idea of using perl stringified data structures instead using Data::Dumper.

On the receiving end the data structure is Safe eval'ed and voilà, there is the data - orders of magnitude faster, and there is still the ability to read or edit the data via a text editor.

XML is just a representation of hierarchy data via named parameters and list. Perl (or Python if you want) is very adept at parsing code strings.

Also with code structures you can add dynamic functionality like

'rsv_time' = localtime(time)

which you can't with XML...
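A rough Python analogue of the stringified-data-structure idea, as a sketch only. It assumes `pprint.pformat` as a stand-in for Data::Dumper and `ast.literal_eval` as a stand-in for a Safe eval; the record and its field names are invented:

```python
import ast
import pprint

# Hypothetical record; the field names are invented for illustration.
record = {"order_id": 1042, "items": ["widget", "gadget"], "total": 19.95}

# Sender: serialise the structure as a literal that stays readable
# and editable in a text editor.
wire_text = pprint.pformat(record)

# Receiver: ast.literal_eval accepts only literals (dicts, lists, strings,
# numbers, ...), so code embedded in the payload is rejected, not executed.
received = ast.literal_eval(wire_text)

# A payload that tries to call a function is refused with ValueError.
try:
    ast.literal_eval("{'rsv_time': __import__('time').time()}")
except ValueError:
    rejected = True
else:
    rejected = False

print(received == record, rejected)
```

The trade-off is visible here: a literal-only evaluator is safe against hostile payloads, but it also rules out dynamic values of the `localtime(time)` kind. Allowing those means evaluating real code from the other end of the wire.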
Re:This is article is amazingly honest by TimeTraveler1884 · 2005-03-18 15:18 · Score: 1

JG I assume that the burning issue was keeping it simple.
Ugh... Gives new meaning to the personal interview.
Re:This is article is amazingly honest by arodland · 2005-03-18 15:23 · Score: 1

YAML!

A little fodder for the lameness filter
Re:This is article is amazingly honest by Evil+Grinn · 2005-03-18 15:37 · Score: 1, Insightful

Some bright bunny came up with the idea of using perl stringified data structures instead using Data::Dumper.

Uhh.. that's one of the things that Data::Dumper was designed to do.

--
where there's fish, there's cats
Re:This is article is amazingly honest by Anonymous Coward · 2005-03-18 15:46 · Score: 0

How many other environments understand Perl datadumps? One could equally pretend that serialized VisualBasic objects are a great interchange format.
Re:This is article is amazingly honest by sicking · 2005-03-18 15:57 · Score: 2, Insightful

Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...

Yep. That didn't stop Microsoft from adding even more weight to it by creating SOAP though. Now there's a bulky format. It's like shipping a shirt button in a container on an oil tanker.

--
Failing to learn from history dooms you to repeat it.
Re:This is article is amazingly honest by hqm · 2005-03-18 16:31 · Score: 2, Interesting

People should use CommonLisp S-expressions instead of XML. S-expressions have the advantage that they have basic datatypes built into the format (string, list, ints, floats, symbols), and the namespace model is much more straightforward.
Re:This is article is amazingly honest by starwed · 2005-03-18 16:36 · Score: 1

XML is just a representation of hierarchy data via named parameters and list
I think the point is: XML is a standardised representation of hierarchy data. (And not just a standard, but the dominant standard.)
Re:This is article is amazingly honest by andreyw · 2005-03-18 16:51 · Score: 1

Why bother with XML when it's just the NIH syndrome as applied to S-expressions.
Re:This is article is amazingly honest by filmmaker · 2005-03-18 17:04 · Score: 2, Insightful

That depends on what you're transacting. Plus, there's a forest for the trees issue here. We're already using a sub-set of XML for most HTTP transactions - that is, HTML. A move to XML standards simply opens up a huge array of opportunities for robotic transactions, as well as leaving the field relatively wide open for web developers of traditional varieties. It's a positive good, RSS being an obvious example of why.

--
I Want To Believe
Re:This is article is amazingly honest by eln · 2005-03-18 17:34 · Score: 1

Which really represents the double-edged sword that is standards. Sure, with standards come greater interoperability, but if the standard is a bad one, you end up stuck with everyone having to work around flaws in the standard to get any actual work done.
Re:This is article is amazingly honest by Anonymous Coward · 2005-03-18 18:07 · Score: 0

I've always been struck by how amazingly inefficient XML representation can be. Now it's dawned on me--THAT'S why Microsoft embraces it so.
Re:This is article is amazingly honest by Anonymous Coward · 2005-03-18 19:33 · Score: 1, Informative

XML is just a representation of hierarchy data via named parameters and list.

That may be the only part of XML that your application is using, but don't make the mistake of believing that it is all that XML is good for. XML also gives you a platform-independent representation of your data with parsers already available for each platform. It also gives you automatic validation of the data structures using DTDs or XSDs and it gives you a framework and tools for doing data transformations (XSL).

It also gives you the ability to edit by hand. This is the biggest bloat area. XML could be much more compact and be parsed faster if you use a binary representation.

So just remember that you were using a small subset of XML's features. High performance is not one of them. If you need high performance, design your own format and write your own parser. It's not hard, just time consuming.
Re:This is article is amazingly honest by Ed+Avis · 2005-03-18 21:02 · Score: 1

I guess one reason to use XML is that you can document your file format using a DTD or equivalent and then others can easily validate their documents against the DTD to check they are correct.

--
-- Ed Avis ed@membled.com
Re:This is article is amazingly honest by ikkonoishi · 2005-03-18 21:07 · Score: 1

Umm HTML is not a "sub-set" of XML. It is a completely separate standard which is based on the same standard that XML was based on.

HTML is a version of SGML that uses a fixed set of tags.

XML is a simplified version of SGML.
Re:This is article is amazingly honest by CastrTroy · 2005-03-18 23:51 · Score: 1

If you really are passing so much data around that it's straining the network, you may consider compressing the data, and have the program uncompress the data before processing. Using perl stringified data structures may work for you interfacing with your own systems, but if you have to interface your systems with other people's systems, there are 2 standards, XML and CSV.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:This is article is amazingly honest by filmmaker · 2005-03-18 23:53 · Score: 1

Well, right. Your pedantry is typical, though, of missing the forest for the trees here in slashdot. For the record, I do understand the distinction. I should have said "effectively a sub-set."

--
I Want To Believe
Re:This is article is amazingly honest by Decaff · 2005-03-19 02:16 · Score: 2, Informative

XML is just a representation of hierarchy data via named parameters and list.

It is far more than that.

It conforms to a standard. It allows its format to be extended in standard ways without breaking the original meaning. It has rules for allowing internationalisation. Also, there are a large number of efficient parsers and processors already written for it in almost every language.

Also with code structures you can add dynamic functionality like

'rsv_time' = localtime(time)

which you can't with XML...

The XML dialect known as XSLT allows for such dynamic functionality, and in a standard way.
Re:This is article is amazingly honest by hankaholic · 2005-03-19 02:25 · Score: 2, Informative

I think you may have misread. He said "blah blah blah instead using Data::Dumper", not "blah blah blah instead of using Data::Dumper".

If you haven't misread, your post was a little unclear, but I thought I'd respond by posting instead of with a nondescript "Overrated" mod.

--
Somebody get that guy an ambulance!
Re:This is article is amazingly honest by Anonymous Coward · 2005-03-19 03:06 · Score: 0

Amazingly, for such a popular method of 'communication' between and within applications, XML is admitted by most to be rather flawed and bulky...

Those looking for an interesting alternative might want to have a look at UBF.
Re:This is article is amazingly honest by Anonymous Coward · 2005-03-19 12:03 · Score: 0

And X12 and HL7 and ASN.1 and CORBA and SQL and COBOL or C data structures etc.......
Re:This is article is amazingly honest by Anonymous Coward · 2005-03-19 21:16 · Score: 0

s-expressions are common to all lisps, and not just common lisp.

Hmmm... by Anonymous Coward · 2005-03-18 15:02 · Score: 0

Tim Bray and Jim Gray. Which one's the ward and which one's Batman?

happy gilmore quote by wolfgang_spangler · 2005-03-18 15:02 · Score: 3, Funny

Gray interviews Bray, should have done it in May. Over by the bay.

Is my karma burning? Oh what the hay.

Re:happy gilmore quote by cablepokerface · 2005-03-19 01:25 · Score: 1

So what is it you're trying to say?
Re:happy gilmore quote by Anonymous Coward · 2005-03-19 03:24 · Score: 0

You are clearly gay.

The Origin of XML by TimeTraveler1884 · 2005-03-18 15:04 · Score: 5, Funny

That's hogwash. Everyone knows that the idea for XML came from the tablets of stone that Moses brought down from Mount Sinai. In these tablets were the beginnings of self-describing data. That alone was where the commandments of W3C were originally sent out to the world.

But only in the last decade have scholars used transformation style sheets and super-computers to find more declarative complex types, hidden in the original Hebrew CDATA. It is thought there are tens if not hundreds of specifications in these texts that may never have a finalized draft.

Progress has been slow. While the discovery of SOAP in the 1800's has made the hygiene of data possible, there is much that has yet to be standardized. Considering the aging DTD schemas left from the era of King James, it will be crucial to the data-exchange of humanity to uncover more secrets of XML.

Re:The Origin of XML by Anonymous Coward · 2005-03-18 15:09 · Score: 0

Various religions would correspond to XSLT then.

Jim Gray?! by Anonymous Coward · 2005-03-18 15:05 · Score: 0

Isn't that the guy who was hounding Pete Rose at the All Star game?

Why, oh why, did they have to repeat the tag name? by Anonymous Coward · 2005-03-18 15:18 · Score: 3, Interesting

I work with XML every day. And every day I wonder the same thing: why the hell does the end tag name have to be repeated? Why can't it just be optional? In other words, why can't it just be abbreviated as: <tagname>data</> ?

Oh MAN I wish they could have done just that one little thing for us. It would cut our datagram size down by at least 30%, maybe more.
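A quick way to sanity-check that kind of estimate on your own data, as a sketch. The `</>` form is the hypothetical abbreviation proposed above (not valid XML), and the sample document is invented:

```python
import re

def abbreviate_end_tags(xml_text: str) -> str:
    """Replace every named end tag (</name>) with the hypothetical short form </>."""
    return re.sub(r"</[^>\s]+\s*>", "</>", xml_text)

# Invented sample datagram; real savings depend on tag-name length and nesting.
sample = (
    "<order><customer>Alice</customer>"
    "<item><name>widget</name><quantity>2</quantity></item></order>"
)
short = abbreviate_end_tags(sample)
saving = 1 - len(short) / len(sample)
print(f"{len(sample)} -> {len(short)} bytes ({saving:.0%} smaller)")
```

The result is obviously not something an XML parser will accept. It only measures how much of a given message is end-tag names.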

MOD FUNNY MOD-NAZIS by Anonymous Coward · 2005-03-18 15:26 · Score: 1, Funny

I wonder how I'm supposed to write real comments including code examples here. Slashdot sure seems stupid sometimes.

Now this is what I call understatement.

Jim Gray interviews Tim Bray by Saeed+al-Sahaf · 2005-03-18 15:29 · Score: 4, Funny

Jim Gray interviews Tim Bray

Right, sure.

Have you ever seen these guys in the same room at the same time? No? I thought as much.

--
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck

Right in front of you, Tim! by Anonymous Coward · 2005-03-18 15:36 · Score: 4, Interesting

You know, the people who invented XML were a bunch of publishing technology geeks, and we really thought we were doing the smart document format for the future. Little did we know that it was going to be used for syndicated news feeds and purchase orders.

The most amazing thing is that back then in 1995-1996 at Open Text we were already using SGML as a data exchange protocol. All of us there (including Tim) ought to have known that XML would also have a life as a computer-to-computer communication protocol. Problem was that at the time so much of the SGML discourse was wrapped around the content versus format debate that we missed the obvious: the main use of XML was not as a replacement for HTML as a text format for the web, but as a kind of uber ASCII to allow the ready exchange of data between dissimilar applications (just like ASCII in its time had eased the transfer of data between dissimilar hardware and/or software platforms).

Semantic web snake oil... by Alomex · 2005-03-18 15:45 · Score: 5, Interesting

TB: I spent two years sitting on the Web consortium's technical architecture group, on the phone every week and face-to-face several times a year with Tim Berners-Lee. To this day, I remain fairly unconvinced of the core Semantic Web proposition.

Everyone who has actually done work on knowledge representation in the real world knows that this is a huge, difficult problem, unlikely to be solved anytime soon, as Tim Bray claims.

The only people who claim otherwise are either frauds or ignorant. The Semantic Web initiative has both: Tim Berners-Lee is very smart, but not a computer scientist, so he's not aware of the size of the challenge, plus he's a genuinely nice person, so he tends to trust others too much.

He has surrounded himself with the snake oil AI salesmen from the early 1980s who had promised us impending ubiquitous intelligent computers. Those fraudsters got found out back then, and spent the next fifteen years in academic limbo, only to be rescued by Tim Berners-Lee's naivete.

Re:Semantic web snake oil... by bblfish · 2005-03-18 21:49 · Score: 2, Insightful

I work with Tim Bray, but I seriously disagree with this position of his. If you had gone back to the days before XML was invented you could have made exactly the same argument against XML: "SGML was not a success, therefore XML can't be". I have blogged about this fallacious argument at length. You can work with the Semantic Web without having to take on the most difficult problems of AI. You can use it to work on some really simple problems very effectively. Speaking of "frauds", "ignorants" and "snake oil" when speaking of this project is really simplistic and (dare I turn the arrogance of the above poster against him?) stupid.
Re:Semantic web snake oil... by Anonymous Coward · 2005-03-18 22:34 · Score: 0

And just what has XML got to do with the Semantic Web? It's about RDF, not XML. XML is merely a serialisation syntax.
Re:Semantic web snake oil... by CondeZer0 · 2005-03-19 00:32 · Score: 1

The Semantic Web, Syllogism, and Worldview

"metadata is just data with unstandard interfaces"; read, write, and hierarchical file namespaces rule

--
"When in doubt, use brute force." Ken Thompson
Re:Semantic web snake oil... by Anonymous Coward · 2005-03-19 02:49 · Score: 0

you could have made exactly the same argument against xml: "SGML was not a success, therefore XML can't be".

It is not the same argument. SGML never looked like an unsurmountable problem, and the success of LaTeX and HTML proved that there were ways to bring SGML to the masses.

You can use it to work on some really simple problems very effectively.

I agree, but that is not what the AI salesmen are promising. The same was the case with AI. It wasn't that it was completely useless; the problem was with the over-promising and underdelivery.

(dare I turn the arrogance of the above poster against him?)

You are confusing arrogance with a J'accuse in the sternest of terms. There is widespread unanimity that the AI efforts in the 80s went nowhere, and were overblown. The only question is if the current semantic web is also one of those. Tim thinks so, and so do I.
Re:Semantic web snake oil... by bblfish · 2005-03-19 05:51 · Score: 2, Insightful
Tim thinks so, and so do I.
My suggestion to you: don't put too much weight on Tim Bray's bet. If you look carefully at his rdf.net challenge you will notice that the wording leaves him ample space to maneuver were things to turn out against him:
- This has to happen before January 1, 2006, and
- I am the sole judge and jury, but
- I'll publicize anything that's submitted formally, and my comments on it, so I'm doing this in the open, except for
- I'm busy, so I may exercise fairly brutal triage on incoming proposals and take a while to get to the ones really worth looking at, and
- If there's serious money in it, the recipient of RDF.net is morally obligated to find a way to cut me in for a piece of the action.
I like the last one: if someone has an idea that is going to make a lot of money they can have rdf.net for free if they cut him in on the action. Wow! Here is a man who really does not believe anything is going to happen :-)
Re:Semantic web snake oil... by Jagasian · 2005-03-19 06:15 · Score: 4, Insightful

If your post could be modded above a "5", I would mod your post as "insightful". I guess people have no memory, and that is why these Semantic Web frauds get grants, venture cap, etc. They have these big promises of seamlessly integrating web services... AUTOMATICALLY?!?!

The easiest way to disprove their crap is this. Even in RDF or OWL, it is possible to have "semantic aliasing", i.e. multiple ways of representing the same concept. This is exactly the core problem that they claim they address and that they claim that XML does not address. Think about it, how can automated inferences be made, if two concepts have distinct _semantic_ (not just syntactic) representations? Furthermore, it can be shown that in general these different representations cannot be automatically determined to represent the same thing.

So their entire project is a farce! It is a bunch of people that are both ignorant of pertinent theoretical mathematical results on computability, completeness, and hell, the fact that even in axiomatic set theory there are multiple ways to represent... say... the real numbers... and they are also ignorant of practical computer/software engineering and sociological limitations.

They have stop-gaps: ontologies. Oh if only people could agree on one common unified ontology, the entire semantic aliasing problem would be solved... or so they seem to think. But even if people agree on a common vocabulary, the way it is used can still give rise to the semantic aliasing problem. So even setting aside that agreeing on some complete or near-complete ontology is going to be IMPOSSIBLE, even if it was done, it still wouldn't fix the deep underlying problems with the Semantic Web - problems that have been struggled with for over 100 years in the field of formal mathematics.
Re:Semantic web snake oil... by illaqueate · 2005-03-19 10:07 · Score: 1

I've been saying this forever (read: back when I was a slashdot troll in high school) and one need not even look to philosophy. on the other hand, more modest ontologies seem eminently reasonable. the "semantic" part is hype since it's just data embedded in a social network, but as a mechanism of trust the regularization of data makes a lot of sense. in other words it's a structure that supports standardization, and that depends on social processes more than any facts about objects when they intersect with meaning.

Very insightful! by 3770 · 2005-03-18 15:47 · Score: 1

I hadn't thought about that. Very insightful.

There has got to be a reason though. Maybe that validation wouldn't be as good or something like that?

That's the only thing I can think of. With the notation you can tell that something is wrong, but not necessarily where.

--
The Internet is full. Go Away!!!

Re:Why, oh why, did they have to repeat the tag na by Anonymous Coward · 2005-03-18 15:58 · Score: 0

Perhaps because it becomes ambiguous when you start nesting and overlapping tags?

Re:Why, oh why, did they have to repeat the tag na by Alomex · 2005-03-18 16:02 · Score: 5, Insightful

why the hell does the end tag name have to be repeated?

Because that is the single biggest source of headaches in parsing SGML, the precursor of XML, in which such a construct is allowed.

It also makes error recovery very difficult, something that we know is quite important from all that malformed HTML code out there. The XML creators knew that too.

Re:Why, oh why, did they have to repeat the tag na by XanC · 2005-03-18 16:03 · Score: 1

Overlapping is illegal anyway.

It's not ambiguous for nesting; it would just close the closest opening tag.

The only reason I can think of is readability. When I have a long "if" statement in C or Perl, for example, I'll comment the closing curly brace with the statement's conditional.

Re:Why, oh why, did they have to repeat the tag na by Anonymous Coward · 2005-03-18 16:11 · Score: 0

something that we know is quite important from all that malformed HTML code out there.

The problem here is that browsers display malformed HTML at all. If they didn't, missing end tags would be detected with the most rudimentary tests and there'd be little malformed HTML around.

Anyway, if I were to design XML, I'd go with a construct like <tagname>{...}. This would've made it just as readable and much easier to manage in a text editor.

Intra-vendor XML is (usually) stupid by mi · 2005-03-18 16:18 · Score: 5, Interesting

It drives me up the wall, that my employer is using XML to let parts of their own application communicate with other parts. DTDs are not used and all parts still need to be modified/recompiled whenever one of them changes. Same people maintain both ends of the communication.

Theirs is, in reality, a proprietary format, but to stay buzz-word compliant they use XML, which hurts performance -- sometimes dearly...

For example, to pass a couple of thousands of floating-point numbers from front end to a computation engine, each is converted to text string with something like <Parameter> around it. The giant strings (memory is cheap, right?) are kept in memory until the whole collection is ready to be sent out... The engine then parses the arriving XML and fills out the array of doubles for processing.

It really is disgusting, especially since freely available alternatives exist... For instance, PVM solved the problem of efficiently passing datasets between computers a decade ago, but nooo, we only studied XML in college -- and it is, like, really cool, dude...
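To put rough numbers on this, here is a sketch that encodes the same 2000 doubles both ways. It uses plain `struct` packing as the binary route (this is not PVM's API, just the simplest binary encoding to hand), and the `Parameters` wrapper element and the data values are assumptions made for the example:

```python
import struct
import xml.etree.ElementTree as ET

values = [i * 0.1 for i in range(2000)]  # stand-in data set

# XML route: one <Parameter> element per value, as described above.
root = ET.Element("Parameters")
for v in values:
    ET.SubElement(root, "Parameter").text = repr(v)
xml_bytes = ET.tostring(root)

# Binary route: a 4-byte count followed by packed little-endian IEEE 754 doubles.
binary_bytes = struct.pack(f"<I{len(values)}d", len(values), *values)

# Decode both encodings again.
from_xml = [float(p.text) for p in ET.fromstring(xml_bytes).iter("Parameter")]
(count,) = struct.unpack_from("<I", binary_bytes)
from_binary = list(struct.unpack_from(f"<{count}d", binary_bytes, 4))

print(len(xml_bytes), "bytes as XML,", len(binary_bytes), "bytes packed")
```

The packed form is a fixed 8 bytes per value plus the count. The XML form pays for the element name twice per value on top of the digits. What the binary route gives up is self-description: both ends have to agree on byte order and layout out of band.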

--
In Soviet Washington the swamp drains you.

Re:Intra-vendor XML is (usually) stupid by Alomex · 2005-03-18 16:39 · Score: 2, Interesting

It drives me up the wall, that my employer is using XML to let parts of their own application communicate with other parts. DTDs are not used and all parts still need to be modified/recompiled whenever one of them changes.

Then you are not using XML right. For one the format shouldn't be changing much, if it is clearly you guys are spending too much time coding and not enough thinking. Second any application that does not use the new attribute should be able to ignore it without any compilation change. Third, two thousand floating points ain't a giant string, unless you are programming an 8086 in Elbonia. Converting two thousand numbers to text should take 50 microseconds at the most.
Re:Intra-vendor XML is (usually) stupid by Short+Circuit · 2005-03-18 17:01 · Score: 1

I know of a company whose management has mandated that they use automake and autoconf, simply because the OSS projects use them, and Open Source is real successful.

The problem? This code will *never* be maintained by anyone outside the company, and is only intended to run on a single embedded platform.

--
tasks(723) drafts(105) languages(484) examples(29106)
Re:Intra-vendor XML is (usually) stupid by Anonymous Coward · 2005-03-18 22:28 · Score: 0

For instance, PVM solved the problem of efficiently passing datasets between computers a decade ago, but nooo, we only studied XML in college

That's a valid position for a company to take. As somebody else said, you have an overactive sense of efficiency. It's not usually as important as, say, programmer time, or available skill sets.

Like it or not, there are way more programmers out there that are intimately familiar with XML than with other formats. There is way more software that deals with XML than with other formats. There are way more libraries that help you use XML than with other formats. There are way more related specifications (e.g. XPath) that help you use XML than with other formats.

Those are important issues. Unless the inefficiency of XML is seriously causing you problems, XML is far, far better choice than other formats, simply because of the network effect.

Your problem is that you are thinking like a programmer and not like a software engineer. Yes, if you just look at how to construct a program, XML is often not the best option. But if you look at how to solve the problem at hand, it often is.
Re:Intra-vendor XML is (usually) stupid by Anonymous Coward · 2005-03-18 22:51 · Score: 0

And what if there's millions of numbers to pass on a regular basis? And the possible loss of accuracy now and again? I see this sort of crap in company after company.

Similarly, company after company reads a database, converts the record into XML which is then parsed and read into an object. Processing takes place, the output is put into XML which is then parsed and stored on the database. What a waste of time and effort.
Re:Intra-vendor XML is (usually) stupid by CondeZer0 · 2005-03-19 01:36 · Score: 0, Troll

> Then you are not using XML right.
Oh, I have heard this same argument so many times... I still have to see _anyone_ "using XML right".

> For one the format shouldn't be changing much
You really should read The Mythical Man-Month; change is in the essence of software development.

> Converting two thousand numbers to text should take 50 microseconds at the most.
Yes, but what about parsing two thousand XML elements, building their DOM and then going thru it? (And SAX is not much better, not to mention much more limited.) If you haven't realized yet the insane overhead of XML parsing you have never had to use XML in any serious production environment.

At risk of repeating myself:

"The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well." -- Phil Wadler

--
"When in doubt, use brute force." Ken Thompson
Re:Intra-vendor XML is (usually) stupid by Alomex · 2005-03-19 03:02 · Score: 1

Processing takes place, the output is put into XML which is then parsed and stored on the database. What a waste of time and effort.

Yet again you miss the point. Exporting data in XML means that if you ever have a change of format or create a second consumer application it can readily understand the data. For example you can pass it to gecko and display the data. Try that with your proprietary binary format.

Exchanging intra vendor data in XML is no more foolish than exchanging intra vendor data in ASCII. It almost always makes a lot of sense. If, on the other hand, you happen to have the rare application that moves massive amounts of data around, sure, feel free not to use ASCII or XML, and use your own proprietary format. Just be aware that this is the exception, not the rule.
Re:Intra-vendor XML is (usually) stupid by tootlemonde · 2005-03-19 06:02 · Score: 1

change is in the essence of software development
This statement is actually one of the reasons for using XML.
Using a standard data format like XML that is widely understood (but nooo, we only studied XML in college, as you put it) and has a mature set of parsing tools makes handling the data easier when the applications that process the data change, particularly if they change dramatically.
It sounds like your primary objection to XML for your application is that it is not very efficient. There is always a trade-off between efficiency and flexibility. In the long term flexibility usually wins.
Re:Intra-vendor XML is (usually) stupid by Jagasian · 2005-03-19 06:19 · Score: 1

Stop hurting the buzzword-engineer's feelings. They need to keep thinking that a piece of software is good if it makes use of enough buzzwords.
Re:Intra-vendor XML is (usually) stupid by Jagasian · 2005-03-19 06:22 · Score: 1

Ahhh, gotta love that "software engineering" bullshit. Yup, who cares if the actual system is properly constructed, let's stick to a bunch of dogmatic beliefs and buzzwords. I guess it explains why there is a bunch of crappy software out there. Too many people refuse to do what is necessary and instead just go with the flow.
Re:Intra-vendor XML is (usually) stupid by CondeZer0 · 2005-03-19 06:52 · Score: 0, Flamebait

How much bullshit. XML is not a "standard data format", it's a "standard for the (very lousy and almost completely useless) 'definition' of data formats". An XML file without documentation is as useless as a big binary BLOB; some day you should check the XML MS Word generates, and I have seen much worse from other proprietary XML tools.

UTF-8 is a real standard data format, infinitely easier to parse and read than XML, and I got the best tool set to work with it just under /bin

So, when will you be adding the -X option to gnu/grep so it understands XML? No, wait, it will be --parse-XML-files-and-be-slower-than-g++, because you, like all GNU fools, can't live without verbosity.

If you excuse me while I wait I will go back to work in a XML-free system.

--
"When in doubt, use brute force." Ken Thompson
Re:Intra-vendor XML is (usually) stupid by Anonymous Coward · 2005-03-19 07:42 · Score: 0

Yup, who cares if the actual system is properly constructed, let's stick to a bunch of dogmatic beliefs and buzzwords.

Sorry no, you are suffering from that, not me. A dogmatic belief in efficiency is harmful when you actually want a solution to a problem instead of just participating in programmer mental masturbation. Efficiency is merely one factor to consider when deciding how to implement something, and it's almost never the most important one.
Re:Intra-vendor XML is (usually) stupid by tootlemonde · 2005-03-19 08:16 · Score: 1

So, when will you be adding the -X option to gnu/grep so it understands XML?
See xmlgrep. Also xgrep and xml command line utilities.
you, like all GNU fools, can't live without verbosity
A strange comment considering Plan 9's Unix origins.
Re:Intra-vendor XML is (usually) stupid by mi · 2005-03-19 10:38 · Score: 3, Insightful

Then you are not using XML right.

Does anybody?.. I guess, not...
clearly you guys are spending too much time coding and not enough thinking

No disagreement here -- that was my point, in fact.
two thousand floating points ain't a giant string, unless you are programming an 8086 in Elbonia.

Just tested simply sprintf-ing the same double 2000 times into the same text buffer on a PII-Xeon @450MHz with 2Mb of L2-cache, the whole program and the puny buffer are entirely in cache (which is not the case in real-life). 5-16 milliseconds (of user time, ignoring the sys-time)... The PII is not much slower, than the Sparcs we are using. Even if the latest and greatest CPUs are 10 times faster (which they aren't), why waste their power on chewing XML tags?
Converting two thousand numbers to text should take 50 microseconds at the most.

Now add the time to parse it on the other end, and consider, that the whole point of passing it is to have some computations happen. And the computations themselves happen in about 200 milliseconds...
Now realize that size of the XML-file is 3-4 times bigger than it needs to be -- but the network packets are still 1500 bytes and with XML we need 5 or 6 (at best) instead of 2. Bandwidth is cheap, but latency is not...
Now throw in the loss of precision from the double-text-double conversion(s) and climb up the wall next to me...
Using XML in such scenarios is like overnighting papers from one end of the office floor to the other. Defending this practice is like saying, that FedEx is really fast and efficient everywhere except in Elbonia...
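
The loss-of-precision point is easy to reproduce. Below is a minimal Python sketch, not the C test described above; the value is arbitrary and chosen only for illustration. It formats a double the way a default printf-style "%f" would, parses the text back, and compares the result with formatting to 17 significant digits, which is enough to round-trip any IEEE 754 double.

```python
# Illustrative only: the value below is arbitrary, not taken from the post.
x = 0.1 + 0.2                      # stored as 0.30000000000000004

six_digits = "%f" % x              # default printf-style output, 6 decimals
seventeen = "%.17g" % x            # 17 significant digits

lossy = float(six_digits)          # parse the short text back into a double
exact = float(seventeen)           # parse the long text back into a double

print(six_digits, lossy == x)      # the short form does not round-trip
print(seventeen, exact == x)       # the long form does
```

So the text route can be made lossless, but only by spending more than twice as many characters per number, which feeds straight back into the packet-count argument.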

--
In Soviet Washington the swamp drains you.

What it should have looked like by Anonymous Coward · 2005-03-18 16:20 · Score: 5, Insightful

I think XML should have looked more like this:

(html (head (title "This is an example")) (body (h1 "A first level header") (p "There's no reason for all the extra characters.") (p "Although this looks like LISPy HTML it could have all the features of XML")))

Re:What it should have looked like by nsaneinside · 2005-03-18 18:20 · Score: 1

exampletag
Re:What it should have looked like by pkphilip · 2005-03-18 20:36 · Score: 2, Interesting

Yes, I think this definitely looks more sensible. It would have reduced the size of documents considerably and it does look cleaner.

Consider an XML snippet:

<sampletag name="this" type="that">
Some value
</sampletag>

This could be translated into
(sampletag [name="this"] [type="that"]Some value)

which is much smaller.

I wonder if someone will consider this for real
Re:What it should have looked like by ikkonoishi · 2005-03-18 21:36 · Score: 2, Insightful

Sounds great... but then this happens

(html
(head
(title "This is an example")
(body
(h1 "A first level header")
(p "There's no reason for all the extra characters.")
(p "Although this looks like LISPy HTML it could have all the features of XML")))

Now your entire webpage is blank. What happened?
Re:What it should have looked like by alarch · 2005-03-18 21:59 · Score: 1

you haven't closed head tag! use VIM ;)

--
Deliriant isti Americani.
Re:What it should have looked like by Anonymous Coward · 2005-03-18 22:15 · Score: 0

<sampletag name="this" type="that">Some value</sampletag>

This could be translated into

(sampletag [name="this"] [type="that"]Some value)

which is much smaller.

Much smaller? Hardly. The only thing that makes the XML bigger is the newlines that you added to the XML example but not the LISPy example, and the end tag. XML could have abbreviated the end tag, but there are good reasons for having it that somebody else has gone into elsewhere in this story. Not counting those issues, the XML is actually smaller by one character. Furthermore, in practice, XML is usually compressed, so the redundancy of the closing tag is eliminated anyway.
Re:What it should have looked like by ikkonoishi · 2005-03-18 22:23 · Score: 1

Are you sure? Would the parser be? It could assume that the head has to be closed before the body begins, but it shouldn't have to.
Re:What it should have looked like by lahi · 2005-03-19 00:43 · Score: 1

The parser would say: "that doesn't parse", and that's really all it _should_ say.

-Lasse
Re:What it should have looked like by Anonymous Coward · 2005-03-19 01:48 · Score: 0

The parser wouldn't know that it didn't parse until the very end of the document. That's a big deal if you're parsing large files.

It's also annoying to debug if you don't get told exactly where the error is. Imagine if GCC just said "I won't compile this, figure out why yourself"!
Re:What it should have looked like by cortana · 2005-03-19 03:23 · Score: 1

/me chuckles

XML blows by ZeekWatson · 2005-03-18 16:42 · Score: 0, Troll

XML sucks ... too verbose for humans and too ambiguous for machines.

One day we'll look back and laugh!

Re:XML blows by JohnFluxx · 2005-03-18 17:33 · Score: 1

ambiguous?
Re:XML blows by Munrobasher · 2005-03-19 02:55 · Score: 1

Inefficient would be a better word.
Re:XML blows by ZeekWatson · 2005-03-19 19:16 · Score: 1

Totally.
<dictionary>
<definition term="e-mail">electronic mail</definition>
<definition term="html">hypertext markup language</definition>
<definition term="xml">extensible markup language</definition>
</dictionary>

or is it:

<dictionary>
<word>
<id>e-mail</id>
<def>Electronic mail</def>
</word>
</dictionary>

More info here:

http://c2.com/cgi/wiki?XmlSucks
Re:XML blows by JohnFluxx · 2005-03-19 20:20 · Score: 1

Uh, you make it whichever way you want it to be. Just because there's more than one way to design it doesn't mean it's ambiguous.
Re:XML blows by Munrobasher · 2005-03-20 00:02 · Score: 1

Uh, you make it whichever way you want it to be. Just because there's more than one way to design it doesn't mean it's ambiguous.

Err, yes it does - the more ways there are to express something, the more ambiguous it is.

Rob.
Re:XML blows by JohnFluxx · 2005-03-20 00:31 · Score: 1

um dude, you are seriously confused.

Ambiguous means that something has multiple semantic meanings. Not that one semantic meaning has multiple representations.
Re:XML blows by Munrobasher · 2005-03-20 01:03 · Score: 1

Err, no it doesn't:

1. Open to more than one interpretation

2. Doubtful or uncertain

The original poster was asking whether in his example it should be attributes, the body or child tags. As there are three ways of doing just this one example, it's certainly open to interpretation as to which is the "best" way to do it. And I'd argue that as there are three ways, it's uncertain as to which is the right answer.

Ambiguous can also simply mean "lack of clarity" which is what part of this thread is all about.

Rob.
Re:XML blows by JohnFluxx · 2005-03-20 02:18 · Score: 1

"open to more than one interpretation"
Yes that's exactly what I said - has multiple semantic meanings.

"Doubtful or uncertain"
Ditto.

You are getting yourself very confused. Just because it's unclear which way is the best way to design your xml doesn't make your decided xml in the least bit ambiguous.
Re:XML blows by Munrobasher · 2005-03-20 03:20 · Score: 1

I agree with that but that wasn't what the original poster was talking about. They were trying to get across that it's ambiguous which is the correct way to design your XML schema in the first place. Not that the resultant XML is ambiguous.

Cheers, Rob.
Re:XML blows by JohnFluxx · 2005-03-21 21:21 · Score: 1

Yes, but to state that as "xml is ambiguous" is a rather large stretch.

Re:Why, oh why, did they have to repeat the tag na by chgros · 2005-03-18 16:46 · Score: 1

I work with XML every day. And every day I wonder the same thing: why the hell does the end tag name have to be repeated? Why can't it just be optional? In other words, why can't it just be abbreviated as: <tagname>data</> ?
Same thing for me, although I'd rather have C-like blocks e.g. {data} so it's easier to jump from one side to the other (as any good editor will allow you to do). And quoting could be made easier, too (Come on, &lt;? What were they thinking?!). The only advantage of not using \ as everyone else does is that if you actually have to store a string that's quoted for something else you don't have to write e.g. \\\\\\\\ (what you need for instance to grep for a \\ from a shell).

now I know why! by Anonymous Coward · 2005-03-18 16:53 · Score: 0

" The OED was the first open source project that I can think of. It was a collection of scholars who were working together and collecting quotes from all over the world. Did the fact that this was an open source project influence you at all?"

The second open source project was fortune(s) which quickly developed a -o option! Fortunes most likely started with very bored OED programmers.

Re:Why, oh why, did they have to repeat the tag na by iabervon · 2005-03-18 17:11 · Score: 1

If more than 60% of your datagram size is element names, your element names are too long. Or you're using nested elements when you should be using attributes.

This is article is amazingly honest-"S" isn't XML. by Anonymous Coward · 2005-03-18 17:41 · Score: 0

Because, as I've pointed out for the one-billionth time, XML is not S-Expressions. Now why don't you all give it a rest.

Re:Why, oh why, did they have to repeat the tag na by syukton · 2005-03-18 17:45 · Score: 2, Insightful

Yeah that'd work great if you knew 100% of the time that you'd never get bad data. If you've got a multi-nested element hierarchy however and you lose one or two of your </>, how do you know where to put them back in? It's very easy to look for an opening <element> tag followed by a closing tag of the same name, especially when building a parser that error-checks.

You know what would cut down the datagram size more? Smaller tag names. Tag names don't have to be readable so much as uniquely identifiable; you can use an interface layer in the editor to make the tag names user friendly and then de-friendify them for transit. Then you've got:

<a>
<b>woo</b>
</a>

instead of:

<element>
<subelement>woo</subelement>
</element>

According to wc, switching to single-character element names instead of the multicharacter ones would give a 41% reduction in bulk, for the example above.

--
Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.

Not Very insightful! by stevens · 2005-03-18 18:01 · Score: 2, Informative

I hadn't thought about that. Very insightful.

Lots of people have thought about it. Not Very Insightful.

The reason is that if the parser encounters unbalanced end-tags, and they're all just </>, the parser will go farther and get very confused before it dies.

It will be very difficult to pinpoint *which* tag isn't closed, like C's optional {} after an if(), or SGML's optional closing tags.

It's much easier to correct if your parser can say "You forgot to close <account> on line 115" rather than "Something or other is unbalanced somewhere before line 224."

Re:Why, oh why, did they have to repeat the tag na by stevens · 2005-03-18 18:09 · Score: 1

Anyway, if I were to design XML, I'd go with a construct like <tagname>{...}

Or just maybe something like

(html (body (p One s-expr to (em rule) them all, and in the (strong darkness) bind them.)))

The almighty Q by Anonymous Coward · 2005-03-18 18:20 · Score: 1, Interesting

Q: How does an XML newbie go about learning what it is, including xslt and dtd, and how to structure xml, xslt, dtd so that it does not break in 5 years and is not ungodly complex?

My initial impression is that XML is essentially as good as the VSAM/ISAM/Network Database Model and for similar reasons may drop out of use after 10 years.

Re:The almighty Q by ikkonoishi · 2005-03-18 21:09 · Score: 1

http://www.w3schools.com/xml/default.asp

That is a very good resource for the beginner to intermediate XML user.

bah by nsaneinside · 2005-03-18 18:22 · Score: 1

&op;exampletag&cp;

Re:Why, oh why, did they have to repeat the tag na by FuzzyBad-Mofo · 2005-03-18 18:59 · Score: 1

I work with XML every day.

I'm sorry..

Please explain by johannesg · 2005-03-18 20:05 · Score: 2, Insightful

I've heard this quote in relation to XML before, and I don't get it. LISP is a programming language. XML is a method for storing data. About the only relation between the two that I can find is that both use nesting. So, why does this get brought up whenever XML is being discussed?

Re:Please explain by Anonymous Coward · 2005-03-18 22:23 · Score: 5, Informative

johannesg writes: "I've heard this quote in relation to XML before, and I don't get it. LISP is a programming language. XML is a method for storing data. About the only relation between the two that I can find is that both use nesting. So, why does this get brought up whenever XML is being discussed?"

Lisp source code is first parsed into S-expressions before being compiled. The programmer can manipulate these S-expressions to generate new programming constructs.

S-expressions are nested lists of dynamically typed data. The compiler turns these nested lists into bytecode or assembly code. But before this happens you're able to manipulate a well defined, concise and platform independent data format. The format is so useful that it is also used to store and transport non-code.

Here's a Lisp function call nested within another function call:

(/ (+ 1 2 3) 6)

[i.e. add 1, 2, and 3 together and then divide by 6] Let's first give different names to the function operators:

(divide (plus 1 2 3) 6)

Now introduce redundancy by duplicating the opening function names:

(divide (plus 1 2 3 /plus) 6 /divide)

Translate the dynamically typed integers to explicit type identifiers:

(divide (plus (integer 1 /integer) (integer 2 /integer) (integer 3 /integer)) (integer 6 /integer) /divide)

Now convert the parentheses and spaces to angle brackets to generate XML:

<divide>
<plus>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
</plus>
<integer>6</integer>
</divide>

Lisp S-expressions are a method for storing/expressing data AND code. They have less overhead than XML, solve more problems than XML (comfortably human readable programming languages can also be written in S-expressions, e.g. Scheme and Common Lisp) and they were invented decades earlier.
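
The steps above are mechanical enough to automate. Here is a rough Python sketch, assuming the S-expression has already been read into nested tuples; the function name sexpr_to_xml is made up for illustration and only integers and named operators are handled.

```python
def sexpr_to_xml(expr):
    """Render a nested tuple such as ("plus", 1, 2) as a list of XML lines."""
    if isinstance(expr, int):
        # A dynamically typed integer becomes an explicitly typed element.
        return [f"<integer>{expr}</integer>"]
    name, *args = expr               # first item is the operator name
    lines = [f"<{name}>"]
    for arg in args:
        lines.extend(sexpr_to_xml(arg))
    lines.append(f"</{name}>")       # the repeated name in the closing tag
    return lines

expr = ("divide", ("plus", 1, 2, 3), 6)
xml = "\n".join(sexpr_to_xml(expr))
print(xml)
```

Run on (divide (plus 1 2 3) 6) it prints the same eight lines as the XML above, which is the whole point: the two notations carry the same tree.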

Regards,
Adam Warner
Re:Please explain by Anonymous Coward · 2005-03-19 02:31 · Score: 0

Another example, adapted from Practical Common Lisp:
In XML, you might have something like this:
<albums>
<album><title>Home</title><artist>Dixie Chicks</artist><rating>9</rating><ripped>T</ripped></album>
<album><title>Fly</title><artist>Dixie Chicks</artist><rating>8</rating><ripped>T</ripped></album>
<album><title>Roses</title><artist>Kathy Mattea</artist><rating>7</rating><ripped>T</ripped></album>
</albums>
In Lisp you could have:
((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T) (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T) (:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T))
or just:
(("Home" "Dixie Chicks" 9 T) ("Fly" "Dixie Chicks" 8 T) ("Roses" "Kathy Mattea" 7 T))

XML by Anonymous Coward · 2005-03-18 20:13 · Score: 0

XML... but what it is good for?

Explicitness by samael · 2005-03-18 21:15 · Score: 2, Insightful

Because it would make spotting your bug harder. Did you _mean_ to close that tag, or did you think you were closing a different tag? If all closing tags look the same it would make tracing certain bugs harder.

--
My Journal

Re:Why, oh why, did they have to repeat the tag na by ikkonoishi · 2005-03-18 21:22 · Score: 1

Which causes the same problem described in the grandparent.

I COULD NOT AGREE MORE. gzip is our friend! by TheLittleJetson · 2005-03-18 21:30 · Score: 2, Informative

when i work with XML in java, i generally just pass the XML through a GZIP stream. need to see the file contents? zcat. XML compresses well since it's repetitive text. Lately I've been doing a lot of XUL code with PHP/smarty as the back-end, and again, I transparently gzip this...

So, this solves the problem of the size of the XML to be stored on disk or transmitted over network... The only difference is parsing. Again, when i'm in java, i use PICCOLO to parse the XML -- it uses a lexical analyzer (jflex?) to parse XML more like a compiler parses code, by tokenizing it. turns out, this is really fast.

Disk space is cheap. CPUs are fast. Mainstream XML parsing technology can always be made faster. Why must we abandon our beloved, human-readable, standardized format for files and protocols alike in favor of binary files?

Re:Why, oh why, did they have to repeat the tag na by ikkonoishi · 2005-03-18 21:32 · Score: 4, Informative

<ele1> <ele2> <ele3> </> </> <ele4> <ele5> </> </>

Which element did I forget to close?

<ele1> <ele2> <ele3> </ele3> </ele1> <ele4> <ele5> </ele5> </ele4>

Clearer now?
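
For anyone who wants to see the difference a checker actually experiences, here is a small Python sketch. It is a toy tag matcher, not a real XML parser; the regex and the function name first_error are assumptions made for this example, and an anonymous </> is treated as closing whatever is innermost.

```python
import re

# Matches <name>, </name> and the anonymous </> close tag.
TAG = re.compile(r"<(/?)([A-Za-z0-9]*)>")

def first_error(doc):
    """Return a description of the first nesting error, or None if balanced."""
    stack = []
    for match in TAG.finditer(doc):
        closing, name = match.group(1) == "/", match.group(2)
        if not closing:
            stack.append(name)
        elif not stack:
            return "close tag with nothing open"
        else:
            opened = stack.pop()
            if name and name != opened:   # anonymous </> matches anything
                return f"<{opened}> was closed by </{name}>"
    if stack:
        return "unclosed at end of input: " + ", ".join(f"<{n}>" for n in stack)
    return None

anonymous = "<ele1> <ele2> <ele3> </> </> <ele4> <ele5> </> </>"
named = "<ele1> <ele2> <ele3> </ele3> </ele1> <ele4> <ele5> </ele5> </ele4>"
print(first_error(anonymous))
print(first_error(named))
```

With anonymous close tags the checker only notices at the very end and blames <ele1>, which is not the element that was forgotten. With named close tags it stops at </ele1> and names <ele2>.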

Re:Why, oh why, did they have to repeat the tag na by Dominic_Mazzoni · 2005-03-18 22:30 · Score: 1

You know what would cut down the datagram size more? Smaller tag names.

Not really, because any protocols that exchange large amounts of XML data should be compressing the data anyway, right?

Originally: Bay Area Research Facility by quarkscat · 2005-03-18 22:36 · Score: 2, Funny

also known as: BARF. The name was changed, no
doubt, in order to instill a greater sense among
MSFT employees there that they actually might
(someday) have a workable product. Hence, BARC.

XML is more complicated than it should be, but
it is NOT a MSFT "invention", and has no business
being patented by MSFT. Let alone, encumbered
with their viral and restrictive and expensive
licensing scheme. What it IS is yet another
example of the slimy "embrace/extend/extinguish"
monopolistic business practices of MSFT. If the
DoJ weren't more like a 90 year old grandmother
that misplaced her full dentures (aka the Dubya
regime), they would have MSFT back into court to
exact "new & improved" punishment on the 800 lb.
gorilla.

Re:Why, oh why, did they have to repeat the tag na by Anonymous Coward · 2005-03-18 23:34 · Score: 0

The element you forgot to close is the one whose content model doesn't allow the content that follows. (If you don't have a DTD or schema, you might as well get used to handling garbage, because you're going to see a lot of it.)

Re:I COULD NOT AGREE MORE. gzip is our friend! by Anonymous Coward · 2005-03-18 23:43 · Score: 0

Do the words "Slashdot effect" ring a bell? Server farms are not cheap, and when you're supporting a truly large number of clients, gzip is no longer your friend. Pissing away cycles per octet per request per user just to ignore optional whitespace and comments(!) in my RPC is just stupid.

Re:Why, oh why, did they have to repeat the tag na by Bitsy+Boffin · 2005-03-18 23:53 · Score: 1

I don't think you got the joke.

See here for enlightenment.

--
NZ Electronics Enthusiasts: Check out my Trade Me Listings

Influence of Locoscript by Sambeau · 2005-03-19 00:22 · Score: 0

Locoscript (on the Amstrad PCW Word Processor) was really big amongst British Academia during the 1980s. It would be interesting to know if their [+bold]tagged format[-bold] had any influence on the OED..

Needs to be bottom up by samael · 2005-03-19 00:37 · Score: 1

If we're going to categorise the web then a fuzzy definition set with multiple overlapping definitions is going to be necessary. I suspect that del.icio.us is going to be the first step in this direction - link it into google and you've got a good stab at understanding what concepts web pages are actually connected to.

--
My Journal

Re:Why, oh why, did they have to repeat the tag na by Anonymous Coward · 2005-03-19 00:37 · Score: 0

But while XML compresses well, it is not necessarily the case that verbose XML compresses so much better than compact XML that it ends up smaller.

$ cat test1.xml
<a>
<b>woo</b>
</a>
$ cat test2.xml
<a-long-tag-name>
<be-amazed-by-the-length-of-this>woo</be-amazed-by-the-length-of-this>
</a-long-tag-name>
$ d test*xml*
rw-r--r-- user:group 20 Mar 19 12:32 test1.xml
rw-r--r-- user:group 47 Mar 19 12:34 test1.xml.gz
rw-r--r-- user:group 110 Mar 19 12:33 test2.xml
rw-r--r-- user:group 94 Mar 19 12:34 test2.xml.gz

So, on a trivial example, using small tag names produces a factor of 5 reduction in size in uncompressed XML, which remains a factor of 2 reduction when compressed with gzip -9.

Sure, the difference may or may not be smaller on larger files. But it exists.
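
The same comparison can be scripted. This is a minimal Python sketch using the standard gzip module at level 9. The two documents are retyped from the listing above without indentation, so the byte counts will likely differ a little from the ones shown (the gzip command also stores the file name in its header, which gzip.compress does not).

```python
import gzip

short = b"<a>\n<b>woo</b>\n</a>\n"
long_ = (b"<a-long-tag-name>\n"
         b"<be-amazed-by-the-length-of-this>woo</be-amazed-by-the-length-of-this>\n"
         b"</a-long-tag-name>\n")

def sizes(data):
    """Return (raw size, gzip level 9 size) in bytes for one document."""
    return len(data), len(gzip.compress(data, compresslevel=9))

short_raw, short_gz = sizes(short)
long_raw, long_gz = sizes(long_)
print("short tags:", short_raw, "raw,", short_gz, "gzipped")
print("long tags: ", long_raw, "raw,", long_gz, "gzipped")
```

On input this tiny the fixed gzip header and trailer dominate, so the short document actually grows when compressed. The long-tag version still ends up larger than the short-tag version after compression, which is the claim being made.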

Re:Why, oh why, did they have to repeat the tag na by lahi · 2005-03-19 00:37 · Score: 1

Which is not _actually_ a problem.

-Lasse

The essence of XML... by CondeZer0 · 2005-03-19 00:58 · Score: 2, Funny

"The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well." -- Phil Wadler

--
"When in doubt, use brute force." Ken Thompson

Re:The essence of XML... by Anonymous Coward · 2005-03-19 02:31 · Score: 0

No, here's a better description of XML.

"XML is like violence: If it doesn't solve your problem, you aren't using enough of it."
Re:The essence of XML... by CondeZer0 · 2005-03-19 03:09 · Score: 0, Flamebait

"XML is like violence: If it doesn't solve your problem, you aren't using enough of it."

Yes, forgot to mention that one, it's one of my recent additions to my quotes database; do you know the author?

--
"When in doubt, use brute force." Ken Thompson
Re:The essence of XML... by Anonymous Coward · 2005-03-19 04:10 · Score: 0

I believe it's from Chris Maden http://whump.com/moreLikeThis/

Some of my other favorites include:

Being able to read other people's source code is a nice thing, not a fundamental freedom.

In God we trust, all others we virus scan.

Treat your password like your toothbrush. Don't let anybody else use it, and get a new one every six months.

Re:Why, oh why, did they have to repeat the tag na by tyen · 2005-03-19 02:31 · Score: 1

Interesting suggestion. This technique might only work for XML documents above a certain level of size, number of tag types, and possibly even parsing complexity. The applicability might depend upon whether your utilization of XML serialized data is batch-oriented or transactional in nature. This technique would yield lots of benefits for scenarios where someone is dumping a few, very large XML documents across the wire, but perhaps not so much for scenarios where lots of small, quick XML documents are being exchanged back and forth. CPU saturation (and eventually memory I/O saturation) for example, might become a concern in certain scenarios.

In any case, it seems one name for this technique is XML "compaction". I searched around Sourceforge and found quite a few projects trying to tackle the general problem domain of efficient XML transmission. The compaction terminology was used and explicitly described by the Xqueeze project. There are other projects that either directly apply themselves against the XML compression problem or are tangentially resolving the problem by completely changing the representation format (no transcoding): xmltk, XMLPPM, XBIS XML, WAP Binary XML (WBXML). I will probably look at Xqueeze and XMLPPM for my own programming work that requires handling XML formatted data in a more batch-oriented setting.

Who cares - they are both ill-formed anyway by Anonymous Coward · 2005-03-19 02:42 · Score: 0

So what if I stop at ele2 or ele3 - the document is wrong anyway and must be rejected. The result is the same - humans don't read XML anyway - only machines.

Re:I COULD NOT AGREE MORE. gzip is our friend! by Munrobasher · 2005-03-19 02:43 · Score: 1

But bandwidth and even LANs aren't that fast which is where the bottleneck occurs. We're experimenting with the excellent Infragistics NetAdvantage suite of web controls like the grid. These things are ending up sending a 2MB HTML file across - okay, it's not XML in this case but it's the same idea/problem.

Rob.

50 microseconds is not bad - it's terrible by Anonymous Coward · 2005-03-19 02:46 · Score: 0

Try marshalling and unmarshalling 1,000,000,000 floating point numbers as used in many mathematical simulations - XML is a non-starter.

Re:50 microseconds is not bad - it's terrible by Alomex · 2005-03-19 03:21 · Score: 1

Exchanging intra vendor data in XML is no more foolish than exchanging intra vendor data in ASCII or IEEE floating point format. It almost always makes a lot of sense. If, on the other hand, you happen to have the rare application that moves massive amounts of data around, sure, feel free not to use ASCII or XML, and use your own proprietary format. Just be aware that this is the exception, not the rule.

Same shit would happen in XML by Anonymous Coward · 2005-03-19 02:50 · Score: 0

How is your post insightful? Not closing a tag is not unique to Lisp-ish expressions - XML suffers from it as well.

Re:Same shit would happen in XML by Anonymous Coward · 2005-03-20 14:03 · Score: 0

How is your post insightful? Not closing a tag is not unique to Lisp-ish expressions - XML suffers from it as well.
But thanks to XML syntax requirements, the parser can point out the exact tag that wasn't properly closed.

Re:This is article is amazingly honest-"S" isn't X by Anonymous Coward · 2005-03-19 02:53 · Score: 0

Because you're an idiot. The differences you point out in your article are mostly not substantial, as e.g. SXML, the entire "XML Infoset" expressed in Scheme, illustrates. The _only_ part of value in XML is, as you point out, the dealing with different character encodings. And that's really pretty independent from the rest of XML and could be applied just as easily to a SEXP-based-but-not-XML file format.

[OT] bad summary by hankaholic · 2005-03-19 02:53 · Score: 4, Insightful

Tim reveals where the idea for XML actually came from: Tim's work on the OED at Waterloo.

If you believe that "OED" will be misunderstood by enough people to justify enclosing it with a link to a definition, why not just spell out "Oxford English Dictionary"?

"Hmmm, OED might be unclear to tons of people reading this, I'll make them have to click on a link to know what I'm talking about."

Obligatory relation to discussion content:

Providing a link instead of writing a clear summary is choosing the wrong tool for the task at hand. Authors of some other comments in this thread have shown that XML also is the wrong tool for many of the tasks to which it is applied. Whether it's passing data internally within an application or summarizing an article for the homepage, choosing the right alternative can make a difference between efficient clarity and an inelegant kludge.

Applying the right algorithmic tool to the right problem is actually a focus of CS. This is why sorting routines are often studied -- for instance, a routine which is more efficient at sorting millions of unordered pieces of data may be very wasteful when dealing with nearly presorted data.

The distinction is not often understood and has more of an impact than the observer might think. For instance, when writing an application for a handheld in which data is kept sorted and is usually viewed between insertions it makes sense to sort after every data element added to the database. However, this means adding a single item to a mostly-ordered set. Understanding that quicksort is a poor choice for this application means a difference in battery life.
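
As a concrete illustration of the handheld case, here is a short Python sketch that keeps a list sorted by inserting each new item at its place instead of re-sorting everything. The record values and the add_record helper are invented for the example; bisect.insort is the standard-library call doing the work (a binary search for the position, then one shift of the tail).

```python
import bisect

records = [3, 8, 15, 23, 42]       # already sorted, as the device keeps it

def add_record(sorted_items, item):
    """Insert item in place, keeping the list sorted without a full re-sort."""
    bisect.insort(sorted_items, item)

add_record(records, 16)
add_record(records, 1)
print(records)
```

Each insertion costs at most one pass over the list, whereas a general-purpose sort run after every addition repeats work on data that was already in order.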

--
Somebody get that guy an ambulance!

Re:Why, oh why, did they have to repeat the tag na by cortana · 2005-03-19 03:14 · Score: 1

Or even (:tagname data)?

Get this man a copy of Practical Common Lisp!

Re:Please explain-Chinese firewall. by SnowZero · 2005-03-19 04:18 · Score: 2, Insightful

And they have infix notation...

S-expressions are in prefix notation. Infix describes expressions such as "1+2". Lots of parentheses are hard to read, but twice that number of angle brackets is certainly not easier.

Blurring the line between data and code is a useful technique...

This only matters if you use the data in Lisp without being careful. Any non-interpreted language could use it just as safely as XML.

P.S. I don't even like Lisp, being a person who likes type checking before I actually execute a snippet of code. On the other hand, they really do have a point regarding S-expressions and XML.

Criteria for PAT by pvcf · 2005-03-19 05:16 · Score: 1

TB: ...A bunch of Canadian government money came in, on the stipulation that they produce not only academic research, but also working, usable software...

And PAT was:

working: yes
usable: mostly
pretty: good thing this wasn't one of the criteria :-)
Sorry Tim, I couldn't resist. But you have to admit that PAT was rather ugly...
....Paul

--
F U NE X N M? Son: "Dad... How do you spell 'hourly'?" Dad: "0 * * * *"

Re:Why, oh why, did they have to repeat the tag na by Anonymous Coward · 2005-03-19 05:19 · Score: 0

It's very easy to look for an opening <element> tag followed by a closing tag of the same name

It's easy for you, but it's impossible for me! Why? Because we consume XML on embedded devices with very small memory. We have no space for all the extra code that would be needed to do sophisticated recovery from XML syntax errors. When any XML error is detected, we simply discard the packet and jump into recovery mode.

The right solution is to let the USER decide if they want to bloat their datagram to facilitate sophisticated error recovery.

The incorrect assumption that you made is a perfect example of the lack of thinking that XML's design suffers from.

Yeah that'd work great if you knew 100% of the time that you'd never get bad data.

Pretty much. We use TCP, which has extremely high protection against packet corruption. The only XML error we have seen in deployment is truncation when somebody pulls out a CAT5 cable, or when a program crashes in the middle of a socket write.

(Obviously, we get other XML errors during development, but those errors are caused by software bugs. We don't need a sophisticated parser to make dynamic corrections in our buggy XML, when we can simply fix the bug.)

+1 Insightful by Jagasian · 2005-03-19 06:28 · Score: 1

Please mod the parent post up!

Lets see.... by jefu · 2005-03-19 06:39 · Score: 1

Okey dokey.

First, let's add matching close parens to make error detection easier. (It might be handy to have a way (such as the "/") to indicate an end tag, but we're going for brevity here.)
(html ... html)

Now let's add attributes. It is probably most convenient to put these in a list right after the element name. Obviously if there are no attributes we need to put in an empty list so the parsing won't be ambiguous :
(html '() (head '() (style '( type."text/css" ...) style) head) html)

(Of course, one could use some special syntax to indicate attribute lists, or even map them into attributes. And then the attribute lists would not be required to follow the element name and the parser could tell the difference easily enough.)
(html (attribute-list ... attribute-list) html)
Or perhaps :
(html #( ... attributes in here )# html)
Now we have the question of attribute lists that might not follow the element name - is that legal? Better not let it be, I can see several problems with that that would change the semantics - not unusefully, but incompatibly with XML.

OK. That looks like it will work and be (more or less) isomorphic to XML. How much space does it really save? One ">" for each start element and two characters for each end element - but there are the added characters for the empty attribute lists.

I doubt anyone would be terribly bothered if someone built a syntax that was isomorphic to that of XML (meaning that a syntax transformer and its inverse run on a document of either sort would produce the same document).

Go for it - I'm sure there are a lot of people who would like to be able to human-author xml without all the syntax. But wait - there are XML editors that do just that! (Most could do it better, but that's another problem.)

Re:Why, oh why, did they have to repeat the tag na by syukton · 2005-03-19 07:02 · Score: 1

Pretty much. We use TCP, which has extremely high protection against packet corruption. The only XML error we have seen in deployment is truncation when somebody pulls out a CAT5 cable, or when a program crashes in the middle of a socket write.

How about when the power goes out? When a hard drive has a bad sector and transfers a malformed file? When your parser misses a closing tag, how does it know which XML element parents the next XML element? Does it guess? How would you recover from such an error?

I've written my share of XML parsers and routines, including a de-bloating script which does the single-character tag naming. I actually used 2 characters because we had more than 26 elements and I wanted to remain strictly alphabetic; but 2 characters is suitable for a document possessing 676 elements or fewer. 3 characters for 17576 elements. Anyhow, I've never done anything on an embedded platform. I've always had the space available to me to fix broken XML. I actually had this parser once which would try to recover data from a bad XSL transform and make it standards compliant. It did so by adding and removing tags as it saw fit (based on a config file I created) in order to ensure that every element was parented validly and only possessed valid child elements.
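
For the curious, the naming half of such a de-bloating script might look something like the Python sketch below. This is not the script described above, just a guess at the idea; short_names and build_mapping are hypothetical helpers, and the element names are borrowed from the earlier example.

```python
from itertools import product
from string import ascii_lowercase

def short_names(width):
    """Yield every lowercase alphabetic tag name of the given width."""
    for letters in product(ascii_lowercase, repeat=width):
        yield "".join(letters)

def build_mapping(long_names, width=2):
    """Pair each long element name with a short one (hypothetical helper).

    zip stops at the shorter input, so pick a width with enough names.
    """
    return dict(zip(long_names, short_names(width)))

mapping = build_mapping(["element", "subelement"])
print(mapping)
print(len(list(short_names(2))), len(list(short_names(3))))
```

The last line counts the available names at widths 2 and 3, matching the 676 and 17576 figures quoted above.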

--
Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.
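
For anyone wanting to check the arithmetic in the post above (26 one-letter names, 676 two-letter names, 17576 three-letter names), here is a rough Python sketch of fixed-width alphabetic tag aliasing. It is not the poster's script; the names short_name and build_tag_map are made up for illustration, and lowercase a-z is an assumption.

```python
import string

LETTERS = string.ascii_lowercase  # assumption: aliases use lowercase a-z only


def short_name(index, width=2):
    """Return the index-th fixed-width alphabetic alias (0 -> 'aa' for width 2)."""
    if not 0 <= index < len(LETTERS) ** width:
        raise ValueError("index does not fit in the requested width")
    chars = []
    for _ in range(width):
        # Peel off the least significant base-26 digit each time round.
        index, rem = divmod(index, len(LETTERS))
        chars.append(LETTERS[rem])
    return "".join(reversed(chars))


def build_tag_map(element_names, width=2):
    """Assign each distinct element name a short alias, in first-seen order."""
    mapping = {}
    for name in element_names:
        if name not in mapping:
            mapping[name] = short_name(len(mapping), width)
    return mapping
```

With width 2 the map runs out after 26 * 26 = 676 distinct element names, which is where the third character comes in.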

that made me snort my coffee by Anonymous Coward · 2005-03-19 07:30 · Score: 0

mod parent up. That is so damn true.

Re:Why, oh why, did they have to repeat the tag na by Anonymous Coward · 2005-03-19 08:40 · Score: 0

How about when the power goes out?

Same case as a program crash. There's no way for me to know if my client crashed or lost power. And I don't care.

When a hard drive has a bad sector and transfers a malformed file?

Same thing. It causes a crash or some other error. All errors are the same to me.

When your parser misses a closing tag, how does it know which XML element parents the next XML element?

Don't care. The XML is either perfect, or it's garbage. In our limited-memory embedded environment, we simply don't have the ability to be dainty about this. All errors are handled the same.

How would you recover from such an error?

Hard reset.

I've always had the space available to me to fix broken XML.

That's the whole problem here. People make assumptions about the environment.

I wrote my XML parser by hand, in C. It does only the absolute minimum that it needs to extract a substring from the XML. I don't have the luxury of doing anything else.
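
To give a feel for what "only the absolute minimum" can look like, here is a sketch of the substring-extraction idea. It is written in Python for brevity rather than C, it is not the actual parser described above, and the function name extract_text is hypothetical. It assumes tags carry no attributes and treats anything unexpected as garbage.

```python
def extract_text(xml, tag):
    """Return the text between the first <tag> and </tag>, or None on any error."""
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    start = xml.find(open_tag)
    if start == -1:
        return None  # element not present
    start += len(open_tag)
    end = xml.find(close_tag, start)
    if end == -1:
        return None  # truncated or malformed input: no recovery attempted
    return xml[start:end]
```

A truncated message such as "<cfg><baud>96" simply yields None, which matches the "perfect or garbage" policy: the caller has exactly one error path.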

I actually had this parser once which would try to recover data

That's a nice feature. But it's totally different from what I have the resources to do.

We've got a big ol' basket of apples and oranges here.

Re:Why, oh why, did they have to repeat the tag na by syukton · 2005-03-19 09:06 · Score: 1

How much memory are you working with? What's your architecture? I don't want to get you into any trouble with your company or anything for improper disclosure, but I'm curious what sort of project you're working on, specifically.

--
Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.

Re:Why, oh why, did they have to repeat the tag na by Anonymous Coward · 2005-03-19 11:02 · Score: 0

I don't want to get you into any trouble with your company or anything for improper disclosure, but I'm curious what sort of project you're working on, specifically.

No problem; that's why I'm posting as AC.

See: http://www.rabbitsemiconductor.com/ (2000 series). It's an 8-bit processor, similar to the Z80. We're very happy with the processor's speed, but we're trying to minimize memory usage to save cost. We use it for controlling various devices via RS232 serial ports. We also need it to run a TCP server so that we can remotely send it configuration data (in XML). Everything needs to fit in 256 KB of flash memory. The compiler is supplied by: http://www.zworld.com/

YAML by Bronster · 2005-03-19 15:08 · Score: 1

Last time I tried the Perl YAML module I could generate a pathological perl data structure (strings designed to look suspiciously like bits of YAML) and corrupt the output sufficiently that it didn't parse back into the same data structure.

This was a bit over a year ago.

I'm sorry, but I'm just not interested in using a format where I can't rely on it being clean enough to even pass printable text cleanly through a conversion and back again. Get back to me when you've got a format which isn't a crock of shit.

Re:YAML by doom · 2005-03-20 17:25 · Score: 1

Bronster wrote:
Last time I tried the Perl YAML module I could generate a pathological perl data structure (strings designed to look suspiciously like bits of YAML) and corrupt the output sufficiently that it didn't parse back into the same data structure.

This was a bit over a year ago.

I'm sorry, but I'm just not interested in using a format where I can't rely on it being clean enough to even pass printable text cleanly through a conversion and back again. Get back to me when you've got a format which isn't a crock of shit.

Interesting complaint. You've filed a bug report, right?

Re:YAML by Bronster · 2005-03-20 18:10 · Score: 1

I'm pretty sure I emailed the author of the module at the time with a test-case - but it's over a year ago and at a previous job where I didn't get a copy of my sent email or work area when I left (it wasn't on the best of terms...) so I can't check.

Ah yes, the "error recovery" excuse... by alispguru · 2005-03-19 16:18 · Score: 1

It also makes error recovery very difficult, something that we know is quite important from all that malformed HTML code out there. The XML creators knew that too.

The extra redundancy of closing tag labels makes sense when your documents are generated by humans, like most SGML was.

It makes no sense at all for documents that are generated by programs, especially programs that create documents in some canonical manner like building DOM structures and then serializing them - if you trust the serializer, then it can't drop a close tag.

--

To a Lisp hacker, XML is S-expressions in drag.
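
The "trust the serializer" point can be illustrated with a small sketch using Python's standard xml.etree.ElementTree. The element names are invented for the example. The tree is built in memory and then serialized, so every close tag is emitted by the library and none is typed by hand.

```python
import xml.etree.ElementTree as ET


def build_entry():
    """Build a tiny document as a tree and let the library serialize it."""
    root = ET.Element("entry")
    ET.SubElement(root, "word").text = "xml"
    ET.SubElement(root, "etymology").text = "from SGML"
    return ET.tostring(root, encoding="unicode")


serialized = build_entry()
# Parsing the output back checks that the serializer produced well-formed XML.
reparsed = ET.fromstring(serialized)
```

This says nothing about truncation in transit, which is the other half of the argument in this thread.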

Re:I COULD NOT AGREE MORE. gzip is our friend! by TheLittleJetson · 2005-03-19 16:53 · Score: 1

But bandwidth and even LANs aren't that fast which is where the bottleneck occurs.

Like I said in my post, gzip works pretty damn well for networks too, as it supports streams. If you're running a web server, use something like mod_gzip -- if you're writing a network application with a custom-made XML-based protocol, you can simply wrap a gzip compressor/decompressor around the socket stream.

Binary XML is intended to make things parse faster, but as others have said, it's worth the extra CPU power to preserve a format that is human-readable. Compression is an easy fix for disk storage / network transmission problems.
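
As a rough illustration of wrapping a gzip compressor/decompressor around a stream, here is a sketch using Python's standard zlib module in gzip mode (wbits=31). The sample document, the chunk size and the function names are all assumptions for the example; a real protocol would feed these chunks to and from a socket.

```python
import zlib


def make_xml(n):
    """Build a deliberately repetitive XML document as bytes."""
    rows = "".join(
        "<record><identifier>%d</identifier><description>item %d</description></record>" % (i, i)
        for i in range(n)
    )
    return ("<records>" + rows + "</records>").encode("utf-8")


def compress_chunks(chunks):
    """Compress an iterable of byte chunks, yielding gzip-framed output."""
    comp = zlib.compressobj(wbits=31)  # 31 selects the gzip container
    for chunk in chunks:
        # Z_SYNC_FLUSH pushes out everything so far, so the peer is not left waiting.
        data = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield comp.flush()  # write the gzip trailer


def decompress_chunks(chunks):
    """Inverse of compress_chunks: yield the original bytes piece by piece."""
    decomp = zlib.decompressobj(wbits=31)
    for chunk in chunks:
        yield decomp.decompress(chunk)
    yield decomp.flush()


original = make_xml(200)
pieces = [original[i:i + 256] for i in range(0, len(original), 256)]
compressed = b"".join(compress_chunks(pieces))
restored = b"".join(decompress_chunks([compressed]))
```

Repeated tag names are exactly the kind of redundancy the compressor removes, so comparing len(compressed) with len(original) on your own data is a quick way to see how much the verbose tags actually cost on the wire.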

Fucking retarded mods ... fuck off by ZeekWatson · 2005-03-19 19:19 · Score: 0, Troll

What fucking retard modded this to a troll?

Re:I COULD NOT AGREE MORE. gzip is our friend! by Munrobasher · 2005-03-19 23:38 · Score: 1

Whilst this isn't quite on-topic, it is to do with text versus binary formats :-) Our problem is that when you start trying to make a webapp do more client type functions (like grid, tree controls etc.) then the size of the HTML file with Javascript but most often state information becomes a problem. Even with an include, it still downloads a large jscript file as text and then compiles (or more likely interprets).

The same is true of using XML.

Is there any way to compress the HTML and jscript files as they are fetched? Good-old modems used to compress the text stream on the fly, does the same thing happen with broadband?

Cheers, Rob.

Re:I COULD NOT AGREE MORE. gzip is our friend! by Munrobasher · 2005-03-19 23:43 · Score: 1

Binary XML is intended to make things parse faster, but as others have said, it's worth the extra CPU power to preserve a format that is human-readable. Compression is an easy fix for disk storage / network transmission problems

Binary XML and, thinking further, binary HTML would IMHO be a good compromise. We can keep readability at the development end but maintain efficiency over the network. Everyone seems to assume the internet is now really, really fast. Well, it isn't :-)

As long as you can very easily rebuild the text version of the XML/HTML at the client end. Hmm, that's a nice way to secure things as well - encrypt the binary as well with keys and then people can't look at your source code if you don't want them to.

Binary XML data would also be inherently more secure to packet sniffers although not perfect.

Somebody must have thought about this before.

Cheers, Rob.

Re:I COULD NOT AGREE MORE. gzip is our friend! by Munrobasher · 2005-03-19 23:50 · Score: 1

>Somebody must have thought about this before.

Ohh hang on, I've just re-invented "the compiler" :-)

Rob.

Re:I COULD NOT AGREE MORE. gzip is our friend! by TheLittleJetson · 2005-03-20 09:04 · Score: 1

Is there any way to compress the HTML and jscript files as they are fetched? Good-old modems used to compress the text stream on the fly, does the same thing happen with broadband?

Yes, as I've said now in 2 posts on this thread, you can transparently compress stuff from web servers. Web browsers send the server an "Accept-Encoding" header, that shows the different formats it can take. Most (all?) modern browsers support gzip.

The project I'm doing at work is XUL (XML user interface language, as used in Mozilla browsers, Thinlet and Macromedia Flex) and JavaScript. Both of these are sent to the client gzipped.
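
The Accept-Encoding negotiation described above can be sketched roughly as follows. This is a simplified illustration and not how mod_gzip or any particular server does it: the function names are hypothetical, and the "*" wildcard and other encodings are ignored.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value lists gzip with q > 0."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower().replace(" ", "")
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0  # q=0 means "do not send this"
            except ValueError:
                return False
        return True
    return False


def encode_body(body, accept_encoding):
    """Return (payload, extra_headers), gzipping only if the client asked for it."""
    if client_accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

The browser advertises what it can decode, the server picks, and the Content-Encoding response header tells the browser what it got, so the page scripts never see the compression.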

Interchange File Format by sorbits · 2005-03-20 18:29 · Score: 1

Furthermore his question about why no-one did this before 1996 is wrong. Electronic Arts did IFF in 1985 for the exact same purpose: interchange of data between applications and computers.

Cowlishaw's LEXX influenced Bray by JohnQPublic · 2005-03-21 03:18 · Score: 1

If Bray worked on the OED in 1985, he must have seen IBM Fellow Mike Cowlishaw's LEXX editor, which was the thing that displayed "the electronic version[] of dictionary. It was what we would now call XML. It had little embedded tags saying entry, word, and then pronunciation, etymology, a brief quotation, and the date, source, text, and so on." It was a color-terminal application, running (initially) on IBM's VM mainframe system. LEXX was the subject of an IBM Journal of Research and Development article back in 1987 - it's worth reading, even though the screen-shots didn't survive being scanned into the PDF.

Slashdot Mirror

218 comments