Tim Bray on the Birth of XML, 10 Years Later
lazyguyuk writes "Tim Bray posts a lengthy blog post on the birth of XML, formalized as 1.0 in Feb 1998. 'XML is ten years old today. It feels like yesterday, or a lifetime. I wrote this that year (1998). It's really long. The title was originally Good Luck and Internet Plumbing but the filename was "XML-People" and I decided I liked that better. I never got around to publishing it, so why not now?'"
Just wondering as I'd love to read it! ;)
Thanks BillG
Young Buck: Hey, we have a data exchange problem between two systems, let's use XML!
Greybeard: Ok, but now you have 2 problems.
I realize that XML is used for a lot of things, but whenever my fellow developers learn that the vendor is shipping us some interface in XML, the groans are audible. About half the time, their XML format isn't quite standard, and we've got to dig around for utilities to try and work with it (or write something custom). I'd say the vast majority of our interfaces are good ol' delimited text files.
For other purposes, XML is great and very readable, but I'm not sure it makes sense to use it everywhere.
Considering all the (internet, and elsewhere) crapola that gets passed around as XML, with pretty much anything-you-want included, I don't really understand how we can call it "formalized".
Add to that the fact that how XML gets "displayed" comes down to whatever-you-want-to-write, and I think there are plenty of people who would be hard to convince that there really is a "formal standard" for XML.
Perhaps Duke Nukem Forever will be written with this fantastic standard?
Do you maintain a website? XML has been a godsend for those who want to maintain web and print output side by side. By keeping your data in an XML format, you can use simple XSL stylesheets to generate multiple types of output. See e.g. O'Reilly's XSLT Cookbook for dozens of very real-world examples (it's probably in your library).
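Python's standard library has no XSLT processor, so as a stand-in for an XSL stylesheet, here is a minimal sketch of the same single-source idea: one XML document rendered to both an HTML fragment and plain text by walking the tree. The `<article>`/`<title>`/`<para>` element names are made up for illustration; with a real XSLT engine (for example the third-party lxml library) each renderer would be a stylesheet instead.

```python
from html import escape
import xml.etree.ElementTree as ET

# One source document (hypothetical element names).
SOURCE = """<article>
  <title>Happy Birthday, XML</title>
  <para>XML is ten years old today.</para>
  <para>It feels like yesterday, or a lifetime.</para>
</article>"""

def to_html(root):
    """Render the article as an HTML fragment, escaping text content."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text or '')}</p>" for p in root.iter("para")]
    return "\n".join(parts)

def to_text(root):
    """Render the same article as plain text, e.g. for print."""
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text or "" for p in root.iter("para")]
    return "\n".join(lines)

root = ET.fromstring(SOURCE)
print(to_html(root))
print()
print(to_text(root))
```

The data lives in one place; adding another output format means adding another renderer, not another copy of the content.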
That's just one example of how XML technology has made coding easier. I'm sure others will point out more.
If you aren't a developer, then I'm not sure XML was supposed to directly revolutionize your end-user experience.
<Greetings>
<Birthday>Happy</Birthday>
<Who>XML</Who>
</Greetings>
Wasn't there a story about this 10 years ago?
I've recently taken a job at a primarily Java shop. After seeing XML used and abused for Ant, Maven and various other things, I've grown even more disenchanted with it. And I've also gotten the chance to see that not only does Java represent a poor trade-off between the annoyances of a strongly typed language and the speed of a dynamic interpreted one, it also has a horrible mess of dependency issues that nobody really solves.
I'm much more hopeful about technologies like Thrift and/or D-Bus than I ever was about such abysmal abominations as SOAP, or the only slightly better XML-RPC.
The Java XML world seems like this little closed ecology of mutual masturbators who all come up with more Java and XML 'solutions' to problems that never existed before they started using Java and XML.
I see the value of XML for long-lived documents that don't spend a lot of their life on the wire. And possibly for config files, though IMHO it is too ugly and unreadable for those. But as a general tool for Internet plumbing it's awful.
Need a Python, C++, Unix, Linux develop
Looks like you're going to have to wait a little longer. Try holding your breath, this time.
XML is like violence: when it doesn't work, use some more!
I recently began testing some RSS and Atom parsing modules for Movable Type that I wrote, and noticed that they were breaking on different feeds that Google Reader handles easily. When I looked at the RSS and Atom markup, I noticed that the reason was that the various generators that were causing problems weren't always generating RSS and Atom in the way that I expected. WordPress.com, for example, was using content:encoded tags for some of the content for blog posts, and had an empty description tag in that item block.
It's XML, not HTML, so it's not going to be as hard to get working if it's done as properly formatted XML, but one problem I have is with the ad-hoc mixing of tags. If you are going to provide a syndication feed or something to that effect, use a standard and stick to it, even if the standard has limitations.
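As a rough sketch of coping with the feed variation described above (not Movable Type's or Google Reader's actual logic), a reader can prefer `content:encoded` when it has text and fall back to `description`, treating an empty element as missing. The sample feed is invented; the namespace URI is the one defined by the RSS 1.0 content module.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS 1.0 "content" module, which defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Post one</title>
      <description></description>
      <content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
    </item>
    <item>
      <title>Post two</title>
      <description>Summary only</description>
    </item>
  </channel>
</rss>"""

def item_body(item):
    """Prefer content:encoded, fall back to description; empty counts as missing."""
    for tag in (f"{{{CONTENT_NS}}}encoded", "description"):
        text = (item.findtext(tag) or "").strip()
        if text:
            return text
    return ""

root = ET.fromstring(FEED)
for item in root.iter("item"):
    print(item.findtext("title"), "->", item_body(item))
```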
And, of course, my post is incomplete without a reference to my little rant on why CORBA and other forms of RPC are bad. Both Thrift and D-Bus are pretty close to the ideal solution I describe later in it. They focus on message content over semantics and are extremely easy to parse. SOAP and XML-RPC fail on both counts: they put semantics (you are making a remote function call that does some specific thing, not sending a hunk of data that has some particular content) over content, and they are a huge pain to parse.
It already did. They called the revolution Web 2.0. Sorry you missed it.
Perhaps I'm being too negative here. I sound like a troll. But really folks, do yourself and the rest of us a favor and read up on JSON and YAML. You'll see I'm being only too kind and generous to YAML.
Some drink at the fountain of knowledge. Others just gargle.
If everyone had jumped on the boat 10 years ago, it might have. But that didn't happen.
XML is too difficult, and makes abuse/over-use too easy. Personally, I love it, but I'm in the minority. The other key factor is that there is simply no short-term need for it in many places. Or rather, the need for it isn't recognized by the majority. Pragmatic solutions tend to win over revolutionary new ones.
>By keeping your data in an XML format, you can use simple XSL stylesheets to generate multiple types of output.
Just like LaTeX! Reinvention is a wonderful thing.
I do a lot of Java and XML. I don't know what you're using for a library, but I'd suggest JDOM.
As for the abuses for Maven and Ant... yeah. I'll agree. There are a lot of things that seem to use XML just because they can. I know there is some theory behind why they use them (machine readable, blah blah blah) but for most things it's just a giant pain for the complexity you get. Maybe if you were trying to build Windows with Ant.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Excellent point, and I'll take it one step further. When coupled with XSLT and other WS-* standards, you have an extremely flexible way to connect otherwise absurdly different applications (See Sun's OpenESB and JBI standard).
The hatred for XML, I think, stems from frequent, ugly misuse. Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong. Just because it's ASCII doesn't mean it's human-compatible.
LaTeX is restricted to certain types of print output. It emphatically cannot output HTML easily. Just look at the umpteen thousand threads on comp.text.tex where someone complains that
Yay! Nothing like the combination of XML and Java to bring out the haters. Incompetent use of a language/API doesn't equate to a bad language/API. I can show you plenty of crappy C/C++ code freely browsable in some open source libraries. Does that mean C++ sucks? Hell no.
My experience with Java+XML you ask? OFX servers for financial institutions. Without name dropping, check out the list of banks, brokerages, tax services, and credit card providers (Quicken) out there successfully serving up client data. I guess we're all circle jerking while you're downloading your account information into Quicken or Money.
Some good uses for XML:
Some bad uses for XML:
I have to admit, I'm clueless about your Java dependency issues. The only way I can see that ever happening is if you're dumping all of your classes into the default top-level package; and that's major user error if you are.
That should read: Just because it's UTF-8 doesn't mean it's human-compatible.
As compression schemes go, XML is probably the worst I've encountered.
How can it be that XML is 10 years old and there's STILL no industry-standard way to embed binary data in an XML document without base64 encoding? I want bits and bytes. I want small.
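For scale, a quick sketch of what base64 costs when binary data is embedded as element text: every 3 input bytes become 4 ASCII characters, roughly a 33% size increase before any XML overhead. The `blob` element and its `enc` attribute are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes(range(256)) * 3   # 768 bytes of raw binary data

# Embed the bytes as base64 text inside a (hypothetical) <blob> element.
elem = ET.Element("blob", {"enc": "base64"})
elem.text = base64.b64encode(payload).decode("ascii")
doc = ET.tostring(elem, encoding="unicode")

print(len(payload), len(elem.text))   # 768 1024: 4 characters per 3 bytes
assert base64.b64decode(elem.text) == payload
```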
XML doesn't seem like a big deal. SGML has been around since the mid-80s, making it over 20 years old. XML is stricter in many ways and layers some useful concepts on top of SGML, but otherwise it has many of the same uses and much the same syntax as SGML itself.
As a side note, I dislike it when people use XML inappropriately, like using XML-RPC when something based on ASN.1 might be more appropriate. (How many wannabe MMORPG projects have I read about that are "XML-RPC" based? Too many.) I'm sure there are good uses for XML, but a lot of people out there apparently aren't aware that there are also bad uses for XML.
I'll take an Ant XML build file over an "is that a tab or a space" Makefile any day...
I use it in web development constantly, and have for about 8 years. It's great for documents mostly, since it's much easier to process than a home-grown setup.
You want to transform the document, you can use any of a number of techniques, and trivially guarantee that the resulting document is at least syntactically valid. If you use a home-grown format (or HTML), you'll need to resort to regular expressions or a custom parser, which works fine up to a point. Regexes are error-prone (it's quite difficult, for instance, to make an untrusted HTML document safe with regexes), and parsing is difficult and doesn't solve the transformation step very elegantly, whereas XPath and others are absolutely brilliant for quickly distilling the stuff you need from a document.
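A small sketch of the XPath point using Python's built-in ElementTree, which supports only a limited XPath subset (descendant search, attribute predicates, child steps); full XPath 1.0 needs a third-party library such as lxml. The catalog document is invented.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <item type="book"><title>Angle Brackets 101</title></item>
  <item type="cd"><title>Greatest Hits</title></item>
  <item type="book"><title>Parsing for Fun</title></item>
</catalog>"""

root = ET.fromstring(CATALOG)

# ".//item" finds item elements at any depth, "[@type='book']" filters on an
# attribute value, and "/title" steps down to the child element.
book_titles = [t.text for t in root.findall(".//item[@type='book']/title")]
print(book_titles)  # ['Angle Brackets 101', 'Parsing for Fun']
```

One path expression replaces what would otherwise be a fragile regex or a hand-written walk over the markup.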
But on the parsing side... take a look at ANTLR, it's just great
In general, if you have data to be structured and serialized, XML is one way to do it. If you think XML is a poor choice, then could you suggest an alternative? Incidentally, that suggestion should not imply that everyone reinvent their own formats (again).
Would you provide evidence aside from personal anecdotes, and possibly consider evidence to the contrary?
Perhaps you meant “modern software” instead. Any complex application these days relies on dozens of libraries and services to perform tasks. Not quite sure where exactly you are having difficulties, so I cannot elaborate further.
XML is intended for consumption by machines first, people second. You might also argue that in-memory data structures are ugly and unreadable.
Incompetent use of a language/API doesn't equate to a bad language/API
No, but incompetent design of a language/API does, in fact, equate to a bad language/API.
OFX servers for financial institutions. Without name dropping, check out the list of banks, brokerages, tax services, and credit card providers (Quicken) out there successfully serving up client data.
I'm aware of OFX, and it is something I consider a non-evil use of XML. It is all about the data, and the data is high-volume, structured and text-like, so something like XML makes sense for representing it.
OTOH, name dropping gets nowhere with me. Large institutions routinely adopt very stupid technologies for the most ridiculous of reasons. I'm much more interested in what a small, nimble high-tech company like Automated Trading Desk is doing than what Chase-Manhattan is doing. Of course, ATD appears to have gone to an all-Flash homepage, which is an impressive level of stupidity, so maybe they've gotten all grown up now.
> I have to admit, I'm clueless about your Java dependency issues. The only way I can see that ever happening is if you're dumping all of your classes into the default top-level package; and that's major user error if you are.
I do a build with Maven and it pulls down at least 20 different Java libraries and packages them all up with my program, even for the most innocent of dependencies. Not only that, but when something is deployed, it tends to get deployed with all of its dependencies. There's no sense of a standard place to put libraries, or of making sure you don't have 20 different versions of a library lying around for the 10 different apps that use it. It's a nightmare.
And when I complain to Java people they tend to tell me "Oh, enterprises like it that way, it means they can stay in crufty code land forever and never have to upgrade anything if they don't want to!" which I read as "We don't really want to actually spend any time trying to make our development process vaguely reasonable, we just want to toss code on the wall and wait for things to stick.". It's pathetic and makes for intolerable integration issues for larger projects. I guess it all fits with the idea of Java being for programmers who don't actually want to think about the code they write.
That's because OFX IS A DEFINED STANDARD - a standard driven by Intuit. I guess you're too young to remember NPC - a competing standard? Or having to support BOTH? Oh yeah... that was great fun.
You tell me: what is a standard in Ant? Nice job taking his comments out of context.
Java is clearly moving away from the massive over-use of XML in everything from configuration to messaging. From Java 5 onwards, annotations are rapidly becoming the configuration mechanism of choice, where infrastructure configuration is placed in the source code directly, in a way that's significantly less obtrusive than writing code to manage things like persistence and transactions yourself, and significantly easier to follow than placing it in many XML files. Anyone who has migrated from EJB 2.1 to 3.0, for example, should be much happier now that the various XML files needed to get it to run are going the way of the dodo. This use of annotations to replace XML is an emerging trend popular in many frameworks, from EE 5 through to Hibernate and Spring. On the messaging side there are a slew of code generation tools and XML-to-POJO (annotation-based) mappings that keep you away from raw XML. Yes, it's another layer of abstraction, but it keeps you away from the coding horrors of SAX, DOM, and yes, even the comparative simplicity of JDOM.
I thought that was caused by people adding comments boxes to webpages? You don't need XML to do web 2.0 type stuff :o
which is totally what she said
Java and XML are similar in that both of them got over-hyped. They're also similar in that sometimes they really are the right solution -- just not as often as PHBs seem to think. I've had exactly one application where I started designing the file format, and realized, "Oh heck, I'm reinventing XML," so I went with XML and it was the right choice. For config files, the advantage I can see is that although XML may not be optimal for every type of config file, it does provide an alternative to the traditional Unix philosophy of having a different, goofy syntax for every single program's config file.
Re Java, what was really a disaster, in hindsight, was applets. They were overhyped, the CPUs weren't fast enough to give acceptable performance, the VM and its libraries are still too huge to give attractive startup times, AWT was a botch and had to be replaced, and implementations of browser plugins still suck -- in fact, my browser crashes every single goddamn time I visit this applet. Because Sun blew it so bad with applets (with a little help from MS), we've ended up instead with the de facto standard being Flash, which is basically a totally proprietary system. (Yeah, I know about Gnash, Haxe, etc. Let me know when you can buy a Flash book and make the examples work using a totally open-source software stack.)
There needs to be some description of an XML lite.
For config files and such.
- No doctype needed
- tags are case insensitive
- Can do comments with # character instead of <!-- -->
- Etc
Will someone tell me if this is stupid, smart, or both?
http://www.syntaxerr.org/~daniell/sss.html
Try writing a regex for parsing documents consisting of arbitrarily deeply nested elements. Say, documents of the form
<x><x><x><x>...</x></x></x></x>
See?
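To make the point concrete, a sketch in Python: a regex can check "some opening tags followed by some closing tags", but not that the counts match, so it happily accepts an unbalanced document, while a real XML parser tracks the nesting. The helper names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

def nested(depth):
    """Build <x><x>...</x></x> with the given nesting depth."""
    return "<x>" * depth + "</x>" * depth

# Matches one or more opens followed by one or more closes, but cannot
# require the two counts to be equal (that needs unbounded memory).
naive = re.compile(r"(?:<x>)+(?:</x>)+")

unbalanced = "<x><x><x></x>"
print(bool(naive.fullmatch(unbalanced)))  # True: the regex is fooled

def depth_of(doc):
    """Parse with a real XML parser and measure the nesting depth."""
    def walk(elem):
        return 1 + max((walk(child) for child in elem), default=0)
    return walk(ET.fromstring(doc))

print(depth_of(nested(50)))  # 50

try:
    ET.fromstring(unbalanced)
except ET.ParseError as err:
    print("rejected:", err)
```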
HAND.
TFA is a fun read. Too bad XML sucks. As Jérôme Siméon and Philip Wadler write, "[T]he essence of XML is this: the problem it solves is not hard, and it does not solve the problem well."
Lisp had the same problem solved 40 years earlier. While a lot of people find S-expressions verbose, XML is quite a bit more verbose. Slava Akhmechet has a nice essay on the relationship between the two notations.
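To illustrate how directly the two notations map onto each other, a minimal sketch that converts an ElementTree element into an S-expression. The `(@ ...)` attribute list loosely follows the SXML convention; mixed content (tail text) and namespaces are ignored for brevity, so this is not a complete mapping.

```python
import xml.etree.ElementTree as ET

def q(s):
    """Quote a string S-expression style, escaping backslashes and quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sexp(elem):
    """Convert an element (tag, attributes, leading text, children) to an S-expression."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f"({k} {q(v)})" for k, v in sorted(elem.attrib.items()))
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():
        parts.append(q(elem.text.strip()))
    parts.extend(to_sexp(child) for child in elem)  # tail text is ignored
    return "(" + " ".join(parts) + ")"

doc = ET.fromstring("<Greetings><Birthday>Happy</Birthday><Who>XML</Who></Greetings>")
print(to_sexp(doc))  # (Greetings (Birthday "Happy") (Who "XML"))
```

The S-expression drops every closing tag name, which is where much of XML's extra verbosity comes from.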
Here is another obvious rule: if a computer, at any time at all, has to parse or generate XML in large amounts, you are doing it wrong. There is really no need to resend the same string 100000 times, encode multi-megabyte binary data as BASE64, or lose floating point precision by encoding to or from strings. If need be, an efficient binary format can represent the data with an arbitrary schema. Communicating parties can exchange their schemas at runtime and avoid sending attributes that the other end is not going to use.
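A small sketch of the precision point: a short text format like `%g` silently drops bits from a double, while an 8-byte binary encoding round-trips exactly. To be fair, text need not lose precision: Python's `repr` produces the shortest string that round-trips, so the loss here comes from the chosen formatting rather than from text itself.

```python
import struct

x = 0.1 + 0.2  # actually 0.30000000000000004 as an IEEE 754 double

# Text round trip through a short general-purpose format: bits are lost.
as_text = "%g" % x                  # '0.3'
from_text = float(as_text)

# Binary round trip through an 8-byte little-endian double: exact.
as_bytes = struct.pack("<d", x)
from_bytes = struct.unpack("<d", as_bytes)[0]

print(from_text == x, from_bytes == x, len(as_bytes))  # False True 8
```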
I don't know about Thrift being a real contender in the web/internet-based services area. Really, code generation? How 80's. Haven't we learned enough from Sun RPC that this is a PITA, give me a proper library dammit! And AFAIK D-Bus is for local IPC, good luck sending messages over a network without a couple of hoops to jump through.
I can see your viewpoint, if you want to squeeze as much performance out of your application you might want to investigate Thrift, D-Bus or simply write your own TCP protocol. It's not rocket science. However in a world where companies expect to exchange data and organizations want to link databases from different vendors together, I'd rather have a poor-but-workable standard than none at all.
SOAP is a complete pile of bloatware. It puts OpenOffice to shame on this front. However, I'd rather have a nice Python library that lets me throw around objects and gets the job done than a 50% performance improvement and a lot of extra work. Most of the time it's simply not worth it; that's premature optimization.
Having said that, I prefer XML-RPC and REST-style interfaces. The simpler, the better.
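For a sense of what an XML-RPC call looks like on the wire, a sketch using Python's standard `xmlrpc.client` module to serialize and deserialize a call without any network traffic. The method name `add` is hypothetical.

```python
import xmlrpc.client

# Serialize a call to a hypothetical remote method "add" with two integers.
request = xmlrpc.client.dumps((2, 3), methodname="add")
print(request)

# The receiving side turns the XML back into Python values.
params, method = xmlrpc.client.loads(request)
print(method, params)  # add (2, 3)
```

Two small integers produce a couple of hundred bytes of markup, which is the trade-off being discussed: simple and self-describing, but not compact.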
I like XML as a data format, but I am sick and tired of lazy people using it as a programming language format. If you want to design your own language, do it properly, and don't drown other folks in angle brackets, double quotes, entity train wrecks and so on.
Only maybe XSLT gets a pass on this, even though XSLT is a godawful horrible mess.
...for 3 years of optimism... ...followed by 7 years of bloated files, bandwidth increases, unenforced constraints (hell, *undescribed*, *non-existent* constraints), duplicate "unique IDs", unreadable "human readable" documents, unenforced constraints, ambiguous schema, confusion between syntax and structure, wasted stack space, stupid whitespace issues, stupid encoding issues, infinite numbers of documents representing the same data, setting data management theory back about 40 years (hierarchical, text-only), and angle brackets. Lots of fucking angle brackets.
XML: lets incompetent people feel smarter by making their tools more limited.
PS: I had an issue last week parsing a 4GB XML file. I solved it with "grep". Isn't that funny?
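Whether or not grep is the right tool, the usual way to handle a multi-gigabyte XML file with a parser is to stream it instead of building the whole tree. A minimal sketch with ElementTree's `iterparse`, assuming a flat file of hypothetical `<record>` elements under one root; clearing the root after each record keeps memory roughly flat.

```python
import io
import xml.etree.ElementTree as ET

def count_records(source, tag="record"):
    """Stream an XML file and count <tag> elements without building the full tree.

    Assumes the records are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)          # the first event is the root's "start"
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            root.clear()             # discard records that are already processed
    return count

# With a real file this would be count_records("huge.xml") (hypothetical path).
sample = io.BytesIO(b"<log>" + b"<record>x</record>" * 3 + b"</log>")
print(count_records(sample))  # 3
```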
Also, for tree-like structures, XML/XQuery databases can often beat relational (once you start getting into 10+ joins in the latter, that is). Of course good XML databases don't really store XML in text; they merely use the XML Infoset as their data model. Still, XQuery is pretty convenient, and much more readable than SQL.
Does anyone still use latex2html? All of the TeX users I know who care about HTML output switched to tex4ht years ago. It produces a variety of XML formats, including XHTML (with MathML) and OpenDocument.
What you want is called a balancing group. The .NET flavor of regexes has the ability to parse those (Depth keyword). See here.
Of course, by definition, an arbitrarily nested structure is not "regular", but regexes have been adapted to do all sorts of things that really fall into the realm of what a CFG should do.
mod -1 we-know-what-he-meant
-1 not first post
What was I doing 10 years ago? Well, I was kissing girls and such (being 10 years old then). I wish I could say the same now.
One computer storing temporary data? XML is worthless. A computer storing data for use on said same computer? XML brings little to the table.
One computer program writing something that a different computer program will read from a file system at a later date? Look at XML. If you save a non-trivial amount of processor or developer time, go with it.
And let's ignore the fact that AJAX really doesn't work without XML, shall we? Because that kind of defeats the original whiny argument.
Because the only thing that's more scary and complex than the overly-complicated RDF we have today is the under-planned, overly-extended JSON and YAML that we'll have five years from now, whose original form is twisted and contorted beyond recognition in an attempt to make it do things in the future that XML was designed to do from the get-go.
> Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong.
Amen! Which is why I absolutely HATE HATE HATE XML config files. Because they aren't human readable and editing one is an invitation to disaster. There are no editors so XML is only useful for apps to communicate with each other. And there are equally useful ways for that to be implemented.
Seriously, there is no editor. I'm told you can buy them for Windows if you spend insane quantities of cash, but I don't do Windows. Conglomerate claims to be working toward the ability to edit XML for *NIX, but I only tried it once. I installed an RPM and fed it a Fedora comps.xml file... and waited. Until the OOM killer put it out of my misery.
what happened with DBASE and its kin. It's easy enough to use that any idiot can...and you end up with schema that reflect that idiocy.
XML isn't the problem. Idiots writing XML is. I'm beginning to think that a certain level of difficulty is necessary as a screening device.
The comment boxes are part of Web 1.0, but the RSS feed to those comments is Web 2.0. Web 2.0 defines the machine readable web. Documents designed for computers instead of humans.
While there are some people using technologies like JSON and YAML, for the most part you do need XML for Web 2.0 stuff.
That's Maven's job. It's supposed to get all the JARs you tell it to.
It's not Maven's job to figure out if you actually use a JAR (which gets complicated when code depends on JAR A, which depends on JAR B, which....).
The usual way to handle something like this is to use Maven to keep things up to date on your machine. You can deploy all those JARs with your program (as you seem to be doing) or you can keep them somewhere else on the server and update them manually. Maven makes sure you have the requisite stuff when you check out someone else's project, and once you put it on the server that code is already there in the classpath, so you don't have to upload all those JARs. If you open other projects all the time, having Maven pull random JARs for you can be a real plus compared to hunting them down yourself.
That said, I'm not a Maven fan. Maybe I'm just too old-fashioned. Maybe it's because I don't know the tool very well.
I could say about your "shoehorn everything into Maven and hope it works" mentality.
That was just the standard trivial example -- it stands to reason that some people have hacked around it since it's such a common practical limitation. There are also other examples, say, anything requiring arbitrary amounts of (token) lookahead to resolve ambiguities.
I don't know about Thrift being a real contender in the web/internet-based services area. Really, code generation? How 80's. Haven't we learned enough from Sun RPC that this is a PITA, give me a proper library dammit! And AFAIK D-Bus is for local IPC, good luck sending messages over a network without a couple of hoops to jump through.
The environment has changed. Dynamic languages allow the code generation to be done at runtime. I think Thrift has a good chance of succeeding in this sort of environment. Of course, IMHO, in order for that to really come into its own, Thrift must insist that any Thrift service support a standard API that allows downloading the API description.
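A rough sketch of what "code generation at runtime" can look like in a dynamic language: a client class is built on the fly from a published API description. The description format, the JSON request shape and `fake_send` are all hypothetical stand-ins; this is not Thrift's actual API or wire format.

```python
import json

# A hypothetical, minimal API description such as a service might publish.
API = {
    "service": "Calculator",
    "methods": {"add": ["a", "b"], "negate": ["x"]},
}

def make_client(description, send):
    """Build a client class at runtime from an API description.

    `send` is any callable that takes a serialized request and returns a
    result; here it stands in for the network transport.
    """
    def make_method(name, params):
        def method(self, *args):
            if len(args) != len(params):
                raise TypeError(f"{name}() expects {len(params)} arguments")
            request = json.dumps({"method": name, "params": dict(zip(params, args))})
            return send(request)
        method.__name__ = name
        return method

    namespace = {name: make_method(name, params)
                 for name, params in description["methods"].items()}
    return type(description["service"] + "Client", (), namespace)

def fake_send(request):
    """Hypothetical in-process 'server' used instead of a real transport."""
    msg = json.loads(request)
    p = msg["params"]
    return p["a"] + p["b"] if msg["method"] == "add" else -p["x"]

client = make_client(API, fake_send)()
print(client.add(2, 3), client.negate(4))  # 5 -4
```

No generated source files ever touch disk, which is the difference from the Sun RPC style of code generation complained about above.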
I too prefer REST-style interfaces. I prefer technologies that encourage things to be done this way. RPC technologies almost universally try to make things 'easy' by making network messages look like function calls. And I think this is all the wrong approach for a variety of reasons, one of which is that it tends to lead to very non-RESTy interfaces.
"Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong."
I hate to break it to you, but that was one of the /points/ of XML: the human read/editability. In fact, if that is /not/ the point of XML, there is no reason to use it at all; you would just use some binary format.
So you're the guy who shits tabs in random places in source files, because you haven't figured out how to set up your editor to show you the difference. Please stop doing that. Tabs and spaces are different characters, even if the language you're using today treats them the same. If you're a VIM user, please learn to use "list" and "listchars."
I mourn the day XML was born and I would put out a bounty for the heads of anybody related to it (along with millions of suffering developers).
I long for the day something simpler, cleaner and prettier is created, although pretty much anything you can come up with will do (except Perl, of course).
btw: Why the hell are we still using HTML? Can somebody please come up with a better markup language for the new generation of browsers instead of patching and complicating HTML even more?
HTML is obsolete. It's time for a new, simpler and richer markup language.
Amen to that!
...
For very large systems it works well that way. You don't want to have to retest every module due to a library upgrade in one; the ideal situation is where you can unit test the module and its interface and just know that the rest of the application works based on that. With a J2EE container you can easily deploy several related but abstracted services that use different versions of their libraries. Once written, a module can be left untouched as long as it meets your needs.
RFC 4180
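Python's `csv` module is one implementation whose default "excel" dialect closely matches RFC 4180's rules (comma delimiter, CRLF line endings, fields containing commas or quotes wrapped in double quotes, embedded quotes doubled). A quick sketch of a round trip, with made-up rows:

```python
import csv
import io

rows = [["id", "comment"], ["1", 'says "hi", then leaves']]

buf = io.StringIO()
csv.writer(buf).writerows(rows)   # default "excel" dialect: CRLF line endings
text = buf.getvalue()
print(repr(text))
# 'id,comment\r\n1,"says ""hi"", then leaves"\r\n'

parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows
```

Whether the tool on the other end of the interface follows the same rules is, as the reply points out, another matter.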
I agree with most of what you wrote, but this assertion is just incorrect. Plenty of "AJAX" systems use non-XML formats to ship data around. One obvious alternative is JSON, but others exist too.
(Unless you're talking about "AJAX using XML" in the sense of "AJAX manipulating the DOM", but that's not really accurate either, since most sites don't provide well-formed XML as output and they still use AJAX techniques just fine.)
It's an informational guideline on what MIME data of type text/csv should contain, and it's ignored by the majority of CSV implementations.
Well, it is convenient for the developer, rather than the end user, to be able to read the stream. But I agree, there should be a standard binary format for proven applications. I believe standards have been developed for binary XML, but nothing in widespread use. Also, because of the structured format, XML is incredibly compressible, and I use xmill to save my XML data files in a few percent of their expanded size.
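A quick way to see how compressible repetitive XML is, using general-purpose zlib rather than XMill (which is a specialized XML compressor). The records are synthetic, so the ratio printed is only indicative; real documents will vary.

```python
import zlib

# Synthetic, highly regular XML: the same tag names repeat in every record.
records = "".join(
    f"<record><id>{i}</id><status>active</status></record>" for i in range(1000)
)
raw = f"<records>{records}</records>".encode("utf-8")
packed = zlib.compress(raw, 9)

print(len(raw), len(packed))
print(f"{len(packed) / len(raw):.1%} of original size")
```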
Funny person. Of course there are editors, but then the human is editing the content, not the XML. That's fine, if the structure is sufficiently complex. But how often are you searching for one bloody key-value pair in a config file that's 10 times longer than it needs to be, when a properties file would do just fine?
"All the time. Its not that hard. Also, if you're worried about such things as quoting, etc., you can always use fixed-width fields - makes indexing, looking up, and modifying values REAL FAST. Compare that to the mess of xml." I know, I use 255 chars al
Or you can just use a print stylesheet like you're supposed to. You know, that thing that browsers support by default?
The fact that XML is often difficult (sometimes impossible) for humans to read and manipulate is a failure of XML to meet its design goals.
CSS print stylesheets have many limitations. And as far as I know, it's impossible to print (X)HTML with appropriate alignment and hyphenation based on language, which with transforming XML to LaTeX or FOP is easy to get.
Oops, that should read "...to LaTeX or XSL:FO is easy to get".
Firstly it's easy to never make the tab/space error. I've used make heavily for years and I don't make that error. What's wrong with your tools?
Secondly Ant and make aren't even comparable in power or capabilities. Ant files are large, hard to read, and are the "training wheels" version of makefiles: There's so much you just cannot do, or cannot do with similar ease.
The Makefile is an exemplary manifestation of the UNIX philosophy: a concise, powerful DSL that does one thing well.
(OTOH, at my day job we do build our Java applications with Ant but are not afraid to use make where it makes sense (haha).)
If you can't learn to use make, maybe you should stick to Visual Studio...
It Depends. We have systems that are arranged in a long content chain. One machine sends data to the next machine, maybe by pull, maybe by push. Next machine does ... something ... with it, passes it to next machines. Maybe the developers talk to each other, or remember why their predecessor made the system do that, or maybe they don't. XML is really Just The Thing for the job. And the fact that it can be tweaked by a human (e.g. the sysadmin who has to fix a broken thing) is fantastically useful.
Just because it's ASCII doesn't mean it's human-compatible.
If it's not supposed to be human-compatible, then why is it ASCII?
The answer to one particular parsing stupidity is not to introduce an altogether different set of parsing stupidities to fix it. XML is not a programming language, and making it into one is a pretty distressing and contorted thing to do.
Is there a real difference between the two when you get right down to it?
I'll take a Makefile (whose biggest problem can be solved with find-and-replace in any text editor) over an Ant XML build file any day.
I could go off on the problems with Ant for pages, but instead, I'll just point out that the problem you have with Make has a corresponding problem in Ant. What happens when somebody uses Latin-1 smart-quotes in an Ant buildfile? (I've seen this exactly as many times as I've seen tabs in Makefiles.)
Answer: If our BuildBot ever fails due to spaces in the Makefile, we can all see right away who did it, and ask him to fix his editor. Your editor is screwed up, so fix it. We could do exactly the same with Ant and smart-quotes.
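The "see right away who did it" check can be automated as a small pre-commit or BuildBot step. This is a minimal, hypothetical sketch, not a real BuildBot plugin: check_makefile uses a rough heuristic (a space-indented line after a rule header is treated as a broken recipe line), and check_smart_quotes flags curly quote characters in an already-decoded buildfile. Both function names are made up for illustration.

```python
# Curly quotes that word processors substitute for plain ASCII quotes.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"


def check_makefile(text):
    """Return 1-based line numbers of recipe lines indented with spaces instead of a tab.

    Heuristic only: an unindented line containing ':' (but not ':=' and not a comment)
    is treated as a rule header that opens a recipe block.
    """
    bad = []
    in_rule = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines neither open nor close a recipe here
        if line[0] == "\t":
            continue  # correctly tab-indented recipe line
        if line[0] == " ":
            if in_rule:
                bad.append(lineno)  # space-indented line inside a recipe
            continue
        in_rule = ":" in line and ":=" not in line and not line.startswith("#")
    return bad


def check_smart_quotes(text):
    """Return 1-based line numbers of lines containing curly quotes (e.g. in an Ant build.xml)."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if any(q in line for q in SMART_QUOTES)]
```

Either check failing would point at the offending line, so the conversation becomes "fix your editor" rather than "what broke the build?".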
Ant and Make are equal on bad characters. Make is far easier to read and write, and more powerful. End of story, as far as I'm concerned.
You make a good point. I know I've tried (and failed) to make a "good enough" XML parser in the past...
Is there anything like an Acid test for XML? Some XML document (or set of) with a bunch of pitfalls that you can test against?
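The W3C does maintain an XML Conformance Test Suite that serves roughly this purpose. As a much smaller illustration, the hypothetical harness below feeds a handful of classic well-formedness pitfalls to Python's xml.etree.ElementTree (which is built on Expat) and reports any case where the parser's verdict differs from the expected one. The case list is an assumption chosen for illustration, not an official test set.

```python
import xml.etree.ElementTree as ET

# (description, document, expected to be well-formed?)
CASES = [
    ("escaped ampersand", "<a>&amp;</a>", True),
    ("character reference", "<a>&#x41;</a>", True),
    ("CDATA hides markup", "<a><![CDATA[<not-a-tag>]]></a>", True),
    ("HTML entity without a DTD", "<a>&nbsp;</a>", False),
    ("bare ampersand", "<a>AT&T</a>", False),
    ("mismatched nesting", "<a><b></a></b>", False),
    ("duplicate attribute", '<a x="1" x="2"/>', False),
    ("unquoted attribute", "<a x=1/>", False),
    ("two root elements", "<a/><b/>", False),
]


def is_well_formed(doc):
    """True if ElementTree parses the document without a ParseError."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


def run_suite(cases=CASES):
    """Return descriptions of the cases where the parser disagreed with the expectation."""
    return [desc for desc, doc, expected in cases if is_well_formed(doc) != expected]
```

A homegrown "good enough" parser would be plugged in place of is_well_formed; an empty result from run_suite means it at least survives these traps.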
Almost the whole ruddy point.
Personally I'm certain somebody could take a Big step back, say stuff all backwards compatibility with SGML (and hence XML), and do all the (worthwhile) things that XML does more simply and in a lot fewer bytes.
nice post.
Some drink at the fountain of knowledge. Others just gargle.
Personally I disagree with you. XML *is* good at exactly two things:
1) Object serialization format with transformation possibilities, as long as you don't mind the verbosity
2) Interfaces using the above benefits between programs.
To be fair you are basically talking about using XML as a serialization format for hypertext and then transforming to other formats, but in the end it suffers from:
1) Verbosity (like all SGML dialects) compared to something like LaTeX (which is, I believe, better at multi-format document maintenance). Verbosity is an issue because if you have a human editing it, this increases the likely error rate.
2) People think of XML as an information storage device (out to replace the RDBMS). This is just wrong.
The further you get from the two uses I outlined earlier, the worse XML does...
LedgerSMB: Open source Accounting/ERP
I agree that it depends on what you are doing. But reading/writing XML just for a 1-app format makes no sense. Nor does it matter that the data is going to be read later. In that case, I would suggest an RDBMS for many sorts of things.
XML is very good as an object serialization format when you need the ability to transform the object model into that used by another application. So XML in your application would only be a good idea if:
1) Application A was writing files for application B to process
or
2) Application A was trying to write data directly to application B's interfaces.
Beyond that, XML is worthless.
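Case 1 above (application A writes files for application B) might look like the following minimal sketch. Invoice, Receivable, serialize_a and load_into_b are all hypothetical names; the point is only that the XML document sits between two object models that disagree on field names and units.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Invoice:  # application A's model (hypothetical)
    number: str
    customer: str
    total_cents: int


@dataclass
class Receivable:  # application B's model (hypothetical)
    ref: str
    party: str
    amount: float


def serialize_a(inv):
    """Application A writes its object out as an XML string."""
    root = ET.Element("invoice", number=inv.number)
    ET.SubElement(root, "customer").text = inv.customer
    ET.SubElement(root, "total", currency="USD").text = str(inv.total_cents)
    return ET.tostring(root, encoding="unicode")


def load_into_b(xml_text):
    """Application B maps the same document onto its own, differently shaped model."""
    root = ET.fromstring(xml_text)
    return Receivable(
        ref=root.get("number"),
        party=root.findtext("customer"),
        amount=int(root.findtext("total")) / 100,  # cents -> currency units
    )
```

Neither application needs to know the other's classes; they only have to agree on the document.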
From other comments here, I'd say the consistency is a biggie, but I'm going to guess all three problems exist somewhere in the range of possibilities, or special-case solutions would never have got out the door.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
It depends on what development environment you are in as to whether XML is efficient to parse.
In Perl (and probably C/C++) I would think that the verbosity of the format would be a limiting feature but that this would not be too bad. I wouldn't think that processor time would be saved by moving to a more terse format, but I/O time might be...
In Java, the fact that the language does not efficiently handle text strings is a major limiting factor and the verbosity only makes this worse. Hence XML and Java is one combination I would try to avoid... I would think that a custom binary serialization format in Java would be *way* faster and use maybe 20% of the memory that an XML format would use in the parsing stage.
The major advantage of XML is that it is a useful language for interchanging structured data between applications, i.e. one application can serialize its data into a form which can be transformed into the object model of the other application. However, it still trades efficiency for human readability and for being based on an older standard (SGML). In other words, it is accepted as the method of choice for such interchange, is human readable, and is reasonably familiar, but it is inefficient.
And then you're left with software with all kinds of weird little glitches because someone fixed a problem in a library and nobody ever bothers to upgrade to the newer version. Or somebody uses one version of a library to build a data structure or update a database and somebody else uses a different version and they get all confused about what the data really is.
Either you publish interfaces that are not based on any programming language at all and stick to those or you upgrade your libraries. Having a whole ton of different versions of various libraries wandering around your organization seems a recipe for disaster.
My concern about Java and XML has to do with the way Java internally represents text. Yes, I know it is popular with buzzword-driven businesses and those businesses with historic ties to Sun, but it is also grossly inefficient and, as you say, something to be used as sparingly as possible in that environment.
The .ini file format was actually good because it was simple and didn't raise the semantic issues that XML does. However, I have also seen people use .ini files where XML would have been more appropriate (I saw someone try to do arbitrary-depth menus using a .ini file).
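The difference between the two formats shows up as soon as nesting is involved. In this minimal sketch (the config contents, element names and helper functions are hypothetical), Python's configparser reads a flat .ini section comfortably, while an arbitrary-depth menu is a natural fit for nested XML elements and a short recursive walk.

```python
import configparser
import xml.etree.ElementTree as ET

FLAT_INI = """
[window]
width = 800
height = 600
"""

MENU_XML = """
<menu label="File">
  <item label="Open"/>
  <menu label="Recent">
    <item label="a.txt"/>
    <menu label="Older">
      <item label="b.txt"/>
    </menu>
  </menu>
</menu>
"""


def read_window_size(text):
    """Flat key/value settings: exactly what .ini files are good at."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    return cp.getint("window", "width"), cp.getint("window", "height")


def menu_paths(elem, prefix=()):
    """Yield the full label path of every <item>, at any nesting depth."""
    path = prefix + (elem.get("label"),)
    if elem.tag == "item":
        yield " > ".join(path)
    for child in elem:
        yield from menu_paths(child, path)


def leaf_paths(xml_text):
    return list(menu_paths(ET.fromstring(xml_text.strip())))
```

Expressing the same menu in .ini would mean inventing a naming scheme such as dotted section names, which is the kind of contortion the comment describes.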
I am not sure about config files, however. I think the
"So you're the guy who shits tabs in random places in source files, because you haven't figured out how to set up your editor to show you the difference."
OMG there's more than ONE of them??? I've got the same problem at work - a guy who uses windows and the MOUSE to cut-n-paste c code. NOTHING lines up.
If this keeps on, I'm going back to assembler. At least it's clean-looking, and I have yet to see anyone who writes assembler f$ck up the formatting TOO badly! (And no holy brace wars ...)
And yes, I'm serious about assembler; I've been playing around with it this weekend for the first time in 15 years. For some things, it's just so much easier than C.
Wait, let me be sure I read this correctly...
simple XSL stylesheets
Wow. No, I wasn't just imagining it.
Maury
Yay! Nothing like the combination of XML and Java to bring out the haters
The word is critics not 'haters'. I'm guessing you're in your late teens or early 20's by your use of such pathetic slang.
I can show you plenty of crappy C/C++ code freely browsable in some open source libraries. Does that mean C++ sucks?
It sure does when it becomes the standard.
Some good uses for XML:
* Ephemeral representations of atomic, structured data; usually for transport.
* Config files. More verbose and the syntax is far better at keeping you from fat fingering a setting and blowing up your app. If you can't clearly read XML, you need glasses.
Using XML for transport is laughable and is a bad use for XML. Binary transport is much more efficient, and doesn't require the time or complexity of a modern parser. If the content is human readable, the binary will also be human readable. If not, you don't waste cycles converting back and forth just so a lazy, incompetent programmer has an easier time debugging. A good programmer has no trouble printing a binary value from any decent debugger.
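To make the size argument concrete, here is a minimal sketch with a hypothetical record (an unsigned 32-bit id, a 64-bit float reading, and unsigned 16-bit flags) encoded both ways. It only compares encoded sizes; it is not a parsing-speed benchmark. It also shows the trade-off other commenters raise: the binary form is only meaningful if both sides agree on the exact layout.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical fixed layout: little-endian, no padding ("<" disables alignment),
# unsigned 32-bit id, 64-bit float reading, unsigned 16-bit flags = 14 bytes.
RECORD = struct.Struct("<IdH")


def to_binary(rec_id, reading, flags):
    return RECORD.pack(rec_id, reading, flags)


def from_binary(data):
    return RECORD.unpack(data)


def to_xml(rec_id, reading, flags):
    elem = ET.Element("record", id=str(rec_id), reading=repr(reading), flags=str(flags))
    return ET.tostring(elem)  # bytes, self-describing but larger


def from_xml(data):
    elem = ET.fromstring(data)
    return int(elem.get("id")), float(elem.get("reading")), int(elem.get("flags"))
```

Both encodings round-trip the same values; the XML one costs several times the bytes but can be read without knowing the struct format string.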
Now config files. A good portion of the code I have to deal with every day (probably 20%) is in goddamn XML config, and thanks to the brilliance of Java-style Aspect Oriented Programming, even infrastructure-level code intercepts are now XML, resulting in a fine mess when you try to trace anything. Unlike printing a binary value, tracing through layers of XML is not easy. What's worse, it ruins your type checking, and config errors come up at runtime instead of compile time.
These posts express my own personal views, not those of my employer
It would be great if someone came up with a way to attach a little bit of executable to the data/reference that could activate an object within an application. That way, you would merely access the data, and it would appear like an active object inside your program, or OS if appropriate.
Thank you. It's good to occasionally run into a Java programmer that realizes just how bad it's gotten. It's all being driven by consultancies selling their own brand of programming religion....and like a cult it makes me sick how many intelligent people fall for these methodologies and frameworks hook, line, and sinker.
There is nothing simple about XSLT. It is a nutty and extreme idea. Unless your HTML and XML are so incredibly simple as to render the format duality useless, the stylesheets start reading like gibberish.
XML not good for unbounded streams?
Why? You must be confusing XML with DOM. In fact, XML!=DOM.
Expat works just fine for streams. The folks at Sun/Apache may have found an infinite number of ways to wrap Expat so that it no longer works with streams, but that doesn't mean YOU have to be a lemming and use whatever crap is most readily available. Simply wrap Expat using JNI. You get the speed of C in your comfortable little pointerless Java womb.
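The streaming point is easy to demonstrate without JNI: Python's standard xml.parsers.expat module is itself a binding to Expat, so this sketch swaps Java for Python. Chunks are fed as they arrive, even when a chunk ends in the middle of a tag, and a callback fires as each element closes, so nothing requires the whole document in memory. The element name "title" and the function stream_titles are assumptions for illustration.

```python
import xml.parsers.expat


def stream_titles(chunks, on_title):
    """Feed XML to Expat piece by piece, calling on_title(text) as each <title> closes."""
    buf = []
    in_title = False

    def start(name, attrs):
        nonlocal in_title
        if name == "title":
            in_title = True
            buf.clear()

    def end(name):
        nonlocal in_title
        if name == "title":
            in_title = False
            on_title("".join(buf))  # text may have arrived in several pieces

    def chars(data):
        if in_title:
            buf.append(data)

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start
    p.EndElementHandler = end
    p.CharacterDataHandler = chars
    for chunk in chunks:
        p.Parse(chunk, False)  # not final: more data may follow
    p.Parse(b"", True)  # signal end of document
```

For a genuinely unbounded feed, chunks could be a generator reading from a socket, and on_title would process each item immediately instead of collecting it.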
James
Beverly, MA
I wrote a custom filter for Eclipse which inserts tabs in place of any whitespace. Except when it doesn't, because we all know variety is the spice of life. It also replaces as many characters in 1iteral strings as possible with Unicode which looks the same but is different, which will teach that lazy bastard in the next cubicle why we do not use string literals as hash keys. For the finale, it rewraps long lines so that anyone editing the file and then using Eclipse's auto-format will see every long line shifted one character or token to the right, which borks diff something fierce.
I also considered replacing all ls used in literals with 1 but even I'm not that evil.
Signed,
That Guy
P.S. Who caught the 1? Yeah, like I said, evil.
Help poke pirates in the eyepatch, arr.
Save yourself a lot of trouble and use CSS @media instead.
For emergencies. So that when it all goes wrong you fire up a text editor instead of a hex editor.
Show me a large website that keeps its data in XML and I show you a slow website.
For large amounts of data you need a database, although there are now databases that have an XML datatype.
"simple XSL stylesheets" LOL
XSL is unfortunately a functional programming language done wrong.
Most XML is parsed with real programming languages and converted to some specific output format.
How would one convert XML to PDF? Obviously not with XSL-FO if you want more than some simple text (Wikipedia has a rather detailed paragraph about its drawbacks).
Moreover, the implementations are so lacking that I'll take LaTeX's quirks any time (which are not that bad at all if you don't force LaTeX to do things it just can't do).
Well, I thank $god that I don't have to mess around with binary formats generated by bad programmers. It's awful enough what they do to XML.
UnNetHack: NetHack Improved!
Last time I needed multi format output, LaTeX provided PDF, Postscript, DVI (the more or less "native" output of current LaTeX-compilers) and with minimal work HTML, Text, RTF and Palm-Doc.
That's just wrong.
TeX4ht does this with "htlatex file.tex".
Additionally it supports outputting DocBook and ODF.
You probably like python, too.
So Java EE has discovered hard coding? Isn't that wonderful! Because annotations are basically hard-coded variables for the code generator (preprocessor) to run before compilation. I used to think they were cool too, until after about a week of XDoclet I saw Rod Johnson's preview of J2EE without EJB. And then he went on to invent (or at least popularize) XML programming. Now he's pushing preprocessor directives to generate XML to generate Java code to be compiled and then have bytecode injected into the compiled code. What's next? GOTO implemented as continuations stored in an OO database (probably with a distributed associative array for caching)?