Why XML Doesn't Suck
Richard Eriksson writes "Recalling the earlier discussion on why XML sucks for programmers, Tim Bray clarifies his stance on his co-creation, XML, and gets back on his pulpit to declare that XML Doesn't Suck. He writes: 'Let's look at some of XML's chief virtues, then I'll address some of the XML-sucks arguments, in the same spirit that Sammy Sosa addresses a fastball.'"
That would be "I didn't have sexual relations with that woman" A subtle distinction. ;-)
And to stay on topic, XML sucks for some things and doesn't suck for others, just like any other technology. A hammer claw is a fine tool for removing a nail, but not as useful for removing a splinter from your finger. Less energy needs to be spent on arguing whether technologies like XML suck or not, and more energy needs to be put into studying their most practical and optimal uses.
"Times have not become more violent. They have just become more televised."
-Marilyn Manson
XML is much better than anything else in certain situations.
XML is much worse than lots of other choices in certain situations.
Why can't you see the shades of grey, instead of insisting on seeing everything in black and white?
Have fun,
Daniel
4 is a big old red herring.
The data compresses so well because it's encoded in a highly inefficient manner. Your average compression algorithm will be able to find more redundancy and give you a better % compressed, but it still won't compare with a human actually packing the data tightly together in the first place.
Or, to take a more information theory POV, there is a certain amount of information in your post, which can be compressed down X percent by default. That same information has to be encoded in the XML version, and has the additional overhead of XML to deal with, so even compressed it will always be larger than the compacted and compressed binary-only version.
XML has a lot of strengths, but compactness is not one of them.
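A rough way to check the compression argument above is to encode the same records both ways and run each through zlib. This is only a sketch: the record layout, the element names and the data are all invented here, and the exact numbers will vary with the data.

```python
import struct
import zlib

# Hypothetical data set: 1000 (id, value) records, invented for this sketch.
records = [(i, (i * 7919) % 10007) for i in range(1000)]

# Verbose XML-style encoding of the records.
xml_bytes = "".join(
    f"<record><id>{i}</id><value>{v}</value></record>" for i, v in records
).encode("utf-8")

# Hand-packed binary: two unsigned 32-bit little-endian integers per record.
bin_bytes = b"".join(struct.pack("<II", i, v) for i, v in records)

xml_packed = zlib.compress(xml_bytes, 9)
bin_packed = zlib.compress(bin_bytes, 9)


def ratio(packed, raw):
    """Compressed size as a fraction of the raw size."""
    return len(packed) / len(raw)


print("xml   :", len(xml_bytes), "->", len(xml_packed),
      f"({ratio(xml_packed, xml_bytes):.0%})")
print("binary:", len(bin_bytes), "->", len(bin_packed),
      f"({ratio(bin_packed, bin_bytes):.0%})")
```

The percentage is the red herring: compare the two absolute compressed sizes on your own data before drawing conclusions.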
I read the internet for the articles.
sorry for the tools you're stuck working with, but xml as a language/specification is agreed upon. it's in the vendor's implementations where YMMV. i haven't worked with perl/soap, but many people find the xerces parser to work nicely.
computers don't have to deal with the xml schema; it's in someone's implementation of how to handle schemas where the problem comes in.
just my quarter.
They can't help it though, the W3 committees are infested by the same lifers who destroyed SGML. It would be refreshing to see a standards committee for once run by people who are suspicious of standards committees. Right now the XML world is run by the people who live off of the small cred being on a committee lends to their consulting biz, etc. so they have no motivation to ever finish the committee's works.
I don't know XML. I used to know HTML. It used to be simple and consistent and easy to manage, too. But I don't know it anymore because it's expanded into this monstrosity that requires validation, formatting, and whatnot.
How long will it be before XML, which may be simple and easy to make consistent now, is "extended" into a similar monster, only to be replaced by some other "savior" specification?
Why is it so difficult for us to recognize that, except for the most basic of things, automation is hard? There's no silver bullet. The job is hard, and as soon as we think we've got one thing automated, we're going to try to depend on that thing to automate the next thing. Then we're going to discover that we didn't do such a good job with the first thing and we'll have to go back and fix it.
Automation is hard. There's no silver bullet that will make it easy, not even XML. We may think it's going to solve all of our problems, but it won't. We won't know whether it solves anything until we rely on it. At which point it'll break because it's too simplistic, and we'll have to change it so that it's complicated and not simple and then we'll be again waiting for the next "savior". Call me a skeptic, but I don't think there is one.
Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
I'm running OS X, and it sure sucks that almost all my preferences are stored in easy-to-parse buzzwords!
XML is very useful. It's not XML's fault that Microsoft isn't implementing it.
--
the strongest word is still the word "free"
People who say XML sucks are the people who are forced to look at it and change it by hand.
But XML is not for that!
XML is like dough. Nobody eats raw dough (it's probably OK to eat it, but it ISN'T tasty), but eats cookies and bread instead.
XML is NOT meant for routine exposure to users or administrators; XML is for application data transfer.
And applications that require XML to be written by a human are only half done: they should be paired with programs that turn human input into XML.
If I have a system and I want to publish some data so that you can read it, we used to have to go through a long song-and-dance routine where we decided what byte-order things were in, and which character sets we were using, and exactly to the bit how various fields were aligned. Once we'd hammered out a 40-page design spec of our interop format, you'd go and code a reader to the spec and I'd go and code a writer. Then we'd come back and find we still couldn't talk to each other because of inconsistencies and ambiguities in our spec. So we go through a couple of weeks hammering those out. Now we're sending data to each other, but actually there's a couple of subtle bugs left in your reader, so you suddenly send a final payment demand for $300,000,000 to a granny in Cardiff, get a mass of bad publicity and are eventually forced out of business.
Nowadays, this procedure has been replaced by "agreeing to use XML". The parsers are standard and implemented on a wide variety of platforms. They've been extensively debugged - far more time has gone into debugging Xerces than you'd ever have put into a custom format parser. They're Free, in both senses. They're easy to use. They're just - better!
No. Really.
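For anyone who never lived through the byte-order part of that song and dance, here is a minimal sketch of how it goes wrong. The amount and the 32-bit field are made up for illustration; the point is only that writer and reader disagree about byte order.

```python
import struct

amount = 300  # hypothetical payment amount, in dollars

# The writer packs the amount as a big-endian unsigned 32-bit integer.
wire = struct.pack(">I", amount)

# A reader that assumed little-endian decodes the same four bytes differently.
(misread,) = struct.unpack("<I", wire)

# A reader that follows the agreed byte order recovers the original value.
(correct,) = struct.unpack(">I", wire)

print(amount, misread, correct)
```

A text format sidesteps this particular bug, since the digits "300" read the same on any platform, though it does nothing about the two sides disagreeing on what the number means.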
There are IEEE specifications for numbers that are exact down to the bit. And processors actually comply with them.
Now convert your number to text, using a decimal representation (as AFAIK is recommended for XML). What you get is typically not the number you had before.
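Whether the number survives the trip through text depends on how many digits get written, so it is worth trying. A small sketch in Python, where the six-decimal format stands in for a naive serializer (that choice is an assumption, not something any XML spec mandates):

```python
x = 0.1 + 0.2  # an IEEE 754 double

# Fixed-precision decimal text, as a naive serializer might emit.
short_text = format(x, ".6f")

# Python's repr() emits the shortest decimal string that round-trips exactly.
full_text = repr(x)

# float.hex() writes the exact bits in hexadecimal notation.
hex_text = x.hex()

print(short_text, float(short_text) == x)
print(full_text, float(full_text) == x)
print(hex_text, float.fromhex(hex_text) == x)
```

So the loss comes from writing too few digits, not from decimal text as such; 17 significant digits are enough to recover any double exactly.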
SOAP itself is what you're really complaining about being inconsistent between your Perl SOAP and .NET - if the documents parse, then XML is working fine.
I don't like SOAP for most uses as it's overly complex for things like simple RPC style calls. Simple XML over HTTP can work just fine for how most people use the thing - it's not like everyone is doing distributed transactions or things that really take advantage of the SOAP envelope.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
If you're using the proper tools, and programming with the proper libraries, there's no reason you have to dig down into the XML in order to "write SOAP calls". I've used SOAP for a handful of tasks, and I can't tell you anything significant about how the requests are represented in XML. Developers don't necessarily need to know that. If things are breaking for you, and you're having to debug the actual XML data to figure out what's going wrong, then either your toolset is buggy or you're not using it correctly.
Seems to me that Plain Text is a pretty good document type. Seems that XML is a way of structuring some of that data. Seems that something else has to be layered over that - specifically, the tags that you create.
So when you read the file, you parse the text, then the XML, then your tags to get your data into a usable state. XML is just a way of formatting text. That's where the "meta" comes in. It's not a document type, it's just a standard for creating document types.
The only way XML makes data long-lived is by leaving it in plain text so that it remains open. Your web app will be replaced in a couple of years. Another app can be written that will read your files, because humans can read your files, not because of some Eternal Data Tag that XML applies. Proprietary files could be handled the same way, except that the format isn't open.
Now, just because you used XML doesn't mean that your format (ie: your tags, your way of breaking up and marking different elements of data) will be eternal. You can break up a big old text file and mark it up, and your bosses will decide months later that they're looking for some piece of information that you didn't tag. Like they want to pick out all first names from within your "customer comments" tag. You re-write your format. You manually re-write your files.
It might be more useful for Mr. Eriksson to develop a few of these final file formats using XML to present as standards. A suggested set of XML tags for addresses, for example. Do you tag the street name (Main, Elm) differently from the street type (Street, Drive) or is that all in a single "Address1" tag? Your XML will never work with my program if the higher level formatting isn't agreed upon. And XML doesn't do that.
You read some of the arguments against XML, and you realize that people just don't "get it".
1 - XML sucks as a language
Repeat after me, XML is NOT a language. Certainly not in the sense that C++ is a language. XML is a standard that defines how one structures data.
2 - XML is bloated, I can send binary much cheaper/easier
DUH. If your application is fine using binary data transfer, then USE it. HOWEVER, many applications that either have to A) communicate with other applications or B) have to deal with varying data sets benefit greatly from using XML. Anyone who has been programming for any length of time knows that while binary is more compact, it is less flexible and potentially more error prone. Want to add a new field in the middle of your data? Boy, you'd better not get your software versions mixed. Want to write an app that can do reasonably intelligent things with ANY data it receives? Binary is not the way to go. As with all things in life, use the tool for that which it was intended (vs. some people's view that it is the end-all, be-all of data representation).
3 - It's slow
Same as 2 above. If absolute performance is an issue, then by all means, use whatever representation gives you what you need. XML is about flexibility and standardization, NOT performance.
4 - It's complex
Well, it's as complex as you want to make it. It does sometimes encourage more complexity than is really needed, but it doesn't FORCE you into it. If you want/need schemas, go for it. If you need the functionality but in a simpler form, then do that (unless of course you need to communicate with another system expecting a schema, but this is obvious). It's just like C++: you don't HAVE to use templates and multiple inheritance (hell, you don't even have to create classes if you don't want/need to); you use the parts of the tool that are useful and provide benefit, you don't use them just because they're there.
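To make the "new field in the middle of your data" point from item 2 concrete, here is a minimal sketch. The record layouts, the field names and the old "version 1" readers are all hypothetical, invented for the illustration.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record layouts, invented for this sketch.
V1 = struct.Struct("<IH")    # id, quantity
V2 = struct.Struct("<IIH")   # id, NEW warehouse field, quantity


def read_v1_binary(blob):
    """Version-1 reader: expects id followed immediately by quantity."""
    rec_id, qty = V1.unpack_from(blob)
    return rec_id, qty


def read_v1_xml(text):
    """Version-1 reader: looks fields up by name, ignores anything else."""
    root = ET.fromstring(text)
    return int(root.findtext("id")), int(root.findtext("qty"))


new_binary = V2.pack(42, 7, 3)
new_xml = "<order><id>42</id><warehouse>7</warehouse><qty>3</qty></order>"

print(read_v1_binary(new_binary))  # quantity is read from the warehouse bytes
print(read_v1_xml(new_xml))        # quantity is still found by name
```

The old binary reader does not fail; it quietly returns the warehouse number as the quantity, which is the nastier kind of bug. A binary format can guard against this with version or length fields, but that is something you have to design in yourself.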
So I don't see what all the brouhaha is about. It has its strengths, it has its weaknesses. As with anything relatively new, people are trying it in various places. Some of these places don't really fit, others do. I've designed apps that benefited greatly; for others I've dismissed XML entirely.
Hate to burst your bubble, Tim, but this is the same justification that Microsoft uses to defend their monopoly on PC operating systems. There wouldn't be any portability issues if everyone used Windows (but there might be stability issues!)
And I agree with the notion that standards are a good thing; however, I have to be realistic at the same time. Any standard sufficiently broad to cover all of the possible bases will be so general as to be useless, or at the least, very inefficient in a large number of cases. The reason different standards crop up is that different users have different needs and values. In the UNIX community, portability, stability, and interoperability are highly regarded, whereas in the Windows community, flashy GUIs and speed are often more important. Hence, two widely different systems.
The portability of XML is nice. The fact that it can represent just about anything is also nice. But a plain XML file carries no index, which means if I'm searching for a particular record in an XML dataset, I might have to read the entire file. Not a problem for small databases, but for mainframe-size databases, this is simply unworkable.
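Here is roughly what that search looks like in practice: a streaming scan that counts how many records it has to touch before finding the one it wants. The document and its element names are made up for the sketch.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical data set, invented for this sketch.
doc = "<customers>" + "".join(
    f"<customer id='{i}'><name>Customer {i}</name></customer>"
    for i in range(10000)
) + "</customers>"


def find_customer(stream, wanted_id):
    """Stream through the document and return (name, records_scanned)."""
    scanned = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "customer":
            continue
        scanned += 1
        if elem.get("id") == wanted_id:
            return elem.findtext("name"), scanned
        elem.clear()  # release the record we no longer need
    return None, scanned


name, scanned = find_customer(io.BytesIO(doc.encode("utf-8")), "9999")
print(name, scanned)
```

Streaming keeps memory flat, but the lookup is still linear in the size of the file. Anything faster needs an index kept outside the XML, at which point you are most of the way to a database.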
No, XML doesn't suck. But then again, it's not a silver bullet either. Need I say the adage about hammers and nails?
The society for a thought-free internet welcomes you.
XML is mostly just a buzzword, used by middle-managers in meetings
Perhaps, but those meetings are about the fact that the department over there uses technology X and the department over here uses technology Y and the company saves $$$ if the two departments can actually talk because right now you pay people to do data entry twice and you pay more senior people to deal with the discrepancies.
These managers ask their tech people "How do we deal with this problem" and they hear "XML" and take that up the chain.
The bottom line is that in a company, system integration costs are the biggest expense in IT. XML decouples data from platforms and that makes integration easier and saves big bucks. So it becomes a buzzword because upper management needs buzzwords to describe things that enable.
I strongly suggest you take a complex MS Word document, and convert it into StarOffice 5.0 format, then into OpenOffice 1.0 (XML) format. The filesize of the OpenOffice 1.0 (XML) document will be FAR smaller than either of the previous formats. Add to that the fact that it is using very weak compression (zip) on the XML files.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Most of his (excellent) points have to do with exchanging data between applications (with long-term storage being essentially a special case of that). And he's right -- for those, XML is a huge win, and we should all bow down and worship at its feet.
However, because XML is such a huge buzzword now, people are proposing (or insisting on) using it as a format at the heart of complicated applications. Where anyone would have said 'Use a database' a few years ago.
In doing so, people are losing sight of the essential beauty of the relational data model. With an RDBMS, you, the programmer, have tremendous flexibility about *how* you view your data. This is a huge win inside of an application. XML forces you to commit to one specific view of your data. Yes, if that data needs to live forever and yes, if that data needs to get sent to someone else, then by all means, store it in an XML file. But if you need to *do* something with that data, you're going to be much happier with a relational db.
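A tiny illustration of the "how you view your data" point, using SQLite from Python's standard library. The table and its rows are invented for the sketch; the same three rows are summarized two different ways without restructuring anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, product TEXT, qty INTEGER);
    INSERT INTO orders VALUES
        ('alice', 'widget', 2),
        ('alice', 'gadget', 1),
        ('bob',   'widget', 5);
""")

# View 1: totals per customer.
by_customer = conn.execute(
    "SELECT customer, SUM(qty) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# View 2: totals per product, from the same rows, with no restructuring.
by_product = conn.execute(
    "SELECT product, SUM(qty) FROM orders GROUP BY product ORDER BY product"
).fetchall()

print(by_customer)
print(by_product)
```

An XML file that nests orders under customers makes the first question easy and the second one a tree walk; nesting under products flips that around.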
-Dan
I have written a truly remarkable operating system which this sig is too small to contain.
No. There is no contradiction between "Current XML parsing solutions suck for programmers." and "XML is a good thing overall." It's not a 360 [sic], because those are two separate dimensions entirely.
RTFAs.
(One could argue that no good parsing solution is itself a weakness of XML, but IMHO the problem is that we got stuck going down the wrong road(s) for parsing, with SAX and DOM, both of which look good on paper but lack a certain practicality. If in five years there's still no good solution then maybe it is XML's fault.)
I work for a publishing services firm that is focusing on XML-based production of print and online materials, ranging from books to scientific journals to grade-school testing applications.
Simply put, XML is the best tool available for storing content to be databased, searched, rendered in multiple formats and broken apart and reconstituted into custom documents. XML also lends itself nicely to the representation of complex mathematics using MathML. Because of this, we've based many of our production processes on XML.
One particular journal we produce is a heavily mathematical, 250 page weekly scientific journal. This journal is produced in both print and online forms, as well as being databased by the publisher. Using tools such as Arbortext Epic (www.arbortext.com) for content editing and Advent 3B2 (www.advent3b2.com) for semi-unattended formatting we are able to produce the journal with a staff of only 10 people. A year ago, it took twice as many people and the end product was not nearly as flexible. In this application, XML rocks.
However, using XML in every application imaginable without considering whether or not it's the appropriate tool can be quite foolish. A hammer is great for pounding on things, but is pretty worthless in nearly every other application. A lot of the frustration felt by coders implementing XML solutions is due to the fact that it may not be the best tool for the job.
Spot the 'lite' user of XML. If you're dealing with anything of any size, complexity or (let's face it) use, then that's a really good idea for unmaintainable, buggy XML.
Is it the best? Probably not. But it's undeniably an effective lingua franca. A human can easily create, edit, and manage it dynamically - you want a new tag, you just do it.
Then, it's also as easy on the software side to reflect those changes. The fashionable arguments people use against it (why is it so fashionable to bash anything that happens to be a buzzword?) are non sequiturs in terms of what XML is intended for.
I use it, hell I probably overuse it. It's so damn easy to parse that I don't want to waste time building a custom format just to save that extra 1K of space or 1/100th of a second.
XML does allow for ambiguity. However, it also allows for a lot of control -- it's just that many users don't make use of it.
If you wanted to, you could write an XML document without any sort of DTD or schema. It gets through the parser as long as it's well-formed, because there's nothing to validate it against. Similarly, many companies create XML files without bothering to create schemas, and so they run into problems because they didn't define their own document structures first.
XML has its own standard, but that standard isn't meant to extend to the content. It's just a few simple rules on syntax, not a tag structure. You are left to do that on your own. Similarly, the tools out there can validate XML documents against schemas or DTDs (ever tried Xalan?) but they can't do a dang thing if you don't have a schema to go with your file.
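To see how little the syntax rules alone constrain, here is a sketch using only Python's standard-library parser, which checks well-formedness and consults no DTD or schema. The three documents are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two syntactically fine documents with incompatible, made-up structures.
doc_a = "<customer><name>Ann</name></customer>"
doc_b = "<customer name='Ann'><shoe-size>banana</shoe-size></customer>"
# Mismatched end tags: rejected by any conforming parser.
doc_c = "<customer><name>Ann</customer></name>"

for doc in (doc_a, doc_b, doc_c):
    print(is_well_formed(doc), doc)
```

Only the third document is rejected. Telling the first two apart, or catching "banana" as a shoe size, takes a DTD or schema plus a validating parser, which the standard library does not provide.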
You seem to be blaming the W3C and an open file standard for your own problems with document structure. It appears you're not using the right tools.
got standards? --- http://www.w3.org/
First, XML is a language used to define markup languages. Those markup languages are called XML applications. The XML Application designed to store your preferences for a particular software application may be quite different than the XML application designed to store the data used by that software application.
All XML must be properly formatted, or well-formed, for a given XML document to be readable by XML parsers. That means that beginning and end tags have to match (tag names are case-sensitive), and attributes must be enclosed in double or single quotes, as well as some other rules. It's also possible to define a DTD or Schema which formally defines the XML application and can place restrictions on how the data in a given tag can be represented. The Schema gives the XML application developer the most control over how the XML application documents can be constructed.
Not all applications need be "anal" about their data files and a certain amount of flexibility can be very useful. When XML is used for configuration files, for instance, tags that aren't understood are ignored by default. The XML application schema can be written so that missing tags take on a default value, making it easy to upgrade software applications without having to include a conversion program to update the configuration files. Ignoring unknown tags also means that additional tags can be added to XML files produced and used by one application in order to add data for a second application that uses information from both sets of tags.
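A minimal sketch of that tolerant style of configuration reader. Note the assumptions: the setting names and defaults are invented, and the defaults live in application code here rather than in a schema, since the standard-library parser does not apply schema defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings and defaults, invented for this sketch.
DEFAULTS = {"font-size": "12", "theme": "light", "autosave": "on"}


def load_config(text):
    """Read known settings by name; unknown elements are skipped,
    and missing ones fall back to DEFAULTS."""
    root = ET.fromstring(text)
    config = dict(DEFAULTS)
    for child in root:
        if child.tag in DEFAULTS:
            config[child.tag] = (child.text or "").strip()
    return config


old_file = "<prefs><theme>dark</theme></prefs>"
newer_file = (
    "<prefs><theme>dark</theme>"
    "<sync-server>example.invalid</sync-server></prefs>"
)

print(load_config(old_file))
print(load_config(newer_file))
```

Both files load to the same settings: the old one picks up defaults for what it lacks, and the newer one's extra tag is left alone for whichever application understands it.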
Enforcing the standards for a particular XML application is the job of the DTD or Schema which defines the XML application. Software applications that use the same XML application will use the same Schema. Using the Schema and validating the XML provides just the degree of "analness" that the XML application designer and software application designer desire.
-All that is gold does not glitter - Tolkien
www.ra
I could care less whether "<", "(", "{" or any other character begins a tag. The structure of
beats by a mile. Data should be stored in a way that is easy to parse and unambiguous to design. XML would have been better designed with a way to represent pointers (e.g. LET/LETREC) than the silly attributes and other syntactic nonsense.
-m