Why XML Doesn't Suck
Richard Eriksson writes "Recalling the earlier discussion on why XML sucks for programmers, Tim Bray clarifies his stance on his co-creation, XML, and gets back on his pulpit to declare that XML Doesn't Suck. He writes: 'Let's look at some of XML's chief virtues, then I'll address some of the XML-sucks arguments, in the same spirit that Sammy Sosa addresses a fastball.'"
in the same spirit that Sammy Sosa addresses a fastball
You mean he strikes out swinging on three pitches while trying to jack the ball in the stands instead of trying to make contact?
Wait a minute... according to a previous /. article, XML did suck. Now I'm cornfoozed.
What does XML stand for anyway?
Is it good for anything?
Will it run Quake?
.... because people will pay you out the ying-yang to convert their system to use XML ...
... enough said!
Besides, it is a great buzz word!!!
Going from "XML sucks" to "XML doesn't suck" isn't clarifying your stance! It is doing a 360. Even Bill "I didn't have sex with that woman" Clinton would have a tough time with this one.
It seems to work well for Apple. It works great for the core database for iTunes, iPhoto and many other apps :-)
Steve likes it, he really likes it!
Just my 2 cents.
today is spelling optional day.
http://www.ietf.org/rfc/rfc3252.txt
a 360 from "XML sucks" is "XML sucks"
try 180.
These days we have zillions of XML parsers to play with, and they make it pretty darn easy. And it sure is nice to know that when a vendor says they'll give you XML, you can read it. Unless that vendor is Microsoft, of course.
Mr. Bray makes a point about the longevity of XML based documents (where he says that tying up documents in a binary format is foolish), but this is a point that (La)TeX users have been arguing for years.
Will XML really solve this problem? Hopefully the OpenOffice format will help, but if Microsoft maintains its marketshare (and keeps its XML generation limited or even proprietary), are we really better off?
I'll just stick with LaTeX.
I haven't read the article yet, but XML does NOT suck because:
1. data and/or fields can be added at any time WITHOUT breaking anything
2. the data is in a hierarchical format, reducing data replication and allowing for a more sophisticated data structure.
3. the data can be changed by a text editor.
4. and BECAUSE the data is text, it compresses REALLY well.
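Point 1 can be sketched in a few lines of Python. This is only an illustration, assuming a reader that looks fields up by name; the element names (`person`, `name`, `email`) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the second is a later version of the same format
# that gained an <email> field.
OLD_DOC = "<person><name>Ada</name></person>"
NEW_DOC = "<person><name>Ada</name><email>ada@example.org</email></person>"


def read_name(xml_text):
    """Reader written against the old format: it only looks up <name>."""
    root = ET.fromstring(xml_text)
    return root.findtext("name")


print(read_name(OLD_DOC))
print(read_name(NEW_DOC))
```

The old reader keeps working because it never depended on position or on the absence of other elements. A reader that validates against a strict schema, or that walks children by index, would not get this for free.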
I don't get all this fuss over XML. It seems to me that it's just a pretty handy markup language for programmers to use to store data in a human-readable (and therefore human-editable) fashion, that (with the help of things like libxml) also happens to be fairly machine readable. It's also extensible (X- duh!) and yet also has its limits.
Why are there so many /. stories about this? Can somebody explain why this raises people's passions so? It seems to me like arguing the merits of HTML or SGML - it's all so bloody obvious!
As a web developer & admin, XML is my best friend. I have cases where I need non-webheads to develop content (better yet, portable content), and XML is the only way - they only have to know a basic set of HTML tags, they don't have to worry about HTML validation, formatting, or anything else, and everything they generate is consistent!
Not only can I transform their content into different views or formats, but (for example) the same XML file that is used to provide software documentation also is used to build the software GUI and provide tool tips and other forms of context sensitive help.
No database required. No parsing required. Just a couple libraries and tools, and we're set to go.
send it vith SOAP.
XML is much better than anything else in certain situations.
XML is much worse than lots of other choices in certain situations.
Why can't you see the shades of grey, and insist on seeing all in black and white ?
Have fun,
Daniel
I saw a letter to Dr. Dobbs recently that was saying that XML needed to have the ability to embed things like Visual Basic and javascript in it to be really useful. I think that this is a horrible idea. The whole point of XML was to have a generic data model, i.e. one parser to rule them all.
I've been able to do things like export MySQL schemas into XML, then use XSLT to generate an entire set of base classes providing persistent objects. What was once weeks' worth of work now takes an afternoon (from concept to final product). The whole set is entirely consistent, no misspelled names or changed signatures. When bugs were found, I fixed all the files in one place and reran the XML/XSLT script. Massive productivity boost. If that isn't an argument on why XML doesn't suck I don't know what is.
The idea of embedding code in XML is a perverse distortion of what XML is really about. XML would suck if one uses it for unintended purposes. I don't use a hammer to tighten machine bolts, well I guess some people do.
I used to wonder what was so holy about a silent night, now I have a child.
<include file="stdio.xml" />
<function name="main">
<if:xml sucks="true">
<printf>XML sucks as a programming language</printf>
<else>
<printf>XML rules as a programming language</printf>
</else>
</if:xml>
</function>
...They have a great business plan! 1.) Work hard to develop something 2.) Give it away for free 3.) ???? 4.) Pay off credit cards!
Repeal the DMCA!
I want something that works like the Database Template Library for XML. It'd be nice just to map XML tags into a structure and suck the whole XML file in using an iterator.
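Something in that spirit can be approximated with the Python standard library. The following is a rough sketch, not a real template library: the `Book` structure, the tag names, and the sample data are all made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical target structure; one field per child tag.
    title: str
    year: int


def iter_books(source):
    """Yield one Book per <book> element while the file is being read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield Book(title=elem.findtext("title"), year=int(elem.findtext("year")))
            elem.clear()  # drop the children once they have been mapped


SAMPLE = (
    b"<library>"
    b"<book><title>A</title><year>2001</year></book>"
    b"<book><title>B</title><year>2002</year></book>"
    b"</library>"
)

books = list(iter_books(io.BytesIO(SAMPLE)))
print(books)
```

The caller just iterates; the mapping from tags to fields lives in one place, which is roughly what the wish above describes.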
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
The main thesis of Tim Bray's original post was that he didn't like having to choose between either storing all his data in memory (i.e. DOM) or using callbacks (i.e. SAX) when processing XML. The problem with this kind of thinking is that although it may have been true two or three years ago that the only way to process XML was via DOM or SAX, this is no longer the case.
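For readers who haven't seen the third style, here is a minimal sketch of pull-based processing using Python's standard `xml.dom.pulldom` module. The document and the `item`/`name` vocabulary are assumptions for the example; the point is only that the caller's loop asks for the next event, instead of registering callbacks or building the whole tree first.

```python
from xml.dom import pulldom


def item_names(xml_text):
    """Collect the name attribute of every <item>, pulling events one at a time."""
    names = []
    for event, node in pulldom.parseString(xml_text):
        # The loop is in control: it decides which events matter and could
        # stop early with a plain `break`.
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            names.append(node.getAttribute("name"))
    return names


print(item_names('<items><item name="a"/><other/><item name="b"/></items>'))
```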
There are more classes of APIs supported on multiple platforms for processing XML, such as pull-based APIs and cursor-based APIs, which are represented by the System.Xml.XmlReader and System.Xml.XPath.XPathNavigator in the .NET Framework. Similar APIs exist in the Java world as well as Python from what I've heard. This is besides the current push in some quarters for programming languages that natively process XML (i.e. intrinsically understand an XML datamodel or datatype).
Tim Bray's original problem was that he doesn't have a pull-based API for XML parsing in Perl. I pointed out in my kuro5hin diary how the pseudo code he showed as being his ideal for processing XML already exists in C# and the .NET Framework. This article on XML.com points to other people who also point out that such pull-based APIs for processing XML are available on other platforms and languages as well.
this.
:)
Some of you may already have read it, but it's on-topic nonetheless.
For shame France... For shame. USA keeps you around, why?
poking around his site I came across this. hehe.
"Slashdot and Stupidity I visit Slashdot once per day, sometimes more, because they seem to do a really good job of relaying the geek zeitgeist. It's a long time since I read much of the follow-ups, but I thought I ought to this time, and I'm reminded why. How can a publication that caters (on the face of it) to smart people attract the attention of so many shallow, drivelling morons?"
"Interactivity Again There were a few smart things there in among the chaff on /., and by following back the links in from other blogs, I sure did learn a whole bunch about the state of the programming art as regards XML. Some of the things I said were wrong (or at least open to challenge), and I got fodder for a really substantial follow-up piece, which I'll get around to soon. I don't suppose it's mathematically possible for everyone to get their theses batted around by some tens of thousands of well-informed people, which is a real pity."
http://www.tbray.org/ongoing/When/200x/2003/03/
1 9/ Who
period.
I read a Dr. Dobb's article which said basically that plain-text programming languages were dead and this was the Wave of the Future.
N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
There are shortcomings to XML, certainly. Having worked with it, I know that I've had some issues. But the benefits that it brings (the human readability, the structure, the parsers that are available for it, etc.) make it a good thing much more so than a bad thing.
Programming for XML is more work, but in the end, it forces you to be more structured and disciplined to work with it. You are always working with a standard way of constructing data and messages, rather than having to reinvent a new wheel each time, or create your own format that is totally different from everyone else's. It's more work upfront, but for maintainability and, as Tim Bray points out, longevity, you can't beat it. They don't call it the new ASCII for nothing (though not completely apropos). Just like there were text editors that can open and edit any plaintext ASCII files, so it is with XML files - if you have an XML editor/parser (and they are almost everywhere, including the humble text editors), you can view it and edit it.
It's like following coding standards - it's more pain and effort to do it and to do it right, but you will be thankful for it later, and it will be well worth the effort. Those who don't think it's worth the effort probably aren't building anything that is supposed to last for years to come.
They can't help it though, the W3 committees are infested by the same lifers who destroyed SGML. It would be refreshing to see a standards committee for once run by people who are suspicious of standards committees. Right now the XML world is run by the people who live off of the small cred being on a committee lends to their consulting biz, etc. so they have no motivation to ever finish the committee's works.
+ Linux doesn't suck.
+ Dell printers suck.
+ Blue-light DVDs do not suck.
+ No one is really sure about OSX.
/syle
XML is acceptable for eyeballing data, but when you take into account how verbose it is, it becomes very wasteful for transmitting over small pipes (modems).
A simple look up table and RLEing the results with a checksum can offer significant savings.
For an exercise, try sending a 900 K XML file off to a server and wait till it's done. Then look at the XML and see how you could make it smaller. It's kinda obvious and sad that it wasn't done in the first place.
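A quick way to try the exercise without a modem is shown below. Note that this sketch swaps in general-purpose DEFLATE from Python's `zlib` module rather than the lookup-table-plus-RLE scheme suggested above, and the document is synthetic with made-up tag names, so the numbers only indicate the general effect.

```python
import zlib

# Build a deliberately repetitive XML document (hypothetical vocabulary).
rows = "".join(
    f"<record><id>{i}</id><status>shipped</status></record>" for i in range(1000)
)
document = f"<records>{rows}</records>".encode("utf-8")

# Compress at the highest level and report sizes and the resulting ratio.
compressed = zlib.compress(document, 9)
print(len(document), len(compressed), round(len(document) / len(compressed), 1))
```

Most of the savings come from the repeated tag names, which is exactly the redundancy being complained about. Real documents with more varied content will compress less dramatically.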
- Zav - Imagine a Beowulf cluster of insensitive clods...
XML took a massive blow when its co-creator went public and said that it wasn't good and that it was too complicated for developers, generating headlines throughout the tech community.
Now he returns trying to "clarify" his stance on the matter. But, the fact is that all the clarifications are not going to garner nearly as much interest as the initial statement saying basically, that XML sucks. The damage has been done.
Regardless of whether XML is good or bad, it now faces a long uphill battle. Having one of the creators of XML deride it was/is a devastating blow to XML and his initial statement will be brought out against XML every time there is even the slightest resistance to using it in a project. From now on, every time someone mentions XML, someone else is going to say: "It sucks! Even its own creator said so. There's no way we should use it."
Did you Vote for Linux?
I'm running OS X, and it sure sucks that almost all my preferences are stored in easy-to-parse buzzwords!
XML is very useful. It's not XML's fault that Microsoft isn't implementing it.
--
the strongest word is still the word "free"
XPP which has evolved into Common API for XML Pull Parsing . I don't believe there is a standard pull-based API for XML parsing in the Java world yet although there is JSR 173.
People who say XML sucks are the people who are forced to look at it and change it by hand.
But XML is not for that!
XML is like dough. Nobody eats raw dough (it's probably OK to eat it, but it ISN'T tasty), but eats cookies and bread instead.
XML is NOT meant for routine exposure to users and/or administrators; XML is for application data transfer.
And applications that require XML to be written by humans are only half done: they should be used in combination with HumanInput -> XML generation programs.
If someone is passing you on the right, you are an asshole for driving in the wrong lane.
No. Really.
There are IEEE specifications for numbers that are exact down to the bit. And processors actually comply with them.
Now convert your number to text, using a decimal representation (as AFAIK is recommended for XML). What you get is typically not the number you had before.
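How much is lost depends on how many digits get written out. A small Python check, assuming IEEE 754 doubles (which is what Python's `float` uses on common platforms), shows both outcomes:

```python
x = 0.1 + 0.2  # a double with no short exact decimal form

short_text = format(x, ".6f")   # fixed six decimals, a common "pretty" rendering
full_text = format(x, ".17g")   # 17 significant digits
shortest_text = repr(x)         # shortest string that round-trips in Python 3

print(short_text, float(short_text) == x)
print(full_text, float(full_text) == x)
print(shortest_text, float(shortest_text) == x)
```

So the complaint holds for truncated renderings, while a writer that emits 17 significant digits (or a shortest round-trip form) gets the same bits back. Whether a given XML producer does that is a separate question.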
Badger
Any reason to use XML vs standard listed csv data files? except spending a lot of extra storage space on tags?
I guess I should buy a XML book someday..
It never ceases to amaze me how many people think XML is a language.
What do you think the L in XML stands for?
SOAP itself is what you're really complaining about being inconsistent between your Perl SOAP and .NET - if the documents parse, then XML is working fine.
I don't like SOAP for most uses as it's overly complex for things like simple RPC style calls. Simple XML over HTTP can work just fine for how most people use the thing - it's not like everyone is doing distributed transactions or things that really take advantage of the SOAP envelope.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
If you're using the proper tools, and programming with the proper libraries, there's no reason you have to dig down into the XML in order to "write SOAP calls". I've used SOAP for a handful of tasks, and I can't tell you anything significant about how the requests are represented in XML. Developers don't necessarily need to know that. If things are breaking for you, and you're having to debug the actual XML data to figure out what's going wrong, then either your toolset is buggy or you're not using it correctly.
Seems to me that Plain Text is a pretty good document type. Seems that XML is a way of structuring some of that data. Seems that something else has to be layered over that - specifically, the tags that you create.
So when you read the file, you parse the text, then the XML, then your tags to get your data into a usable state. XML is just a way of formatting text. That's where the "meta" comes in. It's not a document type, it's just a standard for creating document types.
The only way XML makes data long lived, is by leaving it in plain text so that it remains open. Your web app will be replaced in a couple of years. Another app can be written that will read your files, because humans can read your files, not because of some Eternal Data Tag that XML applies. Proprietary files could be handled the same way, except that the format isn't open.
Now, just because you used XML doesn't mean that your format (ie: your tags, your way of breaking up and marking different elements of data) will be eternal. You can break up a big old text file and mark it up, and your bosses will decide months later that they're looking for some piece of information that you didn't tag. Like they want to pick out all first names from within your "customer comments" tag. You re-write your format. You manually re-write your files.
It might be more useful for Mr. Eriksson to develop a few of these final file formats using XML to present as standards. A suggested set of XML tags for addresses, for example. Do you tag the street name (Main, Elm) differently from the street type (Street, Drive) or is that all in a single "Address1" tag? Your XML will never work with my program if the higher level formatting isn't agreed upon. And XML doesn't do that.
Sammy Sosa is 7th on the all-time list of career strikeouts, and 2nd among active players.
Read about it here.
I'd say XML is analogous to that: perhaps right behind C# among active "players?"
This space for rent
These days we have zillions of XML parsers to play with, and they make it pretty darn easy. And it sure is nice to know that when a vendor says they'll give you XML, you can read it. Unless that vendor is Microsoft, of course.
:-)
Oh, you'll be able to read (parse) it... you just won't be able to understand what it means once parsed!!
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I used to enjoy your posts on /., now you've just turned into a Microsoft shill.
You read some of the arguments against XML, and you realize that people just don't "get it".
1 - XML sucks as a language
Repeat after me, XML is NOT a language. Certainly not in the sense that C++ is a language. XML is a standard that defines how one structures data.
2 - XML is bloated, I can send binary much cheaper/easier
DUH. If your application is fine using binary data transfer, then USE it. HOWEVER, many applications that either A) have to communicate with other applications or B) have to deal with varying data sets benefit greatly from using XML. Anyone who has been programming for any length of time knows that while binary is more compact, it is less flexible and potentially more error prone. Want to add a new field in the middle of your data? Boy, you better not get your software versions mixed. Want to write an app that can do reasonably intelligent things with ANY data it receives? Binary is not the way to go. As with all things in life, use the tool for that which it was intended (vs some people's view that it is the end all be all of data representation).
3 - It's slow
Same as 2 above. If absolute performance is an issue, then by all means, use whatever representation gives you what you need. XML is about flexibility and standardization, NOT performance.
4 - It's complex
Well, as complex as you want to make it, and it does sometimes encourage more complexity than is really needed, but it doesn't FORCE you into it. If you want/need schemas, go for it. If you need the functionality but in a simpler form, then do that (unless of course you need to communicate with another system expecting a schema, but this is obvious). It's just like C++, you don't HAVE to use templates and multiple inheritance (hell, you don't even have to create classes if you don't want/need), you use the parts of the tool that are useful and provide benefit, you don't use them just because they're there.
So I don't see what all the brouhaha is about. It has its strengths, it has its weaknesses. As with anything relatively new, people are trying it in various places. Some of these places don't really fit, others do. I've designed apps that benefited greatly, others I've dismissed xml for entirely.
How do I encode properties (fields) of my data: child elements or element attributes?
How do I join the preceding-sibling namespace descendent ancestor-or-self following axis of evil?
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
I work for a VAN (Value Added Network) which is basically a middleman for data. You send an electronic purchase order to us; the company you're ordering from gets it from us. The value we add is we'll say you sent it and tell you they got it.
However, we charge by the kilocharacter of data you send and receive per month. So, for us, XML is awesome, because it increases the size of an ASCII-X12 or EDIFACT document by a factor of anywhere from 5 to a lot more (usually somewhere around 15-20 I think).
X12 and EDIFACT are standards for business document exchange that have been around for a while, but people are converting to XML because they think it's better (even though, usually, they just use the X12 or EDIFACT format, but with XML tags).
For example, a line item record may go from something like this:
LIN:0001
to something like this:
<LIN_GROUP>
<LIN>
<LIN_01>0001</LIN_01>
</LIN>
</LIN_GROUP>
It's not always that bad, but it can also be much worse. (Imagine replacing each instance of "LIN" above with "Line_Item" and "LIN_01" with "Line_Item_Number".) (And why won't that semi-colon after the LIN_01 end tag go away?)
So, for us, XML doesn't suck: it increases our revenue. For our clients, it sucks, because it increases their monthly bill.
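The expansion for the one-field line item above is easy to measure. This sketch assumes all whitespace between tags is stripped (the most favourable case for XML) and applies the longer names suggested in the parenthetical; everything else is taken from the example.

```python
x12_segment = "LIN:0001"

# The XML version from the example, with indentation and newlines removed.
xml_fragment = "<LIN_GROUP><LIN><LIN_01>0001</LIN_01></LIN></LIN_GROUP>"

# The "much worse" variant: LIN_01 -> Line_Item_Number, then LIN -> Line_Item.
verbose_fragment = xml_fragment.replace("LIN_01", "Line_Item_Number").replace(
    "LIN", "Line_Item"
)

for label, text in [("terse tags", xml_fragment), ("verbose tags", verbose_fragment)]:
    print(label, len(text), "chars, factor", round(len(text) / len(x12_segment), 2))
```

A single field is close to the worst case, since every value pays for its own wrapper elements; segments with many fields amortize the outer tags somewhat.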
Hate to burst your bubble, Tim, but this is the same justification that Microsoft uses to defend their monopoly on PC operating systems. There wouldn't be any portability issues if everyone used Windows (but there might be stability issues!)
And I agree with the notion that standards are a good thing, however, I have to be realistic at the same time. Any standard sufficiently broad to cover all of the possible bases will be so general as to be useless, or at the least, very inefficient in a large number of cases. The reason why different standards crop up is that different users have different needs and values. In the UNIX community, portability, stability, and interoperability are highly regarded, whereas in the Windows community, flashy GUIs and speed are often more important. Hence, two widely different systems.
The portability of XML is nice. The fact that it can represent just about anything is also nice. But the nature of XML precludes indexing, which means if I'm searching for a particular record in an XML dataset, I might have to read the entire file. Not a problem for small databases, but for mainframe size databases, this is simply unworkable.
No, XML doesn't suck. But then again, it's not a silver bullet either. Need I say the adage about hammers and nails?
The society for a thought-free internet welcomes you.
Vast improvements were made almost a year ago... ;-)
-- Arien
Amen, brother. Amen.
Just make </> close whatever the last tag was. That instantly cuts the size of the files in almost half, and makes them easier to read as well.
And yes, it could be confusing in a heavily nested file, but nothing says you have to use them. It would be a godsend for database columns.
Sometimes it's best to just let stupid people be stupid.
It has been my observation that much of the 'XML' work with the query engine extension has been a recreation of hierarchical databases. But relational databases were designed to overcome hierarchical databases' failures. It seems we are turning back the clock. For a good critique, C. J. Date's site has an article critical of XML: http://www.firstsql.com/dbdebunk/
putting the 'B' in LGBTQ+
Two words:
Human-readable.
As a programmer, this is the most useful property a data stream can take on. Why? Debugging. The reasoning here is twofold:
1. Non-parallel development of opposite ends of the data stream:
It's quite a challenge to develop the code which produces the data and the code which uses the data at the same time. If it doesn't work, you don't know where the problem is. With a human-readable format, you can simply pipe the data in or out of the app directly from a text file, and verify that it's correct yourself.
2. Debugging:
Something of an extension of the previous: if you have two bits of code communicating through XML, you can log the bad transmission and read it yourself to find out if the bug is in transmission or reception.
Now, I won't pretend that XML is the only human-readable data-structuring format, but it has a lot of nice advantages over the others, each of which is covered in the article. XML makes apps a pain to develop, but a breeze to debug--and the debugging is far more important!
-Amalcon
Let's say you are using XML to store the class rosters for your school. Assume the structure, . This has the advantages of being the easiest to parse, you create your data-structure and it's finished, and lastly, writing XSLT to convert your XML to HTML is a piece of cake. However, it's both redundant in the XML itself and in memory.
Assuming something more efficient, like <class><studentId>, where you simply reference students by an id rather than inlining each student's data, removes the reduplication problem. However, everything else becomes harder. First, you have to be able to reference a student by its id, so you use a hashtable. Next you either have to require that student data comes first, or you have an update phase where you update each of your class objects. Lastly, XSLT isn't cake anymore (show me the roster for class X including all the students details).
Although this problem exists in any other application that parses data that contains internal references, it's still a major pain-in-the-ass.
What's the best way to tackle this situation?
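One possible answer, sketched in Python under the assumption that the file fits in memory: load the whole document, index students by id in a first pass, then resolve the references in a second pass. That removes the "student data must come first" requirement. The element and attribute names below are guesses at the layout being described, not a known schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical roster file; note the class appears BEFORE the students it references.
DOC = """
<school>
  <class name="Algebra">
    <studentId>s2</studentId>
    <studentId>s1</studentId>
  </class>
  <student id="s1"><name>Ann</name></student>
  <student id="s2"><name>Bob</name></student>
</school>
"""


def load_rosters(xml_text):
    root = ET.fromstring(xml_text)
    # Pass 1: index every student by id, wherever it appears in the file.
    students = {s.get("id"): s.findtext("name") for s in root.iter("student")}
    # Pass 2: resolve each class's id references against the index.
    return {
        c.get("name"): [students[ref.text] for ref in c.iter("studentId")]
        for c in root.iter("class")
    }


print(load_rosters(DOC))
```

This is the hashtable approach from the post, just with the ordering problem handled by doing two passes over an in-memory tree. It does nothing for the XSLT side of the complaint.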
In the Health Care Industry, XML is going to be the default format for transaction processing (due to HIPAA regulations). The problem with this format is the amount of storage and extra bandwidth required to send/store the meta-tags. Currently, the transactions are delimited or fixed format, with the systems using the offset to determine data content.
So instead of having (delimited with a pipe):
|SIXPACK^JOE|100 OAK STREET|
we have
SIXPACKJOE100 OAK STREET
Don't you remember these fond words? Does that mean nothing to you? Whether you know it or not, God is what binds us all together. Well, our souls I mean.. You know what I mean. If not, you will, if you're present of mind as your life ebbs away (later rather than sooner of course, my son!).
Good luck finding your eternal bliss. You'll be ok.. I believe it.
... Like most of folks here, we've successfully used it in several situations, across different languages (Java, Perl, ASP) and different purposes(configuration, data transfer, web page generation, small online data storage, etc). It's da bomb.
XSL/XSLT on the other hand can be a pain to use in anything other than trivial transforms, in my unschooled opinion. The concept of recursive processing is great, but the math/logic syntax available is byzantine (eg "variable" is really a constant).
*sigh* I know this will get modded offtopic, but seriously... anyone agree with me, or do you actually like writing transform logic and processing in XSL? Please comment.
In the last five years, XML has - for instance - completely revolutionized the way my company writes software. We use code generators that mungle XML definitions into templates (imagine PHP controlling the generation not of HTML but of C or... PHP, and using XML to specify the abstract model in question).
We don't need schemas, stylesheets, xpaths,... just simple XML. And yet we can write very rich code in XML instead of in native code. Today we're producing about 25 lines of final code for 1 line of XML, and we're pushing this up all the time. My current project generates workflow engines from XML definitions, building a 10k workflow application from a single 500-line XML file.
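As a toy illustration of the approach (not the poster's actual generator, which isn't shown), here is a Python sketch that turns a small XML model into class source code. The `entity`/`field` vocabulary and the generated class shape are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract model: one entity with two fields.
MODEL = """
<entity name="Customer">
  <field name="name"/>
  <field name="email"/>
</entity>
"""


def generate_class(xml_text):
    """Expand an <entity> definition into Python source for a plain class."""
    entity = ET.fromstring(xml_text)
    fields = [f.get("name") for f in entity.findall("field")]
    lines = [f"class {entity.get('name')}:"]
    lines.append(f"    def __init__(self, {', '.join(fields)}):")
    lines.extend(f"        self.{name} = {name}" for name in fields)
    return "\n".join(lines) + "\n"


source = generate_class(MODEL)
print(source)

# Load the generated code to show it is usable as-is.
namespace = {}
exec(source, namespace)
customer = namespace["Customer"]("Ada", "ada@example.org")
```

Four lines of model produce four lines of code here, so the leverage is nil; the ratio described above comes from richer templates that emit persistence, validation, and so on from the same definition.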
My point is that XML is not just a handy way to store data. It is a meta language, able to formally define any concept, no matter how abstract. This is an incredible but subtle thing. The power comes not from XML technology itself, which is really very, very simple once you ignore the W3C fluff. The power comes from the freedom that XML technology gives you, namely the ability to abstract your problem to as high a level as your mind can take it, and to solve it at that level.
This is difficult, and takes time, but as the XML space settles down it will become clear that this is the real value of the technology.
The 'con' arguments all appear to be related to people trying to use XML in the wrong place, for the wrong thing, or to replace existing abstractions that work perfectly well.
Sig for sale or rent. One previous user. Inquire within.
(Parenthetically (Duh!) for similar problems though I have sometimes written an XSLT filter to reduce the size and complexity of the original data set, then used a DOM style reader on that. So I'd rather like an API that would let me do that more easily.)
From article:
"XML Can Represent Pretty Well Anything"
"XML has been used to represent, without loss of information...yearly calendars, and Zen koans. OK, I don't know for sure about the koans."
<koan attribute_to="Chao-chou">
<question asked_by="random monk">
Does a dog have Buddha-nature or not?
</question>
<response master="true" smileQuizzically="true" useMuResponse="true"/>
</koan>
"There is more worth loving than we have strength to love." - Brian Jay Stanley
It has no lips, lungs, or tongue.
Binary Lexical Octet Ad-hoc Transport.
I can hear the gears whirring away...
Yes, you're right. Your extremely simple dataset would be more compact in that non-XML format.
Now, let's say that someone wants to include notes with those entries. For example, on day 1, "The Pepsi deal finalized." On day 2, "Lightning took out the business park power transformer so that work was suspended for four hours." Seems like a legitimate request, right?
In your format, you must make absolutely certain that no \r or \n characters make it into the notes. "Easy enough," you say. "That's a short Perl one-liner."
Now someone wants not just the number of sales but the daily revenues to be included (Picture a red-faced boss screaming about how she doesn't care how many nickel and dime jobs were sold but the amount of money coming in). Hmmm... We can't put the revenues after the notes because we have no effective delimiter. We could put the notes in quotes, but then we must escape out all of the quotation marks in previous entries *and* retool our parser to look for quotes. Then again, you could put the revenues before the notes. Once again, you head back and fix the parser.
In XML, you simply add <note> and <revenue> tags. As a bonus, XML would allow for multiple notes per day. Make a DTD, XML Schema, or (better yet) RelaxNG document for your format, and you have an easy method of finding any errors without touching code and without tying yourself to any particular programming language.
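To make the scenario concrete, here is a small Python sketch of one day's record after both requests have landed. The `day`, `sales`, `revenue`, and `note` names and all the values are invented; validation with a DTD or RelaxNG is not shown, since the standard library doesn't include a validator.

```python
import xml.etree.ElementTree as ET

# A note containing quotes, an ampersand, and a newline: the characters
# that caused trouble in the line-oriented format.
FIRST_NOTE = 'Boss said "ship it" & left.\nPower was out for four hours.'

day = ET.Element("day", {"date": "1"})
ET.SubElement(day, "sales").text = "12"
ET.SubElement(day, "revenue").text = "3400.00"
ET.SubElement(day, "note").text = FIRST_NOTE
ET.SubElement(day, "note").text = "Second note for the same day."

# The serializer handles the escaping; no hand-written quoting rules.
text = ET.tostring(day, encoding="unicode")
print(text)

# Read it back: old fields, new fields, and both notes are intact.
parsed = ET.fromstring(text)
notes = [n.text for n in parsed.findall("note")]
```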
Yes, there's always BNF. But BNF is harder to write than RelaxNG. There's also the issue of tying the BNF to the document. XML handles this through the use of namespaces. I'm sure you could find a way to tie your documents to your BNF file. You've reinvented the wheel everywhere else, why stop here?
Finally, how does your format handle i18n? Is your parser UTF8-aware? You know that in some languages (those crazy Asians), UTF8 is more bulky than UTF16 or UCS2 right? Big5 and Shift-JIS are in heavy use as well. Make sure not to forget your chartype transcoder.
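The size claim is easy to check in Python. The sample strings are arbitrary; the Japanese one uses only characters from the Basic Multilingual Plane, which is the case where UTF-16 comes out ahead.

```python
japanese = "日本語のテキスト"  # eight characters
ascii_text = "street"          # six characters

for label, text in [("japanese", japanese), ("ascii", ascii_text)]:
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))  # little-endian, no byte-order mark
    print(label, "utf-8:", utf8_size, "bytes, utf-16:", utf16_size, "bytes")
```

For markup-heavy documents the tags themselves are usually ASCII, which pulls the balance back toward UTF-8, so the right choice depends on the ratio of text to tags.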
By writing your own file format, your own parser, and your own validator (you did write a validator, right?), you are spending extra time only to have to spend extra time later. Chances are, your task isn't writing a parser. Let others handle the job of parsing and get on with doing what you really need to do: get the job done.
- I don't need to go outside, my CRT tan'll do me just fine.
...for me to POOP ON!!!
* attributes, elements, and mixed content complicate programming. now you have too many different types of fields tied together in an ugly manner, which introduces all sorts of programming complexity. i would rather have only elements.
* although xml is verbose, it has done some corner cutting: an empty element can be written as <element/> or as <element></element>. Different parsers treat these two things identically or differently, which again introduces some bugs. there should be exactly one syntax.
* white space: no easy way to handle this. it will either look ugly in a text editor or require extra programming effort.
* comments: Comments are extremely hard to read too. also, from a programmer's point of view, comments are ambiguous. should they be preserved or discarded? no clear guidelines.
* No private elements: when i write an xml document, i would like to put in additional private information. the only way to write this now is using comments, but that is not structured. there should be a way to write structured comments too. this is missing in c/java/c++ etc too. java uses the /**...*/ notation to mark a javadoc comment. this is a hack and some central body has to enforce it. there is no built-in way. also sun uses special javadoc tags to indicate fields within javadoc elements. again this is a hack. why doesn't xml have a built-in way of structuring comments so that all parsers and viewers would honor it?
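On the empty-element point, the XML 1.0 spec treats both spellings as an element with no content, so a conforming parser shouldn't distinguish them. The check below only covers one parser (Python's ElementTree) and says nothing about others; `summarize` is a helper made up for the comparison.

```python
import xml.etree.ElementTree as ET


def summarize(xml_text):
    """Reduce a parsed element to the parts an application would look at."""
    elem = ET.fromstring(xml_text)
    return (elem.tag, dict(elem.attrib), elem.text or "", len(elem))


short_form = summarize('<element id="1"/>')
long_form = summarize('<element id="1"></element>')

print(short_form == long_form)
```

Where differences do show up in practice, it tends to be on the serializing side (which form a tool writes out) or in consumers that treat the text as HTML rather than XML.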
Most of his (excellent) points have to do with exchanging data between applications (with long-term storage being essentially a special case of that). And he's right -- for those, XML is a huge win, and we should all bow down and worship at its feet.
However, because XML is such a huge buzzword now, people are proposing (or insisting on) using it as a format at the heart of complicated applications, where anyone would have said 'Use a database' a few years ago.
In doing so, people are losing sight of the essential beauty of the relational data model. With an RDBMS, you, the programmer, have tremendous flexibility about *how* you view your data. This is a huge win inside of an application. XML forces you to commit to one specific view of your data. Yes, if that data needs to live forever and yes, if that data needs to get sent to someone else, then by all means, store it in an XML file. But if you need to *do* something with that data, you're going to be much happier with a relational db.
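To make the "many views" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The student/enrollment schema and the data are invented for illustration; the point is only that the same rows can be regrouped by a query without restructuring the stored data.

```python
import sqlite3

# Hypothetical schema: students enrolled in classes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, class TEXT);
    INSERT INTO student VALUES (1, 'Ann');
    INSERT INTO student VALUES (2, 'Bob');
    INSERT INTO enrollment VALUES (1, 'Math');
    INSERT INTO enrollment VALUES (1, 'Art');
    INSERT INTO enrollment VALUES (2, 'Math');
""")

# View 1: the data seen per student (how many classes each one takes).
by_student = conn.execute("""
    SELECT s.name, COUNT(*)
    FROM student s JOIN enrollment e ON e.student_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.name
""").fetchall()

# View 2: the same rows seen per class (how many students each one has).
by_class = conn.execute("""
    SELECT class, COUNT(*) FROM enrollment
    GROUP BY class ORDER BY class
""").fetchall()

print(by_student)
print(by_class)
```

An XML file would have to pick one nesting (classes inside students, or students inside classes) up front; getting the other view means walking and regrouping the tree in code.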
-Dan
I have written a truly remarkable operating system which this sig is too small to contain.
Whew. Thanks for straightening that out for me. I was worried for a moment that I'd have to stop hating XML again.
I think everyone was aware, before either of these articles became part of the collective, that like any other technology, XML isn't always the best solution. I think this is news mostly because the author of a widely used technology is actually stating this obvious fact. Go ahead and ask a company what their product isn't best at or where another technology or product might do better and I bet you'll hear a lot of stammering (unless they have a much more expensive product to sell you...). So good for Tim for airing these thoughts.
No. There is no contradiction between "Current XML parsing solutions suck for programmers" and "XML is a good thing overall." It's not a 360 [sic], because those are two separate dimensions entirely.
RTFAs.
(One could argue that no good parsing solution is itself a weakness of XML, but IMHO the problem is that we got stuck going down the wrong road(s) for parsing, with SAX and DOM, both of which look good on paper but lack a certain practicality. If in five years there's still no good solution then maybe it is XML's fault.)
I can see XML finding purchase in the same kinds of things HTML and SGML were used for: document-type representations where the amount of text in between the start and end tags is much greater than the amount of markup.
Things like databases, RPC calls, freakin' programming languages (I didn't make that stuff up about XML programming languages being said to obsolete normal languages in Dr. Dobbs), stuff like that, would probably benefit much more from a format that isn't as cluttered as XML.
Look at an old-school NeXTStep-style propertylist, and look at one of the new Mac OS X XML propertylists, and tell me which one looks better, and easier to edit.
N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
I work for a publishing services firm that is focusing on XML-based production of print and online materials, ranging from books to scientific journals to grade-school testing applications.
Simply put, XML is the best tool available for storing content to be databased, searched, rendered in multiple formats and broken apart and reconstituted into custom documents. XML also lends itself nicely to the representation of complex mathematics using MathML. Because of this, we've based many of our production processes on XML.
One particular journal we produce is a heavily mathematical, 250 page weekly scientific journal. This journal is produced in both print and online forms, as well as being databased by the publisher. Using tools such as Arbortext Epic (www.arbortext.com) for content editing and Advent 3B2 (www.advent3b2.com) for semi-unattended formatting we are able to produce the journal with a staff of only 10 people. A year ago, it took twice as many people and the end product was not nearly as flexible. In this application, XML rocks.
However, using XML in every application imaginable without considering whether or not it's the appropriate tool can be quite foolish. A hammer is great for pounding on things, but is pretty worthless in nearly every other application. A lot of the frustration felt by coders implementing XML solutions is due to the fact that it may not be the best tool for the job.
That said, the challenge stems from MV-fields: those nifty things in PICK which give you the power of keeping associated fields within one table, with as many associations as you like (for good or for bad; bad usually when it's been abused or good housekeeping neglected). Piling MV stuff into CSV is just plain icky. Normalizing it first is also icky. However, XML may offer a simple, elegant way of keeping it all together in the shape it existed in, which may be important down the road if someone has to produce a report from it (auditors, second guessers, or a55-covering because some account didn't have the right amount of debits or credits for years and the difference needs to be found).
I'm off to explore XML more fully. There's probably yet-another O'Reilly book in my future...
A feeling of having made the same mistake before: Deja Foobar
First, a huge "Thank You" to Tim Bray and to /. for facilitating this important (IMHO) discussion on the relative merits and demerits of XML. I am a self-taught XML enthusiast. This sort of discussion, especially one with such expert validity, is useful to me because it grounds some of the vague notions I was coming up with through my own experience using XML.
I wanted to respond in particular to one thing in Mr. Bray's most recent article:
"But let's face it, when you parse XML, you get a data structure that is kind of an ordered sequence and kind of a tree and kind of a hypertext. This maps well onto no known programming paradigm."
I'm not sure if this precisely addresses what is asserted, but I think perhaps there is a known programming paradigm that does map well: SceneGraph APIs.
This is probably a much longer discussion than I have time for right now, but here's what I mean:
SceneGraphs are data structures commonly used for 3D graphics programming. SGs have concepts like nodes, parents, addChild, removeChild, nested nodes, etc. XML can be extremely useful for representing a SceneGraph. If nodes are nested in the XML, then they are nested in the SG and vice versa. Nodes themselves are containers of attributes (PropertyBags, or whatever you like to call them), so serialization and deserialization are indeed a good thing here. The mapping is very tight between XML and SGs in this sense. But that just addresses the data structure and OO parts of the "equation" that maps XML particularly well to SG APIs.
Another important concept in SG APIs is the traversal. Traversals are the sequential paths taken by the SG engine as it performs calculations on, updates to, and rendering of the SG. Some traversals process the entire SG, depth-first, left-to-right, and some just traverse down to a single leaf and stop, but essentially any traversal is a path through the graph. Traversals can be specialized. Some might search, some might render, some might calculate. This seems very XSL-like, but at the very least, there is a sequential processing of the structure (it can be parallel, but results are ultimately combined in the join and finalized sequentially).
Finally, as for hyperlinking (Mr. Bray actually said "hypertext"), that's there too. A technique similar to hyperlinking can be used to dynamically jump to different subtrees within the overall SG. If I want to render only a portion of an SG, I might move the traversal pointer from some SG root node down to a subtree containing just a few models. References are also used in SceneGraphs. If I want a model to appear 10 times, I don't want to load it 10 times - I want to load it once and make 9 references.
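A small sketch of the three pieces described above (nesting, specialized traversals, and references), assuming Python's standard xml.etree.ElementTree. The scene format, including the "model" and "ref" tags and the "id", "to", and "mesh" attributes, is invented for illustration and does not come from any real SceneGraph API.

```python
import xml.etree.ElementTree as ET

# Hypothetical scene description: one model loaded once, referenced twice.
SCENE = """
<scene>
  <group name="room">
    <model id="chair" mesh="chair.obj"/>
    <ref to="chair"/>
    <ref to="chair"/>
  </group>
</scene>
"""

def traverse(node, visit):
    """Depth-first, left-to-right traversal: visit a node, then its children."""
    visit(node)
    for child in node:
        traverse(child, visit)

root = ET.fromstring(SCENE)

# Specialized traversal 1: index every model by id (the model is stored once).
models = {}
def index_models(node):
    if node.tag == "model":
        models[node.get("id")] = node
traverse(root, index_models)

# Specialized traversal 2: a "render" pass that follows references.
rendered = []
def render(node):
    if node.tag == "model":
        rendered.append(node.get("mesh"))
    elif node.tag == "ref":
        rendered.append(models[node.get("to")].get("mesh"))
traverse(root, render)

print(len(models), "model loaded,", len(rendered), "instances rendered")
```

The same traverse function serves both passes; only the visitor changes, which is roughly the "traversals can be specialized" idea.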
But like I said, there's probably a lot more to consider. I'd certainly like to hear what Mr. Bray thinks as well as what any Slashdotters w/ SG experience might have to say.
A language is the set of all ways a grammar allows symbols to be combined.
(of course, a grammar is a set of rules on how to combine symbols.)
Under the formal definition, XML is indeed a language. It is not a language useful for defining algorithms, admittedly.
Can we stop with this "XML is not a language" now?
No idea why the powers that be chez slashdot should have chosen to focus on Mr Bray believing XML to suck without sucking.
A brief visit to his site reveals him to be an Economist reader who doesn't believe in Adam Smith!
The man is a human logic bomb, one bowel movement away from destroying us all.
The idea is to get things done fast and with some quality. Using XML can increase productivity because once you know how to use it, it becomes very simple, and the next time you run into a situation that calls for a data file, you can quickly implement it.
Is it the best? Probably not. But it's undeniably an effective lingua franca. A human can easily create, edit, and manage it dynamically - you want a new tag, you just add it.
Then, it's also as easy on the software side to reflect those changes. The fashionable arguments people use against it (why is it so fashionable to bash anything that happens to be a buzzword?) are non sequiturs in terms of what XML is intended for.
I use it, hell I probably overuse it. It's so damn easy to parse that I don't want to waste time building a custom format just to save that extra 1K of space or 1/100th of a second.
Look at TIFF, look at XML. Both are the same idea, except one is binary and one is text. The problem with TIFF was that no one could read anyone else's TIFF files. Since anyone could create a tag, everyone did, and no one could read anyone else's tags.
XML doesn't solve this problem either. Look at the problems it is having.
The only solution is to standardize the data format, not the meta-format. And once you do that, text format adds nothing except overhead.
XML is supremely unsuited for most of the things it is used for.
Since then I've considered the same kind of thing a few times, and while I don't think that XML syntax for a programming language is such a great thing in some ways, I've come to think it's a Very Good Thing in other ways.
For instance, reading a recent discussion of Hungarian Notation, I saw a comment where someone suggested being able to mouse over a variable and have its definition, scope, any associated comments, maybe all its uses highlighted in different ways. Using XML as a basic markup for a program could facilitate this quite a bit (yes, there are other ways).
Or imagine having the ability to embed diagrams, images and other documentation into a program. (Yes, I'm aware of C#'s mechanisms - I'm thinking of something far more pervasive.) UML information could also be carried along with the program. (Yes, again, I've seen mechanisms that do this, but they mostly appear to me to be a bit on the hokey side.)
Similarly, generative programming, Karl Lieberherr's Adaptive Object Oriented Programming, and Aspect Oriented Programming could all use XML markup as a facilitating mechanism.
Or consider Knuth's Literate Programming with an XML syntax. For those who've only seen LP in the context of some of the weaker literate programming mechanisms available, check out the books on TeX and MetaFont where the code is presented using literate programming.
XML markup would also present mechanisms for macros and conditional compilation that could be very powerful indeed. For instance, having used Sather, I always want pre and post conditions nicely included in procedure definitions in other languages (and have remained frustrated that the Java language definition (c# as well, as far as i can tell) has not included these constructs except as ad hoc add ons). An underlying XML markup would make this possible, even relatively easy.
As a programmer I wouldn't want to have to interact with it directly, I'll admit - but we should be able to build Emacs modes or similarly "smart" editors that make the interaction reasonable. And given how popular big fancy IDE's are these days, this shouldn't be all that tough to manage.
And there are other potential benefits as well - programmer control of name mangling, programmer control of obfuscation (also much stronger support for intelligent de-obfuscation, by the way), support for better code refactoring, the ability to track changes on a more conceptual level in a version control system.
To say nothing of being able to finally resolve all those damn arguments about the One True Way to lay out your code - every user could define their own preference in their UI - the markup itself would not need to know anything about it at all.
Yes, I'm well aware of the drawbacks - I use emacs for almost everything. But given the effort that has gone into making IDEs these days, most of the drawbacks are quite resolvable.
Uniform syntax for instructions and data, trivial to parse, no assumed meanings... So, with a little more effort, and maybe some standardization, XML will eventually reach the same place that LISP has been for decades.
If I may quote Sherlock Holmes, who was quoting the book of Ecclesiastes: There is nothing new under the sun. It has all been done before.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
...because the standard of comparison is CSS. Now THAT sucks!
they should be used in combination with HumanInput -> XML generation programs.
Then how did you format the comment you just wrote? You wrote HTML, which in its modern form is an application of XML. Do you claim that only programs like Dreamweaver should be used to generate XHTML in practice?
Will I retire or break 10K?
Charging by the kilocharacter?! What decade is this?
The 2000s. And in the 2000s, ISPs continue to bill by the KB. A British ISP caps transfer to a single account at 30,000,000 KB per month. Worse, some other ISPs outside North America allocate only 3,000,000 KB to each customer per month.
But can't gzip reduce the size of an XML stream?
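One way to check is to measure it. A minimal sketch using Python's standard gzip module on a made-up, deliberately repetitive XML payload; real documents will compress more or less well depending on how much of the stream is repeated markup.

```python
import gzip

# Hypothetical payload: 1000 records with identical markup around each value.
record = "<student><id>{0}</id><name>Student {0}</name></student>"
xml_text = "<students>" + "".join(record.format(i) for i in range(1000)) + "</students>"

raw = xml_text.encode("utf-8")
packed = gzip.compress(raw)

# Report both sizes and the ratio between them.
ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped ({ratio:.0%} of original)")
```

Repeated tag names are exactly the kind of redundancy that gzip's dictionary-based compression handles well, which is why verbose markup tends to shrink a lot on the wire.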
First, you have to be able to reference a student by its id, so you use a hashtable. Next you either have to require that student data comes first, or you have an update phase where you update each of your class objects.
XHTML, a common application of XML, chooses what I think you're calling the "update phase" method, through URIs in src and href attributes of elements. For example:
Here, "foo.png" and "bar.png" are relative URIs to image documents.
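The hashtable-plus-update-phase approach from the quoted text can be sketched in a few lines. This assumes Python's standard xml.etree.ElementTree and an invented document format (the "school", "class", "member", and "student" names and the "id"/"ref" attributes are hypothetical); the students deliberately appear after the class that refers to them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the class refers to students defined later in the file.
DOC = """
<school>
  <class name="Math"><member ref="s2"/><member ref="s1"/></class>
  <student id="s1" name="Ann"/>
  <student id="s2" name="Bob"/>
</school>
"""

root = ET.fromstring(DOC)

# Pass 1: build the hashtable of students keyed by id.
students = {s.get("id"): s.get("name") for s in root.iter("student")}

# Pass 2 (the "update phase"): resolve each class's references by lookup.
classes = {
    c.get("name"): [students[m.get("ref")] for m in c.iter("member")]
    for c in root.iter("class")
}

print(classes)
```

Because the lookup happens in a second pass, the document does not have to list students first; the cost is holding the id table in memory until resolution is done.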
I think both arguments - XML sucks and doesn't suck - are missing the point. For example, my company switched to ClearCase because of the touted features, but there are several meetings per week to solve some branch problems, or VOB problems, or wrong tag. The same group was using RCS before and people had to do some manual merge but didn't spend nearly so much time on it.
Some technology can be incredibly powerful, extensible, and theoretically clean, yet incredibly unpleasant to use. I had to write a simple servlet the other day and was dismayed to see that the JSDK 2.0 command-line switches are gone and I have to write a multi-page jumble of brackets, columns and whatnot to get the thing running. I ended up searching for an old version of the JSDK on a co-worker's machine. What wouldn't I give for a servlet engine that uses good old .ini files.
It's true that various libraries could hide the mess they use internally. But then when they break, I get to look at a particularly convoluted example of the mess. There are so many smart people in CS. Can't someone come up with a data format that is pleasant to read, edit and parse, not just powerful?
is 50% suck good or suck? hmmm...
[self dealloc];
Maybe because it is complete and utter bullshit?
Honestly People! Speaking as someone who used XML right and left on several projects...
.INI files and other text file formats could have equivalent infrastructure for validation, and they'd be faster and more effective. The same goes for internationalization and the many, many other features that XML advocates claim are XML's alone and only possible with the David Copperfield magic of XML.
You have people arguing that XML is a godlike power because two parties decided to sit down and exchange text files, which happen to be in XML.
That isn't really saying anything. What's the difference in this case between a CSV and XML? They're both just text file formats! Except...
1. XML is much, much slower in raw performance due to parsing overhead.
2. XML is very hardware-resource intensive. If you want to do extensive reading or manipulation of an XML file, the entire DOM tree has to be in memory, and the DOM tree is not very compact...
3. XML is not easily learned. Showing a beginning programmer with a grasp of any language how to read a CSV file is easy. Try doing the same with XML tools: you have to teach someone about trees, leaves, and nodes, and, if they're using XSL, a functional recursive programming language.
4. CDATA. Now COME ON, people. If you want binary data or formatting-preserved data, base64-encode it. XML people saying readable data is a primary goal of XML are completely undercut by these worthless structures.
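For a side-by-side look at point 3, here is a minimal sketch that reads the same two records from CSV and from XML using only Python's standard library. The record layout ("id", "name") and the XML element names are made up for the comparison.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same two hypothetical records in both formats.
CSV_TEXT = "id,name\n1,Ann\n2,Bob\n"
XML_TEXT = (
    "<rows>"
    "<row><id>1</id><name>Ann</name></row>"
    "<row><id>2</id><name>Bob</name></row>"
    "</rows>"
)

# CSV: the header line names the columns; each later line is one record.
csv_rows = [dict(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]

# XML: walk the tree; each <row> is a record, each child element a field.
xml_rows = [{field.tag: field.text for field in row} for row in ET.fromstring(XML_TEXT)]

print(csv_rows == xml_rows)
```

Both reads are short with a library in hand; the difference the list above points at is conceptual, since the XML version requires the reader to think in terms of a tree of nodes rather than lines and columns.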
OUCH
Nine times out of ten, you don't need XML where it's being used today.
CSVs or
Finally, I'd like to point out that when I had to write a fast-performing and more compact "XML" parser on a resource-limited PDA, two things helped out enormously:
1) the line break actually means something
2) Use a generic ending tag. Saves space
Hey, I'm just your average shit and piss factory.
I could care less whether "<", "(", "{" or any other character begins a tag. The structure of
beats by a mile. Data should be stored in a way that is easy to parse and unambiguous to design. XML would have been better designed with a way to represent pointers (e.g. LET/LETREC) than the silly attributes and other syntactic nonsense.
-m
God understands our arrangement even if you don't. I stay on his good side and he keeps out of my way. When we do get together we're usually kicking asses and taking names. Well, he tracks souls anyhow. I just try to preach the good word to good folk who are having trouble (like yourself).
I'm going to pray that you find your way before you're on your death bed with your life ebbing away, unable to form coherent thoughts. Because then, my friend, you just missed the train (dare I say Soul Train?). So jump on, there's plenty of room, even for a blasphemer such as yourself.
I'll be rooting for ya!
Communist countries don't believe in superstitions like religion!
That's part of what makes them so scary and mean!
Those damn atheistic commie bastards!
After 2 years of use, I realized I could have been doing something more productive than using XMLDocument, XMLNode, and XSLT sheets. I could have used a database instead of storing text data in XML.
Go ahead, defend XML and its virtues; it will not change the fact that it is ungodly programmer-unfriendly (if there is such a word).
But bureaucrats being what they are (and bureaucrats being in charge of environmental agencies), they've been told that XML is a GOOD THING, and want to force everything into that mold. And it doesn't fit!
Call it the "law of the instrument," as someone (Poul Anderson, I think) put it:
That's XML, to a tee!
"My opinions are my own, and I've got *lots* of them!"
"I'll just stick with LaTeX."
Personally, I use LaTeX for about 90% of my document preparation needs. XML, however, is a completely different beast that works better for some things, worse for others, and will work for some things that LaTeX simply isn't designed to do.
If I want to set operating system preferences or default variables, an XML file can serve for this quite nicely in a similar manner to plists. This is *not* something that LaTeX is even designed for.
Similarly, there are some things that XML is just better at. Apple's Keynote uses XML and using LaTeX for something like that, while doable, would just be silly.
Integrate Keynote and LaTeX
I don't know, either. I will predict, though, that XML may create new interest in Lisp. The rest of you can go on pushing around mountains of Java and XSLT, bringing your computers to their knees when you need to do hairy XML manipulation - I'll run rings around you using S-expressions as DOM-lite, using systems that have been tuned for twenty years to efficiently handle these data structures.
To a Lisp hacker, XML is S-expressions in drag.
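The "XML is S-expressions in drag" claim can be illustrated by a direct conversion. A rough sketch, assuming Python's standard xml.etree.ElementTree; the output notation (tag first, attributes as :key value pairs) is one arbitrary choice, and text that follows a child element (mixed content) is ignored here for brevity.

```python
import xml.etree.ElementTree as ET

def to_sexpr(node):
    """Convert an element to a nested list: [tag, attributes, text?, children...]."""
    text = (node.text or "").strip()
    children = [to_sexpr(child) for child in node]
    return [node.tag, dict(node.attrib)] + ([text] if text else []) + children

def render(sexpr):
    """Write the nested list in a Lisp-like notation."""
    tag, attrs, *rest = sexpr
    parts = [tag] + [f":{key} {value!r}" for key, value in attrs.items()]
    parts += [render(item) if isinstance(item, list) else repr(item) for item in rest]
    return "(" + " ".join(parts) + ")"

# Hypothetical input document.
tree = to_sexpr(ET.fromstring('<book id="1"><title>SICP</title></book>'))
print(render(tree))
```

Once the document is a nested list, ordinary list operations replace the DOM API, which is the "DOM-lite" idea from the comment above.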
Some people here say that XML is just a buzzword with no real advantages etc. Let me tell you, they are all wrong.
My company makes apps for the military. These types of apps make heavy use of binary messages exchanged over a network. Up until now, there were numerous errors in the specification, different type systems and other problems between departments. With XML, we managed to do the following:
1) Made a standard for expressing messages. Since messages are tree structures (struct embedded in struct, etc.), XML is ideal for it.
2) Made a tool to write messages and group them according to context. Now specification docs can be produced automatically by the tool and handed over to subcontractors, whereas previously they were written by hand, contained many errors and had different styles. Now all these problems are gone: changes are documented and saved in the configuration and version control system, messages are automatically versioned, and the whole procedure is automated to the point that it takes a few clicks to modify a message and produce a new specification.
3) Made tools which can prepare scenarios for testing these messages automatically. This saved us a lot of work! Testing every field of every struct of every message is a huge job when a combat system has up to 10,000 messages and each message can contain hundreds of numeric fields. Thanks to XML, each field's bit width, range, default value, minimum and maximum value, and enumeration are known beforehand from the XML data produced for the specification, so by using XSLT, messages are automatically converted to C, C++, Ada and Java code, along with the relevant code to send, receive and validate each message.
One of the true benefits of XML is that data are not tied to a specific application. For my company, it has saved us a lot of work, because there is no need to bloat one app with all the functionality, we can make several separate tools which do one job only and operate on those XML data.
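The spec-to-code step can be sketched briefly. Note the substitutions: the poster used XSLT, while this sketch uses plain Python with xml.etree.ElementTree, and the message schema (a "message" element with "field" children carrying "bits", "min", and "max") is invented for illustration, not taken from their system.

```python
import xml.etree.ElementTree as ET

# Hypothetical message specification; the schema is made up for this sketch.
SPEC = """
<message name="TrackUpdate">
  <field name="track_id" bits="16" min="0" max="65535"/>
  <field name="heading" bits="9" min="0" max="359"/>
</message>
"""

def to_c_struct(spec_xml):
    """Emit a C struct with one bit-field per <field>, noting each valid range."""
    msg = ET.fromstring(spec_xml)
    lines = [f"struct {msg.get('name')} {{"]
    for field in msg.findall("field"):
        lines.append(
            f"    unsigned {field.get('name')} : {field.get('bits')};"
            f" /* range {field.get('min')}..{field.get('max')} */"
        )
    lines.append("};")
    return "\n".join(lines)

c_source = to_c_struct(SPEC)
print(c_source)
```

Because the widths and ranges live in the XML rather than in any one tool, a second generator (for test scenarios or documentation, say) can read the same file without touching this one.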
One word (or four, if you're anal): HTML. People looked at HTML on the web and saw it was good. XML looked a lot like HTML. Must be good.
-Nuke the moon
Well, take a look at the page source:
Seriously, the HTML doesn't look too bad. In some ways, it's more readable than the rendered copy - such as his list of data items rendered in XML.
oh, my dear moderators - this is funny! funny, i say! has no one a sense of irony? or are all your spam filters set so tight that no one here recognizes a thoughtful and clever parody???
Umm, yeah. It still sucks.
There are reasons why democracy does not work nearly as well as capitalism.
-- David D. Friedman
I guess if XML doesn't suck, I won't be able to look forward to my xml pocket-pussy or artificial "cron job"... Everyone needs a good cron job every so often.
Syntax can be, and has been, interoperable. The definitions of the telephone network, the Internet, email, and the Web are all bits-on-the-wire definitions of what you send back and forth, and they've all worked well enough to change the world. XML provides a nice set of syntax rules that you can stick in the face of a recalcitrant vendor and say "you claim to be interoperable? Well, ship me some XML then." And these days, they can't say no, and this is good for everyone.
quote from the article. the last sentence just wraps it all up nicely for you.
http://threading.2038bug.com/xml-answer.html
-paul
XML Has Internationalization Pretty Well Nailed
Well, not quite, if you consider the fact that localizing XML data sources requires either duplicating the file per locale (here comes maintenance) or using keywords instead of hard-coded values within the data source, thus completely abolishing the use of serialization/deserialization (here comes DOM/SAX parsing and a headache).
XML Can Represent Pretty Well Anything
And then come NEMI, RosettaNet and the rest of the lot and agree on what they believe to be the most gratifying way of representing that data, to allow for standardization of communication channels for B2B and B2C. Sadly, none of these groups agree on the standards, and most software manufacturers usually "extend" them or "invent their own". Now, if we could only get everyone to agree on the standards of communication using XML as a vehicle, then we'd all be in a world of good.
That being said, for programmers, when defining internal documents for application configuration, data retrieval, etc., it is possibly one of the easiest data stores to manage and maintain.
Furthermore, XML cannot directly contain information represented in any binary form (attachments, for instance) in any straightforward (and non-space-consuming) way, causing it to be a loose container.
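The usual workaround is to encode the bytes as text, at a cost in size. A minimal sketch with Python's standard base64 and xml.etree.ElementTree modules; the "attachment" element and its "encoding" attribute are hypothetical names, and the 256-byte payload is a stand-in for a real attachment.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes(range(256))  # stand-in for binary attachment data

# Sender: binary -> Base64 text -> element content.
attachment = ET.Element("attachment", {"encoding": "base64"})
attachment.text = base64.b64encode(payload).decode("ascii")
xml_bytes = ET.tostring(attachment)

# Receiver: parse the XML, then decode the text back to bytes.
decoded = base64.b64decode(ET.fromstring(xml_bytes).text)

# Base64 emits 4 characters for every 3 input bytes.
overhead = len(attachment.text) / len(payload)
print(f"{len(payload)} bytes became {len(attachment.text)} characters ({overhead:.2f}x)")
```

The round trip is lossless, but the encoded form is about a third larger before any compression, which is the space cost the comment above is complaining about.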
Well then Johnny, tell us why it is about oil. I'll point you to an article that explains why it isn't.
Read this
I read a few of your other posts, and you desperately need to become more informed before posting in the future. Reading this article is a start.