Tim Bray on the Birth of XML, 10 Years Later
lazyguyuk writes "Tim Bray posts a lengthy blog on the birth of XML, formalized as 1.0 in Feb 1998. 'XML is ten years old today. It feels like yesterday, or a lifetime. I wrote this that year (1998). It's really long. The title was originally Good Luck and Internet Plumbing but the filename was "XML-People" and I decided I liked that better. I never got around to publishing it, so why not now?'"
Young Buck: Hey, we have a data exchange problem between two systems, let's use XML!
Greybeard: Ok, but now you have 2 problems.
I want to delete my account but Slashdot doesn't allow it.
I realize XML is used for a lot of things, but whenever my fellow developers learn that the vendor is shipping us some interface in XML, the groans are audible. About half the time, their XML format isn't quite standard, and we've got to dig around for utilities to try and work with it (or write something custom). I'd say the vast majority of our interfaces are good ol' delimited text files.
For other purposes, XML is great and very readable, but I'm not sure it makes sense to use it everywhere.
Do you maintain a website? XML has been a godsend for those who want to maintain web and print output side by side. By keeping your data in an XML format, you can use simple XSL stylesheets to generate multiple types of output. See e.g. O'Reilly's XSLT Cookbook for dozens of very real-world examples (it's probably in your library).
That's just one example of how XML technology has made coding easier. I'm sure others will point out more.
If you aren't a developer, then I'm not sure XML was supposed to directly revolutionize your end-user experience.
I've recently taken a job at a primarily Java shop. After seeing XML used and abused for ant, maven and various other things I've grown even more disenchanted with it. And now I've also gotten the chance to see that not only does Java represent a poor trade off between the annoyances of a strongly typed language and the speed of a dynamic interpreted one, it has a horrible mess of dependency issues that nobody really solves besides.
I'm much more hopeful about technologies like Thrift and/or D-Bus than I ever was about such abysmal abominations as SOAP, or the only slightly better XML-RPC.
The Java XML world seems like this little closed ecology of mutual masturbators who all come up with more Java and XML 'solutions' to problems that never existed before they started using Java and XML.
I see the value of XML for long-lived documents that don't spend a lot of their life on the wire. And possibly for config files, though IMHO it is too ugly and unreadable for those. But as a general tool for Internet plumbing it's awful.
Need a Python, C++, Unix, Linux develop
Looks like you're going to have to wait a little longer. Try holding your breath, this time.
XML is like violence.. when it doesn't work, use some more!
And, of course, my post is incomplete without a reference to my little rant on why CORBA and other forms of RPC are bad. Both Thrift and D-Bus are pretty close to the ideal solution I describe later. They focus on message content over semantics and are extremely easy to parse. SOAP and XML-RPC fail on both of those counts. They are about semantics (you are making a remote function call that does some specific thing, not sending a hunk of data that has some particular content) over content and they are a huge pain to parse.
Perhaps I'm being too negative here. I sound like a troll. But really folks, do yourself and the rest of us a favor and read up on JSON and YAML. You'll see I'm being only too kind and generous to YAML.
Some drink at the fountain of knowledge. Others just gargle.
I do a lot of Java and XML. I don't know what you're using for a library, but I'd suggest JDOM.
As for the abuses for Maven and Ant... yeah. I'll agree. There are a lot of things that seem to use XML just because they can. I know there is some theory behind why they use them (machine readable, blah blah blah) but for most things it's just a giant pain for the complexity you get. Maybe if you were trying to build Windows with Ant.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Excellent point, and I'll take it one step further. When coupled with XSLT and other WS-* standards, you have an extremely flexible way to connect otherwise absurdly different applications (See Sun's OpenESB and JBI standard).
The hatred for XML, I think, stems from frequent, ugly misuse. Here's one basic, freakin' obvious rule: if a human, at any time at all, has to read or manually edit an XML document, you're doing it wrong. Just because it's ASCII doesn't mean it's human-compatible.
LaTeX is restricted to certain types of print output. It emphatically cannot output HTML easily. Just look at the umpteen thousand threads on comp.text.tex where someone complains that
Yay! Nothing like the combination of XML and Java to bring out the haters. Incompetent use of a language/API doesn't equate to a bad language/API. I can show you plenty of crappy C/C++ code freely browsable in some open source libraries. Does that mean C++ sucks? Hell no.
My experience with Java+XML you ask? OFX servers for financial institutions. Without name dropping, check out the list of banks, brokerages, tax services, and credit card providers (Quicken) out there successfully serving up client data. I guess we're all circle jerking while you're downloading your account information into Quicken or Money.
Some good uses for XML:
Some bad uses for XML:
I have to admit, I'm clueless about your Java dependency issues. The only way I can see that ever happening is if you're dumping all of your classes into the default top-level package; and that's major user error if you are.
I swear to God...I swear to God! That is NOT how you treat your human!
Kevin Smith on Prince
I'll take an Ant XML build file over an "is that a tab or a space" Makefile any day...
Xenon, where's my money? -Borno
I use it in web development constantly, and have for about 8 years. It's great for documents mostly, since it's much easier to process than a home-grown setup.
:-).
You want to transform the document, you can use any of a number of techniques, and trivially guarantee that the resulting document is at least syntactically valid. If you use a home-grown format (or HTML), you'll need to resort to regular expressions, or a custom parser - which works fine up to a point. Regexes are error prone (it's quite difficult, for instance, to make an untrusted HTML document safe with regexes), and parsing is difficult, and doesn't solve the transformation step very elegantly - whereas XPath and others are absolutely brilliant for quickly distilling the stuff you need from a document.
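To make the XPath point concrete, here is a minimal sketch using Python's standard library, which supports only a limited XPath subset. The document and its element names are invented for illustration and do not come from any real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for this example.
DOC = """
<library>
  <book year="1998"><title>XML Basics</title></book>
  <book year="2005"><title>XSLT Recipes</title></book>
  <book year="1998"><title>Markup Plumbing</title></book>
</library>
"""

root = ET.fromstring(DOC)

# One path expression pulls out the titles of every book from 1998.
# ElementTree understands attribute predicates such as [@year='1998'].
titles_1998 = [t.text for t in root.findall("./book[@year='1998']/title")]
print(titles_1998)  # ['XML Basics', 'Markup Plumbing']
```

Doing the same with regular expressions would mean handling attribute order, quoting, and whitespace by hand; here the parser has already dealt with all of that.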
But on the parsing side... take a look at ANTLR, it's just great
In general, if you have data to be structured and serialized, XML is one way to do it. If you think XML a poor choice, then could you suggest an alternative? Incidentally, that suggestion should not imply that everyone reinvent their own formats (again).
Would you provide evidence aside from personal anecdotes, and possibly consider evidence to the contrary?
Perhaps you meant “modern software” instead. Any complex application these days relies on dozens of libraries and services to perform tasks. Not quite sure where exactly you are having difficulties, so I cannot elaborate further.
XML is intended for consumption by machines first, people second. You might also argue that in-memory data structures are ugly and unreadable.
Yes. XML was formalized. It is strictly defined and easy to check for compliance (with the right tools). Only a little bit of the definition has passed out of common usage, mostly focused around DTDs.
... you get the idea. If a standard can't solve the problem, you can't count the lack of solution against it.
If you encounter a file that claims to be XML, but does not meet the XML standard, then it is not the XML standard that is to blame. The claim is wrong and the file is not XML.
XML is not a fuzzy-wuzzy adjective that can be applied willy-nilly to anything and magically turn it into "XML". It is not a marketing term or English Professor term. It is a rigidly specified engineer term for a document format, and a given document is XML if and only if it meets that format.
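The "if and only if" above can be checked mechanically, at least for well-formedness. A rough sketch with Python's standard library parser follows; the helper name `is_well_formed` is made up here, and the parser is non-validating, so it says nothing about conformance to a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>ok</b></a>"))    # True: properly nested
print(is_well_formed("<a><b>oops</a></b>"))  # False: mis-nested tags
```

A vendor file that fails this kind of check is simply not XML, whatever its file extension says.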
If someone wants to hack together a half-assed parser or emitter of any language, they will. I've seen half-assed XML parsers, I've seen half-assed JSON parsers, I've seen half-assed HTML parsers, I've seen half-assed YAML parsers, I've seen
Java is clearly moving away from the massive over-use of XML in everything from configuration to messaging. From Java 5 onwards, annotations are rapidly becoming the configuration mechanism of choice, where infrastructure configuration is placed in the source code directly, in a way that's significantly less obtrusive than writing code to manage things like persistence and transactions yourself, and significantly easier to follow than placing it in many XML files. Anyone who has migrated from EJB 2.1 to 3.0 for example should be much happier now that the various XML files needed to get it to run are going the way of the dodo. This use of annotations to replace XML is an emerging trend popular in many frameworks, from EE 5 through to Hibernate and Spring. On the messaging side there are a slew of code generation tools and XML-to-POJO (annotation-based) mappings that keep you away from raw XML - yes it's another layer of abstraction but it keeps you away from the coding horrors of SAX, DOM, and yes even the comparative simplicity of JDOM.
XHTML is one very small dialect of XML.
/etc/passwd is more legible and appropriate. And there are times when the volume of data requires binary. XML is good because it is widely known and when the originating application is lost, the data can still be (with moderate difficulty) understood.
When you are entering HTML-style markup tags, you are using XML. But XML is a much, much larger subject than that. Hand editing a website is fine. (If the documents are getting huge, they should be split into smaller files and automated somehow, anyway.) Hand editing, say, Open Office's XML format or any of the fairly arcane XML formats used for interprocess communication is not.
XML is sort of designed to be the second best data format for any application. There are a lot of times when something like
It's very similar to Java really. It got hyped for a specific web use that didn't really materialize, but its ability to be generic, widely-spoken, and safety-checked means it has found widespread use across the entire computer industry in places that aren't quite as visible to end-users as simple web application or document formats.
In Capitalist America, bank robs you!
Here is another obvious rule: If a computer, at any time at all, has to parse or generate XML in large amounts, you are doing it wrong. There is really no need to resend the same string 100000 times, encode multi-megabyte binary data as BASE64 or lose floating point precision by encoding to or from strings. If need be, an efficient binary format can represent the data with an arbitrary schema. Communicating parties can exchange their schemas at runtime and avoid sending attributes that the other end is not going to use.
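Two of those costs are easy to measure. The sketch below uses an arbitrary 3 MB payload and an arbitrary float, both chosen only for illustration. Note that whether precision is lost depends on how many digits the writer emits: a full `repr` round-trips in Python, a truncated decimal does not.

```python
import base64

# Arbitrary 3 MB binary payload (all zero bytes), for illustration only.
payload = bytes(3_000_000)
encoded = base64.b64encode(payload)

# Base64 emits 4 output bytes for every 3 input bytes.
print(len(payload), len(encoded))  # 3000000 4000000

# Floats written with a fixed number of decimals do not round-trip.
x = 0.1 + 0.2
short = f"{x:.6f}"            # what a careless serializer might write
print(float(short) == x)      # False: digits were dropped
print(float(repr(x)) == x)    # True: the full repr survives the trip
```

So the Base64 overhead of about a third is inherent to embedding binary in text, while the float problem is a serializer bug that text formats make easy to commit.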
Does anyone still use latex2html? All of the TeX users I know who care about HTML output switched to tex4ht years ago. It produces a variety of XML formats, including XHTML (with MathML) and OpenDocument.
I am TheRaven on Soylent News
This is why regular expressions are typically used for lexical analysis (tokenisation) not syntactic analysis (parsing).
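As a rough illustration of that division of labour, the sketch below uses a regex only to find tags and leaves the nesting check to an explicit stack. The tag pattern is a deliberate simplification invented for this example: it ignores comments, CDATA, processing instructions, and attribute values containing `>`, so it is nowhere near a real XML parser.

```python
import re

# Lexical step: recognise simplified start, end, and self-closing tags.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")

def tags_balanced(text: str) -> bool:
    """Syntactic step: check tag nesting with a stack of open tag names."""
    stack = []
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                      # <br/> opens and closes at once
        if closing:
            if not stack or stack.pop() != name:
                return False              # end tag without matching start
        else:
            stack.append(name)
    return not stack                      # leftover open tags are unbalanced

print(tags_balanced("<a><b/><c>x</c></a>"))  # True
print(tags_balanced("<a><b></a></b>"))       # False
```

The regex never has to count anything; the unbounded nesting depth, which is what a regular language cannot express, lives entirely in the stack.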
No, you cannot with a regex. If you can, it's not really a regex, it's something different.
Please, for the good of Humanity, vote Obama.
So you're the guy who shits tabs in random places in source files, because you haven't figured out how to set up your editor to show you the difference. Please stop doing that. Tabs and spaces are different characters, even if the language you're using today treats them the same. If you're a VIM user, please learn to use "list" and "listchars."
How is this insightful? Yes, from a strictly comp-sci definition of a "regular expression", you are exactly right. But this is not a comp-sci class and this is not a theory lesson! In the real world where real programmers write real (crappy) code, a parser that parses only regular languages is not very useful. All modern regex parsers handle more than just regular expressions - back referencing, depth parsing, lookahead/lookbehind are all common features of modern regex engines that violate the rules of parsing a "regular language" using a simple memory-less DFA/PDA state machine. Real regex parsers use (GASP) *memory* to do their parsing. So, while you wallow in semantics and theory, people out there are doing real (and granted silly) things with regex parsers because they can. For the purpose of this discussion, the original poster is right that it is possible (though incredibly unholy) to determine well-formed-ness of XML via a modern regex parser even though XML is not a regular language.
It Depends. We have systems that are arranged in a long content chain. One machine sends data to the next machine, maybe by pull, maybe by push. Next machine does ... something ... with it, passes it to next machines. Maybe the developers talk to each other, or remember why their predecessor made the system do that, or maybe they don't. XML is really Just The Thing for the job. And the fact that it can be tweaked by a human (e.g. the sysadmin who has to fix a broken thing) is fantastically useful.
http://rocknerd.co.uk
The answer to one particular parsing stupidity is not to introduce an altogether different set of parsing stupidities to fix it. XML is not a programming language, and making it into one is a pretty distressing and contorted thing to do.