What Is XML And Why Should I Care?
Anonymous Coward asks: "I've been reading a lot about XML, and I know Slashdot uses it for some features, but I haven't found a site or tutorial that gives clear, good examples of it. There is a lot of software for Windows for developing XML applications; however, I have only seen XML parsers for Linux (no applications). Many of the tutorials about XML on the Web are not that good. It's all abstract. I am looking for good examples..." This may be a FAQ, but I too would like to see the "Layman's Description of XML and Why It is Cool."
It's one of those things where, once you start using it, you really see how significant it is. Let me go over some high points:
1. The web has shown us how useful plain text is as a mechanism for communication. These days, essentially anyone or anything can read simple text. It is ubiquitous (I will use this word again).
2. When two things need to communicate, they need to establish a method of communication. In the annals of the computer industry, many forms of communication have been "one-off" plain-text mechanisms. Think flat files. Think fielded files (COBOL copylibs, anyone??? ARRRGH!!!). Think comma-delimited, tab-delimited, etc. XML is essentially a contender in this arena, and it happens to be a better one.
3. XML is a better mechanism for many reasons.
a. It represents hierarchical data well (this is a key piece). It is difficult to effectively represent "has-a" relationships in a tab-delimited file (customers have orders, orders have items, items have descriptions, etc.).
b. It has built-in rules that third-party tools called parsers check for you: without writing any parsing code yourself, you can verify that an XML document is *syntactically* correct (well-formed). Think about how important this is when communicating between two systems. Knowing that the data is syntactically correct before you even touch it simplifies things a great deal.
c. It is human-readable. Tags are meant to be self-describing, so you can look at an XML document and have a clue what the data in it represents.
4. When combined with ubiquitous (to use that word again) protocols like HTTP over TCP/IP, which most systems support today, XML becomes an extremely effective form of communication between two arbitrary systems. The operating systems, hardware platforms, and underlying architectures of the two systems become completely irrelevant, because the form of communication is so trivial to use.
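To make point 3a concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The customer/order/item data and the summarize helper are hypothetical, made up purely for illustration. The nesting carries the "has-a" relationships directly, so no extra key columns are needed to tie an item back to its customer.

```python
import xml.etree.ElementTree as ET

# Hypothetical order data: customers have orders, orders have items.
DOC = """<customers>
  <customer name="Acme">
    <order id="1">
      <item sku="A1" qty="2">Widget</item>
      <item sku="B7" qty="1">Sprocket</item>
    </order>
  </customer>
  <customer name="Initech">
    <order id="2">
      <item sku="C3" qty="5">Stapler</item>
    </order>
  </customer>
</customers>"""

def summarize(xml_text):
    """Walk the customer -> order -> item hierarchy and total item quantities per customer."""
    root = ET.fromstring(xml_text)
    totals = {}
    for customer in root.findall("customer"):
        qty = 0
        # iter() visits every <item> nested anywhere under this customer.
        for item in customer.iter("item"):
            qty += int(item.get("qty"))
        totals[customer.get("name")] = qty
    return totals

print(summarize(DOC))  # {'Acme': 3, 'Initech': 5}
```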
*Obviously* building a system based on XML is no small matter. *Obviously* XML is not the end-all be-all of the computing world. *Obviously* XML is not going to cure cancer.
But it is *really* cool...
XML suffers from the same problem that a lot of "popular" technologies suffer from: overhype. XML has a lot of potential to change the way you move data around. You can share data between totally different applications. You can post an XML version of your headlines (as sites like Slashdot, LWN, Freshmeat, et al. do) and have other sites snarf them to list the headlines. It's a desert topping and a floor wax. You may not believe it, but it will cure your asthma, too.
OK, enough hype.
XML is a data description standard that relies on pairs of tags, which are enclosed by < and >. These tags can be nested, and this nesting represents a hierarchy. For example:
<foo>Hello!</foo>
Here, the foo tag has a value of "Hello!". I could just as easily have written the same thing using an attribute:
<foo value="Hello!" />
This tells us the same thing. Here is a nested tag:
<foo>
  <bar>Hello!</bar>
</foo>
So, bar lives inside foo, and has the value "Hello!". Who cares about all this stuff? Why does this matter? Glad you asked.
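Here is a short sketch of how a program actually reads those three forms, using Python's standard xml.etree.ElementTree (the choice of Python is ours; any parser would do). It uses "Hello!" as the value in every snippet. Note that element text and attribute values are fetched differently, which is one practical consequence of choosing one style over the other.

```python
import xml.etree.ElementTree as ET

# The value as element text, as an attribute, and inside a nested tag.
as_text = ET.fromstring("<foo>Hello!</foo>")
as_attr = ET.fromstring('<foo value="Hello!" />')
nested = ET.fromstring("<foo><bar>Hello!</bar></foo>")

print(as_text.text)             # Hello!
print(as_attr.get("value"))     # Hello!
print(nested.find("bar").text)  # Hello!
```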
Basically, what it means is that I can take my data in whatever format I keep it in (whether it be text files, HTML files, PGP-scrambled MD5 hashes, or even something really stupid like an Access database), convert it to XML, and it becomes easily usable by other programs.
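As a rough illustration of that conversion step, the sketch below turns a comma-delimited export into XML with Python's csv and xml.etree.ElementTree modules. The sample data, the csv_to_xml helper, and the default tag names "records"/"record" are all hypothetical choices for this example, and it assumes the CSV header names are already valid XML tag names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical comma-delimited export from some other program.
CSV_DATA = "id,title\n1,XML explained\n2,Parsers compared\n"

def csv_to_xml(text, root_tag="records", row_tag="record"):
    """Turn each CSV row into an element whose children are named after the columns.

    Assumes the header names are valid XML tag names; ElementTree does not check this.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")

print(csv_to_xml(CSV_DATA))
```

Any other program with an XML parser can now read the result without knowing anything about the original comma-delimited layout.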
How, you ask? For a document to be XML at all (what the spec calls "well-formed"), it has to meet some pretty stringent requirements, such as that all tags must be closed and properly nested. In addition, you can define your data types in advance with a DTD (Document Type Definition); a document that conforms to its DTD is called "valid". Some standard XML document types are RSS (Rich Site Summary, a Netscape-induced standard that lets you describe a site's contents; this is what Slashdot uses) and CDF (Microsoft's Channel Definition Format, used for their failed push technologies).
Yeah, great. Contrary to what you may be reading, XML is not revolutionary; XML is not earth-shattering; XML is not new. XML is a good idea that just happens to have a lot of people, and therefore a lot of momentum, behind it.
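A minimal sketch of the "stringent requirements" in action, using Python's standard xml.etree.ElementTree; the is_well_formed helper is a hypothetical name for this example. Be aware that this checks well-formedness only: ElementTree does not validate a document against a DTD, which needs a validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, the parser's error message).

    Checks well-formedness only (tags closed and properly nested, etc.);
    no DTD validation is performed.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

print(is_well_formed("<foo><bar>Hello!</bar></foo>"))  # (True, None)
print(is_well_formed("<foo><bar>Hello!</foo></bar>"))  # (False, <parser error message>)
```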
How do I use it? Well, the first thing XML requires is a parser. A parser (usually) reads in the XML, turns it into some sort of parse tree, and then outputs it in some format your target application finds useful. There are many parsers out there, many written in Java, but of course also in C, Perl, Python, Tcl, and others. Slashdot uses the Perl XML::RSS module to slurp in headlines from the 8 zillion other sites that make up the slashboxes on the right of the page.
The story of XML is the story of potential. There is tons of it: potential to share data among applications and among businesses. But you still have to do much of the work. Premade solutions tend to be either special-purpose (such as Perl's XML::RSS) or very general-purpose (like the expat parser, written in C, which you plug into your application to parse XML). Most of the work still needs to be done by the programmer in question; XML provides a framework for data sharing.
This framework is defined entirely by the developer who controls the data. While you can use predefined DTDs if you want, you are not at all obligated to. Recently /. ran a review of "DocBook: The Definitive Guide"; DocBook is an example of a premade DTD for technical writing and documentation. But everyone's data is different, so your own DTD can reflect your data exactly, without you having to modify it to fit someone else's schema.
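The parse-then-output cycle can be sketched in a few lines. This is Python with the standard xml.etree.ElementTree, not a port of Perl's XML::RSS; the tiny feed, its example.com links, and the headlines helper are hypothetical, and real RSS feeds carry many more elements than this.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS-style feed.
FEED = """<rss version="0.91">
  <channel>
    <title>Example News</title>
    <item><title>XML explained</title><link>http://example.com/1</link></item>
    <item><title>Parsers compared</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def headlines(feed_text):
    """Parse the feed into a tree, then output (title, link) pairs for display."""
    root = ET.fromstring(feed_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in headlines(FEED):
    print(f"{title} - {link}")
```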
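Python happens to ship a binding to the expat parser in its standard library (xml.parsers.expat), so the "plug it into your application" style can be sketched directly. expat does not build a tree; it calls handler functions you supply as it reads the document, and your code decides what to keep. The count_tags helper below is a hypothetical example that simply tallies start tags.

```python
import xml.parsers.expat

def count_tags(xml_text):
    """Count start tags using expat's event-driven (callback) interface."""
    counts = {}

    def start_element(name, attrs):
        # Called by expat once for every start tag it encounters.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return counts

print(count_tags("<foo><bar>Hello!</bar><bar/></foo>"))  # {'foo': 1, 'bar': 2}
```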
XML is only part of the story; it describes the data itself and says nothing about how the data should be presented or linked to other data sources. Those other parts have their own languages: XSL (eXtensible Stylesheet Language) and XLL (eXtensible Linking Language). There are tons of X_L languages (eXtensible Query Language (XQL), anyone?) designed to fill in the various gaps.
Microsoft, for all their faults, has been doing a lot with XML lately. They are moving the native formats of their Office suite to XML; there's the CDF I mentioned earlier; and they developed a business-to-business language called BizTalk (which is just a DTD and some assorted supporting programs, parsers, etc.). IBM has also done a great deal with XML and Java, producing parsers and translators.
Hope I didn't ramble or jump around too much.
darren
Cthulhu for President!
(darren)