Inside XML
The Scoop
People love it, but XML won't save the world. If properly applied, it will improve the transfer of information between different individuals, platforms, and programs. A language that describes languages, XML in the real world has spawned hundreds of applications. In Inside XML, Steven Holzner attempts to make sense of the basic principles and more popular implementations as things stand right now.
What's to Like?
Holzner's caught platform independence fever, and he imparts a healthy sense of respect for W3C standards to his readers. While the current state of XML handling, especially in web browsers, is mediocre at best, he varies platforms when possible. Though most examples use IE on Windows, the author occasionally examines offerings from Mozilla and IBM.
The book's strength is describing a technology. The first five chapters explore XML's essential concepts, including DTDs and schemas, in as good an explanation as you'll find anywhere. Later chapters cover XSL (used to format and to transform documents), XHTML (the successor to HTML), CSS (governing the presentation of XML and XHTML documents) and RDF and CDF (to describe available resources) in sufficient detail. The explanations here are good, with accurate information and plenty of examples.
Java programmers will appreciate the extended descriptions of the DOM and SAX parsing styles. Though the examples themselves are in Java, most concepts translate fairly well to other languages. JavaScript also gets some attention, mostly in the confines of IE5.
What's to Consider?
Though the cover blurb claims otherwise, most programming examples use Java. Perl earns a brief 13-page treatment, while ASP and Java Servlets share just eight pages in the same chapter. Exotic languages like C and C++ are conspicuously absent. A detailed description of the DOM and SAX approaches would benefit everyone, not just Java hackers.
This massive tome could have stood another round of editing. Many examples run up to a page and a half in length when only two to four lines have changed from the previous listing. Other material is arguably filler, such as four and a half pages of JavaScript events supported in IE, or fifteen pages detailing XML DOM objects and associated methods before giving a single example of DOM usage. The publisher could have cut between 100 and 200 pages, instead adding footnotes to authoritative sites.
Worse yet, the book's organization is questionable. After describing the basics of XML, it veers off into a 50-page JavaScript tutorial. Java soon suffers the same fate. These chapters break the flow of subjects, use no XML in their examples, and should be appendices. (They're decent, as far as tutorials go. They just don't belong in the middle of the book.) Readers will have difficulty finding useful reference material mixed in with tutorials.
English majors will also find Holzner's transitions awkward. Logical sections often conclude with a phrase such as "Now I will talk about the topic named in the heading immediately following this sentence." XML is not a serial radio cliffhanger, and most readers can find their way down the page by themselves. It occurs often enough to be distracting.
The Summary
Besides the reservations above, most of the information is solid and usable. Inside XML is at its best when describing technologies instead of how to work with them. Uneven presentation hinders (not hobbles) the book, making it a better introduction than a definitive guide. Though the book falls short of its claims, cautious readers will learn plenty.
Table of Contents
- Essential XML
- Creating Well-Formed XML Documents
- Valid XML Documents: Creating Document Type Definitions
- DTDs: Entities and Attributes
- Creating XML Schemas
- Understanding JavaScript
- Handling XML Documents with JavaScript
- XML and Data Binding
- Cascading Style Sheets
- Understanding Java
- Java and the XML DOM
- Java and SAX
- XSL Transformations
- XSL Formatting Objects
- XLinks and XPointers
- Essential XHTML
- XHTML at Work
- Resource Description Framework and Channel Definition Format
- Vector Markup Language
- WML, ASP, JSP, Servlets, and Perl
- The XML 1.0 Specification
You can purchase this book at FatBrain.
This book has served me well.
It was a huge advantage, but not because XML is some amazing breakthrough, no, just because it's a standard meta-syntax. So we can re-use the XML parser on each data source rather than having to write a whole new parser for each data source, which is the way that sort of thing used to go.
So if MapQuest started offering a new data service that we subscribed to, like driving directions for flying cars, it would be very helpful if they offered the data served from their servers in XML -- just because we are already set up to use XML.
So the whole magic of XML is just that it's standard and flexible. That makes it highly worthwhile.
I have my doubts about other XML related subjects like XSL and XHTML, which may not ever get hugely popular. But XML itself is already hugely popular -- behind the scenes where you may never notice it, busily exchanging data with remote servers.
Professional Wild-Eyed Visionary
This usage predates XML. Many compilers are written this way, in fact. One could just as easily substitute any intermediate format for XML and get the same advantages.
The main advantage of XML is that it gives you not a common document format (since DTDs differ) but a common syntax, so you can use a common parser. That's a win, but it's not the cure for all the world's problems.
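The "common syntax, common parser" point can be sketched in a few lines of Python. This is only an illustration: the two feeds and their tag names are made up, and the standard library's ElementTree stands in for whatever parser you actually use. The same function reads both documents even though their vocabularies have nothing in common.

```python
import xml.etree.ElementTree as ET

# Two made-up feeds with unrelated vocabularies (both hypothetical).
DIRECTIONS = "<route><step n='1'>Head north</step><step n='2'>Turn left</step></route>"
LISTINGS = "<guide><show time='20:00'>News</show></guide>"

def leaf_texts(xml_text):
    """Parse any well-formed XML and return (tag, text) for every leaf element."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if len(el) == 0]

print(leaf_texts(DIRECTIONS))
print(leaf_texts(LISTINGS))
```

What the parser cannot tell you is what a "step" or a "show" means; that part still has to be agreed on between sender and receiver.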
[This is on-topic, honest...]
I don't know if it is just me, but does anyone else here deal with that "interesting" data representation language "ASN.1" (Abstract Syntax Notation 1) with its associated binary representation BER (Basic Encoding Rules)?
ASN.1 defines a textual language for the representation of named primitive data types (strings, integers, real numbers, bitstrings, etc.) and structured ways of grouping them together (sequences, sets, etc.).
BER provides a machine-independent way for these to be represented 'on-the-wire'. These representations are also arbitrary-precision and have other cool features.
Fortunately or unfortunately (depending on your perspective and religion with respect to 'ISO/OSI' standards) ASN.1 never really caught on in the Internet world (SNMP and LDAP permitting).
However, I see it as playing a very similar role as XML. (Machine-independent representation of arbitrarily complex structured/nested data).
The main difference is that the BER representation of ASN.1 is a (somewhat complex) binary format, whereas XML is text-based.
This is both an advantage and a disadvantage (more compact, harder for humans to read).
I'm wondering whether it is worth anyone's time writing up a BER-to-XML translator to attack those 'but XML is too verbose' criticisms...
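To put rough numbers on the compactness point, here is a minimal Python sketch that encodes one integer both ways. It is not a real ASN.1 library: it handles only non-negative INTEGER values with short-form lengths, and the XML tag name is invented for the comparison.

```python
def ber_integer(value):
    """Encode a non-negative int as a BER INTEGER (tag 0x02, short-form length)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    size = value.bit_length() // 8 + 1  # one extra bit of room keeps the sign bit clear
    if size > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x02, size]) + value.to_bytes(size, "big")

def xml_integer(value, tag="n"):
    """Encode the same value as a tiny XML element (the tag name is made up)."""
    return f"<{tag}>{value}</{tag}>".encode("ascii")

ber = ber_integer(1000)
xml = xml_integer(1000)
print(ber.hex(), len(ber), "bytes")
print(xml.decode("ascii"), len(xml), "bytes")
```

The binary form is shorter, but you need the tag table in your head to read it, which is exactly the trade-off described above.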
Maybe I will write a Perl program to post that comment to Slashdot about every article that comes up. I'll call it "first_post.pl" It will do constant HTTP GET's of the webserver, and post that comment right away whenever there is a new article. I will be the first-post king!
of course it isn't. No language does EVERYTHING you want it to do as easily as you want it to. You have to put work into it. XML is a useful language just like every other language.
It really irks me when people say that about "X", each language has something that it does especially well, XML has its place...
Several of our products use XML as a scripting control language and for parameter management and report generation. It's not the core technology, but it is useful. It is not necessarily the best-featured way of performing these functions, but it is a sound choice from a life-cycle maintenance point of view. We need things likely to be around for ten years.
Well, I'm currently reviewing O'Reilly's Learning XML, but I'd say that Inside XML still seems to be a better book for beginners, mainly because you can continue on with this book and increase your knowledge once it teaches you the basics, whereas advanced topics like XPointers and XLinks are pretty lacking in the O'Reilly book. O'Reilly's XML in a Nutshell is good, and if you choose Learning XML, it's pretty much a necessity (if you're into that fancy book learnin') because of Learning XML's lack of advanced topics.
Another book that I'd recommend for beginners just below Inside XML is the second edition of Just XML. It's not as thorough as Inside XML, but it still manages to delve into quite a bit, like XLinks and XPointers (and covers them well for beginners, slowing down for parts that the author knows his readers will have trouble wrapping their heads around). The cool thing about it is the author's very approachable style, which makes for a very quick read (plus, there's a lot of anecdotal fluff that you can skip if you're not up for being amused), and the best part is that throughout the book, you're working toward building a B-movie database. The hands-on approach is nice, as I know a lot of XML newbies are left thinking, "But what can I use this stuff for?" Do make sure that you get the second edition. Again, I've just started Learning XML, but it's not seeming like it'll top either of these two. (Not that I think it's a poor book.)
Stay away from O'Reilly's XML Pocket Reference. It's too old. In the next month or two, they're coming out with a second edition of it, which I would expect to be a great pickup for the price. Wait 'til then.
Cheers,
Uh, scotch that. "flying buttrice" won't fly as a tag name, as one can clearly see from the spec. The BNF (buttrice-naming form) clearly states that whitespace separates a tag name from an (optional) list of attributes or the final ">". Nobody wants a malformed buttrice. ;-)
Babar
XML does offer great portability of content across devices. IBM uses it to provide content to their website, WAP site, Bluetooth devices, etc., using XML, XSL, XSLT, Apache's Cocoon, and numerous other additions. The dynamic generation of PDFs is always nice too.
I think that one can offer great services processing XML data, but I can't seem to find a whole lot of it. As an example, some online TV guide could provide the information on what's on at what time simply in XML. I could retrieve that, add links to IMDB or whatever... Right now, one would have to download (messy) HTML, which is a pain to parse and likely to change its structure with the next site facelift (of whoever provides it).
Is there a repository of XML data? A list of links, maybe?
There are virtual machines for almost every platform. Some of them (esp. Kaffe) are free _and_ very portable.
That's what I should have said, "gives a common syntax".
Best Slashdot Co
At DMSO they are using XML for communication between different model systems. The XML defines a common format. Instead of having to support filters to convert each doc type to each other doc type, you just have to be able to convert each to and from XML. With n doc types, converting directly between every pair takes n(n-1)/2 filters, which grows roughly as n^2: 3 filters for 3 types, but 45 for 10. With XML in the middle you need only n filters, one per doc type. That assumes each filter is two-way (from/to); if filters are one-way, to or from only, double both numbers. This is what XML is really designed for: to get a common document format.
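The filter arithmetic is easy to check with a short Python sketch. The function names are made up for illustration; the counts follow from the standard pairwise formula, not from anything DMSO published.

```python
def direct_filters(n, two_way=True):
    """Filters needed to convert directly between every pair of n doc types."""
    pairs = n * (n - 1) // 2
    return pairs if two_way else 2 * pairs

def hub_filters(n, two_way=True):
    """Filters needed when every doc type converts only to and from XML."""
    return n if two_way else 2 * n

for n in (3, 5, 10, 20):
    print(n, "types:", direct_filters(n), "direct vs", hub_filters(n), "via XML")
```

With only three doc types the two approaches tie, so the saving shows up once the number of formats grows past that.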
Best Slashdot Co
Wrox Press has a really good book on XSLT, called XSLT Programmer's Reference. Of course the key to XML is the Apache project's parser and transformer libs over at http://www.apache.org
Someone you trust is one of us.
Or don't, since arguably, you're the only person/company that'll ever use that data. Just send the data.
XML describes the syntax only, not the content. I would argue that the syntax is never the important part of the message. Hence, regardless of whether the reader can figure out the syntax of the XML encoded data that describes itself (woo.), your reader would still have to be privy to information about what to do with it.
Since the bulk of your data is content, and the bulk of what must be done with it cannot be described through XML and syntax descriptions, then the bulk of your work will be writing your reader, and your writer. Whom does it aid in that case that the data is in the almost human unreadable XML format, as opposed to being either more readable or more compact as per your inevitable specifications?
-Daniel
don't believe the hype - zootv [U2]
Ignoring the minor problem that ASN.1's binary representations are the ugliest thing since Intercal or maybe PGP and that people who try too hard to steal bits should be locked up in padded rooms, and that in spite of its ugliness it still offers incompleteness and ambiguity, the important difference between the two standards is that ASN.1 is a top-down centrally-controlled standard where anybody who wants to define an object type has to either negotiate with a committee to get namespace or buy a hunk of private vendor namespace, while XML is decentralized and anybody can define any object type they like that doesn't start with [Xx][Mm][Ll] and propagate the definitions. The good part about centralization is that it's unambiguous, doesn't lead to conflicts, and reduces portability problems, but it's slow, cumbersome, and often not worth the bother (though the slowness and cumbersomeness do encourage you to get the design right before going through the pain of registration.) Decentralized groups who want to do reusable interoperable XML DTDs still have to negotiate namespace with each other, but you can resolve much of that with naming conventions like FooProjectWidget1 instead of just calling your object Widget1, so you only need to discuss with other FooProject makers what kinds of widgets you need.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
I agree that XML does have some similarities to Unix tab-delimited text files, but I think a better analog is SQL databases - the database is a bunch of tuples, and the database schema is also a bunch of tuples.
Bill Stewart
any XML book that's any larger than a post-it note, to largely be filled with useless information on unrelated topics.
My "XML Book" is mainly double-sided prints of TR's from the W3C site in a ring-binder. It recently spilled over into a second 2" thick binder. I use all of this stuff on a regular basis - I also have most of it available as off-line webbage on my laptop.
Do you understand HTML tags
There's more to it than that. If you said SGML, then you might be closer to it.
Try this - Is a "naked ampersand" (i.e. not &amp;) valid HTML? Is it valid XML? If anyone still reads Usenet, and the webmastering groups, then follow the recent thread in there on just this subject (where I had my butt spanked for getting it wrong). It's not as simple as you think, and it's not all the same as HTML.
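The XML half of that question is easy to test. In this small Python sketch (the element name is arbitrary, and the standard library parser is used as a stand-in for any conforming XML parser), the unescaped ampersand is rejected as not well-formed, while the escaped form parses and comes back as a plain "&". It says nothing about the HTML half, which is where the argument gets murky.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<firm>AT&T</firm>"))           # naked ampersand
print(is_well_formed("<firm>AT&amp;T</firm>"))       # escaped ampersand
print(ET.fromstring("<firm>AT&amp;T</firm>").text)   # what the application sees
```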
It saddens me whenever I see people talk about XML as if it is "just a markup language". While this is true, it is only a small part of what XML is "really" about.
Let's face it. When Java was introduced, its goal was to have applets running in your home refrigerator and toaster, to name a few -- this was the basic goal of Java. With the arrival of the browsers, this goal was extended to computers so that you can write your program once and it will run on any computer (the "write once run everywhere" slogan.)
While the underlying principle of Java is very powerful, achieving it is hard. The main reason is that you need a JVM, without which the idea of run "everywhere" is useless.
While for a full-fledged PC this is not much of a problem (almost every flavor of OS out there today has some version of a JVM), this is a serious problem for new devices as well as for companies that want to use Java. Here is why:
1) In order for Java to run on a new device, a JVM must exist for it.
2) In order for an existing application to run in a Java environment (using a JVM), the application must be rewritten in Java.
Sure, those issues can be addressed, but in doing so you end up making the new device and the new language "tightly coupled" to Java and the JVM. In a client/server environment, this is a limiting design, because it means the client part must be bound to the server part.
XML frees you from this "tight coupling". All that I need to do is publish my Schema using XML, and any application written in any language running on any device can now communicate with my application. So if my application provides a service to process patient records and I publish the API to my server application via a Schema and XML-SOAP, then the client can get my service by simply adhering to my Schema -- the client can be written in any language and run on any device.
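As a rough sketch of the patient-record idea, the Python below builds a message on the "client" side and checks it on the "server" side. Everything specific here is an assumption made for illustration: the element names, the required-field list standing in for a published Schema, and the hand-rolled check in place of real Schema validation or SOAP. The only thing the two sides share is the agreed message structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names standing in for a published schema.
REQUIRED_FIELDS = ("id", "name", "birthDate")

def build_request(patient_id, name, birth_date):
    """Client side: any language that can emit this text would do."""
    root = ET.Element("patientRecord")
    ET.SubElement(root, "id").text = patient_id
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "birthDate").text = birth_date
    return ET.tostring(root, encoding="unicode")

def handle_request(xml_text):
    """Server side: reject messages that do not follow the agreed structure."""
    root = ET.fromstring(xml_text)
    if root.tag != "patientRecord":
        raise ValueError("unexpected root element: " + root.tag)
    record = {}
    for field in REQUIRED_FIELDS:
        value = root.findtext(field)
        if value is None:
            raise ValueError("missing field: " + field)
        record[field] = value
    return record

message = build_request("p-001", "Jane Doe", "1970-01-01")
print(message)
print(handle_request(message))
```

A client written in Perl or C could produce the same text and the server would not know the difference, which is the decoupling being argued for.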
In short, XML is all about the "data exchange protocol" -- the communication between two XML-enabled applications happens at the data-encoding level. This is the power of XML, and it is something an API- or SDK-based solution can't offer.
So from now on, stop thinking about XML as "just a markup language" -- XML is a new way in which applications (and soon, components) will start communicating. The future of programming is based on data communication, not an API or SDK or a language.
-- George
---------------
Sig
abbr.
Karma stuck at 50? Add 2-5 inches.. err.. 2-5x Karmas Count to your pen1es.. err.. Karma all naturally and private
I think you mean it is not a programming language.
XML = eXtensible Markup Language
It is a language. It has a defined syntax and semantics.
take a triptonica to subthunk
Years after the virtues of XML were first extolled XML still isn't the do-all, be-all wonder we were led to believe.
GNOME wasn't built in a day.
http://www.google.com/search?sourceid=navclient&q=xml+tutorial
No, Thursday's out. How about never - is never good for you?
However, I don't think we need an 8 foot tall case of books on it at Borders.
XML Certification Test:
Question 1: Do you understand HTML tags?
Answer: Yes.
Result: You're certified.
Anyways, you'd have to expect any XML book that's any larger than a post-it note, to largely be filled with useless information on unrelated topics.
But I do hope your little XFL thing lasts. I believe that at long last the US soul has been mirrored in sport: a perfect mix of aggression, vanity, and reductive dualism that reflects everything that Americans hold dear.
We thieves, we liars, we vandals, and poets. Networked agents of Cthulhu Borealis.
Aside from what the other guy pointed out about your example, XML provides for much more than just simple x=y type data passing. The beauty is in the hierarchies that can be set up to make data passing so much easier.
What if you're transmitting results from a database query in plain text? With your example, you might say, "put all data for one row on one line." Then what happens if some of the data is multi-line? Well, you have to escape that data somehow.
With XML you can define a "row", and inside a row you can define a "column", and each value in your columns can have well-defined types.
So, by all means use
KEY1="value1"
KEY2="value2"
but if you want something more robust for passing data across standard implementations in many languages, use XML.
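Here is a small Python sketch of the row/column idea, using made-up query results that include a multi-line value and an ampersand. The element and attribute names ("result", "row", "column", "name") are illustrative choices, not any standard. The point is that the serializer and parser handle the awkward characters, so no home-grown escaping rule is needed.

```python
import xml.etree.ElementTree as ET

# Made-up query results; one value spans two lines, another needs escaping.
rows = [
    {"id": "1", "note": "first line\nsecond line"},
    {"id": "2", "note": 'contains "quotes" & an ampersand'},
]

def rows_to_xml(rows):
    """Wrap each row in a <row> element with one <column> child per field."""
    root = ET.Element("result")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for name, value in row.items():
            col = ET.SubElement(row_el, "column", name=name)
            col.text = value
    return ET.tostring(root, encoding="unicode")

def xml_to_rows(xml_text):
    """Rebuild the list of row dictionaries from the XML text."""
    root = ET.fromstring(xml_text)
    return [
        {col.get("name"): col.text for col in row_el.findall("column")}
        for row_el in root.findall("row")
    ]

encoded = rows_to_xml(rows)
print(encoded)
print(xml_to_rows(encoded) == rows)
```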
"And like that
The problem with tabular data is that it doesn't let you represent a lot of information in a convenient form: people often do need and want hierarchical/tree-structured data. And tabular data in UNIX isn't self-describing. Configuration files, package descriptions, bibliographic citations, etc. all need fairly complex descriptions.
There are many ways of representing tree structured data. XML wouldn't be my favorite, but it is workable. And XML is getting a fairly complete set of tools for dealing with tree structured data: search, extraction, restructuring, etc.
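For a taste of that search-and-extraction tooling, here is a short Python sketch over a made-up package list. The package names, versions, and document layout are invented for illustration, and ElementTree's limited path syntax stands in for fuller tools such as XPath or XSLT processors.

```python
import xml.etree.ElementTree as ET

# A made-up package list; names, versions, and structure are illustrative only.
PACKAGES = """<packages>
  <package name="curl"><version>7.7</version><requires>zlib</requires></package>
  <package name="gzip"><version>1.3</version></package>
  <package name="rpmfind"><version>1.6</version><requires>curl</requires><requires>zlib</requires></package>
</packages>"""

def package_names(root):
    """List every package name in document order."""
    return [p.get("name") for p in root.findall("package")]

def packages_requiring(root, dependency):
    """Find packages with a <requires> child whose text equals the dependency."""
    path = f"package[requires='{dependency}']"
    return [p.get("name") for p in root.findall(path)]

def versions(root):
    """Map each package name to the text of its <version> child."""
    return {p.get("name"): p.findtext("version") for p in root.iter("package")}

root = ET.fromstring(PACKAGES)
print(package_names(root))
print(packages_requiring(root, "zlib"))
print(versions(root))
```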
With this, maybe the Linux community will pick up some of that old UNIX spirit again. Today, the habit seems to be that when anything needs to get done on Linux, someone writes a big Gnome or command line program in C, or, on a good day, writes a monolithic Perl program. For example, something like "rpmfind" should really just be a collection of a few command line tools: something like the Xerces tools for extracting information, gunzip to uncompress the data, and curl to retrieve information. The same is true for a lot of other Linux applications.
Oh, if you want to play around with XML, I found some of the Apache Xerces tools at http://xml.apache.org/ to be quite useful. They come in both Java and C++ flavors.
The idea is that XML spares you the trouble of defining your own file format and managing all the grunt work of data files. It lets you use common parsing, validation, and document manipulation tools.
Hmmm.... let's try a little something here:
Funny... it still sounds right. Now let's try: Spooky - sounds like a Microsoft press release. Now, for the grande finale: OK, well maybe that was pushing it just a little...
________________________
Corporate Jenga: You take a blockhead from the bottom and you put him on top...
If you're using Java, then a properties file is a good alternative, but if your data gets too complex, (e.g. repeating fields), XML will be much simpler.
To me, the main advantage is the fact that it is both machine and human readable. An XML config file is normally instantly understandable and programming languages can manipulate it quite easily without having to worry about CR/LFs and the completely different formats in flat file databases.
Also, the new XML Schemas allow a fully self-documenting, detailed explanation of what the content of an XML document should contain. You could use a stylesheet to turn the Schemas into DocBook documentation if you so desired. Certainly better than going over your application, taking notes, and writing up the documentation.
To be honest, all of us here at Slashdot can think that XML hasn't had an effect, but it has. MS's .NET is going to be entirely XML based, which is a good thing as it will allow communication with their platform easily. Sun has released ONE, which is just a rebranding of Java to compete against .NET, but it works now and is being used now. GNOME uses XML heavily, and why not? Anyone writing applications can easily read the config files and output of another application and know what to do with it.
Okay, I've ranted a bit here, sorry, but it isn't just the future, it's the present. Of course, in the UNIX world we'll continue to use flat files and standard non-object oriented databases, but when we want to talk with the rest of the world, we will have a method of doing so now that doesn't involve reverse-engineering and so is a lot quicker to develop.
Prior to XML, we had used our own text-based markup language (surprisingly similar to XML, except with only two levels of hierarchy) since 1989. We (30-year OLTP designers and coders) found it *much easier* to design, develop, debug, communicate about, and communicate with than prior fixed-field non-text formats.
List of Synergies in case of XML (and mostly true with our old approach) include:
I would also point out that I have used SAX in some cases and DOMs in another. I had no problem quickly using SAX for message-based uses. It may be harder when using all the features of XML, but not all are needed for most data interchange usages.
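For readers who have not seen the two styles side by side, here is a minimal Python sketch of the same job done with SAX and with a DOM. The order message and its "qty" attribute are made up; the standard library's xml.sax and xml.dom.minidom modules stand in for whichever SAX and DOM implementations you use.

```python
import xml.sax
from xml.dom import minidom

# A made-up message; the element and attribute names are illustrative only.
MESSAGE = "<order><item qty='2'>widget</item><item qty='1'>gadget</item></order>"

class ItemCounter(xml.sax.ContentHandler):
    """SAX: react to events as the parser streams through the message."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.total += int(attrs.getValue("qty"))

def total_with_sax(text):
    handler = ItemCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.total

def total_with_dom(text):
    """DOM: build the whole tree first, then walk it."""
    doc = minidom.parseString(text)
    items = doc.getElementsByTagName("item")
    return sum(int(el.getAttribute("qty")) for el in items)

print(total_with_sax(MESSAGE))
print(total_with_dom(MESSAGE))
```

For short messages like this either approach is a few lines; the SAX version never holds the whole document in memory, which is why it suits message-based uses.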
The only good weather is bad weather.
Of course you can't think of XML as some sort of godsend. It won't make your children's teeth straighter and whiter. It won't solve world hunger, and it won't create a tax law that is equally fair and acceptable to all.
It's just a markup language. Its only strengths lie in the fact that it 1) clearly and easily represents hierarchical data and 2) could become a standard way of representing data in many applications.
If XML finds its way into wide acceptance in certain industries (if it hasn't already), then its strength as a descriptive markup is perfectly valid. It will make business easier if you can unambiguously exchange information, rather than sifting through proprietary annotations or trying to convert a flat ASCII file into [proprietary language of your choice].
<rant>
I am sorry to see from the looks of the review that New Riders has gone the way of Sybex and Que, though. As far as books in "pop tech" go, you can usually go by this rule of thumb: thickness is inversely proportional to quality.
Take two examples: my O'Reilly XML (before the standard) pocketbook, and our Que "Mastering Javascript" Special edition. My O'Reilly book is a scant 107 (small) pages, yet has proven to be a completely invaluable reference. I wouldn't trade it for anything but the next edition of the same book. Back when I was trying to learn Javascript, that freaking Que book wasted more of my time than anything else I've ever read. By the time I needed to know about the syntax for multi-dimensional arrays you could just forget it.
</rant>
This is a manual virus. Copy it to your sig and help me spread!
People that explain XML carefully should be revered. These scholars are pointing the way to the future. WHO CARES if the book uses strange English. It is after all a technical document and should be as obfuscated as possible!
Quote from early car manual (Subaru 360): "if one wish to engage first gear, one is pleased to depress clutch." Who needs clarity in writing and logic when gems such as this are produced? I'm awaiting my XML update to that great old car.
"Science is about ego as much as it is about discovery and truth " - I said it, so sue me.
Some say that the XML isn't even a real language, that in spite of its proclaimed extensibility, it is "fixed." But I think they're cultural elitists.
Applications for the XML