Effective XML
In Effective XML: 50 Specific Ways to Improve Your XML, Elliotte Rusty Harold takes a different approach: know your elements and tags -- they are not the same thing! -- and weigh your choices in context, because any technology applied for the wrong reasons may fail to deliver on its promises.
Following Scott Meyers' groundbreaking Effective C++, the author invites us to re-evaluate seemingly trivial issues to discover that life is not as simple as it seems in the world of XML. In each of the 50 items (chapters), he gets into the inner workings of the language, its usage and related standards, thus giving us specific advice on how to use XML correctly and efficiently. The 300-page book is divided into four parts: Syntax, Structure, Semantics, and Implementation. Yet in the introduction, the author sets the tone by discussing such fundamental issues as "Element versus Tag," "Children versus Child Elements versus Content," "Text versus Character Data versus Markup," etc. On these first pages the author started earning my trust and admiration for his knowledge and ability to get right to the point in clear and simple language.
The first part, Syntax, contains items covering issues related to the microstructure of the language, and best practices in writing legible, maintainable, and extensible XML documents. (In it, over 19 pages are dedicated to the implications of the XML declaration!) That seems a lot for one XML statement that most people cut-and-paste at the top of their XML documents without giving it much thought, doesn't it? Actually not, if you follow the author's reasoning and examples.
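One concrete reason the declaration can matter is character encoding. The following is a minimal sketch, not an example taken from the book: it assumes Python's standard xml.etree.ElementTree parser, and the <name> element is made up for illustration. The same bytes parse cleanly when the encoding is declared and are rejected when it is not.

```python
import xml.etree.ElementTree as ET

# The same element written as ISO-8859-1 bytes: 0xE9 is "é" in that encoding.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xe9</name>'
undeclared = b'<name>caf\xe9</name>'

# With the declaration, the parser knows how to decode the bytes.
declared_text = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8, where a lone 0xE9 byte is invalid.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(repr(declared_text), undeclared_ok)
```

The declaration is the only difference between the two inputs, which is why a careless cut-and-paste of it can quietly be wrong for the document underneath.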
The second part, Structure, discusses issues that arise when creating data representation in XML, i.e. mapping real-world information into trees, elements, and attributes of an XML document; it also talks about tools and techniques for designing and documenting namespaces and schemas.
The third part, Semantics, explains the best ways to convert structural information represented in XML documents into data with its semantics. It teaches us how to choose the appropriate API and tools for different types of processing to achieve the best effect. This part has a lot of good advice for creating solutions that are simple, effective, and robust.
The final part, Implementation, advises the reader on design and integration issues related to the utilization of XML; these issues include data integrity, verification, compression, authentication, caching, etc.
This book will be useful to a professional with any level of experience. It may be used as a tutorial and read from cover to cover, or one can enjoy reading selected items, depending on experience and taste. The book's very detailed index makes it an excellent reference on the subject as well. In the preface to the book, the author writes, "Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime." I'm not sure about the "lifetime" -- that's an awfully long time for using one technology -- but for the most confident of us this still may not be enough :). Your mileage may vary, but I suspect that you could shave a few months off that time by browsing through this book once in a while. Most importantly, it will make you a better professional and make you proud of the results of your work. Wouldn't this worth your while?
You can purchase Effective XML: 50 Specific Ways to Improve Your XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
If you want to read any book for free, just ask your local library to order it and they will. Libraries guess at what books people want to read, so if anyone shows any interest in any book, they order it. They lose their federal funding if they don't spend the money they are allocated, so they are generally VERY willing to buy as much as possible.
----
Squirrel
It's got to be better than Ineffective XML
Yet Another Web Site
Reading this book shortens life expectancy. Still, it's your choice...
Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
That the book won't mention the "s-exprs on drag" angle...
Save your wrists today - switch to Dvorak
Actio personalis moritur cum persona. (Dead men don't sue)
Sure, XML isn't inherently that deep - but neither are the tab-separated ASCII files which Unix tools used to do all kinds of really powerful things. Similarly, LISP property lists aren't that complex. XML's a bit more flexible, and carries enough decoration with it that people are willing to use it for building interfaces that they might not build using ASCII or XDR. And anything that lets the EDI people replace their stuff with simpler, more open technology is good too..
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
On a related note, more details on Microsoft Indigo are finally available. According to this article on XML mania, Microsoft's future platform will use XML as much as possible. More details are available on Microsoft's site. The funniest part is they are claiming Indigo + Longhorn will be the best thing since sliced bread. Maybe they haven't learned the hard lesson that parsing XML kills performance.
I think one of the main problems with the embedding of XML architecture into office productivity software is unfortunately the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and in the face of users, instead of being a behind the scenes 'option', people just are not going to use it.
The linux hacker
So, are they touting application, or merely increasing your ability to do something useless?
XML has great web potential, but saying so is dangerously rehashing an old argument and certainly not new.
So why would we want a book that supposedly teaches us how to use something for which we as of yet have little use in the online world?
I get the point. I think application and implementation should be pushed before this.
It's like teaching someone to program in cobol without giving them a robotic arm on which to experiment.
Damon,
http://actionPlant.com
XML would work better if there were consistent DTDs for tagging information that everyone would use. There should be an open database of these DTDs.
I was looking for a simple one to tag photos with. Couldn't find it, made my own. Is there a repository of these DTDs out there?
... a floor wax and a dessert topping...
"This is even more effective"
:wq!
that is a XML database. xindice looks interesting, though I wonder how it will scale?
Does the book discuss the pros and cons of XML? Such as, when is it a good idea to use XML? When would a CSV, INI, or other structured text document be a better choice than XML?
These are issues that need to be solved first, before one creates an effective XML structure. Does the book address them?
"Times have not become more violent. They have just become more televised."
-Marilyn Manson
One of the things that I have found limiting about XML is that it is inherently hierarchical. Real "things" can be categorized many ways. Hierarchical classification systems (such as our modern file systems) work poorly to classify a broad scope of information. Hence some of the new development in the FS in Longhorn, and also some I've heard about, but can't remember, for Linux.
There are only 6,863,795,529 types of people in the world.
Wouldn't this worth your while?
Wouldn't this what my while???
All your base are belong to us!
(huge eye roll)
Glad you cleared that up for us non-programmers. Now if I could just figure out what it really is!
Syntax:
Include an XML Declaration
Mark Up with ASCII if Possible
Stay with XML 1.0
Use Standard Entity References
Comment DTDs Liberally
Name Elements with Camel Case
Parameterize DTDs
Modularize DTDs
Distinguish Text from Markup
White Space Matters
Structure:
Make Structure Explicit through Markup
Store Metadata in Attributes
Remember Mixed Content
Allow All XML Syntax
Build on Top of Structures, Not Syntax
Prefer URLs to Unparsed Entities and Notations
Use Processing Instructions for Process-Specific Content
Include All Information in the Instance Document
Encode Binary Data Using Quoted Printable and/or Base64
Use Namespaces for Modularity and Extensibility
Rely on Namespace URIs, Not Prefixes
Don't Use Namespace Prefixes in Element Content and Attribute Values
Reuse XHTML for Generic Narrative Content
Choose the Right Schema Language for the Job
Pretend There's No Such Thing as the PSVI
Version Documents, Schemas, and Stylesheets
Mark Up According to Meaning
Semantics:
Use Only What You Need
Always Use a Parser
Layer Functionality
Program to Standard APIs
Choose SAX for Computer Efficiency
Choose DOM for Standards Support
Read the Complete DTD
Navigate with XPath
Serialize XML with XML
Validate Inside Your Program with Schemas
Implementation:
Write in Unicode
Parameterize XSLT Stylesheets
Avoid Vendor Lock-In
Hang On to Your Relational Database
Document Namespaces with RDDL
Preprocess XSLT on the Server Side
Serve XML+CSS to the Client
Pick the Correct MIME Media Type
Tidy Up Your HTML
Catalog Common Resources
Verify Documents with XML Digital Signatures
Hide Confidential Data with XML Encryption
Compress if Space Is a Problem
Too bad this was moderated down...that is the perennial problem with XML and other technologies. A universalist technology compromises a little bit of everything for everybody. Possibly the moderator didn't catch the SNL reference.
When the people fear their government, there is tyranny; when the government fears the people, there is liberty.
It has been my experience with XML that it is like a lot of other things in development: the good developers understand it immediately and have native intuition towards best practices. The bad developers never really get it and spend their time reproducing tricks they saw in a cookbook. That's good and fine until you need something that doesn't quite fit into categories a, b or c. Another example of this is how high school and university data structure/algorithm classes never spend any time on the development of new data structures that exactly meet the problem specification. Instead they lay out half a dozen types of linear lists, a couple of trees, and some hashing functions and say, "Well, you can glue just about anything together from this." Perhaps this book takes what is, IMHO, the better approach-- laying out the tools and politely explaining what the implication of each is, rather than attempting to list out pages of cute examples of what each can do.
====
Crudely Drawn Games
I know that as a student maintaining a website I am in the minority of XML users, but the main thing that stops me from moving my site (small-scale though it may be) over to using more XML is sheer server load. The fact of the matter is that we still don't have true low-bandwidth database solutions, and until this changes, I doubt that much will be done with technologies like XML (at least on smaller, non-corporate sites) no matter how much potential they have.
--Goat
CEO, Goat Software
Goatblog
... and it is starting to dawn on me that trends like pervasive XMLization are going to haunt us forever. The combination of business-minded consultants that push a market to create demand for themselves and a huge number of clueless but enthusiastic developers that will jump on any new idea and push it where it doesn't want to go unsurprisingly leads to this kind of instability.
I hate XML with a passion. Let me present you with three examples
1) Programming languages based on XML.
Yes, it is true. Perverted minds, somewhere on this planet, actually seem to think that this is a neat idea! Since their initial conception the pivotal point of programming languages has been to raise the level of programming. To move from the computer's domain to the human domain - to make it more intuitive and natural for a human being to program a computer. With these new XML-based languages we are moving a step backwards, because truly the only benefit of XML in this context is that it is easier for computers to parse, while it is certainly harder for humans.
2) XSLT
Have you tried it? I rest my case.
3) SOAP
Okay, initially this actually seemed like a good idea to me, but having thought about it, I really think it sucks. Okay, so it is easier to implement SOAP for a particular platform or programming language, but a wire protocol is like a compiler or an OS kernel in a certain sense - it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.
... is XML-RPC. A sort of lightweight SOAP. Very very useful for API's when you're doing cross-platform coding...
:-)
The site has loads of implementations of both server and client code, some in *very* obscure languages
Simon.
Physicists get Hadrons!
I have not read this book, but it sounds interesting already.
XML is an interesting technology that has the potential for changing the way we use technology in all kinds of weird and wonderful ways. (And in a few ways that may not be so wonderful.) But using XML correctly is tough. I've written and discarded more DTDs and schemata than I care to admit because they were seriously flawed. Getting it right is important and very, very hard.
XML looks simple, and in some ways it is. But in so many other ways it is not simple at all - in large part because it gives us a tool to approach some very hard problems. And hard problems, often even when expressed in the simplest way around, tend to stay hard. (Calculus makes saying some things simple, for example, but understanding those things still takes work and insight.)
I will be taking a good look at this book in the near future to see what it has to say. And I'd urge those who dislike XML to do the same. And finally, even those who like XML need to think hard about how to use it well, so perhaps this would be a good read for them too.
XML is just text! If the XML parser is slow, write a faster one! Figure out where the bottlenecks are! Don't give me this XML is slow crap. This is slashdot - you're supposed to be a geek. If you don't like XML, fine, but come up with a geeky reason not to like it, not some problem whose solution is just to roll up your sleeves and do some hacking!
:')
Oy!
I guess an Element would correspond to an object whereas a Tag would highlight a data item in a document. Also, for a webserver serving xml and xslt or css can't you put the processing load onto the client?
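For what it's worth, in XML 1.0 terms an element is the whole unit (start-tag, content, end-tag), while the tags are just the delimiters. A minimal sketch of that distinction, assuming Python's standard xml.etree.ElementTree; the <title> element is only an illustration, not something quoted from the book:

```python
import xml.etree.ElementTree as ET

source = "<title>Grapes of Wrath</title>"
element = ET.fromstring(source)

# The element is the whole unit: start-tag, content, end-tag.
# The tags are only the delimiters around the content.
name = element.tag
start_tag = f"<{name}>"
end_tag = f"</{name}>"
content = element.text

print(start_tag, content, end_tag)
```

Gluing the three pieces back together reproduces the original element, which is the point: two tags, one element.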
Hello Cruel World
i use XML for a lot of things and it's been quite decent. but on the other hand, we're using dual pentium IIIs for trivial stuff that was running fine on a PII with c/c++ app without XML.
the fact is that XML is just marshalling and unmarshalling of all computational data to and from strings thereby negating fast numerical performance that a CPU inherently has. you want to add two numbers? create a string representation, pass it around thru a bunch of parsers/transformers as strings then finally convert it back to the number it really is then add then convert it back to string for passing it around all over again... what a waste.
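The round trip described above can be sketched in a few lines. This is only an illustration using Python's standard xml.etree.ElementTree; the element names add, a, b and sum and the function add_via_xml are invented for the sketch and do not come from any real protocol.

```python
import xml.etree.ElementTree as ET


def add_via_xml(a: int, b: int) -> int:
    # Marshal both operands into an XML string ...
    request = ET.Element("add")
    ET.SubElement(request, "a").text = str(a)
    ET.SubElement(request, "b").text = str(b)
    wire = ET.tostring(request, encoding="unicode")

    # ... then parse the string and convert the text back to numbers.
    parsed = ET.fromstring(wire)
    total = int(parsed.findtext("a")) + int(parsed.findtext("b"))

    # Marshal the result once more so it can be passed along again.
    response = ET.Element("sum")
    response.text = str(total)
    reply = ET.tostring(response, encoding="unicode")
    return int(ET.fromstring(reply).text)


print(add_via_xml(2, 3))
```

Everything except the single + is string building and parsing, which is the overhead being complained about; whether it matters depends on how much real work sits between the round trips.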
...resource hogs.
While I'm not an XML zealot, I like the clarity it can bring to many domains of practice. Regarding the performance hit, get a faster computer! If you don't have a fast enough one yet, wait a year.
Lisp was shunned in the past primarily for speed reasons, too. Now the main reason many don't like Lisp is because they don't understand advanced software engineering concepts and write poor Lisp code.
The antidote for misuse of freedom of speech is more freedom of speech.
-- Molly Ivins
Who's giving mod points today? Just look at that!
I'm finding lots of little applications that were using a database or text-file scheme with relatively little data and I've been converting them to xml files for storage and lookup. It's been a performance improvement every time, often a huge one. There is presumably some size/schema complexity point where this gain turns around the other way (the cost of loading and parsing a doc vs. creating a db connection to parse data), but it's been a big win for me so far.
Hmmm...I've entertained the idea of morphing the incoming XML from a tree to some other graph. There's also the idea of building up a representation of just the nodes and having pointers to the actual data, with a dictionary to reduce the size. Remember, just because it starts out as XML doesn't mean it has to stay that way.
Browse the Technical Reports, Recommendations and Proposed Recommendations at W3C as there are a lot of DTDs and Schemas there. I found a DTD for generic simulation representation there. There's quite a bit if you take the time to look.
What is music when you despise all sound?
i have been in the business for 4 years now, and i use XML on a daily basis.
not only is it a powerful medium for representing (and caching) hierarchy/tree-based data, extensions like XSLT provide tremendous advantages in transforming data for a variety of other purposes (you probably hated lisp/scheme based languages, too).
While programming languages based on XML at first sound a little strange, combining an XML based programming language with XSLT could be super powerful, especially with concepts like code generation...
Wait, before you shoot me let me explain myself. I've tried to view some webpages that are XML-based, but all that shows up in the freakin' browser is the source code. I've ditched HTML 4.0 and use XHTML 1.0 instead, but I don't know about full-blown XML. The only time I've seen XML used properly is when you look at the source code for an MSN Messenger Saved Contacts list, and that isn't a webpage! Could someone please tell me what XML does exactly and where XML would be useful?
"But using XML correctly is tough. I've written and discarded more DTDs and schemata than I care to admit because they were seriously flawed. Getting it right is important and very, very hard."
Then it's the wrong tool. If you find yourself writing comments like "This is really tricky code here", then you need to rewrite it! Use a different algorithm, use a different tool.
"XML looks simple, and in some ways it is. But in so many other ways it is not simple at all"
Hmm, just like Visual Basic...I cannot say how many VB apps I've seen (and fixed) that started out simple, and rapidly became quagmires of code.
And how many out there use validation at all? I'd say very few, because no one mentions Xerces lib at all.
And config files, simpler parsing like 'property=value' is easier and faster.
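To show how little code 'property=value' parsing takes, here is a minimal sketch. The function name parse_properties, the '#' comment convention and the sample keys are assumptions made for illustration, not a description of any particular config format.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse 'property=value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


config = parse_properties("""
# hypothetical settings
width = 80
theme=dark
""")
print(config)
```

The trade-off is that every value comes back as a flat string: there is no nesting, no typing and no validation unless the application adds them itself.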
A Gnome config file that has 4 tags, and 1 tag had 80! attributes is just stupid. Yet this is how people use XML.
And finally, if XML is soo great, why are there half a dozen competing markup technologies out there in Freshmeat?
JoeR
So basically you're complaining about closure(1).
BTW INI files have that as well. It's called a carriage-return.
(1) It most certainly isn't the more precise levels of discrimination, because other formats can do the same, and I don't hear people complaining as much about them.
For performance, EDI is definitely better
Well, hang on. There's the cost factor. When you take into account Value Added Network (VAN), storage and interconnection fees, plus the usual per-kilocharacter fees, XML suddenly performs much better - the bandwidth to send it is greater but if you have an FTP server then it's not even your bandwidth at issue. The cost per invoice/order is MUCH less even when development fees are taken into account and therefore performance is higher.
EDI is a pain in the ass to debug too. Missing tags or misplaced required fields - oh joy. Start counting those plusses +++000+75000+?:+++ already.
"It's not your information. It's information about you" - John Ford, Vice President, Equifax
Reading through the posts on this board, I tend to agree with the criticisms about XML. It's a big dreadnought of a specification when, in most cases, a nice light corsair or even single-seat fighter would do the trick. Still, I would normally be inclined to say of XML what is said about Democracy: it's the worst system out there, except for all the others.
Then I found YAML. Long and short, YAML is very lightweight, eminently readable, easy to use (parsers exist in multiple languages) and a pleasure for all kinds of projects that require data serialization. Where XML branches off into other types of uses, like XSL programming, YAML doesn't really compete. I find this to be a strength, actually, because once you've used YAML and seen it in action, XSL seems like a big, fat add-on. But for those that rely on XSL and other things, YAML won't do the trick.
But if all you need is data serialization in a compact, easy-to-read, easy-to-use package -- and this, in my opinion, is by far what XML is most used for -- then YAML is great. Give it a shot.
As for XML. I used to hate it with a passion. Now I still hate it, but I'm less passionate. The creators of XML are ambitious people, and they tried to do something in that spirit. It works, basically, and XML doesn't deserve *all* the bad press it gets.
Chr0m0Dr0m!C
You can represent tree structured data dead easily in .INI files (as long as your API for parsing them can enumerate sections and keys, not just ask for ones by names you already know).
Actually there's nothing forcing you to stop at trees; you could represent arbitrary directed graphs in .INI files without any trouble, other than that of remembering that your application needs to avoid running round loops forever.
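A minimal sketch of the .INI idea, assuming Python's standard configparser. The layout is hypothetical: each section is a node, and a made-up "children" key lists the sections it points to. The seen set is what keeps the traversal from running round loops forever.

```python
import configparser

# Hypothetical layout: each section is a node, and its "children" key
# lists the names of the sections it points to. Note the cycle b -> root.
INI_TEXT = """
[root]
children = a, b

[a]
children = b

[b]
children = root
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)


def walk(start: str) -> list[str]:
    """Visit every node reachable from start exactly once."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:  # already visited: this is what stops the loop
            continue
        seen.add(node)
        order.append(node)
        children = parser.get(node, "children", fallback="")
        stack.extend(c.strip() for c in children.split(",") if c.strip())
    return order


print(parser.sections())
print(walk("root"))
```

Here parser.sections() is the "enumerate sections" capability the comment asks for; an API that can only look up names it already knows could not discover the graph.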
-2???
I don't understand that either. I don't think it deserves to be modded up but it sure as hell doesn't deserve a -2.
Not everything is analogous to cars. Car analogies rarely work.
Try parsing the compressed form. The redundancy can be used to your advantage.
Since I am not a moderator today I use the filthy "mod parent up" trick instead. Mod parent up and mod me down!
Well at least I know it's not just my imagination... Anyway, this isn't the place to discuss moderation issues I was told. I simply wanted to thank you for understanding my pov, so there, "Thanks VioletGreen" :)
Portability is just a by-product of being a standard. The real benefit of XML is structure and extensibility. config and ini files would be more easily parsed if they were in XML. Current parsers may be slow, but that doesn't mean new parsers would not be more performant.
e.g. IBM's take.
You can link between XML entities quite easily.
Also consider that RDF, which describes directed graphs, is quite easily expressed in XML; there's nothing to say that you can't describe a graph and reference actual elements with IDREFs. I don't think you've really thought about this.
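As a rough illustration of describing a graph inside a tree-shaped document, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element names graph, node and edge and the attribute names id and ref are invented for the example. True ID/IDREF typing needs a DTD or schema, which this parser does not check, so the references are resolved by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <node id="..."> declares a node and
# <edge ref="..."/> points at another node by its id. Note the cycle c -> a.
DOC = """
<graph>
  <node id="a"><edge ref="b"/><edge ref="c"/></node>
  <node id="b"><edge ref="c"/></node>
  <node id="c"><edge ref="a"/></node>
</graph>
"""

root = ET.fromstring(DOC)

# Index the nodes by id, then resolve each reference through the index.
nodes = {n.get("id"): n for n in root.iter("node")}
edges = {nid: [e.get("ref") for e in n.findall("edge")] for nid, n in nodes.items()}

# Every reference should point at a declared node.
dangling = [ref for refs in edges.values() for ref in refs if ref not in nodes]
print(edges, dangling)
```

The document itself is still a tree; the graph lives in the references, so cycles such as c pointing back at a cost nothing extra to express.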
Don't forget binary data can be tagged as well. Think generator code. When the parser sees a particular tag (warning binary program ahead) it can run the generator and append the results to the tree. Kind of like the power of a function that generates PI vs a full listing of PI.
XML is highly overrated and generally over-used. Admittedly XML + CSS is better than html, but beyond that its only reasonable use is as a generalized syntax for configuration files, and as such does a good job, or at least I've had success using it that way in the past. Many (if not most) of its other uses are just poor program design. SOAP is an extremely silly idea. Why use XML for a marshalling syntax for RPC? It's slower, bulkier, and just a bad choice in comparison to a binary marshalling mechanism. Now as a syntax for an RPC's IDL XML makes a lot of sense, but not as a transport.
Glad to get that off my chest. I have a bitter history with XML. I was the first person at my former company to bring XML in as a uniform configuration file format for our product, but then found myself a couple of years later forced into adding XML specific features to the filesystem that was the core of our company's product. I spent a week thinking about the idea, and concluded that it was a bad one. Thus followed a long (and fruitless) battle with management to scratch the plan. The end result was a technically nifty but useless set of features. The work remains unreleased for lack of customer interest. At least I get a bit of "I told you so." pleasure.
VeryGeekyBooks has more reviews of this book.
And one of them is Just Plain Wrong, also IMHO.
Here are two heuristics for good XML design that I dearly wish more people would take to heart:
1. If processing any text field requires parsing, Something Is Wrong, and you probably need to break it apart into more elements/subelements.
The only exceptions to this rule are fields that are numbers, or maybe date/time stamps that adhere to ISO standards.
2. If you're using attributes, You'll Wish You Hadn't In The Future.
Attributes are supposed to be the way XML separates metadata from data. The problem with them is that they are also "leaves" of the XML tree, and intended to be simple, flat text. If you ever need more complex structure in attribute metadata, you're screwed - you must either violate rule 1 above, or move the data out into elements, totally breaking your old structure. Just don't use them, OK?
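Rule 1 can be illustrated with a minimal sketch, assuming Python's standard xml.etree.ElementTree. The customer record and its "Last, First" text convention are made up for the example; the poster did not give one.

```python
import xml.etree.ElementTree as ET

# Needs a second, ad hoc parser: the application must split the string itself,
# and has to know the "Last, First" convention to do so.
flat = ET.fromstring("<customer><name>Brune, Bill</name></customer>")
lname_flat, fname_flat = [part.strip() for part in flat.findtext("name").split(",")]

# Structure made explicit in markup: the XML parser does all the work.
nested = ET.fromstring(
    "<customer><name><fname>Bill</fname><lname>Brune</lname></name></customer>"
)
fname = nested.findtext("name/fname")
lname = nested.findtext("name/lname")

print(fname, lname)
```

The same reasoning applies to rule 2: an attribute value is flat text, so any structure squeezed into one ends up needing the same kind of hand-written splitting as the first form.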
To a Lisp hacker, XML is S-expressions in drag.
This is the same review that is on Amazon.com.
It is the first customer review.
-- This is not a sig
illegitimii non ingravare
uh. caching?
XPath is not inherently a pig. Many API's handle XPath with aplomb, usually building an alternative data structure behind the scenes for access. XSL usually wants the whole tree but many implementations optimize this out unless large structures are being reorganized.
Use the context, Luke.
illegitimii non ingravare
No.
Interestingly, XML was originally intended as a userland technology, bringing the strength of SGML to the web, fixing what was broken in HTML (the last great userland data format). The game has lost sight of the goal a bit, I think, which is the root of much of the kvetching this topic generates.
Frankly, ERH is a great writer and has good insights into the use and abuse of markup. This book is one of the things that was missing while the pro/anti-XML hype trains were picking up steam.
illegitimii non ingravare
That's because people somehow seem hung up on XML having to be text. No, having it gzipped doesn't count, I'm talking about XML at parsing time. Why should XML be text? So humans can write the stuff by hand and read it with ordinary text viewers? What is that about? Wasn't it supposed to be machine readable in the first place? Isn't 99% of its use supposed to be at the hands of automated parsers and middleman tools that take the strain from the human? There's no reason why it couldn't be a nice binary format with all kinds of tricks (standardized ones, mind you) shoved in it to make parsing and modification faster and more efficient.
i ate crayons when i was a kid and now i have two braincells and the blue ones taste nicer
1) Programming languages based on XML
Yeah, but code *generation* with XML is the cat's pyjamas.
2) XSLT
You clearly haven't tried it, or did not use it as intended. Do you have any experience with other functional languages? I work almost exclusively with XSLT at the moment and wouldn't have it any other way.
3) SOAP
is butt-stupid, I admit. But hey, ninety-odd percent of the beef this topic has generated can be fixed with a glance at the book being reviewed.
illegitimii non ingravare
Hope this title does not dilute the strength of the "Effective" brand. I know that the Scott Meyers book and the Java book they put out were also killer. I'm skeptical that XML can be effective in any fashion. Doubtful that this book will change that opinion of mine.
Bandwidth is an order of magnitude more limiting than tree parsing, egg. That and the facilities the tool vendors decorate their stuff with. Of course it's not free, what is?
SQLXML and most other value-adds are bull. Your business objects should optimize the hell out of their DB access and return XML. XML is messaging and presentation tier glue. Read the book.
illegitimii non ingravare
Laugh, moderator, laugh!
Yes, I've tried XSLT. It's different, and takes some adjustment. And I use it extensively. It's the right tool for the job. Maybe you use Windows, but in Unixland that's the rule of thumb.
Anyhow, if you really choose to build around XSL there are WYSIWYG XSL template generators, so you can write application logic in your language of choice that spits out XML, and off-load the pretty-print work to a Dreamweaver fanatic (or in this case, XML Spy or your XSL editor of choice).
XPipe is an ambitious project to migrate the usefulness of pipes and text streams to XML. The meat of it is a process to break-up tree transformations into small steps.
That's because everyone uses slow XML parsers. Some years ago at one of the then-top 5 web portals I was unhappy with the standard SAX/DOM parser in use; it was ridiculously slow (and buggy).
So I wrote a new one. Parsing XML became one hundred fold faster! I timed it quite carefully.
Other people in this thread are saying "of course XML is slower than binary formats, it's 3 times bigger." But a factor of 3 in performance is nothing, considering some of the advantages.
A slowdown of 100, on the other hand, is absurd.
I don't know why people don't rebel against this and make faster XML parsers the widely-used ones; for whatever reason, apparently everyone continues using slow parsers.
At any rate, no, XML is not slow. It's just a simple, easy to parse format, for which IBM and others have written very, very slow parsers.
And everyone just assumes that it has to be slow. Sheesh, why should an XML parser be slower than a C++ compiler??? Come on.
Professional Wild-Eyed Visionary
JSON - JavaScript Object Notation
/RS
http://www.json.org/
Enjoy!
read this book with water and everything in your life will be solved.
Recently I was developing a pseudo file system and was using xml to store the metadata (ie date, name, link references, permissions, etc.). The chief advantage of using xml was that the data files were text and could be readily edited and read. However they needed to be accessed often and performance was a dog. My boss saw what I was doing and recommended I use perl syntax to represent the hierarchical data and use Data::Dumper and Safe::rdo. I did and performance improved several times while still retaining the advantages of text. For example (using a nominal order record) instead of
...
...
...
<order>
<customer>
<name>
<fname>Bill</fname>
<lname>Brune</lname>
</name>
</customer>
</order>
<manifest>
<item>
<id>209</id>
<title>Grapes of Wrath</title>
<qnt>1</qnt>
<unit_price>$10.75</unit_price>
</item>
...
it would look something like this (compacted to avoid the lameness filter):
order => {
customer => {
fname=>'Bill',
lname=>'Brune'
},
manifest => [ { id=>1,
title=>'Grapes wrath',
qnt=>1,
unit_price=>'$10.75'
},
{
}
]
}
The added advantage is that you can also add code to it, such as
{ 'timestamp'=> scalar localtime,
'pid'=> getppid,
...
}
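The same trick carries over to other languages. As a rough sketch only, here is a Python analogue that swaps Perl's Data::Dumper and Safe for the standard pprint and ast modules; this substitution is an assumption for illustration and not what the poster actually ran. The order record reuses the values from the XML version.

```python
import ast
import pprint

order = {
    "customer": {"fname": "Bill", "lname": "Brune"},
    "manifest": [
        {"id": 209, "title": "Grapes of Wrath", "qnt": 1, "unit_price": "$10.75"},
    ],
}

# Serialize to readable, hand-editable text ...
text = pprint.pformat(order)

# ... and read it back. literal_eval accepts only literals, so no code is run.
restored = ast.literal_eval(text)

print(text)
```

One difference worth noting: because literal_eval refuses anything but literals, this variant is safer to load but gives up the "add code" advantage that the Perl approach has.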
Actually server load is the reason I moved to xml. I generate the site with xslt stylesheets and I serve static pages that are updated with a simple 'make'. I get the benefit of custom tags, automatic rss feeds and more, while the server serves static pages (so the users get the pages fast).
Even if you serve your pages as xml and xsl, your bandwidth usage will decrease or increase depending on how well you designed it and the number of pages you serve. In most cases xml pages will be shorter because you will not need to include boilerplate code in every page (menus etc) and the xsl stylesheet will be cached on the user side so you will not need to serve it very often.
The biggest benefit is that you separate presentation from content so you can change your layout dramatically and you don't need to update a single page by hand.
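A rough sketch of that separation, for the curious. This is not XSLT (the Python standard library has no XSLT processor); it only stands in for the idea of keeping content in small XML files and the boilerplate in one layout that a build step applies. The `<page>`, `<title>` and `<para>` element names, the `LAYOUT` template and `render_page` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from string import Template

# Hypothetical site layout: menus and other boilerplate live here, once.
LAYOUT = Template(
    "<html><head><title>$title</title></head>"
    "<body><div id=\"menu\">Home | About</div>$body</body></html>"
)


def render_page(xml_text):
    """Turn one content-only XML page into a full static HTML page."""
    page = ET.fromstring(xml_text)
    title = escape(page.findtext("title", default=""))
    body = "".join(
        f"<p>{escape(para.text or '')}</p>" for para in page.findall("para")
    )
    return LAYOUT.substitute(title=title, body=body)


content = "<page><title>News</title><para>1 &lt; 2</para></page>"
print(render_page(content))
```

Changing the layout means editing `LAYOUT` and regenerating; none of the content files are touched, which is the point being made above.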
See http://www.xmlsuck.com/pault/pxml/xmlalternatives.html .
XML is a programming language. It is a database. And sometimes it is both at the same time. But it is very rarely ever the best tool for the job.
Actually our library has patron-only online access to various databases and books (kind of like Safari). Very nice, and of course we have the usual main-library databases as well.
I did the same thing for myself. Why pay the cost of dynamic pages if they are static to the server?
Mine has a makefile at its heart too. Makes me feel all fuzzy.
I guess there is always WBXML. There's a group in the W3C working on the problem outside of the WAP arena already, and then there are ASN.1 mappings such as Sun demonstrated in Fast Web Services.
Karma: It's all a bunch of tree-huggin' hippy crap!
Did you just use "RDF" and "easily" in the same sentence?
Okay, so if XSLT is bad, what is better? Writing different application code for every single language that ever needs to convert XML into another format?
I'll agree for some things (such as styling web pages), SiteMesh, PHP-Mesh or whatever might be better, but for converting plain XML into XHTML, you can't beat a stylesheet (for the purposes of this statement, CSS does not yet qualify as a stylesheet.)
Cool, let's all adopt that, so we can put viruses in web pages more easily. :-D
This is posted as "code" because obviously xml is not just plain old text. What's the html formatted option for?!? extrans wouldn't work because /. likes to insert spaces randomly for good measure.
:) Also, by all means never put unneeded whitespace.
/.'ers can come up with some neat ideas. You'd be amazed at what all you can actually call XML. :)
If only there were a native binary format, we wouldn't have this problem to begin with.
Is it too much to ask from the W3C for a binary encoded xml format? Maybe to make my point I'll start using one character tags in UTF-8 encoding. Put the popular tags in using 8 bits, and the unpopular tags in as 16 bits, then I'll just do xslt when anyone wants a copy.
I bet that would shave a good 20-30% off file size and parse time. Too bad you wouldn't be able to read it because unicode a's are different than normal a's but they would look the same.
<z><y><x>Title</x></y><w>Hello World!</w></z>
is much smaller than
<html>
<head>
<title>Title</title>
</head>
<body>
Hello World!
</body>
</html>
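If you want to put a number on it, here is a small sketch that rewrites tag names to one character and measures the difference. The `TAG_MAP` table and `shorten_tags` are made up for this example, the regex only handles attribute-free tags, and the long document is written on one line, so whitespace is not counted.

```python
import re

LONG_DOC = "<html><head><title>Title</title></head><body>Hello World!</body></html>"

# Hypothetical mapping from long tag names to single characters.
TAG_MAP = {"html": "z", "head": "y", "title": "x", "body": "w"}


def shorten_tags(doc, tag_map):
    """Replace each attribute-free start or end tag name using tag_map."""
    def swap(match):
        slash, name = match.group(1), match.group(2)
        return f"<{slash}{tag_map.get(name, name)}>"
    return re.sub(r"<(/?)([A-Za-z][\w.-]*)>", swap, doc)


short_doc = shorten_tags(LONG_DOC, TAG_MAP)
long_size = len(LONG_DOC.encode("utf-8"))
short_size = len(short_doc.encode("utf-8"))

print(short_doc)
print(long_size, "bytes ->", short_size, "bytes")
print(f"{1 - short_size / long_size:.0%} smaller")
```

Keep in mind this toy document is almost all markup. On documents with more text per tag the saving shrinks accordingly.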
Now if you really want to be naughty to improve performance of parsing you can require tags that give the offset of other tags.
<z><a>026</a><a>040</a><b><c a="a"/></b><e a="g"/></z>
That way you can tell where tags begin without parsing the entire file if all you want is just one little piece.
I don't know if I can be annoying enough myself to actually get someone to make a binary xml counterpart standard, but I'm sure plenty of
Karma Clown
I agree on all that... but what if your program already has to work with text data 99.999% of the time?
Imagine some content management system (like slashdot, blog tools, and such). The only places where you're going to store numbers are dates and user id's. All the rest would be text, text and some more text.
Well, you still have the performance problem compared against a proprietary parser written specifically for your data... but I think the benefits easily outweigh this, since you don't have to rewrite the parser (maybe you can tweak it later, when the data volume is increasing and performance WILL become a problem) and you can easily port your data (assuming other systems also thought xml was useful, hehe).
Repeating what's already been said in this thread a zillion times, it's all a matter of the right tool for the job.
Nonsense.
Take a look at the xNL standard for specifying names. It's not all that obvious or simple and I even wonder if it is complete - in particular if "MiddleName" might need an "order" attribute in order to specify print order (see below). And while there is no "nee" name specified (for maiden names), there is a "type" specifier for middle name that probably works. Further, there doesn't seem to be any way to define a date range for the timespan in which the name is applicable (though I suspect that was probably considered and moved out into the broader group of DTDs/schema that now encompass xNL).
As an (extreme but notable) example, Prince Charles has as a full name "Charles Philip Arthur George Windsor". And I suspect the order of the middle names matters. He is titled "Prince Charles, the Prince of Wales". He is also Earl of Chester, Duke of Cornwall, Duke of Rothesay, Earl of Carrick, Baron Renfrew, Lord of the Isles, and Prince and Great Steward of Scotland. He also has the rank of Captain (Royal Navy) and Group Captain (RAF).
XML is standard. It can fit almost any type of data
Well, then why not make a comma-delimited standard or a relational text standard? Relational can also fit almost any kind of data (in theory).
http://www.c2.com/cgi/wiki?RelationalAlternativeToXml
Table-ized A.I.
But using XML correctly is tough. I've written and discarded more DTDs and schemata than I care to admit because they were seriously flawed. Getting it right is important and very, very hard.
Call me a "relational troll" (on second thought, don't), but I think part of the problem is that XML designs tend not to follow relational rules. Relational rules and normalization are fairly well agreed-on and have a lot of experience, history, and some mathematical concepts to back them. Relational normalization is mostly about not repeating information that does not need to be repeated, and not hard-wiring your schema to fit one application/user at the expense of another. True, existing RDBMS implementations don't support dynamic columns very often, but relational theory does not exclude such.
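To make the "don't repeat information" point concrete, here is a small sketch using Python's built-in `sqlite3`. The table and column names are invented for illustration, and prices are stored as integer cents by assumption. The customer and the book title are each stored once, however many orders refer to them.

```python
import sqlite3


def build_db():
    """Create a tiny normalized order schema in memory (hypothetical names)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
        CREATE TABLE title (id INTEGER PRIMARY KEY, name TEXT,
                            unit_price_cents INTEGER);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customer(id));
        CREATE TABLE order_line (order_id INTEGER REFERENCES orders(id),
                                 title_id INTEGER REFERENCES title(id),
                                 qnt INTEGER);

        INSERT INTO customer VALUES (1, 'Bill', 'Brune');
        INSERT INTO title VALUES (209, 'Grapes of Wrath', 1075);
        INSERT INTO orders VALUES (1, 1);
        INSERT INTO orders VALUES (2, 1);
        INSERT INTO order_line VALUES (1, 209, 1);
        INSERT INTO order_line VALUES (2, 209, 3);
    """)
    return con


def order_totals(con):
    """Join the tables back together to get one total per order."""
    return con.execute("""
        SELECT o.id, c.lname, SUM(l.qnt * t.unit_price_cents)
        FROM orders o
        JOIN customer c ON c.id = o.customer_id
        JOIN order_line l ON l.order_id = o.id
        JOIN title t ON t.id = l.title_id
        GROUP BY o.id
        ORDER BY o.id
    """).fetchall()


con = build_db()
print(order_totals(con))
```

A nested XML order document would typically repeat the customer name and the title text inside every order; here a correction to either is a one-row update.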
People complain about XSLT for the same reason that procedural programmers complain "Lisp sucks" or "OOP sucks" or whatever: laziness and aversion to novelty. XSLT is a great declarative (functional {if you're willing to go through contortions}) language that (in combination with XPath, and other X-technologies) is extremely well suited for manipulating XML. That's it! But isn't that enough?
http://www.dublincore.org/
It seems that over the last 3 years a lot of DTDs were created, it's just that few people want to follow XYZ if it doesn't have W. It's always more fun to create one's own, but this always creates problems in the wider world. In fact, I believe most developers are a bit wary of implementing another's XML DTD/tags/attributes/schema etc. because it is not yet HTML 3.2/4.0, or it's not HTML circa 1996 yet.
I've seen a very large company that I worked at try to develop their own special Schema/DTD for making the coolest content management system the world's ever seen, yet they didn't have a clue that the Dublin Core/IMSProject.org may have broken some practical ground. It was more like "what are you talking about?", "What's that all about and why is it better than our homegrown stuff", "We can't control it". Never trust standards to those with extreme individual ambitions.
http://www.imsglobal.org/digitalrepositories/driv1p0/imsdri_bindv1p0.html
or www.dublincore.org
All text is eventually used for learning at some time.
"Although Z39.50 was developed by the library community to allow searching of bibliographic information and the development of client software that, theoretically, can search any library's catalog, the protocol's extension mechanisms have allowed other communities to take advantage of the features of Z39.50. The definition of bibliographic searching has been extended to include the Dublin Core. Community of interest profiles have been defined for information as diverse as cultural heritage: Computer Interchange of Museum Information (CIMI) (http://www.cimi.org/), government and community information: The Government Information Locator Service (GILS) Profile (http://www.gils.net/), and GeoSpatial Data: http://www.blueangeltech.com/Standards/GeoProfile/geo22.htm"
About IMS
The IMS Global Learning Consortium develops and promotes the adoption of open technical specifications for interoperable learning technology. Several IMS specifications have become worldwide de facto standards for delivering learning products and services. IMS specifications and related publications are made available to the public at no charge from www.imsglobal.org. No fee is required to implement the specifications.
IMS is a worldwide non-profit organization that includes more than 50 Contributing Members and affiliates. These members come from every sector of the global e-learning community. They include hardware and software vendors, educational institutions, publishers, government agencies, systems integrators, multimedia content providers, and other consortia. The Consortium provides a neutral forum in which members with competing business interests and different decision-making criteria collaborate to satisfy real-world requirements for interoperability and re-use.
####
For more information contact
Marketing, marketing@imsglobal.org
http://www.imsglobal.org
> XSLT providing tremendous advantages in
> transforming data for a variety of other purposes (you
> probably hated lisp/scheme based language, too).
Gah. I'm tired of people comparing XSLT to Lisp or Scheme. Okay XSLT can transform and generate itself just like Lisp, but that's where the similarities end. In almost every other design aspect it is the opposite of Lisp.
XSLT is an incredibly baroque, verbose language, only useful for a very limited set of trivial XML transformations (i.e., surprise, style sheets!) that involve no I/O or complex computations. If you do a lot of this, then maybe it is worth learning, but my experience is that you can hit its limits very quickly.
Lisp on the other hand is an incredibly elegant, compact and powerful general purpose language that has been used in almost every application domain imaginable. Maybe the most elegant, clear and powerful single programming language ever invented, where very complex functionality can often be written in an amazingly small amount of understandable code.
Your sig rocks.
I've been listening to that album for the past week nonstop and I think my brain is changing.
Ade_
/
Big Bubbles (no troubles) - what sucks, who sucks and you suck
the infinite loops that xml handles so well!?
Nice review. Thanks! It's interesting how many of the comments here relate directly to chapters in the book. For instance, there's a lot of concern about XML's perceived verboseness. This is addressed directly in Item 50, Compress if space is a problem. This chapter and ten others are online at http://www.cafeconleche.org/books/effectivexml/ . Check it out.
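As a quick illustration of the "compress if space is a problem" advice (a sketch under my own assumptions, not an example taken from the book): repeated tag names are exactly the kind of redundancy a general-purpose compressor removes. The `ITEM` record and the `pack`/`unpack` helpers below are hypothetical.

```python
import gzip


def pack(xml_text):
    """Compress an XML document for storage or transfer."""
    return gzip.compress(xml_text.encode("utf-8"))


def unpack(blob):
    """Recover the original XML text unchanged."""
    return gzip.decompress(blob).decode("utf-8")


# Hypothetical, deliberately repetitive document.
ITEM = "<item><id>209</id><title>Grapes of Wrath</title><qnt>1</qnt></item>"
DOC = "<manifest>" + ITEM * 500 + "</manifest>"

blob = pack(DOC)
assert unpack(blob) == DOC
print(len(DOC.encode("utf-8")), "bytes raw,", len(blob), "bytes gzipped")
```

Real documents are less repetitive than this one, so expect a smaller ratio, but the document stays plain XML for every tool that reads it after decompression.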
There have been a lot of comments on performance and the possibility of binary formats. A little googling turned this up:
http://www.xml.com/pub/a/2001/04/18/binaryXML.html
Summary: you would *think* binary would be a performance boost, but that doesn't seem to be the case.
John.
Nice redirect.
There are some XML alternatives