Slashdot Mirror


Effective XML

milaf writes "Who doesn't know about XML nowadays? Quite a few people, actually: there has been so much hype around it that some people think that XML is a programming language, a database, or both at the same time. On the other hand, if you are a developer, chances are that you feel that -- no matter its usefulness -- there is not much to XML. After all, it may take just a few hours to get the hang of creating and parsing an XML document. Maybe this is why most of the many and voluminous books discuss numerous XML-related technologies, but say less about the usage of XML itself." Read on for milaf's review of a book that takes the opposite tack.

Effective XML: 50 Specific Ways to Improve Your XML
author: Elliotte Rusty Harold
pages: 336
publisher: Addison-Wesley
rating: 10/10
reviewer: milaf
ISBN: 0321150406
summary: Very well written collection of topics on XML Best Practices

In Effective XML: 50 Specific Ways to Improve Your XML, Elliotte Rusty Harold takes a different approach: know your elements and tags -- they are not the same thing! -- and weigh your choices in a context, because any technology applied for the wrong reasons may fail to deliver on its promises.

Following Scott Meyers' groundbreaking Effective C++, the author invites us to re-evaluate seemingly trivial issues and discover that life is not as simple as it seems in the world of XML. In each of the 50 items (chapters), he gets into the inner workings of the language, its usage, and related standards, giving us specific advice on how to use XML correctly and efficiently. The 300-page book is divided into four parts: Syntax, Structure, Semantics, and Implementation. Yet in the introduction, the author sets the tone by discussing such fundamental issues as "Element versus Tag," "Children versus Child Elements versus Content," "Text versus Character Data versus Markup," etc. On these first pages the author started earning my trust and admiration for his knowledge and his ability to get right to the point in clear and simple language.

The first part, Syntax, contains items covering issues related to the microstructure of the language, and best practices in writing legible, maintainable, and extensible XML documents. (In it, over 19 pages are dedicated to the implications of the XML declaration!) That seems like a lot for one XML statement that most people cut and paste at the top of their XML documents without giving it much thought, doesn't it? Actually not, if you follow the author's reasoning and examples.

The second part, Structure, discusses issues that arise when creating data representation in XML, i.e. mapping real-world information into trees, elements, and attributes of an XML document; it also talks about tools and techniques for designing and documenting namespaces and schemas.

The third part, Semantics, explains the best ways to turn the structural information in XML documents into meaningful data. It teaches us how to choose the appropriate API and tools for different types of processing to achieve the best effect. This part has a lot of good advice for creating solutions that are simple, effective, and robust.

The final part, Implementation, advises the reader on design and integration issues related to the utilization of XML; these issues include data integrity, verification, compression, authentication, caching, etc.

This book will be useful to a professional with any level of experience. It may be used as a tutorial and read from cover to cover, or one can enjoy reading selected items, depending on experience and taste. The book's very detailed index makes it an excellent reference on the subject as well. In the preface to the book, the author writes, "Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime." I'm not sure about the "lifetime" -- that's an awfully long time for using one technology -- but for the most confident of us this still may not be enough :) . Your mileage may vary, but I suspect that you could shave a few months off that time by browsing through this book once in a while. Most importantly, it will make you a better professional and make you proud of the results of your work. Wouldn't this be worth your while?

You can purchase Effective XML: 50 Specific Ways to Improve Your XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

312 comments

  1. library by Pompatus · · Score: 5, Interesting

    If you want to read any book for free, just ask your local library to order it and they will. Libraries guess at what books people want to read, so if anyone shows any interest in any book, they order it. They loose their federal funding if they don't spend the money they are allocated, so they are generally VERY willing to buy as much as possible.

    --

    ----
    Squirrel ... It's not just for breakfast anymore
    1. Re:library by mrfunky405 · · Score: 0

      This is offtopic? I wonder about the kickbacks the janitors must be getting from bn.com.

    2. Re:library by Anonymous Coward · · Score: 0

      I thought the appropriate response was to ask what kickbacks the original poster was getting from the libraries.

    3. Re:library by Anonymous Coward · · Score: 0

      I've never heard of libraries receiving federal funding. Maybe for internet access, but not for books. My town library receives some money from the town gov't and the remainder of their budget comes from an annual plant/book/food sale.

    4. Re:library by essdodson · · Score: 2, Funny

      They should spend more on dictionaries. Perhaps then they could tighten up their federal funding before it gets too loose.

      --
      scott
    5. Re:library by Anonymous Coward · · Score: 0

      yeah, but I don't want to steal from authors... libraries are for pirates.

    6. Re:library by styrotech · · Score: 1

      Hell, if you want to read this review elsewhere - look at the first Amazon review dated Nov 3.

      Now, where can I find someone to pay me for prolific book reviews?

    7. Re:library by Anonymous Coward · · Score: 0

      They should spend more on dictionaries.

      Or they could just bitch and whine at random people on the Internet over typographic errors until everyone's grammar magically becomes perfect...

  2. One thing is for sure... by foistboinder · · Score: 4, Funny

    It's got to be better than Ineffective XML

    1. Re:One thing is for sure... by FortKnox · · Score: 1

      Hate to reply to a joke, but there ARE books that discuss the 'wrong' way to do things in order to avoid them.

      One that comes to mind would be Bitter Java which demonstrates wrong patterns used in applications and alternatives that tend to be more effective.

      So don't be too sure that it is better than Ineffective XML ;-)

      --
      Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
    2. Re:One thing is for sure... by roundand · · Score: 1

      but there ARE books that discuss the 'wrong' way to do things in order to avoid them.

      I can also recommend the AntiPatterns Book as having some wince-makingly familiar bad software patterns, analysis of how they arrive, and re-factorings to escape them.

      So I'd certainly give a book called Ineffective XML a look. Especially if it was written by someone who's seen as much good and bad markup as Elliotte Rusty Harold.

    3. Re:One thing is for sure... by elharo · · Score: 2, Funny

      Hmm, I was wondering what I could do for a sequel. (Only half-kidding).

  3. Government Health Warning by NickFitz · · Score: 4, Funny
    Learning how to use XML effectively might take a lifetime
    ...
    you could shave a few months off that time by browsing through this book

    Reading this book shortens life expectancy. Still, it's your choice...

    --
    Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
    1. Re:Government Health Warning by Anonymous Coward · · Score: 0

      Will you look at these mod scores?


      Starting Score: 1 point
      Moderation +1

      60% Funny
      20% Offtopic
      20% Troll


      How is this anything but +1, Funny? MOD PARENT UP!

  4. Why do I have the feeling... by ultrabot · · Score: 0, Insightful

    That the book won't mention the "s-exprs on drag" angle...

    --
    Save your wrists today - switch to Dvorak
    1. Re:Why do I have the feeling... by elharo · · Score: 3, Insightful

      You're right about that. It doesn't. Not all technologies that are isomorphic to each other are equally useful, any more than all Turing complete programming languages are the same. The representation matters, and the XML representation has proven more useful and accessible than the S-expression representation.

      I'm not fully convinced that S-expressions are isomorphic to XML either. The proper handling of Unicode and non-English, non-ASCII text presented in multiple encodings is a big advantage of XML compared to S-expressions. I suppose something like this could theoretically be added to S-expressions, but has it been?
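A minimal sketch of the encoding point, assuming Python's standard `xml.etree.ElementTree` parser and a made-up `<name>` document: the same text is serialized in two encodings, each named in its XML declaration, and the parser recovers the same string from both.

```python
import xml.etree.ElementTree as ET

# The same hypothetical document, serialized in two different encodings.
# Each copy names its own encoding in the XML declaration.
text = "Jos\u00e9"
latin1_doc = f'<?xml version="1.0" encoding="ISO-8859-1"?><name>{text}</name>'.encode("iso-8859-1")
utf8_doc = f'<?xml version="1.0" encoding="UTF-8"?><name>{text}</name>'.encode("utf-8")

# The parser reads the declaration and decodes the bytes accordingly.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

print(latin1_text == utf8_text, latin1_doc == utf8_doc)
```

The byte streams differ, yet the application sees one Unicode string; no out-of-band agreement about the encoding is needed.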

  5. Health hazard? by Drantin · · Score: 0, Funny
    "Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime."
    --I suspect that you could shave a few months off that time by browsing through this book once in a while.

    ...Does this mean the book could shorten our lifetime? Or shave a few months off of a week?
    --
    Actio personalis moritur cum persona. (Dead men don't sue)
    1. Re:Health hazard? by Drantin · · Score: 1

      bah.. someone else spotted it while i was typing :-/

      --
      Actio personalis moritur cum persona. (Dead men don't sue)
  6. Unix Tab-Separated ASCII Files vs. XML by billstewart · · Score: 4, Interesting

    Sure, XML isn't inherently that deep - but neither are the tab-separated ASCII files which Unix tools used to do all kinds of really powerful things. Similarly, LISP property lists aren't that complex. XML's a bit more flexible, and carries enough decoration with it that people are willing to use it for building interfaces that they might not build using ASCII or XDR. And anything that lets the EDI people replace their stuff with simpler, more open technology is good too..

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:Unix Tab-Separated ASCII Files vs. XML by Anml4ixoye · · Score: 4, Interesting
      And anything that lets the EDI people replace their stuff with simpler, more open technology is good too..

      My current project for the last 8 months has been working on just that - parsing HIPAA EDI transactions. We do it by converting them to XML data structures. There is a decent white paper about it too.

      What I've found is that, for readability, XML is the way to go. For performance, EDI is definitely better. I have one EDI file that is 23k. When expanded to XML, it is close to 5000 lines long.

      I agree with an earlier post. If you are using a hardware XML accelerator, or using small XML documents (config, etc.), or need readability over performance, then it is great. But I have a hard time believing that it will replace tab-separated files any time soon (not that the parent poster was implying this).

    2. Re:Unix Tab-Separated ASCII Files vs. XML by binaryDigit · · Score: 1

      Sure, XML isn't inherently that deep - but neither are the tab-separated ASCII files which Unix tools used to do all kinds of really powerful things.

      I wholeheartedly disagree. XML adds a level of standardization that is unheard of (though not impossible technically to achieve) vs any type of tab/comma/verticalbar/whatever (I'll refer to any file like this as csv). Using csv, you either have to agree on a convention for labeling, or you're stuck using positions to access data. If your schema changes, unless you always add new things at the end, every piece of software that is assuming that ordering breaks. csv files have no standard way of representing parent child relationships (you need some type of agreed upon "record marker" to know what type of record each line contains) or even worse, people flatten the data.

      csv files are merely extensions of paper typed columnar data. At least XML goes a step beyond that and models itself more after the conceptual record. This and the more "standardized" definitions makes it significantly more powerful than csv files were in the past.
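A rough illustration of the positional-versus-labeled argument above, using Python's standard `csv` and `xml.etree.ElementTree` modules. The record layout and the tag names (`last`, `middle`, `first`, `id`) are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record, before and after a new "middle" field is inserted.
old_tsv = "Doe\tJohn\t1234567\n"
new_tsv = "Doe\tQ\tJohn\t1234567\n"

def first_name_by_position(tsv_text):
    # Positional access: assumes the first name is always column 1.
    row = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return row[1]

old_xml = "<person><last>Doe</last><first>John</first><id>1234567</id></person>"
new_xml = ("<person><last>Doe</last><middle>Q</middle>"
           "<first>John</first><id>1234567</id></person>")

def first_name_by_label(xml_text):
    # Labeled access: looks the element up by tag, wherever it sits.
    return ET.fromstring(xml_text).findtext("first")

print(first_name_by_position(old_tsv), first_name_by_position(new_tsv))
print(first_name_by_label(old_xml), first_name_by_label(new_xml))
```

The positional reader returns the wrong field once the column is inserted, while the labeled lookup is unaffected. A delimited file with an agreed header row (for instance read with `csv.DictReader`) gets the same robustness, which is the "convention for labeling" the comment mentions.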

    3. Re:Unix Tab-Separated ASCII Files vs. XML by Anonymous Coward · · Score: 0

      Well, you can't actually compare a 23K EDI file to a 5000 line XML file, since if the XML file has fewer than an average of about 4-5 characters/line, it still wins. However, uncompressed storage requirements aren't really such a big deal; disk space and memory are cheap.

      The real problem with larger files is transmission speeds/costs. It would be interesting to compare the compressed sizes of the EDI and XML file. Maybe XML files should be gzip compressed by default? If you did it at the file system level, it'd be transparent. That'd be wacky (I'm not advocating it), but cool.

      Incidentally, I don't believe you should evaluate a technology on performance considerations alone. After all, most big Web sites using HTTPS couldn't operate without hardware SSL accelerators, but that isn't an argument against SSL.
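The compressed-size comparison suggested above is easy to try. The sketch below uses synthetic data (1,000 made-up records, hypothetical tag names), not a real EDI file, so it only hints at how much of the markup overhead gzip can absorb.

```python
import gzip

# Synthetic data: 1,000 records in a tab-separated layout and in XML.
rows = [("Doe", "John", str(1000000 + i), "12/1/2001") for i in range(1000)]

flat = "".join("\t".join(r) + "\n" for r in rows).encode("utf-8")
xml = ("<people>" + "".join(
    f"<person><last>{last}</last><first>{first}</first>"
    f"<id>{ident}</id><date>{date}</date></person>"
    for last, first, ident, date in rows) + "</people>").encode("utf-8")

sizes = {
    "flat": len(flat),
    "xml": len(xml),
    "flat.gz": len(gzip.compress(flat)),
    "xml.gz": len(gzip.compress(xml)),
}
for name, size in sizes.items():
    print(f"{name:8} {size:8d} bytes")
```

Repeated tag names compress well, so the gap between the two compressed files should be much smaller than the gap between the raw files. That addresses transmission cost only; it does nothing for parse time, which is the complaint in the replies.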

    4. Re:Unix Tab-Separated ASCII Files vs. XML by billstewart · · Score: 1
      I'm not arguing that XML is useless or not powerful - it's possible to do really good standardization if you want to. SQL for relational databases is also pretty good at standardizability and readability (so I'll dispute your "unheard of" assertion, and point out that you're using the term "schema"....) I am a bit concerned that people will be too likely to reimplement a lot of the 1970s hierarchical database structure things that we replaced with relational databases during the late 1980s-1990s, though XML probably makes it easier to do that well as opposed to badly.

      But just as you can write undecipherable undocumented bad spaghetti code in almost any computer or non-computer language, you can also write uselessly non-standardizable XML that nobody can communicate with independently, even if you _aren't_ deliberately trying to obfuscate it. Microsoft Office's latest version uses XML for data interchange between components, and I've heard people assert that it's not possible for a programmer who isn't buying a bunch of APIs (or otherwise getting Office to do tasks) to communicate with it dependably. I haven't verified that myself, but I've seen the HTML code produced by previous versions, so I'm inclined to believe it.

      --

      Bill Stewart
      New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    5. Re:Unix Tab-Separated ASCII Files vs. XML by Anml4ixoye · · Score: 1
      Well, you can't actually compare a 23K EDI file to a 5000 line XML file, since if the XML file has fewer than an average of about 4-5 characters/line, it still wins.

      The XML file contains at least 20 characters per line for a majority of the document.

      However, uncompressed storage requirements aren't really such a big deal; disk space and memory are cheap.

      This isn't really as much about disk storage space as it is about parsing through the document. And it isn't really about that either; it is about the people who talk about how easy it is to take a 5000-line XML file and *poof* transform it with XSL. It's the tradeoff of readability for performance. If you need raw performance, you use something like EDI. It is difficult to read and program for (simply because it is so cryptic) compared to a nice XML structure, but the machine doesn't care about readability (to a degree).

      The real problem with larger files is transmission speeds/costs. It would be interesting to compare the compressed sizes of the EDI and XML file.

      That would be interesting to find out. I'm sure the XML file would be marginally larger, but again the issue isn't with the bandwidth for the file, it is with the processing of the document.

      After all, most big Web sites using HTTPS couldn't operate without hardware SSL accelerators, but that isn't an argument against SSL.

      Correct. In those cases they traded the performance of not having HTTPS for the security of having it. My guess would be that Slashdot couldn't keep up the load it does on the same hardware if everything were HTTPS. But there is no reason for them to have it. Same goes for XML. When you need the readability (or the ability to say you "do" XML) you go with XML. There isn't anything wrong with either approach. XML is an excellent method for us to allow SQL, Informix, HP3000's and Servlets to all talk to each other. EDI is great for receiving the transmissions from providers.

  7. The main issue with XML is performance by Anonymous Coward · · Score: 4, Informative
    Others have said it before, but I'll say it again. XML is heavyweight and isn't free. The best example of this is SQLXML. Although it sounds nice to use SQLXML, most commercial databases see a huge drop in performance with it. This is due to the fact that parsing XML blows and eats up copious amounts of CPU and memory. I've had people ask me about how to solve problems with SOAP on Windows and Java applications. The bottom line is, unless you're using hardware XML accelerators, XML is a resource hog.

    On a related note, more details on Microsoft Indigo are finally available. According to this article on XML mania, Microsoft's future platform will use XML as much as possible. More details are available on Microsoft's site. The funniest part is they are claiming Indigo + Longhorn will be the best thing since sliced bread. Maybe they haven't learned the hard lesson that parsing XML kills performance.

    1. Re:The main issue with XML is performance by musikit · · Score: 0, Troll

      Although I agree with you in pretty much all respects about XML, the reason I feel that MS is changing everything to XML data vs. non-XML data is that there is nothing left to do but security patches, device drivers, and tuning the OS so it doesn't take 2 gigs. So they need to add pointless requirements that will do nothing but slow performance, selling faster processors and more memory. Everything for the user experience was in with Windows 98.

    2. Re:The main issue with XML is performance by satyap · · Score: 1

      Not free? How do you mean, money-wise, processor-wise, or what? I agree that sometimes using XML is a bigger pain than not using it. Especially the gratuitous examples in some documentation. Like .

    3. Re:The main issue with XML is performance by Anonymous Coward · · Score: 1, Informative

      Sure, I've tried, and it's not an easy task. The original post is bitching, but have you seen anybody improve performance by 2-3x in the last 3 years? Is it even possible to improve XML parser performance beyond current parsers? Both .NET and Java have some nice parsers that use either SAX or stream-based parsing. In .NET it's the XmlTextReader. In Java there's XML PullParser v2.

      Think of it another way. If you only need a few nodes in an XML structure, you can definitely improve performance. On the other hand, if you're doing web services and using a DOM-centric approach, there's very little you can do outside of hardware acceleration. Put another way, your statement "or maybe they stopped bitching about the XML performance and found a faster/better way to parse it" is equivalent to saying "write a faster encryption library". There's only so much you can do with software to speed up XML parsing.

      IBM, Sarvega and a couple of other companies are making XML accelerators to get around the performance problem. Go google for it and you'll see that hardware accelerators can provide a 10x improvement over the fastest SAX/stream parser today.
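The "only need a few nodes" case can be sketched with Python's `xml.etree.ElementTree`. This is an analogy to the stream-style readers named above, not the .NET or Java APIs themselves, and the `<orders>` document is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: several <order> elements, of which only the ids matter.
doc = ("<orders>" +
       "".join(f"<order><id>{i}</id><note>filler text</note></order>"
               for i in range(5)) +
       "</orders>").encode("utf-8")

def collect_ids_streaming(source):
    """Stream through the document, keeping only the <id> text."""
    ids = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "id":
            ids.append(elem.text)
        elif elem.tag == "order":
            elem.clear()  # discard the finished <order>'s children and text
    return ids

def collect_ids_dom(source):
    """Build the whole tree first, then query it."""
    return [e.text for e in ET.parse(source).getroot().iter("id")]

print(collect_ids_streaming(io.BytesIO(doc)))
print(collect_ids_dom(io.BytesIO(doc)))
```

Both functions return the same ids. The streaming version never holds more than one populated `<order>` subtree at a time, which matters for large inputs; the tokenizing cost itself is still paid in full, which is the comment's point about the limits of software-only speedups.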

    4. Re:The main issue with XML is performance by satyap · · Score: 1

      Like .

    5. Re:The main issue with XML is performance by I8TheWorm · · Score: 2, Interesting

      I share your opinion regarding XML, and have yet to find a great reason to use it, other than feeding data to our vendors' systems through their proprietary file layouts.

      On that note though, I wonder if this author has some insight into better uses for XML than what I've typically seen (XML does everything!). I won't, however, be running out to buy it, as XML will always be just more bloat and a resource hog by nature.

      --
      Saying Android is a family of phones is akin to saying Linux is a family of PCs.
    6. Re:The main issue with XML is performance by I8TheWorm · · Score: 5, Insightful

      To put it another way...

      this single record

      Doe, John 1234567 12/1/2001

      took 31 bytes, while its XML companion (using short, simple tags) took 96 bytes.

      Not all XML files wind up being 3 times the size of their flat-file counterparts, but they are inherently larger. There really isn't a way to make loading/parsing that data any faster, by the nature of working with ASCII/ANSI files. XML will always be slower.

      --
      Saying Android is a family of phones is akin to saying Linux is a family of PCs.
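The XML version of the record did not survive in the comment above, so the markup below is a guess at what "short, simple tags" might look like. It only illustrates how the overhead can be measured; it does not reproduce the poster's 31 and 96 byte figures.

```python
flat_record = "Doe, John 1234567 12/1/2001\n"

# Hypothetical XML equivalent; the tag names are an assumption,
# not the poster's original markup.
xml_record = ("<rec><last>Doe</last><first>John</first>"
              "<id>1234567</id><date>12/1/2001</date></rec>\n")

flat_bytes = len(flat_record.encode("ascii"))
xml_bytes = len(xml_record.encode("ascii"))
payload_bytes = len("DoeJohn123456712/1/2001")  # the field values alone

print("flat record:", flat_bytes, "bytes")
print("xml record: ", xml_bytes, "bytes")
print("markup only:", xml_bytes - payload_bytes - 1, "bytes")  # minus the newline
```

With tags this short the markup already outweighs the data, and longer, more descriptive tag names widen the gap further.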
    7. Re:The main issue with XML is performance by vmfedor · · Score: 3, Insightful
      This is the main reason I personally don't believe XML can be used as a functioning database. I see it being used more as a way to transport data across the internet and across different platforms. If two companies merge and one uses mostly UNIX-based servers and the other uses Microsoft, the two can combine their databases easily using XML.


      I see XML as a nice way to transport data but (at least right now) it's not mature and/or fast enough to serve as a fully functioning database.

      --

      I like my women how I like my sugar.. granulated.

    8. Re:The main issue with XML is performance by ivan256 · · Score: 1

      Yeah. Use more memory and a faster processor.

      They make most of their consumer OS money through OEM sales anyway, so why not take advantage of all that under utilized power.

      Nobody knows better than Microsoft that it's buzzwords that sell software. There's probably only 1 in a thousand users that actually even begin to take advantage of any features made available by the changes that they put into their back end software these days. Their design decisions are all purely marketing related. Fooling yourself into thinking there's novel technological progress behind any of their decisions or that they're technology driven in the slightest way is short sighted, and quite honestly it would be bad business for them to work that way.

      Believe me, if Microsoft does announce some novel way to accelerate XML parsing or does manage to improve performance beyond that of everybody else, it'll be because marketing made it a product requirement, not because they're sitting around brainstorming about text parsing for fun in the engineering department.

    9. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      How hard is it to call .Serialize and .Deserialize on objects?

      You're a fuckin moron.

    10. Re:The main issue with XML is performance by Citizen+of+Earth · · Score: 3, Interesting

      Others have said it before, but I'll say it again. XML is heavy weight and isn't free.

      XML needs to be updated to allow binary encoding. The open-source high-performance parser/generator library at the link demonstrates the performance gain.

    11. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0
      Believe me, if Microsoft does announce some novel way to accelerate XML parsing or does manage to improve performance beyond that of everybody else, it'll be because marketing made it a product requirement, not because they're sitting around brainstorming about text parsing for fun in the engineering department.

      That makes me think marketing guys are somewhat useful. If it takes a bunch of marketing droids to make better XML performance a priority, then it may benefit the world of XML in general, which isn't a bad thing. Of course, not that I'm going to hold my breath that it will actually happen. And no matter how smart Microsoft programmers are, they still won't be able to beat hardware accelerators.

    12. Re:The main issue with XML is performance by Anthony+Boyd · · Score: 1
      This is the main reason I personally don't believe XML can be used as a functioning database. I see it being used more as a way to transport data across the internet and across different platforms.

      XML is not a functioning database. XML is a way to transport data. So your misgivings are due to the fact that you have stumbled across reality.

    13. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0
      XML needs to be updated to allow binary encoding. The open-source high-performance parser/generator library at the link demonstrates the performance gain.

      A couple of friends and I have thought about that, but it would go against the whole human-readable goal of XML. If it were an optional aspect of XML, that would go a long way toward improving XML performance. I've seen some other benchmarks that show XML can outperform RMI under very specific cases: basically, simple objects that really shouldn't be using RMI to begin with and should be marshaled by value.

    14. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0
      How hard is it to call .Serialize and .Deserialize on objects?

      well that's what I said, just not the exact words. Though you'd be amazed at how many times I've seen that happen. Not everyone uses technology the way it was designed or intended.

    15. Re:The main issue with XML is performance by RetroGeek · · Score: 1

      How hard is it to call .Serialize and .Deserialize on objects

      Ok, are your ints big-endian or little-endian?

      How big IS your int, 32 bits, or 64 bits?

      Serialising objects is fine if you are xfering between identical systems, but if you want your Windows box to get data from your mainframe, you need something better.

      And yes you can bit-bash the data, but you need to build a bit-basher for EVERY different system (and s/w) which communicates.

      XML is a general way of moving data around where the creator/parser is standardized.

      --

      - - - - - - - - - - -
      I am a programmer. I am paid to produce syntax not grammar. Deal with it.
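The byte-order and width questions in the comment above can be made concrete with Python's `struct` module. The value and the `<id>` element are arbitrary choices for this sketch.

```python
import struct
import xml.etree.ElementTree as ET

value = 1234567

big = struct.pack(">i", value)     # 32-bit signed int, big-endian
little = struct.pack("<i", value)  # same int, little-endian
wide = struct.pack("<q", value)    # 64-bit signed int, little-endian

# Reading big-endian bytes as little-endian silently yields another number.
misread = struct.unpack("<i", big)[0]

# Written as text inside XML, the value has no byte order or width to agree on.
from_xml = int(ET.fromstring(f"<id>{value}</id>").text)

print(big.hex(), little.hex(), len(wide))
print(misread == value, from_xml == value)
```

The binary forms only round-trip when both ends agree on width and byte order in advance; the textual form shifts that agreement onto the XML parser, at the cost of the conversion work discussed elsewhere in the thread.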
    16. Re:The main issue with XML is performance by starm_ · · Score: 3, Informative

      I have to agree with that. Last year I did a work term in a department where they were converting their software to XML and SOAP. When I came in they asked me to learn XML and SOAP (C++'s gSOAP and Java SOAP). We were making and converting distributed applications, usually with a user client made in Java and a C++ server (for performance). A few weeks into my work term I was still in the process of working on one of these SOAP servers when one group finally finished converting one of our main products. When they went to test it they discovered it was too slow to use. When the user on the client side wanted to visualize the results of a database query, it took 40 seconds to serialize the results, send them through the network (fiber optic network, top of the line computers), deserialize them, and display them. It took only 2 seconds with RPC.

      They just didn't know how they could explain this to the users. They could not see that the users would understand that in the new and improved program that looks exactly the same as the old one, when they clicked the "visualize" button they had to wait 40 seconds.

      Also XML is very cryptic. Has anyone tried to do XSLT? My god, I had to do it once and it made a simple task very difficult. There are many more efficient and intuitive ways of visualizing data than XML. XML makes development time very long and costly for some tasks.

      I think that XML has its uses though. Like for making standard word processor documents, and things like that. But it shouldn't be used everywhere like some people seem to think.

    17. Re:The main issue with XML is performance by Boing · · Score: 4, Insightful
      Doe, John 1234567 12/1/2001

      took 31 bytes, while its XML companion (using short, simple tags) took 96 bytes.

      Uh huh. Now let me ask you, is that record space-delimited? Comma-delimited? Fixed-width [shudder]? If it's fixed width, and the first name is fixed at four characters, is the person's name "John" or "John-Paul"?

      31 bytes for your record, and 96 for equivalent XML... but how many extra bytes were spent on code to manage your particular flavor of data? How much time was spent in development of that code? How does that time (and associated cost) compare to the extra millisecond/record required to transmit and process the XML data?

      XML is standard. It can fit almost any type of data (though binary data is not currently the most effective thing in the world, but it can be incorporated). Since MS is integrating XML into all of their products, we won't have to worry about many people who don't have a good XML library installed on their systems. So instead of 50 programs with their own (limited and likely buggy) data formatting subsystems, we'll have 50 programs that each call one library on disk, in a standard, robust system with enough exposure to squash the show-stopping bugs.

      XML will always be slower.

      Depends on how you look at it. If the aforementioned widely-available XML parser gets enough of a beating, it will be optimized like you wouldn't believe. Yes, two data processors (one XML, one markupless) with equal amounts of work spent on them will perform in favor of the simpler format... but XML's simplicity and universality will make it so that the XML parsers will have more eyes.

      The same philosophy is why the well known open-source programs (linux, apache, etc) are functional and stable as hell:

      Wide use + Openness = Greatness.

    18. Re:The main issue with XML is performance by helix_r · · Score: 1


      "Performance" is only one of a long list of things to consider when putting together a complex system. As processor, network and storage speeds get faster and bigger, raw performance becomes less of an issue and often laughably irrelevant.

      XML buys you interoperability and a vast continuum of well-tested standardized tools to work with. Also, xml and the tools that work with it are somewhat future-proof-- the standards have been designed intelligently from the ground up to deal with change.

      If your application needs to consume a configuration file upon starting up, what difference does it make if that takes 10 milliseconds or 1 millisecond? Not much. However, by using XML instead of an ini file you now have all kinds of tools to validate, parse and manipulate your configuration file.
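As a small sketch of the configuration-file case, here is a made-up XML config read with Python's standard parser. The element and attribute names are illustrative only. The standard library parser checks well-formedness, not a DTD or schema, so the structural check here is done by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; element and attribute names are illustrative only.
CONFIG = """<config version="1">
  <server host="localhost" port="8080"/>
  <logging level="info"/>
</config>"""

def load_config(text):
    """Parse the config and check the few things this sketch relies on."""
    root = ET.fromstring(text)  # raises ET.ParseError if not well-formed
    server = root.find("server")
    if server is None or server.get("port") is None:
        raise ValueError("config needs a <server> element with a port attribute")
    logging_elem = root.find("logging")
    level = logging_elem.get("level", "warning") if logging_elem is not None else "warning"
    return {
        "host": server.get("host", "localhost"),
        "port": int(server.get("port")),
        "log_level": level,
    }

settings = load_config(CONFIG)
print(settings)
```

A truncated or mis-nested file fails loudly at parse time rather than producing a half-read configuration, which is part of the tooling benefit the comment describes.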

    19. Re:The main issue with XML is performance by GeckoX · · Score: 1

      Serialize and Deserialize in .NET is totally portable and doesn't matter what the current/target system is. It Serializes and Deserializes to the base .NET datatypes which ARE portable.

      (I realize that this isn't the case with most systems, just a huge benefit of .NET)

      --
      No Comment.
    20. Re:The main issue with XML is performance by gorilla · · Score: 1

      Of course they're larger, it's obviously impossible to have data the same size as data+markup. The advantage of adding the markup is that if your record format changes, then the XML representation doesn't have to, and even if you have to change the representation, then it's possible for your import routine to detect which version of the XML you've got, and handle it correctly. That's impossible in a pure data representation.
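One way an import routine might detect and handle two versions of a format is sketched below. The `<person>` layout and the `version` attribute are assumptions made for the example, not something taken from the book or the thread.

```python
import xml.etree.ElementTree as ET

def import_person(xml_text):
    """Return (first, last) from either version of a hypothetical <person> format."""
    root = ET.fromstring(xml_text)
    # Assumption for this sketch: a missing attribute means version 1.
    version = root.get("version", "1")
    if version == "1":
        # Version 1 stored the whole name in one element: "Last, First".
        last, first = [part.strip() for part in root.findtext("name").split(",", 1)]
    elif version == "2":
        # Version 2 split the name into separate elements.
        first, last = root.findtext("first"), root.findtext("last")
    else:
        raise ValueError(f"unsupported version: {version}")
    return first, last

v1 = '<person version="1"><name>Doe, John</name></person>'
v2 = '<person version="2"><first>John</first><last>Doe</last></person>'
print(import_person(v1), import_person(v2))
```

The version marker travels with the data, so old and new files can sit side by side and the importer can branch per document.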

    21. Re:The main issue with XML is performance by GeckoX · · Score: 1

      Couldn't have said that better.

      --
      No Comment.
    22. Re:The main issue with XML is performance by schapman · · Score: 1

      common... XSLT is fun... ... .. like cancer :P The only way I've found it bearable is to use Sonic Software's Stylus Studio. It makes the hell of XSLT debugging much easier.. and my pain was worse... I was doing XSLT, FOP, and SVG all in the same doc.. for my first introduction to writing XML !! oh.. the pain, the pain, the pain.

      --
      Wouldnt you like to be a pepper too?
    23. Re:The main issue with XML is performance by I8TheWorm · · Score: 1

      > Uh huh. Now let me ask you, is that record space-delimited? Comma-delimited? Fixed-width [shudder]? If it's fixed width, and the first name is fixed at four characters, is the person's name "John" or "John-Paul"?

      I'm assuming since you can read that you also read my minor disclaimer that followed that text. If not, go back and revisit it.

      > XML is standard

      So are flat files, and all you need to know is what the fields are. And if you don't know what they are (and no data dictionary is provided) then you don't need to mess with the file.

      > but how many extra bytes were spent on code to manage your particular flavor of data?

      None, since it was hand typed to form an example with empirical data, which I have yet to see elsewhere in this article.... nothing but subjective opinions. In my experience, flat files don't require any more code than your typical XML parsers... as a matter of fact far less code to manage them, since they don't care about anything but a header file and some sort of delimiter. Simple stuff really.

      > Since MS is integrating XML into all of their products

      Have you seen MS's implementation of XML in ADO? It's horrible, and a bit like a flat file anyway. And it requires some pretty detailed XSL to make it readable... not like the typical XML most people would write out.

      > but XML's simplicity and universality will make it so that the XML parsers will have more eyes.

      That has nothing to do with speed, which is what this thread was about in the first place.

      --
      Saying Android is a family of phones is akin to saying Linux is a family of PCs.
    24. Re:The main issue with XML is performance by Ed+Avis · · Score: 3, Informative

      If you know your XML will conform to a particular DTD, FleXML can be used to generate a very fast parser for it in the style of lex/yacc. You don't have to mess with all that slow DOM or SAX stuff if you're concerned about speed. It may still be a resource hog compared with binary file formats and protocols but not nearly as sucky as often seen (my own code included).

      --
      -- Ed Avis ed@membled.com
    25. Re:The main issue with XML is performance by Not+The+Real+Me · · Score: 2, Insightful

      XML, when it comes to data and databases, is nothing more than a beefed-up alternative to CSV (comma separated values).

    26. Re:The main issue with XML is performance by Citizen+of+Earth · · Score: 0

      A couple of friends and I have thought about that, but it would go against the whole human readable goal of XML.

      I suppose that you can read a GZIPped binary file by looking at a hex dump? Of course not; you use a tool that converts to and from the GZIP format.
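
      The "use a tool that converts to and from the format" idea can be sketched with the gzip module from Python's standard library. The helper names pack and unpack are made up for the example.

```python
import gzip

def pack(xml_text):
    """Compress an XML document into gzip-format bytes."""
    return gzip.compress(xml_text.encode("utf-8"))

def unpack(blob):
    """Recover the original, human-readable XML text."""
    return gzip.decompress(blob).decode("utf-8")

# A repetitive document, which is the case where markup overhead hurts most.
doc = "<people>" + "<person><name>John</name></person>" * 100 + "</people>"
blob = pack(doc)
```

      The compressed bytes are not readable in a hex dump, but unpack(blob) gives back exactly the text that went in, so readability is only one tool invocation away.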

      RMI sucks because Java sucks for raw performance.

    27. Re:The main issue with XML is performance by I8TheWorm · · Score: 1

      I have to ask this though... which is more important to you regarding web apps/services in the enterprise. The possibility of representation changing, or the speed at which you can get the data to the client?

      My entire history of coding in the enterprise has been about saving bandwidth while focusing on performance of the app. I deal with very large amounts of data (as do many programmers that read /. I'm sure) and cannot hog bandwidth here in the office or on the VPN's to our vendor data with files that could easily be smaller and faster.

      Add to that the idea that there are RDBMS's with XML parsers... that's an awful idea. Can you imagine the geniuses that are trying to load GB's at a time using XML? That sort of activity would bring many DB servers to their knees, if not to a screeching halt.

      Personally, I just think well written code handles 99% of what people seem to be using XML for. That being said, I did suggest the book might have some insight into uses for XML that I have yet to come across.

      --
      Saying Android is a family of phones is akin to saying Linux is a family of PCs.
    28. Re:The main issue with XML is performance by helix_r · · Score: 2, Insightful


      "Doe, John 1234567 12/1/2001 "

      If you think about it that is a useless piece of information without lots and lots of context surrounding it.

      * What is Doe?
      * What is " John"?
      * What is 1234567?
      * 12/1/2001 looks like a date. Is it Dec 1 or Jan 12?
      * How do I know if this record is complete?
      * Is my field separator a " " or ","?

      Problem: The year is 2023, we now use format "x" in our records, you need to convert all records to format "x" -- there are 233 different types of records. 7,220,134 records need to be translated in 2 weeks. Which formats will be the easiest to convert??

      XML allows you to beat the above problems by being a somewhat self-describing format. For a few extra bytes you get a lot more functionality, interoperability and future-proof-ness.
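
      A minimal sketch of what "somewhat self-describing" buys you, using Python's standard library. The element names, and the reading of the sample record as last name, first name, employee id and a hire date of December 1, are assumptions made for the example; the original record does not say what its fields mean.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical tagged version of the flat record; field meanings are guesses.
RECORD_XML = """
<employee>
  <lastName>Doe</lastName>
  <firstName>John</firstName>
  <employeeId>1234567</employeeId>
  <hireDate>2001-12-01</hireDate>
</employee>
"""

def read_employee(text):
    """Look fields up by name, so no position or separator has to be guessed."""
    root = ET.fromstring(text)
    return {
        "last_name": root.findtext("lastName"),
        "first_name": root.findtext("firstName"),
        "employee_id": int(root.findtext("employeeId")),
        # An ISO 8601 date removes the Dec 1 / Jan 12 ambiguity.
        "hire_date": date.fromisoformat(root.findtext("hireDate")),
    }
```

      The tags answer the "what is this field" questions, and a truncated record fails to parse at all, which covers the "is it complete" question. The date ambiguity is removed by the ISO format rather than by XML itself.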

    29. Re:The main issue with XML is performance by onomatomania · · Score: 1

      By the time Longhorn actually ships, we'll all have 20 TeraHertz processors to go with our moon colonies and personal rocket packs. Problem solved.

    30. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      Sounds like the problem is not XML performance but your design decisions.

      XML is a tool, it is not a cure-all that needs to be used for all instances. If performance is the key then design around the XML performance hits or design better XML layouts to increase performance.

      In some instances it may seem fat (ex. file conversion with FOP), but on the other hand any other file conversion implementation would be fat as well and more than likely, IMHO, less flexible if not parsing some kind of tree-based markup language to do it.

      In conclusion, when it comes to cost on my end the processor/memory usages are much less expensive than the man-hours it would take to write and develop parsers to do processes that XML is good for. Plus it is much easier to increase computer resources than human resources.

    31. Re:The main issue with XML is performance by Boing · · Score: 1
      > read my minor disclaimer that followed that text

      If, by that, you mean this disclaimer: "Not all XML files wind up being 3 times the size of their flatfile counterparts, but they are inherently larger.", then you misinterpreted my criticism. I was pointing out that fixed-width files, which are not at all uncommon, have a major flaw in that data that are longer than the fixed field size cannot be stored (and are frequently truncated, of all things). XML does not have that problem.
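
      The truncation problem is easy to reproduce. This sketch assumes a first-name field fixed at four characters, as in the "John" versus "John-Paul" example earlier in the thread; the helper names are made up.

```python
import xml.etree.ElementTree as ET

FIRST_NAME_WIDTH = 4  # assumed field width, taken from the thread's example

def to_fixed_width(value, width=FIRST_NAME_WIDTH):
    """Pad short values with spaces and silently cut off long ones."""
    return value[:width].ljust(width)

def to_xml(value):
    """Store the value in an element, which has no length limit."""
    element = ET.Element("firstName")
    element.text = value
    return ET.tostring(element, encoding="unicode")

def from_xml(text):
    """Read the value back out of the element."""
    return ET.fromstring(text).text
```

      With these helpers, to_fixed_width("John-Paul") comes back as "John", while the XML round trip returns the full name.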

      > XML is standard

      So are flat files, and all you need to know is what the fields are. And if you don't know what they are (and no data dictionary is provided) then you don't need to mess with the file.

      > but how many extra bytes were spent on code to manage your particular flavor of data?

      None, since it was hand typed to form an example with empirical data, which I have yet to see elsewhere in this article.... nothing but subjective opinions. In my experience, flat files don't require any more code than your typical XML parsers... as a matter of fact far less code to manage them, since they don't care about anything but a header file and some sort of delimiter. Simple stuff really.

      Okay, think of this on the large scale. There are a million software developers out there, with a hundred projects during their careers that require a data format. Assuming that the developer gets more complicated projects over time, it's reasonable to assume that their later projects will require extensions to their home-grown format that make them incompatible with the earlier ones. (see: internationalization, encryption, multiplicity)

      What you were saying about the data dictionary and stuff is relevant in terms of what's stored in the XML document, but does not address the issue of how complicated my life as a developer is as a result of a bajillion different answers to how that data is stored.

      In addition, you are correct that a parser for a specific flat file format probably takes less time to develop than a parser for XML. But the parsers for a hundred different flat file formats took far more (accumulated) time to develop, and require many times the developer brain real estate to understand them.

      > Have you seen MS's implementation of XML in ADO? It's horrible, and a bit like a flat file anyway. And it requires some pretty detailed XSL to make it readable... not like the typical XML most people would write out.

      I'm sure your brain wasn't much more effective than a flat file when you were four years old either. Microsoft is still early in their widespread adoption of XML. Immaturity is inherent to any product's life cycle. XML parsers are also relatively slow at the moment. More immaturity. But both issues are addressed by the fact that, as the XML presence becomes more ubiquitous, the open-source XML implementations will get more eyes with creative ideas on how to performance-tune and optimize the parsers and transformers, and the proprietary implementations will have more market-share, and thus more money to throw at the same problem.

      > but XML's simplicity and universality will make it so that the XML parsers will have more eyes.

      That has nothing to do with speed, which is what this thread was about in the first place.

      It has everything to do with speed, because more work and money spent directly translates into a more efficient program.

    32. Re:The main issue with XML is performance by I8TheWorm · · Score: 1

      I agree with that issue, except that in typical ISAM or simple flat files, you generally have a header file associated with it. And a "few" extra bytes is a stretch.

      Maybe XML, then, is useful for development where coding fast (less documentation) is important. That's not a world I want to live in, but I'm sure others would think differently.

      --
      Saying Android is a family of phones is akin to saying Linux is a family of PCs.
    33. Re:The main issue with XML is performance by I8TheWorm · · Score: 1

      If I sounded terse in my previous post (I just re-read it) I apologize... I actually enjoy this kind of thread.

      Bear in mind there is a huge difference between a flatfile and a fixed file. Flat files can simply be one line (or in some cases more than one line) per record and delimited by any character or combination of characters. So the idea of extra spaces taking up room is a bit moot if you're delimiting by a single character. Granted, that's one character per "field" in each line of the flatfile, so you would have n-1 extra characters, where n is the number of fields.

      About this... It has everything to do with speed, because more work and money spent, directly translates into a more efficient program.

      Say you get all of the coding done, and you start your batch processing of data, which happens on a daily basis. A batch on the flatfile will always be faster, and in some cases, many multiples of times faster, because there is no parsing to do other than read until the next delimiter.

      Part of the XML standard allows for data to be in any order, so your parser has to check for the tags themselves, and use the data accordingly. In one record employee_number may be at the beginning of the employee's data, in another it might be at the end. With flatfile processing that's not a concern. And if the developers on both ends are worth their weight in wheat, the data will always come in in the specified format, as related in the header file.

      I do see your point, and I have to say that I now see another use for XML that I didn't before, but I'm still not convinced that you couldn't do the same with a flatfile and some halfway decent documentation. Maybe I'm getting old and have hit that point in my career where I resist change or something.

      --
      Saying Android is a family of phones is akin to saying Linux is a family of PCs.
    34. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      I'd say ditch XML and use UBF instead! It supports binary objects and is provably three times faster to parse than XML.

    35. Re:The main issue with XML is performance by helix_r · · Score: 1


      The advantage of xml is strong when you have complex data that is semantically rich.

      For example, cheml is an XML schema for representing arbitrary chemical structures. Doing that with "header" files would be a major undertaking. In chemL, using a well-established peer-reviewed schema, it's a simple job (including validation features). In addition to that, your files are now accessible to a wide variety of software. This makes it vastly easier to add rich features quickly.

      Try that with your custom header format -- assuming you can get someone to even read it and your documentation about it! :-)

    36. Re:The main issue with XML is performance by murdocj · · Score: 1

      Do you have any idea *why* it was taking 40 seconds instead of 2 seconds? Was it sending hundreds of times more data? Making hundreds of more calls?

      I've worked on several apps where data is transmitted from client to server and back again in XML format, and I can't say I've seen any speed issues that are related to using XML.

    37. Re:The main issue with XML is performance by butane_bob2003 · · Score: 1

      All persistence is slow. Parsing and serializing XML is the only slow part, just as establishing database connections and parsing queries is the slow part of using databases. Anytime you have to get at persistent data you will see some performance hits. Persistence is never 'free'. If XML is less 'free' performance-wise than other persistence mechanisms, it makes up for it in portability, flexibility, simplicity, efficiency... Much thought has been given to the performance of XML parsers; most freely available tools are highly optimized and efficient, only retrieving data that is needed (using deferred instantiation and similar techniques). There are many different ways of getting at data stored as XML, including a plain text editor should the need arise. Try doing that with, say, the Windows registry, or an Oracle database. It's rare that performance is the #1 aspect of design in an application, and I have never found using XML to create noticeable performance bottlenecks when applied correctly to a problem. Any other persistence mechanism would have created the same bottleneck when applied similarly.

      --


      TallGreen CMS hosting
    38. Re:The main issue with XML is performance by Zaiff+Urgulbunger · · Score: 1

      Apologies for the weird cross-linking thing, but I've replied to this comment:
      "I share your opinion regarding XML, and have yet to find a great reason to use it, other than feeding data to our vendors systems through their proprietary file layouts."

      Here: LINKY

    39. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      XML is not a way to transport data. XML is a way to annotate a document. Anything else is wrong. An XML document can be a substitute for a protocol, but in the end, when you are using XML, you're just flattening out one protocol into a text stream, using another protocol to transmit it, and then rebuilding it into another after it has been transported.

    40. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      The data is not transmitted in XML format. It is transmitted (I'm guessing here) as TCP packets that are assembled into a text stream which is then parsed into an XML document and then transformed into whatever internal representation your application uses. Maybe realizing this will help you see the speed issues related to using XML.

    41. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      Every XML file format needs a specific parser as well. The only thing an "XML parser" does is check that the XML file has proper XML markup characters and that its structure matches a specifically defined format (DTD or Schema).

      *You* haven't gained anything by using XML. XML does have benefits, however. The main benefit is that you can more easily check for problems such as truncated files or corrupted data (because the format is data as well).
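
      The truncated-file check can be sketched with the standard-library parser, which raises an error on input that is not well-formed. Note the assumption: this only tests well-formedness. ElementTree does not validate against a DTD or Schema, so structural validation would need a separate validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

complete = "<records><record>Doe</record></records>"
truncated = complete[:20]  # simulate a transfer that was cut off mid-file
```

      Here is_well_formed(complete) succeeds and is_well_formed(truncated) fails, because the cut-off text never closes its elements. A delimited flat file cut at a record boundary gives no such signal.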

    42. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      Actually, I've done benchmarks comparing SQLXML vs OLEDB for Microsoft SQL Server. I believe the performance issue mentioned isn't necessarily persistence. For example, I ran simple select queries on a single table with and without SQLXML. My own results showed it was 100x slower with 6 concurrent queries.

    43. Re:The main issue with XML is performance by crucini · · Score: 1
      I think your view is a little simplistic. You seem to be limiting the discussion to communicating tables, which is an area where CSV can work pretty well. But what if there is hierarchical structure? An XML document could have a list of PCs, each containing a list of peripherals with attributes.
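
      To make the hierarchy example concrete, here is a small sketch. The inventory document, its element names and its attribute values are all invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory: PCs, each with its own nested list of peripherals.
INVENTORY_XML = """
<inventory>
  <pc name="desk-1">
    <peripheral type="monitor" vendor="Acme"/>
    <peripheral type="keyboard" vendor="Acme"/>
  </pc>
  <pc name="desk-2">
    <peripheral type="printer" vendor="Initech"/>
  </pc>
</inventory>
"""

def peripherals_by_pc(text):
    """Map each PC name to the types of the peripherals nested under it."""
    root = ET.fromstring(text)
    return {
        pc.get("name"): [p.get("type") for p in pc.findall("peripheral")]
        for pc in root.findall("pc")
    }
```

      In CSV the same data would need either repeated PC columns on every peripheral row or two files joined by an id.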

      And if you don't know what they are (and no data dictionary is provided) then you don't need to mess with the file.

      This doesn't match my experience. I maintain a complex app that receives XML requests and returns XML responses. The ability of a human to quickly read the data is invaluable. Also, as messages evolve, there is no chance of a data item being misinterpreted due to being in the wrong column.

      My XML is simple and readable, not SOAP or anything like that.

      As time goes on, optional tags get added to requests to cause special behavior. Only the programmer who needs that behavior needs to care about the existence of the new tag. Likewise, more info gets added to responses, but the new info can be ignored by existing clients.
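
      The "new tags can be ignored by existing clients" behaviour looks roughly like this. The request layout and the priority tag are hypothetical, chosen only to show the pattern.

```python
import xml.etree.ElementTree as ET

OLD_REQUEST = "<request><account>1001</account><amount>25</amount></request>"
# Same request with a newer, optional tag added.
NEW_REQUEST = ("<request><account>1001</account><amount>25</amount>"
               "<priority>high</priority></request>")

def handle_v1(text):
    """Older handler: reads only the tags it knows about and ignores the rest."""
    root = ET.fromstring(text)
    return {"account": root.findtext("account"),
            "amount": int(root.findtext("amount"))}

def handle_v2(text):
    """Newer handler: also reads the optional tag, with a default when absent."""
    root = ET.fromstring(text)
    result = handle_v1(text)
    result["priority"] = root.findtext("priority", default="normal")
    return result
```

      The old handler gives the same answer for both requests, and the new handler still accepts old requests, so neither side has to be upgraded in lockstep.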

      CSV has its place - importing and exporting table-formatted data. But for cross-platform messaging, XML is the best right now.
    44. Re:The main issue with XML is performance by RetroGeek · · Score: 2, Insightful

      doesn't matter what the current/target system is

      Well yes it does matter. It must be .NET

      With XML, I can create it in DOS version 1 using an 8bit utility, put it onto a diskette and have a user read it on a Linux, Windows, OS/2, ... system.

      --

      - - - - - - - - - - -
      I am a programmer. I am paid to produce syntax not grammar. Deal with it.
    45. Re:The main issue with XML is performance by starm_ · · Score: 1

      Well the data structure that was sent was about 2MB before serialization. Now I don't know much about RPC but I assume it must send this data almost "as is", binary representation or whatever. On our fiber optic network it took about 2 seconds to send 2 MB of data.

      Now I'm not sure exactly how XML SOAP made it slower. But I assume that it is mostly serializing and deserializing the 2MB data structure. Even if XML overhead tripled the size it would have taken only 6 seconds to send the 6 MB. There would still have been 34 of the 40 seconds to account for.
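
      The back-of-the-envelope estimate can be written out as follows. The inputs are the figures quoted in this thread (2 MB in 2 seconds, a threefold size increase, 40 seconds observed); the assumption that throughput stays constant as the payload grows is mine.

```python
def unexplained_seconds(observed_s, raw_mb, raw_s, bloat_factor):
    """Time left over after accounting for sending the larger XML payload."""
    throughput_mb_per_s = raw_mb / raw_s           # 2 MB in 2 s gives 1 MB/s
    xml_transfer_s = raw_mb * bloat_factor / throughput_mb_per_s
    return observed_s - xml_transfer_s

leftover = unexplained_seconds(observed_s=40, raw_mb=2, raw_s=2, bloat_factor=3)
```

      With those numbers the transfer accounts for 6 seconds and leftover comes to 34 seconds, which would have to be spent somewhere other than on the wire.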

    46. Re:The main issue with XML is performance by crucini · · Score: 1
      I deal with very large amounts of data (as do many programmers that read /. I'm sure)

      That's probably the key issue. I wouldn't want to move large amounts of tabular information with XML. Most of the XML messages I handle are quite small, and the database processing time dwarfs the time to generate and transmit the XML. Bloating 1 KB to 3 KB doesn't matter, but bloating 100G to 300G would definitely matter.
    47. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      > Have you seen MS's implementation of XML in ADO? It's horrible, and a bit like a flat file anyway.

      To be fair, I don't think you are supposed to use ADO XML to interface with other systems. It's just a way of serializing data. ADO.NET is completely different.

    48. Re:The main issue with XML is performance by Trejkaz · · Score: 1

      Your short tags aren't exactly the shortest. I mean, 'person' could be 'p' if you really wanted to skimp on space, but the point is for the document to describe itself.

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    49. Re:The main issue with XML is performance by Pseudonym · · Score: 1
      XML is heavy weight and isn't free. The best example of this is SQLXML.

      Actually, SQLXML is possibly the worst example of XML use. It's hardly the best.

      I work with large amounts (terabytes) of structured marked-up text. XML is almost precisely the right way to approach this problem domain. Forcing tabular data into XML is almost precisely the wrong way to approach that problem domain. Horses for courses.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    50. Re:The main issue with XML is performance by iabervon · · Score: 1

      Care for some column headings for that record? Once you include the column headings, the flat version is 2/3 the size of the XML version. Sure, you don't need another set of column headings for each record, but, since you're talking about a single record, you can't just ignore the column headings.

      Furthermore, XML isn't well suited for the case where you have a large number of records with the same column structure. It's not really even worthwhile for all different columns if you don't have nested records. But once you have a complicated structure, XML becomes much more manageable than defining separate tables with ids and references.

      Once you get to something like an Apache config file, doing it as a flat file doesn't save you significant space, and makes the whole thing totally unusable.

    51. Re:The main issue with XML is performance by Boing · · Score: 1
      Part of the XML standard allows for data to be in any order

      That's true, in that the XML standard says nothing about what content you should put in your elements... it simply specifies the syntax with which you create them. On the other hand, strict ordering is a supported part of XML. I mean, by your logic, a regular XHTML file could appear as follows (minus XML declaration, etc), and mean the same thing as the "correct" version:

      <html>
      <body>
      <div class="second_paragraph">Who's there?</div>
      <p class="first_paragraph">Knock, knock.</p>
      </body>
      </html>

      Yes, if the definition of your specific type of XML file does not specify ordering, then you're pretty much boned when it comes to parsing... you'll potentially have to read in the whole thing. Then again, any half-decent data designer will presumably use strict ordering in the document definition unless there's a good reason to leave it unordered. And if there's a good reason to leave it unordered, then you would have been boned with a flat file anyway, since you obviously have a flexible enough structure that parsing your flat file would be just as (or more) complex.
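
      Both halves of the ordering argument can be shown in a short sketch. The employee document is hypothetical (the employee_number tag name comes from the parent comment), and ElementTree here stands in for whatever parser is actually in use.

```python
import xml.etree.ElementTree as ET

# Two records with the same fields in a different order.
EMPLOYEES_XML = """
<employees>
  <employee><employee_number>17</employee_number><name>Ann</name></employee>
  <employee><name>Bob</name><employee_number>42</employee_number></employee>
</employees>
"""

XHTML_SNIPPET = ('<html><body><div class="second_paragraph">Who\'s there?</div>'
                 '<p class="first_paragraph">Knock, knock.</p></body></html>')

def employee_numbers(text):
    """Look fields up by tag name, so their order inside a record is irrelevant."""
    root = ET.fromstring(text)
    return {e.findtext("name"): int(e.findtext("employee_number"))
            for e in root.findall("employee")}

def body_classes_in_order(text):
    """Children come back in document order, so order is there when it matters."""
    body = ET.fromstring(text).find("body")
    return [child.get("class") for child in body]
```

      The parser preserves order and leaves it to the document definition and the application to decide whether that order carries meaning.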

    52. Re:The main issue with XML is performance by maghen · · Score: 1

      Elliotte Rusty Harold has also written "Processing XML with Java", which, in the first chapter makes a case for XML vs flat files.

      // Magnus

    53. Re:The main issue with XML is performance by Anonymous Coward · · Score: 0

      Well you could code all your programs in pure assembler and spend hours tweaking every line to get most performance out of it - but you don't, you use a high level programming language and accept the performance hit.

      Same goes for XML.

    54. Re:The main issue with XML is performance by I8TheWorm · · Score: 1

      Hey, I appreciate the link. I haven't read very much yet, but so far the document has been very insightful. That's been the first really strong case for better uses for XML than I've seen (this, coming from a member of HR-XML). Thanks for digging that up for this thread.

      --
      Saying Android is a family of phones is akin to saying Linux is a family of PCs.
    55. Re:The main issue with XML is performance by cluckshot · · Score: 1

      Having worked for more than 2 years on XML I would like to say a few things about it. It is a well-demanded waste of time. The customers want it because somehow it is "Machine" and "Human Readable." Honestly it is neither. Humans don't think this way generally unless they are geeks or hackers. Machines hate this stuff. XML performance degrades geometrically as the document you must extract information from gets larger.

      To speed this up XSLT and other methods such as DTD's were developed. The problem is that these destroy the reason for XML. XSLT and similar technologies take a document and do a Linear transform on the data for a "One Shot" view. This is fast but it honestly is only for viewing not for real "Using." DTD's are hopelessly rigid in their function.

      The XML design expands data many times in bandwidth. This requires a processing-expensive reduction mechanism at both ends of the process. We need broadband to use XML, but if we sent a binary file we would hardly use the old dial-up bandwidth. Even the tagging process stinks. It is either too free-form to use or it is hopelessly rigid (DTD style). Yeah, XML sounds sexy but it really is a bad idea.

      If we were to use tagged data, it would be better to use a pseudo-XML where the top of the document defined the data tags with a numeric relationship and the rest of the document sent a set of number tags with the data.
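
      A rough sketch of that numbered-tag idea, written against plain Python data structures. This is not an existing format: the header/body layout and the encode and decode names are made up to show the mechanism.

```python
def encode(records):
    """Split records into a header of tag names and a body that uses tag numbers."""
    header = []
    for record in records:
        for name in record:
            if name not in header:
                header.append(name)          # tag number = position in the header
    number = {name: i for i, name in enumerate(header)}
    body = [[(number[name], value) for name, value in record.items()]
            for record in records]
    return header, body

def decode(header, body):
    """Rebuild the named records from the header and the numbered body."""
    return [{header[i]: value for i, value in record} for record in body]

records = [{"last": "Doe", "first": "John"}, {"first": "Jane", "id": "7"}]
header, body = encode(records)
```

      Each tag name is sent once in the header and referenced by number afterwards, which trades away readability of the body for size.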

      Also, if XML is to be functional we need to see it reduced in complexity. To do this we need to either use attributes in a better way or eliminate them. Similarly, we need to knock out the "unparsed" CDATA segments and just make the data a tagged set. It needs to be either flattened this way or forgotten.

      If XML is used to mine data or find something the original user did not laboriously design into it, the processing of it is horridly long. The choice is parse followed by recursive parse after parse or massive memory hogging. Otherwise you get out of XML ASAP!

      --
      Never Politically Correct ~ I prefer the facts If you don't like what I say, get a life, or comment yourself.
    56. Re:The main issue with XML is performance by gorilla · · Score: 1

      For me the #1 priority has to be correctness. If I don't care about the application being right, then I don't need to write it at all, I can just pick one at random. XML lets me be sure that the data is correct, while CSV and binary formats don't. Once correctness is established, the next priority is programmer time: I don't want to spend 2 years saving 1 byte of bandwidth. Again XML saves me time, because I don't have to spend a lot of time writing format converters, or import routines for each different version of the CSV. Only once I get these down do I worry about bandwidth. As for performance, it goes down a huge amount if you have to spend 2 days tracking down why your application isn't working correctly because of some corrupted data.

    57. Re:The main issue with XML is performance by butane_bob2003 · · Score: 1

      Ouch, that's pretty weak. I have found that XML support in database products is usually pretty bad. With SQL Server, there is not much you can do but use what works the best. Too bad there is no one to ask about optimizing the XML implementation.

      --


      TallGreen CMS hosting
  8. XML... by the+man+with+the+pla · · Score: 5, Insightful

    I think one of the main problems with the embedding of XML architecture into office productivity software is unfortunately the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and in the face of users, instead of being a behind-the-scenes 'option', people just are not going to use it.

    --
    The linux hacker
    1. Re:XML... by Zo0ok · · Score: 2, Interesting

      I saw a Microsoft demo that was supposed to show how powerful and useful it could be to insert XML tags into Word documents. The idea was to fill the Word document with useful information (just fill in the user's name here, and all information about the user is automatically inserted, now how good isn't that?). MS calls this Smart Document.

      So, I took a look in the XML file that was connected to the Word document to make it smart. I wasn't very impressed (but fairly amused) when I saw that the XML file was like 30 lines of blah blah, and in the middle of it I found a reference to a .dll file.

      If I need to write a .dll-file that conforms to Word interfaces (that MS of course will have debugged and patched in about two years, and then they'll obsolete it by releasing a new version), then writing something that GENERATES a Word document and gives it to the user makes much more sense to me...

      And in any case, XML has nothing to do with it... they could of course have created "tag" functionality in Word without using XML.

  9. What? by ActionPlant · · Score: 0

    So, are they touting application, or merely increasing your ability to do something useless?

    XML has great web potential, but saying so is dangerously rehashing an old argument and certainly not new.

    So why would we want a book that supposedly teaches us how to use something for which we as of yet have little use in the online world?

    I get the point. I think application and implementation should be pushed before this.

    It's like teaching someone to program in cobol without giving them a robotic arm on which to experiment.

    Damon,

    --
    http://actionPlant.com
    1. Re:What? by ViolentGreen · · Score: 1

      So why would we want a book that supposedly teaches us how to use something for which we as of yet have little use in the online world?

      Maybe people in the professional world can find use of this. XML is used quite a bit in my experience. It is a format that requires no special software to read; just the corresponding DTD. This makes it both platform and software independent. It is also a very useful format for transferring data between different applications.

      --
      Not everything is analogous to cars. Car analogies rarely work.
  10. where are the open source XML repositories by acomj · · Score: 4, Interesting


    XML would work better if there were consistent DTDs for tagging information that everyone would use. There should be an open database of these DTDs.

    I was looking for a simple one to tag photos with. Couldn't find it, made my own. Is there a repository of these DTDs out there?

    1. Re:where are the open source XML repositories by Anonymous Coward · · Score: 5, Informative

      Maybe here?

    2. Re:where are the open source XML repositories by mugnyte · · Score: 0, Interesting

      No. Because XML is supposed to be extensible. The closest thing may be an XSLT that taught everyone looking at your file what tags you had to display and their data types. But as for standardizing - everything that gets fixed becomes a limit to be broken some day.

      Other than that, you're defining a file format. Why bother? Many formats already exist that encapsulate some form of metadata. Keeping the metadata with the file is another whole subject anyway. Metadata itself should be connect-able to the target file, but not bound. There's a lot of thought about this already going around.

      mug

    3. Re:where are the open source XML repositories by Anonymous Coward · · Score: 0

      There are large project repositories at places like Oasis (http://www.oasis-open.org/home/index.php), but for medium to small projects there needs to be an effective repository of schemas and DTDs.

    4. Re:where are the open source XML repositories by Arrgh · · Score: 2, Informative
      What's the matter, is your Google finger broken?

      Let's see... A <digital> element contains zero or more <frame>s, each of which can contain an <image> with a URL.

    5. Re:where are the open source XML repositories by supergwiz · · Score: 0

      XML.org has a Schema/DTD registry. Most of them are industry specific (I doubt you'd find a general usage definition such as photo tagging) but this is the largest repository I've seen.

    6. Re:where are the open source XML repositories by Anonymous Coward · · Score: 0

      If you are a publisher, maybe this dtd would be of use to you.

      As you can see, dtd's can get as complicated as you'd like...

    7. Re:where are the open source XML repositories by GeckoX · · Score: 5, Informative

      You have absolutely NO idea what you are talking about, and of course have been modded +3 insightful. Good one mods.

      XML is extensible by its very nature. By itself, an xml file is just that, an xml file, it means absolutely NOTHING without context and definition.

      This is what DTDs do. They don't limit xml in any way, rather they describe a particular use of xml. For example: SVG, MathML and XHTML are all languages that use xml. Each one of these languages has a DTD that defines the format for a valid xml document FOR THAT LANGUAGE.

      Just because a DTD for SVG exists doesn't mean that anything at all has changed with xml itself.

      Next, XSLT is a technology with a very specific purpose, simply put: To take an xml file as input and create a new xml file for output based on the rules written into the transform.

      So, with all of that said, there is absolutely NO reason why there shouldn't be a DTD repository, and again, there is no reason why there shouldn't be a PhotoAlbum DTD in that repository. What problems would this cause? None. What benefits could be observed? Instead of everyone needing an xml document to describe photo albums rolling their own format, people might just reuse a standard DTD to do so. And application writers just might too. And lo and behold, Application X on platform Y might be able, with no work involved, to open Album AA created by Application BB on platform CC.

      Getting some of the big picture?

      --
      No Comment.
    8. Re:where are the open source XML repositories by MBoffin · · Score: 1

      DTD Repositories

      There are some other great ones out there, but that should get you started.

    9. Re:where are the open source XML repositories by JamesOfTheDesert · · Score: 1
      By itself, an xml file is just that, an xml file, it means absolutely NOTHING without context and definition. This is what DTD's do.

      DTDs are but one way to do this. W3C schemas, RELAX NG, or simply a memo sent from me to you will also do the trick. DTDs are a good way to enforce constraints on an XML document, but a poor way to communicate among humans. None of these formats help convey much about semantics or appropriate use.

      Anyway, you're confusing XML the syntax spec with specific markup languages that use the XML syntax. And, as others have pointed out, there are repositories and directories for many of these languages, if you care to poke around a bit.

      --

      Java is the blue pill
      Choose the red pill
    10. Re:where are the open source XML repositories by GeckoX · · Score: 1

      No, the post I was replying to was confusing XML syntax with specific markup languages, that was the exact point I was trying to make and clarify.

      The post I was replying to was insinuating that DTD's are useless because they impose limitations on xml and make xml harder to use, which is just not the case. DTD's, as with Schemas or your proverbial memo all exist to make xml useable in a given context.

      --
      No Comment.
    11. Re:where are the open source XML repositories by swimfastom · · Score: 1

      "Next, XSLT is a technology with a very specific purpose, simply put: To take an xml file as input and create a new xml file for output based on the rules written into the transform."
      Keep in mind XSLT can transform the xml document into more than just another xml file. It can be xml, html, PDF, or others... This is one of the benefits of xml!!

      --
      http://tomgould.com/
    12. Re:where are the open source XML repositories by mugnyte · · Score: 1

      no idea? think about it guy.

      An open DTD repository would serve no purpose. For example: A Photo. ANY definition of a "standard" for Name, Location, Date, Photographer and Content in a DTD repository would immediately fragment. Further tags would be defined (film used, camera, settings, ambient light levels, digital source information) or other esoteric tags for specialized uses.

      So then, we start another DTD, layered on top of/extending the prior (or not, where unknowingly someone starts a new one) for the professional printer or photographer, etc. Of course, then we get into content tags (PR0N based, geographic, designer-esque terms). It's the long slow trail to a failed Grand-Unification Theory of information. It takes a dictatorship to run, which clears up some mess while now being an entity to appeal and fight with about standards.

      You say yourself, CONTEXT is king. I agree, and so the components of metadata (ignore the XML strawman for a second) are context sensitive. There are as many contexts as uses. Imposing a fixed list (or tree) of contexts on information has this fuzzy appealing concept of "global search and find, a universal catalog" - but it's a red herring. The structure of metadata gets knocked about in every instance until it appears as an all-in-one gigantic structure (EDI comes to mind) or woefully inadequate.

      - Search your favorite P2P, find a universal tagging scheme? Even with the MP3 v1 and v2 embedded tags, it's a mess.
      - How many extensions of C are there?
      - We've seen photo album formats come with every package. Why didn't anyone reuse an existing format? Because they thought it inadequate.

      Now getting into XML, building a database from such a source has been verified over and over: bad idea. Parsing, cleanup, indexing is all great for a one-time hit, but not if XML is the primary data store. XML is the usage of an arbitrary grammar. It follows the same rules of context and reuse that everything has, from programming languages to other structured metadata: good only to a point, and always behind the times at the general level.

      "To reuse the tools from another vendor" is a great goal, but each vendor may *not* want to be compatible, for many reasons. Also, if they can provide a catalog of *their* images with extensive behavior, why change? A slew of big vendors (AutoDesk, Adobe, Microsoft, Bentley) are not cross-compatible because they don't want you to change. Read other formats, but do not write.

      CLOSED DTD repositories work great. GM has a large library of DTDs that its vendors must obey. But there is no central repository because there is no central "context owner". Much like a linguistics issue, in my mind.

      mug

    13. Re:where are the open source XML repositories by Anonymous Coward · · Score: 0

      Errr...no...

      XSLT can only transform XML into XML...

      XSL-FO is actually XML, that can be transformed into PDFs etc by an XSL-FO transformer, but, that's not done using XSLT.

  11. XML by Anonymous Coward · · Score: 2, Funny

    ... a floor wax and a dessert topping...

  12. ![CDATA[This is effective XML]] by fedor · · Score: 1, Funny

    "This is even more effective"

    --
    :wq!
    1. Re:![CDATA[This is effective XML]] by Anonymous Coward · · Score: 0

      Just so you know: :wq! is an odd command. You're writing (w), then quitting without writing (q!)

      Any particular reason?

    2. Re:![CDATA[This is effective XML]] by swingkid · · Score: 1

      from the VIM help file: :wq! Write the current file and quit. Writing fails when
      the current buffer does not have a name.

      It doesn't force quit, it forces the write then quits.

    3. Re:![CDATA[This is effective XML]] by fedor · · Score: 1

      ...and besides that, it's my signature...

      --
      :wq!
  13. apache has a project called Xindice by Anonymous Coward · · Score: 0

    that is a XML database. xindice looks interesting, though I wonder how it will scale?

    1. Re:apache has a project called Xindice by janbjurstrom · · Score: 2, Informative

      True, Xindice (Apache license, has reached version 1.0) looks good (I've no experience with it), but some of the original developers (Tom Bradford - dbXML, see below, and Kimbro Staken - Syncato, also below) of the source donated to Apache think they (Apache) haven't made the most of it. I don't know if this is true, and I don't know nor have any connections with either Bradford or Staken, but they seem like competent developers; they certainly churn out code - positive sign, right?

      There is choice :): Check out Kimbro Staken's weblog Inspirational Technology (who also develops Syncato, an XML database weblog system using Berkeley DB XML.):

      Consider Berkeley DB XML (currently at v1.1.0). Built on Berkeley DB and identically licensed (open source, free for non-commercial/development use, etc.); tons of APIs - can't get hold of the link but one of the developers (at least I think so) maintains a weblog of 'all' things Berkeley DB XML. Googleit.

      Bradford recently released dbXML under GPL (commercial licenses available should you need it), there's a v2.0 beta available at the site.

      Another native XML database is eXist, at version 0.9.2, java-based, LGPL licensed, I've only glanced at it, looks alright though I'm not the guy to say..

      Then there're several commercial alternatives - X-Hive, Birdstep, Virtuoso, et al. - but this is Slashdot so..
      Well, someone called Ron Bourret has compiled a full-bodied overview of XML databases, and has a big list of XML/DB links too (some link-rot). Goto.

      --
      668.5
  14. milaf, if you could expand a bit... by Randolpho · · Score: 4, Insightful

    Does the book discuss the pros and cons of XML? Such as, when is it a good idea to use XML? When would a CSV, INI, or other structured text document be a better choice than XML?

    These are issues that need to be solved first, before one creates an effective XML structure. Does the book address them?

    --
    "Times have not become more violent. They have just become more televised."
    -Marilyn Manson
    1. Re:milaf, if you could expand a bit... by LetterJ · · Score: 4, Insightful

      Unfortunately, most Slashdot reviews are little more than book reports with pretty much no analysis. They end up just listing what the chapters contain.

      Incidentally, one of the main reasons to choose XML over either CSV or INI is that both of those formats are pretty driven by rigid "column" type structures. In most INI files there's only room for pairs of names and single values. In CSV records are one row with a set number of fields.

      XML lets you expand the children fully and represent more complex data. For instance, a classical CSV file with address information for customers would have columns for street address, city and then start to have problems when you start having columns for State (when you actually consider the world outside the US), postal codes, etc. If this is in XML, you can have your schema be more flexible and say that each <customer> contains a <shippingaddress> element which can contain either a <state> or a <province> or neither.

      In other words, you can use trees to represent data instead of flat rows. I'm not saying that it's the be-all and end-all that the evangelists say it is. There are still lots of places that simpler text files and other data storage formats are better, but XML can be useful.
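
      A minimal Python sketch of the tree-versus-flat-row point, assuming made-up customer data. Only <customer>, <shippingaddress>, <state> and <province> come from the comment above; the other element names, the sample values, and the region_of helper are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records: each <shippingaddress> may hold a <state>,
# a <province>, or neither.
XML_DATA = """
<customers>
  <customer name="Alice">
    <shippingaddress>
      <street>1 Main St</street><city>Springfield</city><state>IL</state>
    </shippingaddress>
  </customer>
  <customer name="Bob">
    <shippingaddress>
      <street>2 Rue Principale</street><city>Montreal</city><province>QC</province>
    </shippingaddress>
  </customer>
  <customer name="Chen">
    <shippingaddress>
      <street>3 Orchard Rd</street><city>Singapore</city>
    </shippingaddress>
  </customer>
</customers>
"""

# The flat equivalent needs a column for every possible field in every row.
CSV_DATA = (
    "name,street,city,state,province\n"
    "Alice,1 Main St,Springfield,IL,\n"
    "Bob,2 Rue Principale,Montreal,,QC\n"
    "Chen,3 Orchard Rd,Singapore,,\n"
)


def region_of(customer):
    """Return (kind, value) for the optional state/province child, else None."""
    address = customer.find("shippingaddress")
    for kind in ("state", "province"):
        node = address.find(kind)
        if node is not None:
            return kind, node.text
    return None


root = ET.fromstring(XML_DATA)
regions = {c.get("name"): region_of(c) for c in root.findall("customer")}

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
# Count the cells the CSV carries only as empty placeholders.
empty_cells = sum(1 for row in rows for value in row.values() if value == "")
```

      In the XML version an absent field is simply not there; in the CSV version every row pays for every column, which is the rigidity described above.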

    2. Re:milaf, if you could expand a bit... by Randolpho · · Score: 1

      Actually, I already understand that. The problem is; what happens if you don't *need* to represent a tree for your data? Why should you use XML rather than some flat, easier-to-parse CSV file? I mean other than the fact that XML is the current buzzword, of course.

      Every book on XML should address this issue. I wonder if this book does.

      --
      "Times have not become more violent. They have just become more televised."
      -Marilyn Manson
    3. Re:milaf, if you could expand a bit... by tomhudson · · Score: 1
      Here's another one:

      If xml is so great, why wasn't the review written in xml? Why wasn't the book written in xml? Why aren't its advantages obvious as opposed to the disadvantages (bloat, slow, etc).

    4. Re:milaf, if you could expand a bit... by Anonymous Coward · · Score: 0

      I've never understood this overzealous markup of address data. Why not store the address as a simple block of text? I'm fed up with trying to massage my address into someone's preconceived notion of what an address *should* look like. Why do you need my street number in a separate field anyway? Will you run SQL queries on it?

      What programmer can seriously hope to create a scheme that will encompass all the different ways of writing addresses in all the countries of the world? Just treat the address as a block of text. That is what the post office does. If you need to store the state --- use a separate field.

    5. Re:milaf, if you could expand a bit... by Hacksaw · · Score: 1

      You can represent arbitrarily complex data quite well with C syntax. It'd be quite easy to parse, and lots of people know it already.

      --

      All the technology in the world won't hide your lack of vision, talent, or understanding.

    6. Re:milaf, if you could expand a bit... by grrussel · · Score: 1

      AFAIK, he claims to have used Open Office in writing this book; guess what? The file format it uses is XML.

      Previously the same author has used the docbook XML document processing system for writing books.

    7. Re:milaf, if you could expand a bit... by tomhudson · · Score: 1
      I didn't think I would have to state the obvious, but The review, AS POSTED, is not in xml. If it was, it would be almost unreadable (despite the claims of xml advocates that xml is a human-friendly format).

      Stuffing everything into nodes of trees which don't even allow for (c-style) unions sux.

  15. XML Limited in at least one regard. by AllergicToMilk · · Score: 4, Interesting

    One of the things that I have found limiting about XML is that it is inherently hierarchical. Real "things" can be categorized many ways. Hierarchical classification systems (such as our modern file systems) work poorly to classify a broad scope of information. Thus, some of the new development in the FS in Longhorn and also some I've heard about, but can't remember, for Linux.

    --
    There are only 6,863,795,529 types of people in the world.
    1. Re:XML Limited in at least one regard. by Anonymous Coward · · Score: 0

      Mod the parent up.
      XML is hierarchical and we know we can't do hierarchical DBs as we gave that up in the 60s. No-one has come up with a logical mathematically sound model that improves upon the relational model.

      Just because today's DBMS suck doesn't mean we should give up on the relational model. SQL is _not_ relational.

      Anything that allows NULL in tables columns is not relational.

      When I see XML and DBMS brought together I get nervous.

    2. Re:XML Limited in at least one regard. by Doctor+Faustus · · Score: 1

      You can treat second level elements as tables, the third level elements under them as rows, and the attributes on those as columns (including some that function as primary and foreign keys), leaving you with a structure that is more-or-less relational.

      I've been drifting this way lately, and it works quite well with XSLT. Rather than following hierarchies that are actually in the XML, I do nested for-each's. For instance, if I wanted a hierarchy in the output of customer and then orders, I would do a for-each on /root/Customers/Customer, get a CustomerID, and then do a loop on /root/Orders/Order[@CustomerID=$CustID].
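
      The same nested-loop join can be sketched outside XSLT. This is a rough Python equivalent using the standard library's ElementTree, assuming the /root/Customers/Customer and /root/Orders/Order layout mentioned above; the Name and OrderID attributes and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# "Tables" are second-level elements, "rows" are third-level elements,
# and "columns" are attributes, as described in the comment above.
XML_DATA = """
<root>
  <Customers>
    <Customer CustomerID="1" Name="Alice"/>
    <Customer CustomerID="2" Name="Bob"/>
  </Customers>
  <Orders>
    <Order OrderID="100" CustomerID="1"/>
    <Order OrderID="101" CustomerID="2"/>
    <Order OrderID="102" CustomerID="1"/>
  </Orders>
</root>
"""


def orders_by_customer(root):
    """Outer loop over customers, inner lookup of orders by foreign key."""
    result = {}
    for customer in root.findall("./Customers/Customer"):
        cust_id = customer.get("CustomerID")
        # ElementTree's limited XPath supports [@attr='value'] predicates.
        orders = root.findall(f"./Orders/Order[@CustomerID='{cust_id}']")
        result[customer.get("Name")] = [o.get("OrderID") for o in orders]
    return result


root = ET.fromstring(XML_DATA)
report = orders_by_customer(root)
```

      The hierarchy (customer, then orders) exists only in the output; the input stays flat and keyed, which is what makes it "more-or-less relational".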

    3. Re:XML Limited in at least one regard. by Atom+Tan · · Score: 1

      Sure, XML as a format is hierarchical, but various mechanisms can be used to represent arbitrary, non-tree structures in XML, and to categorize a single entity in numerous ways. A simple example is defining an "id" attribute for a node and then using "ref-id" attributes elsewhere in the document as shorthand for repeating the original XML subtree.

      More complex relationships can be expressed with more sophisticated mapping approaches: XTM is a good example of an XML dialect that can represent a semantic web of ideas and relationships between them.
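
      A small Python sketch of the "id" / "ref-id" idea, under assumed data: the <album>, <category>, <photo> and <in> elements and the categories_by_photo helper are invented for illustration, and "ref-id" is treated as a plain attribute resolved by the application, not by the parser.

```python
import xml.etree.ElementTree as ET

# One photo is filed under two categories without duplicating either subtree.
XML_DATA = """
<album>
  <category id="c1">Travel</category>
  <category id="c2">Family</category>
  <photo file="beach.jpg"><in ref-id="c1"/><in ref-id="c2"/></photo>
  <photo file="office.jpg"/>
</album>
"""


def categories_by_photo(root):
    """Resolve every ref-id against an index of elements that carry an id."""
    index = {
        node.get("id"): node
        for node in root.iter()
        if node.get("id") is not None
    }
    result = {}
    for photo in root.findall("photo"):
        refs = [ref.get("ref-id") for ref in photo.findall("in")]
        result[photo.get("file")] = [index[r].text for r in refs]
    return result


root = ET.fromstring(XML_DATA)
tags = categories_by_photo(root)
```

      The document is still a tree on disk, but the resolved references form a graph: a category can be shared by any number of photos.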

    4. Re:XML Limited in at least one regard. by smallpaul · · Score: 1

      Neither file systems nor XML are strictly hierarchical. One has symlinks and the other has XLinks.

  16. Hmm.. by jpsowin · · Score: 2, Funny

    Wouldn't this worth your while?

    Wouldn't this what my while???
    All your base are belong to us!

    (huge eye roll)

    1. Re:Hmm.. by cant_get_a_good_nick · · Score: 1

      My mom caught me worthing my while once... said I'd go blind and everything.

  17. Not a programming language? by lcsjk · · Score: 1

    Glad you cleared that up for us non-programmers. Now if I could just figure out what it really is!

    1. Re:Not a programming language? by Anonymous Coward · · Score: 0

      This is valid XML:

      <parent id="myid">
      <child>mytest</child>
      </parent>

      This is not valid XML:

      <parent id=hello>
      <child text>
      </parent>
      </child>

      Now you know everything there is to know about XML.
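
      For what it's worth, the difference between the two snippets above can be checked mechanically. A minimal Python sketch follows; note that it tests well-formedness (quoting, nesting, closing tags), which is what the examples illustrate, whereas "valid" in the strict sense means conforming to a DTD or schema. The is_well_formed helper is a made-up name.

```python
import xml.etree.ElementTree as ET

GOOD = '<parent id="myid">\n<child>mytest</child>\n</parent>'
BAD = '<parent id=hello>\n<child text>\n</parent>\n</child>'


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good_ok = is_well_formed(GOOD)
bad_ok = is_well_formed(BAD)
```

      The second snippet already fails at the unquoted attribute value, before the parser ever reaches the mis-nested closing tags.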

    2. Re:Not a programming language? by gbrayut · · Score: 1

      Universal Data Structure Interface

    3. Re:Not a programming language? by Anonymous Coward · · Score: 0
      Now that's just a perfect explanation for a non-programmer! Let me think about that for a week or two or maybe a year. Wait, isn't USB a Universal Data Structure Interface also? So they must be the same. Where can I buy one of those XML thingys?


      Sorry to be facetious, but you lost me with that one!

    4. Re:Not a programming language? by gbrayut · · Score: 1

      Hmm, lets see what the dictionary says-

      Universal:
      Applicable or common to all purposes, conditions, or situations. (think universal remote)

      DATA:
      Numerical or other information. (think numbers/text/records/quotes/sales...)

      Structure:
      The way in which parts are arranged or put together to form a whole. (think layout, format, framing)

      Interface:
      A boundary across which two systems communicate. (more difficult, but think of "communicate" as transferring data)

      So Universal DATA Structure Interface is:
      A common (Universal) way in which numerical or other information (DATA) can be arranged (format/layout/structure) to "communicate" (interface) with other systems.

  18. Here's the list of 50 by FearUncertaintyDoubt · · Score: 5, Informative
    Syntax:
    Include an XML Declaration
    Mark Up with ASCII if Possible
    Stay with XML 1.0
    Use Standard Entity References
    Comment DTDs Liberally
    Name Elements with Camel Case
    Parameterize DTDs
    Modularize DTDs
    Distinguish Text from Markup
    White Space Matters

    Structure:
    Make Structure Explicit through Markup
    Store Metadata in Attributes
    Remember Mixed Content
    Allow All XML Syntax
    Build on Top of Structures, Not Syntax
    Prefer URLs to Unparsed Entities and Notations
    Use Processing Instructions for Process-Specific Content
    Include All Information in the Instance Document
    Encode Binary Data Using Quoted Printable and/or Base64
    Use Namespaces for Modularity and Extensibility
    Rely on Namespace URIs, Not Prefixes
    Don't Use Namespace Prefixes in Element Content and Attribute Values
    Reuse XHTML for Generic Narrative Content
    Choose the Right Schema Language for the Job
    Pretend There's No Such Thing as the PSVI
    Version Documents, Schemas, and Stylesheets
    Mark Up According to Meaning

    Semantics:
    Use Only What You Need
    Always Use a Parser
    Layer Functionality
    Program to Standard APIs
    Choose SAX for Computer Efficiency
    Choose DOM for Standards Support
    Read the Complete DTD
    Navigate with XPath
    Serialize XML with XML
    Validate Inside Your Program with Schemas

    Implementation:
    Write in Unicode
    Parameterize XSLT Stylesheets
    Avoid Vendor Lock-In
    Hang On to Your Relational Database
    Document Namespaces with RDDL
    Preprocess XSLT on the Server Side
    Serve XML+CSS to the Client
    Pick the Correct MIME Media Type
    Tidy Up Your HTML
    Catalog Common Resources
    Verify Documents with XML Digital Signatures
    Hide Confidential Data with XML Encryption
    Compress if Space Is a Problem

    1. Re:Here's the list of 50 by iago · · Score: 1

      In the mighty words of Herb Zipper,

      WHORE!

      --
      Worst Sig Ever
    2. Re:Here's the list of 50 by pb9494 · · Score: 1

      Great ! Now post the rest of the book !

  19. Re:XML by Analogy+Man · · Score: 1

    Too bad this was moderated down...that is the perennial problem with XML and other technologies. A universalist technology compromises a little bit of everything for everybody. Possibly the moderator didn't catch the SNL reference.

    --
    When the people fear their government, there is tyranny; when the government fears the people, there is liberty.
  20. My experience with XML by Valar · · Score: 4, Insightful

    It has been my experience with XML that it is like a lot of other things in development: the good developers understand it immediately and have native intuition towards best practices. The bad developers never really get it and spend their time reproducing tricks they saw in a cookbook. That's good and fine until you need something that doesn't quite fit into categories a, b or c. Another example of this is how high school and university data structure/algorithm classes never spend any time on development of new data structures that exactly meet the problem specification. Instead they lay out half a dozen types of linear lists, a couple of trees, and some hashing functions and say, "Well, you can glue just about anything together from this." Perhaps this book takes what is, IMHO, the better approach-- laying out the tools and politely explaining what the implication of each is, rather than attempting to list out pages of cute examples of what each can do.

    1. Re:My experience with XML by Anonymous Coward · · Score: 0
      Instead they lay out half a dozen types of linear lists, a couple of trees, and some hashing functions and say, "Well, you can glue just about anything together from this."


      Well... you can. :-) Seriously, most of the time I find I never need a structure more complicated than a single linked list, since mostly I'm just using the heap to store data temporarily, and I don't need to do extensive searches on it. Hash tables come in handy when I want to use string keys on data sets of unknown size (mainly for use in parsers and such), but I rarely need to use other types of keys or fancy tables. And I find I don't really use trees for anything at all, although I know there are situations where they could be useful... it's just hash tables work better in most cases where I want to do searching, and lists work better in most cases where I don't. I'm mainly thinking I'll use (B) trees for disk-based structures (but if I use a database, I don't have to think about that), or for things like space sorting (BSP-style), which I haven't needed to do yet. And heck, it'd probably be even easier to pick out the right container type in Java (the above was all done in, horror of horrors, C and C++).
    2. Re:My experience with XML by Anonymous Coward · · Score: 0

      the good developers understand it immediately and have native intuition towards best practices.

      Exactly, and with XML we know that the best practice may be choosing not to use it.

  21. Server load could be at the root of XML's problems by mrgoatCEO · · Score: 3, Insightful

    I know that as a student maintaining a website I am in the minority of XML users, but the main thing that stops me from moving my site (small-scale though it may be) over to using more XML is sheer server load. The fact of the matter is that we still don't have true low-bandwidth database solutions, and until this changes, I doubt that much will be done with technologies like XML (at least on smaller, non-corporate sites) no matter how much potential they have.

    --
    --Goat
    CEO, Goat Software
    Goatblog
  22. 5 years in the business... by pong · · Score: 5, Insightful

    ... and it is starting to dawn on me that trends like pervasive XMLization are going to haunt us forever. The combination of business-minded consultants that push a market to create demand for themselves and a huge number of clueless but enthusiastic developers that will jump on any new idea and push it where it doesn't want to go unsurprisingly leads to this kind of instability.

    I hate XML with a passion. Let me present you with three examples

    1) Programming languages based on XML.

    Yes, it is true. Perverted minds, somewhere on this planet, actually seem to think that this is a neat idea! Since their initial conception the pivotal point of programming languages has been to raise the level of programming. To move from the computer's domain to the human domain - to make it more intuitive and natural for a human being to program a computer. With these new XML-based languages we are moving a step backwards, because truly the only benefit of XML in this context is that it is easier for computers to parse, while it is certainly harder for humans.

    2) XSLT

    Have you tried it? I rest my case.

    3) SOAP

    Okay, initially this actually seemed like a good idea to me, but having thought about it, I really think it sucks. Okay, so it is easier to implement SOAP for a particular platform or programming language, but a wire protocol is like a compiler or an OS kernel in a certain sense - it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.

    1. Re:5 years in the business... by Anonymous Coward · · Score: 1, Insightful

      With these new XML-based languages we are moving a step backwards, because truly the only benefit of XML in this context is that it is easier for computers to parse, while it is certainly harder for humans.

      Flashback to the late 60s and early 70s...

      With these new "high-level" languages we are moving a step backwards, because truly the only benefit of HLLs in this context is that they are easier for humans to read, while they are certainly slower for computers to execute.

      XSLT. Have you tried it? I rest my case.

      It sounds like a piss-poor case then, as I've had no issues with XSLT. Yes, it takes some learning, but so does anything if you're encountering it for the first time.

      ...it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.

      It's exactly that -- a trade-off for increased performance and stability. Different tools for different purposes, my friend.

    2. Re:5 years in the business... by Anonymous Coward · · Score: 0
      1) Programming languages based on XML.

      I agree

      2) XSLT

      Have you tried it? I rest my case.

      Yup. Tried it. It works for me. I prefer to off-load processing to the client where possible though.

      3) SOAP

      Okay, initially this actually seemed like a good idea to me, but having thought about it, I really think it sucks. Okay, so it is easier to implement SOAP for a particular platform or programming language, but a wire protocol is like a compiler or an OS kernel in a certain sense - it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.

      See XML-RPC if you don't like SOAP.

    3. Re:5 years in the business... by kwerle · · Score: 1

      I hate XML with a passion. Let me present you with three examples

      1) Programming languages based on XML.


      Yup.

      2) XSLT

      Have you tried it? I rest my case.


      I'm coding some right now, and it's not easy. The thing is this: it is tremendously powerful, and good at doing the one thing it's good at: converting XML to XML. There aren't many cases when you should need to do this, and XSLT beats perl, IMHO.

      3) SOAP

      Okay, initially this actually seemed like a good idea to me, but having thought about it, I really think it sucks. Okay, so it is easier to implement SOAP for a particular platform or programming language, but a wire protocol is like a compiler or an OS kernel in a certain sense - it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.


      Let me start by saying I much prefer XML-RPC.

      I totally agree with you in one respect: you shouldn't care that SOAP uses XML. Why do you? If you noticed it, you must be doing something wrong, unless you're writing an implementation - in which case you should be grateful that it's so easy to do!

      Finally, my crusade for XML: /etc files. All those damn config files with their proprietary one line formats where # signs mean a comment should be written in XML. Then they would all use the same damn parser, and it would be easy for one to borrow information from another, and it would be easy to write a tool to maintain them using the UI of your choice.

      That's the kinda thing XML is ideal for: we gotta store some flatfile data. It should be human readable/editable if need be. XML is a great tool for that job.

    4. Re:5 years in the business... by Brandybuck · · Score: 1

      XSLT: Have you tried it? I rest my case.

      I've tried it, and I ended up loving it. Of course, I'm not using it as it was intended. I'm using it to convert DocBook into HTML and PDF statically. This is a heck of a lot better than using SGML/Jade.

      Like XML, XSLT is being adopted in areas the authors never really intended, but ignored in those they did. I use XML in several areas, so when my employer offered to send me to an XML class for free, I accepted. It was horrible! The examples used by the professor always kept coming back to this web application where everything was done on-the-fly dynamically by XML tools. He raptured ecstatically about a future when all web browsers would be native XML browsers, downloading schemas on the fly to render the pages. Hah! It all looks good on paper, but falls down when it hits the real world. He never once mentioned real world applications of XML, and when I inquired about using XML for file formats and structured documentation, he said, "Hmmm, interesting ideas. I guess it could be used for that...". I won't say his name, but this is one of the chief XML evangelists.

      --
      Don't blame me, I didn't vote for either of them!
    5. Re:5 years in the business... by freejamesbrown · · Score: 1


      1) i've worked on a custom xml-based programming language. it's actually been really useful for our situation. we didn't have to write our own parser. the runtime isn't time dependent. the users of the language are moderately technical psychologists, but not programmers. they've been able to use the xml-based language to change AI rules for the larger piece of software without having to recompile anything or understand too much or pay developers to change that logic. i completely disagree that it's harder for people to understand. it depends on the language you've created.

      2) i won't go there.

      3) but soap could be used all over the place. should what it does be hard? didn't you rant against "consultants that push a market to create demand for themselves"?

      m.

    6. Re:5 years in the business... by Atom+Tan · · Score: 1

      3) SOAP
      Okay, initially this actually seemed like a good idea to me, but having thought about it, I really think it sucks. Okay, so it is easier to implement SOAP for a particular platform or programming language, but a wire protocol is like a compiler or an OS kernel in a certain sense - it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.


      Without regard to particular Web Services implementations (the ones I've tried have been difficult to use), the idea of an application communication protocol that is readily decipherable by tools and humans is pretty powerful. I think SOAP complements (not replaces) binary protocols like RMI, DCOM, or CORBA by providing for a looser coupling. For example, if the semantics of two particular services are the same or similar, it should be possible for a client to switch from one provider to another (or choose from multiple providers without modifying the application objects). The modification could instead be done in the binding between XML and application objects, or perhaps even by performing transformations on the XML going in and out over-the-wire. Loose coupling between systems is not impossible with RMI or CORBA, but is easier with SOAP in situations where the parties are independent and there is no possibility for everyone to conform to a standard language and interface.
    7. Re:5 years in the business... by jandrese · · Score: 1

      Most of those files that use # for a comment do have a single parser in common: /bin/sh

      Most of the time those "config files" are actually just little scripts that get sourced into whatever startup script needs the information. The variables you're setting are actually environment variables. Interestingly enough, if you changed the /etc files to XML, you would have to add the step to parse the variables in the XML files into (most likely) environment variables so your scripts could use them.

      Besides, if your configuration files include the default value and a comment, they are very easy (relatively speaking) to configure. When you see something like:
      defaultrouter="NO" # Set to default gateway (or NO).

      Most administrators know what to do.
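
      To make the trade-off concrete, here is a rough Python sketch (not from the original comment) that reads the same two settings from a shell-style file and from an XML file. The hostname setting, the <config> wrapper element and the function names are made up for illustration, and the # handling is deliberately naive.

```python
import xml.etree.ElementTree as ET

# Hypothetical shell-style config, modelled on the defaultrouter line above.
FLAT_TEXT = (
    'defaultrouter="NO"  # Set to default gateway (or NO).\n'
    'hostname="bombadil"\n'
)

# The same settings in a made-up XML layout (element names are assumptions).
XML_TEXT = (
    "<config>"
    "<defaultrouter>NO</defaultrouter>"
    "<hostname>bombadil</hostname>"
    "</config>"
)


def parse_flat(text):
    """Parse key="value" lines, dropping # comments (naive: no # inside values)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def parse_xml(text):
    """Map each child element of the root to its text content."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "").strip() for child in root}


flat_settings = parse_flat(FLAT_TEXT)
xml_settings = parse_xml(XML_TEXT)
print(flat_settings == xml_settings)
```

      Both parsers are about the same size here. The difference only shows up once values need quoting rules, nesting, or a tool that has to edit many different files.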

      --

      I read the internet for the articles.
    8. Re:5 years in the business... by Anonymous Coward · · Score: 0

      2) XSLT

      Tried it. Love it. Yes, it's a tricky beast, but now I've used XSLT for lots of Intranet applications for transforming XML data into HTML (and SVG charts). For some Intranet apps, I'm doing the transformation on the server with XMLDOM, and for some I'm sending the XML and XSLT to the client for rendering (and re-rendering) client side, with no trips back to the SQL server.

      You can keep my "firm phonebook" Intranet app open on your browser ALL DAY long and it will never go back to the SQL Server at all. Efficient? You betcha! I have column sorting, filtering, and all manner of bountiful goodies, with just XML, XSLT and a bit of Javascript.

      It's gone from being a mildly useful app, to being used regularly by 100 or more users a day - and I have a dynamic SQL->XML->XSLT->SVG graph which shows the daily upward trend over the last 2 years. Based on that, and the feedback I get I KNOW it was worth the pain.

      The XSLT Templates are a joy - I use them like HTML functions, and once I write something, I just plug them in where I need them, over and over again.

      I use 1 ASP page to generate the XML document, and then depending on the user's choice, I can transform the same firm data into different things, the main PC browsable phonebook, a B&W tightly bunched printable page, a clickable floorplan navigator, a photo-gallery, an export for the printed PDF phonebook etc etc etc

      I too looked at XML with some skepticism, but I'm confident enough to know now that it has real potential, and is very very useful to me.

    9. Re:5 years in the business... by LetterJ · · Score: 1, Interesting

      "2) XSLT
      Have you tried it? I rest my case."

      Yes and I wouldn't rest my case on that statement if I were you.

      I've been working with XSLT professionally (for big clients including 3M) for 3 years, building the top tier in 3 tier architectures and have no problems working with it. It makes perfect sense for what it is: a solution for turning XML into something else, whether another XML document, another XSLT stylesheet (which I'll admit can be a brainbending exercise), HTML or plain formatted ASCII. In places where multiple presentations will exist for a given dataset or the presentation will change due to constantly redefined presentation requirements (ahem marketing ahem), XSLT gives you the flexibility to just keep building the same XML documents in your app and make them look like they're supposed to with different XSLT.

      <shamelessplug>
      Incidentally, I'm looking for a web development contract in St. Paul/Minneapolis if anyone's looking for an XSLT expert (or PHP or any of my other areas of expertise) who actually knows how to solve real problems. Email me for more info.
      </shamelessplug>

    10. Re:5 years in the business... by kwerle · · Score: 1
      Most of those files that use # for a comment do have a single parser in common: /bin/sh

      A bunch do, but many don't. For those that are really dressed up sh files, I have to agree with you. For the rest, a standard format (XML) would be nice.

      bombadil% ls /etc/*.conf ...
      • /etc/6to4.conf: shell
      • /etc/gdb.conf: not shell
      • /etc/inetd.conf: not shell
      • /etc/kern_loader.conf: blank?
      • /etc/named.conf: not shell
      • /etc/ntp.conf: not shell
      • /etc/resolv.conf: not shell
      • /etc/rtadvd.conf: no clue
      • /etc/slpsa.conf: probably shell
      • /etc/smb.conf: not shell
      • /etc/syslog.conf: not shell
      • /etc/xinetd.conf: not shell

      So in that set, around 3/4 were not shell files.

      Most administrators know what to do.

      But it would be real nice if there were a standard format, and those files were easy to parse, and there were a (very) few config tools that would do the right thing(tm), and one didn't have to be an "administrator" to get it right.
    11. Re:5 years in the business... by Willard+B.+Trophy · · Score: 1

      XSLT is considerably easier than the transformation language for SGML, DSSSL. Imagine Scheme blended with CSS; that's DSSSL. Groo!

    12. Re:5 years in the business... by retinaburn · · Score: 1

      It sounds like you hate what people do with XML rather than XML itself. It's like saying 'I hate paper bags, because once some kids put some feces in a paper bag, put it on my doorstep and lit it on fire.' We use XML for the configuration files for several of our newer products. We have DTDs for them, and parsing on load up is a breeze. However the files are all less than a hundred lines.

    13. Re:5 years in the business... by FooAtWFU · · Score: 1

      "XSLT: Have you tried it? I rest my case."
      As a matter of fact, a decent portion of my work a few years ago was with XSLT. It worked fine and beautifully. Of course, we were really flouting the very design of XML itself with what we were doing... :)
      In essence, we wrote simple Java classes (yes, we were in Java, that's another matter =b) to query the database, add quick XML-like wrappers, send them back to the servlet, and then get it parsed into a web page with XSLT. You could get a whole lot of code re-use from that sort of thing.

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    14. Re:5 years in the business... by Anonymous Coward · · Score: 0

      I'm coding some right now, and it's not easy. The thing is this: it is tremendously powerful, and good at doing the one thing it's good at: converting XML to XML. There aren't many cases when you should need to do this, and XSLT beats perl, IMHO.

      Tremendously powerful? Maybe if your implementation is super fast and has an incredibly deep stack. Still, I've definitely found it to be the best tool for converting XML to XML.

      Finally, my crusade for XML: /etc files. All those damn config files with their proprietary one line formats where # signs mean a comment should be written in XML.

      If the configuration has an inherent tree-like structure, XML may be a good choice. Otherwise the simple "key=value" or "key value" format is the best. XML is overkill for anything that's only one or two levels deep. (unless you expect to have to do transformations on it) It's far harder to parse for both humans and machines and, in these cases, would not gain you much. (Most of them already *could* be read by a single parser, and you're better off standardizing on a simpler parser for the rest.)

      Then they would all use the same damn parser, and it would be easy for one to borrow information from another, and it would be easy to write a tool to maintain them using the UI of your choice.

      What you really need for this is meta-data about the file format. That is, the equivalent of an XML schema, only I don't think even that would contain enough data to be ideal. (it might...)

      In any case, think how much larger all those daemons would be if they had to include an XML parser or at least parse through a DOM tree... Sure, we've got memory and cycles to waste, but that doesn't mean we have to.

    15. Re:5 years in the business... by kwerle · · Score: 1

      XML is overkill for anything that's only one or two levels deep.

      It is a reasonable standard, and in the absence of others (which seems to be the case in /etc), it seems like a good way to go.

      In any case, think how much larger all those daemons would be if they had to include an XML parser or at least parse through a DOM tree... Sure, we've got memory and cycles to waste, but that doesn't mean we have to.

      A single shlib. They'd be smaller because they wouldn't include their own parser.
      As for efficiency, I wonder how fast the parsers are that all these programs have. I guess they're plenty fast, but so are xml parsers. Finally, they usually parse these files only once, or seldom.

    16. Re:5 years in the business... by Jellybob · · Score: 1
      and one didn't have to be an "administrator" to get it right.

      Why is everyone so obsessed with people not having to be admins to configure software without reading the manual (and if you have to, learning to read it).

      You wouldn't expect someone with no knowledge to be able to change their car radiator, or for that matter the oil. You have to read the manual, or know what you're looking for.

      If you can't read the Haynes manual, you need to learn what it's talking about first.
    17. Re:5 years in the business... by kwerle · · Score: 1

      Why is everyone so obsessed with people not having to be admins to configure software without reading the manual (and if you have to, learning to read it).

      Most users don't need to read the Windows or Mac manual before setting up their computer. That's because the config tools are self documenting. That's how it should be. See, there are applications that you use to configure your computer, and they have text that tells you what's going on. If you're really lost, they have help buttons. vi is not the right tool to configure your system.

      You wouldn't expect someone with no knowledge to be able to change their car radiator, or for that matter the oil. You have to read the manual, or know what you're looking for.

      See, your radiator doesn't have [much] text on it, nor do the tools needed to replace the radiator. There is no "replace radiator" application. There is no place for a help button if you're really lost. That's how computers should be different from cars - they should be more self-documenting and easier to use.

      If you can't read the Haynes manual, you need to learn what it's talking about first.

      No, the computer should do the right thing. So should the software.

      Thanks for keeping linux and other free software obscure!

      Oh, and w0w - yer really a l33t h4xor!

    18. Re:5 years in the business... by cloudmaster · · Score: 1

      A radiator is somewhat self-documenting. Anyone qualified to work on a car can look at it and tell which bolts need to come out, which hoses should be removed, and how to do those tasks. The mechanic in question can do that because the basic organization is roughly standardized, using a few common tools and fasteners to perform several different tasks. Similarly, the use of a standard set of tools in a config file (like XML's structure rules) would allow for simpler discovery of data.

      I can walk up to most any car and identify common parts, perform maintenance, etc. Similarly for computers based on a system I'm familiar with. Some of the /etc files aren't in a common format, though, and I do think it'd be helpful if they were forced to self-document at least a little. XML would do that. As a programmer, I know darned well that documentation doesn't usually happen unless it's forced. :)

  23. Nicest use of XML I've seen by Space+cowboy · · Score: 1

    ... is XML-RPC. A sort of lightweight SOAP. Very very useful for API's when you're doing cross-platform coding...

    The site has loads of implementations of both server and client code, some in *very* obscure languages :-)
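
    For a feel of what actually goes over the wire, here is a small Python sketch using the standard library's xmlrpc.client module. It only serializes and parses a call in memory, with no server involved. The method name sample.add and its arguments are made up for illustration.

```python
import xmlrpc.client

# Serialize a call to a hypothetical remote method "sample.add" with two ints.
request_xml = xmlrpc.client.dumps((2, 3), methodname="sample.add")

# This is the plain-text payload that would be POSTed to the server.
print(request_xml)

# Parsing it back yields the parameter tuple and the method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

    Because the payload is ordinary text, the same request can be produced or inspected from any language with an XML parser.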

    Simon.

    --
    Physicists get Hadrons!
    1. Re:Nicest use of XML I've seen by Anonymous Coward · · Score: 0

      XML-RPC and SOAP are deranged. Remote procedure calls don't need to contain optional whitespace and comments, and since we know they'll be produced and consumed by machines (not by hand) there's no justification for avoiding an efficient binary format like ASN.1 PER.

    2. Re:Nicest use of XML I've seen by Space+cowboy · · Score: 1

      How much easier it is to create ascii for the transport when you're coding in PHP,Java and C++ for the same project across a range of OS's.

      How much easier it is to debug and produce test cases on all those platforms when it's an ascii file...

      If you're concerned about size or security, then use gzip and ssl. No problem.

      Not sure I'd do it if I wanted the absolute last percentage point of performance from the system, but overall, in any application I've coded, it's a major win.

      Simon.

      --
      Physicists get Hadrons!
    3. Re:Nicest use of XML I've seen by fredrik70 · · Score: 1

      hey, where's the brainf*ck implementation?!

      --
      if (!signature) { throw std::runtime_error("No sig!"); }
    4. Re:Nicest use of XML I've seen by dbc · · Score: 1

      XML-RPC. A sort of lightweight SOAP

      You realize, of course, that SOAP was derived from/inspired by XML-RPC, don't you? The relationship is not quite so accidental as you make it sound.

    5. Re:Nicest use of XML I've seen by Trejkaz · · Score: 1

      This is what Sun have done in their Fast Web Services prototype, for what it's worth, converted XML to ASN.1 and back, in order to transport SOAP as binary.

      Of course why use even ASN.1? There is Java RMI, too, and most people overlook the fact that there are many implementations of RMI in other languages, even C, and even hardware!

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    6. Re:Nicest use of XML I've seen by Trejkaz · · Score: 1

      I guess what you'd do for debugging is to run your binary format through a transform to XML. After all, tools like Ethereal do a pretty good job of formatting different binary protocols.

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
  24. X is for the Xtensions, M is for the Metadata... by jefu · · Score: 4, Interesting
    and L is for the Laughter it brings us.

    I have not read this book, but it sounds interesting already.

    XML is an interesting technology that has the potential for changing the way we use technology in all kinds of weird and wonderful ways. (And in a few ways that may not be so wonderful.) But using XML correctly is tough. I've written and discarded more DTDs and schemata than I care to admit because they were seriously flawed. Getting it right is important and very, very hard.

    XML looks simple, and in some ways it is. But in so many other ways it is not simple at all - in large part because it gives us a tool to approach some very hard problems. And hard problems, often even when expressed in the simplest way around, tend to stay hard. (Calculus makes saying some things simple, for example, but understanding those things still takes work and insight.)

    I will be taking a good look at this book in the near future to see what it has to say. And I'd urge those who dislike XML to do the same. And finally, even those who like XML need to think hard about how to use it well, so perhaps this would be a good read for them too.

  25. What are you talking about? by mellon · · Score: 5, Insightful

    XML is just text! If the XML parser is slow, write a faster one! Figure out where the bottlenecks are! Don't give me this XML is slow crap. This is slashdot - you're supposed to be a geek. If you don't like XML, fine, but come up with a geeky reason not to like it, not some problem whose solution is just to roll up your sleeves and do some hacking!

    Oy! :')

    1. Re:What are you talking about? by nat5an · · Score: 5, Insightful

      Okay, fine, XML isn't slow by nature. But it's a generalized solution. Not every set of data needs to be stored in a general tree, so putting every set into one will often create a lot of extra work. The benefit of XML is its portability, and the price is the performance hit you take from packing and unpacking all that data.

      --
      Head down, go to sleep to the rhythm of the war drums...
    2. Re:What are you talking about? by micromoog · · Score: 3, Insightful
      How about the fact that, by definition, it takes something like 10 times as much information to store/transfer data in XML than in a native binary format?

      Having a huge amount of metadata surround every piece of data is not always a good thing. XML is slow, parser issues notwithstanding.

    3. Re:What are you talking about? by larry+bagina · · Score: 3, Insightful

      parsing any text involves character-by-character analysis. No amount of geekdom code rewriting can change that. If an XML file is 3 times as large as a CSV file, it will take 3 times as long to parse. And both will be orders of magnitude slower than a binary record.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    4. Re:What are you talking about? by fishbonez · · Score: 1
      XML is just text!

      Exactly. It is just a form of text tagging. XML is evolutionary and not revolutionary in terms of technology. First there was SGML (Standard Generalized Markup Language) and then there was HTML.

      I remember working with SGML a dozen years ago. It was certainly not easier to use than the old system of formatting manuscripts. In fact it was much more time consuming. But the real benefit was the ability to make an archive of searchable articles with results that could be pulled up and be properly formatted.

      I view XML in the same way. It hurts performance but it allows information to be stored in a format that is much more searchable. That will ultimately create value. Just as SGML tagging created value by allowing publishing companies to have archives of articles that now go back a dozen years.

      --
      Frylock: That's not a toy!
      Master Shake: You say that about everything you own. You should own toys. They're fun.
    5. Re:What are you talking about? by Anml4ixoye · · Score: 3, Interesting

      You bring up some really good points. The reason that you hear a lot of "XML is slow" is because of the usage of XPATH. To use XPATH expressions, most implementations parse the entire XML document into memory.

      I suppose you *could* write a custom parser. If your structure is well-defined, and not subject to a lot of changes, you could significantly increase performance that way. The other option is to parse the document once, get out what you need to get out into smaller chunks, dump the larger document, and only work off the smaller chunks.

      Looks like TMTOWTDI is not just for Perl
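
      The "parse once, keep only the small chunks" option can be sketched in Python with the standard library's incremental parser. This is an illustration under assumptions, not anyone's production code: the <record>, <id> and <size> element names and the function name are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def large_record_ids(stream, min_size):
    """Stream through the document and keep only the ids we care about."""
    ids = []
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "record":
            if int(elem.findtext("size")) >= min_size:
                ids.append(elem.findtext("id"))
            # Drop the record's children so the tree does not keep growing.
            # (The root still holds the emptied <record> elements.)
            elem.clear()
    return ids


# Hypothetical input; in practice this would be an open file, not a literal.
doc = (
    b"<records>"
    b"<record><id>a1</id><size>45</size></record>"
    b"<record><id>b2</id><size>72</size></record>"
    b"</records>"
)
result = large_record_ids(io.BytesIO(doc), 50)
print(result)
```

      Nothing here needs XPath over a fully loaded document, which is where the memory cost described above comes from.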

    6. Re:What are you talking about? by paganizer · · Score: 1

      How's this? It's not necessary, it has no real reason to exist, and everything that it can do can be done better by existing products?
      I'm at a loss, and have been ever since it came out, as to why this is becoming a common way of doing things.

      --
      Why, yes, I AM a Pagan Libertarian.
    7. Re:What are you talking about? by Anonymous Coward · · Score: 5, Funny

      Have you ever tried storing a picture in it?

      <pixelrow>
      <pixel>
      <value channel="red" level="0.023"/>
      <value channel="blue" level="0.22"/>
      <value channel="green" level="0.5"/>
      </pixel>

      ...

      </pixelrow>

      ...

      :)

    8. Re:What are you talking about? by Epistax · · Score: 1

      I really haven't touched XML. The reason? From the outside, it looks bulky and slow. Parsing strings and using only 128 of the 256 values each byte can hold isn't my idea of efficiency.

      Could someone explain to me why conversion into, say a binary map or such, isn't an advantage? I can easily see its portability, and ease of use, I just don't see the speed and small size.

    9. Re:What are you talking about? by GeckoX · · Score: 1

      And that would be a PERFECT example of the WRONG tool for the job.

      Now, if you want to compare something to do with images and xml, try comparing Flash files to SVG files and see what conclusions you come up with...

      --
      No Comment.
    10. Re:What are you talking about? by addaon · · Score: 1

      Graph, not tree.

      --

      I've had this sig for three days.
    11. Re:What are you talking about? by helix_r · · Score: 2, Informative

      Have you ever tried storing a picture in it?


      Actually, yes.

      It's called SVG; it is a very nice way to represent graphics.

    12. Re:What are you talking about? by retinaburn · · Score: 1

      I have to agree. It's a generalized solution so that your data can be portable. Those bastards in the future always come up with new technology that makes all our hard work now fairly useless. You create a nice optimized data storage format, and access it for some 10 years. Then somebody wants to port your data to application Y on platform Z. If your data is important enough to take the time to optimize it, it may be at least worthwhile to ensure some level of portability in the future. Lose performance now, save money later?

    13. Re:What are you talking about? by iantri · · Score: 0, Troll

      IANAP, but my guess would be parsing:

      Apple|45|Yes
      Orange|72|No
      Banana|34|No
      Pear|78|No
      .. is always going to be faster than parsing:

      <fruit>
      <name>Apple</name>
      <weight>45</weight>
      <preference pref="yes"/>
      </fruit>
      <fruit>
      <name>Orange</name>
      <weight>72</weight>
      <preference pref="no"/>
      </fruit>
      <fruit>
      <name>Banana</name>
      <weight>34</weight>
      <preference pref="no"/>
      </fruit>
      <fruit>
      <name>Pear</name>
      <weight>78</weight>
      <preference pref="no"/>
      </fruit>
      ... there's a whole lot more to process. Yes, I know that my example is oversimplified and not much can be gleaned about what the numbers describe in my example, but there is no reason that there couldn't be good documentation.

      XML seems incredibly wasteful -- just because we have 3ghz processors now doesn't mean we should use incredible verbosity for no good reason.
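
      The size difference is easy to measure rather than guess at. A rough Python sketch, using the fruit data above: the <fruits> wrapper element and the function names are additions for illustration, since a well-formed XML document needs a single root. It measures size only and makes no claim about parse time.

```python
import xml.etree.ElementTree as ET

PIPE_TEXT = "Apple|45|Yes\nOrange|72|No\nBanana|34|No\nPear|78|No\n"


def parse_pipe(text):
    """Split each line on | into (name, weight, preference)."""
    records = []
    for line in text.splitlines():
        name, weight, pref = line.split("|")
        records.append((name, int(weight), pref.lower()))
    return records


def to_xml(records):
    """Build the same data in the <fruit> layout shown above."""
    root = ET.Element("fruits")  # wrapper root, not in the original example
    for name, weight, pref in records:
        fruit = ET.SubElement(root, "fruit")
        ET.SubElement(fruit, "name").text = name
        ET.SubElement(fruit, "weight").text = str(weight)
        ET.SubElement(fruit, "preference", pref=pref)
    return ET.tostring(root, encoding="unicode")


def parse_xml(text):
    """Read the XML back into the same tuple form."""
    records = []
    for fruit in ET.fromstring(text).iter("fruit"):
        records.append((
            fruit.findtext("name"),
            int(fruit.findtext("weight")),
            fruit.find("preference").get("pref"),
        ))
    return records


records = parse_pipe(PIPE_TEXT)
xml_text = to_xml(records)
print(len(PIPE_TEXT), "characters delimited,", len(xml_text), "characters as XML")
```

      What the delimited form does not carry is the meaning of each column, which is the part the XML version spends its extra characters on.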

    14. Re:What are you talking about? by gbjbaanb · · Score: 2, Insightful

      wow, wait a minute... you want a geeky reason not to use it... well, how about rolling your own binary parsing data format is a) much, much more difficult for others to understand, b) way faster, c) far more bandwidth efficient.

      there you go - 3 classic geek reasons to do something the hard way instead of the standard, ordinary, easy but OK for mortals way.

      Incidentally, XML really is slow. Sure it looks nice, is easy to understand, easy to create with the simplest of text editors, interoperable, and an industry standard. But it is still a technology that doesn't cut it when you need your data stored in small, fast blobs. A case in point - my previous company used XML everywhere (it was cool, after all), but after a while performance (when scaled to many users) became an issue. Rewriting the XML-handling object to use a binary format made things much, much, much faster. The XML blobs were then only used for the browser front end, and for debugging on a developer machine. XML is good, but don't ever pretend it's all things to all men, in all cases. It isn't. It's slow.

    15. Re:What are you talking about? by mellon · · Score: 1

      ROTFL! I wish I could mod this up! :')

      Er, you were kidding, right? :')

    16. Re:What are you talking about? by sporty · · Score: 1

      I think he's talking about how it's displayed in a hierarchical way, though you can reference nodes in the language.

      (Note to other readers, all trees are graphs :)

      --

      -
      ping -f 255.255.255.255 # if only

    17. Re:What are you talking about? by mellon · · Score: 2, Interesting

      This is by no means assured. When you store data in a binary format, you generally have to have code to deal with byte-swapping and other format conversions. Also, generally speaking, the limitation on character parsing is memory bandwidth - if you are using a modern CPU, it is going to spend most of its time waiting for bits to come out of memory, and it doesn't care whether they're an ASCII (or utf8) byte stream or binary words.
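
      The byte-swapping point can be shown in a few lines of Python with the standard struct module. This is only an illustration with an arbitrary value: the same four bytes decode to different numbers depending on the byte order the reader assumes, while the text form has no such ambiguity.

```python
import struct

value = 1024

# Write the value as a 32-bit unsigned integer, little-endian.
little_endian_bytes = struct.pack("<I", value)

# A reader that agrees on the byte order gets the value back.
read_correctly = struct.unpack("<I", little_endian_bytes)[0]

# A reader that assumes big-endian gets a different number from the same bytes.
read_swapped = struct.unpack(">I", little_endian_bytes)[0]

# The text form costs a parse, but carries no byte-order assumption.
read_from_text = int("1024")

print(read_correctly, read_swapped, read_from_text)
```

      So a binary format needs its byte order written down and handled on both ends, which is exactly the extra conversion code mentioned above.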

      Also, a lot of stuff that goes around in packets is free-form text anyway, not binary data. So in the case where you're just passing numbers around, yes, XML is going to be a bit slower simply because there are more bits to pull out of the buffer. But in the case of plain text, the difference is probably not going to be very significant. In cases where it is significant, you probably don't want to use XML.

      You are right that XML is not a panacea - I wouldn't use it for every application. I think a lot of the anti-xml rhetoric we hear is because so many people do use it for the wrong applications, and then other people see what they've done and start retching.

      A couple more points - XML::Twig allows you to parse XML in Perl without sucking the whole file in at once. Also, the article to which I was replying was talking about SQLXML, which I presume is already plain text. It's tough to imagine that XML is really going to make that significantly slower - if it is, it's probably because of a poor implementation, not increased data size.

    18. Re:What are you talking about? by mellon · · Score: 1

      Have you ever tried to grep a DBM file? See the previous comment, which boils down to "XML's win is that it's a well-known format that can be easily searched and manipulated."

    19. Re:What are you talking about? by mellon · · Score: 1

      That's a really good point. The work I've been doing with XML at work has involved using XML as the outer representation, which is compiled to an internal representation.

      So we have the advantage of being able to use generic tools to search the master data representation, and we can explode the internal representation back into XML, but we generally work with the internal representation.

      And actually, because we have the master copy of the data in XML, we can (and do) generate specialized binary databases from the XML for particular applications, each of which just contains the data we need.

      Personally, I think sexprs are niftier (and smaller, and definitely geekier!), but XML has proven really valuable in this particular application, and we do not have to suffer from performance problems as a result of using it, because we are (IMHO) making good strategic use of it.

    20. Re:What are you talking about? by Anonymous Coward · · Score: 0

      Sounds like you are expecting something from XML that it is not designed to deliver. This is not unlikely given all the hype. But properly designed and executed DTDs wouldn't result in "bloat" at all. If your metadata is that large and your data that small, then you probably should look at other formats. However, you can embed binary data into XML files if you like (or use the XML to refer to binary resources), so even then your complaints are more about usage than the standard itself.

      Obviously it would be stupid to do something like storing a picture pixel by pixel with XML like: <image><row><pixel red="1" green="6" blue="2"/><etc...>. And if all of your data is relatively flat, that's another case where XML may be overkill. But if you have lots of text data that needs to retain a structure (like a book perhaps, or financial transactions, or GUIs), something like XML can be a good fit. The standardization involved and the ubiquity of parsers make it likely that you'll be up and running faster than if you have to first design a format and then write your own parser... and that's the point of XML. An example: XUL is a very nice use of XML for describing GUI interfaces. So compare that to the XML generated by GLADE for building GNOME applications, which creates huge, ugly XML files. So even the same problem domain can have good and bad solutions.
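
      Embedding binary data usually means base64-encoding it into an element's text. A minimal Python sketch, assuming a made-up <attachment> element with type and encoding attributes (none of these names come from any standard):

```python
import base64
import xml.etree.ElementTree as ET


def wrap_blob(blob, media_type):
    """Store raw bytes as base64 text inside a hypothetical <attachment> element."""
    elem = ET.Element("attachment", {"type": media_type, "encoding": "base64"})
    elem.text = base64.b64encode(blob).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_blob(xml_text):
    """Recover the original bytes from the element text."""
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text)


blob = bytes(range(256))  # every possible byte value, as a stand-in for an image
xml_text = wrap_blob(blob, "application/octet-stream")
round_tripped = unwrap_blob(xml_text)
print(len(blob), "bytes in,", len(xml_text), "characters of XML")
```

      Base64 adds roughly a third to the size, so for large blobs the other option mentioned above, referring to an external binary resource from the XML, is often the better fit.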

    21. Re:What are you talking about? by Anonymous Coward · · Score: 0

      Okay, fine ASCII isn't slow by nature. But it's a generalized solution. Not every text document requires the whole alphabet+digits+punctuation, so storing everything in ASCII will create a lot of extra work. The benefit of ASCII is its portability, and the price is the wasted space from all those unused bits.

    22. Re:What are you talking about? by Zaiff+Urgulbunger · · Score: 1

      Agree totally. But just to pick out one point here and comment on that:
      XML is just text!

      That's the main complaint about XML. And it's true that storing information as text is most likely to use more resources than a complete binary implementation. So why use XML? Because it's easy to implement. Lots of the hard problems like data extensibility and internationalisation have been solved.

      Use the right tools for the right jobs. XML is great for a lot of things, and thanks to Moore's Law, it being less efficient doesn't make a jot of difference. If it did, then we'd be coding everything in pure assembler!!

      Even where XML isn't the solution (a database with more than one concurrent user or more than 200 records might be an example), use a proprietary binary solution but provide XML interfaces where the inefficiencies are not an issue.

      A previous comment was:
      I share your opinion regarding XML, and have yet to find a great reason to use it, other than feeding data to our vendors systems through their proprietary file layouts.

      With XML you can validate a file. You don't have to have previously written code to validate the file -- you can validate it as it is (with a schema). If we take the example of a config file, I can validate the contents of the config file without knowing anything about the application that uses it. Furthermore, I could relatively easily write an XForm to manipulate the contents of that config file and know, absolutely, for a fact, that I haven't stuffed up the contents of the file.

      And that's quite easy to do. But it's only an example. The best bit really is not being able to do this; it's being able to do it quickly, easily and cost effectively. Therefore, this is exploiting Moore's Law to best effect.

      I think the PR problem with XML is that it *is true* that XML is less efficient in most cases when compared to an existing solution. But such a comparison ignores the benefits of being able to quickly and easily change and extend such a solution. If we just take the namespacing part of XML, this alone makes integration with other existing XML based solutions easier.

      The key to appreciating XML is understanding *why* it is good, and *where and when* to use it!

      Anyway, if you want inefficiency, just look at GUIs compared with traditional terminal based applications!! (walks away mumbling about Java and scripting....)

    23. Re:What are you talking about? by Anonymous Coward · · Score: 0

      Then if you parse it into a DOM, that'll take up about 10 times as much memory as the file on disk. Well, it keeps the RAM and hard disk manufacturers in business, I suppose.

    24. Re:What are you talking about? by Zaiff+Urgulbunger · · Score: 1

      F*ck -- and a point I forgot to make was about XML being text files. Text files are XML DOMs in a serialised form (sort of). Text is good because you can read and write text on anything, so it makes interoperating with legacy systems easy without needing anything complex.

      But say what you're doing involves opening XML text files, parsing them, extracting something and closing, well, there's nothing to stop you just dumping the parsed, binary tree structure from memory to disk and using that. Obviously you can't use this for interchange, and you'd have additional issues to deal with if you change the XML data structure later on, but you are allowed to implement things how you like. People do of course already do this, e.g. storing pre-loaded/parsed XML DOMs as session objects on web servers.

      The point I meant to make in my first post (duh - me), was that XML being a text file isn't (or shouldn't be) a barrier to using XML. Don't get hung up on the text thing, as the text thing is only important for data interchange!
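      A minimal sketch of the "parse once, dump the parsed structure" idea, assuming Python's standard library. The helper name element_to_dict and the sample document are made up for illustration, and the plain-dict layout is only one possible cache format (it ignores tail text, for instance).

```python
import pickle
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert a parsed Element into plain dicts and lists (hypothetical cache layout)."""
    return {
        "tag": elem.tag,
        "attrib": dict(elem.attrib),
        "text": elem.text,
        # Tail text is deliberately ignored to keep the sketch short.
        "children": [element_to_dict(child) for child in elem],
    }


xml_text = "<config><option name='depth'>3</option></config>"

# Parse the text form once...
parsed = element_to_dict(ET.fromstring(xml_text))

# ...then serialise the already-parsed structure. In practice these bytes
# would be written to a cache file and reloaded instead of re-parsing.
cached_bytes = pickle.dumps(parsed)
restored = pickle.loads(cached_bytes)
```

      As the post says, a cache like this is private to the application: the XML text stays the interchange format, and the cache has to be rebuilt whenever the document structure changes.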

    25. Re:What are you talking about? by rabidcow · · Score: 1

      XML is just text!

      So is this post, but how slow and complicated do you think a program would have to be to correctly parse and accurately represent all of the semantic information contained within it?

      And never mind all the complicated structure XML layers on top of it, TEXT IS SLOW.

      If you want a fast data structure, go for packet-based binary, like what JPEG 2000 and PNG use. You check the first few bytes and can decide to quickly skip the entire packet if it's not important to you. In a text file you can't give a packet length, because you don't even know how many base storage units each character is going to take.
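      A rough sketch of the length-prefixed idea, assuming Python's struct module. The layout here (4-byte tag, 4-byte big-endian length, payload) is a simplification invented for illustration; it is not the actual PNG or JPEG 2000 byte layout, and the names write_chunk and find_chunk are hypothetical.

```python
import struct

HEADER = struct.Struct(">4sI")  # 4-byte tag followed by a 4-byte payload length


def write_chunk(tag: bytes, payload: bytes) -> bytes:
    """Serialise one chunk as header + payload."""
    return HEADER.pack(tag, len(payload)) + payload


def find_chunk(data: bytes, wanted: bytes):
    """Return the payload of the first chunk tagged `wanted`, or None."""
    offset = 0
    while offset + HEADER.size <= len(data):
        tag, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if tag == wanted:
            return data[offset:offset + length]
        # Not interesting: jump over the payload without inspecting it.
        offset += length
    return None


stream = write_chunk(b"META", b"x" * 1000) + write_chunk(b"NAME", b"hello")
```

      The reader never touches the 1000 bytes of the META payload; it only reads two 8-byte headers to reach NAME.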

    26. Re:What are you talking about? by Anonymous Coward · · Score: 0

      Please show me a Linux based SVG browser/plugin(*)
      that I can get up 'n' running as easily as a Flash plugin?

      I know Firebird 0.7 and Mozilla 1.5 builds are supposed to have SVG support somewhere but I am lacking clue getting it to work!

      (*)Please not Amaya

    27. Re:What are you talking about? by Anonymous Coward · · Score: 0
      If the XML parser is slow, write a faster one!

      Obviously, you were sleeping during your compiler class! Parsers are inherently provably slow! If the XML is sufficiently complex, parsing is O(n**2), where n~number of elements parsed. There's a hard limit to how fast a correct parser can be.

    28. Re:What are you talking about? by GlassHeart · · Score: 2, Insightful
      my previous company used XML everywhere (it was cool, after all), but after a while performance (when scaled to many users) became an issue. Rewriting the XML-handling object to use a binary format made things much, much, much faster. The XML blobs were then only used for the browser front end, and for debugging on a developer machine.

      No, your company did the exact right thing in choosing XML. When the nascent system is still being actively debugged, you made the process much easier because XML is human readable. As you begin to scale up its use, you proved a performance problem and relied on the modularity of the code to simply replace the XML code with efficient binary formats. If you had not seen a performance problem (perhaps the bottleneck is elsewhere and inevitable anyway), then presumably you'd leave the XML code alone.

      You started with a general solution, and then optimized as necessary. I consider this an example of a job well done.

    29. Re:What are you talking about? by wfrp01 · · Score: 1

      XML isn't slow by nature

      Hmm, you may be right, but I'm not so sure. Older technologies, such as relational databases, have very robust mathematical and computer science underpinnings. XML has a different heritage. It's the next step in the evolution of a markup language. I realize that comparing databases and XML is comparing apples and oranges, but that's kind of the point. I think the question of whether or not it can be shoehorned into efficient & robust data stores remains to be determined.

      --

      --Lawrence Lessig for Congress!
    30. Re:What are you talking about? by j3110 · · Score: 1

      If only there were a native binary format, we wouldn't have this problem to begin with.

      Is it too much to ask from the W3C for a binary encoded xml format? Maybe to make my point I'll start using one character tags in UTF-8 encoding. Put the popular tags in using 8 bits, and the unpopular tags in as 16 bits, then I'll just do xslt when anyone wants a copy. :) Also, by all means never put unneeded whitespace.

      I bet that would shave a good 20-30% off file size and parse time. Too bad you wouldn't be able to read it because unicode a's are different than normal a's but they would look the same.
      TitleHello World!
      is much smaller than

      Title

      Hello World!

      Now if you really want to be naughty to improve performance of parsing you can require tags that give the offset of other tags.
      026040

      That way you can tell where tags begin without parsing the entire file if all you want is just one little piece.

      I don't know if I can be annoying enough myself to actually get someone to make a binary xml counterpart standard, but I'm sure plenty of /.'ers can come up with some neat ideas. You'd be amazed at what all you can actually call XML. :)

      --
      Karma Clown
    31. Re:What are you talking about? by Boing · · Score: 1
      Graph, not tree.

      Actually, tree. Excluding things like element references, which are not explicitly defined by the XML spec, no XML element can be directly contained by more than one other element, and all elements can be hierarchically traced back to the root element.

      As counterexample, explain to me how there could be a looping or disjointed XML document.

    32. Re:What are you talking about? by torpor · · Score: 1

      XML doesn't just describe trees.

      I've got hash tables, linked lists, B-trees, indexes and databases, all working just fine in an XML framework, no performance problems at all, everything works very nicely.

      You're just looking for an argument, not finding one.

      --
      ; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
    33. Re:What are you talking about? by yomahz · · Score: 1

      This is exactly how SAX works

      --
      "A mind is a terrible thing to taste."
    34. Re:What are you talking about? by DavidMonks · · Score: 1

      XML is not 'just text'. It is encoded in a text-based format, agreed, but so what? The data encoded will not be used/processed/displayed as text. XML offers flexibility and a degree of naked-eye readability, but storing data in an implementation-specific manner can offer significant optimisations.

      A key example is relational data. Here, XML is inefficient, not because of verbose tags or other bloat, but because of its flexible structure.

      Take it away, Joel:

      How does a relational database implement SELECT author FROM books? In a relational database, every row in a table (e.g. the books table) is exactly the same length in bytes, and every field is always at a fixed offset from the beginning of the row. So, for example, if each record in the books table is 100 bytes long, and the author field is at offset 23, then there are authors stored at byte 23, 123, 223, 323, etc. What is the code to move to the next record in the result of this query? Basically, it's this:
      pointer += 100;

      One CPU instruction. Faaaaaaaaaast.

      Now let's look at the books table in XML.

      <?xml blah blah>
      <books>
      <book>
      <title>UI Design for Programmers</title>
      <author>Joel Spolsky</author>
      </book>
      <book>
      <title>The Chop Suey Club</title>
      <author>Bruce Weber</author>
      </book>
      </books>

      Quick question. What is the code to move to the next record?

      Uh...
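      To make the contrast in the quoted passage concrete, here is a small sketch in Python. The field widths (30 and 20 bytes) are assumptions chosen for the example, not the 100-byte row from the quote, and both function names are made up. The fixed-width reader steps through the table by a constant stride; the XML reader has to parse the markup to find each author.

```python
import struct
import xml.etree.ElementTree as ET

TITLE_WIDTH, AUTHOR_WIDTH = 30, 20  # assumed widths, for illustration only
RECORD = struct.Struct(f"<{TITLE_WIDTH}s{AUTHOR_WIDTH}s")

BOOKS = [
    ("UI Design for Programmers", "Joel Spolsky"),
    ("The Chop Suey Club", "Bruce Weber"),
]

# Fixed-width table: every row is RECORD.size bytes, padded with NUL bytes.
table = b"".join(RECORD.pack(t.encode("ascii"), a.encode("ascii")) for t, a in BOOKS)

xml_text = "<books>" + "".join(
    f"<book><title>{t}</title><author>{a}</author></book>" for t, a in BOOKS
) + "</books>"


def authors_fixed(table: bytes):
    """Jump from row to row by a constant stride and slice out the author field."""
    authors = []
    for start in range(0, len(table), RECORD.size):
        field = table[start + TITLE_WIDTH:start + RECORD.size]
        authors.append(field.rstrip(b"\x00").decode("ascii"))
    return authors


def authors_xml(xml_text: str):
    """Parse the whole document, then walk it looking for <author> elements."""
    return [node.text for node in ET.fromstring(xml_text).iter("author")]
```

      Both readers return the same list of authors; the difference is that the first can compute where the next record starts, while the second has to scan every character in between.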

  26. Element vs Tag? by cifey · · Score: 1

    I guess an Element would correspond to an object whereas a Tag would highlight a data item in a document. Also, for a webserver serving xml and xslt or css can't you put the processing load onto the client?

    --
    Hello Cruel World
    1. Re:Element vs Tag? by Anonymous Coward · · Score: 0

      It's nested elements, attributes, and content that have meaning--tags merely tell the parser where the element boundaries are. In SGML-defined grammars some tags could even be omitted from the markup, and the presence of the element would be inferred anyway.

  27. i second.. by Hooya · · Score: 3, Informative

    i use XML for a lot of things and it's been quite decent. but on the other hand, we're using dual pentium IIIs for trivial stuff that was running fine on a PII with c/c++ app without XML.

    the fact is that XML is just marshalling and unmarshalling of all computational data to and from strings thereby negating fast numerical performance that a CPU inherently has. you want to add two numbers? create a string representation, pass it around through a bunch of parsers/transformers as strings then finally convert it back to the number it really is then add then convert it back to string for passing it around all over again... what a waste.

    1. Re:i second.. by GeckoX · · Score: 1

      Ahh, I see, so your problem with XML is that it is a really slow math processor?

      Right tool for the job. I don't believe I've EVER heard somebody suggest that one should remove some heavy-duty number crunching from a c++ app and stuff it into XML...

      On the other side however, ever tried stuffing a family tree into a relational database? Or doing large quantities of text processing in c++?

      --
      No Comment.
  28. speek kills... by Broadcatch · · Score: 2, Insightful

    ...resource hogs.

    While I'm not an XML zealot, I like the clarity it can bring to many domains of practice. Regarding the performance hit, get a faster computer! If you don't have a fast enough one yet, wait a year.

    Lisp was shunned in the past primarily for speed reasons, too. Now the main reason many don't like Lisp is because they don't understand advanced software engineering concepts and write poor Lisp code.

    --

    The antidote for misuse of freedom of speech is more freedom of speech.
    -- Molly Ivins

  29. Re:comparing to Scott's famous c++ book!? by millette · · Score: 0, Offtopic

    Who's giving mod points today? Just look at that!

  30. Opposite for me by Felonius+Thunk · · Score: 1

    I'm finding lots of little applications that were using a database or text-file scheme with relatively little data and I've been converting them to xml files for storage and lookup. It's been a performance improvement every time, often a huge one. There is presumably some size/schema complexity point where this gain turns around the other way (the cost of loading and parsing a doc vs. creating a db connection to parse data), but it's been a big win for me so far.

  31. eXtensively Meandering Language. by Anonymous Coward · · Score: 0

    Hmmm...I've entertained the idea of morphing the incoming XML from a tree to some other graph. There's also the idea of building up a representation of just the nodes and having pointers to the actual data, with a dictionary to reduce the size. Remember, just because it starts out as XML doesn't mean it has to stay that way.

  32. W3C by sielwolf · · Score: 4, Informative

    Browse the Technical Reports, Recommendations and Proposed Recommendations at W3C as there are a lot of DTDs and Schemas there. I found a DTD for generic simulation representation there. There's quite a bit if you take the time to look.

    --
    What is music when you despise all sound?
  33. Re:5 years in the business... WHERE??? by mikewolf · · Score: 1, Insightful

    i have been in the business for 4 years now, and i use XML on a daily basis.

    not only is it a powerful medium for representing (and caching) hierarchy/tree-based data, extensions like XSLT provide tremendous advantages in transforming data for a variety of other purposes (you probably hated lisp/scheme based languages, too).

    While programming languages based on XML at first sound a little strange, combining an XML based programming language with XSLT could be super powerful, especially with concepts like code generation...

  34. I don't care for XML by Anonymous Coward · · Score: 0

    Wait, before you shoot me let me explain myself. I've tried to view some webpages that are XML-based, but all that shows up in the freakin' browser is the source code. I've ditched HTML 4.0 and use XHTML 1.0 instead, but I don't know about full-blown XML. The only time I've seen XML used properly is when you look at the source code for an MSN Messenger Saved Contacts list, and that isn't a webpage! Could someone please tell me what XML does exactly and where XML would be useful?

    1. Re:I don't care for XML by Anonymous Coward · · Score: 0

      I'm sure it's been said a dozen times already by now. Let's see... a word processor format (like in OpenOffice or Microsoft Office 2003), a web page format (yes, XHTML is XML), an instant messaging protocol (Jabber), several databases, and so on.

  35. L is for Lousy... by Anonymous Coward · · Score: 0

    "But using XML correctly is tough. I've written and discarded more DTDs and schemata than I care to admit because they were seriously flawed. Getting it right is important and very, very hard."

    Then it's the wrong tool. If you find yourself writing comments like "This is really tricky code here", then you need to rewrite it! Use a different algorithm, use a different tool.

    "XML looks simple, and in some ways it is. But in so many other ways it is not simple at all"

    Hmm, just like Visual Basic...I cannot say how many VB apps I've seen (and fixed) that started out simple, and rapidly became quagmires of code.

    And how many out there use validation at all? I'd say very few, because no one mentions Xerces lib at all.

    And config files, simpler parsing like 'property=value' is easier and faster.
    A Gnome config file that has 4 tags, and 1 tag had 80! attributes is just stupid. Yet this is how people use XML.

    And finally, if XML is so great, why are there half a dozen competing markup technologies out there in Freshmeat?

    JoeR

    1. Re:L is for Lousy... by gbrayut · · Score: 2, Insightful

      >And config files, simpler parsing like 'property=value' is easier and faster.
      A Gnome config file that has 4 tags, and 1 tag had 80! attributes is just stupid. Yet this is how people use XML.

      There are many cases where a simple property=value is much better than full scale XML, but when used correctly XML can be much more efficient.

      Take your everyday INI file, containing simple property=value strings. Sure it works, but all those properties have other information as well such as a description, data type, valid parameters, default settings... you get the point.

      Try adding that into an INI file and you will end up with a mess. XML can be used to incorporate all the additional information into one file and in doing so program configuration user interfaces can be dynamically created.

      Most programs add and remove features with every release and it is convenient to store settings in an XML file so that interfaces to those settings can be dynamically generated. Simply populate a list box or table with the name/value property pairs, have a text area display the description for a selected property, and have input data validated to the corresponding input parameters and data type.

      It might take longer to plan, but if implemented correctly it can save time and confusion. In the end, it will be a larger file, but if done correctly that data actually means something!
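      A small sketch of the self-describing settings file described above, assuming Python's xml.etree.ElementTree. The element and attribute names (setting, type, default, description, value) and the load_settings helper are invented for this example; a real application would pick its own vocabulary.

```python
import xml.etree.ElementTree as ET

SETTINGS_XML = """
<settings>
  <setting name="font_size" type="int" default="12">
    <description>Editor font size in points</description>
    <value>14</value>
  </setting>
  <setting name="autosave" type="bool" default="true">
    <description>Save files automatically</description>
  </setting>
</settings>
"""

# Converters keyed by the declared data type (hypothetical type names).
CONVERTERS = {
    "int": int,
    "bool": lambda text: text.strip().lower() == "true",
    "str": str,
}


def load_settings(xml_text):
    """Return one row per setting, ready to populate a generic settings table."""
    rows = []
    for node in ET.fromstring(xml_text).iter("setting"):
        convert = CONVERTERS[node.get("type", "str")]
        # Use the stored value if present, otherwise the declared default.
        raw = node.findtext("value", default=node.get("default"))
        rows.append({
            "name": node.get("name"),
            "description": node.findtext("description", default=""),
            "value": convert(raw),  # raises ValueError if the text does not match the type
        })
    return rows


rows = load_settings(SETTINGS_XML)
```

      A generic settings dialog could loop over rows to build its list box and description pane without knowing any setting by name, which is the point being made about features coming and going between releases.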

  36. The main issue with XML is performance-Closure. by Anonymous Coward · · Score: 0

    So basically you're complaining about closure(1).

    BTW INI files have that as well. It's called a carriage-return.

    (1) It most certainly isn't the more precise levels of discrimination, because other formats can do the same, and I don't hear people complaining as much about them.

  37. You're forgetting a key reason for XML by BillsPetMonkey · · Score: 1

    For performance, EDI is definitely better

    Well, hang on. There's the cost factor. When you take into account Value Added Network (VAN), storage and interconnection fees, plus the usual per-kilocharacter fees, XML suddenly performs much better - the bandwidth to send it is greater but if you have an FTP server then it's not even your bandwidth at issue. The cost per invoice/order is MUCH less even when development fees are taken into account and therefore performance is higher.

    EDI is a pain in the ass to debug too. Missing tags or misplaced required fields - oh joy. Start counting those plusses +++000+75000+?:+++ already.

    --
    "It's not your information. It's information about you" - John Ford, Vice President, Equifax
  38. Sick of XML? Try YAML. by Chromodromic · · Score: 3, Interesting

    Reading through the posts on this board, I tend to agree with the criticisms about XML. It's a big dreadnought of a specification when, in most cases, a nice light corsair or even single-seat fighter would do the trick. Still, I would normally be inclined to say of XML what is said about Democracy: it's the worst system out there, except for all the others.

    Then I found YAML. Long and short, YAML is very lightweight, eminently readable, easy to use (parsers exist in multiple languages) and a pleasure for all kinds of projects that require data serialization. Where XML branches off into other types of uses, like XSL programming, YAML doesn't really compete. I find this to be a strength, actually, because once you've used YAML and seen it in action, XSL seems like a big, fat add-on. But for those that rely on XSL and other things, YAML won't do the trick.

    But if all you need is data serialization in a compact, easy-to-read, easy-to-use package -- and this, in my opinion, is by far what XML is most used for -- then YAML is great. Give it a shot.

    As for XML, I used to hate it with a passion. Now I still hate it, but I'm less passionate. The creators of XML are ambitious people, and they tried to do something in that spirit. It works, basically, and XML doesn't deserve *all* the bad press it gets.

    --
    Chr0m0Dr0m!C
    1. Re:Sick of XML? Try YAML. by Anonymous Coward · · Score: 0

      YAML is nice, I use it for a couple things, but I find it harder to edit by hand. Why? It's very picky about punctuation and spacing. For instance, if your data has a "special" character in it, you must remember to put single quotes around it. then if you have single quotes in it too, you have to deal with those, etc.

      XML is much simpler, just angle brackets, quotes, and ampersand have to be escaped, everything else (pretty much) goes in unchanged if you are using UTF-8.

      So YAML requires a little more "thought" to write by hand but if you are simply using it for, e.g., data serialization, it's great and very easy to read.

    2. Re:Sick of XML? Try YAML. by oren · · Score: 1

      YAML is nice, I use it for a couple things, but I find it harder to edit by hand. Why? It's very picky about punctuation and spacing. For instance, if your data has a "special" character in it, you must remember to put single quotes around it. then if you have single quotes in it too, you have to deal with those, etc.

      If you use single quotes, all you have to worry about is quoting single quotes - no other character. If you don't want to bother quoting any character, use a literal:

      literal: |
      It's nice, anything goes*&%^$%#$@!

    3. Re:Sick of XML? Try YAML. by oren · · Score: 2, Interesting

      XML and YAML have different "sweet spot" domains, though you can apply both technologies outside their intended domain.

      XML is great for "documents" - text documents, that is. XML does an admirable job separating "content" from "markup" which can be used to drive "presentation". It really is a big improvement over SGML. Things like DocBook, and CSS stylesheets, make XML the choice for writing documents.

      YAML is great for "data" - data structures, that is. YAML directly maps to common application data structures, so the result is more readable for both humans and computer programs. It is still very new, but is gaining acceptance, and IMVHO is the way to serialize data.

      Sure you can use XML for data (lots of people do) and YAML for documents (the YAML spec is written as a YAML document, just as a test of how far this can be pushed). But in both cases you are using the technologies outside their intended domain and suffer the consequences. It is all about using the right tool for the job.

      XML was never designed for data, it is an "Extensible Mark Up Language" for crying out loud. Promoting it as the end-all be-all solution for serializing data is strange - it is like promoting the use of the C++ programming language for writing scripts (it is all "programs", right?).

      In contrast, YAML Ain't Markup Language - it was designed specifically for data, and is very good at what it does. Just as the world has mostly come to accept that "system languages" and "scripting languages" are different animals, it will discover that "document formats" and "data formats" are different animals - hence the need for both XML and YAML.

      (I'm one of the YAML spec authors, so the above reflects about 33% of the "official YAML position" :-)

  39. Read it on Safari by Bishop923 · · Score: 1
    <karmawhore type="shameless">
    If you have a Safari account, you can read it Here
    </karmawhore>
    1. Re:Read it on Safari by Dazhel · · Score: 1

      Thanks for the link!
      If I had mod points I'd spend 'em here. I was just thinking that once I finish reading the comments I'll head on over to Safari and see if it's available...

  40. .INI files by Tim+Ward · · Score: 1

    You can represent tree structured data dead easily in .INI files (as long as your API for parsing them can enumerate sections and keys, not just ask for ones by names you already know).

    Actually there's nothing forcing you to stop at trees; you could represent arbitrary directed graphs in .INI files without any trouble, other than that of remembering that your application needs to avoid running round loops forever.
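    A minimal sketch of the graph-in-an-INI idea, assuming Python's configparser. The convention used here (one section per node, a comma-separated "children" key) and the function name reachable are assumptions for the example. The visited set is what stops the traversal from running round loops forever.

```python
import configparser

INI_TEXT = """
[root]
children = a, b

[a]
children = b

[b]
children = root
"""


def reachable(ini_text, start):
    """Return the set of section names reachable from `start`."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited: this is the cycle guard
        seen.add(node)
        raw = parser.get(node, "children", fallback="")
        stack.extend(name.strip() for name in raw.split(",") if name.strip())
    return seen


nodes = reachable(INI_TEXT, "root")
```

    Note that b points back at root, so this is a cyclic directed graph rather than a tree, and the traversal still terminates.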

    1. Re:.INI files by GeckoX · · Score: 1

      Imagine that, you can represent just about anything in a flat text file!

      Now, what do the contents of this ini file mean and how shall I edit it to do what I want it to?

      [fido.ini]

      (Contents don't matter for the point to be made)

      --
      No Comment.
  41. Re:comparing to Scott's famous c++ book!? by ViolentGreen · · Score: 1

    -2???

    I don't understand that either. I don't think it deserves to be modded up but it sure as hell doesn't deserve a -2.

    --
    Not everything is analogous to cars. Car analogies rarely work.
  42. What are you talking about?-Dictionary. by Anonymous Coward · · Score: 0

    Try parsing the compressed form. The redundancy can be used to your advantage.

  43. LOL, Parent is FUNNY by Zo0ok · · Score: 0, Redundant

    Since I am not a moderator today I use the filthy "mod parent up" trick instead. Mod parent up and mod me down!

  44. Re:comparing to Scott's famous c++ book!? by millette · · Score: 1
    somebody's got it against me *looking over shoulder*

    Well at least I know it's not just my imagination... Anyway, this isn't the place to discuss moderation issues I was told. I simply wanted to thank you for understanding my pov, so there, "Thanks VioletGreen" :)

  45. One way to improve it. Don't use it. by wdavies · · Score: 2, Insightful
    Ok, maybe I'm missing a point, but the next time I see an XML file like this...
    <RECORD NAME=".." ADDRESS=".." AGE = "..">
    <RECORD NAME=".." ADDRESS=".." AGE = "..">
    <RECORD NAME=".." ADDRESS=".." AGE = "..">
    <RECORD NAME=".." ADDRESS=".." AGE = "..">
    instead of this
    ..\t..\t..
    ..\t..\t..
    ..\t..\t..
    I am going to go nuts. Yes, XML is an improvement for truly hierarchical or repeating data, but efficient it isn't and a pain in the butt to use with AWK or any one of a million Unix utilities. The one downside I have on ESR's Art of Unix is that while espousing how clean it is with pipes and text, he then starts waxing lyrical about XML... Winton
    1. Re:One way to improve it. Don't use it. by Anonymous Coward · · Score: 2, Insightful

      The whole idea with XML is that it will catch the error when a user or script writes "RECROD" in one place, or forgets a space. Your AWK script will likely just crash without an explanation or miss a record if the user e.g. forgets the carriage return between lines.

      And just assume that six months after releasing your program you realize it would be very useful with an "OCCUPATION" field too. What do you do now? Maintain a separate collection of databases for each generation of your software?

      The XML-database would be guaranteed to be backwards compatible, in contrast to your simple solution.

      The reason why XML is good in "the real world" is quite simple: Programmer time is expensive. Testing is expensive. Compatibility between versions is important, but expensive to maintain. Storage is cheap. CPU-power is cheap.
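      A small sketch of the backwards-compatibility point, assuming Python's xml.etree.ElementTree. The sample records and both reader functions are made up. The "old" XML reader looks fields up by name, so a file that later gains an OCCUPATION attribute still reads correctly, while a positional tab-separated reader silently picks up the wrong column. Whether compatibility is truly "guaranteed" depends on how the reader was written; this only shows the common case.

```python
import xml.etree.ElementTree as ET

V1_XML = '<RECORDS><RECORD NAME="Ann" ADDRESS="1 Elm St" AGE="41"/></RECORDS>'
# Later release: a new attribute appears in the middle of each record.
V2_XML = '<RECORDS><RECORD NAME="Ann" OCCUPATION="Welder" ADDRESS="1 Elm St" AGE="41"/></RECORDS>'

V1_TSV = "Ann\t1 Elm St\t41"
V2_TSV = "Ann\tWelder\t1 Elm St\t41"


def read_xml_ages(xml_text):
    """Old reader: fetches fields by name and ignores attributes it does not know."""
    return [
        (rec.get("NAME"), int(rec.get("AGE")))
        for rec in ET.fromstring(xml_text).iter("RECORD")
    ]


def read_tsv_ages(tsv_text):
    """Old reader: assumes AGE is always the third tab-separated column."""
    return [(cols[0], cols[2]) for cols in (line.split("\t") for line in tsv_text.splitlines())]
```

      Against the version 2 data, read_xml_ages still yields Ann's age, while read_tsv_ages returns her street address in the age slot without raising any error.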

    2. Re:One way to improve it. Don't use it. by JohnnyCannuk · · Score: 2, Interesting

      Well, duh, if you are using XML for non-hierarchical data, then you're using it wrong.

      On the other hand if it looked more like this:

      <Records>
      <RECORD id = .. NAME=".." ADDRESS=".." AGE = ".."/>
      <RECORD id = .. NAME=".." ADDRESS=".." AGE = ".."/>
      <RECORD id = .. NAME=".." ADDRESS=".." AGE = ".."/>
      <RECORD id = .. NAME=".." ADDRESS=".." AGE = ".."/>
      </Records>

      and if the tag was nested in something else, then xml is appropriate.

      At the risk of sounding trite "right tool for the job".

      I am currently working on an EDI application where the highly structured and hierarchical nature of our data makes it perfect for xml. Add in good tools and searching capabilities (Like XSLT for transforming the raw structure to something else or XPath for searching it) and you have a very powerful data exchange that is platform and language neutral.

      But just as you wouldn't use VB to program kernel modules or device drivers, you wouldn't (and shouldn't) use XML for everything, just because it's cool and new.

      I am always amazed by the XML luddites on /. The same folks who insist that obscure languages like Haskell, Dylan or Eiffel are "better than ${your language here}, why doesn't anybody else use it" will still insist on transmitting and storing data in language and platform dependent binary files or in non-self describing data structures such as:

      ..\t..\t..

      ..\t..\t..

      ..\t..\t..

      As for it not being efficient, well that really depends on what you mean by efficient. If you mean that it is slow to read, then you have chosen the wrong parser. Not a fault of the markup itself. Perhaps the design of your document is inefficient. But if you want a way to efficiently exchange self-describing data between applications written on different platforms in different languages, then use XML.

      Or come up with something better.

      --
      Never by hatred has hatred been appeased, only by kindness - the Buddha
    3. Re:One way to improve it. Don't use it. by Anonymous Coward · · Score: 0

      I'm doing work in an industry that makes extremely heavy use of CSV and TSV files, and quite frankly it sucks ass. Virtually every file needs to be manually "munged" before it can be imported into whatever system.

      At least XML specifies how multiple values can be stored, and defines the character set and encoding rules. Those two features alone are worth the bloat.

    4. Re:One way to improve it. Don't use it. by GlassHeart · · Score: 1
      Yes, XML is an improvement for truly hierarchical or repeating data, but efficient it isn't and a pain in the butt to use with AWK or any one of a million Unix utilities.

      Agreed, so design new utilities that understand XML. Take /etc/passwd or any CSV file for example. If you grep it for a user whose first name is "Richard", you'll also find somebody whose last name is "Richards" or lives on "Richard Street". An XML sensitive grep equivalent could understand a command like this:

      xmlgrep "firstname" "Richard"
      and return better results without intimate knowledge of the file format (which column corresponds to which field, etc).
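      A rough sketch of what such an xmlgrep could do, written in Python with xml.etree.ElementTree. No real tool of that name is being described here; the function name, its signature, and the sample document are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

USERS = """
<users>
  <user><firstname>Richard</firstname><lastname>Stone</lastname><street>Oak Avenue</street></user>
  <user><firstname>Alice</firstname><lastname>Richards</lastname><street>Richard Street</street></user>
</users>
"""


def xmlgrep(xml_text, tag, value):
    """Return every element that has a direct child <tag> whose text equals `value`."""
    root = ET.fromstring(xml_text)
    return [
        elem for elem in root.iter()
        if any(child.tag == tag and child.text == value for child in elem)
    ]


# Structure-aware match: only the user whose first name is Richard.
matches = xmlgrep(USERS, "firstname", "Richard")

# Plain text match, grep-style: also hits "Richards" and "Richard Street".
text_hits = [line for line in USERS.splitlines() if "Richard" in line]
```

      The structured search returns one record where the line-based search returns two, and the caller never had to know which column holds the first name.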
  46. Benefits of XML by xswl0931 · · Score: 1

    Portability is just a by-product of being a standard. The real benefit of XML is structure and extensibility. Config and INI files would be more easily parsed if they were in XML. Current parsers may be slow, but that doesn't mean new parsers would not be more performant.

  47. ID and IDREF, meet the previous poster by holygoat · · Score: 2, Informative

    e.g. IBM's take.

    You can link between XML entities quite easily.

    Also consider that RDF, which describes directed graphs, is quite easily expressed in XML; there's nothing to say that you can't describe a graph and reference actual elements with IDREFs. I don't think you've really thought about this.
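    A minimal sketch of describing a graph with ID-style references, assuming Python's xml.etree.ElementTree. ElementTree does not read DTD attribute declarations, so the "id" and "ref" attributes below are treated as IDs and IDREFs purely by convention; the element names and the adjacency function are likewise made up for the example.

```python
import xml.etree.ElementTree as ET

GRAPH = """
<graph>
  <node id="a"><edge ref="b"/></node>
  <node id="b"><edge ref="c"/></node>
  <node id="c"><edge ref="a"/></node>
</graph>
"""


def adjacency(xml_text):
    """Build {node id: [referenced node ids]}, rejecting dangling references."""
    root = ET.fromstring(xml_text)
    by_id = {node.get("id"): node for node in root.iter("node")}

    edges = {}
    for node_id, node in by_id.items():
        targets = [edge.get("ref") for edge in node.iter("edge")]
        dangling = [t for t in targets if t not in by_id]
        if dangling:
            raise ValueError(f"node {node_id!r} references unknown ids: {dangling}")
        edges[node_id] = targets
    return edges


graph = adjacency(GRAPH)
```

    The document itself is still a tree of elements, but the references make a cycle (a to b to c and back to a), so the data it describes is a directed graph.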

  48. speek kills...Line blurring. by Anonymous Coward · · Score: 0

    Don't forget binary data can be tagged as well. Think generator code. When the parser sees a particular tag (warning binary program ahead) it can run the generator and append the results to the tree. Kind of like the power of a function that generates PI vs a full listing of PI.

  49. XML is just tagged s-lists. by BrittPark · · Score: 4, Insightful

    XML is highly overrated and generally over-used. Admittedly XML + CSS is better than HTML, but beyond that its only reasonable use is as a generalized syntax for configuration files, and as such does a good job, or at least I've had success using it that way in the past. Many (if not most) of its other uses are just poor program design. SOAP is an extremely silly idea. Why use XML for a marshalling syntax for RPC? It's slower, bulkier, and just a bad choice in comparison to a binary marshalling mechanism. Now as a syntax for an RPC's IDL, XML makes a lot of sense, but not as a transport.

    Glad to get that off my chest. I have a bitter history with XML. I was the first person at my former company to bring XML in as a uniform configuration file format for our product, but then found myself a couple of years later forced into adding XML specific features to the filesystem that was the core of our company's product. I spent a week thinking about the idea, and concluded that it was a bad one. Thus followed a long (and fruitless) battle with management to scratch the plan. The end result was a technically nifty but useless set of features. The work remains unreleased for lack of customer interest. At least I get a bit of "I told you so." pleasure.

  50. more reviews of this book by zontroll · · Score: 3, Informative

    VeryGeekyBooks has more reviews of this book.

  51. He missed a couple, IMHO by alispguru · · Score: 2, Informative

    And one of them is Just Plain Wrong, also IMHO.

    Here are two heuristics for good XML design that I dearly wish more people would take to heart:

    1. If processing any text field requires parsing, Something Is Wrong, and you probably need to break it apart into more elements/subelements.

    The only exceptions to this rule are fields that are numbers, or maybe date/time stamps that adhere to ISO standards.

    2. If you're using attributes, You'll Wish You Hadn't In The Future.

    Attributes are supposed to be the way XML separates metadata from data. The problem with them is that they are also "leaves" of the XML tree, and intended to be simple, flat text. If you ever need more complex structure in attribute metadata, you're screwed - you must either violate rule 1 above, or move the data out into elements, totally breaking your old structure. Just don't use them, OK?

    --

    To a Lisp hacker, XML is S-expressions in drag.
  52. Same Review on Amazon by Scarpux · · Score: 1

    This is the same review that is on Amazon.com.

    It is the first customer review.

    --
    -- This is not a sig
  53. Sure: by rodentia · · Score: 5, Informative
    <?xml version="1.0" ?>
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
    "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">

    <svg>
    <line x1="50" y1="50" x2="300" y2="300"
    style="stroke:#FF0000;
    stroke-width:4;stroke-opacity:0.3;"/>
    <line x1="50" y1="100" x2="300" y2="350"
    style="stroke:#FF0000;
    stroke-width:4;stroke-opacity:1;"/>
    </svg>
    --
    illegitimii non ingravare
    1. Re:Sure: by Anonymous Coward · · Score: 0

      Very good. But how do you store this?

    2. Re:Sure: by rodentia · · Score: 1

      As a base-64 encoded binary object. Not everything belongs in brackets; I believe that's the point of the book.
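
      For what that looks like in practice, here is a rough Python sketch that wraps arbitrary bytes in base64 inside an element and reads them back. The `blob` element and its `encoding` attribute are hypothetical names chosen for this example, not part of any standard vocabulary.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes(range(256))                      # stand-in for any binary object

# Wrap the bytes as base64 text inside an element (names are illustrative only).
elem = ET.Element("blob", {"encoding": "base64"})
elem.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(elem, encoding="unicode")

# Read it back: parse the XML, then decode the text content.
decoded = base64.b64decode(ET.fromstring(xml_text).text)

print(len(payload), "bytes became", len(elem.text), "characters of base64")
print("round trip ok:", decoded == payload)
```

      The encoded text is roughly a third larger than the raw bytes, which is the price of keeping the document plain text.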

      --
      illegitimii non ingravare
  54. at the root of the problem. by guet · · Score: 1

    uh. caching?

  55. Some, not most by rodentia · · Score: 1

    XPath is not inherently a pig. Many APIs handle XPath with aplomb, usually building an alternative data structure behind the scenes for access. XSL usually wants the whole tree but many implementations optimize this out unless large structures are being reorganized.

    Use the context, Luke.

    --
    illegitimii non ingravare
  56. Well, double dumbass on us.... by vt0asta · · Score: 1
    XML is just text! If the XML parser is slow, write a faster one! Figure out where the bottlenecks are! Don't give me this XML is slow crap. This is slashdot - you're supposed to be a geek. If you don't like XML, fine, but come up with a geeky reason not to like it, not some problem whose solution is just to roll up your sleeves and do some hacking!

    Oy! :')
    XML may not be slow, but when it's used as a network protocol like SOAP and that craziness called XML-RPC, it sure is. An XML parser is unjustifiable overhead to a lot of people no matter how you slice it. Making it suck less by optimizing it more isn't the answer. Geeks also analyze technology based on merit; XML should solve more problems than it creates, agreed?

    --
    No.
  57. userland XML by rodentia · · Score: 1

    Interestingly, XML was originally intended as a userland technology, bringing the strength of SGML to the web, fixing what was broken in HTML (the last great userland data format). The game has lost sight of the goal a bit, I think, which is the root of much of the kvetching this topic generates.

    Frankly, ERH is a great writer and has good insights into the use and abuse of markup. This book is one of the things that was missing while the pro/anti-XML hype trains were picking up steam.

    --
    illegitimii non ingravare
  58. Why should XML be text? by Crayon+Kid · · Score: 1

    That's because people somehow seem hung up on XML having to be text. No, having it gzipped doesn't count, I'm talking about XML at parsing time. Why should XML be text? So humans can write the stuff by hand and read it with ordinary text viewers? What is that about? Wasn't it supposed to be machine readable in the first place? Isn't 99% of its use supposed to be at the hands of automated parsers and middleman tools that take the strain from the human? There's no reason why it couldn't be a nice binary format with all kinds of tricks (standardized ones, mind you) shoved in it to make parsing and modification faster and more efficient.

    --
    i ate crayons when i was a kid and now i have two braincells and the blue ones taste nicer
    1. Re:Why should XML be text? by Trejkaz · · Score: 2, Informative

      From XML 1.0:

      The design goals for XML are:

      1. XML shall be straightforwardly usable over the Internet.
      2. XML shall support a wide variety of applications.
      3. XML shall be compatible with SGML.
      4. It shall be easy to write programs which process XML documents.
      5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
      6. XML documents should be human-legible and reasonably clear.
      7. The XML design should be prepared quickly.
      8. The design of XML shall be formal and concise.
      9. XML documents shall be easy to create.
      10. Terseness in XML markup is of minimal importance.

      I believe you're questioning point 5 while bitching about point 10.

      If you want a binary tree representation, check out ASN.1. It has commonly been used as a binary interchange format for the same sort of data, and XML can be mapped to ASN.1 using a schema and a bit of patience.

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    2. Re:Why should XML be text? by Trejkaz · · Score: 1

      Evidently I mean point 6, not 5, testament to how hard it is to count elements in a tiny textbox. ;-)

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
  • when I was young. . . by rodentia · · Score: 1

    1) Programming languages based on XML

    Yeah, but code *generation* with XML is the cat's pyjamas.

    2) XSLT

    You clearly haven't tried it, or did not use it as intended. Do you have any experience with other functional languages? I work almost exclusively with XSLT at the moment and wouldn't have it any other way.

    3) SOAP

    is butt-stupid, I admit. But hey, ninety-odd percent of the beef this topic has generated can be fixed with a glance at the book being reviewed.

    --
    illegitimii non ingravare
    1. Re:when I was young. . . by pong · · Score: 1
      2) XSLT

      You clearly haven't tried it, or did not use it as intended. Do you have any experience with other functional languages? I work almost exclusively with XSLT at the moment and wouldn't have it any other way

      My beef is not with XSLT as a transformation language, my beef with it is that it uses XML syntax! XML is easy to parse for machines, not for humans. To me it looks like line noise, and I have to mentally filter out a lot of superfluous syntax that only makes things easy if you only know 1s and 0s. XSLT itself is fine, it is just a mess to look at even with proper syntax highlighting. It should have been implemented as a proper transformation language, but the designers got caught up in the XML hype and decided that XML syntax was a meaningful "interface" to present to the XSLT developers/users. Bad call! - My opinion.
    2. Re:when I was young. . . by jnana · · Score: 1
      No. The reason that XSLT is an XML vocabulary is so that it can be generated using the very same tools that you use for generating any other XML -- namely, a parser and a stylesheet processor.

      I routinely use XSLT to generate XML, XHTML, XSLT, Schematron, and an XML-based pipelining language, among other things -- all at runtime. It is extremely well suited for these tasks, and if the angle-bracket-averse folks had their way, I would have to use a different type of parser and transformation engine for parsing, transforming, and creating most of these languages -- including XSLT. If you're only using XSLT occasionally, or only for straight XML->XML/XHTML transformations, then I can understand initial frustration, but when you have used XSLT in more than a couple of different domains, you'll see the advantage of an XML-based serialization, for XSLT itself and also for others such as XSD or RelaxNG (try generating and manipulating DTDs at runtime if you want to see what I mean).

      If you want the best of both worlds, then something like RelaxNG's RNC is a wet dream -- two serializations, one XML, one non-XML. You deal with the compact syntax, and it gets transformed into the XML syntax for the machine.

  • Effective Series was Great by the0ther · · Score: 1

    Hope this title does not dilute the strength of the "Effective" brand. I know that the Scott Meyers book and the Java book they put out were both killer. I'm skeptical that XML can be effective in any fashion. Doubtful that this book will change that opinion of mine.

  • piffle by rodentia · · Score: 2, Interesting

    Bandwidth is an order of magnitude more limiting than tree parsing, egg. That and the facilities the tool vendors decorate their stuff with. Of course it's not free, what is?

    SQLXML and most other value-adds are bull. Your business objects should optimize the hell out of their DB access and return XML. XML is messaging and presentation tier glue. Read the book.

    --
    illegitimii non ingravare
  • funny! by Anonymous Coward · · Score: 0



    Laugh, moderator, laugh!

  • XSLT Rocks! by Anonymous Coward · · Score: 0

    Yes, I've tried XSLT. It's different, and takes some adjustment. And I use it extensively. It's the right tool for the job. Maybe you use Windows, but in Unixland that's the rule of thumb.

    Anyhow, if you really choose to build around XSL there are WYSIWYG XSL template generators, so you can write application logic in your language of choice that spits out XML, and off-load the pretty-print work to a Dreamweaver fanatic (or in this case, XML Spy or your XSL editor of choice).

  • XPipe by Anonymous Coward · · Score: 0

    XPipe is an ambitious project to migrate the usefulness of pipes and text streams to XML. The meat of it is a process to break-up tree transformations into small steps.

  • XML is very fast by Doug+Merritt · · Score: 4, Interesting
    XML is heavy weight ... ...see a huge drop in performance. This is due to the fact that parsing XML blows and eats up copious amounts of CPU and memory.

    That's because everyone uses slow XML parsers. Some years ago at one of the then-top 5 web portals I was unhappy with the standard SAX/DOM parser in use; it was ridiculously slow (and buggy).

    So I wrote a new one. Parsing XML became one hundred fold faster! I timed it quite carefully.

    Other people in this thread are saying "of course XML is slower than binary formats, it's 3 times bigger." But a factor of 3 in performance is nothing, considering some of the advantages.

    A slowdown of 100, on the other hand, is absurd.

    I don't know why people don't rebel against this and make faster XML parsers the widely-used ones; for whatever reason, apparently everyone continues using slow parsers.

    At any rate, no, XML is not slow. It's just a simple, easy to parse format, for which IBM and others have written very, very slow parsers.

    And everyone just assumes that it has to be slow. Sheesh, why should an XML parser be slower than a C++ compiler??? Come on.

    --
    Professional Wild-Eyed Visionary
    1. Re:XML is very fast by Anonymous Coward · · Score: 0
      So I wrote a new one. Parsing XML became one hundred fold faster! I timed it quite carefully.

      that is a pretty bold claim. it's either a lame troll, or you're god. I've worked with several parsers and know the code of a couple different java implementations. the only way I can see a custom parser beating SAX or Microsoft's XMLTextReader is one using binary encoding, or specialized for flat models. There are numerous excellent xml parsers out there. this article by Dennis M. Sosnoski does a pretty good job of comparing parsers. there are also numerous research papers comparing C/C++ parsers to java. None of them showed a 100x difference. If you're not full of hot air, prove it!

    2. Re:XML is very fast by Doug+Merritt · · Score: 1
      that is a pretty bold claim. it's either a lame troll, or you're god.

      I'm not trolling, I'm not exaggerating, and although I'm pretty good at making things run extremely fast, last time I looked I was only a mortal. It's true I fixed someone's bug recently and he did say I was a god, but I'm pretty sure that was just hyperbole. :-)

      I mentioned "several years ago", "SAX/DOM", and IBM -- we were using the SAX/DOM package originally written by IBM (this was at go.com, if that helps), and it appeared to be practically an industry standard at that time, at least from what multiple people told me.

      Yet it was buggy and slow -- which annoyed me at the time, and here people are claiming that whatever XML parsers they are using (most of the comments didn't mention), they're slow today too...that's what I was responding to.

      Nowhere did I claim that I have one that is 100 fold faster than whatever is fastest today; I haven't followed the different XML parsers as they've been introduced. I don't even know what was the fastest available when I did this 5 years ago; after some unsuccessful searching I gave up and wrote a new one, because it's trivial (I'm a compiler guy, among other things).

      The point I'm making is simple: either XML parsers are very fast...which I know is possible, because I created one...and people's complaints are unwarranted, or else they are using XML parsers that are much slower than they should be, and they should switch.

      So if the XMLbench URL you provided leads people to much faster parsers, great. If they don't need to switch, then they don't need to complain, either.

      Again, the point is that XML is not inherently slow. That point stands.

      This is also just common sense. Look at the syntax. It's practically trivial. Related technologies like XPath, XSLT, etc etc may have their own issues, but that's not the same as saying "XML is slow". Again, it's not.

      As I recall I timed it at 3 megabytes of XML parsed per second on a Sparc desktop workstation of whatever model was popular 5 years ago...I forget. It could have been made even faster; I'm not claiming that what I did was as fast as possible. It was enough for it to improve on the IBM DOM/SAX parser.
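
      For anyone who wants a number for their own setup rather than an argument, a rough throughput check might look like the Python sketch below. It is not the parser described here; it times the standard library's expat binding on a synthetic document, and the `make_doc` and `measure` helpers and the `rows`/`row` element names are made up for the example.

```python
import time
import xml.parsers.expat

def make_doc(n_records):
    """Build a synthetic XML document as bytes (hypothetical record layout)."""
    rows = "".join(
        f"<row><id>{i}</id><name>item{i}</name></row>" for i in range(n_records)
    )
    return ("<rows>" + rows + "</rows>").encode("utf-8")

def measure(data):
    """Parse data once; return (start-tag count, megabytes parsed per second)."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    t0 = time.perf_counter()
    parser.Parse(data, True)                 # True marks this as the final chunk
    elapsed = time.perf_counter() - t0
    return count, len(data) / max(elapsed, 1e-9) / 1e6

data = make_doc(20000)
elements, mb_per_s = measure(data)
print(f"{elements} start tags, {mb_per_s:.1f} MB/s")
```

      The figure will vary a lot with hardware and document shape, so it is only useful for comparing parsers on the same machine and the same input.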

      P.S. Speeding things up 100-fold (when they're dreadfully slow to start with) is not uncommon for programmers who specialize in such things -- usually by changing the underlying algorithm e.g. from quadratic to linear, sometimes by avoiding virtual memory thrashing, sometimes just by vastly simplifying crufty spaghetti code. Other times it helps to understand cpu architecture, assembly language, compilers, operating systems...many purely application-level programmers don't, and hence don't see where the code they're writing has inefficiencies.

      On the other hand, sometimes it's impossible to speed things up...if they were written by a guru to start with, then there's nothing left to improve.

      P.P.S. The post I'm responding to suggested looking at XMLbench. A glance at the first page makes it look like a reasonable starting place.

      --
      Professional Wild-Eyed Visionary
    3. Re:XML is very fast by ErikZ · · Score: 1

      "I don't know why people don't rebel against this and make faster XML parsers the widely-used ones; for whatever reason, apparently everyone continues using slow parsers."

      Maybe it's because there are people out there who've written parsers that are "100 fold faster" and don't make them available.

      Or tell people how to write one.

      Or even point to a faster one.

      --
      Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
    4. Re:XML is very fast by Trejkaz · · Score: 1

      Speaking from the world of Java, the fastest parsers (MXP1 and Piccolo) are only around twice the speed of the most commonly used parsers (Xerces and Crimson.) Presumably the ratio of performance would extend to C if MXP1 were reimplemented there, and anyone who does witness "100-fold" performance may well be smoking crack. :-)

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    5. Re:XML is very fast by Doug+Merritt · · Score: 1
      "I don't know why people don't rebel against this and make faster XML parsers the widely-used ones; for whatever reason, apparently everyone continues using slow parsers."

      Maybe it's because there are people out there who've written parsers that are "100 fold faster" and don't make them available.

      Well said. But I didn't have a choice, management was going through a freakish anti-open-source phase right then, and I was just a contractor with no bargaining power.

      Also, from my point of view, XML parsers aren't rocket science, so until today's discussion I didn't think it was that big of a deal. (I'm still not sure that it is.)

      P.S. I do try sometimes. I donated the first version of the C library (the core of it...string funcs, printf, etc) to Stallman back in the 80s. He threw the entire thing away and had someone else write a new one, and told me only after the fact that he did so -- all because he thought that my decimal-ascii to integer conversion (in printf/scanf) was too slow. I could've fixed it. Annoying.

      Linus has ignored some of my kernel submissions too; he's famous for that. But there are more hurdles to contributing free source than people generally acknowledge.

      --
      Professional Wild-Eyed Visionary
    6. Re:XML is very fast by Anonymous Coward · · Score: 0
      Nowhere did I claim that I have one that is 100 fold faster than whatever is fastest today; I haven't followed the different XML parsers as they've been introduced. I don't even know what was the fastest available when I did this 5 years ago; after some unsuccessful searching I gave up and wrote a new one, because it's trivial (I'm a compiler guy, among other things).

      that's the catch, right? It may be 100x faster than the parsers of 5 years ago, but most of the current parsers are pretty optimized. As far as I know XML Pull parser and ElectricXML are the fastest stream-based parsers today. XPP and ElectricXML should provide comparable performance to a compiler-based approach. I've thought about writing my own XML parser using a stack based approach, but looking at the performance of current parsers, squeezing 2-3x more performance is going to be tough. I'm guessing XPP and ElectricXML are around 100x faster than xml parsers from 98/99. But those parsers were first generation, so it's not a fair comparison. Life isn't fair :)

      I've written custom parsers in the past to process hundreds of megs of log files in CSV format. Using a stream based approach, I was able to parse over 500MB of logs in about 3 minutes, which is comparable to 3MB of XML/second. I've looked at the design and implementation of XPP and it's pretty darn optimized already. XPP v2 is probably one of the fastest parsers out there today, including Microsoft's XMLTextReader. If someone could get another 100x improvement over current parsers, that would be truly impressive.

    7. Re:XML is very fast by Doug+Merritt · · Score: 1
      Ok, cool, so you've looked at the offerings, and you've looked at the source, and at the timings, and you have some interest, apparently, in the fastest possible XML parsing.

      So, what would you say? I know for some applications you'd want XML to be as fast as possible -- but would you call it downright slow, the way so many others commenting on this article have?

      I forgot to mention that I also experimented with a binary version of the XML format, and although of course it was faster, it was nothing like 10 times as fast.

      It seems to me that a lot of people blame XML for being "too slow" without adequately investigating the subject. Some database project uses XML, and it was slow, so it must be XML's fault. That sort of thing.

      I'm not an XML fanatic -- it's just a data exchange format -- but I like the fact that it's human readable. Most of the alternatives people suggest are not.

      --
      Professional Wild-Eyed Visionary
    8. Re:XML is very fast by Anonymous Coward · · Score: 0

      As you suggested, I looked at the link. According to this graph the current crop of parsers can handle 2.9 megs in about a second also, so the data would suggest the current crop of parsers are comparable to the results you stated. Since you seem to know much more about parsers than I do: just as a theoretical exercise, would it be possible for someone to improve the performance another 100x?

    9. Re:XML is very fast by Doug+Merritt · · Score: 1
      [ given the highly optimized current crop of parsers...] would it be possible for someone to improve the performance another 100x?

      If you're talking about XML parsers written in Java and then interpreted rather than compiled into native machine code, maybe, by rewriting in C and then using a slew of optimization techniques on top of that. Maybe. Probably not, even then; if it's using certain inefficient Java constructs, maybe 10 fold to 30 fold faster. 100 fold only if they weren't actually very optimized.

      If it's written in extremely optimized Java compiled to native machine code, it'll still typically be faster yet if it's rewritten well in C, no matter what Sun Microsystems' propaganda says, but not always by very much, and sometimes not at all. Java has certain language features that tend to impose at least a little bit of overhead compared with C, and sometimes a lot of overhead. But not 100-fold kinds of overhead, and maybe only 20%...it varies.

      But if these are already written in C and highly optimized, then I strongly doubt it (except of course by getting faster hardware). In the version I did, I could've made it somewhat faster...optimizing more for the most common cases might have bought a factor of two, with luck. Probably not, but maybe.

      Paying very close attention to fitting the core of it into level 1 cache, using exactly 100% of available registers even if it meant less clear code, trying very hard to avoid branches, and then trying even harder (e.g. multiplying by zero rather than branching, or the equivalent, depending on the cpu under test), and then recoding the innermost loop in assembler language... Typically such things give at least another 30%, and sometimes you can even get five fold... but the level 1 cache isn't all that small and the code isn't all that big...

      I'd actually be shocked if a month's work by a world class guru could make it even 10 times faster. A 100 fold speedup seems exceedingly unlikely; I'd bet serious money against it (but only after looking at the source code of the parser in question first :-)

      Once something has been highly optimized by one competent group of people, a second group of people might have new bright ideas, but they are usually more like 20% faster kinds of ideas, not 100 fold.

      Unless an entirely new algorithm is invented, but I don't see how that would apply here.

      Basically it's very hard to make things fast, and most of the time, a 20% increase in speed is doing very well indeed...unless something had unusual problems to start with -- which paradoxically has always been moderately common, and is even more so now that machines have gotten so fast, thanks to Moore's law. People often pessimize rather than optimize, these days.

      --
      Professional Wild-Eyed Visionary
  • XML is still bigger than JSON :) by Ronin+SpoilSpot · · Score: 1

    JSON - JavaScript Object Notation
    http://www.json.org/

    Enjoy! /RS

  • utopia by 10bt · · Score: 1

    read this book with water and everything in your life will be solved.

  • xml or perl by Anonymous Coward · · Score: 1, Interesting

    Recently I was developing a pseudo file system and was using xml to store the metadata (ie date, name, link references, permissions, etc.). The chief advantage of using xml was that the data files were text and could be readily edited and read. However they needed to be accessed often and performance was a dog. My boss saw what I was doing and recommended I use perl syntax to represent the hierarchical data and use Data::Dumper and Safe::rdo. I did and performance improved several times while still retaining the advantages of text. For example (using a nominal order record) instead of

    <order>
    <customer>
    <name>
    <fname>Bill</fname>
    <lname>Brune</lname>
    ...
    </name>
    </customer>
    <manifest>
    <item>
    <id>209</id>
    <title>Grapes of Wrath</title>
    <qnt>1</qnt>
    <unit_price>$10.75</unit_price>
    </item>
    ...

    would look something like ( compacted to avoid the lameness filter).

    order => {
    customer => {
    fname=>'Bill',
    lname=>'Brune',
    ...
    },
    manifest => [ { id=>209,
    title=>'Grapes of Wrath',
    qnt=>1,
    unit_price=>'$10.75'
    },
    {
    ...
    }

    The added advantage is that you can also add code, such as

    { 'timestamp'=> scalar localtime,
    'pid'=> getppid,
    ...
    }
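
    The same trade-off can be sketched outside Perl. The Python below is a swapped-in analogue, not the Data::Dumper/Safe setup described above: it writes a nested record as native literal syntax with `pprint` and reads it back with `ast.literal_eval`, which accepts literals only and never executes code. The `order` record mirrors the nominal example; everything else is an assumption for illustration.

```python
import ast
import pprint

# The nominal order record from above, as a native nested structure.
order = {
    "customer": {"fname": "Bill", "lname": "Brune"},
    "manifest": [
        {"id": 209, "title": "Grapes of Wrath", "qnt": 1, "unit_price": "$10.75"},
    ],
}

text = pprint.pformat(order)        # human-readable, hand-editable text
restored = ast.literal_eval(text)   # parses literals only; embedded code is rejected

print(text)
print("round trip ok:", restored == order)
```

    Because `literal_eval` refuses anything but literals, this variant gives up the embedded-code trick (timestamps, pids) in exchange for not having to trust the file.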

  • Re:Server load could be at the root of XML's probl by anarxia · · Score: 1

    Actually server load is the reason I moved to xml. I generate the site with xslt stylesheets and I serve static pages that are updated with a simple 'make'. I get the benefit of custom tags, automatic rss feeds and more, while the server serves static pages (so the users get the pages fast).

    Even if you serve your pages as xml and xsl, your bandwidth usage will decrease or increase depending on how well you designed it and the number of pages you serve. In most cases xml pages will be shorter because you will not need to include boilerplate code in every page (menus etc) and the xsl stylesheet will be cached on the user side so you will not need to serve it very often.

    The biggest benefit is that you separate presentation from content so you can change your layout dramatically and you don't need to update a single page by hand.

  • What are you talking about?-If I'm Lion I'm dying. by Anonymous Coward · · Score: 1, Interesting

    <svg width="160" height="160" stroke="none">

    <polygon fill="#f2cc99" points="28,7 33,3 40,1 47,2 54,5 60,8 62,5 66,4 71,5 73,11 72,20 66,36 62,43 62,46 60,48 56,51 56,54 62,82 63,100 50,137 53,143 51,150 33,150 30,147 27,140 24,140 21,148 2,148 1,144 2,142 5,137 6,128 2,103 2,98 3,87 4,72 10,51 17,37 13,31 12,28 10,27 6,20 7,14 7,9 12,5 16,3 21,3 25,5 28,7 28,7 28,7"/>

    <polygon fill="#e5b27f" points="57,32 54,30 55,33 53,31 53,34 51,31 51,34 50,32 50,35 48,33 48,36 50,40 50,38 51,40 51,38 52,39 53,37 54,39 54,37 55,39 56,38 56,39 57,38 58,34 57,32 57,32 57,32"/>

    <polygon fill="#eb8080" points="51,40 53,40 55,40 58,40 57,42 54,44 51,40 51,40 51,40"/>

    <polygon fill="#f2cc99" points="71,92 63,99 56,118 50,140 55,142 63,143 73,137 85,133 94,115 94,104 91,101 85,100 75,100 71,92 71,92 71,92"/>

    <polygon fill="#9c826b" points="22,92 19,96 19,100 23,112 25,130 28,135 32,126 30,128 32,124 33,120 30,123 32,119 29,121 30,118 28,119 30,117 28,117 30,114 31,111 28,111 30,110 27,109 28,107 26,107 27,104 24,106 25,104 26,101 23,103 24,100 22,102 22,99 24,95 22,96 23,94 22,94 22,92 22,92 22,92"/>

    <polygon fill="#9c826b" points="30,145 32,147 32,147 34,145 36,145 37,148 38,149 40,149 43,144 44,148 45,149 46,148 48,143 49,145 49,148 50,148 52,147 53,143 54,144 52,150 51,151 38,151 34,150 30,148 30,145 30,145 30,145"/>

    <polygon fill="#9c826b" points="85,100 88,100 91,103 94,108 94,115 90,122 82,133 71,137 68,141 63,143 66,141 67,138 67,136 66,133 62,131 62,129 64,128 66,126 68,126 67,125 68,125 67,123 69,124 68,122 71,122 70,123 71,124 70,124 70,126 68,126 70,128 67,128 67,129 70,131 72,133 73,130 74,133 76,129 76,131 78,128 78,130 80,126 80,128 82,125 82,126 83,124 84,122 88,119 90,115 92,112 91,106 90,104 87,101 85,100 85,100 85,100"/>

    <polygon fill="#9c826b" points="60,82 60,95 60,101 56,107 51,113 48,120 52,120 50,125 47,130 46,135 48,138 53,141 53,136 55,133 58,132 62,131 61,128 61,116 63,108 68,104 71,111 77,100 70,86 60,82 60,82 60,82"/>

    <polygon fill="#9c826b" points="31,51 36,57 38,62 43,66 50,67 56,70 60,82 61,76 56,56 48,59 40,54 31,51 31,51 31,51"/>

    <polygon fill="#9c826b" points="8,23 14,25 15,27 13,28 17,30 16,32 19,32 22,33 18,38 14,32 13,29 10,26 8,23 8,23 8,23"/>

    <polygon fill="#9c826b" points="28,14 27,14 26,11 24,10 22,7 19,7 16,9 12,10 11,12 12,16 15,18 12,18 14,22 16,24 16,28 20,28 22,28 22,23 27,21 30,17 30,16 27,18 28,14 28,14 28,14"/>

    <polygon fill="#9c826b" points="56,30 56,33 57,36 58,42 59,42 62,42 62,34 63,31 62,29 60,31 58,31 56,30 56,30 56,30"/>

    <polygon fill="#9c826b" points="42,18 41,21 43,23 44,25 45,22 42,18 42,18 42,18"/>

    <polygon fill="#9c826b" points="56,19 56,22 58,23 56,25 55,26 54,24 55,21 56,19 56,19 56,19"/>

    <polygon fill="#9c826b" points="39,54 42,52 42,54 43,53 43,54 45,54 45,55 46,54 46,56 48,56 50,56 51,56 53,55 56,53 56,56 50,58 42,58 39,54 39,54 39,54"/>

    <polygon fill="#9c826b" points="39,46 41,48 41,46 44,47 46,47 49,46 51,43 54,44 57,43 56,46 58,47 60,48 58,50 56,50 51,48 45,50 40,50 39,46 39,46 39,46"/>

    <polygon fill="#9c826b" points="59,13 61,14 63,14 61,12 64,12 62,11 64,11 64,10 65,10 65,8 66,9 68,9 67,7 69,8 70,7 70,9 70,9 71,11 71,13 70,15 70,16 70,18 68,20 67,21 66,23 64,27 62,28 62,24 60,20 58,17 58,14 59,13 59,13 59,13"/>

    <polygon fill="#9c826b" points="34,29 36,30 37,30 40,30 42,30 41,32 38,32 35,30 34,29 34,29 34,29"/>

    <polygon fill="#9c826b" points="34,86 32,88 30,93 33,90 31,96 33,94 31,98 32,97 32,102 34,100 34,107 35,102 36,108 36,103 38,108 37,102 38,100 37,101 37,97 36,101 36,96 34,100 35,94 33,98 35,92 33,92 36,88 34,88 34,86 34,86 34,86"/>

    <polygon fill="#ffcc7f" points="37,27 38,29 40,29 42,29 43,26 42,25 40,25 37,27 37,27 37,27"/>

    <polygon fill="#ffcc7f" points="58,26 57,27 57,29 58,30 60,29 62,26 60,25 58,26 58,26 58,26"/>

  • Alternatives to XML by Anonymous Coward · · Score: 0
  • Some people are right by Anonymous Coward · · Score: 0

    XML is a programming language. It is a database. And sometimes it is both at the same time. But it is very rarely ever the best tool for the job.

  • library-Patrons by Anonymous Coward · · Score: 0

    Actually our library has patron-only online access to various databases and books (kind of like Safari). Very nice, and of course we have the usual main-library databases as well.

  • Re:Server load could be at the root of XML's probl by semios · · Score: 1

    I did the same thing for myself. Why pay the cost of dynamic pages if they are static to the server?

    Mine has a makefile at its heart too. Makes me feel all fuzzy.

  • Binary XML? by Trejkaz · · Score: 1

    I guess there is always WBXML. There's a group in the W3C already working on the problem outside of the WAP arena, and then there are ASN.1 mappings such as the ones Sun demonstrated in Fast Web Services.

    --
    Karma: It's all a bunch of tree-huggin' hippy crap!
  • :-O by Trejkaz · · Score: 1

    Did you just use "RDF" and "easily" in the same sentence?

    --
    Karma: It's all a bunch of tree-huggin' hippy crap!
  • Alternatives to XSLT? by Trejkaz · · Score: 1

    Okay, so if XSLT is bad, what is better? Writing different application code for every single language that ever needs to convert XML into another format?

    I'll agree that for some things (such as styling web pages), SiteMesh, PHP-Mesh or whatever might be better, but for converting plain XML into XHTML, you can't beat a stylesheet (for the purposes of this statement, CSS does not yet qualify as a stylesheet).

    --
    Karma: It's all a bunch of tree-huggin' hippy crap!
  • Cool by Trejkaz · · Score: 1

    Cool, let's all adopt that, so we can put viruses in web pages more easily. :-D

    --
    Karma: It's all a bunch of tree-huggin' hippy crap!
    1. Re:Cool by Anonymous Coward · · Score: 0

      Check out Safe.pm

  • Why is plain old text not just plain old text? by j3110 · · Score: 1

    This is posted as "code" because obviously XML is not just plain old text. What's the HTML Formatted option for?!? Extrans wouldn't work because /. likes to insert spaces randomly for good measure.

    If only there were a native binary format, we wouldn't have this problem to begin with.

    Is it too much to ask from the W3C for a binary encoded xml format? Maybe to make my point I'll start using one character tags in UTF-8 encoding. Put the popular tags in using 8 bits, and the unpopular tags in as 16 bits, then I'll just do xslt when anyone wants a copy. :) Also, by all means never put unneeded whitespace.

    I bet that would shave a good 20-30% off file size and parse time. Too bad you wouldn't be able to read it because unicode a's are different than normal a's but they would look the same.
    <z><y><x>Title</x></y><w>Hello World!</w></z>
    is much smaller than
    <html>
    <head>
    <title>Title</title>
    </head>
    <body>
    Hello World!
    </body>
    </html>
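
    A minimal sketch of how the size claim could be checked, assuming both documents carry the text "Hello World!" and are stored as UTF-8. The helper names byte_size and saving are made up for illustration. Much of the difference comes from the dropped newlines as well as the shorter tag names.

```python
# Compare the UTF-8 byte size of the one-letter-tag document
# with the conventional, line-broken HTML version.
short_doc = "<z><y><x>Title</x></y><w>Hello World!</w></z>"

long_doc = "\n".join([
    "<html>",
    "<head>",
    "<title>Title</title>",
    "</head>",
    "<body>",
    "Hello World!",
    "</body>",
    "</html>",
])


def byte_size(doc: str) -> int:
    """Return the number of bytes the document occupies as UTF-8."""
    return len(doc.encode("utf-8"))


def saving(short: str, long: str) -> float:
    """Fraction of bytes saved by the short form relative to the long form."""
    return 1 - byte_size(short) / byte_size(long)


if __name__ == "__main__":
    print(byte_size(short_doc), byte_size(long_doc))
    print(f"saving: {saving(short_doc, long_doc):.0%}")
```

    On a document this tiny the markup dominates, so the saving looks large. On documents with more text than tags the gain from short tag names shrinks accordingly.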

    Now if you really want to be naughty to improve performance of parsing you can require tags that give the offset of other tags.
    <z><a>026</a><a>040</a><b><c a="a"/></b><e a="g"/></z>

    That way you can tell where tags begin without parsing the entire file if all you want is just one little piece.
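
    A rough sketch of the offset idea, assuming the offsets are kept in a separate list rather than in tags inside the document as suggested above. The record format and the names build_offsets and fetch are hypothetical. The naive byte scan would be fooled by "<item " appearing inside comments or CDATA, so this is an illustration and not a parser.

```python
from __future__ import annotations

# Hypothetical payload: five small records wrapped in one root element.
records = [f'<item id="{i}">value {i}</item>' for i in range(5)]
payload = ("<items>" + "".join(records) + "</items>").encode("utf-8")


def build_offsets(data: bytes, open_tag: bytes = b"<item ") -> list[int]:
    """Scan the bytes once and remember where every record starts."""
    offsets = []
    pos = data.find(open_tag)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(open_tag, pos + 1)
    return offsets


def fetch(data: bytes, offsets: list[int], index: int,
          close_tag: bytes = b"</item>") -> str:
    """Jump straight to one record using its stored byte offset."""
    start = offsets[index]
    end = data.index(close_tag, start) + len(close_tag)
    return data[start:end].decode("utf-8")


if __name__ == "__main__":
    index = build_offsets(payload)
    print(fetch(payload, index, 3))
```

    The index costs one full scan to build, so it only pays off when the same file is read piecemeal many times.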

    I don't know if I can be annoying enough myself to actually get someone to make a binary XML counterpart standard, but I'm sure plenty of /.'ers can come up with some neat ideas. You'd be amazed at what all you can actually call XML. :)

    --
    Karma Clown
  • i third. But... by nazsco · · Score: 1

    I agree on all that... but what if your program already has to work with text data 99.999% of the time?

    Imagine some content management system (like Slashdot, blog tools, and such). The only places where you are going to store numbers are dates and user IDs. All the rest would be text, text and some more text.

    Well, you still have the performance problem compared with a proprietary parser written specifically for your data... but I think the benefits easily outweigh this, since you don't have to rewrite the parser (maybe you can tweak it later, while the data volume is increasing and performance WILL become a problem) and you can easily port your data (assuming other systems also thought XML was useful, hehe).

    Repeating what's already been said a zillion times in this thread, it's all a matter of the right tool for the job.

  • R is for wRong answeR by jefu · · Score: 1
    Then it's the wrong tool.

    Nonsense.

    Take a look at the xNL standard for specifying names. It's not all that obvious or simple and I even wonder if it is complete - in particular if "MiddleName" might need an "order" attribute in order to specify print order (see below). And while there is no "nee" name specified (for maiden names), there is a "type" specifier for middle name that probably works. Further, there doesn't seem to be any way to define a date range for the timespan in which the name is applicable (though I suspect that was probably considered and moved out into the broader group of DTDs/schema that now encompass xNL).

    As an (extreme but notable) example, Prince Charles has as a full name "Charles Philip Arthur George Windsor". And I suspect the order of the middle names matters. He is titled "Prince Charles, the Prince of Wales". He is also Earl of Chester, Duke of Cornwall, Duke of Rothesay, Earl of Carrick, Baron Renfrew, Lord of the Isles and Prince and Great Steward of Scotland. He also has the rank of Captain (Royal Navy) and Group Captain (RAF).
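
    To make the "order" attribute idea concrete, here is a small sketch using Python's standard xml.etree.ElementTree. The element and attribute names (person-name, given, middle, family, order) are invented for illustration and are not taken from xNL.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: middle names stored out of order,
# each carrying an explicit print position.
NAME_XML = """
<person-name>
  <given>Charles</given>
  <middle order="2">Arthur</middle>
  <middle order="1">Philip</middle>
  <middle order="3">George</middle>
  <family>Windsor</family>
</person-name>
"""


def full_name(xml_text: str) -> str:
    """Assemble the printed name, sorting middle names by their order attribute."""
    root = ET.fromstring(xml_text.strip())
    middles = sorted(root.findall("middle"), key=lambda m: int(m.get("order")))
    parts = [root.findtext("given")]
    parts += [m.text for m in middles]
    parts.append(root.findtext("family"))
    return " ".join(parts)


if __name__ == "__main__":
    print(full_name(NAME_XML))
```

    Document order could carry the same information without an attribute, but an explicit attribute survives tools that reorder or regroup child elements.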

  • Then lets make a comma-delimited standard by Tablizer · · Score: 1

    XML is standard. It can fit almost any type of data

    Well, then why not make a comma-delimited standard or a relational text standard? Relational can also fit almost any kind of data (in theory).

    http://www.c2.com/cgi/wiki?RelationalAlternativeToXml
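
    One way to picture a "relational text" form of the same data: flatten an XML tree into two comma-delimited tables, one for nodes and one for attributes. This is only a sketch under assumed names (SAMPLE, flatten, to_csv, and the column layout are all hypothetical), not a proposed standard and not what the linked page specifies.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = (
    "<order id='7'><customer>Ann</customer>"
    "<line sku='A1' qty='2'/><line sku='B9' qty='1'/></order>"
)


def flatten(xml_text):
    """Return (nodes, attrs) rows: nodes link to parents by id, attrs link to nodes."""
    root = ET.fromstring(xml_text)
    nodes, attrs = [], []
    counter = 0
    stack = [(root, None)]
    while stack:
        elem, parent_id = stack.pop()
        counter += 1
        node_id = counter
        nodes.append((node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for name, value in elem.attrib.items():
            attrs.append((node_id, name, value))
        # Push children reversed so they are popped in document order.
        for child in reversed(list(elem)):
            stack.append((child, node_id))
    return nodes, attrs


def to_csv(rows, header):
    """Serialize rows as comma-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    nodes, attrs = flatten(SAMPLE)
    print(to_csv(nodes, ["node_id", "parent_id", "tag", "text"]))
    print(to_csv(attrs, ["node_id", "name", "value"]))
```

    The parent_id column is what carries the nesting, so the hierarchy is recoverable with a join, at the cost of being harder to read by eye than the XML.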

  • Re:X is for the Xtensions, M is for the Metadata.. by Tablizer · · Score: 1

    But using XML correctly is tough. I've written and discarded more DTDs and schemata than I care to admit because they were seriously flawed. Getting it right is important and very, very hard.

    Call me a "relational troll" (on second thought, don't), but I think part of the problem is that XML designs tend not to follow relational rules. Relational rules and normalization are fairly well agreed-on and have a lot of experience, history, and some mathematical concepts to back them. Relational normalization is mostly about not repeating information that does not need to be repeated, and not hard-wiring your schema to fit one application/user at the expense of another. True, existing RDBMS implementations don't support dynamic columns very often, but relational theory does not exclude such.

  • Re:5 years in the business... WHERE??? by jnana · · Score: 1
    XSLT is that XML-based programming language, and yes, it's great for generating XML or XSLT or any other language that has an XML serialization.

    People complain about XSLT for the same reason that procedural programmers complain "Lisp sucks" or "OOP sucks" or whatever: laziness and aversion to novelty. XSLT is a great declarative (functional {if you're willing to go through contortions}) language that (in combination with XPath, and other X-technologies) is extremely well suited for manipulating XML. That's it! But isn't that enough?

  • Here's another one: by Anonymous Coward · · Score: 0

    http://www.dublincore.org/

    It seems that over the last 3 years a lot of DTDs were created, it's just that few people want to follow XYZ if it doesn't have W. It's always more fun to create one's own, but this always creates problems in the wider world. In fact, I believe most developers are a bit wary of implementing another's XML DTD/tags/attributes/schema etc. because it is not yet HTML 3.2/4.0 or it's not HTML - 1996 yet.

    I've seen a very large company that I worked at try to develop their own special Schema/DTD for making the coolest content management system the world's ever seen, yet they didn't have a clue that the Dublin Core/IMSProject.org may have broken some practical ground. It was more like "what are you talking about?", "What's that all about and why is it better than our homegrown stuff", "We can't control it". Never trust standards to those with extreme individual ambitions.

  • You can tag it with this DTD -Digital Repository by Anonymous Coward · · Score: 0

    http://www.imsglobal.org/digitalrepositories/driv1p0/imsdri_bindv1p0.html

    or www.dublincore.org

    All text is eventually used for learning at some time.

    "Although Z39.50 was developed by the library community to allow searching of bibliographic information and the development of client software that, theoretically, can search any library's catalog, the protocol's extension mechanisms have allowed other communities to take advantage of the features of Z39.50. The definition of bibliographic searching has been extended to include the Dublin Core. Community of interest profiles have been defined for information as diverse as cultural heritage:

    Computer Interchange of Museum Information (CIMI): http://www.cimi.org/
    government and community information: The Government Information Locator Service (GILS) Profile (http://www.gils.net/)
    and GeoSpatial Data: http://www.blueangeltech.com/Standards/GeoProfile/geo22.htm "

    About IMS
    The IMS Global Learning Consortium develops and promotes the adoption of open technical specifications for interoperable learning technology. Several IMS specifications have become worldwide de facto standards for delivering learning products and services. IMS specifications and related publications are made available to the public at no charge from www.imsglobal.org. No fee is required to implement the specifications.

    IMS is a worldwide non-profit organization that includes more than 50 Contributing Members and affiliates. These members come from every sector of the global e-learning community. They include hardware and software vendors, educational institutions, publishers, government agencies, systems integrators, multimedia content providers, and other consortia. The Consortium provides a neutral forum in which members with competing business interests and different decision-making criteria collaborate to satisfy real-world requirements for interoperability and re-use.

    ####

    For more information contact

    Marketing, marketing@imsglobal.org
    http://www.imsglobal.org

  • Re:5 years in the business... WHERE??? by akuzi · · Score: 1

    > XSLT providing tremendous advantages in
    > transforming data for a variety of other purposes (you
    > probably hated lisp/scheme based language, too).

    Gah. I'm tired of people comparing XSLT to Lisp or Scheme. Okay, XSLT can transform and generate itself just like Lisp, but that's where the similarities end. In almost every other design aspect it is the opposite of Lisp.

    XSLT is an incredibly baroque, verbose language, only useful for a very limited set of trivial XML transformations (i.e., surprise, style sheets!) that involve no I/O or complex computations. If you do a lot of this, then maybe it is worth learning, but my experience is that you can hit its limits very quickly.

    Lisp, on the other hand, is an incredibly elegant, compact and powerful general-purpose language that has been used in almost every application domain imaginable. Maybe the most elegant, clear and powerful single programming language ever invented, where very complex functionality can often be written in an amazingly small amount of understandable code.

  • OT: Perfect Circle by matrix0f8h · · Score: 1

    Your sig rocks.

    I've been listening to that album for the past week nonstop and I think my brain is changing.

  • Written by Dian Fossey? by ader · · Score: 1
    > On these first pages the author started earning my trust and admiration

    ...Next thing I know, I was shot with a tranquiliser dart and woke up on a table with a gloved finger exploring my rectum.

    Ade_
    /

    --
    Big Bubbles (no troubles) - what sucks, who sucks and you suck
  • not a prog. lang.? what about... by Anonymous Coward · · Score: 0

    the infinite loops that xml handles so well!?

  • Several chapters are online by elharo · · Score: 4, Informative

    Nice review. Thanks! It's interesting how many of the comments here relate directly to chapters in the book. For instance, there's a lot of concern about XML's perceived verboseness. This is addressed directly in Item 50, Compress if space is a problem. This chapter and ten others are online at http://www.cafeconleche.org/books/effectivexml/ . Check it out.
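
    For a rough feel of what "compress if space is a problem" buys, here is a minimal sketch using Python's standard gzip module. It is not taken from the book. The record layout and the names build_doc and compression_ratio are invented, and real documents with less repetition will compress less well.

```python
import gzip


def build_doc(n: int) -> bytes:
    """Build a deliberately verbose, repetitive XML document with n records."""
    rows = "".join(
        f"<record><identifier>{i}</identifier>"
        f"<description>item {i}</description></record>"
        for i in range(n)
    )
    return f"<records>{rows}</records>".encode("utf-8")


def compression_ratio(raw: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    raw = build_doc(1000)
    packed = gzip.compress(raw)
    # Decompression restores the exact original bytes.
    assert gzip.decompress(packed) == raw
    print(len(raw), len(packed), f"{compression_ratio(raw):.1%}")
```

    Repeated tag names are exactly the kind of redundancy a general-purpose compressor removes, which is why verbose markup tends to cost less on disk or on the wire than its raw size suggests.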

  • Intuition and Binary XML by jlusk4 · · Score: 1

    There have been a lot of comments on performance and the possibility of binary formats. A little googling turned this up:

    http://www.xml.com/pub/a/2001/04/18/binaryXML.html

    Summary: you would *think* binary would be a performance boost, but that doesn't seem to be the case.

    John.

  • Re:Here's a link to get the book for under $20 by Anonymous Coward · · Score: 0

    Nice redirect.

  • Re:Sick of XML? by Anonymous Coward · · Score: 0

    There are some XML alternatives