Slashdot Mirror


Using XML in Performance Sensitive Apps?

A Parser's Baggage queries: "For the last couple of years I've been working with XML-based protocols, and one thing that keeps coming up is the amount of CPU power needed to handle 10, 20, 30 or 40 concurrent requests. I've run benchmarks on both Java and C#, and my results show that on a 2GHz CPU, the upper boundary for concurrent clients is around 20, regardless of the platform. How have other developers dealt with these issues, and what kinds of arguments do you use to make the performance concerns known to the execs? I'm in favor of using XML for its flexibility, but for performance-sensitive applications, the weight is simply too big. This is especially true when some executive expects and demands that it handle 1000 requests/second on a 1 or 2 CPU server. Things like stream/pull parsers help for SOAP, but when you're reading and using the entire message, pull parsing doesn't buy you any advantages."

17 of 97 comments

  1. XML is just hard to parse by PD · · Score: 2, Insightful

    It's hard to parse. That takes cycles. You can probably tweak the parsing to make it faster, but that might not get you from 20 concurrent to 2000 concurrent.

    You've got two choices. More processors, which are pretty cheap right now; or a simpler and more specialized language to replace XML.

    1. Re:XML is just hard to parse by clintp · · Score: 4, Insightful
      In my experience XML isn't hard to parse at all. Basically, you just have to recognize tags (basic regexp) and match opening ones and closing ones (use a stack, Luke).
      SHHH! Don't say that too loudly!

      The XML Police that exist in several communities will come down on you like flies on manure. "You can't parse XML in regexps! That's not really parsing! You need to use the standard-flavor-of-the-month XML libraries for your language (which of course, may need dozens of prerequisite libraries)! What about CDATA? DTDs?! Encodings!? OH THINK OF THE CHILDREN!"

      <stage_whisper>But in my experience, most of the time, you're right</stage_whisper>

      --
      Get off my lawn.
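
      For anyone wondering what "recognize tags with a regexp and match them with a stack" looks like in practice, here is a minimal sketch. The names TAG and tags_balanced are made up for illustration, and the pattern deliberately ignores comments, CDATA, processing instructions, DTDs and encodings, which is exactly what the XML Police complain about.

```python
import re

# Matches <name ...>, </name> and <name ... />. Anything starting with
# "<?" or "<!" is simply not recognized and therefore skipped.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")

def tags_balanced(text):
    """Return True if every opening tag has a matching closing tag."""
    stack = []
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                  # <br/> opens and closes itself
        if closing:
            if not stack or stack.pop() != name:
                return False          # closing tag without matching opener
        else:
            stack.append(name)
    return not stack                  # leftovers mean unclosed tags
```

      This is a well-formedness check only, not a conforming XML parser; tags hidden inside comments or CDATA sections would be counted as real ones.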
    2. Re:XML is just hard to parse by Anonymous Coward · · Score: 1, Insightful

      Why not use a simpler, easier to parse, more general language?

      Sexp parsing libraries exist for Lisp (duh), Scheme, Java, C, Perl, Python.

    3. Re:XML is just hard to parse by andrewl6097 · · Score: 2, Insightful

      Even writing your own parser isn't entirely a bad idea. It depends on your message size. A few months ago, in an all-night hacking session, I whipped up a SAX parser that was over 3 times faster than expat for messages under a certain amount (roughly 200 bytes, IIRC). Often parsers will bog down because they have lots of features most people don't need - like namespaces for instance.

    4. Re:XML is just hard to parse by Viol8 · · Score: 3, Insightful

      In a protocol designed for efficiency you shouldn't have to parse anything at all! If some binary protocol was used you would, for example, use 1 char to represent the field types, another to represent the record types and so forth. If you put all this into a packet that can be DIRECTLY mapped onto a C structure you'll save god knows how many cycles.

      I like the way you say you just have to recognise tags. Have you any idea of the amount of processing involved in even simple regexp matching?? This is the problem when high level coders try to design low level systems: they simply don't have a clue how things really work and assume that the high level procedures/objects that they work with are some sort of magic that "just happens" and you can use them everywhere with no performance degradation.
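
      A rough sketch of the fixed-layout idea, using Python's struct module rather than a literal C cast. The layout itself (record type, field type, length, value) is a made-up example, not something from any real protocol.

```python
import struct

# Hypothetical record layout: 1-byte record type, 1-byte field type,
# 2-byte payload length, 4-byte unsigned value.
# "!" selects network byte order with no padding between fields.
RECORD = struct.Struct("!BBHI")

def encode(record_type, field_type, length, value):
    """Pack the four fields into a fixed 8-byte packet."""
    return RECORD.pack(record_type, field_type, length, value)

def decode(packet):
    """Unpack a packet back into (record_type, field_type, length, value)."""
    return RECORD.unpack(packet)
```

      Decoding is a single fixed-size unpack with no tokenizing, which is the saving being argued for here.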

    5. Re:XML is just hard to parse by BlackHawk-666 · · Score: 2, Insightful
      XML is not designed for speed, but for information exchange. Mapping onto a C structure may work well for a single platform and a single compiler but each processor and compiler have their own ideas about ordering of struct members and padding e.g. Intel likes DWORD alignment if available and used to pad as required...not sure about the latest batch of processors and compilers.

      You lose portability between platforms by trying this low level mapping. How well do you think big endian systems will like to share with little endian ones? Portability, readability and exchangeability are the reasons for XML, not flat out speed. That said, we use XSL around here for marking up our web pages and it is lightning fast!

      --
      All those moments will be lost in time, like tears in rain.
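
      To make the byte-order and padding point concrete, here is a small illustration with Python's struct module. The variable names are only for this example.

```python
import struct

value = 1

# The same 32-bit integer serialized in the two byte orders.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

# A big-endian packet read by code that assumes little-endian.
misread = struct.unpack("<I", big)[0]

# One byte followed by a 32-bit int: "=" uses standard sizes with no
# padding, "@" uses the native compiler's alignment and may add padding.
packed_size = struct.calcsize("=BI")
native_size = struct.calcsize("@BI")
```

      The native size depends on the platform, so two machines that both "just map the struct" can disagree on where the fields are unless the layout and byte order are pinned down explicitly.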
  2. Is that using SAX or DOM? by KDan · · Score: 4, Insightful

    It might be of some use if you actually told us what libraries you used, what methods, etc, not just "I tried to parse some XML files". Is that result of 20 concurrent requests using a SAX parser or DOM? Are you using the standard Java DOM implementation (slow and bulky), or one of the slicker ones like JDOM, dom4j, etc (there's a bunch you should have a look at). Another thing you could do to improve performance is to identify the points where you don't really need a DOM (eg you're just reading the values once and discarding) and use a SAX parser instead to fill in a custom class or a hashtable or such.

    Daniel

    --
    Carpe Diem
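
    A minimal sketch of the "SAX parser filling a hashtable" idea, shown with Python's xml.sax instead of a Java parser. The FieldCollector class and the message shape are assumptions for illustration; it only records leaf elements.

```python
import xml.sax

class FieldCollector(xml.sax.ContentHandler):
    """Collects the text of leaf elements into a plain dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._name = None     # element currently open, if it may be a leaf
        self._text = []

    def startElement(self, name, attrs):
        self._name = name
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        # Only the innermost element closes while _name still matches.
        if name == self._name:
            self.fields[name] = "".join(self._text).strip()
        self._name = None

def parse_fields(data):
    """Parse XML bytes and return {element name: text} for leaf elements."""
    handler = FieldCollector()
    xml.sax.parseString(data, handler)
    return handler.fields
```

    No tree is built; the only thing kept in memory after parsing is the dict.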
    1. Re:Is that using SAX or DOM? by Lechter · · Score: 2, Insightful

      First of all, the people who say that you should simply switch to a structured binary protocol, and get at your messages through casting are right. That'll be a lot faster. But if you're stuck with implementing a WebService then you're stuck with XML.

      As for using DOM, I'd argue that you should never use it in a performance critical application. I understand that you need to refer to different parts of the message concurrently so an event-based parser alone won't work. But what you ought to consider is using a lighter weight representation of your messages than DOM. After all DOM gives you access to a lot of information that you really don't need. You might look into XML->object mapping APIs like Castor or maybe Betwixt. Or you could just roll your own. That way you could use a quick push parser like SAX to parse the XML, but still have the ability to access all of the message. You might also want to look into the parameters available in your parser, to try and strip it down...maybe turn off validation, DTDs etc...

      --
      credo quia absurdum
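
      A hedged sketch of the roll-your-own route: a SAX handler that fills a small message object, with the optional parser features switched off. This uses Python's xml.sax, not Castor or Betwixt, and the Order class and message shape are invented for the example.

```python
import io
import xml.sax
from dataclasses import dataclass, field
from xml.sax.handler import (feature_external_ges, feature_namespaces,
                             feature_validation)

@dataclass
class Order:                      # hypothetical lightweight message object
    order_id: str = ""
    items: list = field(default_factory=list)

class OrderHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.order = Order()
        self._text = []

    def startElement(self, name, attrs):
        self._text = []
        if name == "order":
            self.order.order_id = attrs.get("id", "")

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.order.items.append("".join(self._text).strip())

def parse_order(data):
    """Parse XML bytes into an Order without building a DOM."""
    parser = xml.sax.make_parser()
    # Strip the parser down: no namespace processing, no validation,
    # no fetching of external entities.
    parser.setFeature(feature_namespaces, False)
    parser.setFeature(feature_validation, False)
    parser.setFeature(feature_external_ges, False)
    handler = OrderHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.order
```

      The whole message stays accessible afterwards through the object, which is the property a bare event-based parse lacks.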
  3. java and c#? by Anonymous Coward · · Score: 5, Insightful

    well there's your problem.

    With mod_perl, XML::LibXML, XML::LibXSLT, I EASILY get 100 per second. and my code is shitty.

    what do you do with the XML, do you generate HTML from it with XSLT or what?

    another thing to try: intelligently cache your results in shared memory. you can easily double performance or more.
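
    A small sketch of the caching idea. Note the swap: this uses an in-process functools.lru_cache keyed on the raw request bytes, not a shared-memory cache across server processes, and the render function is a made-up stand-in for whatever the XML is turned into.

```python
import functools
import xml.etree.ElementTree as ET

@functools.lru_cache(maxsize=1024)
def render(request_body):
    """Parse the request once; identical requests reuse the cached result."""
    root = ET.fromstring(request_body)
    return ", ".join(child.tag + "=" + (child.text or "") for child in root)
```

    This only pays off when the same documents recur; for unique requests every call is a miss and the parse cost is unchanged.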

  4. So don't use XML. by WasterDave · · Score: 2, Insightful

    I don't understand what the problem is here. You're saying that you like XML, but it's slow. Fine, don't use it. It's not like it's the only tool in existence, is it?

    Dave

    --
    I write a blog now, you should be afraid.
    1. Re:So don't use XML. by Knight2K · · Score: 2, Insightful

      I would guess that using XML is to some degree a political issue that can't be avoided. Which is really symptomatic of the age-old problem of the business and technical sides not really listening to each other.

      --
      ======
      In X-Windows the client serves YOU!
  5. Re:Benchmarks, handmade parser... by Anonymous Coward · · Score: 2, Insightful

    What, you mean someone actually does implement all that unicode, DTD, CDATA and other crap into their software? Don't they have anything better to do?

  6. Re:using DOM by DukeyToo · · Score: 2, Insightful

    If you break it down, there are two basic methods of parsing XML - DOM-based or Stream-based. DOM requires the whole XML document to be loaded in memory, and so is inherently bad for scalability.

    Stream-based combined with XPath processing is the way to go if you want to just get particular elements from the document. Even if you need to parse the whole document, I would still stay with stream-based method.

    --
    Most writers regard truth as their most valuable possession, and therefore are most economical in its use - Mark Twain
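
    A minimal sketch of pulling particular elements out of a stream, using Python's ElementTree.iterparse. A plain tag comparison stands in for XPath here, and collect_text is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

def collect_text(stream, wanted_tag):
    """Return the text of every wanted_tag element, reading incrementally."""
    values = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == wanted_tag:
            values.append(elem.text)
        # Drop text, attributes and children of finished elements so the
        # full tree content is never held in memory at once.
        elem.clear()
    return values
```

    Emptied elements still stay attached to their parents, so memory is reduced rather than constant; very large documents would need the parent references pruned as well.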
  7. Wrong uses of XML by Randolpho · · Score: 5, Insightful

    This is an example of the wrong way to use XML.

    XML is great because it's extensible and a markup language. It's great for storage, configuration files, and certain forms of data transmission (which is just a sub-set of storage).

    What XML is not good for is performance-critical transmission protocols. It's too verbose and too complex, and both are bad for protocols. That is the mistake made by the author of the article. Go with a structured protocol and skip the XML.

    --
    "Times have not become more violent. They have just become more televised."
    -Marilyn Manson
    1. Re:Wrong uses of XML by __past__ · · Score: 2, Insightful
      It's quite funny that you highlight XML being a markup language (or rather, a toolkit to build markup languages), and don't even include document markup as something it's good for.

      Despite all the hype behind XML, markup somehow doesn't really seem to be any more hip than in the dark SGML ages. Sometimes I really wonder why all the data-heads try reinventing ASN.1 with more bloat and complexity so hard.

  8. Explain more by vadim_t · · Score: 2, Insightful

    First, what does your program do? Why are you so sure XML takes so much time to process? And is XML really the best format for your application?

    You could get speed improvements by making things simpler. If XML data takes so much to process on your server then I guess you have two possible problems: Either the amount of data is very big, or you're doing something wrong. You don't really have to use every feature of XML in your program.

    Make sure you also understand what XML is for. Sending bitmaps by transferring gigabytes of <pixel r="10" g="100" b="0" /> is definitely not a good use of XML. For some kinds of data perfectly good formats already exist.

    Also, do you really need XML? If it's something time or bandwidth critical, rolling your own could be easier. Especially if you don't need a lot of interoperation with other programs. Binary protocols are quite easy to make extensible, too. For example, you can send everything in a kind of container. Say, a structure with a char or int for a command ID, and a long for a command length. Then put any data inside. That's just 5-8 bytes per header, and should let you add stuff easily.
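
    The container idea can be sketched roughly like this, assuming a 1-byte command ID and a 4-byte length in network byte order (a 5-byte header). The names HEADER, encode_frame and decode_frames are made up for the example.

```python
import struct

# Hypothetical header: 1-byte command ID + 4-byte payload length.
HEADER = struct.Struct("!BI")

def encode_frame(command_id, payload):
    """Wrap a payload in a command ID / length header."""
    return HEADER.pack(command_id, len(payload)) + payload

def decode_frames(buffer):
    """Split a buffer into (command_id, payload) frames.

    Returns the complete frames plus any trailing bytes that do not yet
    form a whole frame, so the caller can keep them for the next read.
    """
    frames = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        command_id, length = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if start + length > len(buffer):
            break                     # payload not fully received yet
        frames.append((command_id, buffer[start:start + length]))
        offset = start + length
    return frames, buffer[offset:]
```

    Extensibility comes from the length field: a receiver that doesn't know a command ID can still skip over its payload and carry on.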

  9. Parse it, don't check it by RhettLivingston · · Score: 3, Insightful

    Most of the work in an off the shelf XML parser is verifying that the XML is "good" or matches some schema specification. If it's coming from one of your programs and going to one of your programs and you've done reasonable debugging, it's good. You just parse it and use it. Not enough has been done to optimize the "trusted" app communications scenario even though in reality, that's probably 95%+ of the actual usage of XML. Very few sites are actually publishing XML that is really getting used by programs and pages other than the ones they've written.

    Parsing it is very easy and quick if you're in full control of the encoding. You can optimize your parser greatly by choosing not to handle the general case, but to instead handle only what your specific encoder generates.

    Use the protocol, pick up the buzz word for your app, but leave the pain of the generalities meant to handle some free data exchange world that is 15 years in the future out. When the semantic net comes about and applications can actually use any XML without needing to be written to use that XML schema, then you can worry about the general case.