Using XML in Performance Sensitive Apps?
A Parser's Baggage queries: "For the last couple of years I've been working with XML based protocols and one thing that keeps coming up is the amount of CPU power needed to handle 10, 20, 30 or 40 concurrent requests. I've ran benchmarks on both Java and C#, and my results show that on a 2ghz CPU, the upper boundary for concurrent clients is around 20, regardless of the platform. How have other developers dealt with these issues and what kinds of argument do you use to make the performance concerns know to the execs. I'm in favor of using XML for it's flexibility, but for performance sensitive applications, the weight is simply too big. This is especially true when some executive expects and demands that it handle 1000 requests/second on a 1 or 2 cpu server. Things like stream/pull parsers help for SOAP, but when you're reading and using the entire message, pull parsing doesn't buy you any advantages."
It's hard to parse. That takes cycles. You can probably tweak the parsing to make it faster, but that might not get you from 20 concurrent to 2000 concurrent.
You've got two choices. More processors, which are pretty cheap right now; or a simpler and more specialized language to replace XML.
If tits were wings it'd be flying around.
It might be of some use if you actually told us what libraries you used, what methods, etc, not just "I tried to parse some XML files". Is that result of 20 concurrent requests using a SAX parser or DOM? Are you using the standard java DOM implementation (slow and bulky), or one of the slicker ones like JDOM, dom4j, etc (there's a bunch you should have a look at). Another thing you could do t o improve performance is to identify the points where you don't really need a DOM (eg you're just reading the values once and discarding) and use a SAX parser instead to fill in a custom class or a hashtable or such.
Daniel
Carpe Diem
well there's your problem.
With mod_perl, XML::LibXML, XML::LibXSLT, I EASILY get 100/per second. and my code is shitty.
what do you do with the XML, do you generate HTML from it with XSLT or what?
another thing to try: intelligently cache your results in shared memory. you can easily double performance or more.
I don't understand what the problem is here. You're saying that you like XML, but it's slow. Fine, don't use it. It's not like it's the only tool in existence, is it?
Dave
I write a blog now, you should be afraid.
What, you mean someone actually does implement all that unicode, DTD, CDATA and other crap into their software? Don't they have anything better to do?
If you break it down, there are two basic methods of parsing XML - DOM-based or Stream-based. DOM requires the whole XML document to be loaded in memory, and so is inherently bad for scalability.
Stream-based combined with XPATH processing is the way to go if you want to just get particular elements from the document. Even if you need to parse the whole document, I would still stay with stream-based method.
Most writers regard truth as their most valuable possession, and therefore are most economical in its use - Mark Twain
This is an example of the wrong way to use XML.
XML is great because it's extensible and a markup language. It's great for storage, configuration files, and certain forms of data transmission (which is just a sub-set of storage).
What XML is not good for is performance-critical transmission protocols. It's too verbose and too complex, and both are bad for protocols. That is the mistake made by the author of the article. Go with a structured protocol and skip the XML.
"Times have not become more violent. They have just become more televised."
-Marilyn Manson
First, what does your program do? Why are you so sure XML takes so much time to process? And, is really XML the best format for your application?
/> is definitely not a good use of XML. For some kinds of data perfectly good formats already exist.
You could get speed improvements by making things simpler. If XML data takes so much to process on your server then I guess you have two possible problems: Either the amount of data is very big, or you're doing something wrong. You don't really have to use every feature of XML in your program.
Make sure you also understand what XML is for. Sending bitmaps by transferring gigabytes of <pixel r="10" g="100" b="0"
Also, do you really need XML? If it's something time or bandwidth critical, rolling your own could be easier. Especially if you don't need a lot of interoperation with other programs. Binary protocols are quite easy to make extensible, too. For example, you can send everything in a kind of container. Say, a structure with a char or int for a command ID, and a long for a command length. Then put any data inside. That's just 5-8 bytes per header, and should let you add stuff easily.
Most of the work in an off the shelf XML parser is verifying that the XML is "good" or matches some schema specification. If its coming from one of your programs and going to one of your programs and you've done reasonable debugging, its good. You just parse it and use it. Not enough has been done to optimize the "trusted" app communications scenario even though in reality, that's probably 95%+ of the actual usage of XML. Very few sites are actually publishing XML that is really getting used by programs and pages other than the ones they've written.
Parsing it is very easy and quick if you're in full control of the encoding. You can optimize your parser greatly by choosing not to handle the general case, but to instead handle only what your specific encoder generates.
Use the protocol, pick up the buzz word for your app, but leave the pain of the generalities meant to handle some free data exchange world that is 15 years in the future out. When the semantic net comes about and applications can actually use any XML without needing to be written to use that XML schema, then you can worry about the general case.