Using XML in Performance Sensitive Apps?

Posted by Cliff on Tuesday July 15, 2003 @06:45PM from the routing-around-the-performance-hit dept.

A Parser's Baggage queries: "For the last couple of years I've been working with XML-based protocols, and one thing that keeps coming up is the amount of CPU power needed to handle 10, 20, 30 or 40 concurrent requests. I've run benchmarks on both Java and C#, and my results show that on a 2 GHz CPU, the upper bound for concurrent clients is around 20, regardless of the platform. How have other developers dealt with these issues, and what kinds of arguments do you use to make the performance concerns known to the execs? I'm in favor of using XML for its flexibility, but for performance-sensitive applications, the weight is simply too big. This is especially true when some executive expects and demands that it handle 1000 requests/second on a 1- or 2-CPU server. Things like stream/pull parsers help for SOAP, but when you're reading and using the entire message, pull parsing doesn't buy you any advantages."

3 of 97 comments

using DOM by mlati · 2003-07-15 19:00 · Score: 5, Informative

1. I use DOM objects, in this case the MSXML free-threaded model, to handle XML strings, and read out the string only at the last point.
2. I would also suggest using wstring/string from the STL, since you can reserve string buffers in advance in case you have to handle the XML as strings. That's if you're using C++; I don't know much about C#/Java, sorry.

Using this method I have managed to push it to ~200 concurrent requests.
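
The buffer-reservation tip in point 2 is about C++ strings. As a rough Java analog (an assumption of this note, not something the comment describes), a StringBuilder can be given its capacity up front so it does not have to grow and copy while the XML is assembled. The class and method names below are hypothetical, and the values are assumed to be already escaped.

```java
// Sketch only: a Java analog of reserving a C++ string buffer in advance.
final class PresizedXmlWriter {
    // Hypothetical helper: builds <items><item>v</item>...</items>.
    // Values are assumed to be XML-escaped already.
    static String buildItems(String[] values) {
        int estimate = 16; // room for the <items></items> wrapper (15 chars)
        for (String v : values) {
            estimate += v.length() + 13; // "<item>" + "</item>" = 13 chars
        }
        // Allocate the backing array once instead of growing it repeatedly.
        StringBuilder sb = new StringBuilder(estimate);
        sb.append("<items>");
        for (String v : values) {
            sb.append("<item>").append(v).append("</item>");
        }
        return sb.append("</items>").toString();
    }
}
```

Whether this matters in practice depends on message size and allocation patterns, so it is worth measuring before and after.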

mlati

Benchmarks, handmade parser... by Bazzargh · 2003-07-15 20:48 · Score: 4, Informative

First off, any chance you could post those benchmarks? 20 requests/second seems low; I'm wondering what the rest of the setup was.

For the first part: we had performance problems on an app where the customer had insisted on XML everywhere. However, in one particularly critical part of the system we were getting hammered by the garbage collection overhead of SAX (it's efficient for text in elements, but not for attribute values or element names).

Anyway - we knew what was coming into the system, as we were also the producers of this XML at an earlier stage. So we wrote a custom SAX parser that only supported ASCII, with no DTDs, internal subsets, etc., and wrote it to return element/attribute names from a pool (IIRC we used a ternary tree to store this stuff, so we didn't need to create a string to do the lookup).

It was like night and day. XML parsing dropped from generating 80% of the garbage to about 5% and it just didn't appear on my list of performance issues from then on.

Java strings do a lot of copying; the point is to get as close as you can to a zero-copy XML parser.
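
To make the name-pool idea concrete, here is a minimal sketch of a ternary search tree that hands back one shared String per element/attribute name, looked up directly from the parser's character buffer so no temporary String is created on a hit. This is not the parser described above, only an illustration; NamePool and intern are hypothetical names.

```java
// Sketch only: pools element/attribute names in a ternary search tree.
final class NamePool {
    private static final class Node {
        final char ch;
        Node lo, eq, hi;   // smaller char, next char of the name, larger char
        String name;       // non-null if a pooled name ends at this node
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    /** Returns the pooled String for buf[start, start + len). */
    String intern(char[] buf, int start, int len) {
        if (len <= 0) throw new IllegalArgumentException("name must be non-empty");
        int i = start;
        int end = start + len;
        if (root == null) root = new Node(buf[i]);
        Node node = root;
        while (true) {
            char c = buf[i];
            if (c < node.ch) {
                if (node.lo == null) node.lo = new Node(c);
                node = node.lo;
            } else if (c > node.ch) {
                if (node.hi == null) node.hi = new Node(c);
                node = node.hi;
            } else if (++i == end) {
                // Whole name matched: allocate the String only the first time.
                if (node.name == null) node.name = new String(buf, start, len);
                return node.name;
            } else {
                if (node.eq == null) node.eq = new Node(buf[i]);
                node = node.eq;
            }
        }
    }
}
```

After the first occurrence of a name such as "item", every later lookup walks the tree and returns the same String instance, so repeated tags produce no new garbage for their names.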

You might want to look at switching toolkits entirely as well - GLUE's benchmarks sound a lot better than yours.

Re:XML is just hard to parse by archeopterix · 2003-07-15 22:02 · Score: 4, Informative

"It's hard to parse. That takes cycles. You can probably tweak the parsing to make it faster, but that might not get you from 20 concurrent to 2000 concurrent."
In my experience XML isn't hard to parse at all. Basically, you just have to recognize tags (basic regexp) and match opening ones and closing ones (use a stack, Luke).
The problem with perceived XML inefficiency is that many implementations build a whole parse tree in memory - that's slow mostly because of node allocations/deallocations. Removing the intermediate parse tree decreased CPU time per request by a factor of 15 in my application.
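
A minimal sketch of the regexp-plus-stack approach, with no parse tree built: it scans for tags, pushes opening names, and pops on closing ones. It is not a conforming XML parser (comments, CDATA sections, and attribute values containing '<' or '>' are not handled), and TagStackScanner and countElements are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: recognizes tags with a regexp and matches them with a stack.
final class TagStackScanner {
    // Group 1: "/" for a closing tag; group 2: tag name; group 3: "/" for <empty/>.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z_][A-Za-z0-9_.:-]*)[^<>]*?(/?)>");

    /** Returns the number of elements if tags nest properly, or -1 otherwise. */
    static int countElements(String xml) {
        Deque<String> open = new ArrayDeque<>();
        int elements = 0;
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(1).isEmpty()) {
                // Closing tag: it must match the most recently opened element.
                if (open.isEmpty() || !open.pop().equals(name)) return -1;
            } else {
                elements++;
                // Self-closing tags never go on the stack.
                if (m.group(3).isEmpty()) open.push(name);
            }
        }
        return open.isEmpty() ? elements : -1;
    }
}
```

The only state kept is the stack of currently open element names, so memory use grows with nesting depth rather than with document size. A real application would do its per-element work at the push and pop points instead of just counting.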