Using XML in Performance Sensitive Apps?

Posted by Cliff on Tuesday July 15, 2003 @06:45PM from the routing-around-the-performance-hit dept.

A Parser's Baggage queries: "For the last couple of years I've been working with XML-based protocols, and one thing that keeps coming up is the amount of CPU power needed to handle 10, 20, 30 or 40 concurrent requests. I've run benchmarks on both Java and C#, and my results show that on a 2 GHz CPU, the upper bound for concurrent clients is around 20, regardless of the platform. How have other developers dealt with these issues, and what kinds of arguments do you use to make the performance concerns known to the execs? I'm in favor of using XML for its flexibility, but for performance-sensitive applications, the weight is simply too big. This is especially true when some executive expects and demands that it handle 1000 requests/second on a 1- or 2-CPU server. Things like stream/pull parsers help for SOAP, but when you're reading and using the entire message, pull parsing doesn't buy you any advantages."

12 of 97 comments

using DOM by mlati · 2003-07-15 19:00 · Score: 5, Informative

1. I use DOM objects (in this case the MSXML free-threaded model) to handle XML strings, and read the string out only at the last point.
2. If you're using C++, I would also suggest using std::wstring/std::string from the STL, since you can reserve string buffers in advance in case you have to handle the XML as strings. I don't know much about C#/Java, sorry.

Using this method I have managed to push it to ~200 concurrent requests.

mlati
1. Re:using DOM by macrom · 2003-07-16 01:10 · Score: 2, Informative
  
  I am not 100% sure, but I believe the System.Xml namespace in C# uses DOM, which is sad, because an article a few months back in Windows Developer Journal cited a test where MSXML was the slowest parser around. I believe Xerces was the fastest.
  
  As mentioned above, we use std::wstring as the storage mechanism (which shields developers from the dreaded BSTR that MSXML uses; ick), but beware, because that also shuts your non-C++ users out of the interface. We're looking at moving our business-rule-enforcing parser to C# for better compatibility between .NET, COM and pure C++ applications.
Benchmarks, handmade parser... by Bazzargh · 2003-07-15 20:48 · Score: 4, Informative

First off, any chance you could post those benchmarks? 20 requests/second seems low, I'm wondering what the rest of the setup was.

For the first part: we had performance problems on an app where the customer had insisted on XML everywhere. In one particularly critical part of the system we were getting hammered by the garbage collection overhead of SAX (it's efficient for text in elements, but not for attribute values or element names).

Anyway, we knew what was coming into the system, since we had also produced this XML at an earlier stage. So we wrote a custom SAX parser that supported only ASCII (no DTDs, internal subsets, etc.) and returned element and attribute names from a pool (IIRC we used a ternary search tree to store them, so we didn't need to create a string to do the lookup).

It was like night and day. XML parsing dropped from generating 80% of the garbage to about 5% and it just didn't appear on my list of performance issues from then on.

Java strings do a lot of copying; the point is to get as close to a zero-copy XML parser as you can.
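The name pool described above isn't shown in the comment, so here is a minimal Python sketch of the idea only, assuming ASCII names sitting in a bytes-like buffer. NamePool, intern and the node layout are hypothetical. The tree walk compares bytes read straight from the buffer, so a name that is already pooled comes back as the same str object without any slice or decode.

```python
class _Node:
    __slots__ = ("byte", "lo", "eq", "hi", "name")

    def __init__(self, byte):
        self.byte = byte  # one byte of a name, as an int
        self.lo = None    # subtree for smaller bytes at this position
        self.eq = None    # subtree for the next position in the name
        self.hi = None    # subtree for larger bytes at this position
        self.name = None  # pooled str if a name ends at this node


class NamePool:
    """Hypothetical name pool: a ternary search tree keyed by ASCII bytes."""

    def __init__(self):
        self._root = None

    def intern(self, buf, start, end):
        """Return the one shared str for buf[start:end], adding it on first sight."""
        if start >= end:
            raise ValueError("empty name")
        if self._root is None:
            self._root = _Node(buf[start])
        node, i = self._root, start
        while True:
            b = buf[i]  # indexing bytes/bytearray yields an int, no substring
            if b < node.byte:
                if node.lo is None:
                    node.lo = _Node(b)
                node = node.lo
            elif b > node.byte:
                if node.hi is None:
                    node.hi = _Node(b)
                node = node.hi
            else:
                i += 1
                if i == end:
                    if node.name is None:
                        # First sighting: the only slice and decode for this name.
                        node.name = bytes(buf[start:end]).decode("ascii")
                    return node.name
                if node.eq is None:
                    node.eq = _Node(buf[i])
                node = node.eq
```

A parser would call intern with the offsets of each tag or attribute name it scans, so every event for a known name reuses one object instead of allocating a fresh string.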

You might want to look at switching toolkits entirely as well; GLUE's benchmarks sound a lot better than yours.
Re:XML is just hard to parse by archeopterix · 2003-07-15 22:02 · Score: 4, Informative

It's hard to parse. That takes cycles. You can probably tweak the parsing to make it faster, but that might not get you from 20 concurrent to 2000 concurrent.
In my experience XML isn't hard to parse at all. Basically, you just have to recognize tags (basic regexp) and match opening ones and closing ones (use a stack, Luke).
The problem with the perceived inefficiency of XML is that many implementations build a whole parse tree in memory, which is slow mostly because of node allocations and deallocations. Removing the intermediate parse tree decreased CPU time per request by a factor of 15 in my application.
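The "recognize tags with a regexp, match them with a stack" idea can be sketched in a few lines of Python. This is an illustration under strong assumptions, not a conforming XML parser: it checks tag nesting only, and assumes no comments, CDATA sections, DOCTYPE, or '>' inside attribute values. The TAG pattern and tags_balanced are hypothetical names. Note that no tree is ever built; only a stack of open element names is kept.

```python
import re

# Start tags, end tags and empty-element tags. Group 1 is "/" for an end tag,
# group 2 is the element name, group 3 is "/" for an empty-element tag.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-:]*)[^>]*?(/?)>")


def tags_balanced(text):
    """Return True if every start tag is closed in the right order."""
    stack = []
    for m in TAG.finditer(text):
        closing, name, empty = m.group(1), m.group(2), m.group(3)
        if closing:
            # An end tag must match the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        elif not empty:
            stack.append(name)
    return not stack  # anything left open means the document is truncated
```

A real handler would emit start/end events where this pushes and pops, which is the point where per-node allocation is avoided.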
AOLserver and tDOM by Col. Klink (retired) · 2003-07-16 00:13 · Score: 2, Informative

I'm just going to guess at what your problem is since you didn't really tell us. I'm assuming that your application needs to load the entire DOM tree 20 times for 20 concurrent requests and that's taking either too much CPU or too much memory.
The solution would be to load the DOM in the backend and have front-end applications access it.
You could try using AOLserver as a multi-threaded web server and tDOM as your DOM processor.
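AOLserver and tDOM aren't shown here. As a rough Python stdlib sketch of the same "parse once in the backend, let request handlers read it" idea: the document is parsed and indexed once at startup, and concurrent requests only do lookups. CATALOG_XML, _BY_ID and handle_request are hypothetical, and the sketch assumes no handler ever mutates the shared tree.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

CATALOG_XML = """<catalog>
  <item id="a1"><price>9.50</price></item>
  <item id="b2"><price>12.00</price></item>
</catalog>"""

# Parsed and indexed once at startup; request handlers only read these objects.
_ROOT = ET.fromstring(CATALOG_XML)
_BY_ID = {item.get("id"): item for item in _ROOT.iter("item")}


def handle_request(item_id):
    """Hypothetical request handler: a dict lookup instead of a re-parse."""
    item = _BY_ID.get(item_id)
    return None if item is None else float(item.findtext("price"))


# Twenty worker threads share the single parsed tree.
with ThreadPoolExecutor(max_workers=20) as pool:
    prices = list(pool.map(handle_request, ["a1", "b2", "zz"]))
```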

--
-- Don't Tase me, bro!
XmlTextReader by MrProgrammer · 2003-07-16 01:58 · Score: 2, Informative

Many have asked which libraries you are using to get at the XML. Loading up a whole DOM document is indeed quite inefficient.

On the .NET platform, I would suggest using the XmlTextReader class. This class and its brethren are the parsers underlying Microsoft's DOM implementation, and anything else that needs access to XML. The class is noted for its strong performance advantage over loading a DOM or using XPathNavigator, and it is indeed a very lightweight class. It is certainly not as comfortable to use as the DOM, but neither is it incredibly painful, especially if your documents are relatively simple.

Give XmlTextReader a shot.
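XmlTextReader itself is .NET-only. The closest Python stdlib analog is probably xml.etree.ElementTree.XMLPullParser, which likewise hands back events as input arrives instead of building a whole DOM first. A minimal sketch, assuming a hypothetical <order>/<line>/<qty> layout; total_quantity is an illustrative name.

```python
import xml.etree.ElementTree as ET


def total_quantity(chunks):
    """Sum <qty> values while the document arrives in arbitrary pieces."""
    parser = ET.XMLPullParser(events=("end",))
    total = 0

    def drain():
        nonlocal total
        for _event, elem in parser.read_events():
            if elem.tag == "line":
                total += int(elem.findtext("qty", default="0"))
                elem.clear()  # drop the finished line's children and text

    for chunk in chunks:
        parser.feed(chunk)  # chunks may split tags anywhere
        drain()
    parser.close()
    drain()  # pick up any events still buffered when the stream ended
    return total
```

Memory stays roughly flat per line, though, as the source notes, this stops paying off once you need the whole message at once.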
Interesting article by f00zbll · 2003-07-16 04:36 · Score: 2, Informative

There's an interesting article that compares the different types of parsers and their advantages at a fairly low level. Dennis Sosnoski's article on XML performance appeared on IBM's site a while back. It's a worthwhile read.
I'd have to agree with people's assertion that performance-intensive apps should use a custom protocol, preferably a binary one, or some kind of delayed stream parser that only accesses an XML node when the app calls for it. I believe Sun has an API for XML stream parsing in the works, JSR 173 (StAX). It's too bad the JSR is still in the public review phase. I've written custom parsers in the past using SAX, and they can definitely improve performance if you convert straight to an object model. The question is the trade-off between generality and performance.
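Converting SAX events straight into application objects, with no DOM in between, might look like this in Python's xml.sax. Order, OrderHandler, parse_orders and the <orders>/<order>/<sku> layout are all hypothetical; the sketch assumes every <sku> sits inside an <order>.

```python
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    items: list = field(default_factory=list)


class OrderHandler(xml.sax.ContentHandler):
    """Builds Order objects directly from SAX events."""

    def __init__(self):
        super().__init__()
        self.orders = []
        self._text = []
        self._in_sku = False

    def startElement(self, name, attrs):
        if name == "order":
            self.orders.append(Order(attrs.get("id")))
        elif name == "sku":
            self._in_sku = True
            self._text = []

    def characters(self, content):
        if self._in_sku:
            self._text.append(content)  # text may arrive in several pieces

    def endElement(self, name):
        if name == "sku":
            self.orders[-1].items.append("".join(self._text).strip())
            self._in_sku = False


def parse_orders(data: bytes):
    handler = OrderHandler()
    xml.sax.parseString(data, handler)
    return handler.orders
```

The handler is specific to one message shape, which is exactly the generality-for-speed trade-off the comment describes.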
In the case of a web service that uses a schema, it's going to be hard to get around the performance issue. An obvious solution in situations where XML is required is to send as little as possible and only get the nodes you need. In that respect XPP2 and XmlTextReader help, until you need to use the whole document.
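"Only get the nodes you need" can be as simple as stopping a pull parse once the wanted node has been seen. A Python sketch with ElementTree.iterparse, assuming un-namespaced elements (a real SOAP message would use namespaced tags) and a hypothetical MessageId header element:

```python
import io
import xml.etree.ElementTree as ET


def read_message_id(stream):
    """Return the text of the first <MessageId> element, or None.

    iterparse reads the stream in blocks, so returning early means the rest
    of a large message is never read or parsed.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "MessageId":
            return elem.text
    return None


# The body is deliberately cut off: the header is all this caller needs.
partial = io.BytesIO(b"<Envelope><Header><MessageId>42</MessageId></Header><Body><Item>")
message_id = read_message_id(partial)
```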
S-expressions by toomuchPerl · 2003-07-16 06:02 · Score: 2, Informative

Why even bother with XML? S-expressions are truly superior, and much easier to parse. You can write an S-expression parser in about a hundred lines of Perl, and there are decent S-expression parser libraries or bindings for C, Python, Java and Ruby. It's much faster, and the overhead is always lower.
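To give a feel for the size claim, here is a small S-expression reader sketched in Python rather than Perl. It assumes a minimal dialect: parentheses, double-quoted strings with backslash escapes, integers, and bare symbols (no quote syntax or comments). tokenize and parse_sexpr are illustrative names.

```python
import re

# Tokens: "(", ")", double-quoted strings (backslash escapes allowed), bare atoms.
_TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def tokenize(text):
    return _TOKEN.findall(text)


def _atom(tok):
    if tok.startswith('"'):
        return re.sub(r"\\(.)", r"\1", tok[1:-1])  # undo backslash escapes
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol


def parse_sexpr(text):
    """Parse one S-expression into nested lists of str/int atoms."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise SyntaxError("missing )")
            pos += 1  # consume ")"
            return items
        if tok == ")":
            raise SyntaxError("unexpected )")
        return _atom(tok)

    result = read()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return result
```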
--toomuchPerl
You picked the wrong tools by Voivod · 2003-07-16 07:28 · Score: 2, Informative

If you are using C/C++ check out gSOAP. It goes real fast, runs on many platforms, and I've used it to talk to Java, PHP, C# etc without a problem. It does about 3000 transactions per second on my little desktop PC. Obviously 100 parallel clients aren't going to get that speed, but it sounds like it will be much faster than what you're using!

http://www.cs.fsu.edu/~engelen/soap.html
Proper Parsing by jkichline · 2003-07-16 08:34 · Score: 3, Informative

I have to agree with many of the comments. The parser you choose is the most important decision. DOM is typically a memory hog and takes time. In my experience the MSXML 4.0 parser is very fast, written in C, etc. DOM is easier to use, but obviously can have some downsides. XML is great for portability and faster development, but performance concerns can arise.

Find out where the bottleneck lies. If you are running an XSLT processor on the server, that will limit your requests/sec. I've found that streaming XML from the server to a client (such as IE6, gasp) and having the client render it to HTML is wicked fast. The XSLT processor in IE renders asynchronously, allowing the results to be displayed before the entire doc is loaded. Of course this is MS-specific stuff I've experienced, etc.

SAX is faster for grabbing XML events. While writing a web spider, I was parsing HTML using an HTML parser; when I switched from that to regexes, crawl speed increased significantly. It depends on whether you need the whole XML doc or not.
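The parser-versus-regex trade-off from the spider anecdote can be seen in a small Python sketch; it doesn't measure speed, only shows both approaches and where they disagree. _HREF, links_regex and links_parser are hypothetical names. The regex assumes quoted href values and will also "find" links inside comments or scripts, which html.parser correctly skips.

```python
import re
from html.parser import HTMLParser

# Regex approach: one pass, no tree or event objects, but only handles quoted
# href values on <a> tags and ignores HTML structure such as comments.
_HREF = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def links_regex(html):
    return _HREF.findall(html)


class _LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_parser(html):
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```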

You may want to try loading the XML DOM once and serializing it to a binary form. You could then ship the binary around town. Macromedia has some tools like this that can send binary objects to a Flash client, etc. Limit the parsing.
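A Python-only sketch of "parse once, ship the binary": the XML is parsed a single time into plain data, and pickle serializes that. This is not Macromedia's format; pickle is Python-specific and must never be loaded from untrusted sources. xml_to_records and the catalog layout are hypothetical.

```python
import pickle
import xml.etree.ElementTree as ET


def xml_to_records(xml_text):
    """Parse once into plain Python data (the expensive step)."""
    root = ET.fromstring(xml_text)
    return [
        {"id": item.get("id"), "price": float(item.findtext("price"))}
        for item in root.iter("item")
    ]


records = xml_to_records(
    "<catalog><item id='a1'><price>9.50</price></item>"
    "<item id='b2'><price>12.00</price></item></catalog>"
)
blob = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)  # ship this, not the XML
restored = pickle.loads(blob)  # no XML parsing on the receiving side
```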

Another tip: if you have control over the XML schema, you may want to research how to structure XML for performance. I've heard that attribute-heavy XML docs are more efficient than docs with the data embedded in element content. Also look into XML tricks like IDs, etc.
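The attribute-heavy versus element-heavy choice is easiest to see with the same record written both ways. This Python sketch makes no speed claim (that part is hearsay in the comment); it only shows that the attribute form is shorter and that both carry the same data. The record and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ELEMENT_FORM = "<item><id>a1</id><price>9.50</price><qty>3</qty></item>"
ATTRIBUTE_FORM = '<item id="a1" price="9.50" qty="3"/>'


def from_elements(text):
    # One child element (with its own start and end tag) per field.
    item = ET.fromstring(text)
    return {child.tag: child.text for child in item}


def from_attributes(text):
    # All fields live on a single element, so the parser creates one node.
    return dict(ET.fromstring(text).attrib)
```

Fewer elements means fewer nodes or events per record, which is the likely source of the reported gain, though it would need measuring with your own parser.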

Good luck in your pursuit. Choose your parser carefully. If testing turns out negative, you may just want to use some binary data. XML is a wonderful technology designed to aid in system integration, and ease of use... but it comes at a price.
Fastest all-around full-featured XML support libs by aminorex · 2003-07-16 15:24 · Score: 3, Informative

If you really do require full XML support, the fastest libraries are the GNOME libxml et al. See the benchmark results if you don't believe me.
If you can do with basic parsing, the nanoxml and picoxml libraries will put everything else to shame.

--
-I like my women like I like my tea: green-
Biztalk by badfish2 · 2003-07-17 04:36 · Score: 2, Informative

We use BizTalk for a lot of enterprise-level XML parsing, and we get up to 200+ documents parsed per second. Of course, there's a lot of hardware involved: three 2-processor boxes handling the processing workload, for example. But for a system pushing and pulling messages in and out of a SQL Server database, it works pretty well. And these are fairly large documents, with mapping, all kinds of functoids and whatnot.

--
"On the Internet, nobody knows you're a dog!" - a dog