Java EE & Streaming Architectures
Amin Ahmad writes "Implementing a streaming architecture on a Java EE application server provides asymptotically better memory performance, and, hence, scalability, than current, widely-implemented, Java EE patterns endorsed by Sun. This article provides a concrete implementation of a streaming architecture and compares its scalability to two other, standard implementations: Remote EJB and Local EJB-based solutions. The implementation based on a streaming architecture comes out the hands-down winner: for example, when sending back 300 rows of data to the client, the Local EJB solution fails beyond 16 concurrent users whereas the streaming solution is still running at 128 concurrent users!
The article includes complete source code and the entire results database for the stress test. I would be interested in hearing your feedback."
I am 2 years gradutad in Electrical Marketeering and want to break into java2ee. Please send me the codes. I so love this sight, all storuies very fashinating. Thank thank thankyou all.
Will code for new sig.
I see the following problems with the article.
1. Other than MySQL, it doesn't specify the software in use (it implies Apache Tomcat, but that is not explicitly stated), except...
2. Microsoft Web Application Stress Tool. Pardon me if I refuse to put any faith in tools from Redmond, particularly since, if Tomcat was in use, MWAST was used instead of Apache's own ab tool.
3. Why wasn't Java 1.5 tested? By definition, Java 1.4 means that you're testing vs. EJB 2.x instead of EJB 3.x. I don't know what changes have been made between the two, as I haven't learned EJB, but I'm assuming there have been some changes between the two, for better or for worse.
4. What's causing the OutOfMemory errors? If a pair of servers is falling over at 16 simultaneous requests for a 301-row dataset, there's a major problem.
Just some thoughts.
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
The article could claim that ANYTHING is faster than EJBs. I love Java and have successfully built several websites with it, but I feel EJBs are the antichrist. I avoid EJBs like the plague.
"No matter where you go, there you are." -- Buckaroo Banzai
That'd be an insightful comment, but for a few problems.
Java vs typical web languages isn't the sort of fight Java minds on a performance front. Performance certainly isn't the thing that makes a servlet/jsp design suck.
Java vs. Flash likewise isn't really a problem on the performance front. Find someone who switched from Java to Flash for performance. It's got flexible networking and rendering that wipe the floor with Flash. But Flash, evidently, is easier. Java again seems to lose that fight because it isn't tailored enough toward the target audience.
Compile once run anywhere is just the sort of language people in the Grid computing world like to hear. But I guess Grid computing bods aren't worried about performance...
jh
A new DriverManager and a new db connection for every request? Welcome to 1998.
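The complaint here is that the article opens a fresh connection for every request instead of reusing pooled ones. The sketch below illustrates what pooling buys, using a generic, hypothetical `SimplePool` class so it runs without a database. It is an illustration only: a Java EE application would normally look up a container-managed `javax.sql.DataSource` rather than write its own pool.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical minimal pool: creates at most `capacity` resources lazily and
// hands idle ones back out instead of creating a new one for every request.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int capacity;
    private int created = 0; // guarded by `this`

    SimplePool(int capacity, Supplier<T> factory) {
        this.capacity = capacity;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(capacity);
    }

    // Reuse an idle resource if there is one; otherwise create a new one while
    // under capacity; otherwise block until another caller releases one.
    T borrow() {
        T resource = idle.poll();
        if (resource != null) {
            return resource;
        }
        synchronized (this) {
            if (created < capacity) {
                created++;
                return factory.get();
            }
        }
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    // Return a resource so the next borrow() can reuse it.
    void release(T resource) {
        idle.offer(resource);
    }

    synchronized int createdCount() {
        return created;
    }
}
```

With a borrow/release pair per request, a hundred sequential requests end up sharing a single underlying resource instead of creating a hundred of them.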
Even though they return 1000 rows, 50 requests per minute is pretty poor. Voca processes 80 million bank payments per day using Spring.
Anyone interested, please check the source code. If I understand it correctly, his benchmark doesn't use any EJBs at all; he simply "simulates" them by adding serialization and deserialization to the code. It's quite optimistic to call this a benchmark at all. Is it really that surprising that when I tell the computer to do X, and then to do X plus something more, the second case is slower than the first?
Adding layers to an architecture is not primarily done to increase performance, but to create a clean, easy-to-maintain design. If the implementation is not performing as required, it should be profiled, and only then should the critical parts be optimized for performance. If somebody on my team dared to write code similar to this "streaming architecture" (read: a plain old servlet with database access and a model object polluted with HTML tags), it would be his last contribution to the project.
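The "simulated" remote call described in this comment amounts to a serialization round trip. In real EJB semantics, remote interfaces pass arguments and results by value while local interfaces pass them by reference; the sketch below approximates both with plain `java.io`. `President`, `copyByValue` and `passByReference` are hypothetical names, not the article's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical row type; it must be Serializable to cross a (simulated) remote boundary.
final class President implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int firstYear;

    President(String name, int firstYear) {
        this.name = name;
        this.firstYear = firstYear;
    }
}

final class RemoteCallSimulation {
    // Serialize the value to a byte array and read it back, approximating the
    // pass-by-value semantics of a remote call. While this runs, the original
    // object graph, the byte buffer, and the copy all exist in memory at once.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyByValue(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    // A local call, by contrast, simply hands back the same reference.
    static <T> T passByReference(T value) {
        return value;
    }
}
```

Either way the whole result set is materialized before the caller sees any of it; the by-value path just adds extra copies on top.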
With technology knowledge like yours, we can only hope that you're currently flipping burgers somewhere...
The giveaway is the "Compile once, run anywhere" comment. If you think I'm going to develop my current project, which is being deployed on an AS/400, on an AS/400 workstation (if there is such a thing), you've got to be kidding. Same with the J2ME app in my link: do you think I develop and compile on my phone?
I know I'm feeding a troll, but unfortunately there's probably someone out there who believes this crap.
Bob
Listen to my latest album here
So the moral of this article: if you read and process any decent amount of data, process it sequentially, one piece at a time; for example, use SAX instead of DOM. Also make sure your JDBC connections are configured to exhibit this 'chunking' behavior, or else it won't matter anyway.
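To make the SAX analogy concrete: a SAX parser pushes events to a handler in one forward pass and keeps only what the handler chooses to retain, whereas DOM builds a tree of the whole document first. A minimal sketch using the JDK's `javax.xml.parsers.SAXParserFactory`; the `SaxCount` class and the element name used with it are hypothetical.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxCount {
    // Counts elements named `elementName` in one forward pass over the stream.
    // The handler keeps only a counter, never a tree of the document, so the
    // memory it uses does not grow with the number of elements (unlike DOM).
    static int countElements(InputStream xml, String elementName) {
        int[] count = {0};
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(xml, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if (qName.equals(elementName)) {
                        count[0]++;
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return count[0];
    }
}
```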
From the Article: Lines 8 through 14 open a connection to the database and configure the statement. It is critical to configure the connection and statement so that the database server does not send the entire result set to the application server, but rather sends chunks on demand. Clearly if the entire dataset is sent in a single piece to the application server, we will have nothing to show for our labors! Accomplishing this in MySQL 5 requires setting useCursorFetch=true and setting the fetch direction of the statement to ResultSet.FETCH_FORWARD. DB2 automatically chunks the data if the statement is scrollable (either ResultSet.TYPE_SCROLL_INSENSITIVE or ResultSet.TYPE_SCROLL_SENSITIVE). Finally, note that normally a connection would be obtained from an application server's connection pool, but for the sake of producing an application that can be deployed easily, this example takes a shortcut. Lines 17-25 loop through the result set, populating a president object and writing it to the page in each pass. This is the crux of the streaming architecture: No more than a single row is kept in memory at once.
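The article's own code isn't reproduced in this thread, so here is a hypothetical sketch of the pattern the quoted passage describes, written against plain `java.sql`. Two assumptions to flag: `FETCH_SIZE = 100` is an arbitrary choice, and, as far as I know, MySQL Connector/J only uses a server-side cursor when a positive fetch size is set in addition to `useCursorFetch=true`, so the sketch sets one; check your driver's documentation. `PresidentStreamer` and its methods are illustrative names.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class PresidentStreamer {
    // Hypothetical chunk size: how many rows the driver is asked to fetch per round trip.
    private static final int FETCH_SIZE = 100;

    // Prepares a forward-only, read-only statement that asks the driver to fetch
    // rows in chunks. For MySQL Connector/J this assumes the connection URL carries
    // useCursorFetch=true, e.g. jdbc:mysql://localhost/test?useCursorFetch=true
    static PreparedStatement prepareStreaming(Connection connection, String sql)
            throws SQLException {
        PreparedStatement statement = connection.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        statement.setFetchDirection(ResultSet.FETCH_FORWARD);
        statement.setFetchSize(FETCH_SIZE);
        return statement;
    }

    // Writes one HTML table row per result row as it is read. Only the current
    // row is referenced here, so this loop's memory use does not grow with the
    // size of the result set. Returns the number of rows written.
    static int streamRows(ResultSet rows, Writer out) throws SQLException, IOException {
        int written = 0;
        out.write("<table>\n");
        while (rows.next()) {
            out.write("<tr><td>" + escapeHtml(rows.getString(1))
                    + "</td><td>" + escapeHtml(rows.getString(2)) + "</td></tr>\n");
            written++;
        }
        out.write("</table>\n");
        return written;
    }

    // Minimal escaping for text placed inside HTML elements.
    private static String escapeHtml(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

In a servlet, `out` would be the response writer and `rows` the result of `prepareStreaming(...).executeQuery()`, with the statement closed in a try-with-resources block; the container flushes its output buffer to the client as it fills.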
"Little does he know, but there is no 'I' in 'Idiot'!"
> Java vs typical web languages isn't the sort of fight
> Java minds on a performance front. Performance certainly
> isn't the thing that makes a servlet/jsp design suck.
Well said indeed. The bottleneck there is rarely the VM, it's the database queries or the network connections or disk I/O or something else. That's the strange thing about the "x doesn't scale" argument where x == Ruby|Python|Java - once you get a certain number of servers, the speed at which the language opcodes are being processed fades into the background.
> Find someone who switched from Java to flash for performance.
> It's got flexible networking and rendering that wipe the
> floor with flash.
I think Flash has an edge here in that there seem to be more people who know how to do whizzy graphics in Flash than in Java. I mean, my perception is that not many folks have done much work with the Java 2D API.
But Flash is starting to get some good frameworks for user interfaces - witness ActionStep. We're using it for indi and it's working out well.
The Army reading list
Another obvious problem is that he's "testing EJBs" by completely ignoring them. He uses Java 1.4, sure, but the bigger problem is that he's not actually using EJBs at all - he's fetching the data directly from the DB in the servlet, and his "local EJB" returns the data immediately while his "remote EJB" serializes it.
This isn't what EJB does.
This is a retarded "benchmark." A real benchmark might be useful, but it'd require actually figuring out what kind of EJB implementation would make sense on the backend, remote or local. This test is completely invalid, I'm afraid.
Everybody dies.
> With technology knowledge like yours, we can only hope that you're currently flipping burgers somewhere...
Hahaha.. You have a problem. Because I'm an expert in modern software design and also in getting replies from former CS students who believe that Java is still useful because it was the only language they could wrap their brains around in 1 or 2 semesters.
Let me urge you to look out for more modern and practicable languages as replacements instead of wasting your time by ridiculing yourself.
You still have a lot to learn my young padawan!
Wow, so much ignorance in just one post.
> JSP is a nice try to eat into the PHP market.
Nope, different markets. JSP is for people who understand the Model-View-Controller architecture and are capable of building systems around generalised code (POJOs). PHP is ideal for small-scale web apps, but that's it: it encourages unstructured code and no separation between the presentation and business logic tiers.
> But it's so slow and far away from the reality of productive systems that the Apache crew rather forked their server to a Java-only server (Tomcat) to keep their httpd codebase clean and free of anything that is Java.
Tomcat originated with Sun, as a proof of concept web container. The code was donated to the Apache foundation, and there has never been any plan to merge it with the Apache HTTP daemon. People usually proxy from Apache to Tomcat, as they have static content to serve as well, or just to keep web servers in the DMZ behind a single firewall and the web container servers behind a second firewall so that database connections and things like CORBA connections are more secure.
> Even if Java is open source now. There is a reason why Java applets have become almost distinct. Even Flash evidently performs better than an Java applet.
I assume you mean "extinct" rather than "distinct". However, applets have bugger all to do with JSP and webapps. They were a neat way of promoting the Java platform when it came out, but they are only a tiny part of what made Java popular.
> And why the hell would anyone still need this "Compile once, run anywhere." nonsense?
Because it's easier to port a virtual machine than a complete toolchain.
> JSP is for people who understand the Model, View, Controller architecture and are capable of building systems around generalised code (POJO's). PHP is ideal for small scale web apps, but that's it - it encourages unstructured code and no separation between presentation and business logic tiers.
So JSP is for people who understand Java buzzwords and PHP is like a joke compared to JSP, eh?
> > And why the hell would anyone still need this "Compile once, run anywhere." nonsense?
> Because it's easier to port a virtual machine than a complete toolchain.
Because it's easier to suck at writing portable code and therefore use Java than to be a real programmer and use PHP (which is portable), eh?
> So JSP is for people who understand Java buzzwords and PHP is like a joke compared to JSP, eh?
Smalltalk "buzzwords", actually, that predate Java by years. If you don't know what MVC is, go and look it up. If you still think it's rubbish, then I hope I never have to use any of your code. As for POJOs, they are classes that encapsulate some functionality (such as user administration, for example) but contain no presentation logic such as a user interface. Multiple interfaces can be created (Swing, JSP, etc.) that sit on top of the POJOs, saving you from having to refactor or duplicate the code.
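A small illustration of that POJO point, using user administration as the example domain: the business class knows nothing about presentation, and two independent views sit on top of it. `UserDirectory` and `UserViews` are hypothetical names; a Swing or JSP front end would call `listUsers()` in the same way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical POJO: user-administration logic only, no presentation code.
final class UserDirectory {
    private final List<String> users = new ArrayList<>();

    void addUser(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name is required");
        }
        users.add(name);
    }

    List<String> listUsers() {
        return Collections.unmodifiableList(users);
    }
}

// Two independent presentation layers built on the same POJO; neither one
// duplicates the business logic above.
final class UserViews {
    static String renderText(UserDirectory directory) {
        return String.join("\n", directory.listUsers());
    }

    static String renderHtml(UserDirectory directory) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String user : directory.listUsers()) {
            html.append("<li>").append(escape(user)).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    // Minimal escaping so user names cannot inject markup into the HTML view.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```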
> Because it's easier to suck in writing at least portable code and therefore using Java instead of being a real programmer and using PHP (which is portable), eh?
I can write portable code in C or C++, however I have to do a lot of testing. With Java my testing is much more straightforward, as I can even use the same test suites on different platforms (JUnit and JMeter for example). As for PHP, I'll wait until it can do some more of the things that Java can rather than being just a piss-poor web templating language.
This test is not completely invalid! His "EJBs" will actually outperform real EJBs because they're doing less work. He's just illustrating a point in the interface design: an EJB is supposed to have a coarse-grained interface, so it returns all the presidents as an array in a single call. His benchmark demonstrates how such a design can cause memory problems.
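The interface-design point can be sketched with two method shapes: a coarse-grained, EJB-style call that returns the whole array, and a callback that hands rows to the caller one at a time. These are plain Java interfaces, not actual EJBs, and `PresidentSource`, `findAll`, `streamAll` and `GeneratedPresidents` are hypothetical names.

```java
import java.util.function.Consumer;

// Hypothetical interface shapes. The coarse-grained method must build the whole
// result before it returns; the callback method can hand each row to the caller
// as soon as it is produced.
interface PresidentSource {
    String[] findAll();                        // store-and-forward: entire array at once
    void streamAll(Consumer<String> consumer); // streaming: one row per callback
}

// Hypothetical implementation that generates rows instead of reading a database.
final class GeneratedPresidents implements PresidentSource {
    private final int count;

    GeneratedPresidents(int count) {
        this.count = count;
    }

    @Override
    public String[] findAll() {
        String[] all = new String[count]; // memory held here grows with `count`
        for (int i = 0; i < count; i++) {
            all[i] = row(i);
        }
        return all;
    }

    @Override
    public void streamAll(Consumer<String> consumer) {
        for (int i = 0; i < count; i++) {
            // Each row can be garbage-collected once the consumer is done with it,
            // unless the consumer itself keeps a reference.
            consumer.accept(row(i));
        }
    }

    private static String row(int i) {
        return "President #" + (i + 1);
    }
}
```

In a servlet, the consumer would write each row straight to the response, so per-request memory stays flat instead of growing with the row count.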
The real flaw in his benchmark is that he didn't publish the heap size he was using. We run JVMs with 1.7 GB heaps around here, and I'll bet his benchmark would be able to hold so many of those arrays with a heap that large that he would saturate his I/O before hitting an OutOfMemoryError.
The streaming design obviously uses less memory, but the design constraints it imposes are not trivial. You would only code this way if you really needed the scalability.
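The heap-size question and the asymptotic argument in this thread can be put as a rough, back-of-the-envelope model. All symbols are assumptions introduced here: $c$ concurrent requests, $n$ rows per response, $s$ average bytes per materialized row, $f$ the number of rows the driver buffers per fetch (fixed, independent of $n$), and $H$ the heap left over for result data. Fixed per-request overhead is ignored.

$$
\begin{aligned}
M_{\text{store}}(c,n) &\approx c\,n\,s = \Theta(c\,n), \\
M_{\text{stream}}(c,n) &\approx c\,f\,s = \Theta(c), \\
c_{\max,\text{store}} &\approx \frac{H}{n\,s}, \qquad
c_{\max,\text{stream}} \approx \frac{H}{f\,s}, \qquad
\frac{c_{\max,\text{stream}}}{c_{\max,\text{store}}} \approx \frac{n}{f}.
\end{aligned}
$$

Under this model, a larger heap raises both concurrency ceilings in proportion to $H$ but leaves their ratio unchanged; whether I/O saturates first, as suggested above, depends on where that ceiling lands. A remote call that also serializes a copy of the result only multiplies $M_{\text{store}}$ by a constant factor.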
After reviewing the comments, I realize I could have been clearer as to intent in the article. Unfortunately, most of the comments here miss the mark. Only sonofagunn "gets" the point.
The article compares two classes of algorithms with asymptotically different memory usage: traditional store-and-forward approaches versus a streaming architecture. It is similar to comparing quicksort with bubble sort and saying that one is an O(n log n) algorithm (on average) while the other is O(n^2). The point of the practical example is to "prove" that the difference does indeed exist. The actual numbers mean little; it is their relative values that are of interest. Let me take a sampling of the comments here and try to respond:
> "A new DriverManager and a new db connection for every request? Welcome to 1998."
Not really relevant. The article mentions this was done for convenience, and all three implementations acquire the JDBC connection in the same manner. Therefore, the relative performance numbers will not be skewed.
> "Even though they return 1000 rows, 50 requests per minute is pretty poor. Voca processes 80 million bank payment per day using Spring."
Completely irrelevant. This test platform was a Core 2 Duo with a bunch of other services running. Run it on a big, beefed-up server and you will get much higher throughput. Again, the point is the comparison between the numbers obtained for each implementation, not the absolute numbers, as sonofagunn pointed out.
> "Anyone interested, please check the source code. If I understand it correctly, in his benchmark, he is not using any EJBs, he simply "simulate" this by adding serialization+deserialization to the code. It's quite optimistic to call this benchmark at all. Is it really that surprising that when I order computer to do X and then do X + something more, that it will be slower than the first case?"
What is the point? An EJB call is like a method call plus some container lookup overhead. If you were to use real EJBs, it would slightly reduce the throughput numbers for the remote and local EJB implementations, although they would remain asymptotically similar.
> "Adding layers to architecture is not primarily done to increase performance, but to create clean and easy to maintain design. If the implementation is not performing as required, it should be profiled and only then critical parts should be optimized for performance. If somebody in my team would dare to write code similar to this "streaming architecture" (read: plain old servlet with database access and model object polluted with html tags) it would be his last contribution to the project."
I'm calling BS here. Claiming that architectural layers are added primarily to "create clean and easy to maintain design" is ludicrous. That can be one motivation, but not necessarily the only one, and definitely not always the dominant one: For example, in theory, J2EE's decoupling of the web and application tiers is primarily meant to improve scalability (though that rarely turns out to be the case).
> 3. Why wasn't Java 1.5 tested? By definition, Java 1.4 means that you're testing vs. EJB 2.x instead of EJB 3.x. I don't know what changes have been made between the two, as I haven't learned EJB, but I'm assuming there have been some changes between the two, for better or for worse.
I should have mentioned the testbed specs in more detail. This example was tested using Tomcat 5.x running Java 6, milestone 2, with the default memory allocation for java.exe (128 MB, I think, but I would have to check). However, you could run this test on WAS 3 and JDK 1.2 and you would get similar results. You could raise the memory to 8 GB, and, with high enough concurrency, you would still see the EJB implementations fail while the streaming servlet continues to perform. Hell, you could switch to PHP, COBOL, ASM, Perl, or Ruby and confirm that a streaming architecture scales better than a store-and-forward approach.
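Since the unreported heap size keeps coming up, one low-effort fix is to log the JVM's heap ceiling alongside each run. A minimal sketch using `Runtime.maxMemory()`; the `-Xmx` option sets that ceiling on HotSpot-based JVMs. `HeapReport` is a hypothetical name.

```java
// Hypothetical helper for recording the JVM's heap settings next to benchmark results.
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    // Runtime.maxMemory() is the most memory the JVM will attempt to use
    // (Long.MAX_VALUE if there is no inherent limit); totalMemory() - freeMemory()
    // approximates what is currently in use.
    static String describe() {
        Runtime runtime = Runtime.getRuntime();
        long max = runtime.maxMemory();
        long used = runtime.totalMemory() - runtime.freeMemory();
        return "max heap: " + (max / MIB) + " MiB, in use: " + (used / MIB) + " MiB";
    }

    public static void main(String[] args) {
        // For example: java -Xmx128m HeapReport  (sets the heap ceiling explicitly)
        System.out.println(describe());
    }
}
```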
As sonofagunn points out, streaming architectures are not rocket science, but EJBs preclude their use (you cannot stream between the application and web tiers using EJBs, AFAIK), and therefore, in the Java EE world, their potential utility is often overlooked.
3 things about computers: they're alive, they're self-aware, and they hate your guts.
The test is invalid. He's actually doing far MORE work with his "EJB" than a "real" EJB application would. I've written a version of his application, using actual EJBs and a decent architecture. The memory usage is nowhere near "worse" with the EJB version, because the container caches the entities. The scalability he illustrates is rubbish.
Well said. PHP is great for "toy" web sites (by toy, I mean non-scalable SMB at most).
There are lots of things Java offers beyond the language itself, and they are what really won the enterprise space. C/C++/C#/.NET/PHP/Ruby/RoR do not have them. Basically, it's this: you can write relatively complex code quickly and easily, and have it scale while being HA (highly available) and highly reliable. That doesn't mean dweeb code monkeys can produce such code, however. The terms "quickly" and "easily" are highly subjective and only apply to those of appropriate skill; otherwise such things are still "hard", "difficult", or "non-intuitive".
I wish I'd had mod points, you deserve a few.
The cesspool just got a check and balance.
> ...on an AS/400 workstation (if there is such a thing)
No, there isn't. Server only.
rd
You may also want to try GlassFish. An updated version of the Java EE 5 SDK was released today. It is free. Sun's Application Server (9.0 Update 1 Patch 1), based on Project GlassFish, is included in the SDK. It contains a performance bugfix that enables record-breaking price/performance on the application tier, with a SPECjAppServer result of 521.42 JOPS@Standard; see Scott's latest blog for all the details.
Non-scalable like Yahoo, the world's busiest web site, you mean? Yeah, thought so. Don't get me wrong, PHP has many problems[1]. The biggest of them is probably the ease with which an incompetent moron can create terrible code (a trait it shares with Perl). But to claim it doesn't scale is pure religious bigotry rather than being based in fact.
[1] As indeed does Java. But that's a discussion for another day.
"The invisible and the non-existent look very much alike." -- Delos B. McKown
I'll still maintain that PHP is not inherently scalable for applications without major hack-type gyrations. The persist-to-DB method discussed here works for low-volume, high-latency applications. While PHP may be great at serving semi-stale or static content, high-volume HA systems are not its forte at this time. Even if you could work around those issues, why force a square peg into a round hole when standard, relatively easy-to-use solutions exist? (Look at WebLogic or Resin to see high-volume, high-performance HA capabilities, and you'll understand one reason why Java holds the sway it does in large enterprises.)
PHP certainly does not have a monopoly on the ease with which an incompetent moron can create terrible code. Java, C++, and C# all have their hats in that ring as well, given the slavish devotion to bad patterns witnessed in all of them. I'll never figure out why someone wants to have 400 cookie-cutter classes with massive overlap in code (read: cut and paste) for something that could easily be coded in 100 lines or fewer in a single class, all for the sake of following a pattern with "type safety", which is completely bogus, since objects are serialized or marshalled across the wire via binary or XML streams anyway. It merely creates 400 points for 400 errors, versus a single point with a single error. I know which system I'd rather work on.