Java EE & Streaming Architectures
Amin Ahmad writes "Implementing a streaming architecture on a Java EE application server provides asymptotically better memory performance, and hence scalability, than the current, widely implemented Java EE patterns endorsed by Sun. This article provides a concrete implementation of a streaming architecture and compares its scalability to two other standard implementations: Remote EJB and Local EJB-based solutions. The implementation based on a streaming architecture comes out the hands-down winner: for example, when sending back 300 rows of data to the client, the Local EJB solution fails beyond 16 concurrent users, whereas the streaming solution is still running at 128 concurrent users!
The article includes complete source code and the entire results database for the stress test. I would be interested in hearing your feedback."
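For readers who haven't opened the article, here is a minimal sketch of the two patterns being compared. It is not the article's code: `RenderPatterns` and its methods are hypothetical names, an `Iterator<String[]>` stands in for a JDBC `ResultSet`, and a `PrintWriter` stands in for what `HttpServletResponse.getWriter()` would hand a servlet. The only difference between the two methods is whether all rows are collected before anything is written.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the article's code. An Iterator<String[]> stands in
// for a JDBC ResultSet, and a PrintWriter for HttpServletResponse.getWriter().
class RenderPatterns {

    // Store-and-forward: read every row into a list first (as a DTO list handed
    // back across a tier boundary would be), then render it. The whole result
    // set is referenced at once, so per-request memory grows with n.
    static int storeAndForward(Iterator<String[]> rows, PrintWriter out) {
        List<String[]> buffer = new ArrayList<>();
        while (rows.hasNext()) {
            buffer.add(rows.next());
        }
        for (String[] row : buffer) {
            writeRow(row, out);
        }
        return buffer.size();
    }

    // Streaming: write each row as soon as it is read. The application only
    // holds the current row, so per-request row memory stays constant in n.
    static int streaming(Iterator<String[]> rows, PrintWriter out) {
        int written = 0;
        while (rows.hasNext()) {
            writeRow(rows.next(), out);
            written++;
        }
        return written;
    }

    // Renders one table row. HTML escaping is omitted to keep the sketch short.
    private static void writeRow(String[] row, PrintWriter out) {
        out.print("<tr>");
        for (String cell : row) {
            out.print("<td>");
            out.print(cell);
            out.print("</td>");
        }
        out.print("</tr>\n");
    }
}
```

Both methods produce identical output; what differs is how much of the result set is alive at the same time while a request is in flight.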
The article could compare EJBs against ANYTHING and conclude that the alternative is faster. I love Java and have successfully built several websites with it, but I feel EJBs are the antichrist. I avoid EJBs like the plague.
Anyone interested, please check the source code. If I understand it correctly, his benchmark does not use any EJBs at all; he simply "simulates" them by adding serialization and deserialization to the code. It's quite optimistic to call this a benchmark at all. Is it really that surprising that when I order the computer to do X, and then to do X plus something more, the second case is slower?
Adding layers to an architecture is not primarily done to increase performance, but to create a clean, easy-to-maintain design. If the implementation is not performing as required, it should be profiled, and only then should the critical parts be optimized for performance. If somebody on my team dared to write code similar to this "streaming architecture" (read: a plain old servlet with database access and a model object polluted with HTML tags), it would be his last contribution to the project.
> Java vs typical web languages isn't the sort of fight
> Java minds on a performance front. Performance certainly
> isn't the thing that makes a servlet/jsp design suck.
Well said indeed. The bottleneck there is rarely the VM, it's the database queries or the network connections or disk I/O or something else. That's the strange thing about the "x doesn't scale" argument where x == Ruby|Python|Java - once you get a certain number of servers, the speed at which the language opcodes are being processed fades into the background.
> Find someone who switched from Java to flash for performance.
> It's got flexible networking and rendering that wipe the
> floor with flash.
I think Flash has an edge here in that more people seem to know how to do whizzy graphics in Flash than in Java. My perception is that not many folks have done much work with the Java 2D API.
But Flash is starting to get some good frameworks for user interfaces - witness ActionStep. We're using it for indi and it's working out well.
Another obvious problem is that he's "testing EJBs" by completely ignoring them. He uses Java 1.4, sure, but the bigger problem is that he's not actually using EJBs at all - he's fetching the data directly from the DB in the servlet, and his "local EJB" returns the data immediately while his "remote EJB" serializes it.
This isn't what EJB does.
This is a retarded "benchmark." A real benchmark might be useful, but it'd require actually figuring out what kind of EJB implementation would make sense on the backend, remote or local. This test is completely invalid, I'm afraid.
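To make the objection concrete, here is a sketch of what the two simulated "EJB" variants amount to as the thread describes them: the local one returns the data directly, and the remote one pushes it through a serialization round trip. No container is involved, and `SimulatedFacade`, `localCall`, and `remoteCall` are hypothetical names rather than anything from the article's source. Note that either way the full result list exists before the web tier sees it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical sketch of the benchmark's simulated tiers; no EJB container.
class SimulatedFacade {

    // "Local EJB": hands the caller the same list object (pass by reference).
    static ArrayList<String> localCall(ArrayList<String> rows) {
        return rows;
    }

    // "Remote EJB": marshal the result to bytes and unmarshal a copy
    // (pass by value). Briefly, the original list, its serialized bytes,
    // and the copy can all be reachable at the same time.
    @SuppressWarnings("unchecked")
    static ArrayList<String> remoteCall(ArrayList<String> rows) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(rows);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ArrayList<String>) ois.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A real remote EJB call adds container and network work on top of this, so the sketch is a lower bound on the extra cost, not a model of an actual application server.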
After reviewing the comments, I realize I could have been clearer as to intent in the article. Unfortunately, most of the comments here miss the mark. Only sonofagunn "gets" the point.
The article compares two classes of algorithms with asymptotically different memory usage: traditional store-and-forward approaches versus a streaming architecture. It is similar to comparing quicksort with bubble sort and saying that one is an O(n log n) algorithm while the other is O(n^2). The point of the practical example is to "prove" that the difference does indeed exist. The actual numbers mean little; it is their relative values that are of interest. Let me take a sampling of the comments here and try to respond:
> "A new DriverManager and a new db connection for every request? Welcome to 1998."
Not really relevant. The article mentions that this was done for convenience, and all three implementations acquire the JDBC connection in the same manner. Therefore, the relative performance numbers will not be skewed.
> "Even though they return 1000 rows, 50 requests per minute is pretty poor. Voca processes 80 million bank payments per day using Spring [infoq.com]."
Completely irrelevant. This test platform was a Core 2 Duo with a bunch of other services running. Run it on a big, beefed-up server and you will get much higher throughput. Again, as sonofagunn pointed out, the point is the comparison between the numbers obtained for each implementation, not the absolute numbers.
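The relative-versus-absolute argument can be put as a back-of-the-envelope memory model. This is a sketch under stated assumptions, not a reconstruction of the article's measurements: it assumes store-and-forward keeps all n rows per in-flight request, streaming keeps roughly one, and the heap size and bytes-per-row figures below are made up. `MemoryModel` and its methods are hypothetical.

```java
// Back-of-the-envelope model of the asymptotic argument. All inputs are
// hypothetical; nothing here is taken from the article's measurements.
class MemoryModel {

    // Store-and-forward: every in-flight request holds all n rows.
    static long storeAndForwardBytes(long users, long rows, long bytesPerRow) {
        return users * rows * bytesPerRow;
    }

    // Streaming: every in-flight request holds about one row at a time.
    static long streamingBytes(long users, long bytesPerRow) {
        return users * bytesPerRow;
    }

    // How many concurrent requests fit in a heap budget, given the
    // per-request cost (integer division rounds down).
    static long maxConcurrentUsers(long heapBytes, long bytesPerRequest) {
        return heapBytes / bytesPerRequest;
    }
}
```

With purely illustrative inputs of 64 MB (67,108,864 bytes) for row data, 300 rows per response, and 1,000 bytes per row, `maxConcurrentUsers` gives 223 for store-and-forward and 67,108 for streaming. Changing the heap size changes both figures, but their ratio stays near n (300 here), which is why a bigger server does not close the gap.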
> "Anyone interested, please check the source code. If I understand it correctly, in his benchmark, he is not using any EJBs, he simply "simulate" this by adding serialization+deserialization to the code. It's quite optimistic to call this benchmark at all. Is it really that surprising that when I order computer to do X and then do X + something more, that it will be slower than the first case?"
What is the point? An EJB call is like a method call plus some container lookup overhead. If you were to use real EJBs, it would slightly reduce the throughput numbers for the remote and local EJB implementations, although their asymptotic behavior would remain similar.
> "Adding layers to architecture is not primarily done to increase performance, but to create clean and easy to maintain design. If the implementation is not performing as required, it should be profiled and only then critical parts should be optimized for performance. If somebody in my team would dare to write code similar to this "streaming architecture" (read: plain old servlet with database access and model object polluted with html tags) it would be his last contribution to the project."
I'm calling BS here. Claiming that architectural layers are added primarily to "create clean and easy to maintain design" is ludicrous. That can be one motivation, but it is not necessarily the only one, and definitely not always the dominant one. For example, in theory, J2EE's decoupling of the web and application tiers is primarily meant to improve scalability (though that rarely turns out to be the case).
> 3. Why wasn't Java 1.5 tested? By definition, Java 1.4 means that you're testing vs. EJB 2.x instead of EJB 3.x. I don't know what changes have been made between the two, as I haven't learned EJB, but I'm assuming there have been some changes between the two, for better or for worse.
I should have described the testbed in more detail. This example was tested using Tomcat 5.x running Java 6 milestone 2, with the default memory allocation for java.exe (128 MB, I think, but I would have to check). However, you could run this test on WAS 3 and JDK 1.2 and get similar results. You could raise the memory to 8 GB and, with high enough concurrency, you would still see the EJB implementations fail while the streaming servlet continues to perform. Hell, you could switch to PHP, COBOL, ASM, Perl, or Ruby and confirm that a streaming architecture scales better than a store-and-forward approach.
As sonofagunn points out, streaming architectures are not rocket science, but EJBs preclude their use (AFAIK, you cannot stream between the application and web tiers using EJBs), and therefore, in the Java EE world, their potential utility is often overlooked.
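The tier-boundary point can be pictured as a difference in method shape. The sketch below is an illustration under assumptions, not an EJB API: `OrderService`, `findAll`, and `streamAll` are hypothetical, a `List<String>` stands in for a database cursor, and it uses Java 8's `Consumer` for brevity even though the thread predates it. A request/response business method has to return a complete value, while a callback lets the data source push each row to the caller as it is read, which only works here because caller and service share a process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical service shapes for illustration; neither is an EJB API.
class OrderService {
    private final List<String> source; // stands in for a database cursor

    OrderService(List<String> source) {
        this.source = source;
    }

    // Request/response shape: the complete result must be built before the
    // caller receives anything, as with a business method's return value.
    List<String> findAll() {
        return new ArrayList<>(source);
    }

    // Callback shape: each row is pushed to the sink as soon as it is read,
    // so the caller can write it out immediately. This relies on caller and
    // service running in the same process.
    void streamAll(Consumer<String> sink) {
        for (String row : source) {
            sink.accept(row);
        }
    }
}
```

In a servlet, the callback form could be driven with something like `PrintWriter out = response.getWriter(); service.streamAll(out::println);`, so rows reach the response without ever being collected into a list.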