Should A High-Profile Media Website Abandon Java?
"It is all hugely expensive to license and to run, and it's not very scalable. We'd like to up our userbase from several tens of thousands to ten times that number - but the cost of scaling the Java/Solaris infrastructure is not trivial, because the Java servlet architecture costs too much in memory and execution time (creating several 100Ks of in memory objects for each logon is expensive stuff!). On current hardware we can support only 1200-1500 concurrent logins and scaling up requires a new app server (eg 1 processor + 1GB RAM) and a $20K software license for each additional 600-750 concurrent logged in users. And in today's 'cost per active subscriber' economics it doesn't add up - we cannot justify the present cost structure, by any rational measure, even before we try to scale it up.
So we're thinking of chucking it out and replacing it with a largely static site that is generated (written out to cache) from a new, simpler content management system. The few dynamic elements would be assembled using simple PHP scripts, frontending our existing Oracle DB server. We reckon we could serve vastly higher numbers, ten to a hundred times as many, of users on the same (or cheaper!) hardware: and it would be simpler by far to build and maintain and support.
I, personally, believe that the benefits of the Java system (rapid prototyping, development) are not important when large scale deployment is the issue. I am (as a user) fed up with large, poorly performing Java-based websites. My beef is not about Java the language though - it's a question of appropriateness. Fifteen years ago we'd prototype in Smalltalk and then code for deployment in C, and I feel the same applies here. The economics of the noughties do NOT support spending massive amounts of money on web infrastructure, unless the transactional revenue justifies it. Of course, most businesses generally don't justify it, in my opinion.
Our outsourcing partner who supports and maintains the architecture thinks we are crazy. Putting their potential loss of revenue aside they are hugely concerned that we'll not be able to support what we create. They are seriously against this idea.
I remember, prior to Java & the like, supporting simple CGI websites with tens & hundreds of thousands of users off of cheap FreeBSD systems, and we didn't have to pay an outsourced partner to do it.
So what does Slashdot think? What would you do if you were in the same boat?"
Seems like your problem is one of architecture and not the underlying platform. You suggest that you would move away from a dynamic site built with J2EE to a generated static site built with PHP. If you really feel having a generated static site is the best way to go, then why not leverage your existing Java infrastructure and have it generate a static site instead of serving a dynamic one? And if you can leverage your existing code base for that, then writing a new code base could still be done in Java, so I am not sure why you are pointing to Java as the problem.
With all the above being said, I don't know what is wrong with your system, but it isn't that hard to build a dynamic site in Java that meets your scalability needs. All you need is a good caching strategy and you are set. Generally speaking, a good caching strategy coupled with a dynamic site can lead to performance as good as or better than a static site.
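To make "a good caching strategy" a bit more concrete, here is a minimal sketch of a time-to-live cache for rendered pages. It is written in Python purely for brevity; the same idea applies to a servlet. The names `PageCache` and `render_front_page` are made up for illustration, and the 60-second lifetime is an arbitrary assumption.

```python
import time


class PageCache:
    """Tiny in-memory cache for rendered pages with a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # url -> (expires_at, html)
        self.hits = 0
        self.misses = 0

    def get(self, url, render):
        """Return the cached page, calling render(url) only when it is missing or stale."""
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        html = render(url)
        self._store[url] = (now + self.ttl, html)
        return html


def render_front_page(url):
    # Hypothetical stand-in for the expensive part: database queries plus templating.
    return "<html><body>Front page for " + url + "</body></html>"


cache = PageCache(ttl_seconds=60)
for _ in range(1000):
    cache.get("/index", render_front_page)
print(cache.hits, cache.misses)
```

With a thousand requests inside the lifetime, the expensive render runs once and the rest are served from memory, which is roughly the effect a static site gives you.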
Sorry dude, but you're going to get many of these:
For your case, I hope I'm wrong. Just out of curiosity, have you considered, I dunno, profiling your application to see where the time's being spent? Also, how about switching to a more cost-effective Java platform? I've heard good things about this little thing called Linux.
But your current solution does work, right? How much exploration have you done in optimizing the application? Oracle can certainly scale, and you're already willing to strip down the site to mostly static pages. Why throw out all that proven and thoroughly tested code? And if your outsourcer can't do that, maybe it's time to switch partners, not platforms. You have a large investment here, and there's a good chance you can save most of it.
Care about electronic freedom? Consider donating to the EFF!
Python is a refactorer's dream. You can transition your Java application to Jython re-using your Java classes while ironing out the bugs and design of the Python code, implementing caching, static HTML generation and the like.
When you're done, swap the JVM out of Jython and run pure Python with debugged code. If Python gives you any performance trouble, write small C-based modules for your frequently used code and wrap it in Python (fairly easy to do).
You should be able to deal with a lot of your scalability issues by putting some kind of cache in front of your system, like Squid.
But it sounds like every page on your site is really dynamic. And thus uncacheable.
But you want to replace it with a mostly static site, so obviously, not all that dynamic stuff is required.
Before you chuck the baby out with the bathwater:
- Can you revise your existing Java site to serve most pages as essentially static?
- If so, will putting some cheap Squid cache boxes in front of your main servers do the trick?
This technique really works, if you can do it.
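For a shared cache to help, the application has to say which responses are safe to reuse. A rough sketch, assuming the front-end cache honours the standard HTTP `Cache-Control` header; the function name, the example paths and the 300-second lifetime are all hypothetical:

```python
def cache_control(personalised, max_age_seconds=300):
    """Return a Cache-Control header value for a response."""
    if personalised:
        # Per-user content must never be stored by a shared cache.
        return "private, no-store"
    # Anonymous content may be stored and reused by a shared proxy.
    return f"public, max-age={max_age_seconds}"


# Hypothetical pages: path -> whether the content differs per user.
pages = {
    "/news/today": False,
    "/account/settings": True,
}
for path, personalised in pages.items():
    print(path, "->", cache_control(personalised))
```

The more pages that can honestly be marked public, the less traffic ever reaches the app servers.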
---
I support spreading santorum
On current hardware we can support only 1200-1500 concurrent logins and scaling up requires a new app server (eg 1 processor + 1GB RAM) and a $20K software license for each additional 600-750 concurrent logged in users
I'm afraid your company should seriously consider another J2EE platform, rather than uprooting your existing architecture.
First of all, fuck SUN. I'm biased, of course, because I'm here to be pro-Linux in this case. SUN's J2EE app server is almost the most expensive among its competitors, not to mention the incremental maintenance cost incurred by expensive SUN hardware. Nowadays big corps like IBM and HP offer enterprise support for J2EE on Linux platforms, and their support is M3(24/7) with at least 3 9's maintenance.
Also, you don't pay per user for large scale web deployment, you pay per server license. Fuck SUN's sales multiple times for not reminding you of better license terms for your new deployment.
I remember, prior to Java & the like, supporting simple CGI websites with tens & hundreds of thousands of users off of cheap FreeBSD systems, and we didn't have to pay an outsourced partner to do it.
You're just going backward in this case. The J2EE platform exists to solve various problems with CGI. One of our deployments just switched from CGI to J2EE because the former became unstable when handling a high volume of requests. Of course, I've been told of many successes with CGI, but J2EE seems to fit in this case.
Besides, I don't understand why you have a scale-up problem with J2EE. Scalability is the major advantage of J2EE. In our most recent project, we decoupled the RDBMS (Oracle), web tier (Apache), app server (9iAS) and EJB containers (OC4J) into 4 separate Linux cluster pools and one shared storage of SCSI raw disks. We could easily scale up our architecture to meet various requirements.
Instead, try to improve your current solution and bring its cost down:
Good Luck!
Stupidity is mis-underestimated.
Second: profile, profile, profile
Third: well, almost anybody that has used a J2SDK (or JRE) on Solaris knows about its problems. Try running Volano's benchmark to learn more about this. But as with any benchmark, please don't believe your software will perform the same way the benchmark does. It is only indicative.
There is a memo about this problem, supposedly from Sun. If the problem really exists (I know it does, but you should find out for yourself), you'll know your Solaris servers will not deliver as many transactions as other servers of equivalent processing power.
If your concerns are all about costs, you should run tests with x86 solutions. Some big players like IBM and HP will let you run some tests on a test machine (especially if your transition is successful and you let them put your case in an ad ;-)
Think of the risks that a rewrite introduces:
you break existing business logic with the new implementation
you build a system that is slower than the existing one
you take way too long to finish it, all the while you have to pay your existing licences
Typically, the argument for a rewrite is that the cost of implementing new functional requirements is higher than the cost of just implementing them in a brand new system. Have you tried optimization? How are you maintaining session state? Do you know what it would take to get your app running in a free container? Have you looked at free/cheap caching APIs?
Further, java may not be the out-of-the-box fastest platform, but there is no reason whatever that it can't scale in an environment designed for it. Yes, you may need to have many smaller machines because of jvm memory issues, but that's exactly what you should want. The ideal situation is when you can say, 'if we need to support 10x the current users, we just need to drop in 10x more app servers.' It's called 'scaling linearly.'
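As a back-of-the-envelope illustration of what linear scaling costs under the licensing described in the question, here is a small sketch. It assumes concurrent logins grow in proportion to the userbase (ten times), which the original post does not state, and it counts licence fees only, not hardware.

```python
import math

LICENSE_USD = 20_000  # per additional app server, from the original post


def extra_servers(current_users, target_users, users_per_server):
    """Number of additional app servers needed to reach the target concurrency."""
    return max(0, math.ceil((target_users - current_users) / users_per_server))


# (current concurrent logins, users per extra server): low and high ends quoted in the post.
for current, per_server in ((1200, 600), (1500, 750)):
    target = current * 10  # assumption: concurrency scales with the userbase
    n = extra_servers(current, target, per_server)
    print(current, target, n, n * LICENSE_USD)
```

Either end of the quoted range works out to 18 extra servers and $360,000 in licences, so the cost does scale linearly; the question is whether the per-server price is acceptable.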
There aint no pancake so thin it doesn't have two sides.
Ditch Solaris and go with Linux or FreeBSD on Intel hardware. Amazon and AOL did it and saved buckets of money, so you should feel confident that you can do it too.
It sounds like you have a shop full of good Java people. While you may want to change how things are run, I would not change languages. This is not based on any love for Java, but on the fact that getting a team of Java programmers to the point where they can write top-flight code in something else will take time that you could better spend elsewhere.
But I would consider changing the architecture if that makes sense.
Erlang Developer and podcaster
You have a J2EE (or something like it) based application that is non-portable in both its host OS and host application server and on top of that doesn't scale too well because of memory/CPU requirements?
Hmmm... somebody should be losing their job. Either the consultants who built it or the person that approved such a thing.
If you had a true J2EE app that wasn't coded by a team of monkeys on a wild rampage this shouldn't be a problem. The "porting" to a new app server should be trivial, if anything has to be done at all. You'd be able to keep the Sun hardware and whatever app server you use on it, while chucking in Linux machines with BEA, WebSphere or maybe JBoss on them. Slap a hardware based load balancer in front of it and voilà, horizontal scalability.
I didn't see anybody else take offense to this, but 100k+ per user login memory usage? I might be showing my age, or rather my roots, but that seems excessive. My guess is your app's written like the app I now support. User logs in and everything about them is swallowed up into session (or application) wide collections immediately. The "lazy caching" thing just didn't cross these guys' minds. Of course, in my case neither did the "mark data dirty" thing but that's another matter.
Please, somebody show me 100k worth of data that you would really want on-demand from-memory on a user at any given instant. Just a C struct or something would suffice.
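For anyone wondering what "lazy caching" and "mark data dirty" look like in practice, here is a minimal sketch in Python (the idea carries over directly to a servlet session). The class, the loader functions and the sample data are all hypothetical.

```python
class LazySession:
    """Per-user session that loads each piece of data only on first use."""

    def __init__(self, user_id, loaders):
        self.user_id = user_id
        self._loaders = loaders  # name -> function(user_id) returning the value
        self._cache = {}
        self._dirty = set()

    def get(self, name):
        # Nothing is fetched at login; the loader runs on first access only.
        if name not in self._cache:
            self._cache[name] = self._loaders[name](self.user_id)
        return self._cache[name]

    def set(self, name, value):
        # Remember which entries changed so only those are written back.
        self._cache[name] = value
        self._dirty.add(name)

    def dirty_items(self):
        return {name: self._cache[name] for name in sorted(self._dirty)}


calls = []


def load_profile(user_id):
    calls.append("profile")  # hypothetical database read
    return {"id": user_id, "name": "example"}


def load_history(user_id):
    calls.append("history")  # hypothetical database read
    return ["order-1", "order-2"]


session = LazySession(42, {"profile": load_profile, "history": load_history})
session.get("profile")
session.get("profile")
session.set("theme", "dark")
print(calls, session.dirty_items())
```

Here the history is never loaded because nothing asked for it, and only the changed entry would need to be persisted at logout.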
J2EE apps can be bleeding fast ultra lean sons of bitches if you do it up right. It can also be a dog-ass slow memory-hogging bastard. It just depends on who you had at the business end of the whiteboard I guess.
Going the PHP/static generation/caching route isn't necessarily a bad idea either... but I don't think you should have to do this. I'm seeing the maintenance of such a system as a big onus on the system administrators to make sure everything is up all the time... I know of no PHP frameworks out there that would let you drop sessions from one system to another. I've never tried pushing PHP that far though.
If you're a system admin such a system might seem ideal... because while the systems and network might be a little "wonky" that's your domain and you feel comfortable supporting such a thing. I can't fault you for that; however I do think the onus is on the application development team. It is their job to make something scalable and construct it in a manner that it should fail over, recover, etc. from anything weird that may go on.
This isn't realistic, but you probably purchased a scalable application touted as portable because it's Java and you didn't get that. Demand that. If they can't deliver, boot them out the door and take it in-house if you must, but I see many obstacles in your path on the system admin side alone... and certainly the re-development of it won't be a cake walk.
Scalability problems are largely the development team's responsibility, so long as such a requirement was put forth in the original development. Good system administration can help to reduce their errors along with a good helping of hardware but that's just a bandaid to the real solution.
Just my two cents.
Also, anyone who got suckered into a Broadvision sales pitch for an enterprise solution should be shot^H^H^H^H fired on first sight.
I've worked on some very large sites with concurrent users running into the hundreds of thousands... these range from http://www.btopenworld.com/ through to the UK's Football League clubs and premium content video sites.
/., we all love it, and it's on Perl!
In my experience, Java was not the wisest choice. It was bloated, difficult to maintain (that's one that will rile the pro-Java camp), required too much focus on non-business areas (i.e. creating things like session pools and encryption when we should have been focusing on getting the actual business requirements fulfilled), and created an object model bureaucracy (pure OO with respect to encapsulation? or break the purity of the model because you know in advance that you want 27 objects and you could get them all from one piece of SQL, but this would have presumed knowledge of the internals of the object and thus have broken the rules of encapsulation).
All in all, Java proved to be the most substantial factor in late deliveries of projects, limited scalability... and expense (you wouldn't run Java on Windows, and we were running it on some very sizeable Sun boxes). We made several major attempts at performance improvements: memory caching, singletons to persist seldom-modified data, re-working SQL, etc. But this didn't help dig us out of the hole that we were in.
As a comparison, we also ran some Windows boxes with ASP 3 code on it... used prolific file system caching, and because of poor OO support abandoned hope of properly creating encapsulation and objects purely... we did use re-usable components in DLL's, and we did do extensive work to cache page parts in both memory and on disk according to the predicted frequency of use.
Both systems were behind reverse proxy caches... but the Java had the benefit of all pages being cached (as authentication ran in an NSAPI plugin on the proxy), whereas the ASP did not have its pages cached (just the images, styles, etc) as authentication code ran in the pages (it had not moved behind the plugin when I had left the company).
Yet of these... the ASP consistently performed better on page generation times, concurrent users, etc... even though the ASP boxes were just cheap Compaq servers and the Java boxes were very over-specced Sun servers.
My experience of all of this led me to the following conclusions... which were always obvious but merely got reinforced.
1) Right tool for the right job. And at the moment that means considering things like PHP, Perl, ASP for web pages... not Java. String manipulation languages and those that are lower overhead are performing better on web sites.
2) If you do use Java, be prepared to dilute the purity of the object model you create to favour performance. DO NOT get caught in the trap that the object model purity is more important than total performance/maintenance... OO purity does not necessarily equate to maintenance increase... documentation and commenting achieves that more.
3) Cache everywhere! Parts of pages, generated pages, the images and styles used on pages, the queries in the database.
4) Control your cache flushing fiercely! Do not allow apps to ever flush anything that you are not sure has to be flushed... wild-card flushing should never occur. If you stay in Java, implement the Observer pattern and persist and serialise data everywhere.
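A rough sketch of what points 3) and 4) could look like together: page fragments are cached, each records which data it was built from, and an observer callback flushes only the fragments that depend on the record that changed. Python is used for brevity; `FragmentCache`, `Store` and the keys are invented for the example.

```python
from collections import defaultdict


class FragmentCache:
    """Caches rendered page fragments and flushes them by dependency, never by wildcard."""

    def __init__(self):
        self._fragments = {}           # fragment key -> rendered HTML
        self._deps = defaultdict(set)  # data key -> fragment keys built from it

    def put(self, fragment_key, html, depends_on):
        self._fragments[fragment_key] = html
        for data_key in depends_on:
            self._deps[data_key].add(fragment_key)

    def get(self, fragment_key):
        return self._fragments.get(fragment_key)

    def on_change(self, data_key):
        """Observer callback: flush only the fragments built from data_key."""
        flushed = self._deps.pop(data_key, set())
        for fragment_key in flushed:
            self._fragments.pop(fragment_key, None)
        return flushed


class Store:
    """Minimal observable data store (the subject in the Observer pattern)."""

    def __init__(self):
        self._data = {}
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def update(self, key, value):
        self._data[key] = value
        for callback in self._observers:
            callback(key)


cache = FragmentCache()
store = Store()
store.subscribe(cache.on_change)

cache.put("sidebar:sports", "<ul>...</ul>", depends_on=["article:7"])
cache.put("header", "<h1>Site</h1>", depends_on=["site:title"])

store.update("article:7", "new body")
print(cache.get("sidebar:sports"), cache.get("header"))
```

Updating one article drops the sidebar that used it and leaves the header cached, which is the opposite of a wildcard flush.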
Ultimately it comes down to architecture... but I have witnessed that Java encourages really strange architecture as everyone starts running after a holy grail of a pure object model.
I would generally favour not using Java and going for the re-write. Other languages encourage pure string manipulation and control of what you're doing at a far more approachable level.
Remember that you're only creating web pages:
1) Query database
2) Concat string
3) Echo string
4) ???
5) Profit!
It really isn't hard, and doesn't need rocket science. Look at
... we need highly publicised failures, to counter MS marketing.
> Don't overlook maintainability as an asset.
Bingo. Right on the Money. You are my hero.
Maintainability is not an asset to consider, it is by far the most important asset (that and a good user interface.)
I sometimes wish all these Python/PHP/Ruby/whatever dudes would learn that strong typing is an ASSET not a problem. Simply because the compiler checks a lot of structural integrity BEFORE the damn thing runs, highlighting loads of errors right there.
I know the argument that a type system is only in the way and is not really needed when the program is debugged in any case.
But a type system makes a hell of a difference when you (or your poor successor) need to change anything later, because many (if not most) of the conflicts caused by a change are IMMEDIATELY nailed down by the typechecker. The thing is, typing bugs are bugs. If you send a number to a thing that expects a database connection, Python will moan just as much as Java. The difference is Java will moan before you run. Python and PHP will not.
Don't get me wrong, I write loads of things in Ruby and Python. Most of them are small things that do a specific task, administrative scripts, that sort of thing. But for large complex systems, don't get me on a non-typed language.
That said, I wish Java would move to a type system that is much more based on type inference, such as Haskell's, where the language IS strongly typed but you pretty much don't have to worry about it since it will infer almost everything. The parts you have to specify are not too many and you get the best of both worlds: strong typing as well as non-cluttered code.
The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
From the parent post: "Anyway, slashdot is the wrong place to be looking for serious solutions to problems like yours."
Maybe so, but Slashdot is a great place to get ideas. Many times Slashdot readers have extremely useful comments because they have unique experiences and are willing to share them.
It sounds like you've got a site written around a very proprietary system, and that your scalability etc. is tied to what that proprietary system can do.
The solution, therefore, is to get away from the proprietary system. But only if you think you can do better. Either find a better proprietary system or write your own. If you write your own then plan for 'scale out' on lots of servers running something cheap like *BSD or GNU/Linux, Apache, Tomcat, JBoss, postgres, mysql, etc.
If you _can't_ get away from the proprietary app, then perhaps you can 'wrap' it in something else. Use static pages, PHP/mod_perl/C++/Lisp/jsp/whatever and a cheap but good database (mysql, postgres). Use these for all of the 'custom' content. Then have them access your 'back end' and dumb down the back end to get rid of everything that is not essential to a data feed. If possible aggregate the 'php' users into a few categories for the CMS to deal with. E.g.: have a 'sports' profile with 10,000 php users accessing a single 'sports' user on CMS.
Try negotiating with the vendor. Perhaps you can present your 'success story' at a Gartner symposium or somesuch. Complain about scalability. Demand a Linux version. Get them to agree to some unlikely performance guarantee and use that to cut costs down (via penalties). Get some free consulting from them to help fix the problems. Make sure to wear a T-shirt or use a pen from their major competitor whenever they are around -- much more fun that way.
Find a failed .com that used the same proprietary system. Buy the company for pennies on the dollar and assume their license portfolio.
Another approach is to update your resume and get the heck out of there.
"But actually trying to use m4 as a general-purpose langage would be deeply perverse" --ESR
But a type system makes a hell of a difference when you (or your poor successor) need to change anything later, because many (if not most) of the conflicts caused by a change are IMMEDIATELY nailed down by the typechecker. The thing is, typing bugs are bugs. If you send a number to a thing that expects a database connection, Python will moan just as much as Java. The difference is Java will moan before you run. Python and PHP will not.
Don't get me wrong, I write loads of things in Ruby and Python. Most of them are small things that do a specific task, administrative scripts, that sort of thing. But for large complex systems, don't get me on a non-typed language.
Please get your terminology right. Python _is_ strongly typed (as you said yourself, it _will_ moan if you try to mix incompatible types).
But it's dynamically typed and not compiled, therefore it can't complain at the compile stage. That is why you write unit tests.
And as nice as static typing (which is what you are talking about) is, it forces you to do all kinds of distracting (at least IMO) typecasts.
See for instance
http://www.artima.com/weblogs/viewpost.jsp?thread=7590
for the scoop on static vs. dynamic and strong vs. weak typing w.r.t. Python.
Oh, and you may well be right that for really huge projects, the way Java handles typing is the way to go, if only because one can't trust all programmers working on the project at any point in time to not shoot themselves in the foot.
But still, your terminology isn't right.
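A small sketch of the distinction, using the parent's own example of passing a number where a database connection is expected. The `users` table and the function are made up for illustration; `sqlite3` is from the standard library.

```python
import sqlite3


def count_users(conn: sqlite3.Connection) -> int:
    """Expects a database connection; the annotation is not enforced at run time."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")
print(count_users(conn))

# Strong typing: incompatible types are not silently coerced.
try:
    "1" + 1
except TypeError as exc:
    print("TypeError:", exc)

# Dynamic typing: the mistake below is only discovered when the line executes.
try:
    count_users(42)
except AttributeError as exc:
    print("AttributeError:", exc)
```

Both mistakes are caught, but only at run time and only if that line is actually executed, which is why the unit tests matter. An external static checker can use the annotation to flag the `count_users(42)` call before the program runs, though that is an optional extra and not part of the language itself.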
There's a lot of comments here to this effect already, I'm just going to add my voice.
If you have 100s of K per login it almost certainly isn't the platform's fault, and it probably isn't the developers' fault either - all that memory must be going to customize content for the user, which means you can trace the performance problems back to the requirements. (Your developers could be crap too, but profiling will tell you!)
If the user gets content which requires a massive amount of customisation on each and every page - and this is a requirement - then performance will suck no matter what the platform, as that memory will still need to be used.
I've been through this before with a customer who demanded we try out every app server under the sun to resolve performance problems even though we showed him profiling figures that proved only 1% of the time per request was appserver overhead - 80-85% was in the DB, and the rest was the app code. Because the customer took a "religious viewpoint" that the appserver was wrong rather than believing the profiling data, we wasted weeks.
You need to profile before you can state that java is the problem - and equally, you need to profile before you can state that it's not.
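For what "profile first" can look like at its simplest, here is a sketch using Python's standard `cProfile` and `pstats` modules. The request handler and its two stages are hypothetical stand-ins, with a sleep playing the part of the database; a Java shop would use a JVM profiler instead, but the workflow is the same.

```python
import cProfile
import io
import pstats
import time


def query_database():
    time.sleep(0.05)  # hypothetical stand-in for time spent waiting on the DB


def render_page():
    sum(i * i for i in range(20000))  # hypothetical stand-in for application code


def handle_request():
    query_database()
    render_page()


profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    handle_request()
profiler.disable()

# Show the five entries with the highest cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report shows how the time under `handle_request` splits between the two stages, which is the kind of breakdown you want in hand before blaming the platform.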
This isn't the problem.
(creating several 100Ks of in memory objects for each logon
This is.
So what does Slashdot think? What would you do if you were in the same boat?"
Don't know what /. thinks, but I think you need a serious rethink of your application's design. It sounds to me like you're throwing away exactly the advantage that servlets can give you by creating lots of objects for each logon.
Looks like I need to bring Joel Spolsky's excellent article, Things You Should Never Do, Part I, to a new readership.
;^)
The article speaks for itself, but essentially Joel's point is, "If it ain't broke, it's going to take you a heck of a lot longer to rewrite something inferior than you could've ever expected." Old code has tons of lessons learned that you'll never tease out. New code is easy to read and can implement every buzz word you'll find on O'Reilly Net right now, but it won't be battle-tested.
If you're still able to even think about throwing out your old investment and moving to CGI and BSD, however, I'm thinking your site isn't doing much that's very fancy. If you don't have much customization invested in your proprietary system, what Joel and I are saying is moot, especially at the licensing fees you're mentioning.
I'd also point out the title is very misleading. It's not Java that's the issue -- it's your system's architecture. Java is just as capable of creating a "largely static site that is generated (written out to cache) from a new, simpler content management system" as language X. This is quite similar to the discussion we had about whether Java is an SUV just a while back (if it is an SUV, btw, that's not a bad thing). Your programmers' skillset is what's most important. If they already have a familiarity with Java, why ditch it?
So, keeping true to the post that says the recommendations here come out our arse, here's another pulled from the same place:
I'd recommend trying to refactor your current codebase to do two things. First, try to implement your static page idea using your current system. Second, take out as much of the crappy, non-scalable system that happens to be written in Java as possible. You don't name the system, but the whole advantage of Java is that it doesn't need to be platform-specific (if done right). Ditch Solaris. Create a server farm of cheap x86 hardware with Linux or BSD with a JVM installed. Reread your license -- if you have thirty "clients" (new Linux servers) making static pages from one legacy server's dynamic content, can you pay a lower fee?
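The static page idea itself is not much code in any language. A minimal sketch in Python, assuming the content can be pulled out of the existing system as simple slug/body pairs (the `ARTICLES` data and both function names are hypothetical):

```python
import tempfile
from pathlib import Path

# Hypothetical content that would really come from the existing CMS or database.
ARTICLES = {
    "index": "Today's headlines",
    "sport": "Match reports",
}


def render(slug, body):
    return f"<html><head><title>{slug}</title></head><body>{body}</body></html>"


def publish(articles, out_dir):
    """Write one static HTML file per article and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, body in articles.items():
        path = out_dir / f"{slug}.html"
        path.write_text(render(slug, body), encoding="utf-8")
        written.append(path)
    return written


out = tempfile.mkdtemp()
pages = publish(ARTICLES, out)
print([p.name for p in pages])
```

Run something like this whenever content changes and any plain web server can hand out the files; the same loop could just as well live inside the existing Java code.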
PS -- Who said Java was good for prototyping? Visual Basic (and vbscript/ASP or *gulp* ColdFusion), sure. REALbasic, sure. Java? Are you folks mad?!!
It's all 0s and 1s. Or it's not.
I've done it with two projects: one was heavily bloated with EJB, the other was a typical JSP thing. In both cases I moved to Python+Zope and it was done pretty quickly and smoothly.
Well, I admit, I did it without Jython, as I found there was no need for temporary integration of old and new code aside from transparent authentication (which was simple - through LDAP). And I made sure there was no need to share any session objects in the middle of the transition.
Performance has improved (shut up about that common myth that "Zope is slow"), and so has memory usage.
So, I know in practice - it's doable.
By the way, I've never come across a situation where you'd think about rewriting some Python function in C to speed up your web server while Java was fast enough for the same (logically the same) function. In a well designed web system (including templates and database) the web server itself is rarely the problem. Plus, you can always cache something. And that is the same with Python and Java.
Less is more !
See Ace's Hardware articles on how they converted from PHP to Java/Servlets/JSP. It is a blow-by-blow walkthrough that reads like a HOW-TO:
Building a Better Webserver in the 21st Century
SPECmine - A Case Study on Optimization
Scaling Server Performance
It's 10 PM. Do you know if you're un-American?
>I sometimes wish all these Python/PHP/Ruby/whatever dudes would learn that strong typing is an ASSET not a problem.
The problem is the people who design poorly. If one writes one's software to take advantage of dynamic typing *while* making sure that the application logic is intact (e.g. calling DoSomething() only on classes that implement it), one is fine. Just because something can be used badly doesn't mean it will be.
Marxist evolution is just N generations away!