Slashdot Mirror


Should A High-Profile Media Website Abandon Java?

newbroom asks: "The company I work for runs a large, high-profile web site with users all across the world and delivers them large amounts of streaming media content plus textual stories. You might guess therefore that this is a news website, frequently updated throughout the day, and delivering content 24x365. No names, or course, for obvious reasons. We have a big, custom, Java content management system (based on a framework from a proprietary vendor as it happens, but could just as well be EJB/J2EE for all that it matters in the context of this argument) and for deployment we run our website using Java app servers on Solaris behind Apache." If you were going to take such a site from 1000 users, to 10,000 users, would you be able to do it using this kind of setup?

"It is all hugely expensive to license and to run, and it's not very scalable. We'd like to up our userbase from several tens of thousands to ten times that number - but the cost of scaling the Java/Solaris infrastructure is not trivial, because the Java servlet architecture costs too much in memory and execution time (creating several 100Ks of in memory objects for each logon is expensive stuff!). On current hardware we can support only 1200-1500 concurrent logins and scaling up requires a new app server (eg 1 processor + 1GB RAM) and a $20K software license for each additional 600-750 concurrent logged in users. And in today's 'cost per active subscriber' economics it doesn't add up - we cannot justify the present cost structure, by any rational measure, even before we try to scale it up.

So we're thinking of chucking it out and replacing it with a largely static site that is generated (written out to cache) from a new, simpler content management system. The few dynamic elements would be assembled using simple PHP scripts, frontending our existing Oracle DB server. We reckon we could serve vastly higher numbers, ten to a hundred times as many, of users on the same (or cheaper!) hardware: and it would be simpler by far to build and maintain and support.

I, personally, believe that the benefits of the Java system (rapid prototyping, development) are not important when large scale deployment is the issue. I am (as a user) fed up with large, poorly performing Java-based websites. My beef is not about Java the language though - it's a question of appropriateness. Fifteen years ago we'd prototype in Smalltalk and then code for deployment in C, and I feel the same applies here. The economics of the noughties do NOT support spending massive amounts of money on web infrastructure, unless the transactional revenue justifies it. Of course, most businesses generally don't justify it, in my opinion.

Our outsourcing partner who supports and maintains the architecture thinks we are crazy. Putting their potential loss of revenue aside they are hugely concerned that we'll not be able to support what we create. They are seriously against this idea.

I remember, prior to Java & the like, supporting simple CGI websites with tens & hundreds of thousands of users off of cheap FreeBSD systems, and we didn't have to pay an outsourced partner to do it.

So what does Slashdot think? What would you do if you, were in the same boat?"

3 of 156 comments (clear)

  1. Your problem is architecture by DevilM · · Score: 5, Interesting

    Seems like your problem is one of architecture and not the underlying platform. You suggest that you would move away from a dynamic site built with J2EE to a generated static site built with PHP. If you really feel having a generated static site is the best way to go then why not leverage your existing Java infrastructure and have it generate a static site instead of server a dynamic one? And if you can levarge your existing code base for that, then writing a new code base could still be done in Java, so I am not sure why you are pointing to Java as the problem.

    With all the above being said, I don't know what is wrong with your system, but it isn't that hard to build a dynamic site in Java that meets your scabilitiy needs. All you need is a good caching strategy and you are set. Generally speaking, a good caching strategy coupled with a dynamic site can lead to as good as or better peformance than a static site.

  2. yes, switch by consumer · · Score: 4, Interesting
    It doesn't have to be to PHP, it could just be to an open source Java platform, but get off your expensive proprietary platform before it drives you into the ground. Java has good enough performance when done well, but most commercial Java frameowrks make it hard or impossible to write anything that isn't bloated, so ditch that thing. If you are good with PHP and your site is reasonably simple that should work fine. So would Python, Perl, simple Java servlets, etc. Static publishing (from a database) is a great idea if you can get away with it.

    Ditch Solaris and go with Linux or FreeBSD on Intel hardware. Amazon and AOL did it and saved buckets of money, so you should feel confident that you can do it too.

  3. How to reduce cost and scale... from my experience by buro9 · · Score: 3, Interesting

    I've worked on some very large sites with concurrent users running into the hundreds of thousands... these range from http://www.btopenworld.com/ through to the UK's Football League clubs and premium content video sites.

    In my experience, Java was not the wisest choice, it was bloated, difficult to maintain (that's one that will rile the pro-Java camp), required too much focus on non-business focus areas (i.e. creating things like session pools and encryption when we should have been focusing on getting the actual business requirements fulfilled), created a object model bureacracy (pure OO with respect to encapsulation? or break the purity of the model because you know in advance that you want 27 objects and you could get them all from one piece of SQL, but this would have presumed knowledge on the internals of the object and thus have broken the rules of encapsulation).

    All in all, Java proved to be the most substantial factor in late deliveries of projects, limited scaleability... and expense (you wouldn't run Java on Windows, and we were running it on some very sizeable Sun boxes). We had several major works at performance improvements, memory caching, singletons to persist seldom-modified data, re-working SQL, etc. But this didn't help dig us out of the hole that we were in.

    As a comparison, we also ran some Windows boxes with ASP 3 code on it... used prolific file system caching, and because of poor OO support abandoned hope of properly creating encapsulation and objects purely... we did use re-usable components in DLL's, and we did do extensive work to cache page parts in both memory and on disk according to the predicted frequency of use.

    Both systems were behind reverse proxy caches... but the Java had the benefit of all pages being cached (as authentication ran in an NSAPI plugin on the proxy), whereas the ASP did not have its pages cached (just the images, styles, etc) as authentication code ran in the pages (it had not moved behind the plugin when I had left the company).

    Yet of these... the ASP consistently performed better on page generation times, concurrent users, etc... even though the ASP boxes were just cheap Compaq servers and the Java boxes were very over-specced Sun servers.

    My experience of all of this led me to the following conclusions... which were ever obvious but merely got re-inforced.

    1) Right tool for the right job. And at the moment that means considering things like PHP, Perl, ASP for web pages... not Java. String manipulation languages and those that are lower overhead are performing better on web sites.

    2) If you do use Java, be prepared to dilute the purity of the object model you create to favour performance. DO NOT get caught in the trap that the object model purity is more important than total performance/maintenance... OO purity does not necessarily equate to maintenance increase... documentation and commenting achieves that more.

    3) Cache everywhere! Parts of pages, generated pages, the images and styles used on pages, the queries in the database.

    4) Control your cache flushing fiercely! Do not allow apps to ever flush anything that you are not sure has to be flushed... wild-card flushing should never occur. If you stay in Java, implement the Observer pattern and persist and serialise data everywhere.

    Ultimately it comes down to architecture... but I have witnessed that Java encourages really strange architecture as everyone starts running after a holy grail of a pure object model.

    I would generally favour not using Java and going for the re-write. Other languages encourage pure string manipulation and control of what you're doing at a far more approachable level.

    Remember that you're only creating web pages:

    1) Query database
    2) Concat string
    3) Echo string
    4) ???
    5) Profit!

    It really isn't hard, and doesn't need rocket science. Look at /., we all love it, and it's on Perl!