Slashdot Mirror


Should A High-Profile Media Website Abandon Java?

newbroom asks: "The company I work for runs a large, high-profile web site with users all across the world and delivers them large amounts of streaming media content plus textual stories. You might guess therefore that this is a news website, frequently updated throughout the day, and delivering content 24x365. No names, of course, for obvious reasons. We have a big, custom, Java content management system (based on a framework from a proprietary vendor as it happens, but could just as well be EJB/J2EE for all that it matters in the context of this argument) and for deployment we run our website using Java app servers on Solaris behind Apache." If you were going to take such a site from 1000 users, to 10,000 users, would you be able to do it using this kind of setup?

"It is all hugely expensive to license and to run, and it's not very scalable. We'd like to up our userbase from several tens of thousands to ten times that number - but the cost of scaling the Java/Solaris infrastructure is not trivial, because the Java servlet architecture costs too much in memory and execution time (creating several hundred KB of in-memory objects for each logon is expensive stuff!). On current hardware we can support only 1200-1500 concurrent logins and scaling up requires a new app server (e.g. 1 processor + 1GB RAM) and a $20K software license for each additional 600-750 concurrent logged-in users. And in today's 'cost per active subscriber' economics it doesn't add up - we cannot justify the present cost structure, by any rational measure, even before we try to scale it up.

So we're thinking of chucking it out and replacing it with a largely static site that is generated (written out to cache) from a new, simpler content management system. The few dynamic elements would be assembled using simple PHP scripts, frontending our existing Oracle DB server. We reckon we could serve vastly higher numbers, ten to a hundred times as many, of users on the same (or cheaper!) hardware: and it would be simpler by far to build and maintain and support.

I, personally, believe that the benefits of the Java system (rapid prototyping, development) are not important when large scale deployment is the issue. I am (as a user) fed up with large, poorly performing Java-based websites. My beef is not about Java the language though - it's a question of appropriateness. Fifteen years ago we'd prototype in Smalltalk and then code for deployment in C, and I feel the same applies here. The economics of the noughties do NOT support spending massive amounts of money on web infrastructure, unless the transactional revenue justifies it. Of course, most businesses generally don't justify it, in my opinion.

Our outsourcing partner who supports and maintains the architecture thinks we are crazy. Putting their potential loss of revenue aside, they are hugely concerned that we'll not be able to support what we create. They are seriously against this idea.

I remember, prior to Java & the like, supporting simple CGI websites with tens & hundreds of thousands of users off of cheap FreeBSD systems, and we didn't have to pay an outsourced partner to do it.

So what does Slashdot think? What would you do if you were in the same boat?"
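The licensing figures quoted in the question can be turned into a rough estimate. The sketch below assumes that concurrent logins grow in proportion to the userbase (the question does not say so) and counts only the $20K license per added app server, not hardware or support. The function name is made up for illustration.

```python
import math

def extra_license_cost(current_users, target_users, users_per_server,
                       license_cost=20_000):
    """Return (added_servers, total_license_cost) for growing concurrent logins.

    Assumption: every added app server needs exactly one license.
    """
    extra_users = max(0, target_users - current_users)
    servers = math.ceil(extra_users / users_per_server)
    return servers, servers * license_cost

# Optimistic end of the quoted ranges: 1500 logins today, 750 per added server.
print(extra_license_cost(1500, 15000, 750))
# Pessimistic end: 1200 logins today, 600 per added server.
print(extra_license_cost(1200, 12000, 600))
```

At both ends of the quoted ranges a tenfold increase works out to 18 additional app servers, or $360K in licenses alone.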

8 of 156 comments

  1. Your problem is architecture by DevilM · · Score: 5, Interesting

    Seems like your problem is one of architecture and not the underlying platform. You suggest that you would move away from a dynamic site built with J2EE to a generated static site built with PHP. If you really feel having a generated static site is the best way to go then why not leverage your existing Java infrastructure and have it generate a static site instead of serve a dynamic one? And if you can leverage your existing code base for that, then writing a new code base could still be done in Java, so I am not sure why you are pointing to Java as the problem.

    With all the above being said, I don't know what is wrong with your system, but it isn't that hard to build a dynamic site in Java that meets your scalability needs. All you need is a good caching strategy and you are set. Generally speaking, a good caching strategy coupled with a dynamic site can lead to as good as or better performance than a static site.
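One simple form a caching strategy like this could take is a time-based page cache in front of the rendering code. This is a minimal sketch, not anything from the poster's system: the class name, the 60-second lifetime and the `render` callback are all assumptions, and it is written in Python rather than Java purely for brevity.

```python
import time

class TTLPageCache:
    """Cache rendered pages per path and re-render after ttl_seconds."""

    def __init__(self, render, ttl_seconds=60.0, clock=time.monotonic):
        self._render = render      # hypothetical function: path -> HTML string
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # path -> (expires_at, html)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no rendering, no database hit
        html = self._render(path)  # expired or missing: render once and store
        self._entries[path] = (now + self._ttl, html)
        return html
```

With a news front page requested thousands of times a minute, the expensive render runs roughly once per lifetime instead of once per request, at the price of pages being up to `ttl_seconds` stale.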

    1. Re:Your problem is architecture by Leknor · · Score: 2, Interesting
      I agree with DevilM.

      A J2EE or even a lighter Java servlet based solution may not be the best for your needs but it sounds to me like your "big, custom" content management system is at least somewhat bloated.

      Unless your system is very highly customizable by your users you should have all sorts of opportunities for caching and optimizations geared towards scalability.

      It's not the same but the webmail for UF scales to 2,000-3,000 concurrent users during peak load with only one gig of ram. Unlike a news site, every user is looking at custom content, their mailbox. Except for the login page, no two users see the same thing which prohibits caching.

      Anyway, Slashdot is the wrong place to be looking for serious solutions to problems like yours.

  2. your problem is architecture not java by truffle · · Score: 2, Interesting


    You should be able to deal with a lot of your scalability issues by putting some kind of cache in front of your system, like Squid.

    But it sounds like every page on your site is really dynamic. And thus uncachable.

    But you want to replace it with a mostly static site, so obviously, not all that dynamic stuff is required.

    Before you chuck the baby out with the bathwater:
    - Can you revise your existing Java site to serve most pages as essentially static?
    - If so, will putting some cheap squid cache boxes in front of your main servers do the trick?

    This technique really works, if you can do it.
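Squid and other HTTP caches decide what they may store from standard response headers such as Cache-Control, so "essentially static" largely means sending the right headers. A minimal sketch of that decision follows; the function name and the five-minute lifetime are assumptions, not anything from the site in question.

```python
def cache_headers(is_personalized, max_age_seconds=300):
    """Return HTTP response headers telling a shared cache what it may store."""
    if is_personalized:
        # Per-user pages must never be stored by a shared proxy.
        return {"Cache-Control": "private, no-store"}
    # Anonymous pages: any cache may reuse the response for max_age_seconds.
    return {"Cache-Control": "public, max-age=%d" % max_age_seconds}

print(cache_headers(is_personalized=False))
print(cache_headers(is_personalized=True))
```

The fewer pages that fall into the personalized branch, the more of the traffic the cache boxes can absorb before it reaches the app servers.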

    --

    ---
    I support spreading santorum
  3. yes, switch by consumer · · Score: 4, Interesting
    It doesn't have to be to PHP, it could just be to an open source Java platform, but get off your expensive proprietary platform before it drives you into the ground. Java has good enough performance when done well, but most commercial Java frameworks make it hard or impossible to write anything that isn't bloated, so ditch that thing. If you are good with PHP and your site is reasonably simple that should work fine. So would Python, Perl, simple Java servlets, etc. Static publishing (from a database) is a great idea if you can get away with it.

    Ditch Solaris and go with Linux or FreeBSD on Intel hardware. Amazon and AOL did it and saved buckets of money, so you should feel confident that you can do it too.
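Static publishing can be as small as rendering each story to a file whenever it changes and letting the web server do the rest. The sketch below is one way to do that under stated assumptions: the function name, the page template and the one-file-per-slug layout are all hypothetical, and the story fields would come from whatever database the CMS uses.

```python
import html
import os
import tempfile

def publish_story(out_dir, slug, title, body):
    """Render one story to <out_dir>/<slug>.html and return the file path."""
    page = (
        "<html><head><title>{t}</title></head>"
        "<body><h1>{t}</h1><p>{b}</p></body></html>"
    ).format(t=html.escape(title), b=html.escape(body))
    os.makedirs(out_dir, exist_ok=True)
    final_path = os.path.join(out_dir, slug + ".html")
    # Write to a temporary file first, then rename it into place, so the
    # web server never serves a half-written page.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(page)
    os.chmod(tmp_path, 0o644)  # mkstemp creates owner-only files
    os.replace(tmp_path, final_path)
    return final_path
```

Calling `publish_story` from the CMS "save" action means readers only ever hit plain files, and the database is touched once per edit rather than once per page view.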

  4. How to reduce cost and scale... from my experience by buro9 · · Score: 3, Interesting

    I've worked on some very large sites with concurrent users running into the hundreds of thousands... these range from http://www.btopenworld.com/ through to the UK's Football League clubs and premium content video sites.

    In my experience, Java was not the wisest choice: it was bloated, difficult to maintain (that's one that will rile the pro-Java camp), required too much focus on non-business areas (i.e. creating things like session pools and encryption when we should have been focusing on getting the actual business requirements fulfilled), and created an object model bureaucracy (pure OO with respect to encapsulation? or break the purity of the model because you know in advance that you want 27 objects and you could get them all from one piece of SQL, but this would have presumed knowledge of the internals of the object and thus have broken the rules of encapsulation).

    All in all, Java proved to be the most substantial factor in late deliveries of projects, limited scalability... and expense (you wouldn't run Java on Windows, and we were running it on some very sizeable Sun boxes). We made several major attempts at performance improvements: memory caching, singletons to persist seldom-modified data, re-working SQL, etc. But this didn't help dig us out of the hole that we were in.

    As a comparison, we also ran some Windows boxes with ASP 3 code on them... used prolific file system caching, and because of poor OO support abandoned hope of pure encapsulation and objects... we did use re-usable components in DLLs, and we did do extensive work to cache page parts in both memory and on disk according to the predicted frequency of use.

    Both systems were behind reverse proxy caches... but the Java had the benefit of all pages being cached (as authentication ran in an NSAPI plugin on the proxy), whereas the ASP did not have its pages cached (just the images, styles, etc) as authentication code ran in the pages (it had not moved behind the plugin when I had left the company).

    Yet of these... the ASP consistently performed better on page generation times, concurrent users, etc... even though the ASP boxes were just cheap Compaq servers and the Java boxes were very over-specced Sun servers.

    My experience of all of this led me to the following conclusions... which were always obvious but merely got reinforced.

    1) Right tool for the right job. And at the moment that means considering things like PHP, Perl, ASP for web pages... not Java. String manipulation languages and those that are lower overhead are performing better on web sites.

    2) If you do use Java, be prepared to dilute the purity of the object model you create to favour performance. DO NOT get caught in the trap of thinking that object model purity is more important than total performance/maintenance... OO purity does not necessarily make the code easier to maintain... documentation and commenting achieve that more.

    3) Cache everywhere! Parts of pages, generated pages, the images and styles used on pages, the queries in the database.

    4) Control your cache flushing fiercely! Do not allow apps to ever flush anything that you are not sure has to be flushed... wild-card flushing should never occur. If you stay in Java, implement the Observer pattern and persist and serialise data everywhere.
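Point 4 can be sketched as an Observer-style cache in which the content store announces exactly which item changed and the cache evicts only that entry, never everything. The class and method names are hypothetical, and the example is in Python rather than Java only to keep it short.

```python
class ContentStore:
    """Holds stories and tells observers exactly which key changed."""

    def __init__(self):
        self._stories = {}
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def save(self, key, text):
        self._stories[key] = text
        for notify in self._observers:
            notify(key)            # targeted notification, one key at a time

    def load(self, key):
        return self._stories[key]


class PageCache:
    """Caches rendered pages and evicts only the page whose story changed."""

    def __init__(self, store):
        self._store = store
        self._pages = {}
        self.renders = 0           # counts how often a page is rebuilt
        store.subscribe(self.evict)

    def evict(self, key):
        self._pages.pop(key, None)  # no wild-card flush: a single entry only

    def page(self, key):
        if key not in self._pages:
            self.renders += 1
            self._pages[key] = "<p>" + self._store.load(key) + "</p>"
        return self._pages[key]
```

Saving one story here forces exactly one re-render on the next request for it; every other cached page stays warm.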

    Ultimately it comes down to architecture... but I have witnessed that Java encourages really strange architecture as everyone starts running after a holy grail of a pure object model.

    I would generally favour not using Java and going for the re-write. Other languages encourage pure string manipulation and control of what you're doing at a far more approachable level.

    Remember that you're only creating web pages:

    1) Query database
    2) Concat string
    3) Echo string
    4) ???
    5) Profit!

    It really isn't hard, and doesn't need rocket science. Look at /., we all love it, and it's on Perl!
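Steps 1 to 3 of that list fit in a dozen lines. This sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real database server; the `stories` table and the function name are made up for illustration.

```python
import html
import sqlite3

def render_headlines(conn):
    # 1) Query database
    rows = conn.execute("SELECT title FROM stories ORDER BY id").fetchall()
    # 2) Concat string (escaping each title so it is safe inside HTML)
    items = "".join("<li>" + html.escape(title) + "</li>" for (title,) in rows)
    return "<ul>" + items + "</ul>"

# Hypothetical sample data so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO stories (title) VALUES (?)",
                 [("First story",), ("Second story",)])

# 3) Echo string
print(render_headlines(conn))
```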

  5. Slashdot is a great place to get ideas. by Futurepower(R) · · Score: 2, Interesting


    From the parent post: "Anyway, slashdot is the wrong place to be looking for serious solutions to problems like yours."

    Maybe so, but Slashdot is a great place to get ideas. Many times Slashdot readers have extremely useful comments because they have unique experiences and are willing to share them.

  6. Update your resume by smoon · · Score: 2, Interesting

    It sounds like you've got a site written around a very proprietary system, and that your scalability etc. is tied to what that proprietary system can do.

    The solution, therefore, is to get away from the proprietary system. But only if you think you can do better. Either find a better proprietary system or write your own. If you write your own then plan for 'scale out' on lots of servers running something cheap like *BSD or GNU/Linux, Apache, Tomcat, JBoss, postgres, mysql, etc.

    If you _can't_ get away from the proprietary app, then perhaps you can 'wrap' it in something else. Use static pages, PHP/mod_perl/C++/Lisp/jsp/whatever and a cheap but good database (mysql, postgres). Use these for all of the 'custom' content. Then have them access your 'back end' and dumb down the back end to get rid of everything that is not essential to a data feed. If possible aggregate the 'php' users into a few categories for the CMS to deal with. E.g.: have a 'sports' profile with 10,000 php users accessing a single 'sports' user on CMS.

    Try negotiating with the vendor. Perhaps you can present your 'success story' at a Gartner symposium or somesuch. Complain about scalability. Demand a Linux version. Get them to agree to some unlikely performance guarantee and use that to cut costs down (via penalties). Get some free consulting from them to help fix the problems. Make sure to wear a T-Shirt or use a pen from their major competitor whenever they are around -- much more fun that way.

    Find a failed .com that used the same proprietary system. Buy the company for pennies on the dollar and assume their license portfolio.

    Another approach is to update your resume and get the heck out of there.

    --
    "But actually trying to use m4 as a general-purpose language would be deeply perverse" --ESR
  7. Profile profile profile by Bazzargh · · Score: 2, Interesting

    There's a lot of comments here to this effect already, I'm just going to add my voice.

    If you have 100s of K per login it almost certainly isn't the platform's fault, and it probably isn't the developer's fault either - all that memory must be going to customize content for the user, which means you can trace the performance problems back to the requirements. (your developers could be crap too, but profiling will tell you!)

    If the user gets content which requires a massive amount of customisation on each and every page - and this is a requirement - then performance will suck no matter what the platform, as that memory will still need to be used.

    I've been through this before with a customer who demanded we try out every app server under the sun to resolve performance problems even though we showed him profiling figures that proved only 1% of the time per request was appserver overhead - 80-85% was in the DB, and the rest was the app code. Because the customer took a "religious viewpoint" that the appserver was wrong rather than believing the profiling data, we wasted weeks.

    You need to profile before you can state that Java is the problem - and equally, you need to profile before you can state that it's not.
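A per-request breakdown like the one described (so much in the DB, so much in app code) can be collected with a few timers around each phase. This is a minimal sketch with a hypothetical `PhaseTimer` class and placeholder workloads; a real measurement would wrap the actual database calls and rendering code, or use a proper profiler.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PhaseTimer:
    """Accumulate wall-clock time per named phase of request handling."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.totals = defaultdict(float)   # phase name -> seconds spent

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start

    def shares(self):
        """Return each phase's share of the total measured time, in percent."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: 100.0 * spent / total
                for name, spent in self.totals.items()}

timer = PhaseTimer()
with timer.phase("db"):
    sum(range(100_000))       # stand-in for a database call
with timer.phase("app"):
    sorted(range(10_000))     # stand-in for application code
print(timer.shares())
```

Whatever the numbers turn out to be, they show which layer is worth rewriting before anyone commits to replacing the platform.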