Slashdot Mirror


Should A High-Profile Media Website Abandon Java?

newbroom asks: "The company I work for runs a large, high-profile web site with users all across the world and delivers them large amounts of streaming media content plus textual stories. You might guess therefore that this is a news website, frequently updated throughout the day, and delivering content 24x365. No names, of course, for obvious reasons. We have a big, custom, Java content management system (based on a framework from a proprietary vendor as it happens, but could just as well be EJB/J2EE for all that it matters in the context of this argument) and for deployment we run our website using Java app servers on Solaris behind Apache." If you were going to take such a site from 1,000 users to 10,000 users, would you be able to do it using this kind of setup?

"It is all hugely expensive to license and to run, and it's not very scalable. We'd like to up our userbase from several tens of thousands to ten times that number - but the cost of scaling the Java/Solaris infrastructure is not trivial, because the Java servlet architecture costs too much in memory and execution time (creating several 100Ks of in memory objects for each logon is expensive stuff!). On current hardware we can support only 1200-1500 concurrent logins and scaling up requires a new app server (eg 1 processor + 1GB RAM) and a $20K software license for each additional 600-750 concurrent logged in users. And in today's 'cost per active subscriber' economics it doesn't add up - we cannot justify the present cost structure, by any rational measure, even before we try to scale it up.

So we're thinking of chucking it out and replacing it with a largely static site that is generated (written out to cache) from a new, simpler content management system. The few dynamic elements would be assembled using simple PHP scripts, frontending our existing Oracle DB server. We reckon we could serve vastly higher numbers, ten to a hundred times as many, of users on the same (or cheaper!) hardware: and it would be simpler by far to build and maintain and support.

I, personally, believe that the benefits of the Java system (rapid prototyping, development) are not important when large scale deployment is the issue. I am (as a user) fed up with large, poorly performing Java-based websites. My beef is not about Java the language though - it's a question of appropriateness. Fifteen years ago we'd prototype in Smalltalk and then code for deployment in C, and I feel the same applies here. The economics of the noughties do NOT support spending massive amounts of money on web infrastructure, unless the transactional revenue justifies it. Of course, most businesses generally don't justify it, in my opinion.

Our outsourcing partner who supports and maintains the architecture thinks we are crazy. Putting their potential loss of revenue aside they are hugely concerned that we'll not be able to support what we create. They are seriously against this idea.

I remember, prior to Java & the like, supporting simple CGI websites with tens & hundreds of thousands of users off of cheap FreeBSD systems, and we didn't have to pay an outsourced partner to do it.

So what does Slashdot think? What would you do if you were in the same boat?"
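
The licensing arithmetic in the question can be made concrete with a rough sketch. It uses only the figures quoted above (1200-1500 concurrent logins today, one $20K license per additional 600-750 concurrent users) and assumes, hypothetically, that concurrent logins grow tenfold along with the userbase. The function name and parameters are illustrative, and hardware and support costs are left out because the post does not give them.

```python
import math


def extra_license_cost(current_concurrent, target_concurrent,
                       users_per_server, license_fee):
    """Return (extra app servers, extra license cost) for a capacity target."""
    additional_users = max(0, target_concurrent - current_concurrent)
    # Each new app server (and its license) covers a fixed block of users.
    servers = math.ceil(additional_users / users_per_server)
    return servers, servers * license_fee


# Assumption: concurrency scales 10x with the userbase.
# Upper end of the quoted figures: 1500 -> 15000 logins, 750 users per server.
print(extra_license_cost(1500, 15000, 750, 20_000))  # (18, 360000)
# Lower end: 1200 -> 12000 logins, 600 users per server.
print(extra_license_cost(1200, 12000, 600, 20_000))  # (18, 360000)
```

Under that assumption both ends of the quoted ranges work out to 18 additional licensed servers, before any hardware is bought.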

8 of 156 comments

  1. Python by costas · · Score: 2, Informative

    Python is a refactorer's dream. You can transition your Java application to Jython re-using your Java classes while ironing out the bugs and design of the Python code, implementing caching, static HTML generation and the like.

    When you're done, swap the JVM out of Jython and run pure Python with debugged code. If Python gives you any performance trouble, write small C-based modules for your frequently used code and wrap it in Python (fairly easy to do).

    1. Re:Python by RevAaron · · Score: 5, Informative

      That is the worst attempt at my-favorite-language-cheerleading I've ever seen. Ok, it's not, but it's still pretty bad.

      1. Writing C for web apps is not a solution. The wrapping tools for Python aren't impossible to use, but they can't perform miracles. Yes, it is very easy to use an external C function for performing some repetitive math function, an FFT or something- but in a data-intensive web app, it really makes no sense. In the case of the poster's problem, he and his team would end up re-writing half of the framework they're using in C, giving it Python interfaces. If they were having problems with just Java's raw execution speed, they could just as easily use Java's JNI to interface with C libraries.

      2. No matter how good it looks on paper, going from a big system written in Java for one particular framework to a system written half in Python and half in Java doesn't make all that much sense. They'll be dealing with the same bottlenecks, the same bloat- it's all running on the JVM. If anything, they'd increase the footprint and slow the app down, as they're adding on yet another layer of complexity.

      Yes, I am fully aware that Jython outputs Java bytecode itself, but Sun's Java compiler does a lot better generating efficient Java bytecode out of Java than Jython does. Nothing inherent to Python or Jython, but when you've got a multi-billion dollar project like Java, when you consider what Sun puts into it- then compare that to the minuscule (by comparison) project that is Jython, it'd be absurd to expect the same results.

      I know it's easy to get a little jumpy when the dude mentions PHP and your favorite language is Python, or hell, anything that isn't PHP. You want to come in and say "hey, use my favorite language!" Believe me, I'm wanting to do the same thing, and I could substitute the word "Smalltalk" for "Python" throughout your post, and it'd be just as true; unfortunately, so would my points against it.

      Python and Jython certainly have their places, no doubt. Depending on a couple of factors, I may use Python to write my system initially, but simply having a language that spits out Java bytecode doesn't mean you have some trivial, seamless transition between two systems.

      --

      Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
  2. You need to justify your choice of J2EE platform by jsse · · Score: 5, Informative

    On current hardware we can support only 1200-1500 concurrent logins and scaling up requires a new app server (eg 1 processor + 1GB RAM) and a $20K software license for each additional 600-750 concurrent logged in users

    I'm afraid your company should seriously consider another J2EE platform, rather than uprooting your existing architecture.

    First of all, fuck SUN. I'm biased, of course, because I'm here to promote Linux in this case. SUN's J2EE app server is almost the most expensive among its competitors, not to mention the incremental maintenance cost incurred by expensive SUN hardware. Nowadays big corps like IBM and HP offer enterprise support for J2EE on Linux platforms, and their support is M3(24/7) with at least 3 9's maintenance.

    Also, you don't pay per user for large scale web deployment, you pay per server license. Fuck SUN's sales multiple times for not reminding you of better license terms for your new deployment.

    I remember, prior to Java & the like, supporting simple CGI websites with tens & hundreds of thousands of users off of cheap FreeBSD systems, and we didn't have to pay an outsourced partner to do it.

    You're just going backward in this case. The J2EE platform exists to solve various problems with CGI. One of our deployments just switched from CGI to J2EE because the former became unstable when handling a high volume of requests. Of course, I've been told of many successes with CGI, but J2EE seems to fit in this case.

    Besides, I don't understand why you have a scale-up problem with J2EE. Scalability is the major advantage of J2EE. In our most recent project, we decoupled the RDBMS (Oracle), web tier (Apache), app server (9iAS) and EJB containers (OC4J) into 4 separate Linux cluster pools and one shared storage of SCSI raw disks. We can easily scale up our architecture to meet various requirements.

  3. Re:Boy you're exposing yourself by Zandall · · Score: 3, Informative
    Ask your software engineers to do what software engineers used to do: verify whether the design was made with scalability in mind. If not, it doesn't matter if it's a good design for just a two-node or ten-node cluster.

    Second: profile, profile, profile

    Third: well, almost anybody that has used a J2SDK (or JRE) on Solaris knows about its problems. Try running Volano's benchmark to learn more about this. But as with any benchmark, please don't believe your software will perform the same way the benchmark does. It is just an indicator.

    There is a memo about this problem, supposedly from Sun. If the problem really exists (I know it does, but you should find it by yourself), you'll know your Solaris servers will not deliver as many transactions as other servers of equivalent processing power.

    If your concerns are all about costs, you should run tests with x86 solutions. Some big players like IBM and HP will let you run some tests on a test machine (especially if your transition is successful and you let them put your case in an ad ;-)

  4. Re:look before you leap by platypus · · Score: 2, Informative

    But a type system makes a hell of a difference when you (or your poor successor) need to change anything later because many (if not most) of the conflicts caused by a change are IMMEDIATELY nailed down by the typechecker. The thing is, typing bugs are bugs. If you send a number to a thing that expects a database connection Python will moan just as much as Java. The difference is Java will moan before you run. Python and PHP will not.

    Don't get me wrong, I write loads of things in Ruby and Python. Most of them are small things that do a specific task, administrative scripts, that sort of thing. But for large complex systems, don't get me on a non-typed language.


    Please get your terminology right. Python _is_ strongly typed (as you said yourself, it _will_ moan if you try to mix incompatible types).
    But, it's dynamically typed + not compiled, therefore it can't complain in the compile stage. But that is why you write unit tests.
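
The distinction drawn here (strong but dynamic typing, with unit tests standing in for a compile-time check) can be shown with a minimal sketch. The function and test names are hypothetical and exist only for illustration.

```python
def label(count):
    # Bug: count is an int, and Python will not silently coerce it to str.
    # Nothing complains when this function is defined, only when it runs.
    return "items: " + count


def check_label_rejects_int():
    """A tiny unit test: the type error surfaces at call time, not before."""
    try:
        label(3)
    except TypeError:
        return True   # strong typing: incompatible types are refused
    return False


print(check_label_rejects_int())  # True
```

A statically typed language would reject the equivalent code at compile time; here the mistake is caught only if some test actually exercises the call.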

    And as nice as static typing (which is what you are talking about) is, it forces you to do all kinds of distracting (at least IMO) typecasts.

    See for instance
    http://www.artima.com/weblogs/viewpost.jsp?thread=7590
    for the scoop on static vs. dynamic and strong vs. weak typing w.r.t. python.

    Oh, and you may well be right that for really huge projects, the way Java handles typing is the way to go, if only because one can't trust all programmers working on the project at any point in time to not shoot themselves in the foot.
    But still, your terminology isn't right.

  5. If it ain't broke (& has any measure o complex by mactari · · Score: 2, Informative

    Looks like I need to bring Joel Spolsky's excellent article, Things You Should Never Do, Part I, to a new readership.

    The article speaks for itself, but essentially Joel's point is, "If it ain't broke, it's going to take you a heck of a lot longer to rewrite something inferior than you could've ever expected." Old code has tons of lessons learned that you'll never tease out. New code is easy to read and can implement every buzz word you'll find on O'Reilly Net right now, but it won't be battle-tested.

    If you're still able to even think about throwing out your old investment and moving to CGI and BSD, however, I'm thinking your site isn't doing anything very fancy. If you don't have much customization invested in your proprietary system, what Joel and I are saying is moot, especially at the licensing fees you're mentioning.

    I'd also point out the title is very misleading. It's not Java that's the issue -- it's your system's architecture. Java is just as capable of creating a "largely static site that is generated (written out to cache) from a new, simpler content management system" as language X. This is quite similar to the discussion we had about whether Java is an SUV just a while back (if it is an SUV, btw, that's not a bad thing). Your programmers' skillset is what's most important. If they already have a familiarity with Java, why ditch it?

    So, keeping true to the post that says the recommendations here come out our arse, here's another pulled from the same place:

    I'd recommend trying to refactor your current codebase to do two things. First, try to implement your static page idea using your current system. Second, take out as much of the crappy, non-scalable system that happens to be written in Java as possible. You don't name the system, but the whole advantage of Java is that it doesn't need to be platform-specific (if done right). Ditch Solaris. Create a server farm of cheap x86 hardware with Linux or BSD with a JVM installed. Reread your license -- if you have thirty "clients" (new Linux servers) making static pages from one legacy server's dynamic content, can you pay a lower fee?
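
The static-page idea (pages "written out to cache" by the content management system and then served as plain files) is language-neutral. A minimal sketch, written in Python purely for illustration: the `render_story` and `publish` names, the HTML layout, and the one-file-per-story scheme are all assumptions, not anything described in the thread.

```python
import html
from pathlib import Path


def render_story(title, body):
    """Render one story to a complete HTML page (hypothetical layout)."""
    return (
        "<html><head><title>{t}</title></head>"
        "<body><h1>{t}</h1><p>{b}</p></body></html>\n"
    ).format(t=html.escape(title), b=html.escape(body))


def publish(stories, out_dir):
    """Write each story to out_dir/<slug>.html; return the slugs rewritten.

    Pages whose rendered content is unchanged are left alone, so the web
    server keeps serving the existing static file.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, body) in stories.items():
        page = render_story(title, body)
        target = out / (slug + ".html")
        if not target.exists() or target.read_text(encoding="utf-8") != page:
            target.write_text(page, encoding="utf-8")
            written.append(slug)
    return written
```

Running `publish` whenever an editor saves a story keeps the expensive rendering step off the request path; only the few genuinely dynamic elements would still need a script per request.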

    PS -- Who said Java was good for prototyping? Visual Basic (and vbscript/ASP or *gulp* ColdFusion), sure. REALbasic, sure. Java? Are you folks mad?!! ;^)

    --

    It's all 0s and 1s. Or it's not.
  6. MOD THE PARENT DOWN by axxackall · · Score: 2, Informative
    The parent certainly does not have any experience (or motivation) in properly doing the refactoring and migration from Java to Python.

    I've done it with two projects, one was heavily overbloated with EJB, another one was a typical JSP thing. In both cases I've moved to Python+Zope and it was done pretty quickly and smoothly.

    Well, I admit, I've done it without Jython, as I found there was no need for temporary integration of old and new code aside from transparent authentication (which was simple - through LDAP). And I made sure there was no need to share any session objects in the middle of the transition.

    Performance has improved (shut up about that common myth that "Zope is slow"), and so has memory usage.

    So, I know from practice - it's doable.

    By the way, I've never found a situation where you would think about rewriting some Python function in C to accelerate your web server AND Java was fast enough with the same (logically same) function. In a well designed web system (including templates and database) the web server itself has no potentially bad issues. Plus, you can always cache something. And that is the same with Python and Java.

    --

    Less is more !
  7. Don't step down by Hard_Code · · Score: 2, Informative
    Java/Servlets can absolutely handle the load. I sincerely question your suggestion to step DOWN to PHP. While PHP is great for small projects, it is pretty MISERABLE at scaling because it has a huge gaping hole of not supporting application persistence. The very thing you DO NOT want to do with PHP, is attach it to a database with lots of SIMULTANEOUS users, because PHP has little or no way of pooling resources (e.g. your database connections will scale in one to one ratio with your users == BAD THING).
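
The pooling point can be illustrated with a small sketch: a fixed set of connections is opened once and lent to request handlers, so the number of database connections stays at the pool size however many requests arrive. This is a generic illustration in Python, not the API of any particular app server; `ConnectionPool` and its `connect` argument are hypothetical names.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Lend out a fixed number of pre-opened connections."""

    def __init__(self, size, connect):
        # connect is any zero-argument callable that opens a connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while all connections are in use; raises queue.Empty
        # if a timeout is given and expires.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the handler failed.
            self._idle.put(conn)
```

A request handler would wrap its database work in `with pool.connection() as conn:`. A long-lived application process is what makes this possible, since the pool has to outlive any single request.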

    See Ace's Hardware articles on how they converted from PHP to Java/Servlets/JSP, it is a blow-by-blow walkthrough that reads like a HOW-TO:

    Building a Better Webserver in the 21st Century
    SPECmine - A Case Study on Optimization
    Scaling Server Performance

    The move to a Java-based web application marks a giant step forward for our site software. While the "applications" we previously ran on Apache and PHP were little more than individual scripts interpreted by the webserver on request, the new site is in and of itself a complete, running, multithreaded application. When a request is made, the application starts a new thread to serve the request. Database connections are allocated as needed from a shared connection pool, maintained by the application.

    In the case of the interpreted scripts of old, programs were compiled and executed on the fly in a stateless manner. The scripts only ran when they were requested, and so there was no communication between threads or components and no sharing of resources.

    Our new software platform enables us to build true stateful applications that can create and share global resources. For instance, our message forums make use of a shared message index cache that, for all intents and purposes, frees the database from nearly all read activity. The cache is shared in memory amongst all threads and it is only updated when a write operation is made to the database for a new posting, an edit, or a deletion. Such a cache would be very difficult to implement in something like PHP or PERL because it's not possible to share persistent objects among different instances of an interpreted script.

    Our old web application was written in PHP and ran on Apache, a "pre-fork" multiprocess HTTP server. Apache works by starting a parent process which then forks several child processes to listen and wait for HTTP connections. Since each of these child processes serves one HTTP request at a time, Apache creates a pool of processes to handle connections in a timely fashion.
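
A shared cache of the kind described in this excerpt can be sketched as follows. This is a guess at the general shape, not the article's actual code: the `MessageIndexCache` name, the id-to-title mapping, and the `load_from_db` callable are assumptions made for illustration.

```python
import threading


class MessageIndexCache:
    """In-memory message index shared by all request threads."""

    def __init__(self, load_from_db):
        self._lock = threading.Lock()
        # One database read at startup; later reads never touch the database.
        self._index = dict(load_from_db())

    def get(self, message_id):
        with self._lock:
            return self._index.get(message_id)

    def on_write(self, message_id, title):
        """Call after a post, edit, or deletion has been written to the DB.

        A title of None means the message was deleted.
        """
        with self._lock:
            if title is None:
                self._index.pop(message_id, None)
            else:
                self._index[message_id] = title
```

The design relies on every thread living in one process and seeing the same object, which is why the excerpt contrasts it with scripts that start fresh on each request.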

    The disadvantage of this approach is that it can result in a great deal of overhead due to the 1:1 ratio between processes and requests. This can be particularly true in the case of HTTP keepalives, a feature designed to speed up web serving by handling multiple sequential requests from a client on the same connection, saving the time of having to build up a new connection for each request. The disadvantage comes into play when a child process is forced to wait a given amount of time on a client before accepting a connection from a different client. If the keepalive timeout is 15 seconds, then each Apache process will be unable to handle any other connections for 15 seconds following the final request from a client.

    This means an Apache web server using keepalives will need to have more child processes running than connections. Depending upon the configuration and the amount of traffic, this can result in a process pool that is significantly larger than the total number of concurrent connections. In fact, many large sites even go so far as to disable keepalives on Apache simply because all the blocked processes consume too much memory.
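
The effect of keepalives on the size of the process pool can be estimated with a back-of-the-envelope sketch. It applies Little's law (processes in use is roughly arrival rate times hold time), which is an assumption brought in here rather than something the article states, and it takes the worst case in which every client idles for the full timeout. Only the 15-second timeout comes from the text; the traffic figures are hypothetical.

```python
import math


def prefork_processes_needed(new_connections_per_sec, busy_seconds,
                             keepalive_timeout):
    """Estimate the child processes a pre-fork server needs.

    Each connection holds one process while it is being served
    (busy_seconds) and then for the keepalive timeout afterwards.
    """
    hold_time = busy_seconds + keepalive_timeout
    return math.ceil(new_connections_per_sec * hold_time)


# Hypothetical load: 20 new connections/s, 1 s of real work each.
print(prefork_processes_needed(20, 1, 15))  # 320 with a 15 s keepalive
print(prefork_processes_needed(20, 1, 0))   # 20 with keepalives disabled
```

Under these made-up numbers most processes are idle, waiting out the timeout, which matches the excerpt's point about blocked processes consuming memory.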

    Another issue with a multiproces

    --

    It's 10 PM. Do you know if you're un-American?