Slashdot Mirror


On PHP and Scaling

jpkunst writes "Chris Shiflett at oreillynet.com summarizes (with lots of links) a discussion about scalability, brought about by Friendster's move from Java to PHP. Chris argues that PHP scales well, because it fits into the Web's fundamental architecture. 'I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.' (The article is also available on Chris' own website.)"

9 of 245 comments

  1. Another article by Anonymous Coward · · Score: 4, Informative

    Here's an article from Jack Herrington on PHP's scalability.

    http://www.onjava.com/pub/a/onjava/2003/10/15/php_scalability.html

  2. jsp is a bad idea, but Java is not by ahmetaa · · Score: 5, Informative

    If someone wants to produce a high performance web site in Java, JSP is a bad choice. Use Velocity, pure Java objects, and a decent DB abstraction mechanism (Hibernate, iBatis). Plus, I have used PHP; OK, it is easy to use and can be preferred for small to medium size web sites. But call me biased, it is nowhere near the elegance of Java.

  3. Re:Author seems to live in a vacuum by lamz · · Score: 5, Informative
    I don't see any part of the article addressing how PHP can benefit the developer facing real issues of large scale web development (such as the need for caching systems on high volume websites, or the maintenance challenge of larger code bases on complex sites).

    The article doesn't mention it, but Smarty is an excellent PHP library that implements, among other things, caching. I have used it extensively with excellent results.
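
    Smarty's own API is not shown here. As a rough illustration of what template output caching buys you, here is a minimal sketch in Python, assuming each rendered page is kept for a fixed lifetime. The names `OutputCache`, `fetch` and the injectable `clock` are hypothetical, chosen for this example only.

```python
import time


class OutputCache:
    """Keep rendered page text for a fixed lifetime (hypothetical helper)."""

    def __init__(self, lifetime_seconds=3600, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock
        self._entries = {}  # cache_id -> (expires_at, rendered_text)

    def fetch(self, cache_id, render):
        """Return the cached text for cache_id, calling render() only
        when there is no entry or the entry has expired."""
        now = self.clock()
        entry = self._entries.get(cache_id)
        if entry is not None and entry[0] > now:
            return entry[1]
        text = render()
        self._entries[cache_id] = (now + self.lifetime, text)
        return text


if __name__ == "__main__":
    cache = OutputCache(lifetime_seconds=60)
    # The second call is served from the cache; render is not called again.
    print(cache.fetch("index", lambda: "expensive page"))
    print(cache.fetch("index", lambda: "never rendered"))
```

    The expensive work (database queries, template rendering) runs once per lifetime instead of once per request, which is where the gain on a high volume site comes from.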

    --

    Mike van Lammeren
    It will challenge your head, your brain, and your mind.

  4. Re:Yahoo. by Anonymous Coward · · Score: 5, Informative
    Actually that's only partially true. Yahoo uses C/C++ for almost all backend development. PHP is used mostly for what it's good at: Simple web frontends that call on extensions written in C and C++ to do most of the heavy lifting, or access backend systems written in C/C++.

    Yahoo is very much a C/C++ shop first and foremost - PHP is used as a template system (alongside several proprietary systems) to allow easy modification of high level behaviour.

  5. rebuttal by Anonymous Coward · · Score: 4, Informative

    I will start with mandatory links to the great series of articles that Ace's Hardware ran, describing their server scenario and their migration from PHP to Java/J2EE:

    1. Building a Better Webserver
    2. Building a Better Webserver in the 21st Century
    3. Scaling Server Performance

    The PHP Scalability Myth starts off by defining three types of server architectures. The first, two-tier, and the last, logical-three-tier, are the same conceptually (there is the slight distinction of whether display and business logic code is "mingled", but this is typically an aesthetic or design issue rather than a performance one). This two-tier/logical-three-tier architecture is the only one PHP supports natively. The article then proceeds to compare a two-tier PHP architecture against the most elaborate full three-tier Java architecture, which is used rarely in practice, and extremely rarely in the same domain in which a PHP solution is feasible. Instead of comparing apples and oranges (if PHP supported a full three-tier architecture, I would imagine two-tier PHP vs. three-tier PHP would have the same performance discrepancies), let's simply compare the only architecture PHP supports natively, two-tier, against JSP talking directly to a database, as this scenario is the most analogous to the PHP one. Let's also discard any caching as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state. And let's assume the database is the largest bottleneck.

    The article states:

    At the time when the first versions of the JSP and EJB standards were released, the prevalent web server was (and still is) Apache 1.x, which had a process model that was not compatible with Java's threading model. This meant that a small stub was required on the web server side to communicate with the servlet engine. This remains a non-trivial performance overhead for those that decide to pay it, and was a significant performance overhead when the first scalability comparisons were made.

    I'm not sure what "stub" the article is referring to, but I will assume it means an Apache module which talks a "native" protocol to the servlet engine. The first such module was mod_jserv, which could run the servlet engine in-process or talk to it over a compact protocol called AJP (Apache JServ Protocol), which essentially carries pre-parsed HTTP requests. This module, as well as the AJP protocol itself, has gone through several revisions, from mod_jk to mod_jk2. I cannot quite recall, but I think some version of mod_jk might have lost the ability to run in-process. Every other version, including the most current, can, if I recall correctly. This is beside the point, because as far as I know, AJP has always been a trivial performance overhead (I believe recent versions can run over Unix domain sockets). In fact, Apache is routinely used in production as the front-end web server, instead of the servlet engine's built-in web server, simply because it is faster at serving static content and the overhead of the AJP protocol is negligible. If the "stub" referred to in the quote is not the AJP module, then this may not be relevant; nevertheless, AJP has always been highly efficient and typically negligible with regard to performance (the same typical connection min/max/idle count configurations apply as do to Apache itself).

    The article goes on to proclaim the complexities of caching and data object persistence which we have eliminated from our comparison. Let's move on to the real bottleneck - the database. The article says "PHP's connectivity to the database consists of either a thin layer on top of the C data access functions, or a database abstraction layer called PEAR::DB. There is nothing to suggest tha

    1. Re:rebuttal by julesh · · Score: 4, Informative

      This two-tier/logical-three-tier architecture is the only one PHP supports natively.

      I'm not sure what you're on, but you can build however-many-tiers-you-like applications with PHP. In fact, PHP supports a number of technologies specifically designed to communicate with additional tiers, including CORBA, JavaBeans and SOAP.

      Let's also discard any caching as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state

      PHP supports persistent state through shared memory blocks trivially. The implementation of data caching schemes that use this feature is not hard.
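
      PHP's own shared memory functions are not shown here. A minimal sketch of the idea in Python, assuming a cache entry is stored as a length prefix followed by the cached bytes: the names `cache_put` and `cache_get` and that layout are made up for illustration. The point is only that a second handle attached to the same named block sees what the first one wrote.

```python
import struct
from multiprocessing import shared_memory

# Assumed layout: a 4-byte little-endian length, then the cached bytes.
HEADER = struct.Struct("<I")


def cache_put(payload: bytes, size: int = 4096) -> shared_memory.SharedMemory:
    """Create a shared memory block and store payload in it."""
    if HEADER.size + len(payload) > size:
        raise ValueError("payload does not fit in the block")
    block = shared_memory.SharedMemory(create=True, size=size)
    HEADER.pack_into(block.buf, 0, len(payload))
    block.buf[HEADER.size:HEADER.size + len(payload)] = payload
    return block


def cache_get(name: str) -> bytes:
    """Attach to an existing block by name and copy the payload out."""
    block = shared_memory.SharedMemory(name=name)
    try:
        (length,) = HEADER.unpack_from(block.buf, 0)
        return bytes(block.buf[HEADER.size:HEADER.size + length])
    finally:
        block.close()


if __name__ == "__main__":
    block = cache_put(b"rendered front page")
    try:
        # Any process that knows block.name could make this call.
        print(cache_get(block.name))
    finally:
        block.close()
        block.unlink()  # release the block once nobody needs it
```

      A real cache would also need locking between writers and some way to expire entries; both are left out of this sketch.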

      17 child threads attempt to connect, one will not be able to. If there are bugs in your scripts which do not allow the connections to shut down (such as infinite loops), a database with only 32 connections may be rapidly swamped

      Why would you limit your database to serving fewer connections than you have limited your web server to?
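
      The arithmetic behind that question, as a small Python sketch. It assumes the model in the quoted text, one open connection per web server child. The function name and the limit of 16 are assumptions for illustration; only the 17 children and the limit of 32 appear in the quote.

```python
def refused_workers(web_workers: int, db_max_connections: int) -> int:
    """Workers left without a connection when each worker holds one."""
    if web_workers < 0 or db_max_connections < 0:
        raise ValueError("counts must be non-negative")
    return max(0, web_workers - db_max_connections)


if __name__ == "__main__":
    # Hypothetical: 17 children against a database limit of 16.
    print(refused_workers(17, 16))
    # Database limit at least as high as the web server limit.
    print(refused_workers(17, 32))
```

      As long as the database limit is at least the web server limit, the result is zero and nobody is refused.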

      PHP supports an option to kill runaway scripts and reclaim their resources after a time limit has elapsed, which handily prevents the infinite loop problems mentioned.
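
      In PHP the setting is `max_execution_time` (adjustable at run time with `set_time_limit()`). The same idea sketched in Python, assuming each script runs as a child process that is killed once its limit passes. `run_with_limit` is a hypothetical helper for this example, not part of any PHP or Apache API.

```python
import subprocess
import sys


def run_with_limit(source: str, limit_seconds: float) -> bool:
    """Run Python source in a child process.

    Returns True if the script finished on its own, False if it was
    killed for exceeding the time limit.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", source],
            timeout=limit_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception,
        # so the runaway script no longer holds any resources.
        return False
    return True


if __name__ == "__main__":
    print(run_with_limit("print('done')", 10))
    print(run_with_limit("while True: pass", 1))
```

      Once the runaway script is gone, whatever it was holding, a database connection included, goes with it.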

      Ok, so now we have a bunch of "persistent" connections that hang around with the process. How long do they hang around?

      Until the database closes them or the PHP server process is killed.

      What if two threads in the same process want to use a connection?

      The connection is locked from the moment a thread acquires it (using the *_pconnect function) until the script using it terminates.

      In the worst case, persistent connections make your problem much much worse, because now you have many more connections open to your database.

      What does an inactive open connection to the database cost? Not very much, in my experience.

      Your arguments have a little merit, but please try to do your research before ranting about a system.

  6. Re:Scalability and Maintainability go hand in hand by iamdrscience · · Score: 4, Informative

    You sound like somebody who didn't use PHP long enough. Large PHP projects become plenty maintainable once you start using handy stuff like the Smarty templating engine (which IIRC is included by default now). There are also a myriad of great PEAR classes and PECL extensions. As for a module architecture that doesn't require you to recompile, that would be nice, however, I would bet that most PHP programmers have never recompiled their installation or needed to do so. You're right though, it would be nice.

    For the most part though, I would say that PHP is slightly better equipped for web development, just like Perl is better equipped for general scripting tasks... I'm a python man myself though ;-)

  7. Re:Author seems to live in a vacuum by claar · · Score: 4, Informative

    Um, this is an article about scaling, and therefore performance. Mentioning Smarty in such context is almost off-topic ;-)

    Personally, I find the lighter weight Savant to be a better choice, since it's straight PHP (No syntax to learn either -- bonus!). That removes the need for Smarty's "compile into php" step entirely, which has given me MUCH better performance than when I was using Smarty. IMHO and experience, at least.

    (And if you want caching, it can be done at the PHP engine level rather than in your templating engine -- see any of the PHP accelerators out there)

    --
    I'd give my right arm to be ambidextrous...
  8. Re:Sorry buddy... by hotgazpacho · · Score: 4, Informative
    scaleable enterprise systems just AREN'T written in PHP
    Tell that to Yahoo!

    See their explanation on why they use PHP