On PHP and Scaling
jpkunst writes "Chris Shiflett at oreillynet.com summarizes (with lots of links) a discussion about scalability, brought about by Friendster's move from Java to PHP. Chris argues that PHP scales well, because it fits into the Web's fundamental architecture. 'I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.' (The article is also available on Chris' own website.)"
Here's an article from Jack Herrington on PHP's scalability.
http://www.onjava.com/pub/a/onjava/2003/10/15/php_scalability.html
If someone wants to produce a high-performance web site in Java, JSP is a bad choice. Use Velocity, pure Java objects, and a decent DB abstraction mechanism (Hibernate, iBatis). Plus, I have used PHP; OK, it is easy to use and can be preferred for small to medium-size web sites. But call me biased, it is nowhere near the elegance of Java.
The article doesn't mention it, but Smarty is an excellent PHP library that implements, among other things, caching. I have used it extensively with excellent results.
Mike van Lammeren
It will challenge your head, your brain, and your mind.
Yahoo is very much a C/C++ shop first and foremost - PHP is used as a template system (alongside several proprietary systems) to allow easy modification of high level behaviour.
I will start with mandatory links to the great series of articles that Ace's Hardware ran, describing their server scenario and their migration from PHP to Java/J2EE:
The PHP Scalability Myth starts off by defining three types of server architectures. The first, two-tier, and the last, logical-three-tier, are the same conceptually (there is the slight distinction between whether display and business logic code is "mingled", but this is typically not a performance issue, just an aesthetic or design issue). This two-tier/logical-three-tier architecture is the only one PHP supports natively. The article then proceeds to compare a two-tier PHP architecture against the most elaborate full three-tier Java architecture, which is used rarely in practice, and extremely rarely in the same domain in which a PHP solution is feasible. Instead of comparing apples and oranges (if PHP supported a full three-tier architecture, I would imagine two-tier PHP vs. three-tier PHP would have the same performance discrepancies), let's simply compare the only architecture PHP supports natively, two-tier, against JSP talking directly to a database, as this scenario is the most analogous to the PHP one. Let's also discard any caching, as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state. And let's assume the database is the largest bottleneck.
The article states:
I'm not sure what "stub" the article is referring to, but I will assume it means an Apache module which talks a "native" protocol to the servlet engine. The first such module was mod_jserv, which could run the servlet engine both in-process and over a compact protocol called AJP (Apache Java Protocol), which essentially represents a pre-parsed HTTP request. This module, as well as the AJP protocol itself, has gone through several revisions, from mod_jk to mod_jk2. I cannot quite recall, but I think some version of mod_jk might have lost the ability to run in-process. Every other version, including the most current, can, if I recall correctly. This is beside the point, because as far as I know, AJP has always been a trivial performance overhead (I believe recent versions can run over Unix domain sockets). In fact, Apache is routinely used in production as the front-end web server, instead of the built-in servlet engine web server, simply because it is faster at serving static content and the AJP overhead is negligible. If the "stub" referred to in the quote is not the AJP module, then this may not be relevant; nevertheless, AJP has always been highly efficient and typically negligible with regard to performance (the same typical connection min/max/idle count configurations apply as do to Apache itself).
The article goes on to proclaim the complexities of caching and data object persistence, which we have eliminated from our comparison. Let's move on to the real bottleneck - the database. The article says "PHP's connectivity to the database consists of either a thin layer on top of the C data access functions, or a database abstraction layer called PEAR::DB. There is nothing to suggest tha
You sound like somebody who didn't use PHP long enough. Large PHP projects become plenty maintainable once you start using handy stuff like the Smarty templating engine (which IIRC is included by default now). There are also a myriad of great PEAR classes and PECL extensions. As for a module architecture that doesn't require you to recompile, that would be nice; however, I would bet that most PHP programmers have never recompiled their installation or needed to do so. You're right though, it would be nice.
;-)
For the most part though, I would say that PHP is slightly better equipped for web development, just like Perl is better equipped for general scripting tasks... I'm a Python man myself, though.
As your application scales beyond one server, you then need to find a way to share your sessions between servers. This can be done in PHP via NFS with the default file-based session driver (I think SourceForge does this), or with a database session driver.
If you had stored sessions in memory, then you would encounter problems with having to route requests based on session, or migrate to a method for sharing session data between machines.
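To make the database-session idea above concrete, here is a minimal sketch in Python rather than PHP. Everything in it is an assumption for illustration: the `DatabaseSessionStore` and `WebServer` classes are hypothetical, and a single in-memory SQLite connection stands in for a database server that every web server can reach. It is not how PHP's session drivers are implemented, only the general shape of the technique.

```python
import json
import sqlite3
import uuid


class DatabaseSessionStore:
    """Hypothetical session store backed by a database all servers share."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT)"
        )

    def save(self, session_id, data):
        # Serialize the session and overwrite any previous copy.
        self.connection.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.connection.commit()

    def load(self, session_id):
        row = self.connection.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}


class WebServer:
    """Hypothetical stateless web server: session state lives only in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item=None):
        # Load the session, optionally add an item to the cart, write it back.
        session = self.store.load(session_id)
        cart = session.setdefault("cart", [])
        if item is not None:
            cart.append(item)
        self.store.save(session_id, session)
        return cart


# Assumption: one in-memory SQLite connection plays the shared database server.
shared_db = sqlite3.connect(":memory:")
server_a = WebServer("a", DatabaseSessionStore(shared_db))
server_b = WebServer("b", DatabaseSessionStore(shared_db))

session_id = uuid.uuid4().hex
server_a.handle(session_id, "book")                 # first request lands on server a
cart_seen_by_b = server_b.handle(session_id, "pen")  # next request lands on server b
print(cart_seen_by_b)
```

Because neither server keeps the cart in its own memory, the load balancer does not have to route a visitor back to the same machine; the cost is a database round trip on every request.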
Um, this is an article about scaling, and therefore performance. Mentioning Smarty in such a context is almost off-topic ;-)
Personally, I find the lighter-weight Savant to be a better choice, since it's straight PHP (no syntax to learn either -- bonus!). That removes the need for Smarty's "compile into PHP" step entirely, which has given me MUCH better performance than when I was using Smarty. IMHO and in my experience, at least.
(And if you want caching, it can be done at the PHP engine level rather than in your templating engine -- see any of the PHP accelerators out there)
I'd give my right arm to be ambidextrous...
Having developed systems in Java and PHP, I think it's wrong to try to discuss how well either of them scales without considering the main factors that affect the scalability of projects, namely:
- The skill of the developers implementing the system
- The foresight of the original plan/architecture design
- Understanding of where bottlenecks/growth problems will occur
Any project that doesn't plan the scalability in from day one will likely struggle to fix the problem when scalability does become an issue.
IMHO scalability is a design and architectural problem. The language used (within reason) makes no difference; it's the quality and structure of the design itself which will make or break the system.
See their explanation on why they use PHP
Yes, Smarty compiling the templates into PHP causes some overhead. But compiling a template only happens once (unless you modify the template), so I'm not sure why your performance numbers were so much better with Savant. Maybe the config?
;)
But if you are running a site that can use the output caching that Smarty offers and the code is done properly, you will see huge speed increases, as you can skip everything in the page, including opening a db connection. That gives very close to flat HTML performance.
As to using PHP accelerators, they don't handle output caching by themselves. You can code your own, but my time is better spent doing other things.
Using Smarty and Turck together is pretty impressive.
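For anyone wondering what output caching buys you, here is a rough sketch in Python (not Smarty's actual API or implementation). The `OutputCache` class, the `render_front_page` function and the `stats` counter are all made up for illustration; the counter stands in for opening a db connection and running queries. The point is only that a cache hit returns the stored HTML without touching the database at all.

```python
import time


class OutputCache:
    """Hypothetical page-level cache: stores rendered HTML with an expiry time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # cache key -> (expiry time, rendered HTML)

    def get_or_render(self, key, render):
        entry = self.entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            # Cache hit: the render function (and its db work) never runs.
            return entry[1]
        html = render()
        self.entries[key] = (now + self.ttl_seconds, html)
        return html


# Assumption: this counter stands in for real database connections and queries.
stats = {"db_queries": 0}


def render_front_page():
    stats["db_queries"] += 1
    return "<h1>Stories</h1><p>3 new</p>"


cache = OutputCache(ttl_seconds=60)
first = cache.get_or_render("/index", render_front_page)
second = cache.get_or_render("/index", render_front_page)  # served from cache
print(stats["db_queries"])
```

The trade-off is staleness: until the entry expires, visitors see the stored page even if the underlying data has changed, so this only suits pages that can tolerate being a little out of date.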