On PHP and Scaling
jpkunst writes "Chris Shiflett at oreillynet.com summarizes (with lots of links) a discussion about scalability, brought about by Friendster's move from Java to PHP. Chris argues that PHP scales well, because it fits into the Web's fundamental architecture. 'I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.' (The article is also available on Chris' own website.)"
*Will Inherently Lead to Scalability* (Damn, can't type this early)
Here's an article from Jack Herrington on PHP's scalability.
h p_ scalability.html
http://www.onjava.com/pub/a/onjava/2003/10/15/p
If someone wants to produce a high-performance web site in Java, JSP is a bad choice. Use Velocity, pure Java objects, and a decent DB abstraction mechanism (Hibernate, iBatis). Plus, I have used PHP; OK, it is easy to use and can be preferable for small to medium-sized web sites. But call me biased: it is nowhere near the elegance of Java.
The article doesn't mention it, but Smarty is an excellent PHP library that implements, among other things, caching. I have used it extensively with excellent results.
Mike van Lammeren
What do you mean "calling mysql directly"? I can assure you that isn't actually possible in Java. MySQL is a C application, Java can't call C code without some kind of intermediate layer.
Also, what's "Database.java" -- if it's part of the MySQL/Java interface layer, this would be perfectly appropriate behaviour.
You're not thinking in a PHP architecture.... thinking Java style J2EE does not apply to using PHP.
What is a PHP "server"... it is the combination of Apache and PHP and a request being served. Since the web is stateless with simple session IDs tying things together it's not really necessary to share memory or resources between requests... hence Rasmus Lerdorf's "share nothing architecture."
It doesn't make sense to do an Olympic-sized web crawling script, and certainly not to invoke it during a web request. It makes more sense to write a script that is spawned by cron, probably with multiple instances that divvy up the task of doing the search and creating the index.
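To make the "multiple instances that divvy up the task" idea concrete, here is a minimal sketch in Python (not PHP, purely for illustration). It assumes each cron-spawned instance is started with its own worker number and the total worker count; the function names and example URLs are hypothetical and not from any particular crawler.

```python
import hashlib


def shard_for(url: str, num_workers: int) -> int:
    """Map a URL to a worker index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process and would give different answers in different instances.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_workers


def urls_for_worker(urls, worker_id: int, num_workers: int):
    """Return only the URLs this worker instance is responsible for."""
    return [u for u in urls if shard_for(u, num_workers) == worker_id]


if __name__ == "__main__":
    # Hypothetical work list; a real script would read it from a queue or table.
    pending = [f"http://example.com/page{i}" for i in range(10)]
    for worker_id in range(3):
        print(worker_id, urls_for_worker(pending, worker_id, 3))
```

Because every instance computes the same hash, the instances never need to talk to each other to agree on who crawls what, which fits the share-nothing style described above.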
Yahoo is very much a C/C++ shop first and foremost - PHP is used as a template system (alongside several proprietary systems) to allow easy modification of high level behaviour.
I will start with mandatory links to the great series of articles that Ace's Hardware ran, describing their server scenario and their migration from PHP to Java/J2EE:
The PHP Scalability Myth starts off by defining three types of server architectures. The first, two-tier, and the last, logical-three-tier, are the same conceptually (there is the slight distinction between whether display and business logic code is "mingled", but this is typically not a performance issue, just an aesthetic or design issue). This two-tier/logical-three-tier architecture is the only one PHP supports natively. The article then proceeds to compare a two-tier PHP architecture against the most elaborate full three-tier Java architecture, which is used rarely in practice, and extremely rarely in the same domain in which a PHP solution is feasible. Instead of comparing apples and oranges (if PHP supported a full three-tier architecture, I would imagine two-tier PHP vs. three-tier PHP would have the same performance discrepancies), let's simply compare the only architecture PHP supports natively, two-tier, against JSP talking directly to a database, as this scenario is the most analogous to the PHP one. Let's also discard any caching, as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state. And let's assume the database is the largest bottleneck.
The article states:
I'm not sure what "stub" the article is referring to, but I will assume it means an Apache module which talks a "native" protocol to the servlet engine. The first such module was mod_jserv, which could run the servlet engine both in-process and over a compact protocol called AJP (Apache JServ Protocol), which essentially carries a pre-parsed HTTP request. This module, as well as the AJP protocol itself, has gone through several revisions, from mod_jk to mod_jk2. I cannot quite recall, but I think some version of mod_jk might have lost the ability to run in-process. Every other version, including the most current, can, if I recall correctly. This is beside the point, because as far as I know, AJP has always been a trivial performance overhead (I believe recent versions can run over Unix domain sockets). In fact, Apache is routinely used in production as the front-end web server, instead of the servlet engine's built-in web server, simply because it is faster at serving static content and because the AJP overhead is negligible. If the "stub" referred to in the quote is not the AJP module, then this may not be relevant; nevertheless, AJP has always been highly efficient, with a typically negligible performance cost (the same typical connection min/max/idle count configurations apply as do to Apache itself).
The article goes on to proclaim the complexities of caching and data object persistence which we have eliminated from our comparison. Let's move on to the real bottleneck - the database. The article says "PHP's connectivity to the database consists of either a thin layer on top of the C data access functions, or a database abstraction layer called PEAR::DB. There is nothing to suggest tha
You sound like somebody who didn't use PHP long enough. Large PHP projects become plenty maintainable once you start using handy stuff like the Smarty templating engine (which IIRC is included by default now). There are also a myriad of great PEAR classes and PECL extensions. As for a module architecture that doesn't require you to recompile, that would be nice; however, I would bet that most PHP programmers have never recompiled their installation or needed to do so. You're right though, it would be nice.
;-)
For the most part though, I would say that PHP is slightly better equipped for web development, just like Perl is better equipped for general scripting tasks... I'm a python man myself though
That's only partially true as well -- Yahoo uses Perl for tons of their backend stuff. But yes, PHP is only the final delivery bit, not the actual applications at Yahoo.
My main issue with PHP scalability is the lack of a global context for app-level caching.
http://www.php.net/manual/en/ref.sem.php -- system V shared memory. See specifically the functions shm_put_var() and shm_get_var().
One year of PHP at Yahoo
Making the case for PHP at Yahoo
As your application scales beyond one server, you then need to find a way to share your sessions between servers. This can be done in PHP via NFS with the default file-based session driver (I think SourceForge does this), or with a database session driver.
If you had stored sessions in memory, then you would encounter problems with having to route requests based on session, or migrate to a method for sharing session data between machines.
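As a rough illustration of the database session driver idea, here is a minimal sketch in Python (not PHP's actual session API). It assumes a single `sessions` table that every web server can reach; the class name, table layout, and `handle_request` helper are all hypothetical, and SQLite stands in for whatever shared database a real deployment would use.

```python
import json
import sqlite3
import uuid


class DbSessionStore:
    """Session storage kept in a database table instead of server memory."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def create(self) -> str:
        """Start an empty session and return its ID (sent to the browser as a cookie)."""
        session_id = uuid.uuid4().hex
        self.save(session_id, {})
        return session_id

    def load(self, session_id: str) -> dict:
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, session_id: str, data: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()


def handle_request(store: DbSessionStore, session_id: str) -> int:
    """A stateless request handler: all per-user state lives in the shared store."""
    session = store.load(session_id)
    session["hits"] = session.get("hits", 0) + 1
    store.save(session_id, session)
    return session["hits"]
```

Since the handler keeps nothing in its own memory between requests, any server behind the load balancer can serve the next request for the same session ID, and no session-based routing is needed.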
JSR 223: Scripting Pages in Java(TM) Web Applications
The specification will describe mechanisms allowing scripting language programs to access information developed in the Java Platform and allowing scripting language pages to be used in Java Server-side Applications. JSR 223
I don't think it's inefficient. I use it. I have an extensive CLI PHP scripting system set up that does it all. It connects to FTP systems, downloads data for updates, runs updates on several databases, and generates plain-text and CSV (Excel-type) reports. Most of all, combined with crontab-scheduled calls from other systems, it allows me to share data between two systems that previously were unable to do so.
This also allows me to move code blocks between different platforms without issue. It also allows some of our beginning programmers to make changes and updates to these systems without having to know 5+ different languages. Most of them took C classes in school and the transition to PHP is fairly easy. We have an online documentation server (PHP/PostgreSQL) where we also keep a list of no-nos for programming in PHP, so a lot of those new to PHP don't make common mistakes. I have found PHP to be invaluable. Sure, it doesn't fit every job you come up with, but it makes system automation a snap.
Anyway, it's made my job much easier. Perl can do everything that CLI PHP can, but PHP is far less cryptic to those who are new to it, which means far less training time and far less debugging on my part after someone new to the language drops syntactic monkey wrenches or logical errors into our code.
Here's an article from Jack Herrington on PHP's scalability
And here is an actual link to the article.
Um, this is an article about scaling, and therefore performance. Mentioning Smarty in such context is almost off-topic ;-)
Personally, I find the lighter-weight Savant to be a better choice, since it's straight PHP (no syntax to learn either -- bonus!). That removes the need for Smarty's "compile into PHP" step entirely, which has given me MUCH better performance than when I was using Smarty. IMHO&experience, at least.
(And if you want caching, it can be done at the PHP engine level rather than in your templating engine -- see any of the PHP accelerators out there)
Having developed systems in Java and PHP I think it's wrong to try discussing how well either of them scales without considering the main factors that affect the scalability of projects, namely:
- The skill of the developers implementing the system
- The foresight of the original plan/architecture design
- Understanding of where bottlenecks/growth problems will occur
Any project that doesn't plan the scalability in from day one will likely struggle to fix the problem when scalability does become an issue.
IMHO scalability is a design and architectural problem; the language used (within reason) makes no difference. It's the quality and structure of the design itself which will make or break the system.
See their explanation on why they use PHP
Quite. The advantage of Java when combined with a database (and as you rightly point out how often is a webapp *not* combined with a database?), is that you can take advantage of in memory caching, improving scaling up to a point by reducing load on the database, which is typically the slowest part of a web app transaction.
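To show what "in memory caching, reducing load on the database" can look like in a long-running app server process, here is a minimal sketch in Python (standing in for Java, purely for illustration). The `TtlCache` class, the 30-second lifetime, and the loader callback are assumptions made up for this example, not any framework's API.

```python
import time


class TtlCache:
    """A tiny in-process cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}      # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value for key, calling loader(key) only on a miss."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # fresh hit: the database is not touched
        value = loader(key)     # miss or expired: go to the database once
        self._entries[key] = (now + self.ttl, value)
        return value
```

A request handler would call something like `cache.get_or_load("user:1", query_user)`, where `query_user` is whatever function actually runs the SQL. The catch is that the cache only survives as long as the process does, which is exactly what a per-request PHP script lacks without shared memory or an external store.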
Personally I use and love both Java and PHP for web apps, horses for courses certainly, but I would be far more comfortable with Java for a large webapp any day.
Yes, Smarty compiling the templates into PHP causes some overhead. But compiling a template only happens once (unless you modify the template), so I'm not sure why your performance numbers were so much better with Savant. Maybe the config?
;)
But if you are running a site that can use the output caching that Smarty offers and the code is done properly, you will see huge speed increases as you can skip everything in the page including opening a db connection. Which gives very close to flat HTML performance.
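For anyone unsure what output caching means here, a minimal sketch follows in Python (not Smarty's implementation or API). It assumes rendered pages are stored as files keyed by a hash of the page key; the `cached_page` function, its parameters, and the 60-second default are hypothetical.

```python
import hashlib
import os
import tempfile
import time


def cached_page(cache_dir, page_key, render, max_age_seconds=60):
    """Return cached output for page_key if fresh; otherwise call render() and store it."""
    name = hashlib.sha1(page_key.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)
    try:
        if time.time() - os.path.getmtime(path) < max_age_seconds:
            with open(path, encoding="utf-8") as fh:
                return fh.read()   # hit: no DB connection, no template work
    except OSError:
        pass                       # no cached copy yet
    html = render()                # miss: do the expensive work once
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return html


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        page = cached_page(cache_dir, "/report?id=7", lambda: "<html>report</html>")
        print(page)
```

On a hit, the only work done is a stat and a file read, which is why the result gets close to flat HTML performance. A production version would also want to write to a temporary file and rename it, so that readers never see a half-written page.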
As for PHP accelerators, they don't handle output caching by themselves. You can code your own, but my time is better spent doing other things.
Using Smarty and Turck together is pretty impressive.
"a module architechture that doesn't require you to recompile"
You're kidding, right?
urpmi php-mysql php-pgsql php-curl php-xml php-sockets
service httpd restart
See any "make; make install" commands in there?
How is that not modular?
Nearly everything in PHP is a module (or, in PHP's terms, an extension) that can be installed or removed without recompiling.
Actually, last time I checked (yesterday), Zend's Performance Suite Enterprise Edition includes both a PHP accelerator and does handle script output caching. Yes you can code your own, but then it depends how you like to spend your spare time... ;-)