On PHP and Scaling
jpkunst writes "Chris Shiflett at oreillynet.com summarizes (with lots of links) a discussion about scalability, brought about by Friendster's move from Java to PHP. Chris argues that PHP scales well, because it fits into the Web's fundamental architecture. 'I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.' (The article is also available on Chris' own website.)"
PHP inherently will not lead to scalability problems; however, if you ever try to create any applications that use a DFS-type algorithm, it can happen. PHP (I know it is web-based, shouldn't ask too much) does not allow for the extremely simple solutions to DFS-type algorithms that are apparent to most users. Many will end up with too many "while()" statements and bring down script efficiency exponentially.
it simply tries to fit into the existing paradigm
All right, he used the word "paradigm"; that makes his opinion automatically invalid.
The only real argument I could really find was "Java doesn't do X well, therefore PHP must be great". The author seems to live in a universe with only two choices: his straw man, Java, and his favorite web language, PHP. When he does try to argue PHP's merits on its own, it seems to collapse into a "PHP is good because it's good" argument. I don't see any part of the article addressing how PHP can benefit the developer facing real issues of large-scale web development (such as the need for caching systems on high-volume websites, or the maintenance challenge of larger code bases on complex sites). While good arguments may exist for PHP, they just don't seem to be here.
Perhaps it's not mentioned very often because it's obvious, but I think it's an advantage for systems like PHP, or Rivet that they scale down very well.
What does this mean? That they don't consume too much in the way of resources, and are very easy to get started with. This puts a dynamic web site within reach of more people, which is a good thing, even if inevitably some of them will, yes, write crappy code. It is another example of the "worse is better" philosophy.
I just wish they had used Tcl or something else already out there instead of creating a language that in and of itself is nothing very exciting, and has been a bit slow.
http://www.welton.it/davidw/
Here's an article from Jack Herrington on PHP's scalability.
http://www.onjava.com/pub/a/onjava/2003/10/15/php_scalability.html
If someone wants to produce a high-performance web site in Java, JSP is a bad choice. Use Velocity, pure Java objects, and a decent DB abstraction mechanism (Hibernate, iBatis). Plus, I have used PHP. OK, it is easy to use and can be preferred for small to medium size web sites. But call me biased, it is nowhere near the elegance of Java.
I've seen a Friendster stack trace before, when the app was running slow at 5 am. For those of you who don't know what this is, it's when Java runs into an error and tells you where your program died. It was really funny. Basically there was a servlet and a call to Database.java, and on line 8000 of Database.java they were calling MySQL directly. Real nice architecture, NOT!
... but scalable enterprise systems just AREN'T written in PHP. It's a great language, and I can see where it has a niche, but it doesn't offer the same kind of power over distributed objects/systems that Java does. It's like comparing MySQL to Oracle for enterprise systems.
I think the term is subjective, depending on the context in which it's used. Scalable does have many definitions, but I don't think that they are all wrong except for one.
His definition suits him well but it might not be helpful for me.
I might use scalable just to say that an application can easily (with little or no modification) handle 100x more users. This doesn't necessarily mean that the difference in system load varies a minimal specific amount per each extra request. All that matters is that it will work with higher demand. Who cares how or why.
I think scalable can also mean that an app can handle 10,000 users when hosted on a single machine but when put on a cluster of computers it can handle exponentially more users. To me that is a scalable application.
Scalable has no set definition in the contexts of applications.
Scalability is rarely that much of an issue- any halfway decent architecture (php, java, even .net) will let you scale horizontally- and Moore's law will take care of any performance problems in time.
My big issue with PHP is maintainability- I see it (perhaps incorrectly) as a glorified templating language, which places it on the same evolutionary track as ASP and cold fusion; developers will tend to munge sql calls into the templates, blow off any MVC separation, and get a system that is very hard to keep going for more than a few revisions.
What a strange bird is the pelican, his beak can hold more than his belly can.
First of all: every time I see the term "scalability", the writer treats it as if it only meant "bigger". We have to think not only of getting bigger, but of getting smaller.
PHP has a wide support for many RDBMS, APIs and Operating Systems, but it is only a Language. A language doesn't scale, it's the platform that scales.
That's why I see PHP/Apache/Unix scaling far better than (for example) ASP/IIS/NT: the first platform can run on anything from a PDA to a high-performance minicomputer; the second can run on anything from an i686 (Pentium support was removed?) to the best PC-architecture-based computer you can buy. That's the difference: a wide-option platform versus a closed-option platform.
Probably the first platform will have performance leaks and will not squeeze every last bit of performance out of the machine it runs on, but its scalability potential lies in the fact that it can run on whatever you throw it at. Maybe J2EE or other platforms will run faster than PHP on the same hardware, but PHP will scale there too and stand shoulder to shoulder with them.
That's why I don't like to evaluate scalability from the "speed" point of view, but from the "where it runs" point of view.
------- The last Sig. got fired.
From the article
"Scalability is gained by using a shared-nothing architecture where you can scale horizontally infinitely. A typical Java application will make use of the fact that it is running under a JVM in which you can store session and state data very easily and you can effectively write a web application very much the same way you would write a desktop application. This is very convenient, but it doesn't scale. "
Storing and more importantly trying to replicate stored state via sessions in Java can be expensive, but saying Java scales badly because it makes it easy to do things that don't scale well is a poor argument. I don't know enough about the merits of PHP to comment on how it deals with this issue, but when you've done lots of server side Java programming you learn to be very judicious in the use of Session scope.
It is not a good thing that there is a short learning curve on PHP. While it does put the ability to build dynamic web content at the fingertips of most users, it also creates a crapflood of insecure sites. Not to mention what happens when a user gets into more advanced PHP programming and knows nothing of basic CS (I know, not a big CS language, but some things must be known). Inefficient scripts will bog down sites; improper loops and insecurity can wreak havoc on a network. In relation to a PHP security project that I run, I have received several emails from university admins who have difficulty with insecure PHP coders and with allowing them access to PHP servers and SQL databases that others use.
I worked in a small shop developing web apps, and while it wasn't mission critical stuff like banking, it wasn't exactly brainless "dump data from MySQL" stuff either. I was lucky that my boss wasn't picky about languages. But if anyone I work with doubts the power and simplicity of PHP, I usually bring up Yahoo.
IMHO, PHP rocks. It's suitable for pretty much any and all web development. It can be used for quick hacks, or you can code it like a pro with objects and stuff.
PHP's problem is that it quickly becomes unmaintainable in larger projects. That's why it doesn't scale, not because the platform isn't fast enough or Apache can/can't scale.
PHP will continue to have this problem until someone comes and tells the developers about a nifty invention called 'namespaces'
Some other things that could help: Standard templating for easier separation of design/content from code, a better module architecture that doesn't require me to recompile just to get some new functionality, some nice standard modules that go with that new architecture.
Of course if someone did all of that you'd have Perl and since we already have Perl, I'll stick with it.
The Anti-Blog
I'd like to point out this blog post on Kottke:
Moore's buddy Matt Chisholm chimes in to tell me about a similar hack, a JavaScript app he wrote with Moore that works on Friendster. It mines for information about anyone who looks at his profile and clicks through to his Web site. "I get their user ID, email address, age, plus their full name. Neither their full name nor their email is ever supposed to be revealed," he says.
Notified of the security holes Moore and Chisholm exploit, Friendster rep Lisa Kopp insists, "We have a policy that we are not being hacked." When I explain that, policy or no, they are being hacked, she says, "Security isn't a priority for us. We're mostly focused on making the site go faster."
The term "scalable" has become an industry buzzword. It is fruitless to argue whether something is scalable or not if there is no clear definition. It's like arguing whether you believe in freedom or not. Of course most people in the world will say they believe in freedom, but if you ask 100 people to define it you will get 100 different answers (the Bush administration has had a field day with this because the minute you oppose them, they accuse you of not believing in freedom; their definition, of course).
It is impossible to say PHP is or is not scalable unless a definition can be agreed on. And with "scalable's" current buzzword status, I don't see that happening very soon.
One of the great boons of PHP is the fact that you can build shell scripts with it. This allowed me to create a large distribution/inventory/control system in PHP, AND do all the back-end processing in PHP as well. Sounds inefficient, sure, but it works like a champ. Plus, any new programmers get to learn the system quite quickly due to consistency.
meh
HTTP URL wrappers, file_get_contents, serialize, and unserialize. With these functions alone you can recreate any CORBA/SOAP/XML-RPC type of remoting. And remoting is good for scalability because it lets you 'outsource' the workload to another machine. Truly N-tier design (N>3).
Jon Bardin
Java: Wack
PHP: Dope
I will start with mandatory links to the great series of articles that Ace's Hardware ran, describing their server scenario and their migration from PHP to Java/J2EE:
The PHP Scalability Myth starts off by defining three types of server architectures. The first, two-tier, and the last, logical-three-tier, are the same conceptually (there is the slight distinction between whether display and business logic code is "mingled", but this is typically not a performance issue, but just an aesthetic or design issue). This two-tier/logical-three-tier architecture is the only one PHP supports natively. The article then proceeds to compare a two-tier PHP architecture against the most elaborate full three-tier Java architecture, which is used rarely in practice, and extremely rarely in the same domain in which a PHP solution is feasible. Instead of comparing apples and oranges (if PHP supported a full three-tier architecture, I would imagine two-tier PHP vs. three-tier PHP would have the same performance discrepancies), let's simply compare the only architecture PHP supports natively, two-tier, against JSP talking directly to a database, as this scenario is the most analogous to the PHP one. Let's also discard any caching, as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state. And let's assume the database is the largest bottleneck.
The article states:
I'm not sure what "stub" the article is referring to, but I will assume it means an Apache module which talks a "native" protocol to the servlet engine. The first such module was mod_jserv, which could run the servlet engine both in-process and over a compact protocol called AJP (Apache JServ Protocol), which represents essentially a pre-parsed HTTP request. This module, as well as the AJP protocol itself, has gone through several revisions, from mod_jk to mod_jk2. I cannot quite recall, but I think some version of mod_jk might have lost the ability to run in-process. Every other version, including the most current, can, if I recall correctly. This is beside the point, because as far as I know, AJP has always been a trivial performance overhead (I believe recent versions can run over Unix domain sockets). In fact, Apache is routinely used in production as the front-end web server, instead of the built-in servlet engine web server, simply because it is faster at serving static content and the AJP overhead is negligible. If the "stub" referred to in the quote is not the AJP module, then this may not be relevant; nevertheless, AJP has always been highly efficient and typically negligible with regard to performance (the same typical connection min/max/idle count configurations apply as do to Apache itself).
The article goes on to proclaim the complexities of caching and data object persistence which we have eliminated from our comparison. Let's move on to the real bottleneck - the database. The article says "PHP's connectivity to the database consists of either a thin layer on top of the C data access functions, or a database abstraction layer called PEAR::DB. There is nothing to suggest tha
What is the largest, or most heavily used php driven site?
Man and Goat
Confession time: the worst Swing based class I have ever committed has about 4000 lines, but about 2/3 of that is Swing.
Panurge has posted for the last time. Thanks for the positive moderations.
I think to settle this debate is a possible real-world example. Look at the story on the Jboss Nukes Project. It explains the CPU utilization and speed of the PHP version and how moving to a J2EE implementation decreased the wait times dramatically.
It's difficult to argue with facts.
Are they dope, or are they whack?
Yha,
lord knows that Java or C# wouldn't allow you to write console or GUI apps.
In other words, scalability of an architecture will always be a factor for complex problems.
Double prices
:)
Wait for Bank Holiday
Halve prices
Promote MASSIVE SALE!!!!!!
Profit
(sorry www.dfsonline.co.uk
1. PHP scales well.
2. Java scales well.
3. Friendster couldn't develop a scalable J2EE application, so they switched to PHP.
4. What will Friendster switch to when they can't develop a scalable PHP application?
It's simple: I demand prosecution for torture.
JSR 223: Scripting Pages in Java(TM) Web Applications
The specification will describe mechanisms allowing scripting language programs to access information developed in the Java Platform and allowing scripting language pages to be used in Java Server-side Applications. JSR 223
Striving to be common...
...SourceForge is playing with Java: http://www.sdtimes.com/news/105/story9.htm
Urge.. to kill.. Linda Barker... rising....
While I am personally gratified that someone is making the case for PHP vs. Java, I think the whole idea of attributing scalability (as in, works for lesser and greater numbers) is wrong.
Scalability depends on how you write your code. If your algorithms are good, your system will scale, and if they aren't, it will not. Any language that doesn't let you write good algorithms cannot be expected to be generally useful, but I think neither PHP nor Java fall in that category.
Finally, I think scalability is really not what's important, but rather performance. When developing tailor-made applications, I only care whether they require more or fewer resources for the number of requests they actually get, not for higher or lower loads. Of course, for libraries, operating systems, etc. the argument is different.
Please correct me if I got my facts wrong.
The Turck MMCache accelerator also provides a way to easily store and retrieve variables in shared persistent memory.
The article blows me away for all the wrong reasons. Regardless of the language chosen, it amazes me that the focus was on rewriting the application rather than performance testing and tuning the existing software.
They missed out on some amazing diagnostic tools to help them understand what was happening in the code, deployed system, and IT organization. What planet do these people live on? Being a fantastic developer may help but it does not make one a performance or systems engineer.
It is important to have a measurable definition of scalability. A reasonable definition of scalability for a given platform P and application A is
S(A,P) = R(A,P) / C(A,P)
where
Here I've assumed 100% availability for the purposes of this discussion. Availability could be added to the definition if desired. Note that this expression displays the expected behavior shown by common usage of the term "scalability":
With this definition, scalability's dimensions are "requests processed per second per dollar".
Example: given the following known values for a single application A:
running on platform X:
R(A,X) = 1000 requests/second,
C(A,X) = $40,000
S(A,X) = 1000 requests/second / $40,000 = 0.025
running on not-so-fast but less expensive platform Y:
R(A,Y) = 500 requests/second,
C(A,Y) = $10,000
S(A,Y) = 500 requests/second / $10,000 = 0.05
While platform Y's throughput (performance) is much less than that of platform X, Y is much more scalable than (in fact is twice as scalable as) platform X when running application A.
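The arithmetic in the example above can be checked with a few lines of Python. This is only a sketch of the poster's definition S = R / C; the `Deployment` class and its field names are made up here for illustration and are not part of the original post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One application running on one platform (hypothetical helper type)."""
    platform: str
    requests_per_second: float  # R(A,P)
    cost_dollars: float         # C(A,P)

    def scalability(self) -> float:
        # S(A,P) = R(A,P) / C(A,P), in requests/second per dollar
        return self.requests_per_second / self.cost_dollars


# Values taken from the worked example in the post.
x = Deployment("X", requests_per_second=1000, cost_dollars=40_000)
y = Deployment("Y", requests_per_second=500, cost_dollars=10_000)

for d in (x, y):
    print(f"S(A,{d.platform}) = {d.scalability():.3f} requests/second/dollar")

print(f"Y is {y.scalability() / x.scalability():.1f}x as scalable as X")
```

Under this definition a slower platform can still come out ahead, which is the poster's point: the metric rewards throughput per dollar, not raw throughput.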
This definition can also be used to estimate the utility of using various software methodologies. For example, use of components or object technology or Java or PHP may or may not change the values of each factor in the definition: the degree to which each is changed determines whether the resultant system is more or less scalable.
Michael D. Kersey
Here's an article from Jack Herrington on PHP's scalability
And here is an actual link to the article.
Having developed systems in Java and PHP I think it's wrong to try discussing how well either of them scales without considering the main factors that affect the scalability of projects, namely:
- The skill of the developers implementing the system
- The foresight of the original plan/architecture design
- Understanding of where bottlenecks/growth problems will occur
Any project that doesn't plan the scalability in from day one will likely struggle to fix the problem when scalability does become an issue.
IMHO scalability is a design and architectural problem, the language used (within reason) makes no difference- it's the quality and structure of the design itself which will make or break the system.
Why would you limit your database to serving fewer connections than you have limited your web server to?
I always wondered about that in application design. Just recently I talked with a former software architect from IBM and he gave me an answer that may make some sense. It still feels a little counter-intuitive to me, so any corrections would be welcome.
The answer is basically a resource management issue, if I remember correctly. You want to manage the load on your database servers so it stays relatively even, that is, avoiding peaks and troughs. You also want to keep requests waiting at the edges of your network, at the web server level.
Imagine for a second a scenario where the database accepted an unlimited amount of connections. Eventually the database server would become swamped as more and more resources were consumed (database connections may not be quite as expensive as people think, but they still aren't free). At times you will have the servers running at 100% capacity and at other times at very low capacity.
From a business perspective, you want to use the cheapest machines that are capable of meeting the performance goals. You could keep adding database servers until you meet the load, but this may not always be cost efficient. In addition, you then incur more overhead for keeping the database servers in sync with each other. So you can't just allow unlimited connections.
Now consider the case I think you are stating. Let W web connections = D database connections. IIRC one problem with this is that not all web requests require a call to the database, so essentially, you will probably have database connections going to waste. You are also wasting database server capacity and resources when you have a low-traffic scenario, since not all of the possible database connections are used.
Think about it like getting assignments at work. Would you rather get a steady stream of assignments that you can finish in a reasonable amount of time? Or get dumped on with a million things to do at once, and then have long periods of inactivity. I know the clock on the wall seems to move faster to me when I have a steady amount of work to do and I have less stress when expectations are reasonable. Also, from my boss' perspective, when I have nothing to do, he is essentially paying me for doing nothing, and he doesn't want that.
Another analogy might be the caching scenario. Caching is helpful because you can put data you are most likely to need in a location that is faster to search and quicker to access. But if your cache is the same size as your backend store (like memory matched to the size of your hard disk) you lose the advantage of fast searches, since you have the same space to search. If your cache is too small you lose the access speed advantage, since you have to go back to the underlying, and slower, store more often. So usually, you set the cache to be some fraction of the underlying store to maximize the search/access ratio.
Database connections may not be dog slow, but they have more to do than a web server connection, which may have more to do than an edge-side static content cache. So you want to tune your application to find the optimal usage of each of your resources.
So the IBM guy's answer was, you want to try and use a machine to around 80% capacity, so if the sh** really hits the fan you don't necessarily die, but you aren't maxed out either. By having the web server queue up requests to go to the database, you allow for a steady stream of traffic to the database, even at times when load is off-peak. This also means that it is easier to judge the effect of adding more hardware to the system, since you know how many pages you can serve at 80% capacity, so you know 1) when to add more servers (if the machines are constantly maxed) and 2) how many machines you have to add to keep the same quality of service.
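To make the "queue at the web tier, cap at the database" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the limits of 4 database slots and 16 web workers are arbitrary, and `fake_query` just sleeps instead of talking to a real database.

```python
import threading
import time

MAX_DB_CONNECTIONS = 4   # assumed database-side limit
WEB_WORKERS = 16         # assumed web-side limit, deliberately larger

db_slots = threading.BoundedSemaphore(MAX_DB_CONNECTIONS)
counter_lock = threading.Lock()
in_use = 0
peak_in_use = 0


def fake_query() -> None:
    """Stand-in for a real database call; it only sleeps briefly."""
    time.sleep(0.01)


def handle_request() -> None:
    global in_use, peak_in_use
    # Requests beyond the limit wait here, at the web tier,
    # instead of piling onto the database.
    with db_slots:
        with counter_lock:
            in_use += 1
            peak_in_use = max(peak_in_use, in_use)
        fake_query()
        with counter_lock:
            in_use -= 1


threads = [threading.Thread(target=handle_request) for _ in range(WEB_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"peak concurrent database connections: {peak_in_use}")
```

The peak never exceeds the configured limit no matter how many web workers arrive at once, which is what keeps database load steady and makes capacity planning against a target like 80% tractable.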
I think the point is the same as the performance vs. scalability discussions in the articl
======
In X-Windows the client serves YOU!
It's also good to determine how scalable the code is. Is the code readable? Maintainable? Extensible? Can large teams effectively work on the same code base?
While this does have more to do with how the code is written, programming languages do contribute to code scalability.
Does PHP promote scalable code?
There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
Dangit I just used my mod points in another thread
> Its difficult to argue with facts
Not at all!
You just grab a couple of other facts
(that support your claim) and voila, you're done!
<pet-peeve>
There is no programming language called C/C++. There is a language called C, and there is a language called C++. If Yahoo! uses C++ backends, then it uses C++ backends.
</pet-peeve>
IMHO, PHP rocks. It's suitable for pretty much any and all web development. It can be used for quick hacks, or you can code it like a pro with objects and stuff.
.NET and others.
Yes, PHP is excellent for web development. Yes, PHP can scale to even some large web sites. But since the web is still all the rage, this is unfortunately all that many people think about. Where PHP stumbles is when you need to move off the web or when you need to write complex business logic that is not solely driven by a web tier. PHP also fails when you need to integrate diverse transactional resources in an efficient manner. Not all business applications can be suitably implemented in PHP. As examples:
- PHP, by its scripted execute-and-terminate nature, cannot schedule the execution of tasks on its own. So, for example, there is no way to schedule an email to be sent at a specified time. If you need this sort of functionality, you'll have to look beyond PHP to ugly hacks like cron jobs that call PHP. (and then PHP scripts that can automatically modify your cron scripts..) Alternatively, you could write your own scheduler in a different language.
- Somewhat related, PHP is incapable of asynchronous operation. Suppose, for example, that we have a flood of customers placing orders. Our inventory database is fully capable of keeping up with the demand, but the credit card processing system is backlogged and this is out of our control. So we cannot give users an immediate response as to whether their payment was accepted upon placing the order. We also don't want to make them wait 5-10 minutes after hitting the "place order" button for a response. The proper business solution is to accept the order, but send the customer an email later if the payment was rejected. This process requires asynchronous operation -- queueing of the payment validation requests and possible further action separate from user interaction. PHP has no solution for this scenario or the many others like it and thus we must look beyond the PHP domain.
- PHP is quite weak when it comes to writing a complex business logic layer. This is not to say that it is not possible, but there are no frameworks available comparable to those offered in the Java world (and I'm not just talking about EJB, btw). So this is not a question of languages, but of available tools to do the job efficiently. For example, PHP has no concept of application-level transaction management. (declarative transactions, isolation levels, etc.) Looking towards the cutting edge, it has no support for Aspect Oriented Programming, which is an enormous boon to business logic developers, available in Java, C++,
- PHP is weak on tools for developing the persistence layer. For example, it has nothing comparable to Hibernate, let alone tools for RAD employing UML.
- PHP has no pre-built solutions for caching persistent data, and certainly not objects. Once again, it is possible, but developers are left to roll their own solutions using shm extensions or writing out to the database backend. Using the database can be terribly slow and even the shm approach requires (de-)serialization on script load/terminate. While this sort of thing does not limit scalability, it does limit performance (response times).
- PHP has no means of replicating application state in a cluster other than using the backend database. While this is often of no consequence, some complex business software holds a fair amount of state which needs not be persistent.
- PHP itself cannot reasonably be used to develop non-web clients such as a GUI tool for efficient rapid data entry or greater interactivity, a PDA client, or an embedded device that interfaces with a campus security system. These sorts of clients can talk to PHP scripts via SOAP extensions, but it should be recognized that we have again left the PHP domain to meet these needs and the resulting solution may not be the most efficient.
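For readers who want to see what the asynchronous order scenario in the list above looks like outside PHP, here is a small Python sketch of the pattern: accept the order right away, queue the payment check, and follow up later. The `validate_payment` rule (reject odd amounts) and the in-memory list standing in for outgoing email are invented purely for the example.

```python
import queue
import threading

payment_queue = queue.Queue()
rejection_emails = []  # stands in for mail that would be sent later


def validate_payment(order: dict) -> bool:
    """Hypothetical slow card processor; here it simply rejects odd amounts."""
    return order["amount"] % 2 == 0


def payment_worker() -> None:
    while True:
        order = payment_queue.get()
        if order is None:  # sentinel value: shut the worker down
            break
        if not validate_payment(order):
            # A real system would send an email; this sketch only records it.
            rejection_emails.append(order["email"])


def place_order(order: dict) -> str:
    """Accept the order immediately and defer payment validation."""
    payment_queue.put(order)
    return "order accepted"


worker = threading.Thread(target=payment_worker)
worker.start()

responses = [
    place_order({"email": "a@example.com", "amount": 10}),
    place_order({"email": "b@example.com", "amount": 15}),
]
payment_queue.put(None)  # no more orders in this demo
worker.join()

print(responses)
print(rejection_emails)
```

The customer-facing call returns before any payment work happens; the worker drains the queue at whatever pace the card processor allows. A production system would use a durable queue rather than an in-process one, so that pending validations survive a restart.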
So in closing, PHP is great for some thing
tho it does sound a bit Soviet Russia. "Those who anger you, control you"
Show me a Java site that serves three billion page views a day.
Three billion pageviews a day and rising at a fair clip.
I used to be a PHP fanatic, but then I learnt Java; it is by far the better language in terms of design, the architecture beneath it, and performance. But by far the biggest reason why PHP doesn't scale is its raw basic types. Integers are at maximum 32-bit unsigned, which means that when I was writing a simple file manager I couldn't handle files bigger than 4GB at all, and beyond 2GB PHP returned inaccurate results for file sizes. Objects in PHP have always been an afterthought, as have most basic types in PHP. Unlike Java, which was built from the ground up, much of PHP's functionality has been tacked on at a later date without refactoring the project. PHP 5 does go some way to fix the problem, but it will never scale as well as Java does on any platform. Perhaps the best solution is to do like Yahoo and take the best of both worlds. Having PHP at the front, taking care of the final output, is fine, but having it as the engine of your application is not always the best solution, and it certainly isn't for enterprise situations.
Dave Bell
On a PHP/Apache installation, PHP does not handle that many requests before running out of memory.
This is with the Apache prefork method, although if you are careful you can use the threaded model.
It is easy to write applications not using php which can handle 10x as many users on the same machine. Mainly because apache/php gobble up your memory too quickly. However there are also ways to make your php/apache site scale better without getting more hardware.
Use separate Apache installs specialised for each kind of request. E.g. compile one Apache with almost nothing compiled in, using the threaded MPM, for static files/images. Use proxying (with Squid or Apache) to direct requests however you need to. Use different web servers when you need to. The good thing about setting up proxying on the same machine is that if you need to scale to more machines it is fairly easy.
Modularity is good. If there are any parts of your site which you can separate, do so. Some common examples of things that can be moved include statistical reports (for the pointy-haired bosses). Again, if you keep things modular on the same machine, you can easily move those parts to separate machines later. The more things you can keep modular the better!
Use an in-memory db (or precache your disk db), with periodic syncing to disk for backup. For lots of cases *YOUR DATA IS CRAP*, so don't bother too much trying to ensure that it gets written correctly. Reading/writing data from the hard disk is *slow*. Try and keep all your data in RAM. Besides, with uptimes of 300+ days on an average Linux/BSD box you will probably only lose one backup interval of data a year.
Repeating it again in case you missed it *YOUR DATA IS CRAP*.
Use a faster db for simple storage needs. eg use a bsddb database instead of an sql one.
Use mod_gzip/mod_compress to get your slow modem clients off your back quicker. Slow modem users sucking down big php pages can tie up apache processes using up memory slowing you down. If you get their content to them 8 times faster you can start serving another client quicker. As a bonus the users will think your site is 8 times faster! **holy cow this site is fast now!!!!**
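A quick way to get a feel for how much compression buys you is to gzip a sample page yourself. The Python sketch below uses a made-up, highly repetitive HTML table; the ratio you see depends entirely on the content, so the "8 times" figure above should be read as the poster's estimate rather than something this snippet confirms.

```python
import gzip

# Hypothetical page: repetitive markup, as generated HTML often is.
row = "<tr><td class='name'>user</td><td class='score'>42</td></tr>\n"
page = ("<html><body><table>\n" + row * 500 + "</table></body></html>\n").encode("utf-8")

compressed = gzip.compress(page)
ratio = len(page) / len(compressed)

print(f"original:   {len(page)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1f}x")
```

Fewer bytes on the wire means a slow client holds a server process for less time, which is the memory argument being made here.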
Remove all unused apache mods. Try recompiling php to use less memory. If your sites do not need the GD image libraries, do not compile them in. With a bit of tweaking you should be able to use half as much memory.
On the client side, use javascript/flash!!! You can reduce the number of requests/page loads dramatically by doing things dynamically on the client side.
A js eg. Whilst writing a response to my post clicking the preview button should *not* need to make a http request, do some processing, then spit back the answer. For clients with javascript enabled simply have the preview code as javascript. Now you just cut the number of post related requests down by at least half!
Have fun!
"If the only tool you have looks like...."
d
Hammer meet nail, nail meet hammer.
Hammer I'm sorry you can't meet screw we already know what will happen don't we....
http://www.sys-con.com/story/?storyid=45250
an
http://kano.net/javabench/
So, the author basically states PHP scales better than Java because it does not have sessions. So how about not using sessions in Java either? You don't _have_ to use them, you know!
If you have a replicating database backend, use database connection pooling and stay away from sessions, you don't have the inter-JVM messaging problem.
When you do that, you can add as many database and web servers as you want, this is the same for Java and PHP.
Not that I believe Java is a superior solution to PHP; both lose out badly compared to AOLserver in the performance and ease-of-coding department.
I have no idea how much bandwidth/memory/db size Friendster uses. So let's make up some random numbers ;)
Assuming 100,000 simultaneous active users each consuming 100kb of data that works out to be under 10 gigs of memory. Put it up to one megabyte per user and that is around 100 gigs.
You can easily fit 16 gigs of memory in one machine. If you use ram disks you can even fit more than 16 gigs of memory. Put in ten machines and you should be able to service a lot of users.
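The back-of-the-envelope numbers above are easy to reproduce. This Python sketch uses the same made-up inputs (100,000 users, 100 KB or 1 MB each, 16 GB per machine) and assumes binary units (1 GB = 1024^3 bytes); the helper name `total_gb` is just for illustration.

```python
import math

USERS = 100_000
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
RAM_PER_MACHINE_GB = 16  # figure quoted in the post


def total_gb(bytes_per_user: int, users: int = USERS) -> float:
    """Total memory in GB if every user holds bytes_per_user of state."""
    return users * bytes_per_user / GB


small = total_gb(100 * KB)  # 100 KB per user
large = total_gb(1 * MB)    # 1 MB per user
machines = math.ceil(large / RAM_PER_MACHINE_GB)

print(f"100 KB/user: {small:.1f} GB")
print(f"1 MB/user:   {large:.1f} GB")
print(f"machines needed at 16 GB each (1 MB/user case): {machines}")
```

That works out to roughly 9.5 GB and 97.7 GB, so seven 16 GB machines would hold the larger case on memory alone, leaving the ten machines suggested above some headroom.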
Have fun!
If you design and code MVC and want to scale up and you're colo'ing in the same place, you simply add a new boxxen (web/application server) V/C and use the original M. If you're hosting at a new location far enough away that latency issues start raising their ugly little heads, then you need to do something else (host multiple Ms and do some kind of batch data reconciliation or slow-link updates).
On the other hand, your Model 1 guy can't do any of that because the M is hopelessly wedded to the V, so there's no way to add additional V/Cs without risking deadlock or eventual mismanaging of your singletons (one M gets updated and others don't, mix, repeat).
Yeah, right.
Obviously the NFS solution would not scale very well horizontally. "Imagine a beowulf cluster of these" all wanting a lock on the same NFS file at the same time. A database session driver might scale somewhat better, but couldn't you do the same thing with Java?
The best solution is to eschew the whole "session" model as much as possible. The article seemed to imply that PHP encouraged this, but I'm not convinced. I think I've seen plenty of PHP tutorials encouraging session variables unnecessarily.