The Environmental Impact of PHP Compared To C++ On Facebook
Kensai7 writes "Recently, Facebook provided us with some information on their server park. They use about 30,000 servers, and not surprisingly, most of them are running PHP code to generate pages full of social info for their users. As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ had been used instead of PHP, then 22,500 servers could be powered down (assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code), which would mean a reduction of 49,000 tons of CO2 per year. Of course, it is a bit unfair to single out Facebook here. Their servers are only a tiny fraction of the computers deployed worldwide that are interpreting PHP code."
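The submission's arithmetic can be reproduced in a few lines. This sketch only restates the summary's own assumptions (25,000 PHP servers, a 10x efficiency ratio, 49,000 tons of CO2 saved); the per-server CO2 figure is derived from those totals, not taken from any Facebook data.

```python
# Back-of-envelope estimate from the submission. All inputs are the
# submitter's assumptions; the per-server CO2 figure is derived from them.

TOTAL_SERVERS = 30_000
PHP_SERVERS = 25_000       # assumed share of "the bulk" running PHP
SPEEDUP = 10               # assumed C++ vs PHP efficiency ratio
CO2_SAVED_TONS = 49_000    # claimed yearly reduction


def servers_saved(php_servers: int, speedup: int) -> int:
    """Servers that could be powered down if each did `speedup` times the work."""
    needed = php_servers // speedup
    return php_servers - needed


saved = servers_saved(PHP_SERVERS, SPEEDUP)
implied_tons_per_server = CO2_SAVED_TONS / saved  # derived, not measured

if __name__ == "__main__":
    print(saved)                              # 22500
    print(saved / TOTAL_SERVERS)              # 0.75
    print(round(implied_tons_per_server, 2))  # 2.18
```

In other words, the headline figure implies roughly 2.2 tons of CO2 per server per year, and it hinges entirely on the assumed 10x ratio.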
Exactly. He's using stats from benchmarks that are pure number-crunching tests. Seriously, here are the tests: pidigits, reverse-complement, regex-dna, k-nucleotide, n-body, fasta, binary-trees, fannkuch, spectral-norm, mandelbrot. Yep. Looks like stuff a web page would do... The biggest bottleneck is probably data access, in which case the language makes little, if any, difference.
Google.com: 12,056 servers, website worth $186,352,889,952 USD (€126,610,389,668 EUR). Facebook: 34,872 servers, website worth $5,253,772,177 USD (€3,569,475,862 EUR). I'd say 'GOOG' is doing a bit better at the moment :)
For those interested: the info comes from http://websiteshadow.com/, which shows ad revenue etc. for URLs you provide.
It's got a very rich set of features aimed straight at making web development dead simple. The syntax is fairly straightforward and familiar, being a typical mishmash of shell scripting, C and Perl-isms. It was built from day one to integrate with Apache; it's not a nasty bolt-on hack like mod_perl. It runs in-process, so there's no startup overhead like with CGI. I've been using it on some pretty large web sites for years and it's never let me down.
Even better, TFA is a page for a C++ web toolkit. It's just spam.
Yes, PHP is a heck of a lot slower than C++ on processor-bound tasks. In a pure benchmarking contest, no doubt C++ will win.
But what about when both languages have to query a database (be it MySQL, PostgreSQL, Oracle, etc.)? In that case, both are blocked on the speed of the database: a 15 ms query takes 15 ms no matter which language is asking. Facebook is not calculating pi to 10 gazillion digits, and it is not checking factors for the Great Internet Mersenne Prime Search. It is serving up pages containing tons of customized data. This is not processor-bound; it is I/O-bound, both on the ins and outs of the database and on the ins and outs of the HTTP request. It is also processor-bound on the page render, but the goal of having this many machines is to cache to the point where page renders are eliminated.
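To make the I/O-bound point concrete, here is a minimal timing model in the spirit of Amdahl's law: a faster language only shrinks the CPU portion of a request. Only the 15 ms query comes from the comment; the 5 ms of page-composition CPU time and the 10x speedup are hypothetical placeholders.

```python
# Hypothetical request timing: only the 15 ms query figure is from the comment.
DB_WAIT_MS = 15.0   # query latency, independent of the calling language
CPU_MS_PHP = 5.0    # assumed CPU time PHP spends composing the page
SPEEDUP = 10.0      # assumed C++ advantage on the CPU-bound part only


def request_time(db_ms: float, cpu_ms: float, speedup: float = 1.0) -> float:
    """Wall-clock time when only the CPU portion benefits from a faster language."""
    return db_ms + cpu_ms / speedup


php = request_time(DB_WAIT_MS, CPU_MS_PHP)           # 20.0 ms
cpp = request_time(DB_WAIT_MS, CPU_MS_PHP, SPEEDUP)  # 15.5 ms
overall = php / cpp

if __name__ == "__main__":
    print(php, cpp, round(overall, 2))  # 20.0 15.5 1.29
```

With these assumed numbers, a language that is 10x faster makes the whole request only about 1.29x faster, because the database wait does not shrink at all.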
Once a page is rendered, it can be cached until the data inside it changes. For something like Facebook, I bet a page is rendered once for every ~10 times it is viewed. Caching is done in RAM, and large RAM caches take a lot of machines.
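A minimal sketch of render-once, serve-many caching, assuming a hypothetical per-page version counter that is bumped whenever the page's data changes. `PageCache` and `mark_changed` are illustrative names, not Facebook's or memcached's API.

```python
from typing import Callable, Dict, Tuple


class PageCache:
    """Caches rendered pages by page id; a cached copy is valid only while
    the page's data version is unchanged (hypothetical invalidation scheme)."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._entries: Dict[str, Tuple[int, str]] = {}  # page id -> (version, html)
        self._versions: Dict[str, int] = {}
        self.renders = 0

    def mark_changed(self, page_id: str) -> None:
        # Bumping the version makes any cached copy stale.
        self._versions[page_id] = self._versions.get(page_id, 0) + 1

    def get(self, page_id: str) -> str:
        version = self._versions.get(page_id, 0)
        cached = self._entries.get(page_id)
        if cached is not None and cached[0] == version:
            return cached[1]          # cache hit: no render needed
        html = self._render(page_id)  # cache miss: render and store
        self.renders += 1
        self._entries[page_id] = (version, html)
        return html


if __name__ == "__main__":
    cache = PageCache(lambda pid: f"<html>profile {pid}</html>")
    for _ in range(10):
        cache.get("alice")
    cache.mark_changed("alice")
    for _ in range(10):
        cache.get("alice")
    print(cache.renders)  # 2 renders for 20 views
```

Twenty views with one data change cost two renders, which is the roughly one-render-per-ten-views ratio the comment guesses at.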
So let's look at those 30,000 machines not by their language, but by their role. We can argue the percentages to death, but let's assume 1/3 are database, 1/3 are cache, and 1/3 are actually running a web server, assembling pages, or otherwise dealing with end users directly (BTW, I think 1/3 is way high for that).
So 1/3 of the machines, about 10,000, are dealing with page composition and serving pages. If they serve a page ~10 times for every render, then about 1/10 of page requests actually cause a render; the rest are served from cache. Those page renders are I/O-bound, as in the example above: they wait on the database (and on other caches, like memcached), so even if they spend a lot of cycles waiting, they are not using processor power on the box. The actual page composition (which might be 20% of the processing that box is doing) would be a lot faster in C++. So of those 10,000 servers, the virtual equivalent of 2,000 are generating pages using PHP, and could be replaced by 200 boxes running compiled C++.
So the choice of PHP is adding ~1,800 machines to the architecture, or ~6% of the total 30,000. Given that a PHP developer is probably 10x more productive than a C++ developer, is faster time to market for new features worth that to them? I bet it is.
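The estimate from the last three paragraphs, written out as integer arithmetic. The 1/3 role split, the 20% composition share, and the 10x ratio are the commenter's assumptions, not measured figures.

```python
TOTAL_SERVERS = 30_000
SPEEDUP = 10                            # assumed C++ vs PHP efficiency ratio

web_servers = TOTAL_SERVERS // 3        # assumed 1/3 web tier -> 10,000
php_work = web_servers * 20 // 100      # assumed 20% of their load is page composition -> 2,000
cpp_work = php_work // SPEEDUP          # the same work done in C++ -> 200
extra_machines = php_work - cpp_work    # machines attributable to PHP -> 1,800
extra_share = extra_machines / TOTAL_SERVERS

if __name__ == "__main__":
    print(extra_machines, f"{extra_share:.0%}")  # 1800 6%
```

Compared with the submission's 22,500, the difference comes almost entirely from applying the 10x ratio only to the share of work that is actually CPU-bound composition.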
OK, let's say AJAX didn't exist for a moment. People would have to refresh their browsers to display and submit forms, which would require Apache/PHP to serve a *full web page* for every form displayed and submitted. That alone puts load on the servers, before dynamic content is even considered. If anything, AJAX *lowers* server load.
For example, consider the following. Say bad things about PHP all you want (it deserves it), but one thing you don't generally see in PHP code is a buffer overflow, where you copy a bunch of strings, concatenate them, run out of room without noticing, and go clobbering memory. That's because the string manipulation code goes through a bunch of checks whenever you append strings. You can't just skip these checks and hope everything works the same. You may know that a particular code path won't need all the bounds checking because you're, say, I dunno, assembling fixed-length ZIP+4 codes or something, but no existing mechanism lets you tell the scripting language that (nor is it clear how you could integrate such a mechanism with the powerful abstraction that lets you not worry about the rest of your strings in the first place).
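A sketch of the kind of check in question, written in Python for consistency (Python strings are already bounds-checked, so this only models what a safe runtime does on each append). `FixedBuffer` and `zip_plus_4` are hypothetical names; the point is that every append pays for a capacity check even when, as with well-formed ZIP+4 input, the result always fits.

```python
class FixedBuffer:
    """Fixed-capacity character buffer that checks bounds on every append,
    refusing to write rather than overrunning its storage."""

    def __init__(self, capacity: int) -> None:
        self._data = bytearray(capacity)
        self._len = 0

    def append(self, text: str) -> None:
        raw = text.encode("ascii")
        # The per-append check: an unchecked copy would write past the end here.
        if self._len + len(raw) > len(self._data):
            raise OverflowError("append would exceed buffer capacity")
        self._data[self._len:self._len + len(raw)] = raw
        self._len += len(raw)

    def value(self) -> str:
        return self._data[:self._len].decode("ascii")


def zip_plus_4(zip5: str, plus4: str) -> str:
    """Hypothetical formatter: for a 5-digit ZIP and 4-digit extension the
    result is always 10 characters, so the checks never fire, yet the
    buffer cannot know that and performs them anyway."""
    buf = FixedBuffer(10)
    buf.append(zip5)
    buf.append("-")
    buf.append(plus4)
    return buf.value()


if __name__ == "__main__":
    print(zip_plus_4("12345", "6789"))  # 12345-6789
```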
Moreover, as has already been pointed out, much of the computational cost of rendering a web page lies in database queries and queries to in-memory object caches, which already run compiled code. The string-manipulation overhead isn't all that significant compared to the abstraction it buys you. It's probably a better idea to track down logic issues: code that does useless computations it doesn't need, which make it slow, or computations that could be done in advance to make it faster.
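One common way to stop repeating the same work is to memoize a pure function, shown here with Python's `functools.lru_cache`. `shipping_zone` is a hypothetical stand-in for an expensive lookup whose result depends only on its input; memoization is one technique among many, not a fix the commenter prescribes.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive body actually runs


@lru_cache(maxsize=None)
def shipping_zone(zip5: str) -> int:
    """Hypothetical expensive, pure lookup: safe to compute once per input."""
    global calls
    calls += 1
    return sum(int(d) for d in zip5) % 8


orders = ["12345", "12345", "90210", "12345", "90210"]
zones = [shipping_zone(z) for z in orders]

if __name__ == "__main__":
    print(zones, calls)  # [7, 7, 4, 7, 4] 2
```

Five lookups trigger only two real computations; the other three are served from the cache.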
I think there's a lot more potential for interesting machine optimization of code coming from the functional paradigm, where you can mathematically show that a portion of code is equivalent to its optimized replacement, and I expect this paradigm to make a resurgence in some places during the upcoming era of 128-core processors. This might be interesting.
From my personal experience: data-heavy applications run at a complete crawl in PHP. "10 times slower" is, in my opinion, a vast understatement.
Then again, that’s not the point of PHP. The point is that in PHP, provided you already know how to program, you also get things done more than 10 times faster than in C++, because there is a simple function with defaults and automatic behavior for literally everything.
But when those defaults and automatic behaviors differ from what you expect, you get into big trouble. And because the PHP interpreter is truly a horrible piece of shit (I was able to run totally illegal constructs, with plain text right in the middle of the code, and it ran, doing nothing of what I expected it to do), that happens quite a lot.
It’s one of the reasons that drove me to the extreme strictness of Haskell, where you have to get it right up front, so it doesn’t bite you in the ass later.
Actually, Facebook uses APC to cache the compiled and optimized code in shared memory, so it doesn't have to be compiled over and over again.
There are other libraries for caching PHP functions at many different levels as well, and they're mostly open source. Some really bright minds from Facebook and other large PHP applications have contributed to them.
Bottom line: PHP is quite powerful and efficient when built and extended properly.
Those were actual benchmarks run at peak load for 5-minute periods: a sustained rate of over 600,000 queries in 5 minutes, or 2,000 per second (around 2,200 IIRC), on absolutely craptastic hardware, against an 8 GB MySQL table. The benchmark ran ab (ApacheBench) against a custom forking server instead of Apache, with between 100 and 400 simultaneous requests. Threads were never "reaped", always reused, so it was important that there were no memory leaks; never having to spawn another thread after initial startup also contributed to the difference.
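A rough Python analogue of the "never reaped, always reused" design, using `concurrent.futures.ThreadPoolExecutor`, which keeps its worker threads alive for the life of the pool. It is not the commenter's custom forking server; `POOL_SIZE` and `handle` are hypothetical. The last lines recheck the quoted throughput arithmetic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # hypothetical; the comment does not say how many threads were used


def handle(request_id: int) -> int:
    """Stand-in request handler; returns the id of the thread that ran it."""
    return threading.get_ident()


def serve(n_requests: int, pool_size: int = POOL_SIZE) -> set:
    # At most pool_size threads are started; each is reused for many requests
    # instead of being spawned and torn down per request.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return set(pool.map(handle, range(n_requests)))


# Throughput quoted in the comment: 600,000 queries over a 5-minute window.
QUERIES = 600_000
WINDOW_S = 5 * 60
rate = QUERIES / WINDOW_S  # queries per second

if __name__ == "__main__":
    print(len(serve(1_000)) <= POOL_SIZE, rate)  # True 2000.0
```

A thousand requests are handled by no more than four threads, which is the property that made leak-free handlers essential in the original setup.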
Contrast this with PHP, where every script has to be loaded, interpreted, then flushed out of the system so it leaves a clean memory footprint for the next script, and where tons of variables your script may never use have to be initialized on every run. Obviously, compiling only what you need and loading it once is more efficient :-)