The Environmental Impact of PHP Compared To C++ On Facebook
Kensai7 writes "Recently, Facebook provided us with some information on their server park. They use about 30,000 servers, and not surprisingly, most of them are running PHP code to generate pages full of social info for their users. As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ would have been used instead of PHP, then 22,500 servers could be powered down (assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code), or a reduction of 49,000 tons of CO2 per year. Of course, it is a bit unfair to isolate Facebook here. Their servers are only a tiny fraction of computers deployed world-wide that are interpreting PHP code."
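A quick check of the summary's arithmetic, as a minimal Python sketch. The 25,000-server figure and the 10x ratio are the submitter's own assumptions, not measured values. The function and constant names are made up for illustration, and the per-server CO2 figure is only what the quoted numbers imply.

```python
import math

PHP_SERVERS = 25_000       # submitter's guess at "the bulk" of 30,000 servers
ASSUMED_SPEEDUP = 10       # submitter's assumed C++ vs PHP efficiency ratio
CO2_SAVED_TONS = 49_000    # yearly CO2 saving quoted in the summary


def servers_saved(php_servers: int, speedup: float) -> int:
    """Servers that could be switched off if each remaining one did `speedup` times the work."""
    remaining = math.ceil(php_servers / speedup)
    return php_servers - remaining


saved = servers_saved(PHP_SERVERS, ASSUMED_SPEEDUP)

# CO2 per retired server implied by the summary's own numbers.
tons_per_server = CO2_SAVED_TONS / saved
```

With the assumed ratio of 10 this reproduces the 22,500 figure, and the quoted 49,000 tons works out to roughly 2.2 tons per retired server per year. Drop the ratio to 2 and only 12,500 servers go away, so the conclusion rests almost entirely on that one assumption.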
I remember when it was the script kiddie's substitute for cgi-perl. What does it offer from a theoretical and engineering PoV, apart from a Visual Basic learning curve?
That's a ridiculous way to analyze it. What about the environmental impact of the extra time required to write the same functionality in C++? What about the impact of whole classes of C++ bugs that don't exist in PHP (and, perhaps, vice versa) with the downtime or security breaches resulting from them? Or a hundred other ways in which writing all that software in C++ would be different that I can't think of at the moment?
Seriously, is anybody taking the story's 1-to-10 ratio seriously?
I mean, maybe raw execution of pure code is going 10 times slower in PHP than C++ (ouch, I didn't know that) but even then, it's far from representing the same ratio when talking about a number of servers. You have to take into account all other parameters (disk access, network, IO, etc... Those aren't 10 times as slow in PHP one would guess).
I would be astonished if this ratio were close to the truth. Does anyone have any insight/information on this?
Write boring code, not shiny code!
The thing that this article fails to see, is that some languages aren't for everyone. A PHP programmer who turns out good PHP code isn't going to magically make the same level of code for C++. It also doesn't see that Facebook can't be down for longer than an hour at most, otherwise risk user outrage. After all, they have many, many, many users and for it to go down for a day would be akin to Google going down for a day or so. The difference being that if Google is down for a day, most users can use Yahoo, Bing, Live, WolframAlpha, etc. to search. Not every Facebook user has a MySpace.
Taxation is legalized theft, no more, no less.
That's crazy. 10:1 is incredibly unfair. Especially when you consider that a cached C++ page takes just as much time to return as a cached PHP page. On top of that, the majority of the work done is just searching a database. I would imagine a large part of processing a page is in getting and returning data, which is then up to the database. He is using stats that say PHP is 10 times slower for running through loops, math, that type of crap. They say nothing about querying a database and then doing some minor presentation-related logic. If I had to guess, for a web page the average "efficiency gain" of using C++ would be under 2x.
I'm thinking that these scripts are just thin front ends to a massive db. Thus, a lot of the computer's time is going to be spent on I/O, and a lot of the processing is going to be taking place in the db itself, which is probably written in C.
Mod points: Guaranteed to remove your sense of humor.
Side effects may include gullibility and temporary retardation
I know you're being funny but you've got a good point. Developing and maintaining C++ code is not like developing and maintaining PHP scripts. Which of course is why we have PHP to begin with. It's designed for the web and ease of implementation. Sure, C++ would run faster, but it wouldn't necessarily be more efficient in terms of dollars.
Simply put: no.
The reason why they have so many servers is because Facebook contains so much data. The servers are there for a reason, and the reason is CACHING.
The overhead of PHP is very small for a platform that is all about sharing data and the bulk of processor time surely goes towards fetching that data in the first place. What, do you seriously think that when you hit your home page on Facebook, there are database queries issued for that? Lulz.
Besides, I'm almost sure that FB uses something like Zend Accelerator, which increases code execution speed a lot.
Anyway, just no.
Why not rewrite everything in assembly? This comparison comes to a conclusion without any facts to back it up. As others have pointed out there is development time and compile time associated with C++... and what about ongoing development? Where does 10-1 come from? Are you assuming they aren't doing any optimization or using any sort of accelerator? I've personally re-written code in C++ from php, and then done the comparison. In our case, we decided the extra maintainability was worth the approx 10-20% increase in speed we saw.
For something that is deployed to tens of thousands of machines..
Is there some reason why these languages couldn't be compiled and optimized? Code is just the programmer's will expressed as text that the machine can somehow interpret, right? If there is so much PHP out there, why wouldn't/couldn't there be an efficient compiler (by which I mean something that produces executables and not just "executables that are really just an interpreter tacked onto a script")?
The dearth of such compilers on the market suggests to me that the gains wouldn't be as great as claimed for the majority of applications where interpreted languages are used.
Can you be Even More Awesome?!
Does the author seriously believe that Facebook isn't running some sort of PHP compiling/caching service, like APC or something similar?
It would be ridiculous for them NOT to be running something like that, which eliminates much of the advantage C++ would enjoy through being pre-compiled. While there still may be a reduction if Facebook were magically changed to precompiled C++ code, the reduction would be fairly minimal. In addition to that, you'd need to factor in the debugging and coding/compiling times, which would exceed the PHP times by an order of magnitude at least.
What a troll. Any point or argument based on assumptions is very weak. Here there are two: "..Let's assume this to be ..." and "...assuming a conservative ratio of 10...".
Don't make stuff up.
-Foredecker
Jibe!
Many many moons ago efficiency was everything. The CPU was expensive, the developer was (relatively speaking) cheap.
Then Moore's law really started to kick in, and we needed a paradigm shift. Developers were more expensive, and CPU cycles could be had on the cheap. The mantra was "code it fast, and only worry about efficiency for the bottlenecks if at all".
Fast forward to almost 2010, and we have web applications deployed on a massive scale. Guess efficiency matters again. Not only from a pure cost standpoint but also from a moral argument to cut back on greenhouse gases. Amazing that more people haven't seen this coming. Especially given that web services are normally free to the consumer, the cost side of the equation clearly matters.
"assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code"
ARRRRRGGGGHHHHHHHHHHHHH
Why? On what evidence? I mean, I hate PHP as much as the next guy, but last time I wrote a web application platform in C++, I got to the end, analysed the result and went "Great, I've made the fast bit even faster. Now, about that database engine..."
And everything exuding heat is perfectly natural, no problems there.
The deaths and environmental changes from heat exchange in rivers near power plants don't happen, nope, uh uh.
Water's perfectly natural you need it to live, no way to drown in it, nope, uh uh.
While true, it ignores that you're comparing a simple search box with millions of users who post multi-megabyte files to their personal space for everyone to see. Try it some day: save a Facebook user's page locally and see just how much data is coming down that pipe, on top of the scripts that are running.
You're comparing Google's front door with Facebook's entire company. Google probably has that many servers running web crawlers, and twice over again to store that massive database they use.
i thought once I was found, but it was only a dream.
"development" also has one.
Not to mention clients. 20K servers is nothing compared to the millions of clients drawing higher power due to running looping flash commercials.
What C++ has always lacked, and PHP, Java and others do not, is a bundle of standard libraries that let you do things like process XML, talk to databases, and make templating EASY.
That's it. PHP does the same things C++ does, but goes one beyond and adds a rich library and, of course, the ability to skip the "compile" step in write -> compile -> test.
I agree with you, but there's one small thing I don't get.
Faced with this piece of information, someone thought the logical thing to do was to, er, write an entirely new language?
Companies use PHP to develop and run web app functionality because it saves them huge amounts of time and money over rolling out the same thing if you were to write it all in C++. Realize what the cost structure of a company like Facebook is - the amount they pay their engineers, marketing personnel, and so on is significantly more than their amortized server expenses and server operating expenses (including energy costs, etc.).
Furthermore, the 10x speedup assumption seems ridiculous - how much time is spent on their server in compute-intensive PHP loops where huge gains would be made from switching to C++? And how much of the "code" is really database queries of various sorts? Furthermore, you can generally isolate small areas like that in your codebase and rewrite them as modules in C or C++ to be invoked from PHP land - and if they could easily cut their server expenses even in half (let alone by 90%) by having a few engineers spend a few weeks rewriting some components, don't you imagine they've probably set about doing that already?
Re-casting a discussion in terms of greenhouse gas emissions or energy use doesn't change any of this - saving energy generally means saving money, unless it takes more expensive resources (such as 100s of humans, who have to spend hundreds of months re-writing code in C++, while they, their families, and dependents emit tons upon tons of greenhouse gases, use electricity, buy groceries, and so-on and so-forth). The cheapest solution certainly isn't always the most environmentally friendly solution (such as when negative externalities are involved - lower labor and pollution standards in China, for example, that make a less "green" product manufactured there less costly in the US), but a vastly more expensive solution that no company in its right mind would implement isn't necessarily greener just because it might save some electricity and a few servers once it was implemented.
Obviously, fewer servers means not only a smaller CO2 footprint but also cheaper server infrastructure. If that were achievable, don't you think FB would have done it long ago? Economic forces are the main drivers of technology innovation in social networking!
It probably is a valid excuse if you have 20,000 client machines connecting locally via ethernet from a B class subnet such that the arp tables on the server keep overflowing.
Of course if you, as a system administrator, ever let such an environment be set up, you probably are really good at excuses anyway.
Yes. I know the difference. C is an elegant if simple language, which is hard to program properly. C++ is an abomination that attempted to extend the elegant, simple nature of C by bolting on spare body parts from dead object-oriented corpses, resulting in a language that is neither simple nor elegant, which is even harder to program properly.
See, I know the difference.
But if the point is to gain efficiency, why would you stop at C++? It's not a magical perfect balance of performance with elegance. C would give better performance than C++.
Sure, there's the non-OO tradeoff (though you could quite easily gain the benefits of OO, though not as elegantly as C++), and then you don't have to deal with fucking templates (which are really nice to program, but a bitch to clean up when someone else has fucked them up for you).
The premise of the article is stupid, and shows a pure lack of understanding of PHP, web service architecture and implementation, and a not-inconsiderable dose of C++ fanboi-ism.
Microsoft is to software what Budweiser is to beer.
"ever read someone's c++ code? has it been a good experience?"
Sure, when the code is written by someone who really knows how to use C++. Ever read bad PHP code? Bad Java code? I have seen programmers do things like this:
int int1, int2, int3, int4, int6, int7;
No, that is neither a joke nor an exaggeration, and the missing number is deliberate. This is a declaration I saw on a recent project. This kind of poor coding is language agnostic, and it is entirely irrelevant whether someone is using C++, PHP, or even a language like Haskell (bad Haskell code is worse than the worst C++ code I have ever seen -- if you use a functional language, get it right!).
On the other hand, I have seen some maintainable C++ code, with appropriate and useful comments, well thought out classes and class relationships, and expert use of the STL. I once worked on a project with C++ code that dated back to the early 90s, and had been continuously updated to support new features and needs, to make use of the STL (yes, this can be written into old code without causing a disaster), and to support systems that did not even exist when the code was originally written.
Don't blame the language, blame programmers who never learned about good programming practices. Blame computer science programs that give people degrees they do not deserve. Blame an industry that will hire anyone who can write a hello world program and then assume that they are capable of writing a maintainable system with millions of lines of code. The best programming language in the world will not solve the problem of poor programmers and poor coding practices.
Palm trees and 8
Maybe you should learn the language first. It seems there are an awful lot of people who love to comment on the complexity and performance of C++, who never bothered to really learn the language. Yet this doesn't stop them from pretending to be experts on it.
Part of the issue is the culture surrounding C and C++ code: there is a demand for backward compatibility. The C++ standards committee is very wary of breaking compatibility with previous editions of the standard, notwithstanding the breakage that compiler writers introduce. Thus, we are left with antiquated and frankly dangerous features that should not be used, but which novices wind up using anyway.
Strings are a perfect example. The C++ standard defines a string type that is decent enough and fixes a lot of problems associated with C-style strings. However, because of the demand for backward compatibility, C-style strings remain in the language, remain in use in parts of the library, and continue to wreak havoc on C++ programs.
I am not saying that backward compatibility is a bad thing -- in fact, I appreciate it very much, given the large body of old but useful code out there -- but I cannot deny that it creates problems. What I am saying is that I can see why someone would develop a new language to replace C++ instead of just writing a C++ library, given that there are a lot of people who write new code using these out of date features, thus creating a stream of horror stories about C++.
Palm trees and 8
IBM is working on a PHP compiler to create bytecode for the JVM. Using P8 as it is called, you can run your compiled PHP programs on a java application server such as Tomcat or JBoss.
She made the willows dance
Which would be very relevant if Facebook was doing heavy number-crunching. The only numbers on the site are comment and friend counts, which isn't especially taxing work (especially since it's all de-normalized). The majority of FB is database activity and transforming that into HTML and JSON. If you want to place blame for inefficiency, MySQL would probably be your best bet.
How are sites slashdotted when nobody reads TFAs?
What C++ has always lacked, and PHP, Java and others do not, is a bundle of standard libraries that let you do things like process XML, talk to databases, and make templating EASY.
I agree with you, but there's one small thing I don't get.
Faced with this piece of information, someone thought the logical thing to do was to, er, write an entirely new language?
What? Your logic is circular. PHP did not have standard libraries for XML (etc.) until after it existed, obviously.
PHP was invented as a lightweight server-side preprocessor as an alternative to CGI, not as a general-purpose systems-engineering low-level compiled language.
(I don't disagree with your gist that PHP is not well suited to many of the jobs it's used for today, but I wanted to clarify the history.)
-b
myselfmusic
Isn't this "study" a waste of energy?
I am a C/C++ programmer by trade; I'm not fond of PHP. Yet this "C++ saves energy over PHP" argument smells like more selfish politics to me. And selfish politics is what is bringing doom down on humanity's head -- the use of PHP vs. C++ is a sideline, a distraction, and only truly valuable for people who have a philosophical axe to grind.
You want to save a lot of energy? Shut down all the computers running MMOs. And stop wasting cycles looking for alien signals in cosmic radio waves. And get rid of banal YouTube videos... and... the list is endless. The science behind Global Warming is being used to further political and social agendas that have little or nothing to do with adapting our species from a potential environment change.
In the end, selfish politics will kill us all. We will become a footnote in history if we do not discover enlightened self-interest.
All about me
Ok, this has gone WAY too far .. we all need to just take a step back..
---- Booth was a patriot ----
Computer programmers are people with their own carbon footprints, $FLATULENCE_JOKE. People have raised objections to the underlying efficiency argument, and I tend to agree with those who estimate that the energy savings would be less than 10-fold, but it's not like I've looked at the diagnostic output of their servers.
Labor costs money, right? So if you assume that $X million worth of servers and electricity are cheaper than $X million worth of programmer time to reimplement the whole mess in C, then it's probably minimizing the carbon footprint to leave it alone. This ought to be a very simple business decision.
There are certainly cases where this is not true, but for most purposes, dollars spent on computer programming go directly to carbon footprint. I'm a Socialist, certainly not a free market fanatic by any stretch, but when it comes to spending millions on highly specialized, skilled labor to reduce carbon footprint, I doubt that it's worth it unless the electricity you save costs more than the specialized labor.
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
You cache pages on your server so that instead of going to the database to fetch info, the info is already there.
Until you have a good reason to believe the info has changed. Say, the user updated something or someone posted a message. Then you go back and get new data and cache it again.
You also cache page components. Parts of the page that are on a different update schedule than other parts of the page may be cached separately or not at all (like ads).
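The scheme described above (serve from cache, go back to the database only after something changed) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PageCache` class, the `fake_db` dictionary and the key names are hypothetical, and nothing here is taken from Facebook's actual setup.

```python
from typing import Callable, Dict, List


class PageCache:
    """Tiny in-memory cache: read through on a miss, drop entries on update."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load = load_from_db
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> str:
        # Only hit the database when the entry is not cached yet.
        if key not in self._store:
            self._store[key] = self._load(key)
        return self._store[key]

    def invalidate(self, key: str) -> None:
        # Called when there is reason to believe the data has changed.
        self._store.pop(key, None)


# Stand-in for the real database, plus a log of every lookup it serves.
fake_db = {"profile:42": "status: hello"}
db_hits: List[str] = []


def load(key: str) -> str:
    db_hits.append(key)
    return fake_db[key]


cache = PageCache(load)
cache.get("profile:42")                    # miss: goes to the database
cache.get("profile:42")                    # hit: served from the cache
fake_db["profile:42"] = "status: updated"  # the user posts an update
cache.invalidate("profile:42")             # so the stale entry is dropped
latest = cache.get("profile:42")           # miss again: fresh data is cached
```

Three page views cost two database lookups here. Caching page components separately just means using one key per component, so each part can be invalidated on its own schedule.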
Asking people to think is like asking them to buy you a new car
Why is it that a decent PHP (or Python, or Ruby) MySQL binding couldn't do the exact same thing?
Don't thank God, thank a doctor!
Developers that are diligent enough to make only 1 memory-related bug/year can certainly spell variable names correctly.
If you have statically typed language, you rely on types. If you have dynamic, you rely on unit tests. Both are probably equally slow :)
I think it's fair to say that FB servers generate a large amount of database I/O.
And their PHP code is likely running a lot of graph flow, pattern matching, and other data mining algorithms.
Including plaintext indexing and search algorithms
Remember, the whole point of the social network from an advertiser perspective is to select people on the network who are most likely to be interested in certain ads.
This suggests a lot of elaborate DM on FB's part.
Just because the intensive computations aren't obvious to the end-user, doesn't necessarily mean there is no heavy numerical computation being done behind the scenes.
What kind of work were those 10K req/sec on your own custom server doing? Was it a standard db-backed web app, or something more specialized and computationally intensive?
Not that I doubt the difference you saw - but I'm still skeptical of the 10:1 factor as applied to Facebook servers, which seem to follow a relatively standard webapp cycle (request -> datastore lookup -> html), *just from the programming language*.
Admittedly, I don't do PHP, so the language could be as bad and impossible to scale as you claim. But from their architecture description, I really doubt the request spends enough time (>= 90%?) in PHP computation to make *any* change there translate into a 10X improvement.
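The doubt above can be made concrete with Amdahl's law: if only a fraction p of each request is spent executing PHP and that part becomes s times faster, the whole request speeds up by 1 / ((1 - p) + p / s). A minimal Python sketch follows. The fractions are illustrative guesses, not Facebook measurements, and the function name is hypothetical.

```python
def overall_speedup(php_fraction: float, php_speedup: float) -> float:
    """Amdahl's law: whole-request speedup when only the PHP share gets faster."""
    if not 0.0 <= php_fraction <= 1.0:
        raise ValueError("php_fraction must be between 0 and 1")
    if php_speedup <= 0:
        raise ValueError("php_speedup must be positive")
    # The non-PHP share (database, cache, network waits) is left unchanged.
    return 1.0 / ((1.0 - php_fraction) + php_fraction / php_speedup)


# Assumed shares of request time spent in PHP, each with a 10x faster language.
for share in (0.5, 0.9, 1.0):
    print(f"{share:.0%} in PHP -> {overall_speedup(share, 10):.2f}x overall")
```

Even with 90% of the request spent in PHP, a 10x faster language gives only about 5.3x overall. At 50% it is about 1.8x, and the full 10x appears only if the request does nothing but run PHP.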
At least in my own experience, once you get into the '~1K req/sec' scenario on that type of webapp, the middle-tier code is rarely your main perf headache. You spend more of your time ensuring your data sources (sql+cache + any other services) keep up, and your middle-tier code spends much of its time waiting for whatever went out the socket to come back. There is always some perf improvement to make on the middle-tier, but if the request spends 90% of its time on the presentation layer... well, that's usually a perf bug, not by design.
There are good reasons to justify the cost of switching away from PHP - and they seem to be aware of them. But order-of-magnitude perf improvements would more likely result from architectural and datastore improvements - and sensibly enough, that seems where their efforts have focused.
Freedom is the freedom to say 2+2=4, everything else follows...
This is idiotic, and is typical of the kind of pseudo-science underlying much of the climate alarmism currently en vogue. Like a lot of things, it is pretty much impossible to quantify which language ultimately uses more power, because of all the variables. As others have pointed out, you might save some power in the deployment of the code, but you would surely use more power in the development of that code. Then, you have to figure out what the total impact of that is, since you'd have more man-hours of coding, using human coders, who sit at desks, in offices, which must be heated and cooled, etc., etc.
Their decision to use PHP might have to do with being able to get their business up and running now using PHP rather than envisaging go-live a few years down the road with their developer resources and learning curve adjusted to C++ (which in all its well-deserved glory does take its time to master). Probably C++'s savings in power don't outweigh PHP's savings in manpower.
Running a server is cheap.
Paying a developer is not.
Civilisation is largely about the multiplication of human effort through the consumption of energy and automation. So, we multiply this developer's effort by a couple of thousand when running one machine and then do the same on another several hundred machines beyond. Each costs several thousand dollars to purchase and several thousand more every year in electricity, in cooling, networking, management and maintenance.
So, the effects of developer incompetence are also multiplied several thousand times often across hundreds or thousands of systems. Millions if we're really lucky.
So it isn't just one server, it's just one extra datacenter. It often pays to hire better people.
running a server for a day - $1
You think you get a real server for that? You get a tiny division of a server for that kind of money.
2) why don't these big server farms start looking at migrating code from PHP to C or C++ when the PHP+web design is solid?
The network effect. They migrate to Java instead.
Speed to delivery is nearly always primary importance.
Indicating speculative projects and disposable code.
What about just turning off Facebook? The entire thing is a colossal waste of time. And while you're at it, shoot all the users, as most of them are idiots who are a waste of space.
Yes, it is harsh, but anyone who has not programmed in c and assembler, and then spouts off nonsense about how php can't possibly be 10x slower, doesn't have the programmer mind-set.
That mindset includes understanding the runtime environment - which means knowing the limitations of your tool - in this case php. That means you'll not "have" to do something in php because "when all you know is php, everything looks like it needs a script" rather than a different tool.
Case in point - generating test data. Say you want half a billion examples stuffed into a db. You can run a script, which will take forever, or you can write it in c. Real-life example - a "world" measuring 8k x 8k cells, with each cell also measuring something like (iirc) 100 x 80. You can write a script to generate all that, and it will take a week to run (actually, it would have taken 220 hours). Or you can take an hour to write a similar program in c, let it run during lunch (a longish lunch - an hour and a half), and get back to work in the afternoon.
That's why I'm a bit harsh on the "script it!" crowd. They're not very imaginative or curious, or they would learn c - it's not THAT hard.
Ever read bad PHP code?
My hobby is refactoring PHP code. Note I say hobby, and not job.
After cutting my teeth with C, I moved on to web development with Perl. I was really annoyed at all the quirks in that language, namely, bizarre subroutines instead of functions, and clever regular expressions everywhere. Perl was just a pain, and I still don't like it! So, I decided to give PHP a spin, and I liked it because it was closer to the C code I used to write.
It didn't take long for me to realize there was something seriously wrong with the language. After reading up on its history, I realized the problem: PHP is a crappy template engine that has been "upgraded" into a language over the course of many painful years. Horrible inconsistencies and limitations abound, names of functions make little sense, and it's taken years for even the most basic facilities to be kludged in. References work in reverse? I have to test for automatic quotes at runtime to make sure I don't double-encode input? Some text functions are binary safe and others aren't? Completely different APIs for MySQL 4 and MySQL 5? Yeah, okay... thanks.
I went out of my way to study the language and write code correctly, but I can't blame people for writing bad PHP code. PHP is the Windows of the web development world. Nobody really designed it, it just kind of "grew" out of some little hack project and caught on among the rookies.
The only reason I work with PHP is because I want to make redistributable code that will work on shared servers, where people cannot install a new language on the server. Perl and PHP are your only options for that, and Perl can be a bitch to get running correctly, depending on the server's security policy. I don't care if Python makes a comeback or Ruby catches on, or whatever. I just want... something else to work with.
Ergo, it is going to reduce the processing necessary on the server to do any given job
Any given job, yes. But if there are a lot more "jobs" (i.e. more requests that require server side processing), the efficiency of the language used on the server side tends to become more critical, not less, especially if the per request overhead is significant, something that happens to be one of Facebook's primary complaints about PHP.
While you're churning away at your super optimized C code which runs faster than god knows what and finally debugging the library to handle your super customized tcp/ip replacement, I'll have already rolled out the application you wanted to do, but in some "non-programming/scripting" language like PHP, Ruby, Python, or hell... even Java.
There's a purpose for every language out there and frankly, writing some form of code to have a computer perform specific tasks is called programming. So please contain your ego instead of going off and spouting things like...
scripting (which isn't "real programming"
> Sure C++ would be faster running
If you take APC or similar compile caches into account, I think you'll find that the gap is remarkably smaller than you'd expect. It'll never close entirely, but given that I've seen 20x speedups on some pages, the benefit is huge.
What a depressingly stupid machine.