The Environmental Impact of PHP Compared To C++ On Facebook
Kensai7 writes "Recently, Facebook provided us with some information on their server farm. They use about 30,000 servers, and not surprisingly, most of them are running PHP code to generate pages full of social info for their users. As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ had been used instead of PHP, then 22,500 servers could be powered down (assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code), which would mean a reduction of 49,000 tons of CO2 per year. Of course, it is a bit unfair to isolate Facebook here. Their servers are only a tiny fraction of computers deployed world-wide that are interpreting PHP code."
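The submitter's arithmetic can be reproduced in a few lines. This is a minimal sketch that takes both assumed inputs (25,000 PHP servers and a 10:1 efficiency ratio) at face value. The function names are illustrative. The per-server CO2 figure is not stated in the story; it is only what dividing the 49,000-ton total by the freed servers implies.

```cpp
#include <cassert>

// Servers still needed if the PHP tier's work ran `speedup` times faster,
// rounded up because a fraction of a server still means a whole machine.
int servers_still_needed(int php_servers, int speedup) {
    assert(php_servers >= 0 && speedup > 0);
    return (php_servers + speedup - 1) / speedup;
}

// The submitter's headline number: machines that could be powered down.
int servers_freed(int php_servers, int speedup) {
    return php_servers - servers_still_needed(php_servers, speedup);
}

// Tons of CO2 per freed server per year implied by the quoted total.
double implied_tons_per_server(double total_tons, int freed) {
    assert(freed > 0);
    return total_tons / freed;
}
```

With those inputs it yields 2,500 servers still needed, 22,500 freed, and roughly 2.18 tons of CO2 per freed server per year. Every output scales directly with the 10:1 ratio, which is the number most of the replies dispute.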
What about all the cycles compiling and debugging C++ code? Or all the trees torn down for C++ books? Or the environmental impact of C++ developers? I mean, have you ever had to share a cube with one of them? Pheewww.
Run and catch, run and catch, the lamb is caught in the blackberry patch.
That's a ridiculous way to analyze it. What about the environmental impact of the extra time required to write the same functionality in C++? What about the impact of whole classes of C++ bugs that don't exist in PHP (and, perhaps, vice versa), with the downtime or security breaches resulting from them? Or a hundred other ways in which writing all that software in C++ would be different that I can't think of at the moment?
Seriously, is somebody taking seriously the 1 to 10 ratio of the story?
I mean, maybe raw execution of pure code runs 10 times slower in PHP than in C++ (ouch, I didn't know that), but even then, that ratio doesn't carry over to the number of servers. You have to take into account all the other factors (disk access, network, I/O, etc.), and one would guess those aren't 10 times as slow in PHP.
I would be astonished if this ratio were anywhere close to the truth. Does anyone have any insight/information on this?
Write boring code, not shiny code!
The thing that this article fails to see is that some languages aren't for everyone. A PHP programmer who turns out good PHP code isn't going to magically produce the same level of code in C++. It also doesn't consider that Facebook can't be down for longer than an hour at most without risking user outrage. After all, they have many, many, many users, and for it to go down for a day would be akin to Google going down for a day or so. The difference is that if Google is down for a day, most users can use Yahoo, Bing, Live, WolframAlpha, etc. to search. Not every Facebook user has a MySpace.
Taxation is legalized theft, no more, no less.
That's crazy. 10:1 is incredibly unfair, especially when you consider that a cached C++ page takes just as much time to return as a cached PHP page. On top of that, the majority of the work done is just searching a database. I would imagine a large part of processing a page is getting and returning data, which is up to the database. He is using stats that say PHP is 10 times slower at running through loops, math, that type of crap. That says nothing about querying a database and then doing some minor presentation-related logic. If I had to guess, for a web page the average "efficiency gain" of using C++ would be under 2x.
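The "under 2x" intuition is essentially Amdahl's law (named here as the standard model for this; the poster did not cite it): only the part of a request that is CPU-bound in PHP gets faster, while database, network and disk time stay the same. A minimal sketch, where `cpu_fraction` and `speedup` are illustrative parameters rather than measured Facebook figures:

```cpp
#include <cassert>

// Amdahl's law: a fraction `cpu_fraction` of each request's time is spent
// in code that a faster language speeds up by `speedup`; the remainder
// (waiting on the database, network, disk) is unchanged.
double overall_speedup(double cpu_fraction, double speedup) {
    assert(cpu_fraction >= 0.0 && cpu_fraction <= 1.0);
    assert(speedup > 0.0);
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / speedup);
}
```

If half of each request's time is spent executing PHP and that half gets 10x faster, the whole request is only about 1.8x faster. Even at 90% CPU-bound, a 10x language speedup yields about 5.3x overall.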
You mean kind of like Roadsend?
http://www.roadsend.com/home/index.php?pageID=compiler
Read the first poster's points (in TFA); he pretty much sums everything up.
Just serve up plain text files. Anything else is pure decadence!
I'm thinking that these scripts are just thin front ends to a massive db. Thus, a lot of the computer's time is going to be spent on I/O, and a lot of the processing is going to be taking place in the db itself, which is probably written in C.
Mod points: Guaranteed to remove your sense of humor.
Side effects may include gullibility and temporary retardation
Simply put: no.
The reason why they have so many servers is because Facebook contains so much data. The servers are there for a reason, and the reason is CACHING.
The overhead of PHP is very small for a platform that is all about sharing data and the bulk of processor time surely goes towards fetching that data in the first place. What, do you seriously think that when you hit your home page on Facebook, there are database queries issued for that? Lulz.
Besides, I'm almost sure that FB uses something like Zend Accelerator, which increases code execution speed a lot.
Anyway, just no.
I don't care about your environmentalism.
Exactly. He's using results from benchmarks that are pure number-crunching tests. Seriously, here are the tests: pidigits, reverse-complement, regex-dna, k-nucleotide, n-body, fasta, binary-trees, fannkuch, spectral-norm, mandelbrot. Yep. Looks like stuff a web page would do... The biggest bottleneck is probably data access, in which case the language really doesn't make much, if any, difference.
Why not rewrite everything in assembly? This comparison comes to a conclusion without any facts to back it up. As others have pointed out, there is development time and compile time associated with C++... and what about ongoing development? Where does 10:1 come from? Are you assuming they aren't doing any optimization or using any sort of accelerator? I've personally rewritten PHP code in C++ and then done the comparison. In our case, we decided the extra maintainability was worth giving up the approx. 10-20% increase in speed we saw.
For something that is deployed to tens of thousands of machines..
Is there some reason why these languages couldn't be compiled and optimized? Code is just the programmer's will expressed as text that the machine can somehow interpret, right? If there is so much PHP out there, why wouldn't/couldn't there be an efficient compiler (by which I mean something that produces executables and not just "executables that are really just an interpreter tacked onto a script")?
The dearth of such compilers on the market suggests to me that the gains wouldn't be as great as claimed for the majority of applications where interpreted languages are used.
Can you be Even More Awesome?!
Does the author seriously believe that Facebook isn't running some sort of PHP compiling/caching service, like APC or something similar?
It would be ridiculous for them NOT to be running something like that, which eliminates much of the advantage C++ would enjoy through being pre-compiled. While there still may be a reduction if Facebook were magically changed to precompiled C++ code, the reduction would be fairly minimal. In addition to that, you'd need to factor in the debugging and coding/compiling times, which would exceed the PHP times by an order of magnitude at least.
What a troll. Any point or argument based on assumptions is very weak. Here there are two: "..Let's assume this to be ..." and "...assuming a conservative ratio of 10...".
Don't make stuff up.
-Foredecker
Jibe!
Google.com: 12,056 servers, website worth $186,352,889,952 USD (€126,610,389,668 EUR)
Facebook: 34,872 servers, website worth $5,253,772,177 USD (€3,569,475,862 EUR)
I'd say 'GOOG' is doing a bit better atm :)
For those interested, the info comes from http://websiteshadow.com/, which will show ad revenue etc. for URLs you provide.
Seriously, is somebody taking seriously the 1 to 10 ratio of the story?
Only 1 to 10?!? I would have thought 1 to 100.
"assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code"
ARRRRRGGGGHHHHHHHHHHHHH
Why? On what evidence? I mean, I hate PHP as much as the next guy, but last time I wrote a web application platform in C++, I got to the end, analysed the result and went "Great, I've made the fast bit even faster. Now, about that database engine..."
It's got a very rich set of features that are aimed straight at making web development dead simple. The syntax is fairly straightforward and familiar, being a typical mishmash of shell scripting, C and perlisms. It was built from day one to integrate with Apache, it's not a nasty bolt-on hack like mod_perl. It's in-process so there's no startup overhead like with CGI. I've been using it on some pretty large web sites for years and it's never let me down.
no longer working for cnet
...because I never thought I'd be defending PHP.
However, it is a much better choice for a web application than C or C++, and I say that as someone who codes C, C++ and Java for a living. There are no decent web frameworks for C++, memory management is still an issue despite the STL, and the complexity of the language inflates both staff costs and development time. Peer review is harder, as the language is fundamentally more difficult to master than PHP. Compared to Java, the development tools are poorer, and things like unit testing are more complicated despite the availability of tools like CppUnit. There are no "standard" libraries for things like database access, and no literature that I am aware of that describes how you would go about designing a framework for C++. You'd most likely end up porting something like Spring to C++, and even if you published your code on the web, I doubt much of a community would build up around it.
If you want a less contentious argument, and one which can be backed up with hard evidence, then argue that PHP should be replaced with Java. A well-written Java web application, using a lightweight framework such as Spring or PicoContainer, should outperform ad-hoc C++ code.
I use it because I can code up relatively fast, relatively secure dynamic websites in a very short amount of time. I can install it on a webserver in seconds and it integrates beautifully with Apache and MySQL. Maybe there is a better solution out there, but PHP has always done what I need it to and I've never had a problem with it. It's never given me a reason to look elsewhere.
What I don't understand is all of the PHP-haters out there. Really, who cares if it is "the script kiddie's substitute for cgi-perl"? Isn't the proper measure of a tool if it does what you need it to and not who else uses it?
Seriously, years ago I started working on a C++ version of J2EE (not just servlets, the whole kit), and I mean providing similar functions, not identical methods of execution, obviously. It wasn't terribly hard, actually. But it all falls apart really quickly for several reasons:
1) platform architecture - the dependence here, even between different versions of the same distribution, was a pain and essentially spelt the end of my work. So I was stuck with "do I make web apps C++ source, or shared library binaries?", to which there is only one real answer for portability: source.
2) it's a systems language - dear god, that makes it painful for so many reasons.
There are caveats to both of those, but the reality is that PHP exists because it fulfils a need and it does it quite well. To compare the two (C++ and PHP) is a little ridiculous, and ultimately this article just reeks of "please, everyone, advertise my C++ web toolkit for me!". Sure, Facebook (and trillions of others) MIGHT move to a C++ web toolkit, but find me a dev who knows how to code an app in it, now find me 2, now find me 200, because that's how many I'd need to write and maintain Facebook apps in C++.
Even taking as accurate the OP's assumptions that C++ is 10 times more efficient at what PHP does, that you could actually code Facebook in it, and that PHP vs. C++ is a one-to-one relationship for things like code maintenance, you're still stuck with "how many APIs am I going to have to rewrite, and how many PHP APIs do I use that don't even exist in C++?". It's ludicrous to assume that you could drop-in replace PHP with Wt ("witty") without ending up writing tons of C++ code just to do things that PHP already provides. Not to mention the zillions of little extensions around PHP that accelerate its web abilities (memcached, for example). The number of things that can be used alongside PHP for web-related work and the number of APIs built into PHP just mean Wt is never even going to be viable as an alternative. Let's also not forget there are millions of people around the globe using PHP for web stuff, which ultimately helps make PHP a good web language (security problems get found, optimizations get made, etc.).
Of course, wouldn't Facebook be using something like Zend to compile PHP pages? I mean seriously, if the 25,000 servers are running PHP and not running Zend, the waste here just in the cost of servers would be unbelievable: sheer idiocy on Facebook's part (if it were true, and I very much doubt it), and I imagine Zend would have almost given it away for free just so Facebook could say "we got an x% improvement using the Zend compiler".
So, I wonder how many people are now learning about witty for the first time (which seems like the only real reason for the article to begin with). Better advertising than adwords!
And everything exuding heat is perfectly natural, no problems there.
The deaths and environmental changes from heat exchange in rivers near power plants don't happen, nope, uh uh.
Water's perfectly natural you need it to live, no way to drown in it, nope, uh uh.
While true, it ignores things like the fact that you're comparing a simple search box with millions of users who post multi-megabyte files to their personal space for everyone to see. Try it some day: save a Facebook user's page locally and see just how much data is coming down that pipe, on top of the scripts that are running.
You're comparing Google's front door with Facebook's entire company. Google probably has that many servers running web crawlers, and twice over again to store that massive database they use.
i thought once I was found, but it was only a dream.
Yes, PHP is a heck of a lot slower on processor-bound tasks than C++. In a pure benchmarking contest, no doubt C++ will win.
But what about when both languages have to query a database (be it MySQL/PostgreSQL/Oracle, etc.)? In this case, both are blocked on the speed of the database. A 15 ms query takes 15 ms no matter what language is asking. Facebook is not calculating pi to 10 gazillion digits, and it is not checking factors for the Great Internet Mersenne Prime Search. It is serving up pages containing tons of customized data. This is not processor-bound... it is I/O-bound, both on the ins and outs of the database and the ins and outs of the HTTP request. It is also processor-bound on the page render, but the goal of this many machines is to cache to the point where page renders are eliminated.
Once a page is rendered, it can be cached until the data inside of it changes. For something like Facebook, I bet a page is rendered once for every ~10 times it is viewed by someone. Caching is done in RAM, and large RAM caches take a lot of machines.
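The render-once, serve-many idea can be sketched as a small in-process cache. This is illustrative only, not a description of Facebook's actual setup (which the poster is guessing at): the `PageCache` class, its per-key version counter for invalidation, and the render callback are all assumptions.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Render-once, serve-many cache. Each key carries a data version; a cached
// page is reused only while its version matches the current one.
class PageCache {
public:
    using Renderer = std::function<std::string(const std::string&)>;

    explicit PageCache(Renderer render) : render_(std::move(render)) {
        assert(render_);  // a renderer is required
    }

    // Call whenever the data behind `key` changes.
    void invalidate(const std::string& key) { ++version_[key]; }

    std::string get(const std::string& key) {
        const unsigned long current = version_[key];
        auto it = cache_.find(key);
        if (it != cache_.end() && it->second.version == current) {
            return it->second.html;  // hit: no rendering work
        }
        ++renders;  // miss or stale entry: do the expensive render
        std::string html = render_(key);
        cache_[key] = Entry{current, html};
        return html;
    }

    int renders = 0;  // number of times the renderer actually ran

private:
    struct Entry {
        unsigned long version;
        std::string html;
    };
    Renderer render_;
    std::unordered_map<std::string, unsigned long> version_;
    std::unordered_map<std::string, Entry> cache_;
};
```

Ten views of an unchanged page cost one render; after `invalidate`, the next view re-renders and later views hit the cache again. Across a fleet, a cache like this would live in something like memcached rather than inside one process, which is part of why caching alone can account for so many machines.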
So let's look at those 30,000 machines not by their language, but by their role. We can argue the percentages to death, but let's assume 1/3rd are database, 1/3rd are cache, and 1/3rd are actually running a web server, assembling pages, or otherwise dealing with the end users directly (BTW, I think 1/3rd is way high for that).
So 1/3rd of the machines are dealing with page composition and serving pages. If they serve a page ~10 times for every render request, then about 1/10th of the page requests actually cause a render... the rest are being served from cache. Those page renders are I/O-bound, as in the example above: they wait on the database (and other caches, like memcached), so even if they are taking a lot of wait cycles, they are not using processor power on the box. The actual page composition (which might be 20% of the processing that box is doing) would be a lot faster in C++... So of those 10,000 servers, the virtual equivalent of 2,000 are generating pages using PHP, and could be replaced by 200 boxes doing the same work in C++.
So the choice of using PHP is adding ~1,800 machines to the architecture, or ~6% of the total 30,000. Given that a PHP developer is probably 10x more productive than a developer in C++, is the time to market for new features worth that to them? I bet it is.
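The estimate above, written out so each assumption is an explicit parameter. Every input is the poster's guess (a third of 30,000 machines on the front end, 20% of their work being page composition, a 10x C++ speedup for that part), not a published Facebook number; `FleetEstimate` and `estimate` are illustrative names.

```cpp
#include <cassert>

struct FleetEstimate {
    double frontend;       // machines doing page composition and serving
    double php_cpu_equiv;  // machine-equivalents actually busy running PHP
    double cpp_cpu_equiv;  // the same work if it ran `speedup` times faster
    double saved;          // machines attributable to the language choice
    double saved_share;    // `saved` as a fraction of the whole fleet
};

FleetEstimate estimate(double total, double frontend_share,
                       double compose_share, double speedup) {
    assert(total > 0.0 && speedup > 0.0);
    const double frontend = total * frontend_share;
    const double php = frontend * compose_share;
    const double cpp = php / speedup;
    return FleetEstimate{frontend, php, cpp, php - cpp, (php - cpp) / total};
}
```

Calling `estimate(30000, 1.0 / 3, 0.2, 10)` reproduces the 10,000 / 2,000 / 200 / 1,800 chain and the ~6% share; changing any single guess shows how sensitive the conclusion is to it.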
Some optimized assembler would make a difference (ducks).
But network latencies, number of sustainable TCPs per session, db latency, weird table lookups (even arp drags a server down when you have 20K+ connects) are all at issue. Add in various dirty caches, file locks/unlocks and other OS machinations, and life can be tough for any app written in anything.
Then there are the backup servers, the availability servers, the DNS servers, the coffee servers, it just gets bogged down. A 10:1 efficiency claim is probably just language fanboy-ing..... or a consulting job looking for a spot marked X.
Certainly it's nice to be green... but using better optimization tricks (like GCD) for multi-cores is bound to help.... tickless kernels..... SSDs..... C++ wouldn't be my first pick.
---- Teach Peace. It's Cheaper Than War.
"development" also has one.
Not to mention clients. 20K servers is nothing compared to the millions of clients drawing higher power due to running looping flash commercials.
And water isn't a poison, but you'll still die if you drink too much of it.
It's a phenomenon we have also noted.
Sure C++ would be faster running but not necessarily more efficient in terms of dollars.
I think you'll find that the servers come out of the operational budget, not the development one. So the costs of running 10x more servers don't factor into development effort. The costs should of course be charged back to the dev teams.
The proposed ratio of 1:10 is real, if not bigger. And here's why:
1.) For each request, PHP has to load the entire application responsible for that particular response, including its configuration, etc. With memcache(d), you have to instantiate connection classes and reconfigure them, per request. Languages like C/C++, Python and Ruby have a different architecture to begin with. They load ONCE and each request triggers a FUNCTION or METHOD of a class, with all the app-specific configuration, db and memcached connections done and configured on app init, NOT per request.
2.) TFA mentions microsecond relevance! Even a simple echo "Hello World" will take much more time than similar action in C. I have yet to see a PHP helloworld app that does it in under 1msec, let alone the microseconds required.
3.) Arrays in PHP are slow, being always hashmaps. Other data structures can speed things up. You don't always need hashmaps. SplFixedArray is a joke, btw, and available only as of 5.3. It can't be compared to a vector anyway, and lots of fixed structures can be represented by structs or classes in C, which are faster than their PHP equivalents. Also, the app can instantiate them once on init, and just (re)load them when required.
4.) Even if all the app does is parse input vars and call memcache(d) / database funcs/methods to retrieve/store data, those calls are faster in C. Params can be parsed quicker in C, without requiring hashmaps, for instance.
5.) FastCGI is crap. If this app were to be done in C, then it would require its own HTTP layer, epoll-based (for Linux). It can take out all the crap in HTTP that is not required to parse the AJAX calls, and does not need to be "generic" enough to deliver static content.
6.) For such dedicated and distributed deployments, garbage collection is sometimes not required. For instance, fixed-length structures can be preallocated upon app init, and the app can really take as much RAM as possible on startup. Yes, that would limit the MAX number of users/connections per server, but so what? The app dominates the server, nothing else is required to run (except the basic OS environment for the app), so fixed memory consumption is not a problem.
7.) Even though each request has to wait for I/O of some sorts, either from memcache(d), from disk or from DB, you can process much more of these per front-end server and just scale backend servers as required. For example, with PHP your front-end server can serve 100k/sec, having X DB backends and Y memcached backends. With a C application, the front end can serve, say, 1M/sec. You still get to keep one front-end, even though you had to put more backends.
In short, you can significantly reduce the number of servers required if the app was written in C.
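Items 1 and 6 describe the same design: do the expensive setup once when the process starts, preallocate fixed-size structures, and let each request only touch what already exists. A minimal single-threaded sketch, assuming a hypothetical `App` whose constructor stands in for the one-time config and connection setup; a real server would add the epoll loop from item 5 and actual backend clients.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One preallocated slot per concurrent request (hypothetical layout).
struct Slot {
    bool in_use = false;
    std::string buffer;  // response buffer, reserved once at startup
};

// Everything expensive happens in the constructor, which runs once per
// process rather than once per request.
class App {
public:
    explicit App(std::size_t max_requests) : slots_(max_requests) {
        // A real server would also read its config and open its DB and
        // memcached connections here, once.
        for (Slot& s : slots_) s.buffer.reserve(4096);
    }

    // Claim a free slot for a new request; -1 means every slot is busy,
    // i.e. the fixed per-server limit that item 6 accepts.
    int begin_request(const std::string& path) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].in_use) {
                slots_[i].in_use = true;
                // Reuses the reserved capacity, so in practice a short
                // response needs no new heap allocation.
                slots_[i].buffer.assign("page for ");
                slots_[i].buffer += path;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    const std::string& body(int slot) const { return slots_.at(slot).buffer; }
    void end_request(int slot) { slots_.at(slot).in_use = false; }

private:
    std::vector<Slot> slots_;
};
```

With two slots, a third concurrent request is refused instead of allocating more memory, and it succeeds again once a slot is released. The sketch shows the shape of the design only; it says nothing about whether the claimed 10:1 holds.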
You can go to work in a F1 car, or your normal car.
I wish. My F1 always gets stuck in the gutter at the end of the driveway.
What C++ has always lacked, and PHP, Java and others do not, is a bundle of standard libraries that let you do things like process XML, talk to databases, and make templating EASY.
That's it. PHP does the same things C++ does, but goes one step further and adds a rich library and, of course, the ability to skip the "compile" step in the write -> compile -> test cycle.
I agree with you, but there's one small thing I don't get.
Faced with this piece of information, someone thought the logical thing to do was to, er, write an entirely new language?
From my personal experience: Data-heavy applications run at a complete crawl in PHP. 10 times slower, is, in my opinion, a vast understatement.
Then again, that’s not the point of PHP. The point is that in PHP, provided you already know how to program, you also get things done more than 10 times faster than in C++, because there is a simple function with defaults and automatisms for literally everything.
It’s only when those defaults and automatisms differ from what you expect that you get into big trouble. And because the PHP interpreter is truly a horrible piece of shit (I was able to run totally illegal constructs, with plain text right in the middle of the code, and it ran, doing nothing I expected it to do), that happens quite a lot.
It’s one reason that drove me to the extreme strictness of Haskell, where you have to get it right upfront, so it doesn’t bite you in the ass later.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
"even arp drags a server down when you have 20K+ connects"
Are you perhaps a server admin in my company? I swear this is the best excuse for poor performance I've ever heard.
Companies use PHP to develop and run web app functionality because it saves them huge amounts of time and money over rolling out the same thing if you were to write it all in C++. Realize what the cost structure of a company like Facebook is - the amount they pay their engineers, marketing personnel, and so on is significantly more than their amortized server expenses and server operating expenses (including energy costs, etc.).
Furthermore, the 10x speedup assumption seems ridiculous - how much time is spent on their server in compute-intensive PHP loops where huge gains would be made from switching to C++? And how much of the "code" is really database queries of various sorts? Furthermore, you can generally isolate small areas like that in your codebase and rewrite them as modules in C or C++ to be invoked from PHP land - and if they could easily cut their server expenses even in half (let alone by 90%) by having a few engineers spend a few weeks rewriting some components, don't you imagine they've probably set about doing that already?
Re-casting a discussion in terms of greenhouse gas emissions or energy use doesn't change any of this - saving energy generally means saving money, unless it takes more expensive resources (such as 100s of humans, who have to spend hundreds of months re-writing code in C++, while they, their families, and dependents emit tons upon tons of greenhouse gases, use electricity, buy groceries, and so-on and so-forth). The cheapest solution certainly isn't always the most environmentally friendly solution (such as when negative externalities are involved - lower labor and pollution standards in China, for example, that make a less "green" product manufactured there less costly in the US), but a vastly more expensive solution that no company in its right mind would implement isn't necessarily greener just because it might save some electricity and a few servers once it was implemented.
Isn't that how things work in the real world? Your faucet is broken so you burn down the house. Seems like the logical way of dealing with it to me.
It wouldn't be so popular in the first place if it was worthless. I wonder how much CO2 would be released into the atmosphere by the cars and computers of all the extra coders necessary to develop in C++ a website that could otherwise be developed in PHP, in the same period of time and with all other things equal. This is the same tired argument that can be used against *any* interpreted language. I love C++ - it's my favorite programming language - but I always use PHP for website development and I'll go on using it unless I'm forced not to.
It probably is a valid excuse if you have 20,000 client machines connecting locally via Ethernet from a class B subnet such that the ARP tables on the server keep overflowing.
Of course if you, as a system administrator ever let such an environment be setup you probably are really good at excuses anyway.
Yes. I know the difference. C is an elegant if simple language, which is hard to program properly. C++ is an abomination that attempted to take the elegant, simple nature of C by bolting on spare body parts from dead object-oriented corpses, resulting in a language that is neither simple nor elegant, which is even harder to program properly.
See, I know the difference.
But if the point is to gain efficiency, why would you stop at C++? It's not a magical perfect balance of performance with elegance. C would give better performance than C++.
Sure, there's the non-OO tradeoff (though you could quite easily gain the benefits of OO, though not as elegantly as C++), and then you don't have to deal with fucking templates (which are really nice to program, but a bitch to clean up when someone else has fucked them up for you).
The premise of the article is stupid, and shows a pure lack of understanding of PHP, web service architecture and implementation, and a not-inconsiderable dose of C++ fanboi-ism.
Microsoft is to software what Budweiser is to beer.
It was built from day one to integrate with Apache, it's not a nasty bolt-on hack like mod_perl. It's in-process so there's no startup overhead like with CGI
So mod_php is not a nasty bolt-on hack?
Don't forget to take account of the energy required to heat the water for the extra coffee it would take to build it in c++. People always forget about the coffee:production ratio.
"ever read someone's c++ code? has it been a good experience?"
Sure, when the code is written by someone who really knows how to use C++. Ever read bad PHP code? Bad Java code? I have seen programmers do things like this:
int int1, int2, int3, int4, int6, int7;
No, that is neither a joke nor an exaggeration, and the missing number is deliberate. This is a declaration I saw on a recent project. This kind of poor coding is language agnostic, and it is entirely irrelevant whether someone is using C++, PHP, or even a language like Haskell (bad Haskell code is worse than the worst C++ code I have ever seen -- if you use a functional language, get it right!).
On the other hand, I have seen some maintainable C++ code, with appropriate and useful comments, well thought out classes and class relationships, and expert use of the STL. I once worked on a project with C++ code that dated back to the early 90s, and had been continuously updated to support new features and needs, to make use of the STL (yes, this can be written into old code without causing a disaster), and to support systems that did not even exist when the code was originally written.
Don't blame the language, blame programmers who never learned about good programming practices. Blame computer science programs that give people degrees they do not deserve. Blame an industry that will hire anyone who can write a hello world program and then assume that they are capable of writing a maintainable system with millions of lines of code. The best programming language in the world will not solve the problem of poor programmers and poor coding practices.
Palm trees and 8
Maybe you should learn the language first. It seems there are an awful lot of people who love to comment on the complexity and performance of C++, who never bothered to really learn the language. Yet this doesn't stop them from pretending to be experts on it.
My own experience doing server development in C was that it's a minimum of 30:1 (and in some cases, much greater). Plus the speed differential is huge, and also in favour of C.
There's a big difference between a couple of hundred requests a second and 6,000 - 10,000.
Then again, the php code had to be served through apache, while the c code was served directly by a custom server sitting on a separate socket, so there's no telling how much of the overhead was from apache.
Even the absolute worst-case scenarios were well over 10:1.
mod_php has never integrated into Apache nearly as deeply as mod_perl did. That is, lower level Apache APIs are not exposed to PHP. Using mod_php is an acceptable replacement for CGIs, but mod_perl does a lot more than that. That means taking over the entire server life cycle handlers to the point where, in Apache2, you can implement (say) a Gopher server if you want.
mod_perl is not a hack. PHP, as a language and an API, very much is.
Not a typewriter
This is brilliant! I think it's clear now the direction we must go. Overuse of energy-guzzling languages like PHP have put us on an unsustainable trajectory fueling out of control global warming.
Congress must act to regulate the use of these energy-guzzling languages. No longer will programmers and corporations be permitted to turn out inefficient code with impunity.
PHP, Perl, Ruby, Bash, your days are numbered!
Just wait until we can get UN involved. Python, you and your CO2 spewing simplicity are next!
Part of the issue is the culture surrounding C and C++ code: there is a demand for backward compatibility. The C++ standards committee is very wary of breaking compatibility with previous editions of the standard, notwithstanding the breakage that compiler writers introduce. Thus, we are left with antiquated and frankly dangerous features that should not be used, but which novices wind up using anyway.
Strings are a perfect example. The C++ standard defines a string type that is decent enough and fixes a lot of problems associated with C-style strings. However, because of the demand for backward compatibility, C-style strings remain in the language, remain in use in parts of the library, and continue to wreak havoc on C++ programs.
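To make the contrast concrete, here is a small sketch using only the standard library (the function names are illustrative): the C-style version depends on every caller sizing a buffer correctly, which is where many of the horror stories come from, while the `std::string` version has no buffer to get wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// C-style: the caller must pick a big enough buffer. snprintf() never writes
// past `size` and returns the length it wanted to write, so truncation can
// at least be detected; strcpy()/strcat()/sprintf() would overflow instead.
bool greet_c(char* buf, std::size_t size, const char* name) {
    assert(buf != nullptr && size > 0);
    const int needed = std::snprintf(buf, size, "Hello, %s!", name);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}

// std::string: the object manages its own storage.
std::string greet_cpp(const std::string& name) {
    return "Hello, " + name + "!";
}
```

With an 8-byte buffer the C-style call truncates to "Hello, " and reports failure; the same code written with sprintf() would have written past the end of the buffer.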
I am not saying that the backward compatibility is a bad thing -- in fact, I appreciate it very much, given the large body of old but useful code out there -- but I cannot deny that it creates problems. What I am saying is that I can see why someone would develop a new language to replace C++ instead of just writing a C++ library, given that there are a lot of people who write new code using these out of date features, thus creating a stream of horror stories about C++.
Wrong: the language makes a huge difference. Try using the C API with CLIENT_MULTI_RESULTS and CLIENT_MULTI_STATEMENTS, concatenating 10,000 queries into one request, then using mysql_next_result() to get the next result set (no, not the next row, the next result set: 0 or more rows).
One connection. Not 10,000. A BIG difference in execution time. Testing showed that the optimum number of strcat()ed or sprintf'd queries was between 10,000 and 20,000 on hardware with limited resources (half a gig of RAM, single CPU).
If each page requires 50 hits on the database, you're going to see a big difference.
Now imagine this on a machine with much more ram and more than one core.
More reading: http://dev.mysql.com/doc/refman/5.0/en/mysql-next-result.html
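The batching idea can be sketched without linking against the MySQL client library. This sketch only builds the multi-statement string (with `std::string` instead of strcat()) and models why one round trip beats many. With the real C API, the batch would be sent once through mysql_real_query() on a connection opened with CLIENT_MULTI_STATEMENTS, and each result set read with mysql_store_result() and mysql_next_result(), which returns 0 while more results remain and -1 after the last one; those calls are described, not made. The latency figures in the checks are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join separate statements into one multi-statement string so they travel
// to the server in a single round trip. Any user data inside the statements
// must already be escaped (e.g. with mysql_real_escape_string()).
std::string batch_statements(const std::vector<std::string>& statements) {
    std::string batch;
    for (const std::string& s : statements) {
        if (!batch.empty()) batch += "; ";
        batch += s;
    }
    return batch;
}

// Rough latency model: unbatched queries pay the network round trip once
// each; a batch pays it once in total. Server-side work is the same.
double total_ms(int queries, double round_trip_ms, double per_query_ms,
                bool batched) {
    assert(queries >= 0);
    const double trips =
        (batched && queries > 0) ? 1.0 : static_cast<double>(queries);
    return trips * round_trip_ms + queries * per_query_ms;
}
```

For a page needing 50 queries at a hypothetical 0.5 ms round trip and 0.2 ms of server work each, the model gives about 35 ms unbatched versus 10.5 ms batched.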
IBM is working on a PHP compiler to create bytecode for the JVM. Using P8 as it is called, you can run your compiled PHP programs on a java application server such as Tomcat or JBoss.
She made the willows dance
It's more like you decide you want a whole new room dedicated to watching movies, but in order to add that to your current house you'd have to spend tens of thousands of dollars and get approval from city hall and your homeowner's association. Just for a fairly small addition.
So instead you decide to go build a new house the way you like it, from the ground up, and while you're at it you add ethernet outlets into the planning because you always wanted that in your old house but you would have had to take down the drywall in order to get them where you wanted.
Buckle your ROFL belt, we're in for some LOLs.
Which would be very relevant if Facebook was doing heavy number-crunching. The only numbers on the site are comment and friend counts, which isn't especially taxing work (especially since it's all de-normalized). The majority of FB is database activity and transforming that into HTML and JSON. If you want to place blame for inefficiency, MySQL would probably be your best bet.
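Amdahl's law (swapped in here as a standard tool; the poster did not cite it) puts a number on this point. If only a fraction p of each request's time is spent executing PHP and the rest is spent waiting on MySQL, the network and the disk, then making the PHP part s times faster gives an overall speedup of 1 / ((1 - p) + p / s). The fractions used below are illustrative, not measurements of Facebook's servers.

```c
#include <assert.h>

/* Amdahl's law: overall speedup of a request when a fraction p of its
 * time (0 <= p <= 1) becomes s times faster and the rest, e.g. waiting
 * on the database, network or disk, is unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, with p = 0.3 and the story's s = 10, amdahl_speedup returns about 1.37, nowhere near the tenfold reduction in servers the summary assumes. Only as p approaches 1 does the overall gain approach s.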
How are sites slashdotted when nobody reads TFAs?
What C++ has always lacked, and PHP, Java and others do not, is a bundle of standard libraries that let you do things like process XML, talk to databases, and make templating EASY.
I agree with you, but there's one small thing I don't get.
Faced with this piece of information, someone thought the logical thing to do was to, er, write an entirely new language?
What? Your logic is circular. PHP did not have standard libraries for XML (etc.) until after it existed, obviously.
PHP was invented as a lightweight server-side preprocessor as an alternative to CGI, not as a general-purpose systems-engineering low-level compiled language.
(I don't disagree with your gist that PHP is not well suited to many of the jobs it's used for today, but I wanted to clarify the history.)
-b
myselfmusic
Isn't this "study" a waste of energy?
I am a C/C++ programmer by trade; I'm not fond of PHP. Yet this "C++ saves energy over PHP" argument smells like more selfish politics to me. And selfish politics is what is bringing doom down on humanity's head -- the use of PHP vs. C++ is a sideline, a distraction, and only truly valuable for people who have a philosophical axe to grind.
You want to save a lot of energy? Shut down all the computers running MMOs. And stop wasting cycles looking for alien signals in cosmic radio waves. And get rid of banal YouTube videos... and... the list is endless. The science behind Global Warming is being used to further political and social agendas that have little or nothing to do with adapting our species to a potential environmental change.
In the end, selfish politics will kill us all. We will become a footnote in history if we do not discover enlightened self-interest.
All about me
Ok, this has gone WAY too far .. we all need to just take a step back..
---- Booth was a patriot ----
"Faced with this piece of information, someone thought the logical thing to do was to, er, write an entirely new language?"
By my understanding, the whole-new-language slant comes from the nightmare of C++ code out there to reuse, with its unintended consequences. PHP is very web-centric, and Java was the last attempt at a 'universal' coding setup. Python is an example of a new language, and of how much more complicated implementing a new language is.
Are you suggesting that they wrote PHP to avoid code reuse, that there hasn't been an attempt at a cross-platform language since Java, and that Python is complicated, all in the same paragraph?
Computer programmers are people with their own carbon footprints, $FLATULENCE_JOKE. People have raised objections to the underlying efficiency argument, and I tend to agree with those who estimate that the energy savings would be less than 10-fold, but it's not like I've looked at the diagnostic output of their servers.
Labor costs money, right? So if you assume that $X million worth of servers and electricity are cheaper than $X million worth of programmer time to reimplement the whole mess in C, then it's probably minimizing the carbon footprint to leave it alone. This ought to be a very simple business decision.
There are certainly cases where this is not true, but for most purposes, dollars spent on computer programming go directly to carbon footprint. I'm a Socialist, certainly not a free market fanatic by any stretch, but when it comes to spending millions on highly specialized, skilled labor to reduce carbon footprint, I doubt that it's worth it unless the electricity you save costs more than the specialized labor.
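The business argument above reduces to simple arithmetic. This sketch first reproduces the summary's figure (25,000 PHP servers at a 10x efficiency ratio leaves 2,500 needed, so 22,500 could be powered down), under the summary's own assumption that server count scales linearly with CPU work, and then checks whether the savings beat the cost of the rewrite. Both function names and every cost figure are hypothetical inputs, not data from Facebook.

```c
#include <assert.h>

/* Servers still needed if the same load runs `ratio` times more
 * efficiently. Assumes, as the story does, that server count scales
 * linearly with CPU work; rounds up because half a server won't do. */
long servers_needed(long current, double ratio)
{
    assert(current >= 0 && ratio >= 1.0);
    long needed = (long)(current / ratio);
    if (needed * ratio < current)
        needed++;
    return needed;
}

/* A rewrite pays off only if the running costs it saves over `years`
 * exceed its labor cost. All inputs are figures supplied by the caller. */
int rewrite_pays_off(long servers_saved, double cost_per_server_year,
                     double years, double rewrite_cost)
{
    return servers_saved * cost_per_server_year * years > rewrite_cost;
}
```

With a more modest ratio of 2, servers_needed(25000, 2.0) gives 12,500. Whether either saving justifies a rewrite depends entirely on the per-server and labor figures plugged into rewrite_pays_off.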
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
C++ is much too slow and carries too much of an overhead. And it usually requires an operating system on a general-purpose processor. You could go to hand-optimized binary code written directly for the processor but that still leaves us with inefficiencies.
Imagine if every website was implemented as an ASIC. Then we could talk about efficient datacenters. Maybe, if you're really strapped for cash, you could implement each website in an FPGA. But that should only be a stopgap measure until you can afford a proper implementation.
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
I have done projects like this, and received massive speedups and performance increases. The issue is that you need to understand the real reasons why rewriting a program in C and/or assembly gives a massive performance increase. Inevitably, the reason the C program is so much faster is that a programmer has gone through and rethought the application. The programmer eliminated string copies, string manipulations, data communication overheads, and data manipulation/translation overheads by rethinking the program's design.
For example, imagine a very simple application designed to take a digital input, and display a red/green indicator to a user depending on the input state. Count every time a major string overhead, data communication overhead, or data translation overhead occurs in each of the proposed solutions.
Web Solution
1. Input digital input via PLC (Data Overhead #1)
2. Upload data from input via PLC communications protocol to PC (Data Overhead #2)
3. Make data available to other programs, for example RSSQL makes real-time I/O appear as SQL database queries (Data Overhead #3)
4. Use PHP or ASP to generate a web page based on a SQL query for the real-time input (Data Overhead #4)
5. Use a web browser to query the relevant web page. (Data Overhead #5)
Web Solution performance: it might be able to update the display screen every 1/5 second.
Embedded C Solution
1. Input a data point using real-time I/O
2. Paint a computer's display screen accordingly. (Data Overhead #1)
C Solution Performance: 1/60 second, limited by the refresh rate of the monitor.
Assembly / Microcontroller Solution
1. Input the data point, with INP , AX
2. Output the data point to a Red/Green LED, with OUT AX,
Note: the assembly implementation doesn't have any string manipulation, so it doesn't have any significant data overhead.
Assembly Execution Time: Less than 1 micro-second.
The crucial concept from the above example is that the programmer reduced overhead and execution time by simplifying program operation. The problem was solved in 3 different ways, and the fastest solution wiped out all the communication/string/data management overhead. If you want to make a computer program very fast, it is necessary to reduce data communication, string manipulation, and complex data structure overhead.
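The hops in the web solution can be imitated in a few lines of C. The sketch below computes the same red/green decision twice: once straight from the input bit, and once by re-encoding that bit as text and parsing it back, which stands in (loosely) for the PLC, SQL and HTTP layers. All names are hypothetical, and nothing is timed here; the point is only how much extra formatting, parsing and comparison the layered path performs to reach the identical answer.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

enum indicator { RED, GREEN };

/* Read one digital input bit from a port value. */
static unsigned read_bit(unsigned port_value, unsigned bit)
{
    assert(bit < sizeof port_value * CHAR_BIT);
    return (port_value >> bit) & 1u;
}

/* Direct path: the bit drives the indicator with no translation. */
enum indicator indicator_direct(unsigned port_value, unsigned bit)
{
    return read_bit(port_value, bit) ? GREEN : RED;
}

/* Layered path: the same bit is turned into text, parsed back, and
 * rendered as a word before the decision is made; a loose stand-in for
 * the PLC protocol, SQL and web-page hops of the web solution. */
enum indicator indicator_layered(unsigned port_value, unsigned bit)
{
    char wire[32];
    int state = 0;

    /* hop 1: serialize the reading for the next layer */
    snprintf(wire, sizeof wire, "input=%u", read_bit(port_value, bit));
    /* hop 2: the next layer parses it back into a number */
    if (sscanf(wire, "input=%d", &state) != 1)
        return RED;
    /* hop 3: render it as text, then interpret that text */
    const char *page = state ? "green" : "red";
    return strcmp(page, "green") == 0 ? GREEN : RED;
}
```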
Which languages do this, and why:
Level 1 - Simplest: Assembly is the best at wiping out string overhead, because engineers willingly migrate complex functionality to hardware before implementing it in assembly. In this case, the display screen was eliminated in favour of a direct output to an LED.
Level 2 - Low-Level: C is remarkably quick at string manipulation programs, because programmers minimize the amount of string manipulation. String manipulation in C sucks, and is difficult to get correct. As such, programmers attempt to minimize it, or use optimized tools like lex/flex or yacc/bison that automate the difficult problems.
Level 3 - Garbage Collected: Java and .NET encourage carefree string use and data structure use. They have automatic garbage collection. As such, minimal penalties exist for the programmer to use strings.
Level 4 - Scripted: PHP, Perl, Python are higher level languages focused on easy programming for high-level tasks. They pretty much assume the programmer doesn't care about the overhead of processing strings or complex data structures. Instead, they make it easy for the programmer to program the complex data structures.
An application like Facebook has to have some complex data structures to do its job. In that case, a migration from PHP to C will likely not produce great benefits, because the C program still has to do all the same work the PHP program does. The old rule was that interpreters were very slow. With modern techniques, just about any language can be sufficiently compiled to
I came here for an argument!
turn up the jukebox and tell me a lie
Why is it that a decent PHP (or Python, or Ruby) MySQL binding couldn't do the exact same thing?
Don't thank God, thank a doctor!
Actually, both parent and GP are right. PHP is wonderful for web development, but has more than a few annoying quirks with regard to consistency.
On the flipside, it has hands-down some of the best documentation on the planet, which makes the quirks tolerable, and is a big part of the reason why the language is so popular (especially with new programmers)
I'm seriously hoping that a new PHP release finally clears up all of the inconsistencies in the main namespace once and for all. It'll be painful at first, but a very-good-thing in the long term. Updating old scripts could even be a semi-automated process, given that the necessary changes are extremely superficial.
-- If you try to fail and succeed, which have you done? - Uli's moose
Alright wise guy. Explain twitter.
/ \
\ / ASCII ribbon campaign for peace
x
/ \
Developers that are diligent enough to make only 1 memory-related bug/year can certainly spell variable names correctly.
If you have statically typed language, you rely on types. If you have dynamic, you rely on unit tests. Both are probably equally slow :)
I think it's fair to say that FB servers generate a large amount of database I/O.
And their PHP code is likely running a lot of graph flow, pattern matching, and other data mining algorithms.
Including plaintext indexing and search algorithms
Remember, the whole point of the social network from an advertiser perspective is to select people on the network who are most likely to be interested in certain ads.
This suggests a lot of elaborate DM on FB's part.
Just because the intensive computations aren't obvious to the end-user, doesn't necessarily mean there is no heavy numerical computation being done behind the scenes.
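As an illustration of the kind of graph work described above, counting the mutual friends of two users is a merge-style intersection of two sorted ID lists. This is a hypothetical example, not a description of anything Facebook is known to run, and the function name and ID type are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Count IDs present in both sorted, duplicate-free arrays by walking
 * them in step (a merge-style intersection, O(na + nb)). */
size_t count_common(const long *a, size_t na, const long *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;                      /* a[i] cannot appear later in b */
        } else if (b[j] < a[i]) {
            j++;                      /* b[j] cannot appear later in a */
        } else {
            common++;                 /* shared ID: advance both lists */
            i++;
            j++;
        }
    }
    return common;
}
```

With friend lists {1, 3, 5, 7, 9} and {2, 3, 4, 7, 10} it returns 2 (IDs 3 and 7). Each call is linear in the combined list length.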
This is idiotic, and is typical of the kind of pseudo-science underlying much of the climate alarmism currently en vogue. Like a lot of things, it is pretty much impossible to quantify which language ultimately uses more power, because of all the variables. As others have pointed out, you might save some power in the deployment of the code, but you would surely use more power in the development of that code. Then, you have to figure out what the total impact of that is, since you'd have more man-hours of coding, using human coders, who sit at desks, in offices, which must be heated and cooled, etc., etc.
Wrong - the language makes a huge difference. Try using the c api and CLIENT_MULTI_RESULTS and CLIENT_MULTI_STATEMENTS and concatenating 10,000 queries into one request, then using mysql_next_result() to get the next result set (no, not the next row, the next result set - 0 or more rows).
One connection. Not 10,000. A BIG difference in execution time.
Are you trying to imply that PHP establishes an entirely new connection to the database for every query? If so, you basically lose all credibility you might otherwise have.
It would take a really serious amount of in-depth analysis of the server application to even approach knowing what the efficiency impact of using a compiled language vs an interpreter would be on any specific stack. Or even stacks in general. Plus we don't even know what it really means to be "using PHP". What is PHP doing? Is it processing templates, doing just some post or pre processing with some kind of XML pipeline in the middle, how is the PHP deployed, etc?
It is simply ridiculous to make any assertions and claim accuracy for them. I'm no PHP fan boy by a LONG shot, but I know from hard experience that often a higher level tool which is optimized for a particular job can get the job done quite a lot MORE efficiently than a lower level one that isn't.
"Malo periculosam libertatem quam quietam servitutem." (I prefer dangerous freedom to peaceful slavery.) -- Jefferson
Their decision for using PHP might have to do with being able to get their business up and running now using PHP rather than envisaging go-live a few years down the road with their developer resources and learning curve adjusted to C++ (which in all its well-deserved glory does take its time to master). Probably C's savings in power don't outweigh PHP's savings in manpower.
Your post is really annoying. Did you mean to be so obnoxious? And +5, Insightful. Come on, php isn't popular with slashdotters but whatever one calls reverse fanboyism it isn't cool either.
No, features that make web development "dead simple" are those that actually do something to make web development simpler...
Absolutely. And PHP does it. That's why it's so popular. There may be even more that can be done but if no popular language is doing it already that argument is kind of pointless.
You contradict yourself.
No he doesn't. You might not like scripting / dynamic languages, but taking the best (or a good stab at taking the best) of scripting, C and Perl can actually make some things more straightforward. Need a regular expression? Used to function calls rather than syntactic regex? Need Perl regex? preg_match.
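For comparison, the closest thing in C (strictly, POSIX) is the regcomp()/regexec() interface from <regex.h>. It is also a function-call API, but it uses POSIX extended syntax rather than the Perl-compatible syntax of PHP's preg_* functions (which are built on the PCRE library), and the caller must compile and free the pattern explicitly. The wrapper name regex_matches is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <regex.h>

/* Return 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if the pattern will not compile. */
int regex_matches(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;                    /* nothing to free on failure */
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);                     /* release the compiled pattern */
    return rc == 0 ? 1 : 0;
}
```

regex_matches("^[0-9]+$", "12345") returns 1, while a malformed pattern such as "[" returns -1. In PHP the equivalent check is a single preg_match() call.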
Patently false. PHP has no dependency on Apache now, it originally used CGI, and continues to support CGI, FastCGI, and operation as a module in web servers other than Apache (such as IIS). The CGI startup overhead problem has many solutions, such as FastCGI, AJP, proxying, etc.
Patently missing the point. PHP and Apache go together so well it created the LAMP mindshare space.
But "not in-process" does not imply the use of CGI, and it does not imply the use of any system with long loading times. Furthermore, "in-process" is potentially insecure and can be less reliable - as all code runs in the same process.
Who cares? His point is startup cost which is generally higher for forks vs modules and you're just plain going to get more scalability compared to the traditional perl cgi forking method. Hence mod_perl.
Give me a break. You can dislike anything you want but why do you even bother when you don't have all the facts.
+5, Insightful. Dear me...
Selah.ca. Pause, and calmly think on that.
Running a server is cheap.
Paying a developer is not.
Civilisation is largely about the multiplication of human effort through the consumption of energy and automation. So, we multiply this developer's effort by a couple of thousand when running one machine and then do the same on another several hundred machines beyond. Each costs several thousand dollars to purchase and several thousand more every year in electricity, in cooling, networking, management and maintenance.
So, the effects of developer incompetence are also multiplied several thousand times often across hundreds or thousands of systems. Millions if we're really lucky.
So it isn't just one server, it's just one extra datacenter. It often pays to hire better people.
running a server for a day - $1
You think you get a real server for that? You get a tiny division of a server for that kind of money.
2) why don't these big server farms start looking at migrating code from PHP to C or C++ when the PHP+web design is solid?
The network effect. They migrate to Java instead.
Speed to delivery is nearly always primary importance.
Indicating speculative projects and disposable code.
Yes, it is harsh, but anyone who has not programmed in c and assembler, and then spouts off nonsense about how php can't possibly be 10x slower, doesn't have the programmer mind-set.
That mindset includes understanding the runtime environment - which means knowing the limitations of your tool - in this case php. That means you'll not "have" to do something in php because "when all you know is php, everything looks like it needs a script" rather than a different tool.
Case in point - generating test data. Say you want half a billion examples stuffed into a db. You can run a script, which will take forever, or you can write it in c. Real-life example - a "world" measuring 8k x 8k cells, with each cell also measuring something like (iirc) 100 x 80. You can write a script to generate all that, and it will take a week to run (actually, it would have taken 220 hours). Or you can take an hour to write a similar program in c, let it run during lunch (a longish lunch - an hour and a half), and get back to work in the afternoon.
That's why I'm a bit harsh on the "script it!" crowd. They're not very imaginative or curious, or they would learn c - it's not THAT hard.
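A minimal sketch of the kind of C generator described above: it streams one CSV row per grid cell through buffered stdio, using a small xorshift PRNG for fake values. The grid size, the row format and the function names are hypothetical, and this is not the poster's original program. The resulting file could then be bulk-loaded (for example with MySQL's LOAD DATA INFILE) instead of being inserted row by row.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* xorshift32 PRNG: deterministic, fast, good enough for fake data.
 * The state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Write one "x,y,value" CSV row per cell of a width x height grid.
 * stdio buffers the output, so rows reach the disk in large chunks.
 * Returns the number of rows written, or -1 on a write error. */
long generate_grid(FILE *out, long width, long height, uint32_t seed)
{
    assert(out != NULL && width >= 0 && height >= 0 && seed != 0);
    uint32_t state = seed;
    long rows = 0;
    for (long y = 0; y < height; y++) {
        for (long x = 0; x < width; x++) {
            unsigned value = (unsigned)(xorshift32(&state) % 1000u);
            if (fprintf(out, "%ld,%ld,%u\n", x, y, value) < 0)
                return -1;
            rows++;
        }
    }
    return rows;
}
```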
Ever read bad PHP code?
My hobby is refactoring PHP code. Note I say hobby, and not job.
After cutting my teeth with C, I moved on to web development with Perl. I was really annoyed at all the quirks in that language, namely, bizarre subroutines instead of functions, and clever regular expressions everywhere. Perl was just a pain, and I still don't like it! So, I decided to give PHP a spin, and I liked it because it was closer to the C code I used to write.
It didn't take long for me to realize there was something seriously wrong with the language. After reading up on its history, I realized the problem: PHP is a crappy template engine that has been "upgraded" into a language over the course of many painful years. Horrible inconsistencies and limitations abound, names of functions make little sense, and it's taken years for even the most basic facilities to be kludged in. References work in reverse? I have to test for automatic quotes at runtime to make sure I don't double-encode input? Some text functions are binary safe and others aren't? Completely different APIs for MySQL 4 and MySQL 5? Yeah, okay... thanks.
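The "binary safe" distinction comes down to how the length of the data is known. In C, functions built around NUL-terminated strings stop at the first zero byte, while functions that take an explicit length do not. The sketch below shows the difference with the standard strchr() and memchr(); the wrapper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Not binary safe: strchr() treats the first NUL byte as the end of
 * the data, so anything after it is invisible. */
int contains_cstr(const char *s, char c)
{
    assert(s != NULL && c != '\0');
    return strchr(s, c) != NULL;
}

/* Binary safe: memchr() scans exactly len bytes, embedded NULs included. */
int contains_bytes(const char *data, size_t len, char c)
{
    assert(data != NULL || len == 0);
    return len > 0 && memchr(data, c, len) != NULL;
}
```

For the 4-byte buffer {'a', 'b', '\0', 'z'}, contains_cstr(data, 'z') returns 0 because the search stops at the embedded NUL, while contains_bytes(data, 4, 'z') returns 1.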
I went out of my way to study the language and write code correctly, but I can't blame people for writing bad PHP code. PHP is the Windows of the web development world. Nobody really designed it, it just kind of "grew" out of some little hack project and caught on among the rookies.
The only reason I work with PHP is because I want to make redistributable code that will work on shared servers, where people cannot install a new language on the server. Perl and PHP are your only options for that, and Perl can be a bitch to get running correctly, depending on the server's security policy. I don't care if Python makes a comeback or Ruby catches on, or whatever. I just want... something else to work with.
Ergo, it is going to reduce the processing necessary on the server to do any given job
Any given job, yes. But if there are a lot more "jobs" (i.e. more requests that require server side processing), the efficiency of the language used on the server side tends to become more critical, not less, especially if the per request overhead is significant, something that happens to be one of Facebook's primary complaints about PHP.
While you're churning away at your super optimized C code which runs faster than god knows what, and finally debugging the library to handle your super customized TCP/IP replacement, I'll have already rolled out the application you wanted to do, but in some "non-programming/scripting" language like PHP, Ruby, Python, or hell... even Java.
There's a purpose for every language out there and frankly, writing some form of code to have a computer perform specific tasks is called programming. So please contain your ego instead of going off and spouting things like...
scripting (which isn't "real programming"
I remember when it was the script kiddie's substitute for cgi-perl. What does it offer from a theoretical and engineering PoV, apart from a Visual Basic learning curve?
Market penetration. From managerial perspective, you can hire PHP developers a dime a dozen, and replace them very quickly if needed. From developer perspective, you can grab any of those "PHP in 10 nanoseconds for complete idiots" books, an Apache+PHP+MySQL bundle installer for Windows, and learn it in a few days to the level sufficient to be hired.
Of course, the typical quality of a PHP solution is what you'd expect from such approach, but when did it ever stop anyone?
If you mean technological advantages, then there are none whatsoever. As a language, PHP today is essentially Java with weak typing, no proper packages, namespaces just being introduced (so no existing library uses them), and some very questionable language design decisions (like $a[1] being the same element as $a["1"], while $a["01"] is distinct).
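The $a[1] / $a["1"] / $a["01"] behaviour follows from a rule described in the PHP manual's section on arrays: a string key that is a valid decimal integer in canonical form is cast to an integer key, while anything else (leading zeros, a leading '+', and so on) stays a string key. The C function below is a sketch that models that rule, assuming a 64-bit build where integer keys are 64 bits; it is not taken from PHP's source, and the name php_int_key is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Model of PHP's array-key rule: a canonical decimal integer string
 * ("1", "-5", "0") becomes that integer; anything else ("01", "+1",
 * "-0", "1.0", "") stays a string key. Returns true and sets *out when
 * the key would become an integer. Assumes 64-bit integer keys. */
bool php_int_key(const char *s, long long *out)
{
    assert(s != NULL && out != NULL);
    const char *p = s;
    if (*p == '-')
        p++;
    if (*p < '0' || *p > '9')
        return false;                 /* empty, "+1", "-", "abc" */
    if (*p == '0' && (p[1] != '\0' || p != s))
        return false;                 /* "01" and "-0" are not canonical */
    for (const char *q = p; *q; q++)
        if (*q < '0' || *q > '9')
            return false;             /* "1.0", "12a" */
    errno = 0;
    long long v = strtoll(s, NULL, 10);
    if (errno == ERANGE)
        return false;                 /* too large for an integer key */
    *out = v;
    return true;
}
```

So "1" lands on the same slot as the integer 1, while "01" does not qualify and remains a separate string key, matching the example above.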
From library perspective, the coverage is okay - about what you'd expect from a decent modern platform - but API design is essentially random and inconsistent with no common guidelines followed, and things such as Unicode support are usually an afterthought.
In short, there's nothing there over Python or Ruby, or even Groovy or Boo.
As to why it ended up in the spot it is in? Well, it's actually fairly obvious when you look at the history. PHP version 4, the point at which its popularity skyrocketed, was released in 2000. At that point the established frameworks were ASP (not ASP.NET - that didn't exist yet), JSP, and ColdFusion. Mentions of MVC at this point, in the context of Web development, would just earn you some blank stares; at best, some particularly advanced Java devs would be aware of "model 2", handcoded via servlets and JSPs...
ColdFusion was both getting dated, and cost $$$. The latter bit especially meant that it was right out for many.
ASP was really simplistic, with VBScript as a primary language (and that was much more primitive than PHP), no decent IDEs, and not exactly fast either; also, while it was kinda free itself, you needed IIS to run it, and that (in 2000, remember?) came with Win2K, which not everyone in the "casual newbie developer" group had or even wanted to have, and which was more expensive than 9x/ME (meanwhile, Apache ran on 9x).
JSP itself was okay in this context, but it had two problems compared to PHP. First of all, Java is still a rather verbose language, and that complexity showed when you didn't have anything like modern frameworks mapping requests to beans etc. At that point, you had to work with raw request parameters (strings!), query the database in raw SQL, and output plain text data (strings!) - and while you can do that all in Java, the corresponding PHP code was usually much shorter. As well, no-one in PHP land cared about the theoretical advantages of database decoupling that JDBC gave you, because they (we, really; I was doing it at that time as well) just hardcoded mysql_* calls, because that's all that was expected to be supported in the foreseeable future.
The other problem JSP had was setting it all up. Today, you can just download Netbeans and get it all out of the box configured properly; back then, it usually involved getting JDK first, then downloading and configuring Tomcat (not for the faint of heart, too). I don't recall seeing any all-in-one, one-click setup bundles like there were for PHP.
And documentation. Oh yes, that still persists as a myth that "PHP has the bestest docs" (witness various fanboi replies in this thread). It hasn't been true for a few years at least, but back then it definitely was. The big deal was that the PHP manual was somewhat tutorial-like - something you could read without having any clue as to how it all works, and get the general idea along with the minimum of details that you actually need to get it all working. Meanwhile