The Environmental Impact of PHP Compared To C++ On Facebook
Kensai7 writes "Recently, Facebook provided us with some information on their server park. They use about 30,000 servers, and not surprisingly, most of them are running PHP code to generate pages full of social info for their users. As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ would have been used instead of PHP, then 22,500 servers could be powered down (assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code), or a reduction of 49,000 tons of CO2 per year. Of course, it is a bit unfair to isolate Facebook here. Their servers are only a tiny fraction of computers deployed world-wide that are interpreting PHP code."
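The arithmetic in the summary can be checked with a short sketch. The 25,000 PHP servers and the 10:1 ratio are the submitter's own assumptions, not measured figures. The per-server CO2 number is simply what the quoted 49,000 tons implies when divided by the servers saved. All variable names are illustrative.

```python
# Figures below are the summary's assumptions, not measurements.
php_servers = 25_000        # assumed share of the ~30,000 servers running PHP
efficiency_ratio = 10       # assumed C++ : PHP efficiency ratio
co2_saved_tons = 49_000     # yearly CO2 reduction claimed in the summary

# Servers still needed if the same work ran 10x more efficiently.
servers_needed = php_servers // efficiency_ratio
servers_powered_down = php_servers - servers_needed

# CO2 per server per year implied by the summary's own numbers.
implied_tons_per_server = co2_saved_tons / servers_powered_down

print(servers_needed, servers_powered_down, round(implied_tons_per_server, 2))
```

The result is only as good as the two assumed inputs, which is the point most of the comments below take issue with.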
All the more reason to use perl! notcop
I remember when it was the script kiddie's substitute for cgi-perl. What does it offer from a theoretical and engineering PoV, apart from a Visual Basic learning curve?
if there was a run-time compiler solution for PHP!
What about all the cycles compiling and debugging C++ code? Or all the trees torn down for C++ books? Or the environmental impact of C++ developers? I mean, have you ever had to share a cube with one of them? Pheewww.
Run and catch, run and catch, the lamb is caught in the blackberry patch.
That's a ridiculous way to analyze it. What about the environmental impact of the extra time required to write the same functionality in C++? What about the impact of whole classes of C++ bugs that don't exist in PHP (and, perhaps, vice versa) with the downtime or security breaches resulting from them? Or a hundred other ways in which writing all that software in C++ would be different of which I can't think at the moment?
Very poor reasoning. What about the increased number of developers and their systems needed when using a lower level language? I also wonder what the impact of their servers is compared to the impact of all of the user systems executing JavaScript client side. Could that JS code be optimized to make the client side functions greener? Also, maybe investing time in optimization of the current code base to reduce server load would be a benefit. First Post ?
The thing that this article fails to see is that some languages aren't for everyone. A PHP programmer who turns out good PHP code isn't going to magically produce the same level of code in C++. It also doesn't see that Facebook can't be down for longer than an hour at most, or it risks user outrage. After all, they have many, many, many users and for it to go down for a day would be akin to Google going down for a day or so. The difference being that if Google is down for a day, most users can use Yahoo, Bing, Live, WolframAlpha, etc. to search. Not every Facebook user has a MySpace.
Taxation is legalized theft, no more, no less.
How about we talk money instead. Apparently the cost for power and cooling is less than the cost of rewriting, at least within the planning horizon of Facebook management.
And while we are at it, fuck C++ too. :)
Yes, C-- (or even FORTRAN) is less CPU inefficient than PHP (or most any interpreted language). Why do you think CPU is limiting performance and causing the high server count?
GOOG has so many servers to be _fast_ (keep their DB in RAM). Facebook may be doing the same, or for disk.
Can we _please_ moderate stories? This one is -1 {TROLL}.
That's crazy. 10:1 is incredibly unfair. Especially when you consider that a cached C++ page takes just as much time to return as a cached PHP page. On top of that, the majority of the work done is just searching a database. I would imagine a large part of processing a page is in getting and returning data, which is then up to the database. He is using stats that say PHP is 10 times slower for running through loops, math, that type of crap. Says nothing about querying a database then doing some minor presentation related logic. If I had to guess, for a web page the average "efficiency gain" of using C++ would be under 2x.
I think they should be charged a 10% pollution tax for every server above the amount needed by C++ code.
Read the first poster's points (in TFA); he pretty much sums everything up.
Just serve up plain text files. Anything else is pure decadence!
I'm thinking that these scripts are just thin front ends to a massive db. Thus, a lot of the computer's time is going to be spent on I/O, and a lot of the processing is going to be taking place in the db itself, which is probably written in C.
Mod points: Guaranteed to remove your sense of humor.
Side effects may include gullibility and temporary retardation
Simply put: no.
The reason why they have so many servers is because Facebook contains so much data. The servers are there for a reason, and the reason is CACHING.
The overhead of PHP is very small for a platform that is all about sharing data and the bulk of processor time surely goes towards fetching that data in the first place. What, do you seriously think that when you hit your home page on Facebook, there are database queries issued for that? Lulz.
Besides, I'm almost sure that FB uses something like Zend Accelerator, which increases code execution speed a lot.
Anyway, just no.
I don't care about your environmentalism.
Why not rewrite everything in assembly? This comparison comes to a conclusion without any facts to back it up. As others have pointed out there is development time and compile time associated with C++... and what about ongoing development? Where does 10-1 come from? Are you assuming they aren't doing any optimization or using any sort of accelerator? I've personally re-written code in C++ from php, and then done the comparison. In our case, we decided the extra maintainability was worth the approx 10-20% increase in speed we saw.
...were they to rewrite it all in assembly language!
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
For something that is deployed to tens of thousands of machines..
Is there some reason why these languages couldn't be compiled and optimized? Code is just the programmer's will expressed as text that the machine can somehow interpret, right? If there is so much PHP out there, why wouldn't/couldn't there be an efficient compiler (by which I mean something that produces executables and not just "executables that are really just an interpreter tacked onto a script")
The dearth of such compilers on the market suggests to me that the gains wouldn't be as great as claimed for the majority of applications where interpreted languages are used.
Can you be Even More Awesome?!
Does the author seriously believe that Facebook isn't running some sort of PHP compiling/caching service, like APC or something similar?
It would be ridiculous for them NOT to be running something like that, which eliminates much of the advantage C++ would enjoy through being pre-compiled. While there still may be a reduction if Facebook were magically changed to precompiled C++ code, the reduction would be fairly minimal. In addition to that, you'd need to factor in the debugging and coding/compiling times, which would exceed the PHP times by an order of magnitude at least.
I'm assuming the claim about 10 times is true, though I don't really think it is...
But they could have done something - like precompile the PHP, just like Java's JIT, to make it better than or on par with a compiled C program.
There are PHP accelerators like Zend Accelerator for that.
This is why people don't take global warming seriously. Please, just stop it. If you really wanted to help, you could just fucking kill yourself and cut your carbon footprint to 0.
If you find this post offensive, don't read it! THINK ABOUT YOUR BREATHING! I am what I am because of how apes behave.
What a troll. Any point or argument based on assumptions is very weak. Here there are two: "..Let's assume this to be ..." and "...assuming a conservative ratio of 10...".
Don't make stuff up.
-Foredecker
Jibe!
What about the environmental impact of all the coke bottles required to power the C++ programmers?
This doesn't make much sense. You can go to work in an F1 car, or your normal car. You in theory will go faster in the F1 car.
In the real world, there are other "fasters". The normal car is "faster to buy" (cheaper), "faster to maintain" (cheaper to maintain), and lots of other "fasters" that make your normal car faster than your F1 car.
Facebook is probably one of the few sites that could have written part of it in fast C++ code. In an F1 race, you will use an F1 car.
-Woof woof woof!
Many many moons ago efficiency was everything. The CPU was expensive, the developer was (relatively speaking) cheap.
Then Moore's law really started to kick in, and we needed a paradigm shift. Developers were more expensive, and CPU cycles could be had on the cheap. The mantra was "code it fast, and only worry about efficiency for the bottlenecks if at all".
Fast forward to almost 2010, and we have web applications deployed on a massive scale. Guess efficiency matters again. Not only from a pure cost standpoint but also from a moral argument to cut back on greenhouse gases. Amazing that more people haven't seen this coming. Especially given that web services are normally free to the consumer, the cost side of the equation clearly matters.
"assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code"
ARRRRRGGGGHHHHHHHHHHHHH
Why? On what evidence? I mean, I hate PHP as much as the next guy, but last time I wrote a web application platform in C++, I got to the end, analysed the result and went "Great, I've made the fast bit even faster. Now, about that database engine..."
it sure seems to be, the way a lot of people write it. write it once and hope you never have to read it since it's impossible to figure out what they intended. ever read someone's c++ code? has it been a good experience?
--
"It is now safe to switch off your computer."
facebook should be re-written in Java.
It could run on a java interpreter, written in java, that would give it ultimate speed compared to C++.
C++ is dying, netcraft confirms it, Java is faster than C++, it's the wave of the future!
Yeah right serving a page out of cache in PHP takes ten times as long as C++.
Plain text vs. any web graphics. Who needs all that fancy graphics crap? If you can't get your message across with plain ASCII, then you are incompetent, lazy, and on the verge of being a mindless PowerPoint Ranger.
.. because I didn't ever think I'd be defending PHP.
However, it is a much better choice for a web application than C or C++ - and I say that as someone who codes C, C++ and Java for a living. There are no decent web frameworks for C++, memory management is still an issue despite the STL, and the complexity of the language means both staff costs and development time are inflated. Peer review is harder, as the language is fundamentally more difficult to master than PHP. Compared to Java, the development tools are poorer, and things like unit testing are more complicated despite the availability of things like Cppunit. There are no "standard" libraries for things like database access, and no literature that I am aware of that describes how you would go about designing a framework for C++. You'd most likely end up porting something like Spring to C++, and even if you published your code on the web, I doubt much of a community would build up around it.
If you want a less contentious argument, and one which can be backed up with hard evidence, then argue that PHP should be replaced with Java. A well written Java web application, using a lightweight framework such as Spring or PicoContainer, should outperform ad-hoc C++ code.
It's likely most of the overhead in Facebook's server farm is database-related and not PHP-related, meaning switching to C++ would not help much. Also, depending on what tasks the PHP codebase is performing, one can write binary libraries to speed up critical portions of the operation, improving performance to near-total-binary without reducing maintainability. I wouldn't be surprised if the people at Facebook were already aware of this.
developing web applications in c is not exactly a walk in the park. c was not designed for building web applications, or for maintaining them. whereas you can easily go through php code to develop new functions and improve the efficiency of new and existing functions speedily and economically, it won't be the same with c. what about those costs?
looking at the instantaneous state of the server/php/performance situation is as stupid as just looking at the instantaneous state of a mass production factory and declaring that certain assembly lines are not efficient or green. there are a great many factors and costs to count in the bigger picture.
a half assed approach, which, somehow, brings the word 'green' into the mix - maybe to garner some attention, since it is the issue these days.
Read radical news here
as with ALL things invented by humans and which can be used to create other stuff, php has grown beyond the 'homepage tools' it was initially created to be. not only does it now have a huge set of functions inside it to create full fledged applications, but through server modules it can also acquire an immense sea of functionality that can be used to perform innumerable other tasks. it is pretty much at the point where it can take over some desktop applications too, with the right server setup and modules - with some scripts and the proper modules you can even do a fair amount of word processing in any web front app. to the extent of being able to do drag&drop editing/drawing and pdf creation and so on. of course not quite as efficient as a native desktop program, however, regardless, you can.
just check php functions in php.net, and check some modules apache can use to supplement php. there is QUITE a lot.
Read radical news here
Seriously, years ago I started working on a c++ version of j2ee (not just servlets, the whole kit) and i mean providing similar functions, not identical methods of execution, obviously. It wasn't terribly hard actually. But it all falls apart really quickly because of several reasons:
1) platform architecture - the dependence here, even between different versions of the same distribution, was a pain and essentially spelt the end of my work. So I was stuck with "do i make web apps c++ source, or shared library binaries?" to which there is only one real answer for portability - source.
2) it's a systems language - dear god that makes it painful for so many reasons.
There are caveats to both those, but the reality is that php exists because it fulfils a need and it does it quite well. To compare the two (c++ and php) is a little ridiculous and ultimately this article just reeks of "please everyone advertise my c++ web tool kit for me!". Sure, facebook (and trillions of others) MIGHT move to a c++ web tool kit, but find me a dev that knows how to code an app in it, now find me 2, now find me 200 because that's how many i'd need to write and maintain facebook apps in c++.
Even taking the OP's assumption that c++ is 10 times more efficient at what php does, and that you could actually code facebook in it, as accurate, and assuming that php vs c++ is a one-to-one relationship for things like code maintenance, you're still stuck with "how many APIs am i going to have to re-write and how many php APIs do i use that don't even exist in c++". It's ludicrous to assume that you could drop-in replace php with witty without ending up coding tonnes of c++ code just to do things that PHP already provided. Not to mention the zillions of little extensions that revolve around php to accelerate its web-abilities (memcached for example). The number of things that can be used alongside php for web-related things and the number of APIs built into php just mean witty is never even going to be viable as an alternative. Let's also not forget there are millions of people round the globe using php for web stuff - which ultimately leads to php being a good web language (i.e. security problems being found, optimizations, etc etc).
Of course, wouldn't facebook be using something like zend to compile php pages? I mean seriously, if the 25000 servers are running php and not running zend the waste here just in cost of servers would be unbelievable - sheer idiocy on facebook's part (if it were true, and i'd very much doubt it) and I imagine zend would have almost given it away for free just so facebook could say "we got a x% improvement using the zend compiler".
So, I wonder how many people are now learning about witty for the first time (which seems like the only real reason for the article to begin with). Better advertising than adwords!
And everything exuding heat is perfectly natural, no problems there.
The deaths and environmental changes from heat exchange in rivers near power plants don't happen, nope, uh uh.
Water's perfectly natural you need it to live, no way to drown in it, nope, uh uh.
That’s like a cage match between a slow drooling retard and your crippled grandpa in his electric wheelchair.
In other words: Run it at double speed, add Yakety Sax to it, and it’s awesome. :D
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Code it in Asm, and you can get 100:1, so you can power down 29,700 machines...
Better yet, make ppl. post all their wall posts directly in binary code. That way, you can destroy the code necessary to translate UTF-8 back-and-forth, the HTTP/MIME wrappers, and the SQL. Imagine the amount of electricity saved! You can market it as a brain-booster too, since now you have to think before you post on Facebook.
Yes, PHP is a heck of a lot slower on processor-bound tasks than C++. In a pure benchmarking contest, no doubt C++ will win.
But what about when both languages have to query a database (be it mysql/postgres/oracle, etc)? In this case, both are blocked on the speed of the database. A 15 ms query takes 15 ms no matter what language is asking. Facebook is not calculating pi to 10 gazillion digits, and it is not checking factors for the Great Internet Mersenne Prime Search. It is serving up pages containing tons of customized data. This is not processor-bound... it is I/O bound both on the ins and outs of the database and the ins and outs of the http request. It is also processor bound on the page render, but the goal of this many machines is to cache to the point where page renders are eliminated.
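The I/O-bound argument above can be put into numbers with Amdahl's law. This is a minimal sketch: the 15 ms query comes from the comment, while the 5 ms of PHP CPU time per request is a made-up figure for illustration, and the function name is hypothetical.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: speedup of the whole request when only the CPU part gets faster."""
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


db_wait_ms = 15.0   # the 15 ms query from the comment above
php_cpu_ms = 5.0    # hypothetical time spent in PHP itself (assumption)

# Share of the request that a faster language could actually shorten.
cpu_fraction = php_cpu_ms / (db_wait_ms + php_cpu_ms)

# Apply the summary's assumed 10x gain to the CPU-bound share only.
speedup = overall_speedup(cpu_fraction, cpu_speedup=10.0)
print(f"request latency improves by a factor of {speedup:.2f}")
```

Under these assumed numbers a 10x faster language shortens the request by well under 2x, because the database wait does not shrink. This models latency only; how many requests one box can juggle while waiting is a separate question.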
Once a page is rendered, it can be cached until the data inside of it changes. For something like facebook, I bet a page is rendered once for every ~10 times it is viewed by someone. Caching is done in ram, and large ram caches take a lot of machines.
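The render-once, serve-many idea can be sketched as a small in-memory cache. This is only an illustration of the pattern under assumed names (`PageCache`, `invalidate`); it says nothing about how Facebook's caching is actually built.

```python
class PageCache:
    """Render-on-miss cache: a page is re-rendered only after its data changes."""

    def __init__(self, render):
        self._render = render      # function that builds a page for a user id
        self._pages = {}           # user id -> rendered page
        self.render_count = 0      # how many times the renderer actually ran

    def get(self, user_id):
        if user_id not in self._pages:
            self._pages[user_id] = self._render(user_id)
            self.render_count += 1
        return self._pages[user_id]

    def invalidate(self, user_id):
        """Call when the underlying data changes so the next view re-renders."""
        self._pages.pop(user_id, None)


cache = PageCache(lambda user_id: f"<html>home page for user {user_id}</html>")
for _ in range(10):          # ten views of the same page
    cache.get(42)
renders_before_change = cache.render_count

cache.invalidate(42)         # the data behind the page changed
cache.get(42)                # the next view triggers a second render
```

Ten views cost one render here, which matches the commenter's guess of roughly one render per ten views. The real ratio depends on how often the data changes.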
So let's look at those 30,000 machines not by their language, but by their role. We can argue the percentages to death, but let's assume 1/3rd are database, 1/3rd are cache, and 1/3rd are actually running a web server, assembling pages, or otherwise dealing with the end users directly (BTW, I think 1/3rd is way high for that.)
So 1/3rd of the machines are dealing with page composition and serving pages. If they serve a page ~10 times for every render request, then about 1/10th of the page requests actually cause a render... the rest are being served from cache. Those page renders are I/O bound, as in the example above - waiting on the database (and other caches, like memcached), so even if they are taking a lot of wait cycles, they are not using processor power on the box. The actual page composition (which might be 20% of the processing that box is doing) would be a lot faster in C++... So of those 10,000 servers, the virtual equivalent of 2,000 are generating pages using php, and could be replaced by 200 boxes running code written in C++.
So the choice of using php is adding ~1800 machines to the architecture, or ~6% of the total 30,000. Given that a php developer is probably 10x more productive than a developer in C++, is the time to market with new features worth that to them? I bet it is.
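The role-based estimate above works out as follows. Every input is one of the commenter's guesses (the one-third split, the 20% composition share) or the summary's 10:1 ratio, so the output is an illustration rather than a measurement.

```python
total_servers = 30_000
frontend_servers = total_servers // 3   # assumed: 1/3rd of machines serve pages
composition_percent = 20                # assumed share of front-end work that is page composition
cpp_ratio = 10                          # the 10:1 ratio taken from the summary

# Front-end capacity actually spent composing pages in PHP, in whole servers.
php_equivalent = frontend_servers * composition_percent // 100
# The same work at the assumed C++ efficiency.
cpp_equivalent = php_equivalent // cpp_ratio

extra_servers = php_equivalent - cpp_equivalent
extra_percent = 100 * extra_servers / total_servers

print(php_equivalent, cpp_equivalent, extra_servers, extra_percent)
```

With these inputs the cost of PHP is 1,800 machines, or 6% of the fleet, rather than the 22,500 machines claimed in the summary.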
You excrete shit as well. I suppose that sewage pond known as your mom's basement isn't polluted either.
brandelf -t FreeBSD
And water isn't a poison, but you'll still die if you drink too much of it.
AGW/CC is about as real as Scientology. Shut the fuck up already. "The Environment" has taken care of itself for billions of years. I'm not hurting it and it doesn't need you to save it.
Beyond that, this post has taken it to a brand new, stupidly religious place. You're now calculating numbers you pulled out of your ass to indict programming languages that you find to be "un-green."
That's my limit. Go sit in the stupid corner until you learn to interact with intelligent adults again.
It's a phenomenon we have also noted.
Sure C++ would be faster running but not necessarily more efficient in terms of dollars.
I think you'll find that the servers come out of the operational budget, not the development one. So the costs of running 10x more servers don't factor into development effort. The costs should of course be charged back to the dev teams.
gay, don't care
The proposed ratio of 1:10 is real, if not bigger. And here's why:
1.) For each request, PHP has to load entire application responsible for that particular response, including its configuration, etc. With memcache(d), you have to instantiate connection classes and reconfigure them, per request. Languages like C/C++, Python and Ruby have different architecture to begin with. They load ONCE and each request triggers a FUNCTION or METHOD of a class, with all the app-specific configuration, db and memcached connections done and configured on app init, NOT per request.
2.) TFA mentions microsecond relevance! Even a simple echo "Hello World" will take much more time than similar action in C. I have yet to see a PHP helloworld app that does it in under 1msec, let alone the microseconds required.
3.) Arrays in PHP are slow, being always hashmaps. Other data structures can speed things up. You don't always need hashmaps. SplFixedArray is a joke, btw, and available only as of 5.3. Can't compare it to a vector anyway, and lots of fixed structures can be represented by structs or classes in C, which are faster than in PHP anyway. Also the app can instantiate them once on init, and just (re)load when required.
4.) Even if all the app does is parse input vars and call memcache(d) / database funcs/methods to retrieve/store data, those calls are faster in C. Params can be parsed quicker in C, not requiring hashmaps for instance.
5.) FastCGI is crap. If this app were to be done in C, then it would require its own HTTP layer, epoll based (for Linux). It can take out all the crap in HTTP that is not required to parse the AJAX calls, and does not need to be "generic" enough to deliver static content.
6.) For such dedicated and distributed deployments, garbage collection is sometimes not required. For instance, fixed-length structures can be preallocated upon app init, and the app can really take as much RAM as possible on startup. Yes, that would limit the MAX number of users/connections per server, but so what? The app dominates the server, nothing else is required to run (except basic OS environment for the app), so fixed memory consumption is not a problem.
7.) Even though each request has to wait for I/O of some sorts, either from memcache(d), from disk or from DB, you can process much more of these per front-end server and just scale backend servers as required. For example, with PHP your front-end server can serve 100k/sec, having X DB backends and Y memcached backends. With a C application, the front end can serve, say, 1M/sec. You still get to keep one front-end, even though you had to put more backends.
In short, you can significantly reduce the number of servers required if the app was written in C.
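Point 1 of the list above contrasts setting everything up on each request with setting it up once at start-up. The sketch below only counts how often the setup runs in each model. `Backend` is a hypothetical stand-in for configuration loading and cache/database connections, and it does not measure real PHP or C behaviour.

```python
class Backend:
    """Stand-in for configuration loading and cache/db connection setup."""
    setups = 0

    def __init__(self):
        Backend.setups += 1          # count every (re)initialisation

    def fetch(self, key):
        return f"value-for-{key}"


def handle_per_request(key):
    # Per-request model: everything is set up again for each request.
    return Backend().fetch(key)


def make_persistent_handler():
    # Long-running model: set up once, then reuse across requests.
    backend = Backend()
    return backend.fetch


for i in range(1000):
    handle_per_request(i)
per_request_setups = Backend.setups

Backend.setups = 0
handler = make_persistent_handler()
for i in range(1000):
    handler(i)
persistent_setups = Backend.setups

print(per_request_setups, persistent_setups)
```

How much this matters in practice depends on how expensive each setup really is, which the sketch deliberately leaves out.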
The first article is actually rather good. It focuses on what most of us suspect is the larger architectural challenge: the database, IO, and scaling components not originally designed for a much larger scale. Lessons learned are avoiding joins, reducing IO requests, avoiding DBs for static data, etc. PHP is mentioned as the presentation layer, and optimizations are architectural, not switching languages. Criticisms of PHP are not ones of performance, but ones of maintainability, programming practices, and integrating with C++ code.
I can't read the second article because it's slashdotted, but the summary of it leads me to believe the author either completely ignored the first article, or didn't understand it at all. I won't re-hash the "Where the fuck did 10:1 come from?" arguments everyone else is very correctly bringing up. But I would like to point out that the author of the second article doesn't sound like he/she has a good grasp of what the first article says.
AccountKiller
and Facebook didn't. Facebook has no interest whatsoever in minimising their power usage (electricity is free, you see) and like all corporations they never look for ways of minimising their costs. There are no possible reasons why FB may operate their servers in this way.
It reminds me of when certain people start raging about the fact that "x% of trucks on the road are EMPTY!!". Yeah, because the big trucking firms apparently don't have rooms full of people whose job it is to make sure that the trucks are as full as possible, because trucks, diesel and drivers are free.
This is a substitute for a clever sig that fits within the maximum number of characters.
Sadly I couldn't RTFA because of the good old Slashdot effect but the concept that efficiency can be determined by a direct correlation to performance metrics is just wrong.
For the sake of argument I'll confine my examples of why I believe this thinking is flawed to just the language vs language issue and not bring in any network, database, etc. issues. First, how many more computer hours did it take to build in C++ than PHP? Second, if you build like for like functionality in C++ at a given point in time it probably isn't as flexible and maintainable, so all maintenance takes longer. Now let's assume you do things "right" and build in all sorts of flexibility and injection points: eventually you end up building a higher level abstraction (or perhaps even a full interpreted language) which has the same issues as PHP regarding performance.
The reason you accept performance declines associated with higher level abstractions is that it allows you to do more in a shorter amount of time at a still reasonable performance level, and anyone who doesn't understand that and all the impacts of that certainly can't produce a legitimate analysis of power consumption based on languages. If the author really believes this he should program everything in assembly, or even better build specialty hardware for every computing task, or better yet simply quit using computers or electricity altogether; that will definitely have a bigger impact.
What about the environmental trade-offs inherent in spending time considering this sort of environmental impact versus spending time considering more significant environmental and conservational issues?
Looks like he should re-write his webserver in C++ so it's not slashdotted so easily.
Companies use PHP to develop and run web app functionality because it saves them huge amounts of time and money over rolling out the same thing if you were to write it all in C++. Realize what the cost structure of a company like Facebook is - the amount they pay their engineers, marketing personnel, and so on is significantly more than their amortized server expenses and server operating expenses (including energy costs, etc.).
Furthermore, the 10x speedup assumption seems ridiculous - how much time is spent on their server in compute-intensive PHP loops where huge gains would be made from switching to C++? And how much of the "code" is really database queries of various sorts? Furthermore, you can generally isolate small areas like that in your codebase and rewrite them as modules in C or C++ to be invoked from PHP land - and if they could easily cut their server expenses even in half (let alone by 90%) by having a few engineers spend a few weeks rewriting some components, don't you imagine they've probably set about doing that already?
Re-casting a discussion in terms of greenhouse gas emissions or energy use doesn't change any of this - saving energy generally means saving money, unless it takes more expensive resources (such as 100s of humans, who have to spend hundreds of months re-writing code in C++, while they, their families, and dependents emit tons upon tons of greenhouse gases, use electricity, buy groceries, and so-on and so-forth). The cheapest solution certainly isn't always the most environmentally friendly solution (such as when negative externalities are involved - lower labor and pollution standards in China, for example, that make a less "green" product manufactured there less costly in the US), but a vastly more expensive solution that no company in its right mind would implement isn't necessarily greener just because it might save some electricity and a few servers once it was implemented.
Obviously, fewer servers for a smaller CO2 footprint also means cheaper server infrastructure. If that were the case, don't you think FB would have done it long ago? Economic forces are the main drivers of technology innovation in social networking!
As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ would have been used instead of PHP, then 22,500 servers could be powered down ( assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code)
In order to keep math simple, let's assume a horse is a perfect sphere...
Cool. Then C would be even faster. It should all be written in C.
Microsoft is to software what Budweiser is to beer.
Yes yes, very nice. Now make the language not shit, please.
This, in a discussion of C++?
Microsoft is to software what Budweiser is to beer.
blah, this is the sort of troll that makes us c++ lovers look so bad :(
Copyright infringement is "piracy" in the same way DRM is "consumer rape"
Nobody takes into consideration the serious environmental impact of C++ over PHP when it comes to designing, implementing and debugging applications, which take much more time and stress people more. They eat more, they shit more, they breathe faster, they need to spend more time working and most of them don't live close enough to the office to walk so they either drive or use public transportation. It takes a lot longer to learn C++ than PHP, therefore the developers will be wasting a lot of time without actually producing anything. Why would PHP be 10 times slower than C++ on the web, when most of the work is done by the [most likely written in C] database engine?
In short: TFA is comparing apples to oranges and says that oranges have 10 times more juice than apples after particularly squeezing that exact amount from them.
Yes, it's sarcasm. Deal with it!
Before I even start, let me just say I am a C/C++ coder, I've never really touched PHP, and if I were going for a more abstract language, PHP probably wouldn't be it (mind you, I've not written off PHP altogether; I rarely do that with programming languages, except for FORTRAN, COBOL and C#). I've got no favoritism towards languages; I use what best fits the task and try to make my software as readable and maintainable as possible.
First: where did these numbers come from? I find it hard to believe them, as I have seen actual benchmarks of PHP, not just WAG of "10 times as slow as C++".
Second: if the author is so worried about PHP being inefficient, why doesn't he help improve the efficiency of the interpreter? Remember, there are no efficient languages, only efficient implementations.
Third: has he even factored in the fact that higher level languages require less total development time? What about all those commuting hours saved by the programmers because they weren't having to run their PHP scripts through valgrind's memcheck?
Fourth: why C++? How about FORTRAN or assembler? FORTRAN compilers are extremely good at optimizing code, and I'm sure you could squeeze out a few more cycles by coding it in assembly.
Nathan's blog
Yes. I know the difference. C is an elegant if simple language, which is hard to program properly. C++ is an abomination that attempted to take the elegant, simple nature of C by bolting on spare body parts from dead object-oriented corpses, resulting in a language that is neither simple nor elegant, which is even harder to program properly.
See, I know the difference.
But if the point is to gain efficiency, why would you stop at C++? It's not a magical perfect balance of performance with elegance. C would give better performance than C++.
Sure, there's the non-OO tradeoff (though you could quite easily gain the benefits of OO, though not as elegantly as C++), and then you don't have to deal with fucking templates (which are really nice to program, but a bitch to clean up when someone else has fucked them up for you).
The premise of the article is stupid, and shows a pure lack of understanding of PHP, web service architecture and implementation, and a not-inconsiderable dose of C++ fanboi-ism.
Microsoft is to software what Budweiser is to beer.
To paraphrase Heinlein, who cares if the answer takes a microsecond or a nanosecond as long as it's correct.
Assuming that the average C++ coder is heavier and bigger in size (even when shaved) than the average PHP coder, I think the exhalation of CO2 produced by the C++ programmers needed for the job overpowers the 10:1 edge they have on code speed.
Also, they probably eat more, and drink more coffee, which turns into a bigger environmental footprint if you count the emissions produced by trucks that deliver those groceries to the nearest store. And, just to name one example: Let's assume that 50% of the coffee company employees drive to their work and so on....
Hey, they are the ones who started drawing conclusions based on assumptions, not me.
After all, what purpose do they really serve? Apart from fans of X Factor would anyone really notice if it was gone?
At the very least it would save us all from the annual slashdot Christmas bunfight over which code reigns supreme.
Posts, MyBio or Sig, may contain satire, sarcasm, bolded nouns be sardonic or even witty & be Church of SD
Dumb statements like this are what lead to premature optimization. Show me the proof: put a profiler on Facebook and show me where the bulk of code execution is happening. I seriously doubt one could code a similar app in C++ and make it run smoothly and stably and yet save that many servers.
If the goal is energy conservation, the server count might not be reducible -- # required for memory, network ports, disk seekers, or other things.
However, it is _certainly_ possible to reduce power and cooling requirements somewhat with less inefficient code. So you can install slower/lowerV/lowerW CPUs, or fewer cores (unless you are already at min). Or at least the CPUs spend more time in power-saving states.
The power reduction may not be all that great, ~20W per server, but over 25,000, that is still 0.5 MW -- roughly 0.44 M$/yr at about $0.10/kWh
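As a check on the arithmetic, here is a minimal sketch in Python. It takes the ~20 W per server and 25,000 servers from the comment; the electricity price of $0.10/kWh is an assumption, not a figure from the thread. Note that 20 W across 25,000 servers comes to 500 kW (0.5 MW).

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years

def annual_savings(watts_per_server, servers, usd_per_kwh):
    """Return (megawatts saved, dollars saved per year) for a fleet-wide power cut."""
    total_watts = watts_per_server * servers
    kwh_per_year = total_watts / 1000 * HOURS_PER_YEAR
    return total_watts / 1e6, kwh_per_year * usd_per_kwh

# 20 W and 25,000 servers come from the comment; $0.10/kWh is assumed.
megawatts, dollars = annual_savings(20, 25_000, 0.10)
print(f"{megawatts:.1f} MW, ${dollars:,.0f}/yr")
```

A saving of 5 MW would need about 200 W per server rather than 20 W, so the result is quite sensitive to that per-server estimate.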
Beware of false economies -- LoC does not matter if those lines are rarely executed. What runs often matters. What doesn't might not be worth the power investment of compilation.
My own experience doing server development in c was that it's a minimum of 30:1 (and in some cases, much greater). Plus the speed differential is huge, and also in favour of c.
There's a big difference between a couple of hundred requests a second and 6,000 - 10,000.
Then again, the php code had to be served through apache, while the c code was served directly by a custom server sitting on a separate socket, so there's no telling how much of the overhead was from apache.
Even the absolute worst-case scenarios were well over 10:1.
I am curious about an example of a company that has really done this conversion, and what their savings was like. Where is it?
Living in Chile
This is brilliant! I think it's clear now the direction we must go. Overuse of energy-guzzling languages like PHP has put us on an unsustainable trajectory fueling out of control global warming.
Congress must act to regulate the use of these energy-guzzling languages. No longer will programmers and corporations be permitted to turn out inefficient code with impunity.
PHP, Perl, Ruby, Bash, your days are numbered!
Just wait until we can get UN involved. Python, you and your CO2 spewing simplicity are next!
First, a few helpful links:
Amdahl's law says that if Facebook were to switch from PHP to C++, the best possible improvement in the overall processing time is proportional to the total time spent in PHP now. If PHP processing accounts for 90% of the time and they reduce that to zero, they'd have a 10x speedup. However, if it accounts for 10% of the time and they reduce it to zero, they'd have about a 10% speedup.
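The two cases above can be worked through with a short Python sketch of Amdahl's law. The function name and the 10x factor for the PHP portion are illustrative assumptions (the 10x comes from the summary's "conservative ratio"), not measurements of Facebook's workload.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

for fraction in (0.9, 0.5, 0.1):
    # The limit is the speedup if the PHP portion took no time at all.
    limit = 1.0 / (1.0 - fraction)
    actual = amdahl_speedup(fraction, 10)
    print(f"PHP share {fraction:.0%}: 10x faster code gives {actual:.2f}x, limit {limit:.2f}x")
```

Even in the 90% case, code that is merely 10x faster (rather than free) yields a bit over 5x overall, and in the 10% case the gain is about 1.1x.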
So, the question is: How much time (overall) is spent in PHP processing? My guess is not very much. As other posters have pointed out, there are disk accesses and MySQL. And quite a bit is cached in Memcached.
The original article is slashdotted now, so I'm not sure if it says what those 30k servers are doing, but Facebook has more than just PHP running. Perhaps a thousand of those servers are running Hadoop, probably calculating the social network.
From an architectural perspective, it probably does not make sense for them to optimize for processing speed (i.e., switch PHP to C++) if their performance is acceptable. That's because they face larger risks: modifiability and time to market pressures. They may worry that switching to a statically typed language (such as C++, but Java would be similar) would make new feature development slower. If they could have both, great, but these two quality attributes often trade off against each other. A design with better performance may hinder modifiability, and vice versa.
I don't mean to start a language war -- I'm speaking broadly about the idea that dynamically typed languages (PHP, Smalltalk, Ruby, Python, ...) yield programs that are faster to write and modify compared to statically typed languages (C, C++, Java). You may disagree with that generalization, but you may agree that others think it is true, and are therefore acting rationally if they choose a dynamic language when they want modifiability.
Disclaimers: I knew Aditya in school but haven't spoken to him about Facebook; I am writing a book on software architecture.
I'd actually say that Wt can be a more productive framework than resorting to writing individual pages in classic PHP. I'm not sure that I'd use it for general purpose work, however.
I agree that C++ is a much better programming language (no implicit declaration, decent type checking, access to good general purpose class libraries, a clear understanding of what is actually happening underneath, and the Standard Template Library - a collection of code that rivals the most beautiful hills of Scotland in implementation).
However, I'd not want to run native libraries/executables on a production web server. I can only imagine the security nightmare, and the potential for core dumping of web server processes, although with apache, the parent should re-spawn the workers... Also I'd have to write at least some fairly significant classes myself to replace functionality that is standard in the PHP libraries.
The big issue for any web application that I have ever written, has been performance of the database queries, not the frontend code. If you want to reduce the number of servers, developer effort would be best spent hiring people to look at caching, query optimisation, and load distribution; not re-writing the front end in C++.
Compile.
Why stop at 10? Since we're pulling numbers out of the air anyways, why not take a conservative estimate of 100 for the ratio of PHP to C++ execution, so they could run the whole thing on just 250 servers!
This seems to me like the poster has an axe to grind against PHP in general.
Even if you accept the "conservative" estimate of C++ being 10 times as efficient as PHP, you are still blindly assuming that each of their 30k servers is CPU bound. I find it just as plausible that memory, bandwidth, and/or storage are the limiting factors.
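To illustrate the point about limiting factors, here is a rough Python sketch. Every number in it is hypothetical and chosen only for illustration; nothing here reflects Facebook's real hardware or load. The idea is that the fleet size is set by whichever resource needs the most machines, so cutting CPU demand only helps if CPU is that resource.

```python
import math

def servers_needed(demand, per_server):
    """Fleet size is set by whichever resource needs the most machines."""
    return max(math.ceil(demand[name] / per_server[name]) for name in demand)

# Hypothetical per-server capacity and total demand (illustrative only).
per_server = {"cpu_cores": 8, "ram_gb": 16, "net_gbps": 1}
demand_php = {"cpu_cores": 80_000, "ram_gb": 200_000, "net_gbps": 5_000}

# Assume a rewrite cuts CPU demand tenfold and leaves everything else alone.
demand_cpp = dict(demand_php, cpu_cores=demand_php["cpu_cores"] // 10)

before = servers_needed(demand_php, per_server)
after = servers_needed(demand_cpp, per_server)
print(before, after)
```

With these made-up figures memory is the binding resource, so both counts come out the same despite the tenfold drop in CPU demand.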
Don't let a lack of facts stop you from using the "environment" to drive your agenda though, it doesn't seem to stop anyone else.
Isn't this "study" a waste of energy?
I am a C/C++ programmer by trade; I'm not fond of PHP. Yet this "C++ saves energy over PHP" argument smells like more selfish politics to me. And selfish politics is what is bringing doom down on humanity's head -- the use of PHP vs. C++ is a sideline, a distraction, and only truly valuable for people who have a philosophical axe to grind.
You want to save a lot of energy? Shut down all the computers running MMOs. And stop wasting cycles looking for alien signals in cosmic radio waves. And get rid of banal YouTube videos... and... the list is endless. The science behind Global Warming is being used to further political and social agendas that have little or nothing to do with adapting our species from a potential environment change.
In the end, selfish politics will kill us all. We will become a footnote in history if we do not discover enlightened self-interest.
All about me
10 to 1? Nonsense!! If they had used C++, there'd be no Facebook ... yet, for zero carbon footprint. Extremely efficient.
Ok, this has gone WAY too far .. we all need to just take a step back..
---- Booth was a patriot ----
That's right, because a similar level of chicanery is going on in this claim. A small factor of system expense is being extended into a region of pure nonsense. There are plenty more reasons to have a large, scattered base of servers. These include:
* Local database mirroring and caching to improve response times for dynamic content.
* Local proxying of static material to improve response times and to improve upstream bandwidth costs, and reduce the number of connections made to the core servers and avoid DDOS'ng yourself.
* The idea that PHP's function calls to pull and present disk-based or data-based material would somehow magically reduce the overall cost and need for servers, even though the request for material is probably one of the most efficient steps.
I've had eager young engineers extrapolate their favorite tool into being the great solution to all issues this way before. Educating them in the concept of looking for the _other_ bottlenecks is a painful process, and I wish I could have found a good course in it before a lot of recent projects myself.
Few web applications are CPU bound. Most are bound by I/O, often that is between the data store and the app. No language selection is going to resolve that problem. If you're really concerned about the carbon footprint of your app, you're better off spending your time refactoring to optimize its performance rather than rewriting your {PHP | Ruby | Python | Perl} code in C++ or C.
Also consider the fact that many web applications are over-deployed. An idle server consumes as much energy as a busy one. You could save yourself a lot of carbon by using performance monitoring tools to determine the optimal number of servers to run your app.
C/C++ is maybe 10-100x faster and more efficient for carefully written inner loops. At the level of whole systems, it's an entirely different story. Because C++ lacks garbage collection, people end up retaining far more memory than they need to. Because algorithms are far harder to express in C++, people end up using brute force algorithms (linear search, etc.) a lot. Because templates need specially compiled versions for each combination of template arguments, you end up with dozens of different instances of basically the same code.
For web applications, there's probably not much of a difference either way; but in scripting languages like PHP, all the inner loops that are needed are already written in C. For scientific computing, C++ is acceptable because a lot of applications really are mainly about the inner loops.
But for many applications, like GUIs, C++ not only fails to be faster, it also ends up making everything a lot slower and more bloated. If our desktops were largely written in Python, Ruby, or Smalltalk, we'd be using a lot less energy and be able to get by with smaller, less-powerful machines. That's in addition to all the savings from the reduced number of bugs and reduced development costs.
I think this is also an unfair comparison. What are you considering a PHP page? A page mostly of HTML and a few PHP tags? C++ has to render all of the HTML so how would you compare that? I have worked on programs that were 100% coded in PHP, and HTML was created by either echo or print statements; these are nightmares to maintain.
climate change is only bad for the poor living in coastal corrupt hell hole
here in canada, and in northern usa it is a good thing since our agricultural output will be higher
The script languages like Perl/PHP/Python/Ruby are dynamic, and fill a role that C++ can never fill. Further, while GC can be added to C++, it changes the programming style so much that it is nearly another language (makes using 3rd party libraries tricky).
Java is a middle ground between C++ and script languages. It has the garbage collection, dynamic class loading, "safe" execution model and extensive libraries like PHP/Python/Ruby, but long running programs like web apps get compiled to optimized native code as they run. Yeah, the language has warts, but it is still more productive vs programmer time than C++.
This is as stupid as the "news" a month back claiming a pet dog used the same environmental resources as a large SUV. The problem was, multiplying the claims for agricultural land used to feed one dog by the number of dogs gave as a result that 10% of our agriculture is feeding dogs. Why is that obviously wrong? Because the whole pet food industry accounts for less than 1/10 of 1% of the value of agricultural production.
We're going to see more and more of this shit. Everyone will be competing to get their 15 minutes of fame by claiming that X - something that annoys them, like PHP or, evidently, dogs - is the major overlooked factor in destroying the Earth's climate. Because they know our press is so stupid in basic scientific literacy that they'll jump at the chance to put a headline over the claim, since we're all bound to read it just in case there's a there there.
"with their freedom lost all virtue lose" - Milton
If Kensai7 is so concerned with the environmental impact of (supposedly slow) interpreted PHP, then (s)he should write an efficient PHP compiler.
This would not only make the Facebook site more environment friendly, but other PHP powered sites as well.
Here's one of their 2008 presentations on the topic - apc @ facebook.
Computer programmers are people with their own carbon footprints, $FLATULENCE_JOKE. So, people have raised objections to the underlying efficiency argument, I tend to agree with the people who estimate that the energy savings would be less than 10-fold, but it's not like I've looked at the diagnostic output of their servers.
Labor costs money, right? So if you assume that $X million worth of servers and electricity are cheaper than $X million worth of programmer time to reimplement the whole mess in C, then it's probably minimizing the carbon footprint to leave it alone. This ought to be a very simple business decision.
There are certainly cases where this is not true, but for most purposes, dollars spent on computer programming go directly to carbon footprint. I'm a Socialist, certainly not a free market fanatic by any stretch, but when it comes to spending millions on highly specialized, skilled labor to reduce carbon footprint, I doubt that it's worth it unless the electricity you save costs more than the specialized labor.
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
I was hired to do C++ code on a mature project. The original coders were quite skilled. The code was well commented, and the conventions used were (fairly) consistent. The code itself was well-spaced and readable with descriptive variable names.
Personally, I think C++'s greatest failing is that it is too hard for most developers to do it well. The greater difficulty is a consequence of its greater power; it gives developers a wider variety of options for performance optimization. But it is hard.
Many software developers have pride issues. They get really intimidated when they must work with someone who is authentically better than they are. So they really, really don't like to admit that doing C++ properly is too hard for them. Instead they will try to say that the language is flawed, and that other languages (which are less powerful and therefore easier to use) are actually, objectively, superior, and that using C++ is pretty much the wrong decision in any case. Needless to say, I disagree.
C++ is much too slow and carries too much of an overhead. And it usually requires an operating system on a general-purpose processor. You could go to hand-optimized binary code written directly for the processor but that still leaves us with inefficiencies.
Imagine if every website was implemented as an ASIC. Then we could talk about efficient datacenters. Maybe, if you're really strapped for cash, you could implement each website in an FPGA. But that should only be a stopgap measure until you can afford a proper implementation.
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
It's all data. The PHP code that mashes it together is a tiny fraction of the backend processing which manages the data store.
In general code written in a dynamically typed language like PHP is harder to test than code written in a statically typed language like C++. The reason why is that statically typed language compilers catch hundreds of problems at compile time that dynamic languages typically cannot catch until run time, and with complete code coverage at that. Misspelled variables and other minor typos anyone?
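A small sketch of the misspelled-variable case, using Python as a stand-in for a dynamically typed language (PHP's exact behaviour on an undefined variable differs, so treat this as an analogy). The function and the typo are made up for illustration.

```python
def total_price(items, discount=False):
    total = sum(items)
    if discount:
        # Typo: `totl` is never defined. Python only notices when this line runs.
        total = totl * 0.9
    return total

print(total_price([10, 20]))  # the common path works fine

try:
    total_price([10, 20], discount=True)
except NameError as err:
    # The mistake surfaces only once a test or a user reaches this branch.
    print("caught only at run time:", err)
```

A C++ compiler would reject the equivalent undeclared identifier before the program ever ran, which is the kind of check the comment is referring to.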
In a sufficiently large project, all the time one might save not going through an extended compile cycle quickly gets eaten up by the extra testing overhead, testing which for user interface intensive applications is rather difficult to automate.
Of course the developers have to know what they are doing. If you run into more than one dangling pointer or array overflow problem per ten thousand lines of code, someone is either ignorant or careless. C++ programming is not for code monkeys.
C is certainly faster than PHP (I have written programs in both), but I don't think it is too great a problem in this way. If you want to make your PHP code take up less energy then possibly you can run an optimizer on it and store the optimized code. This would speed it up a bit (although it doesn't seem to help much).
I have written two programs, an assembler in C, and a program to copy the assembler's output to the hard drive image using PHP. The C program runs nearly instantly, while the PHP program, even though it is much simpler and just calls a C function to do the direct copy anyways, takes 1 second to run.
And, of course, like mentioned, you do need water and heat and stuff to live, and the major cause of global warming is the sun. And global warming is not as serious as many people say, but nevertheless, if you want to help by being more environmentally friendly you can please do so (as long as it is not a mistake).
That's because if they were writing in C++ they would still be writing the application and wouldn't have any users yet. Larry
...covered. Ban all programming languages except assembler! They emit CO2 and will lead to the destruction of the universe!
I do some basic php development and really couldn't imagine having to do the same work in C++. It would take so much more time it would make it nearly impossible to do.
so maybe it's time to optimize php? Perhaps do some hardware acceleration of php code with CUDA -or- maybe just focus on the php interpreter and make it more efficient.
as with any scripting language, an un-optimized interpreter will run the script code an order of magnitude slower or worse than a pre-compiled program.
I have seen some comments that it isn't likely to cause such a difference in server needs because it doesn't take into account I/O performance. Well, consider that everything must be bought with some sort of budget in mind, then consider that a nice dual quad-core server will cost many thousands of dollars more than a single Core 2 Duo server. A 2-socket 3 GHz quad core has 24 GHz aggregate; divide by the 10:1 ratio and you need a single 2.4 GHz CPU, so a Core 2 Duo E8400 is actually worth twice the CPU power for pre-compiled code vs a dual quad-core system running php. The difference in price could be spent on SSD disks or a faster SAS array and actually improve performance.
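The back-of-the-envelope numbers above can be written out as a short Python sketch. Summing clock speed across cores is a very crude proxy for throughput, and the 10:1 ratio is the summary's assumption, so this only checks the arithmetic, not the conclusion.

```python
def aggregate_ghz(sockets, cores_per_socket, ghz_per_core):
    """Crude capacity proxy: total clock rate summed over all cores."""
    return sockets * cores_per_socket * ghz_per_core

php_box = aggregate_ghz(2, 4, 3.0)      # dual-socket quad core at 3 GHz
needed_compiled = php_box / 10          # apply the assumed 10:1 ratio
e8400 = aggregate_ghz(1, 2, 3.0)        # Core 2 Duo E8400: two cores at 3 GHz

print(php_box, needed_compiled, e8400 / needed_compiled)
```

By this measure the E8400 has about 2.5 times the 2.4 GHz the compiled code would need, slightly more than the "twice" quoted.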
you can test this yourself by running a bash script to run 10,000 queries on a mysql database with the compiled mysql tools vs a php script to do the same thing (run on the command line) and then subtract the time the database took to return the data. This is a very typical workload for php and it will lose this race every time.
do I need to put some result here? no, do it yourself so you can see first hand.
Seriously, why was this even posted? While we are at it, let's mod SoulSkill as +1 Revenue Generator so even then this should get scored as -1 "What! Are you fucking kidding me!" (with apologies to Robin Williams)
But on a more on-topic note, PHP and C++ are languages for two totally different things and if SoulSkill doesn't know that, he has no business being an "editor".
Hey KID! Yeah you, get the fuck off my lawn!
FORTRAN is designed to do numeric processing. FORTRAN compilers are very good at optimizing such code. FORTRAN is not at all optimal for doing much of anything else.
Similarly, with the right framework, someone might write a general purpose web application in C++, because you can make string processing code a relatively painless exercise with the right class libraries. Plain old C, on the other hand, is much worse - essentially a language designed not to do string handling very well or very reliably.
Even with C++, there is an enormous interoperability and efficiency problem with strings of different flavors, and I would rank that problem as #1 on a list of why few people do general purpose business programming in C++. I have *never* seen a standard C++ library compatible C++ string implementation that was worth using in both compile time and run time efficiency terms. Certainly the implementation that comes with GCC doesn't qualify...
This guy takes some benchmarks that have absolutely no basis in actual real world performance and beats his drum.
What, does he want a medal for a beat up on /.?
Drop this fool down a well, and leave optimisation to those who understand it.
1. Prototype PHP code written and tested in an afternoon.
2. Business case written in an afternoon (forgetting to include the profit generating point.)
3. Vulture Capitalist drinks too much at lunchtime pitch and agrees to provide $BIGNUM
4. Development of real codebase begins.
5. Vulture Capitalist sobers up and demands that the new service starts NOW!
6. Prototype code goes into production.
7. Real codebase development abandoned.
The PHP interpreter can and should run in-process to the webserver. Compiled C++, not so much.
Now, I imagine Facebook's scripts are really quite simple - fetch from the database and do some formatting. Moreover, they're called like a thousand times a second. Grabbing from the DB is already the expensive bit; C++ won't help that. But starting thousands of processes a second can't possibly be faster than the in-server interpreter effectively just looping.
Am I missing something or is this a pointlessly stupid article, bordering on troll?
I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
I have done projects like this, and received massive speedups and performance increases. The issue is that you need to understand the real reasons why rewriting a program in C and/or assembly gives a massive performance increase. Inevitably, the reason why the C program is so much faster is that a programmer has gone through and rethought the application. The programmer eliminated string copies, string manipulations, data communication overheads, and data manipulation/translation overheads by rethinking the program's design.
For example, imagine a very simple application designed to take a digital input, and display a red/green indicator to a user depending on the input state. Count every time a major string overhead, data communication overhead, or data translation overhead occurs in each of the proposed solutions.
Web Solution
1. Input digital input via PLC (Data Overhead #1)
2. Upload data from input via PLC communications protocol to PC (Data Overhead #2)
3. Make data available to other programs, for example RSSQL makes real-time I/O appear as SQL database queries (Data Overhead #3)
4. Use PHP or ASP to generate a web page based on a SQL query for the real-time input (Data Overhead #4)
5. Use a web browser to query the relevant web page. (Data Overhead #5)
Web Solution performance: it might be able to update the display screen every 1/5 second.
Embedded C Solution
1. Input a data point using real-time I/O
2. Paint the computer's display screen accordingly. (Data Overhead #1)
C Solution Performance: 1/60 second, limited by the refresh rate of the monitor.
Assembly / Microcontroller Solution
1. Input the data point, with INP , AX
2. Output the data point to a Red/Green LED, with OUT AX,
Note: the assembly implementation doesn't have any string manipulation, so it doesn't have any significant data overhead.
Assembly Execution Time: Less than 1 micro-second.
The crucial concept from the above example is that the programmer reduced overhead and execution time, by simplifying program operation. The problem was solved in 3 different ways, and the fastest solution wiped out all the communication/string/data management overhead. If you want to make a computer program very fast, it is necessary to reduce data communication, string manipulation, and complex data structure overhead.
Which languages do this and why:
Level 1 - Simplest: Assembly is the best at wiping out string overhead, because engineers willingly migrate complex functionality to hardware before implementing it in assembly. In this case, the display screen was eliminated in favour of a direct output to an LED.
Level 2 - Low-Level: C is remarkably quick at string manipulation programs, because programmers minimize the amount of string manipulation. String manipulation in C sucks, and is difficult to get correct. As such, programmers attempt to minimize it, or use optimized tools like lex/flex or yacc/bison that automate the difficult problems.
Level 3 - Garbage Collected: Java and .NET encourage carefree string use and data structure use. They have automatic garbage collection. As such, minimal penalties exist for the programmer to use strings.
Level 4 - Scripted: PHP, Perl, Python are higher level languages focused on easy programming for high-level tasks. They pretty much assume the programmer doesn't care about the overhead of processing strings or complex data structures. Instead, they make it easy for the programmer to program the complex data structures.
An application like FaceBook has to have some complex data structures to do its job. In that case, a migration from PHP to C will likely not produce great benefits, because the C program still has to do all the same work the PHP program does. The old rule was that interpreters were very slow. With modern techniques, just about any language can be sufficiently compiled to
Dude, are they going to take in to account the extra time your computer needs to be on to implement all that shit long hand? No? So you're saying your suggestion is something of a funny that failed or a troll that needs some souping up?
Or is there a joke in there that you are crap at telling?
Inquiring minds want to know.
Why is it that a decent PHP (or Python, or Ruby) MySQL binding couldn't do the exact same thing?
Don't thank God, thank a doctor!
Hi
This is one of the most stupid posts about computers I have ever read. Has the author ever written something like Facebook? (btw i did see http://www.suvi.org/adrbook/ but on the main subject) If Facebook were implemented in C++ it would probably 1) not exist, 2) be more expensive (meaning more trees killed) to produce and maintain, and 3) probably crash several times because of some implementation faults.
Cheers suvi
Hey, don't knock it. I have written billing systems (mostly) in FORTRAN 77, and my employer, a major telco, sold it as an international product. The bit that wasn't FORTRAN 77 was Pl/1G
hi /. , next time you want to blow up a flamefest, do it better. Why just assume a conservative ratio of 10 for the efficiency of C++ versus PHP code ? Make it 1/100 , or 1/1000 ! so the trollpost will be at its trollest, and flames will be at their highest. 1/10 is for wimps.
It's a sad day when programmers cannot think of any other reason to use c++ than some crackpot global warming co2 theory that doesn't hold water.
I hate PHP but there are lots of reasons why PHP might be the better solution. Others have pointed out how ridiculous this is. ASM, C, C++ require more skill, and take more time to program with. It seems that lately, the moment the environment is mentioned as a factor, not only do all other factors cease to be considered, but all common sense goes out the window. It's like everyone in the room spontaneously turns 12 years old and decides they have the solutions to all the world's problems if only everyone would listen, all the while ignoring the knock-on effects and complexities of the real world.
These posts express my own personal views, not those of my employer
I think that once your server farm exceeds 100 servers, it's time to start seriously thinking of re-architecting in C. That's probably about the tipping point where your cost of build-out will exceed the engineering costs of switching technologies.
I invested in a small startup a couple of years ago. We based everything on Ruby-on-Rails, php, mysql, and I forget what else. The bitter lesson we learned -- and what was the main cause of the company's failure -- was that this technology doesn't scale. We nursed it along to 30M records on two servers (the db admins must've been geniuses to coax it that far), but in the end, it fell over.
We could have saved the company if we could have afforded to expand our servers on our shoestring budget, but in the end, the software infrastructure we were using would have failed us one way or another.
I've also worked for companies that scaled successfully, and they were the ones that got it. Switched from scripting languages to C++. Switched from MySQL to a dedicated database engine.
Scripting languages are great for development and prototyping, but for serious production use, you really need to bite the bullet and switch to compiled languages.
The exception proves the rule. FORTRAN (like COBOL) is certainly more computationally efficient than the vast majority of languages today, so if you are in a performance / overhead sensitive environment, there can be a lot to be said for such languages. It is just not the sort of thing people normally do without such constraints, because there are other, more modern languages that tend to be more suitable to the task.
No one is going to write an operating system kernel in FORTRAN, for example, although I have no doubt that it is actually possible.
This is idiotic, and is typical of the kind of pseudo-science underlying much of the climate alarmism currently en vogue. Like a lot of things, it is pretty much impossible to quantify which language ultimately uses more power, because of all the variables. As others have pointed out, you might save some power in the deployment of the code, but you would surely use more power in the development of that code. Then, you have to figure out what the total impact of that is, since you'd have more man-hours of coding, using human coders, who sit at desks, in offices, which must be heated and cooled, etc., etc.
I assume the original article doesn't have any basis in reality.
I can also assume, based on those server counts, what the service does, and what it provides to users, that Facebook doesn't run an extremely efficient operation.
It's fairly obvious that their solution to performance problems was to scale out rather than optimize. There are god knows how many reasons why they could have made that choice, and I have no idea if it really was a good one or a bad one, but I'm fairly certain they could be far more environmentally friendly by using C or C++ on their front ends, even if it takes longer to develop or you hire more developers (and there is no logical reason why you should, unless you hire incompetent C/C++ devs). It probably would cost more: PHP devs are a dime a dozen, competent C devs aren't.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Suppose all you care about is making money off some stupid internet idiots, so you start a site called Spacebook.
Which makes more business sense: buying a lot of computers and paying for a lot of rack space and electricity so you can hire cheap PHP kids, or paying less for hardware and ongoing hardware-related costs, but paying more for C programmers?
I have no idea, but maybe the VCs funding horseshit like MyFace and SpaceBook don't know either.
It is true that PHP and similar script languages are very slow when it comes to number crunching etc. But serving web pages is not computing fractals or raytracing transparent materials, it's mostly pulling some data from the database and doing some string manipulation before presenting data to the visitor. For most of those things you call libraries anyway, like string library, XML library, SQL library etc. While those calls and some other gluing manipulations are faster in C++, the gain is not that significant (at least not 10x).
As others mentioned, if you also consider caching, then the performance gain from C++ is really marginal.
It would take a really serious amount of in-depth analysis of the server application to even approach knowing what the efficiency impact of using a compiled language vs an interpreter would be on any specific stack. Or even stacks in general. Plus we don't even know what it really means to be "using PHP". What is PHP doing? Is it processing templates, doing just some post or pre processing with some kind of XML pipeline in the middle, how is the PHP deployed, etc?
It is simply ridiculous to make any assertions and claim accuracy for them. I'm no PHP fan boy by a LONG shot, but I know from hard experience that often a higher level tool which is optimized for a particular job can get the job done quite a lot MORE efficiently than a lower level one that isn't.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
A lot of PHP code is just making function calls to the built-in library functions anyway, and those library routines are all compiled C/C++. If I call a library function from C or from PHP there is some difference in overhead when setting up the call and processing the results but the actual function is likely to be the exact same thing.
"Almost every wise saying has an opposite one, no less wise, to balance it." - George Santayana
- Server farms take up maybe 0.5-2% of the world's energy. If we're going to stop global warming, we should focus on power plants, cars, and the most energy-hungry industrial processes. McKinsey said 0.5% of the world's energy a year ago, and 1% is cited in some random press release at the top of a Google search.
- As data center usage grows, other factors will push down energy use per clock cycle: faster and more power-efficient processors (as we move to 32nm fabs, chips that power down idle cores, etc.) and more energy-efficient server farm designs (using weather for cooling, powering some boxes down at night, etc.). The gap between interpreted and compiled will narrow.
- It's hard to pin down how many server-hours C++ would actually save. PHP might be spending most of its time running code from C libraries (memcached lookups, HTML/XML parsing, regexp evaluations) instead of interpreting PHP. The article doesn't say what portion of the servers are running PHP, and the 10 to 1 efficiency ratio is pulled out of thin air. The server farm might be I/O-bound instead of CPU-bound, and if it's not, it's quite possible that it would *become* I/O-bound if you rewrote everything in C++, preventing any 10-to-1 savings.
- In capitalism, the amount of energy that will be saved depends directly on how cheap it is to save energy, and there are far more cost-efficient approaches than rewriting everything in C++. Improve the PHP code. Rewrite bottlenecks in Java or C++. Improve the PHP interpreter! Look above the code level to more efficient server/datacenter designs. It doesn't help to suggest cost-inefficient approaches like rewriting tons of code in another language or, for that matter, working with the lights off.
- What applies to Facebook doesn't apply to you and me. For small to medium sites, the environmental cost of the development effort dwarfs the impact of the servers. That is, ignoring all non-environmental factors like developer salaries, time to market, ease of innovation, etc., spending extra months developing my tiny site in C++ instead of Python or Perl or PHP is just going to use more energy for the development resources than the extra energy the servers use to interpret my code.
- Even for Facebook, C++ has its own environmental costs. Instead of a bigger server park, you need a bigger office park and more employees driving to work. My energy-saving suggestion for Facebook in its current PHP-driven incarnation: get more employees working from home!
C++ is an atom bomb in the hands of a chimp.
I've tested software for about 15 years. I can tell you from experience that THE most buggy, nasty, ill designed applications I've ever tested were written in C++.
The world is more than performance. For many, many applications, the blazing speed of pointers in a local application simply *doesn't* *matter*.
Unless I needed to process large chunks of binary data in real time, I'd use anything else but C++. For a web application, I am *sure* that the downtime due to crashes, memory loss, uninitialized pointers and all the other dreck that each and every C++ programmer is convinced never happens to *them* would cost more time, cycles and energy than a perfectly functional PHP application.
Please do not read this sig. Thank you.
A site like Facebook isn't computing the value of pi or calculating jump coordinates for the Galactica, things that would benefit from a more efficient implementation. I don't know anything about Facebook's site, but many web applications use the application layer to essentially pull data from and send data to databases. Parsing data back and forth typically isn't sped up with C++ or even C that much compared to PHP. All many PHP sites do is draw HTML, and a lot of SELECT, INSERT, UPDATE, and DELETE queries. Hard to imagine C++ having a big impact on that, certainly not 10:1.
The holy PHP scripts are compiled and cached after the first run on a good server system. Then they are preached down to the spiritual RAM sanctuary and run through the CPU to make sure it has got the secret idea about the digital space Jesus.
The difference in efficiency results from the quality of the compiler. Which can be fixed by improving it.
Anyway, C++ could never replace PHP on the web, because it is not designed for the web.
c and c++ are both lower level languages than php, closer to the assembly level. c and c++ can be used to code the compiler that php runs on. php is a CUSTOM language for creating web applications. c, or c++, or anything of their level cant come close to it. because, php was PURPOSE BUILT.
Read radical news here
Use a nuclear power station. Problem solved.
... assuming CPU cycles are the key bottleneck and not, say, network communication and data access. I'd assume they look at performance pinch points and optimize those. So if the 10% most computationally-intensive code is written in C++ or Java, the savings in rewriting the rest in C++ might be 15%.
Ceci n'est pas une signature.
It's hard to pin down how many server-hours C++ would actually save. PHP might be spending most of its time running code from C libraries (memcached lookups, HTML/XML parsing, regexp evaluations) instead of interpreting PHP. The article doesn't say what portion of the servers are running PHP, and the 10 to 1 efficiency ratio is pulled out of thin air. The server farm might be I/O-bound instead of CPU-bound, and if it's not, it's quite possible that it would *become* I/O-bound if you rewrote everything in C++, preventing any 10-to-1 savings.
Indeed. Compute-bound code might be 10 times faster in C++, but both PHP and C/C++ will spend the same amount of time waiting for MySQL or PostgreSQL to get back with the results of a SELECT or UPDATE. But even if it's only 2* as fast, on average, that's a lot of servers to save.
As you say, the main thing to do isn't to recode everything in C++ or C, it's to identify the places where PHP performance is the bottleneck... and look for *well factored* shortcuts that can be implemented in a more efficient language.
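The I/O-bound argument can be put into numbers with Amdahl's law. This is only a sketch: the 25,000 servers and the 10x CPU speedup are the story's assumptions, the CPU-bound fractions are made up for illustration, and `servers_needed` assumes server count scales with time spent per request, which is a simplification.

```python
import math


def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: only the CPU-bound share of a request gets faster."""
    waiting_fraction = 1.0 - cpu_fraction  # time spent waiting on the DB, cache, network
    return 1.0 / (waiting_fraction + cpu_fraction / cpu_speedup)


def servers_needed(current_servers: int, cpu_fraction: float, cpu_speedup: float) -> int:
    """Servers needed for the same load, assuming capacity scales with time per request."""
    return math.ceil(current_servers / overall_speedup(cpu_fraction, cpu_speedup))


if __name__ == "__main__":
    # 25,000 PHP servers and a 10x CPU speedup are the story's assumptions;
    # the CPU-bound fractions below are made up for illustration.
    for cpu_fraction in (0.1, 0.3, 0.5, 0.9, 1.0):
        speedup = overall_speedup(cpu_fraction, 10.0)
        servers = servers_needed(25_000, cpu_fraction, 10.0)
        print(f"CPU-bound {cpu_fraction:4.0%}: speedup {speedup:5.2f}x, servers {servers:6d}")
```

If half of each request is spent waiting, a language that is 10 times faster on the CPU gives only about a 1.8x overall gain. The 2x overall figure mentioned above corresponds to requests that are roughly 56% CPU-bound.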
I was interested in how many kgs of coal a badly written Flash Ad would consume. I did some quick calculations based on the typical Wattage of a desktop and assumed a 10 second view by 1 million people over the course of a year. This assumes that the crappy Flash Ad consumed 100% of a core. I also assumed that all power comes from coal (most of it in the US does). That's a lot of assumptions but should still get us in the "ballpark" for the final figure which, to my surprise, was quite high! I estimated that this would consume around 90 TONS OF COAL! Looking at the figure I am convinced that I made a mistake somewhere in my reasoning, calculations or raw data but I haven't found any problems yet. I would appreciate any interested Slashdotters to set me straight or confirm my work. Here's my blog article with all the calculations on it: http://blog.bit-matrix.com/2009/01/20/how-green-is-your-code/ Thanks.
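For anyone wanting to sanity-check an estimate like this, here is a rough sketch of the arithmetic. It is not the calculation from the linked blog post. The 1 million views of 10 seconds each come from the comment above; the 100 W of extra draw per busy desktop and the 0.5 kg of coal per kWh are assumed round numbers, so treat the result as an order of magnitude only.

```python
def coal_kg_for_ad(views: int, seconds_per_view: float,
                   extra_watts: float, coal_kg_per_kwh: float) -> float:
    """Coal burned to power one busy core for every view of the ad."""
    hours = views * seconds_per_view / 3600.0
    kwh = hours * extra_watts / 1000.0
    return kwh * coal_kg_per_kwh


if __name__ == "__main__":
    # 1 million views of 10 seconds each come from the post above.
    # 100 W of extra draw and 0.5 kg of coal per kWh are assumed round numbers.
    kg = coal_kg_for_ad(1_000_000, 10, 100.0, 0.5)
    print(f"{kg:.0f} kg of coal")
```

Under these assumptions the total is on the order of a hundred-odd kilograms, so a figure near 90 tons would need inputs several hundred times larger, for example many more views or a much longer viewing time.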
This story makes assumptions on system architecture that point blank would just not be true in a large scale deployment.
Sure, C++ is more efficient than PHP. But where does it say that EVERYTHING, or even most of it, is served via a PHP interpreter? In a large-scale world, dynamic caches and CDNs would have offloaded all of the cacheable content, and it would never have hit the PHP interpreter. Guess what: a C++, Java, or any other environment would do exactly the same thing.
You would be amazed at how much offload is possible. What you think is dynamic content is usually just a very simplistic dynamic wrapper over large swaths of static content. A seemingly dynamic HTML page is really just a collection of static HTML blobs that can be held by a cache or CDN close to the consumer and assembled into a complete HTML document on the fly by a web-aware accelerator.
These 30,000 boxes are most likely composed of a large variety of purpose-specific hardware, where the purposes are much more diverse than a simple PHP/MySQL combo. Let's face it: Facebook is across the planet and is relatively speedy all across the planet. This implies that a distributed CDN is in place, with local content caching nearest the consumer. Instantly this means we have more purposes than just PHP/MySQL for the hardware. Authentication is usually offloaded by large-scale web shops for a host of reasons, so let's add that kit to the purpose-built list. I could go on and on about the various purposes.
So it's fairly obvious that the statement that we could shut off 22K+ machines by simply (nothing simple about this) changing languages is just nonsense.
Plain and simple: the application hosting equipment is only part of a large-scale web environment. A very large part of the environment is the infrastructure: network, CDN, caches, accelerators, security, and so on. Yes, I can write a fancy little web app and run it on my netbook and wow people. It's a whole other matter to scale that up to tens of millions of concurrent users spread across the planet.
Let's face it: that amount of equipment runs up one hell of a power bill. I'm positive that Facebook has already done an enormous amount of work to reduce that bill, not for save-the-planet reasons but for simple dollar reasons. Can they do more? Absolutely. But it won't be as simple as changing everything over to a new language this weekend and then hitting a bunch of power switches. The power consumed by the massive development workforce needed to write, test, document, and deploy this new wonder solution would probably leave Facebook in energy debt for several more years.
PS. Good luck with that sales pitch if you ever make it near the office doors of facebook. :)
Clearly, it's time for economics 101 again - opportunity cost.
As a very rough, correct-within-a-factor-of-two estimate, let's assume that the average server results in two tonnes of CO2 being emitted per year to keep it running. So that's 45,000 tonnes of CO2 a year saved across the 22,500 servers, if the OP's estimate of the difference in speed is correct (which I doubt, but anyway).
However, if Facebook implements in C++, it's reasonable to assume that they will need to hire more developers, and more expensive developers, than if they use PHP.
I don't have accurate numbers, but I'll pull them out of my arse here for the purposes of illustration. Let's say that rather than 500 PHP developers earning $60,000 a year, Facebook instead employed 750 C++ developers earning $80,000 a year. That's 30 million bucks a year in extra expenses.
Carbon credits on the EU ETS are currently going for around 20 USD per tonne. So if you want to prevent the emission of a tonne of CO2, you can go to the EU climate exchange, hand over the equivalent of 20 USD in Euros, and simply rip the permit up and not use it.
So let's say that Facebook spends 10% of the difference in programmer costs - 3 million dollars - on ripping up EU emissions permits. They prevent 150,000 tonnes of carbon emissions. That's more than three times as much emission reduction as achieved by substituting C++ for PHP, and Facebook still has $27 million in the bank.
Heck, we could replace "buy carbon credits" with much higher-cost abatement options like "buy Priuses for company cars" and still come out in front.
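The opportunity-cost comparison is easy to recompute. The sketch below uses only the figures stated in this comment and the story (22,500 servers, 2 tonnes per server per year, the two headcounts and salaries, 20 USD per tonne, 10% of the payroll difference); the constant and function names are made up for illustration. The extra payroll follows directly from the headcounts and salaries: 750 × $80,000 − 500 × $60,000 = $30 million a year.

```python
SERVERS_SAVED = 22_500
TONNES_CO2_PER_SERVER_YEAR = 2
PHP_DEVS, PHP_SALARY_USD = 500, 60_000
CPP_DEVS, CPP_SALARY_USD = 750, 80_000
PERMIT_PRICE_USD_PER_TONNE = 20
PERMIT_SHARE_PERCENT = 10  # share of the extra payroll spent on permits instead


def rewrite_savings_tonnes() -> int:
    """CO2 avoided per year by powering down the servers."""
    return SERVERS_SAVED * TONNES_CO2_PER_SERVER_YEAR


def extra_payroll_usd() -> int:
    """Yearly cost difference between the C++ team and the PHP team."""
    return CPP_DEVS * CPP_SALARY_USD - PHP_DEVS * PHP_SALARY_USD


def permit_budget_usd() -> int:
    """Money spent on retiring emission permits."""
    return extra_payroll_usd() * PERMIT_SHARE_PERCENT // 100


def permit_savings_tonnes() -> int:
    """CO2 avoided per year by buying and retiring permits."""
    return permit_budget_usd() // PERMIT_PRICE_USD_PER_TONNE


if __name__ == "__main__":
    print(f"rewrite in C++ : {rewrite_savings_tonnes():>9,} tonnes/year")
    print(f"extra payroll  : {extra_payroll_usd():>9,} USD/year")
    print(f"permit budget  : {permit_budget_usd():>9,} USD/year")
    print(f"retired permits: {permit_savings_tonnes():>9,} tonnes/year")
    print(f"ratio          : {permit_savings_tonnes() / rewrite_savings_tonnes():.1f}x")
```

Changing any of the constants shows how sensitive the comparison is to the made-up salary and headcount figures; the permit route stays ahead as long as the permit budget exceeds 20 USD × 45,000 tonnes, i.e. $900,000 a year.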
Here endeth the lesson.
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
Their decision for using PHP might have to do with being able to get their business up and running now using PHP rather than envisaging go-live a few years down the road with their developer resources and learning curve adjusted to C++ (which in all its well-deserved glory does take its time to master). Probably C's savings in power don't outweigh PHP's savings in manpower.
"Seriously, is somebody taking seriously the 1 to 10 ratio of the story?"
Yet the assertion that programmer productivity varies with a 1 to 10 (or even 1 to 100) ratio is accepted without a blink of an eye or the firing of a single neuron.
My understanding was that PHP code wasn't interpreted for every page view.
...you're actually being serious, PHP with the Zend optimizer should give you most of the speed of C++ by precompiling at runtime; the "22,500" number also seems pie-in-the-sky optimistic. There is also the need to compile all the C++ code before going "live," which means all of your code needs to be compiled together (instead of updating 1 script); replacing the scripting language with C++ would not affect the power usage (and CO2 footprint) of the database server, and finally you're basing your numbers on the advertising page for Webtoolkit, which is going to be biased against PHP to sell their product. You're also going to have to train your programmers to use WT and C++ instead of PHP (which is a bit easier to learn; I know firsthand!), and pay CO2 penalties to convert all your code over. Not a pretty concept.
Why is this garbage even getting posted?
My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
At best, a good case for compiling PHP code. I'm not sure the assumptions are sound at all: a 10:1 ratio? PHP code is the dominant code executed? I'd bet most of the execution is in the database, and that is surely executing compiled C/C++.
Let's assume that C++ is twice as hard to code and maintain as PHP. Let's also assume 200 man-years of work went into Facebook. Let's further assume there are 50 maintenance workers, and each worker commutes 50 km per day, 220 days per year.
Cars average 200 grams of CO2 emissions per km, so each worker-year of commuting emits 2.2 tonnes. Writing Facebook in C++ would have taken an extra 200 man-years and produced 440,000 kg (440 tonnes) more CO2 than the PHP workers. Each year that goes by, the extra 50 C++ maintenance workers would produce 110,000 kg (110 tonnes) more CO2 than the PHP workers.
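The commuting arithmetic can be checked with a short sketch. It reads "twice as hard" as doubling the effort, so the extra work equals the original 200 man-years and 50 maintainers; that reading, and the constant names, are assumptions. Note the units: at 200 grams per km the totals come out in kilograms, so 440,000 kg is 440 tonnes.

```python
KM_PER_DAY = 50
COMMUTE_DAYS_PER_YEAR = 220
GRAMS_CO2_PER_KM = 200
EXTRA_DEV_MAN_YEARS = 200   # assumed: doubling the original 200 man-years
EXTRA_MAINTAINERS = 50      # assumed: doubling the original 50 maintainers


def commute_kg_per_worker_year() -> int:
    """CO2 from one worker's commute over one year, in kilograms."""
    grams = KM_PER_DAY * COMMUTE_DAYS_PER_YEAR * GRAMS_CO2_PER_KM
    return grams // 1000


def extra_development_kg() -> int:
    """One-off extra CO2 from the additional development effort."""
    return EXTRA_DEV_MAN_YEARS * commute_kg_per_worker_year()


def extra_maintenance_kg_per_year() -> int:
    """Recurring extra CO2 from the additional maintenance staff."""
    return EXTRA_MAINTAINERS * commute_kg_per_worker_year()


if __name__ == "__main__":
    print(f"per worker-year  : {commute_kg_per_worker_year():,} kg")
    print(f"extra development: {extra_development_kg():,} kg "
          f"({extra_development_kg() // 1000} tonnes)")
    print(f"extra maintenance: {extra_maintenance_kg_per_year():,} kg/year "
          f"({extra_maintenance_kg_per_year() // 1000} tonnes/year)")
```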
Sounds like a good argument to dump C++. As if you needed any argument other than "it's C++"!
Running a server is cheap.
Paying a developer is not.
Civilisation is largely about the multiplication of human effort through the consumption of energy and automation. So, we multiply this developer's effort by a couple of thousand when running one machine and then do the same on another several hundred machines beyond. Each costs several thousand dollars to purchase and several thousand more every year in electricity, in cooling, networking, management and maintenance.
So, the effects of developer incompetence are also multiplied several thousand times often across hundreds or thousands of systems. Millions if we're really lucky.
So it isn't just one server, it's just one extra datacenter. It often pays to hire better people.
running a server for a day - $1
You think you get a real server for that? You get a tiny division of a server for that kind of money.
2) why don't these big server farms start looking at migrating code from PHP to C or C++ when the PHP+web design is solid?
The network effect. They migrate to Java instead.
Speed to delivery is nearly always of primary importance.
Indicating speculative projects and disposable code.
Deleted
Clear evidence.
After a recent change in hardware platform for new acquisitions, Facebook was surprised to get a speedup from RDDR2-800 memory vs. FB-DDR2-667 memory, because many of their apps are actually memory-bound. They're a major user of memcached, so the real limiting factor on how many servers they can power down is how much RAM they can stuff in a server. The CPU utilization comes somewhere after memory/disk throughput/latency in power conservation considerations. Sure, there's a small marginal difference in how much energy they burn through on PHP code vs. C++ code, and a small marginal difference in how much RAM they need free for PHP vs. C++ code, but for the effort it would take them to switch to C++, they could save a lot more energy by optimizing how they use memcached. Which is exactly what they're doing.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
I think if we each managed to hold in a fart at least once a year, we would put out less CO2 in a year than PHP causes in 10 years. Not to mention how many lives and nerves PHP, with the whole Open Source phenomenon behind it, has saved... PHP has done more for suicide numbers than anything in the C++ web development world...
Noscript :>)
Mod me up/Mod me down: I wont frown as I've no crown
I knew I would find it: "The average person farts 12-18 times a day, producing about 45 mL of CO2 and 45 mL of CH4 per day, or 16.43L of CO2 and 12.78L of CH4 per year." Source: http://envirostats.digitalcitizen.ca/2007/07/19/0172/ Here is the interesting part: "At Standard Temperature and Pressure (STP), 1 kg CO2 is 509.1L, so the 284.9L per year is only 0.560 kg CO2 per year. This is less than the amount it takes to run your computer for a year (0.705 kg), and a tree would only have to spend 17 days per year “sniffing” the greenhouse gases in your farts all year to carbon neutralize it." Considering that not every person in the world has a computer (and uses PHP), the conclusion is beyond dispute. Moderator, are you one of those people who always giggle when someone mentions a fart or a similar word?
Well, I am biased in that I *like* PHP, despite its flaws. However, speaking to the argument that Java might be a better choice, I did work on one project which we essentially completed in PHP as a first draft, and then the lead designer decided that PHP wouldn't be scalable enough (i.e. he decided he didn't like it, since he had no evidence to point to), and we rebuilt the whole thing in Java instead. PHP development time: 6 months. Java reengineering: 2 years more.
The end result was efficient and made good use of Java's strengths, but I can't honestly say it was a superior product, and the additional development time spent on adapting it to Java could have been spent revising and improving the PHP version 4 times over.
PHP is quick to develop in, quick to make changes to, and good enough for many jobs. There is a reason why sites like Facebook are using it so heavily. Purist programmers may not like it for a variety of reasons that may be entirely justified, but in many cases PHP is the right tool for the job and that is what should really matter. Java is also an excellent tool, but I think the development time is greater in the end, and the results are not necessarily all that much better. It likely comes down to a matter of preference, as other languages like Python or Ruby would probably be just as useful in the end.
As an aside, why does the OP not mention that you can write your own extensions to PHP in C++? Wouldn't that be a better option where you see code that is running too slow: concentrate on only those bits where it might really have an effect?
"The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
What about just turning off Facebook? The entire thing is a colossal waste of time. And while you're at it, shoot all the users, as most of them are idiots who are a waste of space.
I know it's already all been said but really...the author needs to try this:
Program a small application in PHP, and one in C++. All the data must be stored in a database on a remote machine (which is the way it would be done for a huge site). First, hardcode some data for your first benchmark to get an idea of raw PHP vs C++ performance doing the same task. Then comment all that out, get the data from the database, and time that. Guess what: I bet the times become pretty darn similar in the latter test. No, PHP isn't going to be QUITE as fast...but it's gonna be really close for REAL web-application workloads, where latency from your SQL server and loading all the other page content come into play. Your clients are never going to notice the difference on the majority of applications, and I don't believe that most of the time in a web application is spent processing PHP or C++ code either. It's spent waiting for the DB and sending content to clients. If you truly are doing extremely data-heavy tasks and lots of floating point math or something, then yes...C++ is probably a better tool, but even then, there's no reason not to use PHP for the non-data-heavy stuff...
I know I'm just a youngster web programmer who only graduated from college a while ago and who deserves little respect on slashdot compared to some of you, but I do have a decent amount of experience with quite a few programming languages. I learned C++ first, and know it pretty well for someone who doesn't use it constantly any more. I think I'm very very good with PHP...and so does my boss. Given that information, I have something to say.
If someone told me to program an entire web application from top to bottom in C++, I'd probably quit on the spot and walk out laughing all the way to the parking lot. I like C++ for lots of things, but there is no way in hell you would EVER get me to program an entire web application in C++. It would take 10x longer at LEAST to develop than it would take me in even Java, which I actually know less well (though I still have a good working knowledge of it) and don't consider ideal for web apps. The debugging for C++ on something like that would probably drive me completely insane. I say me because I'm assuming that I'm building this web app alone...like I can do very quickly in PHP or Python or nearly any other language that does web stuff well. Otherwise it would take me and all my co-workers.
Languages that are traditionally used for web development are used for a reason...and it's not how fast or efficiently they run. It's the difference in the expense of the developers (a HUGE factor, really), how well the language suits the web in its core libraries, and how well it integrates with web servers, database abstraction libraries...well, I could go on forever, really. I'm not saying that C++ couldn't get libraries built for it that made it as appealing as, say, Ruby (on Rails), PHP, or Python, but let's face it, it would take a very long time to get to the point where everything was as seamless and easy as it is in current web languages...not to mention getting hosting companies to let you run a C++ app YOU programmed on their servers (they'd have to be stupid...really freakin stupid). C++ will just never be popular enough for web stuff to be attractive to developers...that's the bottom line. Lack of efficiency is such a tiny price to pay compared to these other factors.
This is like saying that we should use those medieval wooden carts to ship packages across the US. PHP works in many places where cpp fails to deliver (no pun intended with relation to my previous statement).
We have done the actual benchmarks, and the original post matches our experience.
PHP gives processing times of around 1 second (for a search function) and C++ code via CGI gave times of 0.1 sec. A ten times improvement.
Graphs and numbers are here,
http://www.wrensoft.com/zoom/benchmarks.html
Further when we switched to FastCGI we saw another 5 fold improvement, after optimising the code for FastCGI.
So I would believe a 50-fold improvement should be possible by going from PHP to FastCGI (and rewriting the code to suit FastCGI).
Aw, c'mon, that's just harsh. It's programming, it's just one type of programming. Speaking as a C and assembler fanatic, but fan of inter-compatible versions of Python and burner-at-the-stake of perl.
Scripting is nice just to whip things up and do proof of concept in, too.
On-thread, I agree that the difference here is quite high (for PHP against C), it'll vary for other languages. There's a nice table of relative speeds here (and the argument about what the language is doing, I don't buy... in my experience, things tend to remain somewhat relative):
Multi language simple fractal benchmark
These are obviously not the times you'd get for web ops, but I am of the opinion you'd find a similar curve, similar time relationships, for any general program. Which is to say that not only is PHP slow, it's slow for everything, which means it is an inefficient use of system resources in busy environments. Lots of cycles for not much done, which means programs that are loaded longer and resources that are tied up longer -- regardless of the underlying DB transactions, too - the DB is running on the same server, those cycles could have been DB cycles. And if not, more apps could have been running with the others out of the way sooner and/or not consuming CPU time. Waste is waste.
I've fallen off your lawn, and I can't get up.
The problem is that when you've got poorly defined data structures and typing, most of the overhead can't be compiled out, as your C/C++ program still needs to deal with all of the indirection and type conversion. Due to PHP's loose typing, the type of a variable at any given time cannot be predicted, as it may vary with user input. The language provides no mechanism for forcing a given variable to a type or for declaring fixed data structures. Even in compiled code, a simple addition of 1 to an integer must involve pointer dereferencing and lots of type conversion and bounds checking.
The code could be optimized if the type of a given variable could be fixed for its entire life in scope, which would allow the compiler to avoid a lot of repetitive logic for each operation. Specifying strict data structures would also be very helpful, as the compiler could then avoid looking up the pointer in a key hash and instead hard-code the offset from the start of the structure.
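To make the overhead concrete, here is a small sketch of the two costs described above: a run-time type dispatch for "add 1", and a keyed hash lookup versus a fixed-offset read. It is an illustration only. The `Value` class and the coercion rules are hypothetical and are not PHP's actual conversion rules, and since Python is itself dynamically typed, the sketch shows the steps involved rather than their real cost.

```python
import struct
from dataclasses import dataclass


@dataclass
class Value:
    """A loosely typed variable: a run-time tag plus a payload."""
    tag: str         # "int", "float" or "string"
    payload: object


def dynamic_add_one(v: Value) -> Value:
    """The work needed for 'x + 1' when the type of x is only known at run time."""
    if v.tag == "int":
        return Value("int", v.payload + 1)
    if v.tag == "float":
        return Value("float", v.payload + 1.0)
    if v.tag == "string":
        # Hypothetical coercion rule: numeric strings are converted first.
        text = v.payload.strip()
        try:
            return Value("int", int(text) + 1)
        except ValueError:
            return Value("float", float(text) + 1.0)
    raise TypeError(f"cannot add 1 to a value tagged {v.tag!r}")


# Field access by key: hash the name, probe the table, follow a reference.
record_as_hash = {"id": 7, "score": 41}

# Field access by fixed offset: the layout is known ahead of time,
# here two little-endian 32-bit integers, so "score" always sits at byte 4.
RECORD_LAYOUT = struct.Struct("<ii")
SCORE_OFFSET = 4
record_as_struct = RECORD_LAYOUT.pack(7, 41)


def score_from_struct(buf: bytes) -> int:
    """Read the score straight from its known offset, with no name lookup."""
    return struct.unpack_from("<i", buf, SCORE_OFFSET)[0]


if __name__ == "__main__":
    print(dynamic_add_one(Value("string", " 41 ")))
    print(record_as_hash["score"], score_from_struct(record_as_struct))
```

With a declared type, the whole of `dynamic_add_one` collapses to a single add instruction, and with a declared structure the field read collapses to a load at a constant offset.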
I'm distraught about PHP and its efficiency. Unless you do some kind of guru magic, every click on your server is interpreting not only the screen you're visiting but all the libraries it requires to load itself. The computers spend nearly all of their time interpreting the same code over and over for each click. The analysis is sound and points out the fallacy of using interpretive languages in ways they aren't intended to be used. On the bright side, at least they aren't using Perl under CGI.
Kriston
A few years after processor designers ran face first into the power wall, we're starting to see clearly that the age of performance-challenged, anything-goes abstraction ad absurdum has come and gone.
Out of curiosity, does anybody use Pike, at least outside of Scandinavia and Germany? I've only written toy programs in it, but it seems quite nice and to be pretty efficient. It's C-like, but with many of the nice features of dynamic languages. (Okay, I admit it, what I really like is that it allows binary literals.)
That's my guess of a stat. I'll keep with PHP, thanks.
Steve Magruder, Metro Foodist
Oh the joys of bad math.
But what's the environmental impact of making up 'facts', turning that into smoke, and then trying to blow that smoke up my ass?
A lot of people are getting bent out of shape by this thing, but I can't see how anyone can begin to take it seriously. The whole thing is so completely bogus and made up that giving it anything but a good hearty laugh
Though I'd love it if my randomly made up factoids merited a front page story on Slashdot.
Ergo, it is going to reduce the processing necessary on the server to do any given job
Any given job, yes. But if there are a lot more "jobs" (i.e. more requests that require server side processing), the efficiency of the language used on the server side tends to become more critical, not less, especially if the per request overhead is significant, something that happens to be one of Facebook's primary complaints about PHP.
And how much would CO2 emissions be reduced if people just stopped fucking wasting their time on Facebook and MySpace?
All you have demonstrated there is that statically compiled implementation A of a regular expression library is more efficient than statically compiled implementation B of a regular expression library. This is a contest that PHP cannot win. The implementation PHP uses is not written in PHP, but rather in something like C.
That said, programmers who have better alternatives do not rely on regular expression parsers for casual processing in performance-critical code. Regular expression engines tend to be much slower than the alternatives in any compiled language. Of course, hand coding the equivalent code takes longer.
You will never find a competitive compiler that uses dynamically parsed regular expressions to do language parsing for precisely this reason. It would slow the parsing phase down by a factor of ten or more due to two problems - first the regular expressions themselves have to be parsed, and then the resulting state machine has to be interpreted. In other words regular expression libraries are a microcosm of the situation with PHP itself - provided the patterns are known ahead of time, hand written native code to do the same thing is much faster.
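To make the two-stage cost concrete, here is a minimal sketch in Python (chosen only for readability, so it says nothing about actual native-code speed). The pattern, the transition table, and every name below are assumptions made up for illustration. It recognises a run of ASCII digits twice: once with a generic loop that interprets a state table, which is roughly what a regular expression engine does after it has parsed the pattern, and once with a hand-written loop where the pattern is baked into the code.

```python
DIGITS = "0123456789"

# Table-driven matcher: the "pattern" is data that a generic loop interprets.
# State 0 = start, state 1 = inside a run of digits (accepting).
TRANSITIONS = {(0, "digit"): 1, (1, "digit"): 1}
ACCEPTING = {1}


def classify(ch):
    """Map a character to the input class used by the transition table."""
    return "digit" if ch in DIGITS else "other"


def match_table(text, start=0):
    """Return the end index of the digit run beginning at start, or -1."""
    state, pos, last_accept = 0, start, -1
    while pos < len(text):
        nxt = TRANSITIONS.get((state, classify(text[pos])))
        if nxt is None:  # no transition: the machine stops here
            break
        state, pos = nxt, pos + 1
        if state in ACCEPTING:
            last_accept = pos
    return last_accept


def match_handwritten(text, start=0):
    """Same result, but the pattern is hard-coded instead of looked up."""
    pos = start
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return pos if pos > start else -1
```

Both functions return the same answers; the table-driven one pays for a dictionary lookup and a classification step per character, which is the interpretation overhead the comment above is describing.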
I think most of the analysis I've seen so far implies that it's a simple cost tradeoff - more programmers versus more servers. It's more complicated than that, and you have identified one of the reasons why: maintainability. The kind of programs Facebook needs written are just plain easier to write in PHP than in C or C++, for many reasons (though PHP was probably not the best language they could have chosen, it's not an awful choice, either). If they wrote it in C or C++, not only would it be more expensive, but (and this is the important part) it would take longer to write and test. Web companies can't afford to waste a couple years re-writing their core infrastructure. They also can't afford to be stuck with a tool that's hard to modify.
Also, it's not as if Facebook is an all-PHP shop. They've invested a lot of effort writing an open source tool called thrift, which allows programs in any of the (12 or so) supported languages to communicate easily with each other. If they decide it makes sense for them, they can change languages.
I think the world is better off when people use the language that meets their needs rather than the language that makes them look smart.
With modern techniques, just about any language can be sufficiently compiled to run within 1/2 the speed of C
If only this were true. In the real world, Java is dead on the client side precisely because a large Java application cannot be compiled (let alone JITed) to come within half the speed of an equivalent application written in C/C++. Dynamic languages are worse in the sense that they tend to compile to native code much more poorly than a statically typed language like Java, although this is indeed changing.
When you can run a word processor written in Javascript on a thousand page document with 1/2 the speed of a word processor written in C/C++, and not wait forever for it to start up, we will know that dynamic language compilers have arrived. I would be impressed if someone could do that with Java, especially without resorting to expedients like SWT, to say nothing of library after library of C support routines - stuff Java simply (currently) cannot do with any reasonable speed, no matter how good the compiler is.
And it need not be said that no one is going to write a word processor in Perl, Python, or PHP anytime soon, although once the appropriate JITs are out there it would be interesting to see someone try.
http://www.guardian.co.uk/technology/2008/jan/14/facebook
The advantage of C and C++ is that they are not dynamically typed. Picking some language that has similar syntax but is dynamically typed makes it no different than picking Python, Ruby or PHP, other than that you get to suffer with a system language's syntax instead of the niceties of a script's syntax.
“Common sense is not so common.” — Voltaire
Well, at one time some of PR1MOS was in Fortran, but I think the later versions were more PL/1G.
I have not tried this, but there is a thing called PHC ( http://www.phpcompiler.org/ ) that purports to compile PHP code. Has anybody tried it? Does it work? Is it useful?
aren't they going to have more eco-friendly results with a black (instead of white) background on the html pages they produce?
that's 5 minutes of work and a zillion pixels on client screens that don't light up.
Because I know this Java guy and he got the foulest breath you ever smelled before you nose shuts off.
And since I am a PHPer and god's gift to womankind (he decided they could do with a laugh) we can now safely conclude that 1. PHP 2. C++ 3. Java 999999999999999. ASP
The original article is just silly, however. C++ is overkill where you often end up writing stuff that has already been written, and been written better than you can ever do yourself in the time a typical web project has.
And that the author is full of shit is clearly shown by the fact that he has NOT approached Facebook with a sample of new code that would save them a fortune. I myself have made very good money helping companies reduce their server needs; if he can shut so many servers off with his code, he can earn millions. But he isn't, is he? Perhaps because as a C++ web programmer he ain't used to actually getting a working site out on time and on budget that can be maintained easily to deal with rapidly changing demands?
I've seen "real" programmers completely lose it on web projects. They want to do design and analysis when the time to set up the schedule is the time you've got for the entire project. Yeah yeah, if you give them a year or two, the product is no doubt 100% better than the 2-week PHP job, but the web moves a bit faster than that, boys. I have seen a project where a traditional programmer sought out a framework solution that was just perfect. Nobody had used it, but no problem, he had set up a 1-month training schedule for everyone. The entire project was supposed to be delivered in 1 month.
PHP works, people, because it works FAST. It allows you to drill straight down to the thing you want to do with a website rather than first having to develop your own solution for things that have already been done. Yes, this ease has led to a LOT of PHP developers who couldn't code their way out of a paper bag, but this is like saying that because there are so many idiotic Windows users, all Windows users are idiots. Or indeed me saying that because I've only seen C++/Java developers who over-engineered, all C++/Java developers do this.
And as for speed, when you use compiled PHP, it really ain't that much slower. Sure, you could optimize the hell out of your website with C++, but only if you spend ages doing this, making your site obsolete by the time it launches and extremely hard to adapt as demands change.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
You forgot that you can always do it in a FPGA, and if that isn't fast or efficient enough just build an ASIC
http://naspinski.net/post/AspNet-vs-php--speed-comparison.aspx
Actually OO guys routinely make things ambiguous.
Three lines of code becomes 60 after several classes have become established.
Instantiate is such a spontaneous word to them. Every 3rd word out of an OO guy's mouth is instantiate.
Damn Microsoft people.
So I proudly write int int1, int2 and int3 just to piss you off.
Reminds me of Debian people with their packages of packages.
Ar sucks. Just use Tar.
"The syntax is fairly straightforward and familiar, being a typical mishmash of shell scripting, C and perlisms"
How can somebody contradict himself in the short space of a single sentence?
IANAL but write like a drunk one.
Comparing the speed of PHP (or Perl, or Ruby, or ???) to a compiled application in a site like Facebook is an invalid comparison; so much of what needs to get done involves outside processes. Pesky little things like exchanging data with an SQL server.
Years ago, I wrote a highly-optimized C program to interpret input data and save it to a database. C was used because the interpretation involved a lot of bit-based (rather than byte-based) manipulation. It was very fast. But, a couple of years ago, I needed a one-time modified version of it for a project, so I developed the temporary program in PHP instead, since the changes to the C program would take longer than I wanted to dedicate to the process.
Surprisingly, the bit-wise manipulation in PHP, while significantly slower than that of the C program, was not a significant factor in how fast the conversion ran. What should have been a 100-200% increase in run time was less than 5%, due to the overhead of the database inserts. For a program that runs once or twice per day, the added overhead was inconsequential, so the next revision of that program was in PHP.
Could Facebook shut down thousands of servers if all their code was converted from PHP to C++? Doubtful. Maybe a hundred or so, but not tens of thousands, as claimed.
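The run-time observation above can be checked with a little arithmetic. This is only a sketch under a stated assumption: the run time splits into a compute part that gets slower in PHP and a remainder (the database inserts) that does not change. The 100-200% and 5% figures come from the comment; the function names are made up for illustration.

```python
def overall_increase(cpu_fraction, cpu_slowdown):
    """Fractional increase in total run time when only the compute part slows.

    cpu_fraction: share of the original run time spent computing (0..1)
    cpu_slowdown: how many times slower the compute part becomes
                  (2.0 means 100% slower, 3.0 means 200% slower)
    """
    return cpu_fraction * (cpu_slowdown - 1.0)


def max_cpu_fraction(observed_increase, cpu_slowdown):
    """Largest compute share consistent with an observed overall increase."""
    return observed_increase / (cpu_slowdown - 1.0)


if __name__ == "__main__":
    for slowdown in (2.0, 3.0):
        limit = max_cpu_fraction(0.05, slowdown)
        print(f"{slowdown:.0f}x slower compute, under 5% overall: "
              f"compute was at most {limit:.1%} of the run time")
```

Under that assumption, a total increase below 5% means the bit manipulation was only a few percent of the original run time, with the rest spent waiting on the database.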
The premise of high-level scripting languages is they save programmers' time, and that programmers' time is worth more than computers' time. Let's ass/u/me it's true (yes, I know some people will disagree that high-level languages really save programming time).
If you're going to talk about how much would be saved by switching to a lower-level language, then you should also talk about what it would cost, i.e. how many people would still be working on those applications and their pay rates and other expenses.
And if you're going to measure the CO2 cost of PHP, then don't forget to measure the C++ developers' workstations' impact, compiling several times per day to develop an application whose PHP equivalent is already running. And do these developers commute? Are the lights on in their offices? Should you measure their entire carbon footprint during the increased development period? (Probably not, but some of it would surely apply.)
Get it done wins every time. Power is cheaper than a competent C++ coder's time and time to market is shorter for PHP. I'm not saying it's right, it's just reality kicking idealism's ass again. /shrug
Don't kid yourself. It's the size of the regexp AND how you use it that counts.
And as much as I hate to say it, PHP is about as efficient at string processing as any other language that has a C-based runtime. There is probably NOTHING to be gained by adding the complexity and cost of a lower-level language like C++. Now re-writing it in Erlang, that might get you something ;-)
The reason Facebook is not written in C++ is that they would still be writing it, so we would not have Facebook.
I think these benchmarks of C++ (CppCMS) vs PHP-based web systems are more than relevant: http://cppcms.sourceforge.net/wikipp/en/page/benchmarks
Do you think the wikipedia should be written in C++ too? My G-O-D!!
The premise of the article is right on target, in terms of power efficiency for the entire front end. You are right, however, that if power consumption or hardware overhead is a sufficient problem, you could probably gain a marginal amount of performance by going to C instead of C++, although far short of the reduction one could obtain by making the first step. The question is one of balance.
If your requirements make it optimal to run a language that is running behind the state of the art in interpreted languages by nearly three decades because it is easier for some people to learn, that is great. Compared to something like Applesoft BASIC, which actually started out with an intelligent, highly optimized design (byte code even!), PHP started out as a first class hack. And like most first class hacks, short sightedness in the original design tends to cripple all future versions, even when it is far past time for the system to "grow up".
And that, unfortunately, is why the future of PHP looks approximately as bright as the future of Perl. The duct tape of the Internet. Not that there isn't a sizeable niche for that sort of thing.
Wow, Gratz to the submitter of this article. This is one of the most rabid and foolish examples of "going green" I've seen. PHP is certainly not the most efficient language in terms of run-time resources. But you have to count the resources required to develop and maintain C++ code, with all its pointer foulups and memory leaks, versus PHP, which is relatively simple and straightforward to develop and runs in a very stable manner on either a LAMP or WAMP stack. Sure, servers eat up a lot of energy. But so do programmer armies who have to commute or log in or fill out timesheets by the forest, all to chase memory leaks, buffer overflows and the like. Oh yeah, and let's not forget the number of EXTRA servers you're going to have to put online to make up for the ones that need to be rebooted every few hours because some high school script kiddie doesn't bother to sufficiently check for memory leaks that chew through the server like a teen athlete on steroids. Maintainability is a HUGE factor in overall cost in terms of both $$$ and other resources. Use C++ or Assembler or whatever low-level, low-resource-hogging language on a FEW critical sections, written by l337 coders, and let the pock-faced script kiddie army churn out the mountains of PHP, .Net, JavaScript, etc. that is at least garbage collected in their own VMs.
Be More, Be Manly, The Manly Geek Ubergeek Extraordinaire Blogger: www.manlygeek.com/blog Podcaster: podcast.man
"1. It is not a compact format
2. It has to be read into memory often the file itself isnt searchable or indexed.
3. No support for Unicode host names (its an ANSI text file, not UTF8)
4. There is no way to control access for readers and writers its a text file not a database
5. If I was a malware writer this is the first place Id look to change things. Oliver day mentions this in his article. So does Wikipedia. - http://foredecker.wordpress.com/2009/12/07/dear-anonymous-slashdot-guy/
Per your points on HOSTS files, my disprovals of your points are below, 1 by 1, via an enumerated reply:
====
"1. It is not a compact format" - by Foredecker http://foredecker.wordpress.com/2009/12/07/dear-anonymous-slashdot-guy/
APK REPLY/REBUTTAL: It isn't when you folks removed what makes it smaller & F A S T E R to read up from disk/file, into memory (0 blocking address, no longer possible in VISTA, Windows Server 2008, & Windows 7 ever since MS Patch Tuesday 12/08/2008, when Microsoft REMOVED 0 as a legit blocking IP address in HOSTS files in those versions of Windows NT based OS).
Funny - because Windows 2000 had it & still does (as do Windows XP & Windows Server 2003 still). However, Windows 2000 didn't have 0 as a LEGITIMATE BLOCKING ADDRESS FOR HOSTS FILES in its original model for sale on CD... 0 was added in a service pack, afterwards (because it is smaller & faster, & a good thing... a good thing I am wondering WHY you have removed from HOSTS in Windows VISTA onwards... when it DID WORK ON VISTA, up to 12/09/2008 MS Patch Tuesday, but not afterwards!)
----
"2. It has to be read into memory often the file itself isnt searchable or indexed" - by Foredecker http://foredecker.wordpress.com/2009/12/07/dear-anonymous-slashdot-guy/
APK REPLY/REBUTTAL: NO, it does not.
The local DNS client can handle it, but ONLY UP TO A CERTAIN SIZE (another problem IS the DNS CLIENT CACHE ITSELF, failing on larger HOSTS files, mind you)... so, you disable the local DNS client service is all.
Then, your local diskcache subsystem caches the file & "repeated reads" are ELIMINATED!
----
"3. No support for Unicode host names (its an ANSI text file, not UTF8)" - by Foredecker http://foredecker.wordpress.com/2009/12/07/dear-anonymous-slashdot-guy/
APK REPLY/REBUTTAL: The HOSTS file doesn't require this. Not on *NIX variants, not on Windows. It is a text file, period & SPECIFICALLY, an ASCII text file (not the types you stated), per RFC 606, 608, & 627 (nor is it a database as you seem to be alluding to above, this is how it was designed not by Microsoft, but by the folks in the *NIX world, period, via the BSD reference design which Microsoft uses for their IP stack).
----
"4. There is no way to control access for readers and writers its a text file not a database" - by Foredecker http://foredecker.wordpress.com/2009/12/07/dear-anonymous-slashdot-guy/
APK REPLY/REBUTTAL: You can READ ONLY (set this attribute on it) protect it. Easy enough (or more radically, apply ACL security to it)
----
"5.) If I was a malware writer this is the first place Id look to change things. Oliver day mentions this in his article" - by Foredecker http://foredecker.wordpres
No. The less you have to hit the database, the better. Slashdot is a good example - people who are logged in are only 1/3 of all page views, but require 2/3 of the boxes.
Also, caches like static content the best.
What? The point is that they should be dealing with relatively static data, therefore they go to their cache layer first (in this order usually: memory, memcached or similar, DB query cache, DB query)
Of course the less you have to hit the DB the better, but even more so, the less you have to recalculate things that don't change, the better.
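The lookup order described above (memory first, then memcached or similar, then the database) could be sketched roughly like this. Plain dictionaries stand in for the cache tiers and a callable stands in for the database query; `tiered_get`, `tiers`, and `load_from_db` are hypothetical names, not any real cache client's API.

```python
def tiered_get(key, tiers, load_from_db):
    """Look a key up in each cache tier in order, fastest first.

    tiers: list of dict-like caches, e.g. [in_process, shared_cache]
    load_from_db: callable run only when every tier misses
    """
    for i, tier in enumerate(tiers):
        if key in tier:
            value = tier[key]
            # Backfill the faster tiers so the next lookup stops earlier.
            for faster in tiers[:i]:
                faster[key] = value
            return value

    # Missed everywhere: do the expensive work once, then cache the result.
    value = load_from_db(key)
    for tier in tiers:
        tier[key] = value
    return value
```

For data that rarely changes, nearly every request is answered by the first or second tier and the database call becomes the exception. Expiry and invalidation are left out of this sketch.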