The Environmental Impact of PHP Compared To C++ On Facebook
Kensai7 writes "Recently, Facebook provided us with some information on their server park. They use about 30,000 servers, and not surprisingly, most of them are running PHP code to generate pages full of social info for their users. As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ would have been used instead of PHP, then 22,500 servers could be powered down (assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code), or a reduction of 49,000 tons of CO2 per year. Of course, it is a bit unfair to isolate Facebook here. Their servers are only a tiny fraction of computers deployed world-wide that are interpreting PHP code."
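For anyone checking the summary's numbers, here is a minimal sketch of the arithmetic. The 25,000 split and the 10x ratio are the submitter's assumptions, not measurements. The per-server CO2 figure is only what the quoted totals imply when divided out, not a number from the story.

```python
# Figures quoted in the summary. The 25,000 split and the 10x ratio are
# the submitter's assumptions, not measured values.
php_servers = 25_000
efficiency_ratio = 10
co2_saved_tons_per_year = 49_000

# Servers still needed if the same work ran 10x more efficiently.
servers_needed_in_cpp = php_servers // efficiency_ratio
servers_powered_down = php_servers - servers_needed_in_cpp

# CO2 per powered-down server implied by the two quoted totals.
implied_tons_per_server = co2_saved_tons_per_year / servers_powered_down

print(servers_needed_in_cpp)
print(servers_powered_down)
print(round(implied_tons_per_server, 2))
```

The 22,500 figure follows directly from the two assumptions, so the whole estimate is only as good as the guessed split and the guessed ratio.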
I remember when it was the script kiddie's substitute for cgi-perl. What does it offer from a theoretical and engineering PoV, apart from a Visual Basic learning curve?
What about all the cycles compiling and debugging C++ code? Or all the trees torn down for C++ books? Or the environmental impact of C++ developers? I mean, have you ever had to share a cube with one of them? Pheewww.
Run and catch, run and catch, the lamb is caught in the blackberry patch.
That's a ridiculous way to analyze it. What about the environmental impact of the extra time required to write the same functionality in C++? What about the impact of whole classes of C++ bugs that don't exist in PHP (and, perhaps, vice versa), with the downtime or security breaches resulting from them? Or a hundred other ways in which writing all that software in C++ would be different that I can't think of at the moment?
Seriously, is somebody taking seriously the 1 to 10 ratio of the story?
I mean, maybe raw execution of pure code is 10 times slower in PHP than in C++ (ouch, I didn't know that), but even then, that is far from implying the same ratio in the number of servers. You have to take into account all the other parameters (disk access, network, IO, etc.; those aren't 10 times as slow in PHP, one would guess).
I would be astonished if this ratio were close to the truth. Does anyone have any insight/information on this?
Write boring code, not shiny code!
The thing that this article fails to see is that some languages aren't for everyone. A PHP programmer who turns out good PHP code isn't going to magically produce the same level of code in C++. It also doesn't see that Facebook can't be down for longer than an hour at most, or it risks user outrage. After all, they have many, many, many users, and for it to go down for a day would be akin to Google going down for a day or so. The difference is that if Google is down for a day, most users can use Yahoo, Bing, Live, WolframAlpha, etc. to search. Not every Facebook user has a MySpace.
Taxation is legalized theft, no more, no less.
That's crazy. 10:1 is incredibly unfair, especially when you consider that a cached C++ page takes just as much time to return as a cached PHP page. On top of that, the majority of the work done is just searching a database. I would imagine a large part of processing a page is getting and returning data, which is then up to the database. He is using stats that say PHP is 10 times slower for running through loops, math, that type of crap. They say nothing about querying a database and then doing some minor presentation-related logic. If I had to guess, for a web page the average "efficiency gain" of using C++ would be under 2x.
You mean kind of like Roadsend
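One way to sanity-check the "under 2x" guess is Amdahl's law, which the comment does not name but which fits the argument: only the part of a request spent executing application code gets faster. The sketch below is illustrative only. The CPU fractions are hypothetical values, not measurements of Facebook, and `overall_speedup` is a name made up for this example.

```python
def overall_speedup(cpu_fraction: float, language_speedup: float) -> float:
    """Amdahl's law: only the CPU-bound fraction of a request gets faster."""
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / language_speedup)


# Hypothetical shares of request time spent executing application code.
for cpu_fraction in (0.1, 0.2, 0.5, 1.0):
    print(cpu_fraction, round(overall_speedup(cpu_fraction, 10.0), 2))
```

Under these assumptions, even a page that spends half its time in application code gains less than 2x from a 10x faster language. The full 10x only shows up when nothing else is in the way.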
http://www.roadsend.com/home/index.php?pageID=compiler
Read the first poster's points (in TFA); he pretty much sums everything up.
Just serve up plain text files. Anything else is pure decadence!
I'm thinking that these scripts are just thin front ends to a massive db. Thus, a lot of the computer's time is going to be spent on I/O, and a lot of the processing is going to be taking place in the db itself, which is probably written in C.
Mod points: Guaranteed to remove your sense of humor.
Side effects may include gullibility and temporary retardation
Simply put: no.
The reason why they have so many servers is because Facebook contains so much data. The servers are there for a reason, and the reason is CACHING.
The overhead of PHP is very small for a platform that is all about sharing data and the bulk of processor time surely goes towards fetching that data in the first place. What, do you seriously think that when you hit your home page on Facebook, there are database queries issued for that? Lulz.
Besides, I'm almost sure that FB uses something like Zend Accelerator, which increases code execution speed a lot.
Anyway, just no.
I don't care about your environmentalism.
Exactly. He is using stats from benchmark results from pure number-crunching tests. Seriously, here are the tests: pidigits, reverse-complement, regex-dna, k-nucleotide, n-body, fasta, binary-trees, fannkuch, spectral-norm, mandelbrot. Yep. Looks like stuff a web page would do... The biggest bottleneck is probably data access, in which case the language really doesn't make much, if any difference.
Why not rewrite everything in assembly? This comparison comes to a conclusion without any facts to back it up. As others have pointed out there is development time and compile time associated with C++... and what about ongoing development? Where does 10-1 come from? Are you assuming they aren't doing any optimization or using any sort of accelerator? I've personally re-written code in C++ from php, and then done the comparison. In our case, we decided the extra maintainability was worth the approx 10-20% increase in speed we saw.
...were they to rewrite it all in assembly language!
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
For something that is deployed to tens of thousands of machines..
Is there some reason why these languages couldn't be compiled and optimized? Code is just the programmer's will expressed as text that the machine can somehow interpret, right? If there is so much PHP out there, why wouldn't/couldn't there be an efficient compiler (by which I mean something that produces executables and not just "executables that are really just an interpreter tacked onto a script")?
The dearth of such compilers on the market suggests to me that the gains wouldn't be as great as claimed for the majority of applications where interpreted languages are used.
Can you be Even More Awesome?!
Does the author seriously believe that Facebook isn't running some sort of PHP compiling/caching service, like APC or something similar?
It would be ridiculous for them NOT to be running something like that, which eliminates much of the advantage C++ would enjoy through being pre-compiled. While there still may be a reduction if Facebook were magically changed to precompiled C++ code, the reduction would be fairly minimal. In addition to that, you'd need to factor in the debugging and coding/compiling times, which would exceed the PHP times by an order of magnitude at least.
I'm assuming the claim about 10 times is true, though I don't really think it is...
But they could have done something, like precompiling the PHP, much as Java uses a JIT, to make it better than or on par with a compiled C program.
There are PHP accelerators like Zend Accelerator for that.
This is why people don't take global warming seriously. Please, just stop it. If you really wanted to help, you could just fucking kill yourself and cut your carbon footprint to 0.
If you find this post offensive, don't read it! THINK ABOUT YOUR BREATHING! I am what I am because of how apes behave.
What a troll. Any point or argument based on assumptions is very weak. Here there are two: "..Let's assume this to be ..." and "...assuming a conservative ratio of 10...".
Don't make stuff up.
-Foredecker
Jibe!
Google.com: 12,056 servers. Website Worth: $ 186,352,889,952 USD (€ 126,610,389,668 EUR)
Facebook: 34,872 servers. Website Worth: $ 5,253,772,177 USD (€ 3,569,475,862 EUR)
I'd say 'GOOG' is doing a bit better atm :)
for those interested:
info comes from: http://websiteshadow.com/
it will show ad rev. etc for URL's you provide
This doesn't make much sense. You can go to work in an F1 car or in your normal car. In theory you will go faster in the F1 car.
In the real world, there are other kinds of "faster". The normal car is "faster to buy" (cheaper), "faster to maintain" (cheaper to maintain), and plenty of other "fasters" make your normal car faster than your F1 car.
Facebook is probably one of the few sites that could have written part of it in fast C++ code. In an F1 race, you will use an F1 car.
-Woof woof woof!
Many many moons ago efficiency was everything. The CPU was expensive, the developer was (relatively speaking) cheap.
Then Moore's law really started to kick in, and we needed a paradigm shift. Developers were more expensive, and CPU cycles could be had on the cheap. The mantra was "code it fast, and only worry about efficiency for the bottlenecks if at all".
Fast forward to almost 2010, and we have web applications deployed on a massive scale. Guess efficiency matters again. Not only from a pure cost standpoint but also from a moral argument to cut back on greenhouse gases. Amazing that more people haven't seen this coming. Especially given that web services are normally free to the consumer, the cost side of the equation clearly matters.
Seriously, is somebody taking seriously the 1 to 10 ratio of the story?
Only 1 to 10 ?!? I would have thought 1 to 100.
"assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code"
ARRRRRGGGGHHHHHHHHHHHHH
Why? On what evidence? I mean, I hate PHP as much as the next guy, but last time I wrote a web application platform in C++, I got to the end, analysed the result and went "Great, I've made the fast bit even faster. Now, about that database engine..."
it sure seems to be, the way a lot of people write it. write it once and hope you never have to read it, since it's impossible to figure out what they intended. ever read someone's c++ code? has it been a good experience?
--
"It is now safe to switch off your computer."
"if any difference" is going slightly too far probably? There should be some, if only because of lower CPU utilization (plus possibility of using much more efficient, even if slower, CPUs?), and hence somewhat lower power usage (including cooling). However slight. But yeah, certainly nothing close to 10:1.
One that hath name thou can not otter
In addition to that quite dodgy-looking ratio, you have to take into account why PHP was being used.
Even if the environment were a complete non-issue, nobody would pay substantial amounts of perfectly good money to keep ~20,000 servers humming. This tells us that either A) using PHP is substantially cheaper on the development side than using C++ would be (for this particular application) or B) somebody used PHP long ago, because this facebook thing is just a quick hack, right? and now it is cheaper to live with their legacy than to break free of it.
While servers have a direct environmental impact that is pretty easy to measure (electricity consumption/year * nastiness of local form of electricity production), "development" also has one. The reason that development is expensive is that it involves forking over large sums of money to pay programmers and managers and such. With the occasional exception of that-weird-dude-who-telecommutes-from-his-solar-powered-eco-commune, pretty much every dollar spent on programmers and managers ends up being spent either on A) living a lavishly energy-intensive western lifestyle (if your dev team is local) or B) rapidly and dramatically increasing the already enormous population of the developing world's appetite for energy-intensive lifestyles.
If you want to play the "OMG-environment" game, you can't just focus on the stuff that is trivial to measure and ignore the hard stuff.
Yeah right serving a page out of cache in PHP takes ten times as long as C++.
Plain text vs. any web graphics. Who needs all that fancy graphics crap? If you can't get your message across with plain ASCII, then you are incompetent, lazy, and on the verge of being a mindless PowerPoint Ranger.
.. because I didn't ever think I'd be defending PHP.
However, it is a much better choice for a web application than C or C++ - and I say that as someone who codes C, C++ and Java for a living. There are no decent web frameworks for C++, memory management is still an issue despite the STL, and the complexity of the language means both staff costs and development time are inflated. Peer review is harder, as the language is fundamentally more difficult to master than PHP. Compared to Java, the development tools are poorer, and things like unit testing are more complicated despite the availability of things like Cppunit. There are no "standard" libraries for things like database access, and no literature that I am aware of that describes how you would go about designing a framework for C++. You'd most likely end up porting something like Spring to C++, and even if you published your code on the web, I doubt much of a community would build up around it.
If you want a less contentious argument, and one which can be backed up with hard evidence, then argue that PHP should be replaced with Java. A well written Java web application, using a lightweight framework such as Spring or PicoContainer, should outperform ad-hoc C++ code.
It isn't even clear that they measured the trivial stuff, they just took a guess (If Facebook cared to, they could probably come up with a slightly better estimate of their PHP overhead).
Nerd rage is the funniest rage.
It's likely most of the overhead in Facebook's server farm is database-related and not PHP-related, meaning switching to C++ would not help much. Also, depending on what tasks the PHP codebase is performing, one can write binary libraries to speed up critical portions of the operation, improving performance to near-total-binary without reducing maintainability. I wouldn't be surprised if the people at Facebook were already aware of this.
developing web applications in c is not exactly a walk in the park. c was not designed to build web applications, or to maintain them. whereas you can easily go through php code to develop new functions and improve the efficiency of new and existing functions speedily and economically, it won't be the same with c. what about those costs?
looking at the instantaneous state of the server/php/performance situation is as stupid as just looking at the instantaneous state of a mass production factory and declaring that certain assembly lines are not efficient or green. there are a lot many factors and costs to count into in the bigger picture.
a half assed approach, which, somehow, brings the word 'green' into the mix - maybe to garner some attention, since it is the issue these days.
Read radical news here
as with ALL things invented by humans and which can be used to create other stuff, php has outgrown the 'homepage tools' it was initially created to be. now not only does it have a huge set of functions inside it to create full fledged applications, but through server modules it can also acquire an immense sea of functionality that can be used to perform innumerable other tasks. it is pretty much at the point where it can take over some desktop applications too, with the right server setup and modules - with some scripts and the proper modules you can even do a fair amount of word processing in any web front app, to the extent of being able to do drag&drop editing/drawing and pdf creation and so on. of course not quite as efficient as a native desktop program, however, regardless, you can.
just check php functions in php.net, and check some modules apache can use to supplement php. there is QUITE a lot.
Read radical news here
Seriously, years ago I started working on a c++ version of j2ee (not just servlets, the whole kit), and I mean providing similar functions, not identical methods of execution, obviously. It wasn't terribly hard, actually. But it all falls apart really quickly for several reasons:
1) platform architecture - the dependence here, even between different versions of the same distribution, was a pain and essentially spelt the end of my work. So I was stuck with "do I make web apps c++ source, or shared library binaries?" to which there is only one real answer for portability - source.
2) it's a systems language - dear god that makes it painful for so many reasons.
There are caveats to both those, but the reality is that php exists because it fulfils a need and it does it quite well. To compare the two (c++ and php) is a little ridiculous and ultimately this article just reeks of "please everyone advertise my c++ web tool kit for me!". Sure, facebook (and trillions of others) MIGHT move to a c++ web tool kit, but find me a dev that knows how to code an app in it, now find me 2, now find me 200 cause that's how many I'd need to write and maintain facebook apps in c++.
Even taking as accurate the OP's assumption that c++ is 10 times more efficient at what php does, that you could actually code facebook in it, and that php vs c++ is a one-to-one relationship for things like code maintenance, you're still stuck with "how many APIs am I going to have to re-write, and how many php APIs do I use that don't even exist in c++". It's ludicrous to assume that you could drop-in replace php with witty without ending up coding tonnes of c++ code just to do things that PHP already provided. Not to mention the zillions of little extensions that revolve around php to accelerate its web-abilities (memcached for example). The number of things that can be used alongside php for web-related things and the number of APIs built in to php just mean witty is never even going to be viable as an alternative. Let's also not forget there are millions of people round the globe using php for web stuff - which ultimately leads to php being a good web language (i.e. security problems being found, optimizations, etc etc).
Of course, wouldn't facebook be using something like zend to compile php pages? I mean seriously, if the 25000 servers are running php and not running zend, the waste here just in cost of servers would be unbelievable - sheer idiocy on facebook's part (if it were true, and I'd very much doubt it) and I imagine zend would have almost given it away for free just so facebook could say "we got a x% improvement using the zend compiler".
So, I wonder how many people are now learning about witty for the first time (which seems like the only real reason for the article to begin with). Better advertising than adwords!
And everything exuding heat is perfectly natural, no problems there.
The deaths and environmental changes from heat exchange in rivers near power plants don't happen, nope, uh uh.
Water's perfectly natural you need it to live, no way to drown in it, nope, uh uh.
That’s like a cage match between a slow drooling retard and your crippled grandpa in his electric wheelchair.
In other words: Run it at double speed, add Yakety Sax to it, and it’s awesome. :D
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Code it in Asm, and you can get 100:1, so you can power down 29,700 machines...
Better yet, make ppl. post all their wall posts directly in binary code. That way, you can destroy the code necessary to translate UTF-8 back-and-forth, the HTTP/MIME wrappers, and the SQL. Imagine the amount of electricity saved! You can market it as a brain-booster too, since now you have to think before you post on Facebook.
while true, it ignores that you're comparing a simple search box with millions of users who post multi-megabyte files to their personal space for everyone to see. try it some day: save a facebook user's page locally and see just how much data is coming down that pipe, on top of the scripts that are running.
You're comparing Google's front door with Facebook's entire company. Google probably has that many servers running web crawlers, and twice over again to store that massive database they use.
i thought once I was found, but it was only a dream.
Yes, PHP is a heck of a lot slower on processor-bound tasks than C++. In a pure benchmarking contest, no doubt C++ will win.
But what about when both languages have to query a database (be it mysql/postgres/oracle, etc)? In this case, both are blocked on the speed of the database. A 15 ms query takes 15 ms no matter what language is asking. Facebook is not calculating pi to 10 gazillion digits, and it is not checking factors for the Great Internet Mersenne Prime Search. It is serving up pages containing tons of customized data. This is not processor-bound... it is I/O bound both on the ins and outs of the database and the ins and outs of the http request. It is also processor bound on the page render, but the goal of this many machines is to cache to the point where page renders are eliminated.
Once a page is rendered, it can be cached until the data inside of it changes. For something like facebook, I bet a page is rendered once for every ~10 times it is viewed by someone. Caching is done in ram, and large ram caches take a lot of machines.
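A minimal sketch of the render-once, serve-many idea described above. Everything here is hypothetical: `PageCache` and `render_profile` are illustration names, the cache is a plain in-process dictionary rather than anything Facebook is known to run, and the ten views simply mirror the ~10 guess in the comment.

```python
class PageCache:
    """Serve rendered pages from memory until the underlying data changes."""

    def __init__(self, render):
        self._render = render   # expensive function: user_id -> HTML
        self._pages = {}        # user_id -> cached HTML
        self.render_count = 0   # how many real renders happened

    def get(self, user_id):
        # Render only on a cache miss; every other view is a dict lookup.
        if user_id not in self._pages:
            self._pages[user_id] = self._render(user_id)
            self.render_count += 1
        return self._pages[user_id]

    def invalidate(self, user_id):
        """Call when the user's data changes so the next view re-renders."""
        self._pages.pop(user_id, None)


def render_profile(user_id):
    # Stand-in for database queries plus templating.
    return f"<html>profile {user_id}</html>"


cache = PageCache(render_profile)
for _ in range(10):
    cache.get(42)
print(cache.render_count)
```

With ten views and no data change, the expensive render runs once, which is why the language used for rendering matters less as the cache hit rate goes up.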
So let's look at those 30,000 machines not by their language, but by their role. We can argue the percentages to death, but let's assume 1/3rd are database, 1/3rd are cache, and 1/3rd are actually running a web server, assembling pages, or otherwise dealing with the end users directly (BTW, I think 1/3rd is way high for that.)
So 1/3rd of the machines are dealing with page composition and serving pages. If they serve a page ~10 times for every render request, then about 1/10th of the page requests actually cause a render... the rest are being served from cache. Those page renders are I/O bound, as in the example above - waiting on the database (and other caches, like memcached), so even if they are taking a lot of wait cycles, they are not using processor power on the box. The actual page composition (which might be 20% of the processing that box is doing) would be a lot faster in C++... So of 10,000 servers, the virtual equivalent of 2000 are generating pages using php, and could be replaced by 200 boxes using stuff generated in C++.
So the choice of using php is adding ~1800 machines to the architecture, or ~6% of the total 30,000. Given that a php developer is probably 10x more productive than a developer in C++, is the time to market with new features worth that to them? I bet it is.
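The estimate above can be written out so the assumptions are easy to change. This is only a sketch of the comment's own arithmetic: the one-third web tier, the 20% composition share, and the 10x speedup are the poster's guesses, and `extra_php_servers` is a name invented for the example.

```python
def extra_php_servers(total_servers=30_000, web_tier_divisor=3,
                      composition_percent=20, cpp_speedup=10):
    """Servers attributable to PHP under the comment's assumptions."""
    web_tier = total_servers // web_tier_divisor             # front-end boxes
    php_equivalent = web_tier * composition_percent // 100   # CPU spent composing pages
    cpp_equivalent = php_equivalent // cpp_speedup           # same work in C++
    return php_equivalent - cpp_equivalent


extra = extra_php_servers()
print(extra, f"{100 * extra / 30_000:.0f}%")
```

With the defaults this reproduces the ~1800 machines and ~6% figure. Doubling the composition share to 40% doubles the answer, so the result is dominated by how much of the front-end work is really CPU-bound.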
Ah, but these points are not hard to counter.
1. Needing more developers: more jobs for skilled programmers, which is *good*
2. More developer systems: negligible compared to server farm reduction.
3. Javascript client-side: already being optimized (see Chrome stories)
see a Text Widget
Some optimized assembler would make a difference (ducks).
But network latencies, number of sustainable TCPs per session, db latency, weird table lookups (even arp drags a server down when you have 20K+ connects) are all at issue. Add in various dirty caches, file locks/unlocks and other OS machinations, and life can be tough for any app written in anything.
Then there are the backup servers, the availability servers, the DNS servers, the coffee servers, it just gets bogged down. A 10:1 efficiency claim is probably just language fanboy-ing..... or a consulting job looking for a spot marked X.
Certainly it's nice to be green... but using better optimization tricks (like GCD) for multi-cores is bound to help.... tickless kernels..... SSDs..... C++ wouldn't be my first pick.
---- Teach Peace. It's Cheaper Than War.
"development" also has one.
Not to mention clients. 20K servers is nothing compared to the millions of clients drawing higher power due to running looping flash commercials.
And water isn't a poison, but you'll still die if you drink too much of it.
Considering a minimum cost per server in the range of $500-1000, 2500 servers equal between 1.25 and 2.5 million dollars...
That's certainly a budget for which you could get some C++ programmers... ?
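The hardware figure above is a straight multiplication. The sketch below repeats it for the 2,500 servers named in this comment and, for comparison only, for the 22,500 servers the summary says could be powered down. Which of the two the poster meant is not settled here, and `hardware_cost_range` is a hypothetical helper name.

```python
def hardware_cost_range(servers, low_unit_cost=500, high_unit_cost=1000):
    """Return (low, high) purchase cost in dollars for a number of servers."""
    return servers * low_unit_cost, servers * high_unit_cost


print(hardware_cost_range(2_500))    # the count used in the comment
print(hardware_cost_range(22_500))   # the count the summary says could be powered down
```

This covers purchase price only. Power, cooling, and the salaries being compared against are not modeled.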
see a Text Widget
It's a phenomenon we have also noted.
Sure C++ would be faster running but not necessarily more efficient in terms of dollars.
I think you'll find that the servers come out of the operational budget, not the development one. So the costs of running 10x more servers don't factor into development effort. The costs should of course be charged back to the dev teams.
Deleted
The proposed ratio of 1:10 is real, if not bigger. And here's why:
1.) For each request, PHP has to load the entire application responsible for that particular response, including its configuration, etc. With memcache(d), you have to instantiate connection classes and reconfigure them, per request. Languages like C/C++, Python and Ruby have a different architecture to begin with. They load ONCE and each request triggers a FUNCTION or METHOD of a class, with all the app-specific configuration, db and memcached connections done and configured on app init, NOT per request.
2.) TFA mentions microsecond relevance! Even a simple echo "Hello World" will take much more time than a similar action in C. I have yet to see a PHP helloworld app that does it in under 1msec, let alone the microseconds required.
3.) Arrays in PHP are slow, being always hashmaps. Other data structures can speed things up. You don't always need hashmaps. SplFixedArray is a joke, btw, and available only as of 5.3. Can't compare it to a vector anyway, and lots of fixed structures can be represented by structs or classes in C, which are faster than in PHP anyway. Also the app can instantiate them once on init, and just (re)load when required.
4.) Even if all the app does is parse input vars and call memcache(d) / database funcs/methods to retrieve/store data, those calls are faster in C. Params can be parsed quicker in C, not requiring hashmaps for instance.
5.) FastCGI is crap. If this app were to be done in C, then it would require its own HTTP layer, epoll based (for Linux). It can take out all the crap in HTTP that is not required to parse the AJAX calls, and does not need to be "generic" enough to deliver static content.
6.) For such dedicated and distributed deployments, garbage collection is sometimes not required. For instance, fixed-length structures can be preallocated upon app init, and the app can really take as much RAM as possible on startup. Yes, that would limit the MAX number of users/connections per server, but so what? The app dominates the server, nothing else is required to run (except basic OS environment for the app), so fixed memory consumption is not a problem.
7.) Even though each request has to wait for I/O of some sort, either from memcache(d), from disk or from DB, you can process many more of these per front-end server and just scale backend servers as required. For example, with PHP your front-end server can serve 100k/sec, having X DB backends and Y memcached backends. With a C application, the front end can serve, say, 1M/sec. You still get to keep one front-end, even though you had to add more backends.
In short, you can significantly reduce the number of servers required if the app was written in C.
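Point 1 in the list above is about structure rather than raw speed, and a small sketch can show the difference. This is an illustration under stated assumptions, not a model of any real deployment: `FakeBackend` is a hypothetical stand-in for a memcached or database client, the code counts connection setups instead of measuring time, and it ignores opcode caches and persistent connections.

```python
class FakeBackend:
    """Hypothetical stand-in for a memcached or database client."""
    connections_opened = 0

    def __init__(self):
        # Each construction represents one connect-and-configure step.
        FakeBackend.connections_opened += 1

    def fetch(self, key):
        return f"value-for-{key}"


def handle_per_request(key):
    # Per-request model: connect and configure on every request.
    backend = FakeBackend()
    return backend.fetch(key)


class PersistentApp:
    # Long-running model: connect once at startup, reuse for every request.
    def __init__(self):
        self.backend = FakeBackend()

    def handle(self, key):
        return self.backend.fetch(key)


for i in range(1000):
    handle_per_request(i)
per_request_connections = FakeBackend.connections_opened

FakeBackend.connections_opened = 0
app = PersistentApp()
for i in range(1000):
    app.handle(i)
persistent_connections = FakeBackend.connections_opened

print(per_request_connections, persistent_connections)
```

How much that setup cost matters in practice depends on how expensive each connection is relative to the request itself, which this sketch does not measure.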
The first article is actually rather good. It focuses on what most of us suspect is the larger architectural challenge: the database, IO, and scaling components not originally designed for a much larger scale. Lessons learned are avoiding joins, reducing IO requests, avoiding DBs for static data, etc. PHP is mentioned as the presentation layer, and the optimizations are architectural, not a matter of switching languages. Criticisms of PHP are not ones of performance, but ones of maintainability, programming practices, and integrating with C++ code.
I can't read the second article because it's slashdotted, but the summary of it leads me to believe the author either completely ignored the first article, or didn't understand it at all. I won't re-hash the "Where the fuck did 10:1 come from?" arguments everyone else is very correctly bringing up. But I would like to point out that the author of the second article doesn't sound like he/she has a good grasp of what the first article says.
AccountKiller
and Facebook didn't. Facebook has no interest whatsoever in minimising their power usage (electricity is free, you see) and like all corporations they never look for ways of minimising their costs. There are no possible reasons why FB may operate their servers in this way.
It reminds me of when certain people start raging about the fact that "x% of trucks on the road are EMPTY!!". Yeah, because the big trucking firms apparently don't have rooms full of people whose job it is to make sure that the trucks are as full as possible, because trucks, diesel and drivers are free.
This is a substitute for a clever sig that fits within the maximum number of characters.
Sadly I couldn't RTFA because of the good old Slashdot effect but the concept that efficiency can be determined by a direct correlation to performance metrics is just wrong.
For the sake of argument I'll confine my examples of why I believe this thinking is flawed to just the language vs language issue and not bring in any network, database, etc. issues. First, how many more computer hours did it take to build in C++ than PHP? Second, if you build like-for-like functionality in C++ at a given point in time, it probably isn't as flexible and maintainable, so all maintenance takes longer. Now let's assume you do things "right" and build in all sorts of flexibility and injection points: eventually you end up building a higher level abstraction (or perhaps even a full interpreted language) which has the same issues as PHP regarding performance.
The reason you accept performance declines associated with higher level abstractions is that they allow you to do more in a shorter amount of time at a still reasonable performance level. Anyone who doesn't understand that, and all the impacts of that, certainly can't produce a legitimate analysis of power consumption based on languages. If the author really believes this he should program everything in assembly, or even better build specialty hardware for every computing task, or better yet simply quit using computers or electricity altogether; that will definitely have a bigger impact.
What about the environmental trade-offs inherent in spending time considering this sort of environmental impact versus spending time considering more significant environmental and conservational issues?
Looks like he should re-write his webserver in C++ so it's not slashdotted so easily.
From my personal experience: data-heavy applications run at a complete crawl in PHP. 10 times slower is, in my opinion, a vast understatement.
Then again, that’s not the point of PHP. The point is that in PHP, provided you already know how to program, you also get things done more than 10 times faster than in C++, because there is a simple function with defaults and automatisms for literally everything.
Only if those defaults and automatisms are other than what you expect will you get into big trouble. And because the PHP interpreter is truly a horrible piece of shit (I was able to run totally illegal constructs, with plain text right in the middle of the code, and it ran, doing nothing of what I expected it to do), that happens quite a lot.
It’s one reason that drove me to the extreme strictness of Haskell, where you have to get it right upfront, so it doesn’t bite you in the ass later.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
"even arp drags a server down when you have 20K+ connects"
Are you perhaps a server admin in my company? I swear this is the best excuse for poor performance I've ever heard.
Companies use PHP to develop and run web app functionality because it saves them huge amounts of time and money over rolling out the same thing if you were to write it all in C++. Realize what the cost structure of a company like Facebook is - the amount they pay their engineers, marketing personnel, and so on is significantly more than their amortized server expenses and server operating expenses (including energy costs, etc.).
Furthermore, the 10x speedup assumption seems ridiculous - how much time is spent on their server in compute-intensive PHP loops where huge gains would be made from switching to C++? And how much of the "code" is really database queries of various sorts? Furthermore, you can generally isolate small areas like that in your codebase and rewrite them as modules in C or C++ to be invoked from PHP land - and if they could easily cut their server expenses even in half (let alone by 90%) by having a few engineers spend a few weeks rewriting some components, don't you imagine they've probably set about doing that already?
Re-casting a discussion in terms of greenhouse gas emissions or energy use doesn't change any of this - saving energy generally means saving money, unless it takes more expensive resources (such as 100s of humans, who have to spend hundreds of months re-writing code in C++, while they, their families, and dependents emit tons upon tons of greenhouse gases, use electricity, buy groceries, and so-on and so-forth). The cheapest solution certainly isn't always the most environmentally friendly solution (such as when negative externalities are involved - lower labor and pollution standards in China, for example, that make a less "green" product manufactured there less costly in the US), but a vastly more expensive solution that no company in its right mind would implement isn't necessarily greener just because it might save some electricity and a few servers once it was implemented.
Obviously a smaller number of servers for a smaller CO2 footprint also means cheaper server infrastructure. If that were the case, don't you think FB would have done it long ago? Economic forces are the main drivers of technology innovation in social networking!
As they only say that 'the bulk' is running PHP, let's assume this to be 25,000 of the 30,000. If C++ would have been used instead of PHP, then 22,500 servers could be powered down (assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code)
In order to keep math simple, let's assume a horse is a perfect sphere...
Agreed, it does seem a bit strange, to say the least.
A lot of the code for running a web app will not be written in the scripting language for the front end. The database engine, the app server and many libraries will have been implemented using more efficient languages. php scripts are essentially a configuration layer on top of all the more efficient binaries below it.
So, say for the sake of argument that we converted the php scripts to C. But perhaps we still have a few xml configuration files around. Now someone could look at those and go "OMG they are 100 times slower than C - let's rewrite them in C and throw out 99% of our servers!" ...which would be kind of a stupid argument to push, but it seems to me that's pretty much what they are saying, but with regards to the portions of a system that are php.
Cool. Then C would be even faster. It should all be written in C.
Microsoft is to software what Budweiser is to beer.
Yes yes, very nice. Now make the language not shit, please.
This, in a discussion of C++?
blah, this is the sort of troll that makes us c++ lovers look so bad :(
Copyright infringement is "piracy" in the same way DRM is "consumer rape"
Nobody takes into consideration the serious environmental impact of C++ over PHP when it comes to designing, implementing and debugging applications, which take much more time and stress people more. They eat more, they shit more, they breathe faster, they need to spend more time working and most of them don't live close enough to the office to walk so they either drive or use public transportation. It takes a lot longer to learn C++ than PHP, therefore the developers will be wasting a lot of time without actually producing anything. Why would PHP be 10 times slower than C++ on the web, when most of the work is done by the [most likely written in C] database engine?
In short: TFA is comparing apples to oranges and says that oranges have 10 times more juice than apples after particularly squeezing that exact amount from them.
Yes, it's sarcasm. Deal with it!
Before I even start, let me just say I am a C/C++ coder, I've never really touched PHP, and if I were going for a more abstract language, PHP probably wouldn't be it (mind you, I've not written off PHP altogether; I rarely do that with programming languages, except for FORTRAN, COBOL and C#). I've got no favoritism towards languages; I use what best fits the task and try to make my software as readable and maintainable as possible.
First: where did these numbers come from? I find it hard to believe them, as I have seen actual benchmarks of PHP, not just WAG of "10 times as slow as C++".
Second: if the author is so worried about PHP being inefficient, why doesn't he help improve the efficiency of the interpreter? Remember, there are no efficient languages, only efficient implementations.
Third: has he even factored in the fact that higher level languages require less total development time? What about all those commuting hours saved by the programmers because they weren't having to run their PHP scripts through valgrind's memcheck?
Fourth: why C++? How about FORTRAN or assembler? FORTRAN compilers are extremely good at optimizing code, and I'm sure you could squeeze out a few more cycles by coding it in assembly.
Nathan's blog
It probably is a valid excuse if you have 20,000 client machines connecting locally via ethernet from a B class subnet such that the arp tables on the server keep overflowing.
Of course if you, as a system administrator ever let such an environment be setup you probably are really good at excuses anyway.
Yes. I know the difference. C is an elegant if simple language, which is hard to program properly. C++ is an abomination that attempted to take the elegant, simple nature of C by bolting on spare body parts from dead object-oriented corpses, resulting in a language that is neither simple nor elegant, which is even harder to program properly.
See, I know the difference.
But if the point is to gain efficiency, why would you stop at C++? It's not a magical perfect balance of performance with elegance. C would give better performance than C++.
Sure, there's the non-OO tradeoff (though you could quite easily gain the benefits of OO, though not as elegantly as C++), and then you don't have to deal with fucking templates (which are really nice to program, but a bitch to clean up when someone else has fucked them up for you).
The premise of the article is stupid, and shows a pure lack of understanding of PHP, web service architecture and implementation, and a not-inconsiderable dose of C++ fanboi-ism.
Don't forget to take account of the energy required to heat the water for the extra coffee it would take to build it in c++. People always forget about the coffee:production ratio.
Assuming that the average C++ coder is heavier and bigger in size (even when shaved) than the average PHP coder, I think the exhalation of CO2 produced by the C++ programmers needed for the job overpowers the 10:1 edge they have on code speed.
Also, they probably eat more, and drink more coffee, which turns into a bigger environmental footprint if you count the emissions produced by trucks that deliver those groceries to the nearest store. And, just to name one example: Let's assume that 50% of the coffee company employees drive to their work and so on....
Hey, they are the ones who started drawing conclusions based on assumptions, not me.
After all, what purpose do they really serve? Apart from fans of X Factor would anyone really notice if it was gone?
At the very least it would save us all from the annual slashdot Christmas bunfight over which code reigns supreme.
Posts, MyBio or Sig, may contain satire, sarcasm, bolded nouns be sardonic or even witty & be Church of SD
Dumb statements like this are what lead to premature optimization. Show me the proof: Put a profiler on facebook and show me where the bulk of code execution is happening. I seriously doubt one could code a similar app in C++ and make it run smoothly and stable and yet save that many servers.
If the goal is energy conservation, the server count might not be reducible -- # required for memory, network ports, disk seekers, or other things.
However, it is _certainly_ possible to reduce power and cooling requirements somewhat with more efficient code. So you can install slower/lowerV/lowerW CPUs, or fewer cores (unless you are already at min). Or at least the CPUs spend more time in power-saving states.
The power reduction may not be all that great, ~20W per server, but over 25,000 servers that is still 0.5 MW -- about 0.44 M$/yr
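A quick sketch to check that arithmetic. The 20 W per server and the 25,000 servers are the figures from the comment; the electricity price of $0.10/kWh is an assumption (no price is given above), picked only to show the order of magnitude.

```python
# Back-of-the-envelope check of the power-saving estimate.
WATTS_SAVED_PER_SERVER = 20   # figure from the comment
SERVERS = 25_000              # figure from the comment
HOURS_PER_YEAR = 24 * 365     # 8760 hours, ignoring leap years
PRICE_PER_KWH_USD = 0.10      # ASSUMPTION: illustrative price, not from the thread


def annual_savings(watts_per_server, servers, price_per_kwh):
    """Return (megawatts saved, kWh saved per year, dollars saved per year)."""
    total_watts = watts_per_server * servers
    megawatts = total_watts / 1_000_000
    kwh_per_year = total_watts / 1000 * HOURS_PER_YEAR
    dollars_per_year = kwh_per_year * price_per_kwh
    return megawatts, kwh_per_year, dollars_per_year


mw, kwh, usd = annual_savings(WATTS_SAVED_PER_SERVER, SERVERS, PRICE_PER_KWH_USD)
print(f"{mw:.2f} MW, {kwh:,.0f} kWh/yr, ${usd:,.0f}/yr")
```

20 W times 25,000 servers is 500 kW, i.e. 0.5 MW; at the assumed price that is a bit under half a million dollars a year. Cooling overhead is not counted here and would push the figure up.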
Beware of false economies -- LoC does not matter if those lines are rarely executed. What runs often matters. What doesn't might not be worth the power investment of compilation.
My own experience doing server development in c was that it's a minimum of 30:1 (and in some cases, much greater). Plus the speed differential is huge, and also in favour of c.
There's a big difference between a couple of hundred requests a second and 6,000 - 10,000.
Then again, the php code had to be served through apache, while the c code was served directly by a custom server sitting on a separate socket, so there's no telling how much of the overhead was from apache.
Even the absolute worst-case scenarios were well over 10:1.
I am curious about an example of a company that has really done this conversion, and what their savings were like. Where is it?
Living in Chile
This is brilliant! I think it's clear now the direction we must go. Overuse of energy-guzzling languages like PHP has put us on an unsustainable trajectory fueling out of control global warming.
Congress must act to regulate the use of these energy-guzzling languages. No longer will programmers and corporations be permitted to turn out inefficient code with impunity.
PHP, Perl, Ruby, Bash, your days are numbered!
Just wait until we can get UN involved. Python, you and your CO2 spewing simplicity are next!
Wrong - the language makes a huge difference. Try using the c api and CLIENT_MULTI_RESULTS and CLIENT_MULTI_STATEMENTS and concatenating 10,000 queries into one request, then using mysql_next_result() to get the next result set (no, not the next row, the next result set - 0 or more rows).
One connection. Not 10,000. A BIG difference in execution time. Testing showed that the optimum amount of strcat()ed or fsprintf'd queries was between 10,000 and 20,000 on hardware with limited resources (half a gig of ram, single cpu).
If each page requires 50 hits on the database, you're going to see a big difference.
Now imagine this on a machine with much more ram and more than one core.
More reading: http://dev.mysql.com/doc/refman/5.0/en/mysql-next-result.html
First, a few helpful links:
Amdahl's law says that if Facebook were to switch from PHP to C++, the best possible improvement in the overall processing time is proportional to the total time spent in PHP now. If PHP processing accounts for 90% of the time and they reduce that to zero, they'd have a 10x speedup. However, if it accounts for 10% of the time and they reduce it to zero, they'd have about a 10% speedup.
So, the question is: How much time (overall) is spent in PHP processing? My guess is not very much. As other posters have pointed out, there are disk accesses and MySQL. And quite a bit is cached in Memcached.
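To make the Amdahl's law point concrete, here is a minimal sketch. The function name and the list of PHP time shares are made up for illustration; nobody in this thread knows Facebook's real numbers.

```python
def overall_speedup(php_fraction, php_speedup):
    """Amdahl's law: overall speedup when only the PHP share of the time gets faster.

    php_fraction: share of total request time spent in PHP (0.0 to 1.0)
    php_speedup:  how much faster that share becomes (10 for "C++ is 10x faster",
                  float("inf") for "PHP time reduced to zero")
    """
    if not 0.0 <= php_fraction <= 1.0:
        raise ValueError("php_fraction must be between 0 and 1")
    if php_speedup <= 0:
        raise ValueError("php_speedup must be positive")
    return 1.0 / ((1.0 - php_fraction) + php_fraction / php_speedup)


# Hypothetical PHP shares of total time, for illustration only.
for fraction in (0.9, 0.5, 0.1):
    with_10x = overall_speedup(fraction, 10)
    best_case = overall_speedup(fraction, float("inf"))
    print(f"PHP share {fraction:.0%}: 10x faster PHP -> {with_10x:.2f}x overall, "
          f"zero-cost PHP -> {best_case:.2f}x overall")
```

With a 90% PHP share the ceiling is 10x; with a 10% share it is about 1.11x no matter how fast the replacement language is. The 10:1 server reduction in the summary only works if essentially all the time is spent interpreting PHP.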
The original article is slashdotted now, so I'm not sure if it says what those 30k servers are doing, but Facebook has more than just PHP running. Perhaps a thousand of those servers are running Hadoop, probably calculating the social network.
From an architectural perspective, it probably does not make sense for them to optimize for processing speed (i.e., switch PHP to C++) if their performance is acceptable. That's because they face larger risks: modifiability and time to market pressures. They may worry that switching to a statically typed language (such as C++, but Java would be similar) would make new feature development slower. If they could have both, great, but these two quality attributes often trade off against each other. A design with better performance may hinder modifiability, and vice versa.
I don't mean to start a language war -- I'm speaking broadly about the idea that dynamically typed languages (PHP, Smalltalk, Ruby, Python, ...) yield programs that are faster to write and modify compared to statically typed languages (C, C++, Java). You may disagree with that generalization, but you may agree that others think it is true, and are therefore acting rationally if they choose a dynamic language when they want modifiability.
Disclaimers: I knew Aditya in school but haven't spoken to him about Facebook; I am writing a book on software architecture.
Oops - typo: s/fprintf/sprintf/ - sorry for the typo. And no, we never overflowed the buffer - always checked the arguments length before adding more data.
Compile.
For the portion of processing that the interpretation is occupying the 10:1 ratio is probably conservative. It may be 400:1. Compile.
Wow - if 20k arp connects are killing your servers, maybe you need to fix that :/
No, really... you need to fix that. :)
Now the biggie would be 20k users sending writes to disk - that would drive iowait into the frickin' stratosphere, even if you had the fastest disks alive. THAT is the fastest way to bog a server down.
I assume that the databases are not on the same servers as the PHP code sitting around, but that doesn't mean that the server load would drop as much as TFS says, since the databases will remain the same size no matter what you use on the front-end.
(which reminds me - maybe the article author should have compared loads from MySQL to Postgres to Oracle to (*snort*) SQL Server... THAT would have been a real comparison.)
Quo usque tandem abutere, Nimbus, patientia nostra?
Why stop at 10? Since we're pulling numbers out of the air anyways, why not take a conservative estimate of 100 for the ratio of PHP to C++ execution, so they could run the whole thing on just 250 servers!
Which would be very relevant if Facebook was doing heavy number-crunching. The only numbers on the site are comment and friend counts, which isn't especially taxing work (especially since it's all de-normalized). The majority of FB is database activity and transforming that into HTML and JSON. If you want to place blame for inefficiency, MySQL would probably be your best bet.
How are sites slashdotted when nobody reads TFAs?
Isn't this "study" a waste of energy?
I am a C/C++ programmer by trade; I'm not fond of PHP. Yet this "C++ saves energy over PHP" argument smells like more selfish politics to me. And selfish politics is what is bringing doom down on humanity's head -- the use of PHP vs. C++ is a sideline, a distraction, and only truly valuable for people who have a philosophical axe to grind.
You want to save a lot of energy? Shut down all the computers running MMOs. And stop wasting cycles looking for alien signals in cosmic radio waves. And get rid of banal YouTube videos... and... the list is endless. The science behind Global Warming is being used to further political and social agendas that have little or nothing to do with adapting our species from a potential environment change.
In the end, selfish politics will kill us all. We will become a footnote in history if we do not discover enlightened self-interest.
All about me
Ok, this has gone WAY too far .. we all need to just take a step back..
---- Booth was a patriot ----
That's right, because a similar level of chicanery is going on in this claim. A small factor of system expense is being extended into a region of pure nonsense. There are plenty of more reasons to have a large, scattered base of servers. These include:
* Local database mirroring and caching to improve response times for dynamic content.
* Local proxying of static material to improve response times and to improve upstream bandwidth costs, and reduce the number of connections made to the core servers and avoid DDOS'ng yourself.
* The idea that PHP's function calls to pull and present disk-based or data-based material would somehow magically reduce the overall cost and need for servers, even though the request for material is probably one of the most efficient steps.
I've had eager young engineers extrapolate their favorite tool into being the great solution to all issues this way before. Educating them in the concept of looking for the _other_ bottlenecks is a painful process, and I wish I could have found a good course in it before a lot of recent projects myself.
C/C++ is maybe 10-100x faster and more efficient for carefully written inner loops. At the level of whole systems, it's an entirely different story. Because C++ lacks garbage collection, people end up retaining far more memory than they need to. Because algorithms are far harder to express in C++, people end up using brute force algorithms (linear search, etc.) a lot. Because templates need specially compiled versions for each combination of template arguments, you end up with dozens of different instances of basically the same code.
For web applications, there's probably not much of a difference either way; but in scripting languages like PHP, all the inner loops that are needed are already written in C. For scientific computing, C++ is acceptable because a lot of applications really are mainly about the inner loops.
But for many applications, like GUIs, C++ not only fails to be faster, it also ends up making everything a lot slower and more bloated. If our desktops were largely written in Python, Ruby, or Smalltalk, we'd be using a lot less energy and be able to get by with smaller, less-powerful machines. That's in addition to all the savings from the reduced number of bugs and reduced development costs.
The script languages like Perl/PHP/Python/Ruby are dynamic, and fill a role that C++ can never fill. Further, while GC can be added to C++, it changes the programming style so much that it is nearly another language (makes using 3rd party libraries tricky).
Java is a middle ground between C++ and script languages. It has the garbage collection, dynamic class loading, "safe" execution model and extensive libraries like PHP/Python/Ruby, but long running programs like web apps get compiled to optimized native code as they run. Yeah, the language has warts, but it is still more productive vs programmer time than C++.
This is as stupid as the "news" a month back claiming a pet dog used the same environmental resources as a large SUV. The problem was, multiplying the claims for agricultural land used to feed one dog by the number of dogs gave as a result that 10% of our agriculture is feeding dogs. Why is that obviously wrong? Because the whole pet food industry accounts for less than 1/10 of 1% of the value of agricultural production.
We're going to see more and more of this shit. Everyone will be competing to get their 15 minutes of fame by claiming that X - something that annoys them, like PHP or, evidently, dogs - is the major overlooked factor in destroying the Earth's climate. Because they know our press is so stupid in basic scientific literacy that they'll jump at the chance to put a headline over the claim, since we're all bound to read it just in case there's a there there.
"with their freedom lost all virtue lose" - Milton
that site you mention is apparently kinda unable to count correctly.
for example, it gives 8 visitors/day for a website that i know has 4000 visitors per day.
how does it get those numbers?
Computer programmers are people with their own carbon footprints, $FLATULENCE_JOKE. So, people have raised objections to the underlying efficiency argument, I tend to agree with the people who estimate that the energy savings would be less than 10-fold, but it's not like I've looked at the diagnostic output of their servers.
Labor costs money, right? So if you assume that $X million worth of servers and electricity are cheaper than $X million worth of programmer time to reimplement the whole mess in C, then it's probably minimizing the carbon footprint to leave it alone. This ought to be a very simple business decision.
There are certainly cases where this is not true, but for most purposes, dollars spent on computer programming go directly to carbon footprint. I'm a Socialist, certainly not a free market fanatic by any stretch, but when it comes to spending millions on highly specialized, skilled labor to reduce carbon footprint, I doubt that it's worth it unless the electricity you save costs more than the specialized labor.
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
C++ is much too slow and carries too much of an overhead. And it usually requires an operating system on a general-purpose processor. You could go to hand-optimized binary code written directly for the processor but that still leaves us with inefficiencies.
Imagine if every website was implemented as an ASIC. Then we could talk about efficient datacenters. Maybe, if you're really strapped for cash, you could implement each website in an FPGA. But that should only be a stopgap measure until you can afford a proper implementation.
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
In general code written in a dynamically typed language like PHP is harder to test than code written in a statically typed language like C++. The reason why is that statically typed language compilers catch hundreds of problems at compile time that dynamic languages typically cannot catch until run time, and with complete code coverage at that. Misspelled variables and other minor typos anyone?
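A minimal sketch of the misspelled-variable point, using Python as a stand-in dynamically typed language (an assumption for illustration; PHP reports undefined variables differently). The function name and the discount rule are made up.

```python
def total_price(prices):
    """Sum the prices and apply a 10% discount on totals above 100."""
    total = 0
    for price in prices:
        total += price
    if total > 100:
        # Typo: "totl" instead of "total". Nothing flags this until the
        # branch is actually executed with a large enough total.
        return totl * 0.9
    return total


# The common path works, so a quick manual test passes.
print(total_price([10, 20]))

# The typo only surfaces when a test happens to cover the discount branch.
try:
    total_price([60, 70])
except NameError as exc:
    print("caught only at run time:", exc)
```

A C++ compiler would reject the equivalent undeclared identifier before the program ever ran, which is the commenter's point about needing full code coverage to get the same guarantee from tests.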
In a sufficiently large project, all the time one might save not going through an extended compile cycle quickly gets eaten up by the extra testing overhead, testing which for user interface intensive applications is rather difficult to automate.
Of course the developers have to know what they are doing. If you run into more than one dangling pointer or array overflow problem per ten thousand lines of code, someone is either ignorant or careless. C++ programming is not for code monkeys.
That's because if they were writing in C++ they would still be writing the application and wouldn't have any users yet. Larry
...covered. Ban all programming languages except assembler! They emit CO2 and will lead to the destruction of the universe!
I had a number crunching PHP batch program that was taking about 45 minutes to run. After a long and tedious rewrite in C++ the best I could get was about 5 minutes faster (~10% improvement). On line the main performance hit with PHP (relative to C++) is the interpretation which can largely be negated using APC for caching and optimizing the intermediate code. Considering the nature of on line programs I would expect even less than the 10% gain I squeezed from the batch program. Of course it's horses for courses but from my experience I find the 10:1 ratio unrealistic for real world applications.
I do some basic php development and really couldn't imagine having to do the same work in C++. It would take so much more time it would make it nearly impossible to do.
So maybe it's time to optimize php? Perhaps do some hardware acceleration of php code with CUDA, or maybe just focus on the php interpreter and make it more efficient.
As with any scripting language, an un-optimized interpreter will run the script code an order of magnitude slower or worse than a pre-compiled program.
I have seen some comments that it isn't likely to cause such a difference in server needs because it doesn't take I/O performance into account. Well, consider that everything must be bought with some sort of budget in mind, then consider that a nice dual quad-core server will cost many thousands of dollars more than a single Core 2 Duo server. A 2-socket 3GHz quad core has 24GHz aggregate; with the 10:1 ratio that means you need a single 2.4GHz cpu, so a Core 2 Duo E8400 is actually worth about twice the cpu power for pre-compiled code vs a dual quad-core system running php. The difference in price could be spent on SSD disks or a faster SAS array and actually improve performance.
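The GHz arithmetic above, spelled out as a rough sketch. Aggregate clock speed is a crude proxy for throughput, and the 10:1 ratio is the disputed assumption from the article, not a measurement. The E8400 is taken here as a dual-core 3.0 GHz part.

```python
PHP_TO_COMPILED_RATIO = 10  # the article's disputed assumption


def aggregate_ghz(sockets, cores_per_socket, ghz_per_core):
    """Crude capacity proxy: total clock speed summed over all cores."""
    return sockets * cores_per_socket * ghz_per_core


# Dual-socket, quad-core, 3 GHz server running PHP.
big_server = aggregate_ghz(sockets=2, cores_per_socket=4, ghz_per_core=3.0)

# What the same work would need if compiled code were 10x more efficient.
needed_for_compiled = big_server / PHP_TO_COMPILED_RATIO

# Single-socket dual-core 3.0 GHz desktop-class CPU.
small_server = aggregate_ghz(sockets=1, cores_per_socket=2, ghz_per_core=3.0)

headroom = small_server / needed_for_compiled
print(f"big server: {big_server:.1f} GHz aggregate")
print(f"needed for compiled code: {needed_for_compiled:.1f} GHz")
print(f"small server: {small_server:.1f} GHz, headroom {headroom:.1f}x")
```

On this measure the small box has 2.5x the needed capacity, roughly the "twice" claimed above. It says nothing about memory, I/O, or how much of the time is actually spent in PHP.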
You can test this yourself by running a bash script to run 10,000 queries on a mysql database with the compiled mysql tools vs a php script to do the same thing (run on the command line) and then subtract the time the database took to return the data. This is a very typical workload for php and it will lose this race every time.
do I need to put some result here? no, do it yourself so you can see first hand.
Not to mention clients. 20K servers is nothing compared to the millions of clients who should be running Flashblock.
FTFY.
As a rule of thumb, I've heard compiled languages beat interpreted ones by a factor of 600.
Some languages, like C# and Java, land in between. They're compiled to some fictional machine which is emulated by software. I don't know where they land in this.
PHP is compiled on the fly, and IIRC you can cache the compiled output on a busy server to save
quite a bit of time.
You took the bait. Now spit it out. ;)
---- Teach Peace. It's Cheaper Than War.
FORTRAN is designed to do numeric processing. FORTRAN compilers are very good at optimizing such code. FORTRAN is not at all optimal for doing much of anything else.
Similarly, with the right framework, someone might write a general purpose web application in C++, because you can make string processing code a relatively painless exercise with the right class libraries. Plain old C, on the other hand, is much worse - essentially a language designed not to do string handling very well or very reliably.
Even with C++, there is an enormous interoperability and efficiency problem with strings of different flavors, and I would rank that problem as #1 on a list of why few people do general purpose business programming in C++. I have *never* seen a standard C++ library compatible C++ string implementation that was worth using in both compile time and run time efficiency terms. Certainly the implementation that comes with GCC doesn't qualify...
This guy takes some benchmarks that have absolutely no basis in actual real world performance and beats his drum.
What, does he want a medal for a beat up on /.?
Drop this fool down a well, and leave optimisation to those who understand it.
1. Prototype PHP code written and tested in an afternoon.
2. Business case written in an afternoon (forgetting to include the profit generating point.)
3. Vulture Capitalist drinks too much at lunchtime pitch and agrees to provide $BIGNUM
4. Development of real codebase begins.
5. Vulture Capitalist sobers up and demands that the new service starts NOW!
6. Prototype code goes into production.
7. Real codebase development abandoned.
The PHP interpreter can and should run in-process to the webserver. Compiled C++, not so much.
Now, I imagine Facebook's scripts are really quite simple - fetch from the database and do some formatting. Moreover, they're called like a thousand times a second. Grabbing from the DB is already the expensive bit; C++ won't help that. But starting thousands of processes a second can't possibly be faster than the in-server interpreter effectively just looping.
Am I missing something or is this a pointlessly stupid article, bordering on troll?
I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
I have done projects like this, and received massive speedups and performance increases. The issue is that you need to understand the real reasons why rewriting a program in C and/or assembly gives a massive performance increase. Inevitably, the reason why the C program is so much faster is that a programmer has gone through and rethought the application. The programmer eliminated string copies, string manipulations, data communication overheads, and data manipulation/translation overheads by rethinking the program's design.
For example, imagine a very simple application designed to take a digital input, and display a red/green indicator to a user depending on the input state. Count every time a major string overhead, data communication overhead, or data translation overhead occurs in each of the proposed solutions.
Web Solution
1. Input digital input via PLC (Data Overhead #1)
2. Upload data from input via PLC communications protocol to PC (Data Overhead #2)
3. Make data available to other programs, for example RSSQL makes real-time I/O appear as SQL database queries (Data Overhead #3)
4. Use PHP or ASP to generate a web page based on a SQL query for the real-time input (Data Overhead #4)
5. Use a web browser to query the relevant web page. (Data Overhead #5)
Web Solution performance: it might be able to update the display screen every 1/5 second.
Embedded C Solution
1. Input a data point using real-time I/O
2. Paint a computer's display screen accordingly. (Data Overhead #1)
C Solution Performance: 1/60 second, limited by the refresh rate of the monitor.
Assembly / Microcontroller Solution
1. Input the data point, with INP , AX
2. Output the data point to a Red/Green LED, with OUT AX,
Note: the assembly implementation doesn't have any string manipulation, so it doesn't have any significant data overhead.
Assembly Execution Time: Less than 1 micro-second.
The crucial concept from the above example is that the programmer reduced overhead and execution time, by simplifying program operation. The problem was solved in 3 different ways, and the fastest solution wiped out all the communication/string/data management overhead. If you want to make a computer program very fast, it is necessary to reduce data communication, string manipulation, and complex data structure overhead.
Which languages do this and why:
Level 1 - Simplest: Assembly is the best at wiping out string overhead, because engineers willingly migrate complex functionality to hardware before implementing it in assembly. In this case, the display screen was eliminated in favour of a direct output to an LED.
Level 2 - Low-Level: C is remarkably quick at string manipulation programs, because programmers minimize the amount of string manipulation. String manipulation in C sucks, and is difficult to get correct. As such, programmers attempt to minimize it, or use optimized tools like lex/flex or yacc/bison that automate the difficult problems.
Level 3 - Garbage Collected: Java and .NET encourage carefree string use and data structure use. They have automatic garbage collection. As such, minimal penalties exist for the programmer to use strings.
Level 4 - Scripted: PHP, Perl, Python are higher level languages focused on easy programming for high-level tasks. They pretty much assume the programmer doesn't care about the overhead of processing strings or complex data structures. Instead, they make it easy for the programmer to program the complex data structures.
An application like FaceBook has to have some complex data structures to do its job. In that case, a migration from PHP to C will likely not produce great benefits, because the C program still has to do all the same work the PHP program does. The old rule was that interpreters were very slow. With modern techniques, just about any language can be sufficiently compiled to
Dude, are they going to take into account the extra time your computer needs to be on to implement all that shit long hand? No? So you're saying your suggestion is something of a funny that failed or a troll that needs some souping up?
Or is there a joke in there that you are crap at telling?
Inquiring minds want to know.
You're a C denier who is killing the planet with your evil lying facts! Off to Kamp Kernighan for you - you'll parse strings with pointers - AS K&R INTENDED - until you crack!
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Why is it that a decent PHP (or Python, or Ruby) MySQL binding couldn't do the exact same thing?
Don't thank God, thank a doctor!
The sad thing is that the open-source community hates flash (with *very* good reason), but hasn't brought forth any legitimate alternatives. Moon/silverlight is a legitimately good alternative, but has been bogged down by licensing concerns (the legally-binding promises made by Microsoft don't even appear to have registered on the consciousness of RMS and the like)
That said, I wonder if some sort of legal class-action can be brought against Adobe for making such a fantastically inefficient piece of software. The effect on power-consumption across an entire enterprise must be quite high.
-- If you try to fail and succeed, which have you done? - Uli's moose
Hey, don't knock it. I have written billing systems (mostly) in FORTRAN 77, and my employer, a major telco, sold it as an international product. The bit that wasn't FORTRAN 77 was Pl/1G.
Java and C# can both actually beat compiled languages because they can try multiple compiler strategies at run time and use the winner. Compiled code has to accept the best guess from compile time.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
I think it's fair to say that FB servers generate a large amount of database I/O.
And their PHP code is likely running a lot of graph flow, pattern matching, and other data mining algorithms.
Including plaintext indexing and search algorithms
Remember, the whole point of the social network from an advertiser perspective is to select people on the network who are most likely to be interested in certain ads.
This suggests a lot of elaborate DM on FB's part.
Just because the intensive computations aren't obvious to the end-user, doesn't necessarily mean there is no heavy numerical computation being done behind the scenes.
It's a sad day when programmers cannot think any other reason to use c++ than some crackpot global warming co2 theory that doesn't hold water.
Not if your LAN is correctly configured.
Ok. 20,000 machines on a Class B subnet is ludicrous.
With so many, you should divide the LAN and subnet with routers, or if for some ludicrous reason you must keep a class B subnet, then use a LAN divided transparently with routers acting as Proxy ARP servers.
No excuses.
I hate PHP, but there are lots of reasons why PHP might be the better solution. Others have pointed out how ridiculous this is. ASM, C, and C++ require more skill and take more time to program with. It seems that lately, the moment the environment is mentioned as a factor, not only do all other factors cease to be considered, but all common sense goes out the window. It's like everyone in the room spontaneously turns 12 years old and decides they have the solutions to all the world's problems if only everyone would listen, all the while ignoring the knock-on effects and complexities of the real world.
These posts express my own personal views, not those of my employer
Uhm, Google's server count is in the hundreds of thousands. Hell, they have data centers with more than 12k servers in them.
Of course, they do a HELL of a lot more, but that's another story.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
I think that once your server farm exceeds 100 servers, it's time to start seriously thinking of re-architecting in C. That's probably about the tipping point where your cost of build-out will exceed the engineering costs of switching technologies.
I invested in a small startup a couple of years ago. We based everything on Ruby-on-Rails, php, mysql, and I forget what else. The bitter lesson we learned -- and what was the main cause of the company's failure -- was that this technology doesn't scale. We nursed it along to 30M records on two servers (the db admins must've been geniuses to coax it that far), but in the end, it fell over.
We could have saved the company if we could have afforded to expand our servers on our shoestring budget, but in the end, the software infrastructure we were using would have failed us one way or another.
I've also worked for companies that scaled successfully, and they were the ones that got it. Switched from scripting languages to C++. Switched from MySQL to a dedicated database engine.
Scripting languages are great for development and prototyping, but for serious production use, you really need to bite the bullet and switch to compiled languages.
The exception proves the rule. FORTRAN (like COBOL) is certainly more computationally efficient than the vast majority of languages today, so if you are in a performance / overhead sensitive environment, there can be a lot to be said for such languages. It is just not the sort of thing people normally do without such constraints, because there are other, more modern languages that tend to be more suitable to the task.
No one is going to write an operating system kernel in FORTRAN, for example, although I have no doubt that it is actually possible.
What's a "data-heavy" application? If it's data-heavy, then perhaps one should try farming most of the work off to the database. If you cannot, then perhaps the schemas are poorly designed. (However, it's not always possible to change poor schemas without significant down-time.)
Table-ized A.I.
This is idiotic, and is typical of the kind of pseudo-science underlying much of the climate alarmism currently en vogue. Like a lot of things, it is pretty much impossible to quantify which language ultimately uses more power, because of all the variables. As others have pointed out, you might save some power in the deployment of the code, but you would surely use more power in the development of that code. Then, you have to figure out what the total impact of that is, since you'd have more man-hours of coding, using human coders, who sit at desks, in offices, which must be heated and cooled, etc., etc.
Java and C# can both actually beat compiled languages because they can try multiple compiler strategies at run time and use the winner.
That's the theory, and it's no doubt true... but I've never seen them come close in real life, and I doubt they ever have anywhere besides very non-representative benchmarks.
But it's not really that relevant, because even if the execution speed is 10:1 in favor of one language, language execution speed is only one of the bottlenecks. They'll be spending a fair amount of time outside of the language, executing the same APIs. There's network speed, disk access, shared libraries... a 10:1 difference might not be that significant when only 10% of the time is spent actually executing the program.
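The point about bottlenecks is the usual Amdahl's-law argument, and a few lines make it concrete. This is a minimal sketch: the function name and the sample fractions are illustrative, and the 10x factor is just the story's claimed ratio, not a measurement.

```python
def overall_speedup(fraction_in_language, language_speedup):
    """Amdahl's law: speed up only the share of time spent in the language itself."""
    untouched = 1.0 - fraction_in_language
    return 1.0 / (untouched + fraction_in_language / language_speedup)


# Assumed shares of request time spent actually executing PHP-level code.
for fraction in (0.1, 0.5, 0.9):
    gain = overall_speedup(fraction, 10.0)
    print(f"{fraction:.0%} of time in the language -> {gain:.2f}x overall")
```

With only 10% of the time spent in the language, a 10:1 faster language gives roughly a 1.1x overall gain, which is nowhere near a tenfold cut in servers.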
The sad thing is that the open-source community hates flash (with *very* good reason)
I don't think they do (have a very good reason, I mean). There's a good reason to hate the way Flash is USED, I agree; but many people hate flash as a language or platform, and there's really not much of a reason to. It's similar to Javascript in that way, a decent language that's condemned because of the way it's typically used; that's more a problem in the browser model than it is with Flash/Javascript themselves. The only strong criticisms of Flash that are really earned are its processor intensiveness, and perhaps that it's proprietary.
I assume the original article doesn't have any basis in reality.
I can also assume, based on those server counts, what the service does, and what it provides to users, that Facebook isn't running an extremely efficient operation.
It's fairly obvious that their solution to performance problems was to scale out rather than optimize. There are god knows how many reasons why they could have made that choice, and I have no idea if it really was a good one or a bad one, but I'm fairly certain they could be far more environmentally friendly by using C or C++ on their front ends, even if you take longer to develop it or you hire more developers (which there is no logical reason why you should, unless you hire incompetent C/C++ devs). It probably would cost more: PHP devs are a dime a dozen, competent C devs aren't.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Suppose all you care about is making money off some stupid internet idiots, so you start a site called Spacebook.
Which makes more business sense; buying a lot of computers and paying for a lot of rack space and electricity so you can hire cheap PHP kids, or paying less for hardware and ongoing hardware related costs, but paying more for C programmers.
I have no idea, but maybe the VCs funding horseshit like MyFace and SpaceBook don't know either.
Wrong - the language makes a huge difference. Try using the c api and CLIENT_MULTI_RESULTS and CLIENT_MULTI_STATEMENTS and concatenating 10,000 queries into one request, then using mysql_next_result() to get the next result set (no, not the next row, the next result set - 0 or more rows).
One connection. Not 10,000. A BIG difference in execution time.
Are you trying to imply that PHP establishes an entirely new connection to the database for every query? If so, you basically lose all credibility you might otherwise have.
It would take a really serious amount of in-depth analysis of the server application to even approach knowing what the efficiency impact of using a compiled language vs an interpreter would be on any specific stack. Or even stacks in general. Plus we don't even know what it really means to be "using PHP". What is PHP doing? Is it processing templates, doing just some post or pre processing with some kind of XML pipeline in the middle, how is the PHP deployed, etc?
It is simply ridiculous to make any assertions and claim accuracy for them. I'm no PHP fan boy by a LONG shot, but I know from hard experience that often a higher level tool which is optimized for a particular job can get the job done quite a lot MORE efficiently than a lower level one that isn't.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
A lot of PHP code is just making function calls to the built-in library functions anyway, and those library routines are all compiled C/C++. If I call a library function from C or from PHP there is some difference in overhead when setting up the call and processing the results but the actual function is likely to be the exact same thing.
"Almost every wise saying has an opposite one, no less wise, to balance it." - George Santayana
Nah, the garbage collector will outperform custom memory management every time. The best experts in the field have looked at the garbage collector. Have they looked at your memory management code?
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
C++ is an atom bomb in the hands of a chimp.
I've tested software for about 15 years. I can tell you from experience that THE most buggy, nasty, ill designed applications I've ever tested were written in C++.
The world is more than performance. For many, many applications, the blazing speed of pointers on a local application simply *doesn't* *matter*.
Unless I needed to process large chunks of binary data in real time, I'd use anything else but C++. For a web application, I am *sure* that the downtime due to crashes, memory loss, uninitialized pointers and all the other dreck that each and every C++ programmer is convinced never happens to *them* would cost more time, cycles and energy than a perfectly functional PHP application.
Please do not read this sig. Thank you.
What's a "data-heavy" application?
Facebook
A site like Facebook isn't computing the value of pi or calculating jump coordinates for the Galactica, things that would benefit from a more efficient implementation. I don't know anything about Facebook's site, but many web applications use the application layer to essentially pull data from and send data to databases. Parsing data back and forth typically isn't sped up with C++ or even C that much compared to PHP. All many PHP sites do is draw HTML, and a lot of SELECT, INSERT, UPDATE, and DELETE queries. Hard to imagine C++ having a big impact on that, certainly not 10:1.
The sad thing is that the open-source community hates flash (with *very* good reason), but hasn't brought forth any legitimate alternatives.
HTML5+CSS3 is all we should ever need in the next few years. Other than that, if you need more, Flash and/or JavaApplets are fine. But only if you specifically need to.
Of course, it will never be used because f****ng M$ decided not to include it in its 40%+ market share browser, so that no one can decently propose this for a large website. But that is another story, right?
Write boring code, not shiny code!
Either way, the key is to use the tool that's best for the job. I have an app that uses html and javascript on the frontend to asynchronously call programs on the backend. For simple calls, such as database queries, I call python scripts, since they're quick and easy to write for that purpose. For more CPU-intensive backend tasks, I call C++ programs.
Language purity isn't a good thing. Everything has its strengths and weaknesses.
Nobody pushes buttons like our bunny. Big red buttons with labels that say "IGNITION", apparently.
c and c++ are both lower level languages than php, closer to the assembly level. c and c++ can be used to code the interpreter that php runs on. php is a CUSTOM language for creating web applications. c, or c++, or anything of their level can't come close to it, because php was PURPOSE BUILT.
Read radical news here
Use a nuclear power station. Problem solved.
... assuming CPU cycles are the key bottleneck and not, say, network communication and data access. I'd assume they look at performance pinch points and optimize those. So if the 10% most computationally-intensive code is written in C++ or Java, the savings in rewriting the rest in C++ might be 15%.
Ceci n'est pas une signature.
It's hard to pin down how many server-hours C++ would actually save. PHP might be spending most of its time running code from C libraries (memcached lookups, HTML/XML parsing, regexp evaluations) instead of interpreting PHP. The article doesn't say what portion of the servers are running PHP, and the 10 to 1 efficiency ratio is pulled out of thin air. The server farm might be I/O-bound instead of CPU-bound, and if it's not, it's quite possible that it would *become* I/O-bound if you rewrote everything in C++, preventing any 10-to-1 savings.
Indeed. Compute-bound code might be 10 times faster in C++, but both PHP and C/C++ will spend the same amount of time waiting for MySQL or PostgreSQL to get back with the results of a SELECT or UPDATE. But even if it's only 2* as fast, on average, that's a lot of servers to save.
As you say, the main thing to do isn't to recode everything in C++ or C, it's to identify the places where PHP performance is the bottleneck... and look for *well factored* shortcuts that can be implemented in a more efficient language.
I was interested in how many kgs of coal a badly written Flash Ad would consume. I did some quick calculations based on the typical Wattage of a desktop and assumed a 10 second view by 1 million people over the course of a year. This assumes that the crappy Flash Ad consumed 100% of a core. I also assumed that all power comes from coal (most of it in the US does). That's a lot of assumptions but should still get us in the "ballpark" for the final figure which, to my surprise, was quite high! I estimated that this would consume around 90 TONS OF COAL! Looking at the figure I am convinced that I made a mistake somewhere in my reasoning, calculations or raw data but I haven't found any problems yet. I would appreciate any interested Slashdotters to set me straight or confirm my work. Here's my blog article with all the calculations on it: http://blog.bit-matrix.com/2009/01/20/how-green-is-your-code/ Thanks.
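For anyone who wants to check the figure, here is a rough sketch of the arithmetic. Only the 10-second view and the 1 million viewers come from the post; the 100 W desktop draw and the 0.5 kg of coal per kWh are assumptions made up for illustration, and the blog's own inputs may differ.

```python
# Assumed inputs: only VIEW_SECONDS and VIEWS_PER_YEAR come from the post.
DESKTOP_WATTS = 100          # assumed draw of one desktop while the ad runs
VIEW_SECONDS = 10
VIEWS_PER_YEAR = 1_000_000
KG_COAL_PER_KWH = 0.5        # assumed rough figure for coal-fired generation

JOULES_PER_KWH = 3_600_000   # 1 kWh = 3.6 MJ


def energy_kwh(watts, seconds, views):
    """Electrical energy (kWh) used by `views` ad views of `seconds` each."""
    return watts * seconds * views / JOULES_PER_KWH


def coal_kg(watts, seconds, views, kg_per_kwh):
    """Coal burned (kg) to supply that energy at the assumed kg-per-kWh rate."""
    return energy_kwh(watts, seconds, views) * kg_per_kwh


kwh_total = energy_kwh(DESKTOP_WATTS, VIEW_SECONDS, VIEWS_PER_YEAR)
coal_total = coal_kg(DESKTOP_WATTS, VIEW_SECONDS, VIEWS_PER_YEAR, KG_COAL_PER_KWH)
print(f"{kwh_total:.1f} kWh, {coal_total:.1f} kg of coal")
```

Under these assumptions the total is about 278 kWh, or roughly 139 kg of coal. Every input scales the answer linearly, so a different reading of the view count or the wattage shifts the result by the same factor.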
You keep saying that, and it may be true in theory, but all the applications written in managed languages that I've used have been slower and used a lot more resources than the equivalent written in compiled languages.
Mada mada dane.
This story makes assumptions on system architecture that point blank would just not be true in a large scale deployment.
Sure, C++ is more efficient than PHP. But where does it say that EVERYTHING, or even most of it, is served via a PHP interpreter? In a large-scale world, dynamic caches and CDNs would have offloaded all of the cacheable content, and it would never have hit the PHP interpreter. Guess what: a C++, Java, or anything-else environment would do exactly the same thing.
You would be amazed at how much offload is possible. What you think is dynamic content is usually just a very simplistic dynamic wrapper over large swaths of static content. A seemingly dynamic HTML page is really just a collection of static HTML blobs that can be held by a cache or CDN close to the consumer and assembled into a complete HTML document on the fly by a web-aware accelerator.
These 30,000 boxes that are spoken of are most likely composed of a large variety of purpose-specific hardware, where the purposes are much more diverse than a simple PHP/MySQL combo. Let's face it: Facebook is across the planet and is relatively speedy all across the planet. This implies that a distributed CDN is in place, with local content caching nearest the consumer. Instantly this means we have more purposes than just PHP/MySQL for the hardware. Authentication is usually offloaded by large-scale web shops for a host of reasons, so let's add that kit to the list of purpose-built hardware. I could go on and on about the various purposes.
So it's fairly obvious that the statement that we could shut off 22K+ machines by simply (nothing simple about this) changing languages is just nonsense.
Plain and simple, the application hosting equipment is only part of a large-scale web environment. A very large part of the environment is the infrastructure: network, CDN, caches, accelerators, security equipment, and so on. Yes, I can write a little fancy web app and run it on my netbook and wow people. It's a whole other matter to scale that up to tens of millions of concurrent users spread across the planet.
Let's face it: that amount of equipment runs up one hell of a power bill. I'm positive that Facebook has already done an enormous amount of work to reduce that bill, not for save-the-planet reasons but for simple dollar reasons. Can they do more? Absolutely. But it won't be as simple as changing everything over to a new language this weekend and then hitting a bunch of power switches. The amount of power consumed by the massive development workforce needed to write, test, document, and deploy this new wonder solution would probably leave Facebook in an energy debt for another several years.
PS. Good luck with that sales pitch if you ever make it near the office doors of facebook. :)
Clearly, it's time for economics 101 again - opportunity cost.
As a very rough, correct-within-a-factor-of-two estimate, let's assume that the average server results in two tonnes of CO2 being emitted to keep it running. So that's 45,000 tonnes of CO2 a year saved, if the OP's estimate of the difference in speed is correct (which I doubt, but anyway).
However, if Facebook implements in C++, it's reasonable to assume that they will need to hire more developers, and more expensive developers, than if they use PHP.
I don't have accurate numbers, but I'll pull them out of my arse here for the purposes of illustration. Let's say that rather than 500 PHP developers earning $60,000 a year, Facebook instead employed 750 C++ developers earning $80,000 a year. That's 30 million bucks a year in extra expenses.
Carbon credits on the EU ETS are currently going for around 20 USD. So if you want to prevent the emission of a tonne of CO2, you can go to the EU climate exchange, hand over the equivalent of 20 USD in Euros, and simply rip the permit and not use it.
So let's say that Facebook spends 10% of the difference in programmer costs - 3 million dollars - on ripping up EU emissions permits. They prevent 150,000 tonnes of carbon emissions. That's more than three times as much emission reduction as achieved by substituting C++ for PHP, and Facebook still has $27 million in the bank.
Heck, we could replace "buy carbon credits" with much higher-cost abatement options like "buy Priuses for company cars" and still come out in front.
Here endeth the lesson.
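The opportunity-cost arithmetic is easy to rerun with different guesses. A minimal sketch, using only the headcounts, salaries, per-server emissions, and permit price stated in the comment (all of which the commenter admits are invented for illustration); the variable names are hypothetical:

```python
# Illustrative inputs taken from the comment, not real Facebook data.
SERVERS_POWERED_DOWN = 22_500
TONNES_CO2_PER_SERVER = 2      # rough per-server yearly estimate
PERMIT_PRICE_USD = 20          # quoted EU ETS price per tonne of CO2

php_payroll = 500 * 60_000     # 500 PHP developers
cpp_payroll = 750 * 80_000     # 750 C++ developers
extra_payroll = cpp_payroll - php_payroll

# Emissions avoided by the hypothetical rewrite.
co2_saved_by_rewrite = SERVERS_POWERED_DOWN * TONNES_CO2_PER_SERVER

# Emissions avoided by spending 10% of the payroll difference on permits.
permit_budget = extra_payroll // 10
co2_saved_by_permits = permit_budget // PERMIT_PRICE_USD

print(f"extra payroll:        ${extra_payroll:,} per year")
print(f"rewrite saves:        {co2_saved_by_rewrite:,} tonnes CO2 per year")
print(f"permits retire:       {co2_saved_by_permits:,} tonnes CO2 per year")
print(f"permits vs. rewrite:  {co2_saved_by_permits / co2_saved_by_rewrite:.2f}x")
```

With these inputs the payroll difference is $30 million a year, and a tenth of it retires 150,000 tonnes of permits against 45,000 tonnes saved by the rewrite.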
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
Their decision for using PHP might have to do with being able to get their business up and running now using PHP rather than envisaging go-live a few years down the road with their developer resources and learning curve adjusted to C++ (which in all its well-deserved glory does take its time to master). Probably C's savings in power don't outweigh PHP's savings in manpower.
"Seriously, is somebody taking seriously the 1 to 10 ratio of the story?"
Yet the assertion that programmer productivity varies with a 1 to 10 (or even 1 to 100) ratio is accepted without a blink of an eye or the firing of a single neuron.
That's likely to be due to them using cross platform ui libraries. Those libs are terrible, and can't even come close to the native accelerated libs. If you need a highly performing UI, don't use a cross platform language and API to do it.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
All these mean better performance. WAY better performance.
This guy was talking nonsense. 1-10. If I were executing a single-use, run-once-ever piece of code, maybe this would be true. But Facebook would never execute a line of code just once.
In reality the ratio is more like 1-1.2. For well written code on both sides of the fence. Code that runs a LOT.
Also, a large-scale env would, plain and simple, have implemented a ton of optimizations for caching and CDNs that would make the choice of language almost moot. The language choice really boils down to the choice of lead arch and his/her favorite resume-boosting tech of the moment. Of course, you're stuck with that choice for much longer than anyone ever guesses.
Language pro-con this that and the other thing become part of the hallway conversations that eat up valuable time. I always go with what will get me to market the fastest. I can't worry about if I made the wrong choice in languages simply because in two years one of the other languages may be better. It's just too hard to see out farther than 6 months.
Objective-C with garbage collection turned on blows away Objective-C with garbage collection turned off on Snow Leopard. I believe it is on the order of 20x faster. Objective-C is a compiled language.
...get things done more than 10 times faster, than in C++
That's the whole point. A server's time is much cheaper than a developer's. Even for Facebook it's going to be totally uneconomic to write everything in C. Maybe they'd benefit from writing a few routines in C where the extra performance is needed. For smaller websites even that would be a total waste of time & money.
The code is much more portable in php too.
Web Design
one word, vlan
He said 20k connections, as in web requests, not client computers. And it's not a valid excuse in that case. The ARP table is cached, for fuck's sake.
You're not considering the fact that there would be no product without PHP. So really, 7,500 servers for facebook, if they were using C++, is probably a bit of an over-estimation. :)
(That said, it's pretty evident that PHP was the wrong language for what they're doing. Python or Java, or even Perl would've been better choices. Arguably, .NET would be better, too, were it not for the likely architectural overhead in such a choice.)
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
The CPU-intensiveness alone is reason enough to despise it!
Although the windows version is borderline-acceptable, Flash absolutely crawls on Mac and Linux boxes (the PPC mac versions were particularly bad, and no PPC linux version ever existed).
I'm sorry, but a 320x240 YouTube video should *not* bring a modern system to its knees. VLC can play .FLV files using no more than 5% CPU on my fairly unremarkable 1.66 GHz Core Duo Mac Mini.
The fact that a hacked-together reverse-engineered codec performs 20x faster than the official implementation is just sad.
(As much as I want to love the HTML5 video element, there's little realistic chance of it actually being used unless Microsoft hop on board, and we all agree on a standard codec. Theora fits the bill, but isn't a particularly good codec compared to the proprietary options)
-- If you try to fail and succeed, which have you done? - Uli's moose
Why is this garbage even getting posted?
My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
At best, a good case for compiling PHP code. I'm not sure the assumptions are sound at all: 10:1 ratio? PHP code is the dominant code executed? I'd bet most of the execution is in the database, and that is surely executing compiled C/C++.
Let's assume that C++ is twice as hard to code and maintain as PHP. Let's also assume 200 man years of work went in Facebook. Let's further assume there are 50 maintenance workers and each worker commutes 50km per day, 220 days per year.
Cars average 200 grams of CO2 emissions per km, so writing Facebook in C++ would have produced 440,000 kg (440 tonnes) more CO2 than the PHP workers. Each year that goes by, the C++ maintenance workers would have produced 110,000 kg (110 tonnes) more CO2 than the PHP workers.
Sounds like a good argument to dump C++. As if you needed any argument other than "it's C++"!
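The commute estimate above can be worked through in a few lines. This sketch uses only the comment's own assumptions (twice the effort for C++, 200 man-years, 50 maintenance workers, 50 km per day, 220 days per year, 200 g CO2 per km); the variable names are hypothetical. Note that the raw result is in kilograms and has to be divided by 1,000 to get tonnes.

```python
# All figures are the comment's assumptions, not measurements.
GRAMS_CO2_PER_KM = 200
KM_PER_DAY = 50
DAYS_PER_YEAR = 220

# One commuting worker for one year, converted from grams to kilograms.
kg_per_worker_year = GRAMS_CO2_PER_KM * KM_PER_DAY * DAYS_PER_YEAR / 1000

# "Twice as hard" means 200 extra man-years to build and 50 extra maintainers.
extra_build_years = 200
extra_maintainers = 50

extra_build_kg = kg_per_worker_year * extra_build_years
extra_maintenance_kg_per_year = kg_per_worker_year * extra_maintainers

print(f"per worker-year:   {kg_per_worker_year:,.0f} kg")
print(f"extra to build:    {extra_build_kg:,.0f} kg = {extra_build_kg / 1000:,.0f} tonnes")
print(f"extra to maintain: {extra_maintenance_kg_per_year:,.0f} kg/year "
      f"= {extra_maintenance_kg_per_year / 1000:,.0f} tonnes/year")
```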
Running a server is cheap.
Paying a developer is not.
Civilisation is largely about the multiplication of human effort through the consumption of energy and automation. So, we multiply this developer's effort by a couple of thousand when running one machine and then do the same on another several hundred machines beyond. Each costs several thousand dollars to purchase and several thousand more every year in electricity, in cooling, networking, management and maintenance.
So, the effects of developer incompetence are also multiplied several thousand times often across hundreds or thousands of systems. Millions if we're really lucky.
So it isn't just one server, it's just one extra datacenter. It often pays to hire better people.
running a server for a day - $1
You think you get a real server for that? You get a tiny division of a server for that kind of money.
2) why doesn't these big server farms start looking at migrating code from PHP to C or C++ when the PHP+web design is solid?
The network effect. They migrate to Java instead.
Speed to delivery is nearly always primary importance.
Indicating speculative projects and disposable code.
Deleted
Clear evidence.
After a recent change in hardware platform for new acquisitions, Facebook was surprised to get a speedup from RDDR2-800 memory vs. FB-DDR2-667 memory, because many of their apps are actually memory-bound. They're a major user of memcached, so the real limiting factor on how many servers they can power down is how much RAM they can stuff in a server. The CPU utilization comes somewhere after memory/disk throughput/latency in power conservation considerations. Sure, there's a small marginal difference in how much energy they burn through on PHP code vs. C++ code, and a small marginal difference in how much RAM they need free for PHP vs. C++ code, but for the effort it would take them to switch to C++, they could save a lot more energy by optimizing how they use memcached. Which is exactly what they're doing.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
I think if we managed to hold in a fart at least once a year, we would put out less CO2 in a year than PHP causes in 10 years. Not to mention how many lives and nerves PHP, with the whole open-source phenomenon behind it, has saved... PHP has had more effect on suicide numbers than anything in the C++ web development world...
"Are you trying to imply that PHP establishes an entirely new connection to the database for every query? If so, you basically lose all credibility you might otherwise have."
PHP reuses a connection within a script but afaik every time a client requests dbrequest.php from the browser that script loads in its own little world and starts up its own personal connection to the server. A script making three requests uses one connection, but a script requested 12 times uses 12 connections.
I believe he is saying that when 20k people simultaneously request data, having a memory-resident C app grab all the concurrent data requests in a single swipe and reuse the same connections eternally for all page requests is much faster.
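The difference between per-request connections and one long-lived connection can be shown with a toy counter. This is only a sketch of the idea: it uses Python's built-in sqlite3 with in-memory databases as a stand-in for a real database server, and `CountingFactory` and the handler functions are hypothetical names, not anything from PHP or MySQL.

```python
import sqlite3


class CountingFactory:
    """Hypothetical helper that counts how many connections get opened."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return sqlite3.connect(":memory:")


def handle_request_fresh(factory, queries=3):
    """Script-per-request model: open a connection, run queries, close it."""
    conn = factory.connect()
    try:
        for _ in range(queries):
            conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()


def handle_request_persistent(conn, queries=3):
    """Resident-process model: reuse a connection opened once at startup."""
    for _ in range(queries):
        conn.execute("SELECT 1").fetchone()


per_request = CountingFactory()
for _ in range(12):
    handle_request_fresh(per_request)

persistent = CountingFactory()
shared = persistent.connect()
for _ in range(12):
    handle_request_persistent(shared)
shared.close()

print(per_request.opened, persistent.opened)  # 12 1
```

Twelve requests cost twelve connection setups in the first model and one in the second, whatever the number of queries per request.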
Very good point: no-brainer optimization is often the best bang for the buck you can get, regardless of the language used!
On a funny note, something triggered when I read:
> You could save yourself a lot of carbon
Now I know what it is: I think that nobody should try to save carbon, this would imply that one needs to produce more in the first place in order to save it ;-)
Again, not that I am trying to be the grammar police or anything (I hate that), but I swear something really bugged me when I read your sentence.
Everything I write is lies, read between the lines.
Java saves machine code to recall it next time.
So most of the time, Java uses machine code, no interpretation at all. Go on and learn about the sophisticated optimization algorithms that are incorporated into the JVM, several JITs and the like.
It has become a field where very brilliant people work; they probably rank above the average developer! ;-)
Everything I write is lies, read between the lines.
Noscript :>)
Mod me up/Mod me down: I wont frown as I've no crown
Incorrect, PHP can indeed maintain persistent DB connections across multiple requests.
http://php.net/manual/en/function.mysql-pconnect.php
"the connection to the SQL server will not be closed when the execution of the script ends"
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
> also get things done more than 10 times faster, than in C++.
For me, JSP and Java classes get things done more than 10 times faster than PHP. Heck, use struts and hibernate and you have just gained another 10 times and more security with regards to SQL injections and other topics.
Of course, there is a little overhead in learning how to use the tools, which, in my experience, most PHP developers are not willing to invest in. But once that overhead is covered, it is done; you only have to learn it once.
Everything I write is lies, read between the lines.
I knew I would find it: "The average person farts 12-18 times a day, producing about 45 mL of CO2 and 45 mL of CH4 per day, or 16.43L of CO2 and 12.78L of CH4 per year." Source: http://envirostats.digitalcitizen.ca/2007/07/19/0172/ Here is the interesting part: "At Standard Temperature and Pressure (STP), 1 kg CO2 is 509.1L, so the 284.9L per year is only 0.560 kg CO2 per year. This is less than the amount it takes to run your computer for a year (0.705 kg), and a tree would only have to spend 17 days per year “sniffing” the greenhouse gases in your farts all year to carbon neutralize it." Considering that not every person in the world has a computer (and uses PHP), the conclusion is not open to discussion. Moderator, are you one of those people who always giggle when someone mentions "fart" or a similar word?
I think you probably meant to reply to someone above me ... my post is in complete agreement with yours.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Perhaps the reasons for his differences can be found in this quote from that same page then:
"* Apache does not work well with persistent connections. When it receives a request from a new client, instead of using one of the available children which already has a persistent connection open, it tends to spawn a new child, which must then open a new database connection. This causes excess processes which are just sleeping, wasting resources, and causing errors when you reach your maximum connections, plus it defeats any benefit of persistent connections. (See comments below on 03-Feb-2004, and the footnote at http://devzone.zend.com/node/view/id/686#fn1)"
He clearly said that he was not using Apache but custom C to handle requests.
In any case, he is obviously talking about a completely custom server app, load balancing, etc. That is a far different debate than writing server-side CGI in PHP vs the same in C.
Well, I am biased in that I *like* PHP, despite its flaws. However, speaking to the argument that Java might be a better choice, I did work on one project which we essentially completed in PHP as a first draft, and then the lead designer decided that PHP wouldn't be scalable enough (i.e. he decided he didn't like it, since he had no evidence to point to), and we rebuilt the whole thing in Java instead. PHP development time: 6 months. Java reengineering: 2 years more.
The end result was efficient and made good use of Java's strengths, but I can't honestly say it was a superior product, and the additional development time spent on adapting it to Java could have been spent revising and improving the PHP version 4 times over.
PHP is quick to develop in, quick to make changes to, and good enough for many jobs. There is a reason why sites like Facebook are using it so heavily. Purist programmers may not like it for a variety of reasons that may be entirely justified, but in many cases PHP is the right tool for the job and that is what should really matter. Java is also an excellent tool, but I think the development time is greater in the end, and the results are not necessarily all that much better. It likely comes down to a matter of preference, as other languages like Python or Ruby would probably be just as useful in the end.
As an aside, why does the OP not mention that you can write your own extensions to PHP in C++? Wouldn't that be a better option where you see code that is running too slow: concentrate on only those bits where it might really have an effect?
"The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
What about just turning off Facebook? The entire thing is a colossal waste of time. And while you're at it, shoot all the users, as most of them are idiots who are a waste of space.
I know it's already all been said but really...the author needs to try this:
Program a small application in PHP, and one in C++. All the data must be stored in a database, on a remote machine (which is the way it would be done for a huge site). Now, hardcode in some data for your first benchmark of PHP vs C++ to get an idea of raw PHP vs C++ performance doing the same task. Now comment all that out, get the data from the database, and time that. Guess what, I bet the times become pretty darn similar in the latter test. No, PHP isn't going to be QUITE as fast...but it's gonna be really close for REAL web-application type workloads where latency from your SQL server and loading all the other page content come into play. Your clients are never going to notice the difference on the majority of applications, and I don't believe that most time is spent processing PHP or C++ code on a web application either. It's waiting for the DB, and uploading content to clients. If you truly are doing extremely data heavy tasks and lots of floating point math or something, then yes...C++ is probably a better tool, but even then, there's no reason not to use PHP for the non-data heavy stuff...
I know I'm just a youngster web programmer who only graduated from college a while ago and who deserves little respect on slashdot compared to some of you, but I do have a decent amount of experience with quite a few programming languages. I learned C++ first, and know it pretty well for someone who doesn't use it constantly any more. I think I'm very very good with PHP...and so does my boss. Given that information, I have something to say.
If someone told me to program an entire web application from top to bottom in C++, I'd probably quit on the spot, and walk out laughing all the way to the parking lot. I like C++ for lots of things, but there is no way in hell you would EVER get me to program an entire web application in C++. It would take 10x longer at LEAST to develop than it would for me to do in even Java, of which I actually have less knowledge (but still a good working knowledge), and which I don't consider ideal for web apps. The debugging for C++ on something like that would probably drive me completely insane. I say me, because I'm assuming that I'm building this web app and only me...like I can do very quickly in PHP or Python or nearly any other language that does web stuff well, otherwise it would be me and all my co-workers.
Languages that are traditionally used for web development are used for a reason...and it's not how fast/efficient they run. It's for the difference in expense of the developers (HUGE factor really), for how well the language suits the web in its core libraries, and how well it integrates with web servers, database abstraction libraries...well I could go on forever really. I'm not saying that C++ couldn't get libraries built for it that made it as appealing as, say, Ruby (on Rails), PHP, or Python, but let's face it, it would take a very long time to get to that point where everything was as seamless and easy as it is in current web languages...not to mention getting hosting companies to let you run a C++ app YOU programmed on their servers (they'd have to be stupid...really freakin stupid). C++ will just never be popular enough for web stuff to be attractive to developers...that's the bottom line. Lack of efficiency is such a tiny price to pay compared to these other factors.
This is like saying that we should use those medieval wooden carts to ship packages across the US. PHP works in many places where cpp fails to deliver (no pun intended with relation to my previous statement).
We have done the actual benchmarks, and the original post matches our experience.
PHP gives processing times of around 1 second (for a search function) and C++ code via CGI gave times of 0.1 sec: a tenfold improvement.
Graphs and numbers are here,
http://www.wrensoft.com/zoom/benchmarks.html
Further when we switched to FastCGI we saw another 5 fold improvement, after optimising the code for FastCGI.
So I would believe a 50-fold improvement should be possible by going from PHP to FastCGI (and rewriting the code to suit FastCGI).
Aw, c'mon, that's just harsh. It's programming, it's just one type of programming. Speaking as a C and assembler fanatic, but fan of inter-compatible versions of Python and burner-at-the-stake of perl.
Scripting is nice just to whip things up and do proof of concept in, too.
On-thread, I agree that the difference here is quite high (for PHP against C), it'll vary for other languages. There's a nice table of relative speeds here (and the argument about what the language is doing, I don't buy... in my experience, things tend to remain somewhat relative):
Multi language simple fractal benchmark
These are obviously not the times you'd get for web ops, but I am of the opinion you'd find a similar curve, similar time relationships, for any general program. Which is to say that not only is PHP slow, it's slow for everything, which means it is an inefficient use of system resources in busy environments. Lots of cycles for not much done, which means programs that are loaded longer and resources that are tied up longer -- regardless of the underlying DB transactions, too - the DB is running on the same server, those cycles could have been DB cycles. And if not, more apps could have been running with the others out of the way sooner and/or not consuming CPU time. Waste is waste.
I've fallen off your lawn, and I can't get up.
The problem is that when you've got poorly defined data structures and typing, most of the overhead can't be compiled out, as your C/C++ program still needs to deal with all of the indirection and type conversion. Due to PHP's loose typing, the type of a variable at any given time cannot be predicted, as it may vary due to user input. The problem is that the language provides no mechanism for forcing a given variable to any type or for creating any data structures. Even in compiled code, even a simple addition of 1 to an integer must involve pointer dereferencing and lots of type changing and bounds checking.
The code could be optimized if the type of a given variable could be forced for its entire life in scope, which would allow the compiler to avoid a lot of repetitive logic for each operation. Specifying strict data structures would also be very helpful, as the compiler could then avoid finding the pointer in a key hash and instead hard code the offset from the start of the structure.
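To make the hash-versus-offset point concrete, here is a rough sketch in Python (not PHP or C, just for illustration). The record layout, the field names and both helper functions are made up for this example; the only real library used is the standard `struct` module. It does not measure speed, it only shows how much more work the loosely typed path has to do per access.

```python
import struct

# Hypothetical fixed layout: two 32-bit ints and one 64-bit float, no padding.
RECORD = struct.Struct("<iid")
X_OFFSET, Y_OFFSET, SCALE_OFFSET = 0, 4, 8

def read_y_fixed(buf: bytes) -> int:
    """Strict structure: read field y straight from its known byte offset."""
    return struct.unpack_from("<i", buf, Y_OFFSET)[0]

def read_y_dynamic(record: dict) -> int:
    """Loose structure: hash the key, then check and convert the type at run time."""
    value = record["y"]
    if isinstance(value, str):  # the value may have arrived as user input
        value = int(value)
    return value

packed = RECORD.pack(3, 7, 1.5)
loose = {"x": 3, "y": "7", "scale": 1.5}
```

Both calls return the same number, but the second one cannot skip the key lookup or the type check, because nothing guarantees what is stored under "y".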
In my opinion the biggest benefit of compiled C code over Java code, particularly in highly dynamic workloads without a known peak load - is that the C code can allocate memory on the fly to expand to meet the needs of the workload, releasing that memory when it has finished the peak. With Java (not sure about C#, but I'm guessing C# also) the implementation team has to decide in advance how much memory the JVM could possibly need in the worst case scenario and pre-allocate that memory at application startup. Since determining the peak resource usage is a bit of a challenge even after hand tuning it manually a few times watching the system monitors, most apps are configured to use way more memory than they actually need ('just in case'.) It doesn't take too many applications configured to run using 256M, 512M or even 1024M - and you've literally sucked down all the memory in your machine and need to add another box.
Perhaps the 1:10 ratio isn't just about the raw code execution performance, but possibly enhanced by the ability to provision more concurrent instances on the same machine. Granted you're still limited by the other factors (database performance, I/O, etc) but if you can process 40 concurrent requests because your machine can run 40 instances of the same handler, vs 10 concurrent instances in an interpreted or VM based language - that's where I see the performance benefit.
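A back-of-envelope version of that argument, as a Python sketch. The box size, the memory held back for the OS and the per-instance footprints are invented numbers chosen to reproduce the 40 versus 10 example; they are not measurements of any real system.

```python
def max_instances(ram_mb: int, reserved_mb: int, per_instance_mb: int) -> int:
    """How many handler instances fit in RAM after holding some back for the OS."""
    return max(0, (ram_mb - reserved_mb) // per_instance_mb)

# Hypothetical 6 GB box with 1 GB reserved for the OS.
small_handlers = max_instances(6144, 1024, 128)  # handlers needing about 128M each
big_handlers = max_instances(6144, 1024, 512)    # handlers configured for 512M each
```

With these assumed numbers the small handlers fit 40 to a box and the large ones 10, so memory settings alone account for a 4:1 difference before CPU speed enters into it.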
Glonoinha the MebiByte Slayer
there's also the whole question of "what is the bottleneck". hint: it ain't the PHP servers, it's the database (and its mirrors). this is true for any web app - most of the "power" of such a huge database app is in transaction handling and the like, and in that, the underlying PHP code is itself written in C++ - these servers will be doing the same amount of work (for this part of the process, 80/20 rules and all that) no matter what the rest of the HTTP/HTML processing code is doing.
"But remember, most lynch mobs aren't this nice." (H.Simpson)
-- Joe
I'm distraught about PHP and its efficiency. Unless you do some kind of guru magic, every click on your server is interpreting not only the screen you're visiting but all the libraries it requires to load itself. The computers spend nearly all of their time interpreting the same code over and over for each click. The analysis is sound and points out the fallacy of using interpretive languages in ways they aren't intended to be used. On the bright side, at least they aren't using Perl under CGI.
Kriston
I have no axe to grind against PHP. PHP has its place. But when you reach the size of Facebook you need to use something that will scale better. I must say though, that the dozen or two errors I encounter daily using Facebook are handled rather well. But then, if they were not using PHP I seriously doubt there would be so many errors.
Out of curiosity, does anybody use Pike, at least outside of Scandinavia and Germany? I've only written toy programs in it, but it seems quite nice and to be pretty efficient. It's C-like, but with many of the nice features of dynamic languages. (Okay, I admit it, what I really like is that it allows binary literals.)
That's my guess of a stat. I'll keep with PHP, thanks.
Steve Magruder, Metro Foodist
But what's the environmental impact of making up 'facts', turning that into smoke, and then trying to blow that smoke up my ass?
A lot of people are getting bent out of shape by this thing, but I can't see how anyone can begin to take it seriously. The whole thing is so completely bogus and made up that giving it anything but a good hearty laugh
Though I'd love it if my randomly made up factoids merited a front page story on Slashdot.
Ergo, it is going to reduce the processing necessary on the server to do any given job
Any given job, yes. But if there are a lot more "jobs" (i.e. more requests that require server side processing), the efficiency of the language used on the server side tends to become more critical, not less, especially if the per request overhead is significant, something that happens to be one of Facebook's primary complaints about PHP.
Whoosh!
All you have demonstrated there is that statically compiled implementation A of a regular expression library is more efficient than statically compiled implementation B of a regular expression library. This is a contest that PHP cannot win. The implementation PHP uses is not written in PHP, but rather something like C.
That said, programmers who have better alternatives do not rely on regular expression parsers for casual processing in performance critical code. Regular expression engines tend to be much slower than the alternatives in any compiled language. Of course hand coding the equivalent code takes longer.
You will never find a competitive compiler that uses dynamically parsed regular expressions to do language parsing for precisely this reason. It would slow the parsing phase down by a factor of ten or more due to two problems - first the regular expressions themselves have to be parsed, and then the resulting state machine has to be interpreted. In other words regular expression libraries are a microcosm of the situation with PHP itself - provided the patterns are known ahead of time, hand written native code to do the same thing is much faster.
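For anyone who wants to see the two approaches side by side, here is a small Python sketch. The identifier pattern and both function names are my own example, not anything from a real compiler. Note that in Python itself the hand-written loop will not be the faster one, since the `re` engine is native code; the speed argument above applies when the hand-written matcher is compiled to native code too.

```python
import re

# Regex route: the pattern is parsed into a state machine, which is then interpreted.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_ident_regex(text: str) -> bool:
    return IDENT_RE.fullmatch(text) is not None

def is_ident_hand(text: str) -> bool:
    """Hand-written equivalent: the pattern is known ahead of time, so it is just code."""
    if not text:
        return False
    first = text[0]
    if not (first == "_" or "a" <= first <= "z" or "A" <= first <= "Z"):
        return False
    for ch in text[1:]:
        if not (ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z" or "0" <= ch <= "9"):
            return False
    return True
```

The two functions accept the same strings; the difference is only in how much machinery sits between the input and the answer.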
I think most of the analysis I've seen so far implies that it's a simple cost tradeoff - more programmers versus more servers. It's more complicated than that, and you have identified one of the reasons why: maintainability. The kind of programs Facebook needs written are just plain easier to write in PHP than in C or C++, for many reasons (though PHP was probably not the best language they could have chosen, it's not an awful choice, either). If they wrote it in C or C++, not only would it be more expensive, but (and this is the important part) it would take longer to write and test. Web companies can't afford to waste a couple years re-writing their core infrastructure. They also can't afford to be stuck with a tool that's hard to modify.
Also, it's not as if Facebook is an all-PHP shop. They've invested a lot of effort writing an open source tool called thrift, which allows programs in any of the (12 or so) supported languages to communicate easily with each other. If they decide it makes sense for them, they can change languages.
I think the world is better off when people use the language that meets their needs rather than the language that makes them look smart.
With modern techniques, just about any language can be sufficiently compiled to run within 1/2 the speed of C
If only this were true. In the real world, Java is dead on the client side precisely because a large Java application cannot be compiled (let alone JITed) to come within half the speed of an equivalent application written in C/C++. Dynamic languages are worse in the sense that they tend to compile to native code much more poorly than a statically typed language like Java, although this is indeed changing.
When you can run a word processor written in Javascript on a thousand page document with 1/2 the speed of a word processor written in C/C++, and not wait forever for it to start up, we will know that dynamic language compilers have arrived. I would be impressed if someone could do that with Java, especially without resorting to expedients like SWT, to say nothing of library after library of C support routines - stuff Java simply (currently) cannot do with any reasonable speed, no matter how good the compiler is.
And it need not be said that no one is going to write a word processor in Perl, Python, or PHP anytime soon, although once the appropriate JITs are out there it would be interesting to see someone try.
The advantage of C and C++ is that they are not dynamically typed. Picking some language that has similar syntax but is dynamically typed makes it no different than picking Python, Ruby or PHP, other than that you get to suffer with a system language's syntax instead of the niceties of a scripting language's syntax.
“Common sense is not so common.” — Voltaire
The TCP Setup Rate (new TCP connections/second) and Concurrent TCP connections are actually two very important metrics that can kill your network, and therefore your network-dependent application. Those are very CPU intensive for both the network devices and (even more) servers. You don't need 20,000 client machines from a class B subnet to kill that. If you have 2,000 connections per second and your firewall, SLB, reverse proxy, server or whatever can handle only 1,000 with a reasonable response time, you are in trouble, because then it doesn't matter if you use PHP or C++ (layer 7): your network is the cause of the issue. This is why there are testing companies (Spirent, Ixia, Agilent and others) that exist. They first test the network layers, then move up the layers and test directly against the servers. You'd be surprised to see how many times the bottleneck comes from the network - either because of cheap hardware, poor design or configuration errors.
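The 2,000 versus 1,000 case is easy to put in numbers. This is a deliberately crude Python sketch: it assumes a constant arrival rate, a constant capacity and an unbounded queue, none of which hold for a real firewall or load balancer, and the function name is invented for the example.

```python
from typing import List

def backlog_over_time(arrivals_per_s: int, capacity_per_s: int, seconds: int) -> List[int]:
    """Connections left waiting at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # Whatever the device cannot set up this second carries over to the next.
        backlog = max(0, backlog + arrivals_per_s - capacity_per_s)
        history.append(backlog)
    return history
```

Calling `backlog_over_time(2000, 1000, 3)` gives a queue that grows by 1,000 every second, no matter what language the application behind it is written in.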
Well, at one time some of PR1MOS was in Fortran, but I think the later versions were more PL/1G.
But it's not really that relevant, because even if the execution speed is 10:1 in favor of one language, language execution speed is only one of the bottlenecks. They'll be spending a fair amount of time outside of the language, executing the same APIs. There's network speed, disk access, shared libraries... a 10:1 difference might not be that significant when only 10% of the time is spent actually executing the program.
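This is Amdahl's law, and the arithmetic fits in a few lines of Python. The function name and the 10% figure are only the example from the comment above, not measured Facebook numbers.

```python
def overall_speedup(fraction_in_language: float, language_speedup: float) -> float:
    """Amdahl's law: only the fraction spent in the language itself gets faster."""
    remaining = (1.0 - fraction_in_language) + fraction_in_language / language_speedup
    return 1.0 / remaining

# 10% of request time in the language, language made 10x faster.
mostly_waiting = overall_speedup(0.10, 10.0)
# The opposite assumption: 90% of request time in the language.
mostly_computing = overall_speedup(0.90, 10.0)
```

With 10% of the time in the language, a 10x faster language gives roughly a 1.1x faster request. With 90% it gives a bit over 5x. So the whole argument hinges on what fraction of the time the front end actually spends executing PHP, which nobody in this thread has measured.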
The point here is not to run the code faster. It is to use 10% of the original CPU power to run the code in the same time. Which leads to savings in hardware costs, electricity, air conditioning, even floor space. Of course, this is only the PHP frontend part and does not affect the database backend, but still I hear that the PHP frontend takes some significant processing power.
...and you assume they write their data-mining tools in PHP just because the "end-user experience" is written in that language?
Coffee-driven development.
I have not tried this, but there is a thing called PHC ( http://www.phpcompiler.org/ ) that purports to compile PHP code. Has anybody tried it? Does it work? Is it useful?
Because I know this Java guy and he got the foulest breath you ever smelled before you nose shuts off.
And since I am a PHPer and god's gift to woman kind (he decided they could do with a laugh) we can now safely conclude that 1. PHP 2. C++ 3. Java 999999999999999. ASP
The original article is just silly, however. C++ is overkill where you often end up writing stuff that has already been written, and written better than you could ever do yourself in the time a typical web project has.
And that the author is full of shit can be clearly proven by the fact that he has NOT approached Facebook with a sample of new code that would save them a fortune. I myself have made very good money helping companies reduce their server needs; if he can shut so many servers off with his code, he can earn millions. But he isn't, is he? Perhaps because as a C++ web programmer he ain't used to actually getting a working site out on time and on budget that can be maintained easily to deal with rapidly changing demands?
I've seen "real" programmers completely lose it on web projects. They want to do design and analysis when the time to set up the schedule is the time you've got for the entire project. Yeah yeah, if you give them a year or two, the product is no doubt 100% better than the 2-week PHP job, but the web moves a bit faster than that, boys. I have seen a project where a traditional programmer sought out a framework solution that was just perfect. Nobody had used it, but no problem, he had set up a 1-month training schedule for everyone. The entire project was supposed to be delivered in 1 month.
PHP works, people, because it works FAST. It allows you to drill straight down to the thing you want to do with a website rather than first having to develop your own solution for things that have already been done. Yes, this ease has led to a LOT of PHP developers who couldn't code their way out of a paper bag, but this is like saying that because there are so many idiotic Windows users, all Windows users are idiots. Or indeed me saying that because I've only seen C++/Java developers who over-engineered, all C++/Java developers do this.
And as for speed, when you use compiled PHP, it really ain't that much slower. Sure, you could optimize the hell out of your website with C++, but only if you spend ages doing this, making your site obsolete by the time it launches and extremely hard to adapt as demands change.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
"The syntax is fairly straightforward and familiar, being a typical mishmash of shell scripting, C and perlisms"
How can somebody contradict himself in the short space of a single sentence?
IANAL but write like a drunk one.
and the point of many others on the pro-php side is that there would be far MORE errors if it was written as a c++ application, due to what they see as that language's inherent complexities and lack of readability. I quit C++ for pretty much that very reason 13 years ago. Modern C++ is, in my opinion, self-obfuscating.
Now, PHP is also obfuscated now, for much the same reason - supporting multiple programming techniques (procedural, fake OO, and now real OO), large numbers of old deprecated libraries with different coding standards, and examples that poorly separate concerns (MVC) leading to bad mixes of logic and rendering until one goes out of their way to learn a template engine (and there's zillions of those, too).
But I don't have time for the mundanities of memory management and crap like that, especially when trying to figure out what the policy is for some library and how it is different from the next library I use, and for that matter, just how many libraries for C++ are out there, open-sourced and actively supported and maintained?
If FB was rewritten from scratch, to the design it is now (keeping in mind this is now effectively the 4th major iteration of it), then a C++ implementation would certainly be more efficient, if still more expensive from a developer resource perspective (C++ programmers are rare and expensive, 'cause nobody wants to work in it anymore because of all that tedium). But once written, it would be frozen because C++ produces generally far less maintainable code in my experience because of its difficulty and lack of readability.
Web applications in non-critical fields (and social networking is certainly non-critical) have to evolve, often and easily, and c++ does not provide that - it is better for a web app to risk a little instability than it is to provide 99.99% uptime but be impossible to change.
"But remember, most lynch mobs aren't this nice." (H.Simpson)
-- Joe
Considering mandelbrot took 116x as much CPU time on PHP than C++, I would say that the 10x is probably more of a guesstimate of what Facebook would use. That said, it is still quite high compared to my guesstimate, but it isn't naive enough to assume those tests represent Facebook performance.
When Argumentum ad Hominem falls short, try Argumentum ad Matrem
No, he assumes they write their data-mining tools in PHP because TFA claims that most of their servers are running PHP. Though maybe that just proves they aren't doing that much data mining. (Or if they are they're outsourcing to Microsoft, which would make sense since that's where they get their ads.)
It doesn't take too many applications configured to run using 256M, 512M or even 1024M - and you've literally sucked down all the memory in your machine and need to add another box.
Except the host OS will most likely use lazy memory allocation - i.e. it will allocate virtual address space, but will only allocate physical memory when it is actually used. The end result is that even after requesting 4GB of memory, if I've only written to 128MB of it, then I've only got 128MB allocated from the host.
On the other hand, if you request, and then immediately use all that memory, then yes, you are going to have issues (which is why a C malloc followed by memset is bad).
"Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves
Comparing the speed of PHP (or Perl, or Ruby, or ???) to a compiled application in a site like Facebook is an invalid comparison; so much of what needs to get done involves outside processes. Pesky little things like exchanging data with an SQL server.
Years ago, I wrote a highly-optimized C program to interpret input data and save it to a database. C was used because the interpretation involved a lot of bit-based (rather than byte-based) manipulation. It was very fast. But, a couple of years ago, I needed a one-time modified version of it for a project, so I developed the temporary program in PHP instead, since the changes to the C program would take longer than I wanted to dedicate to the process.
Surprisingly, the bit-wise manipulation in PHP, while significantly slower than that of the C program, was not a significant factor in how fast the conversion ran. What should have been a 100-200% increase in run time was less than 5%, due to the overhead of the database inserts. For a program that runs once or twice per day, the added overhead was inconsequential, so the next revision of that program was in PHP.
Could Facebook shut down thousands of servers if all their code was converted from PHP to C++? Doubtful. Maybe a hundred or so, but not tens of thousands, as claimed.
Ah, that must be the reason real-time systems disable garbage collection. It's just more efficient and outperforms manual memory management all the time.
Get it done wins every time. Power is cheaper than a competent C++ coder's time and time to market is shorter for PHP. I'm not saying it's right, it's just reality kicking idealism's ass again. /shrug
Don't kid yourself. It's the size of the regexp AND how you use it that counts.
No, they disable it because it has a space cost and real time systems are almost universally on small devices. Also, it outperforms manual memory management on average, but real time systems care about peak, not average.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
You know that the jvm can be configured to allocate more memory on demand, right? Your statement just isn't factual, and hasn't been for at least the last several years.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
And as much as I hate to say it, PHP is about as efficient as any other language that has a C-based runtime at string processing. There is probably NOTHING to be gained by adding the complexity and cost of a lower level language like C++. Now re-writing it in Erlang, that might get you something ;-)
> I think you probably meant to reply to someone above me
indeed, my mistake, I am sorry about that...
Everything I write is lies, read between the lines.
The premise of the article is right on target, in terms of power efficiency for the entire front end. You are right, however, that if power consumption or hardware overhead is a sufficient problem, you could probably gain a marginal amount of performance by going to C instead of C++, although far short of the reduction one could obtain by making the first step. The question is one of balance.
If your requirements make it optimal to run a language that is running behind the state of the art in interpreted languages by nearly three decades because it is easier for some people to learn, that is great. Compared to something like Applesoft BASIC, which actually started out with an intelligent, highly optimized design (byte code even!), PHP started out as a first class hack. And like most first class hacks, short sightedness in the original design tends to cripple all future versions, even when it is far past time for the system to "grow up".
And that, unfortunately, is why the future of PHP looks approximately as bright as the future of Perl. The duct tape of the Internet. Not that there isn't a sizeable niche for that sort of thing.
It all depends. If the functionality you're writing is called all the time, it might very well be worth it to optimize the thing. A similar example: when I got the RAM size for an embedded application below 512KB, this saved the company I work for a lot of money just by the sheer volume of units they sell each year (what was it, something like 3$/unit * 3000 units/yr - numbers could be off). The economics go further than just "faster to write".
PS: Yes, SRAM is very expensive.
"It's too bad that stupidity isn't painful." - Anton LaVey
Wow, Gratz to the submitter of this article. This is one of the most rabid and foolish examples of "going green" I've seen. PHP is certainly not the most efficient language in terms of run time resources. But you have to count the resources required to develop and maintain C++ code with all its pointer foulups and memory leaks versus PHP, which is relatively simple and straightforward to develop and runs in a very stable manner on either a LAMP or WAMP stack. Sure, servers eat up a lot of energy. But so do programmer armies who have to commute or log in or fill out timesheets by the forest, all to chase memory leaks, buffer overflows and the like. Oh yeah, and let's not forget the number of EXTRA servers you're going to have to put online to make up for the ones that need to be rebooted every few hours because some high school script kiddie doesn't bother to sufficiently check for memory leaks that chew through the server like a teen athlete on steroids. Maintainability is a HUGE factor in overall cost in terms of both $$$ and other resources. Use C++ or Assembler or whatever low level, low resource hogging language on a FEW critical sections, written by l337 coders, and let the pock faced script kiddie army churn out the mountains of PHP, .Net, JavaScript, etc. that is at least garbage collected in their own VMs.
Be More, Be Manly, The Manly Geek Ubergeek Extraordinaire Blogger: www.manlygeek.com/blog Podcaster: podcast.man
True, I just wanted to point out that most arguments here can be proven wrong by one counter example.
Nearly everything in engineering involves a tradeoff. So arguments leading to statements like language a is better than language b are nearly always wrong.
Also garbage collection is turned off in real-time systems not because of space complexity but because of time complexity. For real-time systems the important factors are the deadlines for given tasks.
Additionally, normally garbage collection does not improve the time complexity of an algorithm but the space complexity, because it avoids memory leaks and possibly fragmentation.
No. The less you have to hit the database, the better. Slashdot is a good example - people who are logged in are only 1/3 of all page views, but require 2/3 of the boxes.
Also, caches like static content the best.
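To put a number on that, here is a quick back-of-the-envelope sketch using only the two fractions quoted above. It assumes (my assumption, not a published figure) that the remaining 2/3 of page views are anonymous and are served by the remaining 1/3 of the boxes.

```python
from fractions import Fraction

# Figures from the comment: logged-in users are 1/3 of page views
# but need 2/3 of the boxes.
logged_in_views = Fraction(1, 3)
logged_in_boxes = Fraction(2, 3)

# Assumption: everything else is anonymous traffic on the remaining boxes.
anon_views = 1 - logged_in_views
anon_boxes = 1 - logged_in_boxes

# Share of boxes needed per share of traffic, for each group.
logged_in_cost = logged_in_boxes / logged_in_views   # 2
anon_cost = anon_boxes / anon_views                  # 1/2

ratio = logged_in_cost / anon_cost                   # 4
print(f"A logged-in view costs {ratio}x an anonymous one")
```

Under that assumption a logged-in (mostly uncacheable) page view costs about four times as much hardware as an anonymous one.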
What? The point is that they should be dealing with relatively static data, therefore they go to their cache layer first (in this order usually: memory, memcached or similar, DB query cache, DB query)
Of course, the less you have to hit the DB the better, but even more so, the less you have to re-calculate things that don't change the better.
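A minimal sketch of that lookup order, in case it helps. Everything here is hypothetical: the `TieredCache` class, the tier names and `slow_query` are made up for illustration, and plain dicts stand in for memcached and the DB query cache rather than any real client library.

```python
# Hypothetical tiered lookup: every name here is illustrative only.
class TieredCache:
    def __init__(self, tiers, compute):
        # tiers: list of (name, dict) pairs ordered fastest first, e.g.
        # local memory, a memcached-like store, then the DB query cache.
        self.tiers = tiers
        self.compute = compute  # stands in for the real, expensive DB query
        self.hits = {name: 0 for name, _ in tiers}
        self.hits["db_query"] = 0

    def get(self, key):
        for i, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                self.hits[name] += 1
                # Copy the value into the faster tiers that missed.
                for _, faster in self.tiers[:i]:
                    faster[key] = value
                return value
        # Every tier missed: run the query once and fill all tiers.
        value = self.compute(key)
        self.hits["db_query"] += 1
        for _, store in self.tiers:
            store[key] = value
        return value


def slow_query(key):
    # Placeholder for a real database round trip.
    return "row for " + key


cache = TieredCache(
    [("memory", {}), ("memcached", {}), ("db_query_cache", {})],
    slow_query,
)
first = cache.get("user:42")   # misses every tier, runs the query
second = cache.get("user:42")  # served from the in-process memory tier
```

The second call never reaches the query function, which is the whole point: the more static the data, the more often the lookup stops at the first tier.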
Hence my comment about peak vs. average performance with regard to real-time systems. Real-time systems can't tolerate the large peak of a garbage collector, so they give up the superior average performance it provides.
Garbage collectors reduce the overhead cost of each memory collection by reducing the number of calls to memory collection, and thus typically provide superior average performance. It's a sort of reverse amortization.
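A toy cost model of that trade-off, not a benchmark. The two constants and the `reclaim_costs` helper are invented for illustration, and real collectors are far more complicated than "fixed overhead per call plus a cost per object".

```python
# Toy cost model: all constants below are made-up assumptions.
CALL_OVERHEAD = 50   # fixed cost of entering the reclaim routine (arbitrary units)
PER_OBJECT = 1       # cost of reclaiming a single object (arbitrary units)


def reclaim_costs(n_objects, batch_size):
    """Return (average cost per object, worst single pause) when dead
    objects are reclaimed batch_size at a time."""
    full, rest = divmod(n_objects, batch_size)
    pauses = [CALL_OVERHEAD + PER_OBJECT * batch_size] * full
    if rest:
        pauses.append(CALL_OVERHEAD + PER_OBJECT * rest)
    return sum(pauses) / n_objects, max(pauses)


# Free each object the moment it dies: many small calls.
eager_avg, eager_peak = reclaim_costs(10_000, 1)
# Let garbage pile up and collect it in batches: few large calls.
batched_avg, batched_peak = reclaim_costs(10_000, 1_000)

print(eager_avg, eager_peak)
print(batched_avg, batched_peak)
```

In this model batching wins on average cost per object because the fixed overhead is spread over the whole batch, while its worst single pause is much longer. That longer pause is exactly what a real-time deadline can't absorb.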
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
I recognize that you can specify a maximum memory allocation for the JVM, and can specify a minimum starting memory allocation for the JVM - and I appreciate the implications to the garbage collection routines based on the spread between the two.
I have four minor issues (to which I alluded, but never really called out) with this strategy, and over time they cause the statement I made to become fact.
a) Even though the JVM only initially allocates the Xms, eventually (and often fairly quickly) it will allocate the Xmx and treat the Xms as merely a goal during a GC.
b) I can't say I've ever seen a JVM return memory to the OS after allocating a large chunk in order to process a memory-intensive transaction.
c) Hand tuning the memory parameters for a fairly large application with an unknown anticipated workload is a fairly expensive process (labor costs.) What's the next setting used if 256M isn't enough? 512M of course. If that isn't enough? "Crank it up to 768M or throw a full Gig at it, I'm tired of farking with it and I have a backlog of projects to work on."
d) And what happens when 512M is sufficient for 99.9% of your workload (including 100% of your test cases) but once a month you get hit with a peak that requires 520M to process? The thing dies a horrifically spectacular death in production, in slow motion, with everybody watching. And you can't reproduce it, because the business req docs didn't include that one case, so you run it through your performance regression tests again for weeks, unable to duplicate the problem. The ability to dynamically allocate (and then de-allocate) an extra 10M of memory to the process just cost weeks of wasted effort on top of a production failure.
I like Java, don't get me wrong - coding up front-end web-based applications in C would be an expensive venture, given the existing base of development tools, environment extensions and developer talent. There would need to be a pretty serious scalability issue (known in advance, by a company on a project with resources to finance it up front) to warrant such an approach. But pre-allocating all the memory an application could ever use (ever!) makes for an environment that doesn't use the hardware to its full capacity, and is often very wasteful. And I think that's what the original article was about.
Glonoinha the MebiByte Slayer
a+b = what paging is for. Java (Sun VM) doesn't return allocated memory and trusts the OS to do the right thing.
c+d = set Xmx to the max you can allocate to a process, and trust paging per a+b. Otherwise, if you consider a native process, what is a runaway memory allocator going to cost you?
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking