Facebook Rewrites PHP Runtime For Speed
VonGuard writes "Facebook has gotten fed up with the speed of PHP. The company has been working on a skunkworks project to rewrite the PHP runtime, and on Tuesday of this week, they will be announcing the availability of their new PHP runtime as an open source project. The rumor around this began last week when the Facebook team invited some of the core PHP contributors to their campus to discuss some new open source project. I've written up everything I know about this story on the SD Times Blog."
Is this what they're using on the newly redesigned site? Because if so, it's pathetically slow. Facebook is one of those places that with every attempt to "improve" things somehow manages to make it worse and worse. They're a perfect candidate for a Microsoft buyout.
OK, so can we put the "PHP is comparable to JSP and ASP.NET" argument to rest now?
PHP is for lazy developers. I develop my webapps in C and I even wrote my own httpd to improve performance.
At some point, if you are lucky enough, you will require extremely high performance from your web pages. You start out coding HTML in Notepad and move on to Perl CGI then on to PHP with scripting embedded right in the generated HTML. All the time you gain programming crutches at the expense of processing speed, and for a while this is a great tradeoff.
But one day you start having server hiccups because your scripts can't keep up with your traffic. Sites like Amazon have already run into this and have moved away from scripting languages and back to system languages. Running applications directly on the CPU, instead of relying on a runtime to translate (at best) bytecode into machine instructions, means getting the most work out of every CPU cycle.
So I wonder what long-term benefit there is in improving the language runtime.
Don't start talking about high performance and then naming languages that don't have a chance to deliver. What you really need to do is program the entire web page in assembler, and then you're going to have speed and performance that can't get any faster. If your developers are noobs who can't use real languages, and they're just object-oriented kids who can't work with memory and need to access everything through abstracted methods, then fire them and bring in some embedded developers who know that speed = good code and good languages. If you don't want to use assembler, then use good old C!
If you want speed, use languages that can deliver, and don't try to rewrite slow scripting languages to do the job of the trusted old methods: assembler and C.
...though traditional forks do not get started after "friendly" meetings. But it still sounds like one, which is not a very good thing in my opinion.
What they (Facebook) should have done is to combine resources with the PHP folks, then later release a "new" PHP version with this new engine.
This would be dubbed progress by the majority here.
By the way, where are the stats that show how wanting the current PHP engine's speed still is? I want to see some serious comparison.
Sounds like Facebook rewrote PHP and then invited PHP core developers to adopt it as their core development platform? I can't imagine that went over all that well... probably hit a number of them in the pride region. And the article said it is to be released as open source, but failed to mention the license. Will this be some sort of twisted "FriendFace Public License" or some perversion?
This is not what is meant when a party contributes to an open source project. "Here, I rewrote it for you. It's better. Now just throw away everything else you've done and use this." Really?
According to that article posted recently about Facebook's master password being 'Chuck Norris', the project is indeed a compiled PHP that goes by the name of HyperPHP, or HPHP. It will supposedly lower the load on the servers by 80% and speed up things 5x, according to the unnamed source in the original blog post.
This may be presumptuous, but if it is better and liberally licensed, one would be foolish not to use it. This happens in all industries: someone reinvents the wheel and does it better, faster, and cheaper with newer techniques, and the dinosaur quickly disappears. If the article is correct, Facebook is giving the dinosaur a chance to avoid extinction.
From TFA: UPDATE: After sifting through the comments here and elsewhere, I'm inclined to agree with the folks who are saying that Facebook will be introducing some sort of compiler for PHP.
Not a fork. Not as newsworthy as implied.
It would be nice if there were an option to compile it to, say, .phpc files, like Python's .pyc files. That would be a nice thing for Perl too.
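For reference, this is roughly what the Python side looks like: CPython caches compiled bytecode as .pyc files, and the standard library's `py_compile` module exposes that step directly. A minimal sketch, where the module name `hello_mod` is a hypothetical placeholder and the temporary directory is deliberately not cleaned up:

```python
import os
import py_compile
import tempfile


def compile_to_bytecode(source_text: str) -> str:
    """Write source_text to a throwaway module and byte-compile it.

    Returns the path of the generated .pyc file. 'hello_mod' is only
    an illustrative name.
    """
    workdir = tempfile.mkdtemp()
    src_path = os.path.join(workdir, "hello_mod.py")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write(source_text)
    # doraise=True raises py_compile.PyCompileError on a syntax error
    # instead of writing the message to stderr and returning None.
    return py_compile.compile(src_path, doraise=True)
```

Calling `compile_to_bytecode("def greet():\n    return 'hi'\n")` returns the path of the cached file. By default CPython places it in a `__pycache__` directory next to the source rather than as a sibling `.phpc`-style file.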
This PHP compiler item was revealed three weeks ago by a Facebook employee. Read at http://therumpus.net/2010/01/conversations-about-the-internet-5-anonymous-facebook-employee/?full=yes
So there is one guy at Facebook doing this PHP rewrite. It must be possible to figure out who he is. Have they hired any high profile PHP developers?
Would a language that runs in a VM, like Java, Scala or C#, be faster? After all, Twitter rewrote their backend in Scala and they seem to have gotten better performance.
...rewriting your site in a real language instead?
I had to use PHP for 4 years, and I’d rather die than do it again. (Same thing with Internet Explorer.)
Get yourself a real language. One that makes sense! One with an actual spec. One that makes sense! (Has to be said twice!)
Even Python would make more sense. Java would be a professional choice. And if you want to get futuristic, I’d recommend Haskell. ^^
Everything is better than PHP. (Ok, except perhaps Intercal/Malbolge/Piet. Perhaps...)
Any sufficiently advanced intelligence is indistinguishable from stupidity.
I don't know what the fascination is with scripting languages on the Linux platform or with FOSS in general, but it results in slow programs
Speed of development is faster in a scripting language, and in developed countries, below a certain scale, throwing hardware at it is cheaper than throwing programmers at it. The point of the article is that Facebook is above that scale, and programmers to write a new PHP interpreter have become cheaper than adding hardware+power+cooling.
with flaky UIs.
Citation needed. True, they often use a different widget set from the rest of the desktop (e.g. Tk from Tcl and Python, and Swing from Java), but the popular widget sets also have scripting language bindings. How can one really tell the difference between a wxWidgets or GTK app written with Python vs. C++?
I like to use refurbished/recycled machines, which means that I'll have an old P4, 512M RAM and a slow bus.
Do these use more electric power than, say, an Acer Aspire Revo? The power consumption of a Pentium 4 and the power to remove the heat it generates can become an issue, especially for a server that's turned on 24/7.
Many times, applications written in a scripting language, whether it be Perl, Python, PHP, or whatever, will hang for a while and then start working again.
There are three causes for this, and you can distinguish them with 'top' or 'Task Manager' or something else that can count CPU time and page file accesses:
I would truly like to see an assembly language revival though.
The revival is here. It is called NESdev. It's just not for making PC programs because the processes to be executed on a PC tend to be much more complex.
Now, if we were talking about writing the PHP interpreter in assembler, I'd say you had a winning compromise.
You'd have to do so over and over for x86, x86-64, ARM, PowerPC, and other architectures on which PHP runs. That's why these interpreters are most often written in C, with occasional reference to the assembly language code that the compiler generates. Really the only time you have to handle assembly in a PC application is when you're implementing a just-in-time compiler, and it's becoming the fashion to let LLVM do that for you.
Why not just stash your farm of slow php systems behind some heavy duty caching appliance(s)?
Something like aicache might fit the bill.
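The idea behind putting a cache in front of slow PHP servers is that repeated requests for the same page are answered from memory instead of re-running the script. The sketch below is a toy in-process TTL cache, not aicache or any real appliance; `render_page` is a hypothetical stand-in for the slow backend, and the 30-second default TTL is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy response cache: serves a stored page until it is older than ttl seconds."""

    def __init__(self, backend: Callable[[str], str], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backend = backend  # the slow origin, e.g. a PHP page render
        self.ttl = ttl
        self.clock = clock      # injectable so expiry can be tested deterministically
        self.store: Dict[str, Tuple[float, str]] = {}
        self.misses = 0

    def get(self, url: str) -> str:
        now = self.clock()
        hit = self.store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # fresh copy: backend not touched
        self.misses += 1
        body = self.backend(url)     # stale or missing: regenerate the page
        self.store[url] = (now, body)
        return body


def render_page(url: str) -> str:
    # Hypothetical stand-in for an expensive dynamic page render.
    return f"<html>{url}</html>"
```

This only pays off when many requests can share one response; heavily personalized pages, which a social site serves a lot of, get far fewer cache hits.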
If your developers are noobs and can't use real languages and they're just Object Oriented kids who can't work on memory and need to access everything through abstracted methods, then fire them and get in some embedded developer
Embedded developers tend to 1. work on smaller, more focused systems, and 2. charge more. For one thing, a module inside Facebook deals with data types more complex than those in the firmware of a car engine's microcontroller. And below a certain scale, the money you save by hiring noobs (and taking the tax credit for recent graduates if available) can pay for throwing more hardware at the problem.
PHP is a weakly typed language, so for any given operation, the interpreter will have to check the types of the operands and then figure out which operation(s) on the CPU to call to solve it. Also, as it's dynamic, the operand may not even exist yet.
So, even if you did write a compiler for PHP, instead of the PHP interpreter doing the type checking and figuring out what to do, you'd have to compile in some runtime checks to implement the same logic that's currently in the interpreter for every single operation. This doesn't sound to me like it'll be significantly faster (Although I'll freely admit it's just a gut feeling.)
So, a question to the room - if it's even possible, is there any advantage in compiling a dynamic weakly typed language to native code?
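A rough way to picture the point above: a dynamically typed `+` has to inspect its operands at run time, while code compiled with known types can skip that step. The sketch below loosely imitates PHP-style lenient addition but is not PHP's actual rule set (PHP also handles arrays, numeric prefixes like "1abc", and other edge cases); `dynamic_add`, `to_number` and `int_add` are hypothetical names. Compilers for dynamic languages generally try to infer types so that the cheap path can be emitted where inference succeeds and the generic path kept elsewhere.

```python
from typing import Union

Value = Union[int, float, str, None]


def to_number(v: Value) -> Union[int, float]:
    """Loosely coerce a value to a number (simplified, not PHP's exact rules)."""
    if v is None:
        return 0
    if isinstance(v, (int, float)):
        return v
    try:
        return int(v)
    except ValueError:
        try:
            return float(v)
        except ValueError:
            return 0  # non-numeric strings count as 0 in this sketch


def dynamic_add(a: Value, b: Value) -> Union[int, float]:
    # Generic path: every call pays for type checks and possible conversions.
    if type(a) is int and type(b) is int:
        return a + b
    return to_number(a) + to_number(b)


def int_add(a: int, b: int) -> int:
    # Specialized path a compiler could emit once it proves both operands are ints.
    return a + b
```

For example, `dynamic_add("3", 4)` gives 7 and `dynamic_add(None, 2.5)` gives 2.5, while `int_add` does no checking at all. Whether compiling buys much therefore depends on how often the types can actually be pinned down ahead of time.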
Assembly language isn't platform-independent. It's really easy to screw up and hard to optimize. And it's not much faster than C/C++. The issue at hand is balancing the cost of writing the code with the cost of running it. I don't see how the cost of writing and maintaining software in assembly language will ever compete with the costs of C/C++, potential speed increases and all. Object-oriented languages make small performance sacrifices in return for much easier maintenance, and that's how it should be. Scripting languages take this even further, and for these large websites they have lost their advantage. The only time assembly will prevail is when we return to incredible memory constraints, but even embedded systems pack tons of memory now so I don't see that being an issue.
You can lead a horse to water, but you can't make it dissolve.
I am sure we will be hearing all about how successful this project is, but is the successful application of a band-aid really the long-term solution Facebook needs?
it's why it's the way it is
now facebook
will faceplant
Flaky UIs - click on a button and nothing happens. Or things not drawing properly. These are my observations.
A refurb machine is about a third the cost of a new machine - even cheaper than the cheapo Acers.
Interesting list of possible causes. I don't care what the reasons are - scripting languages are not appropriate for large applications with GUIs. And as far as time savings for development is concerned, there are these things called "frameworks" - like GTK+ and Qt - that can speed up development to be just as fast as the scripting languages.
Thanks!
Assembly language isn't platform-independent. It's really easy to screw up and hard to optimize.
You wouldn't necessarily use it for your app, but having developers who understand what's going on at a lower level can help even when using higher-level languages.
Perhaps this won't help with developers making VB-written corporate apps, but at the scale of Facebook, knowing a larger part of the whole stack helps you make things run better. And since you don't know where you'll end up working in your career, covering it in school (or on your own, if self-taught) could be useful.
Even if you don't ever use it, personally I think knowledge and exploration are their own rewards, regardless of any "useful" benefit.
with flaky UIs.
Citation needed.
Run Eclipse.
What the hell kind of word is gotten?! Can't you people learn the language?!!
Flaky UIs - click on a button and nothing happens. Or things not drawing properly.
I've seen buttons do nothing and redraws fail even in compiled programs.
A refurb machine is about a third the cost of a new machine
By "cost", do you include or exclude the cost of power and cooling? And do you include or exclude the cost of replacing failed components? Capacitors die.
scripting languages are not appropriate for large applications with GUIs.
One scripting language has a huge deployment advantage over everything else: ECMAScript. It interacts with Document Object Models exposed by various runtime environments, and it's sandboxed so that users can more or less safely run a program without getting an administrator to install it. You might know it as JavaScript (ECMAScript + HTML DOM) or ActionScript (ECMAScript + SWF DOM). Or would you rather go back to ActiveX, where the web site sends the equivalent of a compiled DLL to each user, which runs with the user's full privileges and doesn't run on anything but a convicted monopolist's operating system?
What if you could do in 1 day the same thing that would take 2 weeks in assembly? The choice is clear.
Unless the two weeks of hand-tweaking the assembly language code of your program's single biggest bottleneck would reduce your program's system requirements so that twice as many users can use it. Such a case is reportedly common in video game development, where the increased revenue is often worth it.
Not to mention concerns about portability
"Portability" has more than one meaning. There's portability of the code, or its suitability for execution in multiple environments whose hardware isn't compatible. For this, you can keep a fallback implementation of each asm module in C. That's useful for running test cases such as whether the asm version still works correctly or whether it's worth continuing to maintain. The other kind of portability is the ability to run on small, battery-powered devices. These tend to have underpowered CPUs to save manufacturing cost and increase battery life, and the code to run on these CPUs must be extremely efficient in order for the application to be responsive. Go try to make a software 3D renderer on a handheld device with a 16.8 MHz ARM CPU and tell me you don't need assembly anymore.
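Keeping a plain fallback for every optimized module amounts to differential testing: run the fast routine and the reference routine on the same inputs and require identical results. The sketch below shows that pattern in Python rather than C and assembly; `checksum_reference` and `checksum_fast` are hypothetical stand-ins for the portable version and the hand-optimized one.

```python
import random


def checksum_reference(data: bytes) -> int:
    """Straightforward, obviously correct version (the 'C fallback' role)."""
    total = 0
    for byte in data:
        total = (total + byte) & 0xFFFF
    return total


def checksum_fast(data: bytes) -> int:
    """'Optimized' version (the 'asm' role): one builtin sum, then mask to 16 bits."""
    return sum(data) & 0xFFFF


def differential_test(trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        ref, fast = checksum_reference(data), checksum_fast(data)
        assert ref == fast, f"mismatch on {data!r}: {ref} != {fast}"
```

If the fast version ever drifts, the failing input is reported. If the two keep agreeing but the fast one stops being meaningfully faster, that is the signal to stop maintaining it.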
That thing is a broken buggy piece of garbage. Any time I go out to an event or something and want to upload anything more than half a dozen photos, it inevitably blows up on random photos for no reason (completely fresh off the camera unedited photos). I have to babysit the upload and instead of just hitting select all and letting it go, I end up having to upload it in chunks of 5 photos at a time.
Not that surprising if you've read the book "The Accidental Billionaires". They specifically mention that there is one person dedicated to writing a PHP compiler and compiling all Facebook PHP.
Also, I don't understand why they don't use one of the currently available PHP compilers, phc or Roadsend. It's possible they started their initiative earlier, but they should have announced it and possibly prevented some duplicate work.
Caucho Resin has a mostly pluggable replacement for PHP which is written in Java. It adds web-friendly features to PHP like distributed sessions and load balancing. Given that the JVM JIT is already plenty fast, and that the benchmarks show Java/PHP beating regular PHP handily, I wonder if Facebook considered using it at some point.
If Facebook really wants to speed up the customer experience all they need to do is remove Akamai from their content delivery network (CDN). That's where my browser is always stuck in a Waiting status when I notice a connectivity issue.
You are saying that it takes weeks to use assembly for something that you can code in an hour? That's a gross understatement.
When assembly was used more in PCs, processors and programs were far simpler. Today, 4 cores is the standard even in desktop PCs! Things such as multithreading are essential basics of modern programming languages and can be handled extremely easily. However, properly designing, coding, debugging, testing and documenting those things in assembly takes far more time than you implied. And then the code is very platform-dependent and needs to be redesigned, modified and tested when you want to change servers... or if you want different types of servers running the same code... It's a nightmare! Not to mention that it is harder to find developers for that, you need to pay them more, etc...
And how much do you really gain by doing that in assembly instead of C or even C++? Very little! I am sure that the difference is notable in a system the size of Facebook, but they have obviously decided that it isn't worth it even for them.
php and threading! ... um nevermind nothing to see here.
Meanwhile, efforts to get Python running nicely without the GIL etc. are still going on without press.
Eclipse isn't written in a scripting language.
PHP is an example of a scripted language. The computer or browser reads the program like a script, from top to bottom, and executes it in that order: anything you declare at the bottom cannot be referenced at the top.
This was true in PHP3, but since PHP4, functions declared at the bottom of a file are still available from the start of the file's execution, because the whole file gets compiled into an intermediate form before it is executed.
Anyone want to help me out with starting a PHP project? If so, please post your contact information here. I just need consultation and perhaps some initial coding.
If "Facebook" buys PHP and makes it similar to ".Net", i.e. kills it, it would be really sad.
Speed of "Java" and ".Net"? Is it a joke? "Java" hangs all the time and the ".Net" code to do a simple task is so convoluted that it is just ridiculous.
Speed of "Java" and ".Net"? Is it a joke?
No, it's not.
"Java" hangs all the time
No it doesn't.
and the ".Net" code to do a simple task is so convoluted that it is just ridiculous.
No, it's not.
Honestly, you really have no fucking clue what you're talking about, do you?
Flaky UIs - click on a button and nothing happens. Or things not drawing properly. These are my observations.
Unresponsive UIs tend to be written by naive/ignorant developers who don't understand threading.
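The usual cure for an unresponsive UI is to keep slow work off the thread that handles events and let that thread pick up the result later. Below is a toolkit-free sketch of the pattern using Python's `threading` and `queue` modules. `slow_job` is a hypothetical long-running task, and `event_loop` merely stands in for a real GUI loop; actual toolkits provide their own hooks for handing results back to the UI thread.

```python
import queue
import threading
import time

results: "queue.Queue[str]" = queue.Queue()


def slow_job(name: str, seconds: float) -> None:
    # Hypothetical long-running task, e.g. a network call or a big file parse.
    time.sleep(seconds)
    results.put(f"{name} done")


def start_job(name: str, seconds: float) -> threading.Thread:
    # Called from the "button click" handler: returns immediately.
    worker = threading.Thread(target=slow_job, args=(name, seconds), daemon=True)
    worker.start()
    return worker


def event_loop(max_ticks: int = 50, tick: float = 0.01) -> list:
    """Stand-in for a GUI loop: keeps ticking while polling for finished work."""
    finished = []
    for _ in range(max_ticks):
        try:
            finished.append(results.get_nowait())  # never blocks the loop
        except queue.Empty:
            pass
        # ... a real loop would repaint and handle input here ...
        time.sleep(tick)
    return finished
```

The click handler returns at once, and the loop keeps "repainting" while the job runs. Calling `slow_job` directly from the handler would freeze the loop for the whole duration, which is the button-does-nothing symptom described above.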
Almost all the Java speed problems that I have seen were caused by developers ignorant of the Java memory model
or by admins unqualified to administer (performance tuning is a part of administration in my book) an application server.
If you know when to use a different garbage collector, your applications will never freeze.
Rewriting a managed runtime is difficult, costly, and time consuming. And at the end of the day, the odds that they will beat the performance of the CLR or JVM are super small. At best, they may match the performance of either of those runtimes, but at an enormous cost. Open sourcing it won't make a difference - it's already open source and has been for years. A much better investment would be changing the site to use a different technology that was faster. There are lots of options. The goal is to improve performance, not embark on a fun science project.
"Java" is a creation of "Sun" Co. We all use "Open Office" of the "Sun". It is free.
But did you try to switch on French or German spell-checking in "Open Office"? Every time I do it, I think their chief engineers must be really stupid. Indeed, it takes a little boy to say that the king is naked.
As for ".Net" I do not trust it either. The creator of this language does not want the Internet to become what it can become, because it will damage their core business. Their natural interest is to slow down the Internet, at least for a year or two more (trillion dollars more). That is why the "Internet Explorer" is 4 times slower than other browsers. That is why ".Net" is so confusing, slow and obfuscating.
Oh, yes. They have a 5000+ strong marketing department, which succeeded in convincing you otherwise. No surprise: they spend billions on it each year.
But if PHP is bought out, as MySQL already has been, it would mean the end of the Internet as we know it, because this is the good open technology that just works.
You wanted the truth, here is the truth.
Agreed. The JVM does a decent job, and C# has some pretty nice syntactic conveniences.
I ran a string whacking benchmark (posted on my website, roboprogs.com: "... faster ..." page), and, for a sequential task, Perl cleaned up handily. However, for the commonly available languages on *nix, Java outshone everything else for dealing with threads. Keep in mind this was a naive use of threads as well: fire up for a single task, rather than having a thread pool that looped over requests from a queue. Java still did a good job managing it.
(though I do wish Java had more alternatives for quick and dirty coding at times - weakly typed options, delegate/function pointer type stuff -- still need to look more into jRuby, Scala, Groovy, et al)
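For contrast with the naive use of threads described above (one thread fired up per task), here is a minimal sketch of the pool-plus-queue alternative: a fixed number of worker threads loop over a shared queue of requests. It is written in Python for illustration rather than Java, and `handle` is a hypothetical request handler. In Java the same shape is usually obtained from `java.util.concurrent.ExecutorService`, and in Python from `concurrent.futures.ThreadPoolExecutor`; the explicit version just makes the loop visible.

```python
import queue
import threading
from typing import Callable, List


def run_pool(requests: List[str], handle: Callable[[str], str],
             workers: int = 4) -> List[str]:
    """Process requests with a fixed pool of threads pulling from one queue."""
    jobs: "queue.Queue" = queue.Queue()
    out: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = jobs.get()
            if item is None:      # sentinel: no more work for this thread
                return
            result = handle(item)
            with lock:            # guard the shared result list
                out.append(result)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for r in requests:
        jobs.put(r)
    for _ in threads:
        jobs.put(None)            # one sentinel per worker
    for t in threads:
        t.join()
    return out
```

Results come back in completion order, not request order. The threads are created once and reused, so the per-request cost is a queue operation rather than a thread start.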
Yow! I'm supposed to have a plan?
I think one of you is talking about client-side Java in web browsers (which is the worst thing in the universe) and one is talking about server-side Java.
"Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
I was hoping this would be a fully open version of the PHP interpreter implemented in Java, maybe with a bytecode compiler, much like Quercus or Project Zero.
Come on. I've used Java. I always have to install some crap on my browser, usually about 12 MB of it. Then it doesn't work.
At work there's software that pops up from IE. This always crashes. Heaps of our software is written in Java because the developers are lazy and stupid. It always crashes, uses inconsistent controls, and just doesn't feel right. Trying to open two instances of the same app will crash the computer.
Java is useless. It might work in a lab environment, but in the real world it is a buggy, naggy, ugly crutch for useless programmers.
And why is that important exactly?
Its UI is sluggish, and the whole thing is a living example of leaky abstractions gone wrong.
Might as well be written in PHP.
with every attempt to "improve" things [Facebook] somehow manages to make it worse and worse. They're a perfect candidate for a Microsoft buyout.
Microsoft isn't exactly known for speed either.
Yeah...
There is a lot of crappy software written in C/C++. It's easy to make sluggish UIs and bad abstractions. But that's not the point. The fact of the matter is, Eclipse is not written in a scripting language, it is written in a compiled (or what might as well be compiled) language with strong typing and a slightly more dynamic runtime than what you get with C++ and comparable speed. So bringing up Eclipse as an example of the problems with scripting languages is just plain invalid.
I think you're comparing to a poorly written Java application instead of a good Java application or Java itself. Applications can hang and crash in any language.
This is just contradiction!
I did not bring Eclipse up because it is scripted. I brought it up because it has a flaky UI, and the parent poster needed a citation. I believe the number of abstraction layers involved in Eclipse code, combined with the cost of the Java primitives used by each of these layers, is what brings Eclipse to such a crawl. This is a potential problem with any scripted language as well: if you interpret something, the cost of primitives is high, and stacking APIs on top of each other is going to cost you (and your users) plenty. C/C++ get away with it not because the software is written better, but because the primitives cost less. And here is where I get back to the Eclipse issue: if a Java program has this kind of performance, and it is written by a good team, I believe scripted-language software should expect the same set of problems. You need to make sure that if you do abstract things in your project, you understand the cost involved with each additional layer. Also, you should not pay for features you do not use (the zero-overhead principle). Native code is the true God. You can start with whatever script you desire, but make sure you can reduce it to something that can run fast. The problem is thus not necessarily in languages (most of them offer the same superset of needed features), but in implementations.
I wouldn't call C or C++ platform-independent either. Not without a huge amount of work and lots of precautions.
As for ".Net" I do not trust it either. The creator of this language does not want the Internet to become what it can become, because it will damage their core business. [...] You wanted the truth, here is the truth.
.NET is a runtime environment, not a language. Examples of languages that run within the .NET environment are Assembly, C, C++, Haskell, Lisp, Perl, Python, Scala, and many many others. Speaking of the truth, you might want to do your homework before claiming you know it.
even embedded systems pack tons of memory now so I don't see that being an issue.
Talk to Microchip, STMicro, and Freescale about that. They sell billions of micros per year with less than 1K of general purpose read/write memory. I've written apps for chips with 24 bytes of RAM.
Hold on, there, Sparky. Assembly, C/C++, your-favorite-compiled-language ... are all just as "fast" as each other. They're all dependent upon the compiler for "speed". I recently moved to Big Company where we do Serious Embedded Programming. The old-skool assembly programmers were always bitching that "compilers were no good" and that to get real speed you had to "program to the iron." Here's what I observed: they redesigned the algorithms when moving to assembly. So, basically, they had algorithmic improvements. This was generally true except, in certain cases, when it was impossible to completely annotate the "hot" variables and how register spill/fill should be performed. In this case I suppose we are just dealing with a lack of expressiveness, and not any sort of fundamental problem with "high level" (or lower level) languages.
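The observation that the assembly versions won mostly through redesigned algorithms is easy to illustrate in any language: changing the algorithm changes how the cost grows, which no amount of instruction-level tuning does. A minimal Python sketch, where duplicate detection is an arbitrary example task not taken from the comment:

```python
from typing import Sequence


def has_duplicates_quadratic(items: Sequence[int]) -> bool:
    # Compares every pair: about n*(n-1)/2 comparisons in the worst case.
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items: Sequence[int]) -> bool:
    # One pass with a hash set: about n lookups, each expected constant time.
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False
```

For 10,000 distinct items the first version performs 49,995,000 comparisons and the second 10,000 set lookups. Hand-tuning the inner loop of the first one, in C or in assembly, shrinks a constant factor but leaves the quadratic growth intact.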
In today's modern processors you won't gain much performance from assembly. A Core 2 Duo simply reads the x86 instructions and converts them into RISC-like micro-operations, and many of the optimizations happen in the compiler and on the fly during execution. You can always gain some speed, but it's nowhere near what you could gain just a decade ago.
What also needs to be taken into account is the cost and time of rewriting years of development work from scratch. Sunk costs drive accountants crazy and threaten the job of any IT manager.
Instead of starting from scratch, it's better to reuse tens of millions of dollars' worth of existing code.
It's nice from an engineering perspective, but Facebook is a corporation and money needs to come first and foremost.
Also, assembler can crash and freeze a system. The point of switching to NT or Unix was the stability of using managed C APIs rather than Windows 95, which had assembly code that could freeze your computer.
http://saveie6.com/
250 comments later and all we get is arguments about what Facebook 'should' have done to resolve specific architectural problems none of us really know. Facebook is growing like wildfire. It has its issues, some big. Many features are 'bad' (ever tried to run pop-out chat? Jeebus, my CPU cried for mercy). But it's quickly becoming THE go-to site for millions of people. So Facebook has grown beyond its current ability to scale, and they decide that rewriting PHP is a possible answer. They agree to open source it. Isn't this *exactly* what makes FOSS so great? Everyone benefits from the efforts of those using the code for their needs. Will this rewrite mean a global replacement of PHP's current implementation? I doubt it. But it may be just what is needed for many other sites with growing user bases and less $$$ for HW. Again, this is a bad thing because... If some random guy in a basement had done this, he'd be a borderline hero. But because a large corporate entity did, it's suspect and bad. I for one look forward to seeing what they really did and hearing from the PHP developers who attended the meetings about what they are really doing, what types of bottlenecks they found, and what ideas they had to resolve them. Will they be 100% right? Doubt it. But in the end a large corporation is contributing back to the community, and potentially in a big way if their rewrite is widely usable. This. Is. A. Good. Thing.
Way back when I was first programming, PHP came out and all was miraculous; everyone jumped on the bandwagon saying it was the fastest you could get, because it was like running C on your server. Well, now, six generations later, we see the bogged-down version, with /. covering so many years' worth of PHP stories for all the versions and patches and upgrades: how it was doing this and that, we had to take this out, need more security... and, oh yeah, the doozy: going from one version to another (4 to 5, I think) broke so much code because it was no longer backwards compatible. Anyway, now I am seeing that the monster has gotten so out of control that someone has decided to castrate it and make it a little lighter... problem is, those balls only weigh a few ounces. You need either a much older version of itself, which would be insecure, or a completely new product (Ruby?) to go the distance with pure speed.
older lighter creature turned monster. How did this happen? I remember
"Don't hate the coder......, hate the language"
If they would just unlock my account, I'd say they have a great site... I find that the client side is slow, but I've never really had any issues with the back-end speed.
Appropriate libraries can reduce or eliminate the advantage that scripting languages have. I don't know of any web template libraries for C or C++, but I'm sure they exist.
Don't worry, with 10 years of java devel you're most likely an adult who doesn't know how to code.
If you haven't written production code in at least ten languages, you're still a beginner. You can cut that down to three languages if one of them is LISP and one of them is COBOL, but personally I recommend against the short painful route. No single language is enough to make one a good programmer, and very similar languages aren't much help either.
1. Some bet on the Language (asm
2. Others bet on the Programmers (whatever the language an idiot remains an idiot).
3. The rest bet on the Hardware (servers are cheap, unlike humans; well, Facebook disagrees this time).
All those points make sense, but presented in isolation they miss the real thing:
what can happen when you have C (asm will be only 20-30% faster), experienced coders, and powerful hardware?
Answer: you redefine the performance standards.
Why would it make sense, be needed, or only be desirable?
Because we will soon not have any other choice.