Facebook Rewrites PHP Runtime For Speed
VonGuard writes "Facebook has gotten fed up with the speed of PHP. The company has been working on a skunkworks project to rewrite the PHP runtime, and on Tuesday of this week, they will be announcing the availability of their new PHP runtime as an open source project. The rumor around this began last week when the Facebook team invited some of the core PHP contributors to their campus to discuss some new open source project. I've written up everything I know about this story on the SD Times Blog."
At some point, if you are lucky enough, you will require extremely high performance from your web pages. You start out coding HTML in Notepad and move on to Perl CGI then on to PHP with scripting embedded right in the generated HTML. All the time you gain programming crutches at the expense of processing speed, and for a while this is a great tradeoff.
But one day you start having server hiccups because your scripts can't keep up with your traffic. Sites like Amazon have already run into this and have moved away from scripting languages and back to system languages. Running applications directly on the CPU instead of relying on a runtime to translate (at best) bytecode into machine instructions means maximizing CPU cycles.
So I wonder what long-term benefit there is in improving the language runtime.
Sounds like Facebook rewrote PHP and then invited PHP core developers to adopt it as their core development platform? I can't imagine that went over all that well... probably hit a number of them in the pride region. And the article said it is to be released as open source, but failed to mention the license. Will this be some sort of twisted "FriendFace Public License" or some perversion?
This is not what is meant when a party contributes to an open source project. "Here, I rewrote it for you. It's better. Now just throw away everything else you've done and use this." Really?
So there is one guy at Facebook doing this PHP rewrite. It must be possible to figure out who he is. Have they hired any high profile PHP developers?
Actually, that isn't necessarily true. It might be true in a linear sense, but when it comes to juggling different threads and the like, assembly language as I knew it wasn't all that capable of describing the process all that well. Assembly language is a go-cart with rocket boosters.
I would truly like to see an assembly language revival though. I truly would. It would be a return to sensibilities in programming. It would be a return to being careful with memory usage with improved focus on small efficient programming. It would be a really good thing. I just don't see it happening.
The purpose for these more complex languages is about being able to more symbolically describe the processes to be executed by the machine. Assembly language was some of the worst about that -- if there wasn't a very detailed set of comments for nearly every line of code, it would be nearly impossible to follow in source. These more complex languages will always have their place and purpose. Trying to make them more efficient is a good and useful thing. Now if we were talking about writing the PHP interpreter in assembler, I'd say you had a winner compromise.
I don't know what the fascination is with scripting languages on the Linux platform or with FOSS in general, but it results in slow programs
Speed of development is faster in a scripting language, and in developed countries, below a certain scale, throwing hardware at it is cheaper than throwing programmers at it. The point of the article is that Facebook is above that scale, and programmers to write a new PHP interpreter have become cheaper than adding hardware+power+cooling.
with flaky UIs.
Citation needed. True, they often use a different widget set from the rest of the desktop (e.g. Tk from Tcl and Python and Swing from Java), but the popular widget sets also have scripting language bindings. How can one really tell the difference between a wxWidgets or GTK app written with Python vs. C++?
I like to use refurbished/recycled machines; which means that I'll have an old P4, 512M RAM and a slow bus.
Do these use more electric power than, say, an Acer Aspire Revo? The power consumption of a Pentium 4 and the power to remove the heat it generates can become an issue, especially for a server that's turned on 24/7.
Many times, applications written in a scripting language, whether it be Perl, Python, PHP, or whatever, will hang often and then start working.
There are three causes for this, and you can distinguish them with 'top' or 'Task Manager' or something else that can count CPU time and page file accesses:
Why not just stash your farm of slow php systems behind some heavy duty caching appliance(s)?
Something like aicache might fit the bill.
When your application is with each iteration generating more content dynamically than it was before (and you want to continue down that route), the benefit of caching starts to drop quite quickly.
Facebook does as much caching as it can - I mean, they're not daft. They're probably the world's greatest experts on large scale MySQL + memcached.
But sometimes cached data isn't good enough. Facebook users expect their statuses, messages and comments to reach their friends within seconds.
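To make the staleness problem concrete, here is a minimal sketch of a time-to-live cache in Python. Everything in it is made up for illustration (the `TTLCache` class, the `status:alice` key, the 60-second TTL, the fake clock); it is not how Facebook or memcached actually works, just the general shape of the tradeoff.

```python
class TTLCache:
    """Tiny read-through cache with a fixed time-to-live (illustration only)."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock   # callable returning the current time in seconds
        self.store = {}      # key -> (value, time_fetched)

    def get(self, key, fetch):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # cache hit: fast, but possibly stale
        value = fetch(key)   # cache miss or expired: ask the slow backend
        self.store[key] = (value, now)
        return value


# Hypothetical backend and a fake clock so the example is deterministic.
now = [0.0]
backend = {"status:alice": "at lunch"}
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

first = cache.get("status:alice", backend.get)   # miss, fetched from backend

backend["status:alice"] = "back at work"         # Alice posts an update
now[0] = 5.0
stale = cache.get("status:alice", backend.get)   # hit: friends see old status

now[0] = 61.0
fresh = cache.get("status:alice", backend.get)   # expired: refetched

print(first, "|", stale, "|", fresh)
```

Shortening the TTL makes updates show up sooner, but every expiry is another trip to the backend, which is why the benefit of caching shrinks as more of the content has to be fresh.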
Flaky UIs - click on a button and nothing happens. Or things not drawing properly.
I've seen buttons do nothing and redraws fail even in compiled programs.
A refurb machine is about a third the cost of a new machine
By "cost", do you include or exclude the cost of power and cooling? And do you include or exclude the cost of replacing failed components? Capacitors die.
scripting languages are not appropriate for large applications with GUIs.
One scripting language has a huge deployment advantage over everything else: ECMAScript. It interacts with Document Object Models exposed by various runtime environments, and it's sandboxed so that users can more or less safely run a program without getting an administrator to install it. You might know it as JavaScript (ECMAScript + HTML DOM) or ActionScript (ECMAScript + SWF DOM). Or would you rather go back to ActiveX, where the web site sends the equivalent of a compiled DLL to each user, which runs with the user's full privileges and doesn't run on anything but a convicted monopolist's operating system?
If you could do in 1 day the same thing that would take 2 weeks in assembly? The choice is clear.
Unless the two weeks of hand-tweaking the assembly language code of your program's single biggest bottleneck would reduce your program's system requirements so that twice as many users can use it. Such a case is reportedly common in video game development, where the increased revenue is often worth it.
Not to mention concerns about portability
"Portability" has more than one meaning. There's portability of the code, or its suitability for execution in multiple environments whose hardware isn't compatible. For this, you can keep a fallback implementation of each asm module in C. That's useful for running test cases such as whether the asm version still works correctly or whether it's worth continuing to maintain. The other kind of portability is the ability to run on small, battery-powered devices. These tend to have underpowered CPUs to save manufacturing cost and increase battery life, and the code to run on these CPUs must be extremely efficient in order for the application to be responsive. Go try to make a software 3D renderer on a handheld device with a 16.8 MHz ARM CPU and tell me you don't need assembly anymore.
I don't know why this has not yet been linked
Just taking a shot in the dark here, but I'll attempt an answer. The reason no one else linked to it is because you're the only one who considers it obligatory. Slashdot regulars will know that this type of thread happens on a near daily basis and with all due respect to xkcd there is simply no need to make another tired attempt at karma whoring.
brandelf -t FreeBSD
Instead of putting a band-aid on the current architecture
But that's exactly how you run a successful system.
1) Design product to meet needs of your audience
2) Design the implementation that you think will handle the load the best (with lots of load testing and simulations to make sure it meets expected demand)
3) Build product
4) Watch it behave in the wild... Realize that actual demand is considerably higher than expected demand and will continue to grow
5) Performance slows with more users... you need a solution that will push the date of catastrophic overload further into the future, to buy time to work on *really* fixing the problem
6) Migrate to a new or adjusted architecture that will solve this current problem
7) Go to step 4
Facebook is on step 5. You make it sound as if scripting languages are the root cause of slow products. Yet in reality, the main bottleneck is generally the database. If Facebook rewrote everything in C or some other non-scripting language, not only would it be an incredibly long process, but the end result would be far less beneficial than if they revamped their existing technologies and worked to improve database performance. There is no ultimate solution for scaling a product. You need to be constantly adjusting your strategies, implementations, and systems to cope with resource usage.
The goal of computer science is to build something that will last at least until we've finished building it.
The key architectural performance issues in large web apps like Facebook are about scalability by clustering and parallelism and caching... usage of proper higher-level languages helps in this (think how pure-functional programming removes shared state and Google's mapreduce for example), while using a lower-level language may give a speedup on single individual machines but makes the architectural problems harder to tackle.
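A rough sketch of the map/reduce idea mentioned above, in Python. The function names (`map_chunk`, `merge_counts`) and the sample data are invented for illustration, and this runs on one machine; the point is only that pure functions with no shared state can be run in any order, so the map step could in principle be spread across a cluster.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk):
    """Pure function: count words in one chunk, touching no shared state."""
    return Counter(chunk.split())


def merge_counts(left, right):
    """Pure function: combine two partial counts into a new Counter."""
    return left + right


# Hypothetical input, already split into chunks (one per worker, say).
chunks = ["like poke like", "poke status", "like status status"]

# Each map_chunk call depends only on its own argument, so these calls
# could run on different machines without any locking.
partials = [map_chunk(c) for c in chunks]

# Merging is associative and commutative, so the order of reduction
# does not change the result.
totals = reduce(merge_counts, partials, Counter())
print(dict(totals))
```

Because neither function mutates anything outside itself, reducing the partial results in reverse order gives the same totals, which is exactly the property that makes this style easy to parallelize.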
I want to play Free Market with a drowning Libertarian.
You are saying that it takes weeks to use assembly for something that you can code in an hour? That's a gross understatement.
When assembly was used more in PCs, processors and programs were far simpler. Today, 4 cores is the standard even in desktop PCs! Things such as multithreading are essential basics of modern programming languages and can be handled extremely easily. However, properly designing, coding, debugging, testing and documenting those things in assembly takes far more time than you implied. And then they are very platform dependent and need to be redesigned, modified and tested when you want to change servers... Or if you want to have different types of servers running the same code... It's a nightmare! Not to mention that it is harder to find developers for that, you need to pay them more, etc...
And how much do you really win by doing that in assembly instead of using C or even C++? Very little! I am sure that the difference is notable in a system the size of Facebook, but they have obviously decided that it isn't worth it even for them.
I'm a Java developer with 10 years of experience developing enterprise grade server applications. We use Java, like the majority of Fortune 500 companies, because a Java app can be maintained by a development team of more than 1 coder, common memory coding errors and behaviours are avoided, a large API library prevents us from having to re-invent the wheel constantly, and the JVM is battle-tested in large deployments.
But, no, I guess I'm just a kid who doesn't know how to code.
This space left intentionally blank.
PHP is an example of a scripted language. The computer or browser reads the program like a script, from top to bottom, and executes it in that order: anything you declare at the bottom cannot be referenced at the top.
This was true in PHP3, but since PHP4, functions declared at the bottom of a file have been available from the start of the file's execution, because everything gets compiled into an intermediate form before execution.
creation science book
Speed of "Java" and ".Net"? Is it a joke?
No, it's not.
"Java" hangs all the time
No it doesn't.
and the ".Net" code to do a simple task is so convoluted that it is just ridiculous.
No, it's not.
Honestly, you really have no fucking clue what you're talking about, do you?
I've been involved in a number of projects that were prototyped in a scripting language (usually perl or python) and then rewritten in C for performance, with the disappointing result that the C code ran slower. I've also seen the same with C -> assembly a few times.
The explanation is fairly straightforward. The low-level-language experts (including me a few times) may have known their language well, but they'd never looked into the perl/python/cc code to learn the algorithms used there. It turned out that the implementers of perl/python/cc have developed some rather sophisticated algorithms for some of the time-wasting operations (e.g. table lookup) that were unknown to the asm programmers.
If they'd recoded the same algorithm used in the interpreters and compilers for the higher languages, they'd probably have won the contest, because there's no doubt that there's still some wasted cpu time in the higher languages' code. But, as others have pointed out, very often the algorithm used is a better predictor of speed than the language used.
So high-level vs. low-level language is a bit of a bogus distinction. The actual speed of the code is a combination of the algorithm and the efficiency of the implementation of the language. And some languages have several implementations with different efficiencies.
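As a small illustration of the algorithm mattering more than the language, here is a Python sketch comparing a naive linear table scan with a hash table lookup. The table size and key names are arbitrary assumptions, and the timings will vary by machine; the sketch only shows the shape of the gap, not the actual algorithms used inside perl, python, or cc.

```python
import timeit


def linear_lookup(pairs, key):
    """Scan a list of (key, value) pairs: O(n) work per lookup."""
    for k, v in pairs:
        if k == key:
            return v
    return None


def hashed_lookup(table, key):
    """Look the key up in a dict (hash table): O(1) work on average."""
    return table.get(key)


N = 20_000  # arbitrary table size for the demonstration
pairs = [(f"key{i}", i) for i in range(N)]
table = dict(pairs)
worst_case = f"key{N - 1}"  # last entry: worst case for the linear scan

# Both approaches return the same answer; only the cost differs.
assert linear_lookup(pairs, worst_case) == hashed_lookup(table, worst_case)

linear_time = timeit.timeit(lambda: linear_lookup(pairs, worst_case), number=200)
hashed_time = timeit.timeit(lambda: hashed_lookup(table, worst_case), number=200)
print(f"linear scan: {linear_time:.4f}s  hash table: {hashed_time:.6f}s")
```

A hand-written linear scan in C or assembly would shrink the constant factor, but it would still do work proportional to the table size on every lookup, which is how a "faster" language can lose to an interpreter that ships with a good hash table.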
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
I'd like to point out that long before xkcd there was userfriendly, and that in my circle we still like to end this sort of joke by saying "magnets" and giggle. The "Edward Lorenz, the butterfly and the chaos theory" punchline seems a bit forced (unless you go for the 'M-x butterfly' twist to make the emacs guy get the attention ;) )
XKCD is occasionally amusing, UF never was.
Also, inodes? They're talking about DOS... sheesh!
Really the only time you have to handle assembly in a PC application is when you're implementing a just in time compiler, and it's becoming the fashion to let LLVM do that for you.
That's an interesting combination of overstating and understating the case.
For one thing, your favourite C/C++ compiler likely contains a hand optimized memcpy() routine, down to assembly if it exposes a worthwhile gain, or coded in C with or without intrinsics if it doesn't. Many C/C++ compilers contain hand-optimized floating point routines, even more so in the embedded world. Plus there are many performance libraries out there to handle the heavy lifting in multimedia, mathematics, and encryption, some of which are vendor tuned to the n'th degree. It's been a while since I've used an Intel library, but this is likely one of the breed:
Intel MKL
As for LLVM, I'd say it's more than fashion. The differences in performance characteristics from one micro-architecture to another are nightmares to cope with at the assembly language level. The average tablet computer these days could probably play Kasparov to a draw, and there are still macho programmers out there who think they can do register assignment and live range analysis better than your compiler? Dude, if you've got that much talent, roll up your sleeves and fix the freaking compiler. Hopefully LLVM will solve that old problem of first having to swallow the gcc ast syntax enzyme.
Tautology #1: I can beat my computer at chess => your chess computer sucks (or it's running on your wristwatch).
Tautology #2: I can beat my compiler at coding a non-trivial loop => your compiler sucks.
Unless your goal in life is to win rigged competitions, LLVM is a lot more than a fashion statement.
you must be *REALLY* new here
Mod points: Guaranteed to remove your sense of humor.
Side effects may include gullibility and temporary retardation
On today's processors you won't gain much performance from assembly. A Core 2 Duo simply reads the x86 instructions and converts them to RISC-like micro-operations, and many of the optimizations happen in the compiler and on the fly during execution. You can always gain some speed, but it's nowhere near what you could gain just a decade ago.
What also needs to be taken into account is the cost and time to rewrite years of development work from scratch. Sunk costs drive accountants crazy and threaten the job of any IT manager.
Instead of starting from scratch, it's better to use tens of millions of dollars of existing code.
It's nice from an engineering perspective, but Facebook is a corporation and money needs to come first and foremost.
Also, assembler can crash a system and freeze it. The point of switching to NT or Unix was the stability of using managed C APIs rather than Windows 95, which had assembly code that could freeze your computer.
http://saveie6.com/