Slashdot Mirror


How Facebook Runs Its LAMP Stack

prostoalex writes "At QCon San Francisco, Aditya Agarwal of Facebook described how his employer runs its software stack (video and slides). Facebook runs a typical LAMP setup where P stands for PHP with certain customizations, and back-end services that are written in C++ and Java. Facebook has released some of the infrastructure components into the open source community, including the Thrift RPC framework and Scribe distributed logging server."

23 of 111 comments

  1. Open source by Norsefire · · Score: 5, Funny

    As I recall, some of their code was made open source in 2007, although not deliberately.

  2. One question: by bogaboga · · Score: 4, Interesting

    About how much has Facebook saved by using Open Source Software? I ask because I am not familiar with licensing costs from competing solutions. Thanks!

    1. Re:One question: by julesh · · Score: 5, Informative

      About how much has Facebook saved by using Open Source Software? I ask because I am not familiar with licensing costs from competing solutions. Thanks!

      I haven't watched the presentation so don't know if this is answered there, but it's hard to pin down any numbers on precisely how many servers facebook operates. That said, an estimate of their expected power usage in their recently acquired second datacenter is 6 megawatts, placed at twice the usage in their current datacenter. Realistically, this probably equates to a cluster of around 5,000 machines in the current datacenter.

      Costs per machine are likely to be restricted to Windows Server Web Edition; other software would not be needed on all machines (depending on cluster architecture, of course) so would be a trivial cost in comparison. Retail for the web edition is $399; I think we could expect such a high profile user to qualify for a 50% discount. This would put their software costs at about $1M. Considering that they're believed to have spent over 100 times this on hardware and support costs over the last year, I doubt this would be a particular concern. Price of purchase is not a factor in why facebook does not run on proprietary software.
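
      The estimate above can be reproduced as back-of-the-envelope arithmetic. This is only a sketch using the figures quoted in the comment (roughly 5,000 machines, a $399 retail price, a guessed 50% discount); the function and variable names are illustrative.

```python
# Back-of-the-envelope licence cost, using only the figures quoted above.
MACHINES = 5_000          # estimated cluster size (the commenter's guess)
RETAIL_PRICE_USD = 399    # Windows Server Web Edition retail price, as quoted
DISCOUNT = 0.50           # assumed volume discount (the commenter's guess)


def licence_cost(machines: int, unit_price: float, discount: float) -> float:
    """Total licence cost after applying a fractional discount."""
    return machines * unit_price * (1.0 - discount)


total = licence_cost(MACHINES, RETAIL_PRICE_USD, DISCOUNT)
print(f"Estimated licence cost: ${total:,.0f}")  # prints $997,500
```

      That lands just under the "about $1M" figure, which is small next to the hardware spend the comment mentions.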

    2. Re:One question: by David+Gerard · · Score: 2, Funny

      Microsoft has announced the infrastructure for its cloud computing service Azure, formerly (and presently) Windows Vapor.

      "We want all open source innovation to happen on Windows. In practice, Windows is too slow, and just putting Linux underneath the same software stack triples performance. So we're running the Windows versions of the software on Linux using Wine."

      The new Microsoft Amazingly Open And Genuine Public License allows you complete freedom to use, modify and redistribute the software provided that every copy comes with a DVD of Windows Vista Ultimate, you acknowledge that Microsoft's FAT patent protects a remarkable and valuable innovation in computer science and all documentation is in OOXML.

      (work in progress, not yet on notnews)

      --
      http://rocknerd.co.uk
    3. Re:One question: by scientus · · Score: 2, Insightful

      yeah sorry, wrote without thinking.

      It doesn't have licences in the same way as commercial apps. Also, agreeing to the licence is not mandatory simply to use the software, unlike the presumptions made by proprietary licences. In that way its licence is very different, but I did use the wrong words.

  3. Re:whatever by AHuxley · · Score: 4, Funny

    To quote a joke on slashdot
    "Is there anything Java cannot make slow?"

    --
    Domestic spying is now "Benign Information Gathering"
  4. Mark Zuckerberg is a GEGAWNTIC DOUCHE w peach fuzz by Anonymous Coward · · Score: 2, Interesting

    Ah, the sweet ironies (and hypocrisies) in life. There's something beautifully creepy about a person fighting so hard against the same thing they fought so hard to create. In today's case, the culprit is Mark Zuckerberg, the young man more responsible than perhaps any other for his generation's obsession with displaying itself publicly on the internet. The New York Times has reported that a judge turned down Facebook's request to have "unflattering documents" about Zuckerberg removed from the website of Harvard magazine 02138.

    At the center of the issue is an article in 02138 about Facebook's evolution and the subsequent lawsuit from classmates asserting Zuckerberg stole the idea and computer source code to begin his own project. The New York Times calls the article "sympathetic to the plaintiffs's account and questions the validity of Mr. Zuckerberg's claims."

    The 02138 article also contains Zuckerberg's handwritten application to Harvard, and a journal that "contains biting comments about himself and others."

    Perhaps Gawker summarized it best, saying, "This is the same dude who made billions from a website that allows you to let everyone in your friend network know when you are peeing."

    And now he's mad that a private persona he would like to keep that way has entered the public domain. Yes, the sweet ironies and hypocrisies in life: why do we love them so much?

  5. Re:whatever by theillien · · Score: 2, Insightful

    Whatever they're doing, it's not working too well. Sure, they manage to serve the pages, but the user experience is confusing and it seems to take them forever to roll out new and improved versions.

    That has little to do with the infrastructure and more to do with the site design. Please don't blame the sys engineers/admins for the poor interface design.

  6. Re:whatever by Tubal-Cain · · Score: 4, Funny

    To paraphrase the answer:
    "Sun's stock price plummet."

  7. Yeah, Blame the Language by aoheno · · Score: 2

    PHP is the most popular language on the planet for a good reason - transaction rate.

    If code is written in any language such that the app cannot handle more than 12 transactions per second, it's time to find another profession instead of blaming the language.

    Depending on the application, PHP can handle several hundred transactions per second, on *one* machine. It is common knowledge that Java requires far more resources to achieve a typical transaction rate, than PHP.

    --
    Her lips were softer than a duck's bill, but her quacks ...
    1. Re:Yeah, Blame the Language by julesh · · Score: 4, Informative

      Depending on the application, PHP can handle several hundred transactions per second, on *one* machine. It is common knowledge that Java requires far more resources to achieve a typical transaction rate, than PHP.[citation needed]

      This is just bullshit. A Java-based server will typically require a fairly constant 64MB more RAM than an equivalent PHP server, but other than this the Java system will outperform PHP in every sense. If the content generation is even remotely complex, Java can be up to 100 times faster, which translates to a 100 times higher transaction rate.

      Sure, PHP can handle several hundred transactions per second, if your script is <?php echo "hello world"; ?>. This benchmark of a non-trivial e-commerce application shows that Java can easily handle 500 requests per second on a small 2000-era 4-cpu cluster. A modern quad-core server should be handling at least 20 times that rate, absent any improvements in Java architecture since then (and there have been many; this test was run on Java 1.1, which was hideously slow compared to modern Java versions), and ignoring the performance improvement from not having to load balance requests at the front end or access the database server across the network.

  8. Re:Come on guys by BitZtream · · Score: 2, Interesting

    While I think Facebook is nothing more than one big popularity contest, I have to agree.

    At least most of the stuff on Facebook's website works.

    With slashdot, half the time clicking on a comment to expand it doesn't work unless I refresh several times or copy and paste the link into a new browser.

    The right hand sidebar will say 'freshmeat' and show stuff from linux.com and vice versa.

    At first I thought this was because I still used IE and that was the problem, being that slashdot doesn't cater to IE users, fine. So after I switched to Chrome I figured it wouldn't be an issue, yet it's not any different.

    I still can't expect expanding a comment to work, I still get crap listed as fossfor.us showing freshmeat entries, 'get more comments' doesn't do shit half the time.

    As I've said countless times, programming in PHP and using MySQL 99% of the time means you don't know what you are doing. There are, however, those few large sites that use it that can actually justify its usage because it fits, but only if you actually know what you're doing.

    I have websites powered by PHP, ASP.NET, ASP, Java, and C. Some of those are good fits for what they do, some of them aren't and I've learned that the hard way. I've also learned that in most cases things are written because a developer 'knows' a specific language. My personal opinion is, if you only 'know' one language, you aren't a programmer. A real programmer can use just about any language given a good reference manual, and can be proficient in that language rather quickly after starting to work with it.

    Unfortunately, most people who call themselves programmers aren't. They just happen to be able to get by with a language they've been spoon fed in the past long enough to hack out some POS that barely manages to get the job done and will drive any sane programmer absolutely mad when they get stuck taking over after the original devs are found to be incompetent.

    Makes you wonder how many online services have failed because of arrogance and ignorance of the developers.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  9. Re:the blame is with management by Anonymous Coward · · Score: 5, Insightful

    That has little to do with the infrastructure and more to do with the site design. Please don't blame the sys engineers/admins for the poor interface design.

    Well, the fact that they gave a talk about their LAMP stack tells you that they consider engineering more important than site design. Furthermore, a poor choice of infrastructure makes doing good site design hard.

    And that's my point: Facebook is evidently driven by system stuff and programmers, while it should be driven by site design.

    Clearly, $MY_SPECIALTY should drive the entire system! They made a big mistake by allowing $OTHER_SPECIALTY to take precedence. Everyone knows that only $MY_SPECIALTY should dominate all design plans. Duh.

  10. Re:the blame is with management by Firehed · · Score: 4, Insightful

    If your site infrastructure is influencing how you design, you've made some sort of monolithic error along the way. Good code completely separates the content from the design. It's not like they've just hacked up a Wordpress install (which seems to go out of its way to tie content and design together) - Facebook employs hundreds if not now thousands of programmers; it's pretty safe to assume there's at least one UI/UX specialist on board as well.

    All things considered, I'd actually say that Facebook's design is pretty decent, but that's of course a matter of opinion. A lot of the code that went into that design sucks, but that's what happens when you have to support IE6. Regardless, I think it's great that they're sharing knowledge about how they've managed to use and customize an infrastructure to support 200,000,000 users, especially with the amount of traffic they have to deal with. That's well beyond the scale that many governments have to worry about!

    --
    How are sites slashdotted when nobody reads TFAs?
  11. Re:Related /. article by BitZtream · · Score: 2, Interesting

    It takes pretty much 0 work to make LAMP continue to function. It's, for all practical purposes, set it up once (properly) and forget it.

    It takes work to make the applications on top of it function continually, as that's where the change occurs. LAMP isn't going down on its own; it'll appear to 'go down' because the 'mostly useless modules' that work along with it fail, not because LAMP does.

    I would expect the admin(s) that care for 'the core LAMP platform' spend most of their time doing other stuff. In reality, there is probably more than one only to avoid any single person holding too much knowledge and to maintain coverage while that person isn't at work. I just can't imagine they do a whole lot of work 'keeping it running', with the exception of handling database growth and performance, which is more likely handled by the people who design and work with the applications that use that database.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  12. Re:Not very well by FishWithAHammer · · Score: 2, Insightful

    You do realize that you can make massive improvements to PHP perf in the space of 5-10 minutes without a recompile, right...? The idea that PHP is "slow" is FUDtastic. Of course it's slow if all you're doing is letting it interpret every time, but with APC or another caching mechanism it's interpret once, run-the-bytecode every other time. Massive speed improvements.

    --
    "You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
  13. Re:Not very well by Firehed · · Score: 4, Interesting

    PHP, as a language, is more than capable of handling four requests per second (which can be said of pretty much anything other than punch cards).

    Writing bad code in PHP, however, will of course slow things way down. Just like not having indexes on your databases, or doing stupid/unnecessary JOINs. Or not caching properly (see: Wordpress). Writing fast and efficient code in any language is easy enough provided you're a skilled programmer. Facebook, unfortunately, started off as Zuckerberg paying a friend with some web skills to build out a system, and it grew so quickly that replacing the code (or, rather, the DB schema) with something that doesn't suck probably became near-impossible. If you write code with scalability in mind, it's not a tremendous problem.

    Of course, nothing is going to cope well with the sheer volume that Facebook deals with. There's plenty you can do along the way to help yourself out, which Facebook may or may not have done. You can bet that nobody thought the site would ever have 200MM users when the first lines of code were written; they probably never expected 1% of that. Writing intelligent code is the most important part of scalability - writing smart DB queries and minimizing the number required probably being the biggest part of that. Having your MySQL servers, instead of PHP, do some calculations in queries (hashes, query-related math, etc.) usually doesn't hurt, since you're generally offloading CPU-intensive operations to a disk-bound machine (i.e., one that has spare cycles).
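
    As a rough illustration of "minimizing the number required", the sketch below fetches the same rows first with one query per id and then with a single batched query. It uses Python with an in-memory SQLite database purely as a stand-in for PHP and MySQL; the `users` table, the `query` helper and the sample data are all made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cat"), (4, "dan")],
)

query_count = 0


def query(sql, params=()):
    """Run one statement and count it, so round trips can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def names_one_by_one(ids):
    """One query per id: N round trips to the database."""
    return [query("SELECT name FROM users WHERE id = ?", (i,))[0][0] for i in ids]


def names_batched(ids):
    """A single query with an IN list: one round trip."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    rows = query(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", tuple(ids)
    )
    by_id = dict(rows)
    return [by_id[i] for i in ids]


friend_ids = [1, 3, 4]

query_count = 0
slow = names_one_by_one(friend_ids)
slow_queries = query_count

query_count = 0
fast = names_batched(friend_ids)
fast_queries = query_count

print(slow == fast, slow_queries, fast_queries)  # prints: True 3 1
```

    Both functions return the same names; the batched version just asks the database once instead of once per friend.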

    There's all sorts of tricks and optimizations. Some are language-specific, and some aren't. But making bad decisions early on is a lot harder to fix than an inefficient foreach loop.

    --
    How are sites slashdotted when nobody reads TFAs?
  14. Re:Not very well by Fweeky · · Score: 4, Interesting

    They have somewhere in the region of 5,000 servers in their main datacenter and (I believe) others scattered around the world, but restricting it to just that main center, that means each server is handling around 4 requests per second

    I somewhat doubt every single one of them is a dynamically driven webserver. Probably at least half are databases, search servers, caching servers, backend appservers, file servers, CDN type stuff, backup servers, hot spares, admin servers, staging machines, etc.

    For example: Newzbin has 5 webservers in main rotation; it also has 7 search servers (plus one development machine with similar specs), 6 database machines, 2 backend systems running most of our cronjobs, 2 admin servers, 1 web development server, and 2 systems for building and deploying OS's from. As far as load is concerned, the backend stuff is far more important than the frontend. Sure, we could rewrite the main site in Java or Scala or C++ and get away with 3 webservers and still be N+2, but trust me, those extra two or three webservers are not a significant cost next to that of development.

    I can either spend £5k on extra equipment (plus occasionally boosting our space and bandwidth costs, but those are dominated by other systems already), or I can spend £70k a year on another developer, who *still* won't allow us to match our development speed with PHP, and then rewrite tens of thousands of lines of code, likely into much more.

    Much of our backend is written in C. That's where the big payoffs for efficient languages are, not a bit of database-limited HTML rendering. Judging by how many big sites are still running PHP, Python and Ruby for their frontends, this would seem to be the case elsewhere, too.

  15. Re:Not very well by coryking · · Score: 3, Interesting

    Facebook can cache most of their content which dramatically reduces the overhead of using a scripting language

    True. But writing cache code is not easy and makes your code more brittle. It increases the likelihood that a user will interact with the website and do something, say "update my profile", only to find when they click "save" that their profile hasn't updated yet because your cache sucks. Then you have to plaster your site with bullshit messages about "please allow 30 seconds to see the change".
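
    The stale-profile problem described above can be sketched in a few lines. This is a minimal Python illustration with plain dictionaries standing in for the database and the cache; all names here (`get_profile`, `save_profile_stale`, `save_profile_invalidating`) are hypothetical, and real caching layers have more failure modes than this.

```python
# Plain dicts stand in for a database and a cache; everything is hypothetical.
database = {"alice": {"status": "at work"}}
cache = {}


def get_profile(user):
    """Read-through: serve from the cache, loading from the database on a miss."""
    if user not in cache:
        cache[user] = dict(database[user])  # cache a copy of the row
    return cache[user]


def save_profile_stale(user, fields):
    """Buggy write path: updates the database but leaves the cache alone."""
    database[user].update(fields)


def save_profile_invalidating(user, fields):
    """Write path that drops the cached copy so the next read is fresh."""
    database[user].update(fields)
    cache.pop(user, None)


get_profile("alice")  # warm the cache

save_profile_stale("alice", {"status": "at lunch"})
stale_view = get_profile("alice")["status"]  # still the old cached value

save_profile_invalidating("alice", {"status": "at home"})
fresh_view = get_profile("alice")["status"]  # reloaded from the database

print(stale_view, "|", fresh_view)  # prints: at work | at home
```

    The fix is one line here, but every write path in the application has to remember it, which is the brittleness the comment is complaining about.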

    But what is far, far, far worse is you are allocating programming resources to non-features. Caching is a non-feature that adds zero value to your website. Your users don't interact with your cache. They interact with your website--and I bet if you are like any moderately complex site, you've got all kinds of bugs that annoy the hell out of them. So rather than allocate your developer time to fixing those annoying bugs (thus adding value) or adding new features (thus adding value), you are stuck pissing away time optimizing bullshit your users never see.

    So yeah. You can cache the fuck-all out of your website. But only by stealing developer time away from working on features that make your users happy. Of course if you wrote the thing in C instead of PHP, you'd have a different set of development problems about which I could only have nightmares.

    In other words, engineering is always a tradeoff. Use PHP (and MySQL) and piss away developer time on caching the fuck around their weakness. Use a compiled language like C and piss away developer time doing fuck-if-I-know because you didn't free mallocs or had to write a template language from scratch or some insane shit like that. Pick your poison!

  16. Re:Not very well by drinkypoo · · Score: 4, Interesting

    As you say, there is a tradeoff. It doesn't matter if you're fighting the need to cache intelligently in PHP, or the need to get everything right because you're developing a complete solution in C (or whatever) or the need to interface to someone else's system for serving pages if you're using something in between. It also doesn't matter if you're using a servlet technology, or you're punching bits out on a paper tape and feeding it into a machine which converts it into EBCDIC and... you get the idea: don't fuck up.

    In any case the whole argument is fucking stupid because: PHP is not implemented in PHP. And Facebook is not implemented in pure PHP. See summary: Facebook runs a typical LAMP setup where P stands for PHP with certain customizations. At some point you have to ask yourself how many wheels you want to reinvent. If you extend PHP you can reinvent fewer wheels. I'm not sure it's the right answer, but I'm sure it's not a horribly wrong one. I'm also absolutely certain that barring some massive development in processing the future is only going to involve more parallelism and more clustering, and that if you expect PHP to scale on a single machine you're a bozo.

    What I have personally noticed about using PHP is that a single page load can consume an absolutely insane amount of memory. This problem, too, is mitigated or eliminated by aggressive use of caching. In order to cache properly you need to do something intelligent with your data store, which I think is where most people fall down. Looking into the mishmash that most CMSes produce in the db is enough to make you weep. I long for an elegant object-oriented CMS based on practically anything, but the simple truth is that PHP is by far the easiest thing to get going without spending any money and that has probably done more than anything else to propel it to the head of the FOSS class, at least in terms of popularity. A staggering number of quite excellent websites seem to be built with it as well.

    In summary, I reject the notion that PHP is a serious limiting factor for the majority of websites, and I think that most of those for whom it is have failed to understand PHP. (Not that I'm any PHP guru.) It's true that a clustered web application is significantly more complex than something which is not clustered. However, it's also [potentially] far more scalable. At some point you simply run out of machine. When you can't get anything better from Sun (AFAICT they make the single machines which can handle the most threads today) you're going to have to cluster, even if it's only to two machines. At that point you'll have far more complexity invested in having a single system image to work with and the pain of moving to a cluster will be magnified that much more as well. If you accept the notion that clustering is today and for the foreseeable future the best way to handle scalability (which I admit is at this point not a proven notion, but is at least a well-supported theory) then the idea that PHP is a major limiting factor is just plain silly. Sun is circling the drain, and everyone else is concentrating on clustering. Your call...

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  17. Re:A hodge podge mess by mfnickster · · Score: 3, Insightful

    Facebook seems to be stitched together as a set of "solution du jour" technologies without any real architecture behind it. Too many languages, frameworks and other gems.

    I was thinking the opposite - they have developed an architecture that is modular enough to allow them to develop different pieces using different technologies, yet they all work together pretty seamlessly. I'd say that's quite an accomplishment!

    --
    "Slow down, Cowboy! It has been 3 years, 7 months and 26 days since you last successfully posted a comment."
  18. Who gives a shit? by PingXao · · Score: 2, Insightful

    I guess a story on /. with only 75 comments after 7 hours pretty much answers that question, eh?

  19. Re:Both Java and PHP Are Interpreted by julesh · · Score: 3, Informative

    Both Java and PHP are interpreted languages because this is how you create a cross-platform language.

    Each gets compiled to bytecode which gets executed in an OS-specific VM.

    Java is JIT compiled to native code, whereas PHP is bytecode interpreted. The difference is more than an order of magnitude. In fact, judging by this comparison, in many cases Java is about 100 times faster than PHP.

    Frankly, most websites do not need an app server. Wikipedia uses PHP, not Java. It is not a 'simple' website that you say PHP is suited for.

    Wikipedia is presenting uncustomised content to most users. It runs a huge squid cache in front of its PHP servers. If it tried to run PHP for each user it would crawl. I run mediawiki locally on an AMD Athlon64 2200+. It takes ~0.2 seconds of 100% CPU time to process a simple request. There is simply no way Wikipedia could run without content caching.

    This is not to say that the task of serving that content is cheap. But they're doing a lot better than facebook; they're serving 30,000 requests/sec with only 350 servers. The difference, I suspect, is mostly down to the amount of caching they perform.
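
    The numbers in the last two paragraphs can be put side by side. This is rough arithmetic on the figures quoted in this comment only (30,000 requests/sec, 350 servers, ~0.2 CPU-seconds per uncached request); treating the local Athlon64 measurement as representative of a production core is an assumption made for the sake of the sketch.

```python
# Rough arithmetic on the figures quoted in this comment.
WIKIPEDIA_REQ_PER_SEC = 30_000          # quoted request rate
WIKIPEDIA_SERVERS = 350                 # quoted server count
CPU_SECONDS_PER_UNCACHED_REQUEST = 0.2  # local mediawiki measurement (assumed typical)

# Average load each server actually carries, caches included.
per_server = WIKIPEDIA_REQ_PER_SEC / WIKIPEDIA_SERVERS

# Ceiling for one core if every request ran PHP from scratch.
uncached_per_core = 1 / CPU_SECONDS_PER_UNCACHED_REQUEST

# CPU-seconds needed per wall-clock second, i.e. fully busy cores required
# to serve the whole load with no caching at all.
cores_needed_uncached = WIKIPEDIA_REQ_PER_SEC * CPU_SECONDS_PER_UNCACHED_REQUEST

print(f"requests/sec per server (with caching): {per_server:.1f}")
print(f"requests/sec per core (no caching):     {uncached_per_core:.1f}")
print(f"cores needed with no caching:           {cores_needed_uncached:.0f}")
```

    On these assumptions each server averages about 86 requests per second, while an uncached core would top out near 5, which is the gap the squid front end is covering.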

    Facebook is much less able to cache content. It doesn't have a squid front end because relatively few users see the same exact content, unlike for wikipedia; most users are logged in most of the time and see pages customised for themselves.