Slashdot Mirror


Facebook VP Slams Intel's, AMD's Chip Performance Claims

narramissic writes "In an interview on stage at GigaOm's Structure conference in San Francisco on Thursday, Jonathan Heiliger, Facebook's VP of technical operations, told Om Malik that the latest generations of server processors from Intel and AMD don't deliver the performance gains that 'they're touting in the press.' 'And we're, literally in real time right now, trying to figure out why that is,' Heiliger said. He also had some harsh words for server makers: 'You guys don't get it,' Heiliger said. 'To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.' Heiliger added that Google has done a great job designing and building its own servers for this kind of use."

83 of 370 comments

  1. You're Computin' for a Shootin' Mister by eldavojohn · · Score: 5, Insightful
    So let me get this straight, the Vice President of a web company is criticizing the hardware guys in two of the world's biggest chip makers?

    You guys don't get it

    Is it possible to take out a massive life insurance policy on Jonathan Heiliger?

    To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.

    I assure you, despite your misconception that the world revolves around you, everyone has those requirements, from the people who build supercomputers right down to the netbook I am typing on while watching Gurren Lagann.

    Can we get like a panel of hardware engineers to have a discussion with this guy and can I get some popcorn?

    --
    My work here is dung.
    1. Re:You're Computin' for a Shootin' Mister by Daengbo · · Score: 4, Informative

      Can we get like a panel of hardware engineers to have a discussion with this guy and can I get some popcorn?

      Slashdotters might want to take a look at the details of the Google servers to see what Heiliger is looking for. There's also a video tour.

    2. Re:You're Computin' for a Shootin' Mister by Frosty+Piss · · Score: 2, Insightful

      You've basically agreed with him. What's your bitch? I don't get it.

      --
      If you want news from today, you have to come back tomorrow.
    3. Re:You're Computin' for a Shootin' Mister by timmarhy · · Score: 3, Insightful
      I don't know that I agree with some of Google's design choices. Do they account for details such as 10 small batteries being more expensive than 1 large battery of the same capacity?

      And why have a PSU for each unit? Why not just run 12V power rails to each server and do the AC/DC conversion on a larger transformer further down the line, with larger batteries providing the backup for clusters of servers? After all, no PSU is cheaper than a PSU with just the 5V taken out.

      --
      If you mod me down, I will become more powerful than you can imagine....
    4. Re:You're Computin' for a Shootin' Mister by fuzzyfuzzyfungus · · Score: 4, Informative

      At low DC voltages, you can't really do long cable runs without either suffering substantial resistive losses or using cable so thick you could club a seal to death with it.
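      The trade-off is easy to put in numbers: for a fixed load the current is I = P/V and the heat lost in the cable is I²R, so cutting the feed voltage by a factor of 20 multiplies the cable loss by 400. This is a minimal sketch that treats the current as fixed at P/V (it ignores the voltage sag the loss itself causes); the 300 W load and 0.05 ohm loop resistance are hypothetical figures, not from the thread.

```python
"""Resistive loss in a supply cable at different feed voltages (simplified I^2 * R model)."""


def cable_loss_watts(load_watts: float, volts: float, loop_resistance_ohms: float) -> float:
    """Heat dissipated in the cable, assuming the load draws I = P / V."""
    current_amps = load_watts / volts
    return current_amps ** 2 * loop_resistance_ohms


if __name__ == "__main__":
    LOAD_W = 300.0      # hypothetical per-server load
    LOOP_OHMS = 0.05    # hypothetical round-trip cable resistance
    for feed_volts in (12.0, 120.0, 240.0):
        loss = cable_loss_watts(LOAD_W, feed_volts, LOOP_OHMS)
        print(f"{feed_volts:6.1f} V feed: {loss:7.3f} W lost ({100 * loss / LOAD_W:.2f}% of load)")
```

      With these assumed figures the 12 V feed wastes about a tenth of the load in the cable, while the 240 V feed wastes a few hundredths of a percent, which is why the low-voltage conversion tends to sit close to the load.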

    5. Re:You're Computin' for a Shootin' Mister by Anonymous Coward · · Score: 5, Funny

      I vote for the seal clubbing thing.

    6. Re:You're Computin' for a Shootin' Mister by mabhatter654 · · Score: 3, Informative

      I think they run AC to the row or rack of servers, then they have just one super efficient PSU powering all the servers in a rack rather than 42 separate power supplies (plus UL enclosures, connectors, extension cords, etc, etc)

      Essentially Google builds "rack-sized" blade centers... or at least is catching up to what IBM and HP are doing, but on a bigger scale: full racks or multiple racks managed at once rather than just one chassis.

      I do agree that chip makers aren't thinking "big enough" with things like their Blade lines. Google wants to see reference specs that include options for bare motherboards to slide right into your basic 42 unit rack with IO, disk and power all pulled out to the raw basics so Google can decide how to manage the bits rather than having stock OEM boards with such limited options. Google wants to manage a "rack" as a single machine and optimize power and parts across 40 servers as one group, not 40 separate little systems.

    7. Re:You're Computin' for a Shootin' Mister by node+3 · · Score: 5, Funny

      I don't know that I agree with some of Google's design choices

      I'm sure they'll get right on that, random slashdot guy...

    8. Re:You're Computin' for a Shootin' Mister by jamesh · · Score: 5, Funny

      cable so thick you could club a seal to death with it.

      What's with everyone creating new units of measure? "We're going to need some 3 Seal Cable for this job!"

    9. Re:You're Computin' for a Shootin' Mister by lukas84 · · Score: 2, Insightful

      Well, I can tell you that I do not want cheap, shitty hardware with no features as servers.

      This is all fine for companies like Facebook and Google that are primarily in the business of running IT, and wrote software that accommodates the shitty hardware they use.

      However, other applications like standard business IT require highly resilient, highly manageable hardware which offers many features, stable parts supplies and management possibilities, and is built upon sturdy hardware that can withstand non-datacenter conditions of cooling and dust.

    10. Re:You're Computin' for a Shootin' Mister by Anonymous Coward · · Score: 5, Funny

      If we're gonna start the AC vs. DC war again, I call dibs on tasering the elephant this time.

    11. Re:You're Computin' for a Shootin' Mister by rcw-home · · Score: 4, Informative

      I think they run AC to the row or rack of servers, then they have just one super efficient PSU powering all the servers in a rack rather than 42 separate power supplies (plus UL enclosures, connectors, extension cords, etc, etc)

      No, they don't. They use motherboards built to their own specification that require only 12V power. This power is supplied by the server's own PSU, which takes 240V input. The PSU hooks into a 12V sealed lead acid battery, implementing UPS functionality (there is no centralized UPS).

      I think it's a very elegant design.

    12. Re:You're Computin' for a Shootin' Mister by pseudonomous · · Score: 5, Funny

      Oh, I finally get it now, Cat 5 cable can kill five cats, and cat 6 can kill SIX cats.

    13. Re:You're Computin' for a Shootin' Mister by Ihlosi · · Score: 5, Funny
      Oh, I finally get it now, Cat 5 cable can kill five cats, and cat 6 can kill SIX cats.

      But ... but ... you'll need Cat 9 to get rid of the cat completely?

    14. Re:You're Computin' for a Shootin' Mister by metacell · · Score: 2, Insightful

      So let me get this straight, the Vice President of a web company is criticizing the hardware guys in two of the world's biggest chip makers?

      He's not criticising their technical know-how, he's criticising them for not knowing what their web company customers want.

      Since he himself is one of those customers, it's not too unlikely that he knows what he's talking about.

    15. Re:You're Computin' for a Shootin' Mister by Jurily · · Score: 4, Insightful

      When you need the cheapest, most power-efficient servers you can find, to the point where you criticize your suppliers publicly, you're not willing to pay for the most expensive cables out there.

      Besides, all the seal clubbers are buying those up.

    16. Re:You're Computin' for a Shootin' Mister by TheRaven64 · · Score: 2, Insightful

      Not to mention the fact that neither Google nor Facebook have important transactions. If, every million or so page accesses, one of their servers dies, who cares? The end user who has to hit refresh will probably blame it on their ISP.

      --
      I am TheRaven on Soylent News
    17. Re:You're Computin' for a Shootin' Mister by KibibyteBrain · · Score: 2, Insightful

      Yes, this reads like "Guy with huge ego upset that engineers can't use magic to conjure up ideal device at his command." to me.

    18. Re:You're Computin' for a Shootin' Mister by MonoSynth · · Score: 2, Funny

      But that will void your warranty!

    19. Re:You're Computin' for a Shootin' Mister by Antique+Geekmeister · · Score: 3, Informative

      You need a 'CAT5 of Nine Tails'. Google it. I've seen them made with usable sets of labeled connectors and adapters on the ends, serving as an amusing collection of adapters, actually packed in a toolbox, and amusing as heck to whip out when a client had lost the appropriate adapter to go from their funky 3Com serial jack to a normal laptop serial port, or needed a crossover cable to tie two machines directly together without a switch, back in the pre-GigE days before network ports became hermaphroditic.

    20. Re:You're Computin' for a Shootin' Mister by fuzzyfuzzyfungus · · Score: 5, Funny

      Seals actually have an excellent warranty; but the sticker says "Warranty void if seal is broken" so the warranty has never been successfully claimed.

    21. Re:You're Computin' for a Shootin' Mister by Andy+Dodd · · Score: 2, Insightful

      "Google wants to see reference specs that include options for bare motherboards to slide right into your basic 42 unit rack with IO, disk and power all pulled out to the raw basics so Google can decide how to manage the bits rather than having stock OEM boards with such limited options."

      Sounds a lot like a VME backplane...

      --
      retrorocket.o not found, launch anyway?
    22. Re:You're Computin' for a Shootin' Mister by rgviza · · Score: 2, Insightful

      That guy is an ass.

      the latest generations of server processors from Intel and AMD don't deliver the performance gains that 'they're touting in the press

      then

      Google has done a great job designing and building its own servers for this kind of use

      I wonder who makes the server processors for Google's servers. Hmmm.....

      --
      Don't kid yourself. It's the size of the regexp AND how you use it that counts.
    23. Re:You're Computin' for a Shootin' Mister by KillerBob · · Score: 2, Insightful

      Most battery UPSes upconvert the 12VDC to 120VAC to provide a standard power supply that you can plug anything into. That's because most of them run off standard boat or motorcycle 12V batteries which you can get at your local car parts store. Diesel or gasoline UPSes are electric generators and usually cost a *lot* more. They make sense for keeping an office building powered, but not for keeping just a computer or thirty up. And that's above and beyond the power losses from transmitting 12V over a distance that you mention.

      I can see right away why it'd be cheaper to simply design a system to run off 12V directly, convert to 5V internally, and put the battery right in that system. First, you don't have to pay for the electronics in a UPS which convert the battery's 12VDC to 120VAC. Second, you don't lose energy as heat in powering those electronics and spinning the fans to keep them cool, or in transmission. A much higher proportion of the battery's power gets used to actually power the computer. The electronics which do the conversion from 12VDC to 5VDC are *much* cheaper, and less power-intensive, than electronics that can increase the voltage, let alone convert it to alternating current.
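      The argument about skipping the 12VDC-to-120VAC stage comes down to multiplying efficiencies: every conversion stage in series passes on only a fraction of its input, so removing a stage helps even when each stage is individually decent. This sketch compares a hypothetical online (double-conversion) UPS path with a hypothetical 12V-only path; all stage efficiencies are illustrative assumptions, not measured or vendor figures.

```python
"""Multiply per-stage efficiencies to compare two power-delivery paths (illustrative numbers only)."""
from math import prod

# Hypothetical stage efficiencies; real values depend on the hardware and the load.
CENTRAL_UPS_PATH = {
    "UPS rectifier/charger (AC -> DC)": 0.95,
    "UPS inverter (DC -> 120 VAC)": 0.90,
    "server PSU (120 VAC -> DC rails)": 0.80,
}
TWELVE_VOLT_ONLY_PATH = {
    "server PSU (AC -> 12 VDC, battery on the 12 V rail)": 0.90,
    "on-board regulator (12 VDC -> 5 VDC)": 0.92,
}


def path_efficiency(stages: dict[str, float]) -> float:
    """End-to-end efficiency of conversion stages connected in series."""
    return prod(stages.values())


def wall_watts(load_watts: float, stages: dict[str, float]) -> float:
    """Power drawn at the wall to deliver load_watts through the given stages."""
    return load_watts / path_efficiency(stages)


if __name__ == "__main__":
    for name, path in (("central UPS", CENTRAL_UPS_PATH), ("12 V only", TWELVE_VOLT_ONLY_PATH)):
        print(f"{name:12s} efficiency {path_efficiency(path):.1%}, "
              f"wall draw for a 250 W load: {wall_watts(250.0, path):.0f} W")
```

      Swap in measured efficiencies for real hardware; the part that carries over is the structure of the comparison, a product over the stages in series.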

      Think of it this way: it's basically a laptop, only without the keyboard, screen, video card, and with 8 memory slots and dual CPUs, and provisioning for two 3.5" hard drives. The system runs directly off the battery, and the power supply just charges the battery.

      Also, with a battery in each server, adding computers to the matrix doesn't reduce the length of time that you get from the UPS. I have a media center PC that's connected to a UPS, for example. The UPS is just running the computer and the satellite receiver. In that configuration, it lasts about an hour without mains. If I were to plug the TV into it, it'd last about 25 minutes. While you're operating on a *much* larger scale, the same would hold true for a centralized UPS. Each system you add reduces the overall effectiveness of the UPS by reducing the amount of time it can power the works without mains. By putting the battery directly on the server, you can add computers without diminishing this capacity. Your computing capacity in the event of power interruption scales up linearly, rather than hitting diminishing returns and a theoretical maximum limit.
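      The runtime argument can be modelled very simply if you ignore battery chemistry effects (Peukert's law, ageing) and conversion losses: runtime is stored energy divided by load. A shared UPS splits a fixed energy store across every server you add, while a battery per server grows the energy store with the fleet. The capacities and loads below are hypothetical.

```python
"""Idealised UPS runtime: shared battery vs. one battery per server (no losses, no Peukert effect)."""


def shared_ups_runtime_hours(ups_watt_hours: float, server_watts: float, n_servers: int) -> float:
    """A single UPS feeding n servers: the same energy is split across a growing load."""
    return ups_watt_hours / (server_watts * n_servers)


def per_server_runtime_hours(battery_watt_hours: float, server_watts: float) -> float:
    """Each server carries its own battery, so runtime does not depend on fleet size."""
    return battery_watt_hours / server_watts


if __name__ == "__main__":
    SERVER_W = 250.0          # hypothetical draw per server
    SHARED_UPS_WH = 10_000.0  # hypothetical shared UPS capacity
    PER_SERVER_WH = 250.0     # hypothetical on-board battery capacity
    for n in (10, 20, 40, 80):
        shared = shared_ups_runtime_hours(SHARED_UPS_WH, SERVER_W, n)
        local = per_server_runtime_hours(PER_SERVER_WH, SERVER_W)
        print(f"{n:3d} servers: shared UPS {shared:5.2f} h, per-server batteries {local:5.2f} h")
```

      The flip side, raised earlier in the thread, is cost: many small batteries may cost more than one large battery of the same total capacity.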

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    24. Re:You're Computin' for a Shootin' Mister by DragonWriter · · Score: 2, Insightful

      So let me get this straight, the Vice President of a web company is criticizing the hardware guys in two of the world's biggest chip makers?

      Wrong.

      He is criticizing, in the bits in TFS, two groups:
      1) The marketing guys in two of the world's biggest chip makers (he's not complaining that the chips are flawed from an engineering perspective, he is complaining about the claims about the performance of the chips, which apparently conflict with Facebook's experience in testing them), and
      2) The people setting the design goals (not, again, the engineers) at the companies making servers, complaining that they are doing a bad job of what he sees as a major need (which is, of course, also the particular thing that Facebook needs), and that Google does a better job of building servers for that need (a complaint which would be more effective at changing behavior at server manufacturers if it was followed up by Facebook going to Google to get Google to build them servers.)

      Can we get like a panel of hardware engineers to have a discussion with this guy and can I get some popcorn?

      Why? His complaints aren't directed at engineers.

  2. Hm... by Darkness404 · · Score: 3, Insightful

    To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.

    Hm, let's see... perhaps because Facebook and Amazon are niche markets? The average server isn't going to even need all the computing horsepower, and the power efficiency is simply a drop in the bucket for most companies' electrical bills. The average server is going to be much more I/O intensive than CPU intensive unless you do cluster computing or render a lot of stuff. The average server, such as a web server or a file server, doesn't use that much CPU, and usually you are running 1-3 servers, not the hundreds that Facebook or Amazon would run.

    And really, why is a VP complaining about this stuff? That he can't either afford custom solutions or spend the money buying more servers?

    --
    Taxation is legalized theft, no more, no less.
    1. Re:Hm... by Darkness404 · · Score: 2, Insightful

      Niche as in, only a few companies (~100) are going to need the same solutions. On the other hand the vast majority of servers will be for much, much, much less intense use. Then you have the problem that really Facebook isn't super profitable, Amazon is but they seem to be doing decent with their servers and have the spare cash to simply upgrade them. I mean, other than a few websites who needs a "perfect" server?

      --
      Taxation is legalized theft, no more, no less.
    2. Re:Hm... by Quothz · · Score: 2, Informative

      Hm, let's see... perhaps because Facebook and Amazon are niche markets?

      -Maybe-. Even if they are a niche market, they're a big enough one to hold the attention of the big chipmakers.

      A traditional business model might use large orders, especially advance orders, to offset or defray the cost of setting up a production line or facility, and get most of the profit from smaller sales. Or they may choose only to do production runs for large, inherently profitable orders. Even in a firing-from-the-hip model, large customers cost less per unit in marketing and sales than do smaller ones, very much so when compared to the general public. And of course there's plenty of wiggle room between extremes. So depending on the diversity of the market and the choice of business model, big customers range from important to desirable. Naturally, in a niche market large customers have a greater importance, since smaller sales are fewer.

      Presumably, AMD and Intel are selling server processors to the likes of Amazon and Facebook 'cause they think it's profitable. If it is a niche market, keeping those guys happy is paramount to profitability.

      (I don't think the server farm market is really a niche, tho'. But I dunno; I don't keep up with such things.)

      And really, why is a VP complaining about this stuff? That he can't either afford custom solutions or spend the money buying more servers?

      Well, because we asked. Well, not "we" as such, but someone asked him and he answered. It sounds like he was answering honestly and openly. I've no problem with that.

    3. Re:Hm... by HockeyPuck · · Score: 5, Informative

      The average server is going to be much more I/O intensive than CPU intensive unless you do cluster computing or render a lot of stuff.

      As someone who designs and deploys large storage environments for a living, I call BS. While the current generation of HBAs is 8Gb Fibre Channel, I would say that the "average server" (as you put it) could happily live on a 1Gb HBA. Recall that almost all servers, or at least those you care about, have DUAL HBA connections to their respective storage. So that's actually 2Gb of storage connectivity. Sure, there are servers which have multiple HBAs, or use a higher utilization of the HBAs, such as database servers or backup/media servers. Most servers today are deployed with dual 4Gb HBAs, as the 8Gb SFPs/optics are still quite pricey, and you cannot, in all seriousness, purchase 1 or 2Gb FC HBAs.

      Even as we deploy VMware based servers, the VMware servers themselves tend to be more memory/cpu strapped than IO.

      It would be very rare, or almost impossible, for a server to be driving line-rate HBAs with still plenty of headroom left in the CPU. Even basic test tools like IOmeter require significant CPU usage to drive an HBA to capacity. And that is when it's writing/reading all zeros. It doesn't actually need to do anything with the data, as would be the case if a database server was requesting 2Gb/s from a disk array and then had to join/sort/add/whatever the tables retrieved.

    4. Re:Hm... by drsmithy · · Score: 2, Informative

      As someone who designs and deploys large storage environments for a living,

      Then you should know that throughput is not the only (or - typically - the most important) measure of IO performance.

      Typical computing tasks tend to be I/O bound - specifically by random I/O performance. To a large degree, this is due to the massive disparity in performance improvements between CPUs and storage.
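      A back-of-the-envelope estimate shows why random I/O, rather than raw throughput, is usually the binding constraint for a spinning disk: each random request costs roughly an average seek plus half a rotation, no matter how fast the platters stream data sequentially. This sketch ignores transfer time, queuing and caching; the 8.5 ms seek, 7200 rpm and 4 KiB request size are assumptions for illustration.

```python
"""Rough random-I/O capability of a single spinning disk (ignores transfer time, queuing and cache)."""


def random_iops(avg_seek_ms: float, rpm: float) -> float:
    """Requests per second if each one costs an average seek plus half a rotation."""
    half_rotation_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + half_rotation_ms)


def random_throughput_mb_per_s(iops: float, request_bytes: int) -> float:
    """Throughput achieved when every request is a small random read."""
    return iops * request_bytes / 1_000_000


if __name__ == "__main__":
    iops = random_iops(avg_seek_ms=8.5, rpm=7200)  # assumed drive parameters
    mb_s = random_throughput_mb_per_s(iops, request_bytes=4096)
    print(f"~{iops:.0f} random IOPS, ~{mb_s:.2f} MB/s at 4 KiB per request")
```

      Under these assumptions that is roughly 79 requests per second, or about a third of a megabyte per second: a few hundred times lower than sequential streaming rates on the order of 100 MB/s.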

    5. Re:Hm... by ezzzD55J · · Score: 2, Insightful

      Kind of depends on how you read 'niche.' Yes, there is a relatively small number of companies (customers) that have such requirements, but if each of them has a massive, massive number of servers, then I wouldn't call that niche any more, because it still represents a large turnover.

  3. Facebook's application is poorly coded by jsimon12 · · Score: 3, Insightful

    I have heard from some reliable sources that Facebook's and Twitter's backend applications are poorly written.

    Are Intel's and AMD's claims overblown? Sure; what hardware manufacturer doesn't cherry-pick performance claims?

    But I don't care what sort of hardware you throw at crap code; you are always going to get crap performance.

    1. Re:Facebook's application is poorly coded by royallthefourth · · Score: 3, Insightful

      Crap code on faster computer is still going to be faster than it was on a slower computer. He's not saying anything about how efficient their software is, just that buying new processors didn't get him the performance delta that it was supposed to. More advanced hardware should deliver a performance benefit no matter what is running on it.

    2. Re:Facebook's application is poorly coded by corychristison · · Score: 2, Insightful

      More advanced hardware should deliver a performance benefit no matter what is running on it.

      Not if your code is not tuned for this new "advanced hardware". Surely there are new compile flags to consider, and if you are not tuning your code for the new processor features it could very well be slower than before.

    3. Re:Facebook's application is poorly coded by evanbd · · Score: 3, Insightful

      Developers have been known to trade off performance for development ease. Frequently the result is crap code. Yes, it performs like crap on both sets of processors. But if the application is CPU-limited (rather than IO or memory or...), then throwing faster CPUs at it ought to make it proportionally faster, no? Obviously they thought the previous performance was acceptable, is it unreasonable to think that buying CPUs marketed as 50% faster should give a 50% performance increase? Clearly crap code will still run like crap, but you ought to be able to throw more CPU power at it and get 150% of crap performance.
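      The "proportionally faster" expectation holds only for the part of the run time that is actually CPU-bound. Amdahl's law makes that precise: if a fraction f of the time is spent on work that a new CPU speeds up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The fractions below are hypothetical.

```python
"""Amdahl's law: overall speedup when only part of the run time benefits from a faster CPU."""


def amdahl_speedup(cpu_bound_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup given the CPU-bound share of run time and how much faster the CPU is."""
    if not 0.0 <= cpu_bound_fraction <= 1.0:
        raise ValueError("cpu_bound_fraction must be between 0 and 1")
    if cpu_speedup <= 0.0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_bound_fraction) + cpu_bound_fraction / cpu_speedup)


if __name__ == "__main__":
    # A CPU marketed as 50% faster (s = 1.5), applied to workloads with different CPU-bound shares.
    for fraction in (1.0, 0.8, 0.5, 0.2):
        print(f"{fraction:.0%} CPU-bound -> {amdahl_speedup(fraction, 1.5):.2f}x overall")
```

      Only the fully CPU-bound case gets the full 1.5x; memory stalls, I/O waits and lock contention sit in the (1 - f) term and do not shrink.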

    4. Re:Facebook's application is poorly coded by hidden · · Score: 5, Informative

      Facebook is written in PHP; there are no compile flags.

      Apache and the PHP engine have plenty of compile flags, not to mention whatever the database is.

    5. Re:Facebook's application is poorly coded by Necroman · · Score: 4, Interesting

      One of the server techs from Twitter was at SXSW two years ago and gave some details about how their backend servers worked. If I remember correctly (there were four sites on the panel, so I may be confusing them with someone else), the original code was written in Ruby on Rails, which did not scale well to the multi-server systems that they had set up. They have spent a lot of time improving their code over the years, but from what I could tell, their initial implementation wasn't the most thought-out thing in the world.

      --
      Its not what it is, its something else.
    6. Re:Facebook's application is poorly coded by Stormie · · Score: 4, Interesting

      I have heard from some reliable sources that Facebook and Twitter's backend applications are poorly written.

      Given the quality of Facebook's developer API (it's horrible), I'd be amazed if the back-end of the actual site wasn't poorly written.

    7. Re:Facebook's application is poorly coded by Samah · · Score: 2, Insightful

      Facebook is written in PHP

      There's your problem right there... ;)

      --
      Homonyms are fun!
      You're driving your car, but they're riding their bikes there.
    8. Re:Facebook's application is poorly coded by cowbutt · · Score: 4, Informative
      Essentially our disks are no faster than they were 3 years ago, or even 5 years ago

      # hdparm -Tt /dev/sdc

      /dev/sdc:
      Timing cached reads: 5120 MB in 2.00 seconds = 2562.04 MB/sec
      Timing buffered disk reads: 84 MB in 3.02 seconds = 27.77 MB/sec
      # hdparm -i /dev/sdc | grep Model
      Model=ST3200822A, FwRev=3.01, SerialNo=xxxxxx
      # hdparm -Tt /dev/sda

      /dev/sda:
      Timing cached reads: 6078 MB in 1.99 seconds = 3052.95 MB/sec
      Timing buffered disk reads: 338 MB in 3.01 seconds = 112.22 MB/sec
      # hdparm -i /dev/sda | grep Model
      Model=ST31000333AS, FwRev=SD1B, SerialNo=xxxxxx

      It's not even a full order of magnitude faster, but at 112MB/s it is still about four times faster. And these are both magnetic discs, rather than SSDs.

    9. Re:Facebook's application is poorly coded by gbjbaanb · · Score: 2, Insightful

      throwing proportionally faster CPUs at *good* code should make it proportionally faster.

      Crap code... probably not. For example, I once had to improve the performance of an app. The app spent most of its time context switching from one thread to another: more time was taken up stopping a thread, switching to another, refilling the cache lines, and so on than was spent processing the data! Think what a faster processor would do here - the CPU would process the little bit of data it was given faster, thus providing much more CPU time for context switching.

      Similarly with other aspects of modern code - relatively little of it is spent spinning CPU cycles. I'd say more was spent dealing with memory IO (as there is a lot of RAM used nowadays, getting that data to and from the CPU is, relatively speaking, slow as treacle) so it wouldn't matter if you could crunch the data faster if you still had to wait for it to be delivered to you.

      Then we put more stuff on the network, and connect to it via Web services and the like, and the amount of CPU power required gets less and less relevant.

      I'd say the single best thing you can do to get good performance, and therefore energy efficiency, and cheapness of resources is to write efficient code that requires little resources itself. Even if it takes you longer to do the job, tough on you - there's just you as a programmer but millions of users, the extra time spent developing at a lower level (instead of pointy-clicking in the IDE) is time well spent.

      If Facebook's code could be made 10% more efficient, they'd require 10% fewer servers, with all the reduced energy bill that entails. But the Facebook chap doesn't care about that - that'd cost him programmer time, and that costs short-term money! Far better for him to whinge that Intel and AMD aren't fixing his shit for him instead.
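      The "10% more efficient, 10% fewer servers" estimate holds if the fleet is sized by CPU time and "efficient" means less CPU per request. A minimal capacity-planning sketch under those assumptions; the request rate, CPU cost, core count and utilization target are all hypothetical.

```python
"""Size a CPU-bound web tier: servers needed for a given request rate (hypothetical figures)."""
import math


def servers_needed(requests_per_s: float, cpu_ms_per_request: float,
                   cores_per_server: int, target_utilization: float = 0.7) -> int:
    """Smallest server count whose usable CPU time covers the demand."""
    demand_cpu_ms_per_s = requests_per_s * cpu_ms_per_request
    # Each core supplies 1000 ms of CPU time per second; keep headroom via target_utilization.
    supply_cpu_ms_per_s = cores_per_server * 1000.0 * target_utilization
    return math.ceil(demand_cpu_ms_per_s / supply_cpu_ms_per_s)


if __name__ == "__main__":
    before = servers_needed(100_000, 20.0, cores_per_server=8)
    after = servers_needed(100_000, 18.0, cores_per_server=8)  # 10% less CPU per request
    print(f"{before} servers before, {after} after ({1 - after / before:.1%} fewer)")
```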

    10. Re:Facebook's application is poorly coded by Anonymous Coward · · Score: 2, Insightful

      Not "unreasonable", but possibly naive and inexperienced, depending on the details.

      Crap code that bottlenecks a CPU often will not scale as well as good code. It involves bad synchronization, other contention, spinning loops, and memory bandwidth limits. It is often NUMA-unfriendly. It often interacts poorly with the other resources, such as I/O.

    11. Re:Facebook's application is poorly coded by jjgm · · Score: 2, Informative

      That may be so. The new drive may indeed have four times the raw read throughput. But how much larger are they? Five times.

      And even more tellingly, look at the seek performance. I looked up those two drives you mentioned. You'll find it's unchanged at 8.5ms. So we're seeking at the same speed, for more data.

      In practice, then, in terms of throughput per provisioned GB, we are about 19% worse off (the older drives deliver about 24% more per GB), and in terms of seek capacity per megabyte stored we are FIVE times worse off today!

      To illustrate what I mean, based on those numbers above: slurping 10TB off an idealised JBOD array of those newer drives (ten of them, read in parallel) would take about 8,900 seconds; slurping 10TB off an idealised array of the older drives (fifty of them) in parallel would take only about 7,200 seconds. A similar (but far worse) story applies to random seek time performance, especially for busy transaction systems.

      One might challenge the exact figures, but it doesn't matter - the point is, drive size is an important gotcha in storage performance optimisation today, and it's because performance has not really kept pace with drive size. The issue is not offset by the bigger caches they're turning up with, although that helps for some workloads.

      We haven't talked dollars. The cost is important, but that's another dimension. Let's keep this to engineering chatter.

      So what happens in shops that need really high performance? Well, if it's an application with lots of random reads but with hotspots, then cache will do nicely. But for raw random write performance i.e. the heavy transaction processing applications, it's gotta be more 15K RPM spindles at lower capacity. Or go crazy and solid state, but that's another party.
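      The throughput-per-GB and full-array read-time arithmetic in this comment can be reproduced from the hdparm rates quoted in the thread (27.77 MB/s and 112.22 MB/s). The 200 GB and 1000 GB capacities match the comment's "five times" ratio and are assumptions about those models; the sketch also assumes an idealised array in which every drive streams at its hdparm rate in parallel, with no controller or bus limit.

```python
"""Throughput per provisioned GB and time to read a full array (idealised, sequential)."""
import math

# Sequential rates from the hdparm output quoted in the thread; capacities assumed from the models.
DRIVES = {
    "ST3200822A": {"capacity_gb": 200, "seq_mb_per_s": 27.77},
    "ST31000333AS": {"capacity_gb": 1000, "seq_mb_per_s": 112.22},
}


def mb_per_s_per_gb(drive: dict) -> float:
    """Sequential throughput available per GB of capacity."""
    return drive["seq_mb_per_s"] / drive["capacity_gb"]


def array_read_seconds(drive: dict, array_tb: float) -> tuple[int, float]:
    """Drives needed to hold array_tb, and the time to read all of them in parallel."""
    n_drives = math.ceil(array_tb * 1000 / drive["capacity_gb"])
    seconds = drive["capacity_gb"] * 1000 / drive["seq_mb_per_s"]  # every drive reads its full capacity
    return n_drives, seconds


if __name__ == "__main__":
    for model, drive in DRIVES.items():
        n, secs = array_read_seconds(drive, array_tb=10)
        print(f"{model}: {mb_per_s_per_gb(drive):.4f} MB/s per GB, "
              f"{n} drives read 10 TB in ~{secs:,.0f} s ({secs / 3600:.1f} h)")
```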

  4. Well I suppose... by cptnapalm · · Score: 3, Funny

    Well, I suppose that if he does not like the offerings from Intel and AMD, they could always go with...

    Uh..

    Oh.

    1. Re:Well I suppose... by the+linux+geek · · Score: 5, Informative

      Let's see... IBM, Sun, Fujitsu, Itanium (yeah, it's still Intel, but has great performance)... All of these can offer equivalent or much better performance at these kinds of applications than what they're using. Don't bitch if you're not willing to consider the alternatives.

    2. Re:Well I suppose... by kzieli · · Score: 3, Informative
      There are actually two separate points here:
      1. The latest CPUs don't seem to be any better in practice than the previous models.
      2. Server OEMs are not delivering power-efficient servers.

      The two points are somewhat independent of each other. The second, I suspect, is due to there being no standard for power-efficient servers. Google did it by running single-voltage power supplies. A standard around something like this would be useful, and not just for servers, I suspect.

      --
      read my mind at http://the-willows.blogspot.com/
    3. Re:Well I suppose... by Trixter · · Score: 3, Interesting

      I was just going to say that. If Facebook et al. are not looking at the Sun CoolThreads servers, they're idiots. A T5240 delivers a whopping 128 hardware threads in 2U of rackspace.

    4. Re:Well I suppose... by fishbowl · · Score: 2, Informative

      >None of these offer much better performance. None.

      There are IBM and Sun systems that are in an entirely different league, in terms of IO and memory bandwidth, than any Intel- or AMD-flavored system.

      --
      -fb Everything not expressly forbidden is now mandatory.
    5. Re:Well I suppose... by drsmithy · · Score: 2, Informative

      POWER6 absolutely ass-rapes Nehalem. Period. 4.7GHz (clocked up to 6GHz internally), faster per-cycle than any x86 processor currently on the market.

      According to the SPEC CPU2006 benchmarks, a 3.33GHz Nehalem provides nearly identical performance to a 5GHz POWER6 (@ 8 cores each).

    6. Re:Well I suppose... by asaul · · Score: 3, Insightful

      Really? What sort of test was it?

      We took a Java application off an E6900 that was using 35% of 48 1.35GHz US-IV cores. When we put it on a T5240 with 16 1.4GHz cores, it used only 14% of the machine, and user response time improved.

      We also ran a database benchmark for some tests we were running between some AIX and Linux boxes and threw it against a T5240 running Oracle 11g for comparison. Because it was a predominantly single-threaded operation, it ran slower than the 2.2GHz POWER5 LPAR, but the overall difference was about the same ratio as the difference in clock speeds. The thing to note was that the machine was only a few percent utilised, so we could have run another 16 or so instances and coped easily.

      These machines are workhorses. Granted, you need the right workload but highly parallel/highly transactional work like java web applications or web serving absolutely fly.

      --
      "If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
  5. Something about his argument doesn't work by joeflies · · Score: 5, Insightful

    1) Facebook & Amazon need cheap, power efficient systems
    2) Intel and AMD aren't measuring up with processors to power these systems
    3) However, Google has systems appropriate for this use (presumably using Intel or AMD processors)

    If that's his argument, then it would seem that the real conclusion is that Facebook can't build systems as good as Google's, even though they are using the same processor technology.

    1. Re:Something about his argument doesn't work by joeflies · · Score: 2, Insightful

      In addition, there seems to be something else wrong with his argument:

      "To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap"

      Which he later follows up with the following insight:

      "There's a pretty simple answer for scaling infrastructure. It's, 'Don't be cheap,'"

      So which one is it?

    2. Re:Something about his argument doesn't work by blackraven14250 · · Score: 2, Informative

      Server Cheapness != Data Center Cheapness
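      To illustrate the distinction, here is a rough Python sketch that adds lifetime electricity (scaled by an assumed facility PUE) to the purchase price. Every figure in it (prices, wattages, $0.10/kWh, PUE 1.5, a three-year life) is a hypothetical placeholder, not a number from Heiliger or any vendor.

```python
# Hypothetical TCO comparison: purchase price plus lifetime electricity.
# All numbers below are illustrative assumptions, not vendor or Facebook data.

HOURS_PER_YEAR = 24 * 365


def lifetime_cost(price_usd, watts, years=3, usd_per_kwh=0.10, pue=1.5):
    """Purchase price plus electricity over the server's service life.

    PUE (power usage effectiveness) scales IT load up to total facility
    load, covering cooling and power-distribution overhead.
    """
    kwh = watts / 1000 * HOURS_PER_YEAR * years * pue
    return price_usd + kwh * usd_per_kwh


def cost_per_unit_work(price_usd, watts, throughput, **kwargs):
    """Lifetime cost divided by relative throughput (work units)."""
    return lifetime_cost(price_usd, watts, **kwargs) / throughput


if __name__ == "__main__":
    cheap = cost_per_unit_work(price_usd=2000, watts=350, throughput=1.0)
    efficient = cost_per_unit_work(price_usd=2400, watts=180, throughput=1.0)
    print(f"cheap box:     ${cheap:,.0f} per unit of work")
    print(f"efficient box: ${efficient:,.0f} per unit of work")
```

      With these made-up inputs, the pricier but lower-wattage box comes out cheaper over three years; different assumptions can easily flip the result.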

    3. Re:Something about his argument doesn't work by Trepidity · · Score: 2, Insightful

      If that's his argument, then it would seem that the real conclusion is that Facebook can't build systems as good as Google's, even though they are using the same processor technology.

      Google does have approximately 30x as many employees as Facebook, so it's not implausible that they've got a much greater ability to build in-house custom tech.

  6. PHP by Anonymous Coward · · Score: 3, Interesting

    'And we're, literally in real time right now, trying to figure out why that is,' Heiliger said.

    It's because your shitty website doesn't have a single line of compiled code. PHP only goes so far.

    1. Re:PHP by afidel · · Score: 4, Interesting

      Yeah, this. Most of us don't have too much trouble wringing performance out of x64 processors when we need to. He wants a miracle of hardware he can throw at poor code, which is NOT what Google asks for. Google simply wants to wring every last flop/dollar (TCO) out of its systems, which is slightly more than most of us need (the cost of engineering Google-type solutions is more than 99.9+% of shops could recoup through improved efficiency).

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    2. Re:PHP by MightyMartian · · Score: 2, Insightful

      Exactly. All these interpreted languages, even with some special tricks, will have serious scalability issues. At some point you have to look at the application and ask some serious questions.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
  7. Re:WTF? by Zebai · · Score: 5, Insightful

    I agree; I think he was writing his own resignation with this crap. The guy is basically telling everyone that he is incapable of finding an acceptable solution for his company, and blaming Intel and AMD because he has committed a great deal of money to something he didn't plan well enough to know exactly what the long-term costs vs. performance were. In the very article he says not to be cheap, but in many more words than necessary, probably to disguise what he is saying, like most politicians do: that they were not only too cheap, but made bad decisions about what to be cheap with. It's as if he's already in public office: he's telling everyone he screwed up and why he screwed up, while trying to make it look like he's teaching everyone a lesson so that his mistake seems less of a disaster.

  8. Sun.... by Fallen+Kell · · Score: 2, Insightful

    It's the next logical solution... Those T5440 servers with 256 processing threads are MONSTERS at handling simultaneous connections, which makes them very good web servers, database servers, and file servers, all of which means they are very good for a company whose product is a website.

    --
    We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
    1. Re:Sun.... by Temkin · · Score: 2, Funny

      Sun?.... you are crazy to go with sun and there platform now.. its all dead now..

      Larry? Is that you?

    2. Re:Sun.... by Fallen+Kell · · Score: 2, Interesting

      Not really. What is dead is Rock. CoolThreads is here to stay, especially now that Oracle bought them. Niagara Falls is the fastest single-server Oracle platform currently in existence. Oracle is going to continue to build on that platform, with most speculation being that they are going to release a "black box" Oracle solution: a drop-in, connect-power-and-network, turnkey system, eliminating the need for the purchasing company to have system admins with enough Oracle training to know how to properly set up a server to run the database. Oracle will then sell "support" for the systems on a tiered basis. It will most likely be based on the same platform as the Sun Unified Storage System line, like the 7410, even available with Oracle RAC as an option.

      --
      We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
  9. Sounds like a bunch of excuses to me by stox · · Score: 4, Insightful

    Assuming that a solution was properly engineered, this should not have been a surprise.

    Cheap, power efficient, performance. Pick two.

    --
    "To those who are overly cautious, everything is impossible. "
    1. Re:Sounds like a bunch of excuses to me by rossifer · · Score: 2, Interesting

      Assuming that a solution was properly engineered, this should not have been a surprise.

      Cheap, power efficient, performance. Pick two.

      Actually, Google got all three of those in their system-level design (when cheap is measured per CPU). What they didn't get was per-CPU reliability, which is pretty miserable by the standards of commercial servers. Luckily, all Google software is architected, designed, and written to work around frequent hardware failures, so that's ultimately covered.
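      Working around hardware failure in software usually means treating any single machine as disposable. Below is a minimal Python sketch of client-side failover across replicas; the replica names and the `fetch` callable are hypothetical, and this is a generic pattern, not Google's actual code.

```python
import random


class ReplicaUnavailable(Exception):
    """Raised by a fetch function when a replica cannot serve the request."""


def fetch_with_failover(key, replicas, fetch, rng=random):
    """Try replicas in random order and return the first successful result.

    `fetch(replica, key)` is a hypothetical callable that either returns a
    value or raises ReplicaUnavailable. Shuffling spreads load so that one
    flaky machine is not always tried first.
    """
    order = list(replicas)
    rng.shuffle(order)
    errors = []
    for replica in order:
        try:
            return fetch(replica, key)
        except ReplicaUnavailable as exc:
            errors.append((replica, exc))  # remember why, then move on
    raise RuntimeError(f"all {len(order)} replicas failed for {key!r}: {errors}")


if __name__ == "__main__":
    dead = {"node-a", "node-c"}  # pretend two cheap boxes have died

    def fake_fetch(replica, key):
        if replica in dead:
            raise ReplicaUnavailable(replica)
        return f"{key} from {replica}"

    print(fetch_with_failover("user:42", ["node-a", "node-b", "node-c"], fake_fetch))
```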

  10. A Familiar Tune from Facebook by 1sockchuck · · Score: 3, Interesting

    This is becoming an annual event for Heiliger, who also complained about server vendors at GigaOm's Structure 08 conference last year. Facebook used to buy a lot of cloud-optimized gear from Rackable/SGI, but no longer appears on the list of their largest customers. Makes you wonder if they're not going to follow Google's lead and build their own servers.

  11. Rub a lamp, Heiliger by SeaFox · · Score: 4, Insightful

    'You guys don't get it,' Heiliger said. 'To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.'

    NEWSFLASH! Customers are tightwads.

    Performance/Reliability/Price.

    Pick any two, Heiliger.

  12. Re:WTF? by Spit · · Score: 4, Interesting

    Looks like that to me; he scoped for cheap and cheerful and got bitten on the ass when he realised that sometimes you get what you pay for. What's the point of having a quad-core server CPU without the high-bandwidth buses of server-grade hardware?

    In the concurrent DNS/Kaminsky thread, I saw a reference saying that Facebook's DNS TTL is low. A quick investigation reveals that they have a 30-second TTL and are using DNS round-robin for their load balancing.
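    For anyone unfamiliar with the technique: DNS round-robin publishes several A records for one name and rotates their order, while the TTL bounds how long a caching resolver keeps reusing one answer. The toy Python model below is a hypothetical illustration (documentation-range IPs, clients that always take the first record, one caching resolver), not how any particular DNS server is implemented.

```python
from collections import Counter


class RoundRobinAuthority:
    """Authoritative server that rotates its A records on every query."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.offset = 0

    def query(self):
        n = len(self.addresses)
        answer = [self.addresses[(self.offset + i) % n] for i in range(n)]
        self.offset = (self.offset + 1) % n
        return answer


class CachingResolver:
    """Resolver that reuses a cached answer until its TTL expires."""

    def __init__(self, authority, ttl):
        self.authority = authority
        self.ttl = ttl
        self.cached = None
        self.expires = float("-inf")

    def resolve(self, now):
        if now >= self.expires:
            self.cached = self.authority.query()
            self.expires = now + self.ttl
        return self.cached


def simulate(ttl, seconds, requests_per_second=1):
    """Count which backend each request lands on (clients use record 0)."""
    authority = RoundRobinAuthority(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
    resolver = CachingResolver(authority, ttl)
    hits = Counter()
    for t in range(seconds):
        for _ in range(requests_per_second):
            hits[resolver.resolve(t)[0]] += 1
    return hits


if __name__ == "__main__":
    for ttl in (30, 3600):
        print(ttl, dict(simulate(ttl, seconds=300)))
```

    In this model, every client behind one resolver hits the same backend until the TTL runs out, so a short TTL lets the rotation take effect sooner, at the price of more DNS queries.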

    He's nothing but a blame-shifting cretin.

    --
    POKE 36879,8
  13. Re:WTF? by cryogenix · · Score: 4, Insightful

    I think we read different articles. He's not saying he didn't plan well enough; he's saying that Intel and AMD promise that the Gen Y processor is 35% faster than the Gen X processor, and he's not seeing anywhere near 35% in real-world performance. The 35% is a made-up number, but it doesn't matter what number they claim. He's probably correct. Manufacturers pull this stuff all the time. Look at the recent articles on battery-life claims for notebooks. AMD came out and called BS on the whole thing and basically said that if you guys don't stop lying through your teeth, the FTC is going to regulate us. From the perspective you are taking, we would have to call AMD incompetent for not understanding how batteries work and not properly selecting them.

    Manufacturers ALWAYS overstate claims for computer-related products. CRT actual inches vs. viewable inches (thank you, LCDs, for finally being honest... about inches anyway; brightness and contrast, however...). Computer speaker wattage being 1/2 or 1/4 of what's claimed. Power supply efficiency or wattage not measuring up to claims... you name it. He's calling out what he sees as bogus claims based on real-world experience in a demanding environment, the type of environment where one is always looking for better performance. I think we should get more information before declaring him to be the problem, as I'm sure he has a whole team of people working on this issue.

    What I'd like to see from him is some numbers. On this Intel (or AMD) rig, we get so many operations per hour/minute/whatever. On this new Intel (or AMD) rig, which they claim is 20% faster than the previous rig, we're only seeing this number of operations per hour, which amounts to only a 7% gain, so we're seeing 13 percentage points less than they are claiming. Again, numbers made up for example's sake. I'd also be very interested in what a typical rig of theirs looks like: X processor, Y RAM, what type of storage system it's connected to, etc. I think such numbers are vital to understanding the issues at hand. We all know that vendors will run the benchmarks that make their stuff look the best, and that is often not reflective of real-world performance. If I were Intel/AMD, I'd be chiming in right about now, opening a dialog with Facebook and looking to see what the issues are. Facebook is a big customer with huge name recognition, and you want to be able to use them in your marketing as an example of your solution delivering the promised performance. I'm going to assume (I know, I know) that they are already working with the server vendor to try to see what's going on here.
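    The comparison being asked for is simple arithmetic. Here is a Python sketch using the comment's own made-up 20% claim and 7% result; the ops-per-hour values are invented to match those figures, and the shortfall comes out in percentage points.

```python
def percent_gain(old_ops, new_ops):
    """Percentage throughput improvement of the new rig over the old one."""
    return (new_ops / old_ops - 1) * 100


def shortfall(claimed_pct, old_ops, new_ops):
    """Gap, in percentage points, between the claimed and measured gain."""
    return claimed_pct - percent_gain(old_ops, new_ops)


if __name__ == "__main__":
    old_rig, new_rig = 100_000, 107_000  # hypothetical operations per hour
    print(f"measured gain: {percent_gain(old_rig, new_rig):.1f}%")
    print(f"shortfall vs 20% claim: {shortfall(20, old_rig, new_rig):.1f} points")
```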

  14. Surely that's obvious by grahamsz · · Score: 3, Informative

    They collect a large amount of data on people and mine that for marketing information to turn around and target those same users.

    It's the same model as google.

  15. so what about google then? by Klintus+Fang · · Score: 2, Insightful

    I'm bemused that he implies the problems with his servers are due to Intel and AMD not delivering with their chips, yet at the same time he admires Google for how good a job they do building out their machines.

    He must be aware that Google uses Intel and AMD chips.

    His reasoning just doesn't square.

    --
    In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
  16. Re:WTF? by MidnightBrewer · · Score: 2, Interesting

    How can you be blamed for not finding an acceptable solution when there simply isn't one available? He is a software developer, not a hardware one. Not everybody can just go out and design their own servers like Google does. He's saying he's been tripped up by the fact that the server manufacturers aren't delivering on their promises; that's hardly something he should be blamed for. Your attempt to read more into his comment about "not being cheap" and compare it to the false words of a politician seems like a pretty big stretch.

    If you read the entire article, he not only doesn't say that his decisions have led to disasters, but instead says that his infrastructure development decisions have led to very smooth transitions, even when Facebook rolls out big, new features like the customized home page URLs. He is only voicing his disappointment in saying that the servers aren't living up to the hype, and that he is still looking for a better solution.

    I will say that his comment to not be cheap seems to be in direct conflict with the rest of his argument, since his criticism of AMD and Intel revolves around the fact that they need to be cheaper. Seems a bit counter-intuitive.

    --
    "Give a man fire, and he'll be warm for a day; set a man on fire, and he'll be warm for the rest of his life
  17. And yet... by Junta · · Score: 5, Interesting

    Every major server vendor has jumped on the 'look how efficient and cheap we are' bandwagon. Three years ago, by and large, the tier-one vendors wouldn't bother designing systems without forcing even the cheap design to include parts that facilitate the purchase of redundant add-ons (e.g. power distribution cards designed for dual power supplies regardless of whether one or two were bought). They would always put a high-end storage controller on the planar. They would always burden their 'entry' platform with expensive components to make it easier to option it up.

    Now we have tons of 'internet scale', or 'cloud', or whatever-buzzword-you-feel-like systems. They tend to stress energy efficiency and low-cost components, with sales and management strategies targeted at thousands of servers (e.g. IBM iDataPlex, HP SL6000). Basically, precisely what he prescribes, though probably not as 'cheap' as he wants. The incentive he gives is that the vendors should have zero margin, which is not particularly compelling for companies to work toward. Google's situation works because they brought it in-house and thus have fewer middlemen. Honestly, from all the rumours I hear, it's the logical thing to do when your server consumption is larger than some respectable computer companies' entire production. If he thinks his volume of servers is high enough to pull a Google, by all means do it. Otherwise, he should be prepared for people not to jump at the chance to give their designs to him at zero margin.

    Of course, if he is calling them out on performance per watt for avoiding non-x86 solutions, including ARM, that might be a fair criticism. However, I think company forays into 'exotic' architectures have not panned out in the market recently. Sun's Niagara, despite all the worthy praise, couldn't attract the mass market required to subsidize it for those who benefited most from it. Last year, IBM seemed to be saying the Cell architecture would light the world on fire, but they have been a lot quieter about it now. The message their business leaders have probably taken away is that while these things have their target market, that market isn't worth the expense of developing products the larger market refuses; better to focus instead on leveraging commonly accepted building blocks to do as well as they can for that niche, even if it means skipping the 'perfect' solution. Sure, IBM still sells plenty of POWER, but I haven't heard it *particularly* praised in the performance/watt category the way I hear Niagara, Cell, and ARM praised. And if not for POWER's legacy, it would probably be stillborn in the market today. HP's PA-RISC->Itanium decision probably sank their HP-UX product line faster than banking on the legacy of PA-RISC installs would have, and it seems IBM won't make that mistake, but at the same time I don't hear much about *new* POWER customers.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  18. Re:WTF? by Spit · · Score: 2, Interesting

    You can better identify your bottlenecks by benchmarking. Facebook's scalability is likely not as CPU-bound as predicted, hence the dude's angst on discovering that CPU upgrades weren't a silver bullet.

    In your case, you haven't looked past the RAID configuration for the root-cause of your performance issues. Without benchmarking you don't really know if it was an issue with: the filesystem, the block size, stripe size, or a caching tunable.

    Systems architecture isn't as easy as PC builders would have you believe.

    --
    POKE 36879,8
  19. Depends on 'headroom' of other subsystems. by Kadin2048 · · Score: 2, Informative

    Not necessarily, no.

    It's all about how CPU limited the workload is.

    You might be running a program that's CPU-limited on one processor, then upgrade the processor and discover that instead of being CPU-bound, you're now memory-bound. Or I/O-bound. Or whatever.

    Point is, just because you've hit the wall in terms of CPU doesn't mean you'll get a 50% improvement with a 50% increase in CPU ... you'll only get that if all the rest of the server's systems have 50% overhead to spare. And in most cases they don't. One of them will hit the performance wall before you return to being CPU-bound with the shiny new processor.

    There are exceptions to this -- renderfarms, for instance, or some distributed HPC stuff -- where you really can reasonably expect to get 50% more performance out of 50% more CPU, but they're exceptions not the rule.
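    A deliberately crude Python model of this point: give each subsystem a hypothetical throughput ceiling and assume the server can only go as fast as its slowest one. Real systems overlap work across subsystems, so treat this as a simplification, and the capacity figures are invented.

```python
def system_throughput(capacities):
    """Requests/sec the whole server can sustain: the slowest subsystem wins."""
    return min(capacities.values())


def bottleneck(capacities):
    """Name of the subsystem currently limiting throughput."""
    return min(capacities, key=capacities.get)


if __name__ == "__main__":
    # Hypothetical per-subsystem ceilings, in requests per second.
    before = {"cpu": 1000, "memory": 1200, "disk_io": 1300, "network": 2000}
    after = dict(before, cpu=before["cpu"] * 1.5)  # 50% faster CPU

    gain = system_throughput(after) / system_throughput(before) - 1
    print(f"bottleneck before: {bottleneck(before)}, after: {bottleneck(after)}")
    print(f"overall gain from a 50% CPU upgrade: {gain:.0%}")
```

    With these invented ceilings, a 50% faster CPU buys only 20% more throughput before memory becomes the wall.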

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  20. Strange... by spire3661 · · Score: 4, Informative

    Since when do we listen to manufacturers' claims? You take the new hardware, stress-test it with your custom software, record the results, and plan servers accordingly. How hard is it, really, to commission a server design that meets your needs and then QA some prototypes?
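    A minimal sketch of the "stress test it yourself" step, using only Python's standard library; `sample_workload` is a hypothetical stand-in for your own software, and the repeat counts are arbitrary.

```python
import statistics
import time


def benchmark(workload, repeats=5, iterations=1000):
    """Run `workload` in batches and return operations per second per batch.

    Repeating the batch shows run-to-run variance instead of relying on a
    single number (or a vendor's).
    """
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            workload()
        elapsed = time.perf_counter() - start
        rates.append(iterations / elapsed)
    return rates


def summarize(rates):
    """Median and spread of the measured rates."""
    return {
        "median_ops_per_s": statistics.median(rates),
        "min_ops_per_s": min(rates),
        "max_ops_per_s": max(rates),
    }


if __name__ == "__main__":
    def sample_workload():
        # Stand-in for real work: build and sort a small list of strings.
        sorted(str(i) for i in range(200))

    print(summarize(benchmark(sample_workload)))
```

    Running the same harness on the old and new boxes gives you your own before/after numbers to hold against the vendor's claim.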

    --
    Good-bye
  21. Would you expect otherwise by Chuck+Chunder · · Score: 4, Funny

    If that's his argument, then it would seem that the real conclusion is that Facebook can't build systems as good as Google's

    Google's core business is intelligence.
    Facebook's core business is stupidity.

    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park
  22. Re:WTF? by edmudama · · Score: 5, Insightful

    I think we read different articles. He's not saying he didn't plan well enough, he's saying that Intel and AMD promise that Gen Y processor is 35% faster than Gen X processor, and he's not seeing anywhere near 35% in real world performance.

    If the application was purely CPU bound, and Y wasn't giving me 35% more than X, I'd complain.

    However, if it's a complex system like almost everything else, why would they expect their application to get 35% faster when there's probably 6 or 8 critical subsystems that could all be bottlenecks as well?

    --
    More data, damnit!
  23. Re:WTF? by Runaway1956 · · Score: 3, Interesting

    Uhhh, correct me if I'm wrong. I've been looking at aftermarket bolt-on parts for my car. The headers claim to increase fuel mileage, as do the spark plugs, the air filter, the tires, and a turbocharger. The glass-pack mufflers, and the resonator. Oh yeah, the aerodynamic rims, the hood, and the spoiler. Don't forget the carbon-fiber body panels. Taken all together, those increased MPGs add up to about 150 MPG. You're saying I may not see that much improvement on my 1968 Chevy Malibu? It's just hype? Man, you just saved me about $5,000!!!

    --
    "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
  24. Hang on a minute... by OneSmartFellow · · Score: 3, Funny

    ...I'm supposed to care about the comments of the guy who wrote Facebook?

    Hah, hah, hah, hah, hah! At least Google needed to actually engineer their solution, but Facebook, come on! The next time I need to write a PHP script for displaying photos and text, I'll hire my 13-year-old daughter.

  25. not just the CPU it's overall system performance by unix_geek_512 · · Score: 3, Insightful

    This isn't just about the CPU, it's about overall system performance.

    Despite improvements in CPU performance, memory and I/O performance are lagging behind.

    A modern SATA drive delivers about 90MB/sec (peak sequential read).

    Some RAID controllers can do about 600-800MB/sec (peak sequential read).

    An average AM2 (K10 core, 65nm) gets about 34,849MB/sec L1, 12,169MB/sec L2, 6,371MB/sec L3, and 2,741MB/sec DDR2-800 5-5-5-12.
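    To put those figures side by side, here is a quick Python calculation of how long moving 1,000MB would take at each quoted peak rate (low end of the RAID range). These are the comment's numbers rather than new measurements, and it is a pure rate comparison: the caches obviously can't hold that much data.

```python
# Peak sequential rates quoted above, in MB/sec.
RATES_MB_PER_S = {
    "SATA drive": 90,
    "RAID controller (low end)": 600,
    "DDR2-800 memory": 2741,
    "L3 cache": 6371,
    "L2 cache": 12169,
    "L1 cache": 34849,
}


def seconds_to_stream(megabytes, rate_mb_per_s):
    """Time to move `megabytes` at a sustained `rate_mb_per_s`."""
    return megabytes / rate_mb_per_s


if __name__ == "__main__":
    for name, rate in RATES_MB_PER_S.items():
        t = seconds_to_stream(1000, rate)
        print(f"{name:28s} {t * 1000:10.1f} ms")
```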

    Obviously Opterons scale a lot better, since each has an onboard memory controller and additional HT links, which greatly increase bandwidth as you add more CPUs. However, adding more cores on the same die that have to share a single memory controller can cause starvation.

    Another major issue is software parallelization; writing parallel code is still a difficult problem. If your software doesn't parallelize well, it doesn't matter if you have 8, 16, or even 32 cores on a single die.
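    Amdahl's law is the usual way to quantify this: if a fraction p of the work parallelizes perfectly, n cores give a speedup of 1 / ((1 - p) + p / n). A short Python sketch follows; the 90% parallel fraction is an arbitrary example, not a measurement of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup on `cores` when only `parallel_fraction` scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for cores in (1, 8, 16, 32):
        print(f"{cores:2d} cores, 90% parallel: {amdahl_speedup(0.9, cores):.2f}x")
```

    Even at 90% parallel, 32 cores yield under 8x, and no number of cores can exceed 10x, because the serial 10% never gets faster.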

    If you had an equal number of CPU cores and memory controllers you could achieve much better performance, however your relatively very slow storage subsystems would still be a major bottleneck.

  26. PHP "extension" by RGRistroph · · Score: 4, Insightful

    I once did a large project in which I took a large, slow site in PHP (it was pretty complicated, a CRM with a lot of custom business logic) and rewrote all the core functionality from PHP to C/C++, making it a PHP "module". The rewriting was mostly simple translation: literally removing all the dollar signs, adding some types, attempting to compile, and fixing compile errors until it would build. Then going back through it with a fine-tooth comb to track down all the memory leaks.

    The speed increase from doing that is pretty surprising. Simple loops that do a bit of math speed up by 100 times, and a loop that creates and destroys an object within the loop will be 100,000 times faster. This is without actually trying to write fast C/C++ code, or avoiding creating and deleting the same thing over and over in a loop; it's just pure dumb translation of the code.

    At that point, the web site guys can keep tweaking and changing the web page in PHP just like before; but they load that module in the php.ini and then they have a basic library of stuff, like login_user() or get_user_balance() and etc, that are really fast and do all the heavy lifting.

    I would be surprised if Facebook has not already done this. How to do it is well documented in several books, and there are lots of PHP modules written in C/C++ to look at for examples.

    I suspect that Facebook's VP is right that AMD and Intel exaggerate their claims, but it is also generally true that most computer programs are more I/O-bound than you expect. This is not a reason to avoid something like what I describe above; once you have the more complete control that programming in C gives you, I/O issues may be easier to find and address.

    He also mentions that the servers offered by Dell and others aren't very power-efficient or practical for him, and he mentions Google designing its own servers. Nothing Google did was really rocket science, from what we know, and Facebook probably doesn't have to go as far as they did to get a reasonable benefit. It's not that hard to set up motherboards to run without a case, booting off the network with no hard drive attached.

  27. Re:WTF? by amorsen · · Score: 2, Interesting

    No, he just found that RAID controllers suck. Which they do, universally, all the time. The only ones that actually perform decently are the ones in external SAN boxes, and inside they are typically servers with software RAID...

    --
    Finally! A year of moderation! Ready for 2019?
  28. Re:WTF? by TheRaven64 · · Score: 4, Interesting

    One of the fun toys Intel has to play with is a complete system simulator, which simulates every single component in a computer for early testing. This lets them vary parameters that aren't feasible yet while they're working on their design goals. A few years ago they did a test: what happens to system performance if you make the CPU infinitely fast? They adjusted the simulator so that every CPU operation took zero simulated time and ran their benchmark suite. It ran twice as fast (in simulated time) as it had before.

    A CPU-bound workload can quickly become a RAM-speed bound or a disk-speed bound workload if you make the CPU faster but don't upgrade anything else.
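    The simulator anecdote can be turned into numbers, assuming CPU time and non-CPU time (memory, disk, and so on) simply add and never overlap. If total run time is T and an infinitely fast CPU halves it, half the time was never spent in the CPU at all, which caps what any CPU upgrade k times faster can deliver on that suite:

```latex
T = T_{\mathrm{cpu}} + T_{\mathrm{other}}, \qquad
S_{\infty} = \frac{T}{T_{\mathrm{other}}} = 2
\;\Rightarrow\; T_{\mathrm{cpu}} = T_{\mathrm{other}} = \tfrac{1}{2}T

S(k) = \frac{T}{T_{\mathrm{other}} + T_{\mathrm{cpu}}/k}
     = \frac{1}{\tfrac{1}{2} + \tfrac{1}{2k}},
\qquad S(2) = \frac{1}{\tfrac{1}{2} + \tfrac{1}{4}} = \frac{4}{3} \approx 1.33
```

    So under that assumption, a CPU twice as fast would have made the whole suite only about a third faster.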

    --
    I am TheRaven on Soylent News