Facebook VP Slams Intel's, AMD's Chip Performance Claims
narramissic writes "In an interview on stage at GigaOm's Structure conference in San Francisco on Thursday, Jonathan Heiliger, Facebook's VP of technical operations, told Om Malik that the latest generations of server processors from Intel and AMD don't deliver the performance gains that 'they're touting in the press.' 'And we're, literally in real time right now, trying to figure out why that is,' Heiliger said. He also had some harsh words for server makers: 'You guys don't get it,' Heiliger said. 'To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.' Heiliger added that Google has done a great job designing and building its own servers for this kind of use."
You guys don't get it
Is it possible to take out a massive life insurance policy on Jonathan Heiliger?
To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.
I assure you, despite your misconception that the world revolves around you, everyone has those requirements. From the people who build supercomputers right down to the netbook I am typing on while watching Gurren Lagann.
Can we get like a panel of hardware engineers to have a discussion with this guy and can I get some popcorn?
My work here is dung.
Maybe the dude should have benchmarked before committing. How does he scope his projects, with brochures?
POKE 36879,8
To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.
Hm, let's see... perhaps because Facebook and Amazon are niche markets? The average server isn't going to even need all the computing horsepower, and the power efficiency is simply a drop in the bucket for most companies' electrical bills. The average server is going to be much more I/O intensive than CPU intensive unless you do cluster computing or render a lot of stuff. The average server such as a web server or a file server doesn't use that much CPU, and usually you are running 1-3 servers, not the hundreds that Facebook or Amazon would run.
And really, why is a VP complaining about this stuff? That he can't either afford custom solutions or spend the money buying more servers?
Taxation is legalized theft, no more, no less.
YouTube has been constantly crashing and showing server errors more frequently over the last month. I'm not sure how much more bandwidth that YouTube demands over Facebook or Amazon (if it does) but whatever issues with servers and reliability that the Facebook team is having with Intel and AMD might be even worse over at YouTube.
Anyone experiencing major outages and slow movement and loading times over at YouTube recently? Can't say I've had a problem with Amazon or Facebook other than intended outages for updates or repairs in the last month.
I have heard from some reliable sources that Facebook and Twitter's backend applications are poorly written.
Are Intel's and AMD's claims overblown? Sure. What hardware manufacturer doesn't cherry-pick performance claims?
But I don't care what sort of hardware you throw at crap code, you are always going to get crap performance.
Well, I suppose that if he does not like the offerings from Intel and AMD, they could always go with...
Uh..
Oh.
Do you want your servers to be cheap or do you want them to be good?
"...To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient..."
Sounds like ARM processors are being described here. Whether they can deliver is another subject in itself. On this front, I have my doubts on ARM's ability to deliver. That's my bias.
1) Facebook & Amazon need cheap, power efficient systems
2) Intel and AMD aren't measuring up with processors to power these systems
3) However, Google has systems appropriate for this use (presumably using Intel or AMD processors)
If that's his argument, then it would seem that the real conclusion is that Facebook can't build systems as good as Google's, even though they are using the same processor technology.
It's because your shitty website doesn't have a single line of compiled code. PHP only goes so far.
"the servers have to be cheap, and they have to be super power-efficient." So aren't Atom-based nettops using like 5 watts and dual core versions selling for $150, you supply the drive? http://www.newegg.com/Product/Product.aspx?Item=N82E16856167037
Dashboard Widgets
It's the next logical solution... Those T5440 servers with 256 processing threads are MONSTERS in terms of handling simultaneous connections, which makes them very good web servers, database servers, and file servers, all of which means they are very good for a company whose product is a website.
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
"There's a pretty simple answer for scaling infrastructure. It's, 'Don't be cheap,'" Heiliger said. He added that Facebook does drive hard bargains with its hardware and software infrastructure suppliers, and is careful not to overbuy.
I remember reading about how Amazon does it. They have clusters of servers running whatever OS suits the particular person having written the portion of code being used and will blow through something like 100 dead servers a day. IIRC, when you load a page from Amazon you get content delivered by 20+ servers onto one web page.
Maybe he just needs to scale out.
Or - I just noticed an unused AS390 in the server room today. Apparently the Z890 that replaced it is also going to be replaced by a new z9 machine. He could bundle some apps on the z890 or the 390.
I wonder if you can run .net or Java under OS/390 or MVS...
The Kai's Semi-Updated Website Thingy
Assuming that a solution was properly engineered, this should not have been a surprise.
Cheap, power efficient, performance. Pick two.
"To those who are overly cautious, everything is impossible. "
This is becoming an annual event for Heiliger, who also complained about server vendors at GigaOm's Structure 08 conference last year. Facebook used to buy a lot of cloud-optimized gear from Rackable/SGI, but no longer appears on the list of their largest customers. Makes you wonder if they're not going to follow Google's lead and build their own servers.
He woke up on the wrong side of the bed, and then he had to sign the check for the electric bill.
He's just grumpy.
NEWSFLASH! Customers are tightwads.
Performance/Reliability/Price.
Pick any two, Heiliger.
Facebook uses PHP, and yes you can, on both z/OS and Linux. Probably on z/TPF, too. And Facebook wouldn't be the first Internet company to buy a mainframe.
Let me file his opinion with my next door neighbor the plumber.
Java? Yes, absolutely. Not sure about .net tho.
They collect a large amount of data on people and mine that for marketing information to turn around and target those same users.
It's the same model as google.
I'm bemused that he implies the problems with his servers are due to Intel and AMD not delivering with their chips, yet at the same time he admires Google for how good a job they do in building out their machines.
He must be aware that Google uses Intel and AMD chips.
His reasoning just doesn't square.
In a minute there is time For decisions and revisions which a minute will reverse. -T.S. Eliot
I wonder if you can run .net or Java under OS/390 or MVS...
Yes, actually, though we've been calling it z/OS for about 10 years now, and it's a hybrid mix of MVS and Unix.
Java definitely. Not sure if there's a .NET engine yet - might need to run Linux for z-Series in an LPAR or under z/VM to get a .NET framework.
Every major server vendor has jumped on the bandwagon of 'look how efficient we are', and 'cheap'. Three years ago, by and large the tier ones wouldn't bother designing systems without forcing even the cheap design to have parts included to facilitate purchase of redundant add-ons (i.e. power distribution cards designed for dual power supplies regardless of one being bought or not). They would always put a high end storage controller on the planar. They would always make their 'entry' platform be burdened with expensive components to make it easier to option it up.
Now, we have tons of 'internet scale', or 'cloud', or whatever buzzword you feel like. They tend to stress energy efficiency, low cost components, with sales and management strategies targeted at thousands of servers (i.e. IBM iDataplex, HP SL6000). Basically, precisely what he prescribes, though probably not as 'cheap' as he wants. The incentive he gives is that the vendors should have zero margin, which is not particularly compelling for companies to work toward. Google's situation works because they brought it in-house and thus have fewer middle-men. Honestly, from all the rumours I hear, it's the logical thing to do when your server consumption is larger than some respectable computer companies' entire production. If he thinks the volume of servers is high enough to pull a Google, by all means do it. Otherwise, be prepared for people not to jump at the chance to give their designs to him at zero margin.
Of course, if he is calling them out on performance per-watt by avoiding non-x86 solutions, including ARM, that might be a fair criticism. However, I think company forays into 'exotic' architectures have not panned out in the market recently. Sun's Niagara, despite all the worthy praise, couldn't attract the mass market required to subsidize it for those who benefited most from it. Last year, IBM seemed to be saying Cell architecture would light the world on fire, but have been a lot quieter about it now. The message their business leaders have probably taken in is that while these things have their target market, that market isn't worth the expense of developing products that are refused by the larger market, and that they should focus instead on leveraging commonly accepted building blocks to do as best they can for that niche, even if it means skipping the 'perfect' solution. Sure, IBM still sells plenty of POWER, but I haven't heard that be *particularly* praised on the performance/watt category like I hear a lot for Niagara, Cell, and ARM. And if not for POWER's legacy, it probably would be stillborn in the market today. The PA-RISC->Itanium decision for HP probably sank their HP-UX product line faster than banking on legacy of PA-RISC installs, and it seems IBM won't make that mistake, but at the same time I don't hear much about *new* POWER customers.
XML is like violence. If it doesn't solve the problem, use more.
Not necessarily, no.
It's all about how CPU limited the workload is.
You might be running a program that's CPU limited on one processor, then upgrade the processor and suddenly discover that instead of being CPU-bound, now you're memory-bound. Or I/O bound. Or whatever.
Point is, just because you've hit the wall in terms of CPU doesn't mean you'll get a 50% improvement with a 50% increase in CPU ... you'll only get that if all the rest of the server's systems have 50% overhead to spare. And in most cases they don't. One of them will hit the performance wall before you return to being CPU-bound with the shiny new processor.
There are exceptions to this -- renderfarms, for instance, or some distributed HPC stuff -- where you really can reasonably expect to get 50% more performance out of 50% more CPU, but they're exceptions not the rule.
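The point about hitting the next wall can be sketched as a toy model in which a server's throughput is capped by its slowest resource. This is only an illustration: the resource names and the requests-per-second capacities are made-up numbers, not measurements of any real machine.

```python
def throughput(capacities):
    """Requests/sec the server can sustain: the slowest resource sets the limit."""
    return min(capacities.values())


def bottleneck(capacities):
    """Name of the resource that currently caps throughput."""
    return min(capacities, key=capacities.get)


# Hypothetical capacities in requests/sec (illustrative, not measured).
old = {"cpu": 1000, "memory": 1200, "disk_io": 1100}

# Upgrade the CPU by 50% and leave everything else alone.
new = dict(old, cpu=old["cpu"] * 1.5)

gain = throughput(new) / throughput(old) - 1
print(f"before: {throughput(old)} req/s, limited by {bottleneck(old)}")
print(f"after:  {throughput(new)} req/s, limited by {bottleneck(new)}")
print(f"gain from a 50% faster CPU: {gain:.0%}")
```

With these assumed numbers the 50% faster CPU buys roughly 10%, because disk I/O becomes the limit before the new CPU is saturated.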
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Since when do we listen to manufacturer's claims? You take the new hardware, stress test it with your custom software, record results, plan servers accordingly. How hard is it really to commission a server design that meets your needs and then QA some prototypes?
Good-bye
I wonder when we'll see servers with CPUs based on (many...) ARM cores.
Yes, they are an order of magnitude slower, but three orders of magnitude more power efficient. For the same CPU performance you'd probably be around two orders of magnitude more power efficient (for CPUs at least). If your app runs on a large farm already...
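The arithmetic behind that estimate can be written out as a short sketch. It takes the two ratios from the comment at face value and assumes (my reading, not something stated above) that "more power efficient" means each ARM core draws that much less power, and that the workload scales perfectly across cores.

```python
import math

# Rough ratios from the comment above; estimates, not measurements.
slowdown = 10        # one ARM core is ~10x slower than one x86 core
power_ratio = 1000   # one ARM core draws ~1000x less power (assumed reading)

# Matching one x86 core takes `slowdown` ARM cores, assuming perfect scaling.
arm_cores_needed = slowdown

# Power advantage of that ARM group over the single x86 core.
advantage = power_ratio / arm_cores_needed

print(f"ARM cores needed for equal performance: {arm_cores_needed}")
print(f"power advantage at equal performance: {advantage:.0f}x "
      f"(about {math.log10(advantage):.0f} orders of magnitude)")
```

In practice the scaling is never perfect, so the real advantage for a given app would be smaller than this upper bound.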
One that hath name thou can not otter
Google's core business is intelligence.
Facebook's core business is stupidity.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
I bet if they rewrote the CPU intensive parts in C/C++ (from PHP) they could reduce the overall machine count and stop worrying about which CPU manufacturer lied to them that week :)
Ever since I accepted their invitation to use my (very unusual) name openly on Facebook, every single variation on that name has been spammed. This Heiliger jackass can take his "harsh words" about the performance of people who do real hardware work and shove them deep and hard. You can't bullshit a chip, you can merely mis-state its performance. When you lie about how you'll protect information people give you in trust, you're pretty much a douchebag. Mr. Heiliger, you are a douchebag.
I've calculated my velocity with such exquisite precision that I have no idea where I am.
...I'm supposed to care about the comments of the guy who wrote Facebook ?
Hah, hah, hah, hah, hah! At least Google needed to actually engineer their solution, but Facebook, come on! The next time I need to write a PHP script for displaying photos and text, I'll hire my 13 year old daughter.
the servers have to be cheap, and they have to be super power-efficient.
yeah, and self-replicating too!
I don't think that Amazon, Facebook, etc, need all the floating point performance that current desktop and server CPUs can deliver. If they're really looking for power efficiency, they shouldn't use CPUs with features that are never used, but draw tons of power nonetheless.
This isn't just about the CPU, it's about overall system performance.
Despite improvements in CPU performance, memory and IO performance are lagging behind.
A modern SATA drive delivers about 90MB/sec ( peak sequential read ).
Some RAID controllers can do about 600-800MB/sec ( peak sequential read ).
An average AM2 ( K10 core 65nm ) gets about 34,849MB/sec L1, 12,169MB/sec L2, 6371MB/sec L3, 2,741MB/sec DDR2-800 5-5-5-12.
Obviously Opterons scale a lot better since they each have an onboard memory controller and additional HT links which greatly increases bandwidth as you add more CPUs. However adding more cores on the same die which have to share a single memory controller can cause starvation.
Another major issue is software parallelization; writing parallel code is still a difficult problem. If your software doesn't parallelize well, it doesn't matter if you have 8, 16 or even 32 cores on a single die.
If you had an equal number of CPU cores and memory controllers you could achieve much better performance, however your relatively very slow storage subsystems would still be a major bottleneck.
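To put the figures quoted above side by side, here is a small sketch that prints each tier's bandwidth and how far it falls behind L1 cache. The numbers are the ones from this comment; for the RAID controller I take the upper end of the quoted 600-800MB/sec range, which is my choice for illustration.

```python
# Peak bandwidth figures in MB/s, as quoted in the comment above.
bandwidth_mb_s = {
    "L1 cache": 34849,
    "L2 cache": 12169,
    "L3 cache": 6371,
    "DDR2-800": 2741,
    "RAID controller (peak)": 800,  # upper end of the quoted 600-800 range
    "SATA drive (peak)": 90,
}

l1 = bandwidth_mb_s["L1 cache"]
for name, mb_s in bandwidth_mb_s.items():
    # Ratio of L1 bandwidth to this tier's bandwidth.
    print(f"{name:<24} {mb_s:>6} MB/s   L1 is {l1 / mb_s:6.1f}x faster")

ram_vs_disk = bandwidth_mb_s["DDR2-800"] / bandwidth_mb_s["SATA drive (peak)"]
print(f"DDR2-800 vs a single SATA drive: {ram_vs_disk:.1f}x")
```

Even main memory is about 30 times faster than one drive reading sequentially, which is the gap the storage subsystem point refers to.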
Most of server manufacturers' customers are not Google, or Amazon or Facebook. The VP doesn't get that he's just not that important....
...big companies have infrastructure architects to plan, test and make recommendations on new infrastructure. You don't just go out and buy a container load of the latest and greatest based on the advertising copy.
I once did a large project in which I took a large, slow site in PHP (it was pretty complicated, it was a CRM with a lot of custom business logic) and rewrote all the core functionality from PHP to C / C++, and made it a "module" of PHP. The rewriting was mostly simple translation -- literally removing all dollar signs, adding some types, and attempting to compile, and just fixing the compile errors until it would build. Then going back through it with a fine-tooth comb to track down all the memory leaks.
The speed increase from doing that is pretty surprising. Simple loops that do a bit of math or something speed up by 100 times, and a loop that creates and destroys an object within the loop will be 100,000 times faster. This is without actually trying to write fast C/C++ code, and not create and delete the same thing over and over in a loop -- just pure dumb translation of the code.
At that point, the web site guys can keep tweaking and changing the web page in PHP just like before; but they load that module in the php.ini and then they have a basic library of stuff, like login_user() or get_user_balance() etc., that are really fast and do all the heavy lifting.
I would be surprised if Facebook has not already done this. How to do it is well documented in several books, and there are lots of PHP modules written in C/C++ to look at for examples.
I suspect that Facebook's VP is right that AMD and Intel exaggerate their claims, but it is also generally true that most computer programs are more IO bound than you expect. This is not a reason to avoid something like I describe above; once you have the more complete control of programming in C, IO issues may be easier to find and address.
He also mentions that the servers offered by Dell and others aren't very power efficient or practical for him, and he mentions Google designing their own servers. Nothing Google did was really rocket science, from what we know, and Facebook probably doesn't have to go as far as they did to get a reasonable benefit. It's not that hard to set up motherboards to run without a case, booting off the network with no hard drive attached.
I bet Intel/AMD's performance claims are closer to reality than Facebook's book value.
Mainframes aren't known for good CPU performance. Brilliant at I/O, but if Facebook is I/O-limited they're either doing something very wrong or something very right.
Finally! A year of moderation! Ready for 2019?
Excellent article at CNET.
The Slashdot article seems to be a way of getting people to see ITworld; is it a paid ad? It was not clear from the ITworld article what problems the Facebook V.P. had with AMD and Intel. If what he wanted to say could be understood better, I doubt it would be controversial.
to Apple. That is the only thing that they are good at.
I'm just about to buy a new server and I'd like to hear from slashdot what's your experience with the new CPUs? Should I buy the new 55 or the old 54 series?
PALO ALTO (Reuters). PHP-based website reports scalability problems. Blames server hardware. Film at 11.
They're on the block, and produce incredibly power-efficient, inexpensive (complete systems for ~$200/core to the end user) machines. Their sales were continually gaining momentum, but their investors hit troubled times and had to pull out.
how many mips does it take to run a database full of one-liner updates about someone's cat, or boyfriend's girlfriend's ex fiance?
unless bejewelled is actually running as a computational cluster app, you folks have far LESS to bitch about than a web-business or the world's largest search engine in my opinion. find a more efficient operating system, and insist on more efficient code. you don't bitch at vendors for performance, you find other ones if it gets bad enough. i'm sure Cray would love to hear from you guys, or perhaps maybe SGI?
Good people go to bed earlier.
What grown man, let alone a VP of technology would use the word SUPER to describe something technical?
No matter how much faster hardware gets, the software has to take advantage of the improvements. If you're using a bloated interpreted web-based language running on an OS that's not fine-tuned to your given piece of hardware and don't see huge improvements, perhaps you should evaluate the layers of inefficient code that you've rested your apps on... The Gaming Industry has had a lot of success in eking out every cycle of performance possible, but they spend time tuning their products to hardware solutions.
http://www.beanleafpress.com
Your humor almost cost me my Kinesis keyboard (and they're not cheap)!
That was absolutely brilliant.
Are they using the intel compiler?
Are they making their binaries more thread-friendly?
Are they using cc flags that exploit the new cache/features/instructions?
Are they running their OS's with at least basic tuning to contain interrupts and kernel activity to particular CPUs?
Are they using new hyperthreading efficiently and consciously or just disabling it?
Are they running with taskset wrappers to decrease context switching on an obviously stochastic workload?
Are they tweaking their networking configuration internally to optimize for their specific packet loads?
Because I've found that the new hardware (Nehalems, for example) will give you a heck of a speed boost if you're doing the above.
It's real easy to have the compatibility features mask performance, really you gotta look at it like a new platform.
Anyway, those comments smell like making excuses to me, they are not even remotely specific enough to be defensible.
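As a rough illustration of the taskset idea from the list above, here is a minimal sketch that pins the current process to one CPU from inside Python. It uses os.sched_getaffinity and os.sched_setaffinity, which Python only provides on Linux and a few other Unix platforms; the helper name pin_to_one_cpu is made up for this example, and picking the lowest allowed CPU is an arbitrary choice.

```python
import os


def pin_to_one_cpu():
    """Restrict the current process to one CPU it is already allowed to use.

    Returns the chosen CPU id, or None if the platform lacks the affinity
    API or the call is not permitted.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows
    try:
        allowed = os.sched_getaffinity(0)   # 0 means "this process"
        cpu = min(allowed)                  # arbitrary: lowest allowed CPU
        os.sched_setaffinity(0, {cpu})
        return cpu
    except OSError:
        return None  # restricted container or sandbox


cpu = pin_to_one_cpu()
if cpu is None:
    print("CPU affinity is not available here")
else:
    print(f"process {os.getpid()} pinned to CPU {cpu}")
```

Whether pinning actually helps depends on the workload; it is the kind of thing you measure before and after rather than assume.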
Telcos have been using -48V DC for their equipment for decades. Don't see why the non-telco crowd can't do the same and increase efficiency by removing AC/DC power conversion.
It's not even a full order of magnitude faster, but 112MB/s is still nearly four times faster. And these are both magnetic discs, rather than SSDs.
Only if you're doing purely sequential writes. This only happens if you're laying down 20+ MB files at a time, or you're using a file system that is COW (like Btrfs, ZFS, BSD's LFS).
If you're using ext3, with small- to medium- sized files, then you're getting head seeks and probably not getting 112 MB/s.
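A back-of-the-envelope sketch of why small files don't see the sequential rate: each file pays a head seek before the transfer starts. The 112 MB/s figure is the one quoted above; the 10 ms seek time and the one-seek-per-file simplification are my assumptions for illustration, not measurements of any particular drive or file system.

```python
def effective_mb_s(file_mb, sequential_mb_s=112.0, seek_ms=10.0):
    """Effective throughput when every file costs one seek plus its transfer."""
    transfer_s = file_mb / sequential_mb_s   # time spent actually moving data
    seek_s = seek_ms / 1000.0                # assumed head-seek cost per file
    return file_mb / (seek_s + transfer_s)


# File sizes in MB: 64 KB, 1 MB, and the 20 MB case mentioned above.
for size_mb in (0.064, 1.0, 20.0):
    print(f"{size_mb:>7.3f} MB files: {effective_mb_s(size_mb):6.1f} MB/s")
```

Under these assumptions 20 MB files stay above 100 MB/s, while 64 KB files drop to single digits.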
I'm thinkin this fella Jonathan Heiliger is probably disappointed because he is probably running Dell Power Edge servers he bought off someone on Craigslist. That would explain the lack of chip performance. Does he have any clue that google is probably running intel/amd chips? They may have designed their own servers, but not the chips.
see Amdahl's law for more about speedup gain for any one part of a problem.
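For anyone who wants the number rather than the name, here is a minimal sketch of Amdahl's law: if a fraction p of the run time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The 40% CPU-bound share in the example is a made-up figure, not anything known about Facebook's workload.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of run time gets `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical workload: 40% of the time is CPU-bound, CPU gets 50% faster.
print(f"overall speedup: {amdahl_speedup(0.4, 1.5):.3f}x")

# Only a fully CPU-bound job sees the whole 1.5x.
print(f"fully CPU-bound: {amdahl_speedup(1.0, 1.5):.3f}x")
```

So a chip that really is 50% faster shows up as about 15% on a workload that is only 40% CPU-bound.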
AMD and Intel are in the position where they need to make products used by a large number of customers. As a result of this, their primary focus will be on making products that will draw in the greatest number of customers. As we saw with Itanium, if the customer base is not large enough, the R&D costs will never be made up.
So, when does it make sense for a chip and motherboard supplier to make a product with only one or two POTENTIAL customers in mind? Never is the answer that comes to mind. Both AMD and Intel MUST spend their resources on making products that will result in a net profit.
So, Facebook and web servers, and database engines...it should be possible for AMD or Intel to make a platform with these specific applications in mind, but the cost of developing such a specific product, when there are very few potential customers that would want it, would not be small. Potential is the key, because I am sure that if Facebook approached AMD or Intel and wanted a fixed-purpose product to be developed, they would be happy to do it for the right price.
When a company makes a motherboard, the focus is to make a product that will get enough interest and sales to make a profit. As a result, we see motherboards with more PCI, PCI Express, memory, USB, SATA, and other connectors than most people would actually need. If it gets cut back to only what would be needed for a specific customer, then the machine would probably perform better. Expecting a product aimed at a large number of people to be perfectly optimized and customized for any one specific purpose is foolish.
And of course, you have the limitations inherent in any system, including bandwidth between components, ethernet controllers, and how much CPU power may be used for things like USB, SATA, and ethernet. When you buy a $75 motherboard and expect the performance of a $250 motherboard, you are pretty much guaranteed to be disappointed.
If they want performance per watt, why don't they use Sun UltraSPARC T1 or T2? Massive throughput specifically optimized for web applications.
Start Looking For A New Job Jonathan ?
I think Jonathan Heiliger, VP of technical operations, should first try keeping Facebook from crashing constantly. I can't go 5 minutes without the site locking up on me. Mr. Heiliger, really not impressed! These chip makers are the ones that are keeping America Safe, right now and into the future. If you want to bad mouth someone Jonathan, bad mouth the people that work under you, that can't even keep Facebook a site worth going to anymore. You are a Spoiled Rotten Little Kid that thinks someone owes you something, the chip makers don't. Keep working harder on Facebook!
What the VP of Facebook is really asking is that Intel and AMD spend a lot of their R&D on designing features they want, and then pass those costs onto consumers everywhere, so Facebook can get cheaper servers.
F--- that.
This is my sig.
LOL the new ones were designed for VMs really... haha that's why they do all the hardware virtualization and virtualized I/O and they pimp it showing it. They could take advantage of super computers and drain their drops out of them if they built on the VM cloud. But it's just easier to do Google's approach and much cheaper. But for paying customers they will want to buy the warranty and support because they are selling a service. Facebook guy needs a life.