Intel Lindenhurst Xeon DP Platform Discussion
Steve from Hexus writes "Hexus.net has an article looking at Intel's latest Xeon platform: Lindenhurst, discussing the Paxville dual-core processor, E7520 core-logic, where it could go right for Intel, and where it could all go wrong." From the article: "If you're I/O bound by your threads in any way, you can hit problems (all threads touch the MCH, then there's a 266MiB/sec bus link to the I/O processors to cross, then the data hits disks or network hardware). If you're memory subsystem bound in any way, especially on a majority of compute threads, performance is likely gone. There's just too much resource sharing for it to all conceivably work well, especially compared to Opteron. I can foresee many a scenario where dual-core Opteron will give Paxville Xeon DP a beating."
there's a 266MiB/sec bus link
Wow - that's a *LOT* of Tommy Lee Joneses and Will Smiths!
Lindenhurst? They ARE running out of names. I spent a couple months in Lindenhurst, Illinois when I was about 1. It's a sprawl-barf located just outside the doors of Six Flags.
anyone else read this as a sublimedirectory.com tag line?
Intel Lindenhurst Xeon DP Platform Discussion
HEXUS have an article coming that evaluates the latest Intel Xeon DP platform, codenamed Lindenhurst. As you'll likely know, (current) Xeon is Intel's workstation and server processor based on many of the same technologies that define Pentium 4 in the desktop space. Lindenhurst (at its most basic definition) is the combination of the new Paxville Xeon processor in DP (dual processor) form (there's a multi processor version hosted by Truland), along with Intel E7520 core logic.
The Paxville generation of Xeon is dual-core and uses the latest generation of Netburst microarchitecture, making the DP version ostensibly a clone of the Pentium D 820, but with the ability to also turn on HyperThreading for both cores. The DP version of Paxville, at $1080 in volume, is only available in 2.8GHz form for the time being, with the MP variant available at up to 3GHz. It supports everything the dual-core Pentium D does, including SSE3 instructions, and rides the same 200MHz system bus (800MHz effective).
E7520 provides a single dual-channel DDR2-400 memory controller, and a shared bus for the CPUs to reach that memory controller. Other features like PCI Express, support for the Xeon CPU's execute disable bit and support for PCI-X via a mandatory 6700PXH segment bridge (2 PCI-X segments) mean that superficially it's a forward-thinking, modern workstation and server platform.
However, in advance of the Lindenhurst test platform arriving for evaluation, I've caught myself wondering just how it's supposed to work with any kind of real performance outside of a couple of scenarios. It's an issue of limited resource sharing, mainly at the CPU and memory controller levels.
Not much food to go round
We've evaluated HyperThreading-able processors many times in the past, since its launch with the 3.06GHz Pentium 4, and while there's opportunity for performance improvements with a single HyperThreaded processor, performance rarely doubles because HyperThreading is the sharing of the CPU's execution resources by the Hyper threads.
In an SMP scenario with Xeon, you've then got CPUs sharing a memory controller. When that memory controller only supports fairly slow DDR2-400, likely at higher latency and with a performance penalty compared to DDR-400 (even without ECC in the mix, which is almost mandatory for Xeon given the places it's implemented), there's a performance issue. When the CPU-to-memory bus is shared between the two CPUs, so that only one CPU can use the bus at a time and accesses have to be interleaved, performance can be limited by a CPU-to-memory bottleneck.
Add in dual-core and you've now got four cores sharing one memory controller over one bus link. See where I'm going with this? Add in HyperThreading and eight logical processors in two sockets have to share that one memory resource, on one bus.
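To put rough numbers on that sharing, here is a back-of-the-envelope sketch. It assumes the 800MHz effective bus is 64 bits (8 bytes) wide and that bandwidth divides perfectly evenly between active threads; both are simplifications, and the function name is made up for illustration.

```python
# Back-of-the-envelope: peak shared-bus bandwidth split evenly among threads.
# Assumptions (not from the article): 64-bit (8-byte) data bus, perfectly even
# sharing, no protocol overhead. Real numbers will be lower.

BUS_TRANSFERS_PER_SEC = 800e6   # 200MHz quad-pumped = 800 MT/s effective
BUS_WIDTH_BYTES = 8             # assumed 64-bit data bus


def per_thread_bandwidth_gb(threads):
    """Peak GB/s available to each of `threads` equally hungry threads."""
    peak_gb = BUS_TRANSFERS_PER_SEC * BUS_WIDTH_BYTES / 1e9
    return peak_gb / threads


if __name__ == "__main__":
    for label, n in [("2 single-core CPUs", 2),
                     ("2 dual-core CPUs", 4),
                     ("2 dual-core CPUs with HT", 8)]:
        print(f"{label}: {per_thread_bandwidth_gb(n):.2f} GB/s per thread")
```

Under those assumptions the per-thread share halves with each step, which is the squeeze the paragraph above is describing.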
The lack of dedicated CPU bus connections to the memory controller on SMP Intel systems historically is one of the reasons why Athlon MP was able to do fairly well on introduction against SMP Pentium IIIs, CPUs which still shared the bus back then. Each Athlon MP had a dedicated bus connection to the memory controller.
With the introduction of Opteron by AMD in recent years, each CPU has its own memory controller right there on the CPU die and HyperTransport to allow the CPUs to access each other's memory controller and other connected system resources on non-heavily shared (only between a pair of CPUs, or a CPU and devices) bus links. That kind of topology, where all bus and memory access traffic isn't confined to one set of bus paths is why Opteron generally beats on Xeon in modern performance testing.
So while dual-core Opteron processors have the cores share a memory controller and HyperTransport link, that's as far as the sharing goes for the most part. Intel's comparison platforms with Xeon are sat sharing resources like nobody's business.
Where it could go ri
To me, Lindenhurst sounds like a blimp covered in flames... sounds quite appropriate for Intel right now :)
cost 3 times as much as the 820D ... it's a copy of the 820D ... see where I'm going with this?
:-)
The dual-core Intels may cost half as much as the dual-core Athlon64s but they still suck twice as bad. What you save in initial purchase cost you lose in electricity bills and time doing work.
The fact they're STILL making Netburst-based processors just sickens me. Give it up already and go P6 or something new. I mean, if they put half the money they put into Netburst into the P6 designs of late they'd already have a 2.5GHz P6 core that would give AMD a run for their money.
I think the cat's out of the bag for the most part. And it's not like you're gonna sell a lot of dual-core based Dells to grandma so she can write emails.
Times like this make me feel proud I'm an AMD whore
Tom
Someday, I'll have a real sig.
Lindenhurst, Paxville.
Who takes these names seriously these days?
Pentium, Athlon, those are good names, just keep on following this pattern.
This is the sig that says NI (again)
Got a plain old dual processor 1GHz box that with video and hard drive upgrades is still competent. It does everything I need it to do, although processor- or memory-intensive processes are getting a bit sluggish. Rendering video takes a little time, but that's more because the application I use renders in a single thread - but I can play games and render video at the same time ;-)
I still believe if you could remove all the latency from I/O subsystems in a modern PC you'd have more processor than you could use by a longshot - IMO high-end PCs just wait for data faster than older machines, and a lot of the performance boost you see with a new machine is simply masking latency in other subsystems.
PCI-X and improved memory bandwidth will solve some of these problems, but it's a bandaid at best. I do tend to chuckle at people buying the newest/fastest peripheral, not understanding that a lot of the time the peripheral will talk faster than the nine(?) year-old PCI bus that's feeding it.
When troubleshooting performance issues the component that's working at 100% capacity is *always* the bottleneck - and with most home and business users, that bottleneck is almost never the CPU itself.
we see things not as they are, but as we are.
-- anais nin
I'll admit that I'm no great expert on the details of multi-core, hyper-threaded CPU design, but from what's in the article isn't the memory access bottleneck a rather fatal, and obvious, flaw in the whole design? Unless I'm missing something, I'm really struggling to see how this got off the drawing board. What is its point if the only applications that can ever take advantage of it are the very few that rarely need to access main memory?
Yep, I'd say that if both her input and her output are busy, she's DP.*
*See, kids? This is why you should avoid too much pr0n, it just totally warps your mind.
Are you...Are you some kind of genius?
No, ma'am, I'm just a regular Slashdot reader.
What exactly is that supposed to mean? Many scenarios?
I don't have a lot of knowledge about the inner workings of Intel CPUs, so there's not much to read here, but how come Intel still uses a quad-pumped 200MHz FSB? Surely if they doubled the FSB, their Xeon and P4 chips would have a better response time. They're adding in all these features like HyperThreading and a new SSE3 (and others I don't understand), so surely if they upped the FSB, things might pass through better. * I'm confused; does anyone know what I mean? * Come on Intel, open up the pipelines. Thanks in advance :)
I used to live on Lindenhurst in NY (on LI). I was once told that it had the most bars per square mile in all of the US.
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Where it could go right
It's not all doom and gloom, though. Think of a scenario where compute threads rarely touch system memory, doing most of their work on the CPU with small working sets, and you've got yourself something that Xeon should do well at. While those compute threads would have to be HyperThreading friendly to have HT be a performance win, Intel has spent good time making sure HT gets focus by application developers.
If you read the last benchmark results for the dual-core Xeons here, you can see that AMD totally whipped the CPU into submission.
Yet Hexus.net still thinks they can find a task that the dual-core Xeon is better at, but they provide no real-world example or benchmark results, so I guess they are still looking. Perhaps they can find the right benchmark in the basement, or perhaps in deepest darkest Africa.
>>that's a *LOT* of Tommy Lee Joneses and Will Smiths
It's also a lot of pugs barking to "Who let the dogs out?"
*shudder*
I just put together a Xeon based server. It was a rare case where a Xeon solution met my needs better than an Opteron based solution.
My company is _very_ sensitive to power consumption. So, I picked a very new motherboard from Tyan, and a Xeon that supported Enhanced SpeedStep. I figured that I'd install cpudyn, like I did with all of our AMD boxes, and save a few bucks on electricity.
So, cpudyn doesn't work... because SpeedStep isn't supported by Tyan's BIOS. I emailed Tyan, and I found out two things:
* Tyan wasn't aware that SpeedStep was an option on the Xeon platform,
* None of their BIOS suppliers are supporting SpeedStep at this time.
Amazing! Intel put this in the CPU as a way to compete with this great feature from AMD, but you CANNOT USE IT.
Most certainly my last Intel purchase, ever.
jh
Itanic, Hindenburg....
Here's some more names they can choose from for new processors:
1) Challenger
2) Columbia
3) Chernobyl
4) Tacoma Narrows
5) Sultana
6) Cocoanut Grove
7) Grand Camp
8) Galveston
Many, many more. Feel free to add to the list.
The Hexus article is just a summary of their results along with several inaccuracies.
This is misleading. First off, the MCH is a 6.4 GB/s link, so I don't understand how it could bottleneck I/O even if you're compute bound. The 266 MB/s I/O bus is for legacy peripherals (USB/serial/SATA). Considering SATA-I (what the ICH5R supports) is 150 MB/s per channel, and USB is 480 Mb/s, I can't see how this is a big problem. If you want fast (SCSI/FibreChannel/SATA-II HW RAID) disks and network, there are PCI-X 64-bit and PCIe x4, x8 slots that you can have your important I/O subsystem hanging off of. Here is a link to the Intel datasheets for the chipsets, which shows 3 x8 PCIe interfaces for the 7520 and 1 for the 7320. http://www.intel.com/products/chipsets/E7520_E7320/
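A quick sanity check of that argument, as a sketch only: it compares peak interface rates (not sustained device throughput) against the 266 MB/s link. The SATA figure is the one quoted above; 480 Mb/s is USB 2.0's nominal signalling rate. The helper name is invented for illustration.

```python
# How much of the 266 MB/s hub link would legacy devices use at their
# peak interface rates? Peak rates only; sustained throughput is lower.

HUB_LINK_MB_S = 266.0
SATA_I_MB_S = 150.0        # per channel, as quoted in the comment
USB2_MB_S = 480.0 / 8      # 480 Mb/s nominal, converted to megabytes/s


def link_utilisation(device_rates_mb_s):
    """Fraction of the hub link consumed if every listed device ran at peak."""
    return sum(device_rates_mb_s) / HUB_LINK_MB_S


if __name__ == "__main__":
    print(f"one SATA channel:  {link_utilisation([SATA_I_MB_S]):.0%}")
    print(f"SATA + USB 2.0:    {link_utilisation([SATA_I_MB_S, USB2_MB_S]):.0%}")
    print(f"two SATA channels: {link_utilisation([SATA_I_MB_S, SATA_I_MB_S]):.0%}")
```

On paper a single channel fits comfortably, while two channels both running at full interface speed would exceed the link, so how much this matters depends on what the attached devices can actually sustain.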
All that being said, the CPU itself is a dog.
The article gets the point of Hyperthreading... backwards.
Yes, the memory interface gets congested, so the processor takes a stall. But, instead of just leaving the ALU idle, it has another thread in reserve to schedule on it. Thus improving the utilization of the ALU subsystem.
And THAT'S the point of this "Hyperthreading" thang...
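Here is a toy model of that idea, offered as a sketch rather than a description of how the real scheduler works. It assumes each thread alternates a fixed burst of compute with a fixed memory stall, and that a second hardware thread can fill the stall perfectly; the function and parameter names are made up.

```python
# Toy model: ALU utilisation when hardware threads take turns during stalls.
# Assumptions: every thread computes for `compute_cycles`, then stalls on
# memory for `stall_cycles`; switching between threads is free; extra threads
# do not lengthen each other's stalls (which a shared bus may well do).

def alu_utilisation(compute_cycles, stall_cycles, hw_threads=1):
    """Fraction of cycles the ALUs are busy, capped at 100%."""
    busy_fraction = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, hw_threads * busy_fraction)


if __name__ == "__main__":
    for c, s in [(100, 300), (100, 100), (300, 100)]:
        one = alu_utilisation(c, s, hw_threads=1)
        two = alu_utilisation(c, s, hw_threads=2)
        print(f"compute={c} stall={s}: 1 thread {one:.0%}, 2 threads {two:.0%}")
```

The last assumption is the one the article disputes: if the second thread also has to go out over the same congested memory interface, the stalls get longer and the gain shrinks.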
The rest? Well, if the local L1/L2 cache isn't big enough, you are going to suffer. Yes, a bigger pipe to memory would help, but you are STILL several times slower than you could be. That's why you have the cache.
Anyway, it's a balancing act.
Just another "Cubible(sic) Joe" 2 17 3061
I'm typing this on a system with two 3.6GHz Xeons with Hyperthreading enabled. The system uses two 300GB Ultra-320 SCSI disks set up in a mirror and has 4GB RAM installed. When I run Netscape the performance is about as good as on my other system, which is a 2.4GHz P4 with 1GB RAM and one SATA drive. However, the dual Xeon system runs my DBMS queries blazingly faster, __much__ faster than the P4-based system.
Many DBMSes work like Apache and "fork" a new server process for each client, so when 12 processes each connect to the DBMS I see 12 copies of the DBMS server software running. DBMSes tend to run in tight loops where the instructions stay cached in the CPU's L2 cache.
Solaris is smart enough to "know" which CPUs share on-chip cache and that all CPUs are not "symmetric", and so can schedule processes to stay within a group of "close" CPUs. At boot time the system looks around and builds a hierarchical model of the machine's configuration, noting which PCI buses, CPUs and memory are connected to what. On the larger systems there are cards with four CPUs, up to 4GB RAM and some PCI buses all on one card, and then the cards are connected by a backplane.
So when you read about these new CPUs, don't think about a low-end PC running Windows. These could be intended to go into a 20-way multiprocessor running Solaris (or Mac OS X).
You should figure that in 10 years the system on your desk will use techniques like today's high-end systems. What Sun's Sunfire does at the board level (four CPUs, buses and RAM) will be done at the chip level, and you will see four-core chips and then 16-core chips and then computers built with multiple 16-core chips designed like today's "starfire". What would you use such a system for? How about controlling a robot that can walk up a flight of stairs, respond to voice commands and identify objects with a vision system, all at once.
Another application is video rendering: lots of data to process, but the _code_ stays cached, and rendering, like DBMS and web serving, is parallelizable. In fact the video render problem is the prime example of what to use a room full of CPUs for. And just wait 'till you buy a High Def video camera. You will want one of those "quad core" Power Macs.
They did and they have, and they sell shedloads of them. It's called the Pentium M, dual-core 65 nm versions of which will be available next quarter. The currently-available single-core Dothan version performs pretty awesomely, matching FX-55 and P4 EE even at gaming, and all at less than a third the CPU power consumption of the Pentium 4.
Not only that, but they overclocked it to 2.5 GHz as you suggest, and this was back in May.
Because timings and MHz are ALWAYS 10^x, never 2^3x.
And this data rate is obviously 266MHz * 8 bit or something comparable.
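For anyone wondering how much the MB versus MiB distinction matters here, a small conversion sketch. The helper names are invented, and whether the link figure is really decimal is the parent's claim, not something this code establishes.

```python
# Decimal megabytes (10^6 bytes) versus binary mebibytes (2^20 bytes).

MB = 10**6
MIB = 2**20


def mb_to_mib(mb_per_s):
    """Convert a rate in MB/s (decimal) to MiB/s (binary)."""
    return mb_per_s * MB / MIB


def mib_to_mb(mib_per_s):
    """Convert a rate in MiB/s (binary) to MB/s (decimal)."""
    return mib_per_s * MIB / MB


if __name__ == "__main__":
    print(f"266 MB/s  = {mb_to_mib(266):.2f} MiB/s")
    print(f"266 MiB/s = {mib_to_mb(266):.2f} MB/s")
```

The two readings differ by a little under 5%, so the label is wrong-looking rather than misleading about the size of the bottleneck.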
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
I live in Lindenhurst, IL. It's not a "barf-sprawl"! It's a great little town. Why the hell Intel is using my hometown of ~10,000, I will never understand. Apparently there is a Lindenhurst in NY, but that is a smaller town, too. Weird!
There is no system task in existence that will not interface with memory somewhere along the line. AMD's shifting of the memory controller to the CPU was incredibly astute - memory is one of the most used components in any system, and one of the components most accessed by the CPU. We've all seen the huge benefits AMD CPUs have reaped as a result of this move and the restructuring of the low-level I/O buses, especially compared to Intel's paltry "more megahurts!!!1111oneoneone lollerskates" approach.
Goten Xiao
How much does it cost to buy a website to present Intel's CPU in a favorable light?
It's obvious that they needed to pay for this service since AMD Opterons are so much faster and everyone knows it.
Lindenhurst, NY has about 27,819. It was also George Washington's home for a while (back when it was called Breslau).
We've been building out a lot of systems for our (web) apps. They've ALL been based on Xeon processors despite the fact that the dual-core Opteron is clearly the way to go. The catch has always been availability of a solid dual socket board (we try to get as much raw power in a 1U case as we can so dual chips are kind of company culture around here).
If you've been using the Opteron, and it sounds like in production, I'd love to hear some details about good/compatible/stable hardware. I really, really, really don't want the next system I purchase to be another hot, slow Xeon.
Quack, quack.
Are there real world benchmarks of these things? All I see is a lot of Intel-bashing - which does not excite me much.
Here each Opteron has its own memory interface, while the Xeons have to share one FSB.
Despite what the freakin' article says, Lindenhurst (Intel E7520 chipset) is not the latest Xeon DP chipset (the often-cited GamePC benchmarks also use this chipset). Intel's latest Xeon chipset, the E8500 (Twin Castle), features dual independent FSBs running at 667MHz each. It's available now (e.g. Dell PowerEdge 6850 and PowerEdge 6800). The dual buses will be increased to 800MHz each in January (E8501 chipset). These new Paxville Xeons were released ahead of schedule (rushed in response to dual-core Opteron), so I think that's why the dual 800MHz bus chipset is trailing Paxville (which is capable of 800MHz FSB) by two months.
So I think the freakin' article is wrong when it says:
Also note that the GamePC benchmarks use two 800MHz Paxville Xeons on the E7520 chipset (single 800MHz bus). The current E8500 chipset has dual independent buses, but they only run at 667MHz each. I'm sure the dual-bus system will outperform the single-bus system by a lot, even though the dual buses each run 16.7% slower than the single bus. I'm also pretty sure the dual Opterons will still whup the dual Xeons, but not by so much.
TO START
PRESS ANY KEY
Where's the 'ANY' key? I see Esk, Kitarl, and Pig-Up...
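The single-bus versus dual-bus arithmetic from the comment above can be sketched as follows. It assumes 64-bit (8-byte) buses and takes "667MHz" as its nominal 166.67MHz quad-pumped value; it compares peak aggregate bandwidth only and says nothing about how the memory behind the buses keeps up. Names are made up for illustration.

```python
# Peak aggregate front-side bus bandwidth: one 800 MT/s bus vs two 667 MT/s.
# Assumptions: 8-byte wide buses, "667" meaning 2000/3 MT/s (166.67MHz x 4).

BUS_WIDTH_BYTES = 8
FSB_800 = 800.0
FSB_667 = 2000.0 / 3


def aggregate_bandwidth_gb(buses, mega_transfers):
    """Peak GB/s summed over `buses` independent buses."""
    return buses * mega_transfers * 1e6 * BUS_WIDTH_BYTES / 1e9


single = aggregate_bandwidth_gb(1, FSB_800)
dual = aggregate_bandwidth_gb(2, FSB_667)
per_bus_slowdown = 1 - FSB_667 / FSB_800

if __name__ == "__main__":
    print(f"single 800 bus: {single:.2f} GB/s")
    print(f"dual 667 buses: {dual:.2f} GB/s")
    print(f"each 667 bus is {per_bus_slowdown:.1%} slower than the 800 bus")
```

On these assumptions the dual-bus arrangement has more total bus bandwidth despite each bus being slower, which matches the expectation stated in the comment.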
Is it just me, or is this name a cross between "Hindenburg" and "Lakehurst"?
This doesn't bode well for Intel.
You speak with absolutely NO idea of how, what, or why I do what I do. I'm glad you can afford this and if I had the final say we'd be running enterprise level hardware all around.
But guess fucking what? That's not the way it works for a lot of us in the *gasp* real world.
As far as your tidbit goes I agree 100%. Frankly I think you're just being an asshole who is either A) bragging about your leet warez or B) just another blow-hard who likes to try to cut people down and has neither the attention nor capability to grasp the big picture.
I build some servers. Get fucking over it.
Sincerely.
Quack, quack.
That's not the way it works for a lot of us in the *gasp* real world.
I wasn't aware that my company operated in some sort of imaginary fairy universe.
And if you can't afford to spend $800 on a server, you're doing something a.) very wrong or b.) that doesn't actually require a real server.
Do not fold, spindle or mutilate.
I grew up there, and that's always been a local rumor. I think it's supposed to have actually been in the Guinness Book at some point. True or not, anyone who's spent any time around the LIRR station wouldn't doubt it for a second.
Slashdot Burying Stories About Slashdot Media Owned
Here's what it looks like in the real world: Oh, you want hard drives with that? You'll need at least 2 because NO mission-critical box goes without RAID 1 and hot-swap bays.
And of course after you've covered all that you've still got $5000 for a single-processor Oracle license which you can add $1995 if you'd like hot-fixes and support for a single year. SBS? OEM is still another $500 for a 5 CAL.
Back in the pre-dot.bubble days we wasted oodles of money on "real" servers. 350's, 250's and a couple of Spark 5 workstations, dual homed with redundant T1's. Times have changed.
Now stop trying so hard to be a prick.
Quack, quack.
I lived on S. Broadway and worked for a while in downtown Manhattan. About 1/2 mile walk to the train station and I think I passed 4 bars on the way, all of them on Hoffman.
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Interesting. I've read the E8500 product description, and as I understand it, the two FSBs will remove the bottleneck when accessing the L3 Cache. But what about the actual interface to the memory modules?
I think the Opteron is still superior in that regard.
C - the footgun of programming languages
Maybe you need to talk to your finance department about those prices. $800 for a "real" server is a fairy tale. Or maybe you're just a basement dweller using refurbished Dells to run p2p out of your mom's closet.
OK...so a SunFire X2100 isn't a real server? And I have no idea where you're coming from on software licensing.
Look up in the thread. This is about how it's moronic to build a real server out of parts when so many servers with actual support from a real company are available for similar or possibly even better prices.
I have no idea what tangent you're trying to take this off on, but I'm not really interested in it.
Do not fold, spindle or mutilate.
Back in the pre-dot.bubble days we wasted oodles of money on "real" servers. 350's, 250's and a couple of Spark 5 workstations
I went back and read your comment again. "Spark" workstations? That explains it all. You have no idea what you're talking about.
Have a nice day.
Do not fold, spindle or mutilate.
Don't be an asshole and make me spell it all out for you.
Sun Sparc Ultra 5.
You're right, I'm abso-fucking-lutely clueless.
Quack, quack.
With options? Ya, it's a real server with a price tag to match. As far as software, maybe you've never done an IT budget but you can't spec prices on one without the other. Otherwise you've got no budget, and no approval.
Anyway, this is your tangent. If you can hit that funny back button a couple of times you'll see I was asking someone else a legitimate question before you decided to drop your tidbits.
Quack, quack.