Microsoft Advice Against Nehalem Xeons Snuffed Out
Eukariote writes "In an article outlining hidden strife in the processor world, Andreas Stiller has reported the scoop that Microsoft advised against the use of Intel Nehalem Xeon (Core i7/i5) processors under Windows Server 2008 R2, but was pressured by Intel to refrain from publishing this advisory. The issue concerns a bug causing spurious interrupts that locks up the Hypervisor of Server 2008. Though there is a hotfix, it is unattractive as it disables power savings and turbo boost states. (The original German-language version of the article is also available.)"
The processors are clearly broken, and anyone who bought them should get a refund or an exchange. End of story.
AMD is looking better. This is the type of stuff that Intel worshipers say AMD systems do, so what will they say about Intel now?
AMD is incapable of having bugs in the convoluted exception path?
Maybe Xeons are what end up being used on the UESG Marathon. I mean, half of the terminal messages on that ship are subject to the same bug. Just look at this typical example:
http://marathon.bungie.org/story/nawmanhesclose.html#M3.13.1.1
The English word fart is one of the oldest words in the English vocabulary.
This story is interesting and timely because I plan on buying a new desktop in the next 2 weeks, just waiting for the right deal to come out, hopefully on Cyber Monday. While not getting a server, I will be getting Windows 7. I had been planning on an i7, but now am hesitant. Is there a problem with these processors for home use/gaming purposes under Windows 7? Or would I be better off going with a Quad Core?
-"Those who fought today will die tomorrow."-
Many of the benchmarking sites have also posted some poor results - I was thinking this might be a generation to skip, but now I wonder if a flaw has been discovered that could be fixed with a microcode upload. Might help the benchmarks too if it was a hidden variable.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
It sounds like Microsoft should retract the advice and issue a warning that no OS should be run on a processor with such spurious interrupts?
Or is this the sort of crappy hardware kernels are supposed to put up with, in which case it should be Intel advising against running Windows on its hardware?
Int€l bashing..check
M$ bashing...check
Now I just sit back and watch the karma roll in
I've been experiencing problems with intermittent lockups under VMware as well, on DL370 G6 boxes. HP has given us BIOS fixes and is even shipping new boxes, but if there's a suspected problem with MS's hypervisor, I wonder if this is the same issue?
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Nothing to see here. Move along. What? Nevermind where I work.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
I wouldn't be so sure without seeing test results.
I read the article, I read the MS support report, and I read the Intel advisory. And I don't think that the summary is correct.
The summary says that the hotfix disables power savings and turbo boost. But my reading of the MS report is that an affected system has two options, (1) a workaround, and (2) the hotfix. The difference is that the workaround disables advanced power savings and is known to be stable without side effects, but the hotfix actually fixes the problem with the vector table, presumably by following the instructions provided in the Intel advisory note.
Said another way, the hotfix doesn't disable power savings and doesn't disable turbo boost.
I expect that this is another fine example where Slashdot editors misunderstand a situation. Someone prove me wrong.
FTFA:
So yes, if you depend on something that generates an interrupt whose code path may be suspended in certain power-saving modes, don't be surprised if it doesn't get serviced promptly. It looks more like a bug in Windows Server.
Back in the old days, when you issued a CLI instruction, you made sure your routine didn't do too much work before issuing an STI, because that code isn't re-entrant (it's directly modifiable by the hardware, which is why you have to use the "volatile" keyword to make sure that compilers didn't "optimize away" any loops, etc). Kind of hard to guarantee that if you're putting that portion of the hardware to sleep between interrupts. As the article points out, disabling those power-saving modes fixes the problem.
I wouldn't say "AMD is better", necessarily. I will say, however, that the Xeons seem to have been plagued from the very beginning with problems like this. They're just fringe enough to not get enough run-in testing, and the bugs don't get as quickly found as they do with the more mainstream/many users processors.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Read the link. 5 pages of errata, and that's just headlines. Modern processors are very complicated, and they will have bugs.
The major difference between Intel and AMD when it comes to errata is that Intel learned its lesson about secrecy from the Pentium FPU fiasco. Since then they have had a very open approach to processor bugs. AMD hasn't had such a PR disaster and isn't quite as open. That doesn't mean they are particularly less buggy.
Finally! A year of moderation! Ready for 2019?
From the PDF file linked from the Intel site, I think it's AAK36, as it's the only one that mentions the word "spurious." This has to do with writing to the interrupt vector table when a local interrupt is pending. That doesn't look terribly serious from my perspective. If I'm mistaken and it's a different erratum, please reply with the correction.
No, it's more like [hardware manufacturer of your choice] AND [software manufacturer of your choice] are incapable of making products that are both complex, and bug-free.
And for some reason, 'high performance' often equals 'complex'.
Looks like it's a Microsoft coding problem if there is no problem in Xen or VMWare ESX Hypervisors (post on VMware above is far from useful).
And poster didn't read the MSFT article very closely. The hotfix doesn't preclude the energy saving sleep states, it's the workaround that inhibits their use.
and also switches contexts about 5 times as fast as well (and these are full processes, not threads)
Seeing as Linux has about 1/5th of useful software compared to Windows, I guess it all balances out in the end.
And here's me, with a ton of recent Insightfuls and a good backlog of Excellent karma, pissing it all up the wall for one dig at an obvious Linux fanboi ... when will I ever learn ?
I'm putting together a dual i7 Xeon server for a customer in a couple of weeks. He's planning to run SBS 2008 on it. If he's not doing any virtualization on it, will he be affected?
There is no evidence Intel pressured MS into their wording of the fix/workaround. It's quite possible that after not finding a fix/workaround for it and writing an initial draft saying not to use the processors, MS developed a workaround/fix (perhaps with Intel's help) that actually does work and put that in instead of saying not to use the chips.
To those who are suddenly concerned about Intel chips because they have errata: every chip has errata, tons of them. AMD has them too, trust me.
I've been running a Core i7 (920) for a year and it's worked great under Vista and Windows 7. I'm sure it has faults, but they don't seem to be an issue in my regular use.
http://lkml.org/lkml/2005/8/20/95
Xeon is just a marketing name. The Xeon 3400 are identical with the i5-7xx, i7-8xx CPUs, the Xeon 3500 are identical with the i7-9xx CPUs and the Xeon 5500 CPUs are basically i7-9xx with two QPI Links.
For example, this issue also affects i5 and i7 CPUs.
It couldn't possibly be that VMware, Xen and Microsoft have different approaches to the whole hypervisor thing, which could expose different bugs in Intel's hardware.
Intel i7(quad) @ 3.33ghz Idle: 117watts Full-Load: 247watts
Phenom2 x4(quad) @ 3.2ghz Idle: 148watts Full-Load: 236watts
Now, include Intel's CPU being 2x-4x faster (depending on the type of work), check your performance per watt, and tell me which is better.
I'm trying to follow your logic of AMD being better (at least for now; Bulldozer has a lot of promise).
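The performance-per-watt comparison the comment asks for can be worked through with a few lines of arithmetic. This is only a rough sketch: it uses the full-load wattages quoted above, which are unverified review figures, and it treats the claimed 2x-4x speedup as an assumption rather than a measurement. The function name `perf_per_watt_ratio` is made up for the illustration.

```python
# Full-load wattages as quoted in the comment (unverified review figures).
INTEL_I7_FULL_LOAD_W = 247.0
PHENOM_II_FULL_LOAD_W = 236.0


def perf_per_watt_ratio(speedup, watts_fast, watts_slow):
    """Return how many times more work per watt the faster chip delivers.

    speedup    -- throughput of the faster chip relative to the slower one
    watts_fast -- full-load power draw of the faster chip, in watts
    watts_slow -- full-load power draw of the slower chip, in watts
    """
    if watts_fast <= 0 or watts_slow <= 0:
        raise ValueError("power draw must be positive")
    # (speedup / watts_fast) divided by (1 / watts_slow)
    return speedup * watts_slow / watts_fast


if __name__ == "__main__":
    # 1x is included as a baseline; 2x and 4x are the comment's claimed range.
    for speedup in (1.0, 2.0, 4.0):
        ratio = perf_per_watt_ratio(
            speedup, INTEL_I7_FULL_LOAD_W, PHENOM_II_FULL_LOAD_W
        )
        print(f"assumed speedup {speedup:.0f}x -> {ratio:.2f}x performance per watt")
```

With equal throughput the lower-wattage chip comes out slightly ahead, so the conclusion depends entirely on how large the real speedup is for a given workload.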
That's odd, we must be running different versions of Windows. I run it too, and it has exactly one useful program on it, the game I play. In contrast, my O/S of choice, Linux, has thousands of useful programs on it, which I use daily. No contest.
And you think like one too! Although on the positive side, you can probably recognize fine beers instead. ;-)
It's a processor bug exposed by a new hypervisor technique used by MS and nobody else.
I'm not sure why you want to blame this on MS.
I think we have a clown, mod funny!
How times have changed. I remember when Intel used to be the bitch in the relationship.
AMD has also built parts with equally screwed up timers, particularly TSC clock skew on multi-cores. Timers are just messed up on x86 from either company. This nonsense goes back years. There are now at least four distinct general purpose clock sources that must be present on modern systems: tsc, acpi_pm, hpet and pit (as labeled by the Linux kernel). There will probably be further proliferation in the future as ALL of the existing timers are inadequate in subtle ways. Implementations from both manufacturers have been plagued with bugs that require nasty work-arounds; google "clocksource tsc unstable", "pm-timer bug" or "athlon x2 tsc" for some examples. This nonsense that Microsoft has stumbled upon is just the latest in a long and colorful history of failure that we'll now have to add to the list.
Computers are supposed to keep time. Today that means high resolution clocks that work correctly regardless of power saving, concurrency, etc. Using these crucial timers is not supposed to cause spurious interrupts, bus contention or other subtle problems. People that must work with this stuff are thoroughly fed up with this ever growing pile of half-baked bullshit.
Lurking at the bottom of the gravity well, getting old
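For anyone who wants to see which of the clock sources named above their own Linux box offers, the kernel lists them in sysfs. A minimal sketch, assuming a Linux system with the usual `/sys/devices/system/clocksource/clocksource0` directory; the helper name `read_clocksources` is invented for the example, and it simply returns nothing useful on other systems.

```python
from pathlib import Path

# Usual Linux sysfs location for clock source information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, sysfs not mounted, or the files are missing.
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    if current is None:
        print("no clocksource information found (not Linux, or sysfs unavailable)")
    else:
        print(f"current: {current}; available: {', '.join(available)}")
```

Which names show up depends on the hardware and kernel configuration, so the list may not match the four mentioned in the comment.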
People still buy processors from a thrice-convicted, unrepentant monopolist.
3 decades of anti-consumer anti-competitive activity and still they come up smelling like roses...
Science : Proprietary , Knowledge : Open Source
[needs citation]
BlaProc t9 (nine cores) @ 5.33Ghz idle: 24.3 Watts Full-Load: 25.6 Watts.
Xen need not use the hardware virtualization, and in fact performs far better in "para-virtualization". So would any system that avoided so much of the hardware virtualization and used a customized kernel, more suited to use in a virtualized OS by speaking more gracefully with the virtual server's system. I find it wonderful, and dearly wish that VMware could be convinced to support that kind of guest environment.
A Xeon proc is not the same as an i5 or i7.
I agree wholeheartedly. I once spent a lot of time trying to get a virtualised Windows machine to run in plain old VMware Server without the clock galloping ahead at 40% faster than wall-clock time; I tried many different things on the Linux host side as well as the VM and the VMware Tools install.
Will Intel and AMD please sit down like adults and come up with a standardised mechanism that virtualises and copes with dynamic clocking, multiple cores with/without hyperthreading and all the idle and sleep states?
Um, VMware does support that. That's what the VMI and VMIPAE kernels are for in many distributions. However, VMware just announced that based on their testing with the latest hardware, it's now as fast and sometimes faster to not paravirtualize, so they're phasing out support in 2010-2011.
It's the equivalent to writing a program against the Windows API, not testing it, and calling the API buggy when you find that it is failing in the wild.
The API may not match the spec perfectly, but it's your software that's buggy.
Intel can revise the proc, or revise the spec to be in agreement.
MS is trying to use an APIC interrupt for timing that isn't normally used for that purpose.
It's the equivalent of attaching an alarm clock to your electric car's engine, and complaining when the idling speed deviates due to a power saving feature.
Nehalem processors were out long before 2008 R2 or the newest Hyper-V release.
Intel Nehalem is a processor with features very attractive to users of virtualization; it's one of the most common procs to be used in new server deployments.
There is absolutely no excuse for MS not extensively testing and qualifying including stress-testing their Hypervisor on Nehalem CPUs before releasing the code.
It would be like you or me writing a piece of desktop software today (in 2009), designed for use with Windows, and extensively testing it on Windows '98 and XP, but not discovering a frequent crash on Vista, that almost always occurs as soon as starting the program.
There is a price to pay for being on the "bleeding edge" of technology.
You are essentially being an unpaid BETA tester for Microsoft, Intel, and whatever other components you happen to be using.
You are paying for the privilege of BETA testing, and since your software comes with NO WARRANTY or FITNESS FOR A PARTICULAR PURPOSE, and contains KNOWN DEFECTS, you should be happy to know your hard work will be used to make other people's lives easier.
I'm going to call BS on the AMD power numbers. I have a Phenom II X4 system and the whole computer doesn't even draw that amount of power.
Perhaps said companies would benefit by migrating their servers to Linux?
I'm running a brand new DL370 G7 with a pair of 2.93GHz Nehalems and Oracle VM Server 2.2 (specialized RHEL 5.3 Xen) and it seems to be working fine, except for a completely unrelated SAS backplane / SmartArray P410 failure I experienced before the box was a month old, but that was just a simple fluke of a hardware failure like any server can experience.
Should say "I'm running a brand new DL370 G6... " instead.
Too much Laphroaig between my brain and the keyboard tonight.
Do the timers in Freescale or TI (or IBM) processors have problems like this?
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
Folks, this is a very irresponsible headline at Slashdot. The Microsoft article does NOT say the hotfix breaks power saving, and it doesn't even mention turbo; it presents an either/or solution. Microsoft always offers workarounds as an ALTERNATIVE to the hotfix for people who don't want to apply hotfixes. The Microsoft KB article even tells you that if you want to keep using those power states, you should run the hotfix and make a certain modification to the registry.
This post makes it sound like some kind of cover up and that the fix causes major CPU slowdowns, and that it's on the level of the AMD Barcelona TLB bug where the fix actually did cause a significant performance drop. This does not appear to be true. The real story is that all CPUs have hundreds of errata, and it's the job of the software maker to work around it, and that is what Microsoft is doing with their hotfix and registry hack. They're also telling you if you aren't experiencing any problems, don't bother applying the hotfix.
I didn't see a link to the KB article in question. I assume this is the one: http://support.microsoft.com/kb/975530
That "offensive" tone seems to arise from our old "friends", the conservatives, who associate authority with righteousness (there's a TED talk about this).
The way I see it (TM) -- and use it -- replacing some letter with a currency sign in a name aims at conveying the idea that the name refers to a profit-seeking entity.
It's not a childish attempt at mockery like some conservative jerk would want us to believe, read ahead to understand why.
It is, IMHO, a "caveat emptor", a warning that one is no longer having a conversation with a friend, a sign one must be alert for hidden agendas.
M$, INT€L -- and others -- have clients, paying clients. It's not like talking to a manager of a non-paid GPL project. People usually are prone to confusion and think these corps can be their friends. Of course, for some "marketing" people, that might be the very aim...
To be fair, from their point-of-view, I too have a bias and maybe should carry some warning sign; being greatly concerned about freedom, some entities could get some qualifying sign in their names, too... kinda "watch out, these Freedom folks are trying to defend your rights, even if you're keen on giving them up... beware!".
It's also worth noting that the interrupt delivery mechanism on the Core 2 and newer processors is horribly complex. I'm not sure if these CPUs support VT-d, but I assume that they do and that makes it even more complicated. Parallels couldn't even get interrupt delivery right on the Core 2 (causing kernel panics as a result of IPIs in the host OS unless you paid for the new version) and that was a simpler system. Given the complexity of the hardware, it's not at all surprising that there are errata. Their test coverage probably isn't anywhere near 100%, and it's just bad luck for MS that their code happens to trigger it.
Designing a modern x86 CPU is a horrible task. Not only is the ISA byzantine, there are errata from earlier versions that people actually depend on. For example, a lot of games took advantage of one of the 486 bugs that caused condition flags to be spuriously set after certain instruction sequences. If you follow the documented behaviour, games crash unexpectedly and people complain that your CPU is broken.
I am TheRaven on Soylent News
Xen now tries to use a hybrid system if the hardware is available; they use the host CPU's virtualisation capabilities where available and PV code where it's faster. System call delivery is a good example of this. On an old CPU, the kernel will install an interrupt 80h handler using the Xen Hypercall. Any syscall instructions will be bounced slowly back from the hypervisor and any int 80h instructions will jump straight to the system call handler in the kernel. With a new CPU, the kernel will just set a syscall / sysenter handler as normal and these instructions will Just Work and it will use VT-x or AMD-V instructions for hypercalls. These are faster than the PV approach. Memory mapping, similarly, will be implemented using the host CPU's shadow or nested page tables if available, but a PV-aware guest may also use some Xen-specific features for avoiding extra context switches while doing this. In contrast, device drivers will always use the PV features if available because they are much faster. The emulated drivers will be installed and work for booting non-PV-aware guests but they will be replaced by the PV versions as soon as they are available.
I am TheRaven on Soylent News
Excellent points. But the last time I used Xen, there was no evidence of the kind of mixed hardware virtualization/para-virtualization you refer to here. It was either/or, hard-coded into the XML configuration files by the setup tools or by reconfiguring the installed guest environment. Are you saying this is a run-time detected and enabled behavior? And given that no server class motherboards ship with hardware virtualization enabled, it was very important to be able to run para-virtualized environments on other people's hardware, to avoid forcing them to reset their BIOS.
You can use the CPUID instruction in Xen to detect when you're running in an HVM environment and then use a virtual PCI device to map the XenStore and bootstrap the PV mode. Intel had a Linux kernel that did this in early 2007 (and even did some binary rewriting to replace native code with PV code where appropriate). Given the massive NIH that Linux has with respect to Xen, I wouldn't be surprised if it's not part of the stock tree, but you can probably get it working easily.
I am TheRaven on Soylent News
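A rough user-space way to see the first step described above, detecting that you are running as a guest, is to look for the `hypervisor` flag that Linux shows in `/proc/cpuinfo` on x86. That flag reflects the hypervisor-present bit reported by CPUID (leaf 1, ECX bit 31). The sketch below only parses text; it is not what the Intel kernel work did, it does not touch XenStore, and a purely paravirtualized guest may not report the bit at all. The function name `has_hypervisor_flag` is hypothetical.

```python
def has_hypervisor_flag(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only inspect the per-CPU feature list, not model names or other fields.
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        print("cannot read /proc/cpuinfo (not Linux?)")
    else:
        if has_hypervisor_flag(text):
            print("hypervisor bit reported: probably running as a guest")
        else:
            print("no hypervisor bit reported")
```

A kernel would run the CPUID instruction directly instead of reading this file, but the check is the same in spirit.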
I just googled review sites and that's what I came up with. They might've been total system draw and I just didn't see it.
Here, look at TDP then
http://en.wikipedia.org/wiki/Intel_Core_i7
i7 3.33ghz extreme ed - 130watt
http://en.wikipedia.org/wiki/Phenom_II
Phenom2 x4 3.4ghz 140watt
It is possible for TDP to be exceeded, but all reviews I've ever read had i7 below K10 for power consumption under load. Idle, the i7 was always below K10, but core parking gave it an advantage. Now that i7 can't core park, it should be on par with K10 idle.
Don't forget, the quad core i7 has 8 logical cores, and 1 logical core is outperforming 1 regular core of the current K10s.
Someone else posted, and I've also read independently, that the 4-core i7 outperforms the new 6-core K10. This would make sense: if 1 i7 logical CPU outperforms 1 K10 core, then 8 faster logical CPUs should outperform 6 slower cores. Although, we are talking about different generations of CPU architectures. This will actually make power draw worse. The K10 does not support core parking, and having 6 cores will make its idle and peak power draw higher than the i7's 4 cores, while the i7 will still outperform the K10.
Bulldozer sounds really nice though. Can't wait for some next gen CPU fights.
It's a processor bug exposed by a new hypervisor technique used by MS and nobody else.
I'm not sure why you want to blame this on MS.
Well, it is kind of what he do around here...