Slashdot Mirror


Microsoft Advice Against Nehalem Xeons Snuffed Out

Eukariote writes "In an article outlining hidden strife in the processor world, Andreas Stiller has reported the scoop that Microsoft advised against the use of Intel Nehalem Xeon (Core i7/i5) processors under Windows Server 2008 R2, but was pressured by Intel to refrain from publishing this advisory. The issue concerns a bug causing spurious interrupts that locks up the Hypervisor of Server 2008. Though there is a hotfix, it is unattractive as it disables power savings and turbo boost states. (The original German-language version of the article is also available.)"

14 of 154 comments

  1. Broken processors by Anonymous Coward · · Score: 5, Insightful

    The processors are clearly broken, and anyone who bought them should get a refund or an exchange. End of story.

    1. Re:Broken processors by hattig · · Score: 4, Insightful

      It's pretty serious.

      Server requirements of CPUs include virtualisation and power savings (saving power in the data centre is a top priority for companies now).

      This CPU cannot do both at the same time, at least with Windows Server 2008's Hypervisor. Presumably it is being sold with both items listed as features, however. I agree with the OP - the CPUs are broken as sold and advertised.

    2. Re:Broken processors by agnosticnixie · · Score: 2, Insightful

      Or the processor exposes an issue with the OS...

    3. Re:Broken processors by countach · · Score: 2, Insightful

      So you've missed the entire trend towards power saving in the data center?

  2. Re:AMD is looking better and this is the type of s by Anonymous Coward · · Score: 2, Insightful

    AMD is incapable of having bugs in the convoluted exception path?

  3. Isn't it really a bug in Windows Server? by tomhudson · · Score: 5, Insightful

    FTFA:

    For the integrated hypervisor of Windows Server 2008 R2, Microsoft has bravely resorted to a timer function that they themselves had classified as unreliable for former processors: the timer of the Advanced Programmable Interrupt Controller (APIC). Unlike, for example, the CPU timer (Time Stamp Counter, TSC) - which by now is comparatively resistant to power-saving, SpeedStep and turbo-boost modes, but is also virtualised by virtual machines - the APIC timer can also trigger interrupts. Unfortunately, right now, the Nehalem has too many of those, so that the hypervisor falters and then stops, returning the message "Clock_Watchdog_Time-out".

    So yes, if you depend on something that generates an interrupt whose code path may be suspended in certain power-saving modes, don't be surprised if it doesn't get serviced promptly. It looks more like a bug in Windows Server.

    Back in the old days, when you issued a CLI instruction, you made sure your routine didn't do too much work before issuing an STI, because that code isn't re-entrant (it's directly modifiable by the hardware, which is why you have to use the "volatile" keyword to make sure that compilers don't "optimize away" any loops, etc.). Kind of hard to guarantee that if you're putting that portion of the hardware to sleep between interrupts. As the article points out, disabling those power-saving modes fixes the problem.
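
The argument above (a periodic timer interrupt that is serviced late can trip a watchdog) can be sketched as a toy model. This is only an illustration of the commenter's reasoning, not the actual Hyper-V mechanism; the function name, the tick period, the watchdog limit and the wake latencies are all made-up assumptions.

```python
def first_watchdog_timeout(tick_period_ms, watchdog_limit_ms, wake_latencies_ms):
    """Return the 1-based index of the first tick that trips the watchdog, or None.

    Toy model (hypothetical): tick i is due at i * tick_period_ms and is
    serviced wake_latencies_ms[i - 1] later, e.g. because the core first has
    to leave a deep power-saving state. The watchdog trips when the gap
    between two consecutively serviced ticks exceeds watchdog_limit_ms.
    """
    last_serviced_ms = 0.0
    for i, latency_ms in enumerate(wake_latencies_ms, start=1):
        serviced_at_ms = i * tick_period_ms + latency_ms
        if serviced_at_ms - last_serviced_ms > watchdog_limit_ms:
            return i
        last_serviced_ms = serviced_at_ms
    return None


# Ticks serviced promptly: the watchdog never trips.
always_awake = first_watchdog_timeout(10, 25, [0, 0, 0, 0])

# The third tick is serviced 30 ms late, so the gap since the previous
# serviced tick is larger than the 25 ms limit.
deep_sleep = first_watchdog_timeout(10, 25, [0, 1, 30, 0])
```

In this model, setting every wake latency to zero stands in for disabling the power-saving states, and the timeout goes away.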

    1. Re:Isn't it really a bug in Windows Server? by Anonymous Coward · · Score: 2, Insightful

      This article is gibberish. The TSC does not generate interrupts. As a clocksource, the TSC is unreliable because while the frequency is fixed within a socket, it can skew across sockets particularly when dealing with multi-node systems.
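
The cross-socket skew described above can be illustrated with a small simulation. It is a sketch under stated assumptions: `SocketTsc`, the tick rate and the offsets are hypothetical, and real hardware is read with RDTSC rather than modelled like this.

```python
from dataclasses import dataclass


@dataclass
class SocketTsc:
    """Toy per-socket counter: same frequency everywhere, different start offset."""
    offset_ticks: int
    ticks_per_us: int = 1000  # hypothetical rate, identical on every socket

    def read(self, true_time_us):
        # Value a thread running on this socket would see at the given time.
        return self.offset_ticks + self.ticks_per_us * true_time_us


def elapsed_ticks(start_socket, end_socket, t0_us, t1_us):
    """Elapsed ticks as computed by a thread that may have migrated in between."""
    return end_socket.read(t1_us) - start_socket.read(t0_us)


socket_a = SocketTsc(offset_ticks=0)
socket_b = SocketTsc(offset_ticks=-5000)  # assumed to lag socket_a by 5 us

# Both reads on the same socket: 2 us of real time gives 2000 ticks.
same_socket = elapsed_ticks(socket_a, socket_a, 100, 102)

# Thread migrates from socket_a to socket_b between the reads: the computed
# interval is negative even though real time moved forward.
migrated = elapsed_ticks(socket_a, socket_b, 100, 102)
```

The frequency never changes in this model; the constant offset between sockets is enough to make a naive subtraction go backwards.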

  4. Re:AMD is looking better and this is the type of s by CAIMLAS · · Score: 2, Insightful

    I wouldn't say "AMD is better", necessarily. I will say, however, that the Xeons seem to have been plagued from the very beginning with problems like this. They're just fringe enough to not get enough run-in testing, and the bugs don't get found as quickly as they do with the more mainstream, widely used processors.

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
  5. Re:AMD is looking better and this is the type of s by amorsen · · Score: 3, Insightful

    Read the link. 5 pages of errata, and that's just headlines. Modern processors are very complicated, and they will have bugs.

    The major difference between Intel and AMD when it comes to errata is that Intel learned its lesson about secrecy from the Pentium FPU fiasco. Since then they have had a very open approach to processor bugs. AMD hasn't had such a PR disaster and isn't quite as open. That doesn't mean they are particularly less buggy.

    --
    Finally! A year of moderation! Ready for 2019?
  6. Performance, complexity & bugs by Alwin Henseler · · Score: 3, Insightful

    No, it's more like [hardware manufacturer of your choice] AND [software manufacturer of your choice] are incapable of making products that are both complex, and bug-free.

    And for some reason, 'high performance' often equals 'complex'.

  7. No evidence of problem in Xen or VMWare -MSFT bug by Glasswire · · Score: 2, Insightful

    Looks like it's a Microsoft coding problem if there is no problem in Xen or VMWare ESX Hypervisors (post on VMware above is far from useful).
    And poster didn't read the MSFT article very closely. The hotfix doesn't preclude the energy saving sleep states, it's the workaround that inhibits their use.

  8. Re:What about for Windows 7? by cwebster · · Score: 2, Insightful

    Actually no, IE8 is the only program you mentioned that actually needs an i7 920 and 12 gigs of RAM to properly execute.

    The rest of your post is like a word problem, "Sally has 5 fish, 2 turtles and a cat. How many cats does Sally have?" That is to say, completely irrelevant to the question at hand.

    Using putty to justify a multiple core machine, quite hardcore...

  9. Re:AMD is looking better and this is the type of s by mysidia · · Score: 3, Insightful

    It's the equivalent to writing a program against the Windows API, not testing it, and calling the API buggy when you find that it is failing in the wild.

    The API may not match the spec perfectly, but it's your software that's buggy.

    Intel can revise the proc, or revise the spec to be in agreement.

    MS is trying to use an APIC interrupt for timing that isn't normally used for that purpose.

    It's the equivalent of attaching an alarm clock to your electric car's engine, and complaining when the idling speed deviates due to a power saving feature.

    Nehalem processors were out long before 2008 R2 or the newest Hyper-V release.

    Intel Nehalem is a processor with features very attractive to users of virtualization; it's one of the most common procs used in new server deployments.

    There is absolutely no excuse for MS not extensively testing and qualifying, including stress-testing, their Hypervisor on Nehalem CPUs before releasing the code.

    It would be like you or me writing a piece of desktop software today (in 2009), designed for use with Windows, and extensively testing it on Windows '98 and XP, but not discovering a frequent crash on Vista that almost always occurs as soon as the program starts.

  10. Re:What about for Windows 7? by Anonymous Coward · · Score: 1, Insightful

    What the hell is wrong with you people?

    This article is about the Xeon processor. These are separate from what is sold on the desktop. There was a time when your average Slashdotter used to know this kind of stuff.