Slashdot Mirror


Microsoft Advice Against Nehalem Xeons Snuffed Out

Eukariote writes "In an article outlining hidden strife in the processor world, Andreas Stiller has reported the scoop that Microsoft advised against the use of Intel Nehalem Xeon (Core i7/i5) processors under Windows Server 2008 R2, but was pressured by Intel to refrain from publishing this advisory. The issue concerns a bug causing spurious interrupts that locks up the Hypervisor of Server 2008. Though there is a hotfix, it is unattractive as it disables power savings and turbo boost states. (The original German-language version of the article is also available.)"

2 of 154 comments

  1. Broken processors by Anonymous Coward · Score: 5, Insightful

    The processors are clearly broken, and anyone who bought them should get a refund or an exchange. End of story.

  2. Isn't it really a bug in Windows Server? by tomhudson · Score: 5, Insightful

    FTFA:

    For the integrated hypervisor of Windows Server 2008 R2, Microsoft has bravely resorted to a timer function that they themselves had classified as unreliable for former processors: the timer of the Advanced Programmable Interrupt Controller (APIC). Unlike, for example, the CPU timer (Time Stamp Counter, TSC) - which by now is comparatively resistant to power-saving, SpeedStep and turbo-boost modes, but is also virtualised by virtual machines - the APIC timer can also trigger interrupts. Unfortunately, right now, the Nehalem has too many of those, so that the hypervisor falters and then stops, returning the message "Clock_Watchdog_Time-out".

    So yes, if you depend on something that generates an interrupt whose code path may be suspended in certain power-saving modes, don't be surprised if it doesn't get serviced promptly. It looks more like a bug in Windows Server.

    Back in the old days, when you issued a CLI instruction, you made sure your routine didn't do too much work before issuing an STI, because that code isn't re-entrant. (The data it shares with the interrupt handler can be changed by the hardware at any time, which is why you have to use the "volatile" keyword to make sure that compilers don't "optimize away" any loops that poll it, etc.) Kind of hard to guarantee that if you're putting that portion of the hardware to sleep between interrupts. As the article points out, disabling those power-saving modes fixes the problem.
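
For anyone who hasn't written this kind of code, here is a rough user-space sketch of the CLI/STI discipline described above. It is only an analogy, not kernel or hypervisor code: POSIX signals stand in for hardware interrupts, sigprocmask() stands in for CLI/STI, and the names tick_count, on_tick, install_tick_handler and read_and_reset_ticks are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Shared between the handler and the normal code path. "volatile" tells
 * the compiler the value can change behind its back, so every read and
 * write really goes to memory instead of being cached or optimized out. */
static volatile sig_atomic_t tick_count = 0;

/* Plays the role of the interrupt service routine: do the bare minimum. */
static void on_tick(int signo)
{
    (void)signo;
    tick_count++;
}

/* Hypothetical helper: route SIGUSR1 to on_tick. Returns 0 on success. */
int install_tick_handler(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical helper: read and clear the counter as one unit.
 * Blocking the signal is the stand-in for CLI, restoring the old mask
 * is the stand-in for STI. The work in between is kept as short as
 * possible, because ticks that arrive meanwhile are held pending. */
int read_and_reset_ticks(void)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    sigprocmask(SIG_BLOCK, &block, &old);   /* "CLI" */
    int seen = tick_count;
    tick_count = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);   /* "STI" */

    return seen;
}
```

Call install_tick_handler() once, let something deliver SIGUSR1, and poll with read_and_reset_ticks(). The point of the sketch is the shape: the window between the block and the unblock is only two statements long, so a pending tick is never held off for long. If whatever services the tick goes to sleep inside that window, that guarantee is gone.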