Slashdot Mirror


Intel Reveals Itanium 2 Glitch

NeoChichiri writes "News.com is running an article about glitches in Intel's Itanium 2 chips. Even though it doesn't affect all chips, they have still stopped shipments of the new 450 Servers until the problem is resolved. Apparently it has to be 'a specific set of operations in a specific sequence with specific data.' Intel is saying that it affects the 900 MHz and 1 GHz Itanium 2 chips and that it will not affect the upcoming 1.5 GHz Itanium 2 6M chips." Until the next iteration of the chip arrives, though, Oliver Wendell Jones writes, "they recommend working around the problem by underclocking the processor to run at 800 MHz instead of its default 900 MHz or 1 GHz."

7 of 249 comments

  1. Alternative to underclocking by ethnocidal · · Score: 5, Informative

    Underclocking is typically necessary if a part needs more voltage than is allowed for with the default configuration. This is why when you overclock, the converse is generally required; you can get better overclocks by increasing voltage.

    Obviously, Intel are not going to encourage people to increase the voltage of their processors in order to run them at the default speeds, as this can run the risk of thermal damage to the chip with insufficient cooling, or overly high voltages. It may however still represent an option for system administrators who are keen to retain the performance of the chip.

  2. Possibly timing or power related by dprice · · Score: 4, Informative

    There isn't much detailed information about the exact conditions that bring out the bug, but they do state that the bug is electrical, that some unspecified combination of instructions and data patterns is needed, and that reducing the clock frequency avoids the problem. I can think of several things that might cause the bug. These are just guesses.

    One possibility is that there is a slow timing path in the logic that is marginally meeting the 900MHz or 1GHz clock speed. Going to 800 MHz gives the slow path more margin. This is the easy answer.
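
    To put rough numbers on that margin: the clock period is the time budget a signal gets to travel through the slowest path. This is only a sketch. It ignores setup time, clock skew and jitter, none of which the story gives figures for, and the helper name is made up for illustration.

```python
def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


# The two affected clock speeds from the story, plus the 800 MHz workaround.
for mhz in (1000, 900, 800):
    period = clock_period_ps(mhz)
    extra = period - clock_period_ps(1000)
    print(f"{mhz} MHz: period {period:.0f} ps, {extra:.0f} ps longer than at 1 GHz")
```

    So dropping from 1 GHz to 800 MHz hands every path a quarter more time per cycle, which would be plenty if a marginal path is only missing timing by a little.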

    Another possibility is that they have some part of the chip that has insufficient metal to deliver power to the logic gates. The right combination of activity might cause enough voltage droop to cause logic errors. Slowing the clock reduces the power consumption in CMOS chips.
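
    For a rough feel of the power side, switching power in CMOS goes approximately as C * V^2 * f. The sketch below assumes capacitance and switching activity stay fixed and ignores leakage. The function name and the default voltages are placeholders, not Itanium 2 figures.

```python
def dynamic_power_ratio(f_new_mhz, f_old_mhz, v_new=1.0, v_old=1.0):
    """Ratio of CMOS switching power, from P ~ C * V^2 * f.

    Capacitance and activity factor are assumed unchanged; leakage is ignored.
    Voltages are relative placeholders, not real Itanium 2 values.
    """
    return (f_new_mhz / f_old_mhz) * (v_new / v_old) ** 2


# Same voltage, lower clock: switching power falls in step with frequency.
print(dynamic_power_ratio(800, 1000))
print(dynamic_power_ratio(800, 900))
```

    At the same voltage, going from 1 GHz to 800 MHz cuts switching power by about a fifth, which also means less current through any undersized power rails.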

    They might have a crosstalk problem between some signals that could flip bits when the right activity and frequency are combined. Slowing the clock can shift the relative positions of signal transitions.

    Eventually more details might surface, but Intel is probably keeping it quiet so that people don't write code to maliciously crash servers.

  3. Re:How about others (AMD, Mot, IBM) by vadim_t · · Score: 5, Informative

    Not very uncommon, really. Here are some AMD bugs, for example. I think the deal is that the Itanium has a rather serious problem that's been undetected for a long time. Itanium based computers can cost about $20000, which is why it's a big deal. If you have such a system you probably are running something important on it.

  4. Re:How about others (AMD, Mot, IBM) by questamor · · Score: 3, Informative

    The 68040 bug affected quite a few LC040 machines, which made running FPU emulation on them horrid. Basically, trapping calls to the FPU in order to emulate them in software doesn't work as it should. It's b0rked, and most Apple 68LC040 machines just cannot fully emulate an FPU. That wasn't such a problem with the MacOS at the time, as it didn't need an FPU for any functions, nor did most apps.

    Running a normal Linux or NetBSD on one of these machines is asking for pain, however.

  5. Re:Ironic? by cgori · · Score: 4, Informative

    I love posts that are COMPLETELY TOTALLY WRONG.

    The number of states is 2 to the power of the numbers you were talking about. Even if I take the lowest number ("a couple dozen Kbytes") that you mentioned, it's 2^(2*12*1024*8) = 2^196608.

    Guess what?

    That's a HUGE number -- way bigger than the "billions of petabytes" you were saying is impossible to recreate for software testing. It's roughly equivalent to 10^59000 (if that somehow makes things easier for you). Of course, the "couple dozen Kbytes" is a massive underestimation of the total state of a modern CPU (100 million transistors, even just making flip-flops will give 2.5M bits of state, and for 6T SRAM more like 16M bits).
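
    A quick way to check this kind of arithmetic, reading "a couple dozen Kbytes" as 2*12*1024 bytes (my reading, nothing more precise was given). The variable names are just for illustration.

```python
import math

# "A couple dozen Kbytes" read as 2 * 12 * 1024 bytes, 8 bits per byte.
state_bits = 2 * 12 * 1024 * 8

# The number of distinct states is 2**state_bits. Convert the exponent to
# base 10 with logarithms instead of building the enormous integer.
decimal_exponent = state_bits * math.log10(2)

print(f"{state_bits} bits of state -> 2^{state_bits} states")
print(f"which is about 10^{decimal_exponent:.0f}")
```

    Every extra bit doubles the state count, so the real figure for a CPU with millions of bits of state is far beyond even this.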

    And then you have the nice problem that physics and electrical phenomena play havoc with hardware testing simulations, as opposed to software, which only has to worry about bad boolean logic.

    Come talk to me next time you have to worry about alpha-particle hits changing the state of any of your code or when you care about any event with picosecond granularity (which is just about every day in hardware).

    Yes, software testing has even more states to worry about, but trust me when I tell you that the hardware problem is plenty big enough to prevent exhaustive testing from being applicable. Hardware testing uses a lot of brute-force regression and detailed test planning to find and remove bugs. Software folks would do well to use such methodologies.

  6. Not ppc603s by questamor · · Score: 4, Informative

    How about Motorola leaving out critical instructions in the PPC603 and crippling every machine with one compared to the PPC601?

    That's a very, very big reinterpretation of the facts. PPC603 machines were designed for low cost and low heat. One of the ways to do this was to remove instructions that were not needed: legacy instructions from before the PPC601 that were never meant to stay in the architecture. They were not 'critical' and did not cripple anything. PPC603 CPUs ended up working exactly as they were designed to: as cheaper and less energy-hungry CPUs.

    the G3 floating point debacle where excel spreadsheets would show up errors consistently

    You made a typo there. "Pentium" is not spelled "G3"

  7. Give ME a break by m11533 · · Score: 3, Informative

    While I have no particular animosity toward Intel, other than it is important for there always to be competition to push them, I do not think they need to be let off the hook. Itanium has been around a very long time. You may think of it as new technology, but that is more because of the lack of acceptance in the marketplace, not because it has only recently been released. What was happening all of these years since Itanium was initially launched?

    Additionally, while the Itanium instruction set takes a different approach to those of Intel's competitors, they are not the only company introducing new CPUs. I do not remember such problems when other 64-bit CPUs with their own, new, unique instruction sets were launched by Digital, HP, IBM or Sun to name just a few. These days, the competitive landscape has been radically reduced. Digital no longer exists and its Alpha architecture is owned by Intel. HP, while it still owns its PA-RISC architecture, is trying to migrate its customers to Itanium, though it is hard to say what will really happen to PA-RISC since no one seems anxious to adopt Itanium. IBM also has picked up Itanium, so who knows what will happen to their RISC architecture? That leaves Sun, and while SPARC has always been the weaker of the RISC architectures, it seems to be the primary remaining competitor to Itanium and Intel. Of course, who knows how much longer Sun will survive as an independent company? Maybe they are the next to be gobbled up by IBM or HP, both already committed to Itanium, so what happens to SPARC?

    Finally, it is hard to say exactly where AMD fits in all of this. Its 32-bit line provides excellent competition to Intel's 32-bit Pentium family (now at P4), and the AMD 64-bit architecture looks like a nice increment beyond the now very old x86 32-bit architecture. But, in terms of major pressure on future CPU architectures? I just don't know where that competition will be coming from... Maybe China, Russia, Japan, India? Places not noted for their hi-tech prowess, but with lots of experience in fabrication and lots of affordable talent?