Intel Reveals Itanium 2 Glitch
NeoChichiri writes "News.com is running an article about glitches in Intel's Itanium 2 chips. Even though it doesn't affect all chips, they have still stopped shipments of the new 450 Servers until the problem is resolved. Apparently it has to be 'a specific set of operations in a specific sequence with specific data.' Intel is saying that it affects the 900MHz and 1 GHz Itanium 2 chips and that it will not affect the upcoming 1.5 GHz Itanium 2 6M chips." Until the next iteration of chip arrives though, Oliver Wendell Jones writes, "they recommend working around the problem by underclocking the processor to run at 800 MHz instead of its default 900 MHz or 1 GHz."
Well done intel ;-)
Underclocking too...
The Itanic 2 appears to be going down like the first...
Intel is now selling cuts of meat?!?!? WTF? Does the grasping greediness of this corporation know no end?
Their chops are busted?
underclocking the processor to run at 800 MHz
That'll make the chip more stable anyways. Always underclock your mission critical servers!
Is it a glitch or did they sell chips that can't run at the rated speed?
Looks like they are overclocking their own chips. Maybe they should sell them with a bigger fan on them... ;-)
Itanium 2 chops?... ?
Methinks dear Timothy hasn't really grasped what "editing" really means.
*in Homer Simpsons voice* 'mmmmmm.....Itanium 2 chops.......glazhzhzhz'
Bye-bye fans and thermal paste, hello heaters and insulation!
Is this something that could be addressed by a microcode update? I've always wondered about exactly what can be done with the Kernel support for microcode updates.
On a side note -- who exactly didn't expect something like this? Intel has a history of this sort of thing -- from the 80486DX not being able to add properly, and IBM having to halt shipments of PS/2 machines; to the Pentium F00F bug and others. Buying first run Intel chips is like playing dice with your business. Give them a few production runs to work out the bugs...
Learning HOW to think is more important than learning WHAT to think.
Perhaps they should put some silver stuff over the serial number. Welcome to the Intel Itanium scratchcard lotto, those with bad chips win a new one :)
What, this is going to affect all 6 people that own this chip?
can't sleep slashdot will eat me
Apparently it has to be 'a specific set of operations in a specific sequence with specific data.'
This sounds similar to the way they described the floating point divide error in the original Pentium. How long until they start giving odds on the chances of someone seeing the problem in normal use?
Jason
ProfQuotes
You deserve to be beaten like a red-headed step-child.
When I clicked to read more of this story I got an Intel ad at the bottom of the story. Gee, what great timing...
Two wrongs don't make a right, but 3 lefts do - Lew of GO magazine
whenever they come out with a new design, they tend to have all sorts of f00fy little problems with it.
a specific set of operations in a specific sequence with specific data.
hrm....
$crash = 1;
"they recommend working around the problem by underclocking the processor to run at 800 MHz instead of its default 900 MHz or 1 GHz."
Why not just buy the lower-clocked CPUs then? Will Intel replace the crap chips when a revision with a fix comes around?
"If the customer feels it's the right solution, we'll exchange processors with ones that aren't affected," she said. Intel has developed a simple software test that can determine whether a chip is affected. Meaning what? Lower-end chips that aren't affected, or a fixed version of the same chip? If it's the same chip, who wouldn't think it is the right solution? The article doesn't indicate whether the problem is actually solved either, only that it seems to be somewhat of an anomaly that doesn't affect all chips.
Not a good day for Intel, and probably another reason why you don't immediately need that "Newest on the shelf" CPU, whether for your home machine or a server. Besides, by the time this chip is assuredly fixed, a faster revision will probably be out at a comparable price.
It's literally etched in metal, kinda hard to change it at this point.
take your CPU down anyways.
3rd time right here
Underclocking is typically necessary if a part needs more voltage than is allowed for with the default configuration. This is why when you overclock, the converse is generally required; you can get better overclocks by increasing voltage.
Obviously, Intel are not going to encourage people to increase the voltage of their processors in order to run them at the default speeds, as this can run the risk of thermal damage to the chip with insufficient cooling, or overly high voltages. It may however still represent an option for system administrators who are keen to retain the performance of the chip.
When you consider all the bugs that come through in higher level programming where everything is object oriented and human readable, it really comes as a surprise that you don't see more bugs in hardware considering the complexity of the problem and low level nature.
He who knows not and knows he knows not is a wise man. He who knows not and knows not he knows not is a fool.
Sounds familiar.. crunching certain operands generated flawed results in 32x32 multiplies. Was some interference on chip, layout issue. Double sigma comes to mind. Since this bug can be fixed by lowering the speed it's probably a similar issue?
Intel disclosed an electrical problem Monday that can cause computers using its flagship Itanium 2 processor to behave erratically or crash.
Hmmm...wonder if BMW is using these chips?
"Intel is saying that affects the 900MHz and 1 GHz Itanium 2 chops and that it will not affect the upcoming 1.5 GHz Itanium 2 6M chips."
In other news, Slashdot is returning their newest server to Intel for a replacement. It appears that the error is in the on-chip string handling routines.
Norris/Palin 2012
Fact: We deserve leaders who can kick your ass and field dress your carcass.
Does anyone else find it ironic that when Intel makes one mistake in a processor, everyone jumps on them for making a bad product, but software companies can sell products with thousands of bugs in them and people accept this as normal? Sure, we complain about buggy software, but I don't think anyone here expects any software to be completely bug-free. Why are Intel and other chip manufacturers held to such a high standard? Or, more importantly, why are software companies not held to the same high standards? If Intel and AMD can make incredibly complex processors that are (usually) completely bug-free, why can't any software company in the world make any product that even comes close to being free of defects?
Disclaimer: The opinions expressed are not necessarily my own, as I've not yet had my medication today.
who exactly didn't expect something like this? Intel has a history of this sort of thing
Of course when it happens to Intel, then EVERYBODY knows about it. My question is, how prevalent is this sort of thing throughout the CPU industry? Anyone know of other "mistakes" by the other major players? It's hard to imagine that only Intel makes these kinds of goofs, esp. with the complexity of today's chips. As an example, wouldn't Mot's failure to scale up the G4 PPC chips be considered an "error"? They just caught it early enough not to ship any chips and say "oh, we're sorry, our G4s won't go as fast as we originally stated, wait another year and a half or so and we'll get it all sorted out". Didn't they also do a similar thing with the 68040?
Didn't Dexter Douglas become Freakazoid when his cat punched 'a specific set of keys in a specific sequence' after he installed a CPU with a 'glitch'?
It's the main component of a computer. Besides, for software, it's much easier to update (bug fix). If your processor is messed up, it's a lot worse.
Agilent Technologies
ChevronTexaco
Cornell University
DreamWorks
Johns Hopkins University
Liberty Medical
National Crash Analysis Ctr.
NCSA
PNNL
Rice University
Sony Pictures ImageWorks
Wells Fargo
VeriSign, Inc.
Airbus
British Petroleum
CERN
Daimler-Chrysler
Daresbury Laboratory
Erickson Utvecklings AB
HLRS
Philips Semiconductor
Preussag
SecFinex
Triaton
University of Oslo
VTG-Lehnkering AG
Bio-Informatics Institute
Fujitsu ISOTEC, Ltd.
Ibaraki Hitachi Information Service Co., Ltd.
MarketBoomer
Mazda
Mitsubishi Heavy Industries
Mitsui Chemicals
Okazaki National Research Institute
Singapore-MIT Alliance
Subaru Research
Toyota Autobody Corp.
(and lots of others...)
Not too long ago I saw a story that talked about how Intel was working on (or already had) chips that wouldn't work if speed!=Intel settings. The new chips might prevent this kind of workaround.
Oops, I posted.
"If you're not confused by quantum mechanics, you really don't understand it." - Niels Bohr
I have about 6 years experience in Quality Assurance, with emphasis on electronics, manufacturing processes and attention to detail.
You know...if you're looking for anyone that is.
all those gloches with the chops and the topys... or is that tpyos?
I am very susceptible to "let's have another drink"
How long does it take an Itanium to count to 10?
I don't know but will let you know when it gets there
OMG I don't believe I just wrote that
rus
Cheap UK and US VPS
What you wrote is not a list of instructions, genius.
Chips... chops... I'm confused...
FLR
We got screwed, we just bought 300 i2 900MHz processors (in dual proc systems) and NOW this comes up. Well at least I was right when I said we should have bought Xeons!!
Anybody else had flashbacks of the Pentium FDIV bug and this excellent post?
My other OS is the MCP!
AMD STILL SCUCKS ASS
There isn't much detailed information about the exact conditions that bring out the bug, but they do state that the bug is electrical, that some unspecified combination of instructions and data patterns is needed, and that reducing the clock frequency avoids the problem. I can think of several things that might cause the bug. These are just guesses.
One possibility is that there is a slow timing path in the logic that is marginally meeting the 900MHz or 1GHz clock speed. Going to 800 MHz gives the slow path more margin. This is the easy answer.
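To put rough numbers on the slow-path guess, here is a minimal sketch. The 1.15 ns path delay is purely hypothetical (nothing Intel has published); it is only picked so that the path misses timing at 900 MHz and 1 GHz but fits at 800 MHz.

```python
# Hypothetical numbers: the 1.15 ns path delay is made up for illustration,
# not a figure from Intel.
HYPOTHETICAL_PATH_DELAY_NS = 1.15


def clock_period_ns(freq_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz: float, path_delay_ns: float) -> float:
    """Positive slack means the signal settles before the next clock edge."""
    return clock_period_ns(freq_mhz) - path_delay_ns


for freq in (1000, 900, 800):
    slack = slack_ns(freq, HYPOTHETICAL_PATH_DELAY_NS)
    verdict = "ok" if slack > 0 else "too slow"
    print(f"{freq} MHz: period {clock_period_ns(freq):.3f} ns, "
          f"slack {slack:+.3f} ns ({verdict})")
```

Dropping from 1 GHz to 800 MHz stretches each cycle from 1.0 ns to 1.25 ns, which is a lot of extra room for a path that was only marginally missing.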
Another possibility is that they have some part of the chip that has insufficient metal to deliver power to the logic gates. The right combination of activity might cause enough voltage droop to cause logic errors. Slowing the clock reduces the power consumption in CMOS chips.
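For a feel of how much the clock change helps on the power side, the usual first-order model says dynamic CMOS power scales with C * V^2 * f. The sketch below assumes the switched capacitance and activity stay the same and, by default, that the voltage is unchanged; it ignores leakage entirely, so treat it as a ballpark only.

```python
def dynamic_power_ratio(f_new_mhz: float, f_old_mhz: float,
                        v_new: float = 1.0, v_old: float = 1.0) -> float:
    """Ratio of new to old dynamic power under the P ~ C * V^2 * f model.

    Capacitance and switching activity are assumed constant, so they cancel.
    """
    return (f_new_mhz / f_old_mhz) * (v_new / v_old) ** 2


for f_old in (1000, 900):
    ratio = dynamic_power_ratio(800, f_old)
    print(f"{f_old} MHz -> 800 MHz: about {ratio:.0%} of the original dynamic power")
```

Less current drawn means less droop on a marginal power grid, which is why a lower clock could hide this kind of problem.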
They might have a crosstalk problem between some signals that could flip bits when the right activity and frequency are combined. Slowing the clock can shift the relative positions of signal transitions.
Eventually more details might surface, but Intel is probably keeping it quiet so that people don't write code to maliciously crash servers.
"Open the Itanium register sets, HAL."
"I'm sorry, Dave. I can't do that...."
-kgj
Obviously, the error is division related.
The REAL jabber has the user id: 13196
What you do today will cost you a day of your life
In terms of reliability, the Itanium II is no worse than the UltraSPARC series of chips. Both Itanium and UltraSPARC face the daunting task of debugging 100+ million transistors. Ensuring that the fabricated chip is bug free is virtually impossible. So, both companies have substantial errata sheets.
The reason that Intel chips "appear" to be more error prone than other companies' chips is that Intel chips are extremely popular. So, people tend to pay far more attention to flaws in Intel chips than they do to flaws in other companies' chips. However, since so many people pay attention to the flaws in Intel chips, they are likely to have fewer bugs than other chips. The economies of scale that, say, the Pentium 4 enjoys means that if the Pentium 4 does have a bug, then it will likely be found by someone among the gazillion users. Then, Intel will fix the problem. Economies of scale help to lower the cost of a product but also help to lower the number of bugs.
In any event, the performance of the Itanium II is at least 1 order of magnitude greater than the UltraSPARC III and (soon) IV. That performance difference is due to serious architectural mistakes in the UltraSPARC family of processors.
"a specific set of operations in a specific sequence with specific data" reminds me of that show on the WB called Phreakazoid. I'm phreaking out!
chillax137
Finally, The electrical engineers are to blame. I knew my code was correct!
"If you are a dreamer, a wisher, a liar, A hope-er, a pray-er, a magic bean buyer
"Until we're sure the issues are 100 percent resolved, we're going to keep holding back shipments with the 450," IBM spokeswoman Lisa Lanspery said. "We have a policy of zero tolerance for undetected data corruption" at a customer site, she said.
:-)
so detected data corruption is just fine, then...?
Maybe this is not a bug, maybe this is just Intel's new anti-overclocking technology!
Spent some time setting up their validation tests. Without getting too specific, it was a bit of a mess. They do have good incident response though--lots of people working weekends on that.
-Libertarian secular transhumanist
Ok, I guess the joke is now no longer:
Intel Inside: Get 99.98765374% from your PC!
Instead, it's now:
Intel Inside: Get 99.98765374% from your
...>>NO CARRIER
If I knew the wedgies I gave you back in 6th grade would have resulted in this . . . I might have taken a moment's pause.
rushed Itanium to market...
(ducking)
AMD also sucks your mother. I know - I was there.
COMPLETELY WRONG! See the non-AC reply for an accurate explanation.
Yeah, hopefully the SlashTrolls will be able to find a janitorial job instead of their usual everyday crapposting.
...that's why Itanium is not yet in the mass-market. Regardless of the flaws. It's also why my first 64-bit CPU is likely to be an AMD.
I mean, come on. I want 64-bit a lot. An awful lot. I even wanted a redesigned instruction set - something I was pleased to see Intel had the balls to do with Itanium. The old IA-32 has a lot of baggage and bad design choices. But for crying out loud - $7000 for a single chip?
I kid you not about the $7000 price tag for a single CPU - Itanium 2 is literally 10 times more expensive than AMD's 64-bit Opteron.
This is why, as a life-long fan of Intel, I am planning on defecting to AMD with my next machine purchase. To anyone else buying a new machine in the next 6 months: it makes absolutely no sense to get a 32-bit system when vastly more capable and future-proof 64-bit ones can be had for almost the same price as the old 32-bit ones.
This time next year, I reckon that Intel will be steadily losing market share to new system purchases because of their exorbitant prices and their complete failure to provide a capable 64-bit platform in the same price range as AMD. Intel haven't even announced a consumer 64-bit chip. AMD has announced six (2 Opterons, 4 Athlon 64s).
I'm desperately hoping that AMD and Microsoft's marketing machines ramp up to push 64-bit - the sooner the better to punish Intel for not being more pro-active in the marketplace.
Intel probably doesn't care about underclocking as much, so the overclock protection circuitry is probably more along the lines of
if (($clockspeed) > ($specspeed)) shutdown
than
if (($clockspeed) != ($specspeed)) shutdown
After all, faster clocked processors are more expensive, thus, Intel's already made their money off you if you underclock. They're more worried about overclocking because it skims money off their high profit margin chips.
Marxism is the opiate of dumbasses
How about Motorola leaving out critical instructions in the PPC603 and crippling every machine with one compared to the PPC601?
That's a very very big reinterpretation of the facts. ppc603 machines were designed for low cost low heat. One of the ways to do this was to further remove instructions that were not needed, legacy instructions from pre-PPC601, and were never designed to be in the 601. They were not 'critical' and did not cripple anything. ppc603 cpus ended up working just for the purpose they were designed for. cheaper and less energy-hungry cpus.
the G3 floating point debacle where excel spreadsheets would show up errors consistently
You made a typo there. "Pentium" is not spelled "G3"
Itanium is a very new architecture. It has the potential for kicking i386 chips in the butt once it has a chance to grow up. With anything as radically new as the Itanium, there is a high probability of unexpected problems. AMD has not had this sort of problem recently because they don't have any balls. All they ever do basically amounts to minor tweaks of a stable design. Even their 64 bit extensions fall into this category.
The type of problem Intel is dealing with could very well be in a new class. I have a hunch that it has to do with either unexpected capacitive coupling (possibly related to an in-spec extreme of the process variation) or thermal transients causing timing skew. These types of phenomena are nearly impossible to model, especially if it's tied to a particular set of process deviations. That is why manufacturers do such extensive qualification testing. Unfortunately this testing cannot be done until there are enough units to test (like in the 1000s). This does not happen until the device is ready for production. Technically, this is the Pilot phase of development.
One needs to give Intel some credit for learning a lesson from the Pentium fiascos (not just the math error, but also the original (5V) 90MHz burn-up issue). At least they are doing the right thing now. Corporations, like people, sometimes need to learn the hard way. Unfortunately, though people usually retain their lessons, corporations sometimes need to relearn them, especially when being run by greedy BODs (or board members with hidden agendas). AMD has yet to learn this particular lesson. One of these days, they will try to cover up a problem and it's not going to work. They have gotten away with some stuff already because everyone loves to hate Intel (me included, 68000 and PowerPC for me!)
Unless you're familiar with LSI semiconductor manufacturing, you should not be commenting, because you don't have a clue as to what is going on. The posts I've read so far remind me of what a class of 10 year olds would write in critiquing Joseph Conrad's "Heart of Darkness".
The Law of Leaky Abstractions.
As we abstract more, we lose touch with the lower layers. As more abstractions are introduced, new classes of bugs are also introduced.
So while it is easier to optimize software architecture and see other "high level" system design bugs, more "low level" bugs will creep in from this abstraction.
No, all 6.666666666666666666666 people.
Please help metamoderate.
The description is kind of funny: "This document identifies implementation differences between versions of the PowerPC 750- PID8p processor and the description of the processor contained in the User's Manual."
Because I truly believe that 1 * 1 == 2.
According to "SPEC", the Itanium 2 trounces the UltraSPARC III in performance and beats it by a wide margin. According to the "Transaction Processing Council", the Itanium 2 beats the UltraSPARC III by a wide margin on the most important commercial benchmark: TPC-C. An Itanium-powered server has close to the world record: 660,000 transactions per minute.
While I have no particular animosity toward Intel, other than it is important for there always to be competition to push them, I do not think they need to be let off the hook. Itanium has been around a very long time. You may think of it as new technology, but that is more because of the lack of acceptance in the marketplace, not because it has only recently been released. What was happening all of these years since Itanium was initially launched?
Additionally, while the Itanium instruction set takes a different approach to those of Intel's competitors, they are not the only company introducing new CPUs. I do not remember such problems when other 64bit CPUs with their own, new, unique instruction sets were launched by Digital, HP, IBM or Sun to name just a few. These days, the competitive landscape has been radically reduced. Digital no longer exists and its Alpha architecture is owned by Intel. HP, while it still owns its PA-RISC architecture, is trying to migrate its customers to Itanium, though it is hard to say what will really happen to PA-RISC since no one seems anxious to adopt Itanium. IBM also has picked up Itanium, so who knows what will happen to their RISC architecture? That leaves Sun, and while SPARC has always been the weaker of the RISC architectures, it seems to be the primary remaining competitor to Itanium and Intel. Of course, who knows how much longer Sun will survive as an independent company? Maybe they are the next to be gobbled up by IBM or HP, both already committed to Itanium, so what happens to SPARC?
Finally, it is hard to say exactly where AMD fits in all of this. Its 32-bit line provides excellent competition to Intel's 32-bit Pentium family (now at P4), and the AMD 64-bit architecture looks like a nice increment beyond the now very old x86 32-bit architecture. But, in terms of major pressure on future CPU architectures? I just don't know where that competition will be coming from... Maybe China, Russia, Japan, India? Places not noted for their hi-tech prowess, but with lots of experience in fabrication and lots of affordable talent?
Er, you do seem to be trolling just a bit. The US-III @ 1.2GHz achieves a base SPECint of 637, and the 1.0GHz Itanium-2 is 807. Yeah, it beats it, but trounces it? Err, well, not really.
And it's a far cry from the "order of magnitude" better performance that the grandparent post claims.
What's really funny about this post is that normally I am the one bashing Sun's CPUs... *boggle*
Obligatory AMD note: the new SPEC update today shows that a 1.8GHz Opteron SPECint base is 1081.
On a price/performance basis, I would consider that to be the trouncing chip -- maybe even in the order-of-magnitude range.
Are you kidding me! If I paid for a server with a 900 MHz or 1 GHz processor, and the company that produced the processor said that I had to underclock the chip for it to work properly I'd ask for a refund. And if one wasn't forthcoming...
*sigh* Mod parent down....
Robert Nambla Malda
Hardware has the advantage of well-defined specifications.
Riddle me this, what should the output be in Word Processor A when the user presses the 'r' key?
Answer: It depends. An 'r' should be inserted into the document, unless shift is depressed, then it should be a capital R, unless the menus are selected, then a 30x80 pixel menu should scroll down smoothly at the rate of 10px/sec displaying certain text. . . .
They recommend working around the problem by underclocking the processor to run at 800 MHz instead of its default 900 MHz or 1 GHz
I just want to see them recommend this AFTER they start incorporating their new patented anti-clock speed changing technology into all of their chips.
I remember Compaq or maybe Dell delayed their Itanic systems while everyone else was going whole hog shipping them. Could this be it?
Where quality is job 1.99904274017.
--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
How the hell is downclocking going to protect you from someone sending 'a specific set of operations in a specific sequence with specific data' to the processor? Spindoctors, i choose you!
Trolls don't like to be Flamebait, because they burn so well. Protect our Troll heritage!
Luckily, all of the Itanium 2 owners have been contacted, and both of them had not yet experienced data corruption.
Well, for one, a hardware bug is very easy to define. The chips may be complex, but in the end they have relatively few well defined operations. Software, on the other hand, has almost infinite poorly defined operations and it's not even possible to define what all software bugs are.
Not cheap, not reliable.
Sans the liquid variants, there doesn't seem to be any such thing as 'adequate' cooling on an AMD T-Bird in Texas during the summer. Sure, the last few AMD processor generations seem relatively bug-free, but what's the point of a 'flawless' processor if it only lasts me a year?
Sigh.....*waits for the new P4 and MB to arrive*
Whenever I get a glitch I scratch then bitch.
How the heck do they put em out so fast and get away with users needing to clock them down?
Is it to prevent overclockers from doing processor fried chicken to their chips? Keep it up Intel, you will make Heinz ketchup millions. Or maybe you can serve them up with Big Macs.
Though somehow I do not think Steve is going to want his big Macs to include Intel chips, especially if they are too easy to cook!
OH THE SHAME I fell off the wagon and use sigs again!
maybe even in the order-of-magnitude range.
;-)
I agree. Statements involving the phrase "orders of magnitude" are bandied around without any thought about what it means.
If you are talking an "order of magnitude" you mean a "factor of ten", if you say "two orders of magnitude" you mean a factor of one hundred (etc).
http://mathworld.wolfram.com/OrderofMagnitude.html
So a SPECint of 1081 vs. a SPECint of 637 does not even differ by one order of magnitude. 1081 / 637 is more like a factor of two, significant but not as earth shattering as, say, three orders of magnitude (1000)!
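If anyone wants to check the arithmetic, a quick sketch: the number of orders of magnitude between two scores is just log10 of their ratio. The scores are the SPECint base figures quoted upthread; the function name is mine.

```python
import math


def orders_of_magnitude(a: float, b: float) -> float:
    """How many factors of ten separate a from b (log10 of the ratio)."""
    return math.log10(a / b)


# SPECint base scores as quoted earlier in the thread.
scores = {
    "Opteron 1.8GHz vs US-III 1.2GHz": (1081, 637),
    "Itanium 2 1.0GHz vs US-III 1.2GHz": (807, 637),
}

for label, (a, b) in scores.items():
    print(f"{label}: ratio {a / b:.2f}, "
          f"{orders_of_magnitude(a, b):.2f} orders of magnitude")
```

Both ratios come out below a factor of two, so neither is anywhere near a single order of magnitude, which would need a ratio of 10.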
Be nice to people on the way up. You will meet them again on your way down!
Jesus! Talk about a difficult bug to find. I would be interested in how they figured out which instructions in which order with what data.
All your base are belong to us!
Hasn't Intel championed anti-underclocking technology?
Must-not-watch TV!
The Sun system uses the UltraSPARC II. Since Sun has refused to disclose the TPC-C score for the UltraSPARC III, we can only conclude that the UltraSPARC III does approximately as well as the UltraSPARC II.