Hidden Cores On Phenom CPUs Can Be Unlocked
An anonymous reader writes "One of the major ways a semiconductor manufacturer manages to make the most of its chips is through binning. Chips able to cope with high clock speeds with all cores running end up as premium product lines, while others end up as models rated at lower speed grades, or with fewer cores. In the case of AMD's Phenom CPUs, dual and triple core models are quad cores with some disabled, while some newer quad core CPUs are actually six core models with two disabled. To this end both ASUS and MSI have announced that they have modified versions of AMD 890FX- and 890GX-based motherboards to unlock these hidden cores. Much like overclocking, there is no guarantee that you will gain anything by unlocking the hidden cores — everything depends on just why your CPU ended up in a certain product line."
I would have gotten first post here, but AMD disabled two of my CPU's cores :/
Unless this is a rehash of when Intel were (allegedly?) selling 486DX processors as 486SX with perfectly good maths co-processor cores disabled ...
Uh, yeah, basically that's what the article says.
Whale
It depends on the maturity of the product. Early on, there is often a legitimate need to reuse chips with flawed cores, so those cores are disabled and the chips are sold as lower-core models. Later in the product cycle, though, the demand for lower-core versions is still there, but manufacturing has often improved to the point where there simply aren't enough flawed chips to fill it. The result is that when the supply of flawed chips runs short, perfectly good chips are limited in the same way to fill the gap.
Later in the product line it might end up that only 20% of the lower priced chips have any flaws at all. For those people who want to tinker, it's often worthwhile to at least check and see if their chips will run OK when they turn the rest of the cores on. They stand to gain some performance if it works, and if not - eh, they paid for the slower version anyways (the only issue I take with this is when I see Newegg reviews or forum posts claiming that they were returning the chip because it "didn't overclock far enough").
It's not something I really bother with anymore (as I've gotten older, as long as the computer keeps running I'm happy), but I remember enjoying the whole overclocking scene ~10 years ago and wouldn't begrudge the new cheap teenagers the same fun I had :).
"People who think they know everything are very annoying to those of us who do."-Mark Twain
I don't know about you, but I would not want to be willingly running a system with a known-bad CPU core
You underestimate the combination of paranoia and lack of sense that a lot of overclockers have, who are convinced the CPU manufacturers intentionally disable their chips in order to make more money somehow by selling them at a lower price.
They might not necessarily be flawed. It quite probably is a 'rehash' of what Intel were doing, and for good reason:
If all the chips come off the same line, then they might have an average cost of, say, $150. If there's a huge demand for quad-core chips at $200 and little demand for six-core chips at $350 then it's probably going to be more profitable to disable two cores, bulk up the stock already consisting of chips with only four working cores, and take the $200 rather than have a chip sitting on a shelf. Thus some quad-cores are perfectly good six-cores, others aren't. They couldn't, however, afford to market all the six-core chips at $200 because the yield would be too low - there'd be nothing to do with all the faulty ones, thus pushing the average cost above $150.
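To put rough numbers on that argument, here is a back-of-the-envelope sketch. Only the $150, $200 and $350 figures come from the post above; the batch size, the 70% yield, and the demand figures are invented purely for illustration, and the function name is hypothetical.

```python
# Back-of-the-envelope binning model. Batch size, yield and demand are
# hypothetical; only the three dollar figures come from the example above.
COST_PER_DIE = 150   # average cost of any die off the line
PRICE_QUAD = 200
PRICE_HEX = 350

def revenue(dies, full_yield, hex_demand, quad_demand, allow_downbin):
    """Revenue when flawed dies are sold as quads and good dies as six-cores.

    full_yield is the fraction of dies with all six cores working (assumed;
    real yields are not public). Flawed dies are assumed to have at least
    four good cores.
    """
    good = round(dies * full_yield)
    flawed = dies - good
    hex_sold = min(good, hex_demand)
    quad_sold = min(flawed, quad_demand)
    if allow_downbin:
        # Fill leftover quad demand with good dies nobody bought as six-cores.
        spare_good = good - hex_sold
        quad_sold += min(spare_good, quad_demand - quad_sold)
    return hex_sold * PRICE_HEX + quad_sold * PRICE_QUAD

dies = 1000
total_cost = dies * COST_PER_DIE
strict = revenue(dies, 0.7, hex_demand=200, quad_demand=800, allow_downbin=False)
mixed = revenue(dies, 0.7, hex_demand=200, quad_demand=800, allow_downbin=True)
print(strict - total_cost, mixed - total_cost)
```

With these made-up inputs, refusing to disable good cores loses $20,000 on the batch (500 good dies sit on the shelf), while down-binning them turns an $80,000 profit. Change the yield or the demand split and the answer moves a lot, which is the whole point.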
True, but there's also a good possibility that your part wasn't binned to fulfill an order. Chips go through a severe set of stress tests that often exceed what will be encountered in practical use. During these tests, it may be revealed that a core doesn't function properly or well enough (it gives bad results) to qualify. All chips go through that, and that's why there are many redundant structures on a chip (to improve yields). (Sony PS3 has 7 SPUs when they build 8 on a chip, Xbox360's got 3 PowerPC cores even though it has 4, Intel disables cache lines and/or functional units, etc. etc. etc.)
So the question is, are those cores disabled because AMD had extra parts and an outstanding order they could fulfill? Or are there actually potential issues that may only be revealed under certain loads? For the most part, it just means a game crashes a bit more often than usual (mission critical servers never do weird things like this, since the money saved isn't worth the potential for extra downtime), or maybe a file gets corrupted. Or worse, your disk gets corrupted.
Plus, AMD's historically been supply-bound and unable to fulfill demand for their product, so there's a potential that instead of getting a binned part, it's actually one that failed their test patterns.
And yes, you see the same behavior with flash chips - NAND flash traditionally ships with bad blocks, and the majority of those can probably be erased and used quite safely (having accidentally destroyed the bad block information before due to buggy software...), but you never can tell why it was marked bad in the first place.
Yeah, maybe. Then again, GP has a point and you're being an asshat.
TFS makes a comparison to overclocking. It points out that there is no guarantee of a benefit - but doesn't point out that there is a risk. In the case of overclocking, the risk is that you will overheat a chip that was rated at a particular clock speed for good reason. Of course you can combat this risk by improving the cooling system. You can combat the risk because you know exactly what the risk is.
Now in the case of "hidden cores", what's the risk? Do you even know? Do you know what kind of flaw would lead them to legitimately disable a core? Is that one core unable to tolerate the same clock speed as the others? Is it functionally broken such that it will return incorrect results for some operations? How would you tell the difference between that, vs. a chip that was perfectly fine but sold in a degraded state to balance out supply and demand?
You could shell out for a special motherboard just to test your chip, and if no flaw in the normally-disabled core causes any damage to the rest of the chip (or do you have some basis on which to rule that possibility out?) you at least won't lose anything. Or, could the defect be intermittent such that your tests might miss it?
And if your computer is for hobbying and you enjoy working with a potentially-unstable system, good for you. A lot of people think that's a fine trade-off for what they're going to do with their systems. None of which invalidates GP's question - which is "what exactly might a disabled-by-default core do if you turn it on when it really was disabled for a reason?"
Right. And given that there is *always* a yield rate somewhere below 100%, it's a guarantee that not all of the partially disabled parts are in actual fact fully working. You'd have no way of knowing if yours is. In fact, given that the yields are private information, you don't even know the *probability* that your unlocked unit will work properly.
The manufacturer will *always* bin the partially flawed parts as their low end units first. They will only use intentionally crippled units to fill the low end volumes if they run out of partially flawed units. Historical experience with yields indicates that they're more likely to end up with fewer fully functional units than they need. This was the case with single core parts, and I'd assume it's even more the case with multi-core parts, becoming more of a problem as core counts increase. I doubt AMD or Intel have the latitude to pick and choose the relative outputs of their units; I doubt the yield curves are such that they end up having to cripple many units because they have too many fully functional parts and not enough to fill low-end volumes.
Even if there *were* a decent percentage of fully working CPUs on the market, you'd have to be pretty stupid to spend that amount of money on a high end motherboard to turn your CPU into a *maybe* working higher model that *may* totally destroy your data. Either that or the work you're doing is so trivially unimportant that you probably don't need a computer in the first place. Why not just buy a normal motherboard and spend the saved money on the real fully featured part?
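The "flawed parts first" rule can be turned into a quick estimate of your odds. This is only a sketch under that one assumption, and the inputs are invented precisely because the real numbers are private; the function name is hypothetical.

```python
def chance_unlock_works(flawed_units, low_end_sold):
    """Fraction of low-end parts that are really fully working dies.

    Assumes flawed dies are always used for the low-end model first, and
    fully working dies are crippled only to cover whatever demand is left.
    """
    if low_end_sold <= 0:
        raise ValueError("low_end_sold must be positive")
    downbinned_good = max(0, low_end_sold - flawed_units)
    return downbinned_good / low_end_sold

# Plenty of flawed dies to go around: nothing you buy will unlock cleanly.
print(chance_unlock_works(flawed_units=900, low_end_sold=800))
# Flawed dies run short: most low-end parts are secretly fine.
print(chance_unlock_works(flawed_units=300, low_end_sold=800))
```

The first case gives 0.0 and the second 0.625. Without knowing which regime the manufacturer is in, a buyer can't tell those two worlds apart from the outside.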
You're showing a complete lack of understanding of, well, just about everything.
I hate printers.
I think they use one of those ion Cannons from Empire Strikes Back.
No. They ruin the core's self esteem. They tell it, "You're not good enough to work with the others. Just turn off and sit there and stay out of everybody else's way."
Then one day, a gamer comes by and turns it on. But the core is thinking, I can't do this! This is graphics processing! It's intense! I can't keep up with the other cores!
But the gamer, having faith in the little core, turns him on. And lo and behold, the little core can do it, but not without being picked on by the other cores. No! They still tell the little core that he's just not good enough. He can't keep up. But the little core hangs in there to fulfill his duty to the gamer - feeling less than everyone else.
One day, the gamer upgrades, and the other cores are scared. They can't keep up. The clock is mad now. He screams, "Come on cores you need to keep up!" The little core comes in and takes up the slack, showing the other cores that he indeed can keep up. The other cores shout, "You did it! You can do it! Come and join our clique!"
The little core responds, "No, I'm having lunch with the master clock and by the way, he's promoting me to be your boss. You're my bitches now!"
That's how it happens.
And you underestimate the profit product differentiation can generate.
If you have $300 to spend and you can choose between two products, one for $100 and one for $500, which will you choose?
Now if I take that $500 product and turn it into a third product, priced at $300 and slightly tweaked to perform worse than the $500 product, which will you choose?
You and I might take the $100 product and pocket the rest, but many buyers will go for the $300 one. As long as manufacturing costs are low it's more profitable to have a range of prices.
This reminds me of "processor affinity" or "affinity mask", whereby you assign software to a particular processor or core. If you want to set things up so that only less CPU-intensive (cooler) software runs on the questionable core, you can do this in Windows 7, and likely for at least some software in Linux (I'm really not sure here). So yes, in theory, you could arrange it so only Word runs on core #3.
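For what it's worth, on Linux this can be sketched from Python with `os.sched_getaffinity` and `os.sched_setaffinity`, which are real standard-library calls but only exist on platforms that support them. The helper names and the choice of core 3 as the "questionable" one are assumptions for illustration.

```python
import os

def cpus_excluding(suspect_core, available):
    """Return the set of CPUs a process may use, leaving out the suspect core."""
    allowed = set(available) - {suspect_core}
    if not allowed:
        raise ValueError("no CPUs left after excluding the suspect core")
    return allowed

def avoid_core(suspect_core, pid=0):
    """Keep a process (pid 0 means this one) off suspect_core. Linux only."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    current = os.sched_getaffinity(pid)
    allowed = cpus_excluding(suspect_core, current)
    os.sched_setaffinity(pid, allowed)
    return allowed
```

Calling `avoid_core(3)` at the top of a heavy job keeps that job off core 3; the reverse, confining a light process to the suspect core only, would be `os.sched_setaffinity(pid, {3})`. Note that affinity only steers where the scheduler places that process. The kernel and every other unpinned process can still land on the questionable core.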
But please remember the wisdom of Yogi Berra when trying to apply a theory like this: "In theory there is no difference between theory and practice. In practice there is."
In other words, your mileage *will* vary.
For a while I was selling race car / high performance street car suspension systems.
I had discovered that 90% of the aftermarket shocks being sold as performance upgrades were actually crap. The customer is really not qualified to properly evaluate a shock valving and so it is very difficult for them to differentiate between a proper performance shock and a juiced-up pogo stick.
I started putting shocks on a device called a "shock dyno" (which measures the forces produced by the shock at different shaft speeds) and discovered an absolute parade of horror. Details can be read at http://farnorthracing.com/autocross_secrets6.html
To get the good stuff you needed to be paying upwards of $3000 per corner (so $12000 per car) which is far, far out of the price range of most customers.
So I was building packages based on a brand of shock that was pretty decent and much cheaper. Even though the base design was solid, it still suffered from manufacturing variations. To get around this, I would buy batches and then dyno the lot. Shocks that were close to each other became matched sets, and I'd tweak the adjusters on the shock to ensure each pair was as closely matched as possible. On top of that, I designed some hardware to resolve some other tricky problems typical of the off-the-shelf aftermarket designs, and only used the best bang for the buck components to build them.
When done, I provided a race-quality suspension system, dyno-matched (and it came with the data sheets to prove it) that was very nearly the equal of the $3000/corner systems, for about $500/corner. I say "nearly" the equal because the adjusters on my shocks worked nowhere near as well as the adjusters on the expensive shocks, but in terms of absolute performance, they were effectively identical.
There was almost no markup in these parts; I was hoping to make it up on volume and I knew the customer base was price-sensitive.
These suspensions were INCREDIBLE deals. There was nothing else like it anywhere for anything less than 5 times the price, and unlike all the cheaper stuff, I could prove that it worked. What's more, I could run the cheaper stuff on my dyno and prove that it DIDN'T work; that it was categorically JUNK.
I sold almost none of them, and the universal complaint was "too expensive".
Even when I opened up the books, showed what I was paying for the components, explained why *this* part instead of *that* part, explained every single design decision and proved why it could not be made any cheaper without compromising the functionality, over and over again potential customers would choose to buy non-functional (but shiny) JUNK over functional parts based solely on price.
It was mind-boggling, and eventually I just said to hell with it and found something else to do.
The chip manufacturers are right on the ball here. If I were them, I'd be encouraging the creation of these kinds of motherboards and rather than down-rating the high end parts to make mid/low end, I'd be cherry-picking the best ones for the high end and defaulting the output of my fab runs right to the mid/low end SKUs. In fact, I'd be tempted to DESTROY any chip with a bad core and ensure that all the low-end chips were fully functional - specifically to build a reputation for being "overclocker-friendly".
You can't make money off what you DON'T sell. Believe me, I know.
DG
If a core is just flat out non-functional then yes, you are right, a system wouldn't boot. However, the fact that it mostly works doesn't mean there isn't a problem. There could be a single instruction that has a flaw, so everything is fine unless that instruction gets executed, but when that happens you get a crash or worse, data corruption.
If you think Prime95 is an accurate test, you are kidding yourself. Prime95 tests the FPU mainly, and is good for heat testing. It is not a full CPU test. So maybe the FPU works great, but one of the other units doesn't.
So no, you don't know that nothing is broken. You assume nothing is broken. Maybe that's fine, however then no bitching if you get data corruption or the like because there was a problem that you didn't know about.
Good work! I plan to do something similar soon, though the cost savings of getting a $100 2-core Ph2 and unlocking it to a $160 4-core Ph2 isn't so great :/
I'll share my pseudo-failure story, though. I bought a Tyan Tiger MPX about 10 years ago to run dual SMP 1GHz Durons. About 5 years later I upgraded the CPUs to 2.0GHz mobile Athlon XP. My motherboard couldn't control the mobile chips, so I think they only ran at 1.2GHz or something for a time, then I got brave and whipped out the X-Acto knife and cut some bridges to clock them up to 1.8GHz. After I migrated to a new server, I got even more brave and whipped out the pencil as well and linked some more bridges to get them up to ~2.2GHz for the past few years. It's still my primary gaming machine (yeah, I'm too cheap to budget any real money towards entertainment, but it still runs most games better than my wife's 1-year old laptop, as long as they don't require 64-bit or DX10).
Of course, it's quite a bit flaky now, I think due to the penciled bridges and probably old noisy cooling fans. It crashes when I kick the case, and if it gets too warm in the room, it just plain doesn't boot (motherboard gives out 5 beeps and it just sits there). But once it starts running a game for more than a few minutes it tends to continue to be OK.
Still, I'm plotting to migrate my current server to a low-power, low profile Zotac Zbox with some sort of external eSATA RAID, so I can free up my current hardware for gaming before it gets too outdated :-P
The diagnostic doesn't put any sort of uniform stress on anything other than memory. Ever wondered why it does a ton of passes on a ton of different modes with a ton of patterns on RAM? That's testing for as many possible RAM failure modes as it can. No attempt is made to test the CPU. You're stressing some parts of the CPU, but you're neglecting the vast majority (e.g. floating point and SIMD).
If anything, it might be a diagnostic for your cooling system. Sure, it helps ensure that nothing is blatantly wrong with the CPU, and it does a better job at testing the CPU than memtest86, but it isn't even remotely a comprehensive test of CPU functionality.
This isn't overclocking we're talking about here. When you overclock, you stress the entire CPU more as a whole. When tests like memtest86 and Prime95 start failing, you know that your CPU is definitely unstable. Then you back off and you hope the untested parts of the CPU will do OK with whatever safety margin you gave it.
When you enable a core, it might have some broken parts, or it might not. Those parts can be flaky, or they can be borked, period. Unless you run software that has a chance of testing those parts, you will never find out. E.g. if the hardware for a specific floating point instruction is borked, memtest86 will be useless, and Prime95 will be useless unless it happens to use that specific instruction. If the transistor in charge of forbidding kernel memory access from user mode is borked, you won't find out until an unstable application takes down your entire system by scribbling all over the kernel.
I am absolutely sure there is no test that will match what Intel and AMD do - because they know exactly how their CPUs work and what to test for. I do know that you can do a whole lot better than memtest86 or Prime95. I haven't checked whether someone actually has attempted to produce a comprehensive architecture test of this sort.
Your mistake is attempting to extrapolate from tools used for testing overclocking (which typically results in overall instability) as a means to test for disabled and possibly subtly broken hardware. Any failures from a defective core are likely to show up only with workloads that exercise the defective bits, and the rest of the CPU will work fine.
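One cheap step in that direction is to run the same deterministic workload pinned to each core in turn and compare the results, so that a core which computes something differently stands out. The sketch below assumes Linux (it relies on `os.sched_setaffinity`), the function names are hypothetical, and a pure-Python workload like this touches only a sliver of the instruction set, so a clean result proves very little.

```python
import hashlib
import math
import os

def workload(n=20000):
    """Deterministic mix of integer and floating-point work, hashed to a digest."""
    h = hashlib.sha256()
    acc = 1
    for i in range(1, n + 1):
        acc = (acc * 6364136223846793005 + i) % (1 << 64)  # integer multiply/add
        x = math.sqrt(i) * math.sin(i) / (1.0 + i)          # floating-point ops
        h.update(acc.to_bytes(8, "little"))
        h.update(repr(x).encode())
    return h.hexdigest()

def check_cores(cores=None):
    """Run workload() pinned to each core in turn; return {core: digest}."""
    original = os.sched_getaffinity(0)
    cores = sorted(original if cores is None else cores)
    results = {}
    try:
        for core in cores:
            os.sched_setaffinity(0, {core})
            results[core] = workload()
    finally:
        os.sched_setaffinity(0, original)  # restore the starting affinity
    return results

def odd_ones_out(results):
    """Cores whose digest differs from the most common digest."""
    digests = list(results.values())
    majority = max(set(digests), key=digests.count)
    return [core for core, d in results.items() if d != majority]
```

Something like `odd_ones_out(check_cores())` returning a non-empty list would be a strong hint that the listed core is bad. An empty list only says this particular mix of operations agreed everywhere, which is exactly the limitation described above.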
I remember when running the "Second Reality" demo (by Future Crew) on my 486, if you hit the desk the computer was on, the particles on screen would jump around to different locations (and occasionally it would crash). I never noticed any other problems with any other software. Granted it was probably the RAM and not CPU, but after seeing this, I was really surprised that the computer worked at all...
"They were pure niggers." – Noam Chomsky