Theo de Raadt Details Intel Core 2 Bugs
Eukariote writes "Recently, Intel patched bugs in its Core 2 processors. Details were scarce; soothing words were spoken to the effect that a BIOS update is all that is required. OpenBSD founder Theo de Raadt has now provided more details and analysis on outstanding, fixed, and non-fixable Core 2 bugs. Some choice quotes: 'Some of these bugs... will *ASSUREDLY* be exploitable from userland code... Some of these are things that cannot be fixed in running code, and some are things that every operating system will do until about mid-2008.'"
Thank God I got an AMD this time around.
If information wants to be free, why does my internet connection cost so much?
.. not Intel compatible.
Ask for your money back folks!
Old COBOL programmers never die. They just code in C.
I always find Mr. De Raadt's comments an interesting read. He's like a geek version of Harlan Ellison.
The simple truth is that interstellar distances will not fit into the human imagination
- Douglas Adams
outstanding, fixed, and non-fixable Core 2 bugs
Well, in these days of fast-paced business, business at the blink of an eye, at the speed of light, at the speed of spooky action at distance kinda speed, it's normal that companies would release products prematurely and then patch later.
Thankfully, software is very easy to patch post-release.
Now the only thing left to do is for someone to tell Intel that they're selling hardware.
Sure:
Some of the bugs are so dangerous that it doesn't matter WHAT operating system you're running, code could be written that could attack the entire system. It would still be OS-specific code, but since the exploit is in the hardware, it's a LOT harder to prevent the attack, if it's even possible.
Some of the bugs are unfixable, as well. (I assume they mean without physically replacing the chip with a 'fixed' one that doesn't exist yet.)
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
can someone at slashdot please provide an "english" translation of the problems and how dangerous they are to normal users?
"We don't have the complete picture yet, but things look bad"
Hanno
Actually we are talking about VHDL. The "million transistors" argument is just as appropriate as saying "software is so large, it has so many ones and zeros". Development does not happen at this low a level.
This sig does not contain any SCO code.
Coming from the government sector, this kind of issue isn't going to be taken lightly. I work at a DoD facility and all our machines were just refreshed with Core 2 Duo machines. It is already almost impossible to get new software approved, if this causes the same paranoia for basic commodity hardware we're really gonna feel some pain.
"God fights on the side with the best artillery." - Napoleon, Marshal of France - speaking truth to power
What is a bug but an undocumented feature?
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
I am exceedingly ignorant in this area, but even I can grasp how dangerous some of these are. And, as a Mac user (PowerPC Dual G5 - thankfully), I suspect that this will come as really bad news to the Mac community as well. It's unbelievable to me that some of the "Show Stopper" issues are not being addressed by Intel - especially when reports of nation-to-nation cyber wars/cyber attacks are beginning to pepper the news. The fact that some of these are not resolvable through software patches is VERY worrisome to me! I am very appreciative of those who can fully interpret these flaws in chip architecture and bring them to the public's awareness.
the computer thingamajibs don't do things right and the computer nerds are all upset about it. best not to click on ANYTHING until 2009
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
The market resoundingly rejected that idea when Intel tried to foist IA64 on it.
How do AMD, VIA, Motorola, IBM, etc. fare?
AMD64 doesn't like FreeBSD 6.2 at all. We use FreeBSD and Linux in our business. FreeBSD is very important to us. In fact, I would go so far as to say that the senior management here in our IT department borders on being fanboys of FreeBSD. We were running various versions of FreeBSD on our AMD64 servers from 6.1 down to 5.something and we (foolishly in hindsight) decided that we had to upgrade to version 6.2 because it had some bug fixes we thought we needed. Oh they did fix those bugs, but they opened up a huge one that apparently nobody knows what causes it and nobody has any idea how to fix. What happens is that AMD64 systems will panic with some sort of a "sleeping on a non-sleepable lock" panic. Some people think that this is being caused by a large number of writes. Given how our servers work, this is certainly possible for us. The bottom line for us is that FreeBSD on AMD64 is so unstable that we are probably going to have to go to Linux instead for our web servers. Nobody wants to do that, but we simply can't have our webservers going down every day with the same panic and we lose one server a day on average to this problem. We've even had boxes crash within minutes of being brought up with the exact same panic.
Once we move to Linux, I don't think we'll go back to FreeBSD. My best guess is that because the problem has apparently been going on for months with no resolution that we'll start moving servers from FreeBSD to Linux when we can. We don't have this problem under Linux. The fact is that whether we like it or not, more people use Linux and if stuff is seriously broken under Linux, someone will fix it soon enough. With FreeBSD nobody seems to have any idea what to do for this problem and I'm not sure that it will even be fixed this year, let alone soon enough to keep us from moving to Linux.
There seem to be intentional modifications in there.
Unprovable conjecture. Why would Intel make this public if they were?
Could that be a backdoor and a good reason for countries like China to develop their own CPUs?
Are you the same freak that posted on KernelTrap about how every CPU since the 486 has been bugged by the NSA and can be monitored by satellites? If so, please carry out the recommended course of action that I detailed to you on that occasion. Namely that you set fire to yourself outside the UN building in order to draw attention to your cause. Thanks in advance.
This is going to be a big deal for shared hosting environments for example.
If you can, as a normal user, execute something that lets you get root on the box, and there's nothing the OS can do to prevent it, then it's a seriously nasty situation for quite a few businesses.
I wouldn't be surprised if businesses like that started switching to AMD hardware.
Yes and no. There are limitations to HDL's (and Intel, last I heard, was all Verilog). For one, it is *very* difficult, if not impossible in certain situations, to describe asynchronous signals. With something as complicated as a microprocessor that is so aggressively designed for both power and speed, I would guess that they didn't go with a completely synchronous design (hell, no one does anymore). Locally synchronous, globally asynchronous design has been in use for a while. It helps when you want to be able to shut off, or slow down, only parts of the chip that aren't being used very much.
It is not possible to describe such things (let alone voltage islands or voltage scaling) in an HDL; they must either be a special feature built into synthesis (with an extra set of constraints) or be done by hand at the transistor/gate level.
Then there's the point of verification. Almost every software release since about the mid-1990s has been followed almost immediately by patches. Just because it's "1's and 0's" does not mean that it doesn't get harder to detect corner cases as complexity grows. And it's much more difficult if you have to simulate on a cycle-accurate model (boot-up for an operating system, in simulation, would take a day on a nice cluster on something as big as the Core 2).
Then there are post-synthesis/layout issues. Timing analysis does best on sequential logic (completely synchronous). When you throw in clock gating, multiple voltage islands, dynamic voltage scaling (meaning dynamic gate delays), not to mention the plethora of other techniques that those folks might be using, what you see in simulation at the RTL level will not match what you see in reality. The first rule they teach in any ASIC design class is never trust simulation.
The point is that abstraction, like how it's "vhdl", does not mean that it's not difficult to get right and even sometimes impossible to be certain.
My favorite errata in the list is AI22, Sequential Code Fetch to Non-canonical Address May have Nondeterministic Results. Basically the chip decides that all of the high-order bits should be '1' instead of '0' - for no apparent reason, as it's not consistent.
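For anyone wondering what "non-canonical" means here, a minimal sketch follows. It assumes the usual x86-64 arrangement of 48 implemented virtual-address bits, where bits 63 through 47 must all be copies of each other (all zeros or all ones). The function names are made up for illustration, and this only shows the addressing rule, not what the erratum actually does on affected silicon.

```python
# Sketch of the x86-64 canonical-address rule, assuming 48 implemented
# virtual-address bits. Function names are hypothetical, for illustration only.
VA_BITS = 48
MASK64 = (1 << 64) - 1


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """Return True if bits 63..(va_bits-1) are all zeros or all ones."""
    addr &= MASK64
    upper = addr >> (va_bits - 1)               # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits when va_bits == 48
    return upper == 0 or upper == all_ones


def sign_extend(addr: int, va_bits: int = VA_BITS) -> int:
    """Build the canonical address whose low va_bits bits match addr."""
    low = addr & ((1 << va_bits) - 1)
    if low & (1 << (va_bits - 1)):              # top implemented bit is set
        low |= MASK64 ^ ((1 << va_bits) - 1)    # copy it into bits 63..va_bits
    return low


if __name__ == "__main__":
    for a in (0x00007FFFFFFFFFFF, 0x0000800000000000, 0xFFFF800000000000):
        print(hex(a), "canonical" if is_canonical(a) else "non-canonical")
```

Under that assumption, the first address past the lower half (0x0000800000000000) is non-canonical, and its sign-extended counterpart has every high-order bit set to 1.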
Did anyone notice these chips are using the 65nm process?
At what point do the sheer quantum effects overcome the deterministic EE rules that are used to design the chips? I don't know, but Wikipedia defines a nanoparticle as one with at least one dimension less than 100nm. http://en.wikipedia.org/wiki/Nanoparticle
Given that definition every transistor's source, drain and gate are nanoparticles. And we expect them to behave classically why?
How does this errata compare to previous generations' or even AMD's? I wonder if any increase could be from rushing Core 2 to market to kick AMD's flagship CPU off the top of the heap.
More Twoson than Cupertino
So perhaps the NY law requiring software for voting machines to be held in escrow should include the chip layout as well...
TCP: Why the Internet is full of SYN.
AMD64 doesn't like FreeBSD 6.2 at all
% uname -a
FreeBSD myhost.grateful.net 6.2-STABLE FreeBSD 6.2-STABLE #0: Mon May 28 09:52:28 PDT 2007 me@myhost.grateful.net:/usr/obj/usr/src/sys/AMD64 i386
Granted, I'm using 32-bit mode - but I've been running 6.2 for as long as it's been out on my 'always on' FreeBSD box. What issues are you seeing? This is my production box - but I don't see any problems with BSD. In fact, I also have 6.2 running on an old AMD64 3000+ that was a mobile chip and had to have cpufreq enabled just to move it off its default 800MHz and up to the 2.mumble GHz that it's supposed to clock at. Works fine.
I have seen some hardware devices not behave well, but often it's not a well-designed piece of hardware or it's just not meant for server-style loads (cheap consumer onboard SATA sometimes times out, and USB 2.0 always times out if you give it enough load).
I can't speak to AMD64 USING 64-bit mode, but 32-bit mode works as well as (or better than) Linux for headless-style computing.
--
"It is now safe to switch off your computer."
Here's a little more detail, based on my (very incomplete) understanding of the issues:
It appears that Intel has made changes to the way the memory management unit in the processor works, plus there are also some bugs that affect memory management. So what does that mean?
There are other issues as well... but these are a good sample, and should give an idea of what kind of bad stuff these CPU bugs/changes can make possible.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
The first Pentium had a floating point bug. Maybe they're working too closely with Microsoft? (I kid! I kid! Put down that flamethrower!) Anyway, here are a few Pentium jokes I dug up. If only the Core 2 bugs were all floating point errors, we could recycle all of these old jokes as Core 2 jokes!
Q: How many Pentium designers does it take to screw in a light bulb?
A: 1.99904274017, but that's close enough for non-technical people.
Q: What do you get when you cross a Pentium PC with a research grant?
A: A mad scientist.
Q: What's another name for the "Intel Inside" sticker they put on Pentiums?
A: The warning label.
Q: What do you call a series of FDIV instructions on a Pentium?
A1: Successive approximations.
A2: A random number generator.
Q: Complete the following word analogy: Add is to Subtract as Multiply is to:
1) Divide
2) ROUND
3) RANDOM
4) On a Pentium, all of the above
A: Number 4.
Q: What algorithm did Intel use in the Pentium's floating point divider?
A: "Life is like a box of chocolates." (Source: F. Gump of Intel)
Q: Why didn't Intel call the Pentium the 586?
A: Because they added 486 and 100 on the first Pentium and got 585.999983605.
Q: According to Intel, the Pentium conforms to the IEEE standards 754 and 854 for floating point arithmetic. If you fly in aircraft designed using a Pentium, what is the correct pronunciation of "IEEE"?
A: Aaaaaaaiiiiiiiiieeeeeeeeeeeee!
Q: Did you hear about the new "morning after" pill being developed as a replacement for RU-486???
A: It's called RU-Pentium. It causes the embryo to not divide correctly.
TOP TEN NEW INTEL SLOGANS FOR THE PENTIUM:
9.9999973251 It's a FLAW, Dammit, not a Bug
8.9999163362 It's Close Enough, We Say So
7.9999414610 Nearly 300 Correct Opcodes
6.9999831538 You Don't Need to Know What's Inside
5.9999835137 Redefining the PC -- and Mathematics As Well
4.9999999021 We Fixed It, Really
3.9998245917 Division Considered Harmful
2.9991523619 Why Do You Think They Call It *Floating* Point?
1.9999103517 We're Looking for a Few Good Flaws
0.9999999998 The Errata Inside
THE TOP TEN REASONS TO BUY A PENTIUM MACHINE:
10. Your current computer is too accurate
9. You want to get into the guinness book as "owner of most expensive paperweight"
8. Math errors add zest to life
7. You need an alibi for the I.R.S.
6. You want to see what all the fuss is about
5. You've always wondered what it would be like to be a plaintiff
4. The "intel inside" logo matches your decor perfectly
3. You no longer have to worry about cpu overheating
2. You got a great deal from JPL
1. It'll probably work
Thank you, thank you, I'll be here all week. Remember to tip the bartender. Let's see, 20% of... divide by...
% uname -a
:(
FreeBSD myhost.grateful.net 6.2-STABLE FreeBSD 6.2-STABLE #0: Mon May 28 09:52:28 PDT 2007 me@myhost.grateful.net:/usr/obj/usr/src/sys/AMD64 i386
Wait... this works in Slashdot's text area?
% uname -a
% uname -a
Damn it
I don't know why Theo posted that link, because it is about the Core, not the Core 2. They are two completely different micro-architectures. The Core was a slightly tweaked Pentium M (which is basically a P6 with extra vector instructions and the NetBurst branch predictor), while the Core 2 is a completely new micro-architecture. If you compare the errata in the two links, you will see that they are quite different.
I am TheRaven on Soylent News
What makes you think the bugs are in the "x86 layer"?
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
The AMT (Active Management Technology) is the truly frightening bit. Big Brother visits your computer:
* Control, upgrade, change, add and remove software
* Send out patches to computers - even if they are turned off
* Monitor and control (filter) the network traffic - before/under the running operating system
and control computers remotely.
A Swedish ASIC designer explains:
http://strombergson.com/kryptoblog/?p=311
(A rough) translation:
http://marc.info/?l=openbsd-misc&m=118302016430106&w=2
Chris
So Buddha walks into a pizza parlor and says: "Hey, make me one with everything."
Any sufficiently advanced undocumented feature is indistinguishable from a bug. :-)
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Another scary bug (perhaps the scariest, since it appears to be the one that most reliably/repeatably occurs) is AI88: Microcode Updates Performed During VMX Non-root Operation Could Result in Unexpected Behavior.
From what the errata says, unless the host software has specifically disallowed access to parts of the MSR, a VMX guest/non-root system could reload the CPU microcode.
This leads to a whole universe of complicated data theft/code execution/etc. exploits that will probably never be created due to their complexity. However, it also leads to a very, very, very simple DoS/crash exploit (load some bad microcode, crash the CPU).
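To make the "specifically disallowed access" part concrete, here is a rough sketch of how a hypervisor might flag writes to the microcode-update trigger MSR so that they trap to the host instead of reaching the CPU. It assumes the VMX MSR-bitmap layout as I understand it from Intel's documentation (a 4 KiB page in which the write bitmap for MSRs 0x0000-0x1FFF starts at byte offset 2048, and a set bit means "cause a VM exit") and that the trigger MSR is IA32_BIOS_UPDT_TRIG at 0x79. The helper names are hypothetical, and nothing here claims that this is a sufficient workaround for AI88.

```python
# Sketch: mark guest writes to the microcode-update MSR as intercepted in a
# VMX-style MSR bitmap. Layout and MSR number are assumptions taken from my
# reading of Intel's VMX documentation; helper names are hypothetical.
MSR_BITMAP_SIZE = 4096
WRITE_LOW_OFFSET = 2048        # write bitmap for MSRs 0x0000..0x1FFF
IA32_BIOS_UPDT_TRIG = 0x79     # MSR written to trigger a microcode load


def intercept_msr_write(bitmap: bytearray, msr: int) -> None:
    """Set the write-intercept bit for a low MSR (a set bit forces a VM exit)."""
    if not 0 <= msr <= 0x1FFF:
        raise ValueError("this sketch only handles MSRs 0x0000..0x1FFF")
    bitmap[WRITE_LOW_OFFSET + msr // 8] |= 1 << (msr % 8)


def write_is_intercepted(bitmap: bytearray, msr: int) -> bool:
    """Return True if a guest write to this low MSR would be intercepted."""
    return bool(bitmap[WRITE_LOW_OFFSET + msr // 8] & (1 << (msr % 8)))


# An all-zero bitmap intercepts nothing; flag only the microcode trigger.
bitmap = bytearray(MSR_BITMAP_SIZE)
intercept_msr_write(bitmap, IA32_BIOS_UPDT_TRIG)
```

With the bit set, the host gets to decide what happens when a guest tries the write, which is the "specifically disallowed" case the errata text appears to be talking about.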
Stack problems, memory management problems, interrupt problems and so on. Many of these bugs will not cause an immediate exception or crash but may look like software bugs, for example a stack problem causing a return to the wrong address.
I guess MS Windows users will simply blame Microsoft's sloppy code, when it isn't even their fault...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
I think the latest Power series will give any Intel CPU a run for its money, as will the latest SPARC.
Yes, they will. But those chips are designed with a target price of thousands of dollars and without anywhere near as much concern about heat.
Power has a 128 KB L1 cache (64 KB on Core 2), 4 MB L2 cache per core (4 MB L2 shared on Core 2), and a 32 MB L3 cache (none on Core 2). If you're willing to pay for that, x86 would be a lot faster.
Oh, don't forget that Power chips run really really hot. Hotter than Pentium 4's. The market has made it clear that lower power usage / heat generation is a priority now.
"Come on people, move along, nothing to see here".
"Hasn't anyone noticed these terrible bugs?"
Apparently they have, and now we know too.
Look, I know Theo-bashing is a traditional bit of fun, so I hate to rain on your parade. But you should keep in mind that the OpenBSD team is uniquely (or nearly so) positioned to discover and publicize the security implications of this sort of flaw. The whole project is security oriented; they don't accept "binary blobs" into security-sensitive roles, which means they look more closely at hardware than most; they operate in a very transparent manner; their user base is supportive of any security-related moves by the devs; their installed base is heavy in security-sensitive roles; and the project is notorious for not giving a damn about political considerations.
"But they're rarely very serious, they rarely actually affect anything in remotely realistic scenarios."
OpenBSD is heavily used in the perimeter security role, and in security-sensitive roles generally. As its OS security gets better, OpenBSD's sensitivity to hardware security flaws gets higher. If there's an architectural flaw that the OS can't cover, OpenBSD's user base needs to know that so they can evaluate their overall security and spec hardware accordingly.
Almost no one else needs to worry about hardware exploits in Core 2 as much as OpenBSD does, because almost every other OS for general-purpose hardware has easier exploit paths. For instance, I'm not worried about this flaw on my home iMac, because my iMac isn't in a security-sensitive role. If an attacker wants my home data, it'd be easier for the attacker to simply break in and steal the whole box.
"How does he expect Intel to respond?"
Like the professionals they are, I'd think.
With reasonable men I will reason; with humane men I will plead; but to tyrants I will give no quarter. -- William Lloyd
Itanium is a lesson in how not to handle technological transitions. Itanium was picked by geeks who had no idea of what the market wanted or needed, and Intel marketing and management blindly believed what they were hearing from the geeks.
Actually, Itanium was a wildly successful product. Mere rumors of Itanium's capabilities were sufficient to kill DEC Alpha, drive SGI/MIPS out of the high end processor market and disrupt SPARC and PA-RISC development programs. Intel virtually eliminated the threat of competitive RISC architectures for years with the announcement of Itanium.
(Another company that works like that is Microsoft, which is why they keep churning out such bad software.)
To much the same effect.
Ok, let's look at some of these.
AI65 - Thermal interrupt does not occur if DTS reaches an invalid temperature. What the hell is an invalid temperature? A disconnected sensor or something? It doesn't sound like something a userland thermal-generating loop can exploit but the errata is not detailed enough to know for sure.
AI79 - REP/STO in specific situation may cause the processor to hang. BIOS patchable. The errata mentions an uncacheable memory store. If this is a prerequisite then only user programs with access to /dev/io or memory-mapped bus space can exploit it. So e.g. something like XOrg, but not the typical user program. Worst case seems to be a system freeze. Still, this is something to be concerned about.
AI43 - Concurrent MP writes to non-dirty page may result in unpredictable behavior. This one is extremely serious. It affects any threaded program and possibly even programs which are not threaded. This would cause me to not purchase the CPU. It says that a BIOS workaround is possible (aka microcode update).
AI39 - Cache access request from one core hitting a modified line in the L1 cache of another core may cause unpredictable system behavior. What the hell? Are they out of their minds? This is a big-time show stopper. It says it can be fixed with the BIOS (aka microcode update). I sure hope so.
AI90 - Page access bit may be set prior to signaling a code segment limit fault. This one is pretty serious. This cannot occur on most operating systems because the code segment is set to be unlimited and access is governed solely by the page tables. In 64-bit mode emulating 32-bit operation the problem might occur if a bit of code wraps the segment. There are possibly issues in other emulation modes, such as VM86 mode. Setting the page accessed bit will not make a page accessible that was previously inaccessible, but it will result in unexpected modifications to the page table page. Numerous operating systems may free such pages to the pre-zeroed page list under the assumption that they cleaned the page out, when in fact there may be a page table entry with the access bit set (meaning the page wasn't completely zeroed when freed). That could cause problems.
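A rough illustration of the "not completely zeroed" concern: the sketch below scans a page-table page for leftover entries before it would be handed to a pre-zeroed free list. It assumes 4 KiB pages holding 512 little-endian 8-byte entries (PAE/long-mode style) and the usual x86 bit positions (present = bit 0, accessed = bit 5). The function names and the scenario are hypothetical, and this is not how any particular kernel does it.

```python
# Sketch: detect a page-table page that is assumed to be all zeros but still
# carries a stray entry (for example, one with only the accessed bit set).
# Entry size, entry count and function names are assumptions for illustration.
import struct

PAGE_SIZE = 4096
ENTRIES_PER_PAGE = 512          # 8-byte entries, PAE / long-mode style
PTE_PRESENT = 1 << 0
PTE_ACCESSED = 1 << 5


def stray_entries(page: bytes) -> list:
    """Return the indexes of all non-zero entries in a page-table page."""
    entries = struct.unpack("<%dQ" % ENTRIES_PER_PAGE, page)
    return [i for i, entry in enumerate(entries) if entry != 0]


def safe_to_treat_as_zeroed(page: bytes) -> bool:
    """True only if every entry in the page is really zero."""
    return not stray_entries(page)


# Hypothetical scenario: the OS cleared every mapping, but the CPU later set
# the accessed bit in entry 7 without the OS expecting it.
page = bytearray(PAGE_SIZE)
struct.pack_into("<Q", page, 7 * 8, PTE_ACCESSED)
```

A kernel that skips a check like this and trusts its own bookkeeping would put the page on the zeroed list with one dirty word in it, which is the failure mode described above.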
AI99 - Updating code page directory attributes without TLB invalidation may result in improper handling of a page fault exception. This one doesn't look too serious; it just means the wrong exception will be taken first, meaning that the OS will probably seg-fault the program. If the OS corrects the issue and retries, the correct exception will be taken on retry. All BSDs that I know of handle page fault exceptions generically and will not be affected. Of greater concern is what sort of modifications to a page directory entry now require TLB invalidations. On FreeBSD and DragonFly, and I assume most BSDs and probably Linux too, page directory entries usually transition between only two states and a TLB invalidation is made when a page directory entry is invalidated, so they wouldn't be affected by this bug.
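For readers who have not dealt with TLBs, the toy model below shows why changing a page-table entry without an invalidation is a problem in general: the cached translation keeps being used until it is flushed. It is a deliberately simplified, hypothetical model (the ToyMMU class and its methods are invented here) and says nothing about the specific misbehavior in AI99.

```python
# Toy model of a TLB caching page-table entries. Everything here is a
# simplified, hypothetical illustration, not a model of real hardware.
class ToyMMU:
    def __init__(self):
        self.page_table = {}   # virtual page number -> (frame, writable)
        self.tlb = {}          # cached copies of page_table entries

    def map(self, vpn, frame, writable):
        """Update the page table only; the TLB is NOT touched."""
        self.page_table[vpn] = (frame, writable)

    def translate(self, vpn):
        """Look up a translation, filling the TLB on a miss."""
        if vpn not in self.tlb:
            if vpn not in self.page_table:
                raise LookupError("page fault on vpn %d" % vpn)
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn]

    def invlpg(self, vpn):
        """Drop the cached translation for one page."""
        self.tlb.pop(vpn, None)


mmu = ToyMMU()
mmu.map(1, 100, True)
first = mmu.translate(1)       # fills the TLB with (100, True)
mmu.map(1, 100, False)         # attributes changed, no invalidation yet
stale = mmu.translate(1)       # still the old, cached attributes
mmu.invlpg(1)
fresh = mmu.translate(1)       # now reflects the page table again
```

In this model the stale lookup still reports the page as writable until the invalidation happens, which is why the question of which entry changes need a TLB invalidation matters.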