Theo de Raadt Details Intel Core 2 Bugs

Re:Yay AMD by BosstonesOwn · 2007-06-28 01:02 · Score: 5, Informative

I don't think that is a good thing either. It looks like AMD may be doing this as well.

(While here, I would like to say that AMD is becoming less helpful day
by day towards open source operating systems too, perhaps because
their serious errata lists are growing rapidly too).

--
This package Does Not Contain a Winner

Re:Summary sucks, someone please provide better on by Aladrin · 2007-06-28 01:07 · Score: 4, Informative

Sure:

Some of the bugs are so dangerous that it doesn't matter WHAT operating system you're running, code could be written that could attack the entire system. It would still be OS-specific code, but since the exploit is in the hardware, it's a LOT harder to prevent the attack, if it's even possible.

Some of the bugs are unfixable, as well. (I assume they mean without physically replacing the chip with a 'fixed' one that doesn't exist yet.)

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM

Re:How hard is it to get right? by ardor · 2007-06-28 01:09 · Score: 4, Informative

Actually we are talking about VHDL. The "million transistors" argument is just as appropriate as saying "software is so large, it has so many ones and zeros". Development does not happen at this low stage.

--
This sig does not contain any SCO code.

Re:Good stuff. by Lisandro · 2007-06-28 01:22 · Score: 5, Informative

Same here. The guy might seem like a bit of an asshole sometimes, but he surely knows what he's talking about. Some of the things he points out are plain unbelievable:

Basically the MMU simply does not operate as specified/implemented in previous generations of x86 hardware. It is not just buggy, but Intel has gone further and defined "new ways to handle page tables" (see page 58).

Some of these bugs are along the lines of "buffer overflow"; where a write-protect or non-execute bit for a page table entry is ignored. Others are floating point instruction non-coherencies, or memory corruptions -- outside of the range of permitted writing for the process -- running common instruction sequences.

It will be interesting to see what Intel has to say about this.

Re:Yay AMD by vadim_t · 2007-06-28 01:26 · Score: 3, Informative

Well, there's VIA as well, although their stuff left a lot to be desired the last time I checked it out. Their mini-ITX stuff had potential -- small, low power usage, REALLY good crypto and video acceleration to compensate for the slow CPU. Unfortunately when I tried a Nehemiah board, it was very unstable.

Re:Patches by Jeff+DeMaagd · 2007-06-28 01:36 · Score: 3, Informative

Now, the only thing left to do, is someone tell Intel that they're selling hardware.

Hardware has had built-in firmware/software for as long as I remember. BIOS is software. Microcode for even consumer CPUs has been done for as long as I remember, Pentium II had it. Apparently, the 8086 had microcode-based instructions.

Re:How hard is it to get right? by TheGratefulNet · 2007-06-28 01:44 · Score: 4, Informative

AMD64 doesn't like FreeBSD 6.2 at all

% uname -a
FreeBSD myhost.grateful.net 6.2-STABLE FreeBSD 6.2-STABLE #0: Mon May 28 09:52:28 PDT 2007 me@myhost.grateful.net:/usr/obj/usr/src/sys/AMD64 i386

granted, I'm using 32bit mode - but I've been running 6.2 for as long as it's been out on my 'always on' freebsd box. what issues are you seeing? this is my production box - but I don't see any problems with bsd. in fact, I also have 6.2 running with an old amd64 3000+ that was a mobile chip and had to have cpufreq enabled just to move it off its default 800mhz and up to the 2.mumble ghz that it's supposed to clock at. works fine.

I have seen some hardware devices not behave well but often it's not a well designed piece of hardware or it's just not meant for server style loads (cheap consumer onboard sata sometimes times out and usb2.0 always times out if you give it enough load).

I can't speak to amd64 USING 64bit mode, but 32bit mode works as well as (or better than) linux on headless style computing.

--
"It is now safe to switch off your computer."

Linus doesn't think it's a big deal. by davebert · 2007-06-28 01:44 · Score: 5, Informative

Link

Re:Linus doesn't think it's a big deal. by Durzel · 2007-06-28 03:49 · Score: 3, Informative

Theo also seems quite sensationalist at first glance (this is the first of his articles I've read). Emotive statements like "These processors are buggy as hell" and conjecture like "We bet there are many more errata not yet announced" don't really lend credence to his arguments.

He may be entirely right, and his experience in CPUs, BIOS vendors and Intel, AMD, etc may mean what he is saying is accurate - but the tone doesn't really sound very professional.

Re:Summary sucks, someone please provide better on by swillden · 2007-06-28 01:45 · Score: 5, Informative

Some of the bugs are so dangerous that it doesn't matter WHAT operating system you're running, code could be written that could attack the entire system. It would still be OS-specific code, but since the exploit is in the hardware, it's a LOT harder to prevent the attack, if it's even possible.

Here's a little more detail, based on my (very incomplete) understanding of the issues:

It appears that Intel has made changes to the way the memory management unit in the processor works, plus there are also some bugs that affect memory management. So what does that mean?

Theo mentions changes in how TLB flushes must be handled. Translation Lookaside Buffers (TLBs) are caches inside the processor that hold information used to quickly determine what physical memory page corresponds to a given virtual memory page. Each running process has its own address space (meaning the data at address, say, 1000, is different for each process) and operating systems have to be able to quickly translate these virtual addresses to addresses within the physically-available RAM. The authoritative data on the mapping is in a set of data structures called the "page table", but the processors provide a mechanism for creating and managing TLBs which act as a high-performance cache of part of the page table data. Failing to properly flush the TLBs during a context switch (putting one process to sleep and activating another) might result in the new process' virtual memory mapping being done incorrectly. From a security perspective, this could give one process access to memory owned by another.
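
To make the stale-TLB problem concrete, here is a rough sketch in Python. It is a toy software model, not how any real MMU is built: the class `ToyMMU`, its methods, and the page numbers are all made up for illustration. It only shows that a cached translation left over from one process can hand the next process the wrong physical page.

```python
# Toy model (not real hardware): a TLB caching virtual->physical page
# translations, to show why a missed flush on context switch is dangerous.
# All names and numbers here are hypothetical.

class ToyMMU:
    def __init__(self):
        self.page_table = {}   # authoritative mapping for the current process
        self.tlb = {}          # cache of recently used translations

    def switch_to(self, page_table, flush=True):
        """Context switch: install the new process's page table."""
        self.page_table = page_table
        if flush:
            self.tlb.clear()   # drop translations that belonged to the old process

    def translate(self, vpage):
        if vpage in self.tlb:              # TLB hit: page table is not consulted
            return self.tlb[vpage]
        ppage = self.page_table[vpage]     # TLB miss: look it up in the page table
        self.tlb[vpage] = ppage
        return ppage


proc_a = {1000: 7}    # process A: virtual page 1000 -> physical page 7
proc_b = {1000: 42}   # process B: same virtual page, different physical page

mmu = ToyMMU()
mmu.switch_to(proc_a)
a_page = mmu.translate(1000)           # caches 1000 -> 7

mmu.switch_to(proc_b, flush=False)     # buggy switch: the stale entry survives
stale = mmu.translate(1000)            # B is handed A's physical page

mmu.switch_to(proc_b, flush=True)      # correct switch
fresh = mmu.translate(1000)

print(a_page, stale, fresh)            # prints: 7 7 42
```

The middle value is the problem case: process B asked for its own page 1000 and got the physical page that belongs to process A.
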
Another issue mentioned is the possibility that No-Execute bits may be ignored. The OS can set the No-Execute (NX) bit on a page of memory that it knows to be pure data that should not be executed. The processor will refuse to execute code from any memory page with NX set. This makes most buffer overflow attacks impossible, because the normal buffer overflow attack involves getting a bit of malicious code shoved into a stack-based buffer as well as overflowing the buffer to overwrite a return address so that the CPU will jump to and execute the malicious code. Obviously, if the processor sometimes ignores NX bits, the buffer overflow attacks become possible again.
Theo also mentions possibly-ignored Write-Protect (WP) bits. The OS can mark memory pages as read-only. This is used for all sorts of things related to security. One of the biggest is preventing processes from writing to the memory in which shared libraries are loaded. If my process could overwrite, say, the C library code implementing "printf", other processes that call this function would execute my code. Some of them will be running as root, so I can execute code with root permissions. Modern operating systems do lots of data-sharing between processes, some of it completely non-writable, other parts of it "copy on write". Copy-on-write pages are implemented by setting the WP bit and then catching the page fault generated by the CPU when a process tries to write the page. The fault handler quickly copies the page in question, allows the write to hit the copy, and reswizzles the page table so the virtual page of the writing process points to the new copy. WP bits being ignored would also break this, so lots of cases where data is "opportunistically" shared would become really and truly shared, allowing one process to corrupt data used by another.
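
The copy-on-write mechanism described above can be sketched roughly like this. Again, this is a toy Python model under loud assumptions: `Page`, `Mapping`, `write`, and the `honor_write_protect` switch are invented for illustration, and a real kernel does this in its page fault handler rather than inside a write function.

```python
# Toy model of copy-on-write: two processes share one page marked read-only.
# The first write "faults", and the handler gives the writer a private copy.
# All names are hypothetical; this is not how any real kernel is structured.

class Page:
    def __init__(self, data):
        self.data = bytearray(data)    # private copy of the bytes

class Mapping:
    """One process's view of a page: a pointer plus a writable flag."""
    def __init__(self, page, writable):
        self.page = page
        self.writable = writable

def write(mapping, offset, value, honor_write_protect=True):
    if not mapping.writable and honor_write_protect:
        # "Page fault handler": copy the page and remap the writer to the copy
        mapping.page = Page(mapping.page.data)
        mapping.writable = True
    # If the protection is ignored, the write lands on the shared page
    mapping.page.data[offset] = value

def demo(honor_write_protect):
    shared = Page(b"AAAA")
    parent = Mapping(shared, writable=False)
    child = Mapping(shared, writable=False)
    write(child, 0, ord("B"), honor_write_protect)
    return bytes(parent.page.data), bytes(child.page.data)

working = demo(True)    # parent keeps its data, child gets a modified copy
broken = demo(False)    # the write hits the shared page, so both see it

print(working)          # prints: (b'AAAA', b'BAAA')
print(broken)           # prints: (b'BAAA', b'BAAA')
```

The second result is the "really and truly shared" case: the child's write silently changed the parent's data.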

There are other issues as well... but these are a good sample, and should give an idea of what kind of bad stuff these CPU bugs/changes can make possible.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.

Re:Yay AMD by TheRaven64 · 2007-06-28 01:51 · Score: 5, Informative

SPARC is doing very well for certain categories of workload, although mainly web-app types at the moment. Most computers sold these days have some form of ARM chip[1], which is a nice, low-power architecture, but lacks floating point. This isn't a huge problem, since a lot of ARM designs (particularly those from TI) have a DSP on die which can seriously out-perform a general purpose CPU for a lot of FPU-heavy workloads.

For general-purpose usage, the most interesting design I've seen recently is the PWRficient from P.A. Semi. It's a nice dual-core 64-bit PowerPC, with low power usage, similar performance to IBM's PowerPC 970 series. It has a lot of nice stuff on-die (crypto, a really shiny DMA architecture, etc).

For a complete round-up of current alternatives, take a look at this article and the next two in the series.

[1] They are generally marketed as 'cell phones' or similar, rather than 'computers'.

--
I am TheRaven on Soylent News

Re:Summary sucks, someone please provide better on by TheRaven64 · 2007-06-28 02:08 · Score: 4, Informative

I don't know why Theo posted that link, because it is about the Core, not the Core 2. They are two completely different micro-architectures. The Core was a slightly tweaked Pentium M (which is basically a P6 with extra vector instructions and the NetBurst branch predictor), while the Core 2 is a completely new micro-architecture. If you compare the errata in the two links, you will see that they are quite different.

--
I am TheRaven on Soylent News

Scariest post of the thread by Anonymous Coward · 2007-06-28 02:13 · Score: 4, Informative

Scariest post on that thread:

http://marc.info/?l=openbsd-misc&m=118302016430106&w=2

AMT is a technology intended to facilitate surveillance, maintenance
and control of computers remotely.

* Monitor and control (filter) the network traffic - before/under the
running operating system

* sending out patches to computers - even if they are turned off.

* Control, upgrade, change, add and remove software

Re:Summary sucks, someone please provide better on by Bri3D · 2007-06-28 03:16 · Score: 3, Informative

Another scary bug (perhaps the scariest, since it appears to be the one that most reliably/repeatably occurs) is AI88: Microcode Updates Performed During VMX Non-root Operation Could Result in Unexpected Behavior.
From what the erratum says, unless the host software has specifically disallowed access to the relevant MSRs, a VMX guest/non-root system could reload the CPU microcode.
This leads to a whole universe of complicated data theft/code execution/etc. exploits that will probably never be created due to their complexity. However, it also leads to a very, very, very simple DoS/crash exploit (load some bad microcode, crash the CPU).

Re:Time for RISC? by edwdig · 2007-06-28 03:46 · Score: 3, Informative

I think the latest Power series will give any Intel CPU a run for its money, as will the latest Sparc.

Yes, they will. But those chips are designed with a target price of thousands of dollars and without anywhere near as much concern about heat.

Power has a 128 KB L1 cache (64 KB on Core 2), 4 MB L2 cache per core (4 MB L2 shared on Core 2), and a 32 MB L3 cache (none on Core 2). If you were willing to pay for that much cache on an x86 chip, x86 would be a lot faster too.

Oh, don't forget that Power chips run really really hot. Hotter than Pentium 4s. The market has made it clear that lower power usage / heat generation is a priority now.

The erratum mentioned by m.dillon · 2007-06-28 05:21 · Score: 4, Informative

OK, let's look at some of these.

AI65 - Thermal interrupt does not occur if DTS reaches an invalid temperature. What the hell is an invalid temperature? A disconnected sensor or something? It doesn't sound like something a userland thermal-generating loop can exploit but the erratum is not detailed enough to know for sure.

AI79 - REP/STO in specific situation may cause the processor to hang. BIOS patchable. The erratum mentions an uncacheable memory store. If this is a prerequisite then only user programs with access to /dev/io or memory-mapped bus space can exploit it. So e.g. something like XOrg, but not the typical user program. Worst case seems to be a system freeze. Still, this is something to be concerned about.

AI43 - Concurrent MP writes to non-dirty page may result in unpredictable behavior. This one is extremely serious. It affects any threaded program and possibly even programs which are not threaded. This would cause me to not purchase the cpu. It says that a BIOS workaround is possible (aka microcode update).

AI39 - Cache access request from one core hitting a modified line in the L1 cache of another core may cause unpredictable system behavior. What the hell? Are they out of their minds? This is a big-time show stopper. It says it can be fixed with the BIOS (aka microcode update). I sure hope so.

AI90 - Page access bit may be set prior to signaling a code segment limit fault. This one is pretty serious. This cannot occur on most operating systems because the code segment is set to be unlimited and access is governed solely by the page tables. In 64 bit mode emulating 32 bit operation the problem might occur if a bit of code wraps the segment. There are possibly issues in other emulation modes, such as VM86 mode. The effect of setting the page accessed bit will not make a page accessible that was previously inaccessible, but it will result in unexpected modifications to the page table page, and numerous operating systems may free such pages to the page-zeroed page list under the assumption that they cleaned the page out when in fact there may be a page table entry with the access bit set (meaning the page wasn't completely zeroed when freed). That could cause problems.

AI99 - Updating code page directory attributes without TLB invalidation may result in improper handling of a page fault exception. This one doesn't look too serious, it just means the wrong exception will be taken first, meaning that the OS will probably seg-fault the program. If the OS corrects the issue and retries, the correct exception will be taken on retry. All BSDs that I know of handle page fault exceptions generically and will not be affected. Of greater concern is what sort of modifications to a page directory entry now require TLB invalidations? On FreeBSD and DragonFly, and I assume most BSDs and probably Linux too, page directory entries usually transition between only two states and a TLB invalidation is made when a page directory entry is invalidated, so they wouldn't be affected by this bug.

Re:The erratum mentioned by m.dillon · 2007-06-28 05:58 · Score: 4, Informative

Now the Core Duo/Solo errata.

AE1 - CPU to memory copy with FST with numeric and null segment exceptions may cause GP faults to be missed and FP linear address mismatch. In other words, a segmentation violation will be missed and a write will be allowed to proceed. This will not affect OSs using page tables for protection, which is all OSs. Sounds bad but doesn't sound like it will affect existing OSs.

AE2 - Code segment violation may occur if a code stream wraps around a segment. No program does this on purpose, and OSs will just seg-fault the program if it does. The Intel errata says it could be exploited by a virus but I don't see how by its current description. Maybe there is something they aren't telling us.

AE3 - POPF/POPFD that sets the trap flag (aka when single-stepping a program) may cause unpredictable behavior. Holy shit. This one is serious.

AE4 - REP MOVS in fast string mode continues in that mode when crossing into a page with a different memory type. This means that when crossing over from a cacheable page to an uncacheable page, the I/Os remain cacheable. And vice versa. This will never happen on purpose so the question is whether it can be exploited in some way, and the answer to that is not that I can see.

AE5 - Memory aliasing with inconsistent dirty and Access bits may cause a processor deadlock. This means a PTE with 'D'irty set but with 'A'ccess not set. FreeBSD and DragonFly always set the A bit when setting the D bit and will not be affected but I don't know about other OSs. This is a very serious bug though.
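
For anyone who wants to see what "D set but A not set" looks like at the bit level, here is a small Python sketch. The bit positions are the standard x86 page table entry layout (Present is bit 0, Accessed is bit 5, Dirty is bit 6); the helper names `has_inconsistent_ad_bits` and `mark_dirty` are made up for illustration and do not come from any kernel. The erratum is about aliased mappings of the same page, so checking a single entry like this only illustrates the condition, it does not detect the bug.

```python
# x86 page-table-entry flag bits (low bits of a PTE)
PTE_PRESENT  = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_ACCESSED = 1 << 5
PTE_DIRTY    = 1 << 6

def has_inconsistent_ad_bits(pte):
    """True if a present PTE has Dirty set but Accessed clear (hypothetical helper)."""
    present = bool(pte & PTE_PRESENT)
    dirty = bool(pte & PTE_DIRTY)
    accessed = bool(pte & PTE_ACCESSED)
    return present and dirty and not accessed

def mark_dirty(pte):
    """Set Dirty and Accessed together, the policy described for FreeBSD/DragonFly."""
    return pte | PTE_DIRTY | PTE_ACCESSED

suspect = PTE_PRESENT | PTE_WRITABLE | PTE_DIRTY      # D without A
safe = mark_dirty(PTE_PRESENT | PTE_WRITABLE)         # D and A together

print(has_inconsistent_ad_bits(suspect))   # prints: True
print(has_inconsistent_ad_bits(safe))      # prints: False
```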

AE6 - VM bit will be cleared on a double fault exception. Double faults are usually fatal for the whole machine, so this matters only if they can occur in an emulation mode (where the double fault is being emulated). Check your OS. FreeBSD and DragonFly do not try to resume after a double fault and do not take faults in VM mode and are not affected.

AE7 - Incompatible write attributes in page table versus MTRR may consolidate to UC. Not a big deal, doesn't happen unless something has been misprogrammed.

AE8 - FXSAVE after FNINIT without an intervening FP instruction may save uninitialized values for FDP and FDS. This isn't an issue unless the data being written represents a security leak of some sort, such as a portion of the state of another program's FP unit. This could be a security issue with regards to one program snooping another program's cryptography. Statistical snooping possible through this sort of mechanic has been shown to be effective in recent years.

AE9 - LTR can result in a system hang. Well, BSDs don't really use LTR all that much and the conditions required just will not happen on BSD or probably Linux either. A break point must be set on the cache line containing the descriptor data? Not from userland!

AE10 - Invalid entries in the page directory pointer table register may cause a GP fault. Not an issue.

AE11 - REP MOVS operation in fast string mode continues in that mode when crossing into a page with a different memory type. Not an issue.

AE12 - FP inexact result exception flag may not be set if the #inexactresult occurs in any FPU instruction with certain instructions occurring afterwards. This is a very serious bug that only compilers can work around (and probably won't).

AE13 - IFU/BSU deadlock may cause system hang. I've no idea what IFU and BSU are.

AE14 - MOV with debug register causes a debug exception. Sounds like the worst that happens here is a program seg faults if this condition is hit while the program is being debugged.

AE15 - INIT does not clear global entries in the TLB. Oh, joy. Intel says that BIOS writers would know of this erratum and code for it, but insofar as I know this could be an issue when starting up APs.

AE16 - Use of memory aliasing with inconsistent memory type may cause system hang. It shouldn't be possible for this to happen with a modern OS. It means mapping the same physical page of memory with different memory contr
