Hyperthreading Considered Harmful
cperciva writes "Hyper-Threading, as currently implemented on Intel Pentium Extreme Edition,
Pentium 4, Mobile Pentium 4, and Xeon processors, suffers from a serious
security flaw. This flaw permits local information disclosure, including
allowing an unprivileged user to steal an RSA private key being used on the
same machine. Administrators of multi-user systems are strongly advised
to take action to disable Hyper-Threading immediately.
I will be presenting this attack at
BSDCan 2005 at 10:00 AM EDT on May 13th, and at the conclusion of my talk
I will also be releasing a paper describing the attack and possible mitigation
strategies."
I read about this last night here at KernelTrap. They offer more info, evidently having talked to Colin...
The point is that most server systems don't let you execute the system calls you would need to exploit this.
You need at least root/administrator privileges to read OS memory.
So before you can exploit the system you must already have access to the system itself.
If the claim is true and you can read the memory of other processes, it is a "local" kind of "root exploit".
Unlike SMP, with HT you're interleaving two threads on the same physical execution unit. That means that there is data from another thread in registers at the same time that you're executing, without having enough instructions execute during a context switch to flush the pipeline. It also means that the other process's page table is in the MMU while you're executing. Even if their proof-of-concept attack doesn't work on some other operating systems, everyone needs to look over their code to make sure this isn't just an accidental effect that could change with increasing pipeline depths, different context switch logic, etc.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Actually, Intel CPUs contain patchable microcode ROMs. You can see the option to enable it when you configure a Linux kernel.
-mkb
I'd be willing to bet he's right. He is currently awaiting a doctorate from the University of Oxford, which is commonly held to be the finest academic institution in the world.
(I'm not biased by having spent the past 7 years there)
The Ars Technica page on hyperthreading with the Xeon might provide some clues. It lists which parts of the CPU are replicated, partitioned and shared.
...
...
One final bit of information that should be included in a discussion of partitioned resources is the fact that when the Xeon is executing only one thread, all of its partitioned resources can be combined so that the single thread can use them for maximum performance. When the Xeon is operating in single-threaded mode, the dynamically partitioned queues stop enforcing any limits on the number of entries that can belong to one thread, and the statically partitioned queues stop enforcing their boundaries as well.
The same can be said for the register file, another crucial shared resource. The Xeon's 128 microarchitectural general purpose registers (GPRs) and 128 microarchitectural floating-point registers (FPRs) have no idea that the data they're holding belongs to more than one thread--it's all just data to them, and they, like the execution units, remain unchanged from previous iterations of the Xeon core.
For a simultaneously multithreaded processor, the cache coherency problems associated with SMP all but disappear. Both logical processors on an SMT system share the same caches as well as the data in those caches. So if a thread from logical processor 0 wants to read some data that's cached by logical processor 1, it can grab that data directly from the cache without having to snoop another cache located some distance away in order to ensure that it has the most current copy.
You might think since the Xeon's two logical processors share a single cache, this means that the cache size is effectively halved for each logical processor. If you thought this, though, you'd be wrong: it's both much better and much worse. Let me explain.
Each of the Xeon's caches--the trace cache, L1, L2, and L3--is SMT-unaware, and each treats all loads and stores the same regardless of which logical processor issued the request. So none of the caches know the difference between one logical processor and another, or between code from one thread or another.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Most machines let you disable it in the BIOS, which has to be the simplest possible way of turning it off.
According to XP Task Manager:
Firefox: 9
Visual Studio: 10
Outlook: 8
Gaim: 2
Explorer: 8
Other People's Cache - HyperAttacks with HyperThreading - Dag Arne Osvik, Norway
...RSA is vulnerable to timing attacks (which is why we have blinding in software). It's a wonder no one has thought about this earlier though. I remember reading about the military considering virtual machines (i.e. one physical machine could be on both classified/unclassified systems). One of the reasons they didn't was the ability to tap/signal through spinlocks and other timing data. I always thought this was a "well-known but too unlikely to be interesting" weakness, but I guess not. Maybe I should have published a paper myself.
Live today, because you never know what tomorrow brings
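For anyone wondering what "blinding" means in the comment above, here is a minimal sketch of textbook RSA base blinding. The tiny primes, the constant names and the function `blinded_private_op` are made up for illustration and are not taken from any real crypto library. It only shows the classic idea of hiding the input from the private-key exponentiation; whether that helps against a cache-based attack is a separate question.

```python
import math
import secrets

# Toy textbook RSA parameters (assumption: far too small for real use).
P, Q = 61, 53
N = P * Q                            # modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def blinded_private_op(c):
    """Compute c^D mod N without exponentiating the caller-chosen c directly."""
    # Pick a random blinding factor r that is invertible modulo N.
    while True:
        r = secrets.randbelow(N - 2) + 2   # r in [2, N-1]
        if math.gcd(r, N) == 1:
            break
    blinded = (c * pow(r, E, N)) % N       # c * r^E: looks random to an observer
    result = pow(blinded, D, N)            # (c * r^E)^D = c^D * r (mod N)
    return (result * pow(r, -1, N)) % N    # strip r to get c^D


if __name__ == "__main__":
    ciphertext = pow(42, E, N)
    print(blinded_private_op(ciphertext))  # recovers the toy message
```

The private exponentiation runs on a value the attacker cannot predict, so its running time no longer correlates with the input the attacker supplied.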
Regular desktop apps tend to have lots of threads, but the issue is not how many threads exist but rather how many of them attempt to use CPU at the same time.
For instance, your web browser might spawn a thread to do a DNS lookup, since you wouldn't want the GUI to block during DNS. That thread hardly uses any CPU though. When your Web browser does real work, like rendering, it will usually be confined to a single thread.
I have seen the future, and it is inconvenient.
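A rough sketch of the point made in the comment above: a thread that is blocked waiting on something costs wall-clock time but almost no CPU. The lookup here is faked with a sleep (an assumption, so the example needs no network), and `fake_dns_lookup`, `measure` and the placeholder address are hypothetical names chosen for illustration.

```python
import threading
import time


def fake_dns_lookup(results):
    # Stand-in for a blocking resolver call; a sleeping thread uses almost no CPU.
    time.sleep(0.2)
    results["address"] = "192.0.2.1"   # placeholder from the documentation range


def measure():
    """Run the fake lookup in a worker thread and compare wall time to CPU time."""
    results = {}
    wall_start = time.perf_counter()
    cpu_start = time.process_time()    # CPU time used by the whole process
    worker = threading.Thread(target=fake_dns_lookup, args=(results,))
    worker.start()
    worker.join()
    results["wall"] = time.perf_counter() - wall_start
    results["cpu"] = time.process_time() - cpu_start
    return results


if __name__ == "__main__":
    r = measure()
    print(f"wall: {r['wall']:.3f}s  cpu: {r['cpu']:.3f}s")
```

The thread count in Task Manager goes up by one, but the second logical processor has essentially nothing to do while that thread waits.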
This is only tangentially related to the security issue, but I found that disabling hyperthreading on a cluster of dual Xeons running Linux greatly improved performance with a distributed memory (MPI) numerical model. Short summary: even if you only run your model on physical CPUs, hyperthreading will apparently bounce jobs around in a somewhat random way. Not sure if it's a hardware issue or a software (Linux) issue.
Here is a link which goes into detail
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
But since Intel has not yet responded to this, it's unknown whether the problem can be fixed that way. We'll just have to see after the official announcement.
I'm not sure what is really involved by this, but a FreeBSD security bulletin was released today addressing this topic (including a kernel patch and work-around) so I highly doubt this is simply a stunt.
Hyper Transport has nothing to do with Hyper Threading. Hyper Threading means processor support for several (usually two) execution threads at once. Hyper Transport is a bus technology to interconnect processors, RAM, motherboard chips, PCI bus and the like.
AMD's Hyper Transport is similar to Intel's Hyper Threading, but in my books, superior.
That's like saying that the computers from Apple Computers are similar but superior to the computers from Apple Records. Notice how Apple Records makes no computers? Just because they start with the same word does not mean two things are the same.
I just watched his talk, and you are on the right track. Your workaround is one he suggested too. It's actually a timing-based attack that watches the cache misses in a spy thread to try and recover the RSA private key. The interesting thing is this isn't Hyper-Threading only - it's possible on normal procs too that don't flush the cache between context switches. It's just that with HT context switches can be far more common.
Random is the New Order.
Yes, but each context MUST have its own register set or it makes no sense at all. Perhaps the attack comes through the rename registers or somesuch. Each SMT (or HyperThread) context has its own set of registers and they don't share.
My paper is available here.
Have fun reading, I'm going back to the conference.
Tarsnap: Online backups for the truly paranoid
The reason HT is vulnerable is that both threads share the cache and context switches can happen at any time. It could happen on normal non-HT procs too, but there the context switches are more likely to flush the cache, or they don't happen as often.
Random is the New Order.
Why wasn't Intel notified over the past SEVEN MONTHS ?
They were. I've clarified the page somewhat now, but "Other security teams" includes Intel.
Tarsnap: Online backups for the truly paranoid
Why notify FreeBSD and then wait 2 or 3 months before notifying other possibly affected vendors (at least other BSDs)?
Two reasons. First, because I'm part of the FreeBSD Security team -- I'm required to notify them about potential issues.
Second, because if I contacted lots of security teams with what I had on December 31st, they wouldn't have listened: "Umm, hey guys, there's a problem with hyperthreading. I've convinced myself that it is real, but I don't really have any evidence to give you, so you'll just have to believe me..."
Tarsnap: Online backups for the truly paranoid
He alerted SCO to a flaw in their OS?
Actually, I posted to vendor-sec. I was rather surprised when I got an email back from SCO -- I didn't think that they'd be on vendor-sec.
Tarsnap: Online backups for the truly paranoid
The paper is now available at:
http://www.daemonology.net/papers/htt.pdf
I've tried HT on both the 3.0c [Northwood, 512k L2] and 2.8e [Prescott, 1M L2] P4 models, both with identical hardware otherwise [1GB dual channel DDR400, 875P chipset, nvidia fx5200, 120GB 7200RPM ATA133 WD disc]. It's really nice on the 2.8e, but you fall in the cache miss tar pit on the 3.0c. With HT turned on the 2.8e actually feels faster than the 3.0c ever did, especially under heavy load, and is nearly impossible to bring to its knees whatever I throw at it.
:)
Back on topic: This attack doesn't really shock me that much; covert channels are a fact of life in any multi-user machine, and anything that needs bulletproof security should be on isolated hardware. Attacking an RSA implementation by analyzing cache performance is a truly sweet hack though... my propeller-beanie spins in admiration.
...when you're writing a game...tweak the difficulty of "Easy" to something [your mother] can cope with. -- onion2k
Well, I just read the paper, and I applaud Colin on several levels. First off, the theory of the attack is rock-solid and well-written. Secondly, he describes very implementable OS work-arounds, crypto library fixes, and finally chip design corrections which will totally eliminate the security hole.
This is one of the best thought out, best written papers of its kind that I have read in my over thirty years of work in the engineering field.
About the word "if": If bullfrogs had wings, they wouldn't bounce around on their little green butts.
It's about shared cache and timing cache hits and misses. One thread can monitor the cache hits and misses of another thread (because an access that misses the cache takes more time) and infer how that thread operated. This is as much of a problem on dual-core (with shared cache) as any SMT implementation. As noted in the paper, it's even a problem on normal systems that use paging.
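The inference described above can be sketched with a toy simulation. This is not the attack from the paper: the direct-mapped cache, the addresses and the "victim" that touches an extra cache set only for 1 bits are all assumptions made up for illustration, and the simulation reports hits and misses directly instead of measuring access times the way a real spy thread would have to.

```python
NUM_SETS = 8
SPY_TAG, VICTIM_TAG = 0, 100   # arbitrary distinct tags (assumption)
MULTIPLY_SET = 3               # set the hypothetical "multiply" code maps to


class SharedCache:
    """Toy direct-mapped cache, one line per set, shared by both threads."""

    def __init__(self):
        self.tags = [None] * NUM_SETS

    def access(self, address):
        """Return True on a hit; on a miss, evict whatever was in that set."""
        index, tag = address % NUM_SETS, address // NUM_SETS
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit


def victim_step(cache, key_bit):
    # Hypothetical victim: "square" on every bit, "multiply" only on 1 bits.
    cache.access(VICTIM_TAG * NUM_SETS + 1)
    if key_bit:
        cache.access(VICTIM_TAG * NUM_SETS + MULTIPLY_SET)


def spy_recover(key_bits):
    """Guess each key bit from which of the spy's own lines got evicted."""
    cache = SharedCache()
    for s in range(NUM_SETS):              # prime: fill every set with spy data
        cache.access(SPY_TAG * NUM_SETS + s)
    recovered = []
    for bit in key_bits:
        victim_step(cache, bit)            # victim runs "at the same time"
        # Probe: re-touch the spy's lines; a miss means the victim used that set.
        misses = [s for s in range(NUM_SETS)
                  if not cache.access(SPY_TAG * NUM_SETS + s)]
        recovered.append(1 if MULTIPLY_SET in misses else 0)
    return recovered


if __name__ == "__main__":
    print(spy_recover([1, 0, 1, 1, 0, 0, 1]))
```

The spy never reads the victim's memory. It only learns which of its own cache lines were pushed out, and that is enough to tell the two code paths apart.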
How about just not allowing different UIDs on the core at the same time?
That would be the ideal solution (assuming that you also check for setuid/setgid programs). Unfortunately, it's really hard to do that correctly due to problems of kernel data locking.
FreeBSD's policy on security fixes is that they must never ever break anything -- so if necessary (as in this case) a simple but suboptimal fix will be used instead of a complicated fix which might have the inadvertent side-effect of causing machines to crash.
Tarsnap: Online backups for the truly paranoid
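The "same UID only" idea from the exchange above can be written down as a policy check. This is only a sketch of the rule itself, with made-up names (`Task`, `may_share_core`, `pick_sibling`); it is not how FreeBSD or any other kernel does it, and it ignores the kernel locking problems mentioned in the reply, which are the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    uid: int
    is_setid: bool = False   # True if it runs a setuid/setgid program


def may_share_core(a, b):
    """Two tasks may run on sibling logical CPUs only if they have the
    same UID and neither is a setuid/setgid program."""
    return a.uid == b.uid and not (a.is_setid or b.is_setid)


def pick_sibling(running, candidates):
    """Return the first candidate allowed next to `running`,
    or None to leave the sibling logical CPU idle."""
    for task in candidates:
        if may_share_core(running, task):
            return task
    return None


if __name__ == "__main__":
    sshd = Task("sshd", uid=0)
    alice_make = Task("make", uid=1001)
    alice_cc = Task("cc", uid=1001)
    print(pick_sibling(alice_make, [sshd, alice_cc]))
```

Leaving the sibling idle when nothing qualifies is where the performance cost comes from, which is the tradeoff against simply disabling Hyper-Threading.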
During the Cryptographer's Panel at the RSA conference, Adi Shamir made a short reference to this vulnerability.
...a presentation would be forthcoming at the Eurocrypt 2005 rump session next week in Denmark.
Yes, we seem to have discovered the problem independently. (Until today I wasn't sure if we had discovered the same problem -- Adi Shamir didn't reply to an email I sent him about this -- but I got an email from Eran Tromer after my paper went online.)
I don't want to pre-release their results, but Shamir, Tromer, and Osvik decided to demonstrate the attack in a somewhat different way. I think it demonstrates how dangerous this attack is that two people independently discovered the attack and came up with different entirely practical targets for it.
Tarsnap: Online backups for the truly paranoid