A New Vulnerability In RSA Cryptography
romiz writes, "Branch Prediction Analysis is a recent attack vector against RSA public-key cryptography on personal computers that relies on timing measurements to get information on the bits in the private key. However, the method is not very practical because it requires many attempts to obtain meaningful information, and the current OpenSSL implementation now includes protections against those attacks. However, German cryptographer Jean-Pierre Seifert has announced a new method called Simple Branch Prediction Analysis that is at the same time much more efficient that the previous ones, only needs a single attempt, successfully bypasses the OpenSSL protections, and should prove harder to avoid without a very large execution penalty." From the article: "The successful extraction of almost all secret key bits by our SBPA attack against an openSSL RSA implementation proves that the often recommended blinding or so called randomization techniques to protect RSA against side-channel attacks are, in the context of SBPA attacks, totally useless." Le Monde interviewed Seifert (in French, but Babelfish works well) and claims that the details of the SBPA attack are being withheld; however, a PDF of the paper is linked from the ePrint abstract.
When i see this kind of news the following question arises: so what are we supposed to do now? Throw away RSA cryptography is not the best answer i think. What do you, fellow /.ers, would do to by pass this problem?
When my Karma level reaches 0 I feel in piece with the Universe
pWn3d!
So it requires a spy proccess to be running on the same processor as the server....
--jeffk++
ipv6 is my vpn
"What do you, fellow /.ers, would do to by pass this problem?
Get rid of the spyware, perhaps?
Just in case it gets Slashdotted.
PDF file
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
If you have a Trojan on your computer you are going to lose your secrets anyway, because, surprise, your private key is probably stored on the disk drive, and you use the keyboard to type passwords, etc. etc. Could someone explain how a local attack can be big news ?
Let me get this straight. To use this attack, you need to be running on the same hardware, but you don't need any particular access beyond that? If that's the case, any multi-site server that allows you to run your own server-side scripting is at risk.
This is not a signature.
I started working on VAX/VMS systems in 1986. I changed jobs to another DEC site after nine months or so. I got an account, put my username in at the appropriate prompt, hit return and then immediately knew that I had entered my old username, not the new one.
I had to think for a bit before I knew the reason: VMS searched SYSUAF.DAT for my username before I entered the password. If it found the username it would stop searching. Later versions did the search after the password had been entered and one type of attack became less likely.
I suppose something like this could be done so that timing can't be used to debug the process runing the algorithm but another way of viewing it is that it is like getting the key from a chip by measuring the amount of power it uses or something. There may be limits to who far protection can go if you have the hardware or are on the same (virtual) machine.
http://michaelsmith.id.au
Just wondering. If it is, then who cares? What sort of moron generates new keys while the system's in a user-exposed untrusted phase?
Oh yeah, the Grid weenies....
This isn't really a flaw in RSA cryptography, but rather the fairly obvious situation that a branch predictor, shared between processes of different privilege levels, can be used as a covert channel and thus can be used to reveal state. The same is true with the cache, for example, and multithreading makes this problem many times worse by increasing the bandwidth of the channel. On architectures which don't have branch predictors, or don't share them, this is not an issue. ARM processors, for example, tend to rely on predication rather than branches (except when running Thumb), and thus don't suffer the same problem.
This class of problems is only going to grow as CPUs become less and less deterministic.
For unmasking the sensationalist headline. This attack only works if you are already logged in to the box and able to run processes on it. Yet from reading the article summary, it looked like I would have to fear for my online banking SSL connections and SSH sessions.
Aciicmez et al are extending an attack they published a few months ago. It's real, but:
It targets RSA implementations, not the algorithm, which is fine
Attackers need to be on the same host as the victim
This specific attack is tuned to the Pentium 4 architecture
This paper doesn't break SSL.
We wrote about the attack two months ago. A quick, dumbed-down recap:
The CPU aggressively caches aspects of what programs do. It doesn't make an exception for RSA. You obviously can't just read key bits out of the cache.
But caches are finite, and way, way too small to accomodate everything every program does. So operations from one program are constantly evicting cached values from other programs. This makes the other program imperceptably but measurably slower. By writing a program that constantly and carefully measures those time differences, you can watch an RSA operation from another program leave footprints through the cache.
There are years-old attacks like this against the L1 and L2 caches, and extensions that use hyperthreading to improve the resolution. Some variants, which measure timing differences but don't track cache footprints, are remote attacks. These aren't. You run a "spy" process on the machine; it repeatedly executes a series of operations and measures timing differences. Aciicmez found an overlooked cache which makes Pentium branch prediction work (the BTB). They published back in August.
From what I can tell, this paper extends the attack; they figured out that the Pentium 4 architecture has two BTB caches, and their original attack wasn't hitting both of them. Their new attack does, and that creates much bigger timing differences, making RSA's footprints much easier to see.
This is really cool stuff, but from where I stand, they hit game-over back in August with the original BTB attack. This paper reads like a refined exploit for the same vulnerability.
Since this is localhost-only, and (unlike Bernstein's and Boneh's attacks) can't be extended remote, it's not going to impact SSL or (single-user) SSH. The classic victim of timing attacks is smart cards. For these attacks, another interesting possibility is DRM; these attacks say you can't trust crypto running on the same Pentium 4+ as an attacker.
...the current OpenSSL implementation now includes protections against those attacks.
This really hits the mark. What if OpenSSL, an arguably widely-used crypto layer, were closed instead of open? According to what I hear on Slashdot, we would have no idea if "ClosedSSL" would have protection against this kind of thing, but because OpenSSL is, uh, open, we can verify that these kinds of attacks are indeed mitigated.
"Beware of he who would deny you access to information, for in his heart he dreams himself your master."
Better than BabelFish I hope.. human made, so prone to errors ;)
====
The confidence users have in Internet and in the capacity of the system to secure data has always been relative. And it could collapse if the microprocessor manufacturers and cryptography software editors were to be unable to cope against a new type of attack, fearsomely efficient, discovered by the team directed by the German cryptographer Jean-Pierre Seifert (universities of Haifa and Innsbruck). Electronic commerce could be threatened, but also, more broadly, everything that enables the dematerialization of exchanges, which rely on asymmetrical cryptography applications, would it be ciphers, digital signatures or message integrity checks.
In the still confidential article, the researcher and his colleagues describe the procedure they used to, gather a nearly entire cipher key of 512 bits (a series of as many of 0s and 1s) in a single attempt, that's to say in a few milliseconds. For comparison, the greatest public key that has been broken so far is 640 bits long, and as announced in November 2005, the process involved the usage of 80 microprocessors running at 2.2 Ghz for 3 months.
Since the announcement made this summer, on the International Association of Cryptology Research (IACR), that such an attack was theoretically feasible, microprocessors producers were on their nerves: the chips of nearly all of the computers, world wide, are vulnerable. So much that the head of Intel security, the number 1 microprocessor manufacturer, when confronted with the issue declared that he would be "unavailable for a few weeks". This is because the usual fix against classical attacks on public key cryptography - to increase the size of the keys - will not work this time.
Jean-Pierre Seifert was in fact able to affect the systems from the ground up. As most of the security relies on the incapacity to mathematically deduce the private key, kept secret, from the public one, he chose to study how the microprocessors was reading these confidential data.
He found out that the mode of operation or the chip itself, optimized for calculation speed, was making it vulnerable. "Security was sacrificed for the sake of performance", estimated the researcher.
The attack principle can be summed up as such: to go faster and faster, the microprocessor parallelizes operations and uses a branch prediction system to predict the result of the current operation. If the prediction is good, the computation time is greatly decreased. If not, the processor must go back and start again the elementary operation. It is "sufficient" to measure the computation time when the processor goes through the line of 0s and 1s that constitute the cipher key to able able to deduce it.
This threat, called "Branch Prediction Analysis" (BPA) was already known. It was thought a lot of attempts was necessary to statistically deduce the cipher key, thus making the attack not-practicable. The technique discovered by Jean-Pierre Seifert make it possible to break the key in a single attempt. It relies on the fact that the prediction process, essential to increase the processor speed, is not protected.
A spyware could then be made to listen to the chip discreetly, and send back the key to hackers, foreign intelligence services or competitors.
"A MATTER OF WEEKS"
We are not yet there though. "We have not made a turn key application that would be available online" argues Jean-Pierre Seifert. But he estimates that once the method is made public, in early 2007 during the next RSA conference - RSA, being one of the most popular ciphers -, the making of such software would be "a matter of weeks".
Cryptography specialists confirm that the threat is serious. One of the best world wide public key experts anonymously sums up the situation: "The real solution is to review the conception of the microprocessors itself - a long and difficult process. A short term solution would be to forbid normal applications to run in para
Yep localhost only... so who is the primary user of localhost public key cryptography techniques?
:)
DRMs! yep yep!...
I would love to see Vista DRMs cracked before the OS even make it to the market...
ok.. i'm probably dreaming, but still it feels good.
Now I have no idea what this means exactly, so I'll let Steve Jobs explain this RSA attack. I called him up and asked him how it worked. Jobs said, "I dunno what it does. It predicts...branches. It's a good thing!"
Viper is the preferred editor of the Emacs operating system.
This seems to rely on spy software watching your particular RSA application decrypt things, and thus said spy software would need to be running on the same hardware. Wouldn't it make sense to start writing RSA implementations that use the computer's GPU whenever possible? Using the GPU as a coprocessor has been discussed on Slashdot previously, and the main disadvantage was that the GPU isn't typically optimized for general purpose computing. However, since most GPU's are optimized for massive number crunching, I would think that RSA would play to its strengths. Obviously, whether or not the GPU is "good" at it, it would still come with the advantage of moving the implementation off the main CPU, and thus foiling any malware. (At least until they design malware that runs on the GPU...)
Formerly GNU/Anonymous Coward. This message has been determined to cause cancer in laboratory animals.
You're thinking too small. The real problem moving forward is virtualization; the industry is converging to a state where hosted servers will transparently share underlying hardware; the new threat model is VM-to-VM, which is why the earlier commenter who dismissed "overly complex local-only problem scenarios" (paraphrase) is off base.
Can the spy process do its job from a different virtual machine? If so, and if virtualization methods can't be tweaked to defeat all such attacks while maintaining reasonable performance, that might spell doom for the virtual server market.
Fuck the system? Nah, you might catch something.
I would love to see Vista DRMs cracked before the OS even make it to the market... :)
No no no! I hate when researchers do that!
Not long ago someone published a way to load unsigned kernal drivers in Vista (Vista's DRM mechanisms criticially rely on owners being strictly forbidden to do that). So what happened? Microsoft patched Vista to prevent that mechanism from working... before the OS even made it to market. So the discovery and all that work went entirely to waste.
No, when someone finds an anti-DRM method the best thing to do is not just wait until release, but wait until the DRM scheme has had time to build up a critical usage base. If you find two or more methods at the same time, then only reveal them one at a time. Wait till they release the inevitable new lockdown patch, then wait maybe a week or two, then reveal the next one. (Revealing the second one on the very day they patch may be more fun and impressive, but I think it much better to drag out the value and effect of the multiple techniques over a period of weeks or months.)
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
From the paper, it sounds as though the attack only relies on the public key. Does this mean that all pgp / gpg secret keys can be compromised this way?
The paper seems to describe a process where only the public key has to be on the "spied upon" machine. Everybody can get access to almost everybody's public pgp key. Does this mean, that all RSA pgp / gpg keys can be compromised using this method, even if one keeps the secret key on a computer with only a floppy drive (to get the data which will be decrypted onto the machine) and no connection to any kind of a network (ideally...)? Is it really this scary, or do they infact need to spy on the machine with the secret key on it?
The whole thing - as critical as it is - spies on the processes running on the machine. It is an indirect attack, checking on the resources used while performing some not broken algorithmic calculations.
When you disable pipelining and cache while doing the calculations, there is not much to spy on and nothing to gain. Just prevent the wanna-be intruder from seeing cache, pipelines, CPU from working makes you safe.
The problem is, that this isn't very practical.
I wished the editors used less misleading headlines. There is no vulnerability in RSA cryptography per se. It is rather that you observe Men At Work and from what you see you can guess what the're doing.
And in principle this applies to any other cryptography just as well. Inclusive DRM (which makes me giggle).
This type of attack has been mentioned before per DJ Bernstein's paper on "Cache-timing Attacks on AES":
p df
p s
http://cr.yp.to/papers.html#cachetiming
PDF format:
http://cr.yp.to/antiforgery/cachetiming-20050414.
PS format:
http://cr.yp.to/antiforgery/cachetiming-20050414.
Hi, This paper is based on the wrong assumption that the algorithm that is used is binary exponentiation. This is false as every single respectable implementation uses N-ary multiplication or sliding windows in the worst case scenario. In both of these cases, the attack as shown in the paper would only be able to predict minimal information. Also, the claimed statement that you can do nothing with this type of attack is completely false (even in the case of binary exponentiation.) Just do this: - If you have a 1, do as usual - If you have a 0, do as in the case you have a 1, but ignore the last value, and use the result of the squaring. Regards, LG
A slight improvement to your idea might be to balance the loop anyway, using D = 1 - di, etc., essentially a vectorized version of figure 4. This would slow it down by a factor of two but it would make it resistant to conventional timing attacks.
We don't see the world as it is, we see it as we are.
-- Anais Nin
get rid of those pesky branches...
Any thoughts, or did I miss somebody's comments?
Somehow, my password had been changed by someone even though I have always login using HTTPS. I wonder why...
Good thinking. That could be a very practical use for this technique - hacking your own machine. I wanted to mod you up, but as the other poster remarked, DRM countermeasures should be secret until the DRM is widely deployed.
Another technique that I'd like to chip in with is the possibility of using differential power analysis on your own hardware. Apparently it is not possible to completely prevent power analysis on a piece of hardware: there are plenty of things that can be tried, such as using balanced data paths and executing extra obfuscating operations, but these just make the attack slower - not impossible. We may end up using power analysis to extract the secret keys from our trusted computing modules in the future. However, this will require sensitive measuring equipment: a software-only attack would be much easier!
Second, these are timing-based attacks that perform branch prediction. This requires no changes to OpenSSL or any other source to completely mask. You first mix the optimization methods when compiling, and then use the real-time scheduler, to ensure that as many branches as possible (preferably all) take the same length of time within the margin of error of any observation. You can't choose between that which you cannot distinguish.
Finally, this is specific to an implementation. If OpenSSL had multiple branches for the same algorithm (just a different implementation which it could select at random for any block of data), or if you had multiple OpenSSL installations which differed sufficiently in implementation that the computer could again select at random, you'd make it a lot harder for an attacker.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
From the linked article:
// Either 0 or -1 // -1 or 0 // Either 0 or R0 // At this point T0 will point to the value to be squared, R0 or R1!
// Now we move the correct values back into R0 & R1
R0 = 1; R1 = M
for i from 0 to n-1 do
if d[i] then
R1 = R0 * R1 mod N
R0 = R0 * R0 mod N
else
R0 = R0 * R1 mod N
R1 = R1 * R1 mod N
return R0
The key-dependent if statement is the key here, if we can remove all such branches, then there's no Branch Target Buffer entry that depends on it, and no timing channel attack either:
R0 = 1; R1 = M;
for (i = 0; i < n; i++) {
mask = 0 - d[i];
nmask = mask ^ -1;
T0 = R0 & mask;
T0 += R1 & nmask;
T1 = R0 * R1 mod N;
T0 = T0 * T0 mod N;
R1 = T1 & mask;
R0 = T0 & mask;
R0 += T1 & nmask;
R1 += T0 & nmask;
}
return R0;
There are at least three interesting issues here:
a) Most modern cpus have hw support for conditional operations, on x86 this is in the form of CMOVcc which is a (constant-time!) conditional move into a register, but as shown above, it really isn't needed here.
b) The perforance impact of the above branch removal can be negative!
On a P4 a branch miss costs about 20 clock cycles, and since a key-dependent branch will miss 50% of the time, the average cost is 10 cycles. My replacement code above takes around 5 cycles or less on any current cpu.
c) A final possible timing-channel attack would be due to the memory alignment of the R0 and R1 values:
By allocating them at the same address modulo the cpu page size, i.e. at 4 KB offset, the cache lines hit will be the same for both.
When I worked on the asm version of DFC, one of the AES also-rans, I removed a similar timing attack from a core 128-bit modular multiplication operation, using very similar techniques.
Terje
"almost all programming can be viewed as an exercise in caching"
This kind of comments are what keeps slashdot above digg and all the rest.
We are Turing O-Machines. The Oracle is out there.
Balancing may not be needed at all with the vectorized version. Certainly the same top level calls are made each time through the loop regardless of the bits in the private key so the need to balance is totally dependent on the implementations of the arithmetic functions, particular the modulus. If the control flow in all these functions is data independent then there would be no need to balance the vectorized version. It will be immune from both branch table and timing attacks (I think). If the control flow in those functions is data dependent then it would probably more efficient to vectorize them rather than balance the top loop.
We don't see the world as it is, we see it as we are.
-- Anais Nin
Widely-respected Australian cryptographer Peter Gutmann offered a concise analysis of the Seifert's achievement on the Cryptography Mailing List yesterday. It offers both detail and useful perspective.
Udhay Shankar N had just summarized the scary rumors about the Seifert's attack:
"... German cryptographer Jean-Pierre Seifert has announced [1]a new method called Simple Branch Prediction Analysis that is at the same time much more efficient that the previous ones, only needs a single attempt, successfully bypasses the OpenSSL protections, and should prove harder to avoid without a very large execution penalty."
Gutmann replied with a magisterial air:
"That's not quite accurate. What it did was succeed against a an old version of OpenSSL that (a) didn't have the protections present yet and (b) had been specially modified to make it vulnerable to the attack. It's a nice attack, but based on what's been published so far the claims of RSA's demise are considerably exaggerated.
"What it does is rely on the fact that on a HT P4, if you saturate the branch target buffer (BTB) from a second thread running in the same pipeline (i.e. on the same HT CPU), you can see when BTB misses occur in the RSA thread and therefore observe whether it's branching on a one or zero bit.
"To do this, they had to use (as mentioned above) a rather old version of OpenSSL that doesn't employ any protection against this type of attack. In addition they reduced the modexp window size from 5 to 1 (to make sure you get a branch for each bit, with the standard window size 5 the branches are replaced by a table lookup), and they disabled the CRT code (to force use of the textbook-mode RSA operation that, in practice, no software implementation ever uses).
"This isn't to say that the paper doesn't point out a potential vulnerability. However, saying "we broke RSA" or "we broke OpenSSL" is pushing things a bit."
Peter.thats me again. i wrote to one author of the paper. this method is known but is slower, he says. true because i always perform all calculation. but i cant believe its 4times slower. for me its 1.5 times slower and 2 times at worst, and prolly not because as terje said it removes pipe flush.