A New Vulnerability In RSA Cryptography

Posted by ryuzaki0 on Saturday November 18, 2006 @09:45AM from the predictions-of-trouble dept.

romiz writes, "Branch Prediction Analysis is a recent attack vector against RSA public-key cryptography on personal computers that relies on timing measurements to get information on the bits in the private key. However, the method is not very practical because it requires many attempts to obtain meaningful information, and the current OpenSSL implementation now includes protections against those attacks. German cryptographer Jean-Pierre Seifert has now announced a new method called Simple Branch Prediction Analysis that is at the same time much more efficient than the previous ones, only needs a single attempt, successfully bypasses the OpenSSL protections, and should prove harder to avoid without a very large execution penalty." From the article: "The successful extraction of almost all secret key bits by our SBPA attack against an openSSL RSA implementation proves that the often recommended blinding or so called randomization techniques to protect RSA against side-channel attacks are, in the context of SBPA attacks, totally useless." Le Monde interviewed Seifert (in French, but Babelfish works well) and claims that the details of the SBPA attack are being withheld; however, a PDF of the paper is linked from the ePrint abstract.

7 of 108 comments

I got a question... by sam0vi · 2006-11-18 09:50 · Score: 4, Interesting

When I see this kind of news, the following question arises: so what are we supposed to do now? Throwing away RSA cryptography is not the best answer, I think. What would you, fellow /.ers, do to bypass this problem?

--
When my Karma level reaches 0 I feel in piece with the Universe
1. Re:I got a question... by maxwell demon · 2006-11-18 11:45 · Score: 4, Interesting
  
  After now having read the complete article: Shouldn't it be possible to eliminate the branches completely?
  The following loop (adapted from fig. 3 in the paper) should IMHO work as well (although less efficiently):
  S = M
  A = M - 1
  for i from 1 to n-1 do
      S = S * S (mod N)
      C = di   /* should be doable without branch by just bit masking and shifting */
      C = C * A
      C = C + 1   /* now if di was 1, C is M, otherwise C is 1 */
      S = S * C (mod N)
  return S
  The only branch here is in the for loop, and that's independent of the key. Unless there are exploitable branches in the multiplication routine, of course.
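
The loop above can be tried out with a minimal Python sketch. The names `modexp_branchless` and `to_bits` are made up for this illustration, the exponent is assumed to be given most-significant-bit first with a leading 1 bit (which is why S starts at M), and Python's big-integer arithmetic is not constant-time, so this only checks the arithmetic, not the side-channel behaviour.

```python
def to_bits(d):
    """Return the bits of a positive integer d, most significant bit first."""
    return [int(b) for b in bin(d)[2:]]


def modexp_branchless(M, d_bits, N):
    """Square-and-multiply with no key-dependent branch.

    d_bits is MSB-first and d_bits[0] is assumed to be 1.
    """
    S = M % N
    A = M - 1
    for bit in d_bits[1:]:
        S = (S * S) % N
        C = bit * A + 1      # C is M when bit is 1, and 1 when bit is 0
        S = (S * C) % N      # always multiply; multiplying by 1 is a no-op
    return S


# Small textbook-sized example (toy modulus, not a secure key size).
result = modexp_branchless(65, to_bits(17), 3233)
```

Comparing `result` against Python's built-in `pow(65, 17, 3233)` is a quick sanity check that the selection trick does not change the result.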
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
Branch predictor as a covert channel by hpa · 2006-11-18 10:16 · Score: 4, Interesting

This isn't really a flaw in RSA cryptography, but rather the fairly obvious situation that a branch predictor, shared between processes of different privilege levels, can be used as a covert channel and thus can be used to reveal state. The same is true with the cache, for example, and multithreading makes this problem many times worse by increasing the bandwidth of the channel. On architectures which don't have branch predictors, or don't share them, this is not an issue. ARM processors, for example, tend to rely on predication rather than branches (except when running Thumb), and thus don't suffer the same problem.

This class of problems is only going to grow as CPUs become less and less deterministic.
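
To make the covert-channel idea concrete, here is a toy Python model, not the method from the paper: a single 2-bit saturating counter stands in for a branch predictor shared by a "victim" and a "spy". The class and function names are invented for this sketch, and a real spy would infer a misprediction from timing rather than being told about it directly.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 0

    def branch(self, taken):
        """Run one branch; return True if it was mispredicted, then update the counter."""
        predicted_taken = self.state >= 2
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)
        return predicted_taken != taken


def spy_on_bit(predictor, secret_bit):
    """Recover one secret bit through the shared predictor state."""
    # Prime: three not-taken branches force state 0, one taken branch moves it to 1.
    for taken in (False, False, False, True):
        predictor.branch(taken)
    # Victim: the direction of its branch depends on the secret bit.
    predictor.branch(bool(secret_bit))
    # Probe: a not-taken branch mispredicts only if the victim pushed the counter up.
    return 1 if predictor.branch(False) else 0


secret = [1, 0, 1, 1, 0, 0, 1]
shared = TwoBitPredictor()
recovered = [spy_on_bit(shared, b) for b in secret]
```

In this model `recovered` ends up equal to `secret`, because the only state the two parties share is the counter, and that is enough to leak one bit per round.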
Re:Unsecure computer - no secrets. Big deal ! by Cid Highwind · 2006-11-18 10:31 · Score: 4, Interesting

Think managed web hosting companies that put dozens of virtual hosts on a single physical server. If this really works from an unprivileged account, one malicious user could steal SSL keys from all the rest.

--
0 1 - just my two bits
Re:Not so bad... by SnowZero · 2006-11-18 10:56 · Score: 5, Interesting

It gets better. The attack requires that the two processes be running on the same core with hyperthreading enabled (i.e. ALU-poor CMP). The "spy" process will be sucking up 100% CPU pretty much continuously. They also simplified the multiplication routine from OpenSSL. Even if you are running such a setup on a P4 with HT turned on (even though it's often useless), and you need to run secure processes alongside insecure ones (generally not a good idea anyway), patches already exist for Linux and the BSDs to address this. The patches modify the scheduler to prevent processes from different users from running on the same physical core. A half-hearted attempt is made in the paper to say that these attacks generalize to something remote, but no details are given as to how the attack would compensate for the 100,000-fold decrease in timing accuracy to pull it off even over a LAN.

Essentially they took a very impractical attack with an unlikely scenario, and created a somewhat practical attack with an unlikely scenario. Avoid the problem scenario which was raised in the prior work last year, and you are still golden.
Great idea! by DrJimbo · 2006-11-18 16:57 · Score: 4, Interesting

That is a clever vectorization of the square-multiply loop. It sure looks to me like it would work (I used RSA encryption as the final project in a University assembly language class I taught). The slight decrease in efficiency of your routine will not be noticed. The timing of the entire process is totally dominated by the N-byte x N-byte multiplications. An extra N-byte x 1-byte multiplication will cause less than a 1% slowdown, probably much less.

A slight improvement to your idea might be to balance the loop anyway, using D = 1 - di, etc., essentially a vectorized version of figure 4. This would slow it down by a factor of two but it would make it resistant to conventional timing attacks.

--
We don't see the world as it is, we see it as we are.
-- Anais Nin
Solution to: Branch predictor as a covert channel by Terje Mathisen · 2006-11-19 06:35 · Score: 3, Interesting

From the linked article:

R0 = 1; R1 = M
for i from 0 to n-1 do
    if d[i] = 0 then
        R1 = R0 * R1 mod N
        R0 = R0 * R0 mod N
    else
        R0 = R0 * R1 mod N
        R1 = R1 * R1 mod N
return R0

The key-dependent if statement is the crux here: if we can remove all such branches, then there's no Branch Target Buffer entry that depends on it, and no timing channel attack either:

R0 = 1; R1 = M;
for (i = 0; i < n; i++) {
    mask = d[i] - 1;    // Either -1 (d[i] is 0) or 0 (d[i] is 1)
    nmask = mask ^ -1;  // 0 or -1
    T0 = R0 & mask;     // Either R0 or 0
    T0 += R1 & nmask;   // At this point T0 will point to the value to be squared, R0 or R1!

    T1 = R0 * R1 mod N;
    T0 = T0 * T0 mod N;
    // Now we move the correct values back into R0 & R1
    R1 = T1 & mask;
    R0 = T0 & mask;
    R0 += T1 & nmask;
    R1 += T0 & nmask;
}
return R0;
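
As a rough check of the masking idea, here is a Python sketch of a branch-free ladder. It assumes the conventional Montgomery ladder (a 0 bit squares R0, a 1 bit squares R1) with the exponent scanned most-significant-bit first. The names `ladder_branchless` and `to_bits` are invented for this example, and Python integers are not constant-time, so it demonstrates the selection logic only.

```python
def to_bits(d):
    """Return the bits of a non-negative integer d, most significant bit first."""
    return [int(b) for b in bin(d)[2:]]


def ladder_branchless(M, d_bits, N):
    """Montgomery ladder where the key bit selects operands via masks, not branches."""
    R0, R1 = 1 % N, M % N
    for bit in d_bits:
        mask = bit - 1    # -1 (all ones) when bit is 0, otherwise 0
        nmask = ~mask     # 0 when bit is 0, otherwise -1
        # Pick the value to be squared: R0 for a 0 bit, R1 for a 1 bit.
        T0 = (R0 & mask) + (R1 & nmask)
        T1 = (R0 * R1) % N
        T0 = (T0 * T0) % N
        # Move the results back: the squared value returns to where it came from.
        R0 = (T0 & mask) + (T1 & nmask)
        R1 = (T1 & mask) + (T0 & nmask)
    return R0


# Toy modulus for illustration only.
result = ladder_branchless(65, to_bits(17), 3233)
```

Both multiplications run on every iteration regardless of the bit, so the sequence of operations is the same for every key. In a real multi-precision implementation the masks would be applied word by word or to pointers.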

There are at least three interesting issues here:

a) Most modern cpus have hw support for conditional operations; on x86 this is in the form of CMOVcc, which is a (constant-time!) conditional move into a register. As shown above, though, it really isn't needed here.

b) The performance cost of the above branch removal can be negative, i.e. the branch-free code can actually be faster!
On a P4 a branch miss costs about 20 clock cycles, and since a key-dependent branch will miss 50% of the time, the average cost is 10 cycles. My replacement code above takes around 5 cycles or less on any current cpu.

c) A final possible timing-channel attack would be due to the memory alignment of the R0 and R1 values:
If they are allocated at the same address modulo the cpu page size, i.e. at a 4 KB offset from each other, the cache lines hit will be the same for both.
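
Point (c) can be illustrated with a small Python calculation. The cache geometry here (64-byte lines, 64 sets, so 4096 bytes per way) and the addresses are assumptions chosen for the example, not measurements of any particular CPU.

```python
LINE_SIZE = 64     # bytes per cache line (assumed)
NUM_SETS = 64      # number of cache sets (assumed; 64 * 64 = 4096 bytes per way)
PAGE_SIZE = 4096   # 4 KB page


def cache_set(addr):
    """Index of the cache set that a byte address maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


r0_addr = 0x10000040            # hypothetical address of R0
r1_addr = r0_addr + PAGE_SIZE   # R1 placed exactly one page later
operand_size = 128              # bytes per operand, e.g. a 1024-bit value

sets_r0 = [cache_set(r0_addr + off) for off in range(0, operand_size, LINE_SIZE)]
sets_r1 = [cache_set(r1_addr + off) for off in range(0, operand_size, LINE_SIZE)]
```

With this layout `sets_r0` and `sets_r1` are identical, so an observer who can only see which cache sets were touched cannot tell whether R0 or R1 was accessed.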

When I worked on the asm version of DFC, one of the AES also-rans, I removed a similar timing attack from a core 128-bit modular multiplication operation, using very similar techniques.

Terje

--
"almost all programming can be viewed as an exercise in caching"