Generating Fast MD5 Collisions With ATI Video Cards
An anonymous reader writes "Yesterday at Black Hat USA 2009, a talk entitled MD5 Chosen-Prefix Collisions on GPUs (whitepaper) (Both PDFs) presented an implementation written in assembly language for ATI video cards that achieves 1.6 billion MD5 hash/sec, or 2.2 billion MD5 hash/sec with reversing, on an ATI Radeon HD 4850 X2. This is faster than the much-publicized 1.4-1.9 billion hash/sec figure that was supposedly reached on a PlayStation 3 by Nick Breese at Black Hat Europe 2008 (he later noticed an error in his benchmarking tool). Compared to the cluster of 215 PlayStation 3s that was used to create a rogue CA in December 2008, Marc Bevand claimed a cluster of 12 machines with 24 video cards would be a bit faster, consume 5 times less power, and be 10 times cheaper."
It would be harder than you seem to think. It's not just any old fake cert they created. They created a CA certificate. That is, a certificate that can be used to issue other certificates. You can issue as many of these "other" certificates as you want and they will look legitimate.
It's very rare for a real CA to issue a certificate like that. That is the "top of the food chain" in certificates so to speak. You would have to bribe a fairly high level employee to get something like that. They keep those high level keys very well protected and there are only a few people that even have access to them.
The attack that is mentioned in the story, the creation of the rogue CA certificate, is an example of a successful MD5 collision attack with a practical application. The "random" garbage was inserted in a part of the certificate signing request which is opaque to the certificate authority. So collision attacks are actually dangerous in their own right; it is not only pre-image attacks you have to worry about.
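To make the collision vs. pre-image distinction concrete, here is a minimal sketch of a plain birthday search. It is not the chosen-prefix technique from the talk: it deliberately truncates MD5 to 3 bytes (my assumption, purely so it finishes in a moment), and the names `truncated_md5` and `find_collision` are made up for illustration. The point is only that a collision search gets to pick both messages, which is far easier than matching one fixed digest.

```python
import hashlib
from itertools import count

def truncated_md5(data: bytes, nbytes: int = 3) -> bytes:
    """Return only the first nbytes of the MD5 digest (a deliberately weakened hash)."""
    return hashlib.md5(data).digest()[:nbytes]

def find_collision(nbytes: int = 3):
    """Birthday search: hash distinct messages until two share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    for i in count():
        msg = b"msg-%d" % i
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            # Two different messages, same (truncated) digest: a collision.
            return seen[tag], msg
        seen[tag] = msg

a, b = find_collision()
print(a, b, truncated_md5(a).hex())
```

With a 24-bit digest this should need on the order of a few thousand hashes, roughly the square root of the 2^24 possible values, whereas hitting one specific 24-bit value would take millions of tries on average.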
ATI cards are programmable. The problem is that Brook+ is a little too high level for writing simple computational kernels (you drop too much performance) and CAL is too low level for most people (it's basically assembly). So generally people just stick to CUDA, even in the few cases where ATI's architecture is superior.
This problem is ideal for ATI: very little input is necessary (NVIDIA has more texture samplers) and no inter-thread communication is necessary (ATI does not have random writes on its local data share at the moment, making that communication harder than it is with NVIDIA). So basically it just comes down to FLOPS, and ATI wins big there.
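The "no inter-thread communication" part can be sketched like this. It is only a rough illustration of the structure, with CPU threads standing in for GPU threads, so it says nothing about speed. The `candidate-N` message format and the function names are hypothetical. Each worker gets its own slice of the search space and never touches another worker's data.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def search_slice(target_hex, start, stop):
    """Hash every candidate in [start, stop); return the matching index or None."""
    for i in range(start, stop):
        if hashlib.md5(b"candidate-%d" % i).hexdigest() == target_hex:
            return i
    return None

def parallel_search(target_hex, total, workers=4):
    """Split [0, total) into independent slices, one per worker, with no shared state."""
    step = (total + workers - 1) // workers
    slices = [(s, min(s + step, total)) for s in range(0, total, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda bounds: search_slice(target_hex, *bounds), slices)
        hits = [r for r in results if r is not None]
    return hits[0] if hits else None

target = hashlib.md5(b"candidate-1234").hexdigest()
print(parallel_search(target, 4000))  # prints 1234
```

The only data each worker needs is the target digest and its slice bounds, which is the "very little input" property as well. Nothing is exchanged until the results are collected at the end.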
Basically this was done in CAL because it was done by a hacker and not by an academic researcher (who doesn't really care about performance if he can just as easily get his paper published on a slower GPU with less effort, easier in fact since editors know CUDA).