SHA-1 Cracking On A Budget
cloude-pottier writes "An enterprising individual went on eBay and found boards with more than half a dozen Virtex II Pro FPGAs, nursed them back to life and build a SHA-1 cracker with two of the boards. This is an excellent example of recycling, as these were originally a part of a Thompson Grass Valley HDTV broadcast system. As a part of the project, the creator wrote tools designed to graph the relationships between components. He also used JTAG to make reverse engineering the organization of the FPGAs on the board more apparent. More details can be seen on the actual project page."
This makes them fairly pointless for general computing, but when you need to crunch a bunch of numbers in the same way over and over, they can REALLY outperform a general cpu. Usually these are used to manipulate audio / video data streams in real time (the original purpose for the FPGAs used in this project) - but recently people have started using them to brute-force try to crack an encryption scheme. Where a general purpose cpu might take upwards of 40 clock cycles to check one possible answer, each of the FPGAs in this system can check at least one answer PER clock cycle.
This guy pulled a bunch of FPGA systems out of some (defective?) HDTV video processing systems - reverse engineered exactly how everything was wired together, reprogrammed the FPGAs to do SHA-1 hash cracking rather than HDTV video processing, and added some usb control circuitry so the system could take commands from / return results to a pc.
One could use this same board setup to do any sort of massively parallel data processing, but right now the system isn't wired up to really feed large amounts of data into / out of the system in real time. He can get away with that as hash cracking results are fairly small and infrequent, so the limited means he has for getting "answers" out of the system isn't too much of a problem.
Posted at 4:39AM on Sep 1st 2007 by smilr HTH.
I'm guessing it's still not practical to search the entire sha1 keyspace, even with a bunch of these things. That begs the question, why not a widely used digest with a known collision weakness (ie md5)?
Cool project though.
Anybody, have an idea how fast that is compared to modern a CPU?
IIRC, the last time I did anything like this it took my 2200+ AMD about 24 hours to do a 6-character keyspace (from 64-character set) - with MD5.
TC - My Photos..
That's nice, his own SHA-1 cracker. But, even with advanced cryptographic attacks, SHA-1 is still in the order of 2^63. Not something you would like to try with just a few FPGA's. What is meant here is a cracker to find out which plain text, with limited entropy, is used to create a certain hash value. A SHA-1-based password cracker would therefore be a better name, I suppose.
It seems from here that it searches a 64 ^ 8 = (2 ^ 6) ^ 8 = 2 ^ 48 keyspace in 24 hours. No small feat, it should therefore do about 3,257,812,230 hashes in a second. It does 800 concurrently, which makes for 4 million a second per SHA-1 unit. Ouch, that's really fast.
Note that this could be done with any hash or symmetric algorithm, as long as it can be implemented on FPGA. So the moral of the story: use very long password (or even better, pass phrases), or make sure that they won't be able to acquire the hash.
But I have no idea what that summary or TFA are about.
Cool diagram.
I used to draw patterns like that while suffering through triple Maths - draw a circle, mark off every 10 degrees (or 5 if it looked like being really boring today), then join every point to every other point. Mindless, yet strangely satisfying.
And what kind of nerd site doesn't let you use a degree symbol?
For the record the company is Thomson and that is a peice of equipment known as the Princeton Engine used by the IC developers to quickly verify their software/algorythms. It was lying around in our computer room (known as the Princeton Engine room) for years. Its replacement is from Cadence and is called Palladium and has the power of several hundred of those old fpga boards.
He's not looking for collisions - he's looking for preimages of a given hash. Since he can't search a large enough space to find a preimage of an arbitrary hash, the most useful application of this sort of thing is password cracking - given the hash of someone's password, search the space of plausible passwords until you find one that matches the hash (taking salt into account as appropriate). Fun but not too advanced.
Shame - what I was really hoping to read was that he'd implemented the latest collision-finding attacks on SHA-1 on FPGAs. It won't be long before we have our first real-live SHA-1 collison, and it'll be interesting to see whether it's done with special hardware like this, general purpose processors, or perhaps something curious like PS3s or video hardware.
Xenu loves you!
An FPGA really is conceptually very simple, and they're not hard to "program" either... Contrived example:
Verilog design to add/subtract 2 numbers (you'd never do this, but...)
Compare that to a K&R "C" routine to do the same thing...
In both cases, of course, you'd just use the 'if...else...' part, but I wanted to show more language structure...
:-), you don't get world-beating overnight, but it's relatively easy to get the 80% solution, and that might be just fine. Eking out the last 20% is where it gets hard, as you have to understand the internal structure of the LUTs, and how they interact with the carry-chain, what the LUT->LUT delay can be useful for etc. None of this is at all relevant unless you're missing your timing on a critical circuit (eg: you need 133MHz so your DDR2 SDRAM can work, but the synthesis tools (equivalent to a compiler) only deliver 125 MHz for your design).
The key thing to remember is that in C, all things happen serially, unless you arrange otherwise with threading libraries. In Verilog, any block beginning with 'always @' happens in parallel with every other 'always @' block. Once you've mentally-mapped the concept of vast numbers of threads to the way hardware works, any competent multi-threaded programmer can become a competent hardware engineer.
Of course, there's "guru stuff" in there as well (as much as you want, trust me
The 'always@' part is the hint of just where the power lies. *Everything* can happen in parallel, so you can build pipelines (like CPU's are pipelined today) into your logic, thus reducing the time taken per step (while taking multiple steps), thus increasing your clock rate. The benefit of course is that although the *first* result comes out in the same time, every clock thereafter, you'll also get a result.
I wrote a JPEG decoder a couple of years or so ago, running at ~130MHz. That doesn't sound much, but that comes to ~65 MPixels/second because of the pipelining. Looking at the SSE-optimised intel libraries, a CMYK422->CMYK baseline decode (which is what the FPGA was doing) takes 371 clocks/pixel. The intel chip I was comparing to was a 3.6GHz P4, meaning it could do ~9.7 Mpixels/second. For motion-jpeg that's the difference between decoding SD frames (for the P4) and decoding HD frames (for the FPGA)...
So, FPGAs tend to run slowly (relative to today's CPUs) but can exploit parallelism in ways CPUs just can't, but of course for serial processing, you can't beat a tradition
Physicists get Hadrons!
This doesn't find hash collisions (much less pre-images), so it's not an attack on SHA-1. It's an exhaustive search over a subset of ASCII (64 characters) for the purpose of cracking short (8-character) passwords.
This is John the Ripper in hardware, just not as clever.
hmm... you seem to know a lot about FPGAs, so I'll ask you something I've been wondering for a while...
Coming from a traditional software end of things, I'm used to seeing "accelerating co-processors" available to do useful tasks much faster than the main CPU. I'm thinking not only the FPU (when it was a separate chip), but things like a modern GPU and such. Many of these have been slowly integrated back into the CPU as time has gone on, the FPU being the best example, so now it's something you can just call on with a special instruction.
From my understanding, FPGAs are mainly all generic logic blocks, arranged in fancy ways, and therefor are rather "generic" like the general CPU - you have to implement any fancy processing yourself.
My question is has anybody thought about putting fancy co-processing hardware local to the FPGA? I'm thinking some built-in FFT units and such that you could just include anywhere in your pipeline would be really useful, and might help that "timing critical" areas by having some common "higher level" functions computed in full hardware (ASIC?) speed.
Like, as a programmer, it sounds like it'd be cool to be able to buy an FPGA with built-in FFT, CRC/MD5/etc, maybe part of some encryption routines, etc to work with as common things that need to be accelerated.
Is this sane? Does it already exist and I just don't know about it? Is it totally incompatible with how FPGAs work?
Ce n'est pas une signature automatique.
"+4, Interesting"? This is obviously a joke. And a pretty terrible one at that.
Sheesh.
So is this being used to create huge rainbow tables? How many characters out can you go? I saw someone selling rainbow tables for SHA-1 out to 15 characters at DefCon on 2 DVD's for like 10$...
Horns are really just a broken halo.
This is a really neat project. These FPGAs aren't the latest and greatest, but when you've got 15 of them, parallelism is your friend.
Outside of my job, I've had some ideas for FPGA projects, but the (mostly) BGA packages result in very expensive multilayer boards; too expensive for hobbyist purposes. You'd also have to contract out the BGA assembly, and depending on the FPGA chosen, spend a lot on the FPGA itself. (The smaller FPGAs are available in hand-solderable flat-packs, but the logic densities are typically insufficient. At least we can get small FPGAs from Digikey for $10 and up.) An obvious workaround is to buy a dev board, but they're still relatively expensive, and you've got IO hardwired to the various peripherals they stick on the dev board.
This SHA-1 project cleverly reuses hardware that someone else devoted a lot of time and money to.