Finding MD5 Collisions With Chinese Lottery
Stanislav Shalunov writes "Jean-Luc Cooke posted a Usenet article describing a distributed webpage-based effort (Chinese Lottery) to find a collision in the MD5 function. All you need to do to participate in the effort is visit the URL that loads the code. The author comments: 'What is interesting about this approach - when we reach final release stage - is that any website that adds this small snippet of code to their pages will have their visitors working on the problem for the duration of their visit to the site'."
Just embed the applet into your HTML, view the source of that page - you'll get it.
That's a really interesting way of doing it. For the people who don't know, here's a quick explanation:
Java Applets, because of the sandbox they're run in, can't open up a network connection to any website, except for the websie they came from. Presumably, what they're doing is creating a small Java applet, that when loaded, executes some logic, then opens up a network connection back home and sends the results.
Fascinating. This way, you don't have to bother installing something and hope it doesn't fsck up your computer. It might be slightly less efficient than a dedicated, installed program, but this way, they can harness the power of a computer just casually browsing a web page. Very innovative.
It certainly isn't using very many cpu cycles, the OS reports that my webbrowser is using less than 1% of the available cpu power
Java applets run as a different process to the browser, and it can (and very likely does) create a new thread, and set its priority to low.
Wow, I should not post when knackered.
Here's the code:
:P
.html files through PHP, 'cause he's got a PHP header that isn't being sent - oh yeah and better html please.
<!-- try IFRAME, else use LAYER -->
<IFRAME SRC="http://www.jlcooke.ca/psearch/dmd5l.html" SCROLLING="NO" FRAMEBORDER="0" WIDTH="100" HEIGHT="32">
<LAYER SRC="http://www.jlcooke.ca/psearch/dmd5l.html" WIDTH="100" HEIGHT="32" CLIP="0,0,100,32"></LAYER>
</IFRAME>
It' s making an iframe that loads the applet, and just does its own thing - by loading in the iframe it can call back to their host, rather than yours
Someone should let him know that he needs to make his server parse
Yeah, but do we all run Java enabled browsers? (lynx, links, etc)
..and I have dialup.
I'm running No-Java-Opera right now:because the java enabled opera was 11 more megs..
Point is, geeky as we are, we're probably all expirementing with stuff.
NOT LIKE THAT YOU PERVERTS!!/
"The most looniest, zaniest, spontaneous, sporadic Impulsive thinker, compulsive drinker, addict"
Dude. Do you want to know the tax on your server? 3 lines of simple HTML. That doesn't sound like much of an extra complication, or CPU usage. Even the tiny applet is loaded off Their Server, meaning you do nearly no work to help these guys. You can debate the ethics, sure, but saying this is a mistake because of server issues is wrong.
SAILING MISHAP
It's not JavaScript, it's Java. Despite the names, they're vastly different.
It's really too early for Slashdot readers to try to run that code. As the usenet post said, it's alpha test. I'd actually call it pre-alpha. The usenet sci.crypt discussion is about ways to change the design so it can be hosted on multiple sites at the same time. Really, it would have been a lot better to wait for the author to make an announcement, before linking an ongoing discussion about a work in progress to the front page of Slashdot as if the code was ready for prime time. Ow!
MD5 is a hashing algorithm. It will take an input of theoretically any size and create a 16 byte number that maps to this string. Most security algorithms use MD5 (or SHA-1 or some other hashing algorithm) to verify that the plaintext or cryptotext has not been altered during transit.
Obviously, since a string can be an almost infinite length, there has *got* to be collisions somewhere, but so far, no one has found any.
Realize that 16 bytes = 128 bits = 3.40282367e38 different outputs of MD5. Given that the half-life of a proton is 10e31 years, you need to do about 1 per second before half of the universe ends for good. Or, if you want to finish it in 100 years, you would need to 10e20 per second.
You better start some time soon!
This is just ONE MORE REASON YOU SHOULD DISABLE JAVASCRIPT.
.sig (WARNING: Sig link is not FRIGGIN SAFE for work, home, or anywhere else).
Why is it when I say this stuff, nobody believes me?
If that's not enough, check-out my
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
It can't be reversed. That's the point of MD5.
However, it is trivial to prove the fact that there are strings that have the same MD5 hash due to the fact that you can't represent 2^65 different numbers with only 2^64 keys.
-twb
The chance of an MD5 collision if MD5 were an ideal hashing algorithm is astronomically small. To get a 1% chance of a collision you'd have to test on the order of 2^63 samples (for the math behind this google for the birthday paradox; it's of the order of the sqrt of the size of the hash space) to find two that match. Never mind finding an MD5 which matches a chosen hash value.
This is a really big number.
Nobody's really concerned about MD5 hash collisions of reasonable corpii (corpuses?, forgive my pseudo-latin) if MD5 is actually a perfect hash, or somewhat close to it. What people are really concerned about is there being some weakness in MD5 where you can reverse the algorithm and given some MD5 hash (maybe not any hash, maybe just certain ones) and come up with strings which hash to that value.
For example, suppose that 2^127-1 is prime (it may well be but I'm too lazy to check). Then if you start pulling out random strings foo and using the remainder of foo mod 2^127-1 as your hash you'll also have a 1% chance of a random collision with a sample size of the order of roughly 2^63, as above. However there are some trivial collisions you can calculate, like 0*(2^127-1), 1*(2^127-1), 2*(2^127-1) all hash to the same value.
If the data you're feeding your hash algorithm is random (more or less) there's no reason to prefer the modulus algorithm over MD5. But if you're using it for cryptographic things the modulus algorithm is pretty useless, and it may turn out to fall down on many common inputs that MD5 gives good results for.
I may have goofed some of this, and there's lots more to be said about it but I've wasted enough time on this post as it is.
Actually, I think in the "Chinese Lottery" scenario, there is one string/hash pair that is chosen, and all the clients try other combinations of strings. Whoever gets the same hash will "win" the lottery. Thus, the web site wouldn't have to store anything except the returned plaintext that hashed to the same MD5 value.
I think the original "Chinese Lottery" scenario was if everyone one in China had a radio that was set to do encryption, and the Chinese government broadcasted a particular ciphertext that it wanted to encrypt, every radio would do the decryption using different strings until one of them got the answer. I think it would be under the guise of a lottery, so whichever citizen came back with the winning radio would receive a prize, and the Chinese government would have their cracked ciphertext.
You're basically correct. Theoretically many different inputs have the same md5 hash. However, the chances of finding two such inputs are very small. There is no real practical value to finding such a collision, other than to give a rough idea of what it takes computationally to find one. Since md5 is used to check the integrity of files like linux isos, it is important to know how secure the algorithm is.
It is a bit like SETI@home, It is very likely that we're not alone in the universe, but until we have empirical proof that we're not, nobody is truly satisfied.
Besides, if this was of true significance for national safety, funding would be found to run this on dedicated machines.
YOU HAVE BEEN WARNED
Visit CryptoGnome in his home.
a collision in MD5's transform was found. But not on the whole hash.
h tm l
Difference? The md5() function includes padding. The md5_compress() collision is cited here:
http://citeseer.nj.nec.com/denboer93collisions.
read the sci.crypt post, I site a paper from van oorschot from 1994 describing exactly how to get MD5 collision. In today dollars/moores law, it would cost $100,000....anyone with good credit can find collisions in MD5.
Read van oorschot's paper cited in my sci.crypt post. You'll start gettign mad at VeriSign, Amazon, SourceForge, et al for using MD5.
No respectable cryptographer uses MD5 for signatures anymore, they havn't for years - the industry hasn't caught up yet (TripWire, VeriSign, .rpm, .deb, md5sum, some PRNGs, etc)
/. posters... :) )
This is the essance of why I'm doing this.
Look around for evidance of this movment in crypto circles (ie don't listen to