Finding MD5 Collisions With Chinese Lottery
Stanislav Shalunov writes "Jean-Luc Cooke posted a Usenet article describing a distributed webpage-based effort (Chinese Lottery) to find a collision in the MD5 function. All you need to do to participate in the effort is visit the URL that loads the code. The author comments: 'What is interesting about this approach - when we reach final release stage - is that any website that adds this small snippet of code to their pages will have their visitors working on the problem for the duration of their visit to the site'."
From the link:
;)
You run an Applet, it reports to us the search results. Distributed computing without installing anything...and without people knowing you're stealing their idle CPU time.
I don't know about you but I wouldn't lean out the window with the fact that I'm stealing from others.
Idle CPU time might be unused but I still want to know what my box is doing and why.
Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
Perhaps we could tie this to some sort of micropayment system. You come do distributed work on my website, and you get to view it. Some third party pays me for the cycles, and I have a new revenue stream!
Last time I looked into this, which was several years ago, there were no known different strings which had the same MD5 hash. I thought this was remarkable. Are there any known ones today?
Just embed the applet into your HTML, view the source of that page - you'll get it.
That's a really interesting way of doing it. For the people who don't know, here's a quick explanation:
Java Applets, because of the sandbox they're run in, can't open up a network connection to any website, except for the websie they came from. Presumably, what they're doing is creating a small Java applet, that when loaded, executes some logic, then opens up a network connection back home and sends the results.
Fascinating. This way, you don't have to bother installing something and hope it doesn't fsck up your computer. It might be slightly less efficient than a dedicated, installed program, but this way, they can harness the power of a computer just casually browsing a web page. Very innovative.
Make sure to take out the warning message "ok fine then, you don't want cookies..." that pops up when you disallow it yer cookies (buy yer own thx!). This was surely a debug message, it's not useful anymore ;)
First thing it does when the applet loaded was to bitch at me for not accepting cookies. Just like my wife.
I respect the effort and ingenuity, but the rationale that "hey, we're helping solve a problem" somehow justifies stealing someone else's resources... it's just wrong.
Be upfront with people - tell them why it's so important, what can be accomplished with it, and what it does. You'd be surprised - people might help out of *gasp* the goodness of their own hearts. A good example might be SETI, etc.
It certainly isn't using very many cpu cycles, the OS reports that my webbrowser is using less than 1% of the available cpu power
put the snippet on slashdot.org. The collisions should all be found within an hour.
Yep, I never spell check.
More incorrect spellings can be found he
Have you ever tried even using a dedicated renderfarm? The complications that can arise if you don't have all the textures and files locally, not to mention the fact that rendering is so heavy a tax on the CPU people would NEVER want to do it. Plus, that would involve them releasing files that go into making the movie. And so on and so forth, The idea is so terrible I couldn't imagine anyone ever trying it. Peace out and try to talk about something you konw for once.
YOU SUCK BALLS!
Interesting idea, but most distributed computing tasks that run in the background run at low priority. Since this is running inside your browser (more or less) it will run at the priority of the browser. Unless your browser is running at low priority then this process will push all the lower priority processes out of process cycles.
This could prevent contact with ET!
"Anything is possible with enough programmers, time and pizza." (Substitute caffeine for time as needed.)
No, it would take too long just to upload the scene data to the client, let alone render anything useful within the average person's attention span.
It's about time that the monster (us) is used for good and not evil.
Oooh! I thought of another way...
Just Click here.
-P
I nearly got suspended from school because I installed seti@home on all the machines. With this, I can still maintain my EVIL distributed computing campaign, and do it without them knowing!
And why did you staple the trout to the RAM?
Here's the code:
:P
.html files through PHP, 'cause he's got a PHP header that isn't being sent - oh yeah and better html please.
<!-- try IFRAME, else use LAYER -->
<IFRAME SRC="http://www.jlcooke.ca/psearch/dmd5l.html" SCROLLING="NO" FRAMEBORDER="0" WIDTH="100" HEIGHT="32">
<LAYER SRC="http://www.jlcooke.ca/psearch/dmd5l.html" WIDTH="100" HEIGHT="32" CLIP="0,0,100,32"></LAYER>
</IFRAME>
It' s making an iframe that loads the applet, and just does its own thing - by loading in the iframe it can call back to their host, rather than yours
Someone should let him know that he needs to make his server parse
Let's put the research effort asside here and thing about the underlying concept here... basically, this is a distributed computing app being buried within webpages. Could commercial interests use this concept to get access to computing resources from their web users without telling them?
1. Create very small website with CPU draining applet and post a link to said website to Slashdot.
2. ??
3. Profit!
What's this Dotslash you talk about?
Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
I believe the term was parasitic computing. Ideally the web master makes visitors aware to what's going on. You're using visitors' computing power to accomplish a neat sort of distributed computing. Great idea, if you're not just stealing resources
As someone who intentionally runs a low-performance box as a primary system (VIA Epia 533) I'd be pretty unhappy with some snarfing up a few cycles. Junked-up web sites with flash and excessive java/javascript are REALLY noticable when you're browsing at the low end of the power curve.
I run a cpu monitor in the background and when a site wants to run one of the more annoying classes of advertisements, utilization usually pegs... I can't imagine what something that intentionally sucked cycles would do.
Yeah, but do we all run Java enabled browsers? (lynx, links, etc)
..and I have dialup.
I'm running No-Java-Opera right now:because the java enabled opera was 11 more megs..
Point is, geeky as we are, we're probably all expirementing with stuff.
NOT LIKE THAT YOU PERVERTS!!/
"The most looniest, zaniest, spontaneous, sporadic Impulsive thinker, compulsive drinker, addict"
Dude. Do you want to know the tax on your server? 3 lines of simple HTML. That doesn't sound like much of an extra complication, or CPU usage. Even the tiny applet is loaded off Their Server, meaning you do nearly no work to help these guys. You can debate the ethics, sure, but saying this is a mistake because of server issues is wrong.
SAILING MISHAP
I wonder if the good slashdot people would be willing to make this into a slashbox ?
Basically, in a world where everything was based on a thumbprint, would you want even the smallest chance, no matter how statistically unlikely, that someone else had the same thumbprint as you?
If two strings produce the same md5 hash, the universe ends. This project should probably be stopped.
..some. You use bandwidth for data throughput, you have the CPU usage..
All on the server side. Yes, the clients are the ones doing the Real Work, but you have to do something with the result of that work. And its the Doing that taxes your servers, if only a little bit.
"The most looniest, zaniest, spontaneous, sporadic Impulsive thinker, compulsive drinker, addict"
OK, so an evil webmister makes a pop-under containing this kind of code and puts it up when you visit his porn site (optionally by mistyping "google" in your address bar.)
Heck, (google|SlashDot|your legitimate business) just has a tiny inset on their page: "This box is using your spare CPU cycles to help us pay for this site or service. Subscribers do not see this box. Click here to subscribe."
It could work.
In the popunder case it is vile and abusive. In the legitimite and well advertised case it is totally fair.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
It's really too early for Slashdot readers to try to run that code. As the usenet post said, it's alpha test. I'd actually call it pre-alpha. The usenet sci.crypt discussion is about ways to change the design so it can be hosted on multiple sites at the same time. Really, it would have been a lot better to wait for the author to make an announcement, before linking an ongoing discussion about a work in progress to the front page of Slashdot as if the code was ready for prime time. Ow!
With this being posted here someone with more knowledge of java than me is going to have the idea to give back false results. That is the reason for an install, to give the project mamgers control.
I bet that sometime son they are going to be finding lots of collisions, all results from the same IP.
Hope they have some sort of filter.
md5sum
d41d8cd98f00b204e9800998ecf8427e
It crashes Safari. Now, admittedly, I don't know whether this is a Safari bug, a Java bug, a bug in the applet, or some combination thereof, but here's what happens to me:
I load the thing in its own tab, have a look, look at the neat code that loads an IFRAME, etc. Ho-hum, nice idea, let's see where it goes, cmd-W to close the tab. Whups! The entire browser window closed, including all the tabs which I hadn't got around to checking yet! Safari is still running in the foreground, but I just lost its window.
Anyone interested enough to debug this? I'm not =P
political_news.c: warning: comparison is always true due to limited range of data type
Not that I mind technology, and new tricks.
But the last thing I want to see is every website hogging my CPU. Either selling computing power of their web visitors for profit, or using it for themselves.
Imagine the next series of Spyware Trojans... rather than spy, they harness your CPU and sell the power. All without the knowlege of the computer owner.
Interesting business model, but not something I want to see. I like my CPU. Note the word "my".
MD5 is a hashing algorithm. It will take an input of theoretically any size and create a 16 byte number that maps to this string. Most security algorithms use MD5 (or SHA-1 or some other hashing algorithm) to verify that the plaintext or cryptotext has not been altered during transit.
Obviously, since a string can be an almost infinite length, there has *got* to be collisions somewhere, but so far, no one has found any.
Realize that 16 bytes = 128 bits = 3.40282367e38 different outputs of MD5. Given that the half-life of a proton is 10e31 years, you need to do about 1 per second before half of the universe ends for good. Or, if you want to finish it in 100 years, you would need to 10e20 per second.
You better start some time soon!
This uses Java not Javascript; learn the difference.
The chance of an MD5 collision if MD5 were an ideal hashing algorithm is astronomically small. To get a 1% chance of a collision you'd have to test on the order of 2^63 samples (for the math behind this google for the birthday paradox; it's of the order of the sqrt of the size of the hash space) to find two that match. Never mind finding an MD5 which matches a chosen hash value.
This is a really big number.
Nobody's really concerned about MD5 hash collisions of reasonable corpii (corpuses?, forgive my pseudo-latin) if MD5 is actually a perfect hash, or somewhat close to it. What people are really concerned about is there being some weakness in MD5 where you can reverse the algorithm and given some MD5 hash (maybe not any hash, maybe just certain ones) and come up with strings which hash to that value.
For example, suppose that 2^127-1 is prime (it may well be but I'm too lazy to check). Then if you start pulling out random strings foo and using the remainder of foo mod 2^127-1 as your hash you'll also have a 1% chance of a random collision with a sample size of the order of roughly 2^63, as above. However there are some trivial collisions you can calculate, like 0*(2^127-1), 1*(2^127-1), 2*(2^127-1) all hash to the same value.
If the data you're feeding your hash algorithm is random (more or less) there's no reason to prefer the modulus algorithm over MD5. But if you're using it for cryptographic things the modulus algorithm is pretty useless, and it may turn out to fall down on many common inputs that MD5 gives good results for.
I may have goofed some of this, and there's lots more to be said about it but I've wasted enough time on this post as it is.
is a good thing.
Most people who browse websites are quite simply unaware that their computer even contains a concept called Idle CPU Cycles, or that there is any way to get a CPU % reading from their computer. Besides, not everyone is so miserly with their CPU time. Most users also have a short attention span.
If the user, whose browser visits such a website that opens up a number crunching applet, notices that their whole computer just became slower, then they'll leave the website. And the applet will be alive for less time. Therefore successful applet projects that are accepted and deployed by various webmasters, which want to obtain the most results would make sure that the applet is as unobtrusive as possible. Otherwise the user will browse away from the page (and or close the browser window all together), and the applet's lifespan will be short.
Obviously, since a string can be an almost infinite length, there has *got* to be collisions somewhere, but so far, no one has found any.
Correction: No one has reported any. I, uh, have a friend--yeah, that's it--who found a few collisions but is afraid to report them because it always occurs between his beastiality files and his lengthy and frequent poetic love letters to some girl who claims he's stalking her.
Actually, I think in the "Chinese Lottery" scenario, there is one string/hash pair that is chosen, and all the clients try other combinations of strings. Whoever gets the same hash will "win" the lottery. Thus, the web site wouldn't have to store anything except the returned plaintext that hashed to the same MD5 value.
I think the original "Chinese Lottery" scenario was if everyone one in China had a radio that was set to do encryption, and the Chinese government broadcasted a particular ciphertext that it wanted to encrypt, every radio would do the decryption using different strings until one of them got the answer. I think it would be under the guise of a lottery, so whichever citizen came back with the winning radio would receive a prize, and the Chinese government would have their cracked ciphertext.
You're basically correct. Theoretically many different inputs have the same md5 hash. However, the chances of finding two such inputs are very small. There is no real practical value to finding such a collision, other than to give a rough idea of what it takes computationally to find one. Since md5 is used to check the integrity of files like linux isos, it is important to know how secure the algorithm is.
It is a bit like SETI@home, It is very likely that we're not alone in the universe, but until we have empirical proof that we're not, nobody is truly satisfied.
Besides, if this was of true significance for national safety, funding would be found to run this on dedicated machines.
YOU HAVE BEEN WARNED
Visit CryptoGnome in his home.
Once they have gotten this working, and assuming there is a commercial need for these cycles that exceeds the cost in bandwith, a site could do as others have suggested, and require you to run this app (ala netzero etc) in order to acess content on the site.
Beats pop up ads, anyway.
But, could this not be used to build a hash table of all MD5 sums? If all possible MD5s were known by one source, what is to prevent them from using this as a simple lookup to crack MD5-based passwords? Even if they only focused on short strings (say, typical password length) they could go a long way to defeating another security mechanism.
What those who want activist courts fear is rule by the people.
My standard reply to this is that there are 2^128 possible hash sums which is many magnitudes more than the number of electrons in the universe! So you'd have a pretty hard time storing them all.
As for the set of short strings, because this is such a limited set, if MD5 is any good (which it is), you won't find a collision in such a small subset.
a collision in MD5's transform was found. But not on the whole hash.
h tm l
Difference? The md5() function includes padding. The md5_compress() collision is cited here:
http://citeseer.nj.nec.com/denboer93collisions.
Read van oorschot's paper cited in my sci.crypt post. You'll start gettign mad at VeriSign, Amazon, SourceForge, et al for using MD5.
No respectable cryptographer uses MD5 for signatures anymore, they havn't for years - the industry hasn't caught up yet (TripWire, VeriSign, .rpm, .deb, md5sum, some PRNGs, etc)
/. posters... :) )
This is the essance of why I'm doing this.
Look around for evidance of this movment in crypto circles (ie don't listen to