Finnish Firm Claims Fake P2P Hash Technology
An anonymous reader writes "As reported by The Inquirer, a Finnish company known as Viralg Oy claims to have developed software that can create a junk file with the same hash as a genuine p2p download. This, according to the company, can altogether stop the sharing of copyrighted files by flooding p2p networks with corrupt/junk data, which then spreads through the network, causing less and less of the original file to be available. However, with the resolve of the p2p userbase, is this software really going to 'beat all Peer 2 Peer pirates at their own game,' or simply prove a minor annoyance?"
I guess there are two schools here.
One believes this kind of fake file will only add load to the internet, as users will just download one fake file after another until they get a hit.
The other believes that such annoyance will put most people off, because the total time/cost it takes to acquire something is now higher than the price of the actual product.
I don't think MP3s will be affected because you can start playing the song if you've got the first bit. Can/will other file formats do that too?
Rock that crushes, Paper & Scissors that don't matter.
How big is that 'junk file'?
I've always thought it would be entirely possible to create a file with the same MD5 hash.
Now, what the company has to do is create a fake file of the SAME FILE SIZE with the same MD5 hash... then I'll be impressed.
= Grow a brain...
What hashing algorithm do they claim to have broken so completely? Sounds like BS to me.
Don't blame me; I'm never given mod points.
They might be able to fake one hash, but don't most P2P networks use a combination of different hashes? If not, it would be easy to implement: you can either use more than one type of hash, like MD5 and SHA, or add salt/pepper to a chunk and make any number of hashes, where each additional hash makes it insanely harder to crack.
This comment does not represent the views or opinions of the user.
Or even better, use more than one. If file_x is hashed 10 different ways, using 10 different algorithms, there's no way the file generated by this firm will behave the same way for ALL of them, perhaps not even for two.
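A rough sketch of the "hash it more than one way" idea, for what it's worth. The choice of MD5 plus SHA-1 and the helper names `fingerprint` and `matches` are my own assumptions for illustration, not what any particular P2P client does. A download is accepted only if every digest agrees.

```python
import hashlib

# Assumed pair of algorithms; any names accepted by hashlib.new() would do.
ALGORITHMS = ("md5", "sha1")


def fingerprint(data: bytes, algorithms=ALGORITHMS) -> dict:
    """Return one hex digest per algorithm for the given bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


def matches(data: bytes, expected: dict) -> bool:
    """Accept the data only if it matches every published digest."""
    return all(
        hashlib.new(name, data).hexdigest() == digest
        for name, digest in expected.items()
    )
```

A junk file would then have to collide under all the listed algorithms at once, not just one of them.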
Quid festinatio swallonis est aetherfuga inonusti?
Africus aut Europaeus?
The time-vs-accuracy tradeoff is a big one. One client that some people I know use takes almost 48 hours to index a full hard drive of files to share and hash them all.
With anything less robust, you're liable to have collisions, such as these, apparently. With anything more, if you have a lot of files, there's a major time commitment before you can actually begin to serve anything -- most people aren't willing to have their CPU pegged for 2 days straight while their P2P client hashes their 35,000 MP3s and 200 movies, or so.
Their site says the algorithm is patented, so presumably we should all be able to go look at this little marvel.
Isn't the whole point of a hash that it's computationally infeasible to create a file such that H(new file) = H(original)?
If this technology is real, it'll completely undermine the safety of today's unix passwords, which are stored as the clear text of their hash.
What is neat, or not so neat depending on your point of view, are music files which deteriorate after a while. I don't know how they are made, but I have listened to music that sounds pretty good, but after the 10th playing it starts skipping. Or it could be those skips are not very noticeable when first played, but once identified, they become annoying.
Rosco: "If brains were gunpowder, Enos couldn't blow his nose."
I see a really short reign of this new "technology" anyway. The hashes could only be for one specific file encoded by a specific encoder with the EXACT title/artist/album info which is not always consistent anyway. I see this as a futile effort.
main(0)
If someone can really poison P2P networks with junk that hash matches (and I have a difficult time believing they've cracked all the hash generators), then consider some hypothetical entity probing illicit distribution of copyrighted material using hashes. They could end up making false accusations against individuals for trading trash instead of Trash©.
"Provided by the management for your protection."
If increasing the noise ratio on P2P networks is a good thing, maybe we can use a similar technique to defeat spammers?
For example, if we could pollute spammers' email address databases with millions of bogus e-mail addresses, then instead of delivering millions of spam e-mails to real e-mail accounts every day, maybe spammers could only reliably reach a few hundred users; the rest of their messages would go to bogus addresses and be "noise" that spammers have to deal with.
How could we go about doing this?
I don't know the meaning of the word 'don't' - J
For example, you send the company a copy of the .mp3 file you want to drive out of circulation. They feed it to a computation cluster and eventually out comes another file which has the same hash. You then publish this new file with the same filename on the victim P2P network and hope that it spreads enough to poison the P2P well, so to speak. There are a number of problems with this scheme (assuming of course that this is the sort of scheme that they offer):
For instance, hash with two different algorithms. In theory it is possible to find a file that hashes to the same value under two different algorithms, but it's a lot harder than finding a file that hashes to a specific value under one algorithm.
I am disrespectful to dirt! Can you see that I am serious?!
There is a world of difference between a valid collision and an invalid one.
The anti-p2p software appears to find invalid collisions, which means the downloaded file is useless.
Finding collisions where the movie/app/document remains valid will be MUCH more tricky.
liqbase
So say the RIAA tries to sue you, saying they saw that you had the newest 50 Cent album on Kazaa. Couldn't you claim that what you had was not 50 Cent's album, but random files with the same hashes as 50 Cent's MP3s? I mean, can't you fight the RIAA with its own weapons? If they completely destroy the mechanism for determining what files you currently have, then how does their claim that you had X file hold any merit at all?
The hashes could only be for one specific file encoded by a specific encoder with the EXACT title/artist/album info which is not always consistent anyway. I see this as a futile effort.
Who pirates individual songs these days? I see this as being a major annoyance for people who pirate games. DVD ISOs are typically 4 GB, usually released by only one or two groups (and so there probably won't be more than 2 versions of the file), and take several hours if not days to download. Worse yet, the games contain executable content, so assuming the ISO mounts via Daemon Tools, for example, if you're really unlucky, you might randomly have gotten code that reformats your hard drive.
How about a hash of the entire file, plus a hash of every 128 KB segment? Constructing a file that matches all of the 128 KB segment hashes, plus the overall hash, is a much more difficult problem.
Plus, you know after downloading only 128 KB that the file is not the real deal. Assuming a 128-bit hash such as MD5, it only takes 8 * 128 bits, or 1024 bits (128 bytes), of hash information per megabyte of download -- really only a few packets to communicate the hash list for, say, a 10 MB file. The benefit for this cost is
- early detection of a corrupt download
- difficulty of creating a corrupt download
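Here is a minimal sketch of that whole-file-plus-segments scheme, assuming MD5 (16-byte digests) and 128 KB segments. The names `build_manifest`, `first_bad_segment` and `manifest_overhead_bytes` are made up for the example and don't come from any real client.

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # 128 KB segments, as proposed above


def build_manifest(data: bytes, segment_size: int = SEGMENT_SIZE) -> dict:
    """Hash the whole file plus every fixed-size segment of it."""
    segments = [
        hashlib.md5(data[i:i + segment_size]).digest()
        for i in range(0, len(data), segment_size)
    ]
    return {"whole": hashlib.md5(data).digest(), "segments": segments}


def first_bad_segment(data: bytes, manifest: dict,
                      segment_size: int = SEGMENT_SIZE):
    """Return the index of the first segment that fails its hash, or None."""
    for index, expected in enumerate(manifest["segments"]):
        chunk = data[index * segment_size:(index + 1) * segment_size]
        if hashlib.md5(chunk).digest() != expected:
            return index
    return None


def manifest_overhead_bytes(file_size: int, segment_size: int = SEGMENT_SIZE,
                            digest_size: int = 16) -> int:
    """Bytes of segment hashes needed for a file of the given size."""
    segment_count = -(-file_size // segment_size)  # ceiling division
    return segment_count * digest_size
```

With these assumptions a downloader can reject a bad segment as soon as it arrives, and `manifest_overhead_bytes(10 * 1024 * 1024)` gives the size of the hash list for a 10 MB file.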
Now suppose that, in BitTorrent-like fashion, I could download each 128 KB segment from a different location.
I'll see your senator, and I'll raise you two judges.
The magic of this system is that it also works in reverse: "Your honor, my client hates p2p filesharing. All those songs he downloaded, he thought they were phonies with duplicate hashes and deliberately shared them in order to poison the network."
So if they try to pollute a network by giving corrupt data for a valid file, all the downloader needs to do is notice that a particular client keeps sending corrupt parts. And of course if they send some real bits and some fake bits, the downloader will keep the real bits and discard the fake ones.
Don't ask me how it works, but I know that Shareaza makes use of this hash.
Link I ripped from the Shareaza wiki: Tree Hash EXchange format (THEX)
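To make the "notice that a particular client keeps sending corrupt parts" idea concrete, here is a hedged sketch. It assumes the downloader already has a trusted list of per-part SHA-1 digests; it is not THEX and not Shareaza's actual logic, and `ChunkCollector` and its `max_failures` threshold are invented for the example.

```python
import hashlib
from collections import Counter


class ChunkCollector:
    """Keep parts that verify, discard fakes, and stop trusting repeat offenders."""

    def __init__(self, expected_hashes, max_failures: int = 3):
        self.expected = list(expected_hashes)  # trusted per-part digests
        self.chunks = {}                       # index -> verified bytes
        self.failures = Counter()              # peer -> bad parts sent
        self.max_failures = max_failures

    def is_banned(self, peer: str) -> bool:
        return self.failures[peer] >= self.max_failures

    def offer(self, peer: str, index: int, chunk: bytes) -> bool:
        """Return True if the part was accepted, False if it was rejected."""
        if self.is_banned(peer):
            return False
        if hashlib.sha1(chunk).digest() != self.expected[index]:
            self.failures[peer] += 1  # fake part: discard and remember who sent it
            return False
        self.chunks[index] = chunk    # real part: keep it
        return True

    def complete(self) -> bool:
        return len(self.chunks) == len(self.expected)
```

This only helps if the per-part hashes themselves can't be matched by junk, which is the whole argument for hashing every part.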
Thanks for the link. If you look at page four of the document, it explains that because the UUHash algorithm used by Kazaa hashes only a small part of the file it is feasible to change other parts and produce hash collisions through brute-force attacks. Then the attacker just pretends to be a normal node and feeds bad data into the network.
The obvious way to counter this is to either fix Kazaa or switch to a network where the whole file is hashed.
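A toy illustration of why partial hashing is weak. To be clear, `partial_hash` below is NOT the real UUHash; it is a deliberately simplified stand-in (MD5 over only the first 4096 bytes, both my assumptions) that shows the general problem: anything the hash never reads can be replaced without changing the hash.

```python
import hashlib


def partial_hash(data: bytes, sample_size: int = 4096) -> str:
    """Toy hash that only covers the first sample_size bytes (not the real UUHash)."""
    return hashlib.md5(data[:sample_size]).hexdigest()


def full_hash(data: bytes) -> str:
    """Hash that covers every byte of the file."""
    return hashlib.md5(data).hexdigest()


def poison(data: bytes, sample_size: int = 4096) -> bytes:
    """Keep the hashed prefix and overwrite everything after it with zero bytes."""
    return data[:sample_size] + bytes(max(0, len(data) - sample_size))
```

In this toy, the poisoned file has the same size and the same partial hash as the original, while the full-file hash catches the change immediately.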
And those things were each also embroiled in copyright lawsuits by big corporations in their day. The difference is that today, the big corps have finally gained enough political leverage to get it their way.
Corporations are the new first-class citizens. Any individual, regardless of race, gender, or creed, is second-class compared to a corporation.
I honestly fear that by the time the American people get fed-up enough to realize this, the transformation will be complete, and we will be powerless to change it.
This doesn't cripple P2P. It just makes a dent in pirate-2-pirate. There is a difference, you realize. The Blizzard Bittorrent patch downloader will still function perfectly. Indie bands who release their new CDs to Kazaa won't have anybody trying to pollute their download pools. And it probably won't even work, more's the pity.
Help poke pirates in the eyepatch, arr.
Oh, I get Mr. Schneier's thing and I'm not behind on the news; I am under the impression that there have not been demonstrated preimage attacks on MD5, which is what I was referring to.
Re: SHA-1:
These are not theoretical results but actual collisions.
Again, here it is preimage attacks that are the problem, not just any collisions. But the results mentioned in the link are NOT actual collisions, just an algorithm to produce those collisions that might be feasible to run sometime soon. They didn't actually calculate any collisions. So not "actual collisions", but a "theoretical result". But that's just pedantry, sort of.
Anyway, as far as preimage goes SHA-1 is certainly still secure, as is -- I believe -- MD5, and this is what's relevant in downloading. If they are not, please point me to the appropriate thing.
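If the collision-versus-preimage distinction is unclear, a toy experiment may help. The sketch below truncates SHA-256 to 16 bits (my own choice, purely so brute force finishes quickly; it says nothing about attacks on real MD5 or SHA-1) and brute-forces both problems. `tiny_hash`, `find_collision` and `find_second_preimage` are made-up names.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, bits: int = 16) -> int:
    """SHA-256 truncated to its top `bits` bits (bits <= 32); deliberately weak."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def find_collision(bits: int = 16):
    """Find ANY two different messages with the same tiny hash."""
    seen = {}
    for attempts in count(1):
        msg = b"c%d" % attempts
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, attempts
        seen[h] = msg


def find_second_preimage(target: bytes, bits: int = 16, limit: int = 5_000_000):
    """Find a different message matching the hash of ONE SPECIFIC target."""
    wanted = tiny_hash(target, bits)
    for attempts in range(1, limit + 1):
        msg = b"p%d" % attempts
        if msg != target and tiny_hash(msg, bits) == wanted:
            return msg, attempts
    return None  # gave up after `limit` tries
```

For an n-bit hash, brute-forcing a collision takes on the order of 2^(n/2) tries, while matching one given file takes on the order of 2^n. Poisoning a specific download is the second problem, which is why the preimage question is the relevant one here.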
xkcd.com - a webcomic of mathematics, love, and language.
file1.dat:
00000000 d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
00000010 2f ca b5 87 12 46 7e ab 40 04 58 3e b8 fb 7f 89
00000020 55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 71 41 5a
00000030 08 51 25 e8 f7 cd c9 9f d9 1d bd f2 80 37 3c 5b
00000040 96 0b 1d d1 dc 41 7b 9c e4 d8 97 f4 5a 65 55 d5
00000050 35 73 9a c7 f0 eb fd 0c 30 29 f1 66 d1 09 b1 8f
00000060 75 27 7f 79 30 d5 5c eb 22 e8 ad ba 79 cc 15 5c
00000070 ed 74 cb dd 5f c5 d3 6d b1 9b 0a d8 35 cc a7 e3
MD5(file1.dat) = a4c0d35c95a63a805915367dcfe6b751
file2.dat:
00000000 d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
00000010 2f ca b5 07 12 46 7e ab 40 04 58 3e b8 fb 7f 89
00000020 55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 f1 41 5a
00000030 08 51 25 e8 f7 cd c9 9f d9 1d bd 72 80 37 3c 5b
00000040 96 0b 1d d1 dc 41 7b 9c e4 d8 97 f4 5a 65 55 d5
00000050 35 73 9a 47 f0 eb fd 0c 30 29 f1 66 d1 09 b1 8f
00000060 75 27 7f 79 30 d5 5c eb 22 e8 ad ba 79 4c 15 5c
00000070 ed 74 cb dd 5f c5 d3 6d b1 9b 0a 58 35 cc a7 e3
MD5(file2.dat) = a4c0d35c95a63a805915367dcfe6b751
For SHA1, you are correct. They presented an algorithm for finding collisions in full 80-round SHA1, and demonstrated the correctness of the algorithm on SHA1 reduced to 58 rounds. Here is the SHA1 announcement:
http://theory.csail.mit.edu/~yiqun/shanote.pdf
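For anyone who wants to check the file1.dat/file2.dat pair themselves, here is a small sketch that turns hex dump text in the format shown above (an offset followed by hex bytes on each line) back into bytes and compares two inputs. The helper names `parse_hexdump` and `describe_pair` are mine, and the dumps would have to be pasted in by hand.

```python
import hashlib


def parse_hexdump(text: str) -> bytes:
    """Convert lines of 'offset b0 b1 b2 ...' hex dump text into raw bytes."""
    out = bytearray()
    for line in text.strip().splitlines():
        fields = line.split()
        out += bytes(int(field, 16) for field in fields[1:])  # skip the offset
    return bytes(out)


def describe_pair(a: bytes, b: bytes) -> dict:
    """Report where two byte strings differ and whether their MD5 digests agree."""
    return {
        "same_length": len(a) == len(b),
        "differing_offsets": [i for i, (x, y) in enumerate(zip(a, b)) if x != y],
        "same_md5": hashlib.md5(a).digest() == hashlib.md5(b).digest(),
    }
```

Feeding it the two dumps should list the handful of byte offsets where the files differ and say whether the MD5 digests really match.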