Finnish Firm Claims Fake P2P Hash Technology
An anonymous reader writes "As reported by The Inquirer, a Finnish company known as Viralg Oy claim to have developed software that can create a junk file with the same hash as a genuine p2p download. This, according to the company, can altogether stop the sharing of copywritten files by flooding p2p networks with corrupt/junk data, which then spreads through the network, causing less and less of the original file to be available. However, with the resolve of the p2p userbase, is this software really going to 'beat all Peer 2 Peer pirates at their own game,' or simply prove a minor annoyance?"
Or they have cracked even the strong hashes. In which case they are really cool. I know Mr. Torvalds is Finnish, but I doubt even he could come up with algorithms to do that.
In their conceited press release, they have compared Spoofing vs DRP/a
Iran captures three CIA agents
It's "copyrighted," not "copywritten." We're talking about rights, not writings.
I took the liberty of pre-caching the site on Coral before it went live - http://www.viralg.com.nyud.net:8090/index.html. I think Slashdot should really consider doing this as part of the procedure... this site won't last a minute under the weight of our collective, nerdy asses.
By the time this is submitted, it will probably already be redundant (even though it's informative :)) - but the hashes are used for parallel download streams of the same file. So, if you saturate the network with the same hash, you can corrupt the data when the client automatically assumes it's the same file and tries to merge it with the other incoming data.
Read more here
fuvoo: watch something
in pdf form
Note the claims section and references - they keep talking about Napster and Kazaa - nothing about anything that use hashes.
How will this be different from the flooding of fake files already on P2P networks like Kazaa? Sure, the hash will be the same, but what "Joe Sixpack" looks at hashes?!
Joe Sixpack may not look at hashes, but his P2P software probably does. I know aMule uses the hash to match files that have had their names changed.
~Rebecca
Unless they have lots of supercomputer time, seeding the occasional p2p file with bad data will be very expensive.
The Bittorrent protocol uses SHA1 hashing.
Yes, there was recently a paper presented that "broke" SHA1, but the result is 2**69 operations instead of 2**80 to find a SHA1 collision. 2**69 is still a very large number of operations... a lot less than a full 2**80, but still a prohibitively large number (more costly than the actual realized losses the entertainment industry is suffering).
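To put those exponents in perspective, here is a rough back-of-the-envelope sketch. The hashing rate is an assumed, illustrative figure (not a measurement of any real hardware), and the function name is made up for this example.

```python
# Rough work-factor comparison for a SHA-1 collision search.
# HASHES_PER_SECOND is an assumed, illustrative rate, not a measured figure.
HASHES_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_of_work(log2_operations, rate=HASHES_PER_SECOND):
    """Years needed to perform 2**log2_operations operations at `rate` per second."""
    return 2**log2_operations / (rate * SECONDS_PER_YEAR)

speedup = 2**80 // 2**69         # how much cheaper the attack is than brute force
attack_years = years_of_work(69)  # the reported attack
brute_years = years_of_work(80)   # generic birthday-bound collision search
```

Under that assumed rate, the attack is 2048 times cheaper than brute force but still works out to roughly 18,700 years on a single machine. Note also that a collision attack produces two files that hash alike; matching the hash of a file that already exists is a different and harder problem.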
PJRC: Electronic Projects, 8051 Microcontroller Tools
From what I have noticed, using Kazaa-type software in Finland is nowadays a complete waste of time. What you get are exactly these files the company claims to have created. Sometimes you hear like 10 seconds of the actual song and the rest is just random noise.
Now, I do not know if what they claim is technically true or whether it is this company that is behind all these files, but I can tell that in real life it is extremely hard for a "normal non-geek user" to find pirated music here in Finland anymore.
Bittorrent and DC++ type of systems seem to be unaffected though.
Not only the company's, but also the submitter's claim seems to be bogus. Neither the Inquirer article nor the viralg.com website anywhere seem to be talking about hashes. Moreover, I'm kind of wondering where the Inquirer got their stuff from, since the viralg website contains... nothing. Nothing but blah. No word at all on how they protect anything from anyone. A random link to the Finnish Top 40 allegedly showing how BMG became the market leader for domestic music. Umm, except that nothing whatsoever proves that Viralg had anything to do with it. (If you have evidence to the contrary, please post it!) Then there's some blurb about how being insiders with mathematical knowledge up in the lonely north, where there's nothing else to do, is what got them where they are. So, where are they? Not like they actually tell us. No contact information besides the email address either (and nothing in the whois info). Apparently, being up in the lonely north with nothing else to do doesn't get you much further than producing a nonsensical website claiming you know how to save the world, find the question to the answer to life, the Universe and everything, with "stunning results."
:)
:)
And, breaking hashes, nonsense. If anything, maybe they are managing to manipulate P2P protocols to send you data you weren't supposed to be getting, but which is not actually going into the checksum?
Nothing for you to see here, methinks... and here I am wasting my time actually writing a reply to a trollish article.
On another random note, I kind of liked how their website looked in links.
Empty.
Shareaza has a "commenting" system for just this purpose.
"Flyin' in just a sweet place,
Never been known to fail..."
Or at least to be unique for each individual file per size. That would have meant that if you send the md5 sum plus the size info, you could in theory remake the file.
So instead of sending 'cf878d4809930e3696d9c9c242a6f646 1450466 KB' and recalculating what the content was, I will just have to retrieve SL-9.3-LiveDVD-amd64.iso.
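A quick counting sketch shows why that can't work: there are far more files of a given size than there are 128-bit digests. The function name is made up for illustration, and reading "1450466 KB" as 1450466 * 1024 bytes is an assumption.

```python
# Pigeonhole check: can an MD5 digest plus a file size identify a file uniquely?
MD5_BITS = 128

def files_per_digest_log2(size_bytes, digest_bits=MD5_BITS):
    """log2 of the average number of distinct files of this size sharing one digest."""
    return size_bytes * 8 - digest_bits

# A 17-byte file has 136 bits, so about 2**8 files share each digest on average.
tiny = files_per_digest_log2(17)

# The ISO above, taking "1450466 KB" as 1450466 * 1024 bytes (an assumption).
iso = files_per_digest_log2(1450466 * 1024)
```

For the ISO the exponent comes out at about 11.9 billion, so on average an astronomically large number of same-sized files share any one MD5 sum, and nothing short of the data itself tells you which one was meant.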
Oh well, back to the drawing board.
Don't fight for your country, if your country does not fight for you.
I've also heard MP3s that work fine on my PC, but skipped horribly on my car player. Different players handle corrupted or badly compressed files differently.
I wonder why people who use P2P don't help each other out a little more. For example, you have someone with 200 files shared. They are downloading and sharing at the same time. Sometimes they download a bad file, and share it. It would make more sense to have an "unchecked" folder for downloads, then move it to the "checked" folder to share.
That would break a feature which enables greater sharing... Uploading of parts of files that you do not have all of. Think BitTorrent, but less organized...
"I'll have a Guinness, no wait, make that a Coors Light" -Grad student I work with, who shall remain anonymous...
Hehe, yup, it's one of the great lines HEX produced.
I can really recommend Terry Pratchett's books to everyone.
+++ MELON MELON MELON +++ Out of Cheese Error +++ redo from start +++
It's a couple pages in my paper here. Basically, the first 300Kb of Kazaa's files are hashed normally, then every 32Kb chunk of the file is hashed independently. This allows independent chunks to be downloaded out of order. These out of order chunks are recursively hashed against one another to create one final value, called a "kzhash", which is verified after the file is downloaded.
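Here is a minimal sketch of that layout, following only the description above and not FastTrack's actual code. The pairwise combining rule, the KiB reading of "Kb", and all names (`chunk_digests`, `combine`, `sketch_hash`) are assumptions made for illustration.

```python
import hashlib

HEAD_SIZE = 300 * 1024   # "first 300Kb", read here as KiB (assumption)
CHUNK_SIZE = 32 * 1024   # "every 32Kb chunk", read here as KiB (assumption)

def chunk_digests(data, chunk_size=CHUNK_SIZE):
    """MD5 digest of each fixed-size chunk, so chunks can be checked in any order."""
    return [hashlib.md5(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]

def combine(digests):
    """Hash digests pairwise, level by level, until one root digest remains."""
    if not digests:
        return hashlib.md5(b"").digest()
    level = list(digests)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = b"".join(level[i:i + 2])  # an odd digest at the end is hashed alone
            nxt.append(hashlib.md5(pair).digest())
        level = nxt
    return level[0]

def sketch_hash(data):
    """Hypothetical stand-in for the scheme described: head digest plus chunk-tree root."""
    head = hashlib.md5(data[:HEAD_SIZE]).digest()
    return head + combine(chunk_digests(data))
```

The point of the per-chunk digests is that a client can verify a chunk as soon as it arrives, whatever order the chunks come in, while the root value ties the whole set together.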
The attack is to use the recently released collision -- which creates two blocks that, when mixed against the default initial state of MD5, emit the same system state. Every 32K, you can embed one or the other in the file you're transmitting, and kzhash can't tell. What can you do with this? Morph a file as it traverses the network; have an installation executable describe the systems it's being installed on as it propagates through a network. With a fairly large installer, you'd get quite a few bits in there.
You still don't get to do random noise, and while it's no Tiger Tree, kzhashing doesn't appear so exploitable that this group is likely to have anything. I could be wrong, but then, virtual algorithm? Right.
Your post advocates a
(X) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
(X) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
(X) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(X) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
(X) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses [hey, it's Microsoft... they've probably already submitted the patent...]
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Actually we were both wrong; it is (2^keylength)^2 number of keys. However this number is equivalent to 2^(keylength*2), not 2^(keylength^2)
Why would this not be "just double work"?
First you find all files matching the first hash, then filter out one matching the second.
And where exactly do you think the work is occurring? Computing the second hash. If you have one hash algorithm, you only have to match once. If you have two hash algorithms and you did it this way, you have to match enough with the first algorithm to find a match for the second algorithm. This isn't twice as much work, this is twice as much keyspace (with each bit increase in keyspace representing twice the work).
I am disrespectful to dirt! Can you see that I am serious?!
The file isn't comprobed only when complete, every chunk is comprobed when received. (BT:1/2mb,ED2k:10mb)
I meant: The file isn't verified only when complete, every chunk is verified when received. (BT:1/2mb,ED2k:10mb) Sorry, me fail english... (that's not umpossible...)
Thank you for pointing out my mind slip.
While I'm at it...
With an 8-bit hash key, there are 256 possible keys. This means that 1/256 files will match the hash. With another hash function with 8-bit keys there are 1/256/256=1/65536=1/(256^2)=1/((2^8)^2) files matching the two keys. This keyspace is indeed the same size as that of a 16-bit key with the important difference that it is much easier to find matches if you can partition the search space.
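The 1/256 and 1/65536 figures are easy to check empirically. In this sketch the two "8-bit hashes" are just the first byte of MD5 and of SHA-1, which is a stand-in chosen for illustration; the function names and the sample size are likewise assumptions.

```python
import hashlib

def h1(data):
    """Toy 8-bit hash: first byte of MD5 (illustrative only)."""
    return hashlib.md5(data).digest()[0]

def h2(data):
    """Second toy 8-bit hash: first byte of SHA-1 (illustrative only)."""
    return hashlib.sha1(data).digest()[0]

def count_matches(target, n):
    """Count how many of n counter-derived inputs match h1, and h1 and h2 together."""
    t1, t2 = h1(target), h2(target)
    first = both = 0
    for i in range(n):
        candidate = str(i).encode()
        if h1(candidate) == t1:
            first += 1
            if h2(candidate) == t2:
                both += 1
    return first, both

N = 200_000
first, both = count_matches(b"original file", N)
# On average one would expect about N/256 matches for one hash
# and about N/65536 matches for both.
```

This only confirms the counting. Whether a real pair of hash functions lets you partition the search the way the jigsaw picture below suggests depends on the structure of the functions involved.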
Picture yourself an unpainted 65536-piece square jigsaw puzzle (quite impossible for a human to do within a lifetime?).
Now change your mental picture to a 65536-piece square jigsaw puzzle painted in 256 randomly ordered differently coloured vertical stripes. The solution for a column of the puzzle quickly degenerates into the work of solving an unpainted 256-piece 1-D puzzle (not so impossible, might take a couple of days). After doing 256 of those (might be a slight bit time-consuming, some years), the set of stripes represents another 256-piece puzzle (needing like another day to solve).
This is not magic with large numbers, but the difference between brute force and the rest of the methods.
For a 10MB file, there are 2^83886080 possible bit arrangements. About 1/(2^32) of these (2^83886048) collide with any given hash in a 32-bit key space. You wouldn't have to try them all to find enough collisions to find one which also makes a collision with another algorithm. Especially not if you know something about the algorithm.
Geek rants since like... 2000 or something.
I've already looked into poisoning Torrents: 1) There is a hash on the entire file (simple enough) 2) The data shared from a torrent is broken up into pieces. Contributors can only send whole pieces. (ie many people contribute to the entire file you're downloading but only 1 person contributes to a given piece). AND EACH PIECE IS HASHED. Take a look at the .torrent for yourself. The .torrent contains the hash of every piece. So not only would you have to make a file of the SAME SIZE with the SAME HASH, but every 1MB (for example) would also need to have the SAME HASH.
Not only that but if you inject enough bad pieces you get booted (and yes this can be tracked, because as I stated before pieces come from a single individual).
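A minimal sketch of the per-piece check, assuming SHA-1 digests stored one per piece as a .torrent does. The function names are made up for this example and this is not code from any particular client.

```python
import hashlib

PIECE_LENGTH = 1024 * 1024  # 1 MiB, matching the "every 1MB (for example)" above

def piece_hashes(data, piece_length=PIECE_LENGTH):
    """SHA-1 digest of each piece, as a .torrent stores them (20 bytes per piece)."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]

def verify_piece(index, piece, expected_hashes):
    """Accept a received piece only if its SHA-1 matches the published digest."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]
```

A downloader would call something like `verify_piece` on every piece as it arrives and throw away anything that fails, which is why junk data has to collide piece by piece rather than just once for the whole file.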
Anybody remember the name of that company that promised extremely high lossless compression rates on arbitrary files?
early in the lives of gotwoot and scarywater (large, fairly well known fansub bittorrent tracker sites), they encountered ddos issues...
people were using botnets and what amounts to trivial network code to send false complete requests to the trackers, and volunteering as seeds. So, in a field of maybe 100-200 legitimate seeds, there would be ~30,000 fakes poisoning the tracker. The tracker couldn't tell they were fakes, so was redirecting 99% of requests for blocks to the fakes advertising themselves as seeds (And eventually running out of memory as more bots were activated and the server broke under the load).
The recent weaknesses found in md5 and sha1 also make block poisoning a possibility. Which opens the door to download pool poisoning. If an attacker can generate a block that checksums to a known good block, then the downloader will only be able to detect that poisoned block in a many-blocks hash, not in individual block hashes. This means that the bad block would be propagated before it was detected, and poison the whole larger block (chunk).
Even further, clients would have no way of determining exactly which block is bad, so would have to discard the entire chunk and start again... and again, may very well end up with the poisoned data.
That's assuming that the app is still using a broken hash though. This becoming a problem would probably force the application into a better hashing algorithm (the yet-unbroken sha256 over sha1 or md5, for example), or into complete unusability, assuming the attackers were determined enough to poison every file and to do so intently enough to make an impact.
>Why would this not be "just double work"? It is squared work.
All we can really say is that these researchers did not demonstrate a preimage attack. However what they did demonstrate should raise serious concerns that a preimage attack might be possible. For example, I could hash the latest blockbuster movie file, saving the internal MD5 state at the last iteration. Then, proceed with their algorithm, searching for a pair of two-block extensions to add to the file which lead to MD5 collisions of the entire file. If not, why not?
Bottom line, attacks get stronger over time, never weaker. Once a crack appears, further probing generally widens the crack.
MD5 is probably ok to use in a scenario where you don't expect an active adversary, or in a keyed hash where the security is protected by a secret key. But relying on MD5 to protect data integrity against a well funded adversary is foolish at this point.