Can Poisoning Peer to Peer Networks Work?
andrewchen writes "Can poisoning peer to peer networks really work? Business 2.0 picked up my research paper from Slashdot
and wrote an article about it. In my paper, I argue that P2P networks may have an inherent "tipping point" that can be triggered without stopping 100% of the nodes on the network, using a model borrowed from biological systems. For those who think they have a technical solution to the problem, I outlined a few problems with the obvious solutions (moderation, etc.)."
Have each user vote for each server they download from. If a specific server gives out bad files, the users would vote as a bad server. Then it would not be able to connect to the P2P network.
This would be moderation however, it would be the smartest way as each user would have their word on who is allowed and not allowed on the network.
*Headline News* censorship shuts down the Internet! More at 6PM!
Many users, when they download a "poisoned" file, get a little angry... and then they move on WITHOUT deleting the file! This leaves it in the system on yet another node and increases the chances that someone else will download it from them. If users take a little more responsibility for the network, these files wouldn't spread very well at all.
Because most users download files and never check them.
Really annoying especially with large files you've downloaded at 1kbps
Although this idea [checksums] works for newsgroups and some other centralized services, it does not with P2P. Basically, it comes down to the fact that you must trust whomever is actually doing the checksumming, or else they can just lie and publish false checksums. In the case of P2P networks, the checksumming is done by the same person you want to figure out if you can trust! As far as I know, this is an unresolvable problem.
So, um... how about this... If it's a standard file, such as, say, the deviance rip of neverwinter nights, or the new MPEG of Two Towers, then it should always have the same checksum.
Somebody somewhere needs to maintain a website with these checksums on. Then there's no dependence on the person who you're pulling the file from.
Obviously doesn't work for random porn videos (although it would for more popular ones... which might also tell you whether they're any good).
And there's nothing illegal about it.
Problems?
If you can't see this, click here to enable sigs.
I disagree with your suggestion that checksums can't work. A way they could work is as follows.
Create a website with logins for the users. Users of this web site can create lists of checksum for the files they create or have downloaded and verified as valid.
Other users can check any given user's list, and perhaps even post comments about the user's list, a form of moderation, if you will.
The validity of any single file on any random user's list would certainly be questionable, but some lists would become "trusted" by the community through trial and error. Others would be recognized as bogus and ignored.
Just a thought. Give me more than a few minutes and I might be able to come up with a better one.
Most of us who have been on P2P looking for files have been used to the fact that a large number of users are misconfigured (their firewall blocks your incoming request but heppily tells you they have the file you want) or are trading crap quality files. At that point you resort to brue force and using a bot to just grab everything it can to a large holding drive... a 40gig ide is dirt cheap and can easily hold the results of running a bot searching for "radiohead mp3" and grabbing EVERYTHING it finds over the course of about 3 days. but then you have to manually go in and delete all the crud, cruft and garbage. It's still faster than the old days of IRC trading but the signal to noise ratio has always been really bad.
Granted poisining it can start to drive away the gimmie-gimmie crowd or the newbies.. but the hardcore and old-timers will stay and simply find a way around it. Hell a group of about 100 of us now have our own private open nap network going and we have only high quality known good files. any clients connecting not sharing or sharing crap are instantly banned/blackballed... so we do the moderation thing.. with a side requirement that you must be asked to join and prove your worthyness to us. Maybe that will be the direction P2P will go... back to the roots of IRC where you had to prove your worthyness, ratios were encforced, and real people made decisions to keep out the troublemakers...(RIAA) granted you dont get 30 bajillion users that way, but then you dont have to spend a night and 10 gig trying to find that song or file you want.
Do not look at laser with remaining good eye.
From the webpage:
In particular, our analysis of the model leads to four potential strategies, which can be used in conjunction:
1. Randomly selecting and litigating against users engaging in piracy
This seems to be the option which involves the least technological action. However, randomly wouldn't work, if it were only because the P2P users don't all live in the same country, hence different laws apply. So some sort of not-so-random selection proces has to be implemented.
2. Creating fake users that carry (incorrectly named or damaged files)
Modern P2P programs support downloading files from multiple sources. If someone downloads such a fake file and discovers it, the file will almost always be deleted. So, these files will not propagate through the network, or at least not as fast and as much as the correct files. So a search where one file can be downloaded from many sources is in this case preferable before one with not many nodes serving the same file.
3. Broadcasting fake queries in order to degrade network performance
Now this is an interesting thing. The makers of the P2P programs who are being targeted by fake queries could ban such users, or could build in a feature where the user of a P2P program can ban a host his/herself, so that it will be excluded in further searches.
4. Selectively targeting litigation against the small percentage of users that carry the majority of the files
Some users carry gigs and gigs of files, but that doesn't mean they're very popular. If I setup a server where I host my 20CD collection of Mozart works I'll probably won't get as much traffic as when I publish the Billboard 100. It's not the quantity, but the content of the files served that counts. Search for Britney and you'll receive 1000's of hits. Search for Planisphere and a lot less results will show up.
Nevertheless it's a good paper.
The answer is quite simple, and would be very difficult for the sabateurs to subvert.
GPG signatures (which BTW include a checksum) of content, with said signatures refering to an online alias rather than a real person (thereby maintaining anonymouty).
A web of trust is formed, in which HollywoodDude is known and trusted, and has signed RipperGod's key, who in turn has signed FairUsers key, and so forth.
Provide a separate way of obtaining the keys (e.g. multiple independent websites, multiple independent keyservers, and so forth), and people can simply filter out anything submitted by untrusted users. If something submitted by someone outside of the trust ring, and someone who is trusted sees the item and determines that it is worthwhile/good/whatever and not a decoy, they could sign the item themselves.
Gaining trust would of course take time, probably requiring many worthwile submissions, but that is true in real life anyway, so why should it be any different online.
If someone violates their trusted status (or their private key is stolen, which BTW would be a violation of the law), others in the ring of trust could revoke their trusted access and blacklist their signature.
It isn't as convinient as just being able to share something with little or no thought, but it is emminently doable, and there really is no straightforward way to undermine such an approach.
The Future of Human Evolution: Autonomy
The problem faked hashes can be addressed using trees of checksums rather than just a simple checksum although a workable implementation would require embedding into the P2P protocol.
The idea is you break the file up into smallish sized blocks (100k or so) and generate a hash for each one of these. For each 8 first level hashes, you feed them into a crypto hash function to generate a second level hash. For each 8 second level hashes... you generate a third level hash. This allows a continuous (per 100k blocks) proof that the content is valid... The size of the proof grows with the log of the content so it is not much of a problem.
The RIAA/MPAA don't need to poison P2P networks. Nor do they need to use lawsuits and the threat of DMCA. The easiest, best way to stop illegal sharing of copyrighted materials is to provide a legal, reasonably priced electronic distribution alternative.
Really. Most users, given the choice, will pick the "honest" legal way to get their music and videos. Will there still be pirates? Of course, but you can never stop them and, heck, you're not losing money on them anyway. They wouldn't spend the money on the music.
Treat honest customers as honest, embrace new distribution methods. The problems go away. Think of the cost savings: they wouldn't have to buy any more senators.
tune, I may end up with somthing thats bland, repetitive and annoying.
And, pray tell, how am I supposed to know the difference?
what if the they take a few AOL accounts to do the poisoning: mind you that these have flexible IP adresses. Therefore you have to block all of AOL, which is A-OK by the RIAA I suppose...
Or you could not live in the US and have no problem
If an experiment works, something has gone wrong.
If this is what people are forced to do to achieve Napster-like results, then RIAA et. al have basically won all that they set out to achieve. By raising the bar high enough and by forcing higher transaction costs on the users, industry effectively shuts internet piracy out for 99.9% of the population. Of course people like me, that 1% or whatever it is, will always be able to circumvent whatever they throw in my path (presuming that I'm willing and wanting to do so of course). However, that number is so small that they really would not bother spending much effort to enforce from a simple cost / benefit point of view. Why spend millions in legal and related fees to track down a group of consumers that only account for half that amount? They won't bother, like they didn't really before Napster came along.
In fact, I would further argue, against the conventional wisdom on slashdot, that RIAA has basically won the war against P2P and other forms of mass piracy. At least once they shut out networks such as Fasttrack, and let it be known that there will no financial return for those that fund the development of piracy networks. Certainly the average Schmoe can download that super popular song via GNUtella with some effort, but getting much more than that like, say, the entire album at decent quality from same artist, is like trying to extract blood from a rock. That is not to say that they will retire their guns, but rather that it will just be an on-going series of small battles, more like maintenance, to hammer down any network, system, or device that pops up and starts to hemmorage their intellectual property.
Checksumming - no good. Any program could pretend to have the right checksum, but send false data. No point in figuring out *afterwards* the download is corrupt.
Webs of trust - hardly. Imagine a network of antis giving eachother good reviews, they'd certainly be better off than someone without any reviews at all. It's very *unlikely* that the one you're P2P'ing with has a trust chain you accept.
"Database" of who are good traders and not - Fake databases would screw that, you wouldn't know which ones to trust as you have no central server. The problem is that if there's to be any real P2P exchange happening, it's usually *strangers* meeting.
My friends could do a web of trust or a database, but then we'd much more likely to setup some mutual leech ftp servers instead and skip the entire P2P-networks.
Kjella
Live today, because you never know what tomorrow brings
The latest versions of limewire use hashes from a specification called HUGE that probably defeat this type of posioning attack. You can check out a recent interview with limewire team here. Go here if you want to download the code or check out the dev docs(Which are pretty outdated).
What the second-to-last paragraph in the paper? There's a missing word. A pretty important word, too. (How can this paper be featured all over the map and have an error like this?)
Anyway, is it:
"Or perhaps the carrying capacity of a well-designed P2P network is huge, and *NO* amount of flooding can overwhelm the network."
Or:
"Or perhaps the carrying capacity of a well-designed P2P network is huge, and *ANY* amount of flooding can overwhelm the network."
Which is it: "no" or "any?"
I love the smell of undergraduate sophistry in the morning...
The author of this paper seems to suffer from the common practice of those in a hurry to finish their term papers that if they somehow ignore the elephant in the room that disproves their point they might end up getting partial credit for impressing people with how well they can tap dance around the elephant. In this case the well-established practice of using a secure hash function as a self-verifying mechanism to prevent DoS attacks that try to flood a network with garbage files is the elephant.
In his FAQ regarding the paper, Mr. Chen correctly addresses the problem of a lack of centralized authority in using hash functions as distributed/P2P but apparently did not make more than a cursory examination of the subject or else he would have seen the various methods available for solving such a problem. I can only assume this is the case because reputation systems beyond simple moderation are not addressed and flow-constrained trust networks are never mentioned in this section.
As someone who seeks to pass off a "bad" file (this report) as a "good" file, perhaps sooner rather than later Mr. Chen will learn how the distributed moderation and trust system known as peer reputation works. Surely I am not the only one who finds it more than a little ironic that a paper by an author who claims that distributed moderation doesn't work is being submitted to a peer-reviewed journal in an attempt by the author to bootstrap his own reputation?
Taken from Andrew Chens responses to the solutions:
Although this idea works for newsgroups and some other centralized services, it does not with P2P. Basically, it comes down to the fact that you must trust whomever is actually doing the checksumming, or else they can just lie and publish false checksums. In the case of P2P networks, the checksumming is done by the same person you want to figure out if you can trust! As far as I know, this is an unresolvable problem.
Actually, the checksums should still work I believe, in much the same way that file sizes work now. Consider the reason the files that are being injected are set to the same size as the real file; the purpose is to mask these files to the naked eye. Checksums could be used for the same purpose.
The reason for this is because as people find good files they will tend to keep them while deleting the bad files. Sure if we only get 1 result back then we don't know one way or other, but if we have 10 results back and 8 of the 10 of the same checksum, we can assume those 8 are the good files.
Of course the problem with this is that a great many people don't bother to delete bad files after downloading, but should the poisoning become too much of a problem we can entice more people to clean up their shared files by way of the client interface.
All in all, I think this would combat poisoning very well.
Sigs are awesome huh?
Here is a file
Bobs_Song.mp3 5 M Hash -XXXXXXX
You don't know that I gave you the wrong hash till you're done.
It can only tell you that you have the wrong file, after you have it
A P2P program call edonkey (don't laugh) has partially solved this problem.
C D1.FTF.eDKDistro.Sharereactor.bin|559778352|1b153e 31f5fdbe829488989d04dda2b1|/
In order to dowload a file, you can use a URI such as (ed2k://|file|The_Adventrues_Of_Pluto_Nash(2002).
). The URI contains the "local filename", size and SHA-1 hash. A companion web site acts as a directory of URI's for popular content. The content is screened by the folks running the site. It has now reached the point where the "pirate" teams have accounts and post SHA-1 encoded URIs before releasing the content into the wild. Most edonkey users don't use the embedded search and instead use directories such as sharereactor.
The author writes
This is not an unresolvable problem at all; this is where web of trust comes in. The basic idea is for the publisher to sign the checksum using his or her private key. Others can then verify the signature using the publishers public key. This allows me to verify, using only a few bytes of information, that a publisher named SecretAgent did indeed publish a file. If I know that SecretAgent has previously published a lot of "good" files, then the file is probably good. If I don't have any experience with SecretAgent, but I do know that PrivateBenji is trustworthy, and PrivateBenji vouches for SecretAgent, then the file is probably good.
The author fundamentally misunderstands webs of trust:
A web of trust is not a "trust rating" ala eBay. A web of trust is a specific group of people who vouch for each other. Creating a malicious group of people who trust each other does not cause problems. (In fact, it can actually help.) If I trust A, based on experience, and if A trusts B, based on experience, then I can probably trust B. The fact that C, D, and E are malicious doesn't cause problems, because neither A nor B trusts them.
Distributed trust and peer review are fine and good but not even needed for the simple task at hand.
Look at the warez scene to see how it goes. A handful of release groups whose names are known to everybody who is even vaguely interested is sufficient to ensure supply. If these groups are attacked by fake releases (rarely happens) they can use hash keys as you suggest (some already do).
Websites like www.sharereactor.com also safeguard against fakes - another mechanism which is strong enough to defeat the entire problem by itself.
What I am saying is that distributed moderating à la slashdot will not evolve. Instead, we will have a handful of "authorities" - Web sites or public keys - that everyone trusts.
Note that authority - when not combined with power - is a Good Thing (TM).
one could keep a trusted block signature for each file. Say you have signature file that has one MD5 for each x bytes of the file. This file and it's MD5 hash is the identity of the file. On would then choose to download this file before the file itself and then download the blocks of x bytes from the file in a rendomised order, and possibly from diferent nodes. I guess this would add some otherwise uneeded downloads, but would help to restart the stoped downloads and would detect poison nodes easily.
To bad I am so late in posting this...
[]'s Victor Bogado da Silva Lins
^[:wq
look at DVD's...provide so much material that it is more work pirating than it is to buy. Why does a DVD cost the SAME as a CD ? Last time I checked a movie was SIGNIFICANTLY more expensive to produce than a ALBUM, and yet DVD's sell for the same or LESS, and quite often contain the BLOODY soundtrack as well. If a CD included multimedia stuff, editing room floor tracks, useless bio info and oodles of extra crap at a reasonable price it will be more trouble to rip it than it would be to buy it. When the RIAA wakes up and realizes that, maybe, just maybe things will turn around, otherwise, one way or another the industry is dead. The MPAA is actually beginning to come around, slowly and not without a FIGHT, but they are evolving. I don't hold out the same hope for the record industry.
errr....umm...*whooosh* *whoosh* Is this thing on ?
Do GPG signatures on blocks(about 50-100k) of files instead of entire files. When you have a contradiction of checksum's on blocks of files, alert that the user that someone is a liar. Take all the results of the search for that file, and all the gpg signatures and present the user with two options that are the sum of their trust levels. Most files can be previewed to check if it is bogus, and the user can blacklist anyone that even trusted that host, and their IP's as well. From then on, none of those IP's will be allowed to connect to this host. Eventually, they'll exhaust their IP supply before they end piracy.
:) (A per file rating instead of a per host rating)
Obviously the user would get to select the appropriate action if one of the files are just better than the other with a rating mechanism as well
Other advantages to this method are:
*Checksums can't be faked except in NP time. (use a random block size to thwart a super computer precalculating bad blocks that MD5 to the right hash... use multiple hashes)
*Multiple host download is gauranteed to be the same file (even when being poisoned).
*A computer need not have the entire file to share a block of the file, therefore files propogate the network in a more exponential manner. (host A gets block 1 from B. Host C gets block 2 from B, Host C and A trade blocks 1 and 2. Host D comes along and wants the same file, and can download from A and C instead of bogging down B. Works even better because all connections that I've seen are duplex even if they have a slower upstream. Conserve network bandwidth by refering downloaders to other people who have downloaded before... search for the GPG signature of the hosts on the network.)
Overall, I see this kind of thing being implemented very soon because it's not that difficult, and it's pretty obvious. Maybe the next edition of Gnutella will support this.
Of course there are loopholes where the RIAA/MPAA could buy half a million IP addresses or have a lot of computers on the network, but you don't have to have an unbreakable system, just a system that costs more to break than they think they will see in profits from breaking it.
Karma Clown
In particular, our analysis of the model leads to four potential strategies, which can be used in conjunction:
1. Randomly selecting and litigating against users engaging in piracy
2. Creating fake users that carry (incorrectly named or damaged files)
3. Broadcasting fake queries in order to degrade network performance
4. Selectively targeting litigation against the small percentage of users that carry the majority of the files
This mostly summarizes the war on drugs and the government's strategy against alcohol prohibition in the 1920's. Neither worked and the countermeasures are simple and straight forward.
A "directed" web of trust, objective quality measurement, and knowledge compartimentalization defeat the above strategy. The countermeasure of creating large numbers of mutally trusting attackers doesn't work when trust "flow" is taken into account. The keys to such a system are:
1) trust is assymetric
2) nodes define and change who they trust based on their own assessments
3) Nodes protect their knowledge of the web of trust
To see how this works, consider the cops and the drug dealers. The fact that the cops all trust each other does not result in the drug dealers trusting them. When a dealer is compromised, no matter how high up the chain it goes, trust shifts to rivals. Even when a kingpin falls, lines of trust will still exist that aren't compromised.
Drug dealing is not as popular as file sharing, is substantially more damaging to peoples lives and society, and has motivated levels of funding that are not matchable by publicly traded firms (who must demonstrate at least mid-range ROI). Despite all of these advantages, the war on drugs has been a dismal failure. The bottom line is that the internet makes distribution of content a commidity, where it was formerly a task of enormous complexity and value add. Economics will determine the rest, unless the US adopts and maintains a totalitarian government.