Can Poisoning Peer to Peer Networks Work?
andrewchen writes "Can poisoning peer to peer networks really work? Business 2.0 picked up my research paper from Slashdot and wrote an article about it. In my paper, I argue that P2P networks may have an inherent "tipping point" that can be triggered without stopping 100% of the nodes on the network, using a model borrowed from biological systems. For those who think they have a technical solution to the problem, I outlined a few problems with the obvious solutions (moderation, etc.)."
Have each user vote on each server they download from. If a specific server gives out bad files, the users would vote it down as a bad server. Then it would not be able to connect to the P2P network.
This would be moderation; however, it would be the smartest way, as each user would have their say on who is and is not allowed on the network.
*Headline News* censorship shuts down the Internet! More at 6PM!
Many users, when they download a "poisoned" file, get a little angry... and then they move on WITHOUT deleting the file! This leaves it in the system on yet another node and increases the chances that someone else will download it from them. If users take a little more responsibility for the network, these files wouldn't spread very well at all.
Because most users download files and never check them.
Really annoying especially with large files you've downloaded at 1kbps
Although this idea [checksums] works for newsgroups and some other centralized services, it does not with P2P. Basically, it comes down to the fact that you must trust whomever is actually doing the checksumming, or else they can just lie and publish false checksums. In the case of P2P networks, the checksumming is done by the same person you want to figure out if you can trust! As far as I know, this is an unresolvable problem.
So, um... how about this... If it's a standard file, such as, say, the deviance rip of neverwinter nights, or the new MPEG of Two Towers, then it should always have the same checksum.
Somebody somewhere needs to maintain a website with these checksums on. Then there's no dependence on the person who you're pulling the file from.
Obviously doesn't work for random porn videos (although it would for more popular ones... which might also tell you whether they're any good).
And there's nothing illegal about it.
Problems?
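A rough sketch of what checking a download against such a site could look like. Everything here is an assumption for illustration: the `published` dictionary stands in for whatever the checksum site would serve, the release name is made up, and SHA-1 is just one possible choice of hash.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of the downloaded bytes."""
    return hashlib.sha1(data).hexdigest()


def matches_published(data: bytes, release: str, published: dict) -> bool:
    """True only if the site lists this release and the digests agree.

    `published` maps a release name to a hex digest (hypothetical format).
    """
    expected = published.get(release)
    if expected is None:
        return False  # unknown release: the site tells us nothing either way
    return sha1_hex(data) == expected.lower()


# Hypothetical list as it might be fetched from the checksum site.
published = {"some.release.v1": sha1_hex(b"the real file contents")}

print(matches_published(b"the real file contents", "some.release.v1", published))
print(matches_published(b"poisoned junk", "some.release.v1", published))
```

Note that this only tells you after the download finishes whether the file was good, which is the objection raised elsewhere in this thread.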
I disagree with your suggestion that checksums can't work. A way they could work is as follows.
Create a website with logins for the users. Users of this web site can create lists of checksums for the files they create or have downloaded and verified as valid.
Other users can check any given user's list, and perhaps even post comments about the user's list, a form of moderation, if you will.
The validity of any single file on any random user's list would certainly be questionable, but some lists would become "trusted" by the community through trial and error. Others would be recognized as bogus and ignored.
Just a thought. Give me more than a few minutes and I might be able to come up with a better one.
Most of us who have been on P2P looking for files are used to the fact that a large number of users are misconfigured (their firewall blocks your incoming request but happily tells you they have the file you want) or are trading crap-quality files. At that point you resort to brute force, using a bot to just grab everything it can to a large holding drive... a 40-gig IDE drive is dirt cheap and can easily hold the results of running a bot searching for "radiohead mp3" and grabbing EVERYTHING it finds over the course of about 3 days. But then you have to manually go in and delete all the crud, cruft and garbage. It's still faster than the old days of IRC trading, but the signal-to-noise ratio has always been really bad.
Granted, poisoning can start to drive away the gimme-gimme crowd or the newbies... but the hardcore users and old-timers will stay and simply find a way around it. Hell, a group of about 100 of us now have our own private OpenNap network going, and we have only high-quality, known-good files. Any clients connecting that are not sharing, or are sharing crap, are instantly banned/blackballed... so we do the moderation thing, with a side requirement that you must be asked to join and prove your worthiness to us. Maybe that will be the direction P2P will go... back to the roots of IRC, where you had to prove your worthiness, ratios were enforced, and real people made decisions to keep out the troublemakers (RIAA). Granted, you don't get 30 bajillion users that way, but then you don't have to spend a night and 10 gigs trying to find that song or file you want.
Do not look at laser with remaining good eye.
Why not block all IPs in the RIAA/MPAA IP ranges and any ranges that are putting crap onto the network?
thank God the internet isn't a human right.
From the webpage:
In particular, our analysis of the model leads to four potential strategies, which can be used in conjunction:
1. Randomly selecting and litigating against users engaging in piracy
This seems to be the option which involves the least technological action. However, random selection wouldn't work, if only because P2P users don't all live in the same country, so different laws apply. So some sort of not-so-random selection process has to be implemented.
2. Creating fake users that carry (incorrectly named or damaged files)
Modern P2P programs support downloading files from multiple sources. If someone downloads such a fake file and discovers it, the file will almost always be deleted. So these files will not propagate through the network, or at least not as fast and as widely as the correct files. A search result where one file can be downloaded from many sources is therefore preferable to one where only a few nodes serve the same file.
3. Broadcasting fake queries in order to degrade network performance
Now this is an interesting thing. The makers of the P2P programs who are being targeted by fake queries could ban such users, or could build in a feature where the user of a P2P program can ban a host his/herself, so that it will be excluded in further searches.
4. Selectively targeting litigation against the small percentage of users that carry the majority of the files
Some users carry gigs and gigs of files, but that doesn't mean they're very popular. If I set up a server where I host my 20-CD collection of Mozart works, I probably won't get as much traffic as when I publish the Billboard 100. It's not the quantity, but the content of the files served that counts. Search for Britney and you'll receive thousands of hits. Search for Planisphere and far fewer results will show up.
Nevertheless it's a good paper.
The answer is quite simple, and would be very difficult for the saboteurs to subvert.
GPG signatures (which BTW include a checksum) of content, with said signatures referring to an online alias rather than a real person (thereby maintaining anonymity).
A web of trust is formed, in which HollywoodDude is known and trusted, and has signed RipperGod's key, who in turn has signed FairUser's key, and so forth.
Provide a separate way of obtaining the keys (e.g. multiple independent websites, multiple independent keyservers, and so forth), and people can simply filter out anything submitted by untrusted users. If something is submitted by someone outside of the trust ring, and someone who is trusted sees the item and determines that it is worthwhile/good/whatever and not a decoy, they could sign the item themselves.
Gaining trust would of course take time, probably requiring many worthwhile submissions, but that is true in real life anyway, so why should it be any different online?
If someone violates their trusted status (or their private key is stolen, which BTW would be a violation of the law), others in the ring of trust could revoke their trusted access and blacklist their signature.
It isn't as convenient as just being able to share something with little or no thought, but it is eminently doable, and there really is no straightforward way to undermine such an approach.
The Future of Human Evolution: Autonomy
The problem of faked hashes can be addressed using trees of checksums rather than just a simple checksum, although a workable implementation would require embedding them into the P2P protocol.
The idea is you break the file up into smallish sized blocks (100k or so) and generate a hash for each one of these. For each 8 first level hashes, you feed them into a crypto hash function to generate a second level hash. For each 8 second level hashes... you generate a third level hash. This allows a continuous (per 100k blocks) proof that the content is valid... The size of the proof grows with the log of the content so it is not much of a problem.
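Here is a minimal sketch of the tree described above, assuming SHA-1 as the hash function (the comment does not name one) and using the 100k block size and fan-out of 8 from the text. The function names are made up for illustration, and this is not the THEX wire format or any client's actual implementation.

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # "100k or so", per the comment
FANOUT = 8               # 8 child hashes feed each parent hash


def leaf_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """First-level hashes: one per block of the file."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, max(len(data), 1), block_size)]


def build_levels(leaves: list, fanout: int = FANOUT) -> list:
    """All levels of the tree, from the leaves up to a single root hash."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([hashlib.sha1(b"".join(cur[i:i + fanout])).digest()
                       for i in range(0, len(cur), fanout)])
    return levels


def proof_for(levels: list, index: int, fanout: int = FANOUT) -> list:
    """For block `index`, the group of sibling hashes needed at each level."""
    proof = []
    for level in levels[:-1]:
        start = (index // fanout) * fanout
        proof.append(level[start:start + fanout])
        index //= fanout
    return proof


def verify_block(block: bytes, index: int, proof: list, root: bytes,
                 fanout: int = FANOUT) -> bool:
    """Check one block against the trusted root using only its proof."""
    h = hashlib.sha1(block).digest()
    for group in proof:
        if group[index % fanout] != h:
            return False
        h = hashlib.sha1(b"".join(group)).digest()
        index //= fanout
    return h == root
```

With this layout, a downloader who trusts only the root hash can reject a bad block as soon as it arrives, and the proof carries one group of at most 8 hashes per level, so it grows with the log of the file size as the comment says.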
The RIAA/MPAA don't need to poison P2P networks. Nor do they need to use lawsuits and the threat of DMCA. The easiest, best way to stop illegal sharing of copyrighted materials is to provide a legal, reasonably priced electronic distribution alternative.
Really. Most users, given the choice, will pick the "honest" legal way to get their music and videos. Will there still be pirates? Of course, but you can never stop them and, heck, you're not losing money on them anyway. They wouldn't spend the money on the music.
Treat honest customers as honest, embrace new distribution methods. The problems go away. Think of the cost savings: they wouldn't have to buy any more senators.
tune, I may end up with something that's bland, repetitive and annoying.
And, pray tell, how am I supposed to know the difference?
I think webs of trust are a good idea.
Poisoning such a web could prove difficult. I trust personal friends highly; they aren't a poisoning group.
People I or they don't know well won't get a high trust rating, and would be suspected if they were poisoning the group.
I think slashdot type moderation works well too, combined with a decent sized web of trust should be a pretty stable system
Flooding a network with spoofed files would drive users to more reliable music sources -- like the labels' own online sites.
The problem is the labels don't have their own online sites. Sooner or later (it's bound to happen) the labels are gonna hire some college grads who grew up on sharing and understand the problem. Maybe then a compromise will be reached.
'Same speed C but faster'
The crew at the Open Content Network have released a specification for serializing hash trees. The specification is called the Tree Hash EXchange (THEX) and is being implemented in both the Open Content Network and Gnutella. Furthermore, this specification is compatible with the TigerTree hashes used for Bitzi.
If this is what people are forced to do to achieve Napster-like results, then the RIAA et al. have basically won all that they set out to achieve. By raising the bar high enough and by forcing higher transaction costs on the users, the industry effectively shuts internet piracy out for 99.9% of the population. Of course people like me, that 1% or whatever it is, will always be able to circumvent whatever they throw in my path (presuming that I'm willing and wanting to do so, of course). However, that number is so small that they really would not bother spending much effort on enforcement, from a simple cost/benefit point of view. Why spend millions in legal and related fees to track down a group of consumers that only account for half that amount? They won't bother, just as they didn't really bother before Napster came along.
In fact, I would further argue, against the conventional wisdom on Slashdot, that the RIAA has basically won the war against P2P and other forms of mass piracy, at least once they shut out networks such as FastTrack and let it be known that there will be no financial return for those who fund the development of piracy networks. Certainly the average Schmoe can download that super popular song via Gnutella with some effort, but getting much more than that, like, say, the entire album at decent quality from the same artist, is like trying to extract blood from a rock. That is not to say that they will retire their guns, but rather that it will just be an ongoing series of small battles, more like maintenance, to hammer down any network, system, or device that pops up and starts to hemorrhage their intellectual property.
I just started using it last week -- I think I remember something whereby each file has some type of key / checksum (I'm not too familiar with the nuances of encryption)........... but I could be wrong.
Checksumming - no good. Any program could pretend to have the right checksum, but send false data. No point in figuring out *afterwards* the download is corrupt.
Webs of trust - hardly. Imagine a network of antis giving each other good reviews; they'd certainly be better off than someone without any reviews at all. It's very *unlikely* that the one you're P2P'ing with has a trust chain you accept.
"Database" of who are good traders and not - Fake databases would screw that, you wouldn't know which ones to trust as you have no central server. The problem is that if there's to be any real P2P exchange happening, it's usually *strangers* meeting.
My friends could do a web of trust or a database, but then we'd be much more likely to set up some mutual leech FTP servers instead and skip the P2P networks entirely.
Kjella
Live today, because you never know what tomorrow brings
However, in any case, it is way easier to spread checksums by various means - internet boards, email lists, usenet, IRC - than to spread the actual file. If the situation arises, and the P2P net is "poisoned" with invalid files (and invalid checksums), I'm sure it won't be hard to acquire the valid checksums and download the correct files. Of course, "poisoned" clients sending out fake files with wrong checksums will still be a problem. Why would there be different rips? Typically, each movie is released only once (by groups specialising in it); all other releases are "dupes" and are not to be distributed. The same is true for virtually any sub-category of the scene, such as games ISO/RIP, utils and audio.
The latest versions of LimeWire use hashes from a specification called HUGE that probably defeat this type of poisoning attack. You can check out a recent interview with the LimeWire team here. Go here if you want to download the code or check out the dev docs (which are pretty outdated).
What about the second-to-last paragraph in the paper? There's a missing word. A pretty important word, too. (How can this paper be featured all over the map and have an error like this?)
Anyway, is it:
"Or perhaps the carrying capacity of a well-designed P2P network is huge, and *NO* amount of flooding can overwhelm the network."
Or:
"Or perhaps the carrying capacity of a well-designed P2P network is huge, and *ANY* amount of flooding can overwhelm the network."
Which is it: "no" or "any?"
I love the smell of undergraduate sophistry in the morning...
The author of this paper seems to suffer from the common practice of those in a hurry to finish their term papers that if they somehow ignore the elephant in the room that disproves their point they might end up getting partial credit for impressing people with how well they can tap dance around the elephant. In this case the well-established practice of using a secure hash function as a self-verifying mechanism to prevent DoS attacks that try to flood a network with garbage files is the elephant.
In his FAQ regarding the paper, Mr. Chen correctly addresses the problem of a lack of centralized authority in using hash functions as distributed/P2P but apparently did not make more than a cursory examination of the subject or else he would have seen the various methods available for solving such a problem. I can only assume this is the case because reputation systems beyond simple moderation are not addressed and flow-constrained trust networks are never mentioned in this section.
As someone who seeks to pass off a "bad" file (this report) as a "good" file, perhaps sooner rather than later Mr. Chen will learn how the distributed moderation and trust system known as peer reputation works. Surely I am not the only one who finds it more than a little ironic that a paper by an author who claims that distributed moderation doesn't work is being submitted to a peer-reviewed journal in an attempt by the author to bootstrap his own reputation?
Again, I disagree. It has been my experience that many users do not delete damaged files; they simply leave them. The so-called swarmed downloads only further expose the downloads to corruption, since all it really takes is one corrupt segment to either cause the program to crash or at least play really unbearable sound (or whatever media). To further compound the problem, the industry could use its cash and its legitimacy to be the most available and desirable servers (so that your swarmed downloads are almost certain to select its servers).
This is impossible in any current decentralized P2P scheme, don't you get it? How is any routing servent to know that the other servent it is connected to is not passing legitimate requests from the hosts it is purporting to represent? It can't. It might attempt to throttle the traffic from any given node, but then that would necessarily mean throttling the ENTIRE network, which would be self-defeating.
While it is almost certainly true that only 1% of the content accounts for 99% of the traffic, it is also true that only 10% of the hosts account for almost all of the servers. Of those 10%, roughly half of them, (those that HAVE the popular files, are SHARING, are on truly HIGH speed network, and are NOT FIREWALLED) account for the majority of it. If you take the biggest servers out first, you will have a big impact. What's more, once it becomes established that there are likely consequences for being an effective server of files, the industry need not literally attack every last one of them. They need only use fear to their advantage and allow the servers' own self-interest to take over.
ShareReactor
FileNexus
Asia Movies
Jigle
Various sites specialised in files of certain languages (French, German), such as Spieleplanet
etc etc etc - just search for eDonkey links.
There are also IRC channels and uncounted web boards (similar to Asia Movies) dedicated to sharing ED checksums.
No, I am saying, in fact, I said, that is completely irrelevant. We're not talking about sharing files as in Napster or Audiogalaxy (where you seem to draw your experience from). There's only ONE valid version of each single/album (single MP3s aren't usually spread), the first high-quality, complete release by a scene group. All later releases are dupes, and not distributed. You get the checksum to that release, and you're set.
Taken from Andrew Chen's responses to the solutions:
Although this idea works for newsgroups and some other centralized services, it does not with P2P. Basically, it comes down to the fact that you must trust whomever is actually doing the checksumming, or else they can just lie and publish false checksums. In the case of P2P networks, the checksumming is done by the same person you want to figure out if you can trust! As far as I know, this is an unresolvable problem.
Actually, the checksums should still work I believe, in much the same way that file sizes work now. Consider the reason the files that are being injected are set to the same size as the real file; the purpose is to mask these files to the naked eye. Checksums could be used for the same purpose.
The reason for this is that as people find good files, they will tend to keep them while deleting the bad files. Sure, if we only get 1 result back then we don't know one way or the other, but if we get 10 results back and 8 of the 10 have the same checksum, we can assume those 8 are the good files.
Of course the problem with this is that a great many people don't bother to delete bad files after downloading, but should the poisoning become too much of a problem we can entice more people to clean up their shared files by way of the client interface.
All in all, I think this would combat poisoning very well.
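The "8 of 10" heuristic above could be sketched roughly as follows. The function name, the `(host, checksum)` result format and the thresholds are all assumptions for illustration, not how any existing client works.

```python
from collections import Counter


def pick_majority_checksum(results, min_results=2, min_share=0.5):
    """Return the checksum most hosts agree on, or None if there is no clear winner.

    `results` is a list of (host, checksum) pairs from one search (hypothetical format).
    """
    if len(results) < min_results:
        return None  # a single result tells us nothing one way or the other
    counts = Counter(checksum for _host, checksum in results)
    checksum, votes = counts.most_common(1)[0]
    if votes / len(results) > min_share:
        return checksum
    return None


# 10 results, 8 of which share a checksum.
results = [(f"host{i}", "aaa111") for i in range(8)] + [("host8", "bbb222"), ("host9", "ccc333")]
print(pick_majority_checksum(results))
```

The obvious weakness is that the vote counts hosts, so anyone who can put enough fake hosts on the network can outvote the honest copies.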
Sigs are awesome huh?
This strategy fails to take into account the fact that an RIAA mole could easily share desirable content. For example, mp3.com has 7 free, legal tracks for download from Linkin Park (not my choice in music, but they are quite popular currently). There are quite a few other well-known bands with free tracks on there. Sharing all that content, which the various record labels have decided to share anyway, will only serve to get the sharing user voted up.
Once the mole is voted up for carrying lots of valid files that people are interested in, the mole begins to distribute poison. Sure, this will cause the mole's ranking to fall somewhat, but damage will be done in the process. Furthermore, the legit files will continue to somewhat offset the attempts to vote the user down. Multiply this whole situation by a number of different automated users, and you've got an effective poisoning attack.
In short: The mole has spread files that the RIAA already wants distributed (win for the RIAA, win for users), and the mole has spread poison for files that the RIAA doesn't want distributed (win for the RIAA, loss for users).
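To put rough numbers on the mole argument, here is a toy model. It assumes a simple "good votes minus bad votes" score, a ban when the score falls to some threshold, and that only one in every few victims bothers to vote. None of these rules come from a real P2P client; they are placeholders to show the shape of the problem.

```python
def poison_served_before_ban(good_votes: int, ban_threshold: int = 0,
                             one_in_n_victims_votes: int = 1) -> int:
    """Poisoned downloads a mole can serve before its score hits the ban threshold.

    Toy rule (an assumption): score = good_votes - bad_votes, and each batch of
    `one_in_n_victims_votes` poisoned downloads produces exactly one bad vote.
    """
    bad_votes_needed = max(0, good_votes - ban_threshold)
    return bad_votes_needed * one_in_n_victims_votes


# A mole that earned 500 good votes by sharing free, legal tracks:
print(poison_served_before_ban(500))                            # everyone votes
print(poison_served_before_ban(500, one_in_n_victims_votes=5))  # 1 in 5 votes
```

Under these assumptions, every good vote the mole earns from legitimate files buys it at least one poisoned download, and more if most victims never vote.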
I hope the same people who defend the right to distribute MP3s they don't own the copyright for will be the same people who defend a person's or company's right to violate the GPL.
Je ne parle pas francais.
The RIAA and all the lawyers in the world will never be able to completely stop pirating. Look at how much money the feds throw at drugs and the number of addicts on the street. If enough people want something, they'll get it.
:).
I know one of my chief frustrations is to search for a song and either have it incomplete, or be of poor quality (e.g. pops or other defects) or to simply have it not be the same song that I downloaded. If I could search for a song, pay $SOME_SMALL_AMOUNT (e.g. $1US) for it and download a 'known perfect' copy at my choice of bitrates (e.g. 128, 160, etc.) then sure as heck I'd do it.
Distributing these poisoned files would take an enormous amount of bandwidth, so they'd have to have some sort of agreement worked out with ISPs and a mass-content provider, say Akamai. Akamai has tens of thousands of servers located in hundreds (if not more) of ISPs throughout the nation. I think on peak usage they're pushing out 100 GB/sec. in the US (if not more). Simply say "Ok Akamai, can we buy 10GB on each of your servers and push all these MP3s out?". Then you write a gnutella client for each box which offers all the MP3s up for distribution.
I can't remember how the gnutella protocol works but I think it broadcasts search requests to the nodes that store a cache of what they have and what their neighbors offer and then can pass the request off. Have your client log all the requests (so you can tell the record companies which songs were requested more) and of course offer up your files when requested. If you do this with 10,000 boxes full of identical content chances are you're going to drown out any signal out there.
If you're really tricky, you can even have the client 'fake' files so you don't actually need to have the file on the box; you could send a pre-existing obfuscated file, or even dynamically build and stream the poisoned MP3.
Of course, all of this is moot if you still don't have a very easy, cheap method of offering MP3s online for the mass public. You could pitch it like this "Yeah, so you won't make much money off of offering $SOME_SMALL_AMOUNT for each MP3. But you're a fool if you think simply shutting Morpheus off will result in even 10% of the Morpheus users buying the actual CD or using a painful, userUNfriendly pay-per-MP3 system. However, what if we have a method to net you 20 or 30% of users who wouldn't pay you anyway?" So the pitch would be "We can't get you all of them, but our method would give you more than you're getting now!". Frankly the people who post on SlashDot (from the very negative response to the Subscription model) are not a good cross-section of the vast majority of internet users out there
So in your obfuscated file you have it play maybe 20 seconds of the file and then say "Sorry, this is a copyrighted file. Pirating files costs artists money. If you want to buy this MP3 for $SOME_SMALL_AMOUNT, please visit http://www.somestore.com. 80% of $SOME_SMALL_AMOUNT earned will go directly to the artist."
It gives them a reason to buy it - not only do you have SomeStore.com very easily accepting payment, but you ACTUALLY PAY THE ARTISTS A MAJORITY OF THE MONIES EARNED! So it can quell the naysayers who say "Well the artist wouldn't receive anything anyway!" (rant: but who are you hurting more, the billion dollar-industry or the Artist who NEEDS even the small cut they receive from each CD sold?).
Some drawbacks could be, of course, that someone writes a 'detector' to find and ignore the invalid MP3s, or that they block the IP addresses of the servers, etc., but that is easily fixed. Most non-power users (e.g. the great and huddled masses of the internet) don't want to update their Morpheus client every time a new version is released. Heck, even programs which offer hassle-free updating (e.g. antivirus, windowsupdate.com) are very rarely used by the majority of internet users. Also, you'd work out the server IP settings with the ISP so that they would rotate to a random IP in their pool - since most of the servers are located in most ISPs, you couldn't ban a single IP, only perhaps a subnet. But since the IPs are in the ISP, you have now banned a large chunk of users. If they are in every ISP, you will have to ban every ISP (see the problem in banning IPs?).
So, to boil it down to a sentence:
Have very easy-to-use, hassle-free, cheap, reliable, etc. method for users to buy MP3s and they WILL
Thanks,
--
Matt
Bitzi stores information on files found on P2P networks, indexed by a TigerTree hash appended to a SHA1 hash. Support for it has been integrated into several Gnutella clients (ShareAza, Limewire, etc.), which have also come up with their own URL systems (gnutella:// and magnet:// are the two existing ones right now).
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Here is a file
Bobs_Song.mp3 5 M Hash -XXXXXXX
You don't know that I gave you the wrong hash till you're done.
It can only tell you that you have the wrong file, after you have it
If you find a poisoned file in a trusted chain, you can now discount that person, and that entire chain.
Trust should work both ways.
Several unrelated "I got a good file" ratings could give you a cloud of trust. I think it could work.
A P2P program called eDonkey (don't laugh) has partially solved this problem.
In order to download a file, you can use a URI such as (ed2k://|file|The_Adventrues_Of_Pluto_Nash(2002).CD1.FTF.eDKDistro.Sharereactor.bin|559778352|1b153e31f5fdbe829488989d04dda2b1|/). The URI contains the "local filename", size and file hash (a 128-bit, MD4-based hash rather than SHA-1). A companion web site acts as a directory of URIs for popular content. The content is screened by the folks running the site. It has now reached the point where the "pirate" teams have accounts and post these hashed URIs before releasing the content into the wild. Most eDonkey users don't use the embedded search and instead use directories such as ShareReactor.
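As a rough illustration of what a client or directory site has to pull out of such a link, here is a small parser. The example link is reassembled from the fragments quoted in this comment, so treat its exact form as an assumption; the `Ed2kFile` type and the function name are made up for this sketch and are not part of any eDonkey client.

```python
from typing import NamedTuple

HEX_DIGITS = set("0123456789abcdefABCDEF")


class Ed2kFile(NamedTuple):
    name: str       # the "local filename" suggested by the link
    size: int       # file size in bytes
    file_hash: str  # 32 hex digits, i.e. a 128-bit hash


def parse_ed2k_file_link(link: str) -> Ed2kFile:
    """Split an ed2k://|file|name|size|hash|/ link into its three fields."""
    prefix = "ed2k://|file|"
    if not link.startswith(prefix):
        raise ValueError("not an ed2k file link")
    fields = link[len(prefix):].split("|")
    if len(fields) < 3:
        raise ValueError("expected name, size and hash fields")
    name, size, file_hash = fields[0], fields[1], fields[2]
    if not size.isdigit():
        raise ValueError("size is not a number")
    if len(file_hash) != 32 or not set(file_hash) <= HEX_DIGITS:
        raise ValueError("hash is not 32 hex digits")
    return Ed2kFile(name, int(size), file_hash.lower())


# Link reassembled from the pieces quoted above (an assumption).
EXAMPLE_LINK = ("ed2k://|file|The_Adventrues_Of_Pluto_Nash(2002).CD1.FTF.eDKDistro."
                "Sharereactor.bin|559778352|1b153e31f5fdbe829488989d04dda2b1|/")

print(parse_ed2k_file_link(EXAMPLE_LINK))
```

The point of the scheme is that the size and hash travel with the link, so the file name a peer advertises no longer matters.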
The author writes
This is not an unresolvable problem at all; this is where the web of trust comes in. The basic idea is for the publisher to sign the checksum using his or her private key. Others can then verify the signature using the publisher's public key. This allows me to verify, using only a few bytes of information, that a publisher named SecretAgent did indeed publish a file. If I know that SecretAgent has previously published a lot of "good" files, then the file is probably good. If I don't have any experience with SecretAgent, but I do know that PrivateBenji is trustworthy, and PrivateBenji vouches for SecretAgent, then the file is probably good.
The author fundamentally misunderstands webs of trust:
A web of trust is not a "trust rating" à la eBay. A web of trust is a specific group of people who vouch for each other. Creating a malicious group of people who trust each other does not cause problems. (In fact, it can actually help.) If I trust A, based on experience, and if A trusts B, based on experience, then I can probably trust B. The fact that C, D, and E are malicious doesn't cause problems, because neither A nor B trusts them.
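The A/B versus C/D/E example can be sketched as a reachability check over "who vouches for whom". The `vouches` dictionary, the hop limit and the function name are assumptions for illustration; a real system would also verify signatures, which this sketch leaves out.

```python
from collections import deque


def trusted_publishers(vouches: dict, me: str, max_hops: int = 3) -> set:
    """Everyone reachable from `me` by following vouches, up to `max_hops` links."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        who, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not extend trust chains past the hop limit
        for other in vouches.get(who, ()):
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    seen.discard(me)
    return seen


vouches = {
    "me": ["A"],        # I trust A from experience
    "A": ["B"],         # A trusts B from experience
    "C": ["D", "E"],    # the malicious clique vouches only for itself
    "D": ["C", "E"],
    "E": ["C", "D"],
}
print(sorted(trusted_publishers(vouches, "me")))
```

However many members the malicious clique adds, none of them becomes reachable unless someone I already trust vouches for one of them.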
Distributed trust and peer review are fine and good but not even needed for the simple task at hand.
Look at the warez scene to see how it goes. A handful of release groups whose names are known to everybody who is even vaguely interested is sufficient to ensure supply. If these groups are attacked by fake releases (rarely happens) they can use hash keys as you suggest (some already do).
Websites like www.sharereactor.com also safeguard against fakes - another mechanism which is strong enough to defeat the entire problem by itself.
What I am saying is that distributed moderating à la slashdot will not evolve. Instead, we will have a handful of "authorities" - Web sites or public keys - that everyone trusts.
Note that authority - when not combined with power - is a Good Thing (TM).
If they get to poison the networks, then that means that they are using the networks --just as we are.
I wonder what would happen if some ordinary user did the same things? Right or wrong?
Dealing with the problem this way is far better than using the law because it is hard to define the law in a way that makes good sense for everyone long term particularly when we don't yet know how P2P could benefit us all.
Besides, they can place any number of promotional information into their files just as easily as they can garbage and they should. Why not? They might even be able to write off more of the expense.
What the media companies need is good marketing. They are the content source. (for now) All they need to do is add value in ways that leverage the network effect that P2P offers and they *will* make money.
Anyway, the result of this is likely not all bad because file sharing will get somewhat marginalized, we all preview before we download large files and everyone is reasonably happy and free to use the net in creative ways.
Blogging because I can...
Popular files are more likely to be valid. Poison is less likely to be popular. Poison sinks to obscurity.
Public key encryption's been around for quite a while.
Just give moderators private keys, and distribute the public keys. Bingo! Authenticated moderation...
What's this Submit thingy do?
One could keep a trusted block signature for each file. Say you have a signature file that has one MD5 for each x bytes of the file. This signature file and its MD5 hash are the identity of the file. One would then choose to download this signature file before the file itself, and then download the blocks of x bytes from the file in a randomised order, possibly from different nodes. I guess this would add some otherwise unneeded downloads, but it would help to restart stopped downloads and would detect poison nodes easily.
Too bad I am so late in posting this...
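A minimal sketch of that scheme, under some assumptions of mine: each source is modelled as a function that returns the bytes of block number `i`, the signature file is just a list of hex MD5 digests, and a source is dropped after its first bad block. None of this is an existing client's protocol.

```python
import hashlib
import random


def make_block_signature(data: bytes, block_size: int) -> list:
    """One MD5 hex digest per block of `block_size` bytes."""
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def signature_id(block_hashes: list) -> str:
    """MD5 of the signature file itself, used as the identity of the file."""
    return hashlib.md5("\n".join(block_hashes).encode("ascii")).hexdigest()


def fetch_verified(block_hashes: list, sources: list, rng=None):
    """Fetch blocks in random order from random sources, checking each block.

    Returns (file_bytes, indexes_of_sources_caught_serving_bad_blocks).
    """
    rng = rng or random.Random()
    order = list(range(len(block_hashes)))
    rng.shuffle(order)
    blocks, bad = {}, set()
    for idx in order:
        candidates = [s for s in range(len(sources)) if s not in bad]
        rng.shuffle(candidates)
        for s in candidates:
            block = sources[s](idx)
            if hashlib.md5(block).hexdigest() == block_hashes[idx]:
                blocks[idx] = block
                break
            bad.add(s)  # poison node detected: never ask it again
        else:
            raise IOError(f"no remaining source had a valid block {idx}")
    return b"".join(blocks[i] for i in range(len(block_hashes))), bad
```

A poisoned node costs at most one wasted block before it is dropped, and an interrupted download can resume from whichever blocks already verified.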
[]'s Victor Bogado da Silva Lins
^[:wq
Is to create a network specifically dedicated to trading, say, opensource code, research papers, personal public diaries, and the like.
(Bye bye, karma) I may sound like a troll, but at least I'm being honest.
Peer-to-peer filesharing has a great deal of potential, but if its only popular use is piracy, well, we already get enough bad press, don't we? It'll only get worse.
(Sorry about the soapbox I'm standing on...)
What's this Submit thingy do?
I wonder if the author has considered that the primary applications of this work are probably not in influencing file-sharing networks so much as in politics. The P2P network that first comes to mind is ordinary web access within China. This is a situation where the government has an active interest in preventing any politically sensitive information from being propagated within the country, and so the ideas of this paper are directly applicable.
I'll leave the relevant ethical issues as a matter of discussion -- but I would suggest that this is a far more serious reason to be concerned about corporate research into network interruption.
Since this website that collects strong file checksums, descriptions, etc. is now a centralized location (as opposed to P2P, which isn't centralized), could the website fall under legal attack for aiding and abetting the illegal activity of swapping copyrighted material? Just curious...
"Facts are meaningless. You could use facts to prove anything that's even remotely true." - Homer Simpson
I see two problems with this idea.
-or-
TodayTM BillyJoelTM GoogleTMd for StitchTMes due to WindowsTM while RollerbladeTMing with an AppleTM and a PopsicleTM
Even simpler than all these attack strategies: simply produce the product the way customers want it.
Enough people will defect to the faster, more direct, legitimate servers, where they can get the whole album and a movie in 2 hours instead of 2 weeks. The price should be good enough to encourage this.
P2P networks rely on enough users mirroring enough copies of enough products. Reduce the user base and the number of nodes drops until it just doesn't work anymore.
You can see this on the unpopular P2P networks now.
So either you will end up with:
1. a few users sharing lots of files (which can be picked off with civil copyright laws).
2. a few users sharing few files (which means they can't find the files they want on the network, so are less likely to be running a P2P just to support other users, so the number of people spirals down).
The one thing I don't think you will end up with is many people legitimately downloading and then sharing the files. Quite simply, you would eat up your bandwidth using P2P which you need to do the downloading.
Another factor is charging: many ISPs are moving to a download limit, e.g. TOnline is moving to a 5GB limit per month, then you pay 1.5 cents per MB.
So a movie would cost $7 to download after you've used up the first 5GB. Or for that matter to upload to another user!
So you could pull maybe 7 movies a month on the flat fee.
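The arithmetic behind those figures can be checked with a quick sketch. The 700 MB movie size and the 1 GB = 1000 MB convention are assumptions, not numbers from the post; the price and the 5GB allowance are as quoted.

```python
CENTS_PER_MB = 1.5        # overage price quoted above
ALLOWANCE_MB = 5 * 1000   # 5 GB flat-fee allowance, assuming 1 GB = 1000 MB


def overage_cost_dollars(size_mb):
    """Cost of moving `size_mb` once the allowance is used up."""
    return size_mb * CENTS_PER_MB / 100


def size_for_cost(dollars):
    """File size in MB that a given overage charge pays for."""
    return dollars * 100 / CENTS_PER_MB


def movies_within_allowance(movie_mb):
    """Whole movies that fit inside the flat-fee allowance."""
    return ALLOWANCE_MB // movie_mb


print(movies_within_allowance(700))   # movies per month on the flat fee
print(overage_cost_dollars(700))      # dollars for one more 700 MB movie
print(round(size_for_cost(7)))        # MB that a $7 charge corresponds to
```

With a 700 MB rip, seven movies fit in the allowance and each further one costs $10.50; the $7 figure corresponds to a file of roughly 467 MB. Either way, the charge applies to uploads as well as downloads.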
A lot of users on P2P systems will disappear as this becomes the norm.
So P2P is really just a temporary problem for copyright holders, just as long as they get their legitimate sales systems in place and don't go pissing off the consumers with DRM, funny licenses etc.
- First they ignore you, then they laugh at you, then ???, then profit.
Mozart is not, but any random recording of a Mozart piece is.
Look at DVDs... provide so much material that it is more work pirating than it is to buy. Why does a DVD cost the SAME as a CD? Last time I checked, a movie was SIGNIFICANTLY more expensive to produce than an ALBUM, and yet DVDs sell for the same or LESS, and quite often contain the BLOODY soundtrack as well. If a CD included multimedia stuff, editing room floor tracks, useless bio info and oodles of extra crap at a reasonable price, it would be more trouble to rip it than it would be to buy it. When the RIAA wakes up and realizes that, maybe, just maybe things will turn around; otherwise, one way or another, the industry is dead. The MPAA is actually beginning to come around, slowly and not without a FIGHT, but they are evolving. I don't hold out the same hope for the record industry.
errr....umm...*whooosh* *whoosh* Is this thing on ?
After reading this and some of the comments from the old posting, I realised the MD5 hash is not a bad approach. When a client scans its HD, it creates MD5 checksums of its files. When someone requests a file, the checksum is sent with the reply. When the file is d/l'ed, the checksum is checked. If the checksum fails, the user is notified and can either retry the d/l or accept it. Afterwards they can test the file. If the file is corrupt despite a valid checksum, the client can store the checksum and filter it from future requests; these bad checksums can also be shared to keep others from d/l'ing it as well. This system could still be temporarily defeated by having many versions of the same file, but that could be tested for as well (too many bad files flags a bad host, etc.).
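A rough sketch of that client-side flow, assuming search results arrive as `(host, filename, checksum)` tuples. The class and function names are hypothetical and no real P2P client's API is implied.

```python
import hashlib


class ChecksumFilter:
    """Tracks checksums of files the user judged corrupt (hypothetical helper)."""

    def __init__(self):
        self.bad_checksums = set()

    @staticmethod
    def checksum(data):
        return hashlib.md5(data).hexdigest()

    def verify_download(self, data, advertised):
        """True if what arrived matches the checksum sent with the reply."""
        return self.checksum(data) == advertised

    def report_bad(self, checksum):
        """The user tested the file and found it corrupt: remember its checksum."""
        self.bad_checksums.add(checksum)

    def merge(self, shared_bad_checksums):
        """Import bad checksums shared by other users."""
        self.bad_checksums |= set(shared_bad_checksums)

    def filter_results(self, results):
        """Drop search results whose checksum is known to be bad."""
        return [r for r in results if r[2] not in self.bad_checksums]


def suspicious_hosts(results, bad_checksums, threshold=2):
    """Hosts offering at least `threshold` known-bad files get flagged."""
    counts = {}
    for host, _name, checksum in results:
        if checksum in bad_checksums:
            counts[host] = counts.get(host, 0) + 1
    return {host for host, n in counts.items() if n >= threshold}


flt = ChecksumFilter()
flt.report_bad("abc")
flt.merge(["abc2"])
results = [("h1", "a.mp3", "abc"), ("h2", "a.mp3", "def"), ("h1", "b.mp3", "abc2")]
print(flt.filter_results(results))
print(suspicious_hosts(results, flt.bad_checksums))
```

Note that the filter only helps once somebody has tested a given checksum, which is exactly the "many versions of the same file" weakness mentioned above.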
I sig therefore I am...
A P2P program called edonkey (don't laugh) has partially solved this problem.
I'd hate to see the kinds of porno AVIs that get traded on a P2P program named "edonkey"! (shudder). At least there isn't one called FistOfFiles.exe yet.
GMD
watch this
That's the key point right there. The paper that the article was based on used the analogy of a pond being polluted. Well, there are good analogies and there are bad analogies, and a fishing pond is not a very good analogy here, because a P2P network is much more like a swimming pool with not one or two, but millions of high-powered filters. A standard filter and chlorine/ozone system on a swimming pool can remove enormous amounts of excrement. A pool with a million filters is going to require a hell of a waste stream to pollute for any length of time. Given that these filter systems are also the water inlets for the pool, the task of polluting the majority of the water for any length of time is problematic at best and unlikely to succeed.
Anybody could have seen that coming. Were I president of the US, I would have skipped it as well.
Information wants to be anthropomorphized.
Do GPG signatures on blocks (about 50-100k) of files instead of entire files. When the checksums on blocks of a file contradict each other, alert the user that someone is a liar. Take all the results of the search for that file, and all the GPG signatures, and present the user with two options, each scored by the sum of its signers' trust levels. Most files can be previewed to check if they are bogus, and the user can blacklist anyone that even trusted that host, and their IPs as well. From then on, none of those IPs will be allowed to connect to this host. Eventually, they'll exhaust their IP supply before they end piracy.
:) (A per file rating instead of a per host rating)
Obviously the user would get to select the appropriate action if one of the files is just better than the other, with a rating mechanism as well.
Other advantages to this method are:
*Checksums can't be faked except in NP time. (use a random block size to thwart a super computer precalculating bad blocks that MD5 to the right hash... use multiple hashes)
*Multiple host download is guaranteed to be the same file (even when being poisoned).
*A computer need not have the entire file to share a block of the file, therefore files propagate through the network in a more exponential manner. (Host A gets block 1 from B. Host C gets block 2 from B. Hosts C and A trade blocks 1 and 2. Host D comes along and wants the same file, and can download from A and C instead of bogging down B. This works even better because all connections that I've seen are duplex, even if they have a slower upstream. Conserve network bandwidth by referring downloaders to other people who have downloaded before... search for the GPG signature of the hosts on the network.)
Overall, I see this kind of thing being implemented very soon because it's not that difficult, and it's pretty obvious. Maybe the next edition of Gnutella will support this.
Of course there are loopholes where the RIAA/MPAA could buy half a million IP addresses or have a lot of computers on the network, but you don't have to have an unbreakable system, just a system that costs more to break than they think they will see in profits from breaking it.
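The "sum of their trust levels" step might look roughly like the sketch below. It assumes GPG signature checking has already happened elsewhere; hosts, trust levels, and block hashes are made-up illustrative values, and `group_by_version` is a hypothetical name.

```python
from collections import defaultdict


def group_by_version(results, trust):
    """Group search results by their list of block hashes and sum signer trust.

    `results` is a list of (host, block_hashes); `trust` maps host -> trust level.
    Returns (total_trust, block_hashes, hosts) tuples, most trusted first.
    """
    totals = defaultdict(int)
    hosts = defaultdict(list)
    for host, block_hashes in results:
        key = tuple(block_hashes)
        totals[key] += trust.get(host, 0)  # unknown hosts contribute nothing
        hosts[key].append(host)
    options = [(totals[k], k, hosts[k]) for k in totals]
    return sorted(options, key=lambda option: option[0], reverse=True)


def conflicting_blocks(version_a, version_b):
    """Indices of the blocks on which two versions disagree."""
    return [i for i, (a, b) in enumerate(zip(version_a, version_b)) if a != b]


results = [
    ("alice", ["a1", "b2", "c3"]),
    ("bob", ["a1", "b2", "c3"]),
    ("mallory", ["a1", "ff", "c3"]),
]
trust = {"alice": 9, "bob": 6, "mallory": 2}

options = group_by_version(results, trust)
if len(options) > 1:
    print("someone is lying about blocks",
          conflicting_blocks(options[0][1], options[1][1]))
for total, _hashes, signers in options:
    print(total, signers)
```

The user still makes the final call; the sketch only ranks the competing versions and pinpoints which blocks are in dispute.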
Karma Clown
This is faulty reasoning. Once 80% of the nodes are poisoned (and probably far fewer), it means that users looking for illegal mp3 files will stand only a 1 in 5 chance of getting something that isn't worthless. How many times do you think people are going to subject themselves to this before deciding that it's just too much trouble? It's a clever solution, because it's using the very trait that makes P2P so attractive (P2P caters to convenience, and by extension, laziness), to render it wholly ineffective for its intended purpose.
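To put numbers on the "1 in 5" point: a small sketch, assuming (and this is only an assumption) that every download is poisoned independently with the same probability.

```python
def expected_attempts(poisoned_fraction):
    """Mean number of downloads until the first good copy (geometric distribution)."""
    return 1.0 / (1.0 - poisoned_fraction)


def chance_all_bad(poisoned_fraction, attempts):
    """Probability that every one of `attempts` downloads is poisoned."""
    return poisoned_fraction ** attempts


for attempts in (1, 3, 5, 10):
    print(attempts, round(chance_all_bad(0.8, attempts), 3))
print(round(expected_attempts(0.8), 2))
```

At 80% poison a user needs about five downloads on average per good file, and roughly one user in three is still empty-handed after five tries.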
In particular, our analysis of the model leads to four potential strategies, which can be used in conjunction:
1. Randomly selecting and litigating against users engaging in piracy
2. Creating fake users that carry incorrectly named or damaged files
3. Broadcasting fake queries in order to degrade network performance
4. Selectively targeting litigation against the small percentage of users that carry the majority of the files
This mostly summarizes the war on drugs and the government's strategy during alcohol prohibition in the 1920s. Neither worked, and the countermeasures are simple and straightforward.
A "directed" web of trust, objective quality measurement, and knowledge compartmentalization defeat the above strategy. The countermeasure of creating large numbers of mutually trusting attackers doesn't work when trust "flow" is taken into account. The keys to such a system are:
1) trust is asymmetric
2) nodes define and change who they trust based on their own assessments
3) Nodes protect their knowledge of the web of trust
To see how this works, consider the cops and the drug dealers. The fact that the cops all trust each other does not result in the drug dealers trusting them. When a dealer is compromised, no matter how high up the chain it goes, trust shifts to rivals. Even when a kingpin falls, lines of trust will still exist that aren't compromised.
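A toy sketch of asymmetric trust "flow", using the cops-and-dealers example. The decay factor, hop limit, and graph are illustrative assumptions, and `trust_scores` is a hypothetical helper rather than any real web-of-trust implementation.

```python
def trust_scores(trust_edges, me, decay=0.5, max_hops=3):
    """Propagate trust outward from `me` along directed edges only.

    `trust_edges` maps a node to the nodes it trusts. Trust flows from the
    truster to the trusted, never backwards, and halves with every hop.
    """
    scores = {me: 1.0}
    frontier = [me]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for trusted in trust_edges.get(node, ()):
                if trusted not in scores:
                    scores[trusted] = scores[node] * decay
                    next_frontier.append(trusted)
        frontier = next_frontier
    return scores


edges = {
    "me": ["dealer_a"],
    "dealer_a": ["dealer_b"],
    # The cops all vouch for each other, and cop1 even claims to trust dealer_a.
    "cop1": ["cop2", "cop3", "dealer_a"],
    "cop2": ["cop1", "cop3"],
    "cop3": ["cop1", "cop2"],
}

scores = trust_scores(edges, "me")
print(scores)

# If dealer_a is compromised, I drop that edge and shift trust to a rival.
edges["me"] = ["dealer_c"]
print(trust_scores(edges, "me"))
```

However densely the cops sign each other, none of them appears in my scores, because no chain of trust that starts with me ever leads to them.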
Drug dealing is not as popular as file sharing, is substantially more damaging to people's lives and society, and has motivated levels of funding that are not matchable by publicly traded firms (who must demonstrate at least mid-range ROI). Despite all of these advantages, the war on drugs has been a dismal failure. The bottom line is that the internet makes distribution of content a commodity, where it was formerly a task of enormous complexity and added value. Economics will determine the rest, unless the US adopts and maintains a totalitarian government.
You make it sound like the RIAA/MPAA is conducting a criminal enterprise by negotiating contracts between artists and themselves, and then risking the capital required for the production, promotion, and distribution. Those THIEVES!
The $15-$20/CD argument is a smokescreen. Not only do most consumers have the option to get CDs at fairly reduced prices (through a mail-order club), if they object to the price, they are free to keep their money, and let the RIAA/MPAA keep its property.
Here's another way to look at the problem: the physics of evolution. If we can treat p2p as an ecosystem, we can apply the same types of energy balances. The paper isn't talking about extinction of p2p, it's talking about a change in the observable patterns it exhibits. Because stressing a network can't eliminate p2p, a new one will pop up in its place. If you treat user demand as "free energy", the most stable state of those users is in sharing. Fundamentally, when you stress an ecosystem, it can "fail" in that the species in it aren't the same, but new ones pop up. The dinosaurs went extinct, but here we are!
You are probably correct as far as how things will play out in the real world (fewer sources of authority, but well-known and trusted sources), simply because of how the background social networks that currently exist can be used as a bootstrapping mechanism by the trusted source solution. Part of my original point is that this solution, as long as multiple sources of authority are allowed to exist, is a part of the general distributed trust solution to the original problem. Distributed trust can be "client-server", "peer-to-peer" or some hybrid of the two.
You only have to take a look around the real world to see that reputations are an efficient and attack resistant mechanism for allowing untrusted parties to exchange info/goods/services. Credit ratings, movie ratings, "best of" lists, gossip, etc. We are surrounded by and enmeshed within distributed trust and reputation systems so completely that most people do not even realize how many times a day they use such a system.
The break-even point on this curve is actually not that far below the so-called outrageous amounts they are currently charging. I did a research paper on this topic last year for college, and had to admit that while CDs are in fact overpriced, they are nowhere even close to the amounts that you are claiming. Yes, they are far above their own costs for the media -- but they aren't that much above their costs if you factor in media piracy as lost revenue. Whether or not they ever would have seen the money is beside the point -- if you smuggle someone into a theatre to see a movie without them paying for a ticket, even if there *are* a lot of empty seats, you are still considered to be depriving the theatre of revenue as well.
Anyways, you know the way our market works, right? Everyone charges as much as they possibly can and still be able to convince some percentage of the people to buy the product. Every once in a while you find a philanthropic soul who will charge a modest amount above his own costs, but come on people! This is the real world... you can't seriously _expect_ a majority of people to run businesses like that. Heck, if most of them did, they probably wouldn't last more than a year!
File under 'M' for 'Manic ranting'
What's worse, it's very difficult to identify bad files automatically, because different rips of the same original can have different checksums, so the poisoners can spread lots of versions with different checksums, so you can't tell whether two files claiming to be a 128kbps ogg of "Whoops I Cloned It Again" came from the same original, only that they're not the same, so you have to listen to the thing all the way through to be sure that it doesn't suddenly turn into an FBI/RIAA/KGB warning against copying music, or a commercial for the CD containing the FM version of the track, or that it doesn't have a lot of low-level static in it. (If I were an artist, I might be more annoyed about the latter.)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
CRCs aren't the only kind of checksum out there, though they're nice and fast. Cryptographic-quality checksums avoid the problems - if you change one bit of the input, they change about half the bits of the output, and it's nearly impossible to predict what the changes will be. MD5 was the most popular for a long time, though SHA1 has been replacing it for a variety of technical reasons. MD5 is 128 bits long, SHA1 is 160, so you don't need to worry about collisions unless you have more than 2**64 or 2**80 files.
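The "change one bit of the input, about half the output bits change" property is easy to observe with Python's standard `hashlib`. This is only a small experiment on random 64-byte messages; the helper names are hypothetical.

```python
import hashlib
import os


def bit_difference(a, b):
    """Number of bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_first_bit(data):
    """Return a copy of `data` with the lowest bit of its first byte flipped."""
    return bytes([data[0] ^ 0x01]) + data[1:]


def average_flipped_bits(hash_name, trials=200):
    """Average number of digest bits that change after a one-bit input change."""
    total = 0
    for _ in range(trials):
        message = os.urandom(64)
        before = hashlib.new(hash_name, message).digest()
        after = hashlib.new(hash_name, flip_first_bit(message)).digest()
        total += bit_difference(before, after)
    return total / trials


for name in ("md5", "sha1"):
    bits = hashlib.new(name).digest_size * 8
    print(name, bits, "bits; average changed:", average_flipped_bits(name))
    # Birthday bound: random collisions only become likely near 2**(bits/2) files.
    print("  collisions matter past about 2**%d files" % (bits // 2))
```

The averages land near 64 for MD5 and 80 for SHA1, i.e. about half of 128 and 160 bits.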
Bill Stewart
Unlike Warez or some lossless compression systems, this doesn't work for audio or for video applications using lossy compression instead of distributing exact copies. The reason is that different compression runs don't need to have identical checksums, depending on your compression parameters, equipment, etc., so the Poisoners can go create lots of different files all claiming to be a rip of the real thing, and they can have multiple identities all claiming to have a version to share, so even if you burn one file and one identity, they can trivially create more. If they're clever, they can do this with very little extra work - each version has identical data except in the last block (448 bits for MD5, I forget how many for SHA1), which is juggled a bit. Since music files are large, this means they can do 99.99% of the work once and only have to repeat the last 0.01% multiple times. GPG signatures on the files don't help much either - they've provided a genuine signature saying that jack12345 and lars6789 both downloaded this file of "Whoops I Cloned It Again" and got checksum 12903849021834, but when you listen to it, it's just Poison singing "Happy Copyright Violation Lawsuit To You" with a burst of noise in the last few milliseconds.
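The "do 99.99% of the work once" trick can be illustrated with `hashlib`'s `copy()` method, which clones the hash state after the shared body has been processed. The 4-byte serial-number tail and the function name are illustrative assumptions.

```python
import hashlib


def make_variants(body, count):
    """Produce `count` files that share `body` and differ only in a short tail.

    The shared body is hashed once; each variant reuses that state via copy().
    Returns a list of (file_bytes, md5_hex) pairs.
    """
    base = hashlib.md5(body)  # the expensive part, done a single time
    variants = []
    for serial in range(count):
        tail = serial.to_bytes(4, "big")  # stand-in for the "burst of noise"
        state = base.copy()
        state.update(tail)                # only the tail is hashed per variant
        variants.append((body + tail, state.hexdigest()))
    return variants


body = b"Happy Copyright Violation Lawsuit To You " * 1000
variants = make_variants(body, 5)
for data, checksum in variants:
    print(len(data), checksum)
```

Every variant gets a different, perfectly valid checksum, so blacklisting one checksum burns only one of an effectively unlimited supply.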
Bill Stewart
This is different - there's no penalty other than your reputation, the Poisoners have a much stronger legal position than anybody who might complain (Hey - I tried to rip off their music and they gave me a Bad Copy!), identities can be created free by robots, reputations for the identities don't take too much work to forge, and there are lots of creative ways to cheat.
Bill Stewart
It's very easy to create a large number of identities in this system, each pretending to be a real person but really just Yet Another Tentacle of the Poisoners. They can all build up great reputations by signing each other's keys, and sending reports into the whoever-archives-reports-about-users system claiming to have done lots of downloads to each other, and they're all listed as having T3 or Ethernet connections so they're very attractive. And they can pump out a large number of files that they've signed, indicating correctly that the checksum on File#12345 is 290384098213 or whatever, for many different files with many different names, all of which are really Poison singing "Happy Copyright Violation Lawsuit To You!" with a different serial-number burst of noise at the end. They can distribute enough non-poisoned songs to create some good genuine reputations, use those to sign peoples' keys and get people to sign their keys, use these reputations to sign the keys of their other tentacles, and start distributing poisoned songs to people who trust them directly or indirectly, using their keys which have been outed as Poisoners to sign the keys of people who aren't tentacles. Even more fun, you can distribute lots of poisoned index data - some P2P systems are much easier to kill that way.
Bill Stewart
But if you *do* have Trusted Third Parties, Poisoners will either attack them technically, sue them, or pretend to be them, or all three. And Slashdot MetaModeration isn't directly applicable to this problem, because the disputed event is private, unlike Slashdot postings which third and fourth parties can look at and decide whether they're really Insightful or Trolls.
Bill Stewart
Bitzi is based on checksumming. After you download a file, you run it through the Bitcollider app to generate a unique checksum which is automatically uploaded to the Bitzi site. Meta-information like ID3 tags, etc. is also extracted from the file if present, and all of this data is combined to create what's known as a "Bitzi ticket." You can vote for the (in)validity of a particular file, and you can also leave comments about a particular file for other users. A ticket can be created for any file, not just MP3s; there are already lots of pornos with Bitzi tickets.
The eventual goal is that, before you take the time to download a file, you'll be able to look up its Bitzi ticket and determine whether or not it's what you're really looking for. If 10 people have already indicated that the file is bogus, corrupted, incomplete, etc. you'll be able to safely skip it without wasting time or bandwidth. In order for this to happen on a broad scale, Bitzi needs more users. It's totally a volunteer community effort; someone has to be the first person to run each file through the Bitcollider and generate the initial ticket. Please visit the Bitzi site, register (I can vouch for the fact that it's possible to register with an @example.com address and still access the site just fine), then run all your shared and/or downloaded files through Bitcollider. The more files that get into the Bitzi system, the better; this includes "bad" files, and in fact ticketing "bad" files is probably more useful than ticketing "good" files.
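The look-it-up-before-you-download workflow can be sketched with a purely local, in-memory stand-in. This is not Bitzi's API or its actual hash format: `TicketStore`, its methods, and the use of plain SHA-1 as the file identifier are all assumptions made for illustration; the threshold of 10 bad votes echoes the example above.

```python
import hashlib


class TicketStore:
    """In-memory stand-in for a central ticket site (hypothetical)."""

    def __init__(self):
        self.tickets = {}

    @staticmethod
    def file_id(data):
        """Checksum used as the ticket key (SHA-1 chosen only as an example)."""
        return hashlib.sha1(data).hexdigest()

    def submit(self, data, metadata=None):
        """Create a ticket for a file if none exists yet; return its id."""
        fid = self.file_id(data)
        self.tickets.setdefault(
            fid, {"metadata": metadata or {}, "good": 0, "bad": 0, "comments": []}
        )
        return fid

    def vote(self, fid, is_valid, comment=""):
        """Record one user's verdict on a ticketed file."""
        ticket = self.tickets[fid]
        ticket["good" if is_valid else "bad"] += 1
        if comment:
            ticket["comments"].append(comment)

    def should_skip(self, fid, min_bad_votes=10):
        """True if enough users have already flagged the file as bogus."""
        ticket = self.tickets.get(fid)
        return ticket is not None and ticket["bad"] >= min_bad_votes


store = TicketStore()
fid = store.submit(b"not really the song", {"title": "Whoops I Cloned It Again"})
for _ in range(10):
    store.vote(fid, is_valid=False, comment="bogus")
print(store.should_skip(fid))
print(store.should_skip(TicketStore.file_id(b"never ticketed")))
```

A file with no ticket is not skipped, which is why ticketing "bad" files matters so much: the store is only as useful as the verdicts people have already contributed.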
Several popular P2P filesharing clients, including BearShare and eDonkey2K, already have built in support for Bitzi tickets. I hope others will follow suit.
Shaun
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
I've separately posted a discussion about how it's easy to create large numbers of files with different checksums pretending to be different audio rips of the same tune. Not only does this flood the typical index system, but if the Poisoners can create lots of users, they can all rate the poisoned files as good, or rate non-poisoned files as bad, and they can probably give themselves great karma by first sending in lots of reports about having successfully shared lots of good files with each other.
Bill Stewart
Do go read about BitTorrent, though - it does use a number of the ideas you've mentioned for efficient distribution.
Bill Stewart
I have no desire to intentionally spam people, but if that ad isn't all in their faces, is it really that bad of a thing? And honestly, Shareaza works 5x as well for me as Gnucleus ever did. Plus, it looks / feels like a mature application....... not a big deal to some, but a trait that I definitely miss from the Napster days of yore.