Reputation System Fights P2P Junk
yeejiun writes "Many of the files that are shared on P2P networks tend to be junk. Organizations such as the RIAA and music labels regularly pollute these networks with nonsense files masquerading as real music/video files. These junk files make it difficult for users to find what they want on such P2P networks. Some researchers at Cornell University have developed a reputation system called Credence, which works on the Gnutella network and allows users to tell the good files from the bad ones."
Quit downloading crap off of Kazaa/Grokster/Morpheus/etc. Don't trust brittneyspearsporno.avi.mpeg.exe
Lameness filter thwarted.
I thought the primary purpose of P2P filesharing was to share legally swappable media files as well as other files like documents and useful freeware applications. Is there some nefarious entity flooding the P2P networks with garbage disguised as those files above? Why would you need to know a file's reputation?
Jesus saved me from my past. He can save you as well.
If the RIAA is willing to create junk files, do you really think they won't create fake accounts to rate their junk files as "good"? ANY system you put in place that gathers "votes" from users can be manipulated.
Doesn't the eDonkey2000 network already have a system like this? Users identify fakes and report them, then the phony file information propagates throughout the network and the fake file dies.
Don't you mean tell the real illegal files from the fake illegal files? Seriously, it is no surprise to me that P2P has gotten a bad rap. Many of the users simply use P2P apps to commit piracy.
Yes, there are legit uses as well. But honestly, if you are looking for free music from a band that has released it as such, you can usually find it. It's the copyrighted commercial music and video that have tons of fake files, porn movies, etc., not Jim Blow Sings the Blues, Live from Natrona, PA!
If a file appears to be RIAA-affiliated music, treat it as a junk file.
Why bother with music the artist doesn't want you to have? Just forget about it altogether and discover new music, even new types of music that you'd never realize existed, much less that you could enjoy.
For those of you that can't be bothered to RTFA, this system takes a profile of how you vote on files and matches you with other people who voted similarly. Thus, the spammers would see different ratings than 'normal users.'
Illegal? Samir, this is America.
The fact that I didn't get to play HL2 was made up for by the 2 hours of dwarf porn.
-knowles
I like this idea. Media hordes, read as RIAA and MPAA, will constantly try to find technical ways to put the P2P genie back in the bottle.
/. mobs will just mock them.
For every Napster (Kazaa, etc.) they close, another will be spawned. For every fake or intrusive system they create to battle downloaders, another downloading method will be innovated. For every commercial they feature a celebrity crying copyright heresy,
It's no shattering concept that there'll never be a checkmate for either side.
Some aim to please, I aim to tease.
I think the main insight and contribution of the system is that a peer's reputation, from your point of view, is determined by whether he/she votes the same way you do.
So if the RIAA starts spamming Gnutella with lots of junk, you will never vote the same way as the RIAA dummy accounts, so you won't take their votes into account.
In fact, it seems the system is even smarter than that: it can take votes from people who are strongly anti-correlated with you (who consistently vote the opposite way) and use them as negative information. So any file these people vote as valid, you can treat as garbage, since their definition of good/bad files is the exact opposite of yours. And assuming you trust your own judgement, that means those files must be bogus.
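The vote-matching idea above, including treating anti-correlated peers as negative evidence, can be sketched in a few lines. This is a minimal illustration under assumed conventions, not Credence's actual code or formula: votes are assumed to be +1 ("authentic") or -1 ("bogus"), agreement is measured with a plain Pearson correlation over the files both peers voted on, and the names `correlation`, `estimate`, and the sample file IDs are hypothetical.

```python
from math import sqrt

# Hypothetical vote format: file ID -> +1 ("authentic") or -1 ("bogus").
Votes = dict[str, int]


def correlation(mine: Votes, theirs: Votes) -> float:
    """Pearson correlation of two peers' votes over the files both voted on.

    Returns 0.0 when there are fewer than two shared files or when either
    side voted identically on all of them (no variance to correlate).
    """
    common = sorted(set(mine) & set(theirs))
    if len(common) < 2:
        return 0.0
    xs = [mine[f] for f in common]
    ys = [theirs[f] for f in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def estimate(file_id: str, mine: Votes, peers: dict[str, Votes]) -> float:
    """Sum of peers' votes on file_id, each weighted by its correlation with mine.

    An anti-correlated peer gets a negative weight, so its "authentic" vote
    counts as evidence that the file is bogus.
    """
    return sum(correlation(mine, votes) * votes[file_id]
               for votes in peers.values() if file_id in votes)


mine = {"a": 1, "b": -1, "c": 1, "d": -1}
peers = {
    "like_me": {"a": 1, "b": -1, "c": 1, "d": -1, "x.mp3": 1},
    "opposite": {"a": -1, "b": 1, "c": -1, "d": 1, "y.mp3": 1},
}
print(estimate("x.mp3", mine, peers))  # 1.0: a like-minded peer vouches for it
print(estimate("y.mp3", mine, peers))  # -1.0: an anti-correlated peer vouches for it
```

A positive estimate suggests the file is real and a negative one suggests junk. A peer whose votes are unrelated to yours gets a weight near zero, which is how dummy accounts that never vote like you would be ignored.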
Reminds me a lot of Google's PageRank system, but with explicit learning/training instead of using back-links to determine correlation.
True... But a bogus torrent usually doesn't survive too long and certainly doesn't see too many seeders. If it's been up for a day or two you can be reasonably sure it's valid.
Also, even the "pirate" torrent sites are centralized and often even have administrators, sometimes even comment boards. If a torrent is bogus, someone will take it down. (Not that I've been to those sites, of course...)
Of course this could all be manipulated, but AFAIK it hasn't been yet by the powers-that-be... And I don't see why they'd bother, when a threatening letter is all it usually takes to take a torrent site down, and it would take considerably more effort than turning a bunch of scratchy MP3s loose on Kazaa.
The research and motivation for this is important. If peer to peer networks can be subverted, then they have lost their usefulness. IMO, the sharing of copyrighted data is unavoidable, and sacrificing the freedom of a protocol in an attempt to prevent it is shortsighted.
It probably would have been better for Cornell if it had been left as a paper rather than implemented.
"A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
Many hardcore file sharers and hosters, dare I say most who would call themselves hardcore, are not in it for getting free content on demand when they want it. They are into collecting absolutely anything and everything they can get their hands on. Some collections hold more music and movies than their owners could possibly listen to or watch in a lifetime. But just the thought of having it makes many hoarders happy. And it isn't necessarily about reputation among others. In many cases it is, but not always. They just have to have it.
What's my point? Well, this is the greatest strength and weakness of peer to peer. Hoarders ensure a healthy flow of files, but they rarely actually check what they have. They don't check whether the software works, whether the music is a complete copy, or whether the movie was cut down to a quarter of the original screen size.
This is what companies take advantage of, both those who want to hurt swapping and those who just want to seed files for the purpose of installing some evil spyware. It's nice to have a bunch of people trying to seed the masses, but c'mon, the point of file sharing is to pool our independent resources. For someone who doesn't have all day to search for files and test quality and whatnot, it is sometimes less painful to just go buy the CD than to try to download it amid the mess of files that are out there.
"All great wisdom is contained in .signature files"
Many, many companies (and individual artists) have faced SERIOUS economic damage from attempts to keep P2P from being absolutely ubiquitous and maximally effective. Estimates are in the BILLIONS of dollars (US only) of lost sales in broadband connections, blank media discs, large hard disk drives, software support, consulting fees, home audio/video equipment, and the like. And Western countries are fast falling behind as the majority of educated citizens from developing nations take advantage of the black market for these goods and services while Western citizens are blocked in droves by propaganda, political corruption, inferior substitutes, and FUD from fully participating in the open exchange of science, the arts, political discourse, and culture in general.
Credence will hopefully bring us a bit closer to reaching our current potential.
This may automate the reviewing process, but it can also be gamed:
1. Mark a bunch of good files as good
2. Mark your bogus file as good
3. Spread your vote list across a zombie network
4. Your votes correlate highly with "good files", and there are no counter-votes by others (yet)
5. Trick lots of people into downloading it (the rating goes to shit eventually, but...)
6. New bogus file. Goto 1.
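The attack steps can be played out in a toy model. This sketch swaps in a simpler agreement score (the average product of ±1 votes over shared files) instead of whatever Credence really computes, and all names and file IDs are hypothetical; it only shows why copying honest votes on good files buys zombie accounts weight, not how the real system would respond.

```python
# Toy model of the attack; names are hypothetical and the scoring rule is a
# simplification, not Credence's actual algorithm.
Votes = dict[str, int]  # file ID -> +1 ("good") or -1 ("bogus")


def agreement(a: Votes, b: Votes) -> float:
    """Average vote product over shared files: +1 = always agree, -1 = always disagree."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    return sum(a[f] * b[f] for f in common) / len(common)


def score(file_id: str, me: Votes, peers: list[Votes]) -> float:
    """Sum of peers' votes on file_id, each weighted by agreement with me."""
    return sum(agreement(me, p) * p[file_id] for p in peers if file_id in p)


good_files = [f"hit{i}" for i in range(10)]
me: Votes = {f: 1 for f in good_files}  # I've vetted ten real hits

# Steps 1-3: each zombie marks the same good files as good, marks the
# bogus file as good, and the vote list is copied to 50 zombies.
zombie: Votes = {**{f: 1 for f in good_files}, "fake1.avi": 1}
peers = [dict(zombie) for _ in range(50)]

# Step 4: zombies agree with me on every shared file; no counter-votes yet.
first = score("fake1.avi", me, peers)  # 50.0

# Step 5: I download fake1.avi and vote it bogus...
me["fake1.avi"] = -1

# Step 6: ...but the zombies still agree with me on 10 of 11 files,
# so their votes for the next fake keep most of their weight.
for p in peers:
    p["fake2.avi"] = 1
second = score("fake2.avi", me, peers)  # 50 * 9/11, about 40.9
```

Under this toy rule, one honest counter-vote only dents each zombie's weight (from 1.0 to 9/11), which is why step 6 can simply loop back to step 1.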
In addition, you have an issue with semi-good files. What if the encoding is flawed? Should you mark it as bad or good? Either case can put you at odds with the general opinion.
Third, you have an issue with files trolling for incorrect votes. Create a "non-obviously" bogus file, which some people will mark bad and others good. You'll create a lot of conflicting votes and "noise" in the system, making attacks like the one above possible.
Kjella
Live today, because you never know what tomorrow brings
Who actually searches for files in the P2P client? Normally you visit some site where the releaser himself posted a torrent or an ed2k link and you download that.
I can't remember the last time I actually searched in eMule.
Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
Couldn't agree more. P2P is superb stuff and has all kinds of legit uses, but to pretend that it's not 95% used to download copyrighted music and movies and thus save a few bucks is just denial.
There are far too many Slashdotters who reply to any article on copyright with "get with the system, dude! copyright is over!" Usually they seem to be 13-year-old kids who don't understand what it's like to have your income and career based on developing electronic products.
Do people really think that Lord of the Rings deserved to sell just 1 copy, to the p2p hacker who ripped it?
DRM-free indie games for the PC and Mac: Positech Games
Can this also be used as a metric for the RIAA and MPAA to decide which people to take legal action against? Go for the most trusted, most highly rated individuals and take out the most influential (central? critical?) nodes. In the same way that cliques of poisoners would stand out.
Xix.
"Everything is adjustable, provided you have the right tools"
I disagree that these scientists are breaking any *legitimate* law, but if you accept as a premise that they are, then they are in fact breaking the law using taxpayer dollars.
Instead of modding that down it should be modded up so more people can discuss the ramifications.
Do we allow taxpayer dollars to be spent on civil disobedience? On that issue, I am very unsure.
--- Grow a pair, liberals... stop letting the Republicans bully you!
The system seems like a tool to use against the RIAA/MPAA to block pollution efforts. But then the other shoe drops, and the RIAA/MPAA has a tool to target the highest-ranked nodes/cliques/people. No longer do they need to figure out how many files you have.
They just have to find one file, extrapolate your rank to the average system rank, run a few numbers (and maybe a few inflated costs in there too), and bam... for sharing Happy Birthday To You.mp3, you get slapped with a $1 million infringement case because you happen to rank as a very high legitimate link.
On the other hand, this might be beneficial in taking the heat off the majority of the file-trading community that honestly is NOT costing them any money. They don't need to target the casual "weekend downloader", whose rank should be significantly lower (being a new node on the network) than some guy with four 160 GB HDDs of the latest theater and DVD releases. Nobody feels sorry when these guys (or gals) get busted. When 14-year-old choir girls get busted, there is PR hell to pay. This system allows them to target the heavy sharers instead.
Didn't RTFA, but that's my first impression. A use to boost network quality, a use to increase (not decrease) the reach of the **AA's, and a use that may help both sides.
"Every tool has at least 2 completely unassociated uses. A spoon can serve food to your mouth, or gouge the eyes out of your enemies." - Me
I8-D
And I don't see why they'd bother, when a threatening letter is all it usually takes to take a torrent site down
That's not really true. Depending on where the site is hosted, legal threats could be more humorous than scary.
Case in point.
Btw, if you've got a few minutes to kill, you should really check out some of the emails to and responses from thepiratebay.com. They are hilarious!
- Fake files. This is clearly a more primitive tactic and can be thwarted by clients that can be set to download the beginning of a file first, so you can preview it before committing to the rest.
- Incomplete files. The seeder reports having the entire file, but will never deliver certain parts of it. Thus, downloaders get stalled at 98.5%. And it's amazing how long people will wait for that last bit.
- Fake seeds. Haven't confirmed how this one works, but sometimes you'll see a torrent with an improbable number of seeders (e.g., 300 seeds and 100 leechers for a fairly new torrent). Lots of seeds attract more people.
- Timing. For example, demand for a movie will rise in the days shortly before its release. If you get your fake tracker up and running during that critical time before there's a real pirate version out, then you'll attract downloaders and waste their time. And there's a snowball effect: when people go to download from BT, all things being equal, they usually go for the tracker that has the most people on it.
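The "download the first parts first" countermeasure against fake files comes down to piece-selection order. BitTorrent clients conventionally request the rarest pieces first to keep the swarm healthy; the sketch below contrasts that with a sequential order that lets you preview the start of a file and abandon a fake early. The function names and the preview threshold are hypothetical, and the model is simplified (no endgame mode, choking, or partially downloaded pieces).

```python
from collections import Counter
from typing import Optional


def pick_piece(have: set[int], peer_bitfields: list[set[int]],
               sequential: bool) -> Optional[int]:
    """Choose the next piece index to request, or None if nothing useful is offered.

    have: piece indices we already hold.
    peer_bitfields: for each connected peer, the piece indices it advertises.
    sequential: if True, take the lowest missing index so the start of the
        file arrives first and can be previewed; otherwise use rarest-first.
    """
    # How many peers advertise each piece.
    availability = Counter(i for bits in peer_bitfields for i in bits)
    candidates = [i for i in availability if i not in have]
    if not candidates:
        return None
    if sequential:
        return min(candidates)
    # Rarest-first: fewest peers offer it; ties broken by lowest index.
    return min(candidates, key=lambda i: (availability[i], i))


def preview_ready(have: set[int], first_n: int) -> bool:
    """True once pieces 0..first_n-1 are all present (hypothetical preview threshold)."""
    return all(i in have for i in range(first_n))


peers = [{0, 1, 2, 3}, {0, 1, 2}, {1, 2}]
have = {0}
print(pick_piece(have, peers, sequential=True))   # 1
print(pick_piece(have, peers, sequential=False))  # 3 (only one peer has it)
```

The trade-off is that sequential downloading makes every peer chase the same early pieces, leaving the rare ones scarce, which is why rarest-first is the usual default and sequential order is only worth it when you want to vet a file quickly.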
Combine the tactics, and you've got a serious problem. Every user adds to the strength of the distribution network, so tying up one client with a fake not only prevents that client from getting the material, it also keeps that client from helping others get it.
If you're patient, persistent, and knowledgeable, you can avoid or minimize the impact of these spoofing tactics. But patient, persistent and knowledgeable don't really describe the average pirate (or just about anyone else, for that matter). The dedicated pirate simply won't be stopped, and the content producers know this.
Like you, I once assumed that the various forms of moderation on the torrent sites would mitigate this. But the countermeasures are slow to work; I've seen fake torrents stay up for weeks. It's easy to post multiple new fakes. And users are incredibly clueless. I have, on several occasions, seen comment threads where several people will post "This is a fake, don't bother," but the torrent will still have thousands of people downloading and the very next comment will be something like "I've been stuck at 99% for three days, will somebody fucking seed this!!" Remember, the goal isn't to eliminate the network. The goal is to make it so untrustworthy and unreliable that it's too much trouble for Joe User and he'll go to the theater instead.