Reputation System Fights P2P Junk
yeejiun writes "Many of the files that are shared on p2p networks tend to be junk. Organizations such as the RIAA and music labels regularly pollute these networks with nonsense files masquerading as real music/video files. These junk files make it difficult for users to find what they want on such p2p networks. Some researchers at Cornell University have developed a reputation system called Credence, that works on the Gnutella network, allowing users to tell the good files from the bad ones."
Doesn't the eDonkey2000 network already have a system like this? Users identify fakes and report them, then the phony file information propagates throughout the network and the fake file dies.
...which only verifies file integrity. It doesn't check if the file is what its filename says it is. It only ensures correct data transfers, not correct data.
OVERVIEW
Credence is a robust and decentralized system for evaluating the reputation of files in a peer-to-peer filesharing system. Our goal is to enable peers to confidently gauge file authenticity, the degree to which a file's contents match its advertised description.
At the most basic level, Credence employs a simple, network-wide voting scheme where users can contribute positive and negative evaluations of files. On top of this, a client uses statistical tests to weight the importance of votes from their peers. And finally, Credence allows clients to extend the horizon of information by selectively sharing information with their peers.
Authenticity and Pollution
We define pollution broadly as any file with content that does not match its description. An authentic file, by contrast, has content that is accurately described by its metadata. We find in practice that pollution in current networks can be easily identified by users without any special knowledge or expertise. As pollution becomes more sophisticated, more advanced detection techniques will need to be developed to help users safely identify malicious content.
Voting
The Credence system relies on individual users as the first line of defense against pollution. After a user downloads and uses a file, she is given a chance to submit a single vote to the Credence system: a positive (thumbs-up) vote for authentic files, and a negative (thumbs-down) vote for a polluted file. Each vote is cryptographically signed and entered into the system.
Vote Gathering
Credence uses these votes collected in the network to determine the authenticity of content. Credence displays a rating for each file that appears in response to a user query.
First, the client software executes a search for votes, and downloads a number of votes randomly selected from the network. These votes are then aggregated into a single estimate of the authenticity of the file in question.
Each vote collected from the network is not used directly, however, since some peers in the network may accidentally vote incorrectly, or even lie intentionally about the file's authenticity. Therefore we assign to each peer a correlation coefficient, or weight, reflecting the historical usefulness of the peer's votes. In effect, this helps remove the incentive for an attacker to lie about the authenticity of files. A consistent liar is, after all, just as useful as an honest peer when it comes to distinguishing authentic files from pollution, because a strongly negative correlation lets the client simply invert that peer's votes. And an inconsistent voter will come to be ignored by others in the network.
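The weighting idea above can be sketched roughly as follows. This is an illustration only, not Credence's actual code: it assumes votes are stored as +1/-1, uses plain Pearson correlation as the weight, and the names correlation and estimate are made up for the example. The statistic Credence really uses may differ.

```python
from math import sqrt

def correlation(mine, theirs):
    """Pearson correlation over the files both parties voted on (+1 / -1)."""
    shared = [f for f in mine if f in theirs]
    if len(shared) < 2:
        return 0.0  # too little shared history to judge this peer
    xs = [mine[f] for f in shared]
    ys = [theirs[f] for f in shared]
    n = len(shared)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # one side always votes the same way; correlation undefined
    return cov / sqrt(vx * vy)

def estimate(file_id, my_votes, peers):
    """Weighted average of peer votes on file_id, in the range [-1, 1]."""
    total = 0.0
    weight_sum = 0.0
    for peer_votes in peers.values():
        if file_id not in peer_votes:
            continue
        w = correlation(my_votes, peer_votes)
        total += w * peer_votes[file_id]  # a negative weight inverts the vote
        weight_sum += abs(w)
    return total / weight_sum if weight_sum else 0.0

# Hypothetical history: "x" is the file we have not downloaded yet.
my_votes = {"a": 1, "b": -1, "c": 1, "d": -1}
peers = {
    "honest": {"a": 1, "b": -1, "c": 1, "d": -1, "x": 1},
    "liar":   {"a": -1, "b": 1, "c": -1, "d": 1, "x": -1},  # always lies
    "noisy":  {"a": 1, "b": 1, "c": -1, "d": -1, "x": -1},  # uncorrelated
}
print(estimate("x", my_votes, peers))  # 1.0
```

The consistent liar gets weight -1, so its thumbs-down on "x" counts as a thumbs-up, while the uncorrelated peer gets weight 0 and drops out of the estimate.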
Information Sharing and Transitive Correlation
Peer-to-peer networks can grow quite large, and many clients might participate rarely, sharing and voting on only a few files. This means that alone, a client may have trouble quickly discovering peer correlations and other historical data. To alleviate this problem, Credence uses a technique called transitive correlation to quickly spread information among small groups of peers and help clients expand their horizon.
In Credence, a client periodically requests historical data from selected peers in the network. This data contains information on how the peer voted in the past (cryptographically signed, as before), and information about how the peer is related to other peers in the network. The client can then validate this information for authenticity and integrate it into its local databases. In this way, the client not only takes advantage of the work other peers do in evaluating files for authenticity, but also gains insight into the behavior of peers in the network. All this is done without need for user interaction, or any peer trust values, which can be difficult for a user to accurately determine.
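One way to picture transitive correlation is the sketch below. It rests on assumptions that are not stated in this overview: that second-hand weights are combined by multiplying along a one-hop path, that first-hand weights always win, and that the strongest inferred value is kept. The function name extend_horizon and the data layout are hypothetical.

```python
def extend_horizon(direct, reported):
    """Infer weights for unknown peers from what known peers report.

    direct:   peer -> weight this client computed from its own vote history
    reported: peer -> {other_peer: weight that peer claims for other_peer}
    """
    inferred = dict(direct)
    for peer, w in direct.items():
        for other, w2 in reported.get(peer, {}).items():
            if other in direct:
                continue  # first-hand data is never overridden
            candidate = w * w2  # assumed rule: multiply along the path
            if abs(candidate) > abs(inferred.get(other, 0.0)):
                inferred[other] = candidate
    return inferred

# Hypothetical data: we know B and D first-hand, and both know C.
direct = {"B": 0.9, "D": -1.0}
reported = {"B": {"C": 0.8}, "D": {"C": -0.5, "B": 0.1}}
weights = extend_horizon(direct, reported)
```

Here C is reached through B (0.9 * 0.8) and through D (-1.0 * -0.5), and the stronger of the two inferred weights is kept. A peer known to lie consistently still contributes useful information, because the product of two negative weights is positive.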
Changes to the LimeWire Client and Gnutella Network
Credence is integrated into the LimeWire client, and works on top of the Gnutella network. The implementation is built entirely on top of existing primitives in the Gnutella protocol. It opens up no additional ports.
For those of you that can't be bothered to RTFA, this system takes a profile of how you vote on files and matches you with other people who voted similarly. Thus, the spammers would see different ratings than 'normal users.'
Illegal? Samir, This is America.
-knowles
True... But a bogus torrent usually doesn't survive too long and certainly doesn't see too many seeders. If it's been up for a day or two you can be reasonably sure it's valid.
Also, even the "pirate" torrent sites are centralized and often even have administrators, sometimes even comment boards. If a torrent is bogus, someone will take it down. (Not that I've been to those sites, of course...)
Of course this could all be manipulated, but AFAIK it hasn't been yet by the powers-that-be... And I don't see why they'd bother, when a threatening letter is all it usually takes to take a torrent site down, and it would take considerably more effort than turning a bunch of scratchy mp3's loose on kazaa.
Ok, iirc, BT uses what looks like SHA. How can BT prevent hash collision attacks (rare, but in the case of big media, possible)?
Not possible. BitTorrent uses SHA-1, which was only recently (February) reported to be vulnerable to collisions in about 2^69 hash computations.
So yes, if your chunk size is 536,870,912 Gb, and you have a supercomputer working on it for a year or so, you will be able to find a colliding hash.
Yeah. Possible indeed.
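For anyone wondering what the SHA-1 check actually buys you, here is a rough sketch using Python's hashlib. The function names and the tiny piece size are made up for the example, and a real client reads the expected hashes from the .torrent metadata rather than computing them locally.

```python
import hashlib

def piece_hashes(data, piece_len):
    """SHA-1 digest of each fixed-size piece, as a publisher would list them."""
    return [
        hashlib.sha1(data[i:i + piece_len]).digest()
        for i in range(0, len(data), piece_len)
    ]

def verify_piece(piece, index, expected):
    """True if a downloaded piece matches the published hash at that index."""
    return hashlib.sha1(piece).digest() == expected[index]

# Hypothetical 10-byte "file" split into 4-byte pieces.
data = b"abcdefghij"
expected = piece_hashes(data, 4)
good = verify_piece(b"abcd", 0, expected)   # untouched piece
bad = verify_piece(b"abcX", 0, expected)    # corrupted or substituted piece
```

Note this only proves the piece matches what the torrent's publisher hashed. It says nothing about whether the publisher's file is what its name claims, which is the gap a voting system like Credence is aimed at.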
Ever heard of trojan horses? Spam zombies are worth good money.
-- The act of censorship is always worse than whatever is being censored. Always.
That's why you should try sites like http://www.seedler.org/ they seem to do a good job at removing the crap.
And indeed, as someone said, watch the torrent comments. They help a lot.
Use a P2P program that actually includes some 'anti-junk' features. I typically use Shareaza (probably not the best, and I'm sure someone will name a better P2P client, but the point still stands: Shareaza offers some features other clients do not, including a rating/comment system that travels with the file whenever anyone finds a search result for it). Usually I know if the file is a fake before I download because I look for some obvious signs:
I prefer the client program including these features, especially when it can connect to several networks at the same time. Nothing worse than getting a 100MB+ file and realizing you wasted the bandwidth for nothing, or that the program you downloaded wasn't what the file name said (more legit, but not what you were looking for).
Do be careful because some files that are really a virus can be detected by AV as 'ok'. Thankfully I found the virus before it did much damage and by reading the Symantec AV report I was able to make sure I removed it completely. Just because one 'setup.exe' claims to be a setup program don't trust it unless you trust the name of the setup program -- "Program Setup Wizard" does not cut it!
Since Shareaza also supports torrents I usually go through torrent sites and have rarely had any 'junk' files from the torrents. The more junk the RIAA (and other companies!) try to spread the better we get at ignoring and working around it!
Do you even lift?
These aren't the 'roids you're looking for.
From the FAQ:
Join moola.com, play games to earn money.
A large amount of video releases posted to torrent sites are "scene" releases that come from usenet.
These releases are typically rar-ed into multiple parts to allow for easy and reliable posting to usenet.
People simply taking a scene release and uploading it to a torrent site is quite common, so these rar releases on places like The Pirate Bay are nothing to worry about. It's usually a sign that it's a "good" release if you see many *.r0* or *.rar files.
Of course be on the lookout for *.exe files inside of compressed releases, but the presence of rars means nothing negative as far as a torrent being legit.
Whoops, posted too soon. The second potential problem you describe is more in line with how Credence is described to work, but I think it's unlikely to be a very big problem. Yes, the system will probably allow for "mistakes," but it will cull those mistakes out. So if the spammer rates most good files good and bad files bad, but rates their one spam file also good, then it is possible your client will report that spam file as having a high credibility. But, once you (or anyone else) download and find that it is not a good file, you will rate it bad, and as more people rate it bad, its credibility will go down. It's a case of diminishing returns for the spammer.
Ummm, yes there is. For instance, VLC media player will play partly downloaded videos.
http://www.sydney-webcam.com
Yeah, because 300 years certainly isn't enough for a word to be recognized...?
From http://www.etymonline.com/index.php?term=pirate :
"Meaning "one who takes another's work without permission" first recorded 1701"
Come on, the term is older than RMS!
- Peter Brodersen; professional nerd
Simply prioritize the first rar chunks, or first few chunks of a torrent that has been rar-ed. Open, and preview with mplayer or vlc.
Grabbing the first chunk of a video out of a bunch of rars will actually allow you to preview a movie more easily than if the torrent "contained" one large movie. If you're DL-ing a large movie file, you just get random chunks of it here and there. To preview something in mplayer or vlc you pretty much need to get the first chunk or last chunk. You will grab the chunks of a large *.avi file pretty much at random, so you may not be able to preview that DVD-rip for a good number of hours... just depends on when you happen to grab the right chunks.
With a release that is a bunch of rars, you can choose to grab the very first part, or couple parts, of a movie and then unpack and have a look at what you're getting. So it's actually quicker to preview a release that is in "usenet split rar" format than trying to get the right chunks from one big avi.
And mplayer will play everything... the first part of an *.iso or *.bin for example, just grab the first few rars and you can preview within minutes.
Of course an avi that is put in one big rar is fairly pointless, not much compression is gained, but for pictures compression will save some space and time, though as you say if it's just one big rar you won't be able to preview.
But a bunch of little rars is just fine for previewing releases.
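To make the "grab the first chunks first" idea concrete, here is a toy sketch of a download order. It is not how any particular client does it: download_order and its parameters are invented for the example, and the random shuffle for the remaining pieces is only a stand-in for whatever selection policy the client normally uses.

```python
import random

def download_order(num_pieces, preview_pieces, seed=None):
    """Return piece indices with the first preview_pieces fetched in order."""
    rng = random.Random(seed)
    head = list(range(min(preview_pieces, num_pieces)))  # needed for preview
    rest = list(range(len(head), num_pieces))
    rng.shuffle(rest)  # stand-in for the client's usual piece selection
    return head + rest

# Hypothetical 10-piece torrent where the first 3 pieces are enough to preview.
order = download_order(10, 3, seed=1)
```

With one big avi and no prioritization, the whole list is effectively "rest", so the piece the player needs may arrive at any point. With split rars you get the same effect as "head" by simply picking the first archive parts.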
Azureus has that functionality built in. There's a setting to prioritize first chunks (maybe it's first/last, but memory says it's first).
[sig]www.masterslate.org[/sig]