Slashdot Mirror


Reputation System Fights P2P Junk

yeejiun writes "Many of the files that are shared on p2p networks tend to be junk. Organizations such as the RIAA and music labels regularly pollute these networks with nonsense files masquerading as real music/video files. These junk files make it difficult for users to find what they want on such p2p networks. Some researchers at Cornell University have developed a reputation system called Credence, that works on the Gnutella network, allowing users to tell the good files from the bad ones."

21 of 338 comments

  1. eDonkey by mnemonic_ · · Score: 5, Informative

    Doesn't the eDonkey2000 network already have a system like this? Users identify fakes and report them, then the phony file information propagates throughout the network and the fake file dies.

    1. Re:eDonkey by mnemonic_ · · Score: 4, Informative

      Ah, found it: donkey-fakes. eMule automatically downloads the fakes list upon startup, and prevents the files from spreading.

  2. Re:Torrents can be bogus too. by mnemonic_ · · Score: 2, Informative

    ...which only verifies file integrity. It doesn't check if the file is what its filename says it is. It only ensures correct data transfers, not correct data.
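    A minimal sketch of that distinction, assuming a simplified torrent-like layout in which the publisher lists one SHA-1 digest per fixed-size piece. The names piece_hashes, verify, and the published dictionary are hypothetical and are not taken from any real client:

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]

def verify(data: bytes, expected: list, piece_length: int) -> bool:
    """True if every piece of data hashes to the digest the publisher listed."""
    return piece_hashes(data, piece_length) == expected

# Hypothetical polluter: the content is junk, but the metadata names a song.
junk = b"\x00" * 1000
published = {
    "name": "popular_song.mp3",
    "piece_length": 256,
    "pieces": piece_hashes(junk, 256),
}

# A faithful download of the junk passes every hash check.
passes_check = verify(junk, published["pieces"], published["piece_length"])

# A single flipped byte in transit is caught.
corrupted = b"\x01" + junk[1:]
corrupted_passes = verify(corrupted, published["pieces"], published["piece_length"])
```

    The hashes are computed over whatever the publisher chose to share, so the check confirms only that the download matches the publisher's bytes, not that those bytes match the filename.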

  3. NO by zymano · · Score: 2, Informative

    OVERVIEW

    Credence is a robust and decentralized system for evaluating the reputation of files in a peer-to-peer filesharing system. Our goal is to enable peers to confidently gauge file authenticity, the degree to which a file's contents matches its advertised description.

    At the most basic level, Credence employs a simple, network-wide voting scheme where users can contribute positive and negative evaluations of files. On top of this, a client uses statistical tests to weight the importance of votes from their peers. And finally, Credence allows clients to extend the horizon of information by selectively sharing information with their peers.
    Authenticity and Pollution

    We define pollution broadly as any file with content that does not match its description. An authentic file, by contrast, has content that is accurately described by its metadata. We find in practice that pollution in current networks can be easily identified by users without any special knowledge or expertise. As pollution becomes more sophisticated, more advanced detection techniques will need to be developed to help users safely identify malicious content.
    Voting

    The Credence system relies on individual users as the first line of defense against pollution. After a user downloads and uses a file, she is given a chance to submit a single vote to the Credence system: a positive (thumbs-up) vote for authentic files, and a negative (thumbs-down) vote for a polluted file. Each vote is cryptographically signed and entered into the system.
    Vote Gathering

    Credence uses these votes collected in the network to determine the authenticity of content. Credence displays a rating for each file that appears in response to a user query.

    First, the client software executes a search for votes, and downloads a number of votes randomly selected from the network. These votes are then aggregated into a single estimate of the authenticity of the file in question.

    Each vote collected from the network is not used directly, however, since some peers in the network may accidentally vote incorrectly, or even lie intentionally about the file's authenticity. Therefore we assign to each peer a correlation coefficient, or weight, reflecting the historical usefulness of the peer's votes. In effect, this helps remove the incentive for an attacker to lie about the authenticity of files. A consistent liar is, after all, just as useful as an honest peer when it comes to distinguishing authentic files and pollution. And an inconsistent voter will come to be ignored by others in the network.
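    The weighting idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Credence's actual statistics: a simple agreement score in [-1, 1] stands in for the correlation coefficient, votes are encoded as +1 (thumbs-up) and -1 (thumbs-down), and the function names and peer names are hypothetical.

```python
def agreement(mine: dict, theirs: dict) -> float:
    """Agreement score over files both parties voted on.

    +1.0 means the peer always voted as I did, -1.0 means always the
    opposite, and 0.0 means no overlap or no consistent relationship.
    """
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0
    return sum(mine[f] * theirs[f] for f in shared) / len(shared)

def estimate(file_id: str, my_votes: dict, peer_votes: dict) -> float:
    """Weighted estimate in [-1, 1] of whether file_id is authentic."""
    numerator = 0.0
    denominator = 0.0
    for votes in peer_votes.values():
        if file_id not in votes:
            continue
        weight = agreement(my_votes, votes)
        numerator += weight * votes[file_id]
        denominator += abs(weight)
    return numerator / denominator if denominator else 0.0

my_votes = {"a": 1, "b": -1, "c": 1}
peer_votes = {
    "honest": {"a": 1, "b": -1, "c": 1, "x": -1},   # always agrees with me
    "liar":   {"a": -1, "b": 1, "c": -1, "x": 1},   # always disagrees
    "noisy":  {"a": 1, "b": 1, "x": 1},             # agrees half the time
}
score_x = estimate("x", my_votes, peer_votes)
```

    In this toy data the liar's thumbs-up for "x" carries a weight of -1 and so counts against the file, while the noisy voter's weight is 0 and its vote is ignored.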
    Information Sharing and Transitive Correlation

    Peer-to-peer networks can grow quite large, and many clients might participate rarely, sharing and voting on only a few files. This means that alone, a client may have trouble quickly discovering peer correlations and other historical data. To alleviate this problem, Credence uses a technique called transitive correlation to quickly spread information among small groups of peers and help clients expand their horizon.

    In Credence, a client periodically requests historical data from selected peers in the network. This data contains information on how the peer voted in the past (cryptographically signed, as before), and information about how the peer is related to other peers in the network. The client can then validate this information for authenticity, then integrate it into its local databases. In this way, not only does the client take advantage of the work other peers do in evaluating files for authenticity, but also gains insight into the behavior of peers in the network. All this is done without need for user interaction, or any peer trust values, which can be difficult for a user to accurately determine.
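    One way to picture transitive correlation, as a sketch only: the overview does not say how second-hand correlations are combined, so multiplying weights along a two-hop path and keeping the strongest inferred value are assumptions made here, as are the function and peer names.

```python
def transitive_weights(direct: dict, reported: dict) -> dict:
    """Extend first-hand peer weights with one hop of second-hand data.

    direct:   peer -> weight I computed myself from shared vote history
    reported: peer -> {other peer -> weight that peer reports}
    """
    weights = dict(direct)
    for peer, w_peer in direct.items():
        for other, w_other in reported.get(peer, {}).items():
            if other in direct:
                continue  # first-hand evidence takes precedence
            inferred = w_peer * w_other
            if abs(inferred) > abs(weights.get(other, 0.0)):
                weights[other] = inferred
    return weights

direct = {"bob": 0.5}
reported = {"bob": {"carol": 0.5, "dave": -1.0}}
extended = transitive_weights(direct, reported)
```

    Here the client has never shared a file with carol or dave, yet it ends up with a weak positive weight for carol and a negative one for dave, both discounted by how much it trusts bob.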
    Changes to the LimeWire Client and Gnutella Network

    Credence is integrated into the LimeWire client, and works on top of the Gnutella network. The implementation is built entirely on top of existing primitives in the Gnutella protocol. It opens up no additional ports

  4. Good summary by kernel_dan · · Score: 3, Informative

    For those of you that can't be bothered to RTFA, this system takes a profile of how you vote on files and matches you with other people who voted similarly. Thus, the spammers would see different ratings than 'normal users.'

    --

    Illegal? Samir, This is America.
  5. rtfa, sucka. by knowles420 · · Score: 5, Informative

    7. Can a group of spammers game the Credence algorithm by voting thumbs-up for each others' spam ?

    No. The trustworthiness computation is designed to preclude such attacks.

    8. What happens when a large number of spammers vote each others' spam up ? Can they fool the reputation system ?

    No. Credence's reputation computation is similar to Google's PageRank, but is more general - every node computes a different rank based on its own votes. Reputation flows from a given good node along trust edges towards other nodes. Spammers can create tight cliques in which everyone votes on each others' spam, but the entire clique will be deemed untrustworthy. And if anyone in the spammer clique does a search, they will see each others' spam ranked high.
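    A rough sketch of why per-node ranking defeats cliques. The FAQ only says the computation is similar to PageRank, so the random walk with restart below is a stand-in rather than Credence's real algorithm, and the graph, damping value, and names are all hypothetical.

```python
def personalized_rank(edges: dict, source: str,
                      damping: float = 0.85, iterations: int = 50) -> dict:
    """Rank nodes by a random walk that always restarts at source."""
    nodes = set(edges) | {n for outs in edges.values() for n in outs}
    rank = {n: 0.0 for n in nodes}
    rank[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += 1.0 - damping          # restart probability
        for node, r in rank.items():
            outs = edges.get(node, [])
            if outs:
                share = damping * r / len(outs)
                for target in outs:
                    nxt[target] += share      # reputation flows along trust edges
            else:
                nxt[source] += damping * r    # dead end: walk returns to source
        rank = nxt
    return rank

# Trust edges: honest users trust each other; spammers only trust each other.
edges = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["alice"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
from_alice = personalized_rank(edges, "alice")
from_spam1 = personalized_rank(edges, "spam1")
```

    From alice's point of view the clique receives no reputation at all, because no trust edge leads into it. From spam1's point of view spam2 ranks high, which matches the FAQ's remark that spammers see each other's spam ranked high.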

    or, just do whatever you want.
    --
    -knowles
    1. Re:rtfa, sucka. by PylonHead · · Score: 5, Informative

      No, the pot smoker is right. Your brain is too small to absorb their goodness.

      In their system there is no single "high reputation" metric. Everyone has a different reputation with everyone else. Take three people, A, B and C. A may have a high reputation as far as B is concerned, but C thinks A has a low reputation.

      They do this by grouping people who vote the same way. So you trust the people that vote like you do.

      Assuming that you vote good files up and bad files down, you will be grouped with people who do the same. At some point, the spammers have to start voting differently than you do.. voting their spam up. This will distance them from your trust network, and cause you to value their opinion less.

      --
      # (/.);;
      - : float -> float -> float =
    2. Re:rtfa, sucka. by xquark · · Score: 3, Informative

      yes correct, and in fact it can be taken one step further:

      assume the system is able to determine symmetric groups.
      that is groups that have totally (or near totally) different
      voting directions, an example would be the honest group and
      the spammers group.

      if say the spammers vote something up, instead of the honest
      group ignoring their rating, they can use the symmetric
      properties between their group and the spammer's group to
      reinforce their vote (aka the credence) of the file in
      question - in this case rate it down even further.

      If the right restrictions were put in place such as the fact
      that the symmetric effect will only affect files that have a
      negative credence and not files that have a positive credence,
      then various forms of collusion can be overcome.
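      One possible reading of that restriction, sketched with hypothetical names: a thumbs-up from an anti-correlated peer counts against the file, but a thumbs-down from such a peer is dropped rather than flipped into positive evidence. This is an interpretation of the comment, not something the Credence authors describe.

```python
def credence(votes: list) -> float:
    """Sum weighted votes, applying the one-way symmetric rule.

    votes: (weight, vote) pairs, weight in [-1, 1], vote +1 or -1.
    """
    total = 0.0
    for weight, vote in votes:
        contribution = weight * vote
        if weight < 0 and contribution > 0:
            # An opposing group's thumbs-down must never raise a file.
            continue
        total += contribution
    return total

# Honest peer says bad, spammer says good: both push the score down.
pushed_down = credence([(1.0, -1), (-1.0, 1)])

# Honest peer says good, spammer says bad: only the honest vote counts.
not_inflated = credence([(1.0, 1), (-1.0, -1)])
```

      Dropping the second case is what would keep a colluding group from boosting a file simply by voting it down in unison.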

      A lie can always be turned into a truth and a truth into a lie...

      Arash Partow
      __________________________________________________
      Be one who knows what they don't know,
      Instead of being one who knows not what they don't know,
      Thinking they know everything about all things.
      http://www.partow.net/

      --
      Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
  6. Re:Torrents can be bogus too. by nunchux · · Score: 3, Informative

    True... But a bogus torrent usually doesn't survive too long and certainly doesn't see too many seeders. If it's been up for a day or two you can be reasonably sure it's valid.

    Also, even the "pirate" torrent sites are centralized and often even have administrators, sometimes even comment boards. If a torrent is bogus, someone will take it down. (Not that I've been to those sites, of course...)

    Of course this could all be manipulated, but AFAIK it hasn't been yet by the powers-that-be... And I don't see why they'd bother, when a threatening letter is all it usually takes to take a torrent site down, and it would take considerably more effort than turning a bunch of scratchy mp3's loose on kazaa.

  7. Re:Torrents can be bogus too. by Anonymous Coward · · Score: 1, Informative

    Ok, iirc, BT uses what looks like sha. How can BT prevent hash collision attacks (rare, but in case of big media, possible)?

    Not possible. Bittorrent uses SHA-1, which has only recently (February) been reported to allow collisions in 2^69 hash computations.

    So yes, if your chunk size is 536,870,912 Gb, and you have a supercomputer working on it for a year or so, you will be able to find a colliding hash.

    Yeah. Possible indeed.
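    For a sense of scale, here is the arithmetic behind 2^69 hash computations. The rate of one billion SHA-1 computations per second is purely an assumed round number for illustration, not a measurement of any real machine.

```python
# Number of hash computations cited for the reported SHA-1 collision attack.
computations = 2 ** 69

# Assumption: a hypothetical machine doing 10^9 SHA-1 computations per second.
hashes_per_second = 10 ** 9

seconds_per_year = 365.25 * 24 * 60 * 60
years_on_one_machine = computations / hashes_per_second / seconds_per_year

# How many such machines would be needed to finish within one year.
machines_for_one_year = years_on_one_machine

print(f"{computations:,} computations")
print(f"about {years_on_one_machine:,.0f} years on one machine at the assumed rate")
```

    Under that assumption a single machine would need on the order of tens of thousands of years, so finishing in about a year would take a correspondingly large number of machines.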

  8. Re:I'm a little lost in this whole thing by Chandon+Seldon · · Score: 2, Informative

    Ever heard of trojan horses? Spam zombies are worth good money.

    --
    -- The act of censorship is always worse than whatever is being censored. Always.
  9. Re:FP? - And that's why I use Bittorrent... by Anonymous Coward · · Score: 1, Informative

    That's why you should try sites like http://www.seedler.org/ they seem to do a good job at removing the crap.

    And indeed as someone said, watch the torrent comments. They help a lot.

  10. Even better answer by quadra23 · · Score: 2, Informative
    quit downloading crap off of kazaa/grokster/morpheus/etc

    Use a P2P program that actually includes some 'anti-junk' features. I typically use Shareaza (probably not the best, and I'm sure someone will state a better P2P, but the point still remains: Shareaza does offer some features these clients do not, including a rating/comment system that goes with the file whenever anyone finds a search result for it). Usually I know if the file is a fake before I download because I use some obvious signs:

    • How many sources have this file? (more can be just as suspicious as legitimate)
    • Is the file size relatively the same as one fake file I already downloaded? (yes, sometimes they are just copies with different names)
    • What kind of comments/ratings does the file have when I select it in the search list? (of course this could be a little flaky if the 'junk spreader' decided to positively review the file)

    I prefer the client program including these features, especially when it's available to connect to several networks at the same time. Nothing worse than getting a 100MB+ file and realizing you wasted the bandwidth for nothing, or that the program you downloaded wasn't the same as the file name (more legit, but not what you were looking for).

    Do be careful because some files that are really a virus can be detected by AV as 'ok'. Thankfully I found the virus before it did much damage and by reading the Symantec AV report I was able to make sure I removed it completely. Just because one 'setup.exe' claims to be a setup program don't trust it unless you trust the name of the setup program -- "Program Setup Wizard" does not cut it!

    Since Shareaza also supports torrents I usually go through torrent sites and have rarely had any 'junk' files from the torrents. The more junk the RIAA (and other companies!) try to spread the better we get at ignoring and working around it!

  11. Re:FP? - And that's why I use Bittorrent... by larry+bagina · · Score: 2, Informative
    mpeg/avi/wmv/mp3/ogg/etc are already compressed with an encoder specific for video/audio, so secondary compression from zip or rar isn't particularly helpful. However, zip and rar can password protect files, so if you want to see britney's 6-month pregnant sex video, the password is the 3rd word of the 2nd paragraph after you sign up for a "totally free" pr0n site.

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  12. Re:Problems by Beolach · · Score: 2, Informative
    Except the way it works is that the reputation of a file that you see is based not on the over-all votes of the total population (including spammers). The reputation of a file that you see is only based on the votes of other peers that you have a high correlation with, based on what files you rate as good and bad. So if you have rated 9 files, and I have rated those same 9 files in the same way you did, then Credence would trust my ratings for you.
    From the FAQ:
    3. How does Credence know who is trustworthy and who is a spammer?
    Initially, it doesn't. As you vote for files, it stores your votes and discovers the set of peers with whom your votes are correlated. It also communicates with peers to find out about other peers with whom they in turn are correlated. The outcome of this computation is a numerical value computed for each file appearing in query results that reflects the probability that the given file is trustworthy.

    If you vote thumbs-up for good files and thumbs-down for bad files, you will be grouped with the vast majority of people who also vote honestly. You will then compute a high trustworthiness metric for all files that this (potentially very large) group of users has ever voted on. If you vote inaccurately (i.e. you are a spammer), you will compute a low trustworthiness metric for other non-spam files, and honest users will compute a low trustworthiness coefficient for your opinion. It is thus in your best interest to vote honestly.
    ...
    6. I hate the music group X. Should I vote thumbs-down for their songs?

    No. See the question above - your votes should simply reflect whether the file's description is accurate and whether its contents are intact. Voting thumbs-down for a perfectly good file may cause your node to be lumped in with spammers and reduce the effectiveness of Credence for you (i.e. you will likely see more spam in your searches).
    --
    Join moola.com, play games to earn money.
  13. Re:FP? - And that's why I use Bittorrent... by Anonymous Coward · · Score: 2, Informative

    A large number of video releases posted to torrent sites are "scene" releases that come from usenet.

    These releases are typically rar-ed into multiple parts to allow for easy and reliable posting to usenet.

    People simply taking a scene release and uploading it to a torrent site is quite common, so these rar releases on places like The Pirate Bay are nothing to worry about. It's usually a sign that it's a "good" release if you see many *.r0* or *.rar files.

    Of course be on the lookout for *.exe files inside of compressed releases, but the presence of rars means nothing negative as far as a torrent being legit.

  14. Re:Problems by Beolach · · Score: 2, Informative

    Whoops, posted too soon. The second potential problem you describe is more in line with how Credence is described to work, but I think it's unlikely to be a very big problem. Yes, the system will probably allow for "mistakes," but it will cull those mistakes out. So if the spammer rates most good files good and bad files bad, but rates their one spam file also good, then it is possible your client will report that spam file as having a high credibility. But, once you (or anyone else) download and find that it is not a good file, you will rate it bad, and as more people rate it bad, its credibility will go down. It's a case of diminishing returns for the spammer.

    --
    Join moola.com, play games to earn money.
  15. Re:Torrents can be bogus too. by frostw · · Score: 2, Informative

    Ummm, yes there is. For instance, VLC media player will play partly downloaded videos.

    --
    http://www.sydney-webcam.com
  16. Re:Self-policing is needed by Penguin · · Score: 3, Informative

    Yeah, because 300 years certainly isn't enough for a word to be recognized...?

    From http://www.etymonline.com/index.php?term=pirate :

    "Meaning "one who takes another's work without permission" first recorded 1701"

    Come on, the term is older than RMS!

    --
    - Peter Brodersen; professional nerd
  17. Re:FP? - And that's why I use Bittorrent... by Anonymous Coward · · Score: 1, Informative

    Simply prioritize the first rar chunks, or first few chunks of a torrent that has been rar-ed. Open, and preview with mplayer or vlc.

    Grabbing the first chunk of a video out of a bunch of rars will actually allow you to preview a movie more easily than if the torrent "contained" one large movie. If you're DL-ing a large movie file, you just get random chunks of it here and there. To preview something in mplayer or vlc you pretty much need to get the first chunk or last chunk. You will grab the chunks of a large *.avi file pretty much at random, so you may not be able to preview that DVD-rip for a good number of hours... just depends on when you happen to grab the right chunks.

    With a release that is a bunch of rars, you can choose to grab the very first part, or couple parts, of a movie and then unpack and have a look at what you're getting. So it's actually quicker to preview a release that is in "usenet split rar" format than trying to get the right chunks from one big avi.
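    The prioritization described above amounts to something like the sketch below. It assumes a hypothetical client that otherwise fetches pieces in random order (real clients typically use other strategies such as rarest-first), and download_order is an illustrative name, not a function from any actual client.

```python
import random
from typing import Optional

def download_order(num_pieces: int, preview_pieces: int,
                   seed: Optional[int] = None) -> list:
    """Fetch the first preview_pieces in order, then the rest at random."""
    rng = random.Random(seed)
    head = list(range(min(preview_pieces, num_pieces)))
    rest = list(range(len(head), num_pieces))
    rng.shuffle(rest)  # stand-in for the client's normal piece selection
    return head + rest

order = download_order(num_pieces=10, preview_pieces=3, seed=1)
```

    With the first few pieces guaranteed to arrive early, a player that tolerates truncated files can open the start of the release long before the download finishes.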

    And mplayer will play everything... the first part of an *.iso or *.bin for example, just grab the first few rars and you can preview within minutes.

    Of course an avi that is put in one big rar is fairly pointless, not much compression is gained, but for pictures compression will save some space and time, though as you say if it's just one big rar you won't be able to preview.

    But a bunch of little rars is just fine for previewing releases.

  18. Re:FP? - And that's why I use Bittorrent... by MasterSLATE · · Score: 2, Informative

    Azureus has that functionality built in. There's a setting to prioritize first chunks (maybe it's first/last, but memory says it's first).

    --

    [sig]www.masterslate.org[/sig]