Faster P2P By Matching Similar Files?

Posted by ryuzaki0 on Wednesday April 11, 2007 @03:45AM from the something-doesn't-jive-here dept.

Andreaskem writes "A Carnegie Mellon University computer scientist says transferring large data files, such as movies and music, over the Internet could be sped up significantly if peer-to-peer (P2P) file-sharing services were configured to share not only identical files, but also similar files. "SET speeds up data transfers by simultaneously downloading different chunks of a desired data file from multiple sources, rather than downloading an entire file from one slow source. Even then, downloads can be slow because these networks can't find enough sources to use all of a receiver's download bandwidth. That's why SET takes the additional step of identifying files that are similar to the desired file... No one knows the degree of similarity between data files stored in computers around the world, but analyses suggest the types of files most commonly shared are likely to contain a number of similar elements. Many music files, for instance, may differ only in the artist-and-title headers, but are otherwise 99 percent similar.""
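To make the "99 percent similar" claim concrete, here is a minimal sketch of how two files that differ only in a header can share almost all of their chunks. It is not SET's actual algorithm, which the summary does not describe in detail. The fixed 16-byte chunk size, the made-up header strings, and the helper names `chunk_hashes` and `shared_fraction` are all assumptions for illustration. It also assumes the two headers are the same length, so the remaining chunks line up at the same offsets.

```python
import hashlib

CHUNK_SIZE = 16  # hypothetical, tiny on purpose; real systems use far larger chunks


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and hash each one."""
    return [
        hashlib.sha1(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def shared_fraction(wanted: bytes, candidate: bytes) -> float:
    """Fraction of the wanted file's chunks that the candidate could supply."""
    wanted_hashes = chunk_hashes(wanted)
    if not wanted_hashes:
        return 0.0
    available = set(chunk_hashes(candidate))
    hits = sum(1 for h in wanted_hashes if h in available)
    return hits / len(wanted_hashes)


# Stand-in "audio" payload: 1024 bytes, i.e. 64 chunks of 16 bytes.
audio = bytes(range(256)) * 4

# Two files with the same payload but different 16-byte headers (invented format).
file_a = b"ARTIST=A;TITLE=X" + audio
file_b = b"ARTIST=B;TITLE=Y" + audio

if __name__ == "__main__":
    # Only the header chunk differs, so 64 of 65 chunks are shared.
    print(f"{shared_fraction(file_a, file_b):.3f}")
```

A downloader that wants file_a could fetch every chunk except the header from a peer that only has file_b. If the headers differed in length, fixed-size chunks would no longer line up, and a scheme that picks chunk boundaries from the content would be needed instead.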

5 of 222 comments

Nickelback? by onemorehour · 2007-04-11 03:46 · Score: 5, Funny

Many music files, for instance, may differ only in the artist-and-title headers, but are otherwise 99 percent similar.

Well, sure, if you're only looking at Nickelback songs.
1. Re:Nickelback? by thepotoo · 2007-04-11 04:17 · Score: 5, Interesting
  
  If you use BitTorrent, the DHT protocol (supported by Azureus, BitComet, and uTorrent, among others) does the exact thing you're describing. It checks MD5 hashes for files (the whole file, not the pieces, I think), and connects you to peers which have the same file.
  DHT even supports partially corrupted files; your client just discards the corrupt data.
  My question is, why would I want to use SET over DHT? Does SET not need a centralized server, or does it have any other advantage at all?
  TFA is really short on technical details, but it sounds to me as though SET is just a redesign of DHT. Still, I imagine SET support will be in the next builds of all the major BitTorrent clients if it ends up being worth something.
  
  --
  Obligatory Soundbite Catchphrase
grea tide a by underwhelm · 2007-04-11 03:51 · Score: 5, Funny

I'm hoping this CATCHES ON and wet ransfer a11 sorts of information like this. It'11 be 1ike getting every thing in the form of a ransom n0te.

--
I don't need large brains to have a good time.
Re:Right.... by angio · 2007-04-11 04:01 · Score: 5, Informative

Take a peek at the paper - it actually does work, and we demonstrated it. The intuition: people make small changes to files like changing the artist or title in the MP3 header, and then BitTorrent and other systems treat this as a "different" file, when in fact it's 99.9% similar.
(Yes, I'm one of the authors.)
Re:TorrentSoup by joe_cot · 2007-04-11 04:06 · Score: 5, Informative

It would still work the same way as it does now: an md5 of each specific block, and an md5 of the whole thing. If the md5 for the block doesn't match, it's not going to download, and if it's someone using a collision to inject a block with the same md5, 1) it's not going to pass the md5 on the whole thing, 2) you're already vulnerable to it. The reason this will work is that there'll be lots of people sharing incomplete or corrupted versions of your FreeBSD ISO; you'll get the blocks that are good, and skip the blocks that aren't, making "similar" files very useful. Not too difficult to understand, and no need for tin foil hats.
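
The check described above can be sketched roughly as follows. This is an illustration under assumptions, not any client's real code: the function names `md5_hex` and `assemble`, the four-byte blocks, and the in-memory "peers" are all made up, and md5 is used only because the comment names it (a real client may use a different hash).

```python
import hashlib
from typing import Optional, Sequence


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def assemble(
    block_hashes: Sequence[str],
    file_hash: str,
    peers: Sequence[Sequence[Optional[bytes]]],
) -> Optional[bytes]:
    """Build the file from whichever peer has a good copy of each block.

    Each peer is a list of blocks; None marks a block the peer does not have.
    Returns None if some block has no good copy or the whole-file hash fails.
    """
    blocks = []
    for index, expected in enumerate(block_hashes):
        good = None
        for peer in peers:
            if index >= len(peer) or peer[index] is None:
                continue  # incomplete copy: skip this peer for this block
            if md5_hex(peer[index]) == expected:
                good = peer[index]
                break  # corrupted copies never reach this point
        if good is None:
            return None
        blocks.append(good)
    data = b"".join(blocks)
    # Final check on the whole thing, as in the comment above.
    return data if md5_hex(data) == file_hash else None


original = [b"aaaa", b"bbbb", b"cccc"]
block_hashes = [md5_hex(b) for b in original]
file_hash = md5_hex(b"".join(original))

corrupted_peer = [b"aaaa", b"XXXX", b"cccc"]   # block 1 is damaged
incomplete_peer = [None, b"bbbb", None]        # only has block 1

if __name__ == "__main__":
    print(assemble(block_hashes, file_hash, [corrupted_peer, incomplete_peer]))
```

Neither peer holds a complete good copy, but between them every block passes its hash, so the full file is recovered.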