Slashdot Mirror


Mission: Infiltrate the P2P Network

prostoalex writes "Wired News unveils the secrecy behind Overpeer, the company whose mission is to infiltrate peer-to-peer networks with low-quality audio and video files, or corrupted chunks of data which carry the same name and have the same size as originals. Apparently OverPeer even managed to procure a USPTO patent on (a) producing an advertising digital music file by deteriorating or damaging a sound quality of an original music file of a record of a cooperating record corporation; and (b) distributing the advertising digital music file through the communication network."

20 of 532 comments

  1. Re:MD5? by JimDabell · · Score: 4, Informative
    Isn't there some magical algorithm that produces an unique checksum number for a file, and if it were missing chunks wouldn't that reflect in that magical number? Don't most P2P networks use this magical MD5 checksum algorithm to ensure files aren't screwed up?

    Yes, but the client supplies the checksum. There's nothing to stop a client from sending a phony checksum.

    In any case, the checksum only really protects against things getting screwed up through the transfer - if they are screwed up to begin with, the checksum isn't going to help at all.
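
    A minimal sketch of that point, assuming MD5 via Python's hashlib and made-up byte strings standing in for file contents (none of these names come from any real P2P client): the checksum catches damage in transit, but a decoy that was bad from the start matches its own advertised checksum perfectly.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical stand-ins for file contents.
original = b"pretend this is four minutes of audio"
decoy = b"pretend this is four minutes of noise"

# The sender advertises the checksum of whatever it is actually sharing.
advertised = md5_hex(decoy)

# One transfer arrives intact, another has its first byte damaged in transit.
received_clean = decoy
received_damaged = b"X" + decoy[1:]

transfer_ok = md5_hex(received_clean) == advertised     # decoy arrived intact
transfer_bad = md5_hex(received_damaged) == advertised  # damage is detected
is_original = md5_hex(received_clean) == md5_hex(original)

print(transfer_ok, transfer_bad, is_original)
```

    The check only ties the received bytes to what the sender advertised; it says nothing about whether those bytes are the song you wanted.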

  2. Re:MD5? by frp001 · · Score: 2, Informative

    On the other hand, checksumming is not a guarantee of uniqueness: if it were, it would be called compression (cool, a 4-minute song in an MD5 checksum).

    --
    May I use your sig please?
  3. Stupid. by grub · · Score: 5, Informative


    It won't work well with all P2P networks. A prime example is the eDonkey network which uses a hash of each file as an identifier, not a filename/size identifier. You can rename the file to anything and the hash won't change. eMule Project is another great eDonkey network client and is open source.

    This is too little, too late, unless you're stuck on Kazaa.
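
    A rough sketch of hash-as-identifier, assuming MD5 as a stand-in for whatever hash the network really uses; the file names and byte strings are invented for illustration. Renaming a file leaves its identifier alone, while a decoy with the same name and size gets a different one.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Identify a file by a hash of its bytes, ignoring its name."""
    return hashlib.md5(data).hexdigest()

# (file name, contents) pairs; the decoy has the same name and size as the real file.
shared = [
    ("Band XYZ - Song.mp3", b"real audio bytes"),
    ("band_xyz_song_renamed.mp3", b"real audio bytes"),
    ("Band XYZ - Song.mp3", b"fake audio bytes"),
]

# Group the names under the identifier of their contents.
by_id = {}
for name, data in shared:
    by_id.setdefault(content_id(data), []).append(name)

for file_id, names in by_id.items():
    print(file_id, names)
```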

    --
    Trolling is a art,
    1. Re:Stupid. by grub · · Score: 2, Informative


      How does the system ensure that the file the hash was computed from is the same file the client will be giving to other users?

      If I read your question correctly, you're referring to what's called a hash collision, which is highly unlikely. Schneier's "Applied Cryptography" has a lot of good reading on this. Parts (or "chunks" as eDonkey/eMule call them), which come in 9 MB pieces, are also checked. It's a pretty sweet system. When you see a file with a lot of sources and you've gotten the file ID from a reputable source, say ShareReactor or FileDonkey, you shouldn't have any problems.

      --
      Trolling is a art,
    2. Re:Stupid. by Anonymous Coward · · Score: 1, Informative

      Your client verifies the hash of each segment of the file after it's been downloaded. If Overpeer uploads a corrupted segment, your client will identify that the hash is wrong and discard it, then try to download the correct segment from somebody else. The donkey clients use a very sophisticated redundancy checking system; I really don't think Overpeer will work against it.
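
      The discard-and-retry behaviour could look roughly like this. It is a toy sketch, not the eDonkey protocol: the 4-byte chunk size, the MD5 choice, and the `download` helper are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose; real clients use far larger chunks

def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_hashes(data: bytes) -> list:
    """Hash every chunk separately."""
    return [hashlib.md5(c).hexdigest() for c in split_chunks(data)]

def download(expected_hashes: list, peers: list) -> bytes:
    """Take each chunk from the first peer whose copy matches the expected hash."""
    result = []
    for index, expected in enumerate(expected_hashes):
        for peer_chunks in peers:
            candidate = peer_chunks[index]
            if hashlib.md5(candidate).hexdigest() == expected:
                result.append(candidate)
                break  # good chunk found, move on to the next one
        else:
            raise ValueError(f"no peer has a valid copy of chunk {index}")
    return b"".join(result)

good = b"ABCDEFGHIJKL"
trusted_hashes = chunk_hashes(good)        # obtained from a source you trust
honest_peer = split_chunks(good)
poisoner = split_chunks(b"ABCDxxxxIJKL")   # second chunk is corrupted

rebuilt = download(trusted_hashes, [poisoner, honest_peer])
print(rebuilt == good)
```

      Note that everything hinges on the chunk hashes themselves coming from somewhere trustworthy.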

  4. Re:Overpeer Or Overpee-er? by hagardtroll · · Score: 2, Informative

    And this would cause people to WANT to visit their overpriced pay per use pool? I haven't bought a CD in many years. I also do not participate in P2P piracy. I find plenty of good FREE quality tunes in legitimate distribution channels. MP3.com, et al. provide me with enough legit free material. I no longer desire to spend $18.00 for a CD of bland uninteresting music the RIAA is spewing.

  5. Re:Won't Work by olethrosdc · · Score: 4, Informative

    So suppose you do a search for 'Band XYZ'
    and you get results
    BAND XYZ - I can't write a song (md5=12345)
    BAND XYZ - I cant write a song (md5=91283)

    One of them is the real and the other is the decoy. Which one is which?

    Or if they are ripped from analogue sources, they would be different.

    The md5 thing only works if all files are exactly the same.
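
    To illustrate, here is a small sketch using Python's hashlib with invented byte strings standing in for two rips of the same track. They differ in a single bit, yet their MD5 values have nothing in common.

```python
import hashlib

# Two hypothetical rips of the same track, identical except for one bit.
rip_a = bytes([0, 1, 2, 3] * 1000)
rip_b_buffer = bytearray(rip_a)
rip_b_buffer[2000] ^= 0x01  # flip the lowest bit of one byte
rip_b = bytes(rip_b_buffer)

hash_a = hashlib.md5(rip_a).hexdigest()
hash_b = hashlib.md5(rip_b).hexdigest()
differing_bytes = sum(x != y for x, y in zip(rip_a, rip_b))

print(differing_bytes)
print(hash_a)
print(hash_b)
```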

    --

    I miss my rubber keyboard.(Homepage)

  6. Community review/link sites. by jonathan_ingram · · Score: 3, Informative

    It's not too hard to avoid low quality/bogus files. All you need is some form of rating and feedback system. ShareReactor fulfills this need for the eDonkey network, providing links to verified versions of files. I imagine it's very possible to decentralise this system significantly, or even to integrate it into the file sharing protocol itself, in order to reduce the possibility of the rating site being shut down.

  7. Re:MD5? by jetmarc · · Score: 3, Informative

    > No its not PRACTICAL...but maybe they've got some brute force per song?

    They'd need A LOT of brute force. To this day there are no two known files with the same MD5 hash. You could claim the big prize if you could come up with two such files!

  8. Re:It just doesn't make sense. by SN74S181 · · Score: 2, Informative

    I would like to know when this is all going to come to a head,

    Umm, it stops when the consensus model of content sharing breaks down horribly because it's entirely possible to do this kind of thing. Unless a 'centralized authority' happens along or some form of 'peer authentication' method is devised (which requires some form of centralized authority) they eventually win.

    'Consensus model' schemes only work in subcultures. They fail dramatically when scaled to the whole world. That in a nutshell describes all the problems with the 'net as it exists today.

  9. Re:audio files are rarely identical by cameleon · · Score: 2, Informative

    Rip a CD on two different drives and the chances that some bits will be different in the resulting files are really pretty good.

    Not if you use a good ripping program like Exact Audio Copy and a reasonably good (i.e. not with multiple big scratches) cd. Of course if you then encode it, the end result will still depend on the encoder (LAME, Ogg), the version, and the settings used, so your point still stands.

  10. Re:Already been done by Anonymous Coward · · Score: 1, Informative

    eDonkey does what you are suggesting. It has directories of good hashes on the web. It's still filled with spam and crap.

  11. Wrong. by FallLine · · Score: 2, Informative
    Where I think you are confused is about the nature of MD5.

    MD5 is not just another hash function. It is cryptographically secure. This means that you will never ever, in the life of the universe, be able to find nor contrive / construct a file with an identical hash. That is the whole point of MD5. Otherwise digital signatures and certificates would be meaningless.
    This is not quite true.

    Firstly, MD5 is just a one way hash. That hash can be and is often signed to prove that the hash was generated by some trusted party. However, if the hash itself is broken, then validating with it any signature, regardless of how secure it is, is by definition meaningless. See MD4 and others.

    Secondly, we only presume MD5 to be a good one way hash--there is no absolute proof that it is. There might be some novel approach that we just don't know about yet.

    Thirdly, by definition, no one-way hash can rule out the possibility of brute forcing the hash by throwing enough stuff at it with the hope that something else will generate the same hash. In other words, we KNOW there exist other inputs that will generate the exact same hash result because the hash cannot possibly describe a unique input given that it is much much shorter. We only believe that it would be very hard to generate some other (reasonable) input to match a specific target hash. For instance, for some known hash I probably cannot generate an input that will match it and I especially cannot hope to generate one that is apt to resemble what I intend to pass my package off as. However, given enough computer time, I can certainly generate SOME file (even if it is ugly) that will match your MD5 hash (and pass your signature with flying colors). In 50 years, there is even every reason to think that this would be a trivial task.
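
    The brute-force argument can be tried at toy scale. This sketch keeps only the first 16 bits of the MD5 digest (an assumption made purely so the search finishes quickly) and throws junk inputs at it until one matches the target; `truncated_md5` and `brute_force_match` are hypothetical helper names.

```python
import hashlib
from itertools import count

def truncated_md5(data: bytes, bits: int) -> int:
    """Return the first `bits` bits of the MD5 digest as an integer."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)

def brute_force_match(target: bytes, bits: int):
    """Try junk inputs until one has the same truncated hash as target."""
    goal = truncated_md5(target, bits)
    for attempt in count():
        candidate = b"junk-%d" % attempt
        if candidate != target and truncated_md5(candidate, bits) == goal:
            return candidate, attempt + 1

target = b"some song data"
candidate, tries = brute_force_match(target, bits=16)
print(candidate, tries)
```

    With 16 bits a match is expected after about 2^16 tries. Each extra bit doubles the expected work, so the same blind search against all 128 bits would take on the order of 2^128 tries.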
    1. Re:Wrong. by Anonym0us+Cow+Herd · · Score: 4, Informative

      Secondly, we only presume MD5 to be a good one way hash--there is no absolute proof that it is. There might be some novel approach that we just don't know about yet.

      True indeed.

      Just like we might find a way to easily find the prime factors of huge composite numbers. Which would render public key cryptography useless. But mathematicians smarter than us seem to think this is not likely. So your suggestion that it might happen doesn't mean much. After all, we might find a way to travel faster than light.

      I can certainly generate SOME file (even if it is ugly) that will match your MD5 hash (and pass your signature with flying colors).

      All you have to do to prove that a program could be written that could break MD5 is to post two tiny blocks of data which have the same MD5 hash. Basically the same simple test I would offer to anyone claiming a perpetual motion machine. Simply demonstrate it. If you break MD5 you could be famous.

      Thirdly, by definition, no one-way hash can rule out the possibility of brute forcing the hash by throwing enough stuff at it with the hope that something else will generate the same hash.

      It is a given that something else will generate the same hash. I agreed with this point in your earlier post. It is just finding it that is the problem. If the RIAA wants to spend hundreds of millions of dollars to build a machine that might possibly find a block of data that hashes to the same hash as one mp3 file, then I would be right there cheering them on.

      Throw enough horsepower at any problem, and you can solve it by brute force. Heck, in theory, you could exhaustively search the keyspace for a 2048-bit key. Extra credit: How many machines were working for how many years on the RC5-64 challenge?

      In 50 years, there is even every reason to think that this would be a trivial task.

      It's premature to say this. Only time will tell.

      A key principle of cryptography is that you pick key lengths and algorithms that remain unbroken not just based on today's technology, but based on tomorrow's technology and how long the secrecy of the data remains important.

      For instance, each bit of additional length added to a key doubles the keyspace that must be searched. Moore's law, if it continues to hold true, says that computer power doubles every 18 months. Now you figure out how many extra bits you need to add in order to prevent a successful attack within a 50-billion year timeframe. A 2048-bit key, for instance, is probably adequate over a 64-bit key.
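
      The arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch that takes the 18-month doubling at face value and assumes it continues indefinitely; the function names are made up for illustration.

```python
import math

DOUBLING_PERIOD_YEARS = 1.5  # "every 18 months", as stated above

def keyspace_ratio(extra_bits: int) -> int:
    """How many times larger the keyspace gets with extra_bits more bits."""
    return 2 ** extra_bits

def years_of_doubling_absorbed(extra_bits: int) -> float:
    """Years of hardware doubling needed to cancel out extra_bits of key."""
    return extra_bits * DOUBLING_PERIOD_YEARS

def extra_bits_for(years: float) -> int:
    """Extra key bits needed to offset `years` of hardware doubling."""
    return math.ceil(years / DOUBLING_PERIOD_YEARS)

print(keyspace_ratio(1))                       # one more bit, twice the keys
print(extra_bits_for(50))                      # bits needed to offset 50 years
print(years_of_doubling_absorbed(2048 - 64))   # going from 64 to 2048 bits
```

      Under these assumptions, one extra bit buys 18 months, so the 1984 bits separating a 64-bit key from a 2048-bit key absorb 2976 years of doubling before a brute-force search costs what the 64-bit search costs today.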

      As to your hypothesis that MD5 can be broken, you may be right. Maybe it will be. But I wouldn't hold my breath.

      --
      The price of freedom is eternal litigation.
  12. Re:huh? by Catbeller · · Score: 1, Informative

    Don't laugh.

    I recall a news report a few years back that indicated that the U.S. dumps herbicide galore on marijuana fields in Central and South America.

    As a bonus, those plants that do manage to get harvested end up being smoked in the good ol' U.S. -- and the poison ends up in the criminal bodies of the smokers.

    Win-win: the War on Some Drugs gets a shot in, the pesticide company makes millions, we humiliate the country we forced the poison into, we poison the water tables of thousands or millions of helpless poor people, and best of all, people who smoke the demon weed get poisoned and ill, maybe even die.

    I assume the Drug Warriors go out to their local pubs in D.C. and get stoned on martinis when they celebrate this victory of the Glorious Republic.

  13. Re:Confusion about:MD5 (it's no panacea) by Anonym0us+Cow+Herd · · Score: 2, Informative

    If you're getting enough random errors to conclude that no two rips will have the same MD5 sum, then you must have one heck of a crappy CD-drive.

    I'm not sure, but I think that you can get different rips of the same cd track. I seem to remember that cdparanoia's docs had some detail on this. Something called "digital jitter" or somesuch. Just recalling from memory.

    I'm certainly not an expert on all the levels of what goes on in ripping.

    --
    The price of freedom is eternal litigation.
  14. Know your enemy by dcavanaugh · · Score: 3, Informative
    It looks like Overpeer is owned by some kind of Korean conglomerate, www.sk.com. Hardly any consumer products, but it would be worth a look to see if they have anything that can be effectively boycotted or tariffed to death.

    They appear to be running Win2K/IIS, just like RIAA. Not that I'm saying this is bad, or anything like that :-)

    Be on the lookout for any of the following people:
    • Marc Morgenstern, CEO of Overpeer, Inc.
    • Val Thomas (C.I.O.)
    • Eric Bingham (C.O.O.)
    • SunHong Min (Director of Board, SK Corporation)
    • CheolWoong Lee (C.S.O., co-founder)
    • Changyoung Lee (C.T.O., co-founder)
    • Junghyoung Lee (System Engineer)
    • Don Kim (Director of Board, SK Corporation)
  15. Re:MD5 + database is all we need. by Shadeborn · · Score: 2, Informative
    This could be correctable via a web site (or database) that p2p programs could validate against.

    Bitzi does exactly what you describe. Several Gnutella clients have built-in support for it.
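
    The general idea of validating against a database can be sketched as follows. This is not Bitzi's actual interface: the `known_files` table, its verdict strings, and the `verdict` function are all hypothetical, with an in-memory dict standing in for the web service.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Hash file contents; MD5 is used here only as an example."""
    return hashlib.md5(data).hexdigest()

# Hypothetical community database: content hash -> verdict.
known_files = {
    file_hash(b"real audio bytes"): "verified",
    file_hash(b"fake audio bytes"): "reported fake",
}

def verdict(data: bytes) -> str:
    """Look up what the community has said about this exact file."""
    return known_files.get(file_hash(data), "unknown")

print(verdict(b"real audio bytes"))
print(verdict(b"fake audio bytes"))
print(verdict(b"never seen before"))
```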

  16. Re:It won't work by The_K4 · · Score: 2, Informative

    You don't need a program. There's usually an easy way to tell. Look at what else the user is sharing. If they have multiple copies of the same song with just different formatting/spelling of the title... odds are they are gonna be fakes. After all, most people don't keep 5 copies of songs with different titles on their HDDs. With about 2 min of checking and a bit of common sense, you can reduce the chances of getting a bad song.
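
    For anyone who does want a program anyway, the same common-sense check might be sketched like this. The normalization rule and the threshold of 3 are arbitrary assumptions, and the titles are invented.

```python
import re
from collections import Counter

def normalize(title: str) -> str:
    """Lowercase a title and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def suspicious_titles(shared_titles, threshold=3):
    """Return normalized titles that one peer shares `threshold` or more times."""
    counts = Counter(normalize(t) for t in shared_titles)
    return {key: n for key, n in counts.items() if n >= threshold}

peer_listing = [
    "BAND XYZ - I can't write a song",
    "Band XYZ - I cant write a song",
    "band_xyz-i_cant_write_a_song",
    "Other Band - Another Song",
]

flagged = suspicious_titles(peer_listing)
print(flagged)
```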

  17. p2p proposal by jishak · · Score: 2, Informative

    I propose a new type of peer 2 peer network based on distributed computing such as seti@home merged with a quality of service metric similar to slashdot's. Basically everyone who connects to this network will reserve a chunk of hard disk (say 100mb) for the use of the network, a slice of memory (say 16mb), and a portion of their bandwidth (say 10%). These reserved objects can be used to keep a protected hash database running live 24 hours a day, 7 days a week.

    Redundancy should be built into the network so that as people log on and off, a large percent of the hashes are still available such as 90%. These hashes could use md5 or some other secure hash and the moderation would handle filtering the good from the bad. Initially it would have a lot of duplicates. This is not a bad thing. It would cause greater numbers of people to listen to duplicate songs until the best quality ones are modded up and the lower quality ones are modded down.
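
    A very rough sketch of the modding-up and modding-down part, assuming a plain +1/-1 vote per listener. The `HashModeration` class and the hash labels are hypothetical, and nothing here addresses distribution, redundancy, or vote stuffing.

```python
from collections import defaultdict

class HashModeration:
    """Keep a running score per file hash, moderation style."""

    def __init__(self):
        self.scores = defaultdict(int)

    def vote(self, file_hash: str, good: bool) -> None:
        """Mod a hash up (+1) or down (-1)."""
        self.scores[file_hash] += 1 if good else -1

    def ranked(self, hashes):
        """Order candidate hashes from best score to worst."""
        return sorted(hashes, key=lambda h: self.scores[h], reverse=True)

moderation = HashModeration()
for _ in range(3):
    moderation.vote("hash-of-good-rip", good=True)
moderation.vote("hash-of-decoy", good=False)
moderation.vote("hash-of-so-so-rip", good=True)

results = moderation.ranked(["hash-of-decoy", "hash-of-so-so-rip", "hash-of-good-rip"])
print(results)
```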

    If the reserved space is encrypted we should be able to isolate source ip's and make it look as if the traffic is coming from everyone. So instead of a song coming from 3 sources, it looks like it comes from 1000 sources because the protected share is part of every client. Similar to the Borg.

    We could still give preference to faster pipes such as T3/T1/OC whatever. In addition with a node/supernode algorithm, we could figure out more efficient routes for transmitting the songs based on the users already connected to the network. For example, choosing to get a song from a user at your "isp" vs "the nearest supernode".

    The protected share should handle the md5 checksum and thus the client's distributed client program would devote cpu cycles to checking the validity of the content in the protected share. I like the idea of hash-based searching but I wonder, even if we store the hashes in a protected share, does this open the door to any form of legal liability?

    I realize that the record cartel could come in and do an initial flood of crap and then maintain a network of computers to saturate it with bad data. A solution would be to have the client upload a valid file and then have the network (protected share) validate the file. The network could then keep running times of valid source ip's. The source IP does not have to be sharing data (it can if it wants, and most clients probably would) it just is needed to prevent the record cartel and their minions from setting up hordes of dhcp machines spitting out bad data because they would have to revalidate every time an ip is changed. This may affect others who are on dhcp but their moderated accounts would be able to act as a form of credit at time of validation. People with good history who switch ip's but don't disconnect would not have to be revalidated because a trust would be established. While someone who disconnects and changes IP is no longer trusted. By having a protected share, high quality data could go into replication quicker.

    If we know it is trusted and we see a concentration of requests coming from a particular area/isp, we can broadcast data to other clients near area/isp for the purpose of retransmission during peak times. Maybe we could build in requirements such as if a song is downloaded, it must be kept on the machine for 24 hours, so people don't just download and delete. This way retransmission could be quicker during peak times. People who download and delete or log off would be modded down as potential sources while others would continue to keep good credit. Thus, in addition to having metrics for quality of service, we could also have metrics for the quality of the source.