Mission: Infiltrate the P2P Network
prostoalex writes "Wired News unveils the secrecy behind Overpeer, the company whose mission is to infiltrate peer-to-peer networks with low-quality audio and video files, or corrupted chunks of data, which carry the same names and have the same sizes as the originals. Apparently Overpeer even managed to procure a USPTO patent on (a) producing an advertising digital music file by deteriorating or damaging a sound quality of an original music file of a record of a cooperating record corporation; and (b) distributing the advertising digital music file through the communication network."
Yes, but the client supplies the checksum. There's nothing to stop a client from sending a phony checksum.
In any case, the checksum only really protects against things getting screwed up during the transfer. If they are screwed up to begin with, the checksum isn't going to help at all.
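To make the point concrete: a checksum check only proves that the bytes you received are the bytes somebody hashed. A minimal Python sketch, assuming the expected digest is handed to you as a hex string; the function names are illustrative, not from any P2P client.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()

def verify_download(data: bytes, expected_md5: str) -> bool:
    """True if the received bytes hash to the expected digest.

    This detects damage in transit, but if a decoy was hashed in the first
    place, the decoy passes.
    """
    return md5_hex(data) == expected_md5.lower()

# The peer serving the decoy also supplies the checksum, so the check succeeds.
decoy = b"low-quality decoy audio"
peer_supplied = md5_hex(decoy)
print(verify_download(decoy, peer_supplied))  # True, yet it is still a decoy
```

The check is only as trustworthy as wherever the expected digest came from.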
On the other hand, a checksum is not a guarantee of uniqueness. If it were, it would be called compression. (Cool: a 4-minute song in an MD5 checksum!)
May I use your sig please?
It won't work well with all P2P networks. A prime example is the eDonkey network which uses a hash of each file as an identifier, not a filename/size identifier. You can rename the file to anything and the hash won't change. eMule Project is another great eDonkey network client and is open source.
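Here is a sketch of a content-based identifier in the spirit of the eDonkey hash. Real ed2k hashes use MD4 over 9,728,000-byte chunks; this version substitutes MD5 because MD4 is often missing from modern hashlib builds, and it glosses over ed2k's edge cases, so it will not reproduce real ed2k links. The point it shows is that the identifier depends only on the bytes, not on the filename.

```python
import hashlib

CHUNK_SIZE = 9_728_000  # eDonkey's chunk size, in bytes

def content_id(data: bytes, algo: str = "md5") -> str:
    """Identifier derived only from file contents (ed2k-style, MD5 stand-in).

    Files up to one chunk are hashed directly; larger files hash the
    concatenation of their per-chunk digests.
    """
    if len(data) <= CHUNK_SIZE:
        return hashlib.new(algo, data).hexdigest()
    chunk_digests = b"".join(
        hashlib.new(algo, data[i:i + CHUNK_SIZE]).digest()
        for i in range(0, len(data), CHUNK_SIZE)
    )
    return hashlib.new(algo, chunk_digests).hexdigest()

# The same bytes shared under two different names.
shared = {
    "Band XYZ - Song.mp3": b"identical audio bytes",
    "renamed_to_anything.mp3": b"identical audio bytes",
}
ids = {name: content_id(data) for name, data in shared.items()}
print(len(set(ids.values())))  # 1: renaming does not change the identifier
```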
This is too little, too late, unless you're stuck on Kazaa.
Trolling is a art,
And this would cause people to WANT to visit their overpriced pay-per-use pool? I haven't bought a CD in many years. I also do not participate in P2P piracy. I find plenty of good FREE quality tunes in legitimate distribution channels. MP3.com et al. provide me with enough legit free material. I no longer desire to spend $18.00 on a CD of the bland, uninteresting music the RIAA is spewing.
So suppose you do a search for 'Band XYZ' and you get these results:
BAND XYZ - I can't write a song (md5=12345)
BAND XYZ - I cant write a song (md5=91283)
One of them is the real and the other is the decoy. Which one is which?
Or if they are ripped from analogue sources, they would be different.
The md5 thing only works if all files are exactly the same.
I miss my rubber keyboard.
It's not too hard to avoid low quality/bogus files. All you need is some form of rating and feedback system. ShareReactor fulfills this need for the eDonkey network, providing links to verified versions of files. I imagine it's very possible to decentralise this system significantly, or even to integrate it into the file sharing protocol itself, in order to reduce the possibility of the rating site being shut down.
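A rating system keyed on file hashes answers the "which one is real?" question from the search example earlier in the thread. This is a minimal in-memory sketch, assuming simple up/down votes per hash; the class and method names are hypothetical and do not reflect how ShareReactor actually works. The hashes "12345" and "91283" are the placeholders from that example, not real MD5 digests.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchResult:
    title: str
    md5: str

class RatingIndex:
    """Hypothetical hash-keyed store of community ratings."""

    def __init__(self) -> None:
        self.scores = defaultdict(int)  # file hash -> net rating

    def rate(self, md5: str, good: bool) -> None:
        self.scores[md5] += 1 if good else -1

    def best(self, results):
        """Highest-rated result with a positive score, or None."""
        rated = [r for r in results if self.scores.get(r.md5, 0) > 0]
        return max(rated, key=lambda r: self.scores[r.md5], default=None)

index = RatingIndex()
index.rate("12345", good=True)
index.rate("12345", good=True)
index.rate("91283", good=False)
results = [
    SearchResult("BAND XYZ - I can't write a song", "12345"),
    SearchResult("BAND XYZ - I cant write a song", "91283"),
]
print(index.best(results).title)  # BAND XYZ - I can't write a song
```

Because ratings attach to the hash rather than the title, a decoy cannot borrow a good file's reputation just by copying its name and size.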
> No, it's not PRACTICAL... but maybe they've got some brute force per song?
They'd need A LOT of brute force. To this day, no two files with the same MD5 hash are known. You could claim a big prize if you could come up with two such files!
I would like to know when this is all going to come to a head.
Umm, it stops when the consensus model of content sharing breaks down horribly because it's entirely possible to do this kind of thing. Unless a 'centralized authority' happens along or some form of 'peer authentication' method is devised (which requires some form of centralized authority), they eventually win.
'Consensus model' schemes only work in subcultures. They fail dramatically when scaled to the whole world. That in a nutshell describes all the problems with the 'net as it exists today.
Rip a CD on two different drives and the chances that some bits will be different in the resulting files are really pretty good.
Not if you use a good ripping program like Exact Audio Copy and a reasonably good CD (i.e. one without multiple big scratches). Of course, if you then encode it, the end result will still depend on the encoder (LAME, Ogg), its version, and the settings used, so your point still stands.
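Why even one differing bit matters: MD5 is designed so that a tiny change in the input yields an unrelated digest. A small sketch; the "audio" is synthetic filler bytes, not a real rip, and the flipped bit position is arbitrary.

```python
import hashlib

def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

pcm = bytes(range(256)) * 64      # 16 KB stand-in for ripped audio
jittered = flip_bit(pcm, 12345)   # one bit differs, e.g. a read error
d1 = hashlib.md5(pcm).digest()
d2 = hashlib.md5(jittered).digest()
print(differing_bits(pcm, jittered), d1 != d2)  # 1 True
print(differing_bits(d1, d2))  # typically around half of the 128 digest bits
```

So two rips, or two encodes with different settings, only share a hash if they are bit-for-bit identical.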
eDonkey does what you are suggesting. It has directories of good hashes on the web. It's still filled with spam and crap.
Firstly, MD5 is just a one-way hash. That hash can be, and often is, signed to prove that it was generated by some trusted party. However, if the hash itself is broken, then validating any signature over it, regardless of how secure the signature is, is by definition meaningless. See MD4 and others.
Secondly, we only presume MD5 to be a good one-way hash; there is no absolute proof that it is. There might be some novel approach that we just don't know about yet.
Thirdly, by definition, no one-way hash can rule out the possibility of brute-forcing it by throwing enough stuff at it in the hope that something else will generate the same hash. In other words, we KNOW other inputs exist that will generate the exact same hash result, because the hash is much, much shorter than its input and so cannot possibly describe a unique input. We only believe that it would be very hard to generate some other (reasonable) input to match a specific target hash. For instance, for some known hash I probably cannot generate an input that will match it, and I especially cannot hope to generate one that resembles what I intend to pass my package off as. However, given enough computer time, I can certainly generate SOME file (even if it is ugly) that will match your MD5 hash (and pass your signature with flying colors). In 50 years, there is every reason to think that this would be a trivial task.
Don't laugh.
I recall a news report a few years back that indicated that the U.S. dumps herbicide galore on marijuana fields in Central and South America.
As a bonus, those plants that do manage to get harvested end up being smoked in the good ol' U.S., and the poison ends up in the criminal bodies of the smokers.
Win-win: the War on Some Drugs gets a shot in, the pesticide company makes millions, we humiliate the country we forced the poison into, we poison the water tables of thousands or millions of helpless poor people, and best of all, people who smoke the demon weed get poisoned and ill, maybe even die.
I assume the Drug Warriors go out to their local pubs in D.C. and get stoned on martinis when they celebrate this victory of the Glorious Republic.
If you're getting enough random errors to conclude that no two rips will have the same MD5 sum, then you must have one heck of a crappy CD-drive.
I'm not sure, but I think that you can get different rips of the same CD track. I seem to remember that cdparanoia's docs had some detail on this. Something called "digital jitter" or some such. Just recalling from memory.
I'm certainly not an expert on all the levels of what goes on in ripping.
The price of freedom is eternal litigation.
They appear to be running Win2K/IIS, just like the RIAA. Not that I'm saying this is bad, or anything like that.
Be on the lookout for any of the following people:
Bitzi does exactly what you describe. Several Gnutella clients have built-in support for it.
You don't need a program. There's usually an easy way to tell: look at what else the user is sharing. If they have multiple copies of the same song with just different formatting/spelling of the title, odds are they are going to be fakes. After all, most people don't keep five copies of a song under different titles on their hard drives. With about two minutes of checking and a bit of common sense, you can reduce the chances of getting a bad song.
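The same check is easy to automate if a client can see a user's shared list. A minimal sketch, assuming titles arrive as plain strings; the normalization rules and the threshold of three spellings are hypothetical choices, not taken from any existing client.

```python
import re
from collections import Counter

def normalize(title: str) -> str:
    """Fold case, punctuation and extra spaces so near-identical titles match."""
    title = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return " ".join(title.split())

def suspicious_titles(shared, threshold: int = 3):
    """Songs one user shares under at least `threshold` distinct spellings."""
    counts = Counter(normalize(t) for t in set(shared))
    return {t for t, n in counts.items() if n >= threshold}

user_share = [
    "BAND XYZ - I can't write a song",
    "band xyz - I cant write a song",
    "Band XYZ -  I Can't Write A Song",
    "Other Band - A real track",
]
print(suspicious_titles(user_share))  # {'band xyz i cant write a song'}
```

Like the manual version, this is only a heuristic: an honest user with a messy collection can trip it too.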
I propose a new type of peer-to-peer network based on distributed computing, such as SETI@home, merged with a quality-of-service metric similar to Slashdot's. Basically, everyone who connects to this network will reserve a chunk of hard disk (say 100 MB) for the use of the network, a slice of memory (say 16 MB), and a portion of their bandwidth (say 10%). These reserved resources can be used to keep a protected hash database running live 24 hours a day, 7 days a week.
Redundancy should be built into the network so that, as people log on and off, a large percentage of the hashes (say 90%) is still available. These hashes could use MD5 or some other secure hash, and the moderation would handle filtering the good from the bad. Initially there would be a lot of duplicates. This is not a bad thing: it would cause more people to listen to duplicate songs until the best-quality ones are modded up and the lower-quality ones are modded down.
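How much redundancy the 90% target requires can be estimated with a simple model. Assuming each peer holding a copy of a hash is online independently with probability p, a hash stored on r peers is reachable with probability 1 - (1 - p)^r, which is also the expected fraction of hashes available. The 30% online rate below is an illustrative assumption, not a measured figure.

```python
import math

def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent holders is online."""
    return 1 - (1 - p_online) ** replicas

def replicas_needed(p_online: float, target: float) -> int:
    """Smallest replica count whose availability reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_online))

r = replicas_needed(0.3, 0.90)
print(r, round(availability(0.3, r), 3))  # 7 0.918
```

Real peers do not come and go independently (whole time zones log off together), so a deployed network would want some margin above this estimate.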
If the reserved space is encrypted, we should be able to isolate source IPs and make it look as if the traffic is coming from everyone. So instead of a song coming from 3 sources, it looks like it comes from 1000 sources, because the protected share is part of every client. Similar to the Borg.
We could still give preference to faster pipes such as T3/T1/OC-whatever. In addition, with a node/supernode algorithm, we could figure out more efficient routes for transmitting songs based on the users already connected to the network: for example, choosing to get a song from a user at your ISP rather than from the nearest supernode.
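The source-preference rule can be sketched as a sort key: same-ISP peers first, then faster pipes. This is a minimal illustration; the `Source` fields and ISP labels are hypothetical, the IPs come from documentation ranges, and a real node/supernode router would use measured latency and topology rather than a label match.

```python
from dataclasses import dataclass

@dataclass
class Source:
    ip: str
    isp: str
    kbps: int  # advertised upstream bandwidth

def rank_sources(sources, my_isp: str):
    """Order candidate sources: same-ISP peers first, then by bandwidth."""
    return sorted(sources, key=lambda s: (s.isp != my_isp, -s.kbps))

candidates = [
    Source("192.0.2.10", "OtherNet", 1500),
    Source("192.0.2.20", "MyISP", 256),
    Source("192.0.2.30", "MyISP", 1024),
]
print([s.ip for s in rank_sources(candidates, "MyISP")])
# ['192.0.2.30', '192.0.2.20', '192.0.2.10']
```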
The protected share should handle the MD5 checksums, so each client's distributed program would devote CPU cycles to checking the validity of the content in the protected share. I like the idea of hash-based searching, but I wonder: even if we store the hashes in a protected share, does this open the door to any form of legal liability?
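Validation is more useful per chunk than per file: with a list of chunk hashes, a checker can pinpoint exactly which piece is damaged, so a single corrupted chunk (the tactic described in the story) can be re-fetched from another source instead of throwing away the whole file. A sketch using MD5 and a tiny chunk size for readability; both are assumptions, and the expected hash list must itself come from a trusted source.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int):
    """MD5 hex digest of each fixed-size chunk of data."""
    return [hashlib.md5(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def bad_chunks(data: bytes, expected, chunk_size: int):
    """Indices of chunks whose hash does not match the expected list."""
    actual = chunk_hashes(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("chunk count mismatch")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]

original = bytes(1000)                 # stand-in file, 4 chunks of up to 256 bytes
expected = chunk_hashes(original, 256)
damaged = bytearray(original)
damaged[600] = 1                       # corrupt one byte inside chunk 2
print(bad_chunks(bytes(damaged), expected, 256))  # [2]
```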
I realize that the record cartel could come in and do an initial flood of crap and then maintain a network of computers to saturate it with bad data. A solution would be to have the client upload a valid file and then have the network (protected share) validate the file. The network could then keep running times of valid source IPs. The source IP does not have to be sharing data (it can if it wants, and most clients probably would); it is just needed to prevent the record cartel and their minions from setting up hordes of DHCP machines spitting out bad data, because they would have to revalidate every time an IP changes. This may affect others who are on DHCP, but their moderated accounts would act as a form of credit at validation time. People with a good history who switch IPs but don't disconnect would not have to be revalidated, because trust would already be established, whereas someone who disconnects and changes IP is no longer trusted. By having a protected share, high-quality data could go into replication more quickly.
If we know a source is trusted and we see a concentration of requests coming from a particular area/ISP, we can broadcast data to other clients near that area/ISP for retransmission during peak times. Maybe we could build in requirements such as: if a song is downloaded, it must be kept on the machine for 24 hours, so people don't just download and delete. This way retransmission could be quicker during peak times. People who download and delete or log off would be modded down as potential sources, while others would keep good credit. Thus, in addition to metrics for quality of service, we would also have metrics for the quality of the source.
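The trust rule proposed here (an IP change during a live session keeps trust; disconnecting and returning on a different IP loses it) can be written as a small state machine. This is a hypothetical sketch of that one rule only; it leaves out the moderation credit and the validation step itself. IPs are from documentation ranges.

```python
from dataclasses import dataclass

@dataclass
class PeerTrust:
    """Hypothetical per-peer trust record."""
    ip: str
    connected: bool = True
    validated: bool = False

    def validate(self) -> None:
        self.validated = True

    def change_ip(self, new_ip: str) -> None:
        # IP changes during a continuous session keep established trust.
        self.ip = new_ip

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self, ip: str) -> None:
        # Coming back on a different IP after a disconnect forces revalidation.
        if ip != self.ip:
            self.validated = False
        self.ip = ip
        self.connected = True

peer = PeerTrust("198.51.100.1")
peer.validate()
peer.change_ip("198.51.100.2")   # still connected: trust kept
peer.disconnect()
peer.reconnect("198.51.100.3")   # new IP after a disconnect: trust lost
print(peer.validated)  # False
```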
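A source-quality metric built on the 24-hour retention rule might look like the sketch below. The scoring (+1 for keeping a file the full 24 hours, -1 for deleting it early, nothing yet for files still inside the window) is a hypothetical choice for illustration; the thread does not specify weights.

```python
RETENTION = 24 * 60 * 60  # proposed minimum keep time, in seconds

def source_score(history) -> int:
    """Net quality score for a source.

    history: iterable of (seconds_held, still_held) pairs, one per download.
    """
    score = 0
    for seconds_held, still_held in history:
        if seconds_held >= RETENTION:
            score += 1           # honored the retention window
        elif not still_held:
            score -= 1           # downloaded and deleted early
    return score

print(source_score([(RETENTION, True), (90_000, False), (60, False), (3600, True)]))  # 1
```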