RIAA Tracking Songs by MD5 Hashes
aSiTiC writes "Apparently RIAA has obtained some technical experts in their prosecution of file swappers. Currently they are tracking traded mp3 files from the Napster network by matching MD5 hashes. This seems quite interesting but I was under the assumption that identical hashes could be created with identical rips and id3v2 tagging. Now may be the time to update your illegal mp3 file MD5 hash sums."
We will have to create a honeypot that spoofs MD5 hashes as well. IANACS, so I don't know how.
As far as I know, you will get identical hashes from identical files with the same ID3. How can they track files with the help of MD5 hashes?
What if I own the CD but got files off the Internet because I was too lazy to rip them? Would I still be expecting to be sent to the prison camp?
In other news, all songs produced by RIAA artists in the last 10 years all have the same MD5 hash anyway, because they're all the same.
"If you want to improve, be content to be thought foolish and stupid." - Epictetus
What if you just normalize the song, or edit its beginning or end? Does the MD5 hash still work?
Will they start sending subpoenas to AOL/TW customers this time?
It is generally believed amongst file traders that it is legal to download an mp3 for a song, when you own the CD. In other words, you don't need to rip and encode songs from your own CD. However, this may not be true (I am not a lawyer).
The RIAA is using MD5 hashes as a basis for proof that the individual in question downloaded the files they are sharing, instead of ripping them from their own CD collection. This is supposed to show the individual is a willing participant in stealing and distributing music, instead of someone who is just sharing what they already own. But, see above.
I think this is mostly just a FUD tactic. They can talk to the media about how their MD5 hashes prove so-and-so is a big mean pirate hacker. MD5 hash certainly sounds scary, especially when the technology is described by the media as a tool used by hackers.
---
I support spreading santorum
In practice, the only way for two files to have the same MD5 hash is for them to both be encoded with the same encoder, from the same WAV file, with the same bitrate and all advanced options, and to have exactly the same ID3 information, the same filesize, and to be identical to the last bit.
If two people used the same ripping software set to all its default settings (as many unsophisticated users do), got a perfect rip off the CD, and relied on CDDB information for tagging the song, then it's possible that they got mp3s identical down to the last bit, and thus identical MD5 hashes. BUT to make this a plausible defense, you'd have to show that your rip was in fact perfect. In other words you'd have to be able to recreate the mp3 independently. If the old Napster mp3 had any ripping errors, then it would be hard to claim that the later rip just happened to have the same errors - assuming errors are essentially random.
Are we sure they're actually using MD5? The article doesn't even contain the string "md5" that I can see. It mentions hashes though, but there's something called Robust Hashing which can be used to identify, or at least, compare content in a "fuzzy" way.
Belief is the currency of delusion.
Yes, but note that just changing the ID3 tag isn't enough, since whoever calculates the MD5 hash value can simply ignore the tag and will still be able to find matches.
Although I wonder: if the WAV data on two CDs of the same album is identical, the only thing you can prove from the fact that the hashes match is that the mp3 files were made using the same bitrate.
I can't say this is enough information.
BTW: a way to get around having the exact same copy is to introduce a small amount of random change. One bit is enough to fool the hash.
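The "one bit is enough" point can be checked directly. The following is a minimal sketch, not a tool for real MP3 files: the byte string stands in for file contents, and `flip_bit` and `differing_bits` are hypothetical helper names introduced only for this illustration. Flipping an arbitrary bit in a real MP3 could just as easily corrupt a frame as make an inaudible change.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Stand-in for the contents of a file; not a real MP3.
original = bytes(range(256)) * 64
altered = flip_bit(original, 12345)

digest_original = hashlib.md5(original).digest()
digest_altered = hashlib.md5(altered).digest()

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(altered).hexdigest())
print(differing_bits(digest_original, digest_altered), "of 128 digest bits differ")
```

The two inputs differ in a single bit, yet the digests share no obvious relationship, which is why an exact-match hash comparison cannot tell "nearly the same file" from "a completely different file".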
Lets see someone put together an app that flips bits here and there within MP3s to make each one it runs against unique enough to create a new MD5 hash!? (I would, but I can only program in a pseudo-language ;) It could even be as simple as adding in a trailing byte to all of your MP3s, though that could be easily filtered. Hell, if you can hide messages within compressed JPEGs without noticeably affecting their quality, why not do something similar to MP3s just to jack up this sort of tracking!?
"1984" was meant to be a warning, not a guidebook. You hear that Kim Jong-il!? BushCo?!
What good evidence destroying/hiding mechanisms are there around, apart from deleting and overwriting the area several times? How about something that can kill the hard drive even when the computer's off? I see crime scenes on the news all the time with police carrying out computer cases for examination. It always struck me that you could fit tamper protection in your computer: any attempt to move it, open the case or anything without proper authorisation would cause the HD to torch itself. This could be as simple as a battery inside with enough power to boot the machine quietly and very quickly destroy the data; the police would have no time to stop it. While all this is probably illegal itself, it could be better than being sued for $50000 per song or whatever their price is?
:)
I hope the next kazaa lite comes with file altering/deleting/anti-riaa utilities
This comment does not represent the views or opinions of the user.
I suppose that (if its possible) you would either want to swamp these guys with false positives, or distribute the hash keys and the files somehow to make it more difficult and protracted to discover who actually owns that file.
I suppose that one viable option in P2P would be a freenet model where downloading involves a number of encrypted hops between peers to search or get the data, and where peers cache popular data and indexes in encrypted form. It would be much, much harder to figure out who shared that file then.
Obviously there is a trade off going this route. You wouldn't want the sluglike performance of Freenet so it would not be as secure, but I'm sure you could reduce the number of hops and other measures and still make life massively more difficult for RIAA and their ilk to track down your activities.
It's true that two different people could generate rips of the same track with the same MD5 hash, but the odds are low: they'd have to use exactly the same encoding settings, and enter exactly the same ID3 tags with exactly the same values. (Counterpoints: they could be default settings, and CDDB/Gracenote metadata, which would improve the odds a bit.) And since we're talking about large music collections, the exact matching would have to happen across hundreds of tracks. And if the ID3 tags had notes like "ripped by so-and-so" that'd kinda blow the case. So while it's certainly true that MD5 hashes don't completely uniquely identify a particular rip of a track, I think that when compared for large numbers of files, it'd be a pretty good indicator of file copying.
Enable 3D printed prosthetics!
If I use KaZaa to access indie artists who are
sharing their songs - as is their right - AND I
also rip my entire 1000+ CD/LP/8track collection
to the same computer AND I intelligently store
all the files in the same hierarchy.
Have any laws been broken?
KaZaa is configured to share everything in my
hierarchy so that the indie songs can continue to
be shared.
Have any laws been broken?
I go in for Jury Duty, meanwhile Another Kazaa
user downloads the indie shared files.
Have any laws been broken?
Another Kazaa user downloads the rips from my
personal collection because their 8track player
is on the fritz.
Have any laws been broken?
Another Kazaa user downloads the rips from my
personal collection because their LPs were
destroyed in a flood.
Have any laws been broken?
Another Kazaa user downloads the rips from my
collection because they want to see what the
latest Madonna single sounds like before going
out and buying the CD.
Have any laws been broken?
If any laws were broken here - who broke them?
Just because I leave the front door open does not
mean that anyone can enter and take what they
want from my house. Same as my computer.
The action of downloading is at question not
making the article available.
YMMV. Consult a lawyer.
About this interpretation of Fair Use: I agree that downloading mp3's of CDs that you have purchased should be fair use. I am in a similar situation. A couple of years ago I lost 90% of my CD collection in an apartment fire. I had about 20 of these CDs ripped at the time and since then, I have downloaded many of the others to replace what I had paid for. In some cases, I re-purchased the CD because I wanted to have an original for some of my favorite artists but I didn't mind the mp3 mastered replacements for many of the CDs. Would this fall under Fair Use? I would think that it does since the RIAA seems to think that we are only purchasing a license to listen to the music. However, if I had to present the original CDs to a judge to prove that I do/did own the physical CD, I would be SOL.
The Tools Of Ignorance wanna be a tool?
From the article:
...
Copyright lawyers said it remains unresolved whether consumers can legally download copies of songs on a CD they purchased rather than making digital copies themselves.
By comparing the fingerprints of music files on a person's computer against its library, the RIAA believes it can determine in some cases whether someone recorded a song from a legally purchased CD or downloaded it from someone else over the Internet.
So, the RIAA has been downloading illegal copies of music for years, and in fact probably has a huge library of music. Simultaneously, in their broad sword efforts to completely end p2p, they're arguing that it's illegal to download songs you've already bought. So, even if the RIAA has gone through all the hoops with this library, obtaining licenses for each song they swiped off of file traders in their investigations-- which I doubt; recall Microsoft's slip ups-- they're arguing that the methods they've been using to track down illegal file traders are actually illegal themselves! In fact, the RIAA might have the largest collection of illegal music of anyone, even larger than mine! Of course, this should come as no surprise: after all of the attempts to make it legal for them to attack suspected infringers' PCs, it's pretty clear that the RIAA's privilege and property put them above the law.
Unless she had an OC-48 or two going into her home, she didn't make the files available for download by *millions* of strangers. When the resource is limited, the magnitude of the crime is likewise limited. If you offer a stolen watch on the streets of New York, you can't be charged with trying to sell it to MILLIONS of people, cause there's only one watch. Likewise, in this case there's only enough bandwidth for a certain number of potential downloads, and speaking of millions here is plain misleading.
If the people who downloaded files from her spread them further, that's THEIR crime and not hers, much as the guy who sold a stolen watch won't be found guilty for the watch buyer illegally selling it to someone else.
And in this case, it's even less severe, as it's not a theft, but a copyright violation.
Regards,
--
*Art
The ripping stage can also produce slightly different checksums, depending on the condition of the CD - Audiograbber actually reports "potential speed errors". Unlike data CDs, some level of read error is considered acceptable on music CDs; you don't want the player to keep re-trying a bad sector if it detects a big problem - it would ruin your listening pleasure!
When I am king, you will be first against the wall.
What's stopping people from simply changing a letter in the mp3 info tag (the trivial approach) or a bit or byte somewhere in the file? Good luck matching my file to anything.
Well there are several things that could stop you. You could get the latest MISD (Microsoft Internet Social Disease), etc.
But if you don't, then short of other things stopping you, such as getting run over by a truck, you merely need to change one single bit in the file to have a very different MD5. That bit does NOT need to be in the ID tag. You could just decode one single mp3 frame, randomly selected from the file, alter one bit of the sound, and then re-code that single mp3 frame.
It is even possible that someone might be inspired to write a tool to do this. It would defeat a lot of the previous Slashdot discussions about using MD5 to indicate "good" downloads before you download them. But maybe trust relationships of the P2P swappers themselves, using private keys, is a better idea than trusting the download file.
The price of freedom is eternal litigation.
A cd ripper called "Exact Audio Copy", allows you to set your cd-rom/writer's read offset or read/write offset. Would this offset have any effect on the md5sum created? Say someone rips with the offset set at 0 and then again with the corrected offsets. The mp3 was encoded with the same encoder, settings, id3 information, volume adjustment, etc. Would the md5sum match?
I was under the impression that MP3 (MPEG-1 Audio Layer 3) was a lossy algorithm. Even with the same ripper settings working off the same stored raw CD audio file, will it actually produce identical output? Can the MP3 encoder drop different bits as irrelevant on different passes over the same data with the same settings? If this is indeed the case (I don't know, I am not familiar with the details of the algorithm), then MD5 sums become a virtually foolproof way to identify a file, since an identical sum can only be produced from the exact source MP3, not one that is close. Just a thought on that matter.
And a second point, more of an idea really... Has anyone thought of trapping the RIAA? Here is my proposal...
1) Go and buy 50-100 CDs from your local music stores (I know, this is abhorrent since you are lining the pockets of the people you want to fight, but it is a means to an end). SAVE ALL THE RECEIPTS! You will need these.
2) Download a popular P2P program and sign on.
3) Go download crazy and download an MP3 for EVERY SINGLE SONG on the pack of CDs you just purchased. Be obvious, be a bandwidth pig, get someone's attention.
4) Take screenshots and printouts of the directories containing your "booty". This will establish the timestamps of when they were downloaded. Sign and date the screenshots, preferably with witnesses who sign them as well.
5) Wait for a subpoena from the RIAA.
6) Join the RIAA in court and argue "fair use" by throwing up your stack of legally purchased CDs and the receipts for them, clearly indicating that they were purchased PRIOR to the supposed infringement and that you simply wanted MP3s of CDs you own but lacked the knowledge/skill/time/tools to rip them.
Is such a case copyright infringement? It's a dangerous game to play because the fair use doctrine has been supported but is not a matter of law. The outcome could be undesired because it could cause a rethinking of what constitutes fair use.
The fun part of such rethinking could be the broadening of what is considered infringement into areas where it was not infringement and ignite an absolute firestorm.
No, I did not have renter's insurance, so it was a complete loss for me. If I had been reimbursed, I would have likely re-purchased the CDs that I wanted most and forgotten about the ones that I seldom listen to. This brings up another question/issue. Before the fire, I could have made backups of every CD that I had. Then after the fire, I wouldn't have lost anything audible, just the physical packaging. However, after the fire, it was too late. But couldn't I have considered Napster to be my backup? Since I could readily download a CD whenever I wanted, why make a backup of it?
The Tools Of Ignorance wanna be a tool?
You point out a very real danger.
If you just alter the ID3 tags without altering the mp3 content, then they can nail you. If simply altering ID3 tags becomes commonplace because everyone thinks it is the easy, trivial implementation, then they will nail you by checking the hash of the content. Identical content with trivially altered ID3 tags is a very good argument that you got this file from the thousands of other people who have the same hashed file with trivially altered ID3 tags.
I'm proposing a non-trivial, but not that conceptually complex, alteration to the content that alters it in an imperceptible way. In fact, whether the alteration seems complex to you is irrelevant. After all, it is just a command line tool to you anyway, just like altering ID3 tags. You don't care how it is done. Run this tool on your mp3 file, and it randomly makes an imperceptible alteration to one of the thousands of frames in the file.
However I doubt that they will go to such trouble -- if they have access to your files you're pretty much caught red-handed. A different MD5 checksum won't get you off of the hook here.
They might have access to your files if you are sharing them.
I think the original argument is that Jane Doe was sharing files. Jane claims the sharing is unintentional. Jane claims that the mp3's on her hard drive are her own rips of CD's she owns. The MD5 hash proves otherwise. This sub-discussion is about altering mp3's so that hashing is now useless at tracking the source of where you got an mp3 from. In the Jane Doe scenario, a complex mp3 alteration to foil the MD5 hash would actually be useful.
Merely altering the ID3 tag such that the RIAA can also alter the ID3 tag back to what it is in the wild, and get identical MD5 hashes is a very strong argument against Jane Doe.
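A rough illustration of why tag edits alone do not help: anyone comparing files can hash only the bytes between the tags. This is a minimal sketch under stated assumptions. It handles only an ID3v2 block at the start of the file (10-byte header with a syncsafe size field) and an ID3v1 block at the end (128 bytes beginning with "TAG"); it ignores APE tags, padding and anything else a real file may carry. `strip_id3` and `audio_md5` are names made up for this example, and nothing in the article says this is the method the RIAA actually uses.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag or a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2: "ID3", two version bytes, one flags byte, four syncsafe size bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)  # 7 significant bits per byte
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag adds another 10 bytes
            start += 10

    # ID3v1: fixed 128-byte trailer starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(data: bytes) -> str:
    """MD5 of the file contents with the ID3 tags ignored."""
    return hashlib.md5(strip_id3(data)).hexdigest()
```

Two copies with different tags then hash the same as long as the audio bytes are untouched, which is the argument made above: only a change to the audio data itself defeats this kind of comparison.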
The price of freedom is eternal litigation.
Can I share files for myself? I'm at work... I have a large CD (and MP3) collection at home. I have a high speed internet connection. Can I share the files to myself for use at work? (Ok, put the thinking caps on for a minute....)
Try the following: Install some CD ripping/encoding software. Leave it at the defaults. Use CDDB to generate the ID3 tags. Unless something gets corrupted, that *will* produce an identical file, down to the last bit.
You may be right. I'm not sure. I have some doubts about the ripping process being as exact as you say. I agree that the mp3 encoding process is exact. Same input file, same settings, ---> same output file.
The price of freedom is eternal litigation.
Don't we already pay a small tax to the recording industry every time we buy blank audio CDs (but not data CDs)? I'd like to see some lawyer fight a case claiming that a P2P user has already paid the RIAA and is therefore exempt from their lawsuits when downloading the music and burning it to an audio CD. That would be an interesting lawsuit.
Although I may not have said it as well as I could have, that is the basis of my question. If the RIAA continues to make copyrighted CDs and shuts down P2P services, what am I to do when I have a damaged disc? I can't make a backup even though I am entitled to one, and I can't grab the files off of P2P because no one will give me access to the file out of fear of being sued. Now the RIAA can start making discs more fragile and easier to scratch, and I will be forced to buy the same disc over and over during the course of my lifetime. But I just want to listen to the damn song. Isn't it great to be a consumer in America?
The Tools Of Ignorance wanna be a tool?
Hashing and compression aren't really my thing so maybe someone could clarify my understanding.
I was under the impression that hashes are not reversible like compression algorithms are, but that they try to add as much chaos as possible between slightly different variations of the original. (The same way the telephone company racks up money by having area codes be very distant from each other; a typo in the area code probably means big bucks for a wrong number)
My spreadsheet of 1997 budget information could produce the same hash as a rip of Meco's Star Wars disco theme remix, but it would be unlikely to produce a hash similar to that of my 1996 budget information (which is practically the same, other than 1996 being 1997). None of these would ever compress to the same result using a lossless compression scheme (or they might be in for a surprise when they uncompressed their Meco track).
Producing a unique result for each file is what a compression algorithm does. If a hash were truly unique and reversible then you'd have a compression algorithm, right?
Now to make this relevant to this case...
Could someone make an MP3-from-MD5 generator? It'd create an MP3 with the goal of having exactly the same MD5 hash as the original song. Admittedly it'd probably sound like a confusion of radio static and Husker Du. Not anyone's cup of tea to listen to probably, but it might wind up being just the sort of edge case to make MD5 hashes insufficient evidence in court (especially if the defendant had a nose ring). If this isn't possible, then perhaps someone could make a JPG-from-MD5 generator? Visual noise is much more appealing to many than audible noise and probably easier to create.
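To get a feel for how hard such a generator would be, here is a small sketch that brute-forces an input whose MD5 merely starts with the same three hex digits as a target. It assumes plain trial and error is the only approach available, and `find_prefix_match` is a hypothetical name used only here. Matching 3 hex digits (12 bits) takes on the order of 16^3 = 4096 tries; each extra digit multiplies the expected work by 16, so matching all 32 digits by this method would take on the order of 2^128 tries.

```python
import hashlib
from itertools import count


def find_prefix_match(target_prefix: str):
    """Try "1", "2", "3", ... until one hashes to the given hex prefix.

    Returns the matching input and the number of attempts it took.
    """
    for attempts in count(1):
        candidate = str(attempts).encode()
        if hashlib.md5(candidate).hexdigest().startswith(target_prefix):
            return candidate, attempts


# Stand-in for the hash of the original song.
target = hashlib.md5(b"stand-in for the original song").hexdigest()

candidate, attempts = find_prefix_match(target[:3])
print("target   :", target)
print("candidate:", hashlib.md5(candidate).hexdigest())
print("attempts :", attempts)
```

The candidate is garbage rather than a playable file, and it matches only a sliver of the digest; a full match that is also a valid MP3 or JPG would be far harder still.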
So did I, so I just ran the experiment:
Looks like under identical conditions (same drive) it'll rip consistently. Ripping off a different drive might give different results, that's more hassle than I want to try right now. If anyone wants to compare, the disc/track I ripped is Pink Floyd's Dark Side of the Moon, Capitol's catalog # CDP 7 46001 2, DIDX 226. (Different recordings will almost certainly give different results.) Oh, and to make RIAA happy:
;-) -- Alastair
Everyone is missing the point here with the MD5 hashes.
OK, if you use the defaults in your MP3 encoder, and the ID3 tags from CDDB the *encoding* would be the same, but not the end file. Know why?
The ripping process differs greatly - you've got things like scratches on discs that some CD-ROMs will pick up as errors and some won't, you've got pauses due to slow processor/HD on different computers etc.
The only way I'd say to get an identical file would be to rip it using the same computer, encoder and CDDB - in which case "Jane Doe" must have been the original producer of the Napster file if the KaZaA one matches it (or she copied it from someone else).
She's guilty as Hell, but personally I support her as the RIAA/MPAA are scum.
#include <sig.h>
Maybe they're speculating that the jury will immediately succumb to the magic word 'hash'.
But otherwise, frankly, I don't see what this could be good for. Hashes (whether MD5 or SHA or some other algorithm) don't prove a thing.
Identity: Identical hashes for two MP3s only prove that the MP3s were encoded with identical settings from an identical CD source. If two people, one in NY and the other in LA, buy the latest Red Hot Chili Peppers album and rip and encode it both on Windows machines using identical versions of RealOne (or any encoder) then the resulting MP3s will have identical hashes. Whether the probability of two different files accidentally having the same hash is 1 in 2 or 1 in 2^127 is absolutely irrelevant here. The chances of two people using the same software with the same CDDB information to rip the same track from a CD that sold a million copies is a lot higher. Anybody with half an episode of Matlock's worth of legal expertise will tear the RIAA's position apart on this ground.
Trackability: Hashes cannot be used to reliably track the path of copies across P2P networks either. Since the hash is more sensitive to minor changes than the ear is, making random changes to the ID3 tags or randomly changing a bit or two somewhere in the MP3 will wipe the tracks.
So two files having the same hash doesn't prove they come from a single origin. Two files having different hashes doesn't prove they don't come from a single origin.
Hashes don't prove a thing
Well, except that most decent rippers these days use paranoia or something similar, using algorithms to interpolate the corrupt stuff. The interpolation is going to sound good but it's almost certainly not going to be the same bit-for-bit. And bit-for-bit is what matters.
Remember that the MD5 hashes are the values used by popular P2P software to enable synchronized multi-source downloading of a file. If everybody "sharing" modifies files to affect MD5 hash values, then the P2P networks essentially fall apart into single source FTP-like downloading.
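The effect on multi-source downloading can be sketched in a few lines. This is a simplified illustration only, assuming a client that treats "same hash" as "same file"; it is not how any particular P2P client is implemented, and real clients may use other hash functions. `group_sources_by_hash` and the peer names are invented for this example.

```python
import hashlib
from collections import defaultdict


def group_sources_by_hash(offers):
    """Group peers by the MD5 of the content they offer.

    offers is an iterable of (peer_name, file_bytes) pairs. Peers in the
    same group can serve pieces of the same download interchangeably.
    """
    groups = defaultdict(list)
    for peer, content in offers:
        groups[hashlib.md5(content).hexdigest()].append(peer)
    return dict(groups)


song = b"stand-in for an mp3 file" * 100

# Three peers share byte-identical copies; one has altered a single byte.
offers = [
    ("peer_a", song),
    ("peer_b", song),
    ("peer_c", song),
    ("peer_d", song[:-1] + b"!"),
]

for digest, peers in group_sources_by_hash(offers).items():
    print(digest, peers)
```

With unmodified copies, three peers form one swarm; the altered copy is a swarm of one. If every sharer altered their copy, every swarm would shrink to a single source, which is the FTP-like situation described above.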
Different drives, with the same disc, and identical software, certainly do give different results. Just tested. Identical versions of cdparanoia live on both systems.
I also ran lame with default settings (makes a 128K CBR) on both WAVs and got different sums there as well.
No tags involved.
I don't know of a single MP3 ripper anymore that doesn't error-check the data as it is ripped.
Heck, I've taken unplayable CD's, run them through cdparanoia and gotten songs with no (audible) skips.
I contend that it is in fact *very* easy to end up with two computers producing two identical MP3's with the same hardware/software combination.
8b24f4f77034299b716cae19d687e807 icp2.wav
11d92db3509d53f40c62837e4d65f64e icp.wav
Also, I removed the second, then duplicated the first file and ran oggenc on both copies. This is the md5sum output.
8652995a3dbc5ff9888b0f2bab583959 icp2.ogg
56241131ffcc27e44a950bc8fac7b866 icp.ogg
I doubt very much that any VBR encoder produces the exact same output twice. Thanks for listening, and yes I did remove both copies afterwards.
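For anyone who wants to repeat this kind of comparison without the `md5sum` command, the following is a minimal Python sketch that hashes a file in chunks. `md5_of_file` and `same_content` are names chosen for this example, and the file names in the usage note are simply the ones from the listing above.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling `same_content("icp.wav", "icp2.wav")` on the two rips above would be expected to return False, since their sums differ.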
I also ran lame with default settings (makes a 128K CBR) on both WAVs and got different sums there as well.
This part is not at all surprising. Even one single bit difference in two files would give radically different MD5 hashes.
Different drives, with the same disc, and identical software, certainly do give different results. Just tested. Identical versions of cdparanoia live on both systems.
This part is the really interesting result. Two different rips, same software, same CD, give different results on different drives. I think cd paranoia says something about "digital jitter" whatever the heck that means?
The price of freedom is eternal litigation.
Right, but I figured, maybe the bit differences might disappear in the encoding, some wacky things you can only determine empirically :-)
Not sure about "digital jitter" myself, but I do know that pretty much all discs have errors all over the place (I backup my audio CDs with cdrdao, which tells me just how many CRC errors it had -- not seen a disc with less than a hundred yet), and the difference probably lies mostly in error correction strategies employed by the drives themselves. I don't know this for sure though.
Excuse me if this has already been covered, but if all the rips have different MD5 hashes then all appear to be from unique users who have the disc. So is it possible to modify each mp3 to have a unique MD5 hash, or as unique as possible, thus negating the argument and problem of all copies coming from one user? Just a thought.
Thanks,
--
Matt
"Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
- You keep saying that you are interested in a "capitalistic" solution, yet your entire argument seems to be based on the communist principle of "from each according to his abilities, to each according to his needs." You even seem willing to go down the path that all communists eventually follow, arguing that the government should make stricter and stricter laws and (if needed, back them up with force) to make sure that your system "works."
- You are assuming what you are trying to prove.
I dispute 2 & 3; I hold that the urge to create is a fundamental part of what it means to be human, as is the urge to copy/imitate others. I dispute 4 because people (such as game designers, cooks, fashion designers, etc.) make money off of goods (games, food, clothing) which are not covered by IP (excluding trademarks, as I did earlier).
I have worked in the game industry for over twenty years, and in all that time I have never seen IP laws successfully used to defend a company like SJG, but have seen several cases where they were successfully used to attack one.
[ As an aside, I was one of the people who wrote a letter in support of Steve when he was raided by the FBI years ago. ]
As for IP laws being the cause (rather than a consequence) of the wealth of creative output, consider. In a state of nature, man copies what he sees others doing. It is a basic part of our nature. In a creatively impoverished environment, there is the risk that there may not be enough templates to copy, because only a few people are innovating in any given area, and they may elect to hide their discoveries. So society offers a bargain: they will prevent the natural copying for a limited period of time, in exchange for the disclosure of new discoveries / inventions. This is the basis of all IP except trademarks.
As society grows larger, richer, and more diverse, the supply of templates rises rapidly. If all parties adhered to this "fair trade" and the growth arose from the IP laws (as you suggest), we should expect the price (length of IP terms, etc.) to drop as the supply increased and the demand remained relatively constant.
If, as I maintain, causality goes the other way and the natural growth of society's creative output (which has made IP increasingly lucrative) is instead driving IP laws, we should expect the price to rise--and this is in fact what we see.
-- MarkusQ