RIAA Tracking Songs by MD5 Hashes
aSiTiC writes "Apparently RIAA has obtained some technical experts in their prosecution of file swappers. Currently they are tracking traded mp3 files from the Napster network by matching MD5 hashes. This seems quite interesting but I was under the assumption that identical hashes could be created with identical rips and id3v2 tagging. Now may be the time to update your illegal mp3 file MD5 hash sums."
The md5 hashing algorithm has been proven to contain flaws allowing two files to produce identical md5 sums.
The only way for two files to have the same MD5 hash is for them to both be encoded with the same encoder, from the same WAV file, with the same bitrate and all advanced options, and to have exactly the same ID3 information, the same filesize, and to be identical to the last bit.
Otherwise, the MD5 will be nothing like the same, for two perfectly identical songs where one has a spelling error in one field of the ID3 tag. I imagine for any one song, there are many many different MD5sums out there, although perhaps one or another good quality version would exists on hundreds of different PCs...
Conversion Rate Optimisation French / English consultant
hmm Isn't that how k-sig, built into Kazaa Lite K++, works, by tracking MD5 hashes so ppl get exactly the file they want.
Changing MD5 hashes on songs to avoid RIAA would also lessen the effectiveness of K-SIG. Trading hashes of know working files was one of the ways ppl on P2p avoided downloading those fake RIAA files.
Yes, because for them to know that you have the MP3s, you have to be sharing them, which is the illegal part.
-- Dr. Eldarion --
Any modification, to ANY bit of the file covered by the hash, will change the MD5 hash (that's how hashes work). If you assume the hash includes the ID3 tag info, then simply editing the info (putting something in the notes field, for example) would change the hash.
On the other hand, if I were the RIAA attempting to identify common files in this way, I might be inclined to exclude the ID3 tag from the MD5 computation since it is so easily modified.
Any changes to the actual content, though, will ripple into the MD5 computation.
Short answer: "normalizing" the file for volume, or even chopping off a few seconds of trailing silence with something like CoolEdit will certainly change the hash and make it distinct from whatever their baseline hash value is.
The only problem is that a lot of file sharing software uses the fact that 2 files (from different sources) have the same hash in order to swarm the download from multiple sources. If everybody goes around intentionally making their mp3s have different hashes, swarming basically won't work anymore.
No, I don't want a free iPod
(This wouldn't, though, be a defense for the central problem that she made all of these MP3s available for download by millions of anonymous strangers without the consent of the copyright holders. And assuming her identity is revealed and she is sued, if the "ambiguous" claim's alternative interpretation is correct, she'll be able to show the CDs to the Judge.)
You are not alone. This is not normal. None of this is normal.
Pretty much no rip is identical.
First step: the *.wav is ripped. Using libcdparanoia, which i personally perfer, i find slight variation in size depending on the machine and cdrom drive i rip them on.
Second step: encoding on different machines, with different encoders, using different algorythms, using different levels of floating point precision, on different architectures etc... produces vastly different files.
Third step: sharing. Oftentimes an mp3 is downloaded 99.8% before the connection is broken. You keep the mp3 becuase mp3 is a sequential file format and you only lose a second or two of music. The rest of the file is intact.
Their md5 searching scheme could be circumvented quite easily by changing a comment in the id3 but they could get around that by cutting out the id3 part of the file when they make their md5sum.
The downside to this is that if you are searching for music on something like gnutella by the ***sum, the content would differ and you would not get as many results. Gnutella would not download from multiple sources becuase the file would not have the same signature.
Whatever the case, it is clear that some form of file obfuscation is now needed for safety online. Or we can wait for freenet to mature.
http://news.bbc.co.uk/1/hi/entertainment/music/318 7695.stm
:wq
> This proof of RIAA is as good as the SCO evidences of greek language or bsd firewall code against linux
/. were clamoring for some MD5 sums instead...
Uh, actually this is irrefutable proof. It will miss a lot of songs, but it is virtually guaranteed to not give false positives. This is much more solid proof than SCO had.
To think a month or two ago when SCO was insisting on an NDA many on
Obviously the RIAA's technical experts know what they are doing... its time to alter a few ID3 tags like the story suggested.
The unofficial
you have to be sharing them, which is the illegal part
Actually that's not true. They only care about the sharing because it leads to what they really care about: people listening to music that they didn't pay for. If everyone who shared mp3s had bought every CD of the songs they downloaded, no one would care because they would have already paid to listen to those songs. The problem is that most people don't own all of the CDs for the songs they download, and the RIAA doesn't like it when you try to wriggle out of their money trap. If the actual sharing was the problem, the distribution itself, then we wouldn't have radio stations playing music either, because that also lets people listen to music they didn't pay for, but it's a bit different because you don't really get a choice of what you hear. But now if you go and start recording songs you hear on the radio, so you could listen to them whenever you wanted, you're getting into that grey area. Of course the RIAA doesn't really care about that because they know that radio quality is shit, so there won't be widespread radio recording anyway.
--
Promoting critical thinking since 1994.
If that's all you want to do, much better not to use Cooledit, which has to expand and recompress the file to MP3. Use something like MP3Trim which can chop off any given number of MP3 frames, or normalise the volume, by operating on the MP3 directly. Much much faster, and no expand/recompress quality loss.
I just did some consecutive rips of an audio track and compared the md5 checksums.
I did the same song three times. The first two times, all things were equal including all settings. The MD5 checksums were the same.
I swapped out my DVD/CD player for a different model. Reripped the track on the same computer with the same exact settings and the MD5 was different.
I am using Exact Audio Copy in secure mode and Lame for the encoding. The ID tags were recieved the first time and the same tags used for all three attempts (EAC remembers the disk).
I'm sure I could try many things like changing the read speed, comparing the wav files and not just the resulting mp3 etc.. but I do not have the time for more analysis.
Bad boys rape our young girls but Violet gives willingly.
This is pretty common at least with iTunes. Most of the people will not change the default settings, so each cd rip will be identical, all using the same id3 tags.
What, me worry?
Theres issues of offset values (as with CD audio it is difficult to hit an *exact* location on the disk), plus the way the reader deals with C1 and C2 error correction, as well as how different extracting software interfaces with the hardware.
It would almost be safe to say two mp3s with the the same MD5 are one file copied twice (as opposed to two individually created mp3s), but that doesn't mean they are illegal...
If that were possible, it would destroy the value of an MD5 hash immediately and everyone wouild quit using it faster than you could blink.
The purpose of CRC hashes is entirely different. They are designed to detect a burst of bit errors in a stream of data, the type of error that is most likely to occur in a network transmission. They are not meant for fingerprinting files.
I doubt that anyone with any degree of sophistication in cryptology would attempt to use CRC and MD5 hashes interchangeably.