Slashdot Mirror


RIAA Tracking Songs by MD5 Hashes

aSiTiC writes "Apparently RIAA has obtained some technical experts in their prosecution of file swappers. Currently they are tracking traded mp3 files from the Napster network by matching MD5 hashes. This seems quite interesting but I was under the assumption that identical hashes could be created with identical rips and id3v2 tagging. Now may be the time to update your illegal mp3 file MD5 hash sums."

9 of 779 comments (clear)

  1. MD5 Cannot stand up in court. by Organized+Konfusion · · Score: 5, Informative

    The md5 hashing algorithm has been proven to contain flaws allowing two files to produce identical md5 sums.

    1. Re:MD5 Cannot stand up in court. by Urkki · · Score: 5, Informative

      A bit of clarification is in order I think.

      First of all it's very clear that two files can give same MD5 checksums. After all, MD5 is only 16 bytes (2^128 different possible). So if you have just 17 byte files (2^136 different possible), it's clear that on average every MD5 sum matches to 256 of all possible files.

      It's just damn unlikely to get 2 files with same MD5, and if you wanted to brute force it, you would have to try average 2^64 different files before you found one with identical MD5 to another file. And this would take a long time (actually not that terribly long, a few years at most, and it parallelizes perfectly).

      The page you link to implies that it's possible to "easily" fabricate a file that produces a given check sum, so instead of months of processing time, only days or hours would be needed to get a MD5 hash collision.

      So all P2P users / software makers need to do to circumvent this, is to agree on a specific MD5 sum, then patch every file so that they produce this same MD5 sum :)

      Of course the obivious solution for RIAA would be to use a more secure hash algorithm, with more bits. Unbroken algorithm with enough bits can't be faked, as it would take more than age of the universe to brute force it.

      Though the basic problem with this RIAA method remains. If you rip with same software from identical CD digitally, and there are not bit errors at ay point, then you should end up with identical file, and therefore identical hash no matter how secure the algorithm is...

  2. MD5 Hash by fruey · · Score: 5, Informative
    This seems quite interesting but I was under the assumption that identical hashes could be created with identical rips and id3v2 tagging.

    The only way for two files to have the same MD5 hash is for them to both be encoded with the same encoder, from the same WAV file, with the same bitrate and all advanced options, and to have exactly the same ID3 information, the same filesize, and to be identical to the last bit.

    Otherwise, the MD5 will be nothing like the same, for two perfectly identical songs where one has a spelling error in one field of the ID3 tag. I imagine for any one song, there are many many different MD5sums out there, although perhaps one or another good quality version would exists on hundreds of different PCs...

    --
    Conversion Rate Optimisation French / English consultant
  3. Md5 hashes are also used for.... by shione · · Score: 5, Informative

    hmm Isn't that how k-sig, built into Kazaa Lite K++, works, by tracking MD5 hashes so ppl get exactly the file they want.

    Changing MD5 hashes on songs to avoid RIAA would also lessen the effectiveness of K-SIG. Trading hashes of know working files was one of the ways ppl on P2p avoided downloading those fake RIAA files.

  4. Re:What happen if by l1gunman · · Score: 5, Informative

    Any modification, to ANY bit of the file covered by the hash, will change the MD5 hash (that's how hashes work). If you assume the hash includes the ID3 tag info, then simply editing the info (putting something in the notes field, for example) would change the hash.

    On the other hand, if I were the RIAA attempting to identify common files in this way, I might be inclined to exclude the ID3 tag from the MD5 computation since it is so easily modified.

    Any changes to the actual content, though, will ripple into the MD5 computation.

    Short answer: "normalizing" the file for volume, or even chopping off a few seconds of trailing silence with something like CoolEdit will certainly change the hash and make it distinct from whatever their baseline hash value is.

  5. Easy by sprouty76 · · Score: 5, Informative
    Just take a random id3 field that you don't use for anything, and fill it with a random number. You can probably write a srcipt in a few seconds. Bingo, different md5.

    The only problem is that a lot of file sharing software uses the fact that 2 files (from different sources) have the same hash in order to swarm the download from multiple sources. If everybody goes around intentionally making their mp3s have different hashes, swarming basically won't work anymore.

    --

    No, I don't want a free iPod

  6. MD5 sums and different encoders by Psyborgue · · Score: 5, Informative

    Pretty much no rip is identical.

    First step: the *.wav is ripped. Using libcdparanoia, which i personally perfer, i find slight variation in size depending on the machine and cdrom drive i rip them on.
    Second step: encoding on different machines, with different encoders, using different algorythms, using different levels of floating point precision, on different architectures etc... produces vastly different files.
    Third step: sharing. Oftentimes an mp3 is downloaded 99.8% before the connection is broken. You keep the mp3 becuase mp3 is a sequential file format and you only lose a second or two of music. The rest of the file is intact.

    Their md5 searching scheme could be circumvented quite easily by changing a comment in the id3 but they could get around that by cutting out the id3 part of the file when they make their md5sum.
    The downside to this is that if you are searching for music on something like gnutella by the ***sum, the content would differ and you would not get as many results. Gnutella would not download from multiple sources becuase the file would not have the same signature.
    Whatever the case, it is clear that some form of file obfuscation is now needed for safety online. Or we can wait for freenet to mature.

  7. Re:What happen if by 1u3hr · · Score: 5, Informative
    Short answer: "normalizing" the file for volume, or even chopping off a few seconds of trailing silence with something like CoolEdit will certainly change the hash

    If that's all you want to do, much better not to use Cooledit, which has to expand and recompress the file to MP3. Use something like MP3Trim which can chop off any given number of MP3 frames, or normalise the volume, by operating on the MP3 directly. Much much faster, and no expand/recompress quality loss.

  8. Re:MD5-hashes by nolife · · Score: 5, Informative

    I just did some consecutive rips of an audio track and compared the md5 checksums.

    I did the same song three times. The first two times, all things were equal including all settings. The MD5 checksums were the same.

    I swapped out my DVD/CD player for a different model. Reripped the track on the same computer with the same exact settings and the MD5 was different.

    I am using Exact Audio Copy in secure mode and Lame for the encoding. The ID tags were recieved the first time and the same tags used for all three attempts (EAC remembers the disk).

    I'm sure I could try many things like changing the read speed, comparing the wav files and not just the resulting mp3 etc.. but I do not have the time for more analysis.

    --
    Bad boys rape our young girls but Violet gives willingly.