Napster, Audio Fingerprinting, and the Future of P2P

Posted by CmdrTaco on Sunday July 13, 2003 @03:17AM from the well-I'll-believe-it-when-I-see-it dept.

mjmalone writes "Napster founder Shawn Fanning is poised for a comeback. It seems the now 22-year-old Fanning has developed technology which creates an "audio fingerprint" of each track and compares it against fingerprints in his firm's database to determine legality. A fee may be set and collected on a copyrighted track by its rightful owner. Fanning is actively recruiting industry support as well as pushing the idea to P2P services such as Kazaa and Grokster." This isn't exactly new technology, but it's still interesting to see what Fanning is up to these days besides movie cameos.

Napster was adding this in its dying days... by ergo98 · 2003-07-13 03:25 · Score: 5, Informative

I recall that in its dying days Napster was talking about adding this to appease the recording industry. The variation then was from a company called Relatable. Sounds like Shawn is stuck in a recursive loop.

The Parson's Code by Ian Jefferies · 2003-07-13 03:28 · Score: 5, Informative

I remember seeing a book once that helped you identify songs by whether the sequence of notes at the beginning of the piece went up, down or stayed the same pitch when compared to the previous note. It was about the size of a telephone directory.

A quick Google finds out that it's called The Parson's Code, with a lot more information here.

Presumably the fingerprinting scheme works in a similar fashion (over a larger portion of the song, and probably over multiple fragments of the song as well).
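
To make the up/down/same idea concrete, here is a minimal sketch in Python. It assumes the melody is already available as a list of pitch numbers (MIDI note numbers here, which is my choice and not something the book prescribes); the function name `parsons_code` and the `*`/`u`/`d`/`r` symbols follow the usual convention but are otherwise illustrative.

```python
def parsons_code(pitches):
    """Return the contour code for a sequence of pitches (e.g. MIDI note numbers).

    The first note is written '*'; each later note is 'u' (up), 'd' (down)
    or 'r' (repeat) relative to the note just before it.
    """
    if not pitches:
        return ""
    symbols = ["*"]
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("u")
        elif current < previous:
            symbols.append("d")
        else:
            symbols.append("r")
    return "".join(symbols)


# Opening of "Twinkle, Twinkle, Little Star" as MIDI note numbers: C C G G A A G
twinkle = [60, 60, 67, 67, 69, 69, 67]
print(parsons_code(twinkle))
```

Because only the direction of each step is kept, the same tune transposed to another key produces the same string, which is what makes it usable as a lookup key.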

Ian.

--
A physicist is an atom's way of thinking about atoms

Re:I'm not surprised about Fanning. by TheKey · 2003-07-13 03:42 · Score: 5, Informative

Uh? Fanning made Napster. Literally.

--
My Journal - 1,337 fans and countin

Fingerprint for free by Davak · 2003-07-13 04:04 · Score: 4, Informative

MusicBrainz already has a free music fingerprint program. It identified about 60-70% of my songs correctly. It also will rename your files and update the ID tags.

The 30-40% it did not find... I could easily find by doing some searching manually through the program.

It was a nice way to completely identify my mp3 collection. Yes, it's a legal collection, but I wanted an easy way to rename the files and id tags.

Anyhoo... the program is pretty buggy so save often. Help the cause.

Enjoy.

DavaK

Re:What an awesome new technology! by gordyf · 2003-07-13 04:17 · Score: 4, Informative

This is not an MD5 hash; it is a spectral-analysis "fingerprint" of the song. Thus they can identify the song no matter what the encoding (within reason, of course, but you wouldn't want to listen to a song so badly encoded that it can no longer be recognized anyway).
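
A toy illustration of the difference, as a sketch only: it assumes a synthetic sine tone stands in for a song, a slightly quieter copy with a small added component stands in for a re-encode, and "strongest frequency bin" stands in for a real fingerprint. The helper names (`make_tone`, `md5_of`, `dominant_bin`) are made up for this example and are not taken from any fingerprinting product.

```python
import cmath
import hashlib
import math
import struct


def make_tone(freq_bin, n=64, amplitude=1.0, noise=0.0):
    """Synthesize n samples of a sine wave that completes freq_bin cycles."""
    return [
        amplitude * math.sin(2 * math.pi * freq_bin * t / n)
        # a small deterministic component standing in for an encoding artifact
        + noise * math.sin(2 * math.pi * 29 * t / n)
        for t in range(n)
    ]


def md5_of(samples):
    """Hash the exact bytes of the samples, as a file checksum would."""
    raw = struct.pack(f"{len(samples)}d", *samples)
    return hashlib.md5(raw).hexdigest()


def dominant_bin(samples):
    """Return the index of the strongest frequency bin (naive O(n^2) DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        magnitudes.append((abs(coeff), k))
    return max(magnitudes)[1]


original = make_tone(5)
reencoded = make_tone(5, amplitude=0.9, noise=0.01)

print(md5_of(original) == md5_of(reencoded))              # checksums disagree
print(dominant_bin(original), dominant_bin(reencoded))    # spectral feature agrees
```

The checksum changes as soon as any byte changes, while the spectral feature only moves when the sound itself changes noticeably.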

See http://musicbrainz.org/ for some software that uses the same technology to help you tag your MP3s.

I'm sure someone will come up with some software that, say, rearranges the MP3 frames of a song, foiling the fingerprinting but allowing the song to be restored on the other end.

One Way Audio Fingerprinting Works by Flwyd · 2003-07-13 06:38 · Score: 4, Informative

As a class project, a friend and I built a music recognition database. You can read our paper.

The general approach is fairly straightforward. You extract a set of "features" (typically several Mel Frequency Cepstral Coefficients, or MFCCs) from each short frame of the song, say 10 ms. You then pick several (say, 16) arbitrary points and iteratively generate that many "average" feature vectors, along with weights that sum to one. This data is turned into a Hidden Markov Model (HMM). To see what audio you have, you run it through each of the possible HMMs and see which produces the greatest likelihood.
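
The "pick arbitrary points, then iteratively average" step can be sketched roughly as follows. This is an assumption-laden illustration: the description above does not name the clustering algorithm, so plain k-means is used here; the two-dimensional vectors are made-up stand-ins for MFCCs; and the HMM training that would consume the result is left out. `summarize_frames` is a hypothetical name.

```python
import math


def summarize_frames(frames, k, iterations=10):
    """Cluster per-frame feature vectors into k 'average' vectors plus weights.

    frames: list of equal-length feature vectors (stand-ins for MFCCs).
    k: number of averages to produce; must be at least 1.
    Returns (centroids, weights), where weights[i] is the fraction of frames
    assigned to centroids[i] in the final pass, so the weights sum to 1.
    """
    # Arbitrary starting points: k frames spread evenly through the song.
    step = len(frames) // k
    centroids = [list(frames[i * step]) for i in range(k)]

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for frame in frames:
            nearest = min(range(k), key=lambda i: math.dist(frame, centroids[i]))
            clusters[nearest].append(frame)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]

    weights = [len(members) / len(frames) for members in clusters]
    return centroids, weights


# Made-up features: three frames near (0, 0) and three near (5, 5).
frames = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
          [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
centroids, weights = summarize_frames(frames, k=2)
print(centroids, weights)
```

With 16 centroids and real MFCC frames, the centroids and weights would be the compact per-song summary from which a model is built.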

This method is typically applied to speaker recognition, where a linear search through HMMs is reasonable. This obviously isn't the case when you know about hundreds of thousands of songs, so a large part of the challenge is narrowing the field of HMMs to check (which is one of the focuses in our paper). Relatable, who were working with Napster a long time ago, have clusters that can classify 1,000 songs per second; I'm pretty sure they use this technique.

This technique has several important features. First, it doesn't depend on any properties of the files themselves. Checksums would be trivial to beat; looking at a file's length could be circumvented by inserting silence, and so on. Since this creates an average of sample data, a song would need to be changed quite a bit to fail to match. (The system is robust to, for instance, changes in bitrate, slowing the music down, and rearranging bits of the song or putting it in reverse.) We didn't have enough "derivative" music to test how it handles sampled music vs. the original -- it depends how much is changed.
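
Why averaging shrugs off rearrangement can be seen in a few lines. This is an idealized sketch: it assumes the per-frame feature vectors themselves survive the rearrangement unchanged (random numbers stand in for them here), and it only checks the plain average, not a full model.

```python
import math
import random


def mean_vector(frames):
    """Average the per-frame feature vectors column by column."""
    return [math.fsum(col) / len(frames) for col in zip(*frames)]


random.seed(0)  # fixed seed so the run is repeatable
# Made-up features: 200 frames of 4 values each.
frames = [[random.random() for _ in range(4)] for _ in range(200)]

reversed_frames = frames[::-1]                        # song played backwards
shuffled_frames = random.sample(frames, len(frames))  # pieces rearranged

baseline = mean_vector(frames)
for variant in (reversed_frames, shuffled_frames):
    assert all(math.isclose(a, b)
               for a, b in zip(baseline, mean_vector(variant)))
print("average is unchanged by reordering")
```

An average does not care about the order of the things being averaged, so shuffling or reversing the frames leaves this kind of summary where it was.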

Finally, this sort of system is useful for much more than song identification. You can build a model for an artist or genre and determine how to classify the song. One of my focuses in the paper is unsupervised genre classification -- my tests indicated some fairly reasonable groupings. This technique could be used for music recommendation -- "You like Dropkick Murphys? Well, they sound like Flogging Molly, so you might want to check them out."
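
As a rough sketch of the recommendation idea, assume each artist has been boiled down to a single summary vector and that "sounds like" means "closest vector". Both are simplifications for illustration; the artist names and numbers below are invented, and `most_similar` is a hypothetical helper.

```python
import math


def most_similar(target, models):
    """Return the name in models whose vector is closest to models[target]."""
    others = (name for name in models if name != target)
    return min(others, key=lambda name: math.dist(models[target], models[name]))


# Made-up summary vectors, one per artist; real ones would come from the
# trained models rather than being typed in by hand.
models = {
    "artist_a": [0.9, 0.2, 0.4],
    "artist_b": [0.8, 0.3, 0.4],
    "artist_c": [0.1, 0.9, 0.7],
}
print(most_similar("artist_a", models))
```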

--
Ceci n'est pas une signature.