Slashdot Mirror


Efficiently Reading ID3v2 Tags Over HTTP?

Paul Crowley asks: "Given an HTTP URL for an MP3 file, what's the best way to read its ID3 tags on a GNU/Linux system? It shouldn't be necessary to fetch the whole file: HTTP byteranges should make it possible to fetch only the tiny fraction that's needed, for a big saving in network bandwidth. However, existing ID3v2 libraries are designed to read local files. Extending these libraries for this purpose, or implementing a new one, would be a big job. What's the clean solution - is FUSE the best way, or is there a simpler way that doesn't require root privs? Can I do it using the existing id3lib binary?"
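A minimal sketch of the HTTP byte-range mechanism the question relies on, using only Python's standard library. The helper names (`range_header`, `parse_content_range`, `fetch_range`) are illustrative, not from any existing tool. It assumes the server supports `Range` requests and answers `206 Partial Content`. A server that ignores the header replies `200` with the whole file, so the sketch checks for that. The total file size in `Content-Range` is worth keeping, since finding a tag at the end of the file needs it.

```python
import re
import urllib.request

def range_header(start, end=None):
    """Build a Range header value; end is inclusive, a negative start is a suffix range."""
    if start < 0:
        return f"bytes={start}"          # e.g. "bytes=-128": the last 128 bytes
    return f"bytes={start}-" if end is None else f"bytes={start}-{end}"

def parse_content_range(value):
    """Parse e.g. "bytes 0-9/123456" into (first, last, total or None)."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)

def fetch_range(url, start, end=None, timeout=10):
    """Fetch one byte range; return (data, total file size or None)."""
    req = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status != 206:
            # A 200 means the server ignored Range and is sending the whole file.
            raise RuntimeError(f"Range not honoured (HTTP {resp.status})")
        cr = resp.headers.get("Content-Range")
        total = parse_content_range(cr)[2] if cr else None
        return resp.read(), total
```

For example, `fetch_range(url, 0, 9)` asks for the first 10 bytes and `fetch_range(url, -128)` for the last 128.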

7 of 65 comments

  1. how I would do it by HughsOnFirst · · Score: 2, Insightful

    From your question, it sounds like you have already figured out how to use HTTP to grab the relevant byte range. I don't know anything about ID3 tags, but if they are at the head of the file, then you just need to get the head; if they are at the tail, then just get the tail. Save the relevant byte ranges as local files in a directory structure based on the URL, so that http://mess-o-mp3.com/content/this.mp3 would map to /mess-o-mp3.com/content/this.mp3. If necessary, append or prepend a dummy MP3 file so that existing ID3v2 libraries that are designed to read local files can read the ID3 tags. Then just run the "existing ID3v2 library" against the file tree you have just built, and translate the output from describing the contents of the file system to describing the contents of the Internet.

    This doesn't seem so complicated to me.
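    The URL-to-directory mapping described above can be sketched in Python. `url_to_local_path` and `save_fragment` are hypothetical helpers, and the mirror root is whatever directory you choose. Path segments are kept percent-encoded, and "." and ".." segments are rejected so a crafted URL cannot write outside the root.

```python
import os
from urllib.parse import urlsplit

def url_to_local_path(url, root):
    """Map http://host/dir/file.mp3 to <root>/host/dir/file.mp3."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP URL: {url!r}")
    # Segments stay percent-encoded, so "%2F" cannot create extra directories.
    segments = [s for s in parts.path.split("/") if s]
    # Reject "." and ".." so the result always stays under the mirror root.
    if not segments or any(s in (".", "..") for s in segments):
        raise ValueError(f"unusable path in URL: {url!r}")
    return os.path.join(root, parts.netloc, *segments)

def save_fragment(url, data, root):
    """Store fetched bytes (e.g. the head of the file) at the mirrored path."""
    path = url_to_local_path(url, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

    After running an ID3 tool over the mirror, stripping the root prefix from each reported path gives back the host and path of the original URL.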

  2. Er... by cjpez · · Score: 1, Insightful

    ID3v2 tags are at the beginning of the file. That's one of the reasons ID3v2 was developed as an alternative to the ID3v1 tags, which were put at the end. Stick an http:// URL to an MP3 file with ID3v2 tags in your XMMS playlist and you'll watch the tag info populate in the playlist window right away.
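    A tag at the start of a file begins with a fixed 10-byte header, so one small range request tells you exactly how long the tag is. A Python sketch of decoding it, based on the ID3v2.3/2.4 layout: the identifier "ID3", two version bytes, a flags byte, and a 4-byte "syncsafe" size (7 bits per byte) that counts everything after the header except an ID3v2.4 footer. The function names are illustrative.

```python
def syncsafe_to_int(b):
    """Decode a 4-byte ID3v2 "syncsafe" integer (7 significant bits per byte)."""
    if len(b) != 4 or any(x & 0x80 for x in b):
        raise ValueError("not a syncsafe integer")
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def parse_id3v2_header(hdr):
    """Parse the first 10 bytes of a file; return None if there is no ID3v2 tag."""
    if len(hdr) < 10 or hdr[:3] != b"ID3":
        return None
    major, revision, flags = hdr[3], hdr[4], hdr[5]
    size = syncsafe_to_int(hdr[6:10])               # bytes after the 10-byte header
    has_footer = major == 4 and bool(flags & 0x10)  # footers exist only in v2.4
    return {
        "version": (major, revision),
        "flags": flags,
        "size": size,
        "total": 10 + size + (10 if has_footer else 0),  # whole tag, in bytes
    }
```

    Fetching bytes 0-9 and then bytes 10 through `total - 1` retrieves the whole prepended tag without guessing its size.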

  3. Re:HTTP 499 by pbox · · Score: 4, Insightful

    This is why

    1. read the first 3 bytes with an HTTP byte range
    2. if they are "ID3", process the tag from byte 0
    3. else read the last 10 bytes
    4. if they start with "3DI" (an ID3v2.4 footer), process the tag backwards
    5. else, see if there is an ID3v1 tag at the end (the last 128 bytes, starting with "TAG")
    6. if yes, read the 10 bytes just before the ID3v1 tag
    7. if "3DI", then process backwards

    So it is possible. He just needs to read the fricking ID3 tag definitions.
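    The procedure above, sketched in Python. `read_at(offset, length)` is a hypothetical callable standing in for one HTTP range request, and `file_size` is assumed known, for example from the total in a `Content-Range` header. Two small liberties: step 1 reads 10 bytes instead of 3, because the full header also carries the tag size; and since "3DI" footers exist only in ID3v2.4, the backward search finds only v2.4 tags.

```python
def _syncsafe(b):
    # ID3v2 sizes are "syncsafe": 4 bytes, 7 significant bits each.
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def locate_id3v2(read_at, file_size):
    """Return (offset, length) of an ID3v2 tag, or None if none is found."""
    if file_size < 10:
        return None
    # Steps 1-2: a tag at byte 0 starts with "ID3". Reading all 10 header bytes
    # instead of 3 also gives the tag size, so no guessing is needed.
    head = read_at(0, 10)
    if head[:3] == b"ID3":
        footer = 10 if (head[3] == 4 and head[5] & 0x10) else 0
        return 0, 10 + _syncsafe(head[6:10]) + footer
    if file_size < 20:
        return None
    # Steps 3-4: an appended ID3v2.4 tag ends with a 10-byte "3DI" footer.
    foot_end = file_size
    foot = read_at(foot_end - 10, 10)
    # Steps 5-6: otherwise look for the footer just before a 128-byte ID3v1 tag.
    if foot[:3] != b"3DI" and file_size >= 148 and read_at(file_size - 128, 3) == b"TAG":
        foot_end = file_size - 128
        foot = read_at(foot_end - 10, 10)
    # Steps 4/7: the footer repeats the header's size field; the tag is
    # header (10) + body + footer (10) and ends at foot_end.
    if foot[:3] == b"3DI":
        length = 20 + _syncsafe(foot[6:10])
        if length <= foot_end:
            return foot_end - length, length
    return None
```

    At most four small range requests are needed to find the tag. The tag itself can then be fetched with one more request.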

    --
    Code poet, espresso fiend, starter upper.
  4. Re:Source code is stupid by Marillion · · Score: 2, Insightful

    The bad part about just aborting the connection is that TCP uses an optimistic, windowing protocol: it will send multiple packets down the wire before getting an ACK, expecting them to be accepted and read. Sure, one connection wastes only a KB or two.

    If he needs to scan hundreds or thousands of files, that WILL add up in a hurry. Also, if he's clever, he can take advantage of persistent HTTP connections, disable Nagle's algorithm, and really get a performance boost, especially over a "slow" link.
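    A sketch of this suggestion using Python's `http.client`: one keep-alive connection serves several range requests to the same host, and `TCP_NODELAY` turns off Nagle's algorithm on the socket. `fetch_heads` is a hypothetical helper. It assumes plain HTTP and a server that answers `206` to `Range`. On any other status it drops the connection instead of draining a possibly full-size body.

```python
import http.client
import socket

def fetch_heads(host, paths, nbytes=10, port=80, timeout=10):
    """Fetch the first `nbytes` of each path, reusing one keep-alive connection.

    Returns {path: bytes or None}; None means the server did not answer 206.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    heads = {}
    try:
        for path in paths:
            if conn.sock is None:
                # (Re)connect, then disable Nagle's algorithm so each small request
                # goes out immediately (recent CPython may already set this).
                conn.connect()
                conn.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            conn.request("GET", path, headers={"Range": f"bytes=0-{nbytes - 1}"})
            resp = conn.getresponse()
            if resp.status == 206:
                # Read the small body fully so the connection can be reused.
                heads[path] = resp.read()
            else:
                # e.g. 200: Range was ignored. Close rather than download the file.
                heads[path] = None
                conn.close()
    finally:
        conn.close()
    return heads
```

    Abandoning a connection after a `200` still costs whatever data is already in flight, which is exactly the waste described above.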

    --
    This is a boring sig
  5. Re:Perhaps not this simple by Anonymous Coward · · Score: 1, Insightful

    That would entail downloading the whole file. I think the aim is to download just the chunk of data representing the tags.

  6. Amazed that no-one's really tried to attack this by Paul Crowley · · Score: 3, Insightful

    This is an interesting general problem. I'm sorry that so few people seem to have taken the time to understand it before replying. The general approach seems to be "read the first sentence, assume the poster is an idiot, hit reply".

    The problems are these:

    1) Reading ID3v2 tags on an MP3 file is a complex business. I have no desire to re-implement the libraries that do that, or even to wade deep into the existing codebases, if I can avoid that. And it should be possible to avoid that.

    2) Even knowing the size and location of ID3v2 tags is complex. Contrary to popular belief here, those tags can appear at either the beginning or the end of a file, and can be arbitrary size. I already implemented the "fetch some stuff at the beginning and some stuff at the end and feed that to the library" approach, and it sort-of works, but you have to guess the size of the tag. Guess too big, you fetch lots of data unnecessarily. Guess too small, you get breakage or wrong results. By contrast, the libraries that read ID3v2 tags know exactly where and how much to read to glean the appropriate data, and it should be possible to make use of that.

    3) I want to read existing data - changing the format of that data is not an option.

    So that's why I was suggesting solutions like "FUSE". With FUSE, when the library does a seek and a read, I can arrange for just the relevant portion of the file to be fetched. I don't have to include any knowledge about ID3 in my application - the library does all the work. But the library doesn't have to worry about HTTP byte ranges - FUSE handles that. And the code will always be correct.

    The only trouble is that FUSE requires a kernel patch and root privs. The question is, is there a way to do the same trick without those limitations? Or is there a library for reading ID3v2 tags in an object-oriented language that will let me put an efficient back-end for fetching data on request using HTTP byteranges in place of the file?
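    One way to get the "library does the seeks, back end does the byte ranges" split without FUSE, sketched in Python: a read-only, seekable file-like object that calls a `fetch(start, end)` callback only for the blocks actually read. `RangeFile` and `fetch` are hypothetical; `fetch` would wrap one HTTP range request. Whether a particular ID3 library accepts an arbitrary file object instead of a filename is an assumption to check against that library's documentation. The block size and the never-evicting cache are arbitrary choices.

```python
import io

class RangeFile(io.RawIOBase):
    """Read-only, seekable file object that fetches byte ranges on demand."""

    def __init__(self, fetch, size, block=4096):
        # fetch(start, end) -> bytes start..end inclusive (hypothetical callback,
        # e.g. one HTTP Range request); size = total length of the remote file.
        self._fetch, self._size, self._block = fetch, size, block
        self._pos = 0
        self._cache = {}            # block index -> bytes already downloaded
        self.bytes_fetched = 0      # lets you check how little was transferred

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def _block_at(self, i):
        # Download each aligned block at most once.
        if i not in self._cache:
            start = i * self._block
            end = min(start + self._block, self._size) - 1
            data = self._fetch(start, end)
            self.bytes_fetched += len(data)
            self._cache[i] = data
        return self._cache[i]

    def readinto(self, buf):
        want = min(len(buf), max(0, self._size - self._pos))
        out = memoryview(buf)
        done = 0
        while done < want:
            i, off = divmod(self._pos, self._block)
            chunk = self._block_at(i)[off:off + want - done]
            if not chunk:           # the back end returned less than expected
                break
            out[done:done + len(chunk)] = chunk
            done += len(chunk)
            self._pos += len(chunk)
        return done
```

    A small prepended tag fits in the first block, so reading it costs a single request; the `bytes_fetched` counter makes that easy to verify against real files.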

    The best information I've got out of this is that there's a pure-Python implementation of ID3v2 (most implementations appear to be built on top of the C library). This may be hackable to solve my problem.

    Those of you who didn't think reading or thinking was necessary before posting - please don't do the next "Ask Slashdot" post the same discourtesy. Thanks.

  7. Would not require downloading the whole file. by danielsfca2 · · Score: 2, Insightful

    No, it wouldn't. The tag is at the beginning of the file, so why not just fetch the first 512 bytes or so (more if you expect cover art in the tag) and save them into /tmp/?

    If you really -must- download only the tag and not a byte more, then clearly you'd have to either (A) know the offset in each file where the tag ends, which is not possible without storing it in some sort of database, and that won't work if you aren't the person in control of the server; or (B) download the file, scan it as you download looking for the end of the tag, and abort the download when you see it. Either method seems more trouble than it's worth, though.
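    Option (B) can be sketched in Python. It relies on the 10-byte ID3v2 header recording the tag's size, so reading can stop exactly at the end of a prepended tag, after which the caller closes the HTTP response. `read_prepended_tag` and `_read_exact` are illustrative names, and the `max_tag` cap is an arbitrary safeguard against corrupt headers. As noted in an earlier comment, aborting a TCP download can still waste data already in flight.

```python
def _read_exact(stream, n):
    """Read exactly n bytes from stream, or return None if it ends early."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def read_prepended_tag(stream, max_tag=16 * 1024 * 1024):
    """Read a leading ID3v2 tag from a byte stream and stop; return bytes or None.

    `stream` is anything with .read(n), e.g. an HTTP response object; the
    caller closes it afterwards, abandoning the rest of the download.
    """
    header = _read_exact(stream, 10)
    if header is None or header[:3] != b"ID3":
        return None
    b = header[6:10]
    size = (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]   # syncsafe size
    if header[3] == 4 and header[5] & 0x10:
        size += 10                                            # v2.4 footer
    if size > max_tag:
        raise ValueError("implausibly large tag")
    body = _read_exact(stream, size)
    return None if body is None else header + body
```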