Efficiently Reading ID3v2 Tags Over HTTP?
Paul Crowley asks: "Given an HTTP URL for an MP3 file, what's the best way to read its ID3 tags on a GNU/Linux system? It shouldn't be necessary to fetch the whole file: HTTP byteranges should make it possible to fetch only the tiny fraction that's needed, for a big saving in network bandwidth. However, existing ID3v2 libraries are designed to read local files. Extending these libraries for this purpose, or implementing a new one, would be a big job. What's the clean solution - is FUSE the best way, or is there a simpler way that doesn't require root privs? Can I do it using the existing id3lib binary?"
The number of checks you have to do is phenomenal. The biggest worry is a buffer overflow, where the length given in the tag is greater than the tag's actual length and you read past the end of the file. There are just hundreds of such edge cases. Libraries for ID3v2 are likely to be buggy, crashy, and just no fun.
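To make the length check concrete, here is a minimal sketch of that one edge case, not a full parser. It assumes the standard 10-byte ID3v2 header ("ID3", two version bytes, one flags byte, then a 4-byte synchsafe size that excludes the header itself). The function names are made up for illustration.

```python
def synchsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 useful bits per byte)."""
    if len(b) != 4 or any(x & 0x80 for x in b):
        raise ValueError("not a valid synchsafe integer")
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def read_id3v2_tag_body(data: bytes) -> bytes:
    """Return the tag body, refusing to trust a size larger than the data."""
    if len(data) < 10 or data[:3] != b"ID3":
        raise ValueError("no ID3v2 header")
    size = synchsafe_to_int(data[6:10])
    # The declared size comes from the file, so check it against what we hold.
    if 10 + size > len(data):
        raise ValueError("declared tag size exceeds available data")
    return data[10:10 + size]
```

Every frame inside the body carries its own length field, so the same kind of check has to be repeated for each frame.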
That's the problem -- it could be at the end, requiring you to spin through all x bytes (most likely megs) until you get to the end.
Look at the MP3::Info Perl module, you might recognize the author's handle. It reads (and writes) tag info. It's used by the "jukebox" module Apache::MP3 (sample site) to generate pages with track info.
Basically every web jukebox out there does something like this, so I'm sure there's plenty of other code available to work from. The mod_perl way is to put SetHandler perl-script and then PerlHandler [name of module] inside a Location or Directory section of your httpd.conf file, so that when a URL request falls within that Location or Directory, the Perl module handles returning whatever you want it to return.
Yeah, that could be true, but if it's not within, say, the first 100 KB, then the smart thing to do is to stop trying to find it and just return an error.
If it's not at the beginning, you could then use byte ranges to skip ahead to the end and guess that it will be within, say, the last 50 KB of the file.
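The head-then-tail idea could be sketched roughly like this with HTTP Range requests. This is only a sketch under assumptions: the 100 KB and 50 KB cutoffs are the guesses from the comments above, the function names are hypothetical, and the tail check relies on an appended ID3v2.4 tag ending in a 10-byte footer that starts with "3DI" and on ID3v1 occupying the last 128 bytes starting with "TAG". The opener parameter exists only so the logic can be exercised without a network.

```python
import urllib.request

HEAD_BYTES = 100 * 1024  # assumed cutoff for a tag at the start
TAIL_BYTES = 50 * 1024   # assumed cutoff for a tag at the end


def fetch_range(url, range_value, opener=urllib.request.urlopen):
    """Fetch one byte range; fail if the server did not honour it."""
    req = urllib.request.Request(url, headers={"Range": range_value})
    with opener(req) as resp:
        if resp.status != 206:  # 200 would mean the whole file is coming
            raise RuntimeError("server ignored the Range request")
        return resp.read()


def find_tag(url, opener=urllib.request.urlopen):
    """Return (where, data) with where in 'start', 'end', 'id3v1' or None."""
    head = fetch_range(url, f"bytes=0-{HEAD_BYTES - 1}", opener)
    if head[:3] == b"ID3":
        return "start", head
    # A suffix range asks for the last N bytes without knowing the file size.
    tail = fetch_range(url, f"bytes=-{TAIL_BYTES}", opener)
    if tail[-10:-7] == b"3DI":    # ID3v2.4 footer of an appended tag
        return "end", tail
    if tail[-128:-125] == b"TAG":  # ID3v1 block at the very end
        return "id3v1", tail
    return None, b""
```

That is at most two small requests per file. A server that answers 200 instead of 206 does not support ranges, and this sketch gives up rather than download the whole file.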
Vorbis-comments are ASCII only, right?
No. The field names are ASCII only (actually a printable subset minus '=') but the contents of the fields are specified as UTF-8.
The intention was you could put arbitrary binary data in there too, but there's no general mechanism for marking it as anything else. So any non-UTF-8 use would be application specific.
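A small sketch of what that split looks like when reading a single comment entry. It assumes the field name is restricted to bytes 0x20 through 0x7D excluding '=', and that names compare case-insensitively. Upper-casing the name is just one way to normalise it, and the function name is made up for illustration.

```python
def parse_vorbis_comment(raw: bytes):
    """Split one 'NAME=value' entry into (name, value) strings."""
    name, sep, value = raw.partition(b"=")  # split at the first '=' only
    if not sep:
        raise ValueError("comment has no '=' separator")
    # Field names: printable ASCII subset 0x20..0x7D ('=' cannot appear here).
    if any(not (0x20 <= c <= 0x7D) for c in name):
        raise ValueError("field name contains a byte outside 0x20..0x7D")
    # Values are specified as UTF-8; arbitrary binary data fails to decode.
    return name.decode("ascii").upper(), value.decode("utf-8")
```

An application that does stuff binary data into a field would have to catch the UnicodeDecodeError and handle those bytes its own way, which is exactly the application-specific part.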