Efficiently Reading ID3v2 Tags Over HTTP?
Paul Crowley asks: "Given an HTTP URL for an MP3 file, what's the best way to read its ID3 tags on a GNU/Linux system? It shouldn't be necessary to fetch the whole file: HTTP byteranges should make it possible to fetch only the tiny fraction that's needed, for a big saving in network bandwidth. However, existing ID3v2 libraries are designed to read local files. Extending these libraries for this purpose, or implementing a new one, would be a big job. What's the clean solution - is FUSE the best way, or is there a simpler way that doesn't require root privs? Can I do it using the existing id3lib binary?"
You'd better be prepared to extend the API with a URL handler...
There's no point adding http:// support without also adding ftp:// URL support. FTP supports range fetching as well.
So you have handlers for http:// URLs, ftp:// URLs, and file:// URLs.
Then you'd have to map all the old (compatibility) file-oriented APIs onto the new handler for file://. (Or maybe the opposite: map file:// onto the old API, leaving the old implementation intact.)
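A rough sketch of what that handler table might look like, in Python rather than against id3lib itself. The names `read_range`, `read_file_range`, `read_http_range` and `HANDLERS` are made up for illustration, and the ftp:// handler is left out for brevity. A bare path is treated as file://, which is one way of keeping the old file-oriented callers working.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen, url2pathname


def read_file_range(url, start, length):
    """Read `length` bytes at offset `start` from a file:// URL or a bare path."""
    path = url2pathname(urlparse(url).path)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)


def read_http_range(url, start, length):
    """Ask the server for one byte range using the HTTP Range header."""
    byte_range = f"bytes={start}-{start + length - 1}"
    req = Request(url, headers={"Range": byte_range})
    with urlopen(req) as resp:
        if resp.status != 206:
            # The server ignored Range and is sending the whole body,
            # so skip ahead to the offset we wanted.
            resp.read(start)
        return resp.read(length)


# Hypothetical dispatch table: one handler per URL scheme.
HANDLERS = {
    "file": read_file_range,
    "http": read_http_range,
    "https": read_http_range,
}


def read_range(url, start, length):
    """Dispatch on the URL scheme; no scheme means a local path."""
    scheme = urlparse(url).scheme or "file"
    try:
        handler = HANDLERS[scheme]
    except KeyError:
        raise ValueError(f"no handler for scheme {scheme!r}") from None
    return handler(url, start, length)
```

With something like this in place, the tag parser only ever calls `read_range` and never needs to know where the bytes came from.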
It seems like it shouldn't be that hard. You just initiate the HTTP transfer and then cancel it as soon as you have as much data as you need.
I haven't actually done it, but speaking as a server operator: when I look through my server logs, I see some hits that end with status code 499, meaning that the transfer was aborted (499 is a non-standard code that nginx, for one, logs when the client closes the connection early). So you just have the client software you're writing close the HTTP connection after it locates the end of the ID3 tag. It's probably not 100% efficient, but obviously a lot better than reading the whole MP3 file.
I'm assuming you're doing this in C/C++, but I'll try to do a prototype in Perl.
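Here is a minimal sketch of the "read until the end of the tag, then hang up" idea, in Python. It relies on the ID3v2 header layout: 10 bytes, starting with "ID3", with the tag size in the last four bytes as a synchsafe integer (7 bits per byte). The function names and the `max_size` cap are my own assumptions, and it only looks for a tag at the very start of the file.

```python
from urllib.request import urlopen

HEADER_LEN = 10  # "ID3", 2 version bytes, 1 flags byte, 4 size bytes


def read_exact(stream, n):
    """Read exactly n bytes from a file-like object, or raise EOFError."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_id3v2_tag(stream, max_size=16 << 20):
    """Return header plus tag body, or None if no ID3v2 tag starts the stream.

    max_size is an arbitrary sanity cap (an assumption, not from any spec).
    """
    header = read_exact(stream, HEADER_LEN)
    if header[:3] != b"ID3":
        return None
    size_bytes = header[6:10]
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("size field is not a synchsafe integer")
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 useful bits per byte
    if size > max_size:
        raise ValueError(f"declared tag size {size} exceeds cap {max_size}")
    return header + read_exact(stream, size)


def fetch_id3v2_tag(url):
    """Start a normal GET, read just the tag, then close the connection."""
    with urlopen(url) as resp:
        return read_id3v2_tag(resp)
```

Leaving the `with` block closes the socket, so the server sees an aborted transfer. Whatever was already in flight still gets sent, which is the "not 100% efficient" part.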
The number of checks you have to do is phenomenal. The biggest worry is a buffer overflow: the length given in the tag is greater than the data actually present, and you read past the end of the file. There are just hundreds of such edge cases. Libraries for ID3v2 are likely to be buggy, crashy, and just no fun.
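To give a flavour of the length checking, here is a small Python sketch that walks the frames in an ID3v2.3/2.4 tag body (10-byte frame headers: 4-byte ID, 4-byte size, 2 flag bytes) and refuses any frame whose declared size runs past the end of the data. `iter_frames` and `TagError` are hypothetical names, and this ignores extended headers, unsynchronisation and compression, so it covers only one of the many edge cases.

```python
VALID_ID_BYTES = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
FRAME_HEADER_LEN = 10


class TagError(ValueError):
    """Raised when a tag is malformed or lies about its lengths."""


def iter_frames(body, synchsafe_sizes=False):
    """Yield (frame_id, payload) pairs from a tag body.

    Set synchsafe_sizes=True for ID3v2.4, where frame sizes use 7 bits
    per byte; ID3v2.3 uses a plain big-endian 32-bit size.
    """
    pos = 0
    while pos + FRAME_HEADER_LEN <= len(body):
        frame_id = body[pos:pos + 4]
        if frame_id[0] == 0:
            break  # zero bytes mean padding: no more frames
        if not all(c in VALID_ID_BYTES for c in frame_id):
            raise TagError(f"bad frame ID {frame_id!r} at offset {pos}")
        raw_size = body[pos + 4:pos + 8]
        if synchsafe_sizes:
            if any(b & 0x80 for b in raw_size):
                raise TagError(f"frame size at offset {pos} is not synchsafe")
            size = 0
            for b in raw_size:
                size = (size << 7) | b
        else:
            size = int.from_bytes(raw_size, "big")
        start = pos + FRAME_HEADER_LEN
        end = start + size
        # The check that matters: never trust the declared length.
        if end > len(body):
            raise TagError(
                f"frame {frame_id!r} claims {size} bytes, "
                f"only {len(body) - start} available"
            )
        yield frame_id.decode("ascii"), body[start:end]
        pos = end
```

In C or C++ the same comparison has to happen before every copy or pointer advance, which is where the bugs tend to creep in.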
Vorbis-comments are ASCII only, right?
No. The field names are ASCII only (actually a printable subset minus '=') but the contents of the fields are specified as UTF-8.
The intention was you could put arbitrary binary data in there too, but there's no general mechanism for marking it as anything else. So any non-UTF-8 use would be application specific.
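For comparison, a single Vorbis comment entry is simple to validate. A sketch in Python, assuming the usual reading of the spec: the field name is ASCII 0x20 through 0x7D excluding '=', and everything after the first '=' is UTF-8. The function name is made up, and upper-casing the name is my own choice for handling the case-insensitive field names.

```python
def parse_vorbis_comment(raw):
    """Split one raw comment entry (bytes) into (NAME, value)."""
    name, sep, value = raw.partition(b"=")
    if not sep:
        raise ValueError("comment has no '=' separator")
    # The name stops at the first '=', so only the byte range needs checking.
    if any(not (0x20 <= b <= 0x7D) for b in name):
        raise ValueError(f"field name {name!r} contains a disallowed byte")
    # Non-UTF-8 contents raise UnicodeDecodeError here; an application that
    # stores binary data would have to special-case its own field names.
    return name.decode("ascii").upper(), value.decode("utf-8")
```

So `parse_vorbis_comment("title=Café".encode("utf-8"))` gives `("TITLE", "Café")`, while a name containing an accented character is rejected.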