Efficiently Reading ID3v2 Tags Over HTTP?

Perhaps not this simple by Anonymous Coward · 2004-05-18 02:15 · Score: 3, Interesting

Why couldn't you save the result of the remote HTTP access to a temporary local file and allow the libraries access to that file?

Re:Perhaps not this simple by Anonymous Coward · 2004-05-18 14:28 · Score: 1, Insightful

That would entail downloading the whole file. I think the aim is to download just the chunk of data representing the tags.

You'd have to extend the API by Ayanami+Rei · 2004-05-18 02:15 · Score: 4, Interesting

You'd better be prepared to extend the API with a URL handler...

There's no point adding http:// support without also adding ftp:// URL support. FTP supports range fetching as well.

So you have handlers for http:// URLs, ftp:// URLs, and file:// URLs.

Then you'd have to map all the old (compatibility) file-oriented APIs into the new function handlers for file://. (Or maybe the opposite, map file:// into the old API, leaving the old implementation intact)

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON

Re:You'd have to extend the API by Anonymous Coward · 2004-05-18 15:57 · Score: 0

Or just use KDE ioslaves and you don't have to worry about any of that.

Re:Not really answering your question.. by Ashish+Kulkarni · 2004-05-18 02:19 · Score: 2, Informative

ID3v2 data *IS* at the start of the file, normally takes the first 4kB or so (depends on the padding settings).

No need for an index by metalhed77 · 2004-05-18 02:19 · Score: 1

Just read chunks of the end of the file until you start hitting mp3 data. You'll waste some bandwidth, but if you choose appropriately sized chunks, you should be fine.

--
Photos.

Re:No need for an index by Anonymous Coward · 2004-05-18 16:02 · Score: 0

But ID3v2 isn't stored at the end of the file. You'll end up downloading almost the whole file using your method.

Silly Question by Anonymous Coward · 2004-05-18 02:19 · Score: 1, Interesting

You know from the mp3 spec where the tag will be. Just snag that information and feed it to the preexisting library. If you have to, make a temporary file that meets the bare minimum definition of an mp3 (using the snagged tag info, of course).

Without looking and without knowing, I'm willing to bet there's a Perl module for processing mp3 ID3v2 tags. The whole project can probably be done in Perl in a very small number of lines.

Re:Silly Question by dave420 · 2004-05-19 03:26 · Score: 1

It's ID3v2, so the tag can be variable length. It's not as easy as just reading, say, 4 bytes at 0x1D to get the length of the header, either - the size is stored in "sync-safe" form, where the MSB is moved to the next byte over, to stop the tag being interpreted as an MP3 frame.
Of course there's a package for handling ID3 tags in Perl. Heck, I wrote one in PHP. This is about efficiently reading tags over HTTP, where getting the tags requires multiple requests, and not just downloading the whole file. That's what this is about. Not just downloading a track over HTTP and wondering how to get the tags.
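
For anyone who hasn't met the sync-safe size field, here is a minimal sketch of decoding it in Python. The function name is made up for illustration; the only thing it assumes is the ID3v2 convention that the size is four bytes carrying seven payload bits each, with the top bit of every byte left clear.

```python
def decode_syncsafe(data: bytes) -> int:
    """Decode a 4-byte ID3v2 sync-safe integer (7 useful bits per byte)."""
    if len(data) != 4:
        raise ValueError("a sync-safe size is exactly 4 bytes")
    value = 0
    for byte in data:
        if byte & 0x80:
            # The top bit must be clear, otherwise this is not sync-safe data.
            raise ValueError("top bit set in sync-safe byte")
        value = (value << 7) | byte
    return value
```

So the bytes 00 00 02 01 mean 2 * 128 + 1 = 257, not the 513 you would get from a plain big-endian read.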
Re:Silly Question by Paul+Crowley · 2004-05-19 03:30 · Score: 1

Thank you - you appear to be the first person to have understood my request. Was I that unclear?

--
Xenu loves you!

HTTP 499 by cryptor3 · 2004-05-18 02:28 · Score: 4, Interesting

It seems like it shouldn't be that hard. You just initiate the HTTP transfer and then cancel it as soon as you have as much data as you need.

I haven't actually done it, but speaking as a server operator, when I look through my server logs, I see some hits that end with status code 499, meaning that the transfer was aborted. So you just have the client software you're writing close the HTTP connection after it locates the end of the ID3 tag. It's probably not 100% efficient, but obviously a lot better than reading the whole MP3 file.
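
A rough sketch of the "read until you have the tag, then hang up" idea, in Python rather than C or Perl. The function name and the 100 KB cap are assumptions for illustration; it only handles a tag at the very start of the stream.

```python
import io

def read_tag_prefix(stream: io.BufferedIOBase, limit: int = 100 * 1024):
    """Read only the leading ID3v2 tag from a stream; return None if there is none."""
    header = stream.read(10)
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    size = 0
    for byte in header[6:10]:
        # Size field is sync-safe: 7 useful bits per byte.
        size = (size << 7) | (byte & 0x7F)
    if size > limit:
        raise ValueError("tag is larger than the configured limit")
    return header + stream.read(size)

# Usage sketch (not run here): the response object from
# urllib.request.urlopen(url) is a stream of this kind, and closing it
# after this call abandons the rest of the transfer.
```

Whatever the server had already pushed into the pipe before the close is still wasted, so this is a fallback for servers without byte-range support rather than the efficient path.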

I'm assuming you're doing this in C/C++, but I'll try to do a prototype in perl.

Re:HTTP 499 by Anonymous Coward · 2004-05-18 03:01 · Score: 0

don't know what server you are using ...

499 is not a standard HTTP status code
Re:HTTP 499 by eyeball · 2004-05-18 03:37 · Score: 4, Informative

From the ID3v2 FAQ:

Q: Where is an ID3v2 tag located in an MP3 file?

It is most likely located at the beginning of the file. Look for the marker "ID3" in the first 3 bytes of the file.

If it's not there, it could be at the end of the file (if the tag is ID3v2.4). Look for the marker "3DI" 10 bytes from the end of the file, or 10 bytes before the beginning of an ID3v1 tag.

That's the problem -- it could be at the end, requiring you to spin through all x bytes (most likely megs) until you get to the end.

--

_______
2B1ASK1
Re:HTTP 499 by cryptor3 · 2004-05-18 03:41 · Score: 1

Yeah, so you're right. 499 is just the code my server puts in the logs when a connection gets aborted.
Re:HTTP 499 by cryptor3 · 2004-05-18 03:54 · Score: 3, Informative

That's the problem -- it could be at the end, requiring you to spin through all x bytes (most likely megs) until you get to the end.

Yeah, that could be true, but if it's not within say, the first 100KB, then the smart thing to do is to stop trying to find it and just return an error.
If it's not at the beginning, you could then use byte ranges to try to fast forward to the end and guess that it will be within the last say, 50 KB of the end.
Re:HTTP 499 by pbox · 2004-05-18 05:18 · Score: 4, Insightful

This is why

1. read the first 3 bytes with an HTTP byte range
2. if "ID3", process the tag from byte 0
3. else read the last 10 bytes
4. if "3DI", process the tag backwards from there
5. else, see if there is an ID3v1 tag at the end
6. if yes, read the 10 bytes just before the ID3v1 tag
7. if "3DI", then process backwards

So it is possible. He just needs to read the fricking id3 tag definitions.
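
The steps above can be sketched roughly as follows. This is an illustration, not a tested tool: `fetch(start, end)` is a hypothetical callable returning bytes `[start, end)` of the remote file (for example one byte-range request per call), `size` is the file length, and the 128-byte ID3v1 block starting with "TAG" is the usual ID3v1 layout.

```python
from typing import Callable, Optional, Tuple

# Hypothetical helper type: returns bytes [start, end) of the remote file.
Fetch = Callable[[int, int], bytes]

def locate_id3v2(fetch: Fetch, size: int) -> Optional[Tuple[str, int]]:
    """Return ("header", offset) or ("footer", offset) of an ID3v2 marker, else None."""
    if size >= 10 and fetch(0, 3) == b"ID3":
        return ("header", 0)
    if size >= 10 and fetch(size - 10, size)[:3] == b"3DI":
        return ("footer", size - 10)
    # An ID3v1 tag is the last 128 bytes and starts with "TAG".
    if size >= 138 and fetch(size - 128, size - 125) == b"TAG":
        offset = size - 138
        if fetch(offset, offset + 10)[:3] == b"3DI":
            return ("footer", offset)
    return None
```

That is at most four small requests per file before any real tag data is fetched.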

--
Code poet, espresso fiend, starter upper.

ID3v2 Sucks by DeadSea · 2004-05-18 02:38 · Score: 5, Informative

As somebody who has tried to write libraries that read ID3v2 tags, I'd have to say I hate them. The standard is clear and well documented, but the chosen format is horrible. It is very hard to write a parser correctly. It would have been so much better to embed an XML document at the front of the MP3 file. Instead they decided to make each field in a special binary format prepended by a length field.

The number of checks you have to do is phenomenal. The biggest worry is buffer overflow where the length given is greater than the actual length of the tag and you read more than is in the file. There are just hundreds of such edge cases. Libraries for ID3v2 are likely to be buggy, crashy, and just no fun.
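
To make the length-check worry concrete, here is a small sketch of walking the frames of a tag body while refusing to trust a declared length that runs past the end. It assumes the ID3v2.3 frame layout (4-byte ID, 4-byte big-endian size, 2 flag bytes); ID3v2.4 uses sync-safe frame sizes instead, and the function name is invented for this example.

```python
def iter_frames_v23(body: bytes):
    """Yield (frame_id, payload) pairs from an ID3v2.3 tag body (header already removed)."""
    pos = 0
    while pos + 10 <= len(body):
        frame_id = body[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            return  # reached the padding after the last frame
        size = int.from_bytes(body[pos + 4:pos + 8], "big")
        start = pos + 10
        if start + size > len(body):
            # The declared length points past the data we actually have.
            raise ValueError("frame length exceeds tag body")
        yield frame_id.decode("ascii", "replace"), body[start:start + size]
        pos = start + size
```

In a language without bounds-checked slices, that one comparison is the difference between an error and a read off the end of the buffer.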

Re:ID3v2 Sucks by foofboy · 2004-05-18 03:25 · Score: 3, Informative

I hear you on that. I wrote a python module for id3v2 tags. It reminded me of nothing so much as ASN.1/BER/DER.

It does, however, support arbitrary character sets and arbitrary binary formats. Not sure there's another way to do it. Vorbis-comments are ASCII only, right?

I look forward to your reply.
Re:ID3v2 Sucks by Fweeky · 2004-05-18 07:30 · Score: 4, Informative

foobar2000 uses APEv2 tags on MP3s by default; the standard's just as flexible (well, as much as anyone wants anyway), but, well, you just need to compare filesizes for their handlers; an ID3v2 reader/writer I saw was ~150k of code -- the APEv2 one was 15k. They're always at the end, but obviously since fb2k is the only player I'm aware of which supports it the appeal may be limited. You can at least mix them with ID3v1, which should be good enough for portables.

And before anyone goes off on one because it's non-standard, I'll point out that MP3 has *no* provision for metadata. ID3v1 and 2's are just as arbitrary addons as APEv2; they're just older (and lamer, either in big limitations or extreme overcomplication).

I believe the recommended *standard* way of attaching metadata to an MP3 now is to put it in an MP4 container, which has its own more sensible format. Again, I'm pretty sure foobar2000 (maybe with some plugin in the Special Installer) can put them in, and I think they should play on anything which knows about MP4. Fully reversible too.
Re:ID3v2 Sucks by swillden · 2004-05-18 10:27 · Score: 1

It reminded me of nothing so much as ASN.1/BER/DER.
If they were going to do something similar to ASN.1, they should have just used ASN.1 BER. Then writing tag manipulation tools would be easy. ASN.1 BER is complex and a pain in the butt to write from scratch, but lots of good tools exist so writing it from scratch wouldn't be necessary.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:ID3v2 Sucks by dave420 · 2004-05-19 03:31 · Score: 2, Interesting

It all made sense to me. You did understand the purpose of sync-safe bytes, right? They're not just there to piss people off.
I wrote a class for handling ID3v1/2 tags, and it works fine. I use it nearly every day, and it's processed nearly 5000 songs without fail (various versions of v2 tags, mixed in with the old classic v1), from Apples, *nixes and windows.
The format is so specific you can code for almost any eventuality. It's one of the easier binary formats I've worked with, and I think it's a great place for developers to learn about manipulating/creating binary files.
Re:ID3v2 Sucks by pdh11 · 2004-05-19 05:02 · Score: 2, Informative

The standard is clear and well documented
...and never followed. In particular the bit about text being either ISO Latin 1 or UTF-16 (or, in later versions of ID3v2, UTF-8), which is a very sensible idea, is always completely ignored; the overwhelming majority of tag writers, both on Windows and Linux, write text in arbitrary 8-bit encodings (shift-JIS, GBK, whatever) and then mark them as being Latin 1. There's nothing a tag reader can do about that, as there's no way to work out what the writer's locale was. Taglib can write Unicode tags correctly, but no front-end for it that I've seen does the Right Thing: use Latin 1 tags if all the characters used are available in Latin 1 (or, given the problems above, US-ASCII) and UTF-16 tags otherwise.
The problem isn't the standard. It's the implementors.
Peter
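
A minimal sketch of the "Right Thing" described above for a tag writer, taking the conservative US-ASCII variant. It assumes the ID3v2.3 text encoding bytes ($00 for ISO-8859-1, $01 for UTF-16 with a byte-order mark); the function name is made up for illustration.

```python
def encode_text_payload(text: str) -> bytes:
    """Return the encoding byte plus encoded text for an ID3v2.3 text frame."""
    try:
        # Pure ASCII is valid Latin 1, so mark it with encoding byte 0.
        return b"\x00" + text.encode("ascii")
    except UnicodeEncodeError:
        # Anything else goes out as UTF-16; Python's codec prepends the BOM.
        return b"\x01" + text.encode("utf-16")
```

A reader that receives such tags never has to guess the writer's locale, which is the whole point.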
Re:ID3v2 Sucks by technology+is+sexy · 2004-05-20 03:23 · Score: 1
Just in case anybody is seriously interested in working with APEv2:
Formats currently supporting/using APEv2: MPC, WavPack, APE (Monkey's Audio), MP3

how I would do it by HughsOnFirst · 2004-05-18 02:47 · Score: 2, Insightful

From your question, it sounds like you already have figured out how to use http to grab the relevant byte range. I don't know anything about ID3 tags but if they are at the head of the file, then you just need to get the head, if they are at the tail, then just get the tail. Save the relevant byte ranges as files locally on a directory structure that is based on the URL so that http:/mess-o-mp3.com/content/this.mp3 would map to /mess-o-mp3.com/content/this.mp3. If necessary append or prepend dummy mp3 file so that existing ID3v2 libraries that are designed to read local files can read the ID3 tags. Then just run the "existing ID3v2 library" against the file tree that you have just built, and translate the output from describing the contents of the file system to describing the contents of the Internet.

This doesn't seem so complicated to me.
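
The URL-to-directory mapping could look something like this in Python. The cache root and the function name are assumptions for illustration, and the check for ".." is there only so a hostile URL cannot escape the cache directory.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

def cache_path(url: str, root: str = "/tmp/id3cache") -> PurePosixPath:
    """Map a URL to a local path of the form root/host/path."""
    parts = urlsplit(url)
    relative = PurePosixPath(parts.netloc) / parts.path.lstrip("/")
    if ".." in relative.parts:
        raise ValueError("refusing path that climbs out of the cache root")
    return PurePosixPath(root) / relative
```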

Re:Not really answering your question.. by ScriptGuru · 2004-05-18 02:47 · Score: 1

ID3v2, to which he was referring, has been out for years and is at the beginning of the file.

--
Yet another signature that refers to itself. The irony and humor is dead.

Er... by cjpez · 2004-05-18 02:49 · Score: 1, Insightful

id3v2 tags are at the beginning of the file. That's one of the reasons id3v2 was developed as an alternative to the id3v1 tags which were put on the end. Stick an http:// url to an mp3 file with id3v2 tags in your xmms playlist and you'll watch the tag info populate in the playlist window right away.

--
Al Qaeda has ninjas!

Re:Er... by cjpez · 2004-05-18 02:53 · Score: 2, Informative

Well, *actually* from what I understand, the id3v2 tags *can* be put anywhere in the file, so you could have (for instance) an hour-long mp3 of a radio stream and have the tags change with each song. I'm not sure if anything actually supports that, though. Regardless, if you tag with id3v2, it'll be right at the beginning.

--
Al Qaeda has ninjas!
Re:Er... by cujo_1111 · 2004-05-18 15:14 · Score: 1

Unless it is tagged with ID3v2.4 where it could be at the end of the file...

--
If I point out that you are incorrect, making me a foe does not make you any more correct.

Grab the header, file the rest by ScriptGuru · 2004-05-18 02:57 · Score: 1

The easy way out is probably just to grab the first 10 bytes from HTTP and use your own stripped down library to read the header. These precious 10 bytes tell you a) that it is an id3 tag, and b) how much more has to be downloaded. If you take the size data you can easily see how much of the remaining file you need and dump it into a temp file for parsing.
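
As a sketch of what those 10 bytes give you: the layout assumed here is the ID3v2 header ("ID3", two version bytes, one flags byte, four sync-safe size bytes), plus the ID3v2.4 flag bit 0x10 that signals an extra 10-byte footer. The function name is hypothetical.

```python
def id3v2_total_size(header: bytes) -> int:
    """Given the first 10 bytes of a file, return the full tag length in bytes."""
    if len(header) != 10 or header[:3] != b"ID3":
        raise ValueError("not an ID3v2 header")
    major, flags = header[3], header[5]
    size = 0
    for byte in header[6:10]:
        if byte & 0x80:
            raise ValueError("size field is not sync-safe")
        size = (size << 7) | byte
    # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
    footer = 10 if (major >= 4 and flags & 0x10) else 0
    return 10 + size + footer
```

The return value is exactly how many bytes from offset 0 need to end up in the temp file.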

--
Yet another signature that refers to itself. The irony and humor is dead.

Continue? by bigattichouse · 2004-05-18 03:00 · Score: 1

I thought HTTP/1.1 had continue features where you could specify a byte range... just ask for the first X bytes... if you didn't get enough, get another chunk and append.
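
For reference, a byte-range request is just a GET with a Range header. A minimal sketch with Python's standard library follows; the helper names are made up, and the 206 check reflects that a server which ignores Range answers 200 and starts sending the whole file.

```python
import urllib.request

def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET for bytes start..end (both inclusive, as HTTP counts them)."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})

def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch one byte range, refusing a full-file response."""
    with urllib.request.urlopen(range_request(url, start, end)) as resp:
        if resp.status != 206:
            # 200 means the server ignored the Range header.
            raise RuntimeError("server did not honour the byte range")
        return resp.read()
```

Calling fetch_range(url, 0, 9) would then return just the 10-byte ID3v2 header, if the server plays along.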

--
meh

Re:Not really answering your question.. by Joseph+Vigneau · 2004-05-18 03:18 · Score: 1

Ah. I am less aware of v2 than v1, then.

MP3::Info Perl module by extra88 · 2004-05-18 03:40 · Score: 3, Informative

Look at the MP3::Info Perl module, you might recognize the author's handle. It reads (and writes) tag info. It's used by the "jukebox" module Apache::MP3 (sample site) to generate pages with track info.

Basically every web jukebox out there does something like this so I'm sure there's plenty of other code available to work from. The mod_perl way is to put SetHandler perl-script then PerlHandler [name of module] in your httpd.conf file so when a URL request falls within that Location or Directory, the perl module handles returning whatever you want it to return.

Source code is stupid by cryptor3 · 2004-05-18 03:51 · Score: 1

So actually, the code isn't worth posting. It's pretty much as I said it. Read until you have a valid ID3 tag, and then break the socket connection.

Re:Source code is stupid by Marillion · 2004-05-18 07:28 · Score: 2, Insightful

The bad part about just aborting the connection is that TCP uses an optimistic windowing protocol. The server will send multiple packets down the wire before getting an ACK, expecting them to be accepted and read. Sure, for one connection that's only a KB or two.
If he needs to scan hundreds or thousands of files, that WILL add up in a hurry. Also, if he's clever, he can take advantage of persistent HTTP connections, disable Nagle's algorithm and really get a performance boost. Especially over a "slow" link.

--
This is a boring sig

Just realized I was nearly offtopic by cryptor3 · 2004-05-18 04:05 · Score: 1

I just realized that I was answering with something the article poster probably already knows.

As far as how you might be able to use an existing library to extend other libraries, it seems like you should be able to save the first x bytes of http (mp3) data to a local temp file and then have the pre-existing id3 library run over that data. I would think that this doesn't necessarily require root.

Hmmm... by hillg3 · 2004-05-18 04:28 · Score: 1

Sounds like the RIAA has hired a college intern - They can't afford real programmers any more with all this money they're losing.

The easy solution... by Whip · 2004-05-18 06:04 · Score: 1

1. Download first ~1k of file (which should contain at least the start of the id3v2 tag)
2. Check to make sure you have the whole tag. If it's bigger than what you downloaded, download the rest of the tag.
3. Write to a temporary file
4. Run existing libraries and/or tools against temporary file

... you don't need any of the actual mp3 to get the id3v2 info, and the above will work on most files. The exception will be the few files that have the id3v2 data at the end and just a reference to it at the front of the file -- but those are pretty rare.
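
The four steps above can be sketched like this. `fetch(start, end)` is a hypothetical callable that returns bytes `[start, end)` of the remote file, the 1 KB first chunk is the guess from step 1, and footers or tags at the end of the file are not handled.

```python
import tempfile

def fetch_tag_to_tempfile(fetch, first_chunk: int = 1024):
    """Download just the leading ID3v2 tag and return the temp file's path, or None."""
    data = fetch(0, first_chunk)
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    size = 0
    for byte in data[6:10]:
        size = (size << 7) | (byte & 0x7F)  # sync-safe size field
    total = 10 + size
    if total > len(data):
        # Step 2: the tag is bigger than the first chunk, so top it up.
        data += fetch(len(data), total)
    tmp = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
    with tmp:
        tmp.write(data[:total])
    return tmp.name
```

Whether an existing library accepts a file that is all tag and no audio is something to test against the library in question.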

CDDB: Feel the Pain by cpeterso · 2004-05-18 08:07 · Score: 2, Informative

Here is Netscape's JWZ hilariously sad-but-true rant about the ID3 header format:

CDDB: Feel the Pain

In case you didn't know, the file format that CDDB (and FreeDB) use is complete garbage. In addition to random idiotic crap like it being impossible to unambiguously represent a song title that has a slash in it, it's rocket science to figure out how long a song is supposed to be. I need this info not only to display it in Gronk (my MP3 jukebox software), but also for some error-checking that my CD-ripping scripts do, so that I don't end up with truncated files if there was a crash or a full disk or something.

So get this. CDDB files contain junk like this:

# Track frame offsets:
# 150
# 18265
# 32945
# 49812 ...
# Disc length: 3603 seconds
#
DISCID=...
DTITLE=...

(You'd think that the fact that it's in a comment would mean something, but no: you have to parse both comments and non-comments, begging the question of what they thought "comment" means.)

Those numbers are the starting sectors of each track on the disc. There are 75 sectors per second. So you convert those to seconds by dividing, and then find the length of each track by subtracting each from the previous. Oh, but wait, they don't give you the sector address of the end of the last track: for that one, it's expressed in seconds instead of sectors, for no sensible reason. Still, the info is there, right?

Uh, almost.

It turns out that if the last track on a CD is a data track (an ISO9660 file system) then there is a gap between the last track (the data track) and the second-to-last track (the last audio track.) This gap is exactly 11400 sectors (152 seconds, 2:32.) On some discs, you can actually see this track, it's a differently-shiny ring. Why's it there? I don't know. Why's it that size? I don't know. What if the data track is not the last track on the CD? (Does that even work?) I don't know.

So what this means is, when computing the length that a track should be, you have to subtract 152 seconds from the length of the second-to-last track, only if the last track is a data track.
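
The arithmetic described in the rant so far, as a small Python sketch. The constants come straight from the text (75 sectors per second, an 11400-sector gap); the function name is invented, and whether the last track is a data track is something the caller has to know.

```python
SECTORS_PER_SECOND = 75
DATA_TRACK_GAP = 11400  # sectors, i.e. 152 seconds

def track_lengths(offsets, disc_seconds, last_is_data=False):
    """Return each track's length in seconds from CDDB frame offsets."""
    offsets = list(offsets)
    # The end of the last track is only given in seconds, so convert it.
    ends = offsets[1:] + [disc_seconds * SECTORS_PER_SECOND]
    sectors = [end - start for start, end in zip(offsets, ends)]
    if last_is_data and len(sectors) >= 2:
        # The gap before a trailing data track is counted in the previous track.
        sectors[-2] -= DATA_TRACK_GAP
    return [count / SECTORS_PER_SECOND for count in sectors]
```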

How do you tell whether the last track is a data track, without having the CD in question physically in your computer? By hoping that the CDDB file contains the words "data track" in the title of that track, I guess. Yeah, that's reliable.

And, just to keep things interesting, it turns out that older versions of grip and cdparanoia didn't skip over this gap when ripping: instead, they would append 152 seconds of silence onto the end of the second-to-last track. So now my script that sanity-checks the lengths of the files has to consider two different lengths to be "right", since I now have CDs that were ripped both ways.

Whee. I love love love supporting standards invented by 12-year-olds.

Of course the reason that I use CDDB files at all in Gronk is because of the mind-blowing worthlessness of ID3 tags (32 character limits on titles, etc.) Yay more standards invented by 12-year-olds. (Please don't even mention ID3v2 or Ogg. I laugh at you, you silly person. Those are universally-unsupported fantasies that simply trade one set of problems for a whole new set of problems.)

And as if CDDB wasn't bad enough, FreeDB has taken the CDDB braindeadness and layered even more braindeadness on top of it: it is truly a thing of wonder.

For example, go ahead and try to ever have the "genre" field be something approaching reality -- oops! The first person who ripped this CD said it was "folk" because that's genre number zero! So fix it and resubmit it to the database? Sorry! You can't ever change the genre of an entry in the database after creation, since the genre dictates what directory the file goes in on their server. And so on.

It's a wonder anything works at all.

--
cpeterso

Re:Silly Question - NOT by Anonymous Coward · 2004-05-18 09:38 · Score: 1, Informative

You know from the mp3 spec where the tag will be.

If you had any knowledge of ID3v2, you'd be aware that you DON'T know where the tag will be. It can be placed pretty much anywhere within the file.

Vorbis comments UTF-8 by rillian · 2004-05-18 09:47 · Score: 4, Informative

Vorbis-comments are ASCII only, right?

No. The field names are ASCII only (actually a printable subset minus '=') but the contents of the fields are specified as UTF-8.

The intention was you could put arbitrary binary data in there too, but there's no general mechanism for marking it as anything else. So any non-UTF-8 use would be application specific.
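
As a concrete illustration, parsing a single comment entry might look like this in Python. The printable subset is taken to be bytes 0x20 through 0x7D, the function name is made up, and upper-casing the field name is just one way to handle the fact that names compare case-insensitively.

```python
def parse_vorbis_comment(raw: bytes):
    """Split one NAME=value entry into (name, value), validating the name."""
    name, sep, value = raw.partition(b"=")
    if not sep or not name:
        raise ValueError("expected NAME=value")
    if any(not (0x20 <= byte <= 0x7D) for byte in name):
        raise ValueError("field name must be printable ASCII")
    # Field names are ASCII; field contents are UTF-8.
    return name.decode("ascii").upper(), value.decode("utf-8")
```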

Tried using PHP? by BladeMelbourne · 2004-05-18 11:13 · Score: 1

I believe ID3 tags are at the beginning of MP3 files - so you could use a couple of neat PHP functions.

fopen() can open local files & URLs - look at the http:// example:
fopen()

fgets() will read in data from the stream - you can pick how many bytes you want to read in:
fgets()

Don't forget to use fclose() afterwards!

When you get those functions working, it's just a matter of interpreting the content returned. PHP has many useful string functions - many more than ASP does.

These functions are analogous to using a Microsoft.XMLHTTP object:
myXMLHTTPObject.Open "GET", someURL

The PHP and ASP way are both neat ways to read content from [X]HTML pages on other servers (the weather, share prices, etc) - although it might not be 100% ethical!

Hope this helps... I have nothing better to do while I wait for Fedora Core 2 to be delivered. Oh wait, I do... I'm at work reading Slashdot!

Re:Tried using PHP? by BladeMelbourne · 2004-05-18 13:50 · Score: 1

A quick Google on "php id3v2" returned ID3v2.php
The author is German, as are a couple of his comments, but the PHP code is tidy, with English variables. The script handles ID3v1 - ID3v2.3 and is LGPL.
No need to reinvent the wheel :-)
Re:Tried using PHP? by abiggerhammer · 2004-05-18 21:34 · Score: 1

I should think libexif should work just as well under PHP. Cf. the 07-Feb-2004 comment.

--
Dance like nobody's watching. Sing like you're in the shower. Fuck like you're being filmed.

Fun problem .. by stevey · 2004-05-18 22:03 · Score: 1

The simple choice seems to be "read a range of 0-50k" to see if the data is at the start of the file. If it is then you get lucky and win!

If it isn't then you assume it's at the end, and then ideally you just want to say "give me the last 50k".

Unfortunately you can't do that as there isn't a notion of negative offsets from the end of a file in HTTP. So in the general case you cannot do better than read the whole thing.

I guess if you have a directory index you can parse the filesize from that and then use that in your range'd request, but that's sucky too.
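
For what it's worth, the HTTP/1.1 range syntax does include a suffix form, "bytes=-N", meaning the last N bytes, and a 206 reply carries a Content-Range header that reveals the total size. A small Python sketch of both pieces follows; the helper names are invented, and whether a given server honours ranges at all still has to be checked.

```python
import re

def suffix_range_header(last_n: int) -> dict:
    """Header asking for the final last_n bytes of a resource."""
    return {"Range": f"bytes=-{last_n}"}

def parse_content_range(value: str):
    """Parse 'bytes first-last/total' into (first, last, total or None)."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if not match:
        raise ValueError("unrecognised Content-Range")
    total = None if match.group(3) == "*" else int(match.group(3))
    return int(match.group(1)), int(match.group(2)), total
```

So "give me the last 50k" is one request, and its response also tells you how long the file is.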

Re:Fun problem .. by Paul+Crowley · 2004-05-18 22:43 · Score: 1

HTTP can tell you the file size, and with byte ranges you can fetch whichever bits you want.

The *real* problem is this: if I were writing this in Java, and the ID3v2 library were in Java, I could easily provide a seekable InputStream object representing a file which I have a URL for, and the ID3v2 library would read only the parts of the file it needed. It wouldn't have to think about the fact that the file was remote, and I wouldn't have to anticipate what it was going to want to read using the cumbersome and fragile heuristics people have been suggesting here.

However, the best library for these purposes is in C, and it expects to call "seek" on a real, kernel file descriptor. I'm wondering if there's a solution that's as neat in that world.

If there's an implementation of ID3v2 that's entirely written in an OO language and does not use the C library, then the OO language will almost certainly have the primitives needed to present the URL as one would present a file, for the same neatness of solution.
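
To illustrate the kind of object meant here, this is a rough Python sketch of a seekable, read-only file whose reads are served on demand. `fetch(start, end)` is a hypothetical callable returning bytes `[start, end)` (for instance one HTTP byte-range request), and `size` would come from Content-Length. It only helps a tag library that accepts a file-like object; it does nothing for C code that wants a real file descriptor.

```python
import io

class RangeBackedFile(io.RawIOBase):
    """Read-only, seekable file object that fetches bytes only when read."""

    def __init__(self, fetch, size):
        self._fetch, self._size, self._pos = fetch, size, 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + offset < 0:
            raise ValueError("negative seek position")
        self._pos = base + offset
        return self._pos

    def readinto(self, buffer):
        end = min(self._pos + len(buffer), self._size)
        if end <= self._pos:
            return 0  # at or past end of file
        chunk = self._fetch(self._pos, end)
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)
```

Seeking costs nothing; only read() touches the network, so the library's own access pattern decides what gets downloaded.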

--
Xenu loves you!
Re:Fun problem .. by metamatic · 2004-05-19 00:55 · Score: 1

How about using the existing C id3v2 library, but feeding it a UNIX socket/pipe instead of an actual file or stream? Then at the other end of the pipe you put some code which does just-in-time fetching of appropriate chunks of the MP3 at the other end of the network connection.

Of course, this relies on the library seeking in a sensible way, and you might have to hack it to use seek to determine file size rather than fstat.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Re:Fun problem .. by dave420 · 2004-05-19 03:36 · Score: 1

There is a negative offset... it's called "content-length"-50 :)
Re:Fun problem .. by Paul+Crowley · 2004-05-19 03:56 · Score: 1

What happens when the existing library seeks to near the end of the file and reads a byte?

--
Xenu loves you!
Re:Fun problem .. by 42forty-two42 · 2004-05-19 12:13 · Score: 1

Unfortunately you can't do that as there isn't a notion of negative offsets from the end of a file in HTTP. So in the general case you cannot do better than read the whole thing.
There is, however, a nifty thing called a HEAD request, which gets all the headers and none of the data. Observe:
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Thu, 20 May 2004 00:10:24 GMT
Server: Apache/2.0.49 (Gentoo/Linux) mod_perl/1.99_10 Perl/v5.8.4 mod_ssl/2.0.49 OpenSSL/0.9.7d DAV/2 SVN/1.0.2 PHP/4.3.6
Content-Location: index.html.en
Vary: negotiate,accept-language,accept-charset
TCN: choice
Last-Modified: Sun, 02 May 2004 05:31:25 GMT
ETag: "509819-5b0-633f8540"
Accept-Ranges: bytes
Content-Length: 1456
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-Language: en
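
The same thing from code, as a minimal Python sketch using the standard library. The helper names are hypothetical; it reports the Content-Length and whether the server advertises "Accept-Ranges: bytes".

```python
import urllib.request

def head_request(url: str) -> urllib.request.Request:
    """Build a HEAD request: headers only, no body."""
    return urllib.request.Request(url, method="HEAD")

def remote_size(url: str):
    """Return (content_length or None, server_advertises_byte_ranges)."""
    with urllib.request.urlopen(head_request(url)) as resp:
        length = resp.headers.get("Content-Length")
        ranges = resp.headers.get("Accept-Ranges", "")
    return (int(length) if length is not None else None, ranges.lower() == "bytes")
```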
Re:Fun problem .. by metamatic · 2004-05-23 14:31 · Score: 1

You fetch the relevant bit of file via HTTP? It's really just like implementing disk caching.

Of course, there's the pathological worst case where the ID3 tag is ID3v1 or ID3v2.4, i.e. at the end of the file, and the HTTP server doesn't support HTTP/1.1 byte range requests. In that case, you fetch the entire file. But that's no worse than not having the middle caching layer at all, and it's hard to see how you could do better in that case.
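
A sketch of that caching layer in Python, under the same assumptions as a disk cache: reads are rounded out to fixed-size blocks, and each block is fetched at most once. `fetch(start, end)` is a hypothetical callable returning bytes `[start, end)` of the remote file, and the 4096-byte block size is arbitrary.

```python
class BlockCache:
    """Serve arbitrary byte ranges from block-aligned fetches, caching each block."""

    def __init__(self, fetch, size, block=4096):
        self.fetch, self.size, self.block = fetch, size, block
        self.blocks = {}

    def read(self, start, end):
        end = min(end, self.size)
        out = bytearray()
        pos = start
        while pos < end:
            index = pos // self.block
            if index not in self.blocks:
                low = index * self.block
                # Fetch the whole block once; later reads reuse it.
                self.blocks[index] = self.fetch(low, min(low + self.block, self.size))
            offset = pos - index * self.block
            take = min(end - pos, self.block - offset)
            out += self.blocks[index][offset:offset + take]
            pos += take
        return bytes(out)
```

A library that reads a few bytes at a time then costs one request per block touched rather than one per read call.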

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak

Completely irrelevant by Paul+Crowley · 2004-05-18 22:47 · Score: 1

CDDB is a completely different file format from ID3v2.

--
Xenu loves you!

The Solution by Anonymous Coward · 2004-05-19 03:03 · Score: 0

Let's all use Ogg Vorbis instead! One of my main arguments for using Ogg is the superior format for metadata. //Blenda

Amazed that no-one's really tried to attack this by Paul+Crowley · 2004-05-19 03:37 · Score: 3, Insightful

This is an interesting general problem. I'm sorry that so few people seem to have taken the time to understand it before replying. The general approach seems to be "read the first sentence, assume the poster is an idiot, hit reply".

The problems are these:

1) Reading ID3v2 tags on an MP3 file is a complex business. I have no desire to re-implement the libraries that do that, or even to wade deep into the existing codebases, if I can avoid that. And it should be possible to avoid that.

2) Even knowing the size and location of ID3v2 tags is complex. Contrary to popular belief here, those tags can appear at either the beginning or the end of a file, and can be arbitrary size. I already implemented the "fetch some stuff at the beginning and some stuff at the end and feed that to the library" approach, and it sort-of works, but you have to guess the size of the tag. Guess too big, you fetch lots of data unnecessarily. Guess too small, you get breakage or wrong results. By contrast, the libraries that read ID3v2 tags know exactly where and how much to read to glean the appropriate data, and it should be possible to make use of that.

3) I want to read existing data - changing the format of that data is not an option.

So that's why I was suggesting solutions like "FUSE". With FUSE, when the library does a seek and a read, I can arrange for just the relevant portion of the file to be fetched. I don't have to include any knowledge about ID3 in my application - the library does all the work. But the library doesn't have to worry about HTTP byte ranges - FUSE handles that. And the code will always be correct.

The only trouble is that FUSE requires a kernel patch and root privs. The question is, is there a way to do the same trick without those limitations? Or is there a library for reading ID3v2 tags in an object-oriented language that will let me put an efficient back-end for fetching data on request using HTTP byteranges in place of the file?

The best information I've got out of this is that there's a pure-Python implementation of ID3v2 (most implementations appear to be built on top of the C library). This may be hackable to solve my problem.

Those of you who didn't think reading or thinking was necessary before posting - please don't do the next "Ask Slashdot" post the same discourtesy. Thanks.

--
Xenu loves you!

Re:Not really answering your question.. by dave420 · 2004-05-19 03:39 · Score: 1

It can be anywhere from 20 bytes up to hundreds of KB. iTunes writes particularly wasteful ID3 tags, which are nearer 4KB.

Re:Amazed that no-one's really tried to attack thi by dave420 · 2004-05-19 03:44 · Score: 1

I wrote an object-oriented PHP class to handle tag information. It would be very, very trivial to modify it to work on remote files, even doing it efficiently (seeing as it has all the code necessary to read in tag lengths, etc).

ID3v2 tags are very interesting, in my opinion :)

Would not require downloading the whole file. by danielsfca2 · 2004-05-19 06:12 · Score: 2, Insightful

No it wouldn't. The tag is at the beginning of the file, so why not just fetch the first 512 bytes or so (more if you expect cover art in the tag) and save it into /tmp/.

If you really -must- download only the tag and not a byte more, then clearly you'd have to (A.) know the offset in each file where the tag ends. This is not possible without storing that in some sort of database. Which won't work if you aren't the person in control of the server. Or (B.) download the file and scan it as you download looking for the end of the tag and when you see it abort the download. Seems more trouble than it's worth to bother using those methods, though.

Yes, yes you were.... by Otto · 2004-05-19 08:01 · Score: 1

Was I that unclear?

Yes, it was unclear because you provided too much information.

What you appear to actually want is a generic way to wrap a library that reads a file or stream of some type and be able to feed it from an http stream doing efficient requests, by getting byteranges over http.

The fact that you want ID3 isn't totally relevant, as you want a way to wrap the existing ID3 class to read from http instead of, say, a file. This probably confused a lot of people.

Short answer is that no, I don't think there's any good way to do this for all libraries. Too much variance in what the library does. If the library can read from stream input you give it directly (instead of from a file descriptor you pass in), then you could possibly fool it that way by writing something that pretends to be a stream but instead is reading from a file. Could be tricky, but it's doable.

But if you're passing in a file descriptor, then you're looking at faking the file descriptor out, and that's probably kernel level code to do that. I don't think there's any way to do it in userland.

--
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.

Re:Yes, yes you were.... by Paul+Crowley · 2004-05-19 20:10 · Score: 1

I think you're right about the providing too much information thing. I guess I hoped that someone would have solved this specific case specifically for ID3v2, which is my current problem. If someone had been able to say "the X ID3v2 library has code to do this" then that would have solved my current problem even though it's not generic.

But in future I'd write a "general" thing about HTTP as a read-only network filesystem, and then a second "specific" paragraph about why I'm interested.

--
Xenu loves you!

Oops, typo... by Otto · 2004-05-19 08:04 · Score: 1

Replace "instead is reading from a file" with "instead is reading from http". Sorry.

--
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.

You still don't get the fundamental problem? by danielsfca2 · 2004-05-19 08:25 · Score: 2, Informative

This is where you tell us how you expect to find out which sections of the files contain the tag. The "libraries that read ID3v2 tags know exactly where and how much to read" because of the fact that they have access to the whole file! If you want that kind of perfect efficiency, then you'll have to download the whole file!

If you want a solution that will allow you to escape downloading the whole file, just check for "ID3" in the first 3 bytes and "3DI" 10 bytes back from the end. Download a couple of K in the appropriate direction (more if you expect more than a minority to have images). If neither marker is there, there's no tag; move on. Then parse the tags with your libraries, catch errors, and try to grab more on the few that will throw errors, in case someone put a huge image or War and Peace in a tag.
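
The probe-and-retry idea above could be sketched in Python roughly like this. Everything named here is an assumption for illustration: `fetch_range(start, end)` stands in for an HTTP byte-range request, `parse` stands in for a real tag library that raises `ValueError` on truncated input, and `toy_parse` is only a placeholder so the demo runs.

```python
def probe(fetch_range, size):
    """Classify the tag position from the first and last 10 bytes."""
    if size < 10:
        return "none"
    if fetch_range(0, 9)[:3] == b"ID3":
        return "start"
    if fetch_range(size - 10, size - 1)[:3] == b"3DI":
        return "end"
    return "none"


def read_tag(fetch_range, size, parse, chunk=2048, limit=1 << 20):
    """Fetch a chunk from the tagged end of the file, doubling it on failure."""
    where = probe(fetch_range, size)
    if where == "none":
        return None
    while True:
        n = min(chunk, size)
        if where == "start":
            data = fetch_range(0, n - 1)
        else:
            data = fetch_range(size - n, size - 1)
        try:
            return parse(data)
        except ValueError:
            # Give up once the whole file or the size limit has been tried.
            if n >= size or chunk >= limit:
                raise
            chunk *= 2


def make_fetcher(blob, log):
    """Fake range fetcher over an in-memory blob that logs each request."""
    def fetch_range(start, end):
        log.append((start, end))
        return blob[start:end + 1]
    return fetch_range


def toy_parse(data):
    # Placeholder for a tag library: pretend the tag needs 5000 bytes.
    if len(data) < 5000:
        raise ValueError("truncated tag")
    return len(data)


log = []
tagged = b"ID3" + bytes(9997)
result = read_tag(make_fetcher(tagged, log), len(tagged), toy_parse)
```

Each retry here re-downloads the bytes it already had; fetching only the missing part and appending it would waste less bandwidth.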

It's not that hard, and I'm not sure what sorcerer's magic you expected Ask Slashdot to come up with to help you with this.

I think we can all agree on one thing though. Whatever asshole decided that a tag at the end of the file is a good idea needs to be smacked in the head.

Re:You still don't get the fundamental problem? by Paul+Crowley · 2004-05-19 20:07 · Score: 1

This is completely wrong.

Think about the actual calls to "read" and "seek" on the filehandle the library does. Now imagine that in the background, you fetch parts of the file only at the moment the application calls "read". You'll see that the application does not "read" every last byte from the file - usually much, much less.

FUSE does exactly this "sorcerer's magic".

Or think about what would happen if the file were served by NFS, rather than by HTTP. Again, only the parts of the file that were needed would be fetched over the network. No magic.
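
One way to see this for yourself is to wrap a local file and log what the reader actually touches. A small Python sketch; `TracingFile` and `toy_tag_reader` are made-up names, and the toy reader is only a stand-in for a real tag library (it reads the 10-byte ID3v2 header, then the tag body whose length the header declares):

```python
import io


class TracingFile:
    """Wrap a binary file object and record every read as (offset, length)."""

    def __init__(self, raw):
        self._raw = raw
        self.reads = []

    def read(self, n=-1):
        offset = self._raw.tell()
        data = self._raw.read(n)
        self.reads.append((offset, len(data)))
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        return self._raw.seek(offset, whence)

    def tell(self):
        return self._raw.tell()

    def bytes_read(self):
        return sum(length for _, length in self.reads)


def toy_tag_reader(f):
    """Stand-in for a tag library: read the header, then the declared body."""
    header = f.read(10)
    if header[:3] != b"ID3":
        return b""
    size = 0
    for b in header[6:10]:
        size = (size << 7) | (b & 0x7F)  # 7 significant bits per size byte
    return f.read(size)


# Fake file: 10-byte header, 256-byte tag body, then 100000 bytes of "audio".
fake_mp3 = b"ID3\x03\x00\x00\x00\x00\x02\x00" + bytes(256) + bytes(100000)
traced = TracingFile(io.BytesIO(fake_mp3))
body = toy_tag_reader(traced)
```

Here `traced.reads` lists two reads covering 266 bytes of a file just over 100000 bytes long; those are the only parts a lazy backend would have had to fetch.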

--
Xenu loves you!

Re:Amazed that no-one's really tried to attack thi by Gollum · 2004-05-20 19:09 · Score: 1

It sounds like you already know what it is that you need to do, but are looking for implementation ideas.

Couple of thoughts:

LD_PRELOAD might help you to override the open, seek, read, etc. calls. You can probably do a HEAD on the URL to get the actual size of the MP3 without downloading the entire file, and fake the stat results from that.

Seek and read can be faked with Byte-Ranges, as you have already indicated.
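
As a rough Python sketch of those two pieces, using only the standard library's `urllib.request`: the helper names are made up for illustration, and the code assumes the server reports `Content-Length` on HEAD and honours `Range` (answering 206 Partial Content), which not every server does.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build the Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def remote_size(url: str) -> int:
    """Ask for the size with a HEAD request (usable for faked stat results)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
    if length is None:
        raise OSError("server did not report Content-Length")
    return int(length)


def read_at(url: str, offset: int, length: int) -> bytes:
    """Emulate seek(offset) followed by read(length) with one range request."""
    req = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            # A 200 here means the server ignored Range and sent everything.
            raise OSError("server did not honour the Range header")
        return resp.read()


first_ten = range_header(0, 10)
```

For example, `read_at(url, 0, 10)` would send `Range: bytes=0-9`, enough to see whether the file starts with an ID3v2 header.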

Problems that I see are convincing the application to open "http://host:port/path" using the filesystem, and not spewing immediately.

I'm thinking that extending the Python code that you have is probably going to be the easiest.

Alrighty then. by danielsfca2 · 2004-05-21 18:00 · Score: 1

Okay. So why the hell do you need FUSE for that? Why can't one just implement this using HTTP partial content requests? These libraries are open source, right? So the submitter obviously just needs to modify the existing libraries to read and seek using HTTP partial content requests instead of filesystem ones. Problem solved.

Yes, this involves doing real work. No, Ask Slashdot rarely does real work for people.

My "sorcerer's magic" comment, by the way, was trying to communicate the idea that even these libraries (that already exist), must either inspect the sections of the file that I mentioned, or parse the whole file. They cannot, by any way other than magic, just ascertain exactly which bytes to read without probing the file like I described. The person to whom I replied seemed (to me) to suggest this was somehow possible.
