Track Separation Detection for Streaming Media?
manavendra asks: "I have been using a couple of streaming audio rippers for about a week now (StreamRipper and StationRipper, most notably), but most of them seem to be afflicted with the same problem - inconsistent track separation. I've read about the Shoutcast Metadata Protocol, but I haven't properly understood how the silence-detection mechanism of track separation works. Some users have observed that most tracks are skewed by a few seconds (depending upon the radio station), and they advise adding a provision to delay or advance the timing. Has anyone implemented a better mechanism, since basing breaks on silence detection seems dodgy in the first place? Can someone shed some light on the inherent problems of reliable track separation?"
if you're using streamripper or streamripper32, they simply use the metadata that comes down the stream to split the tracks. on most stations (only the ones WITH the metadata, don't forget) this works perfectly and splits the tracks exactly at the right time. Nice & Easy.
the separation doesn't come from detecting silence, it comes from winamp (or a supporting media player) TELLING the server that the track has changed, and what the data is for the new track. the server then sends this down in the stream along with the music and that's how you see what's playing.
now, on stations that have a lot of listeners, the metadata seems to start coming down AFTER the song actually starts, which cuts off the front of the song and leaves 10-15 seconds of the next song at the end of the track. Club 977's stream does this VERY badly, for example.
other than resplitting the files and joining them again there doesn't seem to be any easy way to correct the mis-separation.
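for what it's worth, the "delay or advance the timing" idea from the question can be sketched in a few lines. this is only a rough illustration, not how StreamRipper or StationRipper actually do it: it assumes a constant-bitrate recording, a lag you have measured yourself for that station, and split points expressed as byte offsets into the ripped audio. the function name and parameters are made up for the example, and it ignores MP3 frame boundaries, so a cut may land mid-frame.

```python
def shift_split_points(split_offsets, lag_seconds, bitrate_bps=128_000):
    """Move each split point earlier by the station's metadata lag.

    split_offsets: byte offsets where the metadata said a new track began.
    lag_seconds:   how late the metadata arrives (assumed, measured by ear).
    bitrate_bps:   constant stream bitrate in bits per second (assumed CBR).
    """
    # Convert the lag from seconds to bytes of audio.
    shift_bytes = int(lag_seconds * bitrate_bps / 8)
    # Never move a split point before the start of the recording.
    return [max(0, offset - shift_bytes) for offset in split_offsets]


if __name__ == "__main__":
    # Two reported track changes, with metadata arriving about 12 seconds late.
    print(shift_split_points([400_000, 4_000_000], lag_seconds=12))
```

a negative lag_seconds would push the split points later instead, for stations where the metadata shows up early.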
you could force the stream ripper to check the metadata every single second (if the software supports it) so that it will catch the new data as soon as it comes down, but if the server only updates this data every 15 seconds there is nothing you can do. if i were a server operator i would do this on purpose, because it would make it difficult to cleanly rip my stream, which is something i think i would want.
The Shoutcast metadata protocol works by inserting special data every X (usually 8192) bytes. The data looks something like "StreamTitle=blah; StreamUrl=bleh".
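A minimal sketch of what a ripper has to do with that interleaved data, for illustration only. It relies on details not spelled out above: the server announces the interval in an icy-metaint response header, each metadata block starts with a single length byte counting 16-byte units (0 meaning "nothing new"), and the block is padded with NUL bytes. The function name and the simple regex are my own choices for the example; titles containing the sequence '; would confuse it.

```python
import io
import re


def split_icy_stream(stream, metaint=8192):
    """Yield (audio_bytes_so_far, title) each time StreamTitle changes.

    stream:  a binary file-like object whose read(n) returns n bytes
             unless the stream has ended (e.g. io.BytesIO).
    metaint: number of audio bytes between metadata blocks.
    """
    audio_bytes = 0
    current_title = None
    while True:
        audio = stream.read(metaint)
        if len(audio) < metaint:
            return  # stream ended before the next metadata block
        audio_bytes += len(audio)

        length_byte = stream.read(1)
        if not length_byte:
            return
        meta_len = length_byte[0] * 16  # length is given in 16-byte units
        if meta_len == 0:
            continue  # no metadata in this block

        raw = stream.read(meta_len).rstrip(b"\x00")
        text = raw.decode("utf-8", errors="replace")
        match = re.search(r"StreamTitle='(.*?)';", text)
        if match and match.group(1) != current_title:
            current_title = match.group(1)
            yield audio_bytes, current_title


if __name__ == "__main__":
    # Tiny fake stream with a 4-byte interval instead of 8192.
    fake = (b"aaaa" + bytes([1]) + b"StreamTitle='A';"
            + b"bbbb" + bytes([0])
            + b"cccc" + bytes([1]) + b"StreamTitle='B';")
    for offset, title in split_icy_stream(io.BytesIO(fake), metaint=4):
        print(offset, title)
```

The yielded offset is where the ripper first learns about the new title, so the split can only be as precise as the interval allows, and no better than the moment the server chose to send the update.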
Since 8192 bytes is about half a second of audio at 128Kbps (8192 * 8 / 128000 = 0.51 seconds, or about 1.95 metadata blocks per second), detecting a change of track using this method is usually pretty accurate.
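The same arithmetic as a quick helper, in case you want to check other bitrates or intervals. It assumes a constant bitrate, and the function name is just for this example. It only gives the granularity of the protocol itself; it says nothing about a server that sends the title change late.

```python
def metadata_interval_seconds(metaint_bytes=8192, bitrate_bps=128_000):
    """Seconds of audio between two metadata blocks at a constant bitrate."""
    return metaint_bytes * 8 / bitrate_bps


if __name__ == "__main__":
    interval = metadata_interval_seconds()
    print(f"{interval:.3f} s between blocks, {1 / interval:.2f} blocks per second")
```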