Track Separation Detection for Streaming Media?
manavendra asks: "I have been using a couple of streaming audio rippers for about a week now (StreamRipper and StationRipper, most notably), but most of them seem to be afflicted with the same problem - inconsistent track separation. I've read about the Shoutcast Metadata Protocol, but I haven't properly understood how the silence-detection mechanism of track separation works. Some users have observed that most tracks are skewed by a few seconds (depending upon the radio station), so they advise adding a provision to delay or advance the timing. Has anyone implemented a better mechanism, since basing breaks on silence detection seems dodgy in the first place? Can someone shed some light on the inherent problems of reliable track separation?"
How do YOU know the difference between two tracks? You talk like there HAS to be SOMETHING that identifies track separation, but this simply is not true. I bet if you pop in Pink Floyd's Dark Side of the Moon, you can't tell when the next track starts without looking at the track indicator. When you can tell, it's because you listen for a pause: an inconsistency in the flow of sound, or a separation of audio by silence. Eliminate these and you have nothing to work with, and again, you can't count on them being there.
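To make the silence-detection idea concrete, here is a minimal sketch of the naive approach. It assumes the stream has already been decoded to mono floating-point PCM samples in [-1.0, 1.0]. The function name `find_silent_gaps` is hypothetical, and so are the threshold, window and minimum-gap values. None of it is what StreamRipper or StationRipper actually do.

```python
import math
from typing import List, Sequence


def find_silent_gaps(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.01,   # assumed RMS level treated as "silence"
    window_s: float = 0.05,    # assumed analysis window length
    min_gap_s: float = 0.5,    # assumed shortest pause that counts as a break
) -> List[float]:
    """Return start times (in seconds) of stretches whose windowed RMS stays
    below `threshold` for at least `min_gap_s`."""
    window = max(1, int(sample_rate * window_s))
    min_len = min_gap_s * sample_rate
    gaps: List[float] = []
    quiet_windows = 0
    gap_start = 0
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        if rms < threshold:
            if quiet_windows == 0:
                gap_start = i          # first quiet window of a new gap
            quiet_windows += 1
        else:
            # Sound resumed: keep the gap only if it lasted long enough.
            if quiet_windows * window >= min_len:
                gaps.append(gap_start / sample_rate)
            quiet_windows = 0
    # A gap that runs to the end of the buffer.
    if quiet_windows * window >= min_len:
        gaps.append(gap_start / sample_rate)
    return gaps
```

On a segued album such as Dark Side of the Moon this finds nothing to split on, which is exactly the objection above. A crossfaded stream defeats it the same way.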
If you really care about the music being offered, then you'd do the track editing manually yourself. And if you care that much, you should be paying for the stuff; or, if it's something truly unique, then the extra effort it takes to split it manually is the price you have to pay. What happened to people loving music?
Of course, I personally dislike many of those radio stations that just sequence a playlist in Winamp and let it run. Gaps between tracks just don't make good radio; it's lazy on the part of the 'DJ'.
Yes, there are better ways, depending on the metadata protocol (SHOUTcast has a really broken metadata system) and on the source streamer. I wrote code for it a long time ago, but it was just at odds with any notion that I was a music fan.
Can you separate tracks based not only on audio but also on the title/artist data displayed?
If you had software that did that and combined it with a ripper, you could leave it running all day and have a repository of free (well, already-paid-for) legally owned music.
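One way to combine the two signals is sketched below, under stated assumptions. Take the times at which the title metadata changed, then nudge each one to the nearest silent gap found by an audio detector. When no gap is close, fall back to the metadata time. `snap_to_gaps` and the 15-second search window are hypothetical. The window is chosen only because it matches the 10-15 second skew reported later in this thread.

```python
from typing import List, Sequence


def snap_to_gaps(
    title_changes: Sequence[float],
    gaps: Sequence[float],
    max_shift_s: float = 15.0,  # assumed maximum metadata skew, in seconds
) -> List[float]:
    """For each time (seconds) at which the stream title changed, move the
    split to the nearest detected silence gap within `max_shift_s`;
    otherwise keep the metadata time unchanged."""
    splits: List[float] = []
    for t in title_changes:
        nearby = [g for g in gaps if abs(g - t) <= max_shift_s]
        splits.append(min(nearby, key=lambda g: abs(g - t)) if nearby else t)
    return splits
```

For example, `snap_to_gaps([100.0, 300.0], [92.5, 250.0])` moves the first split back to the pause at 92.5 s. It leaves the second at 300.0 s because the only other pause is 50 s away.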
If God had meant us to separate tracks, He wouldn't have invented crossfading!
Here, can you handle this?
Did you think about your bills, your ex, your deadlines...
and so on.
I bet "2:35 of silence" would be a whole lot of tracks.
If you're using Streamripper or streamripper32, they simply use the metadata that comes down the stream to split the tracks. On most stations (only the ones WITH metadata, don't forget) this works perfectly and splits the tracks at exactly the right time. Nice & Easy.
The separation doesn't come from detecting silence; it comes from Winamp (or another supporting media player) TELLING the server that the track has changed, and what the data for the new track is. The server then sends this down in the stream along with the music, and that's how you see what's playing.
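Below is a minimal sketch of metadata-driven splitting. It assumes the ripper already has the stream as a sequence of audio chunks. Each chunk is paired with the title that arrived just before it, or None when the metadata block was empty. `split_on_title_change` is a hypothetical helper, not Streamripper's code.

```python
from typing import Iterable, List, Optional, Tuple


def split_on_title_change(
    chunks: Iterable[Tuple[bytes, Optional[str]]],
) -> List[Tuple[str, bytes]]:
    """Group audio into (title, audio) tracks, starting a new track whenever a
    non-empty title differs from the current one. Audio received before the
    first title is dropped (an assumption; a real ripper might keep it)."""
    tracks: List[Tuple[str, bytearray]] = []
    current: Optional[str] = None
    for audio, title in chunks:
        if title and title != current:
            current = title
            tracks.append((title, bytearray()))
        if tracks:
            tracks[-1][1].extend(audio)
    return [(name, bytes(data)) for name, data in tracks]
```

Because the split happens exactly where the title changes, any lag between the song starting and the metadata arriving moves every boundary by that lag.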
Now, on stations that have a lot of listeners, the metadata seems to start coming down AFTER the song actually starts, which cuts off the front of the song, and at the end of the track you hear 10-15 seconds of the next song. Club 977's stream does this VERY badly, for example.
Other than re-splitting the files and joining them again, there doesn't seem to be any easy way to correct the mis-separation.
You could force the stream ripper to check the metadata every single second (if the software supports it) so that it catches the new data as soon as it comes down, but if the server only updates this data every 15 seconds, there is nothing you can do. If I were a server operator, I would do this purely because it would make it difficult to cleanly rip my stream, something I think I would like.
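The "delay or advance the timing" workaround mentioned in the question amounts to shifting each split point by a fixed amount. Here is a sketch that assumes a constant-bitrate stream, where seconds convert to bytes as bitrate / 8. `shift_splits` and its parameters are hypothetical.

```python
from typing import List, Sequence


def shift_splits(
    split_bytes: Sequence[int],
    offset_s: float,      # assumed per-station lag; positive moves splits earlier
    bitrate_bps: int,     # assumes a constant-bitrate stream
    total_bytes: int,
) -> List[int]:
    """Move every split point by `offset_s` seconds (negative moves it later),
    converting time to bytes and clamping to [0, total_bytes]."""
    shift = round(offset_s * bitrate_bps / 8)
    return [min(max(b - shift, 0), total_bytes) for b in split_bytes]
```

This only helps if the lag is roughly constant for a given station, and nothing in the protocol guarantees that. Also, cutting an MP3 stream at an arbitrary byte can split a frame, so a real tool would likely snap each point to a frame boundary as well.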
The SHOUTcast metadata protocol works by inserting a block of metadata every X bytes of audio (usually 8192). The data looks something like StreamTitle='blah';StreamUrl='bleh';.
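Concretely, a client asks for metadata by sending an `Icy-MetaData: 1` request header, and the server states the interval in an `icy-metaint` response header. After every `metaint` bytes of audio comes one length byte. The metadata block that follows is that value times 16 bytes long, padded with NUL bytes, and a length of zero means nothing changed. A minimal demultiplexer for an already-downloaded byte string might look like the sketch below. `demux_icy` is a hypothetical name. The code assumes the common `StreamTitle='...';` form and UTF-8 text, which not every station sends.

```python
import re
from typing import Iterator, Optional, Tuple

# Assumes the common StreamTitle='...'; layout; titles containing "';" would confuse it.
_TITLE_RE = re.compile(rb"StreamTitle='(.*?)';", re.DOTALL)


def demux_icy(stream: bytes, metaint: int) -> Iterator[Tuple[bytes, Optional[str]]]:
    """Yield (audio, title) pairs, where `title` is the StreamTitle received
    just before that audio block, or None if no new title arrived."""
    pos = 0
    title: Optional[str] = None
    while pos < len(stream):
        audio = stream[pos:pos + metaint]
        pos += len(audio)
        yield audio, title
        title = None
        if pos < len(stream):
            meta_len = stream[pos] * 16            # length byte counts 16-byte units
            meta = stream[pos + 1:pos + 1 + meta_len].rstrip(b"\x00")
            pos += 1 + meta_len
            match = _TITLE_RE.search(meta)
            if match:
                # Assumed UTF-8; some stations may send other encodings.
                title = match.group(1).decode("utf-8", errors="replace")
```

The first audio block always comes back with None, because the first metadata block only arrives after `metaint` bytes.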
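Since 8192 bytes is about half a second of audio at 128 kbps (128,000 bits/s ÷ 8 = 16,000 bytes/s, and 8192 / 16,000 ≈ 0.51 s, i.e. about 1.95 metadata blocks per second), detecting a change of track this way is usually accurate to within about half a second, provided the source updates the title on time.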
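The same arithmetic fits in a small helper. Running it across bitrates shows how the split granularity changes when the interval stays fixed at 8192 bytes. `metadata_interval_s` is a hypothetical name, and it assumes a constant bitrate.

```python
def metadata_interval_s(metaint_bytes: int, bitrate_bps: int) -> float:
    """Seconds of audio between metadata blocks for a constant-bitrate stream."""
    return metaint_bytes * 8 / bitrate_bps


for kbps in (32, 64, 128, 192):
    print(f"{kbps:>3} kbps: {metadata_interval_s(8192, kbps * 1000):.3f} s")
```

At 32 kbps the title can only change about every two seconds, so splits are coarser on low-bitrate streams.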
... the tracks seem to be separated just fine on the CDs I purchase.
I know you can separate by frequency using Fast Fourier Transforms... But that doesn't account for the fact that, for example, the highest note on a bass guitar is higher than the lowest note on a normal guitar, so their frequency ranges overlap.
-Clio
Karma: Bad (mostly from not giving a fuck)
Blog: http://clintjcl.wordpress.com
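A quick equal-temperament calculation backs up the bass/guitar point. A standard-tuned four-string bass goes down to E1. Assuming 20 frets on its G string, it goes up to D#4. A standard-tuned guitar's lowest note is E2. `note_hz` uses the usual MIDI convention A4 = 69 = 440 Hz, and the fret count is an assumption.

```python
def note_hz(midi: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi - 69) / 12)


BASS_LOW = 28          # E1, open low string of a standard-tuned bass
BASS_HIGH = 43 + 20    # G2 string at fret 20 (assumed 20-fret bass), i.e. D#4
GUITAR_LOW = 40        # E2, lowest open string of a standard-tuned guitar

print(f"bass range: {note_hz(BASS_LOW):.2f}-{note_hz(BASS_HIGH):.2f} Hz")
print(f"shared with guitar: {note_hz(GUITAR_LOW):.2f}-{note_hz(BASS_HIGH):.2f} Hz")
```

The fundamentals overlap from roughly 82 Hz to 311 Hz, and harmonics push both instruments much higher. A frequency band on its own therefore can't say which instrument produced the energy in it.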