Slashdot Mirror


Visual Analysis Of Mp3 Encoders

Chris Johnson writes: "I've just finished an interesting scientific analysis of several mp3 encoders and have my findings up on the Web. The process involves differencing a 'sonogram' image from an encoded test signal with the image of the original signal, and then producing response curves showing the disparity in direct signal volume, and over time. Umm . . . which is just to say this is probably the most rigorous analysis of any encoders anywhere on the web, and very geeky (in a good way). LAME carries the day, but BladeEnc shows that it has a completely distinctive sonic approach- and Fraunhofer proves unacceptable (in the version I tested) for audiophile use, though it's unbeatable at very low bit rates. See why." Truth in advertising -- this is a cool example of how visual information can convey more than you'd expect it to.
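A minimal sketch of the sonogram-differencing idea described above, assuming numpy, a Hann-windowed short-time FFT with magnitudes in dB, and signals that have already been time-aligned. The frame size, hop, dB floor and the names `sonogram_db` and `difference_image` are illustrative choices, not the plotting program the author actually used. A cell value of 0 corresponds to black (no deviation); larger values would plot lighter.

```python
import numpy as np

def sonogram_db(x, frame=1024, hop=256, floor_db=-120.0):
    """Magnitude spectrogram in dB: one row per time frame, one column per FFT bin."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Clamp silent bins to a floor instead of letting log10 run off to -inf
    return np.maximum(20.0 * np.log10(mag + 1e-12), floor_db)

def difference_image(original, decoded, **kw):
    """Absolute dB deviation per time/frequency cell; 0 means identical (black)."""
    n = min(len(original), len(decoded))
    return np.abs(sonogram_db(decoded[:n], **kw) - sonogram_db(original[:n], **kw))

# Usage with synthetic data: a little added noise stands in for an encode/decode round trip
sr = 44100
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 440 * t)
decoded = original + 0.01 * np.random.default_rng(0).standard_normal(sr)
img = difference_image(original, decoded)
```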

60 of 127 comments

  1. Re:Is not! by jmv · · Score: 3

    If you are a big fan of classical you will have an opinion on _which_ parts of the sonic information are expendable

    No, when a certain frequency component is discarded, it's not because the listener won't mind, it's because even if it's there, the listener cannot hear it. If you can't hear a sound, why encode it? Now, there are sometimes problems with classical music, but that's because it's often hard to predict exactly what you can and can't hear.

  2. Quaint, but flawed by John+Whitley · · Score: 5
    This sonogram analysis is quaint, but the author fails to grok the basics of psychoacoustic model based audio compression. The first rule is: you cannot measure the perceptual quality of the compressed audio via a raw distortion metric. Subtracting the original signal's sonogram from the compressed signal's sonogram is a distortion metric.

    That said, it is generally the case that "pre-echo is bad" and "over-ring is bad." Reducing these can be thought of as a good thing. Let's assume that for these encoders, pre-echo and over-ring are universally bad (I'll give an example where this isn't the case, below). Furthermore, this comparison actually says nothing about these encoders other than the pre-echo or over-ring. I.e. what happened to the sound that was the "same" on the sonogram? It is quite possible for an "encoder" to mangle the audio quality yet have a pristine sonogram by this test's standards.

    Just to throw a wrench in the works, more advanced encoders and/or psychoacoustic models can utilize what's called temporal masking. This is the ability of a higher-amplitude signal to mask (make inaudible) a lower-amplitude signal either before or after itself, as far as the human ear is concerned. Pre-echo is the phenomenon whereby a transient signal (i.e. a very 'sudden' attack, like a drum hit) is smeared in time. The audible effect can be most obnoxious. Yet encoders utilizing temporal masking will explicitly allow a certain amount of pre-echo through, as long as it is temporally masked. This leaves the encoder free to spend those bits on other parts of the signal that would be more seriously degraded as far as our ear is concerned. In short, a sufficiently savvy encoder could exhibit more pre-echo than another, worse-sounding encoder, especially if it uses temporal masking.
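    A toy illustration of why coarse quantization smears a transient, assuming numpy. One block holding silence followed by a sharp, decaying attack is transformed with a plain FFT (standing in for the MDCT that MP3 actually uses), its coefficients are rounded to a coarse step, and the block is resynthesized. The rounding error is spread across the whole block, so noise appears in the part that was silent before the attack. Block length, attack position and step size are arbitrary.

```python
import numpy as np

def quantize_block(block, step):
    """Coarsely quantize a block's spectrum, then resynthesize the time signal."""
    spectrum = np.fft.rfft(block)
    # Round real and imaginary parts to multiples of the same step size
    q = step * (np.round(spectrum.real / step) + 1j * np.round(spectrum.imag / step))
    return np.fft.irfft(q, n=len(block))

rng = np.random.default_rng(1)
n, attack = 1152, 900                       # block length and index of the attack
block = np.zeros(n)                         # silence up to the attack...
block[attack:] = rng.standard_normal(n - attack) * np.exp(-np.arange(n - attack) / 60.0)  # ...then a decaying burst

error = quantize_block(block, step=2.0) - block
pre_echo = np.sum(error[:attack] ** 2)      # error energy in the part that was silent
share = pre_echo / np.sum(error ** 2)       # fraction of all error that lands before the attack
print(f"pre-echo energy {pre_echo:.3f}, {share:.0%} of the total error")
```

    Real encoders limit this by switching to shorter blocks around transients, which confines the spread to a shorter stretch of time.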

    Quantitative analysis for perceptual audio coding is not easy; this has been a grail for researchers in the field for years. I strongly suggest that interested parties dig into various IEEE and AES (Audio Engineering Society) journal papers on the subject, as well as various books, etc.

    1. Re:Quaint, but flawed by Chris+Johnson · · Score: 2
      I consider it _very_ interesting to know at what frequencies pre-echo and over-ring occur. As you know, the sonic results of this type of distortion are greatly different depending on what frequency they're at. It's going to be a hell of a lot harder to hide a pre-echo or ring at 3K than one at 200 Hz- or 12K.

      That said- the sonograms are greyscale plots of deviation from the original signal. They are inevitably offset in time by the encoding process- I aligned them using those ugly transients in the center. There are two little charts under each. The second is the pre-echo and over-ring. The _first_ is precisely the opposite- deviation from the sound that was the 'same', with the weighting of the little chart (a RELATIVE measurement) emphasising the content of the wave rather than areas that are supposed to be free of additional frequency content.

      I don't think it's possible for an encoder to mangle the audio quality and have a pristine 'sonogram' as differenced with the source material. A pristine sonogram would be uniformly BLACK when this was done- none of the encoders remotely approached this. Any mangling, no matter what sort, will show up as a lighter-than-black area on the differenced image. I'm very much a high end audio dweeb at heart, but I don't believe there can be mangled audio quality without the Fourier content changing, and thus the sonogram showing big gray or white blobs.

      I wholeheartedly agree that quantitative analysis of perceptual audio coding is not easy! :)

    2. Re:Quaint, but flawed by jmv · · Score: 2

      I don't think it's possible for an encoder to mangle the audio quality and have a pristine 'sonogram' as differenced with the source material. A pristine sonogram would be uniformly BLACK when this was done- none of the encoders remotely approached this

      I'm sorry, from what you're saying, I just don't think you really understand what perceptual encoders are. First, if you have a 10:1 compression ratio, your sonogram cannot be all black (that would be lossless). Now, writing an encoder for which everything is grey (instead of black and white, like the sonograms you got) is very easy to do, but it will sound like sh*t.

      Very simple experiment: take a signal and add white noise so you get a 20 dB SNR. It'll sound _very_ noisy. Now, while preserving the noise energy, shape that noise to look like the signal (of course, still 20 dB lower). The audio you'll hear will sound quite OK (though not perfect) and much better than with the white noise. You have just used a (quite simple) psychoacoustic model.
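      jmv's experiment can be reproduced numerically, assuming numpy. Both noise signals below are scaled to exactly 20 dB SNR; only their spectral shape differs. The shaped noise takes white noise's spectrum, multiplies it by a smoothed copy of the signal's magnitude spectrum, and is then rescaled to the same energy. The harmonic test signal and the 201-bin smoothing window are arbitrary choices, and the listening comparison is left to the reader.

```python
import numpy as np

def scale_to_snr(signal, noise, target_db):
    """Rescale noise so that 10*log10(P_signal / P_noise) equals target_db."""
    p_sig, p_noise = np.mean(signal ** 2), np.mean(noise ** 2)
    return noise * np.sqrt(p_sig / (p_noise * 10 ** (target_db / 10)))

def snr_db(signal, noise):
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

sr = 44100
t = np.arange(sr) / sr
# Arbitrary test signal: a 220 Hz tone plus three decaying harmonics
signal = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 5))

rng = np.random.default_rng(2)
white = scale_to_snr(signal, rng.standard_normal(sr), 20.0)

# Shaped noise: white noise given the signal's (smoothed) spectral envelope,
# then rescaled so it carries exactly the same energy as the white noise
envelope = np.convolve(np.abs(np.fft.rfft(signal)), np.ones(201) / 201, mode="same")
shaped = np.fft.irfft(np.fft.rfft(rng.standard_normal(sr)) * envelope, n=sr)
shaped = scale_to_snr(signal, shaped, 20.0)

noisy_white, noisy_shaped = signal + white, signal + shaped   # listen to both
```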

    3. Re:Quaint, but flawed by Chris+Johnson · · Score: 2
      Um- the 'sonogram' we're talking about here is the difference between a source and result sonogram (which is a pretty simple plot of frequency data over time). As such, one that was uniformly grey would represent a wave in which the distortion from the original is totally uniform over the complete frequency range _and_ time range. If I'm not mistaken, this would be essentially the same as taking the exact original wave, unaltered in any way, and perfectly blending pink noise with it. I don't see any other way you'd get a 'sonogram' result like you describe. Keep in mind that even if you took raw pink noise and called that your 'result' you wouldn't have a featureless 'sonogram', because it is relative to the original source- a 'sonogram' of full-spectrum full volume noise is uniform WHITE, and this differenced with the original source's sonogram will just invert it.

      For this reason, writing an encoder for which everything is grey (given the techniques I've been using) is far from very easy to do- and the sound of the file that would produce this result would have to be 50% perfect uncolored reproduction of the wave, and 50% pink noise. That's basically as tough to do as 100% perfect uncolored reproduction of the wave, and it'll sound bad because of the loud pink noise, but I don't think it would sound like you are imagining it to sound.

      The point about noise and psychoacoustic model is well taken- I'm not claiming that my testing is illustrating psychoacoustic model suitability. If you think about it you can see that's not testable- it's going to be different for every song, and every listener. Some people can't hear over 12K- nuke it! Some people are acutely sensitive to peaks at around 3K- for instance, someone with tinnitus who's subject to the phenomenon of _recruitment_ will find a resonance there to be painfully unpleasant.

      I can't possibly test for that and am not trying. I can, however, work out where the errors are, where artifacts are being produced in the frequency band, and what types of resonance are present, and that information can be used by any person who knows what their psychoacoustic model will accept. For instance: if you like Xing, you'll probably like Fraunhofer at high bit rate still better. If you run screaming from Xing and hate all mp3 encoders, you might need to go with Blade assuming you listen to smooth music like classical. If you can't stand Blade at all, Fraunhofer might be right up your alley. These are quantifiable observations based on driving all these encoders completely beyond their ability to cope, and watching where they break down.

  3. Another fun experiment by Tom7 · · Score: 2


    Another fun experiment is to do this same thing sonically (makes a little more sense) -- encode to mp3, convert back to wave, and then subtract the original from the encoded one. The resulting wave will have all of the bits which were discarded.

    It's difficult to interpret the results (I agree with those who say that this study is more or less worthless) but it does sound pretty neat. =)
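    Tom7's subtraction experiment can be sketched as follows, assuming numpy and a decoded file already loaded as an array at the original sample rate. Encode/decode round trips typically add some delay, so the decoded signal is first aligned by brute-force cross-correlation (an automatic stand-in for aligning on transients by eye) and then subtracted. The names `estimate_lag` and `residual` are hypothetical, and only non-negative lags are searched. Note that the residual holds everything the encoder changed, including anything it added (such as pre-echo), not only what it discarded.

```python
import numpy as np

def estimate_lag(original, decoded, max_lag=4096):
    """Lag (in samples) by which `decoded` trails `original`, via brute-force cross-correlation."""
    n = min(len(original), len(decoded))
    scores = [np.dot(original[: n - lag], decoded[lag:n]) for lag in range(max_lag + 1)]
    return int(np.argmax(scores))

def residual(original, decoded, max_lag=4096):
    """Align the decoded signal to the original and return (decoded - original, lag)."""
    lag = estimate_lag(original, decoded, max_lag)
    aligned = decoded[lag:]
    n = min(len(original), len(aligned))
    return aligned[:n] - original[:n], lag

# Synthetic stand-in for a decoded file: delayed, slightly quieter, with added noise
rng = np.random.default_rng(3)
original = rng.standard_normal(44100)
decoded = np.concatenate([np.zeros(1000), 0.98 * original + 0.01 * rng.standard_normal(44100)])
diff, lag = residual(original, decoded)
```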

  4. Re:In the final analysis by Stoutlimb · · Score: 2

    While agreeing that for high quality audio one must "fuck mp3", I have to disagree with you that it will lose its appeal.

    Right now, the attitude is "Why be able to store several hundred songs, when I can store several thousand..."

    In a couple of years, the numbers will change but the rationale will be the same. Why store ten thousand PCM tracks when I can have a hundred thousand?

    I agree at some point things will become meaningless, but there will have to be quite a major revolution first... Perhaps infinite data storage by quantum methods. Perhaps I'm a bit too hesitant to rely overmuch on Moore's law.

    E

  5. Re:MP3 for Audiophiles?? by sheldon · · Score: 2

    Give me some of what you are smoking, dude!

    MP3 distortions are very evident, especially at 128 kbps (so-called CD quality). They become less evident the higher the bitrate, but even at 320 kbps the distortions are still easily identified compared to the original CD.

  6. You can't make an objective test of mpeg encoders by geirt · · Score: 5

    The basic idea of mpeg is that the encoder removes the parts of the music which you (probably) can't hear. The encoder splits the sound into pieces, and rates each piece by how important it is for the total sound image. Then it starts with the most important sound and encodes that, continuing with the less important parts until the available bit rate is reached (e.g. 128 kbit/s). The rest of the sound data is discarded.

    The tricky part is the calculation of the "importance" of each sound, and that is what differentiates the encoders. This calculation is done with an algorithm called "a psychoacoustic model".
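    A toy version of the allocation just described. Pieces are ranked by an importance score and kept greedily while they fit the frame's bit budget; this variant skips a piece that no longer fits and keeps looking for cheaper ones. The `Piece` type, its scores and its costs are hypothetical. A real MP3 encoder works differently in detail: its psychoacoustic model yields allowed-noise thresholds per scalefactor band, and nested quantization loops trade bits against noise rather than dropping whole pieces.

```python
from dataclasses import dataclass

@dataclass
class Piece:
    name: str
    importance: float   # hypothetical score a psychoacoustic model might assign
    bits: int           # hypothetical cost of encoding this piece

def allocate(pieces, budget_bits):
    """Keep the most important pieces that still fit; everything else is discarded."""
    kept, used = [], 0
    for piece in sorted(pieces, key=lambda p: p.importance, reverse=True):
        if used + piece.bits <= budget_bits:
            kept.append(piece.name)
            used += piece.bits
    return kept, used

# 128 kbit/s at 44.1 kHz gives a 1152-sample frame of about 3343 bits (header and side info included)
frame_budget = int(128_000 * 1152 / 44_100)

pieces = [
    Piece("bass fundamental", 0.9, 1200),
    Piece("vocal formants", 0.8, 1500),
    Piece("cymbal shimmer", 0.3, 900),
    Piece("quiet hiss next to a loud note", 0.05, 700),
]
kept, used = allocate(pieces, frame_budget)
print(kept, used)
```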

    To measure the quality of an mpeg encoder automatically, you need an algorithm which calculates the quality of the encoded signal. By knowing this algorithm it is trivial to create an encoder which will score maximum on this quality measurement, since the quality measurement algo is basically the same as the psychoacoustic model.

    This test is "snake oil"; a real test of mpeg encoders unfortunately involves listening to the music to evaluate the psychoacoustic model of the encoder, and not comparing two artificially created psychoacoustic models with each other.

    --

    RFC1925
  7. Re:MP3 for Audiophiles?? by Gorgonzola · · Score: 2

    Not really being an audiophile, I beg to differ. I got some tracks from the Lola Rennt film through Napster, remembering that I enjoyed the soundtrack as much as the film. They sounded all right on my Aureal Vortex 2 soundcard and the cheapest model Rotel amplifier. Nonetheless, when I bought the CD, the difference was noticeable. And we are talking about 192 kbps MP3s. The clarity of CDs is far superior to MP3s.

    --
    -- Spelling and grammar errors tend to be a sign of erroneous thinking.
  8. Re:Conventional Wisdom by Chris+Johnson · · Score: 2
    *g* Thankee :)

    Giving this sort of thing to Slashdot is as fun as nude mudwrestling. Gotta love it. :)

  9. The real reason by Chris+Johnson · · Score: 2
    Um, the real reason is more humble than that.

    On the Mac, I would have to _pay_ to use the Xing encoder. I just got through a serious ramen-and-spaghettios period, and there's just no way I'm going to merrily throw money at people who not only support the mp3 licensing patentholders, but also make an encoder that is considered to be more prone to artifacts and ringiness than even the Fraunhofer high bit rate stuff.

    Beat me, whip me, slashdot me and call me unrigorous, but I'm not paying money for Xing. The lurkers support me in email. So there ;)

  10. Well, it was 'scratching an itch' really by Chris+Johnson · · Score: 2
    The mp3s I used to have up on mp3.com were Blade. That's because at the time, I hadn't located any other encoders that could be simply downloaded- it might be overlooked that for the most part these are _free_ _Mac_ encoders. When I used Blade, I was happy with the frequency response, but the pre-echo and weakness of transient attack always bothered me. I had certain music ("Koala", off the "anima" album) in which there were sounds (wood block combined with reggae rhythm guitar) that _severely_ failed to be reproduced by Blade in any sort of acceptable fashion- instead of going "klik!" the sound sorta went "whuf".

    I had to know why- no, scratch that, I knew why. I had to know which encoders did better- what they in turn traded off- and I had to know across a wide range of bit rates in a way I could quickly cross correlate.

    I've written for (IMNSHO) the foremost High End Audio journal. It's not that I'm not interested in listening to encoders! But if they are _all_ quite compromised, why not break 'em down into a series of measurements relative against each other with clearly identifiable characteristics? Shows you what to listen for- and tips you off to particular issues.

  11. Re:What about... by Chris+Johnson · · Score: 4
    I'd be hugely interested in that. I consider it very relevant. I'm doing all this on a Mac, and have tried repeatedly to compile Vorbis in any sort of way- one of the Ogg people did this at MacHack and has not made binaries available. If he had, Vorbis would be represented at every bit rate level. I am simply not coder enough to deal with porting Vorbis, even a cheap hack, and I wish I was. I've begged for Vorbis/Mac repeatedly, and finally I had to go on without it, as there were decisions I needed to make on what mp3 encoder to use for my stuff, and the whole project was to answer for me what was most appropriate for 128K-range and what was best at arbitrarily high bit rates.

    You can add me to that list- and such a comparison (I naturally kept a logbook to be able to reproduce the process later) would indeed be meaningful to me. For instance, if Vorbis was more sophisticated in its control of over-ring and either imposed a flatter characteristic (resisting resonant peaks) or went for an intentionally tailored characteristic (say, suppressing ring around 3-5K like Fraunhofer 32K bit rate) this would have obvious and interesting application to the sound quality. Conversely, if it had big ugly peaks and artifacts, their location in the frequency response would tell a lot about the sonic signature of the encoder.

  12. Re:Web page background. by Chris+Johnson · · Score: 2
    Sorry :)

    Doh! For years I've used a purely white background for airwindows.com, with a sort of vintage-cnet layout. I also used to keep a 'graphics' section in which I had some web background gifs I'd done. They were made like this:
    x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
    Do a diffusion dither between white and the lightest 'web safe' gray- then take all the pixels at x positions and knock 'em out to white too. The result (works with other colors as well) is a texture in which no two colored pixels are ever directly next to each other- it's a paperlike texture but never gets darker than half Netscape grey.

    Which is to say- sorry, I did it that way because I liked it, and I'll keep it. Honest, I have done everything I possibly could to avoid obscuring the text, but it's sort of like a trade-off: in getting rid of additional table clutter that I used to have, I found that I liked the pages when this simpler layout was backed by the softest texture I had, rather than plain white.

    I hope it didn't bother your eyes too much :)
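    The texture recipe from this comment can be sketched in pure Python, assuming a flat target tone and Floyd-Steinberg error diffusion (the comment only says 'diffusion dither', so the kernel is an assumption). 0xCC is the lightest gray below white in the 216-color web-safe palette. After dithering, every pixel on one color of a checkerboard (the 'x positions', per the follow-up comment) is forced to white, so no two gray pixels ever touch horizontally or vertically, and at most half the pixels are gray.

```python
WHITE, LIGHT_GRAY = 0xFF, 0xCC  # white and the lightest web-safe gray below it

def dither_texture(width, height, tone):
    """Floyd-Steinberg dither a flat tone between LIGHT_GRAY and WHITE,
    then knock out every other pixel (checkerboard) to white."""
    buf = [[float(tone)] * width for _ in range(height)]
    out = [[WHITE] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = buf[y][x]
            new = WHITE if old >= (WHITE + LIGHT_GRAY) / 2 else LIGHT_GRAY
            out[y][x] = new
            err = old - new
            # Push the quantization error onto not-yet-visited neighbours
            for dx, dy, w in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny < height:
                    buf[ny][nx] += err * w / 16
    # Checkerboard knockout: every cell where x + y is even becomes white
    for y in range(height):
        for x in range(width):
            if (x + y) % 2 == 0:
                out[y][x] = WHITE
    return out

texture = dither_texture(32, 32, tone=0xE0)
```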

  13. Yikes by Chris+Johnson · · Score: 2
    Those Xs were supposed to look like this:

    x x x x x x x x x x x x x
     x x x x x x x x x x x x
    x x x x x x x x x x x x x
     x x x x x x x x x x x x
    x x x x x x x x x x x x x
     x x x x x x x x x x x x
    x x x x x x x x x x x x x
     x x x x x x x x x x x x

    Woops. Or I could have said 'checkerboard' and saved myself the hassle :)

    The idea is from a company named Boxtop Software, which produced a Photoshop plugin that put different web safe colors in checkerboard patterns to produce a much greater range of 'web safe' colors (which look solid). I figured, why not run with that and do textures that way? Maybe the GIMP would benefit from some websafe checkerboard texture generators too :)
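    The Boxtop trick is easy to quantify, assuming the eye averages a fine two-color checkerboard into the per-channel mean of the two colors (a simplification that ignores gamma). Each web-safe channel takes one of six levels, 0x00 to 0xFF in steps of 0x33, so the mean of any two levels falls on one of 11 values, and the apparent palette grows from 216 solid colors to 11^3 = 1331 (the solids included).

```python
from itertools import product

LEVELS = [0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF]   # web-safe values per channel
WEB_SAFE = list(product(LEVELS, repeat=3))       # the 216 solid web-safe colors

def checkerboard_blend(c1, c2):
    """Apparent color of a fine c1/c2 checkerboard, modelled as the per-channel mean."""
    return tuple((a + b) / 2 for a, b in zip(c1, c2))

apparent = {checkerboard_blend(c1, c2) for c1 in WEB_SAFE for c2 in WEB_SAFE}
print(len(WEB_SAFE), len(apparent))
```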

  14. Anybody up for trying this out for real? by Chris+Johnson · · Score: 2
    I'd be very curious to know if this can be done- and I seriously question if the resulting encoder would perform 'very well'. If you think this would cause the resulting wave to score perfectly you're... not correct: it's not possible for that to happen with such lossy compression, perceptual model or no.

    Actually, I think this would be a _very_ good experiment. I'm aware that my questioning some of these concepts is seen as prima facie evidence of being a tottering loony *g* but the whole concept of the psycho-acoustic model is so central to current audio theory... and this theory basically says, 'mp3s can be made to sound indistinguishable from CDs' and they cannot- the same theory on a broader level says 'CD itself is theoretically perfect sound', and it is not- mastering engineers, for instance, have learned that to do their work they need something better than CD audio.

    I'm not certain that the psychoacoustic model must necessarily be that much better than, for instance, trying to diffuse unavoidable error as evenly as possible over the frequency and time domains. You are essentially insisting that concentrating the error in particular areas that are said to be 'masked' is far superior. This assumes the masking is effective, and that there are no side effects- neither assumption is wholly true, as large numbers of people are able to find fault with (say) 128K mp3s, and any filtering is going to impose extraneous characteristics. You're also assuming that an encoder that does not have a psychoacoustic model (I assume this would mean one that diffuses error pretty uniformly) is going to perform 'very well' in the procedure I devised. I'm not sure of that- I'd like to try it experimentally before jumping to that conclusion.

    Finally, I have to admit- I haven't got the faintest idea what the resulting sonogram, and frequency/overring characteristics, would look like. I can say some things about it- with regard to the over-ring, diffusing it over a wider frequency range is not only desirable but markedly preferable. Fraunhofer loses badly to LAME, sonically, over just this issue- and Blade gets away with its severe over-ring by diffusing it over a wider frequency range. If the experimental psychoacoustic-model-less encoder showed significant improvements in diffusing out this over-ring and reducing its duration- there would be legitimate applications for its tonal characteristics, even if the raw frequency response was noticeably compromised. It would be sort of like the 'anti-Blade'.

    I don't suppose anyone will actually _try_ it, much less help me out with measuring it :P but if anyone is genuinely interested in investigating this, drop me a line? It sounds like something that could be attempted. Seriously- the whole point of such a model is 'masked stuff can't be heard'. If people can hear the masked error anyhow, what is the point? And if you assume people who can't hear anyhow and won't notice, what's the difference? Is it so axiomatic that you have to shun diffusing error evenly, and instead concentrate it in areas you think won't be heard?

  15. I'd like to see results of that exercise by Chris+Johnson · · Score: 2
    I didn't notice at first but you're proposing a mirror image of an exercise I'd dearly like to try, an idea that emerged from yet another slashdot audio argument :)

    You are talking about applying only the psychoacoustic model of the mp3 encoding, and producing a comparison of that with the original signal. I would indeed be really interested in seeing that- I'd like to know which of the various distortions, over-rings etc. arise from the psych model and which arise from the fractal part.

    In the argument (lower in the thread) I was questioning whether you could skip the psych model entirely (pretend people can hear the difference between 128K mp3 and real life ;P ) and see just what you'd get if you went purely with the fractal encoding- trying to diffuse any and all error in the process as evenly as possible over frequency and time.

    People will swear up and down that this will be drastically worse. I'd like to measure it in comparison with normal mp3 encoders and see exactly what it is, not just run around making theories that it's going to be awful. The one thing I'm willing to guess about it is that the sound will be the opposite of BladeEnc's sound. For some people that'll be bad- but the idea of an 'anti-Blade' might really interest others.

    I don't know if anybody's comfortable enough with hacking on a version of LAME or whatever that they'd be willing to try it- I am going to bounce the idea off Martin Hairer, with whom I worked to perfect the sonogram-plotting program (I needed to request better picture export capacities- he came through like a trouper and fixed everything). I think he is the one who ported LAME to his program, and he might be both able to try such experiments, and interested in seeing what they do.

    At any rate I wanted to say that your idea of isolating the transformations and considering them independently _is_ truly an interesting exercise- and I hope to be able to do such experiments, and learn from them, with a bit of work and patience :)

    1. Re:I'd like to see results of that exercise by Fross · · Score: 2

      You are talking about applying only the psychoacoustic model of the mp3 encoding, and producing a comparison of that with the original signal. I would indeed be really interested in seeing that- I'd like to know which of the various distortions, over-rings etc. arise from the psych model and which arise from the fractal part.

      yes that's exactly it, i think it would be an interesting exercise, as i don't recall seeing any study of that as of yet. i'm sure much has been done to develop psychoacoustics in the first instance, but as that was way before mp3 actually came about, this info won't be readily available from mp3 sites (though thanks to the anonymous coward's url elsewhere in this thread!)

      i think removing the psychoacoustics and simply applying the fractal transform on its own would result in a lower perceived quality-per-bitrate ratio, not much else. but it's interesting also.

      to do any of these experiments, we'd need access to the source code for an mp3 encoder - are any of these available? LAME for instance? i'm sure fraunhofer's is available from less reputable websites ;)

      fross

    2. Re:I'd like to see results of that exercise by Chris+Johnson · · Score: 2
      The reason I'm interested in seeing what the no-psychoacoustics version would do is that, apparently, the psychoacoustic transform is a dynamic but extremely elaborate equalisation curve. There are definite consequences to doing such an elaborate correction- and I think Fraunhofer has illustrated some of them, in particular by pushing for an _extremely_ sharp cutoff at the top end of the frequency range- which results in ringing.

      A lot of the pre-echo that's showing up as resonant peaks could be attributed to this type of equalisation. If that is the case, applying the fractal transform alone would result in noticeably coarser frequency information but, at the same time, a much cleaner time domain with pre-echo and over-ring much more diffuse and unhearable. It might be perceived as extremely dynamic but somewhat colored sound with a great deal of openness and energy but compromised tonality. It might be terrific for certain types of electronic music, drum machines and such things.
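      The connection between a very steep top-end cutoff and ringing can be shown with an idealized brick-wall lowpass, assuming numpy: zero every FFT bin above the cutoff and resynthesize a square pulse. The result overshoots and ripples on both sides of each edge (the Gibbs phenomenon, about 9% of the step height in the continuous limit, no matter how high the cutoff). Because this filter is zero-phase, half the ripple lands before the edge, which is one way a sharp filter can put energy ahead of a transient. This is a generic illustration, not a model of Fraunhofer's actual filter; the 16 kHz cutoff is arbitrary.

```python
import numpy as np

def brickwall_lowpass(x, sr, cutoff_hz):
    """Zero every rfft bin above cutoff_hz (an ideal, infinitely steep filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

sr, n = 44100, 4096
pulse = np.zeros(n)
pulse[n // 4 : 3 * n // 4] = 1.0     # a square pulse, so the circular FFT sees clean edges
filtered = brickwall_lowpass(pulse, sr, cutoff_hz=16000)

overshoot = filtered.max() - 1.0                      # ripple above the plateau
pre_ring = np.abs(filtered[: n // 4 - 1]).max()       # ripple before the rising edge
print(f"overshoot {overshoot:.3f}, ripple before the edge {pre_ring:.3f}")
```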

      I'm working on being able to try this experiment. If I can do this, I can also experiment with different types of filtering (realistically, I'd be working with a programmer who would know how to do this but might not have thought to try some of the things I'll suggest).

      If anybody tries this sort of thing, don't test it on stuff that would obviously suck! It would be pointless trying it on classical, or easy listening. On the other hand, gabba house music and really harsh techno, or brutally distorted heavy metal... I know I've got stuff that I'd like to have encoded in a way that pushes impact at all costs and brings out the rawness of the sound at the expense of the detail and clarity. That should be possible, don't know if axing the psychoacoustic model would do it- depends on how much it compromises the original sound. Not all filters produce such obvious artifacts- just the ones with really sharp slopes such as the top lowpass filter in Fraunhofer.

  16. Done before (again). by xenoweeno · · Score: 3

    Spectral and waveform analysis and such has all been done before, and LAME has been known to be superior for quite some time. I've been singing the praises of this site for at least six months.

  17. Re:So what? by hymie3 · · Score: 2
    First off, the *only* way to evaluate the quality of a perceptual encoder is to listen to it, period. Who cares what is rejected (non encoded) if you don't hear it.

    I'll agree that perception is what matters. However, what sounds great on my $48 Labtec speakers at work sounds like crap on my $500 studio headphones at home. The fact of the matter is, most people don't have $25,000 of audio equipment nor sufficiently trained ears to tell the difference. I'll readily use LAME encoded stuff from people I trust, but cringe in horror when I listen to the rapage that Xing's encoder performs to the quality of complex music.

    Think of it this way: most people are arguing which color of crap tastes better. Sites like this one and the one in the article are trying to point out that you don't have to eat crap.

    hymie

  18. Kexis -- GPL lossless file format. by ghazban · · Score: 2

    Kexis is a GPL'd lossless encoder which has proved to be _almost_ as good as shorten for filesize, is _much_ faster to decode and encode than any encoder I have ever used... The fact that the kexis file format may change in the future is largely a petty issue as you can simply losslessly convert from the old format to the new one. Have a look at it at http://kexis.sourceforge.net

    1. Re:Kexis -- GPL lossless file format. by slim · · Score: 2

      The fact that the kexis file format may change in the future is largely a petty issue as you can simply losslessly convert from the old format to the new one.


      Yeek; that's fine until you have several gigabytes to convert each time the format changes.

      One of the best things the MiniDisc inventors and the MP3 inventors did was to keep the decoding algorithm static, while allowing the encoding algorithm to improve as technology improved.
      --

  19. Blade is not an encoder by heroine · · Score: 2

    Blade became popular because it was the first program to be banned by Fraunhofer. In fact, Blade is really a copy of the ISO reference code, optimized for speed. LAME incorporated massive quality improvements, but came too late to catch the wave of publicity offered to Blade. It would be nice to have access to the code which generated these sonograms.

  20. A much better overview of mp3 encoders by rawrats · · Score: 2

    r3mix.net is really the definitive site for this sort of thing. Not only does the site show waveform deviation, but the tester actually listens to lots of very diverse music to test for quality. The waveforms are used mainly to explain errors heard during listening (i.e. what the hell is that fuzzy warp sound overriding the bassline?). So anyways, read up at r3mix.net -- you'll realize people have already done this much better.

    --
    -- jar
    1. Re:A much better overview of mp3 encoders by Rayban · · Score: 2

      Not true... read the front page. He describes pre-echo and overring, both in audiophile terms.

      --
      æeee!
  21. Re:MP3 for Audiophiles?? by Anonymous Coward · · Score: 4
    Audiophiles are interested in the most accurate reproduction of sound...

    Absolutely. CD quality (44.1 kHz 16 bit PCM) is total CRAP to true audiophiles. I won't be satisfied until they invent a format that will store the timing and strength of every single air molecule hitting my eardrum, precise to within the Heisenberg uncertainty principle. Uncompressed.

  22. (OT)The vinyl vs. CD debate by yerricde · · Score: 2

    some of my vinyl is way better

    Vinyl sounds "warmer" because...

    ...the scratches and pops remind the listener's subconscious mind of a fireplace.

    --
    Will I retire or break 10K?
  23. You are wrong by scheme · · Score: 2
    You can't compress data and also have the output be the same as the input. Think about it, there are 256^(# of bytes in the sample) possible inputs, and since every encoder output can only decompress to one possible input, the only way to get 256^(# of bytes in the sample) possible decompressed results is to have 256^(# of bytes in sample) compressed outputs-- i.e. 1:1 compression ratio.

    That's trivially proven to be incorrect since gzip and bzip2 compress data and yet have the outputs be the same as the inputs. In an audio context, ten minutes of a pure-frequency sound can be easily compressed to a small size. The only information you really need to keep is the length of the tone and the frequency.
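    Both sides of this exchange can be checked with Python's standard `zlib` module (`random.Random.randbytes` needs Python 3.9 or later). A perfectly periodic tone compresses to a small fraction of its size and decompresses bit-for-bit, while pseudo-random bytes of the same length come out slightly larger. Lossless compressors shrink redundant inputs, and the counting argument in the quoted claim guarantees that some other inputs must grow. The tone parameters are arbitrary.

```python
import math
import random
import struct
import zlib

SR, FREQ, SECONDS = 44100, 441, 2   # 441 Hz divides 44100 Hz, so the waveform repeats every 100 samples

# 16-bit little-endian PCM of a pure tone
samples = [round(32767 * math.sin(2 * math.pi * FREQ * i / SR)) for i in range(SR * SECONDS)]
tone = struct.pack(f"<{len(samples)}h", *samples)

packed = zlib.compress(tone, 9)
assert zlib.decompress(packed) == tone            # lossless: identical bytes back

noise = random.Random(0).randbytes(len(tone))     # essentially incompressible input of the same size
print(len(tone), len(packed), len(zlib.compress(noise, 9)))
```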

    --
    "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  24. Re:What about... by puetzk · · Score: 2

    I would also be very interested in seeing similar graphs (preferably from the same source) made with Vorbis encoders, to see how they stack up.

    --
    The Matrix is going down for reboot now! Stopping reality: OK. The system is halted.
  25. The portrayal of this is inaccurate. by Fross · · Score: 3
    first off, i must say this is a very interesting article, and an original and potentially useful analysis for comparison both between mp3 formats and, to some extent, between mp3 and other audio encoding formats. however, the correlations between visual distortion and loss of audio quality are *NOT* valid or accurate, something the article doesn't place enough emphasis on. :)

    the key point here is that mp3 encoding is in fact a process of two separate transformations (both of which consist of many processes, of course), the first of these is my bone of contention as it seems less well-known than the second, which i will address first.

    the "second transformation" is the one familiar to most people, the iterative quantization and coding procedure, which simply adds information to that audio frame until it either a) hits a "quality threshold" (i.e. is considered good enough) or b) fills up its bitrate allocation. it's similar in many ways to making a "jpeg of sound". you can get a good view of this whole process by following this link to a graphic of the aac encoding process on fraunhofer's website. it is the stuff inside the box at the lower left that this concerns.

    however, the first transformation here is the important one; this is the stuff outside and above the box in the graphic linked above (i am not sure the graphic is detailed enough; from what i remember, some steps may be missing). this is a series of transformations that limits the amount of data the second transformation has to deal with (and hence gets better encoding for the same bitrate), according to the way the human ear works. our ears have "features" like a dead area in the frequencies near a loud sound (frequency masking), which means those components can be cut out, and other bits and pieces that i can't remember and don't have to hand ;) this is of course psychoacoustics, as other people have commented. there is a _very_ basic primer on this at the fraunhofer site here, but it doesn't go into any technical detail.
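    The masking "dead area" described above can be illustrated with a deliberately crude toy model. Everything here is an assumption for illustration: the Bark-scale formula is a commonly used approximation attributed to Zwicker and Terhardt, while the 27 and 10 dB/Bark slopes and the 10 dB offset are round numbers chosen only to show the shape of the effect (steep below a masker, shallower above it). Real encoder psychoacoustic models are level-dependent and far more elaborate.

```python
import math

def bark(freq_hz: float) -> float:
    """Common approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

# Illustrative constants only (assumptions, not taken from any real encoder).
LOWER_SLOPE_DB_PER_BARK = 27.0  # masking falls off quickly below the masker
UPPER_SLOPE_DB_PER_BARK = 10.0  # ...and more slowly above it
MASKER_OFFSET_DB = 10.0         # threshold sits somewhat below the masker's own level

def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level below which a component at probe_hz is treated as inaudible."""
    dz = bark(probe_hz) - bark(masker_hz)
    slope = UPPER_SLOPE_DB_PER_BARK if dz > 0 else LOWER_SLOPE_DB_PER_BARK
    return masker_db - MASKER_OFFSET_DB - slope * abs(dz)

def audible_components(components):
    """Keep the (freq_hz, level_db) pairs that no other component masks."""
    kept = []
    for i, (freq, level) in enumerate(components):
        masked = any(level < masking_threshold_db(mf, mdb, freq)
                     for j, (mf, mdb) in enumerate(components) if j != i)
        if not masked:
            kept.append((freq, level))
    return kept

# A loud 1 kHz tone hides a quiet neighbour at 1.1 kHz but not one at 4 kHz.
tones = [(1000.0, 80.0), (1100.0, 50.0), (4000.0, 50.0)]
print(audible_components(tones))  # [(1000.0, 80.0), (4000.0, 50.0)]
```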

    as an aside, there used to be some fantastic and informative articles on these subjects at mp3.org back in the day (1997-1998?), may it rest in peace. does anyone have some links to where something as good on this subject can be found? i haven't been as in touch with the technical side of mpeg encoding as i used to be...

    but anyway, back on subject: this first transformation actually distorts the signal *significantly*, but only in a way that makes it easier to process while still sounding the same (or close) to the human ear. it may be an interesting exercise to isolate this first transformation, apply it, save the result without the second transformation, and compare that to the original signal. this transformation will cause great "visual degradation", as shown in the article, but imho that is not an accurate criterion for measuring audio quality. still interesting, and a good read, though :)

    fross

  26. Re:MP3 for Audiophiles?? by FallLine · · Score: 2

    What about your amp and speakers?

  27. Um, no, you can't. by Chris+Johnson · · Score: 2
    If you would like to write an encoder that actually encodes audio, and specifically trades everything off to perform terrific on 'EncoderHell' (the test tone noise), please do so! It might have interesting results when used with regular music. You cannot make it perfectly reproduce the sound without effectively hardcoding the exact waveforms for that sound into the encoder, because of the elements that use random bandlimited noise at up to 22K- there aren't enough bits to literally store that information in an mp3.

    The ideal result from the process (totally unaltered waveform information) would be an entirely _black_ 'sonogram' at the end of the process. That's not going to happen. Since there are going to be deviations, it's down to the psychoacoustic model- and the pictures and charts are going to show what the encoder chose to throw away, on a larger scale.
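    The differencing step can be sketched roughly as follows, using NumPy only. The frame size, hop, Hann window and dB scale are assumptions made for illustration and are not necessarily what the article used. A crude FFT lowpass at 16 kHz also stands in for "an encoder", purely so there is something to see. Identical signals give an all-zero, "entirely black" difference, and the removed band lights up.

```python
import numpy as np

def sonogram_db(signal, frame=1024, hop=512):
    """Magnitude STFT in dB; rows are time frames, columns are frequency bins."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(magnitude + 1e-12)  # small floor avoids log10(0)

def difference_sonogram(original, decoded, frame=1024, hop=512):
    """Absolute per-pixel dB difference; all zeros is the "entirely black" ideal."""
    return np.abs(sonogram_db(original, frame, hop) - sonogram_db(decoded, frame, hop))

rate = 44100
rng = np.random.default_rng(0)
original = rng.standard_normal(rate)                      # one second of white noise
spectrum = np.fft.rfft(original)
spectrum[np.fft.rfftfreq(rate, 1 / rate) > 16000] = 0     # crude stand-in for an encoder lowpass
decoded = np.fft.irfft(spectrum, n=rate)

diff = difference_sonogram(original, decoded)
bin_freqs = np.fft.rfftfreq(1024, 1 / rate)
low_band_error = diff[:, bin_freqs < 15000].mean()
high_band_error = diff[:, bin_freqs > 17000].mean()
print(f"mean dB error below 15 kHz: {low_band_error:.2f}, above 17 kHz: {high_band_error:.1f}")
```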

    You can argue that the encoder throws away stuff that can't be heard, therefore measuring _that_ is meaningless. This equates to arguing that the result is indistinguishable from the source audio. I disagree, and feel that all mp3s are audibly degraded from the source audio- which is itself degraded, being typically 16 bit 44.1K digital audio :)

    I'm trying to measure what the encoder's failing to do. The project was meant to answer my own questions, and has done so.

    1. Re:Um, no, you can't. by jmv · · Score: 2

      If you would like to write an encoder that actually encodes audio, and specifically trades everything off to perform terrific on 'EncoderHell' (the test tone noise), please do so

      Quite easy... strip out the psycho-acoustic model from a good MP3 encoder (like LAME) and you get a crappy MP3 encoder that performs very well in your sonogram test.

  28. Re:Visual analysis of MP3 is nonsense by Chris+Johnson · · Score: 2
    Not so much 'better'. DIFFERENT. I think it's plain that Blade makes very different choices from Frau or LAME in discarding information. The results I got would suggest that Blade is only good for classical music but excels at that, that Frau is 'mid fi' sonic spectacular, that LAME strikes a balance between sonic spectacular and being driven into artifacts and coloration. Sure enough, I'm seeing people citing orchestra conductors who would only accept Blade, people getting Really Agitated (Fraunhofer fans?) and people saying 'yeah, I already knew LAME was best so your page has no point' ;)

    Personally, I'm with LAME for my sonic requirements, although the only mp3s of my music out there (so far) are Blade, done many months ago before I did this research. But the point is not that there is a 'winner'- the point is that the differing sonic characteristics of these encoders CAN BE QUANTIFIED. Perhaps not measured outright (my charts etc. are _relative_ to each other), but these encoders take significantly different approaches to discarding information, and that applies directly to your choice of encoder for recording music, and translates to a completely predictable sonic characteristic of the encoder on ANY music, no matter what.

    I put all sorts of music through Blade when I was on mp3.com with only Blade for a free encoder- no matter what I did, the result was always identifiably BladeEnc, with the smooth extended frequency response and absolutely terrible transient impact. For some pieces, this was suitable- for some it was grossly unsuitable. But the sonic characteristics were consistent- and correlate with what I learned about the encoder in this 'torture test'.

  29. What about... by Didel · · Score: 2

    Ogg Vorbis?

    1. Re:What about... by joey · · Score: 3

      He's comparing the output of the encoders, once decoded. If he had a vorbis decoder that allowed him to get the information he needs, of course he could do a meaningful comparison. And it's the comparison I and probably many of us are most interested in.
      --
      see shy jo
    2. Re:What about... by jmv · · Score: 5

      Ogg Vorbis?

      He's measuring the MP3 encoders, and Ogg Vorbis is not an MP3 encoder, but an Ogg Vorbis (duh!) encoder, it doesn't use exactly the same encoding scheme, though it is still a perceptual encoder (based on time-frequency masking).

  30. Not the first. by Siqnal+11 · · Score: 2
    --
    You are a fucking moron.
  31. That's nice by Evro · · Score: 2
    That's nice and all, but if you can't hear a difference, what does it matter? I, too, would prefer to use the "best" one, but if I didn't know which was the best until this test, what do I really care?

    --
    rooooar
  32. Finally! by ca1v1n · · Score: 2

    I have been wondering about this kind of thing for a long time. I have used Lame as of late because it is very fast with the optimized compile I have. I wish it were as fast on VBR, but I guess I'll have to settle for CBR.

    I'd really like to see something like this with Ogg Vorbis once it matures. Or now even, because it seems to be a bit better already, though it's hard to tell on my laptop speakers.

  33. Lame encoder by ericdano · · Score: 2

    I've been using the VBR Lame Encoder 3.99 Alpha for a couple of weeks and I love it. It's fast, and it sounds great. I was using BladeEnc for a while. I have found that Lame sounds better, and using VBR will result in a smaller file than Blade and still sound better.

    --
    It's either on the beat or off the beat, it's that easy.
    I moderate therefore I rule!
  34. Re:Visual analysis of MP3 is nonsense by dewet · · Score: 2

    Ah, but nowhere does this article try to disprove that, does it? The whole point is that certain codecs do a better, more intelligent job of discarding information, and that is what the author set out to prove.

    --
    -------------------------------------------------- -
    This sig could have been put to good use.
  35. Re:So what? by FallLine · · Score: 2

    Not that I particularly care, but this seems to be a shallow argument. When you're searching the skies, you're trying to FIND something; ignorance is NOT bliss in this case. When you're listening to music, all that matters is what you can hear. Now maybe there is a more scientific method to determine what you can hear, such that you can detect perceptible problems before you run into them, but other than that, who really cares?

  36. Re:LAME? by Rei · · Score: 3

    The author is on drugs, is all I have to say. :)

    I'm taking a course currently on audio and image compression, and his article annoys me greatly. He uses ambiguous terminology and often the wrong terminology (for example, calling things "wavelets" that aren't actually wavelets). He describes things which can't be seen clearly in the graphs and would much better be viewed with a different display format. Etc.

    I'm still wondering if some of my compression ideas will work; I plan to test them out before too long. First, group some of the generally weak high-frequency signals together, since the human ear is less sensitive to pitch variation at high frequencies (we're sensitive to frequency on a logarithmic scale: an octave is a doubling of frequency). Second, instead of doing block transforms on the music, generate a 2D image of the signal (graph: frequency vs. time), compress the frequency axis as you normally would, and, instead of saving the time axis as a series of blocks of discrete frequencies, compress it heavily with an FFT. Doing this, you should be able to save space on recurring themes in songs (such as a chorus, a regular beat, etc.). Voice may introduce complications, though, and I may end up having to combine the two approaches (for example, compressing the difference between the original and final signal as a low-quality block transform and saving it with the compressed signal). Those are two ideas of mine I plan to test when this incredible workload from my senior year stops bearing down on me ;)
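    A rough sketch of the two ideas, under assumptions of my own rather than anything the comment specifies: octave-wide bands as the high-frequency grouping, power summed per band, and "compressibility" measured as the share of energy held by the largest few time-axis FFT coefficients. It only shows that a loudness pattern repeating every 8 frames collapses onto a handful of coefficients while an irregular one does not. It is not a codec.

```python
import numpy as np

def octave_band_energy(spectrogram, freqs, lowest_hz=62.5):
    """Sum power (magnitude squared) of a (time, freq) spectrogram into octave-wide bands.
    Bins below `lowest_hz` are ignored in this sketch."""
    n_bands = int(np.floor(np.log2(freqs.max() / lowest_hz))) + 1
    edges = lowest_hz * 2.0 ** np.arange(n_bands + 1)
    bands = [(spectrogram[:, (freqs >= lo) & (freqs < hi)] ** 2).sum(axis=1)
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.stack(bands, axis=1), edges

def time_axis_compaction(band_envelopes, keep):
    """FFT each band's envelope along time and return the share of total energy
    held by the `keep` largest coefficients of each band."""
    power = np.abs(np.fft.fft(band_envelopes, axis=0)) ** 2
    largest = np.sort(power, axis=0)[::-1][:keep]
    return largest.sum() / power.sum()

rate, frame = 44100, 1024
freqs = np.fft.rfftfreq(frame, 1 / rate)              # 513 frequency columns
rng = np.random.default_rng(1)
repeating = np.tile(rng.random(8), 8)                 # loudness pattern repeating every 8 frames
irregular = rng.random(64)                            # same length, no repetition

results = {}
for name, envelope in (("repeating", repeating), ("irregular", irregular)):
    toy = np.outer(envelope, np.ones(len(freqs)))     # toy 64-frame spectrogram
    bands, edges = octave_band_energy(toy, freqs)
    results[name] = (bands.shape, time_axis_compaction(bands, keep=8))
    print(name, bands.shape, round(results[name][1], 3))
```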

    - Rei

    --
    He's just being nice so my real father won't freeze him in carbonite and sell him for spice.
  37. Those tests are Worthless. by rabtech · · Score: 2

    Audio quality for compression codecs cannot be measured in terms of visual graphs or synthetic benchmarks. (I.e., just comparing the difference between the original signal and the compressed signal does not work.)
    It is quite possible to have a signal that very much resembles the original wave graph, and yet sounds horrible to the ear. It is equally possible to have a signal whose graph doesn't resemble the original very much, and yet has a much higher 'perceived' quality.

    Just remember: The first rule in every single BEGINNERS guide to sound is to "Trust your Ears," and that is the only way to tell a good codec from a bad one.
    --
    Natural != (nontoxic || beneficial)
  38. Very cool! by Chris+Johnson · · Score: 2
    Very cool! Quite a lot of people have been saying (in inimitable slashdot fashion ;) ) "j00 sux0rs! r3mix did this before you and is better!" *g*

    In fact I think I have seen this before and r3mix actually affected my approach to my encoder analysis. Definite kudos to r3mix, and I entirely agree with many of this site's decisions and approaches- interestingly they reach precisely the same conclusion as I did, that LAME 256 was the ideal archival encoder and LAME VBR was the best one for smaller file sizes- except that r3mix has added the recommendation that joint stereo be used in the latter case! (this would really hurt the relative comparison with higher bit rate stereo encoders with my mono test signal, but I think I will take the advice and try that for my own mp3s...)

    r3mix also chooses to use _relative_ graphs rather than attempting to give absolute measurements, something I heartily approve of.

    Now, here's the thing- r3mix's results are sometimes a subset and sometimes comparable to mine, just depicted in a different way. The primary measurement of a frequency sweep produces different-colored graphs- if you take the horizontal axis and express the vertical deviation of each graph, from an ideal line of flat reproduction at the top, as a brightness value of a single pixel, you'd get something akin to a single line on one of my 'sonograms'. The test with the 'applaud' signal is an example too- if you subtracted the source from the results you'd end up with distortion levels very similar to my differenced sonograms.
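    The mapping described above (vertical deviation from a flat line becomes the brightness of one pixel) might look like this. The 24 dB full-scale value and the linear dB-to-brightness scale are assumptions chosen for illustration; 0 means perfect reproduction (black) and 255 means 24 dB or more of deviation.

```python
import numpy as np

def deviation_to_pixels(response_db, ideal_db=0.0, full_scale_db=24.0):
    """Map |deviation from a flat response| to 8-bit brightness; 0 = perfect (black)."""
    deviation = np.abs(np.asarray(response_db, dtype=float) - ideal_db)
    return np.clip(np.round(deviation / full_scale_db * 255), 0, 255).astype(np.uint8)

# Hypothetical sweep readings (dB relative to the input) at a few frequencies.
row = deviation_to_pixels([0.0, -0.5, -3.0, -24.0, -60.0])
print(row)  # [  0   5  32 255 255]
```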

    More interesting to me is the fact that my sonograms show an _intermediate_ step- several r3mix tests are the averaged responses of an encoder over time. That is exactly what my 'charts' are- they are sums of all the deviation and distortion over the entire length of a sonogram, over a range of frequencies.
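    The "chart" step, summing a difference sonogram over its whole length and then over frequency ranges, can be sketched in a few lines. The `(time, frequency)` array layout and the band edges are assumptions for illustration, not the article's actual parameters.

```python
import numpy as np

def deviation_chart(diff_sonogram, bin_freqs, band_edges_hz):
    """Sum a (time, frequency) difference sonogram over all frames, then within each frequency range."""
    per_bin = diff_sonogram.sum(axis=0)  # total deviation per frequency bin over the whole length
    return np.array([per_bin[(bin_freqs >= lo) & (bin_freqs < hi)].sum()
                     for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:])])

# Toy input: 10 frames x 4 bins, every pixel showing 1 dB of deviation.
chart = deviation_chart(np.ones((10, 4)), np.array([0, 1000, 2000, 3000]), [0, 1500, 4000])
print(chart)  # [20. 20.]
```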

    I'm almost certain I'd seen r3mix before doing my own analyses- I think it's very likely that this site significantly helped me define the processes I used for my own stuff. I heartily recommend checking it out- this is good work, I totally endorse it, in fact I'm going to put a link to it on my own encoder page right now :) *put* there!

  39. Re:LAME? by Tackhead · · Score: 2
    What you said.

    I think the guy's hearing with his eyes, or using a totally different set of music than what I listen to.

    If you wanna hear how dog-fuckingly-shit Blade is, encode the first 10 seconds of New Order's "Blue Monday" (a basic drum machine emitting a sound common to much new-wave, dance, and industrial from around 1980 to the present day) at 128/160/192 using Blade, Fraun, and LAME.

    Blade will be unlistenable at 128, shit at 160, and you may hear artifacts at 192 if you know what to listen for. LAME and Fraun sound sweet, even at 128.

    Similar results can be achieved with a heavy guitar track, e.g. Def Leppard or other '80s "hair metal" bands.

    I don't have data on string quartets - but for non-classical music, Blade blows steaming piles of donkey dick.

  40. Re:An alternate analysis by Chris+Johnson · · Score: 2
    Yeah- this is interesting stuff :) the fact that it's often measuring 128K tends to hurt LAME and Blade. I'd direct your attention particularly to page three, "Testing With Real Music: 'Dirty Blue'": Ars actually learned more than they realise with this test. I quote: "The FhG encoder strives mightily all the way out to 20 kHz, but this results in obvious errors in the power spectra." Absolutely- these are the artifacts I was able to illustrate in sonograms, and the artifacts are in part produced by high frequency ringing of Frau's overly sharp cutoff.
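    Ringing from an overly sharp cutoff is the Gibbs phenomenon. The sketch below is not a model of any encoder's actual filter; the square-wave test signal, the 100-bin brick-wall cutoff and the Gaussian taper width are arbitrary. Chopping the spectrum off abruptly makes the square wave overshoot by roughly 18% near every edge. A gentle roll-off removes a similar amount of top end without overshoot.

```python
import numpy as np

N = 1024
n = np.arange(N)
square = np.where((n // 64) % 2 == 0, 1.0, -1.0)   # square wave with a 128-sample period
spectrum = np.fft.rfft(square)
k = np.arange(len(spectrum))                        # rfft bin index

brickwall = np.fft.irfft(spectrum * (k <= 100), n=N)                   # abrupt cutoff
gentle = np.fft.irfft(spectrum * np.exp(-0.5 * (k / 40.0) ** 2), n=N)  # smooth roll-off
print(f"brick-wall peak {brickwall.max():.3f}, gentle peak {gentle.max():.3f}")
```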

    _Definitely_ an interesting site. Also, referring to the listening tests: "The Fraunhofer encoder produced a surprisingly harsh sounding attack on the guitar; it remained quick and sharp, but was artificially crisp and accentuated." That's precisely what I was trying to say, couldn't have put it better myself. It turns out Ars _likes_ that. I do not. But if you do- clearly, you're going to like Fraunhofer. It's not about picking a winner, it's like picking a musical instrument...

  41. cuecat by jbridge21 · · Score: 3

    from the can-a-cue-cat-read-these? dept.

    Well, after calibrating my cat on a couple of Pop-Tarts boxes, I tried several scans on the diagrams on the web page... nothing! I can therefore conclusively answer this question with a big, fat NO.
  42. Visual analysis of MP3 is nonsense by Djinh · · Score: 3

    MP3 is about selectively discarding information from the audio stream. The purpose is not to create an output waveform which is as close as possible to the input. This is what the whole business with the psycho-acoustic model is about.

  43. Conventional Wisdom by kfg · · Score: 2

    The guy used the example of Fairport Convention with Sandy Denny.

    I don't know about his rigor, but the guy's alright by me.

    Who knows where the time goes?

  44. So what? by jmv · · Score: 4

    OK, now we see what parts of the spectrum are thrown away at very low bit rate, but why is it supposed to be "probably the most rigorous analysis of any encoders anywhere on the web"? First off, the *only* way to evaluate the quality of a perceptual encoder is to listen to it, period. Who cares what is rejected (not encoded) if you don't hear it?

    Also, while using the 32 kbps bitrate amplifies the effects of perceptual quantization, so it's easy to see them, the problem is that not all the encoders were meant to work at this bitrate.

    Think about it: when standards bodies want to evaluate audio/speech codecs, they don't calculate sonograms like this, they run subjective tests. They make a bunch of listeners hear the result of many encoders on *many* audio files. That's right: you need many files to evaluate a codec. Some will perform better for certain musical instruments, some will perform better with or without background noise, echo, ...
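    At its simplest, a listening test boils down to collecting listener ratings and reporting a mean opinion score per codec with an error bar. A minimal sketch, assuming a 1-5 rating scale and a normal-approximation 95% interval (both are assumptions; formal test methods define their own scales, listener screening and statistics):

```python
import math
import statistics

def mean_opinion_score(ratings, z=1.96):
    """Mean of listener ratings plus an approximate 95% confidence half-width.
    Uses a normal approximation; small panels would properly call for Student's t."""
    mean = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mean, float("nan")
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width

print(mean_opinion_score([4, 4, 4, 4]))  # (4.0, 0.0)
```

    In practice you would compute this per codec and per test item, since, as noted above, a codec can do well on one kind of material and badly on another.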

    For all these reasons, I do NOT consider this analysis rigorous at all!

    1. Re:So what? by jmv · · Score: 3

      with an oscilloscope I can get a more precise answer

      Yes, the guy's sonogram is more *precise* but it is still irrelevant. I could write an encoder that gives a much better result when evaluated with this "precise" sonogram, but yet will sound like crap.

      This is the point of perceptual encoding. The goal is not to produce the best result in terms of signal-to-noise ratio or spectral distortion, but to put the encoder's "errors" where the "non-precise" ear won't hear them. And if you don't hear it, you don't care, even if your oscilloscope or spectrum analyser tells you there's an error.

      The most critical part of a perceptual encoder is the "psycho-acoustic model", which tries to model as best it can the sensitivity of the ear at a given frequency, given the rest of the spectrum. This is not an easy task, and you have to make lots of approximations. Given two encoders that produce the same quantitative result (SNR, ...), the better one will be the one with the best psycho-acoustic model, and your $10k oscilloscope or spectrum analyser won't see that at all.
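    A small sketch of why SNR alone can't separate two such encoders: two error signals built to have exactly the same 30 dB SNR against a hypothetical 440 Hz test tone, one confined to within 100 Hz of the tone and one to 8-9 kHz. The numbers come out identical, yet a psychoacoustic model would be expected to treat them very differently, since only the first sits right next to a loud masker. How audible either one actually is depends on playback level and the listener, which this code does not model.

```python
import numpy as np

rate = n = 44100
rng = np.random.default_rng(2)
t = np.arange(n) / rate
signal = np.sin(2 * np.pi * 440 * t)                 # hypothetical 440 Hz test tone
freqs = np.fft.rfftfreq(n, 1 / rate)

def shaped_noise(band_mask, power):
    """White noise confined to the rfft bins in `band_mask`, scaled to a fixed mean power."""
    noise = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * band_mask, n=n)
    return noise * np.sqrt(power / np.mean(noise ** 2))

def snr_db(noise):
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

noise_power = np.mean(signal ** 2) / 1000            # both errors set to 30 dB SNR
near_tone = shaped_noise(np.abs(freqs - 440) < 100, noise_power)
far_from_tone = shaped_noise((freqs > 8000) & (freqs < 9000), noise_power)
print(f"error next to the tone: {snr_db(near_tone):.1f} dB SNR")
print(f"error at 8-9 kHz:       {snr_db(far_from_tone):.1f} dB SNR")
```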

  45. MP3 for Audiophiles?? by Stoutlimb · · Score: 2

    Is it me, or does this seem like an oxymoron? Not being an audiophile, someone correct me if I'm wrong here... Audiophiles are interested in the most accurate reproduction of sound... Why would they even consider a lossy compression scheme at all? Just like serious digital artists shun JPEG for all but web distribution to the masses, and even then we see much done in gif or tiff. I would say that MP3 audio done by ANY encoder is unacceptable to an audiophile.

    Second, I want to challenge some of the assumptions and declarations that this experimenter made. The experiments placed on these encoders are mostly "torture tests" that one would never encounter in real situations... And by using this series of torture tests he tells people which encoders are best for encoding mp3's. Does anyone see this reasoning as flawed? He's subjecting encoders to situations that NONE of them have been designed for, and proclaiming that this has something to do with reality. I see little correlation... How often do you hear pure sine sweeps in any song?

    I found the previous mp3 performance analysis posted on Slashdot to be much more informative. It put the encoders up on real world performance, and rates them accordingly.

    The guys who wrote the encoders realized that some things just wouldn't happen in normal music, such as these torture tests, so they wrote "shortcuts" that ignored these conditions, and resulted in a higher compression rate! How dare he rate encoders on something that the programmers all deliberately IGNORED.

    My friends, trust no statistics that you did not falsify.

  46. Re:In the final analysis by Ranger+Rick · · Score: 2

    Because it's a real pain in the ass to mess with 300 CDs, but it's really easy to select a directory with 300 CDs worth of music and put it on random. You have no idea how useful it is until you put 4000 songs (I'm not kidding) on random. :)

    --

    WWJD? JWRTFM!!!

  47. Re:Caveat Lector by Chris+Johnson · · Score: 2

    I would hope that anybody reading either what I wrote, or what you've just written, would avoid accepting unsupported claims, consider the facts of the situation, and make up their own minds...

  48. Re:What about Xing (AudioCatalyst)? by hymie3 · · Score: 3
    I know that Xing (AudioCatalyst) doesn't have the greatest encoder, but that's no reason to leave it out...

    Well, actually, there is a reason: the Xing encoder blows chunks. Sure, it's fast, but the sound quality sucks. If all you're encoding is Teeny Bopper of the Week music, then you're not missing out on anything. If you're encoding stuff that's a lot more complex, you're better off with something that doesn't sacrifice quality for speed.

    hymie

  49. A similar, if not better comparison by Jack9 · · Score: 3

    http://users.belgacom.net/gc247244/analysis.htm#MP3ENC31 This is what I found when searching for mp3 comparison. It compares different implementations of encoding for mp3 as well as output quality. Much more useful and definitive.

    --

    Often wrong but never in doubt.
    I am Jack9.
    Everyone knows me.