Music Listeners Test 128kbps vs. 256kbps AAC
notthatwillsmith writes "Maximum PC did double-blind testing with ten listeners in order to determine whether or not normal people could discern the quality difference between the new 256kbps iTunes Plus files and the old, DRM-laden 128kbps tracks. But wait, there's more! To add an extra twist, they also tested Apple's default iPod earbuds vs. an expensive pair of Shure buds to see how much of an impact earbud quality had on the detection rate."
Not only that, but audio professionals typically do codec and compression tests using an ABX test.
http://en.wikipedia.org/wiki/ABX_test
This would have been more interesting if they had used a statistically valid sample size and had compared the 128kbps and 256kbps files not only with each other but also with a lossless version.
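To put a rough number on the sample-size complaint: in an ABX trial the listener hears A, B, and an unknown X, and has to say whether X is A or B, so pure guessing gets it right half the time. A minimal sketch of the usual one-sided binomial check follows, assuming independent trials and a 50% guess rate. The function name `abx_p_value` is made up for illustration, and nothing here comes from the Maximum PC article.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` ABX trials by pure guessing (chance = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of a Binomial(trials, 0.5) distribution.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

if __name__ == "__main__":
    # Hypothetical listener: 8 correct identifications out of 10 trials.
    print(abx_p_value(8, 10))
```

With 10 trials, 8 correct gives a p-value of about 0.055, which does not clear the common 0.05 threshold, so a listener needs 9 of 10 before the result looks better than guessing. That is why a handful of trials per person tells you very little.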
The MPEG community uses a MUSHRA test* to judge the quality of new codecs and to decide on bitrates, etc. If there are n codecs under test, then the subject can switch A-B style between n+2 versions of the same piece of music: the n codec outputs, an unlabelled copy of the reference (lossless) version, and a labelled copy of that same reference. The subject does not know which of the unlabelled tracks is which, but can always switch to the one known to be the reference (so the reference track is in there twice, labelled in one case and not labelled in the other). The task is to rate (0-100) each of the unknown tracks based on how similar it is to the reference track. One important thing to remember about the task is that the subject must rate similarity, rather than 'quality' or anything else. A certain codec could, for instance, add a load of warm bass to a piece of music, making it more pleasurable (maybe) to listen to but decreasing its similarity to the reference piece. The idea is that the subject should be able to pick the reference track out of the unknowns (giving it a score of 100) and then rate all of the other unknowns in terms of similarity to the reference. The codec with the highest score wins. This type of test would be carried out for each of a number of pieces of music, with a lot of listeners.
* Sorry, I have no good link; it's in ITU-R BS.1534-1, "Method for the subjective assessment of intermediate quality level of coding systems".
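The scoring step described above can be sketched roughly as follows. This is only an illustration of "average the 0-100 similarity ratings per codec and take the highest", not the analysis procedure from the ITU document. The names `mushra_mean_scores`, `rank_codecs`, `hidden_ref`, `codec_a` and `codec_b` are hypothetical, and the ratings are invented example data for one piece of music.

```python
from statistics import mean

def mushra_mean_scores(ratings):
    """ratings: one dict per listener, mapping track name -> score (0-100).
    Returns the mean score for each track across all listeners."""
    tracks = ratings[0].keys()
    return {t: mean(r[t] for r in ratings) for t in tracks}

def rank_codecs(ratings, reference="hidden_ref"):
    """Rank the codecs by mean similarity score, best first.
    The unlabelled reference is left out of the ranking; it is only
    there to check that listeners can spot it and score it near 100."""
    scores = mushra_mean_scores(ratings)
    codecs = {t: s for t, s in scores.items() if t != reference}
    return sorted(codecs.items(), key=lambda item: item[1], reverse=True)

# Invented ratings from three listeners for a single piece of music.
example = [
    {"hidden_ref": 100, "codec_a": 80, "codec_b": 60},
    {"hidden_ref": 100, "codec_a": 70, "codec_b": 65},
    {"hidden_ref": 95, "codec_a": 75, "codec_b": 55},
]

if __name__ == "__main__":
    for name, score in rank_codecs(example):
        print(name, score)
```

In a real test you would repeat this for every piece of music and look at the spread of the scores as well as the means before declaring a winner.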