Sony Super CD: More Bits, More Bucks, Mo' Betta?

Posted by timothy on Saturday October 14, 2000 @03:58PM from the betamax-memory-stick-SDMI-new-CD-format dept.

Reader dcigary pointed to this "nice writeup on the new Sony Super CD." Though the explanation of the difference between supposedly revolutionary "DSD" recording and conventional digital seems to get by with a knowing mumble, the piece does mention the price (high) and that competition from audio-only DVDs may cripple acceptance of the new format. Even if I like the idea of ultra-fidelity, my faith in the Nyquist theorem is too strong to spend a grand and a half on a CD player anytime soon ...

11 of 309 comments

What SACD is all about and why DVD audio is better by XNormal · 2000-10-14 19:05 · Score: 4

Virtually all audio A/D and D/A converters today use sigma-delta, also commonly referred to as "one-bit" conversion.

In a sigma-delta A/D converter the audio signal is sampled with a high sampling frequency (typically a few MHz) and low sample resolution (1 bit). An error feedback mechanism is used to ensure that most of the energy of the quantization noise is "shaped" into high frequencies, giving excellent fidelity in the audio band. One bit is inherently linear - no need for carefully matched resistor networks such as those used on older A/D converters. This stream is then filtered and decimated using digital signal processing techniques to a lower sampling rate (e.g. 44100Hz) while gaining sample depth on the way (16 bits and higher).

For D/A conversion the process is reversed: the 44100Hz signal is interpolated up to a high sampling rate and then the sample depth is reduced down to one bit. Again, error feedback is used to ensure that the quantization noise resulting from the low resolution is shaped to high frequencies. This bitstream is then low-pass filtered and used as the audio signal. Again, with much better linearity than D/A converters based on carefully adjusted resistor networks.
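The error-feedback idea described above can be tried out in a few lines. The following is only a minimal sketch, assuming a first-order modulator, an oversampling ratio of 64, and a plain block average standing in for the digital filter and decimator; real converters use higher-order loops and much better filters. All names are made up for the illustration.

```python
import math

def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> list of +1.0/-1.0."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Error feedback: accumulate the difference between input and last output.
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits

def decimate_by_average(values, factor):
    """Crude low-pass filter plus decimation: average each block of `factor` values."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]

def rms_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

OVERSAMPLE = 64   # assumed oversampling ratio for this sketch
N_OUT = 256       # number of low-rate output samples
n_fast = OVERSAMPLE * N_OUT
# A slow sine, amplitude 0.5, four full cycles over the whole run.
signal = [0.5 * math.sin(2 * math.pi * 4 * n / n_fast) for n in range(n_fast)]

reference = decimate_by_average(signal, OVERSAMPLE)
shaped = decimate_by_average(sigma_delta_1bit(signal), OVERSAMPLE)
# Same one-bit resolution, but with no feedback: just keep the sign.
plain = decimate_by_average([1.0 if x >= 0 else -1.0 for x in signal], OVERSAMPLE)

err_shaped = rms_error(shaped, reference)
err_plain = rms_error(plain, reference)
print(f"1-bit with error feedback: RMS error {err_shaped:.4f}")
print(f"1-bit without feedback:    RMS error {err_plain:.4f}")
```

Both streams are one bit wide, yet after filtering only the one produced with error feedback tracks the input closely, because its quantization error has been pushed up to frequencies that the averaging removes.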

The Sony SACD skips the decimation and interpolation stage. It stores the noise-shaped bitstream directly on the disc. The beauty of this idea is in its simplicity: it performs far fewer transformations on the sigma-delta signal and therefore should offer inherently higher fidelity and wider bandwidth.

If sigma-delta converters were available 20 years ago when the CD was invented they would probably have chosen this method for its simplicity. But at that time the analog conversion technique known was resistor networks so PCM was used.

Remember that at the time the CD was really stretching the limits of consumer technology. No other consumer product prior to the CD player used so many new and advanced technologies: lasers, error correction, digital signal processing. If they could have used this technique it would have reduced the cost of CD players significantly. For example, this bitstream is much more tolerant to bit errors because unlike PCM there is no "most significant bit" that can cause a large error if corrupted.

Using this technique today, though, is insane. There is no real savings in simplicity when a million digital transistors cost close to nothing. If you want higher fidelity, 96kHz and 24 bits is more than enough.

Let's say you want something simple like a graphic equalizer on your SACD player. If it's analog, such a complex circuit will introduce lots of noise. If you implement it digitally it would take insanely large amounts of CPU power to process a signal sampled at over 2 MHz. Manufacturers will probably end up downconverting it to PCM at 96kHz or lower, doing the signal processing and then converting back to sigma delta for playback. This will lose all of DSD's alleged advantages.

BTW, for the purpose of preserving analog masters DSD is really a good idea because they contain useful information at very high frequencies such as the tape bias signal and the intermodulations it creates. Preserving this information will allow future signal processing techniques to create accurate models of the nonlinearities of the magnetic medium and use this high frequency information to reconstruct the original recording with better fidelity down in the audio band. For home use SACD is a very bad idea. Just about the only good thing I can see about it is that it can be marketed effectively because it's such a "radical new concept".

The DVD audio uses conventional, well proven PCM with somewhat higher sampling frequency and bit depth than CD. Why use a higher sampling frequency when we can't hear over 20kHz? It turns out that while we can't hear a sinewave at frequencies higher than 20kHz the high frequency components of complex waveforms make a noticeable difference even up to 26kHz. To take a good safety margin and maintain an integer ratio a 96kHz sampling rate was used. This does not significantly hurt the data rate required because non-lossy compression is used on DVD audio. A compressed 96kHz signal takes about 30% more space than a compressed 48kHz signal. 16 bits is, again, almost enough. In fact, with proper in-band noise shaping the noise floor is inaudible in all but very extreme circumstances. 24 bits is therefore a very good safety margin.

Another reason why DVD-audio is superior is because it supports Ambisonics. Ambisonics is a surround sound system. It was not created for cinematic effects. Ambisonics was designed for music and for reconstructing the subtle spatial cues of the ambience of the recording venue. With a proper arrangement of speakers it can create true 3D sound - including the height dimension. Imagine listening to a recording and feeling the height of the concert hall!

Please never ask "how many channels does Ambisonics use" because it's not a relevant question. Ambisonics deconstructs the 3D sound field mathematically using a four component representation (XYZW). This representation can be processed with a simple linear matrix for playback on different speaker configurations and numbers of channels with varying levels of fidelity of 3D soundfield reconstruction. This includes the popular 5.1 setup used on home theaters (it's probably going to be the default setting for DVD-Audio players).
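To make the "four components plus a linear matrix" idea concrete, here is a rough sketch. It assumes the traditional first-order B-format convention (W scaled by 1/sqrt(2), azimuth measured counterclockwise from straight ahead) and a very basic projection decoder; the function names and the speaker layouts are invented for the illustration, and real decoders use more carefully designed weights.

```python
import math

def encode_b_format(sample, azimuth, elevation):
    """Encode a mono sample arriving from (azimuth, elevation), both in radians."""
    w = sample / math.sqrt(2)                              # omnidirectional part
    x = sample * math.cos(azimuth) * math.cos(elevation)   # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)   # left-right
    z = sample * math.sin(elevation)                       # up-down
    return (w, x, y, z)

def decode_matrix(speakers):
    """One row of four weights per speaker direction (azimuth, elevation)."""
    return [
        (math.sqrt(2),
         math.cos(az) * math.cos(el),
         math.sin(az) * math.cos(el),
         math.sin(el))
        for az, el in speakers
    ]

def decode(b_format, matrix):
    """Apply the linear matrix: one output feed per speaker."""
    return [sum(c * m for c, m in zip(b_format, row)) / len(matrix)
            for row in matrix]

# The same four-component signal, decoded for two different speaker layouts.
square = [(math.radians(a), 0.0) for a in (45, 135, -135, -45)]
hexagon = [(math.radians(a), 0.0) for a in (0, 60, 120, 180, -120, -60)]

source = encode_b_format(1.0, math.radians(45), 0.0)
square_feeds = decode(source, decode_matrix(square))
hexagon_feeds = decode(source, decode_matrix(hexagon))
print([round(g, 3) for g in square_feeds])
print([round(g, 3) for g in hexagon_feeds])
```

The recorded signal never changes; only the decode matrix depends on how many speakers there are and where they stand.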

DVD-Audio is also backward compatible with DVD players although a DVD-audio player will be required to take advantage of all the features and full quality.

More information about DVD-Audio here

----

--
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
You've got it wrong by The+Mayor · 2000-10-14 14:28 · Score: 4

Even if I like the idea of ultra-fidelity, my faith in the Nyquist theorem is too strong to spend a grand and a half on a CD player anytime soon ...

You've got it all wrong. You see, humans have an approximate range of hearing between 20Hz and 20KHz (assuming no hearing loss). Now, Nyquist theory says that you must sample at twice the frequency of the highest frequency you wish to preserve. We use 44.1 KHz for this. Sounds good so far, right?

Well, what many neglect to mention about Nyquist theory is that you must run the resulting output through a filter. The filter, according to the theory, is a brick-wall filter. Of course, these things don't exist. Filters have a roll off. As a result, people invented the concept of oversampling. This way, you move the sampling frequency way above 44.1 KHz, and you can put a filter in at, say, 100 KHz. Nice, right? Wrong. Filters have audible effects *well* below their -6dB level.

That said, there's still another problem. People have an approximate dynamic range for their hearing of 120dB. Using 16 bit samples (like CDs), you end up with only 96dB of theoretical dynamic range. So people invented the concept of running a low level noise input when digitizing. This ends up pushing the dynamic range above 96dB. This is how most modern CD players can claim a dynamic range of about 102dB. Sounds good, right? Wrong. You're increasing the dynamic range, but you're also increasing the noise. This is not good.
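The 96dB figure for 16-bit samples follows from the ratio between the largest and smallest step a sample can represent. A quick sketch of that arithmetic, using the usual 20*log10(2^N) rule of thumb (roughly 6dB per bit) and ignoring dither and noise shaping:

```python
import math

def pcm_dynamic_range_db(bits):
    """Theoretical dynamic range of N-bit linear PCM, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 20, 24):
    print(f"{bits}-bit PCM: about {pcm_dynamic_range_db(bits):.1f} dB")
```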

Of course, for 1980, CDs were pushing the limits of technology. Now they're not. Now we have DVDs. With DVDs and compression, you can get 5 channels of sound digitized with 24-bit sampling at a sample rate of 96kHz. Now that kicks ass. Of course, all the different parties messed around with the standards committees long enough to pretty much kill DVD-Audio (it was finally released a few months ago, but there is way too little material released under the format).

So, if you want a cheap CD player that truly sounds good, I recommend you get a DVD player, and listen to music on DVDs or DVD-As. Even a cheap one can have a pretty poorly built filter and sound OK. Of course, cheaper D2A converters have their own problems, like jitter. But that's a story for another time.

Look, if you don't believe me about this quality issue, go to your local high-end (and I don't mean that they carry Denon and Yamaha...I'm talking about equipment like Wadia and Krell), and ask them to do some listening tests with a Krell compared to your $189 Technics (or whatever). If you don't hear a difference, then you either have hearing loss or you aren't used to paying attention to sound quality. Like many elements of perception (sight, hearing, etc), the more you work the sense, the more acute it becomes.

--
--Be human.
Nyquist theorem by darkwiz · 2000-10-14 11:08 · Score: 4

Nyquist's theorem states that the highest frequency that can be represented is one half the sampling rate. This is obvious because you must be able to detect at least a peak and a valley of the sound wave.

Nyquist's theorem does not imply, however, that the representation of the maximum [or near maximum] frequencies will be highly accurate as far as the shape of the wave form is concerned. At and around 1/2 sampling frequency, the wave forms become basically nothing but square waves [alternating between a single high, and a single low point]. In order to deal with this, some sound decoders will attempt to interpolate the waves, but they cannot reproduce the original sound accurately. This is why higher sampling frequencies ARE relevant to higher audio fidelity. Higher bit resolutions are arguable though...
1. Re:Nyquist theorem by nathanh · 2000-10-14 12:16 · Score: 5
  
  Nyquist's theorem states that the highest frequency that can be represented is one half the sampling rate. This is obvious because you must be able to detect at least a peak and a valley of the sound wave.
  
  Entirely correct.
  Nyquist's theorem does not imply, however, that the representation of the maximum [or near maximum] frequencies will be highly accurate as far as the shape of the wave form is concerned. At and around 1/2 sampling frequency, the wave forms become basically nothing but square waves [alternating between a single high, and a single low point]. In order to deal with this, some sound decoders will attempt to interpolate the waves, but they cannot reproduce the original sound accurately. This is why higher sampling frequencies ARE relevant to higher audio fidelity. Higher bit resolutions are arguable though..
  
  You fail! This idea that the signal is not perfectly represented just because you have only two sample points is complete nonsense. Only two sample points are needed because you know the encoded signal must have been low-pass filtered at half the sampling rate before sampling (otherwise you would have introduced aliasing errors). Given this information you can entirely reproduce the original signal as it was before sampling. Nyquist's theorem states that you can exactly reproduce the signal if sampled at twice the signal's maximum frequency. I quote Oppenheim and Willsky:
  The sampling theorem establishes the fact that a bandlimited signal is uniquely represented by its samples.
  
  In layman's terms: you don't need more bits to reproduce the original signal. You just need a perfect low-pass filter on your output and infinite precision on your PCM samples. A sine wave with sampling points at the exact peaks and troughs will produce a square wave of the same frequency after sampling/modulation. This square wave will contain the frequency you want plus odd harmonics. The harmonics are naturally going to be higher frequencies and so they will be removed by an appropriately picked low-pass filter. And what's the appropriate cut-off frequency for your low-pass filter? 1/2 the sampling rate, of course. The result is the original sine wave.
  Now in practise they actually do sample at higher than the low-pass cut-off frequency, but this is because of other limitations. The PCM samples are only 16-bit, not infinite precision. Also there is no such thing as an ideal low-pass filter: realistic (and affordable) filters will take several kHz to drop from 0dB to -9dB. Also you need exactly π/2 phase difference between your sampling pulse train and the source signal. There are also aliasing issues but at this point the discussion gets heavily into mathematics.
  Higher resolution is what is actually needed but this is expensive to achieve. Increasing the sampling rate is far more practical (considering how fast CPUs are) and a heck of a lot cheaper. This is the real reason DVD audio samples at 96kHz. It's not because you can hear 48kHz tones but because it lets the DVD manufacturers use cheap DACs and cheap low-pass filters without sacrificing fidelity.
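The claim that a bandlimited signal is fully determined by its samples can be checked numerically. This is only a sketch under stated assumptions: the test tone (14.7kHz, comfortably below half of 44.1kHz) is picked for the illustration, and the ideal low-pass filter is approximated by a finite sum of sinc terms, so the result is close to the original rather than exact.

```python
import math

FS = 44100.0   # CD sampling rate in Hz
F0 = 14700.0   # test tone in Hz, chosen below FS / 2

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, t):
    """Whittaker-Shannon interpolation at time t, measured in sample periods."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# Only three samples per cycle of the tone.
samples = [math.sin(2 * math.pi * F0 * n / FS) for n in range(4001)]

# Evaluate the waveform *between* sample points, near the middle of the run.
worst = 0.0
for t in (2000.25, 2000.5, 2000.75):
    exact = math.sin(2 * math.pi * F0 * t / FS)
    worst = max(worst, abs(reconstruct(samples, t) - exact))
print(f"worst error between sample points: {worst:.5f}")
```

The small residual error comes from cutting the sum off after a few thousand terms, which is the numerical counterpart of not having an ideal low-pass filter.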
Please! Most people think (cough) MP3s sound fine. by hatless · 2000-10-14 20:43 · Score: 5

So here's a digital format that should please nearly all the classical music aficionados out there who spend tens of thousands of dollars constructing acoustically-perfect "listening rooms". Nothing bad about that. At the very least, it finally creates a reasonably lossless way to digitize analog material for archival and preservation purposes--although any archivist will tell you that the real archives themselves for long-term preservation should be old-fashioned stamped analog discs.

These two markets--archivists and money-is-no-object audiophiles--should be covered with about 20,000 of these devices. So what about the rest of us? I have serious doubts that the difference between this and DVD-Audio can be heard on even a $3,000 home theater system.

Sony (and presumably Philips/Magnavox) intend to build support for this into all of their players starting sometime soon, maybe a year from now. The thing is, nearly all the DVD players being sold today can play the competing DVD-Audio discs. None, not even Sony's, and not any of those millions of Playstation2s shipping in the next year, can play SACDs.

Ultimately, this is about patent royalties. Sony and Philips have been collecting royalties on every CD player and CD drive sold for over a decade now, and SACD is about trying to do it again for another decade. DVD-A is the format endorsed by everyone in the industry except Sony and Philips. Is it a good professional archival format? Nah. Is it both better and more flexible than CD? Yep.

So here's the ugly truth. The MP3 revolution seems to have proven that most people have tin ears. Ask a hundred people. 98 of them will tell you that 128Kbps MP3 is "CD quality". Fact is, it's inferior to Minidisc, to FM radio and--in many respects--analog cassettes. But it doesn't have hisses and pops, and that's all most folks really notice. Heck, 320Kbps MP3 sounds crappy next to a CD, even on a $400 stereo.

If people think MP3 is "good enough"--when it can't even hold a candle to CD--why is the mass market going to embrace SACD over DVD-A? Especially when they'll have DVD-A players available from dozens of manufacturers and SACD players most likely available from... three?

CD will be superseded, not because most people want higher-resolution sound quality they can't hear on Britney Spears remixes, but (1) because DVD-A and SACD players will offer things like 6-channel sound and bundled-in DVD video clips, and (2) because the record industry will stop making CDs, just like they stopped making LPs, in order to force everyone to buy the new players and buy yet another copy of Billy Joel's Greatest Hits to go with the LP, cassette and CD they already have.

The best format won't win. The more ubiquitous one will. The question remains which coalition will blink first. Will the Sony-Philips side break down and allow their record companies to start making DVD-As once they see SACD players aren't selling well, or will companies like Matsushita start paying royalties and buying chips from Sony because the Sony/Philips DVD-A embargo has made it impossible to get record stores to carry DVD-As?
Re:Sorry. by n+xnezn+juber · 2000-10-14 17:06 · Score: 4

Ok, Chris... I replied to one of your posts and I thought you at least had a clue of what you were talking about. And then I see this post. I'm sorry but you have no understanding of digital signal processing and Fourier series. I mean this whole-heartedly that you are missing the fundamental mathematical concepts to understand why a 14.7kHz sine wave can be perfectly reproduced with 44.1kHz sampling and the appropriate filter.
For your education I pulled a couple links from the web:
Site A has two pictures of a sine wave being sampled. This web page is totally wrong. They do not understand aliasing... that picture they are showing with the straight lines shows the reason that you need to have a low-pass filter. With the appropriate low-pass filter, their sine wave in the above picture will be reproduced exactly.
Site B shows the frequency domain. You've probably seen a similar plot of the horizontal axis being frequency and vertical axis being magnitude. Don't worry about the math if you don't understand it. Just look at the pictures. The top picture shows the sampling period being less than half the period of the highest frequency in the original signal (the bell shaped thing centered at frequency 0). This is like your 14.7kHz sine wave sampled at 44.1kHz. The second is when the period of sampling T equals half the period of the highest frequency (see how the edges of that waveform exactly touch each other?). The bottom picture shows what happens when the sampling period is greater than half the period of the highest frequency. That portion of the bell shaped thing that overlaps one another is sampling noise. In other words everything that is overlapping is lost.
Site C is another site that does not understand Nyquist's theorem. They are completely thinking in terms of the time domain instead of the frequency domain. Not to mention that they don't realize you always have to low-pass filter a sampled signal.
Site D actually is correct and should be understandable to even the least mathematically inclined.
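The overlap in the frequency-domain pictures can be expressed as simple folding arithmetic. A small sketch, assuming a pure tone and ideal sampling (the function name is made up for the illustration): it returns the frequency at which a tone shows up once sampled.

```python
def alias_frequency(f, fs):
    """Frequency (Hz) at which a tone of f Hz appears after sampling at fs Hz."""
    f = f % fs                    # the sampled spectrum repeats every fs
    return fs - f if f > fs / 2 else f

print(alias_frequency(14700, 44100))   # below fs/2: unchanged
print(alias_frequency(30000, 44100))   # above fs/2: folds back into the band
```

A 14.7kHz tone sampled at 44.1kHz stays where it is, while anything above 22.05kHz lands on top of a lower frequency and can no longer be told apart from it, which is why the input is low-pass filtered before sampling.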
22050Hz by jbf · 2000-10-14 11:48 · Score: 5

More frequency range isn't going to be recorded, played, or heard by anyone.

First of all, things above 22kHz aren't picked up by ordinary mics... Even the ultra-high-end Neumann U87Ai only claims 20-20kHz frequency response (http://www.neumann.com/mics/u87ai.htm)

Secondly, most speakers won't crank out those high frequencies without a severe falloff in response: the high-end Genelec 1038A triamped monitor gets you 33-20k Hz (-3dB). (http://www.genelec.com/products/1038a/1038a.htm)

Finally, most people can't hear above 20kHz, especially those people who are incessantly blasting their ears out with loud music.

The best reason for Super CD (or DVD or whatever) is higher bit depth, NOT higher sampling rate; going from 16/44.1 (CD quality) to 24/44.1 takes just 50% more space, for nontrivially better quality, while going from 16/44.1 to 16/88.2 brings minimal benefit at a 100% space penalty.
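The space figures above are just products of bit depth and sample rate. A quick check for uncompressed stereo PCM (the function name is only for illustration):

```python
def pcm_bits_per_second(bit_depth, sample_rate_hz, channels=2):
    """Raw data rate of uncompressed linear PCM."""
    return bit_depth * sample_rate_hz * channels

cd = pcm_bits_per_second(16, 44100)
for label, depth, rate in (("24/44.1", 24, 44100), ("16/88.2", 16, 88200)):
    extra = pcm_bits_per_second(depth, rate) / cd - 1
    print(f"{label}: {extra:.0%} more space than 16/44.1")
```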
1. Re: 22050Hz by bbrantley · 2000-10-14 13:06 · Score: 5
  
  First of all, things above 22kHz aren't picked up by ordinary mics... Even the ultra-high-end Neumann U87Ai only claims 20-20kHz frequency response (http://www.neumann.com/mics/u87ai.htm)
  Far from true. The mikes used in this paper, "There's life above 20 KHz!", certainly were capable of this.
  Secondly, most speakers won't crank out those high frequencies without a severe falloff in response: the high-end Genelec 1038A triamped monitor gets you 33-20k Hz (-3dB). (http://www.genelec.com/products/1038a/1038a.htm)
  Also not true. Unless there is a low-pass filter to prevent sending higher-frequency signal to the tweeters, most amplifiers, speaker wire, and drivers will gladly play sounds upwards of 100KHz. Whether it is necessarily FLAT is another story, as most people don't optimize (or even measure) flatness above 20k.
  The best reason for Super CD (or DVD or whatever) is higher bit depth, NOT higher sampling rate; going from 16/44.1 (CD quality) to 24/44.1 takes just 50% more space, for nontrivially better quality, while going from 16/44.1 to 16/88.2 brings minimal benefit at a 100% space penalty.
  This is probably true, except that "minimal" may be too harsh a term. Have YOU ever done a careful comparison between a 16/44.1 recording and a 16/88.2 recording? (I have!) On a somewhat-related note, it is remarkably interesting what effect a more accurate clock signal has on the quality of a 44.1KHz recording. The human ear can distinguish playback when the timing of these samples being played back varies by as little as 10^-10 seconds!
  The reality is that the human ear's ability to differentiate is remarkably more subtle and complex than the market (and marketeers) would have you believe.
Upgrade media to new format? by knarf · 2000-10-14 12:06 · Score: 5

Well, another day, another media format. Of course, the media companies will happily sell me their products. But I already have Radiohead's 'OK Computer' on CD, so I already paid the license fees. I want to 'upgrade' that CD to the format-du-jour, and am willing to pay the production costs and a little something to make it worthwhile for the industry to keep on developing new products. I do NOT want to pay royalties again, since I already did. And since I have always been told that those compact discs are so expensive because of the license fees, this upgrade should be quite cheap, am I right? I mean, I only OWN the piece of plastic, which is cheap. It is the license fee which drives up the price (or so 'they' say). So, just let me upgrade my piece of plastic then...

No, unfortunately I am wrong. But I should be right...

--
--frank[at]unternet.org
Nyquist really has little to do with it by mcg1969 · 2000-10-14 13:16 · Score: 4

The SACD has a sampling rate of 2.82 MHz. This means that theoretically you could accurately (AD/DA converters and such aside) reproduce the frequencies up to 1.41 MHz (1410000 Hz).
Sorry, SACD is a lot more complex than a simple application of Nyquist can handle. The key to SACD's high fidelity is all in the quantization theory.
Yes, an SACD has a sample rate of 2.82MHz, but that's with one bit per sample (per channel). Yep, that's right---a single bit per sample. In fact, the signal-to-noise ratio on a SACD is very likely negative--there is more noise than signal.
Now before you blow your top with how absurd that sounds, let's clarify one thing: the SACD format jumps through serious technical hoops to ensure that the vast majority of that noise is in the completely inaudible range. And, the vast majority of the signal is, of course, within the audible range. The technique is, not surprisingly, called "noise shaping".
So once you limit your measurements to, say, 0-20kHz, you're back to where you would hope: the astronomical dynamic range and signal-to-noise ratio of a high-fidelity audio format. (In fact, SACD is designed to provide ultra-low noise and 120dB of dynamic range all the way out to 100kHz, from what I understand.)
For those of you who remember, or perhaps own, CD players with "1-bit D/A"s, you're using a similar version of this technology. The difference is that the SACD recording process can decide at the mastering stage how to get down to 1 bit per sample, and that's a much better place to make that decision.
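For a sense of scale, the numbers quoted above can be worked out directly. This sketch assumes the 2.82MHz figure is exactly 64 times the CD rate of 44.1kHz, which is how the DSD sample rate is normally specified; the variable names are invented for the illustration.

```python
CD_RATE_HZ = 44100
DSD_RATE_HZ = 64 * CD_RATE_HZ            # 2,822,400 Hz, the "2.82 MHz" figure

nyquist_hz = DSD_RATE_HZ / 2             # the naive limit quoted at the top
dsd_bits_per_second = DSD_RATE_HZ * 1    # one bit per sample, per channel
cd_bits_per_second = CD_RATE_HZ * 16     # sixteen bits per sample, per channel

print(f"DSD sample rate:  {DSD_RATE_HZ / 1e6:.4f} MHz")
print(f"Naive Nyquist:    {nyquist_hz / 1e6:.4f} MHz")
print(f"Raw data vs CD:   {dsd_bits_per_second / cd_bits_per_second:.0f}x per channel")
```

The raw stream carries four times as many bits per channel as a CD, and how much of that ends up as usable dynamic range in the audio band depends on the noise shaping, not on the Nyquist limit alone.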
General Niftiness :) by Chris+Johnson · 2000-10-14 13:34 · Score: 5
I _hope_ this becomes common technology- there are some extraordinarily important things about it. A little background:
- 16 bits isn't enough. That's _really_ obvious at this point- no professional works in 16 bits except for the final CD output. Mix busses have to be many times that in the digital domain, but even if you mix with ideal noiseless coloration-less electronics there's a really big difference between monitoring an undigitised feed of the signal and monitoring the 16 bit output.
- 44.1K isn't enough either. This is not primarily due to people being able to hear beyond 20K (though you can sense such sounds to some extent- why do you think smashing glass or dropped plates make you jump? Viciously loud supersonic transients), it's due to the brick-wall filters required. High end amplifier designers go to great lengths to get their pass-bands up into the megahertz (and nobody claims humans hear that!) because cutting off lower causes interactions across the entire frequency band. Cutting off at 22K is just ridiculous.
Now, how does the Sony approach compare? The neat thing about the bit rate is that it's effectively infinite bit rate- it's not a finite set of voltage levels but just one bit very fast tracing a voltage level that could be anywhere. This is substantially beyond even 24 bit- a major, major advance. That's gonna be very noticeable.
As for frequency, there is a surprise in store here. It may or may not be competitive with advanced PCM encoding at say 96K- but two very, very important points:
- There doesn't need to be _any_ brickwall filter on the output- provided a circuit can be made to output this stuff that doesn't merely calculate it as a super-PCM-encoding and D/A converter. If the format can feed a sort of very high frequency analog synthesiser, no filter is needed- which is critical, because...
- ...the potential slew rate of this technology is just astronomical. I hope the power supply of the players is up to it- if not there will be some very effective power supply mods waiting to be done, such as backing up the power supply with MIT Multicaps (a film cap that can produce very very high instantaneous voltage). Basically, if you fed this technology a big square wave, it might not be able to turn the corners of the wave instantly, but the vertical parts of the wave would be _vertical_- no brick-wall-filtered system can get anywhere close to this.
We're talking absurdly high transient peak voltages here: this is why high end audiophiles use absurdly heavy cables and absurdly powerful amplifiers, to let those peaks through. It doesn't hurt the speakers: this isn't RMS or even 'peak' wattage, the spikes are of such short duration that you can feed speakers many times the maximum 'peak' voltage if it's only for a microsecond, and high end systems do just that.
Where do you find such peaks? Easy- The Who ;) seriously, The Who is a _good_ example, but symphony orchestras are also good for this. The capacity for this type of extreme and essentially 'inaudible' (too brief!) transient translates to the ability to produce the _sensation_ of loudness- for instance, you could easily make many systems play 'Live At Leeds' and sound loud and bright and kind of grating and ear-splitting, but with this technology it would be less grating but more _electrifying_ and the impact would be like having the living people right there playing at you, not just a bunch of very loud sounds. Alternately, you could play big orchestra crescendos and the resulting sound would be _huge_, not just loud but as big as a live performance.
It's really not hard to make stuff sound 'loud', but making it _feel_ loud is something else. If you don't have that, the loudness ends up being just a grating, thin surface, which is actually a very good description of the sound of most pop recordings these days :) the irony is that this technology is coming around just when the recording industry's pushing sounds that are substantially worse than even CD audio can produce...
Bottom line: I want one. Specifically, I want this to _master_ to. I have quite a bit of stuff that loses about 2/3 of its potential when made into 44/16 (eight tracks of 48/20 output analog and mixed with passive resistance mixing will tend to do that- I once figured the rough equivalent resolution was about a 64 bit mix bus, possibly higher) Maybe I should try to wheedle Sony out of a recorder ;)