I'm not claiming that RDRAM is the way to go for today's PCs and PC applications. Who cares how many milliseconds it takes to cut and paste in Word? I'm saying that today's high-end servers are, and tomorrow's high-end workstations (and yes, eventually, your PC) will be, severely bandwidth limited.
A processor with an integrated RDRAM controller will be able to have low access latency and extremely high bandwidth (keep in mind that you haven't seen a processor with an integrated RDRAM controller, yet, so the current benchmarks should be taken with a grain of salt). The next generation of processors will have to use RDRAM to compete, because there isn't currently a good high-bandwidth alternative that scales.
Would I want a PC with Rambus? No, not today and probably not for a few years. But servers and workstations are a different animal.
The Slashdot crowd refuses to acknowledge that Rambus is actually a solid and viable memory technology (that's just a couple years ahead of its time). We'd rather just rant and rant about how DDR is great and Intel sucks and Rambus is a fascist organization, yadda yadda yadda. Pack up your reality and go home.
While this article isn't perfect, it does give a good overview of the competing memory technologies, and doesn't cave to the predominant "Rambus sucks" slashdot mentality.
As I've said many times in the past, in the future, memory will look more and more like Rambus and less like DDR SDRAM.
It's that you need four of them to match the bandwidth you can get with RDRAM from a 64-bit bus. How many processors do you know of with 4 SDRAM buses?
BTW: What 700MHz bus are you talking about? As far as I know, the spec for RDRAM is 2x (yes, DDR) on a 400MHz bus.
Actually, RDRAM RIMMs are pretty simple; the additional circuitry is in the controllers--which are, admittedly, more complex than what's necessary to make DDR-SDRAM work, but not that big a deal when compared to the complexity of modern processors. It's expensive because it is new. Plus, RDRAM is for higher-end systems, not your grandma's PC, at least not yet.
I'm one of those computer researchers, and I think that memory latency is not that big a deal. For single-thread performance, sure, a TLB miss can idle your processor. But when you are a server, you just swap contexts and continue working--with no problem at all. You are right that latency is important, but it's really more about latency to the L1 & L2 caches, not memory latency. Let's play some games with scale:
L1 cache hits 95% of the time at 1ns a pop.
When L1 misses, L2 hits 90% of the time at, say, 15ns a pop.
When L2 misses, memory satisfies requests at between 40 & 100ns (SDRAM/RDRAM roughly)
So, your average memory latency time for an SDRAM system is:
(95% * 1ns) + (5% * (90% * 15ns + 10% * 40ns)) = 1.825ns
And for an RDRAM system, it is:
(95% * 1ns) + (5% * (90% * 15ns + 10% * 100ns)) = 2.125ns
Now, your SDRAM, or DDR-SDRAM, system might have less than half the latency of RDRAM (40ns vs. 100ns), and believe me, that is a conservative ratio, as SDRAM isn't that fast and RDRAM isn't that slow. But the average memory latency of the RDRAM system is only about 16% worse (2.125ns vs. 1.825ns). This is because the average memory latency is largely controlled by the cache latencies, since caches have such high hit rates.
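For anyone who wants to play with the numbers, here is a minimal Python sketch of the weighted-average calculation above. The function name and parameter names are my own (hypothetical, not from any library), and the hit rates and latencies are just the round figures used in this post, not measurements.

```python
def average_latency_ns(l1_hit, l1_ns, l2_hit, l2_ns, mem_ns):
    """Average access time using the same weighting as the post:
    each level's latency times the chance the access is served there."""
    l1_miss = 1.0 - l1_hit
    l2_miss = 1.0 - l2_hit
    return l1_hit * l1_ns + l1_miss * (l2_hit * l2_ns + l2_miss * mem_ns)


# Illustrative figures from the post: 95% L1 hits at 1ns, 90% L2 hits at 15ns.
sdram = average_latency_ns(0.95, 1.0, 0.90, 15.0, 40.0)    # 40ns main memory
rdram = average_latency_ns(0.95, 1.0, 0.90, 15.0, 100.0)   # 100ns main memory

# Relative penalty of the slower memory on the overall average.
slowdown = rdram / sdram - 1.0

print(f"SDRAM: {sdram:.3f} ns, RDRAM: {rdram:.3f} ns, penalty: {slowdown:.1%}")
```

Try dropping the L2 hit rate to 50% and the gap widens considerably, which is the other side of the same argument: the conclusion depends on the caches doing most of the work.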
Now factor in that RDRAM can supply three or four times as much bandwidth as SDRAM or DDR-SDRAM for an equivalent pin-count, and you have a much, much faster computer with RDRAM.
The problem is that Intel did a high-latency and low-bandwidth implementation of Rambus, and therefore you haven't seen it perform yet. Their next version should be better. Otherwise, just wait for EV7.
that what Intel did was add a single Rambus channel to a chipset and hack that into an existing processor. No wonder it doesn't work well! If you put the controller on-chip (as DRDRAM intends), then you can have relatively low latency to memory, and you can use multiple channels to get extremely high bandwidth.
problems as Rambus, then (tight tolerances, etch resistance, etc.). If the fast bus isn't exactly twice as fast as the two slower ones, then you have to do some kind of multiplexing and bus arbitration outside the CPU--which creates all kinds of livelock and starvation problems, probably.
when it is done right. The peak theoretical bandwidth of that system is reported to be 12.8GB/s--which is almost 6 times better than the second place contender (also an Alpha, incidentally, 21264). And the latency of a dedicated DRDRAM implementation (using on-chip controllers) is much lower than a chip-set implementation.
I've said it before and I'll say it again. Intel's initial crack at RDRAM is a poor implementation and shouldn't be used to judge Rambus. In a year or so you will see some legitimate hardware wrapped around RDRAM, and then you will see the truth.
DDR SDRAM requires 4x as many pins as RDRAM. If you use 4 RDRAM channels (for the same pin count as DDR SDRAM), all of a sudden you are talking 6400MB per second, or thereabouts. And guys, the latency is not that bad.
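A quick sanity check on that 6400MB/s figure, as a rough Python sketch. It assumes each RDRAM channel has a 16-bit (2-byte) data path, which isn't stated in this post, running on the 400MHz clock with two transfers per clock mentioned earlier. The function and its parameters are hypothetical names for illustration only, and the result is a peak theoretical number, not sustained throughput.

```python
def peak_bandwidth_mb_s(bus_bits, clock_mhz, transfers_per_clock, channels=1):
    """Peak theoretical bandwidth in MB/s: bytes per transfer times
    transfers per second (in millions), summed over independent channels."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock * channels


# Assumption: a 16-bit RDRAM channel, 400MHz clock, data on both clock edges.
one_channel = peak_bandwidth_mb_s(16, 400, 2)
four_channels = peak_bandwidth_mb_s(16, 400, 2, channels=4)

print(f"1 channel: {one_channel:.0f} MB/s, 4 channels: {four_channels:.0f} MB/s")
```

Under those assumptions a single channel comes to 1600MB/s, so four of them land on the 6400MB/s quoted above.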
DDR SDRAM is fundamentally bandwidth limited when compared to a DRDRAM implementation in the same technology. Servers need high memory bandwidth, and latency is not really an issue (BTW: a good RDRAM implementation has decent latency). You will see this in the next couple years.
but you're right, in the PC world it doesn't seem to be a good fit. Plus, Intel has done an extremely poor implementation of RDRAM for their initial offerings, which has handicapped it in the benchmarks.
You can't just interleave memory to solve bandwidth problems...every type of memory does that. The only way to get higher bandwidth with SDRAM is to A) increase the speed, or B) use multiple memory buses going into the CPU (very costly pin-wise).
SDRAM has some nice properties, and I'm sure it's appropriate for the PC world, but within a year or two it will be severely bandwidth limited for high-end systems.
The 21264 uses SDRAM, while the 21364 will use Direct RDRAM. Latency is pretty good on both.
...in research. Nothing earth-shattering.
Yeah, but it is hardly $1000 vs. $150, as the original poster said.
...and it is not really an issue.