From Rambus to DDR: Memory Explained
rosewood sent us linkage to an article that explains memory and more. It's a fairly detailed story about RAM in general that also explains Rambus and DDR (including 1.5 and 2). Well written and worth the read. It even features lots of diagrams (although some of the tables seem to have been designed by someone who is color blind, using white text on very bright backgrounds. Why do people do that?) Anyway, highly recommended.
However, another thing that may not be obvious: the 133MHz DRAMs used in today's PCs are top-of-the-line, whereas back in 1989 the fastest DRAMs were used only in high-end servers because of the price premium.
(Some background on why Rambus is good or bad in general.) I've done designs with many of these technologies over the years (traditional async RAS/CAS, SDRAM, Rambus; not DDR). The older Rambus designs were certainly harder to design with, since they used more of a network-protocol paradigm, but not by much. The main thing about Rambus is that at some level it trades latency for bandwidth, and there are some places where that is actually a good thing: display controllers, for example.
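To make the latency-for-bandwidth trade concrete, here is a minimal sketch that models one transfer as a fixed access latency plus streaming time at the peak rate. The two parts and their numbers (LOW_LATENCY, HIGH_BANDWIDTH) are hypothetical, chosen only to illustrate the shape of the trade, not taken from any real Rambus or SDRAM datasheet.

```python
def transfer_time_ns(size_bytes, latency_ns, peak_mb_per_s):
    """Fixed access latency plus the time to stream size_bytes at the peak rate."""
    # 1 MB/s moves 1e-3 bytes per ns, so bytes * 1000 / (MB/s) gives nanoseconds
    return latency_ns + size_bytes * 1000.0 / peak_mb_per_s


def effective_mb_per_s(size_bytes, latency_ns, peak_mb_per_s):
    """Average bandwidth actually delivered for a single transfer of size_bytes."""
    # bytes per ns times 1000 converts to MB/s
    return size_bytes * 1000.0 / transfer_time_ns(size_bytes, latency_ns, peak_mb_per_s)


# Hypothetical parts: (access latency in ns, peak bandwidth in MB/s)
LOW_LATENCY = (40, 800)
HIGH_BANDWIDTH = (70, 1600)

for label, size in [("32-byte cache line", 32), ("4 KB display burst", 4096)]:
    for name, (lat, bw) in [("low-latency", LOW_LATENCY), ("high-bandwidth", HIGH_BANDWIDTH)]:
        print(f"{label:20s} {name:15s} "
              f"{transfer_time_ns(size, lat, bw):7.1f} ns "
              f"{effective_mb_per_s(size, lat, bw):7.1f} MB/s")
```

Under these assumed numbers the low-latency part finishes a small cache-line fill sooner, while the high-bandwidth part wins easily on a long display burst, because the extra latency is paid once and then amortized over thousands of bytes.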
Rambus is also a win where lots of concurrent transactions are available: the finer-grained banking allows parallel row senses, which reduces average latency, and even allows speculative row senses for CPUs executing speculative instructions. I believe this is the main reason Intel went for Rambus. They are building CPUs that are highly parallel at the low level and can issue many overlapping memory requests at once. But they screwed up. This would have been great if they were hooking the Rambus channels directly to their CPUs, but instead the requests go over the Slot 1 bus, which forces complete serialization and loses any possible advantage. AMD's Slot A would have been a better choice, but even these buses do a very basic serialization that makes it difficult to get much concurrency at the RAM channel level (which is why, IMHO, Rambus on Intel hardware sucks).
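The argument about serialization can be illustrated with a toy timing model: requests spread round-robin over several banks, each needing a row sense before its data crosses a single shared channel. With overlap allowed, row senses in different banks proceed in parallel; with a serializing front-side bus, each request waits for the previous one to finish. All parameters here (row_latency, transfer_time, n_banks) are hypothetical, and the model ignores refresh, bus turnaround, and real Rambus packet timing.

```python
def total_time(n_requests, row_latency, transfer_time, n_banks, overlapped):
    """Completion time of n_requests spread round-robin over n_banks."""
    bank_free = [0] * n_banks  # when each bank may start its next row sense
    channel_free = 0           # when the shared data channel is next idle
    last_done = 0              # completion time of the most recent request
    for i in range(n_requests):
        bank = i % n_banks
        start = bank_free[bank]
        if not overlapped:
            # Serializing bus: a new request cannot issue until the last one completes
            start = max(start, last_done)
        data_ready = start + row_latency           # row sense + column access
        xfer_start = max(data_ready, channel_free)  # wait for the shared channel
        channel_free = xfer_start + transfer_time
        bank_free[bank] = channel_free              # bank busy until its data is out
        last_done = channel_free
    return last_done


# Hypothetical numbers: 8 requests, 40-unit row latency, 10-unit transfers, 4 banks
print("serialized:", total_time(8, 40, 10, 4, overlapped=False))
print("overlapped:", total_time(8, 40, 10, 4, overlapped=True))
```

In the overlapped case the row latency of later requests hides behind earlier transfers, so the channel stays busy; in the serialized case every request pays the full row latency in turn, which is the loss the post attributes to the Slot 1 bus.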
Have a look at Visicheck and see what your site looks like to people with (among other things) a red/green colour deficit.
"don't fall into the fallacy of believing that Perl can solve social problems. Maybe Perl 6 can, but that's a ways off"
The chart on the first page of the article says that the memory bus got only 4X faster from 1989 to 2000. I have to disagree. The article says that the FP (fast page mode) SIMMs on 486s ran at 16 MHz. Those SIMMs were either 8-bit SIMMs run in banks of 4 or 32-bit SIMMs, so either way the bus was 32 bits wide. Today's DIMMs move 64 bits at 133 MHz. That makes them about 16 times faster, or 32 times if you count DDR, which is approximately the same as the increase in processor speed.
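The arithmetic behind that comparison is just peak bandwidth = bus width × clock × transfers per clock. This sketch uses the post's own figures (a 32-bit bus at 16 MHz versus a 64-bit bus at 133 MHz, with DDR doubling the transfers per clock); it computes theoretical peaks only and ignores latency and efficiency.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s: bytes per transfer times transfers per microsecond."""
    return bus_width_bits / 8 * clock_mhz * transfers_per_clock


fp_486 = peak_bandwidth_mb_s(32, 16)       # four 8-bit SIMMs or one 32-bit SIMM
sdr_dimm = peak_bandwidth_mb_s(64, 133)    # 64-bit DIMM at 133 MHz
ddr_dimm = peak_bandwidth_mb_s(64, 133, 2) # DDR: two transfers per clock

print(sdr_dimm / fp_486)  # -> 16.625
print(ddr_dimm / fp_486)  # -> 33.25
```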
The article's main point, that RAM latencies have not kept up, is still valid, although even latencies have improved 8X. Remember, another reason we don't have higher-bandwidth memory is that it is hard to build motherboards and CPU interfaces that can handle higher clock frequencies.
I'm wondering if we could improve bandwidth and latency by going back to banked memory, perhaps interleaved.
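For readers unfamiliar with interleaving, here is a minimal sketch of the two common ways to map addresses to banks. Low-order interleaving spreads consecutive cache lines across banks, so sequential accesses can overlap; high-order (block) banking keeps large contiguous regions in one bank. The line size, bank size, and bank count are hypothetical defaults, not taken from any particular chipset.

```python
def bank_low_order(address, line_bytes=32, n_banks=4):
    """Interleaved banking: consecutive cache lines rotate through the banks."""
    return (address // line_bytes) % n_banks


def bank_high_order(address, bank_bytes=1 << 20, n_banks=4):
    """Block banking: each bank owns one large contiguous address range."""
    return (address // bank_bytes) % n_banks


sequential = range(0, 256, 32)  # eight consecutive 32-byte lines
print([bank_low_order(a) for a in sequential])
print([bank_high_order(a) for a in sequential])
```

With interleaving, a sequential walk touches every bank in turn, which is what lets row senses in different banks overlap; with block banking the same walk hammers a single bank.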
Software sucks. Open Source sucks less.
This and the other articles didn't really get into some of the fundamental [S][DDR-S][R][ES]DRAM limits on latency, or why this is just plain a losing battle.
At the cell level, DRAMs work by charge transfer. (That part was covered, IIRC) To write, you push some charge into the cell. To read, you share that charge and let it disturb voltages in your sensing system, and then evaluate it to a 0 or 1. If that sounds fuzzy, it's because it really is.
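The "share that charge" step follows from charge conservation: when the access transistor connects a cell of capacitance C_cell to a bitline precharged to V_pre, the bitline moves by (V_cell − V_pre) · C_cell / (C_cell + C_bitline). The capacitances and voltages below are hypothetical, order-of-magnitude values picked for illustration, and the `sense` function is a deliberately idealized stand-in for a real sense amplifier.

```python
def bitline_swing(v_cell, v_precharge, c_cell_f, c_bitline_f):
    """Bitline voltage change after the cell's charge is shared with the bitline."""
    return (v_cell - v_precharge) * c_cell_f / (c_cell_f + c_bitline_f)


def sense(swing_v):
    """Idealized sense amp: any positive swing reads as 1, otherwise 0."""
    return 1 if swing_v > 0 else 0


# Hypothetical values: 25 fF cell, 250 fF bitline, 2.5 V supply, precharge at Vdd/2
VDD = 2.5
C_CELL, C_BITLINE = 25e-15, 250e-15

for stored in (VDD, 0.0):
    dv = bitline_swing(stored, VDD / 2, C_CELL, C_BITLINE)
    print(f"stored {stored:.1f} V -> bitline moves {dv * 1000:+.1f} mV -> reads {sense(dv)}")
```

Because the bitline is roughly ten times the cell capacitance in this example, a full-swing cell produces only about a hundred millivolts on the bitline, which is why the sensing really is as fuzzy as described.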
Anyway, there is a transistor used as a switch to get the charge in and out of the cell. It has to be a pretty darned good switch, especially in the 'off' position. We're nearly at the point where we can count the electrons that make the difference between a 0 and a 1: somewhere around 40,000. ANY leakage at all in that transistor HURTS.
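A back-of-the-envelope way to see why leakage hurts: with roughly 40,000 electrons of signal (the post's figure), the time before a cell becomes unreadable is about the tolerable charge loss divided by the leakage current. This sketch assumes a constant leakage current and a hypothetical rule that the cell is still readable until half its charge is gone; real leakage depends on voltage and temperature, so treat the results as orders of magnitude only.

```python
ELECTRON_CHARGE_C = 1.602176634e-19  # coulombs per electron


def retention_time_s(stored_electrons, leakage_current_a, tolerable_loss=0.5):
    """Seconds until tolerable_loss of the stored charge leaks away at a constant current."""
    # Hypothetical readability margin: the cell is lost once this fraction has leaked
    charge_budget_c = stored_electrons * tolerable_loss * ELECTRON_CHARGE_C
    return charge_budget_c / leakage_current_a


for leak_a in (1e-15, 1e-14, 1e-13, 1e-12):  # 1 fA up to 1 pA
    print(f"{leak_a:.0e} A leakage -> {retention_time_s(40_000, leak_a) * 1000:10.2f} ms")
```

Each factor of ten in leakage costs a factor of ten in retention time. At the high end of this range the cell would lose its data within a few milliseconds, shorter than the 64 ms refresh interval commonly used for commodity DRAM.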
Therefore, that transistor has to be optimized for low leakage, and speed has to take a back seat. It simply takes TIME for those 40,000 electrons to get in and out of the cell. Oh, and don't forget that this whole structure is optimized for size too, so there isn't any significant room to play around.
As we keep cramming more and more bits onto chips, the transistor (the T in the 1T1C memory cell) keeps getting smaller, and even with scaling it just can't get significantly faster. This aspect of performance just plain didn't show up in the old days, because everything else swamped it. We're now in the era where it shows and causes pain. In the future, it will begin to dominate performance.
Everything else is about moving bits around: getting them to and from the cells. That's where most of the improvements have been happening. (To give them credit, there have also been improvements in wordlines and sensing systems.)
My personal favorite is a largish L3 cache on the Northbridge. There have already been some indications of this happening. Furthermore, on the Northbridge you can distinguish streaming data from random data, so streams don't flush the cache. You can't do that with a cache in the DRAM.
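One way a Northbridge-side cache might keep streams from flushing it is sketched below: an LRU line cache that watches for runs of consecutive line addresses and stops allocating once a run looks like a stream. This is a hypothetical policy for illustration only; the class name, the `stream_threshold` parameter, and the simple sequential-run detector are assumptions, not a description of any real chipset.

```python
from collections import OrderedDict


class StreamAwareCache:
    """Toy LRU line cache that does not allocate lines from a detected sequential stream."""

    def __init__(self, capacity_lines, stream_threshold=4):
        self.capacity = capacity_lines
        self.stream_threshold = stream_threshold  # hypothetical: run length that marks a stream
        self.lines = OrderedDict()                # line address -> present, kept in LRU order
        self.prev_line = None
        self.run_length = 0

    def access(self, line):
        """Return True on a hit; on a miss, allocate unless the access looks like streaming."""
        # Count how many back-to-back accesses have hit consecutive line addresses
        if self.prev_line is not None and line == self.prev_line + 1:
            self.run_length += 1
        else:
            self.run_length = 0
        self.prev_line = line

        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        if self.run_length >= self.stream_threshold:
            return False  # streaming data: serve it from DRAM without polluting the cache
        self.lines[line] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


working_set = [100, 300, 500, 700]  # scattered, "random" lines
for cache_name, threshold in [("stream-aware", 4), ("plain LRU", 10**9)]:
    cache = StreamAwareCache(16, stream_threshold=threshold)
    for line in working_set:
        cache.access(line)
    for line in range(10_000, 11_000):  # a long sequential stream
        cache.access(line)
    hits = sum(cache.access(line) for line in working_set)
    print(f"{cache_name}: {hits}/{len(working_set)} working-set hits after the stream")
```

Setting an enormous threshold turns the detector off, which gives an ordinary LRU cache for comparison: there the 1,000-line stream evicts the whole working set, while the stream-aware version keeps it.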
The living have better things to do than to continue hating the dead.
Ars also has a great write-up on the (pin) ins and outs of memory, only they started at the very beginning with SRAM and stuff. They did a really great job of explaining not only the (physical) layout of memory but also the theory behind every step and the technical innovations too. A lot of it was way over my head, but I liked reading it anyway...
RAM Guide: Part I DRAM and SRAM Basics
And one other thing...
And it even features lots of diagrams (although some of the tables seem to have been designed by someone who is color blind, using white text on very bright backgrounds.
Check out this graph... I have no idea what it's explaining, but it's really spiffy and colorful!
"Me Ted"
BOSTON SUCKS!