From Rambus to DDR: Memory Explained
rosewood sent us linkage to an article that explains memory and more. A fairly detailed story talking about RAM in general, as well as explaining Rambus, DDR (including 1.5 and 2). Well written and worth the read. And it even features lots of diagrams (although some of the tables seem to have been designed by someone who is color blind, using white text on very bright backgrounds. Why do people do that?) Anyway, highly recommended.
I am partially color blind (red-green) and I cannot believe how many people design their websites around color rather than brightness. For this particular site you can see examples on pages three and four, where the web designer uses white text on a bright background in some tables. My monitor is set to 1600x1200 (19") and I cannot read the text without blowing it up (zooming in) or selecting it (which produces a nice white-on-dark-blue selection).
This site was obviously not designed by a color-blind person; it would have to have been designed by someone who 1) has their display set to a low resolution and 2) has no color blindness whatsoever.
Good user interface design requires not only contrasting colors, but contrasting brightness (luminance ~= brightness, chrominance ~= color). Too many sites have a dark font (navy blue, brick red, etc.) on a black background, or a light font (pink, purple) on a white background. If you run a site such as this, PLEASE consider changing it (even a slight change can make a big difference!), or at least increasing the font size.
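To put a number on "contrasting brightness", here is a minimal Python sketch of the relative-luminance and contrast-ratio formulas from the WCAG 2.x accessibility guidelines. The guidelines are not mentioned in the article or in this thread, and the specific colors below are hypothetical stand-ins for "white on a bright background" versus the white-on-dark-blue selection, not values taken from the site.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple: 0.0 is black, 1.0 is white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black on white."""
    lum_a = relative_luminance(fg)
    lum_b = relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical example colors, not measured from the article's tables.
WHITE = (255, 255, 255)
BRIGHT_YELLOW = (255, 255, 0)
NAVY = (0, 0, 128)

print("white on bright yellow:", round(contrast_ratio(WHITE, BRIGHT_YELLOW), 2))
print("white on navy:", round(contrast_ratio(WHITE, NAVY), 2))
```

White on bright yellow comes out barely above 1:1 even though the two colors look very different in hue, while white on navy is far above the 4.5:1 minimum that WCAG suggests for body text.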
-Adam
[trying to soak in this article]
:/
ERROR!
Insufficient Memory this early in the morning!
Someone needs to write an article on the effects of caffeine on memory...of course then all the charts would probably be horrible neon colors on black backgrounds!
Gotta get some more coffee... =(
> CPU clock rates have experienced an exponential growth, leaving the rest of the PC components behind
I tend to disagree. Hard drive capacity has had exponential growth too. And hard drive bandwidth too. And memory size. I had 4MHz computers with 64KB of RAM. Now I have an 800MHz machine (x200) with 512MB of RAM (x8192).
My hard drives went from 20MB on a 25MHz machine to 40GB (x2000) on an 800MHz machine (x32). And disk bandwidth went from 500KB/s in 1991 (with good, expensive SCSI hardware) to about 20MB/s on an ATAPI disk.
Most components of a PC have had exponential growth. But agreed, there is a problem with memory latency. And display quality. And CPU count. And noise. And size. And heat (well, one could argue that heat had an exponential growth too :-) )
Cheers,
--fred
However another thing that may not be obvious - today's 133MHz DRAMs being used in PCs are top-of-the-line - back in 1989 the fastest DRAMs were only being used in high-end servers because of the price premium.
(some background on why Rambus is good/bad in general) I've done designs with many of these technologies (traditional async ras/cas, sdram, rambus, not DDR) over the years - the older rambus designs were certainly harder to implement with (they used more of a network protocol paradigm) but not by much. The main thing about rambus is that at some level it trades off latency for bandwidth - there are some places where this is actually a good thing - display controllers for example.
Rambus also is a win in places where lots of concurrent transactions are available - the finer-grained banking allows parallel row senses, reducing average latency, and even speculative row senses for CPUs doing speculative instructions. I believe this is the main reason Intel went for Rambus - they are building CPUs that are highly parallel at the low level and can issue many overlapping memory requests at once. But they screwed up: this would have been great if they were hooking the Rambus channels directly to their CPUs, but instead they are running them over the Slot 1 bus, which forces complete serialization and loses any possible advantage. AMD's Slot A would have been a better choice, but these buses still do a very basic serialization that's going to make obtaining almost any concurrency at the RAM channel level difficult (which is why IMHO Rambus on Intel hardware sucks).
I also say nuc-u-lear all of the time, when I intend nuclear. I suffer from a kind of dyslexia. While retardation is too strong of a word, indeed, my disability is pathological in nature. While on the other hand, your colloquial grammar and lack of manners can only be attributed to ignorance and/or demeanor. Anyway, thanks for the tip ;)
Someone you trust is one of us.
The key Rambus argument is that the 16-bit bus at 800MHz will be faster. But when PC1600 and PC2100 64-bit DDR SDRAM (200MHz and 266MHz effective, respectively) came along, it was no contest: DDR wins. Add the latency due to the Rambus proprietary packet structure, and Rambus becomes even more the loser. And the "serial is more effective because there are fewer wires to cause loss" argument is moot; seriously, how far is it from the RAM sockets to the chipset? A maximum of 12 inches travelled along the line. You could set up a 256-bit RAM pipeline and not worry about signal loss!
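For anyone who wants to check the numbers behind that comparison, here is a rough Python sketch of theoretical peak bandwidth, using the bus widths and effective rates quoted above. The function name is made up for illustration, and the model assumes one transfer per effective clock with no protocol overhead, so it says nothing about the packet latency mentioned above.

```python
def peak_bandwidth_mb_s(bus_width_bits, effective_mhz, channels=1):
    """Theoretical peak transfer rate in MB/s (1 MB = 10**6 bytes).

    Assumes one transfer per effective clock and ignores all overhead.
    """
    return bus_width_bits / 8 * effective_mhz * channels


rdram_800 = peak_bandwidth_mb_s(16, 800)   # one 16-bit Rambus channel
ddr_200 = peak_bandwidth_mb_s(64, 200)     # 64-bit DDR, 200MHz effective
ddr_266 = peak_bandwidth_mb_s(64, 266)     # 64-bit DDR, 266MHz effective

for name, value in [("RDRAM 16-bit @ 800MHz", rdram_800),
                    ("DDR 64-bit @ 200MHz", ddr_200),
                    ("DDR 64-bit @ 266MHz", ddr_266)]:
    print(f"{name}: {value:.0f} MB/s")
```

On paper the 200MHz DDR part only ties a single Rambus channel, and the 266MHz part pulls ahead by about a third; any further gap has to come from latency, which this arithmetic does not capture.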
I've already begun a boycott of Rambus. I'm not buying a P4 system until they make the DDR chipset. I'm not buying a PS2. If there was a "sucks" site for Rambus, I'd visit often. However, it's not enough to boycott Rambus, but hopefully Intel will be able to pound them into the ground when those morons of the RAM industry try to litigate.
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
I'm wondering if we could improve bandwidth and latency by going back to banked memory, perhaps interleaved.
You would definitely be able to increase bandwidth, as long as you kept signal paths clean (difficult, but not impossible). Latency gets iffy. In practice, the main benefit would be in SMP systems or SMT systems (simultaneous multithreading; two or more threads running concurrently on a chip, while sharing some or all of the pipeline hardware).
The reason is that interleaved memory lets you have several outstanding cache row loads in progress from different parts of the memory system. A cache miss will still most likely stall the thread that missed, but as long as other threads or other processors are accessing memory, processing can continue. A non-banked/interleaved system would have to wait for the cache row to be transferred before servicing other requests.
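A toy Python model of that effect, for what it's worth. Everything here is an assumption made for illustration: low-order interleaving on 64-byte lines, one request issued per cycle, and a bank that stays busy for 4 cycles per access. It is not a model of any real memory controller.

```python
def bank_of(address, num_banks, line_bytes=64):
    """Low-order interleaving: consecutive lines map to consecutive banks."""
    return (address // line_bytes) % num_banks


def cycles_to_service(addresses, num_banks, busy_cycles=4, line_bytes=64):
    """Cycle at which the last access finishes, for in-order requests.

    One request may issue per cycle, but a bank cannot accept a new
    request until busy_cycles after it accepted the previous one.
    """
    bank_free_at = [0] * num_banks
    now = 0
    finish = 0
    for addr in addresses:
        bank = bank_of(addr, num_banks, line_bytes)
        start = max(now, bank_free_at[bank])   # wait if the bank is busy
        bank_free_at[bank] = start + busy_cycles
        finish = max(finish, start + busy_cycles)
        now = start + 1                        # next request issues a cycle later
    return finish


sequential = [i * 64 for i in range(8)]        # 8 consecutive lines
same_bank = [i * 64 * 4 for i in range(8)]     # stride that always hits bank 0 of 4

print("1 bank, sequential: ", cycles_to_service(sequential, num_banks=1))
print("4 banks, sequential:", cycles_to_service(sequential, num_banks=4))
print("4 banks, same-bank stride:", cycles_to_service(same_bank, num_banks=4))
```

With these made-up parameters, eight sequential lines take 32 cycles on one bank and 11 on four banks, because accesses overlap. The last case shows the catch: a stride that keeps landing on the same bank gets no benefit at all.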
One of the points mentioned is that Rambus memory is poor at random access, but good for sequential access. This actually makes Rambus memory good for CRT refresh applications, since a CRT buffer DOES access sequential locations as the screen is refreshed. I wonder if any graphics card designers are using Rambus memory in their units.
As the article states, random accesses prove to be the largest problem, since new pages need to be output, and the cache has to be emptied. Does there exist (and if not, is it even possible) a program that manages the memory pages in a packing manner, similar to a defrag program for hard drives? Something that would note which addresses are generally requested in sequence, and move those memory locations to a single page, to prevent cache misses? Seems like a simple idea, but I've not seen it before.
Do not confuse duty with what other people expect of you; they are utterly different. Duty is a debt you owe to yourself.
Thanks, this answer is very helpful.
I would think that a colorblind person (like me) would be prone to add more contrast to a diagram, not less. I know that I have more trouble seeing minor color changes (bright color to white) than I have seeing major changes (white on black, or white on blue, green on black, etc.).
;-).
So, enough with the colorblind references already
It's attempting to show how bandwidth, as measured by several benchmarks, correlates to real-world performance, as measured by Expendable.
Interesting how games have become some of the most quoted benchmarks in recent years.
The living have better things to do than to continue hating the dead.
The chart on the first page of the article says that the memory bus increased only 4X from 1989 to 2000. I have to disagree. The article says that the FP SIMMs on 486s ran at 16 MHz. Those SIMMs were either 8-bit SIMMs run in banks of 4 or 32-bit SIMMs. Today's DIMMs do 64 bits at 133 MHz. So that would be 16 times faster, or 32 times if you count DDR. That's approximately equal to the increase in processor speed.
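The arithmetic behind the "16 times, or 32 with DDR" figure, as a quick Python sketch. It splits the speedup into a width factor, a clock factor, and a transfers-per-clock factor, using only the widths and clocks quoted above; the variable names are mine, and this is peak bus throughput only, not latency.

```python
# Figures quoted above: 32 bits at 16 MHz then, 64 bits at 133 MHz now.
OLD_WIDTH_BITS, OLD_CLOCK_MHZ = 32, 16
NEW_WIDTH_BITS, NEW_CLOCK_MHZ = 64, 133

width_factor = NEW_WIDTH_BITS / OLD_WIDTH_BITS   # wider bus
clock_factor = NEW_CLOCK_MHZ / OLD_CLOCK_MHZ     # faster clock
ddr_factor = 2                                   # two transfers per clock

sdr_speedup = width_factor * clock_factor
ddr_speedup = sdr_speedup * ddr_factor

print(f"width x{width_factor:g}, clock x{clock_factor:g}")
print(f"SDR speedup: x{sdr_speedup:g}")
print(f"DDR speedup: x{ddr_speedup:g}")
```

The clock alone accounts for a bit over 8x; the doubled bus width is what takes the total past 16x.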
The whole point of the article, that RAM latencies have not kept up, is still a valid point. Although even the latencies have improved 8X. Remember, another reason that we don't have higher bandwidth memory is that it is hard to make motherboards and CPU interfaces that can handle higher clock frequencies.
I'm wondering if we could improve bandwidth and latency by going back to banked memory, perhaps interleaved.
Software sucks. Open Source sucks less.
The hardocp article doesn't fully address overall system architecture. Although the article was interesting in its broad coverage of the memory latency/bandwidth bottleneck, I am wary of articles that don't use entire system architectures in their performance reviews.
Someone you trust is one of us.
This and the other articles didn't really get into some of the fundamental [S][DDR-S][R][ES]DRAM limits in terms of latency, and why this is just plain a losing battle.
At the cell level, DRAMs work by charge transfer. (That part was covered, IIRC) To write, you push some charge into the cell. To read, you share that charge and let it disturb voltages in your sensing system, and then evaluate it to a 0 or 1. If that sounds fuzzy, it's because it really is.
Anyway, there is a transistor used as a switch to get the charge in and out of the cell. It has to be a pretty darned good switch, especially in the 'off' position. We're about to the point where we can count the electrons that tell the difference between a 0 and a 1 - somewhere around 40,000. ANY leakage at all in that transistor HURTS.
Therefore, that transistor has to be optimized for leakage, and speed has to take a back seat. It simply takes TIME for those 40,000 electrons to get in and out of the cell. Oh, don't forget that this whole structure is optimized for size too, and there isn't any significant room to play around.
As we keep cramming more and more bits onto chips, the transistor (the T in 1T memory) keeps getting smaller, and even with scaling, just can't get significantly faster. This aspect of performance just plain didn't show up in the old days, because every other part swamped it out. We're now in the era where it shows, and causes pain. In the future, it will begin to dominate performance.
All else is addressing moving bits around, getting them to and from cells. That stuff is where most of the improvements have been happening. (There have also been improvements in wordlines and sensing systems, to credit them, too.)
My personal favorite is a large-ish L3 cache on the Northbridge. There have been some indications of this happening, already. Furthermore, on the Northbridge, you can differentiate between streaming data and random data, so streams don't flush the cache. You can't do that with cache in the DRAM.
The living have better things to do than to continue hating the dead.
Actually, we are starting to find that serial buses can be made faster than parallel buses. Look at USB replacing parallel ports for printers and scanners. Look at the upcoming IDE specs -- they're moving to serial. I believe Fibre Channel uses the SCSI command set on a serial bus, and future SCSI interfaces will also be serial.
The fact is that it is often actually easier to pump 1 bit at a super-fast rate than to try to synchronize 64 bits at a fast rate. Think about it -- which would be easier to run at 5 MHz, a CPU the complexity of a 286, or one the complexity of a Pentium IV? Also consider the money saved by having to run fewer data lines. Just because Rambus was incompetent does not mean that the technology is necessarily bad.
Software sucks. Open Source sucks less.
Rambus is an example of a design tradeoff. They sacrificed data pins for the ability to ramp up the clock speed. It's not at all an unfair comparison to look at a Dual Channel DDR vs. Dual Channel Rambus. Your 4-8 Channels of RDRAM is certainly not a better comparison. Besides that, Intel will probably never make such a monstrosity, because the cost of the controller would be hideous. The superior technology is the one that gets the most done for the least cost, at the highest quality. Dual channel DDR gets more done than Dual Rambus, costs less, by the OP's competent analysis, and is of a higher quality, in that the latency is far, far less.
The natalie portman, hot grits, and goatse.cx trolls are ridiculous, but the above post is a perfect example of high quality signal rising above the noise.
WARNING: there is a trojan on your
Ars also has a great write-up on the (pin) ins and outs of memory. Only they started at the very beginning with SRAM and stuff. They did a really great job of not only explaining the (physical) layout of memory but the theory behind every step and the technical innovations too. A lot of it was way over my head but I liked reading it anyways...
RAM Guide: Part I DRAM and SRAM Basics
And one other thing...
And it even features lots of diagrams (although some of the tables seem to have been designed by someone who is color blind, using white text on very bright backgrounds.
Check out this graph... I have no idea what it's explaining but it's really spiffy and colorful!
"Me Ted"
BOSTON SUCKS!