Memory Timings Analysis
keefe007 writes "It's generally known that smaller, more aggressive memory timings combined with higher clock speeds lead to higher performance, but for the most part, the performance gain from tweaking each individual setting is relatively unknown. In a move that was perhaps a bit too ambitious, I set out to examine the impact of each individual memory timing and clock speed on overall performance. Find out the results of the tests at Techware Labs."
Review by Harry Lam on 05.16.03
Test Ram provided by Crucial, MSRP: $26.00 (per stick)
Introduction:
The typical BIOS offers a varying number of settings directly related to memory: everything from timings to clock speeds. It's generally known that smaller, more aggressive timings combined with higher clock speeds lead to higher performance, but for the most part, the performance gain from tweaking each individual setting is relatively unknown. In a move that was perhaps a bit too ambitious, I set out to examine the impact of each individual memory timing and clock speed on overall performance. The article that follows contains my experiences in this "memory benchmarking adventure" with Crucial's PC2700 DDR RAM (and also gives a relatively good picture of the limits of this memory).
I would recommend that anyone interested in learning more about memory timings take a look at this site. It gives a pretty good technical intro to memory timings.
Testing/Methodology:
Motherboard Selection:
I decided to use the Soyo SY-P4X400 for testing because of the flexibility of its BIOS with respect to memory timings: it allowed me to change 10 different memory-related settings.
Benchmark:
To save on time and testing (all of the testing occurred over a 5-day period, with several hours of testing each day), I picked only one benchmark: the memory test in SiSoft Sandra Professional 2003 v9.41 (SP1). I did notice that the first few benchmark runs on any configuration usually were significantly higher or lower than the "steady-state" score (the stabilized value that appears after several successive runs of the benchmark). To compensate for this, I took the median of the scores recorded after they had stabilized.
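The "median after the scores stabilize" rule can be sketched roughly as follows. This is only an illustration, not the reviewer's actual procedure: the article does not say how stabilization was judged, so the `window` and `tolerance` parameters, the function name, and the sample scores are all assumptions made up for the example.

```python
from statistics import median

def steady_state_median(scores, window=3, tolerance=0.01):
    """Return the median of the scores from the first stable run onward.

    Assumption (not from the article): the scores count as stable at the
    first position where `window` consecutive runs differ by no more than
    `tolerance` (as a fraction of their mean).
    """
    for start in range(len(scores) - window + 1):
        chunk = scores[start:start + window]
        mean = sum(chunk) / window
        if max(chunk) - min(chunk) <= tolerance * mean:
            # Discard the warm-up runs, keep everything from here on.
            return median(scores[start:])
    raise ValueError("scores never stabilized")

# Hypothetical scores: two outliers first, then a steady state.
runs = [2400, 2150, 2302, 2298, 2301, 2300, 2299]
print(steady_state_median(runs))  # 2300
```

Unlike a mean, the median is not pulled toward a stray high or low run that slips into the stable window.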
Depth:
I decided that 4,608 different combinations of memory timings on my particular test bench was a tad too much testing, so I created a methodology that captures the general performance trend as timings become more aggressive, with the downside of an uneven number of data points for timings that were deemed "less significant" (more on this later). This methodology reduced the number of combinations to a mere 289 (which is still extremely time consuming, considering that the test computer has to be rebooted after testing each combination).
I established Memory Speed (100, 133, 166), CAS Latency (3.0, 2.5, 2.0, 1.5), and Bank Interleave (Disabled, 2 Bank, 4 Bank) as the primary criteria for my benchmarking (as these usually are the settings that are most emphasized). The "less significant" memory timings (Trp, Tras, Trcd, DRAM Command Rate, DRAM Burst Length, Write Recovery Time, and DRAM Access Time) received less thorough testing as a result.
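For reference, the 4,608 figure is what a full factorial over these settings gives if each of the seven "less significant" timings has exactly two values. That two-value assumption is mine: the article states it only for Trp (3T and 2T), although the arithmetic is consistent with it. A minimal sketch of the count:

```python
from itertools import product

# Primary settings and their values, as listed in the article.
primary = {
    "Memory Speed": [100, 133, 166],
    "CAS Latency": [3.0, 2.5, 2.0, 1.5],
    "Bank Interleave": ["Disabled", "2 Bank", "4 Bank"],
}

# Assumption: each "less significant" timing has two settings,
# labeled here generically as relaxed/aggressive.
secondary_names = [
    "Trp", "Tras", "Trcd", "DRAM Command Rate",
    "DRAM Burst Length", "Write Recovery Time", "DRAM Access Time",
]
secondary = {name: ["relaxed", "aggressive"] for name in secondary_names}

all_settings = {**primary, **secondary}
full_factorial = list(product(*all_settings.values()))
print(len(full_factorial))  # 4608 = 3 * 4 * 3 * 2**7
```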
The general testing methodology is as follows:
All combinations of Memory Speed, CAS Latency, and Bank Interleave were tested at the least aggressive memory timings, and once that was complete, I changed the first of the "less significant" memory timings to a more aggressive value (Trp was changed from 3T to 2T). I then repeated benchmarks for all possible combinations of CAS Latency and Bank Interleave based on this new timing (12 total combinations). Once this was complete, I changed the value of the next "less significant" memory timing (Tras), and repeated another set of 12 combinations (keep in mind, I left Trp at 2T, the most "optimized" value). This process was repeated for each "less significant" memory timing, and then the entire set (of 96 different combinations) was repeated at an increased clock speed (for a total of 289 different combinations).
As I stated earlier, this results in an uneven number of data points. For example, Trp had 36 data points at 3T compared with 252 data points at 2T, and the reverse is true for DRAM Access Time (252 to 36).
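The stepwise scheme described above can be enumerated to check the data-point counts. This is a sketch of one reading of the methodology, not the reviewer's actual test script. The "relaxed"/"aggressive" split and all names in the code are assumptions. Under this reading the total comes to 288 (3 clock speeds times 96), one fewer than the 289 stated in the article; the source of the extra combination is not clear from the text.

```python
from itertools import product

SPEEDS = [100, 133, 166]
CAS = [3.0, 2.5, 2.0, 1.5]
INTERLEAVE = ["Disabled", "2 Bank", "4 Bank"]
# Order in which the "less significant" timings are tightened.
SECONDARY = [
    "Trp", "Tras", "Trcd", "DRAM Command Rate",
    "DRAM Burst Length", "Write Recovery Time", "DRAM Access Time",
]

def reduced_design():
    """List every tested configuration under the stepwise scheme."""
    runs = []
    for speed in SPEEDS:
        # Step 0: everything relaxed. Step k: first k timings aggressive.
        for step in range(len(SECONDARY) + 1):
            aggressive = frozenset(SECONDARY[:step])
            for cas, bank in product(CAS, INTERLEAVE):
                runs.append((speed, cas, bank, aggressive))
    return runs

runs = reduced_design()
trp_2t = sum("Trp" in r[3] for r in runs)
access_aggressive = sum("DRAM Access Time" in r[3] for r in runs)

print(len(runs))                                          # 288
print(len(runs) - trp_2t, trp_2t)                         # 36 252
print(len(runs) - access_aggressive, access_aggressive)   # 252 36
```

Because each timing stays at its aggressive value once changed, the first timing in the sequence is almost always aggressive and the last almost always relaxed, which is where the 36-versus-252 imbalance comes from.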
Test RAM:
Crucial was gracious enou
Memory Timing Explained.
GamersEd.com
PC3200 is faster if FSB can utilize it. PC3200 is recommended if you're using Athlon @ 400MHz FSB or Pentium4 @ 800MHz FSB.
I don't know how you came to that conclusion from reading the linked article - all it says is that he's unsure that the RAM heatsinks tested actually gave any benefit - he even says that hotter RAM is less efficient.
Wow!!!!%#@
You mean reducing RAM latency doesn't increase bandwidth?!?!#%!1 d00d!
*sigh*
This benchmark would have been vastly more informative if the guy had gotten his tests and statistics right. First, he needs to learn the difference between a median and a mean, which are very different. Second, actually testing latency might have been nice, considering that one of his independent variables is CAS latency. Not to mention the fact that the hardcoded pixel widths in the stats table are horribly wrong on a high-DPI system. People! The em is your friend!
So basically what we have here is this:
- Independent variables: bus speed (read: bandwidth), CAS latency, interleave (read: latency/bandwidth).
- Dependent variables: bandwidth
Quite frankly, if I had submitted this experimental design, my advisor would still have me tied to a table in the back end of the psych building. He's not measuring what he's manipulating, and throwing in a two-factor confound like bank interleave without compensating (though the article may be misleading) just skews the measurements.
Ah, well. I'll go back to my completely untweaked Athlon and be happy. :-)
I think he would have found more interesting results if he had chosen a different benchmark. The test he used only tested bandwidth, and latency was not a factor. However, most of the memory settings (other than clock speed) affect latency more than bandwidth. CAS is a major factor in latency. Had he used a benchmark that hit a few words at random memory locations rapidly, he would have seen the other effects of the settings he tweaked.
If you've heard the quote, "Never underestimate the bandwidth of a 747 full of DVDs" (updated for modern times), you can see that it is an example of why bandwidth is not the only important factor. On the benchmark he used, a 747 full of DVDs probably would have scored pretty well.
If you're going to play with latency settings, at least use a test that measures latency.
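A test that "hits a few words at random memory locations" is commonly built as a pointer chase: each load depends on the result of the previous one, so the loop is bound by access latency rather than bandwidth. The sketch below shows only the structure. The function names are made up for illustration, and in Python the interpreter overhead and the list-of-objects layout swamp the DRAM latency, so the timing it reports is not a usable measurement; a real test would be written in a compiled language.

```python
import random
import time

def random_cycle(n, seed=0):
    """Build a random permutation that is one single cycle of length n
    (Sattolo's algorithm), so the chase visits every slot exactly once."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt, steps):
    """Follow the chain for `steps` dependent loads.

    Returns (final index, average seconds per step)."""
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = nxt[idx]  # next address depends on this load's result
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps

chain = random_cycle(1_000_000)
final_idx, per_step = chase(chain, len(chain))
print(f"{per_step * 1e9:.0f} ns per dependent access (includes interpreter overhead)")
```

The single-cycle property matters: an ordinary shuffle can produce short loops that fit in cache, which would hide the memory latency the test is meant to expose.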
He only measured memory bandwidth, which does not exactly translate to real-world performance. He says there is no performance benefit from CAS2 as opposed to CAS2.5 or CAS3, but if you read Tom's Hardware you'd know that CAS does have a drastic impact on overall performance. The benefit is not in bandwidth; it is in the time the processor has to wait between requesting data from memory and receiving the answer. The longer it waits for a response from memory, the longer it sits around doing nothing, and the slower it will be able to render the scene, compress the file, or compile the kernel.
I hope that helps some people out there who were about to buy slow memory to save $10.
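To put rough numbers on the wait: CAS latency is specified in clock cycles, so the time it represents is the cycle count divided by the memory clock. The sketch below assumes the article's memory speeds (100, 133, 166) are clock rates in MHz, and it covers only the CAS component, not the full access time, which also includes the other timings. The function name is made up for the example.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """Time spent on CAS latency, in nanoseconds.

    One cycle at f MHz lasts 1000 / f nanoseconds."""
    return cas_cycles * 1000.0 / clock_mhz

for clock in (100, 133, 166):
    for cl in (3.0, 2.5, 2.0):
        print(f"{clock} MHz, CL{cl}: {cas_latency_ns(cl, clock):.1f} ns")
```

At any given clock, going from CL3 to CL2 removes one full cycle from every access that pays the CAS delay, and a pure bandwidth test does not report that difference.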
Definitely correct. Plus some of the other definitions were a little off (interleaving is essentially RAID for memory: it gets benefits because multiple devices can respond in parallel, rather than in series, so the latency penalty isn't incurred twice).
What makes this terrible is the fact that there are latency measuring tools out there, lmbench specifically. It really wouldn't take that long to measure both latency and bandwidth.
Considering the fact that this definitely would be interesting, it's a little annoying that he didn't do that.
There are much more intensive memory benchmarks than Sandra. That's why it's a little annoying that Sandra's become so popular. There are other, easy to automate benchmarks that do a much better job. Sandra's useful, but not for this kind of thing.
Just plain useless.
Seriously, this guy doesn't know how to run a good test setup.
First off, he tested all these super specialized memory timings using a stick of RAM that was rated CL2.5, so he was overclocking it and stressing it when he ran a lot of the low-latency tests. A better setup would've been to get the best darned stick of RAM and THEN test how the timings affect performance.
Next, almost all of the timings he adjusted in the tests affect latency not bandwidth, but he used bandwidth as his ONLY benchmark. If a program is swapping small amounts of data, but VERY quickly and often, latency has more of an effect than bandwidth.
Finally, he doesn't address asynchronous bus speed issues or how well some of his unattainable settings would work (because of my first complaint, his memory was unstable at the aggressive timings).
I'm not a statistician, but it doesn't appear to me that he really understands some of the statistical methods for a good test. This is what I've garnered from reading other slashdot posts, at least.
IANAL, but I play one on