Slashdot Mirror


Memory Timings Analysis

keefe007 writes "It's generally known that smaller, more aggressive memory timings combined with higher clock speeds lead to higher performance, but the gain from tweaking each individual setting is largely unknown. In a perhaps overly ambitious move, I set out to examine the impact of each individual memory timing and clock speed on overall performance. Find out the results of the tests at Techware Labs."

10 of 159 comments

  1. Full text of article in case of slashdotting by Anonymous Coward · · Score: 4, Informative

    Memory Timings Analysis

    Review by Harry Lam on 05.16.03
    Test RAM provided by Crucial, MSRP: $26.00 (per stick)

    Introduction:

    The typical BIOS offers a varying number of memory-related settings, from timings to clock speeds. It's generally known that smaller, more aggressive timings combined with higher clock speeds lead to higher performance, but the gain from tweaking each individual setting is largely unknown. In a perhaps overly ambitious move, I set out to examine the impact of each individual memory timing and clock speed on overall performance. The article that follows describes this "memory benchmarking adventure" with Crucial's PC2700 DDR RAM (and also gives a fairly good picture of the limits of this memory).

    I would recommend that anyone interested in learning more about memory timings take a look at this site. It gives a pretty good technical intro to memory timings.

    Testing/Methodology:

    Motherboard Selection:
    I decided to use the Soyo SY-P4X400 for testing because its BIOS is flexible with respect to memory timings, letting me change 10 different memory-related settings.

    Benchmark:
    To save time (all of the testing took place over a 5-day period, with several hours of testing each day), I picked only one benchmark: the memory test in SiSoft Sandra Professional 2003 v9.41 (SP1). I noticed that the first few runs on any configuration were usually significantly higher or lower than the "steady-state" score (the stable value that appears after several consecutive runs of the benchmark). To compensate, I recorded the median score once the scores from successive runs had stabilized.
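The score-selection rule above can be sketched in Python. The stabilization criterion is an assumption, since the article doesn't define one: the series counts as steady from the first run that lies within a relative tolerance of the run before it, and the median is taken over the runs from that pair onward. The function name `steady_state_median`, the 1% tolerance, and the sample scores are hypothetical, not Sandra output. Using the median rather than the mean keeps a single stray run from pulling the result.

```python
from statistics import median


def steady_state_median(scores, rel_tol=0.01):
    """Median of the runs once the scores stabilize (hypothetical rule).

    The series is treated as steady from the first run whose score is
    within rel_tol (relative) of the run before it.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to detect a steady state")
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        if abs(cur - prev) <= rel_tol * prev:
            # Include the earlier run of the first stable pair as well.
            return median(scores[i - 1:])
    raise ValueError("scores never stabilized within rel_tol")


# Illustrative numbers only: two noisy warm-up runs, then stable scores.
runs = [1750, 1905, 2101, 2098, 2104, 2100]
print(steady_state_median(runs))  # 2100.5
```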

    Depth:
    I decided that 4,608 different combinations of memory timings on my particular test bench was a tad too much testing, so I created a methodology that captures the general performance trend of memory timings, with the downside of an uneven number of data points for timings that were deemed "less significant" (more on this later). This methodology cut the number of combinations down to a mere 289 (which is still extremely time-consuming, considering that the test computer has to be rebooted after each combination).

    I established Memory Speed (100, 133, 166), CAS Latency (3.0, 2.5, 2.0, 1.5), and Bank Interleave (Disabled, 2 Bank, 4 Bank) as the primary criteria for my benchmarking (as these are usually the most emphasized settings). The "less significant" memory timings (Trp, Tras, Trcd, DRAM Command Rate, DRAM Burst Length, Write Recovery Time, and DRAM Access Time) were consequently tested less thoroughly.
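The 4,608 figure is consistent with a full factorial design in which each of the seven "less significant" timings takes two values: 3 × 4 × 3 × 2^7 = 4,608. That two-values-per-timing reading is inferred from the arithmetic rather than stated in the article. A short Python enumeration of such a grid (the "relaxed"/"aggressive" labels and the shortened setting names are illustrative):

```python
from itertools import product

speeds = [100, 133, 166]                      # memory clock, MHz
cas = [3.0, 2.5, 2.0, 1.5]
interleave = ["Disabled", "2 Bank", "4 Bank"]
# Assumption: each secondary timing is either relaxed or aggressive.
secondary = ["Trp", "Tras", "Trcd", "Command Rate", "Burst Length",
             "Write Recovery", "Access Time"]

# One tuple per full combination of all 10 settings.
grid = list(product(speeds, cas, interleave,
                    *[("relaxed", "aggressive")] * len(secondary)))
print(len(grid))  # 4608
```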

    The general testing methodology is as follows:

    All combinations of Memory Speed, CAS Latency, and Bank Interleave were first tested at the least aggressive memory timings. Once that was complete, I changed the first of the "less significant" memory timings to a more aggressive value (Trp went from 3T to 2T) and repeated the benchmark for every combination of CAS Latency and Bank Interleave at this new timing (4 × 3 = 12 combinations). Next, I changed the value of the next "less significant" memory timing (Tras) and ran another set of 12 combinations (keep in mind that I left Trp at 2T, its most "optimized" value). This process was repeated for each "less significant" memory timing, and then the entire set (of 96 different combinations) was repeated at each higher clock speed (for a total of 289 different combinations).

    As I stated earlier, this results in an uneven number of data points. For example, Trp had 36 data points at 3T compared with 252 at 2T, and the reverse is true for DRAM Access Time (252 to 36).
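The staged scheme can be enumerated the same way. In the sketch below (names and labels are illustrative), stage 0 is the all-relaxed baseline and stage k tightens the first k secondary timings, each stage covering the 12 CAS/interleave combinations at each of the three clock speeds. This gives 8 × 12 = 96 runs per clock speed and 288 in total, and it reproduces the 36/252 split quoted for Trp and DRAM Access Time; the article's total of 289 is one more than this count.

```python
SPEEDS = [100, 133, 166]
CAS = [3.0, 2.5, 2.0, 1.5]
INTERLEAVE = ["Disabled", "2 Bank", "4 Bank"]
SECONDARY = ["Trp", "Tras", "Trcd", "Command Rate", "Burst Length",
             "Write Recovery", "Access Time"]


def staged_runs():
    """List every run of the one-timing-at-a-time scheme."""
    runs = []
    for speed in SPEEDS:
        # Stage 0: all secondary timings relaxed. Stage k: the first k
        # secondary timings are tightened and stay tightened.
        for stage in range(len(SECONDARY) + 1):
            settings = {name: ("aggressive" if i < stage else "relaxed")
                        for i, name in enumerate(SECONDARY)}
            for c in CAS:
                for bi in INTERLEAVE:
                    runs.append({"speed": speed, "cas": c,
                                 "interleave": bi, **settings})
    return runs


runs = staged_runs()
print(len(runs))                                         # 288
print(sum(r["Trp"] == "relaxed" for r in runs))          # 36
print(sum(r["Access Time"] == "relaxed" for r in runs))  # 252
```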

    Test RAM:

    Crucial was gracious enou

    1. Re:Full text of article in case of slashdotting by Anonymous Coward · · Score: 3, Informative
      Here is the results page, with a tiny bit of HTML formatting restored for easier reading:
      -----


      Observations:
      • This doesn't show up in the recorded results, but during testing I noticed a general trend: the more aggressively the memory timings were set, the faster Sandra reached its steady-state score (gradually dropping from 5-7 trial runs to about 2-3 runs).
      • The overall bandwidth increase from the slowest memory timings to the fastest (1333 --> 2303 MB/sec) was approximately 73% (970 MB/sec).
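The 73% figure is plain relative-change arithmetic. A minimal Python sketch (the helper name `percent_increase` is illustrative) reproduces it from the two quoted scores, and also shows that a 2 MB/sec change on a baseline near 2000 MB/sec is only about a tenth of a percent:

```python
def percent_increase(old, new):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


slowest, fastest = 1333, 2303  # Sandra scores quoted above, MB/sec
print(fastest - slowest)                             # 970
print(round(percent_increase(slowest, fastest), 1))  # 72.8
print(round(percent_increase(2000, 2002), 2))        # 0.1
```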

      General Trends:

      These are just some general trends that I noticed when I was doing an analysis of my data:

      Memory Clock Speed:

      Memory speed is most commonly expressed as clock speed, the number of clock cycles per second: RAM running at 133 MHz goes through 133 million clock cycles a second.

      Clock Speed (MHz)   Performance Gain    % Increase   Theoretical %
      100 to 133          ~500 MB/sec         35-40%       33%
      133 to 166          ~200-300 MB/sec     10-15%       25%
      100 to 166          ~750-800 MB/sec     55-60%       66%

      The gain from raising the memory clock appears to be subject to the law of diminishing returns, with larger gains when moving up from lower clock speeds.
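The "Theoretical %" column is just the ratio of the clock speeds. The sketch below recomputes it and, for comparison, the nominal peak bandwidth of a DDR module, assuming the standard 64-bit (8-byte) module width and two transfers per clock. The helper names are illustrative, and "166" is the nominal figure for a 166.67 MHz clock; a synthetic benchmark will typically report less than these peaks.

```python
def theoretical_gain(f_old, f_new):
    """Percent increase expected if bandwidth scaled exactly with clock."""
    return (f_new / f_old - 1) * 100


def ddr_peak_mb_s(clock_mhz, bus_bytes=8):
    """Nominal DDR peak: two transfers per clock over an 8-byte bus."""
    return clock_mhz * 2 * bus_bytes


for old, new in [(100, 133), (133, 166), (100, 166)]:
    print(f"{old} -> {new} MHz: {theoretical_gain(old, new):.0f}%")
# 33%, 25% and 66%, matching the table

for f in (100, 133, 166):
    print(f, "MHz peak:", ddr_peak_mb_s(f), "MB/sec")  # 1600, 2128, 2656
```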

      CAS:

      CAS latency is the number of clock cycles (or ticks, denoted T) between the receipt of a "read" command and the moment the RAM chip actually starts delivering data. Lower numbers mean a shorter delay on each read. Corsair's website claims a low single-digit percentage gain from CAS-3 to CAS-2. Memory can be visualized as a table of cell locations, and the CAS delay is incurred every time the column changes (which happens far more often than the row changing).
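CAS latency is counted in clock cycles, so its cost in wall-clock time depends on the memory clock: latency in ns = CL × 1000 / f, with f in MHz. A small Python sketch (the helper name `cas_ns` is illustrative; clocks are the article's nominal values):

```python
def cas_ns(cas_cycles, clock_mhz):
    """CAS latency in nanoseconds; one cycle lasts 1000 / f ns (f in MHz)."""
    return cas_cycles * 1000.0 / clock_mhz


for cl, f in [(3.0, 166), (2.5, 166), (2.0, 166), (2.0, 133), (2.5, 133)]:
    print(f"CL{cl} @ {f} MHz = {cas_ns(cl, f):.1f} ns")
# roughly 18.1, 15.1, 12.0, 15.0 and 18.8 ns
```

A shorter CAS latency shortens the wait for the first word of each read, but once a burst is under way the data still arrives at one transfer per clock edge, which is why a streaming bandwidth test barely registers it.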

      CAS Latency                   Performance Gain   % Increase
      3.0 to 2.5                    ~0-2 MB/sec        0%-0.1%
      2.5 to 2.0                    ~0-3 MB/sec        0%-0.2%
      2.0 to 1.5                    ~0-3 MB/sec        0%-0.2%
      3.0 to 2.0 (166 MHz clock)    ~0-4 MB/sec        0%-0.2%
      3.0 to 1.5 (100 MHz clock)    ~0-4 MB/sec        0%-0.2%

      The differences in memory bandwidth between CAS latencies were essentially non-existent (any recorded gains are just as likely to be due to random variation, since they were not at all consistent). Adjusting CAS latency produced no significant gain in memory bandwidth.
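One way to check whether a small difference is real is to compare it with the run-to-run spread of the scores. The sketch below is a swapped-in technique (a comparison against the pooled standard deviation, not something the article did), with a hypothetical helper name and made-up scores; a proper analysis would use more runs and a significance test such as Welch's t-test.

```python
from statistics import mean, stdev


def difference_vs_noise(runs_a, runs_b):
    """Return (difference of means, pooled run-to-run std dev)."""
    diff = mean(runs_b) - mean(runs_a)
    # Pooled standard deviation for two equally sized samples.
    pooled = ((stdev(runs_a) ** 2 + stdev(runs_b) ** 2) / 2) ** 0.5
    return diff, pooled


# Illustrative scores only (MB/sec), not taken from the article.
cas3 = [2101, 2098, 2104, 2100, 2097]
cas2 = [2103, 2099, 2102, 2105, 2098]
diff, noise = difference_vs_noise(cas3, cas2)
print(f"difference {diff:.1f} MB/sec vs run-to-run spread {noise:.1f} MB/sec")
```

Here the 1.4 MB/sec difference is smaller than the roughly 2.8 MB/sec spread, so on its own it says nothing about CAS latency.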

      Additional Reading/References:

      • Corsair's information page on CAS latency (http://www.corsairmemory.com/main/trg-cas.html)
      • Adrian's Rojak Pot: Bios Optimization Guide - SDRAM CAS Latency Time (http://www.rojakpot.com/showBOG.aspx?bogno=117)

      Bank Interleave:

      In layman's terms, bank interleaving changes the way "banks" (chunks of memory) are accessed and refreshed. Accesses are staggered to hide refresh and access delays: a read/access command is sent to one bank while the controller is still waiting for the results of a previous command to another bank. All memory chips over 64 megs have 4 banks (and can use this option).
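To make the staggering idea concrete, here is a toy timing model in Python. It is not a DRAM simulator and not how the Soyo board's controller actually schedules commands. Access `i` goes to bank `i % n_banks`; a bank serves one access at a time, the shared data bus carries one burst at a time, and access `i` cannot be issued before cycle `i`. The latency and burst lengths are made-up cycle counts, and row-buffer hits, precharge and refresh are ignored, so the output only illustrates the mechanism; the article measured gains of 1-8%, not the large speedup a worst-case toy like this shows.

```python
def total_cycles(n_accesses, n_banks, access_latency, burst):
    """Toy model: cycles until the last of n_accesses reads completes."""
    bank_free = [0] * n_banks  # cycle at which each bank becomes idle
    bus_free = 0               # cycle at which the data bus becomes idle
    finish = 0
    for i in range(n_accesses):
        bank = i % n_banks
        # Wait for this bank to be idle; access i is never issued before cycle i.
        issue = max(bank_free[bank], i)
        # Data can only flow once the latency has elapsed and the bus is free.
        data_start = max(issue + access_latency, bus_free)
        finish = data_start + burst
        bus_free = finish
        bank_free[bank] = finish
    return finish


for banks in (1, 2, 4):
    print(banks, "bank(s):", total_cycles(8, banks, access_latency=4, burst=2))
# 1 bank: 48 cycles, 2 banks: 26, 4 banks: 20
```

With one bank every access pays the full latency in series; with more banks the latency of the next access overlaps the current burst, which is the same intuition as the "RAID for memory" comparison made later in the thread.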

      Bank Interleave        Performance Gain   % Increase
      Disabled to 2-Way      40-50 MB/sec       1%-4%
      2-Way to 4-Way         40-50 MB/sec       1%-4%
      Disabled to 4-Way      80-100 MB/sec      2%-8%

      Performance gains from bank interleave were very consistent: a 40-50 MB/sec increase across the board, independent of all other settings. Because the gain is a fixed amount, it is relatively less significant at higher speeds (1% at 166 MHz compared to 4% at 100 MHz); the increase does not scale with faster speeds.

      Additional Reading/References:

      • Ars Technica: BIOS Arcana - Description and Translation (http://www.arstechnica.com/guide/building/bios/bios-1.html)
  2. Just for those who need more education. by SolidCore · · Score: 5, Informative
  3. Re:Just bought some of that PC2700 myself by Anonymous Coward · · Score: 1, Informative

    PC3200 is faster if the FSB can make use of it. PC3200 is recommended if you're using an Athlon with a 400MHz FSB or a Pentium 4 with an 800MHz FSB.
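The matching described here comes down to peak bandwidth, that is, transfers per second times bus width. A quick check, assuming the usual 64-bit (8-byte) data paths for the DIMM, the Athlon FSB and the Pentium 4 FSB (the helper name `bus_mb_s` is illustrative):

```python
def bus_mb_s(mega_transfers_per_s, width_bytes=8):
    """Peak bandwidth in MB/sec: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * width_bytes


pc3200 = bus_mb_s(400)      # DDR400 DIMM: 200 MHz, double-pumped
athlon_fsb = bus_mb_s(400)  # "400 MHz" Athlon FSB: 200 MHz, double-pumped
p4_fsb = bus_mb_s(800)      # "800 MHz" Pentium 4 FSB: 200 MHz, quad-pumped
print(pc3200, athlon_fsb, p4_fsb)  # 3200 3200 6400
```

One PC3200 module matches the Athlon's bus, while the Pentium 4's 800 MHz bus has twice that bandwidth, which is why such systems typically pair PC3200 modules in dual-channel configurations.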

  4. Re:WOW! by Anonymous Coward · · Score: 1, Informative

    I don't know how you came to that conclusion from reading the linked article - all it says is that he's unsure that the RAM heatsinks tested actually gave any benefit - he even says that hotter RAM is less efficient.

  5. You mean...?!?!? by cbiffle · · Score: 4, Informative

    Wow!!!!%#@

    You mean reducing RAM latency doesn't increase bandwidth?!?!#%!1 d00d!

    *sigh*

    This benchmark would have been vastly more informative if the guy had gotten his tests and statistics right. First, he needs to learn the difference between a median and a mean, which are very different. Second, actually testing latency might have been nice, considering that one of his independent variables is CAS latency. Not to mention the fact that the hardcoded pixel widths in the stats table are horribly wrong on a high-DPI system. People! The em is your friend!

    So basically what we have here is this:

    • Independent variables: bus speed (read: bandwidth), CAS latency, interleave (read: latency/bandwidth).
    • Dependent variables: bandwidth
    Quite frankly, if I had submitted this experimental design, my advisor would still have me tied to a table in the back end of the psych building. He's not measuring what he's manipulating, and throwing in a two-factor confound like bank interleave without compensating (though the article may be misleading) just skews the measurements.

    Ah, well. I'll go back to my completely untweaked Athlon and be happy. :-)

  6. Re:Results = Waste of Time? by CTho9305 · · Score: 4, Informative

    I think he would have found more interesting results if he had chosen a different benchmark. The test he used only tested bandwidth, and latency was not a factor. However, most of the memory settings (other than clock speed) affect latency more than bandwidth. CAS is a major factor in latency. Had he used a benchmark that hit a few words at random memory locations rapidly, he would have seen the other effects of the settings he tweaked.

    If you've heard the quote, "Never underestimate the bandwidth of a 747 full of DVDs" (updated for modern times), you can see that it is an example of why bandwidth is not the only important factor. On the benchmark he used, a 747 full of DVDs probably would have scored pretty well.

    If you're going to play with latency settings, at least use a test that measures latency.
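A latency-oriented test of the kind described usually chases pointers: each load's address comes from the previous load, so the CPU cannot overlap the accesses, and the time per step approximates access latency. The Python sketch below builds a random single-cycle chain with Sattolo's algorithm and walks it; the function names are illustrative. In Python the interpreter's own overhead dominates the per-step time, so the printed figure is not DRAM latency; a real measurement would do the same walk in C, with a working set larger than the CPU caches.

```python
import random
import time


def make_chain(n, seed=0):
    """Random single-cycle permutation (Sattolo's algorithm).

    Following nxt[i] from any start visits every index exactly once
    before returning to the start.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i (Sattolo, not Fisher-Yates)
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the chain; each step's address depends on the previous load."""
    p = 0
    for _ in range(steps):
        p = nxt[p]
    return p


chain = make_chain(1 << 20)
t0 = time.perf_counter()
chase(chain, len(chain))
elapsed = time.perf_counter() - t0
print(f"{elapsed / len(chain) * 1e9:.1f} ns per dependent step")
```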

  7. Re:this guy by Jackazz · · Score: 3, Informative
    ...is not smart.

    He only measured memory bandwidth...which does not exactly translate to real-world performance. He says there is no performance benefit from CAS2 as opposed to CAS2.5 or CAS3, but if you read Tom's Hardware you'd know that CAS does have a drastic impact on overall performance. The benefit is just not in bandwidth; it is in the time the processor has to wait from when it calls on the memory to when it receives the answer. The longer it is waiting for an answer, the longer it is sitting around doing nothing. The longer it is waiting for a response from memory, the slower it will be able to render the scene, compress the file, or compile the kernel.

    I hope that helps some people out there who were about to buy slow memory to save $10.
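The point about waiting on memory can be put as a back-of-the-envelope model: run time is useful work plus the number of cache misses times the latency of each miss. The sketch assumes hypothetical figures (50 ms of computation, one million misses, 100 ns per miss at CAS 3) and treats moving to CAS 2 at 166 MHz as saving exactly one memory clock per miss. Real controllers and CPUs overlap some of this, so it only shows that the payoff grows with how often a program misses the cache, something a streaming bandwidth test does not capture.

```python
def run_time_ms(compute_ns, misses, miss_latency_ns):
    """Hypothetical model: total time = useful work + time stalled on misses."""
    return (compute_ns + misses * miss_latency_ns) / 1e6


cycle_ns = 1000 / 166                         # one memory clock at 166 MHz
base = run_time_ms(50e6, 1e6, 100.0)          # assumed miss latency at CAS 3
fast = run_time_ms(50e6, 1e6, 100.0 - cycle_ns)  # one fewer CAS cycle per miss
print(f"{base:.1f} ms vs {fast:.1f} ms ({(base - fast) / base * 100:.1f}% faster)")
# 150.0 ms vs 144.0 ms (4.0% faster)
```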

  8. Re:cas vs bus speed by barawn · · Score: 4, Informative

    Definitely correct. Plus some of the other definitions were a little off (interleaving is essentially RAID for memory: it gets benefits because multiple devices can respond in parallel, rather than in series, so the latency penalty isn't incurred twice).

    What makes this terrible is the fact that there are latency measuring tools out there, lmbench specifically. It really wouldn't take that long to measure both latency and bandwidth.

    Considering the fact that this definitely would be interesting, it's a little annoying that he didn't do that.

    There are much more intensive memory benchmarks than Sandra. That's why it's a little annoying that Sandra's become so popular. There are other, easy to automate benchmarks that do a much better job. Sandra's useful, but not for this kind of thing.

    Just plain useless.

  9. Horrible test methods by boarder · · Score: 3, Informative

    Seriously, this guy doesn't know how to run a good test setup.

    First off, he tested all these super specialized memory timings using a stick of RAM that was rated CL2.5, so he was overclocking it and stressing it when he ran many of the low-latency tests. A better setup would've been to get the best darned stick of RAM and THEN test how the timings affect performance.

    Next, almost all of the timings he adjusted in the tests affect latency not bandwidth, but he used bandwidth as his ONLY benchmark. If a program is swapping small amounts of data, but VERY quickly and often, latency has more of an effect than bandwidth.

    Finally, he doesn't address asynchronous bus speed issues or how well some of his unattainable settings would work (because of my first complaint, his memory was unstable at the aggressive timings).

    I'm not a statistician, but it doesn't appear to me that he really understands some of the statistical methods for a good test. This is what I've garnered from reading other slashdot posts, at least.

    --
    IANAL, but I play one on /.