Memory Timings Analysis
keefe007 writes "It's generally known that smaller and more aggressive memory timings combined with higher clock speeds lead to higher performance, but for the most part, the increase in performance from tweaking each individual setting is relatively unknown. In a perhaps overly ambitious move, I set out to examine the impact of each individual memory timing and clock speed on overall performance. Find out the results of the tests at Techware Labs."
that the best time to install new memory is in between clock cycles.
This sig washed every five years whether it needs it or not!
What, only 289 combinations?!? I demand that all 4608 combinations be explored. Who knows what secrets of memory speed might be unleashed? Not that I'm willing to waste my time doing it.
so what i got out of that is that increasing the speed of the memory (from 133 -> 166) is a much larger difference than bumping down the cas latency. i think i'd rather have memory on a faster bus than at a lower cas then.
but people will always say they have their stuff at the most aggressive timings just to say that they are there, even though the average performance increase is only 0-2%
Wow, considering all the settings that were tested, and the only improvement beyond 1% was the Clock Speed, seems like the rest of it was kinda a waste...
Polaroid. See what develops!!
>sarcasm/sarcasm<<
Stupid sexy Flanders.
Memory Timings Analysis
Review by Harry Lam on 05.16.03
Test Ram provided by Crucial, MSRP: $26.00 (per stick)
Introduction:
The typical BIOS usually offers a varying number of settings directly related to memory: everything from timings to clock speeds. It's generally known that smaller and more aggressive timings combined with higher clock speeds lead to higher performance, but for the most part, the increase in performance from tweaking each individual setting is relatively unknown. In a perhaps overly ambitious move, I set out to examine the impact of each individual memory timing and clock speed on overall performance. The article that follows contains my experiences in this "memory benchmarking adventure" in conjunction with Crucial's PC2700 DDR RAM (and also gives a relatively good picture of the limits of this memory).
I would recommend that anyone interested in learning more about memory timings take a look at this site. It gives a pretty good technical intro to memory timings.
Testing/Methodology:
Motherboard Selection:
I decided to use the Soyo SY-P4X400 for testing, due to the flexibility of its BIOS with respect to memory timings: it allowed me to change 10 different memory-related settings.
Benchmark:
To save on time and testing (all of the testing occurred over a 5 day period, with several hours of testing in each day), I picked only one benchmark: the memory test on SiSoft Sandra Professional 2003 v9.41 (SP1). I did notice that the initial few benchmarks on any configuration usually were significantly higher or lower than the "steady-state" score (the stabilized value that comes up after successive test runs of the benchmark in a row). To compensate for this, I selected the median score after the scores stabilized from successive benchmarks.
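A minimal sketch of the "median after the scores stabilize" step described above. The article does not say how stabilization was judged, so the window size, the 1% tolerance, and the function name `steady_state_median` are all assumptions, and the scores in the usage note are made up for illustration.

```python
from statistics import median


def steady_state_median(scores, window=3, tolerance=0.01):
    """Return the median of the scores recorded once the run has stabilized.

    "Stabilized" is an assumed rule, not the article's: the first position
    where `window` consecutive scores all lie within `tolerance` (relative)
    of their own median. Everything before that position is discarded.
    """
    for start in range(len(scores) - window + 1):
        chunk = scores[start:start + window]
        mid = median(chunk)
        if all(abs(s - mid) <= tolerance * mid for s in chunk):
            # Keep this chunk and every later score, then take the median.
            return median(scores[start:])
    raise ValueError("scores never stabilized")
```

For example, with the hypothetical run `[2400, 1900, 2105, 2100, 2110, 2102, 2108]` the first two outliers are dropped and the function returns 2105.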
Depth:
I decided that 4,608 different combinations of memory timings on my particular test bench was a bit too much testing, and created a methodology that captures the general performance trend as timings tighten, with the downside of an uneven number of data points for timings deemed "less significant" (more on this later). VA Software is DEAD. This methodology cut the number of combinations down to a mere 289 (which is still extremely time consuming, considering that the test computer has to be rebooted after testing each combination).
I established Memory Speed (100, 133, 166), CAS Latency (3.0, 2.5, 2.0, 1.5), and Bank Interleave (Disabled, 2 Bank, 4 Bank) as the primary criteria for my benchmarking (as these usually are the settings that are most emphasized). The "less significant" memory timings (Trp, Tras, Trcd, DRAM Command Rate, DRAM Burst Length, Write Recovery Time, and DRAM Access Time) as a result received less thorough testing.
The general testing methodology is as follows:
All combinations of Memory Speed, CAS Latency, and Bank Interleave were tested at the least aggressive memory timings, and once that was complete, I changed the first of the "less significant" memory timings to a more aggressive value (Trp was changed from 3T to 2T). I then repeated benchmarks for all possible combinations of CAS Latencies, and Bank Interleaves based on this new timing (12 total combinations). Slashdot really licks my nads. Once this was complete, I changed the value of the next "less significant" memory timing (Tras), and repeated another set of 12 combinations (keep in mind, I left Trp at 2T, the most "optimized" value). This process was repeated for each "less significant" memory timing, and then the entire set (of 96 different combinations) was repeated at an increased clock speed (for a total of 289 different combinations).
As I stated earlier, this results in an uneven number of data points. For example Trp had 36 data points at 3T compared with 252 data points at 2T, and the reverse is true for DRAM Access time (252 to 36).
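The staged procedure above can be sketched as a short enumeration. This is a reading of the text, not the author's script: it assumes each "less significant" timing has exactly two values (which is what makes the full factorial come out to 4,608 = 3 x 4 x 3 x 2^7), and the names below are hypothetical. Under that reading the enumeration yields 288 runs, one fewer than the 289 quoted; the text does not account for the extra run.

```python
from itertools import product

SPEEDS = [100, 133, 166]
CAS = [3.0, 2.5, 2.0, 1.5]
INTERLEAVE = ["Disabled", "2 Bank", "4 Bank"]
# Order in which the "less significant" timings are tightened, one per stage.
MINOR = ["Trp", "Tras", "Trcd", "Command Rate", "Burst Length",
         "Write Recovery", "Access Time"]


def full_factorial_size():
    # Assumes two possible values for every minor timing.
    return len(SPEEDS) * len(CAS) * len(INTERLEAVE) * 2 ** len(MINOR)


def staged_runs():
    """One run per (speed, CAS, interleave) at each stage.

    Stage 0 has every minor timing relaxed; stage k has the first k minor
    timings set to their aggressive value and left there.
    """
    runs = []
    for speed in SPEEDS:
        for stage in range(len(MINOR) + 1):
            aggressive = frozenset(MINOR[:stage])
            for cas, bank in product(CAS, INTERLEAVE):
                runs.append((speed, cas, bank, aggressive))
    return runs


runs = staged_runs()
trp_relaxed = sum(1 for r in runs if "Trp" not in r[3])
access_aggressive = sum(1 for r in runs if "Access Time" in r[3])
print(full_factorial_size(), len(runs), trp_relaxed, access_aggressive)
```

The counts match the uneven split described above: 36 runs with Trp at its relaxed value against 252 at the aggressive one, and the reverse for DRAM Access Time.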
Test RAM:
Crucial was gracious enou
Is the fact that accountants and finance managers (decision makers in PC buying deals) talk as if they understand all these things better than sysadmins. SDRAM, DDRRAM, RambusRAM, L2 cache, on-chip cache and all that marketing crap is heavily used by these decision makers.
Last year, I did a demo of a Via system with SDRAM and it did about 40% faster than a DDR-RAM board. The VP-Fin chap has become highly suspicious of any memory performance graphs or numbers, these days. And in true BOFH style, I've got decision-making rights on all PC purchases.
Thanks to all the confusion.
If you keep throwing chairs, one day you'll break windows....
Speaking as an engineer, I do hate buying new stuff because its cheaper to do so rather than spending time tweaking the old stuff, but 100's of combinations, for a few % increase? Even I would be perfectly happy in paying the money rather than loosing 0.05% of my life!
I meant to hit preview! I guess I deserve it for being a wiseass.
Stupid sexy Flanders.
If Dell sent you, gratis, their high-end gaming machine for you to review and post about on your blog (assuming you have one), would you not mention the fact more than a couple of times in thanks for the free machine?
Sure it's advertising for whomever the vendor was, but it's also a sponsorship of something that the author might have had to pay for himself. Or might have had to do without.
Advanced memory timings, while beneficial in squeezing that last bit of performance out of your system, won't save your server from the Slashdot Effect.
I read several articles that said that PC3200 is not worth the price difference and that in many cases PC2700 is faster.
/.ers have experience with this? Is PC3200 worth the price?
Before I plunk down $$$ for PC3200, I wanted to know if it is worth it. I was hoping this article would help answer the question, but it looks like he is only testing various BIOS settings with a single DIMM, and not comparing memory with different access speeds. Any
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
No it's not slashdotted.
Did you even check the website first? Or were you just wanting to make a snappy comment about bandwidth vs. RAM?
-Dubya
Hey, this is slashdot, it's a safe assumption that the site's buckled under the load.
Ah the poor common slashdotter. Too naive to know that the 'Slashdot Effect' is no more than a myth perpetuated from the Early Days. Too ignorant to realize that server performance has scaled infinitely more than the slashdot crowd over the past years. Living in a dream world.
Very well, hang on to it if it makes you feel better, at least you'll have the power in your imagination.
I'd spread false rumors and lies just to ruin their business. Damned evil capitalist sons-of-bitches!
Squirreled away in there. Teehee.
I've found that dumping beer onto my computers' silicon memory has the same effect as dumping beer down my throat does on my carbon-based memory.
Trolling is a art,
Now if you really had a sense of humor, you would have changed the technical details. But then again, you're only an AC, right? Perhaps I'm expecting too much.
--jdp Maintainer of VisEmacs
You are right except for one part: I should evaluate competing products as well, or leave out the vendor name, or at minimum make an effort not to make it look like an advertisement (for example, fine-printing the product name at the bottom of the page). Only then can I be called "credible", "professional", "unbiased", you know, those buzzwords we all need in our life?
Well, no, because I have something called 'integrity.' Which means I probably won't be getting any free crap anytime soon.
Which is just as well, because crap is still crap, no matter if it's free or not. And if you encourage people to spend money on something that you know is worthless, you're the kind of person that'll be getting a ball peen hammer to the forehead sooner or later.
(-1, Raw and Uncut is the only way to read)
Hmm, what a surprise.
Sisoft Sandra memory bandwidth tests are good at exactly that, memory bandwidth tests.
What would be much more interesting is how programs that rely on small memory latencies (especially scientific programs) depend on changing the CAS.
seriously, for anyone that knows what they are messing with when changing these settings - don't you already know that it increases performance?
must be a slow day.
Synthetic Sandra benchmarks are nice, but I'd rather he ran 3dmark2001. It's what I would've done ::)
-- taking over the world, we are.
Memory Timing Explained.
GamersEd.com
Support Texas Troops use TXGoogle
How can you call this a waste of time? Anything that lets you get 0.0001 extra frames per second in Quake 3 for free shouldn't be laughed at!
"Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
What, only Windows XP was used for the tests as well? Why not run tests also against Linux and BSD on the same hardware too? Your result will vary...
Testing just Windows is like test driving the Indy 500
... driving a Pinto. You certainly won't be getting the full eXPerience.
OT for sure, but just wanted to point out the awful mistake in your sig. "excitied" yeah...
I dunno if integrity has as much to do with it...I mean, had he been comparing two memory types, perhaps, but since the test was on a single hardware configuration, how does it impugn his integrity to thank the hardware provider in a semi-advertising way? Nowhere else does he make any claims good or bad about the Crucial RAM itself that I saw.
"America has done some terrible things. But I know that Americans don't cheer when innocents die." -Dave Barry
The last server shindig I went to for Compaq had all these "advanced" memory options (hotswap, interleave, RAID for RAM [and for RAM only, it wasn't a solid-state disk system]).
Does it really turn out that 4 way interleave is kind of bogus and only a 2-8% increase in performance? I suppose 8% might mean a lot, but on average it could be just 4% or so.
Well, no, because I have something called 'integrity.'
obviously someone from across one of the ponds. here in the states, free crap will trump that there 'integrity' card each and every time.
and hey, now that i've spilled the beans, AMD, i could probably post some benchmarks of your new 64 bit processor if i only had a couple of them to play around with (wink, wink, smp, nudge, nudge).
If it is free and worthless you mention that too. But as we all know, memory is pretty much the same, i.e. SD 133 = SD 133, or DDR 2700 = DDR 2700; it doesn't matter who you got it from.
And yes, I understand that out-of-spec RAM won't perform as well, but I'm talking about properly specced and tested RAM.
Food not Bombs is a nice platitude but it breaks down when you notice that the Bombees are usually well fed
The manufacturers only make RAM with the lowest latencies and just change the labelling because it is more cost effective to do so. But this should not come as a surprise to anyone.
Well, I'll get my flamebait mods, but what a no shit article.
He finally concludes that memory clock speed has a significant effect on bandwidth, while CAS and other settings hardly have an effect at all. Something I've known intuitively all along, and anyone with a rudimentary understanding would know.
In other words, when all those "super dooper case-moddin' overclockin' nothing-knowin' computer experts" paid an extra hundred bucks for a stick of CAS 2 RAM instead of CAS 2.5, they got ripped off. No surprise. A fool and his money, after all.
God bless the kids who think they're super computer savvy, but are absolutely clueless and easily swayed by hype - they subsidize the industry for the rest of us.
I don't need no instructions to know how to rock!!!!
since the _&%_&$#& /. filters do not allow a copy of page 2 i tried to make a mirror: (it is not /.éd now, it just loads slowly)
page2: users.domaindlx.com/leukhe/page2.htm
page3: users.domaindlx.com/leukhe/page3.htm
By the way, the parent post is including items like "VA IS DEAD" in his text.
Aww, screw it. Let's just spell 'em both "lusing" and get it over with.
Imagine this-
A few ticks backwards of the clock
One guy
A limited number of brain cells
One overgrown ego
A couple of feathers
Some glue
Running at a good clip
A decent sized cliff
Some nice jagged rocks
Every time said guy runs off the cliff with a couple of feathers glued to his back he ends up a chunky wet spot on the jagged rocks.
But that won't stop some other idiot from saying He just needed-
More feathers
Straps instead of glue
To run faster
A taller cliff
Less pointy rocks
Let's not get into the point of how it's always someone who isn't willing to jump off the cliff that swears it would work if someone just followed HIS plans
At some point, you stop coming out to watch the spectacle because you have seen all the possible pretty splatter patterns.
AND MAYBE, just MAYBE, you start considering some other alternatives
This guy found a good number of idiots to run off a cliff. I applaud him for it
---"What did I say that sounded like 'Tell me about your day?'"---
Wow!!!!%#@
You mean reducing RAM latency doesn't increase bandwidth?!?!#%!1 d00d!
*sigh*
This benchmark would have been vastly more informative if the guy had gotten his tests and statistics right. First, he needs to learn the difference between a median and a mean, which are very different. Second, actually testing latency might have been nice, considering that one of his independent variables is CAS latency. Not to mention the fact that the hardcoded pixel widths in the stats table are horribly wrong on a high-DPI system. People! The em is your friend!
So basically what we have here is this:
- Independent variables: bus speed (read: bandwidth), CAS latency, interleave (read: latency/bandwidth).
- Dependent variables: bandwidth
Quite frankly, if I had submitted this experimental design, my advisor would still have me tied to a table in the back end of the psych building. He's not measuring what he's manipulating, and throwing in a two-factor confound like bank interleave without compensating (though the article may be misleading) just skews the measurements.
Ah, well. I'll go back to my completely untweaked Athlon and be happy. :-)
Actually I'm surprised that MB/BIOS/Memory isn't self-tuning. Pick your goal (stability, speed) and let it go from there.
Someone needs to take a course, or read a book on Design of experiments.
Then he could have gotten the same information with many fewer test runs.
Also you could end up with interaction effects, which is nice. Maybe two settings have a greater or lesser effect.
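A minimal sketch of what the post above means by interaction effects, using a plain two-factor, two-level factorial. The function name `effects` and the scores in the usage note are made up for illustration; the "many fewer test runs" point refers to fractional designs, which this sketch does not attempt.

```python
from itertools import product


def effects(response):
    """Main effects and interaction for a 2x2 factorial design.

    `response` maps (a, b), with a and b each coded as -1 (low) or +1 (high),
    to a measured score. Each effect is the average score at the +1 level
    minus the average score at the -1 level of the corresponding contrast.
    """
    runs = list(product((-1, 1), repeat=2))
    half = len(runs) / 2
    main_a = sum(a * response[(a, b)] for a, b in runs) / half
    main_b = sum(b * response[(a, b)] for a, b in runs) / half
    inter_ab = sum(a * b * response[(a, b)] for a, b in runs) / half
    return main_a, main_b, inter_ab
```

With hypothetical scores `{(-1, -1): 100, (1, -1): 110, (-1, 1): 104, (1, 1): 120}`, where A might stand for clock speed and B for CAS latency, the result is a main effect of 13 for A, 7 for B, and an interaction of 3: raising A helps more when B is also at its high level.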
I did my own memory timing tests, I found that I got 40 more FPS in Q3 with DDR200@2-3-3-5 than at DDR166@2-3-3-5. When I lowered the timings to DDR166@2-2-2-5 I gained back all but a few of those 40FPS. To me that looks like memory timings have a significant effect in Q3 at least.
1: He didn't disable the system cache; this could have caused the >theoretical performance increase.
2: In his final analysis he failed to mention that there must be another bottleneck in the system causing the substandard performance increases.
3: He only tested memory transfer and not random access, page faults and all the other things that really slow your computer down.
If you're after max performance then you're going to buy the best anyhow; if you're not, then a PC still using PC133 memory will be fine.
thank God the internet isn't a human right.
For some applications, like a main memory database, latency is key (bandwidth can be way below maximum for a large number of small random reads/writes). It looks like only large transfers were tested. None of his results surprise me. What I don't know is how certain settings affect latency.
most do exactly that, pick from safe/optimized/performance, etc
What he has proven is that under some conditions it might be possible to tweak some extra performance out of your memory AS LONG AS the only thing you do with your computer is run Synthetic Sandra.
The first major problem I have is that he didn't test enough data points to prove that the differences in memory performance were actually due to the tweaks he did, and not thanks to some other environmental condition like sun spots, power surges, or what he ate for breakfast. His ENTIRE test suite only eked out a couple of percent difference; he would have to run each point several times just to establish that those couple of percent weren't a fluke.
Second, and more important, even if we assume his test data is 100% accurate, it doesn't simulate real-world conditions. Benchmarks are great for establishing ball-park performance numbers (within 1-2 sigma, *maybe*), but they are definitely not representative of how people use a computer. If I'm reading a web page, for instance, I might only access memory intermittently as I scroll the page up and down. But, then again maybe I'm listening to mp3s while I read that web page, so I am intermittently accessing one part of memory, while periodically (as in, at a regular interval) accessing another part of memory. And, let's not forget that I'll randomly start up new applications, close old ones, etc. which means that there's going to be extra time where my solid-state memory is actually waiting on data from either virtual memory (usually equivalent to the HDD), my CD-ROM drive, or perhaps from the network. None of this is really being factored into the benchmark, but it's something I do all the time which has a much larger impact on my perceived memory performance than the difference between his worst test settings, and his best test settings.
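The first objection in the post above (run each point several times before trusting a difference of a couple of percent) can be sketched with a Welch t statistic over repeated runs. This is one common way to do such a check, not something the article or the poster specifies; the function name and the sample scores in the usage note are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in mean score between two
    configurations, each measured several times.

    A value near zero means the gap is small relative to run-to-run noise;
    a large absolute value means the gap is unlikely to be a fluke.
    """
    var_a = stdev(sample_a) ** 2 / len(sample_a)
    var_b = stdev(sample_b) ** 2 / len(sample_b)
    return (mean(sample_b) - mean(sample_a)) / sqrt(var_a + var_b)
```

For instance, a 40-point gap between `[2100, 2104, 2096, 2100]` and `[2140, 2144, 2136, 2140]` gives a t of about 17, while the same 40-point gap between `[2100, 2160, 2040, 2100]` and `[2140, 2080, 2200, 2140]` gives a t of about 1.2, which is indistinguishable from noise.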
I learned this the other day from an article at Tom's Hardware. In retrospect it makes logical sense but I don't think it would have occurred to me. We're sorta trained to think faster == more performance.
Anyways, what the article discovers is that you'll get BEST performance when memory speed == FSB speed. In benchmarks they find that an Athlon 3000+ (333MHz FSB) with DDR333 is faster than the 3000+ with DDR400 (or DDR444). So, mental note, when shopping for a system, don't bother paying extra for that faster RAM, just get whatever matches your FSB.
creating an async situation with overclocked memory will be slower than sync memory with the cas tweaks.
try it and you will see.
Those of you out there running with >=1 gig of memory should be looking at ECC. How much this hits performance would be an interesting subject. I've heard ~5%
At least I don't worry about cosmic rays; I use ECC!
Power tends to corrupt, and absolute power corrupts absolutely.
He concludes that running memory clocked at the memory speed = FSB is better than running mem speed > or < than FSB. He never says that running 400MHz DDR at 333MHz will be slower than DDR333 at 333. If you bother paying extra for that faster RAM, you can run it at 333MHz and it will be just as fast...
The big thing about underclocking the bus speed, though, is that you can now overclock the latencies. You can make that CAS2.5 pc3200 a CAS2 pc2700 and tweak other latency settings, too. It also means that if you decide to upgrade to a 400MHz FSB Athlon, you'll be prepared with memory tested to support that.
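The CAS2.5 PC3200 versus CAS2 PC2700 trade described above can be checked with simple arithmetic. A rough sketch, assuming the standard DDR clocks of 200 MHz for PC3200 and 166.67 MHz for PC2700, a 64-bit (8-byte) module, and two transfers per clock; the function names are made up, and CAS latency is only one component of the real access time.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """CAS latency in nanoseconds: cycles multiplied by the cycle time."""
    return cas_cycles * 1000.0 / clock_mhz


def peak_bandwidth_mb_s(clock_mhz, bus_bytes=8, transfers_per_clock=2):
    """Theoretical peak bandwidth in MB/s for a DDR module."""
    return clock_mhz * transfers_per_clock * bus_bytes


PC3200_MHZ = 200.0
PC2700_MHZ = 500.0 / 3  # 166.67 MHz

print(cas_latency_ns(2.5, PC3200_MHZ))   # CAS2.5 at 200 MHz
print(cas_latency_ns(2.0, PC2700_MHZ))   # CAS2 at 166.67 MHz
print(peak_bandwidth_mb_s(PC3200_MHZ))   # basis of the "PC3200" label
print(peak_bandwidth_mb_s(PC2700_MHZ))   # about 2667, marketed as PC2700
```

In absolute terms the two settings are close: 12.5 ns against 12.0 ns. That is why a module rated CAS2.5 at 200 MHz is a plausible candidate for CAS2 once it is underclocked.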
This article is stupid because almost all of the tweaks affect latency, but his benchmark is bandwidth. Not much useful information can be gained from it. The Tom's article is much better, but you have to add other knowledge to use it correctly.
IANAL, but I play one on
Can't be true...
Seriously, this guy doesn't know how to run a good test setup.
First off, he tested all these super specialized memory timings using a stick of RAM that was rated CL2.5. So he was overclocking it and stressing it when he ran a lot of the low latency settings tests. A better setup would've been to get the best darned stick of RAM and THEN test how the timings affect performance.
Next, almost all of the timings he adjusted in the tests affect latency not bandwidth, but he used bandwidth as his ONLY benchmark. If a program is swapping small amounts of data, but VERY quickly and often, latency has more of an effect than bandwidth.
Finally, he doesn't address asynchronous bus speed issues or how well some of his unattainable settings would work (because of my first complaint, his memory was unstable at the aggressive timings).
I'm not a statistician, but it doesn't appear to me that he really understands some of the statistical methods for a good test. This is what I've garnered from reading other slashdot posts, at least.
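The latency-versus-bandwidth point in the post above is usually tested with a pointer chase: every load depends on the result of the previous one, so throughput cannot hide access time. Below is a sketch of that access pattern only. The names `random_cycle` and `chase` are hypothetical, and in Python the interpreter overhead dwarfs DRAM latency, so the timings it returns say nothing about CAS settings; a real test would use the same structure in C or similar with a buffer far larger than the CPU cache.

```python
import random
import time


def random_cycle(n, seed=0):
    """Sattolo's algorithm: a random permutation forming one single cycle,
    so following nxt[] from any index visits every element exactly once."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the chain for `steps` dependent loads.

    Returns the final index and the average seconds per step.
    """
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = nxt[idx]
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps
```

A bandwidth test such as the one in the article instead streams through memory in order, which is the pattern prefetching and burst transfers are built to serve well.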
IANAL, but I play one on
Did he disable these caches? Seems to me if not then he really may not have tested anything. If he did then I can't see why there is so little difference between some of the tests.
Of course, I've never run these benches so I don't know. I'm just asking.
It's generally known that smaller and more aggressive memory timings combined with higher clock speeds lead to higher performance
Inventive use of the term "generally known"...
Vs lbh pna ernq guvf, ybt bss abj. Tb bhgfvqr. Syl n xvgr.
Neither Linux nor Windows knows how to fuck with your memory timings; that's the boot ROM's domain. They'd have to have motherboard specific code to manage that. If any OS is liable to adjust memory timings, it's Windows, with whatever crappy mobo drivers they bundled with your purple monster.
Black holes are where the Matrix raised SIGFPE
I guess it depends on what sort of benchmark you were doing, but I find it very odd that an SDR system beat a DDR system unless something was done to stack the competition in favor of the SDRAM system. (Small loop that fits in CPU cache combined with a faster CPU for example.)
For programs that fit in CPU cache, or spend a lot of time on the cache contents before moving on to a new dataset, DDR vs. SDR doesn't matter as much as programs that are not cache-efficient. (FYI, older FPS games, esp. Quake I/II and maybe III, were known for being very cache-efficient, where 128kb cache was "enough" and having 128kb of corespeed cache was better than 256k of halfspeed cache. This and its inherent overclockability are why the Celeron was so popular with gamers. But the moment you needed more than 128k of cache the Celeron sucked and the fact that the PII had double the cache made it win despite the cache being half the speed.)
If you perform operations on datasets larger than the CPU cache (Large matrix multiplications for example, which are common in scientific computing), memory bandwidth makes a HUGE difference. A few years ago I worked in a scientific research facility. We had a benchmark that performed increasingly large matrix operations. When I benchmarked it on my 800 MHz Athlon at home, I tried both 100 and 133 MHz memory speeds. For small matrix operations, they were even. For larger operations, the extra memory BW made a huge difference. Later I benchmarked my 1.1 GHz Athlon with DDR memory against it - It was only a bit faster for small matrices, but for large matrices it was significantly faster thanks to the DDR memory.
The neat thing about the benchmark was that you could clearly see the effects of various caches, as performance would drop like a rock when you exceeded a cache size. The exception was one of the SGI Origins or an old Cray at work, which had INSANE memory bandwidth and only had a gradual dropoff.
retrorocket.o not found, launch anyway?
Apart from the lack of latency testing, why didn't he use programs to tweak this kind of stuff? There are programs that run from FreeDOS, that for example can adjust about all settings of your chipset.
Then also use a good benchmark program that can run from FreeDOS and you're ready way faster. Not to bash any (more) real OS, but DOS boots way fast, it's easier to 'deploy' than making your own memory-benchmark-OS.
Could someone please post a link to a page with good overview of what PC2100/PC2700/... are, SDRAM/DDRAM/... On what kind of motherboards they should go ? What bus ? With which processors, etc...?
And also what all the little tweaks in the BIOS are (latency, ECC, scrubbing...) I tried searching google, but it's always vendors' hype.
Non-Linux Penguins ?
I first tried to copy and paste page 2 into a post. I thought it would look better in fixed font. The filters did not allow the table. I forgot to preview and now i have an ugly post that will be forever stored into the internet. But the mirror is there!
I would! I would actually put a few banners on the page for their generosity! UNGRATEFUL SCUM actually I got some free shit for testing back in the boom days, I used to write articles on hardware just to get freebies, here's a link
There is no god
THIS POST 0WNED BY M4RC TH3 P1R473, AS HE WAS NOT MENTIONED!