The Impact of Memory Latency Explored
EconolineCrush writes "Memory module manufacturers have been pushing high-end DIMMs for a while now, complete with fancy heat spreaders and claims of better performance through lower memory latencies. Lowering memory latencies is a good thing, of course, but low-latency modules typically cost twice as much as standard DIMMs. The Tech Report has explored the performance benefits of low-latency memory modules, and the results are enlightening. They could even save you some money."
Cached link, in case original gets Slashdotted
The article seems to finish off with, "Don't bother with these unless you want to overclock"
Which makes sense... or not. I'd just buy 533 ram and be done with it.
The real question is: can I buy 533MHz ram and run it slower with lower latencies?
110100 1101000 1101000 1100110 0 1101111 1101000 1100011 1
The link seems to crash my Firefox...
A bug to be reported, or what is happening?
I'd have to say this is right on when applied picking a woman to spend your life with... low-latency memory is a BAD BAD thing, and VERY expensive. My next time around, I'm going with the "CHEAPER", high-latency model that can't immediately recall everything I've ever said while arguing her point... Roses and jewelry can cost you over the long run friends...
Don't anthropomorphize computers: they hate that.
I have no doubt that hardcore PC gamers will shell out the cash for these, regardless of the cost/performance ratio. Once you start paying $500+ for a graphics card, all rational decision making skills are lost.
domain combinatorics
Teh LATENCIES MAKE ALL THE DIFFernence I run with my uber memory timings on my gentoo system => http://funroll-loops.org/
http://anandtech.com/memory/showdoc.aspx?i=2392
You'll basically find that the performance of value memory is on par with the high end stuff. You basically pay for the ability to overclock on a more consistent basis.
I'm running Firefox 1.0.7 under Ubuntu. When I click on the link firefox exits, am I the only one having this problem?
Ed Almos
The more corrupt the state, the more numerous the laws. - Tacitus, 56-120 A.D.
Although I didn't read all the text (about 50% of it), the benchmarks were what I was interested in, as well as the conclusion. So to sum it up:
2-2-2-5 timings at 400MHz T1 memory is the fastest but costs twice as much, and the performance gains are almost nonexistent except in lower-resolution games (i.e. at 800x600 you may see an increase of 20 fps, which I think is a lot!). Of course the cost of the RAM in this case would not be justified, because putting that extra money into a better video card would be the better thing to do.
Only if you're an overclocker is this worth it, at least from their benchmarking and perspective, which I'll accept.
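To put timing strings like 2-2-2-5 in perspective, here is a minimal Python sketch that converts them into nanoseconds. It assumes the usual CL-tRCD-tRP-tRAS ordering and that "400MHz" refers to the DDR data rate (so the actual clock is half of that); the function names are made up for illustration.

```python
def cycle_time_ns(data_rate_mhz: float) -> float:
    """Clock period in ns for DDR memory quoted by its effective data rate.

    DDR transfers data twice per clock, so "400MHz" DDR runs a 200 MHz clock.
    """
    clock_mhz = data_rate_mhz / 2.0
    return 1000.0 / clock_mhz


def timings_to_ns(timings: str, data_rate_mhz: float) -> dict:
    """Convert a 'CL-tRCD-tRP-tRAS' string such as '2-2-2-5' into nanoseconds."""
    names = ("CL", "tRCD", "tRP", "tRAS")
    cycles = [float(part) for part in timings.split("-")]
    period = cycle_time_ns(data_rate_mhz)
    return {name: count * period for name, count in zip(names, cycles)}


if __name__ == "__main__":
    for timings in ("2-2-2-5", "3-3-3-8"):
        print(timings, timings_to_ns(timings, 400))
```

Under those assumptions, going from CL3 to CL2 at this speed saves one 5 ns clock per access, which is small next to the total time a cache miss costs.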
Oh yes, and that website also crashed my Firefox.
If you wanna show off your uber-low SuperPi times you need to be at or below 2.5-3-3-7 T1 ;-)
http://superpi.radeonx.com/
and yes we all know PiFast is much faster than SuperPi:
http://home.istar.ca/~lyster/chart.html
By the same logic as those cheesy insurance commercials, where you can afford the policy if you can afford a cup of coffee a day, if you can afford to spend 5 minutes reading slashdot each day, then you can afford not to drop twice the amount of money on ram for the 2% time savings it offers in most programs.
Although tighter memory timings and a 1T command rate can certainly improve the performance of the Athlon 64's memory subsystem, that improvement doesn't always translate to better application performance. In fact, with the exception of the Sphinx speech recognition engine, moving to tighter memory timings or a more aggressive command rate generally didn't improve performance by more than a few percentage points, if at all, in our tests. Lower latencies only improved WorldBench's overall score by a single point, and performance gains in games were generally limited to lower resolutions and detail levels.
At the end of the day, the appeal of low-latency memory modules may be limited to overclockers and enthusiasts intent on squeezing every last drop of performance from a system. More pedestrian "value" memory should be plenty fast enough for everyone else, especially since you can practically afford twice as much.
Crow T. Trollbot
Seriously, this has been known very well amongst the gaming PC builder crowd for a long time. Most of them, anyways; there's unfortunately still that level at which people know enough to put the PC together, but don't know enough to tell you what any of the numbers mean.
The difference between, say, Corsair Value Select memory, and Corsair 1337 Ultra X2000 - the memory equipped with LCDs, heat spreaders, and a spoiler with metal-flake yellow paint that add at least 10 horsepower - is going to be absolutely unnoticeable in the real world. Even benchmark scores will show little to no improvement.
Ricer RAM - you know, the PC equivalent of this crap - is for overclocking. If you're not planning on overclocking it, you're paying too damned much.
But they're doing this on an AMD-64 platform...
In the short-run, these tests help a person decide whether to buy low-latency RAM. But they provide little long-term insight into how much faster the entire system could be if low-latency were the norm and compilers, libraries, operating systems, and applications were re-optimized for low-latency, not high-latency, architectures.
Two wrongs don't make a right, but three lefts do.
Improvements in memory speed crawl compared to improvements in CPU speed. Larger caches can mitigate this problem to a certain extent, so why is it that growth in cache size continues to crawl? The Apple G5 updates FINALLY gave us 1mb l2 cache per core (and of course the industry standard 64k L1 cache per core), and while the Intel/AMD world is slightly better in this regard, it's not by much. So why is it so hard to increase cache size? (Of course you will need good cache allocation/replacement policies to go with them.) I'm not trolling, I honestly want to know. I realize that the people that design these chips are a lot smarter than I, but so far I haven't really seen a good reason why they don't increase cache size.
Also, outside of the HPC world, it seems very few programmers optimize their cache usage. Are there any tools(open source or otherwise) that can actually help you locate/fix inefficient uses of cache?
Monstar L
ExtremeTech Article
If people are concerned about the speed of their memory, then having fast DDR SDRAM running on an equally fast FSB is what really makes a difference. This is especially true on P4 Celeron based systems where the L2 cache isn't huge and cache misses are common. While memory latency is important to consider, it isn't critical that your modules have the absolute fastest timings ever. I think that the importance of the other components that connect to your memory like the FSB are underestimated. You can have fast memory, but still have it traveling over a slow or congested bus.
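A rough way to see the bus argument is to compare peak transfer rates. The sketch below is only a back-of-the-envelope model: it assumes a 64-bit (8-byte) data path for both memory and FSB, and the specific FSB and memory speeds in the example are illustrative picks, not figures from the post.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: float,
                        bus_width_bits: int = 64,
                        channels: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s: transfers x bytes per transfer x channels."""
    return mega_transfers_per_s * (bus_width_bits / 8) * channels


def effective_peak_mb_s(memory_mb_s: float, fsb_mb_s: float) -> float:
    """The CPU can never see more than the slower of the two links delivers."""
    return min(memory_mb_s, fsb_mb_s)


if __name__ == "__main__":
    dual_ddr400 = peak_bandwidth_mb_s(400, channels=2)   # assumed dual-channel DDR400
    slow_fsb = peak_bandwidth_mb_s(400)                  # assumed 400 MT/s FSB
    print("memory peak:", dual_ddr400, "MB/s")
    print("FSB peak:   ", slow_fsb, "MB/s")
    print("usable peak:", effective_peak_mb_s(dual_ddr400, slow_fsb), "MB/s")
```

In this hypothetical pairing the memory could supply twice what the bus can carry, which is the "fast memory, slow bus" situation described above.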
... Is not memory performance as such, but system performance. If a 5 percent increase in system performance increases the cost of your system by 10 percent, you have to want it pretty badly or be on the edge of required performance or just be in a schoolyard comparison. But if it's reversed, and a 10 percent increase in system performance can be had for a 5 percent increase in system price, then if you can afford the 5 percent (say $100 for a $2000 system), go for it.
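The two cases in that argument can be worked out as performance per dollar. This is a small sketch with made-up function names; the percentages and the $2000 system price are the ones from the comment.

```python
def value_ratio(perf_gain_pct: float, cost_increase_pct: float) -> float:
    """Performance per dollar after the upgrade, relative to before (1.0 = unchanged)."""
    return (1 + perf_gain_pct / 100) / (1 + cost_increase_pct / 100)


def upgrade_cost(base_price: float, cost_increase_pct: float) -> float:
    """Extra money spent for a given percentage increase in system price."""
    return base_price * cost_increase_pct / 100


if __name__ == "__main__":
    print("5% faster for 10% more:", round(value_ratio(5, 10), 3))
    print("10% faster for 5% more:", round(value_ratio(10, 5), 3))
    print("5% of a $2000 system: $", upgrade_cost(2000, 5))
```

A ratio below 1.0 means each dollar buys less performance than before, above 1.0 means it buys more.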
-- Jim Crigler In 1937, I began, like Lazarus, the impossible return. -- Whittaker Chambers
I mean, did anyone seriously think that these memory latencies were going to have a great impact on anything that most common users care about? Game performance is barely touched at all, which is another Duh! I think their conclusion is probably right: the people buying these things are the idiots who want to post how they have the ultimate system with great RAM and everything, when they probably could only afford 1 GB of the stuff. My performance and load times are better because I could afford 2 GB of my 'slower' RAM.
Geeze, must be a slow week for hardware...
"Some days you just can't get rid of a bomb."
"Software developers have spent years optimizing their code to mitigate the impacts of latency."
Really? MS hand-tunes the ASM code generated when they do a build of winword.exe? Maybe that's why OO.o is so slow?
If I sound sarcastic, I suppose I am. With a few exceptions, almost every coder I've worked with across multiple jobs has been of the 'throw CPU cycles at the problem' school. I can count on one hand those who actually design for a HW architecture, since most of the coders these days are VBScript and Java kiddies.
I want to delete my account but Slashdot doesn't allow it.
Anyone have any benchmarks on how much faster a compile, DB or app server gets with better memory like this?
So talking about optimisation for low-latency RAM is, I suspect, nonsense. What we are surely seeing here is that the actual limitation on memory bandwidth is somewhere else - in the memory controller, in the cache controller, in the CPU fetch rate, in the rate at which stuff is being fetched from hard disk, in bus contention. Overclocking - speeding up memory controllers and buses - will have an effect. Reducing the number of wait states on the memory bus will not have much effect on performance if the total number of active memory cycles in a given period is largely unchanged.
If you had a need for real speed in an application which was not dependent on the graphics subsystem or access to network and HDD, I am sure you could get much more performance out of low-wait state RAM, but you would do it by HARDWARE design, not by software optimisation.
As a simple example from the dim and distant past when I was building hardware, TI used to have a microcontroller called the TMS9995 which ran at, for the day, a hefty 12MHz. With the slow DRAM of the time, it always needed a wait state and this meant that it could manage, as I recall, two memory accesses per microsecond. With static RAM, it could manage 3. The 9995 actually stored its working registers in external memory and so this meant a real world speedup of nearly 30%. The 8088, on the other hand, kept its working registers on-chip and had a limited instruction pipeline. As a result, the equivalent speedup was nothing like 30%. This was due to hardware differences not software differences.
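One way to read the TMS9995 numbers is through Amdahl's law: memory got 1.5x faster (3 accesses per microsecond instead of 2), but the whole system only sped up by roughly 30%. The sketch below applies that model; treating the workload as a single "memory-bound fraction" is a simplifying assumption, not something stated in the post.

```python
def overall_speedup(memory_fraction: float, memory_speedup: float) -> float:
    """Amdahl's law: only the memory-bound share of run time gets faster."""
    return 1.0 / ((1.0 - memory_fraction) + memory_fraction / memory_speedup)


def memory_fraction_for(overall: float, memory_speedup: float) -> float:
    """Inverse question: what memory-bound share explains an observed overall speedup?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / memory_speedup)


if __name__ == "__main__":
    mem_speedup = 3 / 2  # 3 accesses/us with static RAM vs 2 with slow DRAM
    share = memory_fraction_for(1.30, mem_speedup)
    print(f"memory-bound share implied by a 30% speedup: {share:.2f}")
    # A chip that keeps its registers on-die spends less time waiting on memory:
    print(f"speedup at a 20% share: {overall_speedup(0.20, mem_speedup):.2f}")
```

The same formula shows why a CPU with on-chip registers, like the 8088 in the comparison, gains far less from the same memory upgrade.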
In fact, the applications which really test out the memory subsystem are not games - they are databases and webservers, which hardly use the graphics system at all. And in these cases, for low end systems, the big beast in the equation is cache. It's quite astonishing how a Pentium-M can churn through a badly designed join while a low end AMD 64 struggles, simply because one has 2Mbytes of cache and the other has only 512K. As a result, for ordinary technical laptop and desktop work, I now specify Pentium-M, the AMD 64 with 1Mbyte cache, or pentium-D with 1Mbyte per core. You know it makes sense.(And now everyone can explain why I'm wrong, in my turn)
Pining for the fjords
I remember an article published on Slashdot a year or two (or three or four?) ago about a memory timing analysis. A copy is available here:
http://www.techwarelabs.com/reviews/memory/memory_timings/index_3.shtml
The net impact of the CAS latency: 0-0.002%. Almost _nothing_.
The great thing about this article is that it goes into just about every aspect available in the bios, giving you a good idea of what _does_ work (a brief scan-through reveals clock speed as the primary contributor, dram command rate and Active to Command (Trcd?)).
Happy reading.
-DrkShadow
In an interview nvidia's chief blatantly said such. He sees no end in sight to how much they can gouge. The tools who spend tons of money for pocket lining haven't balked at the price hikes so far, so nvidia et al intend to keep hiking.
How's about this one? 2004, ExtremeTech
http://www.extremetech.com/article2/0%2C1558%2C1637762%2C00.asp?kc=ETRSS03039TX1K0000564
Google Images
PDF from company
Note, due to their width, you can only put in one per bank. :)
Ostentation doesn't work so well when inside an opaque case.
Ok, here's my stance on ram (considering my wallet just went up in flames while testing ram... i think i have some things to say about it)
This is on an AMD Sempron 757 socket, 3000+ AMD 64 bit cpu, and a gigabyte k8u motherboard. running windows xp pro 64. nvidia geforce 6600 (agp 8x)
I used Performance Mark(64) to benchmark, and FEAR and Sims 2 for real world testing.
Ok, onto the results:
config 1 (2x512) total of 1 gig of memory
value ram 266mhz, through tweaking with the motherboard i scored 380 to 400
there were no cas timings or anything available to be read
fear chokes every few minutes, but overall an awesome game that only lags a little in complex scenes
sims 2 chokes every minute or so... and full-fledged crashes the system every hour or so.
config 2 (3x512) total of 1.5 gig of memory
value ram 266mhz, through tweaking i scored anywhere from 390 to 405. considering the large jump in memory, i woulda expected a bigger difference
fear completely stopped choking
sims 2 crashed less often, but still choked every minute or 2
config 3 (1x512, 1x256) total of 768 megs of ram
value ram 333mhz, through tweaking i scored anywhere from 398 to 415. the memory speed seemed to make a large difference, but the timings were HORRIBLE... cheap ass ram 3-3-3-8
fear ran like crap, barely playable.
sims 2 didn't crash but still choked a lot
config 4 (2x512) total of 1 gb
patriot dual channel memory kit 400mhz (2-3-2-5). here's when i went nuts, dropped 109 bux
ram would NOT overclock, at all.... even if i only increased it a few mhz it would crash the whole system. i scored 440 to 450 by overclocking just the cpu speed.
fear was smooth as hell, and an awesome play. i was even able to increase the options of the game to allow more detail, without choking
sims 2 choked twice within an hour... but never crashed
config 5 (3x512) total of 1.5 gb
patriot dual channel memory kit 400mhz (2-3-2-5), and a cheap value stick of geil memory (3-3-3-8 @ 400mhz). again, ram would NOT overclock, at all.... i scored 430 to 435 by overclocking just the cpu speed. as expected... the timing made more difference than the amount of ram.
didn't even try fear or sims 2... so i can't tell ya
So, what's my conclusion? try it before you buy it!!!! ask a buddy if you can borrow his ram for a few minutes, so you can benchmark your system, and try out some games. see if the difference is worth it to you. if you don't have friends, which hey.... if you're reading /. there's a high chance of it, lol... darn antisocial nerds... but anyway... my recommendation is as follows...
266mhz to 400mhz makes a HUGE difference, that right there is the biggest boost you will see, that's worth an extra 40/50 bux (to me)
3-3-3-8 is pure and utter crap.... don't touch it with a ten foot pole, a full 1.5 gig of it is far WORSE than 1 gig of decent ram.
2-3-2-5 is pretty good, if you sit n wait for a decent sale you can get a good price... i searched for a month before i dropped my bux on it.
2-2-2-5 is the best you'll see (in my price range anyway), but it's expensive, and i honestly don't think it's worth the extra 70 bux difference from the 2-3-2-5 price tag.... but i also didn't test it.
all of the equipment i use was bought at frys, they sell stuff on www.outpost.com, but i warn you, if you go to the physical store.... expect them to try to rip you off.... they tried selling me returned ram, broken motherboards, bad hard drives, a copy of xp pro 64 with a frickin scratch on the cd!!! (it was obviously used). i haven't heard many bad things about the online policies... but i haven't tried them
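For anyone who wants the benchmark ranges from the five configs side by side, here is a small Python sketch that takes the midpoint of each reported range and compares it with config 1. The scores are the ones quoted in the post; the helper names are invented, and configs 4 and 5 were scored with the cpu overclocked, so the comparison is not strictly like for like.

```python
# Score ranges as reported in the post (low, high) for each configuration.
SCORES = {
    "config 1": (380, 400),  # 2x512 value ram, 266mhz
    "config 2": (390, 405),  # 3x512 value ram, 266mhz
    "config 3": (398, 415),  # 512+256 value ram, 333mhz, 3-3-3-8
    "config 4": (440, 450),  # 2x512 patriot, 400mhz, 2-3-2-5, cpu overclocked
    "config 5": (430, 435),  # patriot kit + geil 3-3-3-8 stick, cpu overclocked
}


def midpoint(score_range):
    """Middle of a (low, high) score range."""
    low, high = score_range
    return (low + high) / 2


def gain_over_baseline_pct(name, baseline="config 1"):
    """Percentage change of a config's midpoint score relative to the baseline."""
    base = midpoint(SCORES[baseline])
    return (midpoint(SCORES[name]) - base) / base * 100


def best_config():
    """Name of the configuration with the highest midpoint score."""
    return max(SCORES, key=lambda name: midpoint(SCORES[name]))


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: midpoint {midpoint(SCORES[name]):.1f}, "
              f"{gain_over_baseline_pct(name):+.1f}% vs config 1")
    print("best:", best_config())
```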
Do NOT goto this URL http://www.forthesims.com
I do large 3D thunderstorm simulations. With some of the larger simulations I am integrating lots of things, contained in 3D floating point arrays, over 1 billion or more gridpoints (using distributed computing, such as a beowulf cluster made up of dual Xeons or an SGI Altix system). Each scientific calculation requires accessing floating point values stored in these arrays, doing some math, and updating another array.
Memory latency and memory bandwidth both impact how long it takes my simulations to complete. Let's say it is the difference between a simulation taking a week vs. five days... this is significant to me and how much I can get done. With these heavy duty scientific models and such, you really can see a noticeable benefit with the fancier hardware, and clock speed is certainly not the only factor to consider by a long shot.
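To get a feel for the scale involved, here is a rough Python estimate of how much data one such array holds and what the week-versus-five-days difference means as a speedup. It assumes single-precision (4-byte) values, and the 3.2 GB/s bandwidth in the example is a hypothetical per-node figure, not one from the post.

```python
def array_bytes(gridpoints: int, bytes_per_value: int = 4) -> int:
    """Memory needed for one 3D field, assuming one value per gridpoint."""
    return gridpoints * bytes_per_value


def sweep_seconds(bytes_moved: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on the time for one pass that is limited purely by bandwidth."""
    return bytes_moved / bandwidth_bytes_per_s


def runtime_speedup(before_days: float, after_days: float) -> float:
    """How many times faster the run completes."""
    return before_days / after_days


if __name__ == "__main__":
    one_field = array_bytes(10**9)          # 1 billion gridpoints
    print("one field:", one_field / 1e9, "GB")
    # Read one field, update another: two fields' worth of traffic per pass.
    print("one pass at 3.2 GB/s:", sweep_seconds(2 * one_field, 3.2e9), "s")
    print("week vs five days:", runtime_speedup(7, 5), "x")
```

With arrays this large almost nothing stays in cache, which is why memory behaviour shows up directly in run time for this kind of work.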
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
Does RAM Latency Matter?
"This is on an AMD Sempron 757" should be "This is on an AMD Sempron 754"
".... i went with the 109 bux for the (2x512, 2-3-3-5, 400mhz, patriot) from frys." should be ".... i went with the 109 bux for the (2x512, 2-3-2-5, 400mhz, patriot) from frys."
also, i'd like to say... if you're planning on overclocking your memory, do NOT buy patriot ram.... go with the centon stuff
Do NOT goto this URL http://www.forthesims.com
I admit, I only read the conclusion (it's the only thing I read on articles like that), but I'm rather disappointed that the author didn't mention overclocking. Low latency memory is almost exclusively marketed to enthusiasts--and that's overclockers.
The reason low latency memory is interesting is its overclockability. So comparing performance at stock CPU clock is just a silly waste of time. The whole point is now you can mess with your multipliers, bus, and clock and get the whole machine running faster.
I saw a couple other people mention this but I wanted to emphasize that this article really missed the point, I think.
If you can read some Czech (or numbers and colours, at least :) ), try this article. Interesting part begins here - throughput and some benchmarks with constant timing and increasing frequency (two pages). Then there are tests with two frequencies and three timings.
Tests are done on A64 X2 3800+.
Price. Well, price and size, but mostly price.
:)
Cache isn't some magical thing. It's simply RAM. SRAM, usually, which is why it's so fast (don't have to waste power/time refreshing your contents). At the end of the day, it's just some very fast RAM. It sits between your CPU and the rest of your RAM, and uses its increased speed to "trick" the CPU into performing as if your main RAM is much faster than it is.
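The "trick" can be put into numbers with the standard average memory access time formula: hit time plus miss rate times miss penalty. The sketch below uses that formula; the 1 ns hit time, 5% miss rate and 60 ns / 50 ns penalties are illustrative assumptions, not measurements.

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    standard = amat_ns(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_ns=60.0)
    low_latency = amat_ns(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_ns=50.0)
    print(f"standard RAM:    {standard:.2f} ns per access on average")
    print(f"low-latency RAM: {low_latency:.2f} ns per access on average")
```

With a high hit rate, shaving 10 ns off main memory only moves the average by a fraction of a nanosecond, which fits the small gains seen in the benchmarks.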
In my computer arch course a while back, someone asked why, if cache is so fast, we don't just build computers 100% SRAM memory. Our professor did some back-of-the-napkin calculations for fun. Major $. Have to include the extra space and cooling requirements, of course
The other thing, of course, is the good old law of diminishing returns. Cache actually solves the problem VERY nicely. For most people/computers/applications, cache misses aren't that great of a problem, because most computer code lends itself to cache hits (a phenomenon called "locality"). Locality is WHY we have cache in the first place. In general most computing works very well with a tiny amount of very fast cache and a small amount of fast cache. Adding more eventually gets you to the point where you're not seeing much if any improvement. On most modern systems, we're at that point - at least as far as the market will bear.
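Locality and diminishing returns are easy to see in a toy simulation. The sketch below models a fully associative LRU cache and a synthetic access trace in which most accesses go to a small hot set; the trace parameters are invented for illustration and real caches are set-associative, so treat it as a cartoon rather than a model of any actual CPU.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity_lines):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(trace)


def make_trace(length=20000, hot_lines=64, total_lines=4096,
               hot_fraction=0.9, seed=1):
    """Synthetic trace: most accesses touch a small hot set (locality)."""
    rng = random.Random(seed)
    trace = []
    for _ in range(length):
        if rng.random() < hot_fraction:
            trace.append(rng.randrange(hot_lines))
        else:
            trace.append(rng.randrange(total_lines))
    return trace


if __name__ == "__main__":
    trace = make_trace()
    for size in (16, 64, 256, 1024):
        print(f"{size:5d} lines: hit rate {lru_hit_rate(trace, size):.3f}")
```

Once the cache is big enough to hold the hot set, each further doubling of capacity buys much less than the one before it.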
Oh, and outside of the HPC world, there's no NEED for programmers to worry about memory caching issues. This isn't where most bottlenecks show up, and again, most general-purpose code lends itself very nicely to small amounts of cache. Compilers often help here, too. Most of your average programmers would get better use of their time analyzing the data structures and algorithms they use.
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
the memory equipped with LCDs
:)
I actually picked up a pair of these when I bought my latest machine's parts. I didn't buy it because the memory was any better, I bought it for the geek factor of having a couple of LED displays on which I could put whatever I liked. And I still like it. It's nice to be able to glance at the box to see the temperature the unit is running at. It's fun to watch it go from 95 to 130 when I'm playing some major 3d game. And it was only $30 more than the cheap stuff.
Yes, if you pay list price, you're paying too much. So don't pay list.
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
Instead of buying one of these modules at twice the cost, I'd say go and buy twice as much memory. Please note that I have the average computer in mind, not those steroid-pumped mutants that some o' my geek friends have (you have to be geek and have the cash!).
Intel uses an inclusive cache architecture, so you actually don't get the 640K you were looking for, and even so it'd have to be backed by DRAM (AFAIK, that cache isn't programmer or even OS accessible). AMD uses an exclusive cache, so the L1 and L2 (and any L3) would all be additive in which data they could store.
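The inclusive versus exclusive point comes down to simple arithmetic. In the sketch below the 128K and 512K sizes are picked only because they add up to the 640K mentioned above; the function is a simplified two-level model, not a description of any particular chip.

```python
def unique_capacity_kb(l1_kb: int, l2_kb: int, inclusive: bool) -> int:
    """Amount of distinct data a two-level cache hierarchy can hold.

    Inclusive: everything in L1 is also held in L2, so L1 adds nothing.
    Exclusive: a line lives in L1 or L2 but not both, so the sizes add up.
    """
    return l2_kb if inclusive else l1_kb + l2_kb


if __name__ == "__main__":
    print("inclusive:", unique_capacity_kb(128, 512, inclusive=True), "KB")
    print("exclusive:", unique_capacity_kb(128, 512, inclusive=False), "KB")
```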
JOC, why don't you specify Athlon X2 4400+ or 4800+s? They all have 1MB L2 per core, as well.
Exactly!
Oh, and latency is getting worse, not better, and has been for a long, long time.
Very true: my first full-sized computer had an 8 MHz processor and 150 ns RAM in 1985. Now there's more than an 8:1 ratio between CPU and RAM clocks (and the RAM requires several cycles of wait states even at its slow speed). The question is: is this totally irreversible? Could future memory technologies based on paramagnetic hyperoptical nanodots (there's no such thing, but you never know) provide CPU-equalling clock speeds or allow significant amounts of on-die RAM for future CPUs?
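The widening gap is easier to see when memory access time is counted in CPU cycles. The 8 MHz and 150 ns figures below come from the comment; the "modern" CPU clock and memory access time are assumed round numbers for illustration.

```python
def cycles_per_access(cpu_mhz: float, access_ns: float) -> float:
    """How many CPU clock cycles pass during one memory access."""
    return access_ns * cpu_mhz / 1000.0


if __name__ == "__main__":
    print("1985, 8 MHz CPU, 150 ns RAM:", cycles_per_access(8, 150), "cycles")
    # Hypothetical modern machine: 2400 MHz CPU, 50 ns to main memory.
    print("2400 MHz CPU, 50 ns RAM:   ", cycles_per_access(2400, 50), "cycles")
```

In absolute terms memory got a few times quicker, but measured in cycles the CPU now waits roughly a hundred times longer per miss under these assumptions.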
My point is that these tests suggest that low-latency RAM wouldn't help in most of the applications tested in TFA. But I wonder if most of those applications have been written to cope with high-latency memory.
Two wrongs don't make a right, but three lefts do.
The Apple G5 updates FINALLY gave us 1mb l2 cache per core
What are you talking about? My G3 running at 450MHz has a 1MB L2 cache, and it has since 1999. Pentium Pros and various workstation/server class chips had multimegabyte caches a decade ago.
The reason you've seen less cache is that it didn't make sense to have a slow CPU with a 4MB cache that had to dissipate 100+ watts to operate. On-die cache is expensive in terms of heat, die space, and clock speed.
There's also the marketing factor: Intel would have you believe (at least until last year) that GHz were the benchmark, and to get them they had to strip out cache. Everyone has to follow Intel to some extent, or they'll get mowed down (read: Apple failing to sell 500MHz G4s when 1.7GHz P4s were out). If you made a chip today that ran at 400MHz but kicked a Pentium 4 830's ass, you'd have a lot of trouble selling it.
And as for cache optimization, CHUD is an excellent tool to profile such things. You can get INSANE performance benefits by keeping an oft-repeated loop or randomly accessed dataset inside the CPU's cache.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
Better latent than never.
Today's vices may be tomorrow's virtues.
That's the funny thing about all this 2-2-2-5 stuff being sold to gamers - it makes very little difference for them. It is however fantastic for code compilation, simulations, and anything else that has unavoidable random access patterns.
Thankfully they do market it for gamers, because if they did market it for scientists, software developers and generally in servers, it would be even more uber expensive than it is now.
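"Unavoidable random access" is usually demonstrated with pointer chasing: every load depends on the result of the previous one, so the CPU cannot overlap or prefetch them and each step pays the full memory latency. The Python sketch below only builds and walks such a chain (using Sattolo's algorithm to get a single cycle); interpreter overhead hides the real memory cost, so it shows the access pattern rather than serving as a benchmark.

```python
import random


def sattolo_cycle(n: int, seed: int = 0) -> list:
    """Return a 'next index' table forming one cycle through all n slots."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)            # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list, steps: int, start: int = 0) -> int:
    """Follow the chain; each lookup needs the previous result first."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


if __name__ == "__main__":
    table = sattolo_cycle(1000)
    print("back at start after a full lap:", chase(table, 1000) == 0)
```

A real latency test would do the same walk in a compiled language over a table much larger than the cache.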