The Impact of Memory Latency Explored
EconolineCrush writes "Memory module manufacturers have been pushing high-end DIMMs for a while now, complete with fancy heat spreaders and claims of better performance through lower memory latencies. Lowering memory latencies is a good thing, of course, but low-latency modules typically cost twice as much as standard DIMMs. The Tech Report has explored the performance benefits of low-latency memory modules, and the results are enlightening. They could even save you some money."
I'd have to say this is right on when applied to picking a woman to spend your life with... low-latency memory is a BAD BAD thing, and VERY expensive. My next time around, I'm going with the "CHEAPER", high-latency model that can't immediately recall everything I've ever said while arguing her point... Roses and jewelry can cost you over the long run, friends...
Don't anthropomorphize computers: they hate that.
I have no doubt that hardcore PC gamers will shell out the cash for these, regardless of the cost/performance ratio. Once you start paying $500+ for a graphics card, all rational decision making skills are lost.
domain combinatorics
Beware, one of the banner advertisers on that page (netshelter.net) is trying to trigger a buffer overflow with a strangely crafted cookie. Hope you do not run your Firefox on Windows...
There you are, staring at me again.
http://anandtech.com/memory/showdoc.aspx?i=2392
You'll basically find that the performance of value memory is roughly on par with the high-end stuff. You basically pay for the ability to overclock on a more consistent basis.
Although I didn't read all the text (about 50% of it), the benchmarks were what I was interested in, as well as the conclusion. So to sum it up:
Memory at 2-2-2-5 timings, 400MHz, 1T is the fastest, but it costs twice as much and the performance gains are almost nonexistent except in lower-resolution games (i.e. at 800x600 you may see an increase of 20 fps, which I think is a lot!). Of course, the cost of the RAM in this case would not be justified, because putting that extra money into a better video card would be the better thing to do.
Only if you're an overclocker is this worth it, at least from their benchmarking and perspective, which I'll accept.
Oh yes, and that website also crashed my Firefox.
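To put the timings quoted above in absolute terms, here is a rough sketch that converts a CAS latency from clock cycles to nanoseconds. It assumes the "400MHz" figure is a DDR effective rate (so the real clock is half that), and the CL3 module used for comparison is a hypothetical stand-in for value RAM, not something taken from the article.

```python
def cas_latency_ns(cas_cycles: float, data_rate_mhz: float) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    Assumption: data_rate_mhz is a DDR effective rate, so the
    actual memory clock is data_rate_mhz / 2.
    """
    clock_mhz = data_rate_mhz / 2.0
    cycle_ns = 1000.0 / clock_mhz  # length of one clock cycle in ns
    return cas_cycles * cycle_ns


low = cas_latency_ns(2, 400)  # the first "2" in 2-2-2-5
std = cas_latency_ns(3, 400)  # hypothetical CL3 value module
print(f"CL2: {low:.1f} ns, CL3: {std:.1f} ns, gap: {std - low:.1f} ns")
```

Under those assumptions the gap is a handful of nanoseconds per access, which fits the small benchmark differences summed up above.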
Crow T. Trollbot
The real question is: can I buy 533MHz ram and run it slower with lower latencies?
Yes. I regularly buy high-speed RAM and downclock it, but run it at lower latency. For instance, if I wanted to run my RAM at 400MHz, I'd buy 433/466/500MHz VAL-U-RAM and run it as a stick of semi-premium 400MHz.
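The arithmetic behind downclocking can be sketched like this: if a module holds a given latency in nanoseconds at its rated speed, the same nanoseconds cover fewer clock cycles at a lower speed. The CL3-at-500MHz rating below is a made-up example, the half-cycle rounding step is an assumption, and whether any particular stick is actually stable at the tighter setting is something only testing will tell you.

```python
import math


def cycles_needed(rated_cas: float, rated_mhz: float, target_mhz: float,
                  step: float = 0.5) -> float:
    """CAS setting at target_mhz that keeps the same latency in ns.

    Both rates are effective (DDR) rates. The result is rounded up
    to the next multiple of `step` (half-cycle steps are assumed).
    """
    exact = rated_cas * target_mhz / rated_mhz  # same ns, fewer cycles
    return math.ceil(exact / step - 1e-9) * step


# Hypothetical module rated CL3 at 500MHz, run at 400MHz instead.
print(cycles_needed(3, 500, 400))
```

The small `1e-9` only guards against floating-point noise pushing an exact multiple of the step up to the next one.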
Seriously, this has been known very well amongst the gaming PC builder crowd for a long time. Most of them, anyways; there's unfortunately still that level at which people know enough to put the PC together, but don't know enough to tell you what any of the numbers mean.
The difference between, say, Corsair Value Select memory, and Corsair 1337 Ultra X2000 - the memory equipped with LCDs, heat spreaders, and a spoiler with metal-flake yellow paint that adds at least 10 horsepower - is going to be absolutely unnoticeable in the real world. Even benchmark scores will show little to no improvement.
Ricer RAM - you know, the PC equivalent of this crap - is for overclocking. If you're not planning on overclocking it, you're paying too damned much.
Improvements in memory speed crawl compared to improvements in CPU speed. Larger caches can mitigate this problem to a certain extent, so why is it that growth in cache size continues to crawl? The Apple G5 updates FINALLY gave us 1MB of L2 cache per core (and of course the industry-standard 64K L1 cache per core), and while the Intel/AMD world is slightly better in this regard, it's not by much. So why is it so hard to increase cache size? (Of course you will need good cache allocation/replacement policies to go with it.) I'm not trolling, I honestly want to know. I realize that the people who design these chips are a lot smarter than I am, but so far I haven't really seen a good reason why they don't increase cache size.
Also, outside of the HPC world, it seems very few programmers optimize their cache usage. Are there any tools (open source or otherwise) that can actually help you locate/fix inefficient uses of cache?
Monstar L
ExtremeTech Article
"Beware, one of the banner advertisers on that page (netshelter.net) is trying to trigger a buffer overflow with a strangely crafted cookie. Hope you do not run your Firefox on Windows..."
Just another reason to switch to IE!
"Derp de derp."
... Is not memory performance as such, but system performance. If a 5 percent increase in system performance increases the cost of your system by 10 percent, you have to want it pretty badly or be on the edge of required performance or just be in a schoolyard comparison. But if it's reversed, and a 10 percent increase in system performance can be had for a 5 percent increase in system price, then if you can afford the 5 percent (say $100 for a $2000 system), go for it.
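The rule of thumb above boils down to one ratio. A minimal sketch, where the function names are mine and reading "worth it" as "ratio above 1" is my interpretation of the post:

```python
def price_increase_pct(extra_cost: float, system_price: float) -> float:
    """Extra cost as a percentage of the base system price."""
    return 100.0 * extra_cost / system_price


def gain_per_cost(perf_gain_pct: float, price_pct: float) -> float:
    """Percent of system performance gained per percent of price added."""
    return perf_gain_pct / price_pct


print(price_increase_pct(100, 2000))  # the $100 on a $2000 system case
print(gain_per_cost(5, 10))           # 5% faster for 10% more money
print(gain_per_cost(10, 5))           # 10% faster for 5% more money
```

A ratio below 1 is the "you have to want it pretty badly" case; above 1 is the "go for it if you can afford it" case.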
-- Jim Crigler In 1937, I began, like Lazarus, the impossible return. -- Whittaker Chambers
Sounds like a memory timing issue - you should upgrade to some OCZ low-latency RAM!
(I'm sorry, that's not helpful at all, is it?)
Schrodinger's cat is either dead or really pissed off...
Sorry, I call BS on your entire post. The difference in latencies here is minuscule -- it's not like we're talking about having the CPU wait 2 clock cycles vs 30 clock cycles. It's closer to 13 vs 25 (not exact, but the magnitude of difference is close). That just doesn't matter that much -- the reality is that if you have a cache miss then you're looking at 20-30 cycles (or, more likely, 40-60 cycles) of stall while you fetch the data from main memory.
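One way to see why a change in miss penalty gets diluted is the usual average-access-time formula: hit cost plus miss rate times miss penalty. The hit cost and the 2% miss rate below are invented for illustration; only the 40 and 60 cycle penalties echo the range mentioned above.

```python
def avg_access_cycles(hit_cycles: float, miss_rate: float,
                      miss_penalty_cycles: float) -> float:
    """Average cycles per memory access for a single cache level."""
    return hit_cycles + miss_rate * miss_penalty_cycles


HIT_CYCLES = 2    # hypothetical cost of a cache hit
MISS_RATE = 0.02  # hypothetical: 2% of accesses go to main memory

for penalty in (40, 60):
    avg = avg_access_cycles(HIT_CYCLES, MISS_RATE, penalty)
    print(f"miss penalty {penalty} cycles -> {avg:.2f} cycles per access")
```

With a low miss rate, a 50% worse penalty moves the average by a fraction of a cycle. Code that misses far more often than this would feel the difference more.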
The kind of changes you're talking about require vastly faster memory. Not the kind of latency differences being discussed here at all. Both of these are "high latency" compared to what would be needed for your theoretical redesign of the entire software stack. And even then, you just become utterly and completely screwed if you have to hit virtual memory, possibly more so than you are now because you've re-orchestrated everything around the idea that latency is a non issue.
Oh, and latency is getting worse, not better, and has been for a long, long time. CPU speeds long ago outstripped the speeds of our fastest memory (well, fastest while still not costing absurd amounts of money...), and the newer memory formats (DDR, DDR2, DDR3, RDRAM, etc) have higher latencies in exchange for greater bandwidth.
I'm sorry, but you are too stupid to post on /.
Bad analogies are like waxing a monkey with a rainbow.
The software knows nothing about memory latency; the software only knows it needs to move a block of data from point A to point B. That Java/C/C++ Move_Memory function translates at the lowest level to machine code instructions which are implemented in the logic of the silicon. The coder or the compiler may optimize the ORDER of execution of the instructions, or use different instructions (such as BlockMoves) to speed things up, but the basic underlying machine instructions execute the same way every time (either they hit the cache and load from there, or they miss and a memory fetch is executed across the memory bus). On-chip caches were designed to minimize memory fetches and their associated latency. On-chip caches are small and fast and are a different design than the external memory.
What you would want is to eliminate the wait states from CPU to RAM (or get more cache hits) and that is NOT something a compiler or OS can do for you; that is done in the algorithms that run the CPU. You can change that to some extent in the BIOS settings, to tell the CPU that memory wait states are zero, or that the clock is higher, but IIRC the CPU, memory and bus controller have to agree on all these settings and must be able to implement the timing. Overclocking the CPU won't fix this when the bus and memory can't run any faster.
Your analogy does not hold. Slashdot is a high latency site. By the time I've read a few comments, I've usually forgotten what the story was about.
Wait, why am I posting this comment again?
*blinking cursor*
So talking about optimisation for low-latency RAM is, I suspect, nonsense. What we are surely seeing here is that the actual limitation on memory bandwidth is somewhere else - in the memory controller, in the cache controller, in the CPU fetch rate, in the rate at which stuff is being fetched from hard disk, in bus contention. Overclocking - speeding up memory controllers and buses - will have an effect. Reducing the number of wait states on the memory bus will not have much effect on performance if the total number of active memory cycles in a given period is largely unchanged.
If you had a need for real speed in an application which was not dependent on the graphics subsystem or access to network and HDD, I am sure you could get much more performance out of low-wait state RAM, but you would do it by HARDWARE design, not by software optimisation.
As a simple example from the dim and distant past when I was building hardware, TI used to have a microcontroller called the TMS9995 which ran at, for the day, a hefty 12MHz. With the slow DRAM of the time, it always needed a wait state and this meant that it could manage, as I recall, two memory accesses per microsecond. With static RAM, it could manage 3. The 9995 actually stored its working registers in external memory and so this meant a real world speedup of nearly 30%. The 8088, on the other hand, kept its working registers on-chip and had a limited instruction pipeline. As a result, the equivalent speedup was nothing like 30%. This was due to hardware differences not software differences.
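The 9995 versus 8088 comparison can be framed with Amdahl's law. That framing is mine, not the poster's. Going from 2 to 3 memory accesses per microsecond makes memory 1.5x faster, and the overall speedup then depends on what fraction of run time was spent waiting on memory. The fractions printed below are illustrative, not measured.

```python
def overall_speedup(mem_fraction: float, mem_speedup: float) -> float:
    """Amdahl's law: only the memory-bound fraction gets faster."""
    return 1.0 / ((1.0 - mem_fraction) + mem_fraction / mem_speedup)


def mem_fraction_for(overall: float, mem_speedup: float) -> float:
    """Memory-bound fraction implied by an observed overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / mem_speedup)


MEM_SPEEDUP = 3 / 2  # 2 -> 3 accesses per microsecond, from the post

# Fraction of time in memory access that would yield a 30% speedup.
print(f"{mem_fraction_for(1.30, MEM_SPEEDUP):.2f}")

# A chip with on-chip registers spends less time on the bus
# (hypothetical fractions), so the same RAM upgrade buys less.
for fraction in (0.7, 0.3):
    print(f"{fraction:.1f} -> {overall_speedup(fraction, MEM_SPEEDUP):.2f}x")
```

The same faster RAM gives very different results depending on how memory-bound the hardware design is, which is the point being made about registers held in external memory.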
In fact, the applications which really test out the memory subsystem are not games - they are databases and webservers, which hardly use the graphics system at all. And in these cases, for low end systems, the big beast in the equation is cache. It's quite astonishing how a Pentium-M can churn through a badly designed join while a low end AMD 64 struggles, simply because one has 2Mbytes of cache and the other has only 512K. As a result, for ordinary technical laptop and desktop work, I now specify Pentium-M, the AMD 64 with 1Mbyte cache, or Pentium D with 1Mbyte per core. You know it makes sense. (And now everyone can explain why I'm wrong, in my turn.)
Pining for the fjords
I do large 3D thunderstorm simulations. With some of the larger simulations I am integrating lots of things, contained in 3D floating point arrays, over 1 billion or more gridpoints (using distributed computing, such as a beowulf cluster made up of dual Xeons or an SGI Altix system). Each scientific calculation requires accessing floating point values stored in these arrays, doing some math, and updating another array.
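For a sense of scale, here is a back-of-the-envelope footprint for a single array over a billion gridpoints. Single-precision (4-byte) values are an assumption; the post only says "floating point".

```python
def array_gib(gridpoints: int, bytes_per_value: int = 4) -> float:
    """Size of one 3D field in GiB (4-byte floats assumed by default)."""
    return gridpoints * bytes_per_value / 2**30


one_field = array_gib(1_000_000_000)
print(f"one field: {one_field:.2f} GiB")
print(f"as doubles: {array_gib(1_000_000_000, 8):.2f} GiB")
```

Every field in the model is another array of that size, and none of them comes anywhere near fitting in cache, so each pass over the grid streams from main memory.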
Memory latency and memory bandwidth both impact how long it takes my simulations to complete. Let's say it is the difference between a simulation taking a week vs. five days... this is significant to me and how much I can get done. With these heavy duty scientific models and such, you really can see a noticeable benefit with the fancier hardware, and clock speed is certainly not the only factor to consider by a long shot.
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
Price. Well, price and size, but mostly price.
:)
Cache isn't some magical thing. It's simply RAM. SRAM, usually, which is why it's so fast (don't have to waste power/time refreshing your contents). At the end of the day, it's just some very fast RAM. It sits between your CPU and the rest of your RAM, and uses its increased speed to "trick" the CPU into performing as if your main RAM is much faster than it is.
In my computer arch course a while back, someone asked why, if cache is so fast, we don't just build computers with 100% SRAM memory. Our professor did some back-of-the-napkin calculations for fun. Major $. Have to include the extra space and cooling requirements, of course.
The other thing, of course, is the good old law of diminishing returns. Cache actually solves the problem VERY nicely. For most people/computers/applications, cache misses aren't that great of a problem, because most computer code lends itself to cache hits (a phenomenon called "locality"). Locality is WHY we have cache in the first place. In general most computing works very well with a tiny amount of very fast cache and a small amount of fast cache. Adding more eventually gets you to the point where you're not seeing much if any improvement. On most modern systems, we're at that point - at least as far as the market will bear.
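Locality and diminishing returns can both be seen in a toy simulation. This is only a sketch: it models a fully associative LRU cache, which real CPU caches are not, and the two access patterns are invented.

```python
from collections import OrderedDict
import random


def hit_rate(addresses, cache_lines):
    """Fraction of accesses served by a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(addresses)


rng = random.Random(0)
looping = [i % 64 for i in range(10_000)]  # a loop reusing 64 lines
scattered = [rng.randrange(100_000) for _ in range(10_000)]

for size in (32, 128, 1024):
    print(size, hit_rate(looping, size), hit_rate(scattered, size))
```

Once the cache holds the loop's working set, making it eight times larger changes nothing for the looping pattern, while the scattered pattern barely benefits from any of these sizes.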
Oh, and outside of the HPC world, there's no NEED for programmers to worry about memory caching issues. This isn't where most bottlenecks show up, and again, most general-purpose code lends itself very nicely to small amounts of cache. Compilers often help here, too. Most of your average programmers would get better use of their time analyzing the data structures and algorithms they use.
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
It's not really that weird. Vertical bars are a popular list separation item, and I've used them in cookies for web applications I've designed many times. In your example, you have a twelve-item list, with the first two items equal to "CA" and "NA" respectively, and the remaining equal to "".
:)
What they're doing with the list is anybody's guess.
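For anyone curious, a value shaped like the one described splits apart as follows. The actual cookie string isn't reproduced in this thread, so the value here is rebuilt from the description (twelve items, "CA" and "NA" first, the rest empty) and is only a guess at its shape.

```python
# Hypothetical cookie value rebuilt from the description above.
cookie_value = "|".join(["CA", "NA"] + [""] * 10)

fields = cookie_value.split("|")
print(repr(cookie_value))
print(len(fields), fields[:2], all(f == "" for f in fields[2:]))
```

Note that `str.split("|")` keeps the empty items, which is why a long run of bars still yields a full twelve-item list.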
"Times have not become more violent. They have just become more televised."
-Marilyn Manson
Better latent than never.
Today's vices may be tomorrow's virtues.
However, if you have algorithmically intensive software (spending lots of time in the same loops or crunching large amounts of data), it's worthwhile to instrument your code and see how you're doing for cache hits/misses. You might discover that by tweaking the inner-most loops or the size of the blocks you crunch, you can better fit the cache of the target processor.
Word/Excel isn't going to bother, but a game might be worth stuffing a few versions of tweaked loops in that are selected by a loop invariant, or by feeding the functions some data ahead of time to help guide them to use the best sizes of data that they can.
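A rough sketch of the "pick block sizes to fit the cache" idea, using a tiled matrix multiply. The sizing rule (three square tiles of 8-byte values must fit in the cache) is a common rule of thumb and an assumption here, and the function names are made up. Plain Python won't show the speedup; the point is the loop structure, which carries over to C or similar.

```python
import math


def pick_block(cache_bytes, item_bytes=8, tiles=3):
    """Largest tile edge b with tiles * b * b * item_bytes <= cache_bytes."""
    return max(1, math.isqrt(cache_bytes // (tiles * item_bytes)))


def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists), one tile at a time."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Work on one tile so its data stays cache-resident.
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


print(pick_block(512 * 1024), pick_block(2 * 1024 * 1024))
```

Feeding `pick_block` the cache size of the machine at startup is one way to do the "feed the functions some data ahead of time" trick mentioned above.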
This isn't unlike memory alignment for structures, and taking a massive performance hit for the data not being "easy" for the assembly instructions to process.
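The padding that alignment introduces can be seen with Python's `struct` module, which can lay out the same fields with and without native alignment. The one-char-plus-one-int layout is an arbitrary example, and the native size depends on the platform.

```python
import struct

# A 1-byte char followed by an int.
packed = struct.calcsize("=ci")  # standard sizes, no padding
native = struct.calcsize("@ci")  # native alignment, padding may be added

print(f"packed: {packed} bytes, native: {native} bytes")
print(f"padding added for alignment: {native - packed} bytes")
```

The packed layout saves space, but on hardware that wants aligned loads the int then straddles its natural boundary, which is where the performance hit comes from.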
One example is the ability to loop-unroll the innermost butterflies of an FFT on the x86-64 extension using the extra registers that are available there. That WILL get you a noticeable increase in performance.
But these are always the last 20% kinds of increases...