Slashdot Mirror


The Impact of Memory Latency Explored

EconolineCrush writes "Memory module manufacturers have been pushing high-end DIMMs for a while now, complete with fancy heat spreaders and claims of better performance through lower memory latencies. Lowering memory latencies is a good thing, of course, but low-latency modules typically cost twice as much as standard DIMMs. The Tech Report has explored the performance benefits of low-latency memory modules, and the results are enlightening. They could even save you some money."

36 of 162 comments

  1. apply this to picking a wife by 0110011001110101 · · Score: 5, Funny
    FTFA - Lowering memory latencies is a good thing, of course, but low-latency modules typically cost twice as much as standard DIMMs.

    I'd have to say this is right on when applied to picking a woman to spend your life with... low-latency memory is a BAD BAD thing, and VERY expensive. My next time around, I'm going with the "CHEAPER", high-latency model that can't immediately recall everything I've ever said while arguing her point... Roses and jewelry can cost you over the long run, friends...

    --
    Don't anthropomorphize computers: they hate that.
  2. Just stick a few blue LEDs on it... by phpm0nkey · · Score: 5, Funny

    I have no doubt that hardcore PC gamers will shell out the cash for these, regardless of the cost/performance ratio. Once you start paying $500+ for a graphics card, all rational decision making skills are lost.

    1. Re:Just stick a few blue LEDs on it... by timeOday · · Score: 4, Insightful

      I wouldn't buy a $500 card either but, sheesh, at least they're faster than the cheap ones. This low-latency memory is twice the price for a ~3% boost... I think not.

    2. Re:Just stick a few blue LEDs on it... by Iriel · · Score: 3, Insightful

      This isn't all that funny. I mean, it does make me laugh, but it's far more true than humorous. I constantly get berated by the 'hardcore' gamers for not having the fastest CPU/RAM/GPU/HD when I can still run a lot of games just as well as anyone else. The problem with hardcore gaming equipment is that it has become something like MTV selling you 'cool'.

      Guess what? That wicked dual-core CPU actually runs games slower than its single core cousin. That brand-spankin' new video card that cost you $400 (or more)? I pay that much once every several years on my video card. The difference is that I don't care if I squeeze out my maximum frames per second because most people can't even detect the difference if the game didn't have an option to show the number in the corner of the screen like some veritable rating of their manhood (sorry for my gender bias on that). And that super ultra OHMYFUCKINGGODITMAKESMYEXPLODEITSSOFAST low-latency RAM is giving you a performance boost of 2% over what I've got now.

      I find it educational to read these reports so I can make educated purchasing choices. For that, I'm quite grateful. However, I find it kind of sad that the parent post is unsettlingly accurate in that the 'hardcore pc gamers' will shove this to the side for the ATI SXL 10G Super Elite XTRME Pro card next week. Witness what happens when PC gaming meets MTV-esque marketing.

      --
      Perfecting Discordia
      www.stevenvansickle.com
    3. Re:Just stick a few blue LEDs on it... by Iriel · · Score: 3, Insightful

      You seem to have missed the point that 'a lot of games' does not mean 'all games', 'any games', or any derivative thereof. And honestly, the point of my post is that I'm willing to sacrifice some detail and put my settings at 75-80% instead of maxed-out if it'll save me from spending close to a thousand dollars a year in upgrades.

      --
      Perfecting Discordia
      www.stevenvansickle.com
    4. Re:Just stick a few blue LEDs on it... by vmcto · · Score: 3, Interesting

      Hey don't knock gamers that spend tons of money on computer gear.

      It's thanks to them that the rest of us can get normal gear at such reasonable prices...

    5. Re:Just stick a few blue LEDs on it... by Wonko · · Score: 2, Informative

      Guess what? That wicked dual-core CPU actually runs games slower than its single core cousin.

      Is this actually a true statement? I can't do any current testing since I don't have a reasonable 3D card in my machine, but I remember testing Quake 3 on my old dual Celeron machine with a TNT2 card. top showed Quake was using 95% or more of one CPU, and the X server was using 30% or more of the other CPU.

      I don't expect the numbers to be the same today, but shouldn't there be at least some slight increase in speed if the GUI is running in a separate process? I am not saying that the increase would justify the price, but I haven't run a single CPU desktop in something like six years. I am not about to start now :).

    6. Re:Just stick a few blue LEDs on it... by HavokDevNull · · Score: 4, Interesting

      Have to agree with AC on the CPU issue, taken from http://techreport.com/reviews/2005q2/athlon64-x2/index.x?pg=16

      Conclusions
      Let's start by talking about the Athlon 64 X2 4200+. This CPU generally offers better performance than its direct competitor from Intel, the Pentium D 840. Most notably, the X2 4200+ doesn't share the Pentium D's relatively weak performance in single-threaded tasks like our 3D gaming benchmarks. The Athlon 64 X2 4200+ also consumes less power, at the system level, than the Pentium D 840--just a little bit at idle (even without Cool'n'Quiet) but over 100W under load. That's a very potent combo, all told.

      In fact, the X2 4200+ frequently outperforms the Pentium Extreme Edition 840, which costs nearly twice as much. Thanks to its dual-core config, the X2 4200+ also embarrasses some expensive single-core processors, like the Athlon 64 FX-55 and the Pentium 4 Extreme Edition 3.73GHz. Personally, I don't think there's any reason to pay any more for a CPU than the $531 that AMD will be asking for the Athlon 64 X2 4200+.

      If you must pay more for some reason, the Athlon 64 X2 4800+ will give you the best all-around performance we've ever seen from a "single" CPU. The X2 4800+ beats out the Pentium Extreme Edition 840 virtually across the board, even in tests that use four threads to take best advantage of the Extreme Edition 840's Hyper-Threading capabilities. The difference becomes even more pronounced in single-threaded applications, including games, where the Pentium XE 840 is near the bottom of the pack and the X2 4800+ is constantly near the top. The X2 4800+ also consumes considerably less power, both at idle and under load.

      The X2 4800+ gives up 200MHz to its fastest single-core competitor, the Athlon 64 FX-55, but gains most of the performance back in single-threaded apps thanks to AMD's latest round of core enhancements, included in the X2 chips. The X2 4800+ also matches the Opteron 152 in many cases thanks to Socket 939's faster memory subsystem. Remarkably, our test system consumes the same amount of power under load with an X2 4800+ in its socket as it does with an Athlon 64 FX-55, even though the X2 is running two rendering threads and doing nearly twice the work. Amazing.

      There's not much to complain about here, but that won't stop me from trying. I would like to see AMD extend the X2 line down two more notches by offering a couple of Athlon 64 X2 variants at 2GHz clock speeds and lower prices. I realize that by asking for this, I may sound like a bit of a freeloader or something, but hey--Intel's doing it. No, the performance picture for Intel's dual-core chips isn't quite so rosy, but the lower-end Pentium D models will make the sometimes-substantial benefits of dual-core CPU technology more widely accessible. If AMD doesn't follow suit, lots of folks will be forced to choose between one fast AMD core or two relatively slower Intel cores. I'm not so sure I won't end up recommending the latter more often than the former.

      Beyond that, the giant question looming over the Athlon 64 X2 is about availability, as in, "When can I get one?" Let's hope the answer is sooner rather than later, because these things are sweet.

      --
      Sig
  3. Re:Link crashed Firefox by Maljin+Jolt · · Score: 5, Interesting

    Beware, one of the banner advertisers on that page (netshelter.net) is trying a buffer overflow with a strangely crafted cookie. Hope you do not run your Firefox on Windows...

    --
    There you are, staring at me again.
  4. Anandtech did this months ago by Anonymous Coward · · Score: 4, Informative

    http://anandtech.com/memory/showdoc.aspx?i=2392

    You'll basically find that the performance of value memory is very much on par with the high-end stuff. You basically pay for the ability to overclock on a more consistent basis.

    1. Re:Anandtech did this months ago by bersl2 · · Score: 2, Informative

      Also, see this thread from the Anandtech forums.

  5. Insightful article by mindaktiviti · · Score: 2, Informative

    Although I didn't read all the text (about 50% of it), the benchmarks were what I was interested in, as well as the conclusion. So to sum it up:

    2-2-2-5 timings at 400MHz t1 memory is the fastest but costs twice as much, and the performance gains are almost nonexistent except in lower-resolution games (i.e. at 800x600 you may see an increase of 20 fps, which I think is a lot!). Of course, the cost of the RAM in this case would not be justified, because putting that extra money into a better video card would be the better thing to do.

    Only if you're an overclocker is this worth it, at least from their benchmarking and perspective, which I'll accept.
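
    For a rough sense of what those timings mean in absolute terms, here is a small sketch that converts a CAS latency from clock cycles to nanoseconds. It assumes DDR400 modules, whose memory clock is 200MHz (the "400MHz" is the double-data-rate figure), and it uses CL3 as a hypothetical "value" timing for comparison; the article's exact value timings are not quoted here.

```python
# Convert DRAM CAS latency from clock cycles to nanoseconds.
# Assumption: DDR400 modules, i.e. a 200 MHz memory clock (5 ns per cycle).
# The CL3 "value" figure is a hypothetical comparison point.

def cycles_to_ns(cycles, clock_mhz):
    """Time taken by `cycles` clock cycles at `clock_mhz`, in nanoseconds."""
    return cycles * 1000.0 / clock_mhz

MEMORY_CLOCK_MHZ = 200  # DDR400: 400 MT/s data rate on a 200 MHz clock

low_latency_ns = cycles_to_ns(2, MEMORY_CLOCK_MHZ)  # CL2 -> 10.0 ns
value_ns = cycles_to_ns(3, MEMORY_CLOCK_MHZ)        # CL3 -> 15.0 ns

print(f"CL2: {low_latency_ns} ns, CL3: {value_ns} ns, "
      f"difference: {value_ns - low_latency_ns} ns")
```

    Under those assumptions the premium modules save about 5 ns on the CAS portion of an access, which only matters on the accesses that miss the CPU caches in the first place.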

    Oh yes, and that website also crashed my Firefox.

    1. Re:Insightful article by mindaktiviti · · Score: 2, Insightful
      800x600? Won't you already be getting >100 FPS in most games anyway?

      Perhaps, that particular benchmark was for Far Cry at 800x600 w/ medium settings, and the lowest fps was around 168, and the highest was 188, so a 20fps difference.
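
      A quick sketch of what that gap looks like as time per frame, using only the two Far Cry figures quoted above (168 and 188 fps); the function name is just for illustration.

```python
# Express the quoted Far Cry results (168 vs 188 fps at 800x600)
# as time per frame and as a relative gain.

def frame_time_ms(fps):
    """Milliseconds spent on one frame at a given frame rate."""
    return 1000.0 / fps

slow_fps, fast_fps = 168, 188

saved_ms = frame_time_ms(slow_fps) - frame_time_ms(fast_fps)
gain_pct = (fast_fps / slow_fps - 1) * 100

print(f"{saved_ms:.2f} ms saved per frame, {gain_pct:.1f}% more frames")
```

      So the 20 fps difference is roughly a 12% gain, but it amounts to well under a millisecond per frame at frame rates that are already far above what most monitors display.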

      They were using this video card: NVIDIA GeForce 6800 GT with ForceWare 77.77 drivers

      However if you look at the opportunity cost of buying this ram because you have a bad video card and play at those resolutions, then it would still be more worth it to just get a better video card. Even if you want to upgrade your ram, it would be wiser to just save that extra money to put into a 6600GT or something.

      It would have been interesting if they did the test with an older video card as well, like a GeForce3 series.

  6. But surely flashing LEDs make it go faster! by Cr0w+T.+Trollbot · · Score: 3, Funny
    After all, that's the main feature of Crucial Ballistix Tracer Memory. I'm sure those LEDs must be worth at least 10 fps in Doom 3...

    Crow T. Trollbot

  7. Re:FP! by stinerman · · Score: 4, Informative

    The real question is: can I buy 533MHz ram and run it slower with lower latencies?

    Yes. I regularly buy high-speed RAM and downclock it, but run it at lower latency. For instance, if I wanted to run my RAM at 400MHz, I'd buy 433/466/500MHz VAL-U-RAM and run it as a stick of semi-premium 400MHz.

  8. Ask a builder by Dragoon412 · · Score: 2, Insightful

    Seriously, this has been known very well amongst the gaming PC builder crowd for a long time. Most of them, anyways; there's unfortunately still that level at which people know enough to put the PC together, but don't know enough to tell you what any of the numbers mean.

    The difference between, say, Corsair Value Select memory, and Corsair 1337 Ultra X2000 - the memory equipped with LCDs, heat spreaders, and a spoiler with metal-flake yellow paint that add at least 10 horsepower - is going to be absolutely unnoticeable in the real world. Even benchmark scores will show little to no improvement.

    Ricer RAM - you know, the PC equivalent of this crap - is for overclocking. If you're not planning on overclocking it, you're paying too damned much.

    1. Re:Ask a builder by theantipop · · Score: 3, Insightful
      What most people don't realize is that the only way to improve your performance at the top end of the performance spectrum is through a combination of small tweaks such as this. Sure spending twice the money for 103% of the performance sounds dumb, but when you combine that with small tweaks to your processor, graphics card and a 10,000rpm hard drive they add up.

      These products are not for people who want to achieve a usable level of performance and as such are not marketed at those crowds. They are for people who already have fast equipment but want more. I won't say this is a good or bad thing as it is simply a hobby for most of these people. Just like import tuners: they may drive funny-looking cars, but it's their choice of hobby.

  9. What about cache? by antifoidulus · · Score: 3, Interesting

    Improvements in memory speed crawl compared to improvements in CPU speed; however, larger caches can mitigate this problem to a certain extent, so why is it that growth in cache size continues to crawl? The Apple G5 updates FINALLY gave us 1MB of L2 cache per core (and of course the industry standard 64k L1 cache per core), and while the Intel/AMD world is slightly better in this regard, it's not by much. So why is it so hard to increase cache size (of course you will need good cache allocation/replacement policies to go with them)? I'm not trolling, I honestly want to know. I realize that the people that design these chips are a lot smarter than I, but so far I haven't really seen a good reason why they don't increase cache size.
    Also, outside of the HPC world, it seems very few programmers optimize their cache usage. Are there any tools (open source or otherwise) that can actually help you locate/fix inefficient uses of cache?

    1. Re:What about cache? by harrkev · · Score: 2, Informative

      For one simple reason -- die size. Cache eats up a lot of real estate. A 1MB (B as in byte) cache is 8 million bits. If the cache uses DRAM-style cells, that is at least 8 million transistors. If the cache is more like SRAM, then you can count on a lot more. This increases the size of the die, which both decreases the number of chips per wafer and increases the percentage of defective dies.

      So, the bottom line is that cache is the most expensive type of memory in a computer. Some methods have been made to get around this -- like the Intel "Slot-1" architecture where the L2 cache was on a separate chip. But this idea faded into the museum of bad ideas.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    2. Re:What about cache? by timeOday · · Score: 2, Informative

      Because you get diminishing returns for more and more cache. At some point it's better to use all those transistors as a second core instead.

    3. Re:What about cache? by Sycraft-fu · · Score: 2, Insightful

      Cache is SRAM, since SRAM is much faster. Ok, except that SRAM takes 6 transistors per bit to make. So for 1 megabyte of cache, that's 48 million transistors to implement. That's a major budget of silicon. As transistor count goes up, so does die size, heat, cost, failure rate, etc. So putting large caches on just isn't feasible. An 8MB cache would use more transistors than most processors today do in total between core and cache.
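
      The arithmetic behind that figure, as a minimal sketch. It counts only the storage cells at 6 transistors per bit, as stated above, and ignores tag arrays, decoders, and sense amplifiers, so real totals would be higher. The function name and the list of sizes are illustrative choices.

```python
# Estimate the transistor count of an SRAM cache's storage cells.
# Assumption: a 6-transistor (6T) cell per bit, as in the comment above;
# tags, decoders, and other overhead are deliberately left out.

TRANSISTORS_PER_SRAM_BIT = 6

def sram_transistors(cache_bytes):
    """Transistors needed for the data cells of a cache of `cache_bytes`."""
    return cache_bytes * 8 * TRANSISTORS_PER_SRAM_BIT

MIB = 1024 * 1024  # binary megabyte

for size_mib in (1, 2, 8):
    count = sram_transistors(size_mib * MIB)
    print(f"{size_mib} MiB cache: about {count / 1e6:.0f} million transistors")
```

      With a decimal megabyte (1,000,000 bytes) the result is exactly the 48 million quoted above; with a binary megabyte it is closer to 50 million, and an 8MB cache lands at roughly 400 million.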

      Ok, you say, so move it off the chip. Well, the problem is that part of the reason the cache can be so fast and low latency is that it's located on die. If you move it off, you start to run into much harder speed limitations. Intel discovered that with their Celerons back in the PII days.

      Real PIIs had 512k of cache, but on separate chips. Because it was off die, half chip speed was the best they could do. The Celerons only had 128k of cache, but it was on the chip die and thus ran at full speed. Now what you found was if you overclocked a Celeron to the same bus and speed as a PII (for example, a 300MHz Celeron ran on a 66MHz bus; cranking that to 100MHz made it run at 450MHz, the same as a PII), it ran at least as fast, sometimes faster, despite the lower cache.
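
      The overclocking numbers follow from a fixed clock multiplier. A tiny sketch, assuming the 4.5x multiplier implied by 300MHz on a nominal 66MHz (really 66.67MHz) bus:

```python
# Core clock = front-side bus clock x multiplier.
# Assumption: a locked 4.5x multiplier, implied by 300 MHz on a 66.67 MHz bus.

def core_clock_mhz(bus_mhz, multiplier):
    """Resulting CPU core clock for a given bus speed and multiplier."""
    return bus_mhz * multiplier

MULTIPLIER = 4.5

stock = core_clock_mhz(200 / 3, MULTIPLIER)    # 66.67 MHz bus -> about 300 MHz
overclocked = core_clock_mhz(100, MULTIPLIER)  # 100 MHz bus -> 450.0 MHz

print(f"stock: {stock:.0f} MHz, overclocked: {overclocked:.0f} MHz")
```

      Because the multiplier stays put, raising the bus by 50% raises the core clock by the same 50%, and the on-die cache speeds up right along with it.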

      Thus these days, on-die cache is what it's all about. Generally the value they pick is where diminishing returns start to seriously kick in. You discover that throwing more cache at things generally doesn't result in that big a speed increase (servers are a little different).

      So, unless someone figures out a better kind of RAM to use, we are stuck. DRAM is what we use for main memory already, and SRAM is too expensive to use very much of.

  10. Re:Link crashed Firefox by isometrick · · Score: 2, Informative
    How do you figure? Here's the response:
    greg@yak ~ $ wget -U "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050922 Firefox/1.0.6" --save-headers -O- "http://servedby.netshelter.net/serve.cgi?pid=TECH-REPORT&g=1&m=9&j=1&k=1&id=1016381969816&d=iframe" 2>/dev/null
    HTTP/1.1 200 OK
    Date: Wed, 02 Nov 2005 16:05:08 GMT
    Server: Apache/2.0.46 (Red Hat)
    Set-Cookie: ls=1; path=/; domain=.servedby.netshelter.net
    Set-Cookie: FCDEFAULT_TECH-REPORT.TECH-REPORT=1|051102160508+1|cf4bbb50bcb0d4ac; path=/; domain=.servedby.netshelter.net; expires=Wed, 09 Nov 2005 16:05:08 GMT
    P3P: CP='NOI NID PSAa PSDa OUR IND COM NAV', policyref="/w3c/p3p.xml"
    Expires: Fri, 20 Mar 1998 01:00:00 GMT
    Cache-Control: no-cache
    Pragma: no-cache
    X-Adtrix-Debug: -ac=7 VONCAN728Q405HEITNOV=GT ITTQ405TECHENTHUS728=DD VON728Q405HEISITNOV=DD CRUUKNOV05GEEK728=GT CRUUKNOV05VIPER728=GT VERZTECHENTH728NOV05=DD -ec=1 DEFAULT_TECH-REPORT=#1
    Connection: close
    Content-Type: text/html; charset=UTF-8
     
    <script language='JavaScript' type='text/javascript' src='http://techreport.com/phpads/adx.js'></script >
    <script language='JavaScript' type='text/javascript'>
    <!--
      if (!document.phpAds_used) document.phpAds_used = ',';
      phpAds_random = new String (Math.random()); phpAds_random = phpAds_random.substring(2,11);
     
      document.write ("<" + "script language='JavaScript' type='text/javascript' src='");
      document.write ("http://techreport.com/phpads/adjs.php?n=" + phpAds_random);
      document.write ("&amp;what=zone:24");
      document.write ("&amp;exclude=" + document.phpAds_used);
      if (document.referrer)
          document.write ("&amp;referer=" + escape(document.referrer));
      document.write ("'><" + "/script>");
    //-->
    </script><noscript><a href='http://techreport.com/phpads/adclick.php?n=a1feedc2' target='_blank'><img src='http://techreport.com/phpads/adview.php?what=zone:24&amp;n=a1feedc2' border='0' alt=''></a></noscript>
  11. So did ExtremeTech - and they included A64 and P4 by freidog · · Score: 4, Interesting
  12. Re:Link crashed Firefox by NanoGator · · Score: 4, Funny

    "Beware, one of the banner advertiser on that page (netshelter.net) is trying to buffer overflow with strangely crafted cookie. Hope you do not run your Firefox on Windows..."

    Just another reason to switch to IE!

    --
    "Derp de derp."
  13. The real issue ... by TheCrig · · Score: 4, Insightful

    ... Is not memory performance as such, but system performance. If a 5 percent increase in system performance increases the cost of your system by 10 percent, you have to want it pretty badly or be on the edge of required performance or just be in a schoolyard comparison. But if it's reversed, and a 10 percent increase in system performance can be had for a 5 percent increase in system price, then if you can afford the 5 percent (say $100 for a $2000 system), go for it.
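
    One way to make that comparison concrete is to look at performance per dollar before and after the upgrade. A minimal sketch using the two scenarios above; the function name is made up for illustration.

```python
# Compare performance per dollar before and after an upgrade.
# Inputs are the percentage gains in performance and in system cost.

def perf_per_dollar_change(perf_gain_pct, cost_gain_pct):
    """Percent change in performance per dollar caused by the upgrade."""
    ratio = (1 + perf_gain_pct / 100) / (1 + cost_gain_pct / 100)
    return (ratio - 1) * 100

# 5% faster for 10% more money, then the reverse case.
poor_deal = perf_per_dollar_change(5, 10)
good_deal = perf_per_dollar_change(10, 5)

print(f"5% perf for 10% cost: {poor_deal:+.1f}% performance per dollar")
print(f"10% perf for 5% cost: {good_deal:+.1f}% performance per dollar")
```

    The first case loses about 4.5% in performance per dollar and the second gains about 4.8%, which is the asymmetry the $100-on-a-$2000-system example is pointing at.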

    --
    -- Jim Crigler In 1937, I began, like Lazarus, the impossible return. -- Whittaker Chambers
  14. Re:Can't Read the Article by DrWhizBang · · Score: 4, Funny

    Sounds like a memory timing issue - you should upgrade to some OCZ low-latency RAM!

    (I'm sorry, that's not helpful at all, is it?)

    --
    Schrodinger's cat is either dead or really pissed off...
  15. Re:The underestimated impact of latency. by Zathrus · · Score: 5, Interesting

    Sorry, I call BS on your entire post. The difference in latencies here is minuscule -- it's not like we're talking about having the CPU wait 2 clock cycles vs 30 clock cycles. It's closer to 13 vs 25 (not exact, but the magnitude of difference is close). That just doesn't matter that much -- the reality is that if you have a cache miss then you're looking at 20-30 cycles (or, more likely, 40-60 cycles) of stall while you fetch the data from main memory.

    The kind of changes you're talking about require vastly faster memory. Not the kind of latency differences being discussed here at all. Both of these are "high latency" compared to what would be needed for your theoretical redesign of the entire software stack. And even then, you just become utterly and completely screwed if you have to hit virtual memory, possibly more so than you are now because you've re-orchestrated everything around the idea that latency is a non issue.

    Oh, and latency is getting worse, not better, and has been for a long, long time. CPU speeds long ago outstripped the speeds of our fastest memory (well, fastest while still not costing absurd amounts of money...), and the newer memory formats (DDR, DDR2, DDR3, RDRAM, etc) have higher latencies in exchange for greater bandwidth.

  16. Re:Save money? by CountBrass · · Score: 3, Funny

    I'm sorry, but you are too stupid to post on /.

    --
    Bad analogies are like waxing a monkey with a rainbow.
  17. Re:The underestimated impact of latency. by twiddlingbits · · Score: 2, Informative

    The software knows nothing about memory latency; the software only knows it needs to move a block of data from point A to point B. That Java/C/C++ Move_Memory function translates at the lowest level to machine code instructions which are implemented in the logic of the silicon. The coder or the compiler may optimize the ORDER of execution of the instructions, or use different instructions (such as BlockMoves) to speed things up, but the basic underlying machine instructions execute the same way every time (either they hit the cache and load from there, or they miss and a memory fetch is executed across the memory bus). On-chip caches were designed to minimize memory fetches and their associated latency. On-chip caches are small and fast and are a different design than the external memory.

    What you would want is to eliminate the wait states from CPU to RAM (or get more cache hits), and that is NOT something a compiler or OS can do for you; that is done in the algorithms that run the CPU. You can change that to some extent in the BIOS settings, to tell the CPU that memory wait states are zero, or that the clock is higher, but IIRC the CPU, memory, and bus controller have to agree on all these settings and must be able to implement the timing. Overclocking the CPU won't fix this when the bus and memory can't run any faster.

  18. Re:If you can afford a cup of coffee a day... by merdaccia · · Score: 4, Funny

    Your analogy does not hold. Slashdot is a high latency site. By the time I've read a few comments, I've usually forgotten what the story was about.

    Wait, why am I posting this comment again?

    --

    *blinking cursor*

  19. What does this mean? by Flying+pig · · Score: 2, Interesting
    All memory has an access time, and the further you get from the CPU the longer it is going to be. CPU registers have the shortest access time, with (nowadays) subnanosecond access. L1 cache comes next, then L2, then external RAM, then HDD, and finally the slow backing store represented nowadays by CD and DVD. This hierarchical memory architecture changes with time mostly in that the caches grow bigger, so the 640K of RAM from DOS days now fits into the cache of each processor in a Pentium-D with room to spare, and a Pentium-M could in theory run DOS with extended and expanded memory without needing any external RAM at all. (I'd almost like to try that.)

    So talking about optimisation for low-latency RAM is, I suspect, nonsense. What we are surely seeing here is that the actual limitation on memory bandwidth is somewhere else - in the memory controller, in the cache controller, in the CPU fetch rate, in the rate at which stuff is being fetched from hard disk, in bus contention. Overclocking - speeding up memory controllers and buses - will have an effect. Reducing the number of wait states on the memory bus will not have much effect on performance if the total number of active memory cycles in a given period is largely unchanged.

    If you had a need for real speed in an application which was not dependent on the graphics subsystem or access to network and HDD, I am sure you could get much more performance out of low-wait state RAM, but you would do it by HARDWARE design, not by software optimisation.

    As a simple example from the dim and distant past when I was building hardware, TI used to have a microcontroller called the TMS9995 which ran at, for the day, a hefty 12MHz. With the slow DRAM of the time, it always needed a wait state and this meant that it could manage, as I recall, two memory accesses per microsecond. With static RAM, it could manage 3. The 9995 actually stored its working registers in external memory and so this meant a real world speedup of nearly 30%. The 8088, on the other hand, kept its working registers on-chip and had a limited instruction pipeline. As a result, the equivalent speedup was nothing like 30%. This was due to hardware differences not software differences.

    In fact, the applications which really test out the memory subsystem are not games - they are databases and webservers, which hardly use the graphics system at all. And in these cases, for low end systems, the big beast in the equation is cache. It's quite astonishing how a Pentium-M can churn through a badly designed join while a low end AMD 64 struggles, simply because one has 2Mbytes of cache and the other has only 512K. As a result, for ordinary technical laptop and desktop work, I now specify Pentium-M, the AMD 64 with 1Mbyte cache, or Pentium-D with 1Mbyte per core. You know it makes sense. (And now everyone can explain why I'm wrong, in my turn)

    --
    Pining for the fjords
  20. Scientific computing benefits from this by Orp · · Score: 2, Interesting

    I do large 3D thunderstorm simulations. With some of the larger simulations I am integrating lots of things, contained in 3D floating point arrays, over 1 billion or more gridpoints (using distributed computing, such as a beowulf cluster made up of dual Xeons or an SGI Altix system). Each scientific calculation requires accessing floating point values stored in these arrays, doing some math, and updating another array.

    Memory latency and memory bandwidth both impact how long it takes my simulations to complete. Let's say it is the difference between a simulation taking a week vs. five days... this is significant to me and how much I can get done. With these heavy duty scientific models and such, you really can see a noticeable benefit with the fancier hardware, and clock speed is certainly not the only factor to consider by a long shot.

    --
    A squid eating dough in a polyethylene bag is fast and bulbous, got me?
  21. (can't have a subject that starts with $) $ by freeweed · · Score: 2, Insightful

    Price. Well, price and size, but mostly price.

    Cache isn't some magical thing. It's simply RAM. SRAM, usually, which is why it's so fast (don't have to waste power/time refreshing your contents). At the end of the day, it's just some very fast RAM. It sits between your CPU and the rest of your RAM, and uses its increased speed to "trick" the CPU into performing as if your main RAM is much faster than it is.

    In my computer arch course a while back, someone asked why, if cache is so fast, we don't just build computers 100% SRAM memory. Our professor did some back-of-the-napkin calculations for fun. Major $. Have to include the extra space and cooling requirements, of course :)

    The other thing, of course, is the good old law of diminishing returns. Cache actually solves the problem VERY nicely. For most people/computers/applications, cache misses aren't that great of a problem, because most computer code lends itself to cache hits (a phenomenon called "locality"). Locality is WHY we have cache in the first place. In general most computing works very well with a tiny amount of very fast cache and a small amount of fast cache. Adding more eventually gets you to the point where you're not seeing much if any improvement. On most modern systems, we're at that point - at least as far as the market will bear.
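
    The diminishing-returns point can be sketched with the textbook average memory access time formula (hit time plus miss rate times miss penalty). The 3-cycle hit time, the 50-cycle miss penalty, and the miss rates below are assumed round numbers for illustration, not measurements from any real CPU.

```python
# Average memory access time (AMAT) = hit time + miss rate * miss penalty.
# Assumptions: 3-cycle cache hit, 50-cycle miss penalty, and a set of
# hypothetical miss rates standing in for progressively larger caches.

def amat_cycles(hit_time, miss_rate, miss_penalty):
    """Average cycles per memory access for a single cache level."""
    return hit_time + miss_rate * miss_penalty

HIT_TIME = 3
MISS_PENALTY = 50

for miss_rate in (0.10, 0.05, 0.02, 0.01):
    cycles = amat_cycles(HIT_TIME, miss_rate, MISS_PENALTY)
    print(f"miss rate {miss_rate:.0%}: {cycles:.1f} cycles per access")
```

    Under these assumptions, cutting the miss rate from 10% to 5% saves 2.5 cycles per access, while cutting it from 2% to 1% saves only 0.5, even though each step would typically cost at least as much extra cache as the last.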

    Oh, and outside of the HPC world, there's no NEED for programmers to worry about memory caching issues. This isn't where most bottlenecks show up, and again, most general-purpose code lends itself very nicely to small amounts of cache. Compilers often help here, too. Most of your average programmers would get better use of their time analyzing the data structures and algorithms they use.

    --
    Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
  22. Re:Link crashed Firefox by Randolpho · · Score: 2, Informative

    It's not really that weird. Vertical bars are a popular list separation item, and I've used them in cookies for web applications I've designed many times. In your example, you have a twelve-item list, with the first two items equal to "CA" and "NA" respectively, and the remaining equal to "".

    What they're doing with the list is anybody's guess. :)

    --
    "Times have not become more violent. They have just become more televised."
    -Marilyn Manson
  23. As we say in Psychology by frank249 · · Score: 2, Funny

    Better latent than never.

    --

    Today's vices may be tomorrow's virtues.

  24. Re:The underestimated impact of latency. by Woody77 · · Score: 2, Interesting

    However, if you have algorithmically intensive software (spending lots of time in the same loops or crunching large amounts of data), it's worthwhile to instrument your code and see how you're doing for cache hits/misses. You might discover that by tweaking the innermost loops or the size of the blocks you crunch, you can better fit the cache of the target processor.
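
    A minimal sketch of what tweaking the block size looks like structurally, using a blocked (tiled) matrix transpose. This only illustrates the loop shape: pure Python lists will not show the cache benefit, which shows up in compiled code working on contiguous arrays. The block size of 4 is an arbitrary placeholder that would be tuned to the target cache.

```python
# Loop tiling: process the matrix in small square blocks so that the
# rows being read and the rows being written can both stay in cache.
# Assumption: block=4 is a placeholder; real code tunes it per processor.

def transpose_naive(matrix):
    """Straightforward transpose; strides through the output column-wise."""
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[j][i] = matrix[i][j]
    return out

def transpose_blocked(matrix, block=4):
    """Same result, but visits the matrix one block x block tile at a time."""
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # min() handles edge tiles when n is not a multiple of block.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j][i] = matrix[i][j]
    return out

matrix = [[row * 10 + col for col in range(10)] for row in range(10)]
assert transpose_blocked(matrix, block=4) == transpose_naive(matrix)
```

    Both versions do the same work in a different order; the tiled one just keeps each pass inside a small working set, which is the knob you would adjust after looking at cache miss counts.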

    Word/Excel isn't going to bother, but a game might be worth stuffing a few versions of tweaked loops in that are selected by a loop invariant, or by feeding the functions some data ahead of time to help guide them to use the best sizes of data that they can.

    This isn't unlike memory alignment for structures, and taking a massive performance hit for the data not being "easy" for the assembly instructions to process.

    One example is the ability to loop-unroll the innermost butterflies of an FFT on the x86-64 extension using the extra registers that are available there. That WILL get you a noticeable increase in performance.

    But these are always the last 20% kinds of increases...