The Fundamentals Of Cache

← Back to Stories (view on slashdot.org)

Posted by Hemos on Tuesday October 17, 2000 @12:44AM from the learn-more dept.

Dave wrote to us with an article currently running on SystemLogic that delves into caching, with specific examples taken from the Athlon and PIII processors. It also talks about the different types of cache - fairly technical, but an all around good read.

7 of 60 comments (clear)

Min score:

Reason:

Sort:

Re:Latency not bandwidth, surely! by Cenotaph · 2000-10-16 20:44 · Score: 3

Actually, the article is correct.
Modern processors have all sorts of things to deal with latency. Branch prediction, multiple issue slots, out-of-order execution, etc. However, this hardware generally requiers a group of instructions to work on, the more the better. Keeping a large number of instructions around for the processor to use requires more bandwidth.
That being said, when you have adequate bandwidth, latency does become more of a problem. That's why chip multiprocessors (CMPs) and Simultaneous Multithreading Processors (SMTs) are becoming a large focus of current processor research.

--
"You can put a man through school,
But you cannot make him think."

--
"You can put a man through school,
But you cannot make him think."
Ben Harper
Re:Asmheads can check check cache effects easily by mav[LAG] · 2000-10-16 21:53 · Score: 3

Gah - that will teach me to use the preview button :)
It isn't fair to certain processors as there are restrictions on using addresses with the same low 4 address bits back-to-back (original Pentium certainly).
True - this example is meant to show how the Pentium (PPLain in Agner's docs) can be tripped up.
- You've chosen to access non-aligned data (accessing 16 bit values on an odd, and cache line spanning, address)
Very deliberately I might add :) The point I was making was that cache considerations play a big role in optimising inner loops. The above code is an example written by Niklas Beisert a.k.a Pascal of Cubic Team to show the effects of cache misses when doing snazzy bitmap effects.

--
--- Hot Shot City is particularly good.
Re:Latency not bandwidth, surely! by n3rd · 2000-10-16 20:11 · Score: 3

But surely it's latency that is of primary importance and bandwidth takes second place?

At first I thought you were right, but the more I thought about it, the more I feel the article is correct.

The best way to think of this is on high end systems. Think of a Sun Ultra Enterprise 450 with a couple gigs of RAM, a couple processors and a bunch large, memory and CPU intensive programs running.

You have 20 processes each which needs a slice of CPU time. Each time a process runs on the CPU, parts of the process are copied from RAM to the CPU to execute. That processes executes for its specified time slice, the kernel stops it, copies the results back to RAM, and then does it again with another process. Now imagine this happening hunderds of times per second! The memory bus gets even more saturated with more processors since there are more RAM to CPU copies and vice-versa.

This is where cache comes in. Part of the program that just executed is kept in the cache so the next time it's time slice comes around (ie: it's time to run on the CPU), there won't have to be a copy from RAM to the CPU. The CPU simply grabs it out of it's cache, thus freeing up bandwidth on the memory bus.
Asmheads can check check cache effects easily by mav[LAG] · 2000-10-16 20:56 · Score: 4

fast: mov dx,12 l1: mov cx,32768 l2: mov ax,[0] mov ax,[0] mov ax,[0] dec cx jnz l2 dec dx jnz l1 slow: mov dx,12 l3: mov cx,32768 l4: mov ax,[4095] mov ax,[8191] mov ax,[12287] dec cx jnz l4 dec dx jnz l3 Nearly identical loops, except the first one flies and the second one thrashes because it misses the cache 100% of the time. Can you say 10-50 times slower?

--
--- Hot Shot City is particularly good.
Re:Latency not bandwidth, surely! by kinnunen · 2000-10-16 21:03 · Score: 3

Context switching doens't really take insane amounts of bandwithdth. You copy the contents of all the registers to memory (including all registers not visible to programmer), then copy the registers of the new process from RAM to CPU. I am intimately famiar with x86 CPU, but I'd guesstimate this shouldn't take more than maybe half a kilobyte of memory transfers total. Do that 400 times and second a we are talking about one fifth of a megabyte of used bandwith per second. I can live with that.
Of course you are right in that cache does increase available bandwidth and that this is a good thing, but latency really is the thing we want to cure. For the original Athlon (though I'm guessing this is one of the faster models with 1/3 L2 dividor), 24 clock cycles go to waste even if the data is on L2, and even that is considerably faster than DRAM. Add to this that x86 is a really architecture because almost every instrucion can (and will) reference memory at least once (in addition to the mandatory instruction fetch).

--
Latency not bandwidth, surely! by WSSA · 2000-10-16 19:59 · Score: 3

The article starts out talking of "bandwidth starved" processors. But surely it's latency that is of primary importance and bandwidth takes second place?
Efficient use of a cache - App writers fault. by fatphil · 2000-10-16 21:37 · Score: 4

Unhappy with the performance of someone elses memory hungry code (50-100MB working RAM footprimt) I wrote my own version of the utility. It was only marginally faster than the original. However, I increased the performance by a factor of 10 when I realied that I could cache up jobs to do, then in turn perform those jobs on each 4MB chunk of data (Dec Alpha with 4MB L3 cache). I managed to increase performance even further by aiming the code at the L2 cache instead! The total number of bytes read/written was identical, but simply changing the order in whihc they were done increated performance 12-15 times.

However, as soon as you do take into account caching issues, you sometimes start making non-portable decisions. (not always though, as generally most architectures have the problem but the lines are simply drawn in different places).

FatPhil

--
Also FatPhil on SoylentNews, id 863