Ask Slashdot: Is the Gap Between Data Access Speeds Widening Or Narrowing?
New submitter DidgetMaster writes: Everyone knows that CPU registers are much faster than level1, level2, and level3 caches. Likewise, those caches are much faster than RAM; and RAM in turn is much faster than disk (even SSD). But the past 30 years have seen tremendous improvements in data access speeds at all these levels. RAM today is much, much faster than RAM 10, 20, or 30 years ago. Disk accesses are also tremendously faster than previously as steady improvements in hard drive technology and the even more impressive gains in flash memory have occurred. Is the 'gap' between the fastest RAM and the fastest disks bigger or smaller now than the gap was 10 or 20 years ago? Are the gaps between all the various levels getting bigger or smaller? Anyone know of a definitive source that tracks these gaps over time?
The distance between the "fastest" and "slowest" gets larger and larger, but the gaps are getting smaller because things like SSDs fill them.
...but not just because someone else is too lazy to do it themselves. Doing the maths would have taken less effort than writing this /. fill-piece.
Does it matter? Fast CPU, fast RAM, fast disks is like having no speed limits on every race track in the world - but in order to get from track to track you have to go on the interstates or perhaps back country roads (PCI bus, etc). Sure, each component is fast and getting faster, but the way those components connect to each other hasn't changed all that much...
Don't blame me, I voted for Kodos
This could literally be answered with three google searches. '2015 l2 cache speed' 'ddr4 speed' '2015 ssd speed'
I'm not sure what a historic timeline of these ratios (not "differences", please) would gain you.
These ratios can have a big impact on what algorithms and implementations you choose to maximize performance. I suppose if, say, the ratio of RAM to disk speed increased by a factor of 10 over the decade before last, then decreased back to its original ratio in the last decade, it might be worth trawling through some old papers (or old source trees) to revisit lessons learned in the earlier period -- but that seems like a bit of a stretch.
If you're just curious, it shouldn't be too hard to generate timelines of CPU cycle speeds, cache and RAM latencies and bandwidths, disk performance, and so on. But really, each of those has enough factors that a simple "ratio" would probably conceal more than it illuminates.
for a CS or IT class?
Yes, yes it is.
PlanetVulkan.com
is still in effect. Every memory speed increase is perforce dwarfed by speed increases above in the hierarchy. The ongoing stall in increasing CPU clock speeds has obscured the nature of the problem, but we'll never go back to the 1980s when CPUs were not fast enough to cause memory hot spots.
No one has attacked the problem successfully except for architects who have designed split-phase memory transactions for loads and stores along with the capability for many of them to be simultaneously in flight. This requires lots of silicon for management, and significant investment in compiler technology to be effective, a cost that almost no one has been willing to bear.
"Everyone knows that CPU registers are much faster than level1, level2, and level3 caches."
I'd argue that most people don't even know what a CPU register is, never mind what it's faster than.
BeauHD. Worst editor since kdawson.
Originally, there were CPU registers and memory. Then there were registers, memory, and disk. Then there were registers, SRAM cache, memory, and disk. Then there were registers, L1 cache (on CPU), L2 cache (on mobo), DRAM, and disk. Then the L2 moved onto the CPU. Then there was L3. Then SSDs were added between RAM and disk. Now some chips have an L4 cache on the CPU package (but not the CPU die).
Oh, and there's a difference between latency and bandwidth. DRAM latency has not significantly improved over time, particularly compared to DRAM bandwidth.
And with multiple cores, some levels are core-specific while others are not. You can even have a bizarre situation where L1 cache is per-core, L2 cache is shared between two cores, and L3 cache is per-CPU (in SMP setups, that means main RAM is the first level shared among all cores).
For a number of years, we were in the technological progress era, we're now in the commercial progress era.
Slashdot, fix the reply notifications... You won't get away with it...
I think that a lot of good IT folks are looking at where the bottlenecks are within the technologies available and making implementation decisions in a much more detailed way. Right now, as has often been the case, the bus is the bottleneck for a lot of applications. Sure, there are memory-bound and I/O-bound problems that still see the gaps between memory speed and disk read/write speeds as an issue, but there are far more problems that are handicapped by a comparatively hobbled system bus. One prime example is problems that could be solved by GPGPU systems but have to swap a lot of data across the system bus to an expansion bus and get hung up doing so. The same could be said for coprocessors like the Xeon Phi. Bus speeds have always lagged the other technologies that interface with them, and that's really a concern for most computing problems. Overall, I'd say that the gaps are getting smaller, but there is still a lot of work to be done to integrate the various protocols and technologies used in the end-to-end computing environment in order to make things more streamlined from a data throughput point of view.
The latency of RAM is improving very slowly, only something like a 2x-4x improvement in the last 20 years.
Only the bandwidth of the memory is growing faster, and that's just because they have been putting more DRAM cells in parallel, always doing bigger data transfers and having a faster memory bus.
The same is true for hard disk drive speed: the rotation speed dictates the random access latency, and the rotation speed of the average hard disk has only gone up from 4200 or 5400 to 7200 rpm in the last 20 years, meaning only a 1.7x or 1.33x improvement in random access latency.
Though replacing hard disks with flash-based SSD storage has improved latency by a huge margin.
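To put numbers on the rotation-speed point above, here is a minimal sketch that assumes the usual approximation that average rotational latency is the time for half a revolution. It ignores seek time and everything else that goes into a real random access, and the function names are made up for illustration.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one revolution."""
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


def improvement(old_rpm: float, new_rpm: float) -> float:
    """How many times lower the rotational latency is at new_rpm than at old_rpm."""
    return avg_rotational_latency_ms(old_rpm) / avg_rotational_latency_ms(new_rpm)


if __name__ == "__main__":
    for rpm in (4200, 5400, 7200):
        print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
    print(f"4200 -> 7200 rpm: {improvement(4200, 7200):.2f}x")
    print(f"5400 -> 7200 rpm: {improvement(5400, 7200):.2f}x")
```

Under this approximation the improvement is just the ratio of the rpm figures, which is where the 1.7x and 1.33x come from.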
There are orders of magnitude difference in the access times. Disk access is measured in milliseconds; memory access is measured in nanoseconds; register speed in picoseconds. Improving disk access by even 1 millisecond closes the gap.
A modern cheap laptop from today is faster than a Convex supercomputer I was using 20 years ago. But in that same time span disk seek times dropped from 15 milliseconds down to about 7.5 milliseconds which is only a factor of two.
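A rough way to see how much halving the seek time "closes the gap" is to express the gap in orders of magnitude. The sketch below uses the 15 ms and 7.5 ms seek times from this comment; the 100 ns memory access time is purely an assumed placeholder, not a measured figure, and the helper name is hypothetical.

```python
import math

NS_PER_MS = 1_000_000  # 1 millisecond = 10^6 nanoseconds


def gap_orders_of_magnitude(slow_ns: float, fast_ns: float) -> float:
    """Gap between two access times, expressed as a power of ten."""
    return math.log10(slow_ns / fast_ns)


ASSUMED_MEMORY_NS = 100          # assumption for illustration only
OLD_SEEK_NS = 15 * NS_PER_MS     # 15 ms, from the comment
NEW_SEEK_NS = 7.5 * NS_PER_MS    # 7.5 ms, from the comment

if __name__ == "__main__":
    old_gap = gap_orders_of_magnitude(OLD_SEEK_NS, ASSUMED_MEMORY_NS)
    new_gap = gap_orders_of_magnitude(NEW_SEEK_NS, ASSUMED_MEMORY_NS)
    print(f"old gap: {old_gap:.2f} orders of magnitude")
    print(f"new gap: {new_gap:.2f} orders of magnitude")
    print(f"closed by: {old_gap - new_gap:.2f}")
```

Whatever memory latency you plug in, a factor-of-two improvement in seek time shrinks the gap by log10(2), about 0.3 of an order of magnitude, provided the memory side stays put.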
We don't see the world as it is, we see it as we are.
-- Anais Nin
Yes, it does matter.
For example, if you want to sort objects which do not fit at one level, the sorting may spill over to the next level of much slower memory. This imbalance of speed is relevant regardless of the interconnect speed limits, since the speed difference itself is the culprit.
There are some ways to improve on that.
From Wikipedia > Sorting_algorithm > Memory_usage_patterns_and_index_sorting
https://en.wikipedia.org/wiki/...
That is one example of a solution to a problem that matters.
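As a concrete illustration of sorting data that does not fit at one level, here is a minimal sketch of an external merge sort. This is one well-known approach and not necessarily the one the linked Wikipedia section emphasizes. It assumes the data is a stream of integers, `chunk_size` stands in for "what fits in the fast level", and the function names are made up for this example.

```python
import heapq
import tempfile
from itertools import islice


def _write_run(sorted_chunk):
    """Spill one sorted chunk to a temporary file, one integer per line."""
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(f"{value}\n" for value in sorted_chunk)
    run.seek(0)
    return run


def _read_run(run):
    """Stream integers back from a spilled run, one at a time."""
    for line in run:
        yield int(line)


def external_sort(values, chunk_size):
    """Yield integers in ascending order.

    During the sorting phase at most chunk_size values are held in memory;
    the merge phase then streams the runs back, one value per run at a time.
    """
    iterator = iter(values)
    runs = []
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        chunk.sort()                      # fast: fits in the "small" level
        runs.append(_write_run(chunk))    # slow: spills to the next level
    try:
        # Sequential reads of each run suit the slower level far better
        # than the random accesses of an in-place sort would.
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()


if __name__ == "__main__":
    print(list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))
```

The design choice is to touch the slow level only with long sequential reads and writes, which is the access pattern it handles best.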
20 years ago main memory was 10-14 ns, instruction cycle time was 2-4ns (Cray)
Guess what? it still is.
Memory has grown, it has gotten cheaper.
What HASN'T changed? Access to memory. That is how Cray got its speed - instead of a single port to memory, it used a crossbar switch - 4 ports for each processor. 1 instruction bus, 2 input data busses, and one output bus; even I/O got its own port to memory; all with overlapping address/data cycles.
The effect was that all of main memory worked at the speed of cache, thus the CPU had no need to waste silicon on cache memory - and the entire system ran full speed.
What slows down the current systems? Memory access. Most systems only have a single port to main memory. Some servers and "high performance" desktops have dual ported memory. Yet even dual ported memory access is slow when you have to share it among 4/8 cores... plus I/O (which isn't dual ported). Interrupt latency on PCs is really horrible. Still only 15 IRQs? and have to share them? No direct vectoring? Forced interrupt chain actions? Even the old PDP 11 with ONE interrupt request line allowed direct interrupt vectoring (64 basic vectors) to reduce overhead.
There hasn't been much innovation in architecture in over 20 years.
It's unlikely (but still possible) that it's staying the same.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Each memory address is normally associated with 8 bits of data (not counting correction bits). But processors nowadays routinely consume 64 bits at a time. That means getting the data from 8 different addresses simultaneously. Things would be simpler if they put all those 64 bits at one address --if every single address had 64 bits of data associated with it. In the previous processor generation, gobbling 32 bits at a time meant accessing 4 different addresses simultaneously, and the total accessible address space of the processor was essentially 30 bits instead of 32 bits --while you were allowed to access 4 addresses starting at Address Zero, you were not allowed to access 4 addresses starting at Address One or Address Two or Address Three. It could have been allowed if every address had had 32 data-bits associated with it. With 64-bit processors today needing to access 8 addresses at a time, the total effective address space is 61 bits instead of 64 bits (still a huge number, I know). Anyway, my main reason for writing this is, wouldn't memory run slightly faster if it didn't have to access all the data from 8 addresses simultaneously, but instead just got 64 bits, nicely parallel from any one address?
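The counting argument in that comment can be sketched as follows, assuming byte addressing and strictly aligned word accesses. Some processors do allow unaligned accesses, so treat this as an illustration of the arithmetic only; the helper names are hypothetical.

```python
def is_aligned(address: int, word_bytes: int) -> bool:
    """True if a word of word_bytes bytes may start at this byte address."""
    return address % word_bytes == 0


def aligned_word_count(address_bits: int, word_bytes: int) -> int:
    """Number of distinct aligned words in a byte-addressed space."""
    return (2 ** address_bits) // word_bytes


if __name__ == "__main__":
    # 32-bit addresses, 4-byte words: 2^30 aligned words.
    print(aligned_word_count(32, 4) == 2 ** 30)
    # 64-bit addresses, 8-byte words: 2^61 aligned words.
    print(aligned_word_count(64, 8) == 2 ** 61)
    # Address 0 is a valid start for an 8-byte word; addresses 1..7 are not.
    print([addr for addr in range(16) if is_aligned(addr, 8)])
```

In other words, the low 2 bits (32-bit words) or low 3 bits (64-bit words) of an aligned address are always zero, which is where the "30 bits" and "61 bits" figures come from.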
This article can shed some light on it: http://www.dba-oracle.com/t_hi... Looks like RAM is the laggard.
You can look up the specs easy.
Back in the 80286 days, there was not even an L1 cache; however, the memory and ISA bus ran at CPU speed, 8-20 MHz. Hard drive latency was ~65ms.
In the 80486 days L1 cache was introduced and L2 was sometimes available in (very) expensive modules. I remember buying 256 KB for the same price as 16 MB of RAM. The L1 caches ran (if I remember correctly) at CPU speed, 1 cycle. However, the bus speed started to slow down compared to the CPU. The VLB ran at CPU bus speed ("local" bus) and was often used for graphics, but PCI (an inferior bus) ran at 33MHz, so for anything over 33MHz we started needing dividers. The RAM ran at 80-120ns, so it started being slower than the CPU bus. Hard drive latency, however, had improved to ~30ms.
In the Pentium age memory slowed even further compared to the CPU bus. Now it took several cycles to access memory, and buses ran even slower (still PCI mostly, eventually PCI-X (133MHz?)) until PCI-e (serial buses) came along. Hard drive latency improved to ~15ms.
In the modern age, L1 caches have slowed even further, requiring 4 cycles for L1 cache and up to 30 for L3 caches. RAM access is even slower, with bus speeds about a quarter of a single CPU, but sometimes 16 CPUs need to share those lanes. Peripheral bus speeds, however, have gone up, and PCIe 3.0 is now directly integrated into the CPU, 80486 VLB-style. Hard drives still have latencies of 10ms (we have a mechanical issue there), but even cheap SSDs can go down to ~1-2ms.
Custom electronics and digital signage for your business: www.evcircuits.com
https://www.sandisk.com/busine...
Not only is the underlying physical technology getting better, but the software (aka filesystem) utilizing that hardware is also becoming more efficient. The likes of ZFS and ext4 are far better than predecessors (UFS or ext2/3). No troll-o, but I think NTFS and FATx are static in performance across hardware revisions.
Gah, forgot to log in.