Ask Slashdot: Is the Gap Between Data Access Speeds Widening Or Narrowing?
New submitter DidgetMaster writes: Everyone knows that CPU registers are much faster than level1, level2, and level3 caches. Likewise, those caches are much faster than RAM; and RAM in turn is much faster than disk (even SSD). But the past 30 years have seen tremendous improvements in data access speeds at all these levels. RAM today is much, much faster than RAM 10, 20, or 30 years ago. Disk accesses are also tremendously faster than previously as steady improvements in hard drive technology and the even more impressive gains in flash memory have occurred. Is the 'gap' between the fastest RAM and the fastest disks bigger or smaller now than the gap was 10 or 20 years ago? Are the gaps between all the various levels getting bigger or smaller? Anyone know of a definitive source that tracks these gaps over time?
The distance between the "fastest" and "slowest" gets larger and larger, but the gaps are getting smaller because things like SSDs fill them.
...but not just because someone else is too lazy to do it themselves. Doing the maths would have taken less effort than writing this /. fill-piece.
There is currently a real gap gap.
Does it matter? Fast CPU, fast RAM, fast disks is like having no speed limits on every race track in the world - but in order to get from track to track you have to go on the interstates or perhaps back country roads (PCI bus, etc). Sure, each component is fast and getting faster, but the way those components connect to each other hasn't changed all that much...
Don't blame me, I voted for Kodos
This could literally be answered with three google searches. '2015 l2 cache speed' 'ddr4 speed' '2015 ssd speed'
Memristors can be laid out in a large grid, which might be the future, but I really don't see the point of talking about the evolution of the hard drive into the CPU.
I'm not sure what a historic timeline of these ratios (not "differences", please) would gain you.
These ratios can have a big impact on what algorithms and implementations you choose to maximize performance. I suppose if, say, the ratio of RAM to disk speed increased by a factor of 10 over the decade before last, then decreased back to its original ratio in the last decade, it might be worth trawling through some old papers (or old source trees) to revisit lessons learned in the earlier period -- but that seems like a bit of a stretch.
If you're just curious, it shouldn't be too hard to generate timelines of CPU cycle speeds, cache and RAM latencies and bandwidths, disk performance, and so on. But really, each of those has enough factors that a simple "ratio" would probably conceal more than it illuminates.
for a CS or IT class?
Yes, yes it is.
... the answer is almost certainly "yes."
Then again, the gap could be the same as it was decades ago. But my hunch is the answer is yes, "the Gap Between Data Access Speeds [is] Widening Or Narrowing".
is still in effect. Every memory speed increase is perforce dwarfed by speed increases above in the hierarchy. The ongoing stall in increasing CPU clock speeds has obscured the nature of the problem, but we'll never go back to the 1980s when CPUs were not fast enough to cause memory hot spots.
No one has attacked the problem successfully except for architects who have designed split-phase memory transactions for loads and stores along with the capability for many of them to be simultaneously in flight. This requires lots of silicon for management, and significant investment in compiler technology to be effective, a cost that almost no one has been willing to bear.
"Everyone knows that CPU registers are much faster than level1, level2, and level3 caches."
I'd argue that most people don't even know what a CPU register is, never mind what it's faster than.
BeauHD. Worst editor since kdawson.
Personally, I'd say we've seen clock rates essentially stall for 15 years. Instead, sleight of hand has been used through multi-core tech: a brute-force measure until they fix the main issue, namely clock speeds at the bus level without having to use multipliers.
Originally, there were CPU registers and memory. Then there were registers, memory, and disk. Then there were registers, SRAM cache, memory, and disk. Then there were registers, L1 cache (on CPU), L2 cache (on mobo), DRAM, and disk. Then the L2 moved onto the CPU. Then there was L3. Then SSDs were added between RAM and disk. Now some chips have an L4 cache on the CPU package (but not the CPU die).
Oh, and there's a difference between latency and bandwidth. DRAM latency has not significantly improved over time, particularly compared to DRAM bandwidth.
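To make the latency/bandwidth distinction concrete, here is a minimal sketch of the usual first-order model, transfer time = latency + size / bandwidth. The function name and every number in it are illustrative assumptions, not measurements of any real DRAM part.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_per_s):
    """First-order model: fixed latency plus size divided by bandwidth."""
    # 1 GB/s is 1 byte per nanosecond, so size / bandwidth is already in ns.
    return latency_ns + size_bytes / bandwidth_gb_per_s


# Hypothetical figures, chosen only to show the shape of the trade-off.
LATENCY_NS = 50.0

for bandwidth in (5.0, 20.0):
    small = transfer_time_ns(64, LATENCY_NS, bandwidth)
    large = transfer_time_ns(1_000_000, LATENCY_NS, bandwidth)
    print(f"{bandwidth:>5.1f} GB/s: 64 B -> {small:.1f} ns, 1 MB -> {large:.1f} ns")
```

With these made-up numbers, quadrupling the bandwidth cuts the 1 MB transfer roughly fourfold but barely moves the 64-byte access, which stays dominated by latency.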
And with multiple cores, some levels are core-specific while others are not. You can even have a bizarre situation where L1 cache is per-core, L2 cache is shared between two cores, and L3 cache is per-CPU (in SMP setups, that means main RAM is the first level shared among all cores).
For a number of years, we were in the technological progress era, we're now in the commercial progress era.
I think that a lot of good IT folks are looking at where the bottlenecks are within the technologies available and making implementation decisions in a much more detailed way. Right now, as has often been the case, the bus becomes the bottleneck for a lot of applications. Sure, there are memory-bound and I/O-bound problems that still see the gaps between memory speed and disk read/write speeds as an issue, but there are far more problems that are handicapped due to the system bus speed being hobbled, comparatively. One prime example is problems that could be solved by GPGPU systems that have to swap a lot of data across the system bus to an expansion bus and get hung up doing so. The same could be said for coprocessors like the Xeon Phi. Bus speeds have always lagged other technologies that interface with them, and that's really a concern for most computing problems. Overall, I'd say that the gaps are getting smaller, but there is still a lot of work to be done to integrate the various protocols and technologies used in the end-to-end computing environment in order to make things more streamlined from a data throughput point of view.
The latency of RAM is improving very slowly, only something like a 2x-4x improvement in the last 20 years.
Only the bandwidth of the memory is growing faster, and that's just because they have been putting more DRAM cells in parallel, always doing bigger data transfers, and running a faster memory bus.
The same is true for hard disk drive speed: the rotation speed dictates the random access latency, and the rotation speed of the average hard disk has only gone up from 4200 or 5400 to 7200 rpm in the last 20 years, meaning only a 1.7x or 1.33x improvement in random access latency.
Though replacing hard disks with flash-based SSD storage has improved latency by a huge margin.
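The rotation-speed arithmetic above can be checked with a short sketch. It assumes the standard rule of thumb that the average rotational wait is half a revolution, and it ignores seek time entirely; the function name is made up for illustration.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a sector to come around: half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


for rpm in (4200, 5400, 7200):
    print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")

# Improvement ratios for the two starting points mentioned above.
print(f"4200 -> 7200 rpm: {7200 / 4200:.2f}x")
print(f"5400 -> 7200 rpm: {7200 / 5400:.2f}x")
```

Because latency is inversely proportional to rpm, the improvement factor is just the ratio of the rotation speeds, about 1.71x and 1.33x.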
There are orders of magnitude difference in the access times. Disk access is measured in milliseconds; memory access is measured in nanoseconds; register speed in picoseconds. Improving disk access by even 1 millisecond closes the gap.
Another thing to take into account is CPU speed.
A Xeon X5560 (Nehalem) at 2.8 GHz will execute 89.6 instructions per nanosecond.
Or about 89,600,000 instructions per millisecond. How many milliseconds does it take to access flash?
Math: Nehalem = 4 cores * 8 single-precision ops per clock = 32 ops per clock * 2.8 GHz = 89.6 GigaOps = 89.6 ops per nanosecond
Geek note: yes, I am aware of FLOP calcs. yes, I am aware of SIMD. yes, I am aware of out of order and all the fun stuff in the CPU, but we have to have SOMETHING to measure
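For anyone who wants to replay that back-of-the-envelope math, here is a small sketch. The core count, ops per clock and clock rate are the comment's own peak-rate assumptions; the flash latency passed in at the end is a purely hypothetical value, not a measured one.

```python
CORES = 4
SP_OPS_PER_CORE_PER_CLOCK = 8  # the comment's assumption (peak SIMD single precision)
CLOCK_GHZ = 2.8                # 1 GHz is one cycle per nanosecond

ops_per_ns = CORES * SP_OPS_PER_CORE_PER_CLOCK * CLOCK_GHZ
ops_per_ms = ops_per_ns * 1_000_000  # 1 ms = 1,000,000 ns


def ops_lost_waiting(latency_ms):
    """Peak operations the CPU could have issued during one storage access."""
    return ops_per_ms * latency_ms


print(f"{ops_per_ns:.1f} ops per nanosecond")
print(f"{ops_per_ms:,.0f} ops per millisecond")

HYPOTHETICAL_FLASH_LATENCY_MS = 0.1  # illustrative only
print(f"{ops_lost_waiting(HYPOTHETICAL_FLASH_LATENCY_MS):,.0f} ops per access")
```

This is a peak figure, so it overstates what real code achieves, but it gives a sense of scale for how much work one storage access costs.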
A modern cheap laptop from today is faster than a Convex supercomputer I was using 20 years ago. But in that same time span disk seek times dropped from 15 milliseconds down to about 7.5 milliseconds which is only a factor of two.
We don't see the world as it is, we see it as we are.
-- Anais Nin
Yes, it does matter.
For example, if you want to sort objects that do not all fit at one level, the sort may spill over to the next level of much slower memory. This imbalance of speed is relevant regardless of the interconnect speed limits, since the delta itself is the culprit.
There are some ways to improve on that.
From Wikipedia > Sorting_algorithm > Memory_usage_patterns_and_index_sorting
https://en.wikipedia.org/wiki/...
That is one example of a solution to a problem that matters.
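One common way to deal with a sort that spills to a slower level is an external merge sort: sort chunks that fit in the fast level, write each out as a sorted run, then merge the runs while streaming. The sketch below is only an illustration using the Python standard library; the function names and `chunk_size` are assumptions, it handles integers only, and it is not taken from the Wikipedia article.

```python
import heapq
import tempfile
from itertools import islice


def _write_run(sorted_chunk):
    """Spill one sorted chunk to a temporary file and rewind it for reading."""
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(f"{value}\n" for value in sorted_chunk)
    run.seek(0)
    return run


def external_sort(values, chunk_size):
    """Yield integers from `values` in sorted order.

    At most `chunk_size` values are held in memory while the sorted
    runs are being built; the merge phase streams from the run files.
    """
    iterator = iter(values)
    runs = []
    try:
        while True:
            chunk = sorted(islice(iterator, chunk_size))
            if not chunk:
                break
            runs.append(_write_run(chunk))
        # heapq.merge pulls lazily, one line at a time from each run.
        streams = [(int(line) for line in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()


print(list(external_sort([5, 3, 9, 1, 8, 2, 7], chunk_size=3)))
# prints [1, 2, 3, 5, 7, 8, 9]
```

The point of the design is that the slow level is only ever read and written sequentially, which is the access pattern it handles best.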
20 years ago main memory was 10-14 ns, instruction cycle time was 2-4ns (Cray)
Guess what? it still is.
Memory has grown, it has gotten cheaper.
What HASN'T changed? Access to memory. That is how Cray got its speed - instead of a single port to memory, it used a crossbar switch - 4 ports for each processor. 1 instruction bus, 2 input data busses, and one output bus; even I/O got its own port to memory; all with overlapping address/data cycles.
The effect was that all of main memory worked at the speed of cache, thus the CPU had no need to waste silicon on cache memory - and the entire system ran full speed.
What slows down the current systems? Memory access. Most systems only have a single port to main memory. Some servers and "high performance" desktops have dual ported memory. Yet even dual ported memory access is slow when you have to share it among 4/8 cores... plus I/O (which isn't dual ported). Interrupt latency on PCs is really horrible. Still only 15 IRQs? and have to share them? No direct vectoring? Forced interrupt chain actions? Even the old PDP 11 with ONE interrupt request line allowed direct interrupt vectoring (64 basic vectors) to reduce overhead.
There hasn't been much innovation in architecture in over 20 years.
It's unlikely (but still possible) that it's staying the same.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Each memory address is normally associated with 8 bits of data (not counting correction bits). But processors nowadays routinely consume 64 bits at a time. That means getting the data from 8 different addresses simultaneously. Things would be simpler if they put all those 64 bits at one address --if every single address had 64 bits of data associated with it. In the previous processor generation, gobbling 32 bits at a time meant accessing 4 different addresses simultaneously, and the total accessible address space of the processor was essentially 30 bits instead of 32 bits --while you were allowed to access 4 addresses starting at Address Zero, you were not allowed to access 4 addresses starting at Address One or Address Two or Address Three. It could have been allowed if every address had had 32 data-bits associated with it. With 64-bit processors today needing to access 8 addresses at a time, the total effective address space is 61 bits instead of 64 bits (still a huge number, I know). Anyway, my main reason for writing this is, wouldn't memory run slightly faster if it didn't have to access all the data from 8 addresses simultaneously, but instead just got 64 bits, nicely parallel from any one address?
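The 30-bit and 61-bit figures can be reproduced with a few lines, under the comment's assumption that only word-aligned accesses are allowed. The function name is hypothetical. Note that this counts how many distinct aligned words exist; the number of addressable bytes is unchanged.

```python
def word_index_bits(address_bits, word_bytes):
    """Address bits left to select a word when accesses are word-aligned.

    Aligned addresses are multiples of word_bytes, so their low
    log2(word_bytes) bits are always zero and carry no information.
    """
    if word_bytes <= 0 or word_bytes & (word_bytes - 1):
        raise ValueError("word_bytes must be a power of two")
    return address_bits - (word_bytes.bit_length() - 1)


print(word_index_bits(32, 4))  # 32-bit addresses, 4-byte words -> 30
print(word_index_bits(64, 8))  # 64-bit addresses, 8-byte words -> 61
```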
This article can shed some light on it: http://www.dba-oracle.com/t_hi... Looks like RAM is the laggard.
You can look up the specs easy.
Back in the 80286 days, there was not even an L1 cache; however, the memory and ISA bus ran at CPU speed, 8-20 MHz. Hard drive latency was ~65ms.
In the 80486 days L1 cache was introduced and L2 was sometimes available in (very) expensive modules. I remember buying 256 KB for the same price as 16 MB of RAM. The L1 caches ran (if I remember correctly) at CPU speed, 1 cycle. However, the bus speed started to slow down compared to the CPU. The VLB ran at CPU bus speed ("local" bus) and was often used for graphics, but PCI (an inferior bus) ran at 33 MHz, so for anything over 33 MHz we started needing dividers. The RAM ran at 80-120ns, so it started being slower than the CPU bus. Hard drive latency, however, was down to ~30ms.
In the Pentium age memory slowed even further compared to the CPU bus. Now it took several cycles to access memory, and buses ran even slower (still PCI mostly, eventually PCI-X (133MHz?), until PCI-e (running serial buses) came along). Hard drive latency improved to ~15ms.
In the modern age, L1 caches have slowed even further, requiring 4 cycles for L1 and up to 30 for L3. RAM is even slower to access, with bus speeds about a quarter of a single CPU's, and sometimes 16 CPUs need to share those lanes. Peripheral bus speeds, however, have gone up, and PCIe 3.0 is now directly integrated into the CPU, 80486 VLB-style. Hard drives still have latencies of ~10ms (we have a mechanical issue there), but even cheap SSDs can go down to ~1-2ms.
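As a rough way to read that timeline, the sketch below turns the storage latencies quoted in this comment into improvement factors. The numbers are the comment's from-memory approximations, not benchmark data, and the variable and function names are made up for illustration.

```python
# Approximate latencies quoted in the comment above, in milliseconds.
hdd_latency_ms = [
    ("80286 era", 65.0),
    ("80486 era", 30.0),
    ("Pentium era", 15.0),
    ("modern HDD", 10.0),
]
SSD_LATENCY_MS = (1.0, 2.0)  # "even cheap SSDs", best and worst case


def improvement_factors(series):
    """Speed-up of each entry relative to the first entry in the series."""
    baseline = series[0][1]
    return [(label, baseline / value) for label, value in series]


for label, factor in improvement_factors(hdd_latency_ms):
    print(f"{label:<12} {factor:.2f}x faster than the 80286-era drive")

modern_hdd = hdd_latency_ms[-1][1]
best, worst = SSD_LATENCY_MS
print(f"SSD vs modern HDD: {modern_hdd / worst:.0f}x to {modern_hdd / best:.0f}x")
```

On these figures the mechanical disk improved about 6.5x across the whole period, while the move to a cheap SSD alone is worth another 5x to 10x.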
Data size has increased in all things. It's one thing you can't really offset, as it's reality: as data grows, things get slower. Even in code, declaring large types up front (double and onwards) and going from 8 -> 16 -> 32 -> 64 bit makes programs larger, both in memory and on disk. (Shrinking types is a hand-optimization I often do, say from std. widestring down to shortstr when possible, or from integer to shortint; it helps disk size too, not only potential memory consumption.) Pickup times from mechanical disk increase (I offset that with compressed executables, which lessen the disk area used); it's not so bad with SSD or on live ramdrive storage (software or hardware). Compression trades disk access for load/store cycles, especially if compressed data files on filesystems are used for data, along with the exe compacting I do for the programs themselves. Lots of little hairs that add up to a moustache in the mechanical, disk-bound world (largely storage).
---
Elevator algorithms on harddrives help offset it. Good move in hardware for that but still, see above.
Bufferbloat congestion was unexpected iirc as well online as another factor to mitigate that 'backfires'.
APK
P.S.=> The "infamous they" say "Size matters" - & in this case, YES, it does for hdd access which is still the most prevalently used, & striping/spanning disks, shortstroking them, or caching of all forms notwithstanding (not knocking the latter UNLESS aging problems arise, causing stale data & there are those, see above), on hdd's, even @ diskfarms for storage for huge commercial sites? No questions asked it does... apk
" The "infamous they" say "Size matters" - & in this case, YES, it does for hdd access which is still the most prevalently used, & striping/spanning disks, shortstroking them, or caching of all forms notwithstanding (not knocking the latter UNLESS aging problems arise, causing stale data & there are those, see above), on hdd's, + indexing in databases vs. HUGE files,
See subject: I forgot to note it as it helps vs. large diskmass for internal parsing of records too.
APK
P.S.=> Nitpicking myself today... apk
" The "infamous they" say "Size matters" - & in this case, YES, it does for hdd access which is still the most prevalently used, & striping/spanning disks, shortstroking them, or caching of all forms notwithstanding (not knocking the latter UNLESS aging problems arise, causing stale data & there are those, see above), on hdd's, + indexing in databases vs. HUGE files PLUS COMPACTING DATABASES (removing blank bloat spots containing NO DATA) even @ diskfarms for storage for huge commercial sites? No questions asked it does... apk" - ME
APK
P.S.=> Nitpicking myself today 2nd TIME again (had to, leaving no stones unturned) - since WHEN YOU SPEED UP THE SLOWEST PART, the entire system speeds up - datasize REALLY affects HDD's, but when you come right down to it? It affects even in-memory operations as well, especially if RAM limits are hit & paging, yes FROM RAM to disk, begins... apk
" The "infamous they" say "Size matters" - & in this case, YES, it does for hdd access which is still the most prevalently used, & striping/spanning disks, shortstroking them, or caching of all forms notwithstanding (not knocking the latter UNLESS aging problems arise, causing stale data & there are those, see above), on hdd's, + indexing in databases vs. HUGE files PLUS COMPACTING DATABASES (removing blank bloat spots containing NO DATA) as well as DEDUPLICATION/NORMALIZATION on disk and in db's both, AND HDD defragmenting to offset excessive headswing movement instead picking up files as a single piece in 1 pass even @ diskfarms for storage for huge commercial sites? No questions asked it does... apk" - ME
APK
P.S.=> Nitpicking myself today 3rd TIME again (had to, leaving no stones unturned) - since WHEN YOU SPEED UP THE SLOWEST PART, the entire system speeds up - datasize REALLY affects HDD's + internal file parsings (not as bad due to index seeks), but when you come right down to it? It affects even in-memory operations as well, especially if RAM limits are hit & paging, yes FROM RAM to disk, begins... apk
https://www.sandisk.com/busine...
Not only is the underlying physical technology getting better, but the software (aka filesystem) utilizing that hardware is also becoming more efficient. The likes of ZFS and ext4 are far better than predecessors (UFS or ext2/3). No troll-o, but I think NTFS and FATx are static in performance across hardware revisions.
See subject: Gets better minus geometries used for filesystems for mechanical disks & the structures it demands ramdrives/ramdisks do NOT need - this redesign will lend to far smaller overheads on the logical filesystems end of things & make them even FASTER...
* That's coming soon...
(So what's already "the street dominator" in diskspeeds is about to get another jump in its lead beyond access/seek advantages it has now...)
APK
P.S.=> It's pretty exciting knowing it's going to happen & boost the slowest part of a system (SSD, fast as it is over HDD, is still the laggard)... apk