The Legacy of CPU Features Since the 1980s
jones_supa writes: David Albert asked the following question:
"My mental model of CPUs is stuck in the 1980s: basically boxes that do arithmetic, logic, bit twiddling and shifting, and loading and storing things in memory. I'm vaguely aware of various newer developments like vector instructions (SIMD) and the idea that newer CPUs have support for virtualization (though I have no idea what that means in practice). What cool developments have I been missing? "
An article by Dan Luu answers this question and provides a good overview of various cool tricks modern CPUs can perform. The slightly older presentation Compiler++ by Jim Radigan also gives some insight on how C++ translates to modern instruction sets.
"My mental model of CPUs is stuck in the 1980s: basically boxes that do arithmetic, logic, bit twiddling and shifting, and loading and storing things in memory. I'm vaguely aware of various newer developments like vector instructions (SIMD) and the idea that newer CPUs have support for virtualization (though I have no idea what that means in practice). What cool developments have I been missing? "
The first large-scale availability of virtualisation was with the IBM 370 series, dating from June 30, 1970, but it had been available on some other machines in the 1960s.
So the idea that "newer machines have support for virtualisation" is a bit old.
He wrote, "introduced to x86 since the early 80s include paging / virtual memory, pipelining, and floating point." We know that some platforms had some of these features earlier than x86, but he was speaking to those who had been programming on the x86 platform. Of course, this ignores the x87 math coprocessor, but I digress.
Gamingmuseum.com: Give your 3D accelerator a rest.
The latest generation of CPUs have instructions to support transactional memory.
Near future CPUs will have a SIMD instruction set taken right out of GPUs where you can conditionally execute without branching.
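To make "conditionally execute without branching" concrete, here is a minimal scalar sketch in C of the masking idea that predicated SIMD applies per lane. The names select_u32 and masked_add are made up for illustration, and whether a given compiler really emits branch-free code for this is not guaranteed.

```c
#include <stdint.h>

/* Pick a when cond is nonzero, b otherwise, using a mask instead of a branch.
 * mask is all ones when cond is true and all zeros when it is false. */
uint32_t select_u32(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Per-element conditional add (hypothetical example):
 * out[i] = x[i] + y[i] where x[i] > threshold, otherwise x[i] unchanged.
 * Both results are computed for every element; the mask keeps one of them. */
void masked_add(const uint32_t *x, const uint32_t *y, uint32_t *out,
                int n, uint32_t threshold)
{
    for (int i = 0; i < n; i++) {
        out[i] = select_u32(x[i] > threshold, x[i] + y[i], x[i]);
    }
}
```

A vector unit with mask registers does the same thing across all lanes at once: every lane computes, and the mask decides which lanes keep the result.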
The IBM 360/370 line and its successors have had decimal arithmetic (in addition to binary and, after the 370/158, floating point) since the 1960s/70s. Others have had these also.
There was a period in the '00s when PC processors were good for cooking eggs. You had to be careful with the AMD ones though, they had a tendency to burn the egg quickly.
Then it's settled. Your edge case must apply to everybody. The article is wrong.
I'm not a nerd. Nerds are smart.
I remember when VMware first came out, and there was all this amazement about all the cool things you could do with Virtual Machines. Very little mention anywhere that these were things you could do for decades already on mainframes.
Same thing with I/O offloading (compared to mainframes, x86 and UNIX I/O offload is still quite primitive and rudimentary), DB-based filesystems (MS has been trying to ship one of those for over 20 years now; IBM has been successfully selling one (the AS/400 / iSeries) for 25), built-in encryption features, and a host of other features.
Current 64-bit path/register CPU architecture will satisfy most computing requirements for some time to come. The only real reason to increase data path width is to address more data. Until we have need to address 16 exabytes, 64-bit will remain in favour everywhere because $$$.
I am staying away from your lawn, that's for sure. If my frisbee lands over there, you can keep it; you've earned it.
We just had a story about low-level improvements to the BSD kernel, and now we get an article about chip-level features and how compilers use them?
Is this some sort of pre-April-Fools /. in 2000 joke? Where are my Slashvertisements for gadgets I'll never hear about again? My uninformed blog posts declaring this the "year of Functional Declarative Inverted Programming in YAFadL"? Where the hell are my 3000-word /. editor opinions on the latest movie?
If this keeps up, this site might start soaking up some of my time instead of simply being a place I check due to old habits.
Last post!
If you want to see more Slashdot-in-2000 style posts, and you have access to the sort of articles that Slashdot-in-2000 might have posted, Slashdot welcomes your submissions. You could even become a "frequent contributor".
For example, I worked for a decade in the Linux kernel and low-level userspace. Assembly definitely needed. I tracked down and fixed a bug in the glibc locking code, and you'd better believe assembly was required for that one. During that time I dealt with assembly for ARM, MIPS, PowerPC, and x86, with both 32 and 64-bit flavours of most of those. But even there most of the time you're working in C, with as little as possible in assembly.
If you're working in the kernel or in really high-performance code then assembly can be useful. If you're working with experimental languages/compilers where the compilers might be flaky, then assembly can be useful. If you're working in Java/PHP/Python/Ruby/C# etc. then assembly is probably not all that useful.
You are a retarded idiot. The author states right at the beginning of the article that he's focusing on x86. In the (late) 80s, most people had an IBM PC, if they had anything.
"Most", maybe. But the late 1980s were the heyday of the Macintosh, Amiga and Atari ST.
Come to think of it, I'm not even sure of "most" outside the business world. The Commodore 64 and Apple computers fit in there somewhere.
Diminishing returns on cache misses
Higher latency of larger caches
Higher latency of more layers of cache
Poor transistor scaling of fully associative caches or increased rate of false evictions for n-way caches.
Increased power usage. It's very difficult to turn off part of your cache to save power, but it's very easy to turn off a core.
Not all problems scale well with more cache
I'm sure there are many other reasons.
Indeed, the architecture of stream processors is quite a bit different from the general-purpose processors we are used to programming. It's kind of exciting that programming stream processors through shaders, OpenCL, and CUDA has gone mainstream. And for a few hundred dollars a poor college student can afford to build a small system capable of running highly parallel programs. While not equivalent in performance to a supercomputer, it has structural similarities sufficient for experimentation and learning.
20 years ago I wouldn't have believed that a poor programmer could buy a system that had 1000+ execution engines in it.
“Common sense is not so common.” — Voltaire
If you understand scalar assembly, understanding the basic "how" of vector/SIMD programming is conceptually similar.
Actually, if you think back to pre-32-bit x86 assembler, where the X registers (AX, BX) were actually addressable as half-registers (AH and AL were the high and low sections of AX), you already understand, to some extent, SIMD.
SIMD just generalizes the idea that a register is very big (e.g. 512 bits), and the same operation is done in parallel to subregions of the register.
So, for instance, if you have a 512-bit vector register and you want to treat it like 64 separate 8-bit values, you could write code as follows:
C = A + B
If C, A, and B are all 512-bit registers, partitioned into 64 8-bit values, logically, the above vector/SIMD code does the same as the scalar code below:
for (i = 0; i < 64; i++) {
    c[i] = a[i] + b[i];
}
If the particular processor you are executing on has 64 parallel 8-bit adders, then the vector code
C = A + B
can run as one internal operation, utilizing all 64 adder units in parallel.
That's much better than the scalar version above, a loop that executes 64 passes.
A vector machine could actually be implemented with only 32 adders, and could take 2 internal ops to implement a 64 element vector add... that's still a 32x speedup compared to the scalar, looping version.
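The same idea can be tried out in portable C without any vector instructions, by treating one 64-bit integer as eight 8-bit lanes ("SIMD within a register"). This is only a sketch at a smaller scale than the 512-bit example above; the helper names add_scalar, pack8 and add_packed are invented for illustration, and real SIMD hardware does the lane separation for you instead of needing the masking trick.

```c
#include <stdint.h>

#define LANES 8  /* eight 8-bit lanes packed into one 64-bit word */

/* Scalar reference: one 8-bit add per loop pass, wrapping modulo 256. */
void add_scalar(const uint8_t *a, const uint8_t *b, uint8_t *c)
{
    for (int i = 0; i < LANES; i++) {
        c[i] = (uint8_t)(a[i] + b[i]);
    }
}

/* Pack eight bytes into a 64-bit word, lane 0 in the lowest byte. */
uint64_t pack8(const uint8_t *v)
{
    uint64_t w = 0;
    for (int i = 0; i < LANES; i++) {
        w |= (uint64_t)v[i] << (8 * i);
    }
    return w;
}

/* Lane-wise add of all eight lanes using a few 64-bit operations.
 * The top bit of each lane is handled separately so that a carry
 * never spills into the neighbouring lane. */
uint64_t add_packed(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8080808080808080ULL;  /* top bit of each lane */
    uint64_t low_sum = (a & ~high) + (b & ~high); /* add the low 7 bits per lane */
    return low_sum ^ ((a ^ b) & high);            /* fold in top bits, drop carry-out */
}
```

Here add_packed(pack8(a), pack8(b)) gives the same lanes as add_scalar, but with a fixed number of word operations instead of eight loop passes.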
The Cray 1 was an amazing machine. It ran at 80 MHz in 1976.
http://en.wikipedia.org/wiki/C...
According to WP, the only patent on the Cray 1 was for its cooling system...
My opinions are my own, and do not necessarily represent those of my employer.
There is some interesting stuff. But it mostly boils down to ways to optimize code. The older chips may have had the idea for something but didn't implement it due to the enormous cost. Sometimes it's handy to have just a couple of instructions to help out rather than add a giant feature, as in having no floating point or multiplication (early RISC machines) but having an instruction to find the first or last bit set, which makes the software library that does this much faster.
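As a rough illustration of the "find first bit set" point, here is a sketch in C comparing a plain software loop with a compiler builtin that typically maps to a single instruction where the CPU has one. The function names are made up for this example; __builtin_ctz is a GCC/Clang builtin, so the sketch falls back to the loop on other compilers.

```c
#include <stdint.h>

/* Software fallback: index of the lowest set bit, or -1 if x is zero.
 * Takes up to 32 loop passes for a 32-bit value. */
int lowest_set_bit_loop(uint32_t x)
{
    if (x == 0) {
        return -1;
    }
    int i = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Same result, but lets the compiler use a bit-scan instruction if available.
 * Assumption: GCC or Clang; other compilers use the loop above. */
int lowest_set_bit(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return x ? __builtin_ctz(x) : -1;  /* the builtin is undefined for 0, so guard it */
#else
    return lowest_set_bit_loop(x);
#endif
}
```

Software floating-point and multiplication routines spend a lot of time locating the highest or lowest set bit (normalising a result, for instance), which is why one cheap instruction can speed up the whole library.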
There are instructions to help out cryptography, which I don't think any computer in the 60s was concerned enough about to devote expensive hardware to it. Instructions to support atomic operations even within a multiprocessor environment are present in many modern CPUs too, whereas in the past if there were multiprocessors there would usually be some round-about way to do this. As processors got more complex with out of order execution and delayed writes, there was a need for instructions to synchronize operations, such as the "EIEIO" instruction on the PowerPC. Possibly some of this was present on early supercomputers but today these are present in mainstream processors.
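From C, those atomic and synchronizing instructions are usually reached through C11's <stdatomic.h> rather than written by hand. A minimal sketch, assuming a C11 compiler; the helper names bump and store_max are invented for this example, and which machine instructions the compiler picks depends on the target.

```c
#include <stdatomic.h>

/* Atomically add one and return the new value.
 * atomic_fetch_add returns the value from before the add. */
int bump(atomic_int *counter)
{
    return atomic_fetch_add(counter, 1) + 1;
}

/* Raise *target to at least value using a compare-and-swap loop.
 * If another thread changes *target in between, the exchange fails,
 * seen is refreshed with the current value, and the loop retries. */
void store_max(atomic_int *target, int value)
{
    int seen = atomic_load(target);
    while (seen < value &&
           !atomic_compare_exchange_weak(target, &seen, value)) {
        /* retry with the refreshed value in seen */
    }
}

/* Full barrier: memory operations before it are ordered before those after it. */
void full_fence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}
```

On PowerPC a fence like this is implemented with that architecture's sync-family instructions; the "EIEIO" instruction mentioned above belongs to the same family of ordering instructions, though the exact mapping is up to the compiler.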
/* this line is misindented */
Whoa there. He's not composing python. He's writing in a real programming language.