The Legacy of CPU Features Since the 1980s
jones_supa writes: David Albert asked the following question:
"My mental model of CPUs is stuck in the 1980s: basically boxes that do arithmetic, logic, bit twiddling and shifting, and loading and storing things in memory. I'm vaguely aware of various newer developments like vector instructions (SIMD) and the idea that newer CPUs have support for virtualization (though I have no idea what that means in practice). What cool developments have I been missing?"
An article by Dan Luu answers this question and provides a good overview of various cool tricks modern CPUs can perform. The slightly older presentation Compiler++ by Jim Radigan also gives some insight on how C++ translates to modern instruction sets.
All that is added is more instructions, but everything is still only binary math.
Do not look at laser with remaining good eye.
The new Google, and hey the answer is even provided in the summary!
The first large-scale availability of virtualisation was with the IBM 370 series, dating from June 30, 1970, but it had been available on some other machines in the 1960s.
So the idea that "newer machines have support for virtualisation" is a bit old.
Watch this Heartland Institute video
That's the 1960s, friend. Conceptually, not much has changed since then, it's just that it took a while for our manufacturing/matter manipulation to get small enough to make enough reliable transistors to implement these ideas cheaply.
You probably *heard about these ideas* for the first time in the 1980s, though.
Also served as a space heater.
The latest generation of CPUs have instructions to support transactional memory.
Near future CPUs will have a SIMD instruction set taken right out of GPUs where you can conditionally execute without branching.
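As a rough scalar model of what "conditionally execute without branching" means per lane: every lane computes the result, and a mask decides which lanes keep it. This is only a sketch in plain C; the function name masked_add and the lane layout are made up for illustration, and it is not any particular instruction set's semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per lane: out[i] = cond[i] ? a[i] + b[i] : a[i], written without an if.
   Hypothetical helper, for illustration only. */
void masked_add(const uint32_t *a, const uint32_t *b, const uint8_t *cond,
                uint32_t *out, size_t n) {
    assert(a && b && cond && out);
    for (size_t i = 0; i < n; i++) {
        /* mask is all ones when cond[i] is nonzero, all zeros otherwise */
        uint32_t mask = (uint32_t)0 - (uint32_t)(cond[i] != 0);
        uint32_t sum = a[i] + b[i];          /* computed for every lane */
        /* keep the sum where the mask is set, the old value elsewhere */
        out[i] = (sum & mask) | (a[i] & ~mask);
    }
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40} and cond = {1, 0, 1, 0}, the output lanes are {11, 2, 33, 4}. A masked vector instruction does the same selection for all lanes at once instead of looping.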
The IBM 360/370 line and its successors have had decimal arithmetic (in addition to binary and after the 370/158 floating point) since the 1960/70s. Others have had these also.
There was a period in the 00s when PC processors were good for cooking eggs. You had to be careful with the AMD ones though, they had a tendency to burn the egg quickly.
I remember when VMWare first came out, and there was all this amazement about all the cool things you could do with Virtual Machines. Very little mention anywhere that these were things you could do for decades already on mainframes.
Same thing with I/O offloading (compared to mainframes, x86 and UNIX I/O offload is still quite primitive and rudimentary), DB-based filesystems (MS has been trying to ship one of those for over 20 years now; IBM has been successfully selling one (the AS/400 / iSeries) for 25), built-in encryption features, and a host of other features.
The whole evolution of multi-level caches has gotten a bit crazy. If Intel can put 2 billion transistors in a processor, how about instead of piling more cores in, just do 1 or 2 cores and a massive L1 SRAM cache instead? Memory latency is more of a bottleneck than anything just now.
Indeed, the Motorola 680x0 series for example. I loved the ABCD (add binary coded decimal) instruction and always tried to make (even if esoteric) use of it. Unfortunately this is the kind of instruction that was not supported by the C compilers of the time.
We just had a story about low-level improvements to the BSD kernel, and now we get an article about chip-level features and how compilers use them?
Is this some sort of pre-April-Fools /. in 2000 joke? Where are my Slashvertisements for gadgets I'll never hear about again? My uninformed blog posts declaring this the "year of Functional Declarative Inverted Programming in YAFadL"? Where the hell are my 3000-word /. editor opinions on the latest movie?
If this keeps up, this site might start soaking up some of my time instead of simply being a place I check due to old habits.
Last post!
I have had arguments over this. People in various fora have asked what programming languages they should learn. I always put assembly in my list. But is it really important enough to learn these days? Is hardware still relevant?
putting the 'B' in LGBTQ+
If you want to see more Slashdot-in-2000 style posts, and you have access to the sort of articles that Slashdot-in-2000 might have posted, Slashdot welcomes your submissions. You could even become a "frequent contributor".
If you want to get the most out of an 8-bit microcontroller, you'll need assembly language. Until recently, MCU programming wasn't easily accessible to the general public, but Arduino kits changed this.
For example, I worked for a decade in the Linux kernel and low-level userspace. Assembly definitely needed. I tracked down and fixed a bug in the glibc locking code, and you'd better believe assembly was required for that one. During that time I dealt with assembly for ARM, MIPS, PowerPC, and x86, with both 32 and 64-bit flavours of most of those. But even there most of the time you're working in C, with as little as possible in assembly.
If you're working in the kernel or in really high-performance code then assembly can be useful. If you're working with experimental languages/compilers where the compilers might be flaky, then assembly can be useful. If you're working in Java/PHP/Python/Ruby/C# etc. then assembly is probably not all that useful.
SIMD was developed in the 1970s with the CDC 6600/7600 Cyber 100 and the Cray 1 ...
http://en.wikipedia.org/wiki/SIMD
I haven't seen the article or video. But for 99% of developers, I'd say the only CPU-level changes since the 8086 that matter are caches, support for threading and SIMD, and the rise of external GPUs.
Out-of-order scheduling, branch prediction, VM infrastructure like TLBs, and even multiple cores don't alter the programmer's API significantly. (To the developer/compiler, multicore primitives appear no different than a threading library. The CPU still guarantees microinstruction execution order.)
Some of the compiler optimization switches have become more complex, and perhaps a few coding idioms are now deprecated/encouraged so that compilers better understand what you intend (so you don't make their job unnecessarily harder).
But overall, almost all developer techniques don't benefit from changes to CPU microarchitecture after 1990, aside from caches, SIMD, and GPUs.
And of course, ever since the 80486 (1989), all CPUs support floating point instructions.
I appreciate the sentiment of the article, but as others have pointed out, a lot of these "new" features have been around for a very long time.
What has changed significantly is addition of functions onto the CPU die: whether you want them or not.
I'm specifically talking about the "vPro / SE / Management Engine" bullshit (yes, I hate it) that is being pushed out to everyone. I know that it helps in a corporate setting, but all of that low-level interception takes its toll.
I've been troubleshooting some software that communicates with a device over USB. When I benchmark my code on Ivy Bridge systems I get 1ms read/write times (lots of overhead, the actual R/W by the device is much faster) yet on the newer and "faster" Haswell systems this jumps to 6ms. It's been tested across multiple motherboards, so while I don't know WTF is going on, there is something different about the CPU architecture (or extra features) that is causing this change in behavior.
In case anyone is curious, the OS is Windows 7x64 and the software is .NET v4.5.
I'm not a CPU expert so feel free to take my opinions below with a grain of salt... (grin)
The biggest change to processors in general is the increased use and power of desktop GPUs to offload processing-intense math operations. The parallel processing power of GPUs outstrips today's CPUs. I'm sure that we will be seeing desktop CPUs with increased GPU-like parallel processing capabilities in the future.
http://en.wikipedia.org/wiki/G...
http://www.pcworld.com/article...
The GP said decimal arithmetic. Those of us that know about processors and electronics - including the GP, but apparently not you - know *exactly* what he meant by that.
Hint: go look up the instructions that deal with "binary coded decimal" for x86 or 680x0.
but CPUs ARE stuck in the 1980s. Although some "new" features have only been recently implemented, almost all of them were conceived decades ago.
I'm not sure SIMD really falls outside of a "1980s" model of a CPU. Maybe if your model means Z80/6502/6809/68K/80[1234]86/etc, rather than including things like Cray that at least some students and engineers during the 80s would have been exposed to.
von Neumann execution, but Harvard caches became commonplace in the 1990s. Most people didn't need to know much about the Harvard-ness unless they needed to do micro-optimization.
Things don't get radically different until you start thinking about Dataflow architecture and Transactional memory. I'm not sure if Dataflow will ever come back, but transactional memory is here and pops up from time to time, and I think it will get big pretty soon as it moves beyond being some small obscure part in a processor core.
(Transport Triggered Architecture is equivalent to von Neumann for software when abstracted out in a macro assembler, so I don't count TTA as something new, plus it was pretty uncommon)
“Common sense is not so common.” — Voltaire
while (hitCount < arraySize) {
    i = rand() % arraySize;
    if (hitArray[i] == 0) {
        sum += array[i];
        array[i] = 0;
        hitArray[i] = 1;
        hitCount++;
    }
}
https://www.eff.org/https-everywhere
And of course, ever since the 80486 (1989), all CPUs support floating point instructions.
486 SX chips had the FPU disabled or absent. So not all CPUs (or even all 80486 CPUs). As far as I'm aware, the Pentium (586) did not have a model without FPU support (although in the MMX models, you couldn't use MMX and the FPU at the same time).
He effected a bored affect.
I used to sort of understand how a computer works. Not anymore. It's just magic.
Guy who doesn't understand how CPUs work amazed about how CPUs work. /thread
It must have been something you assimilated. . . .
Can someone explain why, in the example race condition code given, the theoretical minimum count is 2, and not n_iters?
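One way to see where 2 can come from, assuming the article's example is two threads that each run an unlocked counter++ n_iters times, and that each counter++ is a separate load, add, and store. The sketch below replays one unlucky interleaving in a single thread; the function name and the register variables r1 and r2 are made up for illustration.

```c
#include <assert.h>

/* Single-threaded replay of one unlucky interleaving of two threads,
   T1 and T2, each doing `counter++` n_iters times as load / add / store. */
int replay_worst_case(int n_iters) {
    assert(n_iters >= 2);
    int counter = 0;
    int r1, r2;                    /* private registers of T1 and T2 */

    r1 = counter;                  /* T1, iteration 1: loads 0, then stalls */
    for (int i = 0; i < n_iters - 1; i++) {
        r2 = counter;              /* T2 completes n_iters - 1 increments */
        r2 = r2 + 1;
        counter = r2;
    }
    counter = r1 + 1;              /* T1 resumes: stores 1, erasing T2's work */
    r2 = counter;                  /* T2, last iteration: loads 1, then stalls */
    for (int i = 0; i < n_iters - 1; i++) {
        r1 = counter;              /* T1 completes its remaining increments */
        r1 = r1 + 1;
        counter = r1;
    }
    counter = r2 + 1;              /* T2 resumes: stores 2, erasing T1's work */
    return counter;
}
```

Under those assumptions the replay ends at 2 no matter how large n_iters is, because each stalled store overwrites everything the other thread did in the meantime. It cannot go lower than 2 for n_iters >= 2: whichever thread stores last has already completed at least one earlier store, so its final load sees at least 1.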
You might like the second sentence of the article:
things that were new to x86 were old hat to supercomputing, mainframe, and workstation folks.
Just curious, did you read one sentence of the article before commenting on it? :rolleyes Obama voters
Look up BCD. It's not the usual binary math. Looks like you have gaps in YOUR knowledge of processors or electronics.
Also note that rather recently Intel drastically dropped the accuracy of their FPUs in order to make the performance numbers look better... Don't expect 80-bit precision even when explicitly using the x87 instructions now. It has now been documented that this is the case, but for a few years Intel got away without publicly acknowledging the large drop in accuracy...
"His name was James Damore."
if you understand scalar assembly, understanding the basic "how" of vector/SIMD programming is conceptually similar
Actually, if you think back to pre-32bit x86 assembler, where the X registers (AX, BX) were actually addressable as half-registers (AH and AL were the high and low sections of AX), you already understand, to some extent, SIMD
SIMD just generalizes the idea that a register is very big (e.g. 512 bits), and the same operation is done in parallel to subregions of the register.
So, for instance, if you have a 512 bit vector register and you want to treat it like 64 separate 8 bit values, you could write code as follows:
C = A + B
If C, A, and B are all 512 bit registers, partitioned into 64 8 bit values, logically, the above vector/SIMD code does the below scalar code:
for (i = 0; i < 64; i++) {
    c[i] = a[i] + b[i];
}
If the particular processor you are executing on has 64 parallel 8-bit adders, then the vector code
C = A + B
Can run as one internal operation, utilizing all 64 adder units in parallel.
That's much better than the scalar version above - a loop that executes 64 passes.
A vector machine could actually be implemented with only 32 adders, and could take 2 internal ops to implement a 64 element vector add... that's still a 32x speedup compared to the scalar, looping version.
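The "big register split into subregions" idea can be tried in portable C, without vector hardware, by treating one 64-bit integer as 8 byte lanes. This is only a sketch of the concept, not how a real vector unit is built; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: add n byte lanes one at a time, each wrapping mod 256. */
void add_bytes_scalar(const uint8_t *a, const uint8_t *b, uint8_t *c, size_t n) {
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        c[i] = (uint8_t)(a[i] + b[i]);
}

/* Packed version: one 64-bit word holds 8 independent 8-bit lanes. */
uint64_t add_bytes_packed(uint64_t a, uint64_t b) {
    const uint64_t HIGH = 0x8080808080808080ULL;  /* top bit of every lane */
    /* Add the low 7 bits of each lane; the cleared top bit absorbs any
       carry, so nothing spills into the neighbouring lane. */
    uint64_t low = (a & ~HIGH) + (b & ~HIGH);
    /* Fold the top bits back in with XOR, which adds them without
       producing a carry out of the lane. */
    return low ^ ((a ^ b) & HIGH);
}
```

For example, add_bytes_packed(0x00FF, 0x0001) gives 0x0000: the low lane wraps around on its own, where a plain 64-bit add would have carried into the next lane and produced 0x0100. Real SIMD hardware gets the same lane isolation by cutting the carry chain between lanes.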
The Cray 1 was an amazing machine. It ran at 80 MHz in 1976.
http://en.wikipedia.org/wiki/C...
According to WP, the only patent on the Cray 1 was for its cooling system...
My opinions are my own, and do not necessarily represent those of my employer.
Even when modern optimizing compilers "unroll" them: See here for results http://stackoverflow.com/quest... with permutations of most *any* kind there (mixing asm with optimization, or not, etc. - et al).
* I used to use it ALL the time (since loops are a "key area" I know of that HELPS a LOT, when redone in inline assembly in Delphi/Object-Pascal, for speed) in Delphi 2.x - 7.x in 32-bit (was unable to in Delphi XE2, but it's been reinstated as an ability of Delphi in XE3 onwards to present XE7).
APK
P.S.=> Enjoy that read - it TRULY is, informative... apk
Please tell me what voltage do you put on a data line to give it a 2 or 3.
It's binary. You only have digital 1's and 0's. It seems you don't know anything at all about processors or electronics.
Whatever you want. But it's easier and simpler and less technically complex to just use a threshold value to represent a 1 or 0, instead of trying to set a series of ranges to achieve greater than binary capability.
Hey look everyone, a clueless smarmy smartass! Putting aside your astounding ignorance of fundamental stuff (let me guess: recent EE graduate or programmer?), there is three-level logic out there.
Or even more, as in what happens inside MLC flash memory.
Looks like there are more things between heaven and Earth than are dreamt of in your philosophy. Good thing you posted AC! LOL
The Z80 had some support too, IIRC, via the half-carry flag and some sort of BCD correction instruction (I once wrote a Z80 emulator but seem to have suffered a memory fail since then).
Citation required
All I have been able to find are GCC bug reports as to why it is hard to fix codegen for 80-bit x87 register spills that would lead to unexpected rounding. (A programming error, for the most part, of the application not using the long double data type.)
You have been gone long, a CPU hasn't been a box since the 70's.
All processors work in binary. It seems the low IQ types think they can do decimal math.
its 100% binary.
Strictly speaking, there's no reason for using binary other than that it makes some things a lot easier, while it makes other things a bit more difficult.
For example, during the early time of electronic engineering, the Russians/Soviets experimented with ternary computers, the "SETUN" while the USA had the "Ternac". Both had more complicated hardware than a binary computer, but were a lot more efficient at processing arithmetic instructions.
See: http://en.wikipedia.org/wiki/T...
And who knows, in a few decades, people might think binary to be quaint and outdated, given that qubits are so much, much more efficient.
DAA decimal adjust accumulator after doing add,sub
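For anyone who has not met BCD: a packed-BCD byte stores two decimal digits, one per nibble, so 0x47 means 47. A plain binary add on such bytes gives the wrong digits whenever a nibble goes past 9, and the decimal-adjust step patches that up. Below is a rough C sketch of an add followed by a DAA-style fix-up; the function name bcd_add is made up, and this models the idea rather than any specific CPU's exact flag behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Add two packed-BCD bytes (two decimal digits each, e.g. 0x47 means 47),
   mimicking a binary ADD followed by a decimal-adjust step. */
uint8_t bcd_add(uint8_t a, uint8_t b, int *carry_out) {
    assert(carry_out);
    assert((a & 0x0F) <= 9 && (a >> 4) <= 9);   /* inputs must be valid BCD */
    assert((b & 0x0F) <= 9 && (b >> 4) <= 9);

    unsigned sum = (unsigned)a + b;             /* ordinary binary add */
    if ((a & 0x0F) + (b & 0x0F) > 9)            /* low digit went past 9 */
        sum += 0x06;                            /* skip the six unused codes */
    if (sum > 0x99)                             /* high digit went past 9 */
        sum += 0x60;
    *carry_out = sum > 0xFF;                    /* decimal carry out of the byte */
    return (uint8_t)sum;
}
```

For example, bcd_add(0x19, 0x28, &carry) returns 0x47 with carry 0, i.e. 19 + 28 = 47, while the raw binary sum 0x41 would have read as 41.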
So, what, just double or triple the voltage?
This comment is my opinion and does not represent an official position of Donald Trump or others I do not work for
Remember the Model 2 that could be upgraded to the Model 16? Model 2 ran on a Z80 (TRSDOS). After upgrade, it ran on a 68000 (Xenix) but kept the Z80 for i/o and keyboard handling.