Religion and design aesthetics aside, the commercial advantage of RISC was entirely a consequence of the fab process capabilities at the time RISC was introduced. On any given process, you can fit only so many gates on a die before yield goes to hell completely and only the government can afford to buy the chips. RISC uses fewer gates than CISC because its decode is much simpler. So at some process you can fit a reasonable RISC processor on a cheap enough chip where you couldn't fit the same capability in CISC. Alternatively and equivalently, you could fit either a dinky CISC or a more capable RISC on the same chip.
That edge process happened about 1980, and lasted for two fab generations ending around 1985. Then the fabs got good enough that you could fit either CISC or RISC, your choice, and the commercial advantage of RISC ISAs went away. The formerly dinky CISC got jazzed up by true designer heroics, and today you choose an architecture based on other reasons.
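To put rough numbers on the yield argument, here is a minimal sketch using the textbook Poisson yield model (yield = exp(-area x defect density)). The model choice, the assumption that die area scales with gate count, and every number below are mine for illustration, not data from any real fab.

```python
import math

def die_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: fraction of dies with zero killer defects."""
    return math.exp(-area_cm2 * defects_per_cm2)

def cost_per_good_die(area_cm2: float, defects_per_cm2: float,
                      wafer_cost: float, wafer_area_cm2: float) -> float:
    """Wafer cost spread over the good dies only (edge loss ignored)."""
    dies_per_wafer = wafer_area_cm2 / area_cm2
    good_dies = dies_per_wafer * die_yield(area_cm2, defects_per_cm2)
    return wafer_cost / good_dies

if __name__ == "__main__":
    # Hypothetical process: 2 defects/cm^2, 100 cm^2 of usable wafer, $1000 per wafer.
    # Hypothetical designs: a "small decode" die of 0.5 cm^2 and a "big decode" die of 1.0 cm^2.
    for name, area in (("small", 0.5), ("big", 1.0)):
        y = die_yield(area, 2.0)
        c = cost_per_good_die(area, 2.0, 1000.0, 100.0)
        print(f"{name}: yield {y:.3f}, cost per good die ${c:.2f}")
```

Under this model, doubling the area more than doubles the cost per good die, because you get fewer dies per wafer and a smaller fraction of them work. As defect density drops with better fabs, the penalty shrinks, which is the "commercial advantage went away" part.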
There remain a few markets where cheap (power and area) decode still matters enough to influence dollar decisions. Mobile is one, and CISC as she is spoke in x86 is at a disadvantage. However, most posters seem to assume that CISC necessarily has expensive decode, and that CISC == x86. Both assumptions are false.
There are some new ones in the pipeline, from companies you never heard of. We recently spec'd a core intended for NUMA supercomputer usage: 102 Gflops, 960 GB/s memory bandwidth, 87 Watts. :-)
Or for the merely human: 10 Gflops (or Gops integer), standard workstation board interfaces, 9.2 Watts. Linux, of course.
Ivan
I work for a CPU startup (ootbcomp.com) that is exploring how our high-end chips would fit in a supercomputer. Our Mill architecture is a general-purpose CPU (yes, Linux) that looks like nothing you've ever seen when you get the hood up.
Supercomputers are multinode machines, which does *not* imply that they are loosely coupled clusters in the VATech model. A node is one CPU chip, a local memory, and an interconnect to other nodes to provide system-wide global shared memory; node counts from 100 to 10,000 are common. Per node, our proposed system has up to 100 Gflops, up to 960 GB/s memory bandwidth (to any mix of local or remote memory), up to 4 GB of local memory, and burns under a kilowatt. With reasonable packing densities you can get three Earth Simulators' worth of flops and up to 7 TB of memory per rack, using around 600 kW. That's with global shared memory and a maximum memory latency from any CPU to any byte of under 100 clocks.
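For anyone who wants to check the per-rack figures against the per-node ones, here is a back-of-envelope sketch. It uses only the numbers quoted above and assumes (my assumption, not a spec) that a rack is filled with nodes at the maximum 4 GB each and that TB/GB are decimal units.

```python
# Figures quoted in the post; all are "up to" or "around" values.
NODE_GFLOPS = 100      # peak per node
NODE_MEM_GB = 4        # maximum local memory per node
RACK_MEM_TB = 7        # maximum memory per rack
RACK_POWER_KW = 600    # approximate power per rack

# Assumption: every node carries the full 4 GB, decimal units (1 TB = 1000 GB).
nodes_per_rack = RACK_MEM_TB * 1000 // NODE_MEM_GB
rack_tflops = nodes_per_rack * NODE_GFLOPS / 1000
watts_per_node = RACK_POWER_KW * 1000 / nodes_per_rack

print(f"nodes per rack: {nodes_per_rack}")
print(f"peak per rack:  {rack_tflops:.0f} Tflops")
print(f"power per node: {watts_per_node:.0f} W")
```

This is a consistency check only; a rack built for a different memory-to-flops balance would come out with different node counts.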
We just do chips, but would be quite interested in someone who wanted to put our chips in their boxes and sell them in this market. Please don't kick the tires to satisfy your individual curiosity - if you don't know why those numbers are significant then we won't tell you, and if you are not with a present or potential market player then you'd be wasting time we need to spend getting it working:-)
Ivan
Acid randomizes settings - which can be useful (on "Lysergically Yours")
Few of the deep insights bear close inspection, and it's *really* hard to bring deep-trip things back anyway. But the experience can still be useful.
An analogy: in the old days, loggers cut trees during winter and sledged them to stream banks with horses. Come the spring thaw the streams would flood, the logs would float, and downstream they'd go to the mill.
However, the streams and rivers often had random rocks, and a log would fetch up. Then the log behind that would fetch up on the first, and pretty soon you'd have a solid mass of intertwined logs running five miles upriver. These jams had to be unjammed to get the logs down, and it had to be done before falling water levels made it impossible until next year.
The bigger mills had a set of jam experts on hire. The expert would travel to the river and climb out on the jam with a peavey (a pole with a spike and hook for shifting logs around). He'd poke a little here, poke a little there, and find the one key log that was holding everything. Poke, shift, tie a rope around it and pull, and whoosh! - out goes the whole mess. The experts were paid *very* well.
Smaller logging companies could not afford the experts. If they got a jam, their best logger would try to locate the key log and play expert. But if he couldn't, the fix was to bundle a couple of sticks of dynamite into a likely looking place in the pile. Kaboom! - and if lucky, whoosh! If not, try a few more sticks somewhere else.
The dynamite always blew a few logs to smithereens, and so the companies that could afford it used the experts instead of dynamite. But even the experts used explosives sometimes. The combination of skill and bang would unstick things, and that was what mattered.
It's the same with acid in many ways. We all get stuck inside on occasion - sometimes small jams, sometimes so big that the clear flow of our river of mind is completely blocked. Those who can afford an expert (a shrink, or guru, or whatever) or who have developed the skill themselves (meditation, revelation, or whatever) can do just fine without dynamite.
For the rest of us, acid makes pretty good mental dynamite. Yes, it will smash a few logs, but the benefit of unblocking the mental and spiritual river is great. Of course, it doesn't offer much benefit when the river is flowing well anyway, so most people who go tripping seem to gradually cease after a while, except for an occasional trip for "old times' sake".
And it makes lousy recreation - acid makes you look at what you've been working for years to not look at. Takes your person and life and reality itself and rubs your nose in it. But sooner or later you will meet your greatest fear, and your heart's desire. For most people, the heart's desire is the scarier.
And always beautiful. Sometimes horribly beautiful, but always beautiful.
Ivan
Sure, there are practical advantages to being bright. But the only thing that really matters is that it lets you see that "if only" is a trap. Many people (who do not think of themselves as bright) feel that "if only I were smart then life would be wonderful". But smart people *are* smart - and life isn't wonderful. Hence, "if only I were smart" is a trap, and by extension *all* "if only"s are traps too.
This realization is greatly liberating because it lets you get on with your life, and so have a chance that it will actually *be* wonderful after all. People who are not smart have a harder time avoiding traps. Sure, if you're beautiful then you will avoid the "if only I were beautiful" trap, but beautiful won't help you make the generalization to other traps the way smart will.
While a degree matters in the academic world and certain portions of industry, if you have any talent for this stuff at all then a University is the wrong model. We are really craftspeople, not scientists, and the right model is a medieval master craftsman's shop and apprenticeship. You can teach yourself by just starting a project, asking questions, reading and thinking hard - and you will be productive long before any place will have given you a degree. The sheepskin doesn't get you a job these days either.
Sure, you will find it hard to get the first job if you go to the Fortune 500 and paid headhunters. So don't - use your network and find a small office that will take a chance on you, or volunteer your programming skills at a charity. If you're good at it you'll find work; after all, you don't have to mention that your Doctorate is not in CS :-)
I've had a lot of success hiring empty-nest women (usually) with no formal computer training at all and turning them into damn fine engineers. You can teach engineering to someone who can think a hell of a lot easier than you can teach thinking to someone who can engineer.
Ivan
The ClearSpeed processor is actually a co-processor (see the ClearSpeed site for details). A co-processor is a special purpose functional unit that you bolt onto the side of a regular processor that runs the actual program. The ClearSpeed chip is a floating-point banger; other co-processors do other things.
To use a co-processor, the main program wanders along until it hits something that the c-p can do - say it wants a 1000 point FFT done (ClearSpeed would be good at that). It loads the data into the c-p, pushes the button, and then idles until the c-p is done, usually signaled by an interrupt (boxes vary). It then unloads the result and continues.
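The load / push the button / wait / unload sequence looks roughly like the toy below. Everything here is a hypothetical stand-in: `ToyCoprocessor` is not ClearSpeed's API, a worker thread plays the c-p, a `threading.Event` plays the completion interrupt, and a naive DFT stands in for the FFT hardware.

```python
import cmath
import threading

class ToyCoprocessor:
    """Hypothetical co-processor model: load, start, wait for 'interrupt', unload."""

    def __init__(self):
        self._input = None
        self._output = None
        self._done = threading.Event()  # stands in for the completion interrupt

    def load(self, samples):
        """Host copies its data into the co-processor's local memory."""
        self._input = list(samples)
        self._done.clear()

    def start(self):
        """'Push the button': the c-p runs on its own from here."""
        threading.Thread(target=self._run).start()

    def _run(self):
        n = len(self._input)
        # Naive O(n^2) DFT; real hardware would run a fast FFT kernel.
        self._output = [
            sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(self._input))
            for k in range(n)
        ]
        self._done.set()  # signal completion to the host

    def wait(self):
        """Host idles until the c-p signals that it is done."""
        self._done.wait()

    def unload(self):
        """Host copies the result back out."""
        return self._output

def offload_dft(cp, samples):
    """The host-side handshake described above, in order."""
    cp.load(samples)
    cp.start()
    cp.wait()
    return cp.unload()

if __name__ == "__main__":
    print(offload_dft(ToyCoprocessor(), [1, 1, 1, 1]))
```

Note that the host does nothing useful while it waits, which is why the load and unload costs have to be small compared with the work being offloaded.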
Their site has little tech detail, so this is a guess: the c-p probably has pipelined floating point but no scoreboarding to keep things straight between instructions. An easy way to get more throughput in such a case is to give every nth instruction to a different thread of control, i.e. multithreading on a cycle-by-cycle basis rather than on an interrupt-by-interrupt basis (c-ps don't have interrupts). This keeps the multipliers busy at the cost of requiring software to keep things straight.
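Here is a toy model of why cycle-by-cycle interleaving helps. It assumes the worst case for a machine without scoreboarding: every instruction in a thread depends on the previous one, so a thread can only issue once per pipeline latency. The 4-cycle latency and the strict round-robin slot assignment are my assumptions for illustration, not anything known about the ClearSpeed part.

```python
def utilization(num_threads: int, latency: int, cycles: int = 1000) -> float:
    """Fraction of cycles in which the pipelined unit accepts a new instruction.

    Issue slots rotate round-robin: cycle c belongs to thread c % num_threads.
    A thread may issue only when its previous instruction has completed.
    """
    ready_at = [0] * num_threads  # first cycle at which each thread may issue again
    issued = 0
    for cycle in range(cycles):
        thread = cycle % num_threads
        if cycle >= ready_at[thread]:
            ready_at[thread] = cycle + latency
            issued += 1
        # Otherwise the slot is wasted: this thread's last result isn't ready yet.
    return issued / cycles

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s), latency 4: utilization {utilization(n, 4):.2f}")
```

With as many threads as pipeline stages, each thread's previous result is ready exactly when its slot comes round again, so no hardware dependency checking is needed. The software still has to supply that many independent threads, which is the "keep things straight" cost.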
Speaking of software, to use a c-p you have to rewrite (and often redesign) your app so as to put in the necessary handshaking. This isn't just a matter of recompiling - think assembler-level timing issues, even if you are writing in nominal C. C-ps do 10-line loops (real fast), not applications. You will never run gcc, or Apache, or Oracle, or Hello World on a ClearSpeed - but if you need a pile of FFTs and are willing to rethink/rewrite your code then it should brush your teeth.
C-ps are a very crowded market. Most are grid, stream, or vector machines, most often used for embedded codes like your friendly local MRI machine. ClearSpeed looks to be a reasonably workmanlike design, but no breakthrough IMO.
Ivan
If strong AI were an existential threat then it would already exist in the universe, and would have gobbled us already if it cared to.
If AI reached human levels then it would be subject to the same existential choices that we have.