How to Kill x86 and Thread-Level Parallelism
kid writes: "There's an interesting article discussing how one might go about 'killing' x86. The article details a number of different technological solutions, from a clean 64-bit replacement (Alpha?), to a radically different VLIW approach (Itanium), to an evolutionary solution (Opteron). As is often the case in situations like these, market forces dictate which technologies become entrenched and whether or not they stay that way (VHS vs. Beta, anyone?). Another article by the same author covers hardware multithreading and exploiting thread-level parallelism, as in Intel's Hyper-Threading or IBM's POWER4 with its two cores on one die. These kinds of implementations can really pay off if the software supports them. In the case of servers, most applications tend to be multi-user, and so are parallel in nature."
First Post!
A From Litte A system endian!
x86 Rules!
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
Buy Apple :D
The space shuttle still uses 16-bit x86s, and the financial system is reliant on v_e_r_y old systems that spew out dot-matrix printed backups. Old systems survive today and, IMHO, always will. It has to be organic.
--
Slashdot: Racism against Indians OK. China bad, USA good. Blue pill in water supply.
We should rewrite all of our COBOL programs in C while we're at it.
Might as well compound the folly of tossing out a perfectly good instruction set with the folly of tossing out perfectly good source code.
Update, don't reinvent. The desire to reinvent is a junior engineer's character flaw. It takes several experiences of spending long hours tracking down bugs in a new implementation, rather than simply updating older code that worked fine, to outgrow it.
I have been pwned because my
This is much like my day-to-day work. The h/w guys think they are gods and always blame us s/w guys for not exploiting the smartness of their designs fast enough. Software compatibility is what counts for general-purpose systems, and it always will. You can cry your guts out about bad system design and segment hell etc. etc., and it will not help.
"Throughput computing": where performance is measured not individually but in aggregate. See their media kit at http://www.sun.com/aboutsun/media/presskits/throughputcomputing/ for more details.
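A back-of-the-envelope way to see what "measured in aggregate" means: if independent requests run concurrently, aggregate throughput is roughly the number of threads divided by per-request latency, so several slower hardware threads can out-serve one fast thread even though each request takes longer. The numbers below are invented for illustration, and the model assumes the requests never contend for shared resources, which real chips only approximate.

```python
def throughput(threads: int, latency_s: float) -> float:
    """Requests completed per second, assuming `threads` independent requests
    are always in flight and each one takes `latency_s` seconds to finish."""
    return threads / latency_s


# Hypothetical figures, not measurements of any real processor.
single = throughput(threads=1, latency_s=0.010)  # one fast thread: 10 ms per request
many = throughput(threads=8, latency_s=0.025)    # eight slower threads: 25 ms each

print(f"1 fast thread:    {single:.0f} requests/s")
print(f"8 slower threads: {many:.0f} requests/s")
```

Each individual request gets slower (25 ms instead of 10 ms), yet the machine as a whole finishes more of them per second, which is the trade a throughput-oriented design makes.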
However, I believe the whole idea is nothing new. AFAIK, there are only two ways of increasing the performance of a processor (operations per second): either increase the IPC (instructions per cycle) by increasing parallelism, or decrease the cycle time by increasing the clock rate (GHz).
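The claim boils down to one product: operations per second = IPC × clock rate. A minimal Python sketch, with made-up numbers chosen only to show that either factor can carry the result:

```python
def ops_per_second(ipc: float, clock_hz: float) -> float:
    """Single-core throughput: instructions per cycle times cycles per second."""
    return ipc * clock_hz


# Two hypothetical designs that reach the same throughput by different routes.
wide_and_slow = ops_per_second(ipc=4.0, clock_hz=1.5e9)    # more parallelism per cycle
narrow_and_fast = ops_per_second(ipc=1.5, clock_hz=4.0e9)  # shorter cycle time

print(wide_and_slow, narrow_and_fast)
```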
Each method has its limits and follows the law of diminishing returns. For example, increasing the clock rate implies increasing the number of stages in the pipeline, and after, say, 10000 stages, the penalties imposed by flushing the pipeline might cancel out the gain from the higher GHz. Similarly, if you manage to place 100000 cores on a chip, scheduling amongst these cores and providing real-time access to memory for all of them will become the bottleneck. Hence, I take statements like "how to kill the x86" with a pinch of salt.
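The diminishing-returns argument for deeper pipelines can be made concrete with a simple textbook-style model. This is an assumption-laden sketch, not data: a fixed logic delay T is split across n stages, each stage adds a constant latch overhead d, and each flush (for example, a mispredicted branch, occurring b times per instruction) wastes about n cycles. Under those assumptions the time per instruction is (T/n + d)(1 + b·n), which is smallest near n = sqrt(T/(d·b)); past that point, extra stages make things worse.

```python
import math

# Hypothetical parameters, chosen for illustration only (not measured data).
LOGIC_DELAY_PS = 5000.0    # T: total logic delay of one instruction, unpipelined
LATCH_OVERHEAD_PS = 100.0  # d: fixed latch delay added by every stage
FLUSH_RATE = 0.05          # b: pipeline flushes per instruction


def time_per_instruction(stages: int) -> float:
    """Average picoseconds per instruction for a pipeline with `stages` stages.

    The cycle shrinks as logic is split across more stages, but every stage
    adds latch overhead, and every flush throws away roughly `stages` cycles
    on top of a base of one cycle per instruction.
    """
    cycle_ps = LOGIC_DELAY_PS / stages + LATCH_OVERHEAD_PS
    cycles_per_instr = 1.0 + FLUSH_RATE * stages
    return cycle_ps * cycles_per_instr


best = min(range(1, 201), key=time_per_instruction)
analytic = math.sqrt(LOGIC_DELAY_PS / (LATCH_OVERHEAD_PS * FLUSH_RATE))
print(best, round(analytic, 2))
```

With these made-up numbers the best depth lands right next to the analytic optimum, and both a very shallow and a very deep pipeline are several times slower.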
Finally, it will be the fabrication (physical) technology that decides which one of these dies. For example, if tomorrow someone comes up with a process that enables 100GHz chips (think extensions of SOI etc.), decreasing the cycle time will win. Similarly, if someone comes out with femtometre (10^-15 m) fabrication technology, then parallelism will win.
Two decades ago, the instruction set still mattered because it was closely tied to how the processor executed things. Today, we can put enough logic between the instruction stream and the processor that the instruction set no longer makes a difference.
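One way to picture that "logic between the instruction stream and the processor" is a decoder that cracks a memory-operand instruction into simpler load, operate, and store micro-operations, which the core then executes. The sketch below uses an invented two-operand syntax and a hypothetical `MicroOp` type; real x86 decoders are far more involved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MicroOp:
    op: str                # internal operation, e.g. "load", "add", "store"
    dst: str               # register or address written
    srcs: Tuple[str, ...]  # registers or addresses read


def decode(instr: str) -> List[MicroOp]:
    """Crack a toy two-operand instruction ("MNEMONIC dst, src") into micro-ops.

    A memory destination such as "[ebx]" becomes a read-modify-write sequence;
    a register destination maps to a single micro-op.
    """
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if dst.startswith("["):
        addr = dst.strip("[]")
        return [
            MicroOp("load", "tmp0", (addr,)),               # read memory into a temp
            MicroOp(mnemonic.lower(), "tmp0", ("tmp0", src)),  # do the arithmetic
            MicroOp("store", addr, ("tmp0",)),              # write the result back
        ]
    return [MicroOp(mnemonic.lower(), dst, (dst, src))]


for uop in decode("ADD [ebx], eax"):
    print(uop)
```

Once instructions are cracked like this, the execution core only ever sees the simple internal operations, which is the sense in which the external instruction set stops mattering.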
And VLIW in particular is quite unconvincing: processors should rely less on compilers, not impose a bigger burden on software writers.
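To make the "burden on software writers" concrete: in a VLIW design the compiler, not the hardware, decides which operations issue together, and empty slots must be filled with explicit no-ops. A minimal sketch of greedy list scheduling into fixed-width bundles, under simplifying assumptions (every operation takes one bundle, only true data dependences are tracked, and the operation names are hypothetical):

```python
from typing import Dict, List, Set


def pack_bundles(ops: List[str], deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into fixed-width VLIW bundles.

    ops   - operation names in program order (producers before consumers)
    deps  - maps an op to the ops whose results it reads
    width - issue slots per bundle
    """
    bundles: List[List[str]] = []
    placed: Dict[str, int] = {}  # op -> index of the bundle it was put in
    for op in ops:
        # Earliest bundle after all of this op's producers have issued.
        i = max((placed[p] + 1 for p in deps.get(op, set())), default=0)
        # Skip bundles whose slots are already full.
        while i < len(bundles) and len(bundles[i]) >= width:
            i += 1
        if i == len(bundles):
            bundles.append([])
        bundles[i].append(op)
        placed[op] = i
    # Unused slots still have to be encoded, as explicit no-ops.
    return [b + ["nop"] * (width - len(b)) for b in bundles]


# a = load x; b = load y; c = a + b; d = load z; e = c * d
ops = ["a", "b", "c", "d", "e"]
deps = {"c": {"a", "b"}, "e": {"c", "d"}}
print(pack_bundles(ops, deps, width=2))
```

Note that `d` gets hoisted next to `c` only because the scheduler can prove they are independent; an out-of-order core discovers the same thing at run time without help from the compiler.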
Since I use linux and it or its applications can be ported to most architectures you throw at it, I could theoretically have my pick of the litter for a future system. What I consider most is the bang-for-my-buck factor.
Sure, I could spend $20 on eBay and get a Sparc Lunchbox, but there's not enough processing power in there for me. I could also go out and buy a year-old IBM mainframe, but I doubt any auction site will have them anywhere near my price range. I want something that's decent but also cheap. I don't care what architecture it is as long as it 'works' and I can afford it.
This is for my Desktop/Workstation, mind you.
The Power 4 is two full CPUs (cores) on a single die, not that crap that Intel puts out called Hyper-Threading where you only have a single full CPU and then some extra logic to quickly swap over to another thread when needed.
The neat hardware implementation of this would be to make all MOV instructions take nearly the same time, regardless of the amount of data moved. A MOV should result in a remapping of the source and destination memory in the cache system. Even if this were just implemented for aligned moves, it would be a big help. When your application's 8K buffer needs to be copied to the file system, that copy should be done by updating cache control info, not by really doing it.
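The poster imagines doing this inside the cache hardware. A rough software analogue, swapped in here for illustration, is page remapping with copy-on-write, the trick operating systems use (for example, after fork) to avoid physically copying memory: a page-aligned "copy" just points the destination page at the source's physical frame, and bytes are duplicated only if one side is later written. Everything below (the class name, 4 KB pages, byte-granular reads and writes) is a hypothetical sketch, and it assumes the source and destination ranges do not overlap.

```python
PAGE_SIZE = 4096  # assumed granularity; the post only says "aligned moves"


class RemappingMemory:
    """Toy memory in which page-aligned copies only rewrite a mapping table.

    After a copy, both virtual pages point at the same physical frame; the
    first write to a shared frame makes a private copy (copy-on-write).
    """

    def __init__(self) -> None:
        self.page_map = {}  # virtual page number -> frame id
        self.frames = {}    # frame id -> bytearray of PAGE_SIZE bytes
        self.refcount = {}  # frame id -> how many virtual pages map it
        self._next_id = 0

    def _new_frame(self, data=None) -> int:
        fid = self._next_id
        self._next_id += 1
        # bytearray(int) is zero-filled; bytearray(bytearray) copies the contents.
        self.frames[fid] = bytearray(PAGE_SIZE if data is None else data)
        self.refcount[fid] = 0
        return fid

    def _unmap(self, vpn: int) -> None:
        fid = self.page_map.pop(vpn, None)
        if fid is not None:
            self.refcount[fid] -= 1
            if self.refcount[fid] == 0:  # last user gone: free the frame
                del self.frames[fid], self.refcount[fid]

    def _map(self, vpn: int, fid: int) -> None:
        if self.page_map.get(vpn) == fid:
            return
        self.refcount[fid] += 1  # take the new reference before dropping the old one
        self._unmap(vpn)
        self.page_map[vpn] = fid

    def read(self, addr: int) -> int:
        vpn, off = divmod(addr, PAGE_SIZE)
        fid = self.page_map.get(vpn)
        return 0 if fid is None else self.frames[fid][off]

    def write(self, addr: int, value: int) -> None:
        vpn, off = divmod(addr, PAGE_SIZE)
        fid = self.page_map.get(vpn)
        if fid is None:                 # untouched page: allocate it
            fid = self._new_frame()
            self._map(vpn, fid)
        elif self.refcount[fid] > 1:    # shared page: copy on write
            fid = self._new_frame(self.frames[fid])
            self._map(vpn, fid)
        self.frames[fid][off] = value

    def copy(self, dst: int, src: int, length: int) -> None:
        """Copy `length` bytes by remapping pages instead of moving data."""
        if dst % PAGE_SIZE or src % PAGE_SIZE or length % PAGE_SIZE:
            raise ValueError("this sketch only handles page-aligned copies")
        for i in range(length // PAGE_SIZE):
            src_vpn, dst_vpn = src // PAGE_SIZE + i, dst // PAGE_SIZE + i
            fid = self.page_map.get(src_vpn)
            if fid is None:
                self._unmap(dst_vpn)    # source page is all zeros
            else:
                self._map(dst_vpn, fid)


mem = RemappingMemory()
mem.write(0, 42)                        # the application's buffer lives in page 0
mem.copy(10 * PAGE_SIZE, 0, PAGE_SIZE)  # "copy" it to page 10: no bytes move
print(mem.read(10 * PAGE_SIZE), mem.page_map[10] == mem.page_map[0])
mem.write(10 * PAGE_SIZE, 7)            # first write splits the shared frame
print(mem.read(0), mem.read(10 * PAGE_SIZE))
```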
With this, windowing becomes far simpler. Each window is maintained locally. Shared window management is reduced to screen space allocation, which is done by commanding the window MMU.
Why not define a new standard machine code set and start making new chips with it? Old software can use the old chip and new software use the new chip. Game machines do something like this.
Emulators can be implemented such that old chips can still run code from the new standard (and vice versa), just slower. For development, training, simple apps, and testing, that is usually fast enough.
A box could come with both an x86 and an Alpha clone, for example. Eventually the x86 chip is no longer worth it, and the few old apps lying around just use emulation mode.
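Why emulation is "just slower": the host has to fetch, decode, and dispatch every guest instruction in software. A minimal sketch of such an interpreter loop for a tiny instruction set invented purely for illustration (the opcodes, register names, and `run` function are all hypothetical):

```python
from typing import Dict, List, Optional, Tuple


def run(program: List[Tuple], regs: Optional[Dict[str, int]] = None,
        max_steps: int = 10_000) -> Dict[str, int]:
    """Interpret a program for a tiny hypothetical register machine.

    Opcodes (all invented for this sketch):
      ("li", rd, imm)       rd = imm
      ("add", rd, ra, rb)   rd = ra + rb
      ("sub", rd, ra, rb)   rd = ra - rb
      ("bnz", ra, target)   if ra != 0, jump to instruction index `target`
      ("halt",)             stop and return the register file
    """
    regs = dict(regs or {})
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]  # fetch and decode one guest instruction
        pc += 1
        if op == "li":
            rd, imm = args
            regs[rd] = imm
        elif op == "add":
            rd, ra, rb = args
            regs[rd] = regs.get(ra, 0) + regs.get(rb, 0)
        elif op == "sub":
            rd, ra, rb = args
            regs[rd] = regs.get(ra, 0) - regs.get(rb, 0)
        elif op == "bnz":
            ra, target = args
            if regs.get(ra, 0) != 0:
                pc = target
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"illegal opcode {op!r} at {pc - 1}")
    raise RuntimeError("step limit exceeded")


# Guest program: acc = 5 + 4 + 3 + 2 + 1
program = [
    ("li", "n", 5),
    ("li", "acc", 0),
    ("li", "one", 1),
    ("add", "acc", "acc", "n"),  # index 3: loop body
    ("sub", "n", "n", "one"),
    ("bnz", "n", 3),
    ("halt",),
]
print(run(program)["acc"])
```

Each guest instruction costs many host operations (tuple unpacking, a chain of comparisons, dictionary lookups), which is where the slowdown comes from; real emulators reduce it with techniques such as dynamic binary translation.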
Table-ized A.I.
speaking of thread-levels and x86 tainting..
I get a blank page except for the advert; front page too.
I use Netscape 4.77, cookies/javashit off. Now I'm a long way from text-only browsing, but is it asking too much to have standardized, *.INT/insert_your_international_web_commission_here compliant sites or *at least* clean links to them? Can someone post a plaintext version?
Yes, economy of scale determines who provides the most bang for the buck, but there are more dimensions to the purchasing decision than MIPS, MFLOPS, and $$. There are watts and hours and then, god forbid, intangibles.
ARM and PPC have the best shot at displacing ia32 and its best successor, amd64, because they accommodate very real market segments. We keep waiting for commodity PPC hardware, but it never emerges because the OSS community isn't big enough to drive sales to economical volume. Nonetheless, some magical event could happen at any moment in PPC-land, as IBM is quite motivated to see it happen. ARM has economy of scale, but no one is pushing its performance into competitive domains right now.
-I like my women like I like my tea: green-
This assumes Linux compiles on your pick-of-the-litter system. If Intel moves to a non-x86 instruction set, then someone needs to port gcc to the new instruction set. Someone would also need to port the kernel to the new architecture. Userland apps may be mostly OK, but some of them will need tweaking, too.
That is why everyone is hesitant to throw out an architecture upon which literally decades of computing is built.
Maybe not quite, but I've programmed M68K, x86, and MIPS assembly, and if PPC is anywhere near as clean as M68K, I would absolutely love it on my desktop. Even if I never program in assembly. x86 just feels like some undead stalker that just crawled out of the basement, with bits of slime dripping off it.
So does old software. I've seen many a checkout-like system that appeared to be running on DOS, and some terminals I walked up to (even though I may not have been supposed to get to them physically, they were relatively unguarded) responded directly to the three fingered salute.
And while old hardware still works, especially as long as you have software that's ported to it, old software does not. For that matter, since old hardware is so cheap, people who would keep using 16-bit processors should buy 32-bit ones just as 64-bit starts becoming de facto, because that way they have a low-cost upgrade cycle. Not because they necessarily need it to accommodate new software, but because I'm sure they will start wanting to do new things with their computer system, and sometimes that requires hardware.
Don't thank God, thank a doctor!
Today, we can put enough logic between the instruction stream and the processor
Wouldn't less decoder logic allow for a smaller decoder, which requires less die space and emits less heat?
processors should rely less on compilers
To the other extreme, do you propose a processor that can run Perl directly? What compromise would you find best?
I wouldn't expect to see multicore in home PCs within the next five years, even if multicore becomes so cheap Intel could start putting it in its Celeron chips. The limitation is that Microsoft charges for Windows licenses per core; a license for Windows XP Professional, which can handle two cores, costs much more than a license for Windows XP Home Edition, which can handle one core. Wouldn't multicore require selling the machine with a more expensive version of Microsoft Windows?
I say "next five years" to estimate how long it will take before Linux desktop environments reach the maturity of even Windows 2000.
x86, however, has a ridiculously small number of registers. This means that you have to go to memory A LOT. It's easy to make register operations fast but extremely hard to make memory fast, and the performance gap between memory and processors is constantly increasing.
That's why x86-64 has 16 general purpose registers, Alpha - 64 and Itanium ... 128.
Bottom line: we do need a replacement for x86. Not because it's CISC, but because it has too few registers.
And, btw, it's not a bad idea to demand more from the compiler. Big table structures on the chip require space and affect clock cycles. Compilers just don't.
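To see why register count matters, here is a minimal sketch of linear-scan register allocation (in the style of Poletto and Sarkar), counting how many values must be spilled to memory for a given number of registers. The live ranges are invented, and the model ignores reserved registers, calling conventions, and x86's ability to use memory operands directly. For reference, 32-bit x86 has eight general-purpose registers (including the stack pointer) and x86-64 has sixteen.

```python
from typing import List, Tuple


def count_spills(intervals: List[Tuple[int, int]], num_regs: int) -> int:
    """Linear-scan allocation over (start, end) live ranges; returns spill count."""
    active: List[int] = []  # end points of live ranges currently held in registers
    spills = 0
    for start, end in sorted(intervals):
        # Free registers whose live range ended before this one starts.
        active = [e for e in active if e >= start]
        if len(active) < num_regs:
            active.append(end)
            continue
        # No free register: spill whichever live range ends furthest away.
        furthest = max(active)
        if furthest > end:
            active.remove(furthest)  # that value goes to memory instead
            active.append(end)       # and this one takes its register
        spills += 1
    return spills


# Hypothetical hot loop with twelve values live at the same time.
twelve_live = [(i, 100) for i in range(12)]
print(count_spills(twelve_live, 8), count_spills(twelve_live, 16))
```

With eight registers, four of the twelve values end up in memory, so every use of them becomes a load or store; with sixteen, none do.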
The Raven