Troubles with Merced
Brandon Bell
writes "Everyone has their theory on why Intel's Merced
is in trouble. Kraemer just wrote an opinion piece that
discusses two problems he thinks it's facing: the compiler
and the sales model."
I don't care how good your compiler or CPU is: in most cases, if you add 100 CPUs, the code will not run 100 times faster. The writer of this article seems to think it will. Also, even if this were the case, no one is selling mobos to average Joe user with 100 CPU sockets.
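The usual way to put numbers on this is Amdahl's law (the comment doesn't name it): if only a fraction p of the run time can be spread over n CPUs, the speedup is at most 1 / ((1 - p) + p / n). A minimal sketch; the parallel fractions below are illustrative assumptions, not measurements of any real program:

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work
    can be split across `n_cpus` (Amdahl's law, no overheads counted)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

# illustrative fractions only
for p in (0.5, 0.9, 0.99):
    print(f"{p:.0%} parallel, 100 CPUs -> {amdahl_speedup(p, 100):.1f}x")
```

Even a program that is 99% parallel tops out around 50x on 100 CPUs, and that is before counting any locking or communication overhead.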
Thank you for your message, which is much more informative than the initial page :-).
The problem now is: how well does EPIC/VLIW mix with dynamic code and languages (Java, Smalltalk, Eiffel, Lisp, Perl/Tcl/Python, ...) which trade CPU efficiency for speed of development?
On current RISCs, JITs and miscellaneous run-time optimizers could restrict themselves to emitting machine code; the (complex) hardware would then schedule it properly. But on EPIC/VLIW that doesn't seem to be the case. Is there any way to still get good performance for dynamic languages? Any research papers on this?
I know that adding a CPU won't normally increase the performance of a program, because processes, in general, run on a single CPU. The program must be explicitly written to use multiple CPUs. Also (correct me if I'm wrong), it is the kernel's job to distribute different processes across CPUs in a multiprocessing environment, but how are threads handled? For example, in a multithreaded app, does/can the kernel run the app on multiple CPUs? Is this provided in pthreads, or do you need a special library like PVM (I think that's the right TLA :)?
Yes, any reasonable kernel distributes threads across all available CPUs.
Linux and BeOS have the same thread model, so I don't really understand what you're saying.
However, BeOS effectively forces GUI and network programs to be multithreaded, so BeOS apps tend to use more threads than Linux apps, and thus take better advantage of SMP.
Actually, EPIC calls for splitting the CPU's work into threads that can easily be worked on by multiple pipelines. The biggest problem with current multiprocessor technology is that most software isn't threaded that way. But if the software has multiple threads, which EPIC requires, what is to stop the OS from diverting them to the pipelines of another CPU? Or having one CPU take one branch and the other CPU take the other? EPIC is designed for multiple pipelines, but it can also help with multiple CPUs.
True, this aspect (code compatibility) has been dealt with somewhat in EPIC, but I think the spirit of VLIW is still there: expose as much architectural information as possible, and handle in software whatever cannot easily be handled in hardware.
So I think if you upgrade the EPIC architecture (add more functional units, change the latencies etc.) it would still be necessary to recompile to take full advantage of the new parallelism. This is true for superscalars, but much more serious in EPICs.
More thoughts on binary compatibility: I think binary compatibility (in particular x86 compatibility) is bad for innovation, and not as critical as it used to be. Reasons: desktops dominated by a single architecture are on the way out (I hope). Java, or things like Java, where the client dynamically compiles code into binary as needed, will become more common. Hopefully this sort of dynamic and incremental recompilation can be made transparent to users. Markets other than desktops, like embedded systems, will become more important, and these do not run shrink-wrapped binaries. Finally, open source may also have an impact: if programs are distributed in source form, compilation and recompilation are not a big problem.
Allen (leunga@cs.nyu.edu)
You may have heard about userspace threads. Basically, this is a library which fakes multithreaded activity within one process. Since it lives in userspace, the kernel knows nothing of it and can only schedule the whole process on one processor. This was used to do MT on Linux before it had kernel threads.
Now that Linux has kernel threads, this sort of hack is unnecessary (on Linux). I am unaware of the story with regard to the BSDs or other OSes.
DOS has had this since at least '90 (MT on segmented memory is no fun).
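To make "fakes multithreaded activity within one process" concrete, here is a toy sketch that uses Python generators as stand-ins for userspace threads. This is an assumption for illustration only; the real Linux libraries of the time switched stacks and registers rather than using anything like generators. The names `worker` and `run_round_robin` are hypothetical:

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[None, None, None]

def worker(name: str, steps: int, log: List[str]) -> Task:
    """A cooperative 'thread': it runs until it voluntarily yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the user-space scheduler

def run_round_robin(tasks: List[Task]) -> None:
    """Interleave tasks inside ONE kernel thread. The kernel only ever sees a
    single schedulable entity, so it can never put two tasks on two CPUs."""
    ready: Deque[Task] = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished: drop it
        ready.append(task)      # not finished: back of the queue

log: List[str] = []
run_round_robin([worker("A", 3, log), worker("B", 2, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'A:2']
```

The tasks interleave, but all of the switching happens in user code, which is exactly why an extra CPU doesn't help.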
I believe Sun has said they have a copy of Solaris 7 running on a Merced emulator, but that was a few weeks ago, so who knows. Check out their site. I think Sun is looking forward to Merced, hoping it will be cheaper, but then again, who knows.
Yeah... this guy is a total fool. It looks like his site is only relevant to the same crowd that packs into sites like Tom's Hardware and Sharky Extreme.
These guys seem to measure the size of their penises by how much they can overclock their system, how much faster their newest gee-whiz sound card is than the next guy's, and how many megatexels their 3D card pumps out.
Let's see, who's the god in their world? Whoever wrote Quake/Doom... never mind that he's no compiler demigod.
I interned in an Intel compiler research group (MRL) two summers ago, when they were already actively pursuing innovative designs for Merced. The suggestion that Intel isn't aware of the compiler issues is silly. The question is whether they allocated sufficient resources to finish the project on time. Who knows... we'll find out.
The idea that Carmack could step in and dominate this group of around 10 compiler PhDs and 5 grad students is pretty laughable.
No offense to Carmack, obviously.
There have been other VLIW compilation techniques around for a long time. The Bulldog compiler and the Multiflow TRACE machine did it. Others do it as well. Look into the history of the IA64 designers.
Or look at Apple a few years back, when the 68000 chip was running out of gas. The difference between the top-end and bottom-end CPUs was so small that they had to handicap the bottom-end machines with things like 40MB disks and so little memory that the OS wouldn't boot if you turned file sharing on. Apple found that you can only screw most customers over like that once.
Intel is just about at this point. Right after the Pentium III came out, I saw an ad for a $1900 Compaq - where about half the cost of that system is the CPU. Pity the poor sucker that buys that broken system thinking they're getting the top end. As General Motors and others have found out, selling dogs is not the way to get customers to come back quicker.
> I hear what you are saying about home power requirements, and I tend to agree (although SimCity 3000 might just need that Merced ;-)). But consider that 10k chips * $10k/chip = $100 m. My understanding is that a chip like Merced costs Intel $1-3 billion (US terminology) all found. So
:) Your argument that the money gained from this isn't enough to come anywhere near the money that went into it is well understood, but keep in mind that other companies build just server chips and they are still around and kicking. One such company, Alpha, was bought out by Intel, but if that buy hadn't taken place, Alpha, the (basically) server-only company, would still be around and by far much better than Intel.
Supposedly they will be releasing some new stuff that'll break the 1GHz barrier. EV8 is also supposed to be very interesting (not EV7... EV8! :)
> $100 million won't go very far to pay off that loan - and the bulk of the sales still have to come from the workstation side.
I feel like entering this argument.
I believe that the prices of the Merced will indeed be much higher than most people expect.
BTW: Alpha is still around, under Compaq's control
Sort of a PPro situation?
Note that there's nothing keeping Be from switching back to supporting PPC-based platforms if SMP PPC systems become popular. Until it actually happens, though, they have no reason to sink time, energy, and resources into a market which has demonstrated itself as unviable for them.
-- Guges --
> And the worst part is, we need gcc to be able to support it thoroughly to be able to run Linux and take full advantage of its capabilities.
> I think I'll buy an Alpha. It will probably cost less anyway. It's supported by Linux (or vice versa) and we know the damn thing works.
Now if everybody thought this way, technology would never evolve.
With companies like Samsung introducing Alpha-based 64-bit chips at low prices, and an open-source model that allows apps to be compiled for a specific platform (thereby freeing the user from the shackles of backwards compatibility, the single biggest monkey on Intel's back since the 286), Merced could be irrelevant.
SMP doesn't of itself improve the SINGLE process performance. You CAN write special code [..]
Note that the OS under discussion (BeOS) uses a multithreaded kernel and provides many powerful multithreaded libraries. By virtue of using any system calls and/or library calls, any BeOS application gains the benefit of having itself broken up into threads and distributed across multiple processors, even if it is written serially. Though of course that doesn't help much if all your time is spent in an inherently serial inner loop. I just wanted to point out that the matter isn't as cut-and-dried as your post made it sound. The general gist of your post, ie that adding more on-chip parallelism to processors like the Merced and Xeon is for improving single-thread performance, is absolutely correct.
-- Guges --
Thanks for the great comments!
Yes, I forgot to mention Josh Fisher's,
John Ellis' and Bob Rau's works: for Multiflow, Bulldog, and Cydra 5(?) etc.
I disagree on the relative ease of making a good EPIC compiler, though: I don't think naive tricks like hoisting loads and prepare-to-branch instructions to hide latencies will work well by themselves. I think resource constraints have to be modeled directly to get even decent performance. For example, if you hoist too many loads too early when you don't have enough load units, you have to stall, etc. (No real data to back this up, though; just my own experience writing schedulers, and a guesstimate.) And this basically means modeling your EPIC as a VLIW.
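A toy version of the load-hoisting example: a greedy list scheduler that models how many units of each kind can issue per cycle. The latencies (2-cycle loads, 1-cycle ALU ops) and unit counts are hypothetical, not Merced's, and `list_schedule`/`makespan` are illustrative names:

```python
from typing import Dict, List, Tuple

# (name, functional-unit kind, latency in cycles, earlier instructions it depends on)
Instr = Tuple[str, str, int, List[str]]

def list_schedule(instrs: List[Instr], units: Dict[str, int]) -> Dict[str, int]:
    """Greedy list scheduling: each cycle, issue ready instructions in program
    order until every unit kind has used up its per-cycle slots."""
    seen = set()
    for name, kind, _, deps in instrs:
        if units.get(kind, 0) < 1:
            raise ValueError(f"{name}: no {kind!r} unit on this machine")
        if any(d not in seen for d in deps):
            raise ValueError(f"{name}: may only depend on earlier instructions")
        seen.add(name)
    latency = {name: lat for name, _, lat, _ in instrs}
    issue: Dict[str, int] = {}
    cycle = 0
    while len(issue) < len(instrs):
        used = {kind: 0 for kind in units}
        for name, kind, _, deps in instrs:
            if name in issue:
                continue
            # ready once every producer has issued and its result has arrived
            ready = all(d in issue and issue[d] + latency[d] <= cycle for d in deps)
            if ready and used[kind] < units[kind]:
                issue[name] = cycle
                used[kind] += 1
        cycle += 1
    return issue

def makespan(instrs: List[Instr], issue: Dict[str, int]) -> int:
    """Cycle in which the last result becomes available."""
    return max(issue[name] + lat for name, _, lat, _ in instrs)

block: List[Instr] = [
    ("ld1", "load", 2, []), ("ld2", "load", 2, []),
    ("ld3", "load", 2, []), ("ld4", "load", 2, []),
    ("add1", "alu", 1, ["ld1", "ld2"]),
    ("add2", "alu", 1, ["ld3", "ld4"]),
    ("add3", "alu", 1, ["add1", "add2"]),
]
for loads in (4, 1):
    sched = list_schedule(block, {"load": loads, "alu": 2})
    print(loads, "load unit(s):", makespan(block, sched), "cycles")
```

A scheduler that assumed all four hoisted loads could issue together would plan on 4 cycles; with a single load unit the same block really takes 7.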
Furthermore, EPIC has predication, and various data and control speculation instructions, and you really have to take advantage of these to get good performance. So the compiler will not be as simple as a compiler for a superscalar. In fact, you can't even take an existing superscalar backend and hack up an EPIC backend from it; the IR, algorithms etc are just so different, especially with predication involved.
From the perspective of marketing though, I agree that EPIC has an advantage over VLIW.
I recall that BeOS was capable of taking advantage of MP chips. Programmers didn't have to do anything special in their code, and BeOS would distribute tasks to CPUs accordingly. Am I right about this or what?
Why does the processor need to check if the compiler "lied"?
This Merced design of executing both branches seems like it would take an enormous amount of work. Is it really worth it? Isn't a simpler design able to operate at a higher clock rate?
Of course it's worth it: with sufficient parallelism, running both branches is free.
As for the higher clock rate, there are real physical limitations down that route, meaning it will end at some point, and beyond that there is no gain from clock rate. However, if you could consistently execute multiple instructions at the same time, even just 2 at once, you would expect processing at 2 times the speed.
JIT is "Just-In-Time" compilation, and it actually outputs machine code (see kaffe for a Java JIT). The idea is that at run-time you have a better idea of the types of the variables and the actual functions called, even though at compile time it wasn't necessarily clear. This is an optimization that could be applied to dynamic languages: Smalltalk, Java, maybe Lisps [they also have type declarations], and hypothetically to any interpreter (Perl, Python, Tcl, ...).
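One classic way to exploit "a better idea of the types at run-time" is an inline cache at each call site. The sketch below is an interpreter-level stand-in: a real JIT would emit machine code guarded by the type check, whereas this just caches the looked-up method. `InlineCache` and `Point` are hypothetical names:

```python
from typing import Any, Callable, Optional

class InlineCache:
    """Monomorphic inline cache for one call site: remember the method found for
    the last receiver type so repeat calls skip the generic dynamic lookup."""

    def __init__(self, selector: str) -> None:
        self.selector = selector
        self.cached_type: Optional[type] = None
        self.cached_method: Optional[Callable[..., Any]] = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver: Any, *args: Any) -> Any:
        if type(receiver) is self.cached_type:
            self.hits += 1          # fast path: type guard passed
        else:
            # slow path: full lookup, then specialize this site to the new type
            self.misses += 1
            self.cached_type = type(receiver)
            self.cached_method = getattr(self.cached_type, self.selector)
        return self.cached_method(receiver, *args)

class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def norm1(self) -> int:
        return abs(self.x) + abs(self.y)

site = InlineCache("norm1")
total = sum(site.call(Point(i, -i)) for i in range(1000))
print(total, site.hits, site.misses)  # 999000 999 1
```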
SGI have already announced that they are
going to build another generation of MIPS
CPUs to cover the anticipated delay in
the release of Merced.
That's a pretty expensive thing to do
unless there are some very serious
problems with the Intel part.
No - SGI had announced future product containing
Merced processors.
I agree about Carmack though. Programming games
and writing a great compiler require two very
different sets of skills. In writing a game,
you also get to change the rules to suit the
code. When writing a compiler, you have a very
fixed language spec and a very fixed CPU spec
and you have to bridge the two.
It's happening now just like it has since the beginning of time. Every time someone creates something completely new, and it's a risk, people come out of the woodwork to dismiss it. Before everyone sits around saying the Merced chip is done for, wait till it ships before passing judgement. If it ships on schedule and is buggy, is that better than it shipping a year late and flawless? As far as I know, no other chip company is taking this much of a risk on a new chip. I say more power to Intel.
To take advantage of EPIC, a compiler needs to look for machine instructions that have no dependency on each other. Those instructions can be executed simultaneously.
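A toy sketch of that search: walk the instructions in program order and close the current parallel group whenever the next instruction depends on something already in it. The mini instruction format and the names `conflicts`/`form_groups` are hypothetical, and treating write-after-read as a conflict is a conservative assumption (some VLIW/EPIC designs allow it within a group):

```python
from typing import List, Tuple

# (assembly text, destination register, source registers); a made-up mini ISA
Op = Tuple[str, str, Tuple[str, ...]]

def conflicts(op: Op, group: List[Op]) -> bool:
    """True if `op` may not issue in the same cycle as the ops already in `group`."""
    _, dst, srcs = op
    for _, g_dst, g_srcs in group:
        if g_dst in srcs:   # read-after-write: op needs a value produced in this group
            return True
        if g_dst == dst:    # write-after-write: two writes to the same register
            return True
        if dst in g_srcs:   # write-after-read: conservative; some designs allow it
            return True
    return False

def form_groups(ops: List[Op], width: int) -> List[List[Op]]:
    """Close the current group (a 'stop') whenever the next op conflicts
    with it or the group is already `width` ops wide."""
    groups: List[List[Op]] = []
    for op in ops:
        if not groups or len(groups[-1]) == width or conflicts(op, groups[-1]):
            groups.append([])
        groups[-1].append(op)
    return groups

prog: List[Op] = [
    ("add r1 = r2, r3", "r1", ("r2", "r3")),
    ("add r4 = r5, r6", "r4", ("r5", "r6")),
    ("sub r7 = r1, r4", "r7", ("r1", "r4")),  # needs both results above
    ("ld  r8 = [r9]", "r8", ("r9",)),
]
for cycle, group in enumerate(form_groups(prog, width=3)):
    print(cycle, [text for text, _, _ in group])
```

This greedy pass never reorders, so the independent load lands in the second group; a real scheduler would hoist it into the first one.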
To benefit from multiprocessing, a program must use threads (break itself up into what are called lightweight processes). Threads are an operating system service, not really a processor-level thing. A compiler cannot make a thread, even on Merced. The programmer creates the threads by making operating system calls within the process.
Thus, even on Merced, an ordinary Joe Blow can't make individual programs faster just by popping in new processors unless the software is written using threads. And even then, the benefit will only affect software on operating systems that offer both user and kernel level threads, like BeOS and Solaris UNIX. NT doesn't really cut the mustard. And Linux doesn't either.
As far as writing the compiler goes, he's partially correct there, but only partially. All programs have a few instructions that can safely be executed simultaneously, but how much faster would that make the program? A compiler must be well written, and this will make quality comparisons between different vendors' compilers much more useful. How crappy will MS Visual C++ be then? Will NT even run on Merced?
You got the Alpha and PA-RISC mixed up: the Alpha always assumes that a branch to a previous address will be taken (which makes loops fast and gives compilers a handle on how to optimize code with well-understood flow). The main advantage of using such a simple (and more or less effective) technique on the Alpha was that it consumed very few transistors, and the Alpha was facing very severe space constraints. The 21264 is about twice as powerful as the 21164 at the same clock speed, and most of the benefit came from improvements in branch prediction (made possible by better fab technology relieving some of the space constraints; the Alpha is a *big* processor).
Branch prediction is very important for keeping deep pipelines from stalling. If your pipe is 33 instructions deep and your branch prediction is only 90% effective, then your branches cost you an average of about three extra cycles each (0.10 x 33 = 3.3).
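The same arithmetic as a tiny function, assuming (as the comment does) that every misprediction throws away a full pipe's worth of work. The one-branch-in-five figure at the end is a hypothetical assumption, not a measurement:

```python
def avg_branch_penalty(pipeline_depth: int, accuracy: float) -> float:
    """Expected extra cycles per branch if each misprediction flushes the whole pipe."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * pipeline_depth

for acc in (0.90, 0.95, 0.99):
    print(f"{acc:.0%} accurate: {avg_branch_penalty(33, acc):.2f} extra cycles per branch")

# hypothetical: one instruction in five is a branch
branch_fraction = 0.2
print(f"added CPI at 90%: {branch_fraction * avg_branch_penalty(33, 0.90):.2f}")
```

Going from 90% to 99% accuracy cuts the average cost from about 3.3 cycles to about 0.33, which is why deep pipelines lean so hard on good predictors.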
Speculative execution is another powerful tool for keeping your (now tree-shaped) pipeline full, but it's not intended to be a complete replacement for branch prediction. On systems where the pipe isn't trivially shallow, speculative execution is used together with branch prediction (i.e., executing speculatively on the earliest and/or poorly predictable branches and using branch prediction for the rest). Speculative execution is expensive in terms of duplicating issue logic and ALUs, but that's not much of an issue for today's microprocessors: most of the space on-die is taken up by memory cache, which tends to become much less effective per transistor once beyond a certain (already long surpassed) size. As long as the extra logic for speculative execution yields a better gain per transistor spent than L1 cache, it's a win.
Yet another technology for ducking the high cost of conditional branches is predication. Predication is orthogonal to prediction and speculative execution. Its biggest strength is that it doesn't require much extra logic, doesn't require splaying your pipe into a tree (a la speculative execution), and greatly reduces the cost of small blocks of conditionally executed code. It doesn't reduce that cost as much as good branch prediction would, though, so its use is more or less limited to small blocks of hard-to-predict conditionally executed code, and having it in your processor by no means lets you get away without good branch prediction logic.
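A toy model of predication: an `if/else` is if-converted into straight-line code where every op is guarded by a predicate register, and only ops whose predicate is true get to write their result. The register names, the always-true `p0`, and `run_predicated` are assumptions for illustration, not any real ISA's semantics:

```python
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
# (guarding predicate register, destination register, computation over the register file)
PredOp = Tuple[str, str, Callable[[Regs], int]]

def run_predicated(ops: List[PredOp], regs: Regs) -> Regs:
    """Execute branch-free, if-converted code: every op flows down the pipe,
    but only ops whose guarding predicate is 1 may write their result."""
    regs = dict(regs)
    for pred, dst, compute in ops:
        value = compute(regs)   # always computed: there is no branch to skip it
        if regs[pred]:          # ...but only committed under a true predicate
            regs[dst] = value
    return regs

# m = a - b if a > b else b - a, if-converted; p0 is an always-true predicate
abs_diff: List[PredOp] = [
    ("p0", "p1", lambda r: int(r["a"] > r["b"])),   # compare writes p1 ...
    ("p0", "p2", lambda r: int(r["a"] <= r["b"])),  # ... and its complement p2
    ("p1", "m", lambda r: r["a"] - r["b"]),         # "then" arm
    ("p2", "m", lambda r: r["b"] - r["a"]),         # "else" arm
]
print(run_predicated(abs_diff, {"p0": 1, "a": 7, "b": 3})["m"])  # 4
print(run_predicated(abs_diff, {"p0": 1, "a": 2, "b": 9})["m"])  # 7
```

Both arms always occupy issue slots, which is why predication pays off only for small, hard-to-predict blocks.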
-- Guges --
EPIC means the following:
1. The compiler has to explicitly package instructions that have no dependences into parallel issue packets. This task is currently done by hardware in superscalars.
This means that you can't just slap another CPU onto the board to make things faster: the parallelism is at the instruction level and is determined at compile time; most bindings are done statically. For this to work right, almost all instruction, pipeline and functional unit characteristics have to be exposed. For example, for software pipelining (a technique to overlap instructions from different iterations of a loop) to work, pipeline latencies and functional unit resource constraints at each time step have to be carefully considered.
2. EPIC is basically VLIW, but don't say that, because all VLIWs (except DSPs like TI's C6) have been commercial failures. Besides, it's not as salesirific as VeeEllEyeDoubleU.
3. The compiler is crucial. So far, only the Univ. of Illinois IMPACT group and the HP CAR group really, really know how to build one well. (My opinion, of course.)
4. For more technical info, read comp.arch,
look at proceedings such as MICRO. See also www.trimaran.org. Just don't listen to the clueless.
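For point 1, the standard modulo-scheduling lower bound on a software-pipelined loop's initiation interval (II, cycles between starting successive iterations) shows exactly why latencies and unit counts must be exposed: II is at least the larger of a resource bound and a recurrence bound. A minimal sketch; the machine description, the 4-cycle add, and the function names are hypothetical, and loop overhead is ignored:

```python
import math
from typing import Dict, List, Tuple

def res_mii(uses: Dict[str, int], units: Dict[str, int]) -> int:
    """Resource bound: an op kind used u times per iteration on a machine
    with k such units needs at least ceil(u / k) cycles per iteration."""
    return max((math.ceil(uses[kind] / units[kind]) for kind in uses), default=1)

def rec_mii(recurrences: List[Tuple[int, int]]) -> int:
    """Recurrence bound: a dependence cycle with total latency L that spans
    d iterations forces II >= ceil(L / d)."""
    return max((math.ceil(lat / dist) for lat, dist in recurrences), default=1)

def min_ii(uses: Dict[str, int], units: Dict[str, int],
           recurrences: List[Tuple[int, int]]) -> int:
    """Lower bound on the initiation interval."""
    return max(res_mii(uses, units), rec_mii(recurrences), 1)

# s += a[i] * b[i]: per iteration 2 loads, 1 multiply, 1 add (hypothetical machine)
uses = {"load": 2, "mul": 1, "alu": 1}
units = {"load": 2, "mul": 1, "alu": 2}
print(min_ii(uses, units, [(4, 1)]))  # one accumulator, 4-cycle add: II = 4
print(min_ii(uses, units, [(4, 4)]))  # four rotating partial sums: II = 1
```

Splitting the sum into partial sums breaks the recurrence, but it reassociates the additions, so floating-point results can differ slightly.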
Allen (leunga@cs.nyu.edu)
I didn't know that Carmack was the demigod of compilation research, and I thought that some people had OSes running on Merced simulators.
The sales model: of course Intel made a gamble; but the gamble is that you can't do much more architectural optimization on current RISC (maybe that's why people are throwing die area at multiple MMX, 3DNow!, whatever "multimedia/SIMD" units), so a new paradigm could outperform the older designs. Basically, Intel is betting that superior performance is a valid reason for being (partly) incompatible with x86 (or to keep up with competitors).
A huge part of the Merced issue is technical, and the article is completely off the mark in this respect. Please, someone fix this and post relevant URLs...
Even from the small amount of information published about IA64, it is clear that there is absolutely no support for automatic scaling simply by adding cpus. EPIC refers to the way each individual cpu decodes the instruction stream. EPIC is no more inherently multi-processor than the current IA32 instruction set.
To get automatic scaling, you need something like Tera's Multi-Threaded Architecture. Too bad they can't seem to ship the damn thing, and that it costs a couple of million.
See: http://www.tera.com/ for more info.
If the trouble is in that the money model does not work with easier to upgrade hardware, then maybe the model needs to change. Currently Dell and Compaq make money selling whole computers. Perhaps they should sell or lease parts in addition to cases. That way you could change the CPU every few months and keep current, for a fee of course.
Time and progress won't hold still, so perhaps you shouldn't.
-Ben
It's not just _a_ compiler that has to be worked over, it's all of them.
In addition, it's widely accepted that there will be faster RISC CPUs available than Merced when Merced ships, and even faster x86 chips (at running x86 binaries). Before using this to claim that Merced is dead, remember that this was true of the initial RISC chips when they came out. What this means is that it'll simply take a while for EPIC to mature and for its advantages to come to the fore.
The problem is that Intel doesn't have much of a choice. Well, I suppose they could have gone for a standard RISC chip. But something post-x86 is necessary. If nothing else, the 32-bit limitations of the architecture are hurting Intel's sales in the lucrative server market (where 10's of gigabytes of RAM are common, and 100's not unknown). The desktop user probably won't care for another 4-5 years, but the server market started caring 3 years ago.
One interesting note in all of this is how it is affecting the Intel/Microsoft relationship. By its actions, Intel shows it has no confidence in Microsoft being able to ship an enterprise-ready, 64-bit clean OS any time soon. Not that I blame them.
Intel is learning one nice feature of open-source operating systems: they don't have to depend on someone else to support their chips. For a small engineering investment, they can do it themselves, and if you want something done right, doing it yourself is a real good idea. Making a small investment in a small company (like, say, Red Hat) makes a lot of sense in this context.
That being said, I think EPIC is an interesting design with a lot of long-term potential. Standard RISC processors have a hard time averaging more than about 2 parallel instructions. Research done by HP indicates that a lot more than that is possible; it's just computationally infeasible for the _processor_ to find it.
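An idealized sketch of why the _processor_ can't find it: hardware can only look at a small window of upcoming instructions, while a compiler can see the whole dependence graph. The model (unit latency, unlimited functional units, an in-order window of hypothetical size) and the function names are assumptions for illustration, not HP's methodology:

```python
from typing import Dict, List

def _check(deps: List[List[int]]) -> None:
    for i, srcs in enumerate(deps):
        if any(not 0 <= s < i for s in srcs):
            raise ValueError(f"instruction {i} must depend only on earlier instructions")

def dataflow_cycles(deps: List[List[int]]) -> int:
    """Critical-path length with unit latency and unlimited units: the best
    any machine (or compiler) could do with this dependence graph."""
    _check(deps)
    depth: List[int] = []
    for srcs in deps:
        depth.append(1 + max((depth[s] for s in srcs), default=0))
    return max(depth, default=0)

def window_cycles(deps: List[List[int]], window: int) -> int:
    """Cycles needed when hardware may only pick from the oldest `window`
    not-yet-issued instructions (unit latency, unlimited units)."""
    _check(deps)
    if window < 1:
        raise ValueError("window must be at least 1")
    issued_at: Dict[int, int] = {}
    pending = list(range(len(deps)))
    cycle = 0
    while pending:
        visible = pending[:window]
        ready = [i for i in visible
                 if all(s in issued_at and issued_at[s] < cycle for s in deps[i])]
        for i in ready:
            issued_at[i] = cycle
        pending = [i for i in pending if i not in issued_at]
        cycle += 1
    return cycle

# four independent 4-instruction chains, emitted one whole chain after another
deps: List[List[int]] = [[4 * c + k - 1] if k > 0 else [] for c in range(4) for k in range(4)]
print("ideal:", dataflow_cycles(deps), "cycles")
for w in (2, 8, 16):
    print(f"window {w}:", window_cycles(deps, w), "cycles")
```

The 16 instructions could finish in 4 cycles (an average of 4 per cycle), but a 2-entry window needs 13, barely better than one at a time.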
I wonder how much time the author of this article took to research the matter.
I see no mention of any of the Unix vendors. Both HP and SGI are going to use the IA64. HP will be phasing out its PA-RISC CPU in favour of the IA64. (Don't know about SGI's use of MIPS CPUs.) Both vendors have extensive experience with multi-scalar RISC CPUs. Also, Intel has its own RISC CPUs, and there are several 3rd-party compiler developers probably just waiting for a break.
Also, he starts comparing Joe Average's IA64 system with real server machines (HP PA-RISC, Alpha AXP, MIPS R10K, Sun UltraSparc). An IA64-based system is going to cost more than Joe makes in a year! He can't even buy a machine based on one of the currently popular server architectures (except maybe Intel IA32/Xeon).
Also, the comment about just adding an extra CPU is also valid for current SMP Windows NT based systems, since almost all software is multi-threaded. (I really like my dual P150 running Linux...)
Mathijs
Most processors are already parallel in the way EPIC means parallel: they have multiple execution units which can concurrently process instructions. Multichip parallelism is a much tougher problem, with lots of different obstacles to beat. It has nothing to do with Merced or the sales model.
An IA-64 instruction comes in a bundle with 2 other instructions, so there are 3 instructions in a bundle. Each instruction is something like 40 bits long, and each bundle has a dependency flag of several bits. The performance problems that hinder chips the most are pipeline stalls and branches. Modern chips have a ton of logic that tries to predict branches and choose the right one, and a ton of logic to execute instructions out of order to reduce stalls. IA-64 pushes the job of stall detection onto the compiler, which makes the instruction bundles and chooses the dependency flag (the flag says which instructions in the bundle can conflict). That way the chip doesn't need as much logic for out-of-order execution, and the designers can focus on more important things. This is also a piece of cake for modern compilers: IBM, Sun, MIPS and DEC all have the technology to do this, and most have had it for years and years.
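For reference, the IA-64 encoding as Intel eventually documented it uses 128-bit bundles: a 5-bit template in the low bits (it encodes which unit type each slot targets and where the "stops" between dependent groups fall, roughly the "dependency flag" above) followed by three 41-bit instruction slots. A minimal bit-packing sketch of that layout; the slot and template values below are arbitrary numbers, not real encodings:

```python
from typing import Tuple

TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1

def pack_bundle(template: int, slots: Tuple[int, int, int]) -> int:
    """Build a 128-bit bundle: template in bits 0-4, slot i starting at bit 5 + 41*i."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("need exactly three 41-bit slots")
    bundle = template
    for i, slot in enumerate(slots):
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle

def unpack_bundle(bundle: int) -> Tuple[int, Tuple[int, int, int]]:
    """Split a 128-bit bundle back into its template and three slots."""
    template = bundle & TEMPLATE_MASK
    s0, s1, s2 = ((bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3))
    return template, (s0, s1, s2)

assert TEMPLATE_BITS + 3 * SLOT_BITS == 128
b = pack_bundle(0x10, (1, 2, 3))  # arbitrary values, not real opcodes
print(hex(b), unpack_bundle(b))
```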
To solve the branch problem, IA-64 doesn't use branch prediction. This is the really important part of EPIC. It executes both branches, and once the correct path is known it discards the instructions it executed on the wrong branch. This is tough to do. The compiler is also supposed to help with this by adding some bits to the flag, and this is a tough thing to do as well.
If it all works, IA64 chips will be fast, but nothing stellar, because RISC chip makers have done such a great job of dealing with these problems already. So Intel has chosen to make a very complicated design, with some hard but not impossible compiler changes, and they aren't going to deliver the ultimate performance they have been promising for years. There are definitely hard technical problems to solve, but they aren't that bad. I think the bigger problem is actually building a chip that can compete with modern PowerPC and Alpha RISC processors and look like it is innovative. Intel is breaking compatibility, and once that is done it's anybody's market, because they have nothing that makes them look better than the other guys (like 25 years of x86 software...).
The funny thing about all this epic talk is that intel still has to have logic on the processor to tell if the compiler lied... They were trying to get rid of that logic to make a leaner and meaner processor but they still have to have it.
I don't really know for sure, but it seems to me that one of Intel's major problems is that they want to charge an extreme premium for performance, and don't want to wake up in a world where you can scale processor power by adding CPU's.
I find it odd that just when the PPC people will be removing the "SMP Premium" charge from their chips, and making very SMP-capable G4 chips, Be will be abandoning the PPC arena for Intel chips, where the only processors capable of scaling beyond two-way SMP are "non-consumer-grade" very expensive, very high-margin server chips.
Once someone comes out with a decent low cost multi-smp-scalable (beyond 2!) chip and motherboard system, the world will beat a path to their door. I think if that ever happens, Be will have to decide whether or not to stick with Intel and watch some more processor-agnostic SMP-capable OS (like Linux) seize the ground of becoming a "media OS."
Phil Fraering "Humans. Go Fig." - Rita
(currently testing something about signatures here)
You don't understand what I mean.
For the intended market for Merced, i.e.
servers, being able to handle multi-threaded
applications would help a lot.
This also holds true for consumer OS's, otherwise
M$ wouldn't be quite so concerned about Be.
Phil Fraering "Humans. Go Fig." - Rita
(currently testing something about signatures here)
Incidentally, what I've heard at the Register corresponds pretty well with what I've heard through other channels.
Basically, the 'other' problems with Merced are the design itself: Intel's engineers aren't used to doing this sort of thing, and apparently they're also a bit short on quality engineers and are using lots of people who've just left university. Not the sort of people to give a massively complicated chip design to.
Incidentally, the '2nd gen' EPIC chip, McKinley, is mostly being done by HP and is apparently going pretty well. I've been hearing from my own contacts for a long, long time that Merced might just end up being a 'test' processor that never goes into production, and that McKinley will be the first production EPIC.
Not surprisingly, both HP and SGI have recently been saying they're still committed to their own architectures (at least for a while), after previously planning to dump them. I think HP have been saying they'll go with their own stuff for another 5 years.
Intel isn't the only one being a bit late. Sun are about a year behind with their UltraSparc-III, though I haven't heard anything about why they're behind. (Their reasons for being late are probably quite different from Intel's.) Shame, as it seems like a pretty nice chip...
"Too many web sites (especially gamer sites, for some reason), don't seem to understand that Merced isn't for the average user. When it comes out, and at the very least for a few years following, it will be an ENTERPRISE level chip. This means 1) expensive as hell 2) used in"
/price reduction model. Otherwise they couldn't earn a return on the chip or keep AMD etc. at bay. Set up a spreadsheet and play around with some pricing models (first 10k chips at $2000, next 100k at $750, and so on). The arithmetic is quite simple and inexorable.
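The spreadsheet exercise, sketched in a few lines. Only the two tiers actually named above are modeled (the "and so on" tiers aren't specified), the $10k-per-chip case and the $1 billion figure come from the low end of the estimates quoted elsewhere in this thread, and `chips_needed` counts gross revenue only, ignoring per-chip manufacturing cost, so the real break-even is higher:

```python
import math
from typing import List, Tuple

def tiered_revenue(tiers: List[Tuple[int, float]]) -> float:
    """Gross revenue from (chips sold, price per chip) tiers."""
    return sum(chips * price for chips, price in tiers)

def chips_needed(cost: float, already_recovered: float, price: float) -> int:
    """Extra chips to sell at `price` to cover `cost` (gross revenue only;
    ignores per-chip manufacturing cost, so this is optimistic)."""
    return math.ceil(max(cost - already_recovered, 0.0) / price)

post_tiers = [(10_000, 2_000.0), (100_000, 750.0)]       # the two tiers named above
print(f"${tiered_revenue(post_tiers):,.0f}")             # first 110k chips
print(f"${tiered_revenue([(10_000, 10_000.0)]):,.0f}")   # the $10k-per-chip case
first_tier = tiered_revenue(post_tiers[:1])
print(chips_needed(1e9, first_tier, 750.0), "chips at $750 to reach $1 billion")
```

Even on the most generous assumptions it takes well over a million chips at workstation prices, which is the point about where the bulk of the sales must come from.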
Intel's pricing model doesn't work that way. True, with every new chip Intel announces "this is for servers only". And the first few thousand chips do go into servers. But the server market isn't anywhere near large enough to pay back the cost of developing that chip, so within a few months workstations are released, first by one of the larger clone makers, then by Gateway 2000, and finally by Compaq.
And Intel absolutely depends on these workstation sales to drive their learning curve.
So look for the first Merced (McKinley?) workstation about three months after the first server is released.
sPh
"Not in this case. These pricing models will be on a much larger scale. Try $10,000 for the first 10k chips, etc. This one won't be quick to reach the home user (Intel will still be releasing some"
I hear what you are saying about home power requirements, and I tend to agree (although SimCity 3000 might just need that Merced ;-)). But consider that 10k chips * $10k/chip = $100 m. My understanding is that a chip like Merced costs Intel $1-3 billion (US terminology) all found. So $100 million won't go very far to pay off that loan - and the bulk of the sales still have to come from the workstation side.
Just my 0.02.
sPh
MIPS and PA architecture chips are the architecturally closest *working* chips to Merced that we have today.
I remember all the trouble MIPS had when they rolled out the R10000 chip. Initial performance was not up to spec because early estimates of performance were pretty much correct on the SPECfp numbers, but underestimated *how long* it took to get those numbers. It took a couple of years for the compiler people to wring out the best performance (ok clock speed was off too, but that was not the sole reason, nor was internal CPU wars within MIPS/SGI).
Now the Merced is more complex than the R10000 (and at least the R10K has some *vague* similarity to the R8K, and PA architecture has been around for many years, so these companies had compiler writers experienced in some of the problems they were up against). Intel is starting from scratch here. I'd say when they've done first tapeout and have silicon in their hot little hands, it'll be at least a year before the compilers get close to the performance they hope for.
Meanwhile, IA32 will be up to similar spec and Alpha, PA and MIPS (and SPARC perhaps) will be serious contenders.
Just last week or so, MIPS and HP announced they were reviving their CPU development for a further year or so (i.e. another generation), rather than trusting all to Merced (I assume that means the last MIPS or PA in 2003-2005 now). What news did Intel give these guys for them to decide to make such an announcement??
cheers
Michael Snoswell
pithy comment
But I don't know where he got this idea that Merced automatically makes all applications multi-processor ready; that's just plain wrong. High-end processors have had multiple execution units for many years, which allows them a small amount of very fine-grained parallelism: on average perhaps two instructions can be executed at once. Sometimes, when you're lucky, it can be more than two for a short burst. Merced will *not* be able to keep all 7 of its execution units busy 100% of the time, but it may get lucky and do so for an instant every once in a while, if the compilers are really good.
None of that has anything whatsoever to do with multiple cpu's. The situation with those will be unchanged from the situation today with multiple e.g. Pentiums: applications won't take advantage of more than one cpu unless they are explicitly coded to do so.
Therefore the conclusion of the article is dead wrong: the business model won't change, because he just misunderstood the issue with parallelism.
Professional Wild-Eyed Visionary
My summary of EPIC vs. VLIW vs. superscalar (note I use the term "functional unit" to mean "thing that can execute some kind of instruction"; more functional units means a faster CPU. It's an oversimplification, but useful in this context):
Now my reply to allen's post:
Yes; however, these packages need only be free of data dependences, not execution-unit dependencies (this is the big difference between EPIC and traditional VLIW).
To get maximum performance, this is correct. With a "normal" VLIW you need it to get a working program at all. This difference is important. If you own a Multiflow (one of the defunct commercial VLIWs) and you upgrade its CPU, all of your old programs are incorrect (different load latencies), and even if you managed to compile your code to work with both load latencies, you still can't use more adders per cycle, because the exact instructions executed in each cycle are fixed in the code.
If you upgrade from a Merced with three integer execution units and two load units to one with six integer units and one load unit, your old programs continue to work. They may run faster, or slower, but they still work.
I don't think you need to know all the details of the Merced microarchitecture to get decent performance. Just move the loads as far from the uses as you can, and get as many instructions marked independent of their neighbors as possible. You may end up moving loads farther away than needed, or marking more things as "can run in the same cycle" than your Merced can gobble up, but that's OK. It won't kill you. It might even make a future Merced faster.
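That recipe is essentially what a list scheduler does. Below is a minimal sketch of a greedy one, assuming a made-up instruction format of (name, kind, deps), a fixed bundle width, and fixed per-kind latencies. The function and instruction names are hypothetical, and real IA-64 encoding details (bundle templates, stop bits) are not modeled.

```python
def list_schedule(instrs, width, latency):
    """Greedy cycle-by-cycle list scheduler (a toy, not a real compiler pass).

    instrs:  list of (name, kind, deps) in program order; deps are names of
             earlier instructions whose results this one reads.
    width:   maximum number of instructions issued in one cycle (bundle width).
    latency: dict mapping kind -> cycles until the result can be used.
    Returns one bundle (list of names) per cycle; an empty bundle is a stall.
    """
    seen = set()
    for name, _, deps in instrs:
        if any(d not in seen for d in deps):
            raise ValueError(f"{name} reads a value not produced earlier")
        seen.add(name)

    ready_at = {}           # name -> first cycle its result may be consumed
    remaining = list(instrs)
    bundles = []
    cycle = 0
    while remaining:
        bundle = []
        # Walk candidates in program order; anything whose inputs are ready
        # may issue now, so independent work (e.g. a later load) moves up.
        for instr in list(remaining):
            if len(bundle) == width:
                break
            name, kind, deps = instr
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                bundle.append(name)
                ready_at[name] = cycle + latency[kind]
                remaining.remove(instr)
        bundles.append(bundle)
        cycle += 1
    return bundles


if __name__ == "__main__":
    program = [
        ("a", "load", ()),
        ("b", "add", ("a",)),
        ("c", "load", ()),
        ("d", "add", ("c",)),
        ("e", "add", ("b", "d")),
    ]
    lat = {"load": 2, "add": 1}
    print(list_schedule(program, 2, lat))  # [['a', 'c'], [], ['b', 'd'], ['e']]
```

With a two-wide bundle and 2-cycle loads, the second load c is pulled up next to a, so both load latencies overlap. Rerunning the same program with width 1 still hoists c ahead of b, giving [['a'], ['c'], ['b'], ['d'], ['e']]: the same scheduling work pays off on a narrower machine, just less.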
The EPIC is basically a VLIW, except it is a little slower (for a given transistor budget), and it has an upgrade path. I think the upgrade path makes it commercially different from VLIW.
Multiflow made a good one (well, it got good results most of the time, it was pig slow). DEC eventually bought it when Multiflow went under.
Also the compiler isn't as hard as it is for VLIW. With VLIW, if you get the latency wrong you don't run; EPIC just stalls, kind of like a superscalar. Getting max speed requires tons of work, but the same work would speed up a superscalar (by exactly the same amount, if the superscalar has the same number of functional units). I think the big difference will merely be that EPIC CPUs will tend to have many more functional units, so the bad-compiler vs. good-compiler gap will be more like a factor of 8 than a factor of 4 (or a factor of 2 on a PII/PPro/PIII).
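A toy model of the "VLIW doesn't run, EPIC just stalls" point: the sketch below runs a statically scheduled program on a machine whose load latency may differ from what the compiler assumed. The Op class and simulate function are hypothetical; "interlocked" here just means the hardware holds an instruction until its inputs arrive, which is the behavior the post attributes to EPIC, not a model of any real Merced pipeline.

```python
from dataclasses import dataclass


@dataclass
class Op:
    cycle: int            # issue cycle chosen by the compiler
    kind: str             # "load" or "add"
    dst: str              # destination register
    srcs: tuple = ()      # source registers (used by "add")
    value: int = 0        # value a "load" returns (stands in for memory)


def simulate(program, load_latency, interlocked):
    """Run a statically scheduled program; return (registers, finishing cycle).

    Loads deliver their value load_latency cycles after issue; adds take 1 cycle.
    interlocked=False models a classic VLIW: every op issues exactly at its
    scheduled cycle and reads whatever the register holds at that moment.
    interlocked=True models a stall: an op whose inputs are not ready yet
    waits, and everything after it slips by the same amount.
    """
    regs = {}             # committed register values (missing means 0)
    in_flight = []        # (ready_cycle, register, value)
    slip = 0              # total stall cycles inserted so far
    t = 0
    for op in program:    # program must be sorted by op.cycle
        t = op.cycle + slip
        if interlocked:
            waits = [r for r, reg, _ in in_flight if reg in op.srcs and r > t]
            if waits:
                slip += max(waits) - t
                t = max(waits)
        # Commit every result that has arrived by cycle t, oldest first.
        for item in sorted(x for x in in_flight if x[0] <= t):
            regs[item[1]] = item[2]
            in_flight.remove(item)
        if op.kind == "load":
            in_flight.append((t + load_latency, op.dst, op.value))
        else:
            total = sum(regs.get(s, 0) for s in op.srcs)
            in_flight.append((t + 1, op.dst, total))
    done = t
    for ready, reg, value in sorted(in_flight):
        regs[reg] = value
        done = max(done, ready)
    return regs, done


if __name__ == "__main__":
    # Compiled assuming a 2-cycle load: the add sits 2 cycles after the loads.
    prog = [Op(0, "load", "r1", value=40),
            Op(0, "load", "r2", value=2),
            Op(2, "add", "r3", ("r1", "r2"))]
    for lat in (2, 3):
        for inter in (False, True):
            regs, done = simulate(prog, lat, inter)
            print(f"latency={lat} interlocked={inter}: r3={regs['r3']} done at cycle {done}")
```

When the "upgraded" machine has a 3-cycle load, the non-interlocked run silently computes r3 = 0 because the add reads registers that haven't been written yet, while the interlocked run still gets 42, one cycle later.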
Indeed. And that's not a slam; your opinions were well thought out. I just happen to think that requiring explicit dataflow (EPIC) is very different from requiring explicit dataflow AND instruction scheduling (VLIW).
I assume you mean that it performs speculative execution (which is what you described) in addition to having predicated instructions, e.g. speculatively executing predicated instructions before it knows what's in the instruction's predicate register, and throwing away instructions' results as soon as it finds out that the predicate register was false.
(I.e., predicated instructions aren't the same thing as speculative execution; don't automatically conclude that Merced does speculative execution merely because IA-64, of which Merced is planned to be the first implementation, has predicated instructions.)
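For anyone unclear on the distinction, here is a minimal sketch of if-conversion, the compiler transformation that predicated instructions make possible. It is plain Python with a bool standing in for a predicate register, not IA-64 code, and the function names are made up. Both arms are computed and the predicate selects which result is kept, so there is no branch to resolve or predict; nothing in it guesses an outcome ahead of time.

```python
def abs_diff_branchy(a, b):
    if a > b:           # a conditional branch the hardware must resolve or predict
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    p = a > b           # a compare that sets a predicate (the bool stands in for a predicate register)
    then_val = a - b    # "then" arm, conceptually guarded by p
    else_val = b - a    # "else" arm, conceptually guarded by not p
    # Both arms were computed; the predicate decides which result is kept.
    # bool * int yields 0 or the value, so this selects without a branch.
    return p * then_val + (not p) * else_val
```

The cost of this trade is that both arms occupy functional units every time, which is why it pays off mainly for short, hard-to-predict branches on a wide machine.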
...although, of course, one can support more than 4GB of RAM with a 32-bit processor, in the sense of a processor that can't handle more than 32-bit linear virtual addresses, as long as the processor's physical addresses can be more than 32 bits (as is the case with most, if not all, P6-core processors - Pentium Pro, PII, PIII) and as long as the chipset can handle it.
It may be less convenient, as one might have to have a process manually map stuff into and out of its address space if you want a single process to use more than 4GB of RAM (as opposed to, say, having file systems use it as a buffer cache, although that may also involve switching mappings), but it's certainly still possible.
(I say "linear virtual addresses" because, whilst the x86 segmented virtual addresses go up to 48 bits, they first get mapped by the segmentation hardware to a 32-bit linear address before being used as physical addresses, if you haven't enabled paging, or before being run through the page table, if you have enabled paging; not only are 48-bit addresses not necessary for accessing more than 4GB of physical memory, they don't even help you to access it.)
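The arithmetic behind this, as a small sketch: 32 address bits name 4 GiB, while the 36-bit physical addresses that PAE gives P6-core parts name 64 GiB. The window-mapping helper is hypothetical; it only illustrates a process sliding a fixed-size view over physical memory it cannot address all at once, not any particular OS interface.

```python
GIB = 2 ** 30


def addressable_gib(address_bits):
    """How many GiB an address of the given width can name."""
    return 2 ** address_bits // GIB


def map_window(physical_addr, window_size):
    """Split a physical address into (window base, offset inside the window).

    Models a process that sees only window_size bytes of its 32-bit linear
    address space at a time and asks the OS to slide that window over a
    larger physical memory. window_size must be a power of two.
    """
    if window_size <= 0 or window_size & (window_size - 1):
        raise ValueError("window_size must be a power of two")
    base = physical_addr & ~(window_size - 1)
    return base, physical_addr - base


if __name__ == "__main__":
    print(addressable_gib(32), addressable_gib(36))
    # To touch a byte at physical 10 GiB + 123 through a 1 GiB window,
    # the window must first be remapped to start at physical 10 GiB.
    print(map_window(10 * GIB + 123, GIB))
```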
I was under the impression that "auto-parallelizing" compilers can convert, say, some Fortran or C/C++ code into multi-threaded code.
See, for example, this Sun white paper on their compilers, which, it appears, can auto-parallelize loops to run on multiple processors.
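Roughly what such a compiler does with a suitable loop, sketched by hand: if no iteration reads anything another iteration writes, the iteration space can be cut into chunks and each chunk handed to a thread. This is a minimal sketch of the transformation, not Sun's compiler output, and the function names are made up. Note that in CPython the global interpreter lock keeps these threads from actually running the arithmetic on several CPUs at once; the point here is the shape of the rewrite.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_serial(xs, k):
    out = [0] * len(xs)
    for i in range(len(xs)):        # out[i] depends only on xs[i]: no loop-carried dependence
        out[i] = k * xs[i]
    return out


def scale_parallel(xs, k, workers=4):
    """The loop above, split the way an auto-parallelizer conceptually would:
    contiguous chunks of the iteration space, one task per chunk."""
    n = len(xs)
    out = [0] * n
    chunk = max(1, -(-n // workers))    # ceiling division, at least 1

    def body(lo, hi):                   # the original loop body, over one chunk
        for i in range(lo, hi):
            out[i] = k * xs[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(body, lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
        for f in futures:
            f.result()                  # re-raise any exception from a worker
    return out
```

A loop like xs[i] = xs[i - 1] + 1 carries a dependence from one iteration to the next and could not be split this way, which is exactly the analysis such a compiler has to get right before it parallelizes anything.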
In what fashion? Its threads may not be "both user and kernel level" in the sense that there are user-level threads that can be executed by a pool of kernel-level LWPs, with the possibility that there are more user-level threads than kernel-level LWPs, as is the case in Solaris, but I don't see why that's necessary in order to get a speedup to a threaded program by adding processors - would not the model I think NT uses, wherein every thread known to userland is known to the kernel (I ignore "fibers" here), be sufficient?
DEC comes to mind - handicapping their low-end systems so that they would not outperform the high-end ones.
Bruce
Bruce Perens.
Intel hasn't innovated - give me a break. Just the
silicon process technology they've developed would
wipe that argument out - the other detail, managing
to get x86 to go as fast as they have generation after
generation, disproves the statement also.
Also, EPIC as detailed isn't really even an HP
invention, but rather an outgrowth of things done
by companies in the 80's such as Multiflow and
Cydrome.
The author claims that the compiler is a bitch - well, it
is, but they solved most of the problems with the
compiler technology at those earlier companies - and
some of the folks doing the IA64 are graduates of same.
I WOULD worry about the scalability of the architecture
though - that WAS a problem with the Trace and
Cydra architectures. You had to recompile for
succeeding generations of hardware. I personally
don't know if EPIC solves that problem with VLIW
architectures. Anyone know if it does, and how?
Steve
Have you compiled your kernel today??
The interesting point about this is - a large
number of the folks that DO have such
experience work for Intel and HP.
I know - I worked with about half of em
at Cydrome.. ;-)
Excuse my ignorance (I know VLIW but not JIT ;-)
Does JIT reduce to compiling for the hardware the
JIT compiler is running on - or for a virtual machine?
If it's a virtual machine, then the machine simulator
is all that needs to be run thru the EPIC compiler -
'cause that's all that would execute - not some
intermediate target language. If that's the case,
EPIC won't present any real problem.
Out of ignorance...which is it??
Steve
There are fundamentally two ways to make a
SINGLE thread go faster - you can up the
clock rate - or figure out a way to run more than
one instruction at a time.
Multi-issue pipelines, VLIW, and EPIC are attempts
at solving the problem in the second manner. Once
you have an adequate solution in the second space
it becomes possible to improve its performance
via the first method.
Thus, from an architect's point of view - the second
method is the first tried!
Now - which is better - multi-issue pipelines or
EPIC at a given clock rate. That remains to be seen.
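The two knobs Steve describes are the two factors in the usual time-per-program equation: run time = instructions / (instructions per cycle x clock rate). A minimal sketch with made-up numbers:

```python
def run_time_seconds(instructions, ipc, clock_hz):
    """Time for one thread: instructions / (instructions per cycle * cycles per second)."""
    if ipc <= 0 or clock_hz <= 0:
        raise ValueError("ipc and clock_hz must be positive")
    return instructions / (ipc * clock_hz)


if __name__ == "__main__":
    base = run_time_seconds(1e9, 1.0, 400e6)     # one instruction per cycle at 400 MHz
    wider = run_time_seconds(1e9, 2.0, 400e6)    # issue two per cycle, same clock
    faster = run_time_seconds(1e9, 1.0, 800e6)   # same width, double the clock
    print(base, wider, faster)
```

Doubling the sustained IPC and doubling the clock both halve the time here (2.5 s to 1.25 s); the open question in the thread is which of the two is cheaper to get in silicon, and how much of the extra width a compiler can actually fill.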
Steve
This is fallacious.
SMP doesn't of itself improve the SINGLE process
performance. You CAN write special code on a
Beowulf platform (or SMP) and get the answer faster
for the single thread via parallelism...but that is
a problem that isn't well supported by automatic
tools at this time. We DO have the technology to
throw more execute units at a single thread and
get the answer faster though - that is what Merced
is all about.
You can just as easily SMP a Merced class CPU and
run multiple threads thru them as you can with a PPC
or a Xeon. That isn't the problem that EPIC, VLIW, or
multiple-issue pipeline (superscalar) machines are
trying to solve.
Think SINGLE threads when talking about these
architectures.
Steve
I seem to remember reading a few months ago that Intel is considering Merced as a test platform for their future IA64 chips and is not really intending to market it very strongly.
Also, aren't there many people out there who know more about compilers than Carmack?
EPIC is not a multithreaded architecture. EPIC (which, as everyone knows, stands for Explicitly Parallel Instruction Computing) focuses on Instruction Level Parallelism (known in hardware and compiler circles as ILP for short).
Explicit ILP architectures, such as Very Long Instruction Word (VLIW) architectures, Transport Triggered Architectures (TTA) and the like all focus on finding parallelism within a given single-threaded program. The compiler for such an architecture may divide separate paths of execution into a sort-of thread (for instance, it might execute down the "then" and "else" clauses of an "if" before it knows which it needs, or perhaps down multiple "cases" of a "switch"), but this is not multi-threading in the common, macro sense of the term.
Multithreaded architectures, on the other hand, do focus on running multiple independent threads of execution, typically as if they were multiprocessors. For these CPUs, a given application needs to be constructed as a series of explicit threads (at the process level, not the instruction level), or a compiler needs to simulate this division. Alternately, a number of independent processes need to be available (although since all threads share a common pipeline, running independent processes together can have bad cache effects and cumulative stall effects that generally don't make anyone's day).
--Joe--
Program Intellivision!
First, before everyone jumps in and says "Intel will never get there because the compiler will never get there," please don't forget that some shipping devices are already there.
Quite simply, EPIC allows a compiler to tell the hardware ahead of time where it knows parallelism exists, so that the silicon (which is finite) doesn't have to hunt for it. Compared to the rate at which silicon must make scheduling decisions (at 800MHz, that's 1.25 nanoseconds), compiler time seems infinite.
Granted, compiler time is not infinite, but for performance-critical applications, it is quite large. The Texas Instruments TMS320C6000-family DSPs, for instance, rely on compilers and assembly optimizers to eke out that last bit of performance, and as any DSP engineer will likely tell you, it's usually worth it. Cycles saved in one loop are cycles that can be spent elsewhere on value-added features, leading to a more valuable product.
This points to the real fundamental problem as I see it, which is that the current VLIW darling in the industry is in the embedded world. Why should that make a difference, you ask? Because the embedded developer is the one most likely to take advantage of the raw capability that an exposed parallelism architecture can provide.
Merced's biggest problem lying ahead is the fact that workstation-class code does not naturally exhibit large amounts of parallelism. While I was attending MICRO-31, I heard someone remark about how most code looks like a series of 5-10 instruction bursts followed by a jump. ICK!!
Embedded programmers generally seem willing to learn whatever it takes to get their product running in the fewest MIPS (so that they can either use cheaper parts or provide more features), and so are often willing to jump through a few hoops to help out the compiler in order to get the parallelism they desire.
Workstation programmers, on the other hand, are interested in the much bigger picture (since their applications are much larger and tend to have larger life expectancies), and so code tends to be human-friendly and not compiler friendly. (Certain heavily-traveled code paths in the Linux kernel being a noteworthy exception.)
The point is that the Merced compiler will ship with a lot of amazing compiler transformations, but very few of them will be effective at translating the hopping, skipping, and jumping nature of your typical general-purpose, database-ish looking code into highly parallel, performance-oozing EPIC instructions, at least straight out of the gate.
Merced will inherently provide big performance wins to the compute-farm customers (your big engineering shops that currently use networks full of Sun or HP workstations to crunch VHDL, Spice, or whatever simulations around the clock), as these applications end up reducing to huge matrix manipulations and numeric crunching galore -- oozing with parallelism. But Merced will be hard pressed to serve up web pages or database queries much faster than any other architecture, unless it's able to massively crank its clock rate thanks to losing the shackles of the instruction-scheduling hardware.
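A rough way to see the difference between numeric code "oozing with parallelism" and pointer-chasing code is to count the longest chain of dependent operations, since no number of functional units can finish in fewer steps than that. The sketch below assumes every operation takes one cycle and ignores the loads feeding the dot product; all function names are made up for illustration.

```python
def critical_path(graph):
    """Longest chain of dependent operations, assuming each takes one cycle.

    graph maps an operation name to the names it depends on (no cycles).
    However many functional units the CPU has, it cannot finish faster.
    """
    depth = {}

    def visit(node):
        if node not in depth:
            depth[node] = 1 + max((visit(p) for p in graph[node]), default=0)
        return depth[node]

    return max((visit(n) for n in graph), default=0)


def pointer_chase_graph(n):
    # p = p->next, n times: each load needs the address produced by the previous one.
    return {f"load{i}": [f"load{i - 1}"] if i else [] for i in range(n)}


def dot_product_graph(n):
    # n independent multiplies, then a pairwise (tree) sum of the products.
    graph = {f"mul{i}": [] for i in range(n)}
    level = list(graph)
    k = 0
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level) - 1, 2):
            name = f"add{k}"
            k += 1
            graph[name] = [level[j], level[j + 1]]
            nxt.append(name)
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over to the next level
        level = nxt
    return graph
```

For 8 elements the dot product has 15 operations but a critical path of only 4, so a wide machine has plenty to do each cycle; 8 steps of pointer chasing have a critical path of 8, so extra units sit idle. One caveat: the tree sum reorders floating-point additions, which a compiler may only do when the language or flags allow it.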
Anyway, those compiler nuts in the crowd might find the following links useful and informative.
- The Rocket Project -- ILP research at Michigan Tech University
- VLIW Architectures -- a description of VLIW that's part of a larger presentation about VLIW compiler techniques.
- The Trimaran Research Compiler -- HP's research compiler that was supposedly used in development of the architecture that begat Merced.
- EE Times -- article which describes the release of Trimaran and includes a diagram showing the relationship of architectures from Superscalar to VLIW/EPIC to TTA.
--Joe--
Much as I hate to side with "the moneymakers," there is one advantage to the OEM's current business model as described in the article: as long as the OEMs are basing their income on the periodic-upgrade model, they have a direct incentive to provide quality systems and support: the better your experience with owning a Micron, the more likely your next computer will also be a Micron. If this changes, OEMs will be less interested in customer experience of quality, and much more interested in the *perception* of quality, which in turn means (gasp) marketing.
This is all assuming that this article is accurate in its description of the OEM business, which I am not 100% convinced of. Among other things, in a business with that kind of growth rate, wouldn't new users be at least as important, if not more so, than returning users?
"Never let your sense of morals prevent you from doing what is right" -Salvor Hardin
OK, about 6 years ago, I took an EE course that taught us CS students about CPU design.
At the time we were comparing Pentium, PA-RISC, Alpha, and MIPS.
If I remember, Alpha had a huge number of transistors dedicated to branch prediction. PA-RISC always assumed that the program would loop. As a class, we questioned how much branch prediction actually helped. Does anyone have a good feel for, or even some numbers to describe, how much branch prediction improves performance?
This Merced design of executing both branches seems like it would take an enormous amount of work. Is it really worth it? Isn't a simpler design able to operate at a higher clock rate?
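No real benchmark numbers here, but a toy comparison shows why dynamic predictors exist. The sketch below scores three textbook predictors on synthetic branch histories: "always taken" stands in for the static "assume it loops" scheme described above, and the 1-bit and 2-bit counters are generic dynamic predictors, not Alpha's actual design. The outcome patterns are made up.

```python
def always_taken(outcomes):
    """Static prediction: every branch is predicted taken. Returns # correct."""
    return sum(outcomes)


def one_bit(outcomes, initial=True):
    """Predict whatever this branch did last time."""
    pred, correct = initial, 0
    for taken in outcomes:
        correct += pred == taken
        pred = taken
    return correct


def two_bit(outcomes, initial=2):
    """Saturating counter: states 0-1 predict not taken, 2-3 predict taken.
    One surprise moves the counter a single step, so a loop exit doesn't flip it."""
    state, correct = initial, 0
    for taken in outcomes:
        correct += (state >= 2) == taken
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


if __name__ == "__main__":
    loop = ([True] * 9 + [False]) * 10   # a 10-iteration loop, entered 10 times
    check = [False] * 100                # an error check that is never taken
    for name, f in (("always", always_taken), ("1-bit", one_bit), ("2-bit", two_bit)):
        print(name, f(loop), f(check))
```

On these patterns the static scheme does fine on the loop (90 of 100) but gets every never-taken check wrong; the 1-bit predictor mispredicts twice per loop exit (81 of 100); the 2-bit counter matches the static scheme on the loop and still learns the check branch (99 of 100). How much that buys on real code depends on branch frequency and pipeline depth, which this toy says nothing about.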
And, has anyone read about async processors lately? Anything ever released commercially for that?
Thanks,
Joe
[forgive the English, I don't have an English compiler.]
Joe Batt Solid Design
I have no doubt that by the time Merced is ready to hit the market, Microsoft will have bloated up and fluffed out Windows to the point where anything less powerful than a Merced will be useless.
Actually, Cray invented all of this, but hey, why nitpick :-P.
SGI's ccNUMA white paper can be found here
-- ultra1
That would be all well and good, if we were actually discussing a RISC architecture. But we aren't - we're discussing VLIW.
With the i960 and crew, Intel has all the RISC expertise that they ever wanted (or needed). Finding someone who can write compilers and tools for VLIW is a horse of a different color, however. There isn't much experience in the industry when dealing with VLIW; not only that, coding for RISC isn't going to help you with this type of architecture. Hence, the hair-pulling and delays from the compiler/tools group. This isn't a problem you throw money at to make it go away faster - it's a first run, and everyone on the team is learning as they go.
If that doesn't convince you, keep in mind that Intel is partnered with a company that has deep experience with RISC architectures (HP). If HP and Intel together are having a rough time of it, I would submit that this can't be an easy design to work with - especially given that no one has done it before.
-- ultra1
So, how much Intel stock do you own???
Umm, I distinctly remember reading an article (on /.?) about HP and SGI deciding to come out with one or two more chips in their respective series because of Merced delays. I think, by the time Merced does ship, the chips from HP and SGI/MIPS will be significantly faster and more scalable than Merced. The big Unix vendors just can't wait for Merced to ship faster boxes to their clients.
Maybe my view of Merced is colored by the fact that I have used VLIW machines in the past. My experience has been: a lot of code will not run at even close to the theoretical capabilities of the machine (because the compiler couldn't figure out how to squeeze the logic into the parallel instruction set) and there were few compilers and little software available for them.
So far, I see little reason why Merced should be any different. Despite many years of research, compilers that are actually in use still haven't gotten very smart in understanding aspects of programs that need to be understood for parallelization and optimization. And Intel may try to help with C, C++, and Fortran backends, but what about all the other languages that are coming into use? We need chips that encourage the use of post-1970's languages, not chips that write them into stone.
Merced will probably perform well on some very structured problems (geometric transformations, optimizations, other numerical problems, text search, etc.). But for those, adding vector processing units to a more traditional processor might be cheaper and result in better overall performance than Merced's architecture.
There also seem to be questions about the way the VLIW architecture is implemented by Merced; supposedly, code compiled for one generation of the chip will not take advantage of more parallelism available in a later generation.
I think there is a good chance that the Alpha will save Intel. People already know how to write compilers for the Alpha, and the chip is fast. According to an article in Byte (but, hey, where are they now :-), Alpha will have twice the performance of Merced at the time Merced finally gets released.
On the one hand, I'm glad that some company is finally breaking with the dull tradition of processor design over the last 20 years. On the other hand, I'm not sure that this is the right way to do it.
Actually, there is another rather radical change in processor design that has happened recently: the complete system on a chip (from IBM and maybe others). Those might allow very dense multiprocessor systems, leading possibly to very different designs.
He's right about the compiler being hard, but I'm sure Intel realized this when they decided to go the VLIW route.
The idea with Merced and parallelism is this: the compiler does it all. No explicit coding should have to be done. If you write something in C++, the compiler will parallelize it to execute as much as possible in parallel within the CPU. Plus, the Merced architecture is scalable: more execution units can be added to future generations of chips for even more parallelism within the chip. Contrary to what the author of the article stated, this has nothing to do with multiprocessor systems, and the post above is correct: for that, apps need to be coded with threads. That's not to say that there won't be multiple-CPU systems: SGI and HP will definitely be making massively parallel supercomputers using Merced, with 256+ processors.
He said, "You'll be able to tell your grandchildren that you helped assemble the first NT supercomputer," and I cringed.
Not in this case. These pricing models will be on a much larger scale. Try $10,000 for the first 10k chips, etc. This one won't reach the home user quickly (Intel will still be releasing some next-gen 32-bit chips (Foster, I think?)). And by the time it is ready for the home market, well, I have serious doubts that anyone will need it. Think about it: right now, with what I actually use my computer for, all I need is a P200 with enough RAM so I can run Netscape, a word processor, and other common apps (so why is it that I have a dual Celeron 450?). While there will undoubtedly be new apps that start pushing CPU utilization, I think the trend will continue that the bottom line of applications that people actually use can be handled by a relatively slow processor, and only intense media functions will consume more. This indicates that people will not be buying full computers as we know them by the time Merced is out (maybe I'm pushing the speed at which this will happen a little bit), and will simply be buying "appliances" that handle certain tasks (i.e. a WinCE-style machine that just does certain tasks and is networkable). I'm thinking the home user won't have the "need" for a Merced chip.
Too many web sites (especially gamer sites, for some reason) don't seem to understand that Merced isn't for the average user. When it comes out, and at the very least for a few years following, it will be an ENTERPRISE level chip. This means 1) expensive as hell, 2) used in supercomputers (a la SGI and HP), and 3) high-end workstations/servers. The author of this article is right... it doesn't fit into Intel's business strategy for the consumer, but it isn't supposed to. Besides, I'm starting to get the feeling that by the time Merced is consumer viable, people will be using pure computers less, and computing appliances more.
I stand corrected... forgive my mistake. Back
in '96 (don't laugh) I kept very good tabs on
what was going on with Intel and its competitors
regarding chip technology, with help from friends
placed well at Intel (who would surely like to
remain anonymous). At the time Merced was
described as "essentially RISC" when compared
with the CISC systems then being put out (and
still being put out) by Intel. Over the past
years I kept less abreast of the impending
technologies (having moved my focus to more
software development, and much of that *not* on
Intel systems), but at least kept aware of
scheduled *releases* and some of the current Intel
technology. I clearly missed the IA64 move
(talk about head in the sand) on which I have
just started to catch up, and hence the "RISC"
discussion above.
The basis of my argument still stands, but the
compilers will be harder to write, and I see now
why there are some delays. Micro$oft does claim
to have a 64-bit windows running on a Merced
simulator (like that isn't a bald-faced lie,
judging by other orthogonal press releases coming
out of Redmond). I still firmly contend that
the current marketing infrastructure for Intel's
products will change if it cannot handle the
responsibilities of making money in Intel's Brave
New World. etc., etc., etc.
Thanks for the heads-up.
Roundeye
"Cause there's 40 different shades of black, so many fortresses and ways to attack, so why you complainin'?"
I'm not sure why this review was written.
Intel has been plagued for a decade by backwards
compatibility with a poorly designed CISC chip
with one of the poorest memory subsystem designs
still in current use. The amount of juice which
can be squeezed from the '86 lemon is limited and
it is a testament to Intel's determination (some
would say stubbornness or stupidity) that they
have been able to make this architecture a
profitable industry standard (of course the more
cynical (myself included on the occasional lonely
night) might chalk this up as a testament to the
power of a tightly run monopoly).
Merced is a necessity if Intel wants to stay
profitable in the face of not only Moore's Law
but AMD and other not-so-dark horses. This chip
has been designed for the most part for years.
The compilers have been under development for
years as well -- anyone who thinks otherwise
doesn't know how Intel does business.
A company which has the resources to write
compilers for superscalar CISC with pipelining,
data forwarding, bizarre MMX
registers/instructions, virtual '86s while
maintaining backwards compatibility with the
original broken design will find writing a new
compiler for a freshly designed clean RISC
system a wonderful relief. The amount of
openly available published research in the RISC
compiler community is significant, and Intel has
the bucks to hire more gurus on the topic if they
need them.
Marketing... It pains me to see so many people
assume that "the way it is" is "the only way
it can work". This is the same fallacious
thinking that makes it painful to watch any
Hollywood movie about time travel or the contact
of our civilization with another (I think
Independence Day may be the flagship example of
this) -- the way we Americans do things in this
day and age is superior to the way any other
conceivable society could do them. Cultural
ignorance and arrogance.
This sort of thinking comes up quite often in
discussions of why "Windows will be here forever"
and now appears here in a discussion of Intel's
marketing plan for Merced. The truth of the
matter is that (1) Intel wants the market to
change -- they have been burdened with the '86
albatross for far too long, and (2) the market
will change. Initially we hardware power users,
systems hackers, and speed/systems freaks will
jump on Merced because it is a better chip than
a crappy CISC chip on steroids. The chipsets
to run the chips will be there, and at least
some variation in motherboard configurations.
Dell/Compaq/Gateway will be able to sell a
Merced system.
If, as Intel puts more of its weight behind Merced
(and more applications are brought to Merced) the
current distribution system cannot change their
marketing model to take advantage of the new
configurations which will be possible and then
*desired*, then someone will step up to make the
new money by providing them. Because it's done
a certain way now doesn't mean that that is the
only way (I reiterate at the risk of sounding
pedantic). This industry moves too fast to coddle
companies which have become too large to steer
effectively.
The distribution channels for these systems, and
multi-processor systems, will develop and may
not include the current Big Players in the market.
In addition, as Intel hopes, if AMD et al cannot
create a chip to compete with Merced, and cannot
anchor the market on the '86-type chips, they
may also find themselves too big to steer out
of the way of the Intel truck.
Be careful. Merced could be a swan song for
Intel, but I think it is more likely their
Excalibur.
1. What does Carmack have to do with writing a Merced compiler? He is an excellent 3-D game programmer, but most of the hand-coded parallel-execution tweaks in Quake were written by x86 assembly guru Michael Abrash, not Carmack.
2. As other people have pointed out, he doesn't know what he is talking about.
3. If I were looking for analysis on the Merced delays, I would dig around on www.mdronline.com for an excellent article by one of their staff on the subject rather than listen to this bozo. Synopsis of the MDR online article: Intel is leading the design of the Merced and they are using their standard massively parallel design approach (lots of engineers). Problem is, this approach works fine for successive iterations of an existing, well-understood ISA and implementation, but it is not working well for a brand-new, cutting-edge ISA and chip. His prediction is that the Merced will have a very short life before it is superseded by the second-generation chip in the family, one being designed methodically and inexorably by a small HP-led design team.
As long as Intel and the OEM's keep selling single-cpu boards, selling extra cpu's instead of entire systems shouldn't be too much of a problem. Most end-users don't like swapping mobo's and cpu's.
Most users don't need the horsepower of their current K6-350 or PII-300; I'm still using a P-133. How much of their sales are going to be towards companies that can afford a hardware guy, or hardcore gamers who have the skill and motivation to do this, though? That might be a problem.
That would also be enough motivation to keep on churning out a more advanced processor, speed- and instruction-wise, though. Intel has been pretty good at making a new CPU every 3 years or so, the average time between upgrades. They've been even faster with the improved chipsets. The rest of the computer has gotten better as well.
As long as it's more expensive to build/upgrade to a state-of-the-art system, I don't think the OEMs will have too many problems. Everyone, especially the hardcore gamers, knows that the CPU isn't everything. The compiler is another story, but that'll happen too.
SCO's UnixWare 7 has been running on Merced emulators for quite a few months now (and I hear that recently a version of NT4 is too). These emulators run on IA32 NT and UnixWare boxes - so Sun must be using NT or SCO's stuff to run under... ironic? :)
I believe Sun got Solaris running
Who am I? Subscribe and find out
IMHO, the trouble is with Intel. Though they make some decent chips, it has been a long, long time since they have done any huge innovation. Consider: with the virtual monopoly they held on PC chips for ages and ages, they should have been able to pour money into R&D and create some pretty new and exciting things. True, they were held back by backward compatibility, but this should have been more than balanced out by $$$. Even the latest PIIIs on the enterprise level are not better than HP chips, Sun, SGI MIPS, PPCs, or what have you.
Also consider that the IA-64 EPIC architecture was originally an HP invention. As I understand it, HP designed it but realized that they didn't have the money or volume to produce it well, so they went to Intel. Intel realized that they were in a position of power (they could live without HP, but HP would be in trouble without Intel's fabs), so they grabbed the architecture and made it their own. For Merced, Intel is throwing a huge team of designers against it, but is still doing a poor job because the corporate architecture is too rigid. (Intel has been known to be a nasty employer, with a slew of age and sex discrimination suits behind it.) HP meanwhile is working on the next-generation IA-64 chip (McKinley, I believe), which is coming along quite nicely. The last estimate I heard was that McKinley will really show the power of EPIC chips, whereas Merced will be about comparable to whatever Pentium successor they have out at the time.
anyhow, this is just my take on it all.
It will be interesting to see if Intel panics and tries to throw more people at the project if it falls behind. IMHO, that would cause more harm than good.
As I understand it, the HP McKinley team is much smaller, which is a more intelligent way to attack a problem like this (a drastically new architecture).
I read the projected performance of the Dec Alpha 21364 from the Digital people (so its probably marginally biased) and the alpha blew the merced clear out of the water. Digital have been working on the alpha and its revisions for how many years now 5, 6, 7 ?? so one would expect it to be pretty refined. As well i think it will support upto 64 way SMP for serious machines.
It would be nice to know how many CPUs you can run in SMP with Merced at once. If you're building a serious server, multithreading and SMP will probably save the day over multiple instruction pipelines any day.
It is interesting, though, that the EPIC spec can support a virtually limitless number of instruction pipelines simultaneously (at least according to the Ars Technica review), so if this compiler-based strategy works and chip densities increase (as they will), we could see some very, very wide CPUs.
And the worst part is, gcc will need to support it thoroughly before Linux can run on it and take full advantage of its capabilities.
I think I'll buy an Alpha. It will probably cost less anyway, it's supported by Linux (or vice versa), and we know the damn thing works.
Remember way back when the first Pentium came out? It was pretty slow, but the word was that it would speed up heaps once applications were rebuilt for the Pentium and optimised by reordering instructions and so on.
Of course, no-one ever did this. In the early days it was probably mostly because everyone needed their code to run on the old processors too. These days it is partly because probably nobody remembers this anymore, and partly because not many people have bothered to do _that_ good a job of optimising for exactly how the Pentium works.
This time around, the same thing will happen, EXCEPT that Intel will probably come out with a proper gcc-based VLIW compiler for Linux/UNIX. They have to do this for the chip to survive.
Sure, MS and Borland or whoever will make their own compilers, but nobody will use the VLIW features for years because of backward compatibility. They will stick with whatever x86 compatibility mode Merced has.
But any Linux vendor can just rebuild everything (use the source, Luke). This might be a big win, letting Linux take a speed leap over Windows.
I believe HP is "putting a bob both ways". They are still actively developing the PA series in case Merced doesn't pan out.
Yes, even Carmack doesn't think he's a compiler god. You don't have to go much further than this idiotic suggestion to discount the article. Whatever the state of Intel's Merced compiler, you can count on some incredible minds working on it, people at the top of their fields just as John C. is in his.
Why is everyone talking about Merced? That chip doesn't exist yet, and we know nothing about it. Why should we care about Merced when we have nice Alphas, MIPSes, SPARCs, PowerPCs... that already do the job? We don't need another RISC chip just because Intel made it.
On the other hand, 7 execution units aren't a good idea on a plain RISC, because you can't avoid instruction dependencies no matter how good the compiler is. They are useful on a vector chip, though. I guess they are making a vector CPU that can be scaled up by adding more chips to the system.
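To make the dependency point concrete, here is a toy Python sketch (not how Merced or any real chip schedules). It assumes every instruction takes one cycle and that any of `width` identical units can run any instruction; `schedule_cycles` is a hypothetical helper that greedily issues ready instructions each cycle.

```python
from typing import Dict, List

def schedule_cycles(deps: Dict[str, List[str]], width: int) -> int:
    """Greedy list scheduling with `width` identical units and 1-cycle latency.

    `deps` maps each instruction to the instructions whose results it reads,
    in program order. Each cycle issues up to `width` instructions whose
    inputs were all produced in an earlier cycle.
    """
    done: set = set()
    remaining = list(deps)  # program order doubles as scheduling priority
    cycles = 0
    while remaining:
        ready = [i for i in remaining if all(d in done for d in deps[i])]
        if not ready:
            raise ValueError("cyclic or unknown dependence")
        issued = ready[:width]
        done.update(issued)
        remaining = [i for i in remaining if i not in issued]
        cycles += 1
    return cycles

# Seven adds where each needs the previous result, vs. seven independent adds.
chain = {f"i{k}": ([f"i{k-1}"] if k else []) for k in range(7)}
independent = {f"i{k}": [] for k in range(7)}

print(schedule_cycles(chain, 7))        # 7: extra units can't help a chain
print(schedule_cycles(independent, 7))  # 1
print(schedule_cycles(independent, 2))  # 4
```

With 7 units, the dependent chain still takes 7 cycles, the same as with 1 unit; only independent work fills the extra units.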
While the notion that EPIC allows one to
throw more cpus at a problem is silly, this
does bring up the related idea of chip level
multiprocessing. That is, if you *do* have
a program that can be run well on an SMP
machine, then you can use a computer that
has two or more conventional cores on a single
chip, sharing an L1 cache. This may be a
better way to use a transistor budget than
fancy VLIW schemes. The shared cache would
make interprocessor communication very fast.
Linux programmers should try to make their
programs SMP-friendly.
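How much SMP buys depends on how much of a program can actually run in parallel. A standard back-of-the-envelope tool for this (not mentioned in the post) is Amdahl's law: if a fraction p of the run time parallelizes perfectly over n CPUs, the speedup is at most 1/((1 - p) + p/n). A minimal Python sketch, assuming perfect load balance and zero communication cost, which is exactly the cost a shared on-chip cache would try to shrink:

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Best-case speedup when `parallel_fraction` of the original run time
    spreads perfectly over `n_cpus` and the rest stays serial.
    Ignores communication and cache costs entirely."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

for n in (2, 4, 100):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With 90% of the work parallel, 100 CPUs give only about a 9.2x speedup, and no number of CPUs gets past 10x; shrinking the serial part matters more than adding processors.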
VLIW is crap, because it ties the instruction
set to the processor generation. Intel would
not be successful if Merced v2 could not run
Merced v1 software.
EPIC fixes the problems with VLIW.
With EPIC, it is OK to add more functional units.
The compiler links together variable-length
groups of instructions that can execute together.
Example:
Your compiler finds a group of 17 instructions
that can execute together. (this is reasonable,
since there are 128 registers) Your CPU can only
execute 6 at once. You upgrade the CPU to one
that can execute 10 at once. No problem!
You only have trouble if your new CPU has more
functional units than your old compiler was
able to feed. That is simply a limitation of
the old compiler, not a case of code that only
suits the old CPU.
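The 17-instruction example above can be sketched as a toy cycle count in Python. This only models the post's idea of compiler-marked groups of independent instructions; it ignores IA-64's actual 128-bit, three-instruction bundles, template rules, and unit types, and both function names are hypothetical.

```python
import math
from typing import List

def epic_cycles(groups: List[int], width: int) -> int:
    """Each entry of `groups` is the size of one compiler-marked group of
    mutually independent instructions. The core issues up to `width` of them
    per cycle and starts a new cycle at every group boundary."""
    return sum(math.ceil(size / width) for size in groups)

def vliw_cycles(groups: List[int], compiled_width: int, width: int) -> int:
    """Classic VLIW: each group was packed at compile time into fixed words of
    `compiled_width` slots, issued one word per cycle, so extra units on a
    wider core sit idle. Assumes the wider core can run the old words at all."""
    if width < compiled_width:
        raise ValueError("core is narrower than the compiled instruction words")
    return sum(math.ceil(size / compiled_width) for size in groups)

group = [17]  # the post's 17-instruction group
print(epic_cycles(group, 6))      # 3 on the old 6-wide CPU
print(epic_cycles(group, 10))     # 2 after upgrading, same binary
print(vliw_cycles(group, 6, 10))  # 3: fixed 6-slot words can't use the new units
print(epic_cycles(group, 20))     # 1, but 3 of 20 units idle: compiler-limited
```

The last line is the post's caveat: once the CPU is wider than the groups the compiler found, the extra units go unused.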
OK. I think the guy who wrote that article is missing a few key ideas. Unfortunately, EPIC does NOT magically fix interprocess communication, nor is it a magic bullet. It DOES rely very heavily on the compiler, but that is because the compiler is the one that has to find the parallelism in the code. For a good overview, take a look at this article from Byte, Dec. 97: http://www.byte.com/art/9712/sec5/art1.htm
Doug
Stop talking about who's to blame when all that counts is how to change --"Born of Frustration" - James
Yes, but HOW much parallelism can be extracted from most programs? As I understand it, current RISC chips do more of their optimization at run time, in hardware. I am interested to see how Merced performs, but remember that the DEC guys have put a lot of work into the compiler for the AXP, and it is quite mature. I think it will take a couple of years for the Merced compiler to mature.
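One rough way to put a number on "how much parallelism" is instructions divided by the longest dependence chain: the average instructions per cycle with unlimited units, assuming one cycle per instruction. A toy Python sketch with hypothetical helper names, comparing a left-to-right sum of 8 values with a tree-shaped one. Note the rewrite to a tree is only legal for integers, or for floating point when the compiler is allowed to reassociate, since FP addition is not associative.

```python
from typing import Dict, List

def critical_path(deps: Dict[str, List[str]]) -> int:
    """Longest dependence chain, counting one cycle per instruction.
    No number of execution units can finish the code faster than this."""
    memo: Dict[str, int] = {}

    def depth(i: str) -> int:
        if i not in memo:
            memo[i] = 1 + max((depth(d) for d in deps[i]), default=0)
        return memo[i]

    return max(depth(i) for i in deps)

def ilp(deps: Dict[str, List[str]]) -> float:
    """Average instructions per cycle with unlimited units."""
    return len(deps) / critical_path(deps)

# Summing 8 values a..h with 7 adds.
# Left to right: s1 = a+b, s2 = s1+c, ..., s7 = s6+h.
sequential = {f"s{k}": ([f"s{k-1}"] if k > 1 else []) for k in range(1, 8)}
# Balanced tree: t1..t4 add pairs, u1 = t1+t2, u2 = t3+t4, v = u1+u2.
tree = {"t1": [], "t2": [], "t3": [], "t4": [],
        "u1": ["t1", "t2"], "u2": ["t3", "t4"], "v": ["u1", "u2"]}

print(critical_path(sequential), ilp(sequential))  # 7 1.0
print(critical_path(tree), round(ilp(tree), 2))    # 3 2.33
```

Same 7 adds either way, but the tree shape roughly doubles the parallelism a wide machine could exploit; how often a compiler can find rewrites like this in real programs is exactly the open question.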
dmp