Troubles with Merced

100 cpu's by Anonymous Coward · 1999-04-03 07:22 · Score: 0

I don't care how good your compiler or CPU is;
in most cases, if you add 100 CPUs, the code will
not run 100 times faster. The writer of this article
seems to think it will. Also, even if this were the
case, no one is selling mobos to the average Joe User
with 100 CPU sockets.
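To put rough numbers on this, Amdahl's law (not named in the post) bounds the speedup when only part of the work can be spread across CPUs. A minimal sketch, assuming the parallel fraction is known and ignoring communication and scheduling overhead; the fractions in the loop are made-up illustrations:

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the run time can be
    spread over `n_cpus` processors (Amdahl's law, overheads ignored)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(f"{p:.0%} parallel on 100 CPUs: at most {amdahl_speedup(p, 100):.1f}x")
```

Even with 99% of the work parallel, 100 CPUs give a bound of roughly 50x under this model, and real overheads only lower it.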

Re: The man is clueless; EPIC = VLIW by Anonymous Coward · 1999-04-03 08:31 · Score: 0

>For this to work right, almost all instruction, pipeline and functional unit characteristics have to be exposed. For example, for software pipelining (a technique to overlap instructions from different iterations of a loop) to work, pipeline latencies and functional unit resource constraints at each time step have to be carefully considered.

Thank you for your message, which is much more informative than the initial page :-).

The question now is: how well does EPIC/VLIW mix with dynamic code and languages (Java, Smalltalk, Eiffel, Lisp, Perl/Tcl/Python, ...), which trade CPU efficiency for speed of development?

On current RISCs, JIT compilers and miscellaneous run-time optimizations could restrict themselves to emitting machine code; it would then be scheduled properly by the (complex) hardware. But on EPIC/VLIW it seems that this wouldn't be the case; is there any way to still get good performance for dynamic languages? Any research papers on this?

Question about multiprocessing by Anonymous Coward · 1999-04-03 09:33 · Score: 0

I know that adding a CPU won't normally increase the performance of a program, because processes, in general, run on a single CPU. The program must be explicitly written to use multiple CPUs. Also (correct me if I'm wrong), it is the kernel's job to distribute different processes across CPUs in a multiprocessing environment, but how are threads handled? For example, in a multithreaded app, does/can the kernel run the app on multiple CPUs? Is this provided in pthreads, or do you need a special library like PVM (I think that's the right TLA :)?

Question about multiprocessing by Anonymous Coward · 1999-04-03 09:39 · Score: 0

Yes, any reasonable kernel distributes threads across all available CPUs.

Linux cuts the mustard by Anonymous Coward · 1999-04-03 09:45 · Score: 0

Linux and BeOS have the same thread model, so I don't really understand what you're saying.

However, BeOS effectively forces GUI and network programs to be multithreaded, so BeOS apps tend to use more threads than Linux apps, and thus take better advantage of SMP.

RE: Buzzwords by Anonymous Coward · 1999-04-03 09:51 · Score: 0

Actually, EPIC calls for splitting the CPU jobs into threads easily worked on by multiple pipelines. The biggest problem with current multiprocessor technology is that most software isn't threaded in that way. But if the software has multiple threads, which EPIC requires it to, what is to stop the OS from diverting them to the pipelines on another CPU? Or having one CPU do one branch and the other CPU do the other? EPIC is designed for multiple pipelines, but it can also help with multiple CPUs.

EPIC = Dynamic VLIW, not plain VLIW by Anonymous Coward · 1999-04-03 10:21 · Score: 0

True, this aspect (code compatibility) has been dealt with somewhat in EPIC, but I think the spirit of VLIW is still there: expose as much architectural information as possible, so that whatever cannot easily be dealt with in hardware is done in software.

So I think that if you upgrade the EPIC architecture (add more functional units, change the latencies, etc.), it would still be necessary to recompile to take full advantage of the new parallelism. This is also true for superscalars, but it is much more serious for EPIC.

More thoughts on binary compatibility: I think binary compatibility (in particular x86 compatibility) is a bad thing for innovation, and not as critical as before. Reasons: desktops dominated by a single architecture are on the way out (I hope). Java, or things like Java, where the client dynamically compiles code into binary on demand, will become more common. Hopefully this sort of dynamic and incremental recompilation can be made transparent to users. Markets other than desktops, like embedded systems, will become more important; these do not run shrink-wrapped binaries. Finally, open source may also have an impact: if programs are distributed in source form, compilation and recompilation are not a big problem.

Allen (leunga@cs.nyu.edu)

threads by Anonymous Coward · 1999-04-03 10:51 · Score: 0

You may have heard about userspace threads.

Basically, this is a library which fakes multithreaded activity in one process. Since
this is in userspace, the kernel knows nothing of it and can only schedule it on one processor.
This was used to do MT on Linux before it had
kernel threads.
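A minimal sketch of the idea, using Python generators as a stand-in for real user-space thread switching (actual thread libraries save and swap stacks and registers instead); `worker` and `run_round_robin` are hypothetical names:

```python
from collections import deque


def worker(name, steps, log):
    # Each "thread" is a generator; every `yield` hands control back to the
    # user-space scheduler, which is the only context switch there is.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run_round_robin(threads):
    # The scheduler lives inside one process and one kernel thread, so the
    # kernel sees a single schedulable entity and can only run it on one CPU.
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # run this thread until its next yield
            ready.append(t)  # still alive: back of the queue
        except StopIteration:
            pass             # this user-space thread has finished


log = []
run_round_robin([worker("A", 3, log), worker("B", 2, log)])
print(log)
```

The two "threads" interleave, but all of it happens on whatever single CPU the kernel gives the process.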

Now that Linux has kernel threads, this sort of hack is unnecessary (on Linux). I am unaware of
the story wrt the BSDs or other OSes.

DOS has had this since at least '90
(MT on segmented memory is no fun).

No mention of unix workstation/server vendors by Anonymous Coward · 1999-04-03 11:44 · Score: 0

I believe Sun has said they have a copy of Solaris 7 running on a Merced emulator, but that was a few weeks ago, so who knows. Check out their site. I think Sun is looking forward to Merced, hoping it will be cheaper... but then again, who knows.

What a dope! by Anonymous Coward · 1999-04-03 12:13 · Score: 0

Yeah... this guy is a total fool. It looks like his site is only relevant to the same crowd that flocks to sites like Tom's Hardware and Sharky Extreme.

These guys seem to measure the size of their penises by how much they can overclock their system, how much faster their newest gee-whiz sound card is than the next guy's, and how many megatexels their 3d card pumps out.

Let's see, who's the god in their world? Whoever wrote Quake/Doom... never mind that he's no compiler demigod.

Very little technical content. by Anonymous Coward · 1999-04-03 12:20 · Score: 0

I interned in an Intel compiler research group (MRL) two summers ago, when they were already actively pursuing innovative designs for Merced. The suggestion that Intel isn't aware of the compiler issues is silly. The question is whether they allocated sufficient resources to finish the project on time. Who knows... we'll find out.

The idea that Carmack could step in and dominate this group of around 10 compiler PhDs and 5 grad students is pretty laughable.

No offense to Carmack, obviously.

The man is clueless; EPIC = VLIW by Anonymous Coward · 1999-04-03 12:34 · Score: 0

There have been other VLIW compilation techniques
around for a long time. The Bulldog compiler
and the Multiflow TRACE machine did it. Others
do it as well. Look into the history of the
IA64 designers.

Re: The man is clueless; EPIC = VLIW by Anonymous Coward · 1999-04-03 12:55 · Score: 0

I think the source language is not an issue for EPIC, at least theoretically. If you have a good compiler that targets EPIC and can extract parallelism from scalar code, you can compile and run interpreted languages like Perl and Python(?) well. If you want to generate machine code yourself from your TurboVisualtkPerlLispTalkMLHaskell++ compiler, just hook it up to an EPIC backend and pass it lots of aliasing and control flow info (to help the optimizer). Of course, in practice no such compiler/backend exists. Is egcs working on this problem? This is a huge undertaking and more than just hacking up a new machine description; the optimization algorithms are new and non-trivial.

JIT compilation may be an issue for EPIC, since the optimization algorithms are more expensive to run; the compiler will also have to do more. I have seen some new work on quick and cheap scheduling and register allocation algorithms for RISC machines (probably from the ACM PLDI or OOPSLA proceedings). I suppose the same quick and cheap approach can be applied to EPIC; you just pay more in performance, relatively, since there is no hardware there to bail you out if you do a poor job.

Allen

Re: The man is clueless; EPIC = VLIW by mprinkey · 1999-04-03 18:12 · Score: 1

The egcs compiler issue is significant. If you examine the current state of the Alpha backend for egcs, you have a very good indication of the problems that a Merced port might face. The 21164 is missing one feature (out-of-order execution?), which makes instruction scheduling very important and seems to really limit overall egcs/Alpha performance. Alpha represents a rather small deviation from the current CPU "norm," and yet these problems have persisted for quite some time. EPIC and Merced present a vastly different architectural model, so I worry about the ability of egcs development to keep up. Of course, with Intel's recent interest in Linux, perhaps they will be gracious and help engineer the compiler.

Forget about it breaking their sales model by Anonymous Coward · 1999-04-03 17:00 · Score: 0

Or look at Apple a few years back, when the 68000 chip was running out of gas. The difference between the top-end and bottom-end CPUs was so small that they had to handicap the bottom-end machines with things like 40MB disks and so little memory that the OS wouldn't boot if you turned file sharing on. Apple found that you can only screw most customers over like that once.

Intel is just about at this point. Right after the Pentium III came out, I saw an ad for a $1900 Compaq - where about half the cost of that system is the CPU. Pity the poor sucker that buys that broken system thinking they're getting the top end. As General Motors and others have found out, selling dogs is not the way to get customers to come back quicker.

I hear what you're sayin, but... by Anonymous Coward · 1999-04-03 20:26 · Score: 0

>I hear what you are saying about home power requirements, and I tend to agree (although SimCity 3000 might just need that Merced ;-)). But consider that 10k chips * $10k/chip = $100m. My understanding is that a chip like Merced costs Intel $1-3 billion (US terminology) all found.
>So $100 million won't go very far to pay off that loan - and the bulk of the sales still have to come from the workstation side.

I feel like entering this argument :)
I believe that the prices of Merced will indeed be much higher than most people expect. Your argument that the money gained from this isn't enough to remotely cover the money that went into it is well understood, but keep in mind that other companies build just server chips and they are still around and kicking. One such company, Alpha, was bought out by Intel, but if that buy hadn't taken place, Alpha, the (basically) server-only company, would still be around and by far much better than Intel.

BTW: Alpha is still around, under Compaq's control. Supposedly they will be releasing some new stuff that'll break the 1GHz barrier, and EV8 is also supposed to be very interesting (not EV7... EV8! :)

Test Platform by Anonymous Coward · 1999-04-03 20:58 · Score: 0

Sort of a PPro situation?

SMP and Merced by Anonymous Coward · 1999-04-03 21:02 · Score: 0

Note that there's nothing keeping Be from switching back to supporting PPC-based platforms if SMP PPC systems become popular. Until it actually happens, though, they have no reason to sink time, energy, and resources into a market which has demonstrated itself as unviable for them.

-- Guges --

MERCED EPIC v DEC ALPHA 21364 by Anonymous Coward · 1999-04-03 21:14 · Score: 0

>And the worst part is, we need gcc to be able to support it thoroughly to be able to run Linux and take full advantage of its capabilities.
>I think I'll buy an Alpha. It will probably cost less anyway. It's supported by Linux (or vice versa) and we know the damn thing works.

Now if everybody thought this way, technology would never evolve.

I hear what you're sayin, but... by Anonymous Coward · 1999-04-04 00:41 · Score: 0

With companies like Samsung introducing Alpha-based 64-bit chips at low prices, and an open-source model that allows apps to be compiled for a specific platform (thereby freeing the user from the shackles of backwards compatibility, the single biggest monkey on Intel's back since the 286), Merced could be irrelevant.

SMP and Merced by Anonymous Coward · 1999-04-04 04:14 · Score: 0

SMP doesn't of itself improve the SINGLE process performance. You CAN write special code [..]

Note that the OS under discussion (BeOS) uses a multithreaded kernel and provides many powerful multithreaded libraries. By virtue of using any system calls and/or library calls, any BeOS application gains the benefit of having itself broken up into threads and distributed across multiple processors, even if it is written serially. Of course, that doesn't help much if all your time is spent in an inherently serial inner loop. I just wanted to point out that the matter isn't as cut-and-dried as your post made it sound. The general gist of your post, i.e., that adding more on-chip parallelism to processors like Merced and Xeon is for improving single-thread performance, is absolutely correct.

-- Guges --

The man is clueless; EPIC = VLIW by Anonymous Coward · 1999-04-04 11:16 · Score: 0

Thanks for the great comments!

Yes, I forgot to mention Josh Fisher's,
John Ellis' and Bob Rau's work on Multiflow, Bulldog, Cydra 5(?), etc.

I disagree on the relative ease of making a good EPIC compiler, though: I don't think naive tricks like hoisting loads and prepare-to-branch instructions to hide latencies will work well by themselves. I think resource constraints have to be modeled directly to get even decent performance. For example, hoist too many loads too early when you don't have enough load units and you have to stall, etc. (No real data to back this up, though; just my own experience in writing schedulers and a guesstimate.) And this basically means modeling your EPIC as a VLIW.
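To make the load-unit point concrete, here is a sketch of greedy list scheduling that counts functional units per cycle. The op names, the two-cycle load latency and the one-cycle ALU latency are assumptions for illustration, not Merced figures, and the dependences are assumed to form an acyclic graph:

```python
from collections import namedtuple

# One operation: its name, the functional-unit kind it needs, and the names
# of the operations whose results it consumes.
Op = namedtuple("Op", "name kind deps")

# Assumed latencies in cycles (illustrative only).
LATENCY = {"load": 2, "alu": 1}


def list_schedule(ops, units):
    """Greedy cycle-by-cycle list scheduling with per-kind unit limits.

    ops   -- list of Op in priority order (earlier entries are tried first)
    units -- dict mapping a unit kind to how many such units issue per cycle
    Returns a dict mapping op name to the cycle it issues in.
    """
    names = {op.name for op in ops}
    for op in ops:
        if op.kind not in LATENCY or units.get(op.kind, 0) < 1:
            raise ValueError(f"no usable {op.kind!r} unit for {op.name}")
        missing = [d for d in op.deps if d not in names]
        if missing:
            raise ValueError(f"{op.name} depends on unknown ops {missing}")

    issue = {}     # op name -> issue cycle
    ready_at = {}  # op name -> first cycle its result can be used
    cycle = 0
    while len(issue) < len(ops):
        free = dict(units)  # units still unused in this cycle
        for op in ops:
            if op.name in issue:
                continue
            operands_ready = all(ready_at.get(d, float("inf")) <= cycle for d in op.deps)
            if operands_ready and free[op.kind] > 0:
                free[op.kind] -= 1
                issue[op.name] = cycle
                ready_at[op.name] = cycle + LATENCY[op.kind]
        cycle += 1
    return issue


# Four independent loads feeding one sum (hypothetical op names).
ops = [Op(f"ld{i}", "load", ()) for i in range(4)]
ops.append(Op("sum", "alu", ("ld0", "ld1", "ld2", "ld3")))

for n_load_units in (2, 4):
    print(n_load_units, "load units:", list_schedule(ops, {"load": n_load_units, "alu": 1}))
```

With two load units the sum cannot issue until cycle 3; with four it issues in cycle 2. A scheduler that ignored the unit count would plan for the four-unit timing either way and then stall.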

Furthermore, EPIC has predication and various data and control speculation instructions, and you really have to take advantage of these to get good performance. So the compiler will not be as simple as a compiler for a superscalar. In fact, you can't even take an existing superscalar backend and hack up an EPIC backend from it; the IR, the algorithms, etc. are just so different, especially with predication involved.

From the perspective of marketing though, I agree that EPIC has an advantage over VLIW.

MP Question by Anonymous Coward · 1999-04-04 11:16 · Score: 0

I recall that BeOS was capable of taking advantage of MP machines. Programmers didn't have to do anything special in their code, and BeOS would distribute tasks to CPUs accordingly. Am I right about this or what?

Ignore that article by Anonymous Coward · 1999-04-04 11:49 · Score: 0

Why does the processor need to check if the compiler "lied"?

Branch prediction by Anonymous Coward · 1999-04-04 13:41 · Score: 0

This Merced design of executing both branches seems like it would take an enormous amount of work. Is it really worth it? Isn't a simpler design able to operate at a higher clock rate?

Of course it's worth it; with sufficient parallelism, it's free to run both branches.

As to the higher clock rate, there are real physical limitations down that route, meaning it will end at some point, and beyond that there is no gain from clock rate. However, if you can execute multiple instructions at the same time, even just two at once, and sustain that consistently, you would expect processing at twice the speed.

Re: The man is clueless; EPIC= VLIW by Anonymous Coward · 1999-04-04 20:39 · Score: 0

Does JIT reduce to compiling for the hardware the JIT compiler is running on, or for a virtual machine?

JIT is "Just-In-Time" compilation, and it actually outputs machine code (see Kaffe for a Java JIT). The idea is that at run time you have a better idea of the types of the variables and the actual functions called, even though at compile time this wasn't necessarily clear. This is an optimization that could be applied to dynamic languages (Smalltalk, Java, maybe Lisps [they also have type declarations]) and hypothetically to any interpreter (Perl, Python, Tcl, ...).

No mention of unix workstation/server vendors by Anonymous Coward · 1999-04-05 17:36 · Score: 0

SGI have already announced that they are
going to build another generation of MIPS
CPUs to cover the anticipated delay in
the release of Merced.

That's a pretty expensive thing to do
unless there are some very serious
problems with the Intel part.

Test Platform by Anonymous Coward · 1999-04-05 17:43 · Score: 0

No - SGI had announced future products containing
Merced processors.

I agree about Carmack though. Programming games
and writing a great compiler require two very
different sets of skills. In writing a game,
you also get to change the rules to suit the
code. When writing a compiler, you have a very
fixed language spec and a very fixed CPU spec
and you have to bridge the two.

hold off on merced comments just yet. by Anonymous Coward · 1999-04-03 07:16 · Score: 1

It's happening now, just as it has since the beginning of time. Every time someone creates something completely new, and it's a risk, people come out of the woodwork to dismiss it. Before everyone sits around and says the Merced chip is done for, wait till it ships before passing judgement. If it ships on schedule and is buggy, is that better than it shipping a year late and flawless? As far as I know, no other chip company is taking this much of a risk on a new chip. I say more power to Intel.

EPIC != SMP by Anonymous Coward · 1999-04-03 07:26 · Score: 1

Explicit parallelism (EPIC) has nothing to do with multiprocessing (SMP). Dell and Gateway have nothing to fear but idiots who write articles like this.

To take advantage of EPIC, a compiler needs to look for machine instructions that have no dependency on each other. Those instructions can be executed simultaneously.
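A minimal sketch of that search for straight-line code: walk the instructions in order and close the current group whenever an instruction reads or overwrites a register written earlier in that group. The (dest, srcs) tuple format is a hypothetical representation. The sketch assumes every instruction in a group reads its operands before any of them writes, so write-after-read is allowed inside a group, and it never reorders instructions, which a real compiler would also do:

```python
def group_independent(instrs):
    """Split straight-line code into groups of mutually independent instructions.

    instrs -- list of (dest, srcs) tuples, e.g. ("r3", ("r1", "r2"))
    Returns a list of groups (lists of instructions), in program order.
    """
    groups = []
    current, written = [], set()
    for dest, srcs in instrs:
        raw = any(s in written for s in srcs)  # reads a result from this group
        waw = dest in written                  # overwrites a result from this group
        if current and (raw or waw):
            groups.append(current)             # close the group here
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


code = [
    ("r1", ("r10",)),        # r1 = load [r10]
    ("r2", ("r11",)),        # r2 = load [r11]
    ("r3", ("r1", "r2")),    # r3 = r1 + r2   (needs both loads)
    ("r4", ("r12", "r13")),  # r4 = r12 * r13 (independent of the rest)
]

for n, group in enumerate(group_independent(code)):
    print(n, [dest for dest, _ in group])
```

The two loads can issue together; the add has to wait for them, but the independent multiply can go alongside the add.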

To benefit from multiprocessing, a program must use threads (break itself up into what are called lightweight processes). Threads are an operating system service, not really a processor-level thing. A compiler cannot make a thread, even on Merced. The programmer creates the threads by triggering operating system calls within the process.
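A minimal Python sketch of explicit thread creation; `parallel_sum` is a hypothetical example. One caveat: in standard CPython builds, the global interpreter lock stops pure-Python threads from executing bytecode on several CPUs at once, so this shows the programming model rather than a real speedup. The same structure written with pthreads in C would run in parallel:

```python
import threading


def parallel_sum(n, n_threads=4):
    """Sum 0..n-1 by splitting the range across explicitly created threads."""
    if n_threads < 1:
        raise ValueError("need at least one thread")
    partial = [0] * n_threads
    chunk = (n + n_threads - 1) // n_threads  # ceiling division

    def work(idx):
        lo = idx * chunk
        hi = min(n, lo + chunk)
        partial[idx] = sum(range(lo, hi))  # empty range if lo >= hi

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()  # the program, not the compiler, asks the OS for a thread
    for t in threads:
        t.join()   # wait for every thread before combining results
    return sum(partial)


print(parallel_sum(1_000_000))
```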

Thus, even on Merced, an ordinary Joe Blow can't make individual programs faster just by popping in new processors unless the software is written using threads. And even then, the benefit will only affect software on operating systems that offer both user- and kernel-level threads, like BeOS and Solaris UNIX. NT doesn't really cut the mustard. And Linux doesn't either.

As far as writing the compiler goes, he's partially correct there, but only partially. All programs have a few instructions that can safely be executed simultaneously, but how much faster would that make the program? A compiler must be well written, and this will make quality comparisons between different vendors' compilers much more useful. How crappy will MS Visual C++ be then? Will NT even run on Merced?

Branch prediction by Anonymous Coward · 1999-04-03 20:33 · Score: 1

You got the Alpha and PA-RISC mixed up: the Alpha always assumes that a branch to a previous address will be taken (which makes loops fast and gives compilers a handle on how to optimize code with well-understood flow). The main advantage of using such a simple (and more or less effective) technique on the Alpha was that it consumed very few transistors, and the Alpha was facing very severe space constraints. The 21264 is about twice as powerful as the 21164 at the same clock speed, and most of the benefit came from improvements in branch prediction (made possible by better fab technology relieving some of the space constraints; the Alpha is a *big* processor).
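The static rule described above fits in a few lines. This sketch adds one assumption the post doesn't state (forward branches are predicted not taken), and it is not a model of any particular Alpha; the addresses and traces are made up:

```python
def predict_taken(branch_pc, target_pc):
    """Static rule: a backward branch (target at a lower address, as at the
    bottom of a loop) is predicted taken; a forward branch is predicted not taken."""
    return target_pc < branch_pc


def prediction_accuracy(trace):
    """trace -- list of (branch_pc, target_pc, actually_taken) tuples."""
    if not trace:
        raise ValueError("empty trace")
    hits = sum(predict_taken(pc, target) == taken for pc, target, taken in trace)
    return hits / len(trace)


# A 10-iteration loop whose closing branch at 0x120 jumps back to 0x100:
# taken on 9 iterations, not taken once when the loop exits.
loop_trace = [(0x120, 0x100, True)] * 9 + [(0x120, 0x100, False)]

# A forward "skip the error handling" branch that is almost never taken.
skip_trace = [(0x200, 0x240, False)] * 19 + [(0x200, 0x240, True)]

print(prediction_accuracy(loop_trace), prediction_accuracy(skip_trace))
```

The rule costs no tables at all, which is the transistor-budget argument, but it can only be as good as the typical loop trip count allows.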

Branch prediction is very important for keeping deep pipelines from stalling. If your pipe is 33 instructions deep and your branch prediction is only 90% effective, then your branches cost you an average of roughly three extra cycles each.
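The arithmetic behind that figure, as a sketch that assumes every misprediction costs a full pipeline flush equal to the pipeline depth (a simplification; real penalties depend on where the branch resolves):

```python
def avg_branch_penalty(pipeline_depth, prediction_accuracy):
    """Expected extra cycles per branch, assuming each misprediction
    throws away a full pipeline's worth of work."""
    if not 0.0 <= prediction_accuracy <= 1.0:
        raise ValueError("prediction_accuracy must be between 0 and 1")
    return (1.0 - prediction_accuracy) * pipeline_depth


for acc in (0.90, 0.95, 0.99):
    print(f"{acc:.0%} accurate, depth 33: {avg_branch_penalty(33, acc):.2f} cycles/branch")
```

At 90% accuracy and depth 33 this gives 0.1 × 33 = 3.3 extra cycles per branch; raising accuracy to 95% halves it.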

Speculative execution is another powerful tool for keeping your (now tree-shaped) pipeline full, but it's not intended to be a complete replacement for branch prediction. On systems where the pipe isn't trivially shallow, speculative execution is used together with branch prediction (i.e., executing speculatively on the earliest branches and/or poorly predictable branches, and using branch prediction for the rest). Speculative execution is expensive in terms of duplicated issue logic and ALUs, but that's not much of an issue for today's microprocessors: most of the space on-die is taken up by memory cache, which tends to become much less effective per transistor beyond a certain (already long surpassed) size. As long as the extra logic for speculative execution yields a better gain per transistor spent than L1 cache, it's a win.

Yet another technology for ducking the high cost of conditional branches is predication. Predication is orthogonal to prediction and speculative execution. Its biggest strengths are that it doesn't require much extra logic, doesn't require splaying your pipe into a tree (à la speculative execution), and greatly reduces the cost of small blocks of conditionally executed code. It doesn't reduce that cost as much as good branch prediction would, though, so its use is more or less limited to small blocks of hard-to-predict conditionally executed code, and having it in your processor by no means lets you get away without good branch prediction logic.
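A toy model of predication (if-conversion): both arms of `if (a > b) x = a - b; else x = b - a;` become predicated instructions that always execute, and only the one whose predicate is true writes its result. The register names and tuple format are hypothetical, and the Python `if` inside the interpreter only models what the hardware does with a predicate bit:

```python
def run_predicated(program, regs):
    """Execute every instruction in order; an instruction writes its result
    only if its qualifying predicate register is true. No branches are taken.

    program -- list of (pred, dest, fn, srcs); pred is a register name or None
    regs    -- dict of register values (updated in place and returned)
    """
    for pred, dest, fn, srcs in program:
        value = fn(*(regs[s] for s in srcs))  # both arms do the work
        if pred is None or regs[pred]:
            regs[dest] = value                # only the true arm commits
    return regs


# if (a > b) x = a - b; else x = b - a;   after if-conversion:
prog = [
    (None, "p1", lambda a, b: a > b, ("a", "b")),  # compare sets p1
    (None, "p2", lambda p: not p, ("p1",)),        # complementary predicate
    ("p1", "x", lambda a, b: a - b, ("a", "b")),   # then-arm, guarded by p1
    ("p2", "x", lambda a, b: b - a, ("a", "b")),   # else-arm, guarded by p2
]

print(run_predicated(prog, {"a": 7, "b": 3})["x"])
print(run_predicated(prog, {"a": 2, "b": 5})["x"])
```

Both arms cost issue slots every time, which is why this only pays off for small, hard-to-predict blocks.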

-- Guges --

The man is clueless; EPIC = VLIW by Anonymous Coward · 1999-04-03 07:47 · Score: 2

EPIC means the following:

1. The compiler has to explicitly package instructions that have no dependences into parallel issue packets. This task is currently done by hardware in superscalars. This means that you can't just slap another CPU onto the board to make things faster: the parallelism is at the instruction level and is determined at compile time; most bindings are done statically. For this to work right, almost all instruction, pipeline and functional unit characteristics have to be exposed. For example, for software pipelining (a technique to overlap instructions from different iterations of a loop) to work, pipeline latencies and functional unit resource constraints at each time step have to be carefully considered.

2. EPIC is basically VLIW, but don't say that, because all VLIWs (except DSPs like TI's C6) have been commercial failures. Besides, it's not as salesirific as VeeEllEyeDoubleU.

3. The compiler is crucial. So far, only the University of Illinois IMPACT group and the HP CAR group really, really know how to build one well (my opinion, of course).

4. For more technical info, read comp.arch and look at proceedings such as MICRO. See also www.trimaran.org. Just don't listen to the clueless.
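On the software pipelining point in item 1: a modulo scheduler has to pick an initiation interval (II), the number of cycles between the starts of successive loop iterations. A common starting point in the modulo-scheduling literature is the lower bound max(ResMII, RecMII), where ResMII comes from functional-unit counts and RecMII from loop-carried dependence cycles. A sketch, with made-up resource counts and latencies for a dot-product-style loop:

```python
import math


def res_mii(uses, units):
    """Resource bound: each iteration needs uses[r] issue slots on resource
    kind r, and the machine has units[r] of that kind per cycle."""
    return max(math.ceil(uses[r] / units[r]) for r in uses)


def rec_mii(recurrences):
    """Recurrence bound: each entry is (total_latency, total_distance) summed
    around one loop-carried dependence cycle; distance counts iterations."""
    return max((math.ceil(lat / dist) for lat, dist in recurrences), default=1)


def min_ii(uses, units, recurrences=()):
    """Lower bound on II; an iterative modulo scheduler starts here and
    raises II until it actually finds a valid schedule."""
    return max(res_mii(uses, units), rec_mii(recurrences), 1)


# Hypothetical body of "s += a[i] * b[i]": two loads, a multiply and an add.
uses = {"mem": 2, "fpu": 2}
units = {"mem": 2, "fpu": 1}
# Assumed: the add feeding the next iteration's add has latency 4, distance 1.
accumulate = [(4, 1)]

print("ResMII:", res_mii(uses, units))
print("RecMII:", rec_mii(accumulate))
print("MII:", min_ii(uses, units, accumulate))
```

Here the single FPU alone would allow a new iteration every 2 cycles, but the assumed 4-cycle accumulate chain forces II of at least 4, which is exactly the kind of detail the compiler can only see if latencies are exposed.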

Allen (leunga@cs.nyu.edu)

Very little technical content. by Anonymous Coward · 1999-04-03 05:53 · Score: 3

The article has very little technical content. And it may not even be accurate.

I didn't know that Carmack was the demigod of compilation research, and I thought that some people had OSes running on Merced simulators.

The sales model: of course Intel made a gamble; the gamble is that you can't do much more architectural optimization on current RISC designs (maybe that's why people are throwing die area at multiple MMX, 3DNow!, whatever "multimedia/SIMD" units), so a new paradigm could outperform older designs. Basically, Intel is betting that superior performance is a valid reason for being (partly) incompatible with x86 (or for keeping up with competitors).

A huge part of the Merced issue is essentially technical, and the article is completely off base in this respect. Please, someone fix this and post relevant URLs...

Buzzwords make it easy to spot the idiots by Anonymous Coward · 1999-04-03 05:52 · Score: 4

This guy clearly doesn't know anything about EPIC beyond what the acronym expands to.

Even from the small amount of information published about IA64, it is clear that there is absolutely no support for automatic scaling simply by adding cpus. EPIC refers to the way each individual cpu decodes the instruction stream. EPIC is no more inherently multi-processor than the current IA32 instruction set.

To get automatic scaling, you need something like Tera's Multi-Threaded Architecture. Too bad they can't seem to ship the damn thing, and that it costs a couple of million.

See: http://www.tera.com/ for more info.

Time for change? by bjk4 · 1999-04-03 05:42 · Score: 2

If the trouble is that the money model does not work with easier-to-upgrade hardware, then maybe the model needs to change. Currently Dell and Compaq make money selling whole computers. Perhaps they should sell or lease parts in addition to cases. That way you could change the CPU every few months and keep current, for a fee of course.

Time and progress won't hold still, so perhaps you shouldn't.

-Ben

It's not just _a_ compiler by bhurt · 1999-04-03 08:05 · Score: 2

It's not just _a_ compiler that has to be worked over, it's all of them.

In addition, it's widely accepted that there will be faster RISC CPUs available than Merced when Merced ships, and even faster x86 chips (at running x86 binaries). Before using this to claim that Merced is dead, remember that this was true of the initial RISC chips when they came out. What this means is that it'll simply take a while for EPIC to mature and for its advantages to come to the fore.

The problem is that Intel doesn't have much of a choice. Well, I suppose they could have gone for a standard RISC chip. But something post-x86 is necessary. If nothing else, the 32-bit limitations of the architecture are hurting Intel's sales in the lucrative server market (where tens of gigabytes of RAM are common, and hundreds not unknown). The desktop user probably won't care for another 4-5 years, but the server market started caring 3 years ago.

One interesting note in all of this is how it is affecting the Intel/Microsoft relationship. Judging by its actions, Intel has no confidence in Microsoft being able to ship an enterprise-ready, 64-bit-clean OS any time soon. Not that I blame them.

Intel is learning one nice feature about open-source operating systems- they don't have to depend upon someone else to support their chips. For a small engineering investment, they can do it themselves- and if you want something done right, doing it yourself is a real good idea. Making a small investment in a small company (like, say, Redhat) makes a lot of sense in this context.

That being said, I think EPIC is an interesting design with a lot of long-term potential. Standard RISC processors have a hard time averaging more than about 2 parallel instructions. Research done by HP indicates a lot more than that is possible; it's just computationally infeasible for the _processor_ to find it.
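One way to see what such studies measure: with unlimited functional units, unit latency and perfect register renaming, the best possible instructions per cycle for a block is its instruction count divided by its longest dependence chain. A sketch under exactly those assumptions (it does not reproduce the HP results); the instruction lists are made up:

```python
def ideal_ilp(instrs):
    """Upper bound on instructions per cycle for straight-line code, assuming
    unit latency, unlimited functional units and perfect renaming
    (only true read-after-write dependences constrain the schedule).

    instrs -- list of (dest, srcs) tuples in program order.
    """
    ready = {}  # register -> cycle its latest value becomes available
    depth = 0   # length of the longest dependence chain seen so far
    for dest, srcs in instrs:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + 1
        ready[dest] = finish  # renaming: a new version, so WAR/WAW don't matter
        depth = max(depth, finish)
    return len(instrs) / depth if depth else 0.0


code = [
    ("r1", ("r10",)),      # four independent loads
    ("r2", ("r11",)),
    ("r3", ("r12",)),
    ("r4", ("r13",)),
    ("r5", ("r1", "r2")),  # two independent adds
    ("r6", ("r3", "r4")),
]
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",))]  # fully serial

print(ideal_ilp(code), ideal_ilp(chain))
```

The hard part the post points at is that finding this parallelism across large windows, branches and memory references is what the hardware can't afford to do at run time.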

No mention of unix workstation/server vendors by shrike · 1999-04-03 07:17 · Score: 1

I wonder how much time the author of this article
took to research the matter.

I see no mention of any of the Unix vendors.
Both HP and SGI are going to use the IA64. HP
will be phasing out its PA-RISC CPU in favour of
the IA64. (Don't know about SGI's use of MIPS
CPUs.) Both vendors have extensive experience
with multi-scalar RISC CPUs. Also, Intel has
its own RISC CPUs, and there are several third-party
compiler developers probably just waiting for a
break.

Also, he starts comparing Joe Average's IA64
system with real server machines (HP PA-RISC,
Alpha AXP, MIPS R10K, Sun UltraSparc). An
IA64-based system is going to cost more than Joe
makes in a year! He can't even buy a machine
based on one of the currently popular server
architectures (except maybe Intel IA32/Xeon).
The comment about just adding an extra CPU
is also valid for current SMP Windows NT-based
systems, since almost all software is
multi-threaded. (I really like my dual P150
running Linux...)

Mathijs

Ignore that article by Nelson · 1999-04-03 08:09 · Score: 1

The author doesn't know what he is talking about.

Most processors are already parallel in the way EPIC means parallel: they have multiple execution units which can process instructions concurrently. Multichip parallelism is a much tougher problem, with lots of different problems to beat. It has nothing to do with Merced or the sales model.

An IA-64 instruction comes in a bundle with two other instructions; altogether there are three instructions in a bundle. Each instruction is something like 40 bits long, and each bundle has a dependency flag of several bits. The performance problems that hinder chips the most are pipeline stalls and branches. The chip has a ton of logic that tries to predict branches and choose the right one, and modern chips have a ton of logic to execute instructions out of order to reduce stalls. IA-64 pushes the job of stall detection onto the compiler, which makes the instruction bundles and chooses the dependency flag (the flag says which instructions in the bundle can conflict). That way the chip doesn't need as much logic for out-of-order execution, and the designers can focus on more important things. This is also a piece of cake for modern compilers: IBM, Sun, MIPS and DEC all have the technology to do this, and most have had it for years and years.
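For reference, the published IA-64 format, as I understand it, is a 128-bit bundle holding three 41-bit instruction slots plus a 5-bit template in the low bits; the template says which execution-unit type each slot uses and where the stops between instruction groups fall. A sketch of the bit packing only, with placeholder slot and template values rather than real encodings:

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1

assert TEMPLATE_BITS + 3 * SLOT_BITS == 128  # one bundle = 128 bits


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer:
    template in bits 0-4, slot i starting at bit 5 + 41 * i."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("need exactly three 41-bit slot values")
    word = template
    for i, s in enumerate(slots):
        word |= s << (TEMPLATE_BITS + SLOT_BITS * i)
    return word


def unpack_bundle(word):
    """Inverse of pack_bundle: return (template, [slot0, slot1, slot2])."""
    template = word & ((1 << TEMPLATE_BITS) - 1)
    slots = [(word >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK for i in range(3)]
    return template, slots


b = pack_bundle(0x10, [1, 2, SLOT_MASK])  # placeholder values only
print(hex(b), b.bit_length(), unpack_bundle(b))
```

Three instructions per 128 bits is less dense than three 32-bit RISC instructions would be, which is the price of the extra template and register-specifier bits.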

To solve the branch problem, IA-64 doesn't use branch prediction. This is the really important part of EPIC. It executes both branches, and once the correct path is known it discards the instructions it executed on the wrong branch. This is tough to do. The compiler is also supposed to help with this by adding some bits to the flag, and that is a tough thing to do too.

If it all works, IA64 chips will be fast, but nothing stellar, because RISC chip makers have done such a great job of dealing with these problems already. So Intel has chosen to make a very complicated design, with some hard but not impossible compiler changes, and they aren't going to deliver the ultimate performance they have been promising for years. There are definitely hard technical problems to solve, but they aren't that bad; I think the bigger problem is actually building a chip that can compete with modern PowerPC and Alpha RISC processors and look innovative. Intel is breaking compatibility, and once that is done it's anybody's market, because they have nothing that makes them look better than the other guys (like 25 years of x86 software...).

The funny thing about all this EPIC talk is that Intel still has to have logic on the processor to tell if the compiler lied... They were trying to get rid of that logic to make a leaner and meaner processor, but they still have to have it.
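One concrete case of that checking in the published IA-64 design, as I understand it, is data speculation: the compiler hoists a load above a store that might alias it (an "advanced" load), the hardware records the address in the Advanced Load Address Table (ALAT), and a check at the load's original position redoes the load if an intervening store hit that address. A toy model, assuming exact address matches (real hardware also handles access sizes and partial overlap); the class and method names are hypothetical:

```python
class ALAT:
    """Toy advanced-load address table: remembers which hoisted loads
    are still known to be valid."""

    def __init__(self):
        self.entries = {}  # target register -> address it was loaded from

    def advanced_load(self, mem, reg, addr, regs):
        # Load hoisted above a possibly-aliasing store; remember the address.
        regs[reg] = mem[addr]
        self.entries[reg] = addr

    def store(self, mem, addr, value):
        mem[addr] = value
        # Any hoisted load from this address is now stale: forget it.
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, mem, reg, addr, regs):
        """At the load's original position: True if the speculation held;
        otherwise the compiler's guess was wrong, so reload and return False."""
        if self.entries.get(reg) == addr:
            return True
        regs[reg] = mem[addr]
        return False


# Case 1: the store hits the same address, so the hoisted value is stale.
mem, regs, alat = {100: 1}, {}, ALAT()
alat.advanced_load(mem, "r1", 100, regs)
alat.store(mem, 100, 42)
print(alat.check_load(mem, "r1", 100, regs), regs["r1"])

# Case 2: the store goes elsewhere, so the hoisted value stands.
mem, regs, alat = {100: 1, 200: 2}, {}, ALAT()
alat.advanced_load(mem, "r1", 100, regs)
alat.store(mem, 200, 42)
print(alat.check_load(mem, "r1", 100, regs), regs["r1"])
```

The compiler gets to be optimistic about aliasing, and the hardware only needs a small table and a compare, not full out-of-order memory disambiguation.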

SMP and Merced by Phil-14 · 1999-04-03 17:10 · Score: 1

I don't really know for sure, but it seems to me that one of Intel's major problems is that they want to charge an extreme premium for performance, and don't want to wake up in a world where you can scale processor power by adding CPU's.

I find it odd that just when the PPC people will be removing the "SMP Premium" charge from their chips, and making very SMP-capable G4 chips, Be will be abandoning the PPC arena for Intel chips, where the only processors capable of scaling beyond two-way SMP are "non-consumer-grade" very expensive, very high-margin server chips.

Once someone comes out with a decent low cost multi-smp-scalable (beyond 2!) chip and motherboard system, the world will beat a path to their door. I think if that ever happens, Be will have to decide whether or not to stick with Intel and watch some more processor-agnostic SMP-capable OS (like Linux) seize the ground of becoming a "media OS."
Phil Fraering "Humans. Go Fig." - Rita

--
(currently testing something about signatures here)

SMP and Merced by Phil-14 · 1999-04-05 05:04 · Score: 1

You don't understand what I mean.

For the intended market for Merced, i.e.
servers, being able to handle multi-threaded
applications would help a lot.

This also holds true for consumer OSes; otherwise
M$ wouldn't be quite so concerned about Be.
Phil Fraering "Humans. Go Fig." - Rita

Author ignoring reports by ChrisRijk · 1999-04-03 08:30 · Score: 1

The author mentioned articles in The Register, and other places, but seemed to just ignore the content of those articles. He is right about the compiler though - it is a big issue, and I've heard it isn't going too well.

Incidentally, what I've read in The Register corresponds pretty well to what I've heard through other channels.

Basically, the 'other' problems with the Merced are in the design itself: Intel's engineers aren't used to doing this sort of thing, and apparently they're also a bit short on quality engineers and are using lots of people who've just left university. Not the sort of people to give a massively complicated chip design to.

Incidentally, the '2nd gen' EPIC chip, the McKinley, is mostly being done by HP, and is apparently going pretty well. I've been hearing from my own contacts for a long, long time that the Merced might just end up being a 'test' processor that never goes into production, and that the McKinley will be the first production EPIC.

Not surprisingly, both HP and SGI have recently been saying they're still committed to their own architectures (at least for a while), after previously planning to dump them. I think HP have been saying they'll go with their own stuff for another 5 years.

Intel isn't the only one running a bit late. Sun are about a year behind with their UltraSparc-III, though I haven't heard anything about why. (Their reasons for being late are probably quite different from Intel's.) Shame, as it seems like a pretty nice chip...

Intel's pricing model doesn't work that way... by sphealey · 1999-04-03 09:35 · Score: 1

"Too many web sites (especially gamer sites, for some reason), don't seem to understand that Merced isn't for the average user. When it comes out, and at the very least for a few years following, it will be an ENTERPRISE level chip. This means 1) expensive as hell 2) used in"

Intel's pricing model doesn't work that way. True, with every new chip Intel announces "this is for servers only". And the first few thousand chips do go into servers. But the server market isn't anywhere near large enough to pay back the cost of developing that chip, so within a few months workstations are released, first by one of the larger clone makers, then by Gateway 2000, and finally by Compaq.

And Intel absolutely depends on these workstation sales to drive their learning-curve/price-reduction model. Otherwise they couldn't earn a return on the chip or keep AMD etc. at bay. Set up a spreadsheet and play around with some pricing models (first 10k chips at $2000, next 100k at $750, and so on). The arithmetic is quite simple and inexorable.
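A minimal sketch of that spreadsheet exercise, using only the two tiers quoted above (first 10k chips at $2000, next 100k at $750). The function names are made up for illustration, the "and so on" tiers are left out, and per-chip manufacturing cost is ignored.

```python
def tiered_revenue(tiers):
    """Total revenue for a list of (units, price_per_unit) tiers."""
    return sum(units * price for units, price in tiers)


def share_of_revenue(tiers, index):
    """Fraction of total revenue contributed by a single tier."""
    units, price = tiers[index]
    return units * price / tiered_revenue(tiers)


# The two tiers quoted in the post; later tiers are omitted.
tiers = [(10_000, 2000), (100_000, 750)]

print(tiered_revenue(tiers))                 # 95000000
print(round(share_of_revenue(tiers, 0), 2))  # 0.21
```

Even at the high early price, the first tier brings in only about a fifth of the total; the volume tier carries the rest, which is the point being made about workstation sales.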

So look for the first Merced (McKinley?) workstation about three months after the first server is released.

sPh

I hear what you're sayin, but... by sphealey · 1999-04-03 10:49 · Score: 1

"Not in this case. These pricing models will be on a much larger scale. Try $10,000 for the first 10k chips, etc. This will one won't be quick to the home user (intel will still be realeasing some"

I hear what you are saying about home power requirements, and I tend to agree (although SimCity 3000 might just need that Merced ;-)). But consider that 10k chips * $10k/chip = $100 m. My understanding is that a chip like Merced costs Intel $1-3 billion (US terminology) all found. So $100 million won't go very far to pay off that loan - and the bulk of the sales still have to come from the workstation side.
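The same arithmetic as a quick sketch: revenue from the first 10k chips at $10k each, and how many chips at that price it would take just to gross the $1-3 billion development cost mentioned above. The helper names are hypothetical, and because manufacturing cost is ignored the unit counts are lower bounds.

```python
def revenue(units, price):
    return units * price


def units_to_recover(cost, price):
    """Chips that must be sold at `price` to gross `cost` (ceiling division).
    Revenue only: per-chip manufacturing cost is ignored, so this is a lower bound."""
    return -(-cost // price)


early = revenue(10_000, 10_000)
print(early)  # 100000000

for cost in (1_000_000_000, 3_000_000_000):  # the $1-3 billion range from the post
    # development cost, chips needed at $10k, fraction covered by the first 10k chips
    print(cost, units_to_recover(cost, 10_000), early / cost)
```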

Just my 0.02.

sPh

Compilers *are* hard - Ask MIPS or HP by Michael+Snoswell · 1999-04-04 00:34 · Score: 1

MIPS and PA architecture chips are the architecturally closest *working* chips to Merced that we have today.

I remember all the trouble MIPS had when they rolled out the R10000 chip. Initial performance was not up to spec: the early estimates were pretty much correct on the SPECfp numbers, but underestimated *how long* it would take to get those numbers. It took a couple of years for the compiler people to wring out the best performance (OK, clock speed was off too, but that was not the sole reason, nor were the internal CPU wars within MIPS/SGI).

Now, the Merced is more complex than the R10000 (and at least the R10K has some *vague* similarity to the R8K, and the PA architecture has been around for many years, so these companies had compiler writers experienced in some of the problems they were up against). Intel is starting from scratch here. I'd say that once they've done first tapeout and have silicon in their hot little hands, it'll be at least a year before the compilers get close to the performance they hope for.

Meanwhile, IA32 will be up to similar spec and Alpha, PA and MIPS (and SPARC perhaps) will be serious contenders.

Just last week or so, MIPS and HP announced they were reviving their CPU development for a further year or so (i.e. another generation), rather than trusting all to Merced (I assume that means the last MIPS or PA chips now arrive in 2003-2005). What news did Intel give these guys for them to decide to make such an announcement??

cheers

Michael Snoswell

--
pithy comment

Only half right by Doug+Merritt · 1999-04-03 05:57 · Score: 4

Certainly the compiler is known to be a difficult and important issue with Merced, so that part is sort of right, although I don't see that as a reason for them to slip ship dates.

But I don't know where he got this idea that Merced automatically makes all applications multi-processor ready; that's just plain wrong. High-end processors have had multiple execution units for many years, which gives them a small amount of very fine-grained parallelism: on average perhaps two instructions can be executed at once. Sometimes, when you're lucky, it can be more than two for a short burst. Merced will *not* be able to keep all 7 of its execution units busy 100% of the time, but it may get lucky and do so for an instant every once in a while, if Intel's compilers are really good.
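Why the average stays low can be seen with a toy model, assuming every instruction takes one cycle and there are unlimited execution units. The six-instruction block and the `asap_levels` helper below are hypothetical; the earliest cycle of each instruction is one more than the latest of its inputs, so the longest dependence chain sets the minimum cycle count no matter how many units exist.

```python
def asap_levels(deps):
    """deps maps each instruction to the instructions whose results it reads.
    Returns the earliest cycle (0-based) each instruction can issue, assuming
    one-cycle latency and unlimited functional units."""
    level = {}

    def visit(insn):
        if insn not in level:
            level[insn] = 1 + max((visit(d) for d in deps[insn]), default=-1)
        return level[insn]

    for insn in deps:
        visit(insn)
    return level


# Hypothetical block: a = load x; b = load y; c = a + b; d = c * 2; e = load z; f = d + e
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": [], "f": ["d", "e"]}

levels = asap_levels(deps)
cycles = max(levels.values()) + 1
print(cycles, len(deps) / cycles)  # 4 1.5
```

Even with unlimited units this block averages 1.5 instructions per cycle, because the a, c, d, f chain is four instructions long. Real code adds cache misses, branches, and multi-cycle latencies on top of that.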

None of that has anything whatsoever to do with multiple cpu's. The situation with those will be unchanged from the situation today with multiple e.g. Pentiums: applications won't take advantage of more than one cpu unless they are explicitly coded to do so.

Therefore the conclusion of the article is dead wrong: the business model won't change, because he just misunderstood the issue with parallelism.

--
Professional Wild-Eyed Visionary

The man is clueless; EPIC = VLIW by stripes · 1999-04-04 01:57 · Score: 1

My summary of EPIC vs. VLIW vs. superscalar (note: I use the term "functional unit" to mean "thing that can execute some kind of instruction"; more functional units means a faster CPU. It's an oversimplification, but useful in this context):

Superscalar CPUs execute a stream of instructions with no information about data dependencies or execution unit dependencies. They figure it out on the fly. You can change the latency of functional units, or the number of functional units, and still run the same code. Many transistors are used to figure out the dependencies on the fly.
VLIW executes a stream of instructions marked to show data and functional unit dependencies. No changes to functional unit latencies or the number of functional units can be made without risking (almost 100% risk, in fact) breaking old code. No transistors are used to figure out dependencies on the fly; all are used to actually do work. For a given number of transistors, a VLIW should have more execution units than a superscalar RISC or CISC, at the cost of having no binary upgrade path.
EPIC reads a stream with data dependencies marked, but no information about execution units. You can alter the number of functional units and their latency and still execute the same code. Many transistors are needed to figure out functional unit use on the fly, none to figure out data dependencies on the fly. For a given number of transistors, an EPIC should have more functional units than a superscalar RISC or CISC, but fewer than a VLIW. It pays a cost relative to the VLIW for having an upgrade path. The compiler pays a cost relative to superscalar designs for having to find data dependencies at compile time.
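A rough sketch of the EPIC half of that summary. The grouping rule here is a made-up simplification, not the real IA-64 encoding: the compiler only records which instructions are mutually independent (it starts a new group whenever an instruction reads a result from the current group), and the hardware decides how many of a group's instructions to issue per cycle. The same "binary" then runs unchanged on 1-, 2- and 4-wide machines. Only read-after-write dependences are modeled, and one-cycle latency is assumed.

```python
import math


def form_groups(order, deps):
    """Compiler side: walk instructions in program order and start a new group
    (a 'stop') whenever an instruction reads a result produced in the current group."""
    groups, current = [], []
    for insn in order:
        if any(src in current for src in deps[insn]):
            groups.append(current)
            current = []
        current.append(insn)
    if current:
        groups.append(current)
    return groups


def cycles_on(groups, units):
    """Hardware side: issue each group using up to `units` functional units per
    cycle, groups strictly in order. One-cycle latency assumed."""
    return sum(math.ceil(len(g) / units) for g in groups)


# Hypothetical block: a = load x; b = load y; e = load z; c = a + b; d = c * 2; f = d + e
order = ["a", "b", "e", "c", "d", "f"]
deps = {"a": [], "b": [], "e": [], "c": ["a", "b"], "d": ["c"], "f": ["d", "e"]}

groups = form_groups(order, deps)
print(groups)  # [['a', 'b', 'e'], ['c'], ['d'], ['f']]
for units in (1, 2, 4):
    print(units, cycles_on(groups, units))  # same code, different machine widths
```

A classic VLIW would instead bake the per-cycle packing for one specific width into the code, which is why it lacks the upgrade path.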

Now my reply to allen's post:

1. The compiler has to explicitly package instructions that have no dependences into parallel issue packets. This task is currently done by hardware in superscalars. [...]

Yes, however these packages need only be free of data use dependences, not execution unit dependencies (this is the big difference between EPIC and traditional VLIW).

For this to work right, almost all instruction, pipeline and functional unit characteristics have to be exposed. For example, for software pipelining (a technique to overlap instructions from different iterations of a loop) to work, pipeline latencies and functional unit resource constraints at each time step have to be carefully considered.

To get maximum performance this is correct. For a "normal" VLIW you need it just to get a working program. This difference is important. If you own a Multiflow (one of the defunct commercial VLIWs) and you upgrade its CPU, all of your old programs are incorrect (different load latencies), and even if you managed to compile your code to work with both load latencies, you still can't use more adders per cycle, because the exact instructions that are executed each cycle are set in the code.

If you upgrade from a Merced with three integer execution units and two load units to one with six integer units and one load unit, your old programs continue to work. They may run faster or slower, but they still work.
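A toy model of that difference, with a hypothetical `read_after_load` function and placeholder register values 0 (old) and 42 (loaded). Code is scheduled for a 2-cycle load, so the use sits in cycle 2. It is then run on a machine whose load takes 3 cycles: without interlocks (classic VLIW) the use silently reads the stale value, while with interlocks (the EPIC behavior described here) the hardware stalls and the answer stays correct. Only the longer-latency case is modeled.

```python
def read_after_load(use_cycle, load_latency, interlocked, old=0, new=42):
    """A load issued in cycle 0 delivers `new` at cycle `load_latency`; until then
    the destination register still holds `old`. The compiler placed the use in
    cycle `use_cycle`. Returns (value the use sees, stall cycles)."""
    if use_cycle >= load_latency:
        return new, 0                          # result already available
    if interlocked:
        return new, load_latency - use_cycle   # hardware waits for the result
    return old, 0                              # no interlock: reads the stale value


# Code compiled for a 2-cycle load latency: the use is scheduled in cycle 2.
print(read_after_load(2, 2, interlocked=False))  # (42, 0)  the machine it was built for
print(read_after_load(2, 3, interlocked=False))  # (0, 0)   VLIW upgrade: wrong answer
print(read_after_load(2, 3, interlocked=True))   # (42, 1)  EPIC upgrade: one stall
```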

I don't think you need to know all the details of the Merced microarchitecture to get decent performance. Just move the loads as far from the uses as you can, and get as many instructions marked independent of their neighbors as possible. You may end up moving loads farther away than needed, or marking more things as "can run in the same cycle" than your Merced can gobble up, but that's OK. It won't kill you. It might even make a future Merced faster.
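A small sketch of the "move the loads away from their uses" advice, assuming an in-order, single-issue pipeline with interlocks and a few hypothetical load latencies. The `total_cycles` helper and the four-instruction programs are made up for illustration: `naive` puts the use right after the load, `hoisted` slides two independent instructions in between.

```python
def total_cycles(program, load_latency):
    """In-order, single-issue pipeline with interlocks: each instruction issues one
    cycle after the previous one, or later if one of its sources isn't ready yet.
    program: list of (dest, sources, is_load). Non-load ops take one cycle."""
    ready = {}  # register -> cycle its value becomes available
    issue = -1
    for dest, sources, is_load in program:
        issue = max([issue + 1] + [ready.get(s, 0) for s in sources])
        ready[dest] = issue + (load_latency if is_load else 1)
    return issue + 1  # cycles until the last instruction has issued


naive = [("r1", [], True), ("r2", ["r1"], False), ("r3", [], False), ("r4", [], False)]
hoisted = [("r1", [], True), ("r3", [], False), ("r4", [], False), ("r2", ["r1"], False)]

for latency in (1, 3, 5):  # three hypothetical implementations
    print(latency, total_cycles(naive, latency), total_cycles(hoisted, latency))
```

With a 1-cycle load both orders take 4 cycles, so hoisting costs nothing; at 3 cycles it is 6 vs 4, and at 5 cycles 8 vs 6. Scheduling for a longer latency than the chip has doesn't hurt, and it helps a slower-memory successor.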

2. EPIC is basically VLIW, but don't say that because all VLIWs (except DSPs like TI's C6) have been commercial failures. Besides it's not as salesirific as VeeEllEyeDoubleU.

The EPIC is basically a VLIW, except it is a little slower (for a given transistor budget) and it has an upgrade path. I think the upgrade path makes it commercially different from VLIW.

3. The compiler is crucial. So far, only Univ of Illinois IMPACT group and HP CAR group really really knows how to build one well. (My opinion of course)

Multiflow made a good one (well, it got good results most of the time, though it was pig slow). DEC eventually bought it when Multiflow went under.

Also, the compiler isn't as hard as it is for VLIW. With VLIW, if you get the latency wrong you don't run; EPIC just stalls, kind of like a superscalar. Getting max speed requires tons of work, but the same work would speed up a superscalar (by exactly the same amount, if the superscalar has the same number of functional units). I think the big difference will merely be that EPIC CPUs will tend to have many more functional units, so the bad-compiler vs. good-compiler gap will be more like a factor of 8 than a factor of 4 (or a factor of 2 on a PII/PPro/PIII).

4. For more technical info, read comp.arch, look at proceedings such as MICRO. See also www.trimaran.org. Just don't listen to the clueless.

Indeed. And that's not a slam; your opinions were well thought out. I just happen to think that requiring explicit dataflow (EPIC) is very different from requiring explicit dataflow AND instruction scheduling (VLIW).

Ignore that article by Guy+Harris · 1999-04-03 14:08 · Score: 1

To solve the branch problem, IA-64 doesn't use branch prediction. This is the really important part of EPIC. It executes both branches and once the correct path is known it discards the instructions it executed on the wrong branch.

I assume you mean that it performs speculative execution (which is what you described) in addition to having predicated instructions, e.g. speculatively executing predicated instructions before it knows what's in the instruction's predicate register, and throwing away instructions' results as soon as it finds out that the predicate register was false.

(I.e., predicated instructions aren't the same thing as speculative execution; don't automatically conclude that Merced does speculative execution merely because IA-64, of which Merced is planned to be the first implementation, has predicated instructions.)
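To make the distinction concrete, here is predication on its own, as a compile-time if-conversion run by a tiny made-up interpreter (`run_predicated` and the register names are hypothetical, not IA-64 syntax). The branch disappears: both arms are evaluated, but each result is written back only if its guarding predicate is true. Everything runs in program order and every predicate is known before the guarded instruction commits, so nothing here is speculative.

```python
def run_predicated(program, regs):
    """Each instruction is (qp, dest, fn). There are no branches: every fn is
    evaluated, but its result is written to dest only if predicate register qp
    is true (qp=None means always write)."""
    for qp, dest, fn in program:
        result = fn(regs)
        if qp is None or regs[qp]:
            regs[dest] = result
    return regs


# If-converted form of:  y = x * 2 if x > 0 else -x
program = [
    (None, "p1", lambda r: r["x"] > 0),    # compare sets p1 ...
    (None, "p2", lambda r: r["x"] <= 0),   # ... and its complement p2
    ("p1", "y", lambda r: r["x"] * 2),     # then-arm, guarded by p1
    ("p2", "y", lambda r: -r["x"]),        # else-arm, guarded by p2
]

print(run_predicated(program, {"x": 5, "y": 0})["y"])   # 10
print(run_predicated(program, {"x": -3, "y": 0})["y"])  # 3
```

Speculative execution would be a separate hardware mechanism layered on top, e.g. starting the guarded instructions before the compare has resolved.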

It's not just _a_ compiler by Guy+Harris · 1999-04-03 14:15 · Score: 1

If nothing else, the 32-bit limitations of the architecture are hurting Intel's sales in the lucrative server market (where 10's of gigabytes of RAM are common, and 100's not unknown).

...although, of course, one can support more than 4GB of RAM with a 32-bit processor, in the sense of a processor that can't handle more than 32-bit linear virtual addresses, as long as the processor's physical addresses can be more than 32 bits (as is the case with most, if not all, P6-core processors - Pentium Pro, PII, PIII) and as long as the chipset can handle it.
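The address-width arithmetic, as a short sketch. It assumes 36 physical address bits, which is what PAE on the Pentium Pro and later P6 parts provides; the 4 GiB per-process figure is an upper bound, since the OS normally reserves part of each process's linear space.

```python
GiB = 1 << 30


def addressable_gib(address_bits):
    """Memory reachable with the given number of address bits, in GiB."""
    return (1 << address_bits) // GiB


linear = addressable_gib(32)    # one process's linear address space
physical = addressable_gib(36)  # assumed 36-bit physical addressing
print(linear, physical, physical // linear)  # 4 64 16
```

So a single process would need at least 16 different 4 GiB views to touch all of that memory, which is the remapping inconvenience discussed below.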

It may be less convenient, as one might have to have a process manually map stuff into and out of its address space if you want a single process to use more than 4GB of RAM (as opposed to, say, having file systems use it as a buffer cache, although that may also involve switching mappings), but it's certainly still possible.

(I say "linear virtual addresses" because, whilst the x86 segmented virtual addresses go up to 48 bits, they first get mapped by the segmentation hardware to a 32-bit linear address before being used as physical addresses, if you haven't enabled paging, or before being run through the page table, if you have enabled paging; not only are 48-bit addresses not necessary for accessing more than 4GB of physical memory, they don't even help you to access it.)
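A sketch of that translation step, assuming a 48-bit segmented address is a 16-bit selector plus a 32-bit offset, and that the selector names a segment descriptor with a 32-bit base. Segment limit and privilege checks are ignored. Because base plus offset wraps at 32 bits, different 48-bit addresses can collapse onto the same linear address, so the extra selector bits add no addressable space.

```python
MASK32 = 0xFFFF_FFFF


def linear_address(segment_base, offset):
    """Protected-mode x86 sketch: the 32-bit base of the segment named by the
    16-bit selector is added to the 32-bit offset, and the sum wraps at 32 bits.
    Segment limit and privilege checks are ignored here."""
    if not 0 <= offset <= MASK32:
        raise ValueError("offset must fit in 32 bits")
    return (segment_base + offset) & MASK32


# Two different selector:offset pairs landing on the same linear address.
print(hex(linear_address(0x0000_0000, 0x0000_1000)))  # 0x1000
print(hex(linear_address(0x1000_0000, 0xF000_1000)))  # 0x1000
```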

EPIC != SMP by Guy+Harris · 1999-04-03 14:28 · Score: 1

A compiler cannot make a thread...

I was under the impression that "auto-parallelizing" compilers can convert, say, some Fortran or C/C++ code into multi-threaded code.

See, for example, this Sun white paper on their compilers, which, it appears, can auto-parallelize loops to run on multiple processors.
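Roughly what such a compiler does to a suitable loop, written out by hand as a sketch. The names are hypothetical, and a real compiler would first have to prove that the only cross-iteration dependence is the running sum. The iterations are split into contiguous chunks, each worker keeps a private partial sum, and the partial sums are combined at the end. In CPython these threads won't actually run the arithmetic on several CPUs at once, so this shows the shape of the transformation, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(data):
    total = 0
    for x in data:
        total += x * x
    return total


def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous, nearly equal (start, end) chunks."""
    step, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        end = start + step + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def parallel_sum_of_squares(data, workers=4):
    """Same loop, parallelized: each worker sums one chunk of iterations into a
    private partial result; the partial results are added after all finish."""
    def partial_sum(bounds):
        lo, hi = bounds
        return sum(x * x for x in data[lo:hi])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunk_bounds(len(data), workers)))


data = list(range(1000))
print(parallel_sum_of_squares(data) == serial_sum_of_squares(data))  # True
```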

EPIC != SMP by Guy+Harris · 1999-04-03 14:32 · Score: 1

And even then, the benefit will only affect software on operating systems that offer both user and kernel level threads like BeOS and Solaris UNIX. NT doesn't really cut the mustard.

In what fashion? Its threads may not be "both user and kernel level" in the sense that there are user-level threads that can be executed by a pool of kernel-level LWPs, with the possibility that there are more user-level threads than kernel-level LWPs, as is the case in Solaris, but I don't see why that's necessary in order to get a speedup to a threaded program by adding processors - would not the model I think NT uses, wherein every thread known to userland is known to the kernel (I ignore "fibers" here), be sufficient?

Forget about it breaking their sales model by Bruce+Perens · 1999-04-03 07:20 · Score: 1

Lots of companies have tried to protect their sales model by not manufacturing products that would break that model. Inevitably, their competitors manufactured those products and the sales model became broken anyway.

DEC comes to mind - handicapping their low-end systems so that they would not outperform the high-end ones.

Bruce

--
Bruce Perens.

Very little technical content. by stevew · 1999-04-04 02:31 · Score: 1

Intel hasn't innovated - give me a break. Just the
silicon process technology they've developed would
wipe that argument out; the other detail, managing
to get x86 to go as fast as they have generation after
generation, disproves the statement too.

Also, EPIC as detailed isn't really even an HP
invention, but rather an outgrowth of things done
by companies in the 80's such as Multiflow and
Cydrome.

The author claims that the compiler is a bitch -well it
is, but they solved most of the problems with the
compiler technology at those earlier companies - and
some of the folks doing the IA64 are graduates of same.

I WOULD worry about the scalability of the architecture
though - that WAS a problem with the Trace and
Cydra architectures. You had to recompile for
succeeding generations of hardware. I personally
don't know if EPIC solves that problem with VLIW
architectures. Anyone know if it does, and how?

Steve