Intel's Itanium Processor Explained

Re:IA64 vs x86-64 by AKAImBatman · 2000-12-04 01:49 · Score: 1

Personally, I am sad to see the stack architectures like the b6000/7000 series from Burroughs (now Unisys) die. They were incredible marvels of computer engineering that were at least a decade ahead of the register architecture machines. I especially liked the concept of tagged data which enabled the software to do rather marvellous things. Just the kind of machine that could run Java quite well.

Not that I'm disagreeing, but I always thought that the Unisys architecture would get in the way of a Java implementation. My understanding of it was that the machine tagged each sector of memory (36 bit addressing, bleech!) with either a DATASEG or CODESEG tag. The rules of these tags could not be violated, thus making data segments non-executable, and code segments non-writable (after tagging). Now I could see how this could work under an interpreted JVM, but not how it would work with a JITted JVM. Could you by chance elaborate?

--
Javascript + Nintendo DSi = DSiCade

Great Processors of Past and Present by new500 · 2000-12-03 12:15 · Score: 1

Great Processors of Past and Present has the following information :

It is expected to translate 80x86 instructions into VLIW instructions (or directly to decoded instructions) the same way that Intel P6 and AMD K5/K6/K7 CPUs do, but with a larger number of instructions issued using the VLIW design, it should be faster. However, if native IA-64 code is even faster, this may finally produce the incentive to let the 80x86 architecture finally fade away.

scary

better have that 4000 way Itty workstation to run my 16 bit apps

Re:His standards are waaay too high. by AKAImBatman · 2000-12-04 01:54 · Score: 1

Planet business. Most scalable servers have a minimum of 4GB of RAM. Most big machines have upwards of 64GBs. (Can you say E10000?) In other words, this ain't your granny's machine.

--
Javascript + Nintendo DSi = DSiCade

Itanium at Intel. by Fbelch · 2000-12-03 12:15 · Score: 2

Well.. for all of you interested in the Itanium, there is a not-too-bad press release on Intel's website here:
http://www.intel.com/pressroom/archive/releases/sp083199.htm

And.. Some Architectural Designs / information here:
http://developer.intel.com/design/ia-64/

Slishdot.org..Splashdot.org..sdot.org..Ah damn it.. /.org

Re:no room for coffee on my desk anymore by jeffy210 · 2000-12-04 01:54 · Score: 1

How about a beo.... aw, just forget it...

--
"And may your days be long upon the earth."

Re:Some highlights... by David+Greene · 2000-12-04 01:56 · Score: 1

- Predication. You read this part right? This means no more pipeline flushes for missed branch prediction. None. This is a big saver. Although Transmeta's CPUs do this (to a limited extent) with their VLIW and OS, it is still wrong on occasion (i.e., not perfect branch prediction, which Itanium will effectively provide)

Um...no. :)

Itanium most definitely does NOT provide perfect branch prediction. Predication and prediction are related, but very different, beasts.

Predication tries to get around the added penalty of a branch mispredict over and above the "obvious" penalty of executing the wrong instructions. After a branch is predicted it takes some time for it to trickle down the pipeline and compute the correct answer. If at that time (or often a bit later) the machine compares the answer and finds its prediction to be incorrect, it has already fetched, decoded and executed many instructions from the wrong path of execution. But in addition there is a penalty associated with restoring proper machine state, re-directing the fetch engine and generally getting the pipeline filled back up. This is the penalty predication eliminates.

In fact, I would say that predication performs "perfectly imperfect" branch prediction, in that the machine never executes only from the right path. Predication trades off the wasted time executing useless instructions to remove the restore/redirect/fill penalty of a misprediction and allow additional scheduling freedom. The scheduling freedom is important for a VLIW-style machine to keep the function units busy and reduce code bloat. However, if used unwisely, a predicated chunk of code can actually execute more useless instructions than a dynamically-predicting machine would, therefore offsetting the advantages of predication. This is why predication is usually reserved for hard-to-predict branches that cover short control sequences.
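To put rough numbers on that trade-off, here is a toy cost model in Python. It is only a sketch: it assumes one instruction per cycle, a fixed misprediction penalty, and it ignores the scheduling benefits mentioned above. The function names and parameters are made up for illustration and do not come from any Itanium documentation.

```python
def predicted_cycles(then_len, else_len, p_then, accuracy, penalty):
    """Expected cycles for a branch handled by dynamic prediction.

    Only the correct path counts as useful work; each misprediction
    adds a fixed restore/redirect/refill penalty (an assumed constant).
    """
    useful = p_then * then_len + (1 - p_then) * else_len
    return useful + (1 - accuracy) * penalty


def predicated_cycles(then_len, else_len):
    """Cycles when both arms are predicated: both are always executed,
    and the results from the false-predicate arm are simply discarded."""
    return then_len + else_len


# Short, hard-to-predict branch: predication wins in this model.
short_predicted = predicted_cycles(2, 2, p_then=0.5, accuracy=0.5, penalty=10)
short_predicated = predicated_cycles(2, 2)

# Long arms: the useless work outweighs the saved penalty.
long_predicted = predicted_cycles(20, 20, p_then=0.5, accuracy=0.5, penalty=10)
long_predicated = predicated_cycles(20, 20)
```

With these made-up numbers the short branch costs 7 cycles predicted versus 4 predicated, while the long one costs 25 predicted versus 40 predicated, which is the "short, hard-to-predict" rule of thumb in miniature.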

- Rotating registers. Why are these great? Usually you only have a few registers with CISC architectures. RISC has quite a bit more, but they are much smaller and you end up using them as much as the less populous CISC registers.

This just doesn't make any sense to me. What do you mean by "they are much smaller?"

Having 256 registers with the ability to cycle them means you will be hitting the L1 cache even less. While the L1 is fast, it is still at least twice as slow as hitting a register directly. This is another big bonus

As you corrected below, the number of registers is orthogonal to rotating them. The big advantage of rotating registers is their use in software pipelining, as explained in the wonderful discussion above. Note that software pipelining is especially critical on a VLIW machine for the same reason predication is -- scheduling. Is anyone noticing a trend here? :)
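For anyone who wants to see the rotation idea in motion, here is a small Python simulation. It is a sketch under stated assumptions, not a model of the real IA-64 register file: the class `RotatingRegs`, the 4-entry size and the two-stage loop are all invented for illustration. The one property it does try to capture is that after a rotation, the value written to logical register 0 shows up as logical register 1, so a later pipeline stage can pick it up without any explicit copy.

```python
class RotatingRegs:
    """Toy rotating register file: logical names map onto physical
    slots through a base offset that changes on every rotation."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _index(self, logical):
        return (logical + self.base) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # Decrementing the base makes old logical r0 visible as r1.
        self.base = (self.base - 1) % len(self.phys)


def pipelined_double(xs):
    """Two-stage software pipeline: stage 1 loads xs[i] into r0,
    stage 2 doubles the value loaded one iteration earlier (now r1)."""
    regs = RotatingRegs(4)
    out = []
    n = len(xs)
    for i in range(n + 1):          # one extra iteration drains the pipe
        if i < n:                   # stage 1 active (prolog + kernel)
            regs.write(0, xs[i])
        if i >= 1:                  # stage 2 active (kernel + epilog)
            out.append(2 * regs.read(1))
        regs.rotate()
    return out
```

The two `if` guards play the role that stage predicates play in a real software-pipelined loop: they switch stages on during the prolog and off during the epilog, so the same loop body serves for fill, steady state and drain.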

As far as the number of registers goes, yes, it is very nice to have lots of them, but it's important to be able to use them as well. Most compilers today cannot make much use of more than about 40 general-purpose registers unless they start doing "unsafe" things like putting global values into registers or using "non-traditional" architectures like register windows. Now I'm ignoring floating-point and scientific benchmarks where software pipelining can chew up registers like nobody's business. The point is that (for example) your kernel compile will not benefit from more than about 40 registers, at least with today's technology.

Some register usage studies we've done are available here and here. In particular, I suggest looking at our workshop paper on ILP, large register file tech. report and especially at our MICRO-33 paper (to be presented next week). These papers highlight how current compilers and/or architectures are artificially crippled to shoehorn programs into 32 registers. Many more can be used if some more tricks are pulled.

It sounds like Intel won't have a top notch compiler for another few years at best, and who knows when the GNU compiler will support even a fraction of the features.

What I can't figure out is why HP isn't developing (or announcing) a compiler. They have some top-notch people there who invented most of this stuff!

One very important thing to remember about IA64 is that all these nifty features are intimately tied together. It's a bit like a house of cards in that if one fails, the others will have a hard time making up the slack. VLIW implies that good scheduling is needed. Predication allows more scheduling freedom. Software pipelining allows more scheduling freedom at the cost of more temporary registers and copying. Rotating registers get rid of much of the copying. The ALAT allows more scheduling freedom and possibly more loop optimizations. See how everything works together to keep the machine busy?

Re:Everybody that has 4 Gigs of RAM, raise your ha by mce · 2000-12-03 17:04 · Score: 1

Most of our servers (I'm ignoring the little crap like print servers for the sake of simplicity) have between 1 and 4 gigs.

The 1 gig machines are pseudo-desktops, actually, in the sense that they provide desktop functionality for multiple people. The 2 and 4 gig ones are mostly used for chip development, and while we have not seen it over here yet, we have project partners that have had single ECAD processes grow past the 4 gig boundary (and no, these are not leaking memory like mad).

We also have a machine on which we develop ECAD software of our own, and guess what: we wanted to put 2 gigs in it, but were told by the vendor (HP) that 4 was the minimum. Yes, you read that right: they wouldn't sell us less than 4. What's more, even though we use it as server, HP officially call it a (technical) workstation.

--
Linux user since early January 1992.

IA64 vs x86-64 by PTrumpet · 2000-12-03 17:13 · Score: 3

I've followed the IA-64 for a while. I would have to agree that it is an entirely new beast which will take longer to develop good compilers for. The x86-64 design I have just checked up on.

You have to think about what you want out of a 64 bit architecture. To me they are 64 bit addressing, and 64 bit data.

Both architectures are capable of 64 bit addressing as far as I can see (actual implementations will probably be limited, e.g. the initial AMD chip will have 48 bits of virtual address space, I believe). How each handles moving those addresses around will be critical to performance.

The major differences are in the handling of 64 bit data. The AMD chip extends the existing size of the registers to 64 bits and adds another set of 8 registers. The Intel one on the other hand supplies *heaps* of registers. While this would put the Intel chip way in front, the downside is loading & storing registers when you change stack frames (calling a function). The rotating register stack helps, but eventually nesting of procedures can result in register spill. In some ways, the IA64 resembles a stack machine, but that's a dirty word these days - perhaps the terminology was avoided for those reasons.

Having written compilers over the years, I can definitely say that the i386 has suffered a severe shortage of registers. It's not a lot better than the pdp-11 (8 registers, 1 of which was the PC). As a result of this shortage, most languages run like a dog on the i386. They were even worse with the x86 because registers were dedicated to specific activities. It's also probably a good reason why cache performance has always been critical to getting i386 working well.

Having looked at both, my money will probably be with the AMD solution as it is an incremental design, not a revolutionary change. This affects the amount of work required to port existing code bases (OS core and compilers) to 64 bits. For example, my own OS could probably be ported to x86-64 much more quickly than to IA64.

Eventual performance differences will probably depend on the languages implemented and the programming styles applied. In my opinion, 16 general purpose registers is probably about as many as a good optimizing compiler would need for the typical C functions.

What both chips promise is the 64 bit addressing. This opens up a new realm for OS design because it allows disks and other structures to be mapped directly into the kernel's virtual address space. This is not possible with the current 4G limit because storage devices are already surpassing it. It is about time that CPU address space exceeded that of storage, as it will allow for more elegant solutions to caching, disk management and swapping.
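As a quick sanity check on the sizes involved, the arithmetic can be written out in a few lines of Python. This is just powers of two; the helper name `address_space_bytes` is made up for the example, and the 48-bit figure is the one recalled earlier in this post rather than a confirmed spec.

```python
def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


GIB = 1024 ** 3
TIB = 1024 ** 4

limit_32 = address_space_bytes(32) // GIB   # the current 4G limit, in GiB
limit_48 = address_space_bytes(48) // TIB   # 48-bit virtual space, in TiB
limit_64 = address_space_bytes(64) // TIB   # full 64-bit space, in TiB
```

So a 32-bit space tops out at 4 GiB, a 48-bit space at 256 TiB, and a full 64-bit space at 16,777,216 TiB (16 EiB), which is why even a "limited" 48-bit implementation leaves plenty of room to map whole disks.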

In the long run though, a new architecture is needed. Computing is likely to change significantly in the next 10 years with the development of AI and better ways of using computer power. Given this, the IA64 might be the one which wins out in the long run because of the totally different view of execution. It does however assume that we finally make a break from the curse of legacy computing.

Personally, I am sad to see the stack architectures like the b6000/7000 series from Burroughs (now Unisys) die. They were incredible marvels of computer engineering that were at least a decade ahead of the register architecture machines. I especially liked the concept of tagged data which enabled the software to do rather marvellous things. Just the kind of machine that could run Java quite well. It is rather curious to see the trend from highly CISC machines to progressively more RISC machines, with the burden being placed more heavily on good compiler design. Consistent with this approach, IA64 looks to be a machine that will be tightly bound to specific compiler optimization techniques, although this bothers me a little because very likely those with access to the best compilers will be the ones who get the best performance out of these beasts. Compaq because of the inherited Digital resources would have access to some of the best compiler technology on the planet. It is widely recognized that the original Bliss compiler was state of the art by miles when Digital developed it in the 70's.

Re:IA64 vs x86-64 by David+Greene · 2000-12-04 02:32 · Score: 1

You have to think about what you want out of a 64 bit architecture. To me they are 64 bit addressing, and 64 bit data.

You'll get no argument from me, though I (and I suspect most people) would say that the addressing is by far the most crucial part.

The Intel one on the other hand supplies *heaps* of registers. While this would put the intel chip way in front, the downside is loading & storing registers when you change stack frames (calling a function). The rotating register stack helps, but eventually nesting of procedures can result in register spill. In some ways, the IA64 resembles a stack machine, but that's a dirty word these days - perhaps the terminology was avoided for those reasons.

Isn't every modern general-purpose machine a stack machine, then? :) Every machine (even windowed ones) at some point has to save its local register set to the runtime stack.

In my opinion, 16 general purpose registers is probably about as many as a good optimizing compiler would need for the typical C functions.

This really depends on the compiler. See my post above for some studies in this area. To summarize, hundreds of registers can be used effectively if you pull the stops out of the compiler. Any single typical C function will probably use around 64 or so.

It is rather curious to see the trend from highly CISC machines to progressively more RISC machines, with the burden being placed more heavily on good compiler design.

The machines are only more compiler-oriented in their ISAs. I think most people simplify the CISC/RISC argument a bit too much. In some ways, machines are becoming "simpler" for the compiler's sake (fewer instructions to choose from, more registers, pipeline interlocks, etc.). However, the underlying implementations are actually becoming much more complex to take the burden away from the compiler.

With RISC (and unfortunately ignoring the pioneering work IBM and CDC did decades before anyone else), we went from lacking pipeline interlocks and requiring branch holes (both making the compiler's job harder) to adding hardware interlocks and branch prediction to full-fledged register renaming and out-of-order execution. The underlying hardware is not "reduced" in any sense of the word! The compiler's job actually got easier in the sense that (for example) scheduling is not as critical on an out-of-order machine as it is on an in-order machine. Of course it is still important, but the hardware takes some of the burden away.

With IA64, we're going to see this trend again. The first release is going to be compiler-critical (like the early RISC machines) but later generations (McKinley, etc.) are going to add in prediction, renaming, out-of-order execution and all the baggage that comes with it.

When you get down to it, the interface to the machine (the ISA) and the "bare metal" are completely decoupled. This is taken to the extreme in the Crusoe.

Re:IA64 vs x86-64 by PTrumpet · 2000-12-04 09:35 · Score: 2

I will unload my memory a little to give you a bit of info. Hope I get my facts right - it's about 18 years ago when I worked on a machine writing a Basic+ compiler for it.

The word size was 48 bits with a 3 bit tag for each word. Address size of the basic architecture was I think 20 bits which was defined mainly by the array descriptors which contained size & length of 20 bits plus 8 bits of other info. I think later versions virtualized the address space by adding some kind of virtual paged memory, but don't quote me - I didn't have much to do with that side.

The important feature was the tagged memory. With 3 bits, you could tag your data down to the individual word level. From my fading memory there were the following word types.
- 48 bit real (which also represented integer data - precision escapes me)
- Double precision real (96 bits) (the two words had to be contiguous)
- a procedure control word
- an indirect reference word
- an array descriptor
and 3 more which I can't remember - probably other kinds of descriptors. I seem to recall two kinds of array descriptors, there was a special coding for interior pointers (string pointers).
Also I believe there was a stack control word - to manage what we now call structured error handling. I find it hard to dig up much on the web about it, but there's probably a few books out there. I found this..

http://www.ajwm.net/amayer/papers/B5000.html

The processor was rather clever in the way it did things. Say you loaded a value from a location that was a procedure control word - it would go execute the function it pointed to and return the value - rather neat for algol thunks. Also if you hit an indirect reference word it would recursively load the data it pointed at.
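The load behaviour described above is easy to mimic in software. Below is a minimal Python sketch of it. Everything here is hypothetical: the tag names, the `Word` class and the dictionary used as memory are invented for the example, and the real machine used 3-bit hardware tags and microcode, not anything resembling this.

```python
from dataclasses import dataclass

# Hypothetical tag names standing in for the hardware's 3-bit tags.
DATA = "data"
INDIRECT = "indirect"      # indirect reference word: payload is an address
PROCEDURE = "procedure"    # procedure control word: payload is a callable


@dataclass
class Word:
    tag: str
    payload: object


def load(memory, addr):
    """Load a value, letting the tag on each word decide what happens:
    data is returned, indirect references are chased, and procedure
    control words are executed and their result returned."""
    word = memory[addr]
    while True:
        if word.tag == DATA:
            return word.payload
        if word.tag == INDIRECT:
            word = memory[word.payload]      # follow the reference
        elif word.tag == PROCEDURE:
            return word.payload()            # run the thunk
        else:
            raise ValueError(f"unknown tag: {word.tag}")


memory = {
    0: Word(DATA, 7),
    1: Word(INDIRECT, 0),
    2: Word(INDIRECT, 1),            # chain of two indirections
    3: Word(PROCEDURE, lambda: 42),  # an Algol-style thunk
}
```

Here `load(memory, 2)` chases two references down to the plain data word, and `load(memory, 3)` runs the thunk. The calling code never has to know which kind of word it is touching, which is the appeal of doing this in hardware.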

Also, because it was a stack architecture, most of instructions were 1 byte long - only a handful were two bytes or more which made for very simple instruction decoders. All the hard work was done by interpretation of the tagged data - the microcode is what did all this. You could actually write a program by configuring your data in the right kind of way.

Why I see it relevant to Java is that one of the banes of java is effective garbage management. With tagged data, this job would be made easier, and also the hardware type checking would relieve the interpreted/compiled code somewhat. With the trend towards more object oriented languages and polymorphism, hardware type checking is what is really needed to make such languages execute efficiently.

Clearly, the word size and so forth is a bit antique, but the basic concepts might be valuable.

I don't know a lot about CISC -> RISC optimization, but my guess is that the stack model carries quite a lot of implicit information and a CISC->RISC scheduler might be able to do quite a good job of it, especially since the set of interpretable opcodes is quite small in comparison to register machines.

I guess what I'm trying to get at is that modern languages like java and C++ are pushing the limits of register architecture machines. It's very much like the difference between Fortran and Algol in the early days. The fortran machines just couldn't cut it with languages like Algol or PL/I. Just think, they had no concept of a stack as procedures weren't reentrant (typically the return address was stored in a global location).
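To illustrate why fixed, global storage rules out reentrancy, here is a small Python analogy. It is only a sketch: Python cannot expose a return address, so the example keeps the procedure's argument in a single global slot instead, which fails in the same way. Both function names are invented for the illustration.

```python
_static_n = None   # one fixed storage slot shared by every activation


def fact_static(n):
    """Factorial with its 'local' kept in a single global location,
    the way a machine without a stack would have to do it."""
    global _static_n
    _static_n = n
    if _static_n <= 1:
        return 1
    sub = fact_static(_static_n - 1)   # the nested call overwrites _static_n
    return _static_n * sub             # so this multiplies by the wrong value


def fact_stack(n):
    """Same algorithm, but each activation gets its own copy of n."""
    if n <= 1:
        return 1
    return n * fact_stack(n - 1)
```

Calling `fact_static(4)` gives 1 rather than 24, because by the time each outer activation resumes, the innermost call has already clobbered the shared slot. A return address stored in one global location goes wrong for the same reason.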

It was a slow and unreliable old beast, but it was a marvel from a computing science point of view. I find it sad that the nuts & bolts people have dictated the way computing has headed over the past 20 years - perhaps the reason why AI hasn't taken off like it should. The year 2001 is looming and we haven't got any semblance of a HAL with us, and won't for at least another decade or more.

You comment about JIT compilers and such, but miss the point. The JVM is already a stack architecture - perhaps by adopting the data tagging techniques, and implementing the JVM in silicon, one could get a rather powerful beast that doesn't require JITing. The tagging is important to distinguish a class reference from data, especially in local variables which have a tendency to be reused. Also, don't confuse system/heap management with normal execution. Some kind of mode/privilege switch could easily assist in that kind of housekeeping activity.

I've recently gotten some feedback from fairly large real projects done in java, and a constant theme comes out of them which is that Java doesn't manage heaps particularly well. Throwing memory at the machine doesn't seem to help especially when you have a constantly changing data set - eventually you hit the working set size and things grind down to a halt.

I think we could do well to revisit some of the older architectures to see if they offer techniques that might lend themselves to modern programming.

Spewed coke on myself! by pod · 2000-12-03 13:56 · Score: 1

EPIC instructions are put together by the compiler into a threesome called a "bundle."

Haha! OK, I know everyone was thinking this as they were reading this particular passage. I know you were!

Hehe, now imagine a Beowulf... er, a roomful of these! An Itanium orgy!

--
"Hot lesbian witches! It's fucking genius!"

Comment removed by account_deleted · 2000-12-03 13:56 · Score: 2

Comment removed based on user account deletion

Sheeit... by Will+The+Real+Bruce · 2000-12-03 13:59 · Score: 2

Real cache memory?

Dolla' dolla' bills, y'all!

--
Will the real Bruce Perens Please Stand Up

Re:COMPAQ's use for it by stu_coates · 2000-12-03 17:59 · Score: 1

The likes of Sun have been able to run Oracle in pretty much that way for a long time. With their E10000 fully loaded with 64GB RAM and 64CPUs Oracle sure does fly... and that's a 64bit version of Oracle too that's capable of using all of that resource. After the initial ramp-up time, most of the data that's used often is in RAM and the disks are barely touched unless you're doing some heavy update tasks that have to hit the disk anyway.

Re:Itanium Acceptance Links by ackthpt · 2000-12-03 14:03 · Score: 3

/. article from Aug. 15 on Linux for Hammer.

MS article on IA-64 Windows.

The Register article on MS dithering on Hammer support.

Whew! I knew I saw all this crap, I just had to remember where!

--

A feeling of having made the same mistake before: Deja Foobar

Re:Compatibility Problems by vanza · 2000-12-03 18:08 · Score: 1

The Itanium uses some sort of preprocessor to translate x86 instructions to the EPIC instructions the chip actually uses.

Correct me if I'm wrong, but doesn't the P[I-II-III-4] have 3 decoding circuits so they can translate what they call "macro instructions" (the assembly lines you/your compiler create) into micro-instructions the execution unit understands?

Intel has already been doing this kind of translation for quite some time now...

--
Marcelo Vanzin

Re:AMD vs Intel Support by cfleming · 2000-12-03 14:07 · Score: 1

They can both run legacy 32bit code without recompile.

The Intel is a completely different proc. It translates the old code into native code before running it. Recompiling the code is like porting to a completely different platform.

The AMD is a fancy i86 with 64bit registers. It will run old code as any i386 would. But code has to be recompiled to use the new 64bit registers much like with MMX.

Windows has already announced support for the Itanium. But the Sledgehammer should be much easier to support if Mickeysoft decides to, and the Sledgehammer should be able to run old code faster.

Personally, I will choose whichever has the best performance/price ratio, like I always have with K6's and Celerons.

Re:AMD vs Intel Support by gunner800 · 2000-12-03 14:08 · Score: 1

My info may be a bit dated, but IIRC AMD plans on better x86 support in Sledgehammer than Intel in their Itanium. I suspect we'll see piss-poor x86 performance and great 64-bit performance on Itanium, but pretty good performance in both categories in the Sledgehammer.

For both chips, you'll need to recompile for maximum effect, but it will be less urgent to do so for Sledgehammer.

So AMD would have the advantage of better supporting existing software without a recompile/rewrite, and Intel would have the advantage of raw horsepower for those who can go to the trouble to make it work right.

My mom is not a Karma whore!

Re:Informative, but OFFTOPIC by the+Nach · 2000-12-03 12:17 · Score: 1

Hear, hear! Then again, this might complicate the ratings system a bit too much. I would only want to see 'Informative, but OFFTOPIC' comments where (score >= 2).

Pipeline flush question. by Christopher+Thomas · 2000-12-04 01:56 · Score: 2

This means no more pipeline flushes for missed branch prediction. None.

Ok, perhaps I need to re-read my textbooks, but I seem to have missed the part about branch mispredictions requiring a full pipeline flush. As far as I can tell, all that would actually happen is the speculated instructions being invalidated in-flight, with other instructions proceeding as normal. You still get a delay - it's the equivalent of a stall of as many cycles as it took to figure out which way the branch really went - but certainly not a full flush.

Is there some mechanism that I don't know about at work here, or have Sharky et al. just turned "stall" into "flush" because of miscommunication?

Re:Pipeline flush question. by David+Greene · 2000-12-04 02:09 · Score: 1

As far as I can tell, all that would actually happen is the speculated instructions being invalidated in-flight, with other instructions proceeding as normal.

True, but on most machines, this is rather late in the pipeline, so it is effectively a flush.

You still get a delay - it's the equivalent of a stall of as many cycles as it took to figure out which way the branch really went - but certainly not a full flush.

It may actually be worse than a flush if the cost of restoring the non-speculative state and redirecting the fetch is very high, not to mention the cache pollution (or prefetching depending on your luck) caused by wrong-path execution.

Re:How is this different from i.e. AMD or Alpha's? by ackthpt · 2000-12-03 12:19 · Score: 1

In a nutshell = VLIW

This allegedly would be a boon to servers. Workstations? Dunno. One thing's for sure, though, Intel would be pretty shamed to actually roll out an Alpha clone, after swiping much of the technology for the Pentium Pro and settling out of court with DEC (now a part of Compaq).

McKinley is supposed to run 32bit x86 apps, but last I heard Itanium runs them very slowly. (Yet another reference to Intel, when the PPro ran 16 bit code slower than 486 processors, eventually ironed out.)

--

A feeling of having made the same mistake before: Deja Foobar

25 Years? by ruck · 2000-12-04 02:02 · Score: 1

According to Intel, the EPIC architecture was designed with about 25 years of headroom for future development in mind (from the article).

I know that the 8086 architecture has been around for about 20 years now, but I find it very unlikely that any sort of architecture could last that long in the future. CMOS technology (and perhaps Moore's law with it) will hit a roadblock long before then, and I would hope that we'd be moving to fundamentally different technology by then (eg. molecular computing, etc.).

Besides, I don't know how anyone can claim to be planning 25 years ahead in the computer industry. When they were designing the 8086, they probably weren't saying "well, this should take us into the next millennium." Instead, they put together a chip that would run as well as possible based on the technology available at the time. The situation is no different with Itanium, and claims to the contrary are just silly.

Re:Rotating Registers... by David+Greene · 2000-12-04 02:04 · Score: 1

Thanks a lot, I never realised that adding instructions between stalled cycles could speed up the process (I always thought that these were slots 'for free', but not that it could accelerate the result). Makes sense in reality, because by using the b1/b2/b3 we are giving more temporary memory for the execution...

It is a bit weird at first, isn't it? The way I always think about stuff like this is by going back to the fundamentals. In combinational logic, a SOP or POS form has only two levels of gates and is often faster than a more minimal implementation which may have more levels of logic. Likewise, a fast algorithm is usually longer than the shortest possible to get the job done, because you usually have to sort some container or other to get the speedup.

Proving once again that bloat is not necessarily bad. :)

Compatibility Problems by Life+Blood · 2000-12-03 12:19 · Score: 4

IIRC, this may be the major downfall of the Itanium. The Itanium uses some sort of preprocessor to translate x86 instructions to the EPIC instructions the chip actually uses. It performs some optimizations as it does this to parallelize these instructions as much as possible to increase speed. Still, this means the chip will have the same sorts of problems as the Pentium Pros did: it will run significantly slower on older 32bit software.

IIRC, AMD on the other hand will be bringing out a chip which is essentially 2 32bit athlon cores stuck together and linked to produce a 64bit processor. It essentially needs no translator and runs 32 bit and 64 bit equally well. This coupled with the fact that Itanium has been going nowhere slow has me looking toward AMD for a good 64bit solution.

--

So far I've gotten all my Karma from telling people they are wrong... :)

Re:Compatibility Problems by gunner800 · 2000-12-03 14:12 · Score: 1

AMD on the other hand will be bringing out a chip which is essentially 2 32bit athlon cores stuck together and linked to produce a 64bit processor.

Wow, is there anything duck tape can't do?

My mom is not a Karma whore!

Re:Compatibility Problems by pope+nihil · 2000-12-03 12:31 · Score: 1

Ick, ick, ick!

AMD wants to breathe a little more life into an already ancient and far outdated processor core. The major reason that x86 suxors right now is that Intel hasn't figured out how to escape their legacy support. I APPLAUD them for finally doing something original.

As for AMD, they may see a bit of success initially with their x86-64, but as people migrate to the new non-legacy IA64, AMD is going to sink. It's unfortunate because I think there really should be a bit more competition in the processor market, but I just don't see people migrating to x86-64.

Re:Compatibility Problems by new500 · 2000-12-03 12:38 · Score: 1

I think I have to disagree : I would say "disappointment" rather than "downfall".

My original thought is already posted here, but to recap, I think what is occurring is a trade between Moore's law dependent compute growth and a recalcitrant software developers' world which will never just recompile (even for a price) but wants to upgrade you, with all the problems that entails.

Throwing hardware at the problem, or silicon specifically, is hardly new, and it is distasteful to many people as a solution. But also wrt your downfall scenario, yes the PPro was a flop but this was in no small part because it had (very nice) on chip L2 caches of up to 2MB, which *cost* and priced it out 'till Intel packaged the PII w/ PPro core and off chip, 1/2 core speed, typically smaller (cheaper) L2s.

Re:Compatibility Problems by Y2K+is+bogus · 2000-12-03 15:01 · Score: 1

Actually, the PPRO's problems were due to an overly small TLB cache. 16bit code hits the TLB cache pretty hard, and this was causing the PPRO to lag.

In the case of AMD vs Intel, they're both doing the same thing. AMD has been doing CISC->RISC instruction translation for about 5 years now. The Athlon is based partly on the work of NexGen, which they bought a few years ago. The real deal is that AMD has a more mature instruction translator than Intel, which should give them a leg up. Also, if the 64 really is 2 athlons stuck together, they might be doing some hairy SMP-on-chip kind of stuff that gets both execution units to work in tandem like SMP does. Intel's architecture is going to be slow because of the instruction width and memory access issues. The 32 bit instructions cannot fetch memory as fast as the 64 bit instructions because they are inherently limited to 32 bits wide of data access. This is the same issue as with MMX: it can fetch in larger quantities, which is why a simple read-write routine is twice as fast; it's twice as wide as a 32 bit read-write routine.

It's going to be a battle between 2 entirely different approaches, EPIC vs Tandem. The AMD approach has a lot of promise, however the Intel approach has a longer term benefit. Once the chipsets and memory catch up to their 3rd gen EPIC system, it'll be pretty fast. AMD may not be able to pull off a leapfrog unless they have a lot of tricky chipset work, and their 3rd gen 64 bit processor is scaled beyond the inherently bottlenecked Tandem/dual core setup.
Re:Compatibility Problems by plunge · 2000-12-04 09:40 · Score: 2

Forgive me for asking if this is stupid, but why exactly is everyone talking about AMD? I mean, SGI, among several other big companies, already HAS a working 64-bit processor on the market NOW that's worlds better than the Itanium even promises to be. So what's the big deal with Intel going 64-bit almost two years after they promised they would? I mean, I understand that SGI chips are expensive and all, but how do they not have the 64-bit market tied up already?
Re:Compatibility Problems by MrBogus · 2000-12-03 22:06 · Score: 2

I think people will go to x86-64. Especially after Linux and *BSD have optimized versions for them.

And what percentage of AMD's sales channel currently is Linux/BSD? My guess is about 5%, not much more or less than Intel. Not enough to float an entire ISA on.

What's more likely is that AMD will sell the sledgehammer into the same channels they sell their current chips - home machines designed for gamers. That means WinME and a 64-bit chip that's running in 32/16/8-bit modes most of the time, plus maybe some "64-bit optimized" video drivers and games.

Linux users of this chip win big because they get a 64-bit arch subsidized by the great unwashed masses. However, for most users Sledge's 64-bitness will be a marketing feature along the lines of MMX or 3DNow.

--

When I hear the word 'innovation', I reach for my pistol.
Re:Compatibility Problems by rabidcow · 2000-12-03 13:33 · Score: 1

intel's 64-bit solution is designed for 64-bit efficiency.
amd's 64-bit solution is designed for 32-bit efficiency.

sad as i am to say it, intel's solution will win in the end unless they try to tack on endless instruction set extensions. (who am i kidding, we all know they will...)
Re:Compatibility Problems by Life+Blood · 2000-12-03 13:08 · Score: 1

Downfall is an overstatement, perhaps I should have said "major drawback." Ah well I'm an engineer not a writer for a reason I suppose.

--
So far I've gotten all my Karma from telling people they are wrong... :)
Re:Compatibility Problems by pope+nihil · 2000-12-05 04:11 · Score: 1

Intel has a monopoly on the desktop market and low-end server market, which is one of the places Itanium is eventually headed.

SGI has been a kind of sinking ship for a few years now. Although they made really nice products, they didn't market them so well, and they were very expensive. Now SGI has all but abandoned IRIX in favor of the much more marketable Linux.

The other place Intel hopes to put Itanium is the high-end server market, where it will have to take on the already entrenched Sun Microsystems (which has had a 64-bit chip for years as well). Like SGI, Sun is really too expensive to take much of a foothold on the desktop market, although they have really lowered the prices on their low-end workstations. The other thing keeping Sun out of the desktop market is Microsoft. Sun doesn't really have any good desktop software. Linux runs on Sun, but let's face it. Linux hasn't displaced Microsoft yet, and it probably won't happen with the release of Itanium either.

Re:Isn't the ATHLON already 64-bit? by TurkishGeek · 2000-12-04 02:10 · Score: 1

The Athlon core has no relation to the Alpha whatsoever. They use the same EV memory bus architecture, and that's it. Athlon (K7) is strictly a 32-bit x86 processor; AMD's next generation "Sledgehammer" will be a 64-bit processor incompatible with Itanium.

--
Zigbee Central: A Zigbee weblog

Re:4 gb of ram eh? by ErikJson · 2000-12-03 12:22 · Score: 1

Yeah... But 2^64 is even more... About 1.84467E19 bytes in fact...Nice little addresses we're getting at.

"Hey! I'd like to put 37 over there at address 0xFFFFFFFFFFFFFFFF!"

/E

Re:MS & Intel by ackthpt · 2000-12-04 02:14 · Score: 2

The Xeon is to the P3 what the Pentium Pro was to the Pentium. Big honkin L2 cache is the common thread.

What we're talking about here, is that Intel is not in the leader role anymore. They are either neck and neck with AMD or slightly behind. This has caused Intel to react, and we've seen them doing a few face-plants by this course: Rollout-recall 1.13GHz P3, roll out P4 with massive heatsink sans multiprocessor capacity with Rambus only memory. The next P4 will be the smaller 0.13 micron process, with multi-processor capacity and probably require much less cooling. In effect, what you're getting at Best Buy right now is their Beta version.

Only two things could explain Intel foisting this dead end initial P4 on the market: (1) PR - recovering the speed crown (doubtful it's just this) (2) Use initial sales to pay off R&D costs.

Interestingly, a P4 Xeon will be available by Q2, yet the 0.13 micron process won't be up until Q3.

Skiing term, when one falls face first. Such activity requires ski buddies to "dust" him or her. Cries of, "Dust Him! Dust Him! Ski over his face!" are optional.

--

A feeling of having made the same mistake before: Deja Foobar

Re:AMD? by roju · 2000-12-03 12:23 · Score: 1

I believe however, that the point is not to interest the average, or even avid computer user. The processor is being aimed specifically at high-end jobs. The article mentions that their main competition will be from Sun... not many people have personal Sun workstations at their disposal.

Re:It is "rotating the register stack" by Mr+Z · 2000-12-04 02:21 · Score: 2

Register windows and rotating registers are two different things entirely. The former is used for context switching between functions. The latter is for hardware-assisted, software-controlled register renaming in software pipelined loops.

Register windows slide up and down, providing a (theoretically infinite) stack in the register file. Each positioning of the window provides a "context", which represents the set of registers provided to a function at the function-call boundary. The chip implements a fixed number of contexts, and if you exceed the sliding window in one direction or the other, you take a fault and the fault handler slides the context for you. Presumably, you stay within the chip's implemented contexts most of the time and avoid faults. Such a technique saves you from having to push/pop as many registers around function calls.

Rotating registers work in a modulo fashion: N registers (configured by the user on IA-64, as I recall) rotate every time a special branch is taken. (On IA-64, they have a "software pipeline branch" which triggers register rotation.) That's a completely different purpose.

A separate facility that IA-64 provides is a set of rotating predicates that can be used to provide "stage predicates." This gives you a mechanism for generating prologs and epilogs from a software pipeline kernel. This is the "avoiding bloat" bit you referred to. While this is still part of the rotating registers, it deserves special mention because it's a distinct use from the other uses of rotating registers that I've discussed elsewhere on this article.

And as for bitwise rotation, the IA-64 does provide that with the shrp instruction. You just provide the same argument to both halves of the pair.
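The rotate-by-pair-shift trick can be sketched in C. This is only a model of the behaviour described above, not the real instruction encoding or operand order: `shift_right_pair` and `rotate_right` are hypothetical helper names, and the sketch assumes a 64-bit operand width.

```c
#include <stdint.h>

/* Funnel shift: treat hi:lo as one 128-bit value, shift it right by
 * count bits (0..63), and return the low 64 bits of the result. */
static uint64_t shift_right_pair(uint64_t hi, uint64_t lo, unsigned count)
{
    count &= 63;
    if (count == 0)
        return lo;                 /* avoids an undefined shift by 64 */
    return (lo >> count) | (hi << (64 - count));
}

/* Rotate right: feed the same value to both halves of the pair, so the
 * bits shifted out at the bottom re-enter at the top. */
static uint64_t rotate_right(uint64_t x, unsigned count)
{
    return shift_right_pair(x, x, count);
}
```

For example, `rotate_right(1, 1)` moves the lowest bit around to the top bit.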

--Joe
--
Program Intellivision!


Re:itanium info for lazy readers by ackthpt · 2000-12-03 12:23 · Score: 3

Yes... They use the word solution a lot. This is definitely prepared by Intel Marketing for PHB types. Thanks.

--

A feeling of having made the same mistake before: Deja Foobar

Re:"New" Architecture by David+Greene · 2000-12-04 02:37 · Score: 1

I don't know if that's what was suggested, but if so, it would be correct.

The compiler cannot do as well as the hardware because the hardware has runtime context to guide its decisions. Unless you're cheating and running the compiler at run-time. :)

--

Re:"New" Architecture by mike260 · 2000-12-03 19:07 · Score: 1

...it is not going to be new processor tech that cuts it (if at all) but the compilers, which have to break down (probably CISC legacy) code into the parallelisations the Itty will want
Are you suggesting that a compiler can't do a better job than a CPU, given that it has around a billion times longer to think about it?

Re:Everybody that has 4 Gigs of RAM, raise your ha by Eg0r · 2000-12-03 19:18 · Score: 1

uhuh, it all depends on your app... sure, a small server of some sort will do just fine on 32MB RAM; things like linuxrouter (www.linuxrouter.org) can work on a 486: it decompresses from a floppy to 4MB RAM and uses the remainder of the 8MB RAM for its memory... awesome!

However, how do you go about solving say a 4000x4000 double float linear system on the same machine? (that's err... 8x4000x4000 = 128MB RAM for ONE matrix, if I'm not mistaken...)
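The arithmetic is easy to check. A minimal sketch, assuming 8-byte doubles (true on mainstream platforms); `dense_matrix_bytes` is a made-up helper name:

```c
#include <stddef.h>

/* Bytes needed to hold one dense n-by-n matrix of doubles. */
static size_t dense_matrix_bytes(size_t n)
{
    return n * n * sizeof(double);
}
```

For n = 4000 this gives 128,000,000 bytes, i.e. 128 MB in decimal units or roughly 122 MiB, and a solver typically needs working storage on top of the matrix itself.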

For any computation of this size, you need lotsa memory (and a fast proc ;), especially if your system is dense.

Sure, not everybody needs to do things like that, but it just gives you an idea of what memory is used for.

If you're wondering why the heck I would want to solve a 4000x4000 system, I'm doing 3-D warping/morphing; 4000 is somewhere in the range of the number of deformation vectors I end up using.

---

--
"Hasta la victoria siempre!" El Comandante

Re:4 gb of ram eh? by budgenator · 2000-12-03 19:28 · Score: 1

Let's see them run over 2GB of RAM reliably on a Pentium-class machine first. Many have tried; all have failed. No one knows if it's hardware or software, but it just isn't doable.

And worse yet, it'll still run x86 code! Get a clue: if you've got the bucks to run an Itanium, why cripple it with the sins of the past? Better post this before the machine crashes!

--
Apocalypse Cancelled, Sorry, No Ticket Refunds

Re:no room for coffee on my desk anymore by HalJohnson · 2000-12-03 19:31 · Score: 2

Yes, and they are sweet! I have one on order right now. Due to minor temporal anomalies, all I had to do was deposit one cent and wait a few billion years for the interest to accrue, then simply quantum transfer the funds back (and people thought you could only pay for dinner like this!).

Unfortunately, it seems that this post triggered an abuse of this system which quickly brought upon the collapse of the galaxy's economic system. Guess I won't be getting one, damned paradoxes.

Rotating Registers... by Mr+Z · 2000-12-03 14:15 · Score: 5

Well, it seems Sharky glossed right over this one. They don't seem to get what rotating registers are for. They just make some vague statement about them working well for streaming things or something. *sigh*

One of the chief techniques that VLIW (and EPIC) processors will use to extract parallelism from looping code is Software Pipelining. This technique extracts parallelism across multiple loop iterations by scheduling them in parallel. The most popular form of software pipelining, Modulo Scheduling, offsets the loop iterations by a fixed interval known as the initiation interval.

The minimum possible initiation interval for a software pipelined loop is limited by two factors: The resource bound for the loop, and the recurrence bound for the loop. The resource bound is determined by counting up all the resources the loop uses and finding the minimum # of cycles (ignoring dependences) that you could pack everything into. The recurrence bound is a little trickier.

The recurrence bound is the bound imposed by loop-carried dependences in the loop. That is -- dependences that feed from one iteration of the loop into future iterations. For instance, in the following loop, there's a dependence from the result written to "z" on one iteration to the calculation of "x" on the next:

for (i = 0; i < N; i++) {
    x = z ^ 3;
    y = x + 42;
    z = y * 69;
}

On an architecture with infinite resources, this loop is still recurrence bound by the path from x to y to z, back to x. So, what does this have to do with rotating registers?
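To put numbers on the two bounds, here is a small sketch of the usual arithmetic: the resource bound is the worst-case ceiling of operations over available units, and the recurrence bound is the latency around a dependence cycle divided by the number of iterations the cycle spans. All function names are hypothetical, and the one-cycle latencies in the usage note are an assumption for illustration, not figures for any real chip.

```c
/* Ceiling division for non-negative a and positive b. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Resource bound: for each class of functional unit, divide the number
 * of operations per iteration by the number of units and round up.
 * The most heavily loaded class sets the bound. */
static int resource_bound(const int *uses, const int *units, int classes)
{
    int bound = 1;
    for (int k = 0; k < classes; k++) {
        int need = ceil_div(uses[k], units[k]);
        if (need > bound)
            bound = need;
    }
    return bound;
}

/* Recurrence bound for one dependence cycle: sum the latencies of the
 * operations on the cycle and divide by the number of loop iterations
 * the cycle spans (its distance), rounding up. */
static int recurrence_bound(const int *latency, int ops, int distance)
{
    int total = 0;
    for (int k = 0; k < ops; k++)
        total += latency[k];
    return ceil_div(total, distance);
}

/* The minimum initiation interval is the larger of the two bounds. */
static int min_initiation_interval(int res_bound, int rec_bound)
{
    return res_bound > rec_bound ? res_bound : rec_bound;
}
```

With three one-cycle operations on the x, y, z cycle and a distance of one iteration, the recurrence bound is 3, so the loop cannot start a new iteration more often than every 3 cycles no matter how many functional units there are.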

Well, so far, I've just described flow dependences. If you pick up a copy of Hennessy and Patterson's Computer Architecture: A Quantitative Approach, you'll see that this corresponds to "Read after Write" hazards -- meaning a later instruction reads a result written by an earlier instruction. There are two other sorts of hazards to watch out for: Write-After-Write (two instructions writing to the same place have to write in order), and Write-After-Read (a later instruction might clobber a value read by the current instruction).

Write-After-Read hazards are particularly interesting in the case of software pipelined loops. First, some terminology: a value is live from its earliest definition to its last use. In the example above, x is live from the first statement until the second within the body of the loop. In a given loop, a value may be live for quite a long time. However, the initiation interval for the loop might be quite short. This can lead to problems, such as violated Write-After-Read hazards.

Suppose we have the following code:

for (i = 0; i < N; i++) {
    b = a[i];
    c = b + t;
    d = c + u;
    e = d + v;
    g[i] = e + b;
}

Suppose we can fit all of this into a single cycle loop on our hardware because we can do four ADDs in parallel, plus the load and the store. Notice that the instructions in the middle are just dependent on each other, and on constants that are initialized outside the loop. Notice that the final instruction uses the second-to-last ADD's result as well as the value we loaded initially.

If we try to put this into a single-cycle loop, we'll have a problem, because we'll load multiple values into b before we even get to the calculation which finds g[i]. Oops. This is because the b = a[i] from a future iteration has moved up above an instruction from the current iteration which reads b--that is, we've violated a Write-After-Read hazard. In software-pipelining parlance, this is a "live-too-long" problem. The value of b is live across multiple iterations.

In a device without rotating registers, you solve this problem by manually copying b to temporary registers. In C code, this might look like so:

for (i = 0; i < N; i++) {
    b = a[i];
    b1 = b;
    b2 = b1;
    b3 = b2;
    c = b + t;
    d = c + u;
    e = d + v;
    g[i] = e + b3;
}

Fine, except that can increase code size, and in some cases impact performance. (It is, however, the technique of choice on processors that implement a minimum of hardware, so as to save power and cost.) Rotating registers alleviate this by performing these copies implicitly whenever the loop branch is taken.
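One way to picture the "implicit copy" is a register file whose logical-to-physical mapping shifts on every loop branch, so no data actually moves. The sketch below is a toy model loosely in the spirit of that scheme; `rot_file`, `ROT_REGS` and the function names are all made up for illustration, and the real hardware details (region size, how the base is updated) should be taken from the architecture manual rather than from this code.

```c
#define ROT_REGS 8              /* hypothetical size of the rotating region */

typedef struct {
    long phys[ROT_REGS];        /* physical storage */
    int  base;                  /* rotating base, changed by the loop branch */
} rot_file;

/* Map a logical register number (0..ROT_REGS-1) to a physical slot. */
static int rot_index(const rot_file *rf, int logical)
{
    return (logical + rf->base) % ROT_REGS;
}

static long rot_read(const rot_file *rf, int logical)
{
    return rf->phys[rot_index(rf, logical)];
}

static void rot_write(rot_file *rf, int logical, long value)
{
    rf->phys[rot_index(rf, logical)] = value;
}

/* What the special loop branch does in this model: step the base back by
 * one, so the value that was visible as register r becomes visible as
 * register r + 1. Only the mapping changes; nothing is copied. */
static void rot_rotate(rot_file *rf)
{
    rf->base = (rf->base + ROT_REGS - 1) % ROT_REGS;
}
```

In terms of the loop above, each iteration writes its load into logical register 0; after three rotations that same value is readable as logical register 3, which plays the role of b3 without any explicit b1/b2/b3 moves.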

So there you have it. That's the scoop behind rotating register files.

--Joe
--
Program Intellivision!


Re:Rotating Registers... by f5426 · 2000-12-03 19:47 · Score: 2

I am lost here.

Did you really mean:

for (i = 0; i < N; i++) {
    b = a[i];
    b1 = b;
    b2 = b1;
    b3 = b2;
    c = b + t;
    d = c + u;
    e = d + v;
    g[i] = e + b3;
}

I fail to see why this is an improvement on the original code. And the additions are still dependent on each other and can't be executed at the same step. Or I misunderstood everything...

Cheers,

--fred

Re:Rotating Registers... by Mr+Z · 2000-12-04 05:08 · Score: 1

Hey, no problem! I guess it would've just been nice to at least mention software pipelining if you're going to mention rotating registers at all. :-)

In any case, no worries.
--Joe
--
Program Intellivision!

Re:Rotating Registers... by Mr+Z · 2000-12-03 20:40 · Score: 2
Uhm, the code you posted looks just like what I posted, unless my tired eyes missed something. Anyway, the point is that yes, the adds must occur in order, but the adds from one iteration can now occur in parallel with a future iteration.

Let me illustrate "graphically" what a single-cycle version of this loop might look like on an infinite resource machine. I'll use || to show instructions in parallel. On each line, the first full iteration's instructions come first; the subsequent iterations placed in parallel with it follow to the right.
    b = *a++
    b1 = b || c = b + t || b = *a++
    b2 = b1 || d = c + u || b1 = b || c = b + t || b = *a++
    b3 = b2 || e = d + v || b2 = b1 || d = c + u || b1 = b || c = b + t || b = *a++
    loop: *g++ = e + b3 || b3 = b2 || e = d + v || b2 = b1 || d = c + u || b1 = b || c = b + t || b = *a++ || if (i++ < N) goto loop
The last cycle of that mess is the actual loop "kernel", which is the part that will do most of the iterating. The kernel in this loop produces one new output every cycle -- the initiation interval is 1. This loop wouldn't've been possible in a single cycle if we didn't move b to b1 to b2 to b3 unless we had rotating registers (which would've done the same, implicitly).
Note that there is a considerable "pipe-up" to the loop kernel. They don't call this software-pipelining for nothing!
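For anyone who wants to convince themselves the schedule is right, here is a sketch that simulates it in plain C. The assumption is the one the || notation implies: within a cycle every instruction reads the values left by the previous cycle, and all writes land together. Sequential C gets the same effect by ordering the assignments so that each consumer runs before the producer overwrites its input. `pipelined_loop` is a hypothetical name, and the pipe-up and drain are folded into one loop with guards rather than written out as a separate prolog and epilog.

```c
/* Simulates the one-cycle kernel. Each pass of the for loop is one
 * machine cycle; the statement order guarantees every right-hand side
 * sees the state from the previous cycle. */
static void pipelined_loop(const int *a, int *g, int n, int t, int u, int v)
{
    int b = 0, b1 = 0, b2 = 0, b3 = 0, c = 0, d = 0, e = 0;

    /* n cycles issue loads; 4 more drain the iterations still in flight. */
    for (int cycle = 0; cycle < n + 4; cycle++) {
        /* Final stage of the iteration that started 4 cycles ago. */
        if (cycle >= 4)
            g[cycle - 4] = e + b3;

        /* Later stages first, so they read last cycle's values. */
        e  = d + v;
        b3 = b2;
        d  = c + u;
        b2 = b1;
        c  = b + t;
        b1 = b;

        /* First stage of a new iteration, while input remains. */
        if (cycle < n)
            b = a[cycle];
    }
}
```

Every g[i] should come out as 2*a[i] + t + u + v, the same as the original sequential loop. Replacing b3 with b in the store breaks that, because by then b holds a value loaded by a later iteration, which is the live-too-long problem described above.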
--Joe
--
Program Intellivision!
Re:Rotating Registers... by Glowing+Fish · 2000-12-03 15:05 · Score: 2

Okay, I am thoroughly intimidated. First I ask if anyone is really using a computer with anything near 4 gigs of memory on it, and someone says that oh yes, they have 128 gigs of RAM.

And now this. Could a Slashdot editor please post an article for us dumb people who like to read Slashdot for entertainment at 1 AM? Something along the lines of either Jon Katz telling us not to be afraid of letting our colors show, or else some guy in Japan who has built one hundred lifesized smurfs out of legos?

Just a thought.

--
Hopefully I didn't put any [] around my words.
Re:Rotating Registers... by Mr+Z · 2000-12-03 15:33 · Score: 1

Tee hee! Actually, it's kinda interesting, speaking of 4GB RAM, the next workstation I'm scheduled to get at work will have ~4GB RAM in it, and two happy UltraSPARC III CPUs in the ~800MHz range. Whee! (And to think those US IIIs came out of our fab just up the street! Whoo hoo!) The design jobs that run on my workstation really use that much RAM too. They're not my jobs though -- I'm a software guy. All our workstations are in a load-sharing queue, offering gobs of MIPS for crunching all of design's jobs day in and day out. I can kick the jobs off my node during the day as needed, though, which is nice, especially on my current workstation which only has a half-gig of RAM.

As for light, content-free entertainment, you can get that anytime by setting your threshold to -1. I find it rather amusing. :-)
--Joe
--
Program Intellivision!

Re:Rotating Registers... by f5426 · 2000-12-03 23:38 · Score: 2

> Uhm, the code you posted looks just like what I posted

Yes. I asked if it was really what you meant. :-)

I *thought* I understood what you wanted to say (each step of the computation is done in parallel, so the 'b' value used in the last step must be 4 generations old), but didn't see why the way you wrote it was an improvement...

I get it now. The logic behind the execution is, from the point of view of the processor:

First I do:

b= a[i] (Can't do anything else)

Now I have b, so I can do

c = b + t || b1 = b

As I don't need b anymore, I can fetch the next one now, so the second cycle looks like:

c = b + t || b1 = b || b = a[i+1]

Etc, etc. The b3, b2, b1, b chain is the pipeline for b values. At the end, we have one new g[i] value at each cycle. Neat.

Thanks a lot. I never realised that adding instructions between stalled cycles could speed up the process (I always thought that there were slots 'for free', but not that it could accelerate the result). Makes sense in reality, because by using b1/b2/b3 we are giving more temporary memory to the execution...

Cheers,

--fred

Re:Rotating Registers... by Mr+Z · 2000-12-03 16:24 · Score: 2

Write-After-Read hazards are particularly interesting in the case of software pipelined loops. First, some terminology: a value is live from its earliest definition to its last use. In the example above, x is live from the first statement until the second within the body of the loop. In a given loop, a value may be live for quite a long time. However, the initiation interval for the loop might be quite short. This can lead to problems, such as violated Write-After-Read hazards.

Ack, it's late and I'm tired, and I forgot to link this back to my introduction of loop-carried dependences. In this case, the way you avoid the violated W-A-R hazard is to introduce a new dependence known as an anti-dependence. An anti-dependence is a dependence on the use of data relative to its destruction; in contrast, a flow dependence is a dependence on the creation of data relative to its use.

In this example, an anti-dependence exists from g[i] = e + b to b = a[i] on the next iteration. This forms a cycle in the dependence graph, and gives us a much larger recurrence bound. This leads to an artificially high initiation interval and low performance.

We break this recurrence by inserting the moves I mentioned in the remainder of my post, or by using rotating registers. Sorry for my lameness there.
--Joe
--
Program Intellivision!


Re:AMD? by sql*kitten · 2000-12-03 19:38 · Score: 2

The question is, will developers jump on board and start recompiling?

Probably a lot sooner if Intel start giving VTune to developers for free. Which makes sense if they want to sell CPUs in competition with AMD.

Re:Different target market by jeffsenter · 2000-12-03 14:15 · Score: 1

With Itanium aimed at a niche market of real high-end stuff that often runs only a few applications, how is development for those few applications going? Is Oracle going to have support for Itanium right away?

Re:But, but, but... (OT) by ackthpt · 2000-12-03 14:15 · Score: 1

Bill claims to never have said such. Until I see where the original post/news article is refuted by the author, I'm not getting worked up over it. However, *I* made this quote ;-)

--

A feeling of having made the same mistake before: Deja Foobar

Re:Everybody that has 4 Gigs of RAM, raise your ha by SQL+Error · 2000-12-03 20:06 · Score: 1

A number of IA32 chips support 64GB of physical memory; however, a single process can only map 4GB directly. The operating system has to play funky tricks to get at different parts of memory at different times. Not unlike the days of extended/expanded memory (anyone remember that? Ugh!) So it's mainly useful on large multi-user machines.

Re:"New" Architecture by new500 · 2000-12-03 14:16 · Score: 1

Thanks for the clarification. I think I was getting in a twist because Merced was supposed to replace the > 8400 line - so I made an assumption whilst I typed without checking (the /. curse :-)

Achieving High Levels of Instruction-Level Parallelism with Reduced Hardware Complexity is the only link to the project I can find at HP Labs.

Re:Favourite Quotes... by Pulzar · 2000-12-03 14:19 · Score: 1

While we're at correcting the math of the post.. 8 operations per second is.. well... 8Hz, not 12.5 :).

--
Never underestimate the bandwidth of a 747 filled with CD-ROMs.

Re:AMD? by new500 · 2000-12-03 12:27 · Score: 1

I guess we'll have to see if Intel can get the developers excited

Well, it's not like Intel (and HP) haven't had about 5 years or so to manage that.

Like a lot of these things (PPro RISC cores to run interpreted x86 instructions), the underlying architecture seems to be there just to accelerate existing app code in silicon.

The day Microsoft, or for that matter any other software vendor, offers split upgrade paths (e.g. NT4.0SP5 for Itanium w/ bug patches, apart from tied featuritis and other effective "upgrade" lock-ins) is the day we might actually see the kind of huge leaps in performance all this new hardware keeps promising.

btw anyone know of an app that would actually want to be coded to EPIC?

Re:How is this different from i.e. AMD or Alpha's? by atrowe · 2000-12-03 12:28 · Score: 2

Umm, exactly what 64-bit AMD chip were you talking about again?

--

-atrowe: Card-carrying Mensa member. I have no toleranse for stupidity.

The most important thing by The-Pheon · 2000-12-03 12:31 · Score: 1

The Itanium will come with 128 floating point and 128 integer registers.

This is what i love about the Itanium! Finally getting away from that awful x86. You need more than 4 gpr's on modern chips.

Re:The most important thing by The-Pheon · 2000-12-03 13:29 · Score: 1

While the L1 is fast and you can attain a 90-99% cache hit rate, it is still at least twice as slow as hitting a register directly.
Re:The most important thing by Mr+Z · 2000-12-03 13:43 · Score: 1

Ok, Pheon, are you sure of this? What if your L1 cache is on chip and super fast? Isn't this just as good as having a zillion registers?

Not really. Cache memory is never truly as fast as registers. The primary reason (in a parallel architecture) is porting. A register file has ports which connect it to all of the functional units. (A port is a connection from a memory cell to a device which reads or writes that memory.) Multiple functional units can all access the register file in parallel. In contrast, most memory is single ported. Multi-porting a memory either slows it down, or drastically limits its size. When you throw in the cache tag RAMs as well for an L1 cache, you further limit its size and speed, and add a layer of indirection that simply does not exist with registers. And that's just some of the hardware reasons why L1 will always be slower.

In the compiler, things get messy as well with memory operands, as now the compiler must disambiguate references to these operands to know which operations may be safely moved past each other. In languages such as C, you have the unfortunate problem that pointers can point just about anywhere. Many compilers are unwilling to consider pointer arguments as pointing to storage that's independent of even the function's local stack frame, and so you get artificial scheduling hazards which limit the parallelism that the compiler can expose in the code.
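A tiny C illustration of the disambiguation problem, offered as a sketch rather than a statement about what any particular compiler does. The function names are made up; the second version uses the C99 `restrict` qualifier, which is the programmer's promise that the two pointers never overlap.

```c
/* dst and src might point at the same int, so the second read of *src
 * cannot be hoisted above the first store to *dst: the store may have
 * changed it. That ordering constraint is a scheduling hazard. */
static void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
}

/* With restrict the caller promises no overlap, so loading *src once
 * and reusing it is legal, and the additions can be scheduled freely.
 * Calling this with overlapping pointers is undefined behaviour. */
static void add_twice_noalias(int *restrict dst, const int *restrict src)
{
    int s = *src;
    *dst += s;
    *dst += s;
}
```

Calling `add_twice(&z, &z)` with z equal to 1 yields 4, not 3, which is exactly why a compiler without aliasing information has to keep the loads and stores in order.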

Other fun reasons: You can't do register renaming on memory locations. You actually have to do memory allocation for memory (pushing values on a stack IS register allocation). You take memory faults and cache misses at least occasionally for memory -- you NEVER do for registers.

So no, memory can never be as fast as registers. It can get close, but never quite 100%.
--Joe
--
Program Intellivision!

Re:The most important thing by nekid_singularity · 2000-12-03 13:44 · Score: 2

No, because it will still be at best half as fast as a register. That is exactly what Intel has done with the P4. Have you noticed the size of the L1 cache? It's 8k!!! The Athlon has 128k! At first I thought Intel was employing a bunch of doped-up high school jocks, but it turns out that Intel has designed the P4 L1 cache to compensate for the IA-32's horrific lack of registers. But it still ain't gonna be as fast.

--
Numbers 31:17,18 Now kill all the boys. And kill every woman who has slept with a man,but save for yourselves every virg

6.4 GFLOPS by cperciva · 2000-12-03 12:32 · Score: 3

Isn't that even more than a playstation 2?

--
Tarsnap: Online backups for the truly paranoid

Re:6.4 GFLOPS by Mr+Z · 2000-12-03 13:45 · Score: 1

Yeah, except the PSX2 doesn't quite put out enough heat to melt the DVDs you put in it...
--Joe
--
Program Intellivision!


Re:Everybody that has 4 Gigs of RAM, raise your ha by MikeTheYak · 2000-12-04 03:55 · Score: 1

Applications that are typically being used today don't usually need that much memory. That doesn't mean that the demand doesn't exist; it just means that people find a way to balance the resources they have with what they want to do. As having more RAM becomes more practical, applications that use it will pop up. Servers for high-quality streaming video, for example. Or maybe virtual 3D environments with billions of triangles. Since computers first started showing up with a few dozen kilobytes of RAM, people have always wondered whether there would be any reason to have so much memory. If you make the memory available, people will find a way to fill it.

Re:Some disinformation... by JimB · 2000-12-03 20:14 · Score: 2

Actually, if you read this closely, and have read all the OTHER "stuff" put out by Intel on this processor, you'll recognize this article as being REALLY close to one published about five years ago, BY INTEL !! Plus, it's "technical" if you're in marketing, but not really technical at all if your job is on the tech side of life ! I am disheartened that I can find such tripe on Slashdot. P.S.: The IA64 is VLIW, don't let the pukes who MARKET the thing FOOL you. EPIC == VLIW.

Re:itanium info for lazy readers by Rolu · 2000-12-03 20:14 · Score: 2

They use the word solution a lot

So, it's a solution again? And what problems come with it, then?

Re:Quantum irregularities by Pulzar · 2000-12-03 14:21 · Score: 1

Or, you could simply invert the polarity of the flux generator, that always works.

Oh, damn, there goes my company's secret.

--
Never underestimate the bandwidth of a 747 filled with CD-ROMs.

Re:just more Intel crap by itarget · 2000-12-03 20:16 · Score: 1

The whole x86 translation component will allow 32bit x86 code to run on it as-is.

It's slow and it's cruft, but legacy apps don't need to be recompiled.
---
Where can the word be found, where can the word resound? Not here, there is not enough silence.

--

"Where shall the word be found, where will the word resound? Not here, there is not enough silence." -T.S. Eliot

Re:AMD? by jeffry_smith · 2000-12-04 04:25 · Score: 1

> Microsoft has already released Windows Whistler Professional and Advanced Server versions for IA-64.

Interesting that searches at www.microsoft.com don't show any signs of this. All I could find were some beta stuff around driver kits and Visual C++.

Of course, www.ia64linux.org has all the linux stuff, right out there in the open.

Re:AMD? by itarget · 2000-12-03 20:19 · Score: 1

Maybe the itanium will flop... then I can pick it up for cheap and just run a IA64 *nix on it. ;)
---
Where can the word be found, where can the word resound? Not here, there is not enough silence.

--

"Where shall the word be found, where will the word resound? Not here, there is not enough silence." -T.S. Eliot

Re:You must not work in industry. by baldusi · 2000-12-04 04:36 · Score: 1

Well, I wouldn't call the i740 bleeding edge, the i810 a price/performance champion, or the i820 or i840 a robust platform. Recall how late the P!!! and 820 were (reliability does include promised launch dates). Consider how they crippled the Celeron by keeping its FSB at 66MHz. Recall how they had to cede the whole server/workstation chipset market to ServerWorks just because they had to use RDRAM. No, I'm not the typical slashdotter, and yet I've lost all my respect for Intel.

Re:Everybody that has 4 Gigs of RAM, raise your ha by logicTrAp · 2000-12-03 20:22 · Score: 2

Chips from the PPro onwards have support for addressing up to 64GB of memory, even tho a single process can only address 4GB. This is useful for programs like databases which can run multiple instances to split work up. The extensions are called "PAE" if you want to go browse the source code.
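The 4GB and 64GB figures fall straight out of the address widths: 32 bits of per-process addressing versus 36 bits of physical addressing. A quick sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes addressable with the given number of address bits (must be < 64). */
static uint64_t addressable_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* The same figure in GiB (2^30 bytes); meaningful for bits >= 30. */
static uint64_t addressable_gib(unsigned bits)
{
    return addressable_bytes(bits) >> 30;
}
```

So 32 address bits give 4 GiB and 36 bits give 64 GiB, which is where the sixteen-fold gap between what one process can map and what the machine can hold comes from.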

That Slow!? by addaon · 2000-12-03 14:26 · Score: 1

From the article: For more integer-oriented tasks, where there are few instructions with multiple operations, running eight operations per second is the theoretical maximum.

Now, I've never been particularly keen on Intel, but I've worked with vacuum-tube systems that can top eight ops/sec. ;-)

--

I've had this sig for three days.

Re:Everybody that has 4 Gigs of RAM, raise your ha by SQL+Error · 2000-12-03 20:23 · Score: 1

[Raises hand]

My Win98 machine, which I use only for playing games, has 384MB. My old Linux box has 384MB. My new Linux box has 1GB.

Work machines? My Sun Ultra2 has 1GB. My Linux box has 1GB. My Sun E250 had 768MB, but another department "borrowed" it a year ago and never gave it back.

Servers? I don't know about all the little servers scattered about the place, but they typically start at 512MB. New installs start at 1GB. The enterprise servers range from 12GB to 64GB. And these are all Unix. No NT anywhere.

Anyone who works extensively with databases, simulations, GIS or imaging would laugh at 256MB. There's not that many people who have 4GB on the desktop yet, but that will change over the next 2-3 years as memory becomes cheaper. (Plenty of people who want 4GB on their desks, it's just too expensive for most right now.)

Crap on one count. by G-funk · 2000-12-03 20:23 · Score: 1

Every Intel processor from the PPro/P2 (and some late P1s, I think) has 4MB and 2MB paging options (as opposed to the default 4K), allowing access to the extra (36-bit) address bus and much more than 4GB of RAM.

See http://www.x86.org/articles/2mpages/2mpages.htm

on what used to be the intel secrets web site.

Gfunk

--Gfunk

--
Send lawyers, guns, and money!

*yawn* by mr_typo · 2000-12-03 14:27 · Score: 1

"There's a technical piece [at Sharky Extreme]..."
sure, it throws around fancy words, and uses working metaphors that a 4 year old would understand. One has to read 20 lines to get one useful piece of information from the article.

"The Itanium may not consistently run 20 operations per cycle, but the potential is there and proper coding and compiling should yield efficient usage of the CPU."
Intel had the core ready years ago, but they failed to implement a working assembly compiler until recently. I wonder how well it _REALLY_ works, and if there will be any alternatives to Intel's own compiler. It's difficult to imagine that others would come up with alternatives in less than a year, when it took Intel years to implement it in the first place. At least if Intel won't open source their compiler.

Re:How is this different from i.e. AMD or Alpha's? by realberen · 2000-12-03 14:28 · Score: 1

I guess we can close this discussion as other comments have covered this topic more in depth.

Re:MS & Intel by ackthpt · 2000-12-03 20:25 · Score: 2

Hey doofus, Intel isn't trying to sell IA64 chips in Best Buy

Just like Intel wasn't targeting the home market with the P4, just before we read about Best Buy recalls P4 systems. Ok, so Intel lied and they really are trying to sell the P4, not next year, but immediately upon rollout to Jane and Joe Consumer. Don't believe for a half a second that by the end of next year you won't see Itanium systems on sale at Best Buy. Intel has clearly changed their strategy, illustrated by numerous P4 price cuts prior to the Nov. 20 formal announcement. They can just as easily readjust their pricing for Itaniums.

If those lousy imitators over at AMD are selling 64 bit processors on ASUS motherboards with VIA chipsets and DDR SDRAM through Best Buy, you can damn well bet Intel won't sit quietly, particularly after their P3 1.13Ghz fiasco trying not to be shown up by that lousy imitator.

--

A feeling of having made the same mistake before: Deja Foobar

A little history... by haroldhunt · 2000-12-03 20:27 · Score: 1

There seems to be a lot of excitement and debate regarding the performance of 64 bit code on the new Itanium in comparison to the performance of 32 bit code. In all the excitement some people seem to be forgetting that the first 32 bit 80386 was released in 1985, while the first 32 bit consumer-oriented operating system wasn't released until 1995.

In other words: don't hold your breath waiting for closed-source vendors to recompile their code.

Re:Some highlights... by Mr+Z · 2000-12-03 14:44 · Score: 1

Not that you could do rotates from C or anything...

--
Program Intellivision!

Re:Critique of the Itanium. by ShoeHead · 2000-12-03 14:46 · Score: 1

If intel makes the price of their compiler high, and no one buys it, which sticks the industry with bad performance, no one will buy the processor.

If there is no good/cheap compiler, no one will buy the CPU. If Intel does not make the compiler cheap, there will be no good/cheap compiler.

MS & Intel by ackthpt · 2000-12-03 12:34 · Score: 5

Already noted somewhere in the past, but here it is again. Microsoft is porting Windows over to the new processors (McKinley and Itanium(?)) but has dithered on whether they will do a 64 bit Windows implementation for the AMD 64bit [Sledge]Hammer. Not to worry, as your suddenly unfashionable 32bit code should still run fine on the Hammer, but you'll probably have grief trying to run anything compiled for 64bit addressing under 32bit Windows.

What will dictate the success is whichever is more cost effective (read: Cheap) to consumers and purchasing agents. If AMD is dominating the shelves at Best Buy, Circuit City, et al and Itaniums move like the P4 is, you can kinda see the writing on the wall. This is the brink and AMD and Intel are heading toward it, tune in next year and watch this *EXCITING* HiTech drama play out!

Popcorn mandatory, butter and salt optional.

--

A feeling of having made the same mistake before: Deja Foobar

Re:MS & Intel by bcaulf · 2000-12-04 05:01 · Score: 1

The Xeon is to the P3 what the Pentium Pro was to the Pentium. Big honkin L2 cache is the common thread.

!! PIII Xeon, PIII and PPro are all the same core (let's call it 80686, or better Sexium) with various process sizes and cache architectures. None are a direct relative of the Pentium (80586). There was never a large cache version as such of the 586, since L2 cache was always on the motherboard.

Anyway I got a couple of pages into this review and found that the reviewer is an idiot. 64-bit CPUs are not a new phenomenon and an Itanium review shouldn't start with a long-winded justification of using a 64-bit word size.

Re:MS & Intel by leviramsey · 2000-12-03 13:09 · Score: 1

I seem to remember hearing that MS was having ungodly amounts of trouble porting the NT kernel to IA64.

We've seen it before... by Tom7 · 2000-12-03 12:39 · Score: 2

The Pentium Pro and Pentium II both run 16-bit code "slower" (clock for clock) than their predecessors. All it takes is higher clocks and an end-of-life on the P4 and Itanium will take over handily just like the PPro and PII did.

Coding for a 64 bit CPU by XneznJuber · 2000-12-03 12:39 · Score: 3

If you have the option of 32-bit compatibility, it may not be worthwhile to migrate existing code to 64-bit. Converting code to 64-bit makes sense if you plan on using huge files or a huge address space. Converting to 64-bit also makes sense if you can utilize efficient 64-bit integer types or other 64-bit processor features and performance that would be otherwise unavailable. Keep in mind that there are also downsides to 64-bit programs that result from the increased program memory usage because many basic data types expand from 32-bit to 64-bit quantities. Also, you may need to test and support both a 32-bit and 64-bit version of your code when a single 32-bit version would work as well. For most existing X applications, unless porting to 64-bit is required, using 32-bit compatibility is an appropriate option. For libraries, the choice of whether to support 64-bit is based on the needs of the library customers. Since a 64-bit application may require various libraries, providing 64-bit library implementations is generally a good idea even if not currently needed.

Re:Coding for a 64 bit CPU by ~MegamanX~ · 2000-12-03 22:48 · Score: 1

Yes, but in this case:

- Maybe the compiler can make good use of the extra bandwidth even if the code wasn't designed for it.

- Recompiling the code will remove the translation overhead

- Rewrite/recompile our java virtual machines and all of our java code will be instantanously happy ;)

I tend to like the idea of a new architecture better than the amd tandem (2x x86s) solution...

--
phobos% cat .sig
cat: .sig: No such file or directory

Re:Coding for a 64 bit CPU by ~MegamanX~ · 2000-12-03 22:52 · Score: 1

public String sorry() {

    String wError = "instantanously";
    StringBuffer wCorrect = new StringBuffer();
    char wInsert = 'e'; // from www.dictionary.com

    // Splice the missing 'e' in after the first nine characters ("instantan").
    wCorrect.append( wError.substring( 0, 9 ) );
    wCorrect.append( wInsert );
    wCorrect.append( wError.substring( 9 ) );

    System.out.println( "Sorry... English is not my first language..." );

    return wCorrect.toString();

}

--
phobos% cat .sig
cat: .sig: No such file or directory

Re:Coding for a 64 bit CPU by heh2k · 2000-12-04 00:55 · Score: 1

the *only* types that're expanded on 64bit arches are pointers. ints are still 32bits!! yes, the pointers take up twice as much space, but regular ints do NOT. that is why you can't store a pointer in an int on 64bit cpus (although, you shouldn't ever do that anyway).
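
If you want to check the claim on whatever box you are sitting at, here is a rough sketch that asks the platform's C ABI for its native sizes through Python's `struct` module. The variable names are mine, and the output depends on where you run it. As far as I know, on the usual LP64 Unix systems `long` also grows to 8 bytes along with pointers, while 64-bit Windows keeps `long` at 4, so "only pointers" is not quite the whole story everywhere.

```python
import struct

# Native sizes, in bytes, as the platform's C compiler would lay them out.
sizes = {
    "int": struct.calcsize("i"),
    "long": struct.calcsize("l"),
    "pointer": struct.calcsize("P"),
}

# A pointer can only be stashed in an int if it is no wider than the int.
pointer_fits_in_int = sizes["pointer"] <= sizes["int"]

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
print("pointer fits in int:", pointer_fits_in_int)
```

On a 64-bit machine the last line should come out False, which is exactly why the pointer-in-an-int trick breaks.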

Re:How is this different from i.e. AMD or Alpha's? by atrowe · 2000-12-03 12:41 · Score: 2

vaporware -

A sarcastic term used to designate software and hardware products that have been announced and advertised but are not yet available.

--

-atrowe: Card-carrying Mensa member. I have no toleranse for stupidity.

Re:AMD? by eric17 · 2000-12-04 04:52 · Score: 1

"btw anyone know of an app that would actually want to be coded to EPIC?"

I wrote a program that wants to be ported to EPIC, but it's going to have to learn to make money option trading before it gets the goodies.

Re:4 gb of ram eh? by baldusi · 2000-12-04 04:52 · Score: 1

Could you point out _which_ were those failed attempts? 'Cause I'm a bit skeptical about your phrasing.

Questions by selectspec · 2000-12-03 20:33 · Score: 2

Is the L1 cache a redundant subset of L2? I assume L2 is a redundant subset of L3. Or is each cache independent? Anyone know? Sounds like this chip is going to starve on memory.

--

Someone you trust is one of us.

Re:AMD? by jeffry_smith · 2000-12-04 05:31 · Score: 1

Yes, they gave out a "preview" in July, but the line I was questioning was:

Microsoft has already released Windows Whistler Professional and Advanced Server versions for IA-64.

In fact, searches on "Whistler Advanced Server" turn up nothing, and "Whistler Professional" turns up an article on winlogo future requirements, but neither search turns up anything on Itanium.

a search on:
itanium whistler advanced
gives some pages where they talk about itanium support, Whistler coming downstream, and Windows 2000 Advanced Server and Windows 2000 Data Center Server.

devil is in the rotating register file by Norge · 2000-12-03 20:52 · Score: 1

The current head of x86 development at Intel, John Shen (I think that's his title, anyway) is a former professor here at CMU. Another professor of mine was talking with John some time ago, and John mentioned that several of the advanced features in IA64 -- such as the rotating register file mentioned elsewhere in the comments and predicated execution -- have made the Itanium a bear of a chip to implement. I for one am a little skeptical that Intel will be able to make a really fast processor out of this architecture any time soon.

Ben

Re:Everybody that has 4 Gigs of RAM, raise your ha by juuri · 2000-12-03 14:48 · Score: 1

Wrong market.

Itanium isn't for the desktop market. It's for the large server market... this is Intel's last big chance to get rid of other big iron makers (like Sun; or at least I think so).

With that said, all the "production" servers I help look after have at least 1gig of ram per cpu. In the case of DB boxes that goes up to 2gig per cpu... Oracle eats memory for lunch, dinner, breakfast and sometimes as a snack. With typical DB size exploding over the last couple of years the ability to have 4+gig ram per cpu is a definite need.

--
--- I do not moderate.

Re:But, but, but... by Kjella · 2000-12-04 06:38 · Score: 1

So "1.8Meg should be enough for anybody". :)

Gee, I'm glad you didn't design Linux, 2046.2Mb worth of overbloated kernel, you'd have to take bloating classes at M$

Kjella

--
Live today, because you never know what tomorrow brings

So what's better for Linux? by selectspec · 2000-12-03 21:02 · Score: 2

If Linux is the target platform, which chip is better? Compiling Linux in 64bit seems straightforward enough. So wouldn't the Itanium be the better choice, as it's less backwards compatible? The AMD seems more versatile, but with Linux, you can compile out the 32 bit problem. Both chips seem like they'll starve on too little cache.

--

Someone you trust is one of us.

Re:But, but, but... by The_Messenger · 2000-12-03 14:51 · Score: 1

fons.

All generalizations are false.

--
I like to watch.

Re:AMD? --- Missing the point. by Semi_God · 2000-12-03 21:16 · Score: 1

After reading this far I think that you as well as many others are missing the point. This chip is NOT aimed at the same market as the new AMD x86-64 chip. The new Intel chip is aimed at the server market where there is a need for faster machines with the ability to process more and more data everyday.

Intel is introducing the future of computing with this chip. It is doing what it did years ago when it introduced its first x86 chip to the public. Intel is not improving on existing technology; they are creating new technology. While the new Intel chips may not run Quake 3 as fast as current AMD CPUs, that is fine. It wasn't invented to run Quake. The only reason it will run x86 code at all is to ease the transition.

All in all I think the comparisons are moot. The two are aimed at different segments of the IT community. Although the comparison is inevitable, it adds nothing to the discussion, only confusion for those that didn't read the Sharky article.

Re:Quantum irregularities by Listen+Up · 2000-12-03 14:51 · Score: 1

Eh..What? This was marked as a 2 for what reason? Okay, there is no such thing as a "flux capacitor" you morons. Has anybody had Physics? Ha ha ha..We laughed about this when I was in Physics II in college. Yeah and maybe if I drive my DeLorean fast enough and at just the right time and a perfectly placed lightning bolt hits my car I might travel Back to the Future. Ha ha ha. Flux Capacitor...Torque Inversion Matrix...don't even get me started.

Re:How is this different from i.e. AMD or Alpha's? by athlon02 · 2000-12-03 21:18 · Score: 1

*cough* www.sandpile.org *cough* under x86-64, aka AA-64... hardly vaporware. Itanium may be 64bit *IF* and when it ever comes out, but you cannot deny it will be a very slow process getting it accepted... First off, x86 has so much market share, it makes it hard to get anything else out there that will be easily accepted (without software to "emulate" or run x86 code at similar speeds as the real thing). Even if Itanium is meant mainly for servers where x86 competes much more against other architectures, Itanium would have to compete against x86, Alphas, Sparcs, etc... Sledgehammer & derivatives have a higher chance of being accepted and sold en masse than Itanium.

64 bits of addressing space.... by gary+bernhardt · 2000-12-03 21:30 · Score: 2

...means a limit of 18,446,744,073,709,551,616 bytes of memory (18,446,744 terabytes or 18,446 petabytes). Compare that with the pathetic 4,294,967,296 offered by 32 bit architectures.

If you bought 128MB dimms it'd still take 144,115,188,075 dimms to reach the maximum memory supported by a 64-bit architecture. A dimm is 5.25 inches long (yes, i really just measured one), so if you laid the number of dimms required end to end, you'd have a line of dimms 756,604,737,398 inches long, or 11,941,362 miles. That's roughly 1/8th the distance to the sun.

That is a DAMN lot of ram.
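
For anyone who wants to redo the arithmetic, here is a quick sketch. The constant names are mine. Treating "128MB" as 128 decimal megabytes is an assumption, chosen because it reproduces the DIMM count quoted above, and the 93 million mile Earth-Sun distance is an approximate figure I plugged in, so take the final ratio as a ballpark.

```python
ADDRESS_BITS = 64
DIMM_BYTES = 128 * 10**6     # assumption: "128MB" read as decimal megabytes
DIMM_INCHES = 5.25           # length as measured in the post
INCHES_PER_MILE = 63_360
SUN_MILES = 93_000_000       # assumption: approximate mean Earth-Sun distance

total_bytes = 2**ADDRESS_BITS
dimms = total_bytes // DIMM_BYTES            # whole DIMMs needed
miles = dimms * DIMM_INCHES / INCHES_PER_MILE
fraction_of_sun = miles / SUN_MILES

print(f"{total_bytes:,} bytes")
print(f"{dimms:,} DIMMs")
print(f"{miles:,.0f} miles end to end")
print(f"about 1/{1 / fraction_of_sun:.1f} of the way to the sun")
```

If you count a DIMM as 128 binary megabytes (2**27 bytes) instead, the count drops to 2**37 DIMMs, which is still a DAMN lot.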

Re:You havn't been paying much attention: by Mr+Z · 2000-12-03 14:51 · Score: 2

1. Many modern CPUs perform 'Predication', often called something like speculative execution instead. Processors such as P6, K7, and EV6 all perform this optimization.

This is somewhat true but not completely. Predication is a form of speculative execution, but is qualitatively different from the speculative execution that most CPUs do when they branch-predict. The problem is that these architectures don't really have a way to execute down both sides of a branch. To equal what predication provides, you'd actually need to be able to fetch down several code paths in parallel and know which instructions to discard. Icky. Predication allows that to happen in a single code path, because you can put both "if (cond)" and "if (!cond)" paths directly in parallel, or even better, "if (cond1)", "if (cond2)" ... "if (condN)".

Predication is very useful for eliminating short branches and flattening small switch-case statements into effectively straight-line code. It's a much, much, much more effective method for speculative execution than trying to fetch and execute down multiple code-paths.
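
To make the idea concrete, here is a toy model of if-conversion. It is only a sketch: the function names are made up, and real hardware nullifies the instruction whose predicate is false rather than multiplying by it. The point is just that both arms sit in one straight-line path and the predicate picks the survivor.

```python
def branchy(cond, a, b):
    # Conventional control flow: a branch decides which arm runs at all.
    if cond:
        return a + b
    else:
        return a - b


def predicated(cond, a, b):
    # If-converted form: both arms are computed in the same code path.
    then_result = a + b      # would be guarded by predicate p
    else_result = a - b      # would be guarded by predicate !p
    p = int(bool(cond))
    # The predicate selects which result is kept; there is no branch to mispredict.
    return p * then_result + (1 - p) * else_result


for cond in (True, False):
    print(cond, branchy(cond, 7, 3), predicated(cond, 7, 3))
```

The trade-off is visible even here: the predicated version always does the work of both arms, which is why it pays off for short branches and not for long ones.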

--Joe
--
Program Intellivision!


Re:Everybody that has 4 Gigs of RAM, raise your ha by antijava · 2000-12-03 14:57 · Score: 1

I could definitely make use of it. Think memory mapped files. With 32-bit address space, you're limited to 4gig files. Realistically 2gig is more common, because the upper 2gig of addressable memory is commonly reserved for the kernel. I currently have a file that I memory map that will be about a gig. This steals half of my available addressing space before my application even gets around to doing anything. Sure, it's not fully mapped and stealing real RAM, but it steals from my available virtual memory. If I had to map 2 of these files for some reason (say a compare), I wouldn't have much address space left for my program to use.
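
The budget works out roughly like this. It is a back-of-the-envelope sketch that assumes the 2 GiB user / 2 GiB kernel split described above and ignores everything else a process maps (code, heap, stacks, shared libraries), so the real headroom is smaller. The helper name is hypothetical.

```python
GIB = 2**30

address_space = 2**32            # total 32-bit virtual address space
kernel_reserved = 2 * GIB        # assumption: upper 2 GiB reserved for the kernel
user_space = address_space - kernel_reserved


def remaining_after_mappings(file_sizes):
    """Return the user address space left after mapping the files in full,
    or None if the mappings do not fit at all."""
    used = sum(file_sizes)
    if used > user_space:
        return None
    return user_space - used


print(remaining_after_mappings([1 * GIB]) // GIB)   # one 1 GiB file mapped
print(remaining_after_mappings([1 * GIB, 1 * GIB])) # two of them, as for a compare
```

With two 1 GiB files mapped there is nothing left over for the program itself, which is the squeeze being described.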

Re:wtf is "fons" by The_Messenger · 2000-12-03 14:59 · Score: 1

An acronym of my own invention. I started using it under a different account when replying to people like "ackthpshawhatever" and the ACs seem to have caught on. I'm so proud. Maybe it'll get in the Jargon File.

As for the meaning, well, let's just let it stay cryptic for a while. More fun that way.

All generalizations are false.

--
I like to watch.

Re:"New" Architecture by new500 · 2000-12-03 12:42 · Score: 1

nevermind [sic]

dammit, I do!

:-)

Re: P4 comercials OT by atrowe · 2000-12-03 12:44 · Score: 2

From what I understand, the P4 is currently aimed at workstations and servers and not consumer PC's. Intel won't be advertising the P4 because it isn't targeted for the average consumer.

--

-atrowe: Card-carrying Mensa member. I have no toleranse for stupidity.

IA32 can address 36bits (64GB) by frantzen · 2000-12-03 12:48 · Score: 1

Since the P6 core was introduced with the Pentium Pro, the processor could access 64GB of physical memory. The PTE's (Page Table Entries) have a 36bit wide space for the physical address. You can still only access 32bits of space at a time in a single application since the virtual address space is only 32 bits wide.
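
In numbers, the gap between the two address widths looks like this. It is a trivial sketch with names of my own choosing; the 32 and 36 come straight from the paragraph above.

```python
VIRTUAL_BITS = 32     # virtual address width seen by a single application
PHYSICAL_BITS = 36    # physical address width in the page table entries

GIB = 2**30

virtual_space = 2**VIRTUAL_BITS      # bytes one application can address
physical_space = 2**PHYSICAL_BITS    # bytes of physical memory the CPU can reach

print(virtual_space // GIB, "GiB per application")
print(physical_space // GIB, "GiB of physical memory")
print(physical_space // virtual_space, "full 32-bit address spaces fit in physical memory")
```

So the extra four bits only help if the work is spread over several processes, each with its own 4 GiB view.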

It remains to be seen if IA64 can really have that much physical memory. IIRC, SUN4U only has 40 address pins on the CPU.

Re:IA32 can address 36bits (64GB) by PureFiction · 2000-12-03 13:26 · Score: 2

It remains to be seen if IA64 can really have that much physical memory.

While the capability may be there, I don't think it will happen before the chip is completely obsolete.

64bit address space > 16,777,216 Terabytes of memory.

Re:Quantum irregularities by atrowe · 2000-12-03 12:49 · Score: 4

The obvious solution to the Quantum irregularity issue would be to add a thermal flux capacitor to the torque inversion matrix. This would require a slightly larger die for the CPU, but should allow for additional thermal stabilization. AMD has been doing this for several years now.

--

-atrowe: Card-carrying Mensa member. I have no toleranse for stupidity.

Re: P4 comercials OT by pope+nihil · 2000-12-03 12:51 · Score: 1

Neither is Itanium.

COMPAQ's use for it by CMiYC · 2000-12-03 12:54 · Score: 1

I had an interview with the Enterprise department at COMPAQ a few weeks ago for an EE position. They are working on a 4-proc IA64 with 64gigs of RAM.

I think about the P3-500 we have at work now running SCSI-3 with Oracle and how fast it works. I can't even begin to imagine how fast it'd be if we loaded the entire database into a RAMDISK and ran Oracle from that... (he hinted that is what they are trying to move towards).

---

Re:Favourite Quotes... by cicadia · 2000-12-03 13:27 · Score: 1

Imagine that... an MS/Intel server down for less than 3.5 days in a year!

Yeah, I saw it too... too much studying... math skills rapidly deteriorating...

- cicadia

--
Living better through chemicals

Re:"New" Architecture by Webmonger · 2000-12-03 12:54 · Score: 2

It's a new Instruction Set Architecture. IA-64 has only ever been available on one chip, and that's Merced--I mean Itanium.

And linus will become king of the world. by 1nt3lx · 2000-12-03 13:28 · Score: 1

You're absolutely right! Unfortunately, George W. Bush probably means the end of the anti-trust suit against Microsoft. Oh well. It was a nice try anyway.

A company's sales records aren't going to be dramatically upended because of the market for OpenSource operating systems.

Now I come to the flaw in your logic. If someone is going to recompile a version for AMD's optimizations then someone is also going to recompile a version for Intel's optimizations. Intel obviously has a far superior 64-bit core. Not saying that the Sledgehammer won't compete because the Athlon broadsided the industry.

--
The List of Grievances with Slashdot.

Re:Itanium Acceptance by mr · 2000-12-03 13:29 · Score: 1

> linux is the only thing that actually works with IA-64, that can't but help linux though

Let me get this straight.

The X86 based present stock of Intel chips kinda suck.
But, if linux works on the new chip, this helps linux, because the new chip will suck less.

Errrrrr.....there are already other processors that Linux runs on: MIPS/PA-RISC/SPARC/PPC and Alpha. Yet, what is the leading shrink-wrapped option? X86.

Are the non-X86 processors driving the adoption of Open Source OSes? No. And the IA-64 will affect Open Source OS sales about as much as Alpha does.

Once the IA-64 makes it to a mass market, then it will matter. Until then, the adoption will be about the same as for the non-X86 processors.

--
If it was said on slashdot, it MUST be true!

Re:AMD? by while · 2000-12-03 12:55 · Score: 1

Why not? We've discarded just about everything else from the past 20 years! When was the last time you bought an AT keyboard or an ISA expansion card? PS/2, serial, and parallel ports are being replaced by USB, and IDE is being replaced by Serial ATA.

Each time we get rid of the legacy, the reason is the same -- more headroom. We run out every couple of years, figure out some way to double it, then the process repeats. You can keep making additions to a house, but at some point you'll have to pick up and move to a different lot because you've built it up and out as far as you can.

(end comment) */ }


Re:"New" Architecture by Mr+Z · 2000-12-03 13:31 · Score: 1

It was supposed to be the first mass-market VLIW chip, though Transmeta beat them to it.

Erm, and how exactly do I program Transmeta's VLIW assembly language? What's that? I can't? Although Transmeta's TMxxxx family of CPUs uses a VLIW architecture presently, nothing constrains them to a particular VLIW instruction set or even to be VLIW on future parts, since the only interface they expose is their emulation of the x86 instruction set. They may as well have a little hamster running in his wheel in there -- as long as it cranks out x86, it won't change the instructions you program it with. (The hamster might not be as fast as the VLIW, though. ;-)

So, with that in mind, I'd say that Itanium will probably be the first mass market VLIW-like programming platform. Of course, TI's TMS320C6000 DSP was probably the first volume-shipping VLIW-on-a-chip, even if it wasn't targeted at the desktop market. The old Multiflow and Cydra machines of yore never quite got that small.

--Joe
--
Program Intellivision!


Definition of EPIC! by anoopiyer · 2000-12-03 13:32 · Score: 2

EPIC processors are capable of addressing a 64-bit memory space.

Really? I don't see why this is always the case. Can't I design a 32-bit EPIC processor? Or am I missing something here?

Itanium is only compared to 32bit archs? by Avrice · 2000-12-04 07:40 · Score: 1

Why is it in this article that Itanium - a '64bit' processor is only compared to 32bit processors. There are some 64bit processors out there, some that have been around since before the first pentium. It would make more sense to compare 64bit processors and their abilities, rather than a 64bit proc to 32bit procs . . unless there is something not being said. More power to Alpha, a real (not 5yrs behind) 64bit proc! James

--
Avrice

Re:bwahaha! by The_Messenger · 2000-12-03 22:00 · Score: 1

Okay, you win...

All generalizations are false.

--
I like to watch.

Re:His standards are waaay too high. by nightfire-unique · 2000-12-04 11:32 · Score: 1

Man I'm tired of hearing people toting the e10000 like it's all that. It's _really_ not that impressive.

--
All men are great
before declaring war

--
A government is a body of people notably ungoverned - AC

Re:You havn't been paying much attention: by PureFiction · 2000-12-03 22:07 · Score: 2

You totally got rotating registers wrong. ... They are not good because 'risc registers are small' they are good because you can make your instruction code much smaller, I.e. instead of having code that says R1=R1+R2,R2=R2+R3,R3=R3+R4.... You can say R1=R1+R4,rotate,repeat.

My apologies, I ran two points into one with that statement. The one point being rotating registers, the other being 256 registers, which is quite an increase for CISC/EPIC.
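
For what it's worth, here is a toy model of the rotation idea. It is a sketch only, not the actual IA-64 register semantics: the class and function names are made up, and all it shows is that a rename step lets the same instruction text see last iteration's value under a different register number without copying anything.

```python
class RotatingRegisters:
    """Toy register file where rotate() renames registers instead of moving data."""

    def __init__(self, size):
        self.physical = [0] * size
        self.base = 0

    def _slot(self, logical):
        return (self.base + logical) % len(self.physical)

    def read(self, logical):
        return self.physical[self._slot(logical)]

    def write(self, logical, value):
        self.physical[self._slot(logical)] = value

    def rotate(self):
        # Rename only: whatever was visible as r[i] becomes visible as r[i+1].
        self.base = (self.base - 1) % len(self.physical)


def pairwise_sums(values):
    regs = RotatingRegisters(4)
    out = []
    for v in values:
        regs.write(0, v)                          # same "instruction" every iteration
        out.append(regs.read(0) + regs.read(1))   # r1 holds the previous iteration's r0
        regs.rotate()
    return out


print(pairwise_sums([1, 2, 3, 4]))
```

The loop body never changes and never shuffles values between registers by hand, which is the code-size win being argued about.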

Re:Quantum irregularities by Skynet · 2000-12-04 12:20 · Score: 1

You're dumb.

--
Execute? [Y/N] _

Re:wtf is "fons" by The_Messenger · 2000-12-03 15:07 · Score: 1

And "your"[sic] not very verbose.

All generalizations are false.

--
I like to watch.

Re:itanium info for lazy readers by The_Messenger · 2000-12-03 15:11 · Score: 1

Does the Itanium have higher levels of synergy?

All generalizations are false.

--
I like to watch.

Re:"Itanium"? by KFury · 2000-12-03 15:13 · Score: 3

Umm, Ickle? Actually, I'm looking forward to more vaporous technologies like:

Xygen (for girls)
Ydrogen (for boys)
Eon (the low-cost clock-crippled version)
R-gon (like the PS2, not available in stores)
Elium (for Cuban markets)
Enon (cheap clone of Eon)
Adon (peripheral processor)
Ithium (big in New York)

Kevin Fox


Re:How is this different from i.e. AMD or Alpha's? by um...+Lucas · 2000-12-03 22:46 · Score: 1

Agreed... Intel sat out the RISC craze, instead reengineering their processor over and over so as to be competitive with the RISC crowd. Now, most of the world (at one point at least... remember when they were touting that everyone had signed onto their ship) agrees that VLIW will be the next revolutionary change in processing architectures. From every review and analysis I've read, VLIW could do wonders, but only if the compilers are perfect...

Still, it would be nice if they scrapped the hope of running x86 code directly on the silicon of the chip. They're basically pasting a P3 on top of the chip so that it'll be able to run some applications out of the gate... That money would be better used, I think, getting tools into developers' hands so that they can port their software to Itanium's architecture, and then maybe licensing Transmeta's code morphing engine, or else coming up with their own software-based emulator.

And so far as i remember, the PPro was indeed much faster than 486's for all tasks. It just lagged behind the Pentiums at running 16 bit code. That's intel's legacy, by now... The first generation of silicon that emerges when they build a new core is slower than the previous top of the line...

They shouldn't get railed for it, like they do around here, because the same situation has occurred before. If they didn't take those risks, we'd be running 700 MHz 486s now... But they instead took a hit here and there when they introduced the P5, which was clocked slower than the 486s of the time. Then the PPro was slower than the Pentiums. Then the P2 came out but only scaled to 2 CPUs rather than the 4 or 8 the Pro could do. Now the P4 is here and it's slower than the P3 at its intro... Give them six months and the P3 will be eating the P4's dust. Same goes for Itanium. Of course, it's not going to be the fastest out the gate, but after they tweak it a bit and let it go, if all goes well, it could blow other archs out of the water.

Re: P4 comercials OT by Anarchos · 2000-12-03 13:00 · Score: 1

Intel won't be advertising the P4 because it isn't targeted for the average consumer.

So you're saying that IBM's Websphere and Microsoft's Windows 2000 are targeted for the average consumer? Somebody's advertising beliefs are skewed...

--

"A good conspiracy is an unprovable one." -Conspiracy Theory

Re: P4 comercials OT by Smitty825 · 2000-12-03 13:01 · Score: 2

From what I understand, the P4 is currently aimed at workstations and servers and not consumer PC's. Intel won't be advertising the P4 because it isn't targeted for the average consumer.

I totally disagree. The Pentium IV does not have a SMP capable design, so the server and high-end workstation market is still being fed the P3. Also, clock cycle for clock cycle, it's slower than the P3, so it really isn't aimed at the lower-end workstation market. Right now, all it has going for it is some fancy MHz rating, which appeals to gamers and consumers that want the latest-greatest thing.

--

Doh!

"Itanium"? by Anal+Surprise · 2000-12-03 13:01 · Score: 4

I've gotten a secret list of upcoming codenames for Intel processors. Through the magic of slashdot, I'll share them with you:

Itanium
Ron
Anganese
Latinum
Opper
Ickle
Admium
Ilver
Ercury
Luminum
Agnesium
...and finally... Old

I really like this naming scheme, and I'm looking forward to using these Innovative Processors.

Re:"Itanium"? by cookieman · 2000-12-03 16:08 · Score: 1

You forget to mention Fallium :))

Cheers,

--
Just another coder...

Some highlights... by PureFiction · 2000-12-03 13:07 · Score: 3

The Itanium will probably sound like another beefed up Intel chip *yawn* without much to set it apart from the crowd. (We already have lots of 64bit chips right?)

Here are a few interesting tid bits which make the Itanium something different:

- Predication. You read this part right? This means no more pipeline flushes for missed branch prediction. None. This is a big saver. Although Transmeta's CPUs do this (to a limited extent) with their VLIW and OS, it is still wrong on occasion (i.e., not perfect branch prediction, which Itanium will effectively provide)

- Rotating registers. Why are these great? Usually you only have a few registers with CISC architectures. RISC has quite a bit more, but they are much smaller and you end up using them as much as the less populous CISC registers. Having 256 registers with the ability to cycle them means you will be hitting the L1 cache even less. While the L1 is fast, it is still at least twice as slow as hitting a register directly. This is another big bonus

- L1, L2, and L3 cache all at CPU clock speed. Most L2/L3 caches are at half speed at best.

The other enhancements, more pipelines, more ALU's, etc, are all nice but nothing ground breaking. Together with the above additions they add up to impressive performance.

The only downside with all the features is the compilers. Most of the really cool optimizations will require a compiler smart enough to translate the code effectively to take advantage of them.

It sounds like Intel won't have a top-notch compiler for another few years at best, and who knows when the GNU compiler will support even a fraction of the features.

This will be a real downer, as gcc support for Alphas, which have been around for years and years, is still far behind Digital/Compaq's Alpha compiler.

Re:Quantum irregularities by caferace · 2000-12-03 13:08 · Score: 1

42

When will? by jjr · 2000-12-03 13:35 · Score: 1

The 128-bit processor become a mainstream thing? We have to think ahead.

Everybody that has 4 Gigs of RAM, raise your hand by Glowing+Fish · 2000-12-03 13:40 · Score: 2

I know people have posted about this earlier, but I thought I would ask this as a question,
Are there really a lot of people out there who need a processor that can deal with more than 4 gigs of RAM?

Is this a Windows NT thing? Because even some pretty well used Linux/FreeBSD servers are running quite well on 486s with 32 megs of RAM and other ridiculously lowend hardware like that.

So, would anyone out there who is currently using more than even 256 Megs of RAM tell me?

--
Hopefully I didn't put any [] around my words.

Re:But, but, but... by maraist · 2000-12-05 01:09 · Score: 2

Gee, I'm glad you didn't design Linux, 2046.2Mb worth of overbloated kernel, you'd have to take bloating classes at M$

Seeing as how I said kernel AND drivers, and we're finding more than 64 meg of memory mapping for video drivers today, I think it was a very conservative estimate. Note I'm referring to HW BIOS mappings, not Linux based drivers.

-Michael

--
-Michael

But, but, but... by ackthpt · 2000-12-03 11:58 · Score: 5

4 Gigabytes should be enough for anybody!

If you don't get it you are not a nerd and should immediately proceed over to CNN where all the other cattle get their news!

--

A feeling of having made the same mistake before: Deja Foobar

Re:But, but, but... by maraist · 2000-12-03 22:48 · Score: 3

Cute, but it doesn't completely parallel the original. MS designed around 640K which purposefully limited them. The alternative was to go up to a whopping MEG!!! They arbitrarily chose the 640K (as far as I know) so as to put drivers (starting with the all important video) above that into fixed addressable regions.

Sooo.. What we should do is say Linux only supports 2Gig on most systems, then you have the BIOS mapped memory, then you have the kernel.. So "1.8Meg should be enough for anybody". :)

-Michael

--
-Michael

Re:But, but, but... by Alomex · 2000-12-06 21:44 · Score: 1

The latter part is lame since the 512K Mac is easily memory expandable.
Over Steve Jobs's objections and without his knowledge, as he never approved it...
Just trying to set the Urban Legend straight.
Fb. Bill Gates never said such a thing.
T. Steve Jobs refused to make the Mac expandable beyond 512k.
T. The Mac is expandable because of skunk works from the Mac engineers.
And since the facts in this case turn out to be beneficial for Bill Gates, I'll surely lose karma over this...

Re:But, but, but... by ishrat · 2000-12-03 12:11 · Score: 1

Well don't be cattle and come to our web site we do the cattle job for you.

--
There's always sufficient, but not always at the right place nor for the right folks.

Re:Everybody that has 4 Gigs of RAM, raise your ha by joaoraf · 2000-12-05 09:47 · Score: 1

You don't need 64bit addressing to actually address more than 4GB in one computer, but you may need it to implement distributed shared memory (with better than 32bit addressing).
I'm sure there are other applications also (single address-space OS).

Thanks for correcting my mistake by renoX · 2000-12-05 16:11 · Score: 1

I realised my mistake two hours after having posted.

Indeed the "register window" and "rotating the register" are different things.

And I was trying to clear a mistake about rotating a register vs rotating the register stack. How ironical ! :-)

It's like Comedy got an upgrade! by Nightcloud · 2000-12-06 05:40 · Score: 1

"4 Gigabytes should be enough for anybody!"
"If you don't get it you are not a nerd and should immediately proceed over to CNN where all the other cattle get their news!"

Jeesh I thought I had left the elitists back on the tech tv forums...

--
Send all information this way please...

Re:How is this different from i.e. AMD or Alpha's? by um...+Lucas · 2000-12-03 22:50 · Score: 1

x86 can't compete in the areas that Itanium is targeted at. If it could, then Intel wouldn't have spent billions to make the Itanium. Those computers in the back rooms of banks, et al, aren't saying "Intel Inside". So, it makes little sense to devise an architecture that's meant to get them into the biggest computers in the world yet still throw in x86 compatibility... it's just a non-issue. Something people don't want or need...

Intel's confused, I think... :) And I find this amusing solely because my previous post was spent defending them...

no room for coffee on my desk anymore by defaultz · 2000-12-03 11:59 · Score: 2

The Itanium was not designed for small systems, it is intended for 1 to 4000 processor workstations and servers.

A 4000 processor workstation? ;)

- - - - - - - - -

It does do bitwise rotate by Mr+Z · 2000-12-03 15:46 · Score: 2

Actually, I just looked it up, and you're wrong. On Page 4-6 of IA-64 Application Developer's Architecture Guide, Rev 1.0, it says specifically, and I quote: (emphasis mine)

The shift right pair (shrp) instruction performs a 128-bit-input funnel shift. It extracts an arbitrary 64-bit field from a 128-bit field formed by concatenating two source general registers. The starting position is specified by an immediate. This can be used to accelerate the adjustment of unaligned data. A bit rotate operation can be performed by using shrp and specifying the same register for both operands.

So there.
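A small Python model of the quoted behaviour, in case it helps. This is a sketch based only on the paragraph quoted above: the function names are mine, and treating the first operand as the upper half of the 128-bit value is an assumption, not something taken from the manual. It concatenates two 64-bit values, shifts right by an immediate, and keeps the low 64 bits; passing the same value for both operands gives a rotate.

```python
MASK64 = (1 << 64) - 1


def shrp(hi, lo, count):
    """Model of a funnel shift: extract 64 bits from the 128-bit value hi:lo.

    Assumption for illustration: `hi` supplies the upper 64 bits and `count`
    is an immediate in the range 0..63.
    """
    if not 0 <= count < 64:
        raise ValueError("count must be in 0..63")
    combined = ((hi & MASK64) << 64) | (lo & MASK64)
    return (combined >> count) & MASK64


def rotate_right(x, count):
    # Same register for both operands turns the funnel shift into a rotate.
    return shrp(x, x, count)


print(hex(rotate_right(0x0123456789ABCDEF, 4)))  # 0xf0123456789abcde
```

With different registers for the two operands, the same operation stitches together an unaligned 64-bit field that straddles two aligned words, which is the "unaligned data" use the manual mentions.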

--Joe

--
Program Intellivision!

His standards are waaay too high. by AFCArchvile · 2000-12-03 23:22 · Score: 2

"In comparison, 32-bit x86 processors access a relatively small 32-bit address space, or up to 4GB of memory."

Oh, so now 4GB of RAM is considered "small"? What planet did you come from, mister "caviar-for-breakfast"?

--
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer

itanium info for lazy readers by mr_gerbik · 2000-12-03 12:02 · Score: 4

For those of us who don't care to read 1200 pages about Itanium and EPIC. Intel sums it up here in a quick Itanium FAQ.

-gerbik

Re:itanium info for lazy readers by -brazil- · 2000-12-03 16:40 · Score: 1

Definitely! Its innovative, result-oriented design signifies a paradigm change, making it the ideal investment for B2B-focussed players in the new economy!

--
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger

It is "rotating the register stack" by renoX · 2000-12-03 23:30 · Score: 2

Not bitwise rotating a register.

This is already done in the SPARC ISA, which means that your register number is now an offset in the "window register" (stack) not an absolute offset.

It has already been investigated numerous times, and apparently you don't gain that much for the SPARC implementation.

Now for the IA64, it enables some kind of automatic software pipelining which is really nifty (if a bit difficult to understand).
The only thing I'm wondering is: at which price (how many transistors, etc) does it come?

Still avoiding code bloat while having software pipelining is really neat!
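Here is a rough Python model of what "rotating the register stack" buys you for software pipelining. Everything in it is illustrative: the class, the register numbering and the two-stage loop are assumptions made for the sketch, and real hardware uses predicates rather than `if` statements to switch stages on and off. The point is that a logical register number is an offset from a rotating base, so a value written to register 0 in one iteration shows up as register 1 in the next, and the loop body can stay one copy long.

```python
class RotatingRegisters:
    """Toy register file where logical numbers are offsets from a rotating base."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _index(self, logical):
        return (logical + self.base) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # After rotating, the value that was in logical n is found in logical n + 1.
        self.base = (self.base - 1) % len(self.phys)


def pipelined_double(data):
    regs = RotatingRegisters(4)
    out = []
    # One extra trip drains the pipeline (the epilogue).
    for i in range(len(data) + 1):
        if i < len(data):
            regs.write(0, data[i])        # stage 1: "load" for iteration i
        if i >= 1:
            out.append(regs.read(1) * 2)  # stage 2: compute for iteration i - 1
        regs.rotate()
    return out


print(pipelined_double([1, 2, 3]))  # [2, 4, 6]
```

Without rotation the compiler would have to unroll the loop and rename registers by hand in each copy, which is where the code bloat comes from.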

"New" Architecture by new500 · 2000-12-03 12:03 · Score: 1

Whoa, I have to take issue with the Itanium / IA64 being called "New". This is a descendant of the HP PA RISC development work at Hewlett Packard, onto which Intel latched itself something like 5 years ago. (in a hullabaloo of marketing fizz which IIRC scared the likes of SGI / MIPS into practically giving up.)

When the Itanium is announced as commercially available (apparently a whole bunch of 6,000 or so eval units are out there) it is not going to be new processor tech that cuts it (if at all) but the compilers, which have to break down (probably CISC legacy) code into the parallelisations the Itty will want.

Yes, I'm griping, and yes what I add is not much news to most readers here, but as /. grabs more and more readers from the broad user community (okay that's a blatant guess, but I just figure) there IS room on the small news items to say as much as to be accurate. i.e. "new processor family" sounds fine, but "new architecture" I'm not so sure about.

I always wonder if the news items are actually edited, or just picked from a bunch of incoming. Whatever. (That's also not a bitching comment, but I do wonder . . .)

Re:"New" Architecture by untulis · 2000-12-03 13:19 · Score: 1

Whatever. Whatever is right.

It was supposed to be the first mass-market VLIW chip, though Transmeta beat them to it.

You're correct that the article doesn't give HP its credit for really being where many of IA-64's ideas came from. However, HP's work was with the Playdoh project, which wasn't really connected to the PA-RISC architecture. It started there, but quickly evolved differently.

It is a new architecture. Predication, a register stack engine, the number of features designed for parallelism, certainly haven't been seen together or in a "mass-market" processor. Certainly the process technology isn't going to be special, given its size and clock speed.

Also, the code the compilers are going to compile is still going to be in source form. It's not "CISC legacy". The x86 compatibility is done by an on-chip unit, not software.

Re:"New" Architecture by clem.dickey · 2000-12-04 00:01 · Score: 1

It's interesting to see how the article cuts HP out of the Itanium loop. HP used to be acknowledged as a co-developer. Now, according to the article, "IBM, HP, Compaq, and SGI will all offer Itanium solutions alongside their own." I wonder how a supposedly technical article could so thoroughly ignore HP's contributions. And why it would ...

Re:Critique of the Itanium. by kinnunen · 2000-12-03 15:58 · Score: 1

Part of the reason that schedulers on modern chips are so complex is that good compilers are rare. If the compiler produced optimally-ordered code, you could dispense with out-of-order execution and save a huge amount of silicon and effort. In practice, however, this kind of code is rare, so the scheduler stays in.

No, the schedulers need to stay in because it is not possible to do all things in software. Things like register renaming and micro-op scheduling. The instruction set doesn't support it so you do it in hardware. x86 may not optimize all that well, but considering the complex ISA and the existence of multiple generations of CPUs from multiple vendors, you really can't optimize for any specific arch. RISC compilers tend to do a lot better (one vendor + good ISA = better optimization).

--

Yummy Intel Documentation Goodness by Mr+Z · 2000-12-03 16:02 · Score: 3

With all the wild speculation going on around here, I thought it might be worth throwing some actual links in here to real information.

Itanium Processor Family Home -- has links to all sorts of IA-64 material.
The IA-64 Architecture Specifications and Guides -- lots of good documentation links.
And don't forget Itanium[TM] Processor Microarchitecture Reference.

I haven't read all of these myself, but I have pored over the details that are most relevant to my work. :-)

Have fun.

--Joe

--
Program Intellivision!

Re:AMD? by maraist · 2000-12-03 23:43 · Score: 2

Consider the x86 to be the old 68K code and you're Motorola. You WANT the old code to die..
In a previous slashdot article, Intel has tried to patent their EPIC code.. What this means is that IF, by some stretch of the imagination they pull off an industry swing to IA-64 processors, then they're all of a sudden the only game in town.

Beyond that, IA-64 is slated (at least initially) to be a no holds barred processor.. The Ferrari of cars, since they spared little or no expense on functional units or cache (at-speed 4Meg cache?? Especially after they recently determined that they couldn't rely on 3rd party or external modules). They say 1 to 4,000 processors are to be in these new machines.. That doesn't sound like your Mother's Word processor running in there does it?

Once these bad boys hit 1.5GHz (maybe 2-5 years from now), then the fact that you're emulating Duke Nukem 3D is going to be irrelevant, just like the old non-recompiled 68K code is largely antiquated. Or more directly, just like we could care less that the PPro and its descendants run 16bit code more slowly.

At the moment, if you're going to buy an IA-64, you have a SINGLE app that you're interested in. Namely a web server, a database server, or a CAD program. If you're in a UNIX system (as has been pointed out), then it's a trivial matter of getting the code to work anew (since either the code is freely available, or the vendor that gives you the box owns the original code). If you're in windows, you still only need worry about your AutoCad xxx, etc. So-what that explorer runs slower; 733MHZ isn't going to let it run _that_ slowly.

The type of people that shell out $20,000 on a machine could care less what the architecture is.. And to some degree, they care little about the compatibility from box to box... They have their software, and as long as it works fastest on a given platform this year, they'll buy it and switch over (since the data will migrate). Think of it like a black box phone.. A cordless phone.. Do we fret over the fact that we've gone from 400MHZ to 900MHZ to 900-Dig to 2.4Gig Spread Spectrum? Each time switching vendors? None are compatible with each other, but they fulfill a single task well. It's a black box that fulfills a business service. Slower generic apps are simply the cost of doing business with it.

Beyond that, mainstream apps like this typically take up the entire computer and desktop; there is no room for other applications.. You would have a separate machine if you wanted to do general purpose work-station operations. Perhaps the average Matlab user might be more pressed for all-around performance, but they're on a UNIX machine anyway.

And as for a 4,000 machine.... Well, I shouldn't have to mention how specialized a program is going to be for that anyway.

The point is that Intel wanted to compete against SUN and friends, which use a totally different business paradigm which is incompatible with most users (including value-based work-stations).

The only danger, as I see it, will be the loss of "trickle down hardware". Where the state of the art today becomes the value PC tomorrow. I doubt that Intel will have _any_ incentive for making an IA-64 cost effective (since that would make it harder to justify the 3,000% premium they'll likely charge for the additional 100MHZ). Since we're locked out of the market (for at least 3 years), Intel will have to continue developing the x86 line for many years.

The problem will be that they can't bet the farm on the IA-64; they have to keep the PXXX on top. Yet, they're not going to re-engineer the x86 for 64 bits, since that would undermine IA-64. If all I need is 64bits, and I'm not worried about massive multi-CPU or even changing my compiler tool-set, then why should I choose other than AMD's x86-64 or the equivalent Pentium derivative.. Why pay the premium for IA-64? If AMD successfully converts their entire line to x86-64, and MS comes around and produces a compatible OS, there will be no compelling need for vendors to port to IA-64 (since there will be little compelling need for a user to buy outside of tradition). Yes they'll get performance... But it would be just like switching to SUN, with their proprietary supported hardware and software environment, AND more importantly, their smaller user-base. Should Sybase support yet another architecture for that 1-10% additional market? Oracle most definitely, but maybe not a smaller App company.

The point is that IA-64 requires just as massive a change for software developers and users as if they were to switch to Alpha (especially since they run NT and emulate x86 code as well). What do I gain by choosing IA-64 as the platform? Reliability is offset by the newness of the system. Scalability can almost be better handled through clusters (and more cheaply at that). 64bit will become ubiquitous, and ironically, Intel will have the only 32bit processor left in the market (with the possible exception of older Macs).

All I'm saying is that Intel had best have a backup plan. There is one thing in Intel's favor, however. Politics.. They still have the clout with the major industries, and they still can coerce MS to shift in their direction. MS has stooped to purposefully breaking compatibility with competition, and moreso, they are excellent at artificially creating demand for higher end systems. With win-2k having an enterprise solution, it isn't too big a stretch of the imagination to conceive that there will be an IA-64 only variant that has necessary features that simply aren't offered on other platforms. Who knows, it might even be easier to develop these enterprise solutions on IA-64 than on legacy x86[-64].

Intel will not fail.. But they will not succeed on merit alone. (much like Rambus)

-Michael

--
-Michael

Re:Everybody that has 4 Gigs of RAM, raise your ha by CardiacArrest · 2000-12-03 16:08 · Score: 1

How could 2.4 support up to 64GB RAM on IA32? I was under the impression that all IA32 platforms since the 386 supported a maximum of 4GB RAM due to the limits of the IA32 chips themselves. I don't see how this is possible but it could possibly extend the lifespan of the IA32 a few more years.

Different target market by JohnZed · 2000-12-03 13:43 · Score: 5

In other cases, I'd agree that legacy code performance would be a huge issue for a processor family aimed at the desktop. After all, there are so many thousands of apps that businesses and consumers rely on (some of which were written by companies that have long since died) that we couldn't possibly expect all of them to port to IA-64. Even worse, this might not be a simple recompile -- if you use any assembly or (more likely) if your code isn't 64-bit clean, you need to modify your code pretty carefully to support it.
But, luckily for them, Intel isn't targeting desktops. They're going after the very highest-end markets (especially with the first release) where users either own the code they're using (as with scientific/high performance computing) or where they rely on only one or two enterprise applications (look at the number of high-end boxes out there that basically just run Oracle, and the number of workstations that are used entirely for one CAD program). Intel just has to make sure that these key apps are really, really well-supported on IA-64 and their target customers will be happy. And they're basically paying companies to do this sort of porting (they have a $250 million IA-64 venture fund), so I have a lot of confidence that this'll work out for them.
It's also important to remember that enterprise products have a much longer purchasing cycle than consumer products. For any console system, the availability of games on Day 1 is crucial to the success of the whole system. But any reasonable enterprise can be expected to spend 9-18 months evaluating critical products before doing a serious roll-out, and that gives Intel a crucial buffer period in which to get the remaining ISVs on board.
The much tougher issue for them will be quality of the compilers themselves. The article alludes to the fact that IA-64 puts a LOT of burden on the compiler, but I think it even understates that fact. The standard gcc is woefully inadequate for this architecture, so Linux users have to hope that SGI's version comes through. Realistically, only HP (which has been working in VLIW experiments for years) can be counted on to have a good implementation ready from the launch of the chip.
--JRZ

Only programs with if then and elses. by 1nt3lx · 2000-12-03 13:22 · Score: 1

read article and subject.

--
The List of Grievances with Slashdot.

Re:AMD? by nekid_singularity · 2000-12-03 13:24 · Score: 1

JESUS H. CHRIST! You could have bought a house for what that damn coffee table cost. But, what do you expect from ACs.

--
Numbers 31:17,18 Now kill all the boys. And kill every woman who has slept with a man,but save for yourselves every virg

Favourite Quotes... by cicadia · 2000-12-03 13:25 · Score: 1

For more integer-oriented tasks, where there are few instructions with multiple operations, running eight operations per second is the theoretical maximum.

Not bad... 12.5Hz out of an 800MHz processor :)

Chipset, OS, and system designers, which will include the likes of HP, IBM, Compaq, SGI, Microsoft and Intel, will bring out their own error handling and reliability processes that should further enhance Itanium-based server uptime to 99.9% and beyond.

Imagine that... an MS/Intel server down for less than 3.5 days in a year!

couldn't help myself :)

- cicadia

--
Living better through chemicals

8 bit, 16 bit, 32 bit more by howman · 2000-12-03 13:44 · Score: 1

all the hooplah, all the supposition, all for naught... give it a year and it'll all be forgot for the 128 bit, 256 bit 512 bit more.

--
flinging poop since 1969

AMD vs Intel Support by pod · 2000-12-03 13:45 · Score: 1

OK, OK, I admit, I'm a little confused about these new 64 bit chips. I know Intel's Itanium will require everything to be recompiled (and rewritten where 32 bit assumptions are made). AMD's Sledgehammer will also require everything to be recompiled to take advantage of the new mojo. But, but, here is the important question, the two are not the same? They will have different instruction sets? Or no? If they do, won't it basically come down to marketing because vendor (read: MS) support will be critical?

--
"Hot lesbian witches! It's fucking genius!"

Hmm. How many obvious typos can you find? by sommerfeld · 2000-12-03 23:49 · Score: 1

"8 integer operations per second"...

.. and that bit about "bank switching".. umm. how 70's. P6-family x86 processors have 32-bit virtual address, 36-bit physical address. if you turn on PAE, you move to 64-bit PTE's and 3-level page tables (instead of the 32-bit PTE and 2-level tables) and can map individual 4kb pages from 64gb of physical memory anywhere within your 4gb virtual address space.. of course, you need to rewrite your pmap or equivalent piece of the vm system to deal with the different page table format..

(Any individual virtual address space is limited to 4gb, which is a significant constraint, but no "bank switching" is needed)
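The numbers above are easy to check. This is just the arithmetic, written out in Python with made-up helper names: 32 virtual address bits give 4 GB per address space, 36 physical bits give 64 GB of RAM, and with 4 KB pages a page-table entry has to hold a 24-bit frame number, which does not fit in the 20 bits a 32-bit PTE has left after its 12 flag bits.

```python
GIB = 1 << 30
PAGE_SIZE = 4096  # 4 KB pages, as in the comment above


def addressable_bytes(address_bits):
    # Number of distinct byte addresses an address of this width can name.
    return 1 << address_bits


def frame_number_bits(physical_bits, page_size=PAGE_SIZE):
    # Bits needed to name a physical page once the in-page offset is removed.
    offset_bits = page_size.bit_length() - 1
    return physical_bits - offset_bits


print(addressable_bytes(32) // GIB)  # 4  (GB of virtual space per process)
print(addressable_bytes(36) // GIB)  # 64 (GB of physical memory with PAE)
print(frame_number_bits(32))         # 20 (fits a 32-bit PTE)
print(frame_number_bits(36))         # 24 (needs the wider PAE PTE)
```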

just more Intel crap by Proud+Geek · 2000-12-03 12:06 · Score: 1

Merced (oops... Itanium) is way too complicated. It has a ton of stuff that it doesn't need and shouldn't have. Plus it requires brand new compilers and optimizations. It will be like the P4, but much worse because old code won't run at all until it is recompiled.

Sledgehammer will be a lot better. Even if you can't recompile your code, it will run it as is, and do a great job of it.

--

Even Slashdot wants to hide some things

Re:just more Intel crap by Depressive+Cyborg · 2000-12-04 00:40 · Score: 1

If all programs and users (!) were optimized for computers, there would be no need for a faster CPU. Trust me, with a round-robin scheduler in your head, it's possible to do a good work on a 486.

I like computers because they like me.

Re:Informative, but OFFTOPIC by UnknownSoldier · 2000-12-04 00:10 · Score: 1

Overrated. At 1, or 0, "Informative, but OFFTOPIC" is overrated.

Hmm, that's an interesting way of looking at it.

I would have tended to think a misinformative post that incorrectly got moderated up should get moderated over-rated when a moderator sees the post is not entirely correct.

*shrugs*

--
Hey, moderators, lay off the $2 crack pipe, we're trying to have a serious discussion here.
Oh wait, this is /. Whoops, there goes the karma ...

Nyet! by ackthpt · 2000-12-03 12:07 · Score: 2

Is this the chip Intel recalled?

Patience, patience! All in good time.

Seriously, I've read about this damn thing for years, and that they actually plan to roll the thing out I find nothing short of anticlimactic. By the way, has anyone seen a P4 commercial, yet? All I see are the Blue Man Group plugging the P3. Will there be a campaign for the Itanium? Big step that it would be, I think they have to.

Ignore that AMD imitator over there with the inexpensive, high performance SledgeHammer, which runs all your existing software! We have some real innovation for you, our track record proves...uh... well, trust us anyway, because Bill is doing Windows for us!

--

A feeling of having made the same mistake before: Deja Foobar

Re:Critique of the Itanium. by Christopher+Thomas · 2000-12-04 00:33 · Score: 2

No, the schedulers need to stay in because it is not possible to do all things in software. Things like register renaming and micro-op scheduling. The instruction set doesn't support it so you do it in hardware.

Actually, register renaming doesn't require out of order execution. All you're doing is renaming the second register in a write-after-write or write-after-read situation to a different internal register name.

You're right about micro-op scheduling, though. I was thinking about RISC processors, which already have more or less atomic instructions.
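To make the renaming point concrete, here is a toy renamer in Python. It is a sketch under simplifying assumptions that are mine: instructions are `(dest, src1, src2)` tuples of architectural register numbers, and there is an unlimited supply of physical registers with no free list. It walks the instruction stream strictly in order, which is the point: the renaming step itself does not depend on out-of-order issue.

```python
def rename(instructions, num_arch_regs):
    """Rename destinations so later writes never reuse an earlier physical register.

    Hypothetical model: `instructions` is a list of (dest, src1, src2) tuples
    of architectural register numbers; physical registers are never recycled.
    """
    mapping = {r: r for r in range(num_arch_regs)}  # architectural -> physical
    next_free = num_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the physical register that currently holds the value.
        p1, p2 = mapping[src1], mapping[src2]
        # Every write gets a fresh physical register, which removes
        # write-after-write and write-after-read conflicts on `dest`.
        mapping[dest] = next_free
        renamed.append((next_free, p1, p2))
        next_free += 1
    return renamed


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (4, 1, 5),  # r4 = r1 op r5   (reads r1)
    (1, 6, 7),  # r1 = r6 op r7   (write-after-write and write-after-read on r1)
]
print(rename(program, 8))  # [(8, 2, 3), (9, 8, 5), (10, 6, 7)]
```

After renaming, the third instruction writes physical register 10 rather than 8, so it no longer conflicts with the first two; only the true dependency (the second instruction reading register 8) remains.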

Re:Everybody that has 4 Gigs of RAM, raise your ha by Mr_Tom · 2000-12-04 00:33 · Score: 1

So, would anyone out there who is currently using more than even 256 Megs of RAM tell me?
[/Quoted]

Yep. 640 Megs here. But that's on a bolshy great machine that processes call records from over 50 telephony switches across the UK.

And yes, we /do/ need more RAM. (And no, it does not run NT!)

Apple & Motorola were in the same boat 10 yrs ago by Geek+Dash+Boy · 2000-12-04 00:45 · Score: 1

When the 680x0 was to be replaced by the PowerPC, Apple had to make some hard choices when it came to their software and how it ran.

Now, I'm not sure if this is quite the same situation, since the PPC and 68k are completely different instruction sets.

But what is similar is that Microsoft will have to make a choice: how do you want your software to run slower today? Apple decided that 68k emulation was the best road to take (giving birth to FAT applications).

Only in the last 2 years or so has PPC-native software truly matured. It was a long, hard road, and Microsoft and Intel have to realize there's no way around it.

Having AMD in the mix makes things that much more interesting...

--
I say we take off and nuke the entire site from orbit. It's the only way to be sure.

Re:Quantum irregularities by minus23 · 2000-12-03 16:09 · Score: 1

Ugh...what was the question again? --- Hmm ...seems an infinite improbability drive is in order here.

easier,

minus

--
I am Jack's HTTP Server

Re:AMD? by aliebrah · 2000-12-03 16:14 · Score: 1

until Windows 2000 or other popular, but closed source Server operating systems and applications are ported, it's just an academic processor

Windows Whistler (successor to Windows 2000/Me, currently in beta) supports IA-64, Microsoft has already released Windows Whistler Professional and Advanced Server versions for IA-64.

Re:How is this different from i.e. AMD or Alpha's? by Betcour · 2000-12-03 16:14 · Score: 1

I'm not very familiar with the Alphas, but from what I know they have a "lightweight" architecture, which explains why they can run at such high frequencies (less complexity = less transistors = less heat = higher frequency). Although they are 64 bits, they don't carry so many speed-enhancing mechanisms as the Itanium. They are also more "traditional RISC" than the Itanium.

AMD Sledgehammer (their 64 bits CPU) is more of a joke : it's just a x86 CPU with extended registers. They basically do the same thing as Intel did when introducing 32 bits x86 CPU : extend existing 16 bits register to 32 bits, with a special "32 bit" running mode to address 4 Gig of RAM. While Itanium is a radically new CPU, with a x86 compatibility layer but completely new concepts and design, AMD is reusing (very) old tricks to push the aging x86 legacy one step further.

In software terms : Pentium III/Athlon is akin to Windows 3.1. Sledgehammer is like Windows 98 (a hack that is fully compatible with 3.1 yet not very nicely done or high performance). Itanium is like Windows NT : new core, new cleaner design, higher performance but less compatibility and more work to do for the designers.

Re:Everybody that has 4 Gigs of RAM, raise your ha by oojah · 2000-12-03 16:17 · Score: 1

My desktop has currently got 384MB in it. That's just because I've got a 128MB dimm that is for my mum and I'll give her the next time I'm at theirs.

Windows reports having just 43% (166MB) free at the moment.

oojah

--
Do you have any better hostages?

Big register set (Re:Critique of the Itanium) by po8 · 2000-12-03 16:17 · Score: 1

The giant rotating register set was inspired by the SPARC register windows. For interrupt handling and user-kernel transitions, it should be possible to use a dedicated set of registers by rotating the window. Heck, one could even do user process context switching that way, up to a point.

Probably it's best to think of the 128-register file in smaller chunks, say 32 regs. Only 64 registers are visible at once anyhow.

Critique of the Itanium. by Christopher+Thomas · 2000-12-03 13:51 · Score: 2

A few aspects of this design strike me as either shady or overly optimistic:

It assumes a very good compiler, tuned specifically to its architecture.

Part of the reason that schedulers on modern chips are so complex is that good compilers are rare. If the compiler produced optimally-ordered code, you could dispense with out-of-order execution and save a huge amount of silicon and effort. In practice, however, this kind of code is rare, so the scheduler stays in.

Remember the P4 vs. Athlon saga; it turned out that _both_ chips were running far below optimum performance due to sub-optimal compilation. Even without SSE2 enabled, Intel's compiler was able to produce a very large increase. Intel has a history of writing very good compilers, so it's possible that they'll be able to handle optimization for the Itanium consistently, but the vast majority of software developers don't shell out for the Intel compiler. Thus, most Itanium code will be sub-optimal.

Limits to the amount of parallelism present.

This is the big problem with building really wide superscalar processors - it gets exponentially harder to extract parallelizable instructions from the serial program stream. The predication system helps Intel a lot here - by allowing them to pretend that they've predicted branches with certainty, thus optimizing them out and producing longer basic blocks - but it won't be magical. Beyond a certain point, which we're already starting to reach, it just stops being practical to try to issue more instructions in parallel from one instruction stream.

The caveat here is loops that repeat for a large number of iterations, known beforehand, without data dependencies between iterations. You can unroll these into reams of parallelizable instructions, and a large register set makes it much easier to do so. However, this turns out also to reach diminishing returns fairly quickly (play with -funroll-loops and -funroll-all-loops on a few test programs to see what I mean). Your processor bottlenecks on the (large) part of the program that isn't in an easily-scheduled tight loop.

128 integer and 128 fp registers.

Boy, will this increase context switch overhead. Part of the attraction to register renaming and a smaller visible register set is that you get much of the benefit of a larger register set without the context switching cost. Now, this can be taken too far (c.f. x86), but I suspect that 256 registers will be enough to substantially influence performance if you're doing something that involves switching a lot to perform relatively short tasks (like many kernel service calls, many driver calls, interrupts to transfer blocks of network data, etc.).

In short, I think that this processor tries to be too clever for its own good; my prediction is that it will burn lots of power executing both sides of branches, and run at far below peak issue rate due to poor compilers used by most of industry and the limited ILP that exists in the programs being run.

That having been said, there are a few things about this architecture that I _do_ like. Predication is one; speculating both sides of a branch requires a lot more silicon, but allows certain optimizations that just wouldn't be possible by any other means. The large visible register file is also nice for loop unrolling and software pipelining compiler optimizations, though it does cause overhead on systems with a lot of context switching.

My money's still on SMT processors (simultaneous multithreading: one core and one scheduler executing many instruction streams, whether threads or processes, which gives you more ILP for free, as well as free interleaving when needed to mask latencies).

Nothing like being off by a billion or so... by eric17 · 2000-12-03 16:22 · Score: 1

"For more integer-oriented tasks, where there are few instructions with multiple operations, running eight operations per second is the theoretical maximum."

IA64 is not x86 Descendant by herbierobinson · 2000-12-03 13:52 · Score: 1

IA64 is a PA-RISC descendant. AFAIK, it isn't really intended to replace the x86 any time soon. It is eventually supposed to replace the PA-RISC and provide new high-end markets for Intel.

--
An engineer who ran for Congress. http://herbrobinson.us

Re:Everybody that has 4 Gigs of RAM, raise your ha by Fluffy+the+Cat · 2000-12-03 13:56 · Score: 2

So, would anyone out there who is currently using more than even 256 Megs of RAM tell me?

We have a machine with 128GB of RAM here. For many scientific apps you really do need that sort of capacity to deal with the size of the data sets used. If you're working with large databases in business or financial situations, I'd expect that much the same is true - you really want to be able to keep as much of your data in RAM as possible, and you really want to be able to perform complex manipulations.

And no, it's not just an NT thing. One of the more useful features of 2.4 is support for up to 64GB of RAM on IA32 systems. This is something that people want. There's more to life than the desktop, and there's more to servers than just throwing out static web content and processing mail.

ho hum by lophophore · 2000-12-04 00:58 · Score: 1

been there, done that. in 1992. It was/is called Alpha. So Intel is finally catching up. Big deal.

--
there are 3 kinds of people:
* those who can count
* those who can't

AMD? by Outlyer · 2000-12-03 12:11 · Score: 5

I suppose any discussion of Intel will require the mention of AMD. While Intel has frequently admitted that this new chip will run non-native (i.e. not explicitly compiled for it) code slower than current chips, AMD claims their 64-bit processor will actually run it faster through a smoother translation layer.

The question is, will developers jump on board and start recompiling? It's not as simple for other OSes as it is for Linux, since the source code isn't available for you to recompile yourself.

If this chip actually runs code slower, and suffers poor backwards compatibility, what motivation is there for people to port to it? I can see specialized apps, but until Windows 2000 or other popular, but closed source Server operating systems and applications are ported, it's just an academic processor.

I guess we'll have to see if Intel can get the developers excited; but based on my purely anecdotal survey of developers in my group of friends, there isn't a lot of excitement about anything Intel does anymore, especially not this chip.

* mention of Windows 2000 as a server Operating System in no way endorses that as a Good Idea(tm)

--
----------------- "I have a bone to pick, and a few to break." - Refused -------------------

Slashdot Mirror

188 comments