Well, I don't know how easy it would be to yank off the x86 bit of the core and replace it with a PPC one. I'm sure the designs are pretty tightly coupled, for example the number of registers as well as things like SSE and MMX would need to work the way it does on an x86 chip.
Not as coupled as you may think. SSE, MMX, 3dNow!-- all are either specialized floating point calculations (such as 128 bit floats-- but you can pack 2 64 bit floats in it and get a speed boost there), or are used for matrix operations. All of the above are decoded into many RISCOps, which are then passed to the somewhat more 'generic' FP pipeline(s).
I also seriously doubt the decoder takes up half the CPU space.
You'd be greatly surprised. The whole RISC vs. CISC debate centers around this fact. The reason DEC dropped its VAX (CISC) processors and created the Alpha (RISC) was because the decode stage in the VAX would take more than 1/2 the design. Motorola dumped its 680x0 for PowerPC for the same reason. As did IBM and HP for their RISC chips. And Sun's had its RISC SPARC processors as long as I can remember.
Only Intel's x86 (and x86 clones from AMD, Cyrix, and others) remained CISC. Many analysts thought remaining CISC would kill Intel (as CISC chips are more expensive and more difficult to design). Only the sheer volume of x86 allows Intel to spend the money to develop CISC chips that perform as well as the much simpler RISC designs. (ie. if you look at what Intel spends to develop their x86 designs, and compare it to the development cost for a SPARC, Intel spends a lot more for a slower product).
AMD kept competitive by joining two chips: The decoder and the processing core. The decoder would only have to change if new instructions were added (such as SSE/2). And AMD could concentrate on a wickedly-fast RISC core.
The reason CISC chips have such a huge decoder is the complexity of the instructions.
A RISC instruction does only one task, in one way.
A RISC design has an absolute minimum of instructions, with no redundancy.
And to perform more complex tasks, you have to combine multiple instructions.
RISC instruction sets are not as easy to program in assembler with. (or, to be more accurate, it's a lot more tedious)
A CISC (like x86) has multiple-purpose, multiple method instructions. (somewhat like operator overloading in an Object Oriented language).
A CISC chip has ~300+ instructions, where a RISC chip has ~70. (With some RISC chips having as few as 40)
CISC assembler closely resembles a more high-level language.
Implementing high-level programming instructions in hardware takes a lot of transistors.
RISC designs have both less complicated (2-3 times simpler) instructions, and fewer instructions (by 3-6 times fewer).
Even in a RISC design, decode is at least 15% of the total design (and more if you use out-of-order execution).
I also don't think Intel plans to sell the Itanium to the public for a long time. They may also add 64 bit capabilities to their Pentium line of chips to compete with AMD that way.
Their marketing is a bit wishy-washy about that; I think they'll wait and see for the time being. There is a project at Intel called 'Yamhill' which is intended to be a 64-bit x86 clone (like AMD's Opteron).
The thing is there's not too much of a reason to move to 64-bit, save it be addressing more memory, or other things that require large integers. And if you're addressing that much memory, x86 is a lousy choice to start with.
OSX will never be x86, nor will it work on 'commodity' (ie. non-Apple) hardware, Itanium/Opteron or not.
x86 is dying. Apple isn't known for living in the past.
Just because it's Intel doesn't mean it's x86.
Releasing OSX for x86 is completely moronic. Apple is a computer company, not a software company. They sell computers first, software second. If OSX ran on 'open' PC hardware, nobody would buy Apple computers-- they'd buy cheap hardware and OSX.
This is exactly what happened circa 1995 when there were Mac clones. The clones bled Apple dry. Steve Jobs saved Apple by making it a closed system again. Openness only works in a world that believes in openness. The clones exploited Apple's generosity, and it nearly killed Apple.
Pit any software against Microsoft, and expect Microsoft to attempt to kill it. Apple is doing well because they cooperate with Microsoft. If OSX were released for commodity PC hardware, Microsoft would dump Office/Mac, and basically shut OSX out of the market (as it did with Netscape).
Free software is surviving Microsoft because Microsoft can't compete with Free Software's price. There's no company to bankrupt, and the software is largely donated by generous coders. Apple has no such protection. They can go bankrupt, and they don't have the hordes of programmers donating code that Free Software enjoys.
Only on x86 code. It's like a mac user (10 years ago) complaining that their new PowerPC runs the same program slower than their old 680x0.
Anything runs slower under emulation -- whether it's hardware or software providing the emulation. Especially if the emulation is more of a 'white elephant' that isn't really intended to be used.
The Itanium is x86-compatible. There was never a promise that Itanium would execute x86 fast; the current Itaniums aren't even meant for the consumer market, but for workstation use; workstation code, such as the OS (Win64 & Lin64) and apps. And the apps are usually written to be portable, as the market requires it to run on (PPC, PA-RISC, x86, SPARC, MIPS) computers anyhow. Just re-compile.
And, FWIW, Itanium runs PA-RISC code about as well as the PA-RISC does. There's just more in common between PA-RISC/Itanium than there is x86/Itanium.
I believe the poster meant that there are 2 different Itaniums:
One that is x86 backwards compatible (and only x86 backwards compatible)
One that is PA-RISC backwards compatible (and only PA-RISC backwards compatible)
And the poster thought it not much of a stretch to create a third version, that is PowerPC compatible.
Of course, there is only one Itanium core, and it handles all 3 (as you said). However, most RISC chips (such as the PA-RISC and PowerPC) at least have enough similarities that emulating PPC on PA-RISC (or using the PA-RISC decoder) is relatively simple; the opcodes may be different, but otherwise almost everything translates over directly.
eg. (example-- the actual binary is probably different) Function to perform: A+B=C
"ADD A, B, C"='AF0F32BFh' in PPC machine language.
"ADD C, A, B"='CBBF0F32h' in PA-RISC machine language.
The difference is the opcode byte (AF v. CB), and ordering (A+B=C v. C=A+B)
The commands translate directly over, and only the formatting of the instruction matters. Easy emulation. x86 emulation is more of a bear: a single instruction can do different things, depending on the context (almost like operator overloading in assembly)
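To make the "only the formatting matters" point concrete, here is a minimal Python sketch of that translation. It uses the made-up encodings from the example above (opcode byte AF or CB followed by three operand bytes); these are not real PowerPC or PA-RISC encodings, and the function name `ppc_to_parisc` is purely illustrative.

```python
# Hypothetical 4-byte encodings taken from the made-up example above;
# real PowerPC and PA-RISC instruction formats are different.
PPC_ADD = 0xAF      # pretend PPC opcode byte for ADD
PARISC_ADD = 0xCB   # pretend PA-RISC opcode byte for ADD


def ppc_to_parisc(word: int) -> int:
    """Translate pretend PPC 'ADD A, B, C' into pretend PA-RISC 'ADD C, A, B'."""
    opcode = (word >> 24) & 0xFF
    a = (word >> 16) & 0xFF
    b = (word >> 8) & 0xFF
    c = word & 0xFF
    if opcode != PPC_ADD:
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    # Swap the opcode byte and move the destination operand to the front.
    return (PARISC_ADD << 24) | (c << 16) | (a << 8) | b


print(f"{ppc_to_parisc(0xAF0F32BF):08X}h")
```

A table lookup plus a fixed operand shuffle is all it takes here, which is the sense in which RISC-to-RISC translation is "easy". An x86 translator would first have to work out which of several meanings and lengths an instruction has.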
There have been similar rumors about using AMD chips; they go along these lines:
AMD Athlon & Opteron processors are really two processors: 1.) An x86 decoder, which translates the x86 instructions to 2.) AMD's completely original RISC core; each is roughly 1/2 of the total die size.
Take the upcoming Opteron, chop off the x86 decoder (which is about 1/2 of the chip), and use its RISC core natively (and emulate PPC)
Take the Opteron, and replace the x86 decoder with a PPC decoder (which would still be a smaller die than the x86 Opteron)
AMD is more likely to modify their design than Intel is.
Of course, the argument can be made 'why modify anything?'
As the poster said: x86 is on its last legs. The Opteron is likely the bed it will die in. There's really no reason to even have a CISC chip now that compiled languages are used instead of assembly.
There aren't many compelling things that show that VLIW is a better design paradigm than RISC. Few convincing reasons that VLIW (Itanium) is better than RISC (PowerPC)
Even Intel will have to debunk the MHz myth when trying to convince the public to buy the consumer version of Itanium, rather than the x86 Opteron.
Itanium and PowerPC have roughly equivalent SPEC scores at the same clock speeds.
There's not much to show that PowerPC is 'showing its age', as many of Itanium's touters claim. (It's more of a VLIW vs. RISC argument)
Apple has already done the processor emulation: When it moved from 680x0 to PowerPC. (It's not as big a problem for them, having learned how to do it.)
Actually, endian-ness isn't even an issue either. Both PowerPC and Pentium processors have an endian flag-- this allows the processor(s) to use either byte order with no performance drop. Of course, the two have opposite 'native' modes, but it honestly doesn't matter in terms of speed.
And one other thing I forgot- The human mind is incapable of even registering more than 60 FPS (the US military did the research for simulations). The military wanted to be as budget-conscious as possible... but they didn't want to endanger the lives of their pilots by having a sim that rendered frames slowly enough to be noticeable.
However, even though it's impossible to tell the difference in frame rate, in real life (as in games) there are things that happen too fast for the motion to be seen.
Games are full of explosions, etc. Very high-speed motion. Most people have watched too many movies; they're used to 'slow' explosions where debris & affected objects are visible on screen. Movie makers know we like eye candy, so they give it to us.
Reality is quite different. A TOW missile explodes before it hits its target. The expanding gas forms a 2-4" hole in the target's armor in microseconds. A person watching it can't see the transition. Bullet wounds take 2-3 frames to fully appear in a movie. Reality is more like 1e-6 frames. Explosives can lift an entire car feet into the air so fast that a human thinks it's instantaneous.
Video games use fairly real physics, as it both makes animating easier, as well as having a more realistic 'feel'. The frames rendered follow the model. With even moderately real physics, an object can move large distances in between frames.
And, of course, there's the ultimate trump: Online gaming. Where the object boundaries (often simplified/compressed) must be transmitted over a low-bandwidth link, with a latency of hundreds of milliseconds. It doesn't matter how fast the graphics card renders, or how well the game keeps track of positions internally.
An update rate of 30/sec is pretty optimistic, with 10-20/sec more typical. Other players can 'pop' locations in between frames simply because, in between location updates, the opponent's 'actual' location(s) end up being different than the one the CPU guessed it would be.
Which can mean 4-5 frames were rendered with incorrect locations, the update is received, and the 'real' frame is rendered. Next the game guesses where the opponent will be by the next update, and renders the frames necessary to make things look smooth.
The guessing is an imperfect way to make up for the large difference in frame rates and multiplayer location updates. However, there simply isn't any option; there are 4-5 frames that must be rendered before the next update. Simply 'sitting still' looks awful, and lends itself to the perception of a lower framerate than actually exists.
Programmers try to close the gap by making an educated guess. Since they use a realistic motion model (inertia, gravity, etc.), nearly all the possibilities for the 'next frame' can be eliminated immediately. Then it just chooses a 'middle road' that is close enough that us humans don't notice.
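The guessing described above is usually called dead reckoning. A minimal Python sketch, assuming the simplest possible motion model (constant velocity, one dimension); the `Snapshot` type and `predict` function are hypothetical names for illustration, not any particular engine's API:

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    t: float    # time the network update was received, in seconds
    x: float    # reported position (1-D for brevity)
    vx: float   # reported velocity


def predict(last: Snapshot, now: float) -> float:
    """Extrapolate position, assuming constant velocity since the last update."""
    return last.x + last.vx * (now - last.t)


last = Snapshot(t=0.0, x=10.0, vx=5.0)
# Four frames at 60 FPS are rendered from the guess before the next update.
guesses = [predict(last, frame / 60) for frame in range(1, 5)]

# When the real update arrives, the guess may be off; the rendered object 'pops'.
update = Snapshot(t=5 / 60, x=9.0, vx=-5.0)   # the opponent actually reversed
error = abs(predict(last, update.t) - update.x)
```

A real game would add gravity, inertia, and some smoothing of the correction, but the trade is the same: smooth frames now, at the cost of occasionally having drawn the wrong thing.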
Any high-speed, unexpected changes (such as an explosion) can foil the system:
The player thinks they've killed someone (that's what was rendered/displayed on their screen, after all)
But the estimate was wrong. The 'someone' was actually in a safe place when the explosion changed things.
There is no 'backing up', so the next frame shows the person alive and well, and in a completely different place.
The gamer gets upset because they want a perfectly synchronized game
The much lower frequency of positional updates is unacceptably 'choppy' when such synchronization is used.
The programmers use a 'physics' trick to try and smooth out the picture, but the trick sacrifices accuracy.
With VSYNC enabled, it also means that any FPS above what the monitor can display are lost frames. This can actually be detrimental to gameplay when the game speed relies on the frames rendered, as you can miss a frame you "need".
I smell a know-it-all that needs whackin'.
Modern graphics APIs use time, not frames, to determine the speed of a game. The graphics rendering is completely independent of actual gameplay. It's the only way a game can be expected to run 'at the same speed' on the wide range of hardware.
Ever use a boot disk to load DOS and then play an old game? The original, unadapted Wing Commander (1989 version) is completely unplayable because hardware speeds were so close back then, that the programmers could get away with using frame rate to regulate gameplay.
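For anyone who hasn't seen the difference, here's a minimal Python sketch of time-based movement. The function names and the speed value are made up for illustration; the point is only that the update is scaled by elapsed time, so the frame rate drops out of the result.

```python
def step(position: float, speed: float, dt: float) -> float:
    """Advance by elapsed time (seconds), not by frame count."""
    return position + speed * dt


def simulate(fps: int, seconds: float, speed: float = 100.0) -> float:
    """Run the update at a given frame rate for a fixed wall-clock duration."""
    position = 0.0
    dt = 1.0 / fps
    for _ in range(round(fps * seconds)):
        position = step(position, speed, dt)
    return position


slow_machine = simulate(fps=30, seconds=2.0)
fast_machine = simulate(fps=120, seconds=2.0)
# Both end up at (approximately) the same place: 200 units after 2 seconds.
```

An old frame-locked game is the same loop with `dt` hard-coded to a constant, which is why it runs absurdly fast on newer hardware.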
A modern game doesn't care if the rocket impact is rendered. The game registers 'impact' by the vertex's position, which is computed separately. When the graphics card does the vertex handling, the game still keeps a (much smaller) set to calculate object positions. In other words:
The graphics card computes thousands of vertices, and renders the entire scene once.
The CPU will compute a few hundred vertices. (the collision boundaries, which is generally a bunch of cubes the model fits inside) There is all kinds of time for the CPU to compute a few hundred intermediate steps before the graphics card asks for the next 'snapshot' to render.
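The CPU-side check on those boxes is cheap. A minimal Python sketch of an axis-aligned bounding box overlap test; the `Box` type and the sample coordinates are hypothetical, and real engines use a hierarchy of such volumes rather than a single box:

```python
from typing import NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    min_z: float
    max_x: float
    max_y: float
    max_z: float


def overlaps(a: Box, b: Box) -> bool:
    """Two axis-aligned boxes collide only if they overlap on every axis."""
    return (a.min_x <= b.max_x and a.max_x >= b.min_x and
            a.min_y <= b.max_y and a.max_y >= b.min_y and
            a.min_z <= b.max_z and a.max_z >= b.min_z)


rocket = Box(0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
player = Box(0.5, 0.5, 0.5, 2.0, 2.0, 2.0)
wall = Box(5.0, 0.0, 0.0, 6.0, 3.0, 3.0)

hit_player = overlaps(rocket, player)   # the boxes intersect
hit_wall = overlaps(rocket, wall)       # the boxes are apart on the x axis
```

Six comparisons per pair, no rendering involved. That's why the game can register the impact whether or not a frame showing it was ever drawn.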
No, you don't miss the frames at all. What is this so-called 'need'? First, there is a very big difference between keeping track of the objects (Poly boundaries/collisions, positioning the vertices, etc), and actually rendering them. Vertex calculations (including physics and animation) are much less computationally-intensive. That's why the first 3D cards really only handled rendering. The CPU still did all the vertex operations-- the 3D card did the (exponentially) more intensive rendering of the frame.
The way it usually works is as follows:
Frame Buffer A is displayed on screen
Graphics card renders to Frame Buffer B
Graphics Card renders to Frame Buffer C
When all of Buffer A has been displayed, flip display pages (or use a blit) to Buffer B.
Frame Buffer B is displayed on screen
Graphics Card renders to Frame Buffer A
IF Frame Buffer A finishes rendering before B finishes drawing, flip pages (or blit) to C.
Begin rendering B
If A is being displayed, render C.
If C is being displayed, render A.
If the buffer isn't being displayed, render the next frame. Show frames in order, but drop frames when a more recent one is available.
And so on. This is 'triple buffering', which not all games support (although it is becoming much more common). Double-buffering is almost always used, where there are only 'A' and 'B' buffers.
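The rule "show frames in order, but drop frames when a more recent one is available" can be sketched in a few lines of Python. This is a toy model under my own assumptions (the class and method names are invented, and a real driver does this with page flips in hardware), not how any particular card implements it:

```python
class TripleBuffer:
    """Toy model: one buffer on screen, at most one finished frame waiting."""

    def __init__(self) -> None:
        self.front = "A"          # buffer currently being displayed
        self.ready = None         # newest finished frame waiting for a flip
        self.free = ["B", "C"]    # buffers the renderer may draw into

    def finish_render(self) -> str:
        """The renderer completes a frame in a free buffer."""
        buf = self.free.pop(0)
        if self.ready is not None:
            # A newer frame exists, so the older undisplayed one is dropped.
            self.free.append(self.ready)
        self.ready = buf
        return buf

    def vblank(self) -> str:
        """At the end of a refresh, flip to the newest finished frame, if any."""
        if self.ready is not None:
            self.free.append(self.front)
            self.front, self.ready = self.ready, None
        return self.front


tb = TripleBuffer()
tb.finish_render()    # frame rendered into B
tb.finish_render()    # frame rendered into C; B is dropped, never shown
shown = tb.vblank()   # display flips from A to C
```

The renderer never has to wait for the display here; that is the advantage over double buffering, where the card stalls once the single back buffer is finished.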
Which means that, even with vsync enabled, the card is capable of rendering 120 (double-buffered) or 180 (triple-buffered) frames per second. (And that's at an eye-straining 60 Hz. With a better monitor that refreshes at, say, 85 Hz, the card renders 170 (double) to 255 (triple) frames per second.)
It is in Microsoft's best interest to charge money for these patents, especially unreasonable amounts of it, because it makes DirectX the only affordable option and locks you into Microsoft software and x86 hardware.
OpenGL would be unaffordable how? nVIDIA already has its own fully-licenced OpenGL drivers for the 3 major OSes (Windows, Mac, and Linux). ATI & Matrox have Windows & Mac covered; the only question is Linux, where neither writes drivers. It's not impossible to have MESA implement all non-'patented' OpenGL functions, and the respective hardware makers release the remainder under the (necessary) closed licence.
And more to the point: Windows has a mechanism to allow for other non-DirectX graphics API's. Vid card manufacturers (usually) own full OpenGL licences, and they write complete implementations of OpenGL in their drivers anyway. (Or, to be more specific, they implement the segments of OpenGL that aren't already in their hardware).
Price isn't even an issue, and never was. The cost is shouldered by the vid card makers, and is hewn down to pennies by the time we pay for it. Neither is x86 hardware-- Or have you forgotten that the primary implementation of WinXP-64bit, which includes DirectX, is Itanium (and while it is x86-compatible, it is not x86 or even close to it)?
The only real problems that arise are the (expected) moaning that Microsoft is getting money from us whether we buy their software or not, and the future of Mesa or other "Free(dom)" implementations.
And there's nothing stopping Mesa from implementing everything non-patented, and leaving the patented portions to the hardware makers. Which is still a good deal for ATI or Matrox, as they would only have to write a portion of the driver.
For users of nVIDIA and Windows/Mac/Linux, there isn't, and probably never will be, a problem; they write their own drivers for all three anyway.
"Free(dom)" software drivers aside, I prefer an excellent, closed-source driver(s) such as nVIDIA's to absolutely no driver at all. It isn't necessarily the HW maker's fault; they have to follow IP laws, and are often kept from releasing source code because of IP laws. If they leave out the 'locked' feature, they lose a competitive advantage, and business to the companies who do. So they choose the best path allowed by law, and provide a non-free driver to a Free OS.
IP law isn't necessarily a bad thing; it's what makes the GPL work. Were it not for IP law, there'd be nothing keeping Microsoft from selling our own code back to us.
Information does not want to be free. If it did, we wouldn't have to spend billions in research, either theoretical or applied. People don't give up years of their lives and thousands of dollars to college education because information simply wants to saturate their brains; but because the information requires an active, continuous effort to both spread and simply continue to remain known. Information does everything it can to remain secret. Without our own constant vigilance, all the knowledge and information mankind has collected over the ages would hide itself again. Skills and facts are forgotten. Books age and crumble. CD's and magnetic media decay.
It takes long, hard work to get information. The whole entropy argument ignores the fact that information is an organized substance, and entropy works against organization, and towards chaos.
While I don't agree on the period of time involved in patents (and especially copyrights), there has to be a real financial incentive to seek and preserve information. Otherwise, the quest for information and knowledge will be left to rich eccentrics, as was the case centuries ago.
IP law is what made it possible for a person to be a scientist, and earn a living at the same time. It gave them a chance to sell the information they found, and buy their daily bread with the money gained. Without this capability -- to sell the fruits of research and thinking, we would live in a world with very few professional scientists, professional engineers, professional writers (so long to the Lord of the Rings and Dune!) We wouldn't even have flown aircraft yet, let alone flown to the moon.
This does not diminish the greatness of Free Software; it's one of the most altruistic services for all of mankind. But to expect all knowledge to be "Free" is like expecting a farmer to give away his crop.
The world would be nice if everybody shared in this way, but there is a greater human desire to have more if you work more, and that a skilled worker should have more than an unskilled worker. If there isn't an incentive to hard work, study, and the honing of skills, civilization would have never developed.
Re: Didn't Microsoft just do something with this? (on "OpenGL 1.4 Spec Finalized")
SGI is still in charge?
SGI isn't 'in charge' per se; the ARB is (the ARB consists of various hardware & software makers, including Microsoft, nVIDIA, ATI, Matrox, SGI, Sun, and Evans & Sutherland). However, OpenGL is a trademark of SGI, so they get to make the announcement.
How could you possibly say that onboard DRAM controller doesn't provide any benefits to SP systems? [...] Even the recent history shows that a reduction in memory latency has a greater effect on PC performance than an increase in bandwidth.
This argument seems to be more a Rambus vs. DDR thing; and even then on commodity boxen. But I digress. In both cases there is currently an off-chip memory controller. The big reason for the difference in latency is not the controller itself, but the (completely different) methods of transferring data. Rambus uses a serial data transfer, which is easy to scale up (in terms of speed and bandwidth), but has higher latency. DDR is an older, parallel technology. DDR has lower latency, but has lower bandwidth and is much harder to scale up. This is primarily because of electromagnetic crosstalk (and other E&M interference problems) within DDR's (parallel) data paths.
There is a point of diminishing returns with the low latencies DDR offers; the point is frequently reached on high-performance computers (workstations, scientific processing, and high-end servers) where the bandwidth is the key factor. When you're transferring a few GB of memory, who cares that it takes a few µs longer to start receiving data-- overall, the entire transfer (from request to completion) takes much less time. Even Wintel boxen are beginning to reach this point.
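A back-of-the-envelope way to see where that crossover sits: total time is the startup latency plus size divided by bandwidth. The Python sketch below uses made-up figures for a "low latency, lower bandwidth" link and a "higher latency, higher bandwidth" link; they are illustrative assumptions, not measured Rambus or DDR numbers.

```python
def transfer_time(size_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Request-to-completion time: startup latency plus streaming time."""
    return latency_s + size_bytes / bandwidth_bps


# Hypothetical links (bandwidth in bytes per second), chosen only for illustration.
LOW_LATENCY = dict(latency_s=50e-9, bandwidth_bps=2e9)
HIGH_BANDWIDTH = dict(latency_s=100e-9, bandwidth_bps=3e9)

small = 64        # one small random access, in bytes
large = 1e9       # a 1 GB sequential transfer

small_low = transfer_time(small, **LOW_LATENCY)       # latency dominates
small_high = transfer_time(small, **HIGH_BANDWIDTH)
large_low = transfer_time(large, **LOW_LATENCY)       # bandwidth dominates
large_high = transfer_time(large, **HIGH_BANDWIDTH)
```

With these numbers the low-latency link wins the 64-byte access, and the high-bandwidth link wins the 1 GB transfer by a wide margin, with the extra startup latency lost in the noise.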
Personally, I wonder how RAMBUS even got a patent. I don't see how a serial memory bus is 'non-obvious to the trade's practitioners'. But, that's the USPTO for you.
Another major problem is the physical distance to (as well as speed of) DRAM. Silicon technology has already reached the point where a signal often travels faster through logic gates (such as an off-CPU controller) than it does through wire. So long as the memory controller is physically located between the DRAM and the CPU, there is little chance there will be any performance drop. At current CPU speeds, it takes 2-3 clock cycles for any signal to even reach the DRAM (even light-speed is slow at 1 GHz). Then it takes several more before the DRAM addresses and returns data. Then another 2-3 clock cycles before it gets back to the CPU. An off-CPU DRAM controller may or may not take an additional cycle. For large (sequential addressed) memory transfers, this one cycle is a one-shot deal. Even with millions of tiny, single-byte (randomly selected) transfers, there are one million extra clock cycles 'burned up'. Spread over one second of execution, this would result in a performance drop of 0.05% on a 2GHz CPU. (And less as speeds increase)
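The 0.05% figure is just arithmetic, and a short Python sketch makes the assumption explicit: the million single-cycle penalties are taken to occur within one second of execution (that window is my reading, and the function name is invented for illustration).

```python
def overhead_fraction(extra_cycles: int, clock_hz: float, window_s: float = 1.0) -> float:
    """Fraction of the time window spent on the extra controller cycles."""
    return (extra_cycles / clock_hz) / window_s


# One extra cycle on each of a million transfers, on a 2 GHz CPU, over one second.
at_2ghz = overhead_fraction(1_000_000, 2e9)
# Same number of transfers on a faster clock: the fraction shrinks.
at_4ghz = overhead_fraction(1_000_000, 4e9)

print(f"{at_2ghz:.2%} at 2 GHz, {at_4ghz:.3%} at 4 GHz")
```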
As for Hypertransport, the idea behind that is not just absolute performance increases, but also design flexibility. So the same chipset that serves as a PC chipset, may also be able to serve as an 8-way server chipset, with few design changes (perhaps by adding or subtracting a few more HTT channels).
This is true; but as I said, it only really makes things better for the multiprocessing crowd. Chip makers don't usually pass the costs of a higher-complexity/performance chip to the buyers of a lower-complexity chip. The SP chipset would be the hands-down highest-volume seller. An MP chipset that is based from the SP design would cost less than a wholly-redesigned MP chipset. This suits the MP buyers fine... but it doesn't give any benefit to the SP buyers. The benefit is to MP alone.
Even within a desktop environment, you can easily separate out shared PCI/AGP buses, into multiple switched PCI/AGP buses with Hypertransport underlying them.
You can, but why? For all intents and purposes, the PCI/AGP bus is essentially idle 100% of the time. (The times when it is used are more of a statistical anomaly than fact; a figment of the deranged observer's imagination.) Even in applications when there actually is heavy bus activity, the PCI/AGP bus is far from being saturated. There are cases (such as multiport gigabit ethernet cards) where any single PCI slot is unable to handle the load -- but the PCI bus itself still has massive amounts of idle bandwidth; it's just that it's not possible to transfer the data between the network card and the PCI bus fast enough. (Which is a limitation of PCI's component interface, but not of its bus).
I've seen many servers that have multiple network interfaces, where each NIC saturates the PCI card slot. The actual PCI bus, however, is not saturated, and handles the full load of multiple saturated interfaces quite well.
In other words, it doesn't matter how wide the freeway is; the tollbooth (AKA the PCI Slot interface) is the bottleneck, and is the real limiter of performance. A HyperTransport-switched PCI bus would be like adding more lanes to a highway that has nearly no traffic on it. It doesn't change how fast you can drive. It's the long wait at the toll-booth at the on and off-ramps that is the speed problem.
Especially as on many motherboards, AGP and PCI are on entirely different buses, so heavy AGP usage (such as DoomIII, or 3D Animation) doesn't even affect the PCI bus. For the desktop user, there is no benefit to such a scheme. Even a power-hungry gamer, using his AGP8X card to its fullest potential, compiling XFree86, and hosting multiple P2P file transfers couldn't do much to dent the PCI bus's capabilities. It's other x86 problems that are most likely to cause speed drops; not PCI or AGP.
Only in ultra-high-end applications would there be a benefit.
But it's not all of the other players it has to worry about, just one player: Intel. Intel may be allowed to use the HTT, but it's absolutely certain they would rather die than use their great competitor's designs.
That's completely untrue. In several aspects. First, the NIH (Not Invented Here) syndrome has burned just about everybody. No company that is too proud to use a technology that was NIH lasts long. The managers at Intel are not that stupid. But they aren't going to jump on the bandwagon and spend any money just yet; they'll wait until they see how the results fare on the market before they invest anything in HyperTransport. If it's in Intel's best interest, they'll use it. If not, they'll design an alternative. To call AMD their 'great competitor' is rather short-sighted as well. They're only the most major competitor in the x86 arena, and one with a minority of the market. That's the reality, whether you like it or not. And I like (and have recently bought) AMD processors.
All of the other players are small-fry in terms of volume compared to the x86 camp.
That is an entirely baseless statement. The x86 camp is extremely small in terms of the 'other players'. Or weren't you aware that approximately 0% of all computers use an x86 chip? AMD has a very small production volume; so small they don't even fab their own chips. The only major competitor that is fab'd in such small volumes is SPARC. But Power & PowerPC, Itanium, and even ARM processors are all fab'd in greater volumes than AMD's. Intel plans on abandoning x86 entirely; their Yamhill (Hammer-like) processor is a contingency plan, to 'steal the Hammer's thunder.'
HP has no need to use HTT in its processors, simply because it has no processors anymore
Patently false. HP's processor is the Itanium. (more below)
all of them (PA-RISC and Alpha) have been EOL'ed according to their own roadmaps, so what are they going to use them for, Itanium?
Their roadmap EOL's the PA-RISC, but points straight to Itanium. The Itanium is 100% PA-RISC compatible (in addition to supporting x86 and its own architecture). It is the next-gen PA-RISC. They are only supporting the next couple of releases of PA-RISC to appease people who already have PA-RISC hardware, and wish to upgrade the processors in their pre-existing hardware. Alpha was acquired well after the Itanium was complete; a white elephant of sorts. It was never part of the plan. It's entirely likely that HP will include Alpha technologies into next-gen IA-64 chips. If there is customer demand (especially if it's from Itanium's co-designers at HP), HyperTransport will be included as well.
Anyways, the only RISC player that is likely to use HTT is Sun, and they will likely use it in their upcoming Opteron servers. It's likely that IBM, HP, in addition to Sun all have Opteron plans secretly already devised.
Opteron is the Hammer's new brand-name, and Sun will definitely not be using it.
Sun is 100% SPARC, has been for more than a decade, and they have no plans to abandon it. There is no such thing as an 'Opteron server' from Sun. Sun only sells SPARC boxen.
I already covered HP -- they're Itanium. Their roadmaps still point to it.
SGI's roadmap leads to Itanium for their workstations and servers. They will use Intel's answer to HyperTransport (whether it is HyperTransport or not)
IBM is all about their own Power and PowerPC processors, which have better SPECint and SPECfp scores than anything else to begin with.
It's likely that IBM has an Opteron-based PC and Windows.net server, but the Opteron won't be used in their high-end servers or workstations. IBM already scales well past the point where HyperTransport would be beneficial; and IBM is in the same boat as Intel: If it's worth their while, they'll either use or design an alternative for HyperTransport. But for IBM, it may be completely unnecessary to begin with.
Apple is likely to use HyperTransport, as they have a great deal of flexibility in what technologies are to be used in their machines. Apple is also a member of the HyperTransport consortium. Apple's market is definitely not a trivial one.
Which goes to show my point: Just because AMD's Opteron has great features, they are in no way unique to the Opteron. And its competitors have a better system architecture than x86 to boot.
So with the improved process technology they were able to get 70% better speeds (Athlon vs. P3), but with increased pipeline stages (P4 vs. P3) they were able to get 100% better speeds.
Interesting side note: One reason the Alpha does so well is that the physical design is very closely tuned to its fab process.
And a question: Do you mean a greater number of pipelines, or more pipeline stages?
I ask because more pipeline stages doesn't really increase speed very much (ie. there can be one instruction in each pipeline stage, but as each instruction takes one clock to move to the next stage, there isn't any improvement in speed.) In fact, shorter pipelines are often faster, as they don't have as much potential for stage bubbles.
A stage bubble (or conflict) is when, for example, you have a 5 stage pipeline. Instruction A comes immediately before B. However, instruction B requires that A finish the entire pipeline before it can begin executing. So, instruction B has to wait 4 more cycles before it can execute (instruction A must finish, which essentially clears out the pipeline). By the same count, a 10-stage pipeline would take 9 more cycles to clear out before B can execute.
Out-of-order execution can help keep the pipeline busy with other tasks while B is waiting to be executed; but it doesn't always work out.
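Counting it out in a small Python sketch, under the same worst-case model as the 5-stage example (B cannot enter the pipeline until A has left the last stage, so the wait is one cycle less than the pipeline depth). Real chips forward results between stages and rarely pay the full penalty; the function names here are invented for illustration.

```python
def stall_cycles(stages: int) -> int:
    """Extra cycles B waits when it cannot start until A has fully completed."""
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return stages - 1


def total_cycles(stages: int, n_instructions: int, n_dependent: int) -> int:
    """Cycles to finish a run in which n_dependent instructions fully stall."""
    # Ideal pipelined run: fill the pipeline once, then one instruction per cycle.
    ideal = stages + (n_instructions - 1)
    return ideal + n_dependent * stall_cycles(stages)


short_pipe = total_cycles(stages=5, n_instructions=2, n_dependent=1)
long_pipe = total_cycles(stages=10, n_instructions=2, n_dependent=1)
```

With these two instructions the 5-stage pipeline needs 10 cycles and the 10-stage one needs 20, which is why a deeper pipeline only pays off if the clock rises enough to cover the longer bubbles.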
Additional pipelines (which is what I think you meant) is adding a second (or third, fourth...) identical pipeline, so that tasks unrelated to the A,B instructions (above) can be executed as well. Again, out-of-order execution helps keep things busy, but not always.
Which comes to the nice thing about VLIW design: The compiler (or, in the case of VLIW, the masochistic asm coder) is able to take a larger look at the program than is possible in a non-VLIW design (Which, AFAIK for the mass-produced chips, is everything except the Crusoe and Itanium). And that results in a more efficient run than having the hardware attempt to do it.
Of course, as far as design complexity goes, I'm not entirely sure which is easier to design: The out-of-order prediction chip, or a VLIW chip. I tend to believe the VLIW chip is more complex in design.
This directly causes the RISC system to require a bigger cache to keep the CPU fed with the same amount of work.
This isn't exactly true. PowerPC, as I recall, uses a fixed 32-bit instruction (4 bytes). This includes the operation type, the source and destination registers, as well as any additional information.
CISC instructions are variable in size and purpose, and can range from one-byte instructions (such as noop) to multibyte instructions that are far longer than the 4 bytes the PowerPC uses.
So the situation isn't quite so dire; many RISC chips (such as MIPS) have very few 'wasted' bits in the instruction set.
The additional cache isn't anywhere near as big (or complex) as the total savings of RISC vs CISC die size. It's like taking 10 steps forward and one step back. (But don't quote me on the actual scale; as that may vary from chip to chip)
But you're absolutely right on the CISC at 250 MIPS vs. a 1000 MIPS RISC. But I'd much rather design the RISC chip, as it is so much easier than a CISC design of (roughly equivalent) speed.
It's the x86's experience and expertise in designing large decoder stages that has allowed both Intel and AMD to reach the 1 GHz+ frequency stage so far ahead of any of the RISC crowd.
Actually, it's primarily because Intel pushed better fab processes into production earlier than the RISC crowd, of whom only Motorola & IBM fab their own.
The Alpha was making a run for this crown, and it was the only horse in this race for the longest time, and then all of a sudden from out of nowhere both Intel and AMD both overhauled the Alpha as if it wasn't there.
Never underestimate the damaging effects of a corporate sale. When DEC was split between Intel and Compaq (well before the 1 GHz barrier), it was the death knell for the Alpha-- there was simply too much disruption in the shift of companies (not to mention the fact that many of Alpha's engineers wanted nothing to do with Intel or Compaq, so they left). Neither AMD nor Intel was bought out, as DEC was. And AMD even ended up with some of Alpha's engineers!
That leaves the whole category of heavy-haul trucks unanswered by x86 at the moment. But what distinguishes a heavy-haul truck from a pickup? The ability to pull large loads. Is that all achieved by the truck's engine? No! Large trucks have incredible 18-speed transmissions, and stiff chassis, etc. In other words it's the overall package that distinguishes a heavy-hauler from a pickup... [it] describes a similar approach to how you distinguish a RISC processor-based (heavy haul) server from a PC (pickup) processor-based one.
So how's this got anything to do with Hammer?
Easy... Architecture. As you say, the engine is only a small (but significant) part of the entire package that makes the distinction. The rest is the architecture around which the engine is built. Frankly, even though there have been many improvements to the x86 design (primarily by eliminating ISA and replacing it with PCI/AGP), it still has its problems; which is why it will never be a true replacement for high-end workstations and servers.
Well, what it leads to is that Hammer has been designed right from the start to be everything from a car engine, to a pickup engine, to a heavy haul engine. That's because of its various features, such as Hypertransport, and onboard DRAM controller.
If it were designed from the ground up, it wouldn't be x86 compatible; not, at least, if the designers wanted a truly great processor. Rather, AMD hopes to ride the x86-compatibility market and is therefore adapting a phenomenal RISC core to the pre-existing x86 set. It's like bolting a jet engine on a farm tractor.
Hypertransport (as well as a built-in DRAM controller) is only useful on multiprocessor systems. (I'm not downplaying their usefulness at all.) The onboard DRAM controller allows each processor to have its own separate memory (whereas many, including the IA-64, share the same memory through the system bus). Combined with the increased multiprocessing efficiency Hypertransport offers, the Hammer processor line seems to be clearly designed for multiprocessor systems. (Hypertransport and onboard DRAM don't provide any real benefit to a single-processor system.)
It will be great for companies that want to upgrade their x86 server hardware, but want to keep their old software. It'll do great in the 3D animation and rendering studios, many of whom use a Unix-like OS anyway. But for the general desktop machine, there will be only one CPU, robbing the user of the benefits Hypertransport and the onboard DRAM controller give.
One key here is that Hypertransport is not unique to the Hammer; SUN, HP, Motorola, SGI and Apple are all members of Hypertransport consortium, and intend to incorporate it into their processor designs.
The primary benefit of an onboard DRAM controller per chip (no longer sharing the same memory pool via a bus) is already implemented on other architectures by using multiple DRAM controllers.
My argument all along was that the Hammer isn't a good thing because it:
Keeps the paleolithic x86 architecture.
Could operate far faster if its RISC core didn't adapt itself to x86
We would be better off junking the x86 architecture sooner than later.
The Hammer, while an excellent x86 design, seeks to make the transition 'later', if at all.
Most of the responses I've seen are remarkably similar to a PC fan's reasons why they don't want to switch to a better machine than x86 can provide: They're cheap (the machines, although it can apply to a few users). Actual reasons as to the Hammer's 'superiority' are in no way particular to the Hammer, and are found on many of its competitors' drawing boards as well.
And outside the Free software world, where the software typically only requires a recompile, the Hammer faces some serious, possibly fatal obstacles once 64-bit compiled commercial packages begin to replace the older 32-bit code. The commercial reality is that to be successful, the Hammer has to have natively-compiled 64-bit code (in Windows). To do this, AMD has to have developers who will support Hammer/64 in addition to the IA-64. They'll have to either sell two different versions (somewhat similar to the sales of Mac vs PC / or Win32 vs x86Linux games), or have both binaries in one package. Both are expensive propositions, and with Intel's virtually guaranteed market-share, it may not be worth the effort to support Hammer.
For a brief history on AMD and binary incompatibility-- Jim Turley, a CPU/Architecture analyst, said the following: "Backing Intel's newest and heavily promoted next-generation architecture is a foregone conclusion for vendors that want to stay in business. Supporting AMD becomes more problematic. Will the added market share be worth the effort? Suddenly AMD finds itself in the same boat as Apple with a different, yet competitive, product that requires dedicated software support to survive.
Grimly, AMD itself lived through this tragedy not so many years ago, and the wound was self-inflicted. AMD unceremoniously axed its entire 29000 family, one of the most popular RISC processors of the early 1990s, due to the cost of software support. The company decommissioned the second-best-selling RISC in the world because subsidizing the independent software developers was sapping all the profits from 29K chip sales. As "successful" as it was, AMD had to abandon the 29K, the only original CPU architecture it ever created." (emphasis added)
I'm not saying that the Hammer isn't a good processor.
I'm saying that it's putting a jet engine in a 1940's John Deere tractor. I'm saying the mechanic should dump the tractor, and put a jet engine in an aircraft-- not an ancient, over-extended farm tool. The tractor could still do its job, but it's just such a waste of the engine's potential.
I'm sorry, but the x86 instruction set is old and inefficient; it doesn't allow compilers or programmers to access the features of a modern CPU (including the Hammer)-- so the Hammer has to deal with the limits inherited from the x86 set.
IA-64 allows explicit branch/pipeline ordering and load optimization; this allows the compiler's larger view to create code that keeps all the pipelines busy.
As all branch/pipeline and load optimization is done in the compiler, there is much more time to find the most optimal instruction order and path. (Fractions of nanoseconds vs. seconds/minutes/hours)
An instruction set (such as IA-64) capable of direct access to branch ordering, or a greater number of registers is more powerful, in that it allows for developers (directly, or via a compiler) to 'take the time' and resources to find the most optimal/efficient way to use the processor's full capabilities.
x86/Hammer does not allow explicit branch/pipeline ordering or load optimization, as x86 was purely single-pipeline until the first Pentium. (Although technically x87 is another pipeline, it served an entirely different purpose... the branching I speak of is of two or more identical pipelines)
As a result, the processor (Pentium, Athlon, K6, Hammer) must look at its instruction cache, and from that (very limited) amount of information, attempt to optimize the branch/pipelines and provide load-balancing. Time is extremely limited (to fractions of nanoseconds), as are resources to perform any re-ordering. And as time is limited, it frequently executes a suboptimal route and/or order.
Even though the Hammer has all kinds of ultra-modern features and resources, nearly all of them are inaccessible to the programmer/compiler; while the built-in management of these features/resources is quite good, it is also far from perfect (having a far more limited scope than a compiler does, after all). Cycles that could have been put to good use end up being wasted.
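A toy sketch of the 'limited scope' problem, for what it's worth. It assumes identical pipelines, one-cycle instructions, and a scheduler that may only look at the first `window` not-yet-issued instructions in program order. A big window stands in for the compiler's view of the whole program, a small one for hardware peeking at a few instructions at a time. The function and the example program are made up for illustration and don't model any real chip.

```python
def cycles_to_issue(deps, width, window):
    """Toy scheduler: count the cycles needed to issue every instruction.

    deps[i] lists the earlier instructions that instruction i depends on.
    width   is the number of identical pipelines (issues per cycle).
    window  is how many not-yet-issued instructions, in program order,
            the scheduler may look at each cycle.
    Every instruction is assumed to take one cycle (a simplification).
    """
    issued_at = {}                      # instruction index -> issue cycle
    pending = list(range(len(deps)))    # remaining work, in program order
    cycle = 0
    while pending:
        cycle += 1
        visible = pending[:window]
        # Ready means every dependency was issued in an earlier cycle.
        ready = [i for i in visible
                 if all(issued_at.get(d, cycle) < cycle for d in deps[i])]
        for i in ready[:width]:
            issued_at[i] = cycle
            pending.remove(i)
    return cycle


if __name__ == "__main__":
    # Two independent 4-instruction dependency chains, written one after
    # the other: 0 -> 1 -> 2 -> 3, then 4 -> 5 -> 6 -> 7.
    program = [[], [0], [1], [2], [], [4], [5], [6]]
    print("whole-program view:", cycles_to_issue(program, width=2, window=8))
    print("two-instruction view:", cycles_to_issue(program, width=2, window=2))
```

With the whole program visible, the two chains can be interleaved across both pipelines; with the narrow window, the second pipeline sits idle until the first chain is nearly done. Real out-of-order hardware has a much bigger window than two instructions, so the gap here is exaggerated on purpose.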
Lastly, I'll say that I'm not so much a fan of the IA-64 as I am of the VLIW concept; Non-VLIW processors (Sparc, Power, Alpha) have the same pipeline scheduling concerns as the Hammer. But at least they offer greater access to the processor's resources (such as double or more the accessible GP registers of 64-bit Hammer).
AMD has stated that adding the 64-bit extensions to Hammer has only increased its core size by about 5% over K7, at the same process size!
That's not too surprising... I'd say the figure is about right. With as large an instruction decode stage as an x86 (or any CISC) has, changing from 32 to 64 bits isn't going to change the size of the chip much. (The 64-bit extensions, from what I understand, do not add more than a couple of instructions; they simply reuse the ones already there. Hence the decode stage won't grow too much.)
The thing is the Decode stage takes up so much of the overall die (and number of transistors, etc) in any CISC processor, that even sweeping changes in the remainder of the chip will result in a nearly identical die size.
That being said, the actual RISC processing core (of the Hammer) is significantly larger than the K7's RISC core. (On the order of 20-30%). It's just that the decode stage is so huge that it hardly makes any difference.
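The arithmetic behind that can be sketched in a few lines. This assumes the crudest possible model: the die is split into a decode part that doesn't grow at all and an 'everything else' part that grows by the core's percentage (cache and pads are ignored). The figures plugged in are just the ones quoted in this thread, not measured values, and the function names are made up.

```python
def total_die_growth(decode_fraction, core_growth):
    """Whole-die growth when only the non-decode part of the die grows."""
    return (1.0 - decode_fraction) * core_growth


def implied_decode_fraction(total_growth, core_growth):
    """Decode share of the die implied by a given total and core growth."""
    return 1.0 - total_growth / core_growth


if __name__ == "__main__":
    # Figures taken from the discussion above (unverified): the core grows
    # 20-30%, the whole die grows about 5%.
    for core_growth in (0.20, 0.25, 0.30):
        half = total_die_growth(0.5, core_growth)
        implied = implied_decode_fraction(0.05, core_growth)
        print(f"core +{core_growth:.0%}: die +{half:.1%} if decode is half, "
              f"decode share implied by +5% total = {implied:.0%}")
```

Under this model, the bigger the decode share, the more a growing core gets diluted in the total; a 5% total with a 20-30% larger core only works out if the non-growing part is well over half the die.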
Why do you think people are so excited about Hammer?
A couple of things: First, there is a significantly large anti-Intel crowd. (Not surprisingly, they're also anti-Microsoft). So any upcoming non-Intel chip is exciting to them.
My feelings as to 'why AMD?' come down to a simple factor: Price. AMD chips are loved by so many because they're cheap x86-compatibles (games being a key factor). If Apple hardware were similarly priced, and had the game market that x86 offers, Apple (and PowerPC) would be a favorite.
Processors can be related to cars fairly well, as long as you forget about being compatible with Windows for a moment; And frankly, as far as I'm concerned, the programs that run on it don't make a difference to the actual hardware.
The Hammer is akin to a pickup truck: A fairly inexpensive, medium-quality vehicle. It's loved because it does its job at a bargain price. It's utilitarian. It's the 'people's truck', and is affordable to most of the population.
Workstation processors (such as Power, SPARC, Alpha, PA-RISC, Itanium) are comparable to a semi-truck (Kenworth, International, Caterpillar): They don't necessarily go any faster, but they can tow huge cargos, and the corresponding rise in cost is far from linear.
And Apple (PowerPC) processors are a BMW or an Audi: They don't really run any better (or worse) than a pickup truck-- but it's a higher-quality 'luxury' car, and gives a better ride. You pay for the quality and experience, though.
And, basically, there are a lot of people who are perfectly happy with their pickup truck. They're not about to pay more (at a very uneven scale) for more performance of a semi-truck, nor do they care for the luxury of a BMW.
(And, the Itanium isn't as great as the other workstation processors, but it's also the only 1st gen chip in the bunch; The 1st gen SPARC, Power, and PA-RISC processors weren't wonderful either.)
The Itanium also has one major problem with regard to die size: It's binary compatible with both x86 and PA-RISC processors; meaning that while the pure IA-64 architecture part of the chip is smaller than the Hammer, it then has the circuitry to decode x86 (which is a huge # of transistors, and hence, huge die area), PA-RISC (a much simpler/smaller addition to the x86 decode), and the IA-64's own VLIW decode.
If the Hammer had three separate instruction decoders (one CISC, one RISC, one VLIW), then it would have a huge die area too. But the Hammer has one (CISC). And even the Athlons would be half their current size if they were pure RISC rather than CISC. (Of course, they wouldn't be x86 compatible then, but that's markets for ya.)
The 64-bit extensions don't comprise an entirely new instruction set, primarily because they're just that: extensions. The Hammer's mechanism to extend from 32 to 64 bits is identical to the way the '386 extended from 16 to 32 bits. (This is from AMD's data). The '386 also added a couple more instructions (and registers) to the '286 design. That doesn't make an entirely different instruction set and/or decode.
Hammer has a substantially smaller die than P4, its main competitor.
Again, that's comparing a fab tech that is in the near-future compared to one that's been used for over a year. Not a fair comparison by any means. It's like saying that the Athlon has a smaller die than the K6. Completely different chip generations.
And since both Intel and AMD are working together (with about every other semiconductor maker) on researching new fab techs, you can bet Intel will have the same fab tech as the Hammer. (I do know the Itanium II uses a 0.09 micron fab tech, which is unprecedented for the scale.)
Supposedly the Itanium was (more or less) a rushed release (similar to the PowerPC G4). The Itanium II seems to have improved by a few orders of magnitude in efficiency, as well as speed. For that matter, the PowerPC G5 (which is not being rushed out) specs about 2x faster than IBM's Power4 core.
And, remember, as I said before, the Itanium is currently targeted at the Workstation/high-end server market; NOT the PC market. When I say workstation, I mean the "ultra-high performance, ultra-high stability (and typically, ultra-high cost)" market. The Itanium is priced similarly to the primary competitors in the arena, those being UltraSPARC, Power, Alpha, and PA-RISC. The first-gen Itanium is not (and was never intended to be) anywhere near your local consumer electronics store (or your local system builder, for that matter).
The Athlon MP is not real workstation class by any stretch of the imagination. No competent engineer even trusts the architecture with critical tasks. I have yet to see anybody design computer hardware (or vehicles, or perform complex simulations, scientific calculations, or true enterprise-level work) on x86 hardware. The hardware, while cheap, still crashes far, far too often... it doesn't have anywhere near as good a memory (and system) architecture... the list goes on and on.
The reason PCs are used for 'render farms' is that they're so cheap. If a computer crashes, then they just have to re-boot it and re-render the current frame (losing only a few hours' work at most, and even then in a relatively non-critical task).
To be short: Sun dominates the workstation market, followed by HP, IBM, and SGI. None of their workstations (with the exception of SGI's lowest-cost graphics workstations) run x86. That's over 95% of the workstation market.
There is the issue of OEM support, but if Hammer meets spec it will be in high demand. Unquestionably. However, that doesn't mean it will be successful. AMD once made the world's most popular RISC processor (hands down). It literally blew everything else away in terms of sales. AMD discontinued production because, in spite of very high demand for the hardware, they couldn't come close to competing with the other architectures (or, to be more specific, although hardware makers loved it, nobody wrote software for it.)
If the Hammer isn't compatible with IA-64 compiled binaries, then AMD will have to fund the development of Hammer-compiled versions, as software developers, following the money, will support IA-64 first. AMD has done this in the past already, but had to give up because it wasn't profitable. (Not coincidentally, it's the same RISC processor that was in such high demand that was the source of this headache).
Why don't you go to the Usenet group comp.compilers and state that "Except for pathological cases, C code runs a few hundred percent slower than assembler".
The resulting blood bath should be amusing.;-)
As I said, the assembler vs compiled fight is quite long running. Stating asm vs compilers arguments in a compiler newsgroup would get a similar response to a Windows user extolling the virtues of WinXP in a Mac (or Linux) group.
And, unsurprisingly, stating that C is anywhere near as efficient as pure asm in an assembly newsgroup would be a bloodbath as well.
The main argument for using C is that it is generally faster software development, and generates code that is 'acceptable'.
Pure asm takes more time to develop, but results in significantly tighter/faster code. The L4 microkernel is a great example of this: The C implementation is much slower than the asm implementation.
But, unsurprisingly, the C implementation is a bit easier to work with.
HP did some research a while back (2-3 years) with software optimisation. They discovered a few interesting things: They could 'emulate' (using full architecture emulation) compiled programs with ~5-15% greater performance than running the same binary natively. (The emulator was emulating the PA-RISC architecture, and ran on top of PA-RISC hardware, so the test was conducted on the same machine) The emulator was capable of making up for inefficiencies the compiler added into the code. In spite of the (large) overhead of the emulation, the program still ran faster while emulated.
While it has yet to really see more than tech demo releases, the Amiga OS4 technologies are quite similar: They are able to run the exact same binary on multiple platforms (PowerPC, x86, IA-64, SPARC, and MIPS) with no drop in performance (compared to natively-compiled versions of the same code). Again, this is due to (current) compiler problems. (PS- I'm not an Amiga fan per se, but I do admire how well-engineered they were for their day)
Another good example is the speed difference between different compilers on the same platform. If they compiled to anything remotely close to the speed of asm, then there wouldn't be a 15-20% speed difference between a highly specialized compiler (such as Intel's) versus a more generic (cross-platform) compiler (such as gcc).
Don't get me wrong: There's nothing really wrong with using a compiled (or interpreted) language. There are very definite benefits to their use (development and maintenance time being primary considerations). Compiled languages are acceptably fast, and compilers are getting steadily better.
But I doubt we'll ever see more than a fraction of embedded devices use a more high-level language. A price difference of $0.01 adds up to real money (and reduced cost) in commercial production runs. In addition, even where compiled languages are used, the resulting code is still disassembled and the results scrutinized closely. (Which isn't much different than just writing the whole thing in asm anyway.)
The discussion is about Hammer vs (insert processor)
But all are (or will be) used in embedded system design anyway, so that's where my train of thought was leading. The Hammer mainly has 'momentum' going for it. Just about everything else is against it.
First, the Hammer is a design a few orders of magnitude more complex than anything else ever attempted. The engineers at DEC dropped the VAX processors and designed the Alpha to avoid the same complexity issues the Hammer is trying to tackle.
It uses the x86 set, which has both more instructions, and more complexity (some would say features) per instruction than a pure RISC processor. About half the Athlon's design is just to decode the instructions it's given. After decode from x86 into its internal RISC structure, it then schedules the pipeline, and finally actually sends the data into the appropriate pipeline for execution. There is a huge amount of overhead just to decode what needs to be done.
Pure RISC designs use about 15% of the chip's transistors for decode, and that's if you include pipeline scheduling.
This is the crux of the problem for AMD's Hammer. The Hammer will be forced to use a much larger transistor count than its RISC competitors. The higher transistor count results in several problems: It's far more complex and expensive to design. It takes a more complicated and expensive process to fab. The die is larger, which results in a slower processor. And it uses more power.
Which means that while AMD may have some momentum going for it, the Hammer is far more costly to design and produce than its competition. This will make things very hard for AMD; especially if Intel is able to use its (considerably greater) resources to get computer makers to move from x86 to IA-64 at the same time they move from 32 to 64-bit.
And since HP/Compaq, Dell, Gateway, Micron, and IBM have all thrown in with IA-64... Things look grim for the Hammer.
The good thing is that I would bet that the RISC back end of the Hammer is designed so it can be mated with an IA-64 interface should the x86-interface core not take off.
or are you claiming you could run Quake 3 on a 386, if it were written in assembly?;-)
Not exactly a fair comparison, given that 3D acceleration was a rather expensive solution back in the days of the '386. Most accelerators were used for military flight sims. The accelerator was about the size of a refrigerator, and connected to the 'host' computer, which was usually a SPARCstation @ 33 MHz. (At least in the case of sims made by Evans & Sutherland, which had the market for them pretty much cornered.)
However, I wouldn't be too surprised that the non-graphics portions of Q3A would run fairly well (but not great) on a '386 (if it had a '387 FPU as well.).
I'll say this: Without a dedicated 3D card, it would take a Power4 module to tackle Q3A at max settings. (Of course, a Power4 module isn't a single processor-- it's 8 processor cores roughly analogous to a PowerPC. And just one of the current Power4 cores outruns AMD's 'best-case' specs for the Hammer, which is still in development.)
Except for pathological cases, C code will run a few percent slower than hand-tuned assembler
More the opposite; except for pathological cases, C code runs a few hundred percent slower than assembler. (Although on the IA-64 architecture, this is not necessarily true, as it relies entirely on the compiler to explicitly state operation order. The IA-64 does not re-order operations (or do any pipeline scheduling) at all, which is one of the primary reasons the Itanium runs x86 code so slowly.)
And, the IA-64 arch. is about the only one out there where a C compiled program stands a fair chance against pure asm, (since it requires the pipeline scheduling to be explicitly stated by the programmer, which is an extremely difficult task for mere mortals.)
Frankly, I'm not about to argue any further. The asm vs. C/compiled argument is older than vi vs. emacs; except that 'vi vs. emacs' doesn't have much impact on the speed of programs written in it. I know from my own experience how much faster assembler is than compiled languages. I stand by my numbers. So do all the hardware engineers I know, including a couple who have Ph.D.'s in compiler design.
C is great because it compiles well, and is cross-platform. Asm doesn't require THAT much more development time than C does... But asm is so device-specific that unless you're writing software for a driver or embedded devices, the advantage of C's more portable nature outweighs asm's speed.
I mean... think of what Carmack would give up if he wrote his graphics engines in asm: NO cross-platform capability, a nightmare interfacing with graphics card drivers, and almost no flexibility in the graphics engine.
And since the advantages of ID's graphics engines have always been broad platform and hardware support, and extreme graphics engine flexibility, he'd lose a significant part of his market if he wrote it in asm. Plus, other companies would have graphics engines that, while somewhat slower, would be available far sooner than an asm implementation.
To 'cure' an ailment, you first have to treat the cause of the disease instead of the symptoms.
Again, refer to my earlier statements. Diseases are about the only health problem for which a cure can exist. But diseases are just a slice of the entire 'health problem' pie. There is far more to health care (and pharmaceuticals) than ridding a body of an infectious disease.
they had decided to "patent instead of publish". In the end nothing useful was accomplished because the research results were secret, and those suffering from the disease could not afford commercially synthesized drugs.
Um... patents are published. In fact, that's one of the main reasons we have patents -- to force publication (in a public arena) of a good idea.
Besides... to grow any herb in quantities that are useful is often far more expensive (if it's possible at all) than the synthesized drug.
You've just proven you have no practical knowledge of software development. Far less than 1% of desktop/workstation/server software is programmed in assembler. Perhaps the inner loop of some game engines might be, but I doubt even that in most cases.
If desktop computers accounted for more than a tiny fraction of the whole computer market, I might actually care about that statement. Fortunately, the vast majority of computers are embedded systems, and a substantial portion of embedded code is pure asm.
One of the main points of developing faster processors with large amounts of memory was to enable the use of more programmer-friendly languages. It is simply not worth the cost to develop systems of any size in assembly.
No, that's the software designer's point of view. The hardware designer's point of view is to maintain the performance of software written by overworked programmers who don't have the time to do it right.
Finally, if you think C code "usually" runs several times slower than assembler, you're just plain out to lunch.
First off, C code does execute several times slower than assembler. On the order of 5-10x is typical. Compilers really aren't that wonderful.
Just because it's misleading doesn't make it false.
It does taste sweeter when it's more dilute.
Such is the case where marketers can get around such nasty things like 'lying' and 'false advertising.' They just tell the truth in a way that is so misleading that it gives an impression opposite of the truth.
The drug companies want the maintenance drug, the patients want the cure.
Sure the patients may want the cure. However, there's almost never a real 'cure' to most ailments. There's rarely any such thing as a 'magic bullet' for such a thing. On this point, pretty much everybody agrees, from the loudest herbal/homeopathic advocate to the most conservative scientist.
Antibiotics (and many infectious diseases) are the closest thing to a case for the existence of a 'magic bullet'. And we still get sick, don't we?
There's no getting around this fact. Most health problems are a matter of treatment, because a real cure is utterly impossible. You can't cure old age; we can merely relieve some of its symptoms. We can't cure joint and/or muscle problems. You can't 'cure' chemical depression; the person's brain just can't maintain the right chemistry on its own. Treatment (or as you put it, maintenance) is required.
Which suits health care professionals just fine. All want your repeated business. From herbalists, to acupuncturists, to doctors, and pharmaceutical companies.
I used to work for a company that markets 'nutritional supplements,' including herbals. I worked in the lab. The company spends millions on research every year, into ways to maintain or improve health. One thing they researched vigorously: Whether herbs had the properties their advocates claimed.
In nearly every case, there were no particularly unique compounds, organic or inorganic. In clinical studies, the result was no better than a placebo. The exceptions are very conditional at best (i.e. slight improvement in memory of a small percentage of Alzheimer's patients, and only among Alzheimer's patients; the claim was that it substantially improves memory for all).
Drug companies do the same; an astounding number of drugs are biological in origin, and they haven't forgotten that. Aspirin (willow bark tea), penicillin (mold), opiates, and coca-derived drugs are among them. They're still finding benefits to aspirin. And just because it's biological in origin doesn't mean it's not patentable. (You can patent the use of this chemical, found in [komodo dragon saliva, cow feces, this plant... whatever], for the treatment of [whatever ails you], so long as you can prove it is safe and effective to your government's equivalent of the US's FDA.)
So it's not that herbal solutions aren't being looked at seriously, or that there isn't funding. It's just that in nearly every case, the actual facts of an herb's properties do not back up the claims of its advocates. For that matter, many herbs touted as 'good' are in fact rather dangerous. (I even remember reading about the healing properties of Hemlock.)
If an herb has good properties, we study it, and find out how it works. In nearly every case, it's easiest (and cheapest and safest for everyone) to isolate the compound(s) responsible, and synthesise them. (Which can then be patented)
I mean, there's a lot of proven harmful products out there (cigarettes, for example) that haven't gone away yet.
Logic doesn't have much to do with things that have been part of a culture for centuries. Tobacco is one of them. Alcohol is another.
Alcohol is so dangerous: The statistics of crime (espescially murder and rape) commited under the influence of alcohol are staggering. Ditto for accident rates. Alcohol is dangerous to an individual, and to the public.
At one point in time, the US did the 'logical' thing and ban alcohol completely by banning the substance in its constitution. (And for anybody not familiar with american politics, amending the constitution is not an easy or trivial thing to do.
However, while it was the 'logical' thing to do, the consumption of alcohol was (and still is) so ingrained into American culture that its people simply rebelled wholesale against it. Alcohol production simply went underground. Eventually, another amendment was added to repeal the ban on alcohol, having decided that making a law can't change a culture.
The harmful effects of tobacco weren't proven until relatively recently; and having learned a lesson from banning alcohol (along with experience gained from law enforcement with other substances, like cocaine, opiates, marijuana, methamphetamines, etc...) Congress is unlikely to try to outlaw tobacco, in spite of its harmful properties (both public and private).
So while the government can't stop the use of 'cultural' drugs/substances, it can (with reasonable success) keep substances that have not come into common use from becoming popular, and harming the public.
First, almost all programmers can (thankfully) ignore the underlying instruction set and program in a higher level language - therefore it is irrelevant. x86-64 is actually quite an improvement over IA32 regardless.
Oooh! A higher level language!!!
So is BASIC! And you can get it for any platform and your code will run.
Whoopee! It's still dog slow and takes up more resources than is necessary to get the job done. Even compiled (C) code usually runs several times slower and requires more memory than assembler.
Second, if an instruction set is sufficiently efficient to allow the processor to be the fastest microprocessor in the world,
First, an instruction set has little to do with the speed of the processor. The whole CISC vs. RISC thing has more than shown that. An instruction set has more to do with the difficulty and/or complexity of the processor's design. The CISC instruction set requires more (electrical) power, and more transistors to do the same job.
Second, it's to be the fastest in the world? By what method is this measured? Clock speed? Size of the pipeline? Number of pipelines? Clocks per (integer, float, or instruction)?
The hammer isn't even meant to compete with workstation processors in terms of speed. I'll take a SPARC or Itanium any day. (It's a sad thing that so many seem to forget that the Itanium is an HP design, the successor to its PA-RISC, and that newer versions of the Itanium will include many of the Alpha's technologies).
Don't know what you're complaining about those IRQ's, all systems in the world have something similar in concept to IRQ's. And in fact most systems throughout the world are now standardized on PCI, so they use the same IRQ mechanism as PC's.
Exactly true. Although the number and arrangement of the interrupts may be different. I would prefer not to think of how dog slow computers would be if they had to actively poll system devices (from video cards to keyboards). It's sooo much nicer to use an interrupt system.
And what about them "lousy serial ports"? That's absolutely essential in maintaining control over large groups of Unix servers. Their consoles are invariably serial-port based. They do have nice modern GUI consoles, but when it comes to stacking them into a server room and controlling them all from a single input/output source, nothing beats the simplicity of a serial console tty device. And since they're X Window or Java based, you can simply do all of your graphical stuff from the comfort of your own PC logging in remotely, but the local administration can be done over non-graphical serial ports.
While not arguing this point in the least, I will say one thing: The way the serial ports are set up on the x86 is a bit messy. The Unix boxen I've worked with had a more elegant system for serial ports. (Although most of them also didn't have the same backwards-compatibility problems x86 has).
16 GPR's are well within the current norms for RISC processors. Don't forget this is a CISC processor, so more than likely there will be all kinds of hidden internal registers for register renaming available. BTW, the 6502 had a whole 2 GPRs available to it.
Funny... I seem to remember most RISC processors I've known (or designed) to have at least 32 GPR's.
Besides... The point of moving away from CISC is so a processor doesn't use over 1/2 its transistors just to decode the instruction. The instruction decode section of the pipeline shouldn't be the single most complex part; unfortunately on a CISC processor, that's where ~50% of the transistors are.
I'm also fully aware of the 'evolution' of PC architecture. I've been programming x86 asm for quite a while as well. Many of the x86 (even modern ones) ways of doing things are just... inelegant (or ugly)
Not as coupled as you may think. SSE, MMX, 3dNow!-- all are either specialized floating point calculations (such as 128 bit floats-- but you can pack 2 64 bit floats in it and get a speed boost there), or are used for matrix operations. All of the above are decoded into many RISCOps, which are then passed to the somewhat more 'generic' FP pipeline(s).
I also seriously doubt the decoder takes up half the CPU space.
You'd be greatly surprised. The whole RISC vs. CISC debate centers around this fact. The reason DEC dropped its VAX (CISC) processors and created the Alpha (RISC) was because the decode stage in the VAX would take more than 1/2 the design. Motorola dumped its 680x0 for PowerPC for the same reason. As did IBM and HP for their RISC chips. And Sun's had its RISC SPARC processors as long as I can remember.
Only Intel's x86 (and x86 clones from AMD, Cyrix, and others) remained CISC. Many analysts thought remaining CISC would kill Intel (as CISC chips are more expensive and more difficult to design). Only the sheer volume of x86 allows Intel to spend the money to develop CISC chips that perform as well as the much simpler RISC designs. (ie. if you look at what Intel spends to develop their x86 designs, and compare it to the development cost for a SPARC, Intel spends a lot more for a slower product).
AMD kept competitive by joining two chips: The decoder and the processing core. The decoder would only have to change if new instructions were added (such as SSE/2). And AMD could concentrate on a wickedly-fast RISC core.
The reason CISC chips have such a huge decoder is the complexity of the instructions.
A RISC instruction does only one task, in one way.
A RISC design has an absolute minimum of instructions, with no redundancy.
And to perform more complex tasks, you have to combine multiple instructions.
RISC instruction sets are not as easy to program in assembler with. (or, to be more accurate, it's a lot more tedious)
A CISC (like x86) has multiple-purpose, multiple method instructions. (somewhat like operator overloading in an Object Oriented language).
A CISC chip has ~300+ instructions, where a RISC chip has ~70. (With some RISC chips having as few as 40)
CISC assembler closely resembles a more high-level language.
Implementing high-level programming instructions in hardware takes a lot of transistors.
RISC designs have both less complicated (2-3 times simpler) instructions, and fewer instructions (by 3-6 times fewer).
Even in a RISC design, decode is at least 15% of the total design (and more if you use out-of-order execution).
I also don't think Intel plans to sell the Itanium to the public for a long time. They may also add 64 bit capabilities to their Pentium line of chips to compete with AMD that way.
Their marketing is a bit wishy-washy about that; I think they'll wait and see for the time being. There is a project at Intel called 'Yamhill' which is intended to be a 64-bit x86 clone (like AMD's Opteron).
The thing is there's not too much of a reason to move to 64-bit, save it be addressing more memory, or other things that require large integers. And if you're addressing that much memory, x86 is a lousy choice to start with.
As I said- I don't know much about VLIW at the current time; And I am quite ignorant of TransMeta's Code Morphing.
So I'll decline to comment further.
x86 is dying. Apple isn't known for living in the past.
Just because it's Intel doesn't mean it's x86.
Releasing OSX for x86 is completely moronic. Apple is a computer company, not a software company. They sell computers first, software second. If OSX ran on 'open' PC hardware, nobody would buy Apple computers-- they'd buy cheap hardware and OSX.
This is exactly what happened circa 1995 when there were Mac clones. The clones bled Apple dry. Steve Jobs saved Apple by making it a closed system again. Openness only works in a world that believes in openness. The clones exploited Apple's generosity, and it nearly killed Apple.
Pit any software against Microsoft, and expect Microsoft to attempt to kill it. Apple is doing well because they cooperate with Microsoft. If OSX were released for commodity PC hardware, Microsoft would dump Office/Mac, and basically shut OSX out of the market (as it did with Netscape).
Free software is surviving Microsoft because Microsoft can't compete with Free Software's price. There's no company to bankrupt, and the software is largely donated by generous coders. Apple has no such protection. They can go bankrupt, and they don't have the hordes of programmers donating code that Free Software enjoys.
Only on x86 code. It's like a mac user (10 years ago) complaining that their new PowerPC runs the same program slower than their old 680x0.
Anything runs slower under emulation -- whether it's hardware or software providing the emulation. Especially if the emulation is more of a 'white elephant' that isn't really intended to be used.
The Itanium is x86-compatible. There was never a promise that Itanium would execute x86 fast; the current Itaniums aren't even meant for the consumer market, but for workstation use; workstation code, such as the OS (Win64 & Lin64) and apps. And the apps are usually written to be portable, as the market requires it to run on (PPC, PA-RISC, x86, SPARC, MIPS) computers anyhow. Just re-compile.
And, FWIW, Itanium runs PA-RISC code about as well as the PA-RISC does. There's just more in common between PA-RISC/Itanium than there is x86/Itanium.
I believe the poster meant that there are 2 different Itaniums:
One that is x86 backwards compatible (and only x86 backwards compatible)
One that is PA-RISC backwards compatible (and only PA-RISC backwards compatible)
And the poster thought it not much of a stretch to create a third version, that is PowerPC compatible.
Of course, there is only one Itanium core, and it handles all 3 (as you said). However, most RISC chips (such as the PA-RISC and PowerPC) at least have enough similarities that emulating PPC on PA-RISC (or using the PA-RISC decoder) is relatively simple; the opcodes may be different, but otherwise almost everything translates over directly.
eg. (example-- the actual binary is probably different)
Function to perform: A+B=C
"ADD A, B, C"='AF0F32BFh' in PPC machine language.
"ADD C, A, B"='CBBF0F32h' in PA-RISC machine language.
The difference is the opcode byte (AF v. CB), and ordering (A+B=C v. C=A+B)
The commands translate directly over, and only the formatting of the instruction matters. Easy emulation. x86 emulation is more of a bear: a single instruction can do different things, depending on the context (almost like operator overloading in assembly)
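To make the 'only the formatting matters' point concrete, here is a minimal sketch in Python. It uses the made-up encodings from the example above (AF and CB are not real PowerPC or PA-RISC opcodes), and the function name and one-byte field layout are assumptions for illustration only.

```python
# Hypothetical encodings taken from the made-up example above; these are
# NOT real PowerPC or PA-RISC opcodes or instruction formats.
PPC_ADD = 0xAF      # "ADD A, B, C": opcode, src1, src2, dest
PARISC_ADD = 0xCB   # "ADD C, A, B": opcode, dest, src1, src2


def ppc_to_parisc(word: int) -> int:
    """Re-encode one 32-bit 'PPC-style' ADD as a 'PA-RISC-style' ADD."""
    opcode = (word >> 24) & 0xFF
    if opcode != PPC_ADD:
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    src1 = (word >> 16) & 0xFF
    src2 = (word >> 8) & 0xFF
    dest = word & 0xFF
    # Same operation, same operands: swap the opcode byte and reorder fields.
    return (PARISC_ADD << 24) | (dest << 16) | (src1 << 8) | src2


translated = ppc_to_parisc(0xAF0F32BF)
print(f"{translated:08X}")  # prints CBBF0F32
```

A table lookup plus a field shuffle is all it takes in this toy case. An x86 translator could not be written this way, since the same opcode can mean different things depending on prefixes and addressing modes.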
There have been similar rumors about using AMD chips; they go along these lines:
AMD Athlon & Opteron processors are really two processors: 1.) An x86 decoder, which translates the x86 instructions to 2.) AMD's completely original RISC core; each is roughly 1/2 of the total die size.
Take the upcoming Opteron, chop off the x86 decoder (which is about 1/2 of the chip), and use its RISC core natively (and emulate PPC)
Take the Opteron, and replace the x86 decoder with a PPC decoder (which would still be a smaller die than the x86 Opteron)
AMD is more likely to modify their design than Intel is.
Of course, the argument can be made 'why modify anything?'
As the poster said: x86 is on its last legs. The Opteron is likely the bed it will die in. There's really no reason to even have a CISC chip now that compiled languages are used instead of assembly.
There aren't many compelling things that show that VLIW is a better design paradigm than RISC. Few convincing reasons that VLIW (Itanium) is better than RISC (PowerPC)
Even Intel will have to debunk the MHz myth when trying to convince the public to buy the consumer version of Itanium, rather than the x86 Opteron.
Itanium and PowerPC have roughly equivalent SPEC scores at the same clock speeds.
There's not much to show that PowerPC is 'showing its age', as many of Itanium's touters claim. (It's more of a VLIW vs. RISC argument)
Apple has already done the processor emulation: When it moved from 680x0 to PowerPC. (It's not as big a problem for them, having learned how to do it.)
Actually, endian-ness isn't even an issue either. Both PowerPC and Pentium processors have an endian flag-- this allows the processor(s) to use either byte order with no performance drop. Of course, the two have opposite 'native' modes, but it honestly doesn't matter in terms of speed.
It has been shown that it's impossible to tell the difference in frame rate. However, in real life (as in games) there are things that happen too fast for the motion to be seen.
Games are full of explosions, etc. Very high-speed motion. Most people have watched too many movies; they're used to 'slow' explosions where debris & affected objects are visible on screen. Movie makers know we like eye candy, so they give it to us.
Reality is quite different. A TOW missile explodes before it hits its target. The expanding gas forms a 2-4" hole in the target's armor in microseconds. A person watching it can't see the transition. Bullet wounds take 2-3 frames to fully appear in a movie. Reality is more like 1e-6 frames. Explosives can lift an entire car feet into the air so fast that a human thinks it's instantaneous.
Video games use fairly real physics, as it both makes animating easier, as well as having a more realistic 'feel'. The frames rendered follow the model. With even moderately real physics, an object can move large distances in between frames.
And, of course, there's the ultimate trump: Online gaming. Where the object boundaries (often simplified/compressed) must be transmitted over a low-bandwidth link, with a latency of hundreds of milliseconds. It doesn't matter how fast the graphics card renders, or how well the game keeps track of positions internally.
An update rate of 30/sec is pretty optimistic, with 10-20/sec more typical. Other players can 'pop' between locations in between frames simply because, in between location updates, the opponent's 'actual' location ends up being different from the one the CPU guessed it would be.
Which can mean 4-5 frames were rendered with incorrect locations, the update is received, and the 'real' frame is rendered. Next the game guesses where the opponent will be by the next update, and renders the frames necessary to make things look smooth.
The guessing is an imperfect way to make up for the large difference in frame rates and multiplayer location updates. However, there simply isn't any option; there are 4-5 frames that must be rendered before the next update. Simply 'sitting still' looks awful, and lends itself to the perception of a lower framerate than actually exists.
Programmers try to close the gap by making an educated guess. Since they use a realistic motion model (inertia, gravity, etc.), nearly all the possibilities for the 'next frame' can be eliminated immediately. Then it just chooses a 'middle road' that is close enough that us humans don't notice.
Any high-speed, unexpected changes (such as an explosion) can foil the system:
The player thinks they've killed someone (that's what was rendered/displayed on their screen, after all)
But the estimate was wrong. The 'someone' was actually in a safe place when the explosion changed things.
There is no 'backing up', so the next frame shows the person alive and well, and in a completely different place.
The gamer gets upset because they want a perfectly synchronized game
The much lower frequency of positional updates is unacceptably 'choppy' when such synchronization is used.
The programmers use a 'physics' trick to try and smooth out the picture, but the trick sacrifices accuracy.
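For anyone who wants to see the 'educated guess' spelled out, here is a rough Python sketch of the simplest version of the trick (constant-velocity extrapolation, often called dead reckoning). The class name, the numbers, and the one-dimensional motion are all assumptions for illustration; real games use fuller motion models.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Last state received over the network (names are illustrative)."""
    time: float   # seconds
    x: float      # position along one axis
    vx: float     # velocity along that axis


def extrapolate(last: Snapshot, now: float) -> float:
    """Guess the current position by assuming the velocity hasn't changed."""
    return last.x + last.vx * (now - last.time)


# Assumed rates: one network update every 100 ms, one rendered frame every 20 ms.
last = Snapshot(time=0.0, x=10.0, vx=5.0)
guesses = [extrapolate(last, frame * 0.02) for frame in range(1, 5)]

# The next update arrives: the opponent actually stopped moving at x = 10.2.
actual = Snapshot(time=0.1, x=10.2, vx=0.0)
error = extrapolate(last, actual.time) - actual.x

print("guessed positions between updates:", guesses)
print("how far the opponent 'pops' when the update lands:", error)
```

The four in-between frames look smooth, but they were drawn from a guess. When the real position arrives and disagrees, the rendered opponent jumps by the error, which is the 'pop' described above.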
I smell a know-it-all that needs whackin'.
Modern graphics API's use time, not frames, to determine the speed of a game. The graphics rendering is completely independent of actual gameplay. It's the only way a game can be expected to run 'at the same speed' on the wide range of hardware.
Ever use a boot disk to load DOS and then play an old game? The original, unadapted Wing Commander (1989 version) is completely unplayable because hardware speeds were so close back then, that the programmers could get away with using frame rate to regulate gameplay.
A modern game doesn't care if the rocket impact is rendered. The game registers 'impact' by the vertex's position, which is computed separately. When the graphics card does the vertex handling, the game still keeps a (much smaller) set to calculate object positions. In other words:
The graphics card computes thousands of vertices, and renders the entire scene once.
The CPU will compute a few hundred vertices. (the collision boundaries, which is generally a bunch of cubes the model fits inside) There is all kinds of time for the CPU to compute a few hundred intermediate steps before the graphics card asks for the next 'snapshot' to render.
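Here is a small Python sketch of the idea that gameplay runs on time rather than on rendered frames. Everything in it (the function name, the 1 ms physics step, the rocket and wall numbers) is an assumption for illustration, not how any particular engine does it.

```python
def simulate(duration: float, render_fps: float, physics_dt: float = 0.001):
    """Advance a rocket in fixed physics steps, rendering only now and then.

    Returns (impact_time, frames_rendered). All values are illustrative.
    """
    rocket_x, rocket_speed, wall_x = 0.0, 300.0, 1.0   # metres, m/s, metres
    frame_interval = 1.0 / render_fps
    next_frame_time = 0.0
    frames_rendered = 0
    impact_time = None
    steps = int(round(duration / physics_dt))
    for step in range(steps):
        t = step * physics_dt
        if t >= next_frame_time:
            # Stand-in for handing a 'snapshot' to the graphics card.
            frames_rendered += 1
            next_frame_time += frame_interval
        # Gameplay state advances by elapsed time, never by frame count.
        rocket_x += rocket_speed * physics_dt
        if impact_time is None and rocket_x >= wall_x:
            # The CPU registers the hit whether or not this instant is drawn.
            impact_time = t + physics_dt
    return impact_time, frames_rendered


slow = simulate(duration=0.1, render_fps=30)
fast = simulate(duration=0.1, render_fps=60)
print("30 fps:", slow)
print("60 fps:", fast)
```

Both runs report the same impact time even though they render a different number of frames. In this sketch the rocket reaches the wall between two rendered frames, so neither run ever draws the moment of impact, yet the game still registers it.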
No, you don't miss the frames at all. What is this so-called 'need'? First, there is a very big difference between keeping track of the objects (Poly boundaries/collisions, positioning the vertices, etc), and actually rendering them. Vertex calculations (including physics and animation) are much less computationally-intensive. That's why the first 3D cards really only handled rendering. The CPU still did all the vertex operations-- the 3D card did the (exponentially) more intensive rendering of the frame.
The way it usually works is as follows:
Frame Buffer A is displayed on screen
Graphics card renders to Frame Buffer B
Graphics Card renders to Frame Buffer C
When all of Buffer A has been displayed, flip display pages (or use a blit) to Buffer B.
Frame Buffer B is displayed on screen
Graphics Card renders to Frame Buffer A
IF Frame Buffer A finishes rendering before B finishes drawing, flip pages (or blit) to C.
Begin rendering B
If A is being displayed, render C.
If C is being displayed, render A.
If the buffer isn't being displayed, render the next frame. Show frames in order, but drop frames when a more recent one is available.
And so on. This is 'triple buffering', which not all games support (although it is becoming much more common). Double-buffering is almost always used, where there are only 'A' and 'B' buffers.
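A toy model in Python may make the difference clearer. It is a simplification under stated assumptions: time is counted in integer ticks, every frame takes the same time to render, with two buffers the renderer stalls until the flip at the next vertical blank, and with three buffers it never stalls and older finished frames are dropped. The function name and parameters are made up for this sketch.

```python
import math


def displayed_frames(render_ticks: int, refresh_ticks: int,
                     total_ticks: int, buffers: int) -> int:
    """Count distinct frames shown on screen with vsync on (toy model)."""
    if buffers == 2:
        # Double buffering: after finishing a frame, the renderer waits for
        # the flip at the next vblank before it may start the next frame.
        per_frame = math.ceil(render_ticks / refresh_ticks) * refresh_ticks
        return total_ticks // per_frame
    # Triple buffering: the renderer always has a spare buffer, so it never
    # waits. Each vblank shows the newest finished frame, if there is one.
    shown = 0
    last_shown = 0
    for vblank in range(refresh_ticks, total_ticks + 1, refresh_ticks):
        newest = vblank // render_ticks   # frames finished by this vblank
        if newest > last_shown:
            shown += 1
            last_shown = newest
    return shown


# Assumed numbers: the monitor refreshes every 10 ticks, a frame takes 12.
print(displayed_frames(12, 10, 600, buffers=2))  # prints 30
print(displayed_frames(12, 10, 600, buffers=3))  # prints 50
```

With a frame that takes slightly longer than one refresh, the double-buffered renderer misses every other vblank, while the triple-buffered one keeps working and shows every frame it finishes.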
Which means that, even with vsync enabled, the card is capable of rendering 120 (double-buffered) or 180 (triple-buffered) frames per second. (And that's at an eye-straining 60 Hz. With a better monitor that refreshes at, say, 85 Hz, the card renders 170 (double) to 255 (triple) frames per second.)
It is in Microsoft's best interest to charge money for these patents, especially unreasonable amounts of it, because it makes DirectX the only affordable option and locks you into Microsoft software and x86 hardware.
OpenGL would be unaffordable how? nVIDIA already has its own fully-licenced OpenGL drivers for the 3 major OSes (Windows, Mac, and Linux). ATI & Matrox have Windows & Mac covered; the only question is Linux, where neither writes drivers. It's not impossible to have Mesa implement all non-'patented' OpenGL functions, and the respective hardware makers release the remainder under the (necessary) closed licence.
And more to the point: Windows has a mechanism to allow for other non-DirectX graphics API's. Vid card manufacturers (usually) own full OpenGL licences, and they write complete implementations of OpenGL in their drivers anyway. (Or, to be more specific, they implement the segments of OpenGL that aren't already in their hardware).
Price isn't even an issue, and never was. The cost is shouldered by the vid card makers, and is hewn down to pennies by the time we pay for it. Neither is x86 hardware-- Or have you forgotten that the primary implementation of WinXP-64bit, which includes DirectX, is Itanium (and while it is x86 compatible, it is not x86 or even close to it)?
The only real problems that arise are the (expected) moaning that Microsoft is getting money from us whether we buy their software or not, and the future of Mesa or other "Free(dom)" implementations.
And there's nothing stopping Mesa from implementing everything non-patented, and leaving the patented portions to the hardware makers. Which is still a good deal for ATI or Matrox, as they would only have to write part of the driver.
For users of nVIDIA and Windows/Mac/Linux, there isn't and probably never will be a problem; nVIDIA writes its own drivers for all three anyway.
"Free(dom)" software drivers aside, I prefer an excellent, closed-source driver(s) such as nVIDIA's to absolutely no driver at all. It isn't necessarily the HW maker's fault; they have to follow IP laws, and are often kept from releasing source code because of IP laws. If they leave out the 'locked' feature, they lose a competitive advantage, and business to the companies who do. So they choose the best path allowed by law, and provide a non-free driver to a Free OS.
IP law isn't necessarily a bad thing; it's what makes the GPL work. Were it not for IP law, there would be nothing keeping Microsoft from selling our own code back to us.
Information does not want to be free. If it did, we wouldn't have to spend billions in research, either theoretical or applied. People don't give up years of their lives and thousands of dollars to college education because information simply wants to saturate their brains; but because the information requires an active, continuous effort to both spread and simply continue to remain known. Information does everything it can to remain secret. Without our own constant vigilance, all the knowledge and information mankind has collected over the ages would hide itself again. Skills and facts are forgotten. Books age and crumble. CD's and magnetic media decay.
It takes long, hard work to get information. The whole entropy argument ignores the fact that information is an organized substance, and entropy works against organization, and towards chaos.
While I don't agree on the period of time involved in patents (and especially copyrights), there has to be a real financial incentive to seek and preserve information. Otherwise, the quest for information and knowledge will be left to rich eccentrics, as was the case centuries ago.
IP law is what made it possible for a person to be a scientist, and earn a living at the same time. It gave them a chance to sell the information they found, and buy their daily bread with the money gained. Without this capability -- to sell the fruits of research and thinking, we would live in a world with very few professional scientists, professional engineers, professional writers (so long to the Lord of the Rings and Dune!) We wouldn't even have flown aircraft yet, let alone flown to the moon.
This does not diminish the greatness of Free Software; it's one of the most altruistic services for all of mankind. But to expect all knowledge to be "Free" is like expecting a farmer to give away his crop.
The world would be nice if everybody shared in this way, but there is a greater human desire to have more if you work more, and that a skilled worker should have more than an unskilled worker. If there isn't an incentive to hard work, study, and the honing of skills, civilization would have never developed.
SGI is still in charge?
SGI isn't 'in charge' per se; the ARB is (the ARB consists of various hardware & software makers, including Microsoft, nVIDIA, ATI, Matrox, SGI, Sun, and Evans & Sutherland). However, OpenGL is a trademark of SGI, so they get to make the announcement.
This argument seems to be more a Rambus vs. DDR thing; and even then on commodity boxen. But I digress. In both cases there is currently an off-chip memory controller. The big reason for the difference in latency is not the controller itself, but the (completely different) methods of transferring data. Rambus uses a serial data transfer, which is easy to scale up (in terms of speed and bandwidth), but has higher latency. DDR is an older, parallel technology. DDR has lower latency, but has lower bandwidth and is much harder to scale up. This is primarily because of electromagnetic crosstalk (and other E&M interference problems) within DDR's (parallel) data paths.
There is a point of diminishing returns with the low latencies DDR offers; the point is frequently reached on high-performance computers (workstations, scientific processing, and high-end servers) where the bandwidth is the key factor. When you're transferring a few GB of memory, who cares that it takes a few microseconds longer to start receiving data-- overall, the entire transfer (from request to completion) takes much less time. Even Wintel boxen are beginning to reach this point.
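The latency-versus-bandwidth tradeoff is easy to put numbers on. The sketch below is plain arithmetic in Python: total time is the startup latency plus size divided by bandwidth. The two latency and bandwidth figures are invented placeholders chosen only to show the shape of the tradeoff; they are not measurements of any real Rambus or DDR part.

```python
def transfer_time(size_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Time from request to completion: startup latency plus streaming time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Placeholder figures (assumptions, not datasheet values):
LOW_LATENCY = dict(latency_s=50e-9, bandwidth_bytes_per_s=2.1e9)
HIGH_BANDWIDTH = dict(latency_s=100e-9, bandwidth_bytes_per_s=3.2e9)

for label, size in [("8 bytes", 8), ("2 GB", 2 * 2**30)]:
    a = transfer_time(size, **LOW_LATENCY)
    b = transfer_time(size, **HIGH_BANDWIDTH)
    winner = "low-latency" if a < b else "high-bandwidth"
    print(f"{label}: low-latency {a:.3e} s, high-bandwidth {b:.3e} s -> {winner} wins")
```

For the tiny transfer the startup latency dominates, so the low-latency memory wins. For the multi-gigabyte transfer the latency term is lost in the noise and bandwidth decides everything.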
Personally, I wonder how RAMBUS even got a patent. I don't see how a serial memory bus is 'non-obvious to the trade's practitioners'. But, that's the USPTO for you.
Another major problem is the physical distance to (as well as speed of) DRAM. Silicon technology has already reached the point where a signal often travels faster through logic gates (such as an off-CPU controller) than it does through wire. So long as the memory controller is physically located between the DRAM and the CPU, there is little chance there will be any performance drop. At current CPU speeds, it takes 2-3 clock cycles for any signal to even reach the DRAM (even light-speed is slow at 1 GHz). Then it takes several more before the DRAM addresses and returns data. Then another 2-3 clock cycles before it gets back to the CPU. An off-CPU DRAM controller may or may not take an additional cycle. For large (sequentially addressed) memory transfers, this one cycle is a one-shot deal. Even with a million tiny, single-byte (randomly selected) transfers, there are only one million extra clock cycles 'burned up'. Spread over one second, this would result in a performance drop of 0.05% on a 2GHz CPU. (And less as speeds increase)
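The 0.05% figure can be checked in a couple of lines of Python. The one assumption made here is that the million one-cycle penalties all land within a single second of execution; the function name is made up for the sketch.

```python
def overhead_fraction(extra_cycles: int, clock_hz: float,
                      window_s: float = 1.0) -> float:
    """Fraction of the available clock cycles lost to the extra cycles."""
    return extra_cycles / (clock_hz * window_s)


# One extra cycle on each of a million transfers, 2 GHz CPU, one second.
drop = overhead_fraction(1_000_000, 2e9)
print(f"{drop:.2%}")  # prints 0.05%
```

At a higher clock speed the same million cycles are a smaller share of each second, which is why the penalty shrinks as speeds increase.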
As for Hypertransport, the idea behind that is not just absolute performance increases, but also design flexibility. So the same chipset that serves as a PC chipset, may also be able to serve as an 8-way server chipset, with few design changes (perhaps by adding or subtracting a few more HTT channels).
This is true; but as I said, it only really makes things better for the multiprocessing crowd. Chip makers don't usually pass the costs of a higher-complexity/performance chip to the buyers of a lower-complexity chip. The SP chipset would be the hands-down highest-volume seller. An MP chipset that is based from the SP design would cost less than a wholly-redesigned MP chipset. This suits the MP buyers fine... but it doesn't give any benefit to the SP buyers. The benefit is to MP alone.
Even within a desktop environment, you can easily separate out shared PCI/AGP buses, into multiple switched PCI/AGP buses with Hypertransport underlying them.
You can, but why? For all intents and purposes, the PCI/AGP bus is essentially idle 100% of the time. (The times when it is used are more of a statistical anomaly than fact; a figment of the deranged observer's imagination.) Even in applications when there actually is heavy bus activity, the PCI/AGP bus is far from being saturated. There are cases (such as multiport gigabit ethernet cards) where any single PCI slot is unable to handle the load -- but the PCI bus itself still has massive amounts of idle bandwidth; it's just that it's not possible to transfer the data between the network card and the PCI bus fast enough. (Which is a limitation of PCI's component interface, but not of its bus).
I've seen many servers that have multiple network interfaces, where each NIC saturates the PCI card slot. The actual PCI bus, however, is not saturated, and handles the full load of multiple saturated interfaces quite well.
In other words, it doesn't matter how wide the freeway is; the tollbooth (AKA the PCI Slot interface) is the bottleneck, and is the real limiter of performance. A HyperTransport-switched PCI bus would be like adding more lanes to a highway that has nearly no traffic on it. It doesn't change how fast you can drive. It's the long wait at the toll-booth at the on and off-ramps that is the speed problem.
Especially as on many motherboards, AGP and PCI are on entirely different buses, so heavy AGP usage (such as DoomIII, or 3D Animation) doesn't even affect the PCI bus. For the desktop user, there is no benefit to such a scheme. Even a power-hungry gamer, using his AGP8X card to its fullest potential, compiling XFree86, and hosting multiple P2P file transfers couldn't do much to dent the PCI bus's capabilities. It's other x86 problems that are most likely to cause speed drops; not PCI or AGP.
Only in ultra-high-end applications would there be a benefit.
But it's not all of the other players it has to worry about, just one player: Intel. Intel may be allowed to use the HTT, but it's absolutely certain they would rather die than use their great competitor's designs.
That's completely untrue. In several aspects. First, the NIH (Not Invented Here) syndrome has burned just about everybody. No company that is too proud to use a technology that was NIH lasts long. The managers at Intel are not that stupid. But they aren't going to jump on the bandwagon and spend any money just yet; they'll wait until they see how the results fare on the market before they invest anything in HyperTransport. If it's in Intel's best interest, they'll use it. If not, they'll design an alternative. To call AMD their 'great competitor' is rather short-sighted as well. They're only the most major competitor in the x86 arena, and one with a minority of the market. That's the reality, whether you like it or not. And I like (and have recently bought) AMD processors.
All of the other players are small-fry in terms of volume compared to the x86 camp.
That is an entirely baseless statement. The x86 camp is extremely small in terms of the 'other players'. Or weren't you aware that approximately 0% of all computers use an x86 chip? AMD has a very small production volume; so small they don't even fab their own chips. The only major competitor that is fab'd in such small volumes is SPARC. But Power & PowerPC, Itanium, and even ARM processors are all fab'd in greater volumes than AMD's. Intel plans on abandoning x86 entirely; their Yamhill (Hammer-like) processor is a contingency plan, to 'steal the Hammer's thunder.'
HP has no need to use HTT in its processors, simply because it has no processors anymore
Patently false. HP's processor is the Itanium. (more below)
all of them (PA-RISC and Alpha) have been EOL'ed according to their own roadmaps, so what are they going to use them for, Itanium?
Their roadmap EOL's the PA-RISC, but points straight to Itanium. The Itanium is 100% PA-RISC compatible (in addition to supporting x86 and its own architecture). It is the next-gen PA-RISC. They are only supporting the next couple of releases of PA-RISC to appease people who already have PA-RISC hardware, and wish to upgrade the processors in their pre-existing hardware. Alpha was acquired well after the Itanium was complete; a white elephant of sorts. It was never part of the plan. It's entirely likely that HP will include Alpha technologies into next-gen IA-64 chips. If there is customer demand (especially if it's from Itanium's co-designers at HP), HyperTransport will be included as well.
Anyways, the only RISC player that is likely to use HTT is Sun, and they will likely use it in their upcoming Opteron servers. It's likely that IBM, HP, in addition to Sun all have Opteron plans secretly already devised.
Opteron is the Hammer's new brand-name, and Sun will definitely not be using it.
Sun is 100% SPARC, has been for more than a decade, and they have no plans to abandon it. There is no such thing as an 'Opteron server' from Sun. Sun only sells SPARC boxen.
I already covered HP -- they're Itanium. Their roadmaps still point to it.
SGI's roadmap leads to Itanium for their workstations and servers. They will use Intel's answer to HyperTransport (whether it is HyperTransport or not)
IBM is all about their own Power and PowerPC processors, which have better SPECint and SPECfp scores than anything else to begin with.
It's likely that IBM has an Opteron-based PC and Windows.net server, but the Opteron won't be used in their high-end servers or workstations. IBM already scales well past the point where HyperTransport would be beneficial; and IBM is in the same boat as Intel: If it's worth their while, they'll either use or design an alternative for HyperTransport. But for IBM, it may be completely unnecessary to begin with.
Apple is likely to use HyperTransport, as they have a great deal of flexibility in what technologies are to be used in their machines. Apple is also a member of the HyperTransport consortium. Apple's market is definitely not a trivial one.
Which goes to show my point: Just because AMD's Opteron has great features, they are in no way unique to the Opteron. And its competitors have a better system architecture than x86 to boot.
So with the improved process technology they were able to get 70% better speeds (Athlon vs. P3), but with increased pipeline stages (P4 vs. P3) they were able to get 100% better speeds.
Interesting side note: One reason the Alpha does so well is that the physical design is very closely tuned to its fab process.
And a question: Do you mean a greater number of pipelines, or more pipeline stages?
I ask because more pipeline stages don't really increase speed very much (ie. there can be one instruction in each pipeline stage, but as each instruction takes one clock to move to the next stage, there isn't any improvement in speed.) In fact, shorter pipelines are often faster, as they don't have as much potential for stage bubbles.
A stage conflict is when, for example, you have a 5 stage pipeline. Instruction A comes immediately before B. However, instruction B requires that A finish the entire pipeline before it can begin executing. So, instruction B has to wait 4 more cycles before it can execute (instruction A must finish, which essentially clears out the pipeline). A 10-stage pipeline would take 9 more cycles to clear out before B can execute.
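To put rough numbers on the stage conflict described above, here's a minimal sketch. It assumes the same simplified model (B can't enter the pipeline until A has left the last stage, so B waits depth - 1 extra cycles, i.e. 4 for the 5-stage case) and nothing about any real chip. The function names and the "100 instructions, 10 dependent pairs" workload are made up for illustration.

```python
def stall_cycles(depth):
    """Extra cycles B waits when it cannot start until A has left the pipeline.

    A occupies the pipeline for `depth` cycles; B would normally have
    entered one cycle after A, so the penalty is depth - 1.
    """
    return depth - 1


def total_cycles(depth, n_instructions, dependent_pairs):
    """Cycle count for a single pipeline under the simplified stall model."""
    # Ideal case: `depth` cycles to fill the pipe, then one result per cycle.
    ideal = depth + (n_instructions - 1)
    # Each fully dependent A,B pair adds one pipeline drain.
    return ideal + dependent_pairs * stall_cycles(depth)


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions, 10 fully dependent pairs.
    for depth in (5, 10, 20):
        print(depth, stall_cycles(depth), total_cycles(depth, 100, 10))
```

The point is only that the penalty per conflict grows with pipeline depth; a real chip with forwarding and out-of-order execution will hide a good part of it.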
Out-of-order execution can help keep the pipeline busy with other tasks while B is waiting to be executed; but it doesn't always work out.
Additional pipelines (which is what I think you meant) means adding a second (or third, fourth...) identical pipeline, so that tasks unrelated to the A,B instructions (above) can be executed as well. Again, out-of-order execution helps keep things busy, but not always.
Which comes to the nice thing about VLIW design: The compiler (or, in the case of VLIW, the masochistic asm coder) is able to take a larger look at the program than is possible in a non-VLIW design (which, AFAIK for the mass-produced chips, is everything except the Crusoe and Itanium). And that results in a more efficient run than having the hardware attempt to do it.
Of course, as far as design complexity goes, I'm not entirely sure which is easier to design: The out-of-order prediction chip, or a VLIW chip. I tend to believe the VLIW chip is more complex in design.
This directly causes the RISC system to require a bigger cache to keep the CPU fed with the same amount of work.
This isn't exactly true. PowerPC uses a fixed 32-bit instruction (4 bytes). This includes the operation type, the source and destination registers, as well as any additional information.
CISC instructions are variable in size and purpose, and can range from one byte instructions (such as noop) to multibyte instructions that are longer than the 4 bytes the PowerPC uses.
So the situation isn't quite so dire; many RISC chips (such as MIPS) have very few 'wasted' bits in the instruction set.
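A quick way to sanity-check the code-size argument is to add up bytes for both encodings. This is only a sketch: the fixed width is a parameter you can set to whatever the RISC chip in question uses, and the list of variable instruction lengths is a made-up mix, not measured from any real x86 binary.

```python
def fixed_width_size(n_instructions, width_bytes):
    """Total bytes when every instruction has the same width."""
    return n_instructions * width_bytes


def variable_width_size(lengths):
    """Total bytes when each instruction has its own length."""
    return sum(lengths)


# Hypothetical mix of variable-length instructions (bytes each):
# a one-byte no-op, a few short ops, and a couple of long ones
# carrying immediates or addresses.
mix = [1, 2, 3, 3, 6, 10]

if __name__ == "__main__":
    print("variable:", variable_width_size(mix))
    print("fixed, 4-byte:", fixed_width_size(len(mix), 4))
    print("fixed, 8-byte:", fixed_width_size(len(mix), 8))
```

Whether the fixed-width encoding comes out bigger or smaller depends entirely on the instruction mix, which is why the cache penalty is smaller than it first looks.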
The additional cache isn't anywhere near as big (or complex) as the total savings of RISC vs CISC die size. It's like taking 10 steps forward and one step back. (But don't quote me on the actual scale; as that may vary from chip to chip)
But you're absolutely right on the CISC at 250 MIPS vs. a 1000 MIPS RISC. But I'd much rather design the RISC chip, as it is so much easier than a CISC design of (roughly equivalent) speed.
Actually, it's primarily because Intel pushed better fab processes into production earlier than the RISC crowd, of whom only Motorola & IBM fab their own.
The Alpha was making a run for this crown, and it was the only horse in this race for the longest time, and then all of a sudden from out of nowhere both Intel and AMD overhauled the Alpha as if it wasn't there.
Never underestimate the damaging effects of a corporate sale. When DEC was split between Intel and Compaq (well before the 1 GHz barrier), it was the death knell for the Alpha-- there was simply too much disruption in the shift of companies (not to mention the fact that many of Alpha's engineers wanted nothing to do with Intel or Compaq, so they left). Neither AMD nor Intel was bought out, as DEC was. And AMD even ended up with some of Alpha's engineers!
That leaves the whole category of heavy-haul trucks unanswered by x86 at the moment. But what distinguishes a heavy-haul truck from a pickup? The ability to pull large loads. Is that all achieved by the truck's engine? No! Large trucks have incredible 18-speed transmissions, and stiff chassis, etc. In other words it's the overall package that distinguishes a heavy-hauler from a pickup... [it] describes a similar approach to how you distinguish a RISC processor-based (heavy haul) server from a PC (pickup) processor-based one.
So how's this got anything to do about Hammer?
Easy... Architecture. As you say, the engine is only a small (but significant) part of the entire package that makes the distinction. The rest is the architecture around which the engine is built. Frankly, even though there have been many improvements to the x86 design (primarily by eliminating ISA and replacing it with PCI/AGP), it still has its problems; which is why it will never be a true replacement for high-end workstations and servers.
Well, what it leads to is that Hammer has been designed right from the start to be everything from a car engine, to a pickup engine, to a heavy haul engine. That's because of its various features, such as Hypertransport, and onboard DRAM controller.
If it were designed from the ground up, it wouldn't be x86 compatible; not, at least, if the designers wanted a truly great processor. Rather, AMD hopes to ride the x86-compatibility market and is therefore adapting a phenomenal RISC core to the pre-existing x86 set. It's like bolting a jet engine on a farm tractor.
Hypertransport (as well as a built-in DRAM controller) is only useful on multiprocessor systems. (I'm not downplaying their usefulness at all.) The onboard DRAM controller allows each processor to have its own separate memory (whereas many, including the IA-64, share the same memory through the system bus). Combined with the increased multiprocessing efficiency Hypertransport offers, the Hammer processor line seems to be clearly designed for multiprocessor systems. (Hypertransport and onboard DRAM don't provide any real benefit to a single-processor system.)
It will be great for companies that want to upgrade their x86 server hardware, but want to keep their old software. It'll do great in the 3D animation and rendering studios, many of whom use a Unix-like OS anyway. But for the general desktop machine, there will be only one CPU, robbing the user of the benefits Hypertransport and the onboard DRAM module give.
One key here is that Hypertransport is not unique to the Hammer; Sun, HP, Motorola, SGI and Apple are all members of the Hypertransport consortium, and intend to incorporate it into their processor designs.
The primary benefit of an onboard DRAM controller per chip (no longer sharing the same memory pool via a bus) is already implemented on other architectures by using multiple DRAM controllers.
My argument all along was that the Hammer isn't a good thing because it:
Keeps the paleolithic x86 architecture.
Could operate far faster if its RISC core didn't adapt itself to x86
We would be better off junking the x86 architecture sooner than later.
The Hammer, while an excellent x86 design, seeks to make the transition 'later', if at all.
Most of the responses I've seen are remarkably similar to a PC fan's reasons why they don't want to switch to a better machine than x86 can provide: They're cheap (the machines, although it can apply to a few users). Actual reasons as to the Hammer's 'superiority' are in no way particular to the Hammer, and are found on many of its competitors' drawing boards as well.
And outside the Free software world, where the software typically only requires a recompile, the Hammer faces some serious, possibly fatal obstacles once 64-bit compiled commercial packages begin to replace the older 32-bit code. The commercial reality is that to be successful, the Hammer has to have natively-compiled 64-bit code (in Windows). To do this, they have to have developers who will support Hammer/64 in addition to the IA-64. They'll have to either sell two different versions (somewhat similar to the sales of Mac vs PC / or Win32 vs x86Linux games), or have both binaries in one package. Both are expensive propositions, and with Intel's virtually guaranteed market-share, it may not be worth the effort to support Hammer.
For a brief history on AMD and binary incompatibility-- Jim Turley, a CPU/Architecture analyst, said the following: "Backing Intel's newest and heavily promoted next-generation architecture is a foregone conclusion for vendors that want to stay in business. Supporting AMD becomes more problematic. Will the added market share be worth the effort? Suddenly AMD finds itself in the same boat as Apple with a different, yet competitive, product that requires dedicated software support to survive.
Grimly, AMD itself lived through this tragedy not so many years ago, and the wound was self-inflicted. AMD unceremoniously axed its entire 29000 family, one of the most popular RISC processors of the early 1990s, due to the cost of software support. The company decommissioned the second-best-selling RISC in the world because subsidizing the independent software developers was sapping all the profits from 29K chip sales. As "successful" as it was, AMD had to abandon the 29K, the only original CPU architecture it ever created. " (emphasis added)
I'm not saying that the Hammer isn't a good processor.
I'm saying that it's putting a jet engine in a 1940's John Deere tractor. I'm saying the mechanic should dump the tractor, and put a jet engine in an aircraft-- not an ancient, over-extended farm tool. The tractor could still do its job, but it's just such a waste of the engine's potential.
I'm sorry, but the x86 instruction set is old and inefficient; it doesn't allow compilers or programmers to access a modern CPU's (including the Hammer) features-- So the Hammer has to deal with the limits inherited from the x86 set.
IA-64 allows explicit branch/pipeline ordering and load optimization; this allows the compiler's larger view to create code that keeps all the pipelines busy.
As all branch/pipeline and load optimization is done in the compiler, there is much more time to find the most optimal instruction order and path. (Fractions of nanoseconds vs. seconds/minutes/hours)
An instruction set (such as IA-64) capable of direct access to branch ordering, or a greater number of registers is more powerful, in that it allows for developers (directly, or via a compiler) to 'take the time' and resources to find the most optimal/efficient way to use the processor's full capabilities.
x86/Hammer does not allow explicit branch/pipeline ordering or load optimization, as x86 was purely single-pipeline until the first Pentium. (Although technically x87 is another pipeline, it served an entirely different purpose... the branching I speak of is of two or more identical pipelines)
As a result, the (Pentium, Athlon, K6, Hammer) must look at its instruction cache, and from that (very limited) amount of information, attempt to optimize the branch/pipelines and provide load-balancing. Time is extremely limited (to fractions of nanoseconds), as are resources to perform any re-ordering. But as time is limited, it frequently executes a suboptimal route and/or order.
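Here's a toy model of that "limited view" problem, just to make the argument concrete. It assumes a 2-wide machine where every instruction takes one cycle, and a scheduler that can only pick from the first `window` not-yet-issued instructions in program order. A small window stands in for the hardware peeking at its instruction cache; a window as big as the whole program stands in for a compiler. The function name, the `program` list, and the model itself are all made up for illustration; no real chip or compiler schedules this simply.

```python
def schedule_cycles(deps, window, width=2):
    """Cycles needed to issue every instruction.

    deps[i] is the set of earlier instruction indices that instruction i
    needs results from. Only the first `window` pending instructions are
    visible each cycle, and at most `width` of them may issue.
    """
    issued_at = {}                    # instruction index -> issue cycle
    pending = list(range(len(deps)))  # kept in program order
    cycle = 0
    while pending:
        visible = pending[:window]
        # Ready = every dependency issued in an earlier cycle.
        ready = [i for i in visible
                 if all(d in issued_at and issued_at[d] < cycle
                        for d in deps[i])]
        for i in ready[:width]:
            issued_at[i] = cycle
            pending.remove(i)
        cycle += 1
    return cycle


# Hypothetical program: a 4-instruction dependency chain (0 -> 1 -> 2 -> 3)
# followed by four independent instructions.
program = [set(), {0}, {1}, {2}, set(), set(), set(), set()]

if __name__ == "__main__":
    print("narrow window:", schedule_cycles(program, window=2))
    print("whole program:", schedule_cycles(program, window=len(program)))
```

With this particular program the narrow window needs 6 cycles and the whole-program view needs 4, because the narrow window can't see the independent work sitting behind the dependency chain.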
Even though the Hammer has all kinds of ultra-modern features and resources, nearly all of them are inaccessible to the programmer/compiler; while the built-in management of these features/resources is quite good, it is also far from perfect (having a far more limited scope than a compiler does, after all). Cycles that could have been put to good use end up being wasted.
Lastly, I'll say that I'm not so much a fan of the IA-64 as I am of the VLIW concept; Non-VLIW processors (Sparc, Power, Alpha) have the same pipeline scheduling concerns as the Hammer. But at least they offer greater access to the processor's resources (such as double or more the accessible GP registers of 64-bit Hammer).
AMD has stated that adding the 64-bit extensions to Hammer has only increased its core size by about 5% over K7, at the same process size!
That's not too surprising... I'd say the figure is about right. With as large an instruction decode stage as an x86 (or any CISC) has, changing from 32 to 64 bits isn't going to change the size of the chip much. (The 64-bit extensions, from what I understand, do not add more than a couple instructions; they simply reuse the ones it already has. Hence the Decode stage won't grow too much.)
The thing is the Decode stage takes up so much of the overall die (and number of transistors, etc) in any CISC processor, that even sweeping changes in the remainder of the chip will result in a nearly identical die size.
That being said, the actual RISC processing core (of the Hammer) is significantly larger than the K7's RISC core. (On the order of 20-30%). It's just that the decode stage is so huge that it hardly makes any difference.
Why do you think people are so excited about Hammer?
A couple of things: First, there is a significantly large anti-Intel crowd. (Not surprisingly, they're also anti-Microsoft). So any upcoming non-Intel chip is exciting to them.
My feelings as to 'why AMD?' come down to a simple factor: Price. AMD chips are loved by so many because they're cheap x86-compatibles (games being a key factor). If Apple hardware were similarly priced, and had the game market that x86 offers, Apple (and PowerPC) would be a favorite.
Processors can be related to cars fairly well, as long as you forget about being compatible with Windows for a moment; And frankly, as far as I'm concerned, the programs that run on it don't make a difference to the actual hardware.
The Hammer is akin to a pickup truck: A fairly inexpensive, medium-quality vehicle. It's loved because it does its job at a bargain price. It's utilitarian. It's the 'people's truck', and is affordable to most of the population.
Workstation processors (such as Power, SPARC, Alpha, PA-RISC, Itanium) are comparable to a semi-truck (Kenworth, International, Caterpillar): They don't necessarily go any faster, but they can tow huge cargos, though the corresponding rise in cost is far from linear.
And Apple (PowerPC) processors are BMWs or Audis: They don't really run any better (or worse) than a pickup truck-- but it's a higher-quality 'luxury' car, and gives a better ride. You pay for the quality and experience, though.
And, basically, there are a lot of people who are perfectly happy with their pickup truck. They're not about to pay more (at a very uneven scale) for more performance of a semi-truck, nor do they care for the luxury of a BMW.
(And, the Itanium isn't as great as the other workstation processors, but it's also the only 1st gen chip in the bunch; The 1st gen SPARC, Power, and PA-RISC processors weren't wonderful either.)
The Itanium also has one major problem with regard to die size: It's binary compatible with both x86 and PA-RISC processors, meaning that while the pure IA-64 architecture part of the chip is smaller than the Hammer, it then has the circuitry to decode x86 (which is a huge # of transistors, and hence, huge die area), PA-RISC (a much simpler/smaller addition to the x86 decode), and the IA-64's own VLIW decode.
If the Hammer had three separate instruction decoders (one CISC, one RISC, one VLIW), then it would have a huge die area too. But the Hammer has one (CISC). And even the Athlons would be half their current size if they were pure RISC rather than CISC. (Of course, they wouldn't be x86 compatible then, but that's markets for ya.)
The 64-bit extensions don't comprise an entirely new instruction set, primarily because they're just that: extensions. The Hammer's mechanism to extend from 32 to 64 bits is identical to the way the '386 extended from 16 to 32 bits. (This is from AMD's data). The '386 also added a couple more instructions (and registers) to the '286 design. That doesn't make an entirely different instruction set and/or decode.
Hammer has a substantially smaller die than P4, its main competitor.
;-)
Again, that's comparing a fab tech that is in the near-future compared to one that's been used for over a year. Not a fair comparison by any means. It's like saying that the Athlon has a smaller die than the K6. Completely different chip generations.
And since both Intel and AMD are working together (with about every other semiconductor maker) on researching new fab techs, you can bet Intel will have the same fab tech as the Hammer. (I do know the Itanium II uses a 0.09 micron fab tech, which is unprecedented for the scale.)
Supposedly the Itanium was (more or less) a rushed release (similar to the PowerPC G4). The Itanium II seems to have improved by a few orders of magnitude in efficiency, as well as speed. For that matter, the PowerPC G5 (which is not being rushed out) specs about 2x faster than IBM's Power4 core.
And, remember, as I said before, the Itanium is currently targeted at the Workstation/high-end server market; NOT the PC market. When I say workstation, I mean the "ultra-high performance, ultra-high stability (and typically, ultra-high cost)" market. The Itanium is priced similarly to the primary competitors in the arena, those being UltraSPARC, Power, Alpha, and PA-RISC. The first-gen Itanium is not (and was never intended to be) anywhere near your local consumer electronics store (or your local system builder, for that matter).
The Athlon MP is not real workstation class by any stretch of the imagination. No competent engineer even trusts the architecture with critical tasks. I have yet to see anybody design computer hardware (or vehicles, or perform complex simulations, scientific calculations, or true enterprise-level work) on x86 hardware. The hardware, while cheap, still crashes far, far too often... it doesn't have anywhere near as good of a memory (and system) architecture... the list goes on and on.
The reason PCs are used for 'render farms' is that they're so cheap. If a computer crashes, then they just have to re-boot it and re-render the current frame (losing only a few hours' work at most, and even then in a relatively non-critical task).
To be short: Sun dominates the workstation market, followed by HP, IBM, and SGI. None of their workstations (with the exception of SGI's lowest-cost graphic workstations) run x86. That's over 95% of the workstation market.
There is the issue of OEM support, but if Hammer meets spec it will be in high demand.
Unquestionably. However, that doesn't mean it will be successful. AMD once made the world's most popular RISC processor (hands down). It literally blew everything else away in terms of sales. AMD discontinued production because, in spite of very high demand for the hardware, they couldn't come close to competing with the other architectures (or, to be more specific, although hardware makers loved it, nobody wrote software for it.)
If the Hammer isn't compatible with IA-64 compiled binaries, then AMD will have to fund the development of Hammer-compiled versions, as software developers, following the money, will support IA-64 first. AMD has done this in the past already, but had to give up because it wasn't profitable. (Not coincidentally, it's the same RISC processor that was in such high demand that was the source of this headache).
Why don't you go to the Usenet group comp.compilers and state that "Except for pathological cases, C code runs a few hundred percent slower than assembler".
The resulting blood bath should be amusing.
As I said, the assembler vs compiled fight is quite long running. Stating asm vs compilers arguments in a compiler newsgroup would get a similar response to a Windows user extolling the virtues of WinXP in a Mac (or Linux) group.
And, unsurprisingly, stating that C is anywhere near as efficient as pure asm in an assembly newsgroup would be a bloodbath as well.
The main argument for using C is that it is generally faster software development, and generates code that is 'acceptable'.
Pure asm takes more time to develop, but results in significantly tighter/faster code. The L4 microkernel is a great example of this: The C implementation is much slower than the asm implementation.
But, unsurprisingly, the C implementation is a bit easier to work with.
HP did some research a while back (2-3 years) with software optimisation. They discovered a few interesting things: They could 'emulate' (using full architecture emulation) compiled programs with ~5-15% greater performance than running the same binary natively. (The emulator was emulating the PA-RISC architecture, and ran on top of PA-RISC hardware, so the test was conducted on the same machine) The emulator was capable of making up for inefficiencies the compiler added into the code. In spite of the (large) overhead of the emulation, the program still ran faster while emulated.
While it has yet to really see more than tech demo releases, the Amiga OS4 technologies are quite similar: They are able to run the exact same binary on multiple platforms (PowerPC, x86, IA-64, SPARC, and MIPS) with no drop in performance (compared to natively-compiled versions of the same code). Again, this is due to (current) compiler problems. (PS- I'm not an Amiga fan per se, but I do admire how well-engineered they were for their day)
Another good example is the speed difference between different compilers on the same platform. If they compiled to anything remotely close to the speed of asm, then there wouldn't be a 15-20% speed difference between a highly specialized compiler (such as Intel's) versus a more generic (cross-platform) compiler (such as gcc).
Don't get me wrong: There's nothing really wrong with using a compiled (or interpreted) language. There are very definite benefits to their use (development and maintenance time being primary considerations). Compiled languages are acceptably fast, and compilers are getting steadily better.
But I doubt we'll ever see more than a fraction of embedded devices use a more high-level language. A price difference of $0.01 adds up to real money (and reduced cost) in commercial production runs. In addition, even where compiled languages are used, the resulting code is still de-compiled and the results scrutinized closely. (Which isn't much different than just writing the whole thing in asm anyway.)
The discussion is about Hammer vs (insert processor)
;-)
But all are (or will be) used in embedded system design anyway, so that's where my train of thought was leading. The Hammer mainly has 'momentum' going for it. Just about everything else is against it.
First, the Hammer is a design a few orders of magnitude more complex than anything else ever attempted. The engineers at DEC dropped the VAX processors and designed the Alpha to avoid the same complexity issues the Hammer is trying to tackle.
For one thing, it uses the x86 set, which has both more instructions, and more complexity (some would say features) per instruction than a pure RISC processor. About half the Athlon's design is just to decode the instructions it's given. After decode from x86 into its internal RISC structure, it then schedules the pipeline, and finally actually sends the data into the appropriate pipeline for execution. There is a huge amount of overhead just to decode what needs to be done.
Pure RISC designs use about 15% of the chip's transistors for decode, and that's if you include pipeline scheduling.
This is the crux of the problem for AMD's Hammer. The Hammer will be forced to use a much larger transistor count than its RISC competitors. The higher transistor count results in several problems: It's far more complex and expensive to design. It takes a more complicated and expensive process to fab. The die is larger, which results in a slower processor. And it uses more power.
Which means that while AMD may have some momentum going for it, the Hammer is far more costly to design and produce than its competition. This will make things very hard for AMD; especially if Intel is able to use its (considerably greater) resources to get computer makers to move from x86 to IA-64 at the same time they move from 32 to 64-bit.
And since HP/Compaq, Dell, Gateway, Micron, and IBM have all thrown in with IA-64... Things look grim for the Hammer.
The good thing is that I would bet that the RISC back end to the Hammer is designed so it can be mated with an IA-64 interface should the x86-interface core not take off.
or are you claiming you could run Quake 3 on a 386, if it were written in assembly?
Not exactly a fair comparison, given that 3D Acceleration was a rather expensive solution back in the days of the '386, and most accelerators were used for military flight sims. The accelerator was about the size of a refrigerator, and connected to the 'host' computer, which was usually a SPARCstation @ 33 MHz. (At least in the case of sims made by Evans & Sutherland, which had the market for them pretty much cornered.)
However, I wouldn't be too surprised if the non-graphics portions of Q3A ran fairly well (but not great) on a '386 (if it had a '387 FPU as well).
I'll say this: Without a dedicated 3D card, it would take a Power4 module to tackle Q3A at max settings. (Of course, a Power4 module isn't a single processor-- it's 8 processor cores roughly analogous to a PowerPC. And just one of the current Power4 cores outruns AMD's 'best-case' specs for the Hammer, which is still in development.)
Except for pathological cases, C code will run a few percent slower than hand-tuned assembler
More the opposite; except for pathological cases, C code runs a few hundred percent slower than assembler. (Although on the IA-64 architecture, this is not necessarily true, as it relies entirely on the compiler to explicitly state operation order. The IA-64 does not re-order operations (or do any pipeline scheduling) at all, which is one of the primary reasons the Itanium runs x86 code so slowly.)
And, the IA-64 arch. is about the only one out there where a C compiled program stands a fair chance against pure asm, (since it requires the pipeline scheduling to be explicitly stated by the programmer, which is an extremely difficult task for mere mortals.)
Frankly, I'm not about to argue any further. The asm vs c/compiled argument is older than vi vs. emacs; except that 'vi vs. emacs' doesn't have much impact on the speed of programs written in it. I know from my own experience how much faster assembler is than compiled languages. I stand by my numbers. So do all the hardware engineers I know, including a couple who have Ph.D.'s in compiler design.
C is great because it compiles well, and is cross-platform. Asm doesn't require THAT much more development time than C does... But ASM is so device specific that unless you're writing software for a driver or embedded devices, the advantage of C's more portable nature outweighs ASM's speed.
I mean... think of what Carmack would give up if he wrote his graphics engines in asm: NO cross-platform capability, a nightmare interfacing with graphics card drivers, and almost no flexibility in the graphics engine.
And the advantages of ID's graphics engines have always been broad platform and hardware support, and extreme graphics engine flexibility. He'd lose a significant part of his market if he wrote it in asm. Plus, other companies would have graphics engines that, while somewhat slower, would be available far sooner than an asm implementation.
To 'cure' an ailment, you first have to treat the cause of the disease instead of the symptoms.
Again, refer to my earlier statements. Diseases are about the only health problem for which a cure can exist. But diseases are just a slice of the entire 'health problem' pie. There is far more to health care (and pharmaceuticals) than ridding a body of an infectious disease.
they had decided to "patent instead of publish". In the end nothing useful was accomplished because the research results were secret, and those suffering from the disease could not afford commercially synthesized drugs.
Um... patents are published. In fact, that's one of the main reasons we have patents -- to force publication (in a public arena) of a good idea.
Besides... to grow any herb in quantities that are useful is often far more expensive (if it's possible at all) than the synthesized drug.
You've just proven you have no practical knowledge of software development. Far less than 1% of desktop/workstation/server software is programmed in assembler. Perhaps the inner loop of some game engines might be, but I doubt even that in most cases.
If desktop computers accounted for more than a tiny fraction of the whole computer market, I might actually care about that statement. Fortunately, the vast majority of computers are embedded systems, and a substantial portion of embedded code is pure asm.
One of the main points of developing faster processors with large amounts of memory was to enable the use of more programmer-friendly languages. It is simply not worth the cost to develop systems of any size in assembly.
No, that's the software designer's point of view. The hardware designer's point of view is to maintain the performance of software written by overworked programmers who don't have the time to do it right.
Finally, if you think C code "usually" runs several times slower than assembler, you're just plain out to lunch.
First off, C code does execute several times slower than assembler. On the order of 5-10x is typical. Compilers really aren't that wonderful.
Oh, I buy that completely.
Just because it's misleading doesn't make it false.
It does taste sweeter when it's more dilute.
Such is the case where marketers can get around such nasty things as 'lying' and 'false advertising.' They just tell the truth in a way that is so misleading that it gives an impression opposite of the truth.
The drug companies want the maintenance drug, the patients want the cure.
Sure the patients may want the cure. However, there's almost never a real 'cure' to most ailments. There's rarely any such thing as a 'magic bullet' for such a thing. On this point, pretty much everybody agrees, from the loudest herbal/homeopathic advocate to the most conservative scientist.
Antibiotics (and many infectious diseases) are the closest thing to a case for the existence of a 'magic bullet'. And we still get sick, don't we?
There's no getting around this fact. Most health problems are a matter of treatment, because a real cure is utterly impossible. You can't cure old age; we can merely relieve some of its symptoms. We can't cure joint and/or muscle problems. You can't 'cure' chemical depression; the person's brain just can't maintain the right chemistry on its own. Treatment (or as you put it, maintenance) is required.
Which suits health care professionals just fine. All of them want your repeat business, from herbalists to acupuncturists to doctors and pharmaceutical companies.
I used to work for a company that markets 'nutritional supplements,' including herbals. I worked in the lab. The company spends millions on research every year, into ways to maintain or improve health. One thing they researched vigorously: whether herbs had the properties their advocates claimed.
In nearly every case, there were no particularly unique compounds, organic or inorganic. In clinical studies, the result was no better than a placebo. The exceptions are very conditional at best (i.e. slight improvement in memory of a small percentage of Alzheimer's patients, and only among Alzheimer's patients; the claim was it substantially improves memory for all).
Drug companies do the same; an astounding number of drugs are biological in origin, and they haven't forgotten that. Aspirin (willow bark tea), penicillin (mold), opiates, and coca-derived drugs are among them. They're still finding benefits to aspirin. And just because it's biological in origin doesn't mean it's not patentable. (You can patent the use of this chemical, found in komodo dragon saliva, cow feces, this plant... whatever, for the treatment of whatever ails you, so long as you can prove it is safe and effective to your government's equivalent of the US's FDA.)
So it's not that herbal solutions aren't being looked at seriously, or that there isn't funding. It's just that in nearly every case, the actual facts of an herb's properties do not back up the claims of its advocates. For that matter, many herbs touted as 'good' are in fact rather dangerous. (I even remember reading about the healing properties of Hemlock).
If an herb has good properties, we study it, and find out how it works. In nearly every case, it's easiest (and cheapest and safest for everyone) to isolate the compound(s) responsible, and synthesise them. (Which can then be patented)
I mean, there's a lot of proven harmful products out there (cigarettes, for example) that haven't gone away yet.
Logic doesn't have much to do with things that have been part of a culture for centuries. Tobacco is one of them. Alcohol is another.
Alcohol is so dangerous: The statistics of crime (especially murder and rape) committed under the influence of alcohol are staggering. Ditto for accident rates. Alcohol is dangerous to an individual, and to the public.
At one point in time, the US did the 'logical' thing and banned alcohol completely by writing the ban into its constitution. (And for anybody not familiar with American politics, amending the constitution is not an easy or trivial thing to do.)
However, while it was the 'logical' thing to do, the consumption of alcohol was (and still is) so ingrained into American culture that its people simply rebelled wholesale against it. Alcohol production simply went underground. Eventually, another amendment was added to repeal the ban on alcohol, the country having decided that making a law can't change a culture.
The harmful effects of tobacco weren't proven until relatively recently; and having learned a lesson from banning alcohol (along with experience gained from law enforcement with other substances, like cocaine, opiates, marijuana, methamphetamines, etc...) Congress is unlikely to try to outlaw tobacco, in spite of its harmful properties (both public and private).
So while the government can't stop the use of 'cultural' drugs/substances, it can (with reasonable success) keep substances that have not come into common use from becoming popular, and harming the public.
First, almost all programmers can (thankfully) ignore the underlying instruction set and program in a higher level language - therefore it is irrelevant. x86-64 is actually quite an improvement over IA32 regardless.
Oooh! A higher level language!!!
So is BASIC! And you can get it for any platform and your code will run.
Whoopee! It's still dog slow and takes up more resources than is necessary to get the job done. Even compiled (C) code usually runs several times slower and requires more memory than assembler.
Second, if an instruction set is sufficiently efficient to allow the processor to be the fastest microprocessor in the world,
First, an instruction set has little to do with the speed of the processor. The whole CISC vs. RISC thing has more than shown that. An instruction set has more to do with the difficulty and/or complexity of the processor's design. The CISC instruction set requires more (electrical) power, and more transistors to do the same job.
Second, it's to be the fastest in the world? By what method is this measured? Clock speed? Size of the pipeline? Number of pipelines? Clocks per (integer, float, or instruction)?
The Hammer isn't even meant to compete with workstation processors in terms of speed. I'll take a SPARC or Itanium any day. (It's a sad thing that so many seem to forget that the Itanium is an HP design, the successor to its PA-RISC, and that newer versions of the Itanium will include many of the Alpha's technologies).
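On the "by what method is this measured?" question, a rough sketch of why clock speed alone settles nothing. It uses the textbook relation: execution time = instruction count x clocks per instruction / clock rate. The two chips and all their numbers are made up for illustration; they are not measurements of any real processor.

```python
def execution_time(instructions, cpi, clock_hz):
    """Seconds to run a program: instruction count * clocks per instruction / clock rate."""
    return instructions * cpi / clock_hz


# Hypothetical chips running the same (hypothetical) billion-instruction program.
chip_a = execution_time(1_000_000_000, cpi=2.0, clock_hz=2.0e9)  # high clock, poor CPI
chip_b = execution_time(1_000_000_000, cpi=0.8, clock_hz=1.0e9)  # half the clock, better CPI

print(f"chip A: {chip_a:.2f} s, chip B: {chip_b:.2f} s")
```

With these assumed numbers the chip with half the clock rate finishes first, which is why "fastest" needs a stated benchmark before it means anything.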
I don't know what you're complaining about with those IRQ's; all systems in the world have something similar in concept to IRQ's. And in fact most systems throughout the world are now standardized on PCI, so they use the same IRQ mechanism as PC's.
Exactly true. Although the number and arrangement of the interrupts may be different. I would prefer not to think of how dog slow computers would be if they had to actively poll system devices (from video cards to keyboards). It's sooo much nicer to use an interrupt system.
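To put a rough number on the polling vs. interrupt point, here is a toy model, not real driver code. The function names, the tick counts, and the keyboard event times are all hypothetical; the only thing it counts is how many times the CPU has to touch the device.

```python
def poll_cost(event_ticks, total_ticks):
    """Polling: the CPU reads the device status on every tick, ready or not."""
    pending = set(event_ticks)
    checks = 0
    handled = []
    for tick in range(total_ticks):
        checks += 1                  # one status read per tick
        if tick in pending:
            handled.append(tick)     # data was actually there this time
    return checks, handled


def interrupt_cost(event_ticks, total_ticks):
    """Interrupts: the device signals the CPU, so the handler runs once per event."""
    handler_calls = 0
    handled = []
    for tick in sorted(event_ticks):
        if tick < total_ticks:
            handler_calls += 1       # one handler invocation per event
            handled.append(tick)
    return handler_calls, handled


keyboard_events = [3, 250, 900]      # hypothetical ticks at which a key is pressed
print(poll_cost(keyboard_events, 1000))
print(interrupt_cost(keyboard_events, 1000))
```

Both versions see the same three key presses, but in this model the polling loop does 1000 status reads to find them and the interrupt version does 3 handler calls. Multiply that by every device in the box and the appeal of interrupts is obvious.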
And what about them "lousy serial ports"? That's absolutely essential in maintaining control over large groups of Unix servers. Their consoles are invariably serial-port based. They do have nice modern GUI consoles, but when it comes to stacking them into a server room and controlling them all from a single input/output source, nothing beats the simplicity of a serial console tty device. And since they're X Window or Java based, you can simply do all of your graphical stuff from the comfort of your own PC logging in remotely, but the local administration can be done over non-graphical serial ports.
While not arguing this point in the least, I will say one thing: The way the serial ports are set up on the x86 is a bit messy. The Unix boxen I've worked with had a more elegant system for serial ports. (Although most of them also didn't have the same backwards-compatibility problems x86 has).
16 GPR's are well within the current norms for RISC processors. Don't forget this is a CISC processor, so more than likely there will be all kinds of hidden internal registers for register renaming available. BTW, the 6502 had a whole 2 GPRs available to it.
Funny... I seem to remember most RISC processors I've known (or designed) to have at least 32 GPR's.
Besides... The point of moving away from CISC is so a processor doesn't use over 1/2 its transistors just to decode the instruction. The instruction decode section of the pipeline shouldn't be the single most complex part; unfortunately on a CISC processor, that's where ~50% of the transistors are.
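A small sketch of one reason the decode stage gets so hairy with variable-length instructions. The encoding below is invented for the example (top two bits of the first byte give the length, 1 to 4 bytes); it is not the x86 encoding, which is messier still, with instructions anywhere from 1 to 15 bytes long.

```python
def fixed_boundaries(code, width=4):
    """Fixed-width ISA: every instruction start is known without looking at the bytes."""
    return list(range(0, len(code), width))


def variable_boundaries(code):
    """Toy variable-length ISA: each instruction must be examined to find the next one."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = (code[pc] >> 6) + 1   # made-up rule: top 2 bits encode length 1..4
        pc += length
    return starts


program = bytes([0x00, 0x40, 0xAA, 0xC0, 0x01, 0x02, 0x03, 0x80, 0x10, 0x20])
print(fixed_boundaries(bytes(8)))      # starts at 0 and 4, computed from the length alone
print(variable_boundaries(program))    # starts found one after another
```

In the fixed-width case several decoders can start on different instructions at once, since the boundaries are pure arithmetic. In the variable-length case each boundary depends on decoding the previous instruction, so a wide decoder needs extra hardware to guess or precompute lengths.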
I'm also fully aware of the 'evolution' of PC architecture. I've been programming x86 asm for quite a while as well. Many of the x86 ways of doing things (even the modern ones) are just... inelegant (or ugly).