Linux is not solely responsible for the five-fold increase in speed. Linux allows ILM to leverage the very high price/performance ratio of the x86 microprocessors produced by AMD and Intel. If SGI ported IRIX to x86, then they might not be using Linux. Of course Linux is free, but ILM had to spend many man hours to port their software.
This is the package as far as product design is concerned. Everything from cell phone housings to automobile engines is designed with ProE. Surely not every company uses it, but most do. Many of my friends are engineers at Fortune 50 companies. Most of them use ProE on Sparcs running Solaris. Some have converted to Windows, but the product was and is primarily designed for a Unix/X11 environment.
You are referring to the Price/Performance ratio, which AMD probably does lead. You see, Price and Performance really are different things. And I misspoke: Intel leads the Performance race for microprocessors. For shared-memory computers, Cray probably leads the performance race. For other massively parallel computers, one of the ASCI supercomputers probably leads the race.
Yes, it is exclusive. The reason is that the L1 is so big. Because Intel's L1 is small compared to its L2, having the data duplicated is no big deal. The issue of inclusion and exclusion is tricky because it affects how the processor maintains cache consistency in a multiprocessor. Having all of the L1 data lines duplicated in L2 (inclusion) means that the processor only has to check the L2 cache tags during a bus snoop. AMD likely has a copy of the L1 cache tags near the bus such that it can quickly check the L1 and L2 cache tags during a snoop.
A large L1 cache certainly benefits the performance of AMD chips as long as it can keep up with the clock rate. My only point is that it is also likely a reason why they can't ramp up their clock rate beyond 2.5 GHz.
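To make the snoop argument concrete, here is a toy sketch in Python. It is not a model of any real AMD or Intel part; the class names, the tag_checks counter, and the one-check-per-tag-array cost are all assumptions made up for illustration. It only shows why inclusion lets a snoop consult a single tag array while exclusion forces a look at both.

```python
class InclusiveHierarchy:
    """Toy two-level cache: every line in L1 is also kept in L2."""

    def __init__(self):
        self.l1_tags = set()
        self.l2_tags = set()
        self.tag_checks = 0  # hypothetical cost counter, one per tag array consulted

    def load(self, addr):
        # Inclusion: a fill allocates the line in both levels.
        self.l2_tags.add(addr)
        self.l1_tags.add(addr)

    def snoop(self, addr):
        # L2 is a superset of L1, so checking the L2 tags is enough.
        self.tag_checks += 1
        return addr in self.l2_tags


class ExclusiveHierarchy:
    """Toy two-level cache: a line lives in L1 or L2, never both."""

    def __init__(self):
        self.l1_tags = set()
        self.l2_tags = set()
        self.tag_checks = 0

    def load(self, addr):
        # Exclusion: a fill goes to L1 only; L2 holds lines evicted from L1.
        self.l1_tags.add(addr)
        self.l2_tags.discard(addr)

    def evict_from_l1(self, addr):
        self.l1_tags.discard(addr)
        self.l2_tags.add(addr)

    def snoop(self, addr):
        # The line could be in either level, so both tag arrays are consulted
        # (in hardware this could be a duplicate copy of the L1 tags).
        self.tag_checks += 2
        return addr in self.l1_tags or addr in self.l2_tags


inclusive = InclusiveHierarchy()
exclusive = ExclusiveHierarchy()
for cache in (inclusive, exclusive):
    cache.load(0x40)
    found = cache.snoop(0x40)
    print(type(cache).__name__, "hit:", found, "tag checks:", cache.tag_checks)
```

The trade-off is the usual one: the inclusive design wastes L2 capacity on duplicates, which matters little when the L1 is small and a lot when it is large.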
Also look at the numbers of the Alpha, MIPS, IBM Power, and Sun Sparc processors. None of these chips has an L1 cache that is 25% of the total on-chip cache.
Numbers speak for themselves. Intel leads the performance race.
Absolutely. Increasing the total amount of on-chip cache has contributed to the rise in processor performance over the last decade. However, the largest additions of on-chip cache have been at the L2 level. The L1 must be able to respond quickly, even at the cost of cache hits.
Smaller memories are always faster (when comparing similar technologies). Registers, being the smallest memory, are the fastest. Followed by the L1, then L2, then main memory, and then disk.
AMD's huge L1 cache probably contributes to the difficulty in ramping up the clock rate. An L1 cache must be able to respond to a data access within usually 1-2 clock cycles. Many computer architects believe that the size of the L1 cache should be less than 10% of the size of the on-chip L2 cache. AMD's chips have L1 caches on the order of 25% of the size of the L2. Such a large L1 probably cannot keep up with increasing clock frequencies.
Intel chips have very small L1 caches as compared with AMD. T
A quality 200W power supply is better than a lousy 300W power supply. Dell makes a quality 200W power supply. That is, Dell wrote a specification calling for a quality power supply that some company (probably in Asia) meets.
I have 2 machines: a Dell Dimension 4100 w/ PIII, a 200W power supply, and two 7200 RPM IDE drives. My other machine is a newly built Athlon box; I bought quality RAM (Samsung) and a quality 300W PS (Enermax). It also has two 7200 RPM drives. Guess which one is more stable? Yep, the Dell.
The remark belonged in a posted comment. Although slashdot.org is not an official news outlet, it has taken on the role for millions of people and should be somewhat responsible. Would the New York Times post something like that on its front page? No, but it may appear in an editorial inside the paper.
and Sim Invading Iraq to Keep Approval Ratings High
Please keep your snide unpatriotic comments to yourself when posting a story. Some of us have pride in our country and military. This remark belongs in a comment, not a posted story.
Actually, specialized apps are often developed by small companies that don't use the newest whiz-bang APIs that trouble Wine. I've got several specialized apps that run perfectly on Wine. Sure, the file open/save dialogs look like the ones on the original Win95, but who cares if the app solves the problem at hand?
Don't upgrade. Office 97/2000 will work fine for the next few years. At that time, your financial circumstances may be different or Linux may have even closed the gap some more making it a more viable alternative. Who knows, maybe a miracle will happen and M$ will develop Office for Linux (who's laughing now?)
I agree. In high resolution, ATI's 2D is far superior to the nVidia cards I've seen. nVidia has gotten better since the TNT2 (which was just awful at 1280x1024). However, there is still a noticeable difference between a Radeon and a GeForce3 (I'm not sure about the GeForce4).
I'm considering purchasing a game console to supplement my PC gaming. On my PC, I play FPS multiplayer games, flight sims, and RPGs. I'd like a cheap console to play some quick, fun, and social games with friends.
Now that prices are nearly the same, should I get a PS2 or Gamecube or Xbox? What about the price of games? Any difference?
A lousy ISA absolutely limits the amount of compile-time optimization with regard to parallelism. However, RISC ISAs only make the compiler simpler, as the instructions are more orthogonal and the register-coloring algorithms are easier. Compilers just aren't that good yet at discovering parallelism, and they can't express it in a RISC ISA either.
That's why register renaming and out-of-order execution were invented: to compensate for the ISA's lack of primitives and the compiler's inability to discover or express parallelism.
With register renaming, it doesn't matter how many logical registers there are. I believe the Intel P4 has over 40 internal registers. Combined with out-of-order execution, the machine dynamically determines data dependences and assigns a different physical register to each result. It also dynamically determines which micro-op is ready to execute and executes it. Having more visible registers can often make things slower. It creates unnecessary false dependencies that the hardware can't deal with. This is why compiler technology is absolutely crucial for IA-64, which is, for the most part, a static execution engine.
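A toy sketch of the renaming idea, in Python. This is not how the P4 or any real core is built; the rename function, the r0..r3 / pN register names, and the never-recycled pool of internal registers are all simplifying assumptions for illustration. The point is only that giving every write a fresh internal register removes the false (write-after-write and write-after-read) dependences while leaving the true ones intact.

```python
from itertools import count


def rename(instructions, num_logical=4):
    """Rename (dest, src1, src2) triples onto fresh internal registers.

    Simplification: internal registers are never freed or reused.
    """
    fresh = count(num_logical)  # next unused internal register number
    mapping = {f"r{i}": f"p{i}" for i in range(num_logical)}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping, which preserves true dependences.
        s1, s2 = mapping[src1], mapping[src2]
        # Every write gets a brand-new internal register.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r0", "r1", "r2"),  # r0 = r1 op r2   (true dependence on r1)
    ("r1", "r3", "r3"),  # r1 = r3 op r3   (reuses r1: a false dependence)
]

renamed = rename(program)
for before, after in zip(program, renamed):
    print(before, "->", after)
```

After renaming, the third instruction writes p6 instead of the register the second one reads (p4), so an out-of-order core would be free to execute it without waiting on the first two.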
Compile-time branch prediction is not such a great idea. You must profile the program with some fixed data set. With hardware branch prediction, the prediction tables dynamically adjust based on the execution of the program. Branch prediction combined with speculative execution is often just as good as predicated execution (I believe you are referring to this).
If you don't do branch prediction, parallel instruction units, multiple issue/retire, etc., what do you have? You have a slow processor, certainly suitable for embedded applications but not for modern general-purpose computing. Sure, rolling your own processor might be interesting for exploring different computing paradigms such as SIMD. And it might make a cool hobby. But you have to ask yourself: will a 1.8 GHz Athlon do the job just as well as my 100 MHz super-specialized-for-one-task processor?
And no, a bunch of guys with bachelor's degrees and no experience can't build any decent general-purpose processor. I'm not talking about FPGAs, because you can't use those to make a processor that is anywhere close to an Athlon. You need architects, logic designers, layout designers, packaging experts, fab experts, etc. Then there are so many subtle design issues, such as precise exceptions, I/O, etc.
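A small Python sketch of that difference, under made-up assumptions: the branch histories below are invented, and the dynamic predictor is a single textbook 2-bit saturating counter rather than the table-based schemes real processors use. The static prediction is fixed from the profiling run; the counter adapts when the real input behaves differently.

```python
def static_accuracy(outcomes, predict_taken):
    """Accuracy of one fixed prediction chosen at compile time."""
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)


def two_bit_accuracy(outcomes, state=2):
    """Accuracy of a 2-bit saturating counter.

    States 0-1 predict not taken, states 2-3 predict taken.
    """
    hits = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            hits += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return hits / len(outcomes)


# Hypothetical branch: always taken on the profiling data set,
# mostly not taken on the data the program actually sees later.
profile_run = [True] * 100
real_run = [True] * 20 + [False] * 80

# The compiler picks whichever direction dominated the profile.
compiled_guess = sum(profile_run) * 2 >= len(profile_run)

print("static :", static_accuracy(real_run, compiled_guess))
print("dynamic:", two_bit_accuracy(real_run))
```

The static guess is only as good as the match between the profiling data and the real data; the counter pays a couple of mispredictions at the transition and then follows the new behavior.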
ISAs are mostly irrelevant in terms of performance potential (except for IA-64, which I will get to). Both AMD and Intel devote a (small) portion of their transistor budget to dynamically converting the CISC instructions into RISC-like "micro-ops". Thus the actual execution cores of the AMD K7 and Intel P6 micro-architectures are very similar to, say, the MIPS R10000 core. Now if Intel and AMD had a decent ISA to begin with, they could devote those transistors (used to convert CISC to RISC) to things like bigger caches. Thus the performance penalty of using a lousy ISA is really not that large, as evidenced by the success of Intel and AMD in raw computational power.
Your comment that "RISC chips are really not all that complex" is extremely ignorant and uneducated. Please tell me again that the MIPS R12000 core is "not all that complex" after studying superscalar speculative out-of-order execution.
The IA-64 ISA really is different because it takes a radical approach to achieving instruction-level parallelism. It is very VLIW-like and contains many advanced features like "poison bits", register windows (not SPARC windows), software pipelining support, etc. Thus the parallelism is discovered by the compiler and can be expressed to the architecture, unlike RISC and CISC ISAs, which rely on the hardware to discover and provide parallelism (through OOO execution).
Very true about the embedded space...lots of competition. Embedded processors are actually very simple cores. They are usually in-order and have simple pipelines. Plus there exists a large design space of customization that stimulates competition and offers many different companies the ability to offer something different. Just look at all the different "systems on a chip" out there.
And yes, standard tools exist that can automatically transform high-level specifications into mask-level designs. However, for anything that will come close to the price/performance ratio that Intel and AMD have achieved for general-purpose microprocessors, full custom design is usually required.
ARM is really an instruction set specification. Most ARM implementations I know of are rather simple (except for the engineering expended to make them low-power).
I can't think of an ARM implementation that is superscalar, speculative, and performs out-of-order execution.
Designing a modern microprocessor cannot be done by amateurs or by a group of people with B.S. degrees in electrical engineering. Sure, many of us have taken undergraduate architecture classes and maybe have designed a simple pipelined microprocessor in Mentor Graphics or VHDL/Verilog. Some of us have maybe even implemented it with FPGAs.
However, anything close to being as complex as Intel/AMD chips requires an army of highly experienced architects and engineers, many of them with PhDs. Even the software design tools, such as Mentor, cost well over $100,000.
Then building the chip is another beast, requiring a fab facility on the order of $1 billion for any process with feature sizes smaller than 0.5 micron.
Microprocessors are becoming so complex to design and build that only a few companies are surviving. Sort of like the aircraft industry. There are only 2 remaining companies in this world that design and build 300+ passenger commercial aircraft (Boeing and Airbus). It is infeasible for a new competitor to arise because of the capital involved (unless of course it is nationally sponsored).
You are incorrectly aggregating mainframes, shared-memory multiprocessors (SMP/NUMAs), and clusters as massively parallel machines.
Linux makes a great operating system for certain classes of massively parallel machines: clusters. It is low-cost and has a decent MPI (message-passing interface) implementation. It also runs on commodity hardware. Don't be surprised if you see the next ASCI supercomputer using Linux as the OS for each node.
You are correct in that Linux is not a good operating system for larger shared-memory multiprocessors. It lacks the fine-grained locking necessary to run the same kernel instance across dozens of processors.
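As a user-space analogy only (this is not kernel code, and the CoarseTable / FineTable classes and the run helper are invented for illustration), the difference between one big lock and fine-grained locking can be sketched in Python like this. With a single lock every processor serializes on it; with one lock per bucket, threads touching different buckets do not contend.

```python
import threading


class CoarseTable:
    """Hash table guarded by one global lock: every writer serializes."""

    def __init__(self, buckets=8):
        self.lock = threading.Lock()
        self.buckets = [dict() for _ in range(buckets)]

    def put(self, key, value):
        with self.lock:
            self.buckets[hash(key) % len(self.buckets)][key] = value


class FineTable:
    """Same table with one lock per bucket: only same-bucket writers contend."""

    def __init__(self, buckets=8):
        self.locks = [threading.Lock() for _ in range(buckets)]
        self.buckets = [dict() for _ in range(buckets)]

    def put(self, key, value):
        i = hash(key) % len(self.buckets)
        with self.locks[i]:
            self.buckets[i][key] = value


def run(table, workers=4, per_worker=1000):
    """Fill the table from several threads and return the number of entries."""

    def fill(start):
        for key in range(start, start + per_worker):
            table.put(key, key)

    threads = [
        threading.Thread(target=fill, args=(w * per_worker,))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(len(bucket) for bucket in table.buckets)


print("coarse:", run(CoarseTable()))
print("fine  :", run(FineTable()))
```

Both versions are correct; the fine-grained one simply gives independent work a chance to proceed in parallel, which is what matters once the same kernel instance is spread across dozens of processors. The cost is more locks to get right.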
I can't comment on mainframes because I am unfamiliar with their architecture. I do know that high-end UNIX servers and mainframes are different beasts. The former focus on performance while the latter prefer uptime above all. I also believe that IBM, the king of mainframes, has not used UNIX as its traditional operating system. Thus you are comparing apples to oranges. Linux makes a perfectly decent "mainframe OS" if you are partitioning the machine into multiple virtual machines.
Also please elaborate on "Linux' inferior TCP/IP stack". And "inferior handling of multi-threading on a large scale". Are Solaris light-weight processes any better?
That's because NT already has many of the dynamically linked libraries that MS Word uses open. Because OpenOffice is cross-platform, it can't take advantage of Windows-only APIs.
Refer to the chart on page 23. How is it that the BSD license is "GNU GPL compatible"?
The irony amuses me.
Intel is a strong backer of Linux. Therefore Intel is good.
But wait, AMD is the little David against the Goliath of Intel. Intel is evil.
Oh oh, AMD ignores Linux (AGP cache coherence bug) and Jerry Sanders, the CEO of AMD, publicly supports Microsoft!
What will the "geek hippies" do!!!
good point! guess i'm western-centric...
Take a few more advanced architecture classes.
There are several blood patches available.