Actually, no, it isn't. It's much more likely that you do not have Free Will. In order to prove Free Will, you have to prove that something can act contrary to the laws of physics/nature in order to arrive at some state (the state representing the choice). Otherwise, if all states of 'choice' are the result of following natural laws (physics, etc.), then arrival at that state was no more of a 'choice' than ice has of not melting when put into an environment of 100C at standard atmospheric pressure.
We had a very interesting discussion on Ars some time back about this very subject. Basically, I believe that in order for Free Will to exist, the supernatural must also exist. Now... whether or not you believe in the supernatural (religion, deity or deities, etc.) is your own business.
Sure they are. IF there is complete determinism, any unit that exhibits undesirable behavior should simply be terminated so as to a) not continue that behavior, and b) not pass that programming along to its offspring.
You don't have to care for the welfare of others. I choose to.
I choose to not attempt to open cups of hot liquid by jamming it in my crotch, holding it between my knees, and trying to remove the lid. I further choose to not do such things while in the confines of an automobile.
It's unfortunate that she was burned but rarely does a single event happen in a vacuum. One can usually find an entire sequence of poor choices and judgements along the way.
Yeah... because it's unreasonable to expect hot coffee to be hot. Nor should you expect to have to wait for hot coffee to cool down to a temperature that you can ingest it safely before actually ingesting it, despite the fact that it is made with water that is near or above the boiling temperature of water. You should treat it just like you do hot cheesy pizza... you should, as fast as you possibly can, shove as much of it into your mouth as you possibly can as soon as it comes directly out of the oven, regardless of the fact that it is very hot after being cooked in an oven that is significantly over the boiling temperature of water.
"Balance" is realizing that coffee is hot and letting it cool down before you drink it. "Balance" is realizing that you shouldn't always do things from the seat of your car (while driving or otherwise in such a tight, cluttered place) involving very hot or otherwise dangerous substances.
Seriously... thinking ahead is just something people have lost the ability to do.
Since "Hot Coffee" is served... well... hot... If you spill something hot into your lap, you will probably have to worry about more than being wet. Where I grew up, we were taught to be careful and not spill anything hot into our laps because it might, you know, burn you or something.
Personally, I don't like x86 either. Luckily, I've never had to write x86 assembly even though I've worked on millions of lines of source (C, C++, etc.) So, aesthetics of the ISA are (no matter what I think) irrelevant because most of us will never see the ISA. I grew up learning the 6502, 6800, 68000, SPARC, and other ISAs. Those were nice to use and made x86 look like a Gorgon. I haven't written any assembly at all in over 20 years. My lowest level language has been C so that's my "ISA".
Second, the CISC/RISC debate died a long time ago. RISC basically came to mean a Load/Store architecture while CISC came to mean Memory+Op.
Third, something much more interesting to think about in a world where everyone is so concerned about memory bandwidth is that x86 instructions are very much like compressed binaries. One read can get an instruction that translates into a number of instructions that are more Load/Store-like (RISC-like). On a "RISC" type machine, that equivalent instruction stream could have taken a number of reads (read: bus cycles) and multiple I-cache lines to hold (read: more memory). So, not only do you save memory size, you can save many clock cycles by reading a "compressed instruction" and translating it into the several equivalent load/store (RISC) instructions.
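To put rough numbers on the "compressed instruction" idea, here is a toy sketch. The byte counts are assumptions picked for illustration (a 2-byte memory+op encoding and a hypothetical fixed-width ISA with 4-byte instructions), and the names `fetch_bytes` and `cache_lines` are made up for this example. It ignores decode cost entirely.

```python
# Toy model: instruction bytes fetched to perform "memory += register".
# All sizes are illustrative assumptions, not measurements of a real chip.

FIXED_WIDTH = 4  # bytes per instruction on a hypothetical load/store ISA

# One memory+op instruction; 2 bytes is the assumed encoded length.
memory_op_stream = [("add [mem], reg", 2)]

# The equivalent load/store sequence: load, operate, store.
load_store_stream = [
    ("load  tmp, [mem]", FIXED_WIDTH),
    ("add   tmp, tmp, reg", FIXED_WIDTH),
    ("store [mem], tmp", FIXED_WIDTH),
]


def fetch_bytes(stream):
    """Total instruction bytes fetched to execute the stream once."""
    return sum(size for _, size in stream)


def cache_lines(stream, repeats, line_bytes=64):
    """I-cache lines needed to hold `repeats` back-to-back copies."""
    total = fetch_bytes(stream) * repeats
    return -(-total // line_bytes)  # ceiling division
```

Under those assumptions the memory+op form needs a sixth of the fetch bandwidth for this one operation. Real code is a mix of instructions with very different lengths, so the real-world ratio is far smaller than that.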
At least... #1 helps me forget about the ugliness that is the x86 ISA and #3 actually makes me like it a little.
Good luck with that... people have been working on it for a long time with no solution yet. Just refactoring code is not horribly difficult. Data partitioning and data flow (overlapping communication with computation in meaningful ways) are things that are a bit more difficult to handle automagically just by looking at a bunch of source code (and those are only an example of some of the problems that such a tool would face).
Heh, yeah... I'm familiar with MPI (message passing) models of concurrency and with various MPI (the Message Passing Interface standard) libraries. From everything I've read, the SPEs are comparable to modern DSPs. They have lots of functionality but aren't really suited to running a general purpose OS (compared to the PPE, for example) for a number of reasons. Do you have a good link that talks about Cell's built-in message passing? Having built-in message passing functionality would help a lot, obviously, over having to write libraries to handle such things built only on their DMA capabilities.
Not a fanboi are you? You are right, though... Exclusives will make/break the PS3 because it's so different that the only way for it to shine will be the exclusives. Similarly, the cross platform games will all default down to a single PPC (with two threads) because that's all they have in common and the graphics libraries will hopefully take care of the graphics differences. The rough part of exclusives for the PS3 is that the programming model for the Cell is very, very complicated... much more complicated than the XBox360 model (which is just plain SMP). It's even further complicated by (from what I hear) the tools available for it being not-so-great.
The Cell's programming model isn't that new... it's been around for a while (PPC surrounded by a bunch of DSPs that talk to each other, the PPC, and the world through DMA). I've programmed that model before... it's a massive pain compared to the relative simplicity of generic SMP.
Similarly, I found my workload to be dominated by a few intensive applications (some of which I was writing) which made my single-core Athlon64 cry out in pain. I upgraded to an AMD X2 and saw an almost doubling in my productivity in such situations... not that I'm unfamiliar with SMP (and some large SMP machines and how they give benefits)... it's just the first time I've actually decided to buy one for my personal use at home.
Not as much as you may think, and in some cases, it most definitely can be worse, but that depends on the application and how it was written. The easiest case to cite would be the simple case of increased cache pressure due to expanded data types. This is where an application was written (let's say in C) with a lot of data stored in long data types when the dynamic range of the data being stored could be held in 32-bit values (it could also use lots of pointers; the same thing happens). Imagine a huge array of these values, exactly large enough to be held in 9/10 of the cache, in 32-bit mode. In 64-bit mode, that array just doubled in size, because a long in 64-bit mode is 64 bits wide, where it was 32 bits wide when compiled in 32-bit mode. In 32-bit mode, the data can be entirely contained in the cache. In 64-bit mode, the cache will not hold it and you'll be forced to do lots of main memory reads. The 32-bit program will be faster because it is cache bound where the 64-bit program would be main memory bound. Granted, this type of behavior is easily avoidable by a programmer who knows the pitfalls and limits the datatypes being used to efficiently contain the dynamic range of the data being stored (efficiency may also take into account any penalties of misaligned reads, not just purely dynamic range, so even though the data is 0-2000, you may want to use a 32-bit integer anyway to prevent misaligned reads/writes). This is nothing new... We were dealing with these things back in the very early 1990s when we got a bunch of SGIs in that had the R4000 processor line in them.
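The arithmetic behind that example is easy to sketch. The 1 MiB cache size is an assumption chosen only for illustration, and the helper names are made up. The 8-byte long assumes an LP64 platform (Linux and most Unix systems on x86-64); on 64-bit Windows a long stays 4 bytes, so this particular pitfall shows up with pointers instead.

```python
# Does an array of C `long` values fit in cache in 32-bit vs 64-bit builds?
CACHE_BYTES = 1024 * 1024  # assumed 1 MiB cache, for illustration only

LONG_BYTES_32BIT = 4  # sizeof(long) in a 32-bit build
LONG_BYTES_LP64 = 8   # sizeof(long) in an LP64 64-bit build (assumption)


def footprint(n_elements, element_bytes):
    """Bytes occupied by the array."""
    return n_elements * element_bytes


def fits_in_cache(n_elements, element_bytes, cache_bytes=CACHE_BYTES):
    """True if the whole array can be resident in the cache at once."""
    return footprint(n_elements, element_bytes) <= cache_bytes


# Size the array so it fills 9/10 of the cache when long is 4 bytes.
N = (CACHE_BYTES * 9 // 10) // LONG_BYTES_32BIT
```

With these numbers `fits_in_cache(N, LONG_BYTES_32BIT)` holds and `fits_in_cache(N, LONG_BYTES_LP64)` does not: same source, same element count, twice the footprint.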
So...
What are you talking about? 32-bit was never faster than 64-bit mode.
is certainly a false statement (not to mention that I've seen 32-bit run faster on the same code and can easily construct samples of non-pathological code where the 32-bit compiled x86 build runs faster than the same code compiled in x86-64 mode).
I've written (and helped write) and profiled a number of 32-bit/64-bit applications and have compared the performance in each mode. Sometimes it's a win, sometimes it's about equal, sometimes the 32-bit compiled version wins. Most of the time, the benefit is less than 10% except in specific cases where lots of temps are being used, where you can take advantage of the increased number of registers (but that would also help in 32-bit mode if they were available; unfortunately they aren't), or where you're actually using 64-bit integers (remember, the FPU has been 64-bit for a long time already, so don't expect much of a gain there unless you can use the extra registers).
Writers of applications like Cinebench9.5 have already stated that their gains came as much from being able to "throw away" all the 'compatibility' code (SSE vs FPU and having to support both and such), because x86-64 doesn't need it, so that the x86-64 code is much cleaner, as from the additional registers available for calculations. Granted, this *is* an inherent benefit on x86 systems of moving from x86 to x86-64, but it isn't necessarily a generality that you can extend to all 64-bit environments.
There are never any guarantees. However, the VAST MAJORITY of programs benefit, and CPU-intensive applications in particular almost always benefit from it.
OK, you've got me here... a 1% to 10% speed improvement *is* most certainly showing that many programs do benefit from the x86-64 architecture over the x86 architecture.
So when AMD was trouncing Intel in 32-bit (and 64-bit) mode, it was AMD's strength... Now it is their weakness? Also, running 64-bit programs doesn't automatically mean massive performance increases... it isn't like it runs 2x as fast because 64 = 32 * 2. There are numerous discussions on the 'net as to why.
Weird... AnandTech has reviews of AM2 dated May 23, 2006. Same for AMD.
Meanwhile, a price drop was announced by AMD around July 14 specifically stating the following:
Sunnyvale (CA) - A spokesperson for AMD confirmed to TG Daily that customers should expect AMD to announce a substantial price drop in its highest performance desktop processors, including Athlon 64 X2. The price drop should be announced at or about the time that Intel announces the release of its next-generation Core 2 Duo and Core 2 Extreme processors - an announcement which is now expected before the end of the month.
As always... wait until the kid discovers he can't play the same games as his friends on the PC (or he might be able to, but with a bit of work). Of course, he could always get a console to cover console gaming.
but I'd expect Core Duo's apparent lead to be pretty short lived. There's nothing inherently brilliant in their architecture, other than them stealing a march on AMD in terms of 65 nm and adding cores like there's no tomorrow
Obviously you haven't read anything about Core2. Core2 is, on average, 20% faster than AMD's Athlon64 at the same clock speed. If you liked the fact that AMD's Athlon64 was comparable to the Pentium4 at 2/3 the clock speed, then you simply must like Core2's being faster at ~80% of the clock speed of the Athlon64... while still being on the obsolete, much maligned FSB architecture that AMD spends so much time and money pooh-poohing in favor of their obviously-so-much-better IMC+HT technology.
AMD still leads Intel by a country mile on budget processors as well.
Well... when you can't lead in performance, you try to lead somewhere else. Yes, the launch of Core2 parts drove AMD to cut the prices of their processors by 50% or more in order to stay competitive. Had they not done that, they would be selling nothing right now because even a fanboi couldn't justify buying AMD given the complete destruction that equally priced AMD CPUs would get compared to the Intel parts. So, AMD dropped back to attempt to remain king of the bargain market until they could release something that would put them back into the performance game, which Intel currently owns (not counting the obviously boutique, one-off 4x4 dead-end attempt to save face that AMD marketing released).
I have three Athlon XP and four Athlon64 machines at home (and no Intel other than two laptops). This migration of socket requiring a new CPU and all has bitten me already. My S939 parts barely lived a year (two years given the entire lifetime) and already AMD is requiring me to buy a new motherboard *and* CPU if I wish to upgrade because they're already killing off S939. Many of the already existing S775 boards for Intel will upgrade the CPU to Core2 (perhaps with a BIOS flash). It makes me kind of wonder how long Socket AM2 is going to last given that they're already talking about S1207 and some future move to DDR3 (yet another socket change?) Currently, for me to upgrade my AMD machines, I have to buy a CPU + motherboard because even the S939 X2s are EOL'd. I've thought about buying some upgrade AMD CPUs but I'm not going to do it. Core2 offers too much performance increase with the promise of socket compatible quad core very (very) soon. I believe AMD is requiring S1207 for quad core, so if you bought AM2, you're already EOL'd because of the change for DDR3 and quad core. AM2 was dead before it was released, it seems.
On second thought, the fact that someone doesn't like a book doesn't mean that person reads anything you'd like. For example, people who respond that they didn't like the Foundation books by Isaac Asimov are just as likely to like cookbooks, religious books, or any other books that you may not like either.
I guess you could use it the obvious way: by stating what you like, you may get a list of other books to avoid.
Probably a better way to broaden your horizons is to enter a book that you read (or started to read) and knew you hated. Then it might tell you about some books you may like. It won't always work because it isn't tailored to your own tastes (your own likes/dislikes) so there aren't two poles in the general evaluation but at least it may give you some ideas and even open you up to some other genres of books.
Maybe... Microsoft's DirectX10 has an API in it for offloading vector type work to 'something else' in the system. The interesting thing about it is that it will be a standard API, meaning that hardware can be built to take advantage of it while drivers can also be written to either do it in emulation or by actually handing it off to the specialized hardware. This would help AMD out a lot as far as that kind of hardware goes... without some standard APIs, it would likely end up in a mess, IMO.
Personally, I'm not that excited about that kind of technology just yet because it is still fairly immature as far as PCs go. The logistics are rough all the way around. The first hardware can likely be surpassed quickly by newer/better coprocessors, but to upgrade you either a) replace the entire CPU, or b) leave that coprocessor there but disable it in favor of a better add-in card (just like embedded graphics today), which means you basically have a dead 'core' on your CPU. And c) AMD will be stuck with a ton of dead stock as soon as they upgrade that coprocessor, and AMD already has problems keeping channels fed. c) will probably mean that advancement of Fusion will be slow. If AMD can release a Fusion vector part that is 2X as fast or has a better API (new version) or something, but the 'main core' is the same (say it's a 3.4GHz Athlon64 core) because it hasn't advanced as fast, then the instant the new part comes out, no one will want the old one and it will just sit there unless it gets a huge discount. IMO, this will make AMD not want to advance the Fusion device at a faster pace than the x86-64 core that's also on the die because it would be very costly.
Again... what is this mythical "true SMP operations" that people keep mentioning? Are you talking about MIMD code?
Depending on what the mix of instructions are, you can end up with a stall on the L2 with the Intel part because the CPUs share that L2 and can be in different places at the same time- a cache spill's expensive on machine speed as you well know.
I don't understand the "places" you mention. L2 cache has been multiported for a long time. Additionally, the cache subsystem should be able to handle simultaneous requests from both cores. There should be no stalling due to simultaneous cache accesses from both cores in a shared cache system. As far as cache spills, any situation that should cause spills in a shared cache should cause spills in non-shared (I'll mention this later). Basically, the shared 2M cache can mimic the degenerate case of two 1M caches exactly, but has the flexibility to also be the same as one core having a 512K cache and the other having a 1.5M cache, if working sets dictate, for example (I'll mention this later too).
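A back-of-the-envelope sketch of the flexibility argument: two threads with working sets of 512K and 1.5M, run once against two private 1M caches and once against one shared 2M cache. This is an idealized capacity check only, with made-up helper names. It ignores associativity, replacement policy, and the two threads evicting each other's lines.

```python
KB = 1024


def spills_private(working_sets, per_core_cache):
    """Per core: True if that core's working set exceeds its private cache."""
    return [ws > per_core_cache for ws in working_sets]


def spills_shared(working_sets, shared_cache):
    """True if the combined working sets exceed the single shared cache."""
    return sum(working_sets) > shared_cache


asymmetric = [512 * KB, 1536 * KB]  # one small thread, one large thread
symmetric = [1024 * KB, 1024 * KB]  # the degenerate "two 1M caches" case
```

For the asymmetric pair, `spills_private(asymmetric, 1024 * KB)` reports a spill on the second core while `spills_shared(asymmetric, 2048 * KB)` reports none. For the symmetric pair neither configuration spills, which is the "shared can mimic private" point.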
The Intel part can conceivably be having a LOT of those under a real multiproc load. With that in mind, one CPU may have 1.5Gb of cache for it to work with reliably before a cache spill and 512Mb of cache on the other core; or even nastier combinations thereof. The Athlon architecture doesn't HAVE that concern- it's as if you've got two (or four) full Athlon64's working at the same time. With the Intel offerings, you're going to have swinging performance losses that you can't quite explain all the time. Sure, it's fast with what's now available, but it's not stable speed; bring in that second CPU into full play and things don't look as rosy.
I don't get your discussion... I'm just not following your verbiage. I'm trying to understand it but can't get your metaphors or something.
Anyway I'll try to discuss what I think you are talking about. Shared L2 cache is considered the superior design compared to each core having unshared cache. There are numerous discussions on this around the 'net. However, I'll talk about several specific examples.
In a non-shared cache configuration with two cores on the same die running multithreaded code, you can easily get into situations where each thread wants access to the same piece of data for writing. When this happens (which is fairly common... mutex/semaphore/etc in fine-grained code are good examples of this), in a non-shared cache system, you can get a lot of MOESI traffic and passing around of that data between the two non-shared caches (it takes inter-cache bandwidth to do that). However, in the shared cache system, that data is in the shared L2 cache exactly once and, furthermore, there is no passing it around... no MOESI traffic, no usage of any intercache bandwidth because no copy takes place. In such a situation (two threads competing for writes on the same data), the shared L2 cache can be very much faster than the non-shared L2 cache. In addition, the absence of that MOESI traffic is a lighter load on the MOESI subsystem, leaving it free to handle other MOESI traffic and other transfers. In some codes, MOESI traffic and data copying between the unshared L2 caches can be almost pathological behaviour, leading to heavy slowdown as the two cores fight for access to the data. To summarize: Shared L2 = much lower MOESI traffic in a competing-writes situation and little/no intercache bandwidth utilization because no copies between caches occur. Non-shared L2 in such a situation means more MOESI traffic and more intercache bandwidth used (and cores waiting) to transfer the data back and forth. It's easy to write a simulation of this problem.
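A minimal sketch of such a simulation, under heavy simplifying assumptions: it tracks a single cache line, models only the L2 level (any private L1s and their coherence are ignored), and counts one transfer every time the line has to move from one core's private cache to the other's. The function names are made up for illustration.

```python
def private_cache_transfers(writes):
    """Count line migrations when each core has its own L2.

    `writes` is a sequence of core ids, one per write to the same line.
    A write by a core that does not currently own the line forces the
    line to move from the other cache: one coherence transfer.
    """
    owner = None
    transfers = 0
    for core in writes:
        if owner is not None and core != owner:
            transfers += 1
        owner = core
    return transfers


def shared_cache_transfers(writes):
    """With one shared L2 the line lives in exactly one place,
    so in this model nothing moves no matter who writes."""
    return 0


def ping_pong(n):
    """Worst case: two cores strictly alternating writes (0, 1, 0, 1, ...)."""
    return [i % 2 for i in range(n)]
```

Feeding `ping_pong(1000)` to the private model gives 999 transfers against 0 for the shared model, while a single core writing 1000 times costs nothing in either. Real contention falls somewhere between those extremes.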
A second example is cache utilization. If you have two threads in a dual core system that are asymmetric in cache working set size, you can
Core 2 Duo is only winning right now because of the process, but Intel's trouble is that a process relies on science and technology whereas a processor has to be designed. As time goes by it gets easier and easier for AMD to get to a certain nm, but it doesn't get any easier to design the CPU. And right now AMD's designs are much better than Core 2 Duo (they are in the same ballpark in performance/watt at .4x the density).
How does this explain that when clock speed and L2 cache sizes are equal, Core2Duo outperforms Athlon X2 by a non-trivial amount? If it were "just process", then you could try to show where Core2Duo wins based on how much cache it has and the like, but that isn't what we see in the multitudes of benchmarks that have been run. A 65nm 2GHz part doesn't just magically perform calculations faster than the same part running at the same frequency at 90nm, for example.
What is "true SMP type multi-threading"? I've worked on parallel (multi-process and multi-threaded) code for almost 20 years now. There is no "one true" SMP processing type. The differentiators are data partitioning and how much serial vs. parallel work is inherent in your problem/algorithms. All are "true" in every sense of the word. Some are more efficient than others and some scale better... that's it.
As far as L2 cache sizes, they *are* going to grow with the 45nm shrink, it's already been announced... and that's fine by me as the 45nm shrink will just increase the frequency, which means the gap between chip speed and memory speed will just go up. And yeah, for most things, I'd take a GB cache:) You'd still have to have main memory, even if it were all cached, just because of the way that main memory and cache work differently (they are sort of the same but not - VM is sorta like a caching technique, but they operate on different levels so you have to have both still:p).
And yeah... the whole 'there'll only ever be a need for maybe seven computers' and '640k is enough for anybody' arguments have been around for a long time but have never held true. There'll always be a need for faster machines, but I do agree with you, a human can only send/receive email, surf the web, and type so fast and current CPUs are far beyond what it takes to do that. Still... as Valve has shown, build it and they will come... at least in the entertainment world (games and content), being able to have more CPU power is still good.
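One common way to put a number on the serial vs. parallel point is Amdahl's law. That's my choice of model here, not something from the thread: it assumes a fixed problem size and a perfectly divisible parallel part, and it says nothing about data partitioning or communication costs. The function name is made up.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup on n_cores when parallel_fraction of the work
    (between 0.0 and 1.0) can be spread evenly across the cores and
    the rest must run serially."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)
```

For example, code that is 90% parallel tops out below 10x no matter how many cores you throw at it, which is the "some scale better than others" point in a nutshell.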
The council does plan to begin migrating those desktops to its Suse Professional 9.3-based desktop OS,
What's the logic of going with a version that is so far behind? I know that you don't go bleeding edge with such a project but 9.3 is ancient. I guess it is still supported but it seems like being *that* far behind would be leaving yourself open to a number of security/compatibility issues.
Get that 12yo girl from Jurassic Park.... she knows Unix!
Antarctica
Meanwhile, a price drop was announced by AMD around July 14 specifically stating the following:
As always... wait until the kid discovers he can't play the same games as his friends (or he might can, but with a bit of work) on the PC. Of course, he could always get a console to cover console gaming.
Obviously you haven't read anything about Core2. Core2 is, on average, 20% faster at the same clockspeed as AMD's Athlon64. If you liked the fact that AMD's Athlon64 was comparable to the Pentium4 at 2/3 the clock speed, then you simply must like Core2's being faster at ~80% of the clock speed of the Athlon64... while still being on the obsolete, much maligned FSB architecture that AMD spends so much time and money poo-pooing over their obviously-so-much-better IMC+HT technology.
Well... when you can't lead in performance, you try to lead somewhere else. Yes, the launch of Core2 parts drove AMD to cut the prices of their processors by 50% or more in order to stay competitive. Had they not done that, they would be selling nothing right now, because even a fanboi couldn't justify buying AMD given the complete destruction that equally priced AMD CPUs would suffer against the Intel parts. So AMD dropped back to attempt to remain king of the bargain market until they could release something that would put them back into the performance game, which Intel currently owns (not counting the obviously boutique, one-off 4x4 dead-end attempt to save face that AMD marketing released).
I have three Athlon XP and four Athlon64 machines at home (and no Intel other than two laptops). This migration of socket requiring a new CPU and all has bitten me already. My S939 parts barely lived a year (two years given the entire lifetime) and already AMD is requiring me to buy a new motherboard *and* CPU if I wish to upgrade because they're already killing off S939. Many of the already existing S775 boards for Intel will upgrade the CPU to Core2 (perhaps with a BIOS flash). It makes me kind of wonder how long Socket AM2 is going to last given that they're already talking about S1207 and some future move to DDR3 (yet another socket change?) Currently, for me to upgrade my AMD machines, I have to buy a CPU + motherboard because even the S939 X2s are EOL'd. I've thought about buying some upgrade AMD CPUs but I'm not going to do it. Core2 offers too much performance increase with the promise of socket compatible quad core very (very) soon. I believe AMD is requiring S1207 for quad core, so if you bought AM2, you're already EOL'd because of the change for DDR3 and quad core. AM2 was dead before it was released, it seems.
Anyone else find it funny that the blurb immediately below this one is: Changing Climates for Microsoft and Google
On second thought, just because someone doesn't like a book doesn't mean that any of those people would read something you'd like. For example, people who respond that they didn't like the Foundation books by Isaac Asimov would be just as likely to like recipe books as religious books or any other book that you may not like either.
I guess you could use it for the obvious use in that by stating what you like you may get a list of other books to avoid.
Probably a better way to broaden your horizons is to enter a book that you read (or started to read) and knew you hated. Then it might tell you about some books you may like. It won't always work because it isn't tailored to your own tastes (your own likes/dislikes) so there aren't two poles in the general evaluation but at least it may give you some ideas and even open you up to some other genres of books.
Maybe... Microsoft's DirectX10 has an API in it for offloading vector type work to 'something else' in the system. The interesting thing about it is that it will be a standard API, meaning that hardware can be built to take advantage of it while drivers can also be written to either do it in emulation or by actually handing it off to the specialized hardware. This would help AMD out a lot as far as that kind of hardware goes... without some standard APIs, it would likely end up in a mess, IMO.
Personally, I'm not that excited about that kind of technology just yet because it is still fairly immature as far as PCs go. The logistics are rough all the way around. The first hardware will likely be surpassed quickly by newer/better coprocessors, and then a) you have to replace the entire CPU, or b) you leave that coprocessor there but disable it in favor of a better add-in card (just like embedded graphics today), which means you basically have a dead 'core' on your CPU, and c) AMD will be stuck with a ton of dead stock as soon as they upgrade that coprocessor, and AMD already has problems keeping channels fed. c) will probably mean that advancement of Fusion will be slow. If AMD releases a Fusion vector part that is 2X as fast or has a better API (new version) or something, but the 'main core' is the same (say it's a 3.4GHz Athlon64 core) because it hasn't advanced as fast, then the instant the new part comes out no one will want the old one, and it will just sit there unless it has a huge discount. IMO, this will make AMD not want to advance the Fusion device at a faster pace than the x86-64 core that's also on the die, because it would be very costly.
Again... what is this mythical "true SMP operations" that people keep mentioning? Are you talking about MIMD code?
I don't understand the "places" you mention. L2 cache has been multiported for a long time. Additionally, the cache subsystem should be able to handle simultaneous requests from both cores. There should be no stalling due to simultaneous cache accesses from both cores in a shared cache system. As far as cache spills, any situation that should cause spills in a shared cache should cause spills in non-shared (I'll mention this later). Basically, the shared 2M cache can mimic the degenerate case of two 1M caches exactly, but has the flexibility to also be the same as one core having a 512K cache and the other having a 1.5M cache, if working sets dictate, for example (I'll mention this later too).
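A toy model shows the flexibility point. This is only a sketch under my own assumptions: a fully associative LRU cache, 8 lines standing in for "1M", and two cores that each loop over a fixed working set (12 lines for core A, 4 for core B, i.e. the 1.5M/0.5M split mentioned above). Real caches are set-associative and use other replacement policies, so treat the numbers as illustrative only.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used replacement."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)         # hit: mark most recently used
        else:
            self.misses += 1
            self.lines[tag] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

def total_misses(shared, rounds=50):
    big = [("A", i) for i in range(12)]   # core A: 1.5x a private cache
    small = [("B", i) for i in range(4)]  # core B: 0.5x a private cache
    if shared:
        a = b = LRUCache(16)              # one shared "2M" cache
    else:
        a, b = LRUCache(8), LRUCache(8)   # two private "1M" caches
    for _ in range(rounds):
        for tag in big:
            a.access(tag)
        for tag in small:
            b.access(tag)
    caches = [a] if shared else [a, b]
    return sum(c.misses for c in caches)

print("shared :", total_misses(shared=True))
print("private:", total_misses(shared=False))
```

With private caches, core A's loop is bigger than its cache and LRU thrashes on every access while half of core B's cache sits idle. With the shared cache, both working sets fit, so only the cold misses remain.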
I don't get your discussion... I'm just not following your verbiage. I'm trying to understand it but can't get your metaphors or something.
Anyway I'll try to discuss what I think you are talking about. Shared L2 cache is considered the superior design compared to each core having unshared cache. There are numerous discussions on this around the 'net. However, I'll talk about several specific examples.
In a non-shared cache configuration with two cores on the same die running multithreaded code, you can easily get into situations where each thread wants access to the same piece of data for writing. When this happens (which is fairly common... mutex/semaphore/etc in fine-grained code are good examples), a non-shared cache system can generate a lot of MOESI traffic and passing around of that data between the two non-shared caches (which takes inter-cache bandwidth). In the shared cache system, however, that data is in the shared L2 cache exactly once and there is no passing it around... no MOESI traffic, no usage of any intercache bandwidth, because no copy takes place. In such a situation (two threads competing for writes on the same data), the shared L2 cache can be much faster than the non-shared L2 cache. In addition, the absence of that MOESI traffic is a lighter load on the MOESI subsystem, leaving it free to handle other MOESI traffic and other transfers. In some codes, MOESI traffic and data copying between the unshared L2 caches can become almost pathological, leading to heavy slowdown as the two cores fight for access to the data. To summarize: shared L2 = much lower MOESI traffic in a competing-writes situation and little/no intercache bandwidth utilization, because no copies between caches occur. Non-shared L2 in such a situation means more MOESI traffic and more intercache bandwidth used (and cores waiting) to transfer the data back and forth. It's easy to write a simulation of this problem.
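For what it's worth, here is about the smallest simulation of that problem I can think of. It is a sketch with big simplifying assumptions: one contended cache line, writes only, no L1 caches modelled, and every change of writer on private caches counted as one invalidate-plus-transfer. The function name is made up for the example; it is not a model of any specific CPU's protocol.

```python
def count_transfers(writes, shared_l2):
    """Count cache-to-cache line transfers for a sequence of writer core IDs."""
    if shared_l2:
        # The single copy in the shared L2 serves both cores; nothing moves.
        return 0
    owner = None    # core whose private cache currently holds the line
    transfers = 0
    for core in writes:
        if owner is not None and owner != core:
            transfers += 1  # invalidate the other copy and move the line over
        owner = core
    return transfers

# Two threads taking turns on the same lock word: strict ping-pong.
ping_pong = [i % 2 for i in range(1000)]
print("private L2:", count_transfers(ping_pong, shared_l2=False))
print("shared  L2:", count_transfers(ping_pong, shared_l2=True))
```

In the ping-pong case nearly every write on the private-cache setup costs a transfer, while the shared setup costs none. If one core does long runs of writes before the other gets a turn, the gap shrinks, which matches the point that it is the fine-grained contention that hurts.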
A second example is cache utilization. If you have two threads in a dual core system that are asymmetric in cache working set size, you can
How does this explain that when clock speed and L2 cache sizes are equal, Core2Duo outperforms Athlon X2 by a non-trivial amount? If it were "just process", then you could try to show where Core2Duo wins based on how much cache it has and the like, but that isn't what we see in the multitudes of benchmarks that have been run. A 65nm 2GHz part doesn't just magically perform calculations faster than the same part running at the same frequency at 90nm, for example.
What is "true SMP type multi-threading"? I've worked on parallel (multi-process and multi-threaded) code for almost 20 years now. There is no "one true" SMP processing type. The differentiators are data partitioning and how much serial vs. parallel work is inherent in your problem/algorithms. All are "true" in every sense of the word. Some are more efficient than others and some scale better... that's it.
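The serial vs. parallel point is what Amdahl's law captures, so here is a quick sketch of it. The parallel fractions below are example values I picked, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of the work can be spread across cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for fraction in (0.5, 0.9, 1.0):
    print(f"{fraction:.0%} parallel on 2 cores: {amdahl_speedup(fraction, 2):.2f}x")
```

Code that is half serial tops out well short of 2x on a dual core no matter how the threads are organized, which is why the split in the algorithm matters more than any label put on the threading.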
As far as L2 cache sizes, they *are* going to grow with the 45nm shrink, it's already been announced... and that's fine by me as the 45nm shrink will just increase the frequency which means the gap between chip speed and memory speed will just go up. And yeah, for most things, I'd take a GB cache
And yeah... the whole 'there'll only ever be a need for maybe seven computers' and '640k is enough for anybody' arguments have been around for a long time but have never held true. There'll always be a need for faster machines, but I do agree with you: a human can only send/receive email, surf the web, and type so fast, and current CPUs are far beyond what it takes to do that. Still... as Valve has shown, build it and they will come... at least in the entertainment world (games and content), having more CPU power is still good.
What's the logic of going with a version that is so far behind? I know that you don't go bleeding edge with such a project but 9.3 is ancient. I guess it is still supported but it seems like being *that* far behind would be leaving yourself open to a number of security/compatibility issues.