The big problem is that game developers like to develop a game for all the consoles in a given generation. The hardware capabilities of the PS2, Xbox, and GameCube were different, but in the same ballpark.
Now I'm not a graphics guru and could be wrong, but to my understanding, this greatly eases the art pipeline-- for example textures and polygon counts could be the same size.
The problem with the Wii is that it is not in the same class as the Xbox360 and PS3.
Many people don't realize that for a given game, as much as 3/4 of the manpower goes into art and *not* code. Most developers leverage existing game engines. A friend of mine is on a project where they have ~ 40 artists and ~ 10 coders.
That's the main reason that this question is somewhat flawed. Newer software on your phone will frequently improve reception, sometimes just because the new software is aware of more towers. So purchasing the same phone from the same provider at a later date can get you better reception.
The PRL (preferred roaming list) can be reprogrammed on all cellphones. The list of towers is not hard-coded into the software.
There are a lot of factors that determine reception.
I'm more familiar with CDMA, but both the noise floor and the signal strength determine reception quality. The noise floor is more of a factor w/ CDMA than GSM.
And different phones use different algorithms for computing "the number of bars", so definitely don't use that to compare phones.
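To make the "bars" point concrete, here is a rough sketch of two made-up bar algorithms. The function names and every threshold below are hypothetical (I don't know what any real handset uses); one looks only at signal strength, the other at signal relative to the noise floor.

```python
def bars_from_signal(signal_dbm):
    """Hypothetical phone A: bars depend on raw signal strength only."""
    thresholds_dbm = [-105, -95, -85, -75]  # made-up cutoffs, one per bar
    return sum(signal_dbm >= t for t in thresholds_dbm)


def bars_from_snr(signal_dbm, noise_floor_dbm):
    """Hypothetical phone B: bars depend on signal relative to the noise floor."""
    snr_db = signal_dbm - noise_floor_dbm
    thresholds_db = [5, 10, 15, 20]  # made-up cutoffs, one per bar
    return sum(snr_db >= t for t in thresholds_db)


# Same radio conditions, two different readings.
print(bars_from_signal(-80), bars_from_snr(-80, -88))
```

Under these assumed cutoffs the two phones disagree about the same conditions, which is why the bar count is useless for comparing handsets.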
You act as if Microsoft is a "felon", and thus has its rights stripped for life. Give me a break. Companies change and are not subject to lifetime penalties.
Wikipedia lost my respect when I read the 9/11 article several months back. To give them credit, upon checking this article just now, there is now a red flag saying that the "factual accuracy of this article is disputed".
Several months ago this article did *not* present the cold hard facts. Links to conspiracy articles, including some that claim the U.S. government was directly responsible, were contained within the core of the article. My attempts to at least move these links to a bottom section were immediately rolled back.
No, you don't understand modern processor issues. The problem is increasing the size of the instruction window. It simply cannot be done within today's frequency and power constraints. An 8-wide machine with a large enough instruction window to get the IPC increases can't be built due to the size of the CAMs.
The idea is to get a larger instruction window via multithreading. Either via programmed threads (standard multithreading), or speculative techniques for a single-threaded program.
"Adding another FPU unit" is funny and makes no sense. The ILP cannot be extracted to use additional ALU units without making a larger instruction window.
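Here is a toy model of that argument, not a real pipeline simulator. The assumptions are mine: every instruction takes one cycle, the window is simply the next `window_size` instructions in program order starting at the oldest unfinished one, and retirement is free. The names `two_chains` and `simulate` are made up for this sketch.

```python
def two_chains(length):
    """Build a stream of two independent dependency chains, one after the other.

    deps[i] lists the earlier instructions that instruction i must wait for.
    """
    deps = []
    for chain in range(2):
        for k in range(length):
            index = chain * length + k
            deps.append([] if k == 0 else [index - 1])
    return deps


def simulate(deps, window_size, num_units):
    """Return the number of cycles needed to execute the whole stream."""
    n = len(deps)
    done_cycle = [None] * n  # cycle in which each instruction executed
    head = 0                 # oldest instruction that has not executed yet
    cycle = 0
    while head < n:
        cycle += 1
        issued = 0
        # Only instructions inside the window are candidates for issue.
        for i in range(head, min(head + window_size, n)):
            if issued == num_units:
                break
            ready = all(
                done_cycle[d] is not None and done_cycle[d] < cycle
                for d in deps[i]
            )
            if done_cycle[i] is None and ready:
                done_cycle[i] = cycle
                issued += 1
        # Slide the window past everything that has executed, in order.
        while head < n and done_cycle[head] is not None:
            head += 1
    return cycle


stream = two_chains(8)
for window, units in [(4, 2), (4, 8), (16, 2)]:
    print(window, units, simulate(stream, window, units))
```

In this model, going from 2 to 8 units with a 4-entry window changes nothing, because the second chain is not visible yet. Widening the window to 16 entries lets both chains overlap, even with only 2 units.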
This was proposed in academia over 10 years ago. It's called speculative multithreading, or "multiscalar" as coined by one of the primary inventors at the University of Wisconsin (Guri Sohi).
Basically the processor will try to split a program into multiple threads of execution, but make it appear as a single thread. For example, when calling a function, execute that function on a different thread and automatically shuttle dependent data back/forth between the callee and the caller.
No, but hardware register renaming compensates for false dependencies due to the lack of architectural registers. Your example is a true dependency and extra registers don't help here (unless I am confusing AT&T/Intel Syntax...plus I really don't know IA-32 assembly like I do SPARC and MIPS).
Besides, I call Amdahl's Law. It just doesn't matter for most applications.
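For anyone who wants the arithmetic, Amdahl's Law is overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of run time affected and s is the speedup of that part. A quick sketch follows; the 10% figure is a number I made up for illustration, not a measurement.

```python
def amdahl_speedup(affected_fraction, local_speedup):
    """Overall speedup when only part of the run time gets faster."""
    return 1.0 / ((1.0 - affected_fraction) + affected_fraction / local_speedup)


# Hypothetical: 10% of run time is spill/fill traffic and extra registers
# make that part twice as fast.
print(round(amdahl_speedup(0.10, 2.0), 3))
```

Even if the affected part became infinitely fast, the overall gain is capped at 1 / (1 - f).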
p[t[3*x + y*z] + 1] = s[blah blah...]
Whatever, point is the address for p[] and s[] can be computed "serially" in program order but in parallel on the processor if they're not reusing the same registers.
BTW-- they can be computed in parallel with register-renaming which is done with all OoO cores. If they use the same registers, they are _false_ dependencies.
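A minimal sketch of what renaming does to that kind of code. The instruction format and the names `rename` and `conflicts` are invented for this example; real hardware uses a map table and a free list, but the idea is the same. Each write gets a fresh physical register, so reuse of an architectural name stops looking like a dependency.

```python
def rename(instructions):
    """Map each architectural destination to a fresh physical register.

    instructions: list of (dest, [sources]) using architectural names.
    """
    mapping = {}    # architectural name -> current physical name
    renamed = []
    for number, (dest, sources) in enumerate(instructions):
        # Sources read whichever physical register holds the latest value;
        # names never written in this snippet are treated as live-in values.
        phys_sources = [mapping.get(s, s) for s in sources]
        phys_dest = f"p{number}"
        mapping[dest] = phys_dest
        renamed.append((phys_dest, phys_sources))
    return renamed


def conflicts(earlier, later):
    """True if `later` must stay ordered after `earlier` by register names."""
    e_dest, e_sources = earlier
    l_dest, l_sources = later
    raw = e_dest in l_sources   # true dependency
    war = l_dest in e_sources   # false dependency
    waw = l_dest == e_dest      # false dependency
    return raw or war or waw


program = [
    ("r1", ["x", "y"]),  # address calculation for p[]
    ("r2", ["r1"]),      # use of that address
    ("r1", ["z"]),       # address calculation for s[], reusing r1
    ("r3", ["r1"]),      # use of that address
]
renamed = rename(program)
print(conflicts(program[1], program[2]), conflicts(renamed[1], renamed[2]))
```

Before renaming, the second address calculation appears to conflict with the first use of r1. After renaming, only the true read-after-write dependencies remain.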
Yes, register-register data dependencies can be bypassed. Yes, this can make a difference for certain kernels.
But did you actually look at the benchmark results in the parent article? It simply does not make much of a difference for most applications. If it did, the RISC architectures would have buried IA-32 in the 90s. Guess what? They didn't. Much of this is due to Intel's superior manufacturing capabilities, but much of it is also because it just doesn't matter for most applications.
And your example completely ignores the effects of superscalar out-of-order execution. There is plenty of work to do while stack fills/spills are hitting in the L1D cache and retiring.
BTW-- I didn't find your CV, but being that I do research in architecture, I think I know something too.
Look into how much a stack spill costs compared to just using a register to hold a value. Then tell me x86_64 doesn't help.
Do you realize that the number of physical registers greatly exceeds the number of exposed registers? I'm not saying it does, but the micro-op decoder could even elide the stack fills/spills. OoO pipelines already do Tomasulo's algorithm, so I wouldn't be surprised if the micro-op decoder could do the same. Even if it doesn't, filling and spilling the stack will not cause many pipeline stalls because of the locality of the accesses.
Performance is becoming dominated by the memory system. Stack accesses nearly always hit in the L1 cache.
Yup, you make a good point. Plus engineering 64-bit datapaths is slower than 32-bit.
However, the lack of register space isn't a huge deal with out-of-order superscalars. Register renaming takes care of false dependencies, and spilling/filling registers to the stack gets great locality and instruction-level parallelism because an L1 hit is pretty much guaranteed. Yes, it can make a difference because L1 hits take longer than register accesses...but Intel has proven that the x86 can perform just as well as RISC ISAs.
The point isn't that the registers are bigger, it's that there's twice as many of them
Big deal. The number of physical registers is always larger than the number of architectural registers. Go read up on register renaming in out-of-order superscalars.
The reason for going to 64 bits is to increase the address space, not speed. The majority of applications, especially integer, do not benefit from bigger registers and wider ALUs.
It's pretty important to our national security that farms can stay afloat. What happens when WWIII breaks out and the rest of the world embargoes the U.S.? Yup, without farms, we starve to death.
I said the datapaths were designed from day 1 as 64-bit. I believe you that the ALU is 32 bits. It is much easier to swap in/out a different ALU than to change the entire layout of the datapaths.
Going to 64-bit was FAR less painful than completely redoing the floorplanning of the datapaths. Sure, they had to implement 64-bit ALUs, registers, etc. These are all localized changes.
Intel had 64-bit datapaths designed into the Willamette core (NetBurst, Pentium 4, whatever) since day one. This was in the mid-to-late 90s. They were never utilized until market dynamics forced them to fully implement EM64T in the ISA.
The price of computing has dropped, and consumers are accustomed to this.
There is no such thing as a "4-way instruction window". There is 4-issue (4-way superscalar). Instruction windows are sized to dozens of instructions.
Tom, quit being an armchair architect. Read this paper:
ftp://ftp.cs.wisc.edu/sohi/papers/1995/isca.multiscalar.pdf
BTW-- RC delay is causing on-chip wires to get pretty slow, but nowhere near "hundreds of cycles".
Super-linear speedup is possible and has been observed in SMPs because of greater cache locality.
http://en.wikipedia.org/wiki/Register_renaming
But the majority of the world's supply of uranium isn't located in the Middle East with its political problems.
-5.75 is nothing. My wife has -10.25
Sorry, you are wrong. Read the book Pentium Chronicles by the lead architect of the P6 and NetBurst processors, Robert Colwell.