The second SSE2 pipeline does not do FLOPs
on Intel Reacts to AMD · Score: 1
"SSE2 can crunch 4 32 bit numbers in each of 2 FPU pipes"
This is an old rumour, proved false quite a while ago. It has a separate 128-bit load/store unit, which can kinda count as a separate pipeline, but its max FLOPs per cycle is still 4, just like P3/K7.
See http://developer.intel.com/design/processor/future/manuals/245470.htm and http://developer.intel.com/software/idap/media/ppt/program/Optimizing_for_WMT_E31.ppt
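For what it's worth, the peak number is just clock times FLOPs per cycle. A rough sketch of that arithmetic; `peak_flops` is a made-up helper, and the 1.5 GHz clock in the note below is only an example figure of mine, not something taken from the Intel documents:

```c
#include <assert.h>

/* Theoretical peak FLOPS: clock rate times FLOPs issued per cycle.
   Ignores load/store limits, latencies and decode bandwidth, so real
   code will land well below this. */
double peak_flops(double clock_hz, int flops_per_cycle)
{
    assert(clock_hz > 0.0 && flops_per_cycle > 0);
    return clock_hz * (double)flops_per_cycle;
}
```

At 4 FLOPs per cycle, `peak_flops(1.5e9, 4)` gives 6 GFLOPS; the rumoured 8 per cycle would have doubled that.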
Who the hell keeps moderating you up so fast? :)
on Intel Reacts to AMD · Score: 1
Something fishy going on here :)
The x87 FPU in the P4 has a throughput of a single instruction per clock for MULs/ADDs (not counting loads/stores, which are handled separately, like on the K7), not unlike PPro/P2/P3 but WITHOUT free FXCH. That's going to hurt a lot of legacy code... It's certainly going to make it lag behind the K7 for anything which doesn't use SSE/SSE2 for its floating point calculations. It's probably even going to make it lag behind the P3 at the same clock for code which uses the x87 FPU a lot.
SSE/SSE2 doesn't exactly give it an advantage over the K7 either. For single precision it's still got the same throughput on operations as 3DNow!, only with better load/store bandwidth compared to the P3. And for double precision it potentially has the same number of operations per clock as the K7, but the advantage of its larger number of registers is IMO negated by the limitation of needing to do everything with SIMD, which for double precision code might not always be as easy as for 3D.
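To illustrate the SIMD limitation for doubles: with SSE2 each 128-bit operation works on a pair of doubles, so the code has to be arranged so that there are pairs to feed it. A minimal sketch using the SSE2 intrinsics from `<emmintrin.h>`; `daxpy` here is just an illustrative routine of mine, with a scalar fallback for builds without SSE2:

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* y[i] += a * x[i] for doubles. With SSE2, two elements are processed
   per packed instruction; any odd element left over is done scalar. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
    assert(x != NULL && y != NULL);
#if defined(__SSE2__)
    {
        __m128d va = _mm_set1_pd(a);            /* a in both lanes */
        for (; i + 2 <= n; i += 2) {
            __m128d vx = _mm_loadu_pd(x + i);   /* unaligned loads */
            __m128d vy = _mm_loadu_pd(y + i);
            _mm_storeu_pd(y + i, _mm_add_pd(vy, _mm_mul_pd(va, vx)));
        }
    }
#endif
    for (; i < n; i++)      /* tail, or the whole loop without SSE2 */
        y[i] += a * x[i];
}
```

This works nicely because every element is independent. Double precision code with one long dependency chain has no second value to put in the other lane, and that is where the extra registers stop helping.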
A) Hahahaha, yeah, that's why those 1+ GHz P3s are just flooding the market. They are moving to .13u for a reason: they are not yielding enough product in the high frequency bins.
B) No they are not. They cannot arbitrarily throw capacity at badly yielding parts if the income doesn't justify it... for one, their shareholders don't like it, and it's probably a form of dumping.
And how the hell is all that capacity supposed to magically make stable products appear? Processors and chipsets don't magically appear out of those fabs (just 'cause they had something stable enough to run a demo on don't mean diddly squat; on top of that they need different chipsets for different market segments), and just throwing more engineers at development to get them done sooner has diminishing returns too.
C) Intel has limited room to manoeuvre due to their commitments to Rambus...
AMD could release Sledgehammer now if they wanted
on Intel Reacts to AMD · Score: 1
And how exactly do you know they have the yields, stability, motherboard chipsets etc. needed to introduce the P4 at all levels immediately? Did I miss some press release announcing Intel's omnipotence? (Must have come recently, after all the RDRAM chipset fiascos.)
And the exact pairing restrictions and latencies aren't known yet, are they?
As for games and 3D, they rely a lot on floating point, and in Willamette's case they rely on SSE because its standard FPU is rather lame AFAICS. How well it will do on integer game code, dunno. The extremely low latency (0.5 clock) on some integer instructions could help on mainly serial (pointer chasing?) code; would that for instance help much on BSP code? But you are not going to get a throughput of 4 ALU instructions per clock anyway, the trace cache can only supply 3...
As for why Intel would give away its bus design: maybe because they are locked into a very restrictive deal with Rambus? :)
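By "mainly serial" I mean something like walking a BSP tree, where each step needs the pointer picked by the previous step before it can start. A hypothetical sketch (2D for brevity, with made-up struct and function names, not taken from any real engine):

```c
#include <assert.h>
#include <stddef.h>

/* One node of a 2D BSP tree. Interior nodes split space along the line
   nx*x + ny*y = d; leaves have both child pointers set to NULL. */
struct bsp_node {
    float nx, ny, d;
    struct bsp_node *front, *back;
    int leaf_id;
};

/* Walk from the root to the leaf containing (x, y). Every iteration
   depends on the node chosen in the one before, so the loop is bound
   by latency rather than by how many instructions can issue per clock. */
int bsp_locate(const struct bsp_node *node, float x, float y)
{
    assert(node != NULL);
    while (node->front != NULL && node->back != NULL) {
        float side = node->nx * x + node->ny * y - node->d;
        node = (side >= 0.0f) ? node->front : node->back;
    }
    return node->leaf_id;
}
```

Note that the side test here is floating point, so how much the fast integer ALUs would actually buy on code like this is still an open question to me.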
What would a couple of good compiler devs cost?
on Intel Reacts to AMD · Score: 1
Development of an optimized GCC backend should not be a huge task... nothing compared to the scale of the undertaking that developing a processor is.
They cannot just shoehorn too many new architectural features into a core they are taking out of commission soon. You will not see a second floating point unit, for instance, IMO.
The big point about why the P3 is playing catch-up (in vain, I might add) is the price difference you mention... and that's not going to change. The Athlon is better optimized for speed, and nothing short of a complete redesign (which is what the Willamette is) can change that.
Per clock performance doesn't mean dick... for one, consumers have been conned into merely looking at the MHz, and secondly, even if you profess to be above that and look at the performance purely objectively... even then Intel is still hugely lagging in performance/$, and that's ultimately the most important ratio IMO. If you just care about top performance, go get an Alpha.
They could even enforce it by putting it in the license agreement for the DDK, making life for hackers a whole lot more difficult. With signed drivers that would only leave wrappers, which can be detected much more easily.
The only way to stop this on Linux would be if a trustworthy body was created which would distribute signed binary drivers guaranteed to contain no cheats. This is halfway feasible.
Of course it would only make it more difficult, not impossible.
I think you two are talking about different things. You seem to be talking about the ISA and he's talking about the actual logic/mask level design.
Well, at least I hope that's how you meant it, because otherwise it's easy to see how someone could easily compete with you... they don't have to recoup development costs.
1. Why didn't he say it was patent pending from Dec 1996 onward?
2. What are his intentions towards L4-Linux?
I can only think of a single answer to 1, and his IMO arrogant letter leaves little doubt as far as 2 is concerned.
If both answers are as I fear, I don't see how this guy is any better than any of the other "I patented breathing, pay me and be grateful" types. Also, I really don't see why Linus/Linux International is helping him in that case... that for me was the biggest question mark: why are they helping him in exercising conditions to get questionable licenses on questionable patents? The free licenses for RT-Linux certainly aren't worth squandering your principles for.
They don't trust it as much as true blue 20/20, and they have plenty of choice. They can put any itty-bitty thing in their selection procedure, they still get enough people. I agree I wouldn't let someone dig around in my eye just yet, but this argument doesn't cut it :)
They are up for hire for exactly this kind of job, are they not? Where there is a will there is a way... but apparently there's no will.
So yeah, they are playing catch up.
My experiences
1. I take off my glasses and adapt the focus if it bothers me too much.
2. I'm lazy and not very demanding; I usually clean my "glasses" (plastic really) when people start telling me they can't see my eyes :) Never have had much trouble with scratches, but I'm careful with the glasses... never had coating come off either.
3. Safety glasses... a pain, but if it's really important to you, you can get something made. H3D/HMD I don't know about, and I can get sunglasses made cheaply.
4. That's probably because you have really heavy glasses; my glasses only fall off if something catches them and makes them (like my clumsy hands).
5. Nah, just the really strong ones.
Marco
AFAICS only because they are maintained by people who keep them interchangeable (extra work not all maintainers feel like doing, I imagine). A very short look at the USB source, for instance, showed up a good amount of ifdef xBSD's. To me that makes it more like 3 non-interchangeable versions rolled into one :)
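The kind of thing I mean looks roughly like this. A made-up example, not copied from the actual USB source; only the compiler-predefined macros `__FreeBSD__`, `__NetBSD__` and `__OpenBSD__` are real, the function is hypothetical:

```c
#include <string.h>

/* Returns the name of the BSD this file was compiled for. Each branch
   stands in for a chunk of code that only one of the three systems
   compiles, which is why one shared file is really three versions. */
const char *bsd_flavour(void)
{
#if defined(__FreeBSD__)
    return "FreeBSD";
#elif defined(__NetBSD__)
    return "NetBSD";
#elif defined(__OpenBSD__)
    return "OpenBSD";
#else
    return "other";
#endif
}
```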
Please correct me if I'm wrong.
Marco