Intel Scraps Plan For 4 GHz P4 Chip
bizpile writes "It was reported earlier that Intel would be delaying the release of their 4GHz Pentium 4 chips, but it now appears that they will be cancelling them altogether. The announcement came Thursday, and Intel says it is going to rely on approaches besides faster clock speed to improve the performance of chips. Engineers are working to add additional cores to a single chip and to improve the efficiency of how the chips interact with the rest of the system. Intel spokesman Chuck Mulloy said, "Those are the sort of things where you get more capability out of a processor by designing specific silicon solutions as opposed to just keep turning the clock faster." In the meantime, Intel is planning on releasing a 3.8 GHz chip with 2MB of cache."
Actually, Moore says that chip complexity will double along with relative performance, not clock speed. If Intel goes ahead with dual cores, and maybe quad cores later, then Moore's law is safe...for now
Whoops, got my units wrong. That should be 2.4cm/ns, and 0.6cm per hertz at 4Ghz.
occultae nullus est respectus musicae - originally a Greek proverb
But what about Moore's law? Is nothing sacred?
Your ignorance certainly shouldn't be. It's the number of transistors not their switching speed.
"Moore's law will still be true. It's the doubling of *silicon* not the doubling of speed "
More precise: doubling of silicon's capability, or doing the same at half the size (die space), IIRC.
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Actually your theory has already been destroyed by the folks who got 6GHz out of their P4. It was even stable enough to boot XP, so something is flawed with your ideas. So maybe there is a limit, but not anywhere near 4GHz.
WASTE - The Secure P2P
But at the same time, you're not necessarily transferring individual electrons in a circuit. The actual net electron drift velocity is much smaller than the speed of an electron. When you call the UK, the electrons (assume a copper wire) are not travelling at or near the speed of light. They are travelling at around 72e-6 m/s, or 72um/s. Yet, the call goes through almost instantly.....
If it's really 0.8c, I think it should read 24cm/ns or 6cm/cycle@4GHz.
My calculations:
240 000 km/s
240 000 000 m/s
24 000 000 000 cm/s
24 cm/ns (divided by 10^9)
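The same conversion as a quick script, for anyone who wants to check it. This is only a sketch: the 0.8c figure is the assumption carried over from the post above, the speed of light is rounded to 300 000 km/s, and the function names are made up for illustration.

```python
# Distance a signal covers per nanosecond and per clock cycle, assuming it
# propagates at a fixed fraction of the speed of light (0.8c is an assumption).
C_KM_PER_S = 300_000  # speed of light, rounded

def speed_cm_per_ns(fraction_of_c=0.8):
    km_per_s = C_KM_PER_S * fraction_of_c
    cm_per_s = km_per_s * 1000 * 100   # km -> m -> cm
    return cm_per_s / 1e9              # per second -> per nanosecond

def cm_per_cycle(clock_hz, fraction_of_c=0.8):
    cycle_ns = 1e9 / clock_hz          # length of one clock period in ns
    return speed_cm_per_ns(fraction_of_c) * cycle_ns

print(speed_cm_per_ns())    # roughly 24 cm/ns
print(cm_per_cycle(4e9))    # roughly 6 cm per cycle at 4 GHz
```

Note this is only an upper bound on how far a signal could get in one cycle; it says nothing about gate or wire delays.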
Those benchmarks don't mention the complexity, nor do they specify the number of transistors on the CPUs, so I don't see how you can draw your conclusion.
Belief is the currency of delusion.
Common misconception. Electrons don't move at the speed of light. In fact, electrons aren't the primary charge carrier in half the transistors in the chip. Holes are (P vs. N).
Charge carriers propagate at about the speed of molasses. Go read this website, it is great:
http://amasci.com/miscon/eleca.html#light
Here's an excerpt --
THE "ELECTRICITY" INSIDE OF WIRES MOVES AT THE SPEED OF LIGHT? Wrong.
In metals, electric current is a flow of electrons. Many books claim that these electrons flow at the speed of light. This is incorrect. Electrons actually flow quite slowly, at speeds on the order of centimeters per minute. And in AC circuits the electrons don't really flow at all, instead they sit in place and vibrate. It's the energy in the circuit which flows fast, not the electrons. Metals are always full of movable electrons, and when the electrons at one point in the circuit are pumped, electrons in the entire loop of the circuit are forced to flow, and energy spreads almost instantly throughout the entire circuit. This happens even though the electrons move very slowly.
https://www.accountkiller.com/removal-requested
You're also off the mark. It is almost certain that there is no electrical pathway that spans the chip without hitting some logic. The distance figure in 90nm (for best performance) is about 12000\lambda (\lambda = 90nm). Often signals propagate much smaller distances in a cycle. I assure you that in one cycle no one is making a signal traverse the entire core. Modern CPUs are highly pipelined, which is essentially to say that in one clock cycle data is transferred and processed within a very small section of the chip before being passed on to the next stage. This then frees the stage for the next bit of data. See http://en.wikipedia.org/wiki/Pipelining
As a consequence, what you mention is not the limiting factor. Signals simply do not need to propagate across the chip in one cycle. What has really happened is that the drive current available from each transistor has gotten smaller as the transistor itself has shrunk. The wiring capacitance has remained the same and begun to predominate over the gate capacitance. Thus, making the transistors smaller does not make the circuit faster as it once did.
Also, as someone else pointed out, the mobility of electrons in semiconductors is nowhere near the numbers you quote. Electronics simply don't work the way you claim.
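To put the 12000\lambda figure on the same scale as the speed-of-light numbers in this thread, here is a rough check. It is a sketch under assumptions: it reads 12000\lambda as a per-cycle signal distance and takes \lambda as the 90nm given above; the helper name is invented for the example.

```python
# Rough scale check. Assumes 12000 lambda is a per-cycle signal distance
# and that lambda is the 90 nm value given in the post.
LAMBDA_NM = 90

def lambda_to_mm(n_lambda, lambda_nm=LAMBDA_NM):
    return n_lambda * lambda_nm / 1e6  # nm -> mm

reach_mm = lambda_to_mm(12_000)
light_limit_mm = 60.0  # 6 cm/cycle at 4 GHz and 0.8c, from elsewhere in the thread

print(reach_mm)                    # on-chip reach per cycle, in mm (about 1.08)
print(light_limit_mm / reach_mm)   # how many times larger the 0.8c bound is
```

So on these numbers the practical reach is about a millimetre, a factor of roughly 55 short of the propagation bound, which is consistent with wire and gate delay rather than signal speed being the constraint.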
My understanding is that Slot 1 and Slot 2 were designed to keep the cache on the same package, since they couldn't fit 256k or 512k on a Socket 7/8 sized package along with the SSE/MMX instructions. That way you didn't get asshole retailers that shipped expensive processors with no L2/L3 cache, leaving the customers in the lurch.
09f911029d74e35bd84156c5635688c0
Remember, Intel has done other things to increase speed besides raising MHz, such as:
1) Increased front side bus (in the P4's case, 400 -> 533 and now 800MHz)
2) Increased cache (256 -> 512 -> 1024 -> 2048KB)
3) SSE 1, 2 and 3
4) HyperThreading
More precise still: The number of transistors giving the lowest cost per transistor doubles every (N) months.
It doesn't quite work like that. A current along a conductor isn't shooting electrons like a radio signal, where it comes out at Point A and you have to wait till it arrives at Point B. It's more like pushing a long rigid stick. Pretty much as soon as you push your end, the guy at the other end is going to feel it move. If you had a stick that was (for the sake of argument) one light-minute long, you could push and pull your end and communicate information to the guy at the other end faster than a radio signal moving at the speed of light.
If a job's not worth doing, it's not worth doing right.
ask, and someone will deliver:
Pentium-M mini ATX motherboard from AOpen:
http://www.watch.impress.co.jp/akiba/hotline/20041009/etc_i855gme.html
Inter-instruction data dependencies limit the number of cores that can be used simultaneously. 4 or 8 was the magical number IIRC.
Sadly, this beloved phrase is rapidly losing value with inflation. Back in its day, "dollars to donuts" meant REALLY steep odds. But today you're only betting about 2:1, which isn't much to get excited about.
Any competent person at Slashdot (the "we" you are referring to) should have realized that a decade ago. Compare, clock for clock, a PPC (not just a Mac; an IBM or Moto box would work, but Macs are the most obvious) to an Intel. PPCs do things (some, not necessarily all) in a much more efficient way, so an Intel 1.2 GHz P4 isn't necessarily faster than a Motorola 1.0 GHz G4. Quite the opposite most of the time.
Sparc and Alpha processors were the same way, to some extent. Basically, Intel racked up one category that determines performance so high that it compensated for x86's less-efficient design. That isn't bad, but once that one category can't be pushed higher as easily, Intel needs to start looking at other factors in hardware design (distances and improved layout, front-side bus speeds, etc.) to make that 3 GHz box actually perform to its potential.
"...emulation, which is all about the MHz and basically non-parallelizable."
First, emulation IS "parallelizable". There is usually a decision: emulate, or translate, and if translating, how much optimization to apply. On a single processor machine, this is critical. It may take a great deal of time to translate; less time to emulate. If something is run once (or rarely), it doesn't make sense to translate. We can't afford the overhead.
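The emulate-or-translate decision comes down to a break-even check, which can be sketched like this. Everything here is hypothetical: the function name and the cost numbers are invented for illustration and are not taken from any real emulator.

```python
# Break-even check for "emulate or translate". All costs are hypothetical
# time units (say, microseconds); none are measured from a real emulator.
def should_translate(expected_runs, emulate_cost, translated_cost, translate_overhead):
    """Return True when translating first is cheaper overall than emulating every run."""
    total_emulated = expected_runs * emulate_cost
    total_translated = translate_overhead + expected_runs * translated_cost
    return total_translated < total_emulated

print(should_translate(1, 50, 5, 2000))      # code that runs once -> False
print(should_translate(1000, 50, 5, 2000))   # hot loop -> True
```

With these made-up numbers the one-shot block is cheaper to emulate (50 vs 2005 units), while the hot loop pays back the translation cost many times over (7000 vs 50000 units).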
On an MP (multi processor, or multi-core), we can emulate, and schedule translations. The translations don't have an immediate impact on run-time, but allow a future speed-up (assuming enough memory).
Secondly, it is very difficult (typically), to model things like exceptions. The choice is to (1) be accurate, but slow, or (2) to be sloppy, potentially breaking some code. On an MP platform, multiple methods can be executed. If an exception doesn't happen, the results from the slower methods can be simply discarded.
MP can also be exploited to allow ILP increases by speculative execution. Assuming fast inter-processor communication.
I find that a dual-CPU machine is a "sweet spot" for most of my needs. The GUI, etc. typically executes on one CPU, and my actual application on the other. The system is then MUCH more responsive under "load". I would imagine that "MAME" would allow X to draw on one processor, while it utilized other processors for the emulation. [or maybe not, I don't MAME as I have no interest in arcade games].
Ratboy.
Just another "Cubible(sic) Joe" 2 17 3061
I'll bet dollars to donuts that the ad guy who came up with the new naming system owns a BMW.
Either that, or he owns an Opteron server and AMD already took all the even numbers...
Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
There's nothing exaggerated there. The P4 was a step backwards from the P3 in every respect other than the ability to push it to high clock speeds. The speed comparisons he makes seem roughly correct; for some applications it would have been worse.
I don't agree. There were a lot of exaggerations there. The P4 had some very interesting branch prediction when it was released. The P3 was a bit faster when the P4 was released, but saying that P4 performance was horrible is not correct.
The problem the P4 has today is that its L1 caches are too small; some workloads are hurt by this.
For example: (from http://www.emulators.com/docs/pentium_1.htm)
MISTAKE #6 - Shifts and rotates are slow
It seems Intel has taken yet another step back to the days of the 486, even the days of the 286, by eliminating the high-speed barrel shifter found in all previous 386, 486, Pentium, 68020, 68030, 68040, and PowerPC chips. Instead, they created the shift/rotate execution unit, which by design operates at normal clock speed (not double clock speed), but in my testing actually operates even slower. A typical shift operation on the Pentium 4 requires 4 to 6 clock cycles to complete. Compare this with a single clock cycle on any 486, Pentium, or Athlon processor.
How bad is this mistake? For emulation code, it's absolutely devastating. Shift operations are used for table lookups, for bit extractions, for byte swapping, and for any number of other operations. For some reason, Intel's engineers just could not spare a few extra transistors to keep shifts fast, yet they waste transistors on idle double speed ALUs.
Intel's own documentation is now contradictory. On the one hand, Intel has for years advocated the use of shift and add operations to avoid costly multiply operations. For example, to multiply by 10, it is quicker on the 486 and Pentium to use shifts to quickly multiply by 2 and 8 and then add the results. However, on the Pentium 4 this trick of shift and add can take as long as 6 or 7 clock cycles, which negates much of the benefit over using a multiply.
This appears to have something to do with the fact that the original Pentium 4 design called for there to be two address generation units, which are circuits to quickly calculate addresses for memory operations. In previous chips, the AGU contained a barrel shifter to quickly handle indexed table lookups, which the Pentium 4 now handles using the much slower ALU. The "add and shift" trick was usually accomplished by the AGU by a programming trick using the LEA (load effective address) instruction. This trick is now rendered useless thanks to Intel cutting out the part.
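To make the multiply-by-10 trick concrete, here is a minimal sketch in Python rather than x86 assembly. The function names are mine, and the second variant only mimics the arithmetic an LEA-style base + index*4 address calculation performs; it says nothing about cycle counts on any particular chip.

```python
def times_ten_shift_add(x):
    # x*10 = x*8 + x*2: two shifts and one add, as described in the excerpt
    return (x << 3) + (x << 1)

def times_ten_lea_style(x):
    # Same result in the shape of the LEA trick: base + index*4 gives x*5,
    # and one more doubling gives x*10.
    five_x = x + (x << 2)
    return five_x << 1

print(times_ten_shift_add(7))   # 70
print(times_ten_lea_style(7))   # 70
```

Whether this beats a plain multiply depends entirely on how many cycles a shift costs, which is the excerpt's complaint about the Pentium 4.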