New Pentium 5 Details - 5-7GHz?
zymano writes "This article gives some details on the Pentium 5. It will have 64-bit extensions and maybe a 4000MHz front-side bus. Quote from the article: 'The Pentium V is likely to fly along at between 5GHz and 7GHz, have 2MB plus of level two cache, be built on a 90 nanometer process, and have a stackable design.'"
The chip will sample internally at Intel in January 2004 and will take four to six months to get to market. The Pentium 6 will follow a very similar schedule.
The Pentium V is likely to fly along at between 5GHz and 7GHz, have 2MB plus of level two cache, be built on a 90 nanometer process, and have a stackable design.
The processor, we believe, sits in the LGA 775-pin socket, and above it is a very thin heatsink. But according to sources close to the firm's plans, another permeable heatsink can sit between this and another microprocessor module, giving a stackable design.
The final design of this arrangement is not set in stone.
According to this source (and the details have not been confirmed), a module sitting on top could provide 64-bit extensions.
The source also claimed that Microsoft is ready to launch a version of Windows called Elements with 64-bit extensions.
The idea seems to be that people can buy a 32-bit module, and then add in the 64-bit processor.
There are three samples of an arrangement of the Pentium V here in Taiwan this week, with a very thin processor and lots of wires and patches stuck on it, just to show proof of concept.
The Pentium V could have a front-side bus speed of as much as 4000MHz, the source claimed, although this may be reserved for the next chip along, the Nehalem.
Looks good for your age...
The clock speed hike reminds me that the P4 is slower clock-for-clock than the P3, and makes me wonder if Intel are doing this entirely for marketing reasons. I can't help feeling that they should start looking more closely at the other end of the market. Saying that 100W is acceptable in a desktop CPU does not make it so. For a large number of people 1GHz is fast enough, and a silent 1GHz chip would be more welcome than a 5GHz chip with a built-in tornado.
I am TheRaven on Soylent News
P-V should have 64-bit extensions for both pointers and basic math.
64-bit pointers, and basic math on those pointers, are really what people want, so that more than 4GB can be trivially addressed in a single process's virtual memory space. Think about people who want to manipulate a video file that is larger than 4GB.
AltiVec's 128 bits are just wide data manipulation, of no use to those who need large memory footprints. It has the same 32-bit address lines and pointers as a 60MHz Pentium I.
That being said, P-V should also have more than the current 36 bits of physical address lines. I'm guessing they will have 40 or so usable bits of address bus to physically address memory.
So if you want to put in more than 4GB of RAM, you can. But even if you don't, 64 bits will be useful for addressing more than 4GB of a video file sitting in virtual memory.
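To make the 4GB ceiling concrete, here is a minimal Python sketch (not from the original post) that models a 32-bit pointer or file offset as an integer masked to 32 bits. The helper names and the 5GB offset are hypothetical, chosen only for illustration.

```python
# A 32-bit unsigned pointer/offset can span 2**32 bytes = 4 GiB.
ADDR_SPACE_32 = 2 ** 32

def as_u32(value: int) -> int:
    """Truncate an integer to 32 bits, as a 32-bit register or pointer would."""
    return value & 0xFFFFFFFF

def as_u64(value: int) -> int:
    """Truncate an integer to 64 bits."""
    return value & 0xFFFFFFFFFFFFFFFF

# Hypothetical: a byte offset 5 GiB into a large video file.
offset = 5 * 2 ** 30

print(f"32-bit address space: {ADDR_SPACE_32 // 2 ** 30} GiB")
print(f"offset held in 64 bits: {as_u64(offset):,}")
print(f"offset held in 32 bits: {as_u32(offset):,} (wrapped around)")
```

The 32-bit version silently wraps to an offset 1 GiB into the file, which is why a flat 64-bit pointer is the simple way to address the whole thing.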
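The physical-address arithmetic behind the 36-bit and 40-bit figures is just powers of two. A short sketch; the 40-bit width is the poster's guess, not a confirmed Pentium V specification, and the function name is illustrative.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits

# 32 bits: plain 32-bit addressing; 36 bits: the current physical limit
# mentioned above; 40 bits: the poster's guess (an assumption).
for bits in (32, 36, 40):
    gib = addressable_bytes(bits) // 2 ** 30
    print(f"{bits} address bits -> {gib:,} GiB")
```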
Yes, that's exactly what they are doing. The P4 pipeline is 20 stages, and the P3's is something like 10. The longer pipeline helps them ramp up clock speed, but at the cost of efficiency. Wheeeee.
There will be a heatsink in between the stacked processors, although it would more properly be called a heat spreader. They just call it permeable because it will have holes drilled into it so pins can attach to the lower processor.
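The clock-versus-efficiency trade-off can be shown with a toy model. This is a sketch under loudly stated assumptions: all constants are hypothetical, not measured P3/P4 figures, and the model assumes a branch misprediction flushes roughly one cycle per pipeline stage.

```python
# Toy model with hypothetical numbers, not real P3/P4 data.
LOGIC_DELAY_NS = 9.0          # total logic delay, split across the stages
LATCH_OVERHEAD_NS = 0.1       # fixed per-stage pipeline-register overhead
MISPREDICTS_PER_INSTR = 0.02  # e.g. 20% branches x 10% mispredicted

def cycle_time_ns(stages: int) -> float:
    # More stages means less logic per stage, but latch overhead stays fixed.
    return LOGIC_DELAY_NS / stages + LATCH_OVERHEAD_NS

def clock_ghz(stages: int) -> float:
    return 1.0 / cycle_time_ns(stages)

def ipc(stages: int) -> float:
    # Ideal scalar pipeline (CPI = 1) plus a flush penalty that grows
    # with pipeline depth on every mispredicted branch.
    cpi = 1.0 + MISPREDICTS_PER_INSTR * stages
    return 1.0 / cpi

for n in (10, 20):
    perf = ipc(n) * clock_ghz(n)  # billions of instructions per second
    print(f"{n:2d} stages: {clock_ghz(n):.2f} GHz, IPC {ipc(n):.2f}, {perf:.2f} GIPS")
```

With these made-up numbers the 20-stage design clocks about 1.8x higher but does less work per clock, which is the "slower clock-for-clock" effect discussed above.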
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
The processor, we believe, sits in the LGA 775-pin socket, and above it is a very thin heatsink. But according to sources close to the firm's plans, another permeable heatsink can sit between this and another microprocessor module, giving a stackable design.
Yes, I saw that in the article, and it's pretty much the only way you *can* do it: have something separating the chips. The question is, how can they get this to work? There are limits to how fast heat can be spread away by something like this (set by the thermal conductivity of the material you are using), and the latency between chips increases linearly with the thickness of the separator... We can barely keep today's faster chips cool with enormous heatsinks; this seems far more ambitious.
Also remember that ohmic heating is proportional to the square of the clock speed (yes, it goes down by some factor as the components get smaller, but you see where this is heading). It will be a long while until Intel chips stop putting out a ton of heat (i.e. until they start using something like spintronics or photonics). There's simply too much current to dissipate.
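The dependence on conductivity and thickness follows Fourier's law for steady one-dimensional conduction, q = k·A·ΔT / L. A minimal sketch; the copper conductivity (about 400 W/(m·K)), 1 cm² area and 10 K temperature drop are assumed, illustrative values.

```python
def conducted_watts(k_w_per_m_k: float, area_m2: float,
                    delta_t_k: float, thickness_m: float) -> float:
    """Steady-state 1-D conduction (Fourier's law): q = k * A * dT / L."""
    return k_w_per_m_k * area_m2 * delta_t_k / thickness_m

# Assumed values: copper spreader, 1 cm^2 footprint, 10 K drop across it.
K_COPPER = 400.0
AREA = 1e-4   # 1 cm^2 in m^2
DELTA_T = 10.0

for thickness_mm in (0.5, 1.0, 2.0):
    q = conducted_watts(K_COPPER, AREA, DELTA_T, thickness_mm / 1000)
    print(f"{thickness_mm} mm spreader: {q:.0f} W")
```

For a fixed temperature drop, doubling the thickness halves the heat conducted. This models only the spreader itself, not getting the heat from it into the air.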
Cheers,
Justin
They did. It's called the Itanium. Look how well that's worked out.
Even outside of a server, you have to have a special version of Windows, which doesn't have all of the features that 32-bit Windows does (Windows for the Opteron line is supposed to fix this). It's hideously expensive, meaning fewer people adopted it, which meant that costs stayed high, so there was less encouragement for people to adopt it, even within the server/workstation market it was sold into.
AMD is going about it the right way: allowing a smooth and orderly transition. That they're doing it with a 64-bit extension to x86 makes it more difficult to move on to a new architecture, but perhaps in a few years this will be looked back on as a successful model.
You can never go home again... but I guess you can shop there.
It complicates cache design, yes, but it's a solved problem.
In x86, you can store into instructions. Even right before they get executed. Even right before they get executed by another CPU. And it all works right. Now that causes architectural complications.
Think about what that means. The superscalar processor is happily going along, executing several instructions ahead simultaneously. Then word comes in that some instruction that has already executed, but whose results have not yet been committed to memory, has been overwritten. The processor has to discard everything dependent on that instruction, back up, and do it over.
It sounds horrible. But if you view it as another case of speculative execution (where, at a branch, the CPU predicts which way the branch will go and starts executing down that path before the branch is decided), it starts to become clear how to implement this in silicon.
The key to all this is the "retirement unit", which first appeared in the Pentium Pro. The Pentium Pro was the first "modern" x86 machine. Up until the Pentium Pro, what went on inside the CPU was reasonably closely related to the user-level instruction set. In the Pentium Pro, the user-level and internal architectures parted company. Inside a Pentium Pro/II/III/IV is a dataflow machine, pipelining little self-contained operations expressed in an internal instruction set that's quite different from the one the programmer sees. The dataflow machine is front-ended by an x86 instruction translator and back-ended by the "retirement unit". The "retirement unit" takes the outputs of the dataflow machine, figures out which ones to keep and which ones to dump, and determines what gets stored in the programmer-visible registers and memory.
In addition, the Pentium Pro and later machines have far more registers in the CPU than the programmer sees: 40 or more registers storing temporary results. Storing data in a temporary variable on the stack just puts it in a register representing that stack slot. There's little or no penalty for this compared to having the value in an x86 register. Eventually the retirement unit pushes the value out to memory (i.e. cache), but the processor doesn't wait for that event.
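The in-order commit that the retirement unit performs can be illustrated with a toy Python model. This is a simplified sketch, not a description of Intel's hardware: the `MicroOp` fields, cycle numbers and the single mispredicted branch are all hypothetical. Ops finish out of order, results become visible only in program order, and a mispredicted branch squashes everything younger than it.

```python
from dataclasses import dataclass

@dataclass
class MicroOp:
    seq: int                    # position in program order
    text: str
    done_cycle: int             # cycle the execution units finish it (out of order)
    mispredicted: bool = False  # branch whose predicted path turned out wrong

def retire(rob: list[MicroOp], max_cycles: int = 20) -> list[str]:
    """Toy retirement unit: commit strictly in program order.

    An op becomes architecturally visible only after every older op has
    retired. A mispredicted branch retires, then everything younger
    (speculative work) is squashed.
    """
    log = []
    head = 0
    for cycle in range(max_cycles):
        # Retire as many finished ops from the head of the buffer as possible.
        while head < len(rob) and rob[head].done_cycle <= cycle:
            op = rob[head]
            log.append(f"cycle {cycle}: retire {op.text}")
            head += 1
            if op.mispredicted:
                for squashed in rob[head:]:
                    log.append(f"cycle {cycle}: squash {squashed.text}")
                return log
        if head == len(rob):
            break
    return log

rob = [
    MicroOp(0, "load r1, [x]", done_cycle=4),
    MicroOp(1, "add r2, r2, 1", done_cycle=1),   # finishes early, must wait
    MicroOp(2, "jnz loop", done_cycle=5, mispredicted=True),
    MicroOp(3, "mov r3, r2", done_cycle=2),      # speculative, wrong path
]
for line in retire(rob):
    print(line)
```

The `add` finishes at cycle 1 but cannot retire until the older `load` does at cycle 4, and the wrong-path `mov` never becomes visible. In this model, an overwritten in-flight instruction would be handled the same way: squash and re-execute.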
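Having more physical registers than architectural ones works through register renaming. A toy sketch, assuming 40 physical registers to echo the figure above; the `Renamer` class is hypothetical and omits freeing registers at retirement.

```python
class Renamer:
    """Toy register renamer: maps each architectural register to the
    physical register holding its newest value."""

    def __init__(self, arch_regs: list[str], num_physical: int = 40):
        # Assumption: 40 physical registers; the first few start out
        # holding the initial architectural values.
        self.free = list(range(len(arch_regs), num_physical))
        self.map = {r: i for i, r in enumerate(arch_regs)}

    def rename(self, dest: str, srcs: list[str]) -> tuple[int, list[int]]:
        # Sources read whichever physical register holds the newest value.
        phys_srcs = [self.map[s] for s in srcs]
        # Every write gets a fresh physical register, so an older in-flight
        # instruction still reading the previous value is unaffected.
        new_dest = self.free.pop(0)
        self.map[dest] = new_dest
        return new_dest, phys_srcs

r = Renamer(["eax", "ebx", "ecx", "edx"])
print(r.rename("eax", ["eax", "ebx"]))  # eax = eax + ebx
print(r.rename("eax", ["ecx"]))         # eax = ecx: same name, new register
print(r.rename("edx", ["eax"]))         # reads the newest eax
```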
Once architectures broke the problem apart like that, the programmer-visible instruction set didn't matter that much. This is why RISC isn't very important any more. The original RISC idea, as expressed in early MIPS machines and the DEC Alpha, was to have simple, fixed-sized instructions, a simple CPU, and execute one instruction per clock. This made sense when non-RISC machines were executing less than one instruction per clock.
But the Pentium Pro architecture changed all that. Now, more than one instruction was being executed per clock in a microprocessor. To keep up, RISC machines had to go to similarly complex architectures, losing the simplicity advantages of RISC, while keeping the code bloat of fixed-size instructions.
There are other ways to accomplish the same result. AMD does instruction translation when instructions move into the cache. Transmeta does it in software as the program runs. But none of today's fast machines directly execute what the programmer wrote.
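The front-end translation step can be sketched as splitting one x86-style instruction into RISC-like micro-ops. This decoder is hypothetical: it handles only a register/memory `add`, and real x86 micro-op formats are far more complex and largely undocumented.

```python
def translate(insn: str) -> list[str]:
    """Split one x86-style ADD into RISC-like micro-ops (toy decoder)."""
    op, operands = insn.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if op != "add":
        raise ValueError(f"unsupported opcode: {op}")
    uops = []
    if src.startswith("["):       # memory source: load into a temporary first
        uops.append(f"load tmp0, {src}")
        src = "tmp0"
    if dst.startswith("["):       # memory destination: read-modify-write
        uops.append(f"load tmp1, {dst}")
        uops.append(f"add tmp1, tmp1, {src}")
        uops.append(f"store {dst}, tmp1")
    else:
        uops.append(f"add {dst}, {dst}, {src}")
    return uops

for insn in ("add eax, ebx", "add eax, [esi+4]", "add [esi], eax"):
    print(insn, "->", translate(insn))
```

A register-to-register add stays one micro-op, while a memory-destination add becomes a load, an add and a store that the dataflow engine can schedule independently.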
That's why instruction set architecture doesn't matter much any more.
All this takes huge transistor counts, and acres of chip designers. (Intel's acres of chip designers, each in their own tiny cubicle, with one acre of cubicles per room, are at Intel's Santa Clara facility. I've been there, but fortunately don't work there.) But it all works.
I can't speak for SCSI, FireWire, SIDE, or any other drive techs 'cause I'm a cheap S.O.B. and won't pay the big bucks for them.
We moved an application from 2 UltraSPARC III 750 MHz CPUs to 6 UltraSPARC III Cu 900 MHz CPUs and saw very little improvement in performance. Then we moved the application's data from 9 internal drives to 20 external SCSI-over-FC drives, and voilà, our I/O wait dropped from 60% or so to roughly 10%. Our query response times dropped by a factor of three or more. Faster CPUs, and more of them, are not the answer to data-intensive problems; I/O is. Slower (clock-speed-wise) 64-bit CPUs, with better efficiency, more memory addressing, etc. are the norm in the data center for just this reason. If you can take advantage of your L1/L2 cache, then a faster CPU clock will improve performance. The reason most Intel PCs benchmark better than an older box is that the disk, memory and video subsystems have improved, not that the CPU is making a huge difference.
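The small gain from the CPU upgrade is what Amdahl's law (a standard model, not one the poster invokes) predicts. A minimal sketch, assuming the 60% I/O wait applies to total run time and that the upgrade speeds up the remaining 40% by the raw clock-times-count ratio, which is optimistic.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only a fraction of run time gets `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# Assumptions: 40% of wall time is CPU work, 60% is I/O wait, and the
# CPU part scales perfectly with total MHz (an upper bound).
cpu_factor = (6 * 900) / (2 * 750)   # 3.6x the raw CPU capacity
print(f"CPU upgrade alone: {amdahl_speedup(0.4, cpu_factor):.2f}x")
print(f"Even infinitely fast CPUs: {amdahl_speedup(0.4, float('inf')):.2f}x")
```

Even unlimited CPU power cannot beat roughly 1.67x while 60% of the time is spent waiting on disk; shrinking the I/O wait attacks the larger term.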
As proof, search SPEC's benchmark results using Dell and then Sun as your search criteria. Notice the following:
Theoretically the PE2650 should outperform the PE2550 and 280R by about 3 times, all other factors being equal (i.e. same benchmark). The SPEC benchmark does its absolute best to eliminate I/O systems and network interfaces as a factor, so if we are just talking CPU, cache and memory, the Xeon should have had a CINT baseline of about 1600 or so.
Things get even worse when you start looking at SMP capabilities and scalability. In a truly linearly scalable SMP system you should be able to go from 1 CPU to 2 CPUs and have the benchmark double. Even the best SMP systems (Sun UltraSPARC and IBM Power) can't quite achieve that. But Itanium really has trouble. Search on Dell and look at the CINT and CFP rate benchmarks. Look at the 1-, 2- and 4-CPU scores for the Dell 7150.
Bottom line? If you are doing heavy lifting on a server, go SMP with 64-bit RISC or, in some cases, use a cluster of 2-CPU x86 servers. If you are a PC user, you are unlikely to see a significant performance increase with new Intel CPUs unless you upgrade the whole system, not just the CPU.
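"Linearly scalable" can be checked numerically as scaling efficiency: the n-CPU score divided by n times the 1-CPU score. A sketch with hypothetical scores for illustration only; substitute the published 1-, 2- and 4-CPU rate results for the system you are checking.

```python
def scaling_efficiency(score_1cpu: float, score_ncpu: float, n: int) -> float:
    """Fraction of perfect linear scaling: 1.0 means n CPUs gave n times the score."""
    return score_ncpu / (n * score_1cpu)

# Hypothetical scores, not real SPEC results.
scores = {1: 10.0, 2: 18.0, 4: 30.0}
for n, score in scores.items():
    print(f"{n} CPU(s): {scaling_efficiency(scores[1], score, n):.0%} of linear")
```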
This whole thing of adding clock cycles and deepening the pipeline is not working out well.
In my universe I'm perfectly normal, it's not my fault you don't live in my universe.
Heat pipes aren't a panacea. They are reasonably efficient at moving heat, and that is all. A flat copper block with a couple of heat pipes (which will be problematic if, say, the mobo is placed horizontally) will not make the heat magically vanish. Hell, just to make the heat pipes work you need a reasonable temperature difference between the evaporating and condensing ends, and that will be pretty hard to achieve, considering the scale of things.
Current chips generally require all sorts of nasty supercooling, specifically LN2, to run at ~4GHz even for a short period of time. Even if the PV is a miracle of engineering, I don't see how you can have two modules running at 5GHz+ stacked on top of each other.
And, in fact, that's not the whole story either. Today, current leakage is a very serious issue (I haven't seen concrete numbers for 90nm process technology, but leakage gets larger the smaller the process technology you use, and the power dissipated through leakage becomes comparable to the power dissipated by transistor switching), and current leakage is completely independent of clock speed. So part of the total power dissipation is independent of clock speed, and the other part scales linearly with clock speed.
What I want to know is how much you pay your admin people... you bought more CPUs even though you were at 60% iowait?
Actually, we knew we needed both CPUs and disk. Here's why. The system was I/O bound; that was clear from a simple reading of top. But we also intended to add more users, and even if we removed the factors making it I/O bound, we would still need more CPU cycles for the user queries. Because of the way the project was built and interconnected with some other projects, we first added the CPUs and then moved the data to new disk. So we were able to measure performance in both states.
In general, 64-bit computing is a waste of time and performance unless you need a 64-bit address space. You can fit half the instructions in cache, half the pointers in your data structures, load half as many addresses per cycle, etc. We've got a couple of 8 and 16GB SQL Server boxes, so when Win64 and SQL64 have baked a bit longer we may migrate those databases to 64-bit platforms.
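The two-part split above is the usual first-order CMOS power model: P = P_leak + C·V²·f, where only the second term depends on clock frequency. A sketch at fixed voltage; the leakage, switched capacitance and voltage values are hypothetical, not real 90nm data.

```python
def cpu_power_watts(leakage_w: float, switched_cap_f: float,
                    volts: float, freq_hz: float) -> float:
    """Static leakage plus dynamic switching power: P = P_leak + C * V^2 * f."""
    dynamic = switched_cap_f * volts ** 2 * freq_hz
    return leakage_w + dynamic

# Hypothetical values: 30 W leakage, 20 nF effective switched capacitance,
# 1.5 V supply held constant as the clock changes.
for ghz in (2.5, 5.0):
    p = cpu_power_watts(30.0, 20e-9, 1.5, ghz * 1e9)
    print(f"{ghz} GHz: {p:.0f} W")
```

Doubling the clock doubles only the dynamic term; the leakage term stays put whatever the frequency.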
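The "half the pointers" cost shows up directly in pointer-heavy structures. A sketch using Python's `ctypes` to lay out the same hypothetical linked-list node with a 32-bit versus a 64-bit link field; fixed-width integers stand in for pointers so the comparison does not depend on the Python build, and the 64-byte cache line is an assumed, common size.

```python
import ctypes

class Node32(ctypes.Structure):
    # List node whose "next" link is a 32-bit pointer-sized field.
    _fields_ = [("value", ctypes.c_int32), ("next", ctypes.c_uint32)]

class Node64(ctypes.Structure):
    # Same node with a 64-bit pointer-sized field.
    _fields_ = [("value", ctypes.c_int32), ("next", ctypes.c_uint64)]

CACHE_LINE_BYTES = 64  # assumption: a common cache-line size

for node in (Node32, Node64):
    size = ctypes.sizeof(node)
    print(f"{node.__name__}: {size} bytes, {CACHE_LINE_BYTES // size} nodes per cache line")
```

On a typical x86-64 build `Node64` is 16 bytes rather than 12, because padding aligns the 8-byte field, so only half as many nodes fit in each cache line.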
Sorry, but your beloved wintel doesn't support the heavy lifting needed. I have yet to see a true multi-terabyte data warehouse run on wintel and SQL Server, although you just might be able to do it reasonably using Windows, Intel and Sybase IQ Multiplex. The application I was talking about was a 350 GB fraud datamart... not even the data warehouse. This datamart, on technology that is not "interesting or novel", manages to support that much data, real-time data loads, real-time interactive queries and so forth. And, as you might imagine, a fraud datamart gets very heavy ad-hoc analytical queries. We aren't really worried about the technology being interesting or novel; we are worried about it doing what we need it to do. I suspect that a 4- or 8-CPU system based on Intel technology would be CPU bound in this situation, not I/O bound, although I'm not really willing to try it and waste the money.
We have consistently hit performance and scalability ceilings with Intel, especially when running Windows. Intel processors seem to scale a bit better with Solaris x86 or Linux than with Windows, although not by much. By contrast, the main limiting factor we have found with our Sun and IBM technology is our disk farm. Then again, the organization I work for deals with nearly 200 million transactions annually, a data warehouse that contains about 2.5 terabytes of data, an imaging system with more than one billion images available either in real time or within as little as 10 minutes (for the offline images), about 5,000 total users and more than 5,000,000 customers who all have to interact with the system in one way or another. So not only are performance and scalability important, but so are reliability, availability and stability. When the downtime costs more than an entire wintel server, you find that the ROI of those Sun and IBM servers you scoffed at makes a lot of sense. The data center doesn't run on those platforms (not to mention HP Superdome and Alpha) because of some sort of hegemony, but because the systems are proven, reliable, stable, scalable and perform well with enormous user and data loads on them. On top of all that, they aren't vulnerable to the worm of the week because the OS vendor can't manage to separate the user from the OS.
Not to turn this into a holy war or something, but while Intel CPUs may increase in computing power each generation, if you plot a curve using something objective like SPEC, you see that the increase is a parabola opening along the X axis (it flattens out); that is, performance is not increasing as fast as it did in the prior generation. Put processor generation/clock speed on the X axis and SPEC benchmark baseline on the Y axis. Now, hopefully, Intel will figure out a technology path out of their dead end. However, if you plot the performance of "boring" 64-bit RISC processors, interestingly enough the curve is a parabola opening along the Y axis (it curves upward). Admittedly the improvement is gentle, but it's still there.
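Whether a series of per-generation scores is flattening or steepening can be checked with second differences: negative means each generation's gain is smaller than the last. A sketch using hypothetical SPEC-style numbers, not real benchmark results.

```python
def second_differences(ys: list[float]) -> list[float]:
    """Change in the per-generation gain; negative means gains are shrinking."""
    gains = [b - a for a, b in zip(ys, ys[1:])]
    return [b - a for a, b in zip(gains, gains[1:])]

def curve_shape(ys: list[float]) -> str:
    d2 = second_differences(ys)
    if all(d < 0 for d in d2):
        return "flattening (diminishing returns)"
    if all(d > 0 for d in d2):
        return "steepening (accelerating gains)"
    return "mixed"

# Hypothetical baselines per processor generation, for illustration only.
flattening = [100, 180, 240, 280]
steepening = [100, 120, 150, 190]
print(curve_shape(flattening))
print(curve_shape(steepening))
```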
So, back to my original point, usin