Posted by
michael
on from the double-the-pleasure-double-the-fun dept.
msolnik writes: "Over at RealWorldTech they've published an article on the future of 64-bit performance. This article covers the different technology from Sparc to Hammer. It's a great read if you are looking for information on up-and-coming products from Intel, AMD, Sun, and Compaq."
AMD's going for a slightly different track. AMD
is the only one trying to put 64-bit on the
desktop. Now for us Linux freaks, SuSE Linux
and NetBSD will be fine for a 64-bit desktop,
but if AMD want to lock up some of the market
into x86-64, they really need a mainstream OS.
Unfortunately that means Windows, and "if
we build it they will come" doesn't necessarily
work if there is no competition. Still, in the
meantime, Clawhammer will be a damn fine 32-bit
chip as well, and Sledgehammer will bring
high-end servers right down to mid-range prices.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
*** shudder ***
Installing AMD CPUs on a server is like using IDE hard drives instead of SCSI...
Using AMD CPUs and IDE drives on a server is good both financially and for performance.
AMD CPUs outperform every Intel CPU (don't fall into the MHz trap!), are cheaper and are no less reliable. You're a fool if you buy more expensive Intel CPUs these days.
SCSI isn't magic technology anymore either. In fact, the latest IDE protocols surpass all the existing SCSI technology in speed. Furthermore, the actual drive mechanics are the same for both SCSI and IDE versions of a drive so the reliability isn't any lower for IDE drives anymore. Yeah, you can chain more drives in a single SCSI bus than you can on IDE, but IDE controller cards are cheap. There are inexpensive IDE RAID controllers too. And, of course, the price for IDE drives is significantly lower. You can get two huge IDE drives at the price of a single 18 GB "high performance" SCSI drive.
The more I listen to Intel and SCSI people the more I believe that the so called advantages of both technologies are nothing but hype to keep up the ridiculous pricing.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
You mean four times cheaper?
BTW, High-end servers tend to use fibre channel
anyway.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
Athlon looks to slip to #2 once again soon. Northwood P4 comes out January 6th, 10% faster per clock. PC1066 RDRAM comes out in Q2 2002, 10-15% faster per clock again. Plus 2.4GHz in early Q2 2002.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
You get what you pay for.
If reliability is important you'll want to get the best possible technology. Not something designed for Joe Sixpack's desktop.
AMD has kick-ass CPUs; they are fast and cheap.
Same goes for IDE, including the fact that they
have become bigger and more reliable the past
years, in case you haven't noticed.
I'm not bashing anyone here, I'm just stating
a fact. And in case you wonder, my OpenBSD
server runs on AMD and SCSI.
Maybe you were thinking of Alpha CPUs ? Now
THERE'S raw power for ya.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
You get what you pay for is bullshit.
People pay for brandnames and advertising, and get nothing but ripped off.
People pay for their own stupidity, always have, always will.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
Sigh.
I guess you've never dealt with big iron and don't understand why your precious AMDs won't be able to replace a high-end Sun server ever.
A few keywords for you: reliability, scalability and pure performance.
That's why you should pay for real hardware.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
According to the rumors, Microsoft is developing an x86-64 version of Windows.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
That's Microsoft innovation for you...
A 64-bit Linux has been available for years now, and an Itanium-specific version was already available before the actual chip had even been released. That's open source power!
Re:AMD's gonna win
by
Space+cowboy
·
· Score: 3, Informative
Whereas I agree with you on the AMD front, I strongly disagree regarding SCSI vs IDE in a server environment.
SCSI drives have disconnect abilities, which means they can have commands sent to them, and the bus is then disconnected (free for other use) while the drive is seeking to the sectors required and buffering in the internal drive RAM. This means that other drives can be instructed in this 'dead' time. On a single-drive system, this is irrelevant, but even on a small server (say 0.5Tb disk array) it is crucial.
IDE drives hog the channel - which is why you can't get much more speed out of a RAID-0 array with 3 or 4 drives than one with 2 (masters) on a standard PC. There are only 2 channels, so only 2 drives can be accessed at once. Contrast this to a SCSI system, where anything up to ~64 disks might be attached to a single channel, but using disconnect to manage that channel amongst them.
To see why disconnect works so well, remember that the time it takes to seek the disk head is measured in milliseconds - this is several orders of magnitude slower than the time to send the commands/data over the bus to the host computer.
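To put rough numbers on that, here is a toy model in Python. It is only a sketch: the 8 ms seek and 0.5 ms transfer figures are illustrative assumptions, not measurements of any real drive, and the function name is made up for the example. It assumes every drive gets one request at the same moment and that all seeks take the same time.

```python
# Toy model: several drives share one bus; each request is a seek plus a transfer.
# All figures used with it are illustrative assumptions, not measurements.

def total_time_ms(num_drives, seek_ms, transfer_ms, disconnect):
    """Time to finish one request on every drive sharing a single bus."""
    if disconnect:
        # The bus is released while each drive seeks, so the seeks overlap;
        # only the data transfers have to be serialized on the bus.
        return seek_ms + num_drives * transfer_ms
    # Without disconnect the bus is held for the whole seek + transfer,
    # so the requests run strictly one after another.
    return num_drives * (seek_ms + transfer_ms)


if __name__ == "__main__":
    for drives in (1, 2, 4, 8):
        held = total_time_ms(drives, 8.0, 0.5, disconnect=False)
        freed = total_time_ms(drives, 8.0, 0.5, disconnect=True)
        print(f"{drives} drives: bus held {held} ms, with disconnect {freed} ms")
```

With one drive the two cases are identical, which matches the point that disconnect is irrelevant on a single-drive system; the gap only opens up as drives are added to the same channel.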
Also remember that the ATA-100 figure is (AFAIK) a burst speed, i.e. it can transfer at that speed when the source data is in the cache - it cannot read the data at that speed... The latest SCSI standard is 320 MBytes/sec (Seagate, I believe) although I think 160 MBytes/sec is the highest widely available standard. Given the architecture underlying both technologies, which do you think will have the best chance of filling its cache more often in a RAID array? (Hint: it starts with an 'S' :-)
The only company I have seen to make large-scale IDE RAID arrays work as fast as SCSI ones uses an IDE controller *per drive*, and attaches a SCSI/Fibrechannel front-end via custom hardware. It's still cheaper than SCSI, but not by that much, and getting people who know about it is more difficult when it goes wrong...
Simon.
-- Physicists get Hadrons!
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
> I guess you've never dealt with big iron and
> don't understand why your precious AMDs won't
> be able to replace a high-end Sun server ever.
This is not true, SGI and Sun have had 64-bit desktops for a long time! The Sun Blade 100 is a great example; it is only ~$1000. If you are a member of the academic community you can obtain one for around $795. This is a 64bit processor - same one as in the Sun Fire 15000.
UNIX is and has been on the desktop for years. Sun got their start on the desktop and has been strong there ever since! Suse, Debian, NetBSD and OpenBSD all support the Sun Blade 100 too !
Re:AMD's gonna win
by
tap
·
· Score: 3, Interesting
Furthermore, the actual drive mechanics are the same for both
SCSI and IDE versions of a drive
Why do people keep repeating this myth? If you look at the physical parameters for any SCSI and IDE drive made in the last 5 years, you will see that they are completely different. I dare anyone to find a SCSI and IDE drive from the same manufacturer produced since 1998 that has the same number of heads, spins at the same speed, and has the same capacity. You won't find any.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
I was doing it all the time.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
AMD has kick-ass CPUs; they are fast and cheap.
And have a tendency to go up in a puff of smoke...
Same goes for IDE, including the fact that they
have become bigger and more reliable the past
years, in case you haven't noticed.
Like the IBM's 75 GB total-data-loss-guaranteed wonder?
Re:AMD's gonna win
by
tap
·
· Score: 4, Informative
SCSI's disconnect ability looks good in theory, but in practice it's not such a great advantage. With SCSI you can attach up to 15 devices to a single channel, and effectively access them all at the same time. With IDE you can attach up to two devices to a single channel, and only access one at a time. Sounds like SCSI is lots better, but only if you have a single IDE/SCSI channel and more than one drive. If you put each IDE drive on a separate channel, and you can buy IDE controllers with 8 channels, then there really is no advantage to SCSI's disconnect/reconnect ability.
Not to sound like I'm jumping on the Microsoft bandwagon but there was a 64 bit Windows 2000 and there is a 64 bit Windows XP. Considering how in-bed Microsoft and Intel are I am sure they have been working on this for a lot longer than we will ever know.
This IDE vs. SCSI debate is getting really old. Buy what you believe in and what you want to spend. I for one prefer SCSI and I am willing to pay the extra cash. If you don't, then don't.
It's a Free country, brother...
-- Wealth is the product of man's capacity to think. -Ayn Rand
Re:AMD's gonna win
by
ergo98
·
· Score: 2, Interesting
Both SCSI and IDE are communications mechanisms, with SCSI winning out as being more intelligent (due to a variety of factors). Having said that therefore it's merely a function of the circuitry stuck on the back of the drive: Why in the world would any drive manufacturer manufacture completely different drives for SCSI or IDE? Seriously, I personally have never looked at the stats, but that seems absurd: It seems brutally obvious that they'd just pull them off the end of the line and stick on the SCSI board, or the IDE board, of course sticking a 200% premium on the SCSI equipped version as a sucker tax.
I find it interesting that you mentioned "since 1998", and it is perhaps true given that condition: IDE has permeated the market, and the only area where SCSI still has a presence is high end servers, so given that it is possible that they only even bother sticking SCSI boards on the 15,000RPM monsters anymore. However, I still disagree with your assertion that it's a "myth", as back in the day (when even desktops came with SCSI if you wanted "multitasking") every SCSI versus IDE review started off with a disclaimer that the drives were physically exactly the same, and only the communications mechanism differed.
This is a 64bit processor - same one as in the Sun Fire 15000
No, it's not. I mean, it is a 64-bit processor but it isn't the same one as in a StarCat. The Blade 1000 is Sun's US-III based desktop but you're not going to pick one up for $800.
Re:AMD's gonna win
by
SQL+Error
·
· Score: 2, Informative
I've found that the high-end multi-channel IDE controllers (such as 3Ware's Escalade line) work well in small database servers. They get around IDE's shortcomings by devoting a separate channel to each drive. Mind you, cabling up 8 drives is a b*tch.
Of course, the drives are still lower performance, but with a healthy amount of RAM I get good results at a reasonable price.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
He's talking about luser desktops, not scientist desktops.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
Oh, poo. You just don't want to admit how much money you wasted on Intel and SCSI...
Re:AMD's gonna win
by
UnknownSoldier
·
· Score: 2, Insightful
> Northwood P4 comes out January 6th, 10% faster per clock.
And I bet it will cost more than 10%. AMD will still have the better ratio of performance/price.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
Spoken like a true penny-pincher.
In addition to the above mentioned technical aspects, Intel chips have also other advantages such as the brand name and highly regarded image in the business world.
Unfortunately that means Windows, and "if
we build it they will come" doesn't necessarily
work if there is no competition.
Luckily, Microsoft has abandoned Windows 9x/ME based kernels for Windows NT based kernels for all of their desktops. Microsoft has been developing 64bit versions of Windows NT for some time now, originally for Alpha, then (using Alphas for development and testing even) for IA-64. If there is sufficient demand, we may see an x86-64 version of Windows XP (or whatever the next version will be called). I doubt it will be a lower cost "Home" version, but more likely a "Professional" version. All Microsoft has to do is realize that x86-64 owners will use Linux/BSD if they are limited to a 32bit version of Windows, and suddenly they will be scrambling to make a port.
Re:AMD's gonna win
by
MSG
·
· Score: 3, Informative
BTW, High-end servers tend to use fibre channel as a physical interconnect for SCSI devices, anyway.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
yup. its biased and untrue. heatsinks falling off AMD processors dont damage the chip if the correct power/temp support on the mobo is implemented...like it is on intel mobos (which is why intel mobos are more expensive than AMD mobos).
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
brand name and highly regarded image means didly squat. i want good performance/$ and reliability both of which AMD parts give me more than intel.
and RDRAM is a piece of SHIT. proprietary memory which sucks in terms of performance and CPUs which rely on high clock speeds to give mediocre performance are both on my products-never-to-be-touched list.
All that proprietary crap, ughhh. A 4 gig memory kit for one of those things is 5 grand and up. That's with the discount. Sun and SGI both should have had a line of computers affordable for the student and compatible with their big iron hardware. If they did they might have had a developer community large enough to forestall the incursions of linux and sadly win32 into their marketspace. At least SGI is now giving what they can to linux and plan on moving to it in the future for sheer numbers of developers. SUN is looking more and more like a dinosaur every day.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
oh please. your big iron sun server offers worse performance than any x86 system. specint/fp for the sun parts is so slow its laughable. there's a reason why solaris has been named slowlaris. its dog slow. the only reason you cram so many us-ii Cpus into a solaris box is because the 450MHz us-ii parts give didly shit for performance. us-iii has major problems with compiler bugs and part availability.
the only thing good about sun is quality & reliability. if a good x86-64 manufacturer came up with a decent, quality hot plug mobo with hot plug cpu and memory support with amd chips i'd switch over from my rack of E4500s in a heartbeat. and im betting it wouldnt cost me $300,000+ per machine either unlike the sun boxen.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
http://www.microsoft.com/WINDOWSXP/64bit/overview.asp
Quite easy to find this information. They actually have a 64-bit Limited Edition server version out:
http://www.microsoft.com/windows2000/64bit/default.asp
It's called Windows Advanced Server Limited Edition. Free upgrade to .Net server after release with it as well I believe. It's pretty good (I've used it). And is a Supported OS until 90 days after the .Net server release:
http://www.microsoft.com/windows2000/64bit/overview/default.asp
(at the bottom, under how to license).
Installing AMD CPUs on a server is like using IDE hard drives instead of SCSI...
It's quite courageous to make such a statement, as I don't think you know the exact features of the to-be AMD CPUs (which have not even been released yet, and won't be for some time). Can I borrow your crystal ball? I'd like to win the lottery ;)
-- Everyone who makes generalizations should be shot.
Re:AMD's gonna win
by
Anonymous Coward
·
· Score: 0
>Spoken like a true penny-pincher.
Well excuse me, Mr. Money Bags. Some of us less fortunate only have a fixed amount of money.
> Intel chips have also other advantages such as the brand name and highly regarded image in the business world.
And some of us couldn't give a sh!t about brand name. Quality, performance, and price are more important than brand name.
Oh, no. The Blade100 uses a 500MHz US IIe processor with 256KB cache while Fire 15000 uses 900 MHz UltraSPARC III processors with 8MB cache. There is absolutely no comparison between the two. If you want a desktop machine with a USIII CPU it's gonna cost you around $10K (Blade1000).
AMD has a long way to go to compete with Sun, or even Intel, in the server market. You're comparing a useless piece of silicon with a leading server platform. When you pay Sun, you're not just getting hardware but a proven, scalable, stable, enterprise-quality server platform with industry-leading application support, an industrial-strength operating system and Sun support services.
And I bet it will cost more than 10%. AMD will still have the better ratio of performance/price.
Certainly Northwood will (rightly) carry a bit of a price premium over Willamette, but mainly because Willy prices will drop by a lot. Northwood will almost certainly improve Intel's price/performance relative to AMD, for the simple reason that in addition to being able to clock faster and getting better performance at equivalent clock speeds, Northwoods are cheaper to make than Willamettes, because they're a lot smaller. (~130mm^2 vs. 217mm^2)
AMD will still offer better performance/price, of course, but mainly because they will cut prices in response. (And they had an awfully large lead to start with.)
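For a rough feel of the die-size argument, here is a back-of-the-envelope calculation in Python. It is a sketch only: the 200 mm wafer diameter is my assumption (the post does not name a wafer size), the function name is made up for the example, and it ignores edge loss, scribe lines and yield, so the absolute counts are optimistic. The ratio between the two is the interesting part.

```python
import math

def gross_dies(wafer_diameter_mm, die_area_mm2):
    """Upper bound on dies per wafer: wafer area divided by die area.

    Ignores edge loss, scribe lines and defects, so real counts are lower.
    """
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area_mm2 // die_area_mm2)


if __name__ == "__main__":
    WAFER_MM = 200  # assumed wafer diameter, not taken from the post
    willamette = gross_dies(WAFER_MM, 217)  # ~217 mm^2 die, from the post
    northwood = gross_dies(WAFER_MM, 130)   # ~130 mm^2 die, from the post
    print(f"Willamette: {willamette} dies, Northwood: {northwood} dies")
    print(f"ratio: {northwood / willamette:.2f}x")
```

Under those assumptions the smaller die gives roughly 1.67 times as many candidate chips from the same wafer, which is where the "cheaper to make" claim comes from.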
All they'll do is emasculate the pro version to make a home version, like they did with XP pro and XP home. If x86-64 looked like it would be popular with home users, you can bet it wouldn't take more than a few months for them to get a home version out.
nt4 ships with an alpha version. i ran it for a while, silly stable, but not that fast compared to tru64.
Not having tried it, what I heard was that while NT ran on Alpha, it was still basically 32-bit. Just like Win95 was largely 16-bit even though it ran in 386 protected mode.
That would explain why Tru64 (which, like NT, is microkernel-based) was faster. Some say that to get the best performance on Alpha one should compile code with the Tru64 compilers but run it on Linux (the two are ABI-compatible).
-- "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
it did indeed run in 32 bit mode, but you could get 64 bit versions of most programs. the speed problems were due to the fact that the os was essentially a 32 bit os, but that's where the added stability came from; for some reason the 64 bit cpu seems to not choke on nt nearly as much as a 32 bit cpu.
those extra bits must do something for stability. it's likely the failed instruction tolerance the alphas are famous for.
I'll stick to my Atari Jaguar or N64
by
Anonymous Coward
·
· Score: 1, Funny
I've had 64bit computing for years now! This is nothing new!
Re:I'll stick to my Atari Jaguar or N64
by
orangesquid
·
· Score: 2
Well, so have I! SGI Indy, 1994. 64-bit MIPS CPU.
-- --TheOrangeSquid
Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
Re:I'll stick to my Atari Jaguar or N64
by
petej
·
· Score: 1
Indy didn't have a 64-bit CPU; you had to move up the SGI line for that. It did have direct binary compatibility with the 64-bit version, though.
Re:I'll stick to my Atari Jaguar or N64
by
thogard
·
· Score: 1
And how much N64 code is 64bit? The spinning logo is the only 64 bit code I've seen in any N64 game. The logo was written by the SGI guys to show how to use the hardware properly.
Re:I'll stick to my Atari Jaguar or N64
by
linzeal
·
· Score: 0
Yes it did, the r3000 was the baseline proc and everything after that was 64 bit in mips land.
i want a 128 bit computer. why are computers lagging against game systems, or is this just a graphics measurement on the game systems?
I still want a 128 bit computer in any case!!!! (-:
Re:my first post
by
Anonymous Coward
·
· Score: 0
get a transmeta VLIW 128-bit laptop. if you can get it to run native, you have 128 bits.
Might have 64-bit computing very soon.
by
x136
·
· Score: 3, Interesting
If the Power Mac G5 is introduced at Macworld on Monday*, you can all have your 64-bit goodness by the month's end!
*I'm not really expecting it to be released this soon, maybe later this year. But who knows? It could happen.
-- SIGFEH
Re:Might have 64-bit computing very soon.
by
MikeD83
·
· Score: 1
It would be great, but I don't see it happening. A top of the line G4 currently uses PC133 SDRAM and ATA/66 hard drives. They will more than likely unveil future plans for 64bit, but I think we will see something more modest in terms of hardware at the show like maybe DDR-SDRAM and ATA/133 hard drives in a brand new G5 (and I guess you could also count the new case as hardware too).
Re:Might have 64-bit computing very soon.
by
Anonymous Coward
·
· Score: 0
Correct me if I'm wrong, but isn't the G4 as much a 64-bit processor as the Athlon is an 80-bit processor? IIRC, the ALU is 32-bit, the FPU is 64-bit and it has 128-bit SIMD?
I thought it was all marketing crap Apple threw around.
Re:Might have 64-bit computing very soon.
by
Anonymous Coward
·
· Score: 0
Look, Steve's going to do his song-and-dance about the well-known-already flat panel iMac on Monday. It's topped up for USB and firewire ports (2.0 and 1394b) and the screen is great. At $999 for the base model with a 60 gig hard drive and a 600MHz G4 processor (128MB RAM), you can't lose. But it's still mostly the same old stuff.
It looks just like the Studio 15 and your mom will love it.
-A Little Birdie
Re:Might have 64-bit computing very soon.
by
inio
·
· Score: 3, Informative
yes, but the G5 replaces the 32-bit ALU with a 32/64-bit ALU. The PPC spec has included 64-bit instructions from day 1, but they've only been used in IBM's mainframes. The problem with apple using a standard 64-bit PPC is that there are a few minor differences in how certain generic instructions are handled (most instructions are specific to single- or double-words) which make running code compiled for 32-bit PPC uncertain on 64-bit PPC. So what I'm assuming Mot has done with the G5 is add a "64-bit mode" that apple disabled by default and applications must explicitly request.
Re:Might have 64-bit computing very soon.
by
tjwhaynes
·
· Score: 2
The PPC spec has included 64-bit instructions from day 1, but they've only been used in IBM's mainframes. The problem with apple using a standard 64-bit PPC is that there are a few minor differences in how certain generic instructions are handled (most instructions are specific to single- or double-words) which make running code compiled for 32-bit PPC uncertain on 64-bit PPC.
Is this really true? I run on a mixture of 32bit and 64bit 4-way POWER RS6000 machines - all the software is compiled up on the 32bit platforms and runs seamlessly everywhere. So either your statement doesn't apply on AIX or the PowerPC chips are subtly different to the POWER platform when it comes to 32bit/64bit.
Cheers,
Toby Haynes
-- Anything I post is strictly my own thoughts and doesn't
necessarily have anything to do with the opinions of IBM.
Re:Might have 64-bit computing very soon.
by
stilwebm
·
· Score: 1
Rumor has it there will be a dual-banking feature, effectively doubling the memory bus size and bandwidth to 128bit, but still PC-133 (SDR).
Re:Might have 64-bit computing very soon.
by
linzeal
·
· Score: 0
Do linux distros take advantage of these 64 bit instructions? Have people tried to enable these features?
Re:Might have 64-bit computing very soon.
by
vanguard
·
· Score: 2
They will more than likely unveil future plans for 64bit
That doesn't sound very apple-like to me. They'll probably keep it a closely guarded secret.
-- That which does not kill me only makes me whinier
Re:Might have 64-bit computing very soon.
by
Brand+X
·
· Score: 2
I'm not currently speaking for 64 bit PPC. I know PPC quite well. I write software for a living, and the PPC (MacOS X BSD layer with some components in asm) is one of the targets I have to keep synched. Gotta love that big-little endian mode switch...
I'm not currently speaking for 64 bit PPC, as I've never seen one. I've seen 64 bit POWER-4 servers, but that's a little different. I do, however, also target and maintain Solaris versions of my software, which are 64 bit aware. I do have to deal with the 32 bit library/64 bit application issues. I do have to deal with building both 32 and 64 bit versions. I even have to deal with testing gcc 3.0.x 32/64 modes against the Forte CC 32/64 modes. I'm pretty damned familiar with the issues involved in making software on mixed addressing operating systems work.
Before I go on, let me note that a 64 bit application in the sparcv9 format cannot link to a 32 bit sparcv8 library, either static or dynamic, and the only solution with a commercial library will be to actually write an interface-by-interface transport layer for the library, linking the 64 bit side of the transport layer to the application, and the 32 bit side to the library, and take the penalty of using pipes for communication right on the jaw. Oh, and the 32 bit side will have the 4GB memory limit, too...
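The shape of such a transport layer can be sketched in a few lines. This is a Python stand-in under loud assumptions, not real sparcv8/sparcv9 linking: the helper process plays the "32 bit side" that owns the library, the proxy class plays the "64 bit side" linked into the application, and the one "library call" (`add`), the line-based protocol and every name here are hypothetical.

```python
import subprocess
import sys

# Stand-in for the "32 bit side": a separate process that owns the library
# and answers one request per line. The "library call" is just addition.
HELPER_SOURCE = r"""
import sys
for line in sys.stdin:
    a, b = line.split()
    print(int(a) + int(b), flush=True)
"""


class LibraryProxy:
    """Stand-in for the "64 bit side": forwards each call over a pipe."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", HELPER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def add(self, a, b):
        # Marshal the arguments, send them across, block for the reply.
        self.proc.stdin.write(f"{a} {b}\n")
        self.proc.stdin.flush()
        return int(self.proc.stdout.readline())

    def close(self):
        # Closing stdin ends the helper's read loop so it can exit.
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()


if __name__ == "__main__":
    proxy = LibraryProxy()
    print(proxy.add(2, 3))
    proxy.close()
```

Every call pays for marshalling plus a round trip through the pipe, and each interface in the library needs its own wrapper like `add`, which is the "interface by interface" cost described above.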
While Sun does do a good job of making the 32 bit/64 bit transition look smooth, it's not, really. SGI and HP face similar issues. I'm told that Alpha Linux may have workarounds not available on the big iron platforms, but I don't know the details, as I don't do any serious Alpha work.
Now I am currently speaking for the PPC. Please take this as speculation based on POWER-4 details and the original PPC spec, not as insider knowledge.
The PPC is interesting. The original design calls for mode switching (like the sparc or mips), but there's a provision for realtime mode switching in there. I expect you would take a heavy hit, but you might be able to link 64 and 32 bit binaries, if the linker was smart enough to insert mode switch instructions into the calling sequence and if the compiler were set to interpret interface definitions (in headers) according to a dependency-determined pointer size assumption. Come to think of it, it should have been possible to implement something like this for the sparc and mips binary formats... (e.g. _int_v8 and _int_v9 as separate types in the compiler's internal interpretation...)...but there would be serious penalties for this as well.
Being realistic, I expect eventually we'll have a 64 bit kernel (Darwin) with 32 bit libraries provided as interfaces for mixed mode applications, and a handful of apps (Photoshop, FCPro) that require 4+GB memory being released in 64 bit form (requires G5(6?) or greater!!!) for power users... this of course, at the point in time where we have 2GB+ DDR modules, and four slots again... and another major transition. At least Apple has proven that they are good at tremendous transitions, remarkably so, considering...
There are other possible benefits to 64 bit computing, beyond addressing. Some of them can be realized now... on the G4s, and the P4s, there are ways to use 64 bit (or 128 bit, or even, in one case on the P4, 256 bit) bit vector arithmetic to speed up comparisons, sometimes by unbelievable factors... some higher precision mathematical processing is possible only with 128 bit floating point, which is generally coupled only with 64 bit integer registers, which are the basis of 64 bit memory addressing as a reasonable proposition...
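The comparison trick is easiest to see with plain 64 bit words: instead of comparing two buffers a byte at a time, pack eight bytes into one integer and compare (XOR) the whole word in one operation. Below is a minimal Python illustration of that idea. It is a sketch of the general technique, not the G4 or P4 vector code referred to above; the function name is made up, and Python itself will not show the speedup, only the logic.

```python
def first_difference(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if equal.

    Compares eight bytes per step by treating them as one 64 bit word.
    """
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    for offset in range(0, len(a), 8):
        # Little-endian packing puts byte 0 of the chunk in the low bits.
        word_a = int.from_bytes(a[offset:offset + 8], "little")
        word_b = int.from_bytes(b[offset:offset + 8], "little")
        diff = word_a ^ word_b  # zero exactly when all eight bytes match
        if diff:
            # The lowest set bit of the XOR falls in the first differing byte.
            lowest_bit = (diff & -diff).bit_length() - 1
            return offset + lowest_bit // 8
    return -1


if __name__ == "__main__":
    print(first_difference(b"abcdefgh12345", b"abcdefgh12x45"))
```

On real hardware the inner step is a single load, XOR and test per eight bytes (or per sixteen with 128 bit vector registers), which is where the speedup over byte-wise loops comes from.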
There's also a possible two-instruction-per-cycle trick that could be performed on a 64 bit CPU with a hybrid (64 bit with 32 bit support) kernel for certain operations. There's some documentation for this online, but I haven't tried anything of the sort myself (no current access to a POWER-4 server), so I can't vouch for the usefulness of this.
We're not talking about a trivial task, or any immediate benefits, so don't expect a 64 bit MacOS X anytime soon. Even if the CPUs are 64 bit. It should be transparent, however, as the PPC is upward compatible (32 bit binaries run on 64 bit CPUs) just as the sparc and MIPS are...
-- --
Still waiting for the Nike endorsement
Re:Might have 64-bit computing very soon.
by
sirsnork
·
· Score: 1
Serverworks chipsets have been doing this for quite some time now
--
Normal people worry me!
64 bit? Nothing new to SGI/IRIX/Mips users...
by
Anonymous Coward
·
· Score: 2, Funny
CONGRATS.I386
I truly care about 64 bits
by
Anonymous Coward
·
· Score: 0
I truly care about 64 bits. That way graphics' transparencies, reflections, etc. can be embedded right away. Yohoo! Way to go!
link to Full article
by
mESSDan
·
· Score: 5, Interesting
is here: That way you only have to wait a longass time for it to load once, instead of a longass time for each of the 5 or 6 pages.
--
-- Dan
Thank Satan for anon proxies
by
Anonymous Coward
·
· Score: 0
HISTORY OF THE WORLD (Score:5, Funny)
2.5 million B.C.: OOG the Open Source Caveman develops the axe and releases it under the GPL. The axe quickly gains popularity as a means of crushing moderators' heads.
100,000 B.C.: Man domesticates the AIBO.
10,000 B.C.: Civilization begins when early farmers first learn to cultivate hot grits.
3000 B.C.: Sumerians develop a primitive cuneiform perl script.
2920 B.C.: A legendary flood sweeps Slashdot, filling up a Borland / Inprise story with hundreds of offtopic posts.
1750 B.C.: Hammurabi, a Mesopotamian king, codifies the first EULA.
490 B.C.: Greek city-states unite to defeat the Persians. ESR triumphantly proclaims that the Greeks "get it".
399 B.C.: Socrates is convicted of impiety. Despite the efforts of freesocrates.com, he is forced to kill himself by drinking hemlock.
336 B.C.: Fat-Time Charlie becomes King of Macedonia and conquers Persia.
4 B.C.: Following the Star (as in hot young actress) of Bethlehem, wise men travel from far away to troll for baby Jesus.
A.D. 476: The Roman Empire BSODs.
A.D. 610: The Glorious MEEPT!! founds Islam after receiving a revelation from God. Following his disappearance from Slashdot in 632, a succession dispute results in the emergence of two troll factions: the Pythonni and the Perliites.
A.D. 800: Charlemagne conquers nearly all of Germany, only to be acquired by andover.net.
A.D. 874: Linus the Red discovers Iceland.
A.D. 1000: The epic of the Beowulf Cluster is written down. It is the first English epic poem.
A.D. 1095: Pope Bruce II calls for a crusade against the Turks when it is revealed they are violating the GPL. Later investigation reveals that Pope Bruce II had not yet contacted the Turks before calling for the crusade.
A.D. 1215: Bowing to pressure to open-source the British government, King John signs the Magna Carta, limiting the British monarchy's power. ESR triumphantly proclaims that the British monarchy "gets it".
A.D. 1348: The ILOVEYOU virus kills over half the population of Europe. (The other half was not using Outlook.)
A.D. 1420: Johann Gutenberg invents the printing press. He is immediately sued by monks claiming that the technology will promote the copying of hand-transcribed books, thus violating the church's intellectual property.
A.D. 1429: Natalie Portman of Arc gathers an army of Slashdot trolls to do battle with the moderators. She is eventually tried as a heretic and stoned (as in petrified).
A.D. 1478: The Catholic Church partners with doubleclick.net to launch the Spanish Inquisition.
A.D. 1492: Christopher Columbus arrives in what he believes to be "India", but which RMS informs him is actually "GNU/India".
A.D. 1508-12: Michelangelo attempts to paint the Sistine Chapel ceiling with ASCII art, only to have his plan thwarted by the "Lameness Filter."
A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
A.D. 1553: "Bloody" Mary ascends the throne of England and begins an infamous crusade against Protestants. ESR eats his words.
A.D. 1588: The "IF I EVER MEET YOU, I WILL KICK YOUR ASS" guy meets the Spanish Armada.
A.D. 1603: Tokugawa Ieyasu unites the feuding pancake-eating ninjas of Japan.
A.D. 1611: Mattel adds Galileo Galilei to its CyberPatrol block list for proposing that the Earth revolves around the sun.
A.D. 1688: In the so-called "Glorious Revolution", King James II is bloodlessly forced out of power and flees to France. ESR again triumphantly proclaims that the British monarchy "gets it".
A.D. 1692: Anti-GIF hysteria in the New World comes to a head in the infamous "Salem GIF Trials", in which 20 alleged GIFs are burned at the stake. Later investigation reveals that many of the supposed GIFs were actually PNGs.
A.D. 1769: James Watt patents the one-click steam engine.
A.D. 1776: Trolls, angered by CmdrTaco's passage of the Moderation Act, rebel. After a several-year flame war, the trolls succeed in seceding from Slashdot and forming the United Coalition of Trolls.
A.D. 1789: The French Revolution begins with a distributed denial of service (DDoS) attack on the Bastille.
A.D. 1799: Attempts at discovering Egyptian hieroglyphs receive a major boost when Napoleon's troops discover the Rosetta stone. Sadly, the stone is quickly outlawed under the DMCA as an illegal means of circumventing encryption.
A.D. 1844: Samuel Morse invents Morse code. Cryptography export restrictions prevent the telegraph's use outside the U.S. and Canada.
A.D. 1853: United States Commodore Matthew C. Perry arrives in Japan and forces the xenophobic nation to open its doors to foreign trade. ESR triumphantly proclaims that Japan finally "gets it".
A.D. 1865: President Lincoln is 'bitchslapped.' The nation mourns.
A.D. 1901: Italian inventor Guglielmo Marconi first demonstrates the radio. Metallica drummer Lars Ulrich immediately delivers to Marconi a list of 335,435 suspected radio users.
A.D. 1911: Facing a break-up by the United States Supreme Court, Standard Oil Co. defends its "freedom to innovate" and proposes numerous rejected settlements. Slashbots mock the company as "Standa~1" and depict John D. Rockefeller as a member of the Borg.
A.D. 1929: V.A. Linux's stock drops over 200 dollars on "Black Tuesday", October 29th.
A.D. 1945: In the secret Manhattan Project, scientists working in Los Alamos, New Mexico, construct a nuclear bomb from Star Wars Legos.
A.D. 1948: Slashdot runs the infamous headline "DEWEY DEFEATS TRUMAN." Shamefaced, the site quickly retracts the story when numerous readers point out that it is not news for nerds, stuff that matters.
A.D. 1965: Jon Katz delivers his famous "I Have A Post-Hellmouth Dream" speech, which stated: "I have a dream that one day on the red hills of Georgia the geeks of former slaves and the geeks of former slave geeks will be able to sit down together at the table of geeks... I have a dream that my geek little geeks will one geek live in a nation where they will not be geeked by the geek of their geek but by the geek of their geek."
A.D. 1969: Neil Armstrong becomes the first man to set foot on the moon. His immortal words: "FIRST MOONWALK!!!"
A.D. 1970: Ohio National Guardsmen shoot four students at Kent State University for "Internet theft".
A.D. 1989: The United States invades Panama to capture renowned "hacker" Manuel Noriega, who is suspected of writing the DeCSS utility.
A.D. 1990: West Germany and East Germany reunite after 45 years of separation. ESR triumphantly proclaims that Germany "gets it".
A.D. 1994: As years of apartheid rule finally end, Nelson Mandela is elected president of South Africa. ESR is sick, and sadly misses his chance to triumphantly proclaim that South Africa "gets it".
A.D. 1997: Slashdot reports that Scottish scientists have succeeded in cloning a female sheep named Dolly. Numerous readers complain that if they had wanted information on the latest sheep releases, they would have just gone to freshsheep.net
A.D. 1999: Miramax announces Don Knotts to play hacker Emmanuel Goldstein in upcoming movie "Takedown"
32 more bits to mess up
by
ginkelb
·
· Score: 0, Flamebait
Since they can't get 32 bits to work stably, what will happen when Billy has to tame 32 more of those bits??
The idea alone makes me very afraid, because I have to work with Windows in my office.
-- Real programmers don't document.
It was hard to write so it should be hard to understand.
Contents of the Article
by
Anonymous Coward
·
· Score: 3, Informative
Looking Forward to 2002
By: Paul DeMone (pdemone@realworldtech.com) Updated: 01-02-2002
A Quick Look Back
In the last six months several noteworthy events and disclosures have occurred in the fast moving world of microprocessors. AMD started shipping its Palomino K7 processor as the Athlon XP. Despite the controversy surrounding the performance rating based model naming scheme associated with the XP, it appears the latest refinement of AMD's venerable K7 design has, by most measures relevant to the PC world, eclipsed the performance of the 2 GHz Pentium 4 (P4), the highest speed grade offered for Intel's first implementation of its new x86 microarchitecture. However, this advantage should prove short-lived, as the second generation 0.13 um Northwood P4 will be officially released in early January. The Northwood will offer higher clock rates, an L2 cache doubled in size, and minor internal performance enhancements.
Extending their rivalry on a different front, Intel and AMD unveiled microarchitectural details of their forthcoming 64-bit standard bearers at Microprocessor Forum in October. Although the McKinley and Hammer are both future flagship parts, and thus important symbols of Intel's and AMD's struggle for technological leadership, the two processor families will be sold into different markets and won't directly compete. In other 64-bit news, IBM officially unveiled the POWER4 processor in several different hardware configurations with clock rates as high as 1.3 GHz and took the top spot in both the integer and floating point performance categories of the SPEC CPU 2000 benchmark. However, preliminary "teaser" numbers from Compaq suggest that IBM will lose SPEC performance leadership when the EV7, the final major product introduction in the doomed Alpha line, is unveiled. Regardless of who wins bragging rights for technical computing, both processors will offer memory and I/O bandwidth far ahead of their competitors and both should do quite well on commercial workloads.
Sun Microsystems continues to slowly upgrade its UltraSPARC-III line in the face of an increasingly difficult competitive environment. Sun recently introduced its copper process based version of the US-III at 900 MHz. The latest device ostensibly includes a fix to the prefetch buffer bug that vexed the earlier aluminum based device. Far more interesting than the new silicon was the latest version of Sun's compiler. It raised the new copper US-III/900's SPECfp2k score by roughly 20% by spectacularly accelerating one of the 14 programs in the suite using an undisclosed optimization. A recent call was issued for new programs for the next generation of the SPEC CPU benchmark. Tentatively named SPEC 2004, it now seems like it couldn't come soon enough.
McKinley: Little more Logic, Lots more Cache
The most striking aspect of McKinley is its size and transistor count. Weighing in at a hefty 220 million transistors, this 0.18 um device occupies a substantial 465 mm2 of die area. The majority of McKinley's transistor count is tied up in its cache hierarchy. It is the first microprocessor to include three levels of cache hierarchy on chip. The first level of cache consists of separate 16 KB instruction and data caches, the second level of cache is unified and 256 KB in size, and the third level of cache is an astounding 3 MB in size. The die area consumed by the final level of on-chip cache can be seen in the floorplan of the McKinley and some representative server and PC class MPUs shown in Figure 1.
Figure 1 Floorplan of McKinley and Select Server and PC MPUs.
The Itanium (Merced) floorplan is shown as blank because although its chip floorplan has been previously disclosed its die size is still considered sensitive information by Intel and has not been released. The outlines shown indicate the range of likely sizes of the Itanium die based on estimates from a number of industry sources.
Both the first and second generation IA64 designs, Itanium/Merced and McKinley, are six issue wide in-order execution processors. In-order execution processors cannot execute past stalled instructions so it is important to have low average memory latency to achieve high performance. This focus on the memory hierarchy can be clearly seen in the McKinley [1]. Although it is not surprising that the on-chip level 3 cache in McKinley is much faster than the external custom L3 SRAMs used in the Itanium CPU module, it is interesting to see how much faster in terms of processor cycles the McKinley level 1 and 2 caches are despite the McKinley's 25 to 50 percent faster clock rate in the same 0.18 um aluminum bulk CMOS process.
The improvement in average memory latency between Itanium and McKinley can be approximated using the comparative access latencies presented by Intel at their last developers conference, combined with representative hit rates based on the size of each cache in the two designs and an assumed average memory access time of 160 ns. This data is shown in Table 1.
CPU   | Processor                | Itanium | McKinley
      | Frequency (MHz)          | 800     | 1000
L1    | Size (KB)                | 16      | 16
      | Latency (cycles)         | 2       | 1
      | Miss rate                | 5.0%    | 5.0%
L2    | Size (KB)                | 96      | 256
      | Latency (cycles)         | 12      | 5
      | Global Miss rate         | 1.8%    | 1.1%
L3    | Size (MB)                | 4       | 3
      | Latency (cycles)         | 21      | 12
      | Global Miss rate         | 0.5%    | 0.6%
Mem   | Latency (ns)             | 160     | 160
      | Latency (cycles)         | 128     | 160
Total | Average Latency (cycles) | 3.62    | 2.34
      | Average Latency (ns)     | 4.52    | 2.34
The back of the envelope type calculations in Table 1 suggest that a load instruction will be executed by McKinley with about half the average latency, in absolute time, that it would see on Itanium. No doubt this is a major contributor to the much higher performance of the second generation IA64 processor. Although the large die area of McKinley suggests a substantial cost premium compared to typical desktop MPUs, for large scale server applications the extra silicon cost is insignificant compared to the overall system cost budget. In fact, from the system design perspective, the ability to reasonably forgo board level cache probably more than pays for the extra silicon cost of McKinley through reduction of board/module area, power, and cooling requirements per CPU. Large scale systems based on the EV7 will also eschew board level cache(s), although with the Alpha it is the greater latency tolerance of the out-of-order execution CPU core plus the integration of high performance memory controllers that permit this, rather than gargantuan amounts of on-chip cache.
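The totals in Table 1 can be reproduced with a simple weighted sum: every load pays the L1 latency, and each further level's latency is added in proportion to the fraction of loads that reach it. The article does not spell out its exact method, so the formula and the function names below are an assumption that happens to match the published totals.

```python
def mem_latency_cycles(latency_ns, freq_mhz):
    """Convert a memory latency in nanoseconds to CPU cycles."""
    return latency_ns * freq_mhz / 1000.0


def average_latency_cycles(l1, l2, l3, mem, l1_miss, l2_global_miss, l3_global_miss):
    """Average load latency in cycles (assumed weighted-sum model).

    l1, l2, l3, mem are latencies in cycles; the miss rates are the
    fractions of all loads that miss in L1, L2 and L3 respectively.
    """
    return l1 + l1_miss * l2 + l2_global_miss * l3 + l3_global_miss * mem


def cycles_to_ns(cycles, freq_mhz):
    """Convert cycles to nanoseconds at the given clock rate."""
    return cycles * 1000.0 / freq_mhz


# Values taken from Table 1.
itanium = average_latency_cycles(2, 12, 21, mem_latency_cycles(160, 800),
                                 0.05, 0.018, 0.005)
mckinley = average_latency_cycles(1, 5, 12, mem_latency_cycles(160, 1000),
                                  0.05, 0.011, 0.006)

print(round(itanium, 2), round(cycles_to_ns(itanium, 800), 2))
print(round(mckinley, 2), round(cycles_to_ns(mckinley, 1000), 2))
```

With the Table 1 inputs this gives the same averages as the table, and the ratio of the two nanosecond figures is where the "about half" comes from.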
Besides the greatly enhanced cache hierarchy, the McKinley will boast two more "M-units" than Itanium. These are functional units that perform memory operations as well as most types of integer operations. In a recent article I speculated about the nature of McKinley design improvements. I suggested that it would contain 2 more I-units and 2 more M-units than Itanium in order to simplify instruction dispatch and reduce the frequency of split issue due to resource oversubscription. In IA64 parlance, both I-units and M-units can execute simple ALU based integer instructions like add, subtract, compare, bitwise logical, simple shift and add, and some integer SIMD operations. I-units also execute integer instructions that occur relatively infrequently in most programs but require substantial and area intensive functional units. These include general shift, bit field insertion and extraction, and population count.
Because the integer instructions that cannot be executed by an M-unit are relatively rare, the McKinley designers saved significant silicon area with little performance loss by only adding two M-units (for a total of four) and staying with the two I-units of Itanium. Data on the relative frequency of different integer operations suggest that the vast majority of integer operations, about 90%, that occur in typical programs are of the type that can be executed by either an M-unit or I-unit [2]. If we consider a random selection of six integer operations, each with a 90% chance of being executable by an M-unit, then the odds are better than 98% that any six instructions are compatible with the MMI + MMI bundle pair combination and can be dual issued by McKinley. Thus there is practically no incentive to add two extra I-units to McKinley to permit the dual issue of the MII + MII bundle pair combination.
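The "better than 98%" figure is consistent with a plain binomial estimate. The sketch below assumes the six operations are independent, that each has a 10% chance of needing an I-unit, and that the MMI + MMI pair can issue as long as no more than two of the six need an I-unit. It ignores ordering and bundle template constraints, and the function name is made up for illustration.

```python
from math import comb


def prob_dual_issue(n_ops=6, p_m_capable=0.9, i_slots=2):
    """Probability that at most `i_slots` of `n_ops` operations need an I-unit."""
    p_i_only = 1.0 - p_m_capable
    return sum(
        comb(n_ops, k) * p_i_only ** k * p_m_capable ** (n_ops - k)
        for k in range(i_slots + 1)
    )


print(f"{prob_dual_issue():.4f}")
```

Under these assumptions the result is a little over 0.98, in line with the odds quoted above.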
One curiosity in the McKinley disclosure was the fact that the basic execution pipeline was revealed to be 8 stages long. Although this is still 2 stages shorter than the pipeline in the slower clocked Itanium, it is one more stage than the 7 stages previously attributed to McKinley [3]. Whether this represents a slightly different way of counting the pipe stages or an actual design change isn't clear. Ironically, it has long been rumored that the Itanium pipeline was stretched by at least one stage quite late in development. It will be interesting to see if the new IA64 core under development by the former Alpha EV8 design team (now at Intel) also suffers this strange pipeline growth affliction.
Hammering x86 into the 64 bit World
In October AMD revealed some aspects of K8, its next generation x86 core code-named Hammer [4]. This new design is primarily distinguished by being the first processor to implement x86-64, AMD's extension to the x86 instruction set that supports 64 bit flat addressing, 64 bit GPRs, as well as other enhancements. As can be seen in Figure 2, the Hammer core heavily leverages AMD's highly successful K7 core.
Figure 2 Comparison of K7 Athlon and K8 Hammer Organization
The back end execution engine of the K8 Hammer core is basically identical to that of the K7 except that the integer schedulers are expanded from 5 to 8 ROPs. The increase in the integer out-of-order instruction scheduling capability this implies may have been intended to better hide the data cache's two cycle load-use latency, and thus slightly increase per clock performance. An alternative hypothesis is that the latency of some integer operations may have been increased to allow higher clock rates and the change was made to prevent a slight loss in per clock performance. The basic execution pipeline of the K7 and K8 are compared in Figure 3.
Figure 3 Comparison of K7 and K8 Basic Execution Pipeline
The K8 execution pipeline has two more stages than K7, and these new stages seem to be related to x86 instruction decode and macro op distribution to the integer and floating point schedulers. Although some of the stages have been renamed it appears that the final five pipe stages, representing the back end execution engine, are comparable. This is unsurprising as the most complex and difficult task in an x86 processor like the K7 or K8 is the parallel parsing of up to three variable length x86 instructions from the instruction fetch byte stream and their decoding into groups of systematized internal operations. In comparison, the execution engine is hardly much more complex than a typical out-of-order execution RISC processor.
Both the block diagram and execution pipeline indicate that AMD has spent nearly all its effort in Hammer development at revamping the front end of the K7 design. Some of the extra degree of pipelining may be related to the extra degree of complexity in decoding yet another level of extensions (x86-64) on top of the already Byzantine x86 ISA. Some of the increase may be related to increased flexibility in internal operation dispatch to reduce the occurrence of stall conditions and increase IPC. And, some of the increase may simply reflect a reduction in the work per stage to increase the clock scalability relative to the K7 core. Without a detailed description of each of the pipeline stages in the K8 it is difficult to correlate front end pipe stages in the K7 to the K8, and next to impossible to assess how the benefit of the extra two pipe stages is allocated between accounting for increased ISA complexity, measures to increase IPC, and reduction in timing pressure per pipe stage to allow higher clock rates.
Although the 64-bit instruction set extension makes for attention grabbing headlines in the technical trade press, the major performance enhancements in the Hammer series are much more prosaic from a processor architecture point of view. These enhancements are the direct integration of interprocessor communications interfaces and a high performance memory controller. Like a "poor man's EV7", the Hammer includes three bi-directional HyperTransport (HT) links and a memory controller supporting a 64 or 128-bit wide DDR memory system using unbuffered or registered DIMMs. With the latter, a K8 processor can directly connect to 8 DIMMs, although this number may be reduced at the higher memory speeds supported. It is interesting to compare the results of the same design philosophy applied to the high-end server and mainstream PC segments of the MPU market as shown in Table 2. Power and clock rates for the Hammer MPU are estimates.
Feature           | Alpha EV7 [5]                               | K8 Hammer
Process           | 0.18 um bulk CMOS                           | 0.13 um SOI CMOS
Die Size          | 397 mm2                                     | 104 mm2
Power             | 125 W @ 1.2 GHz                             | ~70 W @ 2 GHz
Comm Links        | 4 links, each 6.4 GB/s; one 6.4 GB/s IO bus | 3 links, each ~6 GB/s
Memory Controller | 2 x 64 bit DRDRAM, 12.8 GB/s peak           | 64 or 128 bit DDR, 2.7 or 5.4 GB/s peak
Package           | 1443 LGA                                    | ?
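As a quick check on the scale of the difference, the peak bandwidths in Table 2 can simply be added up per chip. This is only a sketch: it treats the Hammer figures as exact even though the article marks them as estimates, and the variable names are made up for illustration.

```python
# Peak bandwidth figures from Table 2, in GB/s.
ev7_total = 4 * 6.4 + 6.4 + 12.8          # four links, IO bus, memory controller

hammer_links = 3 * 6.0                    # "~6 GB/s" per link, an estimate
hammer_total_64bit = hammer_links + 2.7   # 64 bit DDR memory interface
hammer_total_128bit = hammer_links + 5.4  # 128 bit DDR memory interface

print(round(ev7_total, 1), round(hammer_total_64bit, 1), round(hammer_total_128bit, 1))
```

The EV7 sum comes to 44.8 GB/s, which matches the chip pin bandwidth quoted in the title of reference [5]; the Hammer totals come to roughly half of that.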
Although the Intel McKinley and AMD Hammer are both 64 bit MPUs, these devices are directed at different markets. While the large and expensive McKinley will target medium and high-end server applications, the first member of the Hammer family, code named "Clawhammer", will target the high end desktop PC market. That is not to say that McKinley will outperform the Clawhammer device. Indeed, I expect the AMD device will easily beat the much slower clocked IA64 server chip in SPECint2K and many other integer benchmarks, as well as challenge much faster clocked Pentium 4 devices in both integer and floating point performance.
Exactly how much performance the Hammer core may provide is the subject of some controversy. AMD's Fred Weber was quoted as stating the Hammer core could offer SPECint2k performance as much as twice that of current processors. Although this comment is vague enough to drive a truck through (twice as fast as the best AMD processor? Best x86 processor? Best processor announced but not yet shipping? IA-32 or x86-64 code? Clawhammer or the big cache Sledgehammer?) a few web based news sites interpreted this comment as meaning the Hammer would achieve 1400 SPECint2k and now some people are incorrectly attributing this figure to Weber himself. Keep in mind that no Hammer device has even taped out as of the end of 3Q01 let alone been fabricated, debugged, verified, and benchmarked at the target clock frequency. Whatever figure Mr. Weber had in mind was derived from architectural simulation, and for a benchmark suite as cycle intensive as SPEC CPU, simulation results are approximate at best [6][7]. As has been shown time and time again, it is best not to count performance chickens too closely before the silicon eggs hatch.
Alpha Goes Out With a Bang not a Whimper
Although Compaq announced the wind down of Alpha development in June and transferred nearly the entire EV8 development team to Intel over the summer there is still one more surprise in store for the computer industry. The EV7, the final major design revision in store for Alpha, has been the subject of intense testing, verification, and system integration exercises since late spring. This design has been in the pipeline for a long time. It was first announced more than three years ago and finally taped out in early 2001. Because of the complexity of this device (basically a complex CPU and large scale server chipset all on one die) and the incredible degree of shakedown server class MPUs and systems undergo, the EV7 will not go into volume production until the second half of 2002. To bridge the gap between current products and EV7 based systems Compaq will shortly release a 1.25 GHz version of the workhorse EV68.
Although general details of the EV7 design have been in the public domain for more than three years, and specific facts about the performance of this MPU's router and memory controllers were disclosed in February, I think the performance it will achieve when officially rolled out in 2H02 will surprise and dismay many in the industry (possibly including senior Compaq management). At the Microprocessor Forum in October Compaq's Peter Bannon unveiled some preliminary performance numbers for the EV7, namely 804 SPECint2k, 1253 SPECfp2k, and roughly 5 GB/s STREAM performance.
Although these numbers are quite good in absolute terms, comparable to the fastest speed grade POWER4 running in a contrived and unrealistic hardware configuration, the numbers failed to live up to my estimates given in a previous article. However, former members of the Alpha design team have privately confirmed my suspicions that Mr. Bannon was clearly sandbagging the EV7 numbers, keeping a not insignificant amount of performance off the table. For a product still more than six months from release that is a not unexpected tactic. I still hold the opinion that when it is all said and done the EV7 has a good chance of being the highest performance general purpose microprocessor ever fabricated in 0.18 um technology, a fitting ending to a remarkable and tragic technological saga (EV79, an EV7 shrink to 0.13 um SOI is on the roadmap for 1H04 but the continued turmoil at Compaq suggests a healthy amount of scepticism is in order).
Sun's Surprising Spike SPARCs SPECulation
Sun recently introduced a new member of its UltraSPARC-III family. This new 900 MHz device differs from earlier US-III parts by the use of copper interconnect instead of aluminum. Although Sun submitted official SPEC scores for a 900 MHz Sun Blade 1000 Model 1900 using an aluminum US-III in late 2000, yield was apparently poor and this speed grade wasn't generally available. A rarely occurring bug related to a prefetch buffer inside the US-III was discovered and as a work around this feature was disabled in firmware. Unfortunately for Sun Microsystems, this caused the SPECfp_base2k score for the Model 1900 to drop from an already lackluster 427 to a lamentable 369 in a second SPEC submission in the spring of 2001. So it comes as no small surprise that the Sun Blade 1000 Model 900 Cu workstation, based on the new copper processor turned in a SPECfp_base2k score of 629 in a recent submission. Both the Model 1900 and Model 900 Cu versions of the Blade 1000 feature 8 MB of L2 cache.
It is possible that the copper US-III incorporates improvements beyond a fix to the prefetch buffer bug as well as improvements to system level hardware between the Model 1900 and Model 900 Cu. However it appears much of the improvement can be attributed to the use of the Sun Forte 7 EA compiler instead of the earlier Forte 6 update 1 compiler used to generate the 427 and 369 scores. The reason why I say that with confidence can be seen quite readily in the graph in Figure 4.
Figure 4 SPECfp_base2k Component Scores for US-III and Competitors
The SPECfp_base2k scores for the 14 sub-component programs for the pre-bug fix Sun Blade 1000 Model 1900 submission using the Forte 6 compiler are compared to the recent Sun Blade Model 900 submission using the Forte 7 compiler. In addition, scores for the Itanium (4MB, 800 MHz version in an HP i2000), Alpha EV68C (1000 MHz version in an ES45/1000), and POWER4 (1300 MHz version in a pSeries 690 Turbo) are provided for reference. It is the new compiler's score on the 179.art program that quite literally stands out from the rest. Although several other programs see appreciable improvement (the 183.equake score nearly triples), the new compiler increases the score of 179.art by more than 800%. In absolute terms this score, 8176, is more than four times higher than that achieved by the Alpha EV68 and POWER4, MPUs that easily beat the copper US-III on nearly every other SPECfp2k program. The 179.art score achieved by the Forte 7 compiler is vital to the new machine's pumped up SPECfp_base2k score. If you leave 179.art out of the geometric mean then its SPECfp_base2k score would drop by nearly 18% from 629 to 516.
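The effect of dropping 179.art can be checked from the overall score alone, because SPECfp_base2k is a geometric mean of the 14 component scores. The sketch below uses only the numbers quoted above (629 overall, 8176 for 179.art); the function name is made up for illustration.

```python
def geomean_without(overall, n_components, dropped_score):
    """Geometric mean of the remaining components after removing one score.

    overall**n_components is the product of all component scores, so
    dividing by the dropped score leaves the product of the other n-1.
    """
    remaining_product = overall ** n_components / dropped_score
    return remaining_product ** (1.0 / (n_components - 1))


without_art = geomean_without(629, 14, 8176)
drop_percent = 100.0 * (629 - without_art) / 629

print(round(without_art), round(drop_percent, 1))
```

This gives a score of about 516 without 179.art, a drop of a bit under 18%, which agrees with the figures in the paragraph above.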
This remarkable improvement on 179.art is unusual in the field of compiler engineering where single digit percentage performance increases are often considered major victories. So it is no surprise that Sun's achievement immediately raised suspicions among industry observers and competitors about the nature of the optimization employed by the Forte 7 compiler. It is hard not to think of Intel's infamous eqntott compiler bug that erroneously increased the SPECint92 score of its processors by about 10% until caught and fixed [8]. This bug used an illegal optimization that allowed the output of 023.eqntott to pass result checking with the test data used but was invalid in the general case.
Although the exact nature of the new Sun optimization isn't known, suspicion has fallen on several inner loops within the 179.art program. Speculation is that this code was originally written in FORTRAN and converted to C. Because FORTRAN and C access two dimensional arrays in opposite row and column order it is presumed that 179.art accesses arrays by the wrong index in the innermost loop causing poor cache locality. It is possible that the new Sun compiler recognizes this situation and turns the nested loops that step through the array accesses "inside out" and achieves much lower cache miss rates. Whatever the exact nature of the Sun optimization turns out to be there is the question of whether it violates one of the SPEC rules, namely "Optimizations must improve performance for a class of programs where the class of programs must be larger than a single SPEC benchmark or benchmark suite".
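To illustrate why loop order matters, here is a minimal sketch of the general loop interchange idea. It is not Sun's actual optimization, which is undisclosed. It walks a row-major (C-style) 64 x 64 array of 8-byte elements in both loop orders and counts how often consecutive accesses fall on a different 64-byte cache line. The array size, element size, line size, and function names are all assumptions chosen for illustration, and the count is a crude proxy for locality rather than a cache model.

```python
def addresses(rows, cols, column_inner, elem_size=8):
    """Yield byte offsets into a row-major (C-style) rows x cols array."""
    if column_inner:
        # Inner loop walks along a row: stride is one element.
        for i in range(rows):
            for j in range(cols):
                yield (i * cols + j) * elem_size
    else:
        # Inner loop walks down a column: stride is a whole row.
        for j in range(cols):
            for i in range(rows):
                yield (i * cols + j) * elem_size


def cache_line_switches(offsets, line_size=64):
    """Count accesses that land on a different cache line than the previous one."""
    switches = 0
    previous_line = None
    for offset in offsets:
        line = offset // line_size
        if line != previous_line:
            switches += 1
            previous_line = line
    return switches


fortran_style = cache_line_switches(addresses(64, 64, column_inner=False))
interchanged = cache_line_switches(addresses(64, 64, column_inner=True))
print(fortran_style, interchanged)
```

In this toy setup the column-walking order touches a new cache line on every access, while the interchanged order does so only once per eight elements.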
Without knowing the nature of the new Sun optimization it is impossible to say whether Sun should be praised or scolded. But here are the words of Sun engineer John Henning who made the following comments in a November 27 post to the comp.arch usenet news group:
"Our compiler team believes that what Sun has done with art is (1) the result of perfrectly [sic] legitimate optimizations (2) compliant with SPEC's rules and (3) not appropriate for further discussion - if you want to figure out to make art faster, go work on it yourself, don't ask Sun how we did it!"
With the widespread attention this incident has engendered within the industry we can presume that compiler and benchmarking experts working for Sun's competitors have closely scrutinized the code Forte 7 generates for 179.art. The fact that Sun's new scores haven't been withdrawn from the SPEC official web site yet suggests that Mr. Henning is correct. No doubt we can expect competitors' processors to score much higher on 179.art in the months and years to come as the Sun optimization migrates to other compilers. Depreciation of a benchmark's value is seldom as spectacular as in the case of 179.art, but still naturally occurs over time and provides incentive to accelerate the development of a successor to the SPEC CPU 2000 benchmark suite (which no doubt will not include 179.art). A message soliciting programs for this new suite, tentatively named SPEC 2004, was posted on comp.arch on December 28. Ironically the author of this message, the secretary of the SPEC CPU subcommittee, is none other than the previously mentioned John Henning.
Conclusion
It is comforting to see the pace of innovation in the microprocessor field shows no sign of slackening. The great seesaw battle between Intel and AMD for share of silicon's richest prize, the x86 microprocessor market, is about to enter a new phase with the imminent release of the 0.13 um Northwood Pentium 4. Although AMD will also migrate its K7 core to 0.13 um later in 2002 with both bulk and SOI versions, it is unlikely to be in the position to regain the performance advantage over Intel it previously achieved with the T-bird and XP Athlon until its new 64-bit Hammer core ships. Unlike AMD, Intel plans to reserve its 64-bit offerings for the high-end market. With McKinley Intel hopes to address the significant performance difficulties seen in the Itanium in part by taking advantage of its capacious manufacturing facilities to incorporate a huge amount of on-chip cache on its sizable die.
It seems like the time it takes for new ideas and features to migrate down from high-end server MPUs to mass-market devices is shrinking. The integration of high performance interprocessor communication links and memory controller(s) onto a processor die has been on the drawing board for many years and will soon be realized in the high end server market in the form of the EV7. Remarkably, the same concepts will appear in a mass-market x86 processor, the first of AMD's Hammer series, not too much later. Although these features will naturally be more limited in scope in the x86 device to keep costs under control, they should still provide a large boost in performance from significantly reduced memory access latency as well as a dramatic reduction in the cost of producing multiprocessor systems based on this device.
Few topics in the computer and microprocessor field can raise a controversy, as well as blood pressure, as quickly as benchmarks and benchmarking. Sun managed to throw a hand grenade into the simmering debate between the supporters and detractors of the industry standard SPEC CPU benchmark by speeding up the execution of one of the fourteen programs in the floating point suite by nearly an order of magnitude through the use of a previously unexploited compiler optimization. This in turn raised the SPECfp2k score of its latest US-III processor by roughly 20%. We can now look forward to the spectacle of competing firms scrambling to reverse engineer Sun's new compiler trick and incorporate the same voodoo into their own wares.
References
[1] Krewell, K., "Intel's McKinley Comes Into View", Microprocessor Report, October 2001, Volume 15, Archive 10.
[2] Hennessy, J. and Patterson, D., "Computer Architecture A Quantitative Approach", Morgan Kaufmann Publishers Inc., 1990, ISBN 1-55860-069-8, p. 181.
[3] "Advance Program, 2001 IEEE International Solid-State Circuits Conference", p. 35.
[4] Weber, F., "AMD's Next Generation Microprocessor Architecture", October 2001, Downloaded from AMD web site.
[5] Jain, A. et al, "A 1.2 GHz Alpha Microprocessor with 44.8 GB/s Chip Pin Bandwidth", Digest of Technical Papers, ISSCC 2001, Feb 6, 2001, p. 240.
[6] Dulong, C. et al, "The Making of a Compiler for the Intel Itanium Processor", Intel Technology Journal, Q3 2001, Downloaded from Intel web site.
[7] Desikan, R. et al, "Measuring Experimental Error in Microprocessor Simulation", Digest of Technical Papers, 28th Annual International Symposium on Computer Architecture, June 2001.
[8] "Intel OverSPECs Parts", Microprocessor Report, January 22, 1996, Volume 10, Number 1, P. 5.
Compaq and Alphacide
by
Anonymous Coward
·
· Score: 4, Interesting
There is an interesting discussion over in comp.arch on Usenet about
Compaq, Alpha, and the Itanium. The thread is called
Alphacide. Interesting stuff. It appears
that Compaq drank the Kool-Aid.
By the way, Pricewatch is quoting about $3K for the low-end Itaniums running at about 700 MHz.
No thanks.
So why do I need 64bits?
by
thogard
·
· Score: 0, Troll
99.99% of everything my computers do is <32 bits. So if I get a 64 bit cpu, does that mean that my computer can slag around extra unused bits just for fun?
Re:So why do I need 64bits?
by
Anonymous Coward
·
· Score: 1, Insightful
I believe the power lies in the fact that with 64 bits you can do two 32-bit operations at a time.. Speeeeeed.
Re:So why do I need 64bits?
by
4im
·
· Score: 5, Interesting
One word: addressing. With those 32 bits,
you can
typically address up to 2 gig files on your
machine - which is a limit easily encountered when
you start working with video, for instance.
It took hacks to get 4 gig of RAM working on x86
with the linux kernel.
Go 64 bit, and that limit vanishes. You keep your
linear addressing, none of those ugly segments like
in the infamous real mode of PC-XT times.
I don't see what's really new about it all though,
we've had 64 bit since Alpha, and there are several
64 bit architectures around. It may not be
mainstream yet, but will IA 64 or Hammer really
change that (soon)? Allow me to have doubts.
Re:So why do I need 64bits?
by
vidarlo
·
· Score: 1, Informative
Maybe, but not completely true.
Linux does some hacks to work around this...
But things will be made easier with 64 bits.
For example, Linux on 32-bit Intel x86 supports up to 8 GB of memory.
This is a hack, and is not native to x86.
Re:So why do I need 64bits?
by
thogard
·
· Score: 3, Insightful
Since that was marked troll, I'll blow more karma...
With most operations, 64 bits isn't 2x as fast, it's 1x as fast, unless you deal with the stack, in which case it could be even slower.
Addressing has little to do with word size. The 8088 shows that.
Suns running in 64 bit mode are often slower than running in 32 bit mode.
Nintendo 64 games are all 32 bit code with just a few 64 bit operations. The good emulator proved that.
As far as doing two 32 bit ops at once, I still don't need a 64 bit data path to do that, I just need several 32 bit data paths. What I don't need is to dump a bunch of unused 64 bit numbers on the stack every time an exception happens (which one of my computers has done about 1047563950 times in the last 51 days).
Re:So why do I need 64bits?
by
Phosphor3k
·
· Score: 0, Flamebait
Have you ever used a computer before?
Re:So why do I need 64bits?
by
Washizu
·
· Score: 2, Informative
It doesn't vanish, but it does open the room for a lot more RAM.
With 32 bits you can address 2 gigabytes worth of addresses:
0000000000000000000000000000000 is one address
0000000000000000000000000000001 is another
0000000000000000000000000000010 is another, and so on.
With 64 bits I believe you can address up to 8 exabytes of RAM, which is equal to 8192 petabytes or 8589934592 gigabytes. It shouldn't be too long before some program out there requires most of that to run even though now it seems like infinite RAM.
Re:So why do I need 64bits?
by
Anonymous Coward
·
· Score: 0
With 32 bits you can address 4GB.
With 64 bits you can address twice what you calculated.
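To make the numbers in this exchange concrete, here is a small sketch that counts the byte addresses an n-bit flat pointer can form. The helper name `addressable_bytes` is made up for illustration, and the figures use binary units (GiB, EiB).

```python
GIB = 2**30  # one binary gigabyte
EIB = 2**60  # one binary exabyte

def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses an n-bit linear pointer can form."""
    return 2**bits

for bits in (31, 32, 64):
    size = addressable_bytes(bits)
    print(f"{bits}-bit: {size} bytes = {size // GIB} GiB")
```

So 31 bits gives the 2 GiB figure, 32 bits gives 4 GiB, and 64 bits gives 16 EiB (17179869184 GiB), which is twice the 8 exabytes quoted above.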
Re:So why do I need 64bits?
by
rhost89
·
· Score: 1
Umm excuse me, have you guys heard of protected mode? 32 bits will buy you 48. There's something called a Segment Selector, which is 16 bits, and then you have your offset, which is 32, that the GDT register points to. Hello??? As far as getting Linux to address 4 gigs of RAM, it should have been able to do that right after switching into pmode. As far as 64 bit goes in general, it's not going to give you anything more than what you can do now with MMX and SSE registers. All of it is just a bunch of hype and none of this is going to help us for at least the next decade.
-- I will bend your mind with my spoon
Re:So why do I need 64bits?
by
Anonymous Coward
·
· Score: 0
So the other reply you got was not exactly nice,
but still valid IMHO. Have you ever done assembler
programming?
Depending on what you do and how you code, you
can do twice as much in 64bit as in 32bit.
Not on average, for sure, but for some stuff, yes,
it's possible.
For your reply about addressing: do you really want
to go back to segmented memory? Why do you think
there are these 64k resource limits in Windows?
I challenge you to implement the same application
in x86 real mode code (that 8088 you spoke of),
and do the same on a Motorola 68HC11
microcontroller (hint: linear addressing). Both
do 16bit code. Have fun with the memory management.
You'll notice the difference pretty fast.
As for your 2x 32bit ops: well gee, that's what's
being done in the current 32bit CPUs. Don't
discount the utility of 64bit for people out there
just because _you_ don't know how to use it.
Oh yeah, for the guy that said the bit about 4G
address space with 32bits: when jumping around
in [assembler] code, you want to be able to use
negative numbers -> gives you +- 2GB range, but
essentially, as you use signed numbers, your
[linear] address space is reduced to 2GB.
Re:So why do I need 64bits?
by
Anonymous Coward
·
· Score: 0
So you also want to go back to that horrible
segmented memory mess? Then go ahead, have fun
with it. I sure prefer linear addressing.
You also don't seem to have caught the little bit
about using _signed_ numbers in jumps (you know,
to distinguish forward/backward jumps) - here's
where those 4GB get cut in half, i.e. 2GB. And to
get around that problem _is_ an ugly hack.
And you forget: MMX and SSE are an ugly hack for
what the CPU should have been able to do in the
first place. And limited to x86 at that. There's
more out there in the world, go check it out.
There are very good reasons why data centers don't
use PCs but 64bit architecture machines (Alpha,
UltraSparc, SGI stuff, Power, etc.), and have been
for some time. I'll agree though, that you may not
need those 64bit on your desktop - even if some
people already do so.
Re:So why do I need 64bits?
by
rabidcow
·
· Score: 1
With 32 bits, you should be able to access 4 gigs of memory; I assume there's something in the system that uses one bit as a flag to limit it to 2 gigs.
If you *really* need more memory, using segment registers (on 386 even) will give you 46 bits, for 64 terabytes (per task btw, not total). But that's not linear, and it's VIRTUAL, not PHYSICAL memory. The hardware address bus is still only 32 bits wide (well, technically a bit less, but only because memory isn't arranged in bytes), so you can still only access 4 gigs of physical ram without EXTREME ugliness.
Re:So why do I need 64bits?
by
rhost89
·
· Score: 1
Protected mode is not segmented. Read the Intel docs: you have a GDT with the selector and the offset within the selector, which lets you access 4 GB of mem. And for the love of god, using unsigned numbers is not a hack; when's the last time you had -2GB of memory, hmmm..... Clear the damn SF and be done with it. MMX and SSE give you 64 bit registers, and effectively give you the same as x86-64. Use al/h,ax,eax for your 8,16, and 32, and use your mm0 etc.. for your 64. If you need to get the high or low half and place it in eax etc. just copy it into and mov it. Even though it requires 2 more instructions to do it, just because its not ocupying the same register as eax dosent mean its an ugly hack although i would prefer that it be an extention to the gpr.
-- I will bend your mind with my spoon
Re:So why do I need 64bits?
by
Anonymous Coward
·
· Score: 0
From above:
>With 64 bits I believe you can address up to 8
>exabytes of RAM, which is equal to 8192 petabytes
>or 8589934592 gigabytes. It shouldn't be too long
>before some program out there requires most of
>that to run even though now it seems like
>infinite RAM.
I don't think so. Moore's law only goes so far. Observe. 2^64 is approximately 10^19. That is within a few orders of magnitude of one mole, roughly 10^23 atoms. So, assuming you have one mole of material as memory chips (that's a lot of memory chips, something like 40 grams or so, though I don't have the periodic table right in front of me), and each memory chip is composed of nothing but memory cells, and each memory cell uses only 10^4 atoms (including interconnects), you could just use up the 64 bit address space. In practice all those assumptions are hopelessly optimistic.
Using up a 64 bit address space is on the absolute fringe of what might someday be possible without things like quantum computing, which wouldn't precisely have bit-widths anyway.
Now Moore's law will continue for a while, and I can imagine standard computers eventually coming with hundreds of gigabytes, or even thousands of gigabytes of RAM, but billions of gigabytes? I doubt that will occur in any reasonable timeframe (say the next 100 years).
Though if you do believe in the endless continuation of Moore's law then the 64 bit address space should be enough for us for about 50 years.
I think the only thing that could use that much RAM anyway would be something like the simulation of macroscopic objects on an atomic level, not something that is likely to happen for the foreseeable future.
Just my $.02
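The back-of-envelope estimate above can be redone as a short sketch. Everything here is an assumption for illustration: one memory cell per address, the poster's 10^4 atoms per cell, a hypothetical starting point of 2^28 bytes (256 MiB) of RAM, and one doubling every 18 months. The function names are made up.

```python
AVOGADRO = 6.02214076e23  # atoms per mole

def moles_of_atoms(address_bits: int, atoms_per_cell: float) -> float:
    """Moles of material needed if every address maps to one memory cell."""
    return (2**address_bits) * atoms_per_cell / AVOGADRO

def years_to_fill(start_bits: int, target_bits: int, years_per_doubling: float) -> float:
    """Years of steady doubling to grow from 2**start_bits to 2**target_bits bytes."""
    return (target_bits - start_bits) * years_per_doubling

# Assumed inputs: 10^4 atoms per cell, 256 MiB today, doubling every 1.5 years.
print(f"{moles_of_atoms(64, 1e4):.2f} mol")
print(years_to_fill(28, 64, 1.5), "years")
```

Under those assumptions a full 64-bit address space needs about 0.3 mol of atoms, and 36 doublings at 18 months each take 54 years, in line with the "about 50 years" guess. A slower doubling rate stretches that out proportionally.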
Re:So why do I need 64bits?
by
mr3038
·
· Score: 2
Even though it requires 2 more instructions to do it, just because its not ocupying the same register as eax dosent mean its [MMX] an ugly hack although i would prefer that it be an extention to the gpr.(sic)
I surely would call MMX an ugly hack. Not because one needs to use normal registers to access data but because MMX uses FPU registers. Hello? To use instructions designed for 64bit integer calculations, you need to disable the FPU? And remember, this was because OSes couldn't have supported task switching without changes if there hadn't been a hack like this. MMX is useful for such special cases that practically no compiler generates MMX code - it's always hand-tuned assembler.
-- _________________________
Spelling and grammar mistakes left as an exercise for the reader.
Re:So why do I need 64bits?
by
matrix29
·
· Score: 1
With 32 bits you can address 4GB.
With 64 bits you can address twice what you calculated.
Bzzzzzt! Wrong.
If you want TWICE the addressing you need 33 bits (each bit is 2^N thus 2^32 * 2 = 2^33). Open up the calculator and do 2^64. That's equal to 17179869184 GB. That should be enough even with WinBlows Bloatware.
-- "Face it, a nation that maintains a 72% approval rating on George W. Bush is a nation with a very loose grip on reality."
Re:So why do I need 64bits?
by
VAXman
·
· Score: 2
This is wrong. The maximum linear address on IA-32 is 2^32. The maximum virtual address is also 2^32. Segments can't give you more than that; they just adjust the base address and the limit. Although you can construct an address which goes up to 8GB (with a very large base and a large virtual offset), the address will wrap around to below 4GB.
And, the physical address is LARGER than 32 bits. It is 36 bits on Wmt. The physical bus on the processors has 36 bits for the address (well, actually 33 bits, since all addresses are chunk aligned, but that's an implementation detail).
FYI, on x86-64, the maximum linear and virtual addresses are 2^48, and the maximum physical address is 2^40.
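The wrap-around described above can be sketched as follows. This is a deliberately simplified model, not the real address translation: it ignores limit checks, privilege checks and paging, and the name `linear_address` is hypothetical.

```python
LINEAR_BITS = 32
LINEAR_MASK = (1 << LINEAR_BITS) - 1

def linear_address(segment_base: int, offset: int) -> int:
    """Form a linear address; the sum is truncated to 32 bits (simplified model)."""
    return (segment_base + offset) & LINEAR_MASK

# A large base plus a large offset overshoots 4 GB and wraps to a low address.
base = 0xFFFF0000
offset = 0x00020000
print(hex(linear_address(base, offset)))
```

Here the untruncated sum would be 0x100010000, just above 4 GB, but the 32-bit result is 0x10000. However many segments a task defines, every address they produce lands in the same 2^32-byte linear space.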
Re:So why do I need 64bits?
by
rabidcow
·
· Score: 1
"Memory is organized into one or more variable length segments, each up to four gigabytes (2^32 bytes) in size. A segments can have attributes associated with it which include its location, size, type (i.e., stack, code or data), and protection characteristics. Each task on an Intel486 microprocessor can have a maximum of 16,381 segments, each up to four gigabytes in size. Thus each task has a maximum of 64 terabytes (trillion bytes) of virtual memory."
I believe the trick is to mark segment registers as invalid and use some of the free bits in the descriptor to tell the OS where to load it from. When the program tries to access that segment, an exception is generated which allows the OS to swap in the appropriate data. It's much like virtual memory is usually done, but using the segment registers as "super" pages.
the physical address is LARGER than 32 bits. It is 36 bits on Wmt.
Oh, ok. 36 bits/64 gigs.
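For reference, the arithmetic behind the two figures in this post, as a quick sketch. The segment count and sizes come from the quoted manual text; the variable names are only for illustration.

```python
GIB = 2**30
TIB = 2**40

# Quoted manual: up to 16,381 segments per task, each up to 4 GB (2^32 bytes).
virtual_per_task = 16_381 * 2**32
print(f"virtual: {virtual_per_task / TIB:.2f} TiB (2^46 bytes is {2**46 // TIB} TiB)")

# A 36-bit physical address reaches 2^36 bytes.
physical = 2**36
print(f"physical: {physical // GIB} GiB")
```

The first figure is just under 64 TiB, which is where the "46 bits" mentioned earlier in the thread comes from, and the second is exactly 64 GiB.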
Re:So why do I need 64bits?
by
VAXman
·
· Score: 2
The last sentence of that is misleading, if not plain wrong. That is the maximum number of virtual addresses you can form within one task, yes. But those addresses map all to 2^32 bytes of memory.
Chapter 3 of volume 3 of the current IA-32 manual is much more clear on this.
The easiest way to look at this is through paging, which clearly has a 32 bit size.
I'd be curious of a more complete reference for that citation (URL?)
Re:So why do I need 64bits?
by
rabidcow
·
· Score: 1
Alas, they seem to have rewritten this section and removed that. At least I can't find it in any of the online docs for the Pentium. Hrmph... It used to be part of the "Architectural Overview", section 2.0
It's not really practical to do something like that anyway, at least on that scale, BUT the segment descriptors contain a "present" flag exactly like the page table entries.
Even in the old 1992 docs there's not a lot of detail as to how to do this, but I would imagine it would be almost identical to the way virtual paging is done. When a segment register is loaded with a segment whose descriptor is marked "not present," the OS loads the appropriate data into the linear 32-bit address space and continues, probably pushing out a segment that's not in use and clearing its present flag.
Honestly, I don't see how it would even be possible to do that with full 4gig segments, maybe that's why Intel removed it. For smaller segments, it should be possible, though it would mean a LOT of moving things around. (like virtual memory with REALLY big pages)
Way to go Intel, make me look stupid;)
Re:So why do I need 64bits?
by
Anonymous Coward
·
· Score: 0
Current 32bit processors can do two 32bit instructions at a time from the Pentium on (in Intel's line) because it had a U and a V pipe that instructions could be executed by.
With the PentiumMMX our 32bit processors, by your standard, became 64bit processors that could work on 8x8bit, 4x16bit, or 2x32bit integers at a time. With the PentiumIII and SSE they became "128bit processors" that could do 4x32bit floats.
Of course AMD's Athlons were only "64bit" because they could only do 2x32bit with 3DNow!, but they could do two of those per cycle, like the original Pentium with regular instructions, meaning it was 2x2x32bit, which is actually much more flexible than 4x32bit.
64bits only allows one thing, linear addressing of more than 4gigs.
AMD's x86-64 instruction set brings other things to the table that enhance performance (16 instead of 8 registers), but those could've been done with 32bit chips.
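To illustrate what "4x16bit at a time" means, here is a rough software model of a packed add: four independent 16-bit lanes inside one 64-bit word, each wrapping on overflow without carrying into its neighbour. It only mimics the idea behind a wrap-around packed add such as MMX's PADDW; real hardware does all lanes in one instruction, and the function name `packed_add16` is made up.

```python
def packed_add16(a: int, b: int) -> int:
    """Add four 16-bit lanes packed in 64-bit words, wrapping within each lane."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (a >> shift) & 0xFFFF
        y = (b >> shift) & 0xFFFF
        # Mask the sum so an overflow never carries into the next lane.
        result |= ((x + y) & 0xFFFF) << shift
    return result

a = 0x0001_0002_0003_FFFF
b = 0x0010_0020_0030_0001
print(hex(packed_add16(a, b)))
```

The lowest lane overflows (0xFFFF + 1) and wraps to 0 while the other three lanes are unaffected, which is the difference between a packed add and an ordinary 64-bit add.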
compiler optimization
by
Anonymous Coward
·
· Score: 0
*We can now look forward to the spectacle of competing firms scrambling to reverse engineer Sun's new compiler trick and incorporate the same voodoo into their own wares*
This will be excellent, duel rates of improvement= fasterer computers, the elbow in the rate of moores law. Still, applications for this tech have imagine a beowulf of these results as side effects.
Why is gcc produced code so slow?
by
Anonymous Coward
·
· Score: 0
Speaking of compilers, why does gcc suck so much even on ix86 architecture?
I couldn't believe the speedup I got when I installed Intel's Linux C compiler and recompiled my computational materials physics simulation code...
Re:Why is gcc produced code so slow?
by
jazzyjez
·
· Score: 2, Insightful
This is because the P4 is a very peculiar beast that needs many optimisations for the code to run fast. Indeed, when Intel first shipped it they had to ship a specially optimised MPEG decoder for it to appear any faster than the PIII on benchmarks. For more info, check this out:
Compilers are notoriously slow at catching up with the latest processor design, and you can probably expect gcc to catch up with the P4 around the time it's superseded by the 64-bit babies.
This is not to slur gcc - M$'s Visual Studio compiler suite hasn't yet been optimised for the P4 as far as I know (although I expect the .NET version will be) despite the vastly greater resources they have to throw at it...
Re:Why is gcc produced code so slow?
by
Lars+T.
·
· Score: 2
If you use Intel's C compiler (esp. when using -ipo (inter procedural optimization)), you may want to check the results. It sometimes trades speed for correct results. See this article (in German).
--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Re:Why is gcc produced code so slow?
by
Anonymous Coward
·
· Score: 0
GCC is a fully cross-platform compiler whereas Intel's compiler is optimized for the Pentium architecture.
The optimization gives great results for Intel's own processors but try using it to compile something for Athlon and you'll see why GCC is an excellent choice.
Re:Why is gcc produced code so slow?
by
Anonymous Coward
·
· Score: 0
you should use the intel compiler for the processor. it is what they use (as a drop-in for visual studio) to run SPEC. Likewise, anyone
who needs to get serious floating point done
on an x86 should use it. And, yes, it is available
for linux.
Re:Why is gcc produced code so slow?
by
Anonymous Coward
·
· Score: 0
That emulators.com article on the P4 is great! The dude knows what he talks about because he has to count his cycles for his amazing Gemulator 2000 program.
GOATSE.CX
by
Anonymous Coward
·
· Score: 0
i shoulda lernd by now
Shrinkage
by
CaptainAlbert
·
· Score: 5, Informative
Impressive though 64-bit processors might be, I'm not convinced that the performance improvement is going to be as big as people are expecting.
Remember that the components in any digital system - and I'm not just talking about your windoze desktop PC, but servers, mainframes and embedded systems too - have to talk to each other in order to do anything remotely useful. Last time I looked, most PCI devices didn't utilise the provision for 64-bit data bus operation.
There's a perfectly good reason for this, of course... in order to attach a chip to a circuit board, you need an array of pins (or solder balls) that are macroscopic, so they can be soldered and handled without too much risk of accidental damage. Additionally, PCB tracks can only go so small (and so close together) without undesirable electrical effects and again, an inability to work with it in a production environment.
The "more bits" phenomenon has been sustained by improvements in VLSI and the advent of true System-on-a-chip design, but this too has its limits. If you compare a P4 motherboard with, say, a 386 mobo circa 1995, you'll see the chip count is drastically reduced. But fewer interconnected components means less repairability, upgradability, and interoperability. My old 486 had a VLB EIDE hard disk controller, which I swapped in after the last one failed. If my controller failed today, I couldn't do that; I'd either need to buy a new mobo or start replacing chips on the old one (which is just as expensive).
Don't get me wrong - I'm all for progress! And I expect we'll see more and more 64/128-bit chips springing up inside custom devices (e.g. 3D cards, routers) where the local interconnect can be made as fat as necessary. But the PC will remain shackled by slow frontside busses for a while yet, I reckon.
My old 486 had a VLB EIDE hard disk controller, which I swapped in after the last one failed. If my controller failed today, I couldn't do that; I'd either need to buy a new mobo or start replacing chips on the old one (which is just as expensive).
Perhaps your 486 MB was the first of its kind, but modern motherboards with integrated devices have the ability to disable them so that they can be replaced by cards in slots.
This all stems from the fact that those 'chips' that are taking ever more responsibility are trashable. I remember watching an old movie in gradeschool about the development of computers (this would've been in the 80s). A man recalled an interview where the reporter kept asking what sort of tiny tools the guy would use to go in and fix a part of the circuit (the reporter's mind was forever stuck with tubes). Eventually, the guy got through to him that the chip wouldn't be repaired, just replaced.
Thus, the chip count may be reduced, implying more complex chips, but they're not necessarily more expensive. On the other hand, they've become so cheap that it's more cost-effective to bundle the functions of multiple past chips into a single chip.
But still, regarding your BUS argument, there have been numerous articles all over the web about newer BUS standards competing to be the future industry standard. Those BUSes will get big right when these chips do.
Re:Shrinkage
by
CaptainAlbert
·
· Score: 5, Interesting
> Perhaps your 486 MB was the first of its kind,
> but modern motherboards with integrated devices
> have the ability to disable them so that they can be
> replaced by cards in slots.
True, but that presupposes the existence of spare slots;-)
I hear what you're saying about trashable chips, but I think the real phenomenon is the "trashable board". Think about it - if your mobo dies and your warranty has run out, you go buy a replacement and ditch the old board. If it happens still to be under its manufacturer's warranty, most likely you just take it back to the shop and swap it for a working one. What happens to the old one? Most likely, they throw it away. The cost of postage, packing, an engineer's time to find the problem, repairs, parts... it's more than the damn thing retails for anyway.
I think this is missing the point anyway. The integration idea goes like this: with today's technology, you could put the equivalent of an early Pentium processor, plus hard disk and graphics controllers, BIOS chipset, etc. onto a single piece of silicon. Pretty much all you'd be left with off-chip would be (a) RAM and (b) I/O circuitry, because they're both harder to integrate. So your computer is about four or five chips. This is approximately the case in palm-tops now.
The point is that you've lost all ability to choose your own components. That graphics block/macrocell has probably been chosen by the manufacturer because it was the best value for money (i.e. the cheapest they could find). If you're lucky, they will give you expansion ports so you can plug your own stuff in. But that costs money, and if they think you'll pay for the lesser product then they'll make that instead.
Does it matter? Probably not to the average user. But I think it would matter to the industry. The whole point of having standard architectures like PCI, SCSI, EIDE (and before them, ISA et al.) is that many vendors can compete to produce compatible products, which drives innovation and generally provides a good deal for the consumer.
But if the minimisation continues and the busses become subsumed into the very chips themselves, then the chances are the manufacturers will cut corners. They won't wait for the not-quite-standard-yet SuperBus2005 architecture... they'll design their own and make you buy their proprietary upgrades. Again, the economics work out such that you the consumer probably get a good deal. But trading off good deals today against innovation tomorrow is dangerous.
So, it would be much better to keep all those busses outside the individual components, right? But that's exactly what is keeping the PC architecture slow at the moment (which was the point of my previous post. I think.).
Remember that the components in any digital system - and I'm not just talking about your windoze desktop PC, but servers, mainframes and embedded systems too - have to talk to each other in order to do anything remotely useful. Last time I looked, most PCI devices didn't utilise the provision for 64-bit data bus operation.
PCI devices or PCI busses? Even the original old PCI buses support 64bit transfers via multiplexing (2 32bit transfers). So the bandwidth essentially remained the same, but usage as a "64 bit bus" was supported.
However, just because a CPU can process at 64 bits does not mean it must communicate at 64 bits outside the CPU. 64 bit CPU's do often support smaller word transfers.
It is true that most PCI devices are not true 64bit PCI, but that is mainly due to there being no need for the bandwidth that 64bit PCI affords.
If the bandwidth of 32bit at 33MHz (132MB/s) is not enough for your device to operate at its fullest potential, then it is probably available as a true 64bit PCI device for a 64bit 66MHz PCI (528MB/s) slot, found in servers.
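The two bandwidth figures above follow from bus width times clock rate. A minimal sketch, assuming one transfer per clock and the rounded 33/66 MHz clocks used in this post (the function name is made up):

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: int) -> int:
    """Theoretical peak in MB/s: bytes per transfer times millions of transfers per second."""
    return (width_bits // 8) * clock_mhz

print(peak_bandwidth_mb_s(32, 33))  # plain desktop PCI
print(peak_bandwidth_mb_s(64, 66))  # server-class 64-bit/66 MHz PCI
```

These are theoretical peaks of 132 and 528 MB/s; sustained throughput on a shared bus would be lower.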
Realise that the IDE bus that may well be used in your computer, is only 16 bits wide. A 64bit CPU most certainly does not require 64bit here there and everywhere.
My old 486 had a VLB EIDE hard disk controller, which I swapped in after the last one failed. If my controller failed today, I couldn't do that; I'd either need to buy a new mobo or start replacing chips on the old one (which is just as expensive).
Not true, I've yet to see a mobo that would not allow the disabling of its onboard VGA, IDE, SCSI, SERIAL, PARALLEL, USB, etc. Adding a card to replace a busted and disabled onboard device usually works.
The real value of a 64bit CPU over a 32bit CPU is in the ability to compute more data at once, higher precision data or larger number data much faster, and possibly also address way more data if a 32bit address bus is being compared with a 64bit address bus. A 64bit address bus can access 4,294,967,296 *times* more data than a 32bit bus.
-- War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
Right. There is no particular reason why 64 bit processors should be faster than 32 bit processors ( or indeed why 16 bit processors should be faster than 8 bit processors!).
However there is a definite correlation between applications which require fast processing and applications which require lots of in memory data. (Large databases and really good graphic games are two that spring to mind). So chip manufacturers have tended to increase speed and addressability in parallel.
But then again if you cannot run a decent desktop in 1GB of memory just call yourself "Bill Gates".
Most of the mid to high end non Windows servers now use 64 bit chips. They are nice and they are fast, but very few ship with > 2GB of memory.
PS. earlier on there was a comment whinging about SUN using a "special" compiler to increase their SPEC scores. The compiler was not that "special", it just optimised for the parallel instruction capability of the UltraSparc III architecture and will be included in the next release of the Sun compiler. So if you want really fast applications on the Sparc III you will have to buy Sun's compiler along with the truly awful "Forte" IDE.
-- Old COBOL programmers never die.
They just code in C.
Remember that the components in any digital system - and I'm not just talking about your windoze desktop PC, but servers, mainframes and embedded systems too - have to talk to each other in order to do anything remotely useful. Last time I looked, most PCI devices didn't utilise the provision for 64-bit data bus operation.
True, most PCI devices don't support 64bit, but there's no need for most of them to. The biggest bottleneck isn't getting stuff off the network or even off the disk; it's moving data back and forth between memory and the CPU. Certainly, more bandwidth for some peripherals might help in some cases, but generally you're not going to see the need for it.
Memory is a different story of course. Take the RS/6000 platform as compared to Sun. Everything about the 6000s is designed around memory bandwidth, so that even mid-grade 6000s have crossbars capable of 20+GB/s. Sun doesn't do this (only their very high end boxes even have crossbars), and because of this you typically need a Sun box with 2x the number of CPUs as a 6000 to get equivalent performance on Oracle. Since Oracle is sold per CPU, a lot of companies are discovering that they can upgrade from Sun to RS/6000 for free, because the cost savings of the Oracle license pays for the upgrade. The 6000 gets this performance using PCI for peripherals, too.
There is no particular reason why 64 bit processors should be faster than 32 bit processors ( or indeed why 16 bit processors should be faster than 8 bit processors!).
Actually, there is every reason to expect a larger word size to increase speed, in some applications. Thanks to Mr Turing we all know that any computer can basically do the job of any other, but have you ever written a 16-bit multiply routine on an 8-bit CPU? It's not hard or anything, but it is slow.
Of course, most desktop applications today do not need 64-bit numbers....
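For anyone who hasn't had the pleasure, this is roughly what a 16-bit multiply looks like when the widest multiply available is 8 bits. It is a sketch for a hypothetical 8-bit CPU that at least has an 8x8 multiply (many classic 8-bit chips did not even have that, which makes the real thing slower still); `mul8` and `mul16` are made-up names.

```python
def mul8(a: int, b: int) -> int:
    """The only multiply our imaginary 8-bit CPU has: 8 x 8 -> 16 bits."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b

def mul16(a: int, b: int) -> int:
    """16 x 16 -> 32-bit product built from four 8-bit partial products."""
    a_lo, a_hi = a & 0xFF, a >> 8
    b_lo, b_hi = b & 0xFF, b >> 8
    return (mul8(a_lo, b_lo)
            + (mul8(a_lo, b_hi) << 8)
            + (mul8(a_hi, b_lo) << 8)
            + (mul8(a_hi, b_hi) << 16))

print(mul16(1234, 5678) == 1234 * 5678)
```

That is four multiplies plus shifts and multi-byte adds for what a 16-bit CPU does in one instruction. The same pattern applies one level up when a 32-bit CPU has to handle 64-bit numbers.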
-- "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
We were referring to addressability here. I seem to remember that even the humble PDP-11 could handle 16 bit integers!
-- Old COBOL programmers never die.
They just code in C.
Is it really a benefit 32 vs 64?
by
miffo.swe
·
· Score: 1
Correct me if I'm wrong, but isn't this going to make the programs bigger and heavier than before? I sure hope they will make it possible to do, for example, just one calculation in 64 bit and the rest in 32 bit instead of wasting bits for no reason.
Just thinking out loud =)
-- HTTP/1.1 400
Re:Is it really a benefit 32 vs 64?
by
Chatz
·
· Score: 2, Insightful
No, as some other replies have indicated, the only reason for 64-bit applications is to access 64-bit addresses. If you don't need to address that amount of memory, then your applications might as well be 32-bit. Any good processor will still give you 64-bit registers etc to work with.
-- There is folly and foolishness on the one side, and daring and calculation on the other.
- Admiral Pellew, Hornblower
Re:Is it really a benefit 32 vs 64?
by
statusbar
·
· Score: 2
It is not just real memory that is addressed - the important advantage of a 64 bit cpu is the additional virtual memory it can address.
When you have more than 2000 processes or threads on a box then you will see the difference that 64 bits makes.
Jeff
-- ipv6 is my vpn
Re:Is it really a benefit 32 vs 64?
by
Namarrgon
·
· Score: 2
That's because each thread (under WinNT/2K/XP) reserves 1 MB of address space for its stack, by default. We ran into that one:-) Wondered why we couldn't allocate any more memory, when we were only using half of what was there. Still had physical RAM, but no address space to map it to... 2 or 3 GB isn't enough. We need 64 bit CPUs.
It's possible to reduce the reserved stack, but only for all the threads in a process. We switched to using only a few threads & assigning jobs to them.
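The address-space squeeze described above is easy to estimate. A sketch, assuming the 1 MB default stack reservation mentioned in this post and a 2 GB or 3 GB user address space; it is an upper bound, since code, heap and DLLs need address space too, and the function name is made up.

```python
MIB = 2**20
GIB = 2**30

def max_threads(user_address_space: int, stack_reserve: int = 1 * MIB) -> int:
    """Upper bound on threads if stacks were the only thing using address space."""
    return user_address_space // stack_reserve

print(max_threads(2 * GIB))  # 2 GB user space
print(max_threads(3 * GIB))  # 3 GB user space
```

That gives ceilings of 2048 and 3072 threads, which matches the "more than 2000 threads" mark in the parent post, and it happens regardless of how much physical RAM is free.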
--
Why would anyone engrave "Elbereth"?
Re:Is it really a benefit 32 vs 64?
by
Anonymous Coward
·
· Score: 0
Actually the biggest benefit of x86-64 for the home user is the new instruction set with 16 instead of 8 registers. With more regs you need far fewer load instructions. Performance increases dramatically by compiling to x86-64 instead of x86, even though the op codes themselves are bigger.
Re:PCWORLD Link
by
satanami69
·
· Score: 2, Informative
PCWorld did a review on Windows 64-bit OS. And they offer another write-up.
-- I really hate Dan Patrick.
Death of Alpha, Long Live AMD
by
hughk
·
· Score: 2
Ok, we have known that Alpha will die for some time now, I even have my doubts about the EV7. Digital was shipping their 64-bit microprocessors before other people were and their previous forays into RISC gave them some good insights on the architectural problems.
With Digital being sold to Compaq and then Alpha being sold to Intel and Compaq possibly merging with HP, the future there is clouded. I have been working with Alphas and have been told that the future is Itanium coloured, but sorry, I don't really like the chip. EV7 will come out, but so far its performance doesn't look so competitive.
With a lot of former Digital talent working at AMD, I think this will be the better option. However, the K8 is not a clean design, it seems to be a 64-bit version of the K7 with some extras on the pipelining. I guess that the chip is not going to be the easiest to get the best performance from.
Re:Death of Alpha, Long Live AMD
by
robinp
·
· Score: 1
The good thing about the Alpha line is the memory bandwidth. Essentially this is the tech that Intel bought when it got Alpha (along with the political gains of course). Compaq still uses the Alpha in its proper HPC machines and will continue to do so until the memory bus technology of the Alpha is rolled into the IA-64 line.
When that happens ia64 will be a good proposition, until then i'd stick with alpha...
EV7 will come out, but so far its performance doesn't look so competitive.
Huh????? EV7 will almost certainly be the fastest MPU available at the time of its launch (by this I mean highest scores in SPEC2k int and fp), even ahead of the extremely expensive POWER4 (which sort of "cheats" on the single-threaded SPEC2k because then the one active core on the device gets the entire 128MB of shared L3 cache to itself).
Of course Compaq's support is questionable, the upgrade path is zero, and there's no telling how quickly they'll get the high-end 32 and 64-way boxes out, but in terms of plain old CPU performance the EV7 is going to be the chip to beat. (BTW, sounds like you didn't read the article if you didn't get this point.)
Re:Death of Alpha, Long Live AMD
by
hughk
·
· Score: 2
The way I read the article, Compaq were being very careful about their performance figures and it doesn't compete well with the other processors that will appear shortly after the EV7 launch. However, to be fair the article does make the point that it is normal for designers to be a little down about performance just before launch, possibly due to yield issues. Until they get the full yield, it is difficult to get the full performance out of all the chips.
I like the Alpha though and have been using it since it first appeared. I will be very sorry to see it go.
Error Occurred While Processing Request
Error Diagnostic Information
An error occurred while attempting to establish a connection to the service.
The most likely cause of this problem is that the service is not currently running. You can use the 'Services' Control Panel to verify that the service is running and to restart it if necessary.
Windows NT error number 2 occurred.
Re:/.'ed already?
by
Anonymous Coward
·
· Score: 0
I really like the part with
"You can use the 'Services' Control Panel to verify that the service is running and to restart it if necessary. "
Where is the Control Panel in blackbox?
I wonder what Windows NT error number 1 is???
Intel learning from their mistakes
by
jazzyjez
·
· Score: 5, Insightful
Much as I hate to say it, the Intel McKinley looks like a very well designed piece of kit, and it appears Intel have learned from their mistakes with the P4 by including a big, fast 3-level cache on the McKinley. It's also good to see them reducing their pipeline size, which means it may finally be able to compete with the G4 in terms of efficiency. However, this is of course going to kick them in the teeth in terms of competing on processor speed, which they have been pushing so hard recently in their marketing.
The same can't be said of AMD's offering, although in fairness the Hammer is not directed at the server market unlike the McKinley. The pipeline is longer than both their previous design and the McKinley, which is going to give them a performance hit. We can only hope that their cache is as good as Intel's.
What amazes me is that they can still keep adding instruction extensions without too much of a performance hit. Anyone looked at the latest instruction set documentation for these processors? Eugh! The pain of backwards compatibility...
Re:Intel learning from their mistakes
by
nusuth
·
· Score: 4, Informative
IA64 is an incompatible and new instruction set, intel is not adding anything to their x86 ISA.
Hammer does not have a 3MB L3 but it has an integrated memory controller, which would drastically reduce the latency of cache misses.
Assuming amd will go for a bigger than 32 kb L1 cache, and will not succeed in making cache hits as fast as mckinley (speculation based on current offerings), the picture is a bit complicated:
Watch it: hammer and mckinley ask for an instruction/piece of data. If both hit, mckinley wins, but a more probable scenario is that mckinley misses and hammer hits, a clear win for hammer. A still more probable scenario is that both miss. If the data is in the L2, mckinley is faster: it has a lower miss penalty and can fetch from L2 faster. But it is more probable that the data is in hammer's cache and not in mckinley's, which would benefit hammer. If L2 misses too but mckinley scores an L3 hit, mckinley wins. If it suffers an L3 miss, it has to pay both the L3 miss latency and the memory latency, whereas hammer pays no L3 miss latency and its memory latency is probably much lower. So with huge data processed in not-so-tight loops hammer wins hands down, while for medium sized data that could fit into L3 mckinley wins hands down.
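For anyone who wants to play with that chain of hits and misses, here is a rough back-of-envelope sketch in Python of average memory access time. Every hit rate and latency below is a made-up placeholder, not a McKinley or Hammer figure; the only point is how lookup cost, hit rate, and memory latency trade off against each other.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency, hit_rate) from L1 outward, where hit_rate
    is the fraction of accesses reaching that level that hit there.
    memory_latency: cycles to reach main memory once every cache has missed.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit_latency, hit_rate in levels:
        total += reach * hit_latency   # every access reaching a level pays its lookup
        reach *= (1.0 - hit_rate)      # only the misses travel further out
    return total + reach * memory_latency


# Hypothetical configurations, purely for illustration.
three_level = [(1, 0.90), (6, 0.80), (14, 0.70)]  # small fast L1, L2, large L3
two_level = [(3, 0.95), (12, 0.80)]               # bigger slower L1, L2, no L3

slow_memory = 200  # assumed: memory reached through an external chipset
fast_memory = 120  # assumed: memory behind an on-die controller

print("three-level, chipset memory:", amat(three_level, slow_memory))
print("two-level, integrated controller:", amat(two_level, fast_memory))
```

Drop the outer-level hit rates to mimic a working set that blows through the caches and the memory latency term starts to dominate, which is the "huge data in not-so-tight loops" case.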
Although mckinley is a server product and hammer is not (or so it is said), an integrated memory controller benefits hammer in multiway systems so much that it may as well be positioned as a server product. No more asking the chipset to fetch a piece of data and wait until chipset serves other processors' requests, just go and grab it!
Finally, some of the hammer line will have L3 caches and the hammer line will have a higher clockrate than mckinley. If AMD can deliver what they have promised, they have a clear winner overall. But I'm still a bit sceptical.
--
Gentlemen, you can't fight in here, this is the War Room!
Re:Intel learning from their mistakes
by
Anonymous Coward
·
· Score: 0
EPIC uses cache hints, which should improve cache efficiency.
No more asking the chipset to fetch a piece of data and wait until chipset serves other processors' requests, just go and grab it!
You don't make it look like a good way to share memory among several CPUs.:-)
However, I'm curious: how will an integrated memory controller affect the use of these CPUs in exotic architectures (NUMA for example)? This question has more to do with the Alpha than the Hammer though...
What do you mean "backwards compatibility"? The McKinley uses Intel's new IA-64 ISA, not x86. (Instead, it has a small chunk of real-estate devoted to translate x86 internally. Which means the chip is not optimized for x86.) IA-64 is much much cleaner than x86, and contains no reverse compatibility.
The big question is whether the compilers for IA-64 will be any good or not... that's what caused Intel's last attempt to divorce from x86 to fail, and their back-up plan (the 386, the 32-bit extension of the 16-bit x86 ISA) to succeed and become the most popular desktop microprocessor ever.
This time around, Intel doesn't have a backup plan, and AMD is the one doing the extension of a tried and true system.
Conclusion? I put my money on AMD, but if Intel can pull off the compiler, they dramatically increase their chances.
-- Those who fail to understand communication protocols, are doomed to repeat them over port 80.
Been in 64-bit heaven since IRIX 6.0 in 1994. PowerIndigo2 (R8000) on the desktop, Challenge XL's in the server room (R4400 and R10000). And, today, Octane on the desktop and Origin 300 + Origin 3000 in the server room. A few UltraSPARC Suns, too, but Solaris took its sweet time making the move to 64 bit (Sun started the migration with Solaris 2.5 and finished with Solaris 7).
AMD is deceiving you
by
hatchet
·
· Score: 3, Insightful
First of all I'd like to say, I am not biased in either way.. after all I'm going to get me a new AthlonXP next week.
IA64 is very different from x86-64. AMD's 64bit solution is nothing more than extension to current 32bit instruction set. Of course there are some tweaks, but nothing very radical. You will still be able to run old 16 and 8bit code efficiently.
Intel's IA64 is a huge step into the future... architecture-wise it is far superior to x86-64. Why?
Why do we need 64bit processors? Addressing? Nah, current processors can address enough space.. with 386 processors FAR addressing was introduced, which expanded allocatable address space drastically. (those silly DS, SS,.. registers) And newest processors can deal with them with same ease as with non-far addressing.
AMD's 64bit solution currently has no real value.. except for huge data storage (could work faster with 64bit data blocks) and probably some heavy encryption. x86-64 compiled Quake3 would make minimum use of 64bit registers.. and would probably be just a margin faster than IA32 compiled version.
Is IA64 better? Yes it is. IA64 has 128 usable 64bit registers, predicates... But that is not all.. in a single 64bit register you can store 4 16bit values (common integer), or 8 8bit or 2 32bit. And you can manipulate them almost as much as you like. And if you have 4 integers in another register.. you can do 4 arithmetical operations with a SINGLE instruction. You can do similar things with floating point operations... and with ILP you could do 3 instructions per cycle. This means that Quake2's VectorAdd/Subtract could be done in a SINGLE cycle.
Clawhammer will be better for a year or so.. but soon it will hit the ceiling. Intel will be able to get better performance from 1/2 clocked IA64.
And please don't respond with lame comments if you haven't read at least whitepapers from Intel and AMD.
Don't current processors let you do the same thing? I mean, isn't the whole point of MMX to let you do, say, 4 16-bit operations at a time on a 64-bit register? And then there's SSE and 3DNow for floating-point.
From your description, it seems to me like IA64 is more MMX/SSE registers and an expanded instruction set, which is the same incremental change that we've been getting from each version of intel chips for a while now.
--
Benjamin Coates
Re:AMD is deceiving you
by
Peter+Harris
·
· Score: 1
Tough call. Is one grotesque backwards-compatibility botch better than another?
This probably qualifies as a "lame comment", but Alpha is arguably a cleaner and more elegant design. If you NEED a 64-bit server then Debian on Alpha will probably serve you well.
You probably don't though.
So why the fashion for 64bit x86-alikes? It's not like you'd want to run Windows on a serious server anyway.
Yes to some degree.. but IA64 is 10 times more flexible than SSE2. For example you can rotate 16bit parts within 64bit register. (i don't mean plain shl/shr) But 1,2,3,4 -> 4,3,2,1 or 1,2,3,4 -> 2,1,4,3.
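If the lane shuffling is hard to picture, here is a minimal Python sketch that emulates it in software on a plain 64-bit integer. It is not the IA64 instruction itself, just the same data movement spelled out; the helper names `pack4`, `unpack4` and `permute4` are made up for this example.

```python
MASK16 = 0xFFFF


def pack4(lanes):
    """Pack four 16-bit values into one 64-bit integer, lanes[0] in the low bits."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & MASK16) << (16 * i)
    return word


def unpack4(word):
    """Split a 64-bit integer back into its four 16-bit lanes, low lane first."""
    return [(word >> (16 * i)) & MASK16 for i in range(4)]


def permute4(word, order):
    """Return a word whose lane i holds source lane order[i]."""
    out = 0
    for i, src in enumerate(order):
        out |= ((word >> (16 * src)) & MASK16) << (16 * i)
    return out


w = pack4([1, 2, 3, 4])
print(unpack4(permute4(w, [3, 2, 1, 0])))  # [4, 3, 2, 1]
print(unpack4(permute4(w, [1, 0, 3, 2])))  # [2, 1, 4, 3]
```

The loop here does in several shifts and masks what a hardware permute would do in one step, which is the whole attraction.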
You can download IA64 instruction set manual from intel's website.. any ASM programmer will be fascinated with options IA64 offers. I know, I was.
Re:AMD is deceiving you
by
statusbar
·
· Score: 3, Interesting
And unfortunately it will be a LONG TIME before good non-buggy optimizing compilers will be available for such a complex architecture.
Software pipelining and parallel instructions give you a real complex monster cpu. Languages like C and C++ make it extremely hard to optimize for a cpu like this since C and C++ were never designed for fine-grained parallelism and software pipelining. So it results in a lot of wasted clock cycles.
What I'd LOVE to see is a statically typed pure functional language that could be used to generate the code for IA64. Then it would be feasible to fully take advantage of the IA64's features!
In the meantime, people compiling IA64 C code with GCC will be extremely disappointed. People compiling IA64 C code with Intel's optimizing compiler will be happier but will only be mildly impressed.
I've worked with VLIW (256 bit instructions) software pipelined DSP's before and learned very quickly that the C and C++ language standards are fundamentally limited for these things. I also learned very quickly that writing assembly language directly for them is an easy way to gain a special invitation to a padded room!!! I shudder to think what the compiler writers have to go through!
True.. IA64 will be very compiler dependent.. but there are still 3-4 years to develop a good and efficient compiler. I really don't see a point in buying Itanium atm. (except if you are an OS developer). If you really need a supercomputer for a server.. get yourself a CRAY :)
Exactly, ASM programmers. Actual compilers
are nowhere near dealing with all the toys
in IA64, so you're left with programs that run
slower on IA64 than existing Sparc, Alpha and
X86 machines.
You made some good points. I've read both white papers though I honestly only understood half maybe 2/3, since I am no Electrical engineer, or assembly programmer.
The only real benefit I can see from an industry perspective is it will drive down the price of high end systems for corporations. Intel desperately wants to get into the mainframe, research and scientific market, since the margins are much higher. As others have noted, IA64 isn't going to really revolutionize particle physics, astrophysics, realtime weather simulation or any other research requiring massive bandwidth and address space. It might make it easier for smaller universities to build faster, better, and slightly cheaper clusters in 2 years, but for now who cares if AMD's extensions have a limit.
Really, if we want faster 3D graphics, we need a faster bus, memory and GPUs.
It all depends on how good the register renaming and out-of-order execution units on hammer will be and how well ia64 compilers generate asm code. It doesn't matter how many registers are visible to the programmer or how many actually exist; the only important thing is how many of them are actually used. Other advantages of the RISC-like arch. of ia64 over x86-64 also depend on implementation. E.g.: in all current offerings (except P4) x86 floating point registers are essentially in an array, not a stack. Legacy code generators can produce stack based code, and the cpu can run it as if it were executing on an array based fpu. In the end you get a superior design's speed with full backward compatibility. Current x86 processors have a lot of tricks to sidestep limitations of the x86 ISA, hammer will have more. Let's wait until the real benchmarks pour in, shall we?
And if all else fails, SSE-3 and MMX-2 can provide your lovely 128 visible registers.
--
Gentlemen, you can't fight in here, this is the War Room!
Backwards compatibility is the reason we're still using x86 processors today (and why Intel is so dominant): There have been dozens of technically superior CPUs that have come out over the years, always with the guise that with this compiler and this set of conditions it would be super mondo fast.
Re:AMD is deceiving you
by
SQL+Error
·
· Score: 5, Informative
Bullshit.
AMD has stated *explicitly* that the Hammer is an evolutionary rather than revolutionary design. They've said all along that it is an Athlon with 64-bit extensions and some minor tweaks (SSE2, extended pipeline). They haven't deceived anyone.
Now, as to the relative performance of the two architectures (x86-64 vs. IA-64): the Athlon XP 1900+ achieves a SpecInt2000 score of 701 (peak) while the 800MHz Itanium manages... 314. On floating point the Itanium does rather better: 645 vs. 634 for the Athlon. (The current leader is the IBM Power4, which gets 814 SpecInt and 1169 SpecFP.)
Having 128 64-bit registers is good, but remember that the Athlon and Hammer have far more physical registers than are presented in the programming model, and automatically map them according to the requirements of instructions in the pipeline. And the predicates and wide issue of the Itanium are balanced against the ability of the Athlon to *automatically* issue instructions speculatively and re-order the instruction queue to improve ILP.
And on the subject of manipulating multiple values with a single instruction: ever heard of MMX? 3DNow? SSE? Athlon has all of these, and Hammer will add SSE2. What do you think these are for?
As to the value of 64-bit addressing: I've programmed for machines (Suns and Compaq Alphas) with as much as 64GB of memory. While you *can* address that much with a 32-bit CPU, it means that you have to constantly re-map your view of memory, which is a royal pain. Moving to 64 bit addressing makes the problem disappear. And with current memory prices, even small commodity servers could make good use of more than 4GB of memory.
And 64-bit integer registers are good for a lot of things, and while you can certainly use 64-bit integers on a 32-bit CPU, making them faster won't hurt.
So, Athlon currently has a huge performance advantage over Itanium on integer apps, and a huge price/performance advantage (with comparable absolute performance) on FP apps. AMD's aim with Hammer is to extend Athlon cheaply and effectively into the 64-bit realm.
Intel's aim with Itanium appears to be to crush all competition; unfortunately, they've placed a *huge* bet on improvements in compiler technology that just hasn't paid off yet, resulting in a high-end chip that lags behind not just the high-end RISC chips like Alpha and Power, but low-cost desktop chips. To achieve commercial success, the Itanium needs integer performance somewhere in the vicinity of their competitors, but they currently trail the pack by a huge margin. Even SGI do better, and they all but shut down their CPU design efforts years ago.
Maybe McKinley will be the answer - but it doesn't look like it, given that the promised speeds have dropped to 1GHz. IA-64 is an interesting architecture which may even have a future, but so far it just don't fly.
True.. IA64 will be very compiler dependent.. but there are still 3-4 years to develop a good and efficient compiler. I really don't see a point in
buying Itanium atm. (except if you are an OS developer). If you really need a supercomputer for a server.. get yourself a CRAY :)
Isn't gcc already available in an Itanium form?
OK, I know GCC is not the best compiler in terms of code produced, but if it works then that means you can happily recompile Linux and all related apps for it.
--
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
Re:AMD is deceiving you
by
SurfsUp
·
· Score: 4, Insightful
You don't have a clue. Let me just pick out a couple of the grossly wrong items...
Why do we need 64bit processors? Addressing? Nah, current processors can address enough space.. with 386 processors FAR addressing was introduced, which expanded allocatable address space drastically. (those silly DS, SS,.. registers) And newest processors can deal with them with same ease as with non-far addressing.
Sheesh, where are you coming from? You can address 64 Gig of physical memory with an x86 now, but you can only address 4 Gig (at most!) of it linearly. 32 bit address registers, get it? Gosh, and far addressing was introduced with 386's was it? Give me a break, try 8086's.
AMD's 64bit solution currently has no real value.. except for huge data storage (could work faster with 64bit data blocks) and probably some heavy encryption. x86-64 compiled Quake3 would make minimum use of 64bit registers.. and would probably be just a margin faster than IA32 compiled version.
Right, and I'm supposed to believe you on this, given your performance above. Um, you seem to have ignored the value of being able to crunch 8 byte integers, or pixels 8 bytes at a time, nicely matching the width of the MMX registers. For starters. Repeat this to yourself: "sledge hammer". "sledge hammer". Good, that's more like it.
Is IA64 better? Yes it is. IA64 has 128 usable 64bit registers, predicates... But that is not all.. in single 64bit register you can store 4 16bit values(common integer). (or 8 8bit or 2 32bit)
Um, and guess how many 16 bit values you can store in a 64 bit sledgehammer register? Ah, and guess how many fp/mms instructions sledge can retire per cycle?
Clawhammer will be better for a year or so.. but soon it will hit the ceiling. Intel will be able to get better performance from 1/2 clocked IA64.
You don't have any idea why it's called itanic, do you? Moderators, take a look above. Remember, that's what 'random' looks like. Yes, I've got mod points right now. No, I won't waste them on you.
Is IA64 better? Yes it is. IA64 has 128 usable 64bit registers, predicates... But that is not all.. in single 64bit register you can store 4 16bit values(common integer). (or 8 8bit or 2 32bit)
Um, and guess how many 16 bit values you can store in a 64 bit sledgehammer register? Ah, and guess how many fp/mms instructions sledge can retire per cycle?
If I understood the whitepaper the answer is 2. You can also store 2 8 bit values in a 64 bit hammer register. I know the math doesn't hold, but the ISA does; you can't access an arbitrary byte, word or dword section of hammer registers. Of course the number you can just store is 64/r_size of r_sized values, but accessing them (except the lower two) requires rotate or swap operations.
--
Gentlemen, you can't fight in here, this is the War Room!
If I understood the whitepaper the answer is 2. You can also store 2 8 bit values in a 64 bit hammer register. I know the math doesn't hold, but the ISA does; you can't access an arbitrary byte, word or dword section of hammer registers. Of course the number you can just store is 64/r_size of r_sized values, but accessing them (except the lower two) requires rotate or swap operations.
Ditto for MMX (my guess is it's the same for IA64), but it doesn't matter.
You don't need to read and write individual bytes out of the register, the idea is that you have big tables of data you want to process in a loop, and you get to do 64 bits worth at a time, say, 4 16-bit shorts every loop. (reading them into the register by loading 64 bits at the address where the 4 values are stored consecutively) When you've done all the processing (the expensive part) you just write the registers to a memory location and read them like any other data.
I think the confusing part is that the cool thing is not 64-bitness, the cool thing is SIMD. Afaik, there's no instruction that does useful math on 2 independent short values packed into a 32-bit x86 register, for example; instead, you do one 16-bit op at a time and don't get any benefit from the larger register.
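To show what "SIMD, not 64-bitness" means in practice, here is a small Python sketch that fakes a packed 16-bit add using nothing but ordinary integer operations on a 64-bit word. It is a software emulation under my own assumptions (lane 0 in the low bits, wrap-around on overflow), roughly what a packed add such as MMX's PADDW does in a single instruction; the helper names are made up for the example.

```python
LOW15 = 0x7FFF7FFF7FFF7FFF  # low 15 bits of every 16-bit lane
HIGH1 = 0x8000800080008000  # top bit of every 16-bit lane


def pack4(lanes):
    """Pack four 16-bit values into one 64-bit integer, lanes[0] in the low bits."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFFFF) << (16 * i)
    return word


def unpack4(word):
    """Split a 64-bit integer into its four 16-bit lanes, low lane first."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def packed_add16(a, b):
    """Add four 16-bit lanes at once, each lane wrapping modulo 2**16.

    The top bit of each lane is masked off before the add so that a carry
    can never spill into the neighbouring lane, then folded back in with XOR.
    """
    low = (a & LOW15) + (b & LOW15)
    return low ^ ((a ^ b) & HIGH1)


x = pack4([1, 2, 3, 4])
y = pack4([5, 6, 7, 8])
print(unpack4(packed_add16(x, y)))  # [6, 8, 10, 12]
```

The masking dance is exactly the overhead you avoid when the hardware knows about lanes, which is why a plain 32-bit or 64-bit integer register buys you little here without SIMD instructions.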
Far pointer has segment:offset which is 16:32 bit thus making it 48bit address... with which you can theoretically allocate 8TB of memory.
I can store 4 8bit values in a single register on my shitty celeron. But can I efficiently manipulate them!? Can I do this:
|1|2|3|4|
+
|5|6|7|8|
=
|6|8|10|12|
with single instruction?! Yes that is basic idea of SSE... but SSE (even SSE2) is not even close to what you can do with IA64.
And stop talking about Itanium.. it's a processor meant exclusively for developers of IA64 drivers and Operating Systems. IA64 will start appearing in the enterprise market in no less than 2-3 years.. it's a long-term ARCHITECTURE (not processor as you'd like it to be) solution. x86-64 will be good for a few years.. and then? Adding more extensions? extensions to extensions? And each time you will need recompiled software to get full potential.
Adding more extensions? extensions to extensions? And each time you will need recompiled software to get full potential.
The whole point of backward compatibility is not requiring recompilation.
--
Gentlemen, you can't fight in here, this is the War Room!
Re:Intel learning from their mistakes (using HP!)
by
2nd+Post!
·
· Score: 2
You may want to read this about the Mckinley, successor to Merced.
As far as I know, Merced is HP's design. Mckinley is Intel's. So... you could say Intel is learning from their mistakes by letting HP engineers do a good job.
Anyway, it's a mutually beneficial thing because HP doesn't have the resources to market and drive the product, while Intel doesn't have the engineers or resources to design and implement the architecture in a 'good' way. Intel provides process and HP provides layout, and together they will take over the world!
At least that's what I've heard and read. Myself, I own a G4 and use a Mac, it's not exactly as if Itanium is going to strike me down anytime soon.
Now we can wait for software support...
by
green+pizza
·
· Score: 5, Interesting
Once we get the 64-bit hardware, we still have the MMOS (minor matter of software) to worry about.
Cases in point:
Silicon Graphics machines with MIPS R4400 (and up) CPUs were 64-bit, but the additional address and pointer space weren't utilized until IRIX 6.0 in 1994 -- over 18 months later. (And, of course, certain SGIs still run in 32-bit mode due to RAM concerns -- 64-bit requires more RAM -- all Indys, all Indigos, all O2s, and R4400 Indigo2s).
Sun machines with UltraSPARC CPUs were 64-bit, but again, the additional address and pointer space had to wait for software support. (Multi-stage transition to 64-bit, starting with Solaris 2.5 and finally complete with Solaris 7 in 1998).
Then there's application optimization. Many apps can get slight speedups by processing data in larger chunks (say, 32-bit or even 64-bit). Sometimes the difference is huge, many times it's small. But, lots of little speedups can add up across an entire system. Still, someone has to make these changes to apps and compilers. It takes time, testing, and adoption. In better times, SGI did several such overhauls... they got some insane speed out of Netscape Enterprise and Netscape FastTrack web servers during the Everest project. One of their engineers also did some cool (but nonstandard) hacks to Apache, including the very first pure, clean 64-bit port/mod.
Newer, faster, wider, more-torque hardware is always great. But don't forget the software.
Re:Now we can wait for software support...
by
dunstan
·
· Score: 4, Informative
Even with a reference application (Oracle 8.1.6) on a reference OS (Solaris 8), the patch levels for the 64-bit version were 3 revs behind those for the 32-bit version when I last looked. What bothered me was that the bug I'd run into was fixed in the 32-bit version but still there in the 64-bit version. Guess which version I ran.
Dunstan
--
The last scintilla of doubt just rode out of town
Re:Now we can wait for software support...
by
SurfsUp
·
· Score: 2
Newer, faster, wider, more-torque hardware is always great. But don't forget the software.
Linux already runs fine on itanic, oops I mean itanium. Linux runs on AMD hammer even before it's out.
-- Life's a bitch but somebody's gotta do it.
Re:Now we can wait for software support...
by
Anonymous Coward
·
· Score: 0
Microsoft already supports its 64-bit OS and 64-bit Server Limited Edition OS:
http://www.microsoft.com/WINDOWSXP/64bit/overview.asp
Re:64-bit Computing: Looking Forward to 64-bit AI
by
Peter+Harris
·
· Score: 1
Aww, be nice!
(But, well, yeah I guess he is.)
-- -- What do you need?
-- Gnus. Lots of Gnus.
64-bit is more than speed
by
Trinition
·
· Score: 2
As I recall from an article I read in a magazine (that my mind won't reveal to me) many years ago...
32-bit CPUs use a 32-bit address space. That's space enough to address 2^32 bytes, or 4GB. With today's 100+GB hard drives and fractional to low GB RAM capacities, each requires its own addressing.
However, with 64-bits of addressing space, you have enough room to memory-map your entire stinking (yes, stinkin) hard drive into virtual RAM address space. This means your virtual address space would represent both RAM and your file system together.
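For a feel of what "the file system as address space" looks like from a program, here is a minimal Python sketch using the standard `mmap` module on a small scratch file. It only maps one ordinary file, not a whole drive, and the function name `patch_file_via_mmap` is made up for the example.

```python
import mmap
import os
import tempfile


def patch_file_via_mmap(path, offset, data):
    """Overwrite bytes in a file by writing to it as if it were memory."""
    with open(path, "r+b") as f:
        # Length 0 asks mmap to map the whole file.
        with mmap.mmap(f.fileno(), 0) as view:
            view[offset:offset + len(data)] = data
            view.flush()  # push the modified pages back to the file


# Demo on a throwaway file.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello, 32-bit world\n")
    os.close(fd)
    patch_file_via_mmap(path, 7, b"64")
    with open(path, "rb") as f:
        print(f.read())  # b'hello, 64-bit world\n'
finally:
    os.remove(path)
```

On a 32-bit process a mapping like this has to fit in what is left of a 4GB address space, which is why mapping a 100+GB drive in one go only becomes thinkable with 64-bit addresses.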
Re:64-bit is more than speed
by
BCoates
·
· Score: 3, Insightful
Huh? you could do 32 bit addressing long before you could buy 4GB drives, but nobody thought memory mapping your hard drive into your address space was a good idea then... What would be the point?
--
Benjamin Coates
Re:64-bit is more than speed
by
hatchet
·
· Score: 1
FAR addressing is available on all current 32bit systems.. and it's as fast as near addressing.
I believe segment:offset is 48bit long.. which extends addressing space to 8TB. I believe that is enough for anyone for next 8 years.
Re:64-bit is more than speed
by
falzer
·
· Score: 1
Maybe it would make things easier for people who write filesystem drivers.
2^64 bytes is ~16 million terabytes. With that kind of address space you would Never(TM) run out. If everyone had 64 bit processors it probably wouldn't seem like a bad idea to map your entire hard drive to your address space.
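The numbers being thrown around in this thread are easy to check. A quick Python sketch, using binary units (GiB = 2^30 bytes, TiB = 2^40 bytes) and assuming a flat byte-addressed space:

```python
GIB = 2 ** 30
TIB = 2 ** 40


def address_space(bits):
    """Bytes reachable with a flat address of the given width."""
    return 2 ** bits


for bits in (32, 36, 64):
    size = address_space(bits)
    print(f"{bits}-bit: {size // GIB:,} GiB = {size / TIB:,.3f} TiB")
```

32 bits gives the familiar 4 GiB, 36 bits gives the 64 GiB of physical memory mentioned elsewhere in the thread, and 64 bits gives 16,777,216 TiB, which matches the "~16 million terabytes" figure above.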
And of course nobody will imagine mapping this storage as a single drive (via SAN/RAID/...). Ooops I just imagined it :-)
Mapping a harddrive as memory when using a 64-bit CPU could be interesting. But don't think that it will always work.
IIRC Multics used memory-mapped files exclusively and it was one of the things that Unix left out. It is an interesting concept but has its limits: (a) You need byte-sized granularity in order to keep track of what the size of the file is. (b) You need sector-sized granularity for tracking changed sectors. The latter does not scale well.
Re:64-bit is more than speed
by
Anonymous Coward
·
· Score: 0
However, if you do map everything as RAM, you break down the whole cache hierarchy - the CPU and memory controller will always try to access from lower levels - L1, L2, L3, main memory, HD, CD etc.
Keeping track of a directly mapped scheme and implementing a system like this would be a challenge.
AMD is number 1 and i think its staying number 1
by
walker2030
·
· Score: 0
I'm glad that AMD's chip will support 32 bit because I am not ready to ditch my 32 bit apps and games
I don't get it. Every time this crank posts the same offtopic junk, plugging his pet project (AI in VB and Javascript? Have some taste please!) he gets modded up. Moderators, look at his posting history and realise that you screwed up.
With the currently popular 32-bit CPU chips, Robot AI memory limitations are too severe because a memory of 2^32 size is not enough.
Ah, an attempt to be somewhat on topic. However, I don't buy it - how much memory is enough - do you know or are you blowing smoke? And seeing as few machines have this much ram (2^32 = 4GB ram) don't they use disk swap files or databases anyway? There are many file systems that can handle files this size already, so how exactly will 64bit processors suddenly enable AI in VB that can't be done at present?
An increase in computer power is a rising tide that lifts all boats, even crank AI, but how exactly is the move to 64 bits a sudden huge leap for your Javascript "mind"?
--
My Karma: ran over your Dogma
StrawberryFrog
Crackpot spouts buzzword, film at 11
by
BCoates
·
· Score: 1
You're a nut, man. The `singularity` thing is the most obvious giveaway, two-bit futurists have probably been babbling about computers becoming smarter than humans and going off on their own since Babbage.
64 bits on a server = larger databases (the better to catalog you with, my dear)
64 bits on a desktop = ?
I'm sorta missing why this is good for the rest of us...
this was so fukkin funny I almost pissed my pants.
Chips, maybe, but applications?
by
Zocalo
·
· Score: 3, Informative
There may well be a slew of 64 bit chips by the year's end, but I doubt you are going to see much non-specialist application support for some time. Sure PhotoShop and a few other desktop applications will arrive fairly quickly, but look at Windows and 32 bit support; Intel shipped the 80386 in 1985 and only now can you boot a Windows PC without running 16 bit code from the HDD.
Actually, even that's not strictly true, since according to the Resource Kit documentation Windows XP's initial configuration detection is *still* 16 bit.
-- UNIX? They're not even circumcised! Savages!
Re:Chips, maybe, but applications?
by
SuiteSisterMary
·
· Score: 2
Intel shipped the 80386 in 1985 and only now can you boot a Windows PC without running 16 bit code from the HDD.
Excuse me? Windows NT came out in the Windows for Workgroups era. Running, if I recall correctly, on 486 class machines. Not to mention MIPS, PowerPC, and, I believe at the time, SPARC.
-- Vintage computer games and RPG books available. Email me if you're interested.
2 out of 3. MIPS, PowerPC, and Alpha were all supported for some version of NT between 3.1 and 4.0.
-- "Evil will always triumph because good is dumb." -- Dark Helmet
Re:Chips, maybe, but applications?
by
SuiteSisterMary
·
· Score: 1
Alpha. That's the one I remembered after I hit post. But I'm pretty sure that there was a SPARC version, even if it was never made public.
-- Vintage computer games and RPG books available. Email me if you're interested.
Re:Chips, maybe, but applications?
by
Zocalo
·
· Score: 2
I think you've misunderstood; the key phrase was "without running 16 bit code", not that NT wasn't using 32 bit code. The NT/XP codebase is *still* not fully 32 bit as there are several initial steps of the OS boot process performed in real (16 bit) mode, including initial hardware detection by NTDETECT.COM, and if the underlying OS isn't yet fully 32 bit, then what is the likelihood all the applications are?
Getting techy for a minute, a PC's BIOS transfers control via a disk's boot sector to the boot loader in real mode, the boot loader (GRUB, LILO, NTLDR etc.) then loads the OS proper. Now, technically a boot loader could switch to protected (32 bit) mode before loading a fully 32 bit OS, but so far all mainstream PC OSs (yes, Linux and BSD too) run some initial boot code in real mode before making the switch to protected mode. Some make the switch sooner than others and I'm sure some of the experimental OS's out there make the switch immediately they gain control from the boot sector.
The main point though, was that for the 32 bit Windows platform (boot stubs aside) the process of Hardware support -> OS support -> decent app support has taken the best part of a decade. If you think the switch from 32 bit to 64 bit is going to happen much quicker, then you are probably going to be disappointed.
-- UNIX? They're not even circumcised! Savages!
Re:Chips, maybe, but applications?
by
linzeal
·
· Score: 1
How would you like to wake up with a nice default install of windows NT on your alpha young man, now go eat your carrots. Muhahahahah
Re:Chips, maybe, but applications?
by
vanguard
·
· Score: 2
FYI, it didn't run on SPARC and you missed Alpha. (Actually, I read it was coded on Alpha machines to ensure cross platform discipline.) I'm not sure about PowerPC, I can't remember.
-- That which does not kill me only makes me whinier
Al Qrapola's gonna win
by
DrSpin
·
· Score: 2, Funny
The problem with AMD is you only have to remove the heatsink and you can use them as thermo-nuclear devices.
This is the basis of Al Win Modem's plot to overthrow the world!
(Real operating systems run perfectly well on 8-bit CPUs.)
AMD will win
by
Manic+Miner
·
· Score: 2, Insightful
Firstly, it would appear that you have at least read some white papers on the web, but have you used a real Itanium?
I have, and let me say that the reason AMD will win is compatibility and making sure that the things don't sound like a jumbo jet taking off. These may seem like minor points but they are what will count.
The major point of the AMD solution is backward compatibility... Intel knows this. Why do you think that all their previous chips have been successful? Because they were the "best" solution for the job? NO, not by a long shot. They were, however, able to run the old software faster, and provide a route for new software to run even faster still.
AMD provides this solution: new software can take advantage of the 64 bit addressing and processing several integers in just one register. But at the same time old software can run on the system faster than it would on the same clock speed 32 bit processor.
Look at it from an IT purchasing point of view: you can pick machine A, which will theoretically be faster in the future, but to get anything even approaching decent performance you need to buy a whole load of new software. Or you can pick machine B, which will run all your current software faster than your current machines, and run future software even faster still. Which would you choose? People like a sure thing, not the promise of something good in the future.
AMD can then concentrate on moving towards a pure 64 bit machine once most of the applications have moved to 64 bit, this makes the most sense long term. You buy your 64 bit machine, run 32 and 64 bit mixed software quickly. Then once you are running mostly 64 bit you can move seamlessly to a 64 bit proven and tested environment.
Current Itaniums are slow, large and noisy. This makes a huge difference if you only have a small server room, or have to run a server under your desk - some people really have to do this in smaller companies. You won't see them on the desktop market anytime soon; they are too slow, and 32 bit performance is not great.
The Itanium might have a good architecture, but I think that its lack of speed and compatibility with 32 bit applications, coupled with noise and heat, will cause it to lose the battle.
-- If you ever drop your keys into a river of molten lava, let'em go, because, man, they're gone.
I didn't mention processor superiority of Itanium over Clawhammer.. but architecture superiority of IA64 over x86-64.
Itanium is a premature processor and cannot really compete with anything. But IA64 architecture is fully developed and has a very bright future ahead...
IA64 might be a good architecture, but good processors and real world applications are needed for it to succeed. Why would people buy Itanium if it is slow? They just won't. If nobody buys the processors then the application developers of the real world apps that IA64 needs to succeed will develop for x86-64 instead.
It is pure 64 bit and so is superior in a pure 64 bit environment... However if the most common applications run on x86-64, then x86-64 will succeed and IA64 will die because of a total lack of software. And x86-64 is likely to be more prevalent due to its ability to quickly execute "legacy" x86 code.
-- If you ever drop your keys into a river of molten lava, let'em go, because, man, they're gone.
Re:I'll stick to my Atari 800XL
by
Anonymous Coward
·
· Score: 0
I personally still use my 6502c 8-bit proc for my Atari 800XL, has all the power I need.
Atari Power without the Price!
- MUD
Re:Intel learning from their mistakes (using HP!)
by
Kuad
·
· Score: 1
I believe you mean that McKinley is an HP design, not the other way around. HP has been working on VLIW architectures for a decade now and they certainly have a better idea of how to make it work.
He will finally be able to store his net worth in dollars in a single long int.
-- I've had enough abrasive sigs. Kittens are cute and fuzzy.
FAR addressing.
by
leuk_he
·
· Score: 3, Insightful
Why do we need 64bit processors? Addressing? Nah, current processors can address enough space.. with 386 processors FAR addressing was introduced, which expanded allocatable address space drastically.
Far addressing can handle 4GB, but 4GB is not much by today's standards (it is not little either). You want a flat address space, and RAM is cheap these days. 32 bits get you to 4GB; if you want more you have to resort to tricks. (those silly DS.DD etc)
With the first 64 bit Alphas they used this as an argument: it is useful for fast memory scans when using big databases.
--640 Kb will be enough....
Re:Intel learning from their mistakes (using HP!)
by
Anonymous Coward
·
· Score: 0
> As far as I know, Merced is HP's design. McKinley is Intel's. So... you could say Intel is learning from their mistakes by letting HP engineers do a good job.
You mean, of course, that Merced is Intel's design, whereas McKinley is HP's.
Someone is gonna pay for this
by
Gastnor
·
· Score: 0, Offtopic
Don't they know it's illegal to reverse eng. something like this? DMCA will get everyone in the end. Mwahahaha!
Questionable "optimization"
by
Anonymous Coward
·
· Score: 0
The original test was written in FORTRAN and ported to C. As the article states, the array was accessed in a FORTRAN manner, which screwed up C's cache locality. Had this program been written in C in the first place, the programmer would have adjusted the loops to better accommodate C's array layout and avoided the issue altogether. Although Sun's optimization is admirable, it will not help out the vast majority of C programs written with C's design in mind. There are lies, damned lies and benchmarks. Sun's processor speed claims are just that. Real world code will not be significantly aided by this particular optimization. Also, I would be a bit leery of a compiler questioning and rearranging the order of my loops. I've caught quite a few bugs in overzealous optimizing compilers over my career, and it is certainly the last place you look for a bug, and is very time consuming and expensive to pinpoint.
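To make the layout point concrete, here is a rough sketch (Python just for brevity; the function names are made up for illustration and have nothing to do with Sun's compiler). It assumes C's row-major layout and lists how far apart consecutive memory accesses land for the two loop orders. The C-friendly order touches neighbouring elements; the FORTRAN-habit order jumps a whole row each time, which is what hurts the cache.

```python
def row_major_offset(i, j, n_cols):
    """Element offset of a[i][j] when the array is stored row by row (C layout)."""
    return i * n_cols + j


def access_strides(order, n_rows, n_cols):
    """Offset differences between consecutive accesses for a given loop order.

    order="ij": i in the outer loop, j in the inner loop (natural for C).
    order="ji": j in the outer loop, i in the inner loop (natural for FORTRAN,
                whose arrays are column-major).
    """
    if order == "ij":
        offsets = [row_major_offset(i, j, n_cols)
                   for i in range(n_rows) for j in range(n_cols)]
    elif order == "ji":
        offsets = [row_major_offset(i, j, n_cols)
                   for j in range(n_cols) for i in range(n_rows)]
    else:
        raise ValueError("order must be 'ij' or 'ji'")
    return [b - a for a, b in zip(offsets, offsets[1:])]


# Hypothetical 3 x 5 array, chosen only to keep the output short.
print(access_strides("ij", 3, 5))  # unit strides throughout
print(access_strides("ji", 3, 5))  # mostly strides of one full row (5 elements)
```

Swapping the two loops (loop interchange) is the kind of transformation under discussion: it turns the second access pattern into the first without changing which elements get touched.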
Re:Questionable "optimization"
by
Anonymous Coward
·
· Score: 0
Also keep in mind that it took Sun over 3 years to clean up major bugs in its Forte line of compilers. Maybe this was because of the company's big Java initiative - it may have simply ignored C/C++ compiler development in that time. I think within 10 years the only compiler left standing will be GCC. It is not economically viable for a hardware company to support, upgrade and maintain a modern compiler - even for companies the size of Sun, IBM or Compaq. Compilers are a big money loser. Up until now it was simply the cost of doing business. But times have changed and it is no longer the fastest and most cost effective route for a chip maker to get compiler support for their new CPU designs.
Performance isn't the only reason for 64-bit chips
by
Xenophon+Fenderson,
·
· Score: 1
There are actually several meanings of "64-bit chip". If you mean "64-bit or wider data path through the CPU", then yes, performance is the reason people want that. If you mean "uses 64-bit addresses", then yes, performance isn't really an issue, whereas increased program size is (e.g. large databases).
I'm fairly sure they will ship in Jan
by
Aqua+OS+X
·
· Score: 2, Informative
Motorola and Apple really have the jump on this one. And it is about time, because the G4 has not been holding its ground for a while now.
Apple is going to use an MPC85xx. Here is one, if not the, chip Apple will use: MPC8540 info
64/32 bit processing, 333 MHz DDR, Rapid I/O, etc.
HyperTransport will probably also find its way into the new motherboard designs. That's been done for a while now.
-- "Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
Serial ATA will help there
by
Anonymous Coward
·
· Score: 0
Since transistors are cheap once we have a cheap interface, because of low pin count, cards which have a lot of individual ATA channels will become a lot cheaper too. A 1 meter cable length should suffice for most applications too.
Then the only real advantage to SCSI from a user point of view will be that the HD manufacturers artificially separate the markets by not offering ATA on their best HDs.
Sun makes the LOWEST priced 64 bit server $950.00
by
Anonymous Coward
·
· Score: 0
Sun currently sells the lowest cost 64 bit server
Check out www.sun.com: they have two 64 bit machines that include all you need to boot up (the box and the software) for under $1k. Let's see if Intel or AMD can do that. Yes, you can pay a lot more, like $1M, for a Sun too, should you be needing a 128 CPU multiprocessor box. Try that (today) with AMD or Intel. That said, I admit to using a P500 (running Solaris 8) to type this on, simply because the P500 was free.
No, it's memory mapped devices
by
yerricde
·
· Score: 1
jumping around in [assembler] code, you want to be able to use negative numbers -> gives you +- 2GB range
Long jumps (i.e. more than +/- 127 bytes in 6502 or x86 or +/- 32767 words in MIPS) are not relative in most architectures; they're absolute. The real reason for a 2 GB limit is not negative addresses but memory-mapped devices other than RAM, such as AGP, PCI, ROM, etc.
I'm tired of x86-centric articles...
by
kelleher
·
· Score: 2
So the only thing noteworthy of the US-III is the Forte 7 compiler optimization? And just read this garbage about cache sizes:
The majority of McKinley's transistor count is tied up in its cache hierarchy. It is the first microprocessor to include three levels of cache hierarchy on chip. The first level of cache consists of separate 16 KB instruction and data caches, the second level of cache is unified and 256 KB in size, and the third level of cache is an astounding 3 MB in size.
Which is later followed by:
Both the [SunBlade] Model 1900 and Model 900 Cu versions of the Blade 1000 feature 8 MB of L2 cache.
Hmmm... 3MB is astounding, but 8MB is unremarkable... Well, I'd have to agree. I haven't bought a server with less than 4MB of cache in years. Oops, the SunBlade is only a workstation... Kinda makes you wonder.
Sun might be expensive, but it's solid, fast (enough), and predictable. I love x86 (usually Linux) at home, but wouldn't dream of putting it someplace business vital - much less mission critical.
Business Vital == 1 maintenance window per month and a mean time to recover exceeding 6 hours potentially costs several million dollars.
Mission Critical == 1 maintenance window per quarter and a mean time to recover exceeding 15 minutes potentially costs several million dollars.
Re:I'm tired of x86-centric articles...
by
cgori
·
· Score: 1
> Hmmm... 3MB is astounding, but 8MB is
> unremarkable... Well, I'd have to agree. I
> haven't bought a server with less than 4MB of
> cache in years. Oops, the SunBlade is only a
> workstation... Kinda makes you wonder.
Uh, the 8MB L2 is off-chip on US3 (ever wonder why USparc processors come as those big modules?). The 3MB L3 (and of course also the L2) is on-chip on McKinley. That's why it's a BFD. Try to read a little bit about the processors before you go off on them.
Sun hardware is not fast (enough) if you need really high speed. Price/performance is also horrible.
Linux reliability is good, but not as good as Sun, at least not when you have maintenance windows like you are quoting. However, you ought to just be able to buy 2x redundant Linux/PC everything to cover your multi-million-dollar failure cases and just swap over in the event of failure.
I use almost all Linux for computation servers at work (in the chip-design space, actually), we have a few Suns kicking around just in case an app isn't supported. The Sun HW is almost always half the speed. Our linux boxes can run 90+ days with 100% cpu load, no worry (of course, so can the suns). But, we can buy 2 or 3 times as many fully equipped Linux servers per Sun.
Re:I'm tired of x86-centric articles...
by
f00zbll
·
· Score: 2
So what you're telling me is you're willing to put a mission critical application that could cost millions in losses if it goes down on PC hardware? Last time I checked, most low/medium level PC motherboards are still manufactured on a 3-5 layer process and use less stringent tolerances. Even high end motherboards don't have the same level of tolerance as Solaris server level motherboards.
Is it worth spending an extra 3-4K per machine to make sure you don't lose a couple million? I don't think math is needed to figure that one out. Plus, do you want to be the one responsible for it when it goes down? How much would a full redundant system cost to build with PC components that are equal to Solaris 1K+ servers like 4500?
I for one would never put a mission critical database or transaction application on a Linux PC. Not because Linux isn't a good operating system, but because PC components are designed to be thrown away in 1-2 years.
Virtual memory filesystem breaks "revert to saved"
by
yerricde
·
· Score: 2
You need sector-sized granularity for tracking changed sectors. The latter does not scale well.
The common implementations of virtual memory use page-sized granularity for tracking changed pages. Does that scale?
The big problem I've seen with using a single memory space is that applications often forget to implement multiple levels of undo. With many PDA applications, once you make two accidental changes to a file, the previous version is gone forever because many applications modify files in-place, breaking the "revert to last saved version of document" feature. This bit me in the butt several times on Newton OS.
Having an IDE controller in your 486 also presupposes a spare slot. Just, at the time, you had to plan ahead and figure that two slots are gone right from the start (video and IDE).
Nowadays, you might have no slots taken up by IDE, video, sound and LAN. This is a good thing as it is freeing up slots for you to go willy-nilly all over. If you have to remove, say, a video capture board to make up for a broken IDE controller, just keep in mind that you never had the choice with your 486.
Probably what you want is some sort of indefinitely expandable bus that may actually be coming out this century :).
"version of Windows XP (or whatever the next version will be called)"
I recommend "Windoze ES", for Eat Shit. After all, it's what they've told the gummint.
Kind of like: Eat Shit -- 10 Billion Flies (95% Market Share) Can't Be All Wrong
But where is Windows for x86-64?
by
Namarrgon
·
· Score: 2
The number one question (but perhaps not in this forum) on most potential Sledgehammer owners' minds is, what OS are they going to run on their 64 bit? Apparently not Windows. No announcement has been made by MS or AMD. Yet.
Win64 has been ported to Itanium for some time now. We've already ported our memory-hungry special-FX app to it. But few people outside the server space are going to be interested in getting an Itanium because the performance with legacy IA32 apps is dog-slow. I mean really slow, like P90 speeds. So we don't expect too many sales of that version, just a few for hardcore dedicated seats.
Sledgehammer is really interesting to us. Combine the best available x86-32 performance for running 3DS Max, Lightwave, Photoshop etc etc, along with the serious memory & address space of a 64 bit CPU for our app, not to mention quite a bit more speed when doing 64 bit calculations (pretty common with 64 bit pixels), makes for a powerful and still flexible beast.
But without Windows support, the IA32 performance advantage is largely meaningless. In our market, that relegates Hammer to Linux-64 render farms - which is fine, but it's not where our money is, and it's not where the CPU would shine. You can use Win32 or Linux-32, of course (unlike Itanium), but that's kinda missing the point.
AMD better get MS & Win64 on their side soon, if they want to capture the workstation market. A lot of server apps still require Windows too. The reality of the market is that mainstream OS support is required, or you get niched PDQ.
--
Why would anyone engrave "Elbereth"?
But I *already* have a 128-bit computer
by
UnknownSoldier
·
· Score: 2
I think the point of a 64-bit cpu is a bit short sighted. Hardware vendors should be jumping to 128-bit cpus, ala the "Emotion Engine" in the PS2.
Why 128-bit? Because a 4-tuple (x,y,z,w) for vector and matrix operations can then be natively done. (Yes, I'm spoiled with the VU's on the PS2)
I do wonder when 64-bit cpu's will actually become a commodity item though. A 32-bit cpu provides 99% functionality for most of the general public using them. It's only gaming, scientific computing, & multimedia that really need the 128-bit registers, correct? Or am I missing something?
Secondly, for 64-bit cpus, is there a standard instruction set? Or do I need to compile our game code specifically for IA64, and Hammer?
I do agree, that a 64-bit address would be a welcome change. I can imagine the Database guys jumping up and down with joy once cheap PC hardware supports 64-bit.
Re:But I *already* have a 128-bit computer
by
Anonymous Coward
·
· Score: 0
64bit generally means one of two things: either the native size of its words for how much it can address (16bit=64K, 32bit=4gig, 64bit=lots! This is the correct way), or the width of the data bus (the marketing).
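For anyone who wants the actual numbers behind "64K, 4gig, lots", here is a quick sketch. The helper names are made up for this post; the only assumption is byte-addressable memory, so N address bits reach 2^N bytes.

```python
def addressable_bytes(address_bits):
    """Bytes reachable with the given number of address bits (byte-addressed)."""
    return 2 ** address_bits


def human(n_bytes):
    """Format a power-of-two byte count using binary units (1 KB = 1024 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    i = 0
    while n_bytes >= 1024 and i < len(units) - 1:
        n_bytes //= 1024
        i += 1
    return f"{n_bytes} {units[i]}"


for bits in (16, 32, 64):
    print(f"{bits}-bit addresses: {human(addressable_bytes(bits))}")
```

So "lots" works out to 16 exabytes, about four billion times the 32-bit limit.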
Game consoles like to call themselves 64bit because they can move 64bits at a time. The 64bit Atari Jaguar had a Motorola 68000 as its main CPU! The same as the first Amigas, Macs, Genesis, Neo Geo, and other 16bit machines.
By that standard even the Pentiums were 64bit. And the PS2 is 128bit.
By your standard of 128bit, PCs reached that with the PentiumIII, which with SSE can do four floats at once, just like your PS2.
However both the PS2 and PentiumIII are really 32bit.
Read the article. AMD is extending the x86 line to 64bit, Intel invented a whole new instruction set to break compatibility (and make lots of patents to prevent Transmeta, AMD, VIA, etc from cloning), Sun has theirs, DEC/Compaq/Intel have their own Alpha ISA, SGI uses the MIPS one like the PS2, IBM uses its own POWER instruction set (!=PowerPC), and Motorola is about to introduce the G5 this year to bring a 64bit PowerPC chip to the desktop (well mostly embedded areas).
Re:But I *already* have a 128-bit computer
by
UnknownSoldier
·
· Score: 2
> Game consoles like to call themselves 64bit because they can move 64bits at a time.
Unfortunately true, the early consoles would play marketing games like this.
> By that standard even the Pentiums were 64bit.
The (classic) Pentium is classified as 32-bit because the *general purpose* CPU registers are only 32 bits. (There are a few 64-bit registers, i.e. TimeStampCounter, etc)
With MMX/SSE, the PentiumIII is actually a 32-bit / 128-bit hybrid. It has *native* instructions and registers for *both* 32 and 128-bit processing.
> However both the PS2 and PentiumIII are really 32bit.
Incorrect.
Pentium) I explained this above.
PS2) Do you even program on a PS2??
I think you need to re-read your "EE User's Manual", "EE Overview Manual", and "EE Core User's Manual" (Section 1.4) The core internal bus is 128 bits, and *ALL* the General Purpose Registers (total of 32) are 128 bits. What do you think LQ and SQ do? They load/store 128-bits to/from a register!
Now, it is true, that most PS2 instructions only deal with 32-bit (word) and 64-bits (doublewords), but there are native 128-bit multimedia instructions.
Don't let the fact that the PS2 treats the 128-bit registers as 2 * 64-bits, or 4 * 32-bits confuse you.
Technically the PS2 is a 64-bit/128-bit hybrid, much the same way the PentiumIII is.
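If the "2 * 64-bits, or 4 * 32-bits" part sounds odd, here is a small sketch of the idea in Python. It only models the bit layout of one 128-bit value, not the EE's actual instructions, and the little-endian byte order and the sample vector are assumptions made for the illustration.

```python
import struct

# A hypothetical (x, y, z, w) vector of single-precision floats.
vec = (1.0, 2.0, 3.0, 4.0)

# Four 32-bit floats packed back to back occupy 16 bytes, i.e. 128 bits.
packed = struct.pack("<4f", *vec)
register_bits = len(packed) * 8

# The same 128 bits reinterpreted as four 32-bit lanes or two 64-bit lanes.
as_words = struct.unpack("<4I", packed)
as_doublewords = struct.unpack("<2Q", packed)

print(register_bits)
print([hex(w) for w in as_words])
print([hex(d) for d in as_doublewords])
```

Nothing about the bits changes between the two views; only the lane width the instruction chooses to operate on does.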
Cheers
Copyright Violation
by
Anonymous Coward
·
· Score: 0
Hi,
I appreciate the concern that readers might have about slow download time, particularly when the server is hit by so many technology enthusiasts that come from this fine site...
However, posting the entire article - even with full attributions - is a violation of copyright law, because nobody actually asked for, nor received permission to, post it elsewhere in its entirety, AFAIK.
Please note that Real World Technologies does *not* own the copyright; it is owned by Paul DeMone. Therefore I suggest that either the site owners of Slashdot request permission from Paul to keep this here (if they have not done so already), or it should be removed.
Another poster provided a link to the full article (printed version), which is much more appropriate until such time as Paul wishes to give his blessings for having the article reprinted in full here...
PS... Anyone here want to donate a high-powered web server just for Slashdot readers to enjoy faster response times when reading RWT?;-)
Regards,
Dean Kent
Real World Technologies
Re:Mirror
by
Anonymous Coward
·
· Score: 0
This is not an official mirror site. This site has effectively stolen the article in full from Paul DeMone.
Anyone who cares about Intellectual Property should avoid using this link. The owner of that site should contact Paul DeMone directly and request permission for posting it.
I find this rather disturbing that this is done so blatantly...
Regards,
Dean Kent
Real World Technologies
16 to 32 bit jump unimpressive
by
eples
·
· Score: 1
The "more bits" phenomenon has been sustained by improvements in VLSI and the advent of true System-on-a-chip design, but this too has its limits. If you compare a P4 motherboard with, say, a 386 mobo circa 1995
I agree, the jump from 16 to 32-bit desktop processing was anti-climactic. The hardware definitely lagged behind the processors, and the software lagged even further.
I was always wondering why they do not average time for spec number (harmonic average for scores). That way it would be very difficult to change average with only 1 test. Yes, they do something like this for spec_rate, but this test is mainly for multiprocessor systems
Re:spec averaging
by
Anonymous Coward
·
· Score: 0
If you had one test which happened to take 12 hours when the rest took 5 minutes all the effort would go into that instead of balanced like they do now which would result in an even worse situation than what we have.
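A small sketch of that effect, with made-up run times (nothing here is real SPEC data). It assumes the composite score is a geometric mean of per-test ratios (reference time / measured time), which as far as I know is how SPEC does it, and compares that against simply adding up the run times.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratios(reference, measured):
    """Per-test score: reference time divided by measured time."""
    return [r / m for r, m in zip(reference, measured)]


def total_time_speedup(before, after):
    """Speedup you would report if you just summed the run times."""
    return sum(before) / sum(after)


# Hypothetical suite: three 5-minute tests and one 12-hour test (seconds).
reference = [300.0, 300.0, 300.0, 43200.0]
baseline = [150.0, 150.0, 150.0, 21600.0]     # 2x the reference everywhere
tuned_short = [75.0, 150.0, 150.0, 21600.0]   # one short test made 2x faster
tuned_long = [150.0, 150.0, 150.0, 10800.0]   # the long test made 2x faster

for name, run in (("short", tuned_short), ("long", tuned_long)):
    print(name,
          round(geometric_mean(ratios(reference, run)), 3),
          round(total_time_speedup(baseline, run), 3))
```

With the geometric mean, doubling the speed of any one test moves the score by the same amount. With summed times, only the 12-hour test matters, so that is where all the tuning effort would go.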
A new thought on an old subject
by
Anonymous Coward
·
· Score: 0
Nerds!!!!!!
arrrrrrrrgh!!!!
nnnnnnnnneeeeerrrrdddddsss!!!!!
That makes me feel a lot better, stompin some nerds!
I'll take a look at 64-bit processors once I can manage to utilize more than 60% of my current 32-bit processor on a regular basis.
Re:Performance isn't the only reason for 64-bit ch
by
psamuels
·
· Score: 1
There are actually several meanings of "64-bit chip". If you mean "64-bit or wider data path through the CPU", then yes, performance is the reason people want that.
...so by this definition the 8088 was an 8-bit processor and the 386SX was 16-bit.
If you mean "uses 64-bit addresses", then yes, performance isn't really an issue, whereas increased program size is (e.g. large databases).
If you go by address space, the 6502 is a 16-bit CPU. And the Pentium Pro is 36-bit. (Sort of.)
I think a third definition - width of general-purpose registers - is the commonly used one. In that case the 6502 is only 8-bit, the 8088 is 16-bit and the 386SX is 32-bit.
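Putting the three definitions side by side, as a quick sketch. The table is filled in from memory, and the entries not mentioned above (the 20-bit and 24-bit address buses of the 8088 and 386SX, the 64-bit data bus of the Pentium Pro) are my additions, so treat them as assumptions rather than gospel.

```python
# (data bus bits, address bits, general-purpose register bits) per chip.
CHIP_WIDTHS = {
    "6502": (8, 16, 8),
    "8088": (8, 20, 16),
    "386SX": (16, 24, 32),
    "Pentium Pro": (64, 36, 32),
}

DEFINITIONS = ("data_bus", "address", "registers")


def bitness(chip, definition):
    """How many 'bits' a chip is under one of the three definitions."""
    return CHIP_WIDTHS[chip][DEFINITIONS.index(definition)]


for chip in CHIP_WIDTHS:
    print(chip, {d: bitness(chip, d) for d in DEFINITIONS})
```

Only the register column matches what people normally call these chips.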
-- "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
When will 128 bit strike? Here's my take
by
jeffbeadles
·
· Score: 1
(At least from Intel)
8 bit - 8008 - 1972
16 bit - 8086 - 1978 (1.3 bits/year)
32 bit - 80386 - 1985 (2.3 bits/year)
64 bit - Itanium - 2001 (2 bits/year)
128 bit - ?? - 2026? (2.5 bits/year?)
I can't imagine what I would do with a 2^64 address space, let alone 2^128...
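For the curious, the bits/year figures in the list above are just (new width - old width) / (years between). A quick sketch of the arithmetic, using only the widths and years from the list; the 2.5 bits/year rate for the last step is the guess from the list, not a measurement.

```python
# (register width in bits, year) taken from the list above.
MILESTONES = [(8, 1972), (16, 1978), (32, 1985), (64, 2001)]


def bits_per_year(prev, cur):
    """Growth rate between two (bits, year) milestones."""
    return (cur[0] - prev[0]) / (cur[1] - prev[1])


def projected_year(last, target_bits, rate):
    """Year the target width arrives if growth continues at the given rate."""
    return last[1] + (target_bits - last[0]) / rate


rates = [bits_per_year(a, b) for a, b in zip(MILESTONES, MILESTONES[1:])]
print([round(r, 1) for r in rates])

# Assumed rate of 2.5 bits/year for the jump from 64 to 128 bits.
print(projected_year(MILESTONES[-1], 128, 2.5))
```

At 2.5 bits/year the extra 64 bits take about 25.6 years, which is where the 2026 guess comes from.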
AMD is the future. Glad to see an underdog win.
I've had 64bit computing for years now! This is nothing new!
I want a 128 bit computer. Why are computers lagging against game systems, or is this just a graphics measurement on the game systems?
I still want a 128 bit computer in any case!!!! (-:
If the Power Mac G5 is introduced at Macworld on Monday*, you can all have your 64-bit goodness by the month's end! *I'm not really expecting it to be released this soon, maybe later this year. But who knows? It could happen.
SIGFEH
CONGRATS.I386
I truly care about 64 bits. That way graphics' transparencies, reflections, etc. can be embedded right away. Yohoo! Way to go!
is here: That way you only have to wait a longass time for it to load once, instead of a longass time for each of the 5 or 6 pages.
-- Dan
HISTORY OF THE WORLD (Score:5, Funny)
2.5 million B.C.: OOG the Open Source Caveman develops the axe and releases it under the GPL. The axe quickly gains popularity as a means of crushing moderators' heads.
100,000 B.C.: Man domesticates the AIBO.
10,000 B.C.: Civilization begins when early farmers first learn to cultivate hot grits.
3000 B.C.: Sumerians develop a primitive cuneiform perl script.
2920 B.C.: A legendary flood sweeps Slashdot, filling up a Borland / Inprise story with hundreds of offtopic posts.
1750 B.C.: Hammurabi, a Mesopotamian king, codifies the first EULA.
490 B.C.: Greek city-states unite to defeat the Persians. ESR triumphantly proclaims that the Greeks "get it".
399 B.C.: Socrates is convicted of impiety. Despite the efforts of freesocrates.com, he is forced to kill himself by drinking hemlock.
336 B.C.: Fat-Time Charlie becomes King of Macedonia and conquers Persia.
4 B.C.: Following the Star (as in hot young actress) of Bethlehem, wise men travel from far away to troll for baby Jesus.
A.D. 476: The Roman Empire BSODs.
A.D. 610: The Glorious MEEPT!! founds Islam after receiving a revelation from God. Following his disappearance from Slashdot in 632, a succession dispute results in the emergence of two troll factions: the Pythonni and the Perliites.
A.D. 800: Charlemagne conquers nearly all of Germany, only to be acquired by andover.net.
A.D. 874: Linus the Red discovers Iceland.
A.D. 1000: The epic of the Beowulf Cluster is written down. It is the first English epic poem.
A.D. 1095: Pope Bruce II calls for a crusade against the Turks when it is revealed they are violating the GPL. Later investigation reveals that Pope Bruce II had not yet contacted the Turks before calling for the crusade.
A.D. 1215: Bowing to pressure to open-source the British government, King John signs the Magna Carta, limiting the British monarchy's power. ESR triumphantly proclaims that the British monarchy "gets it".
A.D. 1348: The ILOVEYOU virus kills over half the population of Europe. (The other half was not using Outlook.)
A.D. 1420: Johann Gutenberg invents the printing press. He is immediately sued by monks claiming that the technology will promote the copying of hand-transcribed books, thus violating the church's intellectual property.
A.D. 1429: Natalie Portman of Arc gathers an army of Slashdot trolls to do battle with the moderators. She is eventually tried as a heretic and stoned (as in petrified).
A.D. 1478: The Catholic Church partners with doubleclick.net to launch the Spanish Inquisition.
A.D. 1492: Christopher Columbus arrives in what he believes to be "India", but which RMS informs him is actually "GNU/India".
A.D. 1508-12: Michelangelo attempts to paint the Sistine Chapel ceiling with ASCII art, only to have his plan thwarted by the "Lameness Filter."
A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
A.D. 1553: "Bloody" Mary ascends the throne of England and begins an infamous crusade against Protestants. ESR eats his words.
A.D. 1588: The "IF I EVER MEET YOU, I WILL KICK YOUR ASS" guy meets the Spanish Armada.
A.D. 1603: Tokugawa Ieyasu unites the feuding pancake-eating ninjas of Japan.
A.D. 1611: Mattel adds Galileo Galilei to its CyberPatrol block list for proposing that the Earth revolves around the sun.
A.D. 1688: In the so-called "Glorious Revolution", King James II is bloodlessly forced out of power and flees to France. ESR again triumphantly proclaims that the British monarchy "gets it".
A.D. 1692: Anti-GIF hysteria in the New World comes to a head in the infamous "Salem GIF Trials", in which 20 alleged GIFs are burned at the stake. Later investigation reveals that many of the supposed GIFs were actually PNGs.
A.D. 1769: James Watt patents the one-click steam engine.
A.D. 1776: Trolls, angered by CmdrTaco's passage of the Moderation Act, rebel. After a several-year flame war, the trolls succeed in seceding from Slashdot and forming the United Coalition of Trolls.
A.D. 1789: The French Revolution begins with a distributed denial of service (DDoS) attack on the Bastille.
A.D. 1799: Attempts at discovering Egyptian hieroglyphs receive a major boost when Napoleon's troops discover the Rosetta stone. Sadly, the stone is quickly outlawed under the DMCA as an illegal means of circumventing encryption.
A.D. 1844: Samuel Morse invents Morse code. Cryptography export restrictions prevent the telegraph's use outside the U.S. and Canada.
A.D. 1853: United States Commodore Matthew C. Perry arrives in Japan and forces the xenophobic nation to open its doors to foreign trade. ESR triumphantly proclaims that Japan finally "gets it".
A.D. 1865: President Lincoln is 'bitchslapped.' The nation mourns.
A.D. 1901: Italian inventor Guglielmo Marconi first demonstrates the radio. Metallica drummer Lars Ulrich immediately delivers to Marconi a list of 335,435 suspected radio users.
A.D. 1911: Facing a break-up by the United States Supreme Court, Standard Oil Co. defends its "freedom to innovate" and proposes numerous rejected settlements. Slashbots mock the company as "Standa~1" and depict John D. Rockefeller as a member of the Borg.
A.D. 1929: V.A. Linux's stock drops over 200 dollars on "Black Tuesday", October 29th.
A.D. 1945: In the secret Manhattan Project, scientists working in Los Alamos, New Mexico, construct a nuclear bomb from Star Wars Legos.
A.D. 1948: Slashdot runs the infamous headline "DEWEY DEFEATS TRUMAN." Shamefaced, the site quickly retracts the story when numerous readers point out that it is not news for nerds, stuff that matters.
A.D. 1965: Jon Katz delivers his famous "I Have A Post-Hellmouth Dream" speech, which stated: "I have a dream that one day on the red hills of Georgia the geeks of former slaves and the geeks of former slave geeks will be able to sit down together at the table of geeks... I have a dream that my geek little geeks will one geek live in a nation where they will not be geeked by the geek of their geek but by the geek of their geek."
A.D. 1969: Neil Armstrong becomes the first man to set foot on the moon. His immortal words: "FIRST MOONWALK!!!"
A.D. 1970: Ohio National Guardsmen shoot four students at Kent State University for "Internet theft".
A.D. 1989: The United States invades Panama to capture renowned "hacker" Manuel Noriega, who is suspected of writing the DeCSS utility.
A.D. 1990: West Germany and East Germany reunite after 45 years of separation. ESR triumphantly proclaims that Germany "gets it".
A.D. 1994: As years of apartheid rule finally end, Nelson Mandela is elected president of South Africa. ESR is sick, and sadly misses his chance to triumphantly proclaim that South Africa "gets it".
A.D. 1997: Slashdot reports that Scottish scientists have succeeded in cloning a female sheep named Dolly. Numerous readers complain that if they had wanted information on the latest sheep releases, they would have just gone to freshsheep.net
A.D. 1999: Miramax announces Don Knotts to play hacker Emmanuel Goldstein in upcoming movie "Takedown"
Since they can't get 32 bits to work stably, what will happen when Billy has to tame 32 more of those bits??
The idea alone makes me very afraid, because I have to work with Windows in my office.
Real programmers don't document.
It was hard to write so it should be hard to understand.
Looking Forward to 2002
By: Paul DeMone (pdemone@realworldtech.com) Updated: 01-02-2002
A Quick Look Back
In the last six months several noteworthy events and disclosures have occurred in the fast moving world of microprocessors. AMD started shipping its Palomino K7 processor as the Athlon XP. Despite the controversy surrounding the performance rating based model naming scheme associated with the XP, it appears the latest refinement of AMD's venerable K7 design has, by most measures relevant to the PC world, eclipsed the performance of the 2 GHz Pentium 4 (P4), the highest speed grade offered for Intel's first implementation of its new x86 microarchitecture. However, this advantage should prove short-lived, as the second generation 0.13 um Northwood P4 will be officially released in early January. The Northwood will offer higher clock rates, an L2 cache doubled in size, and minor internal performance enhancements.
Extending their rivalry on a different front, Intel and AMD unveiled microarchitectural details of their forthcoming 64-bit standard bearers at Microprocessor Forum in October. Although the McKinley and Hammer are both future flagship parts, and thus important symbols of Intel's and AMD's struggle for technological leadership, the two processor families will be sold into different markets and won't directly compete. In other 64-bit news, IBM officially unveiled the POWER4 processor in several different hardware configurations with clock rates as high as 1.3 GHz and took the top spot in both the integer and floating point performance categories of the SPEC CPU 2000 benchmark. However, preliminary "teaser" numbers from Compaq suggest that IBM will lose SPEC performance leadership when the EV7, the final major product introduction in the doomed Alpha line, is unveiled. Regardless of who wins bragging rights for technical computing, both processors will offer memory and I/O bandwidth far ahead of their competitors and both should do quite well on commercial workloads.
Sun Microsystems continues to slowly upgrade its UltraSPARC-III line in the face of an increasingly difficult competitive environment. Sun recently introduced its copper process based version of the US-III at 900 MHz. The latest device ostensibly includes a fix to the prefetch buffer bug that vexed the earlier aluminum based device. Far more interesting than the new silicon was the latest version of Sun's compiler. It raised the new copper US-III/900's SPECfp2k score by roughly 20% by spectacularly accelerating one of the 14 programs in the suite using an undisclosed optimization. A recent call was issued for new programs for the next generation of the SPEC CPU benchmark. Tentatively named SPEC 2004, it now seems like it couldn't come soon enough.
McKinley: Little more Logic, Lots more Cache
The most striking aspect of McKinley is its size and transistor count. Weighing in at a hefty 220 million transistors, this 0.18 um device occupies a substantial 465 mm2 of die area. The majority of McKinley's transistor count is tied up in its cache hierarchy. It is the first microprocessor to include three levels of cache hierarchy on chip. The first level of cache consists of separate 16 KB instruction and data caches, the second level of cache is unified and 256 KB in size, and the third level of cache is an astounding 3 MB in size. The die area consumed by the final level of on-chip cache can be seen in the floorplan of the McKinley and some representative server and PC class MPUs shown in Figure 1.
Figure 1 Floorplan of McKinley and Select Server and PC MPUs.
The Itanium (Merced) floorplan is shown as blank because although its chip floorplan has been previously disclosed its die size is still considered sensitive information by Intel and has not been released. The outlines shown indicate the range of likely sizes of the Itanium die based on estimates from a number of industry sources.
Both the first and second generation IA64 designs, Itanium/Merced and McKinley, are six issue wide in-order execution processors. In-order execution processors cannot execute past stalled instructions so it is important to have low average memory latency to achieve high performance. This focus on the memory hierarchy can be clearly seen in the McKinley [1]. Although it is not surprising that the on-chip level 3 cache in McKinley is much faster than the external custom L3 SRAMs used in the Itanium CPU module, it is interesting to see how much faster in terms of processor cycles the McKinley level 1 and 2 caches are despite the McKinley's 25 to 50 percent faster clock rate in the same 0.18 um aluminum bulk CMOS process.
The improvement in average memory latency between Itanium and McKinley can be approximated using the comparative access latencies presented by Intel at their last developers conference, combined with representative hit rates based on the size of each cache in the two designs and an assumed average memory access time of 160 ns. This data is shown in Table 1.
CPU    Processor                 Itanium  McKinley
       Frequency (MHz)           800      1000
L1     Size (KB)                 16       16
       Latency (cycles)          2        1
       Miss rate                 5.0%     5.0%
L2     Size (KB)                 96       256
       Latency (cycles)          12       5
       Global miss rate          1.8%     1.1%
L3     Size (MB)                 4        3
       Latency (cycles)          21       12
       Global miss rate          0.5%     0.6%
Mem    Latency (ns)              160      160
       Latency (cycles)          128      160
Total  Average latency (cycles)  3.62     2.34
       Average latency (ns)      4.52     2.34
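The totals in Table 1 can be reproduced with a short calculation. The article does not spell out its formula, so the one below is an assumption: every load pays the L1 latency, and each further level adds its latency weighted by the global miss rate of the level above it. The function and parameter names are illustrative only.

```python
def average_load_latency(freq_mhz, l1_cycles, l1_miss, l2_cycles, l2_global_miss,
                         l3_cycles, l3_global_miss, mem_ns):
    """Return (cycles, ns) for the average load latency of a three-level cache hierarchy.

    Assumed model (not stated in the article): each level's latency is added,
    weighted by the fraction of loads that miss in all levels above it.
    """
    # Convert the fixed memory access time into processor cycles.
    mem_cycles = mem_ns * freq_mhz / 1000.0
    cycles = (l1_cycles
              + l1_miss * l2_cycles
              + l2_global_miss * l3_cycles
              + l3_global_miss * mem_cycles)
    # Convert the average back into absolute time.
    ns = cycles * 1000.0 / freq_mhz
    return cycles, ns


# Inputs taken from Table 1.
itanium = average_load_latency(800, 2, 0.050, 12, 0.018, 21, 0.005, 160)
mckinley = average_load_latency(1000, 1, 0.050, 5, 0.011, 12, 0.006, 160)

print("Itanium : %.2f cycles, %.2f ns" % itanium)
print("McKinley: %.2f cycles, %.2f ns" % mckinley)
```

Under this model the results match the Total rows of the table to two decimal places, which suggests it is at least close to what the author used.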
The back of the envelope calculations in Table 1 suggest that a load instruction will be executed by McKinley with about half the average latency, in absolute time, that it would have on Itanium. No doubt this is a major contributor to the much higher performance of the second generation IA64 processor. Although the large die area of McKinley suggests a substantial cost premium compared to typical desktop MPUs, for large scale server applications the extra silicon cost is insignificant compared to the overall system cost budget. In fact, from the system design perspective, the ability to reasonably forgo board level cache probably more than pays for the extra silicon cost of McKinley through reduction of board/module area, power, and cooling requirements per CPU. Large scale systems based on the EV7 will also eschew board level cache(s), although with the Alpha it is the greater latency tolerance of the out-of-order execution CPU core plus the integration of high performance memory controllers that permit this, rather than gargantuan amounts of on-chip cache.
Besides the greatly enhanced cache hierarchy, the McKinley will boast two more "M-units" than Itanium. These are functional units that perform memory operations as well as most types of integer operations. In a recent article I speculated about the nature of McKinley design improvements. I suggested that it would contain 2 more I-units and 2 more M-units than Itanium in order to simplify instruction dispatch and reduce the frequency of split issue due to resource oversubscription. In IA64 parlance, both I-units and M-units can execute simple ALU based integer instructions like add, subtract, compare, bitwise logical, simple shift and add, and some integer SIMD operations. I-units also execute integer instructions that occur relatively infrequently in most programs but require substantial and area intensive functional units. These include general shift, bit field insertion and extraction, and population count.
Because the integer instructions that cannot be executed by an M-unit are relatively rare, the McKinley designers saved significant silicon area with little performance loss by only adding two M-units (for a total of four) and staying with the two I-units of Itanium. Data on the relative frequency of different integer operations suggest that the vast majority of integer operations, about 90%, that occur in typical programs are of the type that can be executed by either an M-unit or I-unit [2]. If we consider a random selection of six integer operations, each with a 90% chance of being executable by an M-unit, then the odds are better than 98% that any six instructions are compatible with the MMI + MMI bundle pair combination and can be dual issued by McKinley. Thus there is practically no incentive to add two extra I-units to McKinley to permit the dual issue of the MII + MII bundle pair combination.
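The "better than 98%" figure can be checked with a binomial sum. The sketch below assumes the six operations are independent, each with a 10% chance of needing an I-unit, and that an MMI + MMI bundle pair works whenever at most two of the six need an I-unit. It ignores any ordering constraints within a bundle, and the function name is purely illustrative.

```python
from math import comb


def dual_issue_probability(n_ops=6, p_m_capable=0.9, i_slots=2):
    """Probability that at most `i_slots` of `n_ops` operations require an I-unit.

    Assumes each operation independently has probability `p_m_capable`
    of being executable by an M-unit (a simplification of the article's argument).
    """
    p_i_only = 1.0 - p_m_capable
    return sum(comb(n_ops, k) * p_i_only ** k * p_m_capable ** (n_ops - k)
               for k in range(i_slots + 1))


print("P(MMI + MMI compatible) = %.4f" % dual_issue_probability())
```

With the article's numbers this comes to roughly 0.984, consistent with the claim that adding two more I-units would rarely help.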
One curiosity in the McKinley disclosure was the fact that the basic execution pipeline was revealed to be 8 stages long. Although this is still 2 stages shorter than the pipeline in the slower clocked Itanium, it is one more stage than the 7 stages previously attributed to McKinley [3]. Whether this represents a slightly different way of counting the pipe stages or an actual design change isn't clear. Ironically, it has long been rumored that the Itanium pipeline was stretched by at least one stage quite late in development. It will be interesting to see if the new IA64 core under development by the former Alpha EV8 design team (now at Intel) also suffers this strange pipeline growth affliction.
Hammering x86 into the 64 bit World
In October AMD revealed some aspects of K8, its next generation x86 core code-named Hammer [4]. This new design is primarily distinguished by being the first processor to implement x86-64, AMD's extension to the x86 instruction set that supports 64 bit flat addressing, 64 bit GPRs, as well as other enhancements. As can be seen in Figure 2, the Hammer core heavily leverages AMD's highly successful K7 core.
Figure 2 Comparison of K7 Athlon and K8 Hammer Organization
The back end execution engine of the K8 Hammer core is basically identical to that of the K7 except that the integer schedulers are expanded from 5 to 8 ROPs. The increase in the integer out-of-order instruction scheduling capability this implies may have been intended to better hide the data cache's two cycle load-use latency, and thus slightly increase per clock performance. An alternative hypothesis is that the latency of some integer operations may have been increased to allow higher clock rates and the change was made to prevent a slight loss in per clock performance. The basic execution pipeline of the K7 and K8 are compared in Figure 3.
Figure 3 Comparison of K7 and K8 Basic Execution Pipeline
The K8 execution pipeline has two more stages than K7, and these new stages seem to be related to x86 instruction decode and macro op distribution to the integer and floating point schedulers. Although some of the stages have been renamed it appears that the final five pipe stages, representing the back end execution engine, are comparable. This is unsurprising as the most complex and difficult task in an x86 processor like the K7 or K8 is the parallel parsing of up to three variable length x86 instructions from the instruction fetch byte stream and their decoding into groups of systematized internal operations. In comparison, the execution engine is hardly much more complex than a typical out-of-order execution RISC processor.
Both the block diagram and execution pipeline indicate that AMD has spent nearly all its effort in Hammer development on revamping the front end of the K7 design. Some of the extra degree of pipelining may be related to the extra degree of complexity in decoding yet another level of extensions (x86-64) on top of the already Byzantine x86 ISA. Some of the increase may be related to increased flexibility in internal operation dispatch to reduce the occurrence of stall conditions and increase IPC. And some of the increase may simply reflect a reduction in the work per stage to increase the clock scalability relative to the K7 core. Without a detailed description of each of the pipeline stages in the K8 it is difficult to correlate front end pipe stages in the K7 to the K8, and next to impossible to assess how the benefit of the extra two pipe stages is allocated between accounting for increased ISA complexity, measures to increase IPC, and reduction in timing pressure per pipe stage to allow higher clock rates.
Although the 64-bit instruction set extension makes for attention grabbing headlines in the technical trade press, the major performance enhancements in the Hammer series are much more prosaic from a processor architecture point of view. These enhancements are the direct integration of interprocessor communications interfaces and a high performance memory controller. Like a "poor man's EV7", the Hammer includes three bi-directional HyperTransport (HT) links and a memory controller supporting a 64 or 128-bit wide DDR memory system using unbuffered or registered DIMMs. With the latter, a K8 processor can directly connect to 8 DIMMs, although this number may be reduced at the higher memory speeds supported. It is interesting to compare the results of the same design philosophy applied to the high-end server and mainstream PC segments of the MPU market as shown in Table 2. Power and clock rates for the Hammer MPU are estimates.
                    Alpha EV7 [5]              K8 Hammer
Process             0.18 um bulk CMOS          0.13 um SOI CMOS
Die Size            397 mm2                    104 mm2
Power               125 W @ 1.2 GHz            ~70 W @ 2 GHz
Comm Links          4 links, each 6.4 GB/s,    3 links, each ~6 GB/s
                    one 6.4 GB/s IO bus
Memory Controller   2 x 64 bit DRDRAM,         64 or 128 bit DDR,
                    12.8 GB/s peak             2.7 or 5.4 GB/s peak
Package             1443 LGA                   ?
Although the Intel McKinley and AMD Hammer are both 64 bit MPUs, these devices are directed at different markets. While the large and expensive McKinley will target medium and high-end server applications, the first member of the Hammer family, code named "Clawhammer", will target the high end desktop PC market. That is not to say that McKinley will outperform the Clawhammer device. Indeed, I expect the AMD device will easily beat the much slower clocked IA64 server chip in SPECint2K and many other integer benchmarks, as well as challenge much faster clocked Pentium 4 devices in both integer and floating point performance.
Exactly how much performance the Hammer core may provide is the subject of some controversy. AMD's Fred Weber was quoted as stating the Hammer core could offer SPECint2k performance as much as twice that of current processors. Although this comment is vague enough to drive a truck through (twice as fast as the best AMD processor? Best x86 processor? Best processor announced but not yet shipping? IA-32 or x86-64 code? Clawhammer or the big cache Sledgehammer?), a few web based news sites interpreted this comment as meaning the Hammer would achieve 1400 SPECint2k, and now some people are incorrectly attributing this figure to Weber himself. Keep in mind that no Hammer device has even taped out as of the end of 3Q01, let alone been fabricated, debugged, verified, and benchmarked at the target clock frequency. Whatever figure Mr. Weber had in mind was derived from architectural simulation, and for a benchmark suite as cycle intensive as SPEC CPU, simulation results are approximate at best [6][7]. As has been shown time and time again, it is best not to count performance chickens too closely before the silicon eggs hatch.
Alpha Goes Out With a Bang not a Whimper
Although Compaq announced the wind down of Alpha development in June and transferred nearly the entire EV8 development team to Intel over the summer, there is still one more surprise in store for the computer industry. The EV7, the final major design revision in store for Alpha, has been the subject of intense testing, verification, and system integration exercises since late spring. This design has been in the pipeline for a long time. It was first announced more than three years ago and finally taped out in early 2001. Because of the complexity of this device (basically a complex CPU and large scale server chipset all on one die) and the incredible degree of shakedown server class MPUs and systems undergo, the EV7 will not go into volume production until the second half of 2002. To bridge the gap between current products and EV7 based systems Compaq will shortly release a 1.25 GHz version of the workhorse EV68.
Although general details of the EV7 design have been in the public domain for more than three years, and specific facts about the performance of this MPU's router and memory controllers were disclosed in February, I think the performance it will achieve when officially rolled out in 2H02 will surprise and dismay many in the industry (possibly including senior Compaq management). At the Microprocessor Forum in October Compaq's Peter Bannon unveiled some preliminary performance numbers for the EV7, namely 804 SPECint2k, 1253 SPECfp2k, and roughly 5 GB/s STREAM performance.
Although these numbers are quite good in absolute terms, comparable to the fastest speed grade POWER4 running in a contrived and unrealistic hardware configuration, the numbers failed to live up to my estimates given in a previous article. However, former members of the Alpha design team have privately confirmed my suspicions that Mr. Bannon was clearly sandbagging the EV7 numbers, keeping a not insignificant amount of performance off the table. For a product still more than six months from release that is a not unexpected tactic. I still hold the opinion that when it is all said and done the EV7 has a good chance of being the highest performance general purpose microprocessor ever fabricated in 0.18 um technology, a fitting ending to a remarkable and tragic technological saga (EV79, an EV7 shrink to 0.13 um SOI is on the roadmap for 1H04 but the continued turmoil at Compaq suggests a healthy amount of scepticism is in order).
Sun's Surprising Spike SPARCs SPECulation
Sun recently introduced a new member of its UltraSPARC-III family. This new 900 MHz device differs from earlier US-III parts by the use of copper interconnect instead of aluminum. Although Sun submitted official SPEC scores for a 900 MHz Sun Blade 1000 Model 1900 using an aluminum US-III in late 2000, yield was apparently poor and this speed grade wasn't generally available. A rarely occurring bug related to a prefetch buffer inside the US-III was discovered and as a work around this feature was disabled in firmware. Unfortunately for Sun Microsystems, this caused the SPECfp_base2k score for the Model 1900 to drop from an already lackluster 427 to a lamentable 369 in a second SPEC submission in the spring of 2001. So it comes as no small surprise that the Sun Blade 1000 Model 900 Cu workstation, based on the new copper processor turned in a SPECfp_base2k score of 629 in a recent submission. Both the Model 1900 and Model 900 Cu versions of the Blade 1000 feature 8 MB of L2 cache.
It is possible that the copper US-III incorporates improvements beyond a fix to the prefetch buffer bug as well as improvements to system level hardware between the Model 1900 and Model 900 Cu. However it appears much of the improvement can be attributed to the use of the Sun Forte 7 EA compiler instead of the earlier Forte 6 update 1 compiler used to generate the 427 and 369 scores. The reason why I say that with confidence can be seen quite readily in the graph in Figure 4.
Figure 4 SPECfp_base2k Component Scores for US-III and Competitors
The SPECfp_base2k scores for the 14 sub-component programs for the pre-bug fix Sun Blade 1000 Model 1900 submission using the Forte 6 compiler are compared to the recent Sun Blade Model 900 submission using the Forte 7 compiler. In addition, scores for the Itanium (4MB, 800 MHz version in an HP i2000), Alpha EV68C (1000 MHz version in an ES45/1000), and POWER4 (1300 MHz version in a pSeries 690 Turbo) are provided for reference. It is the new compiler's score on the 179.art program that quite literally stands out from the rest. Although several other programs see appreciable improvement (the 183.equake score nearly triples), the new compiler increases the score of 179.art by more than 800%. In absolute terms this score, 8176, is more than four times higher than that achieved by the Alpha EV68 and POWER4, MPUs that easily beat the copper US-III on nearly every other SPECfp2k program. The 179.art score achieved by the Forte 7 compiler is vital to the new machine's pumped up SPECfp_base2k score. If you leave 179.art out of the geometric mean then its SPECfp_base2k score would drop by nearly 18% from 629 to 516.
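The 629 to 516 figure follows directly from the way SPEC combines component scores with a geometric mean. The sketch below backs one component out of a published mean. It uses only the rounded numbers quoted above (629 overall, 8176 for 179.art, 14 programs), so the result is approximate, and the function name is illustrative.

```python
from math import exp, log


def geomean_without(overall, n_components, excluded_score):
    """Geometric mean of the remaining components after removing one score.

    `overall` is the geometric mean over all `n_components` scores;
    working in logarithms avoids forming the very large product directly.
    """
    log_product = n_components * log(overall)
    log_remaining = log_product - log(excluded_score)
    return exp(log_remaining / (n_components - 1))


without_art = geomean_without(629, 14, 8176)
drop = (629 - without_art) / 629

print("SPECfp_base2k without 179.art: about %.0f" % without_art)
print("Relative drop: about %.1f%%" % (100 * drop))
```

Read the other way, a single outlier score about sixteen times the mean of the other thirteen is enough to lift the overall result by roughly a fifth.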
This remarkable improvement on 179.art is unusual in the field of compiler engineering where single digit percentage performance increases are often considered major victories. So it is no surprise that Sun's achievement immediately raised suspicions among industry observers and competitors about the nature of the optimization employed by the Forte 7 compiler. It is hard not to think of Intel's infamous eqntott compiler bug that erroneously increased the SPECint92 score of its processors by about 10% until caught and fixed [8]. This bug used an illegal optimization that allowed the output of 023.eqntott to pass result checking with the test data used but was invalid in the general case.
Although the exact nature of the new Sun optimization isn't known, suspicion has fallen on several inner loops within the 179.art program. Speculation is that this code was originally written in FORTRAN and converted to C. Because FORTRAN and C access two dimensional arrays in opposite row and column order it is presumed that 179.art accesses arrays by the wrong index in the innermost loop causing poor cache locality. It is possible that the new Sun compiler recognizes this situation and turns the nested loops that step through the array accesses "inside out" and achieves much lower cache miss rates. Whatever the exact nature of the Sun optimization turns out to be there is the question of whether it violates one of the SPEC rules, namely "Optimizations must improve performance for a class of programs where the class of programs must be larger than a single SPEC benchmark or benchmark suite".
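To make the speculated loop interchange concrete, here is a toy model of why loop order matters for a row-major (C style) array. It is not the 179.art code and not a cache simulator: it only counts how often consecutive accesses fall on a different cache line. The array dimensions and the line size of 8 elements are arbitrary assumptions chosen for illustration.

```python
def count_line_switches(rows, cols, column_inner, line_elems=8):
    """Count consecutive accesses that land on a different cache line.

    The array is assumed to be stored row-major, so element (r, c) sits at
    flat offset r * cols + c. `column_inner=True` walks columns in the
    innermost loop (C friendly); False walks rows innermost (Fortran style).
    """
    if column_inner:
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        order = ((r, c) for c in range(cols) for r in range(rows))

    switches = 0
    prev_line = None
    for r, c in order:
        line = (r * cols + c) // line_elems  # hypothetical line of 8 elements
        if line != prev_line:
            switches += 1
        prev_line = line
    return switches


fortran_style = count_line_switches(64, 64, column_inner=False)
interchanged = count_line_switches(64, 64, column_inner=True)
print("row index innermost   :", fortran_style, "line switches")
print("column index innermost:", interchanged, "line switches")
```

In this model the interchanged loop nest touches a new line once per 8 accesses instead of on every access. A compiler may only perform such an interchange when it can prove the loop iterations do not depend on each other's order.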
Without knowing the nature of the new Sun optimization it is impossible to say whether Sun should be praised or scolded. But here are the words of Sun engineer John Henning who made the following comments in a November 27 post to the comp.arch usenet news group:
"Our compiler team believes that what Sun has done with art is (1) the result of perfrectly [sic] legitimate optimizations (2) compliant with SPEC's rules and (3) not appropriate for further discussion - if you want to figure out to make art faster, go work on it yourself, don't ask Sun how we did it!"
With the widespread attention this incident has engendered within the industry we can presume that compiler and benchmarking experts working for Sun's competitors have closely scrutinized the code Forte 7 generates for 179.art. The fact that Sun's new scores haven't been withdrawn from the SPEC official web site yet suggests that Mr. Henning is correct. No doubt we can expect competitors' processors to score much higher on 179.art in the months and years to come as the Sun optimization migrates to other compilers. Depreciation of a benchmark's value is seldom as spectacular as in the case of 179.art, but still naturally occurs over time and provides incentive to accelerate the development of a successor to the SPEC CPU 2000 benchmark suite (which no doubt will not include 179.art). A message soliciting programs for this new suite, tentatively named SPEC 2004, was posted on comp.arch on December 28. Ironically the author of this message, the secretary of the SPEC CPU subcommittee, is none other than the previously mentioned John Henning.
Conclusion
It is comforting to see the pace of innovation in the microprocessor field shows no sign of slackening. The great seesaw battle between Intel and AMD for share of silicon's richest prize, the x86 microprocessor market, is about to enter a new phase with the imminent release of the 0.13 um Northwood Pentium 4. Although AMD will also migrate its K7 core to 0.13 um later in 2002 with both bulk and SOI versions, it is unlikely to be in the position to regain the performance advantage over Intel it previously achieved with the T-bird and XP Athlon until its new 64-bit Hammer core ships. Unlike AMD, Intel plans to reserve its 64-bit offerings for the high-end market. With McKinley Intel hopes to address the significant performance difficulties seen in the Itanium in part by taking advantage of its capacious manufacturing facilities to incorporate a huge amount of on-chip cache on its sizable die.
It seems like the time it takes for new ideas and features to migrate down from high-end server MPUs to mass-market devices is shrinking. The integration of high performance interprocessor communication links and memory controller(s) onto a processor die has been on the drawing board for many years and will soon be realized in the high end server market in the form of the EV7. Remarkably, the same concepts will appear in a mass-market x86 processor, the first of AMD's Hammer series, not too much later. Although these features will naturally be more limited in scope in the x86 device to keep costs under control, they should still provide a large boost in performance from significantly reduced memory access latency as well as a dramatic reduction in the cost of producing multiprocessor systems based on this device.
Few topics in the computer and microprocessor field can raise a controversy, as well as blood pressure, as quickly as benchmarks and benchmarking. Sun managed to throw a hand grenade into the simmering debate between the supporters and detractors of the industry standard SPEC CPU benchmark by speeding up the execution of one of the fourteen programs in the floating point suite by nearly an order of magnitude through the use of a previously unexploited compiler optimization. This in turn raised the SPECfp2k score of its latest US-III processor by roughly 20%. We can now look forward to the spectacle of competing firms scrambling to reverse engineer Sun's new compiler trick and incorporate the same voodoo into their own wares.
References
[1] Krewell, K., "Intel's McKinley Comes Into View", Microprocessor Report, October 2001, Volume 15, Archive 10.
[2] Hennessy, J. and Patterson, D., "Computer Architecture A Quantitative Approach", Morgan Kaufmann Publishers Inc., 1990, ISBN 1-55860-069-8, p. 181.
[3] "Advance Program, 2001 IEEE International Solid-State Circuits Conference", p. 35.
[4] Weber, F., "AMD's Next Generation Microprocessor Architecture", October 2001, Downloaded from AMD web site.
[5] Jain, A. et al, "A 1.2 GHz Alpha Microprocessor with 44.8 GB/s Chip Pin Bandwidth", Digest of Technical Papers, ISSCC 2001, Feb 6, 2001, p. 240.
[6] Dulong, C. et al, "The Making of a Compiler for the Intel Itanium Processor", Intel Technology Journal, Q3 2001, Downloaded from Intel web site.
[7] Desikan, R. et al, "Measuring Experimental Error in Microprocessor Simulation", Digest of Technical Papers, 28th Annual International Symposium on Computer Architecture, June 2001.
[8] "Intel OverSPECs Parts", Microprocessor Report, January 22, 1996, Volume 10, Number 1, P. 5.
Copyright © 1996-2001, Real World Technologies - All Rights Reserved
By the way, Pricewatch is quoting about $3K for the low-end Itaniums running at about 700 MHz. No thanks.
99.99% of everything my computer does is <32 bits. So if I get a 64 bit cpu, does that mean that my computer can slag around extra unused bits just for fun?
*We can now look forward to the spectacle of competing firms scrambling to reverse engineer Sun's new compiler trick and incorporate the same voodoo into their own wares* This will be excellent, duel rates of improvement= fasterer computers, the elbow in the rate of moores law. Still, applications for this tech have imagine a beowulf of these results as side effects.
Here is a mirror of the article
Link
kawai
I couldn't believe the speedup I got when I installed Intel's Linux C compiler and recompiled my computational materials physics simulation code...
i shoulda lernd by now
Impressive though 64-bit processors might be, I'm not convinced that the performance improvement is going to be as big as people are expecting.
Remember that the components in any digital system - and I'm not just talking about your windoze desktop PC, but servers, mainframes and embedded systems too - have to talk to each other in order to do anything remotely useful. Last time I looked, most PCI devices didn't utilise the provision for 64-bit data bus operation.
There's a perfectly good reason for this, of course... in order to attach a chip to a circuit board, you need an array of pins (or solder balls) that are macroscopic, so they can be soldered and handled without too much risk of accidental damage. Additionally, PCB tracks can only go so small (and so close together) without undesirable electrical effects and again, an inability to work with it in a production environment.
The "more bits" phenomenon has been sustained by improvements in VLSI and the advent of true System-on-a-chip design, but this too has its limits. If you compare a P4 motherboard with, say, a 386 mobo circa 1995, you'll see the chip count is drastically reduced. But fewer interconnected components means less repairability, upgradability, and interoperability. My old 486 had a VLB EIDE hard disk controller, which I swapped in after the last one failed. If my controller failed today, I couldn't do that; I'd either need to buy a new mobo or start replacing chips on the old one (which is just as expensive).
Don't get me wrong - I'm all for progress! And I expect we'll see more and more 64/128-bit chips springing up inside custom devices (e.g. 3D cards, routers) where the local interconnect can be made as fat as necessary. But the PC will remain shackled by slow frontside busses for a while yet, I reckon.
These sigs are more interesting tha
Correct me if I'm wrong, but isn't this going to make the programs bigger and heavier than before? I sure hope they will make it possible to do, for example, just one calculation in 64 bit and the rest in 32 bit instead of wasting bits for no reason. Just thinking out loud =)
HTTP/1.1 400
I really hate Dan Patrick.
With Digital being sold to Compaq and then Alpha being sold to Intel and Compaq possibly merging with HP, the future there is clouded. I have been working with Alphas and have been told that the future is Itanium coloured, but sorry, I don't really like the chip. EV7 will come out, but so far its performance doesn't look so competitive.
With a lot of former Digital talent working at AMD, I think this will be the better option. However, the K8 is not a clean design; it seems to be a 64-bit version of the K7 with some extras on the pipelining. I guess that the chip is not going to be the easiest to get the best performance from.
See my journal, I write things there
Error Occurred While Processing Request
Error Diagnostic Information
An error occurred while attempting to establish a connection to the service.
The most likely cause of this problem is that the service is not currently running. You can use the 'Services' Control Panel to verify that the service is running and to restart it if necessary.
Windows NT error number 2 occurred.
Much as I hate to say it, the Intel McKinley looks like a very well designed piece of kit, and it appears Intel have learned from their mistakes with the P4 by including a big, fast 3-level cache on the McKinley. It's also good to see them reducing their pipeline size, which means it may finally be able to compete with the G4 in terms of efficiency. However, this is of course going to kick them in the teeth in terms of competing on processor speed, which they have been pushing so hard recently in their marketing.
The same can't be said of AMD's offering, although in fairness the Hammer is not directed at the server market unlike the McKinley. The pipeline is longer than both their previous design and the McKinley, which is going to give them a performance hit. We can only hope that their cache is as good as Intel's.
What amazes me is that they can still keep adding instruction extensions without too much of a performance hit. Anyone looked at the latest instruction set documentation for these processors? Eugh! The pain of backwards compatibility...
Been in 64-bit heaven since IRIX 6.0 in 1994. PowerIndigo2 (R8000) on the desktop, Challenge XL's in the server room (R4400 and R10000). And, today, Octane on the desktop and Origin 300 + Origin 3000 in the server room. A few UltraSPARC Suns, too, but Solaris took its sweet time making the move to 64 bit (Sun started the migration with Solaris 2.5 and finished with Solaris 7).
First of all, I'd like to say I am not biased either way.. after all, I'm going to get me a new AthlonXP next week.
IA64 is very different from x86-64. AMD's 64bit solution is nothing more than an extension to the current 32bit instruction set. Of course there are some tweaks, but nothing very radical. You will still be able to run old 16 and 8bit code efficiently.
Intel's IA64 is a huge step into the future... architecture-wise it is far superior to x86-64. Why?
Why do we need 64bit processors? Addressing? Nah, current processors can address enough space.. with 386 processors FAR addressing was introduced, which expanded allocatable address space drastically. (those silly DS, SS, .. registers) And the newest processors can deal with them with the same ease as with non-far addressing.
AMD's 64bit solution currently has no real value.. except for huge data storage (it could work faster with 64bit data blocks) and probably some heavy encryption. An x86-64 compiled Quake3 would make minimal use of 64bit registers.. and would probably be only marginally faster than the IA32 compiled version.
Is IA64 better? Yes it is. IA64 has 128 usable 64bit registers, predicates... But that is not all.. in a single 64bit register you can store 4 16bit values (a common integer size), or 8 8bit, or 2 32bit, and manipulate them almost as much as you like. And if you have 4 integers in another register.. you can do 4 arithmetic operations with a SINGLE instruction. You can do similar things with floating point operations... and with ILP you could do 3 instructions per cycle. This means that Quake2's VectorAdd/Subtract could be done in a SINGLE cycle.
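To make the packed-register idea concrete, here is a rough sketch in Python of adding four 16bit lanes held in one 64bit word with a handful of whole-word operations. It only emulates the arithmetic in software; it is not IA64 code, and the helper names (`pack4`, `unpack4`, `packed_add16`) are made up for illustration.

```python
MASK64 = (1 << 64) - 1
LOW15 = 0x7FFF7FFF7FFF7FFF   # low 15 bits of every 16-bit lane
HIGH1 = 0x8000800080008000   # top bit of every 16-bit lane


def pack4(values):
    """Pack four unsigned 16-bit values into one 64-bit word (lane 0 is lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFFFF) << (16 * i)
    return word


def unpack4(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def packed_add16(a, b):
    """Add four 16-bit lanes at once; each lane wraps modulo 2**16."""
    # Add the low 15 bits of each lane: the carry lands in bit 15 of the
    # same lane and can never spill into the neighbouring lane.
    low = (a & LOW15) + (b & LOW15)
    # Fold the top bit of each lane back in with XOR (carry-less add).
    return (low ^ ((a ^ b) & HIGH1)) & MASK64


if __name__ == "__main__":
    x = pack4([1, 2, 3, 4])
    y = pack4([10, 20, 30, 40])
    print(unpack4(packed_add16(x, y)))
```

A real packed-add instruction does the lane isolation in hardware, so the masking shown here is only needed when emulating it on a plain integer unit.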
Clawhammer will be better for a year or so.. but soon it will hit the ceiling. Intel will be able to get better performance from 1/2 clocked IA64.
And please don't respond with lame comments if you haven't read at least whitepapers from Intel and AMD.
You may want to read this about the McKinley, successor to Merced.
As far as I know, Merced is HP's design. McKinley is Intel's. So... you could say Intel is learning from their mistakes by letting HP engineers do a good job.
Anyway, it's a mutually beneficial thing because HP doesn't have the resources to market and drive the product, while Intel doesn't have the engineers or resources to design and implement the architecture in a 'good' way. Intel provides process and HP provides layout, and together they will take over the world!
At least that's what I've heard and read. Myself, I own a G4 and use a Mac, it's not exactly as if Itanium is going to strike me down anytime soon.
GPL Deconstructed
Once we get the 64-bit hardware, we still have the MMOS (minor matter of software) to worry about.
Cases in point:
Silicon Graphics machines with MIPS R4400 (and up) CPUs were 64-bit, but the additional address and pointer space weren't utilized until IRIX 6.0 in 1994 -- over 18 months later. (And, of course, certain SGIs still run in 32-bit mode due to RAM concerns -- 64-bit requires more RAM -- all Indys, all Indigos, all O2s, and R4400 Indigo2s).
Sun machines with UltraSPARC CPUs were 64-bit, but again, the additional address and pointer space had to wait for software support. (Multi-stage transition to 64-bit, starting with Solaris 2.5 and finally complete with Solaris 7 in 1998).
Then there's application optimization. Many apps can get slight speedups by processing data in larger chunks (say, 32-bit or even 64-bit). Sometimes the difference is huge, many times it's small. But lots of little speedups can add up across an entire system. Still, someone has to make these changes to apps and compilers. It takes time, testing, and adoption. In better times, SGI did several such overhauls... they got some insane speed out of Netscape Enterprise and Netscape FastTrack web servers during the Everest project. One of their engineers also did some cool (but nonstandard) hacks to Apache, including the very first pure, clean 64-bit port/mod.
Newer, faster, wider, more-torque hardware is always great. But don't forget the software.
Aww, be nice!
(But, well, yeah I guess he is.)
-- What do you need?
-- Gnus. Lots of Gnus.
32-bit CPUs use a 32-bit address space. That's space enough to address 2^32 bytes, or 4GB. With today's 100+GB hard drives and fractional to low GB RAM capacities, each requires its own addressing.
However, with 64-bits of addressing space, you have enough room to memory-map your entire stinking (yes, stinkin) hard drive into virtual RAM address space. This means your virtual address space would represent both RAM and your file system together.
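As a small illustration of the memory-mapping idea, here is a sketch using Python's standard `mmap` module: the file's bytes are read through a mapped view as if they were ordinary memory. The helper names and the temporary demo file are invented for the example, and whether a whole 100+GB drive could be mapped this way depends on the OS and the CPU's usable virtual address bits, which this sketch does not test.

```python
import mmap
import os
import tempfile


def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 2 ** bits


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` by mapping the file into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; it must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


def demo():
    """Write a tiny file, then read part of it back through the mapping."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello 64-bit world")
        os.close(fd)
        return read_via_mmap(path, 6, 6)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(address_space_bytes(32) // 1024 ** 3, "GB with 32 address bits")
    print(demo())
```

With 32 address bits the mapped view has to fit inside the 4GB space alongside everything else in the process, which is why large files normally get mapped a window at a time there.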
I'm glad that AMD's chip will support 32 bit because I am not ready to ditch my 32 bit apps and games.
Got Athlon?
I can't believe that you spent all that time coming up with this crap.... it's just plain old boring!
With the currently popular 32-bit CPU chips, Robot AI memory limitations are too severe because a memory of 2^32 size is not enough.
Ah, an attempt to be somewhat on topic. However, I don't buy it - how much memory is enough - do you know or are you blowing smoke? And seeing as few machines have this much RAM (2^32 bytes = 4GB), don't they use disk swap files or databases anyway? There are many file systems that can handle files this size already, so how exactly will 64bit processors suddenly enable AI in VB that can't be done at present?
An increase in computer power is a rising tide that lifts all boats, even crank AI, but how exactly is the move to 64 bits a sudden huge leap for your Javascript "mind"?
My Karma: ran over your Dogma
StrawberryFrog
You're a nut, man. The `singularity` thing is the most obvious giveaway, two-bit futurists have probably been babbling about computers becoming smarter than humans and going off on their own since Babbage.
--
Benjamin Coates
64 bits on a server = larger databases (the better to catalog you with, my dear)
64 bits on a desktop = ?
I'm sorta missing why this is good for the rest of us...
3000 dead over past 2 years, still no free Palestinians, still
this was so fukkin funny I almost pissed my pants.
Actually, even that's not strictly true, since according to the Resource Kit documentation Windows XP's initial configuration detection is *still* 16 bit.
UNIX? They're not even circumcised! Savages!
This is the basis of Al Win Modem's plot to overthrow the world!
All your Athlon are belong to us!
Come on - this has to be the best post of 2002! Mod it up another point, and e-mail a copy to bbc.co.uk/history.
That test was rigged to produce a result that would make AMD look bad.
(Real operating systems run perfectly well on 8-bit CPUs.)
Firstly, it would appear that you have at least read some white papers on the web, but have you used a real Itanium?
I have, and let me say that the reason AMD will win is compatibility and making sure that the things don't sound like a jumbo jet taking off. These may seem like minor points but they are what will count.
The major point of the AMD solution is backward compatibility... Intel knows this. Why do you think that all their previous chips have been successful? Because they were the "best" solution for the job? NO, not by a long shot. They were, however, able to run the old software faster, and provide a route for new software to run even faster still.
AMD provides this solution, new software can take advantage of the 64 bit addressing and processing several integers in just one register. But at the same time old software can run on the system faster than it would on the same clock speed 32 bit processor.
Look at it from an IT purchasing point of view: you can pick machine A, which will theoretically be faster in the future, but to get anything even approaching decent performance you need to buy a whole load of new software. Or you can pick machine B, which will run all your current software faster than your current machines, and run future software even faster still. Which would you choose? People like a sure thing, not the promise of something good in the future.
AMD can then concentrate on moving towards a pure 64 bit machine once most of the applications have moved to 64 bit, this makes the most sense long term. You buy your 64 bit machine, run 32 and 64 bit mixed software quickly. Then once you are running mostly 64 bit you can move seamlessly to a 64 bit proven and tested environment.
Current Itaniums are slow, large and noisy. This makes a huge difference if you only have a small server room, or have to run a server under your desk - some people really have to do this in smaller companies. You won't see them on the desktop market anytime soon; they are too slow, and 32 bit performance is not great.
The Itanium might have a good architecture, but I think that its lack of speed and compatibility with 32 bit applications, coupled with noise and heat, will cause it to lose the battle.
If you ever drop your keys into a river of molten lava, let'em go, because, man, they're gone.
I personally still use my 6502c 8-bit proc for my Atari 800XL, has all the power I need.
Atari Power with out the Price!
- MUD
I believe you mean that McKinley is an HP design, not the other way around. HP has been working on VLIW architectures for a decade now and they certainly have a better idea of how to make it work.
Gates likes this new technology.
He will finally be able to store his net worth in dollars in a single long int.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Why do we need 64bit processors? Addressing? Nah, current processors can address enough space.. with 386 processors FAR addressing was introduced, which expanded allocatable address space drastically.
Far addressing can handle 4GB, but 4GB is not much by today's standards (it is not little either). You want a flat address space, and RAM is cheap these days. 32 bits get you to 4GB; if you want more you have to resort to tricks (those silly DS, SS etc.).
With the first 64 bit Alphas they used this as an argument: it is useful for fast memory scans when using big databases.
--640 Kb will be enough....
> As far as I know, Merced is HP's design. Mckinley is Intel's. So... you could say Intel is learning from their mistakes by letting HP engineers do a good job.
You mean, of course, that Merced is Intel's design, whereas McKinley is HP's.
Don't they know it's illegal to reverse engineer something like this? DMCA will get everyone in the end. Mwahahaha!
The original test was written in FORTRAN and ported to C. As the article states, the array was accessed in a FORTRAN manner, which screwed up C's cache locality. Had this program been written in C in the first place, the programmer would have adjusted the loops to better accommodate C's array layout and avoided the issue altogether. Although Sun's optimization is admirable, it will not help out the vast majority of C programs written with C's design in mind. There are lies, damned lies and benchmarks. Sun's processor speed claims are just that. Real world code will not be significantly aided by this particular optimization. Also, I would be a bit leery of a compiler questioning and rearranging the order of my loops. I've caught quite a few bugs in overzealous optimizing compilers over my career; the compiler is certainly the last place you look for a bug, and such bugs are very time consuming and expensive to pinpoint.
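For anyone who has not run into the layout issue, here is a small sketch of why loop order matters. It only computes the memory offsets touched when a row-major (C-style) array is walked in each order; it does not model a cache or reproduce the benchmark in the article. The function names are made up for the example.

```python
def row_major_offset(i, j, ncols):
    """Element offset of a[i][j] in a C-style (row-major) 2D array."""
    return i * ncols + j


def access_offsets(nrows, ncols, fortran_order):
    """Offsets touched, in order, when every element is visited once.

    fortran_order=True walks down columns first (natural for FORTRAN's
    column-major arrays); False walks along rows first (natural for C).
    """
    offsets = []
    if fortran_order:
        for j in range(ncols):
            for i in range(nrows):
                offsets.append(row_major_offset(i, j, ncols))
    else:
        for i in range(nrows):
            for j in range(ncols):
                offsets.append(row_major_offset(i, j, ncols))
    return offsets


def strides(offsets):
    """Distance between consecutive accesses, in elements."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print(strides(access_offsets(2, 3, fortran_order=False)))
    print(strides(access_offsets(2, 3, fortran_order=True)))
```

The C-order walk moves one element at a time, so consecutive accesses tend to share a cache line. The FORTRAN-order walk jumps by a whole row on most steps, which is the access pattern that loop interchange (by hand or by the compiler) removes.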
There are actually several meanings of "64-bit chip". If you mean "64-bit or wider data path through the CPU", then yes, performance is the reason people want that. If you mean "uses 64-bit addresses", then yes, performance isn't really an issue, whereas increased program size is (e.g. large databases).
I'm proud of my Northern Tibetian Heritage
Motorola and Apple really have the jump on this one. And it is about time, because the G4 has not been holding its ground for a while now.
Apple is going to use an MPC85xx. Here is info on one, if not the, chip Apple will use: the MPC8540.
64/32 bit processing, 333MHz DDR, Rapid I/O, etc.
HyperTransport will probably also find its way into the new motherboard designs. That's been done for a while now.
"Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
Since transistors are cheap once we have a cheap interface, because of low pin count, cards which have a lot of individual ATA channels will become a lot cheaper too. A 1 meter cable length should suffice for most applications too.
Then the only real advantage to SCSI from a user point of view will be that the HD manufacturers artificially separate the markets by not offering their best HDs with ATA.
Sun currently sells the lowest cost 64 bit server
Check out www.sun.com: they have two 64 bit machines that include all you need to boot up (the box and the software) for under $1k. Let's see if Intel or AMD can do that. Yes, you can also pay a lot more, like $1M, for a Sun should you need a 128 CPU multiprocessor box. Try that (today) with AMD or Intel. That said, I admit to using a P500 (running Solaris 8) to type this on, simply because the P500 was free.
has anyone actually looked at a chart of volume shipments of microprocessors ?
32bit: you see ARM with about 60%-70% MIPS with about 25%-35% and then rest are split into 5% depending on who did the research
64bit you see MIPS 90%-95% sparc with about 3%-7% and rest split into 2%-3%
ask yourself whats in my Set Top Box ?
an MIPS/ARM of some kind whats in my printer MIPS, whats in my phone ARM, whats in my router or adsl box a MIPS
wake up people no one cares the war is over
intel know it what do you think IA64 is about ?
oh and notice 1 Billion intel bucks bought a ARM licence they dont care about ia32 StrongARM and StrongARM2 aka Xscale are taped out and earning cash
64 bit MIPS has gone past the 1GHz a while back with dual cores on one die and has speculative execution from NEC
really these are the things to worry about
regards
john jones
With 32 bits you can address 4GB. With 64 bits you can address twice what you calculated.
Most PC motherboards reserve lots of the memory above 0x80000000 (2 GB) for purposes other than main RAM (AGP, PCI, ROM, etc.).
Will I retire or break 10K?
jumping around in [assembler] code, you want to be able to use negative numbers -> gives you +- 2GB range
Long jumps (i.e. more than +/- 127 bytes in 6502 or x86 or +/- 32767 words in MIPS) are not relative in most architectures; they're absolute. The real reason for a 2 GB limit is not for negative address but instead for memory-mapped devices other than RAM, such as AGP, PCI, ROM, etc.
Will I retire or break 10K?
Which is later followed by:
Hmmm... 3MB is astounding, but 8MB is unremarkable... Well, I'd have to agree. I haven't bought a server with less than 4MB of cache in years. Oops, the SunBlade is only a workstation... Kinda makes you wonder.
Sun might be expensive, but it's solid, fast (enough), and predictable. I love x86 (usually Linux) at home, but wouldn't dream of putting it someplace business vital - much less mission critical.
Business Vital == 1 maintenance window per month and a mean time to recover exceeding 6 hours potentially costs several million dollars.
Mission Critical == 1 maintenance window per quarter and a mean time to recover exceeding 15 minutes potentially costs several million dollars.
You need sector-sized granularity for tracking changed sectors. The latter does not scale well.
The common implementations of virtual memory use page-sized granularity for tracking changed pages. Does that scale?
The big problem I've seen with using a single memory space is that applications often forget to implement multiple levels of undo. With many PDA applications, once you make two accidental changes to a file, the previous version is gone forever because many applications modify files in-place, breaking the "revert to last saved version of document" feature. This bit me in the butt several times on Newton OS.
Will I retire or break 10K?
Having an IDE controller in your 486 also presupposes a spare slot. Just, at the time, you had to plan ahead and figure that two slots are gone right from the start (video and IDE).
:).
Nowadays, you might have no slots taken up by IDE, video, sound and LAN. This is a good thing, as it frees up slots for you to go willy-nilly all over. If you have to remove, say, a video capture board to make up for a broken IDE controller, just keep in mind that you never had the choice with your 486.
Probably what you want is some sort of indefinitely expandable bus that may actually be coming out this century.
you state:
"version of Windows XP (or whatever the next version will be called)"
I recommend "Windoze ES", for Eat Shit. After all, it's what they've told the gummint.
Kind of like: Eat Shit -- 10 Billion Flies (95% Market Share) Can't Be All Wrong
Win64 has been ported to Itanium for some time now. We've already ported our memory-hungry special-FX app to it. But few people outside the server space are going to be interested in getting an Itanium because the performance with legacy IA32 apps is dog-slow. I mean really slow, like P90 speeds. So we don't expect too many sales of that version, just a few for hardcore dedicated seats.
Sledgehammer is really interesting to us. Combine the best available x86-32 performance for running 3DS Max, Lightwave, Photoshop etc etc, along with the serious memory & address space of a 64 bit CPU for our app, not to mention quite a bit more speed when doing 64 bit calculations (pretty common with 64 bit pixels), makes for a powerful and still flexible beast.
But without Windows support, the IA32 performance advantage is largely meaningless. In our market, that relegates Hammer to Linux-64 render farms - which is fine, but it's not where our money is, and it's not where the CPU would shine. You can use Win32 or Linux-32, of course (unlike Itanium), but that's kinda missing the point.
AMD better get MS & Win64 on their side soon, if they want to capture the workstation market. A lot of server apps still require Windows too. The reality of the market is that mainstream OS support is required, or you get niched PDQ.
Why would anyone engrave "Elbereth"?
I think the point of a 64-bit cpu is a bit short sighted. Hardware vendors should be jumping to 128-bit cpus, ala the "Emotion Engine" in the PS2.
Why 128-bit? Because a 4-tuple (x,y,z,w) for vector and matrix operations can then be natively done. (Yes, I'm spoiled with the VU's on the PS2)
I do wonder when 64-bit cpu's will actually become a commodity item though. A 32-bit cpu provides 99% functionality for most of the general public using them. It's only gaming, scientific computing, & multimedia that really need the 128-bit registers, correct? Or am I missing something?
Secondly, for 64-bit cpus, is there a standard instruction set? Or do I need to compile our game code specifically for IA64, and Hammer?
I do agree, that a 64-bit address would be a welcome change. I can imagine the Database guys jumping up and down with joy once cheap PC hardware supports 64-bit.
Hi,
;-)
I appreciate the concern that readers might have about slow download time, particularly when the server is hit by so many technology enthusiasts that come from this fine site...
However, posting the entire article - even with full attributions - is a violation of copyright law, because nobody actually asked for, nor received permission to, post it elsewhere in its entirety, AFAIK.
Please note that Real World Technologies does *not* own the copyright; it is owned by Paul DeMone. Therefore I suggest that either the site owners of Slashdot request permission from Paul to keep this here (if they have not done so already), or it should be removed.
Another poster provided a link to the full article (printed version), which is much more appropriate until such time as Paul wishes to give his blessings for having the article reprinted in full here...
PS... Anyone here want to donate a high-powered web server just for Slashdot readers to enjoy faster response times when reading RWT?
Regards,
Dean Kent
Real World Technologies
This is not an official mirror site. This site has effectively stolen the article in full from Paul DeMone.
Anyone who cares about Intellectual Property should avoid using this link. The owner of that site should contact Paul DeMone directly and request permission for posting it.
I find this rather disturbing that this is done so blatantly...
Regards,
Dean Kent
Real World Technologies
The "more bits" phenomenon has been sustained by improvements in VLSI and the advent of true System-on-a-chip design, but this too has its limits. If you compare a P4 motherboard with, say, a 386 mobo circa 1995
I agree, the jump from 16 to 32-bit desktop processing was anti-climactic. The hardware definitely lagged behind the processors, and the software lagged even further.
Maybe we've got it figured out by now, though?
I'm a 2000 man.
I was always wondering why they do not average the times for the SPEC number (i.e. take the harmonic average of the scores). That way it would be very difficult to change the average with only 1 test. Yes, they do something like this for spec_rate, but that test is mainly for multiprocessor systems.
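A quick sketch of the difference, with made-up scores: three ordinary results and one test that a compiler trick has inflated tenfold. SPEC's published composite is a geometric mean of the per-test ratios; the harmonic mean below is the "average the times" alternative suggested above. The numbers are invented purely for illustration.

```python
import math


def arithmetic_mean(scores):
    return sum(scores) / len(scores)


def geometric_mean(scores):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def harmonic_mean(scores):
    """Equivalent to averaging run times and converting back to a score."""
    return len(scores) / sum(1.0 / s for s in scores)


if __name__ == "__main__":
    scores = [10, 10, 10, 100]   # hypothetical: one test inflated 10x
    for name, fn in [("arithmetic", arithmetic_mean),
                     ("geometric", geometric_mean),
                     ("harmonic", harmonic_mean)]:
        print(f"{name:10s} {fn(scores):.2f}")
```

With these inputs the single outlier lifts the harmonic mean the least and the arithmetic mean the most, with the geometric mean in between, which is the property being asked for.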
Nerds!!!!!!
arrrrrrrrgh!!!!
nnnnnnnnneeeeerrrrdddddsss!!!!!
That makes me feel a lot better, stompin some nerds!
I'll take a look at 64-bit processors once I can manage to utilize more than 60% of my current 32-bit processor on a regular basis.
...so by this definition the 8088 was an 8-bit processor and the 386SX was 16-bit.
If you go by address space, the 6502 is a 16-bit CPU. And the Pentium Pro is 36-bit. (Sort of.)
I think a third definition - width of general-purpose registers - is the commonly used one. In that case the 6502 is only 8-bit, the 8088 is 16-bit and the 386SX is 32-bit.
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
(At least from Intel)
8 bit - 8008 - 1972
16 bit - 8088 - 1978 (1.3 bits/year)
32 bit - 80386 - 1985 (2.3 bits/year)
64 bit - Itanium - 2001 (2 bits/year)
128 bit - ?? - 2026? (2.5 bits/year?)
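The bits/year figures can be checked with a few lines of Python. The years are the ones given in the list above (2026 being the guess), taken as-is rather than verified; the function name is made up.

```python
# (word size in bits, year) pairs exactly as listed above; 2026 is the guess.
MILESTONES = [(8, 1972), (16, 1978), (32, 1985), (64, 2001), (128, 2026)]


def bits_per_year(milestones):
    """Growth rate between consecutive milestones, in added bits per year."""
    rates = []
    for (bits0, year0), (bits1, year1) in zip(milestones, milestones[1:]):
        rates.append((bits1 - bits0) / (year1 - year0))
    return rates


if __name__ == "__main__":
    for (bits, year), rate in zip(MILESTONES[1:], bits_per_year(MILESTONES)):
        print(f"{bits:3d} bit by {year}: {rate:.2f} bits/year")
```

The last step works out to 64 bits over 25 years, a little over 2.5 bits/year, so the 2026 guess is roughly in line with the earlier rates.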
I can't imagine what I would do with a 2^64 address space, let alone 2^128...
-Jeff