Hoser+McMoose · Slashdot Mirror

Re:My question is:, MAC on Own a Piece of An Apple-Based Supercomputer · 2004-02-12 06:02 · Score: 1

Do you honestly think that EVERY scientific computing application is nothing but a bunch of multiply-adds? Sure, that's pretty much all you're going to do if you're just solving large matrices (admittedly a common task in HPC stuff), but this is definitely not the ONLY type of code out there.

It is, however, the only type of code that the Linpack benchmark uses.
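For anyone wondering what "nothing but multiply-adds" looks like, here is a minimal sketch of the Gaussian elimination (LU factorization) step that Linpack-style benchmarks time. It's plain Python, not the real benchmark code. It skips pivoting, and the function and helper names are my own. It just counts how many inner operations are multiply-adds versus divisions.

```python
def lu_factor_counting(a):
    """Gaussian elimination without pivoting, done in place on a square matrix.

    Returns (multiply_adds, divisions) so you can see which operation dominates.
    Assumes no zero pivots appear (true for the diagonally dominant matrix below).
    """
    n = len(a)
    multiply_adds = 0
    divisions = 0
    for k in range(n):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]  # one division per row per elimination step
            divisions += 1
            a[i][k] = factor            # keep the L multiplier in place
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]  # the multiply-add that dominates run time
                multiply_adds += 1
    return multiply_adds, divisions


def diagonally_dominant(n):
    """Hypothetical test matrix: n+1 on the diagonal, 1 elsewhere, so no pivoting is needed."""
    return [[float(n + 1) if i == j else 1.0 for j in range(n)] for i in range(n)]


madds, divs = lu_factor_counting(diagonally_dominant(100))
print(madds, divs)
```

For n = 100 that works out to 328,350 multiply-adds against 4,950 divisions. The multiply-add count grows roughly as n³/3, so at the problem sizes used for Top500 runs essentially all the time goes into that one inner line.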

Re:Radeon 9600s in the servers on Own a Piece of An Apple-Based Supercomputer · 2004-02-12 05:54 · Score: 2, Insightful

There actually IS a way to do math on GPUs; take a look at what these people are doing. I don't know if there is any software yet that will run on Macs and OS X (I think most of it is targeting PCs running Linux), but at least in theory you can do some pretty high-performance math on GPUs.

Of course, there are some downsides to this. First off, it only works on the latest and greatest generation of GPUs, the ones that are programmable (the Radeon 9600 should qualify here). Second, GPUs support at most single-precision floating point math, not the double-precision needed by a lot of complex computing. Third, GPUs are rather powerful vector processors, which are somewhat different from general-purpose processors. This is not entirely a bad thing (the Earth Simulator is a giant vector processor as well), but some applications don't work as well on vector processors.
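A quick way to see how much headroom single precision gives up is to emulate it. This sketch rounds Python's doubles through the standard `struct` module to mimic 32-bit floats. The helper names are mine, and this is only an emulation, not what any GPU actually runs.

```python
import struct


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def machine_epsilon(round_fn):
    """Smallest power of two eps such that round_fn(1 + eps) is still greater than 1."""
    eps = 1.0
    while round_fn(1.0 + eps / 2) > 1.0:
        eps /= 2
    return eps


single_eps = machine_epsilon(to_float32)
double_eps = machine_epsilon(lambda x: x)
print(single_eps, double_eps)

# Integers stop being exact above 2**24 in single precision.
big = 2.0 ** 24
print(to_float32(big + 1.0) == big)  # True: the +1 is lost in single precision
print(big + 1.0 == big)              # False: double precision still has room for it
```

Single precision gives you roughly 7 significant decimal digits (epsilon about 1.2e-7) versus roughly 16 for double (about 2.2e-16), which is why long-running accumulations in scientific code usually want double.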

Re:My question is:, MAC on Own a Piece of An Apple-Based Supercomputer · 2004-02-12 05:34 · Score: 3, Informative

That's an integer multiply-add. The PowerPC 970 can do a double-precision floating point multiply-add, and that is what the Opteron and P4 lack. They can get pretty decent throughput for this sort of thing using SSE2, but only about half the throughput, clock for clock, that a PPC 970 can get.

Given that getting on the Top500 list seemed to be the main goal of this system, and that list uses only the (very limited) Linpack benchmark, which is essentially nothing but multiply-adds, this makes the PPC 970 a much better chip for the job. Of course, for real-world code, the difference might not be nearly as large, and in many situations the P4 or Opteron could easily be a lot faster.

Of course, one question that could easily come out of this is WHY doesn't SSE2 include a double-precision floating point multiply-add instruction? You would have to ask Intel about that one, because it seems like a natural instruction to have in SSE2 if you ask me. Even with the updated SSE3 they didn't add this.
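For the curious, here's what such an instruction computes. A fused multiply-add (which, as I understand it, is how the PPC 970's fmadd works) rounds a*b + c only once. A separate multiply followed by an add (the only option with SSE2) rounds twice. The sketch below emulates both in Python, using exact fractions to stand in for the FMA hardware. It shows the numerical difference only. The throughput advantage, which is what matters for Linpack, can't be shown this way.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """a*b + c computed exactly, then rounded once to a double (what an FMA unit does)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a, b, c):
    """a*b rounded to a double, then + c rounded again (a separate MUL then ADD)."""
    return a * b + c


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30   # exact product is 1 - 2**-60, which rounds to 1.0 as a double
print(separate_multiply_add(a, b, -1.0))  # 0.0: the -2**-60 was rounded away
print(fused_multiply_add(a, b, -1.0))     # -2**-60, about -8.67e-19
```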

Re:PCI-X on Own a Piece of An Apple-Based Supercomputer · 2004-02-12 05:25 · Score: 1

You can hang a firewire mass storage device off of it to backup (tape, disk, etc), boot from (recovery, etc)

A single front firewire port does seem like a reasonable addition for backup procedures, and MAYBE even a second one in the back, but three ports? That seems like a waste of money and precious space on a 1U rack-mount.

add extra storage in a pinch,

That would have to be one heck of a pinch for a user to want to use firewire instead of the gigabit ethernet. It would probably have been a lot more useful to put a third ethernet connection on the thing than three firewire ports.

You can create various types of clusters using firewire. One product is the sancube.

Clustering by firewire eh? Sure you CAN do it, but that doesn't mean that it's a smart thing to do! Again, a third gigabit ethernet port would have been more useful and better supported here.

those very same motherboards are probably used in both servers and high end workstations,

Not in a 1U server they aren't! I suppose you COULD use the same motherboard in a workstation and a 1U server, but it would be a really dumb idea. Fortunately Apple was not dumb in this case; they have a totally different motherboard in the X-Serve. It actually looks like a pretty well designed board and case, all things considered: 2 processors, 8 DIMM slots, 2 full-length PCI-X slots, etc. All fairly nice. But the two firewire connectors at the back are not overly useful in my mind.

One thing I did just notice is that there is NO integrated video! Geez, lose the firewire and throw some el-cheapo integrated video on those things instead! Sure, the video won't be used very often (especially in cluster systems), but there are probably going to be a LOT more people using integrated video than people using all three firewire ports!

Re:ECC? (was: You forgot one) on Own a Piece of An Apple-Based Supercomputer · 2004-02-12 05:03 · Score: 4, Interesting

Well, from what I can see there is barely any difference between the memory controllers on the two systems. It looks like it was just a new revision of the same ASIC. Apple doesn't exactly provide many details on this, but it looks like the new memory/processor controller chip would be a drop-in replacement for the chip used in the original PowerMac. Therefore it's possible (even likely) that they will use this new revision in the next revamp of the G5 line. In fact, they could well start slipping them into the current line-up without telling anyone about it.

I don't anticipate that Apple will sell any desktop G5's with ECC memory installed at the factory, but if the memory controller supports ECC you could easily replace the factory memory with third-party ECC memory.

Re:Why use Intel anymore? on Current Processors Tested With Linux · 2004-02-04 14:57 · Score: 1

Unfortunately AMD not being 100% compatible with the standard causes problems

Intel is not 100% Intel-compatible either. Don't believe me? Go to their website and check their "Errata" sheets. These are lists of bugs in their processors where they do not conform to the x86 standard (which is itself a bit of a moving target). AMD processors actually tend to have fewer errata than Intel processors, though I'll leave it as an exercise for the reader to decide whether that's because AMD's processors are more Intel-compatible than Intel's own, or simply because Intel documents its bugs better.

There are some problems with AMD chips and certain hardware.

There are also some problems with Intel chips and certain hardware, or more to the point, the motherboards those chips sit on. The chips themselves have dick-all to do with compatibility; it's all in the motherboards.

Intel also has much more R&D into making a chip that won't fry itself if it gets too hot.

Go back and watch that Tom's Hardware video again, but this time keep three things in mind:

- Intel's thermal throttling reduces chip speed by 50-70%. If we assume that ALL the power consumption of a chip was dynamic power, that would mean a fully throttled 2.0GHz P4 processor would be consuming about 20-30W of power. In reality, the power consumption would be a bit higher because leakage current is pretty constant regardless of clock speed.

- That 20-30W processor ran without a heatsink. You try running a 20-30W processor without a heatsink and see how long it lasts.

- Their temperature measurements read a constant 29C throughout the tests, while Intel's thermal trip doesn't kick in until somewhere in the 60-70C range.
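The power estimate in the first point is simple scaling. Here's a rough Python sketch of it. The 60W full-speed figure and the 10W of leakage are illustrative numbers I picked, not measurements of any specific P4.

```python
def throttled_power(full_power_w, clock_reduction, leakage_w=0.0):
    """Rough chip power after cutting the effective clock by clock_reduction (0.5 = 50%).

    Dynamic power scales roughly linearly with clock at a fixed voltage;
    leakage power is treated as constant, independent of clock.
    """
    dynamic_w = full_power_w - leakage_w
    return dynamic_w * (1.0 - clock_reduction) + leakage_w


for cut in (0.5, 0.7):
    ideal = throttled_power(60.0, cut)             # all power assumed dynamic
    with_leak = throttled_power(60.0, cut, 10.0)   # hypothetical 10W of leakage
    print(f"{cut:.0%} throttle: {ideal:.0f}W ideal, {with_leak:.0f}W with leakage")
```

That gives 18-30W for the all-dynamic case and 25-35W once some leakage is included. Either way, it's nowhere near something you can run bare.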

Long story short, the video is total bullshit. It was faked for the purpose of getting lots and lots of page hits (read: advertising revenue). A P4 will NOT run without a heatsink, thermal throttling or otherwise.

Re:Why use Intel anymore? on Current Processors Tested With Linux · 2004-02-04 14:39 · Score: 1

Erk, I was looking at something that us mere mortals could afford!

Who cares if you can theoretically buy an E15K with 106 processors? That box costs over 3 million dollars! OK, it might be an option for some companies that REALLY need the high-end, but honestly, that's a different world altogether!

Why use VIA chipsets anymore? on Current Processors Tested With Linux · 2004-02-04 14:35 · Score: 1

I think the one thing to take from this is that VIA chipsets are just more trouble than they are worth. It doesn't much matter if it's a VIA chipset for an AMD processor or a VIA chipset for an Intel processor, they're all just trouble. Same goes for ALi chipsets.

If you're going to use an Intel processor, get a board using a chipset from Intel or Serverworks/Broadcom. If you're going to use an AMD processor, get a board using a chipset from nVidia or AMD.

SiS chipsets might be an option for either platform, but they're almost exclusively used on el-cheapo boards, most of which are very poorly built and cause their own sets of problems.

Re:Two Words... "motherboard Chipsets" on Current Processors Tested With Linux · 2004-02-04 14:31 · Score: 1

nForce2 chipsets don't support ECC, making them pretty much useless for servers. Great desktop chipsets, but still useless for servers.

When it comes to servers using AMD processors, there is really only one chipset vendor: AMD. The AMD 760MPX chipset for their AthlonMP and the AMD 8000 series for their Athlon64 and Opteron line are the only real server chipsets for AMD processors.

Now, one important change has occurred, though: the Athlon64 has a built-in memory controller, and that memory controller supports ECC (and Chipkill and memory scrubbing and all those other nice high-reliability features). That means that no matter how badly VIA fucks up their chipset design, Athlon64 processors will STILL support ECC. It also means that boards using nVidia's nForce3 line of chipsets will support ECC as well, though not due to anything nVidia has done.

I'm still not sure that this makes the boards entirely suitable for servers, but it's certainly a step in the right direction.

Re:Answer 2: Heat on Current Processors Tested With Linux · 2004-02-04 14:24 · Score: 1

Nope, Willamette chips ran REAL hot as well. The TDP of the 2.0GHz Willamette ranged from 72W to 76W depending on the stepping. The highest maximum power consumption of any AMD Athlon or AthlonXP processor at the time was 72W for the "Thunderbird" Athlon 1400MHz. More recently the "Barton" AthlonXP chips running at 3000+ and 3200+ model numbers have exceeded this slightly at about 75W and 77W max, but all previous AthlonXP chips had a TDP in the 50-70W range.

Really the only times that Intel chips ran cooler than AMD chips were when their chip generations weren't all that comparable. For example, when the Athlon first started shipping, Intel was still shipping the Pentium 3. The Athlon consumed more power than the P3. However, once Intel started shipping the P4, the two companies had similar power consumption. Then Intel started shipping the "Northwood" P4 built on a 130nm fab process, and it consumed less power than the AthlonXP that was still being built on a 180nm fab process. Once AMD switched to a 130nm fab process, they were back in the same ballpark, and Intel's chips actually ended up consuming more power.

The main reason why so many people found that Intel chips ran "cooler" is that the majority of P4 heatsinks were 80mm x 80mm, while the majority of AthlonXP and AthlonMP heatsinks were 60mm x 60mm (you can get a lot of bigger Athlon heatsinks now, but they were rare until a year or two ago). As such, not only did the P4 have more metal to spread the heat to, it was also able to use larger, slower-spinning fans, while the smaller AthlonXP heatsinks needed fairly high-speed (read: noisy) fans to keep cool.

Re:PowerPC processors on Xbox 2 - The Price of Compatibility? · 2004-02-04 13:12 · Score: 1

But, I was thinking that a PPC-based X-Box may serve to decrease chip/architecture costs,

It may help in this regard, certainly it gives IBM another source of revenue for their processors. However, it's still likely to be a LOT smaller than the revenue Intel brings in for their x86 chips, at least at the high-end of things (IBM sells a lot of cheaper PPC chips for the embedded market).

and also bring some new interest (by way of games, etc) to this platform. I dunno, though.

And just how many Macs do you know of that run Windows and DirectX? Just because it uses a PowerPC processor does not, in any way, shape or form, make the system a Mac. Cisco uses PowerPC processors in their routers, but you sure as hell don't see Macs being better routers because of any technology Cisco put into IOS!

The XBox2, despite running a PowerPC processor, will be a LOT more like a PC than it is like a Mac. It will run some version of Windows (WinCE and WinNT have both existed for PowerPC at various times) and DirectX, and that's what's important in terms of software compatibility.

Re:What market is it for? on Intel Prescott Released · 2004-02-02 13:01 · Score: 1

Uhh, Microsoft is planning on releasing WinXP for AMD64 processors in Q3 of this year (to coincide with the release of SP2 for WinXP IA32). Windows is already 64-bit, and has been for a while, but only for IA64 (Itanium).

MS will also bring out a 64-bit/AMD64 version of Win2003 Server at about the same time. Hardly 2006.

Re:Thoughts. on Intel Prescott Released · 2004-02-02 11:58 · Score: 2, Informative

... the areas where they lag the most are the ones that SSE3 looks like it should alleviate.

I don't anticipate that SSE3 will have much of an effect on performance, certainly not the way SSE2 did. It's really just filling in a few holes, instructions that probably should have been included in SSE2 but weren't for whatever reason. Some odd special-case scenarios might see a big boost, but for the most part I would throw out a guess of 0-5% max for most programs, with the majority falling closer to the 0% side of things.

The Prescott delivers respectable performance and will end up costing less at the same clock speeds than Northwood.

This, of course, is the kicker. If/when the Prescott is cheaper than the Northwood, then it starts to make sense.

We're not looking at an event like the original P4 launch where the new chip was not only slower but also more expensive & required hardware upgrades to use

I would say that nearly everyone who would want a Prescott IS looking at hardware upgrades. Only the latest and greatest P4 motherboards will support the chip, and if you've already got a 2.8C GHz P4 or faster, why would you want to upgrade now? Even most boards made 6-8 months ago are unlikely to support Prescott chips, particularly those > 3.0GHz which require new voltage regulators.

Re:Increased cache latency. on Intel Prescott Released · 2004-02-02 01:40 · Score: 1

Getting the heat out of the chip is not as big a problem as getting the power TO the chip in the first place. That's why Intel won't be making too many speed grades of the Prescott with the current socket type. Instead they will change to the LGA775 socket, which has (a lot) more power and grounding pins.

Bochs DOES do 64-bit! on Bochs x86 IA-32 Emulator 2.1 Released · 2004-02-01 18:32 · Score: 4, Informative

Uhh, did you bother to read the link? Ohh wait, this is /.

If you HAD bothered to go to the BOCHS site you would notice that it DOES do 64-bit emulation. More specifically, it emulates the AMD64 instruction set (aka x86-64). This is rather nifty in that it allows developers to test out code for AMD64 without having to purchase the hardware. Obviously not an ideal development platform, but it could be useful for some.

Re:interesting hardware comp on 2.4 vs 2.6 Linux Kernel Shootout · 2004-01-31 17:56 · Score: 3, Interesting

The hardware specifications weren't very complete, but from what I can see from IBM's x335 configuration they were using the Xeons without L3 cache. A 3.2GHz Xeon with 1MB of L3 cache could easily boost performance 10-20% over a 3.06GHz Xeon with no L3. Of course, the Opteron could still end up leading in a lot of the tests. What's more, the Opteron seems to really come into its own in 4-processor configurations, where the Xeon scales poorly. In short, the Opteron is a heck of a good chip.

Where this really looks bad for Intel, though, is with their Itanium systems. Assuming that those 1.5GHz I2 processors are of the 6MB L3 cache variety, these are Intel's top-end chips. Although the servers probably won't have the performance of HP's or SGI's I2 servers (IBM doesn't care much for the Itanium, so it doesn't invest nearly as much time and effort in the designs as HP or SGI do), the chip still looks pretty weak.

Intel's saving grace here may be that the Itanium line of chips is VERY dependent on a good compiler, and chances are that these applications were compiled with GCC. Using Intel's ICC instead would probably boost performance by a noticeable margin, though from what I understand a number of applications still won't compile with ICC.

Re:Motivation? on Virginia Tech Upgrade: PowerMac G5 to Xserve G5 · 2004-01-28 11:56 · Score: 1

Wow, whoever "covered" that before doesn't know anything about memory errors. Try reading up on some of IBM's very extensive research into the matter. IBM probably knows more about soft memory errors than any other organization on the planet, and their conclusion: the main cause of memory errors is random cosmic rays.

Putting a computer at the top of a mountain or in an airplane will result in several orders of magnitude more errors than one on the ground, while putting a computer in a deep military bunker underground results in almost zero soft memory errors.

If you take IBM's estimates for the number of soft memory errors for a cluster the size of VT's in a standard, ground-based system, it's pretty darn high (some calculations put it as high as a couple of errors a minute, but practically speaking it's probably more like 1 every hour or so). That's certainly SIGNIFICANTLY too high to be ignored, unless your cluster is a toy designed only for the publicity of getting the #3 spot in the Top500 list and not for real research.
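To put "1 every hour or so" in perspective: if soft errors arrive independently (a Poisson process, which is my own modelling assumption, not IBM's), the chance that a whole-cluster job runs clean falls off exponentially with its length. A quick Python sketch:

```python
import math


def probability_error_free(errors_per_hour, job_hours):
    """Chance that a job finishes without a soft error, treating errors as a Poisson process."""
    return math.exp(-errors_per_hour * job_hours)


for hours in (1, 8, 24):
    p = probability_error_free(1.0, hours)  # ~1 error/hour, the practical estimate above
    print(f"{hours:>2}h job: {p:.2e} chance of a clean run")
```

At that rate an overnight 8-hour run comes out clean only about 0.03% of the time.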

As the other poster who replied to my message hinted at, the initial cluster was a marketing ploy, not a real supercomputer. The new XServes ARE a proper supercomputer (err, at least a supercluster; it takes a bit more than just high LINPACK results to be a true supercomputer, but the rest is mostly in the hands of software now).

Re:Speed Improvments on Virginia Tech Upgrade: PowerMac G5 to Xserve G5 · 2004-01-28 11:46 · Score: 1

Considering that you complained about people spouting off without actually knowing what they are talking about, it's rather ironic (or perhaps just sad) that you are so obviously clueless!

There is NO way to identify soft memory errors through software short of doing everything twice. None, zero, zip, zilch! It's not like you have the answer to a big long calculation sitting there to check against when you're done. If you want to know the answer to the calculation, you'll have to do the calculation twice. If the results match, you can be pretty darn certain that they're correct. If they don't match, an error occurred. Repeat as necessary (so you're actually losing slightly more than 50% of your performance, but smart software should be able to keep that to less than a 51% performance loss in almost all situations).

You can do all the checksums in software you want, it won't do one lick of good if the data you're reading in to calculate those checksums is wrong.
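The "do it twice and compare, repeat as necessary" approach fits in a few lines. This Python sketch is a hypothetical helper, not anything VT actually runs. It assumes the computation is deterministic, so that two clean runs give bit-identical results.

```python
def run_until_agreement(compute, max_runs=5):
    """Run a deterministic computation until some result has been produced twice.

    With no errors this costs exactly two runs (the 50% overhead); each run hit by
    a transient error just costs one more run. Returns (result, runs_used).
    """
    seen = []
    for run in range(1, max_runs + 1):
        result = compute()
        if result in seen:
            return result, run
        seen.append(result)
    raise RuntimeError(f"no two of {max_runs} runs agreed")


# Hypothetical outputs: the second run was hit by a bit flip.
outputs = iter([42, 41, 42])
print(run_until_agreement(lambda: next(outputs)))  # (42, 3)
print(run_until_agreement(lambda: 7))              # (7, 2)
```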

Re:1U units & wiring on Virginia Tech Upgrade: PowerMac G5 to Xserve G5 · 2004-01-27 07:49 · Score: 1

Their current setup has 12 PowerMacs per rack. Considering the real estate is basically "free" (as in, it's already there and can't readily be used for much else), they don't really need to put more than 12 1U servers in a rack. Considering that a standard cabinet can hold 42U worth of servers, they should have lots of free space for wires and wire-management add-ins. Even if they double the density, they should still have nearly half of each rack free.
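The rack arithmetic, as a small Python sketch. It assumes 1U per Xserve and a standard 42U cabinet, and the helper name is made up:

```python
RACK_UNITS = 42  # a standard full-height cabinet


def free_rack_fraction(servers_per_rack, units_per_server=1):
    """Fraction of a rack left over for cabling and cable management."""
    used = servers_per_rack * units_per_server
    if used > RACK_UNITS:
        raise ValueError("that many servers will not fit in one rack")
    return (RACK_UNITS - used) / RACK_UNITS


print(f"12 Xserves: {free_rack_fraction(12):.0%} free")  # the current density
print(f"24 Xserves: {free_rack_fraction(24):.0%} free")  # double the density
```

That comes out to about 71% free at the current density and about 43% free at double the density.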

Re:Motivation? on Virginia Tech Upgrade: PowerMac G5 to Xserve G5 · 2004-01-27 07:37 · Score: 1

The problem was that the PowerMac G5 WAS broken, or at least not suitable for use in a large cluster like this. The lack of ECC memory was a major problem, and it was just not possible to sweep it under the rug.

As for the power bill, the claim is that the current cluster uses 2MW of power. Even with your own power plant, that is a BIG expense. Even a 10% reduction in costs would be quite noticeable over the course of a year (if your super-cluster isn't running pretty close to 100% full-out 24/7 for its lifespan, you probably wasted a lot of money on it).

Simply put, the XServe G5 is the computer that they SHOULD have used in the first place. It's a rack-mountable computer so it's FAR easier to stick into racks, it consumes less power so your costs are lower, it has hardware monitoring built-in so that you can more easily track potential points of failure, and most importantly, it has ECC memory so that you don't need to do all your calculations twice.

In short, it's a good solution. The PowerMac G5 was NOT a good solution.

Re:Uh on Virginia Tech Upgrade: PowerMac G5 to Xserve G5 · 2004-01-27 07:20 · Score: -1, Offtopic

As has been pointed out countless times, the cost was NOT a "mere" $5M. Their total hardware cost was ~$7M ($5.3M for the computers and memory, another $1.7M for the infiniband hardware), and there was another $1M to upgrade an existing building. They also had the benefit of free labour (millions of Mac zealots) and have not factored in the cost of power and cooling (at 2MW total for power and cooling, this is a pretty significant expense, about $5,000 a day) or the support costs.

These factors are typically included in the price tag of a supercomputer, but are always ignored with this Big Mac cluster. Once you start comparing apples to apples, it's still very cheap, but not nearly the sort of difference people make it out to be. If they didn't have free labour and a free building to house the thing, the difference would be even smaller.

So no, no one has made it to #3 in the Top500 list for a mere $5M, and no one will. Even if you JUST look at the very solid, fixed up-front costs, it was $8M.
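The ~$5,000/day power figure is easy to sanity-check. This Python sketch assumes an electricity price of $0.10/kWh. That price is my own guess, not a number from VT.

```python
def power_cost(megawatts, dollars_per_kwh, hours=24):
    """Electricity cost of a constant load over `hours` hours."""
    kwh = megawatts * 1000 * hours
    return kwh * dollars_per_kwh


daily = power_cost(2.0, 0.10)             # 2MW, hypothetical $0.10/kWh
yearly = power_cost(2.0, 0.10, 24 * 365)
print(f"per day: ${daily:,.0f}, per year: ${yearly:,.0f}, 10% saving: ${yearly * 0.1:,.0f}")
```

At that rate it's about $4,800 a day, or roughly $1.75M a year, so even a 10% power saving is worth around $175K annually.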

Re:Speed Improvments on Virginia Tech Upgrade: PowerMac G5 to Xserve G5 · 2004-01-27 07:03 · Score: 1

And how, exactly, are you supposed to get a job "under suspicion" unless you run the job twice?! There is simply NO WAY for software to know if a soft memory error has occurred except to run everything twice and compare your results. The segmentation just prevents the slowdown from being MORE than 50%!

Re:Secrets? on Linux Centrino Driver Update · 2004-01-26 16:58 · Score: 2, Informative

Not flamebaitish at all, it's actually a very good question.

First off, a 1.5GHz Pentium M will run circles around a 2.0GHz Celeron. Actually it will beat the pants off a 2.8GHz Celeron, but the Celeron is perhaps a bad example because that chip REALLY stinks! The current Celerons (1.7GHz through to 2.8GHz, basically a castrated bastard-child of the regular Pentium4) are absolutely abysmal performers, so it doesn't take much to beat them; AMD's $35 Duron processors running at 1.6GHz will usually match or beat the 2.8GHz Celeron.

Simply put, there are two main methods of designing a fast processor: the "brainiac" model, where the chip does a lot of work per clock cycle but doesn't clock as high, and the "speed demon", which doesn't do much per clock cycle but runs at very high clock speeds.

The Pentium4 is very much a "speed demon" design, which is why it clocks nearly twice as high as most other chips produced on a similar manufacturing technology. The Pentium-M takes more of a "brainiac" style of design, so it's harder for Intel to clock it to high speeds, but it does more work per clock cycle.

In reality, the Pentium M doesn't really run at lower clock speeds than many other CPUs; it currently tops out at 1.7GHz. For comparison, the Athlon64 is running at 2.2GHz, the PPC 970 (aka G5) at 2.0GHz, the Power4+ at 1.7GHz, the Itanium2 at 1.5GHz, Alpha EV7 at 1.25GHz, UltraSparc III at 1.2GHz, etc. Really the only odd-ball is the Pentium4, which currently clocks up to 3.2GHz. Despite the wide range in clock speed, though, in the end all of these chips are in the same general ball-park in terms of performance.

Now, there are a LOT of factors that influence the overall speed of a processor, and even a quick summary of them could easily take dozens of pages, but it's already well documented in books and on the web if you're interested. Suffice it to say that a Pentium-M is usually about as fast as a 2.2 to 2.6GHz P4, though individual applications can vary wildly.

This doesn't exactly mean that clock speeds are irrelevant; a 1.7GHz Pentium-M is still going to be faster than a 1.3GHz Pentium-M. It's just that clock speed is only one small part of the whole picture. I like to equate it to the displacement of an engine. All else being equal, a 4.0L engine will give you a faster car than a 3.0L engine. However, it's certainly possible to build a 3.0L engine that will produce more horsepower than a totally different design of 4.0L engine (F1 cars manage to pump ~900bhp out of a 3.0L engine, while most 4.0L engines you're likely to see in production cars produce only ~300bhp). What's more, the peak horsepower number doesn't tell the full story of engine performance, and it certainly doesn't tell you how fast the car as a whole would be. Similarly, for any given processor core, higher clock speeds will give you more performance. On the other hand, two different cores can have very different performance at different clock speeds, and other components like the video card, memory and hard drive can all have a big impact on the overall performance of the system.
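The usual back-of-the-envelope model is performance ≈ (work per clock) × (clock speed). Here's a Python sketch of it using the figures quoted in this post, where a 1.7GHz Pentium-M performs about like a 2.5GHz P4. It's a deliberately crude model, and the function names are mine.

```python
def relative_performance(ipc, clock_ghz):
    """Very rough model: work per second = work per clock x clocks per second."""
    return ipc * clock_ghz


def implied_ipc_ratio(clock_a_ghz, clock_b_ghz):
    """If chips A and B perform the same, how much more per clock must A do than B?"""
    return clock_b_ghz / clock_a_ghz


# A 1.7GHz Pentium-M roughly matching a 2.5GHz P4:
ratio = implied_ipc_ratio(1.7, 2.5)
print(f"Pentium-M does ~{ratio:.2f}x the work per clock of a P4")

# Same core, different clocks: the model predicts linear scaling, which real
# code only approaches when it isn't waiting on memory or disk.
print(relative_performance(1.0, 1.7) / relative_performance(1.0, 1.3))
```

That's roughly 1.47x the work per clock. Within one core, 1.7GHz versus 1.3GHz is at best about a 31% gain, and less if the code is memory-bound.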

When you get down to it, it's simply a matter of design decisions and trade-offs. The Pentium-M was designed to offer good performance and low power, and it succeeds VERY well (I'm a big fan of the Pentium-M processor, even if I rather dislike the "Centrino" marketing program). The P4 was designed for the highest overall performance at a reasonable price-point. As a result, the top-end P4s are faster than the top-end Pentium-M chips, and probably always will be. However, the Pentium-M at 1.7GHz consumes only about 25W. A similar-performance P4, even in its low-power laptop version (the "Mobile Pentium4-M", not to be confused with either the "Mobile Pentium-M" or the "Mobile Pentium4", and some people say AMD's names are confusing!), consumes 35W at 2.5GHz. Meanwhile the regular desktop Pentium4 (and also the "Mobile Pentium4") consumes 61W. That 2.0GHz mobile Celeron processor you mentioned comes in at 32W.
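If you treat those chips as roughly equal in performance (itself an approximation), performance per watt just scales inversely with power. A quick Python sketch using the wattages above:

```python
def perf_per_watt_vs(baseline_watts, watts):
    """Performance per watt relative to a baseline chip, assuming equal performance."""
    return baseline_watts / watts


chips = {
    "Pentium-M 1.7GHz": 25.0,
    "Mobile Pentium4-M 2.5GHz": 35.0,
    "desktop Pentium4 (61W)": 61.0,
}

baseline = chips["Pentium-M 1.7GHz"]
for name, watts in chips.items():
    print(f"{name}: {perf_per_watt_vs(baseline, watts):.2f}x the Pentium-M's performance per watt")
```

So the Mobile P4-M delivers about 0.71x the Pentium-M's performance per watt, and the desktop P4 about 0.41x.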

Re:Another batch? Yes! on Joel Rants About Resumes · 2004-01-26 07:00 · Score: 2, Insightful

While you are very right, I think that the author of the article makes a good point, you don't want to LOOK like you're applying for hundreds of jobs.

In other words, the cover letter (and even the resume) you send to each employer should be at least somewhat customized for the job. If your cover letter looks like you've just filled in a new name and job title for each position you apply for, chances are it won't get far.

Also, I don't think that people should worry TOO much about being completely qualified for the job. Now obviously if the job asks for a business major with plenty of sales experience in a specific field and you're a programmer with zero sales in any field, you're SOL. However, almost every job I've ever seen asks for people with more skills and experience than they really need, often asking for experience in a half-dozen or more different fields when very few people work in any one of them, let alone all of them. I've seen dozens of job postings asking for 5+ years of experience, a comp. sci degree, and MCSE and A+ certification for first-line telephone tech support jobs. Of course, the funniest are the jobs where they ask for things like 10 years of experience with some technology that didn't exist more than 5 years ago. For these jobs they OBVIOUSLY are not going to get what they claim to be looking for, so if you've got 3 or 4 years of experience you're probably as good a choice as anyone else.

A lot of it comes down to trying to match your cover letter to whoever is likely to be reading the cover letters. Chances are that if the job description is filled with buzzwords and HR-speak, your cover letter better be filled with the same or it will get tossed. If it's very to the point and contains specific technical references, your cover letter should do the same.

Re:REPLY to this if you are a C/C++ programmer on Athlon64 Motherboards And Chips Compared · 2004-01-26 06:21 · Score: 1

Ok guys I have one CPU question that is yet to be answered. Aside from increase memory access and integer/float width. What could the possible advantage of a 64-bit 3D game have.

General 64-bit vs. 32-bit? Not much. In fact, the floating point width doesn't even change (double precision has always been 64-bit on PCs and most other architectures, and x87 goes up to 80-bit), just the integer width and the larger memory access. Of course, AMD64 also doubles the number of integer registers (and makes them true general-purpose registers, as opposed to the semi-general-purpose registers of IA32) and doubles the number of XMM registers (used by SSE/SSE2). It also cleans up a little bit of cruft here and there.

Register size/Bus speed/hypertransport all can be added to current 32bit platforms.

Bus speed and hypertransport could easily be added to 32-bit platforms. The P4 already has a fast bus, and Transmeta's upcoming Efficeon processor will use hypertransport as its I/O connection to the outside world.

Register size though, well that's another story altogether. If you're changing register size (and the number of registers) you're breaking compatibility. If you're breaking compatibility anyway, why not fix THE main limiting factor for 32-bit systems, ie the fact that they can only address 4GB of memory.
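The 4GB ceiling is just 2 raised to the address width. Here's the arithmetic in Python. The 48-bit figure is, as far as I know, the virtual address width that current AMD64 chips implement, and the architecture leaves room to widen it later.

```python
def addressable_bytes(address_bits):
    """Bytes reachable with a flat address of the given width."""
    return 2 ** address_bits


GIB = 2 ** 30
print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")
print(addressable_bytes(48) // GIB, "GiB with 48-bit virtual addresses")
```

That's 4GiB versus 262,144GiB (256TiB).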

It's not like you can "pair up" instructions now... those instructions that used to be 32bit, when recompiled simply take up 64bits now, right?

Not really. Instructions don't double in size with AMD64; the encodings are largely the same, with an optional one-byte REX prefix to select 64-bit operands and the new registers. An int stays 32 bits. Pointers double to 64 bits, and so does long on Unix-like systems (the LP64 model), while 64-bit Windows keeps long at 32 bits (LLP64). You also have the option of using 64-bit integers (long long int) natively instead of having the compiler hack them into two 32-bit ints like with 32-bit processors.
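If you want to check which C types actually change size on your own box, Python's standard `ctypes` module reports what the platform's C ABI uses. This is a quick sketch. The results depend on the OS and on whether you're running a 32-bit or 64-bit Python build.

```python
import ctypes


def c_type_sizes():
    """Sizes in bytes of a few C types, as seen by the C ABI this Python was built for."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }


for name, size in c_type_sizes().items():
    print(f"{name:>9}: {size * 8} bits")
```

On a 64-bit Linux build I'd expect int at 32 bits and long and pointers at 64. A 64-bit Windows build should keep long at 32, and a 32-bit build should show 32-bit pointers.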

If your video games don't require hugely accurate numbers... The 64bitness of an instruction set adds nothing!

If your game (or any application) needs hugely accurate numbers, you're using floating point anyway, and as mentioned above, that's always been 64-bit or even 80-bit in some cases (all x87 FPUs support 80-bit floating point numbers, though compilers rarely use them). The larger integers are just that, a larger range for integer numbers. It's fairly rare to use long long ints, so this isn't a huge deal. Of course, when you do need long long ints, a 32-bit machine needs several instructions for each 64-bit operation: an add becomes an add plus an add-with-carry, and a multiply takes three 32-bit multiplies plus additions.
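Here's roughly what a 32-bit compiler has to do for long long math, sketched in Python with everything split into 32-bit pieces. The function names are made up for illustration. Real compilers emit ADD/ADC and MUL instructions rather than anything like this code, but the number of pieces is the point.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add64_with_32bit_ops(a, b):
    """Unsigned 64-bit add built from 32-bit pieces (ADD on the low halves, ADC on the high)."""
    low = (a & MASK32) + (b & MASK32)
    carry = low >> 32                                # 1 if the low halves overflowed
    high = ((a >> 32) + (b >> 32) + carry) & MASK32  # add-with-carry on the high halves
    return (high << 32) | (low & MASK32)


def mul64_with_32bit_ops(a, b):
    """Low 64 bits of a 64x64-bit multiply using three 32x32->64-bit multiplies."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    low_product = a_lo * b_lo                      # full 64-bit result, like x86 MUL
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK32   # only the low 32 bits survive the shift
    return (low_product + (cross << 32)) & MASK64  # a_hi * b_hi lands above bit 64 and drops out


x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(add64_with_32bit_ops(x, y) == (x + y) & MASK64)  # True
print(mul64_with_32bit_ops(x, y) == (x * y) & MASK64)  # True
```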

What am I missing? anything?

You're missing a few things. First, games right now don't use more than ~2GB of memory (roughly the most a 32-bit chip can handle without things getting ugly), but they will. In fact, we're probably only about 2 or 3 years away from it being common to use 2GB+ of memory for games. Also, games are definitely NOT the only market out there. Using more than ~2GB of memory is already quite common for servers and workstations, and doing so on a 32-bit chip is not ideal.

The next thing that you're missing is doubling the number of registers. Sure, you could do that with a 32-bit chip, but if you are going to break compatibility you might as well fix as many potential problems as possible in one fell swoop. This also nicely offsets any potential performance loss you might see from doubling the size of your pointers (i.e. higher memory bandwidth requirements and more strain on cache). In fact, thanks to the doubled register count you often end up with FASTER code when compiled for AMD64 than for IA-32, while with most other processors (e.g. the G5 or an UltraSparc) you typically end up with slower 64-bit code vs. 32-bit code. The extra registers not only mean more space to store data (rename registers can handle that on 32-bit architectures), but also fewer load/store instructions, thereby lowering the size of your code and reducing cache and bandwidth use.

In the end most AMD64 code runs about the same speed as IA-32 code. Th

Comments · 678