Uhh, you really don't know what you're talking about here, do you? We're talking floating point code here, not integer code! You don't need Smeagol or Panther or any other cat to get 64-bit floating point code, DOS can handle that just fine!
Essentially ALL processors with a floating point unit do 64-bit precision calculations. The old G4 and G3 did, the Pentium 4 does, the old 486 did, etc. etc. The whole 32-bit vs. 64-bit argument with these PowerPC 970 chips (and, in a similar light, AMD64 chips) has to do with INTEGER registers and, more importantly, the size of pointers and address registers.
That being said, the original parent probably missed something too. Supercomputers tend to do tasks that are easily vectorized, so it's almost certain that the calculations they were using were done using Altivec and not the standard floating point unit.
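For anyone who wants to see the float-width vs. pointer-width distinction for themselves, here is a minimal sketch (my own illustration, not something from the thread). It uses Python's standard struct module to report the size of a C double and of a native pointer on whatever machine runs it; the function name describe_sizes is made up for the example.

```python
import struct


def describe_sizes():
    """Return (double_bits, pointer_bits) for the machine running this code."""
    # "d" is a C double in native mode: the 64-bit floating point type.
    double_bits = struct.calcsize("d") * 8
    # "P" is a native void pointer: this is what "32-bit vs. 64-bit" is about.
    pointer_bits = struct.calcsize("P") * 8
    return double_bits, pointer_bits


if __name__ == "__main__":
    d, p = describe_sizes()
    print(f"double: {d}-bit, pointer: {p}-bit")
```

On a 32-bit build the pointer comes out as 32-bit while the double is still 64-bit, which is the whole point.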
No, I'm saying that SCSI is best for many small accesses happening all at the same time but typically on different parts of the disk, while IDE tends to be best for only one (or a small number) of disk reads from the same section of the disk. For what you're doing, IDE would tend to actually be a LOT faster.
Please, take a look at the Storage Review links I posted above, in particular, their SoundForge test. Ok, it's a slightly out-dated version of the program, but 7200rpm IDE drives EASILY beat out 15krpm SCSI drives in this test, by a fairly significant (up to twice as fast) margin.
My mail client doesn't have to go through 50,000 e-mails because I don't have anywhere near that many messages. Even most /. readers don't have anywhere near that many messages. However, a server might have to go through a folder of 50,000 e-mails.
SCSI is for servers. It performs better in server tasks because it was designed that way. IDE is for desktops. It performs better in desktop tasks because it's designed that way. SCSI and IDE have been compared MANY times, and the results are always the same: if you run a server doing fairly typical server tasks, SCSI is a LOT faster; if you run a desktop doing fairly typical desktop tasks, IDE is faster (at least comparing 7200rpm IDE vs. 7200rpm SCSI; a 15Krpm SCSI drive will probably beat out the IDE drive).
Methinks that someone needs to read up on how MTBF is defined by hard drive vendors. Ohh, and more to the point, how it is defined for SCSI drives as compared to IDE drives (they aren't always the same thing ya know).
For those looking to learn a thing or two today, Storage Review has a nice little explanation of MTBF for you all to read. Now, if you do a bit more reading, you can find out just how the definition of "service life" often changes from IDE drives to SCSI drives (for the lazy, most SCSI drives don't include the first 90 days of use in their "service life").
Err, have you actually looked at proper hard drive comparisons lately? (i.e. not this one). Western Digital drives are consistently the fastest IDE drives on the market. Seagate drives are fine, and almost always the quietest drives on the market these days, but they're definitely not the fastest.
Regardless, if you ever want to know ANYTHING about hard drives (including PROPER comparisons of SCSI vs. IDE drives) go to www.storagereview.com.
Tom's Hardware has improved somewhat since Tom no longer does any of the writing. Unfortunately his ego seems to get in the way of the conclusions from time to time. The real problem is that Tom's "reviews" mostly just favor whatever company took him out to the nicest restaurant the last time he was at a convention, or who bought him the most drinks the last time he visited their office.
As a general rule, take EVERY hardware test with a large grain of salt; you can pick and choose benchmarks to show just about any result you want. What's more, places like Tom's are TERRIBLE for drawing rather ridiculous conclusions from their results. Not only do they do things like saying that Product Y "absolutely destroys" Product X because it's 2% faster on a test that has a 3% margin of error, but they also make some crazy assumptions about why performance differences exist without doing any meaningful research to verify their hypotheses.
As a general rule, I'd prefer to point people to www.anandtech.com for a start, because while they aren't much better technically than Tom's, Anand at least is trying to provide accurate and factual data, while Tom is usually just trying to get someone to stroke his enormous ego. If you get through that, then head over to www.aceshardware.com for some guys that actually know what the heck they're talking about and try to do some real research along with their comparisons.
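The "2% faster on a test with a 3% margin of error" point can be put into a few lines of Python. This is only a crude rule-of-thumb sketch of my own, not proper statistics and not anything the review sites actually use; the function name is_meaningful and the sample scores are made up.

```python
def is_meaningful(score_x, score_y, margin_pct):
    """Return True only if the gap between two scores exceeds the test's margin of error.

    score_x and score_y are benchmark results (higher or lower, it doesn't matter);
    margin_pct is the test's run-to-run margin of error in percent.
    """
    gap_pct = abs(score_y - score_x) / score_x * 100.0
    return gap_pct > margin_pct


if __name__ == "__main__":
    # Product Y is 2% faster than Product X on a test with a 3% margin of error.
    print(is_meaningful(100.0, 102.0, 3.0))
    # A 10% gap on the same test is a different story.
    print(is_meaningful(100.0, 110.0, 3.0))
```

If the first call says the gap is not meaningful, nobody is "absolutely destroying" anybody.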
Whoa! Wake up to the 1990s now that they've come and gone! IDE has supported DMA for AGES! If you're still using a PIO mode on your IDE drive, you've got SERIOUS problems! I know that DMA has been used on basically all IDE drives and controllers since at least 1995, and probably a while before then.
No, the reason why SCSI is faster on server tasks and IDE is faster on desktop tasks is that SCSI drives are designed for servers and IDE drives are designed for desktops. The drives are built differently and they emphasize different aspects of performance in order to improve performance for their target market.
Anyone care to guess whether reading 50,000 small files at the same time is something you're more likely to do on a "server" or a "desktop"? How many of you have 50,000 e-mails in your personal maildir folder? How many of you admin a server that has 50,000 e-mails in the maildir folder for a few hundred people?
Sure it's huge, but I could get just as big of a difference between two runs using the EXACT same hard drive.
The tester didn't even bother to check and see if the files are fragmented, let alone checking to see if the files are on the same part of the disk. The original poster was right, this was NOT in any way a "good" comparison.
If you actually do want a good comparison, head on over to www.storagereview.com. They have compared many different SCSI and IDE drives and have a VERY good grip as to where and when SCSI's performance advantage comes into play.
Here's a quick and easy way to do things: Click on "Performance Database" at the top of the page, and then do a head to head comparison of a few SCSI drives and a few IDE drives. This will give you a whole whack of benchmarks. What you'll find is that on desktop applications, a 7200rpm IDE drive can almost always outperform a 7200rpm SCSI drive and is usually about on-par with a 10,000rpm SCSI drive. But, as soon as you get into their server benchmarks, the SCSI drives wipe the floor with the IDE drives.
Then it simply becomes a question of whether you run a server or a desktop. Different drives for different markets.
The K5 was completely designed from the ground up by AMD. No NexGen involvement at all. The K6 was the chip that was originally NexGen's Nx686 design. After AMD purchased NexGen they modified the design to run on a Socket 7 bus, glued MMX onto the chip and sold it as the K6.
As for the K5 and its shifts, that's all well and good if all you use your computer for is Distributed.net, but if you actually do anything productive with the chip, it was pretty poor. Now, mind you, half of the problem was that AMD had about a year's worth of delays due to manufacturing, so by the time AMD shipped the chip they were running at a MUCH lower clock speed than Intel's chips. So, while the K5 might have been slightly faster, clock for clock, than the Pentium at integer operations, it was only running at ~75-100MHz when Intel was selling 166 and 200MHz chips. If the K5 had been even remotely close to being on-time it might have stood a chance. As it was though, that chip was almost the death of AMD. Their market share had dropped SIGNIFICANTLY, down to about 3 or 4% from a high of 30% in the late 386/early 486 days.
It's not. The core of an UltraSparc or a Power4 looks a LOT more like the core of an Opteron or a P4 than it does an ARM. The CISC chips have become rather like RISC chips on the inside, while the high-end RISC chips have tended to get somewhat less "reduced" (i.e. more complex) instruction sets.
There are two things that AMD is working towards in this chip. The first is multiple cores, while the second is on-chip multithreading (SMT, same as Intel's Hyperthreading). The first may increase the bandwidth needs of the chip, but the latter is actually designed to reduce them, in a manner of speaking.
SMT allows one thread to do a bit of processing for a while, until it runs out of data. It requests data from memory and then goes off to lala-land for a little while, and the other thread takes over. Since most (typically 97%+) data is stored in cache, the second thread usually has quite a bit to do before it runs out of data. By that time, hopefully the first thread's data has arrived and it can take over and do its thing again.
Note that this isn't really doing much of anything for bandwidth, so much as hiding latency, which is the real problem on modern processors. Consider this: since the 386, processor speed has gone up by a factor of about 500. Memory bandwidth has gone up by a factor of about 64, but latency has only decreased by a factor of about 10 or less. So while processors are slightly lacking in bandwidth, the biggest problem is latency.
Now, the upside to all this: latency has improved a LOT in the last year or two. First, nVidia and Intel did a bang-up job with their memory controllers in the nForce2 and i875 chipsets, dramatically reducing latency as compared to most previous chipsets. Second, AMD has pushed even further forward with their Opteron/Athlon64 chips (and will continue with this K9) by integrating the memory controller right on-die, further reducing latency.
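To put the rough factors quoted above (processor speed up about 500x, bandwidth up about 64x, latency down about 10x since the 386) side by side, here is a back-of-the-envelope sketch. The numbers are just the ballpark figures from the comment, and the function name relative_gap is made up for illustration.

```python
# Ballpark improvement factors since the 386, as quoted in the comment above.
CPU_SPEEDUP = 500.0
BANDWIDTH_GAIN = 64.0
LATENCY_GAIN = 10.0


def relative_gap(cpu_speedup, memory_gain):
    """How much further the CPU has pulled ahead of a given memory metric."""
    return cpu_speedup / memory_gain


if __name__ == "__main__":
    # Latency measured in CPU cycles has grown by this factor.
    print("latency gap:", relative_gap(CPU_SPEEDUP, LATENCY_GAIN))
    # Bandwidth available per CPU cycle has shrunk by this factor.
    print("bandwidth gap:", relative_gap(CPU_SPEEDUP, BANDWIDTH_GAIN))
```

By these figures a trip to memory costs roughly 50 times as many CPU cycles as it used to, while bandwidth per cycle is down by a bit under 8x, which is why hiding latency matters more than adding bandwidth.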
You forgot about the K5, AMD's competitor to the original Pentium. If any chip AMD ever made could be called slow and underperforming, it was the K5.
When the K6 was released it was arguably the fastest processor in the world. AMD's problem was that Intel brought out the Pentium II roughly one month after AMD released the K6, and it was pretty much all downhill from there for AMD. The chip still managed to compete well on the low-cost side of things for a while though.
SuSE Linux Professional for AMD64 is listed as being "available for pre-order, expected ship date: October 15th" (i.e. today). I don't know if they actually are shipping today, but they certainly should be soon. Either way, beta versions have been available for some time (way back to 8.0 or thereabouts). Redhat and Mandrake also have betas available for AMD64 (Redhat, at least, is a fairly mature beta).
All three of these companies also sell the corporate/server versions of their Linux distros for AMD64 now.
SPEC CPU2000 CINT base:
Athlon64 3200+ (Windows/Intel C) : 1266
Athlon64 FX 51 (Windows/Intel C) : 1376
Athlon64 FX 51 (Linux/GCC) : 1282
SPEC CPU2000 CFP base:
Athlon64 3200+ (Windows/Intel C) : 1180
Athlon64 FX 51 (Windows/Intel C) : 1329
Athlon64 FX 51 (Linux/GCC) : 1371
Observant readers among you may notice that the FX 51 score in Windows is the highest CINT_base score of any processor that is currently shipping (only the 3.2GHz P4 Extreme Edition beats it).
Yup, and currently there's a grand total of ZERO distributions out there that support the G5. If you're willing to spend a lot of time working on it, you might be able to roll your own, though hardware support is quite weak at the moment so that could be problematic.
There are already 3 distributions (SuSE, Redhat and Mandrake) that support AMD64, and more on their way (including the *BSDs).
In case anyone forgot to read the fine print in the article (or simply forgot to read the article altogether), here's a quote for you with regards to the benchmarks:
"Tests on PCs performed by the PC World Test Center; tests on Apple systems performed by the Macworld Test Center."
Current Athlon64 FX chips are Socket 940. At some point in time in the not-too-distant future they will switch to be Socket 939, but that won't happen until the new year. Most likely this will happen when AMD shrinks their die to a 90nm fab process, probably about 6-9 months from now.
FWIW, interesting little factoid, all the AMD64 chips currently use the same die, even the lowly Athlon64 3200+. AMD just disables a few features and repackages it.
The absolute cheapest Power4 server that you can buy costs $5745, and that's for a single-processor, 1.2GHz Power4. If you want to compare comparable servers, you'll have to wait until IBM releases their servers based on the PowerPC 970 (aka the G5). They aren't available yet, but should be here soon.
If you want to compare the high-end though, you can do that too. I'm very certain that AMD would GLADLY compare a $5745 Opteron server to a $5745 Power4 server any day of the week.
FWIW, check out the SpecWeb scores sometime: a 4-processor Opteron 846 (2.0GHz) server is the fastest of all 4-processor servers out there, just edging out an IBM pSeries 655 with 4 Power4+ 1.7GHz processors. In SpecInt the Opteron easily beats the Power4+, though IBM wins SpecFP by an equally large margin. The 4P Opteron is also faster than the 4P Power4+ 1.7GHz as a Java server according to Spec JBB2000. There, the Opteron is second fastest out of all 4P servers, just behind the 4P 1.5GHz Itanium2 from HP.
Long story short, the Opteron HAS been tested against the Power4+, and it holds its own very well.
Intel's ULV Pentium M chip runs at up to 1.0GHz and consumes 7W maximum. Their ULV Mobile Celeron also consumes only 7W when running at up to 800MHz. What's more, I haven't seen any info that says if the '7W' that Transmeta is quoting is its maximum power consumption, a thermal design spec, or its "typical" power consumption. Transmeta has a tendency to only talk about typical power consumption, while the 7W numbers listed above for Intel are the chips' Thermal Design Power (TDP, basically the maximum you'll ever see with the exception of code specifically designed to consume maximum power). TDP is ALWAYS at least a few watts higher than 'typical' power consumption.
AMD may also have a low-voltage AthlonXP-M that is in the same power consumption range, but unfortunately AMD does a piss-poor job of documenting their mobile processors (read: there is absolutely no documentation publicly available). As you mentioned, VIA is also producing chips in the same power consumption range.
Long story short: Transmeta is going to have to either deliver on performance (like Intel does) or on price (like VIA). Right now they are talking about selling the chips for $100 a pop, which is quite a bit more than what VIA sells for. They are also talking about only a 50-80% improvement in performance over the Crusoe, which isn't going to do much for them in the way of performance. At 1.1GHz, they might be competitive with the 800MHz ULV Celeron, but I'm not holding my breath. The Crusoe had terrible performance.
There are about a half-dozen major clusters in the works using AMD's Opteron processor. Cray is building the largest, a 10,000 processor cluster that should hit 40 teraflops. IBM has a 1000+ node (2000 processor) cluster for Japan, and China is building a similar sized cluster. I've also heard some reports here and there about 256 and 512 node clusters being built by other organizations.
In short, the Opteron has been a VERY popular choice for clusters, though I don't think that any of the projects have reached the Top500 list yet.
Price them out yourself. The G5s had 4GB of memory, a Radeon 9600 Pro video card and the Apple Superdrive. Cost: $5,320.00/node
Multiply by 1100: $5.85 million
Now, my understanding is that the "standard" educational discount is 11%, which brings the cost down to $5.2 million.
So, with a total system cost of $5.3 million, VT must have got one hell of a deal on the VERY expensive infiniband hardware they are using. Actually, if they got that hardware for less than $1.1 million ($1000/node) they got a deal, let alone getting it all for only $100,000.
Either way, you can hardly compare the prices of these two clusters, when UT was quoting the cost to build EVERYTHING, including renovating the building in which the cluster was to be housed.
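If you'd rather not do the multiplication by hand, the pricing argument above works out as in this little sketch. The figures are the ones quoted in the comment ($5,320/node, 1100 nodes, 11% educational discount, $5.3 million total); the function name cluster_cost is made up for illustration.

```python
NODE_PRICE = 5320.00        # quoted price per G5 node, in dollars
NODES = 1100                # number of nodes in the VT cluster
EDU_DISCOUNT = 0.11         # "standard" educational discount, as understood above
REPORTED_TOTAL = 5.3e6      # total system cost quoted above, in dollars


def cluster_cost(node_price, nodes, discount):
    """Return (list_total, discounted_total) for the computers alone."""
    list_total = node_price * nodes
    return list_total, list_total * (1.0 - discount)


if __name__ == "__main__":
    list_total, discounted = cluster_cost(NODE_PRICE, NODES, EDU_DISCOUNT)
    left_for_network = REPORTED_TOTAL - discounted
    print(f"list price:       ${list_total:,.0f}")
    print(f"after discount:   ${discounted:,.0f}")
    print(f"left for network: ${left_for_network:,.0f}")
    print(f"per node:         ${left_for_network / NODES:,.2f}")
```

That leaves a bit under $100,000 for the interconnect across all 1100 nodes, well short of even $100 per node.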
- MUCH smaller physically (1U and 2U racks vs. desktops). You can fit the entire 300-node cluster of these things in roughly the same space as 54 of the G5 PowerMacs (assuming a split of about 64 of the 2U racks and the rest 1U racks).
- ECC memory. The Apple does not support ECC memory, so you can't trust your data (and, for the last time, that Deja-vu software is NOT going to make this magically go away!).
- It's cheaper. The Dell cluster cost $3.0 million for the computers, the Apple cluster cost $5.8 million for the computers. The $38 million figure included costs such as renovating the building and all the extras, while the VT cluster was a very bare-bones cost.
- You can use existing clustering software. There is VERY little, if any, software out there designed to support and maintain Mac clusters. VT has been developing most of the software themselves. For PCs there is a lot more software out there already. Of course, if you've got a bunch of grad students to do the work for "free", this isn't such a big problem:>
- Redundant, hot-plug power supplies
- No useless hardware for the cluster. The G5 systems come with high-end ATI Radeon 9600 graphics cards, which are just a complete waste of time, money and electricity (= more money). They also come with Apple Superdrives (DVD writers).
- Faster hard drives (SCSI, potentially up to 15K RPM drives, though I can't find details on what UT bought). Drives can also be hot-swappable.
Of course, there are also some advantages to the PowerMac G5 setup:
- 64-bit capabilities. I don't know how far along their software is in this regard, but it's GOT to be better than the PSE used on PCs. This is the number 1 biggest advantage of the Mac.
- More performance, at least in the theoretical flops. If they can get their software set up at all right, the Mac cluster should perform very well, particularly if they can vectorize stuff (you often can with clusters) and make use of Altivec.
- Decent resale value if they decide to break down the cluster and sell it a few years from now. These are high-end desktop machines that can be taken out of the cluster and sold pretty much as-is. The UT/Dell cluster is a bunch of rack-mounted machines that can't really be sold. That being said, I don't know how big of an advantage this really is; typically you buy a cluster expecting to use it for quite a number of years.
- You've got a virtually limitless group of Mac zealots to do anything and everything you can possibly think of, just so that they can get a look at this magical cluster of Macs that has been blessed by the high-holy Steve Jobs himself!:>
In short, the UT/Dell setup is a traditional cluster, while the VT/Apple cluster is more of a novelty, but a novelty that shows some good potential if it could be designed properly, i.e. if Apple releases some G5 Xserves. A G5 Xserve could make for some VERY nice clusters.
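For the physical size point in the list above, the rack math for the Dell cluster is simple enough to sketch. The 300 nodes and the split of about 64 2U boxes come from the comment; the 42U rack height is my own assumption (it is a common size, but nothing in the thread says what UT actually used), and the function names are made up.

```python
import math


def rack_units(total_nodes, nodes_2u):
    """Total rack units needed when nodes_2u machines are 2U and the rest are 1U."""
    nodes_1u = total_nodes - nodes_2u
    return nodes_2u * 2 + nodes_1u * 1


def racks_needed(units, rack_height_u=42):
    """Whole racks needed; rack_height_u=42 is an assumed standard rack height."""
    return math.ceil(units / rack_height_u)


if __name__ == "__main__":
    units = rack_units(300, 64)
    print("rack units:", units)
    print("racks:", racks_needed(units))
```

This ignores switches, power distribution and cooling gaps, so treat the rack count as a lower bound.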
FWIW the initial setup of the cluster was supposed to use Gigabit ethernet. Once it was set up and working, they were planning on switching over to infiniband. I don't know where VT currently is in the switchover process, but at least at some point in time, the PDF is/was correct.
It's been previously discussed, and their "error correction algorithms" are going to do dick-all for them if they can't trust their data, and without ECC, you can't trust your data.
The first poster is correct, results from this machine cannot be trusted as being 100% accurate. It may be that they can live with the lack of accuracy, but it's definitely something that they will have to figure into their work. With 4.4TB of memory, they are going to have soft memory errors on a VERY regular (daily?) basis, and they can NOT be caught by some sort of software algorithm unless you store every bit of data twice and do every single calculation twice. If you're doing that, you might as well save yourself a few million and buy a 550 node cluster with ECC and get the same result.
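The 4.4TB and 550-node figures above fall straight out of the cluster's specs, as this small sketch shows. The 1100 nodes and 4GB per node come from the earlier pricing comment; the function names are made up, and the sketch says nothing about actual soft error rates, which I don't have numbers for.

```python
NODES = 1100          # nodes in the VT cluster
MEMORY_PER_NODE_GB = 4


def total_memory_tb(nodes, gb_per_node):
    """Total memory across the cluster, in (decimal) terabytes."""
    return nodes * gb_per_node / 1000


def effective_nodes(nodes, redundancy):
    """Useful nodes left if every value is stored and computed `redundancy` times."""
    return nodes // redundancy


if __name__ == "__main__":
    print("total memory (TB):", total_memory_tb(NODES, MEMORY_PER_NODE_GB))
    # Storing everything twice and computing everything twice halves the cluster.
    print("effective nodes:", effective_nodes(NODES, 2))
```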
To quote Yellow Dog Linux's own web page:
"Power Mac G5 1.6 GHz Video information
Supported? No*
* We are working hard to provide Yellow Dog Linux support for the Apple G5s."
Seems pretty cut and dry to me, no?
Ask and ye shall receive:
SPEC CPU200 CINT base:
Athlon64 3200+ (Windows/Intel C) : 1266
Athlon64 FX 51 (Windows/Intel C) : 1376
Athlon64 FX 51 (Linux/GCC) : 1282
SPEC CPU2000 CFP base:
Athlon64 3200+ (Windows/Intel C) : 1180
Athlon64 FX 51 (Windows/Intel C) : 1329
Athlon64 FX 51 (Linux/GCC) : 1371
Observant readers among you may notice that the FX 51 score in Windows is the highest CINT_base score of any processor that is currently shipping (only the 3.2GHz P4 Extreme Edition beats it).
Yup, and currently there's a grand total of ZERO distributions out there that support the G5. If you're willing to spend a lot of time working on it, you might be able to role your own, though hardware support is quite weak at the moment so that could be problematic.
There are already 3 distributions (SuSE, Redhat and Mandrake) that support AMD64, and more on their way (including the *BSDs).
In case anyone forgot to read the fine print in the article (or simply forgot to read the article altogether), here's a quote for you with regards to the benchmarks:
"Tests on PCs performed by the PC World Test Center; tests on Apple systems performed by the Macworld Test Center."
Current Athlon64 FX chips are Socket 940. At some point in time in the not-too-distant future they will switch to be Socket 939, but that won't happen until the new year. Most likely this will happen when AMD shrinks their die to a 90nm fab process, probably about 6-9 months from now.
FWIW, interesting little factoid, all the AMD64 chips currently use the same die, even the lowly Athlon64 3200+. AMD just disables a few features and repackages it.
The absolute cheapest Power4 server that you can buy costs $5745, and that's for a single-processor, 1.2GHz Power4. If you want to compare comparable servers, you'll have to wait until IBM releases their servers based on the PowerPC 970 (aka the G5). They aren't available yet, but should be here soon.
If you want to compare the high-end though, you can do that too. I'm very certain that AMD would GLADLY compare a $5745 Opteron server to a $5745 Power4 server any day of the week.
FWIW, check out the SpecWeb scores sometime: a 4-processor Opteron 846 (2.0GHz) server is the fastest of all 4-processor servers out there, just edging out an IBM pSeries 655 with 4 Power4+ 1.7GHz processors. In SpecInt the Opteron easily beats the Power4+, though IBM wins SpecFP by an equally large margin. The 4P Opteron is also faster than the 4P Power4+ 1.7GHz as a Java server according to Spec JBB2000. There the Opteron is second fastest out of all 4P servers, just behind the 4P 1.5GHz Itanium2 from HP.
Long story short, the Opteron HAS been tested against the Power4+, and it holds its own very well.
Intel's ULV Pentium M chip runs at up to 1.0GHz and consumes 7W maximum. Their ULV Mobile Celeron also consumes only 7W when running at up to 800MHz. What's more, I haven't seen any info that says if the '7W' that Transmeta is quoting is its maximum power consumption, a thermal design spec, or its "typical" power consumption. Transmeta has a tendency to only talk about typical power consumption, while the 7W numbers listed above for Intel are the chips' Thermal Design Power (TDP, basically the maximum you'll ever see with the exception of code specifically designed to consume maximum power). TDP is ALWAYS at least a few watts higher than 'typical' power consumption.
AMD may also have a low-voltage AthlonXP-M that is in the same power consumption range, but unfortunately AMD does a piss-poor job of documenting their mobile processors (read: there is absolutely no documentation publicly available). As you mentioned, VIA is also producing chips in the same power consumption range.
Long story short: Transmeta is going to have to either deliver on performance (like Intel does) or on price (like VIA). Right now they are talking about selling the chips for $100 a pop, which is quite a bit more than what VIA sells for. They are also talking about only a 50-80% improvement in performance over the Crusoe, which isn't going to do much for them in the way of performance. At 1.1GHz, they might be competitive with the 800MHz ULV Celeron, but I'm not holding my breath. The Crusoe had terrible performance.
There are about a half-dozen major clusters in the works using AMD's Opteron processor. Cray is building the largest, a 10,000 processor cluster that should hit 40 teraflops. IBM has a 1000+ node (2000 processor) cluster for Japan, and China is building a similar sized cluster. I've also heard some reports here and there about 256 and 512 node clusters being built by other organizations.
In short, the Opteron has been a VERY popular choice for clusters, though I don't think that any of the projects have reached the Top500 list yet.
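For anyone wondering where a number like 40 teraflops comes from, it's just theoretical peak: processors x clock x floating point operations per cycle. Here's a rough sketch. The 2.0GHz clock and the 2 flops per cycle are my assumptions (they aren't stated above for the Cray machine), so treat the result as a sanity check and not a spec.

```python
# Theoretical peak = processors * clock * flops per cycle.
PROCESSORS = 10_000      # from the post above
CLOCK_GHZ = 2.0          # ASSUMPTION: 2.0GHz parts
FLOPS_PER_CYCLE = 2      # ASSUMPTION: one add + one multiply per cycle

peak_gflops_per_cpu = CLOCK_GHZ * FLOPS_PER_CYCLE
peak_tflops = PROCESSORS * peak_gflops_per_cpu / 1000

print(f"Peak per processor: {peak_gflops_per_cpu:.1f} GFLOPS")
print(f"Peak for cluster:   {peak_tflops:.1f} TFLOPS")
```

Under those assumptions the numbers line up with the 40 teraflop figure. Real-world (Linpack) results always come in below theoretical peak.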
Price them out yourself. The G5s had 4GB of memory, a Radeon 9600 Pro video card and the Apple Superdrive. Cost: $5,320.00/node.
Multiply by 1100: $5.85 million
Now, my understanding is that the "standard" educational discount is 11%, which brings the cost down to $5.2 million.
So, with a total system cost of $5.3 million, VT must have got one hell of a deal on the VERY expensive InfiniBand hardware they are using. Actually, if they got that hardware for less than $1.1 million ($1000/node) they got a deal, let alone getting it all for only $100,000.
Either way, you can hardly compare the prices of these two clusters, when UT was quoting the cost to build EVERYTHING, including renovating the building in which the cluster was to be housed.
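Here's the VT arithmetic from above spelled out, in case anyone wants to plug in their own numbers. All the figures come from the post; the 11% discount is only my understanding of the "standard" educational rate, not something VT has confirmed, and the variable names are mine.

```python
# Figures as quoted above; the discount is an unconfirmed assumption.
NODES = 1100
PRICE_PER_NODE = 5320.00     # USD per G5 as configured
EDU_DISCOUNT = 0.11          # ASSUMPTION: "standard" educational discount
REPORTED_TOTAL = 5.3e6       # USD, total system cost quoted for VT

list_total = NODES * PRICE_PER_NODE
discounted_total = list_total * (1 - EDU_DISCOUNT)

# Whatever is left over has to cover the interconnect and everything else.
left_for_network = REPORTED_TOTAL - discounted_total
per_node_network = left_for_network / NODES

print(f"List price, all nodes:    ${list_total:,.0f}")
print(f"After 11% discount:       ${discounted_total:,.0f}")
print(f"Left over for networking: ${left_for_network:,.0f}")
print(f"  ... per node:           ${per_node_network:,.2f}")
```

Without rounding, the leftover comes out a bit under the $100,000 I mentioned, which works out to well under $100/node for the interconnect.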
Here's a few:
- MUCH smaller physically (1U and 2U racks vs. desktops). You can fit the entire 300-node cluster of these things in roughly the same space as 54 of the G5 PowerMacs (assuming a split of about 64 of the 2U racks and the rest 1U racks).
- ECC memory. The Apple does not support ECC memory, so you can't trust your data (and, for the last time, that Deja-vu software is NOT going to make this magically go away!).
- It's cheaper. The Dell cluster cost $3.0 million for the computers, the Apple cluster cost $5.8 million for the computers. The $38 million figure included costs such as renovating the building and all the extras, while the VT cluster was a very bare-bones cost.
- You can use existing clustering software. There is VERY little, if any, software out there designed to support and maintain Mac clusters. VT has been developing most of the software themselves. For PCs there is a lot more software out there already. Of course, if you've got a bunch of grad students to do the work for "free", this isn't such a big problem.
- Redundant, hot-plug power supplies.
- No useless hardware for the cluster. The G5 systems come with high-end ATI Radeon 9600 graphics cards, which are just a complete waste of time, money and electricity (= more money). They also come with Apple Superdrives (DVD writers).
- Faster hard drives (SCSI, potentially up to 15K RPM drives, though I can't find details on what UT bought). Drives can also be hot-swappable.
Of course, there are also some advantages to the PowerMac G5 setup:
- 64-bit capabilities. I don't know how far along their software is in this regard, but it's GOT to be better than the PSE used on PCs. This is the number 1 biggest advantage of the Mac.
- More performance, at least in the theoretical flops. If they can get their software set up at all right, the Mac cluster should perform very well, particularly if they can vectorize stuff (you often can with clusters) and make use of Altivec.
- Decent resale value if they decide to break down the cluster and sell it a few years from now. These are high-end desktop machines that can be taken out of the cluster and sold pretty much as-is. The UT/Dell cluster is a bunch of rack-mounted machines that can't really be sold. That being said, I don't know how big of an advantage this really is; typically you buy a cluster expecting to use it for quite a number of years.
- You've got a virtually limitless group of Mac zealots to do anything and everything you can possibly think of, just so that they can get a look at this magical cluster of Macs that has been blessed by the high-holy Steve Jobs himself!
In short, the UT/Dell setup is a traditional cluster, while the VT/Apple cluster is more of a novelty, but a novelty that shows some good potential if it could be designed properly, i.e. if Apple releases some G5 Xserves. A G5 Xserve could make for some VERY nice clusters.
FWIW the initial setup of the cluster was supposed to use Gigabit Ethernet. Once it was set up and working, they were planning on switching over to InfiniBand. I don't know where VT currently is in the switchover process, but at least at some point in time, the PDF is/was correct.
It's been previously discussed, and their "error correction algorithms" are going to do dick-all for them if they can't trust their data, and without ECC, you can't trust your data.
The first poster is correct: results from this machine cannot be trusted as being 100% accurate. It may be that they can live with the lack of accuracy, but it's definitely something that they will have to figure into their work. With 4.4TB of memory, they are going to have soft memory errors on a VERY regular (daily?) basis, and they can NOT be caught by some sort of software algorithm unless you store every bit of data twice and do every single calculation twice. If you're doing that, you might as well save yourself a few million and buy a 550-node cluster with ECC and get the same result.
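To make the "do everything twice" point concrete, here's a minimal sketch of what software-only checking looks like, plus the arithmetic behind the 4.4TB and 550-node figures. The `checked` helper is something I made up for illustration; it is NOT whatever VT's software actually does, which I haven't seen.

```python
# Cluster-wide memory and the cost of duplicating all work.
NODES = 1100
GB_PER_NODE = 4

total_tb = NODES * GB_PER_NODE / 1000   # memory across the whole cluster
effective_nodes = NODES // 2            # useful nodes if all work runs twice

print(f"Total memory: {total_tb} TB")
print(f"Effective nodes with duplicated work: {effective_nodes}")


def checked(compute, *args):
    """Hypothetical helper: run `compute` twice and compare the results.

    This can only DETECT a mismatch, not say which run was right, and it
    misses corruption in input data that both runs read. In a real setup
    each run would also need its own copy of the data in memory.
    """
    first = compute(*args)
    second = compute(*args)
    if first != second:
        raise RuntimeError("results disagree: possible soft memory error")
    return first


# Usage: any deterministic calculation, at twice the CPU time.
result = checked(sum, [1.5, 2.5, 3.0])
print(f"Checked result: {result}")
```

Even this only tells you that something went wrong, not what the right answer was. ECC corrects single-bit errors in hardware without you having to rerun anything.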