Domain: aceshardware.com
Stories and comments across the archive that link to aceshardware.com.
Comments · 338
-
Re:Raw CPU power
Sun isn't about raw CPU power. For that we have POWER and x86. Sun is about massive scaling. Sure, 1 POWER4 or P4 or Athlon beats an Ultrasparc. And 8 USIIIs lose out to 8 POWER4s or Xeons or Hammer CPUs. But Intel and AMD drop off at about 8P systems (though ItaniumII can handle larger systems, and Opteron can scale past 8P with a HT bridge), and the POWER architecture scales to hundreds of processors. Sun though can pack a thousand chips in a single system image, with plans to scale to 4096 (IIRC) within the next 2 years.
I'm sure Sun would love to have a high-performance CPU to field against massive clusters being deployed for highly parallelizable tasks such as rendering, but the fact is that's not where their strengths lie. Huge tasks which cannot be efficiently split are what Sun is good at, tasks where superb scalability in terms of both CPU power and memory are an absolute must.
For more, read Ace's Hardware's excellent volume multiprocessor articles:
Part 1
Part 2
Part 3 -
Goodbye PHP - Hello Zope
PHP is good for its ubiquity...
Many servers already run it or can be convinced to run it with no problems.
But if you have the opportunity to roll your own server then you really should be using Zope.
Based on Python, Zope's functionality and the elegance of its implementation really blow PHP away.
PHP will never fill the application server market the way that JSP does and that Zope will.
If you want to see why PHP is good for quick and dirty little sites that need to run cheaply on commodity hardware, but why JSP really rocks, then check out the article on aceshardware about the rationale for setting up a Java-based system.
PHP is great but after you use Zope you won't want to go back to PHP ever again. -
Re:Kiss and say goodbye to Java language!!
PHP is good for its ubiquity...
Many servers already run it or can be convinced to run it with no problems.
But if you have the opportunity to roll your own server then you really should be using Zope.
Based on Python, Zope's functionality and the elegance of its implementation really blow PHP away.
PHP will never fill the application server market the way that JSP does and that Zope will.
If you want to see why PHP is good for quick and dirty little sites that need to run cheaply on commodity hardware, but why JSP really rocks, then check out the article on aceshardware about the rationale for setting up a Java-based system.
PHP is great but after you use Zope you won't want to go back to PHP ever again. -
Re:It's a good thing
FWIW, there were news pieces posted on Ace's Hardware and Tom's Hardware that quoted AMD as saying that the Athlon64 is 15% faster clock for clock than the Athlon XP chips. So, yes, there is some potential for a "better computing experience".
-
Re:RDRAM vs. DDR
>On the other hand, interleaved RDRAM has the same peak theoretical bandwidth of interleaved DDR SDRAM one quarter its clock rate because RDRAM chips have one quarter the bus width of SDRAM chips. 800 MHz RDRAM would be the same speed as 200MHz SDRAM assuming both (or neither) are interleaved. Many if not most modern SDRAM controllers support memory interleaving, including my old abit board which isn't even new enough to run an athlon XP.
Theoretical performance doesn't hold up in practical applications. That is why there is SPEC. Besides, RDRAM is designed so that you can easily make it a multi-channel architecture:
Quote: "By moving the core logic to the CPU and thus incorporating the Rambus Memory Controller as a part of the CPU itself, much of the current latency problems plaguing the technology will disappear. Both Sun's upcoming MAJC and the Playstation 2 are examples of embedded solutions with ondie RMCs. Another example is Compaq's upcoming EV7 (Alpha 21364), which also uses 8 channels to support massive bandwidth requirements and to keep latencies down (instead of accessing large volumes of DRDRAM in serial from a single channel, which would increase latencies)."
DDR SDRAM is more complicated to design in a multi-channel layout because of timing and motherboard design complexity issues. Besides, DDR SDRAM can't sustain its peak bandwidth even close to as well as RDRAM. Bursting isn't sustained bandwidth. (Remember, to transfer a byte, you have to transfer a QWORD. To transfer the next byte, you have to wait an entire clock cycle.) Therefore, it only really allows 1/4 the peak bandwidth in this case.
A very old article from Ace's Hardware says: "The Rambus channel sends out the data twice as fast as the SDRAM, but the SDRAM can send out the first 8 byte without waiting, while Rambus has to transfer 16 bytes. As Rambus can send 2 bytes every cycle, it takes 4 cycles of 2.5 ns to transfer 16 bytes or 10 ns."
Of course, this is regarding 800MHz RDRAM, and 1066MHz RDRAM is currently out.
Also, not many applications only send 16 bytes at a time. For random access bursting applications, like servers, this is common. On the other hand, for 3d games, you want a sustained bandwidth to send all that data to the graphics card.
The aforementioned site speaks of examples where DDR SDRAM would be better because of its lower latency. In those cases, though, RDRAM will still be adequate.
Also, I'm curious to see any mentions of RDRAM versus DDR performance on newer chipsets.
> The benchmark you cite shows better performance for DDR SDRAM on intel's solutions than on the athlon system, which leads me to believe that it is possibly cpu-dependent.
Since memory bandwidth depends on FSB, Intel systems should have an advantage.
>The belief that RDRAM is a superior technology to SDRAM is at best a matter of opinion and at worst an absurd myth.
Didn't see you try to refute the fact that unused cores shut down (great for power management).
> RDRAM makes great sense when you're not using very much of it (down in the ~16 MB range...
Comparison: 1GB system. (Picking anything more would severely skew things in RDRAM's favor.)
PC3700 DDR vs PC1066 RDRAM
source of prices: pricewatch.com
Generally RDRAM comes with two channels, while DDR generally comes with one.
PC1066 RDRAM theoretical peak bandwidth (2 channels) = 1066 * 2 * 2 = 4.3 GB/s
PC3700 DDR theoretical peak bandwidth (1 channel) = 466 * 8 = 3.7 GB/s
Price:
DDR (512MB modules * 2) = $145 * 2 = $290
RIMM (256MB modules * 4) = $80 * 4 = $320
Yes, the motherboards for RDRAM based sets are more expensive, but the memory appears to be barely more expensive in this case for more theoretical bandwidth. The RDRAM 512MB is more expensive than DDR 512MB, but since you usually find RDRAM in higher-channel configurations, this isn't an issue. Comparing prices is difficult since you have to pair RDRAM up.
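For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the peak-bandwidth and price figures above. It assumes peak bandwidth is simply transfer rate times bytes per transfer times channel count, with a 2-byte RDRAM channel and an 8-byte DDR channel. The function name is made up for illustration, and the prices are the ones quoted in the comment.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bytes_per_transfer, channels=1):
    """Theoretical peak in MB/s: transfers/s * bytes per transfer * channels."""
    return mega_transfers_per_s * bytes_per_transfer * channels


# PC1066 RDRAM: 1066 MT/s, 16-bit (2-byte) channel, two channels.
rdram_mb_s = peak_bandwidth_mb_s(1066, 2, channels=2)
# PC3700 DDR: 466 MT/s, 64-bit (8-byte) channel, one channel.
ddr_mb_s = peak_bandwidth_mb_s(466, 8, channels=1)

# Prices for a 1 GB system, as quoted in the comment.
ddr_price = 145 * 2    # two 512MB DDR modules
rdram_price = 80 * 4   # four 256MB RIMMs

print(f"RDRAM: {rdram_mb_s} MB/s peak for ${rdram_price}")
print(f"DDR:   {ddr_mb_s} MB/s peak for ${ddr_price}")
```

These are theoretical peaks only; as the comment itself argues, sustained bandwidth and latency are a separate question.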
-
Here are some major differences
Charts showing the differences between apache 1.x and 2.x.
Actually a great article as a whole -
Re:different meanings of "dynamic" pages
They do change the content, and they do explain how.
Read this page in the article.
Specifically:
The real beauty of these types of caches is that they maximize performance while sacrificing none of the site's dynamic nature. More traditional caching solutions implemented by HTTP servers might cache the full output of a given request, but this approach would be impractical where we need to customize the output to the user's specifications. -
Re:Who cares how long it takes for static pages?
Obviously, you didn't read the article.
Try reading this page -
Moral of the story
If you have a really really boring story, it doesn't even matter if you put it on slashdot; your webserver still won't die.
-
About dual systems...
I saw this comparison of dual Apple G4 1.25GHz, AMD MP 2200+, Intel P4 Xeon 2400MHz and several single processor systems today.
You do the math. Go to the 3rd page if you are impatient. -
hmm..
-
Re: Java can be faster than C++
Native binaries are several orders of magnitude faster than Java's interpreted bytecode. It's a pure fact. Anything else is FUD.
Heh. Non-intuitive, ain't it?
The fact is that C++, in particular, has 'features' that prevent certain optimizations. Java, due to stricter specification, has some advantages. Whether or not Java will ever be faster than optimized FORTRAN is a different question, but largely moot since very little non-scientific software is developed in FORTRAN (or hand optimized assembly, the other performance poster boy).
Anyway, I'm working on a magazine article regarding my benchmarks so I can't release them yet. However, for a much earlier article that shows great results with last generation VMs, check out Binaries vs. Bytecodes. The 1.4 VMs are substantially faster than the 1.3 versions he used in that article, while the C++ compilers have made little or no progress in the same time period. Cool, eh?
;-)
Source is provided with that article, so you can test it with current compilers and VMs.
So, anyhow, before you do any more spouting about "several orders of magnitude faster than Java" you'd better run your own benchmarks. You're in for a surprise.
-
Not Me
I read several reviews, most notable among them are here and here. Although the technology seems compelling when looking forward a few years, its infancy just doesn't sell me the product, especially when I consider that a dual Athlon MP 2000 (1.6GHz) is respectably close to the $700 PIV 3.06GHz with HT, and costs a LOT less.
3.06GHz PIV + motherboard + 512MB DDR RAM = $1025
2 Athlon MP 2000 + motherboard + 512MB DDR RAM = $695....for 80-90% of the performance of the HT PIV?
Sorry, but I can get the basics for an SMP system for $5 less than Intel wants for its new flagship CPU.
Now, if I could get 2 PIV 2.4 GHz CPUS with HT, that might be a different story...
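To put the price comparison above in performance-per-dollar terms, here is a rough Python sketch. It takes the two system prices and the 80-90% performance guess straight from the comment; the function name is hypothetical, and the result is only as good as that performance estimate.

```python
def relative_perf_per_dollar(perf_fraction, price, baseline_price):
    """Performance per dollar relative to the baseline system (baseline = 1.0).

    perf_fraction is the system's performance as a fraction of the baseline.
    """
    return perf_fraction * baseline_price / price


p4_system_price = 1025      # 3.06GHz PIV + motherboard + 512MB DDR
dual_mp_system_price = 695  # 2 Athlon MP 2000 + motherboard + 512MB DDR

for perf in (0.80, 0.90):
    ratio = relative_perf_per_dollar(perf, dual_mp_system_price, p4_system_price)
    print(f"{perf:.0%} of the P4's performance -> {ratio:.2f}x performance per dollar")
```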
-
Re:Smokin!
-
Actually, the hyperthreading only helps in apps that support hyperthreading.
-
-
Heat output. OMG.
Tom's Hardware already has a review up about it, and it looks to live up to most of the hype.
Right. And for readers that want a review by people that actually know what they are talking about, you can read the review at Ace's Hardware.
In other news, the P4 @ 3.06GHz is indeed a fast CPU, but considering that its maximum power dissipation is 105W to the Athlon 2800+'s 68W, it looks like people should stop making fun of the Athlon for running so hot. :)
This comparison isn't completely fair (the Pentium IV is faster), but even the P4 2.2GHz spews 70 W of heat.
At 105W, the P4 is approaching the (in)famous heat output of the Intel Itanium! This is not a good thing.
(note: regarding Tom's Hardware, I have no specific complaint about the article, just the website quality in general. The reviewers, except for Tom, have no clue and generally spew pure uninformed BS throughout their articles. Why the site is still respected is a complete mystery to me.) -
Maya 4 and AMD/Intel systems
It's a bit dated, but here is an interesting article from Ace's Hardware describing performance on AMD/Intel systems for comparison:
Maya 4 and SSE-2 optimisations
AMD also makes a comparison here, but Intel's benchmarks didn't include Maya. -
How Does Increasing FSB affect Performance?
More From Ace's:
Athlon XP 2800+: 333 MHz FSB and nForce 2
First of all, we tested the Athlon XP 2800+ on the "normal" KT333 platform with a 17x multiplier, the FSB set at 133 MHz DDR (266 MHz) and the memory set at 166 MHz DDR (333 MHz), CAS at 2, RAS to CAS at 3, Precharge at 3. The second time, the KT333 platform (ASUS A7V333) was set at a FSB of 166 MHz (333 MHz) and the multiplier was set to 13.5x.
...Where do I start? There is an enormous amount of info hidden in this table. Let us first start with the 266 MHz versus 333 MHz FSB discussion.
There have been many reports that show that the Athlon does not benefit much from an increase in FSB clockspeed, moving from 266 MHz to 333 MHz. But Membench tells us exactly why. First of all, compare the two KT333 latency numbers (64 byte strides). All BIOS settings were exactly the same, only the FSB speed, and thus the multiplier, are different. Normally one would expect, everything else being equal, that the Athlon with the 166 MHz FSB would see 25% lower latency, but the CPU with the 166 MHz FSB version actually sees a higher latency! This shows that the (ASUS) KT333 board, in order to guarantee proper stability, increases certain latencies of the memory controller. Memory bandwidth increases by 14%, which is also less than expected.
Now what does this mean for "real world" performance? It means that many applications will see either a very small performance increase or none at all, as it is latency and not bandwidth that is the most important performance factor. Let us explain this in more detail.
-
Calculating Latency
From:
Ace's Guide to Memory Technology
Basically, the latency of the whole memory (From FSB to DRAM) system is equal to the sum of:
- The latency between the FSB and the chipset (+/- 1 clockcycle)
- The latency between the chipset and the DRAM (+/- 1 clockcycle)
- The RAS to CAS latency (2-3 clocks, charging the right row)
- The CAS latency (2-3 clocks, getting the right column)
- 1 cycle to transfer the data.
- The latency to get this data back from the DRAM output buffer to the CPU (via the chipset) (+/- 2 clockcycles)
If you want to calculate the latency that CPU sees, you need to multiply the latency of the memory system with the multiplier of the CPU. So a 500 MHz (5 x 100 MHz) CPU will see 5 x 9 cycles latency. This CPU will have to wait at least 45 cycles before the information that could not be found in the L2-cache will be available in the cache.
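The sum above is easy to check with a few lines of Python. This is only a sketch of the quoted rule of thumb: the stage names are made up, and the cycle counts use the low end of each range from the list (2 clocks for RAS to CAS and for CAS), which is what gives the 9 bus cycles used in the example.

```python
# Bus-clock cycles per stage, taking the low end of each quoted range.
MEMORY_STAGES = {
    "fsb_to_chipset": 1,
    "chipset_to_dram": 1,
    "ras_to_cas": 2,
    "cas": 2,
    "data_transfer": 1,
    "return_to_cpu": 2,
}


def memory_latency_bus_cycles(stages=MEMORY_STAGES):
    """Total memory latency in bus clock cycles."""
    return sum(stages.values())


def memory_latency_cpu_cycles(multiplier, stages=MEMORY_STAGES):
    """Latency as seen by the CPU: bus cycles times the CPU multiplier."""
    return multiplier * memory_latency_bus_cycles(stages)


print(memory_latency_bus_cycles())   # 9 bus cycles
print(memory_latency_cpu_cycles(5))  # 45 CPU cycles for a 5 x 100 MHz CPU
```

With slower timings (3 clocks each for RAS to CAS and CAS) the same function gives 11 bus cycles, so the 45-cycle figure is a best case.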
-
Re:Great....
- Sorry, but your post reeks of "armchair CPU designer" : It's all so clear and so obvious. I mean, it's not like Intel and AMD have a lot of extremely clever people who seek the best balance between all of the systems...is it?
Yes, they are slowly improving, but modern PCs are still behind where workstations were years ago, and a modern Intel based server is well behind a SPARC based machine.
-
More reviews
Even more than from my post in the last story...
- [H]ard|OCP Intel Pentium 4 @ 2.80GHz : Intel is breaking out the big guns with their sights set directly on the competition. Will the 2.80GHz Northwood be enough for Intel to hold onto the performance crown?
- Anandtech Intel's Pentium 4 2.80GHz - Moving to the Head of the Class
- Tom's Hardware Speed Isn't Everything: P4/2800 Meets Athlon XP 2600+
- Ace's Hardware Faster Still: The 2.8 GHz Pentium 4
- FiringSquad Intel Pentium 4 2.8GHz Review
- Hexus.net Intel Pentium 4 2.8GHz Review
- SimHQ.com Intel "Northwood" 2.80GHz Pentium 4 Processor using .13 Technology
- Tech Report Intel's Pentium 4 2.8GHz processor - Two billion eight-hundred thousand hertz
- Hot Hardware The Pentium 4 2.8GHz Processor - Intel ups the anti once again
- xbit labs Intel Pentium 4 2.8GHz against Athlon XP 2600+
- VR Zone Intel Fastest Pentium 4 2.8Ghz Review
- HardcoreWare A Thorn in AMD's Hide
- Lost Circuits Pentium4 2.8 GHz - Another Hit And Run
-
More reviews
How does Slashdot decide which of these hard-working sites gets loads of free traffic?
- [H]ard|OCP Intel Pentium 4 @ 2.80GHz : Intel is breaking out the big guns with their sights set directly on the competition. Will the 2.80GHz Northwood be enough for Intel to hold onto the performance crown?
- Anandtech Intel's Pentium 4 2.80GHz - Moving to the Head of the Class
- Tom's Hardware Speed Isn't Everything: P4/2800 Meets Athlon XP 2600+
- Ace's Hardware Faster Still: The 2.8 GHz Pentium 4
- FiringSquad Intel Pentium 4 2.8GHz Review
- Hexus.net Intel Pentium 4 2.8GHz Review
- SimHQ.com Intel "Northwood" 2.80GHz Pentium 4 Processor using .13 Technology
- Tech Report Intel's Pentium 4 2.8GHz processor - Two billion eight-hundred thousand hertz
- Hot Hardware The Pentium 4 2.8GHz Processor - Intel ups the anti once again
- xbit labs Intel Pentium 4 2.8GHz against Athlon XP 2600+
-
Good benchmarks
Good article with benchmarks over at Aces Hardware
What I like is how the AMD 2600+ is very close on most games, either 1-2FPS behind or ahead, and the 2800+ isn't out yet. Go AMD! P4 2.8 $570 or AMD 2600+ $265
-
Rumors, mythos, FAQs
Myth/rumor: The Athlon XP is a furnace of unimaginable heat! I'm getting a Pentium IV! Even though they are slower and more expensive, at least they won't dim the lights then melt them!
The fastest Athlon XP chips dissipate less than 5% more heat than the fastest Pentium IV chips. They can, however, handle more heat before cooking.
Myth/rumor: Tom's Hardware guide is "more objective" or even "Tom's Hardware guide is reliable"
I can't believe I read this, even in a Slashdot comment.
Tom's Hardware Guide is infamous among forums such as those at StorageReview.com and among people that actually know what they are talking about for being little more than a hardware review tabloid. Read the reviews! They come to illogical conclusions and sensationalize most of their reviews.
Read the Athlon review in question:
This is AMD's admission that the previous performance scale was set too high, especially when it came to the higher clock speeds.
Umm... Could it be that because the CPU is advancing where the other components such as memory and FSB are not, that it is possible that AMD added another 66MHz to make sure the rating system was still accurate? It isn't like system performance scales linearly with CPU speed when everything else sits still. Whoever thinks that Tom's Hardware is a good place to get hardware reviews doesn't have a clue about hardware!
Read Tom's glorious review of the KT266a vs the Nforce where despite there being less than a 5% difference between the chipsets and despite the Nforce outperforming every one of the many KT266a that outnumber it greatly in some tests, their "conclusion" was Conclusion: KT266A Trounces nForce 420D - Soltek is Front-runner
Tom's has had some good reviews, and most of the reviews BY TOM HIMSELF are pretty good, but most of the reviews are from his editors, and the proof is in the reviews--they are making Tom's Hardware more of a tabloid than a legit hardware review site, riding on the reputation that Tom made for the site years ago. I know, I was once an avid Tom's reader and am disgusted how the once clear and thoughtful reviews have turned into manic drivel.
If you want reviews that are actually well thought out, intelligent, and have sane conclusions based on mere facts, try Ace's Hardware, Ars Technica, and Anandtech.
Ace's Hardware reviews are clearly the best and most researched, but they are few and far between. Want an excellent review of current and future memory technologies written with the help of actual engineers? Read Ace's Hardware.
Ars rarely has hardware reviews, but when they do the reviews are good.
Anandtech is a good all-around major review site that as far as I can tell has never been biased, but is a little bit too PC for me. (that's Politically Correct, not the other one)
Is Tom's biased? Read the reviews! They aren't biased in a classic sense as far as I can tell, that is, they don't "always favor Intel" or "Always favor AMD"; rather they are often biased against one or the other. They will post stories that are clearly opinionated bullshit from ignorant tech writers that tend to have a bias against one or the other. This is a mystery to me as they surely piss off both AMD and Intel all the time, and don't make any friends in the process. Overall, I wouldn't say that bias is a big problem at Tom's Hardware as much as stupid technical writers that don't know what they are talking about is a problem.
Want more examples? Point me to a review at Tom's and I'll tell you what's wrong with it (if there is anything wrong with that particular one)
At Tom's--read the reviews by Tom, but everyone else is not trustworthy.
Myth/rumor:
When you hold a seashell up to your ear, you can hear the sea.
Fact: You can hear the same sound reflections by holding a drinking cup up to your ear. It has nothing to do with the ocean. The question is, if you hold a Unix shell up to your ear, can you hear the C? -
several more 2600+ reviews
There are several more 2600+ reviews, and these are much better too.
AMDZone.com
Hot Hardware
Tech-Report
Overclockers.com.au
Ace's Hardware
Firing Squad
Hexus
xbit
Anandtech
Van's Hardware
VIA Hardware
The Inquirer -
Re:Alternative reviews...
Not to mention Aces Hardware
-
Re:AMD releases the 2400+ and 2600+ Athlons tommor
There's already a review on Ace's Hardware which concludes that the Athlon 2600+ has again leapfrogged the fastest Intel CPU. Of course, when Intel releases the 2.8GHz P4 next Monday, it will yet again leap over AMD.
Ad infinitum.
-
Re:Perhaps not
In-memory indexes for search engines, databases etc. Servers need lots of memory anyway, but by keeping data normally stored on disk in memory you can boost your speed by up to a hundredfold. As for large memory for graphics or CAD work, yeah maybe it isn't needed yet, but see here -
Re:Why go from 32 to 64? Why not jump to 128?
Er.. I thought Solaris (SPARC edition) was 64-bit ever since Solaris 7? Sure, maybe not all of the userland and other applications have been ported yet, but even Solaris 9 has the option to install either the 64-bit version or the 32-bit version.
A lot of the applications need the space, databases, 3D modelling, data processing outside of databases, GIS, etc. The 4GB addressing barrier of 32-bit (sans PAE and virtual addressing of up to ~ 16-64GB) is rather limiting. Check out some of the conversations brewing at Ace's Hardware and you'll see some of the discussions. x86 PAE sucks because the performance of shifting the window is much slower than regular memory operations (but still much faster than swapping out to an already busy I/O subsystem -> hard drives).
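The address-space limits mentioned above fall straight out of the address width. A small Python sketch follows; it assumes byte-addressable memory, and it uses 36 bits for PAE, which is the physical address width x86 PAE exposes. The helper name is made up.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits):
    """Size of a byte-addressable space with the given address width, in GiB."""
    return 2 ** address_bits // GIB


for label, bits in (("32-bit", 32), ("36-bit (PAE physical)", 36), ("64-bit", 64)):
    print(f"{label}: {addressable_gib(bits)} GiB")
```

Note that PAE only widens physical addressing; each process still sees a 32-bit virtual space, which is why the windowing cost described above comes into play.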
-
Ace's Hardware also has a preview.
Ace's Hardware also has a short but very informative article about the NV30.
-
Re:combine clocked/-less sections on same chip?
I believe the Pentium 4 processor does contain a couple of clockless/async components within the same die as the rest of the processor.
There is a thread going on Ace's Hardware, discussing the same article as well as other articles and references to clock-less computing.
-
Tom's? "thorough"?
"...Tom's is usually rather thorough."
So you say, but I certainly haven't seen any evidence of this, not in the last 3 years.
Before then, THG was one of the better sites on the web (that I knew about at least). Now I will only go there if I'm really bored or looking for a laugh. www.tech-report.com, www.aceshardware.com or www.realworldtech.com are SO much more informed.
-
Additional Athlon XP 2200+ Reviews and Info
Here are more reviews to check out, guys.
AMDZone.com
Technoa.co.kr
Hardinfo.dk
Active Hardware
Ace's Hardware
Lost Circuits
Anandtech
Hexus
VIAHardware
Racksaver also announced a blade server using 132 2200+s in a 7 foot cabinet!
-
More reviews
review at Ace's hardware
Much info about upgrading older boards to the new AMD.
At least here the reviewer makes sure that both CPUs work with the same memory.
Tom's gives the P4 PC1066, while 95% of the P4 systems are sold with DDR.
-
Re:arrg stop with the quake already
Right. Because the guys at Ace's only know about, and use, Quake benchmarking. Oh yes. If you were to show them, for instance, some SPECmarks, they wouldn't understand anything. So, I wonder who built this and stuck it on their site?! What a hack! Perhaps these benchmarks, which do not originate with Ace's, are from Quake because that was what was available to run? It's not as if the Hammer is out in all that many reviewers' hands, yet...
-
Re:Not that bad. Their CPUs, on the other hand...
Not that it's definitive, but Ace's Hardware has a "SPECmine" page that lets you search known SPEC ratings for various processors. On SPECfp2000, the results are:
- 1050 MHz UltraSPARC-III Cu: 827 Peak, 701 Base
- 1733 MHz Athlon XP: 660 Peak, 613 Base
- 900 MHz UltraSPARC-III: 482 Peak, 427 Base
So the high-end UltraSPARC outperforms the Athlon by a healthy margin. (I mentioned in my earlier post the 900 MHz UltraSPARC-III Cu, but the SPECmine doesn't have results for that exact processor. I'd expect it to perform at about 90% of the 1050 MHz version).
You can use the SPECmine to find the SPECInt results, and the Athlon does in fact beat the UltraSPARC (749 v. 610). So you're paying for floating point performance on the Sun part, but you actually pay an integer performance penalty.
In real life, the Blade feels like a REALLY fast system, in spite of the SPECInt numbers. Perhaps that massive 8MB cache doesn't help the SPECInt numbers, but pays off in day-to-day tasks? I can't explain it, and maybe it's just "This machine cost $20K+, it must be fast", but I'd definitely prefer the Blade to my current Athlon home system... If cost were no object.
On the other hand, cost is an object, which is why my current home system IS an Athlon. But don't knock the Blade system; it's outrageously priced, but it's one boss machine.
-
Re:The bit stuff, explain to a layman. TIA
"64-bit processors tend to be slower than there 32-bit counterparts"
Depends on the architecture. If the instruction sizes are the same, and the 64-bit chip can also run 32-bit code, then clearly the 64-bit one will be faster.
Best guesses so far reckon that x86-64 code should be about 15% faster than x86 code, mostly due to the doubling of the number of registers from 8 int and 8 fp (SSE) to 16 int and 16 fp. This is in addition to an estimated 25% gain in speed over the Athlon at the same clock speed, plus however many more GHz AMD can squeeze out of the new core. -
Re:Java
You would not use Perl on the same types of applications you'd use Java on (or for that matter C++). I think Java's best space is in enterprise-level applications. I dont see Perl being all that useful there nor C++ and with Java its much easier to have stuff co-exist on different platforms. You definitely get trade-offs for that.
Correct. Persistent objects (no process fork()) make scalability under Java much better, as you can use one shared object as, for example, a cache. This has been covered by an excellent article before here
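As a rough illustration of the shared-object idea, here is a minimal sketch in Python rather than Java (so it is an analogy, not the setup the article describes). One long-lived process keeps a single cache object that every worker thread shares, so an expensive result is built once instead of once per forked process. The class name, the `render_page` loader, and the key are all hypothetical.

```python
import threading


class SharedCache:
    """One in-process cache shared by all worker threads."""

    def __init__(self, loader):
        self._loader = loader
        self._data = {}
        self._lock = threading.Lock()
        self.loads = 0  # how many times the loader actually ran

    def get(self, key):
        # Holding the lock while loading keeps the sketch simple and
        # guarantees each key is built at most once.
        with self._lock:
            if key not in self._data:
                self._data[key] = self._loader(key)
                self.loads += 1
            return self._data[key]


def render_page(key):
    """Stand-in for an expensive page or fragment build."""
    return f"<html>{key}</html>"


cache = SharedCache(render_page)
workers = [threading.Thread(target=cache.get, args=("front-page",)) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(cache.loads)  # 1: eight requests, one build
```

With a fork-per-request model each process would start with its own empty cache, which is the scalability cost the comment is pointing at.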
-
SPARC a faster CPU? I don't think so.
As for raw compute performance, if you believe Sun's SPEC ratings from their product site, a 1.05GHz SPARC CPU is only just lagging behind an Intel 2.2GHz PIV on integer performance and beating it on FP.
Where do they claim that? According to the SPECcpu website, a 1.05 GHz SPARC III Cu gets 537 base SPECint and 701 SPECfp, while a 2.2 GHz P4 easily beats it with 790 SPECint and 779 SPECfp.
Intel is way ahead in integer, and although the Sun catches up somewhat in FP, if you look at the individual results, it's entirely due to one massive spike on the art test. They recently figured out a (controversial) compiler trick that gave them nearly an order of magnitude increase on that one SPECfp test, and doubled their overall SPECfp score. Sun are known for their stability & scalability, but not their CPU speed.
Of course, if you have 106 of the things, that's different. But you'll be paying over US$4M for it, which isn't exactly workstation class anymore.
-
Direct link
Direct link to the post as a stand-alone page.
-
Re:*cough* PowerPC *cough*
When you design a processor, you look at what kind of job it'll be doing and optimize it for the most frequently used instructions. Double precision float are very far from being a priority. What kind of data range from 10E-1024 to 10E1023 ? The forces applied on the different areas of a space shuttle entering the atmosphere ? A simulation of a nuclear explosion ? The 10E-128 to 10E127 of the single precision is more than enough for most of the situations. The G4 floating point units support double precision but not altivec.
Yes, most consumer applications use single-precision floats. However, most HPC code uses doubles. The original poster was positing that the G4 would be a good replacement to all the 64-bit CPUs that are getting pushed out by Itanium because of its fp number crunching abilities (i.e. for HPC workloads). This is wrong in almost every way possible:
1) the G4 doesn't have the SISD execution resources necessary, because the G4's fp units are underpowered
2) the G4 doesn't have the SIMD execution resources necessary (even if all HPC code would magically be vectorized), because Altivec doesn't do doubles
3) the G4 (as it appears in Macs) doesn't have the DRAM bandwidth necessary
Indeed, the current top of the line, a dual 1GHz PowerMac G4, would be about the worst possible choice for replacing big iron HPC machines, even if a strong FORTRAN compiler existed.
My points were all in reference to this proposed use for the G4. But yes, you're quite right that the G4 isn't nearly as inferior when it comes to desktop workloads.
Of course, your assertion that "when you design a processor, you look at what kind of job it'll be doing and optimize it for the most frequently used instructions" is both extremely true and extremely ironic, because the G4 is much more widely used as a signal processor in various embedded systems than as a general-performance desktop CPU. This is why it has comparatively meager OoO abilities and why it allowed itself to be saddled with an overpowered vector unit which gates clock rampability, while leaving its SISD execution units relatively underpowered. OTOH, I'm fairly sure the G4 has been equipped with DDR in its embedded incarnations; certainly that particular fault can't be blamed on the chip's intended design.
On any system, memory is a bottleneck, but the problem is the DRAM chips, not the bus.
Huh?? Um...try sticking some faster DRAM in a PowerMac (e.g. PC2100, PC2700, RDRAM, etc) and tell me how that helps!
As long as the bus supports what the DRAM chips can spit out.
Which the current G4 bus cannot. The difference, so far as I can tell, is semantic. (It's also wrong; DRAM chips can be made to run at fantastic speeds for reasonable prices: witness current high-end 3D cards, with DRAM bandwidth of 7, 8, and in the case of the newest GeForce4s, >10 GB/s! The problem is coming up with a bus which can handle all that throughput in the much noisier and more complex environment of a motherboard with socketed DRAM, as opposed to a small card with soldered DRAM.)
3D games send lists of polygons to the memory card. Even if we imagine a real crappy game using absolutely no acceleration, 30fps*1024*768*3 = 67 MB/S even if we double it for memory reads and add a lot of disk and network use, we're still far from 1GB/S.
This is quite incorrect. First of all, what you're imagining is not a "crappy game", and not even a non-interactive rendering, but a non-interactive video playing in 24-bit color. If we want to turn the calculation into how much polygon data gets sent to the video card, we'll replace the number of pixels (i.e. 1024*768) by the number of polygons in the scene (say, 100,000), and the number of bytes per pixel (3) by the number of bytes per polygon (three 32-bit spatial coordinates for each of the 3 vertices, plus three more for the normal = 48 bytes; note that this doesn't include other information which needs to be sent along with the polygon, e.g. pointers to its textures, etc.). So instead we've got 30fps*100,000*48 bytes = 144MB/s. Of course, 30fps looks pretty shitty, so for decent immersion we're actually looking at 60fps, or 288MB/s. And, again, this leaves out all the other polygonal data besides the vertices and normal.
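The arithmetic above is easy to reproduce. A quick Python sketch, using only the figures already given in this exchange (the function and constant names are invented for illustration):

```python
def stream_rate_mb_s(fps, items_per_frame, bytes_per_item):
    """Data rate in MB/s (1 MB = 10**6 bytes) for a per-frame stream of fixed-size items."""
    return fps * items_per_frame * bytes_per_item / 1e6

# 3 vertices plus 1 normal, each with 3 coordinates of 4 bytes (32 bits).
BYTES_PER_POLYGON = (3 + 1) * 3 * 4

# The parent's case: every pixel of a 1024x768, 24-bit frame, 30 times a second.
video_rate = stream_rate_mb_s(30, 1024 * 768, 3)

# The polygon case: 100,000 polygons per frame at 30 and 60 fps.
polygon_rate_30 = stream_rate_mb_s(30, 100_000, BYTES_PER_POLYGON)
polygon_rate_60 = stream_rate_mb_s(60, 100_000, BYTES_PER_POLYGON)

print(f"frame-buffer stream: {video_rate:.1f} MB/s")
print(f"polygons at 30 fps:  {polygon_rate_30:.1f} MB/s")
print(f"polygons at 60 fps:  {polygon_rate_60:.1f} MB/s")
```

The frame-buffer case comes to about 70.8 MB/s in decimal megabytes, which is the parent's 67 MB/s if a megabyte is counted as 2^20 bytes.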
And, of course, we're just talking about data that comes from the CPU and is sent over the AGP bus to the graphics card; in a real game situation the DRAM most certainly does not store the predetermined locations of every polygon in the game world (how could it know?)! Um, so this has very little in fact to do with what we're talking about.
I can't provide solid approximations of how much DRAM traffic a real 3D engine actually generates, except by pointing out that in non-graphics-card-limited situations, a high-end K7 can easily gain 20-30% in fps by replacing a good PC133 chipset with a good PC2100 chipset. For example, see the last benchmark on this page. The KT266A beats the KT133A by 27%, and the nForce increases that to 32%, albeit with an ill-utilized-but-still-there dual-channel configuration. This is direct real-world proof that moving from PC133 to PC2100 can gain you ~30% in very common consumer desktop applications. (In fairness, the Serious Sam engine is known to rely particularly heavily on DRAM throughput where, for example, the Q3 engine seems to care more about throughput and latency on the cache level. Still, 20-30% is a pretty fair estimate overall.)
A large register set allows you to work with your data without having to read and write to memory all the time, which is the biggest time waster on any modern system.
Having 32 GPRs (PPC) instead of 8 (x86) means you need to use an astonishing 24x4=96 bytes of cache to make up the difference. Most modern processors have L1 caches of slightly more than 96 bytes in size. Having more GPRs is good, but it has nothing to do with saving memory traffic on any higher levels of the memory hierarchy than L1-to-register and back.
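To put the 96 bytes in proportion, here is a small Python sketch. The L1 data cache sizes are my own recollection of parts from that era, not figures from this thread, so treat them as assumptions.

```python
BYTES_PER_GPR = 4  # 32-bit general-purpose registers

# Extra register storage PPC has over x86: (32 - 8) registers of 4 bytes each.
extra_register_bytes = (32 - 8) * BYTES_PER_GPR

# ASSUMPTION: approximate L1 data cache sizes, quoted from memory for illustration only.
l1_data_cache_bytes = {
    "PowerPC G4 (7450)": 32 * 1024,
    "Athlon": 64 * 1024,
    "Pentium 4": 8 * 1024,
}

# Fraction of each L1 data cache that the extra registers would occupy.
share_of_l1 = {
    name: extra_register_bytes / size for name, size in l1_data_cache_bytes.items()
}

for name, share in share_of_l1.items():
    print(f"{name}: {extra_register_bytes} bytes is {share:.2%} of L1 data cache")
```

Even against the smallest cache assumed here, the difference is on the order of one percent of L1.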
Like you said, most of the accesses on an Intel will be handled by L1, but you still have to calculate the physical address from the logical address in the instruction, which means page table lookups (i.e. other memory accesses) and the arithmetic operations associated.
Wrong. Try looking up how a cache works, how memory pages work, how virtual memory works, etc. (Short version: believe it or not, they've solved that problem. Recent logical->physical translations are cached in the TLB, so the page tables only need to be walked on a TLB miss. OTOH, there is still work associated with calculating the address, but this is why all modern CPUs have separate execution units for computing memory addresses without clogging up the ALUs.)
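A toy model may make the point concrete. This Python sketch is purely illustrative: the class name `ToyMMU`, the 4 KiB page size, and the unbounded dictionary standing in for the TLB are all assumptions, and real hardware is considerably more involved. It only shows that repeated accesses to the same page reuse a cached translation instead of going back to the page table.

```python
PAGE_SIZE = 4096  # ASSUMPTION: 4 KiB pages

class ToyMMU:
    """Toy address translation: a TLB (dict) sitting in front of a page table (dict)."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # cached translations (unbounded here, for simplicity)
        self.walks = 0                # number of page-table lookups performed

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:
            # TLB miss: consult the page table once and cache the result.
            # A missing entry (KeyError) would correspond to a page fault.
            self.walks += 1
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn] * PAGE_SIZE + offset

mmu = ToyMMU({0: 7, 1: 3})
addresses = [0, 4, 8, 4096, 4100, 12]
physical = [mmu.translate(a) for a in addresses]

print(physical)
print(f"{len(addresses)} accesses, {mmu.walks} page-table lookups")
```

Six accesses touch two pages, so the page table is consulted only twice; the other four translations come straight from the cached entries.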
OTOH, the effect I pointed out--that RISC code is more bloated than CISC code--*does* have an impact on the level of DRAM-to-CPU throughput, albeit, like I said, usually only in particularly ugly integer code.
The algorithms for most heavy fp number-crunching, in contrast, are usually pretty good, and the code pretty tight. But the datasets are often too large to fit in even a 2 MB cache, and the nature of the calculation is often such that it is gated by memory throughput. There's just no way to run these sorts of calculations on a computer with a 1 GB/s bus and have it be anything but slow. And this is without even considering the folly of having a 1 GB/s bus that's supposed to keep 2 CPUs fed with data from DRAM *and* carry all the messages passed from one CPU to the other. (e.g. anytime CPU 1 wants data that's in that nice big 2 MB L3 cache of CPU 2...) -
Re:doesn't it depend...
now that is interesting... but based on the performance of the nForce (where the memory bandwidth is twice the fsb, but performance wasn't increased that much) how much better do you think the grand champion is going to be unless they pump up the fsb again...
the new p4 is going to be 133x4=533
dual ddr, assuming it's 333, would be 166x2x2=666
assuming each channel is the same width (i can't remember right now) would that extra 133mhz x 64bit make that much of a difference? might be interesting for overclocking though... according to Ace's the nForce has really good overclockability. i wonder if it's because of the extra memory bandwidth headroom... -
Re:Rambus as a company
How is the parent post +4?
He links to an obsolete article from Q3 2000 about RDRAM on the Pentium III...
He talks about the "insanely high latency", and it's pretty obvious he's exaggerating slightly.
RDRAM's latency, particularly with the upcoming PC1066, is far better than people give it credit for. See this AcesHardware article.
PC1066 RDRAM latency for 128 bytes: 207 cycles
PC800 RDRAM latency for 128 bytes: 247 cycles
PC133 SDRAM latency for 128 bytes: 229 cycles
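Put in relative terms, with PC133 SDRAM as the baseline, the three figures above work out as follows. This is just a quick Python sketch over the cycle counts quoted here, nothing more:

```python
# Latency for a 128-byte fetch, in cycles, as quoted above.
latency_cycles = {
    "PC1066 RDRAM": 207,
    "PC800 RDRAM": 247,
    "PC133 SDRAM": 229,
}

baseline = latency_cycles["PC133 SDRAM"]

# Percentage difference versus PC133 SDRAM (negative means fewer cycles, i.e. lower latency).
relative_pct = {
    name: (cycles - baseline) / baseline * 100
    for name, cycles in latency_cycles.items()
}

for name, pct in relative_pct.items():
    print(f"{name}: {latency_cycles[name]} cycles ({pct:+.1f}% vs PC133 SDRAM)")
```

By these numbers PC1066 needs roughly a tenth fewer cycles than PC133, while PC800 needs somewhat more.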
Slashdot moderators: Would it kill you to check the links before going points-crazy? -
Die Photo and Size
Ace's Hardware has this bit with more information including links to an Intel presentation.
"Slide 22 of the presentation features a die photo of McKinley. The large 3 MB L3 cache is notable, and according to the presentation, it consumes 20% less area than traditional designs and is overall 85% efficient (~70% for traditional designs)."
And here's a story with the photo from that same article (no need to download 2.5 meg pdf...)
-Russ -
What Was Tom Smoking?
That article always left a funny taste in my mouth.
Why was he comparing next-gen DDR (DDR333), which isn't officially out yet, to the OLD PC800 RDRAM? Wouldn't it make more sense to compare PC1066 RDRAM (see the AcesHardware benchmarks)?
PC1066 RDRAM and DDR333 will both get official chipset support around the same time.
In other words, next-gen DDR performance for the P4 is about 1.5 years behind the RDRAM performance. Tom didn't mention that part...
In other news, Samsung is sampling PC1200 RDRAM now, too. 4.8GB/s in a dual channel config. -
Re:XP ? Who cares - how about Linux performance
The optimization for WinXP is just marketing bullcrap. When AMD upgraded from their Thunderbird core (plain old Athlon) to their Palomino core (Athlon XP), all they did was change around some transistors to decrease heat generated and add in data prefetch.
The inclusion of prefetching SHOULD boost performance in any OS. As a matter of fact, check the XP's SPECint scores at Ace's Hardware and compare them to a regular Athlon at the same clock speed. Isn't SPECint run under some form of Unix? You'll notice the XP scores higher than the regular. -
And that's not all...
Aside from the meager "5-10%" performance boost per clock that GamePC reports, the new PC1066 RDRAM and 533MHz FSB coming in a few months offers a "12%" performance boost per clock, when used with the original P4.
Northwood + 533MHz FSB/PC1066 RDRAM should be quite nifty.
The PC1066 benchmarks are here.
According to that chart there, PC1066 RDRAM actually has lower latency than PC133 SDRAM. I don't know how accurate that is, but it says PC1066 RDRAM takes 207 cycles for 128 bytes, and PC133 takes 229 cycles (PC800 took 270)?
Maybe I'm reading that wrong or don't know some specifics about RDRAM architecture, but that sounds nifty...