Domain: amd.com
Stories and comments across the archive that link to amd.com.
Comments · 1,178
-
Re:Dual 64 boards
I know Opterons will be up to 8x CPUs, but I hope they keep the chipset cost down for home users wanting dual/quad boards. The MP chipset line is end of life according to its roadmap.
Guess it's just hurry up and wait. -
Re:Opteron and Athlon 64 are not the same CPU
Straight from the FAQ.
Q: What are the differences between the AMD Athlon 64 and AMD Opteron processors?
A: The upcoming AMD Opteron and AMD Athlon 64 processors are designed for different markets. For the server/workstation market, the AMD Opteron processor will undergo more stringent validation and reliability testing. Another difference will be in the number of HyperTransport links embedded on the chip. The AMD Athlon 64 processor will contain one HyperTransport link offering 6.4 GB/s data transfer while the AMD Opteron processor will offer three links. The processors will also contain different amounts of cache.
ClawHammer = Athlon 64
SledgeHammer = Opteron -
Re:Dual 64 boards
Take a real good look at AMD's roadmaps:
Dual Cla-whammers are GONE.
AMD evidently decided to force the enthusiast/mini-server market to choose to buy-up ( dual sledge-hammers, and at the prices involved
.. NBL ), or
Mean ( well, not really mean, but
.. wahh!! . . ), but effective ( for their bottom-line ).
Mind you, there are 2 other significant concerns in replacing my system ( I'm in the segment they
.. decided to ignore ):
1. Silent System:
.. those Cla-whammer HSFs look huge/possibly-noisy, or, if the chips really are low-wattage, then they'll be really silent under a copper HS with a Verax.de fan on it, and
2. as someone else mentioned, HD CPU usage, but the solution for that ( waitaminit, we dissolve stuff to fix our PC's? ) is to use Serial-ATA ( non-blocking, and non-redundancy-of-commands ).
And with Linux, the Silicon Image chip based S-ATA controllers are supported in 2.6, so grab these, then, rather than the non-open-source HighPoint, or the outright opposed-to-open-source Promise.
Lost Circuits Benchmarks ( stunning ), and CyberCPU.net ( it's the low-CPU, 8% vs 44%, that puts S-ATA into the phenomenosphere ), and
.. I'd heard that Seagate is implementing out-of-order-execution for its upcoming S-ATA drives, which oughta make 'em punchier..
( for the TLA-challenged, the CLA in Cla-whammer, the new AMD desktop chip, stands for the Canadian Luge Association, and if these chips are able to flatten luge, they're damn capable, and..
the above usage of NBL stands for Not Bloody Likely, as rememberers of the film-version of Pygmalion may remember.. My Bloody Fair Lady, I think it were callethed.. hmmm.. ) -
Re:Only 1 TB?
I'm pretty sure this is the physical address space and not the virtual one (which should be 2^64).
Nope. 40 bit physical addresses, and 48 bit virtual addresses. -
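For anyone wanting to check the arithmetic behind the "1 TB" figure, here is a minimal Python sketch. The function name is made up for illustration; the only inputs taken from the comment are the 40-bit and 48-bit widths, and a flat byte-addressed space is assumed.

```python
def address_space_tib(bits: int) -> float:
    """Size of a flat, byte-addressed space of `bits` address bits, in TiB (2**40 bytes)."""
    return (1 << bits) / (1 << 40)


if __name__ == "__main__":
    # Widths quoted in the comment above: 40-bit physical, 48-bit virtual.
    for label, bits in (("physical", 40), ("virtual", 48), ("full 64-bit", 64)):
        print(f"{label:12s} {bits:2d} bits -> {address_space_tib(bits):,.0f} TiB")
```

Under those assumptions, 40 bits works out to exactly 1 TiB, which matches the headline number, and 48 bits to 256 TiB.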
Slashdot a Little Slow?
-
Oh, here it is.
-
I'm off Topic and don't care
yet another case of being one-up on Intel."
Exactly why do people make links like this in stories....
Link To The Story....Maybe a few obscure References.....
I think I can Find AMD.com on my own....Thanks -
Benchmarking is available...
benchmarks comparing the XP-M to the P4-M.
-
AMD's answer: Mobile athlons with 1watt(!)
12 new Athlon Mobile models, which will go down to 1 volt core voltage and use not more than 1 watt (!).
Check here
The 1 watt number is from a Heise article.
Bye egghat. -
Re:didnt intel sell some rights.......
Actually, AMD was founded in 1969.
-
Re:Mixing the cards... no wait: cores.
One can read the AMD Processor Recognition document which explains how to extract the information from the Ordering Part Number (OPN).
Just remember to do your research, and you'll be fine.
AMD Processor Ordering Part Number (OPN) Breakdown
AXDA 2700 D K V 3 D
^^^^ ^^^^ ^ ^ ^ ^ ^
-1-- -2-- 3 4 5 6 7
(1) Processor Core Architecture/Brand Name
(2) Model Number
(3) Package Type
(4) Operating Voltage(Nominal Core Voltage)
(5) Maximum Die Temperature
(6) Level 2 Cache Size
(7) Maximum System-Bus (Front-Side-Bus) Speed
(1) Processor Core Architecture/Brand Name
(only Thoroughbred and Barton cores are 0.13 µm)
AXDA ----- AMD Athlon XP -- 0.13 µm
AX ------- AMD Athlon XP -- 0.18 µm
AMSN ----- AMD Athlon MP -- 0.13 µm
AMP/AHX -- AMD Athlon MP -- 0.18 µm
K7/A ----- AMD Athlon ----- 0.18 µm
(6) Level 2 Cache Size
1 -- 64 KB
2 -- 128 KB
3 -- 256 KB
4 -- 512 KB (only Barton cores have a 512 KB L2 cache)
(7) Maximum System-Bus Speed
B -- 200 MHz
C -- 266 MHz
D -- 333 MHz -
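The breakdown above can be turned into a small decoder. This is only a sketch: the function name, the regular expression, and the dictionary layout are assumptions, and fields (3) to (5) are returned as raw letters because their code tables are not listed in the post. It also assumes a purely alphabetic brand prefix, so the older "K7/A" style part numbers are not handled.

```python
import re

# Lookup tables copied from the post; anything not listed there is reported as "unknown".
BRANDS = {
    "AXDA": ("AMD Athlon XP", "0.13 µm"),
    "AX": ("AMD Athlon XP", "0.18 µm"),
    "AMSN": ("AMD Athlon MP", "0.13 µm"),
    "AMP": ("AMD Athlon MP", "0.18 µm"),
    "AHX": ("AMD Athlon MP", "0.18 µm"),
}
L2_CACHE_KB = {"1": 64, "2": 128, "3": 256, "4": 512}
FSB_MHZ = {"B": 200, "C": 266, "D": 333}

# brand letters, model digits, then five single-character fields (3)..(7)
OPN_PATTERN = re.compile(r"^([A-Z]+)(\d+)([A-Z])([A-Z])([A-Z])(\d)([A-Z])$")


def parse_opn(opn: str) -> dict:
    """Split an OPN such as 'AXDA 2700 D K V 3 D' into its seven fields."""
    compact = "".join(opn.split()).upper()
    match = OPN_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"unrecognised OPN layout: {opn!r}")
    brand, model, package, voltage, max_temp, l2, fsb = match.groups()
    name, process = BRANDS.get(brand, ("unknown", "unknown"))
    return {
        "brand": name,
        "process": process,
        "model": int(model),
        "package_code": package,    # field (3), table not given in the post
        "voltage_code": voltage,    # field (4), table not given in the post
        "max_temp_code": max_temp,  # field (5), table not given in the post
        "l2_cache_kb": L2_CACHE_KB.get(l2),
        "fsb_mhz": FSB_MHZ.get(fsb),
    }


if __name__ == "__main__":
    print(parse_opn("AXDA 2700 D K V 3 D"))
```

For the example part number in the post, this reports an Athlon XP on the 0.13 µm process, model 2700, with 256 KB of L2 cache and a 333 MHz bus.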
Re:It amazes me...
You're not likely to see 128- or 512-bit general-purpose computers in your lifetime, I'm afraid.
Hmm, strange, I was under the impression that AMD's 64-bit offering had numerous 128-bit [media] instructions.
That would be Volume 4 of the AMD x86-64 Programmer's Manual for those that have the set. For those that don't, look here: http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7044,00.html
That doesn't really make it a 128-bit processor, but it comes close enough. -
No luck yet for AMD-based solutions
By all accounts, development of SMP motherboards for Athlon has all but stopped pending the release of the next generation of Athlon MP ("Barton").
This is largely due to the fact that as of late last year, Intel multiprocessing solutions have become cheaper than their AMD counterparts. Intel has reduced lower-end Xeon pricing as faster Xeons have come out. AMD has not done the same with Athlon MP prices. And so Tyan, MSI, Asus et al. are not spending much time thinking about how to keep their Athlon MP motherboard line up to date.
This is just a guess, but it might also have something to do with some bad blood between AMD and manufacturers: the most recent AMD chipset for dual Athlon MP's (760MPX-based) had a bug in the Southbridge that completely disabled on-board USB 1.1 (oops) and it took AMD a while to get a fix into production motherboards. That probably didn't earn them big points.
-
AMD says...
According to AMD:
Q: What are the differences between the AMD Athlon 64 and AMD Opteron processors?
A: The upcoming AMD Opteron and AMD Athlon 64 processors are designed for different markets. For the server/workstation market, the AMD Opteron processor will undergo more stringent validation and reliability testing. Another difference will be in the number of HyperTransport links embedded on the chip. The AMD Athlon 64 processor will contain one HyperTransport link offering 6.4 GB/s data transfer while the AMD Opteron processor will offer three links. The processors will also contain different amounts of cache.
-
Re:AMD vs Intel
As other people have pointed out, you have to remember the timeline of these products. According to AMD's corporate history, they were founded just to take other people's stuff and try to improve it; they didn't innovate anything. So Intel would release a chip, AMD would fart around with it for a while to try to improve it, and by the time AMD released their 'improved' product, Intel was ready with the next generation. It wasn't until recently that AMD started to branch out a little on their own (with PC processors, that is).
K6 line... beat intel's pentium line in Mhz handily
The K6 line never got close to the pentiums in terms of actual performance. And what happened to the "MHz doesn't matter!" drum that AMD has been beating for the past few years?
-
No
From an AMD History page:
Since 1969, AMD has grown from a fledgling start-up, headquartered in the living room of one of its founders, to a global corporation with annual revenues of $4.6 billion.
-
Re:why so slow?
it didn't take AMD nearly as long...
That comment is pretty ignorant. AMD released their first processor in 1975- only 4 years after Intel's first processor. And much of AMD's progress was due to cross-licensing agreements that AMD signed with Intel.
Keep in mind that AMD was not founded to innovate; they just wanted to take other people's crap and try to make it better. They say exactly that on their corporate history page.
During the company's first years, the vast majority of its products were alternate-source devices, products obtained from other companies that were then redesigned for greater speed and efficiency.
-
Re:Why?
-
Clarifying Opteron vs. Athlon 64
To clarify AMD's processor naming scheme for those who haven't been keeping up with the Joneses, the Athlon 64 is AMD's 64-bit desktop and mobile CPU, while the Opteron is AMD's 64-bit server and workstation CPU. Both utilize the x86-64 architecture, which is essentially an extension of the existing x86 instruction set for 64 bits.
A few key differences between the two are that the Opteron will be multiprocessor-enabled and have three HyperTransport pipes (each providing a theoretical 6.4GB/s of throughput) versus one in the Athlon 64. The Opteron will also have more on-die L2 cache (1MB and 2MB are being talked about right now), and will draw quite a bit more power (90W+ vs. ~65W for the Athlon 64). -
You are comparing embedded to desktop processors
While the x86 and PPC are not comparable MHz for MHz, you can certainly figure out how much work each one does per cycle.
While work per cycle is important, work per Watt is even more important in embedded systems. You are comparing a PowerPC (less than 7.5 Watts) to a 1 GHz Duron (greater than 40 Watts). I am using the AMD datasheet http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/23802.pdf to guess that.
Just for comparison, a dirt cheap throw away CPU nowadays would be a 1GHz Duron (around $35),
I bet you can get this particular PowerPC for under $10.00. This IBM PowerPC is circa 1997 and is a 604e chip. http://www-3.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_604e_Microprocessor
The PPC would have to execute four times more instructions per cycle than the Duron to be roughly comparable, and you can bet your life that it doesn't.
From the product brief:
Dispatch Unit
Dispatch up to 4 instructions per cycle
8-instruction dispatch buffer
Completion Unit
Completes up to 4 instructions plus 1 store
and 1 branch per cycle
So, all in all, while the PPC might be more efficient per cycle, it is not per dollar.
Like I said, you have to compare the embedded features: low power, form factor, layout, manufacturer support, etc. AMD does not market to the embedded folks very well. PowerPC is very good at that.
The cost of this product comes in the engineering work that went into designing a custom PCB and layout for this product. Not in the parts or software.
-
Re:why?
"Basically, Hammer (64 bit cpus) = larger addressing space, not nescessarily faster processing."
I suggest you go read the developer docs for the hammer.
Memory addressing has been re-designed, because no-one uses the protection model in x86, etc.
Do you understand me, or the developer docs? Probably not.
Now read Understanding the Microprocessor and say that the Hammer isn't an improvement. -
Re:Heat. tsarkon reports.
Wrong. You are such a FUD/conjecture machine it's unreal.
The Intel shrinks of the P3 (.13) with the integrated heat spreader are quite cool to the touch under load. In fact, the FIRST CPU Intel shrunk was the Pentium III. You know, you can READ about CPUs. http://www.intel.com/intel/finance/pricelist/ , http://developer.intel.com/design/PentiumIII/specupdt/ and for P4 lovers, the specification updates for the P4, http://developer.intel.com/design/Pentium4/specupdt/.
Now there is similar documentation available from AMD as well. Now instead of getting anecdotal evidence (which can show you clearly there are many holes in this guy's Pentium 3 theory), you can see steppings, dissipations, packaging and thermal characteristics for CPUs, not some asshole's rendition of fact into half truths. You want to talk about heat, talk about dissipation first.
Intel could put out a 1.6GHz Pentium III tomorrow [The 1.4 is able to be clocked at that frequency]. They stopped the P3 at 1400 with 512 KB of cache for a reason: it started to compete with Pentium 4.
Do you expect anyone to believe you have ever read EETimes or have a sub to Microprocessor Report? I mean, you are poorly summarizing gamer sites here continually.
By the way, for anyone thinking of this guy's trash, the Athlon shrinks and the Hammer chips will run cooler. Maybe it won't be up to snuff with Intel.
Now the biggest problem with your filth is that Hammer specs aren't out. System integrators need thermal information to make designs. And when Dell and friends get the thermal data, along with everyone else, then you can talk about what kind of heat output AMD-64 will have.
Everything you say is conjecture, crap, fluff. You don't work for AMD, and judging from your intellect, if you did, they laid you off. -
Re:what?
-
Straight from the horses mouth
Here is the actual press release from AMD. There's nothing in there that says they are stopping consumer chips (in fact, they talk about the 64-bit chip, and Unreal). They do mention that they are 'branching out'. Microsoft branched out from operating systems (everything from keyboards, to chairs, to crappy networking hardware) but they still make the same great OS!
-
Capitalism At Its Best
Unfortunately I disagree with the original poster: AMD is not leaving the PC chip market, they are spreading their wings, more in line with the current chip industry.
May I remind you that Intel is not exclusively in the chip market either. Intel spread to new concepts in computing years ago and is better off for it (e.g. from their site: Consulting Services, Compilers, Performance Analyzers, Threading Tools, Training Center, LANDesk* Software etc...). While most of these are certainly related to the PC chip industry, Intel's focus is not nearly as narrow as AMD's.
In doing what Intel did years ago, they are actually increasing their competitiveness. In fact a quick look at Intel's (INTC) financials confirms just that.
Hats off to AMD for keeping capitalism and competitiveness alive.
-
What he really said was ...
AMD will put compatibility ahead of sheer speed. The press release mentions embedded devices, but also demos of 64-bit game and database software. AMD is emphasizing that its 64-bit processor has better backward compatibility than Intel's with 32-bit software, even though its 64-bit mode is slower. This looks to me like a bid for industry support for its x86-64 architecture, hardly a concession of the PC market.
-
Re:Makes no sense
> And before you flame me for not reading the article, I didn't read the article.
And now that I have, you can flame me for replying to myself :-)
The article talked not so much about ditching the CPU business, as partnering with other companies on non-desktop-PC applications--Gibson for digital audio workstations (using the MAGIC network protocol, covered here), JAK Films/ILM for video/storyboarding gear, and Cray for a new Sandia Labs supercomputer--the first two of which look more or less like specialised versions of desktop PCs anyway. So presumably you'll still be able to throw together a 1337 Athlon box for your own use, but they may be treating the Dell/HP/whatever market as a lost cause. -
Re:The goal in mind being UNIX?
-
Maya 4 and AMD/Intel systems
It's a bit dated, but here is an interesting article from Ace's Hardware describing performance on AMD/Intel systems for comparison:
Maya 4 and SSE-2 optimisations
AMD also makes a comparison here, but Intel's benchmarks didn't include Maya. -
Re:Merits of RISC
I do not understand how you can say that a CISC layer does not slow the system down and that the ISA is "almost irrelevant". I have to interpret that as pure ignorance.
Well, maybe it's ignorance, but I haven't yet seen anything in your post that explains why the CISC instruction set impedes performance.
Your best example was the FP stack. However, does that not internally become traces that can access the FP registers in arbitrary ways? Can the traces not eliminate extra spills, dups, swaps, and other artifacts of stack-based computation? If it doesn't currently do so (which would surprise me) then I would expect that a future version of the Pentium certainly could.
I say the ISA is almost irrelevant because the compiler's optimizations occur with RISC-like instructions, and then the actual execution (u-ops) occurs with RISC-like instructions. The CISC ISA doesn't actually do anything except communicate the former to the latter. Certainly there is overhead for translating the ISA to u-ops, but hot code is usually executed many times, and so the translation cost is amortized over a large number of iterations, making it negligible.
AMD's whitepapers on x86-64 claim that the x86 ISA is a good one for their modern processors because they get the code density of CISC with the register usage and ABI models of RISC. Clearly they may be biased because they have a technology to promote, but I think their arguments have merit.
Perhaps you could give an example of how the P4's internal u-op traces are sub-optimal because of the CISC ISA?
-
Re:Pentium IIIs?
The Athlon XP 2800 runs at 2.25 GHz. It's not shipping yet, but they are producing it and it's priced 10% less than the comparable Intel P4 2.8GHz.
-
Re:It's not surprising...
your lovely AMD processors too come from an Asian country, Malaysia.
Not all of them. Many of the new ones come from Germany; Fab 30 in Dresden has been voted "Fab of the Year"
-
Re:67C?
Well, that would of course depend on the type of CPU.
The P4 @ 2.8 GHz in the PC in the article should be OK up to 75C, according to Intel.
I've got a dual Athlon MP 1900+ (i.e. 1660 MHz) box, slightly overclocked to 1740 MHz. Under full CPU load (which is 24/7 thanks to distributed.net) CPU0 temperature is around 50C, and 60C for CPU1 (which I think is weird. CPU1 is close to the graphics card, but I still wouldn't think that this would account for a 10 degree difference.) This is with active air cooling using a couple of WhisperRock II HSFs and Arctic Silver 3 "thermal goo".
Anyway, AMD says I'll be fine as long as I stay below 95C! There's a setting in my BIOS that will shut down the system if the temp hits a user defined value (80C here), detected by the on-die sensors of the CPUs. I suppose it's the same with the CPU and mobo in the article.
I'd be more worried about the temp of his graphics card. Personally I think I'd keep a quiet fan on the GPU, especially if it's overclocked. In my experience it's the case fans, sucking air in and blowing it out of the case through grilles, cables and plastic decorations that are the noisiest. Get some quiet case fans, or remove them like in this... case... and use some sort of insulation in the box to keep the noise of the internal stuff to a minimum. -
Re:Athalon eats the bill...
How on Earth would a sports bag increase your power bill?
Surely you weren't referring to AMD's Athlon processor, were you?
-
ah, the ignorant have spoken..
Did you know that a P4 takes 20 clock cycles to perform a multiply?
Did you know that you are an idiot? The P4 has a 20-stage pipeline, which means the process of executing instructions is separated into 20 pieces, and the hardware used to do each one of those pieces works on part of a different instruction at the same time. So a multiply might take 20 clock cycles to come out of the other side of the CPU, but that is the whole story only if all you have is a program with one multiply instruction followed by a hlt or something.
Most programs, of course, have more than one instruction. With a 20-stage pipeline one instruction takes 20 cycles to run, but you can also perform 19 other instructions along with it... depending on how many execution units you have along with it.
The P4 has two ALUs, each running at twice the clock speed of the rest of the CPU. (In contrast, the Athlon has 4 regular speed ALUs.) So in actuality, you'd be able to run 80 or so instructions in those 20 clock cycles.
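The latency-versus-throughput point can be made concrete with a toy model. This is a deliberately idealized sketch, not a description of the real P4: it assumes no stalls, hazards, or branch mispredictions, and the function name and the `issue_width` parameter are invented for illustration (an issue width of 4 stands in for "two ALUs at double speed").

```python
import math


def pipeline_cycles(instructions: int, depth: int = 20, issue_width: int = 1) -> int:
    """Cycles for an ideal pipeline to finish `instructions` independent instructions.

    The first group needs `depth` cycles to drain; every later group of
    `issue_width` instructions completes one cycle after the previous group.
    """
    if instructions <= 0:
        return 0
    return depth + math.ceil(instructions / issue_width) - 1


if __name__ == "__main__":
    for n in (1, 20, 1000):
        total = pipeline_cycles(n)
        print(f"{n:5d} instructions -> {total:5d} cycles ({total / n:.2f} cycles each)")
```

In this model a lone instruction costs the full 20 cycles, while a long run of independent instructions approaches one cycle each at issue width 1, which is the distinction the comment is drawing.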
Integer multiplies are actually performed by the floating point system, IIRC, rather than by the ALU, so they won't be as fast as addition and subtraction.
The chip IBM is making is a mips based chip, and takes fewer cycles to perform all its instructions. It also has a _ton_ more registers, which means you can perform significant operations without going to or from memory.
IBM is not making a MIPS chip, moron. They are making a PowerPC chip. The P4 has only 8 general purpose 32 bit registers, but in addition has 8 80 bit floating point registers, 8 64 bit integer SIMD registers and 8 128 bit floating point/vector SIMD registers.
MIPS only has 32 general purpose registers, and although they can be used however you want, several of them are 'reserved' for the stack, and things like that. Also the first register is always zero, and you can't store anything in it. So in actuality, MIPS chips have fewer registers than Intel chips. PPC chips on the other hand do actually have more registers than Intel chips though, with 32 general-purpose registers, 32 (64 bit?) floating point registers, and 32 128 bit vector SIMD registers.
This doesn't really help your argument, though: Reading or writing a number to memory is about 100 times slower than an arithmetic instruction.
It's true that reading from memory takes a long time, and that's why modern CPUs don't do it very often. They use these things called "caches", you know? The vast, vast majority of memory access doesn't actually need to hit RAM.
But to use those coprocessors, you have to go into modes like mmx. And bolted on extra instructions like mmx have restrictions on them, like not being able to do mmx and floating point math at the same time.
No, I was talking about using floating point math for integers larger than 32 bits, rather than splitting 64 bit ints up into 32 bit chunks and adding them with carry (which takes more than two instructions). MMX doesn't allow 64 bit int math, as far as I know, but rather allows you to sacrifice floating-point math for accelerated 8, 16, and 32 bit math. It's always interesting that Mac fans seem to think that Intel chips suddenly lost the ability to do integer math and floating point math at the same time when they gained MMX.
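The "split into 32 bit chunks and add with carry" approach mentioned above looks roughly like this. It is a Python sketch of the idea rather than what a compiler would emit (on x86 this would typically be an add followed by an add-with-carry); the function and constant names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_limbs(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit halves and a carry bit."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32                   # low 32 bits of the result
    carry = lo_sum >> 32                   # 1 if the low add overflowed, else 0
    hi = (a_hi + b_hi + carry) & MASK32    # high add consumes the carry

    return (hi << 32) | lo                 # wraps modulo 2**64, like hardware


if __name__ == "__main__":
    print(hex(add64_with_32bit_limbs(0xFFFFFFFF, 1)))
```

The example call shows the carry propagating out of the low half into the high half.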
Anyway, that's really beside the point due to the fact that, as you can see, MMX no longer uses the floating-point registers.
For the future, 64-bit is the way to go, and x86 is not. I think one of these IBM processors will be the ideal linux machine. (It'll be low power too, so I won't need a hairdrier-loud fan like I do with my athlon :) )
since when are those separate things?
Might not hurt to learn a thing or two about how computers work before opening your mouth. -
Re:In 50 years, I doubt many will know what Unix i
Shouldn't this be fixed before the problem arises as we will have the ability to address more and more memory?
From what I understand, it's the ability of the processor to count to higher numbers. UNIX's datetime variable is limited to 32 bits, of course, which gives us our 2038 deadline. Of course, with AMD and Intel struggling to be the first to make a viable 64-bit chip available to the end-user, I doubt this will be a problem for long. By the next major Linux kernel revision, and by the next major BSD release, I'm more than certain we'll have the groundwork in place to migrate to 64-bit systems.
With the quality of modern computer systems, and the rate at which they're being updated, do you honestly foresee yourself running any of your current machines a decade from now? Certainly not in any form of mission-critical applications, I'd wager. My screaming fast Athlon XP with DDR RAM will likely be relegated to a backup DNS server by that point, providing it's still alive of course.
So two decades from now, what will we be running? Likely our 'antiques' will be hardware purchased in or about the year 2012. Judging by AMD's Processor Roadmap, we'll be seeing the [Claw/Sledge]Hammer processors within a year or two, and based on the proliferation of current processors (PII/P4, ThunderBird/Athlon/Athlon XP) I'd bet they'll be either commonplace or outdated by 2012.
There will come a day when 64-bit on the desktop will be the 'norm', and there will be weirdos {cough} still running "Those really old 32-bit processors", just like we now have people running C=64s. :)
UNIX will be prepared for its D-Day with more than a decade of breathing room; mark my words.
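The rollover date for a 32-bit seconds counter can be checked directly. A minimal sketch, assuming the counter holds seconds since 1970-01-01 UTC and is signed (the usual time_t layout on 32-bit systems); the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable(bits: int = 32, signed: bool = True) -> datetime:
    """Latest moment a `bits`-wide counter of seconds since the epoch can hold."""
    value_bits = bits - 1 if signed else bits
    max_seconds = (1 << value_bits) - 1
    return EPOCH + timedelta(seconds=max_seconds)


if __name__ == "__main__":
    print("signed 32-bit:  ", last_representable(32, signed=True).isoformat())
    print("unsigned 32-bit:", last_representable(32, signed=False).isoformat())
```

Only 32-bit widths are tried here, since a 64-bit counter reaches far beyond what Python's datetime can represent.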
-
Re:Hammer & Intel
According to the AMD Processor Road Map, the first Hammers will be 0.13, but they will be going to 0.09 for the ClawHammer in late 2003. That's where the map ends, but presumably all processors will eventually reach 0.09.
-
Re:removable RAM?
Hmmm...Yeah. We could call it, maybe, Shared Memory Architecture...that's pretty catchy. I'm sure Intel and AMD and VIA would love to talk to you about it.
-
Re:This is very bad news...
You do realize that AMD has been around for about 30 years...
They make more than just processors, check their website and see. It would be a huge problem for them if their microprocessor business went south, for sure, but honestly, they are in no risk of going out of business. -
Why Pentium IVs are slow
The P4's x87 FPU and x86 ALU are just plain slow compared to P3s and Athlons. Though I am surprised your code is running 82x slower. I'd expect more like 2-8x slower for compute bound code. You can get a somewhat sensationalistic overview of why it's so slow at this link.
If you want more in-depth numbers you can compare appendix C of the Intel Pentium 4 Optimization Manual with chapter 29 of Agner Fog's Pentium/II/III Optimization Manual. You can see the Athlon numbers in Appendix F of AMD's Athlon Optimization Manual.
If you want to do number crunching with Pentium 4s your best bet is to use the SSE2 instructions/registers. You should be able to get a noticeable speedup by using the Intel C++ compiler and telling it to use SSE2 instructions. If you want to eke out max performance you'll have to use assembly language, though you can probably get most of the way there using the Intel C++ Compiler's SSE2 intrinsics.
I'm curious as to why your code is so much slower on a P4 than on an Athlon. The best way to find out would be to look at the assembly code that gcc is producing. You can do that by using gcc's -S option. If you'd like, send me the C code and the output from -S and I'll see if I see anything obvious.
I'm somewhat paranoid about posting my email address. My paranoia seems to work, as I've received no more than the occasional spam in the last few years. My email address is my slashdot user name at woh.rr.com. -
AMD has announced small transistors w/ this tech.
AMD just today announced that they have created 10 nm double gate transistors. Here is the AMD announcement
-
Re:Woop!
Awesome AMD indeed. They say here:
"SUNNYVALE, CA--September 10, 2002--AMD (NYSE: AMD) today announced it has fabricated the smallest double-gate transistors reported to date using industry standard technology. These transistors, measuring ten nanometers, or ten billionths of a meter in length (gate), are six times smaller than the smallest transistors currently in production. AMD's research breakthrough could foster the placement of a billion transistors on the same size chip that currently holds 100 million transistors, enabling a vastly richer computing experience." -
redhat and AMD.
According to AMD, they are doing a joint venture with Redhat on their x86-64 Hammer series processor. Do you really imagine Redhat going into this if they had to write closed-source DRM crap into their distro?
Say what you want about Redhat being the next Microsoft, but they always release their code. I don't see them going into this if there weren't some non-DRM products coming from AMD.
--
Mike -
A more serious answer ...
Since it's optimization you are concerned with, I have a few choices you will be interested in:
1. The Zen of Code Optimization by Michael Abrash.
2. Agner Fog's Assembly Resources
3. The Athlon Optimization Guide
4. Intel's IA32 Optimization Guide
5. The Aggregate Magic Algorithms
These sources will give you everything you need to know about code optimization for x86. -
*Full* article text follows (part 3 of 3)
(Sorry, the benchmarks pages are pretty much worthless without the graphs. If it makes you feel any better, I didn't see them either:-)
After Effects Pt. 1
Adobe After Effects 5.5 software delivers a set of tools to produce motion graphics and visual effects for film, video, multimedia, and the Web whether working in a 2D or 3D compositing environment. After Affects is a main creative program and works in concert with Adobe Photoshop, Illustrator, Softimage and a Media100 non-linear edit suite.
A user interacts with Adobe After Effects through the GUI and produces finished work by rendering a project to the hard drive. The amount of effects and elements that After Effects can do is far too lengthy to summarize accurately but guaranteed it is extensive in its palette of tools. Therefore After Effects can make a myriad of simultaneous different demands on the CPU/GPU/RAM systems. To demonstrate the benefits of different hardware components a real world After Effects projects consisting of compressed and uncompressed video, EPS, internally generated and PICT text elements, transitions, size scaling, shadows, and treatments was chosen as the test project. Benchmark programs may examine individual demands on a system but in the real world this may not be the case and it is important to measure the results of simultaneous varied demands as well as one specific measurement task. It may not be a standardized test but it shows what to expect from a project that encompasses a lot of different tasks simultaneously.
After Effects primarily uses the processor, video card and ram while a user is working within a composition window and timeline. Adjustments to a project are displayed in real time in the composition window. The faster each of the individual hardware subsystems are the smoother the interaction and the faster the composition window will be redrawn.
When After Effects renders or builds the finished timeline it is the processor, ram and hard drive that determine speed as the video card is more or less bypassed. After Effects will call to the disk for information for the processor to calculate a finished frame and then return that frame to the disk for storage. This process repeats for as many frames as are within the timeline. Remember that any video is a series of still frames and After Effects builds each single frame and glues it to the next to finally end up with a playable movie or, conversely, a sequence of files.
Ram is an important consideration with Adobe After Effects. It will function effectively with only 512 MB of ram but more ram is better. Adobe recommends using the following formula to calculate the amount of ram it needs to preview a composition.
[(height x width x (bit depth / 8) x frame rate x (resolution)^2) / 1024] / 1024 = MB/sec.
The variables for height, width, frame rate, and resolution depend on the composition setting. Always use the maximum expectations to determine a base of RAM requirement. For example a preview of 10 seconds in NTSC broadcast format would plug into the equation thusly:
[(640 x 480 x (32 / 8) x 30 x (1)^2) / 1024] / 1024 = 35 MB/sec.
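For anyone who wants to plug in their own composition settings, here is a minimal Python sketch of the same arithmetic. The function name and parameter names are my own for illustration, not anything from Adobe, and it assumes "resolution" is a scale factor where 1 means full resolution.

```python
def preview_mb_per_sec(width, height, bit_depth, frame_rate, resolution=1.0):
    """Uncompressed RAM preview data rate in MB/sec (1 MB = 1024 * 1024 bytes).

    Mirrors the formula quoted above; 'resolution' is assumed to be a
    scale factor (1.0 = full, 0.5 = half), applied squared.
    """
    bytes_per_frame = width * height * (bit_depth / 8) * resolution ** 2
    return bytes_per_frame * frame_rate / 1024 / 1024


# The NTSC example from the text: 640 x 480, 32-bit, 30 frames per second.
rate = preview_mb_per_sec(640, 480, 32, 30)
print(round(rate, 2))  # 35.16
```

Multiply the result by the number of seconds to preview to get the naive total.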
One then should come to the conclusion that 350 MB of available RAM would be needed to preview 10 seconds worth of an After Effects timeline.
That couldn't be more wrong.
After Effects Pt. 2
How much RAM is needed is dictated by the information in the picture and the compression codec used. For example; 30 seconds of a 640x480 white page will take up much less RAM than a video of a stock car race. That's because more information must be stored about the color changes during moving video. A white page is just that...white...and the program will figure out quite quickly that it can save time by repeating the same information about pixel color instead of storing unique information about each one. How much required RAM depends on the variety of color, how often each pixel changes and the particular compression codec used. This will be the only time where a strong recommendation is made. Get at least 1 GB of RAM to make the After Effects experience more enjoyable. Get more RAM if it is expected that there will be a need for longer previews or work in D1, HDTV or widescreen format.
There will be two benchmark measurements to identify the benefits of different processor components. The bars on the left feature a small jump in AMD processor speed to demonstrate how more CPU horsepower will speed up the CPU/GPU/RAM dependent RAM PREVIEW. The bars on the right demonstrate a small increase of CPU horsepower and its effect on rendering speed. This should help determine if the increase you are considering is worth it.
The results do show the greatest impact in each of the two major functions of After Effects. A small increase in CPU does have a big effect on rendering speed but not on real time ram preview. When designing a system on a budget it is important to identify what is expected and, if budget restrictions require a choice, then the desired balance must be struck between favoring user interactivity and favoring speed of rendering. Also anticipate that longer RAM previews or larger format previews require more ram. You can literally watch the ram fill. Just leave task manager open and watch the page file usage creep up. The goal in making purchasing choices is to work backwards from what you expect in the end result.
SoftImage XSI
SOFTIMAGE|XSI v.2.0 is an incredibly powerful 3D tool that has the capacity to bring virtually any system to its knees especially if raytracing, radiosity or photon-mapping is used to a large extent. If this is the case then there definitely will be a loud scream of anguish coming from a solitary PC system. Softimage projects can become so system intensive that 100 finished frames can take an insane amount of time to render. In order to increase rendering speed many computers are equipped with specialty hardware and are tied into render farms in the single-minded task of rendering a single scene.
That's enough of the fire and brimstone about complex 3D rendering. Softimage works on a somewhat similar principle to After Effects. A faster and more powerful video card will translate to a smoother interface where complex scenes can be manipulated in real time. Note that, unlike After Effects, Softimage does not have an interface to preview a finished frame in real time. Users can manipulate objects in a choice of views from wire frame mode to simulated real-time shading mode. In order to look at a finished frame a user must render the frame to disk which bypasses the GPU. A faster processor will result in a faster render. The amount of RAM is not as great an issue as the user is working frame by frame and the graphics card is doing the bulk of the work while working within the GUI.
This is a most basic overview and there are specialty hardware components that can enhance the speed and interactivity of complex 3D scenes and programs. The designers working on the test system use Softimage on a less complex level to provide enhancements and elements to commercials, promos and station ID elements. Though their work is quite complex to some it is a far cry from that of special effects in major film productions.
Speeding up Softimage requires thinking on the same two levels as After Effects. The Softimage GUI can display very complex and varied effects but it does so in simulated mode. Displaying the finished rendered product in real time is beyond the capacity of most video cards. But there need to be hardware features on the video card to accommodate smooth manipulation of 3D objects and the proper display of simulated effects. Softimage, AutoCAD and various other 3D programs need to access those hardware features in order to function and display the image properly. Don't think that a fast gaming card comes with these physical hardware features or has those that are onboard...unlocked. Remember that last word as it comes to make sense later.
It is quite true that a fast gaming card will be a poor performer in Softimage if it works at all. Conversely a workstation class video card may make for an enjoyable user experience in a complex 3D application but will deliver lower frame rates in games. It is safe to say that different applications require different hardware tools. Matrox provides some insight to the locking and unlocking of features on video cards.
The Parhelia workstation solution differs in no way shape or form to the retail Parhelia, which does go against the grain. Competitors tend to artificially inflate prices for their workstation products by unlocking features even though the chip may be identical to or based upon the same technology as their retail offerings. This so-called feature locking doesn't occur with Parhelia when compared to our retail solution. This is the key point here, so whether you are a prospective Parhelia client, purchased a retail board or have one integrated in your system, all Parhelia boards have access to the same workstation functionality and Surround Design support.
Softimage, by default, is designed for a single monitor interface yet the layout can be customized for dual and even triple monitors. It was most interesting to hear the comments made when the designers started to spread their workspace out to the second monitor and then to the third. Since Softimage bypasses the video card in the render process there was no performance loss.
A simple animation of 100 frames in length was rendered out with two different processors. As rendering in Softimage relies upon the processor most...then the faster the processor should result in a faster render. The animation data is as follows:
You may ask if this is any good? Just for laughs I let the art director take the project to his dual Xeon 1.8 GHz nVidia Quadro driven power box and he did beat the time by a full 10 minutes. He also beat the price by a full $3000 (cost of purchase of art director's system vs. article test system). Somehow I'll wait the 10 minutes and keep the 3 grand in my pocket. A single Xeon 450 with a Quadro card takes over 3 hours. Those numbers are completely unofficial but it lets you see the range of performance.
Benchmarks Pt. 1
Before the benchmark
Benchmarks are a yardstick we use to measure performance. Not one benchmark stands above the rest as the de facto tool. Benchmarks are useful to identify major performance problems in a system. They can also be used to identify the impact of hardware changes on overall system performance. This is very useful especially when combined with the software expectations. A faster processor may deliver faster renders but not help with a smooth GUI. A better video card may deliver a smoother interface but won't help if long ram previews are required. The performance enthusiast and overclocking crowd are edging each other by a handful of points or frames. Remember this as you look at graphs and charts. Don't look at just who's in front but also by how much both in points/frames and cost.
3D Mark 2001 SE
The granddaddy of benchmarking tools measuring how effectively a system runs 3D graphic applications. Moving from the 1900+ to the 2100+ showed only a small increase in performance. This isn't critical for workstation applications but may be the goal of gamers to squeeze every frame per second gain from their systems.
Sisoft Sandra
Small increases in processor speed appear to have the greatest impact in Sandra's multimedia benchmark.
Benchmarks Pt. 2
GLExcess
Quake III Arena
Serious Sam: The Second Encounter
Business Winstone and Content Creation
Benchmarks Pt. 3
Code Creatures
Comanche 4
DroneZ high quality.
SpecviewPerf 7.0
This benchmark really tests OPENGL performance and it is important to note that there is a large discrepancy between our results and the results from Matrox on their test system. We are investigating this. (Our system scored much lower)
PSBench
We added a new benchmark to our tests. PSBench looks at 21 individual tests in Photoshop 7.0 and the results can be looked at individually or as a cumulative score. There are three levels to PSBench; basic, intermediate and advanced. This test shows the results of the intermediate tests.
Media Cleaner Pro
Three tests were conducted to compress a 651 MB 640x480 NTSC Quicktime file. The larger the file the more good a faster processor is going to do you.
In the Driver's Seat
Who's in the driver's seat?
If you were to be put into the driver's seat of a race car would you be able to win a race against a professional driver in the exact same car? Probably not given the fact you don't know how to properly drive a race car.
Computer hardware is just that...hardware...and it can't do anything without being told how to do it. While hardware itself does go through advancement cycles as new technology emerges into mainstream it isn't worth much if it doesn't work or work well. Driving the consumer PC market forward are games. Gaming video cards have fallen into 3-month product cycles with new versions being announced before the prior has even hit store shelves.
A comment from Mark Randall of Serious Magic in a TechReport article piqued interest to look beyond the hardware for performance.
The problem isn't the hardware, it's the software drivers. In fact, the speed could be dramatically increased with revised software drivers. However, no manufacturer has presently made this aspect of driver performance a priority. The first card manufacturer to address this issue would deliver the following benefits to their users:
Mr. Randall goes on to state that software drivers, properly addressed, could speed up renders, record game play in real time, capture motion images off the desktop or even stream video out to the internet directly from the video card.
Drivers can indeed be a problem. Ask anyone who's experienced a Blue Screen of Death (BSOD). The $64 question is about the driver itself. Are we, the persistent purchaser of PC parts, being cheated out of performance that could be ours without a hardware upgrade?
A graphics card is built on the power of the Graphics Processing Unit (GPU). This is a processor chip and it doesn't make financial sense to reinvent the chip each time a new video card is released. This is the same for CPUs. The AMD Thunderbird chip scaled all the way up to 1.4 GHz before the Palomino core took over till 1.77 GHz and now the Thoroughbred core extends the range past 2 GHz. The same can be said for INTEL PII, PIII and PIV architecture.
The point is that features are either locked or unlocked on some graphic cards and the differences between adjacent levels of product may be very subtle; as subtle as a fresh set of tires and a tweak to a spoiler setting may make the difference between winning and losing the race.
If all the hardware is available then how visually enjoyable or complex that game may be, how fast a render is or if the card can support the software itself may come down to what features are hidden. Case in point; in the early stages of this article Matrox was developing and refining drivers for the Parhelia with software applications such as Softimage. Use the official 2.31 drivers with the Parhelia and Softimage won't recognize the card as an OPENGL card, won't access those OPENGL features and won't perform as expected. One or two driver revisions later and Softimage is happy.
But it isn't as easy as that. Between the software and associated drivers and the hardware is the Application Programming Interface (API) layer. Hardware and software speak two different languages and they need some way to properly communicate with each other. Explaining the API is a fairly complex matter but think of computer hardware as your body. The software resides in your head as a desire to do something like walk, talk, run, jump, or eat. Between that software thought of wanting to walk across a room to pick up an apple and take a bite out if it and the mechanical act of actually doing it is a series of hidden instructions that just happen. You don't really think about activating individual muscles to tighten and loosen on that incredibly precarious journey of balance as you stride across a room. You don't actively plan and coordinate in 3D space the relation of the apple to you or to your hand and then calculate placement and pressure required to take a bite. These things you just...do.
The API acts in a similar fashion, taking what the software wants to do and translating it for the hardware, then returning the result back to the software to display. The most recognizable examples of API layers would be Microsoft's DirectX and OPENGL, but other software can have its own proprietary API layer, as is the case with ADOBE and their programs. DirectX and OPENGL take interesting approaches to 3D graphics and each has its inherent advantages and disadvantages, and they can be more than just coding issues...they can be political. For a far superior explanation I suggest a visit to www.jakeworld.org to read an article by guru game programmer Jake Simpson on Graphical API History.
Drivers are much more complicated than one might think. They can be a proverbial house of cards. Each game, application, tool, player and so on interacts with the video card in a subtly different way. Drivers are initially designed to work with everything but may not work to their fullest potential. That's where optimization begins and the people who build drivers begin the task of figuring out what enhancements or tweaks can be made to their drivers in order to gain performance and stability. This must be done one program at a time and there is an extensive list of programs. Just think of how many games there are then begin the task of trial and error to get the best performance out of each individual game.
It isn't as simple as taking those individual driver enhancements and putting them into one set of drivers. A tweak in one enhancement can cause another tweak to turn into a problem and fixing that problem can create four others. Drivers are a balancing act between performance, stability and cost. It's almost an unobtainable triangle. Achieving performance and stability takes an unlimited pot of R&D money. Achieving great performance may cause instability. Achieving stability may cost performance.
And around it goes.
So hardware manufacturers strive to achieve balance by designing their product to fit a niche purpose. It would take too much time, effort and money to build the fastest, most stable gaming/workstation/single monitor/dual monitor/triple monitor/multimedia/digitize/output video card. It can be done but the cost of the product would be 10 times an unacceptably high price.
Manufacturers in the video card market choose where their priorities are based on what market they want to capture. Gaming cards and their drivers are optimized for games with lesser emphasis on workstation applications. Workstation video cards are optimized for the reverse. Let's face it. There's more money to be made in gaming cards than the workstation cards.
If, for the most part, the hardware can support significant performance improvements then is it the fault of the API, software or drivers and are we being cheated? This brings us back around to Stephan Schaem, Chief Technology Officer of Serious Magic.
In some cases, card manufacturers have chosen to differentiate their 'consumer' vs. 'professional' cards by introducing essentially identical cards with different firmware and software drivers. The manufacturers state that the additional cost of the pro product goes to fund development of advanced driver features that are particularly useful in production environments. The issue that Serious Magic has focused on is a different one. It's a significant issue in PC graphics card performance but we don't believe it was an intentional omission.
In a nutshell, here's the issue. While today's graphics cards can render images very quickly, the software drivers are painfully slow at getting rendered output back over the AGP bus and into the PC where it could be saved and put to work by users. Current generation software drivers achieve only a fraction of the theoretical download transfer speed that the hardware you've already paid for is capable of. It's remarkable that a graphics card with a video input and some video recorder software can record TV-quality images to the PC hard disk in real-time, yet the same card can't record its own renderings at even 1/10th this speed. Serious Magic has made a benchmark which demonstrates this problem freely available on our website:
www.seriousmagic.com/3D-Dloadbenchmark.zip
The problem isn't the hardware, it appears to be the software drivers. This is supported by the fact that the external video input to a VIVO-enabled graphics card can be moved over the AGP bus very quickly. Also, some software drivers under Windows 98 are able to move the rendered output very quickly. However, in all cases under Windows 2000 and XP the speed of transferring the 3D rendered results of the same card is very, very slow. It seems that the speed could be dramatically increased simply with revised software drivers. While this is a significant issue for many business, educational, production and scientific tasks, it is not a feature that gamers are clamoring for (although it would make capturing movies of game output faster, this is not as coveted as a higher frame rate). We believe that this is why no manufacturer has yet made this aspect of driver performance a priority. Even the more expensive cards with drivers targeted at the professional market are equally poor at this task. Hopefully, with the game market rapidly reaching saturation, manufacturers will realize that the growing business, educational, production and scientific markets can be substantial. Although each of these markets may be small when compared against the game market, when combined they can add up to meaningful numbers.
And don't tell me there's no difference between drivers. Here is an example of the same system benchmarked the same way except for the change in video card drivers.
Speed! I need more speed Scotty!
What does the future hold? Processors, graphic cards and RAM are edging upwards in speed and bandwidth. The 3GHz mark is within reach for both AMD and Intel. Matrox opens up a huge 17.6 GB/s pipe with the Parhelia and DDR ram is bumping up the performance ladder as seen in the table.
Memory name | Type name | Clock speed | Voltage | DDR clock speed | Data bus & bandwidth
PC100       | -         | 100MHz      | 3.3v    | -               | 64-bit, 0.8GB/s
PC133       | -         | 133MHz      | 3.3v    | -               | 64-bit, 1.05GB/s
PC1600      | DDR200    | 100MHz      | 2.5v    | 200MHz          | 64-bit, 1.6GB/s
PC2100      | DDR266    | 133MHz      | 2.5v    | 266MHz          | 64-bit, 2.1GB/s
PC2700      | DDR333    | 166MHz      | 2.5v    | 333MHz          | 64-bit, 2.7GB/s
PC3200      | DDR400    | 200MHz      | 2.5v    | 400MHz          | 64-bit, 3.2GB/s
PC4200      | DDR533    | 266MHz      | 2.5v    | 533MHz          | 64-bit, 4.2GB/s
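The bandwidth column follows from the other columns: clock speed, times transfers per clock (one for SDR, two for DDR), times the 8 bytes carried by a 64-bit bus. A rough Python sketch, with names of my own choosing, assuming the nominal rounded clock figures from the table (the real clocks are fractional, e.g. 133.33 MHz, so results can differ from the table by a few hundredths):

```python
# (module name, clock in MHz, transfers per clock)
# SDR moves data once per clock, DDR twice.
MODULES = [
    ("PC100", 100, 1),
    ("PC133", 133, 1),
    ("PC1600", 100, 2),
    ("PC2100", 133, 2),
    ("PC2700", 166, 2),
    ("PC3200", 200, 2),
    ("PC4200", 266, 2),
]


def peak_bandwidth_gb_per_sec(clock_mhz, transfers_per_clock, bus_bits=64):
    """Theoretical peak bandwidth in GB/s (decimal, 1 GB = 1000 MB)."""
    bytes_per_transfer = bus_bits // 8
    return clock_mhz * transfers_per_clock * bytes_per_transfer / 1000


for name, clock_mhz, per_clock in MODULES:
    print(f"{name}: {peak_bandwidth_gb_per_sec(clock_mhz, per_clock):.2f} GB/s")
```

These are peak figures only; sustained throughput in a real system will be lower.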
Today's ultra-powerful CPUs, GPUs and RAM are tied to a proverbial boat anchor. It's the motherboard with its inherent latency and bottleneck problems. Further to that is the I/O rate of the hard disk or how fast data can be lifted from or stored to the platters.
A way to increase After Effects render speed is to increase disk speed and this is accomplished by moving to a SCSI disk array. Unfortunately, within the restrictions of a home buyer's budget it would push the cost above an acceptable level. SCSI disks have a greater throughput of data than IDE disks. ULTRA160 SCSI disks deliver a maximum 160 MB/s and the newer ULTRA320 SCSI deliver 320 MB/s. The less expensive IDE drives can move data at a maximum of 100 MB/s (ATA100) or 133 MB/s (ATA133). We all know that actual performance with either SCSI or IDE is significantly less than theoretical boasts. Any of these disks in an array can further enhance performance with SCSI arrays reaching upwards of a theoretical 500 MB/s. Processors can handle a greater amount of data in After Effects but must wait around for the data to exchange with the hard drive.
CPU, GPU, Ram and hard drives work through the motherboard and therein lies the bottleneck. CPU, GPU and RAM may be able to accept and shovel out information with great speed and in huge gulps but the problem is that the pathway between components is relatively small and not nearly as fast. It's like trying to drain or fill a swimming pool with a garden hose. A solution is to get a heck of a lot more garden hoses or a bigger hose.
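To put those interface ceilings in perspective, here is a small Python sketch of the best-case time to move a given amount of data at each theoretical maximum. The function and dictionary names are illustrative only, and as noted above real drives fall well short of these numbers.

```python
# Theoretical interface maximums quoted in the text, in MB/s.
INTERFACE_PEAK_MB_PER_SEC = {
    "ATA100": 100,
    "ATA133": 133,
    "Ultra160 SCSI": 160,
    "Ultra320 SCSI": 320,
}


def best_case_seconds(payload_mb, interface):
    """Lower bound on transfer time: payload size divided by the interface peak."""
    return payload_mb / INTERFACE_PEAK_MB_PER_SEC[interface]


# Hypothetical 1000 MB of frame data, chosen only to make the comparison easy to read.
for interface in INTERFACE_PEAK_MB_PER_SEC:
    print(f"{interface}: {best_case_seconds(1000, interface):.1f} s")
```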
Both AMD and INTEL are backing solutions and each in their own way. AMD brings HyperTransport with a bigger hose and INTEL counters with the many hose analogy for PCI-Express, formerly known as 3GIO. INTEL also is deep into it with Infiniband. Infiniband is more of an outside of the box solution providing for reliability, availability, scalability and performance gains between data centers, such as server disk arrays. It isn't paramount to this article but worth mentioning as it will have an impact on how fast two systems can talk to each other. Both AMD and INTEL have the same goal to increase the amount and speed at which data moves through a system or device.
HyperTransport
Chipset? Who's got the chipset?
AMD HyperTransport Technology-Based System Architecture should be thought of on two levels; within the specific component and between components. In other words HyperTransport technology, when applied to a component such as a processor, can raise the bar on how fast it can complete an operation or how much it can process at any given time. HyperTransport, when applied to the pathway between components, increases the amount of data (bandwidth) and reduces the time for it to get around (latency). HyperTransport allows for the pool to drain or fill faster due to a very much larger hose.
HyperTransport promises some pretty hefty improvements to loosen the noose on bottleneck I/O problems. HyperTransport technology is used to provide high-performance interconnects between integrated circuits that comprise the system's core. Peripheral device interconnect is provided by existing industry standard busses such as USB, IEEE-1394, IDE, SCSI, Serial ATA, etc. In other words AMD is aiming to provide a large bandwidth, high speed platform. AMD makes the HyperTransport technology available and leaves the rest up to the other manufacturers. This may mean a bigger, better, badder motherboard.
HyperTransport Technology
HyperTransport technology is an advanced high-speed, high-performance, point-to-point link for integrated circuits. HyperTransport provides a universal connection that is designed to reduce the number of buses within the system, provide a high-performance link for embedded applications, and enable highly scalable multiprocessing systems. It was developed to enable the chips inside of PCs, networking and communications devices to communicate with each other up to 48 times faster than with existing technologies.
Compared with existing system interconnects that provide bandwidth up to 266MB/sec, HyperTransport technology's peak bandwidth of 12.8GB/sec represents better than a 40-fold increase in potential data throughput. HyperTransport technology provides an extremely fast connection that complements externally visible bus standards like the Peripheral Component Interconnect (PCI), as well as emerging technologies like InfiniBand. HyperTransport technology is the connection that is designed to provide the bandwidth that the new InfiniBand standard requires to communicate with memory and system components inside of next-generation servers and devices that may power the backbone infrastructure of the telecom industry. HyperTransport technology is targeted at the networking, telecommunications, computer and high performance embedded applications and any application in which high speed, low latency and scalability is necessary.
The AMD-8000 (HyperTransport) series of chipset components stack up to some large numbers promising a peak throughput of 12.8 GB/s.
AGP 8X doubles the bandwidth moving peak transfer rate up to the 2.1 GB/s notch.
PCI-X (not to be confused with PCI-Express) significantly improves data transfer rates from 100 and 133 MB/s all the way up to nearly 1 GB/s peak data transfer.
USB 2.0 allows for connecting exterior USB peripherals to access the system via a 480 Mb/s (60 MB/s) pipeline.
It's a very simplified explanation but it means that PC systems have the potential to make rather large performance jumps in the relatively near future. HyperTransport technology is a reality as evident by nVidia's nForce chip but don't expect full featured HyperTransport motherboards to find their way onto store shelves for some time to come.
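The "up to 48 times faster" and "better than a 40-fold increase" figures in the AMD text both come out of the same division: 12.8 GB/s peak against the 266 MB/s existing interconnect. A quick Python sketch using only the peak numbers quoted above (the names are mine, and decimal units are assumed, 1 GB = 1000 MB):

```python
BASELINE_MB_PER_SEC = 266  # existing system interconnect, per the AMD text

# Peak figures quoted above, converted to MB/s.
PEAK_MB_PER_SEC = {
    "HyperTransport": 12800,
    "AGP 8X": 2100,
    "PCI-X": 1000,  # "nearly 1 GB/s"
}


def fold_increase(peak_mb_per_sec, baseline_mb_per_sec=BASELINE_MB_PER_SEC):
    """How many times faster a peak rate is than the baseline."""
    return peak_mb_per_sec / baseline_mb_per_sec


for name, peak in PEAK_MB_PER_SEC.items():
    print(f"{name}: {fold_increase(peak):.1f}x")
```

For HyperTransport this works out to roughly 48x, which is where the marketing number comes from.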
More on HyperTransport technology can be found at the website and in an AMD white paper.
PCI-Express
All aboard the Express!
INTEL stands behind PCI-Express and Infiniband. The performance gains have been staked even higher than HyperTransport with an initial offering of 2.5 GB/s/direction up to a projected advance to 10 GB/s/direction and beyond. It appears that PCI-Express is initially designed to fit into the existing box and Infiniband is designed for improved connectivity out of the box such as connecting server data centers.
PCI Express architecture is described as a high-speed, general purpose serial I/O interconnect that provides the bandwidth for current and future applications. After reading about PCI-Express it is almost impossibly difficult to sum up this technology into a single sentence but the PR team managed to do so with a collection of words that commits to nothing yet sounds exciting. Nonetheless, PCI-Express has the same goal as AMD with one major difference. PCI-Express has been designed to fit with present technology. It also partners well with Infiniband.
HyperTransport is a new chipset entirely; thus, as an example, a brand new motherboard would be required. It is up to motherboard manufacturers, but in order to satisfy consumer demand there may come a time when motherboards feature a PCI-Express port as an option to add PCI-Express components. This may happen at relatively the same time that HyperTransport motherboards enter the marketplace. It's debatable as to which is the best approach. Is bolting on new technology to enhance the current platform the better route or is it best to start from an entirely next-gen platform?
A further question arises about data transfer to and from the hard drive platters. To get faster data transfer the disk needs to spin faster or the data algorithm has to be more compact or a combination of both. There comes a limit to how small the data can be made. Seagate explains;
Today, as the magnetic particles that make up recorded data on a hard disk drive become ever smaller, we are approaching a point where the data bearing particles are so small that random atomic level vibrations present in all materials at room temperature can cause the bits to spontaneously flip their magnetic orientation, effectively erasing the recorded data. Magnetic recording scientists and engineers have calculated that this so called superparamagnetic effect may become a serious technology issue for new products in only two or three years.
But as soon as it is said that it can't be done...
Seagate has decided to use a HAMR to cram more and more bits of information per square inch into hard disc drives, pushing the limits of magnetic recording even further beyond what was ever thought possible. The Company today demonstrated its revolutionary Heat Assisted Magnetic Recording (HAMR) technology, which records data magnetically on high-stability media using laser thermal assistance.
HAMR, combined with self-ordered magnetic arrays of iron-platinum particles, is expected to break through the so-called superparamagnetic limit of magnetic recording by more than a factor of 100 to ultimately deliver storage densities as great as 50 terabits per square inch. This will provide the capability for people to store the entire printed contents of the Library of Congress on a single disc drive in their notebook computers.
Hard drive space has increased at a phenomenal rate over the last 5 years. It used to be that 270 MB was considered a big disk and now 80, 100, and 120 GB drives are commonplace. (270 MB is less than one percent the size of a 120 GB hard drive.) Space increases and falling prices keep the consumer happy but what happens when the consumer turns their attention away from processor speed and disk space?
PCI Express and HyperTransport bring the promise of faster productivity on the computers that we work with today. This will buy time until hard drives become something more than they are and perhaps less integral to the real time operation of a system. Fitting the multitude of software and hardware architecture together into a coherent working solution may take time but it is on the horizon and we'll witness some form of its arrival sooner than later.
And where will it stop? Will we expect real time renders or projects rendered faster than real time? In whatever form it finally evolves, next generation technology could make today's super fast PC the 486 of tomorrow.
More on PCISIG can be found at their website and this FAQ. Also look to the other white paper on 3GIO. Infiniband information can be found at the website and in the FAQ.
Conclusion
Workstation class PCs were always thought of as very expensive and powerful beasts affordable to only those with deep pockets. Every day a new piece of hardware comes onto store shelves and if properly picked can make for some formidable computing power at very affordable prices. You don't need the best of the best hardware to do the work. Perhaps that diamond tipped, gold plated shovel isn't needed in the garden when a plain old spade will do the job just as well.
I commend those who waded through this. PC configuration is like a jigsaw puzzle; you need a few pieces of information to begin to see the big picture. After this you may be left with the question of what would we recommend? Our test system tackled the workload of a professional broadcast design department and performed well and even better than some existing systems. We thoroughly enjoyed the extra display that the Parhelia brought to the work environment. Remember that a workstation is not designed to be a competitive gaming computer even though the designers had to be told on several occasions to do work instead of playing Quake. The AMD processors made a few INTEL loyalists reconsider. All of them were like curious children when we broke from the beige box syndrome. Those that knew the price of professional 2D/3D workstations said...it cost what? If you are building from the ground up or just adding on...determine what you want first. If it is workstation graphic power then balance the GPU-CPU equation as a little more money invested in one or the other may deliver better results in the end.
Begin with the end. Getting more from a workstation, gaming or home multimedia PC is a matter of answering the questions of what is expected from the computer. Define your goals and get your hands dirty with a little research then you'll end up with a PC that is better suited to your tasks and, perhaps, your pocketbook. We built a system that made many users very happy. It also made my budget very happy as well. It is amazing the creative power that's available in computer hardware today.
In closing I'm reminded of an old saying. Give a man a fish and he'll eat for a day. Teach a man to fish and he'll eat for life. In other words; if I tell you what's best now you'll have the best for a day but if I teach you how to choose what's best for you then you'll have the best for life.
Icrontic extends their appreciation to the good people at ABIT, AMD, Matrox, GlobalWin and an ever-faithful AMKComputers for their assistance and involvement with this article.
Personal Opinion
The use of benchmarks, charts, graphs and a lot of technical talk are valuable in the price vs. performance equation but it all comes down to how a computer system feels. Marketing surveys may show results such as 9 out of 10 users thought it was fast but what happens if you are the 1 out of 10?
Our test system surprised us. Perhaps we were rooted in a MAC design world for too long or caught with our pants down when it came to keeping up with technology. The home PC enthusiast most likely upgrades more times in a year than an office does in 5 years. In unofficial comparisons our test system beat our single and dual processor G4's and nipped at the heels of a dual XEON Quadro system.
We didn't set out to build a gaming machine but we were able to play games and not worry about being blown up when our computer couldn't keep up. Softimage and After Effects are what interested us the most. Fast renders and an easy interface would make our head spin. The Matrox Parhelia brought great amounts of real estate and a great image quality but a few problems. Softimage is not the most well-behaved program at the best of times. It was cranky to begin with and within a few driver tweaks Matrox engineers had it under control. There are still a couple of bugs but they are getting harder to find and most wouldn't stumble across them. Softimage did have some very minor display problems with the second and third display but these should be gone with the release of the 1.01 drivers. The other problem wasn't the fault of Matrox but more us. Our cabinetry was configured for dual monitors and not for three. Nevertheless the Parhelia functions extremely well in single, dual or triple head mode. A lot of people hadn't heard of AMD, ABIT or GlobalWin and didn't know there were so many choices and options. They definitely marvelled at the AMK case.
We thought it couldn't be done on a budget. It's simply amazing the sheer computing power available at our fingertips. Immediately half the computers that were twice the price...were made obsolete.
Sure there were the doubtful who mocked and stood firmly by their convictions...as the familiar sound of the MACs crashing echoed down the hallway.
-
*Full* article text follows (part 1 of ?)
This is the full text of the article, unlike an earlier post, but it's going to appear as a series of posts. Bear with me, I'm having issues getting this in as one post, since it's about 90k. I thought the only problem was going to be finally retrieving all the pages after suffering through countless error messages. Unfortunately, slash-friendly HTML eliminates the pictures and tables, but I tried to keep the data in the tables. (Posting AC to avoid Karma wh0ring.) Enjoy.
Introduction
The power to create. Creative souls toil away inside the walls of the design department or in the dark confines of an edit suite in a television station. As the production manager I often see the graphic designers leaning back in their chairs staring at their monitors. When questioned I usually get the response...rendering. I'm often told there's a need for a second or third computer so they can do other work while one system is busy rendering. In the broadcast environment rendering usually means 1-4 hour waits for finished elements. If waiting for one system to finish a piece for use in a commercial or promotion it can be hell when there are deadlines to meet. Time is money. Waiting is frustration. Hardware should not dictate creativity.
People often assume that I work with immensely powerful computers in the television production world. Sometimes I do and those computers can come with price tags that the computer itself couldn't work out. Professional 2D/3D workstations are thought of as expensive and in today's market of shrinking profit margins the saying that you have to spend money to make money takes a back seat come capital request time.
So we here at Icrontic set out to build a bigger, better, badder workstation on a home PC budget.
The question of what is the best is not easily answered. Determining what is the best for your needs and expectations is a matter of knowing what your demands are and learning how to fulfill them. What is expected from the PC workstation? Do you want fast renders? Do you want to easily manipulate complex 3D scenes or drawings? Do you need the fastest processor, biggest video card, the most RAM or the fastest hard drives?
Can you do more for less?
That's what every manager wants to hear especially when assembling the yearly departmental budget. I'm in a unique position in my professional life and it allows a look at this problem from many sides; financial, user and builder. I wear one hat as the department manager. I wear a second hat as an active writer/producer/director who works daily with the department on television commercial projects. I wear a third hat as PR manager and a hardware reviewer for Icrontic. It means that I have no one to complain to but myself when it comes to the equipment not being fast enough. It also leaves me frustrated that the IT people dictate that I have to buy overpriced workstations when I know I can build two or even three systems for the same price.
So I unleashed a room full of designers on an affordable system we put together. (The image reminded me of a commercial, now a decade or three old, that features a gorilla doing his best to destroy a piece of luggage.) The designers are rooted in the MAC world and if a PC is required it has to be the hugely expensive and well-known order off the web workstations. (I'm not going to point fingers) Even the art director's personal home system is a three to four thousand USD dual 1.7 GHz Xeon workstation with an nVidia Quadro card.
Did we do it?
Simple answer? There isn't one. What looks good on paper may not perform well in reality. Benchmarks give some information but not the complete experience. More isn't necessarily better.
The best is a matter of debate but the smart consumer knows a lot about what they expect, a little about how it may work together and enough to choose the right combination of hardware. The following pages are just that; a guide to determine your expectations, answers to how it all works and a little bit of knowledge to make the right choices. Armed with this information you can more easily navigate the world of what's best for you through the ever-changing landscape of computer technology.
The big picture
The majority of PC consumers buy pre-built systems based on assumptions and budget. Today's PC consumer has more information at hand to select or build a PC that is better suited for their needs. Choosing or configuring the particular components is often based on how much, how fast and how big can it get while staying within a budget. The MHz rating of the processor is often the first consideration in this equation. The PC consumer looks for simple answers. Compromises are often made in order to divert budget to obtain a faster processor. The trap is that the assumption that more megahertz is better can short-change other components in a system, and the consumer ends up frustrated by performance that does not suit their needs.
The goal of this article was to build a PC, on an acceptable home buyer's budget, to function as a workstation capable of taking on 2D and 3D jobs in a broadcast television station. The hopeful conclusion will be to teach you that what you expect from the PC is the first question that must be answered before choosing the parts.
Defining broadcast industry standard in a PC has some grey area depending on where it is in the production chain. Broadcast video has to meet a set of parameters that can only be measured by a video waveform monitor. Expect to shovel out approximately $4000 USD to add this option for home use.
This PC will be used to output work that will eventually find its way to a non-linear editing system that assembles and outputs it to tape. While the display image is extremely important for the graphic designer it is the finished file itself that is eventually transferred to a format for playback to air. The video card will not be used to output a signal that will be recorded or used straight to air.
The work produced is either a completed piece or a collection of elements that are to be used in a completed piece. These elements may be produced solely or through the combination of 2D and 3D software applications such as Adobe Photoshop, Adobe Illustrator, Adobe After Effects and Softimage. For example, Photoshop files may be used as name supers or background elements. On a larger scale, After Effects may be employed to composite Photoshop, Illustrator and Softimage elements plus internally generated elements and effects to build a complex timeline that is rendered producing a finished piece or pieces. A waveform monitor is referenced at certain stages to ensure the output does not exceed acceptable levels.
The PC workstation needed to be the right balance of components that have the power to manipulate complicated 2D and 3D applications in real time then render at an acceptable rate. Image quality was important to display sharp, true images. This combination would be easy to obtain if money was no object...
But money is the object.
Is it possible to build a workstation on a budget then have it stand up against a room full of graphic designers? Broadcast production requires expensive specialty hardware but technology is making some leaps and bounds at the consumer level. Is it possible for the home PC buyer to have an affordable system that stands up to professional expectations?
As the old saying goes time is money and in the time it took to click a mouse a few times the price tag of a popular retail pre-configured workstation rocketed up to just over $11,000 Canadian or nearly $7,000 USD! That's not a typical home PC buyer's budget. Dude...we didn't have to get one!
So this is what we got.
The broadcast box:
- AMD 2100+ Thoroughbred Processor
- ABIT AT7 motherboard
- Matrox Parhelia 512 triple head video card
- 2 x 512 MB Micron PC2100 RAM
- Sony 52x CD
- LG 32x10x40x CDRW
- 40 GB Maxtor ATA133 Hard Drive
- 60 GB Maxtor ATA133 Hard Drive
- 2 x Samsung 950p 19" Monitors
- USB Keyboard and Logitech USB wireless Optical Mouse
- Globalwin CAK4-76T HSF
- AMK SX1000 modded PC case (window, fans, cables, loom)
- Enermax 465 Watt FC PSU
- Windows XP Professional
- Digital Doc5
The price tag came in just over $3500 Canadian or approximately $2200 USD.* That's nearly 70% less than the well-known pre-configured workstation priced out initially. It may still be expensive for family use but it had to do a little more. The crucial step in choosing a system is determining what is expected of it. If it is there to surf the Internet, write the occasional school essay and send/receive e-mail then a very economically priced computer can be built.
It's just that e-mail was the last of the concerns in a workstation.
*Prices include monitors and OS as of September 1, 02; currency converted from CND to USD. Source: www.atic.ca
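The savings figure can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the rounded prices quoted in this article; the function and variable names are ours, not part of the original review.

```python
def percent_savings(reference_price, build_price):
    """Return how much cheaper the build is, as a percentage of the reference price."""
    return (reference_price - build_price) / reference_price * 100.0

# Rounded prices quoted in the article.
prebuilt_cad, build_cad = 11000, 3500
prebuilt_usd, build_usd = 7000, 2200

savings_cad = percent_savings(prebuilt_cad, build_cad)
savings_usd = percent_savings(prebuilt_usd, build_usd)

print(f"Canadian dollars: {savings_cad:.1f}% less")
print(f"US dollars:       {savings_usd:.1f}% less")
```

Both currencies land a little under 70%, which is consistent with the rounded prices; the exact figure depends on the exchange rate used on the day.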
Choosing Chips Pt. 1
Choosing a system does begin with the processor as it determines choice of RAM and motherboard. This may lead to price differences that greatly affect the end product performance especially where a budget is concerned.
Choosing a processor used to be as simple as the most MHz for your money then add the other components to fit the budget. Intel has exploited public perception by raising the MHz bar ever higher. The question remains; is more...better?
AMD or INTEL: which to choose? These two companies play a rival game akin to David and Goliath where Intel's market share and marketing capital seemingly overwhelm AMD. Meanwhile AMD is the enthusiast's choice and many of these enthusiasts vehemently defend AMD for performance and where the smart money is. It's a lively discussion on whether the tables have turned and if INTEL is on the defense while AMD is on the offense. One cannot ignore the fact that the balance of power is shifting with AMD clawing away at INTEL market share. The reason AMD is gaining chips away at the very foundation of INTEL's claim that faster is better.
"The introduction of the highest-performing PC processor in the world is a victory for application performance and a resounding defeat for the 'megahertz myth,'" said Ed Ellett, vice president of marketing for AMD's Computation Products Group. "As the performance leader, the AMD Athlon XP processor 2600+ reigns as the superior choice and delivers outstanding application performance for richer, high-powered digital computing."
The chip wars float around catch phrases to attract consumer attention. The most common is the megahertz or gigahertz rating. The buying public believes more is better. INTEL proudly trumpets this fact and AMD challenges it squarely. In side by side comparisons between INTEL and AMD processors the difference in the performance line between the two can be very thin. To some the choice is quite simple but if it's not then you need to know a little bit about what is coming to market and why to at least help in the decision process between models of processors.
The latest advancement is the recent move from 0.18-micron technology to 0.13-micron technology by both INTEL and AMD.
What's a micron and how big is it?
A micron is pretty darn small. There are twenty-five thousand four hundred microns to one inch. A human hair can be anywhere from about 40 to 300 microns wide. A powerful microscope is needed to see an object that is one micron wide. An object that is one micron wide is smaller than most bacteria. That's how small a micron is.
AMD and INTEL have reduced processor manufacturing to the 0.13-micron scale. That means the smallest circuit feature in the processor is only 0.13 microns wide, or 130 nanometres. It's not like you could use your soldering iron to fix a broken connection. This is pretty close to the nanotechnology scale that is so often bandied about in the science fiction shows we watch.
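To put those sizes side by side, here is a small unit-conversion sketch. It relies only on the figures given above (25,400 microns per inch, a hair of roughly 40 to 300 microns) plus the standard 1,000 nanometres per micron; the function names are ours.

```python
MICRONS_PER_INCH = 25_400        # 1 inch = 25.4 mm = 25,400 microns
NANOMETRES_PER_MICRON = 1_000    # standard metric conversion

def microns_to_nanometres(microns):
    return microns * NANOMETRES_PER_MICRON

def microns_to_inches(microns):
    return microns / MICRONS_PER_INCH

feature_microns = 0.13           # the process size discussed in the article
hair_min, hair_max = 40, 300     # human hair width range quoted above

print(f"{feature_microns} micron = {microns_to_nanometres(feature_microns):.0f} nm")
print(f"{feature_microns} micron = {microns_to_inches(feature_microns):.8f} inch")
print(f"A thin hair is about {hair_min / feature_microns:.0f} features wide")
print(f"A thick hair is about {hair_max / feature_microns:.0f} features wide")
```

Even the thinnest hair in that range is a few hundred times wider than a single 0.13-micron feature.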
Why is smaller better? Processor chips are etched onto wafers of silicon. If the overall size of the chip is reduced then more chips can be etched onto a single wafer of silicon. This increase in the number of chips per wafer reduces the cost of manufacture which, we hope, will be passed on to the consumer.
Processor manufacturers aim for a balance between reducing size and increasing processor capability. If a 0.18-micron processor is made using 0.13-micron technology then the overall space taken up by the circuitry is reduced. Let's put this on a scale that is easier to visualize. If a home theater system is shrunk in size by 50% and the bulky 33" TV is replaced by a flat panel TV then there would be a lot of room left over in that wall unit of yours for more stuff. You may choose to buy a smaller wall unit or cram more stuff into it. Perhaps a compromise could be reached between adding more stuff and reducing the size of the wall unit.
Processor manufacturers do the same, striving to reduce the overall size but still pack on more stuff.
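How much room does the shrink actually free up? The sketch below assumes an idealized shrink in which every dimension of the same design scales with the process size, so area scales with the square of the ratio. Real chips do not shrink this cleanly, and the calculation ignores wafer edge loss and yield, so treat the result as a rough upper bound rather than a manufacturer's figure.

```python
def ideal_area_ratio(old_process_microns, new_process_microns):
    """Area of the shrunk design relative to the original, assuming every
    linear dimension scales with the process size (an idealization)."""
    linear_ratio = new_process_microns / old_process_microns
    return linear_ratio ** 2

area_ratio = ideal_area_ratio(0.18, 0.13)
dies_per_wafer_gain = 1 / area_ratio   # same wafer area divided among smaller dies

print(f"Shrunk design occupies about {area_ratio:.0%} of the original area")
print(f"Ideal gain in chips per wafer: about {dies_per_wafer_gain:.2f}x")
```

Under these assumptions the same design takes up roughly half the silicon, which lines up with the 50% home theater analogy above.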
Choosing Pt. 2
Good things in small packages.
Smaller is better and the additional stuff is notably an increase in L2 Cache. This may be a term that is familiar but not quite understood. Cache is small, fast memory located on the CPU. CPU Cache holds the most recently accessed code or data. This SRAM is accessed much faster than your main system memory because it's located right on the processor core. Processor manufacturers started to increase the amount of L2 Cache due to demands that software was making on the CPU. Manufacturers are also looking to increase the speed of this cache. The more data or code the L2 cache can contain and the faster it can process should mean an increase in system performance.
Voltage x resistance = bad.
The more that is packed onto a processor and the more it can do, the more electrical power it takes. This simply translates into an increase in heat as MHz and technology increase. Reducing the scale or die size of the processor reduces the required voltage for the processor to properly function. An electrical signal traveling through a circuit meets resistance along the pathways. This resistance becomes heat, similar to the heat of friction when you vigorously rub your hands together. If the distance the signal needs to traverse is reduced then the signal requires less energy to get around and thus encounters less resistance. Less voltage and less resistance equal less heat.
If the die size has been reduced then why the increase in heat as MHz increases? Quite simply no matter how small an engine is made it will get hotter as it runs faster. An important point to note is that faster processors do require more voltage at certain stages but they always generate more heat as the MHz climbs. By building processors on a smaller scale the heat curve has effectively been bumped down from previous, larger processor dies. AMD has also engineered other design and manufacturing tweaks to assist in the challenge of reducing thermal output and increasing speed. We all know heat is the enemy of any processor. Heat is a hot subject of discussion. Consider the following equation.
(Faster + voltage) = temperature - (fans x dBA)
This equation was just made up for this article but it states that faster requires more voltage and where temperature is the variable the number of fans or dBA of those fans must increase to provide balance to the equation in an air-cooled system. The faster you want to go means you need more cooling which could mean more fans to provide that cooling and thus more noise. There are solutions later on in this article.
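For readers who want something less tongue-in-cheek, a widely used first-order approximation for the switching power of CMOS logic is P ≈ C × V² × f (capacitance times voltage squared times clock frequency). That formula is not from this article, and the sketch below ignores leakage and treats capacitance as constant; the voltages and clock ratios in the examples are hypothetical illustration values, not measurements of any processor mentioned here.

```python
def relative_dynamic_power(v_old, v_new, f_old, f_new):
    """Ratio of new to old switching power under P ~ C * V^2 * f,
    with the capacitance C assumed unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Hypothetical example 1: lower the core voltage, keep the clock the same.
lower_voltage = relative_dynamic_power(v_old=1.75, v_new=1.65, f_old=1.0, f_new=1.0)

# Hypothetical example 2: raise the clock by 20%, keep the voltage the same.
faster_clock = relative_dynamic_power(v_old=1.65, v_new=1.65, f_old=1.0, f_new=1.2)

print(f"Lower voltage, same clock: {lower_voltage:.3f}x the power")
print(f"Same voltage, 20% faster:  {faster_clock:.3f}x the power")
```

The point matches the text: a small drop in voltage pays off more than proportionally because voltage is squared, while a faster clock always costs more power and therefore more cooling.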
Get on the bus.
The Front Side Bus speed is the MHz rating at which data is transferred between the processor and the rest of the system. Theoretically, a higher FSB results in a faster processor. The goal is to maximize the processor speed to perform tasks quickly and efficiently. Currently the Front Side Bus on AMD processors is at 266 MHz, with speculation that AMD has a 333 MHz FSB processor in the works.
Lastly is the inner working of the processor circuitry. This cannot be easily explained but it is safe to say that each of the rivals in the chip wars is constantly developing, refining and perfecting its processors to crunch numbers faster and in greater gulps.
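As a rough illustration of why the FSB rating matters, peak bus bandwidth is simply the effective transfer rate multiplied by the width of the bus. The sketch below assumes a 64-bit (8-byte) data bus, which the article does not state, and it gives theoretical peaks only; real-world throughput is lower.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data bus, not stated in the article

def peak_bandwidth_gb_s(effective_mhz, bus_width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return effective_mhz * 1_000_000 * bus_width_bytes / 1_000_000_000

current_fsb = peak_bandwidth_gb_s(266)   # the FSB speed quoted above
rumoured_fsb = peak_bandwidth_gb_s(333)  # the speculated next step

print(f"266 MHz FSB: about {current_fsb:.2f} GB/s peak")
print(f"333 MHz FSB: about {rumoured_fsb:.2f} GB/s peak")
```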
Now you know everything....not a chance.
This little bit of knowledge can be a dangerous thing when it comes to determining which processor is better. A consumer may come to the conclusion that INTEL processors are faster than AMD processors on the details that were just explained that:
- The higher the MHz the better
- The higher Front Side Bus Speed the better
- The more L2 Cache the better
- The lower the voltage the better
Specification | INTEL | AMD
Processor Frequency | 2.8 GHz | 2600+ (2.133 GHz)
Thermal Design Power | 68.4 W | 62 W
Bus Speed | 533 MHz | 133 MHz (266 MHz DDR)
Core Voltage | 1.50 V | 1.65 V
L1 Cache Size | 8K | 128K
L2 Cache Size | 512K | 256K
L2 Cache Speed | 2.53 GHz | 2.13 GHz
Die Size | 0.13 micron | 0.13 micron
It's easy to see that assumptions may lead a consumer to believe that the INTEL product is a better processor. These basics may have some validity on paper but not so in the real world. Why the lesson on MHz, die size, bus speeds and cache? The lesson is not which processor is better. The lesson is to not make performance assumptions based on the belief that bigger numbers are better.
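To see how the "bigger numbers" trap plays out, the sketch below applies the four rules of thumb listed earlier to the figures from the comparison table and counts the wins. The scoring scheme is our own illustration of the naive reasoning, not a benchmark, and it says nothing about real-world performance.

```python
# Figures copied from the comparison table in this article.
specs = {
    "INTEL": {"clock_ghz": 2.8, "fsb_mhz": 533, "l2_cache_kb": 512, "core_voltage": 1.50},
    "AMD": {"clock_ghz": 2.133, "fsb_mhz": 266, "l2_cache_kb": 256, "core_voltage": 1.65},
}

# The four rules of thumb: True means "higher is better", False means "lower is better".
rules = {
    "clock_ghz": True,
    "fsb_mhz": True,
    "l2_cache_kb": True,
    "core_voltage": False,
}

def naive_tally(specs, rules):
    """Count how many rules each processor 'wins' on paper."""
    wins = {name: 0 for name in specs}
    for key, higher_is_better in rules.items():
        pick = max if higher_is_better else min
        winner = pick(specs, key=lambda name: specs[name][key])
        wins[winner] += 1
    return wins

tally = naive_tally(specs, rules)
print(tally)
```

On paper the tally is a clean sweep for one side, which is exactly the kind of conclusion the article warns against drawing from spec sheets alone.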
- AMD 2100+ Thoroughbred Processor
-
Re:Soggy Chips?
AMD Saxony Operations Unaffected by Dresden Flooding
DRESDEN, GERMANY -- August 15, 2002 --AMD (NYSE: AMD) said today that its AMD Saxony operations located in Dresden, Germany - including production at the facility's Fab 30 plant - continue to operate normally despite severe flooding across Germany's Saxony region.
"Although much of the larger Dresden area is being affected by unprecedented floods, our production is running according to plan and employee morale remains high," said Hans Deppe, vice president and general manager of AMD Saxony. "Because of the preventive controls built-in to our facility and the exemplary dedication of our workforce, we expect to continue to operate normally despite the conditions."
AMD Saxony has its own on-site power plant, and remains accessible via the Dresden airport and federal highways. AMD Saxony's operations, including Fab 30, are located high up on the rim of the river valley and have not been directly affected by the flooding in other parts of Dresden and surrounding areas. The company does not expect that operations will be impacted even if the local flood situation worsens.