This just in, spending $200 per MB for memory to get a 0.01% speed boost has been determined to be not worth it.
It is cool to have low-latency, high-frequency memory but honestly... in the grand scheme of things it doesn't really matter unless you're much faster or much slower.
A CL from 2.5 to 2 is faster... but you're not going to really notice it.
Now say double the bus width or frequency with a CL of 2... that would be a nice increase in speed.
The other problem is these DDR/QDR schemes... if you want say 5 bytes of some line in memory somewhere... you're at the mercy of the address speed not the read speed.
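To put rough numbers on the CL point, here's a quick back-of-the-envelope sketch. The 200 MHz memory clock is my assumption for illustration [think DDR400-class], and `cas_latency_ns` is just a made-up helper name, so plug in your own part's numbers.

```python
def cas_latency_ns(cl_cycles, clock_mhz):
    """CAS latency in nanoseconds: cycles divided by the clock rate."""
    return cl_cycles * 1000.0 / clock_mhz

CLOCK_MHZ = 200.0  # assumed memory clock, not a figure from the post

cl25 = cas_latency_ns(2.5, CLOCK_MHZ)          # CL 2.5 at the assumed clock
cl2 = cas_latency_ns(2.0, CLOCK_MHZ)           # CL 2 at the same clock
cl2_fast = cas_latency_ns(2.0, 2 * CLOCK_MHZ)  # CL 2 with the clock doubled

print(f"CL2.5: {cl25} ns, CL2: {cl2} ns, CL2 at 2x clock: {cl2_fast} ns")
```

Under that assumption the CL drop buys you 2.5 ns per access, while doubling the clock at CL 2 halves the wait and doubles the transfer rate on top.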
No, but I've been dragged into too many "let's try to make money off Tom's public domain code" meetings not to know buzzspeak. I usually have self-defense mechanisms [re: shut ears off] when they start in on "getting the market traction going"...
I've been lucky so far and nobody has caught on that I spent over half my time in meetings mentally undressing the cutest female in the office [which varies with location] or just playing SMB1 in my head... which I can do fairly well at this point...
Why this hasn't been done sooner.... boggles my mind.
I think many governments [Canadian included] seem to forget that THEY work for US. By making tax forms impossible to decipher [in a timely fashion] and deductions anything but easy to sort out they're just pissing off their customers and wasting everyone's time.
Provided they handle the data securely I can't see a problem with this idea. I just wish my own Ontario government was smart enough to do this.
Benchmarks are often useless, first off what the fuck operations does "content creation" or "business app" do?
I do things like "compile source" or "make RSA keys" and can measure the time gaps [in favour of the AMD/AMD64] with a fucking wrist watch!!!
Sure the P4 can do some things VERY quick [e.g. 128-bit load/store or SSE2] but because its ALU is so very inefficient it dies on pretty much anything else.
The Athlon Barton [32-bit core] shares the overall ALU design with the AMD64. So to say only the 64 comes close to the P4 is a bit loaded. At the same compiling/bignum work the Barton can hold its own and/or beat it.
They're not in other countries, they're temporarily annexing small bits of other nations... Get it straight. Granted the USA does do some nifty humanitarian work, but they burn any street cred they get when they start "liberation" efforts.
Though that said I don't get the politicians... I mean they have family and friends right? How long before a senator's brother or something gets arrested under the new "Ultra Patriot Mega Act 2000" law which states that browsing the web outside designated hours is a crime?
What I'm trying to say is they and their close family/friends/etc have to live with the laws/regulations they're coming up with. To think they're outside the scope of the law in a bubble shows how arrogant and out of touch some of them [on both sides of the R and D camps] are..
"The idea is that you can put more ALUs on the chip if they're simpler and don't have all of the complexity required to support out-of-order execution."
Trans...may....tah....
They got eaten alive on that issue.
Turns out for general purpose software you do need an OOO [out-of-order] engine. The only time you can really get away without one is if you are really hardware specific in your code [e.g. a cell processor] where you know the delays of memory/execution resource and can schedule the code effectively on your own.
Their thread design may be suited for servers but if the servers are using off the shelf code [e.g. php on apache with some cgi in C/php/etc] then you really need a kickass compiler and tightly bounded hardware or an ALU that can work well on the fly.
And again, threading is not about duplicating execution resources [e.g. an ALU] but about providing multiple register sets to flow through the core. If your ALU is well built and getting a high IPC [e.g. power efficient] then threading won't help.
What does help servers is multiple execution cores, tightly coupled memory and high bandwidth low latency disk.
No everyone should be pansy little asses, never question anything and always put up with substandard quality.
I'm sorry, but computer science is not about the latest thing java can hide from you like how to manipulate strings. I'm sorry... THAT'S WHAT THE PROGRAM IS ABOUT!!!
It's like saying "oh don't learn calculus, just put the question into Magma and use the result."
Things like Java and C and what not are good to know, but are not a computer science degree.
If you walk out of university with no understanding of how a cpu works [at least from the ISA standpoint] or how to implement a sorting algorithm or a searching algorithm or etc... then you're effectively useless...
People sit and bitch about low quality software all the time [specially on /.] but then never stop to question the people writing it or their half-ass lazy professors from college/university.
Not to say all school is bad. In college we did learn about searching/sorting/compilers/assembler/etc... so the course load was well rounded. I just hate the new programs which tend to focus solely on say doing everything in Java or using existing libraries for all tasks.
My point though is there is more to running a server than "conns/sec". Throwing more resources at the problem means taking more electricity to run and to cool [via air cond].
So it can be more advantageous to efficiently run tasks and round-robin than to have N slow ALUs that run in parallel because you're not doing cache coherency, clock distribution, etc...
I don't know the specs of this new design. My point was just to raise several comments for food for thought. Threading just doesn't pay off for efficient ALU designs. Multi-ALU does pay off though, but effectively if you double the ALU and double the registers... you have a dual core...
Personally I find what AMD is doing is more interesting. They're reducing power while maintaining IPC and their newer designs [e.g. the X2] compare entirely favourably to the latest Intel offerings [e.g. the Intel 84x series].
Because developers are looking for the shortest TTM doesn't mean that hand-crafted assembler is moot.
The fact of the matter is the VM occupies both cpu time and memory. Unless you implement the VM in hardware in which case what's the difference? You could just write an x86-vm...
Yes there are 96 processes running on my computer.
At any given load maybe 1 of them is active. If I'm bziping a 900MB tarball it's a single process [* though you can actually split bzip2 into many processes they're still chained...].
Like how many times are you doing DSP operations while writing gigabytes to disk and communicating with other threads at the same time? Mostly you're either taxing the I/O or you're taxing the ALU.
Again it's a space/time tradeoff. I don't know how many times I can say this...
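On the bzip2 footnote: a minimal sketch of what splitting the job could look like, using Python's standard `bz2` module. The chunk size, worker count and the `compress_in_chunks` name are all my own picks for illustration, not how any particular tool does it. Note that the split up front and the in-order join at the end are still serial, which is presumably the "chained" part.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

def compress_in_chunks(data, chunk_size, workers=4):
    """Compress fixed-size chunks independently, then join the streams in order."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands results back in input order, so the join below is safe.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)

payload = b"tarball bytes " * 50_000             # stand-in for a real tarball
packed = compress_in_chunks(payload, chunk_size=100_000)

# bz2.decompress() accepts several concatenated streams and returns the joined data.
restored = bz2.decompress(packed)
print(len(payload), len(packed), restored == payload)
```

Each chunk gets compressed on its own, so you lose a bit of ratio compared to one big stream. That's the space/time tradeoff again.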
Sure the Niagara has 4 ALUs but are they as power efficient as the single ALU in the AMD64 or say a PPC even?
Back to the car analogy, you have 4 people going to work. They can take one car that will burn 1 gallon of fuel and get there in 10 minutes.
In this case they either burn 4 gallons of fuel and get there in four cars [one each] in 10 minutes or they burn 1 gallon and get there in 40 minutes.
Where this shines is if they all want to go to four different places. Instead of taking a round trip of 40 minutes [10 each say] they take 10+e time [e == delay because of traffic] so say e=10 so 20 minutes.
How is this an "improvement"? They're proving that using more resources can get better performance. Yippee.
Can multiple cores improve throughput? You bet! Do multiple threads really help? No way!
Keep in mind the scale here. A 1000-cycle task switch is NOTHING compared to the 2.2 million cycles a process has [2.2GHz clock] in the typical 1ms style timeslice. 1000 cycles == roughly 0.05% of the total execution time. In a proper OS though the timeslices would be larger for tasks that are higher priority which means the actual time taken to swap tasks is minimal.
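If you want to check the arithmetic yourself, here's a small sketch. The 2.2GHz clock, 1ms slice and 1000-cycle switch are the numbers from above; the 10ms slice is just an assumed example of a "larger" timeslice, and `switch_overhead` is a made-up helper name.

```python
def switch_overhead(clock_hz, timeslice_s, switch_cycles):
    """Fraction of a timeslice eaten by one task switch."""
    cycles_per_slice = clock_hz * timeslice_s
    return switch_cycles / cycles_per_slice

short_slice = switch_overhead(2.2e9, 1e-3, 1000)   # numbers from the post
long_slice = switch_overhead(2.2e9, 10e-3, 1000)   # 10 ms slice is an assumed example

print(f"1 ms slice: {short_slice:.3%}, 10 ms slice: {long_slice:.4%}")
```

The 1ms case comes out a bit under 0.05%, and stretching the slice by 10x shrinks the overhead by the same factor.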
And even still, hardware assist task swapping and "multi-threading" are not the same thing. I'd rather see an efficient ALU with hardware assist [e.g. a local cache or something for that task] than a multi-threaded ALU with lax performance.
Still a space/time problem. Will your four ALUs be better [in terms of efficiency] than my one high performance ALU?
If it takes you 2x the area to get 2x the performance you've entered into a "no duh" region.
If your multi-threaded 1x the area core gets >1x the performance then you have something to talk about.
In the AMD X2 case the dual-core is faster but nobody is saying "gee whiz that must be some new ideas there!" it's really a bigger chip with more transistors...
The future of computing lies not in "what's the biggest we can build" but what's the most efficient.
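The "no duh" test above boils down to performance per unit of area. A tiny sketch, where `perf_per_area` is a made-up helper and the 1.2x/1.05x figures are hypothetical numbers picked only to show what a real win would look like:

```python
def perf_per_area(relative_perf, relative_area):
    """Performance gained per unit of die area, relative to a baseline core."""
    return relative_perf / relative_area

dual_core = perf_per_area(2.0, 2.0)      # 2x the area for 2x the performance
hypothetical = perf_per_area(1.2, 1.05)  # assumed: 20% faster for 5% more area

print(f"dual core: {dual_core:.2f}, hypothetical design: {hypothetical:.2f}")
```

A ratio of 1.0 means you just paid for what you got. Anything above 1.0 is where there's something to talk about.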
The idea is adding register sets [re: threads] somehow makes the process more efficient. My comment is that if your ALU pipeline is well stuffed another thread won't have the execution resources it needs [and likely just get in the way anyways].
So if you make a shoddy ALU that stalls a lot another register set can get you better performance overall [but not for individual threads] and if you make a good ALU your extra register set in hardware buys you VERY LITTLE.
A dual core cpu is something else. That's two execution engines which can run in parallel. That's not the same thing as multi-threading [in the sense of HT].
But you still have a space-time trade off.
Putting 32 really shoddy ALUs in a core doesn't help if you have a single intensive task [e.g. compiling a large file or bzip'ing a tarball].
And putting 32 really good ALUs isn't feasible [now] as it takes too much power/space to be reliably implemented.
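One way to put numbers on the "32 shoddy ALUs" point is an Amdahl's-law style estimate. That's my framing, not something the post spells out, and the 0.5x unit speed and 95% parallel fraction are assumed values for illustration.

```python
def speedup(parallel_fraction, n_units, unit_speed=1.0):
    """Amdahl-style speedup relative to one full-speed unit.

    unit_speed scales every unit (0.5 means each ALU is half as fast).
    """
    serial = 1.0 - parallel_fraction
    return unit_speed / (serial + parallel_fraction / n_units)

# A single intensive task: nothing to spread across the 32 units.
single_task = speedup(0.0, 32, unit_speed=0.5)

# A workload that is 95% parallel (assumed figure) on the same hardware.
mostly_parallel = speedup(0.95, 32, unit_speed=0.5)

print(f"single task: {single_task:.2f}x, 95% parallel: {mostly_parallel:.2f}x")
```

With nothing to parallelize, the 32 half-speed units run the job at half the speed of one good ALU. They only pull ahead once the workload actually splits up.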
This is total f'ing hype. If you have an efficient ALU multi-threading won't help crap [in the hardware front, it does in software where you may have blocked threads, etc...].
Think about it this way. You have one car that can carry you and your buddies to work at 50mph and two cars that can take you and your buddies to work at 30mph.
Sure the two cars let you do independent things but when you're working on one task [getting to work] you're not ahead.
In a video game context for instance, you do have multiple threads but the big ones are where 99% of the time is spent [e.g. AI, TL, models]. Giving EQUAL processing resources to something as trivial as audio or network code isn't very smart.
Hyperthreading only pays off for the Intel P4 because the ALU is so notoriously weak that it has the bubbles in the pipeline that another thread can fill.
This isn't true about all processors. Sure HT could work with the AMD64 but you'd see such a marginal [if any] improvement that the size increase would make it cost ineffective.
First off, performance + java != good idea. Not trying to camp fanbois here but if you really need "down to the metal" performance you're writing in C with assembler hotspots.
So the observation that there is too much locking in Java's standard API is informative but not on-topic. The fact that the standard solution is to use a completely new class [e.g. StringBuilder] is why I laughed at my college profs when they were trying to sell their Java courses by saying "and Java is well supported with over 9000 classes!".
In the C and C++ world things get extended but also fixed at the same time. We can still use the strncat function which has been around for a while EVEN IN threaded environments...
Also, he totally fails to point out that extra threads [e.g. register sets] only pay off when the pipeline is empty. So it's a catch-22. You either have a very efficient pipeline that you can cram full of a single thread's instructions or you have a shoddy one where your only hope is to mix in other threads.
Think about it. If you only have one ALU and 32 threads that means each individual thread works at 1/32 the normal speed. Even if they're a lower/higher priority!
That then gets into two camps. Are you threading because the performance of the pipeline sucks [e.g. dependencies in the P4] or because you want to interleave instructions [e.g. twice the clock rate but half the performance]? If it's the latter then even if you turn off 31 of 32 threads you still end up with one weak ALU.
Consider the AMD64 for instance. It usually gets an IPC that is pretty high [usually in the 1.5-2.5 range] which means that it's retiring instructions from a single thread at pretty much the entire capacity of the chip. Adding extra threads doesn't help.
Consider then the P4. It usually gets an IPC of 0.5 to 1 [for ALU code, which is observable by the fact it's about as fast as a half-clockrate Pentium-M]. This means its two ALUs are not always busy and an additional thread could bump the IPC up to the 1-1.5 range.
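Here's a toy model of the bubble argument, nothing more. The single-thread IPC values are picked from inside the ranges quoted above; the peak IPC values are my assumption [the top of each quoted range], not measured hardware limits, and both helper names are made up.

```python
def idle_fraction(single_thread_ipc, peak_ipc):
    """Share of issue slots one thread leaves empty (the 'bubbles')."""
    return max(0.0, 1.0 - single_thread_ipc / peak_ipc)

def best_case_smt_ipc(single_thread_ipc, n_threads, peak_ipc):
    """Upper bound: extra threads fill bubbles until the core saturates."""
    return min(single_thread_ipc * n_threads, peak_ipc)

# Peak values are assumptions (top of each range quoted above), not measured.
P4_PEAK, AMD64_PEAK = 1.5, 2.5

p4_idle = idle_fraction(0.75, P4_PEAK)       # P4 thread at IPC 0.75
amd_idle = idle_fraction(2.25, AMD64_PEAK)   # AMD64 thread at IPC 2.25

p4_two_threads = best_case_smt_ipc(0.75, 2, P4_PEAK)
amd_two_threads = best_case_smt_ipc(2.25, 2, AMD64_PEAK)

print(f"P4 idle slots: {p4_idle:.0%}, best case with 2 threads: {p4_two_threads}")
print(f"AMD64 idle slots: {amd_idle:.0%}, best case with 2 threads: {amd_two_threads}")
```

In this model the second thread can only ever claim the idle slots, so the core that already keeps its pipeline full has almost nothing left to give.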
I know [for instance] that with HT turned on my 3.2GHz Prescott compiles LibTomCrypt in close to the same time as my 2.2GHz AMD64 [the P4 takes 5 seconds longer, without HT it takes about 15 seconds longer].
So the only saving grace is an efficient ALU so that you can run single tasks at least somewhat efficiently. Then tacking on the extra threads doesn't help as an efficient ALU won't have many bubbles where other threads could live.
So you end up with essentially a hardware register file but still 1/2 the performance. Remember that the goal of multi-processing is closer to 'n' times faster with n processors.
The BEST a single core multi-thread design can hope for is the performance of a single core single thread design...
Whoopee...
Multi-threading is NOT the future. Multi-cell is. Where you have dedicated special purpose [re: space optimized] side-cores that do things like "I can do MULACC/load/store REALLY REALLY QUICK!!!".
If your job is to be a producer of documents... you'd think you should know your trade.
Why is it we expect a heart surgeon to have a full medical background, a software developer a full math/logic/etc background, etc, etc...
Yet someone whose only job is to document things [given stimuli] can't be called upon to use proper tools?
Sure for quick letters to Mother about how college is or something Word/OO.o are just fine.
However, if you're employed to produce data sheets or internal tech specs or something where consistency and presentation count... something like TeX makes your job a million times easier. Specially in a multi-user environment [hint: you can't use CVS to have multiple simultaneous editors of a Word document...]
For the love of god learn to use TeX and be done with it....
For quick documents OO.o is more than fine. For large documents with multiple editors where you need a "consistent" look throughout the ages [hint: TeX hasn't changed in the last 30 years or so and STILL IS USED to produce professional text books, papers, etc] you can't beat TeX or the macro extensions of LaTeX.
For instance, with TeX the source is a simple text file which means multiple people can CVS-edit the files...
That's the freaking point. He's a dude in the consumer end of things bitching about what he DOESN'T see or what he does see that is WRONG.
He certainly has a point that stats are spun any way that sells. Classic example would be the Pentium4. The high clock rate is meant to show that it outperforms the competition when in fact Intel's own lower-clockrate processors often eat it.
Similarly you have all these TCO studies against Linux that are usually totally FOS.
So some guy with a blog wrote about what is useless and what he wants to see in marketing. Who are you to say he doesn't have something to say that is worth discussing here?
Difference is AMD's quadcore will be faster and take less power than Intel's single core ;-)
hehehe
Ok, fanboy I may be but at least AMD is taking actual strides in MEANINGFUL improvements [e.g. low-power equal-performance AMD64 Venice core] whereas Intel [outside of the PentiumM] is relying solely on a massively high clock rate [with a massively inefficient ALU] to get attention.
I mean why is it at something like bignum math or compiling a half clockrate AMD or PentiumM can get equal or better wall-time per operation when compared to a Northwood or Prescott P4?
So it may seem absurd to have 4 cores on one die but they're not half-ass slapped together inefficient designs.
AMD took the time to design HT [HyperTransport, the chip interconnect, not Hyperthreading] such that things like this would be efficient.
Yay, all the power of a 2.4GHz AMD64 using only two to three times the power and at seven times the cost!
Go Apple!
And people wonder why us non-Mac folk don't take them seriously... Cuz in reality the next Intel based Apple laptops will be using [most likely] a sub-3.2GHz processor which the AMD64 will just totally fucking own on an efficiency/cost basis.
synergy.
They're leveraging a cross-brand multi-market upscale potential to maximize their mindspace and returning revenue streams.
Tom
Yeah maybe them personally but their family? their friends? etc... How far are they willing to stick their neck out for?
Tom
shhhhh that's hippie liberal thinking.
Tom
You can have multiple users of words like that so long as they don't compete [hint: Apple... before iTunes].
That and I don't see why they care. Spend more time developing and less time lawyering.
Tom
Chances are if I have a very expensive and dedicated task for which I need a $4000 processor... I'll know which ISA to optimize for.
Tom
learn...to...profile...
Tom
If you were at uni for "computer science" your profs did you a disservice.
Tom
In other words, "yet another press release on /.".
Tom
AMD's implementation of x86_{32|64} is a bit more sane and performs much better.
Sure the x86 ISA is bloated but once you get past the decoder it's all RISC underneath baby.
Tom