Domain: realworldtech.com
Stories and comments across the archive that link to realworldtech.com.
Comments · 215
-
Re:Geekbench is Shit
https://www.realworldtech.com/...
Geekbench 4 (used here) gets Linus's seal of approval.
-
Re:Same shit, same real world problems.
Monolithic kernels had a syscall performance advantage
... before the Meltdown patches were applied.
The security advantage of microkernels is that you can reduce the attack surface by making your Trusted Computing Base as small as possible.
The seL4 microkernel has been formally proven safe. That is possible because it is small enough, but it still took years. You can't do that with something as large as Linux.
The locus of OS research seems to have moved on to multikernels for better performance on systems with many cores. A multikernel basically means running a separate microkernel on each core, with each core's kernel chosen to be the best available for the tasks that run on it.
Linus, on the other hand, does not like CPUs with many cores ...
-
Re:Impossible!
Hmm, I see what you mean about unaligned pointers and ARM. Support was added in ARMv6 and it's still there in ARMv8 AArch64 mode.
AArch64
https://www.realworldtech.com/...
ARMv6
http://infocenter.arm.com/help...
Frankly, if I were Microsoft and someone suggested adding support for non-native kernel-mode code, I'd say "What an excellent idea. Why not suggest it to Dave Cutler, right now!" And then that would be the last you'd see of them.
-
Note they only go back to 6th generation
I.e. the 6700K.
I.e. all the chips have PCID
It's a bit hazy when PCID and INVPCID became supported.
This says PCID was first supported in Westmere
https://www.realworldtech.com/...
Another long overdue improvement to the page tables is the Processor Context ID (PCID). The PCID is a field in each TLB entry that associates a given page to a process. Previously, Intel's TLB could only contain entries from a single process and whenever the CR3 register was written (e.g. a context switch), the TLB was flushed. The PCID lets pages from different processes safely inhabit the TLB together, so that CR3 writes no longer flush the TLB. Whenever a process tries to access a page in memory, the PCID is checked to determine whether the page is actually mapped into the process' address space; if the PCID does not match then a TLB miss occurred. This is very much analogous to Intel's VPID, which enables the TLB to contain pages from different virtual machines and avoid TLB flushes during VM transitions.
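The effect described in the quote can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `ToyTLB` class, its method names, and the workload (two processes each touching the same four virtual pages) are made up for illustration, and a real TLB has limited capacity, associativity, and global pages that are ignored here.

```python
class ToyTLB:
    """Toy TLB model; real hardware is far more complex."""

    def __init__(self, use_pcid):
        self.use_pcid = use_pcid
        self.entries = {}      # (tag, virtual page) -> fake physical frame
        self.current = None    # process whose page tables are active
        self.misses = 0

    def write_cr3(self, pcid):
        """Model a context switch to the process identified by pcid."""
        if not self.use_pcid:
            self.entries.clear()   # untagged TLB: every CR3 write flushes
        self.current = pcid

    def access(self, vpage):
        """Look up a virtual page, counting a miss if it is not cached."""
        tag = self.current if self.use_pcid else 0
        key = (tag, vpage)
        if key not in self.entries:
            self.misses += 1       # TLB miss: a page-table walk would happen
            self.entries[key] = ("frame", self.current, vpage)
        return self.entries[key]


def run(use_pcid):
    """Alternate between two processes and return the total miss count."""
    tlb = ToyTLB(use_pcid)
    for _ in range(3):
        for pcid in (1, 2):
            tlb.write_cr3(pcid)
            for vpage in range(4):
                tlb.access(vpage)
    return tlb.misses


print("misses without PCID:", run(False))
print("misses with PCID:", run(True))
```

In this toy workload the untagged TLB misses on every access after each switch (24 misses), while the tagged one only misses the first time each process touches a page (8 misses).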
The LWN patch says
http://lkml.iu.edu/hypermail/l...
PCIDs are generally available on Sandybridge and newer CPUs. However,
the accompanying INVPCID instruction did not become available until
Haswell (the ones with "v4", or called fourth-generation Core). This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.
I.e. it'd be interesting to see what happens on a CPU too old to support enough of PCID/INVPCID to optimize KPTI.
The claims of >10% hits are all for these old CPUs.
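One way to see which case a given machine falls into is to look for the `pcid` and `invpcid` flags that Linux lists in /proc/cpuinfo on x86. A minimal sketch, assuming a Linux system; the helper name `kpti_tlb_features` is made up for this example:

```python
def kpti_tlb_features(cpuinfo_text):
    """Report whether the 'pcid' and 'invpcid' flags appear in cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux prints one "flags : ..." line per logical CPU.
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {"pcid": "pcid" in flags, "invpcid": "invpcid" in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(kpti_tlb_features(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

A CPU that reports `pcid` but not `invpcid` would match the Sandy Bridge to pre-Haswell range described in the patch text above.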
-
Game over Intel in server power consumption
It's game over for Intel in the server market on power consumption. A slightly underclocked Ryzen 1700 scores 850 in cb and draws only 30 watts at full load. Intel's low-power Atom C2000 CPUs draw 33-35 W under full load. Intel really will lose 15-30% of new server chip sales on this alone; Xeons draw 90-140 W under full load.
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/
FUD about CCX latency is FUD. Zen does memory access completely differently, but don't take my word for it, take Linus Torvalds's. http://www.realworldtech.com/f...
So basically Zen ends up producing fewer cache misses, so just measuring pure cache latency is an invalid benchmark.
-
Re:Crippled Ryzen 7
Zen has a completely new way of loading from memory, but don't take my word for it, take Linus Torvalds's. http://www.realworldtech.com/f...
FUD about CCX latency is most likely coming from Intel. AMD's Zen ends up producing fewer cache misses.
-
Re:Headline doesn't really match actual news
Apple does not have any "graphics" on mobile. They license the best design for their specific needs and then build it into their CPU core. No reason why others can not do the same - they just have to be willing to pay for it. So Apple does have killer graphics with their iDevices - but it is not their design. I do give them credit for making excellent engineering decisions because licensing the design has worked out quite well for them.
Not even remotely true. Here is an article from Oct 2016 about Apple replacing components of the licensed GPU with custom designs. The only part not completely custom designed by Apple is the fixed point functions.
-
Re:For gamers
Yeah, which is why I think Linus Torvalds was right when he said The whole "parallel computing is the future" is a bunch of crock.
-
Re:Holy Shit
AMD's GPUs are in general larger, in transistors and compute units, than the closest competitor from Nvidia.
The difference is that while AMD's offerings should have given better price/performance if you only look at the numbers, Nvidia's hardware and software have been more optimized and therefore more capable in practice.
In August, one of those optimizations was revealed by tech site Real World Tech:
Nvidia does a kind of tile-based rasterization of opaque polygons to avoid having to run shaders for pixels that will be overdrawn. They also adjust the tile size to keep as much in cache as possible. Real World Tech also shows that this is something that AMD cards don't do.
-
Re:Still don't trust SSDs
In short, there is no reliability issue, and the write limitation is a non issue for 99.999% of the computers out there. It just doesn't seem to be working for *you*
SSD reliability issues include in rough order of importance:
1. Corruption on Power Loss
2. Trim Corruption
3. Unpowered Retention
4. Write Endurance
The problems with trim are annoying, but no SSD should need to use trim for good performance. Those that do were designed poorly:
http://www.realworldtech.com/f...
Corruption on power loss in this case is not corruption of data being written at the time of power loss; that is expected. It is corruption of unrelated data or state which may render the drive unusable. Some SSDs do not suffer from this problem, and it should not occur in a portable application because of battery operation, but a desktop is a different matter.
-
Re: AMD is on the road to nowhere
I ran across this link to a transcript about the history of the 68000 after my post here. There is a discussion thread on the Real World Technology forums about why IBM chose the 8088 where this transcript was linked:
http://archive.computerhistory...
http://www.realworldtech.com/f...
The additional cost of the 68000 over the 8088 was an even larger factor than I remember.
-
Re:Lennart, do you listen to sysadmins?
There is no such thing as a hybrid. You are either on fire, or you are not.
That is such an idiotic statement that I won't even bother continuing the discussion. This link is the Wikipedia page. And this is Linus himself speaking about the mix of kernel architectures.
The people who push systemd have serious issues with reality it seems. Pulseaudio is a brain damaged piece of software and one of the first things to be removed in any distribution.
-
Re:x86 IS efficient
Except that 64-bit ARM (AArch64) doesn't have Thumb. Source.
So in 64-bit mode (which is what these server processors will be running in), x86-64 again has a code density advantage over AArch64.
-
Re:No thanks
Here's a review: http://www.realworldtech.com/arm64/
The instruction set includes arithmetic operators, a whole set of move (register/memory) instructions, floating-point instructions, SIMD vector instructions, and cryptographic instructions. But the arithmetic instructions are separate from the memory transfer instructions.
-
Re:What the headline giveth . . .
Update: Looking at David Kanter's site (graph 1 and graph 2) the AMD parts and Intel server parts come in at about the efficiency listed in the chart (which again is based on peak performance and published TDP). NVIDIA's Kepler and Intel's Silverthorne seem to be more efficient in the real world than as presented from that calculation. I have no idea about the Cortex A9, there are a million different versions and I can't recall seeing hard numbers for the one in the iPad 2, some of which are on a 40 nm process and some of which are on a 32 nm process, further muddying the waters. Either way, it's cool research.
-
Re:ARM is not RISC and x86-64 is not CISC
I didn't write the summary posted on Slashdot. My summary (it's probably still in the "firehose" section) was one line. The Slashdot editor just scraped the first few paragraphs of my article. You can tell the number of people who actually read my article by the discussion of PowerVR graphics. There isn't one.
And yet those are your words un-edited (aside from the first paragraph, where they inserted a link)! The incendiary TITLE, and the second and third paragraphs of the article summary are stripped directly from your blog post! How have you magically disavowed all knowledge of the words that you posted on your blog?
And yet I'll bet you still find a way to convince yourself that your title is not incendiary or fluff.
Speaking of which, you sidestepped my concerns over the main tenet of your article (and I quote): "ARM ends up being several times more efficient than Intel." You make this claim without any analysis, and expect the world to believe you. Except that people have already commented on this matter who tend to know a little bit more about processor micro architectures than you Bruce (like Linus Torvalds, and David Kanter offers further insight), and the general consensus is that Intel is not at any disadvantage against ARM.
Want to tell me in *technical terms* why your worthless blog post has merit at all? Yes, we all know Intel has a process advantage...boo, hoo hoo! Got any other REAL complaints aside from that obvious one?
-
Re:Don't worry, Nvidia!
Spec benchmarks never supported any of Apple's claims, int or floating performance.
Wrong. http://www.realworldtech.com/page.cfm?ArticleID=rwt051400000000&p=3 - and like you mention, SPEC never bothered with SIMD.
-
Re:So
Real World Technologies is worth mentioning too. It's a high-quality, rarely updated site involving mostly CPU/GPU/APU type stuff.
-
Re:MAKES SENSE !!
Intel is still working on Itanium and has a new architecture in the pipeline. This article agrees that simple VLIW is dead, and indicates that the new Itanium architecture will do a little bit of scheduling on its own instead of relying completely on the compiler.
-
What about Poulson?
If Itanium is dead, then why does Intel have all this architectural investment?
-
Re:Stop pissing on the drivers, it's the games.
In addition, Crysis was the topic when showing that PhysX is not optimized for x87, even though x87 is part of the Crytek system requirements. Although one may think that "it works on one graphics card but not the other" is a hardware deficiency, the software/game may rely on a bug or error elsewhere in the OS to operate correctly. The only way to be certain is to compile it from source.
-
Why not post intel's response?
Not sure why the submitter didn't post the Intel response denying it: http://newsroom.intel.com/community/intel_newsroom/blog/2011/03/23/chip-shot-intel-reaffirms-commitment-to-itanium
Of course you would expect Intel to deny it, but considering Intel just took the wraps off their next revision of the Itanium, this is pretty much just FUD coming from Oracle.
-
Re:Meh. Missing features.
According to Linus Torvalds, on an SSD that is worth a damn, nobody needs TRIM:
The fact is, any SSD worth anything should work perfectly fine without trim, and if you need trim to get it back to good performance, you should just ditch the SSD entirely. The whole "SSD's need TRIM" support was a bedtime story for gullible morons. The same morons who also bought the "SSD's need big IO and natural alignment" story that came out a couple of years before that. The fact is, SSD's had seriously buggy garbage collection. TRIM was a workaround for an SSD firmware bug, nothing less, and most definitely nothing more. Yes, yes, it can make a difference, but it's not at all the magical fairy dust that people have claimed it was. The real solution was always to just fix the performance bugs in the bad GC that SSD's did.
Apple's solution to the whole TRIM problem was to not use SSD's with badly implemented garbage collection in their computers in the first place.
Meh. What does the guy who created Linux know about computers?
-
Re:Hard call for GPU selection
Actually NVIDIA was a major contributor to OpenCL http://www.realworldtech.com/page.cfm?ArticleID=RWT120710035639
-
Re:Intel integrated graphics at anandtech.com
Yet, you still need an i7 + Intel integrated graphics and an i7-compatible motherboard to get the performance of a ~$50 dedicated GPU. Pricewise, you could go with an AMD solution and a dedicated GPU in the $75-$100 range from Nvidia or AMD and still pay half as much for better 3D performance.
The numbers look even worse for Intel if you grab an "off-the-shelf" dedicated GPU that's one generation older, e.g. a 1GB Radeon 4670 for ~$65.
AMD also has Hybrid graphics, first introduced with the Puma or Spider platform:
Hybrid Graphics
The 780 chipset is the first product to use a "hybrid" multi-GPU set up, aptly named, Hybrid Crossfire. Hybrid Crossfire operates a discrete GPU (HD 34xx) in tandem with the IGP to boost performance above what either could achieve separately.
-
Re:Instruction set...
What layer? Their decoder is the translation, and although it doesn't take up 50%, it's not a trivial amount of space. Not only space, though, but pipeline: an instruction gets 5-deep into the pipeline just in terms of decoding whereas an equivalent A8 pipeline is only 3 stages. Branch penalties on x86 are nasty, which is why there's so much logic (caching decoded instructions, branch statistics, etc.) dedicated to alleviating the problem.
-
Re:Query
Anyone have data on how these compare to x86 and Intel's latest creations? Presumably, one could write an efficient algorithm for a variety of common computing tasks and port it to the different chips to get a cross-architecture performance estimate.
That's called SPEC CPU; here are some results: http://www.realworldtech.com/forums/index.cfm?action=detail&id=107244&threadid=107238&roomid=2
-
Re:What's the point?
Linus said himself, that his biggest error with Linux was, that he made it monolithic.
Bull. He's consistently said the exact opposite. See the Wikipedia article. Or look at this post of his from 2006:
The whole "microkernels are simpler" argument is just bull, and it is clearly shown to be bull by the fact that whenever you compare the speed of development of a microkernel and a traditional kernel, the traditional kernel wins. By a huge amount, too.
The whole argument that microkernels are somehow "more secure" or "more stable" is also total crap. The fact that each individual piece is simple and secure does not make the aggregate either simple or secure. And the argument that you can "just reload" a failed service and not take the whole system down is equally flawed.
Where has he ever said that making Linux monolithic was a mistake?
-
Linus Torvalds & Dave Patterson discuss it on
Actually, this is old news. There's a month old discussion thread on RWT Discussion forum. Berkeley proposes the "thirteen dwarfs" - 13 kinds of test algorithms they consider valuable to parallelize. Linus doesn't think the 13 dwarfs correspond well to everyday computing loads. My 2 cents: Intel & others are spending hundreds of millions of bucks per year trying to speed up single-thread style computing, so it's not a bad idea to put a few more million/year into thousand thread computing.
-
Real World Tech typography; ugh!
> http://www.realworldtech.com/page.cfm?ArticleID=RWT082807020032
Can RWT please hire a typographer? Trying to read even the first page caused my eyes to cross.
-
Re:QuickPath vs HyperTransport
-
Re:How about a regular Cell based laptop?
IIRC, the Cell uses way too much power for sensible laptop use.
Apparently, you do not know how CMOS devices work. The power consumption of the chip is directly proportional to the capacitive load and the frequency, and is proportional to the square of the voltage.
Concerning only the SPE power consumption, which is the majority of power used by a Cell chip:
If x represents the power consumption of a 7-SPE chip running at 3.2 GHz...
If you cut the number of SPEs from 7 to 4, your capacitive load is cut to 57% of the original, or 0.57 * x.
If you then cut the frequency from 3.2 GHz to 1.5 GHz, power consumption drops by a further factor of 1.5 / 3.2. Your total power consumption after the capacitive load and frequency changes is about 0.27 * x.
The PPE portion of the chip will see power consumption reduced by half because of frequency.
FINALLY: a reduced operating frequency means you can reduce the voltage, and this is where you can see some impressive gains. Just to get an idea of the differences in voltages, here is a link to a voltage vs speed graph for each SPE, from Sony engineers. You could potentially operate the Cell at 1.5 GHz at a very low threshold voltage, giving you a 20-30% reduction in power consumption.
So, after all that, you have a chip that runs on less than 20% of the power of its big brother (estimated 60-80 W), so this chip is around 10-15 W, which is quite practical for four 128-bit vector processors plus a PPE.
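The scaling argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming dynamic power follows P ~ C * f * V^2 and covering only the SPE portion; the function name is made up, and the 0.7-0.8 voltage factor is an assumption that treats the "20-30%" saving quoted above as a plain multiplier on power.

```python
def relative_dynamic_power(units_ratio, freq_ratio, voltage_power_factor=1.0):
    """Relative CMOS dynamic power, assuming P ~ C * f * V^2.

    units_ratio: fraction of switching capacitance kept (e.g. 4/7 SPEs).
    freq_ratio: new clock divided by old clock.
    voltage_power_factor: multiplier on power from lowering the voltage
        (hypothetical; a 20-30% saving corresponds to 0.8 down to 0.7).
    """
    return units_ratio * freq_ratio * voltage_power_factor


# Capacitive load and frequency changes only.
spe_only = relative_dynamic_power(4 / 7, 1.5 / 3.2)
# Adding the assumed voltage saving at both ends of the quoted range.
low = relative_dynamic_power(4 / 7, 1.5 / 3.2, 0.7)
high = relative_dynamic_power(4 / 7, 1.5 / 3.2, 0.8)

print(f"capacitance + frequency only: {spe_only:.3f}")
print(f"with voltage reduction: {low:.3f} to {high:.3f}")
```

Under these assumptions the result is roughly 0.19x to 0.21x of the original, in line with the rough 20% figure above.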
Not that there's anything the Cell could really do effectively for a PC. For parallel processing, we already have dual 128-bit SSE units on the Core2 Duo processors, which come within fighting range of four SPEs clocked at a paltry 1.5 GHz. And of course, most of these pipe-dream uses will get held back by slow I/O on a home computer or laptop (like ALL the example uses for this chip listed in the article), so there's really no need for all that processing power.
-
This has nothing to do with Intel's "chips"
This benchmark is a system benchmark, meaning that it takes into account power dissipation of much more than the processor alone. It is fair to say that Intel's current server platforms use more power than AMD's server platforms, but this is actually due to their memory technology, and not to the processors themselves.
To be more specific, the Xeon processor in this review is the same processor core as the Merom/Conroe Core 2 Duo core. If you benchmark Conroe on a platform using the same memory technology (DDR2) as AMD, you'll find that Intel's power consumption is significantly less than AMD's. But Intel decided to use a different technology (FBDIMM) for its server platforms, in order to increase maximum memory capacity, whereas the Opteron used a simpler technology which is severely limited in memory capacity per channel, since the outdated parallel multidrop DDR2 bus can't go at speed when heavily loaded.
FBDIMM is like PCI-Express or Hypertransport for a memory interface, meaning that it's serial and point to point, instead of parallel and multidrop. This allows Intel to add many more loads to the memory channel without slowing the channel down, because it is Fully Buffered (the FB part of FBDIMM), which increases memory capacity per channel. However, FBDIMM also turns out to be very power hungry, and Intel is now being forced (by benchmarks such as this one) to release server platforms without FBDIMM in order to lower power consumption for people who don't need large memory capacities. (for some confirmation of this, look here: http://theinquirer.net/?article=42183)
In any case, the results of this benchmark aren't about "chips", they're about platforms. Intel's current chips are pretty good, but their server platforms need some work. That's why Intel's coming out with a whole new platform next year (here's some reading material for you: http://realworldtech.com/page.cfm?ArticleID=RWT082807020032 ).
So a quick answer to your question: Intel's chips ARE better than AMD's, but their platforms aren't. Here's the question you should have asked: Why are Intel's platforms always behind AMD's? The answer to that is basically that Intel has lots more internal politics, and therefore it is slow to change things that have impact across the company, like platforms. Intel has a lot of internal competition: lots of separate groups working on various competing processors, so the processors themselves are usually pretty good (Darwin at work). But the teams making the processors don't have the freedom to change the platform, since that's outside their scope and requires lots of corporate maneuvering. So Intel's platforms are much slower to change than AMD's.
Summing up: don't confuse a system benchmark for a processor benchmark! TFA isn't about processors at all, it's about systems.
-
Re:Same latency with 4 processors
Yes, the quad-core chips will have the fourth link. In addition, the chips will be able to split their 16-bit HT links into dual 8-bit HT links, allowing for 8-way CPU configurations without hops (8 x 8-bit HT links per socket). In reality, this is the reason why AMD is pushing the new HyperTransport 3.0: so they can cut the bus lines to 8 without sacrificing too much bandwidth.
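The link-count arithmetic here can be sketched as follows. It is a back-of-the-envelope check, assuming (as stated above) four 16-bit links per socket that can each split into two 8-bit links; the function names are illustrative, and reserving one link for I/O is an assumption of this sketch rather than something the comment states.

```python
def links_after_split(wide_links, split_factor=2):
    """Number of narrow links if every wide link is split."""
    return wide_links * split_factor


def max_fully_connected(links_per_socket, reserved_for_io=0):
    """Largest socket count where every socket links directly to every other.

    A socket in an n-way fully connected system needs n - 1 peer links.
    """
    return (links_per_socket - reserved_for_io) + 1


wide = 4                          # 16-bit links per socket
narrow = links_after_split(wide)  # 8-bit links per socket

print("unsplit, all links to peers:", max_fully_connected(wide))
print("split, one link kept for I/O:", max_fully_connected(narrow, reserved_for_io=1))
```

With eight narrow links and one kept for I/O, seven remain for peers, which is exactly what an 8-way configuration without hops needs.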
Check it out here.
-
microkernels are a form of B&D programming.
The reason the microkernels fail is that ukernels are a form of bondage and discipline programming.
Bondage and Discipline programming occurs when the smart people on the central committee decide that ordinary developers are not smart enough to decide how to code on their own. They create a "system" that won't let the ordinary developers make certain kinds of errors. Pascal is the canonical Bondage and Discipline language.
There are 3 flaws in B&D programming.
- Bondage and discipline programming causes overhead and reduces your performance.
- Bondage and discipline programming won't let you choose the best method to achieve your goal, so your design becomes more difficult.
- The smart people on the central committee, the creators of the B&D system, are not as smart as they think they are.
Linus Torvalds' criticism of ukernels ( Thread starts here. ) accuses them of the first 2 flaws, but he politely does not mention the third.
The tunes people also have a harsh criticism of ukernels. They accuse them of abstraction inversion. There is less criticism of ukernels in academia, where it might be a career limiting move (CLM). Bondage and discipline programming seems to be commonly advocated there.
I made a presentation to Austin Linux Group on Tanenbaum-Torvalds microkernel vs monolithic kernel Debate.
-
Re:Unasked, unanswered question
Yes and no,
Some of the bugs will be fixed, others won't. Every CPU has bugs, it's just a fact of life. These things are designed by humans, it's just going to happen. CPU errata happens with Intel (This is the Core2 link) and AMD. None of this is a major threat to most users, and they get worked around by most people pretty quickly. Microsoft have released fixes for the Core2 issue, as have Apple. I don't know whether there has been an update to the kernel for these yet, but I am sure they would get back ported by your distribution.
There is a note here and here regarding the Core 2 bugs, I think one of these might have even become a slashdot article at one point. The two links here both are referring to Linus' comment of it being "Totally insignificant", which given that he worked for Transmeta and knows a lot more about how the industry works, I would be putting a bit of faith in his statement.
As another poster said, keep up to date on your BIOS revs, as CPU microcode does have fixes for this stuff too.
Berny
-
Linus doesn't think it's a big deal.
-
Re:my 1.9432534656 cents worth...
microcodes are stored in ROM, they're only ever stored in RAM EEPROM or Flash on development hardware, i.e. not a personal computer processor.
There is a ROM that contains microcode, but there is also a RAM that is used to hold patches. This patent describes how some AMD processors load patches into a small RAM (scroll down to "BACKGROUND OF THE INVENTION" for the human-readable explanation).
Furthermore SRAM is volatile memory, so could not be used for this purpose.
The BIOS usually applies microcode patches at every boot. From patents and BIOS reverse-engineering, people have even figured out how to load their own patches. In this instance, it looks like MS is going to have Windows apply a microcode patch (which makes sense if you think about it - BIOS updates are a pain and very few people actually take advantage of them, but Windows Update can distribute this kind of thing to 95% of computers, and the OS can apply them at boot time).
-
Re:Nice attempt, AMD.
While the Core 2 has its roots in the P6 microarchitecture, it is an entirely different processor design with far too many changes to be called simply a "tweak" of the P6.
The reason you have not seen in-depth articles for the Core 2 is because most "review" sites on the web are staffed by people who don't know a lick about processor architectures, and they just regurgitate Intel's presskit material without any discussion on the subject. Here is a much more in-depth article concerning the architecture.
Summary of improvements since P6:
4-wide decoding instead of 3-wide.
Micro-op and Micro-op fusion (more instructions decoded).
Improved branch prediction, including loop detector (from Pentium M).
Support for 2 128-bit packed SSE instructions per-cycle.
Three dispatch units (P6 had two ALU / FPU ports).
Speculative reordering of loads.
And, of course, the low-latency (from Pentium M) and shared L2 cache. x86-64 support also counts for something.
Why does it take so long to make sizeable improvements on the P6 microarchitecture? The reason is complexity: every extra decode pipe, every extra issue port, every extra ALU adds exponential complexity to the design. Intel was actually trying to get around this problem with the Pentium 4: crank up the clock speed, and you don't have to make a more complex processor!
The failure of the Pentium 4 and the complexity of massively superscalar cores is the main reason why CPU designers are moving toward multi-core as a long-term solution: superscalar architectures as complex as Core 2 (and Barcelona) are VERY difficult to design and verify.
-
Re:Quick Mac Buying Tip
I saw a memory test somewhere that revealed the memory can run hot, and you get a number of correctable ECC errors. But if your RAM has the larger Apple-recommended heatsinks on them, the ECC errors drop to zero.
This is a test I would love to see, as I have long been under the impression that RAM heatsinks (as opposed to the heat spreaders on RDRAM RIMMs) are effectively* useless. If you have evidence to the contrary, I'd love to see the source.
From what I can find, the power dissipation of a fully buffered DDR2 DIMM is similar to a plain 16 chip SDR DIMM (10.4W v 8.7W). That's for the whole stick. Unless the airflow is seriously hampered, even with 8 DIMMs packed side-by-side heat does not appear to be a significant source of errors.
*Search for the header "Blue Metal" in this article for the relevant bits. Anchor tags appear to be absent.
-
Re:For 64bit floats, the PS3 is a powerhouse
Your understanding is wrong.
SPU: 2 execution pipes, each 128-bits wide, for a total of 8 32-bit VECTORIZED (SIMD) operations per-clock.
8 * 3.2 GHz = 25.6 GFLOPS for each SPU. These are the same performance numbers being quoted everywhere for SPU SINGLE-PRECISION. This performance-level would not be possible if your math was accurate to 64-bits.
In fact, 64-bit (double) operations actually cut performance because the SPU has to re-use the single-precision SIMD hardware (it has to sacrifice the SIMD functionality). So the SPU can only output ONE double-precision float from each pipe per-clock, or 6.4 GFLOPS.
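The numbers above follow from a simple product of operations per clock and clock rate. The sketch below only reproduces the arithmetic as stated in this comment (two 128-bit pipes, one double per pipe per clock); it is not a statement about the actual SPU microarchitecture, and the names are made up for illustration.

```python
CLOCK_GHZ = 3.2  # SPU clock used in the comment


def peak_gflops(ops_per_clock, clock_ghz=CLOCK_GHZ):
    """Peak throughput in GFLOPS = operations per clock * clock in GHz."""
    return ops_per_clock * clock_ghz


# Figures as stated in the comment: 2 pipes, each 128 bits wide.
pipes = 2
lanes_32bit = 128 // 32                     # 4 single-precision lanes per pipe

single = peak_gflops(pipes * lanes_32bit)   # 8 operations per clock
double = peak_gflops(pipes * 1)             # one double per pipe per clock

print(f"single precision: {single:.1f} GFLOPS")
print(f"double precision: {double:.1f} GFLOPS")
```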
Just because the registers are 128-bit wide does not mean the math is accurate to 128-bits. See this article here for more details.
-
Re:Silicon Valley will become K-Valley then?
Uh, the GP never made any claims about the substrate. In fact, GP specifically stated this is the gate dielectric that's being replaced.
So, your statement of "no" makes no sense, because you're agreeing with the GP.
Oh, and the reason they're using metal gates instead of poly can be found courtesy of RWT, in the last paragraph of the following link:
http://www.realworldtech.com/page.cfm?ArticleID=RWT012707024759&p=3
Since polysilicon is not compatible with Intel's high-k material, the newer 45nm transistors use a metal gate
-
Re:Questionable
Yes, NUMA is not available in Intel Core processors.
But Intel has processors with four cores available. 2P motherboards with 4Core processors are cheaper than 4P motherboards with 2Core processors.
You can find a review with more information at:
http://anandtech.com/IT/showdoc.aspx?i=2897
and in:
http://realworldtech.com/page.cfm?ArticleID=RWT111406114244
The conclusion is (more or less): yes, the scalability of Intel Core Processors is worse than AMD Opteron Processors. However, the price/performance ratio of the Intel Core 2 Quad Processors is great.
-
Some more information
I haven't checked the information yet, but here's an abstract on the rest, found through google:
The Power6 processor will run between 4GHz and 5GHz and it has been proven to chew away data at a speed of 6GHz in the lab.
IBM see things a little differently and they decided to raise the frequency in both cores of the processor.
For high-end models, four POWER6 MPUs will be packaged in a single multi-chip module, along with four L3 victim caches, each 32MB.
On the management side, IBM is also improving their virtualization capabilities in the POWER6. In particular products, a single processor may be able to host 2-300 virtual instances, although theoretically up to 1024 VMs are possible. Memory partitioning and migration have been added as well, which reduces system down time for repairs.
IBM is claiming a factor of two performance increase, which would be consistent with the vastly higher clockspeeds and increases in raw system bandwidth.
IBM's roadmaps currently include the POWER6+, which is presumably a 45nm derivative product. Judging by past practices, the POWER6+ will debut in the second half of 2008, probably just in time to dash the hopes of rivals.
The Power and PowerPC lines will grow one step closer together with Power6, which incorporates the AltiVec instruction set that speeds up many multimedia tasks. AltiVec, also known as VMX, increases efficiency by letting a single processing instruction be applied to multiple data elements. That's helpful for video and audio tasks on desktop machines, but servers will benefit as well in, for example, high-performance computing tasks such as genetic data processing, McCredie said
Where Power5 can transfer data on and off the chip at a rate of 150 gigabytes per second, Power6 can do so at 300GBps, McCredie said.
Oh, and it is also good for BCD's (binary coded decimals) which obviously points to the expected customers (high end financial firms, presumably).
Sources:
http://news.softpedia.com/news/New-Power6-IBM-Processor-Trashes-Competition-with-6-GHz-17765.shtml
http://realworldtech.com/page.cfm?ArticleID=RWT101606194731
http://news.zdnet.com/2100-9584_22-6124451.html -
Good question...
I just got done reading about the PWRficient (via Ars):
- Two 64-bit, superscalar, out-of-order PowerPC processor cores with Altivec/VMX
- Two DDR2 memory controllers (one per core!)
- 2MB shared L2 cache
- I/O unit that has support for: eight PCIe controllers, two 10 Gigabit Ethernet controllers, four Gigabit Ethernet controllers
- 65nm process
- 5-13 watts typical @ 2GHz, depending on the application
Now I have to wait for the boner this gave me to go away before I can get up and walk around the office.
Maybe Apple could have put off the Switch after all...
-
Re:What about other parts of the computer?
It's not quite that good in its current incarnation. Right now, high-end (4- and 8-way) Opteron chips have only three HyperTransport links.
Try connecting 4 of these chips together using only 3 HyperTransport links per chip, with a single-hop memory latency, and allow for one link to external I/O. Can't be done. There are two hops required for the chip that handles I/O, which is not a good thing when you consider how important I/O links are in a server.
Try connecting 8 sockets using only 3 HyperTransport links, and allow 2 connections minimum for external I/O - now most of your connections are two hops or more.
K8L attempts to solve these problems in two ways:
1. K8L adds a fourth HyperTransport link, which allows easy single-hop 4-socket systems (and allows all 4 sockets to interface with external I/O, if desired).
2. K8L allows the HyperTransport links on each socket to be split from 4 16-bit links to 8 8-bit links, to allow single-hop memory latency on 8-socket configurations. Combined with the faster bus speeds of HyperTransport 3.0, that's plenty of bandwidth to feed 32 cores. And of course, there's potential for 16-socket configurations (with only 2-hop memory latency, depending on whether AMD decides to support this gluelessly).
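The hop counts above can be checked with a rough sketch. This is only an illustration: it models sockets as graph nodes and HyperTransport links as edges, and the choice of which sockets give up a peer link for I/O is an assumption made for the example, not AMD's actual board layout. The function names (`full_mesh`, `fits_budget`, `max_hops`) are hypothetical.

```python
from collections import deque
from itertools import combinations


def full_mesh(num_sockets):
    """Every socket linked directly to every other socket."""
    return list(combinations(range(num_sockets), 2))


def fits_budget(num_sockets, peer_links, io_links, links_per_socket):
    """Check that no socket uses more links than it physically has."""
    used = dict.fromkeys(range(num_sockets), 0)
    for a, b in peer_links:
        used[a] += 1
        used[b] += 1
    for socket, count in io_links.items():
        used[socket] += count
    return all(n <= links_per_socket for n in used.values())


def max_hops(num_sockets, peer_links):
    """Worst-case hop count between any two sockets (breadth-first search)."""
    neighbours = {s: set() for s in range(num_sockets)}
    for a, b in peer_links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    worst = 0
    for start in range(num_sockets):
        hops = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        if len(hops) != num_sockets:
            raise ValueError("topology is not fully connected")
        worst = max(worst, max(hops.values()))
    return worst


# Assumed 3-link layout: sockets 0 and 3 drop their direct peer link
# so that each can spend one link on external I/O.
k8_links = [link for link in full_mesh(4) if link != (0, 3)]
k8_io = {0: 1, 3: 1}

# Assumed 4-link layout: full mesh plus one I/O link on every socket.
k8l_links = full_mesh(4)
k8l_io = {s: 1 for s in range(4)}

# Assumed 8-socket layout with each socket's links split into 8 narrow ones.
k8l_8s_links = full_mesh(8)
k8l_8s_io = {s: 1 for s in range(8)}

print(max_hops(4, k8_links), max_hops(4, k8l_links), max_hops(8, k8l_8s_links))
```

With 3 links, a full 4-socket mesh leaves nothing for I/O, so `fits_budget` rejects it; once a peer link is given up, the worst case becomes two hops. With 4 links (or 8 split links across 8 sockets), the full mesh plus one I/O link per socket fits and every socket is one hop away.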
Meanwhile, even with the massive caches and Dual Independent Bus architecture, Intel's 4-core chips are going to reach saturation at 4 sockets. -
Re:Dell had to do something
Yes, I've made posts about this before. The Dual-Independent Bus (DIB) architecture introduced by Intel has only solved the 4-core issue. With Clovertown expected to clock in with a miserly 1066 MHz FSB, I expect performance to improve 50% or less when moving from 4->8 cores. 16 cores is still a pipe dream.
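A back-of-the-envelope sketch of why the bus becomes the bottleneck. The assumptions here are mine: a 64-bit (8-byte) FSB data path, one independent bus per socket, two sockets, and a move from dual-core to quad-core chips. It only divides peak bus bandwidth among the cores and ignores caches entirely, so treat it as an upper bound on the squeeze rather than a performance prediction.

```python
FSB_TRANSFERS_PER_SEC = 1066e6  # "1066 MHz" FSB, taken as 1066 million transfers/s
FSB_WIDTH_BYTES = 8             # assumed 64-bit data path


def bus_bandwidth_gbs(transfers_per_sec=FSB_TRANSFERS_PER_SEC,
                      width_bytes=FSB_WIDTH_BYTES):
    """Peak bandwidth of one front-side bus in GB/s."""
    return transfers_per_sec * width_bytes / 1e9


def per_core_gbs(cores_per_bus):
    """Peak bandwidth share per core when cores_per_bus share one bus."""
    return bus_bandwidth_gbs() / cores_per_bus


dual_core_share = per_core_gbs(2)  # 2 sockets x 2 cores = 4 cores total
quad_core_share = per_core_gbs(4)  # 2 sockets x 4 cores = 8 cores total

print(f"per-core share, 4 cores total: {dual_core_share:.2f} GB/s")
print(f"per-core share, 8 cores total: {quad_core_share:.2f} GB/s")
```

Doubling the cores on each bus halves the peak bandwidth available to each one, which is the reason to expect well under a 2x gain on memory-bound work.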
Meanwhile, AMD is preparing to launch the K8L. Not only will it feature many of the performance improvements seen in Core2, it will also feature a shared L3 cache and, most importantly, a 4th HyperTransport link.
Now, the 4th HyperTransport link doesn't sound like much, until you consider: with 3 HyperTransport links, even AMD's 4-socket configurations have been a bit limited (you have to sacrifice single-hop latency between processors because they have to connect to external I/O). 8-socket systems were even worse, with several hops between processors.
The additional HyperTransport link on K8L means optimal configuration for 4-socket (16 core) systems. The 4 16-bit HyperTransport links on K8L can also be split into 8 8-bit interconnects, allowing for single-hop memory access on 8-socket (32 core) systems. Read this preview on Real World Tech for more information.
If anyone wants to know why Dell is going gung-ho on AMD, it's because they know K8L will completely rip Intel apart in the server arena (where Dell makes most of their money). -
This is not so surprising.
I'm sure Sony is scraping at anything they can to reduce power.
Anyone recall how much of a heat problem the Xbox 360 has? While it's not blazing hot, the unit does get quite toasty and requires a beefy fan to keep both GPU and CPU happy. The lowest reported power usage I've seen on the net is 136w, almost double that of the Xbox!
Now look at the PS3:
RSX = the same hardware (256MB GDDR3, 16/24/24 pipes, 50w) as a 7900 GT. With the stated clock speeds (higher than a 7900 GT), and the required increase in voltage, that puts power usage at about 60-65w.
CELL = 8 cores with estimated power requirements of about 4w each (that estimate is probably too low for the PPE, but may be a smidge high for the SPEs, so it evens out). That's at least 32w just for CELL (more if they can't mass-produce Cell for 1.1v).
Now add in the system board (256MB ram, bridges and chipsets, other components) for an additional 30-40w, and suddenly the PS3 (125-145w total) is looking as hot as the Xbox 360.
In addition to the same power issues as the Xbox 360, the PS3 also has the lovely problem of getting losses from the power supply out of the case (15% of total power is reasonable for a cheap 12v-only supply). This brings the total power dissipated by the PS3 case to 140-160w! That's more than most mainstream PCs!
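For anyone who wants to rerun the numbers, here is a rough sketch of the budget. It uses only the estimates quoted above. The one assumption of mine is how to read the 15% figure: the code treats the supply loss as 15% of the power drawn at the wall, so wall draw is the component load divided by 0.85. The names are made up for the example.

```python
PSU_LOSS_FRACTION = 0.15  # assumed: loss is 15% of the power drawn at the wall

# (low, high) estimates in watts, taken from the figures above
components = {
    "RSX": (60, 65),
    "Cell": (32, 32),   # 8 cores x ~4 W each; described as a minimum
    "board": (30, 40),
}


def load_range(parts):
    """Sum the low and high estimates across all components."""
    low = sum(lo for lo, _ in parts.values())
    high = sum(hi for _, hi in parts.values())
    return low, high


def wall_draw(load_watts, loss_fraction=PSU_LOSS_FRACTION):
    """Power drawn at the wall if loss_fraction of it is lost in the supply."""
    return load_watts / (1.0 - loss_fraction)


low, high = load_range(components)  # 122 W and 137 W
print(f"component load: {low}-{high} W")
print(f"with supply losses: {wall_draw(low):.0f}-{wall_draw(high):.0f} W")
```

The straight sum lands a little under the rounded 125-145w figure, which is consistent with 32w being a floor for Cell, and the total with supply losses comes out in the same ballpark as 140-160w.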
So, it's no surprise to me that Sony may be tweaking the clock speeds. Despite estimates showing operation above 4 GHz, Cell has been throttled back to a reasonable 3.2 GHz to reduce voltage and power. RSX is next on the chopping block. Why not reduce the speed, if you can also reduce the supply voltage? That could cut the power usage of RSX down to as low as 50w.