AMD QuadFX Platform and FX-70 Series Launched
MojoKid writes, "AMD officially launched their QuadFX platform and FX-70 series processors today, previously known as 4x4. FX-70 series processors will be sold in matched pairs at speeds of 2.6, 2.8, and 3GHz. These chips are currently supported by NVIDIA nForce 680a chipset-based, dual-socket motherboards, namely the Asus L1N64-SLI WS, which is currently the only model available. HotHardware took a fully configured AMD QuadFX system out for a spin and though performance was impressive, the fastest 3GHz quad-core FX-74 configuration couldn't catch Intel's Core 2 Extreme QX6700 quad-core chip in any of the benchmarks. The platform does show promise for the future, however, especially with AMD's Torenzza open socket initiative." And mikemuch writes that the QuadFX "not only fails to take the performance crown from Intel's quad-core Core 2 Extreme QX6700, but in the process burns almost twice as much electricity and runs significantly hotter in the process. ExtremeTech has a plethora of application and synthetic benchmarks on QuadFX, including gaming and media-encoding tests."
>>How about reading the article after you write it?
They're in the process of reading it now.
While performance may be disappointing, it's pretty clear that AMD is just releasing this as a stopgap solution to "stay in the game" for the performance sector until their new developments are ready next year. The name is a good choice and reflects that intention - they combine their performance branding, FX, with "Quad", the term Intel is using, to indicate that it fills the same niche as a quad-core processor. I think it does what it is meant to do - give the impression of a comparable offering until AMD has the real competition ready.
Not 8 cores 80 cores
thank God the internet isn't a human right.
This is true. While AMD lags now, it is still on 90nm while Intel is on 65nm; once both are on 65nm we'll have some real competition going on. -Ed
So you see what had happened was....
The QX6700 has the same TDP (125-130W) per socket as the FX-70 to FX-74, so I assume they run at about the same temperature on chip. Overall system temperature might be higher for the FX-based quad-core system since it uses twice as many sockets, but that's a matter of case design: if the case can remove the heat from the heatsinks effectively, I would imagine both systems would run at the same temperature. This is of course ignoring the fact that AMD's TDP is worst case and Intel's is average case.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
I wonder which will come first?
processors with 10 cores
or
razors with 10 blades
This is my signature. There are many like it but this one is mine.
Only by the time AMD releases their 65nm stuff, Intel will be at 45nm http://today.reuters.com/news/articleinvesting.aspx?view=CN&storyID=2006-11-27T225533Z_01_N27466640_RTRIDST_0_INTEL-MANUFACTURING.XML&rpc=66&type=qcna
I think most here knew that this was always going to be a stupid vanity platform, almost as stupid as water-cooled memory modules. Now, the only thing more sad and stupid than a vanity platform, is one where the vanity isn't even there.
This should have ended as abandoned concept art in a drawer.
(PS. My current gaming rig is AMD X2-based, but if they don't have the performance/$ then they won't get in on the next upgrade)
Belief is the currency of delusion.
The ExtremeTech benchmarks seemed to expose Windows XP's inability to benefit from NUMA. I wonder what testing on a newer Linux kernel with NUMA scheduler support would show.
Ahh... Someone that gets this concept.
Intel wins on extra cache, and the benchmarks that keep getting run don't reveal performance snags with SMP operation.
Intel's got a shared L2 that's 2-4 times the size of the AMD equivalents' pool.
AMD's got a coherent but NON-shared L2 split across multiple CPUs: each core has its own L2. You'll have less L2 thrash with that design.
Under an SMP load, the AMD design will have an edge if all four cores are busy in different parts of system memory.
If you pop out of cache, the memory bus design and overall architecture of the AMD parts will have an edge.
Intel has an edge only due to process shrink and the things they can do as a result thereof. As soon as AMD goes to the smaller process size, they'll pick up the lower TDP advantage Intel has right at the moment and then the whole deal will flip-flop on who's got the "best" CPU unless Intel comes up with a few new tricks along the way, which may/may not happen for them.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
You need to run some intensive process to heat up enough for a bath or shower.
Nahh, just boot Vista.
Seven puppies were harmed during the making of this post.
In CPU architecture circles, the shared L2 is considered a more ideal design than split L2 for multi-core processors. There are plenty of talks around the 'net as to why.
As far as cache size goes, that's a design tradeoff just like any other. Because of the slowness of main memory, you want to have as large a cache system as possible. However, cache system latency increases with the size of the cache, so that is a tradeoff as well. Intel chose to use some chip real estate for cache. "Faulting" them for this is just being an apologist for your puppy.
There are many types of "SMP loads". Multi-threaded loads where all threads work on the same data will be similar on both, as there is only one pipe to the memory on both the NUMA and the FSB model, for example. Yet on SMP loads that are more 'loose', you can get good benefits from NUMA. By the way, Intel also has the IMC with their equivalent to HT on the roadmaps, so this discussion (NUMA vs FSB) won't be relevant for much longer.
Additionally, it isn't until AMD's 'next thing' where their NUMA architecture will be able to scale much better (it doesn't do that well with lots of sockets because it falls back to being limited by the number of HT connections so some communication has to be multi-jump with current multi-socket solutions - the new core adds an HT link so that 4+ sockets can have a more direct path around the system).
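To put a number on the "multi-jump" point, here is a rough sketch that counts hops between sockets with a breadth-first search. The two topologies (a 4-socket square, and the same square with an extra diagonal link per socket) are my own illustrative assumptions, not AMD's actual board layouts, and `hops` is just a hypothetical helper name.

```python
from collections import deque

def hops(links, src, dst):
    """Return the minimum number of link hops between two sockets (BFS)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path between the two sockets

# Hypothetical topologies, for illustration only.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]        # two coherent links per socket
fully_connected = square + [(0, 2), (1, 3)]      # one extra link per socket

print("square, socket 0 -> 2:", hops(square, 0, 2))
print("fully connected, socket 0 -> 2:", hops(fully_connected, 0, 2))
```

In the square, opposite corners have to route through a neighbour; with the extra link every socket is one hop from every other.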
There are a number of examples of "popping out of cache" in the tests on various sites. AMD does show that it helps in those when it can use the bandwidth of both NUMA branches, but it isn't convincingly better than Intel's FSB on many/any of the tests that are shown (ideally you'd hope to see a 2x performance improvement on many of those, but even with all the extra bandwidth, AMD doesn't seem to 'blow the doors' off of the Intel parts... in fact, AMD doesn't even beat them even with the added bandwidth... this just shows that there may be more to the picture than an IMC + more bandwidth). Even AMD's latency isn't that much better than Intel's FSB design anymore (the nice advantage that it had against NetBurst is pretty much gone).
I'm eagerly awaiting AMD's next 'real' move, myself, but given that Intel is already sampling 45nm parts and even on 65nm Core is able to clock to 3.5GHz ranges (meaning Intel has a lot of headroom even on 65nm), the short amount of time that Intel and AMD will overlap on 65nm will probably just show equality (at best) between the two. I haven't really seen what performance advantages AMD's new features give, other than the obvious benefits of wider paths and the FPU issue increase (to bring it equal to Intel's issue rate, although AMD has typically had a stronger FPU). AMD claims a lot, but that could simply be marketing at this point.
I don't think many people would call Folding pointless.
It's like sex, except I'm having it!
has another review that reaffirms the same findings. Performance is not beating Intel yet and the AMD/ASUS solution is very expensive. I feel the only market here is those that cannot wait and have money to burn.
Stating the blindingly obvious: some people aren't going to notice much (if any) difference; others are going to see a huge difference. Parent falls into the former camp; I fall into the latter. I also have been using 62x2 for a year, and no way would I go back to single core. It would be worth having dual core if only for the fact that I can start a job and it will consume a core while all my interactive work runs on the second core, and hence I don't even notice that a huge job is running in the background. Everything else one gets with dual-core is an added bonus. I'm not totally certain that going to 4 cores on the desktop will be as useful, but I can believe that it might be, and will certainly be worth trying. For me, anyway (and I can't believe that I'm particularly untypical of slashdot users).
Given my experience, I'm even fairly convinced that the rest of my family (who are much more like ordinary users) would benefit from dual core too. Everything is simply so much more responsive.
Folding a point is an identity operation so really it's more futile than pointless.
... if by "power user" you mean "someone who uses lots of power," then yes.
A polar bear is a cartesian bear after a coordinate transform.
http://www.hothardware.com/printarticle.aspx?articleid=911
How does this explain that when clock speed and L2 cache sizes are equal, Core2Duo outperforms Athlon X2 by a non-trivial amount? If it were "just process", then you could try to show where Core2Duo wins based on how much cache it has and the like, but that isn't what we see in the multitudes of benchmarks that have been run. A 65nm 2GHz part doesn't just magically perform calculations faster than the same part running at the same frequency at 90nm, for example.
So... L2 cache speed. When I look at Memtest86+ numbers, I see:
~19700 MB/s for L1
~4700 MB/s for L2
~3000 MB/s for main memory
This is on an Athlon64 X2 4600+ w/ low-speed DDR2 RAM (4 sticks of 1GB).
I'm guessing that L2 gains come from it responding to a memory request faster (fewer clock cycles) rather than from the bandwidth? Because the L2 bandwidth of 4.7GB/s doesn't seem that exciting anymore once main RAM can feed the CPU at 3GB/s.
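The latency guess can be sanity-checked with the usual average-memory-access-time estimate. This is only a back-of-the-envelope sketch: every latency and miss rate below is a made-up illustrative value, not something measured on this box or reported by Memtest86+.

```python
def amat(l1_hit_ns, l1_miss_rate, l2_hit_ns, l2_miss_rate, mem_ns):
    """Average memory access time for a two-level cache hierarchy.

    Every access pays the L1 hit time; L1 misses additionally pay the L2
    hit time; L2 misses additionally pay the main-memory latency.
    """
    return l1_hit_ns + l1_miss_rate * (l2_hit_ns + l2_miss_rate * mem_ns)

# Hypothetical numbers, chosen only to illustrate the shape of the effect.
with_l2 = amat(l1_hit_ns=1.0, l1_miss_rate=0.05,
               l2_hit_ns=5.0, l2_miss_rate=0.20, mem_ns=60.0)

# "No L2" is modelled as an L2 that costs nothing and always misses.
without_l2 = amat(l1_hit_ns=1.0, l1_miss_rate=0.05,
                  l2_hit_ns=0.0, l2_miss_rate=1.0, mem_ns=60.0)

print(f"average access time with L2:    {with_l2:.2f} ns")
print(f"average access time without L2: {without_l2:.2f} ns")
```

With numbers in that ballpark the L2 pays for itself through latency alone, regardless of how its streaming bandwidth compares with main memory.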
Wolde you bothe eate your cake, and have your cake?
You don't run Gentoo, do you?
The more up-to-date version would be:
You don't do virtualization, do you?
Start cramming multiple virtual servers onto a single box and all of a sudden dual-core solutions start to seem limiting. And you find yourself wondering just how much a 4-way quad-core machine would cost...
(That 4-CPU quad-core machine is still going to be cheaper than maintaining 4 separate quad-core servers.)
Wolde you bothe eate your cake, and have your cake?
Anybody running a 2.4.2 version of the Linux kernel should be shot. Nobody runs 2.4.2 these days, and anybody suggesting that is far out of touch with what Linux is doing. Compare it against 2.6.19 with all of the NUMA options turned on (CPU-local memory allocators, RCUed algorithms) and you'll see an expected trumping of XP for kernel load, hands down, because of all of the MP work done on it over the last 4 years.
Again... what is this mythical "true SMP operations" that people keep mentioning? Are you talking about MIMD code?
I don't understand the "places" you mention. L2 cache has been multiported for a long time. Additionally, the cache subsystem should be able to handle simultaneous requests from both cores. There should be no stalling due to simultaneous cache accesses from both cores in a shared cache system. As far as cache spills, any situation that should cause spills in a shared cache should cause spills in non-shared (I'll mention this later). Basically, the shared 2M cache can mimic the degenerate case of two 1M caches exactly, but has the flexibility to also be the same as one core having a 512K cache and the other having a 1.5M cache, if working sets dictate, for example (I'll mention this later too).
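Here's a toy version of the flexibility argument, counting misses for two cores with lopsided working sets. It assumes fully associative LRU caches measured in abstract "lines" (8 shared versus 4 + 4 private), which is a big simplification of any real part; `LRUCache` and `total_misses` are hypothetical names for this sketch.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative LRU cache that only counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)      # mark as most recently used
            return
        self.misses += 1
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict least recently used

def total_misses(shared, steps=600):
    """Interleave two cores: core 0 loops over 6 lines, core 1 over 2."""
    if shared:
        pool = LRUCache(8)                    # one pool used by both cores
        per_core, unique = [pool, pool], [pool]
    else:
        unique = [LRUCache(4), LRUCache(4)]   # fixed 4 + 4 split
        per_core = unique
    working_sets = [
        [("core0", i) for i in range(6)],
        [("core1", i) for i in range(2)],
    ]
    for step in range(steps):
        for core in (0, 1):
            lines = working_sets[core]
            per_core[core].access(lines[step % len(lines)])
    return sum(cache.misses for cache in unique)

print("shared 8-line cache misses:   ", total_misses(shared=True))
print("split 4+4-line caches, misses:", total_misses(shared=False))
```

The shared pool ends up holding both working sets, while the fixed split leaves core 0 thrashing a cache that is too small even though core 1 has room to spare.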
I don't get your discussion... I'm just not following your verbiage. I'm trying to understand it but can't get your metaphors or something.
Anyway I'll try to discuss what I think you are talking about. Shared L2 cache is considered the superior design compared to each core having unshared cache. There are numerous discussions on this around the 'net. However, I'll talk about several specific examples.
In a non-shared cache configuration with two cores on the same die running multithreaded code, you can easily get into situations where each thread wants access to the same piece of data for writing. When this happens (which is fairly common... mutex/semaphore/etc in fine-grained code are good examples of this), in a non-shared cache system, you can get a lot of MOESI traffic and passing around of that data between the two non-shared caches (it takes inter-cache bandwidth to do that). However, in the shared cache system, that data is in the shared L2 cache exactly once and, furthermore, there is no passing it around... no MOESI traffic, no usage of any intercache bandwidth because no copy takes place. In such a situation (two threads competing for writes on the same data), the shared L2 cache can be very much faster than the non-shared L2 cache. In addition, the absence of the MOESI traffic is a lighter load on the MOESI subsystem, leaving it free to do other MOESI traffic and do other transfers. In some codes, MOESI traffic between non-shared caches and data copying between the unshared L2 caches can be almost pathological behaviour, leading to heavy slowdown as the two cores fight for access to the data. To summarize: Shared L2 = much lower MOESI traffic in a competing-writes situation and little/no intercache bandwidth utilization because no copies between caches occur. Non-shared L2 in such a situation means more MOESI traffic and more intercache bandwidth utilized (and cores waiting for the data to transfer) to move the data back and forth. It's easy to write a simulation of this problem.
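A very rough sketch of such a simulation follows. It is not a real MOESI model: it only tracks which private L2 currently owns one contended line and counts how often ownership has to move, and it ignores the private L1s that a shared-L2 design still has to keep coherent. `count_transfers` is a hypothetical name.

```python
def count_transfers(writes, shared_l2):
    """Count cache-to-cache ownership transfers for writes to ONE line.

    writes    -- sequence of core ids, in the order they write the line
    shared_l2 -- True if both cores sit behind a single shared L2
    """
    if shared_l2:
        # Toy assumption: the line lives once in the shared L2, so no
        # copy ever moves between L2 caches.
        return 0
    owner = None
    transfers = 0
    for core in writes:
        if owner is not None and owner != core:
            transfers += 1        # line must migrate to the writing core
        owner = core
    return transfers

# Two threads fighting over a lock word: strictly alternating writes.
ping_pong = [0, 1] * 1000
# Same number of writes, but each core does all of its writes in one batch.
batched = [0] * 1000 + [1] * 1000

print("private L2, ping-pong:", count_transfers(ping_pong, shared_l2=False))
print("private L2, batched:  ", count_transfers(batched, shared_l2=False))
print("shared L2, ping-pong: ", count_transfers(ping_pong, shared_l2=True))
```

The alternating pattern is the pathological case described above: nearly every write forces the line to move, while the same writes done in batches barely move it at all.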
A second example is cache utilization. If you have two threads in a dual core system that are asymmetric in cache working set size, you can
Maybe... Microsoft's DirectX10 has an API in it for offloading vector type work to 'something else' in the system. The interesting thing about it is that it will be a standard API, meaning that hardware can be built to take advantage of it while drivers can also be written to either do it in emulation or by actually handing it off to the specialized hardware. This would help AMD out a lot as far as that kind of hardware goes... without some standard APIs, it would likely end up in a mess, IMO.
Personally, I'm not that excited about that kind of technology just yet because it is still fairly immature as far as PCs go. The logistics are rough all the way around. First hardware can likely be surpassed quickly with newer/better coprocessors but you have to a) replace the entire CPU, b) leave that coprocessor there but disable it or something in preference to an add-in card that is better (just like embedded graphics today) which means you have basically a dead 'core' on your CPU, and c) AMD will be stuck with a ton of dead stock as soon as they upgrade that coprocessor and AMD already has problems with keeping channels fed. c) will probably mean that advancement of Fusion will be slow. If AMD can release a Fusion vector part that is 2X as fast or has a better API (new version) or something, but the 'main core' is the same (say it's a 3.4GHz Athlon64 core) because it hasn't advanced as fast, the instant the new part comes out, no one will want the old one and it will just sit there unless it has a huge discount. IMO, this will make AMD not want to advance the Fusion device at a faster pace than the x86-64 core that's also on the die because it would be very costly.