FreeBSD 7.0 Bests Linux In SMP Performance
cecom writes "After major improvements in SMP support in FreeBSD 7.0, benchmarks show it performing 15% better than the latest Linux kernels (PDF, see slides 17 to 19) on 8 CPUs under PostgreSQL and MySQL. While a couple of benchmarks are not conclusive evidence, it can be assumed that FreeBSD will once again be a serious performance contender. Some posters on LWN have noted that the level of Linux performance could be related to the Completely Fair Scheduler, which was merged into the 2.6.23 Linux kernel."
Update: 03/06 21:32 GMT by KD : An anonymous reader sent in word that Linux kernel developer Nick Piggin reran the benchmark today and came to a different conclusion: In his benchmark Linux was faster than FreeBSD.
I'd be interested to see results from pre-CFS kernels.
Not that FreeBSD hasn't made major performance improvements.
Also, I think that a database test isn't a complete picture. For example, some OSes like IRIX or Mac OS X perform very well on streaming of local video and audio, but I wouldn't benchmark Oracle or PostgreSQL on either.
My blog
I can finally make full use of my quad-core toaster!
That toast isn't going to serve itself!
Skiffy is Spiffy, but Ort is tort.
Maybe now we can finally declare year of the linux desktop!
Wait, what?
If you haven't made a developer cry, you've wasted a day.
It probably has a lot to do with FreeBSD having a much more focused niche. FreeBSD is really tuned primarily for servers. You can use it on your desktop, but that's not really its main purpose. Linux, on the other hand, has really branched out. It has desktop distros, server distros, embedded distros, and probably a couple of other areas I haven't thought of.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Speaking of which: are there any "distros" out there that ship a combination of FreeBSD and the latest Gnome desktop? I think that would be a better combination than Ubuntu's Debian+Gnome combo, personally.
IOW, if there is a performance difference, I would expect it to show up exactly the same in both FreeBSD and Linux (as well as any other OS that supports SMP).
Does FreeBSD have a pre-emptable kernel? One of the things Linux has really focused on lately is desktop interactive performance, so there may be performance tradeoffs vs. a kernel which can't pre-empt itself.
I want to delete my account but Slashdot doesn't allow it.
to the enlightening and respectful conversation this article will provoke.
You think so? I dunno, it seems to me that FreeBSD suits the desktop role really well; I use it for preference. Especially when you consider that the only OS with more packages is Debian, it makes sense that it can fit a desktop role extremely nicely.
How to use coral cache: http://slashdot.org.nyud.net:8090/~oscartheduck
Benchmarks are almost but not completely useless. In this particular setup, FBSD 7.0 runs postgres doing some specific set of queries faster than Linux.
It's a safe bet Linux will do some other set of things faster than FreeBSD does them, possibly even another specific set of PostgreSQL queries for that matter. Linux is definitely more concerned with desktop app performance. I can say this safely because Linux actually cares about it; FreeBSD does not. It's there to serve, not run X. It will run X, and if they see a way to make performance better for the desktop apps AND the server apps, then it may go in the source tree. If it's going to hurt the server side, don't bet on it.
While I use FreeBSD for my servers because it's got a clean filesystem layout and is designed to be a server OS, I'd be willing to bet that someone with deep knowledge of PostgreSQL on Linux could give it a run for its money by tweaking the kernel for server performance.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Very, very nice scaling performance under PGSQL is evident in the PDF, and I've no reason to assume the benches aren't legit. I think part of the reason that PG was traditionally slower than MySQL was that it did lots of complicated locking to provide better scalability across processors, whereas we see MySQL performance dropping off after we go to more than eight cores. I think this was the same philosophy Sun took with "Slowaris", which was also far more scalable than Linux at the time the moniker was in widespread use.
Still, I hope Linux can at least match this sort of superb scalability. CFS is fairly new, and I know there's been optimisation work done to it in .24 and .25, although it was a little sad to see the first iteration of CFS performing more poorly than its predecessor (and, if this is the case, I can see why Linus stonewalled CK's patches for so long, since they were mainly tested on desktop workloads). Are there any apples-to-apples comparisons out there that test various flavours and versions of Linux and BSD with a wide range of benchmarks? At best, review sites do a few benches with MySQL, and six months later everything has changed, so it's incredibly difficult to do good performance comparisons.
Even so, it's refreshing to see precious little of the "BSD fudged their benchmarks!" trollspeak in the LKML thread, and plenty of talk about how to make Linux better. Open source is hippy capitalism - it also needs healthy competition to keep it in check :)
Offtopic: bug linked to in the LKML pointed me at this http://www.latencytop.org/ Sounds quite nifty
Moderation Total: -1 Troll, +3 Goat
How many of those packages are desktop packages? Seems like an odd metric to just compare the number of packages as to how well an OS is suited to the desktop.
Linux is actually better than BSD because you can roast marshmallows over the scheduler flamewars.
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Well, I don't think I've ever installed any package from anything other than the ports system. Lots? I know I've installed everything from Gnome, XFCE and KDE, through OpenOffice and a bunch of stuff in between.
You're right that the mere number of packages is a weird metric, but what else can we offer? FreeBSD has great performance, and has everything necessary to be either a good server *or* a good desktop. It's much like Gentoo that way -- it doesn't focus on being either one or the other, it focuses on being a solid basis. What you put on top of that basis is your choice. It honestly seems to me that the distinction between server OS and desktop OS is its own entire discussion; if we can come to a good notion of what either means, we can reach a nice conclusion. If we take the current crop of Linux desktop OSes, though, I don't see any more integration between, say, Fedora and Gnome, or Ubuntu and Gnome, than between FreeBSD and Gnome.
If I think about it, it does seem that Ubuntu starting with a GUI interface and letting you find the command line by yourself is more friendly to the average user; I haven't installed FreeBSD using anything other than minimal-install for so long that I don't know whether you can have a GUI start up by default. And FreeBSD's installer, whilst excellent for its audience, is less friendly to a first timer. If we take those metrics, the idea of "can I sit down and first time use it without documentation?" then a lot of the linux crop are friendlier, yeah. But the documentation *is* very hand-holdy, and very very thorough for FreeBSD. And nicely available online.
While I'm glad for FreeBSD that they're showing good numbers again, their testing of PostgreSQL in this study is rather odd. The results are using the read-only tests from sysbench. You can see from its sourceforge page that sysbench is a MySQL benchmarking tool that has some rudimentary PostgreSQL support bolted on top. That particular code is so bad that the last time I checked, turning on the write OLTP tests deadlocked the PostgreSQL server, as it wasn't putting statements into transactions correctly (which of course the ancient MySQL versions this code targeted don't care about). As the sysbench tool hasn't been actively maintained in ages, I doubt that has improved.
The claimed "15% faster than linux" is pretty clear in the MySQL tests; the PostgreSQL ones have a weird dip in them but are in general much closer. I'd be comfortable if the result of this study was "FreeBSD 7 has been optimized to be 15% faster running MySQL than Linux", because that matches what they did (note the specific libpthread patch for example). But the fact that they used such an awful PostgreSQL benchmarking methodology leaves me hesitant to draw a broader conclusion than that based on their tests.
Yes; Jeff Roberson, the author of FreeBSD's new ULE scheduler, wrote a bit about it on his blog a couple of months ago.
but I heard it wasn't compatible with Windows or labtops.
:) I want to download the internet onto my labtop.
Can you help? Sorry, I am not good with computers.
+++ATH0
http://article.gmane.org/gmane.linux.kernel/650739
It does (I use it too) BUT only in specific environments. FreeBSD hardware support is not bad, but it is nowhere near as complete as that found in the various Linux distros. My wireless keyboard + mouse is supported under any recent Linux distro; on FreeBSD, only the keyboard works (fixable with an unofficial ums.ko though). No support under FreeBSD for my DVB-C PCI card either.
It only takes one man to change the Wisdom of the Crowd to Tyranny of the Masses.
The same way an HT CPU shows up as 2 CPUs (with disastrous effects) unless the OS is aware and can properly exploit it?
Some dual cores share L2/3 cache, but not all. Other important factors are the shared connections to the external world, such as memory. So I presume inter-CPU communication is faster, but external communication can be slower.
That aside, HT is a hack which should not be compared to dual core systems at all. In fact, "dual core processor" is a rather silly marketing term, because it means "two processors on one piece of silicon". In other words, you could interpret the phrase "dual-core CPU" as "a CPU that contains two CPUs".
Escher was the first MC and Giger invented the HR department.
Agreed, on both points. What I want to know, though, is where this performance improvement, and 7.0 in general, leaves Dragonfly BSD... do they still feel that Dragonfly's choice to split off at 4.x and start making radical changes is paying off? Is Dragonfly making progress towards better performance, in general, or on particular workloads?
I saw what Matt Dillon did back in Amiga days. I saw what Amigas themselves could do. If Amigas inspired Dragonfly towards a more lightweight model, I'd love to see that fork making more progress.
The article is probably misleading (surprise, surprise), as the tuning documentation for PostgreSQL *states* that good IO performance has more of an impact than good CPU performance. Additionally, some other information I've read (search for postgres tuning/optimi(z|s)ation online) recommends FreeBSD because of its strong IO performance. I'll go out on a limb and assume that MySQL's performance attributes are similar.
In my opinion, the article summary is a pretty big red herring because the SMP performance may not have a huge impact on the result.
For one, CFQ is not supposed to be an optimized I/O scheduler for database loads. That's where the Deadline scheduler comes in. You wouldn't want a "Fair" scheduler on your database server, as you would end up putting the database in I/O wait to handle lower priority processes.
/^([Ss]ame [Bb]at (time, |channel.)){2}$/
Cool, however it would be better if software working on Linux were also working on FreeBSD.
boehm-gc is totally broken when using threads on FreeBSD SMP. And it's still totally broken on FreeBSD 7.
The Neko virtual machine is in ports, but it's unusable due to this. I don't even understand why it's in the ports tree. Was it ever tested before being imported?
Just creating a thread:
$loader.loadprim("std@thread_create", 2)(function(z) { $print(z) }, "OK");
makes it crash with a corrupted stack. It works on every other operating system. It seems to work on a UP FreeBSD system, but on a FreeBSD 7 SMP system, it crashes, crashes, crashes.
Hi, I am the one who performed these benchmarks and I'd like to clarify a couple of things:
* The point of this benchmark is not to unilaterally declare victory over Linux, but to point out that FreeBSD is once again competitive with it on modern high-end hardware and certain workloads. Of course, we are working on other workloads too, and currently perform better than Linux on other benchmarks, and still worse on others. There will no doubt be further friendly competition between the two OSes that will work to the benefit of both. Our message to the Linux developers is that they should not expect to get away with resting on their laurels :-)
* I benchmarked both mysql and postgresql, and FreeBSD 7.0 performs better than all Linux kernels (at least up to 2.6.23) with both databases. Incidentally postgresql is much faster than mysql, contradicting common wisdom. Other fun facts are that mysql 5.0.51 has poorer scaling than 5.0.47, and 5.1.x has *much* worse performance and scaling than 5.0.47 on my tests.
* I benchmarked several versions of Linux including 2.6.20.x, 2.6.22 and 2.6.23. 2.6.20.x has terrible performance http://people.freebsd.org/~kris/scaling/scaling.png. This graph is from Feb 2007 and the FreeBSD performance also improved after this point.
* 2.6.22 (which is pre-CFS) mostly fixed this but still performs worse than FreeBSD http://people.freebsd.org/~kris/scaling/os-mysql.png. 2.6.23 included the new scheduler and was a major performance regression. I did not yet retest with 2.6.24, so maybe they have fixed CFS by now.
* Contrary to some commenters' assertions that this is not a CPU benchmark, this benchmark is *extremely* sensitive to CPU performance and especially scheduling (in fact, as noted in the PDF, I/O performance is not a factor here). The scheduler really matters here, which is why Linux took a big hit when they switched to CFS (similarly, on FreeBSD the 4BSD scheduler performs much worse). Tuning the scheduler is critical to performance on this kind of workload. The other critical aspect is having a highly optimized kernel without concurrency bottlenecks. 2.6.20 fell over on kernel concurrency, and 2.6.23 fell over with the scheduler.
Hope this helps to clarify things.
Looks like it didn't last for long:
http://www.kernel.org/pub/linux/kernel/people/npiggin/sysbench/
Have you ever looked at a block diagram of the predominant dual core designs? They're not simply "two processors on one piece of silicon". Both Intel and AMD used a shared cache design with a single connection to the system bus (FSB and HT, respectively). In the case of AMD, it also means a shared memory controller. It's a real difference with real performance and power implications, not a "silly marketing term".
Now if you complained about Intel shoving two dies into a multi-chip package and calling that quad-core, I'd agree with you. All the reduced bandwidth of a shared connection to the FSB with none of the shared cache! Sign me up!
I would say that packages != programs.
With a debian "package", I know exactly how to install it (the same way as all the others), and I know that there is a set version of that package that corresponds to, say, "Debian Sarge". I know that if I install it, it will pull along any libraries it needs, and that it won't break anything already on my system. I know it doesn't always work like that, but that's the idea. I think of a "package" as part of the distribution. Somebody has decided that it forms part of the distribution, and has hopefully tested it as such.
A "program" is what Windows has so many of. But all bets are off when it comes to versioning, library dependencies, etc. Even how to install it. If you think of Windows as a "distribution", then it doesn't come with all that many packages at all. A Desktop environment, a browser, some photo and media tools. Mac OS X doesn't really fare all that much better. I love OS X to bits, but the first thing I did was install a third party program (firefox).
only OS with more packages is Debian
Whatever happened to Windows?
Vista. That's a non-operating system.
-- Alastair
Here, and it applies to a significant number of other network servers.
Dramatic improvements in performance and SMP scalability shown by various database and other benchmarks, in some cases showing peak performance improvements as high as 350% over FreeBSD 6.X under normal loads and 1500% at high loads. When compared with the best performing Linux kernel (2.6.22 or 2.6.24) performance is 15% better.
http://people.freebsd.org/~kris/scaling/bind-pt.png
Summary:
* FreeBSD 7.0-R with 4BSD scheduler has close to ideal scaling on this test.
* The drop above 6 threads is due to limitations within BIND.
* Linux 2.6.24 has about 35% lower performance than FreeBSD, which is significantly at variance with the ISC results. It also doesn't scale above 3 CPUs.
* 7.0 with ULE has a bug on this workload (actually to do with workloads involving high interrupt rates). It is fixed in 8.0.
* Changes in progress to improve UDP performance do not help much with this particular workload (only about 5%), but with more scalable applications we see 30-40% improvement. e.g. NSD (ports/dns/nsd) is a much faster and more scalable DNS server than BIND (because it is better optimized for the smaller set of features it supports).
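For readers who want to put a number on phrases like "close to ideal scaling" in the summary above, here is a minimal Python sketch. It assumes you have a throughput figure per thread count; the function name and the sample figures are hypothetical and are not taken from the linked graph.

```python
def scaling_efficiency(throughput_by_threads):
    """Return {threads: efficiency}, where 1.0 means perfectly linear scaling.

    Efficiency is the measured throughput divided by the ideal throughput,
    i.e. the single-thread result multiplied by the thread count.
    """
    base = throughput_by_threads[1]
    return {
        n: tput / (n * base)
        for n, tput in sorted(throughput_by_threads.items())
    }


# Hypothetical queries-per-second figures, NOT read off the BIND graph.
example = {1: 100.0, 2: 200.0, 4: 300.0}
print(scaling_efficiency(example))
```

An efficiency that stays near 1.0 as threads are added is what "ideal scaling" would look like; a value that falls away at higher thread counts points to a bottleneck in either the kernel or the application.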
Interested in open source engine management for your Subaru?
A package is a bundle of stuff that can be installed using your OS' package management facility. BSD's Ports, Gentoo's portage, Debian's apt (also used by Ubuntu). The "big two" commercial OSes don't really have an equivalent to that; Windows e.g. only lets you install some optional components using a unified frontend. Counting the number of packages is easily possible and done by the repository maintainers.
A program is quite hard to define. A handwritten script could be considered a program by some, others may reserve the term for publicly available software. The number of programs is very hard to approximate and impossible to determine unless you chose an uncommon, restrictive definition and a point in time of which you possess all information.
Nobody said Windows didn't have lots of programs and software available for it; probably more than any other OS family on the planet. It does not, however, have a central facility to classify and automatically install them from. (Cue jokes about IE + ActiveX doing a great job of auto-installing all the malware from MSFT's repository called "the intertubes").
Because netcraft confirms it.
It honestly seems to me that the distinction between server OS and desktop OS is its own entire discussion; if we can come to a good notion of what either means
I don't think the desktop/server distinction means anything anymore, and for three reasons. One, cheap commodity hardware. Two, the literal glut of software. Apache too bloated? Use lighttpd. KDE overblown? Use fluxbox. And three is (open) standards (no sniggering in the back, please.) When everything uses TCP/IP or XML or whatnot, interoperability increases exponentially.
Simply put, we have power and flexibility at easy disposal. What you do with it is up to you.
Je me fous du passé
Is 8-way still considered SMP? I mean, 8-way is kind of consumer level now, isn't it? Even Apple produces 8-way SSI machines.
Get it to scale on some serious SGI kit, for example, then we'll talk.
Max.
How about vmware? I don't think that runs on bsd either...
Linux will run virtually everything bsd will (after a recompile)... And most linux apps will recompile for bsd, but bsd's linux emulation isn't perfect when it comes to precompiled linux apps...
There's also hardware support: does bsd have drivers for modern ati videocards yet? I know the linux drivers suck, but it's slightly better than nothing.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
As t'other poster pointed out;
CFS = Completely Fair Scheduling = CPU scheduler = what process gets how much of the CPU
CFQ = Completely Fair Queuing = I/O scheduler = what process gets how much of the hard disc
FWIW, on our database loads at least, I find that whilst deadline tends to give the lowest single transaction rate, CFQ gives better overall performance (i.e. more transactions served) over a given time period. Anyone tried the CFQ, deadline and no-op schedulers on a solid state disc yet?
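For anyone wanting to try the comparison above, a minimal Python sketch of checking which I/O scheduler a Linux block device is using. It assumes the usual sysfs layout, where `/sys/block/<device>/queue/scheduler` lists the available schedulers with the active one in square brackets; the device name `sda` and both function names are placeholders, not anything from the posts here.

```python
from pathlib import Path


def parse_scheduler_line(text):
    """Split a sysfs scheduler line into (active, available).

    The active scheduler is the token wrapped in square brackets;
    active is None if no token is bracketed.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def current_io_scheduler(device="sda"):
    """Read the scheduler line for one block device, or None if it is absent."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    if not path.exists():
        return None
    return parse_scheduler_line(path.read_text())


print(parse_scheduler_line("noop deadline [cfq]"))
```

Writing a scheduler name back to the same file (as root) switches the device over, so the same workload can be rerun under each scheduler in turn.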
The article also describes a FreeBSD 7.0 pre-release from October last year. This still had debugging code turned on in the builds, as mentioned on the NetBSD lists when Andrew Doran was comparing NetBSD -current SMP performance.
Yes, I meant that: who cares?
Nobody living outside their parents' basement is going to switch from Linux to BSD for a 15% performance increase. Somebody already using BSD might upgrade if the latest BSD kernels and environment are significantly better than past environments, but 15% is so slight as to be basically undetectable in a real-world environment!
My rule of thumb for upgrading equipment has been to not bother until we hit a full order of magnitude improvement. In other words, if 1) we can 10X the performance of a system AND 2) there have been complaints about performance, then the upgrade is probably worth it. Even then, the value is dubious. For example, in Postgres, (or any other database application) it's very typical to see 100x improvement simply by creating an index!
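To illustrate the index point, a small Python sketch. It uses the standard-library `sqlite3` module rather than Postgres, purely so that it runs without a server; the table, column and index names are made up for the example. It shows the query plan changing from a full table scan to an index search, without claiming any particular speedup figure.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

lookup = "SELECT total FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, lookup, (42,))  # no index yet: full scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))  # the planner can now use the index

print(plan_before)
print(plan_after)
```

In Postgres the equivalent check would be `EXPLAIN` on the same query before and after `CREATE INDEX`; how large the gain is depends entirely on table size and data distribution.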
Maybe this is good for frail BSD egos, who have been long bruised by the mindshare success of Linux over the more historic and "more free" BSD. So be it. But it's not performance that's kept me from using BSD, it's familiarity and the pain of switching. And that's also what kept me running it yesterday, will today, and tomorrow too.
Don't get me wrong - I would hate to see BSD "die" in any meaningful way. The different cultures between Linux and BSD create a very rich, diverse environment where ideas can be tested, and the cross-feed of proven concepts and technologies (EG: Open SSH) benefits all involved!
But the benefit of a 15% performance increase is almost never going to be sufficient reason to pick one computing technology over another!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Funny you should mention that. If you rule out junk software like sparkly mouse cursors, Windows seems to have less software than any other major OS (given that most Unix software is already ported to OS X, or at least can be). I feel constricted every time I have to use a Windows box because none of the programs I want to use are installed, or even readily available. No, I'm not joking.
Dewey, what part of this looks like authorities should be involved?
I did this once. Your mod points are undone and lost.
Space game using normal deck of cards: http://BattleCards.org
I am so fed up with reading this. Yes, Linux has more drivers installed "out of the box" than windows. Big deal. Every single piece of hardware I have ever bought came with a CD that had drivers for windows. Yes, it's a bit of a pain having to install them all manually after reinstalling the OS, but you only have to do it once. It's far more of a pain to find that your shiny new toy has no working drivers for Linux.
I use Linux as my desktop OS, but I am not prepared to ignore its shortcomings. From where I'm sitting right now I can see three devices that do not work with Linux. All of them have drivers for XP (not sure about vista).
I see hardware support like this. If a driver exists for Linux, then the support is generally far better than windows. You plug it in, it works. If, however, your distro does not have a driver, then you are very probably shit out of luck. The device will either not work at all, or require hours of fiddling. Windows, on the other hand, has virtually no "out of the box" support. Plug anything in and prepare to be met with yellow exclamation marks in the device manager. The difference is that unless it's some ancient or obscure bit of kit, it will either come with a driver disk or have a driver available on the manufacturer's web site. Every piece of hardware you could buy works with windows.*
And since you asked:
My PCLine webcam, my Nokia phone, and the USB modem my ISP gave me. Now to be fair, it might be possible to coax all of these devices into working if you know the correct incantations and rituals, but in every case they failed to work "out of the box".
* Before someone replies to me with an example of a device that won't work in windows, allow me to qualify this. I'm referring to desktop hardware, manufactured in the last, ooh, let's say seven years. I defy you to find anything on PC World's shelves that is not Windows XP compatible (I have never used Vista, so in a break with Slashdot tradition, I'm not going to spout off about something I know nothing about). I'll bet you a month's salary I can find something that won't work with Linux.
"I realise this is not a very popular opinion but it's the truth, and there for needs to be said" -Bill Hicks
In AMD's case, the shared cache sits on the other side of the fully-connected crossbar, which allows intra-core communication to happen without using HyperTransport at all. So yes, it's shared, but each core has its own "port" to it and can access it independently. Same deal for the Intel shared L2. The phrase "single connection to the system bus" is misleading because it implies a bottleneck where there (most times) isn't one.
In the case of AMD, it also means a shared memory controller.
The memory controller on Intel systems is shared as well. It's just sitting on a different chip, across the FSB.
In fact, once you move to multi-socket, AMD systems generally have as many memory controllers as there are sockets, and with NUMA optimizations in modern OS's, it's likely that a core will only ever need the memory controller which it's closest to. In Intel systems, all cores on all sockets still share a single memory controller.
You equated multi-core with multi-processor. I countered that it's fallacious to say that about both volume designs currently in the marketplace.
Shared cache is hardly a necessity. The original Pentium D didn't have any. And (I misspoke, er typed) neither does the Athlon X2. It's just an option that makes sense when you're sharing silicon.
Another differentiator is that multi-core designs can communicate at native clockspeed, rather than resorting to an interconnect. Hypertransport is fast, but shared silicon is faster.
I wouldn't discount cache and interconnect as tangential aspects of a processor. If you look at any modern CPU the majority of the die size is going to be cache, and a significant portion of the power draw comes from the system interface.
Even discounting the performance gains possible with shared resources and on-chip intercommunication and ignoring the power savings (note that quad core parts are hitting the same power envelope as dual core without drastic process changes), there's the simple matter of density. Producing a 1U rack server with eight discrete processor slots would be an engineering miracle, yet any white box operation will happily sell you a 1U rack server with dual four-core processors.
The difference is in the real-time scheduling requirements that come with a GUI. Very minor delays in GUI rendering have very perceptible impact on the snappiness of a UI. Server workloads (DB, HTTPD or whatever) have less stringent real-time requirements. Throughput ends up mattering more as long as the latency is in a reasonable range.
What metric? Desktop drivers.
Gnuyen
A dual core is likely to be different from a dual processor machine. With Intel's Core2 Duo machines (I'm only using that processor because I know its architecture, not because it's better or worse than anything else), both cores on a chip share the L2 cache. So a Dual Core Xeon with 8MB of L2 cache shares that cache between the two cores and is not the same as 2 processors each having 4MB of L2 cache. Besides the ability to have 1 core use all 8MB of the cache (presuming the 2nd core is forcibly halted and left idle), there are scheduling differences and differences in migration costs. Intel's 1st quad-core chips after the Core 2 were logically 2 Core2 processors on the same chip. Each pair shared an L2 cache, with there being a total of 2 L2 caches on the one chip.
In some ways, that quad arrangement is like a Dual-Socket motherboard that has a Core2 Duo in each socket. Migration costs between adjacent cores (if migration includes cache loading costs) would be considerably less than between the two separate processors.
I believe the first dual-core chips were similar to dual-processor machines in that each core had its own separate, fixed-size cache. Logically -- one could achieve maximal resource usage on processors with shared caches, since whether your workload involved 1 active thread or multiple, the threads that are active can use all of the available cache, whereas multi-core processors with each core having its own separate cache will be limited to that cache even when other cores are idle.
At the time the Core Duo came out, AMD chips seemed to mostly (?completely?) sport per-core caches, so the Core Duo was a jump forward. Had the Quad-Core2-based chips had fully shared L2 caches, it would have been a no-brainer to upgrade to a quad-core with 8M L2 from a dual-core with 8M, but the processors on the quad-core chip would be limited to 4M max per core (or per pair), whereas the dual-core chips could use up to 8M of cache.
Of course the impact of cache size and whether it is sharable is totally dependent on what program(s) you are running, but local benchmarks between a 2GHz-8M-Core2Duo and a 3.2GHz-4M-Core2Duo showed the 2GHz chip beating out the 3.2GHz chip on small-to-medium problems, with the 3.2GHz chip taking the lead only in larger problems.
Supposedly, the linux kernel scheduler (pre-CFS) recognizes that inter-processor switching costs more than intra-processor switching, but I've been unable to verify this. It might require some manual configuration using "CPUsets", but I don't know.
FWIW, the new CFS CPU scheduler (which is different from the block layer's CFQ block-I/O scheduler) seemed awfully rushed into use as the "mainline" scheduler. I think it is because Linux has a "design choice" that it doesn't allow for modular CPU schedulers as it does in the case of "block-i/o" (and USB I/O scheduling, and file systems, and choice of network layer, and partition type ...even partly with security (a hybrid model with some security being configurable (LSM) and some designed not to be (the "standard", user-controlled Unix file-access bit checking isn't modularized)). It's odd that CPU scheduling was thought to be a 1-size-fits-all model when virtually nothing else is). But because it isn't configurable, there was no way to make the CFS CPU scheduler an optional, _testable_ scheduling module before it was chosen as the "one-and-only" model.