Dual Athlon Preview: Linux Kernel Compile Smokes
Mr. Flibble writes: "The fellows over at NewsForge have an article describing how they were able to test the 'World's First Dual DDR Athlon' running Mandrake 7.2 on a prerelease motherboard and chipset. The surprising thing is that the dual system was 142% faster in a kernel compile than a single processor system!" Jeff (of NewsForge) says this is the genuine truth. Now if only the right motherboards would start showing up in quantity on Pricewatch...
I did a similar test using a 466MHz dual Celeron system with 128 MB of memory running Red Hat 6.x (with that special Abit board). :)
It's also great for compiling or MP3 encoding in the background: there's almost no noticeable lag, not to mention the nearly 100% increase in MP3 encoding performance (I believe that task was mostly CPU-bound).
In order to be scientific, you need a control, and I'm sorry to say that this reviewer didn't use one. You point out that -j helps even on single-CPU machines, and that was definitely the case in my test results (I can dig them up if anybody is interested). BUT there is a limit to the speedup you get from -jN, since a single task running at full throttle is much faster than 2 or 10 tasks switching back and forth. So, in both single- and dual-CPU modes, I ran a bare make, then -j 2, -j 3, -j 4, and finally -j 5 (where performance was being hurt).
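A minimal Python sketch of that sweep, for anyone wanting to repeat it. The names `make_command`, `time_command` and `sweep` are hypothetical, and the assumption that `make clean` is enough to force a full rebuild between runs is an assumption, not something the poster described. Run it once under a uniprocessor kernel and once under an SMP kernel so both modes see exactly the same set of -j values.

```python
import subprocess
import sys
import time


def make_command(jobs=None, target="bzImage"):
    """Return the argv for a bare make (jobs=None) or a make -jN build."""
    cmd = ["make"]
    if jobs is not None:
        cmd.append(f"-j{jobs}")
    cmd.append(target)
    return cmd


def time_command(cmd, cwd=None):
    """Run cmd to completion and return its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(cwd=None, job_counts=(None, 2, 3, 4, 5), target="bzImage"):
    """Time one build per -j setting, cleaning the tree before each run.

    Assumption: 'make clean' is enough to force a full rebuild of the target.
    """
    results = {}
    for jobs in job_counts:
        subprocess.run(["make", "clean"], cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL)
        results[jobs] = time_command(make_command(jobs, target), cwd=cwd)
    return results


if __name__ == "__main__":
    # Hypothetical usage: python3 jsweep.py /usr/src/linux
    tree = sys.argv[1] if len(sys.argv) > 1 else None
    for jobs, seconds in sweep(cwd=tree).items():
        label = "bare make" if jobs is None else f"-j{jobs}"
        print(f"{label:>10}: {seconds:7.1f} s")
```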
I don't recall exactly, but I believe that beyond -j 4 I was swapping to disk (I do know I reached that point at a sufficiently large -j value).
Another problem with the experiment was that the slower method was run first. There is the issue of disk caching: the second test stood a chance of having key libraries, and possibly most of the source code, still in cache at launch, which would dramatically reduce I/O latency. An ideal test of CPU performance would be to put half a gig of memory in there, run one configuration once, reboot, then run the other. This is precisely what I did, and I do believe several seconds were shaved off for cached recompiles.
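One way to take run order out of the picture without rebooting is to throw away a warm-up build and take the median of several timed builds, so every configuration is measured with an equally warm cache. That measures warm-cache behaviour rather than the cold-cache case the reboot approach gives you, so it is an alternative technique, not what was done above. A sketch, where `benchmark` is a hypothetical helper and `run_once` is any zero-argument callable you supply that performs one build and returns its elapsed seconds:

```python
import statistics


def benchmark(run_once, repeats=3, warmup=True):
    """Call run_once() (which returns elapsed seconds) several times.

    If warmup is True, the first call is discarded so that every measured
    run starts from the same warm page cache, instead of whichever
    configuration happens to go first paying the cold-cache penalty alone.
    Returns (median, samples).
    """
    if warmup:
        run_once()  # result intentionally thrown away
    samples = [run_once() for _ in range(repeats)]
    return statistics.median(samples), samples


if __name__ == "__main__":
    # Stand-in values in place of real builds, just to show the call shape.
    fake = iter([10.0, 5.0, 6.0, 4.0])
    median, samples = benchmark(lambda: next(fake))
    print(median, samples)  # 5.0 [5.0, 6.0, 4.0]
```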
Personally, I like dual procs just so I can watch xosview's dual-CPU meters flop back and forth.
-Michael
The make -j3 lets make run three processes at once, which would lead to a speedup even on a single processor system, because disk I/O and CPU-bound compilation can overlap.
The noise you hear is the sound of thousands of single-CPU /. readers typing "make -j3 bzImage".
There's a better news bite at Ace's about this. Basically, the second compilation used three parallel make processes, so the CPU may have had less idle time and less of an I/O bottleneck than in the single-process run.
"Unfortunately, the benchmarks vary significantly between the two tests in that the first is completely serialized while the second (dual-processor) test is run with three parallel make processes (notice the -j flag). Because the first system is running with only a single build instance, the processor is spending a great deal of time simply waiting on IO. Meanwhile, the dual-processor test was performed with not just two, but, in fact, three make processes. The difference here is that a processor will not be completely idle while waiting on IO in the second test, as there are two additional build processes running concurrently. This is why the use of the -j parameter is often recommended even for uniprocessor systems, as a parallel make will often yield much higher CPU utilization and thus faster compiles."
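To make the overlap argument concrete, here is a toy model (not a real scheduler): each compile task waits on the disk for `io_ticks`, then computes for `cpu_ticks`; the disk serves one request at a time, up to `cpus` tasks compute per tick, and `jobs` plays the role of make -j. Everything about it, including the `makespan` name and the tick-based timing, is an illustrative assumption.

```python
def makespan(n_tasks, io_ticks, cpu_ticks, jobs, cpus):
    """Toy model of a parallel make: return total ticks to finish all tasks.

    Each task does io_ticks of disk wait, then cpu_ticks of computation.
    The disk serves one task per tick (FIFO); up to `cpus` tasks compute per
    tick; at most `jobs` tasks are in flight at once (like make -j).
    """
    if io_ticks < 1 or cpu_ticks < 1:
        raise ValueError("io_ticks and cpu_ticks must be at least 1")
    pending = n_tasks
    active = []  # each entry is [phase, ticks_remaining], phase "io" or "cpu"
    tick = 0
    while pending or active:
        # Start new tasks while there are free job slots.
        while pending and len(active) < jobs:
            active.append(["io", io_ticks])
            pending -= 1
        waiting_on_disk = [t for t in active if t[0] == "io"]
        ready_to_run = [t for t in active if t[0] == "cpu"]
        if waiting_on_disk:
            waiting_on_disk[0][1] -= 1   # the disk serves one request per tick
        for task in ready_to_run[:cpus]:
            task[1] -= 1                 # each CPU runs one task per tick
        tick += 1
        # Advance tasks whose current phase just finished.
        for task in active:
            if task[1] == 0:
                if task[0] == "io":
                    task[0], task[1] = "cpu", cpu_ticks
                else:
                    task[0] = "done"
        active = [t for t in active if t[0] != "done"]
    return tick


if __name__ == "__main__":
    print(makespan(4, 1, 2, jobs=1, cpus=1))  # 12: bare make, one CPU
    print(makespan(4, 1, 2, jobs=1, cpus=2))  # 12: bare make, two CPUs
    print(makespan(4, 1, 2, jobs=2, cpus=1))  # 9: -j2, one CPU
    print(makespan(4, 1, 2, jobs=2, cpus=2))  # 7: -j2, two CPUs
```

In this model, comparing the serial single-CPU run (12 ticks) against the parallel dual-CPU run (7 ticks) makes the second machine look about 71% faster, while the like-for-like -j2 comparison (9 vs 7) is only about 29%. That is the same trap the quoted article describes.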
"Until then, it is very difficult to make a representative statement about the performance of a dual-processor Athlon system from this benchmark."
-----------------------------------------
"Open Source?" - Press any key to continue
System: SuSE 7.0, kernel 2.4.1 compiled for uniprocessor with APIC/IO-APIC support.
Athlon 1.1GHz, Asus A7V motherboard. The FSB is 100MHz DDR. Memory is 256 MB of PC133; the disk is an ATA66 5400RPM drive with ReiserFS.
I performed three series of tests. In each series the compiles were run in single-, double-, then triple-job order, and each compile had its own directory.
For the first series, all three trees had been configured with make config per the original article, followed by make dep. After that, I rebooted once and ran all three compiles without rebooting in between. The second series started the process over again with make mrproper / make oldconfig / make dep / time make -jN bzImage, with N being the corresponding job count. Finally, I did make mrproper / make oldconfig / make dep and rebooted each time before the compile.
I should note that on several occasions I got odd results; whether this was caching of some sort or not I don't know. I would get 3m35s on a single thread and 1m9s on -j2 with a removed and recreated directory, plus one or two other occasions. Unfortunately, all of those other occasions happened when I had accidentally run "make -j2 bzImage" instead of "time make -j2 bzImage", so I have no empirical proof. At any rate, here are the recorded ones.
Round 1
Straight
real 3m17.571s
user 2m54.660s
sys 0m13.120s
-j2
real 3m13.772s
user 2m58.390s
sys 0m13.390s
-j3
real 3m13.470s
user 2m59.390s
sys 0m13.180s
Round 2
Straight
real 3m8.048s
user 2m54.780s
sys 0m13.140s
-j2
real 3m11.912s
user 2m58.050s
sys 0m13.590s
-j3
real 3m12.532s
user 2m58.370s
sys 0m13.900s
Fresh-boot compile
The single-thread compile was not redone; its result is the one from Round 1.
-j2
real 3m15.634s
user 2m58.030s
sys 0m13.700s
-j3
real 3m16.433s
user 2m59.310s
sys 0m13.290s
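To turn these into percentages, the `time` fields can be parsed and compared. A hedged Python sketch: the helper names are hypothetical, and "percent faster" is taken as (old time / new time - 1) x 100, under which the 142% figure in the story would mean the single-CPU build took 2.42 times as long. The user + sys totals also show how CPU-bound the straight build already was on this box.

```python
import re


def parse_time(value):
    """Convert a bash `time` field such as '3m17.571s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_faster(baseline, candidate):
    """How much faster `candidate` is than `baseline`, as a percentage.

    Uses (baseline / candidate - 1) * 100, so halving the time is 100% faster.
    """
    return (baseline / candidate - 1.0) * 100.0


def cpu_busy_fraction(real, user, sys_):
    """Share of wall-clock time the build spent on the CPU (user + sys)."""
    return (user + sys_) / real


if __name__ == "__main__":
    # Round 1 figures from the uniprocessor results above.
    straight = parse_time("3m17.571s")
    j2 = parse_time("3m13.772s")
    print(f"-j2 vs straight: {percent_faster(straight, j2):.2f}% faster")
    # prints: -j2 vs straight: 1.96% faster
    busy = cpu_busy_fraction(straight, parse_time("2m54.660s"),
                             parse_time("0m13.120s"))
    print(f"straight build kept the CPU busy {busy:.0%} of the time")
    # prints: straight build kept the CPU busy 95% of the time
```

With the single-threaded build already keeping the CPU about 95% busy, -j on a uniprocessor has at most a few percent of idle time left to recover, which fits the small differences recorded above.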
As you can see, there's not much variation here. The times are also a hell of a lot better than those of a 1.2GHz system compiling single-threaded with DDR SDRAM, which makes me wonder what precisely is slowing down the 1.2GHz...
Food for thought.
You thought that this sig was what you think that I thought you wanted me to think. I think.
The article states:
This isn't really a good way to compare single processor results to dual processor results. The make -j3 lets make run three processes at once, which would lead to a speedup even on a single processor system, because disk I/O and CPU-bound compilation can overlap. The only totally fair way to compare is to boot a non-SMP kernel, run the benchmark, then boot an SMP kernel and run exactly the same benchmark.
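A small sketch of that rule in code: only compare uniprocessor and SMP timings that were taken with the same make command. `fair_speedups` is a hypothetical helper, and the numbers in the example comment are placeholders, not measurements.

```python
def fair_speedups(up_times, smp_times):
    """Compare uniprocessor and SMP runs only where the make command matched.

    Both arguments map a job count (e.g. 1 for a bare make, 3 for -j3) to a
    wall-clock time in seconds. Returns {job_count: up_time / smp_time} for
    the job counts present in both, so a serial single-CPU run is never
    divided by a -j3 dual-CPU run.
    """
    common = sorted(set(up_times) & set(smp_times))
    if not common:
        raise ValueError("no job count was benchmarked on both kernels")
    return {jobs: up_times[jobs] / smp_times[jobs] for jobs in common}


# Placeholder example: job counts measured on only one kernel are ignored.
# fair_speedups({1: 100.0, 3: 90.0}, {1: 50.0, 3: 45.0, 4: 40.0})
# returns {1: 2.0, 3: 2.0}
```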
Even though the 142% speedup is bogus, a two-minute kernel compile is pretty damn fast.
The good thing about AMD's upcoming dual systems is that each processor has a separate bus to memory (via the chipset), unlike Intel systems, where all the chips share the same bus.
The bad thing is that so far only Tyan has announced a motherboard based on the 760MP chipset, and that board is definitely aimed at servers: it won't fit in a standard ATX case.
Jesus used to be my co-pilot, but we crashed in the mountains and I had to eat him.
Linuxisforcommunists.org