Dual Athlon Preview: Linux Kernel Compile Smokes

142% Faster? Ok, but did you............. by Anonymous Coward · 2001-02-01 14:15 · Score: 4

Ah, yes... I can see it now.

Marketing Bozo: "Ok folks, here's the 'before' version. Wow, that's mighty slow! Let's Ctrl-C out of that kernel build, drop in that second Athlon, and build that kernel again!"

A few minutes of fiddling pass

Crowd: (ooooh---ahhh!)

Marketing Bozo: "Ok! Off we go! Wow, look at that sucker haul! It's nearly 150% faster than the single processor version!"

Crowd: (ooooooh!---ahhh!) (clap clap clap)

A voice from the back of the crowd speaks...

Bowie: Hey dickweed! You forgot to MAKE CLEAN!

A fight breaks out..

Crowd: "Kill him!! KILL HIM! He runs that PROPAGANDA page! His words and ideas bring fear, destruction and DEATH to all who listen!! KILL HIM!!"

The sounds of the beating continue as Marketing Bozo takes pre-orders for his motherboard..

Just another day at the convention..

Yeah, but ... by arthurs_sidekick · 2001-02-01 20:59 · Score: 2

Did they try to boot the kernel they compiled? =)

--
"Oh, I hope he doesn't give us halyatchkies," said Heinrich.

Re:Ace's Hardware by maraist · 2001-02-01 21:08 · Score: 5

I did a similar test using a 466 Dual Celeron system with 128Meg of memory on Red Hat 6.x (With that special Abit board).

In order to be scientific, you need a control. I am sorry to say that this reviewer did no such thing. You point out that -j helps even on single-CPU systems, and this definitely was the case with my test results (I can go dig them up if anybody is interested). BUT, there is a limit to the performance enhancement of -jxxx, since a single task running at full throttle is much faster than 2 or 10 tasks switching back and forth. So what I did was, for both single and dual CPU modes, run with the bare make, then -j 2, then -j 3, -j 4, and finally -j 5 (where performance was being hurt).

I don't recall exactly, but I believe beyond -j 4 I was swapping to disk (though I know I achieved that phenomenon at a sufficiently large number).

Another problem with the experiment was that the slower method was run first. There is the issue of disk caching, namely that the second test stood the chance of having key libraries and possibly most source code still in cache during launch, which would dramatically reduce the IO latency. An ideal test of CPU performance would be to put half a gig of memory in there, run it through once, "reboot", then run it for the other. This is precisely what I did, and I do believe there were several seconds shaved off for cached recompiles.

Personally, I like dual procs just so I can watch xosview's dual-CPU meters flop back and forth. :) Additionally it's great for compiling / MP3ing in the background. Almost zero lag is noticed (not to mention the almost 100% increase in MP3 encoding performance; I believe that was mostly CPU bound).

-Michael

--
-Michael

Re:The lie of -j3 and no "make dep" by maraist · 2001-02-01 21:18 · Score: 2

Two threads on one processor:

You mean two processes (jobs as make calls it). :) AFAIK make is not multi-threaded. And even though Linux makes little distinction, MT is faster than MP due to a reduced cacheable memory footprint. But most of that is moot: make has extremely little overhead, and all the caching gets thrown away because make makes hundreds (thousands even) of exec calls.

That reminds me.. When is someone going to add make and gcc libraries to Perl? I want to be able to use Perl as the "process-glue" between all these steps so that building does not require forking / reparsing of all those damn .h files (something that at my project at work literally takes 4 hours.. 1 minute for each .cc file (even if it's only a couple lines long)). If someone experienced enough with those libraries wants to work with me, I can handle the perl-xs. :)

-Michael

--
-Michael

Re:I stand corrected... by twdorris · 2001-02-01 21:18 · Score: 2

Because the test isn't a valid test. They didn't *just* change the CPU configuration when they ran the second test. In fact, they didn't change the configuration at all. They just increased the number of concurrent compiles. Doing this even on a single processor system would have shown an improvement simply because there's IO latency involved in compiles (and lots of it). Running more than one compile at a time allows a second or third compile to use the CPU while the first and/or second are tied up waiting for IO.

The test they ran does not indicate the benefits of dual CPU alone. It shows the benefits of dual CPUs combined with the benefits of running multiple compiles at the same time. That's why you end up with more than 100% increase.

What about caching? by CarrotLord · 2001-02-01 14:22 · Score: 2

Of course, there's a lot here that is being missed, even apart from the -j3 debacle...

The disk itself will be doing more caching on the second time through, as will the RAM disk cache, and various other caches (even the caching of gcc itself...) Also, does a `make clean` _really_ clean the tree back to pre-compile stage?

To do this properly would require two separate kernel trees to compile, and a reboot in the middle, and preferably SMP kernel vs non-SMP kernel in the reboot... The other way, which is more practical in the circumstances, could be to try doing a `make -j1; make clean; time make -j1` followed by a `make -j2; make clean; time make -j2`... That would be closer to reality, but still not quite...

rr

--
Quidquid latine dictum sit, altum videtur.

As long as the price is reasonable... by Mr.+Flibble · 2001-02-01 14:26 · Score: 2

I would expect to see Tyan's board sell well. I also strongly suspect that an Abit board combining the likes of the BP6 and KT7 would be a big seller, on the strength of those series. Well, I hope so at any rate!

I really want a dual Athlon Abit board :)

--
Try to hack my 31337 firewall!

What's that noise?! by dstone · 2001-02-01 14:27 · Score: 5

The make -j3 lets make run three processes at once, which would lead to a speedup even on a single processor system, because disk I/O and CPU-bound compilation can overlap.

The noise you hear is the sound of thousands of single-CPU /. readers typing "make -j3 bzImage".

Re:What's that noise?! by ndfa · 2001-02-01 14:33 · Score: 2

The noise you hear is the sound of thousands of single-CPU /. readers typing "make -j3 bzImage".

SOMEONE PLEASE MOD IT UP TO SUPER FUNNY!!! darn i posted too early!!!

--
Non-Deterministic Finite Automata

Re:I've noticed this too by Mihg · 2001-02-01 14:30 · Score: 2

Because two of these can run at the same time (one per CPU), the data doesn't have to be written to the disk between stages.

The data doesn't have to be written to disk anyway -- that's the entire point of the -pipe option in gcc. All data is written to stdin of the child processes, resulting in faster compiles because nothing ever hits the disk until the assembler outputs the object file.

Multi-processor builds are faster because make (with the -j option) compiles several files in parallel (ideally, one per processor), not because data is piped from stage to stage in the compiler.

The reason they use make -j3 on a two-processor is to take advantage of the fact that compiling programs is both a compute-bound and I/O-bound operation. While one instance of the compiler is waiting for data to be read, another instance is busy generating code. Both processors are always in use, even when one of the instances of gcc is stalled waiting for an I/O operation to complete.

---
The Hotmail address is my decoy account. I read it approximately once per year.

make -j3.... by ndfa · 2001-02-01 14:32 · Score: 2

so from what i recall we would do a:

alias make='make -j2'

then you do a make. The reason, I believe, was that make would then go into each subdir and use the make -j2 command rather than just go serial after the top dir? Hmmm, it's been some time so I am not completely sure, but that's what I recall..... anyone have more info on that, and should they have done something like that?

--
Non-Deterministic Finite Automata

142% by joto · 2001-02-01 14:33 · Score: 2

I would believe that for a task as easily parallelizable as kernel compilation (one file per processor), there should really be almost linear speedup with the number of processors.

Why do people consider 142% for two processors impressive?

Re:142% by fireant · 2001-02-01 14:43 · Score: 2

Why do people consider 142% for two processors impressive?
If they had said that the dual athlon system was 142% as fast as the uniprocessor system, then it would have been disappointing, but 142% faster is more than twice as fast.
Of course, the benchmark was flawed, as many others have pointed out, so the real numbers may not be as impressive.
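
The distinction between "142% faster" and "142% as fast" is plain arithmetic, so here is a minimal Python sketch of it. The function names are made up for illustration and nothing here comes from the benchmark itself.

```python
def ratio_from_percent_faster(pct):
    """'142% faster' means new speed = old speed * (1 + 142/100)."""
    return 1.0 + pct / 100.0


def ratio_from_percent_as_fast(pct):
    """'142% as fast' means new speed = old speed * (142/100)."""
    return pct / 100.0


# The same number read both ways gives very different speed ratios.
print(f"142% faster  -> {ratio_from_percent_faster(142):.2f}x the original speed")
print(f"142% as fast -> {ratio_from_percent_as_fast(142):.2f}x the original speed")
```

Read the first way, the dual system finishes in well under half the time; read the second way, it would fall well short of doubling.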

It's getting more expensive and less common by heroine · 2001-02-02 01:25 · Score: 2

The real question is will TYAN see enough of a demand to justify the cost of mass producing these systems or will they just charge $1000 a part. They had dual Athlons since 1999 but manufacturing costs were far too high. Now with embedded systems the rage you'd think they'd never recover the cost of mass producing SMP boards.

Re:Yes but... by Tet · 2001-02-01 21:33 · Score: 2

Two minute kernel compiles sure sound nice.

Sounds a bit slow to me. What you really want is 20 second kernel compile times :-)

--
"The invisible and the non-existent look very much alike." -- Delos B. McKown

DDR v.s. Rambus by maraist · 2001-02-01 21:36 · Score: 2

Is the AMD chipset Rambus capable? I could check, but it's not important. What I'd like to see are similarly configured systems with both Rambus and DDR, to see how well RDRAM handles dual proc heavy loads like compilation. I know that Intel's PIII uses out of order memory pre-fetching to maximize memory requests, and that can quickly saturate the DRAM controller (and memory, which I think can only interleave 2 requests). I'm wondering if latency can be overcome in such memory-dependent operations.

This of course requires that enough memory is used to fully cache the disk and alleviate disk-IO latency.

-Michael

--
-Michael

Re:DDR v.s. Rambus by Tumbleweed · 2001-02-02 01:41 · Score: 2

No, it's not Rambus capable.

Re:Not quite a perfect comparison by Hard_Code · 2001-02-01 21:40 · Score: 2

But I thought the point is not to see the difference between the SMP kernel using one processor and the SMP kernel using two processors, but instead, between a non-SMP kernel using one processor (really, who would use an SMP kernel for 1 processor?), and an SMP kernel with 2 processors. The test is really between SMP and non-SMP.

--

It's 10 PM. Do you know if you're un-American?

Twice is Nice! by whydna · 2001-02-01 13:46 · Score: 2

Now, does Linux's SMP still do silly things with the cache on these chips? I recall that the kernel would entirely disable the cache on both chips (which effectively made PIIIs into Celerons). It's nice to see some alternatives for dual CPU machines though.

Ace's Hardware by NovaX · 2001-02-01 13:46 · Score: 5

There's a better news bite at Ace's about this. Basically, the second compilation used 3 threads, so the CPU may have had less idle time and less of an I/O bottleneck than the single.

"Unfortunately, the benchmarks vary significantly between the two tests in that the first is completely serialized while the second (dual-processor) test is run with three parallel make processes (notice the -j flag). Because the first system is running with only a single build instance, the processor is spending a great deal of time simply waiting on IO. Meanwhile, the dual-processor test was performed with not just two, but, in fact, three make processes. The difference here is that a processor will not be completely idle while waiting on IO in the second test, as there are two additional build processes running concurrently. This is why the use of the -j parameter is often recommended even for uniprocessor systems, as a parallel make will often yield much higher CPU utilization and thus faster compiles.

"Until then, it is very difficult to make a representative statement about the performance of a dual-processor Athlon system from this benchmark."

-----------------------------------------

--

"Open Source?" - Press any key to continue

Re:Ace's Hardware by Mike+Hicks · 2001-02-01 22:11 · Score: 2

IIRC, the kernel makefile sets things up to run as -j2 by default. I could be wrong though. Yes, it would have been nice if they had given a table of compile times, using -j[2345]...
--

Re:Not quite a perfect comparison by dbarclay10 · 2001-02-01 14:42 · Score: 2

Just for the hell of it, I tried out my own little benchmark.

Single-processors system, compiling linux-2.2.18 w/ ReiserFS patch 3.5.29.

'make clean && make dep && time make bzImage':
real 7m24.803s
user 6m30.070s
sys 0m39.630s

'make clean && make dep && time make -j3 bzImage':
real 7m9.606s
user 6m28.400s
sys 0m38.910s

This is a relatively monolithic kernel; only sound is modular, everything else I need is compiled in. So, doing a 'make -j3' on *my* uniprocessor system yields an absolutely <sarcasm>*MASSIVE*</sarcasm> 15.2 second gain.

In short, while I wouldn't make any bets on the benchmark these fellows did, I don't think they're as useless as most people seem to be thinking.
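
For anyone who wants to check the arithmetic on the two `real` figures above, a small Python sketch follows. The helper name `parse_real` is made up for this example; it only understands the `7m24.803s` style that bash's `time` prints.

```python
import re


def parse_real(text):
    """Convert a bash `time` value such as '7m24.803s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if m is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


serial = parse_real("7m24.803s")    # 'time make bzImage'
parallel = parse_real("7m9.606s")   # 'time make -j3 bzImage'
saved = serial - parallel

print(f"saved {saved:.1f} s, {100 * (serial / parallel - 1):.1f}% faster")
```

Run as written, this should report a saving of about 15.2 seconds, which is only a few percent of the total build time.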

Barclay family motto:
Aut agere aut mori.
(Either action or death.)

--

Barclay family motto:
Aut agere aut mori.
(Either action or death.)

Re:its both DDR & dual CPU system by Tumbleweed · 2001-02-02 01:49 · Score: 2

> Actually that chipset supports up to 8 CPUs just by adding an extra northbridge for every extra CPU, above the intitial 2 CPUs.

Uhm, NO, sorry. Wish it were true, though. The 760MP chipset supports up to TWO processors only. You'll see >2 proc support chipset(s) from other vendors, later on.

Re:142%? by Mr+Z · 2001-02-01 21:58 · Score: 4

No, it was a bogus test. The author did "make bzImage" for the uniprocessor test and "make -j3 bzImage" on the multiprocessor test. If he'd done "make -j3 bzImage" for both, he would've discovered that the machine sped up by less than 100% most likely.

The thing is, "make -jX " for about 1 < X <= 4 still gives a speedup on uniprocessor systems because some compile tasks can be in disk-wait while others sit on the CPU. (The optimal number for X depends on how fast your disks are and how much RAM you have. If X is too big, you start swapping, and end up losing performance.)
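
A deliberately crude way to picture the disk-wait argument is a toy model in which every compile job spends a fixed fraction of its time on the CPU and the rest waiting for the disk. The Python sketch below is only that toy model: the names and the `cpu_frac` value are assumptions, and it ignores swapping, disk contention and scheduling overhead entirely.

```python
def estimated_utilization(jobs, cpu_frac, ncpus=1):
    """Rough share of total CPU capacity kept busy by `jobs` parallel compiles.

    cpu_frac is the (assumed) fraction of its run time one job spends
    computing rather than waiting on disk.
    """
    if jobs < 1 or ncpus < 1 or not 0.0 < cpu_frac <= 1.0:
        raise ValueError("need jobs >= 1, ncpus >= 1 and 0 < cpu_frac <= 1")
    return min(1.0, jobs * cpu_frac / ncpus)


def estimated_speedup(jobs, cpu_frac, ncpus=1):
    """Throughput relative to a single job on a single CPU."""
    return estimated_utilization(jobs, cpu_frac, ncpus) * ncpus / cpu_frac


# cpu_frac=0.7 is a made-up value, not a measurement.
for ncpus, jobs in ((1, 1), (1, 3), (2, 3)):
    s = estimated_speedup(jobs, 0.7, ncpus)
    print(f"{ncpus} CPU(s), -j{jobs}: about {s:.2f}x")
```

Even in this simple picture, comparing a serial build on one CPU against a -j3 build on two CPUs can show more than a doubling, because the baseline leaves the CPU idle part of the time.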

--Joe
--

--
Program Intellivision!

Re:And here's the pic by psergiu · 2001-02-01 14:55 · Score: 2

Aahh, the humanity ...

It has no ISA ! Blasphemy ! I bet it doesn't even have ROM BASIC :)


I stand corrected... by joto · 2001-02-01 15:07 · Score: 2

Ehh.. Should have read better. That is very impressive! How can this happen?

lamer benchmark by valentyn · 2001-02-01 15:16 · Score: 3

Gee - that could not have been done worse. They actually make the 1 processor kernel with ``make bzImage'', and the (so called) dual processor with ``make clean; make -j3 bzImage''.

This suggests that they made a kernel on the same system before, and try to ``undo'' the make.

This is stupid. Why? Because:

they did not run a ``make dep''. This means the so-called ``single processor compile'' (which it is not!) is set back several seconds (make dep takes 40 seconds on my SMP Celeron 466).
The SMP version can take advantage of this, as ``make clean'' does not need ``make dep'' anymore. (AFAIK).
``make -j3'' is *not* the same as ``testing an SMP compile''.

If they really wanted a single vs. dual processor kernel compile test, they should have started with two real kernels, one for uniprocessor, one for SMP.

Then make a ``test config'' .config-file, for example with ``cp arch/i386/defconfig .config; make oldconfig'' (and press a couple of enters). Copy this file to ``Testconfig'' or something.

Now start the system with the single processor kernel and run the following:

make mrproper; cp Testconfig .config; make oldconfig; make dep; time make -j$N bzImage

... for $N being 1-3. Write down results. This is the ``single processor'' kernel compile time. The ``make mrproper'' makes sure there is no garbage left (another, even better, way of testing would be unpacking a new kernel source tree for every test).

Now reboot the system and run the dual processor kernel. Recompile, with -j$N maybe going up to 4 or 5 or so.

Now *that* is something that comes close to a benchmark.
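
The per-run recipe above can be written down as data, which makes it harder to forget a step. This is only a Python sketch: `build_steps` is a made-up helper, and it just lists the commands from the recipe without running them (note that `make oldconfig` may still ask questions interactively).

```python
def build_steps(jobs, config="Testconfig"):
    """Return the command sequence for one timed run, as argv lists."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return [
        ["make", "mrproper"],              # wipe every trace of earlier builds
        ["cp", config, ".config"],         # restore the saved test config
        ["make", "oldconfig"],
        ["make", "dep"],
        ["make", f"-j{jobs}", "bzImage"],  # only this last step gets timed
    ]


for n in (1, 2, 3):
    print(" ; ".join(" ".join(step) for step in build_steps(n)))
```

Keeping the steps in one place means the uniprocessor and SMP boots are guaranteed to run the same sequence, with only the -j value changing.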

--
my other sig is a 500 page novel

Possibly among the hardcore. by Mr.+Flibble · 2001-02-01 15:27 · Score: 2

FWIW, all Athlons can do SMP, but there's no boards on the market that support it, and even when this one makes it to market, it'll probably cost a mint and require a special case/PS.

I agree on the power supply: Athlons have a big draw, and duals will be worse of course. I disagree on the cost. Remember the BP6 from Abit? It was a big success because it took cheaper chips (Celerons) and created SMP systems at a price that was reasonable.

Here we have Athlons, offering a far better price/performance ratio than anything Intel has to offer. If Abit comes out with a board in the BP6 price range, I bet it's more popular than the BP6. Remember, the BP6 has its own website; few motherboards can claim that.

--
Try to hack my 31337 firewall!

Linux SMP kernel "does the right thing." by CaptainAbstraction · 2001-02-01 15:32 · Score: 3

Linux SMP kernel does the right thing as far as cache synchronization, according to the text Understanding the Linux Kernel published by O'Reilly, in reference to kernel 2.2:

The section "Hardware Cache" in Chapter 2, Memory Addressing, explained that the contents of the hardware cache and the RAM maintain their consistency at the hardware level. The same approach holds in the case of a dual processor. ... But now updating becomes more time-consuming: whenever a CPU modifies its hardware cache it must check whether the same data is contained in the other hardware cache and, if so, notify the other CPU to update it with the proper value. This activity is often called cache snooping. Luckily, all this is done at the hardware level and is of no concern to the kernel.

Hope this helps

Cheers,
Andrew

Re:Linux SMP kernel "does the right thing." by chainxor · 2001-02-01 19:46 · Score: 2

This sounds very plausible. Cache snooping works quite well on most systems. A major concern with multiple-CPU machines, though, is bus bandwidth, unless the bus architecture supports some sort of switching (like a crossbar or twin-split). When two or more CPUs run concurrently and all access RAM at potentially the same time, the scaling (increase in speed) is mostly restricted by the bus bandwidth, since the CPUs have to share it.

On a dual machine running, e.g., two threads that each execute some algorithm, if the algorithms contain relatively few memory-accessing instructions compared to instructions operating on registers only (or access a small memory area, so that the cache is valid most of the time), the scaling can be as good as approx. 90% compared to an equivalent single-CPU system. On the other hand, if both algorithms are extremely heavy in terms of memory access, the scaling can be as bad as 8%. I have tested both on a dual Celeron 466 MHz machine with 66 MHz RAM.

The dual Athlon thing with DDR RAM would probably give some yummy results in that regard :-))) I'm curious as to whether this new motherboard has some kind of switched bus, since this will definitely be necessary if 8 CPUs are to do their job in practical situations. "This building houses 50 companies each having 40 employees and only one entrance?"

Re:Not quite a perfect comparison by b0r1s_7h3_h4x0r · 2001-02-01 15:32 · Score: 2

No real speedup. The best speed on my Compaq 5100 @ 300 with 512 but only one processor inserted is at -j3 (diff between any was less than 15 seconds). The time (for 2.4.0test12, my minimalist config) is:

9m47s real, 9m12s user, 0m44s sys.

Same 5100, second processor in, the time is:

5m19s real, 9m25s user, 0m43s sys.

The result? Only an 82% speed increase. Which should be typical for all Intel based SMP systems (Loss numbers run 12-17% cumulative per additional CPU, depending on mobo.)

Compare that loss figure with, say the AS/400. Quoted loss figure is less than 3% cumulative per CPU.

3v1l_b0r1s at d4rkr0ck d0t c0 d0t uk

--
3v1l_b0r1s at d4rkr0ck d0t c0 d0t uk
http c0l0n 5l45h 5l45h www d0t d4rkr0ck d0t c0 d0t uk

Lies, statistics and benchmarks by f5426 · 2001-02-01 15:33 · Score: 2

1/ They did the 'mono' test on a bi processor kernel, paying the SMP tax (an SMP-enabled kernel is slower than a non-SMP one)

2/ They used -j 3 for the bi processor one, while the first one was probably I/O bound.

3/ They did the mono-processor first, then the bi processor. The disk cache may have helped the second compile.

A correct way to test would be:

foreach n (1 2 3 4)

1/ Boot the machine with a mono processor kernel
2/ time make -j $n bzImage ; make clean

Repeat with multi processor kernel.

If one doesn't want to reboot between each test, then one should do something like:

make -j $n bzImage ; make clean ;

before running the real tests.

Cheers,

--fred


Re:Lies, statistics and benchmarks by fireant · 2001-02-01 15:50 · Score: 2

If one don't want to reboot between each test,
Correct me if I'm wrong, as I don't own an SMP system, but don't you have to reboot to change from uniprocessor mode to dual processor mode? I seem to remember reading something about pulling out the processor and installing a terminator in the socket. Of course, there may be an easier way to disable one processor, but I still think that you'd have to reboot in order to make the switch.

Re:Lies, statistics and benchmarks by f5426 · 2001-02-01 16:15 · Score: 2

> Correct me if I'm wrong, as I don't own an smp system, but don't you have to reboot to change from uniprocessor mode to dual processor mode

Sure. You misread me, or I mis-expressed myself.

There was a 'foreach' before point 1 and 2. I meant:

boot mono kernel
compile kernel with -j 1
boot mono kernel
compile kernel with -j 2
boot mono kernel
compile kernel with -j 3
boot mono kernel
compile kernel with -j 4
boot smp kernel
compile kernel with -j 1
boot smp kernel
compile kernel with -j 2
boot smp kernel
compile kernel with -j 3
boot smp kernel
compile kernel with -j 4

This is how a serious benchmark should be done, with the machine state as similar as possible before each tests.

If one suspects that the best compile time would be '-j 2' for a mono kernel and '-j 3' for a bi proc one, then the 8 tests are more or less necessary ("mono -j 1" and "mono -j 3" are necessary to prove that the best compile time for a mono machine is -j 2, and "smp -j 2" and "smp -j 4" are needed to prove that -j 3 is best for smp. "mono -j 4" and "smp -j 1" are here for the sake of completeness).

If someone wants to avoid 8 reboots, he can do:

boot mono kernel
compile kernel and throw away results
compile kernel with -j 1
compile kernel with -j 2
compile kernel with -j 3
compile kernel with -j 4
boot smp kernel
compile kernel and throw away results
compile kernel with -j 1
compile kernel with -j 2
compile kernel with -j 3
compile kernel with -j 4

There, only 2 reboots are necessary. Of course, if the machine has much memory, the numbers will be very different from the first benchmarks (because you could end up with everything cached)
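
Both schedules above follow one pattern, so they can be generated rather than typed out. The Python sketch below is just an illustration: `benchmark_plan` and the step labels are invented names, and it only lists the steps instead of rebooting or compiling anything.

```python
def benchmark_plan(kernels=("mono", "smp"), jobs=(1, 2, 3, 4), reboot_each=True):
    """List the steps of the benchmark as (action, argument) tuples."""
    plan = []
    for kernel in kernels:
        if not reboot_each:
            # One boot per kernel, then a throwaway compile to warm the caches.
            plan.append(("boot", kernel))
            plan.append(("warmup compile", None))
        for n in jobs:
            if reboot_each:
                # Fresh boot before every timed run: most comparable state.
                plan.append(("boot", kernel))
            plan.append(("timed compile", n))
    return plan


for action, arg in benchmark_plan(reboot_each=False):
    print(action if arg is None else f"{action}: {arg}")
```

With `reboot_each=True` the plan has 8 boots and 8 timed compiles; with `reboot_each=False` it drops to 2 boots plus a discarded warm-up compile per kernel.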

Cheers,

--fred

Re:Lies, statistics and benchmarks by martyb · 2001-02-01 20:15 · Score: 3

This is how a serious benchmark should be done, with the machine state as similar as possible before each tests.
I agree. I think this post was on the right track in performing a number of tests to find out where the sweet spot is for the -j argument. There have been hypotheses posted here that caching effects may have interfered with the results. (I wonder if interim/final files' locations on the disk could vary the results, too -- longer seek/write times... maybe need to defrag the disk between iterations, too?)
BUT, it strikes me that EACH test should be repeated a sufficient number of times so that the durations measured vary within a desired confidence level (statistics term -- standard deviation and variance and other stuff whose names and vague concepts I recall, but which I learned too long ago to recall properly now). At an absolute minimum, doing each test twice and having results that vary within, say, a couple of seconds would counter the concerns that there was some unknown but suspected optimization happening (e.g. disk cache, left over interim files, etc.).
Personally, I'd still prefer to see each test performed at least 3 times. In my experience, I've seen very close 2-try results where the results on the 3rd time sometimes confirmed them, but other times refuted them. (Yes, I know it's not "scientific", but I'd rather repeat an unnecessary test than omit a necessary one!)
Then, to make sure there were no accumulated small effects from running all those tests, repeat the very first test one more time to confirm that its results fell in line with the original results.
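
The "repeat until the runs agree" rule is easy to state in code. The following Python sketch uses invented helper names, takes the two-second tolerance from the "couple of seconds" suggested above, and uses sample timings that are purely hypothetical.

```python
def runs_agree(times, tolerance=2.0):
    """True if there are at least two runs and all lie within `tolerance` seconds."""
    if len(times) < 2:
        return False
    return max(times) - min(times) <= tolerance


def drifted(first, repeat, tolerance=2.0):
    """True if re-running the very first test landed outside the tolerance."""
    return abs(repeat - first) > tolerance


# Hypothetical timings in seconds, not taken from any real run.
print(runs_agree([193.1, 193.9]))          # two close runs
print(runs_agree([193.1, 193.9, 198.4]))   # a third run that disagrees
print(drifted(first=193.1, repeat=193.6))  # final sanity re-run
```

A third run that breaks the agreement is exactly the case described above, where two close results turn out not to be the whole story.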

Re:How fast does it by DGolden · 2001-02-01 15:33 · Score: 2

Then your linux boxes are probably misconfigured. Are you running lots of services you don't need?
Have you recompiled your kernel to remove checks for hardware you don't have?

Is windows 2000 starting some of its services after the gui appears, just giving the impression it's finished booting, when, in fact, it's still doing stuff in the background (I know NT 4 does that...)

--
Choice of masters is not freedom.

Beta testing by Fuzzums · 2001-02-01 15:35 · Score: 2

D*mn, it's a long time 'till christmas!
---

--
Privacy is terrorism.

Some comparable benchmarks... by Wolfstar · 2001-02-01 17:07 · Score: 5

Being bored and with a comparable machine, I decided to do some tests of my own.

System: SuSE 7.0, kernel 2.4.1 compiled with Uniprocessor and APIC/IO_APIC.

Athlon 1.1GHz, Asus A7V motherboard. FSB is 100MHz DDR. Memory is 256 megs at PC133, ATA66 5400RPM drive with ReiserFS.

I performed three series of tests. All tests were performed in single/double/triple thread order, and each thread compile had its own directory.

First test, all three had been make config'd per the original article, followed by make dep. After that, I rebooted and did all three compiles without rebooting. Second series started the process over again by make mrproper/make oldconfig/make dep/time make -jN bzImage, with N being the corresponding thread. Finally, I did a make mrproper/make oldconfig/make dep and rebooted each time before the compile.

I should note that on several occasions, I got Odd results; whether this was caching of some sort or not I don't know, but I would get 3m35s on a single thread and 1m9sec on a -j2 with a removed and recreated directory, as well as one or two other occasions - unfortunately, all the other occasions were when I was accidentally failing to use "time make -j2 bzImage" and instead was only doing "make -j2 bzImage", so I have no empirical proof. At any rate, here's the recorded ones.

Round 1

Straight
real 3m17.571s
user 2m54.660s
sys 0m13.120s

-j2
real 3m13.772s
user 2m58.390s
sys 0m13.390s

-j3
real 3m13.470s
user 2m59.390s
sys 0m13.180s

Round 2

Straight
real 3m8.048s
user 2m54.780s
sys 0m13.140s

-j2
real 3m11.912s
user 2m58.050s
sys 0m13.590s

-j3
real 3m12.532s
user 2m58.370s
sys 0m13.900s

Fresh-boot compile

Single thread was not redone; it was the Round 1.

-j2
real 3m15.634s
user 2m58.030s
sys 0m13.700s

-j3
real 3m16.433s
user 2m59.310s
sys 0m13.290s

As you can see, not much of a variation on here. The times are also a hell of a lot better than a 1.2GHz system single-threaded with DDR SDRAM, which makes me wonder what precisely is slowing down the 1.2GHz...
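
To put a number on "not much of a variation", the `real` times listed above can be fed to Python's `statistics` module. This is a sketch only: the conversion to seconds was done by hand, and the Round 1 single-thread result is counted once, since the fresh-boot series reused it.

```python
from statistics import mean, stdev

# 'real' times from the three series above, converted to seconds.
real_seconds = {
    "straight": [197.571, 188.048],
    "-j2": [193.772, 191.912, 195.634],
    "-j3": [193.470, 192.532, 196.433],
}

means = {label: mean(runs) for label, runs in real_seconds.items()}

for label, runs in real_seconds.items():
    print(f"{label:9s} mean {means[label]:7.2f} s   stdev {stdev(runs):5.2f} s")

print(f"spread between the means: {max(means.values()) - min(means.values()):.2f} s")
```

On these figures the gap between the three averages is smaller than the gap between the two single-thread runs on their own, so the run-to-run noise is larger than any effect of -j on this uniprocessor box.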

Food for thought.

--
You thought that this sig was what you think that I thought you wanted me to think. I think.

Re:And here's the pic by dongkiru · 2001-02-01 17:09 · Score: 3

Heh... Yeah, the first thing I noticed when I saw this board(in person, not the picture. I work for ASL, and saw this board last week when my boss was testing it.) was the power connector. Basically, this is a very rough draft of the motherboard from Tyan, and our company works closely with Tyan on their board development. So hopefully they'll change the locations of the connectors for the final release.

The reclining DIMM slots are there because the DDR memory for this motherboard is fairly tall, and Tyan would like to be able to use this motherboard in 1U rack systems. The reclining DIMM slots do waste a lot of real estate, which could've been used to place the power connector closer to the edge. But the market for 1U rack mount systems appears to be growing rapidly, so I think the reclining DIMM slot is very important.

For those of you who are complaining that you just bought a system, or who are looking to buy one: this board isn't even supposed to be announced until March, so don't hold your breath.

Re:142%? by Mr+Z · 2001-02-02 06:00 · Score: 2

Ok, I can definitely see that if you have two isolated tasks, each of which would fit in the cache on its own, but both of which would thrash each other in a single-CPU environment, you could get a superlinear speedup by going to multiple CPUs.

--Joe
--

--
Program Intellivision!

Re:Not quite a perfect comparison by RedWizzard · 2001-02-02 06:37 · Score: 2

In that case you should be testing a UP motherboard against the SMP motherboard, not just switch the kernels.

Normally when you benchmark you try to change only one factor. If you change the code you're running and the number of CPUs then how do you know how much each factor is affecting the result?

Don't Bogart that joint, my friend... by Simon+Brooke · 2001-02-01 17:32 · Score: 2

142%? Hey, dude! Can I have some of what it's smoking?

--
I'm old enough to remember when discussions on Slashdot were well informed.

Not pricewatch! by Chagrin · 2001-02-01 16:04 · Score: 2

Now if only the right motherboards would start showing up in quantity on pricewatch ...

www.pricescan.com is a much better engine for searching prices.

--

I/O Error G-17: Aborting Installation

The lie of -j3 and no "make dep" by joto · 2001-02-01 16:07 · Score: 4

I tried the same test on my uniprocessor system, running first "time make bzImage" then "make clean", and last "time make -j 2 bzImage":

Single thread:

597.00user 46.40system 12:11.08elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (789303major+881687minor)pagefaults 0swaps

Two threads on one processor:

511.41user 31.30system 9:21.66elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (489357major+669019minor)pagefaults 0swaps

By the same logic as they used in this benchmark, my uniprocessor system is thus 31 percent faster than the same old uniprocessor system. Bah! I just wish people weren't posting nonsensical benchmarks like this. At least, they should _try_ to make it somewhat representative...
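
The %CPU column in those two outputs is what gives the game away, and it can be recomputed from the other fields. A small Python sketch, with made-up helper names, assuming the elapsed field has the `m:ss.ss` shape shown above:

```python
def elapsed_seconds(field):
    """Convert an elapsed field such as '12:11.08elapsed' to seconds."""
    total = 0.0
    for part in field.replace("elapsed", "").split(":"):
        total = total * 60 + float(part)
    return total


def cpu_share(user, system, elapsed):
    """Fraction of wall-clock time during which the CPU was busy."""
    return (user + system) / elapsed


one_job = elapsed_seconds("12:11.08elapsed")
two_jobs = elapsed_seconds("9:21.66elapsed")

print(f"one job:  CPU busy {cpu_share(597.00, 46.40, one_job):.1%}")
print(f"two jobs: CPU busy {cpu_share(511.41, 31.30, two_jobs):.1%}")
print(f"ratio of elapsed times: {one_job / two_jobs:.2f}")
```

The single job leaves the processor idle for a noticeable share of the build, while the second job fills most of that gap, and the ratio of elapsed times lands at roughly the one-third improvement quoted above.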

Re:Not quite a perfect comparison by RedWizzard · 2001-02-01 17:40 · Score: 2

For comparison this is an article on Tom's Hardware where he tests a few dual PentiumIII 1GHz machines. He doesn't say what kernel he compiles, probably the kernel of the SuSE 6.4 distribution he's using. Anyway the faster of the two boards managed 141 seconds with two CPUs and 221 seconds with one.

The only totally fair way to compare is to boot a non-SMP kernel, run the benchmark, then boot an SMP kernel and run exactly the same benchmark.

Tom simply replaces one of the CPUs with a dummy, presumably using a SMP kernel for both benchmarks. That probably penalises the non-SMP case which is less realistic but arguably fairer since it means you really are measuring just the difference between 1 and 2 cpus with the same code.

Re:Not quite a perfect comparison by mian · 2001-02-01 18:32 · Score: 3

Here are my tests. I didn't remove CPUs or anything, since these servers are in production and at the co-location ISP.

Linux 2.2.18 (Dual P3-550, 1GB RAM, all SCSI, compiling Apache 1.3.17)

make
real 0m37.244s
user 0m23.900s
sys 0m6.000s

make -j3
real 0m26.915s
user 0m24.360s
sys 0m6.020s

make -j4
real 0m23.724s
user 0m24.130s
sys 0m5.880s

make -j5
real 0m20.154s
user 0m22.940s
sys 0m5.000s

make -j6
real 0m21.326s
user 0m24.120s
sys 0m5.830s

FreeBSD (Dual P3-550, 512MB RAM, all SCSI, compiling Apache 1.3.17)

make
39.458u 5.635s 0:48.99 92.0% 1686+1874k 0+1249io 0pf+0w

make -j3
40.007u 5.725s 0:32.53 140.5% 1696+1884k 0+1645io 1pf+0w

make -j4
40.027u 5.817s 0:32.73 140.0% 1691+1877k 0+1631io 0pf+0w

make -j5
40.154u 5.832s 0:31.74 144.8% 1701+1884k 1+1628io 0pf+0w
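To make the numbers above easier to compare, here is a small Python sketch that turns the wall-clock times into speedups relative to the bare make on the same box. The times are copied from the runs above (the "real" field on Linux, the elapsed field on FreeBSD); the dictionary layout and the speedups helper are just for illustration. Each figure is a single run, so small differences between neighbouring -j values may be noise.

```python
# Wall-clock seconds copied from the runs above.
linux_real = {"make": 37.244, "-j3": 26.915, "-j4": 23.724,
              "-j5": 20.154, "-j6": 21.326}
freebsd_real = {"make": 48.99, "-j3": 32.53, "-j4": 32.73, "-j5": 31.74}


def speedups(times):
    """Return each run's speedup relative to the plain 'make' run."""
    base = times["make"]
    return {flag: base / seconds for flag, seconds in times.items()}


for name, times in (("Linux 2.2.18", linux_real), ("FreeBSD", freebsd_real)):
    for flag, ratio in speedups(times).items():
        print(f"{name:12s} {flag:5s} {times[flag]:7.3f}s  {ratio:.2f}x")
```

On these figures the best case is -j5 on both systems: about 1.85x on the Linux box and about 1.54x on the FreeBSD box. Both machines had two CPUs for every run, so this measures what -j buys on an SMP system, not the gain from adding a second processor.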

Re:Sounds Good by cyber-vandal · 2001-02-01 18:39 · Score: 2

No, he was using a brain-damaged OS. My desktop, which does nothing more than run a TN3270 session, Lotus Notes and IE, crashes at least once a week, usually more. The Microsoft answer to this is twofold:
a) It's not our fault, it's those nasty third-party vendors
b) You should spend a fortune redesigning your network and then a further fortune buying new licenses for Windows2000

Of course, as NT has no logs or core dumps worth a damn, there's no way to know what went wrong.

Re:DDR != Dual by Nailer · 2001-02-01 13:47 · Score: 2

DDR most likely means Double Data Rate SDRAM, which is not the same thing as a dual processor system.

Oops. I'm wrong. My fault. Moderate down accordingly.

Re:DDR != Dual by supergumby · 2001-02-01 13:48 · Score: 2

True, but the linked story states quite clearly that they tested a pre-release motherboard from Tyan running the AMD 760MP chipset, running two 1.2GHz Athlon CPUs.

Please check your facts before shooting your mouth off.

"Linux is broken! When I type date, it just gives me the time!"

Not quite a perfect comparison by jerky · 2001-02-01 13:50 · Score: 5

The article states:

The kernel was then compiled using "time make bzImage." The dual processor results were then done by first doing "make clean" then "time make -j3 bzImage".

This isn't really a good way to compare single processor results to dual processor results. The make -j3 lets make run three processes at once, which would lead to a speedup even on a single processor system, because disk I/O and CPU-bound compilation can overlap. The only totally fair way to compare is to boot a non-SMP kernel, run the benchmark, then boot an SMP kernel and run exactly the same benchmark.

Even though the 142% speedup is bogus, the two minute kernel compile is pretty damn fast.
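The like-for-like procedure described above can be sketched as a small Python harness. This is only an illustration under assumptions: the function names build_plan and run_plan are made up, it assumes make is on the PATH and that you point it at a configured kernel tree, and it leaves out any other preparation the tree may need. The idea is simply that every run starts from "make clean" and that the same -j values are used on both kernels.

```python
import subprocess
import time


def build_plan(job_counts, target="bzImage"):
    """Return (jobs, clean command, build command) for each job count."""
    plan = []
    for jobs in job_counts:
        plan.append((jobs, ["make", "clean"], ["make", f"-j{jobs}", target]))
    return plan


def run_plan(plan, cwd="."):
    """Run each build from a clean tree; return {jobs: elapsed seconds}."""
    results = {}
    for jobs, clean_cmd, build_cmd in plan:
        # Always start from a clean tree so no run reuses old object files.
        subprocess.run(clean_cmd, cwd=cwd, check=True)
        start = time.perf_counter()
        subprocess.run(build_cmd, cwd=cwd, check=True)
        results[jobs] = time.perf_counter() - start
    return results
```

You would call run_plan(build_plan([1, 2, 3]), cwd=...) once after booting the non-SMP kernel and once after booting the SMP kernel, then compare the entries with the same job count. Comparing -j1 on one CPU against -j3 on two mixes the two effects together.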

Re:Tyan SMP board pricing by Jeff+DeMaagd · 2001-02-02 12:46 · Score: 2

I guess the layer count is high, but how does it compare with similar products? I've never thought of it that way. I know layer count does affect the cost of the board itself. I know a guy that does a lot of printed circuit board stuff, I'll ask him how high he's gone.

Anyhoo, I believe that the layer count has to do with the fact that there is a LOT of wiring associated with the particular bus type, er, well I understand that it isn't _really_ a bus, but I don't want to get into the particulars.

Re:And here's the pic by Jeff+DeMaagd · 2001-02-02 12:52 · Score: 2

OK, exactly _how_ does playing MP3s slow down such a fast system? On my four year old 500MHz system MP3 playback 'only' takes roughly 1% CPU...

Or are your particular sound card drivers that inefficient? I've noticed some Linux sound drivers were _very_ slow. I never did inquire why.

The good and bad of AMD's MP by DeafDumbBlind · 2001-02-01 13:52 · Score: 5

The good thing about AMD's upcoming dual systems is that each processor has a separate bus to the memory, unlike Intel systems where all the chips share the same bus.

The bad thing is that so far only Tyan has announced a MB based on the 760MP chipset, and that MB is definitely suited for servers; it won't fit in a standard ATX case.

--

Jesus used to be my co-pilot, but we crashed in the mountains and I had to eat him.

How fast does it by Anagon · 2001-02-01 13:55 · Score: 5

boot Windows 2000? Now that's a test~

--
Linuxisforcommunists.org

Tom's Hardware also has a dual CPU comparison by throx · 2001-02-01 19:44 · Score: 2

Tom's Hardware (somewhat more reputable than the results discussed here) shows a more moderate improvement in performance, but it is definitely remarkable.

Go to www.tomshardware.com and have a look - get the real picture.

Now why aren't there any Q3A benchmarks??

--

Fear: When you see B8 00 4C CD 21 and know what it means

Glad to hear it... by rich22 · 2001-02-01 13:55 · Score: 3

but wouldn't a better test involve removing one of the processors, compiling a kernel (while running a non-SMP kernel), and then lather-rinse-repeat with both processors in under an SMP-enabled kernel? According to the make man page, the -j option controls the number of jobs allowed to run simultaneously. Depending on what you are making, the -jN option can even speed up compile times on single processor machines. The 142 percent "performance" increase may be partially explained by this.

As one of those dual-celeron guys (bang for the buck!), I love to see AMD finally show off dual processor machines. But the next time we get a chance to play with one, let's try to make a more realistic comparison.

Switches invalidate the results (also: 4-way SMP) by NortonDC · 2001-02-01 13:57 · Score: 3

The "-j3" switch with the make is why it got a greater-than-linear improvement.

See Ace's Hardware for a discussion of exactly this:

"[T]he dual-processor test was performed with not just two, but, in fact, three make processes. The difference here is that a processor will not be completely idle while waiting on IO in the second test, as there are two additional build processes running concurrently. This is why the use of the -j parameter is often recommended even for uniprocessor systems, as a parallel make will often yield much higher CPU utilization and thus faster compiles."

Also, see reader comments saying that AMD demonstrated a 4-way SMP Athlon system at LinuxWorld.
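The Ace's Hardware explanation quoted above can be put into a toy model. This Python sketch is not a measurement of anything: the function elapsed, the 1000 compile steps and the 0.88/0.12 split between CPU time and I/O wait are all made-up assumptions, and the formula is only an idealized lower bound that ignores scheduling overhead, memory limits and disk contention.

```python
def elapsed(units, cpu, io_wait, cpus, jobs):
    """Idealized lower bound on build time, in seconds.

    Each of `units` compile steps needs `cpu` seconds of CPU and `io_wait`
    seconds blocked on disk. At most min(jobs, cpus) steps can compute at
    once, and at most `jobs` steps are in flight (computing or waiting).
    """
    cpu_bound = units * cpu / min(jobs, cpus)
    in_flight_bound = units * (cpu + io_wait) / jobs
    return max(cpu_bound, in_flight_bound)


# Hypothetical workload: 1000 steps, each 88% CPU and 12% waiting on I/O.
UNITS, CPU, IO = 1000, 0.88, 0.12

single_plain = elapsed(UNITS, CPU, IO, cpus=1, jobs=1)  # make
single_j3 = elapsed(UNITS, CPU, IO, cpus=1, jobs=3)     # make -j3, one CPU
dual_j3 = elapsed(UNITS, CPU, IO, cpus=2, jobs=3)       # make -j3, two CPUs

print(f"plain make, 1 CPU vs -j3, 2 CPUs: {single_plain / dual_j3:.2f}x")
print(f"-j3, 1 CPU vs -j3, 2 CPUs:        {single_j3 / dual_j3:.2f}x")
```

With these made-up numbers the mixed comparison (plain make on one CPU against -j3 on two) comes out at about 2.27x, while the like-for-like comparison (-j3 on both) is exactly 2x. The extra comes from filling the I/O holes, not from the second processor.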

--

Tastes Like Chicken

And here's the pic by Anonymous Coward · 2001-02-01 14:00 · Score: 2

And here's the pic

I've noticed this too by faster · 2001-02-01 14:01 · Score: 2

SMP is more than twice as fast because of the way the compiler works; there's a process piping its output to the process in charge of the next compile step. Because two of these can run at the same time (one per CPU), the data doesn't have to be written to the disk between stages.

Please pardon the excessive simplification...

dual processor machines are great for normal use by RussRoss · 2001-02-02 00:05 · Score: 2

It's not too surprising that a dual processor machine gets more than double performance on a kernel compile. Compiling requires several processes working in serial (preprocessor feeds the compiler front end, etc.). On a single processor system, you have to switch back & forth constantly and you lose a lot to context switch overhead. Also, you don't typically run make with a "-j" setting, so when you are blocked on I/O everyone waits and the processor goes idle. Having two complete jobs tends to fill in the holes better (the 2-3 processes involved in one compile will fill in the slack while another compile gets started up by make).

There are actually a lot of benefits to a dual-processor setup. I did a research project on the Linux scheduler for interactive users:

http://www.people.fas.harvard.edu/~rross/cs265/paper.html

Afterward I put together a dual-celeron system and the improvement in the overall responsiveness and feel of the system was quite dramatic.

- Russ

Tyan SMP board pricing by hacksoft · 2001-02-01 20:25 · Score: 2

The Tyan SMP board uses an 8 layer PCB design!! The rumor is it will cost $500 USD or more. I haven't seen anything about the board that would lead me to suspect that it doesn't comply with at least the extended-ATX standard. All of Tyan's high end dual CPU boards do.

Re:Sounds Good by treke · 2001-02-01 20:25 · Score: 2

You were probably simply using a brain-damaged kernel driver !!!! Nothing to do with NT itself !

Guess what... it's the same thing. If the drivers crash NT regularly, then something is wrong with the product available to consumers. The same holds true for Linux.
treke

BeBox!!!!! by smallstepforman · 2001-02-01 14:08 · Score: 2

I'm quite sure that the AMD 760MP chipset (which, among other things, allows SMP) will find its way into many motherboards very very soon. Serious competition will come from Abit's VP6, the successor of the famous BP6. With so many operating systems supporting SMP (W2K, *nix and my favourite BeOS), how long before SMP becomes the de-facto standard? An interesting bit of trivia for the new Linux folk: BeBoxen had a utility called Pulse which showed distributed CPU usage, with the ability to turn individual CPUs on and off in real time. The designers of the utility had a hidden IQ test in the program, which would be triggered when the user turned both CPUs off. Gee, I miss the good old days.

--
Revolution = Evolution
