Slashdot Mirror


Explaining Disappointing XScale Performance In Pocket PCs

JYD writes: "I found this new article on a Pocket PC web site where Microsoft talks about why XScale Pocket PCs aren't as fast as people thought they would be. Is it the OS? The CPU not supporting ARM4 properly? I wonder if the Linux port would run faster on 400 MHz ... or did Intel screw up the CPU?"

10 of 133 comments

  1. Re:Judging by modern Linux DEs.... by Anonymous Coward · · Score: 1, Informative

    I can fully attest to that. I happen to have a 128 MB Celeron 500 MHz box sitting right next to me (no lie). I had Windows 98SE on it for a few days, and it was dog slow, unusably so. I put Mandrake on there (with KDE 2.something) and it flew. I was very surprised how fast it was. It's even faster now that I moved to WindowMaker.

  2. Re:Seems obvious, bus speed & not enough cache by Anonymous Coward · · Score: 2, Informative

    I didn't mean the 3MHz drop hurts you a lot; I meant that if you compare the clock-to-bus speed ratio, you're looking at a 50% reduction in bus speed relative to the CPU clock rate.

    If there is not enough cache memory, increasing the processor clock speed will not have a positive effect on performance, because the real effective clock rate will be bound by how fast the processor can fetch data from main memory.
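
    A rough way to see this is a toy stall model. The sketch below is only an illustration: the function name, the 100 ns memory latency, and the 5% miss rate are assumptions chosen for the example, not measurements of any StrongARM or XScale device.

```python
def effective_mips(cpu_mhz, mem_latency_ns, miss_rate, base_cpi=1.0):
    """Rough instruction throughput (millions/sec) when cache misses stall the core.

    Each instruction costs base_cpi core cycles, plus a main-memory stall on
    the fraction of instructions that miss in cache. The stall is fixed in
    nanoseconds, so it costs more core cycles as the clock rises.
    """
    cycle_ns = 1000.0 / cpu_mhz
    ns_per_instr = base_cpi * cycle_ns + miss_rate * mem_latency_ns
    return 1000.0 / ns_per_instr


# Hypothetical numbers: same memory system, core clock doubled.
slow = effective_mips(cpu_mhz=200, mem_latency_ns=100, miss_rate=0.05)
fast = effective_mips(cpu_mhz=400, mem_latency_ns=100, miss_rate=0.05)
print(f"200 MHz: {slow:.1f} MIPS, 400 MHz: {fast:.1f} MIPS, speedup {fast / slow:.2f}x")
```

    With these made-up inputs, doubling the clock gives well under a 2x speedup, and the gap widens as the miss rate grows.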

  3. Comment removed by account_deleted · · Score: 3, Informative

    Comment removed based on user account deletion

  4. Re:Windows CE, ugh. by JebusIsLord · · Score: 2, Informative

    *sigh* No, pocketpc 2002 is NOT the new name for WinCE. pocketpc 2002 and 2000 are implementations of windowsCE 3.0. Windows CE is an embedded realtime OS that can be used for all sorts of things, PDAs being one of them. Think of pocketpc 2002 as a "distro" of CE3.0 with a special pocketpc gui installed.

    --
    Jeremy
  5. Re:It's the OS and the Compiler by ceallaigh · · Score: 2, Informative

    Without a compiler that has optimizations for the XScale, you will still get poor performance. So all the tweaks in the world to your existing code base will be for nought without a corresponding change in the compiler, which is currently targeted at ARM7/9 cores and has only basic support for XScale.

  6. Re:Judging by modern Linux DEs.... by IamTheRealMike · · Score: 3, Informative
    Hmm, well .....

    Linux with KDE is slower than Windows 98 basically for two reasons. The first is that Linux does more stuff. For instance, it runs various daemons in the background to allow for remote access, it journals filesystem logs, it implements proper crash protection, it has a usable command line with virtual terminals etc. Windows 98 doesn't have these things, so it can be faster.

    The second reason is that KDE is written largely in C++, and the Linux C++ linker is inefficient (it is much faster at C). The programs run fine, but they take longer to start up, which is what makes it "feel" slow. Gnome should in theory be faster, but they kill any speed increase they'd otherwise get by having a slower (well, in v1.4) graphics library and by using incredibly heavy things such as CORBA for ipc, and a daemon for configuration etc.

    The reason other window managers (not just ancient ones, others such as WindowMaker or E) are faster is that a) they are simpler and b) they tend to be written in C.

    The speed of GTK is improving, though CORBA/ORBit will always be slow on the gnome side imho. The Linux linker issues with C++ are known about and are being resolved, which will lead to much better performance.

    Another problem is that some modern distros are quite bloated. My SuSE 7.3 box loads all sorts of stuff at startup that I don't actually need, but I never got around to switching it off. Combine that with the slow start of KDE and the fact that it loads after login (whereas Windows loads its desktop before login), and it begins to feel slow.

    Performance is improving; however, it's still largely in the hands of the GNU folks and the distro companies.

    thanks -mike

  7. It's All a Question of Cache by emulac · · Score: 2, Informative

    The Intel PXA250 has only 32K/32K of cache, which means that any real application will experience an extremely high cache miss rate. The memory bus is 16 or 32 bits, and has a maximum clock rate of 100 MHz. So, if you're running the maximum width bus at its maximum speed, you're likely to see an instruction dispatch rate of about 50~100 million ops/second. That's slow, and there's really nothing to be done short of adding much more cache.
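
    The 50~100 million figure can be reproduced with back-of-the-envelope arithmetic. This is a sketch under stated assumptions, none of which come from a datasheet: one bus transfer per bus clock, 32-bit ARM instructions, and every instruction fetched from main memory (that is, the worst case where the cache never hits). The function name is made up for the example.

```python
def fetch_bound_mops(bus_bits, bus_mhz, instr_bits=32):
    """Upper bound on instructions per second (in millions) when every
    instruction must be fetched over the memory bus.

    Assumes one transfer of bus_bits per bus clock and fixed-size instructions.
    """
    megabits_per_second = bus_bits * bus_mhz
    return megabits_per_second / instr_bits


for width in (16, 32):
    print(f"{width}-bit bus at 100 MHz: {fetch_bound_mops(width, 100):.0f} million ops/second")
```

    Any cache hits raise the real rate above this bound, and memory latency or data traffic on the same bus lowers it, so treat the result as an order-of-magnitude check only.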

  8. Re:Cant find the link but by cbcbcb · · Score: 2, Informative
    The Xscale has a 10 stage pipeline, compared to the 3 stage pipeline in an ARM7TDMI and the 5 stage pipelines in the StrongARM and ARM920T. The main problem is the much greater load result delay on the Xscale pipeline (3 cycles(?), compared to 1 cycle on the ARM920T), which means that the instruction ordering needs to be significantly different to generate optimal Xscale code. A short code segment should clarify this:
    # cycles         ARM7TDMI  ARM920T  Xscale(guess)
    LDR r0, label1       3        1          1
    LDR r1, label2       3        1          1
    ... interlocks       0        1          3
    ADD r2,r0,r1         1        1          1
    total                7        4          6
    On this trivial piece of code the Xscale needs 50% more cycles than the ARM920T, so it is slower at the same clock speed. However, this effect by itself would not make a 400MHz Xscale slower than a 206MHz StrongARM.
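
    To put numbers on that last point, the cycle counts from the table can be converted to wall-clock time. This is only a sketch: the Xscale column is a guess in the original table, and treating the 206MHz StrongARM as if it had the ARM920T cycle counts (both are described above as 5 stage pipelines) is an assumption made here for illustration.

```python
# Per-instruction cycle counts copied from the table above:
# [LDR r0, LDR r1, interlocks, ADD]
CYCLES = {
    "ARM7TDMI": [3, 3, 0, 1],
    "ARM920T": [1, 1, 1, 1],
    "Xscale": [1, 1, 3, 1],  # marked "(guess)" in the original table
}


def total_cycles(core):
    """Total cycles for the four-row code segment on the given core."""
    return sum(CYCLES[core])


def time_ns(core, mhz):
    """Wall-clock time for the segment at a given core clock."""
    return total_cycles(core) * 1000.0 / mhz


xscale = time_ns("Xscale", 400)
# Assumption: StrongARM modelled with the ARM920T column.
strongarm = time_ns("ARM920T", 206)
print(f"Xscale @ 400MHz: {xscale:.1f} ns, StrongARM-like @ 206MHz: {strongarm:.1f} ns")
```

    Under these assumptions the 400MHz part still finishes the segment first, which is consistent with the claim that the load delay alone does not explain the benchmark results.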
  9. Re:Judging by modern Linux DEs.... by himi · · Score: 3, Informative

    Most of the core ideas in Unix were developed in the 60's, actually.

    Computing in the 50's was a very different thing, so limited that the idea of wasting cycles on things like memory management or protected memory would have been considered insane. It wasn't until hardware developed to the point where there were cycles and memory to spare that anything like Unix (or MULTICS, which is where most of Unix's ideas were developed) became possible.

    himi

    --

    My very own DeCSS mirror.
  10. Re:ARMv5 versus ARMv4 and why Intel sucks by FrankDrebin · · Score: 2, Informative

    Somebody, please mod this up because jeff is right damn it!!!

    I've worked with both the SA-11x0 (StrongARM) and the PXA250 "Cotulla" (Xscale) CPUs and everything jeff says is pretty much on the money (except the CLZ instruction is far from useless, it's *awesome* for fixed-point logarithms, dude).
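
    For anyone wondering why CLZ helps with fixed-point logarithms: the count of leading zeros gives the integer part of log2 directly, and the remaining bits give a cheap approximation of the fraction. The sketch below emulates CLZ in Python; the function names and the linear approximation log2(1+m) ~ m are choices made for this illustration, not code from any codec or from Intel's libraries.

```python
def clz32(x):
    """Count leading zeros of a 32-bit unsigned value (32 for x == 0),
    emulating what a CLZ instruction returns."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in 32 unsigned bits")
    return 32 - x.bit_length()


def log2_fixed(x, frac_bits=16):
    """Approximate log2(x) for 1 <= x < 2**32 as a fixed-point integer
    with frac_bits fractional bits (frac_bits <= 31).

    Integer part: index of the highest set bit, found via CLZ.
    Fractional part: linear approximation log2(1 + m) ~ m, where m is the
    mantissa left after normalising x so its top bit sits at bit 31.
    """
    if x < 1:
        raise ValueError("log2 is undefined for x < 1 in this sketch")
    shift = clz32(x)
    int_part = 31 - shift
    mantissa = (x << shift) & 0x7FFFFFFF  # drop the leading 1, keep 31 bits
    frac = mantissa >> (31 - frac_bits)
    return (int_part << frac_bits) | frac


print(log2_fixed(8) / 65536)   # exact power of two
print(log2_fixed(12) / 65536)  # linear approximation between 8 and 16
```

    A real implementation would usually refine the fraction with a small lookup table, but the normalisation step is where the single-cycle bit scan pays off.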

    Also, the DSP coprocessor in the XScale is about as useful as tits on a bull for codecs with 16-bit data streams. You spend so many clocks marshalling data around to get it in and out of the thing that it's *much* more efficient to use the MAC instructions native to ARM v4 on normal registers! Even the Intel engineers who put together their IPPs have avoided the DSP coprocessor since it provides no real advantage.

    It's pretty clear to me the v4/v5 thing is a red herring. Let's face it, DEC was much better at putting out a general purpose ARM-based CPU than Intel.

    --
    Anybody want a peanut?