Slashdot Mirror


Ars Dissects POWER5, UltraSparc IV, and Efficeon

Burton Max writes "There's an interesting article here at Ars about the POWER5, UltraSparc IV, and Efficeon CPUs. It's a self-styled "overview of three specific upcoming processors: IBM's POWER5, Sun's UltraSparc IV, and Transmeta's Efficeon." I found the insights as to Efficeon (successor to Crusoe) to be particularly good (although it paints a sad picture of Transmeta, methinks)."

176 comments

  1. Good article by The_Ronin · · Score: 5, Interesting

    Too bad they focused so much on Power and Transmeta while spending little time on UltraSparc IV and V, and ignored Itanium. With a little more balance it would have been a great read.

    --

    I don't drink because I have to, I drink to stop the voices in my head!

    1. Re:Good article by AKAImBatman · · Score: 5, Interesting

      I think it would have been best to have an article devoted to the TransMeta chip, and split the Power5/UltraSparc discussion out into its own article. That way he could have given a great deal more attention to the powerhouse chips and how they're going to change the future. TransMeta's chips are on the level of ARM, not UltraSparc.

    2. Re:Good article by Anonymous Coward · · Score: 3, Informative

      There's a reason they ignored Itanium, it's about upcoming processor technologies. Last I checked there wasn't a new, soon to be released, Itanium that Intel was pushing.

      In fact, the current Intel processor roadmap shows the same Itanium 2 processor for the first half of 2004 as it did for the second half of 2003.

    3. Re:Good article by jaberwaki · · Score: 3, Informative

      I believe he didn't spend much time on the UltraSparc IV because, quote:

      "To get the "hyperthreading" effect of two processors on one chip, Sun stuck two full-blown UltraSparc III cores on a single chip, which is chip-pin compatible with the UltraSparc III."

      He assumes the interested reader will already know something about the UltraSparc III. Sun didn't fundamentally change the chip architecture. Also, the Itanium architecture has already been discussed ad nauseam in other articles. It wasn't meant to be a balanced overview of all new CPU architectures.

    4. Re:Good article by Selecter · · Score: 0

      Maybe there's a reason why they didn't spend much time on the Itanic. AMD64 is the iceberg that the Itanic is gonna hit, and it's gonna sink pretty fast. Intel will have Plan B in place by Feb. 04. They have to: Opteron is making too much headway too fast, and unlike Athlon XPs, these chips are making AMD some serious green stuff for the first time in a long time. If AMD is in the black next quarter, look out. Intel is gonna have to play by AMD's rules.

    5. Re:Good article by pmz · · Score: 3, Informative

      Sun didn't fundamentally change the chip architecture.

      Probably the most significant outcome of the USIV will be 212-CPU Sun Fire 15K servers. That seems to imply something like 5 or 6 CPUs per rack-unit (although it appears the 15K is somewhat bigger than a standard rack).

    6. Re:Good article by larien · · Score: 1
      Standard 15Ks currently support 72 CPUs (4 CPUs on each of 18 system boards). This can be expanded to 106 by putting CPUs in I/O slots, although those CPUs are partially crippled by latency and bandwidth constraints. USIV has the same form factor as USIII, so it should allow a simple doubling of capacity, assuming the firmware and OS support it.

      As for rack sizes, the 15K racks are about the same size as normal racks, but are slightly deeper. The system is not like a standard rack + servers, you buy a 15K as a single unit and add the system/IO boards.
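      The doubling above works out as follows. This is a back-of-envelope sketch that assumes every UltraSparc III socket (including the I/O-slot ones) takes an UltraSparc IV and that the OS counts each of its two cores as a CPU; the constant names are illustrative.

```python
# Back-of-envelope Sun Fire 15K CPU counts from the figures quoted above.
# Assumption: UltraSparc IV drops into every UltraSparc III socket and the
# firmware/OS expose each of its two cores as a separate CPU.

BOARDS = 18              # system boards in a standard 15K
CPUS_PER_BOARD = 4       # UltraSparc III sockets per system board
MAX_WITH_IO_SLOTS = 106  # socket count when CPUs also go in I/O slots
CORES_PER_USIV = 2       # two UltraSparc III cores per UltraSparc IV chip


def cpu_counts() -> dict:
    standard = BOARDS * CPUS_PER_BOARD
    return {
        "usiii_standard": standard,
        "usiii_max": MAX_WITH_IO_SLOTS,
        "usiv_standard": standard * CORES_PER_USIV,
        "usiv_max": MAX_WITH_IO_SLOTS * CORES_PER_USIV,
    }


if __name__ == "__main__":
    for name, n in cpu_counts().items():
        print(f"{name}: {n}")
```

Under those assumptions the maximum comes out at 212 CPUs, the same figure quoted earlier in the thread.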

  2. brain fart while reading the article by zontroll · · Score: 2, Funny

    I had a brain fart for a second while reading the article:

    "This is why the advances that have the most striking impact on the nature and function of the computer are the ones that move data closer to the functional units. A list of such advances might look something like: DRAM, PCI, on-die caches, DDR signaling, and even the Internet"

    For a second there, I thought that the list of advances started with DRM, not DRAM, and I almost had a heart attack.

    1. Re:brain fart while reading the article by Anonymous Coward · · Score: 0

      I predict that this post will be modded up to +5, Funny. Pretty much all geek and adolescent humor does here.

    2. Re:brain fart while reading the article by Anonymous Coward · · Score: 0

      Imagine a beawouaoulf cluaster of Ars Technica articles!!

      In Soviet Russia IV Sparcs Ultra-Sun _You_

      1. Review Efficeon, PWR5, SPARC4
      2. Discover new rTransmeta
      3. ???
      4. Profit!!

      Yes but does that include the $1399 SCO fee?

      NP, hot grits

    3. Re:brain fart while reading the article by Anonymous Coward · · Score: 0

      I for one welcome our new rTransmeta overlords?

    4. Re:brain fart while reading the article by zontroll · · Score: 1

      no, thank you, mr AC

    5. Re:brain fart while reading the article by RevRa · · Score: 1

      So because I've been around forever (UID 1728), my input is more valuable? YAY! I RULE!! ;-)

      --
      - Kate
      "DNA is life. The rest is just translation."
    6. Re:brain fart while reading the article by zontroll · · Score: 1

      nice sig!

    7. Re:brain fart while reading the article by Anonymous Coward · · Score: 0

      Yup, at least you know how to make proper humour.

    8. Re:brain fart while reading the article by Anonymous Coward · · Score: 0

      well, you were wrong.....

    9. Re:brain fart while reading the article by Anonymous Coward · · Score: 0

      Yeah, so would I, if I couldn't fucking READ!

  3. Transmeta is a joke by Anonymous Coward · · Score: 3, Insightful

    Since day 1 they have skirted the benchmark issue, always trying to deflect the question.

    Just like that article yesterday on their new chip. Did they ever cite a single benchmark? NO.

    The basic performance of your CPU product, as measured by industry standard benchmarks, is essential knowledge.

    I was under NDA on the previous gen Transmeta stuff. It was amusing how the other OEMs reacted - it was crap, but nobody could say anything in public.

    1. Re:Transmeta is a joke by Anonymous Coward · · Score: 0

      Shhh! Hold it down Linus...

    2. Re:Transmeta is a joke by hesiod · · Score: 1

      > Since day 1 they have skirted the benchmark issue

      That's because they aren't going for speed. They are going for low power consumption. To compare Transmeta to Intel based purely on speed would be missing the point entirely.

  4. Sun? by Raven42rac · · Score: 3, Interesting

    Why the heck did Sun's offering get thrown in there? For variety? The Efficeons look awful nice to people who want less power-hunger from their computing devices. If all you do is word processing and such, why the heck even use an Intel/AMD chip? Less heat, less power, what is not to love? Now the IBM chips have really piqued my interest, I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).

    --
    I hate sigs.
    1. Re:Sun? by lithandie · · Score: 1
      ... I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).

      I am sorry to break this to you, but the 12" PB has a Motorola chip in it...

      it just had to be said....

    2. Re:Sun? by Anonymous Coward · · Score: 0

      I love the PowerBook too, but that model uses Motorola chips, not IBM. The only Apple machines using IBM chips were the older iBooks (G3) & the new G5. The remainder of the line uses the G4, which is all Motorola.

    3. Re:Sun? by Raven42rac · · Score: 1

      I know. I should have clarified, I was just saying that I am an Apple mark, I know that my PB has a Motorola chip.

      --
      I hate sigs.
    4. Re:Sun? by illumin8 · · Score: 4, Insightful

      I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).

      I don't mean to burst your bubble, but your 12" PowerBook uses a Motorola processor, not an IBM one. I own a 15" PowerBook though and I love it.

      That having been said, the IBM PPC 970 or G5 is breathing new life into the PowerMac line and Apple is doing really well because of it. I can't wait until they get it stuffed into a PowerBook.

      --
      "When the president does it, that means it's not illegal." - Richard M. Nixon
    5. Re:Sun? by ch-chuck · · Score: 1

      And it may not be Motorola for long, MOT is looking to spin off the Semi Products Sector.

      --
      try { do() || do_not(); } catch (JediException err) { yoda(err); }
    6. Re:Sun? by CottonEyedJoe · · Score: 1

      Assuming the PowerBook owner has a new 'book, then you are most certainly correct. However, Apple has produced many computers under the "PowerBook" moniker, and some of them had 12" screens. I would wager some even included G3 or 603e chips, which would have been produced by IBM. You most likely know that, but there are a lot of new folks interested in Macs these days who might believe that IBM is a new player in Mac CPUs. IBM has been supplying Apple with CPUs since 1994.

    7. Re:Sun? by Kunta+Kinte · · Score: 2, Insightful
      Why the heck did Sun's offering get thrown in there? For variety? The Efficeons look awful nice...

      "I don't like or use it, so no one else does"

      Real smart.

      Any idea how many Sun systems are out there? People who use Sun hardware and software, and *gasp*, like it?! Should we only evaluate chips that currently do OK in the Slashdot market?

      --
      Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
    8. Re:Sun? by PetWolverine · · Score: 1

      I can't wait until they get it stuffed into a PowerBook.

      Stuffed, heh. Definitely the right word for it.

      I, too, am looking forward to having a G5 PowerBook set my pants on fire.

      --
      I found the meaning of life the other day, but I had write-only access.
    9. Re:Sun? by Raven42rac · · Score: 1

      I have a Sun box. IMHO they are overpriced, underperforming dinosaurs. It is a pretty damn good firewall though.

      --
      I hate sigs.
    10. Re:Sun? by Hoser+McMoose · · Score: 1

      Transmeta's problem is that their chips actually don't consume less power than a number of Intel chips. The Transmeta chips have a maximum power draw of about 7-10W. Intel has several ULV chips that are in the same range. Transmeta has had a bit of an advantage in its dynamic power consumption strategies, but the difference isn't huge, especially when compared to the Pentium M processor.

      The real question about the success of the Efficeon is whether or not it will be able to offer more performance for fewer dollars than Intel's ~800MHz Ultra Low Voltage Celeron processors. These chips consume very similar amounts of power and are both fairly low-cost chips.

      FWIW the main reason why Transmeta powered laptops have used so little power as compared to Intel-based laptops is that the Transmeta ones tend to ship with very small, low powered screens, while the Intel-based ones tend to use larger screens. With the LCD screen consuming up to 25W on some laptops, any difference there can make a big difference. Things like hard drives and graphics controllers also play a major role in the total power consumption of a laptop. Most small to mid-sized laptops have a maximum power consumption of up around 50-60W. With the processor taking 7-10W of that it's an important part, but by no means the only piece of the puzzle.
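      To put the laptop numbers in proportion, here is a rough power-budget calculation. The 55W total, 10W CPU, and 25W LCD are assumptions picked from the ranges quoted above, and the 10W screen saving is purely hypothetical; the point is only the relative size of the terms.

```python
# Rough laptop power budget using the ranges quoted above.
# All specific wattages are illustrative assumptions, not measurements.


def share(part_w: float, total_w: float) -> float:
    """Fraction of total system power drawn by one component."""
    return part_w / total_w


def total_after_saving(total_w: float, saved_w: float) -> float:
    """System power after trimming saved_w watts from one component."""
    return total_w - saved_w


if __name__ == "__main__":
    total = 55.0  # assumed mid-point of the 50-60W maximum
    cpu = 10.0    # upper end of the 7-10W CPU range
    lcd = 25.0    # large LCD at the top of the quoted range

    print(f"CPU share: {share(cpu, total):.0%}")
    print(f"LCD share: {share(lcd, total):.0%}")

    # A 3W CPU saving (10W -> 7W) versus a hypothetical 10W saving
    # from a smaller, dimmer screen.
    print(f"CPU 10W -> 7W: {total_after_saving(total, 3.0):.0f}W total")
    print(f"Smaller LCD (hypothetical -10W): {total_after_saving(total, 10.0):.0f}W total")
```

With these assumed figures the screen draws well over twice what the CPU does, so a smaller screen moves the total much more than a few watts shaved off the processor.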

    11. Re:Sun? by Anonymous Coward · · Score: 0

      The G4 is a Motorola design, but IBM picked up the manufacturing when Motorola dropped the ball. The G5 is exclusively IBM.

    12. Re:Sun? by Anonymous Coward · · Score: 0

      You're probably correct, but aren't the new G4* iBooks using IBM PPCs? PPC has always been a Motorola/IBM co-project, until recently, hasn't it?

      * G4 without AltiVec/Velocity Engine, which was previously the main distinguishing feature, so I'm not quite sure why they qualify... maybe it's just a marketing label after all.

    13. Re:Sun? by Anonymous Coward · · Score: 0

      * G4 without AltiVec/Velocity Engine, which was previously the main distinguishing feature, so I'm not quite sure why they qualify... maybe it's just a marketing label after all.


      The G4 chips in the new iBooks are basically a G3 with AltiVec. The G4 in PowerBooks, etc. is a somewhat different design, with AltiVec designed in from the start.

      Functionally, they are the same, with a 5% performance difference at the same clock speed on non-AltiVec code.

      The iBook chips are definitely fabricated by IBM.

    14. Re:Sun? by Anonymous Coward · · Score: 0

      No, the new iBooks are using the MPC 7455 chip, which is an older G4 design. If it had been IBM's G3-with-AltiVec (the PPC 750VX), the iBooks would have had 512 kB of L2 cache, a RapidIO bus, and clock speeds starting at about 1.5 GHz. Instead, they have 256 kB L2, an old-style 133 MHz SDR FSB, and clock speeds ranging from 800 to 1000 MHz. So the G4 in the iBook is actually the same processor that Apple previously used in the PowerBook line until they started using the less ineffective 7457 chips.

    15. Re:Sun? by Anonymous Coward · · Score: 0

      You completely missed the point of the poster. Read the article. The US IV is just two US IIIs on one chip. Probably useful to Sun customers, but from an architectural standpoint, about as interesting as mud.

  5. Re:Great Innovation by Perl-Pusher · · Score: 0, Offtopic

    LOL, I about spilled my soda.

  6. rTransmeta? by October_30th · · Score: 1
    rTransmeta

    What's this?

    --
    The owls are not what they seem
    1. Re:rTransmeta? by musiholic · · Score: 1
      recombinant Transmeta

      --
      One Can Never Own Enough Musical Instruments...
    2. Re:rTransmeta? by AchmedHabib · · Score: 1

      Recent studies show that e[something] is out and @[something] are dead, stone cold. r[something] is _the_ thing. so be prepared for a whole new line of rebranded products rServers, rApplications, rCPUs

    3. Re:rTransmeta? by Anonymous Coward · · Score: 0

      the arc length across an angle of meta, of course

    4. Re:rTransmeta? by Anonymous Coward · · Score: 0

      I think they forgot to put 'atemsna' at the beginning.

  7. One Power 5... by Realistic_Dragon · · Score: 4, Interesting

    Will show up as _4_ processors to the OS! (2 cores both doing SMT.)

    This means that in a (say) 512 processor box the OS will have to handle 2048 processors efficiently. That's placing a lot of control in the hands of the software designers, and a lot of money in the hands of the companies that license per processor.

    On the other hand, UNIX is getting pretty efficient at scaling to large systems; perhaps it (and by extension Linux, thanks to SGI and IBM) will be able to handle it with no problems. One thread per processor on a desktop system might prove to be quite efficient :o)
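    The processor count is just chips × cores per chip × SMT threads per core. A minimal sketch, assuming 2 cores per chip and 2 SMT threads per core as described above (the 512-chip box is the example figure, not a real configuration):

```python
# Logical processors an OS would see on a POWER5-style system.
# Assumption: 2 cores per chip, 2 SMT threads per core, as described above.

CORES_PER_CHIP = 2
THREADS_PER_CORE = 2


def logical_cpus(chips: int) -> int:
    """Number of logical processors exposed to the OS."""
    return chips * CORES_PER_CHIP * THREADS_PER_CORE


if __name__ == "__main__":
    for chips in (1, 512):  # 512 is the example box size used above
        print(f"{chips} chip(s) -> {logical_cpus(chips)} logical processors")
```

Any per-processor licence that counts logical processors rather than chips scales by the same factor of 4.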

    --
    Beep beep.
    1. Re:One Power 5... by stevesliva · · Score: 3, Interesting

      I'm getting a lot of karma mileage from this Power5 MCM review these days. They visited the same Microprocessor Forum that Ars did.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    2. Re:One Power 5... by gregfortune · · Score: 1

      And if one of the threads blocks on I/O? You would actually want more processes running than the total number of processors. The exact number depends on how many processes get blocked, and how often, for various reasons, but I think a factor of 1.5 or 2 is considered good. That means something like 4000 processes would make pretty efficient use of a 512 processor box.
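      One way to see where a factor like 1.5 or 2 comes from: if each process is blocked some fraction b of the time, roughly N(1 - b) of N processes are runnable, so keeping P processors busy takes about N = P / (1 - b). Treating processes as independently blocked is a simplifying assumption; the sketch below just evaluates that formula for the 2048 logical processors of the 512-chip example (b = 1/3 gives 1.5, b = 1/2 gives 2).

```python
from fractions import Fraction
import math


def processes_needed(logical_cpus: int, blocked: Fraction) -> int:
    """Processes needed so that, on average, every logical CPU has work.

    Simplifying assumption: each process is independently blocked (on I/O,
    locks, page faults, ...) for a fraction `blocked` of the time, so about
    n * (1 - blocked) of n processes are runnable at any instant.
    Fractions keep the arithmetic exact, so ceil() does not round up on
    floating-point noise.
    """
    blocked = Fraction(blocked)
    if not 0 <= blocked < 1:
        raise ValueError("blocked must be in [0, 1)")
    return math.ceil(logical_cpus / (1 - blocked))


if __name__ == "__main__":
    cpus = 512 * 2 * 2  # 2048 logical CPUs in the 512-chip example
    for b in (Fraction(1, 3), Fraction(1, 2)):
        n = processes_needed(cpus, b)
        print(f"blocked {float(b):.0%} of the time: {n} processes "
              f"(factor {n / cpus})")
```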

    3. Re:One Power 5... by Anonymous Coward · · Score: 1, Informative

      The POWER 4 based systems from IBM are only available up to 32-way. I'd expect them to try to double it to 64-way. So OS would see 128 processors with the POWER 5.

    4. Re:One Power 5... by AKAImBatman · · Score: 1

      Geez! That thing looks like you could club someone over the head with it! Does putting 8 processors into a block of cement really improve things that much over multiple processors?

    5. Re:One Power 5... by Elwood+P+Dowd · · Score: 2, Insightful

      This means that in a (say) 512 processor box the OS will have to handle 2048 processors efficiently. That's placing a lot of control in the hands of the software designers, and a lot of money in the hands of the companies that license per processor.

      Fortunately for IBM, they are both the hardware designers and, frequently, the software designers. They can ensure that their big iron will be supported by software.

      --

      There are no trails. There are no trees out here.
    6. Re:One Power 5... by stevesliva · · Score: 2, Informative

      It's four dual-core SMT processor chips, and four L3 cache chips per MCM, actually. I think cache is sexy, but I'm biased.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    7. Re:One Power 5... by stevesliva · · Score: 1
      They can ensure that their big iron will be supported by software.
      Didn't you hear? SCO owns SMP. IBM just copied all their software from SysV.
      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    8. Re:One Power 5... by AKAImBatman · · Score: 2, Insightful

      Alright, 4 two-way chips. But does it actually improve anything over individual processors? If I have to yank a board on an UltraSparc, I'm not going to throw away the entire board and all its processors! I'm simply going to replace the bad one and slap the board right back in the system. With IBM's design, I have to throw the whole thing away and get a new block of cement^W^W^W processor chip for my machine.

    9. Re:One Power 5... by Anonymous Coward · · Score: 0

      > SCO owns SMP. IBM just copied all their software from SysV.

      Oh no! How could this happen!?!

    10. Re:One Power 5... by stevesliva · · Score: 1

      I don't know. Although I do think that the MCM package is trickling down IBM's eServer line from the hugely reliable mainframe zSeries into the pSeries. As far as bus advantages, etc, I'm not enough of a systems geek...

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    11. Re:One Power 5... by raodin · · Score: 1

      From the inquirer article, I'd guess the major advantages are 2 things - An *extremely* fast bus between the CPUs on the same package, and increased density.

    12. Re:One Power 5... by redgren · · Score: 2, Interesting

      Reliability is pretty impressive on these designs, considering the complexity. At least on the Power4 (similar design), MTBFs are measured in decades.

    13. Re:One Power 5... by AKAImBatman · · Score: 2, Interesting

      Yes, but what are the advantages? IBM, Sun, and HP all make a business out of selling components with very high MTBF. Yet, if I have a 64 processor machine chugging along for years on end, I have a reasonably good chance of seeing a failure. (Particularly when chips come from a bad batch.)

      So, IBM is taking away the ability to hot swap individual chips in exchange for... what? That's the big question. If there's some major improvement in the design, say so! Inquiring minds want to know! :-)

    14. Re:One Power 5... by isaac · · Score: 3, Informative
      So, IBM is taking away the ability to hot swap individual chips in exchange for... what? That's the big question. If there's some major improvement in the design, say so! Inquiring minds want to know! :-)
      Damn, dude, RTFA if you're that curious!

      What is gained is full-speed interconnect between processors within the same module. No "multipliers": the bus between the cores within the module runs at chip speed. The timings are so tight at 2+ GHz that this is simply impossible to do with individual chips.

      -Isaac

      --
      I am not a lawyer, and this is not legal advice. For Entertainment Purposes Only.
    15. Re:One Power 5... by redgren · · Score: 1

      It really is just as the AC who replied to you states. It's a matter of creating denser and denser integrated products. The chip-to-chip interconnects number somewhere around 5500 on a Power4 MCM; if you attempted to route these out of each chip and into a PCB, you would not get anywhere near the interconnect speed you get by containing it all in an MCM. MCM-to-MCM interconnect speed drops by a factor of 3 compared with chip-to-chip within an MCM.

    16. Re:One Power 5... by AKAImBatman · · Score: 1

      Thanks to you and the AC. That's all I wanted to know. :-)

      Now if you'll excuse me, I need to see how useful these new chips are as boat anchors...

    17. Re:One Power 5... by Fulcrum+of+Evil · · Score: 1

      I have a 64 processor machine chugging along for years on end, I have a reasonably good chance of seeing a failure. (Particularly when chips come from a bad batch.)

      So source bricks from the same batch and source multi-brick systems from different batches. If you have to toss the whole brick at once, it's best to keep the stuff that's more likely to fail on that brick.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    18. Re:One Power 5... by anthonyrcalgary · · Score: 1

      oh... my... sweet... jebus...

      --
      When someone might yell at me, it has to be OpenBSD.
  8. Performance != marketshare by G4from128k · · Score: 4, Insightful

    The history of Wintel suggests that top-rated raw CPU performance is not the best predictor of adoption. Compatibility with market-dominating software platforms is a greater determinant of CPU sales. We might hope that advances in compiler design and flexible cores can help any CPU run x86 code, but there are always little nits that prevent true compatibility and drive computer buyers toward the dominant platform.

    --
    Two wrongs don't make a right, but three lefts do.
    1. Re:Performance != marketshare by meadowsp · · Score: 1

      Insightful? Why on earth would the ability to run x86 code affect the takeup of Power/Sparc chips used for IBM Mainframes and Solaris Boxes?

  9. The "hyperthreading" thing. by Animats · · Score: 3, Interesting
    First "Hyperthreading", now "prioritized hyperthreading".

    It's amusing seeing this. It reflects mostly that Microsoft has finally managed to ship in volume OSs that can do more than one thing at a time. (Bear in mind that most of Microsoft's installed base is still Windows 95/98/ME. Transitioning the customer base to NT/Win2K/XP has gone much more slowly than planned.)

    But Microsoft takes the position that if you have multiple CPUs, you have to pay more to run their software. So these strange beasts with multiple decoders sharing ALU resources emerge.

    1. Re:The "hyperthreading" thing. by drinkypoo · · Score: 1

      Microsoft will eventually provide XP Home in an SMP flavor, it's only a matter of time. Perhaps they will have an HT edition before that happens. But SMP for free is just another selling point for Linux, so they won't let it be a sticking point forever.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:The "hyperthreading" thing. by Anonymous Coward · · Score: 0

      whaha yes, the only reason Hyperthreading exists is because of Microsoft licensing.

      you're OH so bright :)

    3. Re:The "hyperthreading" thing. by RALE007 · · Score: 1
      I really fail to see how Microsoft's multi-processor licensing scheme falls into this. Correct me if I'm wrong, but since when has any MS OS run on a POWER chip or a Sparc?

      If these were x86 chips I think the licensing question would be valid, but since they're not...

      --
      Beware blue cats moving at .99c
    4. Re:The "hyperthreading" thing. by F34nor · · Score: 1

      NT 3.5 on PowerPC or Alpha

      Those heady days before they broke the whole point of NT (dividing the kernel from the hardware layer). But before they could make a stable OS.

    5. Re:The "hyperthreading" thing. by cens0r · · Score: 1

      4.0 ran on both of those as well... I still have my NT4 CD with both a PowerPC and Alpha directory.

      --
      Jack Valenti and Orrin Hatch will be first up against the wall when the revolution comes.
    6. Re:The "hyperthreading" thing. by Anonymous Coward · · Score: 0

      Microsoft will eventually provide XP Home in an SMP flavor, it's only a matter of time.

      Why would they do something like that? They already provide a SMP flavor of XP and its called XP Professional. Their "home" version is designed towards the home market, which does not include people with dual processor machines. Granted there are SOME home users who have dual processor machines, but all of them fall more into the "professional" user base anyways.

      When it comes down to it, if those professional users who are stuck with home wanted a SMP version, they can just go load up their favorite P2P app and download the professional version anyways ;)

    7. Re:The "hyperthreading" thing. by AchmedHabib · · Score: 1

      If making the home edition without SMP required more of them than flipping a flag before compiling, I'd say it was money wasted in development. But what do I know. :)

    8. Re:The "hyperthreading" thing. by Anonymous Coward · · Score: 0

      XP Home currently supports one HyperThreading CPU.

    9. Re:The "hyperthreading" thing. by Anonymous Coward · · Score: 0

      Uh.....3.51 was much more stable than XP/2k and NT4 ever were. They moved a bunch of stuff into kernel level that shouldn't be there, and so now you can muck stuff up even more easily.

      If current Win32 programs supported NT 3.51, Microsoft still made updates, and people weren't so addicted to the Fisher-Price interface of XP, it'd be in everyone's best interest to switch back to 3.51 simply for stability.

    10. Re:The "hyperthreading" thing. by Anonymous Coward · · Score: 0

      NT 3.5 ran on x86, Alpha, and MIPS. PPC support was the reason for 3.51's release.

      And of course, you don't know what you're talking about regarding the "kernel"/HAL.

    11. Re:The "hyperthreading" thing. by koreth · · Score: 1
      Bear in mind that most of Microsoft's installed base is still Windows 95/98/ME.

      Is that really true? Judging by the web logs from my employer's site, it looks like about 65% of our users are on NT/2K/XP. Our customers are all in the construction industry, not the tech industry, so they aren't likely to be early adopters.

      If you're talking MS's home users, then that's pretty plausible, but home users aren't the majority of Microsoft's installed base.

      I'd be interested to see some numbers, though, if you have them -- I may well be wrong.

    12. Re:The "hyperthreading" thing. by F34nor · · Score: 1

      I have a "friend" who works for a "corporation" who was made to move a huge e-mail system from BSD to "another OS" because they "should eat their own dog food." He loved NT 3.51 because once it was ironed out with a tactical nuclear weapon it was STABLE. They are now on 2000, but wouldn't be without the pressure to conform.

    13. Re:The "hyperthreading" thing. by Alex · · Score: 1

      No but Solaris + WinNT worked on PPC IIRC.

      Alex

  10. power consumption by bigpat · · Score: 4, Interesting

    Wasn't low power consumption the number 1 benefit that Transmeta was looking to provide, so that you could get twice the battery life (or something like that) without sacrificing too much performance? Did Transmeta shoot itself in the foot by letting people think that it was going to provide higher performance chips than the competition?

    The main selling point of transmeta was always power consumption, so have they lost their edge in that area? If so, then that would be serious for them, but the article doesn't answer that question.

    1. Re:power consumption by Kenja · · Score: 1

      The problem was that they provided around 10% more battery life at around 50% the performance.
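      Taken at face value, those two numbers mean more energy per unit of work, not less. A rough sketch, assuming average system power is inversely proportional to battery life under the same workload, and run time is inversely proportional to performance:

```python
def relative_energy_per_task(battery_life_ratio: float, perf_ratio: float) -> float:
    """Energy to finish a fixed job on chip B, relative to chip A.

    Assumes average system power is inversely proportional to battery life
    and run time is inversely proportional to performance.
    """
    power_ratio = 1.0 / battery_life_ratio
    time_ratio = 1.0 / perf_ratio
    return power_ratio * time_ratio


def relative_work_per_charge(battery_life_ratio: float, perf_ratio: float) -> float:
    """Work completed on one battery charge, relative to chip A."""
    return battery_life_ratio * perf_ratio


if __name__ == "__main__":
    life, perf = 1.10, 0.50  # "10% more battery life at around 50% the performance"
    print(f"energy per task: {relative_energy_per_task(life, perf):.2f}x")
    print(f"work per charge: {relative_work_per_charge(life, perf):.2f}x")
```

Under those assumptions the slower chip spends about 1.8x the energy per task and gets roughly 55% as much work done per charge.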

      --

      "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
    2. Re:power consumption by TimeZone · · Score: 1
      My instinct is to say that under such benchmark conditions, Efficeon would edge out Centrino in terms of MIPS/Watt on the same process technology. However, would it be by enough of a margin to warrant going with the smaller TM instead of the larger Intel, both in terms of any added design costs incurred from using the TM parts and in political terms of angering Intel, and of course in terms of questions surrounding TM's long-term viability? Of that I'm not so sure.

      The article didn't answer the question, but didn't really avoid the question either. The benchmarks provided simply didn't include the necessary information to make a conclusion.

      TimeZone

    3. Re:power consumption by Arker · · Score: 1

      No, they're still great for power consumption. The problem is that the CPU isn't the only thing in most devices sucking power, and they built up expectations that their chips would perform much better than they have turned out to. I still think they are good choices for a lot of devices that don't really need any more power; they're basically like ARM with x86 compatibility built in, and there are plenty of cases where something like that makes sense. But they definitely haven't lived up to the expectations.

      --
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Friends don't let friends enable ecmascript.
    4. Re:power consumption by akuma(x86) · · Score: 1

      Actually, the original intent of Transmeta was to produce a chip that had performance on-par or better than comparable Intel/AMD offerings.

      When the project failed to do that (quite badly), then the marketeers refocussed the company message to start talking about 'low-power' and efficiency. This deflects the critics who do not understand computer architecture and things like power-efficiency.

      Yeah, it's a neat research project and having Linus work there didn't hurt PR at all, but the performance just isn't there. The truth is that the architecture sucks rocks. Any statically scheduled CPU is going to have major problems competing with dynamically scheduled x86 processors. You can have low power and efficiency with the same performance as Transmeta using standard Intel/AMD CPUs and just scaling down the voltage (and hence frequency). In fact, Intel and AMD have ULV (ultra-low-voltage) CPUs that do just that.
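      The voltage-scaling argument rests on the usual CMOS approximation that dynamic power scales as C·V²·f. A sketch under the loose assumption that frequency can be lowered in proportion to voltage (leakage ignored; the scale factors are illustrative, not datasheet values):

```python
def relative_dynamic_power(v_ratio: float, f_ratio: float) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V**2 * f.

    Switched capacitance C is assumed unchanged, so it cancels out.
    Leakage (static) power is not modelled.
    """
    return v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    # Illustrative assumption: frequency is scaled down in step with voltage.
    for scale in (1.0, 0.8, 0.5):
        p = relative_dynamic_power(scale, scale)
        print(f"V and f at {scale:.0%}: dynamic power {p:.3f}x, clock {scale:.0%}")
```

In this model, halving both voltage and frequency cuts dynamic power to 1/8 of the original while clock speed only halves.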

  11. quick... by billimad · · Score: 1


    ...ship them some super fast cpus for their web server - it's smokin'

    1. Re:quick... by PetWolverine · · Score: 1

      What is it smoking, and where can I get some?

      --
      I found the meaning of life the other day, but I had write-only access.
  12. Why only two threads per core? by joib · · Score: 2, Interesting

    Seems like the power5 will be able to run only two threads per core, like the pentium 4. For the P4 it is understandable that they want to reduce cost as much as possible, but why be so frugal on a high-end cpu like the power5?

    I mean, the MTA supercomputer, which pioneered the entire SMT concept, was able to run 128 threads per CPU. OK, so they had different design constraints as well. Basically, the idea was that the CPUs didn't have any cache at all, making them simpler and cheaper. To avoid the performance hit usually associated with this, they simply switched to another thread when one thread became blocked waiting for memory access.

    Anyway, is there any specific reason why IBM didn't put more than 2, say 8 or 16 threads per cpu on the power5?

    1. Re:Why only two threads per core? by AxelTorvalds · · Score: 2, Informative
      I recall an article in Forbes (of all places... they talk to the right guys, though) on the matter, comparing the design goals of Sun's Niagara with IBM's and Intel's. Basically, the answer was that it's not clear what kinds of apps benefit from 8, 16, or 32 threads of parallelism. This is a low-tech description, but there are other bottlenecks: you have to have that many "threads" of code ready to run to benefit from it, or else it's cheaper to context switch.

      Also, I don't know how much you've played with Pentium 4s, but Hyper-Threading doesn't buy much in most circumstances. We're not talking about doubling performance or anything like that. If one extra SMT unit gets you a 20% improvement on a P4 in the best cases, then what does the 4th unit buy? IBM is just hedging because the technology hasn't shown that it delivers serious punch yet.

    2. Re:Why only two threads per core? by kcm · · Score: 3, Interesting

      In other words, you're laying out the basic problems of:

      1) Being able to FIND parallelism
      2) Being able to take advantage of it:
      a) Issuing multiple instructions (limited fetch bandwidth)
      b) Executing them in parallel (limited FUs)
      c) Committing them to memory / retiring

      20% is generous, but that's a limitation of the simplicity of HT with respect to the EV8 / UltraSparc-V scale of SMT implementation, which leans towards a more full-issue design.

    3. Re:Why only two threads per core? by pmz · · Score: 1

      it's not clear what kinds of apps benefit from 8, 16, or 32 threads of parallelism.

      SunRay servers come to mind, where lots of single-threaded users share a system.

      In Solaris, for example, every process gets a kernel thread, and every process thread gets a kernel thread. On my workstation, right now, just running CDE and a few apps gets reported as 189 light-weight processes (essentially threads). Having a system shared by 1000 users could result in over 100,000 threads, with approximately 1000 in the run queue at any given moment. This is a lot of parallelism derived from what generally has no parallelism at all (the user's desktop). If users are running multi-threaded apps for CAD, image processing, games, etc., the potential system utilization goes even higher. Solaris has efficient schedulers, so shoveling more and more users and programs onto a system will take it to near-100% sustained utilization before noticeable thrashing begins (Windows this ain't ;).

    4. Re:Why only two threads per core? by Anonymous Coward · · Score: 3, Insightful

      This is a false economy. Just because you have 32 threads to run doesn't mean you would benefit from 32-way SMT. Remember that you don't just need 32 contexts in your CPU, you need enough cache to be able to feed 32 unrelated threads. The reason SMT sometimes slows down a CPU is that the 2 or more threads running concurrently compete for cache space. If you just run a single thread at a time, it has a whole quantum to fill up the cache and use it.

      The way this worked on the aforementioned MTA machine is that the processor had 128 contexts and NO cache. Memory latency was 32 cycles, so as long as you had at least 32 compute-bound threads, there were no cycles lost to latency. However, this means that each thread did take longer to run.
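The latency-hiding idea can be checked with a toy simulation, using the 32-cycle memory latency quoted in this comment. The model is an assumption, not the MTA's real pipeline: every instruction is treated as a memory reference that blocks its thread for `latency` cycles, there is no cache, switching threads is free, and each cycle the processor issues one instruction from the next ready thread in round-robin order.

```python
def barrel_utilization(num_threads, latency, cycles=10_000):
    """Fraction of cycles in which some thread issues an instruction.

    Toy model: each issued instruction blocks its thread for `latency`
    cycles (as if every instruction were an uncached memory access), and
    the processor picks the next ready thread round-robin at no cost.
    """
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    next_thread = 0
    busy_cycles = 0
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency   # blocked until memory returns
                next_thread = (t + 1) % num_threads
                busy_cycles += 1
                break                           # at most one issue per cycle
    return busy_cycles / cycles


for n in (8, 16, 32, 64):
    print(f"{n:>2} threads: {barrel_utilization(n, latency=32):.2f}")
# prints 0.25, 0.50, 1.00, 1.00 for 8, 16, 32, 64 threads
```

In this model, utilization is roughly threads/latency until the thread count reaches the latency, after which every cycle does useful work, while each individual thread still issues at most once per `latency` cycles.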

      aQazaQa

    5. Re:Why only two threads per core? by javiercero · · Score: 1

      The MTA was a fine-grained multithreaded machine, very different from the SMT approach required for complex superscalar multithreaded machines. The Tera did have an instruction cache, though; it had no data cache.

    6. Re:Why only two threads per core? by javiercero · · Score: 2, Insightful

      Actually, the multithreaded design is an answer to the lack of parallelism: these designs work at the thread or process level, hence the parallelism is implicit and does not need to be "found." That is the whole point. You are citing the limitations of superscalar designs, not SMT designs.

    7. Re:Why only two threads per core? by javiercero · · Score: 1

      No, multithreading gives you the ability to hide memory latency. While one thread is waiting on a memory operation, another thread can fill the functional units, hence "hiding" the bubbles the stalled thread would have generated. If you have two threads competing for memory resources, then there is a problem with the processor logic. The Tera had no data cache because it can hide all memory latency for data requests. The memory requests could be pipelined through maximum levels of banking and interleaving in the memory subsystem, so you do not need 32 compute-bound threads, just at least 32 threads. They can put their requests in serially... data and other hazards are taken care of automatically too.

    8. Re:Why only two threads per core? by kcm · · Score: 1

      I wasn't clear.

      Limited resources run out, hence four (independent) threads running in parallel cannot write to the RF or fetch from memory concurrently. If your parallelism involves many different types of operations, it's much easier.

      I suppose my original comment was worded badly -- being *able to* HARNESS the inherent (independent) parallelism with the resources at hand is the key, you are correct.

    9. Re:Why only two threads per core? by stripes · · Score: 1
      I mean, the MTA supercomputer which pioneered the entire SMT concept, was able to run 128 threads per cpu.

      It is an older concept (20 years old, or maybe 30!); look up barrel processors sometime. I'm pretty sure the MTA executed one thread per CPU per cycle with no penalty for switching between threads on different cycles. It would switch threads any time a load was issued, any time the store buffer was full and a store was issued, and after X cycles. The resources you need for an MTA thread would be more or less an extra set of registers.

      Anyway, is there any specific reason why IBM didn't put more than 2, say 8 or 16 threads per cpu on the power5?

      SMT (at least as the Alpha EV8 and IBM POWER5 do it; I'm not sure exactly what Intel's HT is, but I think it is the same) executes more than one thread every cycle. So you need not only an extra register set per thread, you also need to make the "dispatch this to an ALU" logic more complex, and I think that cost is non-linear. I don't know why two seems to be the magic number rather than 4 or something, but the higher cost per thread in SMT shows why you don't just add a ton of threads.

      Hope that helps.

  13. So, despite being lower voltage/MIPS... by csoto · · Score: 5, Interesting

    the author suggests that it's not worth "pissing off Intel" to go with Transmeta. Give me a break. Transmeta is the only thing pushing Intel to make Centrino and other lower-wattage chips. They recognize that anybody in the mobile computing/devices world will seriously consider anything that gives their customers increased battery life and less toasty pockets.

    --
    There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
    1. Re:So, despite being lower voltage/MIPS... by curtlewis · · Score: 3, Insightful

      Centrino is not a chip!

      It's a package of Intel wireless, an Intel CPU, and some other stuff.

  14. memory and processor watts not the same by pz · · Score: 5, Interesting

    Multiple times while reviewing the Efficeon architecture, the article's author suggests that the tradeoff of additional storage required for Transmeta's code-morphing approach will easily balance out the power savings from making a simpler CPU. This betrays a deep misunderstanding of power consumption in digital systems, as is readily evidenced by the fact that modern non-Transmeta processors dissipate multiple tens of watts (often nearly 100W), while a full complement of memory (4G, in modern machines) dissipates a few watts at most.

    Also in the article, the author suggests that processors spend most of their time waiting on loads, and then argues that since the code-morphing approach means more instruction fetches, the Efficeon processor will spend disproportionately more time on loads. Then, after this assertion, he admits that he does not know *where* the translated Efficeon code is held. Might it be in one-cycle-accessible L1 cache? That point is conveniently sidestepped. He does not understand under what circumstances the profiling takes place, although he regurgitates the sales pitch nicely. He argues that transistors hold the translated code (trying to argue against the transistors-for-software tradeoff) but then does not realize that transistors in memory are not equivalent to transistors in logic (neither in power, as they are not cycled as frequently, nor in speed characteristics).

    In all, I find the author's treatment of the Transmeta architecture sophomoric, and, after finding that section lacking, I left the rest of the article unread. Your mileage may vary.

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    1. Re:memory and processor watts not the same by Hannibal_Ars · · Score: 5, Informative

      "Multiple times while reviewing the Efficeon architecture, the article's author suggests that the tradeoff of additional storage required for Transmeta's code-morphing approach will easily balance out the power savings from making a simpler CPU."

      I neither suggest nor imply anything this simplistic. In fact, I go to great pains to show how complicated the whole power picture is for Efficeon.

      "This betrays a deep misunderstanding of power consumption in digital systems, as is readily evidenced by the fact that modern non-Transmeta processors dissipate multiple tens of watts (often nearly 100W), while a full complement of memory (4G, in modern machines) dissipates a few watts at most."

      Er... you do realize, don't you, that comparing Efficeon to a 100W processor is not only unfair, but it's stupid and I didn't do it anywhere in the article. A more appropriate comparison is Centrino, which approaches Efficeon in MIPS/Watt without any help at all from any kind of CMS software. I think that you might be the one who needs to learn a bit more about digital systems.

      "Also in the article, the author suggests that processors spend most of their time waiting on loads, and then argues that since the code-morphing approach means more instruction fetches, the Efficeon processor will spend disproportionately more time on loads. Then, after this assertion, he admits that he does not know *where* the translated Efficeon code is held. Might it be in one-cycle-accessible L1 cache?"

      No, it is most certainly not all stored in L1. TM claimed that the original CMS software that came with Crusoe took up about 16MB of RAM, and that this was paged in from a flash module on boot. What I'm not 100% certain of are the exact specs for Efficeon, but I've assumed in this article that they're similar. This is a reasonable assumption, especially given the fact that the new version of CMS contains significant enhancements and is unlikely to be smaller. In fact, it's much more likely to be larger than the original 16MB CMS footprint, especially given that DRAM modules have increased in speed and decreased in cost/MB, which gives TM more headroom and flexibility to increase the code size a bit.

      "That point is conveniently sidestepped. He does not understand under what circumstances the profiling takes place, although he regurgitates the sales pitch nicely. He argues that transistors hold the translated code (trying to argue against the transistors-for-software tradeoff) but then does not realize that transistors in memory are not equivalent to transistors in logic (neither in power, as they are not cycled as frequently, nor in speed characteristics)."

      Of course I know that transistors in memory are not the same as transistors on the CPU. My point, though, is that they're still not "free" in terms of power draw, and that it also costs power both to page CMS into RAM and to move it from RAM to the L1. And even having pointed that out, I still don't claim that this cancels out all the power-saving advantages of TM's approach.

      As far as relying on the sales pitch for info on CMS's profiling, well, TM doesn't exactly release the source for CMS, nor do they make a detailed user manual for it available to the public. As their core technology, details about CMS are closely guarded, and the only information that either you or I will likely ever have access to is whatever they put in the sales pitch. So I, like everyone else, must draw inferences from their presentations and do the best I can.

      Anyway, if you don't like the article, that's fine. But being a hater about it just makes you look lame.

      --
      Senior CPU Editor | Ars Technica | http://arstechnica.com/
    2. Re:memory and processor watts not the same by Anonymous Coward · · Score: 0

      From the article:

      "I also speculated that TM had something else up their sleeves besides just power efficient chips, and that they'd eventually make a processor aimed at the high-performance workstation/server market.

      Well, I turned out to be clearly wrong about the second point, and arguably wrong about the first."

      Translation: I did not understand anything 2 years ago, but I talked with authority.

      "If I'd have known then what I know now, I might've called it differently, because I think that TM bumped up against a combination of two things: 1) a set of fundamental design constraints inherent in the basic architecture of the stored-program computer and 2) Moore's Law."

      Translation: I still don't understand much, but I am ready to make a fool of myself again.

      Come on. Basing an argument on Moore's "law" is so cliche, it isn't funny anymore.

      I didn't bother reading past that point.

    3. Re:memory and processor watts not the same by hamsterboy · · Score: 1
      The power consumption of a clocked transistor device is directly proportional to (a) the number of transistors, and (b) the square of the clock speed. Let's call it 1U for 1 transistor at 1 MHz, so (power consumption in U) = (# transistors)x((clock speed in MHz)^2). This means that 2 transistors at 1MHz will consume 2U, where 1 transistor at 2MHz will consume 4U.

      The Pentium 4 has upwards of about 55 million transistors on the die. SDRAM needs 1 transistor and 1 capacitor per bit; for 8x1024x1024x1024 bits (1 GB), that's 8.6 billion transistors (I'll ignore the caps for now).

      The base clock speed of the aforementioned P4 is 3.0 GHz, whereas the fastest DDR SDRAM runs at around 400MHz. Doing our math:

      • P4: (55x10^6)x((3000)^2) = 495 trillion U
      • SDRAM: (8.6x10^9)x((400)^2) = 1376 trillion U

      Now, granted, all of this depends heavily on usage patterns; the P4 running a constant stream of NOPs will consume far less power, as will SDRAM that isn't being written to, but the point is that the consumption numbers are far closer than you think they are.

      Hamster

    4. Re:memory and processor watts not the same by haxor.dk · · Score: 1, Insightful

      "Anyway, if you don't like the article, that's fine. But being a hater about it just makes you look lame."

      And once more, Hannibal demonstrates that his writings @ Ars are more about being "kick@$$ kewl dude" than about having the facts in check.

      Didn't it occur to you, Hannibal, that you just went ad hominem on pz when he delivered sound and substantial criticism of your article?

    5. Re:memory and processor watts not the same by PastaAnta · · Score: 2, Insightful

      Oh my god! Please stop comparing apples and banjos and try to make sense of it!

      DDR SDRAM does not "run" at around 400MHz; the frequency of the data bus is 400MHz. As you state yourself, the power usage is very dependent on the usage pattern, and only very few memory cells actually change state during each write (up to 8 for an 8-bit RAM). I would guess that leakage and discharge of the capacitor cells is a significant factor, which you totally ignore.

      In a processor on the other hand, a lot of transistors change state every clock cycle - even during execution of NOPs. Some signals will even change state several times during a clock cycle due to asynchronous races in the logic paths.

    6. Re:memory and processor watts not the same by aurum42 · · Score: 1

      Setting aside the other inaccuracies in your comment, I don't know where you got the "power dissipation is proportional to the square of the switching frequency" idea, but it's wrong. It's a linear relation, roughly P = 0.5*C*V^2*f.
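A minimal sketch of the first-order relation this reply refers to, written in its common form P = α·C·V²·f, where α is the activity factor (the fraction of the capacitance that actually switches each cycle). At fixed voltage, doubling f doubles P rather than quadrupling it, and α is where busy CPU logic and mostly idle DRAM arrays differ most. The parameter values are arbitrary placeholders, and the model deliberately ignores leakage and DRAM refresh.

```python
def dynamic_power(alpha, capacitance_f, voltage_v, freq_hz):
    """First-order CMOS switching power in watts: P = alpha * C * V^2 * f."""
    return alpha * capacitance_f * voltage_v ** 2 * freq_hz


# Arbitrary placeholder values, chosen only to show the shape of the model.
C, V = 1e-9, 1.5

base = dynamic_power(0.1, C, V, 1e9)
double_f = dynamic_power(0.1, C, V, 2e9)
print(f"2x clock -> {double_f / base:.1f}x power")  # linear in f, not squared

# Same capacitance and clock, different (hypothetical) activity factors:
logic = dynamic_power(0.2, C, V, 1e9)    # many nodes toggle every cycle
dram = dynamic_power(0.001, C, V, 1e9)   # few cells change state per access
print(f"logic/dram power ratio: {logic / dram:.0f}x")
```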

      --
      "The slave who knows his master's will and does not get ready...will be beaten with many blows."Luke 12:47-48
    7. Re:memory and processor watts not the same by PastaAnta · · Score: 3, Insightful

      First of all, I thank you for a great article. You have some interesting views on the Transmeta approach. But like the parent poster, I feel you may jump to some conclusions based on assumptions.

      It is true that CMS has a cost in terms of RAM usage, but this does not necessarily translate into extra load latency. As I understand it, the key is that in common code you execute only a very small portion of the code most of the time (like 90%/10% or whatever). Much can be gained by heavily optimizing these "inner loops", which should reduce load latency because fewer instructions will be executed in total. The cost of the four optimisation passes, or of JIT compilation, should be drowned out by the millions of times these inner loops are executed.
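The amortization argument can be put into a toy cost model: translating a hot block costs a one-time `translate_cost`, after which each execution is cheaper than the untranslated path. All cycle counts here are hypothetical; CMS's real costs aren't public.

```python
def total_cycles(executions, translate_cost, cycles_per_exec):
    """Cycles to run a block `executions` times after a one-time translation."""
    return translate_cost + executions * cycles_per_exec


def break_even(translate_cost, slow_cycles, fast_cycles):
    """Smallest execution count at which translating beats not translating.

    Assumes the untranslated path costs `slow_cycles` per run with no
    up-front cost, and the translated path costs `translate_cost` once
    plus `fast_cycles` per run.
    """
    saving = slow_cycles - fast_cycles
    if saving <= 0:
        return None  # translated code is no faster, so it never pays off
    # Need translate_cost + n * fast < n * slow, i.e. n > translate_cost / saving.
    return translate_cost // saving + 1


# Hypothetical: 50,000 cycles to optimize a loop body that then runs in
# 10 cycles per iteration instead of 100.
print(break_even(50_000, 100, 10))  # 556 iterations to break even
overhead = 50_000 / total_cycles(1_000_000, 50_000, 10)
print(f"translation share of 1M iterations: {overhead:.2%}")
```

With these made-up numbers, the translation cost is under 1% of the total after a million iterations, which is the "drowned out" effect described above; for code that runs only a handful of times, the same model says translation is a net loss.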

      You could say that it is a complete waste of transistors and power usage to have many transistors performing the same optimisation over and over again in the conventional processors. These hardware based optimisation will also never be as efficient as their scope is limited.

      There are some interesting perspectives on the Transmeta approach as well. You state that POWER5, UltraSparc IV and Prescott tackle the problem of load latency by using SMT to fill pipeline bubbles from data stalls and thereby increase utilisation of the execution units. This should be possible for Transmeta as well, by upgrading their CMS to emulate two logical processors instead of one.

      But you are right! A complete theoretical comparison is impossible - only real world experience will show...

    8. Re:memory and processor watts not the same by ddt · · Score: 1

      I have to concur with the "hater". This article wasn't up to my standard for Ars, and I recoiled at exactly the same implications regarding the power draw of CMS itself.

      CMS and its translation buffer take a small fraction of the available RAM, and all of RAM takes a small fraction of the power the CPU does, so we're talking about a fraction of a fraction. Translations live in RAM, btw, and are cached like any other executable code, when needed.

    9. Re:memory and processor watts not the same by pz · · Score: 1

      The base clock speed of the aforementioned P4 is 3.0 GHz, whereas the fastest DDR SDRAM runs at around 400MHz.

      Yes, except that the fraction of transistors switching in the two at any given moment is vastly different: in the P4 it will be reasonably high, in memory chips, it will be vanishingly low. Thus your analysis is inaccurate at best and potentially misleading at worst.

      Think of the following empirical observations: a modern processor cannot run without a heatsink without going into thermal failure. The specifications for power dissipation run between 50 and 100 W. Normal die temperature is closing in on 100 C. Now, how about for a modern memory chip? Dissipations are in the hundreds of milliwatts. Sure, you might have 10 or 20 of them, but that's it, creating an order of magnitude difference between processor power dissipation and memory chip power dissipation in normal use. Think about it: if this weren't the case, then memory banks would *require* fans for normal (not overclocked) operation just as processors do.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    10. Re:memory and processor watts not the same by Anonymous Coward · · Score: 0

      Go fuck yourself ars-hole.

  15. Trying to read the FA by Space+cowboy · · Score: 0

    Well, I tried to read the FA before making a comment, but it was futile: there was an enormous flashing, strobe-y advert on it that was just painful to have on the screen.

    So, I gave up. I have no clue what the advert was for, it had a sort of minimalist man icon in it, and lots of flashing colours - that's all I know. I do however know a lot more about advertising than the idiots who thought that one up.

    Simon.

    --
    Physicists get Hadrons!
    1. Re:Trying to read the FA by Anonymous Coward · · Score: 0

      That's a shame!

      If you had put as much effort into hitting the down arrow a couple of times as you have spent here bitching about an advert (on the internet... ads, can you believe it!) you might have been able to read a little further. The escape/stop key is your friend.

  16. contexts != threads by kcm · · Score: 5, Informative

    first, you don't just automatically get a linear increase with the width of the multiple-threading capabilities. it's not like it's free to increase the RF size and/or FUs, etc.

    you're also confusing contexts with active threads. the Tera^WCray MTA had 128 contexts available -- so that thread switching is more light-weight, more or less -- but only one could be active at one time.

    SMT in the various forms have more than one active thread, which introduces the problem(s) of competing for resources in the issue and retire stages, etc et al.

  17. Well by autopr0n · · Score: 1

    For one thing, there's a lot of interconnects inside that cement block, so it's not just like exposed pins for all the chips on the other side. For another... how often do chips die? If you can afford a machine with one of those chips, you can probably afford to replace a whole brick.

    --
    autopr0n is like, down and stuff.
    1. Re:Well by AKAImBatman · · Score: 1

      For another... how often do chips die? If you can afford a machine with one of those chips, you can probably afford to replace a whole brick.

      Actually, that's what a support contract is for. The bigger problem is availability. Each brick requires four processors, plus the various work to mold all the interconnects into place. The yield on a process like that can't be very high. Not to mention all the custom parts that would be needed to fit a chip like this.
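The yield concern can be sketched with the simplest possible model: if chips are mounted on the module untested and fail independently with per-chip yield y, a module carrying n chips is fully good with probability yⁿ. The 90% per-chip figure below is made up, not IBM data. Testing chips before assembly (known-good-die screening) is the usual way to avoid paying this multiplier, so whether that happens is exactly the assumption that matters.

```python
def module_yield(per_chip_yield, chips_per_module):
    """Probability that every chip on a module is good, assuming independent
    defects and no screening of chips before they are mounted."""
    return per_chip_yield ** chips_per_module


for n in (1, 2, 4, 8):
    print(f"n={n}: {module_yield(0.9, n):.3f}")
# n=1: 0.900, n=2: 0.810, n=4: 0.656, n=8: 0.430
```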

      In other words, if my processor fails, there's a good chance I'll be waiting on a new one for anywhere from days to months. Not a good thing for a system that's making you millions for every cycle.

    2. Re:Well by redgren · · Score: 1

      Chips are tested before they are put into MCMs. So, not really a problem..As far as supply, your downtime would measure in the hours, if you had the cash to buy a big system like these.

    3. Re:Well by chez69 · · Score: 1

      That's one of the dumbest things I've heard here (and that's saying a lot)

      IBM won't be making these things to order; the minute your RS/6000 (p-series) loses a processor in the brick, a CE will be out that day to fix it.

      --
      PHP is the solution of choice for relaying mysql errors to web users.
    4. Re:Well by chez69 · · Score: 1

      the above post is supposed to respond to this post

      --
      PHP is the solution of choice for relaying mysql errors to web users.
    5. Re:Well by AKAImBatman · · Score: 1

      Dude, chill out. It's called "playing devil's advocate". And my point is that a block like this is going to cause yield problems that could impact IBM's ability to supply them. Yes, even IBM can run into supply problems.

    6. Re:Well by Anonymous Coward · · Score: 0

      I don't think you get it -- you might buy a 4-CPU server, but it ships with an 8-CPU MCM. The 4 extra CPUs are used to check the calculations of the primary. When one of your CPUs dies, it automatically fails over to an idle CPU. You don't hear about it until you check your email the next day. You can also buy an upgrade by having IBM flip a switch -- it's cheaper for them to give you 4 free CPUs than to send a CE out with an upgrade kit.

  18. I need some explanations by Anonymous Coward · · Score: 1, Interesting

    Interesting article indeed, yet there is one thing I don't quite understand about ILP (instruction-level parallelism):
    If the number of decoded instructions is higher, then - the CPU being superscalar - the probability of having all pipelines working grows, which means that ILP's also going up.
    Of course the ILP depends on the compiler quality and the program code itself, but having a good parallelism capacity in the CPU is also a key factor.
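One way to see why decode width alone doesn't raise ILP: the dependences in the code cap it. As a toy calculation, for a straight-line block with a known dependence graph, the best possible ILP (assuming unit latency and unlimited issue width) is the instruction count divided by the length of the longest dependence chain. The instruction block below is made up for illustration.

```python
def ideal_ilp(deps):
    """deps maps each instruction, in program order, to the instructions it
    depends on. Assumes unit latency and unlimited issue width."""
    depth = {}
    for instr, sources in deps.items():  # dicts preserve insertion order
        # An instruction can run one cycle after its latest-finishing source.
        depth[instr] = 1 + max((depth[s] for s in sources), default=0)
    return len(deps) / max(depth.values())


# Hypothetical block: a serial chain a->b->c->d plus an independent pair e->f.
block = {
    "a": [], "b": ["a"], "c": ["b"], "d": ["c"],
    "e": [], "f": ["e"],
}
print(ideal_ilp(block))  # 6 instructions / chain length 4 = 1.5
```

Here a 2-wide machine already reaches the limit of 1.5; decoding 4 or 8 instructions per cycle cannot help unless the compiler (or hardware reordering across blocks) shortens the chain.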

  19. A very Good point by Bill,+Shooter+of+Bul · · Score: 1

    It read much like a financial review of a company. Take the buzzwords, guess wildly, base predictions on your guesses. Granted, the author was intelligent and understood the basics, but without a deeper understanding of the specifics he cannot really give reasons for performance or lack thereof.

    --
    Well.. maybe. Or Maybe not. But Definitely not sort of.
  20. I *know* "Centrino is not a chip" by csoto · · Score: 1

    But, it's a chip PLATFORM that depends on certain Pentium-M chips. Naturally, systems built around Transmeta "chips" will also require Transmeta-compatible support devices (e.g. the "Transmeta PLATFORM").

    My point is, this low-voltage thing was a non-issue before Transmeta came along. Intel just told everyone to "put bigger fans" in their laptops and shut up. I've got this Dell with seriously huge fans, and it gets HOT (but it's pretty durn fast, has a big screen and built-in DVD/CD-RW). I don't need low voltage because it mostly sits on my desk.

    --
    There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
    1. Re:I *know* "Centrino is not a chip" by Anonymous Coward · · Score: 0

      Your Dell likely uses a Desktop chip on the inside -- Intel can invent all the cool mobile tech in the world, but it doesn't help if the OEMs want to go cheap and use regular P4s.

  21. Who is this Arse ... by Qbertino · · Score: 1

    ..and why is he taking UltraSparcs apart anyway?

    --
    We suffer more in our imagination than in reality. - Seneca
  22. Efficeon has integrated northbridge by chipace · · Score: 2, Redundant

    One detail that they didn't mention was the integrated AGP and DDR memory controller on Efficeon. Blades don't use graphics, so I'm thinking that Efficeon was designed primarily for Japanese laptops.

    Efficeon allows for a low chip count design. That could mean a smaller and more reliable laptop design.

  23. Guess what??? by crgrace · · Score: 5, Funny

    I actually read the article!!!!!

    All my questions were answered so I have nothing to say.

  24. Re:One Power 5... (Just a matter of scale...) by Anonymous Coward · · Score: 3, Insightful

    Ok, so you are worried that your parts are no longer accessable.

    One of the first computers I built had individual TTL parts (74xx type things) to make the CPU. If I fried one of those, I would just replace that single part and be going again. No need to replace the whole CPU.

    I, for one, would never go back to that. Not just the size but the performance and the cost.

    It used to be that I would buy 4K-bit RAM chips. Buy 8 of those to make an 8x4K RAM array (4K bytes), then add a simple address decoder, put it on an S100 bus, and you have more RAM for your system. Now I buy 512Meg DIMM modules where you can't (and don't want to) replace the individual chips. (OK, you could if you have the fancy tools and could get the chips in question, but the cost just doesn't make that worthwhile.)

    Systems are getting faster because of higher integration levels. Taking the off-chip caches (as in systems built with 386 and 486 CPUs) and putting them onto the CPU (first in the P-Pro as multi-chip modules and later onto the CPU die itself) has significantly improved performance. Yes, it has removed the ability to replace the cache separately from the CPU, but who really wants or needs to do that? And at what cost? (Go back to 100MHz cache memory interfaces? I like the 3GHz clock in my on-chip cache, thank you very much.)

    The same is true of multi-CPU systems. As you increase the performance, communication performance becomes a major bottleneck. First IBM put two CPU cores on one chip. Then Intel did a thing called Hyper-Threading (after they said that dual cores have no value :-). The next step is to somehow connect multiple (4/8/16/whatever) CPUs along with large (multi-meg) caches using these specialized interconnect technologies.

    Imagine the performance gain of having 1GHz+ clocked cache of, say, 256Meg connected to 8 really fast CPU cores. It would be as much of a step forward as going from my TTL based 8-bit CPU to the 6502 single-chip CPU.

    I know I would not want to go back... So let's investigate how to move forward.

  25. Why is Transmeta still in the picture? by Anonymous Coward · · Score: 1, Insightful

    I don't understand why Transmeta still comes up in conversation. Besides the fact that they hired Linus, what exactly have they done to merit this inclusion alongside IBM, Sun, and Intel? There are plenty of other CPU manufacturers that sell x86 clones now... I think Cyrix was bought by some Taiwanese fab plant company, weren't they?

    Until Transmeta becomes a real contender, let's just keep out of the Linux biases and concentrate on the real contenders.

    My prediction is that if they don't produce a real hit soon, they will be out of business in 2 years.

  26. Bullshit by Anonymous Coward · · Score: 1, Informative

    MS has shipped preemptive multitasking and multithreading for a long time. You are confusing that with multiprocessing (which is different).

    Win95/98/ME are not multiprocessor but are preemptive multitasking and multithreading. They can certainly do "more than one thing at a time". Unlike Apple who first shipped this capability only recently, MS first shipped this in Windows 386 back in the late 80's.

    1. Re:Bullshit by all+your+mwbassguy+a · · Score: 1

      Macs have been able to (cooperatively) multitask since System 6, and the OS has been multithreading-aware since System 7.5.
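A toy illustration of the distinction being argued: under cooperative multitasking, each task runs until it voluntarily yields, so one task that never yields stalls everything, whereas preemptive multitasking interrupts tasks on a timer. Python generators stand in for tasks here; this is a conceptual sketch, not how Mac OS or Windows actually scheduled work.

```python
from collections import deque


def cooperative_run(tasks, max_steps=20):
    """Round-robin over generator 'tasks'; each runs until it yields.

    A task whose body loops without ever yielding would never hand control
    back, which is the core weakness of cooperative multitasking. A
    preemptive scheduler would interrupt it on a timer instead.
    """
    queue = deque(tasks)
    trace = []
    steps = 0
    while queue and steps < max_steps:
        task = queue.popleft()
        try:
            trace.append(next(task))  # run until the task voluntarily yields
            queue.append(task)        # then put it at the back of the line
        except StopIteration:
            pass                      # task finished; drop it
        steps += 1
    return trace


def worker(name, n):
    for i in range(n):
        yield f"{name}{i}"  # the cooperative "I'm done for now" point


print(cooperative_run([worker("A", 2), worker("B", 3)]))
# ['A0', 'B0', 'A1', 'B1', 'B2']
```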

  27. no real adoption of Transmeta processors by Anonymous Coward · · Score: 0

    There isn't any significant adoption of Transmeta processors because they suck. You find them mainly in the smallest Japanese machines, and even there the Pentium-M is pushing them out.

    Claiming MIPS/watt supremacy for Transmeta is questionable as well.

  28. Re:Great Innovation by hellraizr · · Score: 1

    we run a large FoxPro cluster

    Is that not the saddest form of life you've ever heard of?

  29. Re:as does using the word "hater" by Hannibal_Ars · · Score: 1, Funny

    How do you know I'm not black?

    --
    Senior CPU Editor | Ars Technica | http://arstechnica.com/
  30. Code has to be loaded anyway by Effugas · · Score: 2, Insightful

    Since the author of this article is lurking here, I thought I'd ask:

    You make a rather big deal about Transmeta needing to run all x86 code through a "code morpher" (dynamic recompiler, actually), and come up with a decently large set of conclusions based on it.

    What's the big deal? No processor executes raw x86 anymore. Everything is translated into an internal microcode that bears little resemblance to the original asm. Of course, normal chips have hardware-accelerated microcode translators, whereas Transmeta must recode in software -- but Transmeta's entire architecture was designed from day one to do that, and conceivably they have more context available for recoding by involving main memory in the process.

    And why do you ignore the fact that main memory is needed either way? Yes, transistors are necessary to store the translated program. They're also necessary to store the original one -- the Mozilla client I'm presently tapping away inside sure as hell doesn't fit in L1 on my C3! Outside of a small static penalty on load, and a smaller dynamic penalty from ongoing profiling, you can't blame performance on the fact that software needs to be in RAM. Software always needs to be in RAM.

    Don't get me wrong -- Transmeta's a performance dog, and everyone's known that since day one. But I think it's reasonable to say the cause is mostly one of attention -- every man-hour they threw into allowing the system to emulate x86 took away from adding pipelines, increasing clock rates, tweaking caches, etc. In other words, yes, it's a feat that they got the code to work, but you don't need to blame the feat for the quality of the work -- they simply did a lot of work nobody else had to waste time on, and fell behind because of it.

    Much easier explanation. Might even be true.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com

    1. Re:Code has to be loaded anyway by Anonymous Coward · · Score: 0

      The facts are:
      - loading and fetching from main memory costs a lot;
      - the Transmeta CPU's code morpher lives in main memory, thus consuming transistors in main memory;
      - in other x86 CPUs, the equivalent of the code morpher is hardwired inside the CPU (it's the instruction decoding stage), thus consuming transistors inside the CPU.

      So in effect both methods use up transistors, yet the TM one also uses up memory bandwidth and CPU computing resources to achieve the same goal as a real x86.
      All of which is to say that emulation is slower than the real thing.

    2. Re:Code has to be loaded anyway by raxx7 · · Score: 1

      What modern x86 CPUs do is quite different from what Transmeta tries to do. Otherwise Transmeta wouldn't have much to sell either.

      The decoding that modern x86 CPUs do is part of any CPU's execution process: fetch from RAM, decode, execute. Because x86 ops are too complex to design an OoO (out-of-order) engine around, the CPUs break them into smaller pieces. But so does the PowerPC 970. And even with the P4's trace cache, the transformation is linear: one x86 op always yields the same N internal ops.
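
      To make the "linear" point concrete, here is a toy sketch, assuming a made-up instruction set and made-up micro-op names (it does not describe any real x86 decoder, and it ignores the trace cache): a hardware-style decoder behaves like a fixed lookup table, so the same x86 op always expands into the same micro-ops, regardless of what comes before or after it.

```python
# Toy model of a hardware x86 decoder. The instruction and micro-op names
# are hypothetical; the point is only that decoding is a fixed mapping.
DECODE_TABLE = {
    "add reg, reg": ["alu_add"],
    "add reg, [mem]": ["load", "alu_add"],
    "add [mem], reg": ["load", "alu_add", "store"],
    "push reg": ["sub_sp", "store"],
}


def decode(x86_op: str) -> list[str]:
    """Expand one x86 op into micro-ops; no context from neighbouring ops is used."""
    return list(DECODE_TABLE[x86_op])


def decode_stream(ops: list[str]) -> list[str]:
    """Decode a stream op by op; the result is just the concatenation."""
    micro_ops: list[str] = []
    for op in ops:
        micro_ops.extend(decode(op))
    return micro_ops


if __name__ == "__main__":
    stream = ["push reg", "add [mem], reg", "add [mem], reg"]
    print(decode_stream(stream))
    # ['sub_sp', 'store', 'load', 'alu_add', 'store', 'load', 'alu_add', 'store']
```

      Because each op is expanded in isolation, the two identical "add [mem], reg" ops produce identical micro-op sequences; there is no opportunity to optimize across them.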

      Dynamic code translation (like code morphing) is more complicated. First the CPU executes the translation software, which reads data from memory (the x86 ops to be translated) and then writes the translated code back to RAM. Only then does the CPU execute the translated code.

      Transmeta tries to get its edge from optimizing and caching the translated code. But optimization is itself a CPU-intensive process, and there isn't that much information in the x86 op stream to optimize on.
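
      A toy sketch of the translate-once, reuse-many-times idea, assuming a made-up ISA and a made-up optimization (nothing here is Transmeta's actual Code Morphing Software): the first execution of a block pays for running the translator, later executions just fetch the cached output, and the translator can optimize across instruction boundaries in a way a fixed per-instruction decoder cannot.

```python
from typing import Callable


def translate_block(guest_block: tuple[str, ...]) -> list[str]:
    """Hypothetical translator for a toy ISA whose ops look like "store x"."""
    native: list[str] = []
    for op in guest_block:
        kind, _, operand = op.partition(" ")
        # Optimization across instruction boundaries: a load right after a
        # store to the same location can reuse the stored value, so drop it.
        # A per-instruction hardware decoder never sees this pair as a unit.
        if kind == "load" and native and native[-1] == f"store {operand}":
            continue
        native.append(op)
    return native


class TranslationCache:
    """Run the (slow) translator once per block address, then reuse the result."""

    def __init__(self, translate: Callable[[tuple[str, ...]], list[str]]):
        self.translate = translate
        self.cache: dict[int, list[str]] = {}  # address -> translated code
        self.translations = 0                  # times the translator actually ran

    def execute(self, address: int, guest_block: tuple[str, ...]) -> list[str]:
        if address not in self.cache:
            # Slow path: the CPU runs translation software that reads the
            # guest ops and writes translated code back to memory.
            self.cache[address] = self.translate(guest_block)
            self.translations += 1
        # Fast path: fetch the already-translated code.
        return self.cache[address]


if __name__ == "__main__":
    tc = TranslationCache(translate_block)
    for _ in range(1000):  # a hot loop hitting the same block
        code = tc.execute(0x100, ("store x", "load x", "add x"))
    print(code, tc.translations)  # ['store x', 'add x'] 1
```

      The trade-off raised above shows up directly: the optimization only pays off if the translated block is reused enough times to amortize the translator's own run time.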

    3. Re:Code has to be loaded anyway by Effugas · · Score: 1

      So you recompile once into the new instruction set, and now you're retrieving transmeta ops from main memory instead of x86 ops.

      Except for really tight inner loops, you're always flying off to system RAM for one thing or another. While there's a static penalty because of code morphing, I'd wager it's a "lost in the backwash" effect -- oh, so a given stream of ops took a few extra million cycles to start cranking. BFD; we've got half a billion of 'em per second. The real question is why we don't have a full billion -- or more.
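
      Putting rough numbers on that "lost in the backwash" claim: assuming (purely for illustration, not as measurements) a 3-million-cycle startup penalty on a 500 MHz chip and a 10-second run, the one-off cost works out to a few milliseconds, well under a tenth of a percent of the run.

```python
def penalty_seconds(extra_cycles: float, clock_hz: float) -> float:
    """Wall-clock time consumed by a one-off penalty of `extra_cycles`."""
    return extra_cycles / clock_hz


def overhead_fraction(extra_cycles: float, clock_hz: float, run_seconds: float) -> float:
    """Share of a run lasting `run_seconds` spent paying that penalty."""
    return penalty_seconds(extra_cycles, clock_hz) / run_seconds


# Illustrative assumptions only: 3 million extra cycles,
# a 500 MHz clock ("half a billion per second"), a 10-second run.
penalty = penalty_seconds(3e6, 500e6)
share = overhead_fraction(3e6, 500e6, 10.0)
print(f"startup penalty: {penalty * 1e3:.1f} ms")  # startup penalty: 6.0 ms
print(f"share of run:    {share:.2%}")              # share of run:    0.06%
```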

      Code morphing might make Transmeta a little slower, but fundamentally the chip doesn't spend half its cycles translating code -- which, of course, is what an interpretative emulator would do. I think it's Amdahl's law that limits how much a given subprocess can be accelerated to the benefit of the master process -- code morphing could take zero time, and the chip still wouldn't be fast, because the damn chip can't be using the morpher all that much.
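
      Amdahl's law does say exactly this: if a fraction p of the run time is spent in some subprocess and that subprocess is made s times faster, the overall speedup is 1 / ((1 - p) + p / s), which can never exceed 1 / (1 - p) even as s goes to infinity. The 5% morpher share below is a hypothetical figure chosen for illustration, not a measured Transmeta number.

```python
import math


def amdahl_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the run time
    is made `factor` times faster (factor=math.inf means "takes zero time")."""
    # fraction / math.inf evaluates to 0.0, so the limit case needs no special handling.
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical share of run time spent in the code morpher; not a measured figure.
morpher_share = 0.05
print(f"{amdahl_speedup(morpher_share, 2):.3f}")         # 1.026
print(f"{amdahl_speedup(morpher_share, math.inf):.3f}")  # 1.053
```

      Even a morpher that took zero time would buy only about a 5% speedup under that assumption, which is the commenter's point: the rest of the chip sets the ceiling.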

      Intel and AMD are fast because they threw huge piles of cash at improving the silicon and making it fundamentally smarter. Transmeta's got some neat tricks going on -- but whatever Code Morphing translates to, I see no reason to believe it's fundamentally as powerful as what Intel and AMD can bring to the table.

      That's what annoyed me about this article -- they go for the complex answer (it's slow because it morphs code) instead of the simple one (it's slow because it's a less advanced processor design with fewer computational resources, a shortfall covered for by saying it draws less power). Given that this was an article about advanced processor designs, you'd think this might come up.

      --Dan

    4. Re:Code has to be loaded anyway by Anonymous Coward · · Score: 0

      I'm not familiar enough with their technology to argue on the efficiency of their code translation.
      I suppose they manage to store the translated code on disk; otherwise they _might_ use up half the CPU power on translation in the extreme cases (that is, when the translated-code cache is exhausted).
      Back to your point: I actually hadn't understood it well, and now I mostly agree with you.
      I guess their only motivation is technical analysis, and they stick to their guns (you can't blame them for that, can you? :).
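
      Whether Transmeta really keeps translations on disk isn't established here; the sketch below only models the in-memory case the comment worries about. Assuming a hypothetical LRU cache that holds a fixed number of translated blocks, a loop whose working set fits gets translated once, while a working set just one block too large forces a retranslation on every single access.

```python
from collections import OrderedDict


class BoundedTranslationCache:
    """Hypothetical LRU cache holding at most `capacity` translated blocks."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()  # address -> translated code, oldest first
        self.translations = 0       # misses, i.e. times we paid to translate

    def lookup(self, address: int) -> str:
        if address in self.cache:
            self.cache.move_to_end(address)  # mark as most recently used
            return self.cache[address]
        self.translations += 1               # miss: translate (again)
        code = f"translated block {address:#x}"
        self.cache[address] = code
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used block
        return code


def translations_for_loop(capacity: int, working_set: int, iterations: int) -> int:
    """Run a loop touching `working_set` distinct blocks and count translations."""
    tc = BoundedTranslationCache(capacity)
    for _ in range(iterations):
        for address in range(working_set):
            tc.lookup(address)
    return tc.translations


if __name__ == "__main__":
    print(translations_for_loop(capacity=8, working_set=8, iterations=10))  # 8
    print(translations_for_loop(capacity=8, working_set=9, iterations=10))  # 90
```

      Cyclic access is the worst case for LRU: with nine blocks and room for eight, each block is evicted just before it is needed again.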

    5. Re:Code has to be loaded anyway by kma · · Score: 3, Informative

      Ehh. In my opinion, people overestimate how big a deal x86 architecture complexity is, in part because it flatters their preconception that Intel is evil. ("If only dastardly Intel hadn't been holding the world back with this demon architecture from hell, think how fast CPUs could be now!") While working at VMware, I've gotten to know the x86 architecture on a first name basis. He now lets me call him "Archie."

      While Archie is undoubtedly an ugly, drunk screw-up, he's really a droplet in the ocean of effort that goes into a competitive CPU implementation. Yeah, we've got lots of code to deal with him, and he's an ongoing source of work, but not all that much code, nor that much work. If Archie were really such a terrible guy, it wouldn't be possible for Intel and AMD to be eating so many RISC vendors' lunches.

      Mike Johnson, the lead x86 designer at AMD, probably put it most succinctly when he said, "The x86 isn't all that complex -- it just doesn't make a lot of sense." It's peculiar all right, but not so peculiar that it can explain Transmeta's failure to be performance competitive. From speaking with Transmetans, I get the strong impression that they got bogged down because making a high performance dynamic translation system is ridiculously hard, rather than, say, because they just couldn't get the growdown segment descriptors right.

  31. Double Bullshit by ShaggyZet · · Score: 1

    Windows 3.x was not preemptive; it was cooperative. If a process didn't want to give up the CPU, it didn't have to, even if the OS requested that it do so.

    My understanding of 95/98/ME is that it wasn't truly preemptive either, at least not at all levels. Perhaps this is why that branch got progressively more stable? More code was either made to cooperate or shifted to runtime environments where it truly was preempted?

    OS/2, NT, 2000 and XP are truly preemptive, as are all versions of Unix that I've ever heard of.

    You're right about Apple, though; it took them way too long to get to this.
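
    To illustrate the cooperative model described above, here is a minimal sketch using Python generators as stand-ins for tasks (an analogy only, not how Windows 3.x was implemented): the scheduler can switch tasks only when a task yields, so a task that holds on to the CPU starves everyone else until it chooses to let go. A preemptive scheduler would instead need a timer interrupt the task cannot ignore, which this sketch does not model.

```python
from collections import deque


def polite(name, steps, log):
    """A task that hands the CPU back after every step."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # voluntary yield: the only point where a switch can happen


def greedy(name, steps, log):
    """A task that does all of its work before yielding even once."""
    for i in range(steps):
        log.append(f"{name}{i}")
    yield


def run_cooperative(tasks):
    """Round-robin scheduler that can only switch tasks when one yields."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)      # runs until the task itself decides to yield
        except StopIteration:
            continue        # task finished; drop it
        ready.append(task)


if __name__ == "__main__":
    log = []
    run_cooperative([greedy("g", 3, log), polite("a", 2, log)])
    print(log)  # ['g0', 'g1', 'g2', 'a0', 'a1']
```

    Two polite tasks interleave step by step, but the greedy one runs all of its steps before the polite task gets its first turn.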

    1. Re:Double Bullshit by Anonymous Coward · · Score: 0

      Windows 3 was not preemptive, but Windows 386 was. Windows 386 was a version of Windows 3 that used virtual x86 mode and was fully preemptive.

      95/98/ME is also preemptive. Don't know what "all levels" means. No OS is preemptive at "all levels". Critical sections, by definition, cannot be preempted.

  32. Perhaps because by TCaM · · Score: 2, Insightful

    unlike the other x86 knockoff manufacturers, they have actually attempted something somewhat new and different in their designs. They may not have met with roaring success marketwise, but they certainly did try to attack things from a different angle. The point of the article seems to be comparing the somewhat different approaches the various CPU makers took in their designs, not how many millions of chips they have sold or how many billions of dollars they have in the bank.

  33. Sweeping generalizations by PetWolverine · · Score: 2, Interesting

    In fact, you could tell the story of the past 15 years of computer evolution -- from the rise of the PC to the rise of the Internet -- in terms of the effects of the amount of time it takes various components -- from a processor all the way out to a networked computer -- to load data.

    I like this assessment. Forget about Moore's Law as a measure of our progress; latency and throughput are far more important than processing power.

    Computers used to be for processing information; these days, most people use them more for accessing and delivering information. Every new computer I've gotten before my current one has only satisfied me by being faster than the ones that went before, not by actually being fast enough. However, my current machine (dual-1.25GHz Power Mac G4) leaves me with no complaints about speed--while I certainly wouldn't complain if it were a little faster, I never feel like I'm waiting for the computer for an unreasonable amount of time; most of the time, it's waiting for me.

    However, when it's not waiting for me, it's waiting for one of its hard drives to spin up and feed it with data, or for some slow server to send it something. I would trade one of my processors for a 2x improvement in either disk or network latency. While these aren't the types of latency directly addressed in the article, I would wager that on the rare occasions when I actually have to wait for some processing to take place, most of that time is spent loading data from memory, not actually processing it.

    It's not that processors are fast enough for everybody and we should forget about making them any faster; I'm sure graphics and video professionals, among others, will always have a need for more raw speed. But for most computer users, the continued emphasis on speed is misplaced. If computer manufacturers could transfer just a little bit of their R&D spending from increasing speed to decreasing latency, we'd all be better off.

    --
    I found the meaning of life the other day, but I had write-only access.

  34. no biggy by Anonymous Coward · · Score: 0

    Cooperative multitasking means nothing. PCs running DOS had that for disk I/O. All versions of Windows prior to 95 had it, as did alternative multitasking systems for PCs in the mid-'80s.

    PCs gained multithreading with OS/2 1.0. Windows first got it with NT 3.1, and later with Win95.

  35. it doesn't matter by Anonymous Coward · · Score: 1, Insightful

    Using street slang makes you sound juvenile. Being black doesn't mean being uneducated, but black people who say "hater" sound just as stupid as anyone else who does.

    1. Re:it doesn't matter by Anonymous Coward · · Score: 0

      Being a player hater is a noble thing anyway. Who thinks that glorification of what pimps REALLY do to women is cool? Assholes, that's who.

  36. We need an Ars Technica logo here. by webslacker · · Score: 1

    Seriously. Every other day there's an Ars Technica this, an Ars Technica that. Let's make an icon for it.

  37. Windows/386 by ShaggyZet · · Score: 1

    I guess I'll have to take your word for it. What little Googling can be done on Windows/386 makes it seem like a little-known bridge between Windows 2.x and Windows 3.x, neither of which was preemptive. So I find it a little hard to believe that Windows/386 was indeed preemptive and that Microsoft removed that before Windows 3.0 was released.

    As far as Windows 9x being "less preemptive" than NT, I was referring to the clearer distinction between user and kernel in NT. Obviously the kernel can't be preempted at certain times, but with 9x, my understanding was that user programs sometimes ran pretty much like parts of the kernel, a holdover from the DOS legacy, where there was no distinction between user and kernel.

    I also don't know if I'd agree that a critical section means that a process can't be preempted. The definition I'd use has to do with guarding a resource. In fact, a recipe for a deadlock is when one process enters a critical section for a resource and is preempted before it releases it, probably as it tries to get yet another resource. When another process tries to get the two resources in the opposite order, you have a deadlock. Maybe you mean something different by critical section.
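
    The lock-ordering deadlock described above can be reproduced with a small sketch, using Python threads in place of processes and `threading.Lock` objects in place of the guarded resources. Barriers force the bad interleaving, and the timeout on the second `acquire` is there only so the demo reports the deadlock instead of hanging forever. The usual cure, shown second, is to impose one global acquisition order (here, arbitrarily, by `id()`).

```python
import threading


def opposite_order_deadlocks(timeout: float = 0.2) -> bool:
    """Two threads take locks A and B in opposite orders; report whether both got stuck."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    holding_first = threading.Barrier(2)  # both threads hold their first lock
    tried_second = threading.Barrier(2)   # both have tried for their second lock
    results = []

    def worker(first, second):
        with first:
            holding_first.wait()
            # Without the timeout, this acquire would block forever.
            got_second = second.acquire(timeout=timeout)
            if got_second:
                second.release()
            results.append(got_second)
            tried_second.wait()  # keep holding `first` until the other thread has tried too

    threads = [
        threading.Thread(target=worker, args=(lock_a, lock_b)),
        threading.Thread(target=worker, args=(lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results == [False, False]


def same_order_completes() -> bool:
    """Same two threads, but every thread acquires the locks in one global order."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    done = []

    def worker(first, second):
        # Ignore the caller's order; sort by id() so every thread agrees on it.
        lo, hi = sorted((first, second), key=id)
        with lo, hi:
            done.append(True)

    threads = [
        threading.Thread(target=worker, args=(lock_a, lock_b)),
        threading.Thread(target=worker, args=(lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return len(done) == 2


if __name__ == "__main__":
    print(opposite_order_deadlocks())  # True
    print(same_order_completes())      # True
```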

  38. I wonder what would be better... by master_p · · Score: 1

    Wouldn't it be better if we had computers with lots of tiny CPU cores instead of big mammoths like the POWER5 or the UltraSparc V? For example, an array of 256 32-bit CPUs would make life simpler and more efficient at both the hardware and the software level, wouldn't it?

    By the way, I would like to have a computer that has only SRAM and a bandwidth of 100 GB/s... Is that possible with current technology?

  39. Re:as does using the word "hater" by meadowsp · · Score: 1

    You jive turkey, you got to sass it.