Slashdot Mirror


The Quest for More Processing Power

Hack Jandy writes "AnandTech has a very thorough, but not overly technical, article detailing CPU scaling over the last decade or so. The author goes into specific details on how CPUs have overcome limitations of die size, instruction size and power to design the next generation of chips. Part I, published today, talks specifically about the limitations of multiple cores and multiple threads on processors."

104 comments

  1. There can only be one by 2.7182 · · Score: 4, Funny

    the quantum computer!! Until then we'll have to suck it up with these Si things.

    1. Re:There can only be one by Nintenfreak · · Score: 0

      And I suppose after that you'll want isolinear chips which store information in three dimensions.

    2. Re:There can only be one by 2.7182 · · Score: 3, Interesting

      Actually, quantum mechanics is already modelled with an infinite dimensional Hilbert space, which is why quantum computing is so fast.

    3. Re:There can only be one by zootm · · Score: 1

      Now if only we could find something useful for it to do! (other than cryptography)

    4. Re:There can only be one by 2.7182 · · Score: 4, Interesting

      You mean like something useful? How about modelling weather or geophysical phenomena or solving Maxwell's equations? There are a zillion things like that which could be amazingly better if we could speed them up. People forget too easily about scientific computing!

    5. Re:There can only be one by zootm · · Score: 2

      Haha, good point. There was some allusion to them being a replacement for current computer paradigms there at the moment, which is (to my understanding of it) simply not true, at least with what we know of them at present.

    6. Re:There can only be one by Hal_Porter · · Score: 1

      Cracking cryptography would be enough. Imagine the money you could make if you could divert 1% of global financial transactions to the chip company.

      Actually you wouldn't need to develop it - you could charge the banks/pension funds money (e.g. one beeelion dollars) to not develop it. Oh, I mean, you'd give institutional shareholders votes on the technical steering committee. Got to be careful of those pesky anti-blackmail laws.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    7. Re:There can only be one by zootm · · Score: 0

      Oh, I'm not arguing that Quantum computers aren't useful, they're just not useful for most traditional roles.

      At least, that's what I meant to say!

    8. Re:There can only be one by Hal_Porter · · Score: 1
      Seriously, Quantum Computers have enormous potential. E.g. some people have argued that they might solve protein folding.

      Quickly stated the problem is this. DNA contains a code which determines the sequence of amino acids in a protein. If you stretched out a protein, it would be a linear chain of amino acids - and this is the form the cell assembles it in. As the chain comes out of the assembly machine, it folds into a shape, and the shape determines what it does. We can find the sequence of genes in the DNA that codes for a protein, so we can find the sequence of amino acids, but simulating the folding is really hard.

      http://www.ece.lsu.edu/kak/agents.pdf

      It has been estimated that a fast computer applying plausible rules for protein folding would need 10**127 years to find the final folded form for even a very short sequence of just 100 amino acids. Such a mathematical formulation of the protein-folding problem shows that it is NP-complete[6]. Yet Nature solves this problem in a few seconds. Since quantum computing can be exponentially faster than conventional computing, it could very well be the explanation for Nature's speed. The anomalous efficiency of other biological optimization processes may provide indirect evidence of underlying quantum processing if no classical explanation is forthcoming.


      So nature exploits this stuff to solve problems many, many orders of magnitude faster than current computers. I'm sure if we understood it we'd get similar benefits.

      Also, I hope that some things like consciousness which are inexplicable now, will be less inexplicable once we do. This is the best case scenario, admittedly. Mind you, given how optimal evolved organisms seem to be, it's hard to believe that they don't exploit quantum computing when they need to process information, unless there is some deep reason why they can't. Evolution certainly seems to exploit all the things we know about in its 'designs'.

      Even solving protein folding would be pretty cool. Imagine drug companies being able to sketch their desired protein shape, and have a machine that can work back to DNA. Even evolution will never be able to do that.
      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    9. Re:There can only be one by zootm · · Score: 1

      Yes, my computer's been working on the protein folding thing for quite some time.

      But do quantum computers run Linux? *rimshot*

  2. We don’t need more “power” by Pan+T.+Hose · · Score: 3, Interesting

    What we need is a better architecture which would allow for a better implementation of algorithms. Will we ever have an MMIX-like processor with 256 general-purpose 64-bit registers that each can hold either fixed-point or floating-point numbers? That is what I am waiting for, not more "power," whatever that means.

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
    1. Re:We don’t need more “power” by LiquidCoooled · · Score: 4, Informative

      Didn't the PowerPC have something approaching this?
      I remember the old Motorola 68000 range having 16 32-bit regs for general coding, and one of the prime benefits of the PPC was the vastly greater register capacity.

      I stopped coding assembler when I moved to x86 - what a horrible kludge of a stack-biased platform it is.

      --
      liqbase :: faster than paper
    2. Re:We don’t need more “power” by dnoyeb · · Score: 2, Interesting

      True. These dual core CPUs are an indication that they are having difficulty increasing their CPU throughput.

      As with dual CPU motherboards, you go to dual when you can't get anything else out of the single...

      10GHz CPU, lol. Why not release one that requires a 100GHz clock? If it's only processing every 30th cycle, what's the big deal? Oversimplification I know, but that is the essence of Intel's laughable strategy. Consumer ignorance vs. product innovation. We'll take the ignorance. How long can it last with AMD spankin' them year after year, technologically.

    3. Re:We don’t need more “power” by Leroy_Brown242 · · Score: 4, Funny

      Smart power, not more power? How unamerican!

      TERRORIST!

    4. Re:We don’t need more “power” by MerlinTheWizard · · Score: 2, Interesting

      You make a good point here. To add to what you said, I think we don't really need more "raw power" (at least, not for general use), but we need more "intelligent" use of the available power. We are a few who think the future is some kind of "soft core" where the available cells could perform different functions over time. Kind of like a super-scalar, on-the-fly reprogrammable FPGA. Think of how much of a "classic" processor is just a huge waste of resources, most of the time. We need to improve on that.

    5. Re:We don’t need more “power” by Anonymous Coward · · Score: 0

      One of my classes in college used the 68000 (well, I think it was actually the 68030 or something, but the stuff we did worked the same on both). If I recall correctly, it was 8 address registers and 8 data registers. Still worlds better than the x86 stuff though.

  3. More power will lead to more bloat.. by klang · · Score: 4, Insightful

    That's what's been happening the last 10-15 years. Where are the indications that "time to market" and "sloppy programming" will suddenly vanish?

    1. Re:More power will lead to more bloat.. by Rinikusu · · Score: 4, Interesting

      Because, overwhelmingly, no one really cares but a handful of people. The days of hand-tweaked, ASM optimized code are pretty much over for consumer code. Yes, there will always be a market, but it is ever diminishing even as the overall market expands. To use analogies, look at furniture. Go to just about any furniture "gallery" positioned for the great American unwashed and you'll find several hundred, almost identical mass-produced fat-ass-cliners, some with machine stitched leather, some with vinyl, some with cloth, etc. Dressers and other cabinetry are stapled, nailed, screwed and glued with machine precision accuracy. The demand for hand-built, crafted furniture has dropped tremendously (and the prices for these craft pieces seem to have gone up.. ). Yes, a "hand-tweaker" coder will probably find work with a small shop somewhere, or create their own consultancy for constituents who demand that kind of programming, and chances are that coder will make quite a bit more than the average, churn and burn programmer (people like me), but for the overwhelming majority, it's overkill.

      (Here's a simple cost analysis: We can pay this guy $100k/year to do hand-optimized tweaks on this code that then becomes a liability for future maintenance if that coder dies, quits, or whatever. Or, we could add another stick of $100 RAM, and buy a new processor next year for a fraction of his cost and get a similar performance bump... The math doesn't add up...)

      --
      If you were me, you'd be good lookin'. - six string samurai
    2. Re:More power will lead to more bloat.. by Kjella · · Score: 2, Interesting

      From what I've gathered OS X has been cleaning up and improving speed on their code. A few select open source products have also reached a "stable" feature set and are working on smoothing things out. On the whole though, not all of it has gone to bloat. Much of it has gone to abstraction, reuse and consistency. I'd rather they reused a known, tested component that's 10% or 20%, or depending on the application, 1000% slower than to rewrite a new custom piece that'll have new bugs.

      Speed is rarely an issue that annoys me, but crashing is. And it is even more annoying to have a page mis-render than having Opera crash. The only speed issues that annoy me are sluggish or hanging interfaces, but that is a programming issue, not a "speed" issue as such. It doesn't take 3GHz to respond to mouse/keyboard events.

      Kjella

      --
      Live today, because you never know what tomorrow brings
    3. Re:More power will lead to more bloat.. by EvilTwinSkippy · · Score: 4, Insightful
      Hey, I shop at Ikea. The stuff isn't even assembled. It's a flat box full of precision cut boards with bolts and one of those funky allen keys.

      Getting back to your point, there is still a market for hand-coders. With most consumer electronics, I'm talking kid's toys, alarm clocks, talking dolls, you try to shave off every penny you can in manufacturing costs. Plus, once you start a product line, you run it out for years.

      In that case, of high volume and low cost, it is easy to absorb the cost of a $100,000 hand coder. Especially if he can save you $0.10 a unit on lines where volume is measured in the millions of units.

      Besides, most of the "hand coders" I know work more in the $36,000 range.
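
As a rough sketch of the break-even arithmetic in this post: the $100,000 and $36,000 salaries and the $0.10 per-unit saving come from the thread, while the 5-million-unit volume and the function names are assumptions chosen only for illustration. Amounts are kept in whole cents to avoid floating-point rounding.

```python
def break_even_units(coder_cost_cents: int, saving_per_unit_cents: int) -> int:
    """Smallest number of units at which the per-unit saving covers the coder's cost."""
    # Ceiling division: a partial unit cannot be sold.
    return -(-coder_cost_cents // saving_per_unit_cents)


def net_saving_cents(coder_cost_cents: int, saving_per_unit_cents: int, units: int) -> int:
    """Total saving across the production run minus the coder's cost."""
    return saving_per_unit_cents * units - coder_cost_cents


SAVING = 10                      # $0.10 per unit, from the thread
COSTLY_CODER = 100_000 * 100     # $100,000 in cents
CHEAP_CODER = 36_000 * 100       # $36,000 in cents
ASSUMED_VOLUME = 5_000_000       # hypothetical stand-in for "millions of units"

print(break_even_units(COSTLY_CODER, SAVING))   # units needed to pay for the $100k coder
print(break_even_units(CHEAP_CODER, SAVING))    # units needed to pay for the $36k coder
print(net_saving_cents(COSTLY_CODER, SAVING, ASSUMED_VOLUME) // 100)  # net dollars saved
```

Under these assumptions the $100,000 coder is paid for at one million units, and the $36,000 coder at 360,000 units; everything beyond that is saving.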

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    4. Re:More power will lead to more bloat.. by PureCreditor · · Score: 1

      $36K per project or per year? Cuz these days, especially if u live in USA or Canada, USD 36K is a terribly low salary for a programmer.

      Heck, Microsoft pays 50K-60K for an undergrad degree fresh grad.

  4. Quick answer by LiquidCoooled · · Score: 5, Interesting

    Run old software.

    It's only new software that's sucking up all the extra processing power.

    Remember back with really sluggish 33MHz 486s etc (and a lot lower) and thinking of the ultimate computer being a whole 50MHz.
    Well now you've got a computer that's over 10 times faster with practically infinite capacity.

    Fire up that old operating system and run your original software, you will be in heaven!

    --
    liqbase :: faster than paper
    1. Re:Quick answer by Anonymous Coward · · Score: 0

      Unless you were using Windows, which is specifically designed to not scale well for a lot of RAM - google for things like the anti-DRDOS "bug" and you'll know what I mean.

    2. Re:Quick answer by MoonFog · · Score: 0

      How many % does your CPU use on average? Unless I run BOINC or something, the CPU is usually at 2% of its capacity. When compiling and doing other heavy stuff, of course it will go up, but I have all the processing power I need for now. This article says that development isn't going as fast as it used to, not necessarily that we don't have enough processing power.

    3. Re:Quick answer by Anonymous Coward · · Score: 0

      Dude,

      I remember when I upgraded my 386 SX33 to a 486 DX33. Doom was so fast I nearly blew my load.

      Then... I used my tax refund to buy 4 more 30-pin SIMMs and had that computer kicking a whole 8 megs of RAM. I also upgraded to WFW 3.11 because of the 32 bit disk access drivers.

      Finally I got some extra cash and bought a new MB and 486 DX2/66.

      I could then play Doom 2 at an acceptable speed.

      I was in heaven.

      Kids these days are so spoiled.

    4. Re:Quick answer by Anonymous Coward · · Score: 0

      I'm sure you won't have a problem finding windows 3.1 drivers for your hardware.

    5. Re:Quick answer by Anonymous Coward · · Score: 0

      e.g. the DR-DOS threat (i'm not the parent poster.)

    6. Re:Quick answer by Gr8Apes · · Score: 1

      Do you think OS/2 Warp has drivers for an ATI 9800 Pro or the chipset for a 3.2 GHz P4 or AMD 64 FX 53? I'm sure it'd fly, as it already flew on a Pentium Pro 180 with 64MB of RAM.

      --
      The cesspool just got a check and balance.
    7. Re:Quick answer by 68K · · Score: 1

      Nice idea, but I tried this: Windows 3.11 won't even install on my 1.2GHz Duron, and I'm not even going to try it on my AMD64 3200+...

    8. Re:Quick answer by i+am+fishhead · · Score: 2, Funny

      Remember -- if you run DOS on a CPU with a large enough L2 cache, you can fit the entire address space minus extended or expanded memory (or whatever they called it) into L2 cache!

    9. Re:Quick answer by fille · · Score: 1

      Indeed, Office 97 rocks man! ;)

    10. Re:Quick answer by calidoscope · · Score: 1
      Remember back with really sluggish 33MHz 486s etc (and a lot lower) and thinking of the ultimate computer being a whole 50MHz.

      I remember when my 16 MHz 386 machine was the hottest thing around - blew the doors off of the 6 to 8 MHz AT's. Shortly after buying the 386, I picked up a copy of Gato which used timing loops intended for the 4.77 MHz 8088 - went w-a-y too fast to be playable until I learned how to set the clock speed compensation on the game.

      Before that when an 8 MHz 8086 was pretty hot stuff (which it was in 1982).

      --
      A Shadeless room is a brighter room.
  5. x86 centric by Anonymous Coward · · Score: 3, Insightful

    Might want to point out that the article is x86 centric. Not that it only applies to x86, indeed many/most of the issues are just generally related to processors (single vs multi-core, trace lengths, etc), but the article definitely focuses on these issues as they apply to the x86.

    1. Re:x86 centric by Anonymous Coward · · Score: 0

      x86 has been around the longest as far as modern architectures. Sure there have been ones before x86 but they aren't around today. x86 provides the best benchmark for this type of long term analysis.

  6. Pun by 2.7182 · · Score: 2, Funny

    From my point of view, chips lead to more bloat.

  7. Re:We don’t need more registers by cnettel · · Score: 5, Interesting
    Ok, classic x86 is cramped and the CPU does a lot of register renaming to get around it. I don't agree that more registers would actually do that much good.

    What kind of algorithm are you imagining would benefit from 256 fields of non-vectorized data?

    Of course, those registers could be used in larger things for everything that's worthy of a local variable, but as soon as you run into a stack operation you'll either only want to push a subset of the registers to the stack, or face a harder blow of memory access times by making each function call a 2048 byte write to memory.
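
The 2048-byte figure follows from saving all 256 registers at 64 bits each. A quick sketch of that arithmetic, with the 16-register AMD64 and 8-register 32-bit x86 cases added for comparison; the function name is made up for illustration, and real calling conventions save only a subset of registers:

```python
def full_save_bytes(register_count: int, register_bits: int) -> int:
    """Bytes written if every general-purpose register is pushed on a call."""
    return register_count * register_bits // 8


for name, count, bits in [
    ("MMIX-like", 256, 64),
    ("AMD64", 16, 64),
    ("classic x86", 8, 32),
]:
    print(f"{name}: {full_save_bytes(count, bits)} bytes")
```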

    Explicit encoding of parallelism, hints to branch prediction, and similar stuff, seems far more appropriate.

    Again, few single functions in an imperative language have 256 separate variables, without involving arrays of data. Unless the register file is addressable by index from another register (basically turning it into a very small addressed memory, which is what you try to avoid with registers), you have little use for 256 of them. Take for example a trivial string iteration algorithm, most of those registers would be completely useless. The same holds true for common graph algorithms.

  8. Re:We don’t need more registers by cnettel · · Score: 2, Interesting
    I don't agree that more registers would actually do that much good.
    Clarification: It's easy to see that you move in and out of registers and force the CPU to do register renaming to get good parallelism in x86. I fail to see the benefits from a real performance standpoint when you go above, let's say, 32 of each kind, and I think that the 16 available in AMD64 should be fine for most tasks. The problem in x86 is that there are only eight and even those have locked meanings to some degree.
  9. Unbloated URL by rylin · · Score: 5, Informative

    http://www.anandtech.com/printarticle.aspx?i=2343.
    Same article without 90% of the ad-bloat.

    1. Re:Unbloated URL by XMyth · · Score: 1

      But now it's missing all those helpful hyperlinks peppered throughout the article.

    2. Re:Unbloated URL by bsmoor01 · · Score: 1

      How insightful! He doesn't need ad revenue anyhow, right?

      I like AnandTech. The articles are generally decent. Why try to screw the guy over like this?

  10. Eliminate Bottlenecks by Trolling4Columbine · · Score: 5, Interesting

    Chances are that you aren't often pushing your CPU to capacity. What I'd like to see is a better way to identify bottlenecks in my system. There's no sense pumping more power into a system if it's all going to be throttled by something like a slow hard drive.

    --
    Socialism: A feeling of discontent and resentment caused by a desire for the possessions or qualities of another.
    1. Re:Eliminate Bottlenecks by asalvari · · Score: 1

      very good point.

      In fact I would like to see research done on what operations are considered slow. For instance, if your word processor takes 1 sec to update the screen it is considered slow. But nobody will pay any attention if the DVD Burning takes 5 or 10 min..

    2. Re:Eliminate Bottlenecks by ldaugusto · · Score: 1

      That's a good one. How can we ask for faster processors if ours aren't even 100% used? The current bottlenecks are memory, motherboards and I/O. Most people have 128 or 256MB of RAM. So what can we expect? A lot of swap! And swap drags your system down. When you have processors working at GHz, memory working at MHz and hard disks working at a few kHz, even a lot of cache memory can't speed up your system. Now they're trying multi-core computers. Fine, sounds good. It's cheaper than a single core. But now there's going to be much more control overhead for I/O and communication between cores. So we need to improve (speed up) the communication on motherboards.

    3. Re:Eliminate Bottlenecks by Ironsides · · Score: 3, Informative

      Most bottlenecks are already known. Here is a breakdown of access times when you are running at processor speeds:

      L1 & L2 Cache: Almost instantaneous, picosecond response time
      L3 and higher Cache: A bit slower, but still pretty quick, nanosecond response time
      Main memory: Go do something else while waiting for this, nanosecond/microsecond response time

      Hard Drive: Go to lunch and come back, millisecond response time

      --
      Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
  11. Re:Which fanboy are you? by Zork+the+Almighty · · Score: 0, Offtopic

    Jeez, I've been running Gentoo all this time when I should have been running Linux from Scratch, if only for the chance of sadomasochistic sex!

    --

    In Soviet America the banks rob you!
  12. Limitations... by MosesJones · · Score: 2, Funny


    Ummm, my home machine has a 400MHz processor running Suse. I'm thinking of upgrading, as I have every 6 months for 5 years, but I just keep waiting for the "next" best thing rather than upgrading now.

    There are mobile phones more powerful than my home PC, but it does the job.

    The wonder of these future boxes is that we will STILL be able to write code that makes them run slow. Roll on Longhorn I say!

    --
    An Eye for an Eye will make the whole world blind - Gandhi
    1. Re:Limitations... by gelfling · · Score: 1

      Overleap makes a 1.3GHz Tualatin upgrade for Slot-1 Pentium machines. It's called a SlotWonder 1300C and costs about $100.

    2. Re:Limitations... by Kjella · · Score: 3, Interesting

      The wonder of these future boxes is that we will STILL be able to write code that makes them run slow. Roll on Longhorn I say!

      Well, each version of Windows seems to bring about new hardware requirements. Most people buy a new Windows version with new hardware. It is more than just a little coincidence. I think Microsoft is well aware that most people aren't able to install Windows themselves, and that making them believe you'll need a faster box is a good idea to keep them upgrading to the "next" level, both on software and hardware.

      Kjella

      --
      Live today, because you never know what tomorrow brings
    3. Re:Limitations... by Anonymous Coward · · Score: 0

      yeah i was running a 400MHz system for a --long time-- too, until very recently. the damn thing was fast, i'm not even kidding. amd k6-3, running winxp with 512mb of memory. i only felt compelled to upgrade cuz the damn ide controller started corrupting my shiznit. other than that, it was just fine for normal use.

    4. Re:Limitations... by Anonymous Coward · · Score: 0

      I just installed a 700 MHz Pentium 3 on an Asus ps2-f motherboard, upgrading from the 400 MHz pentium 2.

      Beware if you do this though. Moon-lander is really fast and much harder to play! Also, pentium 3's are expensive and hard to find on-line, but I got lucky and got a free one from work.

  13. Myth of the single threaded desktop by Anonymous Coward · · Score: 1, Interesting
    The myth is that desktop programming is inherently single threaded and that there's no benefit to multi-threading. This is in part due to the fact that a lot of multi-threaded programs don't run any faster on a single processor than a single threaded program does. If there's no benefit to writing multi-threaded programs, then why go to the extra trouble of doing so?

    I expect that once multi-core desktop cpu's become more prevalent, the advantage of multi-threaded programming will become evident and start to take off.

    1. Re:Myth of the single threaded desktop by harrkev · · Score: 3, Interesting

      There are two fundamental truths:

      1) Programming for two or more processors is more work, and prone to more subtle and strange errors.
      2) Most people only have one processor.
      You can draw the obvious conclusions.

      Fact #1 can be dealt with by proper technique, training, and tools.
      Fact #2 is going to change due to the inability of AMD and Intel to deliver over 4GHz.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    2. Re:Myth of the single threaded desktop by Anonymous Coward · · Score: 0

      1. people are stupid.
      2. processor speed is the only thing everyone knows.
      corollary :
      3. moores law is dead.

    3. Re:Myth of the single threaded desktop by hackstraw · · Score: 2, Insightful

      1) Programming for two or more processors is more work, and prone to more subtle and strange errors.

      Threaded apps, and multitasking OSes have been around for years. Even if an app is single threaded, the user is still benefited by having 2 or more processors because the system is still very responsive, even if one app has one CPU completely pegged.

    4. Re:Myth of the single threaded desktop by KillerCow · · Score: 1

      The myth is that desktop programming is inherently single threaded and that there's no benefit to multi-threading.

      Blocking man. There is a ready queue and a blocked list for a reason. Those disk accesses aren't instantaneous. Neither is waiting for input from the user, or waiting on a socket. If your thread is blocked, there might be other work that you can do while you wait.

    5. Re:Myth of the single threaded desktop by Anonymous Coward · · Score: 0

      Correct me if I'm wrong, but you seem to be assuming that multithreaded programs are only a boon to multiprocessor computers.

      Which is flat wrong.

      All you need is to be doing a task that involves any I/O. I've never had a dual-CPU box as my primary workstation, and I've never written programs targeted at dual-CPU boxes, but customers sure appreciate it when the interface doesn't seem to freeze up like some other programs do.
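
A minimal sketch of the point made in the last two posts, assuming Python's standard threading module. Here time.sleep stands in for a blocking disk or socket read, and the names slow_io and run_demo are invented for this example. Even on one CPU, the main loop keeps running while the worker is blocked:

```python
import queue
import threading
import time


def slow_io(results: queue.Queue, delay: float) -> None:
    """Worker: pretend to block on I/O, then hand back a result."""
    time.sleep(delay)            # stand-in for a blocking read
    results.put("io done")


def run_demo(delay: float = 0.2):
    """Start the blocking work in a thread and keep 'the interface' ticking."""
    results: queue.Queue = queue.Queue()
    worker = threading.Thread(target=slow_io, args=(results, delay))
    worker.start()

    ticks = 0
    while worker.is_alive():
        ticks += 1               # stand-in for handling mouse/keyboard events
        time.sleep(0.01)

    worker.join()
    return ticks, results.get()


ticks, message = run_demo()
print(f"main loop ran {ticks} times while waiting; worker said: {message}")
```

Had slow_io been called directly from the main loop, the tick counter would not have advanced at all until the wait finished, which is the "frozen interface" the poster describes.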

  14. How could it be "overly technical"? by kinema · · Score: 0, Offtopic
    not overly technical
    Why not? This is Slashdot after all.
  15. Re:We don’t need more registers by Jeff+DeMaagd · · Score: 3, Informative

    Ok, classic x86 is cramped and the CPU does a lot of register renaming to get around it. I don't agree that more registers would actually do that much good.

    It does. Take a look at x86-64. The 98% reason 64 bit x86 code is faster when you are using less than 4 gigs of RAM is the fact it has double the registers. With the same number of registers, 64 bit code normally slows things down measurably because the pointer size doubled. The instruction word length doesn't change.

    256 registers goes a bit far unless half of them are predication bits.

  16. Not particularly good for Windows by TheLoneCabbage · · Score: 3, Interesting

      Multithreading gets you a speed boost not necessarily on the individual application, but definitely on the OS level. That's why Sun gets away with individual CPUs that are each 1/4 the speed of cheapy x86 hardware.

      Most OSes these days are not monolithic. Even MS is really a collection of smaller pieces, but not nearly to the degree of Linux.

      Linux just scales better than Windows on multiple CPUs. I have no doubt that MS will work Indian programmers day and night to catch up, but this is a game they are definitely playing catch-up in.

      Linux, in some versions, is scaling past 64 CPUs now (oh the benefits of forked kernel development!), which should factor nicely when the time comes that AMD ('cause may not be around then) is pushing ships with dozens if not hundreds of micro-cores.

      Last I checked (and I may be out of date on this) Windows started bogging on 4 CPUs. And never mind its asinine global message loop.

      I fully realize Joe User cares more about perceived performance than real performance (long live xorg!), and explaining Linux's advanced scaling architecture will not win over the desktop, but it will have a significant impact on technical decision markets; from servers to embedded devices (HUGE market for these clustered chips).

    1. Re:Not particularly good for Windows by TheRaven64 · · Score: 1

      I'm sorry, but this is just plain Linux-fanboyism with very little technical backing. Windows has had threads as kernel scheduled entities since the first version of NT. Linux got them in 2.6. The reason you can not use the stock version of Windows on more than 2 CPUs (32, I believe, for AS) is that it uses a bit map to identify CPUs, the size of the bit map being determined at compile time. This approach (at least in theory) will give better performance for small numbers of CPUs, although it scales less well.
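
To illustrate the fixed-width bit map idea described here (this is not Windows's actual code; the CpuMask class and its width parameter are hypothetical), each CPU is one bit in a mask whose width is chosen up front. Membership tests are a single shift and AND, but CPU numbers at or beyond the width simply cannot be represented:

```python
class CpuMask:
    """Set of CPU numbers stored as bits in a fixed-width integer."""

    def __init__(self, width: int) -> None:
        self.width = width   # fixed at construction, like a compile-time constant
        self.bits = 0

    def add(self, cpu: int) -> None:
        if not 0 <= cpu < self.width:
            raise ValueError(f"CPU {cpu} does not fit in a {self.width}-bit mask")
        self.bits |= 1 << cpu

    def __contains__(self, cpu: int) -> bool:
        return 0 <= cpu < self.width and bool((self.bits >> cpu) & 1)

    def count(self) -> int:
        return bin(self.bits).count("1")


mask = CpuMask(width=32)
mask.add(0)
mask.add(31)
print(hex(mask.bits), mask.count())
```

Supporting a 33rd CPU in this scheme means widening the mask everywhere it is stored, which is the scaling trade-off the post refers to.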

      --
      I am TheRaven on Soylent News
    2. Re:Not particularly good for Windows by Anonymous Coward · · Score: 0
      Part of Linux's scalability came from RCU. I've done an RCU for preemptive user threads as part of the atomic-ptr-plus project. It's not really limited to Linux since it's mostly straight Posix threads. There are some kernel hacks that would make it even more efficient. I'm not a kernel hacker so you won't see them from me. From somewhere else maybe, assuming it's still an active project. I'm not saying where for now.

      But anyway, if you want your apps to run faster, you're going to have to start using the same tricks the Linux kernel is using.

    3. Re:Not particularly good for Windows by Anonymous Coward · · Score: 0

      You've obviously been misinformed about Linux. Linux has supported threads since at least 2.4 using the clone system call. Linux uses copy-on-write forking, which makes forking and threading behave remarkably similarly; it's only once you start writing to a new process/thread that the difference between a fork and a thread becomes apparent. As such, threading is handled as kernel threads. The reason the new threading (NPTL) in 2.6 is so good is that there's been a lot of optimization work put into making thread creation/destruction even faster. NPTL has a lot of other conceivable advantages, anyways, as it provides various different features.

    4. Re:Not particularly good for Windows by i+am+fishhead · · Score: 1

      Actually, Linux 2.4 had threads as kernel scheduled entities, it's just that they were treated as processes instead of threads which worked out just fine save for some POSIX compliance issues.

    5. Re:Not particularly good for Windows by Anonymous Coward · · Score: 0

      pushing ships

      Woha! I must have slept in for quite a while. When did they get into manufacturing boats?

  17. Re:Which fanboy are you? by XFilesFMDS1013 · · Score: 1

    Wild. Did you get this from another site? I.e. are there any more? Actually, let me go look on Google.

  18. Re:We don’t need more registers by cnettel · · Score: 4, Interesting
    Read my own clarification response above yours, I intended to write that x86 is cramped by its register count (and the further restrictions on what to use when), but that 256 is very, very much.

    The Itanium has a huge file with, IIRC, even more registers in total. They are not inter-changeable, though, but the (almost) only point in that would be to keep the total number of registers down, while being flexible for most types of code. As I think that it's generally actually easier to make them separate for different execution units, that's not very interesting. Also, note that the Itanium currently has a 2-cycle (again, IIRC) register access time! They tried to be visionary, adding a huge register set, in addition to some parallelism encoding and other things I mentioned in the parent, but they traded (what seems to be) far too much to get it.

    A huge (defined as MMIX-like, not AMD64-like) register file might be great, but you need selective register pushing to stack to get away with it, unless you or the compiler are performing very aggressive inlining. What's easier, if you're doing assembler -- calling a function and put a local on the stack or writing a huge fricking implementation of your main algorithm, taking great care to use all different registers in each function inlining?

  19. User defined branch prediction by neomage86 · · Score: 1, Interesting

    Just to note, I am not an Electrical Engineer (but will be in 3 years). From what little I've read, it seems like branch prediction allows the CPU to prefetch data it will need. Smart math people keep coming up with better and better general purpose algorithms. But these new algorithms need more and more logic behind them, adding a lot to CPU complexity. Now, my question is: once we have an n-core CPU, would it be possible to optimize your main CPU for general purpose use, the second for video encoding, the third for games, and so on? Then when you run software, it will know (or you tell it) which CPU to run on. It seems that if the CPU designers knew what kind of code would be running, they could optimize branch prediction algorithms better for that task. It seems like misses are extremely expensive, and that something like this would help. It would be the next best thing to having an FPGA on your chip that automatically reconfigured itself for whatever algorithm you need.

    1. Re:User defined branch prediction by Mauvaisours · · Score: 2, Insightful

      This is already what you have : you have a general purpose CPU (Intel or AMD), graphics CPU (Nvidia or ATI), audio CPU, MPEG en/decoding, DSP, Vector, ...

    2. Re:User defined branch prediction by TheRaven64 · · Score: 1

      You can't design an algorithm that says `for video applications a branch is more (or less) likely to be taken than not taken'. What you can do is put in clues for individual branches. In some languages, exceptions are used for this. An exception is just another sort of branch, and can be used as a standard control structure, but the compiler knows that the exceptional condition is less likely to occur and so will optimise the other condition more (and tell the CPU, if this is supported by your ISA).

      --
      I am TheRaven on Soylent News
    3. Re:User defined branch prediction by EvilTwinSkippy · · Score: 1
      I will offer one suggestion, as one who was also at one point 3 years from an EE degree.

      Forget everything you are told about X being optimal, and Y being old hat. Computer architectures come and go like bell bottoms and short skirts.

      Branch prediction is a workaround. It is not a radical performance enhancing technology. It is there to keep the CPU busy when it would otherwise be starved for instructions and data. Branch prediction is simply there to allow the CPU to operate at an insanely high clock speed as compared to the memory bus. And it only works well when you have a relatively fixed target to optimize for (namely Windows.) Branch prediction is also needed because later generations of the i686 processor have insanely long pipelines.

      And it flows completely counter to what I was taught back in the late 90s about compiler theory. The idea then was to make instructions small and simple, and let an automated system figure out the optimal way to arrange instructions for maximum throughput.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    4. Re:User defined branch prediction by Anonymous Coward · · Score: 0

      My condolences to you about your choice of EE. Perhaps you should train in skills like these if you plan to actually make a living... EE? I mean come on, the 60s are over. I don't know what kind of BS the schools lay down on the kids these days, but I think you will be in *shock* when you hit the job market...

    5. Re:User defined branch prediction by i+am+fishhead · · Score: 1
      Branch prediction is a workaround. It is not a radical performance enhancing technology. It is there to keep the CPU busy when it would otherwise be starved for instructions and data. Branch prediction is simply there to allow the CPU to operate at an insanely high clock speed as compared to the memory bus. And it only works well when you have a relatively fixed target to optimize for (namely Windows.) Branch prediction is also needed because later generations of the i686 processor have insanely long pipelines.

      Not so much. Branch prediction is needed for pipelining to be a win. I'll agree it's pretty much necessary for good performance in pipelined architectures, but I don't think I'd call it a workaround. On average, most applications have a basic block size (number of instructions between branches) of four to five. I don't know about you, but I don't think that four is "insanely long". To get the most out of a pipelined superscalar processor you (obviously) want to have as many instructions running in parallel as possible and keep as many of the slots in your pipeline as full as possible, so you will continue to have many instructions running in parallel. If you have a processor whose pipeline is longer than four or five stages, you'll be better off just picking a direction to go on the branch than you will be waiting for the result while your pipeline empties -- if you're wrong you'll be no worse off than if you waited, but if you're right, you've just got a number of useful cycles of work in for free. Branch prediction provides a better option than just picking a direction and running with it.

      And no, branch prediction does not work well only for Windows. It works well whenever the direction the branch will take is easy to predict; for example, loops will consistently jump backwards. In fact, loops are one of the places where branch prediction in a pipelined architecture is probably most useful: if the loop has any significant number of iterations, it should be pretty obvious which way it will go, and the branch prediction will always be a win (except when we exit the loop).
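
      To make the loop case concrete, here is a minimal sketch of a 2-bit saturating counter, one common textbook predictor, run against the branch that closes a loop. It is an illustration only, not a model of any particular CPU; the function names and the 100-iteration, 10-entry workload are assumptions made up for the example.

```python
def simulate_two_bit(outcomes, state=2):
    """Count mispredictions of a 2-bit saturating counter.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward 3 on a taken branch, toward 0 on a not-taken one.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


def loop_branch(iterations):
    """Outcomes of the backward branch closing a loop: taken until the exit."""
    return [True] * (iterations - 1) + [False]


# Assumed workload: a 100-iteration loop that is entered 10 times.
outcomes = loop_branch(100) * 10
misses = simulate_two_bit(outcomes)
print(f"{misses} mispredictions out of {len(outcomes)} branches")
```

      In this toy run the only misses are the loop exits, one per pass through the loop, which is the "except when we exit the loop" case.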

    6. Re:User defined branch prediction by PureCreditor · · Score: 1

      When you reduce the pipeline from a P4-length to something along the lines of a PowerPC, the mis-prediction penalty is much lower. Also, GPUs on graphics cards already off-load a major chunk of repetitive processing of 3D rendering and 2D video decoding, occasionally video encoding.
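
      As a rough illustration of the penalty argument, the sketch below compares the average cycles lost per branch for a long and a short pipeline, and against simply stalling on every branch. The function names, the 5% miss rate, and the 20- and 7-cycle penalties are illustrative assumptions, not measured figures for any real CPU.

```python
def cycles_lost_predicting(miss_rate, penalty_cycles):
    """Average cycles lost per branch when the CPU guesses and is sometimes wrong."""
    return miss_rate * penalty_cycles


def cycles_lost_stalling(penalty_cycles):
    """Average cycles lost per branch when the CPU always waits for the outcome."""
    return float(penalty_cycles)


MISS_RATE = 0.05  # assumed: 1 branch in 20 is mispredicted

for name, penalty in [("long pipeline", 20), ("short pipeline", 7)]:
    predicted = cycles_lost_predicting(MISS_RATE, penalty)
    stalled = cycles_lost_stalling(penalty)
    print(f"{name}: {predicted:.2f} cycles/branch predicting, "
          f"{stalled:.2f} stalling")
```

      The simplification here is that a misprediction costs exactly as much as waiting would have, so prediction can only tie or win under this model.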

      The "Cell" architecture does something similar to what you've described - different cells handle different tasks of a multimedia system, say set-top box or Playstation 3. Better statistics modeling is what's needed in terms of branch prediction optimization. The longer a predictor runs a certain type of code, the more it'll learn from its past mistakes, and statistically predict a branch.

      When you have a Cell-like architecture, you can have idle CPUs at times, which means instead of trying to predict a particular branch, it has spare power to also execute all alternative branches, and temporarily store the results in a cache. Then once the branching logic chooses a particular scenario, the pre-calculated result will be loaded.

    7. Re:User defined branch prediction by Anonymous Coward · · Score: 0

      So this FPGA that can be reconfigured to do whatever task you want it to do ... how's that different from running a program on a CPU?

      Yes, it's true that if you had an infinite amount of time and space and money, putting every algorithm you wanted to run in its own chip, you would be able to run a bit faster. A computer is exactly the opposite of that: a general-purpose computing device.

      Note that there have been special-purpose cards to accelerate specific operations -- for example, the Java add-on card. It failed miserably: easier and cheaper to wait a little while and buy a faster CPU.

      Similarly, not even a hardcore Lisp hacker would buy a Lisp machine after they discovered that a general-purpose Alpha could beat the pants off it at even Lisp code.

      CPUs are a steamroller that flattens anything it wants to. Good luck trying to beat them at anything for any period of time. (The one exception I can think of is the 3D graphics card, and those are relatively cheap, and offer order-of-magnitude improvements in speed for common operations.)

    8. Re:User defined branch prediction by farnz · · Score: 1

      ISAs already have ways to indicate static branch likelihood for conditional branches. Some ISAs have a "likely to take" flag, indicating that a branch is more likely to be taken than not taken, while others have rules like "if branch backwards, then taken is more likely, else not taken is more likely". A compiler can make use of these to do what you're suggesting.
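
      The second kind of rule ("backward taken, forward not taken") is simple enough to sketch. The trace below is hypothetical, with made-up addresses and outcomes, and the function name is an assumption for the example; it only shows how the rule is applied, not how well it does on real code.

```python
def predict_taken(branch_addr, target_addr):
    """Static rule: backward branches are predicted taken, forward ones not taken."""
    return target_addr < branch_addr


# Hypothetical trace: (branch address, target address, actually taken?)
trace = [
    (0x120, 0x100, True),   # loop back-edge, taken
    (0x120, 0x100, True),   # loop back-edge, taken again
    (0x120, 0x100, False),  # loop exit
    (0x130, 0x150, False),  # forward skip over a rare path, not taken
    (0x160, 0x180, True),   # forward branch that happens to be taken
]

hits = sum(predict_taken(branch, target) == taken
           for branch, target, taken in trace)
print(f"{hits}/{len(trace)} predicted correctly")
```

      The rule gets the loop iterations and the untaken forward skip right, and misses the loop exit and the taken forward branch, which is where an explicit "likely" flag from the compiler could help.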

  20. i don't think much of this article by Magius_AR · · Score: 1
    The author seems obsessed with the Pentium.

    The only reference made to AMD is regarding their ingenious SOI technology. With the exception of that, the focus is maintained on Intel (which he calls the "#1 in the CPU market"). I find that somewhat absurd, since Intel is largely failing (stretching an obsolete architecture to extreme limits by extending the pipeline) where AMD is innovating and has already largely surpassed them.

    AMD's CPU does a hell of a lot more per clock cycle than Intel's. The AMD 64 bit chip is a marvel.

    1. Re:i don't think much of this article by cnettel · · Score: 1
      Well, he's also talking about the K9 delays. I think that the failure of the Prescott in some ways shows what problems both leading x86 vendors are fighting against. Isn't SOI, BTW, more of an IBM deal that AMD cross-licensed?

      And, yeah, the only Intel CPU I currently like is the Pentium M, and I hope you can forgive that. If I were buying a new main machine right now, it would probably be AMD, but I'm holding out for dual-core releases from them. I like the effects that both "real" SMP and hyper-threading have on general responsiveness, and a dual-core AMD64 chip would fit very nicely.

    2. Re:i don't think much of this article by myov · · Score: 1

      Correct me if I'm wrong, but wasn't SOI (Silicon On Insulator) IBM's technology?

      --
      I use Macs to up my productivity, so up yours Microsoft!
  21. Re:We don't need more "power" by harrkev · · Score: 1

    You can already buy PCI boards that will let you do this. It is just that software support is seriously lacking (nonexistent).

    My guess is that this would work wonderfully for certain classes of problems, and would be quite useful for things like finite element analysis, MPEG encoding, and the like. The main problem is that an FPGA takes a fair bit of time to load its configuration file. Obviously, you would not want to multitask between two different applications trying to use this FPGA. Otherwise, you will spend more time context-switching than you would actually working.

    You can get a simple FPGA for only a buck or two, now. Decent ones are $10. It would not cost too much to add them to a mobo. All you need is for somebody to come up with a decent programming framework (which is far from trivial).

    --
    "-1 Troll" is the apparently the same as "-1 I disagree with you."
  22. Re:We don't need more "power" by MerlinTheWizard · · Score: 1

    Of course, a classic FPGA architecture wouldn't cut it. But there are some more advanced architectures that are being tested already, that allow extremely fast reprogramming. Imagine if some areas of your processor could be reprogrammed in the time it takes for, say, a context switch. And of course the underlying OS needs to be written so as to optimize the processor's use at any given time.

  23. 4 GHz as a wavelength (:-)) by davecb · · Score: 1
    A colleague suggests that 4 GHz may be a hard frequency to exceed (in the short run).

    My leaky brain suggests that this might correspond to the propagation speed in silicon for a given path length and a given process (e.g., 90nm may give us better results).
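
    For a sense of scale, here is a back-of-the-envelope sketch of how far a signal can travel in one clock period. The function name is made up for the example, and the 0.5 velocity factor is an illustrative assumption only; real on-chip wires are limited by resistance and capacitance and are generally much slower than light in vacuum, so treat the results as loose upper bounds.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in vacuum


def distance_per_cycle(freq_hz, velocity_factor=1.0):
    """Distance in metres a signal covers in one clock period.

    velocity_factor is the assumed fraction of the vacuum speed of light.
    """
    return SPEED_OF_LIGHT * velocity_factor / freq_hz


vacuum = distance_per_cycle(4e9)
slowed = distance_per_cycle(4e9, velocity_factor=0.5)  # assumed factor
print(f"4 GHz, vacuum: {vacuum * 100:.2f} cm per cycle")
print(f"4 GHz, half speed: {slowed * 100:.2f} cm per cycle")
```

    Even the vacuum figure is only a few centimetres per cycle at 4 GHz, so path length across a die is not an unreasonable thing to worry about.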

    --dave

    --
    davecb@spamcop.net
  24. Re:We don’t need more registers by Anonymous Coward · · Score: 0

    Why would anybody need more than 640K?

  25. Assumptions by Zobeid · · Score: 1

    I have to question one of the main assumptions in the article -- that most software won't benefit from multiple processors. In a sense it's true, but it's also misleading.

    If you are desperate to run your word processor or spreadsheet faster, then he's got a point. But realistically, don't the current systems already run those kinds of programs just fine? Is this the kind of application where more speed is most needed?

    I think Sony have got it right with their whole "media processor" approach, with high bandwidth and multiple vector units. It won't benefit most programs, but it will greatly benefit most of the programs that slog on today's systems.

    The time has come to break away from the old approach of merely running the same linear X86 code faster and faster. I think this change is overdue.

    1. Re:Assumptions by superpulpsicle · · Score: 1

      I do agree that most apps and games don't really feel that much faster.

      Compare a game on a Pentium I at 100 MHz and a Pentium II at 200 MHz, and there is a massive difference. That's a gap of just 100 MHz.

      Compare a game on a Pentium IV at 1.4 GHz and a Pentium IV at 2.0 GHz, and there is hardly any difference. That's a gap of 600 MHz.

      The industry is obsessed with number crunching and generic software benchmark numbers, which is a bad measurement altogether.
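
      One way to read the comparison above is in relative rather than absolute terms. The short sketch below, with a made-up helper name, just computes the clock ratios for the two pairs mentioned; clock ratio is of course not the same thing as real performance, and the two Pentium generations also differ in architecture.

```python
def clock_ratio(old_mhz, new_mhz):
    """How many times faster the new clock is than the old one."""
    return new_mhz / old_mhz


pairs = [
    ("100 MHz -> 200 MHz", 100, 200),
    ("1.4 GHz -> 2.0 GHz", 1400, 2000),
]

for label, old, new in pairs:
    ratio = clock_ratio(old, new)
    print(f"{label}: +{new - old} MHz, {ratio:.2f}x the clock")
```

      The 100 MHz step doubles the clock, while the 600 MHz step is well under a 1.5x increase, which fits the impression that the second upgrade feels smaller.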

  26. More CPU reading by Byzandula · · Score: 1

    Tom's Hardware also has "The Mother of All CPU Charts," which is also a good read with many benchmarks.

    It is crazy how far we have come.

    -The only sig I have is a cig with a good single malt.

  27. The Power Bottleneck by Anonymous Coward · · Score: 0

    Is it just me, or does this article describe current leakage using bipolar transistors?? I didn't think those were commonly used, with CMOS pretty much supplanting them... Really it seems like the right argument, applied to the wrong MOSFET.

  28. Re:We don’t need more registers by arthurh3535 · · Score: 1

    Branch prediction sounds decent to most people, until it hits reality. Having a program "predicting" that it will need a certain path is *backwards*. If a certain calculation *should* go down a path, it should pre-tell the channels.

    --
    No! It's a *SIG*. Keep the Special Interest Groups away! (Con joke!)
  29. The world is x86 centric by Anonymous Coward · · Score: 0

    Last I remember, x86 compatible microprocessors are more than 95% of the desktop market... it seems only natural that such an article would focus on x86 (unless the PPC people found a way to hit 5 GHz, which, to my knowledge, has not happened yet).

  30. Re:We don’t need more registers by i+am+fishhead · · Score: 1
    The problem in x86 is that they are eight and even those have locked meanings to some degree.

    Locked meanings? I'm not so sure. If we do a MUL EAX then the result goes into EDX:EAX. Since EAX gets clobbered, it'll get renamed. Combine that with the fact that most compilers generate code that does not use instructions in which registers have special meaning anyway, and I don't think this is actually a problem.
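
    For anyone unfamiliar with the EDX:EAX convention mentioned above, this small sketch mimics what the one-operand unsigned 32-bit MUL produces: a 64-bit product split into a high half (EDX) and a low half (EAX). The helper name is made up, and it models only the result values, not flags or register renaming.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, operand):
    """Unsigned 32x32 -> 64-bit multiply, returned as (edx, eax) halves."""
    product = (eax & MASK32) * (operand & MASK32)
    edx = (product >> 32) & MASK32  # high 32 bits of the product
    new_eax = product & MASK32      # low 32 bits overwrite EAX
    return edx, new_eax


edx, eax = mul32(0xFFFFFFFF, 2)
print(f"EDX={edx:#010x} EAX={eax:#010x}")
```

    Both output registers are implied by the instruction rather than chosen by the programmer, which is the sense in which they have "locked meanings".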

  31. just a thought by zenst · · Score: 3, Funny

    Now I only have a very limited understanding of the issues, given my lack of electronics experience. But couldn't the leakage be utilised in some form of integrated Peltier cooling to help pump the heat out of the chip and, by making it cooler, help in a small way to reduce leakage? Another, call it wacky, thought that struck me was: why not have another layer of large silicon that is powered by the leakage? It looks to me like the leaked power from the 70nm process is nearly enough for the total power of the 90nm process, and the leakage from the 90nm would then cover something just over the 180nm process. In a sense, another form of heat pump :). Anyhow, I'm sure I've either given you electronics gurus something to think about or, at the very least, laugh about. Enjoy :)

    1. Re:just a thought by ChrisMaple · · Score: 2, Informative
      Peltiers have been used, but they are expensive, inefficient, and not very useful at the high power densities of a P4.

      Leakage is not available as a power source. Leakage is turned into heat in that exact location where the leakage occurs.

      --
      Contribute to civilization: ari.aynrand.org/donate
  32. In addition to the power of and.... by jimmy8888 · · Score: 1

    www.everythingispossible.com

    --
    Never insult someone who serves you food. - Brought to you by the Democratic People's Republic of Jimmy.
  33. dividends from MSFT haven't been good in years by way2trivial · · Score: 1
    http://www.microsoft.com/msft/FAQ/faqdividend.mspx#Question14

    july 20, 2004 was pretty sweet....

    --
    every day http://en.wikipedia.org/wiki/Special:Random
  34. Who needs this much power? by vivin · · Score: 2, Interesting

    We have to look at how much this affects different people.

    Who needs so much raw processing power? Your everyday Joe Computer User only uses it for word processing, checking email, and surfing the interweb. Which is why, when some of my friends (or their parents) go looking for a new computer, I ask them what they mostly use their computer for. If they're not eXtreme gamers or something, then I don't see a point in them buying a processor screaming along at 4 GHz or whatever.

    In the light of this, I still think there's a market for single-core CPUs for the everyday user. There is probably one other thing that can change this though: video encoding/recoding. A lot of people are starting to use their PCs for burning DVDs. As anyone who's ever authored a DVD knows, it can take some time. It takes about 3.5 hours on my 2.4 GHz Pentium to author a (4 Gig) DVD. That time is spent on the encoding. So multicore processors would probably help with that (or perhaps there could be a dedicated hardware solution, such as encoder cards?).

    I know this article is just talking about continuing trends, and what could die out. So yes, unicore/single-core CPUs may not be a "profitable" trend, but there are still uses for them. Also, as the article showed when talking about hyperthreading, it would help if apps were written with hyperthreading/multicore processors in mind. That way, they can take full advantage of them. I see software taking a while to catch up and utilize the full potential of the hardware.

    --
    Vivin Suresh Paliath
    http://vivin.net

    I like
  35. Better colours by Anonymous Coward · · Score: 0
  36. How MMIX Uses Its Registers by Sunlighter · · Score: 1

    If I recall correctly, MMIX uses its whole huge register file as a stack. All of your instructions specify register numbers as counted from the top-of-stack. Stack space is allocated and deallocated in frames, not a register at a time. A frame must be small enough to fit in registers. The stack spills to memory if it overflows, and refills from memory if it underflows. It does not have to spill/refill on a frame boundary. But activation records for compiled C routines could nest five or six deep and not spill. An inline routine can still allocate and release its own activation record.

    Not all the registers are used in a stack-like way; some of them are global for your program and some of them are global to the OS. There are a couple of special registers that indicate where these regions start in the register file. The remainder of the register file is used as a stack.
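
    The spill-and-refill behaviour described above can be sketched with a toy model. This is not MMIX's actual mechanism or instruction set; the class, its method names, and the 8-register capacity are hypothetical, and it only shows frames being allocated on a fixed-size register file with the oldest registers spilling to memory one at a time.

```python
class RegisterStack:
    """Toy register file used as a stack of frames, spilling the oldest registers."""

    def __init__(self, physical_regs):
        self.capacity = physical_regs
        self.regs = []    # registers held on chip, oldest first
        self.memory = []  # spilled registers, oldest first
        self.frames = []  # sizes of the active frames

    def push_frame(self, size):
        if size > self.capacity:
            raise ValueError("a frame must fit in the register file")
        self.frames.append(size)
        self.regs.extend([0] * size)
        # Spill register by register, not necessarily on a frame boundary.
        while len(self.regs) > self.capacity:
            self.memory.append(self.regs.pop(0))

    def pop_frame(self):
        size = self.frames.pop()
        del self.regs[len(self.regs) - size:]
        # Refill until the caller's frame is fully back on chip.
        needed = self.frames[-1] if self.frames else 0
        while len(self.regs) < needed:
            self.regs.insert(0, self.memory.pop())


stack = RegisterStack(physical_regs=8)
for _ in range(3):
    stack.push_frame(4)  # the third frame forces 4 registers out to memory
print(len(stack.regs), "on chip,", len(stack.memory), "spilled")
```

    With frames this small, shallow call chains never touch memory at all, which is the appeal of the scheme.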

    --
    Sunlit World Scheme. Weird and different.