Slashdot Mirror


There's No Such Thing As a General-Purpose Processor

CowboyRobot writes: David Chisnall of the University of Cambridge argues that despite the current trend of categorizing processors and accelerators as "general purpose," there really is no such thing and believing in such a device is harmful.

"The problem of dark silicon (the portion of a chip that must be left unpowered) means that it is going to be increasingly viable to have lots of different cores on the same die, as long as most of them are not constantly powered. Efficient designs in such a world will require admitting that there is no one-size-fits-all processor design and that there is a large spectrum, with different trade-offs at different points."

181 comments

  1. There are different workloads, duh. by Anonymous Coward · · Score: 4, Insightful

    David Chisnall says : parallel is not the same as in series. World gasps.

    1. Re:There are different workloads, duh. by Jane+Q.+Public · · Score: 2

      David Chisnall says : parallel is not the same as in series. World gasps.

      It's worse than that. TFA's basic premise that "there is no such thing as a general-purpose processor" is just flat wrong. Of course there are. His real argument is about how to make them efficient, which is a different thing and very much contrary to his title and introduction.

      Anything that can implement a Turing machine *IS* a general-purpose processor, by definition. And any general-purpose processor can do what any other general-purpose processor can do... although not necessarily fast or efficiently.

      Sadly, Chisnall gets the latter part (efficiency) confused with the former part... and in the process he incorrectly attempts to re-define the entire long-accepted meaning of what a "general purpose" processor is.

      Just no. He's not qualified to do that.

    2. Re:There are different workloads, duh. by issicus · · Score: 1

      So many things only use one core. I would definitely go for a 6ghz 2 core over a 3ghz 4 core.

    3. Re:There are different workloads, duh. by gwolf · · Score: 1

      Even if your 3GHz 4 cores have a decent amount of cache and can perform their computations without going down the memory bus bottleneck? Remember, the bottleneck would be even worse, because you didn't mention the memory would be twice as fast as well. And, of course, the rest of the buses and peripherals would also be affected, so all waits for memory and for external I/O would become effectively twice as expensive, as seen by the processors.

      Of course, you could say that it'd be nice to have all of the computer's components continue increasing in speed. Well, that'd bring another problem: Motherboard sizes. Because at 6GHz, light speed becomes a limit as well: If, speaking in round numbers, light travels ~300,000,000 meters per second, then it takes 3.33x10^-9 seconds for it to travel one meter. At 6GHz, light travels 50cm per clock cycle. I know I'm comparing apples and oranges here, as electrons don't "move" along the wire, but still — Signals will only travel fractions of that distance on an electronic circuit.

      Yes, it could be easier to keep both cores happily going along without programmers having to learn to master concurrency. But we are hitting physical barriers, and they do not give way easily.

    4. Re:There are different workloads, duh. by ultranova · · Score: 1

      Because at 6GHz, light speed becomes a limit as well: If, speaking in round numbers, light travels ~300,000,000 meters per second, then it takes 3.33x10^-9 seconds for it to travel one meter. At 6GHz, light travels 50cm per clock cycle.

      300,000,000 / 6,000,000,000 = 0.05. So 5 cm, not 50.
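
The correction above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `distance_per_cycle_cm` and the `velocity_factor` parameter are illustrative choices, not anything from the thread, and the 0.5 used in the last line is an assumed round number for a signal on a circuit board, not a measured value.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by SI definition


def distance_per_cycle_cm(clock_hz, velocity_factor=1.0):
    """Distance a signal covers during one clock period, in centimetres.

    velocity_factor is the fraction of c at which the signal propagates
    (1.0 = vacuum); it is a hypothetical knob added for illustration.
    """
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz * 100


for ghz in (3, 6):
    cm = distance_per_cycle_cm(ghz * 1e9)
    print(f"{ghz} GHz: {cm:.1f} cm per cycle in vacuum")

# Assumed, illustrative velocity factor of 0.5 for a board trace.
print(f"6 GHz at 0.5c: {distance_per_cycle_cm(6e9, 0.5):.1f} cm per cycle")
```

At 6 GHz the vacuum figure comes out at roughly 5 cm, and any real interconnect covers less than that per cycle.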

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    5. Re:There are different workloads, duh. by gwolf · · Score: 1

      You are completely right, I was over-optimistic in my numbers. So, thanks for pushing the point even more so!

    6. Re:There are different workloads, duh. by uninformedLuddite · · Score: 1

      I'm sure if you had a 7 digit UID you would have called him a name instead of acknowledging your error

      --
      The new right fascists are bilingual. They speak English and Bullshit.
  2. Specialization is for insects by rossdee · · Score: 2

    According to Lazarus Long
    The same should be true for AI

    1. Re:Specialization is for insects by rasmusbr · · Score: 2

      According to Lazarus Long
      The same should be true for AI

      If that analogy holds in more than one way then I suppose that specialized AI models will appear earlier in history, will be vastly more numerous and resilient and long-lived than more generalized AI models.

      The more generalized AIs will probably want to reach for a specialized-AI swatter every now and then.

    2. Re:Specialization is for insects by wierd_w · · Score: 1

      and that would seem to be on the money...

      Imagine the reaction of a hypothetical future smart-AI responding to the niggling bullshit of a spambot...

    3. Re:Specialization is for insects by AqD · · Score: 1

      and for humans.

    4. Re:Specialization is for insects by Anonymous Coward · · Score: 0

      Imagine the reaction of a hypothetical future smart-AI responding to the niggling bullshit of a spambot...

      "Hmm, these spambots appear to be originating from Russia. And these humans have given me command of nuclear missiles aimed directly at Russia. Hmmmmmm......"

    5. Re:Specialization is for insects by Zorpheus · · Score: 1

      The current "general purpose" processors are also specialised though, for example for algorithms with a low number of threads. A processor with the combination of several specialised cores is less specialised, since it is good at everything.

  3. Efficiency by penguinoid · · Score: 1

    Also if we have a whole lot of special-purpose cores, we can leave the unused ones unpowered and gain in energy efficiency from the specialized ones. This seems to be how the human brain works, and it runs on less than 100 watts (100 watts corresponds to 2000 Calories per day).

    --
    Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    1. Re:Efficiency by jones_supa · · Score: 2, Informative

      This seems to be how the human brain works, and it runs on less than 100 watts (100 watts corresponds to 2000 Calories per day).

      A whole woman consumes 100 watts. Of that brain is about 20 watts. Also, watt represents momentary consumption and calories are a fixed mass of energy, so you can't directly compare them.

    2. Re:Efficiency by jones_supa · · Score: 2

      A whole woman consumes 100 watts.

      D'oh! Human, not woman.

    3. Re:Efficiency by Anonymous Coward · · Score: 0

      > , watt represents momentary consumption and calories are a fixed mass of energy, so you can't directly compare them.

      Yes you can: if you declare a rate of energy use, and a timeframe that the rate of energy is used over, you can work out how much energy is used

      ie 100w of power usage over 24 hours = 2000 Calories of energy used.

      or, a computer, idling at 20w for 24 hours, will use 1,728 kilojoules (about 1.7 megajoules)
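
The power-to-energy conversion being argued about here is just power multiplied by time. A minimal sketch, assuming the thermochemical kilocalorie (4184 J) as the definition of one food Calorie; the helper name `energy_joules` is made up for this example.

```python
JOULES_PER_KCAL = 4184            # assumed: thermochemical kcal; 1 food Calorie = 1 kcal
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 s


def energy_joules(power_watts, seconds):
    """Energy used when drawing a constant power for a given time."""
    return power_watts * seconds


# A 100 W average draw sustained for one day, expressed in food Calories.
human_j = energy_joules(100, SECONDS_PER_DAY)
human_kcal = human_j / JOULES_PER_KCAL

# A computer idling at 20 W for one day, expressed in kilojoules.
idle_pc_kj = energy_joules(20, SECONDS_PER_DAY) / 1000

print(f"100 W for 24 h = {human_j / 1e6:.2f} MJ = {human_kcal:.0f} kcal")
print(f"20 W for 24 h  = {idle_pc_kj:.0f} kJ")
```

So 100 W over a day is a little over 2000 Calories, and the 20 W idle machine uses on the order of 1.7 MJ per day.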

    4. Re:Efficiency by Anonymous Coward · · Score: 0

      100*24 Wh = 2000 Calories was the implied conversion. I thought his way of describing it was pretty efficient, but I suspect your "unit-incompatibility-alarm-bells" were going off so loudly that you couldn't hear what he was saying well enough to understand the communication. It's like correcting someone on the distinction between a magazine and a clip when the person was in fact making a fairly insightful observation about the reload time of an M1 Garand.

      https://www.physicsforums.com/threads/does-the-average-person-run-on-96-85-watts.114431/

    5. Re:Efficiency by Anonymous Coward · · Score: 0

      I think the reload times are about the same and that would be an "en bloc clip". ;-)

    6. Re:Efficiency by Anonymous Coward · · Score: 0

      Women are not human?

    7. Re:Efficiency by Njorthbiatr · · Score: 0

      The brain does convert chemical energy into mechanical energy into electrical potentials.

      But to say the brain runs more efficiently than a computer is pretty dumb. A processor needs some electricity and that's it, it's good to go. Brains require nutrients beyond simply burning carbohydrates, and they need to keep being fed. Brains cease working if they aren't fed and they can't be restarted. Further, brains need 1/3 of every day just to maintain themselves.

      They also have a really high error rate.

    8. Re:Efficiency by jonnythan · · Score: 1

      Watts is in units of energy divided by time.

      Now.... what are the units of "calories per day"? Take your time.

    9. Re:Efficiency by TechyImmigrant · · Score: 4, Funny

      A whole woman consumes 100 watts.

      D'oh! Human, not woman.

      I got myself an nvidia woman. She takes 400 watts.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    10. Re:Efficiency by linuxrocks123 · · Score: 1

      Not disagreeing with you, but it's a little misleading to say "a processor needs some electricity and that's it". A processor needs a very precise voltage level of DC current supplied continuously. To get that precise voltage level, you need regulators, AC/DC converters, etc. Moreover, you need to burn coal, oil, natural gas, or sustain a nuclear reaction in order to provide this electricity. Finding carbohydrates is comparatively easy compared to maintaining an entire electrical grid. After all, they quite literally grow on trees.

      --
      vi ~/.emacs # I'm probably going to Hell for this.
    11. Re: Efficiency by GigaplexNZ · · Score: 1

      >, watt represents momentary consumption and calories are a fixed mass of energy, so you can't directly compare them. Yes you can: if you declare a rate of energy use, and a timeframe that the rate of energy is used over, you can work out how much energy is used

      That's an indirect comparison.

    12. Re: Efficiency by Anonymous Coward · · Score: 0

      Not to mention a f. factory, loads of people programming and configuring the shitty processors, and electricity, all for just about the same effort. Humans, on the other hand, are proven to survive w/o civilization and just need two units to produce a new one.

    13. Re:Efficiency by jones_supa · · Score: 1

      Of course they are. I simply meant to use the word "human" instead of "woman". The word "human" still encompasses women.

    14. Re:Efficiency by gweihir · · Score: 1

      And you think there are people that can write good software for such a thing? Most developers fail even when there is one benign execution mechanism, such as a virtual machine. This is just an academic that wants attention.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    15. Re:Efficiency by itzly · · Score: 1

      Similarly, arbitrary cells in your body can't just run on fruit from trees. They need a very precisely regulated supply of certain substances, which needs to be regulated by very complex mechanisms in the body.

    16. Re:Efficiency by Anonymous Coward · · Score: 0

      2000 Kilocalories.

    17. Re:Efficiency by Anonymous Coward · · Score: 0

      2000 kcal, Kilocalories!

    18. Re:Efficiency by Geeky · · Score: 1

      Or "B logically follows from A. Therefore B is true if I want it to be. Unless I do really but don't want to tell you I do, or I can make a drama out of it not being true."

      I'm trying not to be misogynistic but sometimes it really is hard to follow the logic. Maybe it's just the one I'm seeing. I sort of assume attacking the logic of a certain action is somehow preferable to simply saying "I don't want to".

      I should just accept that logic and relationships are non-overlapping magisteria.

      Meh. Bad weekend.

      --
      Sigs are so 1990s. No way would I be seen dead with one.
    19. Re:Efficiency by loufoque · · Score: 1

      Most processors require less than 100W.
      It's the peripherals that take a lot of power.

    20. Re:Efficiency by Anonymous Coward · · Score: 5, Funny

      Sounds like a hottie

    21. Re:Efficiency by Thor+Ablestar · · Score: 1

      I don't mean misogyny. There are lots of classic philosophy works that are accepted as classic but are step by step reducible to Kolmogorov's lemma. And "results" or "follows" are necessary part of the lemma. "A is true because I want A to be true" has no "follows".

      And btw I think it's as difficult to produce a computer that wants as the computer that feels pleasure.

    22. Re:Efficiency by Anonymous Coward · · Score: 3, Funny

      Unfortunately she requires two inputs. :(

    23. Re:Efficiency by penguinoid · · Score: 1

      I imagine that having a bunch of specialized cores is the sort of thing you want your compiler to worry about. Sure there will be some people who want their programs to be designed around the hardware, but most people would be happy with merely the standard libraries benefiting from extra hardware efficiency.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    24. Re:Efficiency by penguinoid · · Score: 1

      Most processors require less than 100W.

      Yes, but 100W is enough to run an entire human (at least those who eat 2000 Calories per day). And most brains (still) have more processing power than most processors.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    25. Re:Efficiency by gweihir · · Score: 1

      Compilers are not smart enough for this.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    26. Re:Efficiency by Anonymous Coward · · Score: 0

      It's called abductive inference ... no wonder that so many women in logic are drawn to it.

    27. Re:Efficiency by Anonymous Coward · · Score: 0

      Well it's better than the ones that go to sleep all the time.

    28. Re:Efficiency by careysub · · Score: 1

      This seems to be how the human brain works, and it runs on less than 100 watts (100 watts corresponds to 2000 Calories per day).

      A whole woman consumes 100 watts. Of that brain is about 20 watts. Also, watt represents momentary consumption and calories are a fixed mass of energy, so you can't directly compare them.

      You can do so reasonably well for sedentary (that is, most) people. There is a basal metabolic rate that consumes most calories for most people and only throttles down 10% at night. Even if an average person were to run a mile every day, they would only be burning an extra 120 Kcal, 6% more than their baseline 2000 Kcal, during that ~10 minute energy burst. This regimen matches the CDC recommendations for exercise (75 minutes of intense activity a week), but 80% of Americans don't even make it to this level.

      So quoting an average energy consumption rate for the body is a reasonable approximation of the situation.

      --
      Starships were meant to fly, Hands up and touch the sky - Nicky Minaj
    29. Re:Efficiency by jones_supa · · Score: 1

      I just missed the "per day" for calories, that's all. Yep, I agree with you.

    30. Re:Efficiency by Anonymous Coward · · Score: 0

      It's an option, not a requirement in any case (only occasionally)!

    31. Re:Efficiency by cwsumner · · Score: 1

      Compilers are not smart enough for this.

      Actually, they are if the programmer writes code correctly. But half of the "smarts" must be in the OS's "task scheduler" that coordinates assigning processors to threads.

      Note:
      A program is not a thread.
      A thread is not a processor.
      These are assigned dynamically as needed.

  4. Hyperbolic headlines strike again by Anonymous Coward · · Score: 1

    There's no such thing as cars
    We have to admit to ourselves that there is no one size fits all mode of transportation and that we require a large spectrum of different transportation methods tailored to the individual application.

    1. Re:Hyperbolic headlines strike again by tepples · · Score: 1

      Yeah, when I read the headline it sounded like someone was trying to undermine the concept of a "general-purpose computer" in favor of locked-down appliances for specific tasks.

    2. Re:Hyperbolic headlines strike again by Bengie · · Score: 1

      A "general purpose processor" is really a processor with a bunch of specialized execution units, each one processing data serially.

    3. Re:Hyperbolic headlines strike again by Thor+Ablestar · · Score: 1

      You are right.

      Quite often the salesmen approach me with their attempt to sell me an iPad. I usually answer: "Well, but please add the development kit". Their reaction is epic.

    4. Re:Hyperbolic headlines strike again by Anonymous Coward · · Score: 0

      Epic? Really Thor? Do they go to war? Do you hit them with your giant mythical hammer? Or do they in fact offer $50 off a new macbook pro?

    5. Re:Hyperbolic headlines strike again by Zontar+The+Mindless · · Score: 2

      Come to Scandinavia, where there are lots of folks named "Thor" or "Tor", including the guy who lives just above me. He's a handyman and swings a very real hammer.

      --
      Il n'y a pas de Planet B.
    6. Re:Hyperbolic headlines strike again by TheRaven64 · · Score: 3, Informative

      I'm the author of TFA. There's a big difference between a general purpose processor and a general purpose computer. A lot of current research in computer architecture is focussed on the idea that you have a sharp divide between accelerators and general purpose CPUs. The point of the article is that different CPU microarchitectures are specialised for different workloads (one of the cited results was that in a big.LITTLE arrangement, the A7 core runs one of the SPEC benchmarks faster than the A15 because of its lower cache access time, for example) and that there are a lot of assumptions about the kind of code that the general purpose core will run. Many of these are true for C code, but a lot less true for code written in other languages. The communication patterns that mainstream multicore processors are optimised for are heavily tied to C, to the extent that if you have a language with a shared-nothing abstraction and message passing then the only way of implementing it is horrendously inefficient at the hardware level.

      --
      I am TheRaven on Soylent News
    7. Re:Hyperbolic headlines strike again by smallfries · · Score: 4, Insightful

      A lot of the value in your article is lost by trying to shoehorn "general purpose processors" into an argument about task-optimisation. The difference between properties relating to computational power and those relating to performance is really basic textbook stuff that we teach to undergraduates. Being able to run any program, and being able to run any program efficiently, is a difference taught in undergraduate architecture courses.

      The parts of your article that are interesting and valuable would have been better served by a narrative that does not rely on a straw man. Cleanly separating the issue of power / performance and explaining that task-neutral optimisation is impossible would have been a better article, and one that would have been easier to write. There is a natural analogy with representation-bias in machine learning that would have provided more explanatory power without the unnecessary rhetoric. I know it's the queue, but even so I am a little disappointed in your reviewers.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    8. Re:Hyperbolic headlines strike again by BadDreamer · · Score: 1

      This is why we at one time had Lisp Machines with specialized hardware optimized for running Lisp efficiently. Message based machines were tried for Smalltalk.

      But people do not use these kinds of languages enough. Operating systems and applications are largely written in C and its derivatives. That is why processors optimized for C won out.

      So yes, it is in a way a vicious circle. Most of our software is C, so most of our hardware is optimized for C, so writing software in C makes the most efficient use of it.

    9. Re: Hyperbolic headlines strike again by Anonymous Coward · · Score: 0

      "Epic" as in "they call security and have you pepper-sprayed and bodily carried out of the premises"?

    10. Re:Hyperbolic headlines strike again by Dutch+Gun · · Score: 1

      Interesting article.

      IMO, over-specialization was the reason the PlayStation 3 and its Cell processors never really lived up to their potential. While they were amazing at crunching raw numbers in highly-serialized batches (they were originally designed for video processing, remember), they're not really so great at processing the type of wildly diverse data and tasks that videogames typically require. These processors were simply designed for the wrong types of tasks - too specialized, essentially. In this case, the arguably weaker but more general-purpose-task-friendly CPUs of the Xbox 360 could tend to execute equivalent tasks with far less porting or rewriting work on the programmer's part. One of the dangers of creating more narrowly-focused CPUs is that if you miss the mark, you may end up performing no better or even worse than a less focused processor.

      In the end, it's a bit of chicken-and-egg, I suppose. C was intentionally designed to be extremely efficient on "general-purpose" processors of the time, and processor developers today create optimizations specifically for large programs written in C/C++ (or more precisely, for the patterns of machine language generated by C/C++ compilers), because that's what the most CPU-intensive large programs are written in today. On the other hand, it's harder to think of practical optimizations other than rather obvious things such as large, on-CPU caches (because physics), which tends to favor programs written with that specific paradigm in mind, especially not without sacrificing current speeds for the existing computing paradigms.

      A language that is not largely cache-coherent, for example, is going to be working against most CPU architectures found in widespread deployment, and I don't think that's a coincidence - my guess is that efficiently predicting code and data to reduce memory latency is a really tricky problem to solve in practice, even if theoretical solutions may exist. As such, certain languages may simply have a big leg up here in terms of efficiency. Note, by the way, that cache-coherency isn't an inherent property of C (and many C++ constructs actively discourage good cache coherency) - it's just *possible* to write cache-friendly code, whereas in many languages it's largely impossible, or at least highly impractical.
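
The difference between cache-friendly and cache-hostile access patterns can be illustrated with a toy model. The sketch below is not a model of any real CPU: it assumes a single fully associative LRU cache of 32 lines, 64-byte lines, 8-byte elements, and a 64x64 matrix stored row-major, all of which are made-up round numbers. The names `count_misses`, `row_major` and `col_major` are hypothetical helpers for this example.

```python
from collections import OrderedDict

LINE_BYTES = 64   # assumed cache line size
ELEM_BYTES = 8    # assumed element size (e.g. a double)


def count_misses(addresses, cache_lines):
    """Count misses for an address trace in a toy fully associative LRU cache."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def row_major(n):
    """Addresses touched when walking a row-major n x n matrix row by row."""
    return [(i * n + j) * ELEM_BYTES for i in range(n) for j in range(n)]


def col_major(n):
    """Addresses touched when walking the same matrix column by column."""
    return [(i * n + j) * ELEM_BYTES for j in range(n) for i in range(n)]


N = 64
row_misses = count_misses(row_major(N), cache_lines=32)
col_misses = count_misses(col_major(N), cache_lines=32)
print(f"row-by-row:       {row_misses} misses out of {N * N} accesses")
print(f"column-by-column: {col_misses} misses out of {N * N} accesses")
```

In this toy setup the same data and the same amount of work produce eight times as many misses when walked in the wrong order, which is the kind of gap a language either lets you control or hides from you.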

      In the broader sense, you're correct of course that there's really no such thing as a "general-purpose processor", because there will always be design biases. Still, I think you're going to lose the semantic battle there. In my earlier example, I described the Xbox 360's processors as being "general purpose", but what I really meant, of course, was "able to execute arbitrary compiled C++ game code reasonably efficiently". When describing anything as "general purpose", there's always a slight assumption about what the "typical" purpose will be, even if left unstated. In other words, from an academic viewpoint, I think you're mostly correct, but are probably not going to get much traction in your arguments because of sheer inertia in the industry that heavily favors C/C++ as systems languages.

      --
      Irony: Agile development has too much inertia to be abandoned now.
    11. Re:Hyperbolic headlines strike again by Kjella · · Score: 2

      Still don't get it. The difference is that accelerators try to do one thing, or at least one class of problems, well at the expense of everything else. They optimize for the best case. CPUs do the same when they incorporate specialized instructions as "mini-accelerators" like AES-NI. But what sets general purpose processors apart is that they assume the worst and try to make all code perform, no matter how ugly. They optimize for flexibility, with an emphasis on minimizing the worst cases. Those are two broad and fundamentally opposite ideas, and while the implementations always differ somewhat, the design goals remain the same.

      First you say Turing-complete is a poor definition, because it doesn't optimize for performance at all. Then you postulate that a general purpose processor must be efficient at everything, which is obviously setting up a "No true Scotsman" fallacy. You can't optimize for everything at once, you can't create a car with the speed of a Ferrari, the cargo capacity of a truck, the passenger capacity of a minivan and the price and size footprint of a Morris Mini. So the point you're making in the headline is based on an absolutely absurd definition of a generalist.

      The difference you bring up between general purpose computing and general purpose computers is interesting though, what most people use their cell phone/tablets/laptops/desktops for might possibly be implemented with specialized software/hardware that makes a different trade-off than generic number crunching. But I wouldn't put money on it, I smell another Itanium in the making that'll emulate existing ARM/x86 code horribly. It would take an extreme amount of momentum to make that happen outside academic research papers.

      --
      Live today, because you never know what tomorrow brings
    12. Re:Hyperbolic headlines strike again by Chester_Lyons · · Score: 1

      There's no such thing as general purpose vehicles

      We have to admit to ourselves that there is no one size fits all mode of transportation and that we require a large spectrum of different transportation methods tailored to the individual application.

      FTFY

    13. Re:Hyperbolic headlines strike again by Anonymous Coward · · Score: 0

      But people do not use these kinds of languages enough. Operating systems and applications are largely written in C and its derivatives. That is why processors optimized for C won out.

      Processors tailored for common HLL operations won out, but C really had nothing to do with it. C was not very popular until the mid-late '80s, when all the major CPU architectures were well in place. It was mostly used for UNIX installations, which were mostly minis and mainframes that were designed before UNIX was widespread (some before UNIX was even invented).

    14. Re:Hyperbolic headlines strike again by TechyImmigrant · · Score: 1

      I thought the most interesting part of the article was the observation that MMUs on all modern processors are geared to the unix model of memory protection and virtualized memory and they let go of the Multics model, which was about 99.9% of the OS course when I was in college back in the stone age.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    15. Re:Hyperbolic headlines strike again by TheRaven64 · · Score: 1

      But what sets general purpose processors apart is that they assume the worst and try to make all code perform, no matter how ugly. They optimize for flexibility, with an emphasis on minimizing the worst cases

      Read TFA. They optimise for a specific category of algorithm, that is branch heavy (although comparatively light in computed branches), has strong locality of reference, is either single-threaded or has shared-everything parallelism, and a few other constraints. That's not a general purpose processor, that's something optimised for a specific workload and, because they've been the cheapest way of buying processing power for a few decades, people put a lot of effort into trying to shoehorn algorithms to have those characteristics. As GPUs became cheaper per FLOPS, people tried to shoehorn algorithms to fit on processors that are optimised for code with almost no branches, little locality of reference, explicit parallelism and synchronisation, and highly predictable memory accesses. These are also not general purpose processors. They are two points on a design space and we're going to see a lot more as it becomes increasingly cheap to put rarely-used processors on die. If you can only power 5% of the chip at any time, then you can afford to have a load of different pipelines optimised for very different classes of algorithm on the same die, even if they have the same (or mostly the same) instruction set and some of them can run code intended for any of them (albeit slowly and inefficiently).

      --
      I am TheRaven on Soylent News
    16. Re:Hyperbolic headlines strike again by TheRaven64 · · Score: 1

      This is why we at one time had Lisp Machines with specialized hardware optimized for running Lisp efficiently. Message based machines were tried for Smalltalk

      The main reason that Lisp machines lost out was that they were stack based. Stack-based instruction sets don't (easily) expose any instruction-level parallelism, which means that you can't easily extend them to take advantage of pipelining. That wouldn't have been such a problem if Lisp had been parallel (a barrel-scheduled multithreaded stack-based CPU can be very simple to design, have very good instruction cache usage, and get good power / performance ratios), but Lisp machines ran single-threaded environments. I don't know of any machines (other than the Mushroom project from Manchester) that were designed for Smalltalk - it originally ran with custom microcode on the Alto - but the most successful message-passing machine was the Transputer, optimised for Occam code. Erlang has a similar abstract model and it runs in telecoms systems, but on CPUs that are very poorly optimised for it.

      --
      I am TheRaven on Soylent News
    17. Re:Hyperbolic headlines strike again by Kjella · · Score: 1

      Read TFA. They optimise for a specific category of algorithm, that is branch heavy (although comparatively light in computed branches), has strong locality of reference, is either single-threaded or has shared-everything parallelism, and a few other constraints. That's not a general purpose processor

      I did, you're still being silly. It's easy to run non-branching code on a branching processor, it's almost impossible to do the opposite. That is why we call branching processors general purpose. It's easy to run code with weak locality on a processor with strong locality, it's almost impossible to do the opposite. That is why we call processors with strong locality general purpose. It's easy to run parallel code on a sequential processor, it's almost impossible to run sequential code on a parallel processor. They won't run them well, for example software rendering on CPUs is horribly slow but it's still orders of magnitude better than trying to use your GPU as a CPU.

      --
      Live today, because you never know what tomorrow brings
    18. Re:Hyperbolic headlines strike again by TheRaven64 · · Score: 1

      I did, you're still being silly. It's easy to run non-branching code on a branching processor, it's almost impossible to do the opposite.

      True, although you end up keeping a large amount of the die powered for no benefit at all. Similarly, you can run sequential code on a GPU by just leaving most of the threads powered and doing nothing.

      It's easy to run code with weak locality on a processor with strong locality, it's almost impossible to do the opposite.

      Not true. This is one of the reasons why GPUs and DSPs are significantly faster. There are large categories of algorithms with predictable access patterns that can leave a CPU with a conventional cache hierarchy (even with prefetch instructions) completely data starved. To load a value into a conventional CPU, you have to hit two or three layers of cache miss, each of which then has to pull in a complete cache line (typically 64 bytes). Meanwhile, a DSP can be sending memory requests at word granularity to the DRAM. Even with a quarter of the memory bandwidth, it can often achieve more throughput than a commodity CPU.
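
      A back-of-the-envelope sketch of the bandwidth argument above. The numbers are assumptions for illustration, not measurements: a 4-byte word, an access pattern that uses only one word out of each 64-byte line, and a DSP with a quarter of the CPU's raw DRAM bandwidth.

```python
# Useful bandwidth when each access drags in a whole cache line.
# All figures are illustrative assumptions, not measurements.

def useful_bandwidth(raw_bandwidth, bytes_used, bytes_fetched):
    """Bandwidth that carries data the algorithm actually consumes."""
    return raw_bandwidth * bytes_used / bytes_fetched

CACHE_LINE_BYTES = 64   # typical line size mentioned in the comment
WORD_BYTES = 4          # assumed word size

cpu_raw = 1.0           # normalise the CPU's raw DRAM bandwidth to 1
dsp_raw = 0.25          # "a quarter of the memory bandwidth"

# Worst case for the CPU: a strided walk that touches one word per line.
cpu_useful = useful_bandwidth(cpu_raw, WORD_BYTES, CACHE_LINE_BYTES)
# The DSP requests exactly the words it needs.
dsp_useful = useful_bandwidth(dsp_raw, WORD_BYTES, WORD_BYTES)

print(f"CPU useful fraction: {cpu_useful:.4f}")
print(f"DSP useful fraction: {dsp_useful:.4f}")
print(f"DSP advantage: {dsp_useful / cpu_useful:.1f}x")
```

      In this toy model the gap narrows as the access pattern gets denser, and once the whole line is used the CPU wins on raw bandwidth.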

      They won't run them well, for example software rendering on CPUs is horribly slow but it's still orders of magnitude better than trying to use your GPU as a CPU.

      Your GPU is also Turing complete, so aside from the memory protection aspects (which actually are present on some modern GPUs), your argument applies in reverse too. You can run sequential code on a GPU: only use one thread. You can run code that branches a lot, you'll just take a load of pipeline flushes as a penalty. You can run code that exhibits locality of reference, you'll just end up fighting the memory controller. So does that mean that your GPU is a general-purpose processor? In both directions, your performance overhead for using the wrong processor is a couple of orders of magnitude.

      --
      I am TheRaven on Soylent News
    19. Re:Hyperbolic headlines strike again by Mr+Z · · Score: 1

      I think part of the problem is that the axes aren't linear. If you know the problem you're trying to tackle a priori you can tackle it with multiple orders of magnitude greater efficiency. For a fully specified, unchanging problem, I'd expect 3 orders of magnitude or better in most spaces, because you'd build exactly the hardware you need, and strip away all the hardware that supports unneeded programmability: you build a hardwired ASIC. Even in the programmable space, spending a bit of effort matching your problem to your processor can bring huge gains in efficiency, at least 5x. Also, consider that efficiency isn't just run time, but rather a function of power, performance, and cost.

      The algorithms that run on a hearing aid would sap the hearing aid's battery before they were even fully loaded if you tried to run them on a typical desktop processor. But, they're baked down to a hyperefficient DSP or ASIC that's tuned specifically for the problem.

      You cite a SPEC benchmark that runs faster on an A7 than an A15. Is that in clocks or wall-clock time? I suspect it's dominated by pointer dereferences, such as a linked list traversal. Load-to-use latency (which isn't a function of cache organization, but rather pipeline depth) becomes a dominant term for those workloads.

      Backing up a bit: My problem with your thesis is that you assume there's a "best GPP" and then seek to prove there's no one processor that could possibly be that on the basis that across random applications, the winner varies. Your argument seems to be, at the limit: "if you don't tell me your application ahead of time, I can't pick a best processor, so therefore there's no general purpose processors."

      It's the other way around. There's a cluster of processors that are OK at a range of random tasks. They're distinguished from special-purpose processors by the fact that the special purpose processor performs at least 5x or more (and likely orders of magnitude in some cases) better than the average for the cluster. That's true even if some of the processors in that cluster are 2x more efficient than the others. A processor is a GPP if there's few or no problems for which it's orders of magnitude more efficient than its cohort. 2x is nothing to sneeze at, but a specialized processor should reach much higher. 5x at a minimum.

      And please note I'm mentioning efficiency. It's not raw cycles or even wall clock time. Maybe a better measure is "energy per function", or "energy per function per dollar." (Although the latter is a bit dubious, as you buy the hardware once, but you use it many, many times. Lifetime costs are best approximated by energy costs over the lifetime of the device, if you're doing significant compute.)

      You mention GPUs. Sure, GPUs provide cheap FLOPs, and they can even start to run arbitrary C programs. But, what percentage of those FLOPs get utilized when running random programs? You might get a 4x speedup offloading some algorithm to your video card, but is that a win when your video card's raw compute power is 100x your host CPU's? Would you buy a Windows machine powered only by a GPU, running everything from your statistical regression to your web browser?

      (I may exaggerate, but only slightly.)

      To me, "general purpose" means, "I run the compiler, and for the most part, I get what I get. If there's some hotspots, maybe I can tune for this specific architecture. Most of the time, I don't worry." Specialized means "by selecting this processor for this task, I know up front I need to spend time optimizing the implementation of the task to this processor."

      Perhaps the qualm is that really that's more a function of the application than the processor. OK. I can buy that. But, when you look across the space of processors that get deployed in that way, you'll see that most processors tend to end up on one side or the other of that line fairly often, and few are on the fence. You find very few DSPs and GPUs asked to run Linux or Windows kernels and applications (the core code, not the stuff they compile to be offloaded, say, in a shader language). You find some number of x86s asked to run signal processing applications, but only where they can afford the cooling.

  5. Saturday is Semantics Day by Crashmarik · · Score: 2, Insightful

    It isn't hard to understand: when people talk about specialty processors, they are talking about things that are explicitly designed to make particular tasks easier,
    vs. general processors, which are designed to have wider application while not being as fit for any particular task.

    Not going to say this is correct, but it's pretty easy to put together the exact opposite argument of the author's. That specialty processors should be treated very carefully and their use limited. After all each time you add a different type of specialty processor into an environment, you introduce another codebase for the application, another toolchain to learn and another set of communication / OS support issues.

    1. Re:Saturday is Semantics Day by phantomfive · · Score: 2

      It isn't hard to understand: when people talk about specialty processors, they are talking about things that are explicitly designed to make particular tasks easier, vs. general processors, which are designed to have wider application while not being as fit for any particular task.

      I believe his point (in the context of high performance algorithms) is that 'standard' processors vary so much in their performance, and in the cases they optimize for, that there is no general purpose optimizer that you can assume will work in a certain way for a given algorithm. Some algorithms will work better on processor A, and others on processor B, even though they both are 'general purpose' processors.

      In other words, if you are studying high-performance algorithms, and are writing papers about high-performance algorithms, then you better have a deeper understanding of your processor than just 'general purpose.' General purpose processors vary too much for that to be a meaningful term in this context. Good advice, nothing particular to see here.

      --
      "First they came for the slanderers and i said nothing."
    2. Re:Saturday is Semantics Day by __aaltlg1547 · · Score: 1

      As an electronics designer, I believe he's dead wrong. Specialty ICs become obsolete almost immediately on deployment. You can replace a general purpose processor, or a special purpose processor with a general purpose processor. And you can upgrade the function of a product.

      Fuck overspecialization.

    3. Re:Saturday is Semantics Day by Crashmarik · · Score: 1

      So true, remember when C-Cube was going to be the only way to encode/decode video?

    4. Re:Saturday is Semantics Day by Zontar+The+Mindless · · Score: 1

      Oh, yeah, the video-on-a-chip-and-only-on-a-chip people. Was always surprised that Sun didn't try to sue them over their logo.

      --
      Il n'y a pas de Planet B.
    5. Re:Saturday is Semantics Day by Sqr(twg) · · Score: 1

      After all each time you add a different type of specialty processor into an environment, you introduce another codebase for the application, another toolchain to learn and another set of communication / OS support issues.

      That will be an issue only for the OS and library developers. To the applications developer there will be no noticeable difference. It is already the case that you need to use specialized libraries to get maximum performance on common types of tasks.

      For example, if you want to use an FFT on a modern "general purpose" processor, you will get much better performance using a standard library function than you would if you wrote your own. There are so many issues with memory access patterns, core and cache utilization, etc. that you will never have time to figure out if you just want to use the FFT (rather than do research on the algorithm itself.)

      If a future CPU gets a built in FFT, then the standard library will be updated, and your application will just run faster. No modification necessary.

    6. Re:Saturday is Semantics Day by solidraven · · Score: 1

      True, there is a need for speciality IP blocks though for those few applications where it does matter. But at that point using an ASIC is probably not the best choice.

      What I do expect to see, given the recent Intel announcement, is FPGAs showing up more and more as co-processors. There is a lot of speed to be gained by reconfiguring the hardware for when you have to crunch through a few gigabytes of data like decoding/encoding a video stream or running a query on a massive database. The only real "speciality" processors that you do still see showing up quite often are DSP processors, but that's simply because that is a market with a rather high demand for cheap chips that can process a lot of data quickly without having to care too much about anything else.

      But if you're running into a low volume special purpose application, just throw an FPGA at it. Saves you the headache of working your way around ASIC design errors afterwards, and you can transfer all the blame for hardware failures to Altera, Xilinx, etc.

    7. Re:Saturday is Semantics Day by __aaltlg1547 · · Score: 1

      There's a lot of FPGA-based SOCs that contain embedded microprocessor cores in them. (e.g. Xilinx's Virtex and Zynq, Altera's Cyclone, Arria and Stratix families, Microsemi's SmartFusion2). We may see that flip the other way so there's a very high-end core or several of them in a SOC, with FPGA logic to allow pin reconfiguration and large CLBs for speeding up or offloading processes.

      That might spawn new operating systems that manage how you use and configure the CLB and pin configuration. Do they have to remain stable from boot time until you reset the machine, or can software reconfigure them on the fly? (There are already some FPGA families that allow partial reconfiguration of the FPGA while running.)

    8. Re:Saturday is Semantics Day by Crashmarik · · Score: 1

      In an ideal world yes. In the real world not so much. Just take a look at the fun we have with graphic accelerators at this point.

    9. Re:Saturday is Semantics Day by solidraven · · Score: 1

      Oh yeah, don't tell me about those. I still have a headache of getting those PowerPC cores to work on those old Xilinx FPGAs. It was possibly the worst ever experience of my life to get that one going. And then the entire thing was missing a few bytes of memory to fit the entire character set in it... A configurable crossbar is already quite common in embedded systems, though there isn't much use for them in a general purpose CPU I guess since the DMA controller is in charge of those things.

      The on the fly FPGA reconfiguration is currently still quite the hassle, at least based on my personal experiences with it. The hardware support for it is quite ok by now. It's mostly due to the software tools not really supporting it. If you manually write the files and tell the floor plan management software to leave a gap in a certain spot you can usually fit it in. Now I don't think you'll really need a new OS for it, but the drivers for such a thing would get quite complicated if the FPGA cores aren't standardized. And that folks is why I ran away to do analog electronics!

    10. Re:Saturday is Semantics Day by __aaltlg1547 · · Score: 1

      Agreed, FPGA hardware in the CPU would need standardization to get OS level support. The main use for these now is in places where you'd otherwise need both a microprocessor and an FPGA to manage specialized hardware, and you don't have the volume and budget to pay for an ASIC. They haven't been widely thought of as a means to extend CPU functionality in ways that are expensive or slow to do in a CPU but fast and cheap to do in a logic array.

      But there are many ways to skin that cat, and the one most in favor right now is to embed what would in the past have been ASICs in the microprocessor. (memory managers, PHYs, USB hubs, GPUs).

    11. Re:Saturday is Semantics Day by solidraven · · Score: 1

      Well, a lot of USB hosts and hubs at this point are really just 8051 MCUs. The question is at which point we'll end up with the transputer again if we proceed down this line?

  6. General Purpose Defeats Patents So Here We Go !!!! by Anonymous Coward · · Score: 1

    If Special Purpose Processors can defeat new patent law that says that computer programs running on general purpose computers cannot automatically be considered for patent protection, but the same on Specific Purpose Computers, i.e. Phones, can be patented, then it's not hard to see where this crap gestated. Microsoft anyone? Do you think research grants get paid for anything but a corporation's own special purpose interests? This is the beginning of another end run around the new patent laws currently starting to hold back the patent cartel.

  7. If it's fast enough, "general purpose" is fine by Anonymous Coward · · Score: 4, Insightful

    If a "general purpose" processor solves your problems fast enough, it's good enough.

    How the fuck is that "harmful"?

    Geez, you'd think TFA is just a blowhard looking for page hits.

    1. Re:If it's fast enough, "general purpose" is fine by pushing-robot · · Score: 0

      Because it is inefficient. In addition to higher energy bills, a less efficient architecture means shorter battery life in a mobile device, more noise in a desktop PC, and fewer servers per rack in a datacenter.

      --
      How can I believe you when you tell me what I don't want to hear?
    2. Re: If it's fast enough, "general purpose" is fine by Anonymous Coward · · Score: 0

      The problem today is electrical power efficiency, not speed

    3. Re:If it's fast enough, "general purpose" is fine by Anonymous Coward · · Score: 3, Informative

      Because it is inefficient. In addition to higher energy bills, a less efficient architecture means shorter battery life in a mobile device, more noise in a desktop PC, and fewer servers per rack in a datacenter.

      There are "general purpose" microcontrollers that use microWatts of power. That can run on one tiny watch battery for years.

      For example, http://www.microchip.com/wwwpr...

      From datasheet,

        * 30 microAmps per MHz
        * 20 nanoAmps in sleep

      So a 100mAh 3V watch battery would last 570 years on sleep mode and 3-5 months operating at 1MHz. Or at 31kHz, with some sleep it should operate for years on a button cell.

      And it's a fully programmable, general purpose microcontroller.

      So what's the problem? Too inefficient?
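
      The arithmetic behind those battery-life figures can be checked with a short sketch. It assumes an ideal battery (no self-discharge), no peripheral current, and that active current scales linearly with clock speed down to 31 kHz, none of which the datasheet excerpt above guarantees.

```python
# Battery-life arithmetic from the datasheet figures quoted above.
# Ignores battery self-discharge and any peripheral current (assumption).

HOURS_PER_YEAR = 24 * 365.25

def runtime_hours(capacity_mah, current_ua):
    """Hours a battery of capacity_mah lasts at a constant draw of current_ua."""
    return (capacity_mah * 1000.0) / current_ua  # mAh -> uAh, then divide by uA

CAPACITY_MAH = 100.0      # watch battery from the comment
SLEEP_UA = 0.020          # 20 nA in sleep
ACTIVE_UA_PER_MHZ = 30.0  # 30 uA per MHz

sleep_years = runtime_hours(CAPACITY_MAH, SLEEP_UA) / HOURS_PER_YEAR
active_1mhz_months = runtime_hours(CAPACITY_MAH, ACTIVE_UA_PER_MHZ * 1.0) / (HOURS_PER_YEAR / 12)
# Linear scaling with clock frequency is an assumption, not a datasheet figure.
active_31khz_years = runtime_hours(CAPACITY_MAH, ACTIVE_UA_PER_MHZ * 0.031) / HOURS_PER_YEAR

print(f"sleep: {sleep_years:.0f} years")
print(f"1 MHz: {active_1mhz_months:.1f} months")
print(f"31 kHz: {active_31khz_years:.1f} years")
```

      Under those assumptions the sleep figure comes out at about 570 years and the 1 MHz figure falls inside the quoted 3-5 month range.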

    4. Re:If it's fast enough, "general purpose" is fine by Anonymous Coward · · Score: 0

      it exists though, and I'm writing this using a cpu that is a fairly general purpose chip.
      of course, it's tied in with a lot of specialized chips which process data for io.

      the article is stupid. especially given that, of the number of "processors" produced, only a fraction are general purpose...

      and that includes mobile phone chips and other chipsets which implement let's say video decoding or encoding on silicon that can be "dead" for most of the time.

    5. Re:If it's fast enough, "general purpose" is fine by Ihlosi · · Score: 2, Insightful

      Because it is inefficient.

      And building a separate processor for each of the nearly infinite number of possible tasks out there isn't?

      Especially when it comes to integrated circuits, mass production of one product is what makes the production process cheap and efficient.

      Also, at some point, you need to say "good enough/fulfills our requirements". Yes, you might save a bit of power by coming up with your own chip design, but designing an ASIC is not a trivial task and in the end your product might be three times as expensive and half a year late.

    6. Re:If it's fast enough, "general purpose" is fine by JaredOfEuropa · · Score: 2

      How the fuck is that "harmful"?

      Because every time you believe in a general purpose processor, a kitten dies

      --
      If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
    7. Re: If it's fast enough, "general purpose" is fine by Thor+Ablestar · · Score: 1

      You are not right. There are lots of useful strictly sequential algos. The mplayer, for instance. It loads one of 8 cores to 100%, other cores are idle. Some of its codecs can be parallel, most not. Some programs are intentionally sequential, litecoin miner for instance.

      And since I spend power for heating, the 100W processor gives as much heat as 100W resistor. Heat pump cannot operate since temperature difference is usually big enough.

    8. Re:If it's fast enough, "general purpose" is fine by Stephan+Schulz · · Score: 1

      If a "general purpose" processor solves your problems fast enough, it's good enough.

      How the fuck is that "harmful"?

      You miss the point. It's not the "general purpose processor" that is harmful per se. What is harmful is the labelling of a certain class of processors as "general purpose", when, in the view of the author, they are not really general purpose, but specialised for executing C code with, at most, mid-sized working sets and little inter-processor communication. By assuming this workload as the default and calling processors good for it "general purpose", we may miss other approaches that might be more suitable for certain classes of problems.

      --

      Stephan

    9. Re:If it's fast enough, "general purpose" is fine by ChrisMaple · · Score: 1

      C is a general purpose language, perhaps the most general purpose language. Processors optimized for C code are by default general purpose.

      --
      Contribute to civilization: ari.aynrand.org/donate
    10. Re:If it's fast enough, "general purpose" is fine by jedidiah · · Score: 1

      A general purpose processor is intended to do anything.

      General purpose processors are based on the idea that they aren't superstars at any one particular task. So they are pushed to perform as well as the tech will allow. This allows them to beat even the speciality silicon.

      Also, not all speciality coprocessors are created equal.

      A weak (but cheap) special purpose coprocessor will still underperform a general purpose CPU that's not hamstrung with certain peculiar engineering considerations.

      A general purpose processor is intended to do anything including that special purpose silicon you haven't even managed to build yet.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    11. Re: If it's fast enough, "general purpose" is fine by TechyImmigrant · · Score: 1

      >And since I spend power for heating, the 100W processor gives as much heat as 100W resistor. Heat pump cannot operate since temperature difference is usually big enough

      Yes. This is why the power efficiency of a computer is strictly a function of how you define 'useful work'.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  8. Clickbait Caption, but Valid Arguments by gentryx · · Score: 3, Insightful

    Of course general purpose CPUs exist, simply because we call them that way. But it is also true that each design has its own strengths, and "dark silicon" is another driver for special purpose hardware. Efficiency is another. Andrew Chien has published some interesting research on this subject. In his 10x10 approach he suggests using 10 different types of domain-specific compute units (e.g. for n-body, graphics, tree-walking...), each of which is 10x more efficient than "general purpose CPUs" in its domain (YMMV). Those compute units, bundled together, make up one core of the 10x10 design. Multiple cores can be connected via a NoC.

    Let's see how software will cope with this development...

    ps: can special purpose hardware exist if general purpose hardware doesn't?

    --
    Computer simulation made easy -- LibGeoDecomp
    1. Re:Clickbait Caption, but Valid Arguments by Anonymous Coward · · Score: 1

      Yes. It is possible for everything to be special-purpose.

    2. Re:Clickbait Caption, but Valid Arguments by Firethorn · · Score: 1

      ps: can special purpose hardware exist if general purpose hardware doesn't?

      Yes it can. After all the first 'computers' were dedicated code-breakers and such.

      However, there are many different levels of 'general purpose' or 'specialized'. Earlier 'cars' were mentioned as 'general purpose', but think about it, they're actually pretty specialized - they're generally designed for a max of 4-6 passengers, carry maybe a couple hundred pounds of cargo besides the passengers, drive mostly on paved roads, have a range of somewhere between 300 and 400 miles, etc... But within that you have various categories that specialize further - fewer passengers, cargo, more fuel efficient. More luxury or cheaper. Better off-road capability, more cargo space. Faster. So on and so forth.

      Such pieces as the op are good from the 'think about it' perspective, but your average computer is already a mess of specialized processors. The CPU is only 'general purpose' in the sense that it's supposed to coordinate between all of them. Anything that happens frequently enough or is expensive enough has its own dedicated unit. Graphics. Northbridge, southbridge, networking, USB, I/O in general, etc...

      --
      I don't read AC A human right
    3. Re:Clickbait Caption, but Valid Arguments by Blaskowicz · · Score: 1

      I believe the Atari Jaguar was a console made of special purpose chips, with an "orchestrator" 68000 CPU tacked on. The propaganda said you're not meant to use that CPU or something to that effect but of course the games heavily relied on it, some of them Amiga ports. The console was a big failure, lacking a usable SDK and documentation. Now consoles have an OS, APIs, middleware.

    4. Re:Clickbait Caption, but Valid Arguments by kesuki · · Score: 1

      "Of course general purpose CPUs exist, simply because we call them that way."

      the wisdom of those real world coders gone from this world is thus. a jack of all trades, is a master of none.

      this means simply that a general purpose FPGA that can modulate its functions can do a lot of different things but not at the same time, and in etched hardware the trade off is having dark silicon for all the tasks a true jack of all trades cpu can do.

      for instance, when i was doing video games all day the more games i played in diversity made it harder for me to 'master' a game. but the one game i mastered i could play at the 1% level rank of 800 out of 1000 on a game ladder. and because doing the same thing over and over made me good, i rebelled a bit and would have to rotate my build orders and strategy selection, though i once took a good undead vs human 1v1 strategy that requires building an item shop first and when the necessary tech of the item carry ability is done to send the ghoul out hunting the enemy base, equipped with a rod of necromancy for the hero and pick your best mercenary hero, i used tinker. then you focus on killing one militia and try not to let the ghoul die and use the rod of necromancy and a pocket factory to harass, only a human who goes fast footmen with a MK can beat this strategy so i took it to 13:1 after which i got bored with it.

      anyways what i mean is because i played so long and hard i got really good at the game. but i like a fpga had to map the circuits to get it to work. i had issues and quit playing games, though not forever. i played the game again a bit later and it took me a full month to get back to doing the same gameplay at my highest level, but then i quit it was too hard on me. i am very much like a fpga in that my functions can be practiced until i am good at them/set the optimal gate array, unlike a fpga the base ability to do stuff takes time for the programming to work right. even then i was only 800 or so of 1000 ranked players. (at the time they had around 30,000 players for the game online during normal usage times) like a fpga i was reoptimized for whatever i was doing, and so had a memory effect at the tasks i repeated frequently.

      anyways, a jack of all trades is unable to truly master something, that is the trade off for designing a chip to be that way, you can make a processor that does one thing well like gpus, you can have a fpga if you anticipate needs to switch its abilities, or you can make proper dark silicon that is switched on only when calls for that function are anticipated/occur. so really a general processing unit does exist and it is just inferior to specialty designs, because of how it is built to do everything a processing unit might need based on predictions of how a person will use it.

      in other words a dark silicon design is like a RV it can drive it can be slept in it can tow a car if needed and has a toilet and stuff, they cram everything you need in one box albeit at a price. of being bigger to accommodate anticipated needs. a fpga is like an autobot in the shape of a smaller rv, in that it can reconfigure itself to the needs and thus is lighter and more compact than the dark silicon. finally there is the gpu version this is not an autobot nor a rv but rather a car with a rocket on its back. it takes you from home to shop etc, but at speeds the autobots and the rvs can only dream of (ignoring the possibility of spaceships/starscream), but it can't do everything else it is only able to move you fast from point a to point b.

    5. Re:Clickbait Caption, but Valid Arguments by dbIII · · Score: 1

      I'd say by that point Atari was a huge failure as an entity instead of the actual device, as seen with other things where they developed products but did not follow through.

    6. Re:Clickbait Caption, but Valid Arguments by Anonymous Coward · · Score: 0

      The big pain about making heterogeneous devices is making all the heterogeneous parts work and work together. Verification is hard because now you've got many unique pieces of hardware. Chances are you didn't design all of it -- you bought some or you reused some from previous projects, done by people who are long gone -- so you don't have test cases for making all that stuff work together. So you are probably not going to get a working device on the first try and probably not one that performs well on the second try either.

    7. Re:Clickbait Caption, but Valid Arguments by mlts · · Score: 1

      What we might have happen is that we end up with a mix, where a core is weighted towards a task... but compared to running a job at say, 80% as effectively as a core that is built for the job, versus not running the task at all, the scheduler [1] would drop tasks on non-optimal cores if it would help performance. If it is something definitely not optimal (FPU instructions on an integer-only core), the weighting would account for that and might not even place a task on there come the next quantum.
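
      The weighting idea above could look something like the following toy placement policy. The core names, task kinds and weights are all hypothetical, and a real scheduler would also account for migration cost, fairness and power; this is only a sketch of "use the best idle core, and never a core that cannot run the task".

```python
# Toy placement policy for heterogeneous cores. Core names, task kinds and
# weights are hypothetical; a weight is relative efficiency (1.0 = built for
# the job, 0.0 or missing = cannot run it at all).

def pick_core(task_kind, cores, busy):
    """Return the idle core with the highest weight for task_kind, or None."""
    best_name, best_weight = None, 0.0
    for name, weights in cores.items():
        if name in busy:
            continue
        weight = weights.get(task_kind, 0.0)
        if weight > best_weight:
            best_name, best_weight = name, weight
    return best_name

CORES = {
    "fpu_core": {"float": 1.0, "int": 0.8},
    "int_core": {"int": 1.0},             # integer-only: no "float" entry
    "crypto_core": {"aes": 1.0, "int": 0.3},
}

# The ideal integer core is busy, so the task lands on the 80% core.
choice_when_busy = pick_core("int", CORES, busy={"int_core"})
# A float task is never placed on the integer-only core.
choice_float = pick_core("float", CORES, busy={"fpu_core"})

print(choice_when_busy, choice_float)
```

      Returning None here stands in for "leave the task queued until the next quantum".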

      The 10x10 is interesting. However, on an average desktop, we could see quite a number of cores that (in addition to the normal CPU/GPU/FPU lineup) would be useful with special purposes:

      1: A core running as a hypervisor, using something similar to ARM's TrustZone to ensure complete separation of tasks. This could be one core or perhaps two: a low-power one that uses very little wattage when the box is idle, and a faster one for when there is a lot of VM context switching.

      2: A core that deals with I/O, with a lot of cache. Until the cache fills up, this would turn a lot of random I/O into sequential I/O, which is a lot easier for a hard disk or RAID array to deal with. This could even use fast SSD as another level, although a RAM cache handled by the OS might be just as good. This core can also offload software RAID commands (such as what ZFS, Storage Spaces, LVMs, and btrfs do), so performance would be improved on that end without having a dedicated RAID controller. Adding a battery backed up write cache (especially if the OS knows about it and can work with that) would only help things.

      3: Cores to handle encryption are a given. AES is so often used that having space on a die to handle the S boxes and array shifting goes without saying.

      4: High power and low power cores. Some phones have two sets of cores, one for when the device is not doing much, and one for when it is in active use. For servers, this would come in handy because a DB server that doesn't get touched after 5:00 could just sit on a low-draw core, and when the DB starts getting hit by transactions, moved back to a faster one. This in itself would save a lot of wattage at the expense of die space.

      5: Similar to #1, but the core would have its own separate I/O, memory, and other space. This could be used for the hypervisor, or tasks that need to be isolated from everything in the machine. A Harvard architecture could be used to further prevent attacks like smashing the stack or heap.

      6: Ye Olde FPGAs. One never knows when these may come in handy, and having an application with oddball CPU requirements be able to use one may help performance immensely.

      7: A dedicated CPU just for scanning RAM space. This could be used for a host IDS/IPS, or performance/health monitoring.

      8: A dedicated NIC-like core whose purpose is optimized for packetization/depacketization. Pretty much a FCoE CNA, except with the added ability to work as a firewall outside of the host machine's OS. Around 10 years ago, some computers had "smart NICs", and could have firewalling code on there that would guard the machine (say, keep port 25 from going anywhere) even if the box's main OS was compromised. Give the machine SFP slots, and depending on the SFP inserted, the machine could use that slot as a FC HBA, a FCoE CNA, an iSCSI adapter, or "just" used for plain Ethernet traffic. That way, one doesn't have to even replace cards to move to a new SAN technology... just the media adapter gets replaced.

      I'm sure there are many others, but once making chips smaller has diminishing returns and going with sheer number of cores stops working, going with special purpose cores, or ones weighted towards a task, is the next step.

    8. Re: Clickbait Caption, but Valid Arguments by O('_')O_Bush · · Score: 1

      "jack of all trades, is a master of none"

      You do know that is just an idiom, and an incorrect one, at that, not a law?

      The idiom is "A jack of all trades is better than a master of one", which was later shoehorned into a description of someone who is "a Jack of all trades but a master of none". It is not intended to be folk wisdom that there is a tradeoff between mastering something and being proficient at lots of things... in fact, there are numerous examples of people being masterful in many arenas and also having high proficiency in many areas.

      Oftentimes we call things that do general purpose things very well and have mastery in areas "platforms", and you see them with guns, cars, software, and I could go on.

      Not arguing any other point, just that one.

      --
      while(1) attack(People.Sandy);
    9. Re: Clickbait Caption, but Valid Arguments by jkflying · · Score: 1

      "Jack of all trades, master of none" is the correct saying, it is just missing the ending: "still better than a master of one."

      --
      Help I am stuck in a signature factory!
    10. Re:Clickbait Caption, but Valid Arguments by Thor+Ablestar · · Score: 3, Insightful

      Of course. But the very origins of microprocessor revolution are being forgotten:

      Intel was to make 10 specialized chips. Intel made ONE universal programmable chip i4004 that replaced them all and spent development money ONCE.

      It's possible to make 10 special-purpose chips but it will cost 10 times more.

    11. Re:Clickbait Caption, but Valid Arguments by Anonymous Coward · · Score: 0

      The discussion isn't really about making 10 specialized processors, it's about making 1 processor with 10 (or actually, probably a lot more than that) specialized cores, but, due to the dark silicon problem, only able to use a few of them at a time.

    12. Re:Clickbait Caption, but Valid Arguments by Anonymous Coward · · Score: 0

      If you have 10 cores, that makes sense, but if you have 100, you can go much lower level. See GreenDroid for an example of how that works. The high-level idea is that they take Android, run it through a profiler (err... with some workload), and compile the most common basic blocks to hardware, grouped together into a bunch of small cores, each with a few basic blocks and a way to transfer to other cores as well as to a general purpose processor. Then they have a special compiler that can compile C code given the set of basic blocks available in hardware. (The two steps are separate so they can design the hardware based on an older version of Android and still be able to compile a newer version.)
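
      The profiling step described above can be sketched roughly as follows. This is only an illustration of "pick the most common basic blocks", not GreenDroid's actual tooling; the block names, the trace, and the `pick_hot_blocks` helper are all made up for the example.

```python
from collections import Counter

def pick_hot_blocks(trace, n_slots):
    """Return the n_slots most executed basic blocks and the share of the trace they cover."""
    if not trace:
        return [], 0.0
    counts = Counter(trace)
    hot = [block for block, _ in counts.most_common(n_slots)]
    covered = sum(counts[block] for block in hot)
    return hot, covered / len(trace)

# Hypothetical profile: each entry is the ID of one executed basic block.
trace = (["memcpy_loop"] * 50 + ["hash_step"] * 30 +
         ["parse_hdr"] * 15 + ["rare_init"] * 5)

hot, coverage = pick_hot_blocks(trace, 2)
print(hot, coverage)  # the two hottest blocks cover 80% of this made-up trace
```

      Anything not in the chosen set would still run on the general purpose processor, which is why the second compile step only needs to know which blocks exist in hardware.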

    13. Re:Clickbait Caption, but Valid Arguments by K.+S.+Kyosuke · · Score: 1

      I've had similar ideas about specializing the Forth cores of the F18 design in a grid. The good thing is that the specific instruction set extensions can be simply substituted by subroutine calls if you need to use them on other cores in cases of low dynamic frequency of their usage - the code is essentially concatenative, so it's a matter of simple string substitution. Even the OS loader or scheduler could do it very quickly on the fly.
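
      A rough sketch of that substitution, assuming a program is just a whitespace-separated string of words. The extension word `mac` and the subroutine name `mac-soft` are hypothetical, not part of any real F18 instruction set.

```python
def lower_for_core(program, supported, fallbacks):
    """Replace extension words this core lacks with calls to equivalent subroutines."""
    out = []
    for word in program.split():
        if word in fallbacks and word not in supported:
            out.append(fallbacks[word])  # plain name substitution, nothing else changes
        else:
            out.append(word)
    return " ".join(out)

# Hypothetical extension word "mac" with a software routine "mac-soft".
program = "x y acc mac dup ."
fallbacks = {"mac": "mac-soft"}

on_special_core = lower_for_core(program, {"mac"}, fallbacks)
on_plain_core = lower_for_core(program, set(), fallbacks)
print(on_special_core)
print(on_plain_core)
```

      Because the code is concatenative, the rewrite never has to look at neighbouring words, which is what would make it cheap enough for a loader to do on the fly.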

      --
      Ezekiel 23:20
    14. Re:Clickbait Caption, but Valid Arguments by K.+S.+Kyosuke · · Score: 1

      Actually, that's not exactly a good analogy. The SPEs were identical. They wouldn't be in this case.

      --
      Ezekiel 23:20
    15. Re:Clickbait Caption, but Valid Arguments by hairyfeet · · Score: 1

      Sheeeit, even the guys that build X86 chips can't design software that makes using just a handful of X86 cores as easy to use as writing for just one core, and he expects them to magically come up with libraries and compilers that will seamlessly switch between dozens of specialized cores, probably hundreds of times per second depending on the task?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    16. Re:Clickbait Caption, but Valid Arguments by drinkypoo · · Score: 1

      Earlier 'cars' were mentioned as 'general purpose', but think about it, they're actually pretty specialized - they're generally designed for a max of 4-6 passengers, carry maybe a couple hundred pounds of cargo besides the passengers, drive mostly on paved roads, have a range of somewhere between 300 and 400 miles, etc...

      The first automobiles were curiosities. But as soon as someone figured out you could move people, they knew you could move cargo, and the ones immediately following the first passenger cars were trucks. And a pickup truck is a general-purpose vehicle. It's not particularly good at anything, but it can do a little bit of everything. A sedan is a special-purpose vehicle for moving people and their personal cargo.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    17. Re: Clickbait Caption, but Valid Arguments by Hognoxious · · Score: 1

      What a load of bollocks. If you've got a tooth abscess you want a master of dentisfuckingtry and you won't give a rat's ass how good he is at carpentry or playing the guitar.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    18. Re:Clickbait Caption, but Valid Arguments by Anonymous Coward · · Score: 0

      The theory was that developers would assign different tasks for each core to guarantee bandwidth for critical tasks such as graphics processing.

      Such task parallelism is usually used in low core count symmetrical architectures as a low hanging fruit, and not on a streaming one such as the cell. The SPEs were cores with fast, local memories. Call them stream buffers if you like. There was the general purpose core as well.

  9. 1 Calorie per day = 48.4 mW by tepples · · Score: 5, Informative

    100 watts corresponds to 2000 Calories per day

    Also, watt represents momentary consumption and calories are a fixed mass of energy

    Calories and calories per day are not the same unit. A calorie is 4.18 kJ, and a calorie per day is 4.18 kJ / 86.4 ks = 48.4 mW. Multiply this by 2000 and you'll end up very close to 100 W.

    1. Re:1 Calorie per day = 48.4 mW by jones_supa · · Score: 1

      Oh, good point. Then it makes sense indeed.

    2. Re:1 Calorie per day = 48.4 mW by penguinoid · · Score: 1

      100 watts corresponds to 2000 Calories per day

      Also, watt represents momentary consumption and calories are a fixed mass of energy

      Calories and calories per day are not the same unit. A calorie is 4.18 kJ, and a calorie per day is 4.18 kJ / 86.4 ks = 48.4 mW. Multiply this by 2000 and you'll end up very close to 100 W.

      Close. A calorie is 4.18 J, but a Calorie (also known as a kilocalorie or a food calorie) is 4.18 kJ. In this case, capitalization matters.
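
      For anyone who wants to check the arithmetic with the units spelled out, here is a small sketch. It assumes the thermochemical value of 4.184 kJ per food Calorie; the function name is just for illustration.

```python
JOULES_PER_KCAL = 4184.0     # 1 Calorie (kilocalorie) = 4.184 kJ
SECONDS_PER_DAY = 86_400.0   # 24 h * 3600 s

def kcal_per_day_to_watts(kcal_per_day):
    """Average power in watts for a daily energy intake given in food Calories."""
    return kcal_per_day * JOULES_PER_KCAL / SECONDS_PER_DAY

one = kcal_per_day_to_watts(1)       # roughly 0.0484 W, i.e. 48.4 mW
diet = kcal_per_day_to_watts(2000)   # roughly 96.9 W
print(f"{one * 1000:.1f} mW, {diet:.1f} W")
```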

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    3. Re:1 Calorie per day = 48.4 mW by Anonymous Coward · · Score: 0

      Careful with your units here: Calorie (large C) and calorie (small c) are different units. A Calorie used conventionally is actually a kilocalorie. Otherwise, your math is correct...

    4. Re:1 Calorie per day = 48.4 mW by Hognoxious · · Score: 1

      I figured something was wrong, given that I can (or rather could) produce 250 watts on an exercise bike.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    5. Re:1 Calorie per day = 48.4 mW by penguinoid · · Score: 1

      Yes, but you can't keep that up for 24 hours a day.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    6. Re:1 Calorie per day = 48.4 mW by Neil+Boekend · · Score: 1

      Some can (for 1 day), but that has significant results for energy input requirements.
      And energy output maximum for a few days.

      --
      Well, I might have a way, but it only works on a semi spherical planet in a vacuum.
  10. Function List by The+New+Guy+2.0 · · Score: 1

    Anybody who's studied physics knows there's a lot of equations that ring true in the world, and that's the point of the mainstream Intel/AMD processors. Any programmer knows there's no limits to the number of functions you can write in C or VB, and languages like PHP place their own functions on top of a base of C, which can be compiled in to processor-specific code.

    So, a chip that can do everything in Excel or Mathematica would be the true General Purpose Processor. Intel Core i7 chips are close to getting there compared to a Pentium or Celeron chip. A RISC (Reduced Instruction Set Computing) chip by definition only knows the functions defined, and therefore can be fast at doing just that.

    So, the true General Purpose Computing chip doesn't exist yet, but they're working on it!

    1. Re:Function List by DivineKnight · · Score: 1

      Speaking of RISC, I have an OT question (but still technology related, so there's that): does anyone know where I can pick up a DEC Alpha CPU + MB (or machine) in about a month's time, with a CPU speed of >=500Mhz? I have this undying urge to tinker with one, but eBay's listing stuff for 266Mhz, or at a price that well exceeds my inner geek's wallet.

    2. Re:Function List by Anonymous Coward · · Score: 0

      Check the OpenVMS places, like comp.os.VMS or openvms.org. Sometimes people cleaning out a basement will dump stuff. Not that you have to run OpenVMS, though you can for free...

  11. stop thinking too hard by Anonymous Coward · · Score: 0

    if you have several different types of specialized cores on one cpu, you could call it a general-purpose processor.

  12. Emulation by tepples · · Score: 1

    So, a chip that can do everything in Excel or Mathematica would be the true General Purpose Processor.

    Yet such a chip can be emulated on a processor with reduced instruction set complexity because they're Turing equivalent: both are linear bounded automata. If you compile the same program in a high level language for a complex processor and a simple processor, they'll produce the same result. Each operation on the complex processor may correspond to several instructions on the simple processor, but ARM's bet with big.LITTLE is that reduced power consumption in a simple processor's instruction decoder makes up for that difference.
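
    As a toy illustration of one complex operation corresponding to several simple ones, here is a sketch in which a fused multiply-add is reproduced with nothing but addition and a countdown loop. Both functions are hypothetical stand-ins, not models of any real instruction set.

```python
def run_complex(a, b, c):
    """One 'complex' instruction: multiply-add, a * b + c."""
    return a * b + c

def run_simple(a, b, c):
    """Same result using only add and decrement (b must be a non-negative integer)."""
    acc = c
    count = b
    while count > 0:
        acc = acc + a        # one simple add per loop iteration
        count = count - 1
    return acc

result_complex = run_complex(7, 6, 5)
result_simple = run_simple(7, 6, 5)
print(result_complex, result_simple)
```

    The simple version takes more steps but arrives at the same answer, which is the whole point of the equivalence argument.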

    The true limits of "general purpose" are speed and size of memory, flexibility of input and output, and whether a device's manufacturer enforces restrictions against the device's owner on what software may run.

    1. Re:Emulation by Guy+Harris · · Score: 1

      If you compile the same program in a high level language for a complex processor and a simple processor, they'll produce the same result. Each operation on the complex processor may correspond to several instructions on the simple processor, but ARM's bet with big.LITTLE is that reduced power consumption in a simple processor's instruction decoder makes up for that difference.

      For big.LITTLE, the difference between the instruction decoders isn't an issue of different instruction sets; to quote their big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7 white paper:

      The central tenet of big.LITTLE is that the processors are architecturally identical. Both Cortex-A15 and Cortex-A7 implement the full ARM v7A architecture including Virtualization and Large Physical Address Extensions. Accordingly all instructions will execute in an architecturally consistent way on both Cortex- A15 and Cortex-A7, albeit with different performances.

      so each instruction on the complex processor would correspond to the exact same instruction on the simpler processor.

      As that paper says, "It is in the micro-architectures that the differences between Cortex-A15 and Cortex-A7 become clear."

    2. Re:Emulation by tepples · · Score: 1

      For big.LITTLE, the difference between the instruction decoders isn't an issue of different instruction sets

      I agree that the analogy is inexact, but there is still a difference in complexity between a big core and a little core. An in-order processor needs a less complex decoder than an out-of-order superscalar processor, just as RISC needs a less complex decoder than CISC. Thus the little cores use a less complex decoder compared to the big cores' more complex decoder to decode the same ARM instructions.

  13. What a useless paper by iamacat · · Score: 1

    Basically it's making a big deal out of the fact that today's commonly available hardware is optimized for today's commonly available software. Duh! General purpose is a term relative to purposes a particular person has in mind. Nobody is suggesting that Core i7 is capable of running Lt Cmdr Data.

    A genuinely interesting paper would have specific ideas for architecture capable of solving problems beyond the scope of current CPUs and GPUs.

    1. Re:What a useless paper by Anonymous Coward · · Score: 1

      It's demonstrably a pile of crap. The parts bin of failures is full to overflowing with more specialized designs, Transputer, Cell, Itanic. A couple of previous posters hit the reason - software is the hard bit, and it's a LOT easier to write software for the more generalized processors than it is for the specialized ones.

      By the time the specialized processor has realized its theoretical performance the generalized one is two generations ahead, cheaper and just as fast.

    2. Re:What a useless paper by AnyoneEB · · Score: 1

      A genuinely interesting paper would have specific ideas for architecture capable of solving problems beyond the scope of current CPUs and GPUs.

      A couple cool projects I've seen on making good use of dark silicon are GreenDroid and Chlorophyll, both of which are recent research projects on compiling for weird architectures that are specially designed to be energy efficient. If it's specialized for different applications that you want, then Anton is the closest I've seen; it's specialized for running physical simulations so it can do things like protein folding.

      --
      Centralization breaks the internet.
    3. Re:What a useless paper by K.+S.+Kyosuke · · Score: 1

      The parts bin of failures is full to overflowing with more specialized designs, Transputer, Cell, Itanic.

      Yes, but those are specialized symmetrical designs. It's not like you get a few extra transputers and one extra Itanic core with every two-core Haswell.

      By the time the specialized processor has realized its theoretical performance the generalized one is two generations ahead, cheaper and just as fast.

      Again, a blatantly false analogy. These extra units would have more in common with vector SSE/AVX units in your "general purpose" CPU. You don't drop them "two generations ahead" - Intel never did, and they're not going to drop AVX2 "two generations" after Haswell where they introduced them. You just don't power them if you don't need them for a while, and then use them to increase the performance-per-watt ratio in specific parts of your code.

      --
      Ezekiel 23:20
  14. Does he speak English? by Anonymous Coward · · Score: 0

    Chisnall seems to be confusing the expression "general purpose", which means "suitable for a wide variety of tasks", with an idea like "ideal for all tasks". He's criticizing arguments that no one is actually making.

  15. Old news is old by Anonymous Coward · · Score: 1

    Sure, tell any hardware engineer working on smartphones or tablets what they already know. Besides, it won't matter after a few years.

    Graphene looks the most promising successor to silicon. In part because the electron mobility of graphene is hundreds to thousands of times faster than silicon, and it leaks almost no heat or power, (though inducing a bandgap increases leakage as of now).

    Sure, for now stick some application processors in a phone to save a bit of power. In a decade or probably less just stick even a single core terahertz CPU in and you're done for 99% of all application processing. And that's 100% of 95% of the market, not 99% of tasks for everyone.

  16. General purpose: Efficiency not required by davidwr · · Score: 5, Insightful

    From TFA:

    It's therefore not enough for a processor to be Turing complete in order to be classified as general purpose; it must be able to run all programs efficiently. [emphasis added]

    Um, nope.

    A general-purpose anything is rarely as efficient at a given task as a special-purpose version of the same thing. Sometimes you really do want your computer chip to be a "Jack of all trades, master of none."

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  17. i also hear... by Anonymous Coward · · Score: 0

    there's no true scotsman

    1. Re:i also hear... by tepples · · Score: 1

      Sure there is: a British citizen who resides in Scotland.

    2. Re:i also hear... by Anonymous Coward · · Score: 0

      Sure there is: a British citizen who resides in Scotland.

      Aren't all Scotsmen inherently British citizens (I mean up until GB became the UK in 1800)? I didn't hear of any of the pro-Independence activists preemptively renouncing their citizenship, so unless you have some other angle I think you failed to produce a workable joke.

  18. ACM Queue brand decay by fche · · Score: 1

    There have been many mediocre articles on the ACM Queue in the last few years ... this fits perfectly.

  19. HMMM by koan · · Score: 1

    What about the human brain.

    --
    "If any question why we died, Tell them because our fathers lied."
  20. It's not all that many years ago by msobkow · · Score: 4, Insightful

    It isn't all that many years ago that the floating point was handled by either software emulation or a co-processor. Now we're using GPUs as co-processors. There are also audio designs that act as co-processors. Several enterprise systems have encryption co-processors. IBM is notorious for putting specialized processors in their mainframes. Several chips have the GPUs embedded on-chip already.

    I'd argue that putting specialized chips on-die doesn't affect the general-purpose nature of the compute core that controls those resources at all. The whole article is red herring trying to establish a distinction between on-chip and off-chip processing that has to do more with the scalability of silicon manufacturing techniques than it does any distinguishing feature of the designs.

    Let's face it -- if you want to really accelerate a task, you design silicon specifically for that task and interface it to a general purpose core. The article discusses nothing new in the world of computing.

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:It's not all that many years ago by SuricouRaven · · Score: 2

      AES encryption is built into all modern Intel CPUs, except a few Atoms.

      The enterprise crypto co-processors are mostly for RSA key generation. Something that's only done during connection setup, but can be a substantial load on a high-traffic SSL server that creates hundreds of connections each second.

    2. Re:It's not all that many years ago by K.+S.+Kyosuke · · Score: 1

      Note that a part of AMD's HSA is exactly this - allowing multiple heterogeneous units to do memory-based communication on top of the paged virtual memory, and providing a mechanism for the units to communicate with each other (by sending "command packets"). Now the current implementation in Kaveri is mostly about having a CPU and a GPU on the same chip, but adding other units - even almost-fixed-purpose ones like one that would be able to accept commands of the form of "encrypt [this block of memory] using [AES-128-CFB] with [this key]" - is fairly easy. So far, though, nobody had worked out the architectural framework, until AMD did. There have been (semi)standardized buses for chip design for yoking units together, but shared memory makes it somewhat easier for programmers - it's no different from multicore programming if the memory is the bus! And you don't need to send (large) packets of data to those units, only commands and data pointers.
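
      To make the "only commands and data pointers" idea concrete, here is a toy sketch. It is not HSA's real packet format; the `CommandPacket` fields are invented for illustration, and XOR is only a stand-in for a real cipher such as AES.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class CommandPacket:
    op: str       # hypothetical operation name; "xor" is a toy stand-in for encryption
    offset: int   # where the data starts in shared memory
    length: int   # how many bytes to process
    key: int      # single-byte toy key

def unit_step(mem, queue):
    """Fixed-purpose unit: pop one command and work on shared memory in place."""
    pkt = queue.popleft()
    if pkt.op == "xor":
        for i in range(pkt.offset, pkt.offset + pkt.length):
            mem[i] ^= pkt.key

shared_memory = bytearray(b"hello, accelerator")
original = bytes(shared_memory)
queue = deque()

# The CPU side enqueues a small command; the payload itself is never copied.
queue.append(CommandPacket("xor", offset=7, length=11, key=0x5A))
unit_step(shared_memory, queue)
scrambled = bytes(shared_memory)

# Sending the same command again undoes the toy transformation.
queue.append(CommandPacket("xor", offset=7, length=11, key=0x5A))
unit_step(shared_memory, queue)
print(scrambled != original, bytes(shared_memory) == original)
```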

      --
      Ezekiel 23:20
    3. Re:It's not all that many years ago by K.+S.+Kyosuke · · Score: 1

      You're sure it's RSA key generation? Or RSA encryption of the generated session key? I thought that RSA keys were generated when certificates are generated, not when the certificate is being used.

      --
      Ezekiel 23:20
    4. Re:It's not all that many years ago by SuricouRaven · · Score: 1

      No, I'm not sure. I'm not a cryptographer, so I'm not sure exactly what maths it's doing. All I can tell you is the practical side: There is something computationally intensive that happens during the setup stage of an SSL connection, and the main purpose of a hardware cryptographic accelerator is to do that something. Usually so that a webserver may handle many more SSL connections per second than CPU alone could handle. The other approach is an appliance that sits before the webserver and does the SSL stuff, so the webserver sees only plain unencrypted HTTP and the appliance handles the encryption transparently.

    5. Re:It's not all that many years ago by msobkow · · Score: 1

      When an SSL or HTTPS connection is established, the existing RSA key is used to negotiate the connection, but a connection-specific key is generated and shared over the RSA-keyed initial connection. It's that generation of the connection-specific key that is compute-intensive. If I recall correctly, that secondary key is usually used with a symmetric algorithm that can be processed faster than RSA encryption can be, with the caveat that it requires sharing the key, so it can only be safely used if the initial communication of the key is secure and the key is discarded after the connection closes.

      --
      I do not fail; I succeed at finding out what does not work.
    6. Re:It's not all that many years ago by tlhIngan · · Score: 1

      It isn't all that many years ago that the floating point was handled by either software emulation or a co-processor. Now we're using GPUs as co-processors. There are also audio designs that act as co-processors. Several enterprise systems have encryption co-processors. IBM is notorious for putting specialized processors in their mainframes. Several chips have the GPUs embedded on-chip already.

      I'd argue that putting specialized chips on-die doesn't affect the general-purpose nature of the compute core that controls those resources at all. The whole article is red herring trying to establish a distinction between on-chip and off-chip processing that has to do more with the scalability of silicon manufacturing techniques than it does any distinguishing feature of the designs.

      Let's face it -- if you want to really accelerate a task, you design silicon specifically for that task and interface it to a general purpose core. The article discusses nothing new in the world of computing.

      I think you've nailed what separates a "general purpose" CPU from a specialized core. Specialized cores are not meant to control other cores - your GPU isn't intended to manage data flowing from it to say, a crypto accelerator or even a NIC. It processes data handed to it and managed by something else.

      The general purpose core is the manager - it can choose to delegate (push the work to a specialized core) or do it itself (process it on its compute hardware).

      Crypto accelerators, GPUs, floating point units, vector units, I/O units, network controllers, audio processors, etc., none of those are designed to be autonomous and operate the system by itself. They rely on a general purpose core to manage the data flows through the system, setting up DMAs, chaining units as necessary to process the data, etc.
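
      The delegate-or-do-it-yourself role can be sketched in a few lines. Everything here is hypothetical: `ManagerCore`, the task name, and the lambda standing in for a specialized unit are only there to show the control flow.

```python
class ManagerCore:
    """Toy model of a general purpose core that delegates work when it can."""

    def __init__(self):
        self.units = {}   # task name -> callable standing in for a specialized unit
        self.log = []     # record of where each task ran

    def attach(self, task, unit):
        self.units[task] = unit

    def run(self, task, data, fallback):
        unit = self.units.get(task)
        if unit is not None:
            self.log.append((task, "delegated"))
            return unit(data)
        self.log.append((task, "general-purpose core"))
        return fallback(data)


def checksum_in_software(data):
    """Plain loop the manager core can always fall back on."""
    total = 0
    for byte in data:
        total = (total + byte) & 0xFFFF
    return total


cpu = ManagerCore()
payload = bytes(range(10))
before = cpu.run("checksum", payload, checksum_in_software)  # no unit attached yet
cpu.attach("checksum", lambda d: sum(d) & 0xFFFF)            # hypothetical accelerator
after = cpu.run("checksum", payload, checksum_in_software)
print(before, after, cpu.log)
```

      Note that the specialized unit never decides anything on its own; it only processes what the manager hands it.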

  21. Problem vs Processor by jimbrooking · · Score: 1

    Back in the day (when I was actually PAID for buying supercomputers) I devised Jim's First Law of Supercomputing: For every computer architecture there is a problem that will solve on that particular architecture "better" than on any other architecture. And conversely, for every problem there is an architecture that will solve this problem "better" than any other architecture. (You get to define "better".) You didn't have to talk to too many computer sales-people to accept this as fact. I believe the point of the OP is exactly this.

  22. Transmeta by Sebastopol · · Score: 2

    This whole discussion just made me laugh whilst remembering the hype around the Transmeta / Torvalds code-morphing engine.

    Ah, the 90's. They were fun.

    CPUs have been "general purpose" since day one. The only non-general purpose hardware are ASICs (like the article says). Everything else is just marketing hype from Intel, et al.

    This is such an amazing rehash of what Intel used to call *T technologies in the 90's, starting from the 80's, when coprocessors started appearing (x87). The big trend was toward DSPs in the 90's, but that never happened, instead they pushed on new hardware like MMX, SSE and now vector processors. That's why we have graphics processors as non-general-purpose CPUs.

    To call something a GPGPU is just an egregious assault on common sense.

    "Dark silicon", while a catchy name, is simply a side effect of latency, something the article mostly skips (hints at it with locality): the memory hierarchy exists and dark silicon is a result. When latency is zero, more of the silicon will be engaged.

    While one could easily claim that because parts of any chip power down that means it's not general purpose, that's an oversimplification: 100% utilization is fundamentally impossible because problems aren't solved that way, there is no infinite parallelism.

    I really think the author's analysis isn't fully developed. While the conclusion that hardware looks like the software may be a pleasant tautology, it overlooks Turing's thesis entirely. Which is odd, because that's what the author -started- with!

    --
    https://www.accountkiller.com/removal-requested
    1. Re:Transmeta by drinkypoo · · Score: 1

      The big trend was toward DSPs in the 90's, but that never happened, instead they pushed on new hardware like MMX, SSE and now vector processors

      No, it did happen. We got DSP-like technology in our PCs in the form of MMX, SSE, and vector processors. So we got both DSP and general-purpose processors.

      To call something a GPGPU is just an egregious assault on common sense.

      If our system bus designs permitted we could have separate graphics output and vector processor. But they don't. You've got to call it something.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Transmeta by Anonymous Coward · · Score: 0

      "Dark silicon", while a catchy name, is simply a side effect of latency, something the article mostly skips (hints at it with locality): the memory hierarchy exists and dark silicon is a result. When latency is zero, more of the silicon will be engaged.

      No, the term dark silicon was recently introduced to describe the situation where there is simply not enough resources to power the whole chip at the same time due to the combination of increasing manufacturing density and much slower increasing or decreasing power budget. A modern cell phone SoC would be the poster child for the idea.
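
      A back-of-the-envelope sketch of that situation: more units on the die than the power budget can feed at once. The unit names, wattages, and the greedy policy are all assumptions made up for the example, not figures from any real SoC.

```python
def power_on(requested, core_watts, budget_watts):
    """Greedily power requested units in priority order without exceeding the budget."""
    powered, used = [], 0.0
    for name in requested:
        cost = core_watts[name]
        if used + cost <= budget_watts:
            powered.append(name)
            used += cost
    return powered, used

# Hypothetical per-unit power draw in watts.
core_watts = {"big_cpu": 2.0, "little_cpu": 0.5, "gpu": 2.5,
              "video_dec": 0.5, "crypto": 0.25}

# Everything together would need 5.75 W, but the budget is 3 W.
powered, used = power_on(["little_cpu", "video_dec", "gpu", "crypto"], core_watts, 3.0)
print(powered, used)  # the GPU stays dark in this scenario
```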

    3. Re:Transmeta by Blaskowicz · · Score: 1

      We get actual DSPs too not just SIMD, such as the H264 decoder, encoder, Intel Quicksync, AMD TrueAudio, various ones in cell phone CPUs.

      Hell, this has me going back to the Amiga and Super NES, where you had some cool hardware, but TFA seems to be arguing for something like HSA or even more closely integrated into the CPU itself.

      If our system bus designs permitted we could have separate graphics output and vector processor. But they don't. You've got to call it something.

      We sort of have that with Optimus, SLI, Crossfire (and even output to H264 or H265, but that's another thing, or piping to a USB 3.0 display adapter, it seems). I'm sure it can be generalized, at the cost of somewhat wasting or using up PCIe bandwidth. Even the open source Linux world is ready for this with the DMA_BUF API (at the cost of poor open source driver speed). In the near future you could add a "display output card" to a system with an APU (AMD or Intel), or use the motherboard's output when using a GPU board, if firmware, drivers, and OS were to play nice.

    4. Re:Transmeta by drinkypoo · · Score: 1

      If our system bus designs permitted we could have separate graphics output and vector processor. But they don't. You've got to call it something.

      We sort of have that with Optimus, SLI, Crossfire

      Eh, sort of. But not really. Back "in the day" you had the original (and I mean original) PowerVR cards working this way, they did PCI DMA to your video card and got performance almost as good as a 3dfx card for substantially less money and with much less installation hassle. This was back when PCs had a single 33 MHz PCI bus, too! Today we've typically got at least two of those somewhere, probably downstream of the PCI-E bus but not if we're lucky. But of course, the screen resolutions have exploded since.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re:Transmeta by Sebastopol · · Score: 1

      Thanks for clarifying. I misunderstood in my original post and went "full-rant".

      --
      https://www.accountkiller.com/removal-requested
  23. An interesting paragraph from the article by Jeremi · · Score: 1

    "[Superscalar architectures] translate the architectural instruction encodings into something more akin to a static single assignment form (ironically, the compiler spends a lot of effort translating from such a form into a finite-register encoding)"

    Which makes me wonder, would it (in principle) be worth designing a chip with an ISA that is based explicitly on single-assignment-form, thereby avoiding both the need for transformations by the compiler and (more importantly) transformations by the CPU at run-time?

    --
    I don't care if it's 90,000 hectares. That lake was not my doing.
    1. Re:An interesting paragraph from the article by linuxrocks123 · · Score: 1

      That is a very cool idea, and is explored in this research project: http://en.wikipedia.org/wiki/T...

      --
      vi ~/.emacs # I'm probably going to Hell for this.
    2. Re:An interesting paragraph from the article by TheRaven64 · · Score: 1

      It would be interesting, but it's also a question of encoding density. Having a fixed number of architectural registers (and a much larger number of microarchitectural registers) is a technique that works reasonably well. Adding more architectural registers makes your operand size very large. You could imagine something like Dalvik bytecode, with 2^32 SSA registers and a CPU able to interpret it by either using internal registers or spilling to RAM, but you'd likely end up needing huge instruction caches and not getting much (if any) speedup.
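
      The operand-size point is easy to put numbers on. This sketch assumes a three-operand instruction format and counts only the bits needed to name registers, ignoring the opcode; the helper name is made up.

```python
def operand_bits(num_registers, operands_per_instruction=3):
    """Bits needed to name every operand when any register can be addressed."""
    bits_per_operand = (num_registers - 1).bit_length()
    return operands_per_instruction * bits_per_operand

conventional = operand_bits(32)        # 32 architectural registers
ssa_style = operand_bits(2 ** 32)      # 2^32 SSA registers
print(conventional, ssa_style)         # 15 bits versus 96 bits per instruction
```

      With 32 registers the operands fit comfortably in a 32-bit instruction word; with 2^32 registers the operands alone need three times that.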

      --
      I am TheRaven on Soylent News
    3. Re:An interesting paragraph from the article by Anonymous Coward · · Score: 0

      I'm not sure what you mean by that. The problem with ssa is that you need a new register for each assignment.
      Since in hardware there is a limited number of registers and memory locations, you need to transform from ssa to something else either in software or in hardware.
      It would be a fun thought experiment, though I can't really think of any immediate benefits.
      I think transformations done by the CPU are mostly because most architectures do not expose the pipeline latencies to the compiler.
      So unless you do that as well, transformations would still be needed. It might possibly make the reorder logic a bit simpler, but I don't know if the logic needed to allocate the registers and keep track of register liveness would offset that.
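
      The "new register for each assignment" point can be seen in a small sketch. This is only an illustration for straight-line code (no branches, so no phi nodes); the tuple format, the `name.N` naming, and the function `to_ssa` are assumptions invented for the example, not how any real compiler or CPU renamer is written.

```python
def to_ssa(instructions):
    """Rename straight-line three-address code into single-assignment form.

    Each instruction is a tuple (dest, op, src1, src2). Names that are read
    before ever being written in the block are treated as inputs and kept.
    """
    version = {}  # latest version number assigned to each destination name
    ssa = []
    for dest, op, a, b in instructions:
        # Sources must use the versions live *before* this instruction writes.
        a_ssa = f"{a}.{version[a]}" if a in version else a
        b_ssa = f"{b}.{version[b]}" if b in version else b
        # Every write gets a fresh name, so no name is assigned twice.
        version[dest] = version.get(dest, 0) + 1
        ssa.append((f"{dest}.{version[dest]}", op, a_ssa, b_ssa))
    return ssa


# Four instructions that reuse only two architectural registers, r1 and r2.
program = [
    ("r1", "add", "x", "y"),
    ("r1", "mul", "r1", "r1"),
    ("r2", "sub", "r1", "x"),
    ("r1", "add", "r2", "y"),
]

for line in to_ssa(program):
    print(line)
```

      Two architectural registers turn into four distinct SSA names, one per instruction. That growth is why something, somewhere, still has to map names back onto a finite register file.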

  24. Yes, there is... by ka9dgx · · Score: 1

    Yes, you can use almost 100% of the silicon, if you use a BitGrid to process information instead of von Neumann architectures.

  25. In Soviet Russia... by Thor+Ablestar · · Score: 1

    ... there are lots of idiots that instead of kWh write kW/h.

    It's quite understandable: our Education Minister Fursenko once said:

    "Defect of the Soviet education system was an attempt to form a creative man, and now the challenge is to cultivate qualified consumers, qualified to benefit from the creative work of others"

    "The ideology of education remains the same - we must prepare creators. But we need above all to inculcate the culture of using already existing developments, following the existing standards."

    "I am deeply convinced that do not need higher mathematics at school. Moreover, higher mathematics kills creativity"

  26. Special instruction sets by phorm · · Score: 1

    I'd say an x86 is general, but optimized. Obviously it does some things better than others, and there are known trade-offs between floating-point and integer math, etc., but that's why we have special instruction sets, GPUs, and now APUs as well. They're still able to accomplish general tasks, most with reasonable speed, but they've been extended to accomplish the special tasks that are most common.

    1. Re:Special instruction sets by Anonymous Coward · · Score: 0

      You also have digital signal processors (DSPs) that are used for signal processing: implementing discrete FFTs and inverse FFTs. Looking back at some of the old processors, the Intel i860 attempted to add GPU-type instructions into the CPU instruction set to support Z-buffering. That would be the equivalent of multiple color attachments on FBOs these days.

  27. What a load of shit by Anonymous Coward · · Score: 0

    "it must be able to run all programs efficiently" Did he just pull shit of out his mouth? Does it have to fly too? General purpose means it is not optimized for any particular situation. Accelerators are optimized for specific situations. This guy is just stupid.

  28. This sounds like ... by Anonymous Coward · · Score: 0

    Shutting down parts when not needed?
    Is it not the same way the brain works? Parts of the brain that are not needed are put in "suspended mode".

  29. Compiling for single-assignment by Ottibus · · Score: 1

    Which makes me wonder, would it (in principle) be worth designing a chip with an ISA that is based explicitly on single-assignment-form, thereby avoiding both the need for transformations by the compiler and (more importantly) transformations by the CPU at run-time?

    If the ISA is based on single assignment then the compiler will still have to transform the code into this form. In practice the compiler does this anyway when it can, and superscalar hardware does a good job of executing this kind of code. The problem (returning to the subject of the thread) is that not all code fits this pattern, and in these cases a single-assignment ISA will perform particularly badly, so it is not suitable for general purpose processors.

    Researchers have been developing ISAs for decades (including the single-assignment approach on dataflow computers in the 1980s) and the current mainstream ISAs are a reflection of that research. There is no magic new way of executing general purpose code.

  30. Um, what? by wonkey_monkey · · Score: 2

    Here is one definition of a general-purpose processor: if it can run any algorithm, then it is general purpose. This is not a particularly interesting definition, because it ignores the performance aspect that has been the driving goal for most processor development.

    Well, I'm sorry you don't find the definition interesting, but that doesn't mean you can redefine it however you want.

    It's therefore not enough for a processor to be Turing complete in order to be classified as general purpose; it must be able to run all programs efficiently.

    I assume there's a name for a logical fallacy where you redefine terms in order to make your point.

    With this in mind, let's explore what people really mean when they refer to a general-purpose processor: the specific category of workloads that these devices are optimized for and what those optimizations are.

    That's not what I mean when I refer to a general-purpose processor.

    Efficient designs in such a world will require admitting that there is no one-size-fits-all processor design and that there is a large spectrum, with different trade-offs at different points.

    I didn't realise anyone was denying this.

    --
    systemd is Roko's Basilisk.
    1. Re:Um, what? by Anonymous Coward · · Score: 0

      I assume there's a name for a logical fallacy where you redefine terms in order to make your point.

      That would be "feminism."

  31. I disagree by Anonymous Coward · · Score: 0

    Everyone buys based on need. You don't buy a Porsche that can go 200 MPH to deliver mail. Neither do you buy a Mack truck to pick up a few things at the hardware store. By the same token, CPUs are purpose-built and designed. Multi-core chips were generally designed to help with multitasking. Obviously, if all you do is surf the web on a browser with only a few tabs open, you don't need as much CPU as a person using CAD software, running multiple demanding programs, or playing 3D games. In fact Intel (and I'm sure AMD does this too) takes its CPU chip process and makes CPUs of varied speeds based on the quality of each wafer. So in essence many different speeds come out of the same process to save money. It's also why ARM chips are becoming popular and even being considered for servers. They can run equally well, use less power, and still maintain the duty status needed for a server. You don't buy a Core i7 Extreme to do your taxes or write a term paper. At least you shouldn't, unless you use that power for other tasks. Otherwise you have again bought a Corvette to deliver pizzas.

  32. Re:General purpose: Efficiency not required by 91degrees · · Score: 2

    I think there is a certain efficiency argument. A GPU may be able to run a C compiler but nobody would consider using it for that. A CPU can run an OpenGL implementation and it would be slow but you'd at least be able to do it without any fiddly hacks, and there could be a reason to do so.

    The article seems to be trying to find a hard and fast rule as to what "general purpose" means and then realising that that doesn't actually apply to general purpose processors.

  33. It's just a bullshit semantics game by Sycraft-fu · · Score: 2

    Guy is trying to play silly distinction games. Really, everyone in tech understands what people mean when they say "general purpose processor." Yes, said unit may have some specialized circuits and such, but it is made to be good at dealing with all kinds of problems. Integer, FP, branching, linear, etc.: it doesn't matter, its design can handle them all reasonably well.

    That compares to something specialized like a GPU. For certain kinds of problems, specifically single-precision vector math with fairly consistent branches, it does amazingly well. For other things, not so much, though it is Turing complete and capable of anything. It is still a true processor and not an ASIC that can't be programmed, but it is not general purpose.

    Trying to play semantic games with it is silly. Are there going to be cases where the line might be blurred? Sure, but who cares? That's how life is. Everything doesn't always fit into neat little boxes. It is still a generally useful way of looking at things.

  34. Bummer by Anonymous Coward · · Score: 0

    "There's No Such Thing As a General-Purpose Processor"

    Say it ain't so. I'm still wrapping my head around that "no free lunch" thingie.

  35. Wrong by Kim0 · · Score: 1

    I have designs for general and efficient processors, or rather computing structures, and it is provable they are both efficient and general.

    The generality of computing has been known for a long time, through emulation of Turing machines, but this has not been efficient.

  36. Programming complexity by rockmuelle · · Score: 2

    A big reason we accept the trade-offs of modern processors is that it's generally easy to program a broad range of applications for them.

    In the mid aughts (not very long ago, actually), there was a big push for heterogeneous multi-core processors and systems in the HPC space. Roadrunner at Los Alamos was a culmination of this effort (one of the first petascale systems). It was a mix of processor types including IBM's Cell (itself a heterogeneous chip). Programming Roadrunner was a bitch. Because it had different processor families, you had to decompose your algorithm to target the right processor for a given task. Then you had to worry about moving data efficiently between different processors.

    This type of development is fun as an intellectual exercise, but very difficult and time consuming in practice. It's also something compilers will never be good at, requiring experts in the architectures, domains, and applications to effectively use the system.

    Another lesson from the period (and one that anyone who's done ASICs has known for years) is that general purpose hardware generally evolves fast enough to catch up with specialized hardware within a reasonable timeframe (usually 6-18 months; see DE Shaw's ASIC for protein folding as an example).

    While custom processors are cool (I love hacking on them), they're rarely practical.

    -Chris

  37. The value of a good definition by tepples · · Score: 1

    Aren't all Scotsmen inherently British citizens

    Yes. But not all British citizens are Scotsmen, only those residing in Scotland.

    unless you have some other angle I think you failed to produce a workable joke.

    "No true Scotsman" is about changing definitions in mid-argument, and an effective way to avoid this is to agree on definitions early on. Likewise, one resolves the heap paradox by defining a heap as a contiguous collection of grains where at least one grain is supported solely by other grains. I was aiming for a bit of an anti-joke by defining "Scotsman" the way the law probably defines it, if the definition of "citizen of a state" in another common-law country's constitution is to be believed.

    Now let's apply this principle to the term "general-purpose processor". A Turing equivalence definition might be "capable of executing all programs." An efficiency definition might be "capable of executing all programs with time and energy use no greater than that of the most efficient processor for each program."
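
    The two definitions can be written down as predicates, which makes the gap between them easy to see. This is a toy sketch: the processor names, the program names, and every number in the `COST` table are invented for illustration, and `None` is used to mean "cannot run this program at all".

```python
# Hypothetical relative cost (time and energy lumped together) of running
# each program on each processor. All values are made up for illustration.
COST = {
    "cpu":      {"compiler": 1.0,  "render": 20.0, "fft": 5.0},
    "gpu":      {"compiler": 50.0, "render": 1.0,  "fft": 2.0},
    "fft_asic": {"compiler": None, "render": None, "fft": 1.0},
}


def general_by_turing(proc):
    """Turing-equivalence definition: capable of executing all programs."""
    return all(cost is not None for cost in COST[proc].values())


def general_by_efficiency(proc):
    """Efficiency definition: no worse than the best processor on every program."""
    for program, cost in COST[proc].items():
        if cost is None:
            return False
        best = min(
            other[program] for other in COST.values() if other[program] is not None
        )
        if cost > best:
            return False
    return True


for proc in COST:
    print(proc, general_by_turing(proc), general_by_efficiency(proc))
```

    With this made-up table, the CPU and GPU both pass the first definition and nothing passes the second. Which definition the two sides of the thread are using decides whether "there's no such thing" sounds obvious or absurd.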

    1. Re:The value of a good definition by Anonymous Coward · · Score: 0

      Aren't all Scotsmen inherently British citizens

      Yes. But not all British citizens are Scotsmen, only those residing in Scotland.

      So you admit you doubly failed your initial attempt:
      1. When you alleged that a British citizen residing in Scotland is *not* a true Scotsman, when *all* true Scotsmen meet the citizenship criterion. This is a failure of logic on your part.
      2. You failed to understand that the Kingdom of Great Britain ceased to exist in 1800 when it was succeeded by the UK.

      I'm quite aware of the panoply of logical fallacies. I was commenting about your failures of logic and history.

    2. Re:The value of a good definition by tepples · · Score: 1

      When you alleged that a British citizen residing in Scotland is *not* a true Scotsman

      That is entirely not what I alleged. AC wrote "There's no true Scotsman." I wrote "Sure there is," meaning "I believe you are incorrect; there is in fact a true Scotsman." Then I proceeded to define a Scotsman.

  38. Re:General Purpose Defeats Patents So Here We Go ! by PPH · · Score: 2

    Good point. But if I can take (patentable) software targeted to a special purpose processor and port it to a different (possibly general purpose) processor, I have bypassed the patent.

    The goal of a 'well written' patent is to be as general as possible without getting tossed out of a USPTO examiner's office.

    --
    Have gnu, will travel.
  39. The real problem. by jedidiah · · Score: 1

    The real problem is that you never know what you may end up doing. The potential of a speciality bit of silicon is by its very nature limited. It may completely fail at some new task that you didn't think of when you were building it.

    It's great for some really well defined problem but as soon as that definition is no longer valid, the speciality silicon is useless.

    "General Purpose" means that you can address any problem, including the ones where your specialty silicon fails.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  40. We're no longer at the origin by gentryx · · Score: 1

    Architectural improvements for general purpose CPUs yield fewer and fewer benefits: even more registers? Even better branch prediction? Even larger caches? It'll all yield but a few percent, at least for current Intel designs. So the way to go is currently more and more cores, but what good is it to have many cores that can't all fire simultaneously?

    --
    Computer simulation made easy -- LibGeoDecomp
  41. There is no such thing as a competent professor by Anonymous Coward · · Score: 0

    Really? An academic claims there is no such thing as an ACTUAL non-theoretical device? Amazing. Clearly an incompetent. The number of specialized devices which have become "general purpose" in the last 30 years is astounding.
    Clearly a case of Bozo-Philia