Slashdot Mirror


Windows and Linux Not Well Prepared For Multicore Chips

Mike Chapman points out this InfoWorld article, according to which you shouldn't immediately expect much of a performance gain from Windows 7 (or Linux) on the eight-core chips Intel is releasing this year. "For systems going beyond quad-core chips, performance may actually drop. Why? Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores. Problem? The development tools aren't available and research is only starting."

111 of 626 comments

  1. Adapt by Dyinobal · · Score: 3, Funny

    Give us a year, maybe two.

    1. Re:Adapt by Dolda2000 · · Score: 5, Interesting

      No, it's not about adaptation. The whole approach currently taken is completely, outright on-its-head wrong.

      To begin with, I don't believe the article's claim that the systems are badly prepared. I can't speak for Windows, but I know for sure that Linux is capable of far heavier SMP operation than 4 CPUs.

      But more importantly, many programming tasks simply aren't meaningful to break up into such units of granularity as OS-level threads. Many programs would benefit from being able to run just some small operations (like iterations of a loop) in parallel, but the synchronization work required just to wake up a thread from a pool to do such a thing would greatly exceed the benefit.
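A minimal Python sketch of the usual workaround for the overhead described above. The technique is not from the post, and the names `parallel_map_chunked` and `chunk_size` are made up for illustration. The idea is to hand a thread pool whole chunks of a loop rather than single iterations, so each dispatch and wake-up is paid once per chunk instead of once per iteration. In CPython the global interpreter lock stops pure-Python CPU-bound code from running on several cores at once, so this shows the structure, not a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map_chunked(fn, items, chunk_size=1000, workers=4):
    """Apply fn to every item, dispatching one pool task per chunk, not per item.

    Per-item dispatch would pay the thread handoff cost len(items) times;
    chunking pays it roughly len(items) / chunk_size times.
    """
    items = list(items)
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

    def run_chunk(chunk):
        # A plain sequential loop inside one task: no synchronization per iteration.
        return [fn(x) for x in chunk]

    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields chunk results in submission order, so output order matches input.
        for part in pool.map(run_chunk, chunks):
            results.extend(part)
    return results


squares = parallel_map_chunked(lambda x: x * x, range(10_000), chunk_size=2_500)
print(len(squares), squares[:5])  # 10000 [0, 1, 4, 9, 16]
```

Picking `chunk_size` is the whole game: too small and the handoffs dominate, too large and some workers sit idle while one finishes the last chunk.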

      People just think about this the wrong way. Let me re-present the problem for you: CPU manufacturers have been finding it harder to scale the clock frequencies of CPUs higher, and have therefore started adding more functional units to CPUs to do more work per cycle instead. Since the normal OoO parallelization mechanisms don't scale well enough (probably for the same reasons people couldn't get dataflow architectures working at large scales back in the 1980s), they add more cores instead.

      The problem this gives rise to, as I stated above, is that the unit of parallelism gained by adding CPUs is too coarse for the very small units of work that need to be divided among them. What is needed, I would argue, is a way to parallelize instructions in the instruction set itself. HP's/Intel's EPIC idea (which is now Itanium) wasn't stupid, but it has a hard limitation on how far it scales (currently four instructions simultaneously).

      I don't have a final solution quite yet (though I am working on it as a thought project), but the problem we need to solve is getting a new instruction set which is inherently capable of parallel operation, not adding more cores and pushing the responsibility for multi-threading onto the programmers. This is the kind of thing the compiler could do just fine (even the compilers that exist currently; GCC's SSA representation of programs, for example, is excellent for these kinds of things) by isolating parts of the code in which there are no dependencies in the data flow, and which could therefore run in parallel. But compilers need support in the instruction set to be able to specify such things.
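The data-flow isolation described above can be sketched for the simplest case: straight-line code already in SSA form, so every name is assigned exactly once and only read-after-write dependencies matter. This is a hypothetical Python illustration, not GCC's actual pass. `schedule_levels` places each statement one level after the latest statement that produced one of its inputs, so no statement in a level depends on any other statement in the same level. Given suitable instruction-set support, each level could issue together.

```python
def schedule_levels(stmts):
    """Group straight-line SSA statements into levels of mutually independent work.

    stmts: list of (target, [source names]) in program order.
    Names never assigned by a statement (e.g. function arguments) are inputs.
    Returns a list of levels, each a list of targets.
    """
    level_of = {}
    levels = []
    for target, sources in stmts:
        # Inputs count as level -1, so a statement reading only inputs lands at level 0.
        lvl = 1 + max((level_of.get(s, -1) for s in sources), default=-1)
        level_of[target] = lvl
        while len(levels) <= lvl:
            levels.append([])
        levels[lvl].append(target)
    return levels


# a = x + 1; b = y * 2; c = a + b; d = x - y; e = c * d
program = [
    ("a", ["x"]),
    ("b", ["y"]),
    ("c", ["a", "b"]),
    ("d", ["x", "y"]),
    ("e", ["c", "d"]),
]
print(schedule_levels(program))  # [['a', 'b', 'd'], ['c'], ['e']]
```

Five statements collapse into three steps here. Branches, memory aliasing and phi nodes are exactly what this toy ignores and what makes the real problem hard.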

    2. Re:Adapt by Cassini2 · · Score: 5, Insightful

      Give us a year, maybe two.

      I think this problem will take longer than a year or two to solve. Modern computers are really fast. They solve simple problems almost instantly. A side effect of this is that if you underestimate the computational power a problem requires, you are likely to be off by a large amount.

      If you implemented an O(n^2) algorithm on a 6502 (Apple II) and n was larger than a few hundred, you were dead. Many programmers wouldn't even try implementing hard algorithms on the early Apple II computers. On the other hand, a modern processor might tolerate O(n^2) algorithms with n larger than 1000, so programmers can try solving much harder problems. However, programmers' ability to estimate and deal with computational complexity has not changed since the early days of computers. Programmers use generalities. They use ranges: n will be between 5 and 100, or n will be between 1000 and 100,000. With modern problems, n=1000 might mean the problem can be solved on a netbook, while n=100,000 might require a small multi-core cluster.
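To put rough numbers on those ranges (worked arithmetic, not from the post): the work of an O(n^2) algorithm grows with the square of n, so moving from n = 1000 to n = 100,000 multiplies it by ten thousand.

```latex
\frac{(10^5)^2}{(10^3)^2} = \frac{10^{10}}{10^{6}} = 10^{4}
```

So a job that hypothetically takes a tenth of a second at n = 1000 would take about 1,000 seconds, roughly 17 minutes, at n = 100,000.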

      There aren't many programming platforms out there that scale smoothly from applications deployed on a desktop, to applications deployed on a multi-core desktop, and then to clusters of multi-core desktops. Perhaps most worrying is that the new programming languages coming out are not particularly suited to intense data analysis. The big examples of this for me are .NET and functional languages. .NET was deployed at about the same time multi-core chips showed up, and has minimal support for them. Functional languages may eventually be the solution, but for any numerically intensive application, tight loops of C code are much faster.

      The other issue with multi-core chips is that, as a programmer, I have two ways of making my code go faster:
      1. Get out the assembly printouts and the profiler, and figure out why the processor is running slow. Doing this helps every user of the application, and works well with almost any of the serious compiled languages (C, C++). Sometimes, I can get a 10:1 speed improvement.(*) It doesn't work so well with Java, .NET, or many functional languages, because they use run-time compilers/interpreters and don't produce assembly code ahead of time.
      2. I recode for a cluster. Why stop at a multi-core computer? If I can get a 2:1 to 10:1 speed-up by writing better code, then why stop at a dual or quad core? The application might require a 100:1 speed-up, and that means more computers. If I have a really nasty problem, chances are that 100 cores are required, not just 2 or 8. Multi-core processors are nice because they reduce cluster size and cost, but a cluster will likely be required.

      The problem with both of the above approaches is that, from a tools perspective, they are the worst choice for multi-core optimization. Approach 1 forces me into using C and C++, which don't even handle threads very well. In particular, C and C++ lack easy support for Software Transactional Memory, NUMA, and clusters. This means that approach 2 may require a complete software redesign, and possibly either a language change or a major change in the compilation environment. Either way, my days of fun-loving Java and .NET code are coming to a sudden end.

      I just don't think there is any easy way around it. The tools aren't yet available for easy implementation of fast code that scales between the single-core assumption and the multi-core assumption in a smooth manner.

      Note: * - By default, many programmers don't take advantage of features that may increase the speed of an algorithm. Special-purpose instruction set extensions, like MMX, can dramatically speed up certain loops. Sometimes loops contain a great deal of code that can be eliminated. Maybe a function call is present in a tight loop. Anti-virus software can dramatically affect system speed. Many little things can make big differences.

    3. Re:Adapt by Dolda2000 · · Score: 5, Informative

      Since the normal OoO parallelization mechanisms don't scale well enough

      It hit me that this probably wasn't obvious to everyone, so just to clarify: "OoO" here stands not for Object-Oriented Something, but for Out-of-Order, as in how current superscalar CPUs work. See also Dataflow architecture.

    4. Re:Adapt by Yaa+101 · · Score: 4, Interesting

      The final solution is that the processor measures and decides which parts of which programs must be run in parallel and which are better left alone.
      What else do we have computers for?

    5. Re:Adapt by tftp · · Score: 5, Insightful

      To dumb your message down, CPU manufacturers act like book publishers who want you to read one book in two different places at the same time just because you happen to have two eyes. But a story can't be read this way, and for the same reason most programs don't benefit from several CPU cores. Books are read page by page because each little bit of story depends on previous story; buildings are constructed one floor at a time because each new floor of a building sits on top of lower floors; a game renders one map at a time because it's pointless to render other maps until the player has made his gameplay decisions and arrived there.

      In this particular case, CPU manufacturers do what they do simply because that's the only thing they know how to do. For most tasks, we as users would prefer a single 1 THz CPU core, but we can't have that yet.

      There are engineering and scientific tasks that can be easily subdivided - this comes to mind - and these are very CPU-intensive tasks. They will benefit from as many cores as you can scare up. But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.
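The point in the post above is usually formalized as Amdahl's law, which the post does not name. If a fraction p of a program's single-core run time parallelizes perfectly across n cores and the rest stays serial, the speedup is at most 1 / ((1 - p) + p / n). A minimal Python sketch of that formula (the function name is hypothetical):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a job can run in parallel.

    parallel_fraction: share of the single-core run time that parallelizes perfectly (0..1).
    cores: number of cores available.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (0.5, 0.9, 0.99):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (2, 4, 8)])
```

With half the run time serial, eight cores give at most 16/9, about a 1.78x speedup. Even at 99% parallel, eight cores stay below 7.5x. The model ignores synchronization overhead, so real programs do worse.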

    6. Re:Adapt by Sentry21 · · Score: 5, Insightful

      This is the sort of thing I like about Apple's 'Grand Central'. The idea behind it is that instead of assigning a task to a processor, it breaks up a task into discrete compute units that can be assigned anywhere. When doing processing in a loop, for example, if each iteration is independent, you could make each iteration a separate 'unit', like a packet of computation.

      The end result is that the system can then more efficiently dole out these 'packets' without the programmer having to know about the target machine or vice-versa. For some computation, you could use all manner of different hardware - two dual-core CPUs and your programmable GPU, for example - because again, you don't need to know what it's running on. The system routes computation packets to wherever they can go, and then receives the results.

      Instead of treating a program as a set of discrete threads, each representing a concurrent task, Grand Central breaks up a program's computation into discrete chunks and manages them accordingly. Some might have a higher priority and thus get processed first (think QoS in networking), without having to prioritize or deprioritize an entire process. If a specific packet needs to wait on I/O, then it can be put on hold until the I/O is done, and the CPU can be put back to work on another packet in the meantime.

      What you get in the end is a far more granular, more practical way of thinking about computation that would scale far better as the number of processing units and tasks increases.
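A rough Python analogue of the "packet" idea follows. It is an assumption-laden sketch: Apple's real interface is the C libdispatch API (for example `dispatch_async` onto a queue), and nothing below mirrors it. The hypothetical `PacketDispatcher` keeps a fixed set of worker threads that pull small callables from one shared priority queue. Callers hand over work and get a `Future` back without ever choosing a thread or a core.

```python
import itertools
import queue
import threading
from concurrent.futures import Future


class PacketDispatcher:
    """Hypothetical sketch: workers pull small work "packets" from one shared
    priority queue, so callers never pick a thread or a core."""

    def __init__(self, workers=4):
        self._queue = queue.PriorityQueue()
        self._order = itertools.count()  # tie-breaker: FIFO within one priority level
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def submit(self, fn, *args, priority=10):
        """Queue fn(*args) as one packet; lower numbers are dequeued first."""
        future = Future()
        self._queue.put((priority, next(self._order), fn, args, future))
        return future

    def _worker(self):
        while True:
            _, _, fn, args, future = self._queue.get()
            if fn is None:  # shutdown sentinel
                return
            if not future.set_running_or_notify_cancel():
                continue  # the caller cancelled this packet before it started
            try:
                future.set_result(fn(*args))
            except Exception as exc:  # report failures through the future
                future.set_exception(exc)

    def shutdown(self):
        """Let already-queued packets drain, then stop the workers."""
        for _ in self._threads:
            self._queue.put((float("inf"), next(self._order), None, (), None))
        for t in self._threads:
            t.join()


dispatcher = PacketDispatcher(workers=2)
packets = [dispatcher.submit(pow, i, 2) for i in range(8)]
urgent = dispatcher.submit(sum, [1, 2, 3], priority=0)
print(urgent.result(), [f.result() for f in packets])  # 6 [0, 1, 4, 9, 16, 25, 36, 49]
dispatcher.shutdown()
```

The `priority` argument stands in for the QoS-style prioritisation mentioned above. A single shared queue keeps the sketch short. A real system would likely need per-core queues or work stealing so the routing itself stays cheaper than the work it routes.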

    7. Re:Adapt by Dolda2000 · · Score: 4, Interesting

      As I mentioned briefly in my post, there was research into dataflow architectures in the 1970s and 1980s, and it turned out to be exceedingly difficult to do such things efficiently in hardware. It may very well be that they are still the final solution, but until they become viable, I think doing the same thing in the compiler, as I proposed, is more than enough. That's still the computer doing it for you.

    8. Re:Adapt by Cassini2 · · Score: 5, Informative

      HP's/Intel's EPIC idea (which is now Itanium) wasn't stupid, but it has a hard limitation on how far it scales (currently four instructions simultaneously). I don't have a final solution quite yet (though I am working on it as a thought project), but the problem we need to solve is getting a new instruction set which is inherently capable of parallel operation, not adding more cores and pushing the responsibility onto the programmers for multi-threading their programs.

      The problem with very long instruction word (VLIW) architectures like EPIC and the Itanium is that the main speed limitations in today's computers are bandwidth and latency. Memory bandwidth and latency can be the dominant performance driver in a modern processor. At a system level, network, I/O (particularly video), and hard drive bandwidth and latency can dramatically affect system performance.

      With a VLIW processor, you are taking many small instruction words and gathering them together into a smaller number of much larger instruction words. This never pays off. Essentially, it is impossible to fill every slot of every long instruction word. Even with a normal superscalar processor, it is almost impossible to get every functional unit on the chip to do something simultaneously. The same problem applies with VLIW processors. Most of the time, a program is only exercising a specific area of the chip. With VLIW, this means that many bits in the instruction word will go unused much of the time.

      In and of itself, wasting bits in an instruction word isn't a big deal. Modern processors can move large amounts of memory simultaneously, and it is handy to be able to link different sections of the instruction word to independent functional blocks inside the processor. The problem is that the longer instruction words use memory bandwidth every time they are read. Worse, the longer instruction words take up more space in the processor's cache memory. This either requires a larger cache, increasing the processor cost, or it increases latency, as it translates into fewer cache hits. It is no accident the Itanium is both expensive and has an unusually large on-chip cache.
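The cost described above can be put in rough numbers with a toy model. The model is an assumption made here, not the poster's. IA-64 packs three 41-bit instructions and a 5-bit template into a 128-bit bundle. The sketch assumes that each group of mutually independent instructions starts a new bundle and that any leftover slots hold no-ops. Real templates allow stops inside a bundle, so this may overstate the waste. `bundle_cost` is a hypothetical name.

```python
import math

BUNDLE_BITS = 128      # one IA-64 bundle: three 41-bit slots plus a 5-bit template
SLOTS_PER_BUNDLE = 3


def bundle_cost(groups):
    """Simplified VLIW code-size model (ignores templates, stop bits and unit types).

    groups: counts of mutually independent instructions; each group starts a new
    bundle, and any slot left empty is filled with a no-op.
    Returns (bundles, fraction of slots doing useful work, bits per useful instruction).
    """
    useful = sum(groups)
    if useful == 0:
        raise ValueError("need at least one instruction")
    bundles = sum(math.ceil(n / SLOTS_PER_BUNDLE) for n in groups)
    utilization = useful / (bundles * SLOTS_PER_BUNDLE)
    bits_per_useful = bundles * BUNDLE_BITS / useful
    return bundles, utilization, bits_per_useful


print(bundle_cost([3, 3, 3]))        # every slot full
print(bundle_cost([1, 2, 1, 3, 1]))  # dependent code: many no-op slots
```

In this model, fully packed bundles cost about 43 bits per useful instruction, while the dependent sequence costs 80. Every one of those extra bits has to cross the memory bus and occupy the instruction cache.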

      The other major downfall of the VLIW architecture is that it cannot emulate a short-instruction-word processor quickly. This is a problem both for interpreters and for 80x86 emulation. Interpreters are a very popular application paradigm; many applications contain them. Certain platforms, like .NET and Java, use pseudo-interpreters/compilers. 80x86 emulation is a big deal, as the majority of the world's software is written for the 80x86 platform, which features a complex variable-length instruction word. A VLIW processor cannot quickly decode either the short 80x86 instructions or Java bytecode. Realistically, a VLIW processor will be no quicker, on a per-instruction basis, than an 80x86 processor, even though the VLIW architecture is designed to execute 4 instructions simultaneously.

      The memory bandwidth problem, and the fact that VLIW processors don't lend themselves to interpreters, really limit the usefulness of the platform.

    9. Re:Adapt by init100 · · Score: 3, Insightful

      To begin with, I don't believe the article about the systems being badly prepared. I can't speak for Windows, but I know for sure that Linux is capable of far heavier SMP operation than 4 CPUs.

      My take on the article is that it is referring to applications provided with or at least available for the systems in question, and not actually the systems themselves. In other words, it takes the user view, where the operating system is so much more than just the kernel and the other core subsystems.

      But more importantly, many programming tasks simply aren't meaningful to break up into such units of granularity as OS-level threads.

      Actually, in Linux (and likely other *nix systems), when a command line pipes several commands together, the commands are executed in parallel and can thus be scheduled on different processors/cores if available. This is a simple way of using the multiple cores of a multicore system, so advanced programming is not always necessary to take advantage of the power of multicore chips.
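The shell builds this for you with `producer | consumer`. Below is a minimal Python sketch of the same arrangement. Both stages are hypothetical one-liners run with the current interpreter, so the example does not depend on particular Unix tools. The two child processes run concurrently, connected by a pipe, and the OS is free to schedule them on different cores.

```python
import subprocess
import sys

# Hypothetical stages: the producer prints 1..100, the consumer sums what it reads.
PRODUCER = "for i in range(1, 101): print(i)"
CONSUMER = "import sys; print(sum(int(line) for line in sys.stdin))"


def run_pipeline():
    """Equivalent of `producer | consumer`: two processes running at the same time."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen([sys.executable, "-c", CONSUMER],
                                stdin=producer.stdout, stdout=subprocess.PIPE, text=True)
    producer.stdout.close()  # so the producer gets SIGPIPE if the consumer exits early
    out, _ = consumer.communicate()
    producer.wait()
    return int(out)


print(run_pipeline())  # 5050
```

The pipe buffer does the synchronization. The producer blocks only when the consumer falls behind, which is why this coarse-grained parallelism comes almost for free.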

    10. Re:Adapt by camperslo · · Score: 4, Funny

      The programmers of Slashdot are ready for multiple cores and threads. There is no problem.

      When performing a number of operations in parallel the key is to simply ignore the results of each operation.
      For operations that would have used the result of another as input simply use what you think the result might be or what you wish it was.

      The programmers of Slashdot already have the needed skills for such programming as the mental processes are the same ones that enable discussion of TFAs without reading them.

    11. Re:Adapt by Trepidity · · Score: 2, Interesting

      The problem is still efficiency, though. There are lots of ways to mark units of computation as "this could be done separately, but depends on Y"; OpenMP provides a bunch of them, for example, and there have been proposals dating back to the 80s, probably earlier. The hard part is implementing that efficiently, so that the synchronization overhead doesn't swamp the parallelization gains. Does the system spawn new threads? Maintain a pool of worker threads and feed thunks to them? Some hybrid approach? How does it determine when it's worth the effort of doing anything for a particular bit of computation versus just doing it inline and saving the overhead? Etc.

      Basically Grand Central is yet another in the decades-long line of proposals for specifying parallelizable computations. What's still an open question is whether they've solved the harder part, a way to, as you say, "[route] computation packets to wherever they can go, and then [receive] the results", without that routing and receiving taking inordinate overhead.
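One common answer to the "when is it worth it?" question is a size cutoff: below some amount of work, skip the pool entirely and run inline. OpenMP exposes a similar knob as the `if` clause on a parallel construct. The Python sketch below is hypothetical: `sum_with_cutoff` and its default `cutoff` are made up, and a real threshold has to be measured. As with any CPython threading example, the GIL limits actual CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_with_cutoff(values, pool, workers, cutoff=10_000):
    """Sum a list, using the pool only when each task would get at least `cutoff` items.

    Returns (total, dispatched) so callers can see which path was taken.
    Below the cutoff, the handoff to other threads is assumed to cost more
    than it saves, so the work is done inline.
    """
    n = len(values)
    pieces = min(workers, n // cutoff)
    if pieces < 2:
        return sum(values), False  # not worth it: no threads, no synchronization
    step = -(-n // pieces)  # ceiling division so the slices cover every item
    futures = [pool.submit(sum, values[i:i + step]) for i in range(0, n, step)]
    return sum(f.result() for f in futures), True


with ThreadPoolExecutor(max_workers=4) as pool:
    print(sum_with_cutoff(list(range(100)), pool, workers=4))      # (4950, False)
    print(sum_with_cutoff(list(range(100_000)), pool, workers=4))  # (4999950000, True)
```

Deciding the split up front, rather than recursively submitting subtasks from inside the pool, also avoids workers blocking on work queued behind them.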

    12. Re:Adapt by Dolda2000 · · Score: 3, Interesting

      All that you say is certainly true, but I would still argue that EPIC's greatest problem is its hard parallelism limit. True, it's not as hard as I made it out to be, since an EPIC instruction bundle has its non-dependence flag, but you cannot, for instance, make an EPIC CPU break off and execute two subroutines in parallel. Its parallelism lies only in a very small window of instructions.

      What I'd like to see, rather, is a CPU that implements a kind of "micro-thread" function, allowing two larger code paths to run simultaneously: larger than what EPIC could handle, but quite possibly still far smaller than what would be efficient to distribute on OS-level threads, with all the synchronization and scheduler overhead that would mean.

    13. Re:Adapt by Anonymous Coward · · Score: 2, Insightful

      This is also known as Processor Affinity outside of the Apple box

    14. Re:Adapt by Anonymous Coward · · Score: 5, Insightful

      You're thinking too simply. A single-core system at 5GHz would be less responsive for most users than a dual-core 2GHz. Here's why:

      While you're playing a game, more programs are running in the background: anti-virus, defrag, email, Google Desktop, etc. Also, any proper, modern game splits its tasks, e.g. game AI, physics, etc.

      So dual-core is definitely a huge step up from single-core. So, no, users don't want single-core; they want a faster, more responsive PC, which NOW means dual-core. In a few years it will be quad-core. Most programs today hardly benefit from quad-core.

    15. Re:Adapt by try_anything · · Score: 4, Insightful

      But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.

      Yeah, I agree. There are a few rare types of software that are naturally parallel or deal with concurrency out of necessity, such as GUI applications, server applications, data-crunching jobs, and device drivers, but basically every other kind of software is naturally single-threaded.

      Wait....

      Sarcasm aside, few computations are naturally parallelizable, but desktop and server applications carry out many computations that can be run concurrently. For a long time it was normal (and usually harmless) to serialize them, but these days it's a waste of hardware. In a complex GUI application, for example, it's probably fine to use single-threaded serial algorithms to sort tables, load graphics, parse data, and check for updates, but you had better make sure those jobs can run in parallel, or the user will be twiddling his thumbs waiting for a table to be sorted while his quad-core CPU is "pegged" at 25% crunching on a different dataset. Or worse: he sits waiting for a table to be sorted while his CPU is at 0% because the application is trying to download data from a server.
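A minimal Python sketch of the GUI case. All names are hypothetical, and `time.sleep` stands in for network I/O. The download and the sort are handed to background threads, so the thread that would run the event loop stays free instead of sitting blocked on either one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_remote_rows():
    """Stand-in for a network request: the thread just waits, as it would on I/O."""
    time.sleep(0.2)
    return [("carol", 3), ("alice", 1), ("bob", 2)]


def sort_table(rows):
    return sorted(rows, key=lambda row: row[1])


def refresh_view():
    local_rows = [(f"user{i}", (i * 7919) % 1000) for i in range(50_000)]
    with ThreadPoolExecutor(max_workers=2) as background:
        download = background.submit(fetch_remote_rows)
        sorting = background.submit(sort_table, local_rows)
        # The calling ("UI") thread stays free; a real toolkit would keep handling
        # events here and react in a done-callback instead of polling.
        while not (download.done() and sorting.done()):
            time.sleep(0.01)  # placeholder for processing UI events
    return sorting.result(), download.result()


sorted_rows, remote_rows = refresh_view()
print(sorted_rows[:3], remote_rows)
```

The sort no longer waits behind the download, and neither one freezes the interface.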

      Your example of building construction is actually a good example in favor of concurrency. Construction is like a complex computation made of smaller computations that have complicated interdependencies. A bunch of different teams (like cores) work on the building at the same time. While one set of workers is assembling steel into the frame, another set of workers is delivering more steel for them to use. Can you imagine how long it would take if these tasks weren't concurrent? Of course, you have to be very careful in coordinating them. You can't have the construction site filled up with raw materials that you don't need yet, and you don't want the delivery drivers sitting idle while the construction workers are waiting for girders. I'm sure the complete problem is complex beyond my imagination. By what point during construction do you need your gas, electric, and sewage permits? Will it cause a logistical clusterfuck (contention) if there are plumbers and electricians working on the same floor at the same time? And so on ad infinitum. Yet the complexity and inevitable waste (people showing up for work that can't be done yet, for example) are a price well worth paying to have a building up in months instead of years.

    16. Re:Adapt by David+Gerard · · Score: 4, Funny

      Three cores to run GNOME, one core to run Firefox.

      --
      http://rocknerd.co.uk
    17. Re:Adapt by nmb3000 · · Score: 5, Funny

      To dumb your message down, CPU manufacturers act like book publishers [...]

      What is this "books" crap? Pft, I remember when car analogies were good enough for everyone. Now you have to get all fancy. Let me try and explain it more clearly:

      CPUs are like cars. Intel and Friends haven't been able to keep increasing the speed at which they can safely and reliably run, so instead of relying on increased speed to get more people from point A to point B, they are instead starting to look at parallelization as a means to achieve better performance.

      Now you are chopped up into 10 pieces and FedEx'd to your destination with 100 other people. Pieces may go by road, rail, air, or ship and thus overall capacity--"bandwidth" you might say--of the lanes of travel has been increased.

      The only problem is that the people who make use of this new technique ("programmers", that is) have a hard time chopping you up in such a way that you can be put back together again. Usually it's a bit of a mess and more trouble than it's worth, so we just keep driving our old-fashioned cars at normal speeds while adding lanes to the roads.

      --
      "What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
    18. Re:Adapt by TheRaven64 · · Score: 5, Informative

      This is simply not true. Assuming both cores are fully loaded, which is the best possible case for dual core, they will still be performing context switches at the same rate as a single chip if you are running more than one process per core. Even if you had the perfect theoretical case for two cores, where you have two independent processes and never context switch, you could run them much faster on the single-core machine. A single-core 5GHz CPU would have to waste more than 20% of its time on context switching to be slower than a dual-core 2GHz CPU, while a real CPU will spend less than 1% (and even on the dual-core CPU, most of the time your kernel will be preempting the process every 10ms, checking if anything else needs to run, and then scheduling it again, so you don't save much).
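The 20% figure follows from comparing aggregate cycle rates. This is a simplification that ignores memory stalls and assumes both cores of the dual-core chip are busy. If a fraction f of the 5 GHz core's cycles goes to context switching, the two designs deliver equal useful work when:

```latex
5\,\text{GHz} \times (1 - f) = 2 \times 2\,\text{GHz} = 4\,\text{GHz}
\quad\Longrightarrow\quad f = 1 - \tfrac{4}{5} = 0.2
```

So the single core only falls behind if more than 20% of its time is lost.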

      The only way the dual core processor would be faster in your example would be if it had more cache than the 5GHz CPU and the working set for your programs fitted into the cache on the dual-core 2GHz chip but not on the 5GHz one, but that's completely independent of the number of cores.

      --
      I am TheRaven on Soylent News
    19. Re:Adapt by TheRaven64 · · Score: 2, Informative

      Erlang, as mentioned elsewhere, is a great example of a high level functional language which parallelizes much better than C/C++,

      No it isn't. Erlang gains absolutely no benefit in terms of parallelism from being a functional language. All of the concurrency of Erlang comes from the CSP model, while functional languages get theirs via an extension to the lambda calculus.

      The one relevant feature of Erlang when talking about functional languages is that it does not allow mutable data other than the process dictionary. If you want to write parallel code in any language, there is one golden rule you should follow:

      No data shall be both mutable and aliased.

      In Erlang, this is enforced for you; the only mutable data structure is the process dictionary. In functional languages, this is typically handled via something like monads. There is nothing stopping you from enforcing this constraint in an imperative language, however, and if you follow this simple rule then concurrent programming is easy.
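Python does not enforce the rule, but it can be followed by convention. In this hypothetical `count_words` sketch, the input is frozen into a tuple so every worker can alias it safely. Each worker's `Counter` is mutable but visible only to that worker. Each partial result changes hands exactly once, through the pool's return value, before the calling thread merges it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines, workers=4):
    """Word counts computed in parallel without any data that is both mutable and aliased."""
    shared = tuple(lines)  # immutable, so every worker may alias it safely

    def work(index):
        local = Counter()  # mutable, but only this call can see it
        for line in shared[index::workers]:
            local.update(line.split())
        return local  # ownership passes to the caller; the worker keeps no reference

    totals = Counter()  # mutated only by the calling thread
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(work, range(workers)):
            totals.update(partial)
    return totals


print(count_words(["a b a", "b c", "a"]))
```

No locks appear anywhere. Since nothing mutable is shared, there is nothing to protect.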

    20. Re:Adapt by gbjbaanb · · Score: 2, Insightful

      Yeah, I reckon you've hit on why things are "single-threaded" by design. So the solution is to start getting creative with sections of programs and not the whole.

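      For example, if you're using OpenMP to introduce parallelisation, you can easily make loops run in multi-core mode, and you'll get compiler errors if you try to parallelise loops that aren't in a form OpenMP can split up (though the compiler generally won't catch dependencies between iterations; avoiding those is up to you).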

      Like your building analogy - sure, you have to finish one floor before you can put the next one on, but once the floors are up, you can plumb each room up concurrently. You have to then wait until the plumbing and wiring is done before you can start plastering, and then you have to wait for that to dry before you can decorate - but you can then decorate each room concurrently.
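The room-by-room schedule maps onto a fork-join loop. Each phase runs across all rooms concurrently, and the next phase starts only when every room is done, much as an OpenMP `parallel for` ends with an implicit barrier. Below is a minimal Python sketch with hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def finish_room(phase, room):
    """Stand-in for one trade's work in one room; rooms are independent."""
    return f"{phase} done in {room}"


def run_phases(rooms, phases, workers=4):
    """Run each phase on all rooms concurrently, but never start a phase early.

    list(pool.map(...)) waits for every room, so it acts as the barrier between
    phases (plastering cannot start until all plumbing and wiring is finished).
    """
    history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for phase in phases:
            history.append(list(pool.map(finish_room, [phase] * len(rooms), rooms)))
    return history


for step in run_phases(["kitchen", "bath", "study"], ["plumbing", "plastering", "decorating"]):
    print(step)
```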

      Stuff like that will allow you to easily set some parts running concurrently, and I reckon that's as good as we're going to get unless we start thinking in full-on functional-style programming designs (see the Wikipedia entry for a good example). But I don't hold out hope for that anytime soon; it's still hard to get right if the task is not simple.

      Besides, who really needs 8 cores anyway - unless there are specialist tasks (and I can think of only a few) the biggest problems we have are memory and IO bandwidth, not CPU performance.

    21. Re:Adapt by AmiMoJo · · Score: 3, Interesting

      So, we can broadly say that there are three areas where we can parallelise.

      First you have the document level. Google Chrome is a good example of this: first we had the concept of multiple documents open in the same program, and now we have a separate thread or process for each "document" (or tab in this case). Games are also moving ahead in this area, using separate threads for graphics, AI, sound, physics and so on.

      Then you have the OS level. Say the user clicks to sort a table of data into a new order; the OS can take care of that. It's a standard part of the GUI system and can be set off as a separate thread. Of course, some intelligence is required here, as it's only worth spawning another thread if the sort is going to take an appreciable amount of time.

      At the bottom you have the algorithm level, which is the hard one. So far this level has got a lot of attention, but the others relatively little. The first two are the low-hanging fruit, which is where people should be concentrating.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    22. Re:Adapt by TheRaven64 · · Score: 4, Insightful

      So the chip companies are generally going to end up spending process improvements by making chips cheaper, rather than more complex?

      Probably. Cheaper, and less power-hungry. For the past 50 years we've had a set of cycles where computers get dedicated hardware for some task, then the general-purpose hardware gets fast enough to run it and the dedicated hardware goes away, then the cycle repeats with some other algorithm (sound, 2D video, and so on). The side effect of moving a task onto general-purpose hardware is that it consumes a lot more power. For any algorithm, you can design dedicated hardware that executes it with less power than a general-purpose CPU. The DSP on something like an OMAP3 can decode MP3 audio in under 50mW; even something like the Atom is going to struggle to get within two orders of magnitude of this.

      This wasn't a problem for desktop PCs, because they were plugged into the mains and no one has itemised electricity bills, so no one notices the difference between a 20W and a 100W CPU. In a laptop or palmtop, the difference between 250mW (a typical ARM Cortex A8 SoC) and 20W (Atom + a cheap chipset) can be several hours of battery life. People are starting to expect 10 hours of battery life from portables, and doing this with a small battery requires a lot of dedicated silicon that can be turned off when not in use and draw small amounts of power when executing the task it was designed for.

      I expect the future of CPUs will be heterogeneous multicore. In a way, that's the present of CPUs too; you can consider the FPU and vector unit as separate, specialised, cores (although they lack separate control instructions, so it's stretching it slightly).

    23. Re:Adapt by beav007 · · Score: 5, Insightful

      It's posts like these that make me think that I'm the only one with 7 programs on the task bar, 12 in the system tray, assorted server processes, and 32 tabs open in Firefox (come on, 1 thread per tab!!). It doesn't much matter to me if each of these isn't multithreaded, as long as the OS is smart enough to put active threads on different cores.

    24. Re:Adapt by Opyros · · Score: 4, Funny

      Thanks for the explanation -- for a moment, I was actually wondering what OpenOffice.org's parallelization mechanisms had to do with anything!

    25. Re:Adapt by bertok · · Score: 2, Insightful

      I think the consensus was that making compilers emit efficient VLIW code for a typical procedural language such as C is very hard. Intel spent many millions on compiler research, and it took them years to get anywhere. I heard of 40% improvements in the first year or two, which implies that they were very far from ideal when they started.

      To achieve automatic parallelism, we need a different architecture to classic "x86 style" procedural assembly. Programming languages have to change too, the current crop are too close to the metal. I suspect that in the future, languages will rely on intermediate byte-code more, and become ever more functional as designers realize that functional code is easy to transform due to a lack of side-effects.

      I've heard of automatically parallelized versions of some pure functional languages that can execute almost any code on almost any number of CPUs without the programmer ever having to write a single synchronization instruction! For example, Microsoft is working on "parallel LINQ" in C# 4.0, which is essentially a small island of parallelizable functional code that can be embedded in a procedural language.

    26. Re:Adapt by mgblst · · Score: 2, Insightful

      A better but less humorous analogy would be to consider that Intel and co can't keep increasing the top speed of a car, so they are putting more seats into your car. This works OK when you have lots of people to transport, but when you only have one or two, it doesn't make the journey any faster. The problem is, most journeys only consist of one or two people. What the article is suggesting is that we implement some sort of car-sharing initiative, so we stop taking so many cars to the same destination. Or a bus!

    27. Re:Adapt by giorgist · · Score: 2, Interesting

      You haven't seen a building go up. You don't place a brick, render it, paint it, hang a picture frame and then go on to the next one.

      A multi-story building has a myriad of things happening at the same time. If only computers did as much parallel processing.
      If you have 100 or 1000 people working on a building, each is an independent process that shares resources.

      It is simple: 8-core CPUs are a solution that arrived before the problem. A good 10-year-old computer can do most of today's office work.

    28. Re:Adapt by try_anything · · Score: 2, Interesting

      Short answer: only one thing I mentioned involved disk I/O, RAM is cheap, and application frameworks typically limit the number of jobs being run at one time.

      If there's really a performance need to serialize tasks involving disk I/O, then go ahead and serialize them. Eclipse, the application framework I'm most familiar with, makes this straightforward: just define a scheduling policy that allows only one job to run at a time and apply that policy to all your disk I/O jobs. Other jobs will continue to be scheduled and run according to the default policy or whatever other policy you specify -- might as well get some work done while you're waiting for the I/O to complete.
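      The policy described above can be sketched in Python as follows. This is not Eclipse's API, which uses scheduling rules. It is a hypothetical `JobScheduler` that sends disk I/O jobs to a one-worker pool, so they run strictly one at a time, while every other job goes to a general pool and keeps running.

```python
from concurrent.futures import Future, ThreadPoolExecutor

class JobScheduler:
    """Hypothetical scheduler: disk I/O jobs are serialized, other jobs run concurrently."""

    def __init__(self, general_workers: int = 4):
        # A single worker means at most one disk I/O job is ever running.
        self._io_pool = ThreadPoolExecutor(max_workers=1)
        # Everything else shares a normal multi-worker pool.
        self._general_pool = ThreadPoolExecutor(max_workers=general_workers)

    def submit(self, job, *args, disk_io: bool = False) -> Future:
        pool = self._io_pool if disk_io else self._general_pool
        return pool.submit(job, *args)

    def shutdown(self) -> None:
        self._io_pool.shutdown(wait=True)
        self._general_pool.shutdown(wait=True)
```

      Non-I/O work submitted while an I/O job is queued still runs on the general pool, which matches the "get some work done while you're waiting" point.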

    29. Re:Adapt by jd · · Score: 4, Funny

      Well, you see, once IBM buys out Sun, Solaris is going to be re-implemented as macros in OpenOffice. Or Emacs. Whichever one they decide to pick as the new OS kernel.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    30. Re:Adapt by Waffle+Iron · · Score: 3, Insightful

      It's posts like these that make me think that I'm the only one with 7 programs on the task bar, 12 in the system tray, assorted server processes, and 32 tabs open in Firefox (come on, 1 thread per tab!!).

      I'd be willing to bet a good deal of money that almost all of those tasks are currently asleep and waiting for input, a timer signal or external I/O. Such processes don't need *any* cores unless and until they wake up.

      (The big exception for most people would be Flash ads running in those 32 Firefox tabs. The way to solve that problem without adding more cores is to install Flashblock.)

      Right now "ps" says that my system is running 127 different processes. Current CPU utilization? 0.7%.

    31. Re:Adapt by jd · · Score: 4, Funny

      Three Cores for the Gnome kings under the Gtk,
      Seven for the KDE lords in their halls of X,
      Nine for Emacs Men doomed to spawn,

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    32. Re:Adapt by TapeCutter · · Score: 2, Informative

      "a game renders one map at a time because it's pointless to render other maps until the player made his gameplay decisions and arrived there"

      Rendering is perfect for parallel processing. Sure, you only want one map at a time, but each core can render part of the map independently of the other parts.
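      Here is a minimal sketch of that split, assuming a made-up per-pixel `shade` function that depends only on its own coordinates. The map is cut into horizontal bands, each band is rendered independently, and the bands are stitched back together in order. A real renderer would run the bands on separate cores. The threads here just keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 6  # tiny "map" so the example runs instantly

def shade(x: int, y: int) -> int:
    """Hypothetical per-pixel function: no pixel depends on any other pixel."""
    return (x * 31 + y * 17) % 256

def render_band(y_start: int, y_end: int) -> list[list[int]]:
    """Render rows y_start..y_end-1 without looking at any other band."""
    return [[shade(x, y) for x in range(WIDTH)] for y in range(y_start, y_end)]

def render_parallel(bands: int = 3) -> list[list[int]]:
    step = -(-HEIGHT // bands)  # ceiling division so every row is covered
    ranges = [(y, min(y + step, HEIGHT)) for y in range(0, HEIGHT, step)]
    with ThreadPoolExecutor(max_workers=bands) as pool:
        parts = pool.map(lambda r: render_band(*r), ranges)
        # map() yields bands in order, so concatenating them rebuilds the image.
        return [row for part in parts for row in part]
```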

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    33. Re:Adapt by erroneus · · Score: 2, Interesting

      Multi-core processing is one thing, but access to multiple chunks of memory and peripherals is also keeping computers slow. After playing with machines booted via PXE and running NFS roots, I was astounded at how fast those machines performed. Then I realized that the kernel and everything else weren't being delayed waiting on local hardware for disk I/O.

      It seems to me that when NAS and SAN are used, things perform a bit better. I wonder what would happen if such control and I/O systems were built into the same box. Smart RAID controllers are a step in that direction, but they are still accessed as SCSI devices. What might the results be if the secondary storage systems were in a server within the box dedicated to optimized disk I/O? The same sort of thing is being done with GPU processing, but I wonder how much further removed the graphics systems could become.

      Devices need to become smarter and faster to really make things perform at their best speed.

    34. Re:Adapt by david.given · · Score: 3, Interesting

      I expect the future of CPUs will be heterogeneous multicore.

      You may be interested to know that, as far as I can tell from the rather fuzzy documentation, the MSM7201A processor used in the G1 smartphone has at least three dissimilar cores, and potentially five:

      • an ARM11 for the application stack
      • an ARM9 for the radio stack
      • a QDSP4000
      • possibly a QDSP5000, the spec is unclear as to whether you get both this and the QDSP4000
      • a PowerVR 3D accelerator unit, although the spec is again unclear as to whether this is actually in silicon and not just a particular firmware load for the DSP

      I gather that it's pretty hard to make them share address spaces, even the two ARMs; so SMP is probably not feasible. Message-passing via specific shared memory segments is the usual approach.
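      To illustrate that approach, here is a simplified Python model of a single-producer/single-consumer mailbox laid over one fixed shared buffer. The `bytearray` stands in for the shared segment, and the class name is hypothetical. Real inter-core code on such a chip would also need memory barriers and cache maintenance, which this sketch leaves out.

```python
class RingMailbox:
    """Fixed-size SPSC message ring over one shared buffer.

    Only the producer advances `head` and only the consumer advances `tail`,
    so neither side ever writes the other's index.
    """

    def __init__(self, slots: int, slot_size: int):
        self.slots, self.slot_size = slots, slot_size
        self.buf = bytearray(slots * slot_size)  # stands in for the shared segment
        self.lengths = [0] * slots
        self.head = 0  # total messages written (producer-owned)
        self.tail = 0  # total messages read (consumer-owned)

    def try_send(self, msg: bytes) -> bool:
        if len(msg) > self.slot_size:
            raise ValueError("message too large for a slot")
        if self.head - self.tail == self.slots:
            return False  # ring is full
        i = self.head % self.slots
        start = i * self.slot_size
        self.buf[start:start + len(msg)] = msg
        self.lengths[i] = len(msg)
        self.head += 1  # publish only after the payload is in place
        return True

    def try_recv(self):
        if self.tail == self.head:
            return None  # ring is empty
        i = self.tail % self.slots
        start = i * self.slot_size
        msg = bytes(self.buf[start:start + self.lengths[i]])
        self.tail += 1
        return msg
```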

    35. Re:Adapt by SL+Baur · · Score: 3, Informative

      Short answer: only one thing I mentioned involved disk I/O, RAM is cheap.

      Not in modern architectures and it depends. Registers are faster than L1 caches. L1 caches are faster than L2 caches, etc.

      See: http://lwn.net/Articles/250967/ for an excellent discussion about how one can dramatically speed up applications by optimizing memory access.

      And I disagree with the title of this thread - Linux (the kernel at least) is quite well prepared for multicore chips.

    36. Re:Adapt by fractoid · · Score: 4, Insightful

      This is the sort of thing I like about Apple's 'Grand Central'.

      What's this 'grand central' thing? From a few brief Google searches it appears to be a framework for using graphics shaders to offload number crunching to the video card. It'd be nice if they'd stick (at least for technical audiences) to slightly more descriptive and less grandiose labels.

      <rant>
      That's always been my main peeve with Apple: they give opaque, grandiloquent names to standard technologies, make ridiculous performance claims, then set their foaming fanboys loose to harass those of us who just want to get the job done. Remember "AltiVec" (which my friend swore could burn a picture of Jesus's toenails onto a piece of toast on the far side of the moon with a laser beam composed purely of blindingly fast array calculations), which turned out to just be a slightly better MMX-like SIMD add-on?

      Or the G3/G4 processors, which led us to be breathlessly sprayed with superlatives for years until Apple ditched them for the next big thing - Intel processors! Us stupid, drone-like "windoze" users would never see the genius in using Intel proce... oh wait. No, no wait. We got the same "oooh the Intel Mac is 157 times faster than an Intel PC" for at least six months until 'homebrew' OSX finally proved that the hardware is exactly the friggin same now. For a while, thank God, they've been reduced to lavishing praise on the case design and elegant headphone plug placement. It looks like that's coming to an end, though.
      </rant>

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
    37. Re:Adapt by Draek · · Score: 5, Funny

      Three Cores for the Mozilla-kings under the GUI,
      Seven for the Gnome-lords in their halls of X,
      Nine for KDE Men doomed to be flamed,
      One for the Free Scheduler on his free kernel
      In the Land of Linux where the SMP lie.
      One Core to rule them all, One Core to find them,
      One Core to bring them all and in the scheduler bind them
      In the Land of Linux where the SMP lie.

      Which is, of course, what will eventually happen if the number of cores keeps increasing: we'll need one dedicated exclusively to managing what goes where and when. Which is pretty cool when you think about it ;)

      --
      No problem is insoluble in all conceivable circumstances.
    38. Re:Adapt by KingMotley · · Score: 2, Interesting

      I guess that would be highly dependent on your particular field. First, .NET has functional languages like F# and M. I also find that you dismiss the importance of profiling code in .NET simply because it doesn't generate machine-language code. I'm at a loss as to why you would think it is any less important. Determining the areas which are being stressed the hardest and deserve more of your attention is completely unrelated to whether the code generates ASM, ML, or IL.

      You say .NET has "minimal support for it", but I suspect that says more about your understanding of that support. BackgroundWorker is the easy way for highly independent routines to execute in many common scenarios. If that isn't enough or doesn't fit your needs, then the ThreadPool makes highly parallelizable code a snap to implement. Example:

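      ' Requires "Imports System.Threading" at the top of the file.
      ' One wait handle per site, for DoDownload to signal when it finishes.
      Dim eventhandles As New List(Of EventWaitHandle)
      Try
          For Each site As String In sites
              Dim ewh As New EventWaitHandle(False, EventResetMode.ManualReset)
              eventhandles.Add(ewh)
              ' ThreadData (not shown) bundles the wait handle and the URL for DoDownload.
              Dim param As New ThreadData(ewh, "http://" & site)
              ThreadPool.QueueUserWorkItem(AddressOf DoDownload, param)
          Next
      Catch ex As Exception
          ' Report failures instead of silently swallowing them.
          Console.Error.WriteLine(ex.Message)
      End Try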

      Now you are free to write the "DoDownload" routine, which could download some data, validate it, and do some processing on it. With no further changes, the above code would work well on anything from a machine with a single processor to one with 64 or more cores (I haven't tested more than 64 cores). If more control is needed, you can set the number of worker threads that will execute concurrently based on the number of processors in the machine with a single call, or you could implement your own thread pool, or create your own implementation by overriding specific functions of it. I also left the event-handle code in the example, although it isn't needed for the example. It does show exactly how easy it is to create and use some more advanced thread synchronization primitives in .NET.

      Lastly, you could also spawn your own threads if you need/want even more control. It's incredibly easy. Example:
      Dim t1 As New Thread(AddressOf DoDownload)
      t1.Start(param) ' DoDownload expects a state object, so pass it a ThreadData as the pool did

      Not hard stuff, really. Of course, if you want to get into larger scale-outs, then you may want to look into the Azure set of .NET features, which are supposedly designed specifically for large-scale cloud computing (I myself have no experience in that area).

    39. Re:Adapt by TheNinjaroach · · Score: 2, Insightful

      A single-core system at 5GHz would be less-responsive for most users than a dual-core 2GHz. Here's why:

      Because you're going to claim it takes more than 20% CPU time for the faster core to switch tasks? That's doubtful, I'll take the 5GHz chip any day.
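      The 20% figure comes from comparing aggregate cycle rates. This assumes both 2 GHz cores are kept fully busy and that task-switching overhead is the only cost on the single core:

```latex
5\,\text{GHz} \times (1 - x) = 2 \times 2\,\text{GHz}
\quad\Longrightarrow\quad 1 - x = 0.8
\quad\Longrightarrow\quad x = 0.2
```

      Under those assumptions, the single 5 GHz core falls behind only if switching overhead eats more than 20% of its time.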

      --
      I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
    40. Re:Adapt by mr_mischief · · Score: 2, Insightful

      This is indeed true on a general-purpose desktop most of the time. There are many server and workstation tasks, though, that can take as many cores as you can throw at them.

      A web server, application middleware server, or database server will often run multiple single-threaded programs at once rather than running one huge multi-threaded application.

      People who say that you must have multi-threaded applications to use multiple cores are either incompetent or are looking at a very narrow section of the industry. Not everyone runs a single foreground process with just a virus scanner in the background. An SMP or NUMA server with a hundred application instances running isn't going to run all of them on the first four cores and ignore the rest.

      Some scheduling changes may be necessary to make doling out work to a really big number of cores, like 128, 256, or 512, work really well, but Linux already runs on HPC clusters much larger than that, and Windows HPC is supposed to be capable of it, too.

    41. Re:Adapt by mr_mischief · · Score: 2, Interesting

      So let the office workers keep the two-core machines. I'll take the 8-core machine since I'm not doing just word processing and spreadsheets.

      BTW, complex spreadsheets are actually an ideal application to break into parallel execution if there aren't too many dependencies in the functions. A slower and more power-efficient multi-core processor could update all the cells in many spreadsheets just as fast as a faster single-core one.
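      One way to exploit that is sketched below. It is an illustration, not how any particular spreadsheet engine works. Cells are grouped into dependency levels with Kahn's algorithm, and all cells in one level are evaluated concurrently, since none of them reads another. The cell names and formulas are made up, and every referenced cell must appear as a key in `deps`.

```python
from concurrent.futures import ThreadPoolExecutor

def dependency_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group cells so each level depends only on cells in earlier levels."""
    remaining = {cell: set(d) for cell, d in deps.items()}  # copy; don't mutate input
    levels = []
    while remaining:
        ready = sorted(c for c, d in remaining.items() if not d)
        if not ready:
            raise ValueError("circular reference")
        levels.append(ready)
        for c in ready:
            del remaining[c]
        for d in remaining.values():
            d.difference_update(ready)
    return levels

def recalc(formulas, deps, workers: int = 4) -> dict:
    """Evaluate every cell, running the independent cells of each level in parallel."""
    values = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in dependency_levels(deps):
            # Finish the whole level before publishing its values.
            results = list(pool.map(lambda c: formulas[c](values), level))
            values.update(zip(level, results))
    return values
```

      The more independent cells each level holds, the more cores the recalculation can keep busy. A long chain of dependent cells degenerates to serial work, which is the "not too many dependencies" caveat.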

  2. Nothing new to see here... by Microlith · · Score: 5, Insightful

    So basically yet another tech writer finds out that a huge number of applications are still single threaded, and that it will be a while before we have applications that can take advantage of the cores that the OS isn't actively using at the moment. Well, assuming you're running a desktop and not a server.

    This isn't a performance issue with regard to Windows or Linux; they're quite adept at handling multiple cores. They just don't need that much themselves, and the applications people run these days don't individually need much more than that either.

    So yes, applications need parallelization. The tools for it are rudimentary at best. We know this. Nothing to see here.

    1. Re:Nothing new to see here... by thrillseeker · · Score: 2, Interesting

      Did you ever follow the Occam language? It seemed to have parallelization intrinsic, but it never went anywhere.

    2. Re:Nothing new to see here... by phantomfive · · Score: 5, Interesting
      From the article:

      The onus may ultimately lie with developers to bridge the gap between hardware and software to write better parallel programs......They should open up data sheets and study chip architectures to understand how their code can perform better, he said.

      Here's the problem: most programs spend 99% of their time waiting. MOST of that is waiting for user input. Part of it is waiting for disk access (as mentioned in the AnandTech story, the best thing you can do to speed up your computer is get a faster hard drive/SSD). A minuscule part of it is spent in the processor. If you don't believe me, pull out a profiler and run it on one of your programs; it will show you where things can be easily sped up.
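      A profiler gives the detailed breakdown, but even comparing wall-clock time with CPU time shows the effect. In the sketch below, `mostly_waiting` is a made-up stand-in for a typical interactive task: it sleeps to simulate waiting on the user or the disk, then does a sliver of computation.

```python
import time

def measure(task):
    """Return (wall_seconds, cpu_seconds) for one call of task()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    task()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def mostly_waiting():
    time.sleep(0.2)       # stands in for waiting on user input, disk or network
    sum(range(10_000))    # a tiny amount of actual computation
```

      For a task like this, CPU time is a small fraction of wall time. Extra cores cannot shorten the sleep, so adding them would not make the task noticeably faster.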

      Now, given that the performance of most programs is not processor bound, what is there to gain by parallelizing your program? If the performance gain were really that significant, I would already be writing my program with threads, even with the tools we have now. The fact of the matter is that in most cases there is really no point in writing your program in a parallel manner. This is something a lot of the proponents of Haskell don't seem to understand: even if their program is easily parallelizable, the performance gain is not likely to be noticeable. Speeding up hard drives will make more of a difference to performance in most cases than adding cores.

      I for one am certainly not going to be reading chip data sheets unless there's some real performance benefit to be found. If there's enough benefit, I may even write parts in assembly, I can handle any ugliness. But only if there's a benefit from doing so.

      --
      Qxe4
    3. Re:Nothing new to see here... by 0123456 · · Score: 2, Informative

      Did you ever follow the Occam language? It seemed to have parallelization intrinsic, but it never went anywhere.

      Occam was heavily tied into the Transputer, and without the transputer's hardware support for message-passing, it's a bit of a non-starter.

      It also wasn't easy to write if you couldn't break your application down into a series of simple processes passing messages to each other. I suspect it would go down better today now people are used to writing object-oriented code, which is a much better match to the message-passing idea than the C code that was more common at the time.
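      The style described here, simple processes that share nothing and talk only over channels, can be approximated in Python with threads and queues. One simplification applies: Occam channels are synchronous rendezvous points, while the queues below are buffered. The `producer`/`squarer` pipeline is purely illustrative.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of a stream

def producer(out_ch: queue.Queue, n: int) -> None:
    for i in range(n):
        out_ch.put(i)
    out_ch.put(STOP)

def squarer(in_ch: queue.Queue, out_ch: queue.Queue) -> None:
    while (item := in_ch.get()) is not STOP:
        out_ch.put(item * item)
    out_ch.put(STOP)

def run_pipeline(n: int) -> list[int]:
    """Wire the processes together with channels; they share no other state."""
    a, b = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for w in workers:
        w.start()
    results = []
    while (item := b.get()) is not STOP:
        results.append(item)
    for w in workers:
        w.join()
    return results
```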

    4. Re:Nothing new to see here... by ari+wins · · Score: 5, Funny

      I almost modded you Redundant to help get your point across.

      --
      Don't worry if you're a kleptomaniac, you can always take something for it.
    5. Re:Nothing new to see here... by caerwyn · · Score: 2, Insightful

      This is true to a point. The problem is that, in modern applications, when the user *does* do something there's generally a whole cascade of computation that happens in response, and the biggest concern for most applications is that the app appear to have low latency. That is, all of that computation should happen as quickly as possible so the app can go back to waiting for user input.

      There's a lot of gain that can be gotten by threading user input responses in many scenarios. Who cares if the user often waits 5 minutes before input? When he *does* do something, he wants it done *immediately*. The fact that it's a tiny percentage by wall time doesn't change the fact that responsiveness here is a massive percentage of user perception.

      --
      The ringing of the division bell has begun... -PF
    6. Re:Nothing new to see here... by caerwyn · · Score: 3, Interesting

      I don't entirely agree with you here. A lot of current applications *do* suffer from CPU-induced latency after user interactions, and the problem is simple: they don't differentiate between the things that must get done before control is returned to the user, and the things that need to happen in response to the action but can be allowed to happen whenever resources are free. Even when the problem is resource-access latency, multithreading can be a win because that latency no longer contributes to the latency that the user perceives if it happens on a background thread.

      Something as simple as tossing function calls off on a background thread to deal with some of these tasks would do a great deal to improve latency from the user's perspective, and is really quite trivial to implement. Most programmers don't do it, though. Part of that is that in most situations there aren't ready-made solutions: you can't just say "run this function call on a background thread", you've got to go through the pthread creation process, etc. (Apple's Cocoa framework is actually an exception to this with its NSOperation.)

      The situation is analogous to that of an interrupt task: Do absolutely as little as possible before returning; everything else should happen on some other thread.
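      Here is a minimal sketch of that split, using Python's `concurrent.futures`. The handler and the `update_search_index` follow-up task are hypothetical. The handler does only the step that must finish before control returns, hands the rest to a background pool, and returns at once.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

background = ThreadPoolExecutor(max_workers=2)  # shared pool for deferred work

def update_search_index(doc: str) -> int:
    """Hypothetical slow follow-up work the user should not have to wait for."""
    time.sleep(0.1)
    return len(doc.split())

def on_save_clicked(doc: str, saved: list) -> Future:
    """Event handler: do the must-have step, defer the rest, return immediately."""
    saved.append(doc)                                   # must finish before returning
    return background.submit(update_search_index, doc)  # runs when resources allow
```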

      I agree with you regarding optimization, but it's been my experience that many applications *can* benefit from these sorts of simple multithreading techniques- the programmers just don't do them, either from lack of ability or lack of resources.

      --
      The ringing of the division bell has begun... -PF
    7. Re:Nothing new to see here... by Raenex · · Score: 2, Insightful

      There are plenty of good designs that work in a single-threaded environment that do not work in a multi-threaded environment. It's just a completely different ballgame when you allow multiple threads to be running on the same piece of code. With threading, the complexity goes up an order of magnitude and so does the penalty for failure.

      Anyways, I'm out. This is the standard debate about "good" programmers and "good" designs vs dangerous techniques that should be avoided.

  3. There's a simple paradigm here by mysidia · · Score: 5, Interesting

    Multiple virtual machines on the same piece of metal, with a workstation hypervisor, and intelligent balancing of apps between backends.

    Multiple OSes sharing the same cores. Multiple apps running on the different OSes, and working together.

    Which can also be used to provide fault tolerance: if one of the worker apps fails, or even one of the OSes fails, your processing capacity is reduced, but a worker app in a different OS takes over; with checkpointing procedures and shared state, the apps don't even lose data.

    You should even be able to shut down a virtual OS for Windows updates without impact, if the apps get designed properly...

  4. Huh? by Samschnooks · · Score: 5, Funny

    ...programmers are to blame for that

    The development tools aren't available and research is only starting."

    Stupid programmers! Not able to develop software without the tools! In my day we wrote our own tools - in the snow, uphill, both ways! We didn't need no stink'n vendor to do it for us - and we liked it that way!

  5. The article's turning a real problem into FUD. by davecb · · Score: 5, Informative

    Firstly, it's false on the face of it: Ubuntu is certified on the Sun T2000, a 32-thread machine, and Canonical is supporting it.

    Secondly, it's the same FUD as we heard from uniprocessor manufacturers when multiprocessors first came out: this new "symmetrical multiprocessing" stuff will never work, it'll bottleneck on locks.

    The real problem is that some programs are indeed badly written. In most cases, you just run lots of individual instances of them. Others, for grid, are well-written, and scale wonderfully.

    The ones in the middle are the problem, as they need to coordinate to some degree, and don't do that well. It's a research area in computer science, and one of the interesting areas is in transactional memory.

    That's what the folks at the Multicore Expo are worried about: Linux itself is fine, and has been for a while.

    --dave

    --
    davecb@spamcop.net
    1. Re:The article's turning a real problem into FUD. by cowbutt · · Score: 4, Funny

      I dunno, I'm not feeling particularly fearful or doubtful after reading the article.

      The article has, apparently, sown Uncertainty in you, however, so it was 33.3% successful.

    2. Re:The article's turning a real problem into FUD. by tepples · · Score: 2

      Dealing with multiple cores is the operating system's problem - not the application's. If the programmer uses multiple threads or processes, then it should be the OS that worries about allocating resources among the cores.

      But the problem of TFA is that desktop applications don't use enough threads or processes. If the programmer hasn't split an application into multiple threads or processes, then there usually isn't more than one thread of one process that wants to run at any given time, and there is nothing for the operating system to schedule.

  6. Re:Clarification please by bucky0 · · Score: 2

    I guess you could read it and find out...

    seriously?

    --

    -Bucky
  7. Example: Scripting Languages by mcrbids · · Score: 3, Interesting

    Languages like PHP/Perl, as a rule, are not designed for threading - at ALL. This makes multi-core performance a non-starter. Sure, you can run more INSTANCES of the language with multiple cores, but you can't get any single instance of a script to run any faster than what a single core can do.

    I have, so, so, SOOOO many times wished I could split a PHP script into threads, but it's just not there. The closest you can get is with (heavy, slow, painful) forking and multiprocess communication through sockets or (worse) shared memory.
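      For comparison, here is the fork-and-pipe pattern in Python on a POSIX system, using `os.fork` and `os.pipe`. It is not PHP's `pcntl_fork`, and `fork_workers` is a hypothetical helper. Each child computes one piece of work and sends the result back over its own pipe.

```python
import os

def fork_workers(chunks: list[list[int]]) -> list[int]:
    """Fork one child per chunk; each child sums its chunk and writes the result to a pipe."""
    readers, pids = [], []
    for chunk in chunks:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                      # child: compute, report, exit immediately
            os.close(r)
            os.write(w, str(sum(chunk)).encode())
            os.close(w)
            os._exit(0)
        os.close(w)                       # parent keeps only the read end
        readers.append(r)
        pids.append(pid)
    results = []
    for r, pid in zip(readers, pids):
        with os.fdopen(r, "rb") as f:     # read until the child closes its end
            results.append(int(f.read().decode()))
        os.waitpid(pid, 0)                # reap the child
    return results
```

      The parent closes each write end before forking the next child. That ensures each read sees end-of-file once its own child exits.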

    Truth be told, there's a whole rash of security issues through race conditions that we'll soon have crawling out of nearly every pore as the development community slowly digests multi-threaded applications (for real!) in the newly commoditized multi-CPU environment.

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:Example: Scripting Languages by dansmith01 · · Score: 5, Interesting

      Perl has excellent support for building threaded applications. See http://perldoc.perl.org/threads.html . I code multi-threaded apps in perl all the time and they utilize my quad-core very efficiently - in fact, my biggest hassle with multithreading is keeping the CPU cooled! There's also a threads::shared module (http://perldoc.perl.org/threads/shared.html) for handling locks, etc. I'd be hard pressed to imagine better language support for threading. Hardware, operating systems, and a lot of languages support threading. Granted, it isn't always easy/possible/worth it, but as things currently stand, the only bottleneck is programmers who are too lazy to design their algorithms for parallel execution.
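      In Perl the lock would come from threads::shared. The sketch below shows the same pattern in Python with `threading.Lock`: several threads update one shared dictionary, and the lock keeps each read-modify-write from interleaving with another thread's. The `count_words_threaded` function is illustrative only.

```python
import threading

def count_words_threaded(lines: list[str], workers: int = 4) -> dict[str, int]:
    """Count words with several threads sharing one dict, guarded by a lock."""
    counts: dict[str, int] = {}
    lock = threading.Lock()

    def work(chunk: list[str]) -> None:
        for line in chunk:
            for word in line.split():
                with lock:  # the get-then-set pair must not interleave across threads
                    counts[word] = counts.get(word, 0) + 1

    chunks = [lines[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```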

    2. Re:Example: Scripting Languages by amorsen · · Score: 4, Insightful

      Fork isn't slow or painful. And if you think shared memory is a bad way to communicate, you REALLY won't like threads.

      --
      Finally! A year of moderation! Ready for 2019?
  8. How many tools do you need? by Anonymous Coward · · Score: 5, Insightful

    "The development tools aren't available and research is only starting"

    Hardly. Erlang's been around 20 years. Newer languages like Scala, Clojure, and F# all have strong concurrency. Haskell has had a lot of recent effort in concurrency (www.haskell.org/~simonmar/papers/multicore-ghc.pdf).

    If you prefer books, there's Patterns for Parallel Programming, The Art of Multiprocessor Programming, and Java Concurrency in Practice, to name a few.

    All of these are available now, and some have been available for years.

    The problem isn't that tools aren't available, it's that the programmers aren't preparing themselves and haven't embraced the right tools.

  9. BeOS by Snowblindeye · · Score: 5, Interesting

    Too bad BeOS died. One of the axioms the developers had was 'the machine is a multi processor machine', and everything was built to support that.

    Seems like they were 15 years ahead of their time. But, on the other hand, too late to establish another OS in a saturated market. Pity, really.

    1. Re:BeOS by yakumo.unr · · Score: 2, Informative

      So you missed Zeta then ? http://www.zeta-os.com/cms/news.php (change to English via the dropdown on the left)

    2. Re:BeOS by b4dc0d3r · · Score: 2, Informative

      Looks dead to me, a year ago they posted this:

      With immediate effect, magnussoft Deutschland GmbH has stopped the distribution of magnussoft Zeta 1.21 and magnussoft Zeta 1.5. According to the statement of Access Co. Ltd., neither yellowTAB GmbH nor magnussoft Deutschland GmbH are authorized to distribute Zeta.

      http://www.bitsofnews.com/content/view/5498/44/

    3. Re:BeOS by verbatim_verbose · · Score: 2, Interesting

      It may have been an axiom, but really, what did BeOS do (or want to do) that Linux doesn't do now?

      The Linux OS has been scaled to thousands of CPUs. Sure, most applications don't benefit from multi-processors, but that'd be true in BeOS, too.

      I'd honestly like to know if there is some design paradigm that was lost with BeOS that isn't around today.

  10. Grand Central by tepples · · Score: 3, Informative
    Anonymous Coward wrote:

    get a mac..

    I assume you're talking about Mac OS X 10.6 (Snow Leopard), whose Grand Central framework is supposed to add some tools to make Mac-exclusive multithreaded apps easier to program.

  11. Don't imagine. Its name was Java. by tepples · · Score: 4, Informative

    imagine software being developed for imaginary or speculatory hardware.

    I think Sun called it "Java". It was run on emulators long before ARM and others came out with hardware-assisted JVMs such as Jazelle.

    1. Re:Don't imagine. Its name was Java. by Dahamma · · Score: 2, Insightful

      Yeah, there are also imaginary languages for imaginary processors like mic1 and stuff. But TFA is talking about operating systems

      Don't state what TFA says if you didn't even read TFA.

      There isn't a SINGLE reference to Linux, Windows, or any other operating system in TFA. It was about lack of developer tools to create effective multithreaded applications, and had nothing to do with operating systems.

  12. Another flamebait story by timothy by Anonymous Coward · · Score: 5, Insightful

    The quote presented in the summary is nowhere to be found in the linked article. To make matters worse, the summary claims that linux and windows aren't designed for multicore computers but the linked article only claims that some applications are not designed to be multi-threaded or running multiple processes. Well, who said that every application under the sun must be heavily multi-threaded or spawning multiple processes? Where's the need for an email client to spawn 8 or 16 threads? Will my address book be any better if it spans a bunch of processes?

    The article is bad and timothy should feel bad. Why is he still responsible for any news being posted on slashdot?

  13. Re:Clarification please by tepples · · Score: 2, Insightful

    Is TFA talking about the Linux or Windows thread and scheduling not good enough for 4+ cores (so your programs no matter how good designed will not benefit from more cores), about being damn hard to split, thread and join tasks, or both?

    I understood the article to refer to the latter. The programming languages that are popular for desktop applications as of the 2000s don't have the proper tools (such as an unordered for-each loop or a rigorous actor model) to make parallel programming easy.
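      As a small illustration of what an unordered for-each buys you, here is a Python sketch using `concurrent.futures.as_completed`. The helper name is made up. Because results are handed back in whatever order they finish, the runtime is free to schedule the iterations however it likes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def unordered_for_each(func, items, workers: int = 4):
    """Run func on every item concurrently, yielding (item, result) as each finishes.

    No ordering is promised; callers that need order must sort afterwards.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(func, item): item for item in items}
        for fut in as_completed(futures):
            yield futures[fut], fut.result()
```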

  14. Parallel programming is hard, film at 11. by Troy+Baer · · Score: 5, Informative

    The /. summary of TFA is almost exquisitely bad. It's not Windows or Linux that's not ready for multicore (as both have supported multi-processor machines for on the order of a decade or more), but rather the userspace applications that aren't ready. The reason is simple: Parallel programming is rather hard, and historically most ISVs haven't wanted to invest in it because they could rely on the processors getting faster every year or two... but no longer.

    One area where I disagree with TFA is the claimed paucity of programming models and tools. Virtually every OS out there supports some kind of concurrent programming model, and often more than one depending on what language is used -- pthreads, Win32 threads, Java threads, OpenMP, MPI or Global Arrays on the high end, etc. Most debuggers (even gdb) also support debugging threaded programs, and if those don't have enough heft, there's always Totalview. The problem is that most ISVs have studiously avoided using any of these except when given no other choice.

    --t

    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
    1. Re:Parallel programming is hard, film at 11. by klossner · · Score: 4, Insightful

      In fact, TFA doesn't even use the words "Linux" or "Windows."

  15. Multithreaded applications not always needed by Pascal+Sartoretti · · Score: 2, Insightful

    Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.

    So what? If I had a 32 core system, at least each running process (even if single-threaded) could have a core just for itself. Only a few basic applications (such as a browser) really need to be designed for multiples threads.

  16. Some tasks are embarrassingly parallel by tepples · · Score: 4, Informative

    Most programs barely use any computational power; in fact, there are very few programs that require all that computing power to operate, and those are certainly well designed.

    Home users do use some apps that could benefit from multiple cores. Video encoding is one of them, but that one is embarrassingly parallel because the encoder could just split the video into quadrants and have each of four cores work on one quadrant.

    1. Re:Some tasks are embarrassingly parallel by evilviper · · Score: 3, Informative

      Video encoding is one of them, but that one is embarrassingly parallel

      This is most certainly not true. While many video codecs have been multi-threading enabled, they always do so at a significant quality reduction.

      because the encoder could just split the video into quadrants and have each of four cores work on one quadrant.

      Many features of H.264 (like GMC) require a whole frame, not a quadrant. In practically all lossy video codecs, motion vectors have to be computed as the differential from the previous. And there are endless other examples. Of course there's little point in going into it, because the next time video encoding comes up on /., dozens of other people will make the exact same uninformed statements...

      Just go visit the x264 mailing list and ask the developers why they stopped using slice-based encoding for multithreaded encoding...

      I used to recommend splitting a 2-hour video into four 30-minute parts and feeding each to a single-threaded encoder.

      That would only make ANY sense with fixed bitrate encoding. It can possibly be used in the second-pass of multipass encoding, but that's not trivial to do by any stretch.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  17. Re:The Core? by palegray.net · · Score: 2, Interesting

    Hey, at least we aren't dealing with the lovely world of Cyrix anymore... those were truly fun times with respect to compiler optimizations (or lack thereof, as it turned out). That and the, um, heat "issues."

  18. Why a web browser needs threads by tepples · · Score: 4, Insightful

    What good are multiple cores and threads when you are running event driven GUI application?

    Mozilla Firefox is an event-driven GUI application. But if I open a page in a new tab, a big reflow or JavaScript run in that page can freeze the page I'm looking at. You can see this yourself: open this page in multiple tabs, and then try to scroll the foreground page. If Firefox used a thread or process per page like Google Chrome does, the operating system would take care of this. Other applications need to spawn threads when calling an API that blocks, such as gethostbyname() or getaddrinfo(); otherwise, the part of the program that interacts with the user will freeze. But these are the kind of threads that are useful even on a single core, not multicore-specific optimizations.
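    A minimal Python sketch of the single-core-friendly pattern described above: push a blocking lookup onto a worker thread so the loop that talks to the user keeps running. The helper name `resolve_in_background` and its callback signature are hypothetical; `socket.getaddrinfo` is the real standard-library call, and the `resolver` parameter exists only so the sketch can be exercised without a network.

```python
import socket
import threading
from typing import Any, Callable, Optional


def resolve_in_background(
    host: str,
    port: int,
    on_done: Callable[[Optional[list], Optional[Exception]], None],
    resolver: Callable[..., Any] = socket.getaddrinfo,
) -> threading.Thread:
    """Run a blocking name lookup on a worker thread (hypothetical helper).

    on_done(result, None) is called on success, on_done(None, error) on failure.
    The calling thread returns immediately and can keep servicing the UI.
    """
    def worker() -> None:
        try:
            result = resolver(host, port)  # may block for seconds on a slow DNS server
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            on_done(None, exc)
        else:
            on_done(result, None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

    In a real GUI the callback would typically hand the result back to the event loop rather than touch widgets from the worker thread, since most toolkits expect UI updates on their own thread.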

    1. Re:Why a web browser needs threads by Snowblindeye · · Score: 2, Insightful

      open this page in multiple tabs, and then try to scroll the foreground page. If Firefox used a thread or process per page like Google Chrome does, the operating system would take care of this.

      I think you are gravely oversimplifying things. Firefox certainly uses multiple threads: my Firefox process is using 16 threads at the moment. The reason Chrome uses processes is so that when one of them crashes, the others stay up.

      Also, if you look closely, it doesn't completely lock up while the other tabs are loading. It *does*, however, lock up at some point during rendering, which would indicate that some parts of the code are synchronizing between threads, or bottlenecking on some resource, and that locks it up.

      Which is part of the problem. It's easy to say people need to use more threads. But the trouble comes when threads need to synchronize, when they need to communicate with each other. That's when you introduce performance bottlenecks. It's also one of the reasons why threading is harder than it seems.
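      To make the synchronization point concrete, here is a small Python sketch (class and function names are hypothetical). Several threads update one shared counter through a lock; both modes give the same answer, but the batched mode does its work privately and takes the lock once per thread instead of once per increment, which is the usual way to cut contention. In CPython the GIL also serializes pure-Python bytecode, so this shows correctness and structure, not a speedup.

```python
import threading


class SharedCounter:
    """A counter shared by several threads (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n: int = 1) -> None:
        # Every caller must take the same lock: this is the synchronization
        # point where threads wait for one another.
        with self._lock:
            self.value += n


def count_with_threads(counter: SharedCounter, threads: int,
                       per_thread: int, batch: bool) -> int:
    def work() -> None:
        if batch:
            # Accumulate privately, then synchronize exactly once.
            local = 0
            for _ in range(per_thread):
                local += 1
            counter.add(local)
        else:
            # Synchronize on every single increment.
            for _ in range(per_thread):
                counter.add()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```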

  19. It's already there by wurp · · Score: 3, Insightful

    Seriously, no one has brought up functional programming, LISP, Scala or Erlang? When you use functional programming, no data changes and so each call can happen on another thread, with the main thread blocking when (& not before) it needs the return value. In particular, Erlang and Scala are specifically designed to make the most of multiple cores/processors/machines.
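    A rough Python analogue of that idea, using the standard library's `concurrent.futures` as a stand-in for the futures and actors of Scala or Erlang (function names are hypothetical). Because the function is pure, calls can be handed to other threads freely, and the caller blocks only when it actually asks for a result. CPython threads do not run Python bytecode in parallel; for CPU-bound work `ProcessPoolExecutor` offers the same interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def checksum(word: str) -> int:
    # Pure function: depends only on its argument and mutates nothing shared,
    # so calls can run in any order, on any thread.
    return sum(ord(c) for c in word)


def checksums(words: List[str]) -> List[int]:
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(checksum, w) for w in words]
        # ... the calling thread is free to do other work here ...
        # result() blocks only at the point where each value is actually needed.
        return [f.result() for f in futures]
```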

    See also map-reduce and multiprocessor database techniques like BSD and CouchDB (http://books.couchdb.org/relax/eventual-consistency).
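    For readers who haven't met map-reduce, a minimal single-machine word-count sketch in Python (function names hypothetical; real map-reduce systems distribute the same two steps across machines). Each chunk is counted independently, and because merging counts is associative and commutative, the partial results can be combined in any order.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def map_chunk(lines: List[str]) -> Counter:
    # Map step: count one chunk independently; no state is shared between chunks.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(lines: List[str], chunks: int = 4) -> Counter:
    size = max(1, -(-len(lines) // chunks))  # ceiling division
    parts = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, parts))
    # Reduce step: merge the partial counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())
```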

  20. Article = -1 Flamebait by tyler_larson · · Score: 3, Insightful

    If you spend more time assigning blame than you do describing the problem, then clearly you don't have anything insightful to say.

    --
    "With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
    RFC 1925
  21. That's a big leap by SuperKendall · · Score: 3, Insightful

    If you don't believe me, pull out a profiler and run it on one of your programs, it will show you where things can be easily sped up.

    Now, given that the performance of most programs is not processor bound

    That's a pretty big leap, I think.

    Yes, a lot of today's apps are more user bound than anything. But there are plenty of real-world apps that people use that are still pretty processor bound: Photoshop, and image processing in general, is a big one. Video can be too: it starts out disk bound but becomes heavily processor bound as you apply effects.

    Even Javascript apps are processor bound, hence Chrome...

    So there's still a big need for understanding how to take advantage of more cores - because chips aren't really getting faster these days so much as more cores are being added.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
    1. Re:That's a big leap by phantomfive · · Score: 2, Informative

      So there's still a big need for understanding how to take advantage of more cores - because chips aren't really getting faster these days so much as more cores are being added.

      OK, so we can go into more detail. For most programs, parallelization will do essentially nothing. There are a few programs that can benefit from it, as you've mentioned. But those programs are already taking advantage of multiple cores: not only do video encoding programs use them, some can even farm the work out over multiple systems. So it isn't a matter of programmers being lazy, or tools not being available; it's that in most cases, multiple cores won't make a difference. If you run Windows, open the Task Manager and check how often the CPU is completely occupied. Rarely.

      Javascript is an interesting example, because in the last few months we've had something of a competition between browser makers to see who could get the fastest Javascript. Now, I'm not going to go read through the changelogs, but I'm willing to bet that the biggest speedups haven't come from making it multi-threaded, but from standard optimization techniques. Basically they went through with a profiler, found the bottlenecks, and tried to remove them. This is the normal way to optimize a program. If it turns out that the bottleneck is a bunch of work waiting for the processor while another processor sits idle, then you start thinking about making it multi-threaded. If not, making it multi-threaded will gain you nothing as far as performance goes.
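      The profile-first workflow, sketched with Python's standard-library `cProfile` and `pstats`. The workload functions and the `run_profiled` helper are made-up stand-ins; the point is that the report, sorted by cumulative time, tells you where the time goes before you decide whether threads would help at all.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def run_profiled(func: Callable[[], Any], top: int = 5) -> Tuple[Any, str]:
    """Run func under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()


def slow_part(n: int) -> int:
    return sum(i * i for i in range(n))


def fast_part(n: int) -> int:
    return n * 2


def program() -> int:
    return slow_part(200_000) + fast_part(10)


result, report = run_profiled(program)
print(report)  # slow_part accounts for nearly all the cumulative time: fix that first
```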

      --
      Qxe4
    2. Re:That's a big leap by davecb · · Score: 4, Informative

      And if you look at a level lower than the profiler, you find your programs are memory-bound, and getting worse. That's a big part of the push toward multithreaded processors.

      To paraphrase another commentator, they make process switches infinitely fast, so one can keep on using the ALU while your old thread is twiddling its thumbs waiting for a cache-line fill.

      --dave

      --
      davecb@spamcop.net
  22. Not tools, developers by Todd+Knarr · · Score: 3, Insightful

    Part of the problem is that tools do very little to help break programs down into parallelizable tasks. That has to be done by the programmer, they have to take a completely different view of the problem and the methods to be used to solve it. Tools can't help them select algorithms and data structures. One good book related to this was one called something like "Zen of Assembly-Language Optimization". One exercise in it went through a long, detailed process of optimizing a program, going all the way down to hand-coding highly-bummed inner loops in assembly. And it then proceeded to show how a simple program written in interpreted BASIC(!) could completely blow away that hand-optimized assembly-language just by using a more efficient algorithm. Something similar applies to multi-threaded programming: all the tools in the world can't help you much if you've selected an essentially single-threaded approach to the problem. They can help you squeeze out fractional improvements, but to really gain anything you need to put the tools down, step back and select a different approach, one that's inherently parallelizable. And by doing that, without using any tools at all, you'll make more gains than any tool could have given you. Then you can start applying the tools to squeeze even more out, but you have to do the hard skull-sweat first.

    And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept, so teachers leave it as a 1-week "Oh, and you can theoretically do this, now let's move on to the next subject." thing.
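    As a concrete illustration of choosing an inherently parallel approach (a sketch, not an example from the book mentioned above): the obvious running-total loop is sequential, because each step needs the previous total. Restructuring it as a blocked scan makes most of the work independent per block: scan each block, compute one offset per block, then add the offsets. The function names are hypothetical; CPython's GIL means the thread pool here shows the structure rather than a real speedup, which would need processes or a compiled language.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate
from typing import List


def prefix_sum_sequential(xs: List[int]) -> List[int]:
    # Step i needs the running total from step i - 1: a loop-carried dependency.
    return list(accumulate(xs))


def prefix_sum_blocked(xs: List[int], blocks: int = 4) -> List[int]:
    """Same result, restructured so most of the work is independent per block."""
    if not xs:
        return []
    size = -(-len(xs) // blocks)  # ceiling division; blocks must be >= 1
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Phase 1 (parallel): scan each block on its own.
        local = list(pool.map(lambda c: list(accumulate(c)), chunks))
        # Phase 2 (serial, but only one value per block): each block's starting offset.
        offsets = [0] + list(accumulate(scan[-1] for scan in local))[:-1]
        # Phase 3 (parallel): shift every block by its offset.
        shifted = pool.map(lambda pair: [v + pair[0] for v in pair[1]],
                           zip(offsets, local))
        return [v for block in shifted for v in block]
```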

    1. Re:Not tools, developers by EnglishTim · · Score: 2, Insightful

      And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept...

      And there's more to it than that: if a problem is hard, it's going to take longer to write and much longer to debug. Often it's just not worth investing the extra time, money and risk into doing something that's only going to make the program a bit faster. If we proceed to a future where desktop computers all have 256 cores, the speed advantage may be worth it, but currently it's a lot of effort without a great deal of gain. There are probably better ways you can spend your time.

  23. Nutty by eneville · · Score: 2, Insightful

    I disbelieve this entirely. UNIX/Linux is well designed for multi-core CPUs. Just take the single-program, single-small-job approach of a pipeline command and you have your multicore solution ready. Programs whose tasks are I/O bound are frequently written with threading in mind. qmail and Apache are both well written for multiple-core CPUs. I don't see what the article is trying to say. It's clearly wrong.

  24. This is kinda like XML... by FlyingGuy · · Score: 2, Interesting

    it is the answer to the question that no one asked...

    In a real-world application, as others have mentioned, pretty much all of a program's time is spent in an idle loop waiting for something to happen, and in almost all circumstances that is input from the user in whatever form: mouse, keyboard, etc.

    So let's say it is something like Final Cut. Now, to be sure, when someone kicks off a render, that is an operation that can be spun off on its own thread or its own process, freeing up the main process loop to respond to other things the user might be doing. But where the rubber really hits the road is user input. The user could do something that affects the work that was just spun off, whether it runs as a separate thread or process on the same core or on any number of other cores, so you have to keep track of what the user is doing in the context of the work that has been farmed out to other cores/processes/threads.

    Enter the OS. Take your pick, since it really does not matter which OS we are talking about; they all do the same basic things, perhaps differently, but they do. How does an OS designer make sure that, say, 16 cores (dual 8-core processors) are actually well and fairly utilized? Should it dedicate a core to each of the main functions of the OS, say drive access, the comm stack (pick your protocol here), video processing, etc., or should it just run a scheduler like the ones in use now, which farms out threads based on priority? Is there really any priority scheme for multiple cores that could run, say, hundreds of threads/processes each? And what about memory? A single core in a machine that is truly 64-bit can handle a very large amount of memory, and that single core controls and has access to all that RAM at its whim (DMA notwithstanding). But what do you do now that you have 16 cores all wanting to use that memory? Do we create a scheduler to arbitrate access among 16 demanding stand-alone processors, or do we simply give each core a finite memory space and then control the movement of data from one memory space to another? After all, a single process thread (handling the main UI for a program) has to be aware of when something is finished on one core and then get access to that memory to present the results, either as data written to, say, a file or written into video memory for display.

    I submit that the current paradigm of SMP is inadequate for these tasks and must be rethought to take advantage of this new hardware. I think a more efficient approach is that each core detected would be fired up with its own monitor stack as a place to start so that the scheduling is based upon the feedback from each core. The monitor program would be able to ensure that the core it is responsible for is optimized for the kind of work that is presented. This concept while complicated could be implemented and serve as a basis for further development in this very complex space.

    In terms of "super computers" this has been dealt with, but with a very different methodology that I do not think lends itself to general computing. Deep Blue, Crays and things like that aren't really relevant here, since those are mostly very custom designs built for a single purpose and optimized for things like chess, weather modeling or nuclear weapons study, where the problems are already discretely chunked out with a known set of algorithms and processes. General-purpose computing, on the other hand, is like trying to herd cats from the OS point of view, since you never really know what is going to be demanded and how.

    OS designers and user-space software designers need to really break this down and think it all the way through before we get much further, or all this silicon is not going to be used well or efficiently.

    --
    Hey KID! Yeah you, get the fuck off my lawn!
  25. This is incorrect by hazydave · · Score: 3, Funny

    The idea of an OS and/or support tools handling the SMP problem is nothing more than a crutch for bad programming.

    In fact, anyone who grew up with a real multithreaded, multitasking OS is already writing code that will scale just dandy to 8 cores and beyond. When you accept that a thread is nothing more or less than a typical programming construct, you simply write better code. This is no more or less an amazing thing than when regular programmers embraced subroutines or structures.

    This was S.O.P. back in the late 80s under AmigaOS, and enhanced in the early/mid 90s under BeOS. This is not new, and not even remotely tied to the advent of multicore CPUs.

    The problem here is simple: UNIX and Windows. Windows had fake multitasking for so long, Windows programmers barely knew what you could do when you had "thread" in the same toolkit as "subroutine", rather than it being something exotic. UNIX, as a whole, didn't even have lightweight preemptive threads until fairly recently, and UNIX programmers are only slowly catching up.

    However, neither of these is even slightly an OS problem... it's an application-level problem. If programmers continue to code as if they had a 70s-vintage OS, they're going to think in single threads and suck on 8-core CPUs. If programmers update themselves to state-of-the-1980s thinking, they'll scale to 8-cores and well beyond.

    --
    -Dave Haynie
    1. Re:This is incorrect by Todd+Knarr · · Score: 3, Informative

      Unix didn't for a long time have lightweight preemptive threads because it had, from the very beginning, lightweight preemptive processes. I spent a lot of time wondering why Windows programmers were harping on the need for threads to do what I'd been doing for a decade with a simple fork() call. And in fact if you look at the Linux implementation, there are no threads. A thread is simply a process that happens to share memory, file descriptors and such with its parent, and that has some games played with the process ID so it appears to have the same PID as its parent. Nothing new there; I was doing that on BSD Unix back in '85 or so (minus the PID games).

      That was, in fact, one of the things that distinguished Unix from VAX/VMS (which was in a real sense the predecessor to Windows NT, the principal architect of VMS had a big hand in the architecture and internals of NT): On VMS process creation was a massive, time-consuming thing you didn't want to do often, while on Unix process creation was fast and fairly trivial. Unix people scratched their heads at the amount of work VMS people put into keeping everything in a single process, while VMS people boggled at the idea of a program forking off 20 processes to handle things in parallel.
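      The two ends of that spectrum can be seen from Python on a POSIX system (`os.fork` is not available on Windows). In this minimal sketch, with hypothetical function names, a thread's write to a list is visible to the caller afterwards because memory is shared, while a fork()ed child works on its own copy, so its write never reaches the parent. Python's standard library does not expose Linux's clone() sharing flags, so this does not show the shared-memory "process" Linux uses to implement threads, only the contrast between the two models.

```python
import os
import threading
from typing import List


def append_in_thread(data: List[str]) -> List[str]:
    # A thread shares its parent's address space, so the append is visible here.
    worker = threading.Thread(target=data.append, args=("thread",))
    worker.start()
    worker.join()
    return data


def append_in_child_process(data: List[str]) -> List[str]:
    # fork() gives the child its own (copy-on-write) copy of memory.
    pid = os.fork()
    if pid == 0:
        data.append("child")  # modifies only the child's copy
        os._exit(0)           # leave immediately; skip the parent's cleanup handlers
    os.waitpid(pid, 0)
    return data
```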

    2. Re:This is incorrect by dkf · · Score: 2, Interesting

      It's better to have specifically declared shared memory with inherently limited access. At the very least, analysis could catch unlocked accesses to known-shared memory.

      You're better off going to a message-passing model; they're theoretically much more tractable (there are several schemes that have had decades of work done and even spawned programming languages) and they scale up to multi-machine computing (e.g. cluster-scale) much more easily.

      Shared memory parallelism is just plain nasty. Occasionally useful, but always nasty. Use with care and good taste.
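      A minimal sketch of the message-passing style in Python, using the standard library's thread-safe `queue.Queue` as the mailbox. The function names and the `None` sentinel convention are illustrative choices, not any particular framework's API. Workers never touch shared variables; they only receive messages and send replies.

```python
import queue
import threading
from typing import List, Tuple


def square_worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # The worker owns no shared state: all it sees are the messages it receives.
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work
            return
        outbox.put((msg, msg * msg))


def run_workers(jobs: List[int], n_workers: int = 3) -> List[Tuple[int, int]]:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=square_worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for job in jobs:
        inbox.put(job)
    for _ in workers:  # one sentinel per worker, queued after every job
        inbox.put(None)
    for w in workers:
        w.join()
    # All workers have exited, so draining the outbox cannot race with them.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results)
```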

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
  26. Re:multithreading not even in C or C++ by hazydave · · Score: 2, Interesting

    Multithreading is a system-level thing, not a language level thing.

    Sure, there have been languages that make threading ubiquitous, but they've never caught on, and it's hardly necessary.

    You'll notice that internet, graphics, and many other programming necessities are not built into C/C++ either. They are higher-level functions, and thousands of programmers have no problem understanding C's role here. People have been writing multithreaded code in C/C++ for decades... I've personally done it from the 80s until now, under a dozen or so OSs.

    Don't use your chosen language as a crutch for sticking to the level of programming practiced when that language debuted. The whole point of C was not to define much of anything in C itself... in truth, the language proper doesn't even do I/O; that's handled via a library. So is threading, so is graphics, etc.

    --
    -Dave Haynie
  27. Re:what kind of fanboy wrote that article? by hazydave · · Score: 2, Interesting

    That's incorrect, at least in part. Modern MacOS is based on CMU's Mach, which has had lightweight threading support since long before Apple got into the picture. The OS was completely designed for multiple CPUs, down to the very core.

    If modern MacOS apps are not heavily multithreaded (I have no idea; I don't run proprietary hardware anymore, regardless of the OS), that's the fault of programmers not advancing past the days of MacOS 9... it has nothing whatsoever to do with the OS.

    --
    -Dave Haynie
  28. Say what ? by Space+cowboy · · Score: 3, Informative

    Apple have no 2 core intel systems. Period.

    Even the lowly Mac mini is a dual-core system. Every laptop is a dual-core system. The Mac Pro is either 4-core (with hyperthreading for a virtual 8-core) or 8-core (with hyperthreading for a virtual 16-core) system.

    "Better to keep silent and look the fool, rather than speak and remove all doubt"

    Simon.

    --
    Physicists get Hadrons!
    1. Re:Say what ? by Space+cowboy · · Score: 2, Informative

      Gaah - the < was swallowed in the statement "Apple have no <2 core intel systems. Period."

      Probably obvious, but to save people nit-picking

      --
      Physicists get Hadrons!
  29. God! The guy doesnt even know Linux != GNU/Linux by miknix · · Score: 2, Funny

    Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.

    How many times do we have to tell that Linux *IS* the fscking kernel??

    Given that, including Linux and Windows in the same bag doesn't make sense. Which makes the entire post m00t.

    Solutions:
    1) s/Windows/Windows NT kernel/
    2) s/Linux/GNU\/Linux/

    Nice try to get a battle though.

  30. Re:multithreading not even in C or C++ by johannesg · · Score: 2, Informative

    There's not even a way in the C or C++ core language to start a new thread. And with many different third party libraries, there'll never be a reliable standard way to do it.

    Never? A standard, reliable way to do it will be part of C++0x - so that's hardly "never"...

  31. do it the unix way, use pipes! by Gunstick · · Score: 2, Funny

    Unix has for ages run on multi CPU systems. And it does this well. And with easy tools you can harvest the power of all CPUs: the pipe
    Every part of the pipe can run on another CPU.

    I recently came across fslint, which is an example of heavily piped shell.

    In short (leaving out the parameters and options) it runs
    find | sort | tr | sort | bash | merge_hardlinks| uniq | sort | cut | tr | bash | xargs | sort | uniq | cut | sort | tr | xargs | sort | uniq | cut | sort |tr | xargs | sort | uniq | cut | bash | sort

    That's a lot of CPUs :-)
    OK, it's not a great example of CPU-hungry programs. But modern programming languages tend to be monolithic beasts that do everything (Perl, PHP, Java), which leads to programs that don't use pipes or other kinds of inter-process communication, because it's just cumbersome.
    The pipe concept enables multi-CPU programming without even thinking about how to put tasks on different processors.
    Unfortunately, I have not found a language that makes such a simple concept its fundamental programming principle.

    See the unix shell, without the pipe you can't really do much.
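    For comparison, a small Python sketch of the same idea: two real processes, `sort` and `uniq`, connected by an OS pipe through the standard `subprocess` module, so the kernel is free to schedule them on different CPUs. It assumes a POSIX system with `sort` and `uniq` on the PATH; `LC_ALL=C` is set only to make the sort order byte-wise and predictable. The function name is hypothetical.

```python
import os
import subprocess
from typing import List


def sort_unique(lines: List[str]) -> List[str]:
    """Roughly the shell pipeline `sort | uniq`, built from two processes."""
    env = {**os.environ, "LC_ALL": "C"}  # byte-order sorting, independent of locale
    sorter = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, text=True, env=env)
    dedup = subprocess.Popen(["uniq"], stdin=sorter.stdout,
                             stdout=subprocess.PIPE, text=True, env=env)
    sorter.stdout.close()  # pipe now belongs to uniq; sort gets SIGPIPE if uniq exits early
    # sort reads all of its input before writing anything, so this write cannot deadlock.
    sorter.stdin.write("".join(line + "\n" for line in lines))
    sorter.stdin.close()
    out, _ = dedup.communicate()
    sorter.wait()
    return out.splitlines()
```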

    --
    Atari rules... ermm... ruled.
  32. Re:Functional programming style is not enough by SpuriousLogic · · Score: 2, Interesting

    I'm not sure I totally agree that Haskell is the future, although I do think that functional programming right now looks to be the most promising way to deal with multi-core. Scala has some very strong points that could see its adoption beat the others, specifically being able to run on the Java JVM and make use of existing Java libraries. You can use the functional aspects of Scala when you need them, but still use Java where you do not need parallelism.

  33. elephant years by epine · · Score: 2, Insightful

    Knuth's maxim is sufficiently pithy to have become, over time, self referential, as evidenced by your misunderstanding.

    The root of all evil used to be deep and singular, now it is broad and shallow. I guarantee you that Knuth did not include choosing the best fundamental algorithm under the label "premature" unless it involves squabbling over log log N terms or stray digits in the exponent term.

    http://www.siam.org/pdf/news/174.pdf

    An unpacked (deoptimized) version of Knuth's maxim is that the transition from program structure and notation which maximizes readability, comprehension, and conviction (concerning its correctness and merit) to one which favours performance should be delayed as long as possible. Ideally until performance becomes the sole remaining success factor.

    (Taking into account the human mind's special capacity to imprint upon evil, Knuth's formulation remains the better one.)

    Originally Knuth meant manually hoisting loop constant expressions (often in ways that later turn out to not be fully general) or manually evaluating constant expressions or manually fusing nested function calls and the kind of rot that a good compiler these days will do on your behalf. Anyone used the "register" keyword lately? Once upon a time it seemed like a good idea.

    While the principle remains the same, the temptations have changed. Such as parallelizing a bad implementation of a poor algorithm in the misguided belief that the underlying task is not sequentially bound.

    That said, projects which do *no* evil typically fail to impress anyone. The ideal is to wrap a large amount of cleanly structured and accessible source code around a nugget of pure, smoldering evil, coked to the last clock cycle.

    Perversely, the worst example of this is TeX itself. The smoldering nugget of pure evil is the single pass parsing regime and data packing eight bit character values.

    I suspect the literature on parallel programming would roughly equal the literature on electro-chemical storage cells. Sheesh, if only those guys were paying attention, we'd have watch batteries powering small cities by now.

    On second thought, how much literature could there really be if you can summon the majority of it onto your screen in 4/10ths of a second for any combination of keywords?

    Parallel programming is a lot like fuel cells. You get some pretty impressive results on selected applications involving pristine apparatus in a controlled setting, dating back to the Apollo program (in both cases).

    Reality on the ground is rarely so forgiving.

    If we hadn't already achieved a pixel processing speed-up between 1980 and 2008 best approximated by a sideways 8, Javascript wouldn't even have entered the conversation.

    It boils down to this: ignoring everything you guys have already accomplished, you've pretty much done nothing. I worked for that kind of company once. The guy in charge put on a Cirque du Soleil of intestinal recursion. That's how I feel about the claim that software developers haven't been paying attention to parallelism for elephant years.

  34. lack of industry knowledge? by pjr.cc · · Score: 2, Interesting

    It's really quite frustrating to see posts like this: posts that don't take into account what is needed and focus on what we are incapable of doing, even when they don't need to.

    So let's look at reality for a second. First, most modern OSes scale very, very far past 4 CPUs (not sure what Windows scales to, but Linux certainly has no limitation based on current CPU reality). So the kernels are just dandy for multi-core CPUs. Bring it on! 128 cores, we're ready for ya!

    The same is not true at the application level, and that is a fair comment. But don't confuse Linux and Windows with their apps, for crying out loud! From an application point of view we are capable of parallel coding, but it's non-trivial. It's also not something we need a lot of the time.

    For instance, we now buy servers (our cheapest models) with dual CPUs and quad cores, and we tend to virtualize them into several machines with 1 or 2 CPUs each. Whether you do this because you assume the OS will use one CPU and the apps another (as one person told me) is irrelevant. Suffice it to say, having 2 CPUs is usually quite nice.

    But what requires more than that in reality? Well, your desktop might; after all, there's a lot going on at once, right? In some cases that's true (there are quite a number of very heavy applications out there, and surprise, surprise, they can multitask *GASP*).

    Same on the server: not many things require that many CPUs, and even at the application level we've gotten good at spreading heavily loaded applications across multiple servers (we call it load balancing; was that too sarcastic?). Take mail (whether it's Exchange or Postfix or Sendmail or whatever), or web servers, etc. Those server applications that do require heavy grunt tend to already be coded with "parallel" in mind, even across multiple servers (think Oracle RAC).

    As for cache contention - well it sounds like the hardware makers are finally fess'ing up to the fact they have a problem, Houston!

  35. Real Dumb by omb · · Score: 2, Insightful

    As has already been explained, non-sequential thinking is hard. You postulate double speed, but consider the producer thread: the app has finished and handed off the buffer to the OS to send to the GPU, and you say it threads this. Well, fine, the threaded part can run on another core, but then the hardware DMAs the data and waits for a GPU interrupt/done-queue ack, so how does this speed things up on multicore? Not at all. Someone has to set up the DMA and wait, not run, while it completes, so unless all cores are at 100% you have saved nothing and created the additional overhead of spawning a new thread.

    Duh, Marketing Departments

  36. Re:Mythical Machine Month by Tiger4 · · Score: 2, Interesting

    2. I recode for a cluster. Why stop at a multi-core computer? If I can get a 2:1 to 10:1 speed up by writing better code, then why stop at a dual or quad core? The application might require a 100:1 speed up, and that means more computers. If I have a really nasty problem, chances are that 100 cores are required, not just 2 or 8. Multi-core processors are nice, because they reduce cluster size and cost, but a cluster will likely be required.

    I think I agree with you, BUT... don't fall into the old trap: If ten machines can do the job in 1 month, 1 machine can do the job in 10 months. But it doesn't necessarily follow that if one machine can do the job in 10 months, 10 machines can do the job in 1 month.
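    That trap is usually quantified with Amdahl's law: if only a fraction p of the work can be spread across N machines and the rest stays serial, the best possible speedup is 1 / ((1 - p) + p / N). A small Python sketch (the function name is hypothetical; it ignores communication overhead, which only makes things worse):

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup under Amdahl's law (communication costs ignored)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# A 10-month job that is 90% parallelizable, spread over 10 machines:
speedup = amdahl_speedup(0.9, 10)  # 1 / (0.1 + 0.09), about 5.26
months = 10 / speedup              # about 1.9 months, not 1
```

    Even with a million machines, a job that is only half parallelizable never gets more than twice as fast.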

    Also, the problem with runtime interpreters is not that they don't generate assembly code. The problem is that it is harder to get at the underlying code that is really executing. That code could be optimized if you could see it. But seeing it is just more difficult.

    --
    Behold, this dreamer cometh. Come now, and let us slay him... and we shall see what will become of his dreams.
  37. If you are right, we aren't very smart by coryking · · Score: 4, Interesting

    But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.

    The fact that all we do is sequential tasks on our computer means we are still pretty stupid when it comes to "computing". If you look outside your CPU, you'll see the rest of the computers on this planet are massively parallel and do tons and tons of very complex operations far quicker than the computer running on either one of our desks.

    Most of the computers on the planet are organic ones inside of critters of all shapes and sizes. I don't see those guys running around with some context-switching, mega-fast CPU, do you?**. All the critters I see are using parallel computers with each "core" being a rather slow set of neurons.

    Basically, evolution of life on earth seems to suggest that the key to success is going parallel. Perhaps we should take the hint from nature.

    ** unless you count whatever the hell consciousness itself is... "thinking" seems to be single-threaded, but uses a bunch of interrupt hooks triggered by lord knows what running under the hood.

    1. Re:If you are right, we aren't very smart by tftp · · Score: 2, Interesting

      If you look outside your CPU, you'll see the rest of the computers on this planet are massively parallel

      You don't even need to look outside of your computer - it has many microcontrollers, each having a CPU, to do disk I/O, video, audio - even a keyboard has its own microcontroller. This is not far from a mouse being able to think about escape and run at the same time - most mechanical functions in critters are highly automated (a headless chicken is an example.) I don't call it multithreading because these functions are independently operated, just as I don't call a 386 computer dual-core because it has an independent ATA controller or an independent network card. The HDD gets written to and the network data is sent without using the main CPU, but these are independent functions performed by independent hardware. IBM/360 had that already.

      Some people (very few) have the ability to do two dissimilar tasks at the same time. That would be a perfect analogy. But the rest of us, all critters included, are single-threaded, just as you mentioned yourself. Logically, any single thought can't be easily parallelized, but why couldn't we think two thoughts at the same time? I wonder why that is. This question is, IMO, very important, because a brain should technically be capable of that feat, and nevertheless it doesn't do it! I guess this could be because the brain has (or must have?) only one VM to run our consciousness (our persona) on. Since most thoughts [queries] are executed in the volatile context of the owner's persona [database], allowing two thoughts at the same time on two copies of the persona could result in independent modification of both personas, and how do you merge them back then? And if the brain doesn't copy the full database that defines a person for each trivial thought, then running two or more queries in parallel may produce unpredictable results. (Does a brain have semaphores, mutexes and spinlocks? I doubt it; if you are asked to "hold that thought", it takes considerable effort to separate and memorize the context, and often we fail.)

    2. Re:If you are right, we aren't very smart by coryking · · Score: 2, Interesting

      Logically thinking, any single thought can't be easily parallelized, but why couldn't we think two thoughts at the same time?

      Yes, but there is increasing evidence (don't ask me to cite :-) that many of our thoughts are something that some background process has been "thinking about" long (i.e. seconds or minutes) before our actual conscious self does. There are many examples of this in Malcolm Gladwell's "Blink", though I don't feel much like citing them. Part of that book, I think, basically says that we should trust the underlying parallel part of our brain and "go with our gut" more often than Western society feels comfortable doing.

      Basically, yeah, our train of though it single-threaded, but that doesn't mean our train of though isn't just a byproduct of lower-level processes that have figured stuff out long before "we" become aware of it.

    3. Re:If you are right, we aren't very smart by coryking · · Score: 2, Funny

      our train of though it single-threaded, but that doesn't mean our train of though isn't just a byproduct

      And sometimes, even, our background grammar checker misses things that our background finger-controller mistypes while on autopilot. The pairs thought/though and thing/think are stroke patterns that my hand-controller mixes up a lot, and since this isn't something super-formal, the top part of my brain never catches them.

  38. Re:Windows 7 given significant tweaks by fractoid · · Score: 2, Insightful

    [T]hat's why on Mac, Linux or Windows you stick with code that will just work on one core. No problems then.

    That, and the much greater reason that (a) 99% of software these days would run just fine on a single core P4 3GHz, and (b) most programmers are really, really bad and it's much harder to screw up a single-threaded app badly enough that I can't fix it, than it is to screw up a multi-threaded app.
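    To make point (b) concrete, here is a minimal Python sketch (the function name `count_with_lock` is made up for illustration) in which several threads bump one shared counter. `counter += 1` is a read-modify-write, so without the lock two threads can read the same old value and one increment is silently lost; whether that actually happens on a given run depends on the interpreter and on timing, which is exactly what makes such bugs hard to track down and fix.

```python
import threading

def count_with_lock(n_threads: int, increments: int) -> int:
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Holding the lock makes the read-modify-write atomic with
            # respect to the other workers, so no update is lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter

print(count_with_lock(4, 10_000))  # 40000
```

    Deleting the `with lock:` line still produces a program that usually looks fine in testing, which is the trap the comment describes.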

    --
    Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
  39. Re:MOD PARENT UP! by drizek · · Score: 2, Insightful

    Do you need to dedicate an entire 3 GHz CPU core to running your BitTorrent client, and another to refreshing Slashdot?

  40. Re:Dolphins still missing! by Asic+Eng · · Score: 2, Insightful

    Parallel computing and parallel hardware have been around for decades - not on the desktop, but in the supercomputer area. It's a tough problem to solve efficiently, and there are some things which are hard to get around. As an example, think of the equation y = sqrt(a*b): you need two mathematical operations there. It doesn't really help to have two processors, since you need the result of the first operation before you can perform the second. The example isn't very interesting, but you essentially always have this problem: if you rely on the results of previous steps, then you need to do things in order. You can modify your algorithms so that this happens less often, but that is hard work and interferes with your desire to write clean, readable code.
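    A minimal Python sketch of that dependency (the function names are made up for illustration): a single y = sqrt(a*b) is a two-step chain that a second core cannot shorten, while many independent evaluations can be spread across workers. It uses threads for brevity; in CPython the GIL limits the speedup threads give CPU-bound code, so this shows the shape of the work rather than serving as a benchmark.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def dependent(a: float, b: float) -> float:
    # Step 2 needs step 1's result, so these two operations run in order
    # no matter how many cores are available.
    product = a * b
    return math.sqrt(product)

def _eval_pair(pair: tuple[float, float]) -> float:
    return dependent(*pair)

def independent(pairs: list[tuple[float, float]], workers: int = 4) -> list[float]:
    # Separate evaluations share no intermediate results, so each one
    # can go to a different worker; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_eval_pair, pairs))

print(dependent(2.0, 8.0))                      # 4.0
print(independent([(1.0, 4.0), (3.0, 12.0)]))  # [2.0, 6.0]
```

    The parallelism lives between the evaluations, not inside one of them, which is the kind of algorithm restructuring the comment says is hard work.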

  41. Are we talking now about technology or marketing? by Anonymous Coward · · Score: 2, Informative

    If we are talking about technology... The Linux operating system (the monolithic kernel is the operating system) works great on CPUs that have more than 4 cores. In case the article's writer did not know, Linux powers almost all supercomputers. The problem is that applications aren't developed to use that many threads. The OS works just fine, but if the applications cannot use multiple threads, you do not gain anything unless you run multiple instances of them.

    If we are talking about marketing lies and misinformation, the "operating system" (actually a _software system_) does not work at all, because usually this "operating system" cannot use multicore CPUs well. Who should we blame?

    Seriously, Linux just works on multicore CPUs, but that is just an operating system. Software systems like Ubuntu, Fedora and Mandriva just aren't working so well.

  42. The problem is human nature & choice by gooneybird · · Score: 3, Insightful

    "The problem, my dear programmer, as you so eloquently put it, is one of choice..."

    Seriously. I have been involved with software development, from 8-bit PICs to clusters spanning WANs and everything in between, for the past 20 years or so.

    Multiprocessing involves coordination between the processes. It doesn't matter (too much) whether it's separate cores or separate silicon. On any given modern OS there are plenty of examples of multiprocessor execution: Hard drives each have a processor, video cards each have a processor, USB controllers have a processor. All of these work because there is a well-defined API between them and the OS - a.k.a device drivers. People that write good device drivers (and kernel code) understand how an OS works. This is not generally true of the broader developer population.

    Developers keep blaming the CPU manufacturers, saying it's their fault. It's not. What prevents parallel processing from becoming mainstream is the lack of a standard inter-process communication mechanism (at the language level) that abstracts away a lot of the dirty little details. Once the mechanism is in place, people will start using it. I am not referring to semaphores and mutexes. These are synchronization mechanisms, NOT (directly) communication mechanisms... I am not talking about queues either - too much leeway in their use. Sockets would be closer, but most people think of sockets for "network" applications. They should be thinking of them as "distributed applications", as in distributed across cores. As an example, Microsoft has only recently started to demonstrate that they "get it": the next release of Visual Studio will have a messaging library.

    choice:

    At this time there are too many different ways to implement multi-threaded/multi-processor-aware software. Each implementation has possible bugs: race conditions, lockups, priority inversion, etc. The choices need to be narrowed down.

    Having a standard (language & OS) API is the key to providing a framework for developers to use, while still allowing them the freedom to customize for specific needs. So the OS needs an interface for setting CPU/core preferences, and the language needs to provide the API. Once there is an API, developers can "wrap their minds" around the concept, and then things will "take off". As I stated previously, I prefer "message box" mechanisms simply because they port easily, are easy to understand and provide very loosely coupled interaction. All good tenets of a multi-threaded/multi-processor implementation.
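    A minimal Python sketch of the "message box" idea. Python's standard library has no dedicated mailbox type, so the hypothetical `Mailbox` class below wraps `queue.Queue` and exposes only `send` and `receive`, narrowing the leeway a raw queue gives. The worker shares no variables with its caller and talks to it only through messages; the `STOP` sentinel is likewise an assumption of this sketch.

```python
import queue
import threading
from typing import Any

class Mailbox:
    """Hypothetical minimal message box: the only operations are send and receive."""

    def __init__(self) -> None:
        self._q: "queue.Queue[Any]" = queue.Queue()

    def send(self, msg: Any) -> None:
        self._q.put(msg)

    def receive(self) -> Any:
        # Blocks until a message arrives.
        return self._q.get()

STOP = object()  # sentinel message asking the worker to exit

def squarer(inbox: Mailbox, reply_to: Mailbox) -> None:
    # Read requests from our own inbox and post results to the reply box.
    while True:
        msg = inbox.receive()
        if msg is STOP:
            break
        reply_to.send(msg * msg)

def square_all(values: list[int]) -> list[int]:
    inbox, replies = Mailbox(), Mailbox()
    worker = threading.Thread(target=squarer, args=(inbox, replies))
    worker.start()
    for v in values:
        inbox.send(v)
    inbox.send(STOP)
    worker.join()
    # One worker draining a FIFO inbox keeps replies in send order.
    return [replies.receive() for _ in values]

print(square_all([1, 2, 3]))  # [1, 4, 9]
```

    Because the worker only sees messages, it could be moved to another process or machine by swapping the `Mailbox` internals (for example, for a socket) without touching `squarer`, which is the loose coupling the comment is after.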

    Danger Will Robinson:

    One thing I fear is that once the concept catches on, it will be overused or abused. People will start writing threads and processes that don't do enough work to justify the overhead. Everyone who starts writing programs will "advertise" that they're "multi-threaded", as if this somehow automatically indicates quality and/or "better" software... Not.
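    One common way to keep each unit of parallel work big enough to pay for its overhead is to hand workers chunks rather than single items. A minimal Python sketch; the function names and the default `chunk_size` are illustrative, not tuned values. In CPython the GIL keeps threads from running pure-Python code in parallel, so for real CPU-bound speedups `ProcessPoolExecutor` is the usual swap (the helper is a module-level function so it could be pickled); the chunking structure stays the same.

```python
from concurrent.futures import ThreadPoolExecutor

def _sum_squares(chunk: list[int]) -> int:
    # One task = a whole chunk of iterations, not a single iteration.
    return sum(x * x for x in chunk)

def sum_squares_chunked(values: list[int], chunk_size: int = 10_000, workers: int = 4) -> int:
    # Split the input into a few large pieces so the cost of scheduling
    # each task is spread over many loop iterations.
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(_sum_squares, chunks))

print(sum_squares_chunked(list(range(100)), chunk_size=10))  # 328350
```

    Calling it with `chunk_size=1` computes the same answer but schedules 100 tasks instead of 10, each doing almost nothing: the "threads that don't do enough work to justify the overhead" pattern in miniature.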