More Interest In Parallel Programming Outside the US?

Duh? by gustolove · 2008-03-24 19:45 · Score: 4, Informative

Old programmers don't want to learn new things -- trust the tried and true.

Young bucks want to be on the cutting edge to get the jobs that the old people already have.

----

Oh, and people in other countries see the benefit more than those in the U.S. do? Probably not; we're just lazy Americans, though.

Re:Duh? by AuMatar · 2008-03-24 20:18 · Score: 3, Interesting

Young bucks jump on the latest thing without thinking (or having the experience to back their thoughts) about whether or not it's the best way to go.

The experienced programmers know that most parallelisable problems are already being solved by breaking them across machines, and the rest won't be helped by 15 bazillion cores. An extra core or so on a desktop is nice; beyond that, they really won't come anywhere near the speedup they're hyped to deliver.

--
I still have more fans than freaks. WTF is wrong with you people?
Re:Duh? by dubl-u · 2008-03-24 20:24 · Score: 2, Insightful

"Science progresses one funeral at a time." - Max Planck

Software might be slightly better, as Moore's Law has been prodding us forward. On the other hand, given the number of us working in C-like languages (35+ years old), maybe with an OO twist (25+ years), to do web stuff (15-ish years), one funeral at a time might be more than we can manage. Legacy code, alas, can outlive its authors.
Re:Duh? by foobsr · 2008-03-24 21:09 · Score: 3, Informative

the latest thing

1960: E. V. Yevreinov at the Institute of Mathematics in Novosibirsk (IMN) begins work on tightly-coupled, coarse-grain parallel architectures with programmable interconnects. (cf.)

An extra core or so on a desktop is nice; beyond that, they really won't come anywhere near the speedup they're hyped to deliver.

And of course any virtual reality scenario will not profit from extra power.

CC.

--
TaijiQuan (Huang, 5 loosenings)
Re:Duh? by dltaylor · 2008-03-24 22:07 · Score: 5, Interesting

Spoken like a completely ignorant child. How the hell do you think (if you even can) we older guys got into this business? We were tinkering with new things, using them when appropriate, before many of you were born, and the joy of new ideas hasn't worn off. The only difference is we don't do it quite so impulsively, just because it seems new.

For one thing, multiple homogeneous cores are NOT new (heterogeneous ones aren't either, for that matter); only fitting them onto the same die is. I've used quad 68040 systems where, because the CPUs could exchange data between their copy-back caches, some frequently-used data items were NEVER written to memory, and on System V you could couple processing as tightly or loosely as you wanted. Some problem sets take advantage of in-"job" multi-processing better than others, just as some problem sets will take advantage of multiple cores by doing completely different tasks simultaneously. Simple (very) example: copying all of the files between volumes (not a block-for-block clone). If I have two cores, I can either have a multi-threaded equivalent of "cp" which walks the directory tree of the source and dispatches the create/copy jobs in parallel, each core dropping into the kernel as needed, or I can start a "cpio -o" on one core and pipe it to a "cpio -i" on the other, with a decent block size on the pipe. More cores means more dispatch threads in the first case, and background horsepower handling the low-level disk I/O in the second. In my experience, the "cpio" case works better than the multi-threaded "cp" (due, AFAICT, to the locks on the destination directories).
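A minimal Python sketch of the first approach above, the multi-threaded "cp" that walks the source tree and dispatches per-file copy jobs to a pool of workers. The function name `parallel_copy_tree` and its `workers` parameter are hypothetical, and the sketch ignores symlinks, special files, and directory permissions. It does not try to reproduce the poster's observation that the cpio pipeline performed better.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def parallel_copy_tree(src: str, dst: str, workers: int = 4) -> int:
    """Walk src, recreate its directories under dst, and copy files in parallel.

    Hypothetical helper for illustration. Returns the number of files copied.
    """
    jobs = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for root, _dirs, files in os.walk(src):
            rel = os.path.relpath(root, src)
            target_dir = os.path.normpath(os.path.join(dst, rel))
            # Directories are created serially by the walking thread, so a
            # copy job is only submitted once its destination directory exists.
            os.makedirs(target_dir, exist_ok=True)
            for name in files:
                jobs.append(pool.submit(shutil.copy2,
                                        os.path.join(root, name),
                                        os.path.join(target_dir, name)))
        # Re-raise any exception that happened inside a worker thread.
        for job in jobs:
            job.result()
    return len(jobs)
```

Threads are a reasonable fit here because the work is dominated by file I/O; the lock contention on destination directories that the poster mentions happens in the kernel and is not visible in this code.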
Re:Duh? by tedgyz · 2008-03-24 23:03 · Score: 2, Interesting

Young bucks jump on the latest thing without thinking (or having the experience to back their thoughts) about whether or not it's the best way to go.

The experienced programmers know that most parallelisable problems are already being solved by breaking them across machines, and the rest won't be helped by 15 bazillion cores. An extra core or so on a desktop is nice; beyond that, they really won't come anywhere near the speedup they're hyped to deliver.

Mod this guy. Short and to the point.

I have worked extensively with parallel programming since the early '90s. There is no silver bullet. Most problems do not parallelize to large scales. There are always special problems that DO parallelize well, like image and video processing. So, if you are watching 20 video streams, your Intel Infinicore (TM) chip will be worth the $$$.

--
"No matter where you go, there you are." -- Buckaroo Banzai
Re:Duh? by Curunir_wolf · 2008-03-24 23:47 · Score: 1

Wait - you mean I can't just add a compiler switch and get parallel processor support? Dammit! This sounds hard! Damn you Technical Progress - Damn you!

--
"Somebody has to do something. It's just incredibly pathetic it has to be us."
--- Jerry Garcia
Re:Duh? by colmore · 2008-03-24 23:54 · Score: 1

Huh, and according to obvious things that don't surprise anyone who thinks about them for more than five minutes, most COBOL programmers are members of AARP.

Outside of the older, first-world programming communities, there are far more young people than old people. Old programmers typically make their living by having a resume with years of experience in a technology or three that aren't going anywhere (you'll be able to keep a roof over your head for the next 50 years keeping Java business systems running). How many guys out there are still using Fortran and Pascal professionally? Lots. Young guys don't feel like learning decades' worth of back material to catch up with old hands. Far easier to jump on a new thing where you're not anywhere near as far behind as the rest of the world. Nobody has THAT much experience with Ajax, or .NET, or whatever.

And yeah, other countries have better math education, and parallel programming actually involves some math. If you're bright enough to actually think about algorithms in abstract terms (can you work through SICP and Knuth?), you're made in the shade no matter what is hot.

--
In Capitalist America, bank robs you!
Re:Duh? by CFBMoo1 · 2008-03-25 00:14 · Score: 1

An extra core or so on a desktop is nice; beyond that, they really won't come anywhere near the speedup they're hyped to deliver.

I dunno; if more cores could help me run Solitaire and Minesweeper on Vista at the same time, I'm all for it.

--
~~ Behold the flying cow with a rail gun! ~~
Re:Duh? by bit01 · 2008-03-25 00:16 · Score: 4, Interesting

I work in parallel programming too.

Most problems do not parallelize to large scales.

I'm getting tired of this nonsense being propagated. Almost all real world problems parallelize just fine, and to a scale sufficient to solve the problem with linear speedup. It's only when people look at a narrow class of toy problems and artificial restrictions that parallelism "doesn't apply". E.g., look at Google: it searches the entire web in milliseconds using a large array of boxes. Even machine instructions are being processed in parallel these days (out-of-order execution, etc.).

Name a single real world problem that doesn't parallelize. I've asked this question on slashdot on several occasions and I've never received a positive reply. Real world problems like search, FEA, neural nets, compilation, database queries and weather simulation all parallelize well. Problems like orbital mechanics don't parallelize as easily but then they don't need parallelism to achieve bounded answers in faster than real time.

Note: I'm not talking about some problems being intrinsically hard (NP-complete, etc.); many programmers seem to conflate "problem is hard" with "problem cannot be parallelized". Some mediocre programmers also seem to regard parallel programming as voodoo and are oblivious to the fact that they are typically programming a box with dozens of processors in it (keyboard, disk, graphics, printer, monitor, etc.). Some mediocre programmers also claim that because a serial programming language cannot be automatically parallelized, parallelism is hard. Until we can program in a natural language, that just means they're not using a parallel programming language appropriate for their target.

---

Advertising pays for nothing. Who do you think pays marketers' salaries? You do, via higher-cost products.
Re:Duh? by tedgyz · 2008-03-25 00:32 · Score: 2, Informative

I should have said: Most problems do not EASILY parallelize to large scales.

In regards to your comments about mediocre programmers...

You do not recognize the fact that most programmers are mediocre. You can scream at them that it is easy, but they will still end up staring at you like deer in the headlights.

Sorry - we are entering the "Model T" mass production era of software.

--
"No matter where you go, there you are." -- Buckaroo Banzai
Re:Duh? by weyesone · 2008-03-25 00:37 · Score: 1

What we really need is higher I/O bandwidth and faster speeds for memory & disks/SSD.
Re:Duh? by smallfries · 2008-03-25 00:44 · Score: 3, Informative

What do you mean by "parallelize well"? Normally people would use that phrase to imply fine-grained parallelism in the problem, but I suspect that you are using it differently. For example, when you say that compilation parallelizes well, are you talking about coarse-grain parallelism across compilation units? This is not really a single instance of a problem being computed in parallel, but rather many independent instances of a problem. If you are willing to use many independent instances of a problem, then the coarse-grain parallelism will always scale "well", i.e., linearly in the number of cores. But this does not provide a speedup in many real-world applications.

In your compilation example, it is easy to get speedups at the compilation level using something like make -jN. But this assumes that each unit is independent. If you want to apply advanced global optimisations then this is not always the case, and then you hit the harder problem of parallelizing the compilation algorithms rather than just running multiple instances. It's not impossible but I'm not aware of any commercial compilers that do this.
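To make the `make -jN` point concrete, here is a hypothetical Python sketch that groups build targets into "waves": every target in a wave depends only on targets from earlier waves, so all targets in one wave could be built simultaneously. It is a level-by-level version of Kahn's topological sort; the target names are invented, and real make schedules jobs dynamically rather than in strict waves. A whole-program optimization step that needs every unit at once shows up as a single node at the end, which is exactly where this kind of parallelism runs out.

```python
from collections import defaultdict


def build_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group targets into levels whose members can be built in parallel.

    deps maps a target to the set of targets it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Every name mentioned anywhere is a node in the graph.
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {n: set(deps.get(n, ())) for n in nodes}
    dependents = defaultdict(set)
    for target, ds in remaining.items():
        for d in ds:
            dependents[d].add(target)

    waves = []
    ready = sorted(n for n, ds in remaining.items() if not ds)
    while ready:
        waves.append(ready)
        next_ready = []
        for done in ready:
            for t in dependents[done]:
                remaining[t].discard(done)
                if not remaining[t]:  # last dependency just finished
                    next_ready.append(t)
        ready = sorted(next_ready)
    if sum(len(w) for w in waves) != len(nodes):
        raise ValueError("dependency cycle")
    return waves
```

For example, `build_waves({"app": {"main.o", "util.o"}, "main.o": {"main.c"}, "util.o": {"util.c"}})` puts the two object files in the same wave and the link step `app` alone in the last one.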

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Duh? by CastrTroy · 2008-03-25 01:03 · Score: 1

The problem is that things that take O(n) time still take O(n) time when using parallel algorithms. The only difference is that when you use parallel processing, you can make that O(n/p), where p is the number of processors. If you have 100 items to sort, and it's feasible that you could have 100 cores, then it's feasible that you could sort in O(1) time. However, sorting 100 items is trivial anyway, and you probably wouldn't notice the difference between O(1) and O(n^3). When your dataset starts getting really huge, like 1 billion items or even 1 trillion, you have to throw far too many cores at the problem to really speed it up significantly.
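A hypothetical sketch of the O(n/p) idea: split n items into p roughly equal chunks, let each worker reduce its own chunk, then combine the p partial results. The combine step adds O(p) work (or O(log p) steps if done as a tree), so the wall time is roughly n/p + p rather than n/p. Threads are used only to show the structure; in CPython the GIL means this particular CPU-bound sum will not actually run faster.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n: int, p: int) -> list[tuple[int, int]]:
    """Split range(n) into p contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, p)
    bounds, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def parallel_sum(values: list[int], p: int) -> int:
    """Each of p workers sums about n/p items; the partial sums are then combined."""
    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = pool.map(lambda b: sum(values[b[0]:b[1]]),
                            chunk_bounds(len(values), p))
        return sum(partials)  # the serial O(p) combine step
```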

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Duh? by Gr8Apes · 2008-03-25 01:04 · Score: 3, Interesting

I should have said: Most problems do not EASILY parallelize to large scales. ... You do not recognize the fact that most programmers are mediocre. You can scream at them that it is easy, but they will still end up staring at you like deer in the headlights.

Sorry - we are entering the "Model T" mass production era of software.

First, no one said parallelism is easy (in this thread, anyway). I don't care that most "programmers" are mediocre and will never be able to understand parallelism, much as I don't care that most native-English-speaking Americans will never be able to understand or speak Mandarin Chinese. Both facts are about equally irrelevant here, since we're not talking about the masses being able to do parallel programming, speak Mandarin, or make Model Ts, for that matter.

Speaking of your "Model T" mass production comment, I'd disagree quite strongly: I'd say we've entered a split environment of Yugo production (that'd be your mediocre programmers) and the perhaps 10% of programmers who can actually code and are capable of understanding higher-level concepts. I'm being gracious with that 10%; my personal experience has shown that fewer than 3% of programmers are actually capable of coherent coding. That most likely has to do with how programmers are taught, or the lack thereof, but that's another discussion entirely.

--
The cesspool just got a check and balance.
Re:Duh? by CastrTroy · 2008-03-25 01:07 · Score: 4, Informative

Most people who do anything are mediocre. Otherwise, mediocre would be redefined. It's like saying, half the people in the class scored below average. The fact that half the people scored below some value determines what the value of average is.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Duh? by tedgyz · 2008-03-25 01:17 · Score: 1

I would agree with you there. I guess the real question is, do we need highly parallel architectures for the masses?

I think my point was that you will never get much more than 3-10% of the programming population to write useful parallel code. Quality control gets really tough when you introduce synchronization errors that only pop up under certain load conditions.

Personally, I like the high level constructs of Ada and Java. It helps me worry less about micro-managing threads. I've worked with POSIX threads, PVM, VLIW compilers, and so on. I've seen the gamut of granularity and control.

--
"No matter where you go, there you are." -- Buckaroo Banzai
Re:Duh? by mikael · 2008-03-25 01:22 · Score: 1

Young guys don't feel like learning decades worth of back material to catch up with old hands. Far easier to jump on a new thing where you're not anywhere near as far behind as the rest of the world. Nobody has THAT much experience with Ajax, or .NET, or whatever.

You can try and learn all the decades worth of back material in order to catch up, but if you try and apply for one of those jobs that requires those skills, the HR department will still want someone with decades of experience. So attempting to enter those fields is rather futile.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:Duh? by DeadCatX2 · 2008-03-25 01:28 · Score: 1

The experienced programmers know that most parallelisable problems are already being solved by breaking it across machines, and the rest won't be helped by 15 bazillion cores

If it can be solved by breaking it across machines, it most certainly can see improvement by going multi-threaded. What you point out is that multi-threading is hard. Actually, (most) American programmers are lazy; they don't want to remember to synchronize access to that variable, or use transactional instructions in case of a power outage.
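For readers who haven't had to "remember to synchronize access to that variable," a minimal Python illustration: several threads increment one shared counter, and a `threading.Lock` makes each read-modify-write step atomic. The class and function names are hypothetical. Without the lock, `self.value += 1` is not guaranteed to be atomic, so increments can be lost under contention.

```python
import threading


class SharedCounter:
    """A counter that many threads may increment safely."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Holding the lock makes the read, add, and write happen as one step.
        with self._lock:
            self.value += 1


def hammer(counter: SharedCounter, threads: int, per_thread: int) -> int:
    """Start `threads` threads that each increment the counter `per_thread` times."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

With the lock in place, `hammer(SharedCounter(), 8, 10_000)` always ends at exactly 80,000.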

I think, in the end, multi-threading will become the norm when some language manages to incorporate provably-safe multi-threading by abstraction, much like Java abstracts pointers away from the programmer (with all of the caveats that come with abstracting important pieces of the computer away). I predict console programmers blazing the trail for how to handle both symmetric and asymmetric multi-processing at the abstract level, since their current platforms flat-out require it (and remember, we're lazy).

I think non-US programmers aren't as scared about how hard multi-threading is. I wonder if the Japanese game developers' games perform better on the 360 and PS3.

Young bucks jump on the latest thing without thinking (or the experience to back their thoughts) of whether or not its the best way to go.

Note that the summary indicates the threshold of experience at 15 years. Your statement can characterize perhaps 20-30% of this set of programmers. Maybe the Old Guard will think that no problems should be solved by multi-core and ignore it as a technology when it can actually help in significant ways, all because they're confident about their "experience" in a highly-dynamic field.

--
:(){ :|:& };:
Re:Duh? by subliminalpie · 2008-03-25 01:43 · Score: 1

Always be aware of the bias of the author. Of course this guy desperately wants parallel programming to succeed; he's an Intel marketing guy. "James Reinders is a senior engineer who has spent the past 16 years at Intel Corporation working on projects such as the world's first TeraFLOP supercomputer (ASCI Red) and on the compilers and architectures for the Pentium® Pro, Pentium II, Itanium®, Pentium® 4, and iWarp processors. James is currently the director of business development and marketing for Intel's Software Products Division and serves as the division's chief product evangelist." This article seems almost like a troll post, something to deliberately provoke a negative response or a reverse-psychology reaction from the older programmers. "You experienced programmers are LAME. You don't GET parallel programming like the young kids do! You're getting ooolllld and sloooow! You aren't XTREME enough to handle it! Maybe programmers from other nations are more HARDCORE than you!"
Re:Duh? by Anonymous Coward · 2008-03-25 01:50 · Score: 0

Yeah, sure. Or it's that programmers with actual *experience* know that MT programming is still *hard* and know better than to do it when it's not needed. It's easy to write bad, *broken* MT code full of race conditions and deadlocks -- even more so if you don't have any real experience.
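One classic way to rule out a common kind of deadlock is to acquire multiple locks in a single global order. A minimal Python sketch with a hypothetical `Account` class (ids are assumed to be unique): a transfer always locks the account with the smaller id first, so two opposite transfers can never each hold one lock while waiting for the other.

```python
import threading


class Account:
    def __init__(self, account_id: int, balance: int) -> None:
        self.id = account_id  # assumed unique; defines the global lock order
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move money between two distinct accounts without lock-order deadlock."""
    if src is dst:
        raise ValueError("source and destination must differ")
    # Always lock in ascending id order, regardless of transfer direction.
    first, second = (src, dst) if src.id < dst.id else (dst, src)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount
```

If each thread instead locked `src` first, one thread moving money from a to b and another moving it from b to a could each grab one lock and wait forever for the other.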
Re:Duh? by clary · 2008-03-25 02:00 · Score: 1

In the real world, it isn't just asymptotes that matter. Constants matter. A constant speedup of 100 means that you can do in real time (say 1 second) what would otherwise have taken over a minute and a half. Working in real time means you can work in qualitatively different ways. Just ask someone who remembers submitting programs for execution on punch cards.

By the way, I did a quick search and didn't find any O(1) parallel algorithms for sorting, no matter what the processor interconnect. Does anyone know what the parallel lower bound for sorting is?

--
"Rub her feet." -- L.L.
Re:Duh? by jstott · 2008-03-25 02:20 · Score: 1

Name a single real world problem that doesn't parallelize.

How about event-driven programming (or input-driven, same thing) where the interval between events is larger than the time to service an event? In other words, web browsers, email readers, word processors, PowerPoint...

Or, if you prefer scientific research, problems that map to Markov chains usually don't parallelize because of the strong temporal dependence between update steps (individual update steps are fast, but they're strongly correlated to the previous state in some non-deterministic fashion).
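A small Python illustration of the temporal dependence described above: each step of a Markov chain needs the state produced by the previous step, so the steps of one chain cannot be farmed out to different cores. What can run in parallel is many independent chains (different seeds), which answers a different question. The two-state transition table and the function names are invented for the example.

```python
import random

# Hypothetical two-state chain: probability of moving to state 1 from each state.
P_TO_ONE = {0: 0.1, 1: 0.8}


def run_chain(steps: int, seed: int, state: int = 0) -> list[int]:
    rng = random.Random(seed)
    path = [state]
    for _ in range(steps):
        # Step t+1 depends on the state at step t: an inherently serial loop.
        state = 1 if rng.random() < P_TO_ONE[state] else 0
        path.append(state)
    return path


def run_independent_chains(steps: int, seeds: list[int]) -> list[list[int]]:
    # Each chain is self-contained, so these calls could run on separate cores.
    return [run_chain(steps, s) for s in seeds]
```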

-JS

--
Vanity of vanities, all is vanity...
Re:Duh? by JAlexoi · 2008-03-25 02:29 · Score: 1

Right on the point!
And you are going to piss off a ton of mediocre programmers, since most of 'em think they are really good and don't even consider that they might not know the "truth"!
Re:Duh? by mrlpz · 2008-03-25 02:32 · Score: 1

"Name a single real world problem that doesn't parallelize"

Multiple orgasms. Try that one on for size, Einstein. And to the wingnut who says it's "not a real world problem": yeah, your girlfriend called; she says she's going out with Tila Tequila tonight.

Putzes. To those who say "parallelism scales just fine": most real world problems ARE outlying cases of particular algorithms. Of COURSE these things are going to have constraints placed upon them: time, hardware availability, BUDGETS. There is such a thing as "not enough money". Parallelism scales if your problem is producing "cogs": simple, turn-the-crank, don't-think-too-much processes performed ad infinitum.
Re:Duh? by JAlexoi · 2008-03-25 02:35 · Score: 2, Informative

Most people who do anything are mediocre. Otherwise, mediocre would be redefined. It's like saying, half the people in the class scored below average. The fact that half the people scored below some value determines what the value of average is.

Do you have any idea what statistics is? Because your post suggests that you don't.
So in a set 0,10,10,10,10 average is 8, but surprisingly enough only 1 is below average!
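The mean-versus-median distinction being argued over is easy to check with Python's standard `statistics` module, using the two example classes that come up in this thread (0,10,10,10,10, and one score of 100 with nine zeros). The helper name `below_mean` is hypothetical.

```python
from statistics import mean, median


def below_mean(scores: list[float]) -> int:
    """Count how many scores fall strictly below the arithmetic mean."""
    m = mean(scores)
    return sum(1 for s in scores if s < m)


skewed_low = [0, 10, 10, 10, 10]
skewed_high = [100] + [0] * 9

print(mean(skewed_low), median(skewed_low), below_mean(skewed_low))     # 8 10 1
print(mean(skewed_high), median(skewed_high), below_mean(skewed_high))  # 10 0.0 9
```

"Half the class is below X" is true by definition only when X is the median; the mean can sit almost anywhere relative to the bulk of the scores.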
Re:Duh? by ceoyoyo · 2008-03-25 02:59 · Score: 1

Something that takes O(N) time will still take O(N) time no matter WHAT you do (unless you use a different algorithm).

Intel/AMD/IBM etc. have made an awful lot of money and spawned a VERY large and valuable industry that pervades very nearly every facet of our lives by making algorithms run faster without improving their complexity.
Re:Duh? by Inoshiro · 2008-03-25 03:03 · Score: 3, Informative

"And of course any virtual reality scenario will not profit from extra power."

It's more a matter of what kind of speedup you see, and what algorithm you start with.

If your algorithm is serial, and there is no parallelism to speed it up, you're not going to see any speed increase. This applies to a lot of things, such as access to the player-character status variables (all cores use the same memory address, so they need a mutex or other synchronization anyway). Any AI-NPC interactions would likely need to be coordinated. You could divide the large stuff across some collection of cores (physics core, AI core, book-keeping core, input core), but not all of those will break down further (so an 80 core machine is a limited speedup over a 4 core machine -- which is, itself, a limited speedup over a basic 2-core machine for most people).

The easy things to break up are embarrassingly parallel problems (like certain graphics activities), and they are already being broken up across GPUs. Even algorithms that are entirely easy to parallelize still speed up at best linearly: to be 10 times faster, you need 10 processors (this is why GPUs have a lot of simple graphics pipelines and shader processors; they're throwing more hardware at the problem). If a problem is simply too big (you need to be 1000 times faster, or it needs exponential-time or worse algorithms), more hardware isn't going to help.

People seem to think that parallel programming will make the impossible into something possible. That's not true. Proving NP = P and coming up with efficient algorithms for certain things will. More hardware won't -- it'll just make things that were O(n) or O(log n) [for a sufficiently large n, like the number of pixels in a 1920x1200 monitor] possible. It won't affect O(n^2) and above.

--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Re:Duh? by Culture20 · 2008-03-25 03:07 · Score: 1

"Name a single real world problem that doesn't parallelize"
Multiple orgasms.
I actually agree that many problems are sequential by nature, but you do realize that multiple orgasms can be tasked amongst multiple women, right? I'll never do that though, I'm in the old-school one processor crowd.
Re:Duh? by ceoyoyo · 2008-03-25 03:07 · Score: 1

Just what he said. Can calculating the (real-world) problem in parallel speed it up significantly? Big picture. Fine grained / coarse grained, doesn't matter. Whichever is appropriate.

Take compiling as an example. Real world, parallel processing isn't going to help you compile Hello World much faster. But Hello World compiles awfully fast as it is. Big projects that take an annoying amount of time to compile? With a few exceptions that can probably benefit anyway with a little cleverness, compiling a project of reasonable size is going to work just fine in parallel.
Re:Duh? by Anonymous Coward · 2008-03-25 03:09 · Score: 0

I generally think that any kind of database with ACID characteristics doesn't parallelize well, neh?

Problem being, as your processing array gets large enough, your CPUs end up doing nothing but managing locks and a dictionary to try to find the "one true, accurate" version of a datum.

To be sure, if you relax certain restrictions (like transactional integrity), you can build a parallelizable data store, but that's sort of like saying "if I don't really care if I get the *best* solution, I can solve the travelling salesman problem in linear time."
Re:Duh? by Anonymous Coward · 2008-03-25 03:24 · Score: 2, Informative

The problem is not that. Actually, O(n) is the same as O(n/p) (for a fixed p, the constant factor drops out).

The problem is that K*O(n) does not become K/p*O(n) but can decrease much more slowly. 10x the cores doesn't give you 10x the performance increase. Adding a core doesn't help at all if you depend on intermediate results in your calculation.
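One standard way to quantify why 10x the cores doesn't give 10x the performance, not named in the post itself, is Amdahl's law: if a fraction f of the work can be parallelized and the rest must run serially, the best possible speedup on p processors is 1 / ((1 - f) + f / p). A small Python sketch; it ignores communication and synchronization costs, which usually make real numbers worse.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


for p in (1, 2, 4, 10, 100, 1000):
    print(p, round(amdahl_speedup(0.95, p), 2))
```

With 95% of the work parallelizable, 10 processors give less than 7x, and no number of processors can get past 20x, because the serial 5% never shrinks.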

For sorting you can get very close, but many problems are harder (if you can't partition the processing easily).
Re:Duh? by BasharTeg · 2008-03-25 03:38 · Score: 1

I couldn't agree with you more. The people who say that you can't parallelize most software, and the people who say writing parallelized code is extremely difficult and fraught with danger, are people who simply have no talent with multi-threaded programming and want the world to stick to the paradigms that they're familiar with. These ignorant folks are running an informal PR campaign, spreading these myths and contaminating public opinion of parallelized programming. That only reduces the number of developers who are encouraged to learn truly scalable multi-threaded development, which sets back the entire industry as we enter the manycore era. What's worse, they offer up sharding / partitioning as an alternative solution, where they can run their simple single-threaded software on hundreds of computers rather than scaling up to meet the new designs of newer processors. These people knew they were in danger when x86 CPUs maxed out at 4 GHz (even improvements in IPC can only do so much).

Multi-threaded programming is a critical part of tomorrow's (or today's) scalability *and* performance. Sharding and/or partitioning is part of the solution, but it's not a replacement for software that can better take advantage of tomorrow's 8 and 12 core CPUs. If you had to partition your load, wouldn't you rather partition with software that fully utilized each system you deployed? Future CPUs are going to have many cores, whether you use them or not. I have worked with developers of all levels of skill and talent, and I've found that you CAN teach them to do decent multi-threaded development if they're not so afraid of the concept that they close their mind to it.

Burying your head in the sand isn't going to make parallel development and the manycore era of CPUs go away.
Re:Duh? by Nursie · 2008-03-25 03:45 · Score: 1

"Do you have any idea what statistics is? Because your post suggests that you don't.
So in a set 0,10,10,10,10 average is 8, but surprisingly enough only 1 is below average!"

Umm, which average are we taking?
Re:Duh? by Anonymous Coward · 2008-03-25 03:57 · Score: 1, Funny

A mediocre (average) person does not understand the difference between average and median.
Re:Duh? by religious+freak · 2008-03-25 03:57 · Score: 2, Insightful

I think it's reasonable to assume programming skill among developers would follow a bell-curve, in which case not only is your example misleading, it's not applicable.

--
If you can read this... 01110101 01110010 00100000 01100001 00100000 01100111 01100101 01100101 01101011
Re:Duh? by RandCraw · 2008-03-25 03:59 · Score: 2, Informative

Examples of problems that do not parallelize:

1) Problems that contain little data

2) Problems that require sequential processing

3) Problems that are often interrupted, that cannot predict future actions or execution paths (e.g. pipelined)

What fraction of computing tasks match these 3 constraints? At least 90% of the work done on desktops.

Speeding up the other 10% will be done by specialized hardware like multicore video or DSP chips. The real potential for parallelism in everyday computing is negligible. I've been parallel programming for years, and all forms of parallelism are next to useless unless, like Google, you have lots of data.

BTW, Google has no use for multicores either. All of their parallelism is embarrassingly parallel, which is better served by shared-nothing architectures like many, many, many cluster nodes that are cheap, cheap, cheap.

Randy
Re:Duh? by 0ptix · 2008-03-25 04:13 · Score: 1

Most people who do anything are mediocre. Otherwise, mediocre would be redefined. It's like saying, half the people in the class scored below average. The fact that half the people scored below some value determines what the value of average is.

You are mixing up "average" (the mean) with "median". Say there is a class of ten students taking a test. One gets 100% and the other 9 get 0%. The average is 10%, but 90% of the students got less than the average.
Re:Duh? by CastrTroy · 2008-03-25 04:16 · Score: 1

According to this article you can sort in constant time, although current systems allow you to sort in O(log (n)) time. However this only works if you have n^2 processors. Which was kind of my original point. Parallelization can offer significant increases in the speed of an algorithm, but only if you have an inordinately large number of processors.
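A sketch of where an n^2 processor count comes from, using a simple rank (enumeration) sort. This is a textbook illustration and not necessarily the algorithm in the linked article: each element's final position is the number of elements that must precede it. All n^2 comparisons are independent of one another, so with n^2 processors they could all happen in one step; adding up each element's n comparison results still takes O(log n) steps as a parallel reduction. The code below runs the comparisons serially, only to show that structure.

```python
def rank_sort(values: list[int]) -> list[int]:
    """Place each element at its rank: the count of elements that precede it."""
    n = len(values)
    out = [0] * n
    for i in range(n):
        # Each comparison (i, j) is independent of every other one.
        # Ties are broken by index so equal values get distinct ranks.
        rank = sum(
            1
            for j in range(n)
            if values[j] < values[i] or (values[j] == values[i] and j < i)
        )
        out[rank] = values[i]
    return out
```

Serially this is O(n^2) work, far worse than an ordinary sort; the only point is that none of that work depends on any other comparison.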

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Duh? by nwf · 2008-03-25 04:29 · Score: 1

Most problems do not parallelize to large scales.

Name a single real world problem that doesn't parallelize.
Obviously this depends on your definition of "real world." Many simulation problems in the physical sciences do not scale well, since each cell's step depends on all other cells. There are approximations that try to reduce this dependency, but approximations are never perfect. However, one may discount these as not "real world", since most people don't simulate low-level physics and such. (And these aren't NP-complete problems either; those are sometimes parallelizable. E.g., doubling your problem size may cause an 8x increase in work, but that work may be parallelizable. You can parallelize the traveling salesman problem, which is NP-complete according to Wikipedia.)
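A hypothetical Python sketch of the point that an NP-complete problem can still be split across processors: brute-force TSP fixes city 0 as the start, then partitions all tours by which city is visited second. Each partition is an independent subproblem that could run on its own core, but the total work is still factorial, so parallelism divides the running time by at most the number of workers without changing how fast it grows. The distance matrix in the test is invented.

```python
from itertools import permutations


def tour_length(dist: list[list[float]], tour: tuple[int, ...]) -> float:
    # Sum of edges along the tour, including the edge back to the start.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def best_tour_with_second(dist: list[list[float]], second: int) -> tuple[float, tuple[int, ...]]:
    """Solve one independent partition: all tours that start 0 -> second."""
    rest = [c for c in range(1, len(dist)) if c != second]
    best = None
    for perm in permutations(rest):
        tour = (0, second) + perm
        length = tour_length(dist, tour)
        if best is None or length < best[0]:
            best = (length, tour)
    return best


def solve_tsp(dist: list[list[float]]) -> tuple[float, tuple[int, ...]]:
    """Exact TSP for at least 2 cities by exhaustive search over partitions."""
    # Each call below is independent and could be handed to a separate worker.
    partials = [best_tour_with_second(dist, s) for s in range(1, len(dist))]
    return min(partials)
```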

There is a much larger class of problems that don't scale to very large numbers of processors. As I recall from my parallel programming class, after about 64 or 128 processors, shared memory breaks down due to limitations on the bus interconnections needed for cache coherency. You can emulate shared memory with MPI and things, but it's WAY slower, to the point of being useless for applications without a high degree of spatial locality. In fact, all but the embarrassingly parallel problems don't scale linearly due to shared memory and synchronization, so I've yet to see many non-trivial problems that scale well to massive levels. I'm talking 1000 processors or more (which is where we are headed, it seems, since they can't increase processor speed much. They have to do something to sell us new CPUs.) You may double the processors but only get a 20% speedup. One of many examples, found after 15 seconds of Googling, is here. Another one is here, where they doubled the processors and only got a semi-logarithmic increase in speedup (very common, from what I recall from class). Database updates won't scale well, since fundamentally you need some concurrency control to ensure ACID, and that can't scale forever.

So almost anything can parallelize, but not everything can do so well. Sure, it may be faster, but not nearly as fast as doubling the CPU speed. For many systems, going to 2 or 4 processors will help a lot, since people also run multiple programs or services in the background, but that's low-hanging fruit. (And that's welcome; I use dual-core CPUs and find they help for that reason. But 1000 cores? I don't think that will help with any common task for an average user.)

So, basically, I think we are all right. It's generally faster with more CPUs, just not much faster in the higher cases, and we'll reach a point of diminishing returns. Is it worth it to double the cost of a CPU for a 5% speedup? For some, I'm sure, but eventually it just won't be worth it to increase the number of cores. I used to work with people who made parallel simulations, and they'd spend years getting an application working on a specific architecture. They'd be ecstatic when they got a 10% speedup. Not really practical for most consumer products.

--
I don't know, but it works for me.
Re:Duh? by Anonymous Coward · 2008-03-25 05:00 · Score: 0

.. tired of this nonsense .. I've never received a positive reply .. Some mediocre programmer .. are oblivious .. mediocre programmers claim .. QQ more? You are just huffing and puffing because you say that 'people say' that these problems cannot be parallelized when they can be. Oh yeah, and the people that disagree are just mediocre. It's just that their parallel versions are incredibly complicated and wasteful compared to the serial versions, and for marginal benefits.

Some real-world examples of problems that do not parallelize well are priority queues, input/output, page layout, and compilation. You can have a priority queue that finds the next-highest item in the background before it is needed, but this isn't making the queue itself parallel. If you've ever had a background process stealing your shell's input, or interleaving its output with yours, you know how badly I/O parallelizes.

You say compilation parallelizes well. You probably mean the moral equivalent of "cc a.c & cc b.c". This is, as you say, mostly a linear speedup, but the results are terrible. For example, a 15k LOC real-world program I've written is 24% larger when compiled as separate units than as a single source file. You are compiling in parallel, but you aren't solving the same problem. Yes, you can get some benefit from making parts parallel, like parsing each file independently and then merging the results into a single parse tree. But this is not the same problem; you've added an extra 'merge the trees' step.
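A toy illustration of that extra step, in Python: each "file" is parsed on its own worker, but cross-file checks (here, a duplicate definition) can only happen in a serial merge afterwards. The one-line "grammar" (a line starting with `def NAME` declares a symbol) and the names `parse_unit`, `merge` and `parse_project` are made up for this sketch; a real compiler front end is far more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_unit(name, source):
    """Toy per-file 'parse': map each declared symbol to the unit defining it."""
    return {line.split()[1]: name
            for line in source.splitlines() if line.startswith("def ")}


def merge(tables):
    """Serial merge step: the only place cross-unit conflicts can be seen."""
    merged = {}
    for table in tables:
        for symbol, unit in table.items():
            if symbol in merged:
                raise ValueError(f"{symbol} defined in both {merged[symbol]} and {unit}")
            merged[symbol] = unit
    return merged


def parse_project(units, workers=4):
    # Parallel phase: every unit is parsed independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tables = list(pool.map(parse_unit, units.keys(), units.values()))
    # Extra serial phase that a single-file parse would not need.
    return merge(tables)
```

For example, `parse_project({"a.c": "def f\ndef g", "b.c": "def h"})` yields one symbol table for the whole program, while giving both files a `def f` is only caught once the merge runs.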

Making a particular thing run in parallel may not be hard, but it is pretty much always harder.
Re:Duh? by phantomfive · 2008-03-25 05:07 · Score: 1

The problem with you, and many other proponents of multi-threading, is that you only consider processing time as a resource. The fact is there are other problems: shared resources, caching, IO, disk access times; check out this discussion on the LKML. Notice how Alan Cox mentions that "as you add threads you generally decrease cache effectiveness."

If you are blocking on IO, it doesn't matter how many threads you are running, you're still blocked. In that case, there is no point in adding more threads, it will only add complexity to your program, without any real benefit. Given that the vast majority of real-world programs are interactive and spend the vast majority of their time waiting on IO (if you don't believe me check your processor usage right now...I'll bet it's below 10%), it is safe to say that the vast majority of real-world programs will experience no boost from being multi-threaded.

Sometimes threads just slow things down. To test this, just last month for work I wrote a server using a threaded model (because it was easier) and then the exact same server using a single-threaded model (because my coworker complained that threads are too slow). My purpose was to show that threads are fast enough that if it's easier, you should program that way. I tested them both on a quad-core system, and the single-threaded server was indeed faster (but the threaded model handled 10,000 simultaneous connections no problem....go Linux Pthreads!). Also, I am not the only one who has seen this effect.
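For reference, a minimal sketch of what a single-threaded server model can look like, using Python's `asyncio` rather than the poster's (unstated) language and code: one event loop on one thread serves every connection, so there is no per-connection thread and no locking. The upper-casing echo protocol and the names `handle_client` and `run_demo` are assumptions for illustration; this does not reproduce or benchmark the poster's comparison.

```python
import asyncio
import threading


async def handle_client(reader, writer, seen_threads):
    # Every handler runs on the single event-loop thread; record which one.
    seen_threads.add(threading.get_ident())
    line = await reader.readline()      # yields to other clients while waiting
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def run_demo(n_clients=50):
    seen_threads = set()
    server = await asyncio.start_server(
        lambda r, w: handle_client(r, w, seen_threads), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port 0 lets the OS pick

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"hello {i}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply.decode().strip()

    async with server:
        # All clients are in flight at once, yet every handler shares one thread.
        replies = await asyncio.gather(*(client(i) for i in range(n_clients)))
    return replies, seen_threads


if __name__ == "__main__":
    replies, threads = asyncio.run(run_demo())
    print(len(replies), "replies served by", len(threads), "thread(s)")
```

The design trade-off is the one the poster hints at: no thread creation or context switching per connection, at the cost of every handler having to avoid blocking calls.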

Use threads when they're useful. Don't use them when they're not. But if you are still under the idea that parallel processing is the magic bullet, you need to learn more about optimization.

--
Qxe4
Re:Duh? by Anonymous Coward · 2008-03-25 05:36 · Score: 0

You want median, not average

A good point though.
Re:Duh? by Viking+Coder · 2008-03-25 05:40 · Score: 1

"Name a single real world problem that doesn't parallelize."

That's like saying "Name a single real world problem that you can't solve on a Turing machine."

Yes, of course, you're right. If you segment and merge your problem space appropriately, minimally you can do speculative computing and bookmarking, etc. to try to prune off branches of your problem space, etc. (Yeah, I know I'm kind of blowing smoke, but you get my point.)

But that doesn't mean that coding in Machine Language is what everyone should do. High-order tools are the only practical way to achieve anything beyond tinker toy problems. And until those tools are commonly available (and bullet-proof) in the programming environment the average developer is ALREADY USING, they're going to see it as a wall not worth climbing.

The software industry is notorious for not even writing *correct* code, and now you want to throw *parallel* in there, too? :)

--
Education is the silver bullet.
Re:Duh? by kscguru · 2008-03-25 05:42 · Score: 2, Insightful

Name a single real world problem that doesn't parallelize. I've asked this question on slashdot on several occasions and I've never received a positive reply.

GUIs (main event loop) for word processors, web browsers, etc. (Java feels slow because GUI code in Java is slower than human perception). Compilation (see below - it's a serial problem with a just-good-enough parallel solution). State machine simulations like virtualization or emulation. Basically, any task where the principal unit of work is either processing human input or complex state transitions. Only numerical simulations - problems with simple state transitions - parallelize well.
Real world problems like search, FEA, neural nets, compilation, database queries and weather simulation all parallelize well. Problems like orbital mechanics don't parallelize as easily but then they don't need parallelism to achieve bounded answers in faster than real time.

FEA, neural nets, and weather simulation are almost entirely number crunching - a small set of initial data plus a very large number of matrix multiplications. These are embarrassingly parallel, are well-understood and easy to optimize. The tradeoffs, in terms of locking / memory sharing / communication overheads, are documented by the past 40 years of literature; innovation is in new memory architectures or locking schemes or access patterns that tweak the costs of a small part of the program a little. And only ~1% of programmers out there even work on these sorts of problems.
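The row-partitioned matrix multiply below shows why this kind of number crunching is called embarrassingly parallel: each block of output rows depends only on the matching rows of A and all of B, so workers never write to shared data. It is a sketch with hypothetical names (`matmul_rows`, `parallel_matmul`); in CPython the global interpreter lock means these threads will not actually speed up pure-Python arithmetic, and real FEA or weather codes would use processes, MPI, or a tuned BLAS instead.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a, b, rows):
    """Compute the given output rows of A x B; reads shared inputs, writes nothing shared."""
    inner = range(len(b))
    cols = range(len(b[0]))
    return [[sum(a[i][k] * b[k][j] for k in inner) for j in cols] for i in rows]


def parallel_matmul(a, b, workers=4):
    """Split the rows of A into contiguous blocks, one task per block."""
    n = len(a)
    if n == 0:
        return []
    step = -(-n // workers)  # ceiling division: rows per block
    blocks = [range(start, min(start + step, n)) for start in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: matmul_rows(a, b, rows), blocks))
    # Concatenate blocks in order; no locking needed because no two tasks share output.
    return [row for part in parts for row in part]
```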
Search and database queries do parallelize well, but not because of multiple processors. These two operations are fundamentally I/O bound - the data sets are too large to fit into memory, so you switch to an event model and do processing when data arrives. More raw CPU speed helps only a little (maybe 1% of the processing can overlap) - the actual gain is in larger caches and memory hierarchies. I doubt the original articles meant to call "overlapping I/O" parallel programming.
Compilation is expressly NOT a parallelizable problem. You may think it is - you fire off make / distcc and a whole storm of compilation happens - but you accept this only because compilers skip even the easiest inter-file optimizations because even attempting them serializes the problem so thoroughly that it becomes single-threaded. All the gains of JIT compilers are possible with static compilation too - but the cost of doing so is too high, so static compilers do very little optimization and JIT compilers get impressive gains with very simple optimizations on hot-paths despite terrible type-checking overheads. Compilation is in a mediocre state now - and we have dozens of languages prospering at different points on the cost curves - because it is an intrinsically SERIAL problem of fantastic complexity. I could write the most complex neural network algorithms on a single sheet of paper; even the parsing tree for the simplest languages won't fit on that page, much less optimization passes.
Most computers in this world are running code to solve inherently serial problems. Saying that numerical methods' sort of parallel programming has broader applicability is ignorant of all the problems outside that narrow area. Sorry.

--
A witty [sig] proves nothing. --Voltaire
Re:Duh? by Lally+Singh · 2008-03-25 05:52 · Score: 4, Insightful

Ugh.

Yes, you can parallelize a VR system quite well. You can simulate a couple dozen NPCs per core, then synchronize on the collisions between the few that actually collide. You still get a nice speedup. It ain't 100% linear, but it can be pretty good. The frame-by-frame accuracy requirements are often low enough that you can fuzz a little on consistency for performance (that's usually done already; ever heard "If it looks right, it's right"?).
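The pattern described (simulate a batch of NPCs per core, then synchronize only on collisions) can be sketched as a parallel update phase followed by a serial collision phase. Everything here is an assumption for illustration: NPCs are `(x, y, vx, vy)` tuples, 24 per chunk stands in for "a couple dozen per core", and a "collision" is crudely approximated as two NPCs landing in the same grid cell. Thread workers are used for simplicity; in CPython they show the structure rather than a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def step_chunk(chunk, dt):
    # Independent per-NPC update: no NPC reads another's state, so chunks can run in parallel.
    return [(x + vx * dt, y + vy * dt, vx, vy) for (x, y, vx, vy) in chunk]


def simulate_step(npcs, dt=1.0, workers=4, per_chunk=24, cell=1.0):
    # Phase 1 (parallel): move each chunk of NPCs on its own worker.
    chunks = [npcs[i:i + per_chunk] for i in range(0, len(npcs), per_chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(step_chunk, chunks, [dt] * len(chunks)))
    moved = [npc for part in parts for npc in part]

    # Phase 2 (serial sync point): bucket NPCs by grid cell; a cell with
    # more than one occupant is treated as a collision to resolve.
    cells = defaultdict(list)
    for index, (x, y, _vx, _vy) in enumerate(moved):
        cells[(int(x // cell), int(y // cell))].append(index)
    collisions = [ids for ids in cells.values() if len(ids) > 1]
    return moved, collisions
```

Only the second phase needs everyone's positions at once, which is where the "not 100% linear" part of the speedup comes from.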

Parallel programming is how we get more speed out of modern architectures. We're being told that we're not going to see Moore's law expand in GHz like it used to, but in multiple cores. Nobody thinks it's a panacea, except maybe the 13-year-old Xbox kiddies, but they never counted.

As for making the impossible possible, sure it will. There are lots of things you couldn't do with your machine 10-15 yrs ago that you can do now. Many systems have performance inflection points. As we get faster (either with a faster clock or a larger number of cores), we're going to cross more of them. I remember when you couldn't decode an MP3 in real time on a desktop machine. With the I/O and space costs of uncompressed music, that meant that you didn't really listen to music from your computer. Impossible -> Possible.

--
Care about electronic freedom? Consider donating to the EFF!
Re:Duh? by clary · 2008-03-25 06:04 · Score: 1

Thanks for the link...just what I was looking for. You are right of course in what you say about speedup being limited by the number of processors you are willing to buy. But consider how much some people are willing to pay for the very fastest processor they can buy, even if the speedup over 2nd best is only say 20%. How much would they be willing to pay for a 2X, 10X, or 100X speedup?

--
"Rub her feet." -- L.L.
Re:Duh? by Anonymous Coward · 2008-03-25 06:39 · Score: 0

"Old programmers don't want to learn new things -- trust the tried and true." - by gustolove (1029402) on Tuesday March 25, @03:45AM (#22854458)
Some don't, I do - knowledge, IS POWER, real earning power in this field. One you feel pretty OK about, as you actually create things to help others, on top of being paid well (more as you go, with more experience + current skills & tools etc. et al)...

So... that all said & aside:

Well, IF an "Old Programmer" wants to keep PROGRAMMING? He had best prep for change, & learning...

AND - it never ends!

Constant change @ high velocity man...

HOWEVER - that's what actually makes it FUN! Imo, @ least, & keeps me from pursuing mgt. type jobs in this field (I've got 2++ yrs. in actual mgt. though from another career in loss prevention mgt. (top of my chain of 218 units in fact, 9 months in a row (aids my forensics/security background today on contracts for that)) I had though prior to computer sciences + an MIS Business Bachelors as well, topped off w/ Associates in Comp. Sci. (best choice I have made in my LIFE to date, was going back to school, I highly recommend it, because life will show you, in time working, what you LIKE, & DO NOT LIKE, on the job, the best - you find yourself, via experience))

Yea, it IS that, more than anything: I'll always "have the skills", to get the job done - real skills, not b.s.!

PLUS - I actually ENJOY MY JOB/WORK in coding... so much so, it is a hobby (evidence thereof below) & paid off many times on many things having just done so since 1995 online... leading into commercial products work & more, just from doing what I LIKE to do. Makes work, & the b.s. everyone has to deal w/ sometimes THAT MUCH EASIER, lol!

Anyhow... onwards & upwards (well, in THIS case, downwards):

========

"Young bucks want to be on the cutting edge to get the jobs that the old people already have." - by gustolove (1029402) on Tuesday March 25, @03:45AM (#22854458)
Probably, so what? It takes TIME, lots of it, to get "truly good" @ this stuff, & that means a heck of a lot more than JUST coding. Knowing tech stuff & networking is an absolute, as well as webmastery + browser coding on top of batch/shell or scriptengine scripting too. All done SECURELY too.

Jack of ALL trades, & MASTER OF ALL - that's the goal. It pays in more than just monies, it pays in having a confidence of mastery of your field ~ peace-of-mind on the job is what that means, & liking what you do (not being or feeling incompetent, ever): MOST important benny there is.

========

"Oh, and the people see the benefit in the other countries more than those in the U.S.? Probably not, we're just lazy American's though." - by gustolove (1029402) on Tuesday March 25, @03:45AM (#22854458)
LOL! Show a U.S. Worker a buck, the RIGHT BUCK? He'll give you the most important & valuable thing he has - HIS TIME/LIFE, & skills.

Put it THIS way:

I know, firsthand, putting out ON AVERAGE, more than 70 hr. weeks for months @ a time this past yr. on the job (& being RADICALLY underpaid too, mind you, vs. the current prevailing wage for what I do in this field for a living vs. the rest of this nation)....

AND, about "Old folks" are this & that, vs. "Young Folks" are this & that?

This is evidence to the contrary:

http://www1.techpowerup.com/downloads/389/APK_Registry_Cleaning_Engine_2002++_SR-7_.html

OR

http://www.techpowerup.com/downloads/389/APK_Registry_Cleaning_Engine_2002++_SR-7_.html

It's Win32 API code, + Inline Assembler in combination with Borland Delphi 7.1 code HIGHLY optimized by compiler AND BY HAND (p
Re:Duh? by smallfries · 2008-03-25 07:00 · Score: 2, Interesting

Wow, you totally missed the point of what I was saying didn't you. Fine-grained / coarse grained does matter in real life.

If a problem doesn't exhibit fine-grained parallelism then running multiple copies is the *best* you can do. In some situations that is enough (i.e. a large project with lots of separate compilation units). In some situations it isn't enough, i.e. where you can't split your compilation into separate units because you're trying to run global optimisations across the whole lot.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Duh? by Anonymous Coward · 2008-03-25 07:00 · Score: 0

Please put your .sig in the correct place instead of typing it as part of your comment: http://slashdot.org/users.pl?op=edituser
Re:Duh? by ceoyoyo · 2008-03-25 07:29 · Score: 1

No, I don't think I did miss the point. Your post is correct, but missing the point.

I, as a user, care whether my job, whatever it is, is going to speed up if I buy a new processor. The discussion is about the situation where the difference between the processor I have and the new one is that the new one has M cores while my current one has N cores, where M>N.

For the majority of real world jobs, the task has sufficient inherent parallelism that M cores will execute it faster than N cores. I, as a user, don't care whether you have to use fine-grained or coarse-grained parallelism to achieve that.

The original poster asked if anyone can think of a real-world problem where this isn't true. I'll add that it has to be a real-world problem that is sufficiently large that we care if it speeds up. Compiling Hello World fails this additional criterion.

I did tack a further point onto my post that coarse-grained techniques are quite adequate for dealing with many problems when the problem is sufficiently large that it meets my additional criteria. Perhaps that was confusing. Compiling is an example of one of these. Hello World might require fine-grained techniques to get any real speedup, but then we don't really care. Compiling a large project works just fine with coarse-grained techniques, and it's an example where we actually want to speed it up.
Re:Duh? by shenanigans · 2008-03-25 07:54 · Score: 1

See? His statistics skills are mediocre.
Re:Duh? by theshowmecanuck · 2008-03-25 08:05 · Score: 1

Java is getting like this now... but mainly because everybody and their dog wants to have their own framework. We are frameworked out the fucking wazoo. Java has so many splintered frameworks and technologies it's sick. Say what you want about C or C++, but it was great that they had a simple standard. The other problem with Java is even more frustrating.... the headhunters want 15 years experience in a 12 year old technology.

--
-- I ignore anonymous replies to my comments and postings.
Re:Duh? by jc42 · 2008-03-25 08:39 · Score: 1

Compilation is expressly NOT a parallelizable problem.

Well, yes and no. Maybe the single sub-task of converting a .c file to a .o file isn't parallelizable, but the overall task of converting a flock of .c files (and .a or .so files) to an executable is very parallelizable.

A few years back, I was involved in some unix kernel development on a system with 200+ processors and a terabyte or so of memory. The make(1) command had been extended to accept an environment variable or command-line option telling it how many parallel subprocesses it was allowed to run. The kernel consists of thousands of mostly small .c files, of course. The limit on a build was actually the speed at which make's storm of cc commands could be rendered on the terminal. Occasionally I had to clean out all the .o files and do a total build. There was a blur of cc commands shown on the screen, and in under a minute, the build was done. Most of the "cc -c foo.c" commands could be run in parallel with all the rest; the only exceptions were a few that had to wait for an #included file to be created.
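The scheduling a parallel make does here can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9+): launch every target whose prerequisites are built, at most `jobs` at a time, and release dependents (such as objects waiting on a generated header) as jobs finish. This is not how make itself is implemented; `parallel_build` and the target names in the test are hypothetical, and `compile_fn` stands in for running a real `cc -c`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def parallel_build(deps, compile_fn, jobs=4):
    """Run compile_fn(target) for every target, at most `jobs` at a time.

    deps maps each target to the set of targets it must wait for
    (e.g. an object file waiting on a generated header).
    Returns targets in the order they finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while sorter.is_active():
            # Launch everything whose prerequisites are already built.
            for target in sorter.get_ready():
                running[pool.submit(compile_fn, target)] = target
            # Block until at least one job finishes, then unlock its dependents.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                target = running.pop(future)
                future.result()  # re-raise a failed compile
                finished.append(target)
                sorter.done(target)
    return finished
```

With `jobs=1` the same graph builds strictly one target at a time, which is the serial baseline the poster timed against.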

Of course, this isn't "parallelism" in the multi-threaded sense that most people here are talking about. But parallelism at the coarser level of individual cc commands is parallelism. In this case, it did produce a speedup of nearly 200x. I verified this once by setting the allowed parallelism to 1, and verifying that it took about 200 times longer than usual. (I went out to lunch while it ran. ;-)

There are a few known tasks that can benefit significantly from parallelism. Some of them have tiny chunk sizes; others (like compilation) have fairly large chunks. The latter are usually easily handled by independent processes, and that's something we understand fairly well today. It's really the ones with smaller chunk sizes that need to share large amounts of data that cause problems. We don't know much about how to debug those, and our "system" software isn't much help yet.

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:Duh? by jc42 · 2008-03-25 09:20 · Score: 1

Nobody has THAT much experience with Ajax, or .NET, or whatever.

You can try and learn all the decades worth of back material in order to catch up, but if you try and apply for one of those jobs that requires those skills, the HR department will still want someone with decades of experience. So attempting to enter those fields is rather futile.

Well, I already decided that getting Ajax or .NET jobs was futile, when I started reading jobs ads that required 3 or 5 years experience with them in the first couple years they were available.

But that's an old story. I recall back in the 1980s, when a certain unnamed DBS had been out for about a year, and all the job ads asked for 3-5 years experience with it. That did a lot to dissuade me from becoming a DBS expert. Just as well, I suppose, because I know a bunch of people who are, and I mostly feel sympathy for them when they talk about their jobs.

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:Duh? by MrSteveSD · 2008-03-25 09:40 · Score: 1

I think that while we are still on just a handful of cores, there isn't so much to gain. Software will already generally be a bit faster due to the OS juggling different applications around on the cores. When we have lots of cores (e.g., 20), writing for parallelism will be a lot more important.

That said, applications (particularly in the business area) tend to stick around for years. So although you might not gain much now by writing a parallel app for a 2-core system, in 5 years' time your single-threaded application could look quite stupid. It's a bit of a dilemma for small software houses. Do you tear your hair out and spend lots of money trying to write a parallel app now, even though there won't be much speed gain yet? Or do you take it easy and spend less money now writing a single-threaded app, but risk having your application look totally useless as the number of cores keeps increasing?
Re:Duh? by Anonymous Coward · 2008-03-25 09:55 · Score: 0

The problems you refer to are all embarrassingly parallel problems, they are all specialist problems primarily from the ANALYTICAL sphere and hence are highly unlikely to have any relevance to J Random Person's computing needs.

Bottom line;
Yes, there are many problems which are parallelisable, these will continue to be parallelised as they have been since the early days of computing, the likely impact on J Random Person is a small performance increase due to reduced context switching, the majority of things J Random Person cares about do not benefit significantly from parallelism.
Re:Duh? by smallfries · 2008-03-25 10:02 · Score: 2, Informative

I'm fairly sure that you have missed the point.

We both agree about how parallelism impacts jobs in the real world. We also agree that if we can speed a job up then we don't care how it is achieved. The point that I made that you keep skipping over is that when there is no fine-grained parallelism available the key question becomes do I care about speeding up multiple copies of a job, or do I need a single job to run faster.

The OP's choice of compilation was odd, which is what I remarked on. If I'm compiling lots of things in one project then it will go faster. But compilation itself is not easy to parallelise, and as I pointed out commercial compilers don't currently speed up single compilation units over multiple cores.

Your false dichotomy is that a single unit must be simple (hello world) and anything more complex can be split into multiple units. Not only is this not true - but I gave a concrete example right at the beginning of where this would be a problem. This would be the point that has whistled over your head on multiple occasions.

When the compiler needs to perform global analysis (i.e., if you are doing aggressive global optimisations across the whole program, not local optimisations within a single unit) there is no obvious way to speed it up. There may well be fine-grained parallelism in there but nobody has exploited it yet. There is no coarse-grain parallelism because you only have a single task: compile the whole program.

Applying these sort of aggressive global optimisations has been the focus of the compiler community for decades. Now that multicores are becoming common it will be another interesting problem to parallelise.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Duh? by smallfries · 2008-03-25 10:18 · Score: 1

Did you realise that your entire post was contained in the GP's comment about distcc?

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Duh? by Curate · 2008-03-25 11:41 · Score: 1

On the other hand, in a set {0, 0, 0, 0, 10}, the average is 2, and four out of five are below average! I guess that when the parent poster said that on average, half of the class will be below average... he was right on average (i.e. for an average set of data). ;)
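Checking that arithmetic with Python's `statistics` module (the helper name `below_average` is made up for this sketch): for {0, 0, 0, 0, 10} the mean is 2, the median is 0, and four of the five values sit below the mean.

```python
from statistics import mean, median


def below_average(data):
    """Return (mean, median, number of values strictly below the mean)."""
    avg = mean(data)
    return avg, median(data), sum(1 for x in data if x < avg)
```

On symmetric data such as `[1, 2, 3]` the mean and median coincide and only one value is below average; the skewed set is what lets "most" values fall below the mean.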
Re:Duh? by linuxrocks123 · 2008-03-25 14:05 · Score: 1

You just don't get it. Multithreading is hard and leads to bugs that only exhibit themselves nondeterministically. It is /not/ a good idea to tell all the world's programmers to start writing multithreaded code for everything. People want to stick with the old single-threaded imperative paradigm as much as is possible because the single-threaded imperative paradigm is simple and it works. If the best solutions to the end of single-core improvements were known, there wouldn't be tons of research into it in CS departments across the world right now. We have a problem; everyone knows that. Multithreading might have to be part of the solution, but God help the poor users of the resulting buggy software if it has to be a large part of it.

--
vi ~/.emacs # I'm probably going to Hell for this.
Re:Duh? by Anonymous Coward · 2008-03-25 14:14 · Score: 0

No, it determines what the median is.
Re:Duh? by frank_adrian314159 · 2008-03-25 14:15 · Score: 1

What do you mean by "parallelize well"?
Well, I would assume it means a close to linear speedup for some reasonable N, where N represents the number of computational units thrown at the problem (be it functional units in a CPU, cores on a chip, or chips in machines) - at least that's what it usually means in the computational literature. Now, of course, your definitions of "close" and "reasonable" or your pick of which processor level you're looking at may vary, but didn't Barbie say that "CS is hard," or something like that?
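In the usual notation (the symbols are this sketch's choice, not the poster's), with T(N) the wall-clock time on N computational units, speedup and parallel efficiency are:

```latex
S(N) = \frac{T(1)}{T(N)}, \qquad
E(N) = \frac{S(N)}{N} = \frac{T(1)}{N \, T(N)}
```

"Close to linear" then means S(N) is close to N, i.e. E(N) close to 1. The roughly 200x build speedup on a machine with 200+ processors reported earlier in this thread would be an efficiency near 1 by this measure.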

--
That is all.
Re:Duh? by Anonymous Coward · 2008-03-25 15:57 · Score: 0

Sure, he's probably biased. But what if he's also right?
Re:Duh? by try_anything · 2008-03-25 17:13 · Score: 1

"Mediocre" has nothing to do with statistics. It's just an English word with no mathematical definition. Something that is "mediocre" plays an adequate part in a process where most of the value is produced by superior components. A very good sports team may have several mediocre players. A good meal may start with a mediocre appetizer. You might have a great vacation while staying at a mediocre hotel. When all of the contributors are of mediocre quality, the combined result is crap. A team with nothing but mediocre players is a very, very poor team. An album with nothing but mediocre songs is a crap album, not worth buying or listening to. Software developed by a team of two brilliant programmers and six mediocre ones may be pretty good, but software developed by eight mediocre programmers will be crap.

In many contexts the great bulk of contributors will be mediocre. In some contexts, mediocrity is the exception. For example, if you buy a six-pack of excellent beer, you don't expect to get a mediocre bottle. You don't expect to get your mail delivered by a mediocre postman -- the job is too simple to screw up. And, of course, you very rarely get a mediocre hundred dollar bill. (What would that be? A hundred dollar bill with snot on it?)
Re:Duh? by tigersha · 2008-03-25 20:15 · Score: 1

> Name a single real world problem that doesn't parallelize.

Basically, anything that does a lot of IO to the disk is almost invariably going to be bound by IO rather than by how many processors you have. And MOST real world problems lie in that domain. Few businesses have lots of graphics workstations and video editors. Almost all of them store data and manipulate it. And no, editing some dude's address is not going to be faster if you map the firstname and lastname fields to different CPUs.

The reason that IBM mainframes are still a viable option if you have a real boatload of data is that they have wicked fast IO subsystems, rather than fast CPUs (they don't).

--
The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
Re:Duh? by mikael · 2008-03-26 02:59 · Score: 1

I'm not too much of a fan of Java or Microsoft, but at least it shows what the computer industry would be like if Microsoft hadn't been able to dominate the application development process.

C may have been simple, but in retrospect it really made complex application development awkward, even when you were using ADTs. C++ was an improvement, but it does have a limitation in that it doesn't allow the inheritance of constructors (you can get around this by using typedefs, but then you can't extend class members). The downside was that you were more or less forced to use MFC or .Net for Windows development, unless you chose to use Qt, or KDE/Gnome for Linux.

When headhunters want people with 15 years experience in a 12 year old technology, what they mean to say is that they want the original architects of the technology.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:Duh? by jc42 · 2008-03-26 04:11 · Score: 1

... your entire post was contained in the GP's comment about distcc?

Well, maybe, but it's also clear from the parent comment and several others in the general discussion that there's a belief that compilation is an example of a process that's not parallelizable. So I thought maybe, for the benefit of those who haven't read everything here, it might be useful to point out that parallelized versions of make have been around for a couple of decades, and compilation is actually a process that can easily benefit from the obvious use of multiple CPUs to run cc commands in parallel.

It is funny how people get the idea that such things are difficult. Of course, in this case there's little need for threads, since the chunks can run in parallel without any communication other than what's provided by every file system. But then, there seems to be a lot of conflation of different kinds of parallelism, some of which are easy (and we've done them for decades), while others are very hard.

The discussion here seems to be among people that mostly consider parallelism to be a single problem. In reality, it's a collection of independent problems that have the misfortune of being covered by a single English-language term. Sorta like the medical field's problems with talking about "diseases" like the cold, flu, and cancer, each of which is a cover term for hundreds or thousands of different diseases that happen to share some major symptoms. They all illustrate the old problem of people thinking that they understand a problem when they've given it a name.

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:Duh? by hobbit · 2008-03-26 05:16 · Score: 1

Indeed, you don't even need to guess :)

http://en.wikipedia.org/wiki/Central_limit_theorem

--
"Wise men talk because they have something to say; fools, because they have to say something" - Plato
Re:Duh? by Gr8Apes · 2008-03-26 05:54 · Score: 1

I too like the higher-level constructs, especially in Java these days. However, I dislike some aspects as well, for instance ThreadLocals, which, as far as I'm concerned, are apparently harder for people to use correctly than it is to have them write parallel code. They're my personal pet peeve at the moment, as I'll be unwinding that wonderful mess because someone was too lazy to add a DVO to method calls within their own code. ThreadLocals should never be used inside a monolithic code base and probably almost never at all, as there's always another option that will be easier to implement and debug, as well as being more scalable by virtue of a process not being tied to a single thread or monopolizing that thread while in progress.
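A small Python analogue of the problem (Python's `threading.local` plays the role of Java's `ThreadLocal` here): state stored per thread silently disappears when work moves to another thread, while an explicitly passed value object travels with the call. `RequestContext` is a hypothetical stand-in for the "DVO" mentioned above, and the handler names are made up for the sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Implicit per-thread state: the pattern being criticised.
_request = threading.local()


def handler_threadlocal():
    # Sees only what the *current* thread stored; None in any other thread.
    return getattr(_request, "user", None)


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical value object passed explicitly instead of hidden in a thread-local."""
    user: str


def handler_explicit(ctx: RequestContext):
    return ctx.user


def demo():
    _request.user = "alice"              # set on the calling thread only
    ctx = RequestContext(user="alice")
    with ThreadPoolExecutor(max_workers=1) as pool:
        implicit = pool.submit(handler_threadlocal).result()  # runs on a worker thread
        explicit = pool.submit(handler_explicit, ctx).result()
    return implicit, explicit
```

`demo()` returns `(None, 'alice')`: the value the caller put in the thread-local is invisible on the pool's worker thread, while the explicit context arrives intact and the work is free to run on any thread.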

--
The cesspool just got a check and balance.
Re:Duh? by hobbit · 2008-03-26 06:01 · Score: 1

you are mixing up "average" with "median"

Several people in this thread have replied as much to the OP, yet the mean is but one kind of average. Median and mode are two other common ones. And indeed, when people say "mean", they generally mean the arithmetic mean, but there are also geometric and harmonic means...
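To make the distinction concrete, Python's `statistics` module computes several of these "averages" directly, and on a skewed sample they all disagree. The sample `[1, 2, 2, 4, 16]` and the helper name `averages` are arbitrary choices for illustration.

```python
from statistics import geometric_mean, harmonic_mean, mean, median, mode


def averages(data):
    """Several different 'averages' of the same (positive) data set."""
    return {
        "arithmetic mean": mean(data),
        "geometric mean": geometric_mean(data),
        "harmonic mean": harmonic_mean(data),
        "median": median(data),
        "mode": mode(data),
    }


if __name__ == "__main__":
    for name, value in averages([1, 2, 2, 4, 16]).items():
        print(f"{name:>15}: {value:.3f}")
```

For positive data the arithmetic mean is always at least the geometric mean, which in turn is at least the harmonic mean; on this sample they come out to 5, about 3.03 and about 2.16, while the median and mode are both 2.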

--
"Wise men talk because they have something to say; fools, because they have to say something" - Plato
Re:Duh? by jgiltner · 2008-03-27 05:39 · Score: 1

I guess you missed the recent announcement of the z10 mainframe, 4.4 GHz. Seems fairly fast to me.
Re:Duh? by darrinallen · 2008-03-31 12:17 · Score: 1

It seems at my technical college most people are interested in networking rather than programming.

Multicore programming worries me... by (TK2)Dessimat0r · 2008-03-24 19:45 · Score: 0, Interesting

Instead of addressing the root problems with concurrency, we are just going to see super-high-level languages that have no bearing on or relationship to the actual hardware of the underlying machine.

Re:Multicore programming worries me... by TeknoHog · 2008-03-24 23:29 · Score: 1

Instead of addressing the root problems with concurrency, we are just going see super-high-level languages that have no bearing or relationship to the actual hardware the underlying machine.
In my opinion and experience, higher level tools are useful for multiproc machines, if not completely necessary. For example, if you break up a matrix multiplication into nested loops in C, you've lost some essential high-level information about the problem. The compiler can try and guess whether it can be parallelized, and it's never quite perfect. It makes more sense to convey the original problem to the compiler, using higher level constructs. I've used Fortran 90 with a good compiler to do just this.

--
Escher was the first MC and Giger invented the HR department.

What are the applications? by BadAnalogyGuy · 2008-03-24 19:51 · Score: 3, Interesting

Which is more efficacious? To take up as much simultaneous processor time as possible in order to finish faster, or to leave extra cores open for other processes to run simultaneously.

Given that the management of threads isn't exactly the easiest thing in the world (not the hardest either, mind you), perhaps it would be more beneficial to let the OS determine which threads to run rather than trying to optimize a single app.

For example, many webservers use a semaphore rather than multiple threads to handle request dispatch. The result is that there is less overhead creating and cleaning up threads.

Re:What are the applications? by BSAtHome · 2008-03-24 20:06 · Score: 3, Funny

Multiple cores plus less experienced programmers results in multiple infinite loops able to run at the same time. I don't quite see how this helps quality software, regardless of the synchronization problem.
Re:What are the applications? by BadAnalogyGuy · 2008-03-24 20:10 · Score: 3, Funny

I don't quite see how this helps quality software

Sure, but you can now run your infinite loops in half the time as before.

Halving the time to run an operation? That's improving quality, right there.
Re:What are the applications? by SanityInAnarchy · 2008-03-24 20:44 · Score: 3, Interesting

It depends on the application. Some applications simply benefit from running in realtime. And some applications don't really scale well by breaking them up into individual processes. Some applications want to use as much CPU as you can throw at them -- a web app, for instance, had better be able to handle another few (dozen?) app servers if you get Slashdotted.

Also: Management of threads is mostly hard because we're mostly still using such low-level tools to do it. Semaphores, locks, and threads are the equivalent of GOTOs, labels, and pointers. While you can build a higher-level system (message-passing, map/reduce, coroutines) out of semaphores, locks, and threads, I'd argue that's like writing everything in GOTOs instead of doing a proper for loop -- or using a for loop instead of a proper Ruby-style "each" iterator. At the risk of abusing my analogy, yes, sometimes you do want to get under the hood (pointers, threads), but in general, it's much safer, saner, and not that much less efficient to use modern parallel programming concepts.

There are still some real problems with pretty much any threaded system, but I suspect that a lot of the bad rap threads get is from insufficient threading tools.

Oh, and about that webserver -- using more than one process is pretty much like using more than one thread, and I'd say weighs about the same in this discussion. Webservers, so far, are able to get away with using one thread (or process, doesn't matter) per request (and queuing up requests if there's too many), so that's a bit less of a problem than, say, raytracing, compiling, encoding/decoding video, etc.

--
Don't thank God, thank a doctor!
Re:What are the applications? by BadAnalogyGuy · 2008-03-24 20:58 · Score: 3, Interesting

Regarding the webserver, I am arguing the opposite, though. The single-threaded, event-driven architecture allows the server itself to remain relatively lightweight. Some webservers certainly do use threads and optimize that further by using a threadpool, but threads are still more cumbersome for an application like that which has mandatory blocking while the hardware is busy doing stuff (tx or rx).

The rest of your post, I am in agreement. I used to work on a very large high profile application framework. I heard all the time how the framework was huge and bloated and slow. But what I found 9 times out of 10 was that when an application which eschewed the framework grew to a large enough size, most of the core logic of 'my' application framework was duplicated in the resulting application. In other words, they were re-engineering stuff that had already been solved and were worse off because their code was less tested and less optimized than the existing framework.

In the same way, we need a good "framework" (maybe a language, maybe a library, maybe a new paradigm) which takes advantage of and makes beautifully clear the paradigm. I don't know if it's necessary to hide the parallelization from programmers (Erlang) or to expose just the bare minimum (fork(), etc). There needs to be a middle ground which can be taken advantage of whether the programmer knows he is using it or not, but also is readily available for the times when he absolutely needs the capability.
Re:What are the applications? by RuBLed · 2008-03-24 21:34 · Score: 1

Ha! But you see lad, nobody had any proof that a program with an Infinite Loop would run infinitely.

So there, myth busted, show me your infinite loop and I'd show you my sledgehammer...
Re:What are the applications? by rbanffy · 2008-03-24 23:37 · Score: 2, Interesting

"Which is more efficacious? To take up as much simultaneous processor time as possible in order to finish faster, or to leave extra cores open for other processes to run simultaneously."

This is up to the OS to decide. The programmer's job is to provide something for the processors to do. If doing it serially is wasteful, doing it in parallel (or, at least, asynchronously) is the way to go.

Of course, when you think parallel and threads rely on each other's data, you need a suite of tests that can test it properly. Either that, or you risk losing endless nights tracking nasty real-time bugs.

Concurrent is clearly the way to go, as we are increasingly close to a performance wall with single-threaded sequential stuff.

--
http://www.dieblinkenlights.com
Re:What are the applications? by doctor_nation · 2008-03-24 23:39 · Score: 1

Depending on the parallel efficiency of the program, in general it should be more efficient to use up as much of the processor as possible. If you have 4 processors and are only running one program, then that program should use up all four. If you only program for single threads, then the code is less optimized generally speaking. I would suppose that more advanced codes should be able to parallelize when there are open CPUs and use a single thread when the system is loaded down (assuming that a single thread is more efficient). Not having parallelization hobbles a program running on a multiple processor system.
Re:What are the applications? by Tony+Hoyle · 2008-03-25 00:11 · Score: 2, Insightful

Actually, no. In that case you'd want to use a maximum of 3. The OS needs to do its own processing, and if you're hogging all 4 cores your app will end up slower, because every time it does something OS-dependent like accessing the disk or network it'll be waiting around for the OS to catch up.
Re:What are the applications? by Nursie · 2008-03-25 01:46 · Score: 1

"The single-threaded, event-driven architecture allows the server itself to remain relatively lightweight. "

Which, if that's what you want, then it's a good thing. If, on the other hand, you want to make full and efficient use of the platform you're on (and the keyword is *full*) then you do things a little differently, especially where multiple cores are available.
Re:What are the applications? by zarthrag · 2008-03-25 02:05 · Score: 1

I've recently taken to Threading Building Blocks (TBB) in C++ (over OpenMP, which is of limited use since it's centered on loop optimization). It's cross-platform and fairly complete. The docs and the O'Reilly book were all I needed to get going. It could stand some growth, but I've managed to prototype a scalable game engine with it quite easily.

All I did was change the design of my engine components (which were already classes that could be called in a main loop, one by one, in order) so that they emit "tasklets" into a pipeline for parallel execution. Each module derives its own tasks and only emits what needs to be executed. I added a class to manage dependencies and control execution order, and then run the tasks through a pipeline. Additionally, I can use TBB or OpenMP to optimize within the tasks as much as I want; TBB takes care of assigning threads for me.

What I ended up with is a game engine that, on a single-core system with no hyper-threading, executes just a touch slower than the main-loop version due to overhead. But on 2+ cores, or 1 with HT, the engine will parallelize whatever it can. If it's running two or more unparallelized tasks, they run in parallel. Tasks that have parallelism get multiple threads assigned to them. So now my frame/video rendering runs alongside my input updates and networking code, while my physics goes full-bore across whatever is available. It wasn't hard; it just took a different approach than what's taught at school. Not all algorithms are easily parallelized, but nearly all tasks are. With more work, I'm sure it could be even better. And if TBB ever gets some kind of affinity specification, I'd have a pretty good amount of power over my code. OpenMP is almost too easy, as it uses #pragma to hint the compiler. The tools are there; we just need to use them and build a community.
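The dependency-driven scheduling described above can be approximated with Python's standard `concurrent.futures`. This is not TBB's API: `run_task_graph`, the task names and the graph format are hypothetical, and because of CPython's global interpreter lock the thread pool only overlaps I/O-bound or lock-releasing work, so treat it as an illustration of the scheduling idea rather than of the speedup.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

# Hypothetical graph format: name -> (names it depends on, zero-argument callable)
TaskGraph = Dict[str, Tuple[List[str], Callable[[], object]]]


def run_task_graph(tasks: TaskGraph, max_workers: int = 4) -> Dict[str, object]:
    """Run every task once all of its dependencies have finished.

    Tasks with no unfinished dependencies are submitted together, so
    independent work (e.g. input and networking) overlaps on the pool.
    """
    pending = {name: set(deps) for name, (deps, _) in tasks.items()}
    results: Dict[str, object] = {}
    running = {}  # Future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Submit everything whose dependencies have all produced results.
            ready = [n for n, deps in pending.items() if deps.issubset(results)]
            for name in ready:
                del pending[name]
                running[pool.submit(tasks[name][1])] = name
            if not running:
                # Nothing runnable and nothing in flight: a cycle or unknown dependency.
                raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


if __name__ == "__main__":
    log: List[str] = []

    def step(name: str) -> Callable[[], str]:
        def run() -> str:
            log.append(name)  # record the order tasks actually ran in
            return name.upper()
        return run

    frame = {
        "input":   ([], step("input")),
        "network": ([], step("network")),
        "physics": (["input"], step("physics")),
        "render":  (["physics", "network"], step("render")),
    }
    print(run_task_graph(frame))
    print(log)  # "physics" always comes after "input"; "render" always last
```

A task is submitted as soon as everything it depends on has produced a result, so `input` and `network` start together while `render` waits for both branches; a cycle or a misspelled dependency is reported instead of hanging.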

--
Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???
Re:What are the applications? by SanityInAnarchy · 2008-03-25 05:09 · Score: 1

Regarding the webserver, I am arguing the opposite, though. The single-threaded, event-driven architecture allows the server itself to remain relatively lightweight.

I'm arguing that you have too narrow a definition of "threads". Any webserver that takes advantage of multiple processors is either having multiple threads/processes actually listening to port 80, or is a single process listening to port 80 and load-balancing between multiple backend threads/processes.

And I'm arguing that if such a coarse model truly can be made to work, it might be worth investigating how to do it in more places. (I think Erlang is an example of that model.)

I don't know if it's necessary to hide the parallelization from programmers (Erlang) or to expose just the bare minimum (fork(), etc).

Well, Erlang doesn't hide it from the programmers, it just makes it easier to work with.

And I would argue that we already have fork() and friends, and POSIX threads, and locks/semaphores, etc. And all of that is needed. But we also need something built on top of that, which provides sufficient abstractions for the most common ways of threading -- and we need to pick some common ways of threading.

A simple example: Map/reduce. I'd like to be able to take a chunk of data, and call an iterator on it -- like Ruby's each() or map() -- except have this iterator automatically span processors. Specifically, it would ideally spawn n threads (where n is the number of CPUs), and keep them busy, but as a programmer, I shouldn't have to worry about it. I should be able to do something like:

['this', 'is', 'an', 'array'].map { |x| x.upcase } # returns: ['THIS', 'IS', 'AN', 'ARRAY']

That works right now. What I want, though, is some variant which runs each iteration in parallel -- as you can see, even in that contrived example, it should be able to use at least four cores. Under the hood, if upcase is doing something similar (splitting a string into characters, each character gets a thread), then it could use 13 cores.

Obviously, the above example would actually suffer from that approach: spawning and synchronizing threads would take far more work than simply brute-forcing it in one thread. But you can see the pattern, and how it might be useful...
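Python's standard library already ships roughly the iterator asked for here. A minimal sketch, assuming a hypothetical `parallel_map` helper over `concurrent.futures`: it defaults to one worker per CPU and returns results in input order, like the built-in `map`. Threads are the default executor; CPU-bound work in CPython generally needs `ProcessPoolExecutor` (and picklable functions) to use more than one core.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(fn: Callable[[T], R], items: Iterable[T],
                 executor_cls: Type[Executor] = ThreadPoolExecutor,
                 workers: int = 0) -> List[R]:
    """Hypothetical helper: apply fn to every item on a pool of workers.

    workers defaults to the CPU count, so the caller never sizes the pool.
    Executor.map yields results in input order, just like map().
    """
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(fn, items))


if __name__ == "__main__":
    words = ["this", "is", "an", "array"]
    print(parallel_map(str.upper, words))  # ['THIS', 'IS', 'AN', 'ARRAY']
    # For CPU-bound work, separate processes sidestep CPython's GIL;
    # str.upper is a builtin, so it can be pickled and sent to the workers.
    print(parallel_map(str.upper, words, executor_cls=ProcessPoolExecutor))
```

As the post itself notes, for four short strings the pool overhead dwarfs the work; the pattern pays off only when each item is expensive.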

--
Don't thank God, thank a doctor!
Re:What are the applications? by Nursie · 2008-03-25 05:26 · Score: 1

Umm, that's a very simplistic view of it and not necessarily correct. Depending on your code you may have IO bound portions, you may have waits, synchronisation, all sorts of other reasons that any given thread might grind to a halt and be swapped out.

Too many threads are a bad thing due to context switching and cache repopulation, but if you want to use all the available hardware then you probably want more threads than cores. The OS should be a minimal impact. If it's a decent OS...
Re:What are the applications? by ad0gg · 2008-03-25 10:58 · Score: 1

For example, many webservers use a semaphore rather than multiple threads to handle request dispatch. The result is that there is less overhead creating and cleaning up threads.
A semaphore? How does using a synchronization primitive in a non-threaded/multiple-process environment work? I was always under the assumption that services like web servers do use thread pools or application pools (processes). "Server too busy" errors occur when the queues reach a certain length.

--
Have you ever been to a turkish prison?
Re:What are the applications? by BadAnalogyGuy · 2008-03-25 13:03 · Score: 1

The server (lighttpd, for example) is single threaded. It relies on signalling from the device to complete requests. Since the signalling is asynchronous, it's not a true queue in the FIFO sense. It's also not multi-threaded. That's not to say that a server couldn't be multi-threaded, but since there is only one device (generally) it isn't necessary to run multiple threads.
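The single-threaded, event-driven model can be sketched with Python's `asyncio`. Here `asyncio.sleep` stands in for waiting on the device (an assumption made for illustration; lighttpd's own C implementation is not modeled), and `handle_request`/`serve` are hypothetical names. Every request is served on the same thread, yet the total time is close to one request's wait rather than the sum of all of them.

```python
import asyncio
import threading
import time
from typing import List


async def handle_request(request_id: int, io_delay: float) -> str:
    # 'await' yields to the event loop while the (simulated) device is busy,
    # so other requests make progress on the very same thread meanwhile.
    await asyncio.sleep(io_delay)
    return f"request {request_id} served on {threading.current_thread().name}"


async def serve(n_requests: int, io_delay: float = 0.1) -> List[str]:
    # Start every request at once; the single-threaded loop interleaves them.
    return await asyncio.gather(*(handle_request(i, io_delay) for i in range(n_requests)))


if __name__ == "__main__":
    start = time.perf_counter()
    replies = asyncio.run(serve(100))
    elapsed = time.perf_counter() - start
    print(replies[0])  # request 0 served on MainThread
    print(f"{len(replies)} requests in {elapsed:.2f}s")  # roughly 0.1s, not 100 x 0.1s
```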

I'm in the latter.... by Valkyre · 2008-03-24 19:54 · Score: 1

I cut my teeth on programming microcontrollers and embedded devices, and high level languages are a chore for me...having to program for some api/interface/whatever and barely seeing what goes on at the hardware level is strange and confusing for me. That being said, isn't the very reason I'm not fond of high level languages something that would make it an easy transition to multicore development? Are the techniques not an extension of what a programmer already knows and does versus something completely new?

Also, it would seem from a low-level standpoint that working with long instruction paths on a superscalar architecture would have been an excellent stepping stone for multicore development...am I wrong in seeing some parallels (no pun intended) here?

--
What the heck is a 'sig'?

Questions by Hal_Porter · 2008-03-24 19:54 · Score: 5, Funny

Q1) Why did the multithreaded chicken cross the road?
A1) to To other side. get the

Q2) Why did the multithreaded chicken cross the road?
A4) other to side. To the get

It is funnier in the original Russian.

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;

Re:Questions by religious+freak · 2008-03-24 20:41 · Score: 1

I DON'T KNOW WHY, BUT I CAN'T STOP LAUGHING AT THAT! Could just be that it's so late, but damnit, that's funny

--
If you can read this... 01110101 01110010 00100000 01100001 00100000 01100111 01100101 01100101 01101011
Re:Questions by RuBLed · 2008-03-24 21:55 · Score: 5, Funny

Whoa..

Q: How many multithreaded person(s) does it take to change a light bulb?
A: 5, 1 at each of the 4 ladders and 1 to pass the light bulb to the lucky one.

Q: How many multithreaded person(s) does it take to change a light bulb?
A: 4, each trying to screw the lightbulb.

Q: How many multithreaded person(s) does it take to change a light bulb?
A: I don't know what happened to them.
Re:Questions by Anonymous Coward · 2008-03-25 01:36 · Score: 0

Lock stdout, and it will be funny
Re:Questions by chiph · 2008-03-25 01:56 · Score: 1

Q. Is the multi-threaded cup half-full or half-empty?
A. Half full. This time.
Re:Questions by whitelines · 2008-03-26 02:25 · Score: 1

This is one of the few times I have literally laughed out loud, like people looking oddly at you because you're laughing to yourself. Any programmer who's done multi-threading has seen this in one form or the other. It's perfect.

--
/* TBD */

Experince by forgoil · 2008-03-24 19:55 · Score: 4, Interesting

One reason could be that software engineers with more experience simply already know about these things, and have faced off against the many problems with concurrency. Threads can be hell to deal with, for instance. So because of these things they don't show any interest.

That being said I think that if you want to actually make use of many cores you really do have to switch to a language that can give you usage of many threads for free. Writing it manually usually ends up with complications. I find Erlang to be pretty nifty when it comes to these things for instance.

Re:Experince by mrbluze · 2008-03-24 20:38 · Score: 1

One reason could be that software engineers with more experience simply already know about these things, and have faced off against the many problems with concurrency. Threads can be hell to deal with, for instance. So because of these things they don't show any interest.
I think to make programming too different from natural human thought processes will result in less manageable code and probably less performance & profit for effort. Multithreading within an application is great for some things, like heavy mathematical tasks that aren't strictly linear, predictable jobs which can be broken up and pieced together later, etc. But why should programmers be forced to learn how to do this? I think it's up to operating systems and compilers to work this out, really. The point of multiprocessor design is to reduce various bottlenecks and improve energy efficiency, as far as I can tell.

Yes, experienced programmers are probably avoiding the learning curve because until now it's been a bumpy ride with parallel processing.

But I also can't quite see why older programming languages such as C, which has adapted well to event driven design, can't adapt easily to multithreading. Surely, as I am suggesting, implementing this is more a compiler issue with maybe a few basic additions to the language itself?

--
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
Re:Experince by Instine · 2008-03-24 20:46 · Score: 1, Insightful

but just wait...

If you think 2 cores is tricky, then how about 4. And if you're really going to make the most of the multiple cores, and you start to use them for complex permutations of solution finding, the complexity gets silly fast! 8 cores in total is already fairly common. This will have to more than double every 18 months to keep up with Moore's 'Law', which they have to do, to keep the mechanics of their capitalist framework ticking smoothly.

In ten years, efficient programming won't be difficult, it will be impossible unless we evolve our engineering concepts dramatically to adapt to this paradigm shift (sorry for the cliche phrase, but it's apt). I believe the only way will be to use genetic algorithms (suited to multiprocessors themselves) to adaptively compile code, effectively evolving it until it's optimized.

--
Because you can - or because you should?
Re:Experince by rolfwind · 2008-03-24 21:06 · Score: 1

I think to make programming too different from natural human thought processes will result in less manageable code and probably less performance & profit for effort.

Human brains are not like computers and are not serial-oriented:
http://scienceblogs.com/developingintelligence/2007/03/why_the_brain_is_not_like_a_co.php

But our conscious mind might be.
But why should programmers be forced to learn how to do this?

For the same reason a programmer should learn anything -- to understand what is going on in the background.
Re:Experince by Antique+Geekmeister · 2008-03-24 21:15 · Score: 1

Another reason could be that we're old programmers: we know better than to stick our fingers into the moving lawnmower blades of multi-processor technologies.
Re:Experince by LiquidCoooled · 2008-03-24 21:23 · Score: 3, Insightful

Using genetic algorithms won't help unless you understand the underlying issues with multi-core.
Computer software is notorious for not understanding what the operator wants ("It looks like you are writing a sorting algorithm.."), what makes you think this will be any different?

(I am not knocking GA coding methods but just using it as a blanket extension to job security is misguided at best)

--
liqbase :: faster than paper
Re:Experince by mrbluze · 2008-03-24 21:44 · Score: 1

For the same reason a programmer should learn anything -- to understand what is going on in the background.
Understanding is fine, but having to take into consideration that your threaded app needs to be able to run on a single, dual, n^2 core processors, putting in contingencies and trying to slice up your code for this purpose and at the same time making it readable and logical to onlookers, is a bit rich.

It's not the job of a C++ programmer to have to take into account where in memory their code might be running, or even what operating system is running, in many instances (of course they should be allowed to do this when appropriate). Abstraction is valid and worthy. Parallel processors should be the same and programmers shouldn't have to worry about their program running properly on a single or 64 cores. If the 64 core processor is running your app as slowly as a single core, then it's just a crap computer/compiler/OS.

--
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
Re:Experince by rolfwind · 2008-03-24 22:07 · Score: 1

Of course, one of the responsibilities of a programmer is to use the right tool for the job and C++ is not it in this case. (I acknowledge that some don't have the luxury to choose.)
Re:Experince by TuringTest · 2008-03-24 22:11 · Score: 1

Interestingly, Microsoft seems really interested in providing the right framework to "take advantage of the multicore architectures while solving the most common problems with concurrency." Their Research Labs are doing a lot of good work with experimental language features, and many of them are making their way into the .Net platform.

This makes sense coming from this company, since one of their strong points always has been creating good development environments for the not-highly-specialized programmers of the world. This collection of features could put them again on the right track to dominate the software building environments.

--
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Re:Experince by Alphasite · 2008-03-24 22:42 · Score: 1

Nonsense,

multithreading has nothing to do with the actual number of cores, as you can create 20 different threads on a single-core computer. Usually the operating system distributes and organizes your threads so you don't have to, and it usually does a damn good job; some Sun servers are able (and have been able for quite a while now) to run up to 64 simultaneous threads.

Multithreading is tricky and can lead to a lot of problems and headaches, but as CPU power is currently reaching its limits (or so it seems), multicore seems to be the only way to go, so we'd better learn it and start using it. I've developed multithreaded applications, and the number of threads doesn't matter so much; what actually matters is the number of tasks into which you can subdivide the problem. In my case there were 4 fixed threads with 4 different concrete tasks, and a variable number (configurable under app settings) of background workers which consume data generated by the other 4, so it's not actually that difficult, for the standard programmer at least. Maybe OS guys will have to work harder, but ...
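The layout described above (a fixed set of producer threads feeding a configurable pool of background workers) maps naturally onto a shared work queue. A minimal Python sketch, assuming `queue.Queue` as the hand-off and hypothetical `producer`/`worker`/`run` functions with a placeholder computation; one sentinel per worker tells the workers when the producers are finished.

```python
import queue
import threading
from typing import List

SENTINEL = object()  # tells a worker that no more data is coming


def producer(pid: int, out: "queue.Queue", n_items: int) -> None:
    # Hypothetical fixed task: generate n_items units of data.
    for i in range(n_items):
        out.put((pid, i))


def worker(inbox: "queue.Queue", results: List[int], lock: threading.Lock) -> None:
    while True:
        item = inbox.get()
        if item is SENTINEL:
            break
        pid, i = item
        value = pid * 1000 + i  # placeholder for the real background work
        with lock:  # results is shared, so guard the append
            results.append(value)


def run(n_producers: int = 4, n_workers: int = 3, n_items: int = 5) -> List[int]:
    work: "queue.Queue" = queue.Queue(maxsize=100)  # bounded: producers wait if workers fall behind
    results: List[int] = []
    lock = threading.Lock()
    producers = [threading.Thread(target=producer, args=(p, work, n_items))
                 for p in range(n_producers)]
    workers = [threading.Thread(target=worker, args=(work, results, lock))
               for _ in range(n_workers)]
    for t in producers + workers:
        t.start()
    for t in producers:
        t.join()
    for _ in workers:  # one sentinel per worker, queued after all real data
        work.put(SENTINEL)
    for t in workers:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(len(run(n_workers=8)))  # 20 results: 4 producers x 5 items
```

Here `n_workers` plays the role of the setting in the app configuration; changing it affects throughput but not the results.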
Re:Experince by dkf · 2008-03-24 22:42 · Score: 2, Interesting

In ten years, efficient programming won't be difficult, it will be impossible unless we evolve our engineering concepts dramatically to adapt to this paradigm shift (sorry for the cliche phrase, but it's apt).
I think I agree with this. I think that the key to this is going to be to go to an architecture based far more on message passing than on shared memory (why? because we have really good evidence that it scales, and there's far less hair-loss involved when things go wrong; shared-memory parallelism is infamous for schrödingbugs and heisenbugs). The other advantage of doing this is that extending the program to work across more than one computer is far easier too.
I believe the only way will be to use genetic algorithms (suited to multiprocessors themselves) to adaptively compile code, effectively evolving it until it's optimized.
I take it that this is using a genetic algorithm as a magic wand? And that you won't mind if upgrading the computer (or even just plugging in new hardware) breaks everything totally? GAs frequently tend to lead to solutions that have very odd timing behaviour indeed...
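The message-passing style dkf describes can be sketched within one Python process: a hypothetical `CounterActor` owns its state on a private thread, and other threads only send it messages through a `queue.Queue` mailbox, so no lock ever guards the data. Erlang processes provide this shape natively; the class and message names here are illustrative, not a library API.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: one thread owns `counts`; others only send messages."""

    def __init__(self) -> None:
        self._mailbox: "queue.Queue" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        counts = {}  # private to this thread, so no lock is needed
        while True:
            msg, arg, reply = self._mailbox.get()
            if msg == "add":
                counts[arg] = counts.get(arg, 0) + 1
            elif msg == "get":
                reply.put(dict(counts))  # send a copy, never the live dict
            elif msg == "stop":
                return

    def add(self, key: str) -> None:
        self._mailbox.put(("add", key, None))

    def snapshot(self) -> dict:
        reply: "queue.Queue" = queue.Queue(maxsize=1)
        self._mailbox.put(("get", None, reply))
        return reply.get()  # blocks until the actor answers

    def stop(self) -> None:
        self._mailbox.put(("stop", None, None))
        self._thread.join()


if __name__ == "__main__":
    actor = CounterActor()
    senders = [threading.Thread(target=lambda: [actor.add("hits") for _ in range(100)])
               for _ in range(4)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    print(actor.snapshot())  # {'hits': 400}
    actor.stop()
```

Because the mailbox is processed in order, a snapshot requested after the senders finish always sees all of their messages; replacing the in-process queue with a socket or pipe is what makes the same design stretch across machines.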

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Experince by Anonymous Coward · 2008-03-24 23:20 · Score: 0

But you are still assuming that compilation is done outside the execution host. Even if you compile versions for 2 cores, 4 cores, 8 cores, etc., the core configuration (2x4 is not the same as 4x2) and memory bandwidth differences change the optimal solution. Late-stage compilation will have to move into the install process.
Re:Experince by gutnor · 2008-03-24 23:21 · Score: 1

"In ten years, efficient programming won't be difficult, it will be impossible"

I guess you are talking about local desktop programming. Because in enterprisy development, multiprocesses or multithreads programming on server with more than 4 CPU is the most common environment since the mainframe times.

Experienced developers are used to dealing with complex parallel problems. The basic experience that most Slashdotters should be familiar with is forking a process to deal with a socket connection in plain C.

So the difference is that younglings are starting to get interested because their open source whatever-clone project is not running faster on their quad-core processor, or because they are starting to get weird bugs out of nowhere. Don't get me wrong, that's a good thing! The younger generation of developers will have first-hand experience with multithreading issues and "feel" the value of forking vs. multithreading, singleton vs. instance, frameworks or languages that hide the plumbing, ...
Re:Experince by paulbd · 2008-03-25 00:06 · Score: 1

spoken like a true parallel programming neophyte. 14 years ago, i was programming on a 64 core system. a couple of years before that, a 32 core system. when you don't have enough perspective on this matter, you're easily susceptible to some crazy idea that "eight! i mean think about that, eight cores is quite a bit". you're even more susceptible to the idea that parallel systems are going to follow some moore-ish curve, when they've already been in and out of fashion (and up and down in scale) several times since the idea was first floated.

parallel systems and parallel programming are not a panacea, they are not trivial to use (and probably never will be), but they have their uses. the current bubble in "multicore" systems is not the first time they've become popular at some level, and it probably won't be the last.
Re:Experince by __aailob1448 · 2008-03-25 00:08 · Score: 1

i wish i had mod points to prop you up.
Re:Experince by Anonymous Coward · 2008-03-25 00:30 · Score: 0

I have been trying to do multi-threaded stuff in perl, having x threads working at the same data, storing results in the same resulting structure. Here the problem is the overhead of the synchronized data structures. It was much faster to split the data in 2, and process it individually, and join the results.

I could probably have sped things up more by locking a simple index variable instead, but it shows the Perl support sucks. We need better support. I could probably have made something much faster myself by using a fixed-size array as a ring buffer.

So not everything is good for threads as things work today.
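For reference, the fixed-size ring buffer mentioned above looks roughly like this (the post is about Perl; Python is used here only for illustration). It is a minimal sketch with one lock and two condition variables, essentially what Python's `queue.Queue` already does internally around a deque; the `RingBuffer` name is hypothetical and no speed claim is implied.

```python
import threading
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class RingBuffer(Generic[T]):
    """Hypothetical fixed-size circular buffer shared by producer and consumer threads.

    The slots are preallocated once and never grow; one lock guards the
    head index and fill count, and two conditions let callers sleep
    instead of spinning when the buffer is full or empty.
    """

    def __init__(self, capacity: int) -> None:
        self._slots: List[Optional[T]] = [None] * capacity
        self._head = 0   # index of the next slot to read
        self._count = 0  # number of filled slots
        lock = threading.Lock()
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, item: T) -> None:
        with self._not_full:
            while self._count == len(self._slots):
                self._not_full.wait()  # buffer full: wait for a reader
            tail = (self._head + self._count) % len(self._slots)
            self._slots[tail] = item
            self._count += 1
            self._not_empty.notify()

    def get(self) -> T:
        with self._not_empty:
            while self._count == 0:
                self._not_empty.wait()  # buffer empty: wait for a writer
            item = self._slots[self._head]
            self._slots[self._head] = None  # drop the reference
            self._head = (self._head + 1) % len(self._slots)
            self._count -= 1
            self._not_full.notify()
            return item  # type: ignore[return-value]
```

Both conditions share the same lock, so a `put` and a `get` never interleave inside the index arithmetic, while a full or empty buffer simply puts the caller to sleep.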
Re:Experince by MrBandersnatch · 2008-03-25 00:33 · Score: 1

"but, as CPU power is currently reaching its limits (or so it seems) multicore seems to be the only way to go"

I find something strange in the way we're going wide rather than high, especially with Intel CPUs. Most of the Core 2 range overclocks VERY nicely on air, with the new E8400+ hitting the 4 GHz mark, often with ease. With a GOOD motherboard, RAM and cooling, 5 GHz is possible, and even 6 GHz in the *extreme* (read: liquid nitrogen) cases. I can't help but wonder if we wouldn't be seeing higher speeds if there were even a single company dedicated to a fast single (possibly cooler) core and better cooling than standard air... Anyway, perhaps wide ISN'T the *only* way to go... Of course, with heterogeneous cores we will probably one day see one core tweaked to offer maximum speed for non-parallelisable tasks, so by that point this may be moot anyway.
Re:Experince by maraist · 2008-03-25 01:03 · Score: 1

I think we've found the problem right here. You seem to view multi-threading the way I view JavaScript: a necessary evil that we wish we could just abstract away (using template frameworks that use it in a well-defined, non-buggy, multi-platform environment, but where I never have to see its revolting head).

But I've been programming in a Java environment for years, and am in LOVE with multi-threaded programming. We regularly install our apps on 8-core machines and use every ounce of CPU. Moreover, I love the ugly cousin of multi-threading: clustering (though clustering and MT exercise two different parts of the brain). Clustering requires symmetric divide-and-conquer programming, whereas MT can allow efficiently crafted coordination (using latches, semaphores, atomic integers, mutexes, etc).

Once you've been doing MT long enough, you consider the MT-safety of every global variable you see. Even with local variables, if I want to hand this variable off to a queue that will be read by a separate thread... OK, let me think: what fields will that system use? Were there any database transactions open? Are there counters that I need to convert to 'AtomicInteger', or do I need to add synchronized sections? What is my usage pattern for this map/dictionary? Single write, multiple concurrent reads? Mixed ratios of reads/writes?

When you do it long enough, you think in the above terms even if you know the app will be single-threaded. You build 'value objects' (which are immutable) not because they're MT-safe, but because they reduce bugs.
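The habits listed above (immutable value objects, atomic counters instead of bare shared integers) have close Python equivalents. A minimal sketch, assuming a frozen dataclass as the value object and a hypothetical lock-based `AtomicCounter`; unlike Java's `AtomicInteger` it is not lock-free, it only makes the read-modify-write step indivisible.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Immutable value object: safe to hand to another thread as-is."""
    order_id: int
    quantity: int


class AtomicCounter:
    """Hypothetical stand-in for Java's AtomicInteger (lock-based, not lock-free)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def increment_and_get(self, delta: int = 1) -> int:
        with self._lock:  # the read-modify-write must not interleave
            self._value += delta
            return self._value

    def get(self) -> int:
        with self._lock:
            return self._value


if __name__ == "__main__":
    shipped = AtomicCounter()
    orders = [Order(order_id=i, quantity=i % 3 + 1) for i in range(1000)]

    def ship(order: Order) -> Order:
        shipped.increment_and_get(order.quantity)
        return replace(order, quantity=0)  # "changing" a value object builds a new one

    with ThreadPoolExecutor(max_workers=8) as pool:
        done = list(pool.map(ship, orders))
    print(shipped.get())  # 1999, the same on every run
    print(orders[5], done[5])  # the original object is untouched
```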

These techniques can be applied to most languages, but the problem with C/C++ is that it was designed from the ground up without MT eyeballs. There are far too many APIs that use global variables in an unprotectable manner.

Hell, I've seen programmers that can't get their heads around EVENT driven programming, which is an order of magnitude simpler. Take a web server that uses a singleton object. Nothing more than a series of global variables. All you have to do is remember to re-initialize those variables at the beginning of the event call (the web service entry point for example). Never mind the fact that you need to protect / synchronize or move those variables as locals (possibly using a transient wrapper object so you can maintain data-hiding). It boggles my mind the linear tunnel vision that most programmers have. Programming with global variables should require a government court-order. That would solve half the damn bugs in the world.. People should be forced to use lisp or some such craziness.

--
-Michael
Re:Experince by maraist · 2008-03-25 01:22 · Score: 1

Sorry for my earlier rant. As far as abstracting at the compiler level, here are the types of things I can imagine abstracting:

Sorting and other large, well-defined O(n)-and-up operations.

Identifying segments of code to automatically wrap in what, in Perl, would be the async block prefix:
async { doFoo(); }
async { doBar(); }
Theoretically, a compiler could see a large path of execution and define a pool of worker threads equal to the number of CPUs to handle this. The problem is: how reliably are those doXX functions linked, such that the compiler can GUARANTEE thread-safety by magically splitting the code up? Not at all, because every language I'm aware of allows the target of a function call to be swapped out dynamically, short of that function being declared inline.
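Done by hand rather than by a compiler, the "pool sized to the number of CPUs" idea looks roughly like this sketch. `do_foo` and `do_bar` are hypothetical stand-ins; running them concurrently is only safe because the programmer, not the compiler, vouches that they share no mutable state. Note that in standard CPython builds the global interpreter lock keeps CPU-bound pure-Python code from actually running on several cores at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def do_foo():
    return sum(range(1000))   # hypothetical stand-in work

def do_bar():
    return max(range(1000))   # hypothetical stand-in work

def run_async_blocks(*blocks):
    """Run independent blocks on a pool sized to the CPU count; wait for all."""
    workers = os.cpu_count() or 1   # cpu_count() may return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(block) for block in blocks]
        return [f.result() for f in futures]   # results in submission order
```

`run_async_blocks(do_foo, do_bar)` returns `[499500, 999]`.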

High-level languages (like Perl/Ruby/Python), where simple operations really perform a dozen function calls (or inline macros, as the case may be). Such code could theoretically 'delay' operations. In Perl, for example, every variable assignment triggers a dereference and a potential garbage collection. If this were deferred to a dealloc queue, then, along with revamping the Perl malloc, you could be executing in parallel with fewer application pauses. There are serious challenges: you'd have MT-safety issues with your destructors, as well as contention with malloc, and you may quickly run out of memory if you're not careful. Another place is regular expression matching. In Perl, you could theoretically delegate all regexes to a peer thread; the next use of the result would then implicitly wait on a latch. If you knew this, you could write your code so that you do the regexes early in the while-loop or function call, then use their results at a later point.
$x =~ s/pattern1/sub1/; # delegates to thread 2
$y =~ s/pattern2/sub2/; # delegates to thread 3
# do busy work
print $x,$y; # blocks until substitution is done. 3 threads total
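The hypothetical Perl behavior above can be approximated explicitly with futures: submit each substitution to a worker, keep working, and let `.result()` act as the implicit latch. A minimal sketch; the patterns and strings are made up for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def deferred_substitutions():
    with ThreadPoolExecutor(max_workers=2) as pool:
        x = pool.submit(re.sub, r"cat", "dog", "cat and cat")      # like "thread 2"
        y = pool.submit(re.sub, r"\d+", "#", "room 101, floor 3")  # like "thread 3"
        busy = sum(range(100))   # busy work on the main thread meanwhile
        # .result() blocks until each substitution is done
        return x.result(), y.result(), busy
```

`deferred_substitutions()` returns `("dog and dog", "room #, floor #", 4950)`.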

There are fewer cases where language-level parallelism makes sense, and the lower-level the language, the less possible it is.

Another possibility is borrowed from Fortran: give the interpreter a hint to run a for loop in parallel:
async: for my $item (@items) { processItem($item); }
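Without interpreter support, the same "parallel for" can be expressed through a library call. A minimal sketch; `process_item` and `parallel_for` are hypothetical names, and the per-item work must not touch shared mutable state. For CPU-bound work in standard CPython builds, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and sidesteps the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item):
    # Hypothetical per-item work; independent of every other item
    return item * item

def parallel_for(items, fn, workers=4):
    """Apply fn to every item concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

`parallel_for(range(6), process_item)` returns `[0, 1, 4, 9, 16, 25]`.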

Of all the above, there is a single place where the COMPILER is actually involved (automatic async block discovery). EVERYTHING else is API-driven (meaning YOU wrote it one way or another). Moreover, spawning threads is expensive, so you need to pool them, and that leaves a magic wing of the app that can be a problem child.

--
-Michael
Re:Experince by Instine · 2008-03-25 03:25 · Score: 1

Not as a magic wand, no.

Here's a very quick thought on the possibilities: what if every procedure/function is threaded by default? The compiler then figures out which can possibly lead to race conditions, and adds flagging systems to avoid such issues to those functions only. A multi-threaded, well-flagged [synclocked] 'relay race' is still faster than a linear solution.

Now your compiler attributes priorities to each threaded process. Those priority permutations are the 'genes' that get evolved. Higher priority here, lower there... Possibly also the flag positions (within the determined, logically safe, race-free boundaries). Make sense?

--
Because you can - or because you should?
Re:Experince by Anonymous Coward · 2008-03-25 03:54 · Score: 0

I believe the only way will be to use genetic algorithms (themselves suited to multiprocessors) to adaptively compile code, effectively evolving it until it's optimized.

I'm so sick of hearing about genetic algorithms. They seem to be the easy solution for every young fool who can't figure out a problem on his own.
Re:Experince by bar-agent · 2008-03-25 05:34 · Score: 1

There are a lot of optimizations that a "smart compiler" can do. But, you know, it turns out that most compilers are ass-stupid.

I understand it is common with functional languages to hand-wave a compiler that optimizes everything down to one Australian man doing all the work [cf. Futurama], but that's not realistic.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Re:Experince by Anonymous Coward · 2008-03-25 06:00 · Score: 0

Erlang is pretty good, although I'd like to see a language similar to Erlang but with a C-like syntax.
Re:Experince by dkf · 2008-03-25 13:45 · Score: 1

Here's a very quick thought on the possibilities: what if every procedure/function is threaded by default? The compiler then figures out which can possibly lead to race conditions, and adds flagging systems to avoid such issues to those functions only. A multi-threaded, well-flagged [synclocked] 'relay race' is still faster than a linear solution.

What you've just proposed there is deeply magical, you know? The notion of "every function being threaded by default" is odd from the beginning (it makes more sense to think in terms of loops), and compilers are usually terrible at knowing what is safe, as they have to get edge cases right even when a higher-level analysis of the program would tell us that those cases are in fact impossible. As for "well-flagged" code being faster, let me assure you that's not necessarily the case. Locks have quite a bit of overhead, even when you run with a single thread.

It's better to partition the dataset (if possible).
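A minimal sketch of what partitioning buys: each worker sums its own slice into its own result slot, so no locks are needed until one final single-threaded combine. `partitioned_sum` is a hypothetical name; the slicing scheme is one simple choice among many.

```python
import threading

def partitioned_sum(data, parts=4):
    """Each worker sums a disjoint slice; results are combined once at the end."""
    chunk = (len(data) + parts - 1) // parts   # ceiling division
    partials = [0] * parts                     # one private slot per worker

    def work(i):
        partials[i] = sum(data[i * chunk:(i + 1) * chunk])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)                       # single-threaded combine step
```

`partitioned_sum(list(range(1001)))` returns `500500`.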
Now your compiler attributes priorities to each threaded process. Those priority permutations are the 'genes' that get evolved. Higher priority here, lower there... Possibly also the flag positions (within the determined, logically safe, race-free boundaries). Make sense?

Not really. It's clear to me you've not really tried your ideas out, even in miniature. If I'm wrong, prove it by example. I rather like being proved wrong that way!

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Experince by The+End+Of+Days · 2008-03-26 06:09 · Score: 1

You're a Java programmer and you think Javascript is ugly? I guess unnecessary verbosity has its place, but I find your opinion odd at best.
Re:Experince by maraist · 2008-03-28 11:55 · Score: 1

What do JavaScript and Java syntax have to do with one another? JavaScript is closer to VBScript, which still makes me convulse when I remember my prison days of working with it. I only like loosely typed, late-binding languages for rapid prototyping: Python, Perl, bash, JavaScript, awk. But I don't like the downsides of late binding when it comes to advanced middleware. The code base is far too fragile, and you don't get a whole lot of code reduction in the syntax (to your point about verbosity).

Now SQL, Ruby, and Prolog are three radically different syntaxes in which I find some beauty. But to me Java, C, and C++ are all VERY similar (by design). While Java has statistically more characters in a given bundle of code, it has a far superior compilation framework (you don't need to compile 5 megs of .h files for every 5-line .c file), and it is structurally identical to C++: a mixture of procedural and OO coding. Contrast this with a functional language, which, for a given domain, can be far superior in terms of the visual clarity of the presented code.

Every language has its niche, and the further you deviate from that niche, the more you stick out like a sore thumb.

You (as well as many others) are focusing on the verbosity of the language. But there are sufficient tools to mitigate the verbiage of Java. I, on the other hand, am focusing on the feeling I get when I'm programming in the language (kind of like dating the language). Am I frustrated by things that I feel we shouldn't be worrying about in the 21st century? How robust is my peer's code? How packageable/encapsulatable is it? Can the language constrain his deficiencies? In C, no. In C++, maybe a little better. In JavaScript, no. In Perl, you'd have to do something weird (like flip package spaces). In Java, definitely. So this, amongst dozens of other nuances, makes the language feel more elegant and pleasurable to work in.

Consider documentation. In Perl, there is sufficient magic to keep a single person employed for a long time. In JavaScript, documentation is often only at the highest level (here, use this example). In C, lord help you: you had best pray the documentation has been kept up to date. In Java, you don't even need documentation; the strictness of the language allows you to dynamically ascertain all possible calling conventions (with the possible exception of a static function taking 10 integers, where you don't know which parameter means which, but in my experience most 3rd-party functions have only one or two parameters). There is no code you can link against in Java whose intended interface you can't [auto]discover. And the verbosity of the language is self-documenting: public class MyOptimizedXmlFactory { public static XmlReader createReader(File file){} } is kind of hard to misinterpret. In fact, the language is so well structured that javadoc can build an entire web site from literally any existing code, defining every possible usage pattern (except for unnamed parameters as described before, though well-documented code will properly annotate the parameter usage).

--
-Michael
Re:Experince by Instine · 2008-03-29 01:03 · Score: 1

The notion of "every function being threaded by default" is odd from the beginning

yes it is.

compilers are usually terrible at knowing what is safe

Yes, they are. That's where the hard work comes in, but it is NOT impossible or magical. Just difficult. But as the payoff is potentially big, the investment is possibly profitable.

not necessarily [my emph] the case

Which is why I said 'well flagged'. Breeding is what finds the good cases and loses the bad. Partitioning the data set is often very problematic, and is the wall most in the field have been banging their heads against for some time now... It's useful when, as you say, it's possible.

It's clear to me you've not really tried your ideas out

Good, that means you're not a moron, as I intentionally made that pretty clear. Although I have played with, and professionally worked with, the component concepts of the whole idea.

--
Because you can - or because you should?

parallel for years by statemachine · 2008-03-24 19:55 · Score: 1

Most of my past projects I had adapted or rewritten to use threads. Keeping data coherent when one thread or process can interrupt another is hard -- maybe that's why it's not done more?

And what about seti@home, folding@home, and all the other massively parallel projects out there? Surely you're not saying that doesn't apply to multi-core either. I think that if you stop and look around you'll see it. But if you're only basing your opinion on your book sales, then maybe there's another problem.

Old dog refuses to learn new trick. by Snufu · 2008-03-24 19:56 · Score: 1, Funny

In an unrelated story, young pups display surprising agility with said trick. Scientists baffled.

More than a trend, it's a necessity by Cordath · 2008-03-24 20:01 · Score: 4, Insightful

Parallel programming is going to go mainstream, not because people find it interesting, but because that's the way hardware is forcing us to go. First mainframes, then workstations, and now desktops and laptops have moved away from single CPU cores. In every case, it has been a necessary evil due to the inability to pack enough power into a single monolithic processor. Does anyone actually think Intel, if they could, wouldn't happily go back to building single-core CPUs with the same total power as the multi-core CPUs they're making now?

Right now, parallel development techniques, education, and tools are all lagging behind the hardware reality. Relatively few applications currently make even remotely efficient use of multiple cores, and that includes embarrassingly parallel code that would require only minor code changes to run in parallel and no changes to the base algorithm at all.

However, if you look around, the tools are materializing. Parallel programming skills will be in hot demand shortly. It's just a matter of time before the multi-core install base is large enough that software companies can't ignore it.

Re:More than a trend, it's a necessity by Anonymous Coward · 2008-03-24 21:24 · Score: 1, Interesting

Herb Sutter's article The Free Lunch is Over is 3 years old now. His predicted Concurrency revolution hasn't happened. Using his lunch analogy, if you were given free lobster lunches for years and suddenly had to pay $100 for your lunch, you might find that you're really not very hungry or that you prefer cheese sandwiches.
The free lunch (in performance) used to get wasted on slack programmers ignoring performance. Faced with a choice between slacking off no further, a mountain of bugs from multithreading, or learning some new functional language made by academia, the average programmer will choose to stop slacking off.
Re:More than a trend, it's a necessity by guaigean · 2008-03-25 02:51 · Score: 1

Right now, parallel development techniques, education, and tools are all lagging behind the hardware reality.

That is absolute nonsense. The programming tools, paradigms, and education are all available. The fact is, people simply thought it wouldn't be important in the short term, but the multi-core world came on fast. The supercomputing world has been dealing with parallelism for a long time without issue, and not just with simple pthreads. There are easier APIs to use for threading (OpenMP), and for large distributed processing, MPI is quite mature. Just because people have chosen not to learn them does not mean the tools are not available, or that hardware has surpassed software.

The tools are already there. People just have to decide that they're worthwhile to use.

--
Microsoft Sucks, F/OSS Rocks. I get mod points now right?
Re:More than a trend, it's a necessity by Rhys · 2008-03-25 03:04 · Score: 1

You're right, my Firefox can't make use of both cores on my machine. That's okay by me, though, while the number of cores is small (2-8). Why? I've got Rhythmbox running in the background and it needs some CPU power. Then there are about 10 open gnome-terms that occasionally need to update themselves as IRC chatter or a compile progresses. There's Thunderbird poking at the IMAP server occasionally, not to mention monitoring apps (Intermapper, so that means Java, which is a pig) and a load average applet running on my task bar, etc. Sure, the two cores I have often sit idle. On the bright side, when I do have a compile or a large (multi-gig) logfile analysis running on my machine, it can hog one core while the other keeps my browser, window manager, and friends responsive to the user (aka me).

That said, I wouldn't mind a really good message-passing framework. BeOS was nice for that, and it had far fewer crash/deadlock/etc. problems than I would have expected.

--
Slashdot Patriotism: We Support our Dupes!
Re:More than a trend, it's a necessity by hunteke · 2008-03-25 14:32 · Score: 1

Right now, parallel development techniques, education, and tools are all lagging behind the hardware reality. Relatively few applications currently make even remotely efficient use of multiple cores, and that includes embarrassingly parallel code that would require only minor code changes to run in parallel and no changes to the base algorithm at all.

Sort of off-topic, but I'll point out that education is a HARD thing to get funding for. Research is easy; education is hard. We've had numerous NSF grants turned down because we worded them in terms of education rather than research. And historically, our group is surprisingly good at getting NSF grants.

That said, if anyone on this thread is an educator or student interested in learning and/or teaching High Performance Computing (and hence concepts of parallelism and all their associated headaches), registration is now open for SC Education workshops.

Reinders Is Wrong: Threads Are Not the Answer by MOBE2001 · 2008-03-24 20:07 · Score: 5, Insightful

One day soon, the computer industry will realize that, 150 years after Charles Babbage came up with his idea of a general purpose sequential computer, it is time to move on and change to a new computing model. The industry will be dragged kicking and screaming into the 21st century. For over 20 years, researchers in parallel and high-performance computing have tried to come up with an easy way to use threads for parallel programming. They have failed and they have failed miserably. Amazingly, they are still continuing to pursue the multithreading approach. None other than Dan Reed, Director of scalable and multicore computing at Microsoft Research, believes that multithreading over time will become part of the skill set of every professional software developer (source: cio.com). What is wrong with this picture? Threads are a major disaster: They are coarse-grained, they are a pain in the ass to write and hard to debug and maintain. Reinders knows this. He's pushing threads, not because he wants your code to run faster but because Intel's multicore CPUs are useless for non-threaded apps.

Reinders is not an evangelist for nothing. He's more concerned about future-proofing Intel's processors than anything else. You listen to him at your own risk because the industry's current multicore strategy will fail and it will fail miserably.

Threads were never meant to be the basis of a parallel computing model but a mechanism to execute sequential code concurrently. To find out why multithreading is not part of the future of parallel programming, read Nightmare on Core Street. There is a better way to achieve fine-grain, deterministic parallelism without threads.

Re:Reinders Is Wrong: Threads Are Not the Answer by Anonymous Coward · 2008-03-24 20:48 · Score: 2, Insightful

The theory of fine-grained parallelism is fundamentally flawed by the fact that parallelisation itself incurs an overhead, due to locking and syncing shared data structures.

Your COSA stuff has already been investigated by researchers before. Basically, describing functional programming in a graph doesn't achieve anything. And if you want to parallelise like this (which is still nowhere near as efficient as hand-optimised coarse-grained parallelisation):

Core 1     Core 2
(2*3)  +   (4*6)
    (6+24)

you'll probably have to wire the registers between cores directly into each other in order to avoid the enormous overhead of an external chip.
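The two-core diagram above, written as explicit dataflow: two independent products, then one dependent sum that must wait for both. A minimal sketch; on real hardware, handing each multiply to another thread and waiting for it costs far more than the multiply itself, which is the granularity problem being raised here.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import mul

def dataflow_example():
    """(2*3) + (4*6): two independent products, then one dependent sum."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(mul, 2, 3)    # "core 1"
        right = pool.submit(mul, 4, 6)   # "core 2"
        # The addition is a synchronization point: it needs both results
        return left.result() + right.result()
```

`dataflow_example()` returns `30`.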
Re:Reinders Is Wrong: Threads Are Not the Answer by mrbluze · 2008-03-24 20:49 · Score: 1

Threads were never meant to be the basis of a parallel computing model but a mechanism to execute sequential code concurrently. To find out why multithreading is not part of the future of parallel programming, read Nightmare on Core Street. There is a better way to achieve fine-grain, deterministic parallelism without threads.

You have it exactly right, but I don't think multicore processing is going away soon. Rather, I think the solution will be in things like hypervisors and changes in kernels, etc., to allocate resources in a responsible way. Multicore CPUs don't have to be useless, but this time it's a case of the mountain needing to come to Mohammad, not the other way around.

--
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
Re:Reinders Is Wrong: Threads Are Not the Answer by MOBE2001 · 2008-03-24 21:01 · Score: 1

The theory of fine-grained parallelism is fundamentally flawed by the fact that parallelisation itself incurs an overhead, due to locking and syncing shared data structures.

Fine-grained parallelism works fine. It works in your NVidia, SIMD-based graphics coprocessor, does it not? Locking and syncing is a problem only in a non-deterministic environment like multithreading. Fine-grained parallelism is temporally deterministic because the temporal (concurrent or sequential) order of code execution can be precisely determined.

you'll probably have to wire the registers between cores directly into each other in order to avoid the enormous overhead of an external chip.

Not really. There is a way to design a multicore processor such that only neighboring cores cooperate on related computation. It is part of the self-balancing mechanism. I can't go into detail but suffice it to say that if you keep your inter-core communication performance penalty at a fixed level regardless of the number of cores, you have a winner.
Re:Reinders Is Wrong: Threads Are Not the Answer by SanityInAnarchy · 2008-03-24 21:05 · Score: 1

Wow, I thought it was interesting the first time I saw you say it -- but a quick Google turns it up again.

Really, a copy/paste troll on threading? WTF?

And yes, I'm calling it a troll unless you stop quoting that "150 years after Charles Babbage" BS, and start making your point within the comment, instead of in a rambling five-page blog post which links to a rambling whitepaper, at the end of which, we finally get a vague idea of what you might be talking about -- and we find that it's not really relevant to the real world unless we adopt a whole new (as-yet uninvented) hardware architecture.

Maybe. I think. That was a LOT of skimming.

--
Don't thank God, thank a doctor!
Re:Reinders Is Wrong: Threads Are Not the Answer by bhima · 2008-03-24 21:09 · Score: 1

I gotta say that link (and thus your post) is a whole lot of hyperbole and a whole lot of text to get to one link to the COSA project: http://www.rebelscience.org/Cosas/COSA.htm

The project itself I suppose is pretty interesting...

But I won't believe the "everyone in computer science since Charles Babbage was wrong" BS until the entire industry is using the proposed model.

--
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Re:Reinders Is Wrong: Threads Are Not the Answer by MOBE2001 · 2008-03-24 21:16 · Score: 1

Well, you know. If you don't like my copying and pasting, don't read my stuff. I mean, nobody is twisting your arm.

it's not really relevant to the real world unless we adopt a whole new (as-yet uninvented) hardware architecture.

It's going to be relevant soon enough because threads are a major pain in the ass. Add heterogeneous processors into the mix and you have a major disaster. BTW, it does not take that long to design and build a new processor anymore, even if it's a multicore processor. It can be done in months if you have a firm idea in mind.

And you're wrong about having to read all my stuff to find out that we need a whole new architecture since I mentioned in the original post that the computer industry needs to abandon Babbage's sequential model and move to a new computing model. Sorry.
Re:Reinders Is Wrong: Threads Are Not the Answer by Anonymous Coward · 2008-03-24 21:40 · Score: 0

Fine-grained parallelism works fine. It works in your NVidia, SIMD-based graphics coprocessor, does it not?

Yes, but that's because the job it does is very, very easy to break up into concurrent, separate workloads by definition. When workloads have to converge repeatedly as with other tasks, there's going to be a massive amount of overhead, unless the algorithm can be redesigned specifically. Which brings us back to a similar problem.

Fine-grained parallelism is temporally deterministic because the temporal (concurrent or sequential) order of code execution can be precisely determined.

I'll agree with you on that one. Each thread being pre-empted (or not) in an 'undefined' manner, whilst dealing with memory contention issues etc. most certainly makes the behaviour erratic and almost random.

Not really. There is a way to design a multicore processor such that only neighboring cores cooperate on related computation. It is part of the self-balancing mechanism. I can't go into detail but suffice it to say that if you keep your inter-core communication performance penalty at a fixed level regardless of the number of cores, you have a winner.

What about multiple sockets? The problem is not parallelising up to a mere 8 or so cores, but scaling up to a full-blown multi-socket, possibly NUMA, system. And with almost any penalty, you're going to incur a loss when parallelising very small, simple operations.
Re:Reinders Is Wrong: Threads Are Not the Answer by MOBE2001 · 2008-03-24 21:50 · Score: 1

What about multiple sockets? The problem is not parallelising up to a mere 8 or so cores, but scaling up to a full-blown multi-socket, possibly NUMA, system. And with almost any penalty, you're going to incur a loss when parallelising very small, simple operations.

There does not have to be a penalty if neighboring cores share registers or even caches. The trick is to keep computations local to neighbors only. It can be done. It's a matter of instruction scheduling. Besides, even if there is a slight penalty, only a small percentage of situations will fall under the category that you describe. A QSort (or tree search function) lends itself very nicely to fine-grained parallelism, for example. Ultimately, physics will solve the Von Neumann bottleneck and cores will go directly to fast memory: no need for caches then.
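Quicksort does split naturally: once partitioned, the two halves are disjoint and can be sorted independently. A minimal sketch using plain threads with a depth cutoff so that tiny partitions stay sequential; `parallel_quicksort` and its `depth` parameter are hypothetical, and in standard CPython builds the global interpreter lock means this shows the structure rather than a real speedup.

```python
import threading

def parallel_quicksort(items, depth=2):
    """Quicksort that sorts the 'less' partition on a new thread near the top.

    depth limits how many recursion levels spawn threads.
    """
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    if depth <= 0:
        return parallel_quicksort(less, 0) + equal + parallel_quicksort(greater, 0)

    result = {}

    def sort_less():
        result["less"] = parallel_quicksort(less, depth - 1)

    worker = threading.Thread(target=sort_less)
    worker.start()
    sorted_greater = parallel_quicksort(greater, depth - 1)  # this thread takes the other half
    worker.join()
    return result["less"] + equal + sorted_greater
```

The partitions share no writable data, so no locks are needed; the only synchronization is the `join()` before concatenating.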
Re:Reinders Is Wrong: Threads Are Not the Answer by Khakionion · 2008-03-24 22:09 · Score: 1

"BTW, it does not take that long to design and build a new processor anymore, even if it's a multicore processor. It can be done in months if you have a firm idea in mind."

Irrelevant. Creating a new hardware architecture is not equivalent to the computing industry adopting it. Consider porting legacy systems (quite a few in this case), re-educating all those low-level programmers, and the "architecture wars," for lack of a better term, as you try to evangelize this new platform as being better...

Oh, wait, evangelize? Guess you're at Reinders' level now.

--
OMG! Wau!
Re:Reinders Is Wrong: Threads Are Not the Answer by MOBE2001 · 2008-03-24 22:22 · Score: 1

Consider porting legacy systems (quite a few in this case), re-educating all those low-level programmers, and the "architecture wars," for lack of a better term,

I have. It can all be done easily.

as you try to evangelize this new platform as being better...

Is your name Reinders, by any chance? Or do you work for Intel? :-) I don't get paid for doing this. Do you? It's a labor of love. I believe in it. I just want to see a cheap, superfast and reliable supercomputer on my desk before I die. Maybe with some cool AI application in it, you know, something I can talk to in English, something that knows me. I hate keyboards.
Re:Reinders Is Wrong: Threads Are Not the Answer by dkf · 2008-03-24 22:31 · Score: 4, Insightful

Fine-grained parallelism works fine. It works in your NVidia, SIMD-based graphics coprocessor, does it not? Locking and syncing is a problem only in a non-deterministic environment like multithreading. Fine-grained parallelism is temporally deterministic because the temporal (concurrent or sequential) order of code execution can be precisely determined.

It really really depends on the problem and the algorithm. Some things are easy to parallelize, especially if they don't need (much) shared writable memory, but others are furiously hard.
you'll probably have to wire the registers between cores directly into each other in order to avoid the enormous overhead of an external chip.

Not really. There is a way to design a multicore processor such that only neighboring cores cooperate on related computation. It is part of the self-balancing mechanism. I can't go into detail but suffice it to say that if you keep your inter-core communication performance penalty at a fixed level regardless of the number of cores, you have a winner.

As I said above, it really depends on what you're doing. Some classes of problems just don't and can't have nice communication patterns, and if you've got one where you've got these inherent non-local effects, no amount of cleverness is going to let you avoid the hard fact that communication costs will dominate them. Other problems are much more tractable though; it's definitely not all doom and gloom. Just don't sound off and claim that it's all solved (nope!) or that some simple hardware-level cleverness will save us (nope again!) but instead study what's really known so that you can sound more knowledgeable. A good place to start reading up is with the thirteen dwarfs paper (PDF).

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Reinders Is Wrong: Threads Are Not the Answer by Anonymous Coward · 2008-03-24 22:55 · Score: 1, Insightful

Bollocks.

Multithreading is a solved problem (as programming is). The only thing that needs fixing with threads is the lack of cheap developers. But that is solvable too, using the Actor Model (look it up on Wikipedia, I'm too lazy to do it for you.) In short: don't share variables, use synchronized queues instead, and your threading problems vanish (at a price, of course.) You can do it in anything from assembler to Java, but some languages like Erlang make it damn easy.

Go out and learn Erlang or Scala.
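The actor recipe from that comment ("don't share variables, use synchronized queues instead") can be sketched in a few lines of Python with `queue.Queue`. The actor's state is touched only by its own thread; everyone else sends messages. `CounterActor` and its message names are hypothetical, and this is far cruder than Erlang's processes or Scala's actors.

```python
import queue
import threading

class CounterActor:
    """Owns its state privately; other threads talk to it only via messages."""

    def __init__(self):
        self._inbox = queue.Queue()   # synchronized queue: the only shared object
        self._count = 0               # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg, reply = self._inbox.get()   # messages handled one at a time, FIFO
            if msg == "add":
                self._count += 1
            elif msg == "get":
                reply.put(self._count)
            elif msg == "stop":
                break

    def add(self):
        self._inbox.put(("add", None))

    def get(self):
        reply = queue.Queue(maxsize=1)
        self._inbox.put(("get", reply))
        return reply.get()                   # blocks until the actor answers

    def stop(self):
        self._inbox.put(("stop", None))
        self._thread.join()
```

No lock appears in user code: the queue serializes access, at the price of a message hop per operation.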
Re:Reinders Is Wrong: Threads Are Not the Answer by Anonymous Coward · 2008-03-24 23:07 · Score: 1, Insightful

There does not have to be a penalty if neighboring cores share registers or even caches. The trick is to keep computations local to neighbors only. It can be done. It's a matter of instruction scheduling.

That's one hell of an assumption. Even then, you'd still be stuck with one bank of memory, seeing as this would not be feasible over a large-scale system. As soon as you have processors on different boards, attached to different banks of memory, you're going to have an absolutely massive overhead.

Ultimately, physics will solve the Von Neumann bottleneck.

Again, one hell of an assumption. If that happens, many things in computer science will have to be completely re-thought.
Re:Reinders Is Wrong: Threads Are Not the Answer by weetabeex · 2008-03-25 00:37 · Score: 0

They are coarse-grained, they are a pain in the ass to write and hard to debug and maintain.

I may be out of line here, but I assume the OP is referring to coarse-grained concurrency mechanisms, most probably using locks.

Well, I've been working with Software Transactional Memory at my university for some months, and I believe it could be the answer to most concurrency issues.

This isn't something new. Maybe bleeding-edge (one still bleeds a lot), but it certainly isn't new. The concepts have been around since the late '70s, but only recently (maybe in the last 10 years?!) have some frameworks been implemented.

Anyway, it surely comes as an abstraction for the programmer. No more burden of manually controlling accesses to data. Well... one may need to specify where accesses to shared memory may happen, but the burden of acquiring locks, releasing them, and doing it in the right order is gone; the framework does it for the programmer.
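To make the idea concrete, here is a toy optimistic STM sketch: a transaction records the version of every variable it reads and buffers its writes, then commits only if nothing it read has changed, retrying otherwise. Everything here (`TVar`, `atomically`, the single commit lock) is a hypothetical illustration; real STM systems such as Haskell's `Control.Concurrent.STM` or Clojure's refs are far more sophisticated.

```python
import threading

class _Conflict(Exception):
    """Raised internally when a transaction saw stale data and must retry."""

class TVar:
    """Transactional variable: a value plus a version number."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()   # held only for brief snapshot/validate/publish steps

def atomically(transaction):
    """Run transaction(read, write) optimistically; retry until it commits cleanly."""
    while True:
        reads, writes = {}, {}    # TVar -> version seen; TVar -> pending value

        def read(tvar):
            if tvar in writes:
                return writes[tvar]          # read-your-own-writes
            with _commit_lock:               # consistent (value, version) snapshot
                value, version = tvar.value, tvar.version
            if reads.setdefault(tvar, version) != version:
                raise _Conflict()            # someone committed since our first read
            return value

        def write(tvar, value):
            writes[tvar] = value             # buffered until commit

        try:
            result = transaction(read, write)
        except _Conflict:
            continue
        with _commit_lock:
            if all(tvar.version == seen for tvar, seen in reads.items()):
                for tvar, value in writes.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # Validation failed: loop around and re-run the transaction
```

The caller never takes a lock or worries about lock ordering; conflicting transactions simply re-run.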

I do believe threading isn't such a disaster. The approach taken to control concurrency, however, is. And I believe it will change :)
Re:Reinders Is Wrong: Threads Are Not the Answer by smallfries · 2008-03-25 00:55 · Score: 1

I wouldn't take it too seriously. He is a well-known crank who has been hawking that shit for years. Like all nutjobs, he has taken a tiny grain of truth and then built an entire castle of delusion on top of it. His ideas of signal-based computing are vapour, but the best aspects of them have been well explored within academia as process calculi. The type of local semantics that he describes is useful for safety properties (and is used extensively in formal verification), but translating them into an efficient form of executable is very difficult.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Reinders Is Wrong: Threads Are Not the Answer by jellomizer · 2008-03-25 01:19 · Score: 1

The problem is that programmers think about solving problems like a sequential programmer, not a parallel programmer. When you program as if it were a sequential program, timing to ensure everything gets in in the right order is important. For a parallel design, you need to make the program independent of order (or greatly reduce that dependence); in a good parallel design you normally just need to AND all the processors' results when they are all done.
I have worked on a MasPar system in the past with 1024 processors. While it took 3 minutes to load up emacs, it smoked my 2-CPU (much more modern) system on a well-parallelized program: what took my PC 10 minutes to calculate, the MasPar got done in a split second... There is efficiency in parallel programming, and the overhead in a well-designed parallel program is greatly diminished. But most people think like top-down programmers and write parallel programs to function like top-down programs, except perhaps for running a large function or two in a different thread and then pausing the CPU until the result is ready.

Threading isn't the only paradigm you can follow: the MasPar language MPL had this interesting concept of plural variables, which made programming very efficient.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re:Reinders Is Wrong: Threads Are Not the Answer by ceoyoyo · 2008-03-25 03:19 · Score: 1

Sure, fine grained parallelism works great in my video card. The card lays down the rule: you may NOT have any sort of dependencies or shared read/write memory between parallel tasks. That works just hunky dory on a general purpose processor too, using threads or multiple processes. We expect a bit more from our general purpose processors though.
Re:Reinders Is Wrong: Threads Are Not the Answer by SanityInAnarchy · 2008-03-25 04:50 · Score: 1

Well, you know. If you don't like my copying and pasting, don't read my stuff. I mean, nobody is twisting your arm.

I'm doing two things here: I'm warning other people away from being drawn into what looks very much like a dead end, and I'm offering a suggestion to you -- if your idea is really and truly revolutionary, you should be able to find a better way to communicate it. Copying/pasting hyperbole -- going out of your way to pick fights -- is not going to help you.

It's going to be relevant soon enough because threads are a major pain in the ass.

Done right, they can be a beautiful thing. And you have yet to demonstrate that this thing you're building is any better.

BTW, it does not take that long to design and build a new processor anymore, even if it's a multicore processor. It can be done in months if you have a firm idea in mind.

You don't even have the software right now. I can't find a download link, and your forum is down, so I don't see any discussion happening. Right now, it looks pretty much like vaporware -- maybe designed to attract VCs?

Come back when you have something I can play with.

(Oh, and a GUI for programming seems like a bad idea. That might just be a gut feeling, though.)

And you're wrong about having to read all my stuff to find out that we need a whole new architecture since I mentioned in the original post that the computer industry needs to abandon Babbage's sequential model and move to a new computing model.

Which could have been talking about a new software model, just as easily.

--
Don't thank God, thank a doctor!
Re:Reinders Is Wrong: Threads Are Not the Answer by Anonymous Coward · 2008-03-25 05:39 · Score: 0

The 13 dwarfs paper is out of EECS at Berkeley. The official site and public blog is here: http://view.eecs.berkeley.edu/

In my opinion, this paper should be required reading for anyone interested in high performance computing.
Re:Reinders Is Wrong: Threads Are Not the Answer by MarkvW · 2008-03-25 07:32 · Score: 1

Do you mean the "Bob Arctor" model for handling multiple threads?
Re:Reinders Is Wrong: Threads Are Not the Answer by Anonymous Coward · 2008-03-25 23:46 · Score: 0

"He's pushing threads, not because he wants your code to run faster but because Intel's multicore CPUs are useless for non-threaded apps." - by MOBE2001 (263700) on Tuesday March 25, @04:07AM (#22854534)

That's NOT true: the process scheduler in today's modern OS families (*NIX variants like Linux and BSD/Mac OS X, and Windows NT-based OSes: NT/2000/XP/Server 2003/Vista/Server 2008) can send even a single-threaded program's one thread of execution off to the least saturated CPU core. If "CPU #0" is nearing 100% usage, the "hog app" can keep running on whichever core it is already on, while threads from other apps (even single-threaded apps) are sent to the least saturated core.

throwing more CPUs at the problem by Anonymous Coward · 2008-03-24 20:07 · Score: 1, Informative

On the surface, it seems that Japanese engineers have a history of this. Fujitsu had their dual-6809 home computer, arcade games commonly had two and sometimes three Z80s, 680x0, or whatever. Sega, in their mad rush to beat the specs of the upcoming Playstation, stuffed four off-the-shelf processors plus a few custom chips into the Saturn.

Old dogs and new tricks by Opportunist · 2008-03-24 20:11 · Score: 4, Interesting

Whether it's a good idea or not, you will have a VERY hard time convincing an old dog programmer that he should jump on the train. Think back to the time when relational models entered the database world.

Ok, few here are old enough to be able to. But take object oriented programming. I'm fairly sure a few will remember the pre-OO days. "What is that good for?" was the most neutral question you might have heard. "Bunch o' bloated bollocks for kids that can't code cleanly" is maybe more like the average comment from an old programmer.

Well, just as in those two cases, I see times when it's useful and times when you're better off with the old ways. If you NEED all the processing power a machine can offer, in time-critical applications that need as much horsepower as you can muster (e.g. games), you will pretty much have to parallelize as much as you can. But until we have rock-solid compilers that can make use of multiple cores without causing unwanted side effects (and we're far from that), you might want to stay with serial computing for tasks that needn't be finished DAMN RIGHT NOW, but have to be DAMN RIGHT, no matter what.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.

Re:Old dogs and new tricks by TuringTest · 2008-03-24 22:15 · Score: 1

Interestingly, Microsoft knows that, and they seem really interested in providing the right framework to "take advantage of the multicore architectures while solving the most common problems with concurrency".

Their Research Labs are doing a lot of good work with experimental language features, and many of them are making their way into the .Net platform.

This makes sense coming from this company, since one of their strong points always has been creating good development environments for the not-highly-specialized programmers of the world. These features take a good effort to make them very integrated into the old way of programming, and easy to use even without a "functional mindset".

--
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Re:Old dogs and new tricks by Tony+Hoyle · 2008-03-25 00:26 · Score: 1

"Bunch o' bloated bollocks for kids that can't code cleanly" is maybe more like the average comment from an old programmer.

To an extent they were right - remember at the time OO was being pushed as the ultimate answer to life, the Universe and Everything. If you read some of the more academic OO books even today they truly are "a Bunch o' bloated bollocks".

Then there was UML, Webservices, XML, Threading, Java, .NET... all have been pushed in the same way. Seems to be part of the lifecycle that technologies go through.

Of course, over time most of us realize all these things are tools to be used in the right context, and just use what we need when we need it. That's not being old; it's realizing that there's no silver bullet, and anyone who says $technology_of_the_week is going to change the world is probably talking bollocks.
Re:Old dogs and new tricks by Abcd1234 · 2008-03-25 03:38 · Score: 1

Whether it's a good idea or not, you will have a VERY hard time convincing an old dog programmer that he should jump on the train.

Yes, because parallel programming is *so* very new.

Hey, maybe it's just that the old dogs have done parallel programming in the past, and they realized something: it sucks. A lot.

That's not to say it isn't the way of the future (multicore basically requires it). But to assume that more experienced programmers being reticent about parallel programming is simply stubbornness stinks of ageism, and we've got enough of that in the industry, thank you very much.
Re:Old dogs and new tricks by SpinyNorman · 2008-03-25 04:26 · Score: 1

Ok, few here are old enough to be able to. But take object oriented programming. I'm fairly sure a few will remember the pre-OO days. "What is that good for?" was the most neutral question you might have heard. "Bunch o' bloated bollocks for kids that can't code cleanly" is maybe more like the average comment from an old programmer.

Sure, I'm an old timer (started out soldering together a 1MHz Z80 kit - NASCOM-1 - back in '78).

For those who were around pre-OO, it wasn't really a fundamental innovation, rather just a more convenient way of doing what we'd already been doing. e.g. If a design calls for polymorphism, or what you whippersnappers would call a class hierarchy, then the OO way would be to use subclasses and virtual methods, but us old timers writing in C would just use structures with pointers to functions. Of course the ease of use of C++ is much better, and it's great to have template classes and the STL, but the concepts themselves are just things that us pre-OO old fogeys already implemented ourselves in the languages available.

You can still make large apps without concurrency? by Swizec · 2008-03-24 20:12 · Score: 1

Maybe it's just because I have under 15 years of experience, hell I have under five years of real world experience. But I develop nearly everything that's large enough to benefit from concurrency so it uses it. To be perfectly honest, I even develop web applications so they use concurrency extensively. It's an awesome concept and I cannot imagine how anyone could ever live without it.

Re:You can still make large apps without concurren by BadAnalogyGuy · 2008-03-24 20:28 · Score: 1

Maybe it's just because I have...under five years of real world experience. But I develop nearly everything...(with) concurrency.

When your only tool is a hammer...

Optimisation by kramulous · 2008-03-24 20:29 · Score: 1

There is no substitute for good old optimisation. Efficient memory usage, looping techniques and a little thought towards branching can go a long way. That, and batch processing on multiple cores takes a fair bit of beating when the clock is ticking. I realise that there are still some *big* gains from finer-grained parallelisation, but I see a lot of coding behaviors that would never have been used ten years ago.

--
.

Re:Optimisation by mrjb · 2008-03-24 20:38 · Score: 1

There is no substitute for good old optimisation. ...said the Traveling Salesman.

--
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Re:Optimisation by Antique+Geekmeister · 2008-03-24 21:20 · Score: 1

Well, yes. You certainly can't optimize *everything* to its theoretical maximum. But a bit of attention to making the Traveling Salesman's bag lighter, getting through airport security faster, and being able to phone ahead to change his reservations can go a long way towards improving the profitability of his trip.

Parallel tools are still pretty weak by Goalie_Ca · 2008-03-24 20:29 · Score: 2, Informative

I'm writing a scientific library that is supposed to scale for many many cores. Using a mutex lock is not an option. Unfortunately right now I am spending all my time trying to figure out how to get compare and swap working on all the different platforms. I am saddened to see the lack of support since this is such a fundamental operation. Also, the whole 32 vs 64-bit thing adds more pain because of pointer size.

--

----
Go canucks, habs, and sens!

Re:Parallel tools are still pretty weak by nten · 2008-03-24 22:24 · Score: 2, Interesting

As you say atomic pointer swaps are often not an option. And if you have to rule out mutex locks for timing or some other reason, I think you may be down to functional programming. If you eliminate objects with state and just pass everything by value (make copies on the calling thread to hand to the called), it can solve the problem, but it can cost a load in memory access times. Wiki the buzzphrase "referential transparency"
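
The pass-by-value, no-shared-state style can be sketched in Python. This assumes the work is a row-wise transformation; `scale` and `scale_all` are hypothetical names. The point is that each worker gets its own immutable copy and returns new data instead of mutating anything, which is what makes locks unnecessary, at the cost of the extra copying mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(row, factor):
    """Referentially transparent: output depends only on inputs; nothing is mutated."""
    return tuple(v * factor for v in row)


def scale_all(rows, factor, workers=4):
    # Hand each worker its own immutable copy (a tuple), so no worker can see
    # or disturb another's data. The copies are the memory cost of this style.
    copies = [tuple(r) for r in rows]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scale, copies, [factor] * len(copies)))
```

The caller's original rows come back untouched, so no other thread can ever observe a half-updated row.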

--
refactor the law, its bloated, confusing and unmaintainable.
Re:Parallel tools are still pretty weak by Hal_Porter · 2008-03-24 22:41 · Score: 1

If you want a compare and exchange, just use inline assembler.

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Re:Parallel tools are still pretty weak by Tony+Hoyle · 2008-03-25 00:31 · Score: 1

Firstly that's a nightmare to handle multiplatform - have you any idea how many processor variations Unix runs on?

Secondly even assembler level compare and exchange isn't always atomic. It depends on the CPU. Intel (IIRC) has the lock prefix that is able to do that.

There are strategies you can use to handle non-atomic assembler instructions, and it's best to employ these where possible.
Re:Parallel tools are still pretty weak by Hal_Porter · 2008-03-25 01:01 · Score: 1

Abstract it. Write a version of the library that uses a semaphore to protect access to variables on most systems. On the ones you can test on, use lock cmpxchg or the local equivalent instead. So the code will still be portable, just faster on the ones with the assembler. Hopefully you can test on x86 and x64 so your code will be fast on those. Other people can contribute assembler for other architectures if they use them.
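
The "abstract it" advice can be sketched as one compare-and-swap interface with a portable lock-based fallback behind it. Python exposes no hardware CAS, so only the fallback is shown here; `AtomicCell`, `compare_and_swap` and `atomic_add` are hypothetical names. A fast path on x86 would wrap `lock cmpxchg` (or InterlockedCompareExchange on Win32) behind the same method, and callers such as `atomic_add` would not change.

```python
import threading


class AtomicCell:
    """Portable fallback: compare-and-swap emulated with a lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value equals `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_add(cell, delta):
    """Classic CAS retry loop: unchanged whichever CAS backend is plugged in."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old + delta
```

The retry loop is the part that carries over to a real lock-free build; only the body of `compare_and_swap` is platform specific.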

Actually some OSs do provide a portable version of lock cmpxchg

http://msdn2.microsoft.com/en-us/library/ms683590(VS.85).aspx

http://msdn2.microsoft.com/en-us/magazine/cc302329.aspx

The problem is it's not part of Posix.

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Re:Parallel tools are still pretty weak by kfhickel · 2008-03-25 01:13 · Score: 1

Or, use a library that already exists with tons of man-hours behind it: The ADAPTIVE Communication Environment (ACE(TM)).
IIRC the latest version supports atomic operations on several platforms (including Windows, AIX, Solaris and Linux, I believe). Commercial support agreements are also available, if you need that sort of thing.
Re:Parallel tools are still pretty weak by pingbak · 2008-03-25 13:04 · Score: 1

Have you looked at ways to implement load linked, store conditional (LL/SC)? Even though you'll need a layer for the various types of CAS implementations, you might be better off writing the recovery code when you find that you're out of sync. Just a thought...
Re:Parallel tools are still pretty weak by Anonymous Coward · 2008-03-25 19:03 · Score: 0

Then get smart and drop C/C++ (which it sounds like you are using) and use Java instead.

You realize--I hope--that you will almost certainly get HIGHER numerical performance from Java than C/C++ if you are writing your code elegantly (that is, as a clean implementation of the math). You only get better numerical performance--for now--from C/C++ if your algorithm is amenable to low level (usually CPU-specific) optimizations.

Prime example of this for numerical code is the vector processing units of modern CPUs: all mainstream JVMs that I am aware of do not use this like they potentially could (if they even use it at all).

In memory-intensive apps, C++ can also do stack allocation of objects, which JVMs likewise do not do at the moment. But the next release of Sun's JDK is supposed to recognize opportunities for stack allocation (or even better: object explosion directly into registers, something that C++ will never be able to do). And future JVMs will likely exploit vector units better too. The result: if you write clean elegant maintainable Java now, you will automatically pick up the better performance in the future. Without ugly code. Which is the way it ought to be.

I'm surprised Europe is down. by jd · 2008-03-24 20:47 · Score: 1

They did most of the early work on which almost all modern parallelism is based. America has tended to follow the notion of faster sequential processing, as the cost savings were not a major factor for universities who could afford big, bright labels. And that's all it's been - playing at which kid gets the toy with the biggest label. Not all, by any means, but a lot. Beowulf clusters, PVM, OpenMPI, Unified Parallel C, Charm++, Cilk, Erlang, Myrinet, Infiniband, RDMA, iWARP... All developed in America and all ideally suited to clusters rather than sequential or "big iron" processing. So even though they have never held the limelight, they have achieved a lot. In Europe, Occam was - and is - the major rival. Pi-Occam provides individual programs the kind of power you'd normally use Mobile IP, MOSIX and Globus to achieve (albeit a lot slower than you'd get with Occam).

So to discover that other nations have just walked by and left the EU and US in the dust is a little annoying, but let's face it. Their resources were kept minimal by the west, so it's no surprise they learned the golden rule of Boolean logic: waste not, want nots. C'mon, don't pretend we didn't expect this. OpenMP wasn't developed for the fun of it. Parallel has been on the way in for 25+ years. 25 years of Moore's Law applying to their work. You can catch up if you like. It will mean less visible glory, but it would mean doing something real.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:I'm surprised Europe is down. by Hal_Porter · 2008-03-24 22:38 · Score: 1

Look at what people learn at University though. Lots of Java and Posix. Posix has pthreads that I've never used but the consensus from Unix people seems to be that multi threading is too hard to bother with. They also are told that the Windows API is a hack (true) and too hard to bother with (not true). Unix kernels traditionally have been preemptive. Java takes away all the details of locking from the programmer. If all you're exposed to is Posix and Java, I can see that multi threading seems like black magic because you don't really have any feel for how processors actually execute code.

Microsoft technologies are unfashionable and are taught at university as examples of how not to do things. Someone from a major Asian electronics manufacturer told me that they would do Windows Mobile smartphones for Asia and a Symbian based smartphone for Europe, because "Europeans don't like Microsoft for emotional reasons". Now in Asia and Russia quite a lot of people learn Win32 because it is so common and they see it as a useful skill to have.

Now Windows NT may have had its flaws but it was designed from the start to be SMP friendly. Everything in the kernel is preemptive and locking is extremely fine grained - any object in the driver needs to be locked since the IO model is designed to pipeline across processors under heavy load. And usermode Win32 has excellent support for threading. It even has a C version of x86's lock cmpxchg, InterlockedCompareExchange. Windows historically has tended to be a mass of subtle hacks that require an assembly background to appreciate. My guess is CS teachers think that this will damage young programmers, but I think that is completely wrong. If you understand why they worked, you know a lot more about how processors really operate than someone who's only ever written high level C. And that is helpful if you want to write multithreaded code. Or for that matter code on some idiosyncratic or just plain limited embedded system.

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;

Parallel programming has been with us of years! by supersnail · 2008-03-24 20:52 · Score: 3, Insightful

One of the reasons more seasoned programmers are not particularly interested is that in most cases someone else has already done the hard work.

Want to serve multiple users on multiple CPUs with your web pages? Then write a single-threaded program and let Apache handle the parallelism. Same goes for JavaEE, database triggers, etc. etc. going all the way back to good old CICS and COBOL.

It is very rare that you actually need to do parallel programming yourself. Either you are doing some TP-monitor-like program which schedules tasks other people have written, in which case you should use C and POSIX threads (anything else will land you in deep sewage), or you are doing serious science stuff, in which case there are several "standard" Fortran libraries to do multithreaded matrix math -- but if the workload is "serious" you should be looking at clustering anyway.

--
Old COBOL programmers never die. They just code in C.

Re:Parallel programming has been with us of years! by master_p · 2008-03-24 23:57 · Score: 2, Interesting

It all depends on where you stand. If your bread and butter is business applications, then indeed it's rare to have to deal with threads. But for almost everything else, threading is a necessity. I am a programmer in the field of defense applications, and I have never worked on a single-threaded application. And if you look around, almost all applications have some sort of parallelism (examples: Word when re-paginating long documents, Firefox with multiple tabs, Excel running long computations, Torrent clients downloading multiple torrents, etc).

Threading through library, and other things by dragisha · 2008-03-24 21:03 · Score: 1

Slightly off-topic: Hans Boehm had his say in 2006, but the author, who promotes his own book/project using it as a base for his "research", is obviously one of the opponents of these claims.

The adoption problem, IMnsHO, arises because there is real friction between the usual C* approaches and threading. Experienced programmers (as other posters claim) are used to their ways and easily discouraged by this friction. Java's threading approach is a bit of a pain in the ass too, so this leaves the big majority of programmers out in the cold when it comes to threading.

Threading goes against the usual development style in the US, where problems are attacked with workforce, not with any kind of optimized approach. Or original. "Why experiment, when we make tons of money old-style all these years?"

dd

--
http://opencm3.net, http://www.nongnu.org/gm2/

Not So Great by yams · 2008-03-24 21:06 · Score: 4, Insightful

Been there, done that. Good from far, but far from good.

As an engineer straight out of college, I was very interested in parallel programming. In fact, we were doing a project on parallel databases. My take is that it sounds very appealing, but once you dig deeper, you realise that there are too many gotchas.

Considering the typical work-time problem, let's say a piece of work takes n seconds to complete by 1 processor. If there are m processors, the work gets completed in n/m seconds. Unless the parallel system can somehow do better than this, it is usually not worth the effort. If the work is perfectly divisible between m processors, then why have a parallel system? Why not a distributed system (like beowulf, etc.)?

If it is not perfectly distributable, the code can get really complicated. Although it might be very interesting to solve mathematically, it is not worth the effort, if the benefit is only 'm'. This is because, as per Moore's law, the speed of the processor will catch up in k*log m years. So, in k*log m years, you will be left with an unmaintainable piece of code which will be running as fast as a serial program running on more modern hardware.
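
The "k*log m years" figure is just doubling arithmetic: if serial speed doubles every k years, an m-fold head start is erased after k·log2(m) years. A tiny Python check, assuming k is the doubling period in years and reading "log" as base 2 (note that a later reply disputes whether clock speed still follows this curve at all):

```python
import math


def catch_up_years(m, doubling_years=2.0):
    """Years until a serial machine is m times faster, assuming its speed
    doubles every `doubling_years` years: k * log2(m)."""
    return doubling_years * math.log2(m)
```

For example, with k = 2 an 8-processor speedup is matched after 2 * 3 = 6 years.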

If the parallel system increases the speed factor better than 'm', such as by k^m, the solution is viable. However, there aren't many problems that have such a dramatic improvement.

What may be interesting are algorithms that take serial algorithms and parallelise them. However, most thread scheduling implementations already do this (old shell scripts can also utilise parallel processing using these techniques). Today's emphasis is more on writing simple code that requires less maintenance than on linear performance increases.

The only other economic benefit I can think of is economy of volumes. If you can get 4GHz of processing power for 1.5 times the cost of a 2GHz processor, it might be worth thinking about it.

Re:Not So Great by darkfire5252 · 2008-03-24 22:53 · Score: 1

Considering the typical work-time problem, let's say a piece of work takes n seconds to complete by 1 processor. If there are m processors, the work gets completed in n/m seconds. Unless the parallel system can somehow do better than this, it is usually not worth the effort. If the work is perfectly divisible between m processors, then why have a parallel system? Why not a distributed system (like beowulf, etc.)?

Wait, what? How is this insightful? If you have a piece of work that can be completed faster by using multiple processors concurrently... why, that sounds an awful lot like parallel processing! Also, to say that a problem that takes N seconds to complete with 1 processor takes N/M seconds to complete with M processors is just plain wrong. N/M seconds would be the amount of time taken by a theoretically 'perfect' setup; it cannot be achieved in practice (I am fairly certain, can someone correct me if I'm wrong?).

If it is not perfectly distributable, the code can get really complicated. ...

Welcome to the discussion. That's the central problem of parallel (or distributed, for that matter) systems. Did you major in a non-tech field like civil engineering, or did you just not pay attention?
Re:Not So Great by TERdON · 2008-03-25 00:04 · Score: 1

N/M seconds would be amount of time taken by a theoretically 'perfect' setup; it cannot be achieved in practice (I am fairly certain, can someone correct me if I'm wrong?).

You're wrong. It can be achieved (in particular cases). It can even sometimes be surpassed - but then it's because of some superlinear effect - like the work being exactly the right size to not fit into cache or RAM on one processor, but being small enough to fit when distributed, so when distribution is used the cache is effectively used or thrashing avoided. Theoretically, the limit shouldn't be surpassable (and it is very hard to actually reach).

--
I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
Re:Not So Great by darkfire5252 · 2008-03-25 00:08 · Score: 1

It can be achieved (in particular cases). It can even sometimes be surpassed [...] Theoretically, the limit shouldn't be surpassable (and very hard to actually reach).

I was with you until that, what do you mean?
Re:Not So Great by laci · 2008-03-25 00:23 · Score: 1

> If the parallel system increases the speed factor better than 'm', such as by k^m, the solution is viable. However, there aren't many problems that have such a dramtic improvement.

Provided that your original serial code is the fastest possible, there aren't *any* problems that can have such dramatic improvement. After all, you can emulate the m processors with time slicing on a single processor, so if the speedup could be k^m, you could execute the code in a serial environment in m*n/k^m time, proving that your original serial code was not the best serial code possible.
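
Spelled out, with n the best serial time, m processors, S the claimed speedup and T_p = n/S the parallel time, the emulation argument is a one-line bound (ignoring the cost of switching between the emulated processors):

$$
T_{\text{emulated}} \;\le\; m\,T_p \;=\; \frac{m\,n}{S},
\qquad
\frac{m\,n}{S} < n \iff S > m .
$$

So any speedup above m, including k^m whenever k^m > m, would give a serial program faster than n. The hidden premise is that one processor can emulate the others at no extra cost per step; superlinear cache effects are exactly where that premise fails.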

--Laci
Re:Not So Great by master_p · 2008-03-25 00:41 · Score: 1

What you say is true, but only on the surface. If you dig deeper, you will see that many programs contain plenty of opportunities for parallelization.

One algorithm that can be greatly parallelized is searching, which is a very frequent operation in all sorts of applications. Instead of searching all data, finding the result and then applying the required job on it, in a parallel environment the job can be executed as soon as one of the threads finds the result; which means speeding up executions in tree and hash maps, in sets and other data structures, with an almost linear increase in performance with the number of cores.
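
The early-exit search can be sketched in Python, assuming the data is an indexable sequence and the predicate has no side effects. `parallel_find` is a hypothetical helper, not a library call. Each worker scans its own slice and checks a shared `threading.Event`, so the other workers stop once any match is found. With CPython threads this illustrates the coordination, not the near-linear speedup claimed above, which needs truly parallel execution.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_find(data, pred, workers=4):
    """Return the index of some item satisfying pred, or None.

    Each worker scans its own slice and quits early once any worker has found
    a match. If several items match, which index is returned is not fixed.
    """
    found = threading.Event()
    size = max(1, -(-len(data) // workers))  # ceiling division, never zero

    def scan(start):
        for i in range(start, min(start + size, len(data))):
            if found.is_set():      # someone else already succeeded
                return None
            if pred(data[i]):
                found.set()         # tell the other workers to stop
                return i
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = [r for r in pool.map(scan, range(0, len(data), size)) if r is not None]
    return hits[0] if hits else None
```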
Re:Not So Great by TERdON · 2008-03-25 00:59 · Score: 1

Ok here's a try at an explanation:

You're theoretically correct. It shouldn't get more than m times faster if m processors are used. Yay!

However, there is a limitation - the theories generally assume that processing is made at the same speed if it is done by a single processor, or divided by several processors. Due to the hardware architecture in modern processors - especially how cache is constructed - this assumption is not the reality, it's just an approximation.

To take a concrete example: One task takes 8 seconds to run on a single node, and uses 4 MB of memory during that time. Assuming the processor has 1 MB of cache, this will clearly be slow due to many memory accesses going to the main memory, not to the on-chip cache which is far faster.

If the work is split, say across 8 machines, theoretically it shouldn't be possible to do it faster than 1 s. However, depending on the problem, not all processors may need to access all data. Assuming each of the 8 processors only needs to access 1 MB of data (which would just barely fit into the cache), the execution will be a bit faster because all memory accesses can go directly to the cache. Perhaps each processor will finish in just 0.8 s (instead of the theoretical 1 s), giving a speedup of 10, surpassing the theoretical maximum of 8.

This is very problem-dependent, though. Sometimes all the processors will need to access all the memory even if the work can be split. Further, the typical case is that the speedup is lower than the theoretical maximum, due to inefficiencies: data has to be distributed before processing can start, sometimes there are dependencies between different parts of the algorithm forcing different processors to wait on each other, etcetera. Finally, for each algorithm and problem size, a "speedup with infinite processors" can be defined, which marks an upper bound on the possible speedup regardless of how many processors are used.
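
These numbers, and the "speedup with infinite processors" bound, can be checked with a few lines of Python. The bound is usually stated as Amdahl's law; `serial_fraction` (the share of the work that cannot be split) is a parameter of that model, not something given in the post. The law also assumes per-processor speed stays the same when the work is split, which is exactly the assumption the cache example breaks.

```python
def speedup(t_serial, t_parallel):
    """Observed speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def amdahl_speedup(processors, serial_fraction):
    """Amdahl's law: best case when `serial_fraction` of the work can't be split."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def amdahl_limit(serial_fraction):
    """The 'speedup with infinite processors' bound: 1 / serial_fraction."""
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction
```

Here `speedup(8, 0.8)` gives about 10 for the cache example, beating the ideal 8. With a 10% serial share, 8 processors give at most about 4.7x, and no number of processors gets past 10x.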

If this is a topic that interests you, more information can be found at the course webpage for the parallel algorithms course that I took a couple of years ago.

--
I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
Re:Not So Great by Anonymous Coward · 2008-03-25 03:58 · Score: 0

I see something wrong with your whole premise. You are assuming that CPU speed is following Moore's Law, which it never really was. Repeat after me: transistor count doubles every 24 months. It just so happened that in the past, doubling the amount of transistors meant smaller transistors which translated into faster possible frequency (smaller transistors means they can switch faster, i.e. frequency). Now however, increasing the frequency runs into a lot of the side-effects of making transistors small (current leakage, cross-talk, etc) and running at a high frequency (heat, power, interference etc).

So you're still doubling the transistors but the frequency is constant. What do you do? Well part of it is increase cache sizes a lot. But even then, that is diminishing returns (up until the point where cache can somehow be as big as system memory, but that'll never happen). So they put more cores on because they can, and because it truly does lead to a more responsive system (especially if programmers properly separate the UI from the backend).

So let's go over the math in your post.

Considering the typical work-time problem, let's say a piece of work takes n seconds to complete by 1 processor. If there are m processors, the work gets completed in n/m seconds. Unless the parallel system can somehow do better than this, it is usually not worth the effort.
Actually, m processors will get the work done in significantly more than n/m seconds. And there is still a benefit, for instance, if you can get a 20% improvement when every data set takes days to process.

If the work is perfectly divisible between m processors, then why have a parallel system? Why not a distributed system (like beowulf, etc.)?
Because it takes a lot more work, and is a much more difficult problem, to split the work up across machines in a scalable manner. Map-reduce helps here. Another reason is that sometimes the app will only run on 1 machine, so the added work of splitting it up across machines is wasted.
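To give a flavour of the map-reduce structure, here is a minimal word-count sketch in Python. Threads from `concurrent.futures` stand in for separate machines (CPython threads will not speed up this CPU-bound toy anyway); in a real deployment each chunk would be shipped to a different node. The round-robin chunking and the function names are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map phase: count words in one chunk, independently of all others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(lines, workers=4):
    """Split input into `workers` chunks, map them concurrently, then reduce."""
    chunks = [lines[i::workers] for i in range(workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce phase: merge the partial counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


print(word_count(["a b", "b c", "c c"], workers=2))  # Counter({'c': 3, 'b': 2, 'a': 1})
```

The map step never touches shared state, which is what makes it easy to move across machines; only the reduce step has to see every partial result.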

This is because, as per Moore's law, the speed of the processor will catch up in k*log m years. So, in k*log m years, you will be left with an unmaintainable piece of code which will be running as fast as a serial program running on more modern hardware.
Not true - please refer to the above. Processor speeds do not follow Moore's law - transistor counts do.
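For reference, the parent's k*log m figure comes from assuming single-processor speed doubles every k years, so matching an m-fold parallel speedup takes log2(m) doublings. A tiny Python sketch of that arithmetic, under the parent's assumption (which this reply disputes); the 1.5-year doubling period is only an example value:

```python
import math


def years_to_catch_up(m: float, k: float) -> float:
    """Years until one processor matches an m-way speedup, ASSUMING
    single-processor speed doubles every k years (the parent's premise)."""
    return k * math.log2(m)


# An 8-way speedup with an assumed 1.5-year doubling period.
print(years_to_catch_up(8, 1.5))  # 4.5
```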

If the parallel system increases the speed factor better than 'm',
Highly unlikely - Amdahl's law again.

What may be interesting are algorithms that take serial algorithms and parallelise them
Impossible, because if you were able to parallelise serial code programmatically, then there would be no difference between the two. Note that your thread scheduling is not parallelising serialized code - it's scheduling when to run sections of parallel code (i.e. two serial programs running concurrently are parallel to each other).

Today's emphasis is on writing simple code that will require less maintenance, than on linear performance increase.
As it should be - maintenance should almost always trump performance (you optimize at the end of a release when your code is fairly maintainable and works). However, multi-threading is an architectural choice. Introducing it later adds more work than introducing it earlier - however, introducing it earlier can make it harder to debug.

If you can get 4GHz of processing power for 1.5 times the cost of a 2GHz processor, it might be worth thinking about it.
Please tell me when you can get your 4GHz processor. For comparison: AMD was the first to hit the 1 GHz mark with the A
Re:Not So Great by dkf · 2008-03-25 04:32 · Score: 1

It can be achieved (in particular cases). It can even sometimes be surpassed [...] Theoretically, the limit shouldn't be surpassable (and very hard to actually reach).
I was with you until that, what do you mean?

He means that the limit should be impossible to better, and definitely is very difficult to reach with real code anyway. (AIUI, you have to perform problem-specific tuning, but that's hardly ever worth it; few people have problems where that level of developer effort is justified by the resulting time saving, as it's usually cheaper just to let the program run a bit longer.)

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Not So Great by Anonymous Coward · 2008-03-25 05:38 · Score: 0

> Considering the typical work-time problem, let's say a piece of work takes n seconds to complete by 1 processor. If there are m processors, the work gets completed in n/m seconds. Unless the parallel system can somehow do better than this, it is usually not worth the effort.

That is exactly the speed-up people dream of. Typically you get less. (Occasionally you can get more...)

> The only other economic benefit I can think of is economy of volumes. If you can get 4GHz of processing power for 1.5 times the cost of a 2GHz processor, it might be worth thinking about it.

Have you been looking at prices? That is the current economic situation.
Re:Not So Great by Anonymous Coward · 2008-03-25 16:40 · Score: 0

This is because, as per Moore's law, the speed of the processor will catch up in k*log m years. So, in k*log m years, you will be left with an unmaintainable piece of code which will be running as fast as a serial program running on more modern hardware.

Uhh, I thought Moore's law's relation to performance scaling would have been clear by now. But just in case, see http://stanford-online.stanford.edu/courses/ee380/070131-ee380-300.asx
Re:Not So Great by Anonymous Coward · 2008-03-29 12:25 · Score: 0

If it is not perfectly distributable, the code can get really complicated. Although it might be very interesting to solve mathematically, it is not worth the effort, if the benefit is only 'm'. This is because, as per Moore's law, the speed of the processor will catch up in k*log m years. So, in k*log m years, you will be left with an unmaintainable piece of code which will be running as fast as a serial program running on more modern hardware.
Have you spent the last few years in a cave or something? Single-core CPU speeds, whether measured in GHz or in MIPS, basically stopped increasing several years ago*. Moore's Law now works through the wonderful magic of multi-core processors. Moore's Law no longer saves you from having to write a parallel application; on the contrary, Moore's Law is now what forces you to write them.

* This is not strictly true, but we are so far below the traditional 18-month doubling curve that the increase is effectively zero.

Well, I'm nearly 60, looking at Erlang by Kupfernigk · 2008-03-24 21:06 · Score: 1

And I am very actively considering whether we need to redesign core parts of our systems in Erlang, largely as a result of looking at the ejabberd XMPP server. My suspicion is that what we are seeing here may be more to do with the kind of work being done in different regions and by people at different experience levels. Work in the US and Europe is perhaps more likely to be user-facing, older programmers more likely to be developing end-user applications or working at the architectural level. Concurrency is more likely to be needed for back office, technical or distributed systems which are increasingly being designed outside the traditional areas.

--
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."

Parent is first reply that gets it... by Anonymous Coward · 2008-03-24 21:11 · Score: 2, Insightful

It's only been twelve years since I entered the workforce in the US, but I have been studying parallel programming for almost 15 years (three years in university).

The future isn't "multi-threaded" unless you count SPMD, because architecturally the notion of coherent shared memory is always going to be an expensive crutch. Real high-performance stuff will continue to work with distributed, local memory pools and fast inter-node communication... whether the nodes are all on chip, on bus, in the box, in the datacenter, etc.

As they have been since the 80s at least, many CS researchers will be trying to find the holy grail of programming models and tools to automatically parallelize larger classes of algorithms for naive programmers. And in the meantime, just as we have since the 80s at least, programmers will often go to bare metal and hand-optimize important libraries and applications that are too important to leave to the vagaries of the immature tools.

If there are fewer programmers in the US or Europe who worry about parallelism, it is only because the economy is such that they can still satisfy their customers without it. There are many parallel programmers here too, and maybe there isn't a pressing need for even more because their work is being reused. I'm not sure what sort of statistical analysis you should use to determine prevalence of parallel systems usage, but I am pretty sure it is not by counting programmer interest. How many deployed systems are there? How many CPU-hours of parallel work are being done? What fraction of IT budgets are supporting production use of parallel systems?...

To be frank, I think the vast majority of computer cycles are spent executing code written by a small minority of the programmers in the world. These are the ones that matter, as far as optimizing for parallel environments. Sorry if that sounds elitist, but it's a bit like trying to analyze the distribution of race car drivers by surveying the interests of all licensed motorists.

Re:Parent is first reply that gets it... by Anonymous Coward · 2008-03-24 21:28 · Score: 1, Insightful

I think you've nailed it. There are no stats quoted in the article. The US still has more than 50% of the world's top500 machines (and I wonder what that figure rises to if you were to measure the top10000?) and is this Dobb's Code Talk telling me that nobody is writing code specifically for those machines? Bullshit. I'm not an American, nor have I been to the US, but I bet the parallel programmers are far more numerous there than here.
Re:Parent is first reply that gets it... by jd · 2008-03-24 22:38 · Score: 1
You don't sound elitist at all. If anything, a little generous. When people multithread and those threads use their own data and own logic, you have a MIMD solution. If they're trivial threads, you may be ok. Otherwise, the number of people who can understand all of the synchronisation and timing issues involved in MIMD... well, pick a number, divide it by itself, subtract 1.
(This is why most people stick with SIMD for local computing and MISD for grids.)
Those using MPI for parallelization are probably (hopefully) aware of PAPI, TAU, KOJAK and DAKOTA. The first two supply accurate timings to profiling software, the other two are profiling packages for MPI environments.
Unfortunately, it is limited to MPI. If you use PVM or BSP, I know of nothing similar. Likewise, if you do a remote memory-to-memory copy (skipping the CPU), profiling is of limited use. And as most MPIs use sequential operations to do collective calls, any idiot could replace an MPI collective send with a reliable multicast. Infiniband and iWarp support it. I doubt the profilers do.
This means you can't rely on high-school techniques. You've got to plan what happens, when, and how that is to be communicated.
Ah, yes, communication. A simple cluster will work fine with a simple star topology. As the cluster grows, the bandwidth requirements rise exponentially and you are likely to be forced to go with one of the Big Three:
- Fat Tree (masses of hotspots, prone to failure, but cheap)
- Butterfly (medium risk of failure if the software is imbalanced, and almost any MPI program isn't just unbalanced but stark raving)
- Hypercube (low risk of failure, very popular with researchers, very stable, very long latencies and very high price)
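As a rough illustration of why the hypercube's price climbs, the Python sketch below just counts cables. It assumes a star means n nodes each wired to one central switch, and a hypercube has n = 2**d nodes, each linked to the d nodes whose address differs in exactly one bit. It models cable count and worst-case hop count only, not the failure or latency behaviour described above.

```python
def star_links(n):
    """Star: n nodes, each cabled to one central switch."""
    return n


def hypercube_dimension(n):
    """d such that n == 2**d; worst-case hop count in the hypercube."""
    if n < 1 or n & (n - 1):
        raise ValueError("a hypercube needs a power-of-two node count")
    return n.bit_length() - 1


def hypercube_links(n):
    """Each of the n nodes has d links; every link is shared by two nodes."""
    return n * hypercube_dimension(n) // 2


def hypercube_neighbours(node, n):
    """Neighbours differ from `node` in exactly one address bit."""
    return [node ^ (1 << k) for k in range(hypercube_dimension(n))]


for n in (8, 64, 1024):
    print(n, star_links(n), hypercube_links(n), hypercube_dimension(n))
# 8 8 12 3
# 64 64 192 6
# 1024 1024 5120 10
```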
You also need to weigh network performance and the price you pay for that performance:
- Dolphinics: 2.7 µs/512 B. Regular sockets, SCI sockets. Max: 346 MB/s. Effective: LAN, non-routable
- Infiniband: 5.0 µs/512 B. CPU bypass, sockets, verbs. Max: 12 GB/s. Effective: LAN, WAN
- Myrinet: 3.0 µs/512 B. Myrinet API. Max: 494 MB/s. Effective: LAN, WAN
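A simple way to compare those figures is the usual latency-plus-bandwidth model, time = latency + bytes / bandwidth. The Python sketch below plugs in the numbers from the list, treating each quoted 512-byte figure as a fixed per-message latency and "MB" as 10**6 bytes; both are simplifying assumptions, so the outputs are ballpark only.

```python
# (latency in seconds, peak bandwidth in bytes/second), from the list above
INTERCONNECTS = {
    "Dolphinics": (2.7e-6, 346e6),
    "Infiniband": (5.0e-6, 12e9),
    "Myrinet": (3.0e-6, 494e6),
}


def transfer_time(nbytes, latency, bandwidth):
    """Latency-bandwidth estimate of one message's transfer time, in seconds."""
    return latency + nbytes / bandwidth


for name, (latency, bandwidth) in INTERCONNECTS.items():
    small = transfer_time(512, latency, bandwidth) * 1e6
    large = transfer_time(1e6, latency, bandwidth) * 1e6
    print(f"{name:10s} 512 B: {small:6.1f} us   1 MB: {large:8.1f} us")
```

Under these assumptions Infiniband comes out slowest of the three for a 512-byte message but fastest by a wide margin for a 1 MB one: latency dominates small messages, bandwidth dominates large ones.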
Now most poor research centers can't afford a hypercube topology with 12 GB/s links to machines capable of digesting that much data. So, instead, they learn to program efficiently, much as the old assembly programmers (like myself) had to do with sparse resources. If you want to get ahead, keep the poverty mindset but exploit the raw computing power before you.
--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Maybe it's because... by Lorean · 2008-03-24 21:11 · Score: 1

Seasoned programmers understand what a pain in the ass it is, and hope to make their career exit before it takes over?

Verses Programmers? by g_vernon · 2008-03-24 21:29 · Score: 1

Aren't Verses Programmers generally referred to as Composers?

Hardware cost vs education by AHuxley · 2008-03-24 21:42 · Score: 1

What's left in the USA?
The great 1960's and 70's engineering generations?
The DoD Ada people?

You now have generation wintel.
Dumbed-down, corporatized C coders.
One useless OS on a one or 4 core chip.
When game makers and GPU makers want to keep you in the past due to the lack of developer skill, you know you have problems.

The rest of the world may not have the best hardware, but they do try to educate the next generation.

--
Domestic spying is now "Benign Information Gathering"

Re:Hardware cost vs education by Nursie · 2008-03-24 23:12 · Score: 1

Hey, I'm a "Dumbed-down, corporatized C coder" and I've been working on threaded, parallel and distributed systems for the last 8 years!

Maybe I'm not that dumbed down, but we are out there and I don't think I'm particularly special...

Is this not SMP gone unto the die by dmurphy_58 · 2008-03-24 21:43 · Score: 1

Parallel programming here seems to be SMP redux, except that the cores are now tightly coupled; anybody who has worked on back-end servers with threading since the mid-90s should be able to handle this with little difficulty (those guys should have 15+ years or close to it). It's the apps programmers who are going to have to learn a new way to do things. The interesting part is how a large number of cores is going to deal with each getting a much smaller share of the cache; optimization, cache-line sharing, etc. are going to be much more difficult.

--
dmurphy_58@yahoo.com

Parallel is really good. Parallel is really good. by BlackSabbath · 2008-03-24 21:47 · Score: 0, Offtopic

I agree with the article.

I agree with the article too.

I agree with the article.

I don't

I agree with the article.

agree with the article.

Makes perfect sense by Per+Abrahamsen · 2008-03-24 22:03 · Score: 1

As you grow older, you are more likely to focus on capitalizing on your current skills ("get things done") than on learning new skills. It makes perfect sense from an economic point of view: there are fewer productive years left in which to get a return on any investment made in increasing your productivity.

I don't believe it usually is a conscious economic calculation; it is simply built into our brains (with the usual "bell curve" caveat).

If I was 20 again, I would definitely think multi-threading, rather than the client-server approach to multiprocessing which was in vogue at the time.

As it is now, I learn just about enough multi-threading as needed to get my work done. That is, I had to make the GUI a separate thread in order to make the user experience tolerable, but making the actual number crunching multi-threaded will have to wait until there is funding.
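The GUI-plus-worker split usually looks something like the Python sketch below: the long computation runs on a background thread and hands its result back through a thread-safe queue, while the UI loop keeps polling so it never blocks. The polling loop is a hypothetical stand-in for a real toolkit's event loop; real toolkits have their own way to marshal results back to the GUI thread.

```python
import queue
import threading


def crunch_numbers(n, results):
    """Worker thread: the long-running job, kept off the UI thread."""
    results.put(sum(i * i for i in range(n)))


def ui_loop(n):
    """Hypothetical UI loop: stays responsive while the worker runs."""
    results = queue.Queue()
    threading.Thread(target=crunch_numbers, args=(n, results), daemon=True).start()
    frames = 0
    while True:
        try:
            # Wait at most 10 ms for the result, then go back to "repainting".
            return results.get(timeout=0.01), frames
        except queue.Empty:
            frames += 1  # stand-in for repainting and handling user input


total, frames = ui_loop(100_000)
print(total, frames)
```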

Re:You can still make large apps without concurren by jamesh · 2008-03-24 22:08 · Score: 4, Funny

When your only tool is a hammer...

... everything looks like a skull.

Old farts... by Anonymous Coward · 2008-03-24 22:13 · Score: 0

Old programmers don't want to learn new things -- trust the tried and true.

Young bucks want to be on the cutting edge to get the jobs that the old people already have.

I have seen cases where older programmers can be infuriatingly reluctant to learn new things, to the point where they are up in arms about things like adopting CVS/SVN, bug tracking systems, Web Services, or XML. People like that can actually obstruct a company's efforts to modernize. The only solution is to fire them, but all too often that simply isn't an option, because they have worked themselves into key positions with the systems they have built and cannot be removed from the equation without severe costs to the company.

I have also seen older people explaining to young bucks why a certified distro like Red Hat ES, rather than the young bucks' own favorite distro like, say, Gentoo, is the best pick for an Oracle server's OS (let's not even go into the debate about why to use Oracle and not MySQL). Another good example is the older/younger debate about why simple, less optimal solutions are sometimes better than more complex and elegant ones, because building a complex solution just isn't cost-effective in that particular situation. There sometimes tends to be a total disconnect between younger and older developers.

I don't think all older programmers are reluctant to learn new things; they just tend to be conservative about using things like threading, which can lead to difficult bugs, to solve tasks where simpler solutions will do without much additional overhead. I have seen a number of young and inexperienced programmers 'discover' threading and go to town with it. They build their application bit by bit in the usual ad-hoc and unplanned manner that is the custom of most inexperienced programmers (who needs bullshit like UML, right?). It is only when their application reaches a certain level of complexity that they truly start to run into problems with things like shared data, thread synchronization, race conditions, etc.

The first time it happens it's kind of like walking straight into a newly installed plate glass door... rather painful. Not that there is anything fundamentally wrong with threading, but you have to be aware of the problems you can run into, and spending time on proper application design is essential if you want to keep threading from becoming a nightmare. The wisdom of planning things is something that most young programmers seem to have to learn the hard way.
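The classic shared-data bug is a lost update on a read-modify-write. A minimal Python sketch of the usual fix, guarding the update with a lock, is below; the class and function names are illustrative. (CPython's global interpreter lock does not make `+=` atomic, so the lock is still needed.)

```python
import threading


class SharedCounter:
    """Shared data protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write: without the lock, two threads can read the same
        # old value and one of the updates is silently lost.
        with self._lock:
            self.value += 1


def run(threads=4, per_thread=10_000):
    counter = SharedCounter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


print(run())  # 40000
```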

Re:Old farts... by Anonymous Coward · 2008-03-24 23:36 · Score: 0

XML
Now there is a 'technology' that I hope dies soon. XML this, XML that. Working in the geosciences lately has everything moving to XML. I now have a simple volumetric mesh going from 15GB to 28GB. What the fuck am I going to do with the other couple of terabytes? I have a hard time believing that the 'proposed standard' is any better than some of the existing technologies.

Sorry, but I'd prefer to try and stand on the shoulders of giants. Cause, where I work, the older experienced programmers are rocket scientists.
Re:Old farts... by siride · 2008-03-25 00:51 · Score: 1

> I have also seen older people explaining to young bucks why a certified distro like Red Hat ES
> rather than the young bucks own favorite distro like, say, Gentoo, is the best pick for an
> Oracle server's OS

lol

(written on a Gentoo laptop, which is why I lolled)

Tell me something I don't know by Anonymous Coward · 2008-03-24 22:33 · Score: 0

You're all so damn frenzied that multithreaded programming is amazing and it's this and it's that and it's the next major paradigm that everyone has to jump on board with or be left behind with a million ageing COBOL programmers and blah blah blah. Oh and you need Erlang 'cos it's so fantabulous and the future of compilation will be dynamic runtime genetic algorithms and that threads are so complex and yet so many programmers can't handle them and... blah and...... yada and.... zip!

Really all extra cores do is give a PC more staying power. You already have hundreds or thousands of threads running on your OS at once within tens or hundreds of processes. They're already getting run across all cores and most of your computer is relaxing and sleeping while each core is 1-5% utilized. The OS is already making use of the extra hardware even though plenty of the processes it runs have only one thread.

Even with just one core your computer was doing several things at once effectively, or several hundred things at once if you count all the threads.

The only reason a programmer would want to try and make use of extra cores is if the problem domain contained processes that could be run simultaneously (like queuing network data while another thread renders a GUI), or there was a large process that could be split into several mini-processes (like analyzing video), or when writing a server which is handling several independent requests (like a typical web application). But these problems have been around for ages, and many existing solutions already employ threads or concurrent processes to take advantage of extra hardware. So where's the big amazing paradigm shift? It's nothing new, it's nothing particularly foreign, CS theory has had tens of years to produce plenty of best practices and guides for using threads while avoiding memory corruption or deadlocks. And so what if many programming problems are inherently serial - it doesn't really matter! How often do you wait for a CPU bound process these days? Much less than 15 years ago.

Ha.

Hi, European dev guy here.... by Nursie · 2008-03-24 22:34 · Score: 1

Been using multithread and/or multiprocess programming my whole career (about 8 years now). I don't know if it generalises to the whole of the US but I'm always astounded at the attitude of people on slashdot to parallel programming.

We hear over and over again "it's not going to go mainstream", "it's hard to get right", "threads are a recipe for disaster", "synchronisation is just too much work to debug", "it's a niche area" etc etc

Even occasionally "It'll never take off".

Well, guys, it has. A lot of people, at software houses from several employees to several thousand employees, have been using parallelism in commercial settings for a long time now. And we're already making use of the extra resources available in multicore platforms without having to do a single thing to our codebases.

Slashdot's usually pretty cutting edge in its tech (and especially language) evangelism, so why has it slacked on this one?

Re:Hi, European dev guy here.... by Anonymous Coward · 2008-03-25 00:43 · Score: 0

Parallelism and multithreading have their places, but people who are most at ease with multithreading haven't yet learned to fear the complexities involved. Experience gives you a healthy respect for the issues that really seem to be above mere mortals.

Multithreading has become such a fad for a couple of reasons. First, it gives the developer an illusion of simplicity at the beginning of development. Each thread minds its own business and inevitably follows its state machine that is plainly visible as a sequential program: receive this, compute that, save the result, send the response, repeat from the beginning. The second reason for the proliferation and overuse of multithreading is the tools: Windows, Java, Python, CORBA and the like make it very difficult to multiplex multiple input sources with anything but threads. It follows that the tools strongly encourage the developer to use multithreading as a patent pattern for all problems involving more than one input source.

I've had my share of trying to sew together the multithreaded solutions of less timid developers. Locking is missing and cannot be added without deadlocks. It is also virtually impossible to find all places in the code where locking is missing and where it needs to be added.

And even when you have to resort to multithreading yourself, you are faced with intractable issues. First you neatly protect objects with proper locking, but then you have object A needing to lock object B and object B needing to lock object A resulting in a deadlock. The simple solution is the global lock (which negates all parallelism), and the complicated one requires you to open up your black boxes: you can't design classes in isolation. Every class must be designed with every other class in mind and their policies must cooperate at a global level.
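One common way out of the A-locks-B / B-locks-A deadlock, not named in the post, is a global lock ordering: every code path acquires locks in the same order. It is exactly the kind of system-wide policy described above, since every class has to respect it. A minimal Python sketch, assuming each object carries a unique, comparable id (the `Account`, `transfer` and `repeat_transfer` names are hypothetical):

```python
import itertools
import threading

_ids = itertools.count()


class Account:
    def __init__(self, balance):
        self.id = next(_ids)          # unique id defines the global lock order
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move an amount between accounts without an A/B vs B/A deadlock."""
    if src is dst:
        return  # a plain Lock is not re-entrant; locking it twice would hang
    # Always lock the lower id first, whichever direction the transfer goes.
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def repeat_transfer(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)


a, b = Account(100), Account(100)
workers = [threading.Thread(target=repeat_transfer, args=(a, b, 1000)),
           threading.Thread(target=repeat_transfer, args=(b, a, 1000))]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(a.balance, b.balance)  # 100 100
```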

The second big problem with multithreading is that the threading paradigm requires only one input at a time (after all, multiplexing was not allowed). But that is a gross simplification of the real world requirement. You really must be able to accept more than one possible input. At a bare minimum if a thread is waiting on a socket, it must be receptive to another thread telling it to stop waiting because of a change in circumstances. Because of the difficulty of multiplexing, the poor developer is reduced to periodic polling, hung applications and the like.
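On POSIX-style systems the usual answer to "stop waiting on that socket" is a variant of the self-pipe trick: the waiting thread selects on its data socket and on a private wake-up socket, and any other thread can write one byte to the wake-up socket to interrupt it. A minimal Python sketch using `socket.socketpair()` and `select.select()`; the function name is hypothetical, and the first socket pair merely stands in for a real network connection.

```python
import select
import socket
import threading


def wait_for_data_or_stop(data_sock, wake_sock):
    """Block until data arrives or another thread pokes wake_sock.

    Returns the received bytes, or None if asked to stop.
    """
    readable, _, _ = select.select([data_sock, wake_sock], [], [])
    if wake_sock in readable:
        wake_sock.recv(1)  # drain the wake-up byte
        return None
    return data_sock.recv(4096)


data_r, data_w = socket.socketpair()  # stands in for a real network socket
wake_r, wake_w = socket.socketpair()  # private "stop waiting" channel

result = []
waiter = threading.Thread(
    target=lambda: result.append(wait_for_data_or_stop(data_r, wake_r)))
waiter.start()
wake_w.send(b"x")  # another thread changes its mind: stop waiting
waiter.join()
print(result)  # [None]
```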

It follows that experienced developers generally prefer the single-threaded, event-driven paradigm (perplexingly called "asynchronous programming"). Unfortunately there are few standard utilities to support it (for example, the Python asyncore module does not support timers) so you end up having to develop your event-driven toolkits over and over again for different languages, products and companies.
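For what it's worth, the missing timer support is not much code. The Python sketch below is a hypothetical helper (not part of asyncore) that keeps timed callbacks in a heap and runs them in deadline order on a single thread. To keep it deterministic it uses a virtual clock, where a real loop would block in select() until the next deadline. (Python's later asyncio module does provide `loop.call_later`.)

```python
import heapq
import itertools


class TimerLoop:
    """Hypothetical single-threaded timer scheduler with a virtual clock."""

    def __init__(self):
        self.now = 0.0
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal deadlines run FIFO

    def call_later(self, delay, callback, *args):
        """Schedule callback(*args) to run `delay` seconds from now."""
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback, args))

    def run(self):
        """Run callbacks in deadline order until none are left."""
        while self._heap:
            when, _, callback, args = heapq.heappop(self._heap)
            self.now = when  # a real loop would block in select() until `when`
            callback(*args)


log = []
loop = TimerLoop()
loop.call_later(2.0, log.append, "retry")
loop.call_later(0.5, log.append, "heartbeat")
loop.call_later(0.5, log.append, "poll")
loop.run()
print(log)  # ['heartbeat', 'poll', 'retry']
```

In a real event loop the earliest deadline in the heap also tells select() how long it may block.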

Problems are still hard with event-driven programming, and seemingly simple things become complicated and require careful state-machine designs. However, you have the confidence that the paradigm can carry the weight of unexpected challenges down the road, and you can often troubleshoot your problems sequentially and deterministically.
Re:Hi, European dev guy here.... by Anonymous Coward · 2008-03-25 22:03 · Score: 0

My feeling is that the opposition here is not focused on multiple tasks being processed at the same time (like 2 processes doing independent work at the same time). Rather, people seem to complain about fine-grained multithreading where a single problem is decomposed into multiple threads with complex mutual interactions. Given that your coding team in most cases consists of average developers rather than top-class parallel computing researchers, the chances are 10:1 that the code would waste almost all the resources provided by multiple cores on threads waiting for each other, blocking each other by accessing shared resources, etc. Avoiding races and deadlocks is not that difficult - what's complex is making a multithreaded program perform _really_ better.

Also note that an alternative to multi-threading often mentioned here is clustering. The difference between multithreading and clustering is that with clustering the communication between threads/nodes is narrowed to a very simple channel (like a socket) rather than complex synchronisation using mutexes, semaphores etc. So, if the projects you have been working on are "multithreaded", have a second look at them and maybe you'll find that they have actually been clusters of threads :) The intra-cluster communication may not even be a socket or pipe. It can be a message queue (www.erlang.org) or a lock-free linked list (www.zeromq.org) or whatever fast inter-thread communication mechanism is available.
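A "cluster of threads" in this sense can be sketched in Python as pipeline stages that share nothing except the queues between them. Note that `queue.Queue` is lock-based internally, not lock-free like the ZeroMQ example; the two stages here (strip, then upper-case) are placeholders.

```python
import queue
import threading

STOP = object()  # sentinel: tells a stage to pass the shutdown on and exit


def stage(func, inbox, outbox):
    """One 'cluster': owns no shared state, only reads inbox and writes outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))


def run_pipeline(items):
    """Two placeholder stages connected by queues."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(str.strip, q_in, q_mid)),
        threading.Thread(target=stage, args=(str.upper, q_mid, q_out)),
    ]
    for w in workers:
        w.start()
    for item in items:
        q_in.put(item)
    q_in.put(STOP)
    results = []
    while True:
        item = q_out.get()
        if item is STOP:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


print(run_pipeline(["  hello ", " world"]))  # ['HELLO', 'WORLD']
```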
Re:Hi, European dev guy here.... by Nursie · 2008-03-25 23:48 · Score: 1

Looks like I've used clustering before. In a multithreaded system. Not my finest work, but some of my first in a commercial setting.

You'd have, say, an incoming comms library that maintained a number of threads for dealing with traffic. These were only synchronised when they started up and died; otherwise they were independent of each other. They used pipes or socket pairs to speak to another cluster that dealt with comms archives and a few other tasks before handing the message off to yet another cluster that dealt with server comms.

Whilst this was *highly* inefficient, looking back, due to many many idle threads hanging around, it did mean that there was little to no locking overhead in normal operation.

What turned out to be a limit to the scalability, I'm sorry to say, was the pipes/socket pairs. If only I'd known about lock-free linked lists...

Now, I'd do the whole lot as a set of jobs on a thread pool, but that's hindsight for you.
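The thread-pool version might look roughly like this Python sketch: each incoming message becomes a job submitted to one shared pool, instead of each stage owning its own mostly idle threads. `handle_message` is a hypothetical placeholder for the archiving/forwarding work, and sizing the pool by `os.cpu_count()` is an assumption (I/O-bound jobs often want more threads).

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def handle_message(msg):
    """Hypothetical job: stands in for archiving and forwarding one message."""
    return msg.upper()


def process_all(messages, workers=None):
    """Run one job per message on a shared pool; returns {message: result}."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(handle_message, m): m for m in messages}
        return {futures[f]: f.result() for f in as_completed(futures)}


print(process_all(["login", "ping", "logout"]))
```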

WTF question is this???? by Aceticon · 2008-03-24 22:39 · Score: 3, Insightful

Those of us doing server side development for any medium to large company will have already been doing multi-threaded and/or multi-process applications for ages now:
- When Intel was still a barely known brand, other companies were already selling heavy-iron machines with multiple CPUs for use in heavy server-side environments (they didn't run Windows, though). Multi-cores are just a variant of the multiple-CPU concept.

The spread of Web applications just made highly multi-threaded server-side apps even more widespread - they naturally tend to have multiple concurrent users (<rant>though some web-app developers seem to be sadly unaware of the innate multi-threadedness of their applications ... until "strange" problems randomly start to pop up</rant>).

As for thick client apps, for anything but the simplest programs one always needs at least 2 threads, one for GUI painting and another one for processing (either that or some fancy juggling a la Windows 3.1)

So, does this article mean that Japan, China, India and Russia had no multi-CPU machines until now ... or is this just part of a PR campaign to sell an architecture which, in desktops, is fast growing beyond its usefulness (a bit like razors with 4 and 5 blades)?

In Taiwan... by LwPhD · 2008-03-24 22:43 · Score: 1

...we were recently introduced to our office's new cluster of 36 dual quadcores, primarily slated for use in bioinformatics. The kinds of applications we do in our lab are highly suited to breaking up over many machines and wouldn't particularly benefit (or suffer) from explicitly coding them for multicore processors. We have Torque installed. However, at the training session, more time was spent discussing how to configure job submissions for parallel programs (including even a "Hello world" intro program demo for the truly clueless about parallel programming, i.e. me and those like me) than on more useful things, like how to delete all jobs at once.

Re:In Taiwan... by Anonymous Coward · 2008-03-24 23:09 · Score: 0

Hate to say this, but I can understand where they were coming from. You would complain a lot more when your job(s) sits in a queue forever because some fool submitted a job incorrectly and goes unnoticed for a lengthy time. Spend more time on prevention rather than cure ... it saves a lot of headaches later for everybody involved.

How many cores in total? 36x2x4? 36x4? 36x2? How many cores per node? I can see why they spent a lot of time on step one. Know the architecture of the machine!

Well Said. by FatSean · 2008-03-24 23:01 · Score: 1

Those containers do hide some of the complexity of parallel programming, but the programmer still needs a basic understanding to produce a usable result.

This whole article smells of hype...

--
Blar.

That's why you use POSIX threading... by Nursie · 2008-03-24 23:04 · Score: 1

... with C. Then you can't avoid knowing what's going on under the covers.

It's not that tough, really.

Incorrect account of Moore's Law by Kupfernigk · 2008-03-24 23:24 · Score: 1

The original version of the Law said that gate density doubles every N months (12

In case you hadn't noticed, web2.0 is largely about systems running many, many similar processes in parallel. These processes are usually quite independent. (Even on my laptop, running an Ethernet connection has little to do with editing a document on the hard drive.)

If you are working on a single user system such as a word processor, parallelism has little significance. But if you are a Google, wanting to deliver similar but isolated web services to many people, or if you are building a switching exchange such as a scalable email server, VOIP exchange, XMPP server, print server...you are very interested in parallelism. And any system which causes the number of parallel processes to expand in some way coordinated with the number of available cores or core threads is very interesting indeed.

What especially pleases me about this article is that I put my current sig up before it appeared. It now seems slightly prophetic.

--
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."

On the contrary... by Viol8 · 2008-03-24 23:27 · Score: 1

"One of the reasons more seasoned programmers are not particularly interested is that in most cases someone else has already doen the hard work."

Who is this mythical "someone else"? I'd like to meet them. Incidentally, since when have database triggers been parallel systems? The LAST thing you want in a database is these sorts of things running in parallel; that's why you have locking!

"It is very rare that you actually need to do parallel programing yourself."

Err, if you count threads as parallel programming, then I do it in virtually every project I work on. You talk about POSIX threads as if they're some scary API that only people who require real power use. On the contrary, they're an API that probably almost every C/C++ Unix coder uses all the time these days.

Re:On the contrary... by perlchild · 2008-03-25 01:12 · Score: 1

I think you've nailed one of the problems right on the head.

A lot of parallel processing just won't work because, for conceptual reasons, we follow the surgeon's "one hand in the chest area at a time" rule. You can parallelize all you want, but all you're going to do is make a faster desktop (OK, that's useful, but it's not where the development money goes), and the servers are going to lag behind. Insert an example of one pointy-haired database administrator talking to another: "Yeah, I have 16 drive controllers on this beast, to take advantage of parallelism even on I/O." "But... you've only got one database, and one table in that database."
"So? I've got to have sequential, unique transaction numbers, all across the board, with synchronized transactions... I need just one table."

Wasteful hardware spending galore...
Re:On the contrary... by supersnail · 2008-03-25 04:02 · Score: 1

Well, for one thing, real database systems are massively parallel; that's why they need locking mechanisms.
Supporting 5,000 connections while executing hundreds of parallel requests is "normal" in the DB2/Oracle world.

Most of my current programming is done in a J2EE/web-type environment where the containers take care of most of the parallelism issues for you. And in my current job, parallelism means getting the processes to spread out over several machines rather than several processors, so threads of any kind really are not the answer.

As for POSIX threads being scary, you have to RTFM, but they are the tool of choice if you must multithread: not some OO threads wrapper, not the proprietary threading that comes with the OS, and not Java threads if you can in any way avoid them.

--
Old COBOL programmers never die. They just code in C.
Re:On the contrary... by TheLink · 2008-03-25 04:26 · Score: 1

Why would you need sequential transaction numbers? I can understand the need for unique transaction numbers.
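Unique-but-not-sequential IDs can be handed out without any shared counter. The following is a minimal sketch that assumes a fixed, known number of workers (`NUM_WORKERS` is a hypothetical parameter). Worker k issues k, k+n, k+2n, and so on, so no two workers can collide and none of them has to lock anything. The trade-off is that the IDs are no longer globally ordered by time.

```python
import itertools
import threading

NUM_WORKERS = 4  # assumption: a fixed, known number of workers


def id_stream(worker_index: int, num_workers: int = NUM_WORKERS):
    # Worker k hands out k, k + n, k + 2n, ...: unique across workers,
    # not globally sequential, and with no shared counter to lock.
    return (worker_index + i * num_workers for i in itertools.count())


def allocate(worker_index: int, count: int, out: list) -> None:
    # Each thread appends only to its own list, so nothing is shared.
    ids = id_stream(worker_index)
    out.extend(next(ids) for _ in range(count))


def run_demo(per_worker: int = 1000):
    results = [[] for _ in range(NUM_WORKERS)]
    threads = [
        threading.Thread(target=allocate, args=(k, per_worker, results[k]))
        for k in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [i for r in results for i in r]
```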

I can also understand why if you write to the same row in the same table you need to serialize. I don't see why you can't take advantage of the 16 drive controllers if you don't write to the same row.
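On the row-level point, here is a toy sketch with one lock per row. It uses a hypothetical in-memory `Table` and does not show how any particular database implements locking. Only writers to the same row are serialized, and writers to different rows proceed independently.

```python
import threading


class Table:
    """Toy in-memory table with one lock per row (not a real database API)."""

    def __init__(self, num_rows: int):
        self.rows = [0] * num_rows
        # One lock per row: writers to different rows never wait on each other.
        self.row_locks = [threading.Lock() for _ in range(num_rows)]

    def add(self, row: int, amount: int) -> None:
        # Only writers to the *same* row are serialized.
        with self.row_locks[row]:
            self.rows[row] += amount


def run_demo(num_rows: int = 4, threads_per_row: int = 2, times: int = 1000):
    table = Table(num_rows)

    def hammer(row: int) -> None:
        for _ in range(times):
            table.add(row, 1)

    threads = [
        threading.Thread(target=hammer, args=(row,))
        for row in range(num_rows)
        for _ in range(threads_per_row)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.rows
```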

So far it seems to me that I/O is a bigger problem. After decades, hard drives still have about the same access times. Add as many cores as you want, but when the time comes to write the results permanently, you have to wait for the disks, and that sure is slow.

oops sorry by Kupfernigk · 2008-03-24 23:36 · Score: 1

Brain in neutral, the strange artefact there was an attempt to write 12<N<24.

--
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."

....or maybe just maybe by deKernel · 2008-03-24 23:45 · Score: 1

We old farts are not all that scared of doing multi-core development with all the normal tools, like basic threads and sync objects (events, mutexes and semaphores). Making code thread-safe is really not that difficult once you understand the basics (I'm talking about C++ here, since that's where my multi-core/multi-thread experience is): keep your data safe either by putting it on the stack or by using the sync objects to protect global data.
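The two strategies in the post are data kept local to each thread and global data guarded by a sync object. Both can be sketched in Python. The post is about C++, so `threading.Lock` stands in for a mutex here, and the function names are hypothetical.

```python
import threading

total = 0                      # shared (global) data
total_lock = threading.Lock()  # guards every access to `total`


def worker(values) -> None:
    global total
    # Local ("on the stack"): each thread has its own copy, no lock needed.
    subtotal = 0
    for v in values:
        subtotal += v
    # Global: touch it only while holding the lock.
    with total_lock:
        total += subtotal


def run(chunks) -> int:
    global total
    total = 0
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Doing the bulk of the work on a local and taking the lock once per thread keeps the critical section short.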

Re:....or maybe just maybe by jamesswift · 2008-03-25 01:15 · Score: 1

Making code thread-safe is really not that difficult once you understand the basics
I have to disagree on that. I'm sure you are a professional and understand the task well, but compare the effort you have to put into it with what you get. No other part of programming in C++, bar complex template design, takes as much care as multithreading. To design something really well, you need to know in advance how it will be used. But you can only know so much.
For example, you design a class that's thread-safe. Let's say no more than one thread will be messing with the internals of the class at a time. This is fine for most use cases so far, but a year down the line you want to move some data between instances of these classes in an atomic, thread-safe manner at a higher level in the program, and you find your design doesn't allow this ONLY because of threading issues. This is a major difficulty no matter how good you are at coding, because you can't predict the future. There could be dozens of possible uses for the class that are conceptually already possible, but you have to add extra logic just to keep it thread-safe.
It's hard.
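The scenario in the post is moving data atomically between two objects that are each thread-safe on their own. It can be sketched with a hypothetical `Account` class. Per-object locks do not compose, so the transfer has to hold both locks. Taking them in a consistent global order (here by `id()`) is one common way to avoid a deadlock between opposite transfers, though not the only one.

```python
import threading


class Account:
    """Hypothetical class whose own methods are thread-safe."""

    def __init__(self, balance: int):
        self.balance = balance
        self.lock = threading.Lock()  # guards this instance only

    def deposit(self, amount: int) -> None:
        with self.lock:
            self.balance += amount


def transfer(src: Account, dst: Account, amount: int) -> None:
    # Two separate locked calls (src.deposit(-amount); dst.deposit(amount))
    # would let other threads see the money "in flight". Holding both locks
    # makes the move atomic; always taking them in the same global order
    # (here: by id()) stops two opposite transfers from deadlocking.
    if src is dst:
        return  # threading.Lock is not reentrant; nothing to move anyway
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_demo(times: int = 1000):
    a, b = Account(1000), Account(1000)

    def move(src, dst):
        for _ in range(times):
            transfer(src, dst, 1)

    t1 = threading.Thread(target=move, args=(a, b))
    t2 = threading.Thread(target=move, args=(b, a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

Note that `transfer` has to reach into the lock of each `Account`. This is exactly the kind of change to the class's original thread-safety design that the post describes.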

--
i wish i could stop

concurrency in programming by md_129 · 2008-03-24 23:48 · Score: 1

Maybe yes, maybe not; this is all speculation at this point. Just take a look at Google and what they are capable of doing.

An emerging class of problems by Ceriel+Nosforit · 2008-03-24 23:55 · Score: 1

Most comments so far seem to concern the type of problems we've dealt with in the past, but there is another type of problem that is emerging. This new type tends to involve short computations on vast amounts of data, rather than the long computations on small amounts of data we're used to. A few examples:

SDR, software-defined radio. One of these can easily saturate a Gigabit Ethernet link with data, and we'll want the computer to automatically sort signal from noise and determine whether we're interested in the signal. Perhaps we'll even want to do voice recognition on the signal and look for keywords? A parallel approach is suitable.

Robotics. Robots will need sensors, and it would seem that in the biological equivalent this problem was solved by making the right hemisphere of the brain a parallel computer that sorts through this data, serving the left hemisphere with quality data to process serially.

Data mining. It's nice if it's Us doing it rather than Them. I might want to trawl news sites, web forums and repositories of research papers looking for things I'm interested in, without having to offer up my personal preferences to advertising agencies.

All in all, it appears that long computations on small amounts of data are actually a lot less common than short computations on vast amounts of data. The configuration of the brain makes a lot of sense in this case.

--
All rites reversed 2010

Re:An emerging class of problems by espressojim · 2008-03-25 04:45 · Score: 1

You can add to that a number of bioinformatics problems, where you're trolling millions of items on an entire genome, and running the same calculations on each and every one. These ought to be done in parallel - right now we often break them up by some natural partition like chromosomes, and run them across multiple machines.
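The partition-then-map pattern described above can be sketched under a few assumptions. The records are hypothetical `(chromosome, sequence)` pairs, and `score` is a stand-in calculation. A thread pool stands in for the separate machines. Because of the GIL, CPU-bound work in CPython would normally go to processes or machines instead.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def score(record) -> int:
    # Hypothetical per-item calculation; here just the sequence length.
    _chromosome, sequence = record
    return len(sequence)


def partition_by_chromosome(records):
    # Group items by the natural partition key (the chromosome name).
    parts = defaultdict(list)
    for record in records:
        parts[record[0]].append(record)
    return dict(parts)


def process_partition(records) -> int:
    # The same calculation runs on every item in the partition.
    return sum(score(r) for r in records)


def run(records):
    parts = partition_by_chromosome(records)
    # Partitions are independent, so they can run concurrently; in the
    # post they go to separate machines, here to a thread pool.
    with ThreadPoolExecutor() as pool:
        totals = pool.map(process_partition, parts.values())
        return dict(zip(parts.keys(), totals))
```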

Why not just say: "Check out COSA"? by Anonymous Coward · 2008-03-24 23:59 · Score: 0

If you thought your answer had merit, you could have linked us directly to the COSA project, instead of to a 5-page pointless review of all life as we know it.

And then we could have replied with a quick link to Erlang which has handled 10^5-process fine-grain concurrency in industrial PTT exchanges for a decade very easily, and much time would have been saved.

Oversimplified by Moraelin · 2008-03-25 00:00 · Score: 3, Insightful

That's an oversimplified view.

It's more like when you've got enough experience, you already know what can go wrong, and why doing something might be... well, not necessarily a bad idea, but cost more and be less efficient anyway. You start having some clue, for example, what happens when your 1000 thread program has to access a shared piece of data.

E.g., let's say we write a massively multi-threaded shooter game. Each player is a separate thread, and we'll throw in a few extra threads for other stuff. What happens when I shoot at you? If your thread was updating your coordinates just as mine was calculating if I hit, very funny effects can happen. If the rendering is a separate thread too, and reads such mangled coordinates, you'll have enemies blinking into strange places on your screen. If the physics or collision detection does the same, that-a-way lies falling under the map and even more annoying stuff.

Debugging it gets even funnier, since some race conditions can happen once a year on one computer configuration, but every 5 minutes on some hapless user's. Most will not even happen while you're single-stepping through the program.

Now I'm not saying either of those is unsolvable. Just that when you have a given time and budget for that project, it's quite easy to see how the cool, hip and bleeding-edge solution would overrun that.

By comparison, well, I can't speak for all young 'uns, but I can say that _I_ was a lot more irresponsible as the stereotypical precocious kid. I did dumb things just because I didn't know any better, and/or wasted time reinventing the wheel with another framework just because it was fun. All this on the background of thinking that I'm such a genius that obviously _my_ version of the wheel will be better than that built by a company with 20 years of experience in the field. And that if I don't feel like using some best practice, 'cause it's boring, then I know better than those boring old farts, and they're probably doing it just to be paid for more hours.

Of course, that didn't stop my programs from crashing or doing other funny things, but no need to get hung up on that, right?

And I see the same in a lot of hotshots nowadays. They do dumb stuff just because it's more fun to play with new stuff, than just do their job. I can't be too mad at them, because I used to do the same. But make no mistake, it _is_ a form of computer gaming, not being t3h 1337 uber-h4xx0r.

At any rate, rest assured that some of us old guys still know how to spawn a thread, because that's what it boils down to. I even get into disputes with some of my colleagues because they think I use threads too often. And there are plenty of frameworks which do that for you, so you don't have to get your own hands dirty. E.g., everyone who's ever written a web application, guess what? It's a parallel application, only it's the server which spawns your threads.

--
A polar bear is a cartesian bear after a coordinate transform.

Re:Oversimplified by Nursie · 2008-03-25 01:44 · Score: 2, Informative

"E.g., let's say we write a massively multi-threaded shooter game. Each player is a separate thread,"

Well there's your first mistake.
That's a recipe for disaster and built-in limits to the number of players.

Ideally you separate your server app into multiple discrete jobs and process them with a thread pool. I don't know how well that maps to gaming, but many server apps work very well with that paradigm.
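Here is a minimal sketch of the thread-pool approach. It assumes each unit of work (for example, one player's update) can be wrapped as a small callable job. A fixed number of threads drains a shared queue, so the number of threads does not grow with the number of players. `run_pool` and `num_threads` are hypothetical names.

```python
import queue
import threading


def run_pool(jobs, num_threads: int = 4):
    # A fixed number of threads drains a shared job queue, so the thread
    # count does not grow with the number of players/connections.
    q = queue.Queue()
    for job in jobs:
        q.put(job)  # all jobs are queued before any worker starts
    results = []
    results_lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # no work left
            value = job()  # each job is a small, discrete unit of work
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A long-running server would keep its workers alive and block on `q.get()` for new jobs. Exiting on an empty queue just keeps this sketch finite.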
Re:Oversimplified by Moraelin · 2008-03-25 01:55 · Score: 1

Ah, there we go. You're thinking like an experienced programmer. That's exactly the kind of thing I was talking about. The moment when you also know, or can extrapolate from past experience, what can go wrong if you do that. The 15 year old hotshot response would be more like, "woohoo, I get to play with threads!"

Of course, now a lot of people will see that as you being a boring old fart who'd rather stick to tried-and-tested stuff (like those thread pools), instead of being excited to try new stuff.

--
A polar bear is a cartesian bear after a coordinate transform.
Re:Oversimplified by Nursie · 2008-03-25 02:01 · Score: 1

That's very true.

"The 15 year old hotshot response would be more like, "woohoo, I get to play with threads!""

Funny you should say that. My first job out of university, six months into it they handed me control of a brand new middle tier project that was a big part of the company's future (yeah, there was something wrong with them). And my first reaction was...

God what a mess. Still, it worked. Mostly.
Re:Oversimplified by Alphasite · 2008-03-25 02:02 · Score: 1

That's an oversimplified view.
E.g., let's say we write a massively multi-threaded shooter game. Each player is a separate thread, and we'll throw in a few extra threads for other stuff. What happens when I shoot at you?
Actually, all online multiplayer games work just like that: a thread (or several) on each connected machine, and one more for the server that handles it all (including conflict resolution). Obviously the tasks you assign to each thread have to be coherent, but that's not computer-related. In a factory you can have a thousand workers, each performing a different task (like building something), and if the how and when aren't clearly defined, it will end up being a complete mess anyway (and there are no computers to mess things up there).
Re:Oversimplified by Anonymous Coward · 2008-03-25 03:53 · Score: 0

What happens when I shoot at you? If your thread was updating your coordinates just as mine was calculating if I hit

Well, your model sucks for a start. Someone needs to own the coordinates. Why not have your thread call the other thread's hit_test() function, and let it worry about whether it needs to synchronize with its own calculate_new_position() function? There you go, problem solved.

These types of simple threading problems just require that the developer understands basic concepts such as "critical sections". Apparently most don't.
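The ownership model suggested above can be sketched with a hypothetical `Player` class. The object owns its coordinates, and both `calculate_new_position()` and `hit_test()` (the names from the post) take the same lock. As a result, a shooter's thread never sees a half-updated position.

```python
import threading


class Player:
    """Owns its coordinates; all access goes through methods that lock."""

    def __init__(self, x: float, y: float):
        self._x, self._y = x, y
        self._lock = threading.Lock()

    def calculate_new_position(self, dx: float, dy: float) -> None:
        # Update both coordinates under one lock so no reader sees
        # a new x paired with an old y.
        with self._lock:
            self._x += dx
            self._y += dy

    def hit_test(self, x: float, y: float, radius: float) -> bool:
        # Called from the shooter's thread; takes a consistent (x, y)
        # snapshot, then does the geometry outside the lock.
        with self._lock:
            px, py = self._x, self._y
        return (px - x) ** 2 + (py - y) ** 2 <= radius ** 2
```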
Re:Oversimplified by Coryoth · 2008-03-25 04:28 · Score: 2, Informative

E.g., let's say we write a massively multi-threaded shooter game....Debugging it gets even funnier, since some race conditions can happen once a year on one computer configuration, but every 5 minutes on some hapless user's. Most will not even happen while you're single-stepping through the program.
Well, that just says that you're doing it wrong. Sure, a massively concurrent system done in the manner you describe would be incredibly tricky to make work correctly. Of course, that's not the only way to do it. With a little bit of thought and analysis up front, you can write massively concurrent multiplayer games that just work right the first time. That's a system that had literally thousands of concurrent processes all interacting, with no deadlock or livelock, and no complex debugging to ensure that was the case. Just because you can't imagine how it could be done doesn't mean it can't be.

--
Craft Beer Programming T-shirts
Re:Oversimplified by DrFalkyn · 2008-03-25 08:51 · Score: 1

MMOGs might not be the best example, because you largely don't care about "correctness" the way you do with, say, a database app that handles financial transactions. If the client threads/processes don't get their input to the server process in time because they have a slow network connection, then too damn bad; they'll have to wait for the next update or send the command again.

You are largely right, however. There are few problems where multithreading/multiprocessing buys you anything. On a single processor, the only reason to do it is to take advantage of times when the user isn't doing anything. For example, there's every reason to have a separate thread for the AI processing in a chess game. There is no need to obtain locks, because the algorithm is independent of user input.

Better developers recognize parallel fad by artsrc · 2008-03-25 00:01 · Score: 1

The parallel programming being discussed mostly solves the same problems as serial programs, only faster. So if you have code that runs fast enough as a serial program, you are better off solving a different problem than exploring parallel programming. And if you have a program that is running too slowly, then you need to work out why. Most often your program is not CPU-bound, and moving to the type of parallel program being discussed won't help. And if you have a program that is too slow and CPU-bound, there are a number of optimizations you can choose, most of which are more localized and simpler than moving to parallel programming. And if all else fails, then maybe you should look at parallel programming for your problem. If you do, frequently the problem you are solving is not the general one, and a simple solution exists that is much less complex than the more general parallel programming being discussed. So maybe older and better developers are looking at more promising solutions to more important problems, rather than focusing on one type of optimization being pushed by unimaginative hardware vendors. Of course, Erlang looks like fun, so I'll check it out, but I don't think it is the most important development around.
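Before reaching for parallelism, one quick way to test the "most often your program is not CPU-bound" point is to compare CPU time with wall-clock time for a call. The following is a minimal sketch using only the standard library. `cpu_fraction` is a hypothetical helper, and it measures only the current process.

```python
import time


def cpu_fraction(func, *args):
    """Return (result, cpu_time / wall_time) for one call of func.

    A ratio near 1.0 suggests the call keeps one core busy; a ratio
    near 0 suggests it is mostly waiting (I/O, sleep, locks), where
    more cores are unlikely to help.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process only
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)
```

For example, `cpu_fraction(time.sleep, 0.2)` reports a ratio close to zero, because sleeping uses almost no CPU.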

Read TFA by News+for+nerds · 2008-03-25 00:01 · Score: 1

While most replies seem to go around just "young kids tend to jump on new things" "parallelism has been here for years in inherently parallelizable tasks such as server or graphics", I managed to read TFA, and voila...

>my Threading Building Blocks (TBB) book, sold more copies in Japan in Japanese
>in a few weeks than in the first few months worldwide in the English version
>(in fact - it sold out in Japan - surprising the publisher!)

>contributors and users of the TBB open source project are worldwide, but with
>some particularly outstanding users and contributors in Russia

So basically the author didn't know there are already a vast number of programmers outside of the US. This is not surprising: China alone has more than four times as many people as the US. It seems he thought all software is developed in the US, when there are counterexamples such as the Linux kernel and Ruby.

Nothing to see here, move along.

Re:Read TFA by smallfries · 2008-03-25 00:59 · Score: 1

Further to your point:

a) He is trying to sell his book
b) He is trying to sell his book

and, not to forget

c) He is trying to sell his book.

At no point does it occur to him that his entry-level guide to multi-threading is selling better amongst less experienced programmers ... because the older programmers with experience do this shit for their bread and butter.

As you said, Nothing to see here, move along.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php

Nowt wrong with C and POSIX threads by Nursie · 2008-03-25 00:08 · Score: 0, Flamebait

They'll allow you to use threads most happily and not take you too far away from the hardware.

The fact that you actually have to think a bit more about who's accessing what data at what time, and avoid trivial problems like deadlock, does not make it "too hard". And the fact that C is not ideal for parallelising mathematical operations doesn't make it useless either. Threads can be doing totally different things, or be pooled for great joy.

Re:Nowt wrong with C and POSIX threads by Metasquares · 2008-03-25 01:42 · Score: 2, Insightful

Anything harder than a problem needs to be is too hard. The question is whether concurrency can be made easier while preserving the benefits it provides.
Re:Nowt wrong with C and POSIX threads by Nursie · 2008-03-25 01:49 · Score: 1

"Anything harder than a problem needs to be is too hard."

Agreed 100%

"The question is whether concurrency can be made easier while preserving the benefits it provides."

It's not difficult now. People just have a weird mental block about it. Sure, it's a different way of thinking, but it really isn't that tough.

Or maybe I'm just some sort of programming god. No, no, I think it's more likely that people just have weird ideas about how hard it is.

Threads v. Processes by Just+Some+Guy · 2008-03-25 00:13 · Score: 1

The other day I realized that I don't really know why threads are supposed to be better than processes, other than because "multithreading" sounds cooler and it's not old and boring and well-understood like "multiprocessing". I'm asking this sincerely: why do people only talk about multithreaded processing whenever parallel programming comes up lately? It seems like IPC is marginally easier with threads, but the design is much trickier, so what's the big win? Is it a CPU optimization thing?

--
Dewey, what part of this looks like authorities should be involved?

Re:Threads v. Processes by pohl · 2008-03-25 03:35 · Score: 1

From Thread (computer science): in general, a thread is contained inside a process and different threads of the same process share some resources while different processes do not.

The aforementioned shared resources imply a tradeoff. Threads don't need multiple copies (a space savings) and don't have as high a cost of context switching (a time savings). The tradeoff for those benefits is that you have to think about sharing those resources safely, which requires more thought.

--
The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...

Old stuff by fitten · 2008-03-25 00:13 · Score: 1

I've been writing multithreaded and parallel (MPP/MIMD) code for 20 years or so and there were plenty of people writing those kinds of codes long before I started. It's not exactly a new thing.

Threads: Threat or Menace by martincmartin · 2008-03-25 00:21 · Score: 5, Insightful

It always surprises me how many people say "we have to multithread our code, because computers are getting more cores," not realizing:

There are often other ways to do it, e.g. multiple processes communicating over sockets, or multiple processes that share memory.
Threads are hard to get right. Really, really hard.

When your library of mutexes, semaphores, etc. doesn't have exactly the construct you need, and you go to write your own on top of them, it's really, really hard not to introduce serious bugs that only show up very rarely. As one random example, consider the Linux kernel team's attempts to write a mutex, as described in Ulrich Drepper's paper "Futexes are Tricky."

If these people take years to get it right, what makes you think *you* can get it right in a reasonable time?

The irony is that threads are only practical (from a correctness/debugging point of view) when there isn't much interaction between the threads.

By the way, I got that link from Drepper's excellent "What Every Programmer Should Know about Memory." It also talks about how threading can slow things down.

Re:Threads: Threat or Menace by Anonymous Coward · 2008-03-25 01:17 · Score: 1, Insightful

Threads are hard to get right. Really, really hard.

Hogwash. Feel free to keep believing that though while the rest of us write functioning multi-threaded code.

If these people take years to get it right, what makes you think *you* can get it right in a reasonable time?
The paper you link to is specifically talking about the Linux futex implementation. Futexes are a special-case mutex, and apparently when they were introduced in the 2.5 kernel they were seriously broken: from a quick skim of the paper, whoever wrote them wasn't thinking too hard or had no idea what a spinlock was, for a start.

It may surprise you to learn that futexes are not required for the application developer: they are an implementation detail. In reality you only have one primitive to worry about, the counting semaphore. Everything else is a derivation of it: mutexes, futexes (which are really just mutexes), condition variables and all the other fancy "primitives" which aren't actually primitives all boil down at the core to a single semaphore to control concurrency.
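The narrower point, that a counting semaphore can serve as a mutex, can be sketched with Python's `threading.Semaphore` initialized to 1. This is only an illustration. Real mutexes are not necessarily implemented this way, and unlike a true mutex, this version does not track which thread holds it, so any thread could release it. `SemaphoreMutex` and `run_counter` are hypothetical names.

```python
import threading


class SemaphoreMutex:
    """A mutex built from a counting semaphore with an initial count of 1."""

    def __init__(self):
        self._sem = threading.Semaphore(1)

    def acquire(self) -> None:
        self._sem.acquire()  # count 1 -> 0; later callers block

    def release(self) -> None:
        self._sem.release()  # count 0 -> 1; one waiter may proceed

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False  # do not swallow exceptions


def run_counter(num_threads: int = 4, times: int = 10_000) -> int:
    mutex = SemaphoreMutex()
    count = 0

    def bump() -> None:
        nonlocal count
        for _ in range(times):
            with mutex:  # critical section guarded by the semaphore
                count += 1

    threads = [threading.Thread(target=bump) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count
```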

Those of you who were awake and paying attention during your expensive college courses should already understand critical sections. I'm sure you all have enough experience to understand why global variables are a bad idea. So why exactly should the idea of concurrent programming suddenly seem hard?

By the way, plenty of other OSes managed to implement futexes and got them right long before the Linux developers suddenly woke up to the fact that threaded programming was important and they might need to actually support it. BeOS managed to provide a heavily multi-threaded implementation with working futexes from the early '90s. So the real question is, if the Linux developers can't manage it, are they as smart as you think they are?
Re:Threads: Threat or Menace by ceoyoyo · 2008-03-25 03:10 · Score: 1

You can communicate with sockets from thread to thread as well. If you use a decent thread library it's really not that hard and you get the benefit of being ABLE to do the tricky shared memory stuff if you need to. Threads have the additional benefit that you don't have to worry about the user killing one of them.
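Socket communication between threads, as described above, can be sketched with `socket.socketpair()`, which returns two connected sockets. Here each end is used by a different thread in the same process. The single small message is an assumption. A real protocol would need framing, because `recv()` may return partial data.

```python
import socket
import threading


def echo_upper(sock: socket.socket) -> None:
    # Worker thread: read one small message, reply in upper case.
    data = sock.recv(1024)
    sock.sendall(data.upper())
    sock.close()


def socket_demo() -> bytes:
    # socketpair() returns two connected sockets; both ends live in this
    # process, one per thread, instead of in two separate processes.
    a, b = socket.socketpair()
    t = threading.Thread(target=echo_upper, args=(b,))
    t.start()
    a.sendall(b"hello")
    reply = a.recv(1024)
    t.join()
    a.close()
    return reply
```

Because the threads exchange only bytes, moving `echo_upper` into a separate process later would not change its logic.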
Re:Threads: Threat or Menace by Jerf · 2008-03-25 03:56 · Score: 1

That's why the really interesting work is being done in how to do multithreading without mutexes, semaphores, and all the other lock-based concurrency systems, which I'm surprised to see few people talking about in my skimming so far. Possibly because the old bucks aren't aware of these things, think they aren't fundamentally different, or possibly think they've been tried already.

The two most promising approaches I know of are message-passing based concurrency and Software Transactional Memory.

Message-passing based concurrency has been tried before, but to really reap the benefits it needs to be pervasive and easy. The current market leader in the field of message-based concurrency that gets things done in the real world (not just on paper) is Erlang. (I program professionally in Erlang.) I characterize it this way: Erlang doesn't make the issues of concurrency like deadlock go away, but it takes them from requiring superhuman intelligence to human intelligence.

As I like to say, a large enough quantitative change becomes a qualitative change. Yes, you can do everything in C that you do in Erlang, though you've got a lot of implementation in front of you and the resulting syntax will be awful. But by making the Right Thing easier in Erlang than the Wrong Thing, the experience is qualitatively different.
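Here is a minimal sketch of message-passing concurrency in the Erlang style, using Python threads with `queue.Queue` objects as mailboxes. One actor thread owns a counter, and every other thread sends it messages, so the application code takes no locks. The queue itself still uses locks internally. The message tuples and function names are hypothetical.

```python
import queue
import threading


def counter_actor(mailbox: queue.Queue) -> None:
    # The actor alone owns `count`; no other thread touches it, so no lock.
    count = 0
    while True:
        msg = mailbox.get()
        if msg[0] == "incr":
            count += msg[1]
        elif msg[0] == "get":
            msg[1].put(count)  # reply on the caller's own queue
        elif msg[0] == "stop":
            return


def send_increments(mailbox: queue.Queue, times: int) -> None:
    for _ in range(times):
        mailbox.put(("incr", 1))


def actor_demo(num_senders: int = 4, times: int = 1000) -> int:
    mailbox = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(mailbox,))
    actor.start()
    senders = [
        threading.Thread(target=send_increments, args=(mailbox, times))
        for _ in range(num_senders)
    ]
    for s in senders:
        s.start()
    for s in senders:
        s.join()
    # The mailbox is FIFO, so this "get" is handled after every "incr" above.
    reply = queue.Queue()
    mailbox.put(("get", reply))
    result = reply.get()
    mailbox.put(("stop",))
    actor.join()
    return result
```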

Software Transactional Memory is still in flux; the Haskell implementation has some relatively unique characteristics, but I have not yet decided how I feel about it. Erlang actually has an embryonic STM system called "Mnesia", which isn't quite the same as the hyper-pure one Haskell has, but on the other hand, we know it works. But it's something to keep an eye on; I'm experimenting with it in a personal project right now.

There are several promising lines of attack on the multicore problem, but none of them look like "semaphores and mutexes". Consensus seems to have developed that those are fundamentally flawed primitives to build software on. (Of course, message-passing is implemented with some locking, and STM can use things like that to be implemented, but saying that we are therefore building our systems on locks anyhow is exactly like claiming that since everything ends up as machine code, we should just program in machine code instead.)
Re:Threads: Threat or Menace by jc42 · 2008-03-25 09:05 · Score: 1

The really interesting work is being done in how to do multithreading without mutexes, semaphores, and all the other lock-based concurrency systems, which I'm surprised to see few people talking about in my skimming so far. Possibly because the old bucks aren't aware of these things, think they aren't fundamentally different, or possibly think they've been tried already.

Well, yes, a lot of them have been tried already. I published a paper (in one of them there geeky computer journals ;-) about one such scheme about 25 years ago, and I've had a number of discussions with others that have done similar things. What a lot of us old bucks have seen is something different: We keep finding that the groups' managers simply don't believe that our weird scheme can work, and order us to use mutexes and semaphores.

In a few cases, I've managed to sorta sneak more efficient sync schemes into the code when nobody was looking. And in a few cases, when this was discovered, someone else rewrote the code with the usual locks, "so that it'll work". They never actually had evidence that it didn't work; they just "knew" that it couldn't work, and wanted to make sure that the software didn't depend on untested methods. The result inevitably was a significant loss of speed, of course, but that didn't matter.

Maybe I should start looking into such things again. Sometimes it can take a few decades (or centuries) for people to adopt something that has been rediscovered by many people in the past. Sometimes we have to wait until an idea gets into the textbooks, so that the younger bucks have heard about it and will consider using it. Meanwhile, those of us who know a bit of history are condemned to watch history repeat itself unnecessarily.

Who was it that said that most discoveries are named after the last person who discovered them? (Because once a discovery has a proper name attached, it can no longer be claimed as a new discovery. ;-)

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:Threads: Threat or Menace by malevolentjelly · 2008-03-25 09:28 · Score: 0

So the real question is, if the Linux developers can't manage it, are they as smart as you think they are?
Oh come on. I don't mean to be rash, but you, sir, are an idiot.

You're missing some fundamental information here that is primary to linux development.

First off:

A) Nothing important has happened in computing since the 70's, this is something unix developers have long understood.

B) If something is difficult to implement, it's probably not that important. But at least we support other stuff. And it's free as in libre. It's the best free implementation.

C) Linux is totally better than Minix, which was previously the most advanced kernel on the planet. Now Linux is.

You better check your shit before you come up on slashdot making crazy claims about linux devs being mediocre. If they were mediocre, why is linux free?
Re:Threads: Threat or Menace by Jerf · 2008-03-25 11:29 · Score: 1

Sometimes it can take a few decades (or centuries) for people to adopt something that has been rediscovered by many people in the past.
As usual, nothing new has been discovered since the 1960s.

The big difference now is that, for instance in Erlang, you can get an entire runtime built on the idea of message passing, with a library built on message passing, and many downloadable libraries also built on message passing, etc. It's not somebody's thesis and proof of concept, it's a set of deployable libraries and a small, but real, set of real-world applications.

If you do get into Erlang, or are thinking about it, you should consider downloading EJabberD and poking at the source for that. I work with that code professionally, and it took me a while to really wake up to how beautiful the code is; this massively concurrent messaging system that is the very definition of a non-embarrassingly parallel problem (all sorts of dependencies), and there isn't a lock as far as the eye can see. And it really does work in real deployments under heavy load; it's not one of those things that says it works when it really means "it ought to work in theory" or one of those open source projects that describe what they hope to accomplish as if they're already the leading library that does it... it really works. The code says what it does and it does what it says. It's really quite impressive.

Oh, and it clusters across multiple machines too. No big deal, right...?

mod parent up etc. by Nursie · 2008-03-25 00:26 · Score: 1

Though you say "they're not using a parallel programming language appropriate for their target."

I think too much emphasis is put, by some, on using a high level language that is specifically designed for parallelism. Personally I've always found C and POSIX threads more than adequate.

Easy by microbox · 2008-03-25 00:35 · Score: 1

Name a single real world problem that doesn't parallelize.

Handling event propagation through a hierarchy of user interface elements.

--

Like all pain, suffering is a signal that something isn't right

Re:Easy by Chandon+Seldon · 2008-03-25 01:51 · Score: 1

That's an implementation detail in your GUI toolkit, not a computing problem. Hell, it'd probably be possible to design a programming language that did static binding for event callbacks without changing the behavior - thus reducing that operation to zero runtime computation.

--
-- The act of censorship is always worse than whatever is being censored. Always.
Re:Easy by microbox · 2008-03-25 02:08 · Score: 1

The real cost is in all of the different GUI elements that want to use event information and then respond in turn with their own events. This creates a deterministic event propagation model where one thing leads to another. Parallel programming just makes the deterministic aspect harder, while gaining pretty much nothing on the propagation side of things. That's because Widget A must finish processing the event before Widget B gets its hands on it.

GUI toolkits have evolved towards dynamic encapsulation of logic for a reason. Qt even went to the onerous trouble of extending the C++ language so that signals and slots could be used instead of clunky callbacks. GUI programming is much more straightforward in dynamic languages such as Smalltalk and Flex.

Static binding of event callbacks might save you a few hundred nanoseconds, at the expense of a few thousand extra lines of code. And you still haven't made the argument that performance can be improved by threading the event propagation model.

--

Like all pain, suffering is a signal that something isn't right
Re:Easy by mrlpz · 2008-03-25 02:42 · Score: 1

The act of censoring is NOT always worse than that which is being censored. Ask any victim of child pornography. Insensitive clod.
Re:Easy by Anonymous Coward · 2008-03-25 04:10 · Score: 0

So loosely couple the UI elements with their logic counterparts and do all your event processing in the latter. Only introduce the former when you actually need to update the screen.
Re:Easy by iamacat · 2008-03-25 08:04 · Score: 1

I would ask, but I haven't heard of any victims of child pornography. I did read some news about victims of child exploitation. Invariably, they would never be found and rescued if the images documenting their abuse were not published on Internet. Perhaps we should focus on stopping the crimes instead of censoring the evidence? I am sure some people enjoyed viewing Abu Ghraib photos, but it's doubtful that the abuse would have been stopped as quickly if they were not publicly disseminated.
Re:Easy by Chandon+Seldon · 2008-03-25 09:44 · Score: 1

I'm not saying that static binding of event callbacks is the right answer, only that the current model that you describe in such detail isn't the only way to do it.
In fact, some have argued that the problem of interactive GUI applications is innately concurrent: http://video.google.com/videoplay?docid=810232012617965344

--
-- The act of censorship is always worse than whatever is being censored. Always.

I'm surprised nobody's mentioned ... by jamesswift · 2008-03-25 00:37 · Score: 1

functional languages. It seems to me one of the most promising angles on this problem is the resurgence of functional languages such as Haskell, Lisp and F#, and even the adoption of concepts from that world in languages such as Python. As for US and European interest, Microsoft Research, for example, has some excellent papers on possible solutions, e.g. Software Transactional Memory http://research.microsoft.com/~simonpj/papers/stm/ and STM for C# http://research.microsoft.com/research/downloads/Details/6cfc842d-1c16-4739-afaf-edb35f544384/Details.aspx

I personally suspect fine-grained parallelism is unlikely for the foreseeable future, for reasons such as the existing knowledge of employees and legacy code. But hybrid solutions look more plausible: shifting heavy computation to languages suited to easily writing concurrent code (e.g. F#), tied into imperative languages (e.g. C#) for the event-driven side. Who needs a massively parallel GUI anyway? Very few applications right now.

--
i wish i could stop

It's been mainstream for years by Nursie · 2008-03-25 00:38 · Score: 2, Insightful

You just haven't noticed.

Multi process apps have been common in the business and server app space for almost two decades.
Multi thread apps have been common in the business and server world for a few years now too.

To all having the will it/won't it go mainstream argument: You missed the boat. It's mainstream.

Re:It's been mainstream for years by sonofagunn · 2008-03-25 02:20 · Score: 1

Exactly. Web servers, J2EE app servers, search engines, databases, compilers, web browsers, pretty much everything is already multithreaded. Why the panic?

New or old programmers, still a HARD problem by pcause · 2008-03-25 00:42 · Score: 2, Interesting

Parallel programming is simply harder than typical sequential programming. Not only does the design take more time and effort, but the debugging is VERY much harder. Tools for parallel programming are poor, and debugging tools are basically pathetic. Worse, today's project and development methodologies focus on getting something up and hacking, not on the careful upfront design that is needed to really parallelize things. We get most of our parallelism from the web server being multi-threaded and the database handling concurrency.

As many have said, large-scale parallel systems are not new. Just because we need a solution to the problem doesn't mean that it will appear any time soon. Some problems are very difficult and involve not only new technologies and programming models but major re-educational efforts. There are many topics in physics and mathematics that only a small number of people have the intellectual skill and predilection to handle. Of all the college graduates, what percent complete calculus, let alone take advanced calculus? Pretty small number.

My prediction is that the broad base of programmers will have the tools and be able to do some basic parallelism. A small number of programmers will do the heavy-duty parallel programming, and this will be focused on very high value problems.

BTW, this Intel guy, while addressing a real issue, seemed to be doing marketing for his toolkit/approach. Sounds like a guy trying to secure his budget and grow it for next year.

Threading is a headache pure and simple by microbox · 2008-03-25 00:53 · Score: 2, Interesting

There are two problems: thread synchronization, and race conditions.

Race conditions
A modern CPU architecture uses multiple levels of cache, which aggravate the race condition scenario. For a programmer to write multi-threaded code and "not worry", the architecture would have to read and write every value to memory. That worst-case behaviour is only needed in a tiny fraction of cases, so the compiler can do much better by working with the memory model of the architecture instead of assuming that no memory model is in place.

Any significant improvement in speed will require solving the race condition problem. I believe research into transactional memory may be the way to go - but it is still half the speed of normal memory access.

The synchronization problem
This requires that the programmer reason about multiple threads (at least two). It doesn't really matter what building blocks you're using; you simply must be aware that two pieces of code are effectively executing at the same time, or at least that their CPU slices are interwoven. This type of reasoning significantly raises the bar for writing bug-free code, because a whole new class of subtle problems arises. The mind must wrap itself around something that is simply more complex, and we have trouble enough with single-threaded programs.
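A classic illustration of that interleaving is the lost update: `count += 1` is a read, an add and a write, and two threads can be interleaved between those steps. Here is a minimal Python sketch, assuming a shared counter; the class and helper names are hypothetical. The lock makes each read-modify-write indivisible. Without it, CPython may or may not lose updates on any given run, which is exactly why such bugs are hard to reproduce.

```python
import threading

class SharedCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old value
        # and both write back old + 1, silently losing one increment.
        with self._lock:
            self.value += 1

def hammer(counter, n_threads=8, per_thread=10_000):
    """Increment the counter from several threads at once."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```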

I like the flex/javascript solution. Just one thread - with asynchronous-like behaviour through events. It means there are no race conditions, and you can get enough async behaviour to get the job done. I hope, in the future, that both these technologies allow you to create a separate "process" (thread but with different memory context) that executes asynchronously and returns the results on the main event queue. It's a bit clunky, but at the same time it's much more idiot-proof. And I'm speaking as someone who's done a reasonable amount of parallel programming.
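The design hoped for here (one thread owns all state, background work reports back through the main event queue) can be sketched in Python. This is an illustration under assumptions: it uses a thread rather than a separate memory context, so the isolation is by convention only (the worker is handed just its argument), and `run_in_background` and `event_loop` are hypothetical names.

```python
import queue
import threading

def run_in_background(func, arg, event_queue, tag):
    """Run func(arg) on another thread and deliver the result as an event."""
    def worker():
        # The worker sees only its argument, never the application state.
        event_queue.put((tag, func(arg)))
    threading.Thread(target=worker, daemon=True).start()

def event_loop(event_queue, expected):
    """The single thread that owns and mutates all application state."""
    state = {}
    while len(state) < expected:
        tag, result = event_queue.get()   # results arrive as ordinary events
        state[tag] = result
    return state
```

The clunkiness the poster mentions shows up in the tags: the main loop has to match each result back to whatever asked for it.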

--

Like all pain, suffering is a signal that something isn't right

Re:Threading is a headache pure and simple by SanityInAnarchy · 2008-03-25 04:43 · Score: 1

I like the flex/javascript solution. Just one thread - with asynchronous-like behaviour through events.

Which means your own app can't be threaded. Which is fine, for many things, and unacceptable, for many others.

It means there are no race conditions

I suppose it depends how you define "race condition", but they certainly do exist, especially when you get better at wrapping that asynchronous behavior. I've certainly written JavaScript apps which could be broken badly by pressing a particular key too quickly.

The benefit is, of course, that you don't have to think about certain types of synchronization at all. The entire system is effectively locked while your one thread executes. If you need to be sure a particular object isn't modified until you're done with it, simply don't make any asynchronous calls until... But you may have to make those asynchronous calls anyway. Which means you'll probably want to add some sort of "lock" of your own -- a boolean that says "busy" or something.

Basically, Javascript is cooperative multitasking. And you still have problems -- I would much rather see parallel programming solved at a more fundamental level.
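The home-made "busy" lock can be sketched with Python's asyncio, which is single-threaded and cooperative in the same way as the JavaScript event loop. The `SaveButton` class is hypothetical, and `asyncio.sleep` stands in for a real asynchronous server call. Checking and setting `busy` is safe only because nothing awaits between the two; slip an `await` in between and the key-press race described above comes back.

```python
import asyncio

class SaveButton:
    """Hypothetical handler for a key that triggers an asynchronous save."""

    def __init__(self):
        self.busy = False    # the home-made "lock"
        self.saves = 0

    async def on_key(self):
        if self.busy:        # a second key press arrived mid-save
            return False
        self.busy = True     # no await between the check and the set
        try:
            await asyncio.sleep(0.01)   # stands in for an async server call
            self.saves += 1
            return True
        finally:
            self.busy = False

async def press_twice_quickly(button):
    # Both handlers are scheduled before the first one finishes.
    return await asyncio.gather(button.on_key(), button.on_key())
```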

I hope, in the future, that both these technologies allow you to create a separate "process" (thread but with different memory context) that executes asynchronously and returns the results on the main event queue.

It sounds like coroutines. Am I close?

--
Don't thank God, thank a doctor!

Parallelism In the Real World by ChaoticCoyote · 2008-03-25 00:55 · Score: 1

I've made a good chunk of my living from writing high-performance software using parallel algorithms, in C, C++, Fortran, and Java.

My clients? Britain, Brazil, Taiwan, and (yes) Omaha, Nebraska, USA.

Over the last quarter century of coding for a living, the greatest interest in advanced algorithms has been shown by my overseas customers. American companies tend to be conservative and bottom-line oriented. "Foreign" nations emphasize a broad education and creative thinking, thus making them more amenable to complex and new ideas, whereas the United States is focused on producing more MBAs -- and that difference influences everything in society, including software design.

--
All about me

duh... by Uzik2 · 2008-03-25 00:55 · Score: 1

* It's a difficult problem with no easy solutions.
* people with experience have jobs.
* You keep your job by delivering working solutions, not research.
Connect the dots.

--
-- Programming with boost is like building a house with lego. It's cool, but I wouldn't want to live in it

A revolution in our midst by seclar · 2008-03-25 01:01 · Score: 1

It's hard to appreciate, but we are in the middle of a revolution. High-performance, multi-core computing has made significant progress in the last few years and it's causing a lot of headache for software developers who try to exploit the potential power available. Multi-threaded programming does not bring a solution to the masses - understanding and preventing the issues arising from concurrency requires a lot of exposure to it.

The young generation of programmers have the confidence to try out multi-core programming, but only because they yet lack the painful experience of doing it. Whilst hardware is running away towards 100 cores in 5 years' time and still keeping up with Moore's "Law", software is sadly lacking such advances. There's a very good white paper over on the 2 Cubed website about the problem, "The Multicore Revolution", and a new framework just out called infoQuanta which looks interesting.

Meanwhile, Sun is tackling the problem from within the language. They've got Fortress, which has a certain Java/C syntax about it. I haven't used it, but I would say that it doesn't look like the 21st-century solution that's going to advance software development significantly.

Re:A revolution in our midst by xenopizen · 2008-03-25 04:58 · Score: 1

When I write back-end systems, the performance isn't considered a software issue but a hardware issue. With virtualization, virtual servers often only have a single core anyway. What annoys me is when people use this as an excuse to write poor software. For me, the big thing about parallel software, and why we need it, is latency: the ability of a system to be resilient to obstacles, such as slow or dead database servers or any other external system you rely on to some extent. Linear code stops at these obstacles. Parallel code can get on with other stuff and cope. So we need it. Interesting white paper. Going to see what my (smarter) friend thinks about that framework.

no one uses multi-threading? by greenrom · 2008-03-25 01:06 · Score: 1

If you run Windows, go to task manager and enable the threads column in the process window. On my system, there are very few processes that only use one thread. I've done a lot of work in embedded applications, and even there most products I worked on were heavily multi-threaded. Even in the simplest embedded applications without an operating system, there is usually some parallelization via interrupts. This raises the question, if so much software both application and embedded is multi-threaded, why do so many people on Slashdot feel that multi-threaded programming isn't useful / too difficult / etc? What kind of software are you guys writing?

real world problem by oni · 2008-03-25 01:09 · Score: 5, Funny

> Name a single real world problem that doesn't parallelize.

Childbirth. Regardless of how many women you assign to the task, it still takes nine months.

(feel free to reply with snark, but that's a quote from Fred Brooks, so if your snarky reply makes it look like you haven't heard the quote before you will seem foolish)

Re:real world problem by Nursie · 2008-03-25 01:39 · Score: 4, Funny

True, but it scales well, you can have multiple child instances in the same nine months by throwing more women at the problem.
Re:real world problem by spun · 2008-03-25 02:28 · Score: 1

Well, if you're making mythical children, you could probably do it in a month. If you had enough men.

--
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
Re:real world problem by JAlexoi · 2008-03-25 02:44 · Score: 1

The quote originally meant that you can't speed up tasks by adding more parallel work.
But we are not speeding up tasks, we are doing more at the same time. That is what parallelism means.
So it's you that seems foolish.
Re:real world problem by neomunk · 2008-03-25 03:14 · Score: 1

Your ideas interest me and I would like to sign up as a test subject when your work gets to the trial stage. :-D
Re:real world problem by TheLink · 2008-03-25 03:58 · Score: 1

Enough men? I think there are very many Slashdotters who would sign up to make mythical children with mythical women for a month or however long it takes.

After all many Slashdotters already have years of "hands on" experience making virtual children with virtual women.

I'm sure they'd be game to tackle the learning curves head on.
--
- Too many replies beneath your current threshold
Re:real world problem by andphi · 2008-03-25 04:22 · Score: 5, Funny

There is, however, the problem of process upkeep. One child process is a resource-hog. Multiple child processes seem to consume resources exponentially rather than linearly. How do you propose to optimize bandwidth to the mother process(es), supply sufficient inputs to the child process(es), or redirect the child process(es)' regularly scheduled core dumps? As a note, mother and child processes don't respond well to preemptive multitasking.
Re:real world problem by fahrbot-bot · 2008-03-25 04:33 · Score: 1

Childbirth. Regardless of how many women you assign to the task, it still takes nine months.
True, but it scales well, you can have multiple child instances in the same nine months by throwing more women at the problem.

Also true, but merging the result set is problematic.

--
It must have been something you assimilated. . . .
Re:real world problem by Nursie · 2008-03-25 04:36 · Score: 1

"Also true, but merging the result set is problematic."

I think they solved that to some extent in UT/MormOS, but I hear it requires quite a specialised mindset to work with it.
Re:real world problem by Nursie · 2008-03-25 04:43 · Score: 1

Well played sir :)

However, with the power of multiple cores, a machine should be able to run multiple jobs at the same time. Even with that said, however, the newer model does mean that you're going to run out of cache much faster.
Re:real world problem by Zero__Kelvin · 2008-03-25 05:11 · Score: 1

"Name a single real world problem that doesn't parallelize."
"Childbirth. Regardless of how many women you assign to the task, it still takes nine months."

Funny perhaps, as long as you don't think about it. In 9 months with one "processor", one unit of output. In 9 months with 2000 "processors" a number which approaches 2000, but is somewhat less. Sounds like it parallelizes almost perfectly to me. Yes there is a nine month propagation delay before process output begins, but we call that the pipeline ;-)

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Re:real world problem by andphi · 2008-03-25 05:12 · Score: 1

Ah, yes: run multiple jobs. That seems to work. Or, if the mother process doesn't lock up, let her run jobs as well before she starts to instantiate the child process, so you can both store future inputs.

My preference is to maintain a single mother process and as many child processes as come along. Usually, the child processes are spawned serially, but occasionally the instance forks spontaneously. In those cases, just hope and pray for a small number of child processes of the same byte-sex. That way you can delay drive partitioning for years. Eventually, each process will need its own semi-private /proc space, of course.

Am I stretching this metaphor too far?
Re:real world problem by Anonymous Coward · 2008-03-25 05:21 · Score: 1, Funny

if your snarky reply makes it look like you haven't heard the quote before you will seem foolish

Ah, good to know. I'll make sure it sounds like I've heard the quote.
Re:real world problem by jc42 · 2008-03-25 06:41 · Score: 1

you can have multiple child instances in the same nine months by throwing more women at the problem.

True, and this corresponds directly to the approach to parallelism that the more experienced programmers have found works best: Divide the task among independently-running processes (mothers). Each one can devote its time and resources to its piece of the task, and it scales up to millions of processors (mothers) simultaneously.

Even in the case of colonial species like ants and bees, with a single mother per hive and thousands of young developing at any time, the work still isn't done inside the one mother. Rather, she produces only small fertilized eggs, and hands them over to workers for care and development. This similarly provides for many independently-running development processes, with the "processors" (workers) able to switch rapidly between larvae as needed.

Those of us who have worked on single-process ("threaded") parallelism have learned the hard way why Ma Nature has repeatedly done it this way. With a single process, the coordination and synchronization rapidly becomes so expensive and complex that the task typically runs slower than the version with many separate processes. It may be true that a few special tasks have been found (e.g., array arithmetic) that work well in parallel within a single process. But very few such tasks are known, and attempting single-process parallelism with most things we've done just turns out to be a loser.

Debugging is especially difficult, since multi-threaded tasks are generally not reproducible. This eliminates most of our debugging tools. Nature had billions of years to experiment; she could throw resources at the task and ruthlessly prune the huge majority of failures. We haven't had billions of years (or even hundreds) to develop software, though, so it's not surprising that we've done poorly at something that nature has generally given up on because it doesn't work well.

And employers generally don't look kindly on an approach of "Write thousands of programs, trying to find one that works, and throw the rest away".

(Actually, people have tried to use ants and bees as software models. There are some interesting lessons, especially for network operations. But the metaphor sorta falls apart if you look at it too closely. An ant isn't very much like a process at the atomic/bit level. ;-)

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:real world problem by jbengt · 2008-03-25 08:53 · Score: 1

Sounds like it parallelizes almost perfectly to me.

If the problem is to make as many babies as you can.
But if the problem is to make one baby, you can't speed it up by putting more people on the job.

(The way I heard it was: "No matter how many men you put on the job, it still takes 9 months to make a baby.")

The saying was originally coined to illustrate tasks like writing software or constructing a building. Those sorts of projects can break down if you put too many people on them because you waste too much time coordinating efforts (you can have one crew install the water piping, another the electrical, and a third install the floor; and you can manage to coordinate them, but how many crews can work on a floor at a time without tripping on each other? You're certainly not going to have hardwood floors being finished at the same time a drywall crew is sanding.) Also, you can't speed some things up - like struggling with the city for a year until you finally get a signed-off building permit after the contract deadline for completing construction has passed.
Actually, it is starting to sound like a parallel programming problem.

Anyway, in my useless opinion, almost any programming problem could be sped up by use of parallel processors, but often only if it's cheap enough for processors to spend significant amounts of time sitting there idle, waiting for responses.
Re:real world problem by hobbit · 2008-03-26 06:30 · Score: 1

In 9 months with one "processor", one unit of output. In 9 months with 2000 "processors" a number which approaches 2000, but is somewhat less.

The expected output for one "processor" is somewhat less than one unit too, for exactly the same reason...

--
"Wise men talk because they have something to say; fools, because they have to say something" - Plato

UK used to be very interested in parallel dev by gilesjuk · 2008-03-25 01:10 · Score: 1

In the UK we produced a Cell-like CPU long before Sony's much-hyped Cell processor. We even developed a parallel programming language for it.

This was back in the days of 8 and 16-bit processors and the fear was that 16-bit and 32-bit processors weren't going to produce the processing power required in the future.

Obviously these fears were unfounded as the MHz increased dramatically as did caching and other tricks to increase performance.

But those looking for some inspiration should look up the Inmos Transputer and Occam.

Re:UK used to be very interested in parallel dev by SpinyNorman · 2008-03-25 04:11 · Score: 1

Yep - the Transputer & Occam were way ahead of their time. What was so great about the combo was that they (the processor & language) were designed together, so that multi-processor language constructs mapped directly onto the hardware.

In a language such as C++, inter-thread communication is implemented via constructs such as mutexes, condition variables and queues that need to be implemented in software and involve the scheduler (sleep until the condition variable is signalled). However, in Occam parallelism is built into the language itself, and "threads" communicate by reading and writing to communication channels connected to other threads. These communication channels are directly implemented in hardware, so that if you run your Occam program on a multi-Transputer network, individual threads will be mapped to separate chips and the communication channels mapped onto the hardware inter-Transputer links.
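Occam's channel style can be imitated, though not reproduced, with ordinary threads and queues. A minimal Python sketch of a three-stage pipeline, assuming `queue.Queue(maxsize=1)` as a stand-in for a channel; unlike a real Occam channel (let alone a Transputer link), it is a one-slot buffer and runs entirely in software. The stage functions are hypothetical.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down each channel

def producer(out, n):
    for i in range(n):
        out.put(i)
    out.put(DONE)

def squarer(inp, out):
    # Reads from one channel and writes to the next; it shares no
    # variables with its neighbours.
    while (x := inp.get()) is not DONE:
        out.put(x * x)
    out.put(DONE)

def collector(inp, results):
    while (x := inp.get()) is not DONE:
        results.append(x)

def run_pipeline(n):
    # maxsize=1 approximates an Occam channel's hand-off, but the writer
    # does not wait for the reader to take the value, as Occam's would.
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    results = []
    stages = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=collector, args=(b, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```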

I agree with some of your points... by Nursie · 2008-03-25 01:14 · Score: 1

But not all. And there is a beautiful (IMHO) compromise.

"First, they give the developer an illusion of simplicity in the beginning of development. Each thread minds its own business and inevitably follows its state machine that is plainly visible as a sequential program: receive this, compute that, save the result, send the response, repeat from beginning."

Yes, they do. They also make sense in many ways as a natural actor-driven state of affairs. Client A opens comms with the middle tier, the middle tier spawns a thread that deals with client A and its requests to underlying data sources. Client B connects, spawn another thread, etc. It encapsulates things quite well. Not that this is a good way of doing things, but more on that later.

Windows does support the whole async IO thing now, as far as I'm aware, but that was an issue on some of the earlier Windows platforms.

"I've had my share of trying to sew together the multithreaded solutions of less timid developers."

I can certainly appreciate that. Finding meaning in large chunks of other people's code is tricky at the best of times. Add in a good measure of incompetence and a shake of threading, you have a recipe for incomprehensibility. Solution - hire decent staff.

"Locking is missing and cannot be added without deadlocks. It is also virtually impossible to find all places in the code where locking is missing and where it needs to be added."

Design, intelligence and good practice. As with your deadlock example, there are good ways around these situations and they tend to involve a bit of forethought. A needs to lock B and B needs to lock A? Why? Something is probably wrong with your design.

"The second big problem with multithreading is that the threading paradigm requires only one input at a time (after all, multiplexing was not allowed)."

This is not necessarily true. I'm a C programmer and have found that io multiplexing and threads can go hand in hand very nicely. The multiplexing is just a necessary part of writing a capable server or middle tier, and threading provides (when done well) effortless scalability.

"At a bare minimum if a thread is waiting on a socket, it must be receptive to another thread telling it to stop waiting because of a change in circumstances"

There are various interrupt mechanisms available, or one can have a message pipe that a thread polls along with the socket. This is a bit messy, but it doesn't require periodic polling or anything else so inefficient.
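The "message pipe that a thread polls along with the socket" can be sketched in Python with `select.select`. The sketch uses `socket.socketpair()` for both the data connection and the wake-up channel, because on Windows `select` only accepts sockets, so a plain `os.pipe()` would not work there. `wait_for_data` and `demo` are hypothetical helpers.

```python
import select
import socket
import threading

def wait_for_data(data_sock, wake_sock):
    """Block until the data socket is readable or another thread pokes
    the wake-up socket. Returns the bytes read, or None if told to stop."""
    readable, _, _ = select.select([data_sock, wake_sock], [], [])
    if wake_sock in readable:
        wake_sock.recv(1)            # drain the wake-up byte
        return None
    return data_sock.recv(1024)

def demo(stop_early):
    data_rx, data_tx = socket.socketpair()
    wake_rx, wake_tx = socket.socketpair()   # the "message pipe"
    result = {}

    def waiter():
        result["value"] = wait_for_data(data_rx, wake_rx)

    t = threading.Thread(target=waiter)
    t.start()
    if stop_early:
        wake_tx.send(b"x")           # change of circumstances: stop waiting
    else:
        data_tx.send(b"hello")
    t.join()
    for s in (data_rx, data_tx, wake_rx, wake_tx):
        s.close()
    return result["value"]
```

The waiting thread sleeps in `select` the whole time; nothing wakes up periodically to check a flag.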

"It follows that experienced developers generally prefer the single-threaded, event-driven paradigm"

Remove the word "single-threaded" from that and I agree. One of the best solutions (IMHO) for a scalable and yet robust system is the thread pool. You have a manager thread that is responsible for polling various file descriptors and timers, you have a variety of job queues, and you have a pool of threads that take jobs off the queue and run them. You then implement the rest of the program as a series of discrete jobs that (ideally) never block. Then you can tune the size of your pool to get the best out of the machine you're running on.
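A stripped-down version of the thread pool described above, as a Python sketch. It assumes jobs are plain function calls and leaves out the manager thread that polls file descriptors and timers (the caller plays that role by calling `submit()`). `ThreadPool` here is a hypothetical class, not the standard library's.

```python
import queue
import threading

class ThreadPool:
    def __init__(self, size):
        self.jobs = queue.Queue()      # the job queue workers pull from
        self.results = queue.Queue()
        self.workers = [threading.Thread(target=self._worker) for _ in range(size)]
        for w in self.workers:
            w.start()

    def _worker(self):
        while True:
            job = self.jobs.get()
            if job is None:            # shutdown sentinel
                return
            func, args = job
            self.results.put(func(*args))

    def submit(self, func, *args):
        self.jobs.put((func, args))

    def shutdown(self):
        # Sentinels go in after all queued jobs, so every job is taken
        # before any worker sees its sentinel; join waits for the last ones.
        for _ in self.workers:
            self.jobs.put(None)
        for w in self.workers:
            w.join()
```

Tuning then comes down to the `size` argument. In CPython, CPU-bound jobs would additionally need processes to get past the global interpreter lock.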

Re:I agree with some of your points... by Anonymous Coward · 2008-03-25 01:57 · Score: 0

"A needs to lock B and B needs to lock A? Why? Something is probably wrong with your design."

It's called encapsulation. You want to be able to create an independent (of B), reusable component (A) that is ideally also thread-safe. Unfortunately, it turns out you cannot create such a thing. A and B have to agree on a policy (for example: A always locks before B or agrees to back off when trying the inverse order fails). So the design ideal (encapsulation) is commendable but unfortunately not possible with multithreading.
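The "agree on a policy" fix can be sketched as a single global lock order. This minimal Python sketch assumes ordering by `id()` is acceptable (any fixed global rank would do), and the helper names are hypothetical. It also illustrates the poster's point: the ordering rule lives outside both A and B, so neither component is fully self-contained.

```python
import threading

def locked_in_order(*locks):
    """Acquire the locks in one global order (here: by id()), whatever
    order the caller listed them in. Returns them for release()."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release(ordered):
    for lock in reversed(ordered):
        lock.release()

def worker(first, second, rounds, counter):
    # Callers may name the locks in either order without deadlocking,
    # because the actual acquisition order is always the same.
    for _ in range(rounds):
        held = locked_in_order(first, second)
        try:
            counter[0] += 1
        finally:
            release(held)
```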

In contrast, event-driven programming with proper event queues (complete with closures and garbage collection) makes it possible to encapsulate dynamic objects safely and intuitively.

"One of the best solutions (IMHO) for a scalable and yet robust system is the thread pool."

It's true that battle-hardened multithreaders offer thread pools as an antidote to some of the woes of multithreading. Unfortunately thread pools in my opinion are a far messier approach than asynchronous programming and only solve the unnecessary problem of thread proliferation.

In cases when you truly need to parallelize (or background) tasks, you are usually best advised to use full-blown processes.
Re:I agree with some of your points... by Nursie · 2008-03-25 02:09 · Score: 1

"It's called encapsulation."

Yes, and your hypothetical design has it wrong. Encapsulation is perfectly possible in a multithreaded world. I'm a C programmer and it's fairly trivial to make thread-safe APIs. I don't quite get your A and B example, could you make it more explicit? Are they both trying to access the same resource? If so, the synchronisation ought to take place at the resource level.

"It's true that battle-hardened multithreaders offer thread pools as an antidote to some of the woes of multithreading."

I'm not proposing it as a solution to the supposed "woes" of multithreading. There aren't any in my experience, once you've learned the ropes. I'm proposing it as a parallelised (and superior) way to achieve the same results as your asynchronous event-driven programming. It uses many of the same techniques (async IO, polling, signals etc.) but also scales effortlessly in ways that the async event-driven single-threaded app cannot. Thread pools don't only solve the problem of thread proliferation; they solve the problem of utilising multiple hardware threads/processors/cores transparently and without needing to refactor your whole design. They also don't suffer from the complexities surrounding full IPC in terms of shared memory.
Re:I agree with some of your points... by Anonymous Coward · 2008-03-25 02:39 · Score: 0

"I don't quite get your A and B example, could you make it more explicit?"

A is a user session. B is a connection. A single session owns zero, one or more connections.

The user session receives a "ping broadcast" request from the user. It locks itself so the list of connections is not altered during the broadcast and in a for loop asks each individual connection object to emit a ping. To update statistics atomically, the connections lock themselves one by one.

However, simultaneously one of the connections receives a timeout triggering a disconnection and removal from the user session. The connection locks itself and calls back the session object indicating an immediate dissociation. The dissociation indication causes the session to lock itself before removing the connection object from the list of connections.

The user request and the timeout take place in separate threads. With the right timing, a deadlock is reached.
Re:I agree with some of your points... by Nursie · 2008-03-25 02:51 · Score: 1

"However, simultaneously one of the connections receives a timeout triggering a disconnection and removal from the user session. The connection locks itself"

Sets its "I am dead, don't ping me" flag, unlocks itself.

and calls back the session object indicating an immediate dissociation. The dissociation indication causes the session to lock itself before removing the connection object from the list of connections.

The user request and the timeout take place in separate threads. With the right"

locking strategy, a deadlock is avoided.

Sorry, I've begun to sound both confrontational and smug (in my head) and that's not how I wanted to come across. Sure, there are things to think about when going with threaded solutions, and situations to avoid that a novice thread programmer may miss. In many cases though it's just making sure that locks aren't held for a moment more than necessary and that state information is maintained.
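The fix described above can be sketched in Python. The class and method names (`Session`, `Connection`, `broadcast_ping`, `timeout`) are hypothetical stand-ins for the Anonymous Coward's scenario. The key step is that `timeout()` releases the connection's lock before calling back into the session, so no thread ever holds a connection lock while waiting for the session lock.

```python
import threading

class Session:
    def __init__(self):
        self.lock = threading.Lock()
        self.connections = []

    def add(self, conn):
        with self.lock:
            self.connections.append(conn)

    def remove(self, conn):
        with self.lock:
            if conn in self.connections:
                self.connections.remove(conn)

    def broadcast_ping(self):
        # As in the original scenario: hold the session lock for the whole loop.
        with self.lock:
            return sum(conn.ping() for conn in self.connections)

class Connection:
    def __init__(self, session):
        self.lock = threading.Lock()
        self.session = session
        self.dead = False
        self.pings = 0

    def ping(self):
        with self.lock:
            if self.dead:          # "I am dead, don't ping me"
                return False
            self.pings += 1
            return True

    def timeout(self):
        with self.lock:
            self.dead = True
        # The connection lock is released here, before the session lock is
        # taken, so the session -> connection order is never reversed.
        self.session.remove(self)
```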

hardware gets ahead of software by peter303 · 2008-03-25 01:42 · Score: 1

Multicore, supercomputers, same endless crap. Vendors claim they have great hardware, but ship crappy software tools.

Call me suspicious... by rnturn · 2008-03-25 01:48 · Score: 1

But when I hear:

"Intel's James Reinders ... programming for multi-core is catching the imagination of programmers more in Japan, China, Russia, and India than in Europe and the United States."

uttered by a rep from one of the companies that lobbies hard for more and more visas to bring in more foreign IT workers, I have to wonder if this isn't just more BS in support of that effort. It sure doesn't help when he follows it up with:

"We see a significantly HIGHER interest in jumping on a parallelism from programmers with under 15 years experience, verses programmers with more than 15 years."

So not only are you domestic programmers not showing sufficient interest in parallel programming, you're also too old (i.e., more costly) so plan on being replaced by younger, cheaper, foreign programmers.

I fully expect that the next round of Congressional hearings on increasing the cap on visas will include this line of reasoning.

--
CUR ALLOC 20195.....5804M

No issue with parallel programming anywhere. by sonofagunn · 2008-03-25 01:49 · Score: 2, Insightful

For your examples you threw out a bunch of problems that are currently being parallelized just fine by today's software, which would indicate we're not having problems with parallel programming. Name a search engine, database query engine, weather simulation package, etc., that isn't already multithreaded. Where's the issue?

Problems that can and should be parallelized in software already are for the most part. There is no issue here.

Business processes are often serial (step B depends on the output of step A). That's what a lot of corporate programmers work on. And even these have steps that are done in parallel, or can run multiple instances of a process in parallel. Anyone working on a web application or J2EE infrastructure is probably running lots of small, serialized problems in parallel. Again, there is no issue here.

Nonsense by Nursie · 2008-03-25 01:51 · Score: 1

You simply build scalability into the app from the word go. If you're really clever you can build in some tunable parameters, like the size of a thread pool, and have your app make good use of whatever's available.
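
One way to expose the thread-pool size as a tunable parameter, sketched in Python with `concurrent.futures`; `handle_request` is a hypothetical placeholder for the app's real work. Note that in CPython, CPU-bound Python code does not run in parallel across threads because of the GIL; `ProcessPoolExecutor` offers the same interface for that case.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    # Hypothetical unit of work standing in for whatever the app does per request.
    return request_id * 2

def serve(requests, pool_size=None):
    # Tunable parameter: default to the machine's core count, but let the
    # operator override it (e.g. from a config file) without code changes.
    if pool_size is None:
        pool_size = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in input order, whatever order they finish in.
        return list(pool.map(handle_request, requests))

print(serve(range(5)))               # [0, 2, 4, 6, 8]
print(serve(range(5), pool_size=2))  # [0, 2, 4, 6, 8]
```
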

I just don't get it (Paralellism) by scorp1us · 2008-03-25 01:57 · Score: 1

We hear, "Oh, threads are good," but then I read this, propagated by SQLite and written by someone at Berkeley.

Then there is my own observation. If I have an amount of work (w) to do n times, then the product (w*n = p) is the total work. If I queue it, then total wall time approaches w*n as well. If I tackle it as multi-threaded, then I start n threads which each do w work. However, because the threads compete for scheduling, and the OS has at least n more context switches, we actually reduce the amount of work being done in any given amount of time. In addition, the caches effectively become smaller, because each thread has to share cache space with the other threads. And we've also just added complexity to the processing, because you will likely need critical sections and mutexes.

Which leads me to my general conclusion (in terms of most software written): for general programs, threading should be done with one thread for each unique (in terms of algorithm) processing task. A GUI thread, a database thread, a static service thread. By separating these, you'd enforce a degree of abstraction and concurrency. And all that is needed are callbacks (to asynchronously process the value from the DB and update the GUI). The exceptions are non-general cases like executing a PHP script as part of a web server (where the algorithm depends on the script file itself), or a highly scalable large problem (e.g. sorting, dynamic programming).
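
A minimal Python sketch of the "one thread per kind of task, plus callbacks" layout, assuming a dictionary as a stand-in database and a plain queue standing in for the GUI toolkit's event loop (all names are hypothetical).

```python
import queue
import threading

def db_worker(requests):
    # The "database thread": the only thread that touches the (fake) database.
    fake_db = {"alice": 3, "bob": 5}
    while True:
        item = requests.get()
        if item is None:               # shutdown signal
            break
        key, callback = item
        callback(fake_db.get(key))     # hand the result back asynchronously

gui_events = queue.Queue()             # the "GUI thread"'s event queue

def on_result(value):
    # Runs on the DB thread, so it only posts an event; the GUI thread applies
    # it, which keeps all GUI state owned by a single thread.
    gui_events.put(("show", value))

db_requests = queue.Queue()
db_thread = threading.Thread(target=db_worker, args=(db_requests,))
db_thread.start()

db_requests.put(("alice", on_result))
db_requests.put(("bob", on_result))
db_requests.put(None)
db_thread.join()

# Stand-in for the GUI thread's event loop draining its queue.
shown = []
while not gui_events.empty():
    action, value = gui_events.get()
    shown.append(value)
print(shown)  # [3, 5]
```
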

Just my thoughts from an alleged pragmatic programmer.

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.

Re:I just don't get it (Paralellism) by SpinyNorman · 2008-03-25 04:00 · Score: 1

You should only expect a wall time speed-up from switching to a multithreaded design if you are executing on multi-core/multi-processor hardware where there is true execution concurrency.

However, in practice a switch to a multi-threaded design sometimes results in a speed-up even when run time-sliced on a single processor without any true concurrency, contrary to the (very minor) slowdown you note. Of course this doesn't always happen, but the reason it occasionally does is that multi-threading as a design tool can often lead to much cleaner and more efficient designs (and maybe also more cache hits between timeslices, due to the smaller code per thread versus the monolithic alternative). Note that with a modern OS you're going to be timesliced anyway (even in a single-threaded app) because the OS itself is running threads.

Sorry, not quite the relevant issue by Anonymous Coward · 2008-03-25 02:01 · Score: 0

Most of the ones you listed (search, neural nets, compilation, and database) are ALL memory-bandwidth bound. It doesn't matter if they parallelize well, because your processor will just be sitting there waiting on cache misses. FEA and weather simulation (similar) may not be, if the update steps are sufficiently computationally intensive and/or the reuse pattern is good.

Parallelization is about *data locality* not parallel processing. So few people understand this that it's shocking. This is why the Cell beats the pants off much faster CPUs -- it's harder to program, but once you've done it you've figured out the data locality.

So it doesn't really matter if you can run them across 100 cores/processors, you'll still only get 1% utilization, as is typically seen on most supercomputers.
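
Loop tiling (blocking) is one common way to restructure code around data locality. It is swapped in here as an illustration and is not taken from the post. The Python sketch below shows only the access pattern: Python lists are not contiguous arrays, so expect no cache speedup from it here, since the payoff appears in languages like C or Fortran. The default block size of 32 is an arbitrary assumption.

```python
def transpose_naive(a):
    n, m = len(a), len(a[0])
    return [[a[i][j] for i in range(n)] for j in range(m)]

def transpose_blocked(a, block=32):
    # Loop tiling: walk the matrix in block x block tiles so that, with
    # contiguous arrays, both the rows being read and the rows being written
    # stay in cache while one tile is processed.
    n, m = len(a), len(a[0])
    out = [[None] * n for _ in range(m)]
    for ii in range(0, n, block):
        for jj in range(0, m, block):
            for i in range(ii, min(ii + block, n)):
                row = a[i]
                for j in range(jj, min(jj + block, m)):
                    out[j][i] = row[j]
    return out

a = [[i * 5 + j for j in range(5)] for i in range(3)]
print(transpose_blocked(a, block=2) == transpose_naive(a))  # True
```
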

Maybe because we've seen the hype cycle before... by alispguru · 2008-03-25 02:10 · Score: 3, Insightful

Anybody else remember the great clock cycle stall of the 1980's? During that period, Moore's Law operated in a manner closer to its original statement - the big news was the drop in cost per transistor, not raw CPU speed. The general wisdom at the time was that parallelism was going to be the way to get performance.

And then, we entered the die shrink and clock speed up era, clock speeds doubled every 14 months or so for ten years, and we went smoothly from 60 MHz to 2 GHz. Much of the enthusiasm for parallel programming died away - why sweat blood making a parallel computer when you can wait a few years and get most of the same performance?

Clock speeds hit a wall again about five years ago. If the rate of increase stays small for another five years, the current cycle drought will have outlasted the 1980's slowdown. I have a great deal of sympathy for parallel enthusiasm (I hacked on a cluster of 256 Z80's in the early 80's), but I think it won't really take off until we really have no other choice, because parallelism is hard.

--

To a Lisp hacker, XML is S-expressions in drag.

simulation of single-core cpus? by Chirs · 2008-03-25 02:14 · Score: 1

Sure, there is a class of problems that is embarrassingly parallelisable. The ones you've mentioned above are mostly in this category.

However, there ARE problems that don't scale easily or appropriately. CPU simulation is one: you can break it down into functional areas, but at some point your inter-processor communication costs more than it buys you.

How about decryption of existing encoded data where the key changes based on data content? You could decrypt more than one data stream at a time, but that doesn't help you handle a single stream any faster. Changing algorithms could help with this, of course.

Some of the existing compression algorithms behave similarly, with the same current problem and the same future solution.
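
The decryption case above can be modeled with a toy Python sketch: the key for byte i+1 depends on plaintext byte i, so a single stream must be processed byte by byte, while independent streams can still be handled in parallel. The `next_key` schedule is made up for illustration and is not real cryptography.

```python
from concurrent.futures import ThreadPoolExecutor

def next_key(key, plain_byte):
    # Toy key schedule: the next key depends on the data just decrypted.
    # NOT real cryptography; it only models the serial dependency.
    return (key * 31 + plain_byte) & 0xFF

def encrypt(data, key):
    out = bytearray()
    for p in data:
        out.append(p ^ key)
        key = next_key(key, p)
    return bytes(out)

def decrypt(data, key):
    out = bytearray()
    for c in data:
        p = c ^ key
        out.append(p)
        key = next_key(key, p)   # byte i+1 cannot start until byte i is done
    return bytes(out)

# Several independent streams can still be decrypted side by side.
streams = [encrypt(msg, 7) for msg in (b"alpha", b"bravo", b"charlie")]
with ThreadPoolExecutor() as pool:
    plain = list(pool.map(lambda s: decrypt(s, 7), streams))
print(plain)  # [b'alpha', b'bravo', b'charlie']
```
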

I think that until recently we've gotten used to being able to do the same things faster than before. Now we're having to change the way we look at things, and try and convert those same problems into doing more things at once.

Bunch o' bloated bollocks for kids that can't code by Anonymous Coward · 2008-03-25 02:31 · Score: 0

Never has a truer phrase been uttered on /.

Objects have been touted by industry as a panacea for the better part of the last two decades... all they have given us is bloat and bugs.

The problem isn't "old dog" programmers, the problem is the vast number of mediocre programmers, old-beards and fresh-faced alike. Objects became the favored method because mediocre programmers could understand them.

(And before I get tagged as flamebait: I am not trying to insult all programmers, I'm merely pointing out that not everyone is a Rob Pike or a Miguel de Icaza.)

When McCarthy gave us LISP, he gave us the first step on the road to improved programming techniques; unfortunately, the keepers of that legacy are the high priests of Haskell and the Monad. Only programmers with a background in pure math can really get to grips with this stuff.

Thankfully there are some descendants of LISP that the mediocre among us can make sense of; Erlang is a good example of this, and there is enough capability in the compiler and the toolkits to take advantage of multiple cores right now.

The problem is that the mediocre masses (and their pointy-haired project managers) are stuck in the tar pit reciting object-babble to each other, because the IT industry is no longer about technology; it has reached the point in its evolution where only (business) methods and (management) procedures matter.

What really needs to happen is for the industry to realise one important thing about the preceding decade: distributed object systems were stillborn in the 90s, and the industry needs to put the paddles away and stop trying to revive the corpse.

Thread for concurrency or parallelism? by pikine · 2008-03-25 03:13 · Score: 1

The irony is that threads are only practical (from a correctness/debugging point of view) when there isn't much interaction between the threads.

It might seem ironic, but think about it. If threads interact with each other all the time, most of them will stay blocked just trying to synchronize access to some data. That's using threads for concurrency (I/O-bound tasks), not using threads for parallelism (computation-bound tasks).
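
A minimal Python sketch of threads used for concurrency rather than parallelism: five simulated I/O waits of 0.2 s each overlap, so the total is roughly 0.2 s instead of 1 s. `time.sleep` stands in for a blocking socket or disk call. It releases CPython's GIL, which is why the waits overlap even though no Python code runs in parallel.

```python
import threading
import time

def fake_io(results, i):
    time.sleep(0.2)      # stands in for waiting on a socket or disk
    results[i] = i       # each thread writes only its own slot

results = [None] * 5
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(results, i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(results, round(elapsed, 1))  # waits overlap: about 0.2 s, not 1.0 s
```
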

--
I once had a signature.

Parallel programming is easy. by jozmala · 2008-03-25 04:14 · Score: 1

All it takes is a good design.
The hard part is debugging after losing the ability to reproduce bugs with identical input.
As for 2 to many cores, it really isn't that hard for many applications. And for many others it doesn't really matter, because a single thread/process is fast enough.
One thing that could use it is GAMES, provided you are actually GUARANTEED to have that many processors available.

The synchronisations are easy if you design them properly. If your coding style is
"think little, try something, debug until it works", then you fail; the agile method for multithreaded code is just asking for trouble.

Design the application for parallelism instead of trying to parallelise each small block, and you get a far better result. Extreme waterfall for extreme numbers of processors.
And using the number of processors to scale up such an application is far easier than trying to speed up that same application.
Of course there are things that are hard to parallelise; luckily, most of the time you don't need to parallelise them.

Of course I want PCs capable of executing 100s of threads at the same time, with enough memory bandwidth for that.
Why?
Because the easily parallelised problems include:
game AI, optimizing compilers, graphics...

The hard problem is "Here's the code base, make it go faster by using more processors.", and that covers lots of real-world situations, but it doesn't mean that parallelism is hard. Fitting parallelism onto a non-parallel design is hard.
It's like trying to fit a square through a circle of equal area.
For 2 to many cores, it's hard for those whose entire code base is designed for one or two processors.

Still oversimplified by Moraelin · 2008-03-25 04:28 · Score: 1

First of all, it was just one example, of one problem. Not even the hardest, but something that everyone would understand easily.

There's a reason why you see "these types of simple threading problems." Not because you're teh uber-genius and everyone else is teh drooling retard, but because they're supposed to be just that: extremely simple examples. We're not doing a Ph.D.-level research paper on parallelization; we're shooting the breeze on a message board, for the benefit of people who may be nerdier than average, but who include a ton of non-programmers anyway. Trust me, it's not the kind of stuff that stumps everyone but you; it's the kind of stuff you could use to teach your mother about it.

Second, snap out of it. You're essentially doing the same "I'm a genius, everyone else obviously is an idiot" crap act that's actually most common among people who least understand the domain.

Believe it or not, yes, some of us do know what a "critical section" is. Java even has "synchronized" as a keyword in the language itself, so it's hard not to run into the concept.

But some of us also know what the performance penalties are. It doesn't come for free, you know. Sometimes it _is_ more efficient to just update each player's coordinates in a single loop, than to have a thousand threads which try to lock each other out every dozen lines of code.

More importantly, it's easy to design something that doesn't scale. A very common performance problem, for example, is when all processes wait on one resource that only one can use at a time. E.g., if most processing happens through a cache, and access to the cache is guarded by a mutex, congrats: 1000 threads on 1000 cores won't run much faster than 1 thread on 1 core. I've actually seen that exact problem in more than one web application.

Just having a concept of "critical sections" won't do jack squat for you there. You end up needing somewhat more advanced techniques, so you can go as deep as possible into the innards of that cache before you synchronize access. Java's java.util.concurrent package is there for a reason, and solving just that kind of problem is one example.
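
A minimal Python sketch of going "deeper into the cache" before synchronizing: lock striping, where keys hash to one of several independently locked segments, so threads touching different segments never contend. The class and its names are hypothetical. It illustrates the idea rather than reproducing any `java.util.concurrent` class, and the stripe count of 16 is an arbitrary assumption.

```python
import threading

class StripedCache:
    """Cache guarded by several locks instead of one global mutex."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._maps = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # Spread keys over the stripes by hash.
        return hash(key) % len(self._locks)

    def get_or_compute(self, key, compute):
        s = self._stripe(key)
        with self._locks[s]:           # only this stripe is blocked
            m = self._maps[s]
            if key not in m:
                m[key] = compute(key)  # note: holds the stripe lock while computing
            return m[key]

    def __len__(self):
        total = 0
        for lock, m in zip(self._locks, self._maps):
            with lock:
                total += len(m)
        return total

cache = StripedCache(stripes=16)

def square(k):
    return k * k

def hammer():
    for k in range(100):
        cache.get_or_compute(k, square)

workers = [threading.Thread(target=hammer) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(cache), cache.get_or_compute(9, square))  # 100 81
```
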

More importantly, it still needs more clued up (read: more expensive) people to get it right, and even those do occasionally get it wrong. But even skipping past the last part, essentially it means you do need a bigger budget for the cool multithreaded solution. So in the cases where it's not that much of a disadvantage to avoid that (e.g., because the CPU will be idle and waiting for the GPU 90% of the time anyway), there's actually a good economic reason not to.

Again, I'm not saying any of _that_ is unsolvable either, but I _am_ saying that it tends to be slightly more complicated than when looked through the goggles of "I'm such a genius and I've heard about critical sections in college!" _If_ you see the world as that simple and devoid of any other considerations, that's really just your clue that you still have much to learn.

--
A polar bear is a cartesian bear after a coordinate transform.

Nature of Globalization by Tablizer · 2008-03-25 04:31 · Score: 1

Perhaps embedded and hardware-centric work is moving away from the US because the labor costs are too high. US labor tends to lean more toward integration work, and our software effort is for custom "glueware" of sorts. For example, an SQL query is a sequential request, but fulfilling it may make use of parallel processing by the guts of the database engine. (In fact, database servers now tend to split files/tables across multiple disks, so that a sequential scan is split among multiple disks scanning in parallel.)

--
Table-ized A.I.

Re:real world problem -- already parallellized by Krishnoid · 2008-03-25 05:00 · Score: 1

It would take a lot longer than nine months if it was just that one (cell) processor doing all the work of dividing.

I don't think that words means ... by Zero__Kelvin · 2008-03-25 05:01 · Score: 1

I don't think that word means what you think it does. From the definition of mediocre at dictionary.com:

-adjective
1. of only ordinary or moderate quality; neither good nor bad; barely adequate.
2. rather poor or inferior.

Others have pointed out that your claim is quite incorrect even if mediocre did mean the same as median, so I will not be redundant with regard to that. What is phenomenal is that your post has been modded informative! Seems Slashdot has been recently inundated with folks who have an understanding of statistics and language that is, at best, mediocre :-)

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun

Re:I don't think that words means ... by Anonymous Coward · 2008-03-25 05:28 · Score: 0

What is phenomenal is that your post has been modded informative! Seems Slashdot has been recently inundated with folks who have an understanding of statistics and language that is, at best, mediocre :-)
Ah, it's Zero__Kelvin here to save the SNR on Slashdot from exceeding 0.1! Please continue to post needless critiques of other people's "understanding" without adding new information or insight. Simple enough really, just make some inane insults and then add in a few words to make people think you might know what you're talking about, without really explaining it. Saves a lot of typing, I'm sure!

Feel free to explain how most people do work that is more than simply "ordinary". I'm pretty sure that's what ordinary is all about.

Bah! by Giant+Electronic+Bra · 2008-03-25 05:10 · Score: 1

I can throw you the very most classic example there is. You cannot parallelize an order matching algorithm for instance. Every order has to be processed in a fixed sequence. If I have a book of bids and asks and you come in with an order, it is a REGULATORY REQUIREMENT that I have to process it in the order it came into the system. This is an absolutely serialized type of operation. I've got to check for a match on your order and either book it or execute. No ifs ands or buts. An entire order matching SYSTEM can handle multiple orders for different instruments in tandem, but any good middleware framework will abstract that level away from where you even need to know about it outside of setting some configuration on your message queues and transaction manager.

Sure, a lot of things can benefit from parallel execution, but MANY tasks inherently contain mandatory serial sequences of operations. You can try to break them down into steps and execute different steps on different cores/threads at the same time (one can fetch a new message while another performs matching), but what you will find is that in a lot of cases the overhead of the locking, queueing, and handoff of data from one stage to the next, combined with the loss of data locality and cache coherency in current systems, will actually DEGRADE performance. Many times it is a matter of a trade-off between throughput and latency.
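
A deliberately simplified Python sketch of both points: `match_instrument` processes one instrument's orders strictly in arrival order (price-time priority, limit orders only), while `match_all` runs different instruments side by side because they share no book. The order format, names, and matching rules are assumptions for illustration; real matching engines handle many more order types and rules.

```python
from concurrent.futures import ThreadPoolExecutor

def match_instrument(orders):
    """Match one instrument's (side, price, qty) orders strictly in arrival order."""
    bids, asks, trades = [], [], []   # resting orders as [price, seq, qty]
    for seq, (side, price, qty) in enumerate(orders):   # fixed, serial order
        book, opposite = (bids, asks) if side == "buy" else (asks, bids)
        while qty and opposite:
            if side == "buy":
                best = min(opposite, key=lambda o: (o[0], o[1]))   # lowest ask, oldest first
                crosses = best[0] <= price
            else:
                best = min(opposite, key=lambda o: (-o[0], o[1]))  # highest bid, oldest first
                crosses = best[0] >= price
            if not crosses:
                break
            fill = min(qty, best[2])
            trades.append((best[0], fill))   # trade at the resting order's price
            qty -= fill
            best[2] -= fill
            if best[2] == 0:
                opposite.remove(best)
        if qty:
            book.append([price, seq, qty])   # book whatever is left
    return trades

def match_all(orders_by_instrument):
    # Different instruments share no book, so they can run in parallel;
    # each instrument's own orders stay strictly serial.
    with ThreadPoolExecutor() as pool:
        results = pool.map(match_instrument, orders_by_instrument.values())
        return dict(zip(orders_by_instrument.keys(), results))

books = {
    "ABC": [("sell", 10, 5), ("sell", 11, 5), ("buy", 11, 7)],
    "XYZ": [("buy", 20, 3), ("sell", 19, 1)],
}
print(match_all(books))  # {'ABC': [(10, 5), (11, 2)], 'XYZ': [(20, 1)]}
```
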

At least in some application domains architecture of the data processing system at a higher level is a lot more important than cute optimizations at the code level. I think one of the reasons "younger programmers are more interested in parallel programming" is the ILLUSION created by the fact that they're the ones doing low level coding, whereas I'm up here at a higher level of design where the issue is important, but not from a coding perspective. In other words I really don't care about queues, semaphores, and spinlocks, etc. I care about data flows, scalability, throughput, reliability, and latency, which are largely NOT a coder level issue. So maybe you'd find that senior developers/system architects/analysts ARE concerned about the same issues as 'younger coders', but they see them in a different way, and may not even consider it the same thing at all.

--
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson

Re:Bah! by Anguirel · 2008-03-25 09:41 · Score: 1

You cannot parallelize an order matching algorithm for instance. Every order has to be processed in a fixed sequence. If I have a book of bids and asks and you come in with an order, it is a REGULATORY REQUIREMENT that I have to process it in the order it came into the system.

You can absolutely parallelize that without violating the spirit of the regulations (which may not be good enough for a given law, presently, but the problem itself minus the outdated interpretation of the regulation can be handled). The entire concept of a branch of parallel computing involves allowing out-of-order computation to occur for events that are usually unrelated. If you process bids where the majority don't conflict with each other (either sufficient quantities of the requested item that they can be allocated to all orders, or bids for different objects), then they can be processed in parallel up until the last element of a given supply runs out. At that point you need to reverse things a bit (designing a reversible algorithm isn't necessarily easy, but is usually possible) to ensure the bids requesting it are processed in the proper order (not a problem with time-stamps attached to each request showing when an out-of-order exceptional case has occurred). So even if the bids are, in the system, processed out of order in parallel, the final result matches identically to one where they are processed serially.
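
A much simpler relative of that idea, sketched in Python: rather than reversing partial work, it checks whether the bids can conflict at all. If total demand fits within supply, every bid is filled in full regardless of processing order (so that path could safely run in parallel); otherwise it falls back to replaying the bids serially by timestamp. The tuple format `(timestamp, bidder, qty)` and the assumption that each bidder appears once are hypothetical.

```python
def allocate_serial(supply, bids):
    """Reference answer: fill bids one at a time in timestamp order."""
    filled = {}
    for ts, bidder, qty in sorted(bids):
        take = min(qty, supply)
        filled[bidder] = take
        supply -= take
    return filled

def allocate_optimistic(supply, bids):
    """Skip ordering entirely when the bids cannot conflict; otherwise replay serially."""
    if sum(qty for _, _, qty in bids) <= supply:
        # No conflict: every bid is filled in full whatever the order,
        # so this branch gives the same answer as the serial version.
        return {bidder: qty for _, bidder, qty in bids}
    # Conflict: arrival order matters, so replay by timestamp.
    return allocate_serial(supply, bids)

bids = [(3, "c", 2), (1, "a", 4), (2, "b", 3)]
print(allocate_optimistic(100, bids) == allocate_serial(100, bids))  # True
print(allocate_optimistic(5, bids) == allocate_serial(5, bids))      # True
```
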

--
~Anguirel (lit. Living Star-Iron)
QA: The art of telling someone that their baby is ugly without getting punched.

Knock Knock by mounthood · 2008-03-25 05:34 · Score: 1

n/t

--
tomorrow who's gonna fuss

Have the wrong idea by Qatz · 2008-03-25 05:39 · Score: 1

It seems to me people are looking at this wrong. People keep saying "It's too hard, don't do it, or you'll mess up." However, the fact is that threading plus multiple cores does speed things up. Also, not doing something simply because it's hard, or because you may mess up, is foolish at best. If it's hard and you may mess it up, the best thing to do is to "DO IT A LOT": the more you do it, the better you will get. As you mess up, you'll see where you messed up and be better equipped NOT to mess up like that in the future. People learn by doing and exploring. The #1 thing that has held back parallel processing today IS that people have been told to avoid it. This has slowed the development of the field, and that needs to change.

Church's Law by Giant+Electronic+Bra · 2008-03-25 05:41 · Score: 1

It's very simple, and nobody can avoid it. If you have ANY shared resource, then every process/thread/whatever that uses it (and it does not matter how you 'hide it under the hood' of your code) will be subject to Church's Law. That's not to say we cannot or don't need something better than threads, just that, as you said, there are problems that just CANNOT achieve much extra performance from parallel execution.
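
The limit being described is usually stated as Amdahl's law: if a fraction s of the work is serial (for example, everything waiting on one shared resource), the speedup on N cores is at most 1 / (s + (1 - s) / N), and never more than 1/s however many cores you add. A small Python sketch, with s = 0.10 chosen arbitrarily:

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when `serial_fraction` of the work cannot overlap.

    Amdahl's law: S(N) = 1 / (s + (1 - s) / N).
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (2, 16, 1000):
    print(cores, round(amdahl_speedup(0.10, cores), 2))
# 2 1.82
# 16 6.4
# 1000 9.91
```
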

So we come to the question of ARCHITECTURE and design. When you have a problem like that, the only viable approach is to attempt to recast it into another problem, and that is a language/toolkit independent issue because the problem itself is not a language/toolkit/framework level issue.

The conclusion being that it is not possible to just 'abstract away' all parallelism considerations from software engineering. At best maybe some day we'll make tools so smart and so high level that THEY will deal with the problem for us. That will be nice and all, but it won't make it go away.

--
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson

It goes back to OS/2 by PingXao · 2008-03-25 05:51 · Score: 1

When OS/2 version 2.0 was first released about 15 years ago, it provided real multitasking for applications. Windows 3.1 at the time was moribund, a thin shell over DOS, which itself was little more than a glorified interrupt handler. Within a couple of years Warp was released, and by then 486s were the rage, running at 33MHz, or 66MHz for a DX2 CPU. OS/2, however, had started providing multitasking on 80286 processors several years prior to Warp.

Microsoft and the industry trade press (which Microsoft owned through advertising support if not in name) started a campaign that marginalized multitasking as no big thing. That mindset has persisted to this day. I was lucky (or unlucky depending on how you look at it) that I started designing and programming using threads on a pre-emptive multitasking OS almost as soon as OS/2 2.0 was released. I guess I can count myself in the camp that has been programming for less than 15 years, even though the actual number is higher than that.

I'm a nobody but... by Anonymous Coward · 2008-03-25 05:52 · Score: 0

Nice post. I look at these other posts and wonder how you can see the world as anything other than parallel.

If you've got 1 or 2 thousand processors you could have a language that picks up whatever current objects are created and "run them" parallel to each other.

I don't know much about programming, but it seems to me that although C++ is object oriented, it must ultimately flow in one direction through the CPU. (Bad analogy, I know; please don't call me the next internet tubes guy.) If you've got multiple processors, you could be loading each object with its own entire program into each processor. Just like memory, each processor is used and reused for all the various objects or programs. I imagine it's best for real simulations. Like if you REALLY want to simulate a world. Not just a car that can car.moveForward or car.turnLeft or car.turnRight, but all the physics of the car, all the possible variables, etc.

I think we've already got the solution with object-orientation but I think most programmers just need to change the way they THINK about it.

I suppose though that this might be overpowered for a simple accounting application but that's what we have Microsoft for, no one else is better at making bloated apps which suck up every resource you can give it.

parMap by j1m+5n0w · 2008-03-25 06:24 · Score: 1

Haskell has parMap for that sort of situation. I've been using parMap to speed up a raytracer I'm writing. I'm getting less than linear speedups for some unknown reason, but it does help and it's easy to use. (I switched from OCaml to Haskell partly because I wanted the parallelism, and partly because I wanted to learn Haskell.)

Median, not average by Anonymous Coward · 2008-03-25 06:50 · Score: 0

N/T

Tell me about it by Anonymous Coward · 2008-03-25 07:32 · Score: 0

On a Google screening question I offered a solution that distributed a task across multiple processors using threads, and they said it was wrong. They then changed the question to limit me to one processor. Why? Because Google requires uniform box-like thinking.

(Trollin'... Trollin'... Trollin' on -a- Slashdot do do do do do do)

On average by Anonymous Coward · 2008-03-25 07:44 · Score: 0

The fact that half the people scored below some value determines what the value of average is. I really cannot resist the urge to point out that the above statement is true (or truish) only if the said scores are distributed in a gaussian fashion (or any other symmetric distribution).

It is easy to demonstrate the statement false for non-symmetric distributions. For example, say a class of 20 students took a test, and 19 people got one point out of ten while only one person got the full score of ten points. The average is then 1.45, and 95 percent of the students (19 of 20) scored below average.

Please do not confuse the terms. The statement would be much more true if you replace the term average with median.
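
A quick check of the 20-student example with Python's standard `statistics` module: 19 of the 20 scores sit below the mean of 1.45, i.e. 95 percent, while the median is 1.

```python
from statistics import mean, median

scores = [1] * 19 + [10]            # the 20-student example above
avg = mean(scores)                  # (19 * 1 + 10) / 20 = 1.45
below = sum(s < avg for s in scores) / len(scores)
print(avg, median(scores), below)   # 1.45 1.0 0.95
```
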

Ada, anyone? by Anonymous Coward · 2008-03-25 09:06 · Score: 0

I know people will scoff, but plain old widely-despised Ada has tasking built right into the language, not as an add-on library. No, it doesn't prevent common parallel-programming mistakes, but it does try to help you avoid them, and in my experience makes them easier to find when you do make them.

In a lot of other languages, the decision to use threads is a pretty big one, prefaced by a lot of chin wagging, hand wringing, and soul searching. When I'm writing in Ada, tasks are just another tool in the box, to use or not depending on the demands of the app.

Real world solution by Anonymous Coward · 2008-03-25 09:46 · Score: 1, Insightful

First of all, the quote is "Nine women can't have a baby in one month." Parallelism shortens time (to one month) by increasing processors (women). There is only one product (a baby).

That said, pregnancy is a good real world parallelism example. The baby's brain, skeleton, liver, & everything else are growing at the same time. Sure, some body parts must be grown before others. Humans are multi-tasking chemical machines.

Re:Real world solution by Nursie · 2008-03-25 12:00 · Score: 1

I actually think that in the server space the example of throwing more women at it works well.

You can get just as big a gain learning to provide a service simultaneously to multiple clients as you can breaking down the act of providing the same service to a single client in less time.

Re:Maybe because we've seen the hype cycle before. by yoprst · 2008-03-25 10:29 · Score: 1

OMG! 256 Z80s ! What kind of hardware was that?

OT: Median, not average by Anonymous Coward · 2008-03-25 12:11 · Score: 0

Sigh. Off topic, but: that's the median you are talking about. 1, 1, 1, 2, 15 gives an average of 4 with 80% below average.

Flamebait? by Nursie · 2008-03-25 13:08 · Score: 1

Mods been smoking crack again?

Parallel agent simulation by Anonymous Coward · 2008-03-25 14:55 · Score: 0

Yeah.

I remember a research project published about ten years ago, SF-Express, that was a large scale simulator for tank/infantry wars which ran in realtime to allow it to be hooked up to a live player's vehicle simulator. They simulated an entire Gulf War scale battlefield across thousands of processors, where each processor handled a few dozen agents (tanks, soldiers, aircraft).

It's amusing (but not surprising) to hear the slashdot crowd continue to pronounce parallelization impossible at every turn. There seems to be a new NIHS: "not imagined here syndrome"...

Erlang and other functional languages by nsd20463 · 2008-03-25 16:41 · Score: 1

I must concur with the opinion about erlang (or more generally, functional languages) and parallel programming.

IMO the most important feature that functional languages have and procedural ones don't, and that makes them appropriate for parallel programming, is single assignment. That is, once 'X' is assigned a value it can never change. This means that sharing 'X' with other threads is a no-brainer, because nothing can go wrong.

The languages force you to specially type (in Haskell) or access (dictionaries in Erlang) those variables that are going to hold shared state. Shared state is not the default, as it is in C, Java, etc.

This means many fewer mistakes, especially with multiple developers reusing code, because you can easily see in the source code which variables are shared and need special access.

The down side of single-assignment is memory usage. If you are indeed looping (recursing) and doing it in such a way that the compiler cannot optimize away the old X, you rapidly use up heap space storing every new X. This uses O(N) space, rather than O(1) like the change-X-in-place C code would, and that puts pressure on the caches and the memory bus. Hopefully the memory bandwidth will continue to scale with the # of cores.

The other features needed for parallel programming are green threads (userspace threads) and lightweight message passing. But you can achieve those just as well in C with a library or a bit of code.
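
A minimal Python sketch of message passing in the Erlang style: the counter's state lives only inside one thread, and other threads can change it only by sending messages through a `queue.Queue` mailbox, so no lock on the state is needed. These are ordinary OS threads, not green threads, and the message names are hypothetical.

```python
import queue
import threading

def counter_actor(mailbox):
    # The count lives only in this thread; nobody else can touch it directly.
    count = 0
    while True:
        msg, reply_to = mailbox.get()
        if msg == "incr":
            count += 1
        elif msg == "get":
            reply_to.put(count)
        elif msg == "stop":
            break

mailbox = queue.Queue()
actor = threading.Thread(target=counter_actor, args=(mailbox,))
actor.start()

def client(n):
    for _ in range(n):
        mailbox.put(("incr", None))

clients = [threading.Thread(target=client, args=(1000,)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()

# All increments are already queued ahead of this request (FIFO mailbox).
reply = queue.Queue()
mailbox.put(("get", reply))
total = reply.get()
mailbox.put(("stop", None))
actor.join()
print(total)  # 4000
```
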

Huh. by jd · 2008-03-25 16:51 · Score: 1

Nothing there that (a) a barrier operation, and/or (b) additional encapsulation cannot solve.

A and B negotiate policy in a thread distinct from A and B. BARRIER (until A and B complete). A and B then continue in isolated threads.
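One reading of that recipe, sketched with Python's `threading.Barrier`: a separate negotiator thread settles the policy, A and B block at the barrier until it is done, and afterwards each runs on its own. The policy dict and the `"shared-read"` value are hypothetical placeholders; the point is only where the barrier sits.

```python
import threading

barrier = threading.Barrier(3)  # parties: the negotiator, A and B
policy = {}
log = []
log_lock = threading.Lock()


def negotiator():
    # Hypothetical negotiation, run in a thread distinct from A and B.
    policy["mode"] = "shared-read"
    barrier.wait()


def worker(name):
    barrier.wait()  # BARRIER: nobody proceeds until all three parties arrive
    # Past the barrier, A and B run independently under the agreed policy.
    with log_lock:
        log.append((name, policy["mode"]))


threads = [threading.Thread(target=negotiator)]
threads += [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```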

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Ada by krischik · 2008-03-25 21:14 · Score: 1

You don't have to go super-high-level languages to get proper tasking support in a programming language:

http://en.wikibooks.org/wiki/Ada_Programming/Tasking

Multitasking has been part of a large collection of programming languages for quite some time now; only the "worse is better" [1] generation, with their {}-languages [2], made us forget all the cool stuff computer scientists conceived in the '70s and early '80s.

Martin

PS: "worse is better" has the advantage in the short and medium term, but in the long term the low priority of Consistency and Completeness will come back to haunt you. Example? Well, how many full-featured C99 compilers do you know?

[1] http://en.wikipedia.org/wiki/Worse_is_better
[2] http://en.wikipedia.org/wiki/Category:Curly_bracket_programming_languages

Rendezvous-based threading. by krischik · 2008-03-25 21:20 · Score: 1

"mutexes, semaphores": tricky indeed, but there is a far easier alternative called "Rendezvous":

http://en.wikibooks.org/wiki/Ada_Programming/Tasking#Rendezvous

And the funny thing is: it's not a new concept; it was conceived in the late '70s.
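To make the idea concrete for readers who don't know Ada, here is a rough Python imitation of an entry call and accept statement. The `Entry` class and its `call`/`accept` methods are hypothetical, not any library's API: the caller blocks until the accepting task has run the entry body, then both continue independently. Callers are served in FIFO order here.

```python
import queue
import threading


class Entry:
    """Hypothetical Ada-style entry: call() blocks until accept() has run the body."""

    def __init__(self):
        self._calls = queue.Queue()  # waiting callers, served FIFO

    def call(self, arg):
        done = threading.Event()
        slot = {"arg": arg}
        self._calls.put((slot, done))
        done.wait()  # caller is suspended until the rendezvous completes
        return slot["result"]

    def accept(self, body):
        slot, done = self._calls.get()  # acceptor waits for a caller
        slot["result"] = body(slot["arg"])  # body runs while the caller is blocked
        done.set()  # end of rendezvous: both sides continue on their own


square = Entry()


def server(n):
    # Like a task body looping over "accept Square (X : Integer) do ... end".
    for _ in range(n):
        square.accept(lambda x: x * x)


srv = threading.Thread(target=server, args=(3,))
srv.start()
answers = [square.call(k) for k in (2, 3, 4)]
srv.join()
```

The appeal compared with raw mutexes is that the synchronization point and the data handoff are the same event, so there is no separate lock to forget.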

Martin

It was called ZMOB by alispguru · 2008-03-26 04:23 · Score: 1

I had no idea it would be findable on the web; the Digital Library of India did a really nice job of scanning a paper on it into a searchable, OCRed PDF:

http://dli.iiit.ac.in/ijcai/IJCAI-81-VOL-2/PDF/071.pdf

ZMOB consisted of 256 home-grown Z80 boards and 256 home-grown communication boards. The comm hardware controlled a 48-bit wide bus that was essentially a shift register running in a loop around all 256 processor/comm pairs.
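A toy model of why a loop-shaped shift register behaves the way it does: every slot advances one node per tick, so a message's latency is just its distance around the ring. This is a hypothetical simulation assuming one message slot per node, not ZMOB's actual bus protocol, which the linked paper describes.

```python
from collections import deque


def ring_delivery_ticks(n_nodes, src, dst):
    """Ticks until a message inserted at node src sits in the slot at node dst.

    Assumes a slotted ring: ring[i] is the slot currently at node i, and
    on every tick all slots shift one hop forward at once.
    """
    ring = deque([None] * n_nodes)
    ring[src] = ("msg", dst)
    ticks = 0
    while ring[dst] is None:
        ring.rotate(1)  # every slot moves to the next node simultaneously
        ticks += 1
    return ticks


# With 256 nodes, the worst case is a full trip around the loop minus one hop.
worst = max(ring_delivery_ticks(256, 0, d) for d in range(256))
```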

If we had started the project a few years later, we would have probably used 68000 processors, or maybe 8085/8087.

ZMOB was Maryland's first big hardware project, and we learned a lot about how not to do projects of this nature, like the worst ways to maneuver money through state channels, and that when you do large scale machines, careful signal engineering matters.

ZMOB per se was a failure - we never really got all 256 machines running at once. The comm hardware was eventually broken up into smaller 16 and 32 machine loops, and the Z80s were replaced by 68000s.

--

To a Lisp hacker, XML is S-expressions in drag.

Re:It was called ZMOB by yoprst · 2008-03-26 07:27 · Score: 1

Uber-cool!:)
Thanks for enlightening me.

Still not really entirely parallel by Giant+Electronic+Bra · 2008-03-26 07:33 · Score: 1

And now you've come to the crux of the problem. Now your operations have to be reversible, and at some point the sequence dependency has to be resolved. I'd look at it this way: your solution is virtually guaranteed to be significantly more complex, and in a practical system there is already a good deal of parallel activity going on (transparently to the application programmer), assuming you're using good tools.

For example, the search for potential matches is going to be a database table SELECT, which any good RDBMS will execute as an (at least potentially) parallel scan. However, with no existing modern tool can you do something like have several threads submitting different orders into the algorithm in parallel, because THERE IS NO WAY FOR ONE MESSAGE TO KNOW THAT OTHERS EXIST. Parse the problem: messages arrive continuously, and each one permutes the existing state of the book.

The real problem with any attempt to do all this asynchronously is practical. The incoming message rate can be almost arbitrarily high. Suppose you dispatch a thread to do the match for each incoming message. There is no knowable guarantee as to the order of execution within the set of all currently running threads. You may have to backtrack an arbitrary number of steps, N, where N is simply the number of received but not yet retired messages at any given time. Also, backtracking IS very expensive in real terms: at the very least a 'delta' has to exist between the before and after state of each operation so it can be reversed. Plus you have no real way of knowing when you can throw away one of those deltas, because you don't know the value of N. It is possible that 1000 messages poured in during the last second and the whims of thread scheduling meant message number 1000 hit the book first, but since you don't know N, all you can do is keep the data required for reversal essentially forever.
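Here is a sketch of the bookkeeping being described, under heavy simplifying assumptions: the "book" is just a hypothetical dict of price level to resting quantity, every applied operation records its before-state as a delta, and a message that arrives out of sequence forces a backtrack and replay. Real matching logic is far more involved; the point is that the delta log is never trimmed, because nothing here knows when an earlier sequence number can no longer arrive.

```python
class ReversibleBook:
    """Hypothetical toy book: price -> resting quantity, with an undo log."""

    def __init__(self):
        self.levels = {}
        # (seq, price, qty_change, prev_qty), kept sorted by seq.
        # Never trimmed: we can't know when a late, lower seq stops being possible.
        self.applied = []

    def _apply(self, seq, price, qty_change):
        prev = self.levels.get(price, 0)
        self.applied.append((seq, price, qty_change, prev))  # the 'delta'
        self.levels[price] = prev + qty_change

    def _undo_last(self):
        seq, price, qty_change, prev = self.applied.pop()
        self.levels[price] = prev  # restore the before-state
        return seq, price, qty_change

    def submit(self, seq, price, qty_change):
        """Apply a message; if it arrived late, backtrack and replay in seq order."""
        undone = []
        while self.applied and self.applied[-1][0] > seq:
            undone.append(self._undo_last())
        self._apply(seq, price, qty_change)
        for op in sorted(undone):
            self._apply(*op)
        return len(undone)  # how many steps we had to backtrack


book = ReversibleBook()
backtracks = [book.submit(*m) for m in [(3, 100, 5), (1, 100, 2), (2, 101, 7)]]
```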

Now I can sketch out in my head how in theory you could do it, but frankly the overhead of the whole thing is greater than just accepting that there is a bottleneck, reducing the critical section to the smallest extent possible, and optimizing it highly for serial execution.
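The alternative being argued for, accept the bottleneck and keep the critical section tiny, can be sketched like this. `SerialBook` is a hypothetical stand-in: any number of threads may submit, validation happens outside the lock, and only the sequence number and book update are serialized.

```python
import threading


class SerialBook:
    """Hypothetical book where updates are serialized through one small critical section."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seq = 0
        self.levels = {}

    def submit(self, price, qty):
        # Work that touches no shared state stays outside the lock.
        if qty <= 0:
            raise ValueError("qty must be positive")
        with self._lock:  # critical section: as small as possible
            self._seq += 1
            self.levels[price] = self.levels.get(price, 0) + qty
            return self._seq


book = SerialBook()
seqs_per_thread = [[] for _ in range(8)]


def client(out):
    for _ in range(100):
        out.append(book.submit(price=100, qty=1))


threads = [threading.Thread(target=client, args=(out,)) for out in seqs_per_thread]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_seqs = sorted(s for out in seqs_per_thread for s in out)
```

Every message gets a unique, gap-free sequence number, so there is never anything to backtrack; the price is that throughput is capped by how fast that one locked block runs.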

I think this is, in essence, the nature of the problem. Certain types of things are just not naturally amenable to parallel execution BY THEIR VERY NATURE, and never will be. And no tool will tell you that. And any tool that lets you deal with it will still have to implicitly or explicitly deal with all of the same factors and be just as complex. Plus there is the problem of generality. One could ask why APIs don't simply become so high-level and abstract that programmers never have to deal with nuts and bolts. Nothing about the structure of existing software libraries disallows that, but the higher the level you operate at, the more problem-domain-specific your code has to be, or else it has to become SO generalized that you just move the problem to specifying which of the 9000 supported ways of doing things you actually wanted.

--
"Malo periculosam, libertatem quam quietam servitutem." ("I prefer dangerous freedom to peaceful slavery.") -- Jefferson

Re:Still not really entirely parallel by Anguirel · 2008-03-26 12:07 · Score: 1

For some systems, yes, accepting the bottleneck is OK: the system can be made parallel, but the overhead cost is too high. That cedes the point that it could, in theory, be done. And with better tools in place to reduce the overhead of creating those systems, the cost to parallelize may come down to the point where it makes more sense to switch over. Given your description, though, at some future point the number of incoming bids may easily exceed your capacity at all times, and you'll just slowly fall further and further behind. We'll eventually hit some fairly hard physical limits on how much direct speed-up we can manage, at which point the only option is to move to parallel execution.

You'll likely end up looking at a distributed data structure (rather than a single monolithic database table, you have threads working on subtables, passing messages back and forth as needed), and hope your already-optimized critical sections, where data locks are necessary, are small enough. And yes, the backtracking can be expensive, but if it only occurs infrequently, and the gains made outside of those events are large enough, you have a net gain in speed of execution. And if every message in the queue is for the exact same thing, or requires that the previous message be processed first (percentage-based changes, for example), there's a chance you can't get out of serial execution.
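A minimal version of the "threads working on subtables" idea, assuming the book can be split by instrument: each hypothetical symbol is hashed to one shard, and each shard's thread processes its own queue serially, so per-symbol ordering is preserved without a global lock. `ShardedBooks` and the symbols are made up for illustration, and the cross-shard message passing mentioned above is not modeled.

```python
import queue
import threading
import zlib


class ShardedBooks:
    """Hypothetical sharded book: one queue and one owning thread per shard."""

    def __init__(self, n_shards):
        self.queues = [queue.Queue() for _ in range(n_shards)]
        self.books = [{} for _ in range(n_shards)]  # each dict touched by one thread only
        self.workers = [
            threading.Thread(target=self._run, args=(i,)) for i in range(n_shards)
        ]
        for w in self.workers:
            w.start()

    def _shard_for(self, symbol):
        # Stable hash (built-in str hash is salted per process) so a symbol
        # always lands on the same shard.
        return zlib.crc32(symbol.encode()) % len(self.queues)

    def submit(self, symbol, qty):
        self.queues[self._shard_for(symbol)].put((symbol, qty))

    def _run(self, i):
        book = self.books[i]
        while True:
            msg = self.queues[i].get()
            if msg is None:  # shutdown sentinel
                break
            symbol, qty = msg
            # Serial within the shard, so messages for one symbol stay in order.
            book.setdefault(symbol, []).append(qty)

    def close(self):
        for q in self.queues:
            q.put(None)
        for w in self.workers:
            w.join()


books = ShardedBooks(n_shards=4)
for i in range(50):
    for sym in ("ABC", "XYZ", "QRS"):
        books.submit(sym, i)
books.close()

merged = {}
for b in books.books:
    merged.update(b)
```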

So I'll agree that there will exist some specific limited scenarios where parallelism is impossible -- but in most cases right now it's not impossible, just harder to implement than it is worth, which is distinctly different from being actually impossible.

--
~Anguirel (lit. Living Star-Iron)
QA: The art of telling someone that their baby is ugly without getting punched.

Oh, in theory you may be right by Giant+Electronic+Bra · 2008-03-27 00:44 · Score: 1

In fact there's probably some theorem that covers that, lol. Amdahl's Law, however, does apply to any exclusive shared resource, so at SOME level there is a potential bottleneck, even if the critical section is one line of code. That would be the best case, and then chances are the load would need to be huge before it mattered.
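The bottleneck argument in this exchange is usually quantified with Amdahl's law: if a fraction s of the work must run serially (here, the book update), the speedup on n workers is at most 1 / (s + (1 - s) / n), which can never exceed 1 / s. A small Python helper, assuming the idealized model where the parallel part scales perfectly and the serial part never shrinks:

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Upper bound on speedup when serial_fraction of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)


# A 5% serial critical section caps speedup near 20x, however many cores you add.
print(amdahl_speedup(0.05, 16))
print(amdahl_speedup(0.05, 10**9))
```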

Sure, any system can only have some maximum throughput. I think in the case I'm talking about, the real observation is that updating the book data structure (RDBMS tables or whatever the concrete implementation is) is a critical operation whose performance dominates that of the overall application. I think you'll find a significant number of these cases in OLTP-type applications. Luckily processors are fast, so small critical sections are annoying, but you can live with them...

--
"Malo periculosam, libertatem quam quietam servitutem." ("I prefer dangerous freedom to peaceful slavery.") -- Jefferson

Slashdot Mirror

342 comments