Faster Chips Are Leaving Programmers in Their Dust

2005 Called by brunes69 · 2007-12-17 05:47 · Score: 5, Funny

....it wants its article back.

Seriously - any developer writing modern desktop or server applications that doesn't know how to do multi-threaded programming effectively deserves to be on EI anyway. It is not that difficult.

Re:2005 Called by CastrTroy · 2007-12-17 05:55 · Score: 5, Insightful

It's not just making your app multithreaded, it's completely changing your algorithms so that they take advantage of multiple processors. I took a parallel programming course in University, so I'm by no means an expert, but I'll give what insight I have. You can't just take a standard sort algorithm and run it multithreaded. You have to change the entire algorithm. In the end, you end up with something that sorts faster than n log (n). However, doing this type of programming where you break up the dataset, sort each set, and then gather the results can be very difficult. Many debuggers don't deal well with multiple threads, so that adds an extra layer of difficulty to the whole problem. Granted, I don't think that we really need this level of multithreadedness, but I think that's what the article is referring to. I think that 10+ core CPUs will only really help for those of us who like to do multiple things at the same time. I think it would even be beneficial to keep most apps tied to a single CPU so that a run-away app wouldn't take over the entire computer.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:2005 Called by gazbo · 2007-12-17 06:01 · Score: 4, Interesting

In the end, you end up with something that sorts faster than n log (n).
Not without an infinite number of processors you don't.
Re:2005 Called by ByOhTek · 2007-12-17 06:01 · Score: 1

sometimes it is as simple as adding multiple threading without changing the logic.

It depends on where you are splitting your logic.

Let's take a binary search example:
Your bank accidentally left a back door in their database, and now the hackers/crackers want to grab their enemies' credit and account information. Which approach will allow them to get it faster?

The database is sorted:
1) Perform a binary search on the data with each thread doing 1/Nth the data, where N is the number of threads per search
2) Perform a binary search on the data, with each thread searching over all of the data, and one thread per search.

The second works best if there are more searches than CPUs, otherwise the first works best. The second also doesn't really require changing any major algorithms.
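
A minimal sketch of option 2, assuming Python and a sorted list of keys. The names `find` and `find_many` are made up for illustration, and in standard CPython the global interpreter lock means this shows the structure rather than a real speedup. The point is that the binary search itself is untouched and only the dispatch changes:

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def find(sorted_keys, key):
    """Plain single-threaded binary search; returns the index or -1."""
    i = bisect_left(sorted_keys, key)
    return i if i < len(sorted_keys) and sorted_keys[i] == key else -1

def find_many(sorted_keys, wanted, workers=4):
    """Option 2: each lookup runs start to finish on one worker thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as 'wanted'
        return list(pool.map(lambda k: find(sorted_keys, k), wanted))
```

Since the data is only read, the lookups need no locking.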

--
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Re:2005 Called by mycroft822 · 2007-12-17 06:07 · Score: 1

....it wants its article back.
Oh yea? Well the jerk store called, and they want you back!

//just to be clear it's not an insult, just a joke about a Seinfeld episode...
Re:2005 Called by workdeville · 2007-12-17 06:09 · Score: 1, Insightful

He said n log n, not O(n log n). So he's right.
Re:2005 Called by MrSteveSD · 2007-12-17 06:12 · Score: 2, Insightful

A lot of multi-threading up until now has been about keeping applications responsive, rather than breaking up tasks. That makes sense since multi-core chips haven't been around that long in most people's homes. Another issue is that once you have more than one processor, two threads really can run at the same time which can show up all kinds of bugs you would never notice on a single core system. The main problem I can see is with testing for errors. With multiple threads it's up to the OS on how it juggles them around and that juggling may be different for every test run. So you could run the same test a hundred times, then suddenly, you could get a failure. So multi-threading throws in a certain random aspect into the software which never used to be there.
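
A rough illustration of the kind of bug meant here, assuming Python and a made-up function `count_with_lock`. Several threads bump one shared counter. With the lock the total is the same on every run. Take the lock out and the read-modify-write can interleave, so whether a given run loses an update depends on how the scheduler happens to juggle the threads:

```python
import threading

def count_with_lock(n_threads=4, increments=10_000):
    """Increment one shared counter from several threads."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:          # the read-modify-write happens as one unit
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```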
Re:2005 Called by ZeroFactorial · 2007-12-17 06:21 · Score: 5, Funny

This sounds to me like a great example of passing the buck.
EE Guy #1: We can't seem to build faster chips.
EE Guy #2: No problem. We'll just put tons of processor cores in instead.
EE Guy #1: But people have spent the past 30 years creating algorithms for single core machines. Almost none of the programmers have any experience writing multi-core algorithms!
EE Guy #2: Exactly! We'll be able to blame the programmers for being lazy and not wanting to learn new complicated algorithms that require an additional 4 years of university.
EE Guy #1: Brilliant! We should come up with a catchy headline like "The Free Lunch is Over" or something like that.
EE Guy #2: Yeah, and we could get Slashdot to post a link to the article. Slashdot users are sure to sympathize with our devious plans...
Re:2005 Called by caerwyn · 2007-12-17 06:24 · Score: 4, Informative

As you know, multiple threads in a program do not actually execute concurrently - processing is still serial, it's just so fast that threads can appear to execute simultaneously - and it's not just about queuing execution either.

That holds only for multithreaded programming on a single core. As soon as there are multiple cores available, processing does, in fact, happen simultaneously.

--
The ringing of the division bell has begun... -PF
Re:2005 Called by chaboud · 2007-12-17 06:37 · Score: 3, Informative

Well, 2005 called...

it wants its reply back.

The parent is exactly how I would have replied a couple of years ago. I was doing lots of threading work, and I found it easy to the point of being frustrated with other programmers who weren't thinking about threading all of the time.

I was wrong in two ways:

1. It's not that easy to do threading in the most efficient way possible. There's almost always room for improvement in real-world software.

2. There are plenty of programmers who don't write thread-safe/parallel code well (or at all) that are still quite useful in a product development context. Some haven't bothered to learn and some just don't have the head for it. Both types are still useful for getting your work finished, and, if you're responsible for the architecture, you need to think about presenting threading to them in a way that makes it obvious while protecting the ability to reach in and mess with the internals.

The first point is probably the most important. There are several things that programmers will go through on their way to being decent at parallelization. This is in no strict order and this is definitely not a complete list:

- OpenMP: "Okay, I've put a loop in OpenMP, and it's faster. I'm using multiple processors!!! Oh.. wait, there's more?"
Now, to be fair, OpenMP is enough to catch the low-hanging fruit in a lot of software. It's also really easy to try out on your code (and can be controlled at run-time).

- OpenMP 2: "Wait... why isn't it any faster? Wait.. is it slower?"
Are you locking on some object? Did you kill an in-loop stateful optimization to break out into multiple threads? Are you memory bound? Blowing cache? It's time to crack out VTune/CodeAnalyst.

- Traditional threading constructs (mutices, semaphores): "Hey, sweet. I just lock around this important data and we're threadsafe."
This is also often enough in current software. A critical section (or mutex) protecting some critical data solves the crashing problem, but it injects the lock-contention problem. It can also add the cost of round-tripping to the kernel, thus making some code slower.

- Transactional data structures: "Awesome. I've cracked the concurrency problem completely."
Transactional mechanisms are great, and they solve the larger data problem with the skill and cleanliness of an interlocked pointer exchange. Still, there are some issues. Does the naive approach cleanly handle overlapping threads stomping on each-others' write-changes? If so, does it do it without making life hell for the code changing the data? Does the copy/allocation/write strategy save you enough time through parallelism to make back its overhead?

Should you just go back to a critical section for this code? Should you just go back to OpenMP? Should you just go back to single-threading for this section of code? (not a joke)
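
As a sketch of the lock-contention point in the list above (Python, with a hypothetical `parallel_sum`, not anything from a real product): rather than locking around every update to the shared total, each thread accumulates privately and takes the lock once to merge. In standard CPython the global interpreter lock hides most of the timing difference, so read this for the structure only:

```python
import threading

def parallel_sum(values, n_threads=4):
    """Sum 'values' with one short critical section per thread."""
    total = 0
    lock = threading.Lock()
    chunk = max(1, (len(values) + n_threads - 1) // n_threads)

    def worker(part):
        nonlocal total
        local = sum(part)       # private to this thread, so no lock needed
        with lock:              # lock taken once per thread, not once per item
            total += local

    threads = [threading.Thread(target=worker, args=(values[i:i + chunk],))
               for i in range(0, len(values), chunk)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```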

Perhaps as processors get faster by core-scaling instead of clock-scaling this will become less of a dilemma, but to say that "[to do multi-threaded programming effectively] is not that difficult" is akin to writing your first ray-tracer and saying that 3D is "not that difficult." Sometimes it is. At least at this point there are places where threading effectively is a delicate dance that not every developer need think about for a team to produce solid multi-threaded software.

That doesn't mean that I object to threading being a more tightly-integrated part of the language, of course.
Re:2005 Called by chaboud · 2007-12-17 06:41 · Score: 1

I think he was trying to combine the two.

It would also be "they're" rather than "their."

I'm just sayin'...
Re:2005 Called by morgan_greywolf · 2007-12-17 06:43 · Score: 1, Interesting

The difference between parallel programming and multithreaded programming is this ... with a parallel algorithm, different parts of one task/thread are done on separate CPUs, whereas with multithreaded programming each one thread/task is done entirely on one processor.

It's not a semantic difference. Threads are basically just lightweight processes...so each thread of a program execution can be thought of as a different process. OTOH, in parallel programming, a thread/task is broken down into pieces and brought back together when the pieces are done. Think SETI@Home, but on a much smaller scale.

This probably isn't all that useful for writing something like a web server, where it makes some sense to thread off each connection, but for writing a scientific computing application like a simulation or a climate model where you have number crunching that can be done on different subsets of data, you might want to break down those calculations so they occur on different processors. This requires some degree of sophistication as you usually have one part of the calculation that depends on another part and you have to send data back and forth between parts. This involves more than multithreading; it is true parallel processing.

--
My blog
Re:2005 Called by foobsr · 2007-12-17 06:46 · Score: 1

Best. Post. Since long :)

people have spent the past 30 years creating algorithms for single core machines

Presumably mostly because demand was controlled from the realm of thought (bean-counters if you like it cynical). Maybe things will have to change if devices are required to be self-aware (intelligent bombs, maybe?).

CC.

--
TaijiQuan (Huang, 5 loosenings)
Re:2005 Called by Tranzistors · 2007-12-17 06:48 · Score: 1

I think digital (bucket) sorting works quite well with multicore systems.
Divide to-be-sorted data among CPU-s and let them throw data into shared buckets. I don't see any theoretical problems with this.
Not that I have done this, though...

P.S. Digital sorting can work better than n*log(n)
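
An untested-in-anger sketch of the idea, assuming Python and integer keys. One deliberate change from the description above: each worker fills its own private buckets, which get concatenated afterwards, so nobody has to lock a shared bucket. It also falls back to `sorted` inside each bucket, so it is not a pure digital sort. `scatter` and `bucket_sort` are made-up names:

```python
from concurrent.futures import ThreadPoolExecutor

def scatter(chunk, n_buckets, lo, width):
    """Drop each value of one chunk into a bucket chosen by its range."""
    buckets = [[] for _ in range(n_buckets)]
    for x in chunk:
        buckets[(x - lo) // width].append(x)
    return buckets

def bucket_sort(data, n_buckets=8, workers=4):
    if not data:
        return []
    lo, hi = min(data), max(data)
    width = (hi - lo) // n_buckets + 1        # keeps every index below n_buckets
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: scatter(c, n_buckets, lo, width), chunks))
    out = []
    for b in range(n_buckets):                # buckets are already in key order
        out.extend(sorted(x for part in parts for x in part[b]))
    return out
```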
Re:2005 Called by felix9x · 2007-12-17 06:51 · Score: 1

I think sorting in parallel is not that hard. Sorting is one of the most studied topics in computer science, so there are bound to be decent parallel sorting algorithms already implemented in libraries you can use. The need for something like this is not that mainstream, which is why we don't find implementations in the standard libraries of popular languages.
Re:2005 Called by Alomex · 2007-12-17 07:09 · Score: 1

If the number of parallel processors is small (like in multicores) parallel programming is not as difficult as you might think.
Re:2005 Called by Josh+Booth · 2007-12-17 07:10 · Score: 1

Thankfully, the programs required for Linux desktops are already multiprocess to a large degree by design. We have X11, the window manager, the file manager and the panels all running in separate processes, just to name a few. And they use well defined IPC so they aren't going to get blindsided by multithreading problems. I remember that only a couple years ago did the Linux kernel start putting things in separate processes, and now I see about 25 kernel processes.

Now Microsoft Windows may have a bit of a problem, since most of the desktop functionality is run by explorer.exe. I imagine they might have trouble splitting that thing up just like they had trouble going to a simple multiuser environment without breaking things.
Re:2005 Called by egomaniac · 2007-12-17 07:31 · Score: 2, Informative

The difference between parallel programming and multithreaded programming is this ... with a parallel algorithm, different parts of one task/thread are done on separate CPUs, whereas with multithreaded programming each one thread/task is done entirely on one processor.

Wait... what? "Different parts of one thread are done on separate CPUs"?

In what (real world, non-research) system is a single thread run on multiple processors at the same time? And why are you claiming that running each thread on a single processor, as is done by all major OSes, is not parallel programming?

It's not a semantic difference. Threads are basically just lightweight processes...so each thread of a program execution can be thought of as a different process.

I've re-read that about five times, and I still don't have a clue what point you're trying to make here. From an algorithmic standpoint, all that matters is "these instructions are run in sequence, and these two sets of sequential instructions can run in parallel". The terminology that generally describes the concept of a sequential set of instructions is "thread". Sure, on a given operating system you might use a lightweight process or even a full-blown process to implement each 'thread', but that's an implementation detail and has nothing to do with the algorithm. What are you trying to say?

OTOH, in parallel programming, a thread/task is broken down into pieces and brought back together when the pieces are done. Think SETI@Home, but on a much smaller scale.

You're referring to "data parallelism" versus "task parallelism". Breaking a single computation's data set up into parallelizable chunks a la SETI@Home is "data parallelism", whereas running two relatively unrelated tasks in parallel is "task parallelism". They are both forms of parallel programming and your assertion that only data parallelism 'counts' is simply false.
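
To make the two terms concrete, here is a small sketch assuming Python (the function names are invented for illustration). The first function splits one data set into chunks and runs the same computation on each chunk. The second runs two unrelated jobs side by side:

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel_sum_of_squares(values, workers=4):
    """Data parallelism: same operation, different slices of one data set."""
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: sum(x * x for x in c), chunks))

def task_parallel(task_a, task_b):
    """Task parallelism: two unrelated callables run at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa, fb = pool.submit(task_a), pool.submit(task_b)
        return fa.result(), fb.result()
```

Both use the same thread pool machinery; only the way the work is carved up differs.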

--
ZFS: because love is never having to say fsck
Re:2005 Called by Ephemeriis · 2007-12-17 07:40 · Score: 1

I'm not a programmer, so feel free to ignore me, but it seems to me that this shouldn't really be something that an application developer should have to deal with. It seems to me that the OS and most programming languages should have ample support for multithreading by now... I remember learning about and writing multithreaded programs back when I was in college - about 10 years ago.

Sure, at the time most everything we wrote was executed on a single core CPU so it wasn't actually parallel programming...it all ran in serial anyway... But you could still write a multithreaded program that'd let you do one thing while the rest of your program was working on something else. It seems like those separate threads ought to just wind up executing on multiple cores now, and actually execute simultaneously.

Obviously this won't fix every problem out there... I'm sure there's plenty of algorithms that would really benefit from being re-examined... And at a certain point you just run out of different things to do at once... But are the OSes and programming languages still so serial in design?

Plus, to be completely honest, I'm not sure how much parallel processing is going to do for your average user. It seems like most applications wind up waiting for something other than the CPU these days... It seems to me that more time would be wasted waiting on I/O or other hardware than the CPU... Maybe that's not true at the server, but it certainly is on the desktop.

--
"Work is the curse of the drinking classes." -Oscar Wilde
Re:2005 Called by civilizedINTENSITY · 2007-12-17 07:42 · Score: 1

This probably isn't all that useful for writing something like a web server...

In terms of concurrency and web servers:
At the same workshop, Joe Armstrong gave a presentation on "Concurrency Oriented Programming in Erlang" (see presentation, view video). In it, he illustrated how Erlang, with its built-in support for lightweight processes and extremely fast process creation, was far superior to either Java or C# in its ability to quickly create new processes. In fact, when an Erlang-based web server was compared to Apache (comparing KBytes/sec vs session load), the ability of Erlang to effectively support many concurrent, parallel processes meant that the Erlang-based web server was able to run over 80,000 sessions while the Apache web server died at around 4,000 sessions.
Re:2005 Called by bfields · 2007-12-17 07:42 · Score: 3, Insightful

Oh, good grief--since the original poster didn't specify units, and since it's highly unlikely the running time would be exactly n log n for any choice of units, and since it's pretty common to leave out the O() in casual conversation, the only sensible interpretation is that the O() was implied....
Re:2005 Called by Fulcrum+of+Evil · 2007-12-17 07:44 · Score: 1

That's a single threaded problem. It's also most likely IO-limited. More threads make the DB faster when its memory sizing is correct.

--
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Re:2005 Called by Heembo · 2007-12-17 07:55 · Score: 1

Welcome Java. Our client side code has had integrated easy to program multi-threading capability since 1997 or earlier. I agree, we have problems with slowness on old-school processors and start up times, but when 8-16 core PC's are the norm, and the gui of the JDK changes radically around Java 7/8, Java might start to be a real possibility for enterprise wide client-side programming - with the exception of gaming and high end graphic needs.

--
Horns are really just a broken halo.
Re:2005 Called by gazbo · 2007-12-17 07:57 · Score: 2, Interesting

Indeed, sorting is highly parallelisable. My point was that you can't change general algorithmic complexity by adding k more processor cores.
Regarding the other "he said n log n, not O(n log n)" comment...well, that's already been answered (and with considerably more tact than I would have used).
Re:2005 Called by bdbolton · 2007-12-17 07:58 · Score: 1

"It is not that difficult."

I for one would rather not program multi-threaded programs. It is harder to debug for one.
Re:2005 Called by AuMatar · 2007-12-17 07:59 · Score: 3, Interesting

Yes it is something the app developer needs to deal with. The problem isn't creating threads- the OS does that. The problem is in protecting data from concurrent unsynchronized access (this needs to be done by the app, as the OS has no clue what can/can't be accessed synchronously, that's an application detail) and parallelizing algorithms (again, not something the OS can do).

Eventually, good parallel algorithm libraries will pop up. That will help some subset of problems. I'd expect frameworks to pop up as well, helping another. But in many cases it just comes down to changing how we write programs.

And you're right, this isn't really a desktop issue- it's mainly a server one. Desktops really don't need all the power they have now, perhaps one percent of users outside of gamers actually use it. That doesn't make it any less important of a problem to solve. Although I expect in the end people will still end up disappointed- parallelization is not magic pixie dust, you can only get so much of a speedup. I wouldn't be surprised if those 8 cores only give a 2x speedup over a single core on many apps.
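
One standard way to put a number on that guess is Amdahl's law, which the post does not name, so treat this as a back-of-the-envelope sketch in Python. If a fraction p of the run time parallelizes perfectly and the rest stays serial, the best speedup on N cores is 1 / ((1 - p) + p / N):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)
```

Solving for 8 cores and a 2x speedup gives p = 4/7, so an app where a little over half the work parallelizes would land right at the 2x figure.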

--
I still have more fans than freaks. WTF is wrong with you people?
Re:2005 Called by chaboud · 2007-12-17 08:07 · Score: 4, Insightful

It's not quite like that.

On modern systems, threads are themselves first-class constructs, and it runs somewhat like this:

A process has things like memory-tables for virtual memory, handles for objects, files, socket connections, etc. A process always contains at least one thread (this isn't always true while a process is being set up or torn down, but it's true when most anyone's code is running).

A thread generally has a stack (in the host-process's virtual address space, so everyone can read it), some thread-local storage to make life easier for some APIs (you don't need to care about this in most cases), and lives in a process. This means that threads can use virtual addresses for memory interchangeably with other threads in the same process.

Additionally, some operating systems support fibers. A fiber is like a thread except that it has to be explicitly or cooperatively (not quite the same thing) multi-tasked. Fibers use even less memory than threads, and you really don't have to care about them.

When you're in, say, Visual Studio, there's a "threads" window for all of the threads of the process that you are debugging. You can end up stepping through code on one thread while other threads are running.

The modern hardware designs lead to interesting performance side-effects from cache location and memory location. It's not quite as hard as systems that have asymmetric access to resources (e.g. Playstation 2), but it makes for fun work.
Re:2005 Called by PitaBred · 2007-12-17 08:13 · Score: 1

If you have actually attached gdb to intercommunicating processes, you know more than 95% of the people posting in this thread. Take anything they say with a huge grain of salt. The multicore hardware is not fundamentally different (from a programming standpoint) from any system with multiple processing units.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:2005 Called by Firethorn · 2007-12-17 08:16 · Score: 1

Plus, to be completely honest, I'm not sure how much parallel processing is going to do for your average user

Substitute 'gamer' for 'user' and then take a look at video cards. Parallel processing is alive and doing well enough to be assumed today, at least for video tasks.

As for multiple cores - while it won't benefit the average office program*, I'd also point out that bottom barrel processors have been able to run them effectively for years.

I've seen proposals that would have a thread for the graphics engine, a thread for the physics engine, one for user handling and one for AI stuff. Optimize and add more threads if you have more cores, of course.

It seems like most applications wind up waiting for something other than the CPU these days... It seems to me that more time would be wasted waiting on I/O or other hardware than the CPU... Maybe that's not true at the server, but it certainly is on the desktop.

Might be true at the moment, but both RAM and flash keep getting faster and cheaper, while CPU speeds seem to have leveled out, more or less.

I'm sure there's plenty of algorithms that would really benefit from being re-examined...

I'm also sure that this would be true anyways, multi-threading/cores or not. ;)

Honestly enough, I bought a dual core machine recently instead of the quad core because I figure that it's going to be years before they start optimizing programming enough that an extra two cores will make a significant difference in gaming. At least with a dual core the OS can offload most system operations onto the other core, leaving one free for the game.

It'll come, eventually, but multi-core processors have to reach a certain critical mass before programmers will go through the hassle. By the sounds of it, part of the hassle will be developing proper multi-thread debuggers.

*Database applications aside.

--
I don't read AC A human right
Re:2005 Called by EatHam · 2007-12-17 08:16 · Score: 5, Funny

pretty common to leave out the O() in casual conversation
I would say that it is *extremely* common for casual conversation to not have anything whatsoever to do with O().
Re:2005 Called by PitaBred · 2007-12-17 08:17 · Score: 1

Don't forget the memory. All the cores in the world won't do shit without the RAM to store the data.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:2005 Called by Doug+Neal · 2007-12-17 08:32 · Score: 1

This is only an advantage in something like a web server for convenience and simplicity in the design. Performance-wise, it's actually a lot less efficient, due to the constant overhead of context-switching. Single-process web servers will outperform multithreaded/multiprocess web servers by quite a margin, but they're far less flexible.
Re:2005 Called by gbjbaanb · 2007-12-17 08:35 · Score: 1

you're kidding! Sorting is used practically everywhere, from a simple list of items in a drop-down box for user selection to the payroll records of every employee that's due to be merged with the monthly salaries.

Parallel sorting is *hard*. Very hard. The problem is not that anyone can design a sort algorithm that runs in several threads, but that the data is a single lump. If you have 2 cores sorting a list, that list effectively has to be in 2 places at the same time (ie each core's cache). That makes the sorting very slow as the system spends more time moving data from core to core than it spends actually sorting.

That doesn't even count adding synchronisation mechanisms to the problem.

Multithreading is quite easy, multithreading that is faster than single threading is quite hard (ok, I exaggerate, but it can get very slow very quickly if you're not careful).
Re:2005 Called by workdeville · 2007-12-17 08:40 · Score: 2, Informative

It's not particularly unlikely that an algorithm runs in exactly n log n operations (for a sane choice of units). That's kind of the point of Big O notation. Dropping scalar factors into your units. (Though I will note that I prefer using notation like O(n^2 + n) over just O(n^2) since it is more informative, and admit that it complicates my "summary"). The point is: the GGP said n log n, not O(n log n). n/2 log n is faster than n log n, even if they're both of the same algorithmic order.
Re:2005 Called by workdeville · 2007-12-17 08:43 · Score: 1

With considerably more tact than you would have used? Wow, what a douchebag. All I did was point out that you were wrong.
Re:2005 Called by pthisis · 2007-12-17 08:52 · Score: 1

Another issue is that once you have more than one processor, two threads really can run at the same time which can show up all kinds of bugs you would never notice on a single core system.

That's true as far as it goes, but it's worth remembering that it's not at all a "bug" to assume that execution happens in order in a single-threaded process. Moving to multi-threading doesn't expose bugs you didn't notice before. It opens up a whole new class of bugs and requires explicit management of serialization. In general, it is not a case of sloppy programming being hidden on single-processors; it's that programming threads is very different and much harder, and you can't just implicitly parallelize existing (bug-free) single-threaded code.

Note that I'm not disagreeing with you--when running multithreaded code on single cores hides some of the serialization bugs, that really is wallpapering. But it's far from the common case as to why bugs show up in multithreaded programs, and it's worth pointing out that it's not just "programmer laziness" making those dual cores idle a lot. Multithreading is difficult, and it's often a better idea to avoid it and go for stability--it all depends on the value of speed for the particular application.

The main problem I can see is with testing for errors. With multiple threads it's up to the OS on how it juggles them around and that juggling may be different for every test run. So you could run the same test a hundred times, then suddenly, you could get a failure. So multi-threading throws in a certain random aspect into the software which never used to be there.

Precisely. Multithreading introduces a huge level of complexity, part of which is beyond the bounds of your program; that in turn makes writing tests much more difficult, and decreases your confidence that a test suite is really catching the bugs it's intended to test.

--
rage, rage against the dying of the light
Re:2005 Called by AdmiralDouglas · 2007-12-17 08:53 · Score: 1

I would say that it is *extremely* common for casual conversation to not have anything whatsoever to do with O().
Totally. I have conversations every day and O() never even enters my mind. Though occasionally ASCII Cyclops do...
Re:2005 Called by Pollardito · 2007-12-17 09:23 · Score: 2, Funny

actually they divided the buck into 10 dimes and then passed them all in parallel.

seriously though, it was only a few years ago that people were scoffing at the usefulness of dual processor desktop machines and arguing the value of being able to run multi-threaded apps and multiple apps faster at the expense of poorer performance on the vast majority of apps and games which people were running in isolation. it doesn't seem like applications or operating systems have seen a major overhaul since that time (just incremental gains), but the enthusiasm with which they're piling on more and more cores has drowned out all the questions people had. i think this has more to do with chip marketers needing to be able to trumpet something with great excitement than actual newfound utility of multiprocessing
Re:2005 Called by Anonymous Coward · 2007-12-17 09:29 · Score: 1, Informative

>> In the end, you end up with something that sorts faster than n log (n).

> Not without an infinite number of processors you don't.

If you're sorting n items, and you have n processors, then the Odd-Even Transposition Sort has a worst-case time of O(n).
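
For reference, a sequential Python sketch of that algorithm (the function name is mine). The inner loop is the part that n processors would do in a single step, because the pairs compared within one phase never overlap. That leaves n phases, hence O(n) time with n processors:

```python
def odd_even_transposition_sort(items):
    """Simulate the n phases of odd-even transposition sort on one CPU."""
    a = list(items)
    n = len(a)
    for phase in range(n):
        start = phase % 2                 # even phases pair (0,1),(2,3)...; odd phases (1,2),(3,4)...
        for i in range(start, n - 1, 2):  # these compare-exchanges are independent of each other
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```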
Re:2005 Called by pthisis · 2007-12-17 09:36 · Score: 2, Informative

On modern systems, threads are themselves first-class constructs

Not in, say, Linux or Plan 9. Contexts of execution are first-class constructs, and both threads and processes are special cases of COEs.

A process has things like memory-tables for virtual memory, handles for objects, files, socket connections, etc. A process always contains at least one thread (this isn't always true while a process is being set up or torn down, but it's true when most anyone's code is running).

The latter sentence here is nonsensical on many modern systems.

The core distinction which applies to most common modern systems (Windows, OS X, Linux, modern Unices, etc) is that:

In a multithreaded program, the threads all share memory (aside from the stack and possibly thread-local storage). This can be alternately phrased as threads lack memory protection from each other. Processes do not share memory except what is specifically allocated as shared memory (through CreateFileMapping, mmap, shmget, or whatever).

When you are making the choice of whether to use threads or processes, your fundamental question should be "Do I want to implicitly share all memory?", or "Do I want to throw out memory protection?". Sometimes the answer is yes, but more often it's no (in which case you probably want to go with multiple COW processes, which the Unix/Mac crowd is familiar with through fork() but the equivalent NtCreateProcess with a NULL SectionHandle is much underpublicized on Windows).
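
A small demonstration of the difference, assuming Python on a Unix-like system (`os.fork` is not available on Windows; `state` and the function names are made up). A thread's write to a global is visible to the rest of the program. A forked child writes to its own copy-on-write copy, and the parent never sees it:

```python
import os
import threading

state = {"value": 0}

def bump():
    state["value"] += 1

def bump_in_thread():
    """The thread shares our memory, so its write shows up here."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return state["value"]

def bump_in_forked_child():
    """The child gets a private copy; we only learn its value via the exit code."""
    pid = os.fork()
    if pid == 0:                      # child process
        bump()
        os._exit(state["value"])      # report the child's view and stop immediately
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), state["value"]   # (child's view, parent's view)
```

Starting from zero, the thread call returns 1, and the fork call then returns (2, 1): the child saw its own increment, the parent did not.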

Additionally, some operating systems support fibers.

Fibers are pretty tangential to the conversation and can also be implemented in user space. They're not really threads (or processes) at all, they're just coroutines. Java's "green threads" are one common example.
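
To show what "just coroutines" means in practice, here is a toy user-space scheduler, assuming Python generators stand in for fibers (`worker` and `run_round_robin` are invented names). Nothing is preempted. A task runs until it explicitly yields, and then the next one gets a turn:

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative task: does one step, then hands control back."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                     # explicit hand-off point

def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)            # run the task up to its next yield
            queue.append(task)    # not done yet, so back of the line
        except StopIteration:
            pass                  # task finished; drop it
```

Running tasks "a" (2 steps) and "b" (3 steps) logs a0, b0, a1, b1, b2, all on a single OS thread.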

--
rage, rage against the dying of the light
Re:2005 Called by wirelessbuzzers · 2007-12-17 09:38 · Score: 1

The difference between parallel programming and multithreaded programming is this ... with a parallel algorithm, different parts of one task/thread are done on separate CPUs, whereas with multithreaded programming each one thread/task is done entirely on one processor.

It's not a semantic difference.

My concurrency prof explained it like so: concurrent programs are semantically multithreaded, and the different threads interact with each other. That is, they logically have more than one operation going on at a time. Parallel programs are physically multithreaded. That is, they do operations on more than one hardware core. Concurrency is about semantics, whereas parallelism is about performance.

These two concepts are somewhat orthogonal: a program might have one thread of control, but its compiler or libraries farm out operations to more than one core, or to a large vector unit. More commonly, it might have many logical threads (to repaint the GUI and contact the network, for instance), even if these all run on the same physical core. Some programming languages' concurrency packages don't even support multiple cores (eg, Concurrent ML); these often scale to more threads than those which do support multiple cores, because they have fewer issues with context switches and synchronization.

--
I hereby place the above post in the public domain.
Re:2005 Called by mycroft822 · 2007-12-17 09:46 · Score: 1

Yes, I was applying George's insult to the context of this thread... rather than to someone eating all the shrimp in the ocean which would make no sense at all.
Re:2005 Called by xhrit · 2007-12-17 09:52 · Score: 1

I do not create new threads for graphics or physics engines, but I do create one for each AI, and one for each network connection.

I love Boost.

--
Dungeon Tactics : Free Open Source SRPG
Re:2005 Called by Strilanc · 2007-12-17 10:04 · Score: 1

Let m be the number of processors
O(n/m) is better than O(n)

We can do better than n*lg(n) because it's a different computing model.
Re:2005 Called by felix9x · 2007-12-17 10:29 · Score: 2, Insightful

Actually you can sort the same data with multiple threads in parallel. Consider for example that you divide an array of items into two halves and sort each half with a separate thread using quicksort. There is no problem with synchronizing the data since the two threads will be working on separate data. The merge of the two sorted sets can be done single-threaded, which is of linear complexity. You can also get fancy with the merge but it gets more complex.
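
The split/sort/merge scheme above can be sketched roughly like this in Python. Two caveats, both my own assumptions: it uses the built-in list sort instead of a hand-written quicksort, and in CPython the global interpreter lock means the two threads will not actually give a speedup for pure-Python work, so read it as a picture of the structure only. The name sort_two_halves is made up.

```python
import threading

def sort_two_halves(items):
    mid = len(items) // 2
    # Two disjoint lists, so the threads never touch the same data.
    halves = [items[:mid], items[mid:]]

    def sort_half(i):
        halves[i].sort()  # stand-in for quicksort on this half

    threads = [threading.Thread(target=sort_half, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Single-threaded linear merge of the two sorted halves.
    left, right = halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

No locks are needed anywhere: the only synchronization is the join before the merge.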

As far as sorting stuff like drop-down boxes you will not have enough data to justify using multiple cores on it, unless you got millions of items in it but then you got other problems.
Re:2005 Called by Darinbob · 2007-12-17 10:33 · Score: 3, Insightful

There's a wide variance in what "parallel computing" means. For multicore, you've essentially just got a cheaper version of SMP (symmetric multiprocessing). This is worlds away from what occurs in a parallel computer and what most parallel programming algorithms deal with. With multicore and SMP you program mostly like you're doing multithreading on a single CPU.

The algorithms programmers have to deal with here involve concurrency, and have been in use for decades by anyone writing an OS or device driver. Dining Philosopher problem, readers and writers synchronization, etc. These are used on what most people think of as single processor computers and are essential. So I don't really think of these as "parallel programming", but as "parallel-light".

Parallel programming to me means dealing with SIMD or MIMD machines. MIMD has multiple processors each with its own memory and data, not multiple processors all sharing the same memory like SMP does. They may have high speed connections to a subset of other processors, such as being arranged in a grid or cube. SIMD has multiple processors all with their own data space but executing the same instruction sequences; the simplest form of which might be vector processors. The algorithms for these machines have very little in common with multithreading types of algorithms.

The parallel algorithms that require lots of sharing between processors will hit a bottleneck on the RAM with these multicore CPUs.
Re:2005 Called by gentlemen_loser · 2007-12-17 10:43 · Score: 1

And you're right, this isn't really a desktop issue- its mainly a server one. Desktops really don't need all the power they have now, perhaps one percent of users outside of gamers actually use it.

Can you substantiate that? Are you including people that do ANY sort of development work where they need to compile programs or test scripts? Are you including people that edit video, retouch their family photos, or dabble in using their computer to make music? How about people who burn DVDs WHILE they browse the web? Are you including people who have installed a new operating system? EVERYTHING you do on your computer from rendering the cool background picture you have on your desktop to opening a browser window (with multiple tabs, all running webapps [concurrently] that include AJAX trickery) takes CPU cycles. Trivializing "Desktops do not need that much power" like you did tells me that either a) you work for Google and are shamelessly promoting web based applications, or b) read way too much propaganda about web based computing and did not really think through what you were saying.

To be fair, you made some excellent points in the first half of your post and I agree with most of what you said. However, the assumption/plug for moving everything off of the desktop was not appreciated and did nothing to bolster your argument.
Re:2005 Called by QuantumFTL · 2007-12-17 10:48 · Score: 1

If for some reason the number of processors you had scaled polynomially with the number of data inputs, then yes a parallel algorithm might run faster than n log n.
Re:2005 Called by Zeussy · 2007-12-17 10:49 · Score: 1

Games are actually an area that is multithreading quite well. For one thing, they have been multithreading for quite some time in a simple sense (CPU & video card), but now with consoles being multicore and every high-end gaming rig at least dual-core, games are very highly threaded.

Games do have an advantage, though: there are a lot of APIs & game engines that you can just drop in and run multithreaded out of the box without much work.

The one nice thing about multicore PCs is that when a rogue app decides that eating 100% CPU is a fun thing to do, you still have a very responsive machine to kill it.

--
Automation - The Car Company Tycoon Game
Re:2005 Called by edxwelch · 2007-12-17 11:00 · Score: 1

Yeah, I've always run into hairy problems with multithreaded code.
These people think it's easy to write multithreaded code, just sprinkle in a few synchronisation blocks and Bob's your uncle. Let's see them try to debug a complicated multithreaded algorithm and see how easy they think it is.
Re:2005 Called by workdeville · 2007-12-17 11:06 · Score: 1

Purposefully misinterpreting an obviously correct statement to make a dumb point and gain some karma is an example of bad faith. I was just pointing that out. The funniest part of all is that he claims to be talking about algorithmic complexity. But even "infinitely many" processors wouldn't change the algorithmic complexity of a sorting algorithm, since it is defined in terms of the number of operations needed, not the run time. The GGGGGP (or however many G's it was) was clearly talking about run time.
Re:2005 Called by gnasher719 · 2007-12-17 11:23 · Score: 1

Parallel sorting is *hard*. Very hard. The problem is not that anyone can design a sort algorithm that runs in several threads, but that the data is a single lump. If you have 2 cores sorting a list, that list effectively has to be in 2 places at the same time (ie each core's cache). That makes the sorting very slow as the system spends more time moving data from core to core than it spends actually sorting.

Let processor 1 sort the first half of the data. Let processor 2 sort the second half of the data. When both are finished, merge the two sorted arrays. If sorting an array is done using an O(n log n) method then this should be about twice as fast.
Re:2005 Called by MagikSlinger · 2007-12-17 11:23 · Score: 1

... doesn't know how to do multi-threaded programming effectively deserves to be on EI anyway. It is not that difficult.

I have a feeling you don't. Nor do most people who claim they know. I've had to debug multi-threaded code written by people who thought they knew. Multi-threading is something to be avoided in 90% of your code. Only certain core functions might benefit from multi-threading.

First rule: profile your code. You might be surprised to find the bulk of your run-time is in 20 lines of code.

Second rule: Change algorithms first. Bubble-sort can be done multi-threadedly, but switching to a single-threaded quicksort would still blow its doors off (and hey, QSort is eminently amenable to parallelization).

Third rule: Know your overhead. Multi-threaded communication and co-ordination is expensive. Make sure you understand the cost of your multi-threaded solutions vs. single-threaded.

Fourth rule: Mutexes are not a cure all. I heard a story about how Sony's clib for PS3 had mutexes on all the non-reentrant functions (read: ALL the string functions). Anyone dumb enough to call a str*() function on a PS3 froze out all the cores while the function ran. When possible, create code that doesn't require mutexes, and if it does require, keep it to a narrow scope.

Fifth rule: GUI-Worker-IO threads. For a lot of applications, you can get away with just having one thread responding to GUI events and passing heavy lifting to Worker threads which in turn can send some heavy I/O work to I/O threads that can sit there waiting for data. It's a lot simpler to implement than threading everything. My experience is to use some sort of message mailbox system (MessagePort for the Amiga fans out there) where each thread is waiting for messages on a mailbox to act on. There should be some good resources out there on how to design them. For everyone's sake, do NOT use a busy loop checking if your FIFO has a new message. Let the OS do the waiting, which can then tell the scheduler to put your thread to sleep until the I/O or signal comes through.

Sixth rule: Learn about parallel algorithms. Writing algorithms for parallel processing is an old and venerated art. It's used in DSPs, video cards and supercomputers. Paper after paper has been written on creating very fast, parallel algorithms. Just learning about them will help you learn how to write in parallel. It's a very different way of thinking about solving problems, and kind of fun. I wish I had a Cell processor just to play with that kind of thing.

Seventh rule: Debugging is hell. Try to keep your code simple & easy when it multi-threads. The race conditions in true parallel systems can give you headaches and blurry vision trying to comprehend it. Do everything possible to make your debugging easier because multi-threading will fail.
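
As an illustration of the mailbox idea in the fifth rule, here is a minimal Python sketch. It is an assumption-laden toy, not the Amiga MessagePort API: the standard queue.Queue plays the mailbox, and STOP, worker and run_jobs are names invented for the example.

```python
import queue
import threading

STOP = object()  # sentinel message telling the worker to shut down

def worker(mailbox, results):
    while True:
        msg = mailbox.get()      # blocks in the OS/runtime; no busy loop
        if msg is STOP:
            break
        results.put(msg * msg)   # stand-in for the "heavy lifting"

def run_jobs(jobs):
    mailbox = queue.Queue()
    results = queue.Queue()
    t = threading.Thread(target=worker, args=(mailbox, results))
    t.start()
    for job in jobs:             # the "GUI thread" just posts messages
        mailbox.put(job)
    mailbox.put(STOP)
    t.join()
    # One worker handles messages in order, so results come back in order.
    return [results.get() for _ in jobs]
```

The worker owns its data and only communicates through the two queues, which also keeps the fourth rule happy: there are no explicit mutexes in the application code at all.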

--
The bitter lessons of a veteran coder: http://bitterprogrammer.blogspot.com
Re:2005 Called by dbIII · 2007-12-17 11:35 · Score: 2, Insightful

Desktops really don't need all the power they have now, perhaps one percent of users outside of gamers actually use it.

Doing things with digital video and photoshopping still images will use as much CPU as you can feed it. These are now mainstream uses for home computers.
Re:2005 Called by Mr.+Slippery · 2007-12-17 12:02 · Score: 1

O(n/m) is better than O(n)

No. For any constant m, O(n/m) is the same as O(n).
We can do better than n*lg(n) because it's a different computing model.

It's not a different computing model. Any finite set of Turing machines can be emulated with a single one.
As a practical matter, sure, parallelism is cool; but from the theoretical perspective, you don't get any new capabilities in terms of what is computable, and how complex it is to compute it.

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Re:2005 Called by Kazoo+the+Clown · 2007-12-17 12:04 · Score: 1

Clearly, Microsoft is running out of sources of new bugs. And if your OS isn't sufficiently buggy, you won't have any reason to upgrade...
Re:2005 Called by AuMatar · 2007-12-17 12:19 · Score: 1, Interesting

Very few people develop programs. Even then, 99% of development is basically 0 load. Only compilation takes time, and that's a small portion of it with any decent build script. Unless you're developing something truly huge, like the Linux kernel, it just isn't an issue.

Editing video is also a niche use case. Not doing it myself, I won't comment on how much resources it really requires. But 1 in 20 or 50 people do this regularly, if that.

Touching up photos takes almost no resources on a modern machine. People were doing this efficiently on 1 GHz and lower processors.

Browsing, even with multiple windows and tabs and AJAX, is very very low resource usage. My EEEPC handles it easily, despite being a 900 MHz Celeron UNDERCLOCKED to 600 MHz.

Where the hell did I mention moving anything off the desktop? I think web based apps suck for the most part. My point was that most computers are at 1% load or less most of the time, and sit at 30% load or less while in active use, except for those machines being used for video games. You seem to have some hugely inflated idea of what type of resources things actually use, the common email, web surfing, office use case can easily be handled by a 300MHz pentium 2, if not less.

--
I still have more fans than freaks. WTF is wrong with you people?
Re:2005 Called by TotalDisdain · 2007-12-17 12:20 · Score: 1

Technically, the algorithm is still N log(N) plus whatever extra you need to do to coordinate between the processes. Many digital hands make light work, but the amount of work is still the same (plus extra time for some of them to surf the internet). The big question that needs to be asked is how do you take a programming paradigm that is object-centric and make it care about process-centric or (gasp) architecture-centric concerns? Let alone make the programmers (double gasp) who went to that paradigm under the promise that they wouldn't have to learn this icky stuff actually learn the icky stuff. Laws of thermodynamics still apply: 1. You can't win. 2. You can't break even. 3. You can't leave the game.
Re:2005 Called by TheRaven64 · 2007-12-17 12:20 · Score: 1

Most sorting algorithms use a divide-and-conquer strategy. Quicksort, for example, partitions the data on an arbitrary pivot and creates two new lists, one containing the values smaller than the pivot and one containing those larger. The algorithm then recurses. In theory, you can extract parallelism to the order of the number of elements in the list in the divide step, since you could use a separate processor for each element and add it to the correct set. In practice, this is quite hard (although splitting it in half is relatively easy). The next step is trivial to parallelise; you simply run each recursive call in a separate process.
I implemented a parallel quicksort in Erlang as part of a short tutorial I gave on the language. It was only slightly longer than the serial version. Parallel programming is easy if you use a sensible abstract model.
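
This is not the Erlang version mentioned above (that code isn't in the thread). As a rough Python analogue of "run each recursive call separately", here is a sketch where one side of each partition is handed to a new thread, up to a depth limit. The depth parameter and the name parallel_quicksort are my own assumptions, and in CPython the global interpreter lock means this shows the shape of the algorithm, not a real speedup.

```python
import threading

def parallel_quicksort(items, depth=2):
    if len(items) <= 1:
        return list(items)
    pivot = items[0]
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]

    if depth <= 0:
        # Deep enough: stop spawning and finish this branch serially.
        return (parallel_quicksort(smaller, 0)
                + [pivot]
                + parallel_quicksort(larger, 0))

    result = {}

    def sort_smaller():
        result["smaller"] = parallel_quicksort(smaller, depth - 1)

    # The two partitions are independent, so one goes to a new thread
    # while the current thread handles the other.
    t = threading.Thread(target=sort_smaller)
    t.start()
    sorted_larger = parallel_quicksort(larger, depth - 1)
    t.join()
    return result["smaller"] + [pivot] + sorted_larger
```

It really is only a few lines longer than the serial version; the only coordination is the join before concatenating.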

--
I am TheRaven on Soylent News
Re:2005 Called by chrismcb · 2007-12-17 13:10 · Score: 1

....it wants it's article back.

Seriously - any developer writing modern desktop or server applications that doesn't know how to do multi-threaded programming effectively deserves to be on EI anyway. It is not that difficult.
Actually it is EXTREMELY difficult to write good multithreading. I think the solution is double edged. Compilers/OSes need to get better about generating/dealing with multithreaded code. And programmers need to think about it a bit more. Get everything not related to the UI on separate threads.
Re:2005 Called by JAlexoi · 2007-12-17 13:37 · Score: 1

While not intending to bash our smart Indian colleagues, Indian universities are churning out a large number of code monkeys that produce a lot of code that is not scalable whatsoever and is getting slower with each core added to your favorite CPU.
Re:2005 Called by try_anything · 2007-12-17 13:52 · Score: 2, Insightful

More and more cores? Consumer desktops and laptops have gone up to a whopping two cores -- four cores only if you blow a wad of dough for bragging rights. Two processors is definitely not overkill for the average user, especially since most users have a browser full of Ajax-ridden web pages open 24/7. I doubt that four cores will be overkill, either, once we start to realize all the various ways we've crippled applications to make them well-behaved citizens of the vanishing single-core desktop.

The massively multicore processors are exactly where they need to be: in servers and workstations, and on the desks of hardware queens who absorb the cost of product development so I don't have to.
poorer performance on the vast majority of apps and games which people were running in isolation

People run the vast majority of their applications concurrently with other applications. The only significant exception is gamers. When you're dealing with a sluggish app on a single-core machine, what are the odds it's unresponsive because of another application vs. being unresponsive because of its own problems? Now, same question, on a dual-core machine? The odds drop quite a bit. It's nice to have a spare core so when one app gets fussy the rest of your applications keep responding normally.
it doesn't seem like applications or operating systems have seen a major overhaul since that time (just incremental gains)

All the more reason to have multiple cores. In my experience, having multiple processors actually compensates for application-level and OS-level multiprocessing deficiencies, because let's face it, one hoggish app can make it very annoying to use a single-core machine. OSes are supposed to mitigate that, but since they don't do a perfect job, multiple cores help keep the system usable. Granted, there are other resources besides CPU that can suffer from contention, but every little bit helps.
Re:2005 Called by chis101 · 2007-12-17 14:48 · Score: 1

I have a computer whose sole purpose is to serve files, download crap, etc... nothing major. So, as a wild-ass-guess, I'd say it has maybe 1% of the load of an average computer user's computer (checking email, browsing web). It was last rebooted 30 hours ago. Htop shows that in the past 30 hours, the running software has used just over a minute of processor time (40 seconds being rtorrent hashing a 15 GB file on a 1 GHz AMD). Even if the load was 100, or even 1000, times its current load, the processor would still be overpowered for what it does.

I think most users are IO bound, not processor bound.
Re:2005 Called by jmorris42 · 2007-12-17 15:04 · Score: 1

> This sounds to me like a great example of passing the buck.

Great example of an ATTEMPT to pass the buck. Remember Itanic tried something similar, passing the buck to the compiler. When the new miracle 'smart' compilers didn't appear Itanic's fortunes faded away. Now few people even know Intel makes a non-x86 arch chip other than low end ARM stuff.

Multi-core shows no sign of having a similar fate yet. Because two or even four cores can speed up existing workloads enough people see the benefit. The question still waiting to be answered is whether 8-16 cores can be utilized. If the software devels fail to produce the required miracle.... well our industry is in for a world of pain. Because it is now clear massive multicore is the ONLY way forward for x86 and the market has demonstrated again and again it isn't interested in anything else. But if performance stops increasing the upgrade treadmill that generates all of the money stops.

--
Democrat delenda est
Re:2005 Called by T-Bone-T · 2007-12-17 15:23 · Score: 1

Photoshop doesn't need any? Are you kidding? Have you tried running a filter on a 24MP image? That takes a bit of time! You strike me as one of those people that asks why we even try to make things better when most people don't take advantage of it. That's a pitiful attitude to have.
Re:2005 Called by ChilyWily · 2007-12-17 16:25 · Score: 1

LOL - though, I will make a brief comment:

One of the hardest areas to address when going Multi-thread (for multi-core support) is the legacy source code, that just works... and management wants to make 'multi-thread capable' so as to preserve the 'equity' in that application.

It's perhaps too simple to say that starting with a blank slate, designing for MT capable applications isn't that hard. When you have to deal with 10-15 years of (legacy) crusty code and unrealistic expectations of zero-risk but max-gain, then you have a different problem to solve and without realizing the truth in your fiction, such generalizations can be seriously misleading!
Re:2005 Called by Stuntmonkey · 2007-12-17 16:50 · Score: 1

How about people who burn DVDs WHILE they browse the web?

These are separate processes so it's trivial to split them across multiple cores. Any OS will do that today. The case of interest is when you have a single logical task that you're trying to speed up with multiprocessing, because that requires you to fundamentally rethink your algorithms.

IMHO this concurrency issue isn't a terribly big deal for desktop computers, at least for a while. Today there are very few single tasks that need >1 core to achieve good performance. Graphics rendering is one, so is video encoding. I believe that most of these will be handled as special cases, as we have seen play out with graphics (specialized parallel hardware in video cards, and all the messy details handled by libraries). Likewise, the x264 video encoder for example has multicore support built in. My point is that so long as the number of computationally intensive tasks of this sort is limited, and amenable to efficient solution using a library, the fraction of desktop programmers that have to think explicitly about concurrency will be relatively small. I.e., it will be a specialty.
Re:2005 Called by joto · 2007-12-17 17:23 · Score: 2, Interesting

Actually, Itanium was a fairly good idea. That it didn't work out could just as well be ascribed to politics and real-world issues as to technical issues. For example, the requirements said it should be able to run x86 unmodified (why? if you want x86 you already know where to get it, right?). It was oversold (the next desktop processor), and underperformed (late delivery, bad performance). None of these issues indicate that explicit instruction level parallelism (EPIC) is a bad idea. And they certainly have good Itanium compilers now. The main problem with Itanium (apart from the initial delays) was that it was a solution in search of a problem. It still is. But what a marvellous solution!
Re:2005 Called by Stuntmonkey · 2007-12-17 17:26 · Score: 1

And you're right, this isn't really a desktop issue- its mainly a server one.

I would argue it isn't even much of a problem on the server, realistically speaking. There just aren't that many necessarily parallel problems in reality (by which I mean, single atomic tasks that are too intensive to get the desired performance on one core). Most business computing problems involve serving a large number of (independent atomic) transactions, which is an easy problem to split across multiple cores. The fact that businesses aren't clamoring to solve this "concurrency problem" implies to me that most of their problems are solved more simply, for example with VMWare.

An important class of necessarily parallel tasks is scientific computing. What makes me skeptical that the concurrency problem can be solved in some easy general-purpose way is the fact that people have been doing parallel simulations for decades, and no good (easy) approaches have turned up.
Re:2005 Called by ceoyoyo · 2007-12-17 17:40 · Score: 1

Fortunately most of the things that the average user likes to do are embarrassingly parallel (graphics and multimedia). I really don't see what the fuss is about... I'm having trouble thinking of something normal users would want to run faster that's difficult to parallelize.

The future focus on massively parallel processing does kind of suck for the few people tied to algorithms that are difficult to parallelize. On the other hand, there will be lots of PhDs awarded for finding parallel algorithms. I already cashed in with one peer reviewed paper on parallelizing and distributing an algorithm.
Re:2005 Called by ceoyoyo · 2007-12-17 17:45 · Score: 1

Fortunately those are two very easily parallelizable tasks, and ones that probably already are multithreaded. Certainly on the Mac, which has had multiple processors in desktop machines for the better part of a decade.
Re:2005 Called by 32771 · 2007-12-17 23:02 · Score: 1

EE Guy #3: Oh, they are still whining. Lets just give them VHDL, works for our parallel stuff.

I can't see anything devious in your plans.

--
Je me souviens.
Re:2005 Called by fitten · 2007-12-18 01:20 · Score: 1

Not when you run multiple single-process web servers in parallel ;)

One of the best models for these type things is a multithreaded/multiprocess application (using kernel/preemptive threads if using threads), basically one thread/process per core on the machine (this is a fudge, but for simplicity it's ok as an example). Each kernel thread cooperatively threads (no kernel calls or context switches to get in the way, much lower latency and overhead that way) over the task(s) given to it. It's a bit more complicated to program but you get precise control over when your cooperative tasking switches state and you don't get the overhead of context switches, while you are still able to use every core in the system to tackle your problem. This model is good for high IO tasks that have many channels to satisfy (an application that has many sockets open satisfying web requests, for example, or blasting email, etc.).
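
A very rough Python sketch of that layout: a handful of OS threads (one per core by default), each running its own cooperative scheduler over its share of the tasks. Assumptions to flag: asyncio stands in for the hand-rolled cooperative threading described above, handle/run_share/run_all are invented names, and CPython's global interpreter lock means this shows the structure, not true multi-core scaling.

```python
import asyncio
import os
import threading

async def handle(task_id):
    # A cooperative task: it only gives up the CPU at an explicit await.
    await asyncio.sleep(0)       # switch point (stand-in for socket I/O)
    return task_id * 2

async def run_share(task_ids):
    # All tasks in one share interleave cooperatively on a single OS thread.
    return await asyncio.gather(*(handle(t) for t in task_ids))

def run_all(task_ids, n_threads=None):
    n_threads = n_threads or os.cpu_count() or 1
    # Deal the tasks out round-robin, one share per kernel thread.
    shares = [task_ids[i::n_threads] for i in range(n_threads)]
    results = {}

    def thread_main(idx):
        # Each kernel thread owns its own event loop.
        results[idx] = asyncio.run(run_share(shares[idx]))

    threads = [threading.Thread(target=thread_main, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(r for share in results.values() for r in share)
```

The preemptive switching happens only between the few kernel threads; within a thread, tasks switch exactly where the code says await.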
Re:2005 Called by Strilanc · 2007-12-18 08:41 · Score: 1

But m is not a constant, it's a parameter in our model. You might as well say O(n) is O(1) for any constant n.

In other words, we are looking at how the running time changes as the input size, and number of processors, increases.

You see a similar thing going on in the IO model, where you're counting disk accesses and you have three parameters (memory size M, block size B, input size N). In the IO model sorting takes O((N/B) log_{M/B}(N/B)), which is better than O(n lg(n)).
Re:2005 Called by T-Bone-T · 2007-12-18 09:12 · Score: 1

Said the zitty-faced anonymous coward.
Re:2005 Called by Mr.+Slippery · 2007-12-18 10:23 · Score: 1

But m is not a constant, its a parameter in our model.

And what model is this? I admit it's been a while since grad school and my last classes in theory, but I don't recall any model used for studying algorithmic complexity or computability in which "number of processors" was a parameter.
In other words, we are looking at how the running time changes as the input size, and number of processors, increases.

A single Turing-complete system can emulate any finite set of others; from the perspective of complexity and computability theory, it makes no difference how many chips there are. You might as well make processor speed a parameter.

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Re:2005 Called by Strilanc · 2007-12-18 11:06 · Score: 1

That's only true from a computability viewpoint, where running time doesn't matter as long as it's finite. Time complexity, on the other hand, can change from machine to machine. Searching a sorted array is O(lg(n)) on the typical computer, but O(n) on the typical theoretical Turing machine because they can't do random accesses.

Consider the problem of searching an unsorted array, and assume processors accessing memory don't interfere with each other. Then if you have m processors you can assign them each a piece of size n/m of the array, and get the job done in O(n/m) time. Check out http://en.wikipedia.org/wiki/Cache_oblivious for an example of a model with more than just n as a parameter.
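
The unsorted-search example can be sketched like this in Python, with m workers each scanning a slice of about n/m elements. Assumptions: a thread pool stands in for the m processors, parallel_find is a made-up name, and under CPython's global interpreter lock the slices are not truly scanned at the same time, so this only shows how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_find(items, target, m=4):
    n = len(items)
    chunk = max(1, -(-n // m))   # ceil(n / m) elements per worker

    def scan(start):
        # Each worker does O(n/m) comparisons on its own slice.
        for i in range(start, min(start + chunk, n)):
            if items[i] == target:
                return i
        return None

    with ThreadPoolExecutor(max_workers=m) as pool:
        hits = pool.map(scan, range(0, n, chunk))
        found = [h for h in hits if h is not None]
    # Report the lowest matching index, or None if no slice had a match.
    return min(found) if found else None
```

The total number of comparisons is still about n; only the elapsed time per worker drops to about n/m, which is exactly the distinction the two posters are arguing over.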
Re:2005 Called by palegray.net · 2007-12-18 11:20 · Score: 1

"Desktops really don't need all the power they have now, perhaps one percent of users outside of gamers actually use it."

This will hold true until we see the development and adoption of widespread general purpose distributed computing, in which case every desktop is a server, too.

--
512 MB RAM, 20 GB disk, 200 GB transfer, five datacenters. $19.95/month.
Re:2005 Called by cthulhu11 · 2007-12-18 12:44 · Score: 1

EE Guy #2: Exactly! We'll be able to blame the programmers for being lazy and not wanting to learn new complicated algorithms that require an additional 4 years of university.

In a Sun class, I encountered a Sun employee from a central-US state who did exactly this as a justification for the performance stagnation of SPARC cores.
Re:2005 Called by xoclipse · 2007-12-23 12:41 · Score: 1

I think that the actual perceived performance difference will really be in how well the operating system handles multiple threads and cores. For example, when I open up Firefox and Photoshop at the same time, how well my OS handles that MIGHT be the difference between them both opening in 5 seconds or in 10 (I'm aware that there is much more to the time an application takes to launch besides raw speed). Of course, adding cores might only speed up the time it takes to open to a certain degree; after that it is dependent on things like I/O, hardware, latency, etc.

Most desktop applications will only be able to benefit by a finite amount of parallelization. When you look at concurrency and your application design, the fundamental way that the application interacts with the user and data is going to define the limits of parallelization.

Now when you are using an application and it stops responding when you perform a certain task, that is probably an opportunity for the developer to factor that operation into a new thread (GUIs should never block). But after they've done that and the application's responsiveness is fine, what more can they do? Make another thread that says please wait while we (block) on this thread? You are always going to have things like I/O which is really what the bottleneck usually is.
Re:2005 Called by Shelon+Padmore · 2007-12-31 01:17 · Score: 1

This is not just threading, it requires a paradigm shift. To some extent the facilities provided by the OS can make this a more realistic effort. Let's hope Microsoft is up to the task. - Shelon Padmore

M$ programmers should be already capable by scafuz · 2007-12-17 05:48 · Score: 5, Funny

just start a multithread process: 1 core for the program itself, the remaining 7 for the bugs...

Re:M$ programmers should be already capable by mdielmann · 2007-12-17 07:51 · Score: 1

How sad it is that when I read that I thought, "My God, do you have any idea how hard it would be to keep the bugs separate?"

--
Sure I'm paranoid, but am I paranoid enough?
Re:M$ programmers should be already capable by sohp · 2007-12-17 08:39 · Score: 1

more like:

1 core for Windows
1 core for the anti-virus
1 core for the firewall
1 core for the DRM
1 core for Clippy
1 core for copying files.
1 core for the eye candy
1 core for the application you are trying to use.

hhooppee tthheeyy ffiixx tthhiiss ssoooonn by Chordonblue · 2007-12-17 05:48 · Score: 5, Funny

II hhaavvee aann XX22 pprrocceessssoor? Ii ccaann ggooeess TTWWIICCEE aass ffaasstt nnooww?

--
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."

Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by ByOhTek · 2007-12-17 05:54 · Score: 4, Funny

my eyes, they bleed.

--
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by Nova1313 · 2007-12-17 06:19 · Score: 3, Interesting

When the first AMD X2 chips came out, the Linux kernel had issues with the clock on those chips. The clock would be several times (presumably 2 times?) faster than it should be; the cores' clocks were not synchronized for some reason or the kernel would lose track... When you typed a letter it would repeat multiple times as you described. :)

--
There exists some positive integer N that you are the Nth person to read this signature.
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by CastrTroy · 2007-12-17 06:35 · Score: 1

I've had the same problem on single processor machines running Linux for years. I don't know if it's a problem of the keyboard repeat rate being set too low, or something else to that effect, but I notice that a lot of the time on my Linux machines it seems to double/triple type a lot of letters.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by antdude · 2007-12-17 06:49 · Score: 1

"My eyes! The goggles do nothing!"

--
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by MagicM · 2007-12-17 07:00 · Score: 1

+++ ATE0
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by locokamil · 2007-12-17 07:21 · Score: 1

Not in text, but in images:

http://geekz.co.uk/lovesraymond/archive/hyper-threading

Pretty funny. If ELER were updated more frequently, it would definitely be on my daily reading list. :)
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by $RANDOMLUSER · 2007-12-17 07:54 · Score: 1

+++ ATE0
God, you're old. So am I, and I'm rolling on the floor.

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by Wisconsingod · 2007-12-17 08:03 · Score: 1

have another shot of tequila, eventually the letters will blur together
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by Cemu · 2007-12-17 09:13 · Score: 1

You also have to be careful when running VMware on these types of computers or you get the same kind of results. I had to disable all power management features on the host; then the problem mostly went away on the clients.
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn by Tablizer · 2007-12-17 16:51 · Score: 1

> Ii ccaann ggooeess TTWWIICCEE aass ffaasstt nnooww?

I think you just have a dual-cup coffee container.

--
Table-ized A.I.

OS/2? by SCHecklerX · 2007-12-17 05:50 · Score: 4, Interesting

I remember learning to write software for OS/2 back in the early 90's. Multi-threaded programming was *the* model there, and had it been more popular, it would be pretty much standard practice today, making scaling to multiple cores pretty effortless, I'd think. It's a shame that the single-threaded model became so ingrained in everything, including Linux. For an example that comes to mind, why do I need to wait for my mail program to download all headers from the IMAP server before I can compose a new message on initial startup? Same with a lot of things in Firefox.

Does anybody remember DeScribe?

Re:OS/2? by eMartin · 2007-12-17 06:05 · Score: 3, Insightful

Neither does Microsoft's Outlook Express, but I don't think that was his point.
Re:OS/2? by shoor · 2007-12-17 06:19 · Score: 2, Interesting

I was working at a very small software shop when OS/2 came out. We would get a customer who wanted something to work on an Apollo workstation, another one wanted it for Xenix, a third for Unix BSD 4.2 (my favorite), or Unix System V (ugh!), or DOS. So, we got a project to port something to OS/2 version 1.0, and I got it to work, and it used multi-threading, which I thought was pretty cute, and I was proud of myself for figuring it all out just from the manuals. Then the new revision of OS/2 came out and everything I had done was broken. My boss was so mad he swore off OS/2 forever after that.

--
In theory, theory and practice are the same; in practice they're different. (Yogi Berra & A. Einstein)
Re:OS/2? by John+Hasler · 2007-12-17 07:13 · Score: 1

> For an example that comes to mind, why do I need to wait for my mail program to download
> all headers from the IMAP server before I can compose a new message on initial startup?

Ok, I'll bite: why? I read and compose mail while fetchmail (a background process) fetches new mail, spamassassin (about six background processes) filters it, mailagent (another background process) sorts it, and exim (another background process) delivers it to the appropriate mailboxes.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:OS/2? by SCHecklerX · 2007-12-17 08:12 · Score: 1

> Ok, I'll bite: why? I read and compose mail while fetchmail (a background process) fetches new mail, spamassassin (about six background processes) filters it, mailagent (another background process) sorts it, and exim (another background process) delivers it to the appropriate mailboxes.

Which has what, exactly, to do with a non-multithreaded remote IMAP client that makes you wait to do one thing while it is doing something else (usually somewhat network intense)?

Careful. On the server I run sendmail with some custom mimedefang stuff to deal with quarantines, mailing lists, and non-rfc-compliant rejections, milter-greylist, and spamassassin as well. I also pre-sort into proper IMAP folders using procmail (not sure what mailagent is or what it offers over the flexibility of procmail), and use fetchmail against the IMAP server in order to process my SA bayesian learning folders once each hour.

I've run a similar system for a Fortune 500 company processing millions of messages a week in a former life.

Spawning a bunch of separate processes really has nothing to do with properly writing a multi-threaded app, which was the point of my original post.
Re:OS/2? by Chemisor · 2007-12-17 08:47 · Score: 1

> For an example that comes to mind, why do I need to wait for my mail program to download
> all headers from the IMAP server before I can compose a new message on initial startup?

All good socket code uses asynchronous calls with select. Threading will just make things worse. Especially in the modern UI frameworks, which are all event-driven, asynchronous calls are a perfect fit: they have no overhead and none of the great multitude of concurrency problems you automatically get by threading.
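
A minimal sketch of that select-style loop, in Python, for anyone who hasn't seen one. The socket pair is only a stand-in for an IMAP connection, and `pump_events` and the "compose" UI event are made-up names, not anything from a real mail client. One thread services both the UI queue and the network, so composing never waits on the download.

```python
import selectors
import socket

def pump_events(sel, ui_events, max_rounds=10):
    """Single-threaded event loop: service whichever source is ready."""
    log = []
    for _ in range(max_rounds):
        # Handle pending UI work first; it never blocks on the network.
        while ui_events:
            log.append(("ui", ui_events.pop(0)))
        # Wait briefly for readable sockets, then go back to the UI.
        for key, _mask in sel.select(timeout=0.05):
            data = key.fileobj.recv(4096)
            if data:
                log.append(("net", data.decode()))
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(key.fileobj)
        if not sel.get_map():
            break
    return log

server, client = socket.socketpair()   # stand-in for the IMAP connection
client.setblocking(False)
sel = selectors.DefaultSelector()
sel.register(client, selectors.EVENT_READ)

server.sendall(b"header 1\n")          # the "server" sends one header, then closes
server.close()

log = pump_events(sel, ui_events=["compose"])
client.close()
sel.close()
print(log)
```

The point of the design is that there is exactly one flow of control, so there is nothing to lock.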
Re:OS/2? by pthisis · 2007-12-17 09:01 · Score: 3, Insightful

> It's a shame that the single-threaded model became so ingrained in everything, including linux. For an example that comes to mind, why do I need to wait for my mail program to download all headers from the IMAP server before I can compose a new message on initial startup?

I'm of the opposite opinion; it's a shame that so many people equate parallel processing with threads. When there's not much shared data, using multiple processes keeps memory protection between your parallel "things", decreasing coupling, increasing isolation, and generally resulting in a more stable system (and for certain things where you can avoid some cache coherency problems, a faster system). Your example is perfect; there's really no good reason to use a thread for such lookups. Another process would do, or even better just use select() and avoid all the pain (and bugs) of a multithreaded solution.

OS developers spent a lot of engineering time implementing protected memory. Threads throw out a huge portion of that; a good programmer won't do that without very good reasons. Some tasks, where there really are tons of complicated data structures to be shared, are good candidates for threading. More commonly, though, threads are used either because the programmer doesn't know any better or because they allow you to be a slacker about defining exactly what is shared and mediating access to it. The latter is especially dangerous; defining exactly what (and how) things are shared goes most of the way toward eliminating multiprocessing bugs, and threads make it easy to slack off on that and get a "mostly working" solution that occasionally deadlocks, fails to scale, etc.

Use processes or state machines when you can, and threads when you must.

--
rage, rage against the dying of the light
Re:OS/2? by swillden · 2007-12-17 09:34 · Score: 1

> Which has what, exactly, to do with a non-multithreaded remote IMAP client that makes you wait to do one thing while it is doing something else (usually somewhat network intense)?

Nothing, of course. It is worth pointing out that there are lots of IMAP clients that are multithreaded and handle this very well. Which client are you using, so I can know to avoid it?

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:OS/2? by TheRaven64 · 2007-12-17 12:38 · Score: 3, Interesting

> I'm of the opposite opinion; it's a shame that so many people equate parallel processing with threads.

I read that and wished I had mod points. Anyone who has programmed with a language designed for concurrency, like Erlang, Termite, or a few Haskell dialects, hates using threads. Threads are something that two kinds of people should use: operating system designers and compiler writers. Everyone else should be using a higher-level abstraction.
The big problem is not the operating system designers, it's the CPU designers. They integrated two orthogonal concepts, protection and translation, into the same mechanism (page tables, segment tables, etc). The operating system wants to do translation so it can implement virtual memory. The userspace program wants to do protection so it can use parallel contexts efficiently. Mondrian memory protection would fix this, but no one has implemented it in a commercial microprocessor (to my knowledge).

--
I am TheRaven on Soylent News
Re:OS/2? by fozzy1015 · 2007-12-17 16:54 · Score: 1

I agree, the "Unix way" has been to try to always run separate processes. Threads are seen by some as a 'hack' for when IPC methods such as sockets cost too much. Read Raymond's "The Art of UNIX Programming" as an example. But then when you have to serialize access to memory (e.g. with semaphores), that also costs time.

I work for a big telecom in their enterprise network router division. We're in the process of starting to move from VxWorks to Linux. VxWorks has no concept of memory protection. A VxWorks task would be the equivalent of a thread in Linux. Yes, each task has its own stack, but a task is more of a scheduling mechanism than a memory-protection one. For Linux we're pretty much going to map every task to its own process, while defining a common memory area for data that multiple processes need (e.g. switch port status). We already have to move our socket code from a custom implementation (which won't work under Linux) to a TCP style. The latter is a lot more robust, and the current implementation we have actually lets us register our own callbacks instead of having to deal with select loops, so it's pretty quick to get up to snuff. But the one big area of work ahead is that the code is replete with direct function calls to other tasks, which obviously are going to break when these guys get mapped to processes under Linux. But I agree with you about stability. Some of the more insidious bugs we've had have been when one task overwrites another task's data or TCB. They took a lot of man-hours to figure out, and I would equate them to a nasty memory leak.
Re:OS/2? by ceoyoyo · 2007-12-17 17:56 · Score: 1

Threads are quite nice, and they cut down on the number of mysterious entries in your process list.

I also agree that threads as implemented should be hands off in almost all situations. The first thing I do when working with threads in a new environment is to throw an object around them. That makes for all kinds of interesting possibilities. I absolutely hate MPI so I wrote a much nicer system that uses Cocoa's distributed objects and serializable thread objects. It's WAY easier to use, much simpler and often quite a bit faster than MPI's model of tossing around executable files that get turned into processes and then passing messages between them.
Re:OS/2? by TeknoHog · 2007-12-18 03:17 · Score: 1

> Anyone who has programmed with a language designed for concurrency, like Erlang, Termite, or a few Haskell dialects hates using threads. Threads are something that two kinds of people should use; operating system designers and compiler writers. Everyone else should be using a higher-level abstraction.

I agree completely. My experience is with Fortran 90, which with good compilers can distribute matrix/vector math over multiple CPUs. Vector addition is a trivial example of such SIMD code, where the component additions are independent of each other. I would call it data parallelism rather than multithreading anyway.
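
For the curious, here is roughly what that decomposition looks like when spelled out by hand, as a Python sketch rather than Fortran. The names `add_chunk` and `parallel_add` are invented for illustration. Caveat: in CPython the global interpreter lock means these threads won't actually speed up pure-Python arithmetic, so this only shows the structure (independent chunks, no shared writes), not a performance win.

```python
from concurrent.futures import ThreadPoolExecutor

def add_chunk(args):
    a, b, start, stop = args
    # Each output element depends only on a[i] and b[i], so chunks are independent.
    return [a[i] + b[i] for i in range(start, stop)]

def parallel_add(a, b, workers=4):
    n = len(a)
    step = max(1, (n + workers - 1) // workers)   # ceiling division
    tasks = [(a, b, s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in task order, so the chunks line up again.
        parts = list(pool.map(add_chunk, tasks))
    return [x for part in parts for x in part]

print(parallel_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

Because no worker writes anything another worker reads, no locks are needed anywhere.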

--
Escher was the first MC and Giger invented the HR department.
Re:OS/2? by delt0r · 2007-12-18 04:16 · Score: 1

In a world where C/C++ are still the dominant "fast" languages, your comments are falling on deaf ears. Let's face it, what we know from academia versus what we use commercially are so different. Most software projects are well over budget and over time, and the resultant code is full of bugs (it's not just Windows). Yet the industry still thinks that it's somehow working? Imagine car manufacturers producing cars as reliable as an OS.

But the other side of the coin is that a lot of these higher-level languages simply don't have very mature compilers yet. The error messages are very cryptic and often not even correct. Example: macros in Scheme. Is it the calling code that's wrong, or a bug in the macro? What about when you have macros of macros? I remember having similar problems with Haskell. The bulk of commercial programming is reasonably simple algorithms, and the higher-level languages haven't made sense there yet.

Having said that, I believe the future (unfortunately distant) is higher level langs where the compilers and VM do all the work.

--
If information wants to be free, why does my internet connection cost so much?
Re:OS/2? by pthisis · 2007-12-28 21:37 · Score: 1

Even if you don't have Erlang-style coprocesses, you can still go with other common multiprocessing solutions. Programmers have been writing multiple process solutions in C since before threads existed in any mainstream OS, and the vast majority of the time they're a much better choice than threads--the only real difference between the two being that threads throw protected memory out the window to a pretty large degree. IMO, if Windows had had good support for multiprocess solutions long ago, threads would be more appropriately used in somewhat limited circumstances.

Nowadays, with CreateProcessEx/NTCreateProcess Windows _does_ have good copy-on-write process solutions, but it's going to take a long time for the culture to move away from a knee-jerk "threads are the way to go for multiprocessing!" opinion (indeed it's been years since XP brought those solutions mainstream and only recently are they being exposed in any major programming venues).

--
rage, rage against the dying of the light
Re:OS/2? by pthisis · 2007-12-28 21:43 · Score: 1

> Threads are quite nice, and they cut down on the number of mysterious entries in your process list

There's no reason right now that modern OSes can't share program IDs in the process list for processes; in Linux, use CLONE_PID without CLONE_VM, for instance. That's such a cosmetic change, though, that it's hard to think other OSes won't catch up once programmers realize that they have good multiprocess solutions available now on Windows and OS X (as well as Unix clones), and that throwing out memory protection between different components of your program is just stupid.

And honestly, end users would probably be a lot happier to have a single tab of their browser not crash out the whole thing than they would to have a few less entries in the task manager (which the vast majority of "just plain endusers" barely use at all). It's a nice incremental step, but far less important to the user experience than keeping protected memory around and keeping apps much more stable. I'm not saying it's not important, but just from a UI standpoint it's a lot less important than the things that using multiple processes in lieu of threads bring to the table.

--
rage, rage against the dying of the light
Re:OS/2? by ceoyoyo · 2007-12-29 02:17 · Score: 1

Threads don't require the loss of memory protection, they involve the loss of enforced memory protection. In other words, they give you the choice. Lots of things are a LOT easier using shared memory. Some things pretty much require it.

As for the browser crashing when one tab goes down, that's simply poor programming. You can restart a crashed thread just as easily as you can restart a crashed process.
Re:OS/2? by pthisis · 2007-12-29 06:43 · Score: 1

> Threads don't require the loss of memory protection, they involve the loss of enforced memory protection. In other words, they give you the choice.

But not a very granular choice; many systems want to share only a small part of their memory, but for some reason go with threads (thereby sharing everything).

> Lots of things are a LOT easier using shared memory. Some things pretty much require it.

Yes, but thankfully shared memory works fine with processes. And it tends to lead to more stable solutions and be easier to program in the long run; threads often _seem_ easier because you don't have to think about what to share--you just share it all. But that often leads to unforeseen problems unless you're very careful to plan out what things are accessed from where--exactly the work that you needed to do things with multiple processes + shared memory, but the latter enforces your decisions about what to share more rigorously.
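
A small sketch of what "processes + explicit shared memory" looks like, in Python. It assumes a Unix-like OS (it asks for the "fork" start method), and the names `worker`, `run` and the two counters are invented for the example. Only the one `Value` is shared; the ordinary global stays private to each process.

```python
import multiprocessing as mp

private_counter = 0   # ordinary global: every process gets its own copy

def worker(shared_counter):
    global private_counter
    private_counter += 1                 # changes only the child's copy
    with shared_counter.get_lock():      # explicit, mediated access
        shared_counter.value += 1

def run(n_workers=4):
    ctx = mp.get_context("fork")         # assumption: fork is available (Unix-like OS)
    shared_counter = ctx.Value("i", 0)   # the only memory we chose to share
    procs = [ctx.Process(target=worker, args=(shared_counter,))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared_counter.value, private_counter

shared_total, private_total = run()
print(shared_total, private_total)
```

The parent sees every increment of the shared counter and none of the increments to the private one, which is the "enforced decision about what to share" in miniature.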

I'm not against threads in all cases, certainly. If the vast majority of your memory needs to be shared, particularly when there are very complex shared data structures, then they're the way to go. But too many programmers reach for threads whenever they need to do multiprocessing, when the initial preference really should go to processes unless you know _why_ you need threads.

> As for the browser crashing when one tab goes down, that's simply poor programming. You can restart a crashed thread just as easily as you can restart a crashed process.

The problem is that the crashed thread could have overwritten arbitrary memory in other threads, possibly including code, and can cause other threads to crash a lot more easily than a crashed process can bring down other processes.

Memory protection provides isolation, and helps limit the damage that bugs can cause.
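
To make the isolation point concrete, a rough Python sketch: the "component" below is a child interpreter that aborts abruptly, standing in for a buggy plugin or browser tab. `run_isolated` and `CRASHING_CHILD` are names made up for the example. The parent only ever sees an exit status.

```python
import subprocess
import sys

# Child program that dies abruptly, standing in for a buggy plugin or tab.
CRASHING_CHILD = "import os; os.abort()"

def run_isolated(source):
    """Run `source` in a separate interpreter process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode

status = run_isolated(CRASHING_CHILD)
parent_alive = True   # reached only because the crash stayed in the child
print("child exit status:", status)
```

The parent can log the status and restart the child; nothing in its own address space was touched.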

--
rage, rage against the dying of the light
Re:OS/2? by ceoyoyo · 2007-12-29 07:17 · Score: 1

> The problem is that the crashed thread could have overwritten arbitrary memory in other threads, possibly including code, and can cause other threads to crash a lot more easily than a crashed process can bring down other processes.

Ah, then you have a buffer overflow, which is something it's generally considered a good idea to fix anyway. You should thank your thread for pointing out the problem.

Certainly there are lots of things that can be written better with processes rather than threads. Web servers come to mind. But there are also lots of things where you really want threads... you can "share" memory between processes but it's almost always a much slower process than within a process. IBM, Sun and SGI (well not so much SGI anyway) don't make money selling shared memory machines for a hundred times the price per processor that a similar sized cluster would cost because their customers like having everything in one box.

As you might suspect, I learned to program in an age when you had to look out for your own memory as well as everyone else's and I haven't completely bought into the modern view that the OS/architecture/language should protect the sloppy programmer. Yeah, being able to poke about everyone's memory isn't such a hot idea. But you should be competent enough to generally avoid stepping on your own toes.
Re:OS/2? by pthisis · 2007-12-30 08:17 · Score: 1

> Ah, then you have a buffer overflow, which is something it's generally considered a good idea to fix anyway. You should thank your thread for pointing out the problem.

A process would crash anyway, it would just be obvious which one had the problem instead of possibly wasting your time figuring that out.

> Certainly there are lots of things that can be written better with processes rather than threads. Web servers come to mind. But there are also lots of things where you really want threads... you can "share" memory between processes but it's almost always a much slower process than within a process.

It's not much slower. It's even a bit faster in a lot of cases, since you only have to worry about TLB synchronization for the parts of memory you're sharing. I agree, though, that there are certainly times when you want threads.

> IBM, Sun and SGI (well not so much SGI anyway) don't make money selling shared memory machines for a hundred times the price per processor that a similar sized cluster would cost because their customers like having everything in one box.

I'm not sure that I understand what you mean here, nor how shared memory architectures are really relevant to the sort of shared memory we're talking about...

> As you might suspect, I learned to program in an age when you had to look out for your own memory as well as everyone else's and I haven't completely bought into the modern view that the OS/architecture/language should protect the sloppy programmer. Yeah, being able to poke about everyone's memory isn't such a hot idea. But you should be competent enough to generally avoid stepping on your own toes.

It's not that it should protect them. It should kill the sloppy code ASAP, do as good a job at pointing out where the problem is as possible, and protect the clean code from the sloppy code.

If a buffer overrun in a multithreaded program tries to write in another thread's memory, you get odd behavior or crashes in the other thread. The original thread continues running, behaving strangely until it tries to hit memory it doesn't have the rights to, at which point it dies. If you'd used multiple processes, it wouldn't be "coddling" the sloppy code--it'd be killed immediately when it tried to access memory outside its own.

More to the point, though, you never know what is going to cause problems. Just looking around a modern desktop, the file browser, web browser, status bar, text editor, music player, and many other programs all use plugins written by 3rd parties. Almost everything uses shared libraries. Isolating the instability that a bug in any of that can cause is a good thing, and making it easier to find that bug is a good thing.

--
rage, rage against the dying of the light

Thank god by Fizzl · 2007-12-17 05:50 · Score: 4, Funny

Thank god that Java, C# and other piles of shit I hate do this quite intuitively and easily.
Guess I had it coming.
/me closes his eyes and embraces C++ for the last time before the inevitable doom

--
Bot Assisted Blogging

Re:Thank god by ByOhTek · 2007-12-17 05:56 · Score: 1

How about some SDL threading?
Not doing all that other stuff? Maybe pthread can save your soul?

--
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Re:Thank god by zifn4b · 2007-12-17 06:11 · Score: 5, Informative

The only significant thing that managed languages make easier with regard to multithreading other than a more intuitive API is garbage collection so that you don't have to worry about using reference counting when passing pointers between multiple threads.

All of the same challenges that exist in C/C++ such as deadly embrace and dining philosophers still exist in managed languages and require the developer to be trained in multi-threaded programming.
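
As a concrete illustration that the managed runtime doesn't save you here, a Python sketch of the classic two-lock transfer. The `Account`, `transfer` and `hammer` names are invented for the example. If each thread grabbed its source account's lock first, two opposing transfers could deadlock; the usual cure, shown below, is a fixed global lock order, and it is the programmer who has to know to apply it.

```python
import threading

class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always lock the account with the smaller ident first, so two opposing
    # transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.ident)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

def hammer(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)

a, b = Account(1, 1000), Account(2, 1000)
t1 = threading.Thread(target=hammer, args=(a, b, 500))
t2 = threading.Thread(target=hammer, args=(b, a, 500))
t1.start()
t2.start()
t1.join()
t2.join()
print(a.balance, b.balance)   # 1000 1000
```

Garbage collection took care of the object lifetimes; the lock ordering was still entirely up to the developer.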

Some things can be more difficult to implement like semaphores. You also have to be careful about what asynchronous methods and events you invoke because those get queued up on the thread pool and it has a max count.

I would say managed languages are "easier" to use but to be used effectively you still have to understand the fundamental concepts of multithreaded programming and what's going on underneath the hood of your runtime environment.

--
We'll make great pets
Re:Thank god by Fizzl · 2007-12-17 06:53 · Score: 2, Informative

Ugly, clumsy and complicated compared to Java's way.

I know how to do threading in C++ on every platform I have used for development. It's just that the modern languages have an elegant system, with forethought given to threading while designing the platform/language. Why would anyone new want to learn how to do clumsy non-standard threading in C++?
I think the options are to adapt, or to continue riding the dinosaurs until they die out and be left behind. Sorry that I am sending mixed signals. I have always worked with C++ and only used these newfangled languages when forced to. I feel that I have done something stupid by being too closed-minded.

--
Bot Assisted Blogging
Re:Thank god by bdbolton · 2007-12-17 07:49 · Score: 1

Working within the confines of PLINQ, you can get benefits of parallel programming without the pain. There are exceptions of course; it's no free lunch. (PLINQ is the set of parallel extensions to LINQ.)
Re:Thank god by gbjbaanb · 2007-12-17 08:47 · Score: 2, Interesting

Yeah, but they do it really slowly 'cos you're tied to the framework that has to do it safely no matter what - even if you have 2 threads that never interact with each other, the framework will slap synchronisation all over them anyway.

(I know - I had a discussion with a chap about C# thread-safe singleton initialisation. A simple app to test performance on my little laptop had a static initialised singleton taking 1.5 seconds, lock-based initialisation in 6 seconds. No big deal, we expect that, but then I ran the same tests on a dual-CPU server and both apps took 30 seconds - the framework decided it knew best).
Re:Thank god by Azarael · 2007-12-17 09:14 · Score: 1

Just wait until you try to get the threaded app you've written to run on a different OS or runtime though..
Re:Thank god by Richthofen80 · 2007-12-17 10:00 · Score: 2, Insightful

Speaking of C#, MS just released a technology preview that adds extensions / namespaces to C# that make it pretty easy to write parallel-executing code:
http://www.microsoft.com/downloads/details.aspx?FamilyID=e848dc1d-5be3-4941-8705-024bc7f180ba&displaylang=en

Essentially, they turn

    for (int i = 0; i < 100; i++) {
        a[i] = a[i]*a[i];
    }

into

    Parallel.For(0, 100, delegate(int i) {
        a[i] = a[i]*a[i];
    });

and the hint tells the .NET runtime to execute the solution in parallel. No shared memory, no locks, all done for you. That's the way parallelism should work, IMHO

http://msdn.microsoft.com/msdnmag/issues/07/10/Futures/default.aspx

--
Reason, free market capitalism, and individualism
Re:Thank god by anwyn · 2007-12-17 12:40 · Score: 2, Insightful

Yet another annoying attempt to force garbage collection on C++!
Garbage collection is a one size fits all solution, that is not appropriate for all the applications in the C++ problem space. Further there is a lot of C++ code already out there that does its own memory management. It would be difficult to retrofit this code to garbage collection.
Furthermore, many garbage collected languages lack proper destructors. At best they have a finalize method. This interferes with the C++ idiom "object creation is resource allocation; object destruction is resource release". This is the way C++ manages all resources. There are other resources besides memory, like open files, descriptors, network connections and many others. Because the garbage collected languages lack proper destructors, they actually make the management of these other resources more difficult. This can make garbage collected languages more complex and buggy. What the garbage collected languages give with one hand, they take away with the other!
I wish someone would develop a language with optional garbage collection and with proper destructors!
Re:Thank god by JAlexoi · 2007-12-17 13:44 · Score: 1

Even though, you still have to change the mindset.
Re:Thank god by cavebison · 2007-12-17 15:20 · Score: 1

VB always has, and always will, eventually catch up to whatever you crazy guys are doing.

I am the turtle. Phear me.
Re:Thank god by jhol13 · 2007-12-17 17:04 · Score: 1

You do need to understand the concepts but you definitely do not need to understand "what is going on in the runtime environment".

Take for example ConcurrentHashMap (in Java). You really do not need to know how it is implemented to use it efficiently. Or maybe I misunderstood you ...

More to the topic, I think parallel constructs to C++ are "too little too late". It will be five to ten years behind Java and others.
Re:Thank god by fozzy1015 · 2007-12-17 17:12 · Score: 1

I haven't messed with Java for the last few years, but I do remember working with some developers and testers on a network management application that was written in Java. It was a memory hog and there were some big issues with leaks, which made me wonder if there were some flaws in the garbage collection. I'm sure Sun's JRE has gotten better by now.

If you want something a little smarter in C++, look up smart pointers and the Boost library. They basically take advantage of the fact that an object's destructor is called when the object goes out of scope. Throw in a couple of operator overloads and you have the equivalent of a pointer that frees memory back to the heap itself. Similar concept to semaphores that release themselves.
Re:Thank god by smellotron · 2007-12-17 17:31 · Score: 1

> I wish someone would develop a language with optional garbage collection and with proper destructors!

Python has both deterministic destruction and garbage collection. Unfortunately, last time I checked, they didn't play together very well. If you had a reference cycle and all of the objects had destructors defined, the GC couldn't break the cycle. But aside from that caveat (which should be easy to address for anyone versed in C++ resource management), it's pretty slick to get the best of both worlds.
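
A short sketch of both behaviors, for reference. This is specific to CPython's reference counting; other Python implementations need not run `__del__` promptly. The `Resource` class and the `events` list are made up for the example. Also worth knowing: since CPython 3.4 (PEP 442) the cycle collector can finalize cycles whose objects define `__del__`, so the caveat above applies to older interpreters.

```python
import gc

events = []

class Resource:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        events.append("released " + self.name)

gc.disable()                  # keep the cycle collector from running on its own

r = Resource("file")
del r                         # refcount hits zero: __del__ runs right here
after_del = list(events)

x, y = Resource("x"), Resource("y")
x.partner, y.partner = y, x   # reference cycle
del x, y                      # refcounts never reach zero, nothing is released
after_cycle_del = list(events)

gc.collect()                  # the cycle collector finds and finalizes both
after_collect = sorted(events)
gc.enable()

print(after_del, after_cycle_del, after_collect)
```

So the deterministic part works like a C++ destructor as long as you avoid cycles, and the collector mops up the rest.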
Re:Thank god by smellotron · 2007-12-17 17:42 · Score: 1

> Take for example ConcurrentHashMap (in Java). You really do not need to know how it is implemented to use it efficiently. Or maybe I misunderstood you ...

Looking at Sun's doc for that class, it seems you do need to understand at least some of the implementation in order to use it efficiently:
- It is a hash implementation, and collisions reduce concurrency.
- Retrievals "generally do not block", which means that sometimes they do block. Which means that they may be susceptible to some nasty worst-case scenarios.
I believe it's fundamental to understand how a container is implemented in order to always use it efficiently (or equivalently, know when it's not appropriate). Most people can use a data structure and have it be the efficient solution 95% of the time, but in order to understand that last 5% you need to know something about the implementation.
Re:Thank god by NovaX · 2007-12-17 17:51 · Score: 1

PLINQ and the new C# threading APIs are quite interesting and should be fun to play with. I'm a little concerned with the focus on parallel loops, which are often abused in C++'s OpenMP because people don't understand the cost tradeoff. A simple loop isn't worth the expense, and a more complex loop may have data dependencies (e.g. a[i] = a[i]*a[i-1]) which the compiler won't flag for you. I could easily see C# developers, new to concurrent programming, greatly abusing this new construct.
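
To spell out why that dependency matters, a small Python sketch (no real threads, so it is deterministic). `sequential` is the loop as written; `naive_parallel` models just one possible outcome of running the iterations independently, namely every iteration reading the original a[i-1] instead of the freshly updated one. Both function names are invented for the example, and a real parallel run would be nondeterministic rather than consistently wrong in this particular way.

```python
def sequential(a):
    out = list(a)
    for i in range(1, len(out)):
        out[i] = out[i] * out[i - 1]   # reads the value just written at i-1
    return out

def naive_parallel(a):
    # Model of independent iterations: each i sees only the ORIGINAL array,
    # as if all workers read their inputs before any worker wrote its output.
    snapshot = list(a)
    return snapshot[:1] + [snapshot[i] * snapshot[i - 1]
                           for i in range(1, len(snapshot))]

print(sequential([1, 2, 3, 4]))       # [1, 2, 6, 24]
print(naive_parallel([1, 2, 3, 4]))   # [1, 2, 6, 12]
```

The squaring loop in the parent post has no such dependency, which is exactly why it parallelizes trivially and this one doesn't.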

I really like Java's util.concurrent, which I view as trading the focus on granularity for extremely well designed framework libraries. I have written a lot of concurrent code with it quite effectively, and when I needed to simulate a parallel mapping function, it was perhaps only twice as long as your example. I was able to write, for example, a distributed lock-free master-worker framework using the Java libraries, which I later expanded to implement Map-Reduce (for fun - took just two nights). The new Fork-Join framework for JDK6 (integrated into JDK7) is fairly impressive, though daunting in complexity.

I'm not totally convinced that locks and shared memory should be entirely hidden. There are many cases where real-world infrastructure code is written in a highly concurrent manner, and I'll have to play with Erlang a lot more before I'm ready to say that it can be dropped from that area of the stack. I've met too many Erlang enthusiasts who don't understand the slightest bit about concurrency, though in Erlang, that's also the point.

--

"Open Source?" - Press any key to continue
Re:Thank god by ceoyoyo · 2007-12-17 18:01 · Score: 1

For simple things like that there are already autoparallelizing compilers that will take care of it. They'll even go further and turn that loop into SIMD instructions.

The problem is when you're doing something that's not inherently parallel.
Re:Thank god by barnacle · 2007-12-17 18:15 · Score: 1

You make a good point, but I would take it a bit farther than that with regards to race conditions in general.

For example, I designed the Qore programming language (http://qoretechnologies.com/qore/) to provide a powerful and easy-to-use multithreaded development platform (it's a dynamically-typed, interpreted scripting language).

A lot of underlying c++ code is dedicated to safely managing concurrent access to data (and code - meaning safely avoiding race conditions) so that qore programmers have far less to worry about than for example I do writing the qore execution engine.

Additionally there is deadlock detection and (IMO) a really neat approach designed to reduce the number of SMP cache invalidations (on many CPU architectures this means reducing the number/frequency of atomic reference count operations) while guaranteeing a consistent and coherent view of data from each CPU.

I would say managed languages are "easier" to use but to be used effectively you still have to understand the fundamental concepts of multithreaded programming and what's going on underneath the hood of your runtime environment.

Absolutely right, but there's still a lot a higher-level language can do to make managing the development of multithreaded apps easier. It's one of my primary goals for qore, and I believe it's very relevant for the near future of programming language design.
Re:Thank god by shutdown+-p+now · 2007-12-17 21:47 · Score: 1

There are other resources besides memory; like open files, descriptors, network connections and many others. Because the garbage collected languages lack proper destructors, they actually make the management of these other resources more difficult. This can make garbage collected languages more complex and buggy.
The key word here is "can". GC naturally sucks for the case you've outlined above, and the reasons are all correct, too. But many large programs out there don't really use that many "unmanaged" (in .NET parlance) resources - they mostly deal with classes and objects representing various entities and operations on them, which don't have any resources but memory to manage (they can use other resources during method invocations, but they don't hold onto them). GC is great for such stuff. The ability to pass around and return complex object graphs without caring about memory allocation is really worth it.
I wish someone would develop a language with optional garbage collection and with proper destructors!
You can't really have destructors and tracing GC work side by side for the same object due to semantic differences (a destructor is meant to be called deterministically, while GC is not deterministic for any reasonable implementation). But aside from that, if you just want to pick and choose on an object-by-object basis, C++/CLI offers just that.
Re:Thank god by ByOhTek · 2007-12-18 01:27 · Score: 1

I use a variety of languages. I've never been particularly enamored with Java, it's easy to program, but I've found the results to be less than reliable.

Threading in C++ or Python (where I've done my threading) hasn't seemed so bad.

--
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Re:Thank god by CoughDropAddict · 2007-12-18 06:20 · Score: 2, Interesting
Essentially, they turn
for (int i = 0; i < 100; i++) {
a[i] = a[i]*a[i];
}

into

Parallel.For(0, 100, delegate(int i) {
a[i] = a[i]*a[i];
});

and the hint tells the .NET runtime to execute the solution in parallel. No shared memory, no locks, all done for you. That's the way parallelism should work, IMHO

So let me get this straight: the runtime is going to
1. find one or more other threads to farm this work out to, either by creating new ones or taking them from an existing pool
2. make the thread(s) runnable, and wait for them to get scheduled by the OS
3. coordinate the communication between the main thread and the other thread(s) about what part of the solution each thread should work on
...and this is supposed to be faster than a simple for loop with 100 iterations?

Sounds like a losing proposition to me. I don't think this is the kind of parallelism that is going to bring noticeable gains.
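
For anyone who wants to measure the dispatch overhead rather than argue about it, here is a rough Python sketch (my own illustration, not the .NET runtime; `square_serial`, `square_pooled` and `time_it` are hypothetical names). It squares 100 numbers with a plain loop and with a thread pool. Note that CPython threads do not run CPU-bound work in parallel, so this only shows the cost of handing tiny work items to a pool, which is the point being made above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def square_serial(a):
    # Plain loop: no threads, no scheduling, no coordination.
    return [x * x for x in a]


def square_pooled(a, workers=4):
    # Hands each element to a thread pool. Pool setup and task dispatch
    # are paid no matter how small the per-element work is.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: x * x, a))


def time_it(fn, a, repeats=200):
    # Wall-clock time for `repeats` calls of fn(a).
    start = time.perf_counter()
    for _ in range(repeats):
        fn(a)
    return time.perf_counter() - start


if __name__ == "__main__":
    data = list(range(100))
    print("serial:", time_it(square_serial, data))
    print("pooled:", time_it(square_pooled, data))
```

Both functions return the same list; only the timings differ, and they will vary by machine. The overhead only pays for itself when the body of the loop does enough real work per item.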
Re:Thank god by Richthofen80 · 2007-12-19 02:43 · Score: 1

Check out the link I put in the grandparent post. There's a raytrace example that shows noticeable gains in performance when using the Parallelism libraries. The example loop I had obviously doesn't need parallelism, but if you're doing real work over a large set, then it would.

--
Reason, free market capitalism, and individualism

How many languages have multithread support? by cyfer2000 · 2007-12-17 05:53 · Score: 1

How many languages have multithread support already?

Java, C#(?), Fortran(?)...

I haven't been programming in those languages for some time, so I'm just curious. My current main language (Igor Pro) uses all the cores automatically; how many languages do multithreading this way? Matlab(?), Octave(?).

--
There is a spark in every single flame bait point.

Re:How many languages have multithread support? by Splab · 2007-12-17 05:56 · Score: 1

Any CSP implementation will help on that.
Re:How many languages have multithread support? by ILongForDarkness · 2007-12-17 06:14 · Score: 2, Interesting

Matlab isn't that smart, you still have to tell it that the for loop is parallelizable for example. I might be wrong but I don't think Java or C# do either. Their frameworks/VMs supply APIs to do multi-threading; you simply call into them for the support that you need. C has had pthreads for a long time (since it was standardized?), for some reason the C++ committee never agreed on an implementation.
There is a great talk by Bjarne Stroustrup (http://csclub.uwaterloo.ca/media/C++0x%20-%20An%20Overview.html) about the new version of C++ coming out and some of the difficulties getting things added. Essentially, if a new feature will only help 100,000 developers, it isn't important enough to be implemented. With such a huge developer community all the "little" things get left for non-standard API implementations; only big features that almost everyone will find useful get added. That is probably why this version or the next of C++ will get a standard thread library, because almost everyone has access to a multicore system. Oh yeah, also, and it sucks, anyone with a few thousand dollars to waste can get added to the committee, but most people don't care enough to go get their feature implemented for that much money (you also have the travel/time off to attend the meetings) except big business, so guess who runs the show (I don't expect anyone to be surprised).
Re:How many languages have multithread support? by civilizedINTENSITY · 2007-12-17 09:27 · Score: 1

Don't forget Lisp and Haskell :-)
Re:How many languages have multithread support? by kramulous · 2007-12-17 13:34 · Score: 1

Sorry to be a bit of a bitch, but there are some core function calls in Matlab that are core aware. Just don't ask me which ones, as I rarely remember crap like that. Have just noticed from time to time.

--
.
Re:How many languages have multithread support? by ILongForDarkness · 2007-12-18 02:36 · Score: 1

Interesting, it was recently that they added parallel support, around version 7.0 (currently at 7.5), anyways, any references to parallel support I've seen always used macro calls to tell it that that is the case. Perhaps some of the APIs are core aware. Stuff like linear algebra routines should be (as they are stolen from LAPACK anyways). A lot of code people might think is parallel because of the way that you call it, but that isn't necessarily the case. eg. x(1:10, :) = y (1:10, :) or something similar.

Re:Wow, this is a great idea! by mjorkerina · 2007-12-17 05:54 · Score: 1

Wirth's law, even though he's not the one who came up with it.

The basic problem by ucblockhead · 2007-12-17 05:55 · Score: 5, Insightful

Some algorithms are inherently not amenable to parallelization. If you have eight cores instead of one, then the performance boost you can get can be anywhere from eight times faster to none at all.
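
The "eight times faster to none at all" range is what Amdahl's law predicts. A small Python sketch of the formula follows (the function name is my own, and the model deliberately ignores cache effects and threading overhead):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of the runtime that can be spread
    across cores; the rest stays serial no matter how many cores exist.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.0, 0.5, 0.9, 1.0):
    print(p, round(amdahl_speedup(p, 8), 2))
# 0.0 1.0
# 0.5 1.78
# 0.9 4.71
# 1.0 8.0
```

Even a program that is 90% parallel gets well under 5x from eight cores under this model, which is why the serial part of an algorithm matters so much.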

So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.

In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.

--
The cake is a pie

Re:The basic problem by Arakageeta · 2007-12-17 06:06 · Score: 1

You can actually get more than an 8x speedup by smartly exploiting a shared cache between cores.
Re:The basic problem by $RANDOMLUSER · 2007-12-17 06:06 · Score: 1

Some algorithms are inherently not amenable to parallelization.
Are you sure about that? If you put 9 women on the task of making a baby it only takes a month...

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Re:The basic problem by Armozel · 2007-12-17 06:09 · Score: 1

Yes, that's what I've learned in brief with regard to multi-threading in classes I've taken this year (yes I'm still an under-grad). One way around this may be to look toward OSes handling the CPU resources as one logical processor, but that may not work out so well in the end (in my opinion). In the end, it's just best to figure out which kinds of applications best use multi-processors, and figure out which algorithms can execute in parallel and which cannot. But all this thinking that every algorithm can be made to magically execute in parallel doesn't seem to fit reality, at least for me and my programs.
Re:The basic problem by Anonymous Coward · 2007-12-17 06:11 · Score: 2, Interesting

In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
Processors don't "wait" on blocked IO calls. Your program waits while the processor switches to another task. When the processor switches back to your program, it checks to see if the blocked IO call has completed. If it has, it continues executing your program again. If not, your program continues to wait while the processor again switches to other tasks.
So it is you (as the programmer) that determines if your program just sits and waits for blocked IO to complete. Or you could spawn a thread for blocked IO calls so your main program thread continues executing (if it is viable to your situation).

With more processors, your program and its blocked IO calls will be checked more frequently. So even blocked IO calls will see a performance increase.
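
The idea of spawning a thread for a blocking call so the main thread keeps working can be sketched in Python like this (an illustration only; `blocking_io` and `run` are hypothetical names, and `time.sleep` is an assumed stand-in for a real socket or disk call):

```python
import threading
import time


def blocking_io(result, delay=0.2):
    # Stand-in for a blocking socket or disk call. Assumption: time.sleep
    # models the wait; a real call would block in the kernel instead.
    time.sleep(delay)
    result["data"] = "payload"


def run():
    result = {}
    worker = threading.Thread(target=blocking_io, args=(result,))
    worker.start()
    # The main thread keeps doing useful work while the worker is blocked.
    work_done = sum(i * i for i in range(100_000))
    worker.join()  # wait for the I/O result only when it is needed
    return work_done, result["data"]


print(run()[1])  # payload
```

This overlaps computation with the wait; it does not make the I/O itself any faster.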
Re:The basic problem by ahabswhale · 2007-12-17 06:19 · Score: 1

You can actually get more than an 8x speedup by smartly exploiting a shared cache between cores.
Your post is so vague that it is pointless. You might want to be more specific because otherwise I could easily just say "no you can't" and I'd be just as right.

--
Are agnostics skeptical of unicorns too?
Re:The basic problem by $RANDOMLUSER · 2007-12-17 06:20 · Score: 1

With more processors, your program and its blocked IO calls will be checked more frequently. So even blocked IO calls will see a performance increase.
You fail it.

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Re:The basic problem by phasm42 · 2007-12-17 06:28 · Score: 1

Your post is so vague, that it is pointless. You might want to be more specific because otherwise I could easily just say "no you can't" and I'd be just as right.
Grandparent post was specific: shared cache

You don't have to be an ass if you want specifics, you could just ask. In this case, I think GPP is referring to the upfront cost of memory access. Once one core pays that price (think of it as a fixed cost), the other cores will be able to access the memory from the shared cache without having to pay that cost. Thus, theoretically resulting in a speedup greater than the number of cores.

--
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
Re:The basic problem by Chris+Mattern · 2007-12-17 06:38 · Score: 1

Processors don't "wait" on blocked IO calls. Your program waits while the processor switches to another task.

That's his *point*, nimrod. Processors don't wait on blocked I/O calls, but processes do. Therefore, having umpteen processors doesn't do you much good if there are no processes ready to run because they're all waiting on something.

Chris Mattern
Re:The basic problem by owlstead · 2007-12-17 06:52 · Score: 1

Doesn't matter if I/O doesn't matter for CPU utilization. The problem is that all of my programs are limited by I/O rather than CPU. In this regard, I've high hopes of flash solid state drives and I've got the feeling that they will make more of a difference for everyday computing than any multi-threaded CPU I can put on my motherboard.

I'm just reading C'T (the Dutch version of computer and technology, a German magazine), and they do a test with 32 threads on a single hard disk. Throughput? 1.0 to 1.1 MB/s, depending on whether NCQ (native command queuing) is enabled. Use an SSD and you should get *much* higher numbers. 1 MB/s would mean that a single logging method would slow down your PC much more than any other multi-threading issue except for total starvation or deadlock.

Of course there are applications that just rely on the CPU, but I tend not to have to wait for it even for heavy encoding methods. If I do anything that even slightly involves drive I/O and I have to test for performance, I'll use a RAM disk and switch off the freakin' company installed McAfee bug-ware. On a similar note, which Windows/Linux application can I use that makes reliable copies and accesses files *in sequence*?
Re:The basic problem by DaphneDiane · 2007-12-17 06:54 · Score: 1

Some programs can even end up running slower depending on stuff like CPU affinity (which basically amounts to cache misses) and additional locking overhead and needing to add memory barriers. Tricks that work with a single core under multithreading don't always carry over well to multiple cores.
Re:The basic problem by 99BottlesOfBeerInMyF · 2007-12-17 07:08 · Score: 1

Some algorithms are inherently not amenable to parallelization. If you have eight cores instead of one, then the performance boost you can get can be anywhere from eight times faster to none at all.
Part of the article's point is that programmers need to learn new algorithms to replace algorithms that are not easily made parallel. Now not all tasks can be easily parallelized either, but a lot are and are not being taken advantage of.

So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.
Well, also multiple OS's running at the same time as well, which may become more popular. The point is, programmers need to make applications more parallel so this is not the main benefit anymore. This has already happened in CPU intensive fields and is happening more and more in OS development.

In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
I'm not sure this is true. It certainly is not the only bottleneck, but it seems to be the biggest one. There will be other bottlenecks, but many can be alleviated if they become the real choke point. Disk issues can be mitigated by faster interconnects and multiple disks/platters to increase read/write. As for graphics cards, that is one area where multiple cores are already helping. OS X Leopard, for example, now spawns an OpenGL feeder process that can be moved to another core, removing what is one of the biggest actual bottlenecks for GPU use, the CPU time in handing over data. This one OS improvement can theoretically double the speed of some existing OpenGL applications, though realistically it is seeing much more modest improvements. That is the type of rethinking that needs to be done to take advantage of multiple cores, and it is the type of improvement that has not really been happening enough because many programmers are too set in their ways.
Re:The basic problem by ucblockhead · 2007-12-17 07:09 · Score: 1

Correct. I was oversimplifying. But the basic idea is correct. With an 8 core system running basic user software, I strongly suspect that nearly all threads are waiting for IO and thus most of the cores are entirely idle most of the time.

--
The cake is a pie
Re:The basic problem by ucblockhead · 2007-12-17 07:11 · Score: 1

Yeah, you're correct. I should have said "process".

--
The cake is a pie
Re:The basic problem by ChronoReverse · 2007-12-17 07:16 · Score: 1

Wait, all of your programs are limited by DISK I/O?

Is your program really ripping through a gigabyte of data per second?

You're chunking your data into segments that fit into memory and loading them all at once en masse, right? Perhaps you could have two threads, one to load data while the other does processing in memory.
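
A bare-bones Python sketch of that two-thread arrangement, one loader and one processor connected by a bounded queue (illustration only; the function names are made up, and the in-memory list of chunks is an assumed stand-in for segments read from disk):

```python
import queue
import threading


def load_chunks(chunks, q):
    # Producer: stands in for reading segments from disk.
    for chunk in chunks:
        q.put(chunk)  # blocks when the buffer is full
    q.put(None)  # sentinel: no more data


def process_chunks(q, totals):
    # Consumer: processes each segment as soon as it is in memory.
    while True:
        chunk = q.get()
        if chunk is None:
            break
        totals.append(sum(chunk))


def pipeline(chunks, max_buffered=2):
    q = queue.Queue(maxsize=max_buffered)  # bounds how much sits in RAM
    totals = []
    loader = threading.Thread(target=load_chunks, args=(chunks, q))
    worker = threading.Thread(target=process_chunks, args=(q, totals))
    loader.start()
    worker.start()
    loader.join()
    worker.join()
    return totals


print(pipeline([[1, 2, 3], [4, 5], [6]]))  # [6, 9, 6]
```

With a single consumer the results come out in load order, and the bounded queue keeps the loader from running arbitrarily far ahead of the processing.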

There's almost always a way to reduce the disk I/O in these days with so much RAM and cache (on the CPUs and disk themselves).
Re:The basic problem by przemekklosowski · 2007-12-17 07:32 · Score: 1

If you have eight cores instead of one, then the performance boost you can get can be anywhere from eight times faster to none at all.
If you're lucky you could get so-called superlinear speedup, i.e. speedup by more than eight. Most of the time it's because of caching effects, for instance because one CPU is prefetching the data into a cache, which the other CPU will find immediately instead of waiting for a cache load from memory. Another scenario is when you distribute a large load that exceeds the unified cache, but whose pieces fit into individual CPU caches, or when distributing the load causes the access pattern to avoid cache thrashing. In fact, for most current desktop loads more than two CPUs is overkill, so before we successfully parallelize those apps, it's quite reasonable to use an extra core for prefetching. This could actually be a relatively painless tweak to the compilers and runtime libs: the compiler knows when code is about to access large data, and starts a separate prefetch thread that reads the data ahead of the main loop in the original thread, so that by the time the main loop gets to it, it's available in the cache.
Re:The basic problem by GNious · 2007-12-17 07:34 · Score: 1

You will have some potential problems there.

One is clock-synchronicity, where the women may eventually synchronize to a single clock, effectively removing the delay between overlaps. Another one is where the women will set up Mutexes resulting in less execution-time than if there was just 1.
Re:The basic problem by SL+Baur · 2007-12-17 07:46 · Score: 1

Try reading this: http://lwn.net/Articles/259710/ (What every programmer should know about memory, Ulrich Drepper). There's a link there to the article broken down into HTML pages if you don't want to look at the PDF.

Hope that helps.
Re:The basic problem by SL+Baur · 2007-12-17 07:55 · Score: 1

if you're working with a 2 MB dataset on a processor with a 1 MB cache, then you move to 2 processors with 1 MB caches, you can see more than a 2x speedup because now all of the work can be done in cache.
And you can also see a huge slowdown if the cache pressure is too high. Drepper's paper, which I linked above, has examples of both.
Re:The basic problem by savuporo · 2007-12-17 08:35 · Score: 3, Insightful

So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.
They diminish, but they never disappear. Even in algorithms where you have to wait for the results of the previous computation before going on, you can still get a speedup with branch prediction. In essence, while one core is crunching the numbers, other cores do the what-if work, and even if you mispredict in lots of cases, you can still get speedups with large datasets, because in some cases, when your first core comes up with a result, you will discover that the what-if computation started out with the right guess.
Hey, I hear they are doing essentially the same stuff with all those newfangled multiscalar processors and branch prediction anyway.

--
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.slashdot.org Errors found while checking this document as HTML5!
Re:The basic problem by Kyojin · 2007-12-17 08:40 · Score: 1

Are you sure about that? If you put 9 women on the task of making a baby it only takes a month...
It does once you get your pipeline filled, even on a single core processor like a P4. With 9 women you can push them out at a rate of one per month. There's 9 months of latency for the first one though, which is why we moved to a chip with a much shorter pipeline, the Core architecture.
Re:The basic problem by franl · 2007-12-17 09:27 · Score: 1

In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
I'm not sure this is true. It certainly is not the only bottleneck, but it seems to be the biggest one.
I would argue that disk I/O is the biggest bottleneck. Disk access times are measured in milliseconds. Memory access times are measured in nanoseconds. Eliminating those multi-millisecond pauses will vastly improve the performance of just about any app and the whole system. And don't forget that paging in code and static data from the executable image is disk I/O too. Have you ever used a system with fast-rotating SCSI disks? They blaze compared to the same configuration without those fast disks. Unless your load average is almost always greater than 1.0, you will benefit more by having faster disks than by having more CPUs.
Re:The basic problem by 99BottlesOfBeerInMyF · 2007-12-17 12:04 · Score: 1

I would argue that disk I/O is the biggest bottleneck. Disk access times are measured in milliseconds. Memory access times are measured in nanoseconds.
But disk access bottlenecks are not very common, and are usually in response to a user interaction, which is less objectionable to most users. If I tell my machine to open a picture and it has to load from disk, sure that is a delay, but that is pretty rare. Usually, when working or playing, everything is loaded into memory after the initial startup of the program (which I only do once every week or so). The really problematic bottlenecks are when I'm working and in the middle of a workflow and suddenly have to wait because my CPU cores are maxed out.

Unless your load average is almost always greater than 1.0, you will benefit more by having faster disks than by having more CPUs.
Well here I am at home on a typical evening, posting to Slashdot and my load average is 1.3. At work, that average tends closer to 3.0. Good thing this machine is dual core, huh?
Re:The basic problem by JoelKatz · 2007-12-17 16:07 · Score: 1

The performance boost can be greater than 8. Consider, for example, an application where 8 tasks must be performed in rapid succession and then the process repeat. With a single CPU, you could spend the vast majority of your time just switching from task to task. With 8 CPUs, you need never switch tasks.

Of course, this seldom happens in the real world.
Re:The basic problem by Tablizer · 2007-12-17 16:48 · Score: 1

So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time.

Yeah, viruses.

--
Table-ized A.I.
Re:The basic problem by smellotron · 2007-12-17 17:45 · Score: 1

With 9 women you can push them out at a rate of one per month. There's 9 months of latency for the first one though

Ironically enough, this was a topic of conversation with some of my co-workers today. I wonder how frequently software developers end up talking about pipelined pregnancies for optimal baby throughput...

Companies do not want good tools by JackMeyhoff · 2007-12-17 05:56 · Score: 1

MSFT had C Omega, what did they do, crapify it into a .Net library to use via C# which is not the best library to use in a parallel world. C Omega was nice and abstracted a lot of the parallelism but no they canned that project. Look at LINQ, they crapified C# with VARIANT types and EXTENSIONS. Totally Crapping on OOP.

--
http://www.rense.com/general79/wdx1.htm

Re:Companies do not want good tools by JackMeyhoff · 2007-12-17 07:24 · Score: 1

Yes, I know what it is. It is god damn fugly to use. Of course if you know the resulting type from your LINQ query you can use that. I just hate all the syntactic sugar they are adding to C# and on the other hand they dump C omega which solved a lot of the parallelism problems by abstraction at the language level.

--
http://www.rense.com/general79/wdx1.htm

Multiple Applications. by headkase · 2007-12-17 05:59 · Score: 1

For now the biggest advantage of multiple cores is the ability to run multiple applications with each running at full speed. Within each application the problems get a lot more complex; using current algorithms, many tasks are not easily subdivided. With data that is inherently parallelizable it's pretty easy - each pixel on your display is relatively independent of the others and drawing on a common dataset. However the majority of other areas are not so easy. Generally, how do you take an algorithm and divide it in such a way that step A is separate from step B? Especially if the input of step B depends on the output of step A. Now that multicores are becoming common more research will be done in coming up with a fundamentally new approach to algorithms themselves, but two cores absolutely does not mean two times speed improvements - some algorithms simply cannot be divided with our current level of understanding.

--
Shh.

Re:Multiple Applications. by jfmiller · 2007-12-17 07:01 · Score: 1

I'm going to be flip here but...

Make B not depend on A!

Most of today's programming practices introduce serial dependency where it is not needed. A good example is how most C-style languages iterate over a set of objects. for(i=0;i<n;i++) is inherently serial with 'i' needing to be updated before the next iteration. Some of you will have seen OO constructs like 'var.each do |v|' in Ruby which is a parallel construct in a lot of cases (in the case of an Array where order is assured the programmer can make assumptions about serial execution). Erlang takes this to the extreme to make incredibly parallel systems, with the cravat that it is not as programmer friendly. Think of all the operations done on tree structures that are clearly parallel with each branch analyzed independent of the others. Now consider how hard it is to convince a compiler to deal with them in that way.

The point is that the grammar of languages like Java and C forces otherwise parallel operations to be serial. Using languages whose constructs assume parallel operation will be much better than languages with additional 'thread' type constructions.
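
As a rough illustration of the tree case, here is a Python sketch where each top-level branch is handed to its own task (my own example, not from any particular language runtime; the `(value, [children])` node shape and the function names are assumptions). CPython's threads will not speed up this CPU-bound sum, so read it as a picture of the structure, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed node shape for this sketch: (value, [children]).


def tree_sum(node):
    # Ordinary recursive sum over one subtree.
    value, children = node
    return value + sum(tree_sum(child) for child in children)


def parallel_tree_sum(root, workers=4):
    value, children = root
    # The top-level branches share no state, so each one can be summed
    # by its own task and the partial results added at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return value + sum(pool.map(tree_sum, children))


tree = (1, [(2, [(4, []), (5, [])]), (3, [(6, [])])])
print(parallel_tree_sum(tree))  # 21
```

The programmer has to say explicitly that the branches are independent; a plain recursive loop gives the compiler no such promise.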

--
Strive to make your client happy, not necessarly give them what they ask for
Re:Multiple Applications. by John+Hasler · 2007-12-17 07:21 · Score: 1

Erlang takes this to the extreme to make incredibility parallel systems, with the cravat that it is not as programmer friendly.
This is true. Programmers don't like neckties.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Multiple Applications. by joto · 2007-12-17 17:42 · Score: 1

Most of today's programming practices introduce serial dependency where it is not needed. A good example is how most C-style languages iterate over a set of objects. for(i=0;i<n;i++) is inherently serial with 'i' needing to be updated before the next iteration. Some of you will have seen OO constructs like 'var.each do |v|' in Ruby which is a parallel construct in a lot of cases (in the case of an Array where order is assured the programmer can make assumptions about serial execution).

And your point is? It shouldn't be particularly hard for the compiler to analyze the code in a for-loop to see if it can be parallelized. If the compiler can do loop-unrolling and all the other kinds of fancy tricks it does, this should be a piece of cake. The problem is that we haven't got the hardware to execute it on. And by the way, Ruby is not executed in parallel.
Erlang takes this to the extreme to make incredibly parallel systems, with the cravat that it is not as programmer friendly.

Erlang is very programmer-friendly. It may not be friends with you, but many people who have worked with it, swear by it. Personally, I haven't used it much, as it doesn't solve a problem I have, but I found it no harder to use than most other functional languages.
Think of all the operations done on tree structures that are clearly parallel with each branch analyzed independent of the others. Now consider how hard it is to convince a compiler to deal with them in that way.

If labels and conditional gotos are too hard to conceptualize for you, I doubt you will think it is easier to conceptualize how the compiler transforms branches into speculatively executed threads. But again, the problem isn't the languages we use. It's that we haven't got the hardware to execute very short-lived parallel threads at every junction. If such hardware existed, I'm sure we could create compilers for it.
The point is that the grammar of languages like Java and C forces otherwise parallel operations to be serial. Using languages whose constructs assume parallel operation will be much better than languages with additional 'thread' type constructions.

I disagree that Java and C are the problem. Sure, there are better languages for concurrent programming, but whether you choose to solve the problem through better languages, better compilers, libraries, or programming practice doesn't really matter. The important thing is that unless you have hardware to run it on, none of the proposed solutions will matter, because none of them leads to any improvement. It took actual computers to design FORTRAN too.
Re:Multiple Applications. by JasterBobaMereel · 2007-12-18 02:27 · Score: 1

The problem is this is all a bit esoteric for the average user (most /.ers are not average users) ...

What apps are you running right now?

e.g.
Web Browser
Word Processor
Spreadsheet
Music player

Besides running each app on a separate core (or maybe each tab in the browser) how are multiple cores going to help?
None of these needs much pure CPU speed (except the music player maybe); they spend most of their time waiting for the user, and the current *multithreaded* apps help when they are doing multiple things at once (if done properly)?

All the examples I have seen above are sorting large datasets , photoediting, video processing - all very CPU intensive, and the kind of thing that most people would run with as little else running as possible?

Lagging applications are mostly very processor intensive (or are being slowed by another such app) which parallel processing will help, or have other resource problems (memory, disk etc ...) that will not be solved by more cores

--
Puteulanus fenestra mortis

Clue by 4? by Nomen+Publicus · 2007-12-17 06:00 · Score: 1

If your operating system can multi-program, you can have 8 programs running at the same time. No threads needed.

If you think your laptop doesn't need to run 8 programs at the same time you really should look under the hood more frequently :-)

Re:Clue by 4? by Anonymous Coward · 2007-12-17 06:19 · Score: 1, Informative

The problem is, most of those programs running "under the hood" on your laptop are likely small enough that they would not cause moderate strain on even a single CPU.

The user wants the application that he is currently running to perform as well as possible. If that application is single-threaded, it may not be able to perform as well as it needs to in order to appear fast and responsive, and the net result will be that the user perceives the system as slow.

There's only so much an OS can do to "hide" the need for concurrent design and programming.
Re:Clue by 4? by gbjbaanb · 2007-12-17 08:51 · Score: 1

Damn, now all I need is 8 hard drives to keep those apps fed with data. :-)

Is speed still the issue? by heroine · 2007-12-17 06:01 · Score: 1

Most of the jobs being created are not for achieving maximum speed but standards compliance. Companies want software which is easy to maintain & portable, but not necessarily the fastest. If it still was 1997 there would probably be ubiquitous implementations for SMP & vectored assembly language, but that's not the focus anymore.

It's the Curse of the Algorithm by MOBE2001 · 2007-12-17 06:02 · Score: 1, Troll

The reason that parallel programming is so hard is that we're still using the same computing model that English mathematician Charles Babbage pioneered 150 years ago. It's time to change. To understand the problem, read, Parallel Programming, Math, and the Curse of the Algorithm.

Re:It's the Curse of the Algorithm by tchuladdiass · 2007-12-17 06:20 · Score: 1

I've read that, and the ideas that it links to. What you are proposing is that everything be converted to a form of multi-branch pipeline programming, correct? That is, think of a standard Unix pipe. Then imagine a process having multiple inputs / outputs, and each of those outputs can be connected to an input on a different process. So once the base modules are done, programming would be a matter of connecting various inputs and outputs, like designing an electronic circuit.

I can see how this would help with multiple cores, but can every construct in computing be represented by these "circuit" modules? Also, is this similar to Hartmann pipelines?
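
A rough sketch of what "connecting inputs and outputs" could look like, under my own assumptions: each module is a thread, each connection is a queue, and the names stage and DONE are invented for the example. It only wires up a straight two-stage pipe; the multi-input, multi-output modules described above would need more plumbing than this.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker passed down the pipe


def stage(func, inbox, outbox):
    """Start one module: read items from inbox, apply func, write to outbox."""
    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # pass the marker on so the next stage stops too
                return
            outbox.put(func(item))

    worker = threading.Thread(target=run)
    worker.start()
    return worker


# Wire the "circuit": source -> square -> add one -> sink
source, middle, sink = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    stage(lambda x: x * x, source, middle),
    stage(lambda x: x + 1, middle, sink),
]

for n in range(5):
    source.put(n)
source.put(DONE)
for worker in workers:
    worker.join()

results = []
while True:
    item = sink.get()
    if item is DONE:
        break
    results.append(item)
print(results)
```

Both stages run at the same time on different items, much like `a | b` in a shell. Whether every construct can be expressed this way is exactly the open question in the comment.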
Re:It's the Curse of the Algorithm by CharlieD · 2007-12-17 06:57 · Score: 1

An alternative is to look at the data - multiple processors should make dealing with data in the form of vectors and matrices much faster. However, you have to THINK in terms of vector and matrix data.

A language dating back to the 1960's and 1970's did just that - it was called APL (A Programming Language). Loops, while do-able (forgive the pun), were extremely wasteful of cpu time - by a factor of 80 to 100. The APL interpreter took care of that bookkeeping for you if you used vectors or matrices as whole entities rather than grinding through them element by element. The language was extremely rich in vector and matrix operators (and their combinations) which allowed a programmer to take advantage of that quirk/feature of the interpreter. The obvious step was to use parallel hardware. The word on the street was that interpreted APL ran faster on one of the few parallel cpus of the time than did compiled Fortran. Unfortunately APL did not transition well into GUI world, although at least one modern implementation is around.

APL was also the only truly interactive language I have run into, but that is another story.
Re:It's the Curse of the Algorithm by Josh+Booth · 2007-12-17 07:27 · Score: 1

I think about this every once in a while. It would almost make programming like playing with legos -- just connect the bumps err pipes. But the problem is that there usually needs to be some control, which kernel designers frequently run into. Sure, I can cat /dev/cdrom, but I can't pipe a file into it and expect it to burn. For that, and lots of other things, you need ioctls, which sort of destroy your "everything is a pipe" approach. Everything becomes a sort of typed pipe, and you can't link everything together anymore. And how would you design a GUI this way? Sure, each X client connects to the X server by a pipe, but then you need IPC, and both of these use specialized protocols, again destroying the "everything can connect to everything by a pipe" idea. Sigh, I guess it will be at least another 20 years until we have the Lego Block programming language.
Re:It's the Curse of the Algorithm by Eli+Gottlieb · 2007-12-17 11:15 · Score: 1

I hate to fuel that man's "curse of the algorithm" mania (he's a well-known troll), but read up on the operating system Plan 9 from Bell Labs.
Re:It's the Curse of the Algorithm by Kazoo+the+Clown · 2007-12-17 13:42 · Score: 1

APL was brilliant for its time, the only serious and also pictographic programming language that I'm aware of, a fascinating characteristic that unfortunately Iverson himself gave up on and reverted to ASCII digraphs with the language J. I often complain about the fact that programming on our current keyboard, originally adapted from existing non-computing devices as it was, has relegated us to have to resort to the available symbols * and / for multiply and divide where proper symbols were not available-- we're still using a 50 or more year old keyboard standard completely for reasons of ancient practicality, not elegance. APL was a valiant but failed effort to try to change that.

One big problem with APL as a pictographic language is extending it implies a need for new pictograms over time, and a typewriter keyboard, even redesigned for the language, is a poor candidate for such expansion. The earliest forms of APL kept that problem somewhat in check via the use of overstrikes, so that you didn't have to resort to a separate key for each symbol but instead could learn a relatively manageable set of symbols that could be reused in combination to produce the entire symbol set. However, the use of overstrikes appeared to be out of place on video terminals, and so the innovation of overstrikes gave way to video terminal keyboards with a vast array of stick-on symbols that one would have to learn the location of, in order to write programs.

APL is also about the most terse programming language ever devised, a crucially important characteristic at a time when dial-up baud rates were often 110-300. When it takes that long for characters to transmit, one-liner programs of a hundred or so pictographic symbols work pretty darn well.

And despite APL's inherent array operations, automatically parallelizing APL is a relatively crude means of taking advantage of multi-core systems, as the setup and teardown overhead would require additional logic to determine if the size of the arrays and/or complexity of the operation would warrant it. Logic that would have to be done even when determining that parallelism is not appropriate for the operation.
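
To make that overhead point concrete, here is a minimal sketch of the kind of dispatch logic an auto-parallelizing array runtime would need. Everything in it is an assumption for illustration: the names array_map and apply_chunk, the worker count, and especially the threshold value, which in practice would have to be measured per operation and per machine.

```python
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 10_000  # hypothetical cut-off, not a measured value
WORKERS = 4                  # hypothetical worker count


def apply_chunk(func, chunk):
    """Apply func element by element to one slice of the array."""
    return [func(x) for x in chunk]


def array_map(func, data, pool):
    """Whole-array operation that decides per call whether to go parallel."""
    # This check runs on every call, even when the answer is "stay serial".
    if len(data) < PARALLEL_THRESHOLD:
        return apply_chunk(func, data)
    size = -(-len(data) // WORKERS)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # pool.map returns the per-chunk results in the order the chunks were given
    parts = pool.map(apply_chunk, [func] * len(chunks), chunks)
    return [y for part in parts for y in part]


def double(x):
    return x * 2


with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    small = array_map(double, list(range(10)), pool)      # serial path
    large = array_map(double, list(range(20_000)), pool)  # chunked path
print(small)
```

Splitting, dispatching and re-joining the chunks is the setup and teardown cost; the length test is the extra logic that gets paid even when parallelism is rejected.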

APL's real failure has been in the immense difficulty of standardization-- no two APL vendors ever thought about it alike, the various extensions and workspace formats differ significantly, and implementing a full APL is a complex undertaking as language implementations go. In a world where "write once run anywhere" is an important goal, APL fares rather poorly in that regard.

While I'm sure there are some APL die-hards that are just waiting for the advent of many-core desktops as the chance for APL to "shine again," I fully expect it to disappear as anything more than a curiosity, and fairly soon, as the original die-hards have been entering retirement age already for some time now. And many of those who haven't, like Iverson, have chosen to move to the J language or some other ASCII based array language, giving up one of the most important things that I think made APL special, if not uniquely valuable-- its pictographic character.
For me, APL will always have a soft spot in my heart, being the first computer language I was ever exposed to and on which I learned to program (on an IBM 2741 selectric terminal). Long ago I wrote my own interpreter and have recently given it somewhat of a facelift for Windows (and mine still supports overstrikes, as I don't need the stickers-- I still remember where all the original symbols are). I use it as a super-Calculator and that's about it though (it remains really good for that). While I sometimes get wistful about its potential, and try to think of ways it could possibly mutate into something important in the modern world, I harbor no real illusions of a resurgence...

C++? by K.+S.+Kyosuke · 2007-12-17 06:02 · Score: 1

...while all the clever folks have already started writing their scalable applications in something reasonable, like Erlang? No offense to anybody using C++, but I think that C++ would first profit from some serious weight reduction dieting, before they start trying to develop better concurrency concepts.

(Not that this concerns me too much, I'm going to stay with Common Lisp...yeah, I know, it might suffer from the same issues (sometimes even vague semantics) in some places, but I probably just have that strange incurable parenthetical personality disorder that suddenly broke out at the beginning of the 80's. ;-))

--
Ezekiel 23:20

Re:C++? by Anonymous Coward · 2007-12-17 06:08 · Score: 1, Informative

Almost all the desktop software that matters is written in C++ so obviously the clever minds are not where you think they are. Almost anything that has to do with graphics like the software from Adobe is written in C++, almost anything that has to do with video is written in C++, almost anything that has to do with audio and music is written in C++... the creative minds are using C++ while the folks banging their heads at yet another web 2.0 or financial app are using Java, Lisp or scripting languages.
Re:C++? by K.+S.+Kyosuke · 2007-12-17 06:39 · Score: 1

Yes, and almost all these applications run only on Windows. Therefore, Windows are beyond any doubt the best and most creative OS in the world. :-)

--
Ezekiel 23:20
Re:C++? by Yetihehe · 2007-12-17 08:26 · Score: 4, Informative

...while all the clever folks have already started writing their scalable applications in something reasonable, like Erlang?
From erlang site:
1.4. What sort of problems is Erlang not particularly suitable for?

People use Erlang for all sorts of surprising things, for instance to communicate with X11 at the protocol level, but, there are some common situations where Erlang is not likely to be the language of choice.

The most common class of 'less suitable' problems is characterised by performance being a prime requirement and constant-factors having a large effect on performance. Typical examples are image processing, signal processing, sorting large volumes of data and low-level protocol termination.
That's why most applications are still in c/c++

--
Extreme Programming - Redundant Array of Inexpensive Developers
Re:C++? by shutdown+-p+now · 2007-12-18 01:42 · Score: 1

I don't have the numbers to back it up, but my gut feeling tells me that, if you count either lines of code or number of deployed applications, Java would beat C++. Precisely because most applications don't deal with image and signal processing directly, and sorting large volumes of data is delegated to databases.

YOUR eyes?! by Chordonblue · 2007-12-17 06:02 · Score: 2, Funny

Just be glad I didn't upgrade to the X4 yet! :)

--
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."

Wait for the new C++ standard before you switch... by DamnStupidElf · 2007-12-17 06:06 · Score: 1

There is currently no working concurrency model for standard C++. You want to make an atomic access to an object? Hope and pray that you have bug free system libraries and a compiler that doesn't optimize away your locking wrappers and do inappropriate speculative stores. Apparently the next C++ standard will address it, but it seems rather foolish to start a transition to massively multithreaded code without an actual standard.

Re:concurrency - the developer's responsibility? by LWATCDR · 2007-12-17 06:08 · Score: 2, Insightful

"But seriously, isn't the OS responsible for the heavy lifting with regards to task scheduling and concurrency? Oh, wait, this is Microsoft, right? Perhaps this is similar to their take on Security being somebody else's problem."
Huhhh?
My guess is that you never wrote any code.
Linux doesn't do any more heavy lifting for you than Windows does. I doubt that OS/X does.
So what are you talking about?
An OS will never figure out what part of your program is going to need to be in which thread. A compiler MAY at some time do it but they are just now doing a good job with vectors.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.

Personal computing? by Dan+East · 2007-12-17 06:10 · Score: 5, Interesting

"processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing"

Exactly what areas of "personal computing" are requiring this horsepower? The only two that come to mind are games and encoding video. The video encoding part is already covered - that scales nicely to multiple threads, and even free encoders will use the extra cores to their full potential. That leaves gaming, which is basically proprietary. The game engine must be designed so that AI, physics, and other CPU-bound algorithms can be executed in parallel. This has already been addressed.

So this begs the question, exactly how will average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?

Dan East

--
Better known as 318230.

Re:Personal computing? by bogie · 2007-12-17 06:28 · Score: 2, Insightful

"So this begs the question, exactly how will average consumer benefit from an OS and software that can make optimum use of multiple cores"

AOL 10.0 will say "You got mail!" .25ms faster.

--
If you wanna get rich, you know that payback is a bitch
Re:Personal computing? by Thelasko · 2007-12-17 06:35 · Score: 1

Exactly what areas of "personal computing" are requiring this horsepower?
Windows Vista! Sorry, if I didn't say it someone else would have.

--
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Re:Personal computing? by Anonymous Coward · 2007-12-17 06:40 · Score: 1, Informative

So this begs the question, exactly how will average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?

Unfortunately it doesn't beg anything. Begging the question is something completely different.
Re:Personal computing? by eth1 · 2007-12-17 06:41 · Score: 1

The only way I can think of that the average consumer will benefit from an 8-core proc is that they'll be able to be infected by up to 7 botnet clients before their computer starts slowing down...
Re:Personal computing? by gihan_ripper · 2007-12-17 06:53 · Score: 1

The historical trend seems to be: come up with some swanky technology and uses will be found. So my answer to your question is: hell I don't know, but I can't wait to find out. Right now, I can see that parallel processing could make tabbed browsing a more pleasant experience, particularly if one or more tabs is showing video. Could probably help with DOM and Flash pages. Since "personal computing" pretty much equals "web browsing" these days, I'd say that's a significant change, though maybe not a "transformation".

--
Phoenix, Boston, Little Rock, see a pattern?
Re:Personal computing? by Skrynkelberg · 2007-12-17 06:53 · Score: 1

I'd love to run a simulation in MATLAB, download torrents, listen to music and play games at the same time. Today I can't, without noticeable slowdown.
Re:Personal computing? by kebes · 2007-12-17 06:56 · Score: 2, Insightful

You point out that few desktop tasks require parallel processing... but think about the flip-side of this: if we could speed-up many tasks, how would that affect desktop computing?

There are plenty of tasks that people do routinely on computers that are not "instantaneously" fast (spreadsheets, photo-editing, etc.). Furthermore there are many aspects of modern user interfaces that would be better if they were faster (generating thumbnail previews, sorting entries, rescanning music collections, searching, etc.). Also, it's important to realize that the commonplace desktop elements of tomorrow may not have been imagined today. Many things that we don't even consider (and certainly don't consider as "necessary") may become possible (and thus "necessary") with greater computer power (complex graphs/images/previews that update in realtime as a user slides a control, instantaneous re-encoding of video when you drag-and-drop to an external device, etc.).

My only point is that it is tempting to say that computers are "fast enough" and yet in my own computer-use (and watching the computer use of others) there are definitely times when the user must wait for the computer to finish a task (whether it is a split-second page render or a many-seconds refresh of a spreadsheet or a many-minute generation of a complex image). Until all of these tasks are "instantaneous" (shorter than human reaction time), then there is definite room for improvement in computer speed; and moreover improvements that the end-user will appreciate and come to rely on.

You'll notice that of the examples I've mentioned, many of them could in principle be parallelized (and thus benefit from multi-core systems).
Re:Personal computing? by FriendOfBagu · 2007-12-17 06:59 · Score: 1

Yeah, a single core and 640K of memory should be enough for anyone.
Re:Personal computing? by aethogamous · 2007-12-17 07:02 · Score: 1

So this begs the question, exactly how will average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?

I'm guessing that the average consumer will not benefit much from their current OS and software making optimal use of multiple cores.
Re:Personal computing? by Jasonjk74 · 2007-12-17 07:03 · Score: 1

Sadly, the average consumer is being led to believe they need more and more power. Look at all the Dell ads with some teenage girl saying "With this new Dell X5000 Flux Capacitor technology, I can surf the web and check my email!" Look at everyone now having dual core processors with absolutely no need. A girl at school was telling me last semester, "I'm getting an HP laptop with the dual......" (her voice trailed off, and I had to say "..core processor.") She didn't even know what it was called but was adamant that she needed it. Another especially funny example of processor-mania (and ineptitude) was 2 Mac-hipsters talking at a local coffeehouse, and my brother and I overheard one of them talk about "partitioning the processors."
Re:Personal computing? by Selfbain · 2007-12-17 07:10 · Score: 1

But no one will still be there to hear it.

--
Well, it has never been successfully tested.
Re:Personal computing? by 99BottlesOfBeerInMyF · 2007-12-17 07:23 · Score: 2, Insightful

Exactly what areas of "personal computing" are requiring this horsepower?
Video, audio, gaming, emulators, and VMs are starters. But I think you're missing some of the picture. Most computer users have one or two programs open at a time and end up quitting everything when they want to run something processor intensive like a game or photoshop. With the move towards multi-core and with a little work from developers, people might be able to leave 90% of the apps they use running, all the time. Multiple cores also provide something of a buffer. When a thread goes rogue, their machine does not grind to a halt. Heck, just yesterday my girlfriend was complaining because she tried to open a page in Firefox and it locked up the whole application including the other 8 tabs she had open. That means she had to kill it (which took a while itself) and then try to decide if she wanted to reopen all those tabs and risk it locking up again, or just try to remember what she had open and reopen them all by hand. If each tab, however, is running in its own thread and there are enough cores to handle it, this could easily have been a much better experience for her. She could have just closed the unresponsive tab.

Basically, I'd argue that if you provide the resources, smart developers will find a way to make clever use of those resources. Dual core has already sparked a revolution for virtualization and led to some other, really cool OS changes to increase speed. Many cores will provide diminishing returns (we have 2 eyes for a reason), but I bet 8 cores will be well utilized within a few years.
Re:Personal computing? by John+Hasler · 2007-12-17 07:25 · Score: 1

> The historical trend seems to be: come up with some swanky technology and uses will be
> found. So my answer to your question is: hell I don't know, but I can't wait to find out.

The botnets will put all those cores to work.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Personal computing? by bdbolton · 2007-12-17 07:51 · Score: 1

"Exactly what areas of "personal computing" are requiring this horsepower?"
It will speed up multitasking. If you're using Windows, pull up the task manager. Take a look at all those processes running in the background. Having multiple cores will speed them up. I regularly have lotus notes, firefox, IE and visual studio open at the same time.
Re:Personal computing? by ppanon · 2007-12-17 08:14 · Score: 2, Informative

Well, you could parallelize recalculation of large spreadsheets. Create dependency trees for cells and split the branch recalculations among different threads. Some accountants and executives with large "what-if?"-type spreadsheets could find that quite useful.
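
As a sketch of that spreadsheet idea (my own toy example, not how any real spreadsheet does it): describe each cell by its dependencies and a formula, then let a topological sort hand out every cell whose inputs are already computed, so each batch can go to a thread pool. The sheet contents and the recalc name are invented; graphlib is in the standard library from Python 3.9 on.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical sheet: cell name -> (cells it depends on, formula over known values)
sheet = {
    "A1": ((), lambda v: 2),
    "A2": ((), lambda v: 3),
    "B1": (("A1", "A2"), lambda v: v["A1"] + v["A2"]),
    "B2": (("A1",), lambda v: v["A1"] * 10),
    "C1": (("B1", "B2"), lambda v: v["B1"] * v["B2"]),
}


def recalc(sheet, pool):
    """Recalculate all cells, evaluating mutually independent cells as one batch."""
    sorter = TopologicalSorter({cell: deps for cell, (deps, _) in sheet.items()})
    sorter.prepare()
    values = {}
    while sorter.is_active():
        ready = sorter.get_ready()  # every cell whose inputs are all computed
        # Finish the whole batch before writing, so formulas only read settled values
        results = list(pool.map(lambda cell: sheet[cell][1](values), ready))
        for cell, result in zip(ready, results):
            values[cell] = result
            sorter.done(cell)
    return values


with ThreadPoolExecutor(max_workers=4) as pool:
    values = recalc(sheet, pool)
print(values["C1"])
```

Here A1 and A2 form the first batch, B1 and B2 the second, and C1 the last. A wide sheet with many independent branches gives big batches; a long chain of cells that each depend on the previous one gives batches of one and no gain at all.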

Browsers could have separate threads for data transfer and rendering. If the web site is using div tags and CSS, you could split the rendering work for each div to a separate thread. More rapid and frequent partial screen updating can provide today's generation of MTV-style re-orientation addicted workers the perception of faster performance.

Parallelize WYSIWYG document preparation with a backend using TeX text-layout algorithms.

But probably the biggest advantage would be obtained from more parallelism (both coarse and fine-grained) in GUI operations. That probably requires a re-architecting of display and GUI subsystems. But that's a bit of a chicken-and-egg problem because, to do that properly, you also need GPUs to become multi-core to remove the GPU as a single-thread bottleneck. GPUs are going to hit the same wall general purpose CPUs are hitting now, with a few years' delay. There's hope that today's Crossfire/SLI approaches could provide a hardware base to find an evolutionary path for that.

I figure it will take at least another 5 years or more for a graphics subsystem redesign, and my guess is that it will happen on Linux first. I don't see Microsoft being first in re-architecting the Windows display subsystem to do it. Certainly not for the next Windows version in 2010(?), and thus they probably won't implement it until 2014 at the earliest. I think it's more likely to happen with somebody replacing large parts of the X.org server as a PhD thesis.

But, yeah, fundamentally the biggest bottleneck with personal computer systems is the bandwidth between the user and the computer and there's no way to parallelize the user.

--
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
Re:Personal computing? by slashhax0r · 2007-12-17 08:41 · Score: 1

"Exactly what areas of "personal computing" are requiring this horsepower? The only two that come to mind are games and encoding video."

And loading the Vista Desktop.

Enuff said, we need 16 cores just to run the latest bloatware!
Re:Personal computing? by Tim+C · 2007-12-17 08:48 · Score: 1

I regularly have lotus notes, firefox, IE and visual studio open at the same time.

And how many of them are actually performing compute-intensive processing at the same time?
Upwards of 99% of any application's run time is spent waiting on user input - and that's especially true of web browsers, assuming you actually read the pages you open.

Fire up Task Manager and switch to the "Performance" pane, and watch the CPU graph for a while. Or better yet, use the "Performance" admin tool and have it log CPU use for a while. I'd be surprised if your 4 apps there use anything like the full capacity of the system you have now. Sure, there will be spikes, but averaged over a whole session I'd be very surprised if you hit more than a few percent utilisation.

I'm not saying that multi-core CPUs are useless; I have a dual core CPU now and am tempted by a quad core one. I'm just pointing out that your example application set doesn't need one. Even a compile in VS probably won't benefit greatly from increased core count.

--
It's official. Most of you are morons.
Re:Personal computing? by master_p · 2007-12-17 09:31 · Score: 1

Have you ever sat and watched Microsoft Word repaginate a document for what seems an endless period of time?

Have you ever looked at your watch while you wait for a project to compile?

Have you ever wondered why searching the Windows registry is so slow?

Have you ever been frustrated by Windows blocking while waiting to access a problematic network store?

Have you ever smiled funnily when you tried to open another program while Acrobat Reader was loading its modules?

Well, there are countless cases where personal computers need more horsepower...
Re:Personal computing? by John+Hasler · 2007-12-17 09:45 · Score: 1

> Video, audio, gaming, emulators, and VMs are starters. But I think you're missing some
> of the picture.

Yes. With eight cores the bot controlling your machine will be able to OCR two captchas, crack a password, crank out stock pump n' dumps, post comment spam, clean out your bank account, and still leave you enough horsepower to prevent you from deciding to buy a new machine and deprive the bot of a home.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Personal computing? by Doctor+Faustus · 2007-12-17 10:01 · Score: 1

Unfortunately, the idiots are winning, and eventually the incorrect usage may take over simply due to excessive incorrect usage.
What do you mean "eventually"? Unless context makes it very clear you're talking about circular arguments, "begs the question" now means "demands that the question be asked". It's annoying, but it's too late to complain, sorry.
Re:Personal computing? by sexconker · 2007-12-17 10:05 · Score: 1

If we could speed up programs, the average user would still sit and stare mindlessly at the screen for the same amount of time.
99% of the stuff people use is fast enough, or has other bottlenecks.

For the geeks out there who need to game, fold proteins, or encode video - guess what?
It's already multi-threaded.

Asking for everything to be multi-threaded is like asking for every piece of consumer electronics to run Linux.
Oh wait...
Re:Personal computing? by Doctor+Faustus · 2007-12-17 10:09 · Score: 1

When a thread goes rogue, their machine does not grind to a halt.
That's the immediate gain from a second core. I think that second one pretty much takes care of it, though. Beyond that, you're looking more at new features. More aggressive virus scanning, and on-the-fly encryption and/or compression spring to mind as possible early steps.
Re:Personal computing? by PitaBred · 2007-12-17 11:30 · Score: 1

You don't NEED it, but damn if I don't see a difference between running on my dual-core laptop vs. my single-core one. Things are just a little snappier, programs respond much faster, and if, say, the virus scanner updating is taking some CPU time, I can still use my computer. If that happens on the single core, I just have to wait. They've both got 2GB of RAM, 2GHz single-core vs 2.16GHz dual-core, decent hard drive, etc. Comparable machines except for the multi-core difference. It's not necessary, but it's worth it for most people to be able to keep using their machine when some Windows program starts eating up CPU cycles.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:Personal computing? by 99BottlesOfBeerInMyF · 2007-12-17 11:38 · Score: 1

That's the immediate gain from a second core. I think that second one pretty much takes care of it, though.
That's the gain from multiple cores in combination with good multithreaded code. But if I'm running Windows in a VM and it is using one core, then the benefit is completely lost; or if I'm running a large application or even OS processes taking up a significant portion of one core, the browser itself may well hang anyway.

There are plenty of potential uses, not the least of which is ACL jails for all applications in order to mitigate the majority of malware problems we suffer today.
Re:Personal computing? by ozbird · 2007-12-17 14:00 · Score: 1

Exactly what areas of "personal computing" are requiring this horsepower?

Vista. One core for the program, seven for DRM.
Re:Personal computing? by Colin+Smith · 2007-12-17 15:10 · Score: 1

"Exactly what areas of "personal computing" are requiring this horsepower?"

That's like saying what part of a horseless carriage requires a V8 with a turbo? Our computing infrastructure isn't the right shape to require it yet, it will at some point. I'd guess artificial intelligence.

--
Deleted
Re:Personal computing? by sznupi · 2007-12-17 16:39 · Score: 1

It DOES make tabbed browsing a more pleasant experience, at least in Opera.

--
One that hath name thou can not otter
Re:Personal computing? by ZeroPly · 2007-12-17 16:51 · Score: 1

Exactly what areas of "personal computing" are requiring this horsepower?
I see you haven't tried running Vista Aero yet...

--
Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
Re:Personal computing? by master_p · 2007-12-17 22:07 · Score: 1

Not a CPU problem, but a programming model problem, which forces the programmers to avoid using parallelism.
Re:Personal computing? by tepples · 2007-12-18 01:04 · Score: 1

Could probably help with DOM and Flash pages.
Firefox 3 Beta 1 on Windows is not even multithreaded enough to have scripts running in one tab not freeze the user interface in another tab.
Re:Personal computing? by tepples · 2007-12-18 01:08 · Score: 1

Upwards of 99% of any application's run time is spent waiting on user input
Even if it is rendering 10 animated advertisements at once?
Re:Personal computing? by tepples · 2007-12-18 01:14 · Score: 1

Have you ever sat and watched Microsoft Word repaginate a document
No. I use a program similar to Word for short documents and something else for long documents.
for what seems an endless period of time?
Isn't repagination one of those problems not subject to easy data parallelism? If a document has no explicit page breaks, you have to know where all the page breaks before a line are in order to know where this line shall be placed. What I use in this case is task parallelism.
Have you ever looked at your watch while you wait for a project to compile?
make -j2 and my compile process runs twice as fast even on a single core because each job does computation while the other's I/O blocks.
Have you ever been frustrated by Windows blocking while waiting to access a problematic network store?
It doesn't appear to block any of the other windows on the screen. Is there anything useful a file manager can do while waiting for a network operation to complete? And is it my computer that needs more power, or is it the server (which often I do not control)?
Have you ever smiled funnily when you tried to open another program while Acrobat Reader was loading its modules?
Yes, before I switched to Foxit in early 2005.
Re:Personal computing? by sznupi · 2007-12-18 05:54 · Score: 1

Well, that's why you're supposed to choose PROPERLY CODED PROGRAMS. Yes, we can excuse average consumer (notice I didn't say "user"), but people on /.?

How is it that my computer is the snappiest I've ever got my hands on, even though it's only an Athlon XP 1700+? Could it be because I choose what I use carefully, and all dual cpu machines I've used had tons of bloatware? Just wonderin'...

Oh, and somehow I even get better functionality...

PS. Seriously, you can't do anything while antivir is updating? What kind of bloatware are you using, I usually just see "virus database updated"...

PPS. And note that I'm not some sort of "anti-upgrade nazi", just..."anti-pointless-upgrade nazi" ;P That said, I have to upgrade really soon - I believe right now I see one of the last motherboards with 4 PCI slots available, and I'd like to use my PCI cards (yeah, that "anti-pointless-upgrade" thing) plus being able in the future to easily add HDD space and a GFX card. Oh, modernisation includes the cheapest dualcore CPU that fits into the mobo. Why dual you ask? Well, at 40 it's marginally more expensive than the cheapest singlecore.

I wonder how snappy my computer will be now... (yeah, I'm not planning to change my software habits)

--
One that hath name thou can not otter
Re:Personal computing? by Jens+Egon · 2007-12-18 06:28 · Score: 1

"processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing" Exactly what areas of "personal computing" are requiring this horsepower?
AI. And if you don't see the need for personal AI you can sleep on the couch.

Lucinda Sexbot
Re:Personal computing? by PitaBred · 2007-12-18 08:46 · Score: 1

Even choosing a properly coded program, ANY program that uses a lot of CPU under Windows makes it slow to respond. It doesn't matter how well coded it is. That single program responds fine if it's properly coded, but other programs lag horribly. Under Linux it's quite a bit better, but there's still a world of difference between a single core and dual core when you're actually using the CPU power for processing, versus just letting the machine idle most of the time.

Seriously, try starting a POV-Ray render or some serious number crunching going on, or even just a software update task, and then switch tasks. It will kill any single core, but a dual-core keeps responding fine.

--
My blog. Good stuff (when I remember to update it). Read it.
Re:Personal computing? by sznupi · 2007-12-18 08:47 · Score: 1

It's a bit amusing that most of those are sort of Windows problems :p

--
One that hath name thou can not otter
Re:Personal computing? by sznupi · 2007-12-19 04:47 · Score: 1

Ahhh, but OP was talking about consumers being misled, and the types of apps they use aren't CPU limited at all (or rather...they often are, but there's no apparent reason why they should get away with it :/ )

And especially POV-Ray is out of the question here...not only niche software, but also one of few niches that DO consume any amount of CPU power thrown at them.

PS. Really, I dislike the type of sentiment you seem to present, to some degree. Number crunching and updating that don't bog down the machine are possible on single core, I experienced it; we don't need to excuse devs who can't do that/throw them a bone by using more and more cores.

--
One that hath name thou can not otter
Re:Personal computing? by sznupi · 2007-12-19 08:52 · Score: 1

The problem with your vision is that as CPUs will get faster, new features/sloppy coding will probably keep the interactivity of UI at the same level...

I believe there's even some "law" for that :p

--
One that hath name thou can not otter
Re:Personal computing? by sznupi · 2007-12-19 08:59 · Score: 1

Hmm...there is already a browser behaving like that... (but it's not /. darling so no point in mentioning it...)

And while I agree about VMs, gaming and video (those cases basically can take any horsepower thrown at them for the foreseeable future)...

- audio, on consumer level, already has more than enough CPU power

- emulators...hm, on one hand some of them will also take any CPU power thrown at them (MAME). On the other...I notice that in the case of many, lack of CPU power isn't the biggest problem (PS2 emulation)

--
One that hath name thou can not otter
Re:Personal computing? by sznupi · 2007-12-20 09:30 · Score: 1

It's...interesting how state of affairs in two most popular browsers influences opinions about "what should be done" (and was done...)

BTW, aren't GFX operations embarrassingly parallel...so there shouldn't be that much of a problem here?

--
One that hath name thou can not otter

Re:melt in your mouth not in your mobo by $RANDOMLUSER · 2007-12-17 06:12 · Score: 2, Informative

Well then you're not remembering very well. There was some crazy statistic floating around that a Prescott at ~25Ghz would put out as much heat per cm^2 as the surface of the sun.

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill

Re:concurrency - the developer's responsibility? by meatpan · 2007-12-17 06:13 · Score: 1

But seriously, isn't the OS responsible for the heavy lifting with regards to task scheduling and concurrency?

Only a surprisingly small group of programmers will be impacted by expanded multi-core architectures. In particular, the impacted devs are authors of the relatively 'low-level' code within compilers, kernels, and interpreters.

Diaspora by dino213b · 2007-12-17 06:17 · Score: 1

I agree with some of the previous posters that have faulted programmers for "the state of today." My feeling is that the divide between knowledge of hardware and knowledge of software is far too wide. In my experience, I have witnessed many programmers who spent more time organizing the readability of their code than analyzing the actual effectiveness of it: i.e. whitespace use vs algorithm optimization (be it processor method + instruction or i/o improvement). The end result: bloaty-pooh.

I feel that by making threading a C++ standard, or at least making the threading model a predominant one, the overall "state of today" will improve in the near future simply because more programmers will be aware of it. Parallel processing really does require some training -- it cannot be adapted to every task.

Take for example a simple thumbnail-generating program (a form of which is used by everyday users). If the program is written in its traditional linear model, it would not take advantage of multiple processors or cores (unless you ran multiple instances of it or otherwise manipulated it in an unplanned fashion). However, if the program utilizes threading, it could become scalable without requiring any intervention. Knowledge of hardware -- and not necessarily relying on the compiler to optimize your code -- just might help.
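
A minimal Python sketch of the thumbnail idea, under loud assumptions: an "image" here is just a 2D list of pixel values and make_thumbnail() is a hypothetical stand-in for a real resize routine. The point is only the shape of the program: one task per image, with the pool sized from the core count the OS reports.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def make_thumbnail(image, factor=2):
    """Hypothetical resize: keep every `factor`-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]

def thumbnail_all(images, workers=None):
    """Generate thumbnails for many images, one independent task per image."""
    # Size the pool from the number of cores the OS reports (at least 1).
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order tasks finish in.
        return list(pool.map(make_thumbnail, images))

# Eight made-up 4x4 "images", each filled with a single grey level.
images = [[[level] * 4 for _ in range(4)] for level in range(8)]
thumbs = thumbnail_all(images)
```

Note that in CPython the global interpreter lock keeps pure-Python CPU work from running on several cores at once, so a real version would lean on an image library that releases the lock, or swap in ProcessPoolExecutor. The structure stays the same either way.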

Re:Diaspora by Chirs · 2007-12-17 08:08 · Score: 3, Insightful

For many large-scale software projects (I work in industry so I have some experience with this) it is far easier to find more cpu power than more programmers.

Making code easy to read and maintain is critical to maximizing the efficiency of the programmer. The efficiency of the code is generally a secondary issue, and is only a factor if the code in question is found to be a bottleneck.

Brian Kernighan once said,

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
Re:Diaspora by pthisis · 2007-12-17 09:19 · Score: 1

Take for example a simple thumbnail-generating program (a form of which is used by everyday users). If the program is written in its traditional linear model, it would not take advantage of multiple processors or cores (unless you ran multiple instances of it or otherwise manipulated it in an unplanned fashion). However, if the program utilizes threading, it could become scalable without requiring any intervention.

If it used multiple processes, it could become scalable without requiring any intervention.

If it used multiple processes, it also wouldn't be throwing out a good portion of all the effort the OS designers spent implementing protected memory, and it would by necessity be more explicit about exactly what resources it was sharing--meaning fewer of the common multi-processing bugs.

The ideal steps to take to encourage good parallel programming, IMO, would be:
1. Microsoft does a better job of publicizing the NTCreateProcess/CreateProcessEx call, in particular that a NULL SectionHandle results in a true copy-on-write process (like a fork()'d process in Unix/Linux); true COW processes are required for efficient multiple process programming. Too few Windows programmers know how to do that as the standard CreateProcess/spawn() type calls are dog slow and unusable for efficient solutions; consequently, a legion of developers believe that using multiple cores requires threads (and all the problems they bring).
2. Java exposes a better multi-process API. Basically because of (1) they went in the "threads for everything" direction at the outset, for portability; thankfully they've moved away from that, allowing select()-style calls and other state-machine friendly things in more recent years. Getting good multi-process support would be the last big step here.

--
rage, rage against the dying of the light
Re:Diaspora by sonofusion82 · 2007-12-17 10:12 · Score: 1

ha.. tell that to the entire generation of brain dead programmers who still think that VB6 is awesome.
In my experience, I have witnessed many programmers who spent more time organizing the readability of their code than analyzing the actual effectiveness of it: i.e. whitespace use vs algorithm optimization (be it processor method + instruction or i/o improvement). The end result: bloaty-pooh.
I have worked with several senior engineers (probably taking home more than 2x my paycheck) who only provide that kind of feedback during code reviews. They are still stuck with VB6 and have absolutely no idea about multi-threading, OO design, security, etc.

HPC by ShakaUVM · 2007-12-17 06:17 · Score: 2, Interesting

As someone who got a master's in computer science with a focus in high performance computing / parallel processing, and has taught on the subject, *yes*, it does take a bit of work to wrap one's mind around the concept of parallel processing, and to correctly write code with concurrency. But *no*, it's not really that hard. Once you get used to the idea of having computation and communication cycles over a processor geometry, it becomes little more difficult to write parallel code than serial.

It's kind of like when people see recursive functions for the first time. If they don't understand the base condition and inductive step, then they can easily fall into infinite loops or write bugs. Parallel code is the same way... just a bit more tricky.

Re:HPC by curunir · 2007-12-17 08:30 · Score: 2, Insightful

I think the one thing that makes parallel computing more difficult, and quite a bit more so than recursion, is the fact that it makes your program non-deterministic. With a single-threaded application, it's pretty obvious when you've made your application non-deterministic...you reference the time or some resource external to your application. And those kinds of non-deterministic behaviors are much easier to understand...they're mostly just data. But if your application is running on multiple processors using multiple threads, that's not the case. You can run your application multiple times and see different results depending on which threads execute the fastest. And in the worst-case scenario, you get deadlocks that are a nightmare to debug.

It's often quite difficult to wrap your head around that unpredictability, especially since so much of the beginning computer science education teaches programmers to evaluate each instruction in their programs in source order as the computer is likely to end up doing when the program is run. This is made even worse by the fact that some languages (I know Java, but there may be others too) allow a compiler to re-order instructions to improve performance provided it doesn't alter that thread's behavior. This is fine for a single-threaded application, but can be quite confusing for a multi-threaded application when you can no longer assume source ordering of instructions from other threads.

It took a while before I got comfortable with essentially asking myself "What am I assuming and do I actually know that at this point or do I just think I know it at this point" with every line of code that I write that might execute in a multi-threaded environment. Even with that, I still run into occasions where it takes over an hour to debug a race condition when that error only happens a small percentage of the time.
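
To make the "what am I assuming" question concrete, here is a small Python sketch (my own illustration, with a made-up Counter class): the hidden assumption is that `value += 1` is one indivisible step. It is really a read, an add and a write, so two threads can read the same old value and one update gets lost. Taking a lock around the update makes the assumption true.

```python
import threading

class Counter:
    """Shared counter whose read-modify-write update is guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old value
        # and one of the two increments would silently disappear.
        with self._lock:
            self.value += 1

def hammer(counter, times):
    """Increment the shared counter `times` times from the calling thread."""
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place the final count is the same on every run; take it out and the result depends on how the threads happen to interleave, which is exactly the kind of bug that shows up only a small percentage of the time.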

--
"Don't blame me, I voted for Kodos!"
Re:HPC by ShakaUVM · 2007-12-17 17:21 · Score: 1

Right, that's why I like to shy away from tricksy algorithms. There's all sorts of tricksy things you can do with recursive functions as well -- that might even make them run faster -- but whenever I write a recursive function, I always write two parts to it: 1) Base Case. 2) Recursive step. It breaks the problem down nicely, and I can see clearly where the base case is going to stop recursion.

But if your application is running on multiple processors using multiple threads, that's not the case. You can run your application multiple times and see different results depending on which threads execute the fastest. And in the worst-case scenario, you get dead locks that are a nightmare to debug.

This only happens with badly written code, or with tricksy code. It is actually possible to write safe parallel code, and it's not hard.

To safely write parallel code, you do this in your main loop (after setup):
1) Computation step within your processor's data
2) Communication step with your neighbors
3) Barrier and Check for Termination

If you do this, you'll never get the race conditions that you mention, and the computation and communication steps become very easy to write.

For example, imagine you're doing some sort of stencil operation over a large array (which generally involves doing math with a 3x3 or 5x5 chunk of the array at a time). You start by splitting the task over your N CPUs, and telling them where they fit into your processor geometry. For example, you might have a 1024x1024 array, distributed across 16 CPUs in squares of 256x256 data. Each step, each CPU does the following:
1) Runs the stencil operation over its private data.
2) Communicates the boundary values to its neighbors.
3) Barriers (waits for everyone to catch up) and checks for termination.
Repeats.

No matter how fast or how slow the various CPUs/cores are, you won't get a race condition.
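
The loop above can be sketched in Python with threads and threading.Barrier. This is a shared-memory simplification, not the distributed version described: the array is 1D, the stencil is a made-up 3-point average with fixed end values, "termination" is just a fixed step count, and the communication step is implicit (each worker reads its neighbours' boundary cells from the previous step's buffer, and the two buffers are swapped at the barrier).

```python
import threading

def stencil_serial(data, steps):
    """Reference version: 3-point average, end values held fixed."""
    cur = list(data)
    for _ in range(steps):
        nxt = cur[:]
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur

def stencil_parallel(data, steps, workers):
    """Same computation, split into contiguous slices, one per worker thread."""
    n = len(data)
    bufs = {"cur": list(data), "nxt": list(data)}

    def swap():
        # Run by exactly one thread once all workers have reached the barrier,
        # before any of them is released into the next step.
        bufs["cur"], bufs["nxt"] = bufs["nxt"], bufs["cur"]

    barrier = threading.Barrier(workers, action=swap)

    def worker(rank):
        lo = rank * n // workers
        hi = (rank + 1) * n // workers
        for _ in range(steps):
            cur, nxt = bufs["cur"], bufs["nxt"]
            # 1) Compute: write only this worker's own slice of `nxt`.
            #    Reading cur[i - 1] / cur[i + 1] across the slice edge is the
            #    "boundary exchange"; nobody writes `cur` during this step.
            for i in range(max(lo, 1), min(hi, n - 1)):
                nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
            # 2) + 3) Wait for everyone, then the buffers are swapped.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs["cur"]

data = [float(i * i % 7) for i in range(16)]
parallel_result = stencil_parallel(data, steps=5, workers=4)
serial_result = stencil_serial(data, steps=5)
```

Because every worker writes only its own slice and nobody touches the read buffer until the barrier, the result does not depend on which thread runs fastest, which is the property being claimed.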

This is a design for a rather homogeneous computer, of course, but the article is talking about 8 core CPUs. In practice, you'd do some sort of load balancing to shift extra work away from heavily loaded cores/CPUs, but that actually doesn't change the core design, you just have to track the changing processor geometries. Use KeLP if you really want a powerful geometry toolbox.

In a Grid environment, a different paradigm is usually employed, of course.

Re:Programmers are at fault by jedidiah · 2007-12-17 06:19 · Score: 1

Why should programmers be knocked for only using the tools they have?

It's a lot easier to develop for something if you can actually get your hands on it. When this "nifty but underutilized" sort of hardware gets out there where everyone can use it, perhaps the problem will sort itself out. Not everyone has the resources to test their ideas in this area.

If you're going to knock anyone, knock the professors for ignoring this area of research and not capturing the attention of their students who are now "substandard practitioners".

--
A Pirate and a Puritan look the same on a balance sheet.

Sameless Plug: Qt 4.4 by scorp1us · 2007-12-17 06:19 · Score: 5, Informative

Full disclosure: I am a Qt Developer (user) I do not work for TrollTech

The new Qt4.4 (due 1Q2008) has QtConcurrent, a set of classes that make multi-core processing trivial.

From the docs:

The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores. Programs written with QtConcurrent automatically adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future.

QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications:

* QtConcurrent::map() applies a function to every item in a container, modifying the items in-place.
* QtConcurrent::mapped() is like map(), except that it returns a new container with the modifications.
* QtConcurrent::mappedReduced() is like mapped(), except that the modified results are reduced or folded into a single result.
* QtConcurrent::filter() removes all items from a container based on the result of a filter function.
* QtConcurrent::filtered() is like filter(), except that it returns a new container with the filtered results.
* QtConcurrent::filteredReduced() is like filtered(), except that the filtered results are reduced or folded into a single result.
* QtConcurrent::run() runs a function in another thread.
* QFuture represents the result of an asynchronous computation.
* QFutureIterator allows iterating through results available via QFuture.
* QFutureWatcher allows monitoring a QFuture using signals-and-slots.
* QFutureSynchronizer is a convenience class that automatically synchronizes several QFutures.
* QRunnable is an abstract class representing a runnable object.
* QThreadPool manages a pool of threads that run QRunnable objects.

This makes multi-core programming almost a no-brainer.
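
For anyone who hasn't met the map/filter/reduce style, here is a rough analogue in Python. To be clear, this is not Qt code and not the QtConcurrent API: the helper names mapped(), filtered() and mapped_reduced() are made up to mirror the list above, and they sit on top of Python's standard ThreadPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def mapped(fn, items, pool):
    """Apply fn to every item on the pool and return a new list."""
    return list(pool.map(fn, items))

def filtered(pred, items, pool):
    """Evaluate pred on the pool and return a new list of the items that pass."""
    keep = pool.map(pred, items)
    return [item for item, ok in zip(items, keep) if ok]

def mapped_reduced(fn, combine, items, pool, initial):
    """Map on the pool, then fold the results into one value in the caller."""
    return reduce(combine, pool.map(fn, items), initial)

words = ["multi", "core", "programming", "is", "hard"]
with ThreadPoolExecutor() as pool:
    lengths = mapped(len, words, pool)
    long_words = filtered(lambda w: len(w) > 4, words, pool)
    total = mapped_reduced(len, operator.add, words, pool, 0)
```

The caller never touches a mutex or a thread handle; the pool decides how many workers to use. Whether the work actually gets faster still depends on the functions being independent and heavy enough to be worth farming out.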

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.

Re:Sameless Plug: Qt 4.4 by yoprst · 2007-12-17 07:16 · Score: 1

Life is unfair. I hate Trolltech. And yet they do it right(at least that's what I can infer from your api description) - AGAIN, while others offer tools that are only good for parallelizing and multiplying pain.
Re:Sameless Plug: Qt 4.4 by hasdikarlsam · 2007-12-17 07:18 · Score: 1

Does it, now.

Those functions look familiar. Let me see... oh yes, they're the most basic manipulations in functional programming, and have nothing to do with threading per se. Mind you, functional languages /are/ a whole lot easier to multithread than ancient creaky things like C++. ;)

I have a feeling that, once four cores or more become common, the 30-70% slowdown you get from using Haskell instead of C++ will be dwarfed by the 3-7x speedup from easier parallelization.
Re:Sameless Plug: Qt 4.4 by Rich0 · 2007-12-17 08:17 · Score: 1

I'd see maintainability being a big advantage of functional programming as well. By defining at a high level what needs to be done, and not how to do it, you can change the former with less impact on redoing the latter.

Imagine creating an elegant and efficient parallel solution to some problem. Somebody comes along and changes the requirements. Suddenly your whole algorithm breaks down and needs a major rewrite because the new requirements have different cross-thread synchronization profiles. If you had a high-level functional solution to the problem then you just change the code and the compiler does the rest.

In most cases writing correct and maintainable code is more important than squeezing out every bit of performance. And this is VERY hard to do with locks/semaphores/etc. A high-level language that can scale to a cluster has a lot of usefulness.
Re:Sameless Plug: Qt 4.4 by Sax+Maniac · 2007-12-17 09:01 · Score: 1

a set of classes that make multi-core processing trivial.

Also, QDeadlock is a handy coffee-break implementor, QUnscalableParallelApplication (subclass of QApplication) is great when you use all this stuff yet the program doesn't get any faster, and QKernelPanic when accessing hardware directly over the AGP bus.
I would say these classes make the mechanics trivial and maybe portable, but not parallel programming itself.

--
I can explanate how to administrate your network. You must configurate and segmentate it, so it can computate.
Re:Sameless Plug: Qt 4.4 by ge · 2007-12-17 13:54 · Score: 1

.... unless your problem does not fit the model. I'm happy for Qt developers that they can do Google-style mapreduce problems in Qt now, but this solution does not solve the general case.
Re:Sameless Plug: Qt 4.4 by Rodyland · 2007-12-17 15:33 · Score: 2, Insightful

This makes multi-core programming almost a no-brainer.

While you did say 'almost', I'm still going to take exception with that statement.
That is a very dangerous thing to say without reams of qualifications.
Programming (of any non-trivial nature) is not currently, nor is it likely to be any time soon, a 'no-brainer'. No library, no framework, no toolset, no abstraction takes away from the core fact that programming is hard. Sure, you can take away the boring/trivial stuff and give the programmers more time to work on the hard/interesting stuff, but that doesn't make it a 'no-brainer'.
Abstracting away mapReduce just means you don't have to know how to write your own mapReduce implementation. It doesn't automatically make the user of Qt (or whatever) an expert in designing parallel algorithms, nor parallel debugging, nor the performance benefits and tradeoffs and gotchas of parallel programming.
Re:Sameless Plug: Qt 4.4 by Your.Master · 2007-12-17 16:17 · Score: 1

"If you had a high-level functional solution to the problem then you just change the code and the compiler does the rest."

Pardon me, but is it not true of all (compiled) software that you just change the code and the compiler does the rest? I know the advantages of stateless, functional programming in terms of parallelization, but the other thing you described sounds just like typical encapsulation, as in Object-Oriented procedural languages. Or IDLs.
Re:Sameless Plug: Qt 4.4 by Rich0 · 2007-12-18 03:07 · Score: 1

True - but in most languages compilers can't optimize for parallelism too well - so you end up doing that part manually. But the appropriate parallel approach can be sensitive to how threads need to share data - so if you change the requirements your approach can become sub-optimal.

If you could make a high-level language that was more suitable for parallelism then the compiler would do more of the work.

Re:concurrency - the developer's responsibility? by s20451 · 2007-12-17 06:21 · Score: 1

What a ridiculous idea. The application developer's free lunch is over, now she needs to think concurrently? Ha, she probably has difficulty with a single thread of thought...

I think this sentence makes most sense if you imagine it being read in Comic Book Guy voice.

--
Toronto-area transit rider? Rate your ride.

Evolution that halted at 4 ghz.... by MindPrison · 2007-12-17 06:23 · Score: 2, Interesting

It's not easy... especially since things sort of halted at 4 ghz, what on earth am I typing about? Well...picture this...limitations...yes they do exist..and sometimes it's important to think beyond what lies just straight ahead (such as the next cycle speed)...and think into a second...maybe even a 3rd dimmension to expand your communication speed. I have for over 6 years been thinking..of a 3d-dimmension processor that cross communicates over a diagonal matrix instead of the traditional serial and parallel communication model. Imagine this folks...if your code could "walk" across a matrix of 10 x 10 x 10 instead of just 8 x 8 or 64 x 64 if you want...get the picture, no? Imagine that your data could communicate on a 3 dimmensional axis - imagine that you had 10 stacks of cores on top of each other - and instead of just connecting they communication bus to a parallel or a serial model...they could in fact communicate on a diagonel basis... this would make it possible to send commands...data..etc....in a 3d-space rather than just a "queue". This of course...would demand a different "mindset" of coding... everything would have to be written from scratch....though...but the benefits would be tremendeous .....you could 10 fold existing computational speed by increasing the communication across processor-cores...maybe even more! Even by todays technology standards. Ok..ok...sounds far fetched for you doesnt it? Well..get this...this was my invention 6 years ago (maybe even 9 years ago...I am getting older so I dont really care...I do care for freedom of information and sharing...Not so much wealth so listen on)...The theory of what I just wrote here on Slashdot (which has more implication on your life in the future than you will ever be capable of comprehending...yes...I am full of myself aint i....Who cares? You dont know me) .. point is... 
There was once a missing brick to the idea of diagonal cross matrix computing....with yesteryears technology it just would not be feasible to do it... but ...if you have ANY understanding of what I write here (yes...I am not kidding...this may change history as we know it...and I am drunk right now...and I dont want to keep a lid on it anymore)...here we go... Please think about what I just wrote - and - look up frances hellman's lecture upon magnetic materials in semiconductors...and you WILL have your 4-th link in the 3-B-E-C (base, Emitter, Collector) construction...to make the Cross Matrix Processor possible....just understand this....JoOngle invented this...Frances made it possible - YOU read it from a drunk nobody of Slashdot.org....) now...go make it real!

--
What this world is coming to - is for you and me to decide.

Re:Evolution that halted at 4 ghz.... by Animats · 2007-12-17 06:36 · Score: 4, Informative

I have for over 6 years been thinking..of a 3d-dimmension processor that cross communicates over a diagonal matrix instead of the traditional serial and parallel communication model.
Six years, and you haven't discovered all the machines built to try that? This was a hot idea in the 1980s. Hypercubes, connection machines, and even perfect shuffle machines work something like that. There's a long history of multidimensional interconnect schemes. Some of them even work.
Re:Evolution that halted at 4 ghz.... by poot_rootbeer · 2007-12-17 06:54 · Score: 1

I think my brain needs to be multi-core to comprehend that paragraph.
Re:Evolution that halted at 4 ghz.... by Skrynkelberg · 2007-12-17 06:58 · Score: 5, Funny

You may want to switch off the rapid fire-mode for your "."-key.
Re:Evolution that halted at 4 ghz.... by dbIII · 2007-12-17 11:58 · Score: 1

Six years ago? Wasn't there something similar in very impressive looking machines more than 16 years ago? Didn't Feynman of all people write of all things a massively parallel BASIC for it?
Re:Evolution that halted at 4 ghz.... by smellotron · 2007-12-17 17:59 · Score: 1

Whoa, man. Down with ellipses! Long live paragraphs!
Re:Evolution that halted at 4 ghz.... by MindPrison · 2007-12-17 19:48 · Score: 1

Mental note:

Seek professional help. Now!

--
What this world is coming to - is for you and me to decide.

speech recognition by schwaang · 2007-12-17 06:24 · Score: 1

Not that it takes massive (by today's PC standards) compute power to do decent speech recognition, but it's definitely worth dedicating a core or two.

And then with Vista, you might need one or two cores dedicated to handling UAC events ("The user tried to breathe again: Cancel or Allow?").

And so it goes...... by Nonillion · 2007-12-17 06:24 · Score: 5, Insightful

processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing....

Translation:

Code will get even more inefficient / bloated and require faster hardware to do the same thing you are doing now. While I'm all for better / faster computer hardware, most if not all Jane and Joe Sixpack users never need Super Computer power to surf the net, read e-mail and watch videos.

--
"I bow to no man" - Riddick

Re:And so it goes...... by Anonymous Coward · 2007-12-17 06:52 · Score: 1, Insightful

i've heard that before. just like joe sixpack didn't need a high bandwidth connection? in came Youtubes, online photoprints, pornovideos (instead of jpegs), and whatevers. Maybe Joe Sixpack doesn't need multiple cores now but could it be that he just might find use for them in the future?
Re:And so it goes...... by bdbolton · 2007-12-17 08:04 · Score: 1

"Code will get even more inefficient / bloated and require faster hardware to do the same thing you are doing now. "

All that "inefficient" code must be doing something better. I for one, can do more on my computer now than 10 years ago: Video, music, web, voip, desktop search, 3d desktop effects, 3d games.

Erlang by Niten · 2007-12-17 06:24 · Score: 4, Informative

Oddly enough, I just watched a presentation about this very topic, with an emphasis on Erlang's model for concurrency. The slides are available here:

http://www.algorithm.com.au/downloads/talks/Concurrency-and-Erlang-LCA2007-andrep.pdf

The presentation itself (OGG Theora video available here) included an interesting quote from Tim Sweeney, creator of the Unreal Engine: "Shared state concurrency is hopelessly intractable."

The point expounded upon in the presentation is that when you have thousands of mutable objects, say in a video game, that are updated many times per second, and each of which touches 5-10 other objects, manual synchronization is hopelessly useless. And if Tim Sweeney thinks it's an intractable problem, what hope is there for us mere mortals?

The rest of this presentation served as an introduction to the Erlang model of concurrency, wherein lightweight threads have no shared state between them. Rather, thread communication is performed by an asynchronous, nothing-shared message passing system. Erlang was created by Ericsson and has been used to create a variety of highly scalable industrial applications, as well as more familiar programs such as the ejabberd Jabber daemon.
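
To give a feel for the nothing-shared style without installing Erlang, here is a rough imitation in Python. The assumptions are mine: a queue.Queue stands in for a process mailbox, an OS thread stands in for an Erlang process (which is far lighter), and the message tuples ("add", n), ("get", reply_queue) and ("stop",) are invented for this example.

```python
import queue
import threading

def accumulator(mailbox):
    """Owns a running total; the only way to reach it is by sending messages."""
    total = 0  # private state, never touched by any other thread
    while True:
        msg = mailbox.get()
        if msg[0] == "add":
            total += msg[1]
        elif msg[0] == "get":
            # Reply by sending a message back on the queue the caller supplied.
            msg[1].put(total)
        elif msg[0] == "stop":
            break

mailbox = queue.Queue()
worker = threading.Thread(target=accumulator, args=(mailbox,))
worker.start()

# Sends are asynchronous: put() returns without waiting for processing.
for n in range(1, 11):
    mailbox.put(("add", n))

reply = queue.Queue()
mailbox.put(("get", reply))
result = reply.get()  # block until the accumulator answers

mailbox.put(("stop",))
worker.join()
```

There is no lock anywhere in the user code because there is nothing to lock: the total lives in exactly one thread, and the mailbox delivers messages in the order they were sent.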

This type of concurrency really looks to be the way forward to efficient utilization of multi-core systems, and I encourage everyone to at least play with Erlang a little to gain some perspective on this style of programming.

For a stylish introduction to the language from our Swedish friends, be sure to check out Erlang: The Movie.

Re:Erlang by GileadGreene · 2007-12-17 13:56 · Score: 1

Indeed. A language built around highly-efficient message-passing concurrency (rather than shared state) was seen as the way forward the last time the computing world was getting excited about using large-scale parallelism to increase processing power. Bell Labs also implemented the same kind of model when they started taking concurrency seriously. It's an idea that's been percolating in the background for many years now, but has never quite seemed to hit the mainstream (too many programmers wedded to sequential programming and the "small" jump from there to shared-state concurrency). Hopefully this time around, the idea will stick. Erlang certainly seems well-placed to gain mainstream acceptance, especially with Joe Armstrong's new PragProg book out, and the growing popularity of ErlyWeb.
Re:Erlang by zrq · 2007-12-17 14:00 · Score: 1

thread communication is performed by an asynchronous, nothing-shared message passing system

Sounds a bit like OCCAM.
Re:Erlang by ceoyoyo · 2007-12-17 18:20 · Score: 1

You don't need Erlang to get the gist. I never use naked threads, ALWAYS threads wrapped up in objects. If you use Apple's distributed objects they even do the message passing for you.

I was in a lab with a bunch of non-CS researchers who suddenly wanted to write some parallel applications. So I gave them my thread object. Here you go, put your code here, send the object the data, tell it to start, and it'll get back to you when it's done.
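
A rough guess at what such a "thread object" might look like, sketched in Python. The class names (`Job`, `SumOfSquares`) and the `on_done` callback are hypothetical; the poster's actual object is not shown in the thread.

```python
import threading

class Job(threading.Thread):
    """Hypothetical wrapper: subclass it, override work(), hand it data."""

    def __init__(self, data, on_done):
        super().__init__()
        self.data = data
        self.on_done = on_done     # called with the result when work finishes

    def work(self, data):
        raise NotImplementedError("put your code here")

    def run(self):
        # Runs on the new thread once start() is called.
        self.on_done(self.work(self.data))

class SumOfSquares(Job):
    def work(self, data):
        return sum(x * x for x in data)

def run_job():
    results = []
    job = SumOfSquares([1, 2, 3], on_done=results.append)
    job.start()                    # "tell it to start"
    job.join()                     # here we simply wait for it
    return results
```

Note that the callback fires on the worker thread, so whatever it touches has to be safe to use from there.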
Re:Erlang by Anonymous Coward · 2007-12-17 20:28 · Score: 1, Interesting

The good thing about Erlang is that developers don't need to understand concurrency. This is especially beneficial for ErlyWeb, where the author literally has no clue.

I'm not trying to be mean, but from talking with him, he really doesn't understand it, and I think he has never programmed anything using multiple threads. At first I was really scared - finding out that Erlang enthusiasts just couldn't grasp the basics, until I read Armstrong's book and realized that they didn't need to. Joe gets it and the language hides it so well from you that you just don't need to understand it whatsoever. Just like Java developers don't need to understand pointers and OO programmers can (often) avoid recursion, Erlang coders just don't need to deal with it. I'm still a bit put off, and I wouldn't let the author near any infrastructural code in a threaded language, but Erlang really has become the VB of concurrency. Quite a feat.
Re:Erlang by GileadGreene · 2007-12-18 13:38 · Score: 1

That's the point though, isn't it: using a message-passing approach with no shared state makes concurrency much more tractable to reason about. It's a much more scalable mental model for building concurrent code that actually works.

Wusses by skintigh2 · 2007-12-17 06:25 · Score: 1

Real men think in parallel.

Re:Oh, wow by bladesjester · 2007-12-17 06:25 · Score: 5, Insightful

A guy who's on the C++ standards committee AND works for Microsoft.

Actually, according to the latest Dr. Dobb's, Herb is the *chair* of the ISO C++ Standards committee. (He had an article on lock hierarchies being used to avoid deadlock.)

He's really going to know what he's talking about, then.

As chair of the committee, I'd say there's a pretty fair chance that he *does*.

I really love people who bash things just because Microsoft is involved. Contrary to what seems to be a popular belief here, they have some incredibly intelligent people who are very good at what they do there.
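
For anyone who hasn't met the lock hierarchy idea mentioned above: give every lock a rank and only ever acquire locks in increasing rank order, so two threads can never wait on each other in a cycle. The following is a small Python sketch of that general idea, not a reproduction of the article's code; `RankedLock` is a made-up name, and the sketch assumes locks are released in reverse order of acquisition.

```python
import threading

class RankedLock:
    """A lock with a fixed rank; a thread must acquire ranks in increasing order."""

    _held = threading.local()      # per-thread stack of ranks currently held

    def __init__(self, rank):
        self.rank = rank
        self._lock = threading.Lock()

    def __enter__(self):
        stack = getattr(self._held, "stack", None)
        if stack is None:
            stack = self._held.stack = []
        if stack and stack[-1] >= self.rank:
            # Refuse out-of-order acquisition instead of risking deadlock.
            raise RuntimeError(
                f"lock order violation: holding rank {stack[-1]}, "
                f"asked for rank {self.rank}")
        self._lock.acquire()
        stack.append(self.rank)
        return self

    def __exit__(self, *exc_info):
        self._held.stack.pop()
        self._lock.release()
        return False
```

Taking a rank-1 lock and then a rank-2 lock works; trying it the other way round raises immediately, which turns a potential deadlock into a bug you can see.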

--
Everything I need to know I learned by killing smart people and eating their brains.

Threads considered harmful by richieb · 2007-12-17 06:25 · Score: 4, Interesting

Check out this article on O'Reilly's site. Threads are actually very low-level constructs (like pointers and manual memory management). Accordingly, the future belongs to languages that eliminate threads as a basis for concurrency. See Erlang and Haskell.

--
...richie - It is a good day to code.

I may not like the implications for my code... by AndyCR · 2007-12-17 06:27 · Score: 1

But I sure like my newfound ability to compile multiple source files at once and finish a 5-files-changed compile in a few seconds.

--
If there's anyone I hate more than stupid people, it's intellectuals.

Re:Wow, this is a great idea! by somersault · 2007-12-17 06:31 · Score: 1

I've always thought that, didn't realise there was a law for it. People used to optimise everything way back when, but now I suspect that most people just let the faster processor take care of things rather than trying to squeeze every nanosecond of performance out of their apps :( At least graphics are still getting faster just because they're adding more parallel processors to the chips..

--
which is totally what she said

Re:Threads Are Not the Answer by caerwyn · 2007-12-17 06:31 · Score: 5, Interesting

This is very, very wrong. Data-set partitioning is certainly one way of achieving parallelism in programming, but it is hardly the only way- nor is it applicable to all domains, as many problems have solutions with too many inter-cell data dependencies. In addition, threads provide a wealth of benefits to application developers by allowing multiple unrelated tasks to be performed simultaneously.

There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization. It's not at all uncommon for overall performance to be improved by doing something *unrelated* at the same time, requiring less synchronization overhead.

Are threads perfect for everything? No. But calling them the second worst thing to happen to computing is, at best, disingenuous.

--
The ringing of the division bell has begun... -PF

While that's true by Sycraft-fu · 2007-12-17 06:32 · Score: 1

The bigger problem is that which the article mentioned: That programmers don't know how to take advantage of the parallel cores. There are two major parts to this:

1) Just because a given algorithm can't be implemented multi-threaded, doesn't mean there isn't another algorithm that does the same thing that can. So part of it is learning new ways of doing old things, or inventing new ways of doing things (we haven't discovered every possible algorithm).

2) Rethinking program design so that even though a given algorithm may be a single thread, many of them can run in parallel. As a simple example say you have a program that processes audio. Rather than having it process one track completely, then move on to the next, then mix them all when it is done you have it process each track in a different thread at the same time, then hand off the mixing to yet another thread (most DAWs work this way).
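
The audio example in point 2 could be sketched roughly like this in Python. Everything here is illustrative: `apply_gain` is a trivial stand-in for real per-track processing, and for simplicity the mix runs in the calling thread rather than in a thread of its own.

```python
from concurrent.futures import ThreadPoolExecutor

def apply_gain(track, gain):
    """Stand-in for per-track processing (effects, EQ and so on)."""
    return [sample * gain for sample in track]

def mix(tracks):
    """Sum the processed tracks sample by sample."""
    return [sum(samples) for samples in zip(*tracks)]

def render(tracks, gains):
    # Each track is independent, so each one can be processed as its own task.
    with ThreadPoolExecutor() as pool:
        processed = list(pool.map(apply_gain, tracks, gains))
    # Mixing needs every track, so it only starts once all tasks are done.
    return mix(processed)
```

In CPython, threads running pure-Python number crunching like this will not actually spread over several cores; swapping in `ProcessPoolExecutor` or using native DSP code would be needed for a real speedup. The structure of the program is the point of the sketch.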

Nobody is saying it is easy (or at least nobody who understands it) but that also doesn't mean it is impossible. I fully agree, there are things where each step is dependent on the previous step and there is simply no way to do two steps in parallel. However, I bet those are much rarer than you might first think, especially in the scheme of a whole program and not just a single algorithm in a program.

Thus programmers face the task of learning how to deal with this, both in terms of program design, new algorithms, and hopefully better compilers to help. It seems as though multi-core is the way of the future at least for a while, so just saying "Well we can't make this parallel," may not be an option.

This has been coming for a while. by jskline · 2007-12-17 06:37 · Score: 4, Interesting

The fact is that programming by and large has gotten lazy, shiftless and sloppy over time and not any better or faster. They really did rely on processing and memory architectures getting faster to overcome their coding bottlenecks. The words "optimized code" have little or no significance in today's programming shops because of budgets. Because of the push to get stuff out the door as quickly as possible, corners are cut all over the place on many things.

There once was a time when debugging was part of your job. Now someone else does that, and at most, the better coders do some unit testing to ensure their code snippet does what it is supposed to. There generally isn't any "standard" with regard to processes except in some houses that follow *recommended coding guidelines*, but these are few and far between. Old school coders had a process in mind to fit a project as a whole and could see the end running program. Many times now, you are to code an algorithm without any regard or concept as to how it might be used. A lot of strange stuff going on out there in the business world with this!

If there is a fundamental change in the base for C++, et al., this could have a detrimental effect on the employment market: there will be many who cannot conceptualize multi-threading methodologies, much less model some existing processing in this paradigm, and who will leave the market.

I left the programming markets because of the clash of bean counters vs quality, and maybe this will have a telling change in that curve. I always did enjoy some coding over the years and maybe this would make an interesting re-introduction. I have personally not coded in a multi-threading project but have the concepts down. Might be fun!

--
All content in this message is copyright (c) 2008. All rights reserved. RIAA is prohibited here.

Re:This has been coming for a while. by crusty_yet_benign · 2007-12-17 08:55 · Score: 1

I have personally not coded in a multi-threading project but have the concepts down.

I admire your can-do attitude, but take your assuredness with a grain of salt. In theory, there's no difference between theory and practice. In practice, there is.

There once was a time when debugging was part of your job. Now someone else does that

?! Wherever that is, I'm glad I don't work there. My guys still have to debug.
Re:This has been coming for a while. by CodeBuster · 2007-12-17 09:56 · Score: 1

The fact is that programming by and large has gotten lazy, shiftless and sloppy over time and not any better or faster.

Nothing could be further from the truth, if anything the practice of programming has improved dramatically over the years and especially so in the last fourteen (14) years starting with the publication of the Design Patterns book in 1994 and continuing on through the development of better languages to support these methodologies such as Java, Python, and Delphi and really culminating thus far IMHO in the development of the C# programming language and the .NET Framework (which owes a debt to Java which owes a debt to C++ and so on back down the chain). If you haven't done much programming, particularly in the last five years then perhaps it is time for a second look, the state of the art in programming has definitely accelerated in recent years and it is heading in the right direction.

They really did rely on processing and memory architectures getting faster to overcome their coding bottlenecks.

Excuse me, but that has not been the case in my experience. There have always been poor implementations and even poor programmers, but that doesn't mean that the entire system was broken. In fact, the really poor algorithms and implementations tend to remain poor no matter how much processing power or memory you throw at them which is why good programming is orders of magnitude more important than faster hardware or hand optimization of the compiler output (which BTW hasn't been necessary for many years now thanks to the tremendous improvements in compiler construction). If your implementation sucked then no amount of hardware, within reason, was going to save you from a competitor who could just as well use the same hardware and still beat you with a more efficient program.

There once was a time when debugging was part of your job. Now someone else does that, and at most, the better coders do some unit testing to ensure their code snippet does what it is supposed to.

I don't know about you, but I use the debugger almost every day and every other developer that I have met also uses the debugger extensively. The debugger and debug ability is really not an issue with any halfway decent programmer. I would say that formal unit and integration testing are somewhat less common in my experience, but there are tools and frameworks (open source and good quality) for those tasks as well.

There generally isn't any "standard" with regard to processes except in some houses that follow *recommended coding guidelines* but these are few and far between.

It may surprise you to learn that other engineering disciplines, especially when one takes a more international perspective on the issue, do not widely agree upon "standards" either or even if they do there are always dissenters. The software industry used to be notoriously bad on this count but again things have improved dramatically on this issue, especially in the last decade or so.

Old school coders had a process in mind to fit a project as a whole and could see the end running program.

Which was really an inferior way to approach the problem because it resulted in lots of cut and paste duplication of boilerplate code blocks across multiple projects with only coincidental variations. Did you ever have the nagging feeling, after writing the same subtle variation of some code block for the nth time, that maybe you could abstract that part of the problem out? This was not generally possible in the days before object oriented programming and design, but now that it is I don't know anyone who would want to go back to the old functional batch style programming days.

Many times now, you are to code an algorithm without any regard or concept as to how it might be used.

In fact this is exactly what you want to do. Abstraction is your friend. Why
Re:This has been coming for a while. by Estanislao Martínez · 2007-12-17 10:25 · Score: 1

The words "optimized code" have little or no significance in today's programming shops because of budgets. Because of the push to get stuff out the door as quickly as possible, corners are cut all over the place on many things.

I suspect you overestimate the value of "optimized code."
I have personally not coded in a multi-threading project but have the concepts down.

Famous last words. Care to tell us about the relative advantages and disadvantages of locks + shared memory concurrency, message-passing concurrency (both in asynchronous and synchronous flavors), and software transactional memory? This stuff ain't easy at all.

--
Are you adequate?
Re:This has been coming for a while. by Estanislao Martínez · 2007-12-17 10:57 · Score: 1
Nothing could be further from the truth, if anything the practice of programming has improved dramatically over the years and especially so in the last fourteen (14) years starting with the publication of the Design Patterns book in 1994 and continuing on through the development of better languages to support these methodologies such as Java, Python, and Delphi and really culminating thus far IMHO in the development of the C# programming language and the .NET Framework (which owes a debt to Java which owes a debt to C++ and so on back down the chain).

Um, no. You're failing to see how the things you mention have held software engineering back. Design patterns are part of a broader trend to slavish adherence to "Object-Oriented" technologies whose value and philosophy is overrated.
Since 1994, garbage collection has become commonplace, which is good. We've gotten languages better than C for general programming, which is at best faint praise, given that C is a language with many shortcomings, that was applied way beyond where it's actually good, back when computers weren't as powerful as they now are. But there are a number of things that simply haven't happened, because of the reign of "Object-Oriented" programming:
1. Paradigm improvements in relational database technology. We've had plenty of incremental improvements, but we're still largely stuck with SQL and ORM (the latter of which is a "solution" for problems caused by slavish application of OOP). (LINQ in .NET is certainly addressing this, but I can't really say much intelligent about it.)
2. Advanced and commercially feasible functional programming environments and technologies.
3. Advanced and commercially feasible tools for concurrent and parallel programming; which is precisely the problem that this story is about. The languages and techniques you mention and lionize in your comment do nothing to make it easier to build concurrent software. Threads, locks, atomic memory operations and memory barriers are way too low-level and complicated to make the task of building concurrent software easier.
--
Are you adequate?
Re:This has been coming for a while. by CodeBuster · 2007-12-17 13:18 · Score: 1

Design patterns are part of a broader trend to slavish adherence to "Object-Oriented" technologies whose value and philosophy is overrated.

Fine, you are entitled to your opinion after all, but be aware that those advocating a complete scrapping of the object oriented paradigm, which you seem to be advocating, are in the minority.

Since 1994, garbage collection has become commonplace, which is good.

It has its pluses and minuses IMHO.

We've gotten languages better than C for general programming, which is at best faint praise, given that C is a language with many shortcomings, that was applied way beyond where it's actually good, back when computers weren't as powerful as they now are.

That is sort of an apples and oranges comparison. You understand that the original purpose of the C programming language was the construction of operating systems, namely the research project which became UNIX at the original Bell Labs, right? I haven't used C for some time now, but I would probably not consider using it again unless there was a very good project based reason to do so. Another reason that C tends to hang around is that C compilers have been written for just about every modern processor assembler language out there which is one reason, among others, that you do not see many operating system kernels written in languages other than C. I would argue that C was tremendously successful at accomplishing what its designers set out for it to do. What people have used it for subsequently does not in any way diminish the work of the original designers. In fact, IMHO it enhances it substantially. Many of the most popular modern programming languages owe a debt to C; it blazed the trail for just about everything that came afterwards (although C itself borrowed some concepts from previous languages, notably the ALGOL family of languages).

But there are a number of things that simply haven't happened, because of the reign of "Object-Oriented" programming:

I shall attempt to address your points in turn:

Paradigm improvements in relational database technology. We've had plenty of incremental improvements, but we're still largely stuck with SQL and ORM (the latter of which is a "solution" for problems caused by slavish application of OOP). (LINQ in .NET is certainly addressing this, but I can't really say much intelligent about it.)

Personally I don't see anything wrong with building adapters, which is really what an ORM is. You see adapters all of the time in other engineering solutions (your laptop battery power adapter for example). It is neither possible, nor even desirable IMHO, for a single model, format, or standard to be used across everything just to avoid having to build adapters. Adapters allow flexibility whereas what you are proposing will lock everyone into a "standards" battle that is essentially unwinnable because people will not agree or perhaps they cannot agree (there are no silver bullet solutions that solve all problems after all). The same is true whether we are talking about wall electrical current (i.e. 110 V at 60 Hz in the United States but 220 V at 50 Hz in Europe) or software programming.

As for LINQ I personally do not plan to use it in my projects mostly because it makes use of classes dynamically generated at runtime upon which it is not possible to make use of declarative programming with attributes. I also do not like the idea of mixing my business logic with my data access in partial classes. I understand why LINQ is being introduced (i.e. not every project should use the heavy duty solution regardless of size), but its use could be problematic. Perhaps I will change my opinion once I try it out in a sample application which does some real world task, but I doubt it.

Advanced and commercially feasible functional programming environments and technologies.

So what? The same can be said for object oriented programming environments
Re:This has been coming for a while. by Estanislao Martínez · 2007-12-17 16:47 · Score: 1

Fine, you are entitled to your opinion after all, but be aware that those advocating a complete scrapping of the object oriented paradigm, which you seem to be advocating, are in the minority.
Who's advocating a complete scrapping of anything? I said that the paradigm is overrated, not worthless.

That is sort of an apples and oranges comparison. You understand that the original purpose of the C programming language was the construction of operating systems, namely the research project which became UNIX at the original Bell Labs, right? I haven't used C for some time now, but I would probably not consider using it again unless there was a very good project based reason to do so. Another reason that C tends to hang around is that C compilers have been written for just about every modern processor assembler language out there which is one reason, among others, that you do not see many operating system kernels written in languages other than C. I would argue that C was tremendously successful at accomplishing what its designers set out for it to do. What people have used it for subsequently does not in any way diminish the work of the original designers.
And again, you're not actually contradicting anything I actually said. It is not an apples to oranges comparison because people still use C to write, say, desktop apps, in this day and age.
Certainly C achieved its goals back in the 70s, and even well into the 80s, when people had to program for very resource-constrained hardware, using the same hardware as the development platform. But that was 20-30 years ago. Even for portable systems programming, C's prime problem area, we should be able to do a lot better today.

Personally I don't see anything wrong with building adapters, which is really what an ORM is.
There's no problem with building adapters if the adapters are truly needed. Implicitly, I'm arguing that many, many programs that deal with the database today would be better served by ditching OO for data representation, and use the relational model directly.
My point is that we should have general-purpose programming languages with support for the relational model, where relation types, relations, relational operators, relational constraints and the like, are all first-class concepts. The closest we have right now in the mainstream is LINQ.

As for LINQ I personally do not plan to use it in my projects mostly because it makes use of classes dynamically generated at runtime upon which it is not possible to make use of declarative programming with attributes. I also do not like the idea of mixing my business logic with my data access in partial classes.
That seems to indicate that you see the database as a "data store," providing persistence for data, while your program provides "business logic." This assumption just begs the question; my argument is that we should have better support for the relational model right in the language, so that we can use the database as a kind of restricted theorem prover optimized for high performance and high data volumes. The database isn't just an inert collection of data that you "access"; you give the database certain assertions about your model, the business logic and the facts; then you ask it complex questions, and it gives you the answers that the inputs entail. You're not dealing with a data store; you're dealing with a semantic repository of facts.
Why can't we do stuff like that today? Because our RDBMSs aren't good enough, and neither are our general-purpose programming languages (functional and OOP both). And nobody's seriously investing in making it happen.
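
A tiny illustration of the "assertions first, then questions" view, using nothing more than Python's standard `sqlite3` module. This is plain SQL, so it shows the status quo rather than the language-level relational support being argued for; the `employee` table and its rows are invented for the example.

```python
import sqlite3

def facts_and_questions():
    db = sqlite3.connect(":memory:")
    # Assertions about the model: names are unique, salaries are positive.
    db.execute("""
        CREATE TABLE employee (
            name    TEXT PRIMARY KEY,
            manager TEXT REFERENCES employee(name),
            salary  INTEGER NOT NULL CHECK (salary > 0)
        )""")
    # Facts.
    db.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("ann", None, 100), ("bob", "ann", 120), ("cy", "ann", 90)])
    # A "fact" that contradicts the assertions is rejected by the database.
    try:
        db.execute("INSERT INTO employee VALUES ('dee', 'ann', -5)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    # A question: who earns more than their own manager?
    rows = db.execute("""
        SELECT e.name
        FROM employee AS e
        JOIN employee AS m ON e.manager = m.name
        WHERE e.salary > m.salary
        ORDER BY e.name""").fetchall()
    db.close()
    return rejected, rows
```

The program never loops over records itself; it states constraints and facts, then asks for what follows from them.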

So what? The same can be said for object oriented programming environments and technologies. Now obviously you don't believe that the benefits are worth the cost of admission, but that is sort of like arguing that we should neve

--
Are you adequate?
Re:This has been coming for a while. by Buck2 · 2007-12-17 20:40 · Score: 1

you guys talk a lot

--

As my father lik@(munch munch)... ....

Microsoft & 8+ x cores by DrYak · 2007-12-17 06:38 · Score: 1

For the very first time, CPUs with 8 or more cores will enable Windows users, exclusively, to run an entire extensive botnet spitting out spam... all of it running on 1 single multicore CPU.

thank you, Microsoft !

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

Re:I'm sure no one will ever read this, but by ThreeIfByAir · 2007-12-17 06:43 · Score: 1

No. Read the article and/or the thread. This isn't about interfacing with the hardware; it's about how you designed your algorithm in the first place. All that MS could do in the OS, they have pretty much done already. Now it's the application developer's job to work out how to split their program up into little pieces that can all run at once.

That said, there is some mileage in compilers figuring out the parallelism and doing it themselves; but I've seen some attempts at it and they really didn't help all that much, because the real parallelism win is generally at a coarser level than a compiler can easily pick out (i.e. it's beyond the bounds of one function). Compiler technology is going to have to change fairly radically (with a lot of intelligence moved to the linker) if we are really going to see a significant payoff from automatic compiler-driven parallelism. I think it could happen, but it's probably a way off yet.

Re:Oh, wow by thebjorn · 2007-12-17 06:44 · Score: 1

Methinks you have no idea what you're talking about. Perhaps you should google for Herb Sutter?

It's the OS Stupid !!! by deweycheetham · 2007-12-17 06:44 · Score: 2, Insightful

It's the OS Stupid, Not Parallel Programming !!!

Just because the latest and greatest release of a New OS by a certain vendor is dog slow doesn't mean it's time to start blaming Programmers and calling them LAME.

There are several good Operating Systems out there that handle multiple threads on multi core machines just fine. They even do this in their basic scripting languages native to those Operating Systems and many have been doing so since the '70s.

There are techniques out there that handle work just fine in Parallel Program/Core Environments. On a side note, Data Encapsulated Object Oriented techniques are not always the best way to handle performance issues. A look back in time has several answers to this question and more. (Lest We Forget)

--- Old engineers never die, they just build away. (By deweycheetham) ---

Re:It's the OS Stupid !!! by chuck97224 · 2007-12-17 07:13 · Score: 1

I agree. The OS code is the core (no pun intended) of the problem, not application programs. For example, application programmers normally don't write sort algorithms. They simply call a "sort" routine provided by library code. Usually these library routines are specific to a particular OS (e.g. dot net).

The bottom line: fix the OS and its related libraries and you'll fix most of the problem described in the article. Microsoft knows this. That is why they are so involved with this issue.

-- chuck

Re:concurrency - the developer's responsibility? by Kjella · 2007-12-17 06:45 · Score: 1

Well, I'd guess he's thinking about the classic scheduling and concurrency problem of many processes trying to run on few cores. While it's surely interesting and OS's handle it slightly differently, it's a known problem dating at least as far back as the first multi-user unix machines. Now you have the problem of having one big task and many cores, which isn't really new (you had beowulf clusters, SMP machines etc. before) and which always required the programmer to specifically design for the work to be broken down and reassembled. The OS will never do that for you; in the general case it's a really ugly problem anyway - at which point is the overhead so large, you'd rather do it locally? If I'm sorting a list of ten items it may not make sense at all. A million and I might split it on all cores. A billion and I'd send it off to a server farm - if I have the bandwidth. Asking programmers to answer conditionals like that (and that may easily change at any moment) is just begging for trouble. But no, the OS doesn't have a clue either.
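
To show what that kind of conditional looks like in code, here is a hedged Python sketch: sort small inputs in place, and only split, sort in a pool and merge once the input is big enough to be worth it. `PARALLEL_THRESHOLD` is a made-up number; picking it well (and keeping it right as hardware changes) is exactly the hard part described above. The server-farm tier is left out.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 100_000   # hypothetical cutoff, not a measured value

def sort_items(items, workers=4, threshold=PARALLEL_THRESHOLD):
    # Small inputs: the overhead of splitting would outweigh any gain.
    if len(items) < threshold or len(items) < 2:
        return sorted(items)
    # Large inputs: cut into roughly equal chunks, one per worker.
    size = -(-len(items) // workers)           # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Gather step: merge the sorted chunks back into one sorted list.
    return list(heapq.merge(*sorted_chunks))
```

As written this uses threads, which in CPython mostly demonstrates the structure; a process pool would be the usual choice if the goal is to keep several cores busy.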

--
Live today, because you never know what tomorrow brings

Implementation of critical productivity tools by Namlak · 2007-12-17 06:48 · Score: 1

Exactly what areas of "personal computing" are requiring this horsepower?
Two words: Clippy Core

remember 1bit DACs? by johnrpenner · 2007-12-17 06:49 · Score: 1

in early CD players -- there was a shift away from 16 - 24 - 48 bit DACs when someone came up with the idea to use a 1 bit DAC 16 times as fast -- the same thing is happening now with cores on a CPU -- somewhere, someone is going to think of a CPU with 4096 -- 16383 -- and yes, even 32,768 discrete 1 bit cores (!!) -- it is going to take someone substantially changing the kernel scheduler to take advantage of it -- but there WILL be thousands of cores in a single CPU within the next ten years -- but it will require a rethink in the kernel scheduler, and programmers in their habits to take advantage of it.

Re:remember 1bit DACs? by H0p313ss · 2007-12-17 07:31 · Score: 1

someone is going to think of a CPU with 4096 -- 16383 -- and yes, even 32,768 discrete 1 bit cores (!!)

That is either the most brilliant, or the stupidest idea ever... and I can't make up my mind which

Perhaps my mind would work better if I could address more than a byte at a time? (Or a bit? Damn my head hurts...)

--
XML is a known as a key material required to create SMD: Software of Mass Destruction

Re:melt in your mouth not in your mobo by Grandiloquence · 2007-12-17 06:50 · Score: 1

So, what you're saying is the sun is a beowulf cluster of 25Ghz Pentiums?

Re:concurrency - the developer's responsibility? by AndyCR · 2007-12-17 06:51 · Score: 1

So the OS is supposed to magically know how to poke its fingers into a process and divide the workload of that process among multiple cores/CPUs without incurring performance loss or data corruption? If you know how, many people would sure be happy to know your secret.

--
If there's anyone I hate more than stupid people, it's intellectuals.

There's not much hope for the C++ committee by Animats · 2007-12-17 06:51 · Score: 3, Insightful

I have little hope for the C++ standards committee. It's dominated by people who think really l33t templates are really cool. Everything has to be a template feature. They're fooling around with a proposal for declaring variables atomic through something like atomic<int> n; This allows really l33t programmers to write really l33t code using really l33t lockless programming. But without the proofs of correctness needed to make that actually work reliably.

It's also long been Stroustrup's position that concurrency is a library problem. As long as the OS provides threads and locking, it's not a language problem. This isn't good enough.

The fundamental problem is that, as currently defined, a C++ compiler has no idea which variables are shared between threads, and which are never shared. The compiler has no notion of critical sections. Fixing this requires some fundamental changes to the language. It's known what to do; Modula, Ada, and Java all have synchronization and isolation built into the language. But there's nothing like that in C++, and the designers of C++ don't want to admit their mistakes.

It's not just a C++ problem. Python has a similar issue. Python as a language doesn't deal with concurrency adequately. The main implementation, CPython, has a "global interpreter lock" that slows the thing down to single-CPU speed.

Re:There's not much hope for the C++ committee by pipacs · 2007-12-17 08:24 · Score: 1

Not sure about the C++ problems you mention, but in Python there is Kamaelia and a dozen other libraries aimed at creating scalable parallel systems.

Btw. an earlier post mentions the upcoming Qt concurrency framework - if Trolltech is able to pull this off on all C++ platforms they support, then it kind of justifies Stroustrup's position, doesn't it?
Re:There's not much hope for the C++ committee by master_p · 2007-12-17 09:39 · Score: 1

Critical sections are a high-level feature which must be in a library. What C++ requires is a standard for atomic operations... just like the volatile keyword disallows optimizations on a memory store, there is a need for atomically reading, writing, testing and setting variables.

Of course, each O/S provides its own set of routines (e.g. Interlocked functions of Windows), but it would be better if the language provided the primitives for atomic operations.
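For readers coming to this later: primitives of this kind were standardized in C++11 as std::atomic and std::atomic_flag. A minimal sketch of atomic read, write, increment, and test-and-set follows. The names hits, busy, count_hits, try_claim, and release are made up for illustration and are not from any post in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> hits{0};                   // counter that can be read and written atomically
std::atomic_flag busy = ATOMIC_FLAG_INIT;   // test-and-set primitive, initially clear

// Each worker bumps the shared counter without taking a lock.
int count_hits(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    hits.store(0);                          // atomic write
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i)
                hits.fetch_add(1);          // atomic read-modify-write
        });
    for (auto& th : pool) th.join();
    return hits.load();                     // atomic read
}

// Returns true only for the caller that flips the flag from clear to set.
bool try_claim() { return !busy.test_and_set(); }
void release()   { busy.clear(); }
```

With a plain int in place of std::atomic<int>, the increments from different threads could be lost; that is the gap the OS-specific Interlocked routines used to fill.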
Re:There's not much hope for the C++ committee by Animats · 2007-12-17 10:01 · Score: 2, Insightful

Critical sections are a high-level feature which must be in a library.
The problem is that a C++ compiler doesn't know what data is locked, and which data items are locked by which lock, because the language has no way to talk about that subject. OS-level primitives lock everything. The compiler has a hard time telling which data needs concurrency protection. Thus, the compiler can't diagnose race conditions.
If the language understood locking, one could do more checking at compile time. One could take a hard-nosed approach. Every variable has to be locked by something. Either it's locked by the object of which it is a member (like Java's "synchronized"), or the thread to which it is local, or by some other object which owns the variable. This last is something for which a language needs descriptive syntax.
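A library can approximate the "every variable is locked by something" rule, though only by convention rather than by compiler checking. The sketch below assumes a hypothetical wrapper called Guarded whose value is reachable only while its mutex is held; Guarded, with, and guarded_total are illustrative names, not an existing API.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper: the value can only be reached while its mutex is held.
template <typename T>
class Guarded {
    std::mutex lock_;
    T value_;
public:
    explicit Guarded(T init) : value_(std::move(init)) {}

    // Runs fn with exclusive access to the value and returns fn's result.
    template <typename Fn>
    auto with(Fn fn) -> decltype(fn(std::declval<T&>())) {
        std::lock_guard<std::mutex> hold(lock_);
        return fn(value_);
    }
};

// Several threads increment one guarded counter; returns the final value.
int guarded_total(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    Guarded<int> total(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&total, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                total.with([](int& v) { return ++v; });
        });
    for (auto& th : pool) th.join();
    return total.with([](int& v) { return v; });
}
```

Nothing stops a caller from leaking a pointer to the value out of the lambda, which is exactly the kind of hole that language-level support would be needed to close.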
One approach would be syntax where the programmer declares a critical section, and lists everything that can be referenced within the critical section. But that might not be necessary. A system more like the way an SQL database decides transaction locking issues might be easier on the programmer.
The big memory headache in C and C++ is always "who owns what", something with which the language provides no assistance. That's the cause of dangling pointers and memory leaks, but it's also the cause of much locking trouble.
Re:There's not much hope for the C++ committee by master_p · 2007-12-17 21:44 · Score: 1

No, no, no, you got it all wrong!

First of all, no language knows what data is locked at compile time. The problem of deciding what is locked and what is not is undecidable, just like everything else that submits to the halting problem. There is no compiler that can decide when something happens in a program and when it does not.

Secondly, a keyword like 'synchronized' does not buy you anything: it's just syntactic sugar for pthread_mutex_lock/pthread_mutex_unlock, and it can easily be emulated with a macro in C++:

#define synchronized for(Lock l; l.isLocked(); l.unlock())
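The macro only works if a suitable Lock class exists, and the post does not show one. Here is one way it might be fleshed out, assuming a Lock that wraps std::mutex and a macro that takes the mutex as an argument. Both are assumptions for illustration, not the poster's actual code.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: locks on construction, unlocks at most once.
class Lock {
    std::mutex& m_;
    bool held_;
public:
    explicit Lock(std::mutex& m) : m_(m), held_(true) { m_.lock(); }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
    ~Lock() { if (held_) m_.unlock(); }   // covers break, return, and exceptions
    bool isLocked() const { return held_; }
    void unlock() { m_.unlock(); held_ = false; }
};

// The for-loop runs its body exactly once: lock, body, unlock, exit.
#define synchronized(mtx) for (Lock lock_(mtx); lock_.isLocked(); lock_.unlock())

// Several threads increment a plain int inside synchronized blocks.
int synchronized_count(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::mutex m;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i)
                synchronized(m) { ++counter; }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

The destructor matters: without it, leaving the block early through break or an exception would skip the loop's unlock step and leave the mutex held.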

Thirdly, it's very easy to make a Java program deadlock using the 'synchronized' keyword. For example, let's say we have two classes that recursively call each other from different threads...instant deadlock.
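The deadlock described here is the classic lock-order inversion: thread 1 holds A and wants B while thread 2 holds B and wants A. A C++ sketch of the same two-lock situation, together with one common mitigation (acquiring both locks through std::lock), might look like this. Account, transfer, and run_opposing_transfers are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Account {
    std::mutex m;
    long balance;
    explicit Account(long b) : balance(b) {}
};

// std::lock acquires both mutexes with a deadlock-avoidance algorithm,
// so the order of the arguments does not matter.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> a(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> b(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads move money in opposite directions; returns the combined balance.
long run_opposing_transfers(int rounds) {
    Account x(1000), y(1000);
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(y, x, 1); });
    t1.join();
    t2.join();
    return x.balance + y.balance;
}
```

Replacing the std::lock call with two separate lock() calls in argument order would reintroduce the hazard, since the two threads would then take the mutexes in opposite orders.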

The key to multithreading programming is to avoid locking altogether. It's possible to do it, using atomic functions at predefined places.

Fourthly, your solution on declaring items within a critical section does not work (see #1 for explanation). For example, what if you declare a pointer inside the critical section, and then deep in an algorithm, or worse, deep in a DLL loaded at run-time, the pointer is assigned a value which points to a non-synchronized object?

#5, in databases, the most common approach is optimistic locking: you tell the database that you want to update X if Y's value is Z. That's the solution for multithreading as well...it's called acquire semantics, and PowerPC CPUs even support it at hardware level. Atomic test and set instructions are a similar approach.
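In code, "update X only if it still holds the value I saw" is usually written as a compare-and-swap retry loop. A minimal sketch using std::atomic (standardized in C++11, after this thread) is given here; atomic_max and max_across_threads are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Raises `target` to at least `candidate` without a lock.
// compare_exchange_weak writes only if target still holds `seen`;
// on failure it reloads `seen` with the current value and the loop retries.
void atomic_max(std::atomic<int>& target, int candidate) {
    int seen = target.load();
    while (seen < candidate &&
           !target.compare_exchange_weak(seen, candidate)) {
        // another thread changed target; `seen` now holds the fresh value
    }
}

// Thread t offers the values t*1000 .. t*1000+999; returns the overall maximum.
int max_across_threads(int threads) {
    assert(threads > 0);
    std::atomic<int> best{0};
    std::vector<std::thread> pool;
    for (int t = 1; t <= threads; ++t)
        pool.emplace_back([&best, t] {
            for (int i = 0; i < 1000; ++i) atomic_max(best, t * 1000 + i);
        });
    for (auto& th : pool) th.join();
    return best.load();
}
```

No thread ever blocks here, but a thread can be forced to retry any number of times under contention, which is the usual trade-off of the optimistic approach.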

In databases, in 99% of cases, locking happens only in batch procedures within the database. Outside of it, locking something (a row, or, even worse, a table), is just asking for trouble.

Finally, the question 'who owns what' in a programming language shouldn't exist: nobody should own anything, freeing the programmer from worrying about ownership. This can happen only with garbage collection.
Re:There's not much hope for the C++ committee by shutdown+-p+now · 2007-12-17 23:21 · Score: 1

Of course, each O/S provides its own set of routines (e.g. Interlocked functions of Windows), but it would be better if the language provided the primitives for atomic operations.
I'm not sure why GP was ranting about the ISO C++ committee, since that's precisely what they are introducing in C++09.
Re:There's not much hope for the C++ committee by Animats · 2007-12-18 04:56 · Score: 1

First of all, no language knows what data is locked at compile time. The problem of deciding what is locked and what is not is undecidable, just like everything else that submits to the halting problem. There is no compiler that can decide when something happens in a program and when it does not.
Oh, another "halting problem" bogus answer. There are languages where the compiler knows what's locked at compile time. The Ada "rendezvous" approach knows, for example.
Secondly, a keyword like 'synchronized' does not buy you anything:
...
Thirdly, it's very easy to make a Java program deadlock using the 'synchronized' keyword. For example, let's say we have two classes that recursively call each other from different threads...instant deadlock.
Yes. And that's a good thing. Deadlocks show up early in debugging; race conditions show up late. This is, in fact, a classic source of programming problems - lack of a clear understanding of when control is "inside" an object. Java has the problem that there's no way to say that control is temporarily leaving an object. The Microsoft Spec# team, which is doing proofs of correctness for C#, has addressed this, though.
The key to multithreading programming is to avoid locking altogether. It's possible to do it, using atomic functions at predefined places.
"Predefined places"? I know there's a cult of "lockless programming", but that approach is very difficult to debug.
Fourthly, your solution on declaring items within a critical section does not work (see #1 for explanation). For example, what if you declare a pointer inside the critical section, and then deep in an algorithm, or worse, deep in a DLL loaded at run-time, the pointer is assigned a value which points to a non-synchronized object?
That's why compilers need to know more about what locks what. Tracing dependency issues is better done by programs than by people.
#5, in databases, the most common approach is optimistic locking: you tell the database that you want to update X if Y's value is Z.
Oh, a "compare and swap" fan. Actually, most of the better databases accumulate a list of what objects have been touched by a transaction, and if there's an intersection between the lists for two transactions, one of them has to be backed out. Even MySQL does this now. There's a proposal to provide similar transaction-like semantics for C++, and that might be a promising idea. It's more idiot-proof than most other approaches.
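The bookkeeping described here can be sketched in a few lines: each transaction records what it touched, and two transactions conflict when those records intersect. This is a deliberately simplified illustration. Real databases distinguish reads from writes and apply finer rules, and Transaction, overlap, and must_back_out are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical transaction record: the keys it has read or written so far.
struct Transaction {
    std::set<std::string> touched;
    void touch(const std::string& key) { touched.insert(key); }
};

// Returns the keys that both transactions have touched.
std::vector<std::string> overlap(const Transaction& a, const Transaction& b) {
    std::vector<std::string> shared;
    std::set_intersection(a.touched.begin(), a.touched.end(),
                          b.touched.begin(), b.touched.end(),
                          std::back_inserter(shared));
    return shared;
}

// A non-empty intersection means one of the two has to be rolled back and retried.
bool must_back_out(const Transaction& a, const Transaction& b) {
    return !overlap(a, b).empty();
}
```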
Finally, the question 'who owns what' in a programming language shouldn't exist: nobody should own anything, freeing the programmer from worrying about ownership. This can happen only with garbage collection.
The trouble with garbage collection is that it introduces indeterminism into otherwise deterministic programs. Garbage collection is OK if you can afford random stalls and don't use destructors. Destructors and GC do not play well together. Microsoft tried, in Managed C++, to make them play together, and it wasn't a happy marriage. There, destructors can be called more than once.
Reference counting does a good job of managing memory without disallowing destructors. It's noteworthy that Perl and Python, both of which are reference counted, generally allow programmers to ignore memory allocation. But most reference counting systems are "naive"; the compiler implementing them doesn't know enough about reference counts to optimize them. This leads to performance problems. One can do better; the compiler should hoist reference count updates out of loops.
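Both points can be seen with std::shared_ptr (standard since C++11; boost::shared_ptr existed when this was posted). In the sketch below the destructor runs at a predictable point, and the "hoisting" is done by hand: the loop borrows the pointer by const reference so the count is never touched inside it. Resource, sum_payload, and destructor_runs_deterministically are illustrative names.

```cpp
#include <cassert>
#include <memory>

struct Resource {
    bool* closed;                          // set when the destructor runs
    int payload;
    Resource(bool* flag, int p) : closed(flag), payload(p) {}
    ~Resource() { *closed = true; }        // runs as soon as the last owner goes away
};

// Taking the shared_ptr by const reference means the loop below
// performs no reference-count updates at all.
long sum_payload(const std::shared_ptr<Resource>& r, int times) {
    long total = 0;
    for (int i = 0; i < times; ++i) total += r->payload;
    return total;
}

// Returns true if the destructor had run by the time the owner left scope.
bool destructor_runs_deterministically() {
    bool closed = false;
    {
        auto r = std::make_shared<Resource>(&closed, 7);
        assert(r.use_count() == 1);
        long s = sum_payload(r, 3);        // no copies made, count stays 1
        assert(s == 21 && r.use_count() == 1);
        (void)s;                           // silence unused warning when asserts are off
    }                                      // last owner leaves scope here
    return closed;
}
```

Passing the shared_ptr by value instead would add an atomic increment and decrement per call, which is the kind of update a count-aware compiler could remove on its own.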
Re:There's not much hope for the C++ committee by master_p · 2007-12-18 11:13 · Score: 1

"Oh, another "halting problem" bogus answer. There are languages where the compiler knows what's locked at compile time. The Ada "rendezvous" approach knows, for example."
It's not bogus. Consider the following pseudo code:

class Node {
public:
    Node* left;
    Node* right;
    Node() : left(NULL), right(NULL) { }
    Node(Node *l, Node *r) : left(l), right(r) { }
};

Node *createTree(int depth) {
    if (depth == 0) return new Node();
    return new Node(createTree(depth - 1), createTree(depth - 1));
}

Node *getRandomNode(Node *tree, int n = 0);

BEGIN_CRITICAL_SECTION
Node *someNode;
END_CRITICAL_SECTION

void *threadProc1(void *p) {
    delete someNode;
    return NULL;
}

int main() {
    cout << "enter a number: ";
    int i;
    cin >> i;
    cout << "creating tree of depth " << i << endl;
    someNode = createTree(i);
    cout << "pick a node number\n";
    int n;
    cin >> n;
    someNode = getRandomNode(someNode, n);
    CreateThread(threadProc1);
}

In the above piece of code, there is no way for the compiler to know where someNode points to, because it's completely random, i.e. it depends on the input data.
The Ada rendezvous mechanism does not synchronize over data, it synchronizes on control flow. It's the responsibility of the programmer to provide the appropriate data. In other words, it's nothing more than Java's synchronized keyword.

Yes. And that's a good thing. Deadlocks show up early in debugging; race conditions show up late. This is, in fact, a classic source of programming problems - lack of a clear understanding of when control is "inside" an object. Java has the problem that there's no way to say that control is temporarily leaving an object. The Microsoft Spec# team, which is doing proofs of correctness for C#, has addressed this, though.
You never know when a deadlock will hit you. Sometimes you may run the same program 100s of times, and there is no deadlock, and then there are 100 times that it deadlocks. Again, due to the halting problem, there can be no definitive decision about which data of the program might have deadlocks; the same is valid for race conditions.

"Predefined places"? I know there's a cult of "lockless programming", but that approach is very difficult to debug.
No, it is not. I have written a small Actor implementation where the only locking is when a message is placed in an object's queue and when a thread is being placed on a future's waiting list. In my implementation, no thread is suspended, they all spin using Interlocked functions, so there can be no deadlock.

That's why compilers need to know more about what locks what. Tracing dependency issues is better done by programs than by people.
Don't insist on things that can't be done. Please go study the halting problem carefully. I used to think like you ("why is the compiler so stupid?"), but there is a reason behind all the problems. It's a property of the universe that there can be no proof by a Turing machine that another Turing machine can terminate; you see, all Turing machines are equivalent, so you can't prove something by itself (you need a higher system which provides proofs for the axioms of the subsystem).

Oh, a "compare and swap" fan. Actually, most of the better databases accumulate a list of what objects have been touched by a transaction, and if there's an intersection between the lists for two transactions, one of them has to be backed out. Even MySQL does this now. There's a proposal to provide similar transaction-like semantics for C++, and that might be a promising idea. It's more idiot-proof than most other approaches.
That's optimistic locking, and lockless thread synchronization is the same: a tr

Re:Programmers are at fault by K.+S.+Kyosuke · 2007-12-17 06:51 · Score: 1

I didn't notice myself knocking anybody for using any tools. It's just that even now, C++ seems to be complicated enough with quite a lot of dark places even without trying to adopt some concurrency model. That might qualify as knocking the C++ committee :-), but perhaps I'm not the only one to do so. I have a friend who, in the past few years, publicly displayed quite a lot of C++ zealotry. As he was doing quite a lot of hard-core programming in almost anything you can imagine for the past twenty years or so, his love for C++ stunningly contrasts with the insults I've recently heard from him, addressed to the C++ committee folks. I would very much like to see the qualities of C++ preserved and its pitfalls removed or compensated, but I fear this need not necessarily happen in the near future.

--
Ezekiel 23:20

Re:Threads Are Not the Answer by clampolo · 2007-12-17 06:53 · Score: 1

I agree that threads are a crappy way of doing parallel processing. It's better to just extend C/C++ with some message passing interface. That way if a programmer actually knows parallel processing, they can incorporate it into their code in a portable manner. And people that don't know any parallel programming can still write an inefficient version of the same thing. For all the hype, parallel programming isn't THAT difficult. After all, something like VHDL or Verilog is basically just parallel programming.
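To make the message-passing idea concrete, here is a small sketch of a one-way channel built from standard C++11 pieces. It is not a proposed language extension and not MPI; Channel, send, receive, and sum_via_messages are hypothetical names. The two threads share nothing except the channel itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical one-way channel: the only state two threads share.
template <typename T>
class Channel {
    std::mutex m_;
    std::condition_variable ready_;
    std::queue<T> items_;
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> hold(m_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    T receive() {                           // blocks until a message arrives
        std::unique_lock<std::mutex> hold(m_);
        ready_.wait(hold, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
};

// Producer sends 1..n followed by a 0 sentinel; consumer adds up what it receives.
long sum_via_messages(int n) {
    assert(n >= 0);
    Channel<int> ch;
    long total = 0;
    std::thread consumer([&] {
        for (int v = ch.receive(); v != 0; v = ch.receive()) total += v;
    });
    for (int i = 1; i <= n; ++i) ch.send(i);
    ch.send(0);
    consumer.join();                        // total is safe to read after the join
    return total;
}
```

The locking has not disappeared; it has moved inside the channel, so application code never names a mutex.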

The rise of Erlang and Haskell? by steveha · 2007-12-17 06:54 · Score: 2, Interesting

I know that languages like Erlang and Haskell are better for concurrent programming than more traditional languages. However, so far they have not been as popular as more traditional languages.

Will the new world of concurrency cause a shift in language popularity? Or will traditional languages remain more popular, perhaps with some enhancements? C++ is gaining concurrency enhancements; C++, Python, and many other languages work well with map/reduce systems like Google MapReduce; and even with no enhancements to the language, you can decompose larger systems into multiple threads or multiple processes to better harness concurrency.

If you know Haskell and Erlang, please comment: do those languages bring enough power or convenience for concurrency that they will rise in popularity? People grow very attached to their familiar languages and tools; to displace the entrenched languages, alternative languages need to not just be better, they need to be a lot better.

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely

Re:The rise of Erlang and Haskell? by Goalie_Ca · 2007-12-17 08:01 · Score: 1

I know Haskell well enough. I'm not a total n00b but I haven't written any programs in it with more than two hundred lines. I can say that concurrent Haskell is really easy and very efficient. The STM library was very well thought out and is really easy to use. Just wrap a series of transactions in `atomically` and you are set. Regular locked variables are done using MVar. Basically the locking is done automatically and the variable in question is wrapped by MVar.

Data parallel Haskell is where things break down. It isn't quite so mature but frankly this is a very difficult problem. Currently they are on the right track. Basically it is up to the programmer to annotate with `par` what they think should be done in parallel. I do not ever see this being done universally and automatically in any language. Clearly the programmer knows where things should be split.
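For readers who don't know Haskell, a rough C++ analogue of "the programmer marks what may run in parallel" is an explicit std::async call (C++11). This is only an analogy: `par` is a hint the runtime may ignore, whereas std::launch::async below really starts a thread. parallel_sum and the cutoff value are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Sums data[lo, hi). Above the cutoff, the left half is handed to
// std::async (the explicit "do this in parallel" mark) while the
// current thread handles the right half.
long parallel_sum(const std::vector<int>& data, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= data.size());
    const std::size_t cutoff = 10000;       // arbitrary grain size, tune per machine
    if (hi - lo <= cutoff)
        return std::accumulate(data.begin() + lo, data.begin() + hi, 0L);
    std::size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async, parallel_sum,
                           std::cref(data), lo, mid);
    long right = parallel_sum(data, mid, hi);
    return left.get() + right;
}
```

Choosing the cutoff is the part that needs the programmer's knowledge: too small and thread overhead dominates, too large and cores sit idle.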

--
Go canucks, habs, and sens!
Re:The rise of Erlang and Haskell? by John+C+Peterson · 2007-12-17 08:06 · Score: 2, Interesting

The purely functional approach has a lot of merit. But functional programming by itself probably won't solve the big problems. Erlang has a very specific approach to threads and parallelism that works very well when it's appropriate. A more general approach is taken in Concurrent Haskell (http://en.wikipedia.org/wiki/Concurrent_Haskell), in which a Software Transactional Memory (STM) replaces the lower level mechanisms such as locks that are so crucial to threaded programming. I expect the real breakthrough to occur when high level concurrency tools like STM come into use to replace the existing parallel programming framework of threads and locks. There was a time when everyone assumed that automatic memory management was "too high level / slow / buggy" to be practical in a real programming language but now most programmers are happy to build their programs without worrying about memory allocation. In the parallel world, threads and locks are the malloc / free of the past and something like STM could well be the basis for a higher level approach that will make concurrency a natural way to program.
Re:The rise of Erlang and Haskell? by autophile · 2007-12-17 09:47 · Score: 1

If you know Haskell and Erlang, please comment: do those languages bring enough power or convenience for concurrency that they will rise in popularity? People grow very attached to their familiar languages and tools; to displace the entrenched languages, alternative languages need to not just be better, they need to be a lot better.

I know some Erlang. The key behind Erlang's concurrency model is that there is no global state, which means that threading becomes easy. In addition, the threads are extremely light-weight, so context-switching between a bazillion threads is not a worry. Erlang encourages you to make everything a thread, because there is just no downside. And, it's been proven in a highly demanding environment -- Ericsson's telecom switches.
The biggest problem with Erlang -- which is more like a problem with programmers -- is that Erlang is a functional language. If you've been working in procedural languages for the last few decades, then getting your head around a functional language requires a lot of effort. Where you would normally fire off code that does this, then that, then loops this, then switch-cases that, then recurses, then does some more stuff, functional programming feels more like a punch in the nose than the solution to massive concurrency.
I think Erlang could be big Big BIG, but we would have to run up against a wall where the existing procedural programming paradigms are insufficient and very ungraceful when they are sufficient. I sense that this is why Java, JavaScript, Python, PHP, and Ruby came about. Not because you can't do the same thing in C, but because it hurts like hell when you try (imagine client-side code in web pages when you only have C to work with).
Now if there were a procedural-looking language with the features of Erlang, I think that would be an instant success.
--Rob

--
Towards the Singularity.
Re:The rise of Erlang and Haskell? by yermoungder · 2007-12-17 21:11 · Score: 2, Informative

You could try Ada.

Ada is a multi-paradigm language (i.e. procedural or OO) that has threads ("tasks") built in. The experience with Ada83 tasking wasn't brilliant - the OS/hardware available at the time just wasn't up to the job and was hopelessly expensive. This left a nasty taste for some which in turn led to FUD about the language as a whole - you wouldn't believe the rubbish I've heard over the years about what Ada is or is supposed to do!

Ada95 (and in particular the $0, Open Source GNAT compiler) changed that, making an affordable-for-the-masses, fast Ada environment available on GNU/Linux and Windows platforms. It now comes with an Eclipse plug-in too.

Now, Ada2005 has arrived which even extends OO into the domain of active objects (i.e. extensible, polymorphic tasks).
Re:The rise of Erlang and Haskell? by kaian · 2007-12-23 05:07 · Score: 1

If you know Haskell and Erlang, please comment: do those languages bring enough power or convenience for concurrency that they will rise in popularity?
They provide power and convenience, but was that ever enough to gain popularity? Still, functional languages do seem to gain in popularity, and for good reasons. They provide good answers to both concurrency and a lot of other problems that haunt imperative OO programming.

Re:I'm sure no one will ever read this, but by ACMENEWSLLC · 2007-12-17 06:59 · Score: 1

The OS needs to be non-blocking. I have multiple-core servers running 2000/2003/XP for which I've written programs that can take advantage of the multiple cores. I'm not doing anything fancy: I have a parent job with multiple child jobs. The parent is responsible for giving the children work. The children all run independently of each other. Thus, by the nature of Windows, the load is spread across the cores.
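The parent/children arrangement can be sketched portably with standard threads. This is not the poster's Windows code: run_children is a hypothetical name, and squaring a number stands in for whatever real work a child would do. The parent owns the job list, each child claims the next unclaimed job, and children never talk to each other.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Parent owns the job list; each child claims the next unclaimed index.
// Children are independent, so the OS is free to spread them over cores.
long run_children(const std::vector<int>& jobs, unsigned children) {
    assert(children > 0);
    std::atomic<std::size_t> next{0};
    std::vector<long> partial(children, 0);     // one result slot per child
    std::vector<std::thread> pool;
    for (unsigned c = 0; c < children; ++c)
        pool.emplace_back([&, c] {
            for (std::size_t i = next.fetch_add(1); i < jobs.size();
                 i = next.fetch_add(1))
                partial[c] += static_cast<long>(jobs[i]) * jobs[i];  // stand-in for real work
        });
    for (auto& th : pool) th.join();            // parent waits for all children
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

A reasonable starting point for the number of children is std::thread::hardware_concurrency(), though that function may return 0 when the value is not known.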

The problem I have is that I use some Windows APIs to enumerate things such as all the machines in the network, shares on a remote machine, NetBIOS info, etc. I've found that many of these calls are blocking. So when I attempt such a call, the entire machine slows to a crawl and APIs in other programs will not complete until the call has finished doing its thing. For example, one of these calls prevents you from unlocking the PC until it's complete.

I'm not an expert in this topic; these are empirical observations on my part. Perhaps Vista & 2008 server will resolve this.

I hate this topic... by sonofagunn · 2007-12-17 07:01 · Score: 1

... but I'm going to reply anyway.

Software that needs to be multithreaded today easily can be, and usually already is. There is no issue in the industry.

Please move along, nothing to see here.

Imagine that your core overheats while idling by Anonymous Coward · 2007-12-17 07:01 · Score: 1, Insightful

The heat transfer problem has not been solved for thick, 3D processors with millions of transistors.

MPI on the desktop? by rizzy · 2007-12-17 07:06 · Score: 1

There's an alternative to multi-threading. Will we see MPI used in more than just supercomputing applications? The message-passing vs. shared memory arguments are as old as the hills in high-end computing, with most (but not all) applications using message passing. As the memory hierarchy on personal computers looks more and more like NUMA, message passing starts to look really attractive, even when the parallelism is across a single node.

Re:MPI on the desktop? by ceoyoyo · 2007-12-17 18:36 · Score: 1

It's not that simple. Message passing is dominant on the really large systems simply because really fast interconnects that make shared memory work well are not practical or in many cases even possible on systems that big. On smaller computers that do have the really fast interconnects (which usually means that all the processors have to be in one box), shared memory is usually preferred.

Personally I hate MPI. Yes, I've worked with it. I think the whole thing is much more elegant and easier to program with threads that are properly encapsulated in objects.

Chip makers want developers to pay for lunch by ClosedSource · 2007-12-17 07:07 · Score: 2, Interesting

Instead of developing single-core chips with better performance, chip makers are now making multicore machines and expecting developers to provide the extra performance.

Without the work of developers, multi-core chips will be like the extra transistors in transistor radios in the 1960s: good for marketing but functionally useless.

Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-17 16:11 · Score: 1

What good is a 32-bit CPU without developers making 32-bit code? What good are MMX or SSE instructions if developers don't use them? No CPU is any good unless developers use the features it provides. It has *always* been a partnership. The reality is that in all practical situations, it will always be easier to make two chips than one twice as fast.
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-17 18:43 · Score: 1

"What good is a 32-bit CPU without developers making 32-bit code? What good are MMX or SSE instructions if developers don't use them?"

At the application level it's often not necessary to do something special to run 32-bit software.

The point is that the performance improvements of the past were primarily intended to boost new processor sales rather than do something for software developers. The fact that legacy software ran faster on a new processor is a great way to get people to upgrade their hardware, but really has no effect on software sales.
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-17 19:22 · Score: 1

"At the application level it's often not necessary to do something special to run 32-bit software."

In which case, you get basically no benefit.

"The point is that the performance improvements of the past were primarily intended to boost new processor sales rather than do something for software developers."

Of course. Intel and AMD make chips for consumers, not for developers.

"The fact that legacy software ran faster on a new processor is a great way to get people to upgrade their hardware, but really has no effect on software sales."

I think that's largely true, but I don't see what it has to do with anything.

The point I'm trying to make is that developers have always had to figure out how to get the most out of new CPUs. Optimizing for modern CPUs is very, very tricky even ignoring multi-core or multi-thread issues. For example, you can't just add up the number of clock cycles for each instruction on any modern x86 or x86-64 CPU. You can't easily anticipate cache effects, mispredicted branches, and any number of other things.
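A small experiment illustrates why counting instructions is not enough. The two traversals below execute the same number of additions over the same data and differ only in memory access order. sum_matrix and Timed are made-up names, and the sketch makes no promise about the timing ratio on any particular machine; it only gives a way to measure it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Timed { long sum; double millis; };

// Sums an n-by-n matrix stored row-major. by_rows == true walks memory
// sequentially; false strides by n elements, which tends to miss cache.
Timed sum_matrix(const std::vector<int>& m, std::size_t n, bool by_rows) {
    assert(m.size() == n * n);
    auto start = std::chrono::steady_clock::now();
    long sum = 0;
    for (std::size_t a = 0; a < n; ++a)
        for (std::size_t b = 0; b < n; ++b)
            sum += by_rows ? m[a * n + b] : m[b * n + a];
    std::chrono::duration<double, std::milli> took =
        std::chrono::steady_clock::now() - start;
    return {sum, took.count()};
}
```

Comparing the millis field of the two calls for a matrix larger than the cache is the interesting part; the sums are always identical.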

It has always taken massive work on the part of developers to get performance out of new processors. People forget this because there was a historically brief period where not quite so much as usual was required. It is amazing that Intel has kept the x86 instruction set basically the same for as long as it has.
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-18 07:17 · Score: 1

"The point I'm trying to make is that developers have always had to figure out how to get the most out of new CPUs. Optimizing for modern CPUs is very, very tricky even ignoring multi-core or multi-thread issues. For example, you can't just add up the number of clock cycles for each instruction on any modern x86 or x86-64 CPU. You can't easily anticipate cache effects, mispredicted branched, and any number of other things."

For the reasons you state, I'd say that such optimization is a fool's errand. It also shows why you shouldn't try to create real-time software to run on an x86 system.
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-18 11:28 · Score: 1

"For the reasons you state, I'd say that such optimization is a fool's errand."

Nonsense. Perfect optimization is a fool's errand. Claiming that any significant code can't be further optimized is a fool's errand. But sometimes small amounts of effort, in the right place, can have a huge payoff. That's hardly a fool's errand.

"It also shows why you shouldn't try to create real-time software to run on x86 system."

Nonsense. If the worst case behavior meets the real-time requirements, then you win. You can often get x86 systems whose worst case behavior is many times faster than non-x86 systems at the same price level. This is often the case because x86 hardware is made in massive quantities. Why do you think the Hubble space telescope uses x86 processors?
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-18 18:17 · Score: 1

"If the worst case behavior meets the real-time requirements, then you win. You can often get x86 systems whose worst case behavior is many times faster than non-x86 systems at the same price level."

Real-time has nothing to do with raw speed. There are real-time requirements that x86 software couldn't meet even if it used a processor a million times faster than the current state of the art. Real-time is about meeting bounded timing requirements. It's like batting in baseball: swinging too early is just as much a strike as swinging too late no matter how fast you can swing.

What happens today is that systems with real-time requirements have delegated the real-time parts to specialized hardware instead of software. In that way they can avoid the problems that inconsistent timing would otherwise present. Thus real-time systems can be implemented on a x86 system, but not entirely in software.
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-18 19:04 · Score: 1

"What happens today is that systems with real-time requirements have delegated the real-time parts to specialized hardware instead of software. In that way they can avoid the problems that inconsistent timing would otherwise present. Thus real-time systems can be implemented on a x86 system, but not entirely in software."

I think you have no idea what you're talking about. What kind of specialized hardware do you think is needed? In truth, they use specialized *software*.
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-19 07:23 · Score: 1

"I think you have no idea what you're talking about."

Sure, as a former professional Atari 2600 programmer, I don't know anything about real-time software. Just because I had to write code that was accurate to a single CPU cycle, doesn't mean I know what I'm talking about.

"What kind of specialized hardware do you think is needed?"

It depends on what you are trying to build. Perhaps you are thinking of "fast as a bunny" applications rather than real-time.
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-19 07:39 · Score: 1

Why reply if you're not going to answer? You are wrong, specialized hardware is not required for the vast majority of real time applications. Only specialized software is required.

I've asked you what kind of specialized hardware you think is required and the best you can come up with is "it depends". No, it doesn't depend. Specialized hardware is rarely needed to meet real time requirements.

It may once have been, back in the days of the Atari 2600. But these days, commodity hardware provides all of the things needed. All modern x86 commodity implementations include prioritized interrupts, precision counters and timers with interrupt-generation capability, and so on.
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-19 10:24 · Score: 1

"I've asked you what kind of specialized hardware you think is required and the best you can come up with is "it depends". No, it doesn't depend. Specialized hardware is rarely needed to meet real time requirements."

So you're saying that the specialized hardware you believe isn't needed doesn't depend on the application? I don't see how you can claim to understand something you don't believe exists. Do PC video capture boards merely consist of a TV tuner and an A/D converter controlled and sampled by specialized software running on the PC's x86? Why not? Now it's your turn: what software applications do you claim are real-time without any hardware assist?

"All modern x86 commodity implementations includes prioritized interrupts, precision counters and timers with interrupt-generation capability, and so on."

The features you describe are neither necessary nor sufficient for many real-time applications. Again, it's predictable timing behavior that is key.
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-19 10:31 · Score: 1

"So your saying that the specialized hardware you believe isn't needed doesn't depend on the application? I don't see how you can claim to understand something you don't believe exists."

I'm not sure if this sudden bout of stupidity is intentional or unintentional, but I'm saying that whether or not specialized hardware is generally needed to provide real-time capability doesn't depend on the application because the question isn't about particular applications. We're talking about whether the x86 platform is suitable generally for real time applications, not the ins and outs of particular real time applications.

You are attempting for some reason to sideline this into ever narrower and weirder areas. The fact is, modern x86 systems have all the hardware required for most real time applications.

Your argument about PC video capture boards makes my point. They work fine in x86 systems and don't require any specialized hardware to make them work. They would work exactly the same in PowerPC systems or MIPS systems. The fact that the board is in an x86 system doesn't make the hardware or software any more complex or simpler.

All that is needed is specialized software in the x86 system to ensure that the board is serviced at least as often as is required.

"Again, it's predictable timing behavior that is key."

The simplest way to get predictable timing behavior is simply to ensure that the worst case performance is sufficient to meet the timing requirements. Unpredictable acceleration only hurts you if the software isn't designed to account for it. Fortunately, designing the software to account for that is all but trivial.
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-19 11:24 · Score: 1

"Your argument about PC video capture boards makes my point. They work fine in x86 systems and don't require any specialized hardware to make them work. They would work exactly the same in power PC systems or MIPS system. The fact that the board is in an x86 system doesn't make the hardware or software any more complex or simpler."

Wow, you are really missing the point. The capture board is more than just a minimal interface between a TV signal and the PC's processor, it's doing the heavy lifting with respect to the real-time requirements of capturing video. Thus the card is specialized hardware that makes video capture possible by performing the real-time processing. Obviously, it is "specialized" because you couldn't use that card to capture other kinds of data. Being "specialized" has nothing to do with complexity.

"The simplest way to get predictable timing behavior is simply to ensure that the worst case performance is sufficient to meet the timing requirements. Unpredictable acceleration only hurts you if the software isn't designed to account for it. Fortunately, designing the software to account for that is all but trivial."

That's fine if the timing requirements only have an upper bound, but what if there's a lower bound as well? What if the timing window is narrower than the jitter in timing behavior due to caches and prefetch etc? What is your trivial solution for that?
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-19 11:38 · Score: 1

"Wow, you are really missing the point. The capture board is more than just a minimal interface between a TV signal and the PC's processor, it's doing the heavy lifting with respect to the real-time requirements of capturing video. Thus the card is specialized hardware that makes video capture possible by performing the real-time processing. Obviously, it is "specialized" because you couldn't use that card to capture other kinds of data. Being "specialized" has nothing to do with complexity."

The point is simply that this same hardware would work precisely the same on an x86 platform as on any other platform. It has nothing whatsoever to do with the subject of our disagreement, which is whether x86 platforms are unusually unsuitable for real-time use.

"That's fine if the timing requirements only have an upper bound, but what if there's a lower bound as well? What if the timing window is narrower than the jitter in timing behavior due to caches and prefetch etc? What is your trivial solution for that?"

Let's take the case of video acquisition. A typical requirement will be that the video hardware will trigger an interrupt when the buffer fills to some particular point. The buffer must be drained before it overflows or else data will be lost. All that matters is that you have an upper bound on the latency between when the interrupt occurs and when you get around to draining the buffer.

Suppose you have a weird situation where you must service the interrupt more than 5 milliseconds after it occurs but no more than 15. When the interrupt occurs, you simply arrange for another interrupt to be generated in 5 milliseconds. So long as you can upper bound the cost of all delays to 10 milliseconds, you will meet the interval requirement.
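The arithmetic behind that re-armed timer can be checked with a few lines of Python. This is only a sketch of the reasoning in the paragraph above, not code from any real driver: the function name `service_window_met` and its parameters are made up for illustration, and it treats the lower bound as inclusive for simplicity.

```python
def service_window_met(min_ms, max_ms, timer_delay_ms, worst_case_jitter_ms):
    """Return True if servicing always lands inside [min_ms, max_ms].

    Hypothetical model: the handler runs no earlier than timer_delay_ms
    after the original interrupt, and no later than
    timer_delay_ms + worst_case_jitter_ms.
    """
    earliest = timer_delay_ms
    latest = timer_delay_ms + worst_case_jitter_ms
    return earliest >= min_ms and latest <= max_ms


# The numbers from the post: a 5..15 ms window and a 5 ms deferred timer.
print(service_window_met(5, 15, 5, 10))  # all delays bounded at 10 ms
print(service_window_met(5, 15, 5, 12))  # bound too loose for the window
```

The second call shows the failure case the reply concedes: once the worst-case delay exceeds the width of the window, no amount of re-arming helps.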

If the timing interval is narrower than the window and cannot be reduced by software, then the hardware is unsuitable. Generally, the fix is faster hardware. For a given performance level, x86 is probably the cheapest way to get fast enough to bring that jitter down.

You can also turn things like caches off, but I've never heard of a case where the reduction in jitter was worth the reduction in absolute performance.

I think you are imagining some extremely bizarre narrow group of real-time applications rather than the whole universe of such applications. Obviously, there is no one technology that is right for the majority of bizarre applications, that's what makes them bizarre.

But for the vast majority of real-time applications, x86 hardware is cheap and well suited.
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-19 13:52 · Score: 1

"Let's take the case of video acquisition. A typical requirement will be that the video hardware will trigger an interrupt when the buffer fills to some particular point."

In this scenario the critical real-time processing has already been performed by the video acquisition card. Then it generates an interrupt to the x86 which allows it to process the data in a non-real-time manner. Yes, the software has to be able to respond without too much latency, but the transfer of data doesn't occur in a real-time manner from a software point of view. In other words the timing behavior of the X86 software is not directly related to the real-time requirements of video data acquisition, but to the size of the card's video buffer. If the size of the buffer exceeded the available memory on the PC, the PC software wouldn't need to satisfy any timing deadlines at all. Thus an obvious real-time process could be performed without any timing requirements on the part of the software. How can this be? It's possible because the real-time behavior is being done by the card, not by the software running on the X86 CPU.
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-19 14:58 · Score: 1

You're being an idiot. You're impervious to facts and reason. Most likely, anyone reading can already see this.

The point is, this is a system that implements real-time requirements. It is a realistic example of such a system. In fact, it is typical. It works just as well if the processing unit is a commodity x86 system or some other architecture.

Just as x86 works perfectly in this typical system that manages real-time requirements, it also works perfectly in many similar systems that meet real-time requirements.
Re:Chip makers want developers to pay for lunch by ClosedSource · 2007-12-19 18:37 · Score: 1

"You're being an idiot. You're impervious to facts and reason. Most likely, anyone reading can already see this."

Typically the guy who keeps insulting his opponent and making it personal is the one who is suspect. In any case, I doubt that many people are interested in our little sub-thread.

"The point is, this is a system that implements real-time requirements. It is a realistic example of such a system. In fact, it is typical. It works just as well if the processing unit is a commodity x86 system or some other architecture."

I note that you are now talking about a system that implements real-time requirements rather than software that does. As I said at the start, such a system typically requires specialized hardware to meet real-time requirements and a video capture board is the specialized hardware used in this case.

You still haven't given any specific example of a real-time application that handles the fundamental real-time requirements entirely in software running on an X86 processor.
Re:Chip makers want developers to pay for lunch by JoelKatz · 2007-12-20 08:27 · Score: 1

"You still haven't given any specific example of a real-time application that handles the fundamental real-time requirements entirely in software running on an X86 processor."

Of course not, because that's a meaningless and nonsensical request. The whole point is that x86 *hardware* provides all that's needed for this, so why are you asking me for an example that does it in software?

"I note that you are now talking about a system that implements real-time requirements rather than software that does. As I said at the start, such a system typically requires specialized hardware to meet real-time requirements and a video capture board is the specialized hardware used in this case."

Wow, a video capture application requires a video capture card. I never would have guessed that.

However, from the software standpoint, the video capture card is the source of the real time requirement. That capture card must be serviced every so often.

Almost every real time requirement will have some real world source of the real time requirement. It's hard to imagine any real time requirements locked in a closet. If you're going to argue that whatever connects that real world requirement to the system is "specialized hardware", then every real time application requires specialized hardware.

The thing is, this is a meaningless exercise. The question is whether something about the unpredictable timing of x86 hardware makes it particularly unsuitable for real-time applications. An argument that all real time tasks need hardware to talk to the real world says nothing about this point. It was a total waste of time.

The video capture card example proves the point. It can go in an x86 system, or a MIPS system, or a PPC system. Any video capture real time system will need a video capture card, and once you have that, all these platforms are suitable.

You know this. It's obvious you know this. So the question is why you're being so dense about it. And I have no answer.

Re:Wow, this is a great idea! by poot_rootbeer · 2007-12-17 07:09 · Score: 2, Insightful

People used to optimise everything way back when, but now I suspect that most people just let the faster processor take care of things rather than trying to squeeze every nanosecond of performance out of their apps :(

Thank God for that.

I'm glad that coders today can use high-level tools and languages without having to spend half their time on performance tweaking.

Take as an example a game like Halo (or Guitar Hero, or World of Warcraft, or whatever your favorite modern game is). If the developers of these titles had to exercise the same amount of care in optimization as developers did on the Atari 2600 -- where often, the author had to unroll simple countdown loops because they could not afford the overhead of DEC and BEQ instructions -- yes, the game kernel would probably run twice as fast. But on the other hand, each game would take a decade to complete!

I'd happily trade some (but not all) efficiency in program execution for an increase in efficiency in program authoring. And that's exactly what we've done.

Overly simplistic. by fahrbot-bot · 2007-12-17 07:09 · Score: 1

The notion that more cores (or speed) is always better and will herald ever faster computing, especially for the home market, is rubbish. Parallelism comes in many sizes and shapes. Few people are doing FFTs or fluid dynamics at home on their M$ boxes, so fine-grained parallelism is unnecessary.

Coarse-grained and/or multi-threading can help a program's design and performance if done well. But many people simply run one or two apps at a time (or, more commonly, switch between them) and all the cores in the world aren't really going to help things that much for most things. Gaming, certain computations (SETI, etc) perhaps, but not the day-to-day applications.

As a point of reference, I did some C and FORTRAN scientific research programming on an Intel box with 1024 processors (running Unix) back in the day (early 80s). A little skill, practice and a good compiler were the tools of the trade. Funny, even with all those processors, VI didn't run any faster :-)

--
It must have been something you assimilated. . . .

Fine grained vs Coarse grained parallelism by White+Flame · 2007-12-17 07:09 · Score: 2, Insightful

Fine grained (spread your for loops across processors) and coarse grained parallelism (different independent actors exchanging messages and working on tasks separately) are two completely different approaches, though they generally use the same mechanisms. Everybody always focuses on the fine grained and how that affects algorithms, but I personally believe that personal computing yields more benefit from coarse grained parallelism, where nothing in your program blocks because every task that it's performing is independent. Having modal, sequential operations that you have to wait for your computer to perform before you get control back for an unrelated task in the same program is absolutely absurd in this day and age.
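For readers who want to see the coarse grained style concretely, here is a minimal Python sketch of "independent actors exchanging messages". It is an illustration only: the names `actor` and `run_pipeline` are invented here, the two stages do trivial arithmetic, and CPython threads will not spread CPU-bound work across cores, so treat it as a picture of the structure rather than a speedup.

```python
import queue
import threading


def actor(inbox, outbox, work):
    """Handle messages from inbox until a None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put(work(msg))


def run_pipeline(items):
    """Run two actors that share no mutable data; only messages cross over."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    doubler = threading.Thread(target=actor, args=(q_in, q_mid, lambda x: x * 2))
    adder = threading.Thread(target=actor, args=(q_mid, q_out, lambda x: x + 1))
    doubler.start()
    adder.start()

    for item in items:
        q_in.put(item)
    q_in.put(None)       # tell the first actor to stop
    doubler.join()
    q_mid.put(None)      # then the second, once its input is complete
    adder.join()

    results = []
    while not q_out.empty():
        results.append(q_out.get())
    return results


print(run_pipeline([1, 2, 3]))
```

Because each actor owns its own state and only touches the queues, there is no shared mutable data to protect by hand.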

The few instances where a personal application does spend significant time in a single task (media manipulation, mostly) could use fine grained parallelism, but that is not the common case. Stop whining about algorithm parallelism and get your system/application design broken out into independent components and tasks properly.

Besides, as others have said, neither is particularly difficult to do properly. It's when you try to hack in threaded shared access without having properly contained the mutable data that you shoot yourself in the foot.

Re:Wait for the new C++ standard before you switch by mariuszbi · 2007-12-17 07:10 · Score: 3, Informative

Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have Boost, you have Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up? As for the compiler "optimizing" everything, there is a little keyword, volatile, that just tells the compiler not to optimize memory access for that variable. I think the real problem is working in a new programming paradigm: if you have a problem with sharing variables, code everything using pure functions.

Re:Threads Are Not the Answer by ZonkerWilliam · 2007-12-17 07:11 · Score: 1

I agree with MOBE2001. Back when I was a sysop, I remember even Microsoft's techs saying that multi-threading just doesn't assemble the code that was run on separate cores very efficiently; it was all due to the re-assembly algorithm in the OS.

Collaboration Versus Concentration by EgoWumpus · 2007-12-17 07:11 · Score: 1

It is arguable that the time-spent-on measurement is something of a red herring. Why might a programmer spend a lot of time on whitespace? The predominant reason is that it has to be readable. Economically it's a bad idea to spend your time on reducing your consumption of something plentiful (CPU cycles) at the cost of something that is not plentiful (programmer-hours, which, btw = $$$).

With most projects - by which I mean most business projects, because that is where most development occurs; in a business - looking at 'simple' programs is not as clear an indicator of what path you should take as you might like. There are certainly tradeoffs - writing nice, elegant, compact code that utilizes multithreading may, in fact, improve your bottom line. But it's currently a riskier proposition than I think good business sense indicates. The first and only measure is functionality. Only where it can improve functionality will these concerns be addressed.

That is until the research institutions can pathfind to a better way of doing things. But it's not as simple as deciding multithreading is the wave of the future - even if it is.

--

[Ego]out

Re:melt in your mouth not in your mobo by Thornburg · 2007-12-17 07:13 · Score: 1

Nothing is more embarrassing than doing math in public, but here's a go:

Surface area of the Sun:
Diameter = 1,390,000 km. 4 * pi * r ^2 = 6,069,871,166,000.839 km^2
That's roughly 6*10^12 km^2, which is 6*10^22 cm^2.

Total energy output of the Sun: "386 billion billion megawatts" (per second)
That is 386*10^24 W.

386*10^24 W / 6*10^22 cm^2 = 38600 W / 6 cm^2 = (roughly) 6433 W/cm^2

Prescott CPU = 112 mm^2 = 1.12 cm^2, energy output = 103 W (TDP @ 3.4GHz).

103 W / 1.12 cm^2 = 91.9643 W/cm^2 (@ 3.4GHz) * (25/3.4) = 676.208 W/cm^2 (@25GHz)

So, _IF_ my math is right (approx .01% chance of that), then a 25GHz Prescott would only output 1/10th the power of the Sun on a /cm^2 basis.
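The figures above can be re-run in a few lines of Python. This sketch uses only the numbers quoted in the post (Sun diameter, total output, Prescott die area and TDP) and keeps the post's own assumption that power scales linearly with clock speed, which is a simplification. The function names are made up for illustration.

```python
import math

SUN_DIAMETER_KM = 1_390_000     # figure quoted in the post
SUN_OUTPUT_W = 386e24           # "386 billion billion megawatts"
CM2_PER_KM2 = 1e10              # 1 km = 1e5 cm, squared


def sun_power_density_w_per_cm2():
    """Total solar output divided by the surface area of a sphere."""
    radius_km = SUN_DIAMETER_KM / 2
    area_cm2 = 4 * math.pi * radius_km ** 2 * CM2_PER_KM2
    return SUN_OUTPUT_W / area_cm2


def scaled_chip_density(tdp_w, die_cm2, base_ghz, target_ghz):
    """Chip power density, scaled linearly with clock (the post's assumption)."""
    return tdp_w / die_cm2 * (target_ghz / base_ghz)


sun = sun_power_density_w_per_cm2()
chip = scaled_chip_density(103, 1.12, 3.4, 25)
print(f"Sun:  {sun:.0f} W/cm^2")
print(f"Chip: {chip:.1f} W/cm^2")
print(f"Chip/Sun ratio: {chip / sun:.3f}")
```

Using the unrounded surface area gives a slightly lower figure for the Sun than the 6433 W/cm^2 above, but the conclusion of roughly one tenth still holds.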

(Note: Lameness filter sucks.)

LabVIEW [& other graphical environments] by mosel-saar-ruwer · 2007-12-17 07:18 · Score: 2, Interesting

my current major language (Igor pro) will use all the cores automatically, and how many languages do multithread this way? Matlab(?), Octave(?)

LabVIEW, by its very nature [which is graphical - based on "G" - the "Graphical" programming language] is kinda/sorta topologically self-threading: If a piece of LabVIEW code sits off in its own connected component, then [more or less] it gets its own thread.

Of course, all your ".h" & ".c" [or ".cc"] files [& their innards] might very well break down into little distinct connected components which are ripe for running their own threads, it's just that you can't - unless you're some sort of a super genius - you can't readily visualize all those connected components as they exist in your code.

Now you and your colleagues could try to anticipate the connected components a priori, during the "planning" phase: You could draw huge pictures on the dry-erase board, and everyone could yell and scream at each other about the topological structure which the code should ultimately embody, and then everyone would have to promise - Scout's Honor! - that they would stick to the blueprint [which they might very well resent as having been shoved down their throats by some pointed-headed suit who didn't have any clue what he was talking about] - but the beauty of LabVIEW is that THE CODE IS THE BLUEPRINT [which I think is a point that Jack Reeves used to make].

There's actually a Slashdotter, MOBE2001, who maintains a blog called Rebel Science News, who's got some pretty interesting ideas here - he seems to be leaning towards a graphical approach to this [realizing that the fundamental nature of the problem tends to be topological, rather than anything which we (YET!) would recognize as semantic], but his program is very, very ambitious [if I had a couple of spare lifetimes, I might just throw one in that general direction].

Another line of thought which everyone should keep an eye on is the discipline of Petri nets - it's kinduva big graphical/topological approach to state machines, which [if someone were to put the necessary elbow grease into it] might prove to be very useful in squeezing the most bang for the buck out of these massively-multicore CPU's.

Personal computing requires 2 CPUs by Locklin · 2007-12-17 07:19 · Score: 1

Most "personal computing" software can only *really* benefit from two or maybe three threads. One for the GUI, and another for any task that can take more than ~100ms. This is already common practise in some languages (e.g. Java), and it should become standard. But there's no reason your average mail reader needs 8 threads.
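As a rough picture of the "one thread for the GUI, one for anything slow" pattern, here is a small Python sketch. There is no real GUI toolkit involved: the loop that counts `ticks` merely stands in for an event loop, and `slow_task` and `run_ui_loop` are hypothetical names chosen for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    """Stand-in for work that takes well over ~100 ms."""
    time.sleep(0.2)
    return sum(range(n))


def run_ui_loop(n):
    """Keep 'handling events' while the slow task runs on a worker thread."""
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_task, n)
        while not future.done():
            ticks += 1           # a real GUI would process input here
            time.sleep(0.01)
        return future.result(), ticks


result, ticks = run_ui_loop(1000)
print(f"result={result}, UI stayed responsive for {ticks} ticks")
```

The main loop never blocks on the slow work; it only collects the result once the worker reports that it is done.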

I could see a tabbed browser using a thread for each tab, but then, you're really running multiple instances, and using your program as a "tabbed window manager" - which is more the exception than the rule.

--
"Knowledge is the only instrument of production that is not subject to diminishing returns" -Journal of Political Econom

Microsofts view on cores by bjb_admin · 2007-12-17 07:19 · Score: 4, Funny

No need for parallel computing all cores are already used.

Core one: For the OS
Core two: Anti-virus
Core three: Anti-Spyware / Windows Defender
Core four: Firewall
Core five: Windows update notifications and installations
Core six: Windows Genuine Advantage checks
Core seven: Eye Candy (Vista) with XP you get a bonus CPU
Core eight: Whatever the user wants to run, except when you get a virus, then you have to share it with the SPAM bot.

Guess we will be waiting for 16 core CPU's.

Oh and don't start me on memory requirements :-)

Re:Microsofts view on cores by igny · 2007-12-17 15:06 · Score: 1

640 cores ought to be enough for anyone.

--
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra

Re:Tarded by Layth · 2007-12-17 07:20 · Score: 1

Threads aren't harmful - what an asinine thing to say. Bad programming is harmful. BUGS are harmful. The article you reference is about map reduce, which is actually more about distributed computing than threads. If you're really so concerned about concurrency programming errors, just use executors in java. Even a monkey could do it. Want to control threads manually but you're having problems debugging? Check out the multi-threaded testing framework: http://code.google.com/p/multithreadedtc/

Re:Programmers are at fault by K.+S.+Kyosuke · 2007-12-17 07:21 · Score: 1

Oh, damned, ignore my previous post - I messed up the threads. (How fitting for this article... ;-))

--
Ezekiel 23:20

Sutter's article is awesome by athloi · 2007-12-17 07:21 · Score: 2, Interesting

When I first started programming, in BASIC on an Apple ][ (not IIe), I remember being baffled by the fact that the computer did not operate with multiple concurrent streams. To me, this seemed the point of making something that was "more than a calculator," and the only way we would be able to do the really interesting stuff with it.

When I first started writing object-oriented code, I was somewhat dismayed to find that OO was an extension to the same ol' linear programming. It seemed to me that objects should be able to exist as if alive and react freely, but really, they were just a fancy interface to the linear runtime. Color me disappointed yet again.

It's an important paradigm shift to recognize parallel computing. Maybe when the world realizes the importance of parallel computing, and parallel thinking, we'll have that singularity that some writers talk about. People will no longer think in such basic terms and be so ignorant of context and timing. That in itself must be nice.

Sutter's article hits home with all of this. His conclusion is that efficient programming, and elegant programming that takes advantage of, not conforms to, the parallel model is the future. Judging by the chips I see on the market today, he was right, 2.5 years ago. He will continue to be right. The question is whether programmers step up to this challenge, and see it as being as fun as I think it will be.

--
technical writing / development

Re:Sutter's article is awesome by ceoyoyo · 2007-12-17 18:45 · Score: 1

You've gotta get a Mac and play with the cool living objects: http://developer.apple.com/documentation/Cocoa/Conceptual/DistrObjects/DistrObjects.html
Re:Sutter's article is awesome by MarkCollette · 2007-12-19 12:07 · Score: 1

I wonder if there are lessons to be learned from asynchronous CPU design, which can be brought over to software design.

Think of each circuit that has to handshake that it is done processing, to the next one, as being like objects rippling their invocations on each other.

Also, why multithread when you can multiprocess? by mkcmkc · 2007-12-17 07:22 · Score: 1

Higher level constructs are great, but I also wonder, why is everyone so hot to use multiple threads? For many problems, multiple processes work fine, allowing you to use your multiple cores with bulletproof separation of memory spaces (a basic advance we've had since the '60s).

--
"Not an actor, but he plays one on TV."

Re:concurrency - the developer's responsibility? by LWATCDR · 2007-12-17 07:28 · Score: 1

But Windows doesn't provide any less help with multi-threaded programming than Linux.
An OS could provide sort libraries and implement multi threaded solutions but I haven't seen Linux provide them anymore than Windows does.
I will say this much. If any effort is put into the average Linux distro to improve multi-threaded support please let it be in X-Windows. As far as I know X is still single threaded.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.

Me too by CrazyJim1 · 2007-12-17 07:29 · Score: 1

That's exactly what it sounds like. Parallel processors are good for some things, but just an excuse for selling more processors for most.

--
God spoke to me.

Re:Me too by Anonymous Coward · 2007-12-17 09:21 · Score: 1, Funny

Right, because it's not like most people's computers are running more than one program at a time anyways. I mean, what would you do, interleave the decks or something?
Re:Me too by tepples · 2007-12-17 10:48 · Score: 1

Right, because it's not like most people's computers are running more than one program at a time anyways.

You mean like Windows Explorer, Firefox, Thunderbird, Winamp, and avast! at the same time?
Re:Me too by T-Bone-T · 2007-12-17 15:13 · Score: 1

Maybe it is my processor getting old, but converting a whole bunch of files in iTunes to mp3 sure eats a lot of my processing power. I also have to wait until I go to bed to turn a DVD into a divx file because the computer is useless for anything else while that is happening. I'd love to say, "Core 1, you will convert DVDs (or mp3s, or some other processor-intensive task). Core 2, run everything else."

It is all about delegating.
Re:Me too by cecil_turtle · 2007-12-17 16:59 · Score: 2, Informative

I'd love to say, "Core 1, you will convert DVDs (or mp3s, or some other processor-intensive task). Core 2, run everything else."

You can, with processor affinity. Unless you're saying your processor isn't dual-core. Even still you can just set your process priority / nice level (whatever OS you run) so that it's a lower priority so your other programs run OK.
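For anyone who would rather script this than click through a task manager, here is a hedged Python sketch. It assumes a platform where Python exposes `os.sched_setaffinity` (Linux does; Windows and macOS builds generally do not) and `os.nice` (Unix only), and it simply returns None where those calls are missing. The helper names `pin_to_core` and `lower_priority` are made up for this example.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict a process (0 means the caller) to one core, where supported."""
    if not hasattr(os, "sched_setaffinity"):
        return None                      # no affinity API on this platform
    allowed = os.sched_getaffinity(pid)
    if core not in allowed:
        core = min(allowed)              # fall back to a core we may use
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


def lower_priority(increment=5):
    """Raise the niceness of the current process so other programs win."""
    if not hasattr(os, "nice"):
        return None                      # e.g. on Windows
    return os.nice(increment)


print("now allowed on cores:", pin_to_core(0))
print("new nice level:", lower_priority(5))
```

A long encode started from the same process (or a child of it) would then stay on that one core at reduced priority, leaving the other core free for everything else.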

I don't know, I haven't owned a computer since 2003 where the processor was really a bottleneck anyway. Unless you're doing something specific like converting media files or running a distributed application (seti, folding, etc.) then normally the bottleneck is disk access. Even on servers it's not much of an issue for me, it's pretty easy to throw more CPU horsepower at a machine nowadays, but again disk performance is killer expensive.

re: advantages of multi-core CPUs by King_TJ · 2007-12-17 07:32 · Score: 1

I think you're thinking along the right track.... but people keep forgetting that we already DO run quite a few applications at the same time on the typical PC.
Users are often quick to say "I can only really work with one program at a time anyway. Anything else that's open is just sitting idle, minimized to my taskbar."

But that thinking doesn't consider anti-virus/anti-spyware software running in the background, or perhaps an anti-spam email filter. It doesn't account for system maintenance tasks that might be kicked off in the background (disk defragmenting, etc.).

It also neglects to consider a future where multiple PCs on a LAN might share their available processor time with an app that needs them, running on just one of those multiple PCs.

In the typical office environment, most PCs are doing little more than running someone's word processor or spreadsheet, but there may well be an administrator trying to compact a large database, compile an application, or even transcode some video - who would stand to benefit if the OS on all those systems allowed this kind of functionality.

(Apple's xgrid in OS X has promise in this area ... but few programs seem to even be aware of it, and it often requires an OS X Server in the mix to co-ordinate the whole process. That makes it even more of a "niche" function.)

One core for my app, the rest for crapware by davidwr · 2007-12-17 07:33 · Score: 1

All those viruses and adware will chew up those extra cores in no time.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:melt in your mouth not in your mobo by $RANDOMLUSER · 2007-12-17 07:33 · Score: 1

I was pulling the speed number out my ass, but I do recall such a comparison. Besides, I'll submit that even 1/10th the surface temp of the sun per cm^2 is enough to melt a mobo...

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill

That's an easy one to answer... by Toreo+asesino · 2007-12-17 07:34 · Score: 1

Windows 7 will be out then.

--
throw new NoSignatureException();

Er... What drugs are you taking? by djelovic · 2007-12-17 07:34 · Score: 2, Insightful

> This makes multi-core programming almost a no-brainer.

What uttermost and complete crap.

We are nowhere near multi-core programming being a no-brainer.

Here's what we know right now:

1. We know how to manually create threads to perform specialized tasks. This comes nowhere near the ideal which is loading all the CPUs roughly the same, taking into account CPU affinity for some tasks in order to keep the caches warm and work well on NUMA architectures.

2. We know how to exploit data parallelism in those cases where we have large quantities of data.

Other than that we are still trying to find any paradigm that would make arbitrary systems scale well on a massive number of cores. Some of them are based on pi calculus, some on join calculus, some on more practical foundations.

At this point some things are obvious:

1. CPU threads are useless except as part of the foundation on which other abstractions are built. All really scalable systems use either lightweight threads/processes or smaller tasks which are scheduled in user space.

2. Native stacks are evil.

3. Thread affinity, as implemented by Windows USER and GDI modules and STAs is evil. Don't know how this works under Linux as I never did any GUI work there but I assume many components have similar limitations.

4. Any solution that exposes locks to the user instead of hiding them in the infrastructure is evil. Locks are not composable and are very error-prone in real-world scenarios.

Dejan

Re:Er... What drugs are you taking? by mevets · 2007-12-17 08:12 · Score: 1

As a consultant, I take exception with what you say. You are right, but I've found that the more people 'program by prayer', the more money I make. Please, think of my kids' education and keep it to yourself.
Re:Er... What drugs are you taking? by tcopeland · 2007-12-17 11:39 · Score: 1

> We are nowhere near multi-core programming being a no-brainer.

You got modded down as flamebait for some unknown reason... but you are exactly correct. As you say later in your post, lightweight processes are how folks are scaling these days. Ruby on Rails is a success for many reasons, but one of them is a single-threaded model that is just scaled horizontally.

--
The Army reading list
Re:Er... What drugs are you taking? by CoughDropAddict · 2007-12-18 06:25 · Score: 1

2. Native stacks are evil.

What's wrong with native stacks? Having allocation/deallocation be essentially free for patterns that follow the callgraph of a program is pretty nice, IMO.
Re:Er... What drugs are you taking? by djelovic · 2007-12-18 08:21 · Score: 1

Native stacks are cool and fast. We all love them.

However if you create a native thread per logical unit (say per user connected to a server) then you'll quickly run out of x86 address space and die an ugly death.

This is why runtimes like Erlang use lightweight threads instead of native threads, and more or less allocate stack frames on the heap. Then you can create 200,000 threads without sweat to feed and read 100,000 clients.

On x64 the situation is better because of a larger address space, but even there native stacks are paged into physical memory using 4K pages. And using 4K instead of the 100 bytes that a real stack would take means that your 200,000 threads are using 4K * 200k = 800M of physical memory just for stacks.

But wait, it gets even worse. Say you are writing a server to handle financial markets. Then you want to allocate a lightweight thread per symbol so that lock contention (and locks are used internally for queues even for languages with message passing primitives such as Erlang) is minimal and the work can be spread evenly among CPUs. Then you get a _really_ large number of lightweight threads and the number of those 4K pages that really store 100 bytes of data starts to really hurt.
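
A rough check of the stack arithmetic above, as a sketch only. The function name and its defaults are hypothetical; the 4096-byte page is an assumption (the usual small-page size on x86 and x64), and the 100 bytes of live stack per thread is the figure from the post.

```python
def stack_overhead_bytes(threads, page_bytes=4096, used_bytes=100):
    """Return (resident, useful) bytes when every stack pins one whole page.

    page_bytes is an assumed page size; used_bytes is the live stack data
    each lightweight task really needs.
    """
    resident = threads * page_bytes  # one page kept in memory per stack
    useful = threads * used_bytes    # bytes the stacks actually hold
    return resident, useful


resident, useful = stack_overhead_bytes(200_000)
print(f"resident: {resident / 1e6:.0f} MB, useful: {useful / 1e6:.0f} MB")
```

Passing a different `page_bytes` shows how directly the waste scales with page size, which is the reason heap-allocated frames look attractive at this thread count.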

The rules of the performance game are way different when you are dealing with massive parallelism. This is the reason why many concepts from functional languages, such as immutable data structures, are making a comeback.

Dejan
Re:Er... What drugs are you taking? by CoughDropAddict · 2007-12-18 08:52 · Score: 1

Ahh, so what you mean is that massive numbers of native stacks are evil. Well duh. :)

(and locks are used internally for queues even for languages with message passing primitives such as Erlang)

As an aside, why is this the case when there is so much good research on practical lock-free algorithms? What's with the Erlang guys?

The rules of the performance game are way different when you are dealing with massive parallelism. This is the reason why many concepts from functional languages, such as immutable data structures, are making a comeback.

If you're talking about massive scalability, then not having the option of changing a field in a data structure without allocating and initializing a separate copy first sounds like an expensive proposition. It's bad for memory locality (cache performance), it's bad for heap fragmentation, and it's bad for overall memory footprint. There's a time and a place for immutable data structures, but programming in languages where that's the only option is pretty limiting.
Re:Er... What drugs are you taking? by djelovic · 2007-12-18 09:30 · Score: 1

> As an aside, why is this the case when there is so much good research on practical lock-free algorithms? What's with the Erlang guys?

We all use lock-free whenever possible. In the end it's not always possible. If your locks are uncontested there's not much difference though - the most expensive part of a spin lock is the read memory barrier.

Also, most of the lock-free stuff relies on many small allocations (keeping stuff in linked lists instead of arrays, for example) so there is a price to be paid in data locality, AFAIK.

> If you're talking about massive scalability, then not having the option of changing a field in a data structure without allocating and initializing a separate copy first sounds like an expensive proposition. It's bad for memory locality (cache performance), it's bad for heap fragmentation...

Agreed minus the heap fragmentation part. That problem is pretty much fixed by modern GCs and a small amount of careful programming when the objects grow big enough to be allocated in the large object heap.

However if you are going to be passing copies of your data to other lightweight threads frequently then the cost of immutable data structures is less than the cost of copying.

> There's a time and a place for immutable data structures, but programming in languages where that's the only option is pretty limiting.

Agreeing with you 100%. Not so much limiting as... slow. :)

Dejan

P2P? by ZonkerWilliam · 2007-12-17 07:35 · Score: 1

I'm not a programmer by any means, but wouldn't a solution much like mesh networking or P2P software be effective in parallel programming? Again I'm not a programmer...

Use LabVIEW by pbrooks100 · 2007-12-17 07:37 · Score: 1

http://www.ni.com/multicore/

We write multithreaded apps... by havardi · 2007-12-17 07:38 · Score: 1

All day long... Our CMS does a lot of stuff like: cat file.txt |grep -e garbage |sed -e s/^foo/bar/g |cut -f1-3 -d\&

See, it is really easy to add more threads by just piping them together. Our server needs at least four processors to perform adequately with 10 web visitors, because of our excellent programming practices.

Personal computing insights by lazyDog86 · 2007-12-17 07:41 · Score: 1

If like me your idea of Personal Computing is getting news on the Internet and making insightful comments, you should welcome eight processor computers which would result in four times more insight than this message from a two core processor.

--
my insights may be modded Funny, but at least some of my jokes are modded Insightful

Re:Threads Are Not the Answer by 0xABADC0DA · 2007-12-17 07:45 · Score: 2, Insightful

> There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization.

I think what you meant to say is that with OS threads it requires a lot of effort and overhead. For example, on Tera/Cray's MTA it took basically no extra overhead at all to run a loop in parallel over N hardware threads. The only 'hard' part was letting the compiler know which loops to do in parallel.

The problem with OS threads is that the things that benefit the most from parallel processing are the finest grained, but OS threads are only usable for the coarsest-grained problems. So OS threads are generally only useful for concurrency and not for parallel execution. That is, OS threads can let you do two mostly different 'tasks' at the same time (repainting the GUI while the data is being processed), but are really bad at actually making a single task run faster.

You can, sometimes, with incredible effort make os threads run one task faster. But that doesn't change the fact that they are a really really bad solution for this.

Re: advantages of multi-core CPUs by CastrTroy · 2007-12-17 07:47 · Score: 1

I don't think that multicore processors will help me once the antivirus starts scanning my hard disk, or the system does a disk defrag. Most of the slowdowns in those situations are due to waiting for access to the hard drive. I think the same thing is true in about 99% of cases where I'm waiting on my computer. I frequently transcode video and don't notice any problems when trying to run my word processor. That's because the transcoding is run in low priority mode. It doesn't take much longer because most of the time my PC is idle, even when I'm using it, but the rest of the computer is much more responsive because it has priority for the processor available for that .25 seconds when it actually needs it.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.

How often do you build or invert the universe.. by my_left_nut · 2007-12-17 07:48 · Score: 1

... on your desktop?

In my experience, unless you are a developer who needs to build a large application on their desktop, or you are playing some game that requires computation of some sort that cannot be offloaded to a graphics processor, or you are trying to invert the universe (or a small part of it), a desktop with a > 2GHz processor is a waste of money.

I'm sure there is the rare exception, but I've never seen an office desktop at 100% CPU utilization which is doing something legitimate. You'd be better off sinking your money into maxing the memory out, and getting a good malware scrubber package to ensure your machine utilizes its resources properly.

For a server, it's a different story altogether. In many cases, work that runs on servers tends to be quite parallelizable already - for example each request that is accepted by a shopping cart application usually is serviced by a different thread or process (which can be scheduled by the OS to run on a processor that is not doing meaningful work). Scaling involves adding another box to the server farm, and load balancing it, or adding more CPUs to a DB server's backplane. These are scalable as usage increases, where each individual request is not a CPU hog.

I think that the CPU problem really pertains to the problem space where any one of those server threads is always requiring the CPU (versus doing I/O or waiting for another process, human intervention on a HID, etc) - the space of parallelizable CPU-bound and non-parallelizable CPU-bound type work.

Things like weather forecasting (solving partial differential equations), with processes that can be broken up into discrete areas and then recombined, are parallelizable via reprogramming. Some are, but only to an extent. For example, computing something like the Mandelbrot set is parallelizable over each coordinate. You can kick off a bunch of threads, one for each processor, and share the points among them. But at each point, the iterative process produces intermediate results such that each one depends on the one before it. Those individual point computations are not parallelizable.
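
To make the Mandelbrot point concrete, here is a minimal sketch in Python. The names `escape_time` and `render`, the grid bounds and the worker count are all assumptions for illustration. The loop inside `escape_time` is the serial part (each `z` needs the previous one), while the map over points is the part that can be farmed out.

```python
from concurrent.futures import ThreadPoolExecutor


def escape_time(c, max_iter=50):
    """Iterations until |z| exceeds 2. Each step needs the previous z,
    so this inner loop cannot be split across processors."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter


def render(width=8, height=4, max_iter=50, workers=4):
    """Compute escape times for a small grid; points are independent,
    so they are handed out to a pool of workers."""
    points = [complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
              for y in range(height) for x in range(width)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(lambda c: escape_time(c, max_iter), points))
    # Regroup the flat result list into rows.
    return [counts[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    for row in render():
        print(row)
```

In CPython the thread pool only shows the structure, since the interpreter lock keeps CPU-bound threads from running simultaneously; a process pool with a module-level function in place of the lambda would be the usual way to get real speedup.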

Of course, if you're running spyware or adware on your server farm, you have other problems to solve too.

Re:How often do you build or invert the universe.. by Toon+Moene · 2007-12-17 10:26 · Score: 1

> ... on your desktop?

Well, I can't help that HP calls the quad-core system I bought last Thursday a "desktop" system (I'm certainly not going to place it on a desktop ;-)

After plugging another 2 Gbyte of RAM in it to get to 4 Gbyte, I'll use it to nightly check our Weather Prediction System (you know the drill: svn up && recompile && run).

The only reason it's not doing it yet is that I have to wait until the one and only right OS to *use* a 4 Gbyte machine arrives: Debian GNU/Linux AMD64.

[ BTW, now I know why you don't see 4 Gbyte "personal computers" - they're sold with 32-bit Vista because - in the words of the help-desk technician of the shop where I bought the machine: You can't use 4 Gbyte because Windows doesn't see it - HAH :-) ]

reminds me of by circletimessquare · 2007-12-17 07:49 · Score: 1

"640K ought to be good enough for anyone"

we all make fun of bill gates for that, but you are essentially saying the same thing

there are plenty of reasons that joe blow will need 32 cores in 2030. why? i'm not so arrogant as you as to pretend to know what he may be doing. but at the same time, i don't think you can listen to an mp3 and load webpages with flash and preview your 10,000 personal jpegs with 640K. so i do know that more power will be used, somehow

in other words, don't be a fool and think that we have reached anywhere near the limits of personal processing power usage. whatever the limit is, people will find constructive ways to fill that power. what those constructive ways are, i don't know, but whatever they are, people will think of them as vital and essential, like they do with their mp3 collection today, and with which they wouldn't have the faintest clue about in 1986

--
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it

Re:Forward Looking by civilizedINTENSITY · 2007-12-17 07:52 · Score: 1

Interesting historical synthesis. Too bad you were modded down for the link. Marketing is, after all, what MS does best, and regardless of technical merit, or legality, they *win*

Name of short Sci Fi story? by Dan+East · 2007-12-17 07:53 · Score: 1

This article, and the ensuing discussion about whether or not massively parallel computing is even necessary for general-purpose computing, reminded me of a short Sci-Fi story. I don't remember the name, and I couldn't find it by googling. The story is about serial beings that, because they "think" serially, can transmit themselves across the universe easily. They visit earth, and find the massively parallel structure of animal brains to be very wasteful. However they find promise in our computing devices (of course the story was written before the new multi-core craze).

Anyone know the name of this story?

Dan East

--
Better known as 318230.

Re:Wait for the new C++ standard before you switch by MonaLisa · 2007-12-17 07:54 · Score: 1

There are a zillion thread classes for C++, I've written three myself. Just pick one and go with it, or just use pthreads directly, if you want/need to use C++. You don't have to wait for the next standard. Java's threads suck for a lot of things, particularly if you need lightweight threads. If you're using Mac OS X, the Cocoa NSThread class rocks, it's fast and easy. I don't really get what the big deal is here. People have been writing multithreaded code for a couple of decades.

"640k ought to be enough for anyone" by circletimessquare · 2007-12-17 07:55 · Score: 1

-bill gates

to which we all laugh and guffaw at here on slashdot

and here you are, committing the same sin of poor foresight

personal processing power use will increase, and be filled by joe blow computer user, in myriad vital ways. i am not arrogant enough to prognosticate what those uses might be. but i don't think bill gates thought about a mp3 player running while a flash webpage ran while you watched the preview for the upcoming batman film (which, strangely enough, is true of 1987 and 2007: here comes a new batman movie. heh)

the point is, predicting that we've maxed out on memory usage/ processing power usage/ hard drive capacity usage has been a loser's game for decades now. so don't play that game

"Joe Sixpack users never need Super Computer power to surf the net"

oh, like you know what kind of apps over what kind of protocol will be delivered over fibre-to-premises in the year 2030. he will need a super computer. just like he uses a super computer today, in 1987 terminology, to look at football scores, and it seems like a slow computer (whatever the OS)

--
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it

Re:Threads Are Not the Answer by Anonymous Coward · 2007-12-17 08:05 · Score: 1, Insightful

I wonder how he'd like it if every GUI program he ran was only a UI that then spawned other processes which would send UI updates via IPC. Things would be an absolute mess. Threads, if only for this one task (a separate thread to ensure a responsive GUI), are absolutely necessary.

You can say that threads are over-used by programmers who don't understand the reasons why you'd use a separate process instead, but I don't think you can say that threads don't have areas in programming where they're almost essential.

Re:melt in your mouth not in your mobo by Rich0 · 2007-12-17 08:11 · Score: 1

Good calc - aside from the questionable linearity of heat vs clock speed. However, I have to nitpick one statement you made:

Total energy output of the Sun: "386 billion billion megawatts" (per second)

The power output is in megawatts - period. Not megawatts per second. The energy output would be 386 billion billion megajoules per second, though. Energy is not equal to power.

But you used the figure for power correctly.

Re:Oh, wow by Chirs · 2007-12-17 08:12 · Score: 1

Microsoft funds some well-respected research labs. Why would they work elsewhere when they're getting paid to do pure research?

We already have a language just for this case! by fuzzylollipop · 2007-12-17 08:20 · Score: 1

Erlang! Erlang has been around for 10 years and is designed for the current generation of multi-core/multi-cpu machines. It has concurrency built in as a core part of the language. If you want to get the most out of your multi-core machine, start learning Erlang today!

Re:concurrency - the developer's responsibility? by Bengie · 2007-12-17 08:21 · Score: 1

agree. Multi threading is easy in idea and hard in practice. You get logical errors and sometimes those errors don't occur all the time because of the way the OS/chip handles threads.

Re:melt in your mouth not in your mobo by thanatos_x · 2007-12-17 08:21 · Score: 1

If i remember right, the power consumed by the clock speed is something like x * I^2, where I is current. x is changeable, but since we're assuming the same prescott core, it wouldn't matter.

If my math and everything is right, the prescott would take up 7.35 times more power than you predicted, probably within the margin of error given numerous people's fuzzy remembering of math and the original poster not being sure about 25 GHz.

Regardless, there are several fundamental problems with increasing the speed; heat is just one. The fact that the speed of light is only so fast means that eventually the signal can only travel so far before the next clock cycle, so the surface area of the chip has to get increasingly smaller, complicating heat problems.

Multi-core, specialized processing units, or a different technology is the only way we'd get around the fact that clock speed gets harder and harder to push the higher it gets.

--
I am not an expert. If I am misled in something, please correct me.

We do need the cores! by hackingbear · 2007-12-17 08:25 · Score: 1

> most if not all Jane and Joe Sixpack users never need Super Computer power to surf the net, read e-mail and watch videos.

Not true at all! I predict that most Joes and a large number of Janes will need at least 32 cores just for this single application that they have to run. What's that killer app? Decompressing and streaming multiple HARD CORE videos to their new dual flat panels. A single core will not maximize smooth enjoyment in the same amount of time. Go Multi Cores!

Re:Also, why multithread when you can multiprocess by TheThiefMaster · 2007-12-17 08:34 · Score: 1

It depends on the work you're doing. If you're working on a common set of data, threads are far more efficient. If there's very little in common, processes give you a bit more safety and more memory space.

In general, Windows programs tend to use threads because starting processes is expensive. On Linux, starting processes is trivial, so it gets used a lot more often. There are exceptions however, e.g. Microsoft Visual C++ 2005 spawns multiple processes to do parallel compilation.

Re:Wait for the new C++ standard before you switch by spectecjr · 2007-12-17 08:38 · Score: 1

> Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have Boost, you have Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up?

C++ Compilers currently can reorder instructions at will, for scheduling & optimization purposes, and that can affect concurrency unless you use memory barriers.

Which are hacky, nasty, and hard to grok.

--
Coming soon - pyrogyra

Re:melt in your mouth not in your mobo by Thornburg · 2007-12-17 08:40 · Score: 1

Yeah, my mistake for misquoting the website I sto^H^H^H borrowed the information from. It listed "...joules (386 billion billion megawatts) per second".

Sloppy job on my part.

Re:melt in your mouth not in your mobo by Have+Blue · 2007-12-17 08:48 · Score: 1

I don't know where that statistic came from, but there are all sorts of devices (stoves, soldering irons, etc) that are designed explicitly for turning electricity into heat and don't come anywhere near that.

Re:Wait for the new C++ standard before you switch by gbjbaanb · 2007-12-17 08:56 · Score: 1

Hope and pray? eh.

The issue with a lack of threading primitives in C++ is that the calls are not standardised, so each implementation can do it its own way, usually by doing it the OS way (e.g. Windows' CreateThread() call). Once C++ gets standard threading, all C++ compilers will provide calls to access these facilities in a common way, and portable code becomes easier. Until then, you either use a library or code for that particular OS, which is what people have been doing for many, many years.

Not really simultaneously. by master_p · 2007-12-17 09:00 · Score: 1

It happens as fast as the bus can multiplex requests from multiple CPUs.

Re:Not really simultaneously. by caerwyn · 2007-12-17 09:12 · Score: 1

That applies only to off-CPU communication, which is not what was originally specified. A long in-cache computation could very easily happen simultaneously on multiple cores at once.

Also, not all architectures require such serialization for memory access anyway.

--
The ringing of the division bell has begun... -PF
Re:Not really simultaneously. by master_p · 2007-12-17 09:41 · Score: 1

Indeed, but caching is only an optimization mechanism; its purpose is not to multitask.

Time to revisit OCCAM ? by RogueCode · 2007-12-17 09:08 · Score: 1

http://en.wikipedia.org/wiki/Occam_programming_language I remember using this language back in the late 80s. Very simple parallel _constructs_ available in the language itself, backed by machine-level support available on the Transputer chips. One idea that comes to mind is to write a T800 emulator that can exploit today's multicore-processor capabilities.

Multicore/parallel != Faster by EkriirkE · 2007-12-17 09:17 · Score: 1

Multicore just means more apps/threads can run in a multitasking environment without impacting each other. The title is misleading. Though the article is true to the point of not many programmers program for parallel processing, not all applications can make use of such.

--
from 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 to 45 2F 6E 40 3C DF 10 71 4E 41 DF AA 25 7D 31 3F

Embarrassingly parallel processing by suitti · 2007-12-17 09:19 · Score: 1

SETI@Home works by breaking up the problem into billions of small chunks. As far as i know, no effort was made to make the work units themselves run parallel. If you want to do that, run more than one on your box at a time. It took some effort to make SETI@Home work, though. This idea works for searching large number spaces - for primes, and such. This idea is basically "Hungry puppies". If you can get your app to work this way, then you can scale up to nearly any practical limit. (SETI@Home ran into bandwidth limits, sending out units - they eventually required work units to do more work).
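
The "hungry puppies" idea can be sketched as a pool of workers that each grab the next work unit whenever they are free. This is only an illustration, not how SETI@Home itself is built: the prime-counting task, the unit size and the names `count_primes`, `run_work_units` and `puppy` are all assumptions.

```python
import queue
import threading


def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division: one self-contained work unit."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_work_units(limit, unit_size=1000, workers=4):
    """Split [0, limit) into units and let idle workers pull them one by one."""
    units = queue.Queue()
    for lo in range(0, limit, unit_size):
        units.put((lo, min(lo + unit_size, limit)))
    results = queue.Queue()

    def puppy():
        # Keep eating work units until the bowl is empty.
        while True:
            try:
                lo, hi = units.get_nowait()
            except queue.Empty:
                return
            results.put(count_primes(lo, hi))

    threads = [threading.Thread(target=puppy) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so the result queue is complete.
    total = 0
    while not results.empty():
        total += results.get()
    return total


if __name__ == "__main__":
    print(run_work_units(10_000))
```

Because no unit depends on another, the only shared state is the two queues, and adding workers (or machines) needs no change to the units themselves.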

Another approach is SIMD - Single Instruction, Multiple Data. Thinking Machines had hardware and software that used this approach. A single host sent out "instructions" on broadcast to thousands of processors. Each processor had its own segment of the data to work on, but did the same instructions as all the others. This can be done completely in software on a MIMD machine, like a modern multichip, multicore SMP machine. Thinking Machines had two languages to support SIMD - C* (C star) and Lisp* (Lisp star). They had operations that could be sent out.

A down side of SIMD also affects vector machines. The non-vectorizable part of the code has to be run on the single processor host. Even if the vector processor is infinitely fast, if 30% of the application doesn't run on it, then you must wait at least that long. In this case the best you could do is about 3.3 times your original speed (1 / 0.3).
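
That limit is Amdahl's law. A small sketch, with the function name `amdahl_speedup` and the sample speedups chosen purely for illustration, shows how quickly the serial 30% dominates:

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only the non-serial part runs faster.

    serial_fraction: share of run time stuck on the single host (0..1).
    parallel_speedup: how much faster the remaining part runs.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)


if __name__ == "__main__":
    for s in (2, 10, 100, float("inf")):
        print(s, round(amdahl_speedup(0.30, s), 2))
```

Even with an infinitely fast vector unit the result never exceeds 1 / 0.3, so shrinking the serial part pays off more than speeding up the parallel part further.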

Again, on a modern MIMD system, a hybrid approach might work. The SIMD controller might itself run on multiple threads. There might be several instruction streams broadcast to processors.

--
-- Stephen.

The cure is the Actor programming model by master_p · 2007-12-17 09:25 · Score: 2, Interesting

The cure for solving all of parallel programming problems (deadlocks, priority inversion etc) is the Actor model: each object is a separate thread, and calling a method does not invoke code, it only puts a request in the message queue of the called object. Then the thread behind the object wakes up and processes the requests.

If an object wants a result from another object, then it obtains a future value that represents the result of the computation when it will be ready. When the caller wants the actual value, it blocks until the result is available.
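
A minimal sketch of the two paragraphs above, in Python rather than the C++ of the demo (which is not shown here). The `Actor` wrapper, its `send` and `stop` methods and the `Counter` example are hypothetical names. This simplified version blocks in `result()` and does not implement the nested message loop, and Python's `queue.Queue` uses an ordinary lock internally rather than spinlocks.

```python
import queue
import threading
from concurrent.futures import Future


class Actor:
    """Wraps a plain object; method calls become messages for one thread."""

    def __init__(self, target):
        self._target = target
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, method, *args):
        """Queue a request and return a future for its eventual result."""
        fut = Future()
        self._mailbox.put((fut, method, args))
        return fut

    def stop(self):
        """Ask the actor thread to exit once earlier messages are handled."""
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        # The thread behind the object: wake up, process one request, repeat.
        while True:
            msg = self._mailbox.get()
            if msg is None:
                return
            fut, method, args = msg
            try:
                fut.set_result(getattr(self._target, method)(*args))
            except Exception as exc:
                fut.set_exception(exc)


class Counter:
    """Ordinary unsynchronized state; only the actor thread touches it."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


counter = Actor(Counter())
futures = [counter.send("add", 1) for _ in range(1000)]
total = futures[-1].result()  # blocks until the last request is processed
counter.stop()
```

Since every request to `Counter` runs on the same thread in arrival order, the object itself needs no locking; the only synchronized structure is the mailbox.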

Of course, blocking on a result would cause a deadlock in recursive algorithms...therefore, objects don't wait for a result, they simply enter a new message loop at the position they wait for a result. When the result is ready, the callee wakes up the caller by putting a 'terminate current loop' message in the caller's message loop after the result is computed.

The Actor model, implemented as described above, not only solves the problems of classical parallel programming (deadlocks, priority inversion, etc), but it also exposes whatever parallelism is there in a program.

Synchronization is performed only in two places:

1) when inserting/removing elements in an object's queue.
2) when adding the current thread into the waiting list of a future value.

Both synchronizations are implemented via spinlocks. In the case of the queue, there is no need to synchronize on all the queue, just on the edges.

I have made a demo in C++, using Boehm's garbage collector (it is a quite complex system, it needs gc), and it works beautifully. With this model, there is no need to use mutexes, semaphores, wait conditions, or any other synchronization primitive.

I chose C++ because:

1) operator overloading allows future values to be treated naturally like non-future values.
2) when waiting for a result, the waiting thread puts itself in the waiting list of the future. The nodes of the list are allocated on the stack; only c/c++ can do this, and it is crucial, because it minimizes allocation.

Another advantage of this system is that tail recursion comes for free: when you call a method which you don't want the result of, the local stack is not exhausted, because there is no call, only a message placed in a queue.

Patterns like the producer/consumer pattern come for free: one object simply invokes the other.

Data parallelism comes for free: invoking a computation on an array of objects will execute the computations in parallel, on each element of the array. For example, increasing the elements of an array can take O(N) with one CPU and O(1) with N cpus.

Of course, it is much slower on two or even four cores than the same sequential code. But given 10 or more cores, programs start to exhibit linear increase in performance, depending on algorithm of course.

The system is much like the nervous system of an animal: signals are transmitted slowly from one nerve to another, but processing is parallel, so the organism can do many things at the same time.

Another similarity between this system and the nervous system of an animal is that when a nerve wants to transmit an electrical signal to another nerve, the nerves must synchronize, much like there has to be synchronization when an object puts a message in the queue of an object on another thread.

Re:The cure is the Actor programming model by iluvcapra · 2007-12-17 10:25 · Score: 1

Pal, you forgot to put up a link to your demo!

--
Don't blame me, I voted for Baltar.
Re:The cure is the Actor programming model by TheRaven64 · 2007-12-17 12:53 · Score: 1

You don't need spinlocks for the message queues. I rewrote the EtoileThread package a while ago to use a hybrid locked-lockless model and to switch between the two depending on the state of the queue (it switches to waiting on a condition variable when a particular thread has spun for a while). I use a modified version of the lockless message queue described in Keir Fraser's thesis.
I did this in Objective-C, which has some advantages over C++. The first is that messages (method calls) are first-class objects, and so can be passed across threads transparently (i.e. the code is identical when calling a method in an object in its own thread and calling a local object, the only difference is the constructor you use when creating the object). I also implemented futures using this mechanism. If your method returns an object, then it is replaced by a proxy object which blocks when you send it any messages (if you read Lambda the Ultimate, you might have seen this already).
Of course, you can't really do this properly in an imperative language. A functional language with built-in support for message passing and process creation (like Erlang) is the way to go, long-term.

--
I am TheRaven on Soylent News
Re:The cure is the Actor programming model by ceoyoyo · 2007-12-17 18:08 · Score: 1

Sounds like what's already implemented in OS X's distributed objects. Even better, your distributed object running on another processor need not be in the same machine.
Re:The cure is the Actor programming model by master_p · 2007-12-17 21:46 · Score: 1

When Boehm fixes the bugs in his gc (version 7.0, Windows), I will make an open source library.
Re:The cure is the Actor programming model by master_p · 2007-12-17 22:11 · Score: 1

"You don't need spinlocks for the message queues"

I don't believe that it can be done, at least on 80x86. At the very least, an atomic test and set is required to avoid concurrency issues.

If you mean that you don't need spinlocks at every case, yeap, that's true: a thread can write to the end of the queue while another thread can read from the queue, as long as the two threads manage unrelated data.

"I did this in Objective-C, which has some advantages over C++. The first is that messages (method calls) are first-class objects, and so can be passed across threads transparently (i.e. the code is identical when calling a method in an object in its own thread and calling a local object, the only difference is the constructor you use when creating the object)."

It's no big deal in c++ as well. All you have to do is write some classes which wrap method pointers. I wrote a 20-line program which automatically produced the code for const, non-const, void and non-void methods.

"I also implemented futures using this mechanism. If your method returns an object, then it is replaced by a proxy object which blocks when you send it any messages"

I did the same, but I replaced blocking with message processing, so as that I can't have any deadlocks.

"(if you read Lambda the Ultimate, you might have seen this already)."

I am sorry, but the people at LtU are a bunch of morons. No wonder their pet languages go nowhere. And I call them morons in the scientific meaning of the word: they can never justify the reasons why pure functional programming is the best there is, or why state is the root of all evil. I have put so many arguments against theirs, and when they can't answer, they change the subject. I strongly believe that state is not the root of all evil, undisciplined use of state is (just like undisciplined use of everything else is bad as well). And the so-called advantages of functional programming are not advantages of functional programming at all, because the very same advantages can exist in imperative languages. They conveniently ignore the fact that imperative programming is way simpler in real life, and they have never taken the challenge of proving their claims: they never showed me, for example, despite my numerous calls, purely functional code that does the model-view-controller paradigm, or a pure functional sorting algorithm as quick as in-place quicksort. Good luck sorting a database table of thousands of records with functional quicksort. And they have not yet recognized the fact that their languages can't catch logical bugs, just like imperative languages can't (I once had a professor who insisted his purely functional ML multithreaded code did not block...but when I ran the program, it blocked randomly after a set of iterations. Yeah, thanks a lot), and that's where the problem is.

"Of course, you can't really do this properly in an imperative language. A functional language with built-in support for message passing and process creation (like Erlang) is the way to go, long-term."

Bullshit. I've done it in C++, using Boehm's gc and pthreads, and the code was less than 100 lines (excluding the code for the method wrappers). Actually, 95% of the code was about a parallel operator new which returned a future instead of a pointer, and a class which managed threads (so I can reuse them easily).

I apologize for the harsh tone, but I am extremely frustrated by comments like 'language X can't do this' or 'only this can do that' without any proof...and the web is full of that.
Re:The cure is the Actor programming model by shutdown -p now · 2007-12-18 01:45 · Score: 1

Can you elaborate on what problems there are in the present version of Boehm GC?
Re:The cure is the Actor programming model by master_p · 2007-12-18 10:45 · Score: 1

There is a problem with GC_INIT() which prevents threads from being correctly initialized. The collector works OK without threads.

The bug is in the .gz file that can be grabbed from the homepage. It has been corrected in the CVS (I just found this out this afternoon).

Mozilla Bugs burn most of my CPU by billstewart · 2007-12-17 09:43 · Score: 1

I'm not a gamer, and gamers have been the big performance burners for desktop PCs for the last few years. I do typical office work and browsing, and would really rather have a desktop that sits quietly in the background and doesn't go dancing and rotating just because there's spare CPU.

Occasionally I'll do something that needs a few seconds of CPU time, such as recalculating a spreadsheet or having Word re-juggle large parts of a document, or doing public-key encryption setting up a VPN tunnel, but normally my CPU's running at about 5%, and the only time it really burns a lot of CPU is when Mozilla gets cranky about something and decides to burn all the CPU it can get, probably running some badly-written Flash or Javascript dancing ad banner, so a multicore CPU would just lose one core to Mozilla instead of the whole thing. (And yeah, maybe a newer Mozilla version would help, and I could run Adblock, but basically that would just mean my CPU would be idle even more of the time.) Once in a while I'll watch movies on the PC, which can burn a bit of CPU but is still basically limited by download speed.

Obviously servers are an entirely different case, but for the most part they're either running work that's easily parallelized (serving lots of web pages or other kinds of sessions) or else they're running a database application that gets lots of transactions dumped into it, so the DBMS needs to have its parallelism written carefully but everything else is still multiple applications. So they'd do well switching to multicore, but otherwise it's really just the gamers who need the extra horsepower.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Can we have cake, too? by treofan · 2007-12-17 10:03 · Score: 1

...or is that a lie?

Is this "news?" by timlevin · 2007-12-17 10:10 · Score: 1

I know that the author of the NY Times article is well known, but there seems to be nothing new in his article or mlimber's abstract of it. Perhaps people just needed a parallel computing discussion, but is this "news?"

Flawed assumption by Estanislao Martínez · 2007-12-17 10:12 · Score: 1

The basic flawed assumption that you (and hundreds of other people in this discussion) are making is that this is all about "speed," in the sense of raw computation throughput.

Speeding up intensive computations by splitting them up across threads that run in parallel in separate execution units isn't the only application of concurrent programming. Here's another one: making interactive or real-time programs more responsive, by flexibilizing the order in which the program can perform its computations, and allowing "important" events to preempt less important computations. Using concurrency to make software more responsive, in fact, often makes programs slower in raw throughput (since they have to pay the cost of complex scheduling and synchronization of all their threads), but it's very often OK to make a program objectively slower if it makes it feel subjectively faster to the end users.

Concurrency has had the potential to improve software since long before multi-core processors became commonplace. The recent mainstreaming of multi-core processors, contrary to what the story here would have us believe, really isn't all that relevant when you evaluate the industry's failure to develop and market advanced solutions for developing concurrent software. The absence of advanced tools for concurrent programming has been a problem since long before, and the cost is in usability of software.

--

Are you adequate?

Re:Flawed assumption by teg · 2007-12-17 17:20 · Score: 1

, by flexibilizing the order
"Verbing weirds language" - Calvin

keeping applications responsive by jordan314 · 2007-12-17 10:17 · Score: 1

Exactly, before we tackle putting 8 core processors to full use, can we solve the spinning beach ball of death? I hardly ever use my dual core system at peak capacity, but no matter how fast my machine is, sometimes it still just locks up.

Re:concurrency - the developer's responsibility? by Chirs · 2007-12-17 10:21 · Score: 1

"But Windows doesn't provide any less help with multi-threaded programming than Linux."

I suspect the OP is talking about things like scheduler performance, task switch latency, per-thread overhead, etc.

Re:concurrency - the developer's responsibility? by LWATCDR · 2007-12-17 10:33 · Score: 1

"I suspect the OP is talking about things like scheduler performance, task switch latency, per-thread overhead, etc."
But that doesn't make a lot of sense. You don't put that into a programming language at all.
And if you want to start a fight over scheduler performance Linux is probably the best of all possible OSes to do it on.
It was an off topic flame at Microsoft. I am not a big Windows fan but this is just dumb.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.

Re:parallel programming course in University by Anonymous Coward · 2007-12-17 10:37 · Score: 1, Informative

Yes, you can just take a standard sort algorithm and run it multithreaded. Have you heard of the Quicksort and Mergesort algorithms? Did you learn linear algebra in University?

Both Quicksort and Mergesort are divide-and-conquer algorithms. They are recursively defined to split the data-set and sort each part independently. In fact, most any recursively defined algorithm will run multithreaded.

Linear algebra lends itself to multithreading. A matrix product is several independent dot products, which each can run on separate cores.

Modern Fortran compilers already support parallel matrix operations, without any code change. There is no reason why another language's libraries couldn't also have hidden parallelism.
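
The matrix-product point above can be sketched as follows: each output row is a set of independent dot products, so rows can be handed to separate workers. This is an illustration in Python with made-up names (`dot`, `parallel_matmul`); in CPython the interpreter lock means threads won't actually speed up pure-Python arithmetic, so treat it as the shape of the decomposition, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(row, col):
    # One dot product; it shares no mutable state with any other.
    return sum(a * b for a, b in zip(row, col))

def parallel_matmul(a, b, workers=4):
    cols = list(zip(*b))  # columns of b, computed once and only read afterwards

    def row_times_b(row):
        return [dot(row, col) for col in cols]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each row of the result is an independent task.
        return list(pool.map(row_times_b, a))

product = parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because the tasks only read their inputs, no locking is needed, which is what makes this kind of parallelism easy to hide inside a library.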

Re:Oh, wow by bladesjester · 2007-12-17 10:46 · Score: 1

Not to mention that, from what I've seen and heard from people who have worked there, they pay pretty well with decent benefits, the place is relatively relaxed, and they actually encourage their people to have a life outside of work.

How horrible and terrible that must be for them. Can't imagine why *anyone* would want to work there (for the less observant of you, this was meant as sarcasm)

I hate to burst the gp's bubble, but they do actually make a lot of good things there (I, for one, am quite fond of Visual Studio. It makes my life a great deal easier). Like every company, there are good products and products that need some work, but on the whole their stuff is pretty good.

Also, as you said, a lot of the *really* smart people are in their R&D labs, working on things that are 5+ years in the future.

--
Everything I need to know I learned by killing smart people and eating their brains.

Memory matters, too by Furry Ice · 2007-12-17 12:47 · Score: 2, Insightful

I see a lot of comments indicating that all a programmer needs to do to scale to more cores is just multithread their algorithms. If only that were true! Unfortunately, memory access patterns become extremely important for getting good performance; that requires some pretty sophisticated knowledge about the hardware, and proper tuning is almost a black art. Once large numbers of cores are in use, scaling your software optimally is going to be very difficult. Don't delude yourself. Talented programmers are going to be very much in demand, and I suggest starting to learn everything you can about it now. For starters, Ulrich Drepper has written an incredibly detailed and helpful article available at http://people.redhat.com/drepper/cpumemory.pdf which should really help dispel any notions that this change to computing is going to be easy!

Developers need to get out for lunch by dbIII · 2007-12-17 12:54 · Score: 2, Interesting

Even a child's toy like the Nintendo DS from 2004 has two cores. Developers need to remember it isn't the early 1990s anymore and that they will have to deal with multiprocessor machines.

Re:Developers need to get out for lunch by dbIII · 2007-12-17 18:11 · Score: 1

Why should they? Their job, is to implement software that has the required functionalities. And since software size is increasing exponentially, that's the hell of a job

It is extremely frustrating looking at the load monitor on an 8 CPU system, seeing one CPU at 100%, the rest idle and knowing that that application will not finish its task for two more days (numerical processing). The user can get around that by running the application eight times with an eighth of the dataset but they don't because well written similar applications are multithreaded. The OS is not at fault, the hardware is not at fault - the application was not written to use the available resources. On the desktop graphics programs are intensely frustrating if they only use a single thread. Even the game WoW became multithreaded in the last patch with significant performance gains on multiprocessor systems. Multi-CPU systems started becoming common in the mid 1990s so developers being a decade behind the times is a little embarrassing and there are many situations where the task is not completely serial.

Merge sort by tepples · 2007-12-17 12:55 · Score: 1

"If you have 2 cores sorting a list, that list effectively has to be in 2 places at the same time (ie each core's cache)."

Divide the array list into two halves. Have each core run merge sort, quick sort, heap sort, or whatever sort on its own half. Then use one of the cores to perform one step of merge sort to merge the two halves together. Did I miss anything?
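
That recipe is short enough to sketch directly. This is a Python illustration with a made-up function name; `sorted` stands in for "whatever sort", and `heapq.merge` does the single merge step. Under CPython threads mostly take turns rather than running on two cores, so this shows the structure rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

def two_way_parallel_sort(items):
    mid = len(items) // 2
    halves = [items[:mid], items[mid:]]  # each worker gets its own copy
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Sort the two halves independently.
        left, right = pool.map(sorted, halves)
    # One worker then does a single merge pass over the two sorted halves.
    return list(merge(left, right))

result = two_way_parallel_sort([5, 2, 9, 1, 7, 3])
```

The final merge is serial and touches every element, so with two workers the best case is roughly a sort of n/2 elements followed by an n-element merge.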

Re:I'm sure no one will ever read this, but by TheRaven64 · 2007-12-17 12:59 · Score: 1

"And let's not forget manual memory allocation is a very *costly* operation: it may take thousands of cycles to allocate a memory block...in garbage collection, allocation is usually an increment of a pointer."

This is very misleading. If you are using generational garbage collection, then allocating short-lived objects is simply a matter of incrementing a pointer. However, if you allocate an object on the stack in C++, or use a non-freeing NSZone (in Objective-C) then you are also just incrementing a pointer. If you are using a generational garbage collector, you have to copy objects if they are long-lived (i.e. if they are allocated using the short-lived object mechanism and then not collected at the end of the current generation).
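
For anyone who hasn't seen "allocation is an increment of a pointer" spelled out, here is a toy version in Python. `BumpAllocator` is a made-up class, not any real collector's or zone's API; it hands out offsets into one buffer and can only free everything at once, which is roughly what a nursery generation, a stack frame, or a non-freeing zone has in common.

```python
class BumpAllocator:
    """Toy arena: allocating is a bounds check plus a pointer increment."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.top = 0  # the "pointer" that gets bumped

    def alloc(self, nbytes):
        if self.top + nbytes > len(self.memory):
            raise MemoryError("arena exhausted")
        offset = self.top
        self.top += nbytes
        return offset

    def reset(self):
        # Everything is released together; there is no per-object free.
        self.top = 0

arena = BumpAllocator(64)
first = arena.alloc(16)
second = arena.alloc(8)
```

The cheap part is `alloc`; the cost the reply is pointing at shows up elsewhere, when survivors have to be copied out before the arena is reset.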

You seem to have a very rose-tinted view of garbage collection. It has numerous advantages, but performance is not (yet) one of them.

--
I am TheRaven on Soylent News

Re:concurrency - the developer's responsibility? by tepples · 2007-12-17 13:27 · Score: 1

"An OS could provide sort libraries"

A sort library requires a comparator callback, and who manages concurrency for the comparator callback?

"If any effort is put into the average Linux distro to improve multi-threaded support please let it be in X-Windows. As far as I know X is still single threaded."

X runs in at least two processes: the window system and the window manager.

Re:Oh, wow by JAlexoi · 2007-12-17 13:48 · Score: 1

We bash in attempt to convince those smart people to leave MS and work in a more open way.

threads are too high level by Zork the Almighty · 2007-12-17 14:10 · Score: 2, Insightful

Perhaps I am the only person who thinks this, but it seems to me that threads are not a very good low-level primitive for concurrent programming. They inherently assume that whatever is running on the different processors is independent. As a result, writing a tightly coupled parallel algorithm is "hard".

I would much rather the operating system switch 4 or 16 synchronized cores completely over to me. Add prefixes to the assembly instructions so that I can explicitly execute instructions on processor 1, 2, 3, etc, in a shared memory model. Add logic similar to simultaneous multithreading to keep unused cores saturated with instructions from other threads when possible. This would help the programmer extract parallelism from tightly coupled algorithms. There seems to be no real multithreaded analogue to assembly language, and I think that is a big part of the problem. If we had such a thing it would be much easier to write tightly coupled parallel code, and higher level parallelization (from compilers) would follow inevitably.

Of course I'm not saying this is some sort of magic bullet. We would still need to split up computations and use threads as best as possible, but I think this is an obvious tool that we are missing.

--

In Soviet America the banks rob you!

Re:Oh, wow by bladesjester · 2007-12-17 14:11 · Score: 2, Insightful

We bash in attempt to convince those smart people to leave MS and work in a more open way.

In doing so, you prove yourself a fool. It is a childish action that only hurts your cause, and Microsoft (as well as most people with any business or social sense) knows it.

You see Microsoft as some great evil to be overcome without seeing that a large part of your problem is yourself.

Companies see people like you bash anything that isn't open source or "free" and they quite rightly think that you haven't really thought things out or lack the business acumen to realize why all of the world can't work that way. (Not to mention the extreme lack of social skills that it shows)

I like open source, I use it, I occasionally write it, and I've championed the cause in a sane way.

What you are missing is that Microsoft is giving a lot of people and companies what they want - software that is relatively easy to use and which everyone else is already using ("best" doesn't matter most of the time, which a lot of you have problems understanding).

At the same time, they treat their employees well, paying them well with good benefits (from what I've heard from people I know who work there), and maintain well-respected research labs.

You do not draw good people from a good environment by telling them it's not a good environment because they don't make everything open source. You draw good people by being a better environment in terms of pay, benefits, culture, work-life balance etc *and* appealing to their sensibilities.

If you can't do that, and instead simply bash anyone for associating with "the enemy", you are doomed to fail because, at best, people will work on it as a hobby. The lion's share of good open source software is done by people being paid to do it. Bashing the company of people you want to work for you does not help.

Not all of the world cares about open source, and many of us who do are not fanatical about it and realize that, while it is good for some things, it is absolutely horrible for other things from a business standpoint. We like working on things that we see as important, but we also like being able to pay our bills and having a life outside of work.

--
Everything I need to know I learned by killing smart people and eating their brains.

I tried looking for that reference... by tlambert · 2007-12-17 14:54 · Score: 1

I tried looking for that reference...

But the googles, they do nothing!

-- Terry

specialize, don't parallelize by simplerThanPossible · 2007-12-17 15:14 · Score: 1

Perhaps multi-cores will become specialized, along the lines of videocards, diskdrives and ADSL modems (which have their own CPUs these days); and the nascent physics card. This is a natural effect, and occurs in biology (organs in your body), and the size of commercial organizations.

Perhaps we can further divide up the tasks?
Eg. caching, in the right way, can yield incredible performance benefits - so, perhaps a predictive caching processor?

ANYTHING but parallelizing the unparallelizable.

Re:Threads Are Not the Answer by JoelKatz · 2007-12-17 16:04 · Score: 1

Threads are a superset of a message passing interface. With threads, you can trivially implement a message passing interface if that's what works best. The thing is, you don't have to use message passing if it's not.
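
A rough illustration of the "trivially implement" claim, in Python: two threads that share nothing except a pair of thread-safe queues. The `worker` function and the `None` shutdown sentinel are choices made for this sketch, not a standard protocol.

```python
import queue
import threading

def worker(inbox, outbox):
    # Receive messages until the sentinel arrives; reply to each one.
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put(msg * 2)

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in (1, 2, 3):
    inbox.put(n)
inbox.put(None)  # ask the worker to stop
t.join()

replies = [outbox.get() for _ in range(3)]
```

The queue does all the locking internally, so the code using it never touches a mutex; the threads underneath are still free to share memory directly wherever that works better.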

Re:Oh, wow by El Cabri · 2007-12-17 16:08 · Score: 1

FYI, no programming language compiler (except maybe Ada) has ever fully complied with any particular version of any international standard of any programming language. Programming language standards are just a convenient way to document what a compiler does: say it's a C++ compiler, it compiles the C++ standard, already written, to which you add your own delta, which can usually be written up more concisely. Or a guide to new implementations. Or a guide to draft programs that will be relatively easily portable across platforms.

Microsoft just has a past history of being a little bit more officially disdainful of the standard, in particular C++90, claiming that compliance was not a feature asked for by their users, of which there are a lot. Technically it's true. But look at C#: not only did they do a very open standard unlike Java, but also an elegant and concise one at that. And very closely implemented by their own compilers of course...

Re:Wait for the new C++ standard before you switch by JoelKatz · 2007-12-17 16:09 · Score: 1

This is a silly argument. We've had a stable POSIX threads API for 10 years. No standard says the computer can't refuse to run your application entirely.

Disk is the bottleneck by Tablizer · 2007-12-17 16:45 · Score: 1

I really don't think that CPU is the bottleneck in most cases, but rather the hard-drive. When my machine is slow (outside of networking issues), usually I see the hard-drive light flashing like a UFO on crack.

--
Table-ized A.I.

Re:Oh, wow by wsgeek · 2007-12-17 17:48 · Score: 1

Not sure if the following belongs in a MS- and C++ article on parallel programming, and also not wanting to sound like a fanboy, but there is some interesting stuff going on in this area in the OS X / Cocoa camp. A nice introduction can be found on Apple's developer website as well as the marketing literature for Leopard. I'm sure that the Java folks and .Net folks have something also, but this is worth noting for the sake of completeness.

Re:Oh, wow by bladesjester · 2007-12-17 17:58 · Score: 1

Not sure if the following belongs in a MS- and C++ article on parallel programming, and also not wanting to sound like a fanboy

Don't sweat it. It's always nice to be able to compare with people in other groups. It lets you know what you're doing right that others may or may not be, shows you places where you might be deficient, and generally gives you the ability to look at the issues from a different angle.

Those, in my opinion, are positive things because they're one of the ways we learn to do what we do better.

--
Everything I need to know I learned by killing smart people and eating their brains.

Re:Threads Are Not the Answer by duggi · 2007-12-17 18:39 · Score: 1

The problem is that our brain has been doing linear thinking for a long time. We really don't have the ability to multi-thread, and I don't know if it is possible to train our brains to think "in threads". An algorithm is essentially a single-thread model, and so are many of the control statements (if, for and while). I don't think it is sufficient to make a loop run in 100 threads for faster time; the loop in itself needs to be changed. It is a special ability, and we might just end up abusing multi-threading by incorporating it into our linear brains. Still, it might be a start, and I really think it would do a lot of good for the future of computing.

--
http://monkeynesianeconomics.blogspot.com/

Chip makers are at least 2 decades behind by ClosedSource · 2007-12-17 18:52 · Score: 2, Insightful

"Multi-CPU systems started becoming common in the mid 1990s so developers being a decade behind the times is a little embarrassing and there are many situations where the task is not completly serial."

So after a decade of poor adoption on the part of software developers, the chip makers have ignored the fact that the wisdom of the (programming) mob indicates that multi-processing is not an attractive solution. Chip makers have known for more than two decades that they were going to run into physical limits eventually using the current technology, but opted for milking the 1970's model as long as possible rather than developing new technologies that might lead to much better single-core performance.

Microsoft Robotics Studio by robbo · 2007-12-17 19:35 · Score: 1

If I can put my $0.02 in, I'd say to be on the lookout for more apps developed using the concurrency runtime in MSRS. Speaking as someone who works with it every day, I have to admit it took a while to get used to the task/message port paradigm. Once you grok it, though, it's an extremely powerful and elegant way to write parallelized code. Imagine writing multithreaded code without your trusty mutex primitives and condition variables, and where in most cases C#'s lock {} is useless. Objects implement service contracts, and mutual exclusion is enforced by defining interleaves over message handlers. Tasks (very light-weight execution contexts) can be instantiated using anonymous delegates, and the runtime automagically instantiates as many threads as you need, depending on your hardware.

--
So long, and thanks for all the Phish

Referential transparency anyone? by Wiseman1024 · 2007-12-17 20:24 · Score: 1

Referential transparency + memoization + automatic load balancer in a truly distributed system = no threads to care for = give me 64 cores
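
A small-scale sketch of the first two terms of that equation, in Python: a pure function can be cached and farmed out to workers in any order without changing its answers. `collatz_steps` is just a convenient pure function picked for the example; the thread pool stands in for the "automatic load balancer", and nothing here is distributed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

@lru_cache(maxsize=None)
def collatz_steps(n):
    # Pure: the result depends only on n, so caching it is always safe.
    if n == 1:
        return 0
    nxt = n // 2 if n % 2 == 0 else 3 * n + 1
    return 1 + collatz_steps(nxt)

with ThreadPoolExecutor(max_workers=4) as pool:
    # The caller names the inputs; which worker computes what is not its concern.
    steps = list(pool.map(collatz_steps, [6, 7, 6, 27]))
```

If the function had side effects, neither the cache nor the arbitrary scheduling would be safe, which is the whole argument for referential transparency here.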

--
I was about to say 13256278887989457651018865901401704640, but it appears this number is private property.

some problems are fundamental by sentientbrendan · 2007-12-17 20:33 · Score: 1

Some of the major computational hurdles that the average Joe runs into just aren't amenable to parallelization. Video games are probably the biggest example. A traditional graphics engine is not a ridiculously parallelizable business. If your bottleneck is your ability to generate polygons to pump into the graphics card, then more cores aren't going to do you a lot of good since actually pushing things to the graphics card needs to happen in a synchronized fashion anyway.

If you have a lot of raw computation that needs doing, then more cores can be great depending on the particular algorithm. However, most people aren't running into raw computational bottlenecks, but bottlenecks that have to do with latency between the CPU and hard drive, or the CPU and memory, or network latency. Other issues include having user processes compete with background tasks. Vista and to a lesser degree XP were both pretty bad in terms of constantly reading or writing from disk for some random reason, which will really screw any other IO intensive applications you are using.

Really, modern personal computers still seem slow in many ways, but those ways are largely due to IO limitations, and poor management of competition for IO by the operating system. Until these bottlenecks are addressed, there isn't that much incentive to make the CPU intensive part of applications, which are already fast, faster by using parallel programming techniques.

Re:I'm sure no one will ever read this, but by master_p · 2007-12-17 21:31 · Score: 1

It's not misleading at all; if you had ever studied the subject, you would see how things are.

First of all, allocating an object on the stack can work only if we are sure that the address of this object will not escape the stack frame it is declared in. If we are not sure (perhaps because the algorithm we have at our hands is complex), then we have no choice but to allocate the object on the heap.

Secondly, adding garbage collection to C++ does not mean that you can't allocate objects on the stack any more. Garbage collection for C++ exists (Boehm's gc), but it is way more inefficient and problematic than it ought to be.

Thirdly, raw data copying is a very quick operation. Even the raw PCI bus can move 133 MB/second...this means that, for most programs where the actual live set is very small (usually under 64 MB), copying takes very little time to be executed.
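
Taking the two figures in that paragraph at face value, the copy time works out as below. The variable names are made up, and the result is a bound for bus-speed copying only; memory-to-memory copies are not limited by PCI bandwidth.

```python
live_set_mb = 64      # upper end of the live-set size quoted above
pci_mb_per_s = 133    # PCI throughput quoted above

# Time to move the whole live set once at that rate, in seconds.
seconds = live_set_mb / pci_mb_per_s
```

That comes to a little under half a second for a full 64 MB copy at PCI speed, and proportionally less for the smaller live sets the paragraph has in mind.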

Fourthly, C++ has the advantage over languages that do not have stack allocation and templates. In C++, the amount of heap objects is much smaller.

As the situation is right now, Java comes on top in many benchmarks against C++, and one of the advantages is garbage collection. The execution of a Java program might be slower than the corresponding C++ program, but overall Java wins because of optimizations applied in run-time, and the collector helps in that.

Finally, the greatest advantage of garbage collection is that it saves tremendous amounts of money off debugging memory management issues. There is no C++ application that has not crashed, at least once, from a memory management problem. In Java, there is no such thing.

1984 called by CarpetShark · 2007-12-17 21:46 · Score: 1

2005 Called....it wants it's article back.

Seriously - any developer writing modern desktop or server applications that doesn't know how to do multi-threaded programming

Tell 2005 to get off the phone FFS, and let 1984 get a word in ;)

Actually, programmers have been doing multitasking for decades. One could argue that it's taken hardware a long time to catch up. There are very few applications that really need to max out 2 cores, never mind 8, 256, or 4096. Take a look at the number of processes running on any modern OS though; multiple processes are pretty commonplace, even just for what would normally be considered core "kernel" stuff.

Re:Wow, this is a great idea! by somersault · 2007-12-17 22:23 · Score: 1

Yep, that level of optimisation is getting a little silly (especially when it can be done rather easily by the compiler anyway) for most applications, but in general we just don't even need to think about memory requirements or efficiency these days for most apps. It's nice to be able to do that, but for example it would be nice if the lovely chaps down at Microsoft tried to minimise the amount of resources their flippin OS takes up and leave more for the actual 'important' bits - the applications. Oops that almost turned into an MS rant..

--
which is totally what she said

Remember assembler days? by MaksimS · 2007-12-17 22:41 · Score: 1

Funny thing, but this discussion reminds me of decades-old rambling on assembler programming techniques. IMHO, the whole "paradigm" will get rendered obsolete by smarter CLIs being able to automagically ;-) interpret parts of code to fit multicore architecture. This is, after all, one of the features of the CLI layer anyway (besides portability) - to conform to the underlying hardware with minimum or no programming effort.

High level language by oliderid · 2007-12-17 23:14 · Score: 2, Interesting

I guess it will be a dumb question, but:

Why can't a Java virtual machine take the burden of the multi-core adaptation?

They have promised "write once run anywhere"!

Lazy coder :-)

Re:Wait for the new C++ standard before you switch by shutdown -p now · 2007-12-18 00:07 · Score: 1

There is more to it than volatile. For one, you still need explicit memory barriers. One example when that is needed is when you need to implement singleton via double checked locking correctly.
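
For readers who haven't met the pattern, this is the shape of double-checked locking, sketched in Python with a made-up `Config` class. Note the limits of the sketch: CPython's interpreter lock hides the instruction-reordering problem the comment is about, so this shows where the two checks and the lock go, not the memory barriers a C++ version would additionally need.

```python
import threading

class Config:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def instance(cls):
        if cls._instance is None:          # first check, without the lock
            with cls._lock:
                if cls._instance is None:  # second check, under the lock
                    cls._instance = cls()
        return cls._instance

seen = []
threads = [threading.Thread(target=lambda: seen.append(Config.instance()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In C++ the danger is that another thread can observe the pointer as non-null before the object's construction is visible to it; that is the gap the barriers close, and `volatile` alone does not.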

Re:melt in your mouth not in your mobo by Zaatxe · 2007-12-18 00:59 · Score: 1

Yeah, and in Soviet Russia, the Sun melts YOU!

--
So say we all

Re:Study at an accredited, secular university by beyondkaoru · 2007-12-18 02:02 · Score: 1

that's n log(n) comparisons in a comparison sort; we can get around the limitation by, for example, using radix sort, where we do something different from comparison sorts, or by using many many processors, thereby spreading the comparisons around in clever ways. if we go the route of multiple processors, we can achieve very nifty speedups if we have, like, n of them, but even if we only have a few processors, we can get nice speedup from a not-big-oh perspective.

so, basically, n log(n) is not the final word on sorting.
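
To make the radix sort remark concrete, here is a minimal least-significant-digit version in Python. It is a sketch that assumes non-negative integers; the function name and the default base are choices made for the example. It never compares two keys against each other, which is how it sidesteps the n log(n) comparison bound.

```python
def radix_sort(nums, base=10):
    """LSD radix sort for non-negative integers."""
    if not nums:
        return []
    out = list(nums)
    largest = max(out)
    exp = 1
    while largest // exp > 0:
        # Distribute by the current digit; appending keeps each pass stable.
        buckets = [[] for _ in range(base)]
        for n in out:
            buckets[(n // exp) % base].append(n)
        out = [n for bucket in buckets for n in bucket]
        exp *= base
    return out

sorted_nums = radix_sort([170, 45, 75, 90, 802, 24, 2, 66])
```

The work is proportional to n times the number of digits in the largest key, so it wins when keys are short relative to the size of the input.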

--
the privacy of one's mind is important.
you do have something to hide.

CSP is the right way to do Multi-Threading. by ralph.corderoy · 2007-12-18 03:00 · Score: 1

Communication channels are the right way to tackle this. Bell Labs had the right idea. See http://swtch.com/~rsc/thread/ and the slides at http://www.cs.kent.ac.uk/teaching/07/modules/CO/6/31/slides/ if you have an inquisitive mind. For the slides, read them in this order: motivation.pdf -- just pages 1-39, basics.pdf, applying.pdf, choice.pdf, replicators.pdf, protocol.pdf, shared-etc.pdf.

The grammar called... by TeknoHog · 2007-12-18 03:10 · Score: 1

it want's it's apostrophe's back. 's.

--
Escher was the first MC and Giger invented the HR department.

EPIC by John Bayko · 2007-12-18 03:25 · Score: 1

None of these issues indicate that explicit instruction level paralellism (EPIC) is a bad idea.

No, but the fact that compile-time determination of instruction scheduling can't adapt to:

Execution profiles that differ at runtime from what the compiler assumed
Different CPU resources, such as more or fewer FPUs, load/store units, etc, than were foreseen when the instruction set was designed

...limits the flexibility of CPU design improvements. For example, maybe having two high speed specialty adders that can't subtract can greatly speed up certain code. Scheduling instructions at run-time lets the CPU do that, but if your instruction set assumed that all integer ALUs can add and subtract, you have to either include extra functions (maybe slowing them down to regular speed and killing the advantage), or give up on the idea.

CPU design has progressed steadily from static to dynamic optimisation. Complex memory-based indirect addressing modes have been replaced by simple loads and stores, complex multi-function instructions were broken down to simpler basic operations - in both cases, this allowed better optimisation simply because there is more flexibility. The problem with VLIW, Itanium included, has always been that it's a step back from dynamic to static decisions - EPIC just got rid of the no-ops by encoding the VLIWs a bit better.

Itanium does have a lot of other really nifty things about it, but those were not new. Its speed comes from the DSP world, and almost all of its features are just to compensate for the problems DSPs have trying to deal with non-DSP issues like interrupts and exceptions, caches (DSPs hate non-uniform memory), function calls, and so on. That ends up making it almost as fast as a much simpler CPU with a DSP coprocessor, for only a few times more cost.

Perl 6 by Dr.Ruud · 2007-12-18 06:41 · Score: 1

Parallel New World: http://www.jnthn.net/papers/2007-fpw-parallelism-slides.pdf

See also http://dev.perl.org/perl6/

Garbage Collection is not for C++. by anwyn · 2007-12-18 08:23 · Score: 1

The key word here is "can". GC naturally sucks for the case you've outlined above, and the reasons are all correct, too. But many large programs out there don't really use that many "unmanaged" (in .NET parlance) resources - they mostly deal with classes and objects representing various entities and operations on them, which don't have any resources but memory to manage (they can use other resources during method invocations, but they don't hold onto them). GC is great for such stuff. The ability to pass around and return complex object graphs without caring about memory allocation is really worth it.

But there are many large C++ programs that do use the "object creation is resource allocation; object destruction is resource release" idiom in a fundamental way. Furthermore it is/was rational for these programs to do so. These programs should not be penalized because some are lazy and want the convenience of GC. Forced GC is too one-size-fits-all. GC encourages a lazy programming style that can and usually does cause leaks of other (non-memory) resources.

Further, large numbers of C++ programs already exist. Such a fundamental change in philosophy would cause all these programs to have to be rewritten. All to support a dubious programming style. It is not going to happen. The people that ask for it do not understand C++ and its problem space. It is just annoying.
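
The worry about non-memory resources can be illustrated in a garbage-collected language. The sketch below uses Python as a stand-in (it is not C++ and not part of any C++ GC proposal); `Connection` and its `open_count` counter are hypothetical names invented for the illustration. It shows that where memory is collected automatically, other resources still need an explicit scope (here a `with` block) to get the deterministic release that C++ destructors provide.

```python
class Connection:
    """Hypothetical non-memory resource; open_count stands in for OS handles."""

    open_count = 0

    def __init__(self):
        Connection.open_count += 1
        self.closed = False

    def close(self):
        # Idempotent release, so calling close() twice is harmless.
        if not self.closed:
            self.closed = True
            Connection.open_count -= 1

    # Context-manager protocol: release at scope exit, not at collection time.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def count_while_open():
    """Return how many connections are open while inside the scope."""
    with Connection():
        return Connection.open_count


def fail_while_open():
    """Raise inside the scope; the connection is still released on the way out."""
    with Connection():
        raise RuntimeError("simulated failure")
```

Without the `with` block, the handle would stay open until someone remembers to call `close()`, which is the kind of leak the comment above describes.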

Re:Garbage Collection is not for C++. by shutdown+-p+now · 2007-12-18 09:46 · Score: 1

No one is forcing GC and the associated idioms upon C++. If you read the C++09 GC proposal closely, you'll notice that it gives the programmer very fine-grained control over the allocation; you can explicitly require or explicitly disallow GC for parts of your code, you can decide whether you want GC to be involved or not for every new allocation you make, and you can define reasonable default behavior on a per-class basis. If you really don't want GC in your programs, fine, you can disable it altogether; third-party libraries which require GC won't work for you then, but nothing in the standard library does that. What is wrong with such an arrangement?

> Further, large numbers of C++ programs already exist. Such a fundamental change in philosophy would cause all these programs to have to be rewritten.

Not at all. All the stuff that's working now won't magically stop working if you plug the GC in. No one forbids you from allocating objects on the stack and relying upon RAII. It is quite possible to mix the RAII code and the GC code - I've done so in a rather large commercial C++ project myself.

> The people that ask for it do not understand C++ and its problem space. It is just annoying.

Tone down, please. A lot of people already use third-party GCs (such as Boehm GC) in production code, where it makes sense, and the results are good. One particular case I know of is the Digital Mars C++ compiler. I think that it is not wise to accuse its author, Walter Bright, who coincidentally also happens to be the author of the first ever native C++ compiler (as opposed to a C++-to-C translator), of "not understanding C++ and its problem space".
Re:Garbage Collection is not for C++. by andi75 · 2007-12-19 03:23 · Score: 1

> No one is forcing GC and the associated idioms upon C++.

This is exactly why C++ is so difficult, and we have to live with so much crappy C++ code out there that needs to be maintained, fixed and generally suffered through.

Inexperienced programmer X sees cool feature Y (be it virtual functions, templates, multiple inheritance, exceptions, references, GC or any of the other cool 'shoot-yourself-in-the-foot' thingies that C++ added over C) and totally has no clue how to use it because he doesn't know the associated idioms (not an easy task anyway, because there are so many of them), and a few years later, it ends up on my desk with the usual "we can't maintain this program anymore, please fix it".

Good thing I quit programming and teach math now... :-)
Re:Garbage Collection is not for C++. by shutdown+-p+now · 2007-12-19 07:15 · Score: 1

> Inexperienced programmer X sees cool feature Y (be it virtual functions, templates, multiple inheritance, exceptions, references, GC or any of the other cool 'shoot-yourself-in-the-foot' thingies that C++ added over C) and totally has no clue how to use it because he doesn't know the associated idioms (not an easy task anyway, because there are so many of them), and a few years later, it ends up on my desk with the usual "we can't maintain this program anymore, please fix it".

Actually, there's more to it than that. There are quite a few smart C++ coders who can't restrain themselves and use every trick in the book skilfully: arcane template metaprogramming, heavy use of private and virtual multiple inheritance, reliance upon obscure rules such as SFINAE, user-defined implicit conversion rules, or overload resolution of function templates; and end up writing code which requires at least the same level of understanding of the language to maintain and debug. The end result is a maintenance nightmare. This is probably the best argument I know of against C++ - not the fact that it doesn't have some facilities that Java or C# offer (in fact it does), and not even that it makes it all too easy to shoot yourself in the foot. It's that it lets one "tripwire" the code for all the future maintainers to stumble upon. Perl is also guilty of that, by the way.
Then again, it's also why C++ coding earns more than Java, especially in companies with large legacy codebases to maintain; so who am I to complain? ;)

Re:2005 Called, indeed by tuomoks · 2007-12-18 11:28 · Score: 1

Right. And right again: you can't change the complexity (even if some compilers try), but you can try to design the complexity (or simplicity) yourself. Look at how sorting algorithms and parallelism have developed since the '60s. Not advertising, but Syncsort was and maybe still is one very good example. US radar tracking built highly parallelized sorting systems for incoming radar data at that time; it is still a viable technology and can (maybe) be found in several thesis papers. Parallel processing has been around longer than most think.

Now, to get the most out of parallel processing, don't think of just one algorithm, think systems. Even a lot of SQL access today can use many sorts and parallel I/O, and it doesn't always need the results sequentially; it can combine them later when they are needed, a perfect parallel task while other processing is done. Think compilations: several programs compiled and later combined into one object or library. What's better than compiling in parallel? If you can break a task (big or small: a report, a transaction, a matrix manipulation, a loop, etc.) into smaller pieces, it can often be parallelized. It's not always easy, because something like XML was not originally designed for that, but if you need performance (and flexibility), don't design one huge XML document but several with references, so that even if each needs serializing they can be parsed or created in parallel.

This really is less a coding problem than a design problem, and a little OT because the article was about CPUs, but there is much more to computers, systems and parallelism than just the number of CPUs.
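
The "break it into pieces, work on them independently, combine later" idea can be sketched with a chunked sort. This is only an illustration, not anything the products mentioned above actually do; `chunked` and `parallel_sort` are made-up names. A thread pool is used to keep the sketch simple. In CPython the global interpreter lock means threads will not speed up a CPU-bound sort, so treat it as showing the shape of the design (a process pool could be substituted for real parallelism).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_sort(items, workers=4):
    """Sort each chunk independently, then merge the sorted runs."""
    chunks = chunked(list(items), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent task; no chunk needs another's result.
        runs = list(pool.map(sorted, chunks))
    # The only sequential step: a k-way merge of the already sorted runs.
    return list(heapq.merge(*runs))
```

The merge at the end is the "combine it later when the results are needed" step; everything before it can proceed without coordination.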

Re:Threads Are Not the Answer by tuomoks · 2007-12-18 15:37 · Score: 1

Yes and yes. You explained it better than I usually do. Actually there are such things for software too: it is called multi-tasking. There is a one-time cost to load, and after that it is just the normal call cost (almost, with hardware help) any time you give the task more work (or, as the name says, something to do). If a task or a thread is independent, feeding the system without control once started and not receiving new "orders" (like reader tasks), I would call it multi-programming, but that's just semantics. As someone already mentioned, it gets even more interesting with SIMD or MIMD machines (AP (attached) and MP (multi) for old-timers), and when it involves external resources such as I/O, memory, etc., which can be shared (and/or cached) between processors or separated between them. Or hybrids, 2+2, etc. Making things fast in parallel without problems is an interesting subject, but probably way off from "normal" development; it is just good to remember when trying to make applications run faster.

Also, I agree that on a "normal" desktop more than two processors are mostly not a big benefit, but games, CAD, rendering, statistics, modeling / simulation, pick your own, can benefit greatly from multiple processors if done right. Clusters help, but fast connections between systems are not cheap. And a piece of advice on the start-up / tear-down philosophy: why tear down if it is not a one-time task? Write re-entrant code and reuse it. And if you have two processors, why create more than maybe three equal threads? Are you sure that the resources they use are not serialized? Even if threads are in pools there is a cost to start / stop.
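
A minimal sketch of the "start once, then keep feeding it work" approach, assuming a single long-lived thread and a queue as its inbox. The `Worker` class and its method names are hypothetical, chosen only for this illustration.

```python
import queue
import threading


class Worker:
    """One long-lived thread that is started once and fed work through a queue."""

    _STOP = object()  # sentinel that tells the thread to exit

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()  # the one-time start-up cost is paid here

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is self._STOP:
                break
            func, args, reply = item
            try:
                reply.put((True, func(*args)))
            except Exception as exc:  # report failures instead of dying
                reply.put((False, exc))

    def submit(self, func, *args):
        """Hand the worker more work; returns a queue that will hold (ok, value)."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((func, args, reply))
        return reply

    def close(self):
        """Tear down only when the worker is no longer needed at all."""
        self._inbox.put(self._STOP)
        self._thread.join()
```

For example, `Worker().submit(pow, 2, 10).get()` yields `(True, 1024)`, and every later `submit` reuses the same thread rather than creating a new one.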

Re:The basic problem - I have to remember this by tuomoks · 2007-12-18 17:16 · Score: 1

Correct, and a good answer, though one that can only be used when a certain type of people are around. This is one of the basic problems not too often explained to CS students: throughput and response times are only loosely related. Actually latency plays an even bigger role today because of networks (of any kind), the internet, buses, execution paths, even SOA between services, etc., and it is often overlooked. Now of course there may be some turnaround (setup) time which comes into play, but with computers it's getting faster. I don't know about humans?
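
One way to make "only loosely related" concrete is Little's law, which is a framing added here rather than something the comment states: sustained throughput equals the number of requests in flight divided by the latency of each one. The function name and the numbers below are illustrative assumptions.

```python
def throughput_per_s(in_flight, latency_ms):
    """Little's law: requests completed per second for a given concurrency."""
    return in_flight * 1000 / latency_ms


# Same 100 ms response time, very different throughput.
one_at_a_time = throughput_per_s(in_flight=1, latency_ms=100)
fifty_overlapped = throughput_per_s(in_flight=50, latency_ms=100)
```

Overlapping more requests raises throughput without making any single response faster, which is why a system can look fast in aggregate and still feel slow to each user.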

Re:Multiple Applications, multiple answers by tuomoks · 2007-12-18 17:45 · Score: 1

You definitely should have more points, if I had any.. You answer so many questions so well that there is no point in commenting on them, except that you have FORTRAN there. Kind of related to parallel processing: everyone should know how vector processing and specialized vector FORTRAN did/do it. The problem today compared to earlier systems is the weak links: the compiler really cannot analyze everything in the system. Maybe the data used in one thread is used by another, but because that is not known at compile time there is nothing the compiler can do. I kind of have to agree with just-in-time optimization: the programs can learn, and the longer they run the more optimized they get. Most relational database systems do that (if you let them): the longer they run, the more efficient they get (up to a certain point; even they cannot fix logic errors). And some of them span tens of nodes with hundreds of CPUs and even more.

Re:It's the Curse of the Algorithm and APL by tuomoks · 2007-12-18 18:15 · Score: 1

Thank you, you really had to bring APL into this when I'm having flashbacks! APL was a beautiful language, and because we were one of the first AP (attached processor) systems I got to work on that. Whoever wrote that interpreter was good, except in assembler, so I ended up fixing several bugs in the code when we built (mid '70s? I don't really remember the year) the first real applications: not just math but an HR system based on relational ideas (System R, anybody?). And once you get the keyboard, it is no more difficult than using all the weird ESC, etc. sequences in your favorite editor. Actually it grows on you but, agreed, a little math background is needed. In the end the system ran well on the AP and was fast and easy (the user didn't actually need an APL keyboard), but after one look at the code our developers ran away (screaming!). So that project didn't catch on, but it was lots of fun.

Re:Also, why multithread when you can multiprocess by mkcmkc · 2007-12-25 05:30 · Score: 1

In general, Windows programs tend to use threads because starting processes is expensive. On Linux, starting processes is trivial, so it gets used a lot more often. Having lived through it all, my sense is that threading really caught on because Windows used it. The reason for this, in turn, was that it simply didn't have modern memory protection. Bizarrely, they managed to paint something that was a serious flaw as if it were a feature. ("Look! My car has a manual choke--it's right here on the dash! I bet *your* car doesn't have one...") Unix vendors even came up with threading libraries to match this feature.

The truth, in my humble experience, is that threading is rarely a smart thing to do. There is the occasional rare case, but much more frequently people think they have a rare case when they don't.

--
"Not an actor, but he plays one on TV."

Re:Also, why multithread when you can multiprocess by TheThiefMaster · 2007-12-25 05:50 · Score: 1

Multi-Threading is more efficient than Multi-Processing (as in running multiple processes, not processors) precisely because it has shared memory. Programs can be that much more efficient if they don't have to be copying large buffers around all the time. Multi-Processing's advantage is that it DOESN'T have shared memory, so it's much tighter from a security and reliability point of view.

They're different approaches for different uses. Neither is better than the other overall.
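
The trade-off can be sketched as follows, with the caveat that this is a toy illustration and the helper names are invented for it. A thread edits the caller's buffer in place, while a child process (started here with `subprocess` and a one-line script) only ever sees a copy sent over a pipe and has to send a copy back.

```python
import subprocess
import sys
import threading

# One-line child program: read all of stdin, upper-case it, write it to stdout.
CHILD = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"


def upper_in_thread(buf):
    """A thread shares the caller's memory, so it can edit buf in place."""
    def work():
        for i, b in enumerate(buf):
            if 97 <= b <= 122:  # ASCII 'a'..'z'
                buf[i] = b - 32

    t = threading.Thread(target=work)
    t.start()
    t.join()


def upper_in_process(buf):
    """A child process gets a copy over a pipe; the caller's buf is untouched."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=bytes(buf),
        capture_output=True,
        check=True,
    )
    return done.stdout
```

The thread version avoids the two copies but lets any bug in `work` corrupt the caller's data; the process version pays for the copies and gets isolation in return.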

As for an automatic/manual choke, my current car has an automatic choke, and in the current cold weather it uses it. It's really nice to not have to figure out how much choke you need in order to start the car. However, if I push the accelerator down too slowly when pulling off, the choke disengages automatically too early and the car stalls. A manual choke doesn't have this problem. ;)
(Obviously this last paragraph isn't serious, it likely just needs adjusting a little)
