An Overview of Parallelism

would that be by macadamia_harold · 2007-02-10 11:57 · Score: 5, Funny

Mortimer.CA writes with a recently released report from Berkley entitled "The Landscape of Parallel Computing Research: A View from Berkeley

Would that be a Parallelograph?

--
Push Button, Receive Bacon

Re:would that be by Alwin+Henseler · 2007-02-10 12:26 · Score: 2, Funny

No, more likely a Berks Eye View.

nothing new here... by Anonymous Coward · 2007-02-10 12:08 · Score: 3, Informative

pretty much the same thing Dave Patterson's been saying for a while now...in fact, the CW sounded so familiar, I went back to double check his lecture slides from more than a year ago:

http://vlsi.cs.berkeley.edu/cs252-s06/images/1/1b/ Cs252s06-lec01-intro.pdf

and it's pretty much identical (check out slide 3 on the first page of the pdf)

Erlang by Anonymous Coward · 2007-02-10 12:09 · Score: 5, Insightful

Erlang only provides a way of proving parallel correctness, a la CSP. This means avoiding deadlocks and such. The primary difficulty of crafting algorithms to run efficently over multiple CPUs still remains. Erlang does not do any automatic parallelization, and expects the programmer to write the code with multiple CPUs in mind.

I'm wating for a language which would parallelize stuff for you. This is most likely to be a functinal language, or an extension to an existing functional language. Maybe even Erlang.

Re:Erlang by LLuthor · 2007-02-10 12:44 · Score: 1

Erlang also has a huge amount of overhead, and due to immutable data structures, has to spend a lot of time copying data around.

This is the problem inherent with pure languages. Compilers/runtime systems are simply not sophisticated enough yet to reason about and perform as well as a human can with mutable data structures.

This is why C/C++ will dominate scalable applications for a at least few years more.

--
LL
Re:Erlang by neckjonez · 2007-02-10 13:14 · Score: 1

see the Sun Fortress project http://fortress.sunsource.net/ for a language with built-in parallelism...It's a beginning, anyway.
Re:Erlang by CastrTroy · 2007-02-10 13:33 · Score: 2, Insightful

I wrote some parellel code using MPI in university. It takes a lot of work to get the hang of at first, and many people who I know that were good at programming had lots of trouble in this course, because programming for parallelism is very different than programming for a single processor. On the other hand, you can get much better performance from parallel algorithms. However, I think that we could do just as well sticking with the regular algorithms, and having a lot of threads each running on a different core. If you look at an RDBMS, it would be nice if you could sort in less than n log(n) time, but it's even better if you just sort in n log (n), but can run 128 sorts simultaneously. I seem to remember some news about Intel saying they would have 128 core chips available in the near future.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Erlang by grammar+fascist · 2007-02-10 13:45 · Score: 2, Informative

Erlang also has a huge amount of overhead, and due to immutable data structures, has to spend a lot of time copying data around.

This is the problem inherent with pure languages. Compilers/runtime systems are simply not sophisticated enough yet to reason about and perform as well as a human can with mutable data structures.

Haskell comes pretty close, and it's designed from the beginning to be pure.

In fact, it may be these immutable data structures that make the pure functional languages able to perform well on multiple processors where sequential languages with mutable data structures fail miserably. If you don't have to worry about function's side effects and no specific execution order is guaranteed, you can run it on any processor you want whenever you want, as long as you don't hold up things too much that depend on it.

However, pure functional languages need to be made more palatable to your average Joe Programmer. (Or computer scientists need to get more mathematical. Or both.) We need the Python and Ruby of pure functional languages: languages in which the syntax design is regarded as a user-interface problem.

--
I got my Linux laptop at System76.
Re:Erlang by zsau · 2007-02-10 15:01 · Score: 1

Surely functional programming languages are where the idea of syntax design as UI problem came from! Lisp on the one hand has a very simple yet incredibly powerful syntax; whereas Haskell has an incredibly beautiful syntax with more sugar than you can dream of. Neither have the same kind of hard-and-fast distinction betwee built-ins and user-defined structures of others either; I can just as easily define an infix function or a macro that doesn't expand its third argument in Haskell or Lisp (respectively) as I can define a regular all-expanding prefix function.

What Haskell needs is to change the name of the IO monad to "warm fuzzy thing" or somesuch and to give it a syntax a bit more like regular procedural languages. And also to improve its efficiency.

Also, more on topic, is Haskell, or any other lazy pure functional language, actually capable of automatically parallelising code, or is that just some theoretical idea that will need a lot of research & work before it's possible? I didn't think Haskell used multiple processors...

--
Look out!
Re:Erlang by grammar+fascist · 2007-02-10 16:13 · Score: 1

Surely functional programming languages are where the idea of syntax design as UI problem came from! Lisp on the one hand has a very simple yet incredibly powerful syntax; whereas Haskell has an incredibly beautiful syntax with more sugar than you can dream of.

Okay, I'll be more specific: they need to address syntax as a human user interface design problem.

Lisp. UI. Heh. Lisp syntax is a good user interface if sendmail.cf is. Its near lack of consistent visual cues disqualifies it completely.

Haskell is a good UI for mathematicians, who spend much of their lives learning to not think like normal people. Computer scientists are somewhere between normal people and mathematicians. Target them.

--
I got my Linux laptop at System76.
Re:Erlang by cpm80 · 2007-02-10 16:19 · Score: 5, Informative

I think using a *specific* language for automatic parallelization is the wrong way. Some GNU folks are working on language independent autoparallelization for GCC 4.3. Their implementation is an extension to the OpenMP implementation in GCC. Read OpenMP and automatic parallelization in GCC, D. Novillo, GCC Developers' Summit, Ottawa, Canada, June 2006 http://people.redhat.com/dnovillo/Papers/ for details.
Re:Erlang by joss · 2007-02-10 22:21 · Score: 1

> I'm wating for a language which would parallelize stuff for you. T

You may as well wait for a language which does everything for you.
For many problems, coming up with an algorithm for efficiently solving the
problem on multiple cores is harder than the algorithm on a single core
[eg, try parallelising a large matrix inversion etc etc].
I wouldnt expect languages to come up with algorithms as fundamental
as QuickSort any time soon [when they do, they'll be smarter than we
are, so we wont be programming at all].

--
http://rareformnewmedia.com/
Re:Erlang by killjoe · 2007-02-10 23:28 · Score: 1

This is probably an ignorant question but I will ask it anyway.

My understanding is that in a functional language io is a bitch. In order to overcome that haskell has invented impure things called monads which are notoriously hard to understand and code.

Given that virtually every computing task involved io why would you design a language that is not optimized specifically for io? Whether it's file access, network access or database access virtually every program ever written uses some io right?

--
evil is as evil does
Re:Erlang by zsau · 2007-02-10 23:56 · Score: 1

I don't understand. What's wrong with Haskell's syntax from a Human Interface perspective? I really can't think how it's substantially different from Python, once you account for differences of language philosophy. Do you just want brackets around parameters to a function call? (Or do you refer to the different styles Haskell allows, like the so-called point-free style for function definitions that make it completely unobvious how many arguments are being used?)

(I realise about Lisp; I was just showing that the issue had been considered from both extremes. Like Perl's, it's useful but not exactly easy for a noob to follow... There's stuff in between. Or ones that have a C-like syntax like JavaScript, but it clearly hasn't considered syntax aside from the "let's make it like Java" angle.)

--
Look out!
Re:Erlang by marcosdumay · 2007-02-11 00:58 · Score: 1

Mercury is trying to make pure functional IO less anoying by developing syntatic sugar. That shows that functional IO can be easy, with the exception of you being oblied to declare that your functions do IO (what makes debug by printing quite hard).

Now, about the GP question of paralel compilers, I've never seen one, and don't know how hard it would be to build it. The very hard problem of breaking the program on pararelizable chuncks you get for free, but you still need to decide how many threads you'll use, and what chuncks go on what threads. Both are unsolvable problems, but you just need an estimate, not the actual solution, so it may be even easy.

--
Rethinking email
Re:Erlang by chris_eineke · 2007-02-11 01:15 · Score: 1

languages in which the syntax design is regarded as a user-interface problem
Lisp then is the ultimate user-interface problem :P

--
"All you have to do is be fragile and grateful. So stay the underdog." Chuck Palahniuk, Choke
Re:Erlang by Anonymous Coward · 2007-02-11 02:30 · Score: 0

My understanding is that in a functional language io is a bitch. In order to overcome that haskell has invented impure things called monads which are notoriously hard to understand and code.
Monads are not impure: they are as pure as the rest of Haskell. They are not hard to code: you just write code that looks like regular imperative code. They might, I suppose, be considered hard to understand... but how many people have been put off using Ruby because they can't explain the computer science theory behind blocks?

No, the main reason Haskell is unpopular is that it is as ivory-tower as languages come, and the people that develop and use it are generally more interested in computer science research than they are in making Haskell popular. Indeed, one Haskell motto is "avoid success at all costs", because a successful language has to be a stable language, and they want to preserve their freedom to experiment.

Meanwhile, techniques that have been tested in Haskell start to crop up in other languages. Python's list comprehensions are a good example, as are parser combinators like Boost::Spirit for C++.
Re:Erlang by Anonymous Coward · 2007-02-11 07:35 · Score: 0

Ericsson pushes Erlang so that they do not need to train new employees. But pushing Erlang does not make it a good language. Ericsson saw that SDL got dominant on telecom market, and they got frustrated about it. That is the scientific background of Erlang.
Re:Erlang by Pinky · 2007-02-11 08:22 · Score: 1

If it's not functional it will have very good support for many of aspects of a functional language. One of the first things one realizes after dealing with concurrency for a while is immutable objects are a good thing. Mutable states are bad..

I can't see automatic parallelization happening for a long time. Too hard.. What I expect to see is a trend toward functional style programming.. Also maybe things like composable memory transactions.. Maybe thread based synchronization (which is a term I've been using to describe a technique I've been using. It's hard to explain but works quite nicely...)...
Re:Erlang by sdnin · 2007-02-11 11:22 · Score: 1

Well,
Erlang I have installed, but never seriously used... ...maybe one day I will, but maybe not...
...as OCaml-programmer (and Perl and C and others, but OCaml is my favourite) I found it very amazing, when I yesterday got a hint to OCamlP3l...
...the decription/abstract looks very promising:
http://ocamlp3l.inria.fr/eng.htm
Ciao, sdnin
Re:Erlang by middlemen · 2007-02-11 14:11 · Score: 2, Insightful

However, I think that we could do just as well sticking with the regular algorithms, and having a lot of threads each running on a different core.

This is exactly the problem with the widespread acceptance of parallel programming. Software programmers don't want to think about new parallel algorithms. I have been and still am working in the parallel programming industry for 2.5 years now (of the total 3 years work-ex that I have), and I find it ridiculous that many programmers with 5-10 years experience or more don't want to use their god-damn brains. Agreed parallel programming is a different paradigm and makes you think more, but what the hell is wrong with that !? You already know sequential style programming, now you will know more and know both and leverage that knowledge to use the available hardware best, to your satisfaction. Isn't that what a programmer should do ? Leverage his knowledge to gain maximum use of the available computer hardware ?

They should just teach parallel programming (not multi-threading, but parallel programming - using MPI for example) to everyone learning programming in college. It is very useful, esp for engineers, since they mostly work with programs that use huge amounts of memory or do intensive computations for long periods of time.
Re:Erlang by CastrTroy · 2007-02-11 14:49 · Score: 1

But in the end, a lot of the stuff that may take advantage from parallel programming, may or may end up good in practice. Take for instance a database system. If you did a bunch of stuff in parallel algorithms then you basically add a bunch of overhead in the form of distribution and gathering, and extra threads or processes. If you leave it as sequential, then you cut down on the overhead, and leave the other CPUs or cores free for other queries coming into the database. I don't work with parallel programming enough to know the answer, but would you really get better performance than the sequential algorithm? Even when you have to processes multiple requests at once, and 1 request is using all the cores? I remember that it was possible to sort N elements in O(log (n)), but only if you had n processors. So on 1 processor, it ends up taking O(n log (n)), which is the same as a serial sorting algorithm (related article. Usually you have 1 or 2 cores, maybe 4, or something really large like 64 or 128, but if you're sorting billions of elements, then you're very close to something that's only as efficient as the serial algorithm, plus a bunch of overheard. If you have 128 cores, you're probably better off serving 128 requests with serial algorithms, each with it's own core, then trying to run them as parallel processes 1 after the other.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Erlang by miach · 2007-02-12 06:57 · Score: 1

Also, more on topic, is Haskell, or any other lazy pure functional language, actually capable of automatically parallelising code, or is that just some theoretical idea that will need a lot of research & work before it's possible?

I wrote my Masters thesis on that topic over a decade ago, so it's quite possible and not that complex to do with a pure functional language. Getting the efficiencies up is more complex (you have to weigh the costs of managing the parallelism with the gains), but certainly doable.

One of the advantages of pure functional languages is that you can do speculative evaluation due to the lack of side effects. So, for example, if you've got a spare processor you can evaluate both sides of an AND, and just throw away the unused results if the left side fails. If it succeeds then you've got some or all of the work already done.
Re:Erlang by Raenex · 2007-02-12 07:36 · Score: 1

What's wrong with Haskell's syntax from a Human Interface perspective?

The original poster never answered your question, but as a long-time Java programmer I can tell you my experience in trying to learn Haskell. I think there are two issues here, when considering Haskell as a "UI" for the programmer to the machine. One is syntax, and two is concepts. Haskell is radically different in both, and this makes it harder to learn the concepts, which are more important than the syntax.
For syntax, lack of parenthesis is confusing. When you combine that with higher-order functions and partial application, it gets really confusing. What you are presented with is just a stream of words. If Lisp is hard to look at because of all the parenthesis, then Haskell is the opposite. And yes, point-free style adds to the confusion, but it was there without it. I also found myself getting confused with parenthesis used for grouping vs tuples and the idea from other languages that a parenthesis means a function call.
Datatypes were also confusing because they were again, just a stream of words, and it was too easy to get confused between the typename, the constructor name, and parameters to the constructor. Being so used to something like Foo(int a, String x) makes it hard. All these problems are really just a matter of getting used to the notation, but they are the kinds of problems that somebody learning a language like Ruby coming from Java wouldn't have.
As for concepts, in general I find recursion and lazy evaluation much harder to think about than a while loop. Here's an example from "Haskell for C Programmers" that I'm sure would get many C/Java folks' heads spinning:
fibs = 0 : 1 : [ a + b | (a, b) <- zip fibs (tail fibs)]
Higher-order functions and partial-application are also hard to wrap your head around. Not the basic idea (a simple map is easy enough), but in trying to figure out what some clever bit of code is doing. Took me awhile to figure out what this did the first time I saw it:
map (map (const 1))
It's a lot of pain, especially when you know you could easily do whatever it is you were trying to do in your familiar, imperative language.

It's not hard by PhrostyMcByte · 2007-02-10 12:12 · Score: 4, Insightful

I think the main reason people say "don't use threads" is because while single threaded apps are easy to debug, multi-threaded ones will crash and burn at seemingly random places if the programmer didn't plan ahead and use proper locking. This is probably good advice to a noob programmer but I otherwise can't stand people who are of the "absolutely, never, ever, use threads" mindset.

Some applications have no need to be multithreaded, but when they do it is a lot easier than people make it out to be. Taking advantage of lock-free algorithms and NUMA for maximum scalability *can* be hard, but the people who need these will have the proper experience to tackle it.

Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine.

Re:It's not hard by Anonymous Coward · 2007-02-10 12:21 · Score: 0

Anyone know of any that try and modify existing programs to run well in parallel
Re:It's not hard by ardor · 2007-02-10 12:23 · Score: 3, Interesting

Well the indeterministic nature of multithreading is still a problem. With one thread, debugging is simple: the only thread present will be stopped. But with multiple threads, how is the debugger supposed to handle the problem? Stop all threads? Only the current one? Etc. This is relevant when debugging race conditions.

Also, the second great problem is that thread problems are hard to find. When I write a class hierarchy, an OOP language can help me with seeing design errors (for example, unnecessary multiple inheritance), or misses in const-correctness. Threading, however, is only present as mutexes, conditions etc.

One other issue with threads is that they effectively modify the execution sequence. Traditional single-threaded programs have a sequence that looks like a long line. Threading introduces branches and joins, turning the simple line into a net. Obviously, this complicates things. Petri nets can be useful in modeling this.

--
This sig does not contain any SCO code.
Re:It's not hard by QuesarVII · 2007-02-10 12:57 · Score: 1

The Intel C/C++ and Fortran compilers both support auto parallelization. I've never benchmarked their effectiveness though. I'm sure a quick googling could get some results.
Re:It's not hard by cluckshot · 2007-02-10 13:04 · Score: 1

Its just opinion but I think the problem with parallel programming is a substantial lack of processors. If for example a die was made with 10 million processors on it (very simple processors) with modest queue memory, there are applications in optical data processing that would become extremely efficient and much more natural than any solutions as of this time. Otherwise we just spend our time loading the queue of one or 4 processors millions of times to do exactly the same process on the data from each pixel in a camera for example.

Having 4 processors is not parallel processing, it is just divided serial processing. Parallel processing allows that the process is in fact done in parallel rather than simply divided into 4 pipes. The thinking that is dominating the processor study at this time is one of how do we do these "serial" processes faster. Duh! We are going to parallel processes here folks! It requires thinking in a whole different paradigm. It isn't serial processing of course the guys with serial brains are not going to get the process figured out.

Multi-threading is merely serial operations done with time interrupts and conditional interrupt flags. It is by definition serial. The parent post isn't bad, it begins to get there, but really we have to get out of that mind set. Parallel will operate like a single serial operation set applied in several million sets at the same time without interrupts. Parallel programming will have many other features including a reconstruction of base process step after the parallel is done. To illustrate, if you were to do a simple operation such as comparing one video frame with another in real time, you could compare with a corresponding number of processors every pixel in one clock cycle. You could at the end of that cycle instruct another set of operations such as a time decayed comparison of many frames.

--
Never Politically Correct ~ I prefer the facts If you don't like what I say, get a life, or comment yourself.
Re:It's not hard by tepples · 2007-02-10 13:10 · Score: 1

I think the main reason people say "don't use threads" is because while single threaded apps are easy to debug, multi-threaded ones will crash and burn at seemingly random places if the programmer didn't plan ahead and use proper locking. Do the C standard and the C++ standard go into detail as to the semantics of this locking? No, they don't even specify what a thread is. This makes it less likely that students learn proper locking in college.
Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine. Which "current ones"? Do POSIX threads work on Microsoft Windows?
Re:It's not hard by MBCook · 2007-02-10 13:23 · Score: 1

Maybe we should teach the basics. I don't remember my CS program (this was '01-'05 or so) teaching anything about working with threads. Everything I know about threads and threaded programming has been picked up by reading books on threading, experimenting with threading, and what I've learned on my job from others who know threading. I'm not going to claim I'm that competent through.
What do I know about NUMA and other complex issues of thread handling and modern computers? What I know in that category came from reading about the Linux Kernel and following it's development. Articles explaining what they were doing and why (especially on LWN.net) have taught me a TON.
But the interfaces to program threads in C/C++ are external libraries or fork. Java has threads built in but it still feels like quite a bit of work for something that should be simpler. People talk about Erlang, but I don't know how much better it is (I've never used it).
The debugging instruction I was taught in school was basically what was necessary to help us get our programs done. I've learned quite a bit more since getting my job. And god forbid they teach us profiling or something. Let alone would they try to teach us about mutli-threaded programming or ESPECIALLY debugging it.
Maybe one of the reasons we so few multi-threaded programs (besides making algorithms parallel isn't easy) is how many people are actually trained how to do it.
You don't even how to teach students how to do all this stuff, just give them a basic knowledge so they can keep an eye towards it in the future when they need to learn more, instead of the sink-or-swim learning situation they may get stuck in.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:It's not hard by maxwell+demon · 2007-02-10 13:39 · Score: 1

Not all algorithms can be parallelized that easily. Imagine e.g. a parser: You cannot parse text by having a million processors looking at one character each.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:It's not hard by maxwell+demon · 2007-02-10 13:51 · Score: 1

Do the C standard and the C++ standard go into detail as to the semantics of this locking?

AFAIK C++0x (i.e. the next version of the C++ standard) will.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:It's not hard by mikael · 2007-02-10 14:07 · Score: 3, Insightful

Not all algorithms can be parallelized that easily. Imagine e.g. a parser: You cannot parse text by having a million processors looking at one character each.

You could have the first thread processor split the text by white space. Then each block of characters is assigned to any number of processors to find the matching token. I've seen some parsers where the entire document was
read in, converted into an array of tokens before returning back to the calling routine.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:It's not hard by vadim_t · 2007-02-10 14:33 · Score: 1

But in that case, your speed is limited by the speed of the splitting by whitespace. And if you've got lots of CPUs, chances are they're quite slow on their own.

Then there's that you can hardly split by whitespace with a complex syntax. You certainly won't be able to parse C like that.
Re:It's not hard by owlstead · 2007-02-10 15:00 · Score: 2, Interesting

For a new C++ with multithreading language extensions (for manually coding the multithreading), good API and IDE/tool support, look no further. It's called Java and it has been around for ages. You really, really, really don't want multithreading in a non-"managed" language. You don't want to debug an application where *every* part of your application can be messed up by any thread. The advantage of Java is that the build in security meassurements.

Things you need to have support for in a language/environment:
- All the usual multithreading design patterns in the API (producer/consumer, stream support, stacks etc);
- Thread safe collections;
- A keyword and mechanism to do locking ("synchronized" in Java, "volatile" is handy for optimizations);
- Memory protection;
- Multi-threading supporting debuggers (at least halt all functionality)
- Very well documented API, especially if classes are thread safe and how to use them;
- If anyway possible, automated code checking tools for multi-threading (won't catch every multi-threading error, but are very usefull).
Furthermore it's very important to make sure you can use the API's on different platforms if you expect to port the application in the future. Also, since threading may be expensive, a lightweight version of threads may be usefull as well.

It's not just adding a language extension for threading. Support must go much further than that. Especially the API documentation is important in this respect. Unfortunately there are always parts of the API that are not really usefull for multithreading, such as the Date classes in Java (they suck in all other aspects as well, to be honest).

If you are every going to try Java, make sure to not use arrays too much. You cannot prevent write access to arrays, so any thread that has a reference to an array may change it all the time. Try and use immutable classes as much as you can (BigInteger, BigDecimal and of course Strings are all immutable), and make defensive copies when needed. In C++ any thread can mess up any data element by simply casting, something that is done *way* to much in C++ anyway. You can also use C#, but a quick google search scan shows that C# multithreading is not used as much as can, and getting support might be much more difficult. At least you can make the collections thread safe.
Re:It's not hard by gumbi+west · 2007-02-10 15:17 · Score: 1

I think this is a great example of where serial mind sets inhibit us. Parsing in parallel makes a whole lot of sense once you're ready for it. Granted you need parallel data storage to make good on parallel parsing, but it's feasible if you are ready to restructure your whole mind set.
There is a very interesting theory provided by Trefethen and Bau in Numerical Linear Algebra that the existence of exact algorithms for most of the complicated interesting matrix operations may have slowed our finding a faster algorithm that approximates the result with arbitrary (i.e. slightly non-zero) error. It's been several years, but I thought that eigenvalue problems when approximated to within a few times machine epsilon can be found 1/2 order faster or some such thing than using the exact methods.
Re:It's not hard by jlarocco · 2007-02-10 15:34 · Score: 1

Do the C standard and the C++ standard go into detail as to the semantics of this locking? No, they don't even specify what a thread is. This makes it less likely that students learn proper locking in college.

I don't think that's relevant. The ISO standard for Ada completely specifies the semantics of threads and thread locking, but that doesn't necessarily make the problem any easier. Of course Ada usually isn't taught in schools, but (unfortunately) C and C++ aren't usually taught in schools either, except for an occassional "C++ Programming" class.

Which "current ones"? Do POSIX threads work on Microsoft Windows?

I'm not exactly sure what you mean. There are several languages that have parralelism built into the language itself. In most of them, the underlying OS and computer architecture are irrelevant. Ada and Erlang come to mind, but there are others.

--
Maybe not
Re:It's not hard by ringm000 · 2007-02-10 17:20 · Score: 1

A parser is useless by itself. Usually you parse data to process it somehow, and processing the data (e.g. code generation in a compiler) can often also be split into separate passes. These passes can communicate rather efficiently through a lock-free queue.
Re:It's not hard by FranklinDelanoBluth · 2007-02-10 18:39 · Score: 1

Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine. Which "current ones"? Do POSIX threads work on Microsoft Windows?
The real application domain of multi-threading and data-level parallelism is stuff that one would never run on a Windows client: web servers, databases, scientific computing, ray tracing, etc. The improvement that these new paradigms offer is mostly in terms of throughput, not response time.

I would also add that to program on many of the new multi-core architectures, a programmer will want to be able to specify what runs on which core. Take for example the cell processor, where each SPE has only a 256KB local store with limited branch prediction. A miss in the store or a mispredicted branch would be catastrophic for performance. Instead, the programmer writes small specialized programs with that have minimal branching. To construct larger systems, programs on adjacent (note: adjacency is key!) SPEs pass their output down an assembly line of sorts, each adding a little bit of work to the output. To effectively utilize this system, a programmer will want very tight control over the assignment of threads to SPEs.

Even though modern desktops running XP/Vista are running multi-processor machines(Core 2s and 64 X2s), these are still quite far from state of the art in term of exploiting multi-threading. If you want to see what it's really about, look at the Sun T1: 8 cores, 4 physical threads per core.

If you are looking to speed up single-threaded applications (i.e. most of the stuff that consumers use) you should be looking to fight the "Memory Wall." Most of todays computers (even with the newly reduced pipelines) still spend most of their time stalling on cache misses (it's on the order of 10^2 cycles to retrieve from RAM). Looking for ways to maximize cache locality/residence would be much more fruitful in terms of turn around time, though again this tends to be something which must be done explicitly and cannot be abstracted behind a programming language.
Re:It's not hard by drolli · 2007-02-10 19:12 · Score: 1

> Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language
> (maybe c++1x) comes along the current ones work just fine.

The sad reality is: unless there is an AI there will be no "magical threading language". That is because designing a multi-threaded program starts with isolating object classes and instances which are *likely* to be used for communication. Isolation and designing an "orthogonal" flow of information is for multi-threaded code much more essential tha for single threaded. If designed correctly on each of the information flows one can use the standard techniques to protect against most of the problems. As the problem goes, the reality is not ideal. A lot of code has extraordinarily badly designed internal communication mechanisms, which gives three possibilities

1) Redesign the code (Um, yes...)

2) Accept design flaws a fix the crashes until they are not annoying any more

3) Accept that while your program may be multi-threaded effectively only 1.0 to 1.5 threads are running at the same time due to your "global locks", and make similar to 2) the probalility of deathlock small enough to be non-annoying

BTW.: I do not condider 2) and 3) to be good programming
Re:It's not hard by bodan · 2007-02-11 01:05 · Score: 1

I'm sure it's _very_ nontrivial---which is why it's not done often---but lots of what a parser does could be done in parallel. For instance, you really don't _need_ to find whitespace linearly. You can split the string (for large programs) in pieces. Stitching the results is going to be a bitch, but it's possible. For a million processors you _could_ actually load one character on each processor and do everything in parallel. I never saw an algorithm that does this on text, but I've seen pattern recognition and edge detection done with a-processor-per-pixel algorithms, and if you think about it properly parsing is something analogue: you have to find the tree hidden in that mess of characters.

Again, this would be a bitch to program, perhaps only possible by automatic tools, but it could work.

--
"I think I am a fallen star. I should wish on myself."
Re:It's not hard by Anonymous Coward · 2007-02-11 05:37 · Score: 0

That was a troll, right?
Re:It's not hard by drerwk · 2007-02-11 06:13 · Score: 1

You could have the first thread processor split the text by white space. Then each block of characters is assigned to any number of processors to find the matching token. I've seen some parsers where the entire document was read in, converted into an array of tokens before returning back to the calling routine.

I'm in the midst of a project that is trying to use multiple threads, and this example is not far off from the thinking that is going on; by some of the engineer who have not done MT before. Assuming a normal 8 or 16 processor SMP there is a cost of communication between threads that is actually pretty high compared to having a single processor do some character comparison on text. Not saying you couldn't make the above work, but in the example of parsing, the job usually needs to be done from beginning of a translation unit to end. You could not in general start in the middle of a c++ file and expect a parser to make sense of it. That said, if your program has many files, you can parse and compile each on in parallel, with no communication needed between threads, in which case you get a big win. That's where parallel make is your friend.
Re:It's not hard by cluckshot · 2007-02-11 10:36 · Score: 1

Obviously some operations do best as serial operations. Any process whereby we do a series of operations to one unit of data that is not done to all of the rest of the data in exactly the same way has obvious advantages being done in series. It is by definition a series of events.

To clarify, using a serial process is sort of like building a stack. You cannot add an item to a stack ahead of the item that is to precede it. If you do you break the intended order. Parallel is sort of like what could happen if you had a brick layer who could lay all of the bricks in one row at one time. So each row in a brick wall could be done in parallel to advantage but each row must be stacked on the previous row. So it should be clear that there are processes that have great advantage in series and some others have great advantage in parallel. USE THE RIGHT TOOL FOR THE RIGHT JOB PLEASE! To be a bit silly here from Dr Seuss "Oh the thinks you can think" --(Wonder and think how much water can 55 elephants drink.) 55 elephants drinking at one time is a parallel operation. Each elephant drinking after the other is in series. It would be a bit hard to get the last elephant a drink before the first one needed another drink if this process is done in series. It sort of makes sense to do some jobs in parallel and others in series. Nothing is a one size fits all solution.

Optical processing solutions and mapping solutions are natural solutions for parallel processing. Done in series these processes tax very nearly the entire computing power we can summons and then they perform less than optimally. Done in parallel these operations by nature would become nearly instantly solved. Suppose for example we had parallel processing on Airline ticketing and routing. The data on which flight to take and the optimal schedule would be available almost instantly. Conversely done as it is in series (multi threaded though it may be) this process is at best a slightly optimized kludge. A similar solution is for memory access in computers. Currently all computer hard drives and similar mass storage devices are nothing but series accessed devices. This produces the curious reality of fetch times on a large hard drive (mean times) that are quite long. Done in Parallel, these operations could easily be done in a few clock cycles fetching data almost instantly. If for example you had a processor dedicated to each sector in each track, the hard drive could be read, fetched and files processed out nearly instantly.

We have developed processors that are really good. Their clock rates are beyond imagination fast. Yet competing against a processor which is massively parallel and with a clock rate less than 30 cycles per second, (Your brain) modern computers are at best running poor second rate. Many problems simple are beyond the modern computers and yet are routinely solved by humans. The move to massively parallel will bring up a genuine competition. But to think this way requires one to divorce the old thinking and get into a completely different mind set.

--
Never Politically Correct ~ I prefer the facts If you don't like what I say, get a life, or comment yourself.
Re:It's not hard by maxwell+demon · 2007-02-11 11:51 · Score: 1

We have developed processors that are really good. Their clock rates are beyond imagination fast. Yet competing against a processor which is massively parallel and with a clock rate less than 30 cycles per second, (Your brain) modern computers are at best running poor second rate.

That entirely depends on the type of problem you throw at them. After all, if computers couldn't beat our brains in certain areas, there would be no point in using them.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:It's not hard by Mr+Z · 2007-02-11 12:31 · Score: 1
In the language of the article, most parsing problems look like the "FSM dwarf," and they really are embarrassingly serial problems. Edge detection and pattern recognition on images is different because they're identifying spatial contexts, and that can be done in a parallel manner. The meaning of that spacial context depends on the neighborhood around the pixel. Language parsing (whether natural language or a computer programming language) is somewhat different, as the meaning of each new token is directly influenced by the tokens that precede it, and typically not the tokens that follow it.

For example, suppose you're parsing a perl program with this single-character-per-processor paradigm. You see the character '}'. What does it mean? Is it:
1. Closing a block, as in "if (foo) { blah; }"?
2. Closing a hash reference, as in "$hash{blah}"?
3. Closing a dereference, as in "@{$foo}"?
4. Is it part of a string?
5. Is it part of a comment?
About the most you might be able to do is coordinate with nearby processors and identify potential tokens. Once you have those tokens, though, they need to go into the parser state machine in order to make their semantics known to the computer.

Really, the only parallelism you have is in terms of pipeline stages. Many parsers are divided up into two main stages: lexical analysis and semantic analysis. (Look up lex and yacc for an implementation of this model.) You could run the lexer on one CPU and the grammar on another. Whatever program this drives could run on other CPUs.
--Joe
--
Program Intellivision!
Re:It's not hard by Anonymous Coward · 2007-02-12 06:32 · Score: 0

I think the main reason people say "don't use threads" is because while single threaded apps are easy to debug, multi-threaded ones will crash and burn at seemingly random places if the programmer didn't plan ahead and use proper locking. This is probably good advice to a noob programmer but I otherwise can't stand people who are of the "absolutely, never, ever, use threads" mindset.

Congratulations, you just earned your code cowboy badge.

The reason that writing multithreaded programs is hard -- for anyone, not just "noob programmers" -- is that there is no margin of error. Zero. None. Nada. Forgot a lock? Race conditions and data corruption. Got the lock order wrong? Deadlock. Too much locking? One one thread running at a time making the whole exercise pointless.

Even if you get it right once, the odds of getting it right every time you need to make a change to update the code is pretty low. And the best part is: you'll never really be sure you got it right. Remember: testing can only prove the existance of bugs, not their absence. If you honestly believe every multithreaded program you have ever written has perfect synchronization such that it works in all situations -- that it no race conditions, no deadlock, no data corruption -- you are a fool.
If you think that any about of testing is going to prove that your code works in all situations, you are an even bigger fool.

You can do a lot with a single thread (especially if you use an event-driven state machine). Use those other processors/cores to run addition processes. That what multitasking is for.
Re:It's not hard by mikael · 2007-02-12 09:09 · Score: 1

I am familiar with the difference between heavyweight and lightweight processes, multithreading and context switching. I certainly wouldn't recommend doing this with seperate processes on separate CPU's. However, if a CPU existed that could support a large number of threads that shared the same memrory space,
then it might be worthwhile. The method of recursive binary splitting the stream into a linked list of substrings and then classifying each string might then be possible.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:It's not hard by Lost+Engineer · 2007-02-12 14:06 · Score: 1

I believe this is more auto-vectorization for using SIMD instructions. They must've realized nobody was using the intrinsics...
Re:It's not hard by QuesarVII · 2007-02-13 02:46 · Score: 1

No, it actually does use parallelization. I tested with the stream memory benchmark on an smp opteron system and got more bandwidth than one cpu could actually deliver, and that was testing without using the openmp options.
Re:It's not hard by owlstead · 2007-02-13 07:52 · Score: 1

No it wasn't.

Hmmm... by ardor · 2007-02-10 12:13 · Score: 5, Interesting

"but is likely to face diminishing returns as 16 and 32 processor systems are realized"

Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

--
This sig does not contain any SCO code.

Re:Hmmm... by Tibor+the+Hun · 2007-02-10 12:20 · Score: 1

Granularity.
Neurons only work on miniscule problems. Pretty much like a single gate.
Our programs are not yet capable of dividing problems into such small pieces.

Plus, so far most of our parallelization efforts are focused on optimizing software. Brains are dedicated wetware machines.

--
If you don't know what AltaVista is (was), get off my lawn.
Re:Hmmm... by colganc · 2007-02-10 12:21 · Score: 1

I doubt it has anything to do with what we are missing as opposed to optimizing for what kind of hardware exists currently. I imagine current parallel techniques have relatively speaking low overhead and to take advantage of computers with high processor accounts you move to higher overhead methods that can take advantage of more of the processing power.
Re:Hmmm... by ardor · 2007-02-10 12:28 · Score: 1

In this case, the current parallelization efforts miss the point. Is there actual research into CPUs consisting of billion miniscule neuron-like units? Something like a neural net hardware? Maybe these would fare better than pure software ones...

--
This sig does not contain any SCO code.
Re:Hmmm... by CosmeticLobotamy · 2007-02-10 12:32 · Score: 4, Funny

Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

Brain scalability is just not that great. Trust me, putting more than four brains in one head is just asking for locking problems out the yin-yang.
Re:Hmmm... by Tibor+the+Hun · 2007-02-10 12:32 · Score: 1

I haven't heard of any hardware such as that, but I'd guess it'd be possible to do with FPGAs.
Surely someone here could enlighten us.

--
If you don't know what AltaVista is (was), get off my lawn.
Re:Hmmm... by edschurr · 2007-02-10 12:47 · Score: 1

Isn't the brain pretty much entirely in the domain of imprecise pattern recognition? Maybe massive parellelism isn't useful for many types of tasks. I don't have the imagination to think about it right now.

Well, one thing that I can think of is having extra threads gamble on what problems might need solving in the future, and doing them ahead of time. Diminishing returns still feels very relevant...
Re:Hmmm... by imsabbel · 2007-02-10 13:02 · Score: 1

Yeah.
You are missing that our super-great brain isnt even able to calculate more than 1 or 2 additions per second.
Meaning: the parallism thats at work in our brain is absolutely useless for thinks we want to do with a computer.

--
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Re:Hmmm... by jimmydevice · 2007-02-10 13:03 · Score: 0

Putting four brains in the same *room* is asking for problems.
Re:Hmmm... by Simon80 · 2007-02-10 13:11 · Score: 1

One of my university profs does neural network research with FPGAs, perhaps that qualifies.
Re:Hmmm... by philipgar · 2007-02-10 13:14 · Score: 2, Interesting

Is this really true? Of course for some tasks the massive parallelism of the human brain works great. The brain can analyze complex images extremely fast, comparing them in parallel to it's own internal database of images, using fuzzy reasoning to detect if something is familiar etc. However you give your brain complex math problems, and it can spend seconds, minutes, or even hours to solve it, sometimes requiring extra scratch memory to solve. This is due to bad programming in the brain that sucks at doing certain computations (except for the rare people who manage to do these problems fast).

This is also true for computers. Big scientific problems can use up 1000's of processor's, just look at some of the problems running on supercomputers. However not all problems scale up to effectively use 1000's or processors. Many desktop applications won't be able to take advantage of the power, however many of these applications don't need to take advantage of that power.

Basically computers will continue to evolve, with custom hardware or cores that can adapt to the problem. If there's enough demand to solve a problem, it can be done, however doing fast software designs will not be cheap, and "hardware" designs (verilog, vhdl, etc) will be even more expensive (even if they don't need to spin silicon for them). However for applications that have sufficient demand, they will be done. They're just highly time-intensive to do. Luckily there will always be outsourcing to help us out.

Phil
Re:Hmmm... by ardor · 2007-02-10 13:28 · Score: 1

Yes, the brain works visually, and this is the problem with math. Mathematicians often say that they imagine the mathematical problems in a visual way, but so abstract that it cannot be painted on paper. So the actual problem is the translation.

--
This sig does not contain any SCO code.
Re:Hmmm... by ardor · 2007-02-10 13:33 · Score: 1

As mentioned before, it is all a matter of how the problem is viewed. If you can translate the addition into an abstract visual model, then the brain is very quick.

--
This sig does not contain any SCO code.
Re:Hmmm... by LordLucless · 2007-02-10 14:58 · Score: 1

It depends. Our brain's are vastly more complicated than a computer CPU, but they're not designed to do the same thing. There was an interesting explanation of this in a fiction book I read a while back that talked a little about AI.

Basically, it said the human brain was designed for algorithmic complexity, not speed. If you take a brain, and make it ten times faster, it won't be any more intelligent. It'll just take 1/10th of the time to get the same answer as it used to, it still won't be able to solve any more complex problems.

What we do with computer processors is the opposite direction - we just pile up more and more speed. When we want complicated behavior from a CPU, through more speed at it brute-force it. If we want our next-gen parallelized machines to be efficient, we'll have to make changes to our algorithms.

--
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
Re:Hmmm... by Anonymous Coward · 2007-02-10 15:04 · Score: 0

At the abstract level of conscious thought, addition is "slow", yes. But your brain calculates things quicky all the time and is massively parallel. You simply do not have awareness of such low-level calculations inside your head. That's what [probably] allows for consciousness.
Re:Hmmm... by gumbi+west · 2007-02-10 15:23 · Score: 1

yet a baseball player can calculate the exact force needed to throw a baseball from center field to home plate. Here the angle and force are even overspecified and so there is a highly non-linear optimization to perform to get the minimum possible travel time to the throw. There is also an error in angle and force so a loss-function must be used to trade off accuracy for speed, in addion how important the throw is which dictates how probable an injury should be allowed. If you had a machine that let you throw the ball with all these factors, it could easily take you an hour to do the calculation. Yet the baseball player takes only a fraction of a second to do it all.
So how long does it take to do the calculation?
Re:Hmmm... by timeOday · 2007-02-10 16:48 · Score: 1

yet a baseball player can calculate the exact force needed to throw a baseball from center field to home plate.
No he does not. He simply learns the mapping from stimuli to actions. He's not calculating anything. That's why pitchers pitch instead of studying physics books.
Re:Hmmm... by zsau · 2007-02-10 16:56 · Score: 1

However you give your brain complex math problems, and it can spend seconds, minutes, or even hours to solve it, sometimes requiring extra scratch memory to solve.

You give an x86 computer PPC binaries and ask it to run them and they'll be slow. Your complex maths problems aren't in your brain's natural format. But anyone (experienced with driving a particular car) can tell you just how hard they'll need break to stop, or how fast they can go round that corner safely. And they can do this whilst telling their kids in the back to shut up! These are pretty complex maths problems. Even simpler ones can be done in no time at all: Uneducated people who run market stalls can calculate change from $50 of sixteen figs very quickly, yet you ask them what 50-(cost-of-fig * 15) is, and they'll take a lot longer.

The brain is massively parallel; it's just that attention is more limited, and it takes attention to process abstract problems. And, of course, your brain has been optimised for particular tasks. We can still learn a lot from it when solving different problems

(Also, the computations your brain can do is a separate issue from its memory. Your short-term memory is only very limited---sevenish items, but of almost arbitrary size ("123632445634579861" is hard to store in STM, but "123 632 244 563 579 861" is much easier), whereas your long-term memory is much slower less useful when you're performing computations.)

--
Look out!
Re:Hmmm... by ardor · 2007-02-11 00:06 · Score: 1

Actually, he does. After several games, the brain can extrapolate pretty accurately the necessary parameters. The guy doesn't think in terms of parameters and forces, of course. But the brain adapts to the problem.

--
This sig does not contain any SCO code.
Re:Hmmm... by xsonofagunx · 2007-02-11 00:31 · Score: 1

Even simpler ones can be done in no time at all: Uneducated people who run market stalls can calculate change from $50 of sixteen figs very quickly, yet you ask them what 50-(cost-of-fig * 15) is, and they'll take a lot longer.
No they can't, silly... they just remember from the last guy that came and bought sixteen figs. The cent amount is trivial, it'd be the same as when someone paid with a $10, or $20, or whatever. The dollar amount is just a very quick subtraction, since the cents are already taken care of. Besides, they probably have a neat thing called a cash register in front of them, and that just tells them how much to give you without them having to think about it at all.

Your point is somewhat valid - give them a written math problem and their eyes would glaze over, but even given the fig problem, what they're doing is really more of a learned-by-repetition thing than a computation.
Re:Hmmm... by zsau · 2007-02-11 01:16 · Score: 1

Cash register in a market stall in some third world country? (I'm sorry, I didn't make that clear, I was talking about people who didn't have cash registers.) It's also worth noting that in markets prices are not as likely to be the same from day to day (or even customer to customer) as they are in a computer shop.

Yes, the process will involve some remembering and some calculation, but the point is completely valid: We perform much better in concrete, real-life situations than in abstractions.

--
Look out!
Re:Hmmm... by TheLink · 2007-02-11 02:04 · Score: 1

Nope. Not really like a single gate at all.

You have stuff like a single neuron that fires just for a specific abstract concept. Go search for "Halle Berry" neuron.

So it could be more like billions of neurons watching "Sensory Channel + Stream of Consciousness", going mostly blah and then suddenly you get one little neuron yelling "It's Halle Berry!!!".

Maybe followed by another one yelling "It's on a TV!"

And then some neuron yelling "It's neurons screaming about Halle Berry on TV!".

Then followed by maybe "Change Channel!" or "Pay attention!"... So on and so forth.

I'm no neuroscientist, but that's about what I gather happens and sure looks quite different from logic gates.

More like splitting huge lookup tables into small pieces where each neuron provides an "answer" when it recognizes an "address".

So maybe out of the 100 billion neurons, you have a few hundred million or so dedicated to recognizing very specific things, and a few hundred million recognizing the various sort of
"recognitions", etc etc.
--
- Too many replies beneath your current threshold
Re:Hmmm... by TheLink · 2007-02-11 02:29 · Score: 1

The brain doesn't necessarily work visually. Since blind people still manage somehow :).

And there are lots of more important life affecting problems than visual calculations.

Like whether to go for chocolate or vanilla cake, or both. Or whether this funny smelling stuff should be eaten or not. Or whether I can make it safely up the slope. Or whether I can jump, reach that branch AND it is likely to hold me (this is not just simple visual - since you need to estimate your weight, _current_ strength factoring fatigue, branch strength and a LOT of other things).

Laugh, but really a lot of this has been proven over hundreds and millions of years to be more important than silly math calculations ;).

I actually believe that the brain builds and maintains models of stuff all the time, and uses those - and consciousness is at least partly due to the result of a mind recursively simulating itself. After all its very useful for a creature to model other creatures - knowing what they might do is important, and then when they try to model each other modelling each other etc, things start to get interesting - then you get introspection, better correction of flawed models etc. And that's why I believe many animals (if not most) are conscious/sentient - just would be simpler and easier for them to do what they do.
--
- Too many replies beneath your current threshold
Re:Hmmm... by gumbi+west · 2007-02-11 04:33 · Score: 1

okay, lets say it's as simple as you say. It still requires interpolation because a center fielder never stands in exactly the same spot and yet tries to hit the catcher's mit right above and to the left of home plate, so previous responses would give a noisy surface from which extrapolation would be necessary--I would argue that these calculations are perhaps more intensive than the direct physics calculations.
Re:Hmmm... by Anonymous Coward · 2007-02-11 06:11 · Score: 0

Mathematicians do not imagine problems in a visual way any more than they imagine problems in a verbal way. Both modes of thought are inherently limiting
Re:Hmmm... by Pinky · 2007-02-11 09:22 · Score: 1

>The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

That depends.. In what way is the brain massively parallel?

Sure it uses many neurons but a processor has many transistors. Do I get to say that processors are massively parallel because every time I do an addition a large number of transistors are used at once; each one only doing a small part of the operation?

The amount of parallelism in the brain is most often over-stated. it's impressive but not magical...

http://www.sciencedaily.com/upi/index.php?feed=Sci ence&article=UPI-1-20070126-18383300-bc-us-onetrac kmind.xml

The sort of parallelism being discussed here is thread based and at that level you are faced with the equivalent of taking a task and dividing it amongst multiple brains. With the added complexity that the brains you're splitting the task up to are annoyingly literal and will gladly go and corrupt data structures and then cause the computer to crash.
Re:Hmmm... by Anonymous Coward · 2007-02-11 11:06 · Score: 0

We are missing, what you think we have: a working brain.
I remember, about 20 years ago, there were two processors (or more, but theese two I remember) on the market: The T400 and the T800 processors, called Transputer, from INMOS. They only were single-cored, but the first attempt to have parallel computation on a board (not on a chip).
And they failed in market.
Maybe the "640 kB is enough" and the "MHz's counts!" propaganda worked good (MickeySoft/Wintel).
Now, as Intel sees that with their stupid CPU-designs, when going up the GHz-ladder only make much more hot processors but not much faster processors, they reuse the Transputer and PPC-designs at least in a rough sense: they do parallelization.
This (parallelization) is needed since 20 years, but it was slowed down by the propaganda of some (two) leading companies...
640 kB is enough,
16 Bit is cool, but enough,
32 Bit is enough, don't use 64 Bits, ....
... but you always need a new WinVersion ;-)
...andhigher clock frequency (it's easier to make transistors switching faster than to rethink the design...)
Ciao,
sdnin
P.S.: Sometimes people say: "well, the tehnology has changed so rapidly", but IMHO it was slowed-down a lot! Technological development could be much faster, if not some big companies would always slow it down, because the big ones have anxiety to be too slow in the market and do not have the CONTROL on the technology, so they buy (the small and innovative) companies, assimilate them, transforming the technology (not always (seldom) in the best direction) and slowing down the speed of technological change.
Re:Hmmm... by Anonymous Coward · 2007-02-11 14:13 · Score: 0

Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?
Clearly, understanding. For example, despite the image that you attempt to project with your comment, I doubt you know anything of substance about the brain.
Re:Hmmm... by blahplusplus · 2007-02-11 16:31 · Score: 1

"Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?"

And yet most people process things sequentially, I think you're mistaking the depth of parallelization that occurs. The mind is enormously more complicated then a computer, the conditions of it's operation are ever changing. New brain cells, culling of old ones, new connections, culling of old ones, etc. Next you forget that the brain is notoriously error prone, ask someone to recall something they did last week, or something they wrote earlier that day, and they will not remember it. Hell ask somebody to memorize a few pages of a novel after one or two quick read throughs, and even the most intelligent people will struggle to recall word-for-word in exacting detail what they just read moments ago. This also applies to what people here and see. While human minds are fantastic at what they can do, they are enormously slow, the whole reason we need to develop technology is because our bodies and minds in many circumstances are not as good as our technology.

Computers have a enormous advantage in the way they store information and the error correction protocols that do not exist to the same degree in human minds. On the internet humans make more mistakes then any random machine error that slips through. If you compared all the computers in the world and their computational/memory error rates and data integrity, against all the human minds, the computers would win hand down.

Human minds are not as good as you make them out to be, most human minds are mind numbingly slow at doing basic tasks (like math), when a computer does them infinitely much faster. Imagine every human being being able to chew through math problems at Core 2 duo + speeds. Societal evolution would be alarmingly much faster then it is today.
Re:Hmmm... by try_anything · 2007-02-12 09:22 · Score: 1

Yet the best baseball players in the world make errors every season (on how many throws? what error rate is that?) and rarely, if ever, make an optimal throw. Even pitchers, the best and most specialized throwers in the world, can't throw each pitch exactly as they intend, despite a lifetime of practice using the same ball at the same distance.

Plus, if you ask them to throw a zucchini accurately, they'll suck until they've had a few hours of practice.
Re:Hmmm... by bshanks · 2007-02-19 00:21 · Score: 1

Yes, there is research on neural net hardware.

I don't work on this myself, but some links:

http://www.amazon.com/Analog-VLSI-Neural-Systems-C arver/dp/0201059924

http://www.google.com/search?q=neural+vlsi

http://www.google.com/search?q=neuromorphic

LabVIEW and Parallelism by SparhawkA · 2007-02-10 12:16 · Score: 4, Insightful

Take a look at LabVIEW, a compiled graphical programming language from National Instruments. It natively supports SMP / multicore / multithreading. Essentially, dissociated pieces of code you write (computations, hardware I/O, etc.) are automatically scheduled in separate threads of execution in order to maximize efficiency. It's an interesting idea: here's a technical article from their website that does a better job of describing it (some marketing included as well): http://zone.ni.com/devzone/cda/tut/p/id/4233

Re:LabVIEW and Parallelism by Doppler00 · 2007-02-10 13:16 · Score: 1

As someone who has used LabVIEW extensively I have to say it kind of sucks. They claim "self documenting" and things like that. If you've ever seen a real LabVIEW program, you'll see that it's a mess, and very difficult to understand and debug. It doesn't help that it hasn't been Object Oriented until version 8. You STILL can't spawn 0 to N threads, it's built in library support is very week (simple things like list manipulation, queues, string processing, etc. don't exist!).

It's a niche product meant for data acquisition systems in experimentation.

And you're also likely to have sequential sections of code even with labview. This isn't really a language problem so much as it is an algorithms problem. There are just too many iterative algorithms that require a fast CPU to execute things sequentially.
Re:LabVIEW and Parallelism by Anonymous Coward · 2007-02-10 13:31 · Score: 0

That's because it's essentially a functional language.

I believe this to be a contradiction in the conventional wisdom of programming languages:
- functional languages are hard! let's stick with Algol-derived languages
- threads are hard! avoid them if at all possible
- but multiprocessing is practically free in a functional language

(Do I sound bitter? Sorry. My coworkers seem to have made it through college CS programs without ever taking a course in FP, or correctness proofs, so I spend too much time straining spaghetti code.)
Re:LabVIEW and Parallelism by the_brobdingnagian · 2007-02-10 13:40 · Score: 1

Labview itself would not be useful for high performance computing because it's made for data acquisition. The concept of programming with blocks and wires forces you to split your "code" in independent pieces which can be executed independently. Maybe an add-on to Labview or just a similar language written for high performance computing could make it very easy to create great parallel software.
Re:LabVIEW and Parallelism by SparhawkA · 2007-02-10 21:03 · Score: 1

Apparently not "extensively" enough. "Simple things like libraries for list manipulation, queues, string processing" have been around in LabVIEW for many years. And yes, you can also spawn an arbitrary number of threads through dynamic calls to functions. Please don't make ridiculous assertions unless they're somewhat founded in truth. Thank you.
Re:LabVIEW and Parallelism by Doppler00 · 2007-02-12 13:55 · Score: 1

Maybe they have been around for LabVIEW, but they are not part of the base library at all. Comparing LabVIEW to say Java or Python's list/queue manipulation abilities is a joke. LabVIEWS concept of "clusters" and "arrays" is also difficult to work with a lot of times. It also lacks even a 64 bit native data types (at least in version 7 that I'm aware of).

I'm not going to waste my time with a "graphical" language when I need to do complex string manipulations. The fact that NI markets LabVIEW as an "end all" solution to everything is counter productive for certain projects.

Do you have any examples of spawning multiple threads? I've asked at least 3 or 4 representatives from NI in person and none of them were able to demonstrate it. Yes, they showed how to dynamically load a different VI, but not how to programatically spawn the function. If it's so simple, why couldn't I find examples of this in the help?

As someone who had to maintain a complex SCADA system written in LabVIEW for over a year I think I have at least some experience with the language. Re-writing it in Python was much easier in the long run though.

From TFA by obender · 2007-02-10 12:17 · Score: 5, Funny

The target should be 1000s of cores per chip

640 cores should be enough for anyone.

Erlang: The Movie by djberg96 · 2007-02-10 12:20 · Score: 2, Funny

In case any has missed it: http://video.google.com/videoplay?docid=-583031888 2717959520

I can't wait for the sequel!

--
In the immortal words of Socrates, "I drank what?"

Re:Erlang: The Movie by Anonymous Coward · 2007-02-10 15:21 · Score: 0

Hello Joe, Hello Mike.

Classic stuff.

Express What, Not How by RAMMS+EIN · 2007-02-10 12:25 · Score: 3, Informative

I think parallelism can be achieved elegantly using languages that express what is to be done, rather than how it is to be done. Functional programming is a major step in the right direction. Not only do functional programs typically more clearly express what is to be done (as opposed to which steps are to be taken to get there), they also tend to cause fewer side effects (which restrict the correct evaluation orders). In particular, not using variables avoids many of the headaches traditionally involved in multithreading.

--
Please correct me if I got my facts wrong.

Re:Express What, Not How by ardor · 2007-02-10 12:37 · Score: 2, Insightful

Functional languages are no silver bullet, however. Things like I/O do not fit well in there. Yes, there are solutions for this, but they tend to be overly complicated. A hybrid functional/imperative language with safeguards for side-effects of the imperative parts seems to be the way to go.

--
This sig does not contain any SCO code.
Re:Express What, Not How by MajroMax · 2007-02-10 12:48 · Score: 1

A hybrid functional/imperative language with safeguards for side-effects of the imperative parts seems to be the way to go.

You just described Common Lisp.

--
"Evil company X is threatening to restrict our rights! Let's all get together to stop--OOOH! SHINEY!!!" -- AC
Re:Express What, Not How by Eli+Gottlieb · 2007-02-10 12:59 · Score: 1

Where does Common Lisp protect side-effects?
Re:Express What, Not How by WaffleKing16 · 2007-02-10 13:34 · Score: 1

http://haskell.org/ No side effects and it has a great way of dealing with IO. Plus the newest version of GHC has some great new features for writing parallel programs.
Re:Express What, Not How by The_Wilschon · 2007-02-10 14:33 · Score: 1

I suppose by usually having two different functions, one which works by side effects, and one which does the same thing except without side effects of any kind. Scheme does an even better job of this with the bang convention (adding a ! to the end of function name to denote that this version is destructive).

Not sure if that is what you mean, and it probably isn't quite good enough, but it is perhaps the right direction?

--
SIGSEGV caught, terminating

wait... not that kind of sig.
Re:Express What, Not How by Dorceon · 2007-02-10 16:27 · Score: 1

Church devised the Lambda Calculus and was largely ignored. Then Turing devised the Turing Machine. Once everyone accepted the Turing Machine, Turing proved that it was equivalent to the Lambda Calculus. In other words, functional programming wasn't a step in the right direction--procedural programming was a step in the wrong direction!

--
What sound do people on rollercoasters make? Hint: it's not Xbox 360.

massively parallel CPUs by Anonymous Coward · 2007-02-10 12:28 · Score: 0

Right, massively parallel computing doesn't work? And, this is from a research institution? And, research institutions don't use massively parallel computers? Right.

Hmmm, seems to me that some cpu makers, just like car makers, would like to continue selling you legacy hardware. I am not buying. Commoditized massively parallel cpus are a step in the right direction. This is the way that RAM memory is sold today. The same works with CPUs. In fact, I would go a little further and combine the two, CPU with RAM, in a single commodity. $5 per cpu means I can buy hundreds of cpus for the same price as that single monolithic legacy cpu that you have little choice about today. ie. The desktop computer company let's you choose the amount of RAM, and size of hard disk, but they don't let you choose which CPUs. I think that this is all about to change.

Is it a compiler problem or an os problem? by Anonymous Coward · 2007-02-10 12:39 · Score: 1, Interesting

Most of the time a computer isn't churning away on a single problem that needs to be parallelized. In that respect, the solution rests more with the operating system. Something like a server could relatively easily make use of almost any number of processors; one per client maybe. The trouble comes with bus contention. It is a problem similar to designing a subnet. If you have too many computers/processors, communications grind to a halt. In other words, adding processors slows things down rather than speeding them up.

The solution to adding more processors may be architectural. Adding more busses, ala harvard architecture for example, might be an effective approach. This is the approach used in DSPs which are a lot more powerful on a per-cycle basis than the more conventional architecture.

Re:Is it a compiler problem or an os problem? by Doctor+Memory · 2007-02-10 16:06 · Score: 2, Insightful

Most of the time a computer isn't churning away on a single problem that needs to be parallelized. In that respect, the solution rests more with the operating system. True, and in the short term I'd imagine that's where most of the improvements will appear, but context switches are expensive. It's more efficient if you can switch execution to another thread within the same context, so you can (hopefully) still use the data in your I & D caches, although you do still have a register spill/reload.

Speaking of architecture changes, it sounds like Intel is going down the same road the Alpha team did — more cacheing. I remember reading an article about one system DEC made (ISTR this was about the time the 21264 came out) that had 1M of L1 cache, 2M of L2, and 8M of L3. I wonder how much cache they could squeeze onto a chip, given current power handling.

--
Just junk food for thought...
Re:Is it a compiler problem or an os problem? by vhogemann · 2007-02-11 01:21 · Score: 1

Agreed,

Also one may notice that many applications will see a boost in performance, even being sequential. That's because it will suffer less from context switching, since there's more than one processing core to share the load.

I can see this trend as a big win for the OpenSource OSes out there. Remember how the Linux scheduler was slow, and how Apache2 on Windows was faster? That doesn't last long, a new scheduler was written, and the Linux performance for treads got a big boost, and just kept on getting better. Windows, on the other hand, just couldn't keep up... because Microsoft doesn't have the necessary agility, and because the Windows kernel is tied with too much stuff.

Just my $0.02

--
---- You know how some doctors have the Messiah complex - they need to save the world? You've got the "Rubik's" complex

Sure, desktop apps may get diminishing returns... by GoldTeamRules · 2007-02-10 12:44 · Score: 1

But, service based applications will soar. As a web developer, I'm thrilled by the prospect of being able to run my web application on 16 and 32 core machines. I could care less about whether office apps run any more quickly.

Re:Sure, desktop apps may get diminishing returns. by gbjbaanb · 2007-02-10 12:51 · Score: 1

This isn't bringing parallelisation to your web app for free, instead you're getting 1 cpu just liek you used to have. The difference is that the server can handle many more users, with new limits of memory and disk i/o, of course.

It's a supercomputer perspective by Animats · 2007-02-10 12:59 · Score: 5, Insightful

I just heard that talk; he gave it at EE380 at Stanford a few weeks ago.

First, this is a supercomputer guy talking. He's talking about number-crunching. His "13 dwarfs" are mostly number-crunching inner loops. Second, what he's really pushing is getting everybody in academia to do research his way - on FPGA-based rackmount emulators.

Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputer before you find the first real commercial customer. It's BMW, and the system is a cluster of 1024 Intel x86 1U servers, running Red Hat Linux. Nothing exotic; just a big server farm set up for computation.

More CPUs will help in server farms, but there we're I/O bound to the outside world, not talking much to neighboring CPUs. If you have hundreds of CPUs on a chip, how do you get data in and out? But we know the answer to that - put 100Gb/s Ethernet controllers on the chip. No major software changes needed.

This brings up one of the other major architectural truths: shared memory multiprocessors are useful, and clusters are useful. Everything in between is a huge pain. Supercomputer guys fuss endlessly over elaborate interconnection schemes, but none of them are worth the trouble. The author of this paper thinks that all the programming headaches of supercomputers will have to be brought down to desktop level, but that's probably not going to happen. What problem would it solve?

What we do get from the latest rounds of shrinkage are better mobile devices. The big wins commercially are in phones, not desktops or laptops. Desktops have been mostly empty space inside for years now. In fact, that's true of most non-mobile consumer electronics. We're getting lower cost and smaller size, rather than more power.

Consider cars. For the first half of the 20th century, the big thing was making engines more powerful. By the 1960s, engine power was a solved problem, (the 1967 turbine-powered Indy car finally settled that issue) and cars really haven't become significantly more powerful since then. (Brakes and suspensions, though, are far better.)

It will be very interesting to see what happens with the Cell. That's the first non-shared memory multiprocessor to be produced in volume. If it turns out to be a dead end, like the Itanium, it may kill off interest in that sort of thing for years.

There are some interesting potential applications for massive parallelism for vision and robotics applications. I expect to see interesting work in that direction. The more successful vision algorithms do much computation, most of which is discarded. That's a proper application for many-CPU machines, though not the Cell, unless it gets more memory per CPU. Tomorrow's robots may have a thousand CPUs. Tomorrow's laptops, probably not.

Re:It's a supercomputer perspective by deadline · 2007-02-10 13:53 · Score: 2, Interesting

Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputer before you find the first real commercial customer.
You may want to adjust your truth as your measure of the market is wrong. The Top500 is not a marketing survey and just because you have HPC hardware does mean you run out and try an get it on the Top500. Many companies are using (HPC) parallel cluster computers, but they choose to be quiet about it for competitive reasons. The 2005 HPC market was well over 9 Billion and IDC predicts a 9% AGR bringing the market to over $14 Billion in 2010. Can you tell me what other markets are offering such growth rates these days?
Supercomputer guys fuss endlessly over elaborate interconnection schemes, but none of them are worth the trouble.
If as most companies have stated that they cannot compete without using HPC, R&D at the high end is indeed, worthwhile. The endless fussing eventually makes it to commodity markets, open you computer case and look around.

--
HPC for Primates. Read Cluster Monkey
Re:It's a supercomputer perspective by RecessionCone · 2007-02-10 14:05 · Score: 5, Informative

I don't think you were listening very carefully to the talk (or know much about Computer Architecture) if you think Dave Patterson is a supercomputer guy. Perhaps you've heard of the Hennessy & Patterson Quantitative Approach to Computer Architecture book (you know, the one used at basically every university to teach about computer architecture). Patterson has been involved in a lot of different things within computer architecture over the years, including being one of the main people behind RISC and RAID (as well as being the president of the ACM). I saw his talk when it was given at Berkeley, and you really missed the point if you thought it was about supercomputing. The talk was about the future of computing in general, which is increasingly parallel, in case you're unaware of that fact. GPUs are already at 128 cores, Network processors are up to 200 cores. Intel is going to present an 80 core x86 test chip tomorrow at ISSCC. Physics won't support faster single core processors at the rate we're accustomed to, so the whole industry is going parallel, which is a sea change in the industry. Patterson's talk is aimed at the research community, since we don't have good answers as to how these very parallel systems should be architected and programmed. FPGA emulation is a great way to play around with massive multiprocessor configurations and programming strategies, which is why Patterson is advocating it (his RAMP project has researchers from MIT, Berkeley, Stanford, Texas, Washington involved (among others)). You also need to have a little more imagination about what we could do with more computing power. Try looking at Intel's presentations on RMS http://www.intel.com/technology/itj/2005/volume09i ssue02/foreword.htm.
Re:It's a supercomputer perspective by Wesley+Felter · 2007-02-10 14:10 · Score: 1

The author of this paper thinks that all the programming headaches of supercomputers will have to be brought down to desktop level, but that's probably not going to happen. What problem would it solve?

It would solve the problem of software actually being able to run faster on computers that will be produced in the near future. If the status quo continues, desktop software simply won't get any faster, even when computers get faster; this seems like a real waste.
Re:It's a supercomputer perspective by antifoidulus · 2007-02-10 14:53 · Score: 2, Insightful

Also keep in mind that many companies aren't interested in linpac peformance per se, at least to the extent that they will spend a lot of time and effort tweaking their computers to get really high linpac scores, which is all that is important when it comes to top500.

--
Monstar L
Re:It's a supercomputer perspective by the+donner+party · 2007-02-10 19:54 · Score: 1

If the software guys cannot take advantage of the increasing hardware performance that is provided by more parallel designs, this will severely dampen the economic incentive to develop parallel hardware. Why would an executive invest in new hardware and software if the end result is no more effective than the old stuff he already has? I'm sure there will be many cases where the software does work better on new parallel hardware, but if the required programming effort is significantly higher than the comparable sequential programming effort, this will inevitably diminish the returns available from parallel hardware developments. Perhaps R&D efforts will be redirected from hardware to smarter parallelizing development tools (not only compilers but refactoring tools, debuggers etc.)
Re:It's a supercomputer perspective by Anonymous Coward · 2007-02-10 20:32 · Score: 2, Informative

I posted most of the following comment elsewhere, but this seems to be a better place for it.

I also saw Dave Patterson give this talk at Stanford. RecessionCone is correct, and Dave did mention this at the Stanford talk: the whole world is going parallel, not just the supercomputer space, because chip makers can't figure out how to make chips faster sequentially. Chip makers are throwing a hail Mary pass (Dave's metaphor) by putting parallel chips out there and praying that someone can figure out how to exploit them effectively.

There was an interesting exchange with an audience member who used to work on the Transputer: the audience member pointed out that many of these predictions had been made before more than a decade ago, but they never happened. Dave basically said that it was because Intel managed to squeeze more sequential performance out of their chips, and that killed parallel chip projects like Transputer. Nobody really wants parallelism. It's hard. Chip makers are going for it now simply because they have no choice; nobody has any idea whether we'll be able to develop the software techniques needed to exploit massively parallel processors effectively.

Another audience member asked what would happen if someone figured out a way to squeeze more sequential performance out of chips. Dave smiled and said he would be very interested in talking to that person.
Re:It's a supercomputer perspective by Tom+Womack · 2007-02-11 22:02 · Score: 2, Interesting

The commercial market for large-scale computers exists, it just doesn't advertise itself on the top500 list. Banks have big clusters for monte-carlo work (why do you think the most-hyped benchmark application when Intel talks about eight-core Clovertown systems is an asset-pricing model); oil companies have enormous clusters for seismic work.

But both of those are intrinsically parallel jobs, so the clusters don't need the interconnect to run linpack, and there's no advantage to BP or Barclays in appearing on the top500 list -- they're not looking for the kind of researcher who picks his employer by acreage of computers, whilst Aldermaston and the Met Office may well be.

One of the embarrassing truths of computation is that an awful lot of jobs are either small enough to run on a single PC or too large to run on anything; there aren't that many things (climate modelling's the obvious one) constrained by amount of computation.

Harvard architecture since i486 by tepples · 2007-02-10 13:07 · Score: 0

Most of the time a computer isn't churning away on a single problem that needs to be parallelized. Perhaps a server can be parallelized, but what about a home computer?

Adding more busses, ala harvard architecture for example, might be an effective approach. There is already a Harvard architecture inside PC-class CPUs since the i486, as the level 1 cache is split into separate sections with separate buses for instructions and data.

Programming 10,000 Cores by deadline · 2007-02-10 13:14 · Score: 2, Interesting

Those of us that use HPC clusters (i.e. Beowulf) have been thinking about these issues as well. For those interested, I wrote a series of articles on how one might program 10,000 cores (based on my frustrations as programmer and user of parallel computers). Things will change, there is no doubt.

The first in the series is called Cluster Programming: You Can't Always Get What You Want The next two are Cluster Programming: The Ignorance is Bliss Approach, and Cluster Programming: Explicit Implications of Cluster Computing.

Comments welcome.

--
HPC for Primates. Read Cluster Monkey

Check out co-array fortran.... by epiphyte(3) · 2007-02-10 13:21 · Score: 2, Informative

AKA F--, The simplest explicit programming model on the planet. Brainchild of Bob Numrich, unsung hero of Cray Research in the early 90's ( & probably much before... but that was when I was lucky enough to work with him) F-- was Numrich's second great contribution to parallel programming models... the first being the shmem model for the Cray T3D, Four assembly routines which made the raw capabilities of the T3D available to massively parallel applications when every other programming model (e.g. MPI) had about 50x the communication overhead. This was a big factor in Cray's takeover of the short-lived MPP market in the mid 90's. On the topic of the thread.... Explicit programming models scale to thousands of processors, implicit ones peter out at 4-8. The reason is Data Locality. Explicit models ensure that the processor is operating on data which is local and unshared. Implicit models end up fighting for operands with competing processors. This requires either heroic hardware ( e.g. 70% of the Cray C-90s processor logic was concerned with memory contention resolution) or a dramatic performance dropoff.

Que : Inherently Parallel Programming by FMota91 · 2007-02-10 13:32 · Score: 2, Interesting

Actually, I've been working on a programming language/model that makes programs inherently parallel. Of course, it is quite different from anything currently in existence. Basically, it uses a queue (hence the name "Que") to store data (like the stack in FORTH), but due to the nature of the queue, programs become inherently parallel. Large programs could have hundreds of processes running at the same time, if so inclined.

If you are interested, check out my project (there's not much there right now), and/or contact me at FMota91 at GMail dot com.

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C1 bottles of beer on the wall. Take one down, pass it round... Oh, umm...

Re:Que : Inherently Parallel Programming by Anonymous Coward · 2007-02-10 13:52 · Score: 0

Using queues in parallel languages is not new. Your project also contains, like, 5 files and was started on January 30. Apparently you're not very far along yet.
Re:Que : Inherently Parallel Programming by FMota91 · 2007-02-10 14:07 · Score: 1

Thank you for answering me. I was not aware of that project, but I will look into it to see what I can gleam from it. And yes, I have only just started.

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C1 bottles of beer on the wall. Take one down, pass it round... Oh, umm...
Re:Que : Inherently Parallel Programming by Anonymous Coward · 2007-02-10 14:46 · Score: 0

Since you've clearly not done any research into programming languages, I suggest you hold back on calling it "very different from other programming languages".
Re:Que : Inherently Parallel Programming by FMota91 · 2007-02-10 15:18 · Score: 1

It is definitely very different from cilk. And I have looked into a fair amount of different programming languages.

cilk looks a lot like C -- in fact, it is basically an extionsion of C.
Que looks a bit like FORTH. It is not an extension, nor is it programmed anything like FORTH.

cilk has explicit parallelism.
Que has implicit parallelism.

Here's a snippet of Que code (from inside a "word" that duplicates three items on the queue):
( a b c -- a b c a b c )
dup dup dup
pass swap swap pass
pass pass swap pass pass

PS: I believe I have studied [much] more than a dozen programming languages. That's more than most people I know (although admittedly I do not know that many people in CS). Most languages are similar to other languages, with a few exceptions (FORTH, Lisp, almost every esoteric programming language). Que is different from all of those programming languages.

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C1 bottles of beer on the wall. Take one down, pass it round... Oh, umm...
Re:Que : Inherently Parallel Programming by xenocide2 · 2007-02-10 15:52 · Score: 1

Indeed, I work with TinyOS, an open source platform for embedded sensor network systems. One of the features is a task queue. Unfortunately, it doesn't support associating data with a given task in the queue, which would make life much easier for me, and much harder for the people authoring the system. The whole thing's written in "nesC" which is basically C with a few extras and some component oriented things to modularize C.

As a language, its design is to take interrupts and parallelism and make it part of the language. You're allowed to label sections atomic, and the compiler enforces this. Currently it just turns off interrupts, but analysis could improve this by removing useless atomics and perhaps identifying when one atomic section won't interfere with another. But for now, turning off interrupts is cheap enough that it's hard to justify stronger synchronization primitives at runtime. Currently the system assumes a single task from the queue will be running, so one can guarantee atomicity between them, but I don't know if the langauge / system explicitly states this.

Unfortunately, this article claims to have inspected the subject of embedded systems and parallelism, but aside from some head-nodding to VxWorks, there's not much meat on for the subject. Which is probably ok. The paper focuses on compute bound domains like BLASTp, the stuff we work with is limited more by battery longevity and the speed at which connected devices work. If a 12bit ADC is going to take some time, we'd rather go into a low power state if possible. Same goes for writing to SD over i2c. Yet some of the conventional wisdoms the authors put forth are true. Transistors (atmega128s) are cheap, but battery power is not. The cost of going out to the field and replacing a battery could approach the cost of replacing the device. Many deployments just ignore battery failures, since once you have a hundred of these things deployed you're not gonna find the failed one anyways. But there's still a few that are just junk for us, such as floating point vs memory and the potential for faster CPUs. But as I noted above, our software isn't trying to solve some compute intensive problem, it's controlling hardware.

--
I Browse at +4 Flamebait
Open Source Sysadmin
Re:Que : Inherently Parallel Programming by Anonymous Coward · 2007-02-10 16:50 · Score: 0

You should read the paper before you chalk Cilk up to a C extension. Specifically, look at the guarantees that cilk makes about load balance and work allocation, and the work-stealing that gets done between threads to accomplish that. Cilk's parallelism is explicit in that concurrent tasks must be spawned, but it's implicit in that threads are scheduled intelligently, even on potentially heterogeneous machines.
Re:Que : Inherently Parallel Programming by Anonymous Coward · 2007-02-10 19:55 · Score: 0

"check out my project (there's not much there right now)"
Perhaps I'm missing the joke, but your page on sourceforge is a linkfarm.
MSS: Massive parallel spamming ?
Re:Que : Inherently Parallel Programming by FMota91 · 2007-02-11 06:29 · Score: 1

Actually, I typed the address wrong. It's http://sourceforge.net/projects/que

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C1 bottles of beer on the wall. Take one down, pass it round... Oh, umm...

Berkeley View Links by RecessionCone · 2007-02-10 13:39 · Score: 2, Informative

For those that are interested, the Berkeley View project website is at http://view.eecs.berkeley.edu/, which includes some video interviews with the principal professors involved in the project. There is also a blog at http://view.eecs.berkeley.edu/blog/

Parallelizing Compilers by jd · 2007-02-10 13:54 · Score: 3, Insightful

This problem was "solved" (on paper) in the mid 1970s. Instead of writing a highly complex parallel program that you can't readily debug, you write a program that the computer can generate the parallel code for. Provided the compiler is correct, the sequential source and the parallel binary will be functionally the same, even though (at the instruction level) they might actually be quite different. What's more, if you compile the sequential source into a sequential binary, the sequential binary will behave exactly the same as the parallel version (only much slower).

Any reproducable bug in the parallel binary will be reproducable given the same set of inputs on the sequential binary, which you can then debug as you have the corresponding sequential source code.

So why isn't this done? Automagically parallelizing compilers (as opposed to compilers that merely parallelize what you tell them to parallelize) are extremely hard to write. Until the advent of Beowulf clusters, low-cost SMP and low-cost multi-core CPUs, there simply haven't been enough machines out there capable of sufficiently complex parallelism to make it worth the cost. Simply make a complex-enough inter-process communication system, with a million ways to signal and a billion types of events. Any programmer who complains they can't use that mess can then be burned at the stake for their obvious lack of appreciation for all these fine tools.

Have you ever run GCC with maximum profiling over a program, tested the program, then re-run GCC using the profiling output as input to the optimizer? It's painful. Now, to parallelize, the compiler must automatically not just do one trivial run but get as much coverage as possible, and then not just tweak some optimizer flags but run some fairly hefty herustics to guess what a parallel form might look like. And it will need to do this not just the once, but many times over to find a form that is faster than the sequential version and does not result in any timing bugs that can be picked up by automatic tools.

The idea of spending a small fortune on building a compiler that can actually do all that reliably, effectively, portably and quickly, when the total number of purchasers will be in the double or treble digits at most - say what you like about the blatant stupidity rife in commercial software, but they know a bad bet when they see one. You will never see something with that degree of intelligence come out of PCG or Green Hills - if they didn't go bankrupt making it, they'd go bankrupt from the unsold stock, and they know it.

What about a free/open source version? GCC already has some of the key ingredients needed, after all. Aside from the fact that the GCC developers are not known for their speed or responsiveness - particularly to arcane problems - it would take many days to compile even SuperTuxKart and probably months when it came to X11, glibc or even the Linux kernel. This is far longer than the lifetime of most of the source packages - they've usually been patched on that sort of timeframe at least once. The resulting binaries might even be truly perfectly parallel, but they'd still be obsolete. You'd have to do some very heavy research into compiler theory to get GCC fast enough and powerful enough to tackle such problems within the lifetime of the product being compiled. Hey, I'm not saying GCC is bad - as a sequential, single-pass compiler, it's pretty damn good. At the Supercomputer shows, GCC is used as the benchmark to beat, in terms of code produced. The people at such shows aren't easily impressed and would not take boasts of producing binaries a few percent faster than GCC unless that meant a hell of a lot. But I'm not convinced it'll be the launchpad for a new generation of automatic parallelizing compilers. I think that's going to require someone writing such a compiler from scratch.

Automatic parallelization is unlikely to happen in my lifetime, even though the early research was taking place at about the time I first started primary school. It's a hard problem that isn't being made easier by having been largely avoided.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:Parallelizing Compilers by hszp · 2007-02-11 00:37 · Score: 1

"it would take many days to compile even SuperTuxKart and probably months when it came to X11, glibc or even the Linux kernel."

Imagine there would be a compiler running highly optimised for multiple processors!
Holy sh*t, that would be the Holy Grail of parallel computing!
Just imagine...

I wonder though how long it would take for this thing to compile itself or how many days/months/years its debugging would require.
Re:Parallelizing Compilers by jd · 2007-02-11 13:18 · Score: 1

I guess it was meant in humour. Obviously a compiler that can't compile itself isn't much use as a compiler, so an automatically-parallelizing compiler could automatically parallelize itself. The herustics involved in parallelizing tend not to be amenable to much parallelizing and are NP-complete, which means you don't actually get very far down that road. Debugging, though, isn't an issue. The point is that the compiler does the lifting work, you only need to debug the method. The method will be something of the form: X depends on Y, so X cannot start until Y finishes, so Y is the largest contiguous block that can be truly parallel. Y can be split into parallel tasks N different ways. Y(1) takes t(1) clock cycles to finish, Y(2) takes t(2) clock cycles, and so on. First go round, fastest way wins.
You then profile, look at the overheads, and eliminate certain possibilities from the list. You then repeat the method. Basic herustics. Nothing complicated in herustics - they're actually quite easy to debug. Much gentler on the brain cells, they're just slooooow. Oh, and the compiler should be using various sizes of X and Y, as some functions in the code may be subject to coarse-grain parallelism, others will work better with fine-grain. The compiler has to try very few possibilities to get something that works, but has to try an impossibly large number to get something that works at nearly the speeds of well-written hand-turned code.
Debugging has never been a real issue in programming, although those wanting to be paid more will tell you it is. In the same way that Software Engineering espouses programming from a formal specification, it is perfectly possible to turn any existing sequential program into a formal specification. How does this help? Because it is much easier to prove a specification correct than it is to prove a program correct. By producing a specification that describes a program exactly as it is written, any bug in the program will also be in the specification, in exactly the same place. That makes it much easier to find and therefore debug.
Specifications and programs are one and the same thing, if they are complete and correct. It's a transform, no different from FFTs, Laplace, or cartesian-to/from-polar mappings. Transform into the easiest domain to work in, then transform back into the doman the computer will find easiest to work in. This isn't hard, guys.
And this goes back to the automatic parallelization. Sequential logic and parallel logic must ultimately be the same, since the programs do the same thing. Therefore, one is just a transform of the other. Understand that and there is no problem. You want to debug sequential code? Sure, apply the transform. You want to run parallel code? Sure, apply the transform. If you have a problem with transforms, practice on a slide-rule for a bit.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Parallelizing Compilers by Lost+Engineer · 2007-02-12 14:22 · Score: 1

I agree with you that auto-parallelization is unlikely to happen in (your|my) lifetime. I think it's even more distant for C, although it is being tried for Fortran (see HPF and Co-array Fortran) and Java (err not-Java but whatever) (see Fortress).

We programmers are just going to have to adjust our mindset when it comes to performance programming. The compilers will be there to give us the means to express parallelism, but divining it is a lot to ask.

That said, some C compilers already auto-vectorize, and when they unroll loops, a similar kind of dependency-checking goes on. You could argue in those cases that you've already given the compiler information about parallelization in the form of a for-loop and restrict pointers. I've yet to see a compiler that will generate any parallel code that requires any kind of synchronization in C. We're going to have to change to something that sacrifices some of the flexibility of C for ease of parallelization.
Re:Parallelizing Compilers by jd · 2007-02-12 17:29 · Score: 1

Some flexibility/parallelizability (is that a word?) trade-off is done with the OpenMP extensions to C, and also in Universal Parallel C (UPC). These are decent-enough languages, for many things, but don't expect someone to write a clone of Linux using either. There are also languages specifically designed for parallel processing, such as Occam. KROC (a portable Occam compiler that works well with Linux) is great, if what you want to do is Occam programming. Hey, nothing wrong with the language - I love it - but it is not well-suited to some of the heavy-lifting people want parallelism for. No dynamic structures, for example. Memory management is not easy or nice under a parallel environment, but killing off malloc makes it a nightmare to write anything of any complexity.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Shared-state concurrency is harder than you think. by zestyping · 2007-02-10 13:59 · Score: 5, Insightful

Reliably achieving even simple goals using concurrent threads that share state is extremely difficult. For example, try this task:

Implement the Observer (aka Listener) pattern (specifically the thing called "Subject" on the Wikipedia page). Your object should provide two methods, publish and subscribe. Clients can call subscribe to indicate their interest in being notified. When a client calls publish with a value, your object should pass on that value by calling the notify method on everyone who has previously subscribed for updates.

Sounds simple, right? But wait:

What if one of your subscribers throws an exception? That should not prevent other subscribers from being notified.
What if notifying a subscriber triggers another value to be published? All the subscribers must be kept up to date on the latest published value.
What if notifying a subscriber triggers another subscription? Whether or not the newly added subscriber receives this in-progress notification is up to you, but it must be well defined and predictable.
Oh, and by the way, don't deadlock.

Can you achieve all these things in a multithreaded programming model (e.g. Java)? Try it. Don't feel bad if you can't; it's fiendishly complicated to get right, and i doubt i could do it.

Or, download this paper and start reading from section 3, "The Sequential StatusHolder."

Once you see how hard it is to do something this simple, now think about the complexity of what people regularly try to achieve in multithreaded systems, and that pretty much explains why computer programs freeze up so often.

Systems won't be so homogenius by BlueCoder · 2007-02-10 14:05 · Score: 1

As parallelism ramps up the processor cores will get simpler and we will return to the concept of RISC computing. But the programing paradigm will be virtual processors. Furthermore some processors will be better at certain operation than others.

What will eventually turn into an AI subsystem, will postcompile and dynamically rewrite programs to take best advantage of the system. By AI I don't mean anything sentient, just something that intelligently recognizes programming structures and patterns and restructures the code to match the hardware. Think IA64 on steroids.

Re:Systems won't be so homogenius by Anonymous Coward · 2007-02-10 16:21 · Score: 0

I'm trying to understand what you mean but it's sounding like a string of technical sounding words put together to seem intelligent. The RISC computing is simply reducing the instruction set and addressing modes available to the programmer. I don't see how concurrency is a CISC/RISC issue - it's a software design issue.

Java invokeLater() by Latent+Heat · 2007-02-10 14:19 · Score: 1

In Java, you can have threads communicate by passing objects that you treat as immutable -- a thread creates an object, passes it as an argument to invokeLater(), where it gets synchronized with the event queue, acted upon, and then garbage collected. That the thread that called invokeLater() on an object no longer retains any references to that object is a matter of programming discipline, but that style of multi-threaded programming is available.

The question or concern I have is that if one programs threads that way, communicating by creating, passing, and garbage collecting objects, is the garbage collector happy with this, or does this create more overhead than you save with the thread parallelism (on multi core of course)?

Re:Java invokeLater() by Anonymous Coward · 2007-02-10 17:53 · Score: 0

What? didn't anyone ever tell you it uses magic?
Re:Java invokeLater() by Pinky · 2007-02-11 08:31 · Score: 1

The short answer is the modern java garbage collector is tuned to deal with lots of small, short lived objects so odds are you're all right. An interesting thing to think about is when you do invokeLater(), a common style is to create a new Runnable().. I don't know if you're doing this but if you are then you're creating a new Runnable object everytime anyway.. Also know that there is some overhead to switching threads.

If you want to now where you're being in-efficient, you've got to profile, baby... Don't guess!!!! find out for sure.

( You might have a forgotten sleep statement somewhere.. I've seen this quite a few times actually ! :) )

It is hard by MtHuurne · 2007-02-10 14:27 · Score: 1

I don't say "never use threads", but I do say "don't use threads unless you have to". Bugs in threaded code are easier to introduce, are hard to see in code reviews, are very likely to remain undetected in manual testing, are likely to remain undetected in automated testing and are a pain to reproduce.

Part of the problem is the inherent difficulty of parallelism, but there is more to it than that. Threads are not isolated in any way: they can take unpredictable paths through the program and touch any data structure. It's a good idea to design your threaded program in such a way that each thread sticks to a small section of the code and data, but this is a structure that you have to impose yourself: the threading mechanism does not help here.

So, in my opinion threads are hard. Sometimes you have to do hard things though. But if you can avoid it, you can save yourself a lot of effort and risk by sticking to a single thread. I think the situation can be compared to memory allocation: noob programmers are sure to mess up explicit allocation/deallocation; experienced programmers get it mostly right, but spend a lot of effort on it and still make mistakes from time to time (especially in the error handling paths). Garbage collection avoids a lot of problems.

For the future, I think we should have some kind of mechanism for parallelism other than threads. It would have to isolate the parallel computations from each other and provide a safe and efficient way to pass control and data between them. I read a bit about Erlang when it was mentioned here a couple of days ago, but it is not yet clear to me if/how this could be fitted into an object oriented language. That would be essential to get it to the masses: functional programming hasn't become mainstream because it doesn't match the way most programmers think.

Re:It is hard by Anonymous Coward · 2007-02-10 14:48 · Score: 1, Insightful

"don't use threads unless you have to" should read: "don't use ANY arbitrary feature unless you have to" i.e. KISS.
Re:It is hard by Anonymous Coward · 2007-02-10 18:46 · Score: 0

"don't use threads unless you have to" should read: "don't use ANY arbitrary feature unless you have to" i.e. KISS.

That's why I never use comments!

Seriously, all features are not created equal. Threads have a relatively high cognitive burden.

Never underestimate the virus writers. by DrYak · 2007-02-10 14:44 · Score: 1

run my web application on 16 and 32 core machines. I could care less about whether office apps run any more quickly.

Never underestimate the huge quantity of virus/trojans/spywares/sony rootkits/spambots/etc. pollution that joe 6-pack will be running next to his power hungry Microsoft OS and his main office work.

16/32 cores machines could be useful for him too, so his main work (browsing for pr0n ?) won't slowdown for being constantly interrupted by the multitasking scheduler to let the huge pile of crap run.

---

1 core for Office.
4 cores for the "WoW effect".
11 remaining cores pumping spam.

1 "Intel Core Pento 16 cores" to rule them all.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

There is no one right answer by kwahoo · 2007-02-10 16:12 · Score: 3, Insightful

...if there were, the langauge wars of the 80s and 90s would have produced an answer. And what new langauge caught on? Not Sisal, or C*, or Multilisp, etc. It was Java. And C, C++, and Fortran are still going strong.

Part of the problem, as previous posts have observed, is that most people didn't have much incentive to change, since parallel systems were expensive, and bloated, ineffeicient code would inevitably get faster thanks to the rapid improvement in single-thread performance that we enjoyed until recently. So outside of HPC and cluster apps, most parallelism consisted of decoupling obviously aynchronous tasks.

I don't think there ever will be one language to rule them all.... The right programming model is too dependent on the application, and unless you are designing a domain-specific system, you will never get people to agree. Depending on your needs, you want different language features and you make different tradeoffs on performance vs. programmability. For some applications, functional programming languages will be perfect, for others Co-Array Fortran will be, for others an OO derivative like Mentat will be, etc. And as new applications come to the fore, new languages will continue to spawn.

I think the key is to:

Do your best to estimate what range of applications your chip will need to support (so you could imagine that chips for desktops/workstations might diverge from those for e-commerce datacenters--which we already see to a mild extent)
Deduce what range of programming language features you need to support
Do your best to design an architecture that is flexible enough to support that, and hopefully not close off future ideas that would make programming easier.

If one programming model does triumph, I would predict that it will be APIs that can be used equally well from C, Fortran, Java, etc., thus allowing different application domains to use their preferred APIs. And even that model is probably not compelling enough to bring them all and in the dark bind them....

Re:Shared-state concurrency is harder than you thi by xenocide2 · 2007-02-10 16:43 · Score: 1

Why does the notify allow() for throwing exceptions? The only exception I can see that it throws is IllegalMonitorState, and that seems easy enough to fix. I vaguely recall something about runtime exceptions or whatnot always being throwable, but really, there's nothing the Observer can do about such things except carry on with the rest of notifications. This is bad for debugging, so one solution would be to hold the exceptions until all notifications have been done and then deal with them all at once. Simply ignoring exceptions isn't an option, multi-threaded or not. If anything, this is an example of how instructional curricula have failed Java programmers.

If publishing triggers a subscriber to trigger another publish, that's either programming error or intended behavior in my book. A strict reading of the spec you've provided suggests we must notify all subscribers all publishes, not merely the newest. This eliminates a possible optimization should this trigger not occur every time.

My approach, without reading the paper, would be to make a copy the subscriber list on a publish. Synchronizing add and copy should be as simple as synchronizing on the subscriber's data structure. If this is prone to deadlock, I'm afraid you shouldn't be using Java.

--
I Browse at +4 Flamebait

Open Source Sysadmin

what about I/O and mosix style clustering? by soldack · 2007-02-10 17:02 · Score: 3, Interesting

I used to work for SilverStorm (recently purchased by QLogic). They make InfiniBand switches and software for use in high performance computing and enterprise database clustering. The quality of the I/O subsystem of a cluster played a large part in determining the performance of a cluster. Latency (down the microsecond) and bandwidth (over 10 gigabits per second) both mattered.

Also, we found that sometimes, what made a deal go through was how well your proposed system could run some prexisting software. For example, vendors would publish how well they could run a standard crash test simulation.

Also, I would like to see more research put into making clustered operating systems like mosix good enough so that developers can stick to what they have learned on traditional SMP systems and have their code just work on large clusters. I don't think that multicore processors eliminate the need for better cluster software.

--
-- soldack

Re:Shared-state concurrency is harder than you thi by zestyping · 2007-02-10 18:00 · Score: 1

If publishing triggers a subscriber to trigger another publish, that's either programming error or intended behavior in my book. A strict reading of the spec you've provided suggests we must notify all subscribers all publishes, not merely the newest. No problem with that. For correctness we just need to make sure that the last update received by each subscriber contains the latest published value.

My approach, without reading the paper, would be to make a copy the subscriber list on a publish. Indeed, this is exactly the solution considered in the discussion (section 4.2 of the paper). But if you have a notification loop like this:
for (Subscriber subscriber: subscribers) { subscriber.notify(value); }
and it isn't synchronized, the notifications can arrive out of order.

I'm not saying you can't solve this problem. The point is, it's much trickier than almost all programmers are taught to realize. (This problem is easy to solve if you use event queues instead of multiple threads.)

Summary is misleading by mrshoe · 2007-02-10 18:00 · Score: 1

the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized'

I saw Patterson give this talk at PARC. While he did say something similar to the above quote, he was NOT suggesting that fewer, more powerful processors is the future (which is the impression I got from the summary). In fact, the entire talk was to the opposite point. Parallelism is the future; the processor is the new transistor; processors will have thousands of cores, etc. Very fascinating stuff. I hope more researchers participate and use his emulation system, because I think he's correct (as he has been in the past) that we will need to adopt parallelism more in the future.

--
There are two types of people in this world: those that categorize other people and those that don't.

Re:Summary is misleading by mshurpik · 2007-02-10 18:59 · Score: 1

Well yeah, but can you parallel to 128 processors as well as you can parallel to 4 processors? I doubt it. Parallelism is an art, and if the future of computing is based on art, then we've left Moore's Law at the station.
Re:Summary is misleading by Anonymous Coward · 2007-02-10 20:19 · Score: 0

I saw Dave Patterson give this talk at Stanford. There was an interesting exchange with an audience member who used to work on the Transputer: the audience member pointed out that many of these predictions had been made before more than a decade ago, but they never happened. Dave basically said that it was because Intel managed to squeeze more sequential performance out of their chips, and that killed parallel chip projects like Transputer. Nobody really wants parallelism. It's hard. Chip makers are going for it now simply because they have no choice; nobody has any idea whether we'll be able to develop the software techniques needed to exploit massively parallel processors effectively.

Another audience member asked what would happen if someone figured out a way to squeeze more sequential performance out of chips. Dave smiled and said he would be very interested in talking to that person.

Parallizing Languages by Samarian+Hillbilly · 2007-02-10 19:27 · Score: 1

I hope you enjoy you're wait, it's likely to be a long one. "Automatic Parallelization" in any but the most trivial circumstances will go the way of AI, that is, become a marketing/academic buzzword. Here's the approach that I think WILL work. 2 languages, one a purely sequential language designed from the ground up to be "parallizable". Notice, I don't say "parallel". And a second meta-language that a parallelization expert can use to parallelize the sequential code FOR A SPECIFIC HARDWARE PLATFORM. We aren't going to raise a generation of "parallel programmers" the way we did with "object oriented" programmers. Object oriented programming was designed to make programming easier. PP is something NOBODY WOULD EVER DO if processors were fast enough.

Re:Parallizing Languages by Anonymous Coward · 2007-02-10 20:03 · Score: 0

And that, my friend, is the mistake we're making.
Re:Parallizing Languages by try_anything · 2007-02-12 09:14 · Score: 1

Nobody has tried to make parallel programming accessible except the people who are already good at it, and the people who are already good at it are trying to help everyone else follow the path they took, which is to understand it mathematically. Unfortunately, the vast majority of programmers are not mathematically inclined at all. (Anal retentive + antisocial != mathematical.)

Imperative programming gives people a native, intuitive way of solving problems. Everyone knows how to give directions for performing a task, and it's easy to imagine the computer following your directions. With a little bit of effort on everyone's part, I think a similar concrete metaphor could be created that would allow average programmers to specify a significant amount of parallelization in their programs. For instance, managing a group of workers in a bakery. The work to be done can be thought of as a collection of tasks (single-threaded imperative programs). Each task may depend on others tasks. (E.g., you can't mold the cookies until you've mixed the cookie dough.) The bakery model even gives you a concrete illustration of resource sharing, though I think it would be unwise to allow programmers to manage resources -- the framework should handle resources and make deadlock impossible.

A mathematically inclined person may see these kinds of metaphor-based frameworks as conceptual straightjackets, and they're right. A framework boxes you into one way of specifying parallelism, which may be inefficient or unnecessarily complicated for many problems. I think it's a step forward, though. People who are capable of thinking more abstractly won't be mentally crippled by learning one limited form of parallelism -- they will still be able to learn more sophisticated approaches -- and people who aren't that capable will at least be able to take part and derive a limited benefit.
Re:Parallizing Languages by Samarian+Hillbilly · 2007-02-12 18:34 · Score: 1

There has been work going on in this direction in the "mathematically" oriented community. Usually called "skeletons". The trouble is, none of this will work with current imperative languages and synchronization systems such as pthreads. To put it bluntly, "mutexes don't compose", which means you cannot write general libraries with standard synchronization primitives. You at the very least need a language that uses higher-level functions and forbids access to globals, use of pointers ect. It's not as easy as just "giving people an appropriate metaphor". The human mind works in parallel but thinks in serial. I so no reason why the "programmer-of-the-future" should need to deal with these issues when they can be solved by the compiler/meta-compiler. But, I'm a lone voice shouting in the wilderness, the rest of the world is busy supplying new "parallel" languages with which to build the next generations rat nest of code!

repeat after me by Anonymous Coward · 2007-02-10 19:49 · Score: 0

clusters are not parallel computers.
clusters are NOT parallel computers.

Re:repeat after me by soldack · 2007-02-11 12:57 · Score: 1

I never said that "clusters are parallel computers." I do believe that they are related enough to make a reply about clusters relevant to parallel computers. So did the authors of the referenced article:
"We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing." Many HPC systems are computing clusters using API/libraries like MPI to allow code written for single box/many processor parallel systems to also run on clusters. The TOP500 shows this but even more importantly anyone who works in the HPC market sees a huge demand for clusters for domains once dominated by parallel systems. There are clusters with thousands of processors in use today.

Clusters could be made to be even more like parallel computers. If we have faster I/O (both in terms of bandwidth and latency) and we have an single image operating system running over all the computers, then we approach an architecture similar to a large scale SMP system or even a parallel computer.

From http://en.wikipedia.org/wiki/Parallel_computer:
"Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination."

From http://en.wikipedia.org/wiki/Computer_cluster:
"A computer cluster is a group of loosely coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability."

The idea is to improve the I/O and operating systems of clusters to approach a parallel computer.

I wouldn't get so hung up on word choice. In the end you have CPU(s), memory subsystems, I/O, an operating system, and some user software. As I said, many tasks once reserved for parallel computers have been written to run quite well on clusters.

Oh...and when telling someone that something is NOT something else, it helps to back it up with some information. :-)

--
-- soldack

Re:Shared-state concurrency is harder than you thi by xenocide2 · 2007-02-10 20:03 · Score: 1

I suppose it depends on the teaching, but we run our students through synchronization routines in our Operating Systems course. They should know when they leave that multithreading requires protection. Of course, there's not much for the ones that leave school before this course (the course is a bear and the ones likely to avoid it are also likely to drop, but not before adding "Java" to their resume). Not all of them could propose a good fix, but they'd at least know that they should be looking for locks or something and encounter the "synchronized" keyword. Seems the biggest problem regarding java multithreading is efficiency and broken ways to avoid the synchronized keyword.

I'm not an expert on the java semantics of notify(), but after reading the javadoc and spec, if you intend to preserve the order of notification, it seems your point was to prove Java inadequate for the job: notify() has no parameters and merely signals waiting threads. It does not enforce any queuing on multiple notifies. If you were ready to run before the notify, you're still ready afterwards, but not necessarily running. Solving the problem becomes a lot less parallel, as publishes will have to copy, then block on pending publishes, AND for all subscribing threads to block again. If you signal multiple times, assuming the subscribers were waiting on themselves, using notify actions for publish potentially races with the waking thread's event handler due to the thread scheduler.

In fact, we're slowly building message queues as that's whats needed to satisfy the spec. We need FIFO processing of values and java's primitives fail the FIFO test. Fortunately, the paper decides to abandon java at this point for a pet language and introduces typical parallel language features like networking and the rest of distributed programming. Hopefully though students are scared enough of multithreading to look into how a given language's synchronization works.

--
I Browse at +4 Flamebait

Open Source Sysadmin

Where's my quantum computer? by Anonymous Coward · 2007-02-10 20:26 · Score: 0

I need it to run an AI I designed as a child. Every word you say to it adds a ... what is that word? The opposite of their problem. to it.

640k, anyone? by try_anything · 2007-02-10 20:27 · Score: 2, Insightful

Something like a server could relatively easily make use of almost any number of processors; one per client maybe.

What happens when a 1024-core server is too slow to handle 700 concurrent connections, and the only upgrade option is a 2048-core server? Then it matters whether each of those 700 requests is a parallelizable problem. Imagine a server that solves difficult computations like routing delivery traffic, designing tailored clothes from customer snapshots, monitoring security camera feeds at a casino, or analyzing a twenty-second voice recording to decide where to route a call. ("To help us best serve you, briefly state why you are calling.") Saying that one core per client will always be sufficient, even when cores stop getting faster, is tantamount to saying that nobody will ever figure out how to use all that power -- historically, a poor prediction.

Re:OpenMP by Samarian+Hillbilly · 2007-02-10 20:46 · Score: 1

For all the buzzz. OpenMP is a nice hack to add branch and join parallelism to loops in existing sequential code. An application needing higher granularity (task level) parallelism, streaming parallelism, or control over parallel processes, won't have much to do with it. Keep looking, the answers aren't there yet, nor in Fortress.

Try Antiobjects by Anonymous Coward · 2007-02-10 21:01 · Score: 0

A big part of the problem is that established programming models, e.g., the use of UML to drive OO flavored designs, traps people into approaches unsuited for parallel execution. When is the last time you have tried to explain a massively parallel problem to somebody using UML? Antiobjects http://en.wikipedia.org/wiki/Antiobjects is a pattern mapping computation from so-called foreground objects to background objects. These Antiobjects can be mapped efficiently onto architectures with very large numbers of cores/cpus. They have been used in applications such as simulations as well as games and have been explored initially on Connection Machines (64,000 CPUS).

Re:Shared-state concurrency is harder than you thi by Anonymous Coward · 2007-02-10 21:48 · Score: 0

In fact, we're slowly building message queues as that's whats needed to satisfy the spec. We need FIFO processing of values and java's primitives fail the FIFO test.

Have you seen BlockingQueue? The language-level primitives aren't your only choice; since Java 1.4, the standard library has included some fairly nice concurrent classes built on top of them.

FP no more parallelizable than imperative by master_p · 2007-02-10 23:01 · Score: 1

It's a myth that functional programs are more parallelizable than imperative ones.

First of all, most functional programs do heavy use of monads (especially the IO monad) and hence they need synchronization primitives around those monads just like imperative programs.

Secondly, those parts that do not use monads still need to be executed in certain order, so there needs to be a scheduler which synchronizes the multiple cores (waiting for results and dispatching computations), which again means using semaphores and mutexes.

Thirdly, functional languages do heavy use of the garbage collected allocator, because of using list cells and thunks a lot. Again, this means that the allocator needs to be synchronized.

Re:FP no more parallelizable than imperative by Pinky · 2007-02-11 08:38 · Score: 1

> It's a myth that functional programs are more parallelizable than imperative ones.

Never heard that one.

I think what you're getting at is:
It's easier to write correct, parallelized code in a functional language.

It's easier to write correct, parallelized code in functional languages because many threading errors come from race conditions when changing the state of some object of datastructure. Functional programming languages avoid having mutable states or datastructure. Therefore witting a race condition is less likely.. therefore you're more likely to write correct, parallelized code.

As I've mentioned in a previous post, one of the first things you realize when learning to write threaded code is immutable objects are your friend.
Re:FP no more parallelizable than imperative by Anonymous Coward · 2007-02-11 20:55 · Score: 0

There is nothing magical about monads -- they are just side-effecting code, and in FP contexts can usually be considered in the abstract as a foreign function call. That is, a monad simply hides the use of explicitly side-effecting language features within a function. Although in the context of this discussion, monads are primarily of research interest into non-strict languages like Haskell, they have a long history in other mainly-functional languages that benefit from referential transparency, such as the Stalin Scheme dialect.

Another way of looking at this is that monad abstractions are used to preserve referential transparency elsewhere in the function trees calling the monads. Referential transparency allows for ready analysis, rearrangement and substitution of code during optimization, superfluous guard/check operation elimination, memoization, and so forth. In this context, monads simply guard against mis-optimization (memoizing input, for example).

Functional programming languages generally allow suspensions of objects (especially recursive functions), and these are the usual synchronization objects used in concurrent programming in functional languages. Lazy languages like Haskell offer a much larger variety of such data types (Haskell allows any data structure to be suspended). Accesses to suspendable data structures which require a particular resource can cause a suspension that allows other code to run if the resource is unavailable.

Since functional programming generally aims for references to be arranged in directed acyclic graphs, suspensions do not generally pose synchronization difficulties along the lines you suggest in your sentence beginning with "Secondly". Each rootward node in a DAG requires resolution of its references, and scheduling is done by arranging its leafward references at compile time rather than at runtime within the context of the node itself. That is, if current_node requires A before B, then B is a child of A in the reference DAG. If the order of availability of A and B is unimportant to current_node, then both are direct children of current_node in the reference DAG. In the first case, a suspension of B blocks A and current_node; in the second case, a suspension of B allows for the computation of A to continue.

Where possible, all branches of current_node are evaluated concurrently -- as required, in lazy languages and in eager ones, leafward suspensions provide an opportunity for the traversal of unexplored branches not only of the immediate ancestor, but of any element in the reference DAG upstream of any suspension.

The scheduler simply maintains a list of suspended objects and attempts to unsuspend them in some order. Those that are still waiting on as yet unavailable resources remain on the list.

Again, this means that the allocator needs to be synchronized
Synchronization is needed for the case where the GC and mutator can run concurrently. The problem is that liveness identification becomes harder because the mutator changes the object graph, potentially causing the GC to miss live objects. In particular, we have to avoid an object that is marked live, and whose children are all marked live, being mutated to point to a not-yet-marked object.

We can prevent these losses by tracking reads from objects which are not yet marked (read barrier), or we can track writes to objects which are fully marked (write barrier).

Modern GCs in environments which support exact object identification typically use a generational GC with a card table write barrier. The card table approach is computationally lightweight (it can be done in 2 or 3 native instructions on x86 and PPC) and works well in a multi-threaded environment. This allows for very fast allocation, and fast incremental collection even in the presence of object mutation.

New objects require no synchronization at all; only writes to already marked objects must be synchronized with the collector thread, an

Here is what we are missing: by master_p · 2007-02-10 23:10 · Score: 1

There are two things that we are missing:

1) the brain does not share data. Data in the brain are copied around in the form of signals.

2) the brain is not a generalized turing machine. It can only do pattern matching on images/sounds/smells/touch. Don't be fooled of our ability to do arithmetic of formulate theorems: these are the result of pattern matching as well.

Pattern matching is highly parallelizable, of course: every piece of the pattern is fed to its own 'CPU', which calculates the degree of the match.

Re:Here is what we are missing: by ardor · 2007-02-10 23:58 · Score: 1

But since we are able to do everything a turing machine can do as well, this means the pattern matching is just another form of a turing machine, right?

--
This sig does not contain any SCO code.
Re:Here is what we are missing: by master_p · 2007-02-11 01:45 · Score: 1

The brain is not a Turing complete machine, because the brain always terminates its calculations.

In other words, you can not do what a Turing machine can do.
Re:Here is what we are missing: by ardor · 2007-02-11 02:16 · Score: 1

But *why* does it terminate every calculation? In which situations? Is it actually proven that the brain always terminates its calculations?

--
This sig does not contain any SCO code.
Re:Here is what we are missing: by master_p · 2007-02-11 03:46 · Score: 1

Yeap.

Have you see a human entering an infinite loop? I haven't.

Actually the brain does not do calculations, it simply does pattern matching. Pattern matching is not a calculation, it is statistics.
Re:Here is what we are missing: by master_p · 2007-02-12 22:07 · Score: 1

Yeap. There is not a single documented case of a human being or animal to simply freeze in place, i.e. enter an infinite loop.

All the brain does is a search which recalls a good or bad response.
Re:Here is what we are missing: by bshanks · 2007-02-19 00:13 · Score: 1

I agree with ardor here.

You may not see a person with a brain enter an infinite loop. Yet, if you explain to a person what a Turing machine is, and then ask them to compute the state of a certain Turing machine when given a certain input after a certain number of steps, then (given enough time*) they will be able to do so.

Therefore, people can emulate Turing machines. Therefore, people actually can perform computation.

The fact that humans don't actually enter loops in everyday life simply means that the program they are running** doesn't have an infinite loop in it (or, it means that the conditions to go into the loop are never met in everyday life; consider if you could put someone in a sensory isolation chamber and keep them alive for five billion years; are you sure they won't loop?).

*: Technically, finite time and memory means that humans can't be more than finite automata. But I don't think that's really what we're asking; I think our goal in this discussion is to consider human minds with these limitations abstracted away (I think the thing to do formally is to consider humans as representatives of a "uniform family" of finite automatons, in order to say that humans are not quite Turing machines, but that there exists an abstract notion of a human-like mind unbound by lifespan or memory which is, in fact, at least as powerful as a universal Turing machine).

**: although maybe humans are something strictly MORE powerful than Turing machines, in which case it may not be sensible to talk about which "program" they are running

There is a way to automate shared-state con/cy!!! by master_p · 2007-02-10 23:45 · Score: 3, Interesting

There is a way to automate shared state concurrency! every object should be its own thread. Computations that refer to the same object must be executed by the object's thread.

Here is how it works:

A computation does not return a result, but a tuple of {key, continuation}. The key is used to locate the thread to pass the continuation to. The computation is stored in the thread's queue and the thread is woken up.

The tuple {key, continuation} pair can be an 64-bit value (on 32-bit machines) that consists of a pointer to a memory location (the key) and a pointer to code (the continuation).

The insertion to the thread's queue can be done using lock-free data structures.

Threads can be user-level so there need not be a switch to kernel space.

This design can allow for linear scaling of performance: the more cores you put in, the more performance you get (for algorithms that are not linear, that is). Linear algorithms would execute a little slower than usual, but the trade off is acceptable: for many applications that allow for parallelization due to having lots of (relatively) independent objects, the performance boost be tremendous.

There are many domains of applications that would benefit from such an approach:

-web servers/application servers that must serve thousands of clients simultaneously.
-video games with thousands of objects.
-simulations that have many independent agents that can run in parallel.
-GUI apps that use the observer pattern and each observable has many observers than can be notified in parallel.

Note: The above ideas are taken from libasync-mp and lock-free data structure programming.

old CW: Re:nothing new here... by Anonymous Coward · 2007-02-10 23:50 · Score: 0

old CW: Black is black and white is white
new CW: White is black.

Seriously. Who uses such lame rhetorical techniques to convince people that all software should be rewritten?

The thing is, MOST algorithms cannot be PARALLELIZED or it is not worth it. This is a red herring to convince the Northern Venture Allicance Trolls to give more money to the new $ugly-name project which will promise parallelization. Yeah, milk the fat cow.

Re:old CW: Re:nothing new here... by try_anything · 2007-02-12 10:38 · Score: 1

You're right in assuming the purpose of this paper is to support a project -- it's RAMP, and it's actually pretty cool. Rather than trying to solve all architectural issues up front and putting all the eggs into one architectural basket, RAMP aims to use reconfigurable hardware to prototype many different parallel architectures to see which ones can be efficiently programmed. This is a much more software-friendly approach than has been tried in the past.
The thing is, MOST algorithms cannot be PARALLELIZED or it is not worth it.

That's the theoretical problem. The practical problem is that a large amount of low-hanging parallelization is thrown away because current languages and architectures force (or strongly encourage) programmers to unnecessary sequentialize their programs, even when they know that their code contains potentially parallel constructs. Plus, the frameworks for specifying parallelization are nonstandard, nonportable, designed with specific hardware architectures in mind, and can't be composed with each other.

Re:Shared-state concurrency is harder than you thi by moonbender · 2007-02-11 01:56 · Score: 2, Interesting

Just as a sidenote, the BlockingQueue you link to is a Java 5 feature, and indeed Java 5 has added a LOT of API classes that are a great help when dealing with threading. Not sure if you meant to refer to 1.5, since I'm sure 1.4 has also added some utility classes, but it's been touted as one of the major features of 5 aka 1.5.

--
Switch back to Slashdot's D1 system.

missed the hint? by toby · 2007-02-11 03:19 · Score: 1

until that magical threading language (maybe c++1x) comes along

Studied Erlang much?

--
you had me at #!

Razor blade star NUMA NUMA yay by tepples · 2007-02-11 04:16 · Score: 1

The real application domain of multi-threading and data-level parallelism is stuff that one would never run on a Windows client New PCs have two cores and two threads per core for a total of four threads executing simultaneously. So why do over 90 percent of them run Windows? Is the killer app the ability to run apps on one processor and the GUI, antivirus, firewall on the other three?

I would also add that to program on many of the new multi-core architectures, a programmer will want to be able to specify what runs on which core. That's called NUMA-NUMA, right?

Take for example the cell processor, where each SPE has only a 256KB local store with limited branch prediction. A miss in the store or a mispredicted branch would be catastrophic for performance. Instead, the programmer writes small specialized programs with that have minimal branching. So how would an architecture with a large branch penalty efficiently run a spell checker, game opponent, or other things that inherently use branchy algorithms such as binary searches or tree searches?

Looking for ways to maximize cache locality/residence would be much more fruitful in terms of turn around time, though again this tends to be something which must be done explicitly and cannot be abstracted behind a programming language. So should this be done at the level of hardware? Or should people go back to writing libraries in finely tuned assembly language?

That report is negative on supercomputing. by Animats · 2007-02-11 07:52 · Score: 1

That report on supercomputing doesn't indicate growth:

"Clusters have proven themselves as capable servers to handle a sizable portion of the HPC workload."

"More than half of the respondents expect their budgets for all HPC tools will decline (43%) or remain the same (17%) over the next two years."

"U.S. industrial users/buyers really want and need faster computers that fit their budgets and that don't require specialized programming skills."

Also, that study doesn't reflect "most companies". It reflects 30 companies that use supercomputers today, most of which are very large aerospace or automotive companies. They're not buying the tightly-coupled high end supercomputers, either; most of the new systems are clusters of conventional machines.

Re:Shared-state concurrency is harder than you thi by Anonymous Coward · 2007-02-11 07:52 · Score: 0

Implement the Observer (aka Listener) pattern (specifically the thing called "Subject" on the Wikipedia page). Your object should provide two methods, publish and subscribe. Clients can call subscribe to indicate their interest in being notified. When a client calls publish with a value, your object should pass on that value by calling the notify method on everyone who has previously subscribed for updates.

Sounds simple, right? But wait:

1) What if one of your subscribers throws an exception? That should not prevent other subscribers from being notified.
2) What if notifying a subscriber triggers another value to be published? All the subscribers must be kept up to date on the latest published value.
3) What if notifying a subscriber triggers another subscription? Whether or not the newly added subscriber receives this in-progress notification is up to you, but it must be well defined and predictable.
4) Oh, and by the way, don't deadlock.

Can you achieve all these things in a multithreaded programming model (e.g. Java)? Try it.

This is easy.
1) you should be catching any possible exceptions anyways, unless throwing an exception is part of your spec and dying is the correct behaviour.
2) the publish will be done by the notifying thread, so there is no deadlock. You have to avoid cycles on your own, but that is a design issue (if you spec a cycle, you will get a cycle: surprise). "must be kept up to date" is ill-defined. What ordering of updates to you want? Do you want every update in order? Then don't allow changes while a publish is in progress, or have a queue of outstanding publishes. Do you only want the "newest" value? Use a callback. Or you could make all of your notifications asynchronous (and use a condition to prevent non-atomic state changes). The notices will be in a new thread every time, so there will be no cycles and no deadlocks.
3) Non-issue. The subscribe enters the system through whatever mutex you are using to protect the subscriber list.
4) there is no reason to deadlock here.

functional approaches by Fzz · 2007-02-11 09:02 · Score: 1

Maybe we'll finally see functional languages come to the forefront. They make it a whole lot easier for the compiler to extract parallelism automatically. For example,see Parallel Haskell. I think this is the closest I've seen to a "magical threading language".

Re:There is a way to automate shared-state con/cy! by zestyping · 2007-02-11 10:07 · Score: 1

This sounds pretty much like event queueing, similar to the solution proposed by the paper i linked to.

Notice that this is no longer "shared-state" concurrency. If every object is running in its own thread, no threads share state. I think you've helped prove my point (shared-state concurrency is hard and ought to be avoided).

Re:Shared-state concurrency is harder than you thi by zestyping · 2007-02-11 10:15 · Score: 1

I think you may have misunderstood the problem. You don't get to specify how your clients behave; the challenge here is just to implement the publish/subscribe component (the "Subject").

"must be kept up to date" is ill-defined. What ordering of updates to you want? A properly implemented Subject should guarantee that the last notification sent to each subscriber contains the last value that was published.

Then don't allow changes while a publish is in progress, or have a queue of outstanding publishes. You can't lock out changes while a publish is in progress, because then any client trying to publish would cause a deadlock, and you don't get to specify what the clients do.

Using a queue of outstanding publishes is heading in the right direction: toward event-queue concurrency instead of shared-state concurrency (which is the point i was originally trying to make).

Vectorisation needs to get more creative by Verte · 2007-02-11 11:40 · Score: 0

There are a lot of applications where taking advantage of a large number of threads is a fantastic way to increase performance. However, there is a lot of lower-level stuff that massively multicore CPUs should look into. Vector and matrix operations, FFTs, and single instruction multiple data [especially SIMD multiple compare, where many possibilities can be checked at once, eg for regexes, which is similar to the method d-wave uses for solving NP-complete problems with quantum computing] are the sort of instructions we are talking about- really using all those processors together. When the big CPU people focus on the sort of algorithms that crop up in scientific computing and database access, we should see the real power of top-down multicore design.

--
We at slashdot are scientists, specialists and kernel hackers. Your FUD will be found out.

Use a Database by Tablizer · 2007-02-11 12:19 · Score: 2, Insightful

Databases already allow a kind of parellel processing. A.C.I.D.-based techniques allow multiple users (processors) to send results to the same database in order to communicate results between each user/client. Each "client" may be single threaded, but together a client/server system is essentially a multi-threaded application, all without odd code or odd programming languages.

--
Table-ized A.I.

What about lock free and wait free??? by aybiss · 2007-02-11 13:31 · Score: 0

Could have been perusing too fast, but this guy fails to mention that one of the biggest obstacles to writing super fast multi-threaded code is the OS mutexes. Everything else he has said is far from surprising news but this *is* only slashdot after all...

We need to do away with our safe but expensive locks and critical sections, which work great in a system where you only have one core anyway so locking just lets someone else get something done, but which cause way too many cross-core stalls in a real-time system.

Something that would help a hell of a lot would be to have implementations of lock free FIFOs being created, supported and tweaked by OS developers.

A guy can dream...

--
It's OK Bender, there's no such thing as 2.

already been done (20 years ago) by shizzle · 2007-02-11 20:38 · Score: 2, Informative

Starting to feel like a curmudgeon with posts like this, but it's already been done a long time ago. Hillis and Steele presented an algorithm for parsing in parallel in logarithmic time on the Connection Machine. The article is called "Data Parallel Algorithms", Communications of the ACM, December 1986.

Looks like if you don't have ACM digital library access and don't feel like trudging to a library you can find a copy here: http://cva.stanford.edu/classes/cs99s/papers/hilli s-steele-data-parallel-algorithms.pdf

For a group of characters (substring) in the middle of the file, you can locally build a table that maps whatever the incoming parser state is (at the beginning of the substring) to what the corresponding outgoing state would be at then end, and then that table lets you process the whole substring in unit time. I like to think of it as the parsing equivalent of a carry-lookahead adder. Probably best to read the article if you're curious.

Re:already been done (20 years ago) by Mr+Z · 2007-02-12 01:32 · Score: 1

Thanks for the paper reference! I appreciate it.

Hmmm.... I had thought about precisely the sort of approach the CM guys describe, and had rejected it. Sure, you can use the formulation they give, but due to the non-linear nature of the state-space, you end up wasting most of your computation. They indicate log n time for parsing, but the computational cost (CPU cycles used) is going to be more like O(n * m), where m is the average number of possible input states for each character. The latency got shorter, sure, but the computational workload got tremendously larger. You may be much better off with a mostly serial parser feeding a parallel compute engine the semantic meaning of the program.

I've personally applied and have seen applied a very restricted form of this approach on machines with large amounts of instruction-level parallelism. Basically, start processing the next symbol before the current symbol is fully resolved. You may have to speculate on some aspects of the next symbol, but if you have enough of the subsequent context worked out, your speculation won't get out of hand. This is useful for reducing latency and using up otherwise idle resources. I've applied it to several VLD-style algorithms. But, as the potential incoming state space explodes, the resource requirements explode as well, so there's a natural limit to how much of this you can really do.
--Joe

--
Program Intellivision!
Re:already been done (20 years ago) by shizzle · 2007-02-12 02:32 · Score: 1

Yes, it's not necessarily practical; as the article says understatedly, "Naturally, this algorithm performs much more computation per character than the straightforward serial algorithm".

It seems Hillis & Steele's original purpose was akin to the tone of this thread: just defending their highly parallel machine in the face of naysayers who had criticisms like "Well, what about parsing? Let's see you do *that* in parallel! Huh? Huh?".

Of course, in their case the naysayers seem to have been right :-). I'll leave the comparisons with the present situation to others (for now).
Re:already been done (20 years ago) by Mr+Z · 2007-02-12 03:21 · Score: 1

I actually take a keen interest in how to parallelize this type of stuff. I work as a CPU architect these days, and some of the algorithms we have to cope with (on finite, tiny hardware with budgets measured in milliWatts) are horribly serial. Meanwhile, our primary pathway to greater performance is to go parallel in one way or another. Parsing of various sorts is a recurring dragon to slay. In the embedded media space, the parsing problem usually takes on the form of Huffman decode, or lately arithmetic decode. (CABAC in H.264 is murder. Look it up some time if you care to be suitably horrified.) That is, we're not typically running lex and yacc, but rather decoding a video bitstream. In many cases, about the only parallelism we seem to get out those problems is through pipelining across multiple processors: Use a dedicated coprocessor to pull the bitstream apart and let the main CPU work the rest of the problem. Not very satisfying to a performance junkie, but hey, what can you do?

I agree with you on the nature of the paper. For most processing tasks, they just assumed infinite CPUs, so that it didn't matter at all how wide their algorithm got, it only mattered how shallow it got. In the end, the number of processors is finite, and that wide algorithm gets sequenced. It's a bit like saying you can implement any logic function of any number of variables in only two levels of logic. Sure, that's true if you have NAND gates with arbitrarily large numbers of inputs, arbitrarily large drive strength and wires that are infinitely fast. I think it would take significant amounts of unobtainium to build it though. :-)
--Joe

--
Program Intellivision!

programming model by cyclomedia · 2007-02-11 23:36 · Score: 1

here's a thought, suppose you have a game with threads for gameplay, rendering, ai and physics we could seperate game time into internal frames and the threads out like this:

class cHuman extends cBiped
{
render
<
addDecal( decal_t decal , vec_t origin );
>

gameplay
<
traceAttack( vec_t origin , vec_t dir );
>

physics
<
impulseForce( vec_t origin , vec_t dir , float force );
>
}

this human may recieve a bullet to the chest, that's simple enough but it needs to invoke a force to knock the player back (this<physics>.impulseForce) as well as add a decal to the player skin to show the bullet impact on their bullet proof vest. It would probably also emit a sound and spawn a spark or debris model. So what seems at first to be a simple seperation of elements becomes more complex. This is where the internal frames come in - each thread would need an event queue, a list of commands to follow during each frame, and you would then add events to the tail of each thread's queues. At first glance this would probably look like it could introduce a new kind of infinite loop - whereby each thread gets more and more events tacked onto it's queue, but by seperating the execution into game-time-limited this can be overcome - so long as projectiles and explosions all have a finite velocity, and the AI characters have a realistic reflex time attatched then very little can happen infinitely quickly.

However, do we still have to limit ourselves by forcing all threads to have completed the current frame before starting the next? or can we allow a frame space, a range of say 50 frames over which the threads are allowed to run amok? - the renderer may be fixed to 50 or 60 Hz for example but the game engine may be able to run at 500 internal frames a second in order to get the physics working reasonably. Presumably the renderer can just take a snapshot of the gamestate and concentrate on rendering that.

This is where the compiler comes in, it should delegate the temporal snapshot ability of the renderer so that what the renderer is rendering does not change whilst it is rendering it, but the AI and physics threads can chat to each other about the current states of play. The programmer should be free to program these in parallel without having to worry about it too much.

--
If you don't risk failure you don't risk success.

Re:There is a way to automate shared-state con/cy! by demallien2 · 2007-02-12 01:16 · Score: 1

I see two problems with this solution:

Interthread communications - the scheduler is going to become the bottleneck. Maybe a processor architecture with some serious scheduler support may be able to mitigate this?

Internally, most objects are going to need to be implemented as state machines. A formally correct state machine is no easy thing to create, particularly in most popular programming languages of today (C, C++, Java)

Still, these two problems are solvable with new CPU architectures and better compilers...

Re:There is a way to automate shared-state con/cy! by master_p · 2007-02-12 01:45 · Score: 1

Indeed, but the overall result is similar, and you get a solution for multicore architectures for free.

Re:Shared-state concurrency is harder than you thi by zestyping · 2007-02-12 09:53 · Score: 1

The argument in favour of message queues instead of shared state isn't just about Java; Java is just an example. I am fairly sure that the Observer problem (and other simple coordination problems) are hard to solve reliably on any platform based on the conventional multithreading paradigm as taught in most computer science departments (threads, mutexes, critical sections, etc.).

Re:Shared-state concurrency is harder than you thi by Anonymous Coward · 2007-02-13 05:30 · Score: 0

It is true that java 5 added the java.util.concurrent package. However, if you are using 1.4 (or really 1.2+) you can use this package:

http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/ dl/util/concurrent/intro.html

The author Doug Lea is the same guy who wrote the 1.5 java.util.concurrent package. He also wrote the book "Concurrent Programming in Java 2nd ed." which is a must read IMO if you do anything multithreaded in Java.

Whole focus may be wrong. by deMolay13 · 2007-02-14 09:39 · Score: 1

So, we have hundreds or thousands of processors. The inclination of people is to try to take one app and try to make use of all that processing power. Maybe our whole focus is wrong. Maybe we should be thinking about actively running 100 or 1000 programs simultaneously where each program is a classic single-threaded program. You wouldn't think about doing this today because it would grind your machine to a halt but in the coming world of many processing elements this is a viable model. It may not be very power efficient but perhaps new apps will emerge that can do something useful for us in the background that don't require constant attention or immediate results.

197 comments