Auto-threading Compiler Could Restore Moore's Law Gains

Not this shit again by Anonymous Coward · 2012-12-03 12:11 · Score: 4, Informative

Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

Re:Not this shit again by xetovss · 2012-12-03 12:31 · Score: 4, Interesting

Exactly, Moore's Law isn't a law, it is a marketing plan. I don't see why so many people get so serious about it. A real law (of science) would be something like the law of gravity where it has a near universal application, whereas Moore's Law is a "law" that describes Intel's marketing plan.
Re:Not this shit again by Baloroth · 2012-12-03 12:54 · Score: 4, Informative

Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.
Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).
So, if you can scale up the ability to use processor cores, then Moore's law can continue to hold for CPU's as we increase processor cores. If you can't, then it won't. It'll have to stop sometime, of course, just not necessarily today.

--
"None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
Re:Not this shit again by Jonner · 2012-12-03 12:56 · Score: 4, Insightful

Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.
Try reading the headline again. Usually, clueless posters have to be reminded to read TFA, but this is ridiculous. As Moore's "law" continues to be mostly true, the added transistors are being used for extra more cores rather than to make one core faster. Most of the cores sit idle most of the time because few programs can use more than one of them at once.
Re:Not this shit again by c0lo · 2012-12-03 13:22 · Score: 5, Funny

Apparently the real geeks left Slashdot ages ago.
Casted to void?

--
Questions raise, answers kill. Raise questions to stay alive.
Re:Not this shit again by Spiridios · 2012-12-03 13:26 · Score: 5, Insightful

Maybe you need to read?
Here's the headline (emphasis mine): Auto-threading Compiler Could Restore Moore's Law Gains
In the past, adding more transistors to a core meant the core worked faster (simplification), so Moore's law indirectly lead to better performance. Now a core is close to as fast/efficient as its going to get, so we throw the extra transistors at new cores (still Moore's law). The problem is, there's only so many single-threaded processes a person can run before there will literally be one core per process. In order to see benefits from the extra transistors again, "gains" in the summary's terminology, then we need a way to take advantage of it (this new compiler technique, functional programming, programmers who can actually grok threads). In the end, if we're not seeing some kind of gain from Moore's law, then the chip manufacturers will stop adding new transistors because no one will pay money for a chip that's just as good as their current chip, and Moore's law will fail.
Re:Not this shit again by Anonymous Coward · 2012-12-03 14:57 · Score: 2, Informative

Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.
Yes and no. Moore's law states that the number of transistors will double every 2 years.
Why do people always fail to mention the part about complexity for minimum cost? It's an economy-of-scale observation. The cost part is essential to what Moore was saying.
"Number of components per integrated circuit for minimum cost" will double every 1-2 years.
Re:Not this shit again by ceoyoyo · 2012-12-03 15:58 · Score: 5, Interesting

Moore's law was coined by an engineer to describe a series of observations. That is, it's a mathematical function that seems to fit some data, without any explanatory power. Just like various other "laws" such as the laws of thermodynamics, and, your favourite, Newton's laws, including his law of universal gravitation.
Moore's law states that the number of components on an integrated circuit doubles approximately every two years.
Re:Not this shit again by shmageggy · 2012-12-03 16:18 · Score: 4, Insightful

Yeah but "Moore's Heuristic" just aint as snappy.
Re:Not this shit again by tlhIngan · 2012-12-03 18:13 · Score: 4, Informative

Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).

Except... the number of transistors in a CPU is irrelevant!
A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).
Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).
The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.
Smaller transistors do help CPUs consume a little less power, but double the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.
Re:Not this shit again by TheLink · 2012-12-03 18:25 · Score: 4, Interesting

The problem is that we are nearing the peak of what is possible with current technology in a single core
But aren't there still plenty of things that the hardware can do to make the software stuff easier?
Intel has actually started adding some stuff to help: http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions
So maybe Intel, AMD should interact more with the OS and compiler bunch and figure out how to use those billions of transistors in more useful ways.
There are things you can do in software to address the c10k problem (and similar problems): http://www.kegel.com/c10k.html
But I'm sure there are things you can do in hardware to make stuff even easier. It's not like the stuff that's being done is the best way of doing things. Furthermore the OS and apps people might be writing things in certain ways because certain things aren't done by the hardware or done fast.
I know that at least on the x86 platform checking the current time and also getting monotonic time is not as simple AND efficient as it could be. It was even worse before HPET, and now even HPET isn't that great. Monotonic system time (ticks) could be different from current human time, many programs need one or both.
Same goes for scheduling stuff, serializing stuff and getting stuff to trigger on arbitrary things all with minimal overhead. I suspect many things would be easier if you can create a way of having cluster wide consistent monotonic high-res time, and also fast low latency clusterwide locking.
On a related note many programming languages seem to like to push parameters onto a stack. That's fine but in many implementations they seem to push the data onto the code/call stack (which holds return addresses). This mixing of data and code stuff is unsafe. They should have separate data and code stacks. That way even if a hacker messes with the data, it's harder to execute arbitrary code- you'd just execute the same code but with mangled data, which is more likely to produce errors instead of arbitrary code execution.
If the reason for doing such insecure stuff is performance or convenience, then perhaps Intel etc can use those transistors to make it faster and easier to do things safer.
--
- Too many replies beneath your current threshold
Re:Not this shit again by chrysrobyn · 2012-12-04 02:20 · Score: 5, Interesting

Except... the number of transistors in a CPU is irrelevant!
No, it's very relevant.

A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).
How much wiring happens on doped silicon? None. The vast majority of the chip is covered in transistors, with 6-10 levels of wires on top of them. There are some designs where the I/O count demands so many pins that's what dictates the size of the chip -- so cache is filled in underneath. Heck, if your power budget allows it, you're already blowing the silicon area anyway, might as well increase your cache size! Consider your recent Core derived designs. Take away half the cache. Do you think the die area would go down? Not hardly.

Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).
You did the math right, but the cache line tag logic and coupled CAM are negligible. Sure, they may add a few million or so, but not anywhere near 5% of 100M.

The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.
I realize it's vogue for people to revisit Moore's Law and rewrite it every few years, but he was not speaking specifically about memory arrays. In fact, the chips Moore had access to at the time had very little memory on them.

Smaller transistors do help CPUs consume a little less power, but double the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.
Wiring never forces silicon area to be transistor-free, unless you're thinking of 1980 era chips. Not even late '80s had wiring on doped silicon. Certainly the kinds of chips Moore was talking about has had no significant wiring on doped silicon in 20 years, the exceptions being only when layout designers are getting lazy. I've done layout design, I've done circuit design, I've audited dozens of chip layouts and seen several technology manuals dating back to the 90s.
That random logic, by the way, is the subject of the most innovation in the field of chip layout and arguably in all of chip design. When your chip's entire goal is to funnel data through different units and do different things to it, you're dominated by buses. Automated tools often do split these buses up, but different algorithms can pull them together and make them more efficient. Caches are the smallest because they can be small. There's an entire periphery to them, including senseamps devoted to reading the baby FETs that can't make full rail to rail swings on the bitlines.
May I guess you're a student? Perhaps one who is learning from a professor who hasn't been in the industry since about 1985?
Re:Not this shit again by HaZardman27 · 2012-12-04 02:35 · Score: 3, Funny

In fact, the chips Moore had access to at the time had very little memory on them
Well of course! That was a lot of 18 monthses ago!

--
Apparently wizard is not a legitimate career path, so I chose programmer instead.
Re:Not this shit again by ceoyoyo · 2012-12-04 13:22 · Score: 2

We also have the law of conservation of mass. Oh yeah, we had a nice demonstration of breaking that one in 1945. Then there's Kepler's laws of planetary motion: the first and second are pretty close, but not quite true (even if you're not looking at Mercury) and the third was only a statement of proportionality until it was refined later.
Notice that all the physics "laws" originate before the 20th century. They were exactly as I described: statements, preferably mathematical, that fit the data of the time but didn't offer the explanatory power that most modern theories do. Some of them have been proven to be untrue. Some of them were approximate. Some of them were later refined.
Actually, conservation of energy is about the only one that immediately comes to mind that is both exact and we still believe is universally true. I guess maybe you could put Newton's first and third laws in that category, but as you yourself point out, Newton's second law isn't really universally true unless you change it.

I hate when people misuse Moore's law by rminsk · 2012-12-03 12:13 · Score: 2, Informative

Moore's law is only about the number of transistors on integrated circuits.

Re:I hate when people misuse Moore's law by oodaloop · 2012-12-03 12:18 · Score: 4, Interesting

It turns out that Moore was observing only a small portion of a greater trend. That is, many (or most) things involving technology have a short doubling period, usually between 1 and 2 years. Total information created, processing power per dollar, resolution of brain scanning technology, et al all follow a similar doubling period to what Moore observed with transistors on integrated circuits. It's not that everyone else is misusing Moore's Law, so much as Moore didn't make make the law broad enough.

--
Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
Re:I hate when people misuse Moore's law by Anonymous Coward · 2012-12-03 12:19 · Score: 5, Funny

No no, you misunderstand. They're teaching the compiler to be so efficient that it actually starts creating new transistors to make itself smarter.
Make no mistake, this is how the rise of the machines will start.
Re:I hate when people misuse Moore's law by newcastlejon · 2012-12-03 12:21 · Score: 4, Funny

oodaloop's Law doesn't have quite the same ring to it.

--
If God forks the Universe every time you roll a die, he'd better have a damned good memory.
Re:I hate when people misuse Moore's law by Anonymous Coward · 2012-12-03 12:26 · Score: 3, Informative

You do realize that all the things you listed are a side effect of the doubling of transistor density - and not some separate independent phenomenon?
Re:I hate when people misuse Moore's law by oodaloop · 2012-12-03 12:27 · Score: 2

Does that include the doubling of processing power before the transistor was invented?

--
Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
Re:I hate when people misuse Moore's law by viperidaenz · 2012-12-03 12:31 · Score: 2

Moore's Law didn't exist back then.
Re:I hate when people misuse Moore's law by timeOday · 2012-12-03 12:49 · Score: 4, Insightful

I hate it when all the potentially interesting discussion on a slashdot story is headed off by a pedantic nitpick.
Re:I hate when people misuse Moore's law by loneDreamer · 2012-12-03 13:17 · Score: 4, Interesting

Magnetic Hard drives, for instance is an example of the same curve with no transistors.
Re:I hate when people misuse Moore's law by ceoyoyo · 2012-12-03 16:07 · Score: 2

"resolution of brain scanning technology"
Oh, that would be fantastic. Unfortunately it's just not true. Physics intervenes. Well, that and not making patients smoke (because they're on fire, not because they crave nicotine).

Or.. teach devs to use threading as appropriate? by databeast · 2012-12-03 12:22 · Score: 4, Informative

Or, gawd forbid.. we could teach programmers how to use threading? I am a casual developer, with no formal training beyond writing practical code and reading as much as I can from the keyboards of real developers. I've run into my fair share of "real", "professional" developers who've been tasked to work on my code and thrown their hands up in defeat - not, as I feared, because of the awfulness of my coding style, but "This uses threading, I don't know this!".. Of course, looking at their resumes, a long list of single threaded webapps where the actual threading is handled for them by the webserver itself.. I give up. Even some basic native GUI client development should teach developers simple threading and asynchronous callbacks? alas, yet another talent abandoned in the age of the webapp. Is it any wonder the security issues (my actual realm of supposed 'expertise') in their code often make the architectural issues look trivial in comparison?

An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)

Re:Great potential by MightyMartian · 2012-12-03 12:22 · Score: 3, Interesting

I'm wondering how this works. Does it scan for loops to remake them as event driven processes? Does it splice off multiple function calls and then throw the results into some sort of stack for retrieval? It's a cool idea, but man, for any kind of non-trivial program it must be some monumental piece of work.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.

Parallelizing is easy, performance is hard by HuguesT · 2012-12-03 12:24 · Score: 4, Insightful

OK, so now we shall have another way to produce parallel programs.

Running things safely in parallel is a very good thing, but it does not by itself guarantee significant improvements in speed. The hard bit is actually to optimize the parallel code.

Re:Great potential by Desler · 2012-12-03 12:25 · Score: 5, Informative

Having the compiler identify things which could run in parallel and thread work to run as A B&C D would be a huge step forward as long as it doesn't introduce bugs.

Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it. Even Intel's compiler which is probably the best at it usually misses out on most of the obvious places that should be vectorized. This is why pretty much all code dealing with multimedia content (audio/video/image codecs, games, etc.) still rely on tons of had written SIMD to be optimized to their fullest.

Re:Great potential by Desler · 2012-12-03 12:26 · Score: 2

And it must be hell trying to debug without digging deeply into the generated code.

mutable state by jbolden · 2012-12-03 12:31 · Score: 4, Informative

Mainstream language have mutable state all over the code. Functional programming's big change on state issues is to careful isolate state. The Microsoft approach means that state needs to be tracked carefully so that it could be isolated by the compiler even if it isn't isolated by the code. Which is likely just as much work as isolating state. And the nice thing about isolating state is once you do it you can make use of all sorts of incredibly powerful functional paradigms like like first class functions (closures, partial execution...) and lazyness (infinite data structures, no need to figure out proper order of evaluation..)

The solution to parallelism is functional programming. And no it is not too hard. Excel is a functional programming language that lots of people know that does a great job isolating state.

Re:mutable state by readin · 2012-12-03 13:02 · Score: 2

I wish I knew more about functional programming than I do. In reading the article I found the concept of a language the limits mutable variables interesting. In using Java I find that making most variables "final" is helpful to me for a number of reasons. I can easily find where the variable got its current value. If I write a line of code that changes the wrong variable it is immediately obvious. It keeps me from re-using variables that shouldn't be re-used (generally a variable should have one meaning and one value). It helps me keep variable scope narrow.

Is this article suggesting that most functional languages make their variables the similar the Java "final" or am I way way off?

--
I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
Re:mutable state by betterunixthanunix · 2012-12-03 13:22 · Score: 3, Interesting

In functional algorithms and data structures, everything is immutable (in theory) -- rather than thinking in terms of "final," think in terms of Java's String class. If you want to change one character in a String instance, you must create a new String instance. For a less trivial example, consider how a functional algorithm that removes an element from a list would work (written in Java-like syntax):

List remove_if(List lst, Predicate pred) { if(lst == null) { return null } else if(pred.test(lst.first())) { return remove_if(lst.rest()); } else { return new List(lst.first(), remove_if(lst.rest())); } }

Notice that this constructs an entirely new list, even if none of the elements in the list pass the test. This may seem like a terrible idea, but let's put it this way: if you have 10 threads that share the list, and one of them wants to remove some nodes, you would have had to have copied the list anyway; the functional approach to remove_if is exactly what you want. Now, consider this function, which only removes the first node to match:

List remove_first(List lst, Predicate pred) { if(lst == null) { return null; } else if(pred.test(lst.first)) { return lst.rest(); } else { return new List(lst.first(), remove_first(lst.rest())); } }

Now you have a situation where lists share nodes -- and again, imagine a situation where 10 threads share the list, and one wants to perform this operation. This is one of the reasons that functional programming is so promising for parallel algorithms: you have fewer situations where explicit mutexes are needed, because you are usually copying things that you intend to modify (or more precisely, you never really modify anything).

Of course, things are more complicated in the real world. Purely functional approaches would obviously be pretty inefficient in a lot of cases, since things would be needlessly copied. Lisp, as an example, has both a non-destructive append as well as a destructive nconc, the latter being intended for use in situations where the original lists will not be used again (and can therefore be modified).

--
Palm trees and 8
Re:mutable state by shutdown+-p+now · 2012-12-03 14:28 · Score: 2

All mainstream languages are slowly gravitating towards FP-style - C# is already halfway there, Java is on its way to join it, C++ STL is functional in spirit if not in implementation, and of course we have new languages like Scala that are explicitly designed to marry the old and the new.
Problem is, for transparent parallelism, you need to go 100% functional / immutable, and that's just not happening. What does happen is isolating distinct units in an OOP program that are then implemented functional-style with immutable data structures etc - and then exposed via a high-level conventional OO API at component boundaries. Which, if you think about it, is not all that different from what the paper talks about.
Re:mutable state by chthon · 2012-12-03 21:17 · Score: 2

Problem is, for transparent parallelism, you need to go 100% functional / immutable
Which comes down to re-educating programmers to learn to think about their algorithms.
One book which does this nicely, without going to deep into theoretical computer science is How to Design Programs

Auto-Threading by DaneM · 2012-12-03 12:34 · Score: 2

About time.

Re:Or.. teach devs to use threading as appropriate by lennier · 2012-12-03 12:42 · Score: 4, Insightful

Or, gawd forbid.. we could teach programmers how to use threading?

That's easy: "Don't."

From everything I've read about threading, there's no general way a hand-coded multithreaded program can ever be proven to be correct. Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't. And most programmers are not the smartest programmers in the world (they only think they are).

The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.

--
You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC

Re:Great potential by Anonymous Coward · 2012-12-03 12:46 · Score: 2, Interesting

Any code that doesn't access the same resources can be run simultaneously. Alternately, if no thread is writing to a resource, it can be shared without locking.

Those two observations allow you to determine when code can be run together in different threads. If a method only uses 'final' or 'const' things, then you can run it. If your method only uses one isolated cluster of objects, then it can run simultaneously with another method that only uses a different isolated cluster of objects.

Honestly, any programmer worth $120k could make something more efficient just using threads, if they thought about it. This will get you 20% of the efficiency gains without thinking of it.

Re:Great white hope? by belg4mit · 2012-12-03 12:46 · Score: 2

Indeed. silicon is sort of a shiny black color!

OP probably once read or heard the phrase,
misunderstood the context and thought it
would embiggen the post's language.

--
Were that I say, pancakes?

Re:How about by chromas · 2012-12-03 12:49 · Score: 5, Funny

all the magic compiler is going to do is increase the speed of the System Idle Thread.

Have you seen how much processing power that thing consumes? Usually 90% and up. On all cores. It really needs some serious optimization.

Re:Or.. teach devs to use threading as appropriate by Intropy · 2012-12-03 12:58 · Score: 2

This is totally true.

So long as you define trivial programs to be a subset of programs that can be proven correct.

Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · 2012-12-03 12:59 · Score: 2

Or, gawd forbid.. we could teach programmers how to use threading?

No, we want our compilers to do these things for us, because things that compilers can do they usually do better than humans can. Once your programs become large enough and complex enough, compilers outperform humans every time -- it's not about laziness, it is about the limits of human programming ability.

Compilers surpassed humans when it comes to optimization a very long time ago, except for very small programs or very short inner loop bodies.

--
Palm trees and 8

Re:Or.. teach devs to use threading as appropriate by databeast · 2012-12-03 12:59 · Score: 2

Bingo.. and as many other folks have pointed out, 90% of of threading is purely to decrease latency and bypass blocking operations - very few applications out there today are heavily dependent on concurrency to achieve any kind of raw horsepower... This does just seem like, if it were to become mainstream, would just become a crutch for developers to ignore threading and blocking I/O issues entirely, because "The compiler will sort that out".. In theory, this isn't necessarily a bad thing. In theory, all things work in practice too.

Re:Great potential by allcoolnameswheretak · 2012-12-03 13:00 · Score: 5, Insightful

Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:

for every bunny in list {
bunny.hop()
}

This is a simple, sequential process - bunny.hop() is called in order, for every bunny in the list, one after the other.
Now suppose you have defined the data in your program in such a way that the compiler knows that method 'bunny.hop()' only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else. The compiler now knows that the order of execution of the bunny hops doesn't really matter, as every call of bunny.hop() is independent from anything else. This frees the compiler to spawn threads or processes to call as many bunny.hop()'s as he likes at the same time, thereby processing through the list alot faster.

Another method, bunny.eat() actually performs write access on a shared object 'carrot' when called. If two bunnies eat the same carrot, the compiler can not perform automatic parallelization, as running two bunny.eat() methods could lead to invalid state (only one piece of carrot remaining, two bunnies eating 1 piece of carrot at the same time, results in -1 pieces of carrot). In this case, the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot, these are again independent from each other and can again be paralleled.

The requirement to make this possible is to provide the compiler with information on what kind of data something is - is it an isolated hopping? Or a shared carrot? or a global Bunnygod that affects all bunnies?

Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · 2012-12-03 13:07 · Score: 2

How about we write new code in a functional language, instead of clinging to imperative languages? Put C, C++, and Java in the category they belong in: maintenance and legacy code only.

--
Palm trees and 8

Re:Or.. teach devs to use threading as appropriate by Jonner · 2012-12-03 13:08 · Score: 5, Insightful

An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)

Developing software is all about managing complexity. Abstractions are the primary tool used to do so. They are neither inherently good or bad. A large part of writing good software is finding the appropriate abstractions and eliminating inappropriate ones. If an abstraction allows a program to be written more easily with an acceptible level of performance and correctness, it is an appropriate abstraction.

To know what code is "actually doing" is relative. No programmer knows what his code is doing at the level of gates on the chip. It's rarely necessary or even helpful to know what the code is doing at the level of CPU instructions or microinstructions. This is especially true if the code runs on multiple CPUs.

Meh. Not that big a problem. by fyngyrz · 2012-12-03 13:18 · Score: 4, Insightful

Honestly, you know what this says? This just says some programmers still need to go practice in the threading space.

Threading is not that hard at all for a VERY large class of problems. There are issues -- like, the memory bandwidth will eventually choke the ability of processors to get at nearby memory regions -- and you need some kind of sane plan to deal with cacheing in one core as compared to the next core when you share flags or data coming in during execution from elsewhere -- and you need to figure out when thread overhead starts to chew into your efficiencies -- but really, it's entirely doable, and isn't that difficult a problem space as compared to the actual problems *serious* programmers have to solve. It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.

Then again, some problem spaces lend themselves very well to multiple core approaches -- I just finished a library that slices up images and then applies various separable algorithms to the slices in a variable number of threads (the user of the library can specify.) I wrote it in pure C, used posix threads, no sweat at all, in my 8-core machine, you get just about exactly the gains you'd think (x the number of cores) when the process is CPU cycle intensive, less as the ops are simpler and memory I/O rates rise.

Seriously. If threading seems that hard, you need to go do some studying. It's not the threading. It's you. You really should get on this bus.

--
I've fallen off your lawn, and I can't get up.

Re:Meh. Not that big a problem. by Desler · 2012-12-03 13:21 · Score: 2

You seem to have replied to the wrong person as I'm not seeing what your post has to do with mine.
Re:Meh. Not that big a problem. by rk · 2012-12-03 13:45 · Score: 5, Funny

Maybe it's subtle sarcasm regarding learning threaded programming? People can't identify the right thread in a conversation, but should be expected to write threaded code? :-)
Re:Meh. Not that big a problem. by Anonymous+Brave+Guy · 2012-12-03 19:00 · Score: 3, Insightful

It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.
Should it? Do you really want to bet that you understand a modern CPU, the related hardware architecture, and a production quality compiler for that platform, all well enough to predict reliably when it's worth running a few instructions in parallel and when it isn't, taking into account the inevitable overheads of fork/join operations vs. the potential performance gains of concurrency? Can you really analyse the global structure of a 1,000,000+ line project and reliably identify algorithms that are expressed sequentially but in fact could be transformed losslessly into parallelised equivalents?
Implementing efficient algorithms for embarrassingly parallel computations is (relatively) easy. Implementing efficient algorithms for basically independent parallel operations, such as a web server, is (relatively) easy. Implementing efficient algorithms for problems that might admit some asymmetric parallelism as long as all interactions are coordinated safely and where it might be worth implicit generation of some parallel code paths as long as the benefits outweigh the overheads... Well, if that doesn't seem hard to you, I look forward to reading your papers, because you just outsmarted a lot of researchers who have been investigating this problem for years.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:Meh. Not that big a problem. by chthon · 2012-12-03 21:50 · Score: 2

The problem with generalised parallel programming is that the possible interactions between all threads are on the order of O(n!). That is the big problem. That is also why coordinating parts of the program around synchronising objects makes the program easier to understand, because in the same way that O(n!) grows very fast, it also shrinks very fast if n can be reduced.
Another problem I see is that the current core capacity is too large for running small problems.
Take a complex mathematical expression, which can be transformed into a syntax tree with values and mathematical operations. In the set of mathematical operations, there is a big difference between additive, multiplicative and transcendent operations. Either the transcendent operations also need to be transformed into a syntax tree which can be executed on many more, much simpler nodes, or the tree must be balanced into transcendent operations, with the other simpler parts all to be run on one node with the same cost as the transcendent node.
Re:Meh. Not that big a problem. by fyngyrz · 2012-12-04 03:38 · Score: 2

I very much disagree.
Fine. But that doesn't make it so.

however when you want to write an application that essentially puts everything asynchronously into the background (not just for speed, but also interactive responsiveness) then things get really complicated really fast, doubly so when programming in a language with no build in support for parallel programming.
Perhaps you and I simply have different appreciations for the word "complicated." The only "support" I require for parallel programming are concurrency controls such as mutexes and a means to (hopefully, efficiently) launch a thread. I've written multi-threaded image processing, SDR code (and believe me, that's the poster child for requiring responsiveness... it's pure realtime), AL/evolutionary code, ray tracers, and so on. Me, a c compiler, and a lot of pizza. :)

The core problem with parallel programming is that it bloats your code a lot, things that were trivial in sequential code turn into code that is five times the size and a nightmare of callback chaining.
Sounds to me like that's a consequence of your approach, or perhaps your tools, because "a nightmare of callback chaining" is nothing that's ever happened to me. Quite the contrary, in fact. So it goes from your "core problem" to no problem at all in my case. I would go so far as to venture the suggestion that you may be doing it wrong, or poorly, or with a really problematic language. Or perhaps you just need more pizza. As I said above, it takes study.
It's a skill set that's very similar to that required to learn assembly optimization. What I mean by that... where you move invariants outside loops, where they only get computed once; while the stuff that actually needs to change constantly is in the inner loop. You have to be able to look at an algorithm and see what needs to be computed within the current context, what comes from outside, what is interdependent, and so on. Without that insight, you can't code this stuff at all. But like learning to optimize assembler, once you learn that this is what you need to know about things, you begin to perceive it, and eventually, you'll get good at perceiving it.
I've always felt that the best programmers (in the sense of creating the most efficient code in both space and execution time) are those who understand assembly programming extremely well, can look at the assembly code a c compiler produces and read it like it was this morning's news, and have a broad familiarity with a pretty good set of HLLs. For me, c is the ideal tool -- it is at precisely the right level to let me do what I need to do, no matter what it is; if I need OO, I can implement objects of my own design. If I need a thread manager, I can write it. If I need some specific type of control over memory allocation, or list management, associative arrays, etc., I can write it. Or lift it from a large collection of stuff I've already written. The higher up you go from c, and the more you depend upon other people's code and libraries / objects, the more control you lose. If that's ok, fine. But for parallel programming, I've never found that it was ok. Your milage, of course, may differ. I would only point out that I am very comfy with threaded programming, and you do not seem to be. There may be some useful information in that for you.

--
I've fallen off your lawn, and I can't get up.
Re:Meh. Not that big a problem. by fyngyrz · 2012-12-04 04:05 · Score: 2

The problem with generalised parallel programming is that the possible interactions between all threads are on the order of O(n!).
Only when the threads share data; and that's what mutexes, etc., are for. If you share data without mutexes, it had better be 100% invariant (table of constants, etc.) Or actually not shared: different regions of an image where the process at hand doesn't have inter-region dependencies. So really, it's not O(n!) in any practical sense.

Take a complex mathematical expression, which can be transformed into a syntax tree [rest of very well presented stuff clipped]
Also, you need to take account of (and make the end user aware of) the difference between a real core and hyper-threading. Because while you may have run it on a quad core, they may run it on a "hyperthreaded dual core", and this is not the same thing at all, unless you can arrange for those stalls (in your hypothetical slow transcendent ops) to go do something else useful in your overall context.
Sometimes the right answer to the type of problem you present is to move the goalposts. Sometimes you can use fixed point, tables of sin/cos, etc. 16-bit audio data is a good example of where this is easily done for certain types of operations. Generate the table(s) once on startup, thereafter it's no slower than any other memory access, yet it's still as transcendent as it needs to be.
A concrete example: If you're going to rotate pixels in an image, then, you could use "nx = (x * sin(theta)) - (y * cos(theta))" as part of the job, if you wanted to do it that way. But you can observe that theta is invariant, so you only need two constants: sin(theta) and cos(theta); you don't have to run sin() and cos() for every pixel. So now your potentially stalling transcendents are of no consequence; your execution is "nx = x*StoredSinConst - y*StoredCosConst." (and of course, if you process the image Y by X, then you don't have to calculate ny except once per line, either, and you can substitute addition for multiplication if you process the cartesian space sequentially, but that's a different issue. Sorry. :)

--
I've fallen off your lawn, and I can't get up.
Re:Meh. Not that big a problem. by rufty_tufty · 2012-12-07 04:11 · Score: 2

I think the problem is the original article was about an approach to add another route to parallelise an existing bit of arbitrary code.
Fyngyr jumps in and says something along the lines of 'well if you just wrote with parallelism in mind in the first place when architecting the system you wouldn't have this problem.' A viewpoint I agree with and in fact I would extend to say that from my viewpoint most systems are easier to design to be parallel in the first place than to force single threaded onto them once you are trained by whatever methods to think in this way.
Now is where I think the problem comes, most SW engineers are so used to working in single threaded worlds and are so trained to think in single threaded modes it's all they know. The education and languages are so geared to single threaded viewpoint that this is understandable.
So we end up with Fyngyr saying 'just design it to be parallel in the first place' and yourself correctly pointing out that not all problems can be parallel. Then the reaction of most SW engineers is that this is a compiler problem and we're back to TFA.
I'd rather say that there is middle ground. IME some parts of the problem are inherently parallel, some are inherently serial and forcing one to be something it isn't is plain silly. The fact that the language encourages you to iterate serially over an image one pixel at a time without being able to say this can be done in parallel is to me daft. The fact that when most people design a large system they get the single threaded version working first then try and add in parallelism is totally wrong. I could rant all day about this but I think you get my point. Which is that while not all large problems are trivially parallelisable most are at least to some limited (but still huge) extent, so the fact that it cannot be solved for the totally general case should not stop it from being a preliminary design consideration and the fact that it often isn't is I think a shameful declaration of the state of the software industry.
Now the coarsely grained multi-threaded-ness I'm talking about here may become invalid when we've got 256K core processors, but we're a way off that yet.
As a side note to make my point, consider the times when your computer seems slow. What is it doing? Perhaps rendering a web page while your backup application is running in the background? Perhaps you also need your mail app to be checking the server at the same time as you've got music playing in the background. In any of the cases i can think of where my computer seems slow to complete a task it is a completely trivial problem and there is no excuse outside of poor system architecture. (this is of course ignoring the compute intense stuff that I might do like let's say transcoding video or rendering 3d, or playing computer games - oh wait, also trivially parallel.)

--
"The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -

Re:Great potential by Boawk · 2012-12-03 13:23 · Score: 5, Funny

If two bunnies eat the same carrot...the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot

Am I the only one who is totally horny after reading that?

Oh! My Sweet Day. by 140Mandak262Jamuna · 2012-12-03 13:30 · Score: 2

All these days of careful coding diligently avoiding static_casts and const_casts ... All those recompilations triggered because I had to touch one header file used by everyone and his brother to satisfy my strict insistence of const correctness... I paid and paid and paid all these days. I avoided mutable data members because Soustroup pontificated in some vague posting in comp.lang.c++ (OMG I even forgot the usenet group name!) that "a well constructed object oriented code should not require mutables".

This is my pay day baby! My code is going to run super fast in multicore machines without any (more) extra work from me! Wooo Hooo!

Please take pity on my and let me enjoy my day dream for a few hours, without rudely inserting reality into my reverie! Thanks slashdotters, I know I can count on you!

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact

Re:Oh! My Sweet Day. by 140Mandak262Jamuna · 2012-12-03 16:13 · Score: 2

That is the only and correct reason to be a const correctness nazi. Auto threading and auto vectorization would produce a few papers and mint some PhDs. But for the guy hacking code to make a living, they are not going to make much diff.

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact

Teach programmers by Culture20 · 2012-12-03 13:35 · Score: 2

The holy grail of parallel processing is teaching programmers how to handle parallel processing (and what domains can benefit and where).

functional programming by segfault_0 · 2012-12-03 13:35 · Score: 2

A compiler may be able to thread your program, but it will be a long time before it understands your intent well enough to do it well. Also I can think of many situations under which such functionality may not even be a good idea at all. I would assume such a system would use a pool of threads to avoid constant thread construction overhead and if you get a few IO-busy actions threaded out in an ignorant fashion I think you will find your program blocking a lot more, and producing a lot less throughput than it should.

Also, the OP blithely stated that functional programming isn't happening -- yet features of the functional paradigm, like anonymous functions, have made their way into nearly every language of consequence and many new languages proudly tout functional programming features (see f#, scala, rust, clojure, and go). Perhaps pure functional programming is moving pretty slow, but it's features are not.

--

I was crazy back when being crazy really meant something. (Charles Manson)

Nothing new here by stevel · 2012-12-03 13:39 · Score: 4, Informative

This is nothing new. As in decades-old. Back when I was at DEC in the 1980s we had auto-threading compilers that worked very well on standard application code. Today, Intel has been doing auto-threading in its C++ and Fortran compilers for more than a decade and it is very effective for a large class of applications. Intel's compiler also has features that help you tweak the code for better auto-parallelism, notably the Guided Auto Parallel feature introduced in 2010. Other compiler vendors also have auto-parallelizing compilers.

I've been in the industry for decades, and every couple of years someone comes out with yet another programming language or technique that is supposed to revolutionize application parallelization. This is just another one, and many years behind the rest of the industry from what I can tell.

Re:Or.. teach devs to use threading as appropriate by SplashMyBandit · 2012-12-03 13:39 · Score: 3, Insightful

Actually you don't need functional. Java works fine with multi-threading (well, it does for my daily use). In fact, that is the real strength of the garbage collection model of Java, it can track and manage efficient release of resources in a multi-threaded system. Also Java has direct language support for synchronization mechanisms and extensive library support for multi-thread friendly data structures (Doug Leah is a legend!). Lots and lots of multi-threaded programs get written in Java, without needing functional languages, it's just that we we do write them they do work (after we test for various possible thread-related problems), work well, and we don't have to bitch about needing to change paradigms to get things going (eg. needing to change to functional languages when they are not always the best fit for many classes of problem).

Of course you'll hear those still stuck on legacy languages designed without multi-threading in mind whinging. That doesn't mean there isn't a straightforward, easy to learn language and development platform out there that isn't strong on multi-threading. There is - and it is cross-platform and general purpose too.

Re:Great potential by Culture20 · 2012-12-03 13:43 · Score: 2, Interesting

Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:

for every bunny in list {
bunny.hop()
}

And if the original intent is to make the bunnies hop in sequence like performing "the wave", making a new thread for each hop might produce a visual effect of them all hopping at once if the sleep occurred in the hop() function instead of the for loop calling the hops.

Automatically paralleling compilers by Animats · 2012-12-03 13:49 · Score: 2

Automatically paralleling compilers aren't new. SGI had one for C ten years ago. Intel has it for C++ and Fortran now. It's been done for Matlab. There's plenty of theory.

Outside of the supercomputing community, it's never gone mainstream. Microsoft is at least trying to do it for C#, which is a reasonable language. An interesting point is that this Microsoft approach exploits immutable objects. Immutable objects can be freely shared without locking, so wide use of immutable objects makes automatic extraction of parallelism easier.

I'd looked at doing something similar for Python, to get rid of the Global Interpreter Lock, Python's boat-anchor. Python already makes wide use of immutable objects, but doesn't gain any performance from them. If everything is thread-local, immutable, or synchronized in the Java sense, you don't need global locking. But the politics of the Python community do not favor performance. (Potentially, Python could be up to at least the LISP level of performance, within a factor of 2 of C, if the language was restricted in certain ways.)

Another part of the problem is the pernicious heritage of POSIX/Linux locking primitives. Many programmers think that locking is an library or OS-level problem, not a language level problem. Locking via library primitives means language doesn't know what data is locked by which lock. This makes optimization and race checking very tough.

The political and social problems here are tougher than the technical ones. So the open source community can't solve them. It takes someone with a big hammer, like Microsoft, to do it.

"Who tells whom what to do?" - V. Lenin

Re:Automatically paralleling compilers by whistl · 2012-12-03 15:12 · Score: 3, Informative

SGI's automatic parallelizing software came from Kuck and Associates, Inc (kai.com). I worked there for 8-1/2 years, and one disappointing fact we learned was that the only people who really cared enough about parallelizing their software to analyze their code and modify the source to make it faster were either research scientists (of which there were relatively few) who mostly wanted quicker and cheaper results (because renting time on supercomputers costs $$) or marketing departments of computer hardware manufacturers (of which there were fewer) who only wanted to be able to advertise higher SPECmark numbers for their hardware. SGI was the only manufacturer who shipped our product with every C and Fortran compiler they sold. IBM, DEC, HP only sold it as an option, but all used it internally to speed up their own benchmark numbers.
Automatic parallelizing is tough, tougher than you think. It's nearly impossible if you don't have a human performing program analysis and adding source code directives to inform the compiler about data dependence needs.

Re:Great potential by Kjella · 2012-12-03 13:50 · Score: 2

There is great potential here if this sort of thing can be made to work cleanly and safely. I suspect a lot of programmers (myself included) think better in a single-threaded manner. People are mostly used to dealing with A B C D and, as such, I think we usually write code to execute steps the same way.

If you've ever done a dish with more than one casserole or making a side dish while the main dish is cooking you've done parallelization in real life, the problem is that expressing it in a computer language is hard. Functional programming works okay if you're doing lots of similar things at the same time, but not if you're doing lots of different - but parallelizable - tasks. I want to call oven.roast( meat ), stove.boil( potatoes ) and workbench.slice( salad ) but I don't particularly care about the order and I know they're side effect free relative to each other, but how do I express that? Either I write them in sequence ABC and they happen in sequence ABC or I have to write threads and schedule them which is honestly overkill - I want it to use the resources efficiently not micromanage the schedule myself.

Particularly if these come in random and unpredictable amounts like the orders at a restaurant I need the computer to work it out itself. It's also very important if you have small resource conflicts like the hot casserole and hot pot roast both requiring mittens to move. I know quite a lot about mutexes and locks and semaphores to make it explicit but now it feels like I'm explicitly describing the opportunities for parallelism not the dependencies preventing it. Imagine if you were a project manager, do you start with every task depending on every other task? No, you start with the tasks then make dependencies where they're needed.

I wonder if you could do something more with "scoped const-ness", for lack of a better term. That is to say a function is non-const in a certain scope and const outside it, and thus can be executed in parallel with anything else outside that scope. To continue the example above, let's say I could define a function oven.raiseTemperature() that's non-const in the oven scope and const in the global scope and a function stove.stirPot() that is the same for the stove. If I then call them the compiler would know it's free to reorder them or run them in parallel, it picks the optimal execution based on the hardware available or even dynamically at run time. Functions that depend on multiple objects would automatically become fences to "sync up", though not necessarily on a global scale. Like a function on both the oven and stove may sync the kitchen, but not the living room.

It might not give you massive parallelization but it'd at least let you parallelize many mixed tasks that I don't feel you manage to do easily in any language today. It certainly sounds like a safer way than starting off a bunch of worker threads and praying you didn't create any deadlocks or live locks or race conditions or whatnot. Much like the global const the compiler should be able to validate that it doesn't in fact have any side effects outside the scope it claims. I'm sure there's lots of reasons why this wouldn't be as useful as I imagine it would be, but it certainly seems easier for those of us used to imperative programming.

--
Live today, because you never know what tomorrow brings

Re:Great potential by Your.Master · 2012-12-03 13:51 · Score: 4, Insightful

If it has a visual effect, then it's not read-only.

Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · 2012-12-03 13:52 · Score: 5, Insightful

Let me repeat myself: except for very small programs or very short inner loop bodies. Most of what you are describing is pretty small in terms of the amount of code and its complexity; hand-optimizing assembly code does not scale up.

Even the best of C or C++ compilers are terrible at vectorization of code.

Yeah, and the best humans are terrible at allocating registers -- so bad, in fact, that the best C compilers ignore the register keyword. What do you think is more relevant to the general case: vectorizing multimedia operations, or allocating registers? Compilers are also better than humans at:

Finding redundant or unreachable code
Finding dead or otherwise unneeded variables
Finding algebraic reductions
Strength reducing transformations
Boolean optimizations
Reducing program size

...and numerous other optimizations that, like register allocation, are more generally applicable than SIMD. There has also been work on compiler optimizations that are utterly out of reach for even the most skilled humans, like probabilistic optimizations that have a negligible (with respect to some tunable parameter) probability of being unsound.

To put it another way, look at the Orbitz story, which is over a decade old now:

http://www.paulgraham.com/carl.html

On the one hand, you have hand-tuned assembly language. On the other, you have a program written in Lisp (a high level, compiled language) with some C++ mixed in (for managing memory). Orbitz was able to compete on speed, but more importantly, it was returning better results. It's not that the people who wrote that mainframe assembly language were idiots -- they were taking advantage of all sorts of special hardware features, they knew how to hack their machines better than anyone else -- it is just that the Orbitz algorithm was far too complex for efficient hand-rolled assembly code, at which point compilers are really the only choice. The mainframe guys were busy thinking about how to make use of special machine features in assembly language; the ITA team was busy solving the higher-level problem, and relying on their compiler to generate good assembly language code. This is a particularly telling line:

We disassemble most every Lisp function looking for inefficiencies and have had both CMUCL and Franz enhanced to compile our code better.

[emphasis mine]. They disassembled their code...and then improved their compiler when they saw problems. They did not hand-roll the code, they made the compiler do a better job of generating code. These are not lazy programmers, nor are they programmers who do not know how to use assembly language; they are programmers who understand that they have a tool that is far better at generating assembly language than they are, and that they have more important things to do with their time.

I deal with quite a bit of crypto code in my work. I have seen lots of hand-tuned assembly language, I dealt with code that took advantage of the AESNI instructions to perform very fast encryption. I am well aware that in small, highly specialized functions (like AES), humans are better able to utilize special instructions to improve performance. Those are niche cases, and the techniques used in those cases have very limited applicability (even SSE is fairly limited in its applicability, by comparison with the sort of code programmers write and maintain every day), and the techniques scale very poorly.

--
Palm trees and 8

Re:Great potential by paskie · 2012-12-03 13:55 · Score: 4, Insightful

One would assume that "world modification" (monad; in this case, visual I/O) does not fit the restriction "only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else" and as soon as you are showing something to the user, that blocks the auto-threading.

Then again, if you are drawing map tiles, the order you draw them in doesn't matter - so this goes to show that in some situations, there's no way around the programmer actually having to stop, think and declare whether the program's contract allows it to do some user-visible operation in parallel.

--
It's not the fall that kills you. It's the sudden stop at the end. -Douglas Adams

Re:Great potential by Jeremi · 2012-12-03 13:55 · Score: 5, Interesting

Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it.

I suspect one reason they are so bad at it is they have to be very conservative in how they optimize, due to the relaxed nature of the C language. For example, if the C optimizer cannot prove beyond a shadow of a doubt that a particular memory location isn't being aliased, it can't make any assumptions about the location's value not changing at any step of the process. In practice, that means no optimizations for you.

Given that, it would seem that the Microsoft approach (using not only the higher-level language C#, but a specially customized version of C#) gives their optimizer much greater latitude. Because the language forces the programmer to annotate his objects with readable/writable/immutable tags (think C's "const" tag, but with teeth), and because the language (presumably) doesn't allow the programmer to do sneaky low-level tricks like casting away const or aliasing pointers or pointer math, the optimizer can safely make assumptions about the code that a C or C++ optimizer could never get away with. That may allow it to be more effective than you might anticipate (or maybe not, we'll see).

--

I don't care if it's 90,000 hectares. That lake was not my doing.

Re:Great potential by elfprince13 · 2012-12-03 14:11 · Score: 3, Insightful

One problem, and the great attraction of this way of doing things, is that most software companies don't have too many programmers worth $120k. Lots of $50k-$80k, but not so much on the higher end of the pay scale until you get into research divisions at big companies like Microsoft or Google or IBM.

Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · 2012-12-03 14:19 · Score: 5, Insightful

Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code

First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, to a constant factor (you should have picked heapsort or mergesort). That aside, it is worth noting (and I am somewhat bothered by this when it comes to TFA) that we still do not know if it is even possible to optimize any program by parallelizing it; see the NC-vs-P question:

https://en.wikipedia.org/wiki/P-complete

Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.

Languages do not, contrary to belief, express intent, the provide a strict set of instructions that the computer MUST respect

Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions, and programming in a declarative language requires an entirely different mode of thinking (it is closer to asking a question that giving instructions). It is also not strictly necessary for the computer to do exactly what a program expresses; there has been some work on compiler optimizations that have a (tunable) chance of not maintaining soundness.

In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)

If you are convinced of this, try implementing something more complex than the algorithms you see in Knuth's books; say, this:

http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf

Then ask yourself this: could the constant factors in your implementation be better? At the end of the day, big constant factors will hurt your performance so badly that you might as well have used an asymptotically worse algorithm; indeed, consider fast integer multiplication:

https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

10000 digits are needed before that algorithm actually outperforms the asymptotically worse Toom-Cook family of algorithms. Here is an even more extreme example:

https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm

Sure, that's a better matrix multiplication algorithm in the asymptotic sense...but only for matrices that are so large that you cannot even store them on today's computers.

So really, while you are right that asymptotic improvements will always beat constant factor improvements (which is what compilers are mostly going to do for you), you are wrong to ignore the importance of constant factor improvements. In the real world, constant factors matter. In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors. In the real world, large integer multiplication is done using Karatsuba or Toom-Cook methods, not FFT methods, because of constant factors. In the real world, if you are not using a compiler to optimize your code, your code is going to be slower than it needs to be, even if you spent hours hand-tuning it, unless you are only dealing with toy problems.

--
Palm trees and 8

Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · 2012-12-03 14:41 · Score: 2

Some threading problems are hard, really hard. They aren't that common.

Which is why compilers should be handling as many threading problems as they can: so that rather than spending time and mental effort dealing with problems that are not so hard, we can spend time and mental effort on problems that compilers cannot solve for us. There is a reason that we use higher level languages, and that reason is not "laziness."

--
Palm trees and 8

Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · 2012-12-03 14:51 · Score: 4, Informative

How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?

What about memory alignment and packing? Calling conventions? Near pointers and far pointers (OK maybe that one is a bit dated)? Most programmers do not think about these problems, because most programmers are trying to solve bigger problems and need to spend time thinking about the bigger picture. Getting mired down in low level details makes programmers less productive -- that is why we use programming languages.

So if we can have compilers that can automatically utilize multithreading, even if it is in a somewhat limited context, we should be happy. If you are happy to not be spending time thinking about padding data or allocating registers, you should be happy about automatic multithreading.

--
Palm trees and 8

Re:Great potential by unrtst · 2012-12-03 15:01 · Score: 2

What you describe sounds a lot like "make".

$ cat Makefile
dinner: roast_meat bake lasagna boil_potatoes chop_salad
turn_off stove oven
wipe table
clean_table:
wipe table
prepare_meat: clean_table
cut meat
salt meat
set_meat_temp:
# make it fast
chtemp 800
roast_meat: prepare_meat set_temp_meat
ovenize meat
sleep 900
put_on_table meat
prepare_lasagna: clean_table chop_salad prepare_meat
remove_box lasagna
set_lasagna_temp: roast_meat
chtemp 500
bake_lasagna: prepare_lasagna set_lasagna_temp roast_meat
ovenize_lasagna
sleep 3600
put_on_table lasagna
prepare_potatoes:
wash potatoes
boil_potatoes: prepare_potatoes turn_on_burner1
put_on_burner1 pot water potatoes
sleep 1800
put_on_table potatoes
chop_salad: clean_table prepare_meat prepare_potatoes
put_on_table salad

$ make dinner

Re:Fast First Post by Anonymous Coward · 2012-12-03 15:03 · Score: 3, Interesting

/casts detect troll 10 post radius. Multiprocessor technology was already established when Bill Gates was still dumpster diving for code samples, and a lot of the work that gave us modern high performance multiprocessing was done by companies like IBM, Intel (and Intel shootoffs like Sequent) back when Windows was just the latest new DOS application. At the extreme end of the spectrum, this is why all of the top 10 supercomputers run on the UNIX-alike known as Linux, and none run on a Microsoft based OS.

I'm afraid regardless of what Microsoft would now like to present as a PR image, they've consistently been embarrassingly late to the party. Let's not belittle their contribution though; they certainly throw more money at problems than any other tech company.

Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · 2012-12-03 15:04 · Score: 3, Insightful

L4 isn't exactly trivial, but it's proof exceeds the length of it's code by whole factors.

Re:Fast First Post by aNonnyMouseCowered · 2012-12-03 15:13 · Score: 5, Interesting

"MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size."

Citation need. Not disputing the first part. Just the second part about the relative size of Miscorsoft Research.

Superscalar, old is new again. by citizenr · 2012-12-03 15:19 · Score: 2

Sounds like user provides list of Structural/Control/Data Hazards. Compiler Pipelines code into blocks that can be ran in parallel.
Sounds familiar :)

--
Who logs in to gdm? Not I, said the duck.

The real law in play is Amdahl's by 14erCleaner · 2012-12-03 15:20 · Score: 5, Informative

From skimming their paper, it doesn't appear that they get any real speedup from their parallelism. This is apparent because they state, in the part about the millions of lines of code written in their language, that

These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks

Translation: we didn't speed them up any, or at least not by enough that we care to share any number.

Amdahl's Law is difficult to overcome in auto-parallelising systems that rely on anything other than loop optimizations. Basically, in straight-line code, if you make 50% of the code run on multiple cores, you're only going to get at most a 2x improvement in speed. In practice, you won't even get that much (except in loops) due to added overhead. Bottom line: you can write papers about your wonderful auto-parallelizing technique, but when the rubber hits the road this is unlikely to lead to much performance improvement.

--
Have you read my blog lately?

Re:The real law in play is Amdahl's by subreality · 2012-12-03 23:16 · Score: 3, Insightful

There is no metaphysical part of the program that is not parallellizable
require 'digest/sha2' string = 'Parallelize this!' 123456789.times { string = Digest::SHA2.hexdigest(string) } puts string
Any suggestions?
Most programs have parts that can be run in parallel, but Amdahl's point is that as you increase the number of processors you'll find that the parts that are inherently serial - even if they're quite small - will increasingly become the bottleneck.

Re:Great potential by ignavus · 2012-12-03 15:48 · Score: 2

Have you ever considered writing a text book on programming?

--
I am anarch of all I survey.

Re:Great potential by ceoyoyo · 2012-12-03 16:03 · Score: 2

I know nobody knows how to write low level code anymore but displaying something involves writing to memory.

Re:Or.. teach devs to use threading as appropriate by SplashMyBandit · 2012-12-03 16:08 · Score: 2

someObj.someMethod(if(x == 2) { 5; } else { 10; });

Aarrrrghhh!!!!!111 Surely you would never *teach* students such monstrosities. The idea is to keep it simple, one statement or side effect per line of code. This helps you when you wave your debugger over the code, since it will automagically highlight the result of the statement. This also helps the people reading your program (and source code is meant for people firstly). You are trying to write Java in non-Java different style, no wonder that code you gave is so hideous. Of course, the Java compiler is so damn efficient that extra assignments for debugging and readability are no problem at all - let the machine mash the code, not you. Trying to mash everything into one line is for C and LISP coders (which I was at one time so I recognize the same style), not for someone used to multiple 1080p displays. Newlines really aren't that expensive, dontcha know?

As a language, Java is pretty awful; you are stuck with a single programming paradigm even in cases where that paradigm makes no sense, you are writing enormous amounts of code to express what you are trying to do, and features that programmers in other languages take for granted are just not there (or are very limited). Meanwhile, you have Scala and Clojure on the JVM -- so why not just use these where functional approaches make sense, like for parallel algorithms?

I would argue that Java is more general purpose than Scala or Clojure, at least for the kind of problems I'm often trying to solve (eg. device control, 3D graphics, medical applications, numerical modelling etc). Perhaps Scala and Clojure fit your/people's problem domains, if so, then I have no problem with people using them. Don't mistake that for being useful in the outside (non-teaching) world. Note also that functional constructs do have some overheads that iterative-style code does not (where the latter is not exclusive to, but more in Java's 'style', yeah?). For my problems that are massively iterative (eg. some parts of graphics, numericals) this does make a difference, and the functional constructs are unnatural and used so infrequently to be useless. The usefulness of threading is above this (eg. the multi-threading is coarse grained and limited to a few dozen threads, again functional programming doesn't offer enough of an advantage in this regard).

Back to your teaching. Surely new students follow the deterministic procedural code better at the start than having lots of anonymous closures popping up all over the place? It is more advanced users that like the constructs that bring them home to the Source, that is, LISP.

Yes, they should have used the ternary operator.

No they should not have. The ternary operator is for C programmers making the jump. Again, newlines are not expensive. There is nothing wrong with making an if statement and assigning to a variable. Then give the output of that variable to your function call. One statement or side-effect per line. The compiler can make this fast. Someone who is not you can read the code without thinking about it (unless they are snob, and think the code is too simple and should be slightly obfuscated in the way you have shown). With lines split out in this way you get to put a debugger on the statement you want because multiple statements are not mashed together, and again, the debugger will automagically show you the value of the variable or expression as you move you mouse over it (less clicking is good, yeah?).

If aiming for simplicity is good enough for Einstein and Steve Jobs, then who am I (or you?) to argue? As they say, "You can write bad FORTRAN in any language" and your example proves it. Just because Java (and all other languages) allow obfuscated constructs because people think terse code is better (it's not!) and those that write terse code are better coders (not true, such things are only held in esteem by those lacking several decades of experienc

Re:Fast First Post by happymellon · 2012-12-03 17:42 · Score: 5, Funny

Of those, IBM have around 439,999 project managers.

Infinite Monkeys by dutchwhizzman · 2012-12-03 18:52 · Score: 2

I'm sure that they have produced all the worlds great literature the last 30 years as well. Just their size doesn't make them the source of everything good.

I haven't seen any GPU or CPU design used in modern computers come from MicroSoft. Sure, they have worked with it to implement it into their own software. They probably have been able to express their preferences to hardware vendors, in order to make them support the hardware and make it perform good in their software. High Performance multi-core kernel design in Windows was, if I call correctly, taken from VMS and Compaq design when they designed Windows NT. MicroSoft has been struggling to get over 16 cores effectively supported by their OS and most definitively by their applications. Linux has been getting great support from SGI when they ported their NUMA architecture from IRIX to Linux and building on that, Linux has been expanding and rewriting their multi core stuff more efficient and larger than MicroSoft has done with Windows.

Now what technologies has Linux taken from MicroSoft (free) research exactly? What MicroSoft patents were violated by Linux again? What's that eerie silence all of a sudden?

--
I was promised a flying car. Where is my flying car?

Re:Fast First Post by deoxyribonucleose · 2012-12-03 21:16 · Score: 2

If you talk about Nobel prizes, could you please remind a forgetful Swede how many are awarded in mathematics or software engineering? Or is research strictly a speaking only something that happens in physics, chemistry, medicine.... or literature... or (snrk) in (giggle) economics (bwa-ha-ha-ha-ha!)?

Re:Fast First Post by phagstrom · 2012-12-03 23:35 · Score: 3, Funny

And the last one cleans the office and makes coffee.

Re:Fast First Post by TheRaven64 · 2012-12-04 01:16 · Score: 4, Informative

Nobel Prizes? None. Because there is no Nobel Prize for computer science. The closest equivalent is a Turing Award. I haven't checked the whole list, however the 2009 winner works at Microsoft Research (and has since 1997), the 1998 winner is now dead, but was working at Microsoft Research between 1995 and his death. You can check the list yourself for IBM.

--
I am TheRaven on Soylent News

Re:Great potential by Bob+the+Super+Hamste · 2012-12-04 01:55 · Score: 2

as can any code that does access the same memory location as long as that access is all read-only

Not entirely true. If reads on that block of memory are atomic it will work every time, otherwise you might get some strange results as it may be possible for values to change while reading part of it. This mostly happens with more complex data structures but it can happen with primitive ones depending on their size.

--
Time to offend someone

Re:Fast First Post by hAckz0r · 2012-12-04 03:18 · Score: 3, Informative

You were right to question the stated facts. Microsoft Research has only 350 "employees". Not all that many if you compare it to the combined resources of the others previously sited. IBM for instance has 1593 "researches" alone, and that is bonafide "researchers" not just employees or warm bodies. So the prior statement is provably false by even the quickest and roughest of google searched.

https://en.wikipedia.org/wiki/Microsoft_Research
http://researcher.ibm.com/researcher/search.php?sn=1

Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · 2012-12-04 03:33 · Score: 2

The idea is to keep it simple, one statement or side effect per line of code

Except that you are then going to either (a) duplicate code or (b) have to introduce and keep track of more variables. In what way is that simple?

You are trying to write Java in non-Java different style

Perhaps I was not clear: that was a mistake that one of my students made. They did not know Java, they were still trying to learn it, and the point is that the Java style of programming is very unnatural. "Call f with 5 if x is 2 or else 10" is how the student was thinking about the program; they were forced to think about it instead as "if x is 2 call f with 5, otherwise call f with 10."

Note also that functional constructs do have some overheads that iterative-style code does not

Yes, there are limits to functional programming. There are limits to all programming paradigms, which is why languages should not force programmers to use a particular paradigm in all cases (and as it so happens, Clojure does have a loop construct). One of Java's major problems is that it really lacks support for anything other than object oriented programming, with only limited support for other paradigms or styles.

For my problems that are massively iterative (eg. some parts of graphics, numericals) this does make a difference, and the functional constructs are unnatural and used so infrequently to be useless

Perhaps so, although I suspect that in both cases object oriented programming is not universally applicable, and that you would appreciate the ability to just write a function without having to dress it up with class definitions. When I was an undergrad, I knew a lot of professors in the EE department who used Matlab to do computationally intense signal processing, and I know people in the CS department here who use Matlab for image processing work (and they would rule out both Java and any functional language for that work).

Surely new students follow the deterministic procedural code better at the start than having lots of anonymous closures popping up all over the place?

They do, but what I was saying was the Java is not an easy language to learn. Let's put it this way:

public class HelloWorld { public static void main(String[] args) { System.out.println("Hello world!"); } }

If you have never programmed before, that is an awful lot to deal with in a first program. It is certainly more to deal with than this:

(print "Hello world!")

In the Java case, most of the questions students were asking were about public, static, void, main, class, String, and then questions about function definition syntax, what "args" even is, etc. The Lisp case is far simpler and far more straightforward. I would say that it is acceptable to teach Java if all that complexity would eventually lead to simpler programs, if it was just a matter of scale, but unfortunately that is not the case; Java's syntax does little to keep complexity down, and the "everything is object oriented" approach adds a lot of complexity.

I find it interesting that you would mention closures here; in introductory Java courses, it is almost guaranteed that students will have to write at least one class definition. Do you think students have an easier time dealing with classes and objects than closures? Why even bring up anonymous closures; do you think new programmers have an easy time dealing with anonymous classes and objects (do you think we even get that far in a single semester)? My experience with teaching students how to write classes was this: it's hard for people who have never programmed before, especially when they are writing programs that are tiny and benefit little from classes. I would say that object oriented programming

--
Palm trees and 8

Re:Or.. teach devs to use threading as appropriate by ahabswhale · 2012-12-04 03:56 · Score: 2

I've spent a lot of time in the earlier part of my career writing multithreaded applications (mostly client/server stuff) and I got very good at it. That said, it's a complete waste of my time as a developer. Developers should focus on domain problems, not how to tweak the microprocessor. In my experience, most developers don't seem to ever fully get how to write multithread apps and threading problems are a royal pain in the ass to debug. I can't see a better candidate for automation than taking this out of developer's hands.

I used to write my own linked lists and hash tables but I don't do that anymore either. It's a new era and it's all about the domain.

--
Are agnostics skeptical of unicorns too?

Re:Great potential by Bengie · 2012-12-05 13:10 · Score: 2

i++ is atomic if the CPU can handle working at the size of i. On a 32bit machine, if "i" is 64bit, then it won't be atomic, but on a 64bit machine, i++ is atomic. Newer x86 CPUs do support 128bit and soon 256bit atomic memory reads and writes via SIMD registers.

Now, it is not guaranteed to be pushed out to memory immediately. If you're doing some like for(i = 0; ... i++), "i" may be stored in a register and never pushed back out to memory. You need to declare i as volatile or use a memory fence to force i to be pushed.

Volatile doesn't fix everything either. All it does is say the update will be pushed out to memory, but because most CPUs use Out-of-Order execution, the read/write may get re-ordered.

eg.

shared volatile int i = 0
shared volatile bool b = false

threaded method()
{
i = 3
b = true
}

Even if the compiler doesn't re-order these two assignments, the CPU might because "i" doesn't have a detectable dependency.

Slashdot Mirror

Auto-threading Compiler Could Restore Moore's Law Gains

92 of 404 comments (clear)