Scalable Nonblocking Data Structures
An anonymous reader writes "InfoQ has an interesting writeup of Dr. Cliff Click's work on developing highly concurrent data structures for use on the Azul hardware (which is in production with 768 cores), supporting 700+ hardware threads in Java. The basic idea is to use a new coding style that involves a large array to hold the data (allowing scalable parallel access), atomic update on those array words, and a finite-state machine built from the atomic update and logically replicated per array word. The end result is a coding style that has allowed Click to build 2.5 lock-free data structures that also scale remarkably well."
why are there fewer than 1 thread per core? It says 768 cores, but only 700 threads. Does it need the rest of the cores just to manage the large number of threads?
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
1. A common killer in parallelization is false sharing. That is, threads on two processors fight over a cache-line even though they are accessing independent variables. A cache-line is typically bigger than an individual variable. The approach of using adjacent elements of an array for parallelism sounds naive. One needs to pad the array.
2. Updating a shared variable, especially a non-scaler, in an inner loop is naive. One should reference local scalers in inner loops when parallelizing. Just once, should the thread update the shared variable. Don't reference non-scalers or shared variables in an inner loop. Don't lock in the inner loop, either, if you can avoid it.
If it is not an inner loop then the locking is probably not a big issue any way.
3. Java, really, Java?
Umm, because Azul runs the Java in hardware. It *is* optimized by being in Java.
Well, actually, it's 16 Vega2 chips (each of which is termed a "processor"), each chip has 48 cores, making for 768 total cores. 2 chips per board, and 8 boards in a system. All chips are fully meshed (no switch, fabric, or bus, just a whole bunch of direct multi-GHz wires), so it's a flat SMP UMA machine.
...).
The new Vega3 (announced last week) takes it to 54 cores per chip, and 864 cores in a system (16 x 54 = 864
Good for us. Get the rabble away from Slashdot. Only true nerds should understand the contents. Let me add a few keywords to get rid of the softies: monads, higher-order type systems, return type, genericity.
Your turn.
I used to think this too until I saw the video by the article's author. By lock free we mean that if the thread that has the "lock" were to die, it would not stall out the entire program. With CAS updates, a crashing thread would simply die and cause no ill effects to the data structure. With Mutex style locks, if the locking thread crashes (or otherwise forgets to unlock the mutex) then the entire program grinds to a halt as other threads start waiting on the lock. The maximum time a CAS "lock" can exists is 1 CPU cycle.
Man is the lowest-cost, 150-pound, nonlinear, all-purpose computer system which can be mass-produced by unskilled labor.
Your link says: "Unfortunately, there's also a third conclusion. It seems that it's much, much easier to create a well performing program in Java. So, please consider it for a moment before you start recoding your Java program in C++ just to make it faster..."
In any case, the author of that test failed. The test includes start-up times in the results. Java is always going to lose micro-benchmarks where start-up time is included. Why? Because the JVM load gets included in the benchmarks. As anyone who knows Java well can tell you, Java's startup time is extremely costly. Once the JVM is running, however, it produces much better performance. The end result is that the startup cost will eclipse the actual benchmark being run.
I'm willing to bet you'll have a hard time finding a modern benchmark for Java that doesn't make that (simple!) mistake.
However, I will admit that I was overextending my reach by suggesting that no one runs benchmarks anymore because of the strong performance of Java. The reality is (as one might expect) much more complex. It's not that Java performed poorly in benchmarks. Quite the contrary. By 2004, Java was cleaning up in benchmarks. The problem is that micro-benchmarking sucks. All you end up getting is the two sides fighting over who can produce the most optimized benchmark possible. Which invariably ends up defeating the purpose of benchmarking for "real-world" code.
The long and short of it is that Java was proven to be more than fast enough, with the average case easily keeping pace or beating C++ code. Thus the statement by the author of the article you linked to:
"Unfortunately, there's also a third conclusion. It seems that it's much, much easier to create a well performing program in Java. So, please consider it for a moment before you start recoding your Java program in C++ just to make it faster..."
Javascript + Nintendo DSi = DSiCade
I don't necessarily take issue with what you've said (I've got you on my friend list, so I'm fairly sure that your one of the professionals on this site whose insight I love to read... I'm a software development major in college, so I usually add programmers to my friends list when I genuinely value their insights...), but I would like to point out that I've seen another approach to your first point about locking busses. The Linux kernel, IIRC, does a sideways and backwards cache flush to RAM before masking IRQs on an SMP system. I think it defers top half of interrupts to another CPU and doesn't queue the bottom halves. I might be mistaken on that, though. It then 'takes over' the CPU for the process/thread, and when it's done, it flushes all the caches once again.
But, as you said, the lock doesn't require a bus lock. Please correct me if I'm wrong on this, it's been a few months since I've read that piece of code and I'm a little hazy on the exact details.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
When performance doesn't matter much I'd use perl instead of Java, or some other higher level language (e.g. Python).
:).
;).
Of course when your boss/customer requires Java, use Java
I'm mixed on Lisp- Lisp is kind of high level and kind of not... I guess I'm just too lazy - perl has CPAN = lots of wheels I don't have to reinvent - and document