Scalable Nonblocking Data Structures
An anonymous reader writes "InfoQ has an interesting writeup of Dr. Cliff Click's work on developing highly concurrent data structures for use on the Azul hardware (which is in production with 768 cores), supporting 700+ hardware threads in Java. The basic idea is to use a new coding style that involves a large array to hold the data (allowing scalable parallel access), atomic update on those array words, and a finite-state machine built from the atomic update and logically replicated per array word. The end result is a coding style that has allowed Click to build 2.5 lock-free data structures that also scale remarkably well."
why are there fewer than 1 thread per core? It says 768 cores, but only 700 threads. Does it need the rest of the cores just to manage the large number of threads?
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
1. A common killer in parallelization is false sharing. That is, threads on two processors fight over a cache-line even though they are accessing independent variables. A cache-line is typically bigger than an individual variable. The approach of using adjacent elements of an array for parallelism sounds naive. One needs to pad the array.
2. Updating a shared variable, especially a non-scaler, in an inner loop is naive. One should reference local scalers in inner loops when parallelizing. Just once, should the thread update the shared variable. Don't reference non-scalers or shared variables in an inner loop. Don't lock in the inner loop, either, if you can avoid it.
If it is not an inner loop then the locking is probably not a big issue any way.
3. Java, really, Java?
Umm, because Azul runs the Java in hardware. It *is* optimized by being in Java.
Good for us. Get the rabble away from Slashdot. Only true nerds should understand the contents. Let me add a few keywords to get rid of the softies: monads, higher-order type systems, return type, genericity.
Your turn.
I used to think this too until I saw the video by the article's author. By lock free we mean that if the thread that has the "lock" were to die, it would not stall out the entire program. With CAS updates, a crashing thread would simply die and cause no ill effects to the data structure. With Mutex style locks, if the locking thread crashes (or otherwise forgets to unlock the mutex) then the entire program grinds to a halt as other threads start waiting on the lock. The maximum time a CAS "lock" can exists is 1 CPU cycle.
Man is the lowest-cost, 150-pound, nonlinear, all-purpose computer system which can be mass-produced by unskilled labor.
I don't necessarily take issue with what you've said (I've got you on my friend list, so I'm fairly sure that your one of the professionals on this site whose insight I love to read... I'm a software development major in college, so I usually add programmers to my friends list when I genuinely value their insights...), but I would like to point out that I've seen another approach to your first point about locking busses. The Linux kernel, IIRC, does a sideways and backwards cache flush to RAM before masking IRQs on an SMP system. I think it defers top half of interrupts to another CPU and doesn't queue the bottom halves. I might be mistaken on that, though. It then 'takes over' the CPU for the process/thread, and when it's done, it flushes all the caches once again.
But, as you said, the lock doesn't require a bus lock. Please correct me if I'm wrong on this, it's been a few months since I've read that piece of code and I'm a little hazy on the exact details.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
When performance doesn't matter much I'd use perl instead of Java, or some other higher level language (e.g. Python).
:).
;).
Of course when your boss/customer requires Java, use Java
I'm mixed on Lisp- Lisp is kind of high level and kind of not... I guess I'm just too lazy - perl has CPAN = lots of wheels I don't have to reinvent - and document