Supercruncher Applications
starheight writes "Bill McColl has written an article contrasting traditional massively parallel supercomputing with a whole new generation of compute-intensive apps that require massively scalable architectures and can deliver both incredible throughput and real-time responsiveness when processing millions or billions of tasks."
Just in time for Vista!
Dell's website consumer pricing generator.
Argh.
Imagine a Beowulf cluster of-- oh. Wait.
"No freeman shall ever be debarred the use of arms." -- Thomas Jefferson
How many hours does it take Vista to boot on this thing?
GAHH!
There is no such thing as "massively parallel!" It makes no sense! Parallel is qualitative, NOT quantitative! Things are either parallel or they're not; there are no degrees of "parallelness!"
The same applies to "massively multiplayer!" It's no wonder that people can't grasp basic logic when they insist on talking like this!
Shouldn't computer savvy folk notice these sorts of things?!
Looking at his examples (Search, Ecommerce, Software-as-a-Service, Infrastructure-as-a-Service, Fraud Detection) I have to think "wow, single point of failure". Lots and lots of fault-tolerance needed to put all your eggs in one basket like that.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Good point. A single point of failure not only causes your entire system to go down, but stops the several billion processes you're running all at once. How long would it take to get things running again if something simple stopped? How long if it's a processor that fries out? An hour? A day? Several days? How much money are you losing when that happens?
The first half of his list seems a bit flighty. They lean more towards buzz and less useful applications. But the second half is much more practical and likely. There are many potentially interesting applications coming up, but I don't think we'll directly see most of them publicly on the internet. So I give him a +0.5 Insightful.
Developers: We can use your help.
Yes ... it includes RFID tracking to reduce theft, and ... manage traffic!?!?
We need our next generations of supercomputers to follow you around, knowing where you are at all times ... so umm, we can change the traffic lights when the roads get busy for you ....
~Director of NSA Domestic Spying Program
There had better be a CPU dedicated to Error detection and correction!
"No freeman shall ever be debarred the use of arms." -- Thomas Jefferson
I've seen the things - trust me, they're massive!
The Big Crunch
Slow news day, huh?
Can we please have a "no links to random, boring blogs week" on Slashdot?
The term "massively parallel" indicates a system operating without those constraints.
Engineering is the art of compromise.
bah weep grana weep minibom
Imagine a Beowulf cluster of these. And will it run Linux?
"Using supercomputers to test the next-generation version of the SMP code, we get good scaling to many more cores than in the Intel prototype, and we expect to do even better in the future."
http://forum.folding-community.org/fpost166684.html#166684
http://fahwiki.net/index.php/SMP_client
My main side project is real time ray tracing software. It is very nearly not subject to Amdahl's Law. In the terminology of the Wiki article, F is approximately zero for Ray Tracing. It will scale very well past 10 cores and may well be able to make good use of 100 cores. Memory bandwidth seems to be the limiting factor (that determines F) but that may not be a problem with enough cache and good code. It's also the only potential mass-market use for a lot of cores. nVidia your days are numbered.
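To put rough numbers on that claim, here is a minimal sketch of Amdahl's Law. It assumes F is the fraction of the work that has to run serially (the comment above doesn't define it, but "F is approximately zero" scaling well only makes sense that way). The function name and the sample values of F are made up for illustration.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` processors under Amdahl's Law.

    `serial_fraction` is F, the share of the work that cannot be
    parallelized (assumed meaning of F, see the note above).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # The serial part runs at full cost; the rest is split across the cores.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Hypothetical values of F, chosen only to show the trend.
    for f in (0.0, 0.01, 0.05):
        for n in (10, 100):
            print(f"F={f:<5} cores={n:<4} speedup={amdahl_speedup(f, n):.1f}")
```

Even a small F bites hard at 100 cores, which is why the memory bandwidth caveat matters more than it might first appear.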
"Give me a SUPER number crunch."
"We have a 32.33, repeating, of course, percent chance of survival."
"That's better than we usually do."
"Never give up, for that is just the time and place when the tide will change." -Harriet Beecher Stowe ^_^
# Dense linear algebra
# Sparse linear algebra
What about Average linear algebra?
# Structured grids
# Unstructured grids
Are there any other types?
(** Warning: Car analogy...)
Isn't that kind of like selling a car and listing on the spec sheet:
# Goes slow
# Goes fast
Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
Bill McColl, for those who aren't familiar with him, was the driving force behind the bulk synchronous parallel (BSP) model of programming. This model, while available in the MPI-2 spec, is not widely used as is. Instead, its major contribution is inspiring remote direct memory access and the partitioned global address space, among others.
Last time we spoke, Bill said that he was interested in the issue of massively scaled computers that can handle fault tolerance pre-emptively. He compared today's supercomputers (Blue Gene, Cray XT4, Altix, etc) to a racing car that was really fast for a few hours a week, but wasn't even reliable enough to get the groceries. He was also interested in computers that can handle a continuous influx of data (as his blog post mentions), similar to managing millions of RSS feeds.
An example application domain for this stuff would be Wall Street firms that have to run time series analysis on streaming data. Prof. McColl is really on the right track here.
Fault-tolerance is either built into the problem or into the application. Take search, for example: if one search server on the backend that is handling 0.1% of the web sites goes down, you may not know or even care that those results are missing (assuming the system doesn't have something built in to give that query to another node searching the same dataset).
In fraud detection, thinking of the credit card companies, it's typically looking for patterns after the transaction has already gone through, and if one node of the cluster goes down, maybe you give the same transaction list to another node. You never find every case of fraud this way, but you want something that can search as many (or all) of the transactions as quickly as possible to reduce the time between the first instance and shutting down the account.
For the other examples, you just build it into the system, e.g. one HA broker on the front that can give out a task to another node if the first one goes down. When you build a system like this, single points of failure in the server farm aren't the concern. It's the mean time between failures and the process to replace nodes, the power and cooling requirements, failure points outside of the nodes, etc.
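A rough sketch of the "give the task to another node" idea, purely for illustration. The `NodeDown` exception, the `dispatch` function and the two toy nodes are hypothetical names, and a real HA broker would also need health checks, timeouts and its own redundancy.

```python
from typing import Callable, Sequence


class NodeDown(Exception):
    """Raised by a node that cannot take the task (hypothetical signal)."""


def dispatch(task: int, nodes: Sequence[Callable[[int], int]]) -> int:
    """Hand the task to each node in turn until one returns a result."""
    last_error = None
    for node in nodes:
        try:
            return node(task)
        except NodeDown as err:
            # This node is out; remember why and fall through to the next one.
            last_error = err
    raise RuntimeError("every node failed") from last_error


def dead_node(task: int) -> int:
    raise NodeDown("no response")


def healthy_node(task: int) -> int:
    return task * 2  # stand-in for the real work


if __name__ == "__main__":
    print(dispatch(21, [dead_node, healthy_node]))
```

The caller never sees the dead node, which is the point: what you end up worrying about is how often nodes die and how fast you can swap them, not any single one of them.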
Either that, or your imagination is lacking somewhat. Personally, I've wanted lots of cores since I was in kindergarten. I'm quite sure I can find a use for them all.
What? You are on drugs, yes? And not the good kind?
What about video encoding? Besides codec parallelism, you can also parallelize across the spans between keyframes, handing each chunk off to a core (or node) for processing. This is very mass-market - more and more people want to make snazzy home movies.
In fact, far more people would like to do this than render 3d movies.
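Here is a minimal sketch of the keyframe-chunk idea. Everything in it is an assumption for illustration: frames are plain strings, `encode_chunk` is a placeholder rather than a real codec, and a thread pool stands in for the separate cores or nodes a real encoder would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_at_keyframes(frames: Sequence[str], keyframes: Sequence[int]) -> List[Sequence[str]]:
    """Cut the frame list into chunks that each start at a keyframe index."""
    bounds = sorted(set(keyframes) | {0})
    bounds.append(len(frames))
    return [frames[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def encode_chunk(chunk: Sequence[str]) -> List[str]:
    """Placeholder 'encoder': each chunk is processed with no outside state."""
    return [frame.upper() for frame in chunk]


def encode_parallel(frames: Sequence[str], keyframes: Sequence[int], workers: int = 4) -> List[str]:
    """Encode every chunk independently, then stitch them back in order."""
    chunks = split_at_keyframes(frames, keyframes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in submission order, so the stream stays ordered.
        encoded = list(pool.map(encode_chunk, chunks))
    return [frame for chunk in encoded for frame in chunk]


if __name__ == "__main__":
    clip = [f"frame{i}" for i in range(10)]
    print(encode_parallel(clip, keyframes=[0, 4, 8]))
```

This only works because nothing between two keyframes depends on frames outside that span, so the chunks can be farmed out without talking to each other.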
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
As a standard power user running internet apps/office apps/video processing (home/tv)
At one point you have the app running on a core, the OS on one, the graphics on the GPU, the network on a cpu. You get lower latency because your app's cpu doesn't have to time slice with the others.
I can see parallel makes, conversion (wav2mp3, video formats etc), formatting (commercial skipping, panorama stitching). I/O is going to be the ultimate bottleneck.
What kind of consumer applications would benefit?
None of those offer or require real-time guarantees.
Google Alerts is here now.
A better article would have started with the table that defines "supercruncher" and proceeded to describe the architectural issues of building one. Ideally it would have addressed the software challenges.
Ecommerce has single points of failure? Where is that? In the replicated database? No, it must be in the replicated webservers.
Or maybe not. Maybe instead you don't know what you're talking about. Whodathunkit?
It's rare for an entire machine like that to fail. More likely is 1 processor board, or similar subsystem, which you can design for (I didn't get a result back, try again) in software, and, like the T3E which shipped with redundant processors, in hardware as well. If you have enough processors, you could stripe your job across several, so if one doesn't return a result, a second one will. Now, locating your only one of these machines in California might not be the best idea (we had an earthquake which started a eucalyptus grove fire, but don't worry, the mudslide put it out), but it's unlikely that you'll lose an entire one.
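The "stripe your job across several" idea can be sketched in software roughly like this. It is only an illustration under stated assumptions: `run_redundantly` and the two toy boards are hypothetical names, and threads stand in for real processor boards.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def run_redundantly(task: int, workers: Sequence[Callable[[int], int]]) -> int:
    """Give the same task to every worker and keep the first result that arrives."""
    if not workers:
        raise ValueError("need at least one worker")
    errors = []
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, task) for worker in workers]
        for future in as_completed(futures):
            try:
                # Leaving the block still waits for the slower replicas to finish.
                return future.result()
            except Exception as err:
                # One replica failing is fine as long as another one answers.
                errors.append(err)
    raise RuntimeError(f"all {len(workers)} replicas failed: {errors}")


def fried_board(task: int) -> int:
    raise IOError("processor board did not answer")


def healthy_board(task: int) -> int:
    return task * task  # stand-in for the real computation


if __name__ == "__main__":
    print(run_redundantly(7, [fried_board, healthy_board]))
```

The price is that you burn extra processors on duplicate work, so this only makes sense when you have cores to spare.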
Just to geek out for a moment, picture a system large enough to finally troll through all of that data NASA brought back from the Mariner missions, and cross-reference it against what they get daily now from the various Mars probes. Finally turn all of that data into information, as the blog says.
the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
100 cores is not massively parallel. The kind of scaling we're talking about is much higher. Think thousands of cores each with hundreds of threads.
This is the kind of scaling that weather centers are just starting to reach today. It's the kind of scaling that will require a radical rethinking of how consumer software is designed and what tools we need to make that design process easier.
In this world, software is king. You won't care who your chip vendor is. You'll care who provides your compiler, debugger, performance analysis tools and other such things.
Fascinating that a story purporting to be about supercomputers is actually a summary of Weightless Economy theory. The theory is that the wealthiest countries can't achieve more wealth by implementing things anymore. They can't increase their net worth by manufacturing or solving math problems. They have to turn instead to philosophical goals like people management, interpreting literature, creating works of art.
The supercomputer function is still the same. It still solves algebra, n-body methods, structured grids, and finite state machines. The user of the supercomputer is different. The user is now living on $1 a day in Mongolia.
For the wealthiest countries to stay wealthy, they have to focus on not the computing part but marketing the computing, creating the interface to the math, managing the business around the computing.
really? you pulled the T3E out of your ass? Well done, sir. So yeah, this is why there's an emerging market and lots of research going into things like predictive failure models and checkpointing, where you can have a backend engine throw up a flag whenever it detects the conditions of a likely failure, checkpoint, and move. This search stuff. You can see vendors looking at running non-MPI-ified code on machines that embrace the MPP model; this opens the door for running massive installations like Google on a BlueGene and a filesystem cluster (I know I just said that and it's not so feasible with BlueGene/L's model of partitioning, but I think that's a minor limitation).
As used in the field of "real-time computing/systems," satisfying time constraints is a correctness criterion, not simply a performance metric.
Doug Jensen
It means more than just a few nodes, Mr. pedantic dipsquat. Apparently, everyone on the planet but you knows what it means. Parallel processing done in a massive manner with highly tweaked software to take advantage of the processors and i/o, as opposed to two old pentium ones with a crossover cable and 5 lines of JS. And then run through market speak to make it sound cool. And that's it. Sure, you can make it sound stupid like a standup comic pointing out the old "pair of pants" routine, but that's all it is, too, a word joke. Get over it, george carlin and steve martin and robin williams and various other professional funny guy fast talkers and language noticers do it better than you, PLUS get paid well for it, PLUS it's fun to listen to, at least the first time. "jumbo shrimp" "driveway or parkway?" and etc.
Some time ago while doing some research into 'massively parallel' applications for a bio-research company I wrote an auto scaling hack on top of the Pov Ray PVM port. It worked fairly well at monitoring cpu loads across a network, dicing up the scenes to be rendered and shipping off chunks of work to various CPUs as they were available.
Overall the research project covered scaling from the CPU/core through cache to DRAM to disk to network even up to the point of when you'd have to actually scale the dispatcher in order to keep all of the processors busy. It was interesting stuff and produced some nice graphs of performance curves clearly indicating what was the bottleneck for each type of computing problem that we evaluated.
Damn it was nice working for that company.
Good judgement comes from experience, and experience comes from bad judgement.
- W. Wriston, former Citibank CEO