Beowulf In Business
"Cnet has a story on how businesses are starting to use Beowulf for those heavy duty tasks," writes NIB. The story notes, "Beowulf wouldn't be good for a program that executes large numbers of transactions, such as an airline
reservation system. It would, however, work for business tasks such as deciding how to design an assembly line
or which mix of currencies to buy, said International Data Corporation analyst Dan Kusnetzky."
MOSIX can do what Beowulf cannot (transparent process migration), and it's GPL'd too... they should have at least commented on it...
Yeah, that was one of the suggestions but we need a chess source program to recompile for the project and then, as you point out, we need someone to actually challenge it.
This was to be an ad hoc "Stone Soupercomputer" style configuration built out of machines brought to the conference by attendees.
One big one is simulations for financial calculations. One such is roughly this: the price of some class of security is sensitive to interest rates. So you want to see what happens if, at several time steps, the interest rate goes up or down some small amount. Evaluating the different 'paths' of interest rates over time lends itself to parallel processing.
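A minimal sketch of that path evaluation in Python, under assumptions that are not from the post: the starting rate, the step size, and the single cash flow are made-up illustration values, and a thread pool stands in for the cluster nodes. The point is only that each up/down path can be valued independently, so the paths can be dealt out to workers and the partial sums combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def path_value(moves, r0=0.05, step=0.0025, cash_flow=100.0):
    """Present value of one cash flow paid at the end of a single rate path.

    All default values here are hypothetical, chosen only for illustration.
    """
    rate = r0
    discount = 1.0
    for move in moves:           # move is +1 (rate up) or -1 (rate down)
        discount /= 1.0 + rate   # discount this period at the current rate
        rate += move * step      # then shift the rate for the next period
    return cash_flow * discount


def chunk_total(chunk):
    """Sum the values of the paths handed to one worker."""
    return sum(path_value(moves) for moves in chunk)


def price(n_steps=10, n_workers=4):
    """Average value over every up/down path, split across workers."""
    paths = list(product((+1, -1), repeat=n_steps))
    # Deal the paths out round-robin; no chunk depends on any other.
    chunks = [paths[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        totals = list(pool.map(chunk_total, chunks))
    return sum(totals) / len(paths)
```

On a real cluster each chunk would be shipped to a node through something like PVM or MPI rather than a thread pool, but the decomposition would look the same: scatter the paths, sum locally, gather the totals.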
MOSIX.
As long as there was not one huge shared database, of course. And to dump old databases to semi-offline storage, a second backbone could be installed in these file servers to push the data onto backup servers, which would merge the databases again, and write them out. A lot of investment in hardware, but much less than a similar proprietary system.
There are going to be some interesting developments in this area for Linux soon, even if it means I'm going to have to start them myself.
Besides being very poorly worded, your argument that "The definition of a Beowulf requires an 'open source' OS" is simply incorrect. If you look back at why/when Beowulf was created (http://www.beowulf.org/intro.html), you'll find that it was from "their [the creators of Beowulf] idea of providing COTS (Commodity off the shelf) base systems to satisfy specific computational requirements."
Whether or not they used an open-source operating system is not the point. The goal was to provide an MPP system for as little cost as possible. If a collection of Tru64 UNIX workstations operating in a Beowulf cluster provides more computing performance at less cost than a similar system from a major MPP vendor, then it seems to meet the criteria for why Beowulf began. Sure, it might cost more than the same Alpha workstations running Linux, but it is also likely to perform better. Life's little tradeoffs are everywhere, aren't they?
Cheers,
David Hull
david.hull@england.com
With new distributed computing software (Mosix anyone?) more and more people are going to write software for clusters. Definitely there are issues with db coherency, record locking, etc., but solutions will be implemented; after all a cluster is pretty much the only way to increase throughput if an SMP box is not fast enough for you...
--------- Webmaster, http://www.cpureview.com and
I have set up a few Beowulf machines for S&G. I used PVM, RH Linux 6.0/5.2, a 10/100 switch, and about 6 boxen. It worked quite well, except that it took a few days to get it operating how I wanted. I wrote a couple of applications to crunch numbers across the cluster, tested throughput, etc. For even more S&G I used MP3PVM to rip a few CDs real fast. Fun!
Now this is all well and good, but wouldn't it be great if we could have a transparent virtual machine that runs across all the nodes? Something which you could use "/bin/bash" on as your command shell.
Now, I am not sure how this would be accomplished. For instance, how would you efficiently share memory across machines, or decide how to break up tasks (breaking on threads would be one way)? This is just to open up conversation.
Imagine: Lower your SETI@Home WU time to mere seconds :) (is it fair to run a distributed computer under a distributed computer?)
-AP
Strange, they said that it wouldn't be good for large-volume transactions. Isn't this exactly the sort of task that works really well concurrently? It seems to be the perfect candidate: lots of non-interconnected tasks, perfect for multiple execution.
It may have something to do with the fact that databases need to remain "consistent", such that only one operation is performed on a record at once. The problem might be in making sure that only one box "owns" a record at once. You would need to make that record unavailable to all the other nodes, or let them know not to use it. If the network is high latency, it could be a problem.
:)
of course, with gigabit ethernet...
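To make the "only one box owns a record" idea concrete, here is a rough single-process sketch in Python. The class name, method names, and record IDs are all hypothetical, and a real cluster would need every acquire and release to be a network round trip to whichever machine holds the table, which is exactly where the latency would bite.

```python
import threading


class RecordLockTable:
    """Toy ownership table: at most one node may own a record at a time."""

    def __init__(self):
        self._owners = {}                # record_id -> owning node
        self._mutex = threading.Lock()   # protects the table itself

    def acquire(self, record_id, node):
        """Return True if `node` now owns (or already owned) the record."""
        with self._mutex:
            owner = self._owners.setdefault(record_id, node)
            return owner == node

    def release(self, record_id, node):
        """Give the record up; only the current owner is allowed to."""
        with self._mutex:
            if self._owners.get(record_id) == node:
                del self._owners[record_id]
                return True
            return False


table = RecordLockTable()
table.acquire("seat-12A", "node1")   # node1 takes the record
table.acquire("seat-12A", "node2")   # refused while node1 holds it
table.release("seat-12A", "node1")   # now node2 could try again
```

Every transaction pays for at least one such exchange per record it touches, so a reservation-style workload ends up waiting on the network rather than on the CPUs.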
"Subtle mind control? Why do all these HTML buttons say 'Submit' ?"
ReadThe ReflectionEngine, a cyberpunk style n
I think the problem with the transaction systems is that they top out on a different bottleneck. CPU isn't the major gating factor. Multithreaded applications will take great advantage of this type of system. One application that I worked on in a previous life was a credit card limit verification system for a major player. They had a 1-second transaction turnaround specification. We ended up setting it up with discrete machines with a failure rollover mechanism involved. Much of the coordination we had to design would have been far easier in a coupled system like Beowulf.
There's a movement on to put together a large Beowulf cluster for the Boston Geekfest in October. One of the things we're trying to come up with is a good demo that actually shows something to the crowd. We've had ideas from the realtime rendering of POV scenes to decryption (yeah, right, watch it hum for 20 hours and then spit out the true key) but haven't come up with a "killer demo app". Email if you have any ideas.
Now all I need to do is get ahold of about 100 of those POWER4 IBM chips when they are released, build 25 quad-processor 1 GHz machines with a 500 MHz bus and 1 GB of memory, and throw 'em all in a cluster. Add 2-10 terabytes of secondary storage and multiple OC-3 or faster connections, and start leasing space on the fastest machine in the world. Handles 156E+10^8 hits/sec while doing recursive database lookups.
heh... If Only...
www.mp3.com/Undocumented
SETI is a good example of the type of data which lends itself well to being handled by a cluster. However, in this case you could just run the SETI@home client on each of the 386s and get the same result. It's not a cluster, but SETI@home is a great example of distributed computing.
I just read the article. As a manufacturer of "turn-key" Beowulf systems, here was my reply to the author:
Stephen,
I just read your story about Beowulf systems. While the story was well written and informative, there are some points that you have missed.
1) The definition of a Beowulf requires an "open source" OS (see "How to Build a Beowulf" by Sterling, Becker, Salmon, Savarese). Therefore, systems built from Tru64 are NOT Beowulf systems.
2) You missed my company, Paralogic Inc. We sell turnkey Beowulf systems. In fact rather than "several" as reported by IBM, we have several dozens of installed production systems at companies like Lucent, Amerada Hess, Conoco, Procter and Gamble, government sites like NASA, NRL, and the Air Force, and many Universities. (see www.xtreme-machines.com)
3) There is a rather huge barrier to entry because of the technical nature of these machines. As far as I know, we are the only company who will offer support for Beowulf clusters. Without support the market can never enter the mainstream.
4) There have been quite a few other people who contributed quite a lot of effort to the Beowulf technology other than IBM and VA Linux. Although all contributions are welcome, these guys are a little late to the party and we hope they stay.
Sincerely
Douglas Eadline, Ph.D.
President
Paralogic, Inc.
PEAK PARALLEL PERFORMANCE
hehe.. no joke.. I have a stack of 5 386s w/ 8 MB (I think) that I've been threatening to turn into a cluster. But what would I use it for? Maybe get a SETI@home client running? :)
I just got three P120s with an unknown amount of RAM and hard disk space, but it will probably be at least 8 megs and a gig... :)
Other than learning a new technology, I don't have a real use for a parallel processing machine; I just do basic PHP -> MySQL stuff with small DBs...
However, it sounds like a really cool thing to set up, and I want to learn. A lot of these sites talk about Beowulf a lot, but don't give an explanation of how to set one up! Either I'm looking at the wrong sites or I just don't know how to do it..
Can anyone point me in the right direction or give me some tips on doing this? And responses like *just give me those three PCs* are appreciated but will be ignored
Thanks for the help
jc
--yep. i'm a NEWBIE!!!
Beowulf is best for CPU-intensive tasks which can be broken up easily, don't require a lot of internode communication, can deal with relatively high latency on the internode communication, and can deal with single node failures easily.
This is a relatively large domain of problems, but it doesn't work for everything. A lot of business applications require high reliability and availability. If you use Beowulf, you have to implement these features for your application on your own.
The simulations that businesses are running on these things aren't really in the same league. For the most part, they aren't time critical, and if a failure occurs that invalidates a test run, they can usually be rolled back to some midpoint and started again without a significant loss of time.
Beowulf isn't just useful for CPU intensive tasks though. All those processors also provide significant amounts of memory bandwidth and all those machines provide potentially large amounts of disk storage and bandwidth, but again, you need memory or disk intensive tasks that can easily be split out to many loosely coupled nodes.
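As a rough illustration of "implement it on your own", here is a small Python sketch of a task farm that survives a single node failure by handing the lost task to another node. Everything in it is hypothetical: the `NodeFailure` exception, the `run_task_farm` name, and the idea that a node is just a callable. It runs in one process and only shows the bookkeeping, not real networking.

```python
from collections import deque


class NodeFailure(Exception):
    """Raised by a (simulated) node that dies while running a task."""


def run_task_farm(tasks, nodes):
    """Run independent tasks round-robin over nodes, requeueing on failure.

    `tasks` are hashable work items and `nodes` are callables; both are
    stand-ins for real jobs and real machines.
    """
    live = list(nodes)        # nodes still believed to be up
    queue = deque(tasks)      # work that has not finished yet
    results = {}
    turn = 0
    while queue:
        if not live:
            raise RuntimeError("all nodes have failed")
        task = queue.popleft()
        node = live[turn % len(live)]
        turn += 1
        try:
            results[task] = node(task)
        except NodeFailure:
            live.remove(node)     # stop sending work to the dead node
            queue.append(task)    # the task is independent, so just rerun it
    return results
```

This only works because the tasks share nothing: rerunning one has no side effects on the others. A transaction system cannot simply rerun a half-finished update like this, which is part of why it needs the reliability features built in from the start.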
OTOH, if you're doing linear or integer programming, those are parallelizable if you're doing branch & bound stuff. But LP/IP doesn't always give you the granularity you need to make decisions as accurately as DES does (and can't account for the stochastic and dynamic nature of manufacturing processes).
-----------------------
To understand recursion, one must first understand recursion.