In the 1980s, police successfully handled protests despite all the examples of violent protests in the 1960s. So no, I don't think a race riot in 1992 had a significant effect on why the police are beating the crap out of non-violent protesters in 2000.
Huh? This system can do genetic pattern matching, but it's far less cost effective than a pile of small machines. Fortunately, the people who actually spend millions of dollars on machines to solve problems like gene matching investigate the problem more carefully than your friend.
Two companies doing this problem are Celera Genomics and Incyte. Incyte has a cluster of 1,200 x86 machines (3,000 cpus) running Linux. Celera Genomics has a cluster of 1000 Alpha cpus in 250 nodes; Celera purchased their machines before it had been shown that Linux could handle that kind of task.
And a company that specializes in getting fast storage for the movie industry is MountainGate.
I'm not so sure that even the rendering example is really valid. Much rendering treats rendering as an embarrassingly parallel problem: individual frames slow, entire movie fast. That's much more cost-effective.
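To make the frame-at-a-time point concrete, here is a minimal sketch in Python. The names `render_frame`, `render_movie`, and `wall_clock_hours` are hypothetical, the "renderer" is a stand-in that only returns a label, and a thread pool stands in for a farm of separate machines. It illustrates the idea and describes no real studio pipeline.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number):
    """Stand-in for a real renderer: no frame depends on any other frame."""
    return f"frame-{frame_number:04d}"


def render_movie(frame_count, workers):
    """Hand independent frames to a pool of workers; map() keeps frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_frame, range(frame_count)))


def wall_clock_hours(frame_count, hours_per_frame, machines):
    """Each machine renders whole frames, so total time shrinks with machine count."""
    return math.ceil(frame_count / machines) * hours_per_frame
```

With made-up numbers, `wall_clock_hours(1000, 2, 1)` is 2000 hours and `wall_clock_hours(1000, 2, 100)` is 20 hours, even though no single frame rendered any faster.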
And once again, we meet that fine line in the sand. Let's look at WTO. There were days of peaceful protest.
There were also groups who showed up, promising to cause a disruption and destroy property. So how are
you expected to deal with the protests as a whole?
I expect the police to enforce the law. In the 1980s, there were MANY protests which were largely non-violent, with a few violent people. The police dealt with most of them quite well. It's only now that "police riots" are happening repeatedly.
So much for the lessons of the past. And so much for police professionalism. Ready, aim, lawsuit!
Wouldn't any company that wants to be successful? No, successful companies obey the law, including the anti-trust law. Dominant companies don't get to play the same games that non-dominant ones do.
Mach is the granddaddy of distributed OS work? Heck, Mach wasn't even the first distributed OS developed at CMU. Hydra pre-dates it by more than a decade. Bill Wulf did quite a bit of work on it. The successor to Hydra is Legion, at the University of Virginia.
There are several real, full-featured distributed operating systems out there. One good example is Legion. It gives you the illusion of running programs on your desktop, while they are actually running lord-knows-where. Yes, you often need a lot of network bandwidth to get good results. Depending on the exact details, you can run programs on other machines with either no or small modifications.
Lest you think this has nothing to do with today's operating systems, the Linux desktop folks have started using CORBA quite a bit to link things together. Well, Legion provides much more powerful, secure, and reliable ways to do the same thing, in a much more consistent fashion.
Almost every 386+ OS has not used segments the way Intel intended. So yes, they've had quite a few years (more than a decade) to add an execute bit, if they actually cared.
What makes you think that the Smithsonian wants a huge NSA exhibit, as big as the NSA museum? The Smithsonian has limited funds, just like everyone else.
The Smithsonian dropped by the University of Virginia astronomy department and looked at the 5 generations of astronomical photographic plate measuring devices we have in the basement of our observatory, gathering dust. "Hey, you should build a museum for this. It's important stuff and should be preserved." Well, they didn't have the money to do it, and neither does UVa, but UVa hasn't junked the equipment; they're keeping it in a climate-controlled building until someone decides they care.
Yes and no. That instrument is a barrel piano or barrel organ, which only plays preprogrammed tunes. It was played by turning a crank, so it got named after the stringed instrument which preceded it.
I play the original, not the modern kind. In fact, the original stringed instrument survives to the modern era in French folk tradition.
If you have the budget for Fibre Channel fabrics at some point, at least look at the Global File System.
Our storage is Fibre Channel, and we did evaluate GFS. We found that CentraVision was superior for this customer, mainly because GFS didn't have journaling at the time. GFS may yet become quite superior.
And there are much larger SPs around and coming, like San Diego's and the second phase of NERSC's.
Myrinet has superior scaling when compared to the SP switch, or the T3E switch for that matter. The T3E switch did have higher bandwidths and lower latencies, but for many real supercomputing problems, Myrinet does the job for far less money.
The biggest way that this system doesn't compare well with the T3E is in programming models -- the T3E also supports the SALC model, shared address local consistency. I hope to support that in around 12 months.
The IBM SP doesn't support the SALC model, and has inferior per-processor bandwidth and latencies.
The first answer to your question is that we never have scheduled maintenance. Since the machine isn't monolithic, we can repair most parts while it's live.
This machine is nothing like the SGI SN-IA architecture. SN-IA is still shared memory, and has a significantly faster network (which is far less scalable and far more expensive). Whenever you share memory, you share failures.
We did provide a user-level checkpoint feature to FSL, but it requires the user to modify their program. Kernel-level checkpoint is on our list of things to do. It's not that hard for single processes -- Condor does it, for example -- but it's fairly tough for programs that use MPI and run in parallel.
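As a rough illustration of why user-level checkpointing means modifying the program, here is a minimal single-process sketch in Python. It is not the feature provided to FSL; the function names, the JSON state file, and the toy computation are all assumptions made for the example. The program itself has to decide what its state is, save it, and look for a saved copy on startup.

```python
import json
import os


def load_checkpoint(path):
    """Resume from a saved state if one exists, otherwise start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename, so a crash never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, steps, stop_after=None):
    """Toy computation (sum of 0..steps-1) that checkpoints after every step.

    stop_after simulates the job being killed after that many steps.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < steps:
        if stop_after is not None and done_this_run >= stop_after:
            break
        state["total"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        save_checkpoint(path, state)
    return state
```

A parallel MPI job is harder because every process would have to save a mutually consistent state, including messages still in flight, which this sketch does not attempt.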
A PCI bridge would itself pretty much be a network. Myrinet is a great interconnect and is probably much better than any big PCI bridge that you could come out with. The new InfiniBand specification allows bridging of the successor to PCI, but I suspect Myrinet's successor is going to be a better interconnect by the time InfiniBand machines are available.
This system does run regular software without recompiling. It just doesn't use a lot of CPUs for simultaneous compute unless you change the code to use MPI. But they can access the shared storage at high speeds without any change, and they can get farmed out to separate CPUs without any change.
A yellow dress?! Those are pluderhosen, not a dress. Pants. With pockets. Worn by manly Elizabethan men, who carry sharp pointy sticks to poke people who accuse them of wearing dresses.
Hurdy-gurdies aren't the same as organ grinders. I don't think monkeys were a part of the act in the 16th century.
They tell me that the first external users (20% of the machine) are going to be ocean modelers. But I think the FSL guys would disagree that it's far more interesting... all of these guys are pretty fanatical about what they do!
I think Greg's answer to this question, i.e. not understanding that the question was about running simulations outside of his cluster, is indicative of the "we've got to run our jobs on something that sits in a big air-conditioned room on our site" mentality.
You must be a great mind-reader.
No, I don't have a "big air-conditioned room" mentality. In fact, Legion is capable of harvesting unused processor cycles in a much more sophisticated fashion than distributed.net. However, weather forecasting needs too much bandwidth. You have to consider problems on a case-by-case basis for such a low-bandwidth system; most traditional supercomputer problems aren't appropriate.
This doesn't mean I think distributed.net isn't cool -- it's very cool, light-weight, and it gets its job done. It shouldn't be a surprise that it can't solve every problem.
No. As I pointed out, weather codes require a fair amount of bandwidth, much more than is available in a distributed.net situation. In addition, most weather codes assume that they're running on a uniform machine, so they'd have load-balancing problems if run on a distributed.net type system.
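The load-balancing point can be sketched with a little arithmetic in Python. This assumes a code that gives every node the same share of the grid and waits for all nodes at the end of each time step; the function name `step_time` and the speeds are invented for illustration and come from no real weather code.

```python
def step_time(cells_per_node, speeds):
    """Time for one synchronized time step.

    Every node gets the same number of grid cells, and the step is not
    finished until the slowest node is done.
    """
    return max(cells_per_node / speed for speed in speeds)


uniform = step_time(100, [1.0, 1.0, 1.0, 1.0])   # four identical nodes
mixed = step_time(100, [1.0, 1.0, 1.0, 0.25])    # one node runs at quarter speed
```

In this toy case the uniform machine finishes a step in 100 time units and the mixed one takes 400, because three fast nodes sit idle waiting for the slow one.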
The original poster meant that the document is copyrighted, not the concepts in it.
Right. Copyright is for published material. Trade secrets can't be published. As I said, you shouldn't play lawyer on /. if you don't know what you're talking about. And don't trust me, I'm not a lawyer either. But I paid attention back when AT&T was suing Berkeley over BSD. At the time, AT&T was asserting that the Unix source code was a trade secret, and wasn't copyrighted.
First, on what statistics did you beat SGI's machines on?
We beat SGI on performance on the customer's actual codes. If you have 1/10 the MPI latency and your machine costs 3 times as much, and the customer's codes don't get much of a benefit from reduced latency...
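To spell out that trade-off with numbers, here is a small Python sketch. Every figure in it is made up for illustration (the compute time, message count, latencies, and prices are not measurements of any real machine), and the model `run_time_s` is a deliberately crude assumption: compute time plus one latency per message.

```python
def run_time_s(compute_s, messages, latency_us):
    """Crude model: total time = compute time + one latency per message."""
    return compute_s + messages * latency_us / 1e6


def jobs_per_dollar(run_time, price):
    """Throughput bought per unit of money; higher is better."""
    return 1.0 / (run_time * price)


# Hypothetical job: 1000 s of compute and one million messages.
cluster = run_time_s(1000, 1_000_000, latency_us=10)    # 1010 s
low_latency = run_time_s(1000, 1_000_000, latency_us=1)  # 1001 s

# Hypothetical prices: the low-latency machine costs 3 times as much.
advantage = jobs_per_dollar(cluster, 1.0) / jobs_per_dollar(low_latency, 3.0)
```

Under these assumptions the machine with 1/10 the latency finishes under 1% sooner, and the cheaper machine delivers close to 3 times the work per dollar. A code that spends most of its time waiting on small messages would change the picture.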
The biggest nit I'm going to pick is your assertion of running a single system image.
That is true only if you can migrate processes between nodes in the cluster or transparently change your
interconnect fabric to keep nodes running the same job physically close.
You're pretty confused about what a "single system image" can be to different people. Try reading Greg Pfister's book. By the way, Myrinet's Clos topology is good enough that it doesn't matter where in the machine a job's processors are. That's an important factor simplifying the software that the FSL machine needs to get high performance. FSL tested for inter-job contention, and I suspect SGI flunked. The machine they bought had near-zero inter-job contention.
Further, it's not exactly an SSI if the sysadmin has to install the OS on every node or has a separate
console connection to every node.
We provide tools that give the sysadmin a single system image, too. There's nothing new there; people administering large clusters have had that for years.
Now, I'm not trying to say that clusters suck for all applications. They just aren't the solution to *every*
problem, as a lot of people claim they are.
I never said that clusters were the solution to every problem. But a cluster was a solution to FSL's problem.
I'll end with my sales pitch for traditional supercomputers
Please don't. We beat SGI's machines in the bid, and this machine provides both higher bandwidth than any SGI Origin machine (300 gigabits bisection bandwidth), and it also does provide a single system image for this customer, who only runs MPI programs. So numerous parts of your comment are wrong.
There were clusters of Crays before War Games came out.
CentraVision is a traditional proprietary product.
All they've released for Linux so far is a client, which is a kernel module. I'm not sure if they're going to release the metadata server for Linux.
first post?
Trade secrets can't be copyrighted. Consult a lawyer instead of playing one on
The storage area network hardware is the usual DDN Fibre Channel RAID combined with Brocade FC switches. That's not that exciting.
The software is the interesting part. It's the "CVFS" filesystem, which is from ADIC. They ported this filesystem to Linux for the FSL bid.
That's what the on-site engineer does: answers questions like yours.