A torus topology gives you the easiest way to get short point-to-point communication paths, because the wraparound links at the edges roughly halve the worst-case hop count compared with a mesh of the same size. It's better than a fat tree or a straight mesh-type topology.
The Cray XT3 and XT4 systems use an X-Y-Z physical connection (a 3D torus). X runs along the rows and through the modules within a cabinet (width), Y is vertical within a cabinet (height), and Z runs between the rows (depth).
This works fairly well from a maintenance AND a performance point of view. You can build other, more esoteric structures, but they trade off performance against maintainability. This one works pretty well for these systems.
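To show why the wraparound links matter, here is a small sketch that compares hop counts on a plain 3D mesh and a 3D torus of the same size. It assumes minimal dimension-ordered routing and a hypothetical 8 x 8 x 8 grid; the function names are made up for illustration, and it is not the actual XT3/XT4 routing algorithm or cabinet layout.

```python
def axis_hops(a: int, b: int, size: int, torus: bool) -> int:
    """Hops between coordinates a and b along one axis of `size` nodes."""
    d = abs(a - b)
    # A torus adds a wraparound link, so traffic can take the shorter way round.
    return min(d, size - d) if torus else d


def hop_stats(dims: tuple[int, ...], torus: bool) -> tuple[int, float]:
    """Worst-case and average hop count over all ordered node pairs.

    With minimal dimension-ordered routing the hop count is the sum of the
    per-axis hop counts, and for a uniformly random pair the axes are
    independent, so both the worst case and the mean can be summed per axis.
    """
    worst, mean = 0, 0.0
    for n in dims:
        per_axis = [axis_hops(a, b, n, torus) for a in range(n) for b in range(n)]
        worst += max(per_axis)
        mean += sum(per_axis) / len(per_axis)
    return worst, mean


if __name__ == "__main__":
    dims = (8, 8, 8)  # hypothetical X x Y x Z grid, not a real machine's layout
    for label, torus in (("mesh", False), ("torus", True)):
        worst, mean = hop_stats(dims, torus)
        print(f"{label:5s}: worst case {worst} hops, average {mean:.3f} hops")
```

On this grid the mesh comes out at a worst case of 21 hops (7.875 on average) and the torus at 12 hops (6.000 on average), which is the "short point-to-point path" advantage in numbers.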
Don't write off the ACF too much. They have a couple of interesting systems in house, not just HECToR. HECToR just gets the attention because of how big it is and how much it cost.
I spent two weeks in Scotland in the first part of December looking for a house. It's a beautiful area. It seems we'll be living in Newtongrange, only an 8-mile commute to work. The family is going to love it there.
I can't wait to get there and show them the area. We'll be in country around the 15th of January or so.
160 GB of hard drive vs. RAM? Not a good comparison. They didn't mention the SAN storage we have, the tape backups, etc. I know, I know. I'm being picky.
That's the first I have ever heard of Hector's House. I love it. I'll be passing on the link to the rest of the team involved. Thanks for the heads up. (-:
The follow-on system started off being called Son of HECToR, but politics have changed the name to Child of HECToR. What it will be is still up in the air, but you need to plan well in advance when building these animals.
The people who truly understood how our nuclear stockpile works are all getting older and retiring. For years and years it hasn't been a popular field of study with PhD students. So now we're in a situation where the people who know what's going on are retiring, and there aren't enough new folks coming down the line to replace them.
Add to that the fact that you can no longer actually set one of these beasts off and they have been sitting idle for decades. What's the state of the current stockpile?
The only way to predict how these decades old weapons will react is to simulate them with a supercomputer as best you can.
The article really didn't say much about HECToR itself. It's a 60-cabinet Cray XT4 system that currently has over 5,500 AMD dual-core processors. We'll be upgrading it in stages over the next couple of years to over 250 teraflops, including some cabinets of the new Black Widow vector product, now called the Cray X2.
The Cray team, which I'm part of, is actually a multinational effort. I'm a US citizen heading over to maintain the system, we have a Brit on the team, and the third member is also from outside the UK. It's an interesting situation: the biggest system in the UK, maintained by two expats and a local. (-:
ice_hawk55
I've implemented multiple GPFS file systems in the multi-terabyte range. It's a pretty robust file system. With full redundancy at the disk, controller, Brocade switch, and server level for each file system, I can still write at more than 3 GB/s and read at better than 3.5 GB/s. And that was a design aimed at redundancy, not performance.
One example: 20+ TB of FAStT fibre-attached storage. After Katrina it went through four "SURPRISE" power outages that cost us 12 disks, and I still didn't lose a single byte of the customer's data. GPFS can be pretty robust if it's implemented correctly.
I'd have no qualms about putting together a petabyte of GPFS file systems.
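One way to sanity-check a claim of full redundancy at the disk/controller/switch/server level is to model the storage paths as a graph and fail one component at a time. The sketch below does that for a hypothetical dual-pathed layout (the server, switch, controller and LUN names are made up). It models only connectivity, not GPFS itself, and it assumes that individual disk failures inside a LUN are absorbed by the array's RAID.

```python
from collections import deque

# Hypothetical dual-pathed layout; every name here is illustrative, not a real config.
# Edges point from NSD servers -> fibre switches -> RAID controllers -> LUNs.
REDUNDANT = {
    "nsd1": ["sw1", "sw2"],
    "nsd2": ["sw1", "sw2"],
    "sw1": ["ctlA"],
    "sw2": ["ctlB"],
    "ctlA": ["lun1", "lun2"],
    "ctlB": ["lun1", "lun2"],
}
SERVERS = ["nsd1", "nsd2"]
LUNS = ["lun1", "lun2"]


def reachable(links, src, dst, failed):
    """Breadth-first search from src to dst that skips failed components."""
    if src in failed:
        return False
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in links.get(node, []):
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, servers, luns):
    """Components whose loss leaves some LUN unreachable from every server.

    LUNs themselves are never failed here: disk losses inside a LUN are
    assumed to be handled by the RAID level of the storage array.
    """
    components = set(links) | {n for outs in links.values() for n in outs}
    spofs = []
    for comp in sorted(components - set(luns)):
        failed = {comp}
        if not all(any(reachable(links, s, lun, failed) for s in servers) for lun in luns):
            spofs.append(comp)
    return spofs


if __name__ == "__main__":
    print(single_points_of_failure(REDUNDANT, SERVERS, LUNS))
    # Drop the second switch and the design collapses to a single path.
    single_path = {k: [n for n in v if n != "sw2"] for k, v in REDUNDANT.items() if k != "sw2"}
    print(single_points_of_failure(single_path, SERVERS, LUNS))
```

The dual-pathed layout reports no single points of failure; removing one switch makes both the remaining switch and the controller behind it (`sw1` and `ctlA`) single points of failure.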
First, if I were you, I'd get hold of one of the local affiliated churches and talk with them. I'm not a churchgoer myself. In fact, before this I found organized religion to be something to put up with. However, since the storm I've been extremely impressed with the help the local churches have provided, and not just to their flock, or whatever, but to whoever needed the help. They have probably done more for individuals down here than any other organization. I've been very impressed.
Most city governments and the federal government will be of no use at all to you. In fact, I wouldn't be surprised if they told you to stay home. They are nothing if not disorganized at the individual town and community level.
I've been through several areas of Slidell. It's torn up, but you should be able to get just about anything you need locally there. You may be stuck in some ugly lines, but you can get almost all essentials at this point. Water, food, and meds can all be found there now. You probably will NOT find lodging without prior arrangement, though.
As for Pass Christian, it's hammered whale crap at this point. It's been pounded flat, rolled, and pounded again. You can find all you need within about 10 miles of it, though. Gulfport has all the basic essentials: food, water, meds, etc. Again, expect some ugly lines. In fact, I had a wonderful dinner at Logan's Roadhouse in Gulfport this past Sunday. Busy as sin, but oh, hot, excellent food.
Before you get here, brace yourself. Seeing some of this will tear a person up. It won't be the destruction or the smells (which were horrid, to say the least), but you will find something that hits you like a ton of bricks. It can be something as simple as a dog starving on the street, or finding someone's baby pictures, whatever. But be ready. It will give you nightmares. And it WILL happen. This is like a war zone, with all of the good and bad and horrid that's involved.
(For me the breaking point was hearing, after 4 days of uncertainty, that my daughter's friends from across the street were alive. I know, a stupid reason to break down. But they rode out the storm in Pearlington, MS. Look it up and you'll understand why I held no hope.)
Expect to help on an individual basis. Get down here and help out a couple of people who really need help. Don't expect to rebuild a community. Expect to help a couple of people find some hope. Help 'em get things moving in the right direction again: clean up a yard, tear out some carpet, rip out the sheetrock, whatever. I think you'll find the folks down here don't want charity. They just need some help getting back on their knees.
We've had our @$$es handed to us. We don't need to be rebuilt from outside. We just need the outside to help us up enough that we can rebuild ourselves. Don't take that the wrong way. We need help, and we are so thankful for it. But we will get back on our feet. We will stand up and rebuild. That's what makes this place what it is.
If you do get down here, ping me. I'll buy you and your crew a cup of coffee. I'm in Diamondhead, MS, which is right between Slidell, LA and Pass Christian, MS. It's been hell at times. But I've seen neighbor help neighbor in everything from the simplest of things to giving the shirts off their backs. It's heartwarming to experience.
Good Luck. God Bless.
Alive and Kicking from the Mississippi Gulf Coast.
You'll find that, unless a single user or job is using the whole system, running at roughly 80% of full load is a good number. Imagine you have a 256-node system: 256 compute nodes plus the miscellaneous file system nodes, interactive nodes, etc.
Each node has 8 CPUs and 16 GB of RAM (take the IBM p655 cluster as an example), so that's 2,048 compute processors. Now you have 50 to 100 users, each doing different research. Some codes are memory-bound, some CPU-bound, and a few are communications-bound.
Now, how do you get several hundred to several thousand of these assorted codes pumped through a single cluster? Job "A" runs on 128 nodes (1,024 processors) and takes about 7 hours. Jobs 2 through 40 run on 2 to 128 nodes and take anywhere from 1 to 72 hours. And so on. It's like a crazy three-dimensional game of Tetris, with the X axis being nodes, the Y axis being processors, and the Z axis being run time. Add queue structures, priority jobs, etc. on top of that.
In other words, running at 80% or so in a diverse environment is actually pretty good. Sad to say.
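The Tetris analogy can be made concrete with a toy scheduler. The sketch below uses strict first-come-first-served ordering with no backfilling, a deliberately simple policy and not what a production batch system would necessarily run. It allocates whole nodes, so the processor axis follows from 8 CPUs per node, and it uses job A from the text plus three made-up jobs B, C and D.

```python
import heapq

TOTAL_NODES = 256   # the 256-node system from the example
CPUS_PER_NODE = 8   # p655-style nodes, so 128 nodes = 1,024 processors

# (name, nodes, hours). Job "A" is from the text; B, C and D are hypothetical.
JOBS = [
    ("A", 128, 7),
    ("B", 192, 2),
    ("C", 64, 10),
    ("D", 32, 1),
]


def fcfs_schedule(jobs, total_nodes=TOTAL_NODES):
    """Strict first-come-first-served: jobs start in submission order, no backfill.

    Returns (starts, makespan): start hour per job name, and the hour the last job ends.
    """
    running = []  # min-heap of (end_hour, nodes) for jobs currently holding nodes
    free, now, makespan, starts = total_nodes, 0.0, 0.0, {}
    for name, nodes, hours in jobs:
        if nodes > total_nodes:
            raise ValueError(f"job {name} wants more nodes than the machine has")
        # Wait for the earliest-finishing jobs until enough nodes are free.
        while free < nodes:
            end, released = heapq.heappop(running)
            now = max(now, end)
            free += released
        starts[name] = now
        heapq.heappush(running, (now + hours, nodes))
        free -= nodes
        makespan = max(makespan, now + hours)
    return starts, makespan


def utilization(jobs, makespan, total_nodes=TOTAL_NODES):
    """Fraction of the available node-hours that jobs actually used."""
    used = sum(nodes * hours for _, nodes, hours in jobs)
    return used / (total_nodes * makespan)


if __name__ == "__main__":
    starts, makespan = fcfs_schedule(JOBS)
    for name, nodes, hours in JOBS:
        print(f"job {name}: {nodes} nodes ({nodes * CPUS_PER_NODE} CPUs), "
              f"starts at hour {starts[name]:g}, runs {hours} h")
    print(f"makespan {makespan:g} h, utilization {utilization(JOBS, makespan):.1%}")
```

With these four jobs the run finishes at hour 17 with about 44.9% utilization: for the first 7 hours half the machine sits idle because job B is waiting for 192 nodes and strict ordering keeps C and D from jumping ahead. Real schedulers add backfilling and priorities to fill gaps like that, which is part of why a busy mixed workload can get into the 80% range.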
Icehawk.
Yes, it is. Contrary to the flaming replies, I feel your question was actually a good one. If you look at the definition of a supercomputer (and that's a debated definition, by the way), it basically boils down to a computer on the leading edge of speed and performance. So whether the computer is a Beowulf-like cluster (an IBM eServer Cluster 1600 is pretty much the same thing) or a new Cray X1 system, both are supercomputers.
Many companies are using Beowulf-style systems to do genetic research that would have been impossible five years ago. If you doubt this, go to the Institute for Systems Biology website and see what they are doing. With Linux clusters, no less.
Lawrence Livermore is getting a SUPERcomputer from IBM that basically consists of a huge cluster of POWER4 systems to do nuclear research. Etc., etc.
A little research will show that most research and development now takes place on distributed systems, NOT Cray-style vector systems. Don't get me wrong. Cray makes one hell of a good system. But most shops can't afford one. So out come the distributed systems.
Remote, YES. Out of touch? No way. Where else in the world can you play with 2 Crays, an NEC, IBM SPs, SGIs, a Linux cluster, etc., and then have to stop so everyone can check out the moose walking through the parking lot? Then get on the phone with someone from Lawrence, Sandia, or the HPCMO. No traffic, no gangs, and yet it's still one of the highest-tech centers in the nation. Way cool place!
I find it fascinating that people keep trying to compare a machine like this to a PC-style system. It's like comparing a freighter and a ski boat. Sure, the ski boat can go just as fast, if not faster. But try using it to get any real work done. Hmph! Apples-and-oranges comparison.
Let's see someone use a PC to run an ocean model, or to calculate where that devastating typhoon is going to hit. Sure, it may do it. Unfortunately, your answer is going to take years to get. A little late.
So you say, do it with a Linux cluster. Sure, you may be able to do the same type of work. People are. But those are all specialized programs. Try using one cluster for ocean code, weather, fluid dynamics, bioinformatics, magnetosphere prediction, etc. You could very well do one, or maybe two, on a cluster. But this new Cray/NEC will run all of these codes and then some, all at the same time.
I took a job here at the ARSC. It's one of the best places I have ever worked. The temperature swing is something to get used to (-66 °F and +99 °F are the record low and high for Fairbanks). Imagine working at one of the highest-tech sites in the country with NO traffic. You get to know all your neighbors. And you get the aurora. It is awesome. Anyone who gets a chance to work with the people at the ARSC should jump at it. It's been a kick.