Slashdot Mirror


30,000-Core Cluster On Amazon EC2

Joining the ranks of accepted submitters, hooligun writes with an article in Ars Technica about a rather large cluster built on EC2. From the article: "The details are impressive: 3,809 compute instances, each with eight cores and 7GB of RAM, for a total of 30,472 cores, 26.7TB of RAM and 2PB (petabytes) of disk space. Security was ensured with HTTPS, SSH and 256-bit AES encryption, and the cluster ran across data centers in three Amazon regions in the United States and Europe."

59 comments

  1. use? by Anonymous Coward · · Score: 0

    what exactly was it used for?

    1. Re:use? by ge7 · · Score: 0

      RTFA. It says for viagra and other pharma stuff.

    2. Re:use? by bigjocker · · Score: 2, Funny

      They are using it to pump the economy. The heat produced by this cluster must be removed with extra air conditioning systems, increasing the demand for power and for air conditioning units, thus creating new jobs and incentivizing research into new energy sources.

      --
      Life isn't like a box of chocolates. It's more like a jar of jalapenos. What you do today, might burn your ass tomorrow.
    3. Re:use? by Anonymous Coward · · Score: 0

      There's no way. That would have made it into the title and the summary; you know how we love BitCoin.

    4. Re:use? by jsnipy · · Score: 0, Offtopic

      to run Crysis

      --
      -- if you mod me down, I will become more powerful than you can possibly imagine
    5. Re:use? by 0123456 · · Score: 1

      to run Crysis

      Ray-traced :).

    6. Re:use? by hedwards · · Score: 1

      Hacking the Gibson?

    7. Re:use? by gregrah · · Score: 1

      I love the fact that there are at least 5 answers above mine, and no one has actually RTFA, so no one actually knows.

  2. Why explain Petabytes by Anonymous Coward · · Score: 0

    If you don't know the scale from yocto to yotta, then you need to hand in your geek card.

    1. Re:Why explain Petabytes by tehcyder · · Score: 1

      If you don't know the scale from yocto to yotta, then you need to hand in your geek card.

      How many digits do you know pi to? I always say that if you don't know at least the first thousand, you're no geek, and should have your geek card forcibly removed at gunpoint.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  3. First time accepted submitter Beowulf by clyde_cadiddlehopper · · Score: 1

    Imagine the possibilities. /. on steroids.

    --
    Obi-Wan: "I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were sudden
  4. HTTPS for security? by ChipMonk · · Score: 0

    Let's hope their European nodes didn't use any certs from Diginotar.

    But at least they weren't using RSA tokens for authentication.

    1. Re:HTTPS for security? by Anonymous Coward · · Score: 0

      Are you serious or just trolling? What does private infrastructure + HTTPS have to do with public certificates? Yeah, nothing.

    2. Re:HTTPS for security? by Anonymous Coward · · Score: 0

      Security was ensured with HTTPS, SSH and 256-bit AES encryption

      This is more like marketing copy than a Slashdot story. These are just technologies/techniques. How they're used determines security. Everybody has SSH, yet some of them get hacked.

    3. Re:HTTPS for security? by Anonymous Coward · · Score: 0

      Let's hope the USA nodes aren't under the jurisdiction of the...oh. =/

  5. $1279 per hour by certron · · Score: 2

    Before anyone else asks what I was about to, the full title of the article is: $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud

    How does that compare to the cost-per-core-hour for other Amazon EC2 offerings? Is this a value meal deal or just a lot of burgers?

    --

    fair.org counterpunch.com truthout.com indymedia.org salon.com
    eff.org guerrilla.net debian.org gentoo.org
    1. Re:$1279 per hour by mikeytag · · Score: 3, Informative

      The article said each instance had 7GB of memory and 8 cores. That would translate to the High-CPU Extra Large Instance Type:

      High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of local instance storage, 64-bit platform
      Source: http://aws.amazon.com/ec2/

      That instance type will run you $0.68/hour standard or $0.24/hour spot. (US-East Pricing) (Spot pricing allows you to take advantage of unused EC2 instances at a discount. Also worth noting is that spot pricing changes over time.)

      30,000 cores equates to 3,750 instances across different regions. Here is the breakdown on hourly pricing for standard and spot. (Reality is it was probably a mixture of both and the pricing for different regions varies).

      Standard US-East: $2,550/hour
      Spot US-East: $900/hour

      The exact mix of machines in each region wasn't specified but $1,279/hour sounds about right if there is a mix of standard vs spot across different regions.
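
      The arithmetic above can be reproduced with a short Python sketch. The rates are the US-East figures quoted in the comment; the function names are hypothetical, and the spot-share estimate assumes every instance was billed at US-East prices, which the comment itself says was probably not the case.

```python
import math

CORES_PER_INSTANCE = 8   # High-CPU Extra Large, as quoted above
STANDARD_RATE = 0.68     # USD per instance-hour, US-East standard (quoted)
SPOT_RATE = 0.24         # USD per instance-hour, US-East spot (quoted; changes over time)


def hourly_cost(total_cores, rate_per_instance_hour):
    """Return (instance count, total USD per hour) for a given core count."""
    instances = math.ceil(total_cores / CORES_PER_INSTANCE)
    return instances, instances * rate_per_instance_hour


def implied_spot_share(blended, standard, spot):
    """Fraction of spot instances that would yield the blended hourly price.

    Assumption: a single price per pricing model, ignoring regional differences.
    """
    return (standard - blended) / (standard - spot)


instances, standard = hourly_cost(30_000, STANDARD_RATE)
_, spot = hourly_cost(30_000, SPOT_RATE)
spot_share = implied_spot_share(1_279, standard, spot)

print(f"{instances} instances")
print(f"standard: ${standard:,.0f}/hour, spot: ${spot:,.0f}/hour")
print(f"implied spot share at $1,279/hour: {spot_share:.0%}")
```

      Under those simplifying assumptions, the $1,279/hour figure would correspond to roughly three quarters of the instances running at the spot price.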

    2. Re:$1279 per hour by Anonymous Coward · · Score: 0

      Once you get past the initial administrative costs, commercial clusters usually only discount for guaranteed subscriptions or making your jobs lower-priority in the queues. I've never heard of a cluster giving bulk discounts, perhaps because it's rare for a reasonably-designed cluster to lack customers. If anything, a large cluster user would cause more headaches for the people running the cluster -- sort of like how larger dinner groups are charged a large gratuity because they cause a headache in coordination and making sure other customers are well-served.

    3. Re:$1279 per hour by geekthecat · · Score: 1

      It's more like a SH!t burger! Wow! The performance is not impressive at all, yet the money they're making is. Because the servers they are using are probably IBM POWER7. The article states that last June 7,000 cores on EC2 were capable of ranking 232 on the Top 500 list of supercomputers with a performance of 41.82 Teraflops. So, looking over the list to compare how a 7,000-core POWER7 system would do, we see at rank 50 a 6,912-core POWER7 system with a performance of 212.12 Teraflops.

      Now let's up the cores to 30,000: (30/7 = 4.28571), so 4.28571 * 41.82 = 179.22857 Teraflops for an INTEL i7 solution, and (30/6.9 = 4.34783), so 4.34783 * 212.12 = 922.2617 Teraflops for an IBM POWER7 solution. Wow! IBM's POWER7 has over 5 times the performance of INTEL's i7.

      So the cost of 30,000 POWER7 cores would be (30,000/16 = 1,875) 1,875 PS702 blade servers, each at a cost of $16,544.00, so the total cost would be (1,875 * $16,544.00 = $31,020,000.00) roughly $31 million. The cost of 30,000 INTEL i7 cores would be (30,000/12 = 2,500) 2,500 HP ProLiant BL2x220 blade servers, which would cost (2,500 * $15,059 = $37,647,500) roughly $37 million.

      Now if AMAZON charges $1,279 per hour for 179.22857 Teraflops from 30,000 cores of i7 processors, then 30,000 cores of POWER7 would deliver 922.2617 Teraflops, an increase in performance by a factor of (922.2617/179.22857 = 5.14573), so 5 times the performance, 5 times the profit. Profit would be (5 * $1,279.00 = $6,395.00) per hour, so $6,395.00 * 24 hours * 365 days = $56,020,200.00 using IBM's POWER7, or ($56,020,200.00 / 5.14573 = $10,886,735.2154) about $11 million using INTEL's i7 processors. So in conclusion, you'll pay more for INTEL's i7 products and they're 5 times slower than IBM's POWER7. Just goes to show how many idiots there are in IT.
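
      The extrapolation above assumes that benchmark performance scales linearly with core count, which is the comment's assumption rather than a measured result. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and it uses the stated 6,912 cores instead of the comment's rounded 6.9 thousand, so the POWER7 figure comes out slightly lower.

```python
def scale_linearly(measured_tflops, measured_cores, target_cores):
    """Extrapolate a benchmark score assuming perfect linear scaling.

    Real clusters rarely scale this cleanly; this mirrors the comment's
    back-of-the-envelope method only.
    """
    return measured_tflops * target_cores / measured_cores


TARGET_CORES = 30_000

# Figures quoted in the comment: 41.82 TFLOPS on 7,000 EC2 cores,
# 212.12 TFLOPS on a 6,912-core POWER7 system.
ec2_tflops = scale_linearly(41.82, 7_000, TARGET_CORES)
power7_tflops = scale_linearly(212.12, 6_912, TARGET_CORES)
ratio = power7_tflops / ec2_tflops

print(f"EC2 extrapolated:    {ec2_tflops:.2f} TFLOPS")
print(f"POWER7 extrapolated: {power7_tflops:.2f} TFLOPS")
print(f"ratio: {ratio:.2f}x")
```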

  6. security for sure by Anonymous Coward · · Score: 0

    Security ensured by HTTPS, SSH and 256-bit AES? That's alright then, no need to worry about security any more.

    1. Re:security for sure by Anonymous Coward · · Score: 0

      Not very smart are you?

      You can only get data in/out via HTTPS, the data was encrypted with 256-bit AES, and the only means of login was SSH with public keys.

      Assuming (yea, assuming) the HTTPS daemon and SSH installation were bulletproof, that's a pretty tough nut to crack.

    2. Re:security for sure by GameboyRMH · · Score: 1

      Of course if Amazon had to, they could rip the storage encryption key from the VM's RAM...

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
  7. Isn't EC2 really a cluster? by pz · · Score: 1

    Help me understand something here ... isn't EC2 really one gargantuan cluster far bigger than 30,000 cores? So why is it news that it ran a big job? Was there some significant step forward in software that allowed features that were not previously available on EC2?

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    1. Re:Isn't EC2 really a cluster? by Happy+Finish · · Score: 1

      Help me understand something here ... isn't EC2 really one gargantuan cluster far bigger than 30,000 cores? So why is it news that it ran a big job? Was there some significant step forward in software that allowed features that were not previously available on EC2?

      TFA is angled more at the fact that anyone can go out and rent something like this for their own ends.

  8. Windows 8 by andyring · · Score: 0

    Finally - a computer that can fully handle Windows 8!

    1. Re:Windows 8 by Hotweed+Music · · Score: 0

      You mean that operating system that hasn't come out yet that runs on 1 GB of RAM and 800 MHz?

    2. Re:Windows 8 by Anonymous Coward · · Score: 0

      It can't even do that - no display ;-)

    3. Re:Windows 8 by Anonymous Coward · · Score: 0

      Cluster or cluster-ing? You can't do what you already are. Windows 8 should make that apparent.

    4. Re:Windows 8 by Rizimar · · Score: 0

      But it still can't run Crysis

    5. Re:Windows 8 by Fortunato_NC · · Score: 1

      I know I'm feeding the troll but...

      I'm running the Windows 8 developer preview (64-bit) on a five and a half year old laptop. Granted, I kicked the RAM up to 4GB ($44 shipped from NewEgg) and replaced the Core Duo with a Core 2 Duo (a T5600, $25 used on fleabay buy it now), but it runs well at 1900x1200 on hardware I basically rescued from the dumpster. You need to update your stock lines and stop mindlessly bashing.

      --
      Blogging Weight Loss, Distance Education, and more at verlin.com
  9. Impressive by Anonymous Coward · · Score: 0

    I assume the next step is Permutation City?

  10. But, what was their password? by G3ckoG33k · · Score: 1

    But, what was their password? So many details about that computer, but no password...

  11. See what you can do by Osgeld · · Score: 0

    when you dodge taxes

  12. Link a percentage of the top 500 together! by Commontwist · · Score: 1

    How powerful would one estimate that linking multiple clouds and, say, ten percent of the top 500 supercomputers would be? That would be one massive number cruncher.

    1. Re:Link a percentage of the top 500 together! by Anonymous Coward · · Score: 0

      Not very powerful, a single horse can pull weight.

    2. Re:Link a percentage of the top 500 together! by mscman · · Score: 1

      Cloud computing and the Top500 computers are comparing different things. Generally, "Clouds" cannot efficiently run codes you would run on a Top500 machine, and vice-versa. They are large machines serving different purposes.

    3. Re:Link a percentage of the top 500 together! by F.Ultra · · Score: 1

      They are actually set up quite similarly. The key difference is that the cloud usually uses virtualization while supercomputers don't, so there is about a 5-10% slowdown, which you have to compensate for by using more nodes.

    4. Re:Link a percentage of the top 500 together! by Anonymous Coward · · Score: 0

      Communication between nodes is very important for a lot of jobs that run on the supers. It's hard enough getting low latency / high bandwidth communications between all the nodes on a super. Linking them in the cloud would really just slow them down.

    5. Re:Link a percentage of the top 500 together! by mscman · · Score: 1

      No, they really aren't. I work on a top 20 machine, and can tell you that attaching this via a high-latency interconnect (read: the web) would completely kill the purpose of using this machine. And no, you cannot just "compensate by using more nodes." Amdahl's law kills that idea right out. I've worked in both "cloud computing" (back when it was known as "grid computing") and HPC or High-Performance Computing. While they are similar in some ways, they are designed to fulfill different purposes and are best suited to different job types. The codes being run by Cycle for this project are EP codes, ones that would not necessarily benefit from the top 50 machines in the world. These machines are better suited for MPP work, which depends more on low-latency, high-speed interconnects.
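
      Amdahl's law, mentioned above, bounds the speedup of a job by its serial portion: with a parallelizable fraction p on n workers, speedup = 1 / ((1 - p) + p / n). A minimal Python sketch follows; the 95% parallel fraction is a hypothetical value chosen for illustration, not a figure from the article.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0..1)
    workers: number of cores or nodes working on the parallel part
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical job that is 95% parallelizable.
for workers in (8, 1_000, 30_472):
    print(workers, round(amdahl_speedup(0.95, workers), 2))
```

      With a 5% serial portion the speedup can never reach 20x, no matter how many nodes are added, which is why piling on more nodes does not compensate for a slow interconnect on tightly coupled jobs.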

    6. Re:Link a percentage of the top 500 together! by F.Ultra · · Score: 1

      Ok, so the nodes in the cloud are not connected via Infiniband but by Gigabit Ethernet, but what made you think that they were connected via the web? And still I don't think that it invalidates that they are constructed quite similarly. Oh, and of course you can offer Infiniband-clustered nodes as the cloud.

    7. Re:Link a percentage of the top 500 together! by F.Ultra · · Score: 1

      The machines that make up the cloud are set up quite like supers. It's just that they might get a slightly lousier connection (Gigabit Ethernet instead of Infiniband), but that is Amazon's choice; there is nothing stopping them from selling IB-connected nodes in their cloud. Of course, in this specific example they connected more than one cloud with each other, and then they had to communicate over the Internet. But that was also by choice, and it could be compared to connecting different supercomputer centers, which happens from time to time.

  13. Bandwidth? by oliverk · · Score: 1

    Didn't we just read that the US has fallen to #25 on the international speed list? So, is this like serving up Skynet over a 28.8 modem?

    --
    ---- Please be nice in case my Slashdot karma ~= my real life karma.
  14. Charity? by damuhatori · · Score: 1

    They should donate a couple of hours a month to curing a disease.

  15. Joining the ranks of accepted submitters, by Khyber · · Score: 0

    Nobody gives two fucks. There's over 2 million registered UIDs on this site. Slashdot isn't some popularity contest. Quit turning Slashdot into fucking Digg or Reddit.

    --
    Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    1. Re:Joining the ranks of accepted submitters, by Anonymous Coward · · Score: 0

      Gotta agree with you here. It was bad enough with "first time accepted submitter", and now the struggle to come up with a clever new wording every time is getting very tiring.

  16. Seriously Expensive by Anonymous Coward · · Score: 0

    It is great that folks can do this kind of stuff in the cloud. However, looking at the cost structure for this cluster, it becomes very expensive, very quickly, at around $11,204,040 per year to operate assuming 24x7 operation. Also, this kind of HPC configuration is not for everyone. HPC environments can require high speed interconnects with super low latency, like Infiniband or Myrinet which is something I doubt that Amazon has invested in due to the cost of these types of solutions. If your application utilizes message passing interface (MPI) this solution is most likely not for you.

    So for one off types of jobs, Amazon may be a great choice where you do not need to make a large initial upfront investment in technology and your HPC usage is spotty at best. However if you need a long term solution that will be operating 24/7 or you have security requirements that prevent operation in the cloud, you are better off making the investment into high performance computing. Just keep in mind that this type of solution is not for everyone.
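
    The annual figure above follows directly from the quoted hourly price. A minimal Python sketch, with a hypothetical function name and a hypothetical 5% utilization value to illustrate the "spotty usage" case:

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year, matching the 24x7 assumption above


def annual_cost(hourly_rate_usd, utilization=1.0):
    """Yearly cost in USD for a cluster billed by the hour.

    utilization: fraction of the year the cluster is actually running (0..1).
    """
    return hourly_rate_usd * HOURS_PER_YEAR * utilization


full_time = annual_cost(1_279)            # running 24x7
occasional = annual_cost(1_279, 0.05)     # hypothetical: running 5% of the year

print(f"24x7:       ${full_time:,.0f} per year")
print(f"5% of year: ${occasional:,.0f} per year")
```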

  17. More on the security... by theendlessnow · · Score: 0

    You can verify the certificates used with DigiNotar... well.. site looks down... maybe when they are back up...

  18. Wow by kybur · · Score: 0

    Imagine a beowulf cluster of these!

  19. Weather prediction by Anonymous Coward · · Score: 0

    Seeing how their total lack of reliability kept some webpages (meneame.net) down for about two days after that storm last month, I just hope they put it to work on weather prediction.

  20. Why don't you by Anonymous Coward · · Score: 0

    Why don't you post an html address for this machine, so we can see if it can survive being slashdotted?

  21. Crysis? by Anonymous Coward · · Score: 0

    But will it run Crysis at max settings?

  22. Take that anonymous by Anonymous Coward · · Score: 0

    A few months back, old Anonymous tried to 'take out' Amazon by using the LOIC (Low Orbit Ion Cannon) in 'hive mind' mode. They found that even with dozens of kids sending thousands of requests per second they couldn't DDOS a.k.a. "Slashdot effect" Amazon. There was disappointment in geekdom. Lesser targets fell easily, but this one proved too strong. "We don't have enough machines," one hapless and dejected nerd wrote. Now you know why, Sparky! A single machine running LOIC can take out one or two hyperthreaded cores, but if they have thousands of hyperthreaded cores, you will need half as many machines to run the attack (for it to be effective). They never had that many recruits willing to let their IP address be collected by the feds (or enough who didn't realize that their IP address would be sniffed by the feds), and so were unable to take Amazon down. A big fat load balancer feeding 30000 hyperthreaded cores can swallow everything the LOIC can feed it and not lose a byte or break a sweat. I heard that Amazon has 'on demand' systems that power up and respond to external requests. If 1000 new machines started sending LOIC requests at a given time, you could temporarily DDOS Amazon (for the amount of time it takes for their servers to ramp up and handle the load). I suspect that would last only for a few minutes.

  23. communication latency by Orp · · Score: 1

    Neat, but for any job that isn't embarrassingly parallel, communication latency and speed will kill you when your nodes are spread across continents. If you're not doing any communication, well then groovy. Usually these large core servers are only 'earning their keep' when you're taking advantage of very fast interconnect hardware and doing things that can't be done by just a bunch of CPUs.

    --
    A squid eating dough in a polyethylene bag is fast and bulbous, got me?
  24. You've GOT to respect Amazon & Microsoft by Anonymous Coward · · Score: 0

    Mainly because they're LITERALLY QUITE IMPERVIOUS to the "unstoppable attack online" - the DoS/DDoS!

    How/Why?

    Well, because they've "overbuilt" their ENTIRE infrastructure for one thing, hardware-wise, for telecommunications!

    They also monitor their levels of "hits" their sites get, & IF they get too high (as they do in DDoS)? They can stall any that are coming from unrouteable addresses (think 172.x.x.x, 192.x.x.x, etc./et al).

    Microsoft also has settings in its IPStack that help "stall out" DDoS/DoS too:

    SynAttackProtect, EnableDynamicBacklog, MaximumDynamicBacklog, MinimumDynamicBacklog, TcpMaxDupAcks, TcpMaxHalfOpen, TcpMaxHalfOpenRetried

    Those ALL work IN COMBINATION with one another @ THE OPERATING SYSTEM'S IP STACK LEVEL!

    (Also in combination with hardware measures noted above both MS & Amazon do, to stall off "the unstoppable attack method" (the DoS/DDoS)).

    APK

    P.S.=> It's the "how & why" you NEVER see Amazon OR Microsoft getting news that "anonymous/lulzsec" (& the like) "took down MS/Amazon via DoS/DDoS"...

    (Because you KNOW that'd be "big news" IF it went down, of course... especially around here with all the "Pro-*NIX" sentiment (from the sockpuppet FUD spreading trolls that keep 100 user accounts to attempt to fool others with that bullshit))... apk