Slashdot Mirror


SpaceX Will Deliver The First Supercomputer To The ISS (hpe.com)

Slashdot reader #16,185, Esther Schindler writes: "By NASA's rules, not just any computer can go into space. Their components must be radiation hardened, especially the CPUs," reports HPE Insights. "Otherwise, they tend to fail due to the effects of ionizing radiation. The customized processors undergo years of design work and then more years of testing before they are certified for spaceflight." As a result, the ISS runs the station using two sets of three Command and Control Multiplexer DeMultiplexer computers whose processors are 20MHz Intel 80386SX CPUs, right out of 1988. "The traditional way to radiation-harden a spacecraft computer is to add redundancy to its circuits or by using insulating substrates instead of the usual semiconductor wafers on chips. That's expensive and time consuming. HPE scientists believe that simply slowing down a system in adverse conditions can avoid glitches and keep the computer running."

So, assuming the August 15 SpaceX Falcon 9 rocket launch goes well, there will be a supercomputer headed into space -- using off-the-shelf hardware. Let's see if the idea pans out. "We may discover a set of parameters with which a supercomputer can successfully run for at least a year without errors," says Dr. Mark R. Fernandez, the mission's co-principal investigator for software and SGI's HPC technology officer. "Alternately, one or more components of the system will fail, in which case we will then do the typical failure analysis on Earth. That will let us learn what to change to make the systems more reliable in the future."

The article points out that the New Horizons spacecraft that just flew past Pluto has a 12MHz Mongoose-V CPU, based on the MIPS R3000 CPU. "You may remember its much faster ancestor: the chip that took you on adventures in the original Sony PlayStation, circa 1994."

52 of 98 comments

  1. So whats with the laptops then? by Anonymous Coward · · Score: 4, Interesting

    If you look at the ISS webcam when it switches to the interior cam, there's a few laptops (one running Ubuntu) tied to the sides of the walls.

    1. Re:So whats with the laptops then? by Anonymous Coward · · Score: 1

      Those laptops aren't running life support systems.

    2. Re:So whats with the laptops then? by Kyusaku+Natsume · · Score: 4, Informative

      Those laptops aren't running life support systems.

      Exactly. If the laptops freeze, it is a minor inconvenience. If the main computers of the ISS freeze, the humans inside will freeze too.

      --
      Mexico: 100% conservative's America now!
    3. Re:So whats with the laptops then? by Areyoukiddingme · · Score: 5, Informative

      If you look at the ISS webcam when it switches to the interior cam, there's a few laptops (one running Ubuntu) tied to the sides of the walls.

      The laptops don't run any essential systems directly. The 80386SX variants they're talking about control life support systems. The laptops are for user interfaces and monitoring. There's somewhere around 80 of them on board the station, between station interfaces and payload interfaces. In 2013, a bunch of them were migrated to Linux, specifically Debian 6, according to reports. They used to run Windows NT and XP. The article is a press release written to overemphasize the hardened CPUs, which are by far the minority on board, to make this experimental launch of a pair of HP Apollo pc40s seem more impressive than it is.

      Information about the reliability of the laptops is damn hard to find. I'm guessing NASA signed some sort of agreement with IBM to prevent publication of such information. IBM had the exclusive right to fly laptops to the US side of the space station for years, and Lenovo retained that right for some time. It was only recently that they lost it and NASA selected HP to provide the newest laptops.

      Random forum posts from people involved indicate that the laptops crash with monotonous regularity. I suspect they would be a lot more stable if they had ECC RAM with aggressive scrubbing, but laptops with ECC RAM didn't exist until 2015 when Lenovo finally released a laptop with a Xeon in it. Odds are that none of the laptops on the ISS right now have ECC RAM.

      These two HP Apollo modules do have ECC RAM. They're Broadwell core Xeon CPUs with 12 DDR4 DIMM slots and up to 4 nVidia Tesla P100 boards in them. Either the linked article is crap, or the Apollo units don't have any Teslas installed, because the article says their "speed is over 1 TeraFLOP", which is pretty feeble. With 4 P100s in them, each Apollo should be able to produce ~38 single-precision TeraFLOPS. The article is very poor, but at a guess, the P100 boards are not installed for cooling reasons. As it is, they're having to include a liquid cooling cabinet for them, because air cooling doesn't behave too well in microgravity. Either that or the P100s are installed, the liquid cooling can handle them, and the article is garbage. Between the ECC RAM and underclocking the CPUs, they're hoping these machines can run long enough between crashes to be useful.

    4. Re:So whats with the laptops then? by Anonymous Coward · · Score: 1

      but most of it dies after a year or two.

      How again does this differ from the situation on earth?

    5. Re: So whats with the laptops then? by Anonymous Coward · · Score: 5, Informative

      I've put a lot of hardware on ISS. I have a few systems going up tomorrow on SpX-12, actually, so I have a bit of inside info here. I asked at a flight qualification panel about this a few years ago and was told that to date, no cots cpu hardware has experienced either an SEI or had problems due to TID. Apparently the biggest problems experienced were infant mortality on ThinkPads that went up in 2010ish, but the same failures existed terrestrially, so it was linked to a bad lot of hard drives.

      Thus far, we've had BeagleBones, Raspberry Pis, and a few ODROIDs running on station for years and haven't seen a single problem.

      LEO isn't really a hostile environment for silicon.

    6. Re:So whats with the laptops then? by MobileC · · Score: 1

      From TFA.
      "More modern hardware can be found in space; there are laptops on the ISS, 2007-vintage ThinkPad T61p running Debian, Scientific Linux, and Windows 7. They are being replaced by HP ZBook 15s which will run the same mix of Linux distributions and Windows 10. The Linux systems act as remote terminals to C&C MDM, while the Windows systems are used for e-mail, the web, and recreation.

      But those laptops are not high-availability, high-performance computers. They're ordinary laptops which are expected to fail. Indeed, there are over a hundred laptops on the ISS and most are defunct."

      --

      Fran
      :):):)
      1st 1st Poster of the new Millennium!

    7. Re:So whats with the laptops then? by unixisc · · Score: 1

      IIRC, the 386SX never ran NT, much less XP. Those were among the first 386s out there, and for the bulk of their lifetime, the popular OS was Windows 3.1, maybe even 95 & 98. But when NT started, the recommended starting x86 CPU was always a 486, preferably a Pentium. This application looks like it used the 386SX for embedded, so other OSs like QNX might have been usable here.

      Reading about the R3000 CPU used in the New Horizons Spacecraft, wonder what OS it ran? Some Unix - like RISC/OS or Ultrix? Or Linux or NetBSD?

    8. Re:So whats with the laptops then? by Solandri · · Score: 1

      The laptops are a way to bypass the long testing and approval process which keeps ancient computers in aerospace. Airplanes have the same problem, often using technology a decade or older because that's the computer which the plane was certified with. Upgrading the computers involves re-certifying the plane, which is horribly expensive unless you're re-certifying it anyway (e.g. new model of the plane).

      With a laptop, you can grab one off the shelf and just launch it to see if it works in space - that currently costs about $4000 per pound ($9k per kg) which is cheaper than radiation hardening and testing it. In the early days of the Space Shuttle, the most powerful computer aboard was a HP 41CX calculator.

    9. Re:So whats with the laptops then? by dbIII · · Score: 1

      Information about the reliability of the laptops is damn hard to find

      And probably not especially relevant because failure is likely to be driven by events (large temperature difference, sudden acceleration, bumping into things etc) and because they probably don't stay in the same location. A machine that is built into something instead of being moved around is something more likely to fail due to the situation than an event.

      We already know that dropping laptops is bad (or in zero-G running into things at speed) and that unforced convection cooling in zero-G is not as good as we get here. If we want something we don't know I think more controlled circumstances would help.

    10. Re: So whats with the laptops then? by Areyoukiddingme · · Score: 3, Interesting

      I asked at a flight qualification panel about this a few years ago and was told that to date, no cots cpu hardware has experienced either an SEI or had problems due to TID.

      I'm not too surprised that lattice displacement damage has been minimal. While the station has been up there for a lot of years now, the laptops in use have been rotated out quite regularly. After all, they started with Thinkpad 700 series, which were 80486s of various flavors. Routine upgrades have been sufficient to avoid total ionizing doses big enough to be noticeable.

      I'm astonished to hear that absolutely no COTS digital electronics have ever experienced crash or corruption inducing single event effects (When did they change the acronym from SEE to SEI?). I'd be willing to bet that there have been SEE/SEI crashes, but generations of craptacular Microsoft operating systems have concealed them. It's quite clear from the Alpha Magnetic Spectrometer on board that the station is getting pelted with high energy protons day in and day out, not to mention the heavier stuff that contributes significantly to the radiation exposure astronauts have to keep track of. One of those particles hitting the right transistor will most certainly change the value stored in a DRAM cell, and now that we're talking about billions of cells with a transistor each, that's a lot of targets.

      I have to ask, when you mention Beaglebones etc. being on station for years, does that involve years of uptime, or are these things being regularly rebooted? If they're being rebooted, how frequently?

    11. Re:So whats with the laptops then? by edxwelch · · Score: 1

      Maybe they should have used Epyc, which has much better RAS features than Broadwell: http://www.amd.com/system/file...

    12. Re:So whats with the laptops then? by chihowa · · Score: 1

      Indeed, there are over a hundred laptops on the ISS and most are defunct.

      With a proper defunct laptop ejection port, they could use them for minor course corrections.

      --
      If you want a vision of the future, imagine a youtube comments section scrolling - forever.
    13. Re:So whats with the laptops then? by jimtheowl · · Score: 1

      To be inclusive, the Shuttle was controlled by five AP-101 computers (one as a cold spare, another as a hot spare), each with 16 x 32-bit registers and able to process 480,000 instructions per second.

      The HP-41Cs were useful onboard tools, used for calculating the change to the center of gravity due to fuel consumption, and could be used as a backup to the main computer to determine ignition times for re-entry. They were nice, but not quite in the same league.

    14. Re: So whats with the laptops then? by tlhIngan · · Score: 1

      I'm astonished to hear that absolutely no COTS digital electronics have ever experienced crash or corruption inducing single event effects (When did they change the acronym from SEE to SEI?). I'd be willing to bet that there have been SEE/SEI crashes, but generations of craptacular Microsoft operating systems have concealed them. It's quite clear from the Alpha Magnetic Spectrometer on board that the station is getting pelted with high energy protons day in and day out, not to mention the heavier stuff that contributes significantly to the radiation exposure astronauts have to keep track of. One of those particles hitting the right transistor will most certainly change the value stored in a DRAM cell, and now that we're talking about billions of cells with a transistor each, that's a lot of targets.

      It's actually a matter of area. As in the area of the silicon that is vulnerable.

      Right down here on earth, use of ECC DRAM is a must on a cluster, because most clusters are stuffed so full of RAM that an SEU (single event upset - the current terminology) has a really good chance of upsetting a bit in the memory of a node in the cluster.

      Someone tried to build a cluster of PowerMac G5 computers. The cluster could not be booted completely before an SEU caused a crash.

      http://spectrum.ieee.org/compu...

      In a single system like a laptop, the area is low enough that you might not see it (especially if it hits memory that isn't actively being used). In a cluster, that changes things and one bit might affect the whole cluster.
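
The area argument above can be put into rough numbers. The sketch below assumes upsets arrive as a Poisson process whose rate scales linearly with the amount of memory. The per-gigabyte rate, the 8 GB laptop and the 1,100-node cluster are made-up placeholders for illustration, not measured or sourced figures.

```python
import math

# Hypothetical upset rate, chosen only for illustration (not a measured value).
UPSETS_PER_GB_PER_HOUR = 1e-4


def p_at_least_one_upset(memory_gb: float, hours: float,
                         rate: float = UPSETS_PER_GB_PER_HOUR) -> float:
    """Probability of at least one upset in the given time window.

    Assumes a Poisson process whose rate is proportional to memory size.
    """
    expected_upsets = rate * memory_gb * hours
    return 1.0 - math.exp(-expected_upsets)


# One hypothetical 8 GB laptop versus a hypothetical cluster of 1,100 such nodes,
# both observed for 24 hours.
laptop = p_at_least_one_upset(memory_gb=8, hours=24)
cluster = p_at_least_one_upset(memory_gb=8 * 1100, hours=24)

print(f"laptop:  {laptop:.4f}")
print(f"cluster: {cluster:.4f}")
```

With any fixed per-gigabyte rate, the expected number of upsets grows in proportion to total memory, so a fault that is rare on one machine can become near-certain across a large cluster.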

  2. And this... by arpad1 · · Score: 1

    is how Skynet begins.

    --
    Minutus cantorum, minutus balorum, minutus carborata descendum pantorum.
  3. Re:but why? by ganjadude · · Score: 1

    latency is my best guess

    --
    have you seen my sig? there are many others like it but none that are the same
  4. Re:Typical Elon by 93+Escort+Wagon · · Score: 4, Informative

    The part you seem to have missed is: This is an experiment to learn whether an alternative approach to hardening can be developed. If it's successful, the benefits would be obvious.

    Experiments are the raison d'etre for the ISS... so why is this a problem?

    --
    #DeleteChrome
  5. Gamma radiation... by __aaclcg7560 · · Score: 4, Interesting

    Whenever something inexplicable happened while testing a video game, I've always put down "gamma radiation" on the bug report. The developers hated that term but they couldn't explain why it happened either.

    1. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      Sunspots [wisc.edu]

      And gamma radiation is not listed as an excuse. You really can't rule out gamma radiation as a contributing factor when something goes wrong.

    2. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      Yes you can. You're thinking of cosmic rays. Not the same thing.

      Gamma radiation caused Bruce Banner to become the Hulk. You might be thinking of the Fantastic Four that got exposed to cosmic rays.

    3. Re:Gamma radiation... by angel'o'sphere · · Score: 1

      Wow, meanwhile you even managed to recruit hate mods :)
      Understandable, they know about alpha radiation, some even about beta, but with gamma you simply lost them.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    4. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      You started with bugs in video games and ended up with comic books.

      Video games and comic books tend to go hand in hand.

      Your ADHD is reaching new heights.

      Oh, I forgot. You need a box of crayons.

    5. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      You're insane.

      No. I'm normal. Everyone else is insane.

    6. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      You're mentally deranged.

      Only on Slashdot.

    7. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      From what I've seen, I think you're an insufferable douchebag pretty much all the time.

      Only on Slashdot.

      How someone who has achieved so little can be so full of himself is remarkable.

      I was misdiagnosed as being mentally retarded in kindergarten due to an undiagnosed hearing loss in one ear. For years I've been told by people what I can't do. For years I've been pissing off people by what I can do.

    8. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      Shitposting, Amazon affiliate spam, being fat, and being a general nuisance.

      Only on Slashdot.

      Good job.

      You're welcome!

    9. Re:Gamma radiation... by ls671 · · Score: 1

      Our testers do that only for non-reproducible bugs. Who cares if you are unable to explain the bug? That's the job of the bug fixer.

      --
      Everything I write is lies, read between the lines.
    10. Re:Gamma radiation... by Anonymous Coward · · Score: 1

      I used to have a gaming PC that used to crash five seconds before my "extreme weather warning app" on my smartphone sent off an alarm. Either events can travel back in time, or the weather radar station close to my apartment was receiving an echo that somehow affected the PC (which did have a clear transparent "window" on the side rather than a solid metal case).

    11. Re:Gamma radiation... by __aaclcg7560 · · Score: 1

      I used to have a gaming PC that used to crash five seconds before my "extreme weather warning app" on my smartphone sent off an alarm. Either events can travel back in time, or the weather radar station close to my apartment was receiving an echo that somehow affected the PC (which did have a clear transparent "window" on the side rather than a solid metal case).

      Sounds like the PC in "Thrice Upon A Time" by James P. Hogan that could send email forward or backward in time.

  6. Re:but why? by Anonymous Coward · · Score: 2, Interesting

    At a guess it's because sending data back to earth for processing isn't great when you're a long way away - the latency between Earth and Mars, for example, can get up to about 21 minutes. If your lander has to adjust for local weather systems, or your orbital station needs to make corrections due to local changes in EM fields, or if you're just operating in an environment where you can't predict exactly what conditions you're going to find, you need to do a lot of calculations to correct.

    Of course this isn't an issue for the ISS, with a latency shorter than my ping to Google (seriously, my internet sucks). But if we're going to look at landers on Europa, exploring Ganymede etc it'll be easier if we can do some heavy computing on the fly. So test now in a controlled environment, and get it right for when we send stuff on 20 year missions.
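
The latency figures above can be sanity-checked with a short calculation. This is a rough sketch: the distances are approximate round numbers assumed for illustration, and the ISS case assumes a direct straight-down path rather than any relay satellites.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact by definition of the metre


def one_way_delay_seconds(distance_km: float) -> float:
    """One-way light travel time over the given distance, in seconds."""
    return distance_km / SPEED_OF_LIGHT_KM_S


# Approximate distances, assumed for illustration only.
ISS_ALTITUDE_KM = 400        # straight down, ignoring relay hops
MARS_CLOSEST_KM = 55e6       # near a close opposition
MARS_FARTHEST_KM = 400e6     # near conjunction

for name, km in [("ISS", ISS_ALTITUDE_KM),
                 ("Mars (closest)", MARS_CLOSEST_KM),
                 ("Mars (farthest)", MARS_FARTHEST_KM)]:
    seconds = one_way_delay_seconds(km)
    print(f"{name}: {seconds:.4g} s ({seconds / 60:.3g} min)")
```

Under these assumed distances the one-way delay to Mars ranges from about three minutes to a little over twenty, which is the same order as the figure quoted above, while the ISS delay is on the order of a millisecond.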

  7. Launch is the 14th not 15th by DiniZuli · · Score: 1

    Go see it for yourself http://www.spacex.com/webcast

  8. Re:Typical Elon by KiloByte · · Score: 2

    I'd instead go with a RAIA -- a horde of off-the-shelf ARMs. Within the power budget of a single 20MHz 80386 you can fit nine 2GHz SoCs. Have them vote -- there's no way every single one of them gets hit by a ray within a time slice. Periodically, resync their memory (especially when the vote disagrees). A 2GHz machine can take quite an overhead while doing the work previously done by a 20MHz one...

    This assumes the 386 was alone -- it was at least doubled or tripled. So if you don't need 18x or 27x redundancy, you can do something else with the extra power.

    But let's assume you do want that 27x redundancy. It's still a two orders of magnitude speed boost, and that's assuming same speed clock-to-clock. Which is wrong, as 386 timings were downright scary. Especially in floating point, with a hundred or more clock cycles per instruction. Modern ARM on the other hand includes a vectorized FPU...
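
The voting idea can be sketched in a few lines. This is only an illustration of majority voting over simulated nodes, not how any flight system does it; `majority_vote`, `run_redundant` and `flaky_sum` are hypothetical names, and the "fault" is a hard-coded bit flip on one node.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the value reported by a strict majority of nodes."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        # No value has more than half the votes: the caller should resync and retry.
        raise RuntimeError("no strict majority; resync and retry")
    return value


def run_redundant(task: Callable[[int], Hashable], nodes: int = 9) -> Hashable:
    """Run the same task once per simulated node and vote on the results."""
    return majority_vote([task(node_id) for node_id in range(nodes)])


def flaky_sum(node_id: int) -> int:
    """Sum 0..99, with a simulated single-bit upset on node 3."""
    result = sum(range(100))
    return result ^ (1 << 4) if node_id == 3 else result


print(run_redundant(flaky_sum))  # eight nodes agree, one is outvoted
```

A real system would also have to deal with nodes that hang or never answer, and with resynchronizing memory after a disagreement, which this sketch leaves out.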

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  9. Re:Typical Elon by ThosLives · · Score: 1

    I think it must be something missing in the article / summary - I don't understand how running a chip slower than design is going to protect against SEUs. SEUs don't have anything to do with the clock rate, only with the energy levels required to flip a bit.

    TFA does mention something about a "software approach" but I'm not certain what that would be, unless it is something like running every computation twice to see if you get the same result both times - which is either going to mean you need twice the equipment or run at half speed.

    But... now that I think about it, that might still be a win, because hardened chips run a fair bit slower than half speed at a cost a fair bit more than twice as much...

    --
    "There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraphrase)
  10. Good first step, maybe. by Brett+Buck · · Score: 2

    The approach is interesting, but putting it in the ISS is only slightly more demanding than putting it on your desk. Both remain well under the protection of the Van Allen belts. The real test is out beyond the Van Allen belts where the radiation really gets tough.

  11. Re:Typical Elon by mean+pun · · Score: 2

    Who needs radiation hardening? Just send a Proliant rack server up there and call it good! That's why we're SpaceX and they're luddites!

    Unless I've missed something and HPE has been sold to him, Elon Musk is just the owner of the company that will deliver this computer to the ISS. He did not design the experiment.

  12. Re:but why? by Areyoukiddingme · · Score: 3, Informative

    Why not do the heavy computing down here on the ground, where it is so much easier?

    Bandwidth. ISS has 3 megabit upstream, 10 megabit downstream. Yes, megabit, not gigabit. And that's a massive upgrade over what it had for years, which was 2400 baud. There's any number of science experiments people would like to run that would benefit from beefy local processing handling large amounts of data. So much data that neither transmitting it off station nor storing it and physically transporting it off station is currently feasible. The bandwidth isn't available or the storage is too expensive.

    That may change in the 2020s. I'd bet a pizza that SpaceX will be including upward-facing antennas in their satellites, not just Earthward-facing, in order to talk to their own rockets at high bandwidth regardless of where they are in their trajectories. Still, it's going to be quite some time before that option exists, so experiments to determine the feasibility of local processing are worth conducting.
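
To put the bandwidth point in numbers, here is a back-of-the-envelope sketch. The 1 TB dataset size is an arbitrary assumption, the 10 megabit figure is taken from the comment above without settling which direction it applies to, and 2400 baud is treated as 2400 bits per second for simplicity.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move a dataset over a link, ignoring protocol overhead
    and any gaps in coverage."""
    return dataset_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


ONE_TERABYTE = 1e12   # assumed dataset size, for illustration only
FAST_LINK_BPS = 10e6  # the 10 megabit figure quoted above
OLD_LINK_BPS = 2400   # treating 2400 baud as 2400 bit/s (an assumption)

print(f"10 Mbit/s:  {transfer_days(ONE_TERABYTE, FAST_LINK_BPS):.1f} days")
print(f"2400 bit/s: {transfer_days(ONE_TERABYTE, OLD_LINK_BPS) / 365:.0f} years")
```

Even at the faster rate, a terabyte-scale dataset would take more than a week of uninterrupted transfer under these assumptions, which is the case for processing data where it is collected.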

  13. About as fast as desktop AMD threadripper by Billly+Gates · · Score: 1

    Not impressive as my 4 year old i7 has 120,000,000 instructions per second. This is around 8 times more which is a new desktop for a few thousand. Also my GPU which is a semi crappy RX 470 can easily do 5 trillion operations per second no problem.

  14. so not in the cloud? by 4wdloop · · Score: 1

    Why do they need a supercomputer up there?
    Could not they compute in the cloud like the rest of us?
    Or did they cut the cable and do not have internet anymore?
    Or simply are they just above it?
    Oh...wait...
    But seriously?

    --
    4wdloop
  15. Re:but why? by Billly+Gates · · Score: 1

    Because they want to test something for Mars. Mars has around 10 minutes of latency at the speed of light, so transmitting large data sets quickly is impractical.

  16. Re: Typical Elon by 93+Escort+Wagon · · Score: 1

    By the way, do you still have the wagon? I saw one driving on I-5 the other day and assumed it was you.

    I do still have the wagon! It mainly serves to get me from my house to the local Sounder station, but occasionally I still take it on I-5. I prefer to avoid freeways when I can, though.

    It's been a very reliable car, but it's definitely showing its age... I'm probably going to finally retire it for something else before the end of the year. I'll probably be unreasonably sad when that day comes, though.

    --
    #DeleteChrome
  17. Supercomputer Definition? by mykepredko · · Score: 1

    I was told years ago, when I was in University, that a "Supercomputer" had a clock speed of 200MHz - with the understanding it was really 200 MIPs/FLOPs.

    This sounds like a good step forward and a significant improvement on the AP-101s that were on the first shuttles and had a clock rate of 480kHz (and, IIRC, 1.5MByte of ROM ("ROS" in IBM-speak) and 500kByte of SRAM).

  18. Re:Typical Elon by KiloByte · · Score: 1

    A single computer with redundancy in the OS gets killed by a single well-placed ray. On the other hand, the redundant ARM array consists of physically separate machines, even if one gets permanently fried others will keep running. And I picked ARM because it offers tiny power draw while delivering good performance. The power budget of a deep-space craft is really tight.

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  19. Re:but why? by thinkwaitfast · · Score: 1

    There's any number of science experiments people would like to run that would benefit from beefy local processing handling large amounts of data.

    Care to share any of them?

  20. Re: Typical Elon by ThosLives · · Score: 1

    Twice lets you know if something isn't working, which sometimes is enough. You only need triple redundancy when you have to know the correct answer.

    --
    "There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraphrase)
  21. Re: Typical Elon by sysrammer · · Score: 1

    I believe I might have seen it too. I think it looks a little like this... http://genedorr.com/patches/images/Gemini/Ge05_Recovery_detail.jpg

    --
    His ignorance covered the whole earth like a blanket, and there was hardly a hole in it anywhere. - Mark Twain
  22. Re: Typical Elon by 93+Escort+Wagon · · Score: 1

    Nah, mine doesn't have the spoke wheels. :-P

    --
    #DeleteChrome
  23. Re:Typical Elon by mean+pun · · Score: 1

    It's the MacArthur principle. Take credit for whatever is going so long as you are somewhere sort of near where it is happening.

    Unless I've missed something, Elon Musk is not trying to take credit for this experiment.

  24. Re: Typical Elon by chihowa · · Score: 2

    Triple redundancy is what you want if you're running the operations in parallel and looking for consensus. If you're running them in series, you run them twice and only repeat if the two results don't match. The chance of two SEUs disrupting the same operation in the same way, twice in a row, is very low.
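
The serial approach described above can be sketched as follows. This is a minimal illustration, assuming the computation is deterministic and cheap to repeat; `run_until_agreement` and `flaky_square` are hypothetical names, and the fault is simulated by corrupting only the first call.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_until_agreement(compute: Callable[[], T], max_attempts: int = 5) -> T:
    """Run compute() twice and accept the result only if both runs agree.

    On a mismatch the pair of runs is repeated, up to max_attempts pairs.
    """
    for _ in range(max_attempts):
        first = compute()
        second = compute()
        if first == second:
            return first
    raise RuntimeError("results never agreed")


calls = {"n": 0}


def flaky_square() -> int:
    """Compute 12 * 12, with a simulated bit flip on the very first call."""
    calls["n"] += 1
    value = 12 * 12
    return value ^ 1 if calls["n"] == 1 else value


print(run_until_agreement(flaky_square))  # first pair disagrees, second pair agrees
```

In the fault-free case this costs two runs instead of three, and the extra runs are only paid when a disagreement is actually observed.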

    --
    If you want a vision of the future, imagine a youtube comments section scrolling - forever.
  25. Why is a supercomputer needed? by nycsubway · · Score: 1

    Why put a supercomputer up there? Is the bandwidth available not enough to send a dataset to Earth, process it, and send it back? Or are the calculations needed to keep the ISS running that complex?

  26. Poster is A. Idiot. by whitroth · · Score: 1

    8088's were 1980. 286's came out in the mid-eighties. 386's were brand new and *expensive* by '87/88. Therefore, 386 is *not* 1980.

  27. Mission systems in general... by spurioustruth · · Score: 1

    I find it interesting that this project will make use of Red Hat 6.8 to complete the COTS picture.
    For other needs, the software suite has to show a high level of reliability as well. Think along the lines of DO-178* (safety/mission critical) requirements.
    Witness efforts with QuickSAT/XEN ( https://www.sbir.gov/sbirsearc... ) and the work from Victor with GalacticSky ( http://www.galacticsky.net/ )