Homebrew Cray-1


Posted by CmdrTaco on Tuesday August 31, 2010 @05:00AM from the when-i-was-a-child dept.

egil writes "Chris Fenton built his own fully functional 1/10 scale Cray-1 supercomputer. True to the original, it includes the couch-seat, but is also binary compatible with the original. Instead of the power-hungry ECL technology, however, the scale model is built around a Xilinx Spartan-3E 1600 development board. All software is available if you want to build one for your own living room. The largest obstacle in the project is to find original software."


Xilinx... by TrisexualPuppy · 2010-08-31 05:05 · Score: 4, Informative

I built a PVP11 "supercluster" and started with Xilinx. The hardware is great, but their software toolset is horrendous.

After months of free time development, I switched over to surplus Altera Stratix II video decoder hardware, got a copy of Quartus II, and was moving within weeks. Altera would be my suggestion for any geek who wants to try something similar!
Re:Wow! by Neon Spiral Injector · 2010-08-31 05:13 · Score: 4, Informative

Its instructions execute accurately clock-for-clock, but it runs at 33 MHz instead of 80.
Re:Wow! by Space cowboy · 2010-08-31 05:44 · Score: 5, Informative

S3E's have DCMs (Digital Clock Managers) making them very flexible in terms of what the internal clock frequencies are, even with a fixed input frequency.

Chances are (I can't get to the site) it just runs at 33MHz as its best-supported clock frequency. An S3E is a pretty cheap and slow FPGA - I remember writing a 32-bit CPU for one, and until I started optimising the logic-placement in the FPGA, it was only running at ~30MHz. I got it up to ~50MHz after tweaking and pipelining, but his design may do more than my simple CPU.

Simon

--
Physicists get Hadrons!
Re:The originals really are something else by drfuchs · 2010-08-31 05:47 · Score: 5, Informative

"Why," you may ask, "was the internal wiring so insanely packed?" The length of each point-to-point wire was individually calibrated, such that all the signals to each gate arrived at the same moment, so you didn't need flip-flops to latch values in the flow of the circuits. Kind of a "just-in-time delivery" of electrons; and each layer of buffering you avoided saved you delay along the pipeline. I don't think this sort of scheme was used on any other mainframe.
NCAR by Fishbulb · 2010-08-31 05:56 · Score: 5, Informative

Send an email to the folks at the CISL division of NCAR.
They know a thing or two about Crays.
Chris - see the Supercomputer Centers, CMU, UCSD by garyebickford · 2010-08-31 06:22 · Score: 5, Informative
I think there were (are?) four Supercomputer Centers that had Cray-1 and later Cray X-MP machines. The Pittsburgh center did a lot of work with Carnegie Mellon, esp. the Robotics Institute.
I personally did one bit of work - porting a photometrically correct ray-tracer by Dr. Robert Thibadeau in the Image Understanding Laboratory from an Apollo workstation to the Cray at PSC - this would have been in 1989, I think. The one complication we had was that the Cray floating point format was different, so our first runs were all zeros. Other than that the code compiled and ran fine on the Cray. Of course, a run that took two weeks on the Apollo ran in about 40 seconds on the Cray.
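For readers wondering how the floating-point formats differ, here is a minimal sketch of packing and unpacking a Cray-1 style 64-bit word. It assumes the commonly documented layout (1 sign bit, 15-bit exponent biased by octal 40000, 48-bit mantissa with no hidden bit); the function names are hypothetical, and the comment above does not say which detail of the format caused the all-zero output.

```python
import math

CRAY_EXP_BIAS = 0o40000        # assumed exponent bias (16384)
MANTISSA_BITS = 48             # assumed mantissa width, no hidden bit


def float_to_cray(x):
    """Pack a Python float into an assumed Cray-1 style 64-bit word."""
    if x == 0.0:
        return 0
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))                 # abs(x) = m * 2**e, 0.5 <= m < 1
    mantissa = int(m * (1 << MANTISSA_BITS))  # truncates 53 bits down to 48
    return (sign << 63) | ((e + CRAY_EXP_BIAS) << 48) | mantissa


def cray_to_float(word):
    """Unpack an assumed Cray-1 style 64-bit word into a Python float."""
    sign = (word >> 63) & 1
    exponent = (word >> 48) & 0x7FFF
    mantissa = word & ((1 << MANTISSA_BITS) - 1)
    if mantissa == 0:
        return 0.0
    # The mantissa is a fraction in [0.5, 1), hence the extra shift by 48 bits.
    value = math.ldexp(mantissa, exponent - CRAY_EXP_BIAS - MANTISSA_BITS)
    return -value if sign else value
```

Under these assumptions, `float_to_cray(1.0)` gives `0x4001800000000000`, which is nothing like the IEEE 754 bit pattern for 1.0, so raw binary data moved between machines needs an explicit conversion step. The sketch ignores exponent range checks and rounding.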
A lot, maybe all, of the work done on these machines was non-spooky research, so perhaps you can track down some of the professors at the associated universities, such as CMU, Northern Illinois, UCSD, Berkeley, etc. Also check out the weather folks - they have been among the biggest CPU cycle-burners for a long time. I worked briefly with one weather guy at a weather research facility in Wyoming but I don't recall any details - was it U Wy?
The SCs I recall are:
- SDSC (San Diego Supercomputer Center)
- PSC (Pittsburgh Supercomputer Center)
- NCSA (National Center for Supercomputing Applications)
I'm sure that if you dig around in the universities you'll find folks who have stuff piled on a back shelf somewhere (probably in a tape format you can't read). Also look up in the old annals of the ACM SIG on supercomputing - that will give a line on researchers who were working on the Cray.
--
It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
ARTICLE TEXT by Brietech · 2010-08-31 06:38 · Score: 4, Informative

As part two (see previous attempt) of my ongoing series in ‘computational necromancy,’ I’ve spent the last year and a half or so constructing my own 1/10-scale, binary-compatible, cycle-accurate Cray-1. This project falls purely into the “because I can!” category - I was poking around the internet one day looking for a Cray emulator and came up dry, so I decided to do something about it. Luckily, the Cray-1 hardware reference manual turned out to be useful enough that implementing most of this was pretty straightforward. The Cray-1 is one of those iconic machines that just makes you say “Now that’s a super computer!” Sure, your iPhone is 10X faster, and it’s completely useless to own one, but admit it... you really want one, don’t you?
The Cray-1A Architecture
Now, let’s get down to specs - What is this bad boy running? The original machine ran at a blistering 80 MHz, and could use from 256-4096 kilowords (32 megabytes!) of memory. It has 12 independent, fully-pipelined execution units, and with the help of clever programming, can peak at 3 floating-point operations per cycle. Here’s a diagram of the overall architecture:
[Figure: Cray-1 architecture diagram]
It’s a fairly RISC-y design, with 8 64-bit scalar (S) registers, 8 64-bit/64-word vector (V) registers, and 8 24-bit address (A) registers. Rather than a traditional cache, it uses a ’software-managed’ cache with an additional 64 64-bit words (T registers) and 64 24-bit words (B registers). There are instructions to transfer data between memory and registers, and then register-to-register ‘compute’ instructions.
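To make the register layout concrete, here is a minimal sketch of the programmer-visible register file as described above. The class and method names are hypothetical and are not taken from the author's FPGA design; only the register counts and widths come from the text.

```python
from dataclasses import dataclass, field

MASK24 = (1 << 24) - 1
MASK64 = (1 << 64) - 1


@dataclass
class CrayRegisters:
    """Hypothetical model of the register file described in the article."""
    A: list = field(default_factory=lambda: [0] * 8)    # 8 x 24-bit address
    S: list = field(default_factory=lambda: [0] * 8)    # 8 x 64-bit scalar
    V: list = field(default_factory=lambda: [[0] * 64 for _ in range(8)])  # 8 x 64 words
    B: list = field(default_factory=lambda: [0] * 64)   # 64 x 24-bit backing store
    T: list = field(default_factory=lambda: [0] * 64)   # 64 x 64-bit backing store

    def write_a(self, i, value):
        self.A[i] = value & MASK24      # address registers keep only 24 bits

    def write_s(self, i, value):
        self.S[i] = value & MASK64      # scalar registers keep only 64 bits

    def save_s_to_t(self, s_index, t_index):
        # The 'software-managed cache': the program decides what to park in T.
        self.T[t_index] = self.S[s_index]


def register_bits():
    """Total register storage implied by the counts above, in bits."""
    return 8 * 24 + 8 * 64 + 8 * 64 * 64 + 64 * 24 + 64 * 64
```

The total comes to 39,104 bits, most of it in the vector registers, which gives a rough sense of why register storage alone is a noticeable cost on a small FPGA.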
One of the coolest aspects of this machine is that everything is fully pipelined. This machine was designed to be fast, so if you’re careful, you can actually get one (or more) instruction every cycle. This has some interesting implications - there’s no ‘divide’ instruction, for instance, because it can take a variable amount of time to finish. To perform a divide, you need to first compute the ‘reciprocal approximation’ (something we *can* do in exactly 13 cycles, it turns out) of the denominator value, and then perform a separate multiply of that result with the numerator.
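The divide-by-reciprocal idea can be sketched in a few lines. This is an illustration only: `reciprocal_approximation` is a hypothetical stand-in for the hardware unit (it simply truncates 1/x to an assumed 30 bits), and the Newton-Raphson refinement step is an addition that the text above does not describe.

```python
import math


def reciprocal_approximation(x, bits=30):
    """Hypothetical stand-in for a fixed-latency reciprocal unit.

    Returns 1/x with the mantissa truncated to `bits` bits (an assumption).
    """
    m, e = math.frexp(1.0 / x)
    scale = 1 << bits
    return math.ldexp(math.trunc(m * scale) / scale, e)


def divide(numerator, denominator):
    """Divide using only a reciprocal estimate plus multiplies and a subtract."""
    r = reciprocal_approximation(denominator)
    # One Newton-Raphson step roughly doubles the number of correct bits.
    r = r * (2.0 - denominator * r)
    return numerator * r
```

For example, `divide(10.0, 4.0)` returns `2.5`. Every step here is a fixed-latency operation, which is the property the pipeline needs; a true divider would take a data-dependent number of cycles.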
The vector instructions are particularly cool. A vector Add operation might take only 5 cycles to start producing results (remember, each vector can hold 64 values, so it takes 5 + 64 cycles to finish adding). Why wait for it to finish though? We can take the result output from the adder, and “chain” it straight into another vector unit (say a multiplier). And *that* only takes another 10 cycles or so, so we can chain that result into yet another unit (say, reciprocal approximation). Now, rather than waiting for the first operation to finish, we’re computing up to 3 floating point calculations per cycle. Clever programmers could sustain about 2 floating point operations per cycle, or 160 million floating-point operations per second at 80 MHz.
[Figure: Vector Chaining in Action!]
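The benefit of chaining can be estimated with some simple arithmetic. The sketch below uses a deliberately simplified timing model (each unit has a startup latency and then delivers one result per cycle); the latencies 5, 10 and 13 are taken loosely from the figures quoted above, and the function names are hypothetical.

```python
VECTOR_LENGTH = 64  # elements per vector register


def unchained_cycles(latencies, n=VECTOR_LENGTH):
    """Each operation waits for the previous one to finish all n elements."""
    return sum(latency + n for latency in latencies)


def chained_cycles(latencies, n=VECTOR_LENGTH):
    """Results stream from unit to unit, so only the startup latencies add up."""
    return sum(latencies) + n


def sustained_flops(ops_per_cycle, clock_hz):
    """Floating-point operations per second at a given clock rate."""
    return ops_per_cycle * clock_hz


latencies = [5, 10, 13]  # add, multiply, reciprocal approximation (approximate)
print(unchained_cycles(latencies))   # 220
print(chained_cycles(latencies))     # 92
print(sustained_flops(2, 80_000_000))  # 160000000
```

Under this model the three chained operations finish in 92 cycles instead of 220, and two sustained operations per cycle at 80 MHz works out to 160 million floating-point operations per second. Real hardware has extra constraints (register reservation, chain slot timing) that this sketch ignores.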
The Hardware
The actual design was implemented on a Xilinx Spartan-3E 1600 development board. This is basically the biggest FPGA you can buy that doesn’t cost thousands of dollars for a devkit. The Cray occupies about 75% of the logic resources, and all of the block RAM.
[Figure: Spartan-3E 1600 development board]
This gives us a spiffy Cray-1A running at about 33 MHz, with about 4 kilowords of RAM. The only features currently missing are:
-Interrupts
-Exchange Packages (this is how the Cray does ‘context-switching’ - it was intended as a batch-processing machine)
-I/O Channels (I just memory-mapped the UART I added to it).
If I ever find some software for this thing (or just get bored), I’ll probably go ahead and add the missing features. For now, though, everything else works sufficiently well to execute small test programs and such.
The Software
When I started building this, I thought “Oh, I

--
I'm perfect in every way, except for my humility.